-
Book Overview & Buying
-
Table Of Contents
-
Feedback & Rating
Advanced Natural Language Processing with TensorFlow 2
By :
"A picture is worth a thousand words" is a famous adage. In this chapter, we'll put this adage to the test and generate captions for an image. In doing so, we'll work with multi-modal networks. Thus far, we have operated on text as input. Humans can handle multiple sensory inputs together to make sense of the environment around them. We can watch a video with subtitles and combine the information provided to understand the scene. We can use facial expressions and lip movement along with sounds to understand speech. We can recognize text in an image, and we can answer natural language questions about images. In other words, we have the ability to process information from different modalities at the same time, and then put them together to understand the world around us. The future of artificial intelligence and deep learning is in building multi-modal networks as they closely...
Change the font size
Change margin width
Change background colour