Python Deep Learning Cookbook

By: Indra den Bakker

Overview of this book

Deep learning is revolutionizing a wide range of industries. In many applications, deep learning has been shown to outperform humans by making faster and more accurate predictions. This book combines a top-down and a bottom-up approach to demonstrate deep learning solutions to real-world problems in different areas, including computer vision, natural language processing, time series, and robotics. The Python Deep Learning Cookbook presents technical solutions to these problems, along with a detailed explanation of each solution. It also discusses the pros and cons of implementing each proposed solution in one of the popular frameworks, such as TensorFlow, PyTorch, Keras, and CNTK. The book includes recipes related to the basic concepts of neural networks, as well as classical network topologies. The main purpose of this book is to provide Python programmers with a detailed list of recipes for applying deep learning to common and not-so-common scenarios.

Understanding videos with deep learning

In Chapter 7, Computer Vision, we showed how to detect and segment objects in single images. The objects in these images were static. However, if we add a temporal dimension to our input, objects can move within a scene. Understanding what is happening across multiple frames (a video) is a much harder task. In this recipe, we want to demonstrate how to get started when tackling videos. We will focus on combining a CNN and an RNN. The CNN is used to extract features from single frames; these per-frame features are then combined into a sequence and used as input for an RNN. This is also known as stacking, where we build (stack) one model on top of another.
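To make the data flow of this two-stage design concrete, the following is a minimal NumPy sketch, not the recipe's actual implementation. It assumes a toy clip of 8 frames of 32x32 RGB pixels, and uses a hypothetical extract_frame_features function (average pooling plus a random projection) as a stand-in for a real pretrained CNN. The sequence model is a plain Elman RNN with random, untrained weights, so only the shapes are meaningful, not the predictions. All names and sizes except the 101 classes are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes (assumed for illustration); 101 classes as in the recipe's dataset.
NUM_FRAMES, HEIGHT, WIDTH, CHANNELS = 8, 32, 32, 3
FEATURE_DIM, HIDDEN_DIM, NUM_CLASSES = 16, 12, 101


def extract_frame_features(frame, projection):
    """Stand-in for a pretrained CNN: map one frame to a fixed-length vector.

    A real pipeline would run the frame through a CNN and keep the
    activations of a late layer; here we only average-pool and project.
    """
    pooled = frame.mean(axis=(0, 1))       # shape: (CHANNELS,)
    return np.tanh(projection @ pooled)    # shape: (FEATURE_DIM,)


def rnn_classify(features, w_in, w_rec, w_out):
    """Plain (Elman) RNN over the frame features, softmax on the final state."""
    hidden = np.zeros(w_rec.shape[0])
    for x_t in features:                   # one step per frame, in temporal order
        hidden = np.tanh(w_in @ x_t + w_rec @ hidden)
    logits = w_out @ hidden
    exp = np.exp(logits - logits.max())    # numerically stable softmax
    return exp / exp.sum()


# Random, untrained weights: this only demonstrates the data flow.
projection = rng.normal(size=(FEATURE_DIM, CHANNELS))
w_in = rng.normal(size=(HIDDEN_DIM, FEATURE_DIM)) * 0.1
w_rec = rng.normal(size=(HIDDEN_DIM, HIDDEN_DIM)) * 0.1
w_out = rng.normal(size=(NUM_CLASSES, HIDDEN_DIM)) * 0.1

video = rng.random((NUM_FRAMES, HEIGHT, WIDTH, CHANNELS))

# Stage 1: per-frame features, stacked into a (frames, features) sequence.
features = np.stack([extract_frame_features(f, projection) for f in video])

# Stage 2: the sequence model consumes the features and scores each class.
probabilities = rnn_classify(features, w_in, w_rec, w_out)

print(features.shape)       # (8, 16)
print(probabilities.shape)  # (101,)
```

The point of the split is that the first stage can be computed once per frame and cached, so that only the much smaller sequence model needs to be trained on the video labels.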

For this recipe, we will be using a dataset that contains 13,321 short videos. These videos are distributed over a total of 101 different classes. Because of the complexity of this task, we don't want to train our models from scratch. Therefore, we will be using the pretrained weights of the InceptionV3 model provided within...