
Transformer architecture

Although the transformer architecture differs from recurrent networks, it builds on many ideas that originated with them. It represents the next evolutionary step of deep learning architectures that work with text, and as such, should be an essential part of your toolbox. The transformer is a variant of the Encoder-Decoder architecture in which the recurrent layers are replaced with Attention layers. The transformer architecture was proposed by Vaswani et al. [30], along with a reference implementation that we will refer to throughout this discussion.
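To make the idea of replacing recurrence with attention concrete, here is a minimal sketch of a single transformer encoder block written with tf.keras, assuming TensorFlow 2.4 or later, where tf.keras.layers.MultiHeadAttention is available. The hyperparameter names and values (embed_dim, num_heads, ff_dim) are illustrative choices, not values taken from the reference implementation:

import tensorflow as tf

def transformer_encoder_block(embed_dim=128, num_heads=4, ff_dim=512):
    # Variable-length sequence of embed_dim-dimensional token embeddings.
    inputs = tf.keras.Input(shape=(None, embed_dim))
    # Self-attention takes the place of the recurrent layer in a
    # seq2seq encoder: every position attends to every other position.
    attn_out = tf.keras.layers.MultiHeadAttention(
        num_heads=num_heads, key_dim=embed_dim // num_heads)(inputs, inputs)
    # Residual connection followed by layer normalization.
    x = tf.keras.layers.LayerNormalization(epsilon=1e-6)(inputs + attn_out)
    # Position-wise feed-forward network, again with a residual connection.
    ff = tf.keras.layers.Dense(ff_dim, activation="relu")(x)
    ff = tf.keras.layers.Dense(embed_dim)(ff)
    outputs = tf.keras.layers.LayerNormalization(epsilon=1e-6)(x + ff)
    return tf.keras.Model(inputs, outputs)

block = transformer_encoder_block()
block.summary()

Stacking several such blocks, adding positional encodings to the inputs, and pairing the stack with a decoder that also attends to the encoder output yields the full architecture sketched in Figure 7.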

Figure 7 shows a seq2seq network with attention and compares it to a transformer network. The transformer is similar to the seq2seq with Attention model in the following ways:

  1. Both the source and the target are sequences
  2. The output of the last block of the encoder is used as the context, or thought vector, for computing the Attention model on the decoder (a sketch of this attention computation follows the list)
  3. The target sequences...
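Underlying both models is the same basic operation: an attention-weighted sum of value vectors. The transformer's variant is scaled dot-product attention, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V, as defined by Vaswani et al. [30]. The following is a minimal sketch of that computation; the tensor names and shapes are illustrative:

import tensorflow as tf

def scaled_dot_product_attention(q, k, v):
    # q: (batch, num_queries, d_k); k, v: (batch, num_keys, d_k).
    d_k = tf.cast(tf.shape(k)[-1], tf.float32)
    # Similarity of every query with every key, scaled by sqrt(d_k)
    # to keep the dot products in a range where softmax is well-behaved.
    scores = tf.matmul(q, k, transpose_b=True) / tf.sqrt(d_k)
    # Softmax over the key dimension gives the attention weights.
    weights = tf.nn.softmax(scores, axis=-1)
    # Weighted sum of values: a context vector for each query position.
    return tf.matmul(weights, v), weights

q = tf.random.normal((2, 5, 64))      # e.g., 5 decoder positions
k = v = tf.random.normal((2, 7, 64))  # e.g., 7 encoder outputs
context, attn_weights = scaled_dot_product_attention(q, k, v)
print(context.shape, attn_weights.shape)  # (2, 5, 64) (2, 5, 7)

Here the decoder queries attend over the encoder outputs, which is how the encoder's output serves as the context for the decoder in both the seq2seq-with-Attention model and the transformer.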