Mastering TensorFlow 1.x

Overview of this book

TensorFlow is one of the most popular numerical computation libraries, built from the ground up for distributed, cloud, and mobile environments. TensorFlow represents data as tensors and computation as graphs. This book is a comprehensive guide that lets you explore the advanced features of TensorFlow 1.x. Gain insight into TensorFlow Core, Keras, TF Estimators, TFLearn, TF Slim, Pretty Tensor, and Sonnet. Leverage the power of TensorFlow and Keras to build deep learning models, using concepts such as transfer learning, generative adversarial networks, and deep reinforcement learning. Throughout the book, you will obtain hands-on experience with varied datasets, such as MNIST, CIFAR-10, PTB, text8, and COCO-Images. You will learn the advanced features of TensorFlow 1.x, such as running distributed TensorFlow with TF Clusters, deploying production models with TensorFlow Serving, and building and deploying TensorFlow models for mobile and embedded devices on Android and iOS platforms. You will see how to call the TensorFlow and Keras APIs from within the R statistical software, and learn the techniques required for debugging when TensorFlow API-based code does not work as expected. The book helps you obtain in-depth knowledge of TensorFlow, making you the go-to person for solving artificial intelligence problems. By the end of this guide, you will have mastered the offerings of TensorFlow and Keras, and gained the skills you need to build smarter, faster, and more efficient machine learning and deep learning systems.

Preparing the data for word2vec models

We shall use the popular PTB and text8 datasets for our demonstrations.

The Penn Treebank (PTB) dataset is a by-product of the Penn Treebank project carried out at UPenn (https://catalog.ldc.upenn.edu/ldc99t42). The PTB project team extracted about one million words from three years of Wall Street Journal stories and annotated them in Treebank II style. The PTB dataset comes in two flavors: Basic Examples, which are about 35 MB in size, and Advanced Examples, which are about 235 MB in size. We shall use the simple (Basic Examples) dataset, which consists of 929K words for training, 73K words for validation, and 82K words for testing. You are encouraged to explore the advanced dataset. Further details on the PTB dataset are available at the following link: http://www.fit.vutbr.cz/~imikolov/rnnlm/simple-examples.tgz.
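
Before a word2vec model can be trained on a corpus such as PTB, the words are usually mapped to integer IDs. The following is a minimal sketch of that step in plain Python, not the book's own loader. The helper names `read_words`, `build_vocab`, and `words_to_ids` are hypothetical, as are the `<unk>` and `<eos>` conventions and the choice to reserve ID 0 for unknown words.

```python
import collections


def read_words(path):
    """Read a whitespace-tokenized text file into a list of words.

    Assumption: one sentence per line; each newline becomes an <eos> token.
    """
    with open(path, encoding="utf-8") as f:
        return f.read().replace("\n", " <eos> ").split()


def build_vocab(words, vocab_size=None, unk="<unk>"):
    """Assign integer IDs to words, most frequent first.

    ID 0 is reserved for the unknown-word token (an assumption of this
    sketch). Ties in frequency are broken alphabetically so that the
    mapping is deterministic.
    """
    counts = collections.Counter(w for w in words if w != unk)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    if vocab_size is not None:
        ranked = ranked[:vocab_size - 1]  # leave one slot for unk
    word2id = {unk: 0}
    for word, _ in ranked:
        word2id[word] = len(word2id)
    id2word = {i: w for w, i in word2id.items()}
    return word2id, id2word


def words_to_ids(words, word2id, unk="<unk>"):
    """Convert words to IDs, sending out-of-vocabulary words to unk."""
    return [word2id.get(w, word2id[unk]) for w in words]


# Tiny in-memory sample so the sketch runs without downloading anything.
sample = "the cat sat on the mat the end".split()
word2id, id2word = build_vocab(sample, vocab_size=3)
ids = words_to_ids(sample, word2id)
```

On a real corpus, the in-memory sample would be replaced by a call to `read_words` on the extracted training file. Limiting `vocab_size` keeps the embedding matrix small, at the cost of mapping rare words to the unknown token.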

The PTB dataset can be downloaded from the...