Predictive Analytics with TensorFlow

By: Md. Rezaul Karim

Overview of this book

Predictive analytics discovers hidden patterns from structured and unstructured data for automated decision-making in business intelligence.

This book will help you build, tune, and deploy predictive models with TensorFlow in three main sections. The first section covers linear algebra, statistics, and probability theory for predictive modeling.

The second section covers developing predictive models via supervised (classification and regression) and unsupervised (clustering) algorithms. It then explains how to develop predictive models for NLP and covers reinforcement learning algorithms. Lastly, this section covers developing a factorization machines-based recommendation system.

The third section covers deep learning architectures for advanced predictive analytics, including deep neural networks and recurrent neural networks for high-dimensional and sequence data. Finally, convolutional neural networks are used for predictive modeling for emotion recognition, image classification, and sentiment analysis.

Preface

The continued growth in data, coupled with the need to make increasingly complex decisions against that data, is creating massive hurdles that prevent organizations from deriving insights in a timely manner using traditional approaches. Machine learning is concerned with algorithms that transform raw data into information and then into actionable intelligence, which makes it well suited to predictive analytics. Without machine learning, it would be nearly impossible to keep up with these massive streams of information.

Deep learning, in turn, is a branch of machine learning based on learning multiple levels of representation. A deep learning algorithm is essentially the implementation of a complex, deep neural network that learns by analyzing large amounts of data. It took just a few years to develop powerful deep learning algorithms that recognize images, process natural language, and perform a myriad of other complex tasks.

Considering these motivations and requirements, this book is dedicated to developers, data analysts, machine learning practitioners, and deep learning enthusiasts who want to build powerful, robust, and accurate predictive models with the power of TensorFlow from scratch and in combination with other open source Python libraries.

The first section of this book covers applied math, statistics, and probability theory for predictive analytics. It then covers useful Python packages for getting started with data science in a practical manner. The second section shows how to develop large-scale predictive analytics pipelines using supervised learning algorithms (for example, classification and regression) and unsupervised learning algorithms (for example, clustering). It then demonstrates how to develop predictive models for NLP.

The second section closes by using reinforcement learning and a factorization machine-based recommendation system to develop predictive models. The third section covers practical mastery of deep learning architectures for advanced predictive analytics, including deep neural networks and recurrent neural networks for high-dimensional and sequence data. Finally, it shows how to develop convolutional neural network-based predictive models for emotion recognition, image classification, and sentiment analysis.

Happy Reading!

What this book covers

Chapter 1, Basic Python and Linear Algebra for Predictive Analytics, discusses the basic concepts in linear algebra for predictive analytics, such as vectors, matrices, tensors, linear dependence, and span. Then, we move on to a brief introduction to Principal Component Analysis (PCA) and Singular Value Decomposition (SVD). Finally, some predictive modeling tools in Python will be discussed.

Chapter 2, Statistics, Probability, and Information Theory for Predictive Modeling, covers some statistical, probabilistic, and information theory concepts needed before getting started on predictive analytics: random sampling, hypothesis testing, the chi-square test, correlation, expectation, variance, covariance, Bayes' rule, and so on. It then discusses the central objects of probability theory: random variables, stochastic processes, and events. Information theory, which studies the quantification, storage, and communication of information, is discussed at the end of the chapter.

Chapter 3, From Data to Decisions - Getting Started with TensorFlow, provides a detailed description of the main TensorFlow features in a real-life problem, followed by detailed discussions about TensorFlow installation and configuration. It then covers computation graphs, data, and programming models before getting started with TensorFlow. The last part of the chapter contains an example of implementing a linear regression model for predictive analytics.

Chapter 4, Putting Data in Place - Supervised Learning for Predictive Analytics, covers some TensorFlow-based supervised learning techniques from a theoretical and practical perspective. In particular, the linear regression model for regression analysis will be covered on a real dataset. It then shows how we could solve the Titanic survival problem using logistic regression, random forests, and SVMs for predictive analytics.

Chapter 5, Clustering Your Data - Unsupervised Learning for Predictive Analytics, digs deeper into predictive analytics and shows how to cluster records belonging to a certain group or class in a dataset of unlabeled observations. It then provides some practical examples of unsupervised learning. In particular, clustering techniques using TensorFlow are discussed with hands-on examples.

Chapter 6, Predictive Analytics Pipelines for NLP, shows how to use TensorFlow for text analytics, with a focus on text classification on unstructured spam and movie review datasets. Based on the spam filtering dataset, it shows how to develop predictive models using a linear regression algorithm with TensorFlow. In particular, it uses the bag-of-words (BOW) and TF-IDF algorithms for spam prediction. Later on, it also shows how to develop large-scale predictive models for predicting sentiment from the movie review dataset using the continuous bag-of-words (CBOW) and continuous skip-gram algorithms.

Chapter 7, Using Deep Neural Networks for Predictive Analytics, demonstrates how to train DNNs and analyze the performance metrics needed to evaluate a DNN predictive model. It also shows how to tune the hyperparameters of DNNs for better performance. It then provides two examples of how to build robust and accurate predictive models, using Deep Belief Networks (DBN) and Multilayer Perceptron (MLP) on a bank marketing dataset.

Chapter 8, Using Convolutional Neural Networks for Predictive Analytics, discusses how to develop predictive analytics applications such as emotion recognition, image classification, and text classification using the convolutional neural network algorithm on real image/text datasets. Finally, it will provide some pointers on how to tune and debug CNN-based networks for optimized performance.

Chapter 9, Using Recurrent Neural Networks for Predictive Analytics, provides some theoretical background for RNNs. Then, it shows a few examples of implementing predictive models for image classification, sentiment analysis of movies and products, and spam prediction for NLP. Finally, it shows how to develop predictive models for time-series data.

Chapter 10, Recommendation System for Predictive Analytics, provides several examples of how to develop recommendation systems for predictive analytics, followed by some theoretical background on recommendation systems, for example, matrix factorization. Later in the chapter, an example of developing a movie recommendation engine using SVD and K-means is shown. Finally, the chapter shows how we could use factorization machines to develop a more accurate and robust recommendation system.

Chapter 11, Using Reinforcement Learning for Predictive Analytics, talks about designing machine learning systems driven by criticism and rewards. It will show several examples of how to apply reinforcement learning algorithms for developing predictive models on real-life datasets.

What you need for this book

All the examples have been implemented in Python 2 and 3 with TensorFlow 1.2.0+. You will also need some additional software and tools. To be more specific, the following tools and libraries are required, preferably the latest version:

  • Python (2.7.x or 3.3+)

  • TensorFlow (1.0.0+)

  • Bazel (latest version)

  • pip/pip3 (latest version for Python 2 and 3 respectively)

  • matplotlib (latest version)

  • pandas (latest version)

  • NumPy (latest version)

  • SciPy (latest version)

  • sklearn (latest version)

  • yahoo_finance (latest version)

  • CUDA (latest version)

  • cuDNN (latest version)
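
Before running the examples, it can help to confirm that the Python side of this list is in place. The following is only a minimal sketch, not part of the book's code bundle: the module names in REQUIRED_MODULES are the assumed top-level import names for the libraries listed above, and the helper names python_version_ok and find_missing are hypothetical. The script itself relies on importlib.util.find_spec, so it needs Python 3.4 or later to run.

```python
import importlib.util
import sys

# Assumed top-level import names for the Python libraries listed above.
# Bazel, pip, CUDA, and cuDNN are not Python modules and are not checked here.
REQUIRED_MODULES = [
    "tensorflow", "matplotlib", "pandas", "numpy", "scipy",
    "sklearn", "yahoo_finance",
]


def python_version_ok(version_info=sys.version_info):
    """Return True for Python 2.7.x or 3.3+, the range stated in the list."""
    major, minor = version_info[0], version_info[1]
    if major == 2:
        return minor == 7
    return (major, minor) >= (3, 3)


def find_missing(modules=REQUIRED_MODULES):
    """Return the module names that cannot be located in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Python version supported:", python_version_ok())
    print("Missing modules:", find_missing() or "none")
```

Any name reported as missing can usually be installed with pip or pip3. The check only locates each module and does not import it, so it says nothing about whether the installed versions or the GPU setup are adequate.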

Linux distributions are preferable (including Debian, Ubuntu, Fedora, RHEL, and CentOS). For Ubuntu specifically, a complete installation of 14.04 (LTS) 64-bit or later is recommended, either natively or under VMware Player 12 or VirtualBox. You can also run TensorFlow jobs on Windows (XP/7/8/10) or Mac OS X (10.4.7+).

A Core i5 or Core i7 processor with GPU support is recommended for the best results, and multicore processing will provide faster data processing and better scalability for predictive analytics jobs. You should have at least 8 GB of RAM (recommended) for standalone mode, at least 32 GB for a single VM, and more for a cluster. You also need enough storage to run heavy jobs (depending on the size of the datasets you will be handling), preferably at least 50 GB of free disk space.

Who this book is for

This book is dedicated to developers, data analysts, and deep learning enthusiasts who want to build powerful, robust, and accurate predictive models with the power of TensorFlow from scratch and in combination with other open source Python libraries. If you want to build your own extensive applications that work and can make smart predictions, then this book is what you need! A good command of object-oriented programming with Python is a prerequisite. Some competence in applied mathematics, statistics, linear algebra, and information theory is a plus and will help you understand the concepts presented in this book.

Conventions

In this book, you will find a number of styles of text that distinguish between different kinds of information. Here are some examples of these styles, and an explanation of their meaning.

A block of code for importing the necessary packages and library modules is set as follows:

# Import libraries (NumPy, TensorFlow, matplotlib)
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plot

When we create a TensorFlow session and do some computation, we use the following code segment:

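# Hypothetical log directory; the original snippet does not define logs_path
logs_path = "/tmp/tensorflow_logs"

with tf.Session() as sess:
    # Initialize all variables, then write the default graph for TensorBoard
    sess.run(tf.global_variables_initializer())
    writer = tf.summary.FileWriter(logs_path, graph=tf.get_default_graph())
    print("done")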

Any command-line input or output is written as follows:

# cp /usr/src/asterisk-addons/configs/cdr_mysql.conf.sample
     /etc/asterisk/cdr_mysql.conf

New terms and important words are shown in bold. Words that you see on the screen, in menus or dialog boxes, for example, appear in the text like this: "Clicking the Next button moves you to the next screen."

Note

Warnings or important notes appear in a box like this.

Tip

Tips and tricks appear like this.

Reader feedback

Feedback from our readers is always welcome. Let us know what you think about this book—what you liked or may have disliked. Reader feedback is important for us to develop titles that you really get the most out of.

To send us general feedback, simply send an e-mail to , and mention the book title in the subject of your message.

If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide on www.packtpub.com/authors.

Customer support

Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.

Downloading the example code

You can download the example code files for all Packt books you have purchased from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.

You can download the code files by following these steps:

  1. Log in or register on our website using your e-mail address and password.

  2. Hover the mouse pointer on the SUPPORT tab at the top.

  3. Click on Code Downloads & Errata.

  4. Enter the name of the book in the Search box.

  5. Select the book for which you're looking to download the code files.

  6. Choose where you purchased this book from the drop-down menu.

  7. Click on Code Download.

You can also download the code files by clicking on the Code Files button on the book's webpage at the Packt Publishing website. This page can be accessed by entering the book's name in the Search box. Please note that you need to be logged in to your Packt account.

Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:

  • WinRAR / 7-Zip for Windows

  • Zipeg / iZip / UnRarX for Mac

  • 7-Zip / PeaZip for Linux
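
If you would rather not install a separate tool and the download is a .zip archive, Python's standard zipfile module can extract it as well. This is only a sketch under stated assumptions: the archive name below is hypothetical (use the name of the file you actually downloaded), and extract_bundle is an illustrative helper, not something shipped with the book.

```python
import os
import zipfile


def extract_bundle(zip_path, dest_dir):
    """Extract a downloaded code bundle and return the archived member names."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
        return archive.namelist()


if __name__ == "__main__":
    # Hypothetical file name; replace it with the archive you downloaded.
    bundle = "PredictiveAnalyticswithTensorFlow_Code.zip"
    if os.path.exists(bundle):
        for name in extract_bundle(bundle, "code"):
            print(name)
    else:
        print("Archive not found:", bundle)
```

This handles only .zip archives; a .rar download still needs one of the tools listed above.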

The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Predictive-Analytics-with-TensorFlow. We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Downloading the color images of this book

We also provide you with a PDF file that has color images of the screenshots/diagrams used in this book. The color images will help you better understand the changes in the output. You can download this file from https://www.packtpub.com/sites/default/files/downloads/PredictiveAnalyticswithTensorFlow_ColorImages.pdf

Errata

Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books—maybe a mistake in the text or the code—we would be grateful if you would report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded on our website, or added to any list of existing errata, under the Errata section of that title.

To view the previously submitted errata, go to https://www.packtpub.com/books/content/support and enter the name of the book in the search field. The required information will appear under the Errata section.

Piracy

Piracy of copyright material on the internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works, in any form, on the internet, please provide us with the location address or website name immediately so that we can pursue a remedy.

Please contact us at with a link to the suspected pirated material.

We appreciate your help in protecting our authors, and our ability to bring you valuable content.

Questions

If you have a problem with any aspect of this book, you can contact us at , and we will do our best to address the problem.