Mastering Predictive Analytics with scikit-learn and TensorFlow

By: Alvaro Fuentes

Overview of this book

Python is a programming language that provides a wide range of features that can be used in the field of data science. Mastering Predictive Analytics with scikit-learn and TensorFlow covers various implementations of ensemble methods, how they are used with real-world datasets, and how they improve prediction accuracy in classification and regression problems. This book starts with ensemble methods and their features. You will see that scikit-learn provides tools for choosing hyperparameters for models. As you make your way through the book, you will cover the nitty-gritty of predictive analytics and explore its features and characteristics. You will also be introduced to artificial neural networks and TensorFlow, and how it is used to create neural networks. In the final chapter, you will explore factors such as computational power, along with improvement methods and software enhancements for efficient predictive analytics. By the end of this book, you will be well-versed in using deep neural networks to solve common problems in big data analysis.

Dimensionality reduction and PCA

Dimensionality reduction is the process of reducing the number of features under consideration by obtaining a smaller set of principal variables. Principal Component Analysis (PCA) is one of the most widely used techniques for dimensionality reduction. Here, we will discuss why dimensionality reduction is needed, and we will also see how to perform PCA in scikit-learn.
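As a rough illustration of what performing PCA in scikit-learn can look like, here is a minimal sketch. The Iris dataset, the standardization step, and the choice of two components are assumptions made for this example only; they are not taken from the book.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Load a small built-in dataset with 4 numeric features
# (an illustrative choice, not the dataset used in the book).
X, y = load_iris(return_X_y=True)

# Standardize the features so that no single feature dominates
# the variance simply because of its scale.
X_scaled = StandardScaler().fit_transform(X)

# Project the data onto the 2 directions of largest variance
# (n_components=2 is an assumption made for this sketch).
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X_scaled)

print(X.shape, "->", X_reduced.shape)
print("Explained variance ratio:", pca.explained_variance_ratio_)
```

The `explained_variance_ratio_` attribute reports the fraction of the total variance captured by each retained component, which is one common way to judge how many components are worth keeping.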

These are the reasons for reducing the number of features when working on predictive analytics:

  • Fewer features make models simpler, and therefore easier to understand and interpret. There are also computational considerations: if you are dealing with thousands of features, reducing their number can save computational resources.
  • Another reason is to avoid the "curse of dimensionality...