Machine Learning Quick Reference

By: Rahul Kumar

Overview of this book

Machine learning makes it possible to learn about the unknowns and gain hidden insights into your datasets by mastering many tools and techniques. This book guides you to do just that in a very compact manner. After giving a quick overview of what machine learning is all about, Machine Learning Quick Reference jumps right into its core algorithms and demonstrates how they can be applied to real-world scenarios. From model evaluation to optimizing performance, this book will introduce you to the best practices in machine learning. Furthermore, you will also look at more advanced aspects, such as training neural networks and working with different kinds of data, including text, time series, and sequential data. Advanced methods and techniques, such as causal inference, deep Gaussian processes, and more, are also covered. By the end of this book, you will have fast, accurate machine learning models at your fingertips, which you can easily use as a point of reference.
Table of Contents (18 chapters)
Title Page
Copyright and Credits
About Packt
Contributors
Preface
Index

Preface

Machine learning involves developing and training models to predict future outcomes. This book is a practical guide to all the tips and tricks related to machine learning. It includes hands-on, easy-to-access techniques on topics such as model selection, performance tuning, training neural networks, time series analysis, and a lot more.

This book has been tailored toward readers who want to understand not only the concepts behind machine learning algorithms but also the mathematics underpinning them; we have tried to strike a balance between the two.

Who this book is for

If you're a machine learning practitioner, data scientist, machine learning developer, or engineer, this book will serve as a reference point for building machine learning solutions. You will also find this book useful if you're an intermediate machine learning developer or data scientist looking for a quick, handy reference to all the concepts of machine learning. You'll need some exposure to machine learning to get the best out of this book.

What this book covers

Chapter 1, Quantification of Learning, builds the foundation for later chapters. First, we will understand what a statistical model means. We'll also discuss Leo Breiman's views on statistical modeling. Later, we will discuss curves and why they are so important. Curve fitting, one of the typical ways of finding associations between variables, is introduced in this chapter.

To build a model, one of the steps is to partition the data. We will discuss the reasoning behind this and examine an approach to carrying it out. Building a model is more often than not a bumpy ride in which we run into several issues: we frequently encounter overfitting and underfitting, for a variety of reasons. We need to understand why they occur and learn how to overcome them. We will also discuss how overfitting and underfitting are connected to bias and variance, and this chapter frames these concepts with respect to neural networks. Regularization, whose strength is a hyperparameter that forms an integral part of the model-building process, is covered too, and we will understand why it is required. Cross-validation, model selection, and the 0.632+ bootstrap are discussed in this chapter, as they help data scientists fine-tune a model.
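
The partitioning and cross-validation steps mentioned above can be sketched as follows. This is a minimal illustration assuming scikit-learn is available; the dataset, model, and split ratio are arbitrary choices for demonstration, not the book's own example.

```python
# Sketch: hold-out partitioning plus k-fold cross-validation with scikit-learn.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.linear_model import LogisticRegression

# Synthetic two-class data, purely illustrative
X, y = make_classification(n_samples=200, n_features=5, random_state=1)

# Hold out 30% of the data for final evaluation
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1)

# 5-fold cross-validation on the training portion to estimate performance
model = LogisticRegression(max_iter=1000)
scores = cross_val_score(model, X_train, y_train, cv=5)
print(scores.mean())
```

The cross-validated score gives a less optimistic estimate of generalization than training accuracy, which is exactly the overfitting concern this chapter addresses.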

Chapter 2, Evaluating Kernel Learning, explains how support vector machines (SVMs) have been among the most sophisticated models and have grabbed a lot of attention in the areas of classification and regression. Practitioners still find them difficult to grasp, as they involve a lot of mathematics; however, we have tried to keep the explanation simple without discarding the mathematics, so that you should be able to understand the tricks behind SVMs. We'll also look at the kernel trick, which took SVMs to another level by simplifying the computation to an extent. We will study the different types of kernels and their usage.
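
As a small taste of what the kernel trick buys you, the sketch below compares a linear and an RBF kernel on data that is not linearly separable. It assumes scikit-learn; the concentric-circles dataset is synthetic and chosen only to make the contrast visible.

```python
# Sketch: linear vs RBF kernel on a non-linearly-separable problem.
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric rings of points: no straight line can separate the classes
X, y = make_circles(n_samples=200, noise=0.1, factor=0.3, random_state=1)

scores = {}
for kernel in ("linear", "rbf"):
    clf = SVC(kernel=kernel).fit(X, y)
    scores[kernel] = clf.score(X, y)
    print(kernel, scores[kernel])
```

The RBF kernel implicitly maps the points into a space where the rings become separable, so it fits this data far better than the linear kernel, without ever computing that mapping explicitly.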

Chapter 3, Performance in Ensemble Learning, explains how to build models based on the concepts of bagging and boosting, which are ruling the world of hackathons. We will discuss bagging and boosting in detail, as they have led to the creation of many strong algorithms, such as random forest and gradient boosting. We will discuss each with the help of a use case so that you can understand the difference between the two. An important part of this chapter also deals with the optimization of hyperparameters.
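
To preview the two families, here is a hedged sketch using scikit-learn's off-the-shelf implementations; the dataset and hyperparameters are illustrative placeholders, not a recommended configuration.

```python
# Sketch: a bagging-based and a boosting-based ensemble side by side.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=300, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)

# Bagging: a random forest averages many trees, each trained on a bootstrap sample
rf = RandomForestClassifier(n_estimators=100, random_state=1).fit(X_tr, y_tr)

# Boosting: gradient boosting adds trees sequentially, each correcting earlier errors
gb = GradientBoostingClassifier(random_state=1).fit(X_tr, y_tr)

print(rf.score(X_te, y_te), gb.score(X_te, y_te))
```

Which of the two wins depends on the data and, crucially, on the hyperparameters, which is why their optimization gets its own treatment in the chapter.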

Chapter 4, Training Neural Networks, covers neural networks (NNs), which have often been deemed black-box algorithms that take lots of effort to understand. We have tried to unpack the complexities surrounding NNs. We start by detailing how NNs are analogous to the human brain. This chapter also covers what parameters such as weights and biases are and how an NN learns. An NN's learning process involves network initialization, a feedforward pass, and cost calculation. Once the cost is calculated, backpropagation kicks in.

Next come the challenges in the model, such as exploding gradients, vanishing gradients, and overfitting. This chapter encompasses all such problems, helps us understand why they occur, and explains how to overcome them.

Chapter 5, Time-Series Analysis, covers different time series models for demand forecasting, be it stock price forecasting, sales forecasting, or anything else; almost every industry runs into such use cases. There are multiple approaches to carrying them out, and we cover autoregressive models, ARMA, ARIMA, and others. We start with the concepts of autoregression. Then comes stationarity, an important element of such models; this chapter examines what stationarity is and how we can detect it. Assessment of the models is covered too, and anomaly detection in econometrics is discussed at length with the help of a use case.
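
The core idea of autoregression, regressing a series on its own past, can be shown in a few lines of NumPy. This sketch fits an AR(1) coefficient by ordinary least squares on simulated data; it is a simplification for illustration, not the full ARIMA machinery the chapter uses, and the coefficient 0.7 is an arbitrary choice.

```python
# Sketch: estimating an AR(1) coefficient by least squares on simulated data.
import numpy as np

rng = np.random.default_rng(1)
phi_true = 0.7  # assumed autoregressive coefficient for the simulation
x = np.zeros(500)
for t in range(1, 500):
    x[t] = phi_true * x[t - 1] + rng.normal()

# Regress x[t] on x[t-1]; the slope estimates the AR coefficient
phi_hat = np.linalg.lstsq(x[:-1, None], x[1:], rcond=None)[0][0]
print(phi_hat)
```

For a long enough stationary series, the estimate lands close to the true coefficient; when the series is non-stationary, this is exactly where such a fit breaks down, which is why stationarity checks matter.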

Chapter 6, Natural Language Processing, explains how natural language processing makes textual data talk. A number of algorithms make this work. We cannot work with textual data as it is; it needs to be vectorized and embedded. This chapter covers various ways of doing this, such as TF-IDF and bag-of-words approaches.

We will also talk about how sentiment analysis can be done with the help of such approaches, and compare the results of different methods. We then move on to topic modeling, where the primary motive is to extract the main topics from a corpus. Later, we will examine a use case and solve it with a Bayesian approach.
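
The two vectorization schemes mentioned above can be contrasted in a few lines, assuming scikit-learn; the three-sentence corpus is a toy example, not the chapter's use case.

```python
# Sketch: bag-of-words counts vs TF-IDF weights on a toy corpus.
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

corpus = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "dogs and cats are pets",
]

# Bag-of-words: each document becomes a vector of raw term counts
bow = CountVectorizer().fit_transform(corpus)

# TF-IDF: the same counts, reweighted to down-weight terms common to many documents
tfidf = TfidfVectorizer().fit_transform(corpus)

print(bow.shape, tfidf.shape)
```

Both produce a document-by-vocabulary matrix of the same shape; the difference is only in the weighting, which is what makes TF-IDF more discriminative for tasks such as sentiment analysis.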

Chapter 7, Temporal and Sequential Pattern Discovery, focuses on why it is necessary to study frequent itemsets and how we can deal with them. We cover the use of the Apriori and Frequent Pattern Growth algorithms to uncover findings in transactional data.

Chapter 8, Probabilistic Graphical Models, covers Bayesian networks and how they are making a difference in machine learning. We will look at Bayesian networks (trees) built on conditional probability tables.

Chapter 9, Selected Topics in Deep Learning, explains that as the world transitions from simple business analytics to deep learning, we have lots to catch up on. This chapter explores weight initialization, layer formation, cost calculation, and backpropagation. Subsequently, we will introduce Hinton's capsule networks and look at how they work.

Chapter 10, Causal Inference, discusses algorithms that provide a directional view of causality in time series. Our stakeholders often ask about the causality behind the target variable, so we address this using the Granger causality model for time series, and we also discuss Bayesian techniques that enable us to establish causality.

Chapter 11, Advanced Methods, explains that there are a number of state-of-the-art models in the pipeline that deserve a special mention in this book, and this chapter should help you understand and apply them. We talk about independent component analysis and how it differs from principal component analysis. Subsequently, we discuss the Bayesian technique of multiple imputation and its importance. We will also get an understanding of self-organizing maps and why they are important. Lastly, we touch upon compressed sensing.
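
The PCA-versus-ICA distinction can be previewed with scikit-learn on a classic synthetic setup: two independent source signals observed only as mixtures. The sources, mixing matrix, and component counts below are arbitrary choices for illustration.

```python
# Sketch: PCA (decorrelation) vs ICA (independence) on mixed signals.
import numpy as np
from sklearn.decomposition import PCA, FastICA

t = np.linspace(0, 8, 1000)
s1 = np.sin(2 * t)            # sinusoidal source
s2 = np.sign(np.sin(3 * t))   # square-wave source
S = np.c_[s1, s2]

# Observe only linear mixtures of the two sources
X = S @ np.array([[1.0, 0.5], [0.5, 1.0]])

pca_est = PCA(n_components=2).fit_transform(X)                   # uncorrelated components
ica_est = FastICA(n_components=2, random_state=1).fit_transform(X)  # independent components

print(pca_est.shape, ica_est.shape)
```

PCA finds directions of maximal variance, which here remain mixtures of both sources; ICA instead recovers the original independent signals (up to sign and scale), which is what makes it suited to source-separation problems.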

To get the most out of this book

This book requires a basic knowledge of Python, R, and machine learning.

Download the example code files

You can download the example code files for this book from your account at www.packt.com. If you purchased this book elsewhere, you can visit www.packt.com/support and register to have the files emailed directly to you.

You can download the code files by following these steps:

  1. Log in or register at www.packt.com.
  2. Select the SUPPORT tab.
  3. Click on Code Downloads & Errata.
  4. Enter the name of the book in the Search box and follow the onscreen instructions.

Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:

  • WinRAR/7-Zip for Windows
  • Zipeg/iZip/UnRarX for Mac
  • 7-Zip/PeaZip for Linux

The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Machine-Learning-Quick-Reference. In case there's an update to the code, it will be updated on the existing GitHub repository.

We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Download the color images

We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: http://www.packtpub.com/sites/default/files/downloads/9781788830577_ColorImages.pdf.

Conventions used

There are a number of text conventions used throughout this book.

CodeInText: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: "Now we will extract a bootstrap sample with the help of the resample function:"

A block of code is set as follows:

from sklearn.utils import resample

# use the "resample" function to generate a bootstrap sample
boot_samp = resample(dataset, replace=True, n_samples=5, random_state=1)

Bold: Indicates a new term, an important word, or words that you see onscreen. For example, words in menus or dialog boxes appear in the text like this. Here is an example: "Select System info from the Administration panel."

Note

Warnings or important notes appear like this.

Note

Tips and tricks appear like this.

Get in touch

Feedback from our readers is always welcome.

General feedback: If you have questions about any aspect of this book, mention the book title in the subject of your message and email us at [email protected].

Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packt.com/submit-errata, select your book, click on the Errata Submission Form link, and enter the details.

Piracy: If you come across any illegal copies of our works in any form on the Internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.

If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.

Reviews

Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!

For more information about Packt, please visit packt.com.