Book Image

Practical Machine Learning

By : Sunila Gollapudi
Book Image

Practical Machine Learning

By: Sunila Gollapudi

Overview of this book

This book explores an extensive range of machine learning techniques uncovering hidden tricks and tips for several types of data using practical and real-world examples. While machine learning can be highly theoretical, this book offers a refreshing hands-on approach without losing sight of the underlying principles. Inside, a full exploration of the various algorithms gives you high-quality guidance so you can begin to see just how effective machine learning is at tackling contemporary challenges of big data This is the only book you need to implement a whole suite of open source tools, frameworks, and languages in machine learning. We will cover the leading data science languages, Python and R, and the underrated but powerful Julia, as well as a range of other big data platforms including Spark, Hadoop, and Mahout. Practical Machine Learning is an essential resource for the modern data scientists who want to get to grips with its real-world application. With this book, you will not only learn the fundamentals of machine learning but dive deep into the complexities of real world data before moving on to using Hadoop and its wider ecosystem of tools to process and manage your structured and unstructured data. You will explore different machine learning techniques for both supervised and unsupervised learning; from decision trees to Naïve Bayes classifiers and linear and clustering methods, you will learn strategies for a truly advanced approach to the statistical analysis of data. The book also explores the cutting-edge advancements in machine learning, with worked examples and guidance on deep learning and reinforcement learning, providing you with practical demonstrations and samples that help take the theory–and mystery–out of even the most advanced machine learning methodologies.
Table of Contents (23 chapters)
Practical Machine Learning
Credits
Foreword
About the Author
Acknowledgments
About the Reviewers
www.PacktPub.com
Preface
Index

Preface

Finding something meaningful in increasingly larger and more complex datasets is a growing demand of the modern world. Machine learning and predictive analytics have become the most important approaches to uncover data gold mines. Machine learning uses complex algorithms to make improved predictions of outcomes based on historical patterns and the behavior of datasets. Machine learning can deliver dynamic insights into trends, patterns, and relationships within data, which is immensely valuable to the growth and development of business.

With this book, you will not only learn the fundamentals of Machine learning, but you will also dive deep into the complexities of the real-world data before moving onto using Hadoop and its wider ecosystem of tools to process and manage your structured and unstructured data.

What this book covers

Chapter 1, Introduction to Machine learning, will cover the basics of Machine learning and the landscape of Machine learning semantics. It will also define Machine learning in simple terms and introduce Machine learning jargon or commonly used terms. This chapter will form the base for the rest of the chapters.

Chapter 2, Machine learning and Large-scale datasets, will explore qualifiers of large datasets, common characteristics, problems of repetition, the reasons for the hyper-growth in the volumes, and approaches to handle the big data.

Chapter 3, An Introduction to Hadoop's Architecture and Ecosystem, will cover all about Hadoop, starting from its core frameworks to its ecosystem components. At the end of this chapter, readers will be able to set up Hadoop and run some MapReduce functions; they will be able to use one or more ecosystem components. They will also be able to run and manage Hadoop environment and understand the command-line usage.

Chapter 4, Machine Learning Tools, Libraries, and Frameworks, will explain open source options to implement Machine learning and cover installation, implementation, and execution of libraries, tools, and frameworks, such as Apache Mahout, Python, R, Julia, and Apache Spark's MLlib. Very importantly, we will cover the integration of these frameworks with the big data platform—Apache Hadoop

Chapter 5, Decision Tree based learning, will explore a supervised learning technique with Decision trees to solve classification and regression problems. We will cover methods to select attributes and split and prune the tree. Among all the other Decision tree algorithms, we will explore the CART, C4.5, Random forests, and advanced decision tree techniques.

Chapter 6, Instance and Kernel methods based learning, will explore two learning algorithms: instance-based and kernel methods; and we will discover how they address the classification and prediction requirements. In instance-based learning methods, we will explore the Nearest Neighbor algorithm in detail. Similarly in kernel-based methods, we will explore Support Vector Machines using real-world examples.

Chapter 7, Association Rules based learning, will explore association rule based learning methods and algorithms: Apriori and FP-growth. With a common example, you will learn how to do frequent pattern mining using the Apriori and FP-growth algorithms with a step-by-step debugging of the algorithm.

Chapter 8, Clustering based learning, will cover clustering based learning methods in the context of unsupervised learning. We will take a deep dive into k-means clustering algorithm using an example and learn to implement it using Mahout, R, Python, Julia, and Spark.

Chapter 9, Bayesian learning, will explore Bayesian Machine learning. Additionally, we will cover all the core concepts of statistics starting from basic nomenclature to various distributions. We will cover Bayes theorem in depth with examples to understand how to apply it to the real-world problems.

Chapter 10, Regression based learning, will cover regression analysis-based Machine learning and in specific, how to implement linear and logistic regression models using Mahout, R, Python, Julia, and Spark. Additionally, we will cover other related concepts of statistics such as variance, covariance, ANOVA, among others. We will also cover regression models in depth with examples to understand how to apply it to the real-world problems.

Chapter 11, Deep learning, will cover the model for a biological neuron and will explain how an artificial neuron is related to its function. You will learn the core concepts of neural networks and understand how fully-connected layers work. We will also explore some key activation functions that are used in conjunction with matrix multiplication.

Chapter 12, Reinforcement learning, will explore a new learning technique called reinforcement learning. We will see how this is different from the traditional supervised and unsupervised learning techniques. We will also explore the elements of MDP and learn about it using an example.

Chapter 13,Ensemble learning, will cover the ensemble learning methods of Machine learning. In specific, we will look at some supervised ensemble learning techniques with some real-world examples. Finally, this chapter will have source-code examples for gradient boosting algorithm using R, Python (scikit-learn), Julia, and Spark machine learning tools and recommendation engines using Mahout libraries.

Chapter 14, New generation data architectures for Machine learning, will be on the implementation aspects of Machine learning. We will understand what the traditional analytics platforms are and how they cannot fit in modern data requirements. You will also learn about the architecture drivers that promote new data architecture paradigms, such as Lambda architectures polyglot persistence (Multi-model database architecture); you will learn how Semantic architectures help in a seamless data integration.

What you need for this book

You'll need the following softwares for this book:

  • R (2.15.1)

  • Apache Mahout (0.9)

  • Python(sckit-learn)

  • Julia(0.3.4)

  • Apache Spark (with Scala 2.10.4)

Who this book is for

This book has been created for data scientists who want to see Machine learning in action and explore its real-world application. With guidance on everything from the fundamentals of Machine learning and predictive analytics to the latest innovations set to lead the big data revolution into the future, this is an unmissable resource for anyone dedicated to tackling current big data challenges. Knowledge of programming (Python and R) and mathematics is advisable, if you want to get started immediately.

Conventions

In this book, you will find a number of styles of text that distinguish between different kinds of information. Here are some examples of these styles, and an explanation of their meaning.

Code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles are shown as follows: "The Map() function works on the distributed data and runs the required functionality in parallel."

A block of code is set as follows:

public static class VowelMapper extends Mapper<Object, Text, Text, IntWritable>
{
private final static IntWritable one = new IntWritable(1);
private Text word = new Text();
public void map(Object key, Text value, Context context) throws IOException, InterruptedException
{
StringTokenizer itr = new StringTokenizer(value.toString());
while (itr.hasMoreTokens())
{
word.set(itr.nextToken());
context.write(word, one);
}
}

Any command-line input or output is written as follows:

$ hadoop-daemon.sh start namenode

Note

Warnings or important notes appear in a box like this.

Tip

Tips and tricks appear like this.

Reader feedback

Feedback from our readers is always welcome. Let us know what you think about this book—what you liked or may have disliked. Reader feedback is important for us to develop titles that you really get the most out of.

To send us general feedback, simply send an e-mail to , and mention the book title via the subject of your message.

If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide on www.packtpub.com/authors.

Customer support

Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.

Downloading the example code

You can download the example code files for all Packt books you have purchased from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.

The author will be updating the code on https://github.com/PacktCode/Practical-Machine-Learning for you to download as and when there are version updates.

Downloading the color images of this book

We also provide you with a PDF file that has color images of the screenshots/diagrams used in this book. The color images will help you better understand the changes in the output. You can download this file from http://www.packtpub.com/sites/default/files/downloads/Practical_Machine_Learning_ColorImages.pdf.

Errata

Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books—maybe a mistake in the text or the code—we would be grateful if you could report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded to our website or added to any list of existing errata under the Errata section of that title.

To view the previously submitted errata, go to https://www.packtpub.com/books/content/support and enter the name of the book in the search field. The required information will appear under the Errata section.

Piracy

Piracy of copyright material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works, in any form, on the Internet, please provide us with the location address or website name immediately so that we can pursue a remedy.

Please contact us at with a link to the suspected pirated material.

We appreciate your help in protecting our authors, and our ability to bring you valuable content.

Questions

You can contact us at if you are having a problem with any aspect of the book, and we will do our best to address it.