Book Image

Machine Learning with R Cookbook, Second Edition - Second Edition

By : Yu-Wei, Chiu (David Chiu)
Book Image

Machine Learning with R Cookbook, Second Edition - Second Edition

By: Yu-Wei, Chiu (David Chiu)

Overview of this book

Big data has become a popular buzzword across many industries. An increasing number of people have been exposed to the term and are looking at how to leverage big data in their own businesses, to improve sales and profitability. However, collecting, aggregating, and visualizing data is just one part of the equation. Being able to extract useful information from data is another task, and a much more challenging one. Machine Learning with R Cookbook, Second Edition uses a practical approach to teach you how to perform machine learning with R. Each chapter is divided into several simple recipes. Through the step-by-step instructions provided in each recipe, you will be able to construct a predictive model by using a variety of machine learning packages. In this book, you will first learn to set up the R environment and use simple R commands to explore data. The next topic covers how to perform statistical analysis with machine learning analysis and assess created models, covered in detail later on in the book. You'll also learn how to integrate R and Hadoop to create a big data analysis platform. The detailed illustrations provide all the information required to start applying machine learning to individual projects. With Machine Learning with R Cookbook, machine learning has never been easier.
Table of Contents (21 chapters)
Title Page
Credits
About the Authors
About the Reviewers
www.PacktPub.com
Customer Feedback
Preface

Introduction


The aim of machine learning is to uncover hidden patterns and unknown correlations, and to find useful information from data. In addition to this, through incorporation with data analysis, machine learning can be used to perform predictive analysis. With machine learning, the analysis of business operations and processes is not limited to human scale thinking; machine scale analysis enables businesses to capture hidden values in big data.

Machine learning has similarities to the human reasoning process. Unlike traditional analysis, the generated model cannot evolve as data is accumulated. Machine learning can learn from the data that is processed and analyzed. In other words, the more data that is processed, the more it can learn.

R, as a dialect of GNU-S, is a powerful statistical language that can be used to manipulate and analyze data. Additionally, R provides many machine learning packages and visualization functions, which enable users to analyze data on the fly. Most importantly, R is open source and free.

Using R greatly simplifies machine learning. All you need to know is how each algorithm can solve your problem and then you can simply use a written package to quickly generate prediction models on data with a few command lines. For example, you can perform Naïve Bayes for spam mail filtering, conduct k-means clustering for customer segmentation, use linear regression to forecast house prices, or implement a hidden Markov model to predict the stock market, as shown in the following screenshot:

Stock market prediction using R

Moreover, you can perform nonlinear dimension reduction to calculate the dissimilarity of image data and visualize the clustered graph, as shown in the following screenshot. All you need to do is follow the recipes provided in this book:

A clustered graph of face image data

This chapter serves as an overall introduction to machine learning and R; the first few recipes introduce how to set up the R environment and the integrated development environment, RStudio. After setting up the environment, the following recipe introduces package installation and loading. In order to understand how data analysis is practiced using R, the next four recipes cover data read/write, data manipulation, basic statistics, and data visualization using R. The last recipe in the chapter lists useful data sources and resources.