Machine Learning with R

By: Brett Lantz

Overview of this book

Machine learning, at its core, is concerned with transforming data into actionable knowledge. This makes machine learning well suited to the present-day era of "big data" and "data science". Given the growing prominence of R (a cross-platform, zero-cost statistical programming environment), there has never been a better time to start applying machine learning. Whether you are new to data science or a veteran, machine learning with R offers a powerful set of methods for quickly and easily gaining insight from your data.

"Machine Learning with R" is a practical tutorial that uses hands-on examples to step through real-world applications of machine learning. Without shying away from the technical details, we will explore machine learning with R using clear and practical examples. The book is well suited to machine learning beginners as well as those with experience. Explore R to find the answers to your questions.

How can we use machine learning to transform data into action? Using practical examples, we will explore how to prepare data for analysis, choose a machine learning method, and measure the success of the process. We will learn how to apply machine learning methods to a variety of common tasks including classification, prediction, forecasting, market basket analysis, and clustering. By applying the most effective machine learning methods to real-world problems, you will gain hands-on experience that will transform the way you think about data. "Machine Learning with R" will provide you with the analytical tools you need to quickly gain insight from complex data.

Uses and abuses of machine learning

At its core, machine learning is concerned with making sense of complex data. This is a broadly applicable mission, largely independent of any particular application, so it is no surprise that machine learning is used widely. For instance, it has been used to:

Predict the outcomes of elections
Identify and filter spam messages from e-mail
Foresee criminal activity
Automate traffic signals according to road conditions
Produce financial estimates of storms and natural disasters
Examine customer churn
Create auto-piloting planes and auto-driving cars
Identify individuals with the capacity to donate
Target advertising to specific types of consumers

For now, don't worry about exactly how the machines learn to perform these tasks; we will get into the specifics later. In each of these contexts, however, the process is the same: a machine learning algorithm takes data and identifies patterns that can be used for action. In some cases, the results are so successful that they seem to reach near-legendary status.

One possibly apocryphal tale is of a large retailer in the United States, which employed machine learning to identify expectant mothers for targeted coupon mailings. If mothers-to-be were targeted with substantial discounts, the retailer hoped they would become loyal customers who would then continue to purchase profitable items like diapers, formula, and toys.

By applying machine learning methods to purchase data, the retailer believed it had learned some useful patterns. Certain items, such as prenatal vitamins, lotions, and washcloths, could be used to identify with a high degree of certainty not only whether a woman was pregnant, but also when the baby was due.

After the retailer used this data for a promotional mailing, an angry man contacted the company and demanded to know why his teenage daughter was receiving coupons for maternity items. He was furious that the merchant seemed to be encouraging teenage pregnancy. Later on, when a manager called to offer an apology, it was the father who ultimately apologized; after confronting his daughter, he had discovered that she was indeed pregnant.

Whether completely true or not, there is certainly an element of truth to the preceding tale. Retailers do, in fact, routinely analyze their customers' transaction data. If you've ever used a shopper's loyalty card at your grocer, coffee shop, or another retailer, it is likely that your purchase data is being used for machine learning.

Retailers use machine learning methods for advertising, targeted promotions, inventory management, and the layout of the items in the store. Some retailers have even equipped checkout lanes with devices that print coupons for promotions based on the items in the current transaction. Websites routinely do something similar, serving advertisements based on your web browsing history. Given the data from many individuals, a machine learning algorithm learns typical patterns of behavior that can then be used to make recommendations.
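
As a rough illustration of how purchase patterns across many shoppers can drive a recommendation, the following base R sketch counts how often items appear together in the same basket. The baskets are invented for this example, and recommend_for() is a hypothetical helper, not a function from any package; real retail systems are considerably more sophisticated.

```r
# Toy transactions, invented for illustration; each element is one basket.
baskets <- list(
  c("bread", "butter", "jam"),
  c("bread", "butter"),
  c("bread", "milk"),
  c("coffee", "milk"),
  c("bread", "butter", "milk")
)

# Build a basket-by-item indicator matrix (1 if the item is in the basket).
items <- sort(unique(unlist(baskets)))
indicator <- t(sapply(baskets, function(b) as.integer(items %in% b)))
colnames(indicator) <- items

# Item-by-item co-occurrence counts: entry [i, j] is the number of
# baskets that contain both item i and item j.
co_occurrence <- crossprod(indicator)

# Hypothetical helper: rank the other items by how often they were
# bought in the same basket as the given item.
recommend_for <- function(item) {
  counts <- co_occurrence[item, ]
  counts <- counts[names(counts) != item]
  sort(counts, decreasing = TRUE)
}

recommend_for("bread")
```

In this toy data, butter shares a basket with bread more often than any other item, so it ranks first. A coupon printer at the checkout lane could, in principle, act on a ranking of this kind.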

Although I am familiar with the machine learning methods working behind the scenes, it still feels a bit like magic when a retailer or website seems to know me better than I know myself. Others may be less thrilled to discover that their data is being used in this manner. Therefore, any person wishing to utilize machine learning or data mining would be remiss not to at least briefly consider the ethical implications of the art.

Ethical considerations

Due to the relative youth of machine learning as a discipline and the speed at which it is progressing, the associated legal issues and social norms are often quite uncertain and constantly in flux. Caution should be exercised when obtaining or analyzing data in order to avoid breaking laws, violating terms of service or data use agreements, and abusing the trust or violating the privacy of customers or the public.

Tip

The informal corporate motto of Google, an organization that collects perhaps more data on individuals than any other, is "don't be evil." This may serve as a reasonable starting point for forming your own ethical guidelines, but it may not be sufficient.

Certain jurisdictions may prohibit you from using racial, ethnic, religious, or other protected class data for business reasons, but keep in mind that excluding this data from your analysis may not be enough; machine learning algorithms might inadvertently learn this information independently. For instance, if a certain segment of people generally live in a certain region, buy a certain product, or otherwise behave in a way that uniquely identifies them as a group, some machine learning algorithms can infer the protected information from seemingly innocuous data. In such cases, you may need to fully "de-identify" these people by excluding any potentially identifying data in addition to the protected information.
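
To make the risk concrete, here is a minimal base R sketch with entirely synthetic records. The column names region and group are assumptions for illustration, with group standing in for a protected attribute. Even after group is dropped from the released data, a simple majority-vote rule based on region alone recovers it more often than chance.

```r
# Synthetic records, invented for illustration. `group` stands in for a
# protected attribute; `region` is a seemingly innocuous field.
customers <- data.frame(
  region = c("north", "north", "north", "north",
             "south", "south", "south", "south"),
  group  = c("A", "A", "A", "B", "B", "B", "B", "A"),
  stringsAsFactors = FALSE
)

# Naive de-identification: release the data without the protected column.
released <- customers["region"]

# Someone holding a labeled sample can cross-tabulate region against group...
region_by_group <- table(customers$region, customers$group)

# ...and guess the most common group within each region.
majority_group <- colnames(region_by_group)[apply(region_by_group, 1, which.max)]
names(majority_group) <- rownames(region_by_group)

# Apply that rule to the released data and compare with the truth.
guessed <- unname(majority_group[released$region])
proxy_accuracy <- mean(guessed == customers$group)
proxy_accuracy
```

With the two groups equally common in this toy data, a blind guess would be right about half the time, while the region-based guess is right for six of the eight records. Removing the protected column did not remove the information, which is why fuller de-identification may be needed.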

Apart from the legal consequences, using data inappropriately may hurt your bottom line. Customers may feel uncomfortable or become spooked if aspects of their lives they consider private are made public. Recently, several high-profile web applications have experienced a mass exodus of users who felt exploited when the applications' terms of service agreements changed and their data was used for purposes beyond what the users had originally agreed to. The fact that privacy expectations differ by context, by age cohort, and by locale adds complexity to deciding the appropriate use of personal data. It would be wise to consider the cultural implications of your work before you begin your project.

Tip

The fact that you can use data for a particular end does not always mean that you should.
