R Machine Learning By Example

By: Raghav Bali

Overview of this book

Data science and machine learning are some of the top buzzwords in the technical world today. From retail stores to Fortune 500 companies, everyone is working hard to make machine learning give them data-driven insights to grow their business. With powerful data manipulation features, machine learning packages, and an active developer community, R empowers users to build sophisticated machine learning systems to solve real-world data problems. This book takes you on a data-driven journey that starts with the very basics of R and machine learning and gradually builds upon the concepts to work on projects that tackle real-world problems. You’ll begin by getting an understanding of the core concepts and definitions required to appreciate machine learning algorithms. Building upon the basics, you will then work on three different projects that apply the concepts of machine learning, follow current trends, and cover major algorithms as well as popular R packages in detail. These projects have been neatly divided into six different chapters covering the worlds of e-commerce, finance, and social media, which are at the very core of this data-driven revolution. Each of the projects will help you to understand, explore, visualize, and derive insights depending upon the domain and algorithms. Through this book, you will learn to apply the concepts of machine learning to data-related problems and solve them using the powerful yet simple language, R.

Advanced constructs

We came across the term vectorized earlier, when we talked about operating on vectors without using loops. While looping is a perfectly good way to iterate through vectors and perform computations, explicit loops in R tend to be inefficient on very large datasets, the territory commonly referred to as Big Data. For such cases, R provides some advanced constructs, which we will look at in this section. We will cover the following functions:

lapply: Loops over a list and evaluates a function on each element
sapply: A variant of lapply that simplifies its result wherever possible
apply: Evaluates a function over the margins (for example, the rows or columns) of an array
tapply: Evaluates a function over subsets of a vector defined by a grouping factor
mapply: A multivariate version of lapply
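
Before looking at each function in turn, the difference between an explicit loop, vectorized arithmetic, and an apply-style call can be sketched as follows. This is only a minimal illustration; the vector x and the squaring task are assumptions chosen for the example and do not come from the book's dataset.

```r
# Three ways to square the numbers 1 to 5 (illustrative values only)
x <- 1:5

# 1. Explicit loop: preallocate the result, then fill it element by element
squares_loop <- numeric(length(x))
for (i in seq_along(x)) {
  squares_loop[i] <- x[i]^2
}

# 2. Vectorized arithmetic: the operator works on the whole vector at once
squares_vec <- x^2

# 3. sapply: applies a function to each element and simplifies the result
squares_sapply <- sapply(x, function(v) v^2)

print(squares_loop)    # 1 4 9 16 25
print(squares_vec)     # 1 4 9 16 25
print(squares_sapply)  # 1 4 9 16 25
```

All three produce the same values; the apply family is most useful when the operation is an arbitrary function rather than a built-in vectorized operator.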

lapply and sapply

As we mentioned earlier, lapply takes a list and a function as input and evaluates that function over each element of the list. If the input is not a list, it is first converted into one using the as.list function, and the result is always returned as a list. The actual looping is done internally in C code, which can make lapply faster than an equivalent loop written in R. We look at its implementation and an example in the following code snippet:

> # lapply function definition
> lapply
function (X, FUN, ...) 
{
    FUN <- match.fun(FUN)
    if (!is.vector(X) || is.object(X)) 
        X <- as.list(X)
    .Internal(lapply(X, FUN))
}
<bytecode: 0x00000000003e4f68>
<environment: namespace:base>
> # example
> nums <- list(l1=c(1,2,3,4,5,6,7,8,9,10), l2=1000:1020)
> lapply(nums, mean)

Output:

$l1
[1] 5.5

$l2
[1] 1010
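
The function definition above accepts extra arguments through `...`, and FUN does not have to be a named function. The following sketch shows both, along with the automatic conversion of a non-list input. The with_na list and the anonymous functions are hypothetical examples, not part of the original text.

```r
nums <- list(l1 = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), l2 = 1000:1020)

# An anonymous function can be used in place of a named one
spans <- lapply(nums, function(v) max(v) - min(v))
print(spans$l1)  # 9
print(spans$l2)  # 20

# Arguments after FUN are passed on to every call; here na.rm = TRUE
# tells mean to skip the missing values
with_na <- list(a = c(1, NA, 3), b = c(10, 20, NA))
clean_means <- lapply(with_na, mean, na.rm = TRUE)
print(clean_means$a)  # 2
print(clean_means$b)  # 15

# A plain vector is accepted too, and the result is still a list
squares <- lapply(1:3, function(v) v^2)
print(is.list(squares))  # TRUE
```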

The sapply function is similar to lapply, except that it tries to simplify the result wherever possible. If every element of the result has length 1, it returns a vector. If every element has the same length, greater than 1, it returns a matrix. If the result cannot be simplified, we get the same list that lapply would return. We illustrate this with the following example:

> data <- list(l1=1:10, l2=runif(10), l3=rnorm(10,2))
> data

Output:

> 
> lapply(data, mean)

Output:

> sapply(data, mean)

Output:
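
Since the example above uses random numbers, a deterministic sketch may make the three simplification cases easier to verify. The samples list below is a made-up input chosen so that each case is triggered; it is an assumption for illustration only.

```r
# Fixed inputs so the results can be checked by hand
samples <- list(l1 = 1:10, l2 = c(2, 4, 6), l3 = c(-1, 0, 1, 2))

# Case 1: every result has length 1, so we get a named vector
means <- sapply(samples, mean)
print(means)  # l1 = 5.5, l2 = 4, l3 = 0.5

# Case 2: every result has the same length (2), so we get a matrix
# with one column per list element
ranges <- sapply(samples, range)
print(dim(ranges))  # 2 3

# Case 3: results differ in length, so sapply falls back to a list
evens <- sapply(samples, function(v) v[v %% 2 == 0])
print(is.list(evens))  # TRUE
```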

apply

The apply function is used to evaluate a function over the margins of an array; for instance, applying aggregate functions to the rows or columns of a matrix. The second argument selects the margin: 1 for rows and 2 for columns. The rowSums, rowMeans, colSums, and colMeans functions give the same results as the corresponding apply calls, but they are implemented in optimized internal code and are much faster when operating on large arrays. We will see all the preceding constructs in the following example:

> mat <- matrix(rnorm(20), nrow=5, ncol=4)
> mat

Output:

> # row sums
> apply(mat, 1, sum)
[1]  0.79786959  0.53900665 -2.36486927 -1.28221227  0.06701519
> rowSums(mat)
[1]  0.79786959  0.53900665 -2.36486927 -1.28221227  0.06701519
> # row means
> apply(mat, 1, mean)
[1]  0.1994674  0.1347517 -0.5912173 -0.3205531  0.0167538
> rowMeans(mat)
[1]  0.1994674  0.1347517 -0.5912173 -0.3205531  0.0167538
>
> # col sums
> apply(mat, 2, sum)
[1] -0.6341087  0.3321890 -2.1345245  0.1932540
> colSums(mat)
[1] -0.6341087  0.3321890 -2.1345245  0.1932540
> # col means
> apply(mat, 2, mean)
[1] -0.12682173  0.06643781 -0.42690489  0.03865079
> colMeans(mat)
[1] -0.12682173  0.06643781 -0.42690489  0.03865079
>
> # row quantiles
> apply(mat, 1, quantile, probs=c(0.25, 0.5, 0.75))

Output:

Thus you can see how easy it is to apply various statistical functions to matrices without writing any explicit loops.
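
The equivalence between apply and the specialized row and column functions can also be checked programmatically rather than by comparing printed output. The sketch below assumes an arbitrary seed (42) so that it is reproducible; the seed and variable names are not from the original example.

```r
set.seed(42)  # arbitrary seed, used only to make the sketch reproducible
mat <- matrix(rnorm(20), nrow = 5, ncol = 4)

# all.equal compares numeric results up to a small tolerance
rows_match <- all.equal(apply(mat, 1, sum), rowSums(mat))
cols_match <- all.equal(apply(mat, 2, mean), colMeans(mat))
print(rows_match)  # TRUE
print(cols_match)  # TRUE

# When the function returns several values per row, each row's result
# becomes a column: 3 quantiles for each of 5 rows gives a 3 x 5 matrix
row_quantiles <- apply(mat, 1, quantile, probs = c(0.25, 0.5, 0.75))
print(dim(row_quantiles))  # 3 5
```

Note the orientation of the quantile result: it is transposed relative to the input, which is easy to overlook when the output is fed into further computations.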

tapply

The tapply function is used to evaluate a function over subsets of a vector, where the subsets are defined by a grouping factor of the same length as the vector. If you are familiar with relational databases, this is similar to the GROUP BY construct in SQL. We illustrate this in the following examples:

> data <- c(1:10, rnorm(10,2), runif(10))
> data

Output:

> groups <- gl(3,10)
> groups
 [1] 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3
Levels: 1 2 3
> tapply(data, groups, mean)

Output:

> tapply(data, groups, mean, simplify = FALSE)

Output:

> tapply(data, groups, range)

Output:
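
To make the GROUP BY analogy concrete, here is a small deterministic sketch. The values vector and the group labels a, b, and c are hypothetical, chosen so that the results can be verified by hand.

```r
# Nine values split into three labelled groups of three
values <- c(10, 20, 30, 1, 2, 3, 100, 200, 300)
groups <- gl(3, 3, labels = c("a", "b", "c"))

# Roughly: SELECT grp, AVG(value) FROM t GROUP BY grp
group_means <- tapply(values, groups, mean)
print(group_means)  # a = 20, b = 2, c = 200

# A function that returns more than one value per group cannot be
# simplified to a vector, so each group holds its own result
group_ranges <- tapply(values, groups, range)
print(group_ranges[[2]])  # 1 3

# The same grouping can be written as split followed by sapply
split_means <- sapply(split(values, groups), mean)
print(split_means)  # a = 20, b = 2, c = 200
```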

mapply

The mapply function is a multivariate version of lapply. It evaluates a function over several sets of arguments in parallel, in the sense that the first call uses the first element of each argument, the second call uses the second element of each, and so on. As a simple example, to build a list of vectors using the rep function, we would normally have to call rep once for each vector. With mapply, we can achieve the same result in a more elegant way, as illustrated next:

> list(rep(1,4), rep(2,3), rep(3,2), rep(4,1))

Output:

[[1]]
[1] 1 1 1 1

[[2]]
[1] 2 2 2

[[3]]
[1] 3 3

[[4]]
[1] 4

> mapply(rep, 1:4, 4:1)

Output:

[[1]]
[1] 1 1 1 1

[[2]]
[1] 2 2 2

[[3]]
[1] 3 3

[[4]]
[1] 4
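
A few further variations on mapply are sketched below: passing arguments by name, fixing an argument for every call through MoreArgs, and the simplification that happens when every call returns a single value. The specific inputs are assumptions for illustration.

```r
# Arguments can be matched by name; the calls are
# rep(x = 3, times = 1), rep(x = 2, times = 2), rep(x = 1, times = 3)
reps <- mapply(rep, times = 1:3, x = 3:1)
print(reps[[3]])  # 1 1 1

# MoreArgs holds arguments that stay the same for every call
fixed <- mapply(rep, times = 1:3, MoreArgs = list(x = 42))
print(fixed[[2]])  # 42 42

# When every call returns one value, the result is simplified to a
# vector, much like sapply; the calls are f(1, 10), f(2, 20), f(3, 30)
sums <- mapply(function(a, b) a + b, 1:3, c(10, 20, 30))
print(sums)  # 11 22 33
```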
