Practical Machine Learning with R

By: Jeyaraman, Olsen, Wambugu

Overview of this book

With huge amounts of data being generated every moment, businesses need applications that apply complex mathematical calculations to data repeatedly and at speed. With machine learning techniques and R, you can develop these kinds of applications efficiently. Practical Machine Learning with R begins by helping you grasp the basics of machine learning methods, while also highlighting how and why they work. Rather than focusing on mathematical derivations, it shows you how to get these algorithms to work in practice. As you progress from one chapter to the next, you will gain hands-on experience of building a machine learning solution in R. Using R packages such as rpart, randomForest, and mice (multiple imputation by chained equations), you will learn to implement algorithms including a neural net classifier, decision trees, and linear and non-linear regression. Along the way, you'll delve into various machine learning techniques for both supervised and unsupervised learning. You'll also gain insights into partitioning datasets and into mechanisms for evaluating the results from each model and comparing them. By the end of this book, you will have gained expertise in solving your business problems, starting by forming a good problem statement, selecting the most appropriate model to solve your problem, and then ensuring that you do not overtrain it.

Chapter 6: Unsupervised Learning

Activity 20: Perform DIANA, AGNES, and k-means on the Built-In Motor Car Dataset

Solution:

  1. Attach the cluster and factoextra packages:

    library(cluster)

    library(factoextra)

  2. Load the dataset, a CSV copy of R's built-in mtcars data:

    df <- read.csv("mtcars.csv")

  3. Set the row names to the values of the X column (the car model names), then remove the X column:

    rownames(df) <- df$X

    df$X <- NULL

    Note

    The row names (the car names) become a column, X, when the data frame is saved as a CSV file. We therefore restore them as row names, because they are used as labels in the dendrograms and the cluster plot in the later steps.

  4. Remove any rows with missing data and standardize each column to mean 0 and standard deviation 1:

    df <- na.omit(df)

    df <- scale(df)

  5. Implement divisive (top-down) hierarchical clustering using DIANA. Record the dendrogram output for later comparison. Feel free to experiment with different distance metrics:

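    # stand = TRUE standardizes each column again (by mean absolute deviation) before computing distances
    dv <- diana(df, metric = "manhattan", stand = TRUE)

    plot(dv)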

    The output is as follows:

    Figure 6.41: Banner from diana()

    The next plot is as follows:

    Figure 6.42: Dendrogram from diana()

  6. Implement agglomerative (bottom-up) hierarchical clustering using AGNES. Take note of the resulting dendrogram for later comparison:

    agn <- agnes(df)  # defaults: Euclidean distance, average linkage

    pltree(agn)

    The output is as follows:

    Figure 6.43: Dendrogram from agnes()

  7. Prepare for k-means clustering by using the elbow method to choose the number of clusters:

    # Use the standardized data (df) so that k is chosen on the same data that step 8 clusters
    fviz_nbclust(df, kmeans, method = "wss") +
        geom_vline(xintercept = 4, linetype = 2) +
        labs(subtitle = "Elbow method")

    The output is as follows:

    Figure 6.44: Optimal clusters using the elbow method

  8. Perform k-means clustering with four clusters:

    set.seed(123)  # arbitrary seed so that kmeans()'s random starts are reproducible

    k4 <- kmeans(df, centers = 4, nstart = 20)

    fviz_cluster(k4, data = df)

    The output is as follows:

    Figure 6.45: k-means with four clusters

  9. Compare the clusters, starting with the smallest one. The following are your expected results for DIANA, AGNES, and k-means, respectively:
Figure 6.46: Dendrogram from running DIANA, cut at 20

If we cut the DIANA tree at height 20, the Ferrari is grouped with the Ford and the Maserati, which together form the smallest cluster.
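The members of the Ferrari's DIANA cluster can also be read off programmatically instead of from the plot. The following is a minimal sketch, assuming the built-in `mtcars` data frame stands in for `mtcars.csv` (it holds the same values) and reusing the `diana()` call from step 5. It converts the tree with `as.hclust()` (cluster supplies the method for `diana` objects) and cuts it with base R's `cutree()`; the helper `cluster_peers()` is a hypothetical name introduced only for this illustration.

```r
library(cluster)

# Assumption: the built-in mtcars data holds the same values as mtcars.csv
df <- scale(na.omit(mtcars))

# Same DIANA call as in step 5
dv <- diana(df, metric = "manhattan", stand = TRUE)

# Convert to an hclust object so that stats::cutree() can cut it at height 20
diana_groups <- cutree(as.hclust(dv), h = 20)

# Hypothetical helper: names of all cars in the same cluster as `car`
cluster_peers <- function(groups, car) {
  names(groups)[groups == groups[[car]]]
}

print(table(diana_groups))                       # size of each cluster
print(cluster_peers(diana_groups, "Ferrari Dino"))
```

Changing `h` shows how sensitive the smallest cluster is to the cut height.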

Figure 6.47: Dendrogram from agnes, cut at 4

Meanwhile, cutting the AGNES dendrogram at height 4 results in the Ferrari being clustered with the Mazda RX4, the Mazda RX4 Wag, and the Porsche. k-means clusters the Ferrari with the Mazdas, the Ford, and the Maserati.
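The same lookup works for the AGNES cut at height 4 and for the k-means result. This sketch again assumes the built-in `mtcars` data in place of the CSV, reuses the calls from steps 6 and 8, and fixes an arbitrary seed (123) because `kmeans()` picks random starting centers, so its cluster numbers, and occasionally its memberships, can vary between runs.

```r
library(cluster)

# Assumption: built-in mtcars stands in for mtcars.csv
df <- scale(na.omit(mtcars))

# Same AGNES call as step 6 (Euclidean distance, average linkage by default)
agn <- agnes(df)
agnes_groups <- cutree(as.hclust(agn), h = 4)

# Same k-means call as step 8; the seed value is arbitrary
set.seed(123)
k4 <- kmeans(df, centers = 4, nstart = 20)
kmeans_groups <- setNames(k4$cluster, rownames(df))

# Cars that share a cluster with the Ferrari under each method
ferrari_agnes  <- names(agnes_groups)[agnes_groups == agnes_groups[["Ferrari Dino"]]]
ferrari_kmeans <- names(kmeans_groups)[kmeans_groups == kmeans_groups[["Ferrari Dino"]]]

print(ferrari_agnes)
print(ferrari_kmeans)
```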

Figure 6.48: kmeans clustering
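The choice of four clusters comes from the elbow plot in step 7. The curve behind that plot is the total within-cluster sum of squares for each candidate k, which can be computed directly with base R's `kmeans()`. The sketch below assumes the built-in `mtcars` data, candidate values of k from 1 to 10, and an arbitrary seed. Because every standardized column has variance 1, the value for k = 1 must equal (n - 1) × p = 31 × 11 = 341, which is a useful sanity check.

```r
# Assumption: built-in mtcars stands in for mtcars.csv
df <- scale(na.omit(mtcars))

set.seed(123)  # arbitrary seed; kmeans() uses random starting centers
ks <- 1:10

# Total within-cluster sum of squares for each candidate number of clusters
wss <- sapply(ks, function(k) kmeans(df, centers = k, nstart = 20)$tot.withinss)

# The "elbow" is where adding another cluster stops reducing wss substantially
plot(ks, wss, type = "b", xlab = "Number of clusters k",
     ylab = "Total within-cluster sum of squares")
abline(v = 4, lty = 2)  # the value of k chosen in step 7

print(round(wss, 1))
```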

As these results show, the choice of clustering algorithm, and of its settings such as the distance metric and the height at which a dendrogram is cut, determines which clusters are formed. Domain knowledge is needed to decide which grouping is most useful.
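A cross-tabulation makes these differences concrete. The following sketch assumes the built-in `mtcars` data, cuts both hierarchies into four groups (an assumption made for a like-for-like comparison with four-cluster k-means, rather than the heights of 20 and 4 used above), and uses an arbitrary seed for `kmeans()`. Cluster numbers are arbitrary labels, so agreement between two methods appears as rows dominated by a single cell, not as a diagonal.

```r
library(cluster)

# Assumption: built-in mtcars stands in for mtcars.csv
df <- scale(na.omit(mtcars))

# Four groups from each method, in the original row order of df
diana_k4 <- cutree(as.hclust(diana(df, metric = "manhattan", stand = TRUE)), k = 4)
agnes_k4 <- cutree(as.hclust(agnes(df)), k = 4)
set.seed(123)  # arbitrary seed for kmeans()'s random starts
kmeans_k4 <- kmeans(df, centers = 4, nstart = 20)$cluster

# Rows: one method's clusters; columns: another method's clusters
tab_diana_agnes  <- table(DIANA = diana_k4, AGNES = agnes_k4)
tab_agnes_kmeans <- table(AGNES = agnes_k4, kmeans = kmeans_k4)

print(tab_diana_agnes)
print(tab_agnes_kmeans)
```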
