R for Data Science

By: Dan Toomey

Questions


Factual

  • How do you decide whether to use k-means or k-medoids?

  • What is the significance of the boxplot layout? Why does it look that way?

  • Given the density plot, describe the underlying data behind the outliers detected in the iris data.

  • What are the extracted rules for other items in the market dataset?

When, how, and why?

  • What is the risk of not vetting the outliers that are detected for the specific domain? Shouldn't the calculation always work?

  • Why do we need to exclude the iris category column from the outlier detection algorithm? Can it be used in some way when determining outliers?

  • Can you come up with a scenario where the market basket data and rules we generated were not applicable to the store you are working with?

Challenges

  • Using random data, I found it difficult to develop test data with outliers in two dimensions that occur in the same instance. Can you develop a test that would always have several outliers in at least two dimensions that occur in the same instance?

  • There is a good dataset on the Internet regarding passenger data on the Titanic. Generate the rules regarding the possible survival of the passengers.
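
As one possible starting point for the first challenge, the sketch below plants outliers instead of hoping random draws produce them. It is not the book's method. The sizes, the seed, the clamping range of [-2, 2], the planted range of [10, 12], and the use of the boxplot (1.5 * IQR) rule through base R's boxplot.stats are all assumptions chosen for illustration.

```r
# Sketch: random 2-D data with outliers planted in the same rows.
set.seed(42)        # assumed seed, only for reproducibility

n_inliers  <- 200   # hypothetical sizes
n_outliers <- 5

# Inliers: normal draws clamped to [-2, 2] in both dimensions,
# so no inlier can ever reach the planted values.
inliers <- data.frame(
  x = pmax(pmin(rnorm(n_inliers), 2), -2),
  y = pmax(pmin(rnorm(n_inliers), 2), -2)
)

# Planted outliers: both coordinates are extreme in the same row.
planted <- data.frame(
  x = runif(n_outliers, 10, 12),
  y = runif(n_outliers, 10, 12)
)

test_data <- rbind(inliers, planted)

# Flag outliers per dimension with the boxplot (1.5 * IQR) rule.
out_x <- which(test_data$x %in% boxplot.stats(test_data$x)$out)
out_y <- which(test_data$y %in% boxplot.stats(test_data$y)$out)

# Rows that are outliers in both dimensions at once.
both <- intersect(out_x, out_y)
```

Because the hinges fall among the clamped inliers, the upper fence cannot exceed 2 + 1.5 * 4 = 8, so the planted rows (201 to 205 here) are flagged in both dimensions regardless of the seed. Clamping distorts the tails of the inlier distribution slightly, which may or may not matter for the detection method being tested.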