Machine Learning with R - Fourth Edition

By: Brett Lantz

Overview of this book

Dive into R with this data science guide on machine learning (ML). Machine Learning with R, Fourth Edition, takes you through classification methods like nearest neighbor and Naive Bayes and regression modeling, from simple linear to logistic. Work with neural networks and support vector machines, and unearth valuable insights from complex data sets with market basket analysis. Learn how to unlock hidden patterns within your data using k-means clustering. With three new chapters on data, you’ll hone your skills in advanced data preparation, mastering feature engineering, and tackling challenging data scenarios. This book helps you conquer high-dimensionality, sparsity, and imbalanced data with confidence. Navigate the complexities of big data with ease, harnessing the power of parallel computing and leveraging GPU resources for faster insights. Elevate your understanding of model performance evaluation, moving beyond accuracy metrics. With a new chapter on building better learners, you’ll pick up techniques that top teams use to improve model performance with ensemble methods and innovative model stacking and blending techniques. Machine Learning with R, Fourth Edition, equips you with the tools and knowledge to tackle even the most formidable data challenges. Unlock the full potential of machine learning and become a true master of the craft.

Advanced Data Preparation

The truism that 80 percent of the time invested in real-world machine learning projects is spent on data preparation is cited so widely that it is mostly accepted without question. Earlier chapters of this book helped perpetuate the cliché by stating it as fact, without qualification. Although it reflects a common experience and perception, it is also an oversimplification, as tends to be the case when generalizing from a statistic. In reality, there is no single, uniform experience of data preparation. Still, data prep work almost always involves more effort than anticipated.

Rarely will you be provided a single CSV-formatted text file that can be easily read into R and processed with just a few lines of R code, as was the case in previous chapters. Instead, the necessary data elements are often distributed across databases and must be gathered, filtered, reformatted, and combined...
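To make the contrast concrete, the following is a minimal sketch in base R, not an example taken from the book. The `customers` and `orders` data frames, their column names, and their values are all hypothetical stand-ins for extracts pulled from separate database tables. The first statement shows the easy single-CSV case; the rest shows the filter, reformat, and combine steps that a multi-source project typically needs.

```r
# The easy case: one clean CSV (supplied inline here) read in a single call.
simple <- read.csv(text = "id,value\n1,10\n2,20")

# The more typical case: data split across sources.
# Both tables below are hypothetical examples, not data from the book.
customers <- data.frame(
  customer_id = c(1, 2, 3),
  signup_date = c("2021-01-15", "2021-03-02", "2022-07-30"),
  stringsAsFactors = FALSE
)
orders <- data.frame(
  order_id    = c(101, 102, 103, 104),
  customer_id = c(1, 1, 2, 4),
  amount      = c("25.00", "40.50", "13.25", "99.99"),
  stringsAsFactors = FALSE
)

# Filter: drop orders whose customer does not appear in the customer table.
orders_known <- orders[orders$customer_id %in% customers$customer_id, ]

# Reformat: convert text columns to numeric and Date types.
orders_known$amount   <- as.numeric(orders_known$amount)
customers$signup_date <- as.Date(customers$signup_date)

# Combine: total the order amounts per customer, then left-join the totals
# onto the customer table so customers without orders are kept (amount = NA).
totals   <- aggregate(amount ~ customer_id, data = orders_known, FUN = sum)
combined <- merge(customers, totals, by = "customer_id", all.x = TRUE)

print(combined)
```

Even in this toy version, the combining step forces decisions that a single CSV never raises, such as what to do with an order that has no matching customer, or a customer that has no orders.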