R Machine Learning Essentials

By: Michele Usuelli

Optimizing the k-nearest neighbor algorithm


We built our KNN model using 37 features whose relevance to the language varies. Given a new flag, its neighbors are the flags that share many attributes with it, regardless of how relevant those attributes are. If a flag has several attributes in common with the new flag, but those attributes are irrelevant to the language, we erroneously include it in the neighborhood. On the other hand, if a flag shares only a few attributes, it won't be included, even when those attributes are highly relevant.
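To make the problem concrete, here is a minimal base R sketch. The three flags and their attribute values are invented for illustration and are not taken from the book's dataset; the sketch also assumes, purely for the example, that only `cross` and `crescent` are relevant to the language.

```r
# Hypothetical binary attributes for three flags (not the book's data).
# Assumption: only `cross` and `crescent` are relevant to the language.
flags <- rbind(
  new_flag = c(cross = 1, crescent = 0, red = 1, blue = 1, stripes = 1, stars = 0),
  flag_a   = c(cross = 0, crescent = 1, red = 1, blue = 1, stripes = 1, stars = 0),
  flag_b   = c(cross = 1, crescent = 0, red = 0, blue = 0, stripes = 0, stars = 1)
)

# Count the attributes on which a flag has the same value as the new flag.
shared_with_new <- function(flag_name, columns = colnames(flags)) {
  sum(flags["new_flag", columns] == flags[flag_name, columns])
}

candidates <- c("flag_a", "flag_b")

# Using all attributes, flag_a looks like the closer neighbor.
all_shared <- sapply(candidates, shared_with_new)

# Using only the relevant attributes, flag_b is the closer one.
relevant_shared <- sapply(candidates, shared_with_new,
                          columns = c("cross", "crescent"))

print(all_shared)
print(relevant_shared)
```

In this toy case, `flag_a` matches the new flag on more attributes overall, but all of its matches are on attributes assumed to be irrelevant, while `flag_b` matches on both relevant ones.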

KNN performs worse in the presence of irrelevant attributes. This problem is known as the curse of dimensionality, and it affects many machine learning algorithms. One solution is to rank the features by their relevance and select the most relevant ones. Another option, which we won't cover in this chapter, is to use dimensionality reduction techniques.
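As a rough illustration of ranking and selecting features, the following base R sketch scores each feature by its information gain with respect to the target and keeps the top two. The helper functions `entropy` and `information_gain`, the `toy` data frame, and the choice of two features are all assumptions made for this example. They are not the book's code, and a package implementation may compute the score differently.

```r
# Shannon entropy (in bits) of a categorical vector.
entropy <- function(x) {
  p <- table(x) / length(x)
  p <- p[p > 0]  # drop empty categories to avoid log2(0)
  -sum(p * log2(p))
}

# Information gain: entropy of the target minus its expected entropy
# after splitting the records by the feature's values.
information_gain <- function(feature, target) {
  groups <- split(target, feature)
  weights <- lengths(groups) / length(target)
  entropy(target) - sum(weights * sapply(groups, entropy))
}

# Hypothetical toy data: one target column and three candidate features.
toy <- data.frame(
  language = c("A", "A", "A", "A", "B", "B", "B", "B"),
  cross    = c("yes", "yes", "yes", "yes", "no", "no", "no", "no"),
  red      = c("yes", "no", "yes", "no", "yes", "no", "yes", "no"),
  stars    = c("yes", "yes", "yes", "no", "yes", "no", "no", "no")
)

# Score every feature, rank them, and keep the two most relevant.
feature_names <- setdiff(names(toy), "language")
gains <- sapply(feature_names,
                function(f) information_gain(toy[[f]], toy$language))
ranked <- sort(gains, decreasing = TRUE)
top_features <- names(ranked)[1:2]

print(ranked)
print(top_features)
```

The KNN model would then be rebuilt using only the columns in `top_features`. How many features to keep is a tuning choice that this sketch does not address.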

In the previous chapter, in the section Ranking the features using a filter or a dimensionality reduction, we measured each feature's relevance using the information...