Book Image

Applied Supervised Learning with Python

By : Benjamin Johnston, Ishita Mathur
Book Image

Applied Supervised Learning with Python

By: Benjamin Johnston, Ishita Mathur

Overview of this book

Machine learning—the ability of a machine to give right answers based on input data—has revolutionized the way we do business. Applied Supervised Learning with Python provides a rich understanding of how you can apply machine learning techniques in your data science projects using Python. You'll explore Jupyter Notebooks, the technology used commonly in academic and commercial circles with in-line code running support. With the help of fun examples, you'll gain experience working on the Python machine learning toolkit—from performing basic data cleaning and processing to working with a range of regression and classification algorithms. Once you’ve grasped the basics, you'll learn how to build and train your own models using advanced techniques such as decision trees, ensemble modeling, validation, and error metrics. You'll also learn data visualization techniques using powerful Python libraries such as Matplotlib and Seaborn. This book also covers ensemble modeling and random forest classifiers along with other methods for combining results from multiple models, and concludes by delving into cross-validation to test your algorithm and check how well the model works on unseen data. By the end of this book, you'll be equipped to not only work with machine learning algorithms, but also be able to create some of your own!
Table of Contents (9 chapters)

Classification Using K-Nearest Neighbors


Now that we are comfortable with creating multiclass classifiers using logistic regression and are getting reasonable performance with these models, we will turn our attention to another type of classifier: the K-nearest neighbors (K-NN) clustering method of classification. This is a handy method, as it can be used in both supervised classification problems as well as in unsupervised problems.

Figure 4.32: Visual representation of K-NN

The solid circle approximately in the center is the test point requiring classification, while the inner circle shows the classification process where K=3 and the outer circle where K=5.

K-NN is one of the simplest "learning" algorithms available for data classification. The use of learning in quotation marks is explicit, as K-NN doesn't really learn from the data and encode these learnings in parameters or weights like other methods, such as logistic regression. K-NN uses instance-based or lazy learning in that it simply...