Applied Supervised Learning with Python

By: Benjamin Johnston, Ishita Mathur

Overview of this book

Machine learning, the ability of a machine to give the right answers based on input data, has revolutionized the way we do business. Applied Supervised Learning with Python provides a rich understanding of how you can apply machine learning techniques in your data science projects using Python. You'll explore Jupyter Notebooks, a technology widely used in academic and commercial circles that supports running code in-line. With the help of fun examples, you'll gain experience with the Python machine learning toolkit, from performing basic data cleaning and processing to working with a range of regression and classification algorithms. Once you've grasped the basics, you'll learn how to build and train your own models using advanced techniques such as decision trees, ensemble modeling, validation, and error metrics. You'll also learn data visualization techniques using powerful Python libraries such as Matplotlib and Seaborn. This book also covers ensemble modeling and random forest classifiers along with other methods for combining results from multiple models, and concludes by delving into cross-validation to test your algorithm and check how well the model works on unseen data. By the end of this book, you'll be equipped not only to work with machine learning algorithms, but also to create some of your own!
Table of Contents (9 chapters)

Summary


In this chapter, we started by discussing why data exploration is an important part of the modeling process: it helps not only in preprocessing the dataset, but also in engineering informative features and improving model accuracy. The chapter focused on gaining a basic overview of the dataset and its features, as well as on gaining insights by creating visualizations that combine several features.

We looked at how to find the summary statistics of a dataset using core functionality from pandas. We looked at how to find missing values and discussed why they matter, then learned how to use the Missingno library to analyze them and the pandas and scikit-learn libraries to impute them.
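
As a reminder of what these steps look like in practice, here is a minimal sketch, not the chapter's own code. The toy DataFrame and its column names (`age`, `fare`, `embarked`) are hypothetical, and the choice of mode and median as fill values is an assumption made for illustration. The Missingno visualizations are left out to keep the example short.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical toy dataset; the column names are illustrative only.
df = pd.DataFrame({
    "age": [22.0, 35.0, np.nan, 58.0, 41.0],
    "fare": [7.25, np.nan, 8.05, 51.86, 13.00],
    "embarked": ["S", "C", None, "S", "S"],
})

# Summary statistics for the numeric columns (count, mean, std, quartiles).
summary = df.describe()

# Number of missing values in each column.
missing_counts = df.isnull().sum()

# pandas imputation: fill the categorical column with its most frequent value.
df["embarked"] = df["embarked"].fillna(df["embarked"].mode()[0])

# scikit-learn imputation: replace missing numeric values with the column median.
imputer = SimpleImputer(strategy="median")
df[["age", "fare"]] = imputer.fit_transform(df[["age", "fare"]])

print(summary)
print(missing_counts)
print(df)
```

Whether the mean, median, mode, or a model-based estimate is the right fill value depends on the feature and on why the values are missing, so treat the strategies above as placeholders.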

Then, we looked at how to study the univariate distributions of variables in the dataset and visualize them for both categorical and continuous variables using bar charts, pie charts, and histograms. Lastly, we learned how to explore relationships...
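
The univariate plots mentioned above (bar chart, pie chart, and histogram) can be sketched roughly as follows. This is an illustrative example under stated assumptions rather than the chapter's own code: the data and the column names `embarked` and `age` are hypothetical, the bin count is arbitrary, and the output filename is made up. The `Agg` backend is selected only so that the script runs without a display; in a Jupyter Notebook you would normally omit that line and let the figure render in-line.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy dataset: one categorical and one continuous variable.
df = pd.DataFrame({
    "embarked": ["S", "C", "S", "Q", "S", "C"],
    "age": [22, 35, 29, 58, 41, 35],
})

# Frequency of each category, used for both the bar chart and the pie chart.
category_counts = df["embarked"].value_counts()
labels = list(category_counts.index)

fig, (ax_bar, ax_pie, ax_hist) = plt.subplots(1, 3, figsize=(12, 4))

# Bar chart: one bar per category.
ax_bar.bar(labels, category_counts.values)
ax_bar.set_title("embarked (bar chart)")

# Pie chart: share of each category.
ax_pie.pie(category_counts.values, labels=labels)
ax_pie.set_title("embarked (pie chart)")

# Histogram: distribution of the continuous variable.
ax_hist.hist(df["age"], bins=4)
ax_hist.set_title("age (histogram)")

fig.tight_layout()
fig.savefig("univariate_distributions.png")  # hypothetical output filename
```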