Machine Learning with Python

By: Oliver Theobald

Overview of this book

The book starts by laying the foundation with an introduction to machine learning, Python, and essential libraries, so that you grasp the basics before diving deeper. It then covers exploratory data analysis, data scrubbing, and pre-model algorithms, equipping you to understand and prepare your data for modeling. Next come detailed walkthroughs on creating, evaluating, and optimizing machine learning models, covering key algorithms such as linear and logistic regression, support vector machines, k-nearest neighbors, and tree-based methods. Each section builds on the previous one, reinforcing both the concepts and their application. The book closes with next steps and an appendix introducing Python for newcomers, rounding out your understanding of practical machine learning.
Table of Contents (18 chapters)
1. FOREWORD
2. DATASETS USED IN THIS BOOK
3. INTRODUCTION
4. DEVELOPMENT ENVIRONMENT
5. MACHINE LEARNING LIBRARIES
6. EXPLORATORY DATA ANALYSIS
7. DATA SCRUBBING
8. PRE-MODEL ALGORITHMS
9. SPLIT VALIDATION
10. MODEL DESIGN
11. LINEAR REGRESSION
12. LOGISTIC REGRESSION
13. SUPPORT VECTOR MACHINES
14. k-NEAREST NEIGHBORS
15. TREE-BASED METHODS
16. NEXT STEPS
APPENDIX 1: INTRODUCTION TO PYTHON
APPENDIX 2: PRINT COLUMNS

PRE-MODEL ALGORITHMS

As an extension of the data scrubbing process, unsupervised learning algorithms are sometimes used in advance of a supervised learning algorithm to prepare the data for prediction modeling. In this way, unsupervised algorithms are used to clean or reshape the data rather than to derive actionable insight.
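To illustrate an unsupervised step feeding a supervised one, here is a minimal sketch assuming scikit-learn is installed. It uses scikit-learn's bundled iris dataset purely for illustration (it is not one of this book's datasets), and the choice of two principal components and logistic regression is an assumption made for the example.

```python
# Minimal sketch: an unsupervised step (PCA) reshapes the features
# before a supervised model (logistic regression) is trained on them.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=10, stratify=y
)

model = Pipeline([
    ("scale", StandardScaler()),                 # PCA is sensitive to feature scale
    ("pca", PCA(n_components=2)),                # unsupervised: ignores the labels
    ("clf", LogisticRegression(max_iter=1000)),  # supervised: learns from the labels
])
model.fit(X_train, y_train)

# Scaler and PCA were fit on the training data only; here they are reused on the test data.
X_test_reduced = model[:-1].transform(X_test)
print(X_test_reduced.shape)  # (45, 2)
print(f"Test accuracy: {model.score(X_test, y_test):.2f}")
```

Wrapping the steps in a `Pipeline` means the unsupervised step is fit only on the training split, so no information from the test rows leaks into how the data is reshaped.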

Examples of pre-model algorithms include dimension reduction techniques, as introduced in the previous chapter, as well as k-means clustering. Both are examined in this chapter.

Principal Component Analysis

One of the most popular dimension reduction techniques is principal component analysis (PCA). Sometimes loosely referred to as general factor analysis, although the two are distinct techniques, PCA is useful for substantially reducing data complexity and for visualizing data in fewer dimensions. The practical goal of PCA is to find a low-dimensional representation of the dataset that preserves as much of the original variation as possible. Rather than removing individual features from...
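To make "preserves as much of the original variation as possible" concrete, here is a minimal NumPy sketch of PCA computed through singular value decomposition (SVD). SVD is one standard way to compute PCA, not necessarily the approach used elsewhere in this book; the function name `pca` and the synthetic data are hypothetical.

```python
import numpy as np

def pca(X, n_components):
    """Project X onto its top principal components via SVD.

    Returns the projected data and the fraction of total variance
    captured by each retained component.
    """
    X_centered = X - X.mean(axis=0)  # PCA operates on mean-centered data
    # Rows of Vt are the principal directions, ordered by singular value.
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    X_reduced = X_centered @ components.T
    variance = S**2  # proportional to the variance along each direction
    explained_ratio = variance[:n_components] / variance.sum()
    return X_reduced, explained_ratio

rng = np.random.default_rng(0)
# Hypothetical data: 200 rows, 3 features, where the third feature is almost
# a copy of the first, so most of the variation lives in two dimensions.
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)
x3 = x1 + 0.05 * rng.normal(size=200)
X = np.column_stack([x1, x2, x3])

X_reduced, ratio = pca(X, n_components=2)
print(X_reduced.shape)  # (200, 2)
print(ratio.round(3), ratio.sum().round(3))
```

Because the third feature is nearly redundant, the first two components should capture almost all of the variance, and the projected columns are uncorrelated with each other. scikit-learn's `PCA` class reports the same per-component quantity through its `explained_variance_ratio_` attribute.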