Book Image

Machine Learning with PyTorch and Scikit-Learn

By : Sebastian Raschka, Yuxi (Hayden) Liu, Vahid Mirjalili
5 (7)
Book Image

Machine Learning with PyTorch and Scikit-Learn

5 (7)
By: Sebastian Raschka, Yuxi (Hayden) Liu, Vahid Mirjalili

Overview of this book

Machine Learning with PyTorch and Scikit-Learn is a comprehensive guide to machine learning and deep learning with PyTorch. It acts as both a step-by-step tutorial and a reference you'll keep coming back to as you build your machine learning systems. Packed with clear explanations, visualizations, and examples, the book covers all the essential machine learning techniques in depth. While some books teach you only to follow instructions, with this machine learning book, we teach the principles allowing you to build models and applications for yourself. Why PyTorch? PyTorch is the Pythonic way to learn machine learning, making it easier to learn and simpler to code with. This book explains the essential parts of PyTorch and how to create models using popular libraries, such as PyTorch Lightning and PyTorch Geometric. You will also learn about generative adversarial networks (GANs) for generating new data and training intelligent agents with reinforcement learning. Finally, this new edition is expanded to cover the latest trends in deep learning, including graph neural networks and large-scale transformers used for natural language processing (NLP). This PyTorch book is your companion to machine learning with Python, whether you're a Python developer new to machine learning or want to deepen your knowledge of the latest developments.
Table of Contents (22 chapters)
20
Other Books You May Enjoy
21
Index

Using Python for machine learning

Python is one of the most popular programming languages for data science, and thanks to its very active developer and open-source community, a large number of useful libraries for scientific computing and machine learning have been developed.

Although the performance of interpreted languages, such as Python, for computation-intensive tasks is inferior to lower-level programming languages, extension libraries such as NumPy and SciPy have been developed that build upon lower-layer Fortran and C implementations for fast vectorized operations on multidimensional arrays.

For machine learning programming tasks, we will mostly refer to the scikit-learn library, which is currently one of the most popular and accessible open-source machine learning libraries. In the later chapters, when we focus on a subfield of machine learning called deep learning, we will use the latest version of the PyTorch library, which specializes in training so-called deep neural network models very efficiently by utilizing graphics cards.

Installing Python and packages from the Python Package Index

Python is available for all three major operating systems—Microsoft Windows, macOS, and Linux—and the installer, as well as the documentation, can be downloaded from the official Python website: https://www.python.org.

The code examples provided in this book have been written for and tested in Python 3.9, and we generally recommend that you use the most recent version of Python 3 that is available. Some of the code may also be compatible with Python 2.7, but as the official support for Python 2.7 ended in 2019, and the majority of open-source libraries have already stopped supporting Python 2.7 (https://python3statement.org), we strongly advise that you use Python 3.9 or newer.

You can check your Python version by executing

python --version

or

python3 --version

in your terminal (or PowerShell if you are using Windows).

The additional packages that we will be using throughout this book can be installed via the pip installer program, which has been part of the Python Standard Library since Python 3.3. More information about pip can be found at https://docs.python.org/3/installing/index.html.

After we have successfully installed Python, we can execute pip from the terminal to install additional Python packages:

pip install SomePackage

Already installed packages can be updated via the --upgrade flag:

pip install SomePackage --upgrade

Using the Anaconda Python distribution and package manager

A highly recommended open-source package management system for installing Python for scientific computing contexts is conda by Continuum Analytics. Conda is free and licensed under a permissive open-source license. Its goal is to help with the installation and version management of Python packages for data science, math, and engineering across different operating systems. If you want to use conda, it comes in different flavors, namely Anaconda, Miniconda, and Miniforge:

After successfully installing conda through either Anaconda, Miniconda, or Miniforge, we can install new Python packages using the following command:

conda install SomePackage

Existing packages can be updated using the following command:

conda update SomePackage

Packages that are not available through the official conda channel might be available via the community-supported conda-forge project (https://conda-forge.org), which can be specified via the --channel conda-forge flag. For example:

conda install SomePackage --channel conda-forge

Packages that are not available through the default conda channel or conda-forge can be installed via pip as explained earlier. For example:

pip install SomePackage

Packages for scientific computing, data science, and machine learning

Throughout the first half of this book, we will mainly use NumPy’s multidimensional arrays to store and manipulate data. Occasionally, we will make use of pandas, which is a library built on top of NumPy that provides additional higher-level data manipulation tools that make working with tabular data even more convenient. To augment your learning experience and visualize quantitative data, which is often extremely useful to make sense of it, we will use the very customizable Matplotlib library.

The main machine learning library used in this book is scikit-learn (Chapters 3 to 11). Chapter 12, Parallelizing Neural Network Training with PyTorch, will then introduce the PyTorch library for deep learning.

The version numbers of the major Python packages that were used to write this book are mentioned in the following list. Please make sure that the version numbers of your installed packages are, ideally, equal to these version numbers to ensure that the code examples run correctly:

  • NumPy 1.21.2
  • SciPy 1.7.0
  • Scikit-learn 1.0
  • Matplotlib 3.4.3
  • pandas 1.3.2

After installing these packages, you can double-check the installed version by importing the package in Python and accessing its __version__ attribute, for example:

>>> import numpy
>>> numpy.__version__
'1.21.2'

For your convenience, we included a python-environment-check.py script in this book’s complimentary code repository at https://github.com/rasbt/machine-learning-book so that you can check both your Python version and the package versions by executing this script.

Certain chapters will require additional packages and will provide information about the installations. For instance, do not worry about installing PyTorch at this point. Chapter 12 will provide tips and instructions when you need them.

If you encounter errors even though your code matches the code in the chapter exactly, we recommend you first check the version numbers of the underlying packages before spending more time on debugging or reaching out to the publisher or authors. Sometimes, newer versions of libraries introduce backward-incompatible changes that could explain these errors.

If you do not want to change the package version in your main Python installation, we recommend using a virtual environment for installing the packages used in this book. If you use Python without the conda manager, you can use the venv library to create a new virtual environment. For example, you can create and activate the virtual environment via the following two commands:

python3 -m venv /Users/sebastian/Desktop/pyml-book
source /Users/sebastian/Desktop/pyml-book/bin/activate

Note that you need to activate the virtual environment every time you open a new terminal or PowerShell. You can find more information about venv at https://docs.python.org/3/library/venv.html.

If you are using Anaconda with the conda package manager, you can create and activate a virtual environment as follows:

conda create -n pyml python=3.9
conda activate pyml