The Supervised Learning Workshop - Second Edition
Would you like to understand how and why machine learning techniques and data analytics are spearheading enterprises globally? From analyzing bioinformatics to predicting climate change, machine learning plays an increasingly pivotal role in our society.
Although the real-world applications may seem complex, this book simplifies supervised learning for beginners with a step-by-step interactive approach. Working with real-time datasets, you'll learn how supervised learning, when used with Python, can produce efficient predictive models.
Starting with the fundamentals of supervised learning, you'll quickly move to understand how to automate manual tasks and the process of assessing data using Jupyter and Python libraries like pandas. Next, you'll use data exploration and visualization techniques to develop powerful supervised learning models, before understanding how to distinguish variables and represent their relationships using scatter plots, heatmaps, and box plots. After using regression and classification models on real-time datasets to predict future outcomes, you'll grasp advanced ensemble techniques such as boosting and random forests. Finally, you'll learn the importance of model evaluation in supervised learning and study metrics to evaluate regression and classification tasks.
By the end of this book, you'll have the skills you need to work on your own real-life supervised learning Python projects.
If you are a beginner or a data scientist who is just getting started and want to learn how to implement machine learning algorithms to build predictive models, then this book is for you. To expedite the learning process, a solid understanding of Python programming is recommended, as you'll be editing existing classes and functions instead of creating them from scratch.
Chapter 1, Fundamentals, introduces you to supervised learning, Jupyter notebooks, and some of the most common pandas data methods.
Chapter 2, Exploratory Data Analysis and Visualization, teaches you how to perform exploration and analysis on a new dataset.
Chapter 3, Linear Regression, teaches you how to tackle regression problems and analysis, introducing you to linear regression as well as multiple linear regression and gradient descent.
Chapter 4, Autoregression, teaches you how to implement autoregression as a method to forecast values that depend on past values.
Chapter 5, Classification Techniques, introduces classification problems, classification using linear and logistic regression, k-nearest neighbors, and decision trees.
Chapter 6, Ensemble Modeling, teaches you how to examine the different ways of ensemble modeling, including their benefits and limitations.
Chapter 7, Model Evaluation, demonstrates how you can improve a model's performance by tuning hyperparameters and using model evaluation metrics.
Code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles are shown as follows: "Use the pandas read_csv function to load the CSV file containing the synth_temp.csv dataset, and then display the first five lines of data."
Words that you see on screen, for example, in menus or dialog boxes, also appear in the text like this: "Open the titanic.csv file by clicking on it on the Jupyter notebook home page."
A block of code is set as follows:
print(data[pd.isnull(data.damage_millions_dollars)].shape[0])
print(data[pd.isnull(data.damage_millions_dollars) & (data.damage_description != 'NA')].shape[0])
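To try the snippet above without the book's dataset, a minimal sketch follows. The DataFrame is made up for illustration and its values are assumptions; only the column names damage_millions_dollars and damage_description come from the snippet.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; only the column names come from the snippet above.
data = pd.DataFrame({
    "damage_millions_dollars": [1.5, np.nan, np.nan, 20.0, np.nan],
    "damage_description": ["1", "NA", "2", "3", "NA"],
})

# Boolean masks: one per condition, combined element-wise with &.
missing_damage = pd.isnull(data.damage_millions_dollars)
has_description = data.damage_description != "NA"

# shape[0] is the number of rows that survive the filter.
n_missing = data[missing_damage].shape[0]
n_missing_with_description = data[missing_damage & has_description].shape[0]

print(n_missing)
print(n_missing_with_description)
```

Naming the two masks separately makes it easier to see that the second count is a subset of the first.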
New terms and important words are shown like this: "Supervised means that the labels for the data are provided within the training, allowing the model to learn from these labels."
Lines of code that span multiple lines are split using a backslash ( \ ). When the code is executed, Python will ignore the backslash, and treat the code on the next line as a direct continuation of the current line.
For example:
history = model.fit(X, y, epochs=100, batch_size=5, verbose=1, \
                    validation_split=0.2, shuffle=False)
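The continuation rule can be checked with plain Python. The sketch below uses a hypothetical helper, describe, that exists only to provide a call with several arguments. It also shows that inside parentheses Python continues the line even without a backslash, so the backslash in such calls is a formatting convention rather than a requirement.

```python
def describe(name, epochs=1, batch_size=1, verbose=0):
    """Hypothetical helper, used only to have a call with many arguments."""
    return f"{name}: epochs={epochs}, batch_size={batch_size}, verbose={verbose}"

# Explicit continuation: the backslash must be the last character on the line.
with_backslash = describe("model", epochs=100, batch_size=5, \
                          verbose=1)

# Implicit continuation: inside parentheses no backslash is needed.
without_backslash = describe("model", epochs=100, batch_size=5,
                             verbose=1)

# Outside brackets, the backslash is required to continue a statement.
total = 1 + 2 + \
        3

print(with_backslash == without_backslash)
print(total)
```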
Comments are added into code to help explain specific bits of logic. Single-line comments are denoted using the # symbol, as follows:
# Print the sizes of the dataset
print("Number of Examples in the Dataset = ", X.shape[0])
print("Number of Features for each example = ", X.shape[1])
Multi-line comments are enclosed by triple quotes, as shown below:
"""
Define a seed for the random number generator to ensure the
result will be reproducible
"""
seed = 1
np.random.seed(seed)
random.set_seed(seed)
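The effect of fixing a seed can be checked with Python's standard library alone. This is a minimal sketch, not the book's code: it uses random.seed from the built-in random module (which has no set_seed function, so the random in the snippet above presumably refers to a different module), and the helper name draw is hypothetical.

```python
import random

"""
A triple-quoted string used as a multi-line comment.
Python evaluates it as a string expression and discards the result.
"""

def draw(seed, n=5):
    # Hypothetical helper: re-seed the generator, then draw n numbers.
    random.seed(seed)
    return [random.random() for _ in range(n)]

first = draw(1)
second = draw(1)
other = draw(2)

print(first == second)  # same seed, same sequence
print(first == other)   # different seed, different sequence
```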
Before we explore the book in detail, we need to set up specific software and tools. In the following section, we shall see how to do that.
All code in this book is executed using Jupyter Notebooks and Python 3.7. Both are available once you install Anaconda on your system. The following sections list the instructions for installing Anaconda on Windows, macOS, and Linux systems.
Here are the steps that you need to follow to complete the installation:
By default, these installations target the ‘C’ drive of your system. However, you can choose a different destination.
bash ~/Downloads/Anaconda-2020.02-Linux-x86_64.sh
You can find more details regarding the installation for various systems by visiting this site: https://docs.anaconda.com/anaconda/install/.
pip comes pre-installed with Anaconda. Once Anaconda is installed on your machine, all the required libraries can be installed using pip, for example, pip install numpy. Alternatively, you can install all the required libraries using pip install -r requirements.txt. You can find the requirements.txt file at https://packt.live/3hSJgYy.
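After installing, one way to confirm that the libraries are available is to ask Python whether it can locate them. The sketch below is an assumption about how you might check, not part of the book's setup: the helper name missing_packages is hypothetical, and the list contains only the two libraries named in this preface (numpy and pandas), so extend it to match requirements.txt.

```python
import importlib.util

def missing_packages(names):
    """Return the names that cannot be found as importable modules."""
    return [name for name in names if importlib.util.find_spec(name) is None]

# numpy and pandas are mentioned in the text; extend the list as needed.
required = ["numpy", "pandas"]
missing = missing_packages(required)

if missing:
    print("Missing:", ", ".join(missing))
else:
    print("All required libraries are installed.")
```

Using find_spec locates a module without importing it, so the check stays fast even for large libraries.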
The exercises and activities will be executed in Jupyter Notebooks. Jupyter is a Python library and can be installed in the same way as the other Python libraries – that is, with pip install jupyter, but fortunately, it comes pre-installed with Anaconda. To open a notebook, simply run the command jupyter notebook in the Terminal or Command Prompt.
You can find the complete code files of this book at https://packt.live/2TlcKDf. You can also run many activities and exercises directly in your web browser by using the interactive lab environment at https://packt.live/37QVpsD.
We've tried to support interactive versions of all activities and exercises, but we recommend a local installation as well for instances where this support isn't available.
If you have any issues or questions about installation, please email us at [email protected].