Applied Data Science with Python and Jupyter

By: Alex Galea

Overview of this book

Getting started with data science doesn't have to be an uphill battle. Applied Data Science with Python and Jupyter is a step-by-step guide ideal for beginners who know a little Python and are looking for a fast-paced introduction to data science. In this book, you'll learn every aspect of the standard data workflow process, including collecting, cleaning, investigating, visualizing, and modeling data. You'll start with the basics of Jupyter, which will be the backbone of the book. After familiarizing yourself with its standard features, you'll look at an example of it in practice with your first analysis. In the next lesson, you'll dive right into predictive analytics, where multiple classification algorithms are implemented. Finally, the book ends by looking at data collection techniques. You'll see how web data can be acquired with scraping techniques and via APIs, and then briefly explore interactive visualizations.

Preface

Note

About

This section briefly introduces the author, the coverage of this book, the technical skills you'll need to get started, and the hardware and software needed to complete all of the included activities and exercises.

About the Book

Applied Data Science with Python and Jupyter teaches you the skills you need for entry-level data science. You'll learn about some of the most commonly used libraries that are part of the Anaconda distribution, and then explore machine learning models with real datasets to give you the skills and exposure you need for the real world. You'll finish up by learning how easy it can be to scrape and gather your own data from the open web so that you can apply your new skills in an actionable context.

About the Author

Alex Galea has been doing data analysis professionally since graduating with a master's in physics from the University of Guelph in Canada. He developed a keen interest in Python while researching quantum gases as part of his graduate studies. More recently, Alex has been doing web data analytics, where Python continues to play a large part in his work. He frequently blogs about work and personal projects, which are generally data-centric and usually involve Python and Jupyter Notebooks.

Objectives

  • Get up and running with the Jupyter ecosystem

  • Identify potential areas of investigation and perform exploratory data analysis

  • Plan a machine learning classification strategy and train classification models

  • Use validation curves and dimensionality reduction to tune and enhance your models

  • Scrape tabular data from web pages and transform it into Pandas DataFrames

  • Create interactive, web-friendly visualizations to clearly communicate your findings

Audience

Applied Data Science with Python and Jupyter is ideal for professionals with a variety of job descriptions across a large range of industries, given the rising popularity and accessibility of data science. You'll need some prior experience with Python, and any prior work with libraries such as Pandas and Matplotlib will give you a useful head start.

Approach

Applied Data Science with Python and Jupyter covers every aspect of the standard data workflow process with a blend of theory, practical hands-on coding, and relatable illustrations. Each chapter is designed to build on what you learned in the previous one. The book contains multiple activities that use real-life business scenarios for you to practice and apply your new skills in a highly relevant context.

Minimum Hardware Requirements

The minimum hardware requirements are as follows:

  • Processor: Intel i5 (or equivalent)

  • Memory: 8 GB RAM

  • Hard disk: 10 GB

  • An internet connection

Software Requirements

You'll also need the following software installed in advance:

  • Python 3.5+

  • Anaconda 4.3+

  • Python libraries included with Anaconda installation:

      • matplotlib 2.1.0+

      • ipython 6.1.0+

      • requests 2.18.4+

      • beautifulsoup4 4.6.0+

      • numpy 1.13.1+

      • pandas 0.20.3+

      • scikit-learn 0.19.0+

      • seaborn 0.8.0+

      • bokeh 0.12.10+

  • Python libraries that require manual installation:

      • mlxtend

      • version_information

      • ipython-sql

      • pdir2

      • graphviz
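
One way to confirm that the libraries listed above meet the stated minimum versions is a short check run from a notebook cell or a Python prompt. The following is only a minimal sketch and is not part of the book's code bundle. The helper names version_tuple and check_requirements are hypothetical, the import names (bs4 for beautifulsoup4, sklearn for scikit-learn, IPython for ipython) are the usual ones for these packages, and the comparison assumes each package exposes a __version__ attribute holding a dotted numeric version.

```python
import importlib

# Minimum versions copied from the requirements list above.
# Keys are import names, which sometimes differ from the package names.
MINIMUM_VERSIONS = {
    "matplotlib": "2.1.0",
    "IPython": "6.1.0",
    "requests": "2.18.4",
    "bs4": "4.6.0",        # beautifulsoup4
    "numpy": "1.13.1",
    "pandas": "0.20.3",
    "sklearn": "0.19.0",   # scikit-learn
    "seaborn": "0.8.0",
    "bokeh": "0.12.10",
}


def version_tuple(version):
    """Turn '1.13.1' (or '1.13.1rc1') into a tuple of leading integers."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first non-numeric component, e.g. 'dev0'
        parts.append(int(digits))
    return tuple(parts)


def check_requirements(minimums=MINIMUM_VERSIONS):
    """Return {import_name: 'ok' | 'too old' | 'missing'} (hypothetical helper)."""
    report = {}
    for name, minimum in minimums.items():
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = "missing"
            continue
        # Assumption: the package exposes __version__; fall back to '0' if not.
        installed = getattr(module, "__version__", "0")
        is_ok = version_tuple(installed) >= version_tuple(minimum)
        report[name] = "ok" if is_ok else "too old"
    return report
```

Calling check_requirements() with no arguments returns a dictionary that can be printed or inspected. Any entry marked "missing" or "too old" points to a package to install or update before starting the exercises.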

Installation and Setup

Before you start with this book, you'll need to install the Anaconda environment, which includes Python and Jupyter Notebook.

Installing Anaconda

  1. Visit https://www.anaconda.com/download/ in your browser.

  2. Click on Windows, Mac, or Linux, depending on the OS you are working on.

  3. Next, click on the Download option. Make sure you download the latest version.

  4. Open the installer after download.

  5. Follow the steps in the installer and that's it! Your Anaconda distribution is ready.

Updating Jupyter and Installing Dependencies

  1. Search for Anaconda Prompt and open it.

  2. Type the following commands to update conda and Jupyter and to install the required packages:

    # Update conda
    conda update conda
    
    # Update Jupyter
    conda update jupyter
    
    # Install packages
    conda install numpy
    conda install pandas
    conda install statsmodels
    conda install matplotlib
    conda install seaborn
    pip install -U scikit-learn

  3. To open Jupyter Notebook from Anaconda Prompt, use the following command:

    jupyter notebook
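
After running the installation commands, a quick sanity check from Python can confirm that the interpreter meets the Python 3.5+ requirement and that the conda and jupyter commands are on the PATH. This is a rough sketch rather than an official setup step. The function name environment_summary is hypothetical, and the result is only meaningful when the code runs in the same Anaconda Prompt or environment where the packages were installed.

```python
import shutil
import sys


def environment_summary(commands=("conda", "jupyter")):
    """Report the Python version check and where each command is found."""
    summary = {"python_ok": sys.version_info >= (3, 5)}
    for command in commands:
        # shutil.which returns the full path, or None if the command
        # cannot be found on the PATH.
        summary[command] = shutil.which(command)
    return summary
```

A value of None for conda or jupyter suggests that the command is not visible from the current shell, which on Windows often means the code was started outside Anaconda Prompt.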

Additional Resources

The code bundle for this book is also hosted on GitHub at https://github.com/TrainingByPackt/Applied-Data-Science-with-Python-and-Jupyter.

We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Conventions

Code words in text, database table names, folder names, filenames, file extensions, path names, dummy URLs, user input, and Twitter handles are shown as follows:

"The final figure is then saved as a high resolution PNG to the figures folder."

A block of code is set as follows:

    y = df['MEDV'].copy()
    del df['MEDV']
    df = pd.concat((y, df), axis=1)

Any command-line input or output is written as follows:

    jupyter notebook

New terms and important words are shown in bold. Words that you see on the screen, for example, in menus or dialog boxes, appear in the text like this: "Click on New in the upper-right corner and select a kernel from the drop-down menu."