IPython Interactive Computing and Visualization Cookbook

IPython Interactive Computing and Visualization Cookbook - Second Edition

By: Cyrille Rossant

Overview of this book

Python is one of the leading open source platforms for data science and numerical computing. IPython and the associated Jupyter Notebook offer efficient interfaces to Python for data analysis and interactive visualization, and they constitute an ideal gateway to the platform. IPython Interactive Computing and Visualization Cookbook, Second Edition contains many ready-to-use, focused recipes for high-performance scientific computing and data analysis, from the latest IPython/Jupyter features to the most advanced tricks, to help you write better and faster code. You will apply these state-of-the-art methods to various real-world examples, illustrating topics in applied mathematics, scientific modeling, and machine learning. The first part of the book covers programming techniques: code quality and reproducibility, code optimization, high-performance computing through just-in-time compilation, parallel computing, and graphics card programming. The second part tackles data science, statistics, machine learning, signal and image processing, dynamical systems, and pure and applied mathematics.

Contributors

Preface

A Tour of Interactive Computing with Jupyter and IPython

Introduction

Introducing IPython and the Jupyter Notebook

Getting started with exploratory data analysis in the Jupyter Notebook

Introducing the multidimensional array in NumPy for fast array computations

Creating an IPython extension with custom magic commands

Mastering IPython's configuration system

Creating a simple kernel for Jupyter

Best Practices in Interactive Computing

Introduction

Learning the basics of the Unix shell

Using the latest features of Python 3

Learning the basics of the distributed version control system Git

A typical workflow with Git branching

Efficient interactive computing workflows with IPython

Ten tips for conducting reproducible interactive computing experiments

Writing high-quality Python code

Writing unit tests with pytest

Debugging code with IPython

Mastering the Jupyter Notebook

Introduction

Teaching programming in the Notebook with IPython Blocks

Converting a Jupyter notebook to other formats with nbconvert

Mastering widgets in the Jupyter Notebook

Creating custom Jupyter Notebook widgets in Python, HTML, and JavaScript

Configuring the Jupyter Notebook

Introducing JupyterLab

Profiling and Optimization

Introduction

Evaluating the time taken by a command in IPython

Profiling your code easily with cProfile and IPython

Profiling your code line-by-line with line_profiler

Profiling the memory usage of your code with memory_profiler

Understanding the internals of NumPy to avoid unnecessary array copying

Using stride tricks with NumPy

Implementing an efficient rolling average algorithm with stride tricks

Processing large NumPy arrays with memory mapping

Manipulating large arrays with HDF5

High-Performance Computing

Introduction

Using Python to write faster code

Accelerating pure Python code with Numba and Just-In-Time compilation

Accelerating array computations with NumExpr

Wrapping a C library in Python with ctypes

Accelerating Python code with Cython

Optimizing Cython code by writing less Python and more C

Releasing the GIL to take advantage of multi-core processors with Cython and OpenMP

Writing massively parallel code for NVIDIA graphics cards (GPUs) with CUDA

Distributing Python code across multiple cores with IPython

Interacting with asynchronous parallel tasks in IPython

Performing out-of-core computations on large arrays with Dask

Trying the Julia programming language in the Jupyter Notebook

Data Visualization

Introduction

Using Matplotlib styles

Creating statistical plots easily with seaborn

Creating interactive web visualizations with Bokeh and HoloViews

Visualizing a NetworkX graph in the Notebook with D3.js

Discovering interactive visualization libraries in the Notebook

Creating plots with Altair and the Vega-Lite specification

Statistical Data Analysis

Introduction

Exploring a dataset with pandas and Matplotlib

Getting started with statistical hypothesis testing — a simple z-test

Getting started with Bayesian methods

Estimating the correlation between two variables with a contingency table and a chi-squared test

Fitting a probability distribution to data with the maximum likelihood method

Estimating a probability distribution nonparametrically with a kernel density estimation

Fitting a Bayesian model by sampling from a posterior distribution with a Markov chain Monte Carlo method

Analyzing data with the R programming language in the Jupyter Notebook

Machine Learning

Introduction

Getting started with scikit-learn

Predicting who will survive on the Titanic with logistic regression

Learning to recognize handwritten digits with a K-nearest neighbors classifier

Learning from text – Naive Bayes for Natural Language Processing

Using support vector machines for classification tasks

Using a random forest to select important features for regression

Reducing the dimensionality of a dataset with a principal component analysis

Detecting hidden structures in a dataset with clustering

Numerical Optimization

Introduction

Finding the root of a mathematical function

Minimizing a mathematical function

Fitting a function to data with nonlinear least squares

Finding the equilibrium state of a physical system by minimizing its potential energy

Signal Processing

Introduction

Analyzing the frequency components of a signal with a Fast Fourier Transform

Applying a linear filter to a digital signal

Computing the autocorrelation of a time series

Image and Audio Processing

Introduction

Manipulating the exposure of an image

Applying filters on an image

Segmenting an image

Finding points of interest in an image

Detecting faces in an image with OpenCV

Applying digital filters to speech sounds

Creating a sound synthesizer in the Notebook

Deterministic Dynamical Systems

Introduction

Plotting the bifurcation diagram of a chaotic dynamical system

Simulating an elementary cellular automaton

Simulating an ordinary differential equation with SciPy

Simulating a partial differential equation — reaction-diffusion systems and Turing patterns

Stochastic Dynamical Systems

Introduction

Simulating a discrete-time Markov chain

Simulating a Poisson process

Simulating a Brownian motion

Simulating a stochastic differential equation

Graphs, Geometry, and Geographic Information Systems

Introduction

Manipulating and visualizing graphs with NetworkX

Drawing flight routes with NetworkX

Resolving dependencies in a directed acyclic graph with a topological sort

Computing connected components in an image

Computing the Voronoi diagram of a set of points

Manipulating geospatial data with Cartopy

Creating a route planner for a road network

Symbolic and Numerical Mathematics

Introduction

Diving into symbolic computing with SymPy

Solving equations and inequalities

Analyzing real-valued functions

Computing exact probabilities and manipulating random variables

A bit of number theory with SymPy

Finding a Boolean propositional formula from a truth table

Analyzing a nonlinear differential system — Lotka-Volterra (predator-prey) equations

Getting started with Sage

Index

Getting started with exploratory data analysis in the Jupyter Notebook

In this recipe, we will give an introduction to IPython and Jupyter for data analysis. Most of the subject has been covered in the prequel to this book, Learning IPython for Interactive Computing and Data Visualization, Second Edition, Packt Publishing, but we will review the basics here.

We will download and analyze a dataset about attendance on Montreal's bicycle tracks. This example is largely inspired by a presentation from Julia Evans (available at https://github.com/jvns/talks/blob/master/2013-04-mtlpy/pistes-cyclables.ipynb). Specifically, we will introduce the following:

Data manipulation with pandas
Data visualization with Matplotlib
Interactive widgets

How to do it...

The very first step is to import the scientific packages we will be using in this recipe, namely NumPy, pandas, and Matplotlib. We also instruct Matplotlib to render the figures as inline images in the Notebook:
```
>>> import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    %matplotlib inline
```
Note
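We can enable high-resolution Matplotlib figures on Retina display systems with the following commands. The set_matplotlib_formats() function was originally imported from IPython.display. Recent IPython versions deprecate that location in favor of the matplotlib_inline package:
```
from matplotlib_inline.backend_inline import set_matplotlib_formats
set_matplotlib_formats('retina')
```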
Now, we create a new Python variable called url that contains the address of a comma-separated values (CSV) data file. This standard text-based file format is used to store tabular data:
```
>>> url = ("https://raw.githubusercontent.com/"
           "ipython-books/cookbook-2nd-data/"
           "master/bikes.csv")
```
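pandas defines a read_csv() function that reads CSV files. Here, we pass the URL of the file. pandas will automatically download the file, parse it, and return a DataFrame object. We need to specify a few options so that the dates are parsed correctly. index_col='Date' uses the Date column as the row index. parse_dates=True converts that index to dates. dayfirst=True indicates that the dates are written with the day before the month: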
```
>>> df = pd.read_csv(url, index_col='Date',
                     parse_dates=True, dayfirst=True)
```
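The following minimal sketch shows why `dayfirst=True` matters. It parses a tiny in-memory CSV instead of the real file. The two rows and their `Berri1` counts are made-up values. It assumes the dates are written day first (DD/MM/YYYY), which is what the `dayfirst=True` option above implies.

```python
import io

import pandas as pd

# Hypothetical two-row CSV in the same layout as the bikes file (made-up counts).
csv_text = """Date,Berri1
01/02/2012,35
02/02/2012,83
"""

# Without dayfirst, "01/02/2012" is read month first, i.e., January 2.
df_us = pd.read_csv(io.StringIO(csv_text), index_col='Date',
                    parse_dates=True)

# With dayfirst=True, the same string is read as February 1.
df_eu = pd.read_csv(io.StringIO(csv_text), index_col='Date',
                    parse_dates=True, dayfirst=True)

print(df_us.index[0].month, df_eu.index[0].month)
```

With the wrong setting, every ambiguous date would silently move to another month, and any later grouping by weekday would be wrong.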
The df variable contains a DataFrame object, a specific pandas data structure that contains 2D tabular data. The head(n) method returns the first n rows of this table. In the Notebook, pandas displays a DataFrame object as an HTML table:
```
>>> df.head(2)
```
Every row corresponds to one day of the year and contains the number of bicycles on each track of the city.
We can get some summary statistics of the table with the describe() method:
```
>>> df.describe()
```
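To see what describe() returns without downloading the dataset, here is a small sketch on a made-up DataFrame. The counts are invented and only reuse two of the recipe's column names. For numeric columns, describe() returns one row per statistic.

```python
import pandas as pd

# Made-up daily counts for two tracks (not taken from the real dataset).
df_demo = pd.DataFrame({'Berri1': [35, 83, 135, 144, 197],
                        'PierDup': [0, 1, 2, 0, 2]})

# One column per track, one row per summary statistic.
stats = df_demo.describe()
print(stats)
print(list(stats.index))
```

The 50% row is the median, so it can be compared with the mean to spot skewed distributions.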
Let's display some figures. We will plot the daily attendance of two tracks. First, we select the two columns, Berri1 and PierDup. Then, we call the plot() method:
```
>>> df[['Berri1', 'PierDup']].plot(figsize=(10, 6),
                                   style=['-', '--'],
                                   lw=2)
```
Now, we move to a slightly more advanced analysis. We will look at the attendance of all tracks as a function of the weekday. pandas makes the weekday easy to get: the index attribute of the DataFrame object contains the dates of all rows in the table. This index has a few date-related attributes and methods, including day_name(). Older pandas versions provided a weekday_name attribute instead, which was removed in pandas 1.0:
```
>>> df.index.day_name()
Index(['Tuesday', 'Wednesday', 'Thursday', 'Friday',
       'Saturday', 'Sunday', 'Monday', 'Tuesday',
       ...
       'Friday', 'Saturday', 'Sunday', 'Monday',
       'Tuesday', 'Wednesday'],
      dtype='object', name='Date', length=261)
```
To get the attendance as a function of the weekday, we need to group the table elements by the weekday. The groupby() method lets us do just that. We group by weekday, an integer, rather than by the weekday names, to keep the weekday order (Monday is 0, Tuesday is 1, and so on). Grouping by names would sort the groups alphabetically. Once grouped, we can sum all rows in every group:
```
>>> df_week = df.groupby(df.index.weekday).sum()
>>> df_week
```
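The grouping step can be checked on synthetic data. This sketch builds a hypothetical two-week daily table with a constant count of 10 per day (not the Montreal data). It groups the table by `index.weekday` and also calls `day_name()` to get the weekday names.

```python
import pandas as pd

# Hypothetical daily counts over two full weeks, starting on Monday 2012-01-02.
dates = pd.date_range('2012-01-02', periods=14, freq='D')
df_demo = pd.DataFrame({'Berri1': 10}, index=dates)

# weekday gives integers (Monday=0, ..., Sunday=6), so the groups
# come out in calendar order rather than in alphabetical order.
df_demo_week = df_demo.groupby(df_demo.index.weekday).sum()
print(df_demo_week)

# day_name() gives the weekday name of each date.
print(list(df_demo.index.day_name()[:3]))
```

Each weekday occurs twice in two weeks, so every group sums to 20. If we had grouped by the names instead, the first row would be Friday.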

We can now display this information in a figure. We create a Matplotlib figure, and we use the plot() method of DataFrame to create our plot:

```
>>> fig, ax = plt.subplots(1, 1, figsize=(10, 8))
    df_week.plot(style='-o', lw=3, ax=ax)
    ax.set_xlabel('Weekday')
    # We fix the tick positions to 0, 1, ..., 6 and
    # replace their labels by the weekday names.
    ax.set_xticks(range(7))
    ax.set_xticklabels(
        ('Monday,Tuesday,Wednesday,Thursday,'
         'Friday,Saturday,Sunday').split(','))
    ax.set_ylim(0)  # Set the bottom axis to 0.
```

Finally, let's illustrate the interactive capabilities of the Notebook. We will plot a smoothed version of the track attendance as a function of time (rolling mean). The idea is to compute, for every day, the mean value over a window of n consecutive days. By default, pandas uses the window that ends on that day. The larger the window, the smoother the curve. We will create an interactive slider in the Notebook to vary n in real time in the plot. All we have to do is add the @interact decorator above our plotting function. The (1, 30) default value of the n argument tells ipywidgets to create an integer slider ranging from 1 to 30:
```
>>> from ipywidgets import interact
    
    @interact
    def plot(n=(1, 30)):
        fig, ax = plt.subplots(1, 1, figsize=(10, 8))
        df['Berri1'].rolling(window=n).mean().plot(ax=ax)
        ax.set_ylim(0, 7000)
        plt.show()
```
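Here is a minimal sketch of what the window size does. It uses a short, made-up series rather than the Berri1 column. A window of n averages the current value with the n - 1 previous ones, so the first n - 1 results are NaN unless min_periods is lowered. The center=True option, which the recipe does not use, centers the window on each day instead.

```python
import pandas as pd

# Made-up counts, not taken from the dataset.
s = pd.Series([2.0, 4.0, 6.0, 8.0, 10.0])

# Trailing window of 2: mean of the current value and the previous one.
smoothed = s.rolling(window=2).mean()
print(smoothed.tolist())

# min_periods=1 computes a mean even when fewer values are available.
smoothed_full = s.rolling(window=2, min_periods=1).mean()
print(smoothed_full.tolist())

# center=True uses a window centered on each value (3 values here).
smoothed_centered = s.rolling(window=3, center=True).mean()
print(smoothed_centered.tolist())
```

In the interactive plot, the NaN values at the start of the series are simply not drawn. This is why the smoothed curve begins slightly later for large values of n.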

How it works...

To create Matplotlib figures, it is good practice to create a Figure (fig) and one or several Axes (subplots, the ax object) with the plt.subplots() command. The figsize keyword argument lets us specify the size of the figure, in inches. Then, we call plotting methods directly on the Axes instances. Here, for example, we set the y limits of the axis with the set_ylim() method. When a library provides its own plotting methods, like the plot() method of pandas DataFrame instances, we can pass the relevant Axes instance with the ax keyword argument.
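The following sketch isolates this Figure/Axes pattern outside the Notebook. It assumes the non-interactive Agg backend, so no window or inline display is needed. It uses a hypothetical df_demo_week table with made-up values in place of df_week.

```python
import matplotlib
matplotlib.use('Agg')  # Assumption: non-interactive backend, so this runs outside a notebook.
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical weekly totals standing in for df_week (made-up values).
df_demo_week = pd.DataFrame({'Berri1': [5, 7, 6, 8, 9, 3, 2]})

# One Figure containing a single Axes; figsize is in inches.
fig, ax = plt.subplots(1, 1, figsize=(6, 4))

# pandas draws on the Axes passed with ax=...
df_demo_week.plot(style='-o', ax=ax)

# ...and we keep customizing the same Axes afterwards.
ax.set_xlabel('Weekday')
ax.set_ylim(0)

print(fig.get_size_inches(), len(ax.get_lines()))
plt.close(fig)  # Release the figure once we are done with it.
```

Because the Axes object is created first and passed explicitly, pandas drawing and Matplotlib customization both act on the same subplot. This still holds when a figure contains several subplots.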

There's more...

pandas is the main data wrangling library in Python. Other tools and methods are generally required for more advanced analyses (signal processing, statistics, and mathematical modeling). We will cover these steps in the second part of this book, starting with Chapter 7, Statistical Data Analysis.

Here are some more references about data manipulation with pandas:

Learning IPython for Interactive Computing and Data Visualization, Second Edition, Packt Publishing, the prequel of this book
Python for Data Analysis, O'Reilly Media, by Wes McKinney, the creator of pandas, at http://shop.oreilly.com/product/0636920023784.do
Python Data Science Handbook, O'Reilly Media, by Jake VanderPlas, at http://shop.oreilly.com/product/0636920034919.do
The documentation of pandas available at http://pandas.pydata.org/pandas-docs/stable/
Usage guide of Matplotlib, at https://matplotlib.org/tutorials/introductory/usage.html
