IPython Interactive Computing and Visualization Cookbook - Second Edition

By: Cyrille Rossant

Overview of this book

Python is one of the leading open source platforms for data science and numerical computing. IPython and the associated Jupyter Notebook offer efficient interfaces to Python for data analysis and interactive visualization, and they constitute an ideal gateway to the platform. IPython Interactive Computing and Visualization Cookbook, Second Edition contains many ready-to-use, focused recipes for high-performance scientific computing and data analysis, from the latest IPython/Jupyter features to the most advanced tricks, to help you write better and faster code. You will apply these state-of-the-art methods to various real-world examples, illustrating topics in applied mathematics, scientific modeling, and machine learning. The first part of the book covers programming techniques: code quality and reproducibility, code optimization, high-performance computing through just-in-time compilation, parallel computing, and graphics card programming. The second part tackles data science, statistics, machine learning, signal and image processing, dynamical systems, and pure and applied mathematics.

Contributors

Preface

A Tour of Interactive Computing with Jupyter and IPython

Introducing IPython and the Jupyter Notebook

Getting started with exploratory data analysis in the Jupyter Notebook

Introducing the multidimensional array in NumPy for fast array computations

Creating an IPython extension with custom magic commands

Mastering IPython's configuration system

Creating a simple kernel for Jupyter

Best Practices in Interactive Computing

Learning the basics of the Unix shell

Using the latest features of Python 3

Learning the basics of the distributed version control system Git

A typical workflow with Git branching

Efficient interactive computing workflows with IPython

Ten tips for conducting reproducible interactive computing experiments

Writing high-quality Python code

Writing unit tests with pytest

Debugging code with IPython

Mastering the Jupyter Notebook

Teaching programming in the Notebook with IPython Blocks

Converting a Jupyter notebook to other formats with nbconvert

Mastering widgets in the Jupyter Notebook

Creating custom Jupyter Notebook widgets in Python, HTML, and JavaScript

Configuring the Jupyter Notebook

Introducing JupyterLab

Profiling and Optimization

Evaluating the time taken by a command in IPython

Profiling your code easily with cProfile and IPython

Profiling your code line-by-line with line_profiler

Profiling the memory usage of your code with memory_profiler

Understanding the internals of NumPy to avoid unnecessary array copying

Using stride tricks with NumPy

Implementing an efficient rolling average algorithm with stride tricks

Processing large NumPy arrays with memory mapping

Manipulating large arrays with HDF5

High-Performance Computing

Using Python to write faster code

Accelerating pure Python code with Numba and Just-In-Time compilation

Accelerating array computations with NumExpr

Wrapping a C library in Python with ctypes

Accelerating Python code with Cython

Optimizing Cython code by writing less Python and more C

Releasing the GIL to take advantage of multi-core processors with Cython and OpenMP

Writing massively parallel code for NVIDIA graphics cards (GPUs) with CUDA

Distributing Python code across multiple cores with IPython

Interacting with asynchronous parallel tasks in IPython

Performing out-of-core computations on large arrays with Dask

Trying the Julia programming language in the Jupyter Notebook

Data Visualization

Using Matplotlib styles

Creating statistical plots easily with seaborn

Creating interactive web visualizations with Bokeh and HoloViews

Visualizing a NetworkX graph in the Notebook with D3.js

Discovering interactive visualization libraries in the Notebook

Creating plots with Altair and the Vega-Lite specification

Statistical Data Analysis

Exploring a dataset with pandas and Matplotlib

Getting started with statistical hypothesis testing — a simple z-test

Getting started with Bayesian methods

Estimating the correlation between two variables with a contingency table and a chi-squared test

Fitting a probability distribution to data with the maximum likelihood method

Estimating a probability distribution nonparametrically with a kernel density estimation

Fitting a Bayesian model by sampling from a posterior distribution with a Markov chain Monte Carlo method

Analyzing data with the R programming language in the Jupyter Notebook

Machine Learning

Getting started with scikit-learn

Predicting who will survive on the Titanic with logistic regression

Learning to recognize handwritten digits with a K-nearest neighbors classifier

Learning from text – Naive Bayes for Natural Language Processing

Using support vector machines for classification tasks

Using a random forest to select important features for regression

Reducing the dimensionality of a dataset with a principal component analysis

Detecting hidden structures in a dataset with clustering

Numerical Optimization

Finding the root of a mathematical function

Minimizing a mathematical function

Fitting a function to data with nonlinear least squares

Finding the equilibrium state of a physical system by minimizing its potential energy

Signal Processing

Analyzing the frequency components of a signal with a Fast Fourier Transform

Applying a linear filter to a digital signal

Computing the autocorrelation of a time series

Image and Audio Processing

Manipulating the exposure of an image

Applying filters on an image

Segmenting an image

Finding points of interest in an image

Detecting faces in an image with OpenCV

Applying digital filters to speech sounds

Creating a sound synthesizer in the Notebook

Deterministic Dynamical Systems

Plotting the bifurcation diagram of a chaotic dynamical system

Simulating an elementary cellular automaton

Simulating an ordinary differential equation with SciPy

Simulating a partial differential equation — reaction-diffusion systems and Turing patterns

Stochastic Dynamical Systems

Simulating a discrete-time Markov chain

Simulating a Poisson process

Simulating a Brownian motion

Simulating a stochastic differential equation

Graphs, Geometry, and Geographic Information Systems

Manipulating and visualizing graphs with NetworkX

Drawing flight routes with NetworkX

Resolving dependencies in a directed acyclic graph with a topological sort

Computing connected components in an image

Computing the Voronoi diagram of a set of points

Manipulating geospatial data with Cartopy

Creating a route planner for a road network

Symbolic and Numerical Mathematics

Diving into symbolic computing with SymPy

Solving equations and inequalities

Analyzing real-valued functions

Computing exact probabilities and manipulating random variables

A bit of number theory with SymPy

Finding a Boolean propositional formula from a truth table

Analyzing a nonlinear differential system — Lotka-Volterra (predator-prey) equations

Getting started with Sage

Index

Predicting who will survive on the Titanic with logistic regression

In this recipe, we introduce logistic regression, a basic classifier. We apply it to a Kaggle dataset where the goal is to predict who survived the sinking of the Titanic, based on real passenger data (see http://www.kaggle.com/c/titanic).
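
Before turning to the Titanic data, it may help to see what a logistic regression classifier computes. The following is a minimal sketch on made-up synthetic data, not the recipe's own code: the variable names (`rng`, `x_new`, `p_manual`, `p_model`) and the data itself are assumptions chosen for illustration. It fits scikit-learn's `LogisticRegression` on a single feature and checks that the predicted probability of the positive class is the sigmoid of a linear function of that feature.

```python
import numpy as np
import sklearn.linear_model as lm

# Synthetic, made-up data (NOT the Titanic dataset): one feature, binary label.
rng = np.random.RandomState(0)
X = rng.normal(size=(200, 1))
# The label is 1 more often when the feature is large (noisy threshold at 0).
y = (X[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)

# Fit a logistic regression classifier with default settings.
model = lm.LogisticRegression()
model.fit(X, y)

# The fitted model is a weight w and an intercept b.
w = model.coef_[0, 0]
b = model.intercept_[0]

# Probability of class 1 for a few new points, computed two ways:
# by hand with the sigmoid function, and with predict_proba.
x_new = np.array([[-2.0], [0.0], [2.0]])
p_manual = 1.0 / (1.0 + np.exp(-(w * x_new[:, 0] + b)))
p_model = model.predict_proba(x_new)[:, 1]

print(p_manual)
print(p_model)
```

The two printed arrays should agree, which illustrates that the classifier is a linear model passed through a sigmoid; a point is assigned to class 1 when its predicted probability exceeds 0.5.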

Note

Kaggle (http://www.kaggle.com/competitions) hosts machine learning competitions where anyone can download a dataset, train a model, and test the predictions on the website.

How to do it...

We import the standard packages:

>>> import numpy as np
    import pandas as pd
    import sklearn
    import sklearn.linear_model as lm
    import sklearn.model_selection as ms
    import matplotlib.pyplot as plt
    %matplotlib inline

We load the training and test datasets with pandas:

>>> train = pd.read_csv('https://github.com/ipython-books'
                        '/cookbook-2nd-data/blob/master/'
                        'titanic_train.csv?raw=true')
    test = pd.read_csv('https:/...