IPython Interactive Computing and Visualization Cookbook

IPython Interactive Computing and Visualization Cookbook - Second Edition

By: Cyrille Rossant

Overview of this book

Python is one of the leading open source platforms for data science and numerical computing. IPython and the associated Jupyter Notebook offer efficient interfaces to Python for data analysis and interactive visualization, and they constitute an ideal gateway to the platform. IPython Interactive Computing and Visualization Cookbook, Second Edition contains many ready-to-use, focused recipes for high-performance scientific computing and data analysis, from the latest IPython/Jupyter features to the most advanced tricks, to help you write better and faster code. You will apply these state-of-the-art methods to various real-world examples, illustrating topics in applied mathematics, scientific modeling, and machine learning. The first part of the book covers programming techniques: code quality and reproducibility, code optimization, high-performance computing through just-in-time compilation, parallel computing, and graphics card programming. The second part tackles data science, statistics, machine learning, signal and image processing, dynamical systems, and pure and applied mathematics.

Contributors

Preface

A Tour of Interactive Computing with Jupyter and IPython

Introduction

Introducing IPython and the Jupyter Notebook

Getting started with exploratory data analysis in the Jupyter Notebook

Introducing the multidimensional array in NumPy for fast array computations

Creating an IPython extension with custom magic commands

Mastering IPython's configuration system

Creating a simple kernel for Jupyter

Best Practices in Interactive Computing

Introduction

Learning the basics of the Unix shell

Using the latest features of Python 3

Learning the basics of the distributed version control system Git

A typical workflow with Git branching

Efficient interactive computing workflows with IPython

Ten tips for conducting reproducible interactive computing experiments

Writing high-quality Python code

Writing unit tests with pytest

Debugging code with IPython

Mastering the Jupyter Notebook

Introduction

Teaching programming in the Notebook with IPython Blocks

Converting a Jupyter notebook to other formats with nbconvert

Mastering widgets in the Jupyter Notebook

Creating custom Jupyter Notebook widgets in Python, HTML, and JavaScript

Configuring the Jupyter Notebook

Introducing JupyterLab

Profiling and Optimization

Introduction

Evaluating the time taken by a command in IPython

Profiling your code easily with cProfile and IPython

Profiling your code line-by-line with line_profiler

Profiling the memory usage of your code with memory_profiler

Understanding the internals of NumPy to avoid unnecessary array copying

Using stride tricks with NumPy

Implementing an efficient rolling average algorithm with stride tricks

Processing large NumPy arrays with memory mapping

Manipulating large arrays with HDF5

High-Performance Computing

Introduction

Using Python to write faster code

Accelerating pure Python code with Numba and Just-In-Time compilation

Accelerating array computations with NumExpr

Wrapping a C library in Python with ctypes

Accelerating Python code with Cython

Optimizing Cython code by writing less Python and more C

Releasing the GIL to take advantage of multi-core processors with Cython and OpenMP

Writing massively parallel code for NVIDIA graphics cards (GPUs) with CUDA

Distributing Python code across multiple cores with IPython

Interacting with asynchronous parallel tasks in IPython

Performing out-of-core computations on large arrays with Dask

Trying the Julia programming language in the Jupyter Notebook

Data Visualization

Introduction

Using Matplotlib styles

Creating statistical plots easily with seaborn

Creating interactive web visualizations with Bokeh and HoloViews

Visualizing a NetworkX graph in the Notebook with D3.js

Discovering interactive visualization libraries in the Notebook

Creating plots with Altair and the Vega-Lite specification

Statistical Data Analysis

Introduction

Exploring a dataset with pandas and Matplotlib

Getting started with statistical hypothesis testing — a simple z-test

Getting started with Bayesian methods

Estimating the correlation between two variables with a contingency table and a chi-squared test

Fitting a probability distribution to data with the maximum likelihood method

Estimating a probability distribution nonparametrically with a kernel density estimation

Fitting a Bayesian model by sampling from a posterior distribution with a Markov chain Monte Carlo method

Analyzing data with the R programming language in the Jupyter Notebook

Machine Learning

Introduction

Getting started with scikit-learn

Predicting who will survive on the Titanic with logistic regression

Learning to recognize handwritten digits with a K-nearest neighbors classifier

Learning from text – Naive Bayes for Natural Language Processing

Using support vector machines for classification tasks

Using a random forest to select important features for regression

Reducing the dimensionality of a dataset with a principal component analysis

Detecting hidden structures in a dataset with clustering

Numerical Optimization

Introduction

Finding the root of a mathematical function

Minimizing a mathematical function

Fitting a function to data with nonlinear least squares

Finding the equilibrium state of a physical system by minimizing its potential energy

Signal Processing

Introduction

Analyzing the frequency components of a signal with a Fast Fourier Transform

Applying a linear filter to a digital signal

Computing the autocorrelation of a time series

Image and Audio Processing

Introduction

Manipulating the exposure of an image

Applying filters on an image

Segmenting an image

Finding points of interest in an image

Detecting faces in an image with OpenCV

Applying digital filters to speech sounds

Creating a sound synthesizer in the Notebook

Deterministic Dynamical Systems

Introduction

Plotting the bifurcation diagram of a chaotic dynamical system

Simulating an elementary cellular automaton

Simulating an ordinary differential equation with SciPy

Simulating a partial differential equation — reaction-diffusion systems and Turing patterns

Stochastic Dynamical Systems

Introduction

Simulating a discrete-time Markov chain

Simulating a Poisson process

Simulating a Brownian motion

Simulating a stochastic differential equation

Graphs, Geometry, and Geographic Information Systems

Introduction

Manipulating and visualizing graphs with NetworkX

Drawing flight routes with NetworkX

Resolving dependencies in a directed acyclic graph with a topological sort

Computing connected components in an image

Computing the Voronoi diagram of a set of points

Manipulating geospatial data with Cartopy

Creating a route planner for a road network

Symbolic and Numerical Mathematics

Introduction

Diving into symbolic computing with SymPy

Solving equations and inequalities

Analyzing real-valued functions

Computing exact probabilities and manipulating random variables

A bit of number theory with SymPy

Finding a Boolean propositional formula from a truth table

Analyzing a nonlinear differential system — Lotka-Volterra (predator-prey) equations

Getting started with Sage

Index

Preface

We are becoming awash in the flood of digital data from scientific research, engineering, economics, politics, journalism, business, and many other domains. As a result, analyzing, visualizing, and harnessing data is the occupation of an increasingly large and diverse set of people. Quantitative skills such as programming, numerical computing, mathematics, statistics, and data mining, which form the core of data science, are more and more appreciated in a seemingly endless plethora of fields.

Python, a widely known programming language, is also one of the leading open platforms for data science. IPython is a mature Python project that provides scientist-friendly interactive access to Python. It is part of the broader Project Jupyter, which aims to provide high-quality environments for interactive computing, data analysis, visualization, and the authoring of interactive scientific documents. Jupyter is estimated to have several million users today.

The prequel to this book, Learning IPython for Interactive Computing and Data Visualization, Second Edition (Packt Publishing), was published in 2015, two years after its first edition. It is a beginner-level introduction to data science and numerical computing with Python, IPython, and Jupyter.

This book, the first edition of which was published in 2014, continues that journey by presenting more than 100 recipes for interactive scientific computing and data science. These recipes not only cover programming topics such as numerical computing, high-performance computing, parallel computing, and interactive visualization, but also data analysis topics such as statistics, data mining, machine learning, signal processing, graph theory, numerical optimization, and many others.

This second edition is fully compatible with the latest versions of the platform and its libraries. It includes new recipes to better leverage the latest features of Python 3, and it introduces promising new projects such as JupyterLab, Altair, and Dask.

Note

By design, this book privileges breadth over depth. A particularly wide range of libraries and techniques is covered in this book, but not comprehensively. We give many references that let you deepen your knowledge of individual methods. The goal of this book is not to make you an expert in the subjects covered, but to give you a glimpse of the extremely diverse set of applications that you can tackle with the platform.

Each recipe in this book covers a specific technique and is available online as a Jupyter notebook. This interactive document lets you read, execute, and modify the code, which makes the learning process more engaging and dynamic.
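To make the idea of a notebook as a document more concrete, here is a minimal sketch that relies only on the fact that a notebook file is stored as JSON containing a list of cells. The function name code_cells and the tiny hand-written notebook are illustrative assumptions, not code from the book. In practice you would open the notebooks in Jupyter rather than parse them by hand.

```python
import json


def code_cells(notebook_json):
    """Return the source of each code cell in a notebook given as a JSON string."""
    notebook = json.loads(notebook_json)
    sources = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            source = cell.get("source", "")
            # The source may be stored as one string or as a list of lines.
            if isinstance(source, list):
                source = "".join(source)
            sources.append(source)
    return sources


# A tiny hand-written notebook (hypothetical content, for illustration only).
example = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 2,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# A recipe"]},
        {"cell_type": "code", "metadata": {}, "execution_count": None,
         "outputs": [], "source": ["print('Hello world!')"]},
    ],
})

print(code_cells(example))
```

Only the code cell is returned; the Markdown cell holding the recipe's text is skipped.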

Almost all of this book's content is available online on the GitHub platform (http://ipython-books.github.io/). Updates and corrections will be regularly published there, so you should make sure you check out the latest version of the book online.

Who this book is for

This book targets researchers, engineers, data scientists, teachers, students, analysts, journalists, economists, and hobbyists interested in data analysis and numerical computing.

Readers familiar with the scientific Python ecosystem will find many resources to sharpen their skills in high-performance interactive computing with IPython and Jupyter.

Readers who need to implement algorithms for domain-specific applications will appreciate the introductions to a wide variety of topics in data analysis and applied mathematics.

Readers who are new to numerical computing with Python should start with the prequel to this book, Learning IPython for Interactive Computing and Data Visualization, Second Edition (Packt Publishing), published in 2015.

What this book covers

This book is split into two parts:

Part 1 (chapters 1 to 6) covers relatively advanced methods in interactive numerical computing, high-performance computing, and data visualization.

Part 2 (chapters 7 to 15) introduces standard methods in data science and mathematical modeling. Many of these methods are applied to real-world data.

Part 1 – Interactive Computing with Jupyter

Chapter 1, A Tour of Interactive Computing with Jupyter and IPython, contains a brief introduction to data analysis and numerical computing with IPython and Jupyter. It not only covers common tools such as Python, NumPy, pandas, and Matplotlib, but also advanced IPython/Jupyter topics such as interactive widgets in the Notebook, custom magic commands, configurable IPython extensions, and custom Jupyter kernels.

Chapter 2, Best Practices in Interactive Computing, details best practices to write reproducible, high-quality code: task automation, version control with Git, workflows with IPython and Jupyter, unit testing, continuous integration, debugging, and other related topics. The importance of these subjects in computational research and data analysis cannot be overstated.

Chapter 3, Mastering the Jupyter Notebook, covers topics related to the Jupyter Notebook, notably the Notebook format, notebook conversions, and interactive widgets.

Chapter 4, Profiling and Optimization, covers methods to make your code faster and more efficient: CPU and memory profiling in Python, advanced optimization techniques with NumPy (including large array manipulations), and memory mapping of huge arrays. These techniques are essential for big data analysis.

Chapter 5, High-Performance Computing, covers techniques to make your code much faster: code acceleration with Numba and Cython, wrapping C libraries in Python with ctypes, parallel computing with IPython and Dask, OpenMP, and General-Purpose Computing on Graphics Processing Units (GPGPU) with CUDA. The chapter ends with an introduction to the Julia language, a high-performance numerical computing programming language that can be used in the Jupyter Notebook.

Chapter 6, Data Visualization, introduces several static and interactive visualization libraries, such as Matplotlib, seaborn, Bokeh, D3.js, Altair, and others.

Part 2 – Standard Methods in Data Science and Applied Mathematics

Chapter 7, Statistical Data Analysis, covers methods for getting insights into data. It introduces classic frequentist and Bayesian methods for hypothesis testing, parametric and nonparametric estimation, and model inference. The chapter leverages Python libraries such as pandas, SciPy, statsmodels, and PyMC. The last recipe introduces the statistical language R, which can be easily used in the Jupyter Notebook.

Chapter 8, Machine Learning, covers methods to learn and make predictions from data. Using the scikit-learn Python package, this chapter illustrates fundamental data mining and machine learning concepts such as supervised and unsupervised learning, classification, regression, feature selection, feature extraction, overfitting, regularization, cross-validation, and grid search. Algorithms addressed in this chapter include logistic regression, Naive Bayes, K-nearest neighbors, support vector machines, random forests, and others. These methods are applied to various types of datasets: numerical data, images, and text.

Chapter 9, Numerical Optimization, covers minimizing and maximizing mathematical functions. This topic is pervasive in data science, notably in statistics, machine learning, and signal processing. This chapter illustrates a few root-finding, minimization, and curve-fitting routines with SciPy.

Chapter 10, Signal Processing, covers extracting relevant information from complex and noisy data. These steps are sometimes required prior to running statistical and data mining algorithms. This chapter introduces basic signal processing methods such as Fourier transforms and digital filters.

Chapter 11, Image and Audio Processing, covers signal processing methods for images and sounds. It introduces image filtering, segmentation, computer vision, and face detection with scikit-image and OpenCV. It also presents methods for audio processing and synthesis.

Chapter 12, Deterministic Dynamical Systems, describes the dynamical processes underlying particular types of data. It illustrates simulation techniques for discrete-time dynamical systems, as well as for ordinary differential equations and partial differential equations.

Chapter 13, Stochastic Dynamical Systems, describes the dynamical random processes underlying particular types of data. It illustrates simulation techniques for discrete-time Markov chains, point processes, and stochastic differential equations.

Chapter 14, Graphs, Geometry, and Geographic Information Systems, covers analysis and visualization methods for graphs, flight networks, road networks, maps, and geographic data.

Chapter 15, Symbolic and Numerical Mathematics, introduces SymPy, a computer algebra system that brings symbolic computing to Python. The chapter ends with an introduction to Sage, another Python-based system for computational mathematics.

To get the most out of this book 

This book is accessible to beginners. However, it may be easier for you if you are familiar with the contents of Learning IPython for Interactive Computing and Data Visualization, Second Edition, Packt Publishing (also called the "IPython minibook"), the prequel of this book. The minibook introduces Python programming, the IPython console, the Jupyter Notebook, numerical computing with NumPy, basic data analysis with pandas, and plotting with Matplotlib. This book tackles scientific programming topics that rely on all of these tools.

Part 2 is a bit more theoretical. It is easier to read if you know the basics of calculus, linear algebra, and probability theory (real-valued functions, integrals and derivatives, differential equations, matrices, vector spaces, probabilities, random variables, and so on). These chapters introduce different topics in data science and applied mathematics, and how to apply them with Python: statistics, machine learning, numerical optimization, signal processing, dynamical systems, graph theory, and others.

Installing Python

This book uses the free Anaconda distribution (https://www.anaconda.com/download/). It includes Python 3, IPython, Jupyter, and almost all of the packages that we will be using in this book. Anaconda also includes a powerful packaging system named Conda. The introduction of this book's first chapter gives you more details.

The code in this book has been written for Python 3 and does not support the older Python 2 (although minimal changes would be required to make it compatible).
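As a quick sanity check of such an installation, the following minimal sketch reports whether the interpreter is Python 3 and whether a few of the libraries named in this preface can be imported. The helper name check_environment and the default package list are illustrative assumptions, not part of the book's code.

```python
import importlib.util
import sys


def check_environment(packages=("IPython", "numpy", "pandas", "matplotlib")):
    """Return (is_python3, {package: available}) for the current interpreter.

    The default package list is an illustrative assumption; adjust it as needed.
    """
    is_python3 = sys.version_info.major >= 3
    # find_spec returns None when a top-level module cannot be located.
    available = {
        name: importlib.util.find_spec(name) is not None for name in packages
    }
    return is_python3, available


is_python3, available = check_environment()
print("Python 3 or later:", is_python3)
for name, found in available.items():
    print(f"{name}: {'found' if found else 'missing'}")
```

Any package reported as missing can typically be installed with Conda or pip before running the recipes.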

GitHub repositories

This book has a website: http://ipython-books.github.io. The text, the code, and the data from the book are available on several GitHub repositories at https://github.com/ipython-books/. You can also run the code interactively in your web browser without installing anything on your computer, thanks to the Binder project.

Be sure to check out http://ipython-books.github.io and the repositories to get the latest updates and corrections. You can also propose your own corrections and suggestions on GitHub by opening issues or pull requests.

You can also follow the author online (http://cyrille.rossant.net) and on Twitter (@cyrillerossant).

Download the example code files

You can download the example code files for this book from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files emailed directly to you.

You can download the code files by following these steps:

1. Log in or register at http://www.packtpub.com.
2. Select the SUPPORT tab.
3. Click on Code Downloads & Errata.
4. Enter the name of the book in the Search box and follow the on-screen instructions.

Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:

WinRAR / 7-Zip for Windows
Zipeg / iZip / UnRarX for Mac
7-Zip / PeaZip for Linux

The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/. We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Download the color images

We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: https://www.packtpub.com/sites/default/files/downloads/.

Conventions used

There are a number of text conventions used throughout this book.

CodeInText: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles.

A block of code is set as follows:

>>> print("Hello world!")
Hello world!

When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:

>>> print("Hello world!")
Hello world!

Any command-line input or output is written as follows:

# cp /usr/src/asterisk-addons/configs/cdr_mysql.conf.sample
     /etc/asterisk/cdr_mysql.conf

Bold: Indicates a new term, an important word, or words that you see on the screen. For example, words in menus or dialog boxes appear in the text like this. Here is an example: "Select System info from the Administration panel."

Note

Warnings or important notes appear in a box like this.

Note

Tips and tricks appear like this.

Sections

In this book, you will find several headings that appear frequently (Getting ready, How to do it..., How it works..., There's more..., and See also).

To give clear instructions on how to complete a recipe, use these sections as follows:

Getting ready

This section tells you what to expect in the recipe and describes how to set up any software or any preliminary settings required for the recipe.

How to do it…

This section contains the steps required to follow the recipe.

How it works…

This section usually consists of a detailed explanation of what happened in the previous section.

There's more…

This section provides additional information to deepen your understanding of the recipe.

Get in touch

Feedback from our readers is always welcome.

General feedback: Email <[email protected]> and mention the book's title in the subject of your message. If you have questions about any aspect of this book, please email us at <[email protected]>.

Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report it to us. Please visit http://www.packtpub.com/submit-errata, select your book, click on the Errata Submission Form link, and enter the details.

Piracy: If you come across any illegal copies of our works in any form on the Internet, we would be grateful if you would provide us with the location address or website name. Please contact us at <[email protected]> with a link to the material.

If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit http://authors.packtpub.com.

Reviews

Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!

For more information about Packt, please visit packtpub.com.
