IPython Interactive Computing and Visualization Cookbook

By: Cyrille Rossant

Learning from text – Naive Bayes for Natural Language Processing

In this recipe, we show how to handle text data with scikit-learn. Working with text requires careful preprocessing and feature extraction, and the resulting feature matrices are often highly sparse.
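To make the point about feature extraction and sparsity concrete, here is a minimal sketch. It assumes scikit-learn's `TfidfVectorizer` as the feature extractor, which is one common choice and not necessarily the one used later in this recipe. The three comments are invented for illustration and are not taken from the dataset.

```python
import scipy.sparse as sp
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy comments invented for illustration; not taken from the dataset.
docs = [
    "you are a genius",
    "you are an idiot",
    "thanks for the helpful answer",
]

# Each comment becomes one row; each distinct token becomes one column.
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(docs)

# The result is a SciPy sparse matrix: only nonzero entries are stored.
n_rows, n_cols = X.shape
density = X.nnz / (n_rows * n_cols)
print("shape:", X.shape)
print("is sparse:", sp.issparse(X))
print("fraction of nonzero entries:", round(density, 2))
```

With a real corpus, the vocabulary grows to thousands of columns while each comment contains only a handful of distinct words, so the fraction of nonzero entries becomes very small.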

We will train a classifier to recognize whether a comment posted during a public discussion is considered insulting to one of the participants. We will use a labeled dataset from Impermium, released during a Kaggle competition.
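As a rough preview of this kind of task, the following sketch trains a Naive Bayes classifier on a handful of made-up comments. The data, the labels, and the choice of `BernoulliNB` with binary word-presence features are assumptions made for illustration; the real dataset is far larger and noisier.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy data: 1 = insulting, 0 = not insulting.
comments = [
    "you are an idiot",
    "what a stupid idiot",
    "shut up you moron",
    "thank you for the explanation",
    "great point, I agree",
    "interesting article, thank you",
]
labels = [1, 1, 1, 0, 0, 0]

# binary=True records only whether a word occurs, which matches the
# Bernoulli model's assumption of binary features.
model = make_pipeline(CountVectorizer(binary=True), BernoulliNB())
model.fit(comments, labels)

new_comments = ["you stupid moron", "thank you, great article"]
predictions = model.predict(new_comments)
probabilities = model.predict_proba(new_comments)  # columns: class 0, class 1
print(predictions)
```

Wrapping the vectorizer and the classifier in a single pipeline ensures that new comments are transformed with the vocabulary learned during training.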

Getting ready

Download the Troll dataset from the book's GitHub repository at https://github.com/ipython-books/cookbook-data.

This dataset was obtained from Kaggle, at www.kaggle.com/c/detecting-insults-in-social-commentary.

How to do it...

Let's import our libraries:

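In [1]: import numpy as np
        import pandas as pd
        import sklearn
        # cross_validation and grid_search were merged into
        # sklearn.model_selection (scikit-learn 0.18+); the aliases are kept.
        import sklearn.model_selection as cv
        import sklearn.model_selection as gs
        import sklearn.feature_extraction.text as text
        import sklearn.naive_bayes as nb...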