
IPython Interactive Computing and Visualization Cookbook - Second Edition

By: Cyrille Rossant

Overview of this book

Python is one of the leading open source platforms for data science and numerical computing. IPython and the associated Jupyter Notebook offer efficient interfaces to Python for data analysis and interactive visualization, and they constitute an ideal gateway to the platform. IPython Interactive Computing and Visualization Cookbook, Second Edition contains many ready-to-use, focused recipes for high-performance scientific computing and data analysis, from the latest IPython/Jupyter features to the most advanced tricks, to help you write better and faster code. You will apply these state-of-the-art methods to various real-world examples, illustrating topics in applied mathematics, scientific modeling, and machine learning. The first part of the book covers programming techniques: code quality and reproducibility, code optimization, high-performance computing through just-in-time compilation, parallel computing, and graphics card programming. The second part tackles data science, statistics, machine learning, signal and image processing, dynamical systems, and pure and applied mathematics.
Table of Contents (19 chapters)
IPython Interactive Computing and Visualization Cookbook - Second Edition
Contributors
Preface
Index

Performing out-of-core computations on large arrays with Dask


Dask is a parallel computing library that offers not only a general framework for distributing complex computations on many nodes, but also a set of convenient high-level APIs for out-of-core computations on large arrays. Dask provides data structures resembling NumPy arrays (dask.array) and pandas DataFrames (dask.dataframe) that scale efficiently to huge datasets. The core idea of Dask is to split a large array into smaller arrays (chunks) that each fit in memory. Operations are then applied chunk by chunk, possibly in parallel, so the whole array never needs to be loaded into RAM at once.
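To make the chunking idea concrete, here is a minimal sketch. The array size and chunk shape are arbitrary choices for illustration. It wraps an in-memory NumPy array with da.from_array(), inspects how Dask splits it, and evaluates a reduction. Operations such as sum() only build a task graph, and calling compute() runs that graph chunk by chunk.

```python
import numpy as np
import dask.array as da

# Illustrative sizes: a 1,000 x 1,000 array split into 250 x 250 chunks
a = np.arange(1_000_000, dtype=np.float64).reshape(1_000, 1_000)
x = da.from_array(a, chunks=(250, 250))

print(x.numblocks)   # (4, 4): the array is split into 16 chunks
print(x.chunks[0])   # (250, 250, 250, 250): chunk sizes along the first axis

# sum() is lazy: it returns a Dask array describing the computation
total = x.sum()
# compute() sums each chunk, then combines the partial sums
print(total.compute())  # 499999500000.0
```

In a real out-of-core setting, the chunks would typically come from a file or be generated on the fly instead of from an existing NumPy array. The chunk shape is a trade-off: larger chunks mean less scheduling overhead but more memory per task.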

In this recipe, we illustrate the basic principles of dask.array.

Getting ready

Dask should already be installed in Anaconda, but you can always install it manually with conda install dask. You also need memory_profiler, which you can install with conda install memory_profiler.

How to do it...

  1. Let's import the libraries:

    >>> import numpy as np
        import dask.array as da
        import memory_profiler
    >>> %load_ext memory_profiler
  2. We initialize a large 10,000 x 10,000 array...