High Performance Computing


At this point, we have to leave consumer computing aside for a while. As computing hardware became more affordable, the need for most people to have programs run as efficiently as possible diminished. Other criteria entered the picture: graphical interfaces, multitasking, interactivity, and so on. Usability became more important than raw speed.

This, however, was not true for everybody. There remained a small (but devoted) group of users and programmers for whom efficiency was not just the most important thing. It was the only thing. These groups hung out in nuclear labs and intelligence agencies and had money to spend on exotic hardware and highly skilled coders. Thus was born High Performance Computing (HPC).

True to the nature of HPC, its implementations have been chosen with efficiency in mind: HPC systems are highly parallel, batch-oriented, and run Fortran. It is so important to the users of HPC systems that their programs run quickly that they have ignored any and all advances in the field that did not result in faster programs.

The HPC learning curve

This was a satisfactory relationship for some time. The types of problems of interest to the HPC community (complicated physical modeling and advanced mathematics) had little overlap with the rest of computer science. HPC was a niche with a very high barrier to entry. After all, there were just not that many massively parallel computers to go around.

In a sense, then, programming HPC systems was an island. On the island, there were ongoing research programs centered on important HPC-centric questions. Tools were built, skills were honed, and a community of practice grew to the point that approaching HPC from the outside could be daunting. Advances occurred outside of HPC as well, but those inside it had their own concerns.

As time passed, the HPC island drifted further and further from mainstream computing. New areas opened up: web computing, mobile computing, agile methods, and many others. HPC took what it needed from these areas, but nothing really affected it. Until something finally did…

Cloudy with a chance of parallelism (or Amazon's computer is bigger than yours)

Amazon had a problem. During the Christmas season, it used a lot of computing power; for the rest of the year, those computers would sit idle. If there were some way to let people rent time on the idle machines, Amazon could make money. The result was an API that allowed people to store data on those machines (the Amazon Simple Storage Service, or S3) and an API that allowed people to run programs on the same machines (the Amazon Elastic Compute Cloud, or EC2). Together, these made up the start of the Amazon Cloud.
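
As a rough illustration, a minimal sketch using the third-party boto3 library shows what those two services look like from Python; the bucket name, object key, and AMI ID below are placeholders, and configured AWS credentials are assumed:

    import boto3

    # S3: store and retrieve data on Amazon's machines
    s3 = boto3.client("s3")
    s3.put_object(Bucket="example-bucket", Key="results.txt", Body=b"hello, cloud")
    obj = s3.get_object(Bucket="example-bucket", Key="results.txt")
    print(obj["Body"].read())

    # EC2: rent a (virtual) machine to run programs on
    ec2 = boto3.client("ec2")
    ec2.run_instances(ImageId="ami-12345678",   # placeholder AMI ID
                      InstanceType="t2.micro",
                      MinCount=1, MaxCount=1)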

While not the first system to rent out excess capacity (CompuServe started off the same way several decades earlier), Amazon Cloud was the first large-scale system that provided the general public paid access to virtually unlimited storage and computing power.

It is not clear whether anybody realized what this meant at first. There are a lot of uses of clouds—overflow capacity, mass data storage, and redundancy, among others—that have a wide appeal. For our purposes, the cloud meant one thing: now everybody has access to a supercomputer. HPC will never be the same again.

HPC and parallelism

The current relationship between HPC and highly parallel architectures is relatively new. It was only in the 1990s that HPC left the realm of very fast single-processor machines for massively parallel architectures. In one sense, this was unfortunate, as the old Cray machines were aesthetic marvels:

The image (a Cray-2) is in the public domain: https://commons.wikimedia.org/wiki/File:Cray2.jpeg

It was largely inevitable, however, as single-processor systems were bumping up against physical limitations involving transistor density and cooling.

The change in architecture did not bring with it a change in the problems to be solved. To this end, the generic supercomputer physical architecture evolved toward:

  • Commodity processors—not custom-fast but top-of-the-line and homogeneous

  • Commodity RAM—ditto

  • High-end hard drives—lots of smaller, low-latency models (now turning into solid state drives)

  • Super-fast interconnection networks

Moving from single to multiple processors brought issues with locality. Every time a program running on one processor needed data from another processor (or disk), processing could come to a halt as the data was being retrieved. The physical architecture of the supercomputer is meant to minimize the latency associated with non-local data access.
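
To see why locality matters in practice, consider a minimal sketch using ipyparallel (assuming a local cluster has been started with ipcluster start -n 4): it contrasts summing an array that already sits in local memory with shipping the same array to remote engines and summing it there. The data transfer, not the arithmetic, dominates the remote timing.

    import time
    import numpy as np
    import ipyparallel as ipp

    rc = ipp.Client()     # connect to the running cluster
    dview = rc[:]         # a direct view over all engines

    data = np.random.rand(5 * 10**6)

    t0 = time.perf_counter()
    local_sum = data.sum()                      # local access: no data movement
    print("local sum:  %.4f s" % (time.perf_counter() - t0))

    t0 = time.perf_counter()
    dview.push({"data": data}, block=True)      # ship the array to every engine
    dview.execute("partial = data.sum()", block=True)
    partials = dview.pull("partial", block=True)
    print("remote sum: %.4f s" % (time.perf_counter() - t0))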

Given the position of HPC centers as early adopters of parallel architectures, "parallel programming" came to be largely synonymous with "HPC programming." This is largely a historical accident, and new paradigms have opened up parallel computing to constituencies outside of the HPC world. As such, this book will use the two terms interchangeably.

We now turn to one of the new paradigms, cloud computing, and discuss its similarities to, and differences from, standard HPC.

Clouds and HPC

There are some differences between a "real" supercomputer and what most clouds offer. In particular, a cloud's physical architecture will contain:

  • Commodity processors—not necessarily fast, but they make up for it in sheer numbers

  • Commodity RAM—ditto

  • Commodity hard drives—smaller, but larger in aggregate

  • Slow(er) interconnection networks

In addition, clouds are generally heterogeneous and easily scaled. While an initial cloud is likely to have many subsystems with the same processor, RAM, hard drives, and so on, over time new subsystems will be added, with newer (or at least different) technology. The loose coupling of cloud systems encourages this sort of organic growth.

Differences in architecture mean that some algorithms run well on supercomputers while others favor clouds. A lot of software that runs on supercomputers will not run on clouds, period (and vice versa). This is not always just a matter of recompiling for a new target platform or using different libraries; the underlying algorithm may simply not be suited to a particular paradigm.

If speed is imperative and you have the budget, there is still no substitute for a special-purpose HPC system. If cost, ease of access, redundancy, and massive parallelism are desired, a cloud fits the bill.

That is not to say the two worlds (HPC and cloud) are completely distinct. Despite these architectural differences, it is worth noting that an Amazon EC2 C3 instance cluster was ranked 134th on the TOP500 list of the fastest HPC systems as of June 2015. Even on HPC's own terms, cloud computers offer respectable performance.

The core audience for this book then consists of members of both of these groups:

  • Python programmers looking to expand into HPC/parallel-style programming

  • HPC/parallel programmers looking to employ Python

Each group has the skills the other wants. HPC programmers understand scientific computing, efficiency, and parallelism. Python programmers are skilled in interactivity, usability, correctness, powerful development tools, ease of debugging, and other capabilities that mainstream computing values. New technology means that future systems will need to incorporate elements from both skill sets.