In this section, we will look at a small test program for a common scientific algorithm as written in Fortran and Python. Issues related to efficiency and general software engineering will be addressed.
Rosetta Code (http://rosettacode.org/wiki/Rosetta_Code) is an excellent site that contains solutions to many problems in different programming languages. Although there is no guarantee that the code samples contained on the site are optimal (in whatever sense the word "optimal" is being used), its goal is to present a solution usable by visitors who are learning a new language. As such, the code is generally clear and well-organized. The following examples are from the site. All code is covered under the GNU Free Documentation License 1.2.
From http://rosettacode.org/wiki/Fast_Fourier_transform#Fortran:
module fft_mod
  implicit none
  integer, parameter :: dp = selected_real_kind(15, 300)
  real(kind=dp), parameter :: pi = 3.141592653589793238460_dp
contains

  ! In-place Cooley-Tukey FFT
  recursive subroutine fft(x)
    complex(kind=dp), dimension(:), intent(inout) :: x
    complex(kind=dp) :: t
    integer :: N
    integer :: i
    complex(kind=dp), dimension(:), allocatable :: even, odd

    N = size(x)

    if (N .le. 1) return

    allocate(odd((N+1)/2))
    allocate(even(N/2))

    ! divide
    odd  = x(1:N:2)
    even = x(2:N:2)

    ! conquer
    call fft(odd)
    call fft(even)

    ! combine
    do i = 1, N/2
      t = exp(cmplx(0.0_dp, -2.0_dp*pi*real(i-1,dp)/real(N,dp), kind=dp))*even(i)
      x(i)     = odd(i) + t
      x(i+N/2) = odd(i) - t
    end do

    deallocate(odd)
    deallocate(even)
  end subroutine fft

end module fft_mod

program test
  use fft_mod
  implicit none
  complex(kind=dp), dimension(8) :: data = (/1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0/)
  integer :: i

  call fft(data)

  do i = 1, 8
    write(*,'("(", F20.15, ",", F20.15, "i )")') data(i)
  end do
end program test
From http://rosettacode.org/wiki/Fast_Fourier_transform#Python:
from cmath import exp, pi

def fft(x):
    N = len(x)
    if N <= 1:
        return x
    even = fft(x[0::2])
    odd = fft(x[1::2])
    T = [exp(-2j * pi * k / N) * odd[k] for k in range(N // 2)]
    return [even[k] + T[k] for k in range(N // 2)] + \
           [even[k] - T[k] for k in range(N // 2)]

print(' '.join("%5.3f" % abs(f)
               for f in fft([1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0])))

(The original listing uses Python 2's xrange and integer division; it is shown here in Python 3 syntax, with range and the floor-division operator //.)
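Before comparing the two implementations, it is worth verifying that the pure-Python version is correct. One way to do this, assuming numpy is installed, is to compare its output against numpy.fft.fft on the same input; the recursive function is reproduced here (in Python 3 syntax) so the snippet is self-contained:

```python
# Sketch: cross-checking the recursive pure-Python FFT against
# numpy.fft.fft on the same input (assumes numpy is available).
from cmath import exp, pi
import numpy as np

def fft(x):
    # Recursive Cooley-Tukey FFT from the listing, Python 3 syntax.
    N = len(x)
    if N <= 1:
        return x
    even = fft(x[0::2])
    odd = fft(x[1::2])
    T = [exp(-2j * pi * k / N) * odd[k] for k in range(N // 2)]
    return ([even[k] + T[k] for k in range(N // 2)] +
            [even[k] - T[k] for k in range(N // 2)])

data = [1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0]
ours = fft(data)
reference = np.fft.fft(data)
# The two results should agree to within floating-point rounding error.
print(np.allclose(ours, reference))
```

This kind of cross-check against a well-tested library is cheap insurance when reimplementing a standard algorithm.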
It would be difficult to compare the performance of these programs. The time required to run a program can be influenced by many things outside of the inherent properties of the language:
Skilled Fortran and Python programmers could find optimizations at the code level
Optimizing compilers vary in quality
Underlying libraries (for example, numpy) could be substituted in and affect performance
Critical sections could be coded in a compiled language (for example, Cython) or even assembly language, yielding a major speedup without affecting most of the lines of code
The architecture of the machine itself could have an impact
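To see just how sensitive such comparisons are, one could time the pure-Python version with the standard timeit module; the recursive function from the listing is reproduced here so the snippet is self-contained. The absolute numbers depend on the machine, interpreter, and input size, so only relative comparisons on the same setup are meaningful:

```python
# Sketch: timing the recursive pure-Python FFT with the standard
# timeit module. Absolute timings vary by machine and interpreter;
# only relative comparisons on the same setup are meaningful.
import timeit
from cmath import exp, pi

def fft(x):
    # Same recursive Cooley-Tukey FFT as in the listing (Python 3 syntax).
    N = len(x)
    if N <= 1:
        return x
    even = fft(x[0::2])
    odd = fft(x[1::2])
    T = [exp(-2j * pi * k / N) * odd[k] for k in range(N // 2)]
    return ([even[k] + T[k] for k in range(N // 2)] +
            [even[k] - T[k] for k in range(N // 2)])

data = [1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0]
# Run the transform many times and keep the best of three totals,
# which reduces the influence of background load on the machine.
elapsed = min(timeit.repeat(lambda: fft(data), number=10000, repeat=3))
print("10000 runs of an 8-point FFT: %.3f s" % elapsed)
```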
The question of how fast code runs is independent of the question of how long the code takes to write, debug, and maintain. It is notoriously difficult to estimate, before development has started, how much time and effort writing a program will require. This uncertainty persists throughout the development cycle, with many projects going well over time and budget. Even when coding is complete, it can be difficult to tell how efficient the entire process was. A means of measuring effort would help answer these questions.
There are two primary ways to measure the amount of effort required to write software: complexity-based metrics and size-based metrics.
Complexity-based metrics focus on one of the following:
Code-level complexity (number of variables, loop nesting, branching complexity, and cyclomatic complexity)
Functionality (based on intuitive ideas of how difficult implementing different pieces of functionality might be)
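As a concrete illustration of a code-level complexity measure, the following sketch approximates cyclomatic complexity by counting decision points in a function's abstract syntax tree. The helper name branch_count and the counting conventions are illustrative assumptions, not a standard tool (real tools handle more constructs, such as exception handlers and comprehensions):

```python
# Sketch: approximating cyclomatic complexity by counting decision
# points (if/for/while/conditional expressions/boolean operators)
# in Python source. `branch_count` is an illustrative helper, not
# a standard API; real complexity tools count more constructs.
import ast

def branch_count(source):
    """Return 1 + the number of decision points, a rough proxy for
    McCabe's cyclomatic complexity."""
    tree = ast.parse(source)
    count = 1  # one path through straight-line code
    for node in ast.walk(tree):
        if isinstance(node, (ast.If, ast.For, ast.While, ast.IfExp)):
            count += 1
        elif isinstance(node, ast.BoolOp):
            # 'a and b and c' adds len(values) - 1 decision points
            count += len(node.values) - 1
    return count

simple = "def f(x):\n    return x + 1\n"
branchy = "def g(x):\n    if x > 0 and x < 10:\n        return x\n    return 0\n"
print(branch_count(simple), branch_count(branchy))  # prints: 1 3
```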
Complexity-based measures have the advantage that they tend to match intuitive ideas of what complexity is and what types of code are complex (that is, difficult to write and debug). The primary drawback is that such measures often seem arbitrary. Especially before a project is started, it can be difficult to tell how much effort will be required to write a particular piece of functionality. Too many things can change between project specification and coding. This effect is even greater on large projects, where the separation between specification and implementation can be years long.
Size-based metrics focus on a property that can be expressed on a linear scale, for example:
Lines of code (LOC, or thousands of LOC, also known as KLOC)
Lines of machine code (post-compilation)
Cycles consumed (code that uses more cycles is probably more important and harder to write)
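The first of these, LOC, is trivial to compute. The sketch below counts nonblank, noncomment lines in a Python source string; treating blank lines and comment-only lines as noncode is a simplifying assumption, and real LOC counters differ on exactly these conventions:

```python
# Sketch: a minimal physical-LOC counter for Python source.
# It skips blank lines and lines containing only a comment; real
# tools (and definitions of LOC) differ on these conventions.
def count_loc(source):
    loc = 0
    for line in source.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            loc += 1
    return loc

sample = """\
# compute a square
def square(x):
    return x * x

print(square(3))
"""
print(count_loc(sample))  # counts 3 lines: def, return, print
```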
Size-based metrics have the advantage that they are easy to gather, understand, and objectively measure. In addition, LOC seems to be a decent correlate of project cost: the more lines of code a project contains, the more it costs to write. The most expensive part of a software project is paying the coders, and the more lines of code they have to write, the longer it probably takes them. If the lines of code could be estimated upfront, they would be a tool for estimating cost.
The primary drawback of size-based metrics is that their validity is questionable. It is often the case that better (clearer, faster, and easier-to-maintain) code will grade out as "smaller" under a size-based metric. In addition, such code is often easier to write, making the development team look even less productive. Bloated, buggy, and inefficient code can make a team look good under these metrics, but can be a disaster for the project overall.
Much of the class of projects this book is concerned with involves taking mathematics-based models and translating them into executable systems. In this case, we can consider the complexity of the problem to be fixed by the underlying model and concentrate on size-based measures: speed and lines of code. Speed was addressed previously, so the main remaining concern is LOC. As illustrated previously, Python programs tend to be shorter than Fortran programs. For a more detailed look, visit http://blog.wolfram.com/2012/11/14/code-length-measured-in-14-languages/.
Admittedly, such measures are fairly arbitrary. It is possible to write programs in a way that minimizes or maximizes the number of lines required, often to the detriment of overall quality. Absent such incentives, however, a programmer tends to produce roughly the same number of lines of code per day regardless of the language being used. This makes the relative conciseness of Python an important consideration when choosing a language to develop in.
In the past, most HPC and parallel programming was done on a small number of expensive machines. As such, the most important criterion by which programs were measured was execution speed, and Fortran was an excellent solution to the problem of writing fast, efficient programs. This environment was acceptable to the community that needed to perform these types of calculations, and it gradually separated from mainstream commercial computing, which developed other concerns.
The birth of cloud computing (and cheaper hardware in general) and the evolution of big data have caused some in the commercial mainstream to reconsider using large, parallel systems. This reconsideration has brought commercial concerns to the fore: development and maintenance costs, testing, training, and so on. In this environment, a (small) trade-off in speed is worth significant gains in other areas. Python/IPython has demonstrated that it can provide these gains at a minimal runtime performance cost.