An example case – Fast Fourier Transform


In this section, we will look at a small test program for a common scientific algorithm, the Fast Fourier Transform, written in both Fortran and Python. Issues related to efficiency and general software engineering will also be addressed.

Fast Fourier Transform

Rosetta Code (http://rosettacode.org/wiki/Rosetta_Code) is an excellent site that contains solutions to many problems in different programming languages. Although there is no guarantee that the code samples contained on the site are optimal (in whatever sense the word "optimal" is being used), its goal is to present a solution usable by visitors who are learning a new language. As such, the code is generally clear and well-organized. The following examples are from the site. All code is covered under the GNU Free Documentation License 1.2.

Fortran

From http://rosettacode.org/wiki/Fast_Fourier_transform#Fortran:

module fft_mod
  implicit none
  integer,       parameter :: dp=selected_real_kind(15,300)
  real(kind=dp), parameter :: pi=3.141592653589793238460_dp
contains

  ! In place Cooley-Tukey FFT
  recursive subroutine fft(x)
    complex(kind=dp), dimension(:), intent(inout)  :: x
    complex(kind=dp)                               :: t
    integer                                        :: N
    integer                                        :: i
    complex(kind=dp), dimension(:), allocatable    :: even, odd

    N=size(x)

    if(N .le. 1) return

    allocate(odd((N+1)/2))
    allocate(even(N/2))

    ! divide
    odd =x(1:N:2)
    even=x(2:N:2)

    ! conquer
    call fft(odd)
    call fft(even)

    ! combine
    do i=1,N/2
       t=exp(cmplx(0.0_dp,-2.0_dp*pi*real(i-1,dp)/real(N,dp),kind=dp))*even(i)
       x(i)     = odd(i) + t
       x(i+N/2) = odd(i) - t
    end do

    deallocate(odd)
    deallocate(even)

  end subroutine fft

end module fft_mod

program test
  use fft_mod
  implicit none
  complex(kind=dp), dimension(8) :: data = (/1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0/)
  integer :: i

  call fft(data)

  do i=1,8
     write(*,'("(", F20.15, ",", F20.15, "i )")') data(i)
  end do

end program test
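
To try the Fortran version, it must be compiled first. The following is a minimal sketch that assumes the module and test program above have been saved as fft.f90 and that gfortran is available; both the file name and the compiler choice are illustrative assumptions, not requirements. From IPython, the ! shell escape lets you build and run the program without leaving the session:

In [1]: !gfortran fft.f90 -o fft_test

In [2]: !./fft_test

The program prints the eight complex values of the transform; their magnitudes should match the output of the Python version shown next.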

Python

From http://rosettacode.org/wiki/Fast_Fourier_transform#Python:

from cmath import exp, pi

def fft(x):
    # Recursive Cooley-Tukey FFT; the input length must be a power of two
    N = len(x)
    if N <= 1:
        return x
    even = fft(x[0::2])
    odd = fft(x[1::2])
    # Twiddle factors applied to the odd-indexed half
    T = [exp(-2j * pi * k / N) * odd[k] for k in range(N // 2)]
    return [even[k] + T[k] for k in range(N // 2)] + \
           [even[k] - T[k] for k in range(N // 2)]

print(' '.join("%5.3f" % abs(f)
               for f in fft([1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0])))
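
Because NumPy ships its own FFT, a quick sanity check of the pure-Python version is straightforward. The following is a minimal sketch that assumes NumPy is installed and that the fft function above is already defined in the session; numpy.fft.fft and numpy.allclose are standard NumPy calls:

import numpy as np

signal = [1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0]

ours = fft(signal)            # the pure-Python version defined above
theirs = np.fft.fft(signal)   # NumPy's compiled FFT

# The two results should agree to within floating-point tolerance
print(np.allclose(ours, theirs))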

Performance concerns

It would be difficult to compare the performance of these two programs fairly. The time required to run a program is influenced by many factors beyond the inherent properties of the language:

  • Skilled Fortran and Python programmers could find optimizations at the code level

  • Optimizing compilers vary in quality

  • Underlying libraries (for example, numpy) could be substituted in and affect performance (see the sketch after this list)

  • Critical sections could be coded in a compiled language (for example, Cython) or even assembly language, yielding a major speedup without affecting most of the lines of code

  • The architecture of the machine itself could have an impact
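
As a simple illustration of the point about underlying libraries, the substitution effect is easy to observe from IPython itself. The following is a rough sketch rather than a benchmark: it assumes NumPy is installed, reuses the pure-Python fft defined earlier, and relies on the %timeit magic; the actual numbers will vary with the machine and the input size.

In [1]: import numpy as np

In [2]: signal = [1.0] * 512 + [0.0] * 512   # a 1024-point test input

In [3]: %timeit fft(signal)                  # pure-Python recursive version

In [4]: %timeit np.fft.fft(signal)           # NumPy's compiled FFT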

Software engineering concerns

The question of how fast code runs is independent of the question of how long it takes to write, debug, and maintain. It is notoriously difficult to estimate, before development has started, how much time and effort will be required to write a program. This uncertainty remains throughout the development cycle, with many projects going well over time and budget. Even when coding is complete, it can be difficult to tell how efficient the entire process was. A means of measuring effort would help answer these questions.

There are two primary ways to measure the amount of effort required to write software, described in the following sections.

Complexity-based metrics

Complexity-based metrics focus on one of two things:

  • Code-level complexity (number of variables, loop nesting, branching complexity, and cyclomatic complexity)

  • Functionality (based on intuitive ideas of how difficult implementing different pieces of functionality might be)

Complexity-based measures have the advantage that they tend to match intuitive ideas of what complexity is and what types of code are complex (that is, difficult to write and debug). The primary drawback is that such measures often seem arbitrary. Especially before a project is started, it can be difficult to tell how much effort will be required to write a particular piece of functionality. Too many things can change between project specification and coding. This effect is even greater on large projects, where the separation between specification and implementation can be years long.
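
As a concrete, if crude, illustration of code-level complexity, the decision points in a piece of Python source can be counted using nothing more than the standard library's ast module. This is a minimal sketch of the idea behind cyclomatic complexity, not a replacement for a real tool; the branch_count function and the classify sample are hypothetical, written only for this example.

import ast

def branch_count(source):
    """Count branching constructs as a rough proxy for cyclomatic complexity."""
    decision_nodes = (ast.If, ast.For, ast.While, ast.BoolOp, ast.comprehension)
    return sum(isinstance(node, decision_nodes)
               for node in ast.walk(ast.parse(source)))

sample = """
def classify(values):
    result = []
    for v in values:
        if v > 0 and v % 2 == 0:
            result.append('positive even')
        else:
            result.append('other')
    return result
"""

print(branch_count(sample))   # 3: the for loop, the if, and the 'and'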

Size-based metrics

Size-based metrics focus on a property that can be expressed on a linear scale, for example:

  • Lines of code (LOC, or thousands of LOC, also known as KLOC)

  • Lines of machine code (post-compilation)

  • Cycles consumed (code that uses more cycles is probably more important and harder to write)

Size-based metrics have the advantage that they are easy to gather, understand, and objectively measure. In addition, LOC seems to be a decent correlate of project cost: the more lines of code in a project, the more it costs to write. The most expensive part of a software project is paying the coders, and the more lines of code they have to write, the longer it will probably take them. If the number of lines of code could be estimated upfront, it would be a useful tool for estimating cost.

The primary drawback is that it is unclear how valid such metrics really are. It is often the case that better (clearer, faster, and easier-to-maintain) code will grade out as "smaller" under a size-based metric. In addition, such code is often easier to write, yet the team producing it will appear less productive by these measures. Bloated, buggy, and inefficient code, by contrast, can make a team look good under these metrics, but can be a disaster for the project overall.

As for the class of projects this book is concerned with, much of the work involves taking mathematics-based models and translating them into executable systems. In this case, we can consider the complexity of the problem as fixed by the underlying model and concentrate on size-based measures: speed and lines of code. Speed was addressed previously, so the main remaining concern is LOC. As the examples earlier in this section illustrate, Python programs tend to be shorter than Fortran programs. For a more detailed look, visit http://blog.wolfram.com/2012/11/14/code-length-measured-in-14-languages/.
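
As a rough illustration of how easy a size-based metric is to gather, the following sketch counts non-blank, non-comment lines in a source file. It is deliberately naive (it ignores block comments, docstrings, and continuation lines), and the file names at the bottom are placeholders for wherever the two implementations above were saved:

def count_loc(path, comment_prefixes=('#', '!')):
    """Count non-blank lines that are not pure comment lines.

    '#' covers Python comments and '!' covers free-form Fortran comments.
    """
    with open(path) as f:
        return sum(1 for line in f
                   if line.strip() and not line.strip().startswith(comment_prefixes))

# Hypothetical file names for the two FFT implementations shown earlier
for path in ('fft.f90', 'fft.py'):
    print(path, count_loc(path))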

Admittedly, such measures are fairly arbitrary. It is possible to write programs in a way that minimizes or maximizes the number of lines required, often to the detriment of overall quality. Absent such incentives, however, a programmer tends to produce roughly the same number of lines of code per day regardless of the language being used. This makes the relative conciseness of Python an important consideration when choosing a language to develop in.

Where we stand now

In the past, most HPC and parallel programming was done on a limited number of expensive machines. As such, the most important criterion by which programs were measured was execution speed. Fortran was an excellent solution to the problem of writing fast, efficient programs. This environment suited the community that needed to perform these types of calculations, and it gradually separated from mainstream commercial computing, which developed other concerns.

The birth of cloud computing (and of cheaper hardware in general) and the growth of big data have caused some in the commercial mainstream to reconsider using large, parallel systems. This reconsideration has brought commercial concerns to the fore: development and maintenance costs, testing, training, and so on. In this environment, a (small) trade-off in speed is worth it for significant gains in other areas. Python/IPython has demonstrated that it can provide these gains at a minimal cost in runtime performance.