Mastering IPython 4.0

By : Thomas Bitterman, Dipanjan Deb

Overview of this book

IPython is an interactive computational environment in which you can combine code execution, rich text, mathematics, plots, and rich media. This book will bring IPython developers up to date with the latest advancements in IPython and dive deep into interactive computing with IPython. This advanced guide to interactive and parallel computing with IPython explores advanced visualizations and high-performance computing with IPython in detail. You will quickly brush up on IPython kernels and wrapper kernels, and then we'll move on to advanced concepts such as testing, Sphinx, JS events, interactive work, and the ZMQ cluster. The book also covers topics such as the IPython Console Lexer, advanced configuration, and third-party tools. By the end of this book, you will be able to use IPython for interactive and parallel computing in a high-performance computing environment.
Table of Contents (18 chapters)
Mastering IPython 4.0
Credits
About the Author
About the Reviewer
www.PacktPub.com
Preface
6. Works Well with Others – IPython and Third-Party Tools
Index

Choosing between IPython and Fortran


We will start by taking a look at each language in general, and follow that with a discussion on the cost factors that impact a software project and how each language can affect them. No two software development projects are the same, and so the factors discussed next (along with many, many others) should serve as guidelines for the choice of language. This chapter is not an attempt to promote IPython at the expense of Fortran, but it shows that IPython is a superior choice when implementing certain important types of systems.

Fortran

Many of the benefits and drawbacks of Fortran are linked to its longevity. For the kinds of things that have not changed over the decades, Fortran excels (for example, numerical computing, which is what the language was originally designed for). Newer features (for example, text processing and objects) have been added to the language over its various revisions.

The benefits of Fortran are as follows:

  • Compilation makes for efficient runtime performance

  • Existence of many tested and optimized libraries for scientific computing

  • Highly portable

  • Optimized for scientific computing (especially matrix operations)

  • Stable language definition with a well-organized system for revisions

The drawbacks of Fortran are as follows:

  • Text processing is an add-on

  • Object-orientation is a recent addition

  • Shrinking pool of new talent

IPython

IPython/Python is the new kid in town. It began in 2001 when Fernando Perez decided that he wanted some additional features out of Python. In particular, he wanted a more powerful command line and integration with a lab-notebook-style interface. The end result was a development environment that placed greater emphasis on ongoing interaction with the system than traditional batch processing did.

The nearly 45-year gap between the advent of Fortran and IPython's birth gave IPython the advantage of natively incorporating ideas about programming that have arisen since Fortran was created (for example, object-orientation and sophisticated data structuring operations). However, its relative newness puts it behind in terms of installed code base and libraries. IPython, as an extension of Python, shares Python's benefits and drawbacks to a large extent.

The benefits of IPython are as follows:

  • Good at non-numeric computing

  • More concise

  • Many object-oriented features

  • Ease of adoption

  • Useful libraries

  • Sophisticated data structuring capabilities

  • Testing and documentation frameworks

  • Built-in visualization tools

  • Ease of interaction while building and running systems

The drawbacks of IPython are as follows:

  • Its interpreted nature makes for slower runtime

  • Fewer libraries (although the ones that exist are of high quality)
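The following small illustration covers two of the listed benefits: non-numeric computing and data structuring. It uses only the standard library's `collections` module. The function name `summarize` and the sample sentence are hypothetical. This is a sketch of typical Python style, not code from this book.

```python
from collections import Counter, defaultdict


def summarize(text):
    """Return word frequencies and the distinct words grouped by first letter."""
    # Normalize: lowercase each word and strip common punctuation from its ends.
    words = [w.strip(".,;:!?").lower() for w in text.split()]
    words = [w for w in words if w]  # drop empty strings left by bare punctuation

    freq = Counter(words)            # word -> number of occurrences

    by_letter = defaultdict(set)     # first letter -> set of distinct words
    for w in words:
        by_letter[w[0]].add(w)

    # Sort keys and values so the result is stable and easy to read.
    return freq, {k: sorted(v) for k, v in sorted(by_letter.items())}


if __name__ == "__main__":
    freq, groups = summarize("Fortran computes; Python coordinates. Python tests, Fortran runs.")
    print(freq.most_common(2))
    print(groups)
```

Counting and grouping need no manual memory management or fixed-size arrays. This is the kind of task that the "text processing is an add-on" drawback refers to on the Fortran side.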

Some of these benefits deserve more extensive treatment here, while others merit entire chapters.

Object-orientation

Object-oriented programming (OOP) grew out of simulation: Simula, the first object-oriented language, was designed for writing simulations. While some simulations reduce to the computational application of physical laws (for example, fluid dynamics), other types of simulation (for example, traffic patterns and neural networks) require modeling the entities involved at a more abstract level. This is more easily accomplished with a language that supports classes and objects (such as Python) than with a purely procedural language.

The ability to match a program's structure to a problem's structure makes it easier to write, test, and debug a system. The OOP paradigm is especially well suited to simulating a large number of individually identifiable, complex elements.
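Here is a minimal sketch of that idea, assuming a toy single-lane traffic model. Each car is an object with its own identity and state, and a road object advances all of them one time step at a time. The class names, the `min_gap` rule, and the units are hypothetical choices for illustration, not a model from this book.

```python
from dataclasses import dataclass, field


@dataclass
class Car:
    """One individually identifiable vehicle on a single-lane road."""
    ident: int
    position: float  # metres from the start of the road
    speed: float     # metres advanced per time step


@dataclass
class Road:
    length: float
    min_gap: float = 5.0  # hypothetical minimum distance to the car ahead, metres
    cars: list = field(default_factory=list)

    def step(self):
        """Advance every car one time step without closing within min_gap of the car ahead."""
        # Process cars front to back so each car sees where its leader ended up.
        ordered = sorted(self.cars, key=lambda c: c.position, reverse=True)
        leader_pos = None
        for car in ordered:
            target = car.position + car.speed
            if leader_pos is not None:
                target = min(target, leader_pos - self.min_gap)
            car.position = max(car.position, target)  # cars never move backwards
            leader_pos = car.position
        # Cars past the end of the road leave the simulation.
        self.cars = [c for c in self.cars if c.position <= self.length]


if __name__ == "__main__":
    road = Road(length=100.0, cars=[Car(1, 10.0, 10.0), Car(2, 0.0, 20.0)])
    road.step()
    print([(c.ident, c.position) for c in road.cars])
```

The faster car 2 is held back by car 1 because each object keeps its own state. Adding a new kind of entity (a truck, a traffic light) would mean adding a class, not rewriting a set of parallel arrays.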

Ease of adoption

Python is easy to learn. It is currently the most popular introductory programming language among the top 39 computer science departments in the United States (http://cacm.acm.org/blogs/blog-cacm/176450-python-is-now-the-most-popular-introductory-teaching-language-at-top-us-universities/fulltext).

Note that Fortran is not on the list.

This is no accident, nor is Python merely a "teaching language." Rather, it is a well-designed language with an easy-to-learn syntax and a gentle learning curve. It is much easier to learn Python than Fortran, and it is also easier to move from Fortran to Python than the reverse. This has led to increasing use of Python in many areas.

Popularity – Fortran versus IPython

The trend toward teaching Python has meant that there is a much larger pool of potential developers who know Python. This is an important consideration when staffing a project.

TIOBE Software ranks the popularity of programming languages based on the number of skilled engineers, courses, and third-party vendors. Its rankings for October 2015 put Python in fifth place and rising. Fortran was 22nd (behind COBOL, at 21st).

IEEE Spectrum uses its own ranking methodology, which produced the following graph:

The column on the left is the 2015 ranking, and the column on the right is the 2014 ranking, for comparison. Fortran came in 29th, with a Spectrum ranking of 39.5.

Useful libraries

The growing number of Python coders has led to an increasing number of libraries written in and for Python. SciPy, NumPy, and Sage are leading the way, with new open source libraries coming out on a regular basis. The usefulness of a language depends heavily on its libraries. Python cannot yet match Fortran's depth in this area, but the sheer number of Python developers means that it is closing the gap rapidly.

The cost of building (and maintaining) software

If all developers were equally talented and worked for free, development time were no object, all code were bug-free, and every program needed to run only once before being thrown away, Fortran would be the clear winner given its efficiency and installed library base.

This is not how commercial software is developed. To a first approximation, a software project's cost can be broken down into the cost of several parts:

  • Requirements and specification gathering

  • Development

  • Execution

  • Testing and maintenance
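The four-part breakdown can be written as a simple sum, which makes it easier to see which term dominates a given project. This is a sketch under stated assumptions. `ProjectCost` and its fields are hypothetical names. Execution cost is assumed to be a per-run cost times the number of runs. The numbers in the usage line are made up.

```python
from dataclasses import dataclass


@dataclass
class ProjectCost:
    """First-approximation project cost; all figures share one (hypothetical) unit."""
    requirements: float   # requirements and specification gathering
    development: float    # writing the system
    cost_per_run: float   # execution cost of one production run
    runs: int             # how many times the program will be run
    maintenance: float    # testing and maintenance over the system's lifetime

    def execution(self):
        # Execution cost grows with how often the program is run.
        return self.cost_per_run * self.runs

    def total(self):
        return self.requirements + self.development + self.execution() + self.maintenance


if __name__ == "__main__":
    example = ProjectCost(requirements=10, development=50, cost_per_run=2, runs=100, maintenance=30)
    print(example.total())  # → 290
```

In this made-up example, execution accounts for 200 of the 290 units. The argument in the following subsections turns on exactly this kind of question.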

Requirements and specification gathering

There is no clear difference between IPython and Fortran in the difficulty of producing good requirements and specifications. These activities are language-independent. While the availability of prewritten software packages may affect parts of the specification, both languages are equally capable of turning requirements and specifications into a working system.

Development

As discussed previously, Python code tends to be more concise, leading to higher programmer productivity. Combined with the growing number of developers already fluent in Python, this makes Python the clear winner in terms of reducing development time.

Execution

If the program is costly to run on the target system (as is true for many supercomputers), or takes a long time to run (as is true for some large-scale simulations such as weather prediction), then Fortran's runtime efficiency becomes a decisive advantage. This consideration looms especially large once development of a program has largely concluded and most of the time spent on it goes into waiting for it to complete its run.
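One way to make this trade-off concrete is a break-even calculation. Suppose the faster-running version costs E more to develop and saves s on every run. It becomes cheaper overall once n·s > E, so the break-even point is the smallest n greater than E/s, that is, n = ⌊E/s⌋ + 1. The sketch below assumes both costs are known and expressed in the same unit. The function name is hypothetical.

```python
import math


def break_even_runs(extra_dev_cost, saving_per_run):
    """Smallest number of runs after which the faster-running version is cheaper overall.

    extra_dev_cost: how much more the faster version (e.g. Fortran) costs to develop.
    saving_per_run: how much less each run of the faster version costs.
    """
    if extra_dev_cost < 0 or saving_per_run <= 0:
        raise ValueError("expects a non-negative extra development cost and a positive per-run saving")
    # The faster version is strictly cheaper once runs * saving_per_run > extra_dev_cost.
    return math.floor(extra_dev_cost / saving_per_run) + 1


if __name__ == "__main__":
    print(break_even_runs(40, 2))  # → 21
```

A program that will run only a handful of times rarely reaches its break-even point. A weather model run several times a day for years easily does.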

Testing and maintenance

There are many different styles of testing: unit, coverage, mocks, web, and GUI, to name just a few. Good tests are hard to write, and the effort put into them is often unappreciated. Most programmers will avoid writing tests if they can. For that reason, it is important to have a set of good, easy-to-use testing tools.
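The following sketch combines two of these styles: a unit test written with the standard-library `unittest` module, and a mock object from `unittest.mock` standing in for a piece of hardware. The function `read_setpoint` and its clamping rule are hypothetical examples, not code from this book.

```python
import unittest
from unittest import mock


def read_setpoint(sensor):
    """Hypothetical helper: return the sensor reading clamped to the range [0, 100]."""
    value = sensor.read()
    return max(0.0, min(100.0, value))


class ReadSetpointTest(unittest.TestCase):
    def test_value_in_range_passes_through(self):
        sensor = mock.Mock()
        sensor.read.return_value = 42.0   # the mock replaces real hardware
        self.assertEqual(read_setpoint(sensor), 42.0)
        sensor.read.assert_called_once_with()

    def test_out_of_range_values_are_clamped(self):
        sensor = mock.Mock()
        for raw, expected in [(-5.0, 0.0), (150.0, 100.0)]:
            with self.subTest(raw=raw):   # each case is reported separately on failure
                sensor.read.return_value = raw
                self.assertEqual(read_setpoint(sensor), expected)
```

Saved as a file, these tests can be run with `python -m unittest <file>`. No test device is needed, because the mock records how it was called.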

Python has the advantage in this area, particularly because of high-quality unit testing frameworks such as unittest, nose, and Pythoscope. The introspection capabilities of the Python language make testing frameworks much easier to write and use than their Fortran counterparts.

You could always just skip testing (it is, after all, expensive and unpopular), or do it the old-fashioned way: try a few values and check whether they work. This leads to an important consideration governing how much testing to do: the cost of being wrong. This type of cost is especially important in scientific and engineering computing. While the legal issues surrounding software liability are in flux, moral and practical considerations remain important. No one wants to be the developer responsible for lethally overdosing chemotherapy patients because of a bug. There are types of programming for which this matters less (word processors come to mind), but any system that involves human safety or financial risk incurs a high cost when something goes wrong.

Maintenance costs are similar to testing costs in that maintenance programming tends to be unpopular and allows new errors to creep into previously correct code. Python's conciseness reduces maintenance costs by reducing the number of lines of code that need to be maintained. Its superior testing tools make it practical to build comprehensive regression test suites that minimize the chance of introducing errors during maintenance.
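One inexpensive form of regression testing in Python is the standard-library `doctest` module. It runs the examples embedded in a docstring and reports any result that no longer matches. The `moving_average` function below is a hypothetical example chosen for illustration.

```python
import doctest


def moving_average(values, window):
    """Return the simple moving averages of values over the given window.

    These examples double as regression tests: if a later change alters
    any of these results, doctest reports the mismatch.

    >>> moving_average([1, 2, 3, 4, 5], 2)
    [1.5, 2.5, 3.5, 4.5]
    >>> moving_average([10, 20, 30], 3)
    [20.0]
    >>> moving_average([1, 2], 3)
    []
    """
    if window <= 0:
        raise ValueError("window must be positive")
    # One average per full window; range() is empty if the input is shorter than the window.
    return [sum(values[i:i + window]) / window for i in range(len(values) - window + 1)]


if __name__ == "__main__":
    doctest.testmod()  # silent when every example still passes
```

Because the examples live next to the code they check, a maintenance programmer is more likely to keep them up to date than a separate test file.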

Alternatives

There are alternatives to the stark IPython/Fortran choice: cross-language development and prototyping.

Cross-language development

Python began as a scripting language. As such, it was always meant to be able to interoperate with other languages. This can be a great advantage in several situations:

  • A divided development team: If some of your developers know only Fortran and some know only Python, it can be worth it to partition the system between the groups and define a well-structured interface between them. Functionality can then be assigned to the appropriate team:

    • Runtime-intensive sections to the Fortran group

    • Process coordination, I/O, and similar tasks to the Python group

  • Useful existing libraries: It always seems like there is a library that does exactly what is needed, but it is written in another language. Python's heritage as a scripting language means that there are many tools that make this process easier. Of particular interest in this context is F2PY (part of NumPy), which makes interfacing with Fortran code easier.

  • Specialized functionality: Even without a pre-existing library, it may be advantageous to write some performance-sensitive modules in Fortran. This can raise development, testing, and maintenance costs, but it can sometimes be worth it. Conversely, IPython provides specialized functionality in several areas (testing, introspection, and graphics) that Fortran projects could use.
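F2PY generates Python bindings for Fortran routines. A looser form of the divided-team arrangement simply agrees on a text interface: the Fortran group ships a compiled program, and the Python group drives it. The sketch below assumes a hypothetical solver that reads one number per line on standard input and writes one result per line on standard output. Neither that program nor its format comes from this book. A small Python stand-in replaces it so the example runs anywhere.

```python
import subprocess
import sys


def run_solver(command, values):
    """Send one number per line to an external solver and read one result per line back.

    command: the solver's argv list, e.g. ["./solver"] for a hypothetical
    compiled Fortran program. The agreed text format is the team interface.
    """
    payload = "\n".join(repr(float(v)) for v in values) + "\n"
    completed = subprocess.run(
        command,
        input=payload,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the solver exits with an error
    )
    return [float(token) for token in completed.stdout.split()]


# Stand-in for the hypothetical Fortran solver: doubles each input value.
STAND_IN_SOLVER = [sys.executable, "-c",
                   "import sys\nfor line in sys.stdin:\n    print(2 * float(line))"]

if __name__ == "__main__":
    print(run_solver(STAND_IN_SOLVER, [1, 2.5]))
```

Swapping the stand-in for the real executable changes only the `command` argument. This is the practical benefit of a well-defined interface between the two groups.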

Prototyping and exploratory development

It is often unclear, before a program is written, how useful it will turn out to be. Experience with the finished product would provide important feedback, but building the entire system just to find out would be prohibitively costly.

Similarly, there may be several different ways to build a system. Without clear guidelines to start with, the only way to decide between alternatives is to build several different versions and see which one is the best.

These cases share the problem of needing the system to be complete before being able to decide whether to build the system in the first place.

The solution is to build a prototype: a partially functional system that nevertheless incorporates important features of the finished product as envisioned. The primary virtue of a prototype is its short development time and concomitant low cost. Often the prototype (or prototypes) will be thrown away after a short period of evaluation. Errors, maintainability, and software quality in general matter only insofar as they affect the evaluation of the prototype (say, for use in estimating the schedule for the entire project).

Python excels as a prototyping language. It is flexible and easy to work with (reducing development time) while being powerful enough to implement sophisticated algorithms. Its interpreted nature is not an issue, as prototypes are generally not expected to be efficient (only quick and cheap).
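Here is a minimal sketch of the "build several versions and see which is best" workflow. It assumes two hypothetical prototype versions of a duplicate-detection routine. The harness first checks that the versions agree on sample inputs, then times them with the standard-library `timeit` module. The function names and inputs are illustrative only.

```python
import timeit


def has_duplicates_pairwise(items):
    """Version 1: compare every pair of items (simple, O(n^2))."""
    return any(items[i] == items[j]
               for i in range(len(items))
               for j in range(i + 1, len(items)))


def has_duplicates_set(items):
    """Version 2: compare the size of a set with the list (hashable items, O(n) on average)."""
    return len(set(items)) != len(items)


def compare(versions, inputs, repeat=3, number=10):
    """Check that all versions agree on every input, then time each version."""
    for data in inputs:
        answers = {v.__name__: v(data) for v in versions}
        if len(set(answers.values())) != 1:
            raise AssertionError(f"versions disagree on {data!r}: {answers}")
    timings = {}
    for v in versions:
        # Best of `repeat` timings; each timing calls v on every input, `number` times over.
        timings[v.__name__] = min(timeit.repeat(lambda: [v(d) for d in inputs],
                                                repeat=repeat, number=number))
    return timings


if __name__ == "__main__":
    sample = [list(range(500)), [3, 1, 4, 1, 5]]
    print(compare([has_duplicates_pairwise, has_duplicates_set], sample))
```

Checking agreement before timing keeps the comparison honest: a fast prototype that gives wrong answers is not a candidate.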

It is possible to adopt an approach known as Evolutionary Prototyping. In this approach, an initial prototype is built and evaluated. Based on this evaluation, changes are decided upon. The changes are made to the original prototype, yielding an improved version. This cycle repeats until the software is satisfactory. Among other advantages, this means that a working version of the system is always available for benchmarking, testing, and so on. The ongoing evaluations may reveal functionality that would be better implemented in one language or the other, and these changes could be made as described in the section on cross-language development.