Learning IPython for Interactive Computing and Data Visualization, Second Edition

By: Cyrille Rossant

Overview of this book

Python is a user-friendly and powerful programming language. IPython offers a convenient interface to the language and its analysis libraries, while the Jupyter Notebook is a rich environment well-adapted to data science and visualization. Together, these open source tools are widely used by beginners and experts around the world, and in a huge variety of fields and endeavors. This book is a beginner-friendly guide to the Python data analysis platform. After an introduction to the Python language, IPython, and the Jupyter Notebook, you will learn how to analyze and visualize data on real-world examples, how to create graphical user interfaces for image processing in the Notebook, and how to perform fast numerical computations for scientific simulations with NumPy, Numba, Cython, and ipyparallel. By the end of this book, you will be able to perform in-depth analyses of all sorts of data.

A crash course on Python


If you don't know Python, read this section to learn the fundamentals. Python is a very accessible language and, if you have ever programmed, it will only take you a few minutes to learn the basics.

Hello world

Open a new notebook and type the following in the first cell:

In [1]: print("Hello world!")
Out[1]: Hello world!

[Screenshot: "Hello world" in the Notebook]

Tip

Prompt string

Note that the convention chosen in this book is to show Python code (also called the input) prefixed with In [x]: (which shouldn't be typed). This is the standard IPython prompt. Here, you should just type print("Hello world!") and then press Shift + Enter.

Congratulations! You are now a Python programmer.

Tip

Downloading the example code

You can download the example code files from your account at http://www.packtpub.com for all the Packt Publishing books you have purchased. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you. You will also find the book's code on this GitHub repository: https://github.com/ipython-books/minibook-2nd-code.

Variables

Let's use Python as a calculator.

In [2]: 2 * 2
Out[2]: 4

Here, 2 * 2 is an expression statement. This operation is performed, the result is returned, and IPython displays it in the notebook cell's output.

Tip

Division

In Python 3, 3 / 2 returns 1.5 (floating-point division), whereas it returns 1 in Python 2 (integer division). This can be a source of errors when porting Python 2 code to Python 3. It is recommended to always use the explicit 3.0 / 2.0 (with floating-point operands) for floating-point division and 3 // 2 for integer division. Both syntaxes work in Python 2 and Python 3. See http://python3porting.com/differences.html#integer-division for more details.
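
As a small illustration of the two division operators, here is a sketch that assumes a Python 3 interpreter (the variable names are ours and only serve the example). Note that // rounds down, which matters for negative numbers:

```python
# True division always returns a float in Python 3.
true_quotient = 3 / 2        # 1.5

# Floor division rounds toward negative infinity.
floor_quotient = 3 // 2      # 1
negative_floor = -3 // 2     # -2, not -1

# divmod() returns the floor quotient and the remainder together.
quotient, rest = divmod(7, 2)

print(true_quotient, floor_quotient, negative_floor, quotient, rest)
```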

Other built-in mathematical operators include +, -, and ** for exponentiation. You will find more details at https://docs.python.org/3/reference/expressions.html#the-power-operator.

Variables form a fundamental concept of any programming language. A variable has a name and a value. Here is how to create a new variable in Python:

In [3]: a = 2

And here is how to use an existing variable:

In [4]: a * 3
Out[4]: 6

Several variables can be defined at once (this is called unpacking):

In [5]: a, b = 2, 6

There are different types of variables. Here, we have used a number (more precisely, an integer). Other important types include floating-point numbers to represent real numbers, strings to represent text, and booleans to represent True/False values. Here are a few examples:

In [6]: somefloat = 3.1415
        sometext = 'pi is about'  # You can also use double quotes.
        print(sometext, somefloat)  # Display several variables.
Out[6]: pi is about 3.1415

Note how we used the # character to write comments. Whereas Python discards the comments completely, adding comments in the code is important when the code is to be read by other humans (including yourself in the future).

String escaping

String escaping refers to the ability to insert special characters in a string. For example, how can you insert ' and ", given that these characters are used to delimit a string in Python code? The backslash \ is the go-to escape character in Python (and in many other languages too). Here are a few examples:

In [7]: print("Hello \"world\"")
        print("A list:\n* item 1\n* item 2")
        print("C:\\path\\on\\windows")
        print(r"C:\path\on\windows")
Out[7]: Hello "world"
        A list:
        * item 1
        * item 2
        C:\path\on\windows
        C:\path\on\windows

The special character \n is the new line (or line feed) character. To insert a backslash, you need to escape it, which explains why it needs to be doubled as \\.

You can also disable escaping by using raw literals with an r prefix before the string, as in the last example above. In this case, backslashes are treated as normal characters.

This is convenient when writing Windows paths, since Windows uses backslash separators instead of forward slashes like on Unix systems. A very common error on Windows is forgetting to escape backslashes in paths: writing "C:\path" may lead to subtle errors.
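
To see why an unescaped backslash can go unnoticed, consider this minimal sketch. The folder name is hypothetical and was chosen so that the path contains the sequence \n, which Python reads as a newline:

```python
# "\n" inside a regular string literal is a single newline character,
# so this "path" silently contains a line break.
broken = "C:\new_folder"

# Both of these contain a real backslash followed by the letter n.
raw = r"C:\new_folder"
escaped = "C:\\new_folder"

print(len(broken), len(raw))   # the broken path is one character shorter
print(raw == escaped)
```

No error is raised for the first string, which is what makes this kind of mistake hard to spot.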

You will find the list of special characters in Python at https://docs.python.org/3.4/reference/lexical_analysis.html#string-and-bytes-literals.

Lists

A list contains a sequence of items. You can concisely instruct Python to perform repeated actions on the elements of a list. Let's first create a list of numbers as follows:

In [8]: items = [1, 3, 0, 4, 1]

Note the syntax we used to create the list: square brackets [], and commas , to separate the items.

The built-in function len() returns the number of elements in a list:

In [9]: len(items)
Out[9]: 5

Note

Python comes with a set of built-in functions, including print(), len(), max(), functional routines like filter() and map(), and container-related routines like all(), any(), range(), and sorted(). You will find the full list of built-in functions at https://docs.python.org/3.4/library/functions.html.

Now, let's compute the sum of all elements in the list. Python provides a built-in function for this:

In [10]: sum(items)
Out[10]: 9

We can also access individual elements in the list, using the following syntax:

In [11]: items[0]
Out[11]: 1
In [12]: items[-1]
Out[12]: 1

Note that indexing starts at 0 in Python: the first element of the list is indexed by 0, the second by 1, and so on. Also, -1 refers to the last element, -2 to the penultimate element, and so on.

The same syntax can be used to alter elements in the list:

In [13]: items[1] = 9
         items
Out[13]: [1, 9, 0, 4, 1]

We can access sublists with the following syntax:

In [14]: items[1:3]
Out[14]: [9, 0]

Here, 1:3 represents a slice going from element 1 included (this is the second element of the list) to element 3 excluded. Thus, we get a sublist with the second and third elements of the original list. The first-included/last-excluded asymmetry leads to an intuitive treatment of consecutive slices: items[:3] and items[3:] split the list with no overlap and no gap. Also, note that slicing a list creates a new list (a shallow copy), not a view of the original; changing elements in the sublist does not change them in the original list. NumPy arrays behave differently: slicing an array returns a view.
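
The first-included/last-excluded convention can be checked with a short sketch. The list below is a hypothetical example, not the items list used in the text:

```python
numbers = [10, 20, 30, 40, 50]   # hypothetical example list

head = numbers[:2]    # elements 0 and 1
tail = numbers[2:]    # elements 2 to the end

# Because the end index is excluded, consecutive slices never overlap
# and can be glued back together.
assert head + tail == numbers

# The length of numbers[a:b] is simply b - a (for indices within bounds).
middle = numbers[1:4]
print(head, tail, middle, len(middle))
```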

Python provides several other types of containers:

  • Tuples are immutable and contain a fixed number of elements:

    In [15]: my_tuple = (1, 2, 3)
             my_tuple[1]
    Out[15]: 2
    
  • Dictionaries contain key-value pairs. They are extremely useful and common:

    In [16]: my_dict = {'a': 1, 'b': 2, 'c': 3}
             print('a:', my_dict['a'])
    Out[16]: a: 1
    In [17]: print(my_dict.keys())
    Out[17]: dict_keys(['c', 'a', 'b'])
    

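    In Python 3.7 and later, dictionaries keep the insertion order of their keys. In older versions, such as the one used for the output above, there is no notion of order in a dictionary; there, the native collections module provides an OrderedDict structure that keeps the insertion order (see https://docs.python.org/3.4/library/collections.html).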

  • Sets, like mathematical sets, contain distinct elements:

    In [18]: my_set = set([1, 2, 3, 2, 1])
             my_set
    Out[18]: {1, 2, 3}
    

    Note

    A Python object is mutable if its value can change after it has been created. Otherwise, it is immutable. For example, a string is immutable; to change it, a new string needs to be created. A list, a dictionary, or a set is mutable; elements can be added or removed. By contrast, a tuple is immutable, and it is not possible to change the elements it contains without recreating the tuple. See https://docs.python.org/3.4/reference/datamodel.html for more details.
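
The difference between mutable and immutable objects can be sketched as follows. The variable names are illustrative only:

```python
point = (1, 2)
try:
    point[0] = 10            # tuples do not support item assignment
except TypeError as error:
    tuple_error = str(error)

word = 'hello'
shout = word.upper()         # returns a new string; word is unchanged

values = [1, 2]
values.append(3)             # lists are modified in place

print(tuple_error)
print(word, shout, values)
```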

Loops

We can run through all elements of a list using a for loop:

In [19]: for item in items:
             print(item)
Out[19]: 1
         9
         0
         4
         1

There are several things to note here:

  • The for item in items syntax means that a temporary variable named item is created at every iteration. This variable contains the value of every item in the list, one at a time.

  • Note the colon : at the end of the for statement. Forgetting it will lead to a syntax error!

  • The statement print(item) will be executed for all items in the list.

  • Note the four spaces before print: this is called the indentation. You will find more details about indentation in the next subsection.

Python supports a concise syntax to perform a given operation on all elements of a list, as follows:

In [20]: squares = [item * item for item in items]
         squares
Out[20]: [1, 81, 0, 16, 1]

This is called a list comprehension. A new list is created here; it contains the squares of all numbers in the list. This concise syntax leads to highly readable and Pythonic code.
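
For comparison, here is a sketch of the same computation written as an explicit loop. We redefine items here so that the snippet runs on its own; squares_loop is a name introduced for this example:

```python
items = [1, 9, 0, 4, 1]

# Explicit loop: create an empty list, then append one square at a time.
squares_loop = []
for item in items:
    squares_loop.append(item * item)

# List comprehension: the same result in a single expression.
squares = [item * item for item in items]

print(squares_loop == squares)
```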

Indentation

Indentation refers to the spaces that may appear at the beginning of some lines of code. This is a particular aspect of Python's syntax.

In most programming languages, indentation is optional and is generally used to make the code visually clearer. But in Python, indentation also has a syntactic meaning. Particular indentation rules need to be followed for Python code to be correct.

In general, there are two ways to indent some text: by inserting a tab character (also referred to as \t), or by inserting a number of spaces (typically, four). It is recommended to use spaces instead of tab characters. Your text editor should be configured such that the Tab key on the keyboard inserts four spaces instead of a tab character.

In the Notebook, indentation is automatically configured properly; so you shouldn't worry about this issue. The question only arises if you use another text editor for your Python code.

Finally, what is the meaning of indentation? In Python, indentation delimits coherent blocks of code, for example, the contents of a loop, a conditional branch, a function, and other objects. Where other languages such as C or JavaScript use curly braces to delimit such blocks, Python uses indentation.

Conditional branches

Sometimes, you need to perform different operations on your data depending on some condition. For example, let's display all even numbers in our list:

In [21]: for item in items:
             if item % 2 == 0:
                 print(item)
Out[21]: 0
         4

Again, here are several things to note:

  • An if statement is followed by a boolean expression.

  • If a and b are two integers, the modulo operator a % b returns the remainder from the division of a by b. Here, item % 2 is 0 for even numbers, and 1 for odd numbers.

  • The equality is represented by a double equal sign == to avoid confusion with the assignment operator = that we use when we create variables.

  • Like with the for loop, the if statement ends with a colon :.

  • The part of the code that is executed when the condition is satisfied follows the if statement. It is indented. Indentation is cumulative: since this if is inside a for loop, there are eight spaces before the print(item) statement.

Python supports a concise syntax to select all elements in a list that satisfy certain properties. Here is how to create a sublist with only even numbers:

In [22]: even = [item for item in items if item % 2 == 0]
         even
Out[22]: [0, 4]

This is also a form of list comprehension.
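
A comprehension can filter and transform in one pass. The sketch below squares only the even numbers; it also uses a conditional expression (x if condition else y), which the text does not cover, to label every item:

```python
items = [1, 9, 0, 4, 1]

# Keep the even numbers only, and square them.
even_squares = [item * item for item in items if item % 2 == 0]

# A conditional expression chooses a value for every item instead of
# filtering items out.
labels = ['even' if item % 2 == 0 else 'odd' for item in items]

print(even_squares)
print(labels)
```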

Functions

Code is typically organized into functions. A function encapsulates part of your code. Functions allow you to reuse bits of functionality without copy-pasting the code. Here is a function that tells whether an integer number is even or not:

In [23]: def is_even(number):
             """Return whether an integer is even or not."""
             return number % 2 == 0

There are several things to note here:

  • A function is defined with the def keyword.

  • After def comes the function name. A general convention in Python is to only use lowercase characters, and separate words with an underscore _. A function name generally starts with a verb.

  • The function name is followed by parentheses, with one or several variable names called the arguments. These are the inputs of the function. There is a single argument here, named number.

  • No type is specified for the argument. This is because Python is dynamically typed; you could pass a variable of any type. This function would work fine with floating point numbers, for example (the modulo operation works with floating point numbers in addition to integers).

  • The body of the function is indented (and note the colon : at the end of the def statement).

  • There is a docstring wrapped by triple quotes """. This is a particular form of comment that explains what the function does. It is not mandatory, but it is strongly recommended to write docstrings for the functions exposed to the user.

  • The return keyword in the body of the function specifies the output of the function. Here, the output is a Boolean, obtained from the expression number % 2 == 0. It is possible to return several values; just use a comma to separate them (in this case, a tuple containing these values is returned).
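
Here is a minimal sketch of a function returning several values. The function min_and_max() is a hypothetical helper written for this example:

```python
def min_and_max(numbers):
    """Return the smallest and largest values of a non-empty list."""
    return min(numbers), max(numbers)

# The two values come back as a single tuple...
result = min_and_max([1, 9, 0, 4, 1])

# ...which can be unpacked into separate variables.
lowest, highest = min_and_max([1, 9, 0, 4, 1])

print(result, lowest, highest)
```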

Once a function is defined, it can be called like this:

In [24]: is_even(3)
Out[24]: False
In [25]: is_even(4)
Out[25]: True

Here, 3 and 4 are successively passed as arguments to the function.

Positional and keyword arguments

A Python function can accept an arbitrary number of arguments, called positional arguments. It can also accept optional named arguments, called keyword arguments. Here is an example:

In [26]: def remainder(number, divisor=2):
             return number % divisor

The second argument of this function, divisor, is optional. If it is not provided by the caller, it will default to the number 2, as shown here:

In [27]: remainder(5)
Out[27]: 1

There are two equivalent ways of specifying a keyword argument when calling a function. They are as follows:

In [28]: remainder(5, 3)
Out[28]: 2
In [29]: remainder(5, divisor=3)
Out[29]: 2

In the first case, 3 is understood as the second argument, divisor. In the second case, the name of the argument is given explicitly by the caller. This second syntax is clearer and less error-prone than the first one.
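
The benefit of naming arguments grows with the number of optional parameters. In this sketch, describe() is a hypothetical function with two keyword arguments, which can be given in any order when they are named:

```python
def describe(number, divisor=2, label='remainder'):
    """Return a short text describing number % divisor."""
    return '{0}: {1}'.format(label, number % divisor)

default_text = describe(7)                          # both defaults used
custom_text = describe(7, label='rest', divisor=4)  # order does not matter

print(default_text)
print(custom_text)
```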

Functions can also accept arbitrary sets of positional and keyword arguments, using the following syntax:

In [30]: def f(*args, **kwargs):
             print("Positional arguments:", args)
             print("Keyword arguments:", kwargs)
In [31]: f(1, 2, c=3, d=4)
Out[31]: Positional arguments: (1, 2)
         Keyword arguments: {'c': 3, 'd': 4}

Inside the function, args is a tuple containing positional arguments, and kwargs is a dictionary containing keyword arguments.

Passage by assignment

When passing a parameter to a Python function, a reference to the object is actually passed (passage by assignment):

  • If the passed object is mutable, it can be modified by the function

  • If the passed object is immutable, it cannot be modified by the function

Here is an example:

In [32]: my_list = [1, 2]

         def add(some_list, value):
             some_list.append(value)

         add(my_list, 3)
         my_list
Out[32]: [1, 2, 3]

The add() function modifies an object defined outside it (in this case, the object my_list); we say this function has side-effects. A function with no side-effects is called a pure function: it doesn't modify anything in the outer context, and it deterministically returns the same result for any given set of inputs. Pure functions are to be preferred over functions with side-effects.
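
As a point of comparison, here is a sketch of a pure variant of add(), together with a function that receives an immutable integer. Both add_pure() and increment() are names made up for this example:

```python
def add_pure(some_list, value):
    """Return a new list; the argument is left untouched."""
    return some_list + [value]

def increment(number):
    """Return number + 1 without affecting the caller's variable."""
    number = number + 1      # rebinds the local name only
    return number

my_list = [1, 2]
new_list = add_pure(my_list, 3)

count = 5
result = increment(count)

print(my_list, new_list, count, result)
```

After these calls, my_list and count still hold their original values.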

Knowing this can help you spot subtle bugs. There are further related concepts that are useful to know, including function scopes, naming, binding, and more.

Errors

Let's talk about errors in Python. As you learn, you will inevitably come across errors and exceptions. The Python interpreter will most of the time tell you what the problem is, and where it occurred. It is important to understand the vocabulary used by Python so that you can more quickly find and correct your errors.

Let's see the following example:

In [33]: def divide(a, b):
             return a / b
In [34]: divide(1, 0)
Out[34]: ---------------------------------------------------------
         ZeroDivisionError       Traceback (most recent call last)
         <ipython-input-2-b77ebb6ac6f6> in <module>()
         ----> 1 divide(1, 0)

         <ipython-input-1-5c74f9fd7706> in divide(a, b)
               1 def divide(a, b):
         ----> 2     return a / b

         ZeroDivisionError: division by zero

Here, we defined a divide() function, and called it to divide 1 by 0. Dividing a number by 0 is an error in Python. Here, a ZeroDivisionError exception was raised. An exception is a particular type of error that can be raised at any point in a program. It is propagated from the innards of the code up to the command that launched the code. It can be caught and processed at any point. You will find more details about exceptions at https://docs.python.org/3/tutorial/errors.html, and common exception types at https://docs.python.org/3/library/exceptions.html#bltin-exceptions.

The error message you see contains the stack trace, the exception type, and the exception message. The stack trace shows all function calls between the raised exception and the script calling point.

The top frame, indicated by the first arrow ---->, shows the entry point of the code execution. Here, it is divide(1, 0), which was called directly in the Notebook. The error occurred while this function was called.

The next and last frame is indicated by the second arrow. It corresponds to line 2 in our function divide(a, b). It is the last frame in the stack trace: this means that the error occurred there.

We will see later in this chapter how to debug such errors interactively in IPython and in the Jupyter Notebook. Knowing how to navigate up and down in the stack trace is critical when debugging complex Python code.
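
As noted above, an exception can be caught and processed. Here is a minimal sketch using a try/except block; safe_divide() is a hypothetical helper, and returning None on failure is just one possible design choice:

```python
def safe_divide(a, b):
    """Return a / b, or None when b is zero."""
    try:
        return a / b
    except ZeroDivisionError as error:
        # The exception is caught here instead of propagating upward.
        print("Cannot divide:", error)
        return None

print(safe_divide(6, 3))
print(safe_divide(1, 0))
```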

Object-oriented programming

Object-oriented programming (OOP) is a relatively advanced topic. Although we won't use it much in this book, it is useful to know the basics. Also, mastering OOP is often essential when you start to have a large code base.

In Python, everything is an object. A number, a string, or a function is an object. An object is an instance of a type (also known as class). An object has attributes and methods, as specified by its type. An attribute is a variable bound to an object, giving some information about it. A method is a function that applies to the object.

For example, the object 'hello' is an instance of the built-in str type (string). The type() function returns the type of an object, as shown here:

In [35]: type('hello')
Out[35]: str

There are native types, like str or int (integer), and custom types, also called classes, that can be created by the user.
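
To give an idea of what a custom type looks like, here is a minimal sketch of a class. Circle is a hypothetical example with one attribute and one method:

```python
import math

class Circle:
    """A circle described by its radius."""

    def __init__(self, radius):
        self.radius = radius          # attribute bound to the object

    def area(self):                   # method applying to the object
        return math.pi * self.radius ** 2

circle = Circle(2)
print(type(circle).__name__, circle.radius, circle.area())
```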

In IPython, you can discover the attributes and methods of any object with the dot syntax and tab completion. For example, typing 'hello'.u and pressing Tab automatically shows us the existence of the upper() method:

In [36]: 'hello'.upper()
Out[36]: 'HELLO'

Here, upper() is a method available to all str objects; it returns an uppercase copy of a string.

A useful string method is format(). This simple and convenient templating system lets you generate strings dynamically, as shown in the following example:

In [37]: 'Hello {0:s}!'.format('Python')
Out[37]: 'Hello Python!'

The {0:s} syntax means "replace this with the first argument of format(), which should be a string". The variable type after the colon is especially useful for numbers, where you can specify how to display the number (for example, .3f to display three decimals). The 0 makes it possible to replace a given value several times in a given string. You can also use a name instead of a position, for example 'Hello {name}!'.format(name='Python').
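
The following sketch exercises these three features of format(): a number format, a repeated position, and named fields. The values are made up for the example:

```python
# .3f rounds the number to three decimals.
pi_text = 'pi is about {0:.3f}'.format(3.14159265)

# The same position can be reused several times.
repeated = '{0} and again {0}'.format('spam')

# Named fields can be combined with a type specifier (d for integers).
named = '{name} is {age:d} years old'.format(name='Ada', age=36)

print(pi_text)
print(repeated)
print(named)
```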

Some methods are prefixed with an underscore _; they are private and are generally not meant to be used directly. IPython's tab completion won't show you these private attributes and methods unless you explicitly type _ before pressing Tab.

In practice, the most important thing to remember is that appending a dot . to any Python object and pressing Tab in IPython will show you a lot of functionality pertaining to that object.

Functional programming

Python is a multi-paradigm language; it notably supports imperative, object-oriented, and functional programming models. Python functions are objects and can be handled like other objects. In particular, they can be passed as arguments to other functions; functions that accept or return other functions are called higher-order functions. This is the essence of functional programming.

Decorators provide a convenient syntax construct to define higher-order functions. Here is an example using the is_even() function from the previous Functions section:

In [38]: def show_output(func):
             def wrapped(*args, **kwargs):
                 output = func(*args, **kwargs)
                 print("The result is:", output)
             return wrapped

The show_output() function transforms an arbitrary function func() to a new function, named wrapped(), that displays the result of the function, as follows:

In [39]: f = show_output(is_even)
         f(3)
Out[39]: The result is: False

Equivalently, this higher-order function can also be used with a decorator, as follows:

In [40]: @show_output
         def square(x):
             return x * x
In [41]: square(3)
Out[41]: The result is: 9

You can find more information about Python decorators at https://en.wikipedia.org/wiki/Python_syntax_and_semantics#Decorators and at http://www.thecodeship.com/patterns/guide-to-python-function-decorators/.
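
Note that the wrapped() function defined in show_output() prints the result but does not return it, so a decorated function returns None. Here is a sketch of a variant that also returns the result; show_and_return() and cube() are names introduced for this example, and functools.wraps from the standard library preserves the name and docstring of the decorated function:

```python
import functools

def show_and_return(func):
    """Variant of show_output() that also returns the result."""
    @functools.wraps(func)            # keep func's name and docstring
    def wrapped(*args, **kwargs):
        output = func(*args, **kwargs)
        print("The result is:", output)
        return output
    return wrapped

@show_and_return
def cube(x):
    """Return x to the power of three."""
    return x ** 3

value = cube(2)
print(value, cube.__name__)
```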

Python 2 and 3

Let's finish this section with a few notes about Python 2 and Python 3 compatibility issues.

Some Python 2 code and libraries are still not compatible with Python 3. Therefore, it is sometimes useful to be aware of the differences between the two versions. One of the most obvious differences is that print is a statement in Python 2, whereas it is a function in Python 3. Therefore, print "Hello" (without parentheses) works in Python 2 but not in Python 3, while print("Hello") works in both Python 2 and Python 3.

There are several non-mutually exclusive options to write portable code that works with both versions:

  • __future__: A built-in module supporting backward-incompatible Python syntax

  • 2to3: A built-in Python module to port Python 2 code to Python 3

  • six: An external lightweight library for writing compatible code


Going beyond the basics

You now know the fundamentals of Python, the bare minimum that you will need in this book. As you can imagine, there is much more to say about Python.

The following are a few further basic concepts that are often useful and that we cannot cover here, unfortunately. You are highly encouraged to have a look at them in the references given at the end of this section:

  • range and enumerate

  • pass, break, and continue, to be used in loops

  • Working with files

  • Creating and importing modules

  • The Python standard library provides a wide range of functionality (OS, network, file systems, compression, mathematics, and more)
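
To give a first taste of some of these concepts, here is a brief sketch of range, enumerate, continue, and break. The values are arbitrary and only serve the example:

```python
# range(5) generates the integers 0, 1, 2, 3, 4.
first_five = list(range(5))

# enumerate() yields (index, element) pairs.
indexed = []
for index, letter in enumerate(['a', 'b', 'c']):
    indexed.append((index, letter))

# continue skips to the next iteration; break leaves the loop.
found = []
for number in range(10):
    if number % 2 == 1:
        continue        # skip odd numbers
    if number > 6:
        break           # stop the loop entirely
    found.append(number)

print(first_five, indexed, found)
```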

Here are some slightly more advanced concepts that you might find useful if you want to strengthen your Python skills:

  • Regular expressions for advanced string processing

  • Lambda functions for defining small anonymous functions

  • Generators for controlling custom loops

  • Exceptions for handling errors

  • with statements for safely handling contexts

  • Advanced object-oriented programming

  • Metaprogramming for modifying Python code dynamically

  • The pickle module for persisting Python objects on disk and exchanging them across a network
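
Two of these concepts are easy to preview. The sketch below uses a lambda function as a sort key and a generator defined with yield; the word list and the countdown() function are hypothetical examples:

```python
words = ['banana', 'fig', 'cherry']

# A lambda function is a small anonymous function, here used as a key.
by_length = sorted(words, key=lambda word: len(word))

def countdown(start):
    """Yield start, start - 1, ..., 1, one value at a time."""
    while start > 0:
        yield start
        start -= 1

# A generator produces its values lazily; list() consumes them all.
steps = list(countdown(3))

print(by_length, steps)
```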

Finally, here are a few references: