Running time complexity

When profiling and optimizing code, it's really important to understand what running time complexity (RTC) is and how we can use that knowledge to properly optimize our code.

RTC helps quantify the execution time of a given algorithm. It does so by providing a mathematical approximation of the time a piece of code will take to execute for any given input. It is deliberately an approximation, because that is precisely what lets us group similar algorithms under the same value.

RTC is expressed using something called Big O notation. In mathematics, Big O notation is used to express the limiting behavior of a given function when its argument tends to infinity. If we apply that concept to computer science, we can use Big O notation to express the limiting behavior of the function describing the execution time.

In other words, this notation will give us a broad idea of how long our algorithm will take to process an arbitrarily large input. It will not, however, give us a precise number for the time of execution, which would require a more in-depth analysis of the source code.

As mentioned earlier, we can use this limiting behavior to group algorithms. Here are some of the most common groups:

Constant time – O(1)

This is the simplest of them all. This notation basically means that the action we're measuring will always take a constant amount of time, and this time is not dependent on the size of the input.

Here are some examples of code that have O(1) execution time:

• Determining whether a number is odd or even:

```
if number % 2:
    odd = True
else:
    odd = False
```
• Printing a message into standard output:

`print("Hello world!")`

Even something more conceptually complex, like finding the value of a key inside a dictionary (or hash table), if implemented correctly, can be done in constant time. Technically speaking, accessing an element on the hash takes O(1) amortized time, which roughly means that the average time each operation takes (without taking into account edge cases) is a constant O(1) time.
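To make this concrete, here is a small sketch (the dictionary contents and function name are made up for illustration) showing two constant-time operations side by side:

```python
def is_odd(number):
    # One modulo operation, regardless of how large `number` is: O(1).
    return number % 2 == 1

# A hypothetical dictionary; a lookup hashes the key once to locate
# the value, no matter how many keys the dictionary holds.
prices = {"apple": 1.5, "banana": 0.5, "cherry": 3.0}

print(is_odd(7))          # True
print(prices["banana"])   # 0.5
```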

Linear time – O(n)

Linear time dictates that for a given input of arbitrary length n, the amount of time required for the execution of the algorithm is linearly proportional to n, for instance, 3n, 4n + 5, and so on.

The preceding chart clearly shows that both the blue (3n) line and the red one (4n + 5) grow at the same rate as the black line (n) when x tends to infinity. So, to simplify, we can just say that all three functions are O(n).

Examples of algorithms with this execution order are:

• Finding the smallest value in an unsorted list

• Comparing two strings

• Deleting the last item inside a linked list
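The first bullet can be sketched in a few lines; the loop touches each element exactly once, which is what makes it O(n):

```python
def find_smallest(values):
    # Scan every element once: n - 1 comparisons for n items.
    smallest = values[0]
    for value in values[1:]:
        if value < smallest:
            smallest = value
    return smallest

print(find_smallest([7, 3, 9, 1, 4]))  # 1
```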

Logarithmic time – O(log n)

An algorithm with logarithmic execution time has a very slowly growing upper bound. A logarithmic function grows quickly at first, but it slows down as the input size gets bigger. It never stops growing, but the amount it grows by becomes so small as to be practically irrelevant.

The preceding chart shows three different logarithmic functions. You can clearly see that they all possess a similar shape: they keep increasing toward infinity, but at an ever-decreasing rate.

Some examples of algorithms that have logarithmic execution time are:

• Binary search

• Calculating Fibonacci numbers (using matrix multiplications)
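As a quick illustration of the first bullet, binary search halves the remaining search space on every iteration, so a list of n sorted elements needs at most about log n steps:

```python
def binary_search(sorted_values, target):
    # Each iteration halves the search space: O(log n) steps.
    low, high = 0, len(sorted_values) - 1
    while low <= high:
        mid = (low + high) // 2
        if sorted_values[mid] == target:
            return mid
        elif sorted_values[mid] < target:
            low = mid + 1
        else:
            high = mid - 1
    return -1  # not found

print(binary_search([1, 3, 5, 7, 9, 11], 7))  # 3
```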

Linearithmic time – O(n log n)

A particular combination of the previous two orders of execution is linearithmic time. It grows faster than linear time as the value of n increases, but far more slowly than quadratic time.

Here are some examples of algorithms that have this order of execution:

• Merge sort

• Heap sort

• Quick sort (at least its average time complexity)
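To see where the n log n comes from, here is a minimal merge sort sketch: the list is split about log n times, and each level of merging costs O(n):

```python
def merge_sort(values):
    # Base case: a list of 0 or 1 elements is already sorted.
    if len(values) <= 1:
        return values
    # Split in half: the recursion is about log n levels deep.
    mid = len(values) // 2
    left = merge_sort(values[:mid])
    right = merge_sort(values[mid:])
    # Merge the two sorted halves: O(n) work per level.
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged

print(merge_sort([5, 2, 8, 1, 9, 3]))  # [1, 2, 3, 5, 8, 9]
```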

Let's see a few examples of plotted linearithmic functions to understand them better:

Factorial time – O(n!)

Factorial time is one of the worst execution times we might get out of an algorithm. It grows so quickly that it's hard to plot.

Here is a rough approximation of how the execution time of our algorithm would look with factorial time:

An example of an algorithm with factorial execution time is solving the traveling salesman problem using brute force search (basically checking every single possible route).
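Such a brute-force solver can be sketched with itertools.permutations; the distance matrix below is made up for illustration. With n cities, the loop examines (n - 1)! tours, which is why even modest inputs become intractable:

```python
from itertools import permutations

# Hypothetical symmetric distance matrix for 4 cities (indices 0-3).
distances = [
    [0, 2, 9, 10],
    [2, 0, 6, 4],
    [9, 6, 0, 3],
    [10, 4, 3, 0],
]

def shortest_tour(dist):
    # Check every ordering of the cities after city 0: (n - 1)! tours.
    n = len(dist)
    best_cost, best_tour = float("inf"), None
    for perm in permutations(range(1, n)):
        tour = (0,) + perm + (0,)  # start and end at city 0
        cost = sum(dist[a][b] for a, b in zip(tour, tour[1:]))
        if cost < best_cost:
            best_cost, best_tour = cost, tour
    return best_cost, best_tour

print(shortest_tour(distances))  # cost 18
```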

Quadratic time – O(n^2)

Quadratic execution time is another example of a fast-growing order. The bigger the input size, the longer the algorithm takes (this is true for most complexities, but especially so for this one). Quadratic execution time is even less efficient than linearithmic time.

Some examples of algorithms having this order of execution are:

• Bubble sort

• Traversing a 2D array

• Insertion sort
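A minimal bubble sort sketch shows where the n^2 comes from: two nested loops over the input, for roughly n * n comparisons in total:

```python
def bubble_sort(values):
    # Copy so the caller's list is left untouched.
    items = list(values)
    n = len(items)
    for i in range(n):
        # Each pass bubbles the largest remaining element to the end.
        for j in range(n - 1 - i):
            if items[j] > items[j + 1]:
                items[j], items[j + 1] = items[j + 1], items[j]
    return items

print(bubble_sort([4, 1, 3, 2]))  # [1, 2, 3, 4]
```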

Here are some examples of plotted quadratic functions:

Finally, let's look at all examples plotted together to get a clear idea of algorithm efficiency:

Leaving aside constant execution time, which is clearly faster but most of the time impossible to achieve in complex algorithms, the order of preference should be:

• Logarithmic

• Linear

• Linearithmic

• Quadratic

• Factorial

Obviously, there are cases when you'll have no choice but to accept a quadratic execution time as the best possible result. The idea is to always aim for the fastest algorithm, but the limitations of your problem and technology will affect the actual result.

Note

Note that between quadratic and factorial times, there are several other alternatives (cubic, n ^ 4, and so on).

Another important consideration is that most algorithms don't have only a single order of execution time. They can have up to three orders of execution time: for the best case, normal case, and worst case scenarios. The scenario is determined by the properties of the input data. For instance, the insertion sort algorithm will run much faster if the input is already sorted (best case, linear time), and it will be at its worst (quadratic order) for reverse-sorted input.
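A small insertion sort sketch makes this concrete: the inner while loop is what collapses to nothing on already-sorted input and runs to completion on reverse-sorted input:

```python
def insertion_sort(values):
    items = list(values)
    for i in range(1, len(items)):
        current = items[i]
        j = i - 1
        # Sorted input: this loop exits immediately -> O(n) best case.
        # Reverse-sorted input: it shifts every earlier element
        # on every iteration -> O(n^2) worst case.
        while j >= 0 and items[j] > current:
            items[j + 1] = items[j]
            j -= 1
        items[j + 1] = current
    return items

print(insertion_sort([3, 1, 2]))  # [1, 2, 3]
```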

Another interesting case to look at is the data structures used. They inherently come with execution times associated with the actions you can perform on them (indexing, searching, inserting, deleting, and so on). Let's look at some of the most common data structures and their associated costs:

Average case:

| Data structure     | Indexing  | Search    | Insertion | Deletion  |
|--------------------|-----------|-----------|-----------|-----------|
| List               | O(1)      | O(n)      | –         | –         |
| Linked list        | O(n)      | O(n)      | O(1)      | O(1)      |
| Doubly linked list | O(n)      | O(n)      | O(1)      | O(1)      |
| Dictionary         | –         | O(1)      | O(1)      | O(1)      |
| Binary search tree | O(log n)  | O(log n)  | O(log n)  | O(log n)  |

Worst case:

| Data structure     | Indexing  | Search    | Insertion | Deletion  |
|--------------------|-----------|-----------|-----------|-----------|
| List               | O(1)      | O(n)      | –         | –         |
| Linked list        | O(n)      | O(n)      | O(1)      | O(n)      |
| Doubly linked list | O(n)      | O(n)      | O(1)      | O(1)      |
| Dictionary         | –         | O(n)      | O(n)      | O(n)      |
| Binary search tree | O(n)      | O(n)      | O(n)      | O(n)      |