Julia High Performance

By: Avik Sengupta

Overview of this book

Julia is a high-performance, high-level dynamic language designed to address the requirements of high-level numerical and scientific computing. Julia brings solutions to the complexities developers face when writing elegant, high-performing code. Julia High Performance will take you on a journey to understand the performance characteristics of your Julia programs, and enable you to realize the promise of near-C levels of performance in Julia. You will learn to analyze and measure the performance of Julia code, understand how to avoid bottlenecks, and design your program for the highest possible performance. In this book, you will also see how Julia uses type information to achieve its performance goals, and how to use multiple dispatch to help the compiler emit high-performance machine code. Numbers and their arrays are the key structures in scientific computing – you will see how Julia’s design makes them fast. The last chapter will give you a taste of Julia’s distributed computing capabilities.
Table of Contents (14 chapters)

SIMD parallelization


SIMD (Single Instruction, Multiple Data) is a method of parallelizing computation whereby a single operation is performed on many data elements simultaneously. Modern CPU architectures include instruction sets that can do this, operating on many variables at once.

Say you want to add two vectors, placing the result in a third vector. Let's imagine that there is no standard library function to achieve this, and you are writing a naïve implementation of this operation. Execute the following code:

function sum_vectors!(x, y, z)
    n = length(x)
    for i = 1:n
        x[i] = y[i] + z[i]
    end
end
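
As a quick check, the function can be called on a few small arrays. The following is a minimal sketch; the array contents and the 1,000-element size are illustrative choices, and the function definition is repeated only so that the snippet runs on its own:

```julia
# Same function as above, repeated so this snippet is self-contained.
function sum_vectors!(x, y, z)
    n = length(x)
    for i = 1:n
        x[i] = y[i] + z[i]
    end
end

y = rand(1000)    # first input vector (illustrative data)
z = rand(1000)    # second input vector
x = zeros(1000)   # destination vector, overwritten in place

sum_vectors!(x, y, z)

# Each element of x should now hold the sum of the matching elements of y and z.
println(x[1] == y[1] + z[1])
```

Note that the result is written into the first argument, which is why the function name ends in an exclamation mark, the usual Julia convention for functions that modify their arguments.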

Say each input array to this function has 1,000 elements. Then the function essentially performs 1,000 sequential additions. A typical SIMD-enabled processor, however, can add several numbers (eight, say) with a single instruction. Adding the elements one at a time can therefore waste the capabilities of the CPU.
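
To picture what that means for the loop, it helps to think of the 1,000 elements as 125 groups of eight. The sketch below makes that grouping explicit in plain Julia. It is only an illustration of the bookkeeping involved: the name sum_vectors_chunked! and the width keyword are hypothetical, the width of eight is an assumption, and writing the loop this way does not by itself make the processor use SIMD instructions.

```julia
# Illustrative only: processes the arrays in groups of `width` elements.
# `sum_vectors_chunked!` and `width` are hypothetical names for this sketch.
function sum_vectors_chunked!(x, y, z; width = 8)
    n = length(x)
    i = 1
    # Full groups: a SIMD unit could, in principle, handle each group at once.
    while i + width - 1 <= n
        for j in i:(i + width - 1)
            x[j] = y[j] + z[j]
        end
        i += width
    end
    # Leftover elements when n is not a multiple of `width`.
    for j in i:n
        x[j] = y[j] + z[j]
    end
    return x
end

y = collect(1.0:13.0)   # 13 elements: one full group of 8 plus 5 leftovers
z = ones(13)
x = zeros(13)
sum_vectors_chunked!(x, y, z)
```

Even in this simplified form, the function needs a separate loop for the leftover elements, which hints at how quickly hand-written grouping adds complexity.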

On the other hand, rewriting code to operate on parts of the array in parallel can get complex quickly. Doing...