
Julia High Performance

By : Avik Sengupta

Overview of this book

Julia is a high-performance, high-level dynamic language designed to address the requirements of numerical and scientific computing. It tackles the difficulty developers face in writing code that is both elegant and fast. Julia High Performance will take you on a journey to understand the performance characteristics of your Julia programs, and will enable you to realize Julia's promise of near-C performance. You will learn to analyze and measure the performance of Julia code, understand how to avoid bottlenecks, and design your programs for the highest possible performance. You will also see how Julia uses type information to achieve its performance goals, and how to use multiple dispatch to help the compiler emit high-performance machine code. Numbers and arrays of numbers are the key structures in scientific computing, and you will see how Julia's design makes them fast. The last chapter gives you a taste of Julia's distributed computing capabilities.
Table of Contents (14 chapters)

Allocations and in-place operations


Consider the following trivial function, xpow, which takes an integer as input and returns the first four powers of that number in an array. A second function, xpow_loop, calls xpow to compute the sum of squares of a long sequence of numbers:

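function xpow(x)
    # Return the first four powers of x as a 1×4 matrix
    return [x x^2 x^3 x^4]
end

function xpow_loop(n)
    s = 0
    for i = 1:n
        # Call xpow and keep only its second element, i^2
        s = s + xpow(i)[2]
    end
    return s
end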
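Two details of this code are easy to miss: the literal [x x^2 x^3 x^4] separates its elements with spaces, so it builds a 1×4 Matrix rather than a Vector, and xpow(i)[2] uses linear indexing to pick out i^2. The sketch below repeats both definitions so it runs on its own, and checks the loop against the closed form n(n+1)(2n+1)/6 for the sum of the first n squares. The helper sum_squares_closed_form is a hypothetical name introduced only for this check.

```julia
# Definitions repeated from the text so this block runs on its own.
function xpow(x)
    return [x x^2 x^3 x^4]    # space-separated literal: a 1×4 Matrix
end

function xpow_loop(n)
    s = 0
    for i = 1:n
        s = s + xpow(i)[2]    # linear index 2 holds i^2
    end
    return s
end

# Hypothetical helper: closed form of 1^2 + 2^2 + ... + n^2.
sum_squares_closed_form(n) = n * (n + 1) * (2n + 1) ÷ 6

println(size(xpow(3)))    # (1, 4)
println(xpow(3)[2])       # 9
println(xpow_loop(1000) == sum_squares_closed_form(1000))    # true
```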

Benchmarking xpow_loop with a large input shows that it is quite slow:

julia> @benchmark xpow_loop(1000000)
================ Benchmark Results ========================
     Time per evaluation: 103.17 ms [101.39 ms, 104.95 ms]
Proportion of time in GC: 13.15% [12.76%, 13.53%]
        Memory allocated: 152.58 mb
   Number of allocations: 4999441 allocations
       Number of samples: 97
   Number of evaluations: 97
 Time spent benchmarking: 10.16 s
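The report above comes from a benchmarking package, but Base Julia's @allocated macro is enough to see the same pattern without extra dependencies. The following is a minimal sketch, assuming Julia 1.x: it calls xpow_loop once so that compilation is not counted, then measures the bytes allocated by a second call for two input sizes. The helper bytes_for is hypothetical, and exact byte counts vary by Julia version and platform, so none are shown.

```julia
# Definitions repeated from the text so this block runs on its own.
function xpow(x)
    return [x x^2 x^3 x^4]
end

function xpow_loop(n)
    s = 0
    for i = 1:n
        s = s + xpow(i)[2]
    end
    return s
end

# Hypothetical helper: bytes allocated by one call of xpow_loop(n).
# The first call compiles the code; only the second call is measured.
function bytes_for(n)
    xpow_loop(n)
    return @allocated xpow_loop(n)
end

small = bytes_for(1_000)
large = bytes_for(100_000)
println("n = 1_000:   ", small, " bytes")
println("n = 100_000: ", large, " bytes")
```

On typical Julia versions each iteration allocates a fresh small array, so the second figure should be roughly 100 times the first.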

The clue is in the number of allocations displayed in the preceding output. Within the xpow...
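In line with this section's topic, one common way to remove a per-iteration allocation is to create a buffer once and let an in-place function overwrite it on every call. The following is a minimal sketch of that idea, not necessarily the approach the book goes on to use; the names xpow! and xpow_loop_inplace are hypothetical, and the trailing ! follows Julia's convention for functions that mutate one of their arguments.

```julia
# Hypothetical in-place variant: writes x, x^2, x^3, x^4 into `result`
# instead of allocating a new array on every call.
function xpow!(result::AbstractVector{Int}, x::Int)
    length(result) == 4 || throw(ArgumentError("result must have length 4"))
    result[1] = x
    result[2] = x^2
    result[3] = x^3
    result[4] = x^4
    return result
end

# Hypothetical loop that allocates one 4-element buffer and reuses it.
function xpow_loop_inplace(n)
    r = zeros(Int, 4)    # the only array this function allocates
    s = 0
    for i = 1:n
        xpow!(r, i)      # overwrite the buffer in place
        s = s + r[2]     # r[2] now holds i^2
    end
    return s
end

println(xpow_loop_inplace(1000))    # 333833500
```

Because the buffer is created outside the loop, the memory allocated per call no longer grows with n; the trade-off is that callers of xpow! must supply a correctly sized array.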