
Designed for speed

When the creators of Julia launched the language into the world, they said the following in a blog post entitled Why We Created Julia, which was published in early 2012:

"We want a language that's open source, with a liberal license. We want the speed of C with the dynamism of Ruby. We want a language that's homoiconic, with true macros like Lisp, but with obvious, familiar mathematical notation like Matlab. We want something as usable for general programming as Python, as easy for statistics as R, as natural for string processing as Perl, as powerful for linear algebra as Matlab, as good at gluing programs together as the shell. Something that is dirt simple to learn, yet keeps the most serious hackers happy. We want it interactive and we want it compiled. (Did we mention it should be as fast as C?)"

High performance, indeed nearly C-level performance, has therefore been one of the founding principles of the language. It was built from the ground up to enable the fast execution of code.

In addition to being a core design principle, high performance has also been a necessity from the early stages of Julia's development. A very large part of Julia's standard library, including very basic low-level operations, is written in Julia itself. For example, the + operation to add two integers is defined in Julia. (Refer to: https://github.com/JuliaLang/julia/blob/e1def102429941705bc16009e35a74abcdb6f88e/base/int.jl#L38.) Similarly, the basic for loop uses the standard iteration mechanism available to all user-defined types. Broadcasting, a fundamental low-level operation, can be completely overridden by custom array types (CUDA arrays, for example, use this heavily). All of this means that, to produce a usable language, the compiler had to generate very fast code from the very beginning. The creators of Julia did not have the luxury of escaping to C for even the core elements of the library.
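
You can check this from the REPL yourself. The following is a minimal sketch using the @which macro (the reported signature and the file line number will vary with your Julia version):

julia> @which 3 + 2
+(x::T, y::T) where T<:Union{Int128, Int16, Int32, Int64, Int8, UInt128, UInt16, UInt32, UInt64, UInt8} in Base at int.jl:53

The method that is selected lives in base/int.jl and is itself written in Julia; it simply calls the add_int intrinsic.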

We will note throughout the book the many design decisions that have been made with an eye to high performance, but three main elements create the basis for Julia's speed: a high-performance Just-in-Time (JIT) compiler, the LLVM framework that generates machine code, and a type system that allows expressive, yet statically analyzable, code.

JIT and LLVM

Julia is a JIT-compiled language, rather than an interpreted one. This allows Julia to be dynamic without incurring the overhead of interpretation. The compilation infrastructure is built on top of LLVM; more information about it is available on its website: http://llvm.org.

The LLVM compiler infrastructure project originated at the University of Illinois. It now has contributions from a very large number of corporate as well as independent developers. As a result of all this work, it is now a very high-quality, yet modular, system for many different compilation and code generation activities.

Julia uses LLVM for its JIT compilation needs. The Julia runtime code generator produces LLVM Intermediate Representation (IR) and hands it over to LLVM's JIT compiler, which in turn generates the machine code that is executed on the CPU. As a result, the sophisticated compilation techniques built into LLVM are readily available to Julia, from simple ones (such as loop unrolling or loop deletion) to state-of-the-art ones (such as SIMD vectorization). These compiler optimizations represent a very large body of work and, in this sense, the existence of LLVM is very much a prerequisite for the existence of Julia. It would have been an almost impossible task for a small team of developers to build this compiler and code generation infrastructure from scratch.
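
We can observe this pipeline directly from the REPL. The following is a minimal sketch (the output is abbreviated, with comments and name suffixes removed, and will differ across Julia versions and platforms); the @code_llvm macro prints the LLVM IR that the Julia code generator produces before handing it to LLVM:

julia> add_one(x) = x + 1
add_one (generic function with 1 method)

julia> @code_llvm add_one(1)
define i64 @julia_add_one(i64) {
top:
  %1 = add i64 %0, 1
  ret i64 %1
}

The single add i64 instruction in this IR is what LLVM's JIT then turns into machine code for the host CPU.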

Just-In-Time compilation:
A technique in which code in a high-level language is converted to machine code for execution on the CPU at runtime. This is in contrast to interpreted languages, whose runtime executes the source language directly; interpretation usually carries a significantly higher overhead. Ahead of Time (AOT) compilation, on the other hand, refers to the technique of converting a source language into machine code as a separate step prior to running the code. In this case, the converted machine code can usually be saved to disk as an executable file.

Types, type inference, and code specialization

While LLVM provides the basic infrastructure that allows fast machine code to be produced, it must be noted that adding an LLVM compiler to any language will not necessarily make it execute faster. Julia's syntax and semantics have been carefully designed to allow high-performance execution, and a large part of this is due to how Julia uses types in the language. We will, of course, have much more to say about types in Julia throughout this book. At this stage, suffice it to say that Julia's concept of types is a key ingredient of its performance.

The Julia compiler attempts to infer the type of all data used in a program, and compiles different versions of functions, specialized to the particular types of their arguments. To take a simple example, consider the ^ (power) function. This function can be called with integer or floating point (that is, fractional) arguments. The mathematical definitions, and thus the implementations, of this function are very different for integers and floats. So Julia will compile, on demand, two versions of the code: one for integer arguments and one for floating point arguments. It will then insert the appropriate call when it compiles the program. This means that, at runtime, fast, straight-line code without any type checks will be executed on the CPU.

Julia allows us to introspect the native code that runs on the CPU. Using this facility, we can see that very different code is generated for integer and floating point arguments. So, let's look at the following machine code, generated for squaring an integer:

julia> @code_native 3^2
pushl %eax
decl %eax
movl $202927424, %eax ## imm = 0xC186D40
addl %eax, (%eax)
addb %al, (%eax)
calll *%eax
popl %ecx
retl

We omitted some boilerplate output when showing the result of the @code macros, in order to focus on the relevant parts. The exact instructions will also differ across Julia versions and CPU architectures. Run this code yourself to see the full output.

Let's now look at the following code, generated for squaring a floating point value:

julia> @code_native 3.5^2
vcvtsi2sdl %edi, %xmm1, %xmm1
decl %eax
movl $1993314664, %eax ## imm = 0x76CF9168
.byte 0xff .byte 0x7f .byte 0x00
addb %bh, %bh
loopne 0x68
nopw %cs:(%eax, %eax)

You will notice that the two listings look very different (although their actual meaning is not relevant for now). More importantly, notice that neither contains any runtime type checks. This gets to the heart of Julia's design and its performance claims.

The ability of the compiler to reason about types is due to the combination of a sophisticated dataflow-based type inference algorithm and careful language design that allows this information to be inferred from most programs before execution begins. Put another way, the language is designed so that its data types can easily be analyzed statically.
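
Julia lets us inspect the result of this inference. As a minimal sketch (the output is abbreviated, and its formatting varies across Julia versions), the @code_warntype macro displays the types the compiler has inferred for each value in a function's body:

julia> half(x) = x / 2
half (generic function with 1 method)

julia> @code_warntype half(3)
Body::Float64
1 ─ %1 = (Base.sitofp)(Float64, x)::Float64
│   %2 = (Base.div_float)(%1, 2.0)::Float64
└──      return %2

Every intermediate value here has been inferred to a concrete type. Values whose types cannot be inferred are reported as ::Any (highlighted in red in the REPL), and those are usually the first thing to look for when diagnosing slow Julia code.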

If there is a single reason for Julia being such a high-performance language, this is it. This is why Julia is able to run at C-like speeds while still being a dynamic language. Type inference and code specialization are as close to a secret sauce as Julia gets. It is notable that, outside this type inference mechanism, the Julia compiler is quite simple. It does not include many of the advanced JIT optimization techniques that Java and JavaScript compilers are known to use. When the compiler has enough information about the types within the code, it can generate optimized, straight-line code without needing many of those techniques.

Detailed information about the implementation of type inference and code specialization in Julia can be found in the paper Julia: A Fresh Approach to Numerical Computing, by Jeff Bezanson, Alan Edelman, Stefan Karpinski, and Viral B. Shah, SIAM Review, 59(1): 65–98, 2017. doi: 10.1137/141000671. URL: https://julialang.org/research/julia-fresh-approach-BEKS.pdf

It is useful to note here that, unlike in some optionally typed dynamic languages, simply adding type annotations to your code does not make Julia go any faster. Type inference means that the compiler is usually able to figure out the types of variables when necessary. Hence, you can usually write high-level code without fighting the compiler about types, and still achieve superior performance.
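
To convince yourself of this, compare the typed code that the compiler produces for two versions of the same computation, one annotated and one not (the function names here are ours, and the output is abbreviated and version-dependent):

julia> unannotated(x) = x * x
unannotated (generic function with 1 method)

julia> annotated(x::Int) = x * x
annotated (generic function with 1 method)

julia> @code_typed unannotated(3)
CodeInfo(
1 ─ %1 = (Base.mul_int)(x, x)::Int64
└──      return %1
) => Int64

julia> @code_typed annotated(3)
CodeInfo(
1 ─ %1 = (Base.mul_int)(x, x)::Int64
└──      return %1
) => Int64

The generated code is identical. The annotation restricts which arguments the method will accept, but it does not make the compiled code any faster, because type inference had already determined the argument types at the call site.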