Book Image

Getting Started with Julia

By : Ivo Balbaert
Book Image

Getting Started with Julia

By: Ivo Balbaert

Overview of this book

Table of Contents (19 chapters)
Getting Started with Julia
Credits
About the Author
About the Reviewers
www.PacktPub.com
Preface
The Rationale for Julia
Index

The Rationale for Julia

This introduction will present you with the reasons why Julia is quickly growing in popularity in the technical, data scientist, and high-performance computing arena. We will cover the following topics:

  • The scope of Julia

  • Julia's place among other programming languages

  • A comparison with other languages for the data scientist

  • Useful links

The scope of Julia

The core designers and developers of Julia (Jeff Bezanson, Stefan Karpinski, and Viral Shah) have made it clear that Julia was born out of a deep frustration with the existing software toolset in the technical computing disciplines. Basically, it boils down to the following dilemma:

  • Prototyping is a problem in this domain that needs a high-level, easy-to-use, and flexible language that lets the developer concentrate on the problem itself instead of on low-level details of the language and computation.

  • The actual computation of a problem needs maximum performance; a factor of 10 in computation time makes a world of difference (think of one day versus ten days), so the production version often has to be (re)written in C or FORTRAN.

  • Before Julia, practitioners had to be satisfied with a "speed for convenience" trade-off, use developer-friendly and expressive, but decades-old interpreted languages such as MATLAB, R, or Python to express the problem at a high level. To program the performance-sensitive parts and speed up the actual computation, people had to resort to statically compiled languages such as C or FORTRAN, or even the assembly code. Mastery on both the levels is not evident: writing high-level code in MATLAB, R, or Python for prototyping on the one hand, and writing code that does the same thing in C, which is used for the actual execution.

    Julia was explicitly designed to bridge this gap. It gives you the possibility of writing high-performance code that uses CPU and memory resources as effectively as can be done in C, but working in pure Julia all the way down, reduces the need for a low-level language. This way, you can rapidly iterate using a simple programming model from the problem prototype to near-C performance. The Julia developers have proven that working in one environment that has the expressive capabilities as well as the pure speed is possible using the recent advances in Low Level Virtual Machine Just in Time (LLVM JIT) compiler technologies (for more information, see http://en.wikipedia.org/wiki/LLVM).

In summary, they designed Julia to have the following specifications:

  • Julia is open source and free with a liberal (MIT) license.

  • It is designed to be an easy-to-use and learn, elegant, clear and dynamic, interactive language by reducing the development time. To that end, Julia almost looks like the pseudo code with an obvious and familiar mathematical notation; for example, here is the definition for a polynomial function, straight from the code:

    x -> 7x^3 + 30x^2 + 5x + 42

    Notice that there is no need to indicate the multiplications.

  • It provides the computational power and speed without having to leave the Julia environment.

  • Metaprogramming and macro capabilities (due to its homoiconicity (refer to Chapter 7, Metaprogramming in Julia), inherited from Lisp), to increase its abstraction power.

  • Also, it is usable for general programming purposes, not only in pure computing disciplines.

  • It has built-in and simple to use concurrent and parallel capabilities to thrive in the multicore world of today and tomorrow.

Julia unites this all in one environment, something which was thought impossible until now by most researchers and language designers.

The Julia logo

Julia's place among the other programming languages

Julia reconciles and brings together the technologies that before were considered separate, namely:

  • The dynamic, untyped, and interpreted languages on the one hand (Python, Ruby, Perl, MATLAB/Octave, R, and so on)

  • The statically typed and compiled languages on the other (C, C++, Fortran, and Fortress)

How can Julia have the flexibility of the first and the speed of the second category?

Julia has no static compilation step. The machine code is generated just-in-time by an LLVM-based JIT compiler. This compiler, together with the design of the language, helps Julia to achieve maximal performance for numerical, technical, and scientific computing. The key for the performance is the type information, which is gathered by a fully automatic and intelligent type inference engine, that deduces the type from the data contained in the variables. Indeed, because Julia has a dynamic type system, declaring the type of variables in the code is optional. Indicating types is not necessary, but it can be done to document the code, improve tooling possibilities, or in some cases, to give hints to the compiler to choose a more optimized execution path. This optional typing discipline is an aspect it shares with Dart. Typeless Julia is a valid and useful subset of the language, similar to traditional dynamic languages, but it nevertheless runs at statically compiled speeds. Julia applies generic programming and polymorphic functions to the limit, writing an algorithm just once and applying it to a broad range of types. This provides common functionality across drastically different types, for example: size is a generic function with 50 concrete method implementations. A system called dynamic multiple dispatch efficiently picks the optimal method for all of a function's arguments from tens of method definitions. Depending on the actual types very specific and efficient native code implementations of the function are chosen or generated, so its type system lets it align closer with primitive machine operations.

Note

In summary, data flow-based type inference implies multiple dispatch choosing specialized execution code.

However, do keep in mind that types are not statically checked. Exceptions due to type errors can occur at runtime, so thorough testing is mandatory. As to categorizing Julia in the programming language universe, it embodies multiple paradigms, such as procedural, functional, metaprogramming, and also (but not fully) object oriented. It is by no means an exclusively class-based language such as Java, Ruby, or C#. Nevertheless, its type system offers a kind of inheritance and is very powerful. Conversions and promotions for numeric and other types are elegant, friendly, and swift, and user-defined types are as fast and compact as built-in types. As for functional programming, Julia makes it very easy to design programs with pure functions and has no side effects; functions are first-class objects, as in mathematics.

Julia also supports a multiprocessing environment based on a message passing model to allow programs to run via multiple processes (local or remote) using distributed arrays, enabling distributed programs based on any of the models for parallel programming.

Julia is equally suited for general programming as is Python. It has as good and modern (Unicode capable) string processing and regular expressions as Perl or other languages. Moreover, it can also be used at the shell level, as a glue language to synchronize the execution of other programs or to manage other processes.

Julia has a standard library written in Julia itself, and a built-in package manager based on GitHub, which is called Metadata, to work with a steadily growing collection of external libraries called packages. It is cross platform, supporting GNU/Linux, Darwin/OS X, Windows, and FreeBSD for both x86/64 (64-bit) and x86 (32-bit) architectures.

A comparison with other languages for the data scientist

Because speed is one of the ultimate targets of Julia, a benchmark comparison with other languages is displayed prominently on the Julia website (http://julialang.org/). It shows that Julia's rivals C and Fortran, often stay within a factor of two of fully optimized C code, and leave the traditional dynamic language category far behind. One of Julia's explicit goals is to have sufficiently good performance that you never have to drop down into C. This is in contrast to the following environments, where (even for NumPy) you often have to work with C to get enough performance when moving to production. So, a new era of technical computing can be envisioned, where libraries can be developed in a high-level language instead of in C or FORTRAN. Julia is especially good at running MATLAB and R-style programs. Let's compare them somewhat more in detail.

MATLAB

Julia is instantly familiar to MATLAB users; its syntax strongly resembles that of MATLAB, but Julia aims to be a much more general purpose language than MATLAB. The names of most functions in Julia correspond to the MATLAB/Octave names, and not the R names. Under the covers, however, the way the computations are done, things are extremely different. Julia also has equally powerful capabilities in linear algebra, the field where MATLAB is traditionally applied. However, using Julia won't give you the same license fee headaches. Moreover, the benchmarks show that it is from 10 to 1,000 times faster depending on the type of operation, also when compared to Octave (the open source version of MATLAB). Julia provides an interface to the MATLAB language with the package MATLAB.jl (https://github.com/lindahua/MATLAB.jl).

R

R was until now the chosen development language in the statistics domain. Julia proves to be as usable as R in this domain, but again with a performance increase of a factor of 10 to 1,000. Doing statistics in MATLAB is frustrating, as is doing linear algebra in R, but Julia fits both the purposes. Julia has a much richer type system than the vector-based types of R. Some statistics experts such as Douglas Bates heavily support and promote Julia as well. Julia provides an interface to the R language with the package Rif.jl (https://github.com/lgautier/Rif.jl).

Python

Again, Julia has a performance head start of a factor of 10 to 30 times as compared to Python. However, Julia compiles the code that reads like Python into machine code that performs like C. Furthermore, if necessary you can call Python functions from within Julia using the PyCall package (https://github.com/stevengj/PyCall.jl).

Because of the huge number of existing libraries in all these languages, any practical data scientist can and will need to mix the Julia code with R or Python when the problem at hand demands it.

Julia can also be applied to data analysis and big data, because these often involve predictive analysis, modeling problems that can often be reduced to linear algebra algorithms, or graph analysis techniques, all things Julia is good at tackling.

In the field of High Performance Computing (HPC), a language such as Julia has long been lacking. With Julia, domain experts can experiment and quickly and easily express a problem in such a way that they can use modern HPC hardware as easily as a desktop PC. In other words, a language that gets users started quickly without the need to understand the details of the underlying machine architecture is very welcome in this area.

Useful links

The following are the links that can be useful while using Julia:

Summary

In this introduction, we gave an overview of Julia's characteristics and compared them to the existing languages in its field. Julia's main advantage is its ability to generate specialized code for different input types. When coupled with the compiler's ability to infer these types, this makes it possible to write the Julia code at an abstract level while achieving the efficiency associated with the low-level code. Julia is already quite stable and production ready. The learning curve for Julia is very gentle; the idea being that people who don't care about fancy language features should be able to use it productively too and learn about new features only when they become useful or needed.