Learning Julia

By: Anshul Joshi, Rahul Lakhanpal
Overview of this book

Julia is a highly appropriate language for scientific computing, but it comes with all the required capabilities of a general-purpose language. It allows us to achieve C/Fortran-like performance while maintaining the concise syntax of a scripting language such as Python. It is perfect for building high-performance and concurrent applications. From the basics of its syntax to learning built-in object types, this book covers it all. This book shows you how to write effective functions, reduce code redundancies, and improve code reuse. It will be helpful for new programmers who are starting out with Julia to explore its wide and ever-growing package ecosystem and also for experienced developers/statisticians/data scientists who want to add Julia to their skill-set. The book presents the fundamentals of programming in Julia and in-depth informative examples, using a step-by-step approach. You will be taken through concepts and examples such as doing simple mathematical operations, creating loops, metaprogramming, functions, collections, multiple dispatch, and so on. By the end of the book, you will be able to apply your skills in Julia to create and explore applications of any domain.
8
Data Visualization and Graphics

Julia's importance in data science


In the last decade, data science has become a buzzword, with Harvard Business Review naming data scientist the sexiest job of the 21st century. What is a data scientist? One answer was published in The Guardian (https://www.theguardian.com/careers/2015/jun/30/whats-a-data-scientist-and-how-do-i-become-one):

A data scientist takes raw data and marries it with analysis to make it accessible and more valuable for an organization. To do this, they need a unique blend of skills—a solid grounding in maths and algorithms and a good understanding of human behaviors, as well as knowledge of the industry they're working in, to put their findings into context. From here, they can unlock insights from the datasets and start to identify trends. 

The technical skills of a data scientist vary but, generally, they are good at programming and have a very strong background in mathematics (especially statistics), skills in machine learning, and knowledge of big data. A data scientist also needs an in-depth understanding of the domain they are working in. Julia was designed for scientific and numerical computation, and with the advent of big data there is a need for a language that can work on huge amounts of data. Although Spark and MapReduce (Hadoop) are the processing engines generally used with Python, Scala, and Java, Julia with Intel's High Performance Analytics Toolkit can provide an alternative. It is also worth noting that Julia has built-in support for parallel computing, and Julia code is much easier to write and prototype than Spark/Hadoop jobs.

One great feature of Julia is that it solves the two-language problem. Generally, with Python and R, the code that does most of the heavy work is written in C/C++ and then called from the high-level language. This is not required with Julia, as its performance is comparable to that of C/C++. Therefore, the complete program, including the parts that do heavy computation, can be written in Julia itself.
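As a minimal sketch of what this means in practice, the following numeric kernel is written as an ordinary loop in plain Julia, with no C extension behind it. The function name `sum_of_squares` and the input values are hypothetical, chosen only for illustration, and the snippet assumes a recent Julia release (1.x).

```julia
# A hand-written numeric kernel in plain Julia: no C/C++ extension is involved.
# `sum_of_squares` is a hypothetical name used only for this illustration.
function sum_of_squares(xs::AbstractVector{Float64})
    total = 0.0
    for x in xs
        total += x * x   # the hot loop is compiled to native code by Julia
    end
    return total
end

xs = collect(1.0:1000.0)       # the values 1.0, 2.0, ..., 1000.0
println(sum_of_squares(xs))
```

In Python or R, a loop like this would typically be pushed down into a library routine written in C; here the loop itself is the implementation.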

Benchmarks

We mentioned Julia's speed above; it is what sets the language apart from traditional dynamically typed languages. So, how fast can Julia be? The following micro-benchmark results were obtained on a single core (serial execution) of an Intel(R) Xeon(R) E7-8850 2.00 GHz CPU with 1 TB of 1067 MHz DDR3 RAM, running Linux:

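| Benchmark     | Julia 0.4.0 | Python 3.4.3 | R 3.2.2 | MATLAB R2015b | Go go1.5 | Java 1.8.0_45 |
|---------------|-------------|--------------|---------|---------------|----------|---------------|
| fib           | 2.11        | 77.76        | 533.52  | 26.89         | 1.86     | 1.21          |
| parse_int     | 1.45        | 17.02        | 45.73   | 802.52        | 1.20     | 3.35          |
| quicksort     | 1.15        | 32.89        | 264.54  | 4.92          | 1.29     | 2.60          |
| mandel        | 0.79        | 15.32        | 53.16   | 7.58          | 1.11     | 1.35          |
| pi_sum        | 1.00        | 21.99        | 9.56    | 1.00          | 1.00     | 1.00          |
| rand_mat_stat | 1.66        | 17.93        | 14.56   | 14.52         | 2.96     | 3.92          |
| rand_mat_mul  | 1.02        | 1.14         | 1.57    | 1.12          | 1.42     | 2.36          |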

These benchmark times are relative to C (smaller is better, C performance = 1.0). Benchmarks can be misleading: a result is only meaningful when each implementation follows good coding practices for its language and all of them are measured under identical conditions. The Julia project has been open about how it measured these benchmarks, and the code is available at https://github.com/JuliaLang/julia/tree/master/test/perf/micro.
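To give a feel for what such micro-benchmarks exercise, here is a rough sketch of two of the kernels, fib and pi_sum, together with a simple timing. It is an approximation written for this discussion, not the exact benchmark source; the default arguments (500 repeats of 10,000 terms) and the input `fib(20)` are assumptions, and the snippet assumes a recent Julia release (1.x). The absolute times it prints depend on the machine; a relative figure like those above would additionally require timing an equivalent C program on the same machine.

```julia
# Recursive Fibonacci: mostly measures function-call overhead.
fib(n) = n < 2 ? n : fib(n - 1) + fib(n - 2)

# Partial sum of 1/k^2, repeated so that the run time is measurable.
# The defaults are assumptions made for this sketch.
function pi_sum(repeats = 500, terms = 10_000)
    total = 0.0
    for _ in 1:repeats
        total = 0.0                  # restart the sum on every repetition
        for k in 1:terms
            total += 1.0 / (k * k)
        end
    end
    return total
end

# Call each function once first so that compilation time is not measured.
fib(20)
pi_sum()

t_fib = @elapsed fib(20)
t_pi = @elapsed pi_sum()
println("fib(20) took ", t_fib, " s")
println("pi_sum() took ", t_pi, " s")
```

Warming up before timing matters in Julia because the first call to a function includes just-in-time compilation, which would otherwise dominate such short measurements.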

  • Julia is significantly faster than Python on most of these benchmarks. However, some of Python's numerical libraries are written in C, and where the work is done inside them (as in rand_mat_mul) Python performs nearly on par with Julia.
  • R was designed specifically for statisticians. It has a huge set of libraries for statistics and numerical computation and is available for free. It used to be the language of choice for data scientists (now Python is preferred). R is single-threaded and is a lot slower than Julia.
  • MATLAB is not a free product. It comes with a paid license (students may get discounts). It is used by statisticians and academics for some specific use cases. Most of the above benchmarks run considerably slower on MATLAB than on Julia, with parse_int the most extreme case.
  • Go was designed from scratch for systems programming. It was created by Google and its source code is available on GitHub, where it is actively developed. Go performs really well on these benchmarks, but it is not designed for numerical and scientific computing.
  • Java performs well. It beats Julia on fib, while Julia beats it on most of the other benchmarks. But we also need to consider the development time associated with Java. Julia is designed so that it can be used even for rapid prototyping, and that combination of speed and ease of development sets it apart.

Julia is therefore well suited to data science problems. Its ecosystem may not yet be as comprehensive as that of some other languages, but it is growing at a great pace.