Julia 1.0 Programming Cookbook

By: Bogumił Kamiński, Przemysław Szufel

Overview of this book

Julia, with its dynamic nature and high performance, minimizes the time needed to develop computational models while keeping the code easy to maintain. This book is a solution-based guide that takes you through different aspects of programming with Julia. Starting with the new features of Julia 1.0, each recipe addresses a specific problem, providing a solution and explaining how it works. You will work with the powerful Julia tools and data structures along with the most popular Julia packages. You will learn to create vectors, handle variables, and work with functions. You will be introduced to various recipes for numerical computing, distributed computing, and achieving high performance. You will see how to optimize data science programs with parallel computing and memory allocation. We will look into more advanced concepts such as metaprogramming and functional programming. Finally, you will learn how to tackle issues while working with databases and data processing, and will learn about data science problems, data modeling, data analysis, data manipulation, parallel processing, and cloud computing with Julia. By the end of the book, you will have acquired the skills to work more effectively with your data.

Setting up Julia to use multiple cores

Most modern computers have multiple cores. In this recipe, we explain how to start Julia so that it can use them. There are two basic ways to use multiple cores: multithreading and multiprocessing (for a basic explanation of the differences between the two approaches, see https://www.backblaze.com/blog/whats-the-diff-programs-processes-and-threads/ and https://en.wikipedia.org/wiki/Thread_(computing)#Threads_vs._processes). The major difference is that processes have separate state information, whereas multiple threads within a process share process state as well as memory and other resources. Both options are discussed in this recipe.

Getting ready

In order to test how multiprocessing works, prepare two simple files that display a text message in the console. When running parallelization tests, we will see messages generated by those scripts appear asynchronously.

Create a hello.jl file in your working directory, containing the following code:

println("Hello " * join(ARGS, ", "))

And create hello2.jl with the following code:

println("Hello " * join(ARGS, ", "))
sleep(1)

Note

In the GitHub repository for this recipe, you will find the commands.txt file that contains the presented sequence of shell and Julia commands and the hello.jl and hello2.jl files described above.

Now, open your favorite terminal to execute the commands.

How to do it...

We will first explain how to start Julia using multiple processes. In the second part of the recipe, we will set up Julia to use multiple threads.

Multiple processes

In order to start several Julia processes, perform the following steps:

Specify the number of required worker processes using the -p option on Julia startup.
Then, check the number of workers in Julia by using the nworkers() function from the Distributed package.
Run the command following $ in your OS shell, then import the Distributed package and write nworkers() while in Julia, and then use exit() to go back to the shell:

$ julia --banner=no -p 2

julia> using Distributed

julia> nworkers()
2

julia> exit()

$

If you want to execute some script on every worker on startup, you can do it using the -L option.

Run the hello.jl and hello2.jl scripts (the steps to start Julia and exit it are the same as in the preceding steps):

$ julia --banner=no -p auto -L hello.jl
Hello
 From worker 4: Hello
 From worker 5: Hello
julia> From worker 2: Hello
 From worker 3: Hello
julia> exit()

$ julia --banner=no -p auto -L hello2.jl
Hello
 From worker 4: Hello
 From worker 5: Hello
 From worker 2: Hello
 From worker 3: Hello
julia> exit()

$

We can see that when the -L option is passed, Julia stays in the REPL after executing the script (as opposed to running a script normally, where we have to pass the -i option to remain in the REPL). The difference in behavior between hello.jl and hello2.jl is explained in the How it works... section.

Multiple threads

Julia can be run in a multithreaded mode. This mode is enabled by setting the JULIA_NUM_THREADS environment variable. Perform the following steps:

To start Julia with the number of threads equal to the number of cores in your machine, set the environment variable JULIA_NUM_THREADS before starting Julia.
Check how many threads Julia is using with the Threads.nthreads() function.

The commands for these steps differ between Linux and Windows:

If you are using bash on Linux, run the following commands:

$ export JULIA_NUM_THREADS=`nproc`
$ julia -e "println(Threads.nthreads())"
4
$

If you are using cmd on Windows, run the following commands:

C:\> set JULIA_NUM_THREADS=%NUMBER_OF_PROCESSORS%
C:\> julia -e "println(Threads.nthreads())"
 4
C:\>

Observe that we have not used the -i option in either case, so the process terminated immediately.
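Once Julia runs with several threads, a loop can be spread across them. The following is a minimal sketch, not part of the original recipe, that uses the Threads.@threads macro and an atomic counter from Base.Threads. The function name threaded_sum is our own hypothetical helper, and the code also runs correctly with a single thread.

```julia
using Base.Threads

# Hypothetical helper: sums 1:n with the loop iterations split across threads.
function threaded_sum(n)
    total = Atomic{Int}(0)        # atomic counter, safe to update from many threads
    @threads for i in 1:n
        atomic_add!(total, i)     # thread-safe addition
    end
    return total[]                # read the final value
end

println("threads available: ", nthreads())
println("sum of 1:100 = ", threaded_sum(100))
```

The atomic counter is used here because a plain integer updated from several threads at once could lose updates.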

How it works...

The -p {N|auto} switch tells Julia to spin up N additional worker processes on startup. The auto option in the -p switch starts as many workers as you have cores on your machine, so julia -p auto is equivalent to:

julia -p `nproc` on Linux
julia -p %NUMBER_OF_PROCESSORS% on Windows

It is important to understand that when you start Julia with -p N, Julia spins up N+1 processes: one master process and N worker processes. You can check this using the nprocs() function. When Julia is started without the -p switch, only one process is running; in that case nworkers() still returns 1, because the master process is then treated as the only worker.
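The relationship between these counts can also be checked from within a session. The sketch below assumes a Julia session started without the -p switch and adds workers with addprocs; the number of workers (two) is arbitrary and chosen only for illustration.

```julia
using Distributed

# A session started without -p consists of the master process only.
println("before: nprocs = ", nprocs(), ", nworkers = ", nworkers())

addprocs(2)   # add two local worker processes (arbitrary number for illustration)

# Now there is one master process plus the added workers.
println("after:  nprocs = ", nprocs(), ", nworkers = ", nworkers())
println("worker ids: ", workers())
```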

We can see here that hello.jl was executed on the master process and on all of the worker processes. Additionally, observe that the execution was asynchronous. In this case, workers 4 and 5 printed their message before the Julia prompt was printed by the master process, but workers 2 and 3 printed theirs after it. By adding a sleep(1) statement in hello2.jl, we make the master process wait for one second, which is sufficient time for all workers to run their println command.

As you have seen, in order to start Julia with multiple threads, you have to set the environment variable JULIA_NUM_THREADS. Julia uses it to determine how many threads to start. To have any effect, this value must be set before Julia is started. This means that you can read it via ENV["JULIA_NUM_THREADS"], but changing it while Julia is running will not add or remove threads. Therefore, before running Julia you have to type the following in a terminal session:

export JULIA_NUM_THREADS=[number of threads] on Linux or if you use bash on Windows
set JULIA_NUM_THREADS=[number of threads] on Windows if you use the standard shell
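This behavior can be observed directly from a running session. The following is a small sketch, under the assumption that the variable may or may not be set in your environment: it reads the variable, changes it, and shows that Threads.nthreads() stays the same. The variable names are our own.

```julia
# Read the variable as Julia sees it; it may be unset.
requested = get(ENV, "JULIA_NUM_THREADS", nothing)
before = Threads.nthreads()

# Changing the variable inside a running session does not add threads.
ENV["JULIA_NUM_THREADS"] = string(before + 1)
after = Threads.nthreads()

# Restore the original state so that child processes are not affected.
if requested === nothing
    delete!(ENV, "JULIA_NUM_THREADS")
else
    ENV["JULIA_NUM_THREADS"] = requested
end

println("JULIA_NUM_THREADS at startup: ", requested === nothing ? "(not set)" : requested)
println("threads before: ", before, ", after: ", after)
```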

There's more...

You can also add processes after Julia has started using the addprocs function. We are running the following code on Windows with two drives, C: and D:, present. Julia is started in the D:\ directory:

D:\> julia --banner=no -p 2 -L hello2.jl
Hello
 From worker 3: Hello
 From worker 2: Hello
julia> pwd()
"D:\\"

julia> using Distributed

julia> pmap(i -> (i, myid(), pwd()), 1:nworkers())
2-element Array{Tuple{Int64,Int64,String},1}:
 (1, 2, "D:\\")
 (2, 3, "D:\\")

julia> cd("C:\\")

julia> pwd()
"C:\\"

julia> addprocs(2)
2-element Array{Int64,1}:
 4
 5

julia> pmap(i -> (i,myid(),pwd()), 1:nworkers())
4-element Array{Tuple{Int64,Int64,String},1}:
 (1, 3, "D:\\")
 (2, 2, "D:\\")
 (3, 5, "C:\\")
 (4, 4, "C:\\")

In particular, we see that each worker has its own working directory, which is initially set to the working directory of the master Julia process when it is started. Also, addprocs does not execute the script that was specified by the -L switch on Julia startup.
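Because addprocs does not rerun the -L script, any definitions that newly added workers need have to be loaded explicitly. One common way to do this is the @everywhere macro from the Distributed package. The following is a sketch only; greet is a hypothetical function standing in for the contents of a startup script.

```julia
using Distributed

addprocs(2)                      # workers added after startup

@everywhere using Distributed    # make myid() available on every process

# Hypothetical function standing in for code loaded from a startup script.
@everywhere greet() = "Hello from process $(myid())"

# Call greet on each worker and collect the results on the master.
greetings = [remotecall_fetch(greet, w) for w in workers()]
foreach(println, greetings)
```

For a real script file, the same idea would be @everywhere include("hello.jl"), assuming the file is reachable from the working directory of each worker.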

Additionally, we can see a simple use of the pmap and myid functions. The first one is a parallelized version of the map function. The second returns the identification number of the process it is run on.
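To make the relationship between map and pmap concrete, here is a minimal sketch that assumes a session started without -p and adds two local workers (an arbitrary number). It compares the results of both functions and records which process handled each element.

```julia
using Distributed

addprocs(2)   # two local workers (arbitrary number for illustration)

squares_serial   = map(x -> x^2, 1:6)
squares_parallel = pmap(x -> x^2, 1:6)   # same result, work distributed over workers

# Which process handled each element; the assignment may differ between runs.
handled_by = pmap(i -> myid(), 1:6)

println(squares_parallel)
println(handled_by)
```

For a function as cheap as squaring a number, pmap is slower than map because of communication overhead; it pays off when each call does a substantial amount of work.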

As we explained earlier, it is not possible to add threads to a running Julia process. The number of threads has to be specified before Julia is started.

Deciding between using multiple processes and multiple threads is not a simple decision. A rule of thumb is to use threads if there is a need for data sharing and frequent communication between tasks running in parallel.
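The data-sharing aspect of this rule of thumb can be illustrated with a short sketch. It is not taken from the original recipe and the variable names are our own: threads write into one array that lives in a single process, whereas an array passed to a worker process is copied, so changes made on the worker do not reach the master.

```julia
using Distributed
using Base.Threads

# Threads: all iterations write into the same array in the same process.
shared = zeros(Int, 4)
@threads for i in 1:length(shared)
    shared[i] = i
end
println("after threaded loop: ", shared)

# Processes: the argument is copied to the worker, so the master's array is untouched.
addprocs(1)
data = zeros(Int, 4)
remote_sum = remotecall_fetch(workers()[end], data) do d
    d .= 1      # modifies the worker's copy only
    sum(d)
end
println("sum computed on worker: ", remote_sum)
println("array on master: ", data)
```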
