Julia 1.0 Programming Cookbook

By: Bogumił Kamiński, Przemysław Szufel

Overview of this book

Julia, with its dynamic nature and high performance, shortens the time needed to develop computational models and yields easy-to-maintain code. This book is a solution-based guide that takes you through different aspects of programming with Julia. Starting with the new features of Julia 1.0, each recipe addresses a specific problem, providing a solution and explaining how it works. You will work with the powerful Julia tools and data structures along with the most popular Julia packages. You will learn to create vectors, handle variables, and work with functions. You will be introduced to various recipes for numerical computing, distributed computing, and achieving high performance. You will see how to optimize data science programs with parallel computing and memory allocation. We will look into more advanced concepts such as metaprogramming and functional programming. Finally, you will learn how to tackle issues while working with databases and data processing, and you will learn about data science problems, data modeling, data analysis, data manipulation, parallel processing, and cloud computing with Julia. By the end of the book, you will have acquired the skills to work more effectively with your data.

Setting up Julia to use multiple cores


Most current computers have multiple cores. In this recipe, we explain how to start Julia so that it can use them. There are two basic ways to use multiple cores: multithreading and multiprocessing. A basic explanation of the differences between the two approaches is available at https://www.backblaze.com/blog/whats-the-diff-programs-processes-and-threads/ and https://en.wikipedia.org/wiki/Thread_(computing)#Threads_vs._processes. The major difference is that each process has its own separate state, whereas the threads within a process share that process's state, memory, and other resources. Both options are discussed in this recipe.

Getting ready

In order to test how multiprocessing works, prepare two simple files that display a text message in the console. When running parallelization tests, we will see messages generated by those scripts appear asynchronously.

Create a hello.jl file in your working directory, containing the following code:

println("Hello " * join(ARGS, ", "))

Then create hello2.jl with the following code:

println("Hello " * join(ARGS, ", "))
sleep(1)

Note

In the GitHub repository for this recipe, you will find the commands.txt file that contains the presented sequence of shell and Julia commands and the hello.jl and hello2.jl files described above.

Now, open your favorite terminal to execute the commands.

How to do it...

We will first explain how to start Julia using multiple processes. In the second part of the recipe, we will set up Julia to use multiple threads.

Multiple processes

In order to start several Julia processes, perform the following steps:

  1. Specify the number of required worker processes using the -p option on Julia startup.
  2. Then, check the number of workers in Julia by using the nworkers() function from the Distributed package.
  3. Run the command following $ in your OS shell, then import the Distributed package and write nworkers() while in Julia, and then use exit() to go back to the shell:
$ julia --banner=no -p 2

julia> using Distributed

julia> nworkers()
2

julia> exit()

$

If you want to execute a script on the master process and on every worker at startup, you can do so using the -L option.

  1. Run the hello.jl and hello2.jl scripts (the steps to start and exit Julia are the same as in the preceding steps):
$ julia --banner=no -p auto -L hello.jl
Hello
 From worker 4: Hello
 From worker 5: Hello
julia> From worker 2: Hello
 From worker 3: Hello
julia> exit()

$ julia --banner=no -p auto -L hello2.jl
Hello
 From worker 4: Hello
 From worker 5: Hello
 From worker 2: Hello
 From worker 3: Hello
julia> exit()

$

We can see that when the -L option is passed, Julia stays in the REPL after executing the script (as opposed to running a script normally, where we have to pass the -i option to remain in the REPL). The difference in behavior between hello.jl and hello2.jl is explained in the How it works... section.
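
A rough in-session analogue of -L can be sketched with the @everywhere macro from Distributed, which evaluates an expression on the master and on every worker. This is an illustration rather than a replacement for the startup option; the addprocs fallback and the variable name ids are assumptions made so that the snippet runs on its own.

```julia
using Distributed

# Assumption: Julia was started with `-p`; if not, add two local workers
# so that the sketch still has something to talk to.
nprocs() == 1 && addprocs(2)

# Evaluate the same expression on the master and on every worker.
# Worker output is forwarded asynchronously, so the line order may vary.
@everywhere println("Hello from process ", myid())

# Fetching the results explicitly gives a deterministic order instead.
ids = [remotecall_fetch(myid, p) for p in procs()]
println(ids)
```

Collecting values with remotecall_fetch avoids relying on the order in which forwarded console output arrives.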

Multiple threads

Julia can also be run in multithreaded mode. The number of threads is controlled by the JULIA_NUM_THREADS environment variable. Perform the following steps:

  1. Set the JULIA_NUM_THREADS environment variable before starting Julia. To use as many threads as your machine has cores, set it to the number of cores.
  2. Check how many threads Julia is using with the Threads.nthreads() function.

How you set the variable depends on your operating system and shell:

  1. If you are using bash on Linux, run the following commands:
$ export JULIA_NUM_THREADS=`nproc`
$ julia -e "println(Threads.nthreads())"
4
$
  2. If you are using cmd on Windows, run the following commands:
C:\> set JULIA_NUM_THREADS=%NUMBER_OF_PROCESSORS%
C:\> julia -e "println(Threads.nthreads())"
4
C:\>

Observe that we have not used the -i option in either case, so the process terminated immediately.
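
To confirm that the threads are actually used and not only counted, a minimal sketch along the following lines can help. It assumes Julia was started with JULIA_NUM_THREADS set as shown above; the array name ids is illustrative. With a single thread, every entry will simply hold the same id.

```julia
using Base.Threads   # provides nthreads, threadid, and @threads

# One slot per loop iteration; each iteration records the id of the
# thread that executed it.
ids = zeros(Int, 2 * nthreads())
@threads for i in eachindex(ids)
    ids[i] = threadid()
end

println("threads available: ", nthreads())
println("thread id per iteration: ", ids)
```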

How it works...

The -p {N|auto} switch tells Julia to spin up N additional worker processes on startup. The auto option starts as many workers as you have cores on your machine, so julia -p auto is equivalent to:

  • julia -p `nproc` on Linux
  • julia -p %NUMBER_OF_PROCESSORS% on Windows

It is important to understand that when you start N workers, where N is greater than 1, Julia will spin up N+1 processes: one master process and N worker processes. You can check this using the nprocs() function. If N is equal to 1, then only one process is started.
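
The relationship between the master and the workers can be inspected from within Julia. The following is a minimal sketch that assumes a session started with julia -p 2; the addprocs fallback is an addition so that the snippet also runs in a session started without -p.

```julia
using Distributed

# Assumption: this session was started with `julia -p 2`.
# If it was started without -p, add two local workers instead.
nprocs() == 1 && addprocs(2)

println("processes:  ", nprocs())    # master plus workers
println("workers:    ", nworkers())  # worker processes only
println("worker ids: ", workers())
println("this id:    ", myid())      # the master process has id 1
```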

In the output of the first -L example, we can see that hello.jl was executed on the master process and on all of the worker processes. Additionally, observe that the execution was asynchronous. In this case, workers 4 and 5 printed their messages before the master process printed the Julia prompt, but workers 2 and 3 printed theirs after it. By adding a sleep(1) statement in hello2.jl, we make the master process wait for one second, which is sufficient time for all workers to run their println command.

As you have seen, in order to start Julia with multiple threads, you have to set the JULIA_NUM_THREADS environment variable, which Julia uses to determine how many threads it should start. To have any effect, this value must be set before Julia is started. You can read it inside Julia via ENV["JULIA_NUM_THREADS"], but changing it while Julia is running will not add or remove threads. Therefore, before running Julia you have to type the following in a terminal session:

  • export JULIA_NUM_THREADS=[number of threads] on Linux or if you use bash on Windows
  • set JULIA_NUM_THREADS=[number of threads] on Windows if you use the standard shell
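
The point that the variable is only read at startup can be checked with a short sketch. The value "8" below is arbitrary and purely illustrative.

```julia
# Thread count fixed when this Julia process was started.
before = Threads.nthreads()
println("threads at startup: ", before)
println("JULIA_NUM_THREADS:  ", get(ENV, "JULIA_NUM_THREADS", "(not set)"))

# Changing the variable inside a running session only updates ENV;
# it does not add or remove threads in this process.
ENV["JULIA_NUM_THREADS"] = "8"
after = Threads.nthreads()
println("threads after changing ENV: ", after)
```

The new value would only matter to Julia processes started afterwards with this environment.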

There's more...

You can also add processes after Julia has started by using the addprocs function. The following code was run on Windows with two drives, C: and D:, present. Julia is started in the D:\ directory:

D:\> julia --banner=no -p 2 -L hello2.jl
Hello
 From worker 3: Hello
 From worker 2: Hello
julia> pwd()
"D:\\"

julia> using Distributed

julia> pmap(i -> (i, myid(), pwd()), 1:nworkers())
2-element Array{Tuple{Int64,Int64,String},1}:
 (1, 2, "D:\\")
 (2, 3, "D:\\")

julia> cd("C:\\")

julia> pwd()
"C:\\"

julia> addprocs(2)
2-element Array{Int64,1}:
 4
 5

julia> pmap(i -> (i,myid(),pwd()), 1:nworkers())
4-element Array{Tuple{Int64,Int64,String},1}:
 (1, 3, "D:\\")
 (2, 2, "D:\\")
 (3, 5, "C:\\")
 (4, 4, "C:\\")

In particular, we see that each worker has its own working directory, which is initially set to the working directory of the master Julia process at the moment the worker is started. Also, addprocs does not execute the script that was specified by the -L switch on Julia startup.
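
Because addprocs does not rerun the -L script, any code the new workers need has to be loaded on them explicitly. One possible approach, shown here as a sketch, is the two-argument form of @everywhere, which takes a list of process ids. The helper greet is hypothetical and exists only for this illustration.

```julia
using Distributed

# Add two local workers after startup; addprocs returns their ids.
new_ids = addprocs(2)

# Hypothetical helper: define greet() only on the newly added workers.
@everywhere new_ids begin
    greet() = "Hello from worker $(myid())"
end

# Call the helper on each new worker and collect the results in order.
msgs = [remotecall_fetch(() -> greet(), p) for p in new_ids]
foreach(println, msgs)
```

The same pattern with include in place of the function definition would load a script file on the new workers, provided that the file is reachable from their working directory.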

Additionally, we can see a simple use of the pmap and myid functions. The first is a parallelized version of the map function. The second returns the identification number of the process on which it is run.
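
As a small illustration of the relationship between map and pmap, the following sketch computes the same result both ways and then records which worker handled each item. The variable names and the addprocs fallback are assumptions made for this example.

```julia
using Distributed

# Assumption: workers are available; add two local ones if none exist.
nprocs() == 1 && addprocs(2)

# pmap returns results in the same order as map, but the function calls
# are distributed over the worker processes.
squares_serial   = map(x -> x^2, 1:5)
squares_parallel = pmap(x -> x^2, 1:5)
println(squares_parallel)

# myid() reports which process handled each item; the assignment of
# items to workers may differ from run to run.
ids = pmap(_ -> myid(), 1:4)
println(ids)
```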

As we explained earlier, it is not possible to add threads to a running Julia process. The number of threads has to be specified before Julia is started.

Choosing between multiple processes and multiple threads is not simple. A rule of thumb is to use threads when the tasks running in parallel need to share data and communicate frequently.
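
The difference in data sharing behind this rule of thumb can be made concrete with a short sketch. It assumes at least one worker process (one is added if needed); the names x, worker_x, and shared are illustrative. Core.eval is used to read the worker's own copy of x so that the master's value is not shipped along with a closure.

```julia
using Distributed

# Assumption: ensure at least one worker process exists for the comparison.
nprocs() == 1 && addprocs(1)
w = first(workers())

# Processes: each one holds its own copy of the global variable x.
@everywhere x = 0
x = 42    # changes the master's copy only
worker_x = remotecall_fetch(Core.eval, w, Main, :x)  # read the worker's copy
println("master x = ", x, ", worker x = ", worker_x)

# Threads: all threads of one process write into the same array.
shared = zeros(Int, 8)
Threads.@threads for i in eachindex(shared)
    shared[i] = i
end
println(shared)
```

With processes, a change has to be communicated explicitly to become visible elsewhere, whereas threads see each other's writes to the same memory directly.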

See also

More details about how to work with multiple processes and multiple threads are explained in the Multithreading in Julia and Distributed computing with Julia recipes in Chapter 10, Distributed Computing.