Book Image

Julia Programming Projects

By : Adrian Salceanu
Book Image

Julia Programming Projects

By: Adrian Salceanu

Overview of this book

Julia is a new programming language that offers a unique combination of performance and productivity. Its powerful features, friendly syntax, and speed are attracting a growing number of adopters from Python, R, and Matlab, effectively raising the bar for modern general and scientific computing. After six years in the making, Julia has reached version 1.0. Now is the perfect time to learn it, due to its large-scale adoption across a wide range of domains, including fintech, biotech, education, and AI. Beginning with an introduction to the language, Julia Programming Projects goes on to illustrate how to analyze the Iris dataset using DataFrames. You will explore functions and the type system, methods, and multiple dispatch while building a web scraper and a web app. Next, you'll delve into machine learning, where you'll build a books recommender system. You will also see how to apply unsupervised machine learning to perform clustering on the San Francisco business database. After metaprogramming, the final chapters will discuss dates and time, time series analysis, visualization, and forecasting. We'll close with package development, documenting, testing and benchmarking. By the end of the book, you will have gained the practical knowledge to build real-world applications in Julia.
Table of Contents (19 chapters)
Title Page
Copyright and Credits
Dedication
About Packt
Contributors
Preface
Index

The package system


Your Julia installation comes with a powerful package manager called Pkg. This handles all the expected operations, such as adding and removing packages, resolving dependencies and keeping installed packages up to date, running tests, and even assisting with publishing our own packages.

Packages play a pivotal role by providing a wide range of functionality, seamlessly extending the core language. Let's take a look at the most important package management functions.

Adding a package

In order to be known to Pkg, the packages must be added to a registry that is available to Julia. Pkg supports working with multiple registries simultaneously—including private ones hosted behind corporate firewalls. By default, Pkg is configured to use Julia's General registry, a repository of free and open sources packages maintained by the Julia community.

Pkg is quite a powerful beast and we'll use it extensively throughout the book. Package management is a common task when developing with Julia so we'll have multiple opportunities to progressively dive deeper. We'll take our first steps now as we'll learn how to add packages—and we'll do this by stacking a few powerful new features to our Julia setup.

OhMyREPL

One of my favourite packages is called OhMyREPL. It implements a few super productive features for the Julia REPL, most notably syntax highlighting and brackets pairing. It's a great addition that makes the interactive coding experience even more pleasant and efficient.

Julia's Pkg is centered around GitHub. The creators distribute the packages as git repos, hosted on GitHub—and even the General registry is a GitHub repository itself. OhMyREPL is no exception. If you want to learn more about it before installing it—always a good idea when using code from third parties — you can check it out at https://github.com/KristofferC/OhMyREPL.jl 

Keep in mind that even if it's part of the General registry, the packages come with no guarantees and they're not necessarily checked, validated or endorsed by the Julia community. However, there are a few common sense indicators which provide insight into the quality of the package, most notably the number of stars, the status of the tests as well as the support for the most recent Julia versions.

The first thing we need to do in order to add a package is to enter the Pkg REPL-mode. We do this by typing ] at the beginning of the line:

julia>]

The cursor will change to reflect that we're now ready to manage packages:

(v1.0) pkg>

Note

IJulia does not (yet) support the pkg> mode, but we can execute Pkg commands by wrapping them in pkg"..." as in pkg"add OhMyREPL".

Pkg uses the concept of environments, allowing us to define distinct and independent sets of packages on a per-project basis. This is a very powerful and useful feature, as it eliminates dependency conflicts caused by projects that rely on different versions of the same package (the so-called dependency hell).

Given that we haven't created any project yet, Pkg will just use the default project, v1.0, indicated by the value between the parenthesis. This represents the Julia version that you're running on—and it's possible that you'll get a different default project depending on your very own version of Julia.

Now we can just go ahead and add OhMyREPL:

(v1.0) pkg> add OhMyREPL 
  Updating registry at `~/.julia/registries/General` 
  Updating git-repo `https://github.com/JuliaRegistries/General.git` 
 Resolving package versions... 
  Updating `~/.julia/environments/v1.0/Project.toml` 
  [5fb14364] + OhMyREPL v0.3.0 
  Updating `~/.julia/environments/v1.0/Manifest.toml` 
  [a8cc5b0e] + Crayons v1.0.0 
  [5fb14364] + OhMyREPL v0.3.0 
  [0796e94c] + Tokenize v0.5.2 
  [2a0f44e3] + Base64 
  [ade2ca70] + Dates 
  [8ba89e20] + Distributed 
  [b77e0a4c] + InteractiveUtils 
  [76f85450] + LibGit2 
  [8f399da3] + Libdl 
  [37e2e46d] + LinearAlgebra 
  [56ddb016] + Logging 
  [d6f4376e] + Markdown 
  [44cfe95a] + Pkg 
  [de0858da] + Printf 
  [3fa0cd96] + REPL 
  [9a3f8284] + Random 
  [ea8e919c] + SHA 
  [9e88b42a] + Serialization 
  [6462fe0b] + Sockets 
  [8dfed614] + Test 
  [cf7118a7] + UUIDs 
  [4ec0a83e] + Unicode 

Note

The IJulia equivalent of the previous command is pkg"add OhMyREPL".

When running pkg> add on a fresh Julia installation, Pkg will clone Julia's General registry and use it to look up the names of the package we requested. Although we only explicitly asked for OhMyREPL, most Julia packages have external dependencies that also need to be installed. As we can see, our package has quite a few—but they were promptly installed by Pkg.

Custom package installation

Sometimes we might want to use packages that are not added to the general registry. This is usually the case with packages that are under (early) development—or private packages. For such situations, we can pass pkg> add the URL of the repository, instead of the package's name:

(v1.0) pkg> add https://github.com/JuliaLang/Example.jl.git 
   Cloning git-repo `https://github.com/JuliaLang/Example.jl.git` 
  Updating git-repo `https://github.com/JuliaLang/Example.jl.git` 
 Resolving package versions... 
  Updating `~/.julia/environments/v1.0/Project.toml` 
  [7876af07] + Example v0.5.1+ #master (https://github.com/JuliaLang/Example.jl.git) 
  Updating `~/.julia/environments/v1.0/Manifest.toml` 
  [7876af07] + Example v0.5.1+ #master (https://github.com/JuliaLang/Example.jl.git)

Another common scenario is when we want to install a certain branch of a package's repository. This can be easily achieved by appending #name_of_the_branch at the end of the package's name or URL:

(v1.0) pkg> add OhMyREPL#master 
   Cloning git-repo `https://github.com/KristofferC/OhMyREPL.jl.git` 
  Updating git-repo `https://github.com/KristofferC/OhMyREPL.jl.git` 
 Resolving package versions... 
 Installed Crayons ─ v0.5.1 
  Updating `~/.julia/environments/v1.0/Project.toml` 
  [5fb14364] ~ OhMyREPL v0.3.0 ⇒ v0.3.0 #master (https://github.com/KristofferC/OhMyREPL.jl.git) 
  Updating `~/.julia/environments/v1.0/Manifest.toml`
  [a8cc5b0e] ↓ Crayons v1.0.0 ⇒ v0.5.1 
  [5fb14364] ~ OhMyREPL v0.3.0 ⇒ v0.3.0 #master (https://github.com/KristofferC/OhMyREPL.jl.git)

Or, for unregistered packages, use the following:

(v1.0) pkg> add https://github.com/JuliaLang/Example.jl.git#master

If we want to get back to using the published branch, we need to free the package:

(v1.0) pkg> free OhMyREPL 
 Resolving package versions... 
  Updating `~/.julia/environments/v1.0/Project.toml` 
  [5fb14364] ~ OhMyREPL v0.3.0 #master (https://github.com/KristofferC/OhMyREPL.jl.git) ⇒ v0.3.0 
  Updating `~/.julia/environments/v1.0/Manifest.toml` 
  [a8cc5b0e] ↑ Crayons v0.5.1 ⇒ v1.0.0 
  [5fb14364] ~ OhMyREPL v0.3.0 #master (https://github.com/KristofferC/OhMyREPL.jl.git) ⇒ v0.3.0
Revise

That was easy, but practice makes perfect. Let's add one more! This time we'll install Revise, another must-have package that enables a streamlined development workflow by monitoring and detecting changes in your Julia files and automatically reloading the code when needed. Before Revise it was notoriously difficult to load changes in the current Julia process, developers usually being forced to restart the REPL—a time-consuming and inefficient process. Revise can eliminate the overhead of restarting, loading packages, and waiting for code to compile.

Note

You can learn more about Revise by reading its docs at https://timholy.github.io/Revise.jl/latest/.

Unsurprisingly, we only have to invoke add one more time, this time passing in Revise for the package name:

(v1.0) pkg> add Revise 
Resolving package versions... 
 Installed Revise ─ v0.7.5 
  Updating `~/.julia/environments/v1.0/Project.toml` 
  [295af30f] + Revise v0.7.5 
  Updating `~/.julia/environments/v1.0/Manifest.toml` 
  [bac558e1] + OrderedCollections v0.1.0
  [295af30f] + Revise v0.7.5 
  [7b1f6079] + FileWatching 

Note

The add command also accepts multiple packages at once. We added them one by one now, for learning purposes, but otherwise, we could've just executed (v1.0) pkg> add OhMyREPL Revise.

Checking the package status

We can confirm that the operations were successful by checking our project's status, using the aptly named status command:

 (v1.0) pkg> status 
    Status `~/.julia/environments/v1.0/Project.toml` 
  [7876af07] Example v0.5.1+ #master (https://github.com/JuliaLang/Example.jl.git) 
  [5fb14364] OhMyREPL v0.3.0 
  [295af30f] Revise v0.7.5 

The status command displays all the installed packages, including, from left to right, the short version of the package's id (called the UUID), the name of the package and the version number. Where appropriate, it will also indicate the branch that we're tracking, as in the case of Example, where we're on the master branch.

Note

Pkg also supports a series of shortcuts, if you want to save a few keystrokes. In this case, st can be used instead of status.

Using packages

Once a package has been added, in order to access its functionality we have to bring into scope. That's how we tell Julia that we intend to use it, asking the compiler to make it available for us. For that, first, we need to exit pkg mode. Once we're at the julian prompt, in order to use OhMyREPL, all we need to do is execute:

julia> using OhMyREPL
[ Info: Precompiling OhMyREPL [5fb14364-9ced-5910-84b2-373655c76a03]

That's all it takes—OhMyREPL is now automatically enhancing the current REPL session. To see it in action, here is what the regular REPL looks like:

And here is the same code, enhanced by OhMyREPL:

Syntax highlighting and bracket matching make the code more readable, reducing syntax errors. Looks pretty awesome, doesn't it?

Note

OhMyREPL has a few more cool features up its sleeve—you can learn about them by checking the official documentation at https://kristofferc.github.io/OhMyREPL.jl/latest/index.html.

One more step

OhMyREPL and Revise are excellent development tools and it's very useful to have them loaded automatically in all the Julia sessions. This is exactly why the startup.jl file exists—and now we have the opportunity to put it to good use (not that our heartfelt and welcoming greetings were not impressive enough!).

Here's a neat trick, to get us started—Julia provides an edit function that will open a file in the configured editor. Let's use it to open the startup.jl file:

julia> edit("~/.julia/config/startup.jl")

This will open the file in the default editor. If you haven't yet deleted our previously added welcome messages, feel free to do it now (unless you really like them and in that case, by all means, you can keep them). Now, Revise needs to be used before any other module that we want to track—so we'll want to have it at the top of the file. As for OhMyREPL, it can go next. Your startup.jl file should look like this:

using Revise 
using OhMyREPL 

Save it and close the editor. Next time you start Julia, both Revise and OhMyREPL will be already loaded.

Updating packages

Julia boosts a thriving ecosystem and packages get updated at a rapid pace. It's a good practice to regularly check for updates with pkg> update:

(v1.0) pkg> update

When this command is issued, Julia will first retrieve the latest version of the general repository, where it will check if any of the packages need to be updated.

Beware that issuing the update command will update all the available packages. As we discussed earlier, when mentioning dependency hell, this might not be the best thing. In the upcoming chapters, we will see how to work with individual projects and manage dependencies per individual application. Until then though, it's important to know that you can cherry pick the packages that you want to update by passing in their names:

(v1.0) pkg> update OhMyREPL Revise

Pkg also exposes a preview mode, which will show what will happen when running a certain command without actually making any of the changes:

(v1.0) pkg> preview update OhMyREPL 
(v1.0) pkg> preview add HTTP

Note

The shortcut for pkg> update is pkg> up.

Pinning packages

Sometimes though we might want to ensure that certain packages will not be updated. That's when we pin them:

(v1.0) pkg> pin OhMyREPL 
 Resolving package versions... 
  Updating `~/.julia/environments/v1.0/Project.toml` 
  [5fb14364] ~ OhMyREPL v0.3.0 ⇒ v0.3.0
  Updating `~/.julia/environments/v1.0/Manifest.toml` 
  [5fb14364] ~ OhMyREPL v0.3.0 ⇒ v0.3.0

Pinned packages are marked with the symbol—also present now when checking the status:

(v1.0) pkg> st 
    Status `~/.julia/environments/v1.0/Project.toml` 
  [5fb14364] OhMyREPL v0.3.0
  [295af30f] Revise v0.7.5

If we want to unpin a package, we can use pkg> free:

(v1.0) pkg> free OhMyREPL 
  Updating `~/.julia/environments/v1.0/Project.toml` 
  [5fb14364] ~ OhMyREPL v0.3.0 ⇒ v0.3.0 
  Updating `~/.julia/environments/v1.0/Manifest.toml` 
  [5fb14364] ~ OhMyREPL v0.3.0 ⇒ v0.3.0 
 
(v1.0) pkg> st 
    Status `~/.julia/environments/v1.0/Project.toml` 
  [5fb14364] OhMyREPL v0.3.0 
  [295af30f] Revise v0.7.5

Removing packages

If you no longer plan on using some packages you can delete (or remove them), with the (you guessed it) pkg> remove command. For instance, let's say that we have the following configuration:

(v1.0) pkg> st 
    Status `~/.julia/environments/v1.0/Project.toml` 
  [7876af07] Example v0.5.1+ #master (https://github.com/JuliaLang/Example.jl.git) 
  [5fb14364] OhMyREPL v0.3.0 
  [295af30f] Revise v0.7.5

We can remove the Example package with the following code:

(v1.0) pkg> remove Example 
  Updating `~/.julia/environments/v1.0/Project.toml` 
  [7876af07] - Example v0.5.1+ #master (https://github.com/JuliaLang/Example.jl.git) 
  Updating `~/.julia/environments/v1.0/Manifest.toml` 
  [7876af07] - Example v0.5.1+ #master (https://github.com/JuliaLang/Example.jl.git)

Sure enough, it's now gone:

(v1.0) pkg> st 
    Status `~/.julia/environments/v1.0/Project.toml` 
  [5fb14364] OhMyREPL v0.3.0 
  [295af30f] Revise v0.7.5

Note

The shortcut for pkg> remove is pkg> rm.

Besides the explicit removal of undesired packages, Pkg also has a built-in auto-cleanup function. As package versions evolve and package dependencies change, some of the installed packages will become obsolete and will no longer be used in any existing project. Pkg keeps a log of all the projects used so it can go through the log and see exactly which projects are still needing which packages—and thus identify the ones that are no longer necessary. These can be deleted in one swoop with the pkg> gc command:

(v1.0) pkg> gc Active manifests at: `/Users/adrian/.julia/environments/v1.0/Manifest.toml` `/Users/adrian/.julia/environments/v0.7/Manifest.toml` Deleted /Users/adrian/.julia/packages/Acorn/exWWb: 40.852 KiB Deleted /Users/adrian/.julia/packages/BufferedStreams/hCA7W: 102.235 KiB Deleted /Users/adrian/.julia/packages/Crayons/e1SsX: 49.133 KiB Deleted /Users/adrian/.julia/packages/Example/ljaU2: 4.625 KiB Deleted /Users/adrian/.julia/packages/Genie/XOia2: 2.031 MiB Deleted /Users/adrian/.julia/packages/HTTPClient/ZQR55: 37.669 KiB Deleted /Users/adrian/.julia/packages/Homebrew/l8kUw: 277.296 MiB Deleted /Users/adrian/.julia/packages/LibCURL/Qs5og: 11.599 MiB Deleted /Users/adrian/.julia/packages/LibExpat/6jLDP: 127.247 KiB Deleted /Users/adrian/.julia/packages/LibPQ/N7lDU: 134.734 KiB Deleted /Users/adrian/.julia/packages/Libz/zMAun: 80.744 KiB Deleted /Users/adrian/.julia/packages/Nettle/LMDZh: 50.371 KiB
   Deleted /Users/adrian/.julia/packages/OhMyREPL/limOC: 448.493 KiB 
   Deleted /Users/adrian/.julia/packages/WinRPM/rDDZz: 24.925 KiB 
   Deleted 14 package installations : 292.001 MiB 

Note

Besides the dedicated Pkg REPL mode, Julia also provides a powerful API for programmatically managing packages. We won't cover it, but if you want to learn about it, you can check the official documentation at https://docs.julialang.org/en/latest/stdlib/Pkg/#References-1.

Discovering packages

Package discovery is not yet as simple as it could be, but there are a few good options. I recommend starting with this list of curated Julia packages: https://github.com/svaksha/Julia.jl. It groups a large collection of packages by domain, covering topics such as AI, Biology, Chemistry, Databases, Graphics, Data Science, Physics, Statistics, Super-Computing and more.

If that is not enough, you can always go to https://discourse.julialang.org where the Julia community discusses a multitude of topics related to the language. You can search and browse the existing threads, especially the package announcements section, hosted at https://discourse.julialang.org/c/community/packages.

Of course you can always ask the community for help—Julians are very friendly and welcoming, and a lot of effort is put towards moderation, in order to keep the discussion civilized and constructive. A free Discourse account is required in order to create new topics and post replies.

Finally, https://juliaobserver.com/packages is a third party website that provides a more polished way to look for packages—and it also performs a GitHub search, thus including unregistered packages too.

Registered versus unregistered

Although I already touched upon the topic in the previous paragraphs, I want to close the discussion about Pkg with a word of caution. The fact that a package is registered does not necessarily mean that it has been vetted in terms of functionality or security. It simply means that the package has been submitted by the creator and that it met certain technical requirements to be added to the general registry. The package sources are available on GitHub, and like with any open source software, make sure that you understand what it does, how it should be used, and that you accept the licensing terms.

This concludes our initial discussion about package management. But as this is one of the most common tasks, we'll come back to it again and again in future chapters, where we'll also see a few scenarios for more advanced usage.