Mastering Julia

Package management


We have noted that Julia uses Git as the repository for itself and for its packages, and that the installation has a built-in package manager, so there is no need to interface directly with GitHub. This repository is located in the Git folder of the installed system.

As a full discussion of the package system is given on the Julia website, we will only cover some of the main commands to use.

Listing, adding, and removing

After installing Julia, the user can create a new repository by using the Pkg.init() command. This clones the metadata from a "well-known" repository and creates a local folder called .julia:

julia> Pkg.init()
INFO: Initializing package repository C:\Users\Malcolm
INFO: Cloning METADATA from git://Github.com/JuliaLang

The latest versions of all installed packages can be updated with the Pkg.update() command.

Notice that if the repository does not exist, the first use of a package command such as Pkg.update() or Pkg.add() will call Pkg.init() to create it:

julia> Pkg.update()
INFO: Updating METADATA...
INFO: Computing changes...
INFO: No packages to install, update or remove.

We previously discussed how to install the ASCIIPlots package by using the Pkg.add("ASCIIPlots") command. The Pkg.status() command can be used to show the current packages installed and Pkg.rm() to remove them:

julia> Pkg.status()
Required packages:
 - ASCIIPlots 0.0.2

julia> Pkg.rm("ASCIIPlots")
INFO: Removing ASCIIPlots
INFO: REQUIRE updated.

After adding ASCIIPlots, we added the Winston graphics package. Most packages have a set of others on which they depend and the list can be found in the REQUIRE file.

For instance, Winston requires the Cairo, Color, IniFile, and Tk packages. Some of these packages also have dependencies, so Pkg.add() will recursively resolve, clone, and install all of these. The Cairo package is interesting since it requires Homebrew on Mac OS X and WinRPM on Windows.

WinRPM further needs URLParse, HTTPClient, LibExpat, and Zlib. So, if we use Pkg.status() again on a Windows installation, we get the following:

julia> Pkg.status()
Required packages:
 - ASCIIPlots    0.0.2
 - Winston       0.11.0
Additional packages:
 - BinDeps       0.2.14
 - Cairo         0.2.13
 - Color         0.2.10
 - HTTPClient    0.1.0
 - IniFile       0.2.2
 - LibCURL       0.1.3
 - LibExpat      0.0.4
 - Tk            0.2.12
 - URIParser     0.0.2
 - URLParse      0.0.0
 - WinRPM        0.0.13
 - Zlib          0.1.5

All the packages installed as dependencies are listed as additional packages. Removing the Winston package will also remove all the additional packages. When adding complex packages, you may wish to add some of the dependent ones first. So, with Winston, you can add both Cairo and Tk first, and they will then be shown as required rather than additional packages.
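The split between required and additional packages can be pictured as a walk over a dependency table. The following is only a sketch: the table is transcribed from the dependencies named in the text for a Windows installation, the names DEPS and closure! are made up for illustration, and the walk ignores version constraints, so it is not how the package manager itself is implemented.

```julia
# Hypothetical dependency table, transcribed from the text (Windows case).
# Illustration only; this is not the data structure Pkg uses internally.
const DEPS = Dict(
    "Winston" => ["Cairo", "Color", "IniFile", "Tk"],
    "Cairo"   => ["WinRPM"],
    "WinRPM"  => ["URLParse", "HTTPClient", "LibExpat", "Zlib"],
)

# Collect every package reachable from `name`, depth first.
function closure!(seen::Set{String}, name::String)
    for dep in get(DEPS, name, String[])
        if !(dep in seen)
            push!(seen, dep)
            closure!(seen, dep)
        end
    end
    return seen
end

required   = ["ASCIIPlots", "Winston"]   # packages the user asked for
additional = Set{String}()               # packages pulled in as dependencies
for pkg in required
    closure!(additional, pkg)
end
setdiff!(additional, required)

println(sort(collect(additional)))
```

Adding "Cairo" and "Tk" to required in this sketch moves them out of the additional set, which mirrors the behaviour described above.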

Choosing and exploring packages

For such a young language, Julia has a rich and rapidly developing set of packages covering all aspects of use to the data scientist and the mathematical analyst. Registered packages are available on GitHub, and the list of these packages can be referenced via http://docs.julialang.org/.

Because the core language is still under review from release to release, some features are deprecated, others changed, and others dropped, so it is possible that specific packages may be at variance with the release of Julia you are using, even if it is designated as the current "stable" one. Furthermore, a package may not work under every operating system. In general, use under Linux fares the best and under Windows the worst.

How then should we select a package? The best indicator is the version number; packages designated v0.0.0 should always be viewed with some suspicion. The date of the last update is also useful. The docs website lists the contributors to each package, with the principal author listed first. Packages with multiple developers are clearly of interest to a variety of contributors and tend to be better discussed and maintained; there is strength in numbers. The winner in this respect seems to be (as of July 2014) the DataFrames package, which is up to version 0.3.15 and has attracted the attention of 33 separate authors.
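Version numbers in Julia are values of type VersionNumber, so the v0.0.0 check can be made in code. The sketch below uses a few versions copied from the Pkg.status() listing shown earlier; the variable names are illustrative, and treating exactly v0.0.0 as the warning sign follows the rule of thumb in the text rather than any built-in policy.

```julia
# A few versions copied from the Pkg.status() listing shown earlier.
installed = Dict(
    "Winston"  => v"0.11.0",
    "Cairo"    => v"0.2.13",
    "URLParse" => v"0.0.0",
    "WinRPM"   => v"0.0.13",
)

# Packages still at v0.0.0, which the text suggests treating with suspicion.
suspect = sort([name for (name, ver) in installed if ver == v"0.0.0"])

# VersionNumber compares component by component, so 0.11.0 ranks above 0.2.13.
newest = maximum(values(installed))

println("suspect: ", suspect)
println("highest version installed: ", newest)
```

Comparing versions as VersionNumber values rather than as strings avoids the trap of "0.11.0" sorting before "0.2.13".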

Even with an old, relatively untouched package, there is nothing to stop you checking out the code and modifying or building on it. Any enhancements or modifications can be applied and the code returned; that's how open source grows. Furthermore, the principal author is likely to be delighted that someone else is finding the package useful and taking an interest in the work.

It is not possible to create a specific taxonomy of Julia packages but certain groupings emerge, which build on the backs of the earlier ones. We will be meeting many of these later in this book, but before that, it may be useful to quickly list a few.

Statistics and mathematics

Statistics is seen rightly as the realm of R and mathematics of MATLAB and Mathematica, while Python impresses in both. The base Julia system provides much of the functionality available in NumPy, while additional packages add that of SciPy and Pandas.

Statistics is well provided in Julia on GitHub by both the https://Github.com/JuliaStats group and a Google group called https://groups.google.com/forum/#!forum/julia-stats.

Much of the basic statistics is provided by Stats.jl and StatsBase.jl. There are various means of working with R-style data frames and loading some of the datasets available to R. The Distributions package covers the probability distributions and the associated functions. Moreover, there is support for time series, cluster analysis, hypothesis testing, MCMC methods, and more.

Mathematical operations such as random number generators and exotic functions are largely in the core (unlike Python), but packages exist for elementary calculus operations, ODE solvers, Monte-Carlo methods, mathematical programming, and optimization. There is a GitHub page for the https://Github.com/JuliaOpt/ group, which lists the packages under the umbrella of optimization.

Data visualization

Graphics support in Julia has sometimes been given less than favorable press in comparison with other languages such as Python, R, and MATLAB. It is a stated aim of the developers to incorporate some degree of graphics support in the core, but at present, this is largely the realm of package developers.

While it was true that v0.1.x offered very limited and flaky graphics, v0.2.x vastly improved the situation and this continues with v0.3.x.

Firstly, there is a module in the core called Base.Graphics, which acts as an abstract layer to packages such as Cairo and Tk/Gtk, which serve to implement much of the required functionality.

Layered on top of these are a couple of packages, namely Winston (which we have introduced already) and Gadfly. Normally, as a user, you will probably work with one or the other of these.

Winston is a 2D graphics package that provides methods for curve plotting and creating histograms and scatter diagrams. Axis labels and display titles can be added, and the resulting display can be saved to files as well as shown on the screen.

Gadfly is a system for plotting and visualization equivalent to the ggplot2 package in R. It can be used to render the graphics output to PNG, PostScript, PDF, and SVG files. Gadfly works best with the following C libraries installed: cairo, pango, and fontconfig. The PNG, PS, and PDF backends all require cairo, but without it, it is still possible to create displays in SVG and JavaScript/D3.

There are a couple of different approaches that are worthy of note: Gaston and PyPlot.

Gaston is an interface to the gnuplot program. On Linux, you need to check whether gnuplot is available, and if not, it must be installed in the usual way via yum or apt-get. On Mac OS X, you also need to install XQuartz, which must be started separately before using Gaston.

Gaston can do whatever gnuplot is capable of. There is a very comprehensive script available in the package by running Gaston.demo().

We have discussed PyPlot briefly before when looking at IJulia. The package uses Julia's PyCall package to call the Matplotlib Python module directly and can display plots in any Julia graphical backend, including, as we have seen, inline graphics in IJulia.

Web and networking

Distributed computing is well represented in Julia. TCP/IP sockets are implemented in the core. Additionally, there is support for Curl, SMTP, and WebSockets. HTTP protocols and parsing are provided by a number of packages, such as HTTP, HttpParser, HttpServer, JSON, and Mustache.
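As a rough illustration of the socket support, the following sketch runs a one-shot echo server and a client in the same process over the loopback interface. It assumes a recent Julia release, where the socket functions are loaded from the Sockets standard library rather than being available without an import; the port hint 8000 and the upper-casing behaviour are arbitrary choices made for the example.

```julia
using Sockets  # standard library in Julia 1.x; an assumption about the release in use

# Bind to the first free port at or above the hint (8000 is an arbitrary choice).
port, server = listenany(8000)

# Serve exactly one client: read a line and send it back in upper case.
@async begin
    sock = accept(server)
    println(sock, uppercase(readline(sock)))
    close(sock)
end

client = connect(port)      # with only a port given, this connects to localhost
println(client, "hello")    # send one line to the server
reply = readline(client)    # wait for the answer
close(client)
close(server)

println(reply)
```

The server runs as a task started with @async, so it gets its turn whenever the client blocks on connect or readline; no threads or extra processes are involved.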

For working in the cloud, there are at present a couple of packages. One is AWS, which addresses the use of Amazon Simple Storage Service (S3) and Elastic Compute Cloud (EC2). The other is HDFS, which provides a wrapper over libhdfs and a Julia MapReduce functionality.

Database and specialist packages

Database access is supported mainly through the ODBC package. On Windows, ODBC is the standard, while Linux and Mac OS X require the installation of unixODBC or iODBC. There is currently no native support for the main SQL databases such as Oracle, MySQL, and PostgreSQL.

The SQLite package provides an interface to that database, and there is a Mongo package, which implements bindings to the NoSQL database MongoDB. Other NoSQL databases such as CouchDB and Neo4j expose a RESTful API, so some of the HTTP packages coupled with JSON can be used to interact with these.

A couple of specialist Julia groups are JuliaQuant and JuliaGPU.

JuliaQuant encompasses a variety of packages for quantitative financial modeling. This is an area that has been heavily supported by developers in R, MATLAB, and Python, and the Quant group is addressing the same problems in Julia.

JuliaGPU is a set of packages supporting OpenCL and CUDA interfacing to GPU cards for high-speed parallel processing.

Both of these are very much works in progress, and interest and support in the development of the packages would be welcome.

How to uninstall Julia

Removing Julia is very simple; there is no explicit uninstallation process. It consists of deleting the source tree, which was created by the build process or from the DMG file on Mac OS X or the EXE file on Windows. Everything runs within this tree, so there are no files installed to any system folders.

In addition, we need to attend to the package folder. Recall that under Linux and Mac OS X this is a hidden folder called .julia in the user's home folder. In Windows, it is located in the user's profile, typically in C:\Users\[my-user-name]. Removing this folder will erase all the packages that were previously installed.

There is another hidden file called .julia_history that should be deleted; it keeps a history of the commands entered.
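Before deleting anything by hand, it can help to confirm where these two items live on the machine in question. The sketch below only builds and reports the paths named in the text; it assumes they sit directly under the directory returned by homedir(), which may not hold for every Julia release, and it deletes nothing.

```julia
# Paths named in the text; nothing is deleted here.
pkg_dir = joinpath(homedir(), ".julia")
history = joinpath(homedir(), ".julia_history")

for path in (pkg_dir, history)
    state = ispath(path) ? "present" : "absent"
    println(path, ": ", state)
end

# Once the listing looks right, the removal itself could be done with
# rm(pkg_dir; recursive=true) and rm(history). Both calls are deliberately
# left as comments so that running this sketch is harmless.
```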

Adding an unregistered package

The official repository for the registered packages in Julia is here:

https://Github.com/JuliaLang/METADATA.jl.

Any packages here will be listed using the package manager or in Julia Studio.

However, it is possible to use an unregistered package by using Pkg.clone(url), where the url is a Git URL from which the package can be cloned. The package should have the src and test folders and may have several others. If it contains a REQUIRE file at the top of the source tree, that file can be used to determine any dependent registered packages; these packages will be automatically installed.
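To make the role of the REQUIRE file more concrete, here is a sketch of reading one. The sample contents are hypothetical, written to mirror the Cairo dependencies mentioned earlier (Homebrew on Mac OS X, WinRPM on Windows), and parse_require is an illustrative helper, not a function provided by the package manager. It assumes one entry per line, an optional leading @platform marker, optional version bounds after the name, and # for comments.

```julia
# Hypothetical REQUIRE contents, modelled on the Cairo dependencies in the text.
sample = """
julia 0.3
BinDeps
Color        # colour handling
@osx Homebrew
@windows WinRPM 0.0.13
"""

# Return (platform, package) pairs; "all" means no platform restriction.
function parse_require(text::AbstractString)
    entries = Tuple{String,String}[]
    for raw in split(text, '\n')
        fields = split(first(split(raw, '#')))   # drop comments, split on whitespace
        isempty(fields) && continue
        platform = "all"
        if startswith(fields[1], "@")            # platform marker such as @osx
            platform = String(fields[1][2:end])
            fields = fields[2:end]
        end
        isempty(fields) && continue
        name = String(fields[1])                 # any version bounds are ignored
        name == "julia" && continue              # the language itself is not a package
        push!(entries, (platform, name))
    end
    return entries
end

for (platform, name) in parse_require(sample)
    println(rpad(platform, 8), name)
end
```

A real resolver would also honour the version bounds and filter the entries by the current operating system; both are left out to keep the sketch short.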

If you are developing a package, it is possible to place the source in the .julia folder alongside packages added with Pkg.add() or Pkg.clone(). Eventually, you will wish to use GitHub in a more formal way; we will deal with that later when considering package implementation.