Mastering Julia

Preface

Julia is a relatively young programming language. The initial design work on the Julia project began at MIT in August 2009, and in February 2012 it became open source. It is largely the work of three developers: Stefan Karpinski, Jeff Bezanson, and Viral Shah. These three, together with Alan Edelman, remain actively committed to Julia, and MIT currently hosts a variety of courses in Julia, many of which are available over the Internet.

Initially, Julia was envisaged by its designers as a scientific language sufficiently rapid to remove the necessity of modeling in an interactive language and subsequently having to redevelop in a compiled language, such as C or Fortran. At that time, the major scientific languages were proprietary ones, such as MATLAB and Mathematica, and were relatively slow. There were clones of these languages in the open source domain, such as GNU Octave and Scilab, but these were even slower. When it launched, the community saw Julia as a replacement for MATLAB, but this is not exactly the case. Although the syntax of Julia is similar to MATLAB, so much so that anyone competent in MATLAB can easily learn Julia, it was not designed as a clone. It is a more feature-rich language with many significant differences that will be discussed in depth later.

The period since 2009 has seen the rise of two new computing disciplines: big data/cloud computing, and data science. Big data processing on Hadoop is conventionally seen as the realm of Java programming, since Hadoop runs on the Java virtual machine. It is, of course, possible to process big data with programming languages that are not Java-based by utilizing the streaming-jar paradigm, and Julia can be used in this way, much as C++, C#, and Python can.
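As a rough illustration of the streaming approach, a mapper is simply a program that reads lines of text and writes tab-separated key/value records. The sketch below is a minimal word-count mapper; the function name map_words is hypothetical, and the code uses current Julia (1.x) syntax, which differs in places from the version available when this book was written.

```julia
# Word-count mapper in the Hadoop Streaming style (illustrative sketch).
# It reads lines of text and emits one "word<TAB>1" record per word.
# `map_words` is a hypothetical name, not part of any library.
function map_words(input::IO, output::IO)
    for line in eachline(input)
        for word in split(line)          # split on whitespace
            println(output, lowercase(word), "\t", 1)
        end
    end
end

# In a streaming job the mapper would be wired to the standard streams:
# map_words(stdin, stdout)
```

Taking the input and output streams as arguments keeps the function easy to try out at the console with an IOBuffer before it is attached to a real job.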

The emergence of data science heralded the use of programming languages that were simple enough for analysts who had some programming skills but were not principally programmers. The two languages that stepped up to fill the breach have been R and Python. Both of these are relatively old, with their origins back in the 1990s. However, the popularity of these two has grown rapidly, ironically from around the time when Julia was introduced to the world. Even so, with such established and staid opposition, Julia has excited the scientific programming community and continues to make inroads in this space.

The aim of this book is to cover all aspects of Julia that make it appealing to the data scientist. The language is evolving quickly. Binary distributions are available for Linux, Mac OS X, and Windows, but these will lag behind the current sources. So, to do some serious work with Julia, it is important to understand how to obtain and build a running system from source. In addition, there are interactive development environments available for Julia, and the book will discuss both the Jupyter and Juno IDEs.

What this book covers

Chapter 1, The Julia Environment, deals with the steps needed to get a working distribution of Julia up and running. It is important to be able to acquire the latest sources and build the system from scratch, as well as to find and install appropriate packages and to remove them when necessary.

Chapter 2, Developing in Julia, is a quick overview of some of Julia's basic syntax. Julia is a new language, but it is not unfamiliar to readers with a background in MATLAB, R, or Python, so the aim of the chapter is to bring readers up to speed quickly with Julia, using examples, and to point them to online sources. It is also important to be aware of the differences between working via the console and working in the JuliaStudio IDE.

Chapter 3, Types and Dispatch, looks at the Julia type system and shows how this exposes powerful techniques to the developer by means of its de facto functional dispatch system.
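To give a flavor of what dispatching on types means, here is a minimal sketch. The Shape, Circle, and Rectangle types and the area and describe functions are hypothetical names chosen for illustration, and the code uses current Julia (1.x) syntax rather than the syntax of the version this book was written against. Julia's documentation refers to this mechanism as multiple dispatch.

```julia
# Hypothetical types, introduced only for illustration.
abstract type Shape end

struct Circle <: Shape
    radius::Float64
end

struct Rectangle <: Shape
    width::Float64
    height::Float64
end

# One generic function with a separate method per concrete type.
area(c::Circle) = pi * c.radius^2
area(r::Rectangle) = r.width * r.height

# The method is chosen from the types of all arguments, not just the first.
describe(a::Shape, b::Shape) = "two shapes"
describe(a::Circle, b::Circle) = "two circles"

# Code written against the abstract type works for every subtype.
total_area(shapes) = sum(area, shapes)
```

Calling describe with two Circle values selects the more specific method, while any other pair of shapes falls back to the general one.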

Chapter 4, Interoperability, covers the methods by which Julia can interact with the operating system and other programming languages. These methods are largely native to Julia and the chapter concludes with an introduction to parallelism that is discussed further in Chapter 9, Networking.

Chapter 5, Working with Data, begins the journey the data scientist would take from data source to analytics results. Most projects begin with data, which has to be read, cleaned up, and sampled. The chapter starts here and goes on to describe simple statistics and analytics.
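The read, clean, sample, and summarize sequence can be sketched in a few lines. The data below is made up purely for illustration, the variable names are hypothetical, and the code assumes current Julia (1.x) with only its standard libraries; the chapter itself may use different tools.

```julia
using Statistics  # standard library: mean, std
using Random      # standard library: seeded shuffling

# Made-up raw readings; "NA" and the empty string stand in for bad records.
raw = ["12.5", "NA", "9.0", "", "11.5", "10.0"]

# Clean: keep only the fields that parse as numbers.
parsed = [tryparse(Float64, s) for s in raw]
clean = Float64[x for x in parsed if x !== nothing]

# Sample: take three cleaned values without replacement, reproducibly.
rng = MersenneTwister(42)
subset = shuffle(rng, clean)[1:3]

# Summarize: simple statistics on the cleaned data.
avg = mean(clean)
spread = std(clean)
```

Seeding the random number generator makes the sample repeatable, which helps when results need to be checked later.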

Chapter 6, Scientific Programming, covers what is seen as a principal reason to program in Julia. Its strength is the speed of execution combined with the ease of developing in a scripting language, which makes it particularly useful in tackling compute-bound processes. The chapter looks at various techniques used in approaching mathematical and scientific problems.

Chapter 7, Graphics, addresses an area in which Julia is often compared unfavorably to alternative languages such as MATLAB and R. While earlier versions of the language had limited graphics options, this is certainly not the case now, and this chapter describes a wide variety of sophisticated approaches both to display to screen and to save to disk files.

Chapter 8, Databases, deals with interaction with databases in Julia. Data to be analyzed may be stored in a database, or it may be necessary to save the results in a database after analysis. Various approaches are considered for SQL and NoSQL datastores. These are not built in to the language but rely entirely on contributed packages, and so may be enhanced in the near future.

Chapter 9, Networking, covers aspects of working with distributed data sources. Big data and cloud systems are becoming more prevalent in data science and the chapter covers network programming at the socket level and interfacing via the Web. Also, it includes a discussion on running Julia on Amazon Web Services and the Google compute server.

Chapter 10, Working with Julia, aims to provide information and encouragement to go on and contribute as a Julia developer. This may be as a sole author contributing to an existing package or as a member of the Julia groups.

What you need for this book

Developing in Julia can be done under any of the familiar computing operating systems: Linux, OS X, and Windows. To explore the language in depth, the reader may wish to acquire the latest versions and to build from source under Linux. However, to work with the language using a binary distribution on any of the three platforms, the installation is very straightforward and convenient. In addition, Julia now comes pre-packaged with the Juno IDE, which just requires expansion from a compressed (zipped) archive.

Some of the examples in the later chapters on database support, networking, and cloud services will require additional installation and resources, and how to acquire these is discussed at the relevant point.

Who this book is for

This is not an introduction to programming, so it is assumed that the reader is familiar with the concepts of at least one programming language. For those familiar with scripting languages such as Python, R, and MATLAB, the task is not a difficult one, and the same is true for people using similar-style languages such as C, Java, and C#.

However, for the data scientist, possibly with a background in analytics methods using spreadsheets, such as Excel, or statistical packages, such as SPSS and Stata, most parts of the text should prove rewarding.

Conventions

In this book, you will find a number of text styles that distinguish between different kinds of information. Here are some examples of these styles and an explanation of their meaning.

Code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles are shown as follows: "The test folder has some code that illustrates how to write test scripts and use the Base.Test system."

A block of code is set as follows:

function isAdmin2(_mc::Dict{ASCIIString,UserCreds}, _name::ASCIIString)
    check_admin::Bool = false;
    try
        check_admin = _mc[_name].admin
    catch
        check_admin = false
    finally
        return check_admin
    end
end

Any command-line input or output is written as follows:

julia> include("asian.jl")
julia> run_asian()

New terms and important words are shown in bold. Words that you see on the screen, for example, in menus or dialog boxes, appear in the text like this: "However, there are others that may occur, such as in case of redirection and error, one being the infamous 404, Page not found."

Note

Warnings or important notes appear in a box like this.

Tip

Tips and tricks appear like this.

Reader feedback

Feedback from our readers is always welcome. Let us know what you think about this book—what you liked or disliked. Reader feedback is important for us as it helps us develop titles that you will really get the most out of.

To send us general feedback, simply e-mail us and mention the book's title in the subject of your message.

If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide at www.packtpub.com/authors.

Customer support

Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.

Downloading the example code

You can download the example code files from your account at http://www.packtpub.com for all the Packt Publishing books you have purchased. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.

Errata

Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books—maybe a mistake in the text or the code—we would be grateful if you could report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded to our website or added to any list of existing errata under the Errata section of that title.

To view the previously submitted errata, go to https://www.packtpub.com/books/content/support and enter the name of the book in the search field. The required information will appear under the Errata section.

Piracy

Piracy of copyrighted material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works in any form on the Internet, please provide us with the location address or website name immediately so that we can pursue a remedy.

Please contact us with a link to the suspected pirated material.

We appreciate your help in protecting our authors and our ability to bring you valuable content.

Questions

If you have a problem with any aspect of this book, you can contact us, and we will do our best to address the problem.