Julia for Data Science

By: Anshul Joshi

Overview of this book

Julia is a fast, high-performance language that is well suited to data science, with a mature package ecosystem, and it is now feature complete. It is a good tool for a data science practitioner. A well-known Harvard Business Review article called data scientist the sexiest job of the 21st century (https://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century). This book will help you get familiar with Julia's rich and continuously evolving ecosystem, allowing you to stay on top of your game. It covers the essentials of data science and gives a high-level overview of advanced statistics and techniques. You will generate insights by performing inferential statistics and reveal hidden patterns and trends using data mining, with practical coverage of statistics and machine learning. You will develop the knowledge to build statistical models and machine learning systems in Julia, with attractive visualizations. You will then delve into deep learning in Julia and learn the Mocha.jl framework, with which you can create artificial neural networks and implement deep learning. This book addresses the challenges of real-world data science problems, including data cleaning, data preparation, inferential statistics, statistical modeling, building high-performance machine learning systems, and creating effective visualizations using Julia.

Using Jupyter Notebook


Data science and scientific computing have an excellent interactive tool in Jupyter Notebook. With Jupyter Notebook you can write and run code in an interactive web environment that can also display visualizations, images, and videos. This makes testing equations and prototyping much easier. It supports over 40 programming languages and is completely open source.

GitHub supports Jupyter notebooks. A notebook, together with its record of computation, can be shared via the Jupyter notebook viewer or cloud storage. Jupyter notebooks are used extensively for coding machine-learning algorithms, statistical modeling, numerical simulation, and data munging.

Jupyter Notebook is implemented in Python, but you can run code in any of the 40 supported languages, provided you have the corresponding kernel installed. You can check whether Python is installed on your system by typing the following into the Terminal:

python --version 

This prints the version of Python if it is installed on the system. It is best to have Python 2.7.x, or 3.5.x or later.
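The version check can also be scripted. The following is a minimal sketch, not part of the original text: the function name `python_version_ok` is hypothetical, and it assumes a POSIX shell with `awk` available. It looks for `python3` and `python` on the PATH and compares each reported version against the recommendation above.

```shell
# Hypothetical helper: succeeds if a version string such as "3.5.2"
# meets the recommendation above (2.7.x, or 3.5 and later).
python_version_ok() {
    case $1 in
        [0-9]*.[0-9]*) ;;      # looks like MAJOR.MINOR...
        *) return 1 ;;         # anything else is rejected
    esac
    major=${1%%.*}             # text before the first dot
    rest=${1#*.}               # text after the first dot
    minor=${rest%%.*}          # second numeric component
    if [ "$major" -eq 2 ]; then
        [ "$minor" -ge 7 ]
    elif [ "$major" -eq 3 ]; then
        [ "$minor" -ge 5 ]
    else
        [ "$major" -gt 3 ]
    fi
}

# Check every interpreter name that is actually on the PATH.
for candidate in python3 python; do
    if command -v "$candidate" >/dev/null 2>&1; then
        # Output looks like "Python 3.5.2"; Python 2 writes it to stderr.
        version=$("$candidate" --version 2>&1 | awk 'NR == 1 {print $2}')
        if python_version_ok "$version"; then
            echo "$candidate $version: OK"
        else
            echo "$candidate $version: older than recommended"
        fi
    fi
done
```

If neither name is found, the loop prints nothing, which itself indicates that Python still needs to be installed.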

If Python is not installed, Windows users can download it from the official website. On Linux distributions that use apt-get, such as Debian or Ubuntu, typing the following should work:

sudo apt-get install python 

If you are new to Python and data science, installing Anaconda is highly recommended. Commonly used packages for data science and for numerical and scientific computing, including Jupyter Notebook, come bundled with Anaconda, making it the preferred way to set up the environment. Instructions can be found at https://www.continuum.io/downloads.

Jupyter is included in Anaconda, but you can make sure the Jupyter package is installed and up to date by typing the following:

conda install jupyter 

Another way to install Jupyter is by using pip:

pip install jupyter 

To check if Jupyter is installed properly, type the following in the Terminal:

jupyter --version 

This should print the Jupyter version if it is installed.

Now, to use Julia with Jupyter we need the IJulia package. This can be installed using Julia's package manager.

After installing IJulia, we can create a new notebook by selecting Julia under the Notebooks section in Jupyter.

To get the latest version of all your packages, type the following in Julia's shell:

julia> Pkg.update() 

After that add the IJulia package by typing the following:

julia> Pkg.add("IJulia") 

On Linux, you may see some warnings, so it is better to build the package explicitly:

julia> Pkg.build("IJulia") 
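The same three Pkg calls can also be run non-interactively from the Terminal. The sketch below is an illustration rather than the book's method. It assumes `julia` is on the PATH and that you are running Julia 0.7 or later, where `using Pkg` must be issued first; the Julia 0.4 sessions shown here do not need it. The variable name `JULIA_SETUP` is hypothetical.

```shell
# Julia code to evaluate. `using Pkg` is required on Julia 0.7 and later,
# where Pkg is no longer available without loading it.
JULIA_SETUP='using Pkg; Pkg.update(); Pkg.add("IJulia"); Pkg.build("IJulia")'

if command -v julia >/dev/null 2>&1; then
    # -e evaluates the given expression and then exits.
    julia -e "$JULIA_SETUP"
else
    echo "julia not found on PATH; install Julia first" >&2
fi
```

Running the setup this way is convenient in provisioning scripts, where opening the Julia shell by hand is not practical.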

After IJulia is installed, come back to the Terminal and start the Jupyter notebook:

jupyter notebook 

A browser window will open. Under New, you will find options to create notebooks with the kernels that are already installed. As we want a Julia notebook, we select the Julia entry, shown here as Julia 0.4.2. This starts a new Julia notebook, in which you can try out a simple example.
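If no Julia entry appears under New, you can check from the Terminal whether the kernel was registered. The following is a minimal sketch that relies on the `jupyter kernelspec list` command; the helper name `has_julia_kernel` is hypothetical, and it assumes that IJulia registers kernels whose names begin with `julia`.

```shell
# Hypothetical helper: reads a kernel listing on stdin and succeeds if
# any kernel name starts with "julia" (for example, julia-0.4).
has_julia_kernel() {
    grep -Eq '^[[:space:]]*julia'
}

if command -v jupyter >/dev/null 2>&1; then
    # Each line of the listing is an indented kernel name followed by its path.
    if jupyter kernelspec list 2>/dev/null | has_julia_kernel; then
        echo "Julia kernel is registered"
    else
        echo "No Julia kernel found; try Pkg.build(\"IJulia\") in Julia"
    fi
else
    echo "jupyter not found on PATH"
fi
```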

In this example, we create a histogram of random numbers. This is just an example; we will study the components used in detail in the coming chapters.

Popular editors such as Atom and Sublime have plugins for Julia. Atom has language-julia and Sublime has Sublime-IJulia, both of which can be downloaded from their package managers.