Book Image

Julia for Data Science

By : Anshul Joshi
2 (1)
Book Image

Julia for Data Science

2 (1)
By: Anshul Joshi

Overview of this book

Julia is a fast and high performing language that's perfectly suited to data science with a mature package ecosystem and is now feature complete. It is a good tool for a data science practitioner. There was a famous post at Harvard Business Review that Data Scientist is the sexiest job of the 21st century. (https://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century). This book will help you get familiarised with Julia's rich ecosystem, which is continuously evolving, allowing you to stay on top of your game. This book contains the essentials of data science and gives a high-level overview of advanced statistics and techniques. You will dive in and will work on generating insights by performing inferential statistics, and will reveal hidden patterns and trends using data mining. This has the practical coverage of statistics and machine learning. You will develop knowledge to build statistical models and machine learning systems in Julia with attractive visualizations. You will then delve into the world of Deep learning in Julia and will understand the framework, Mocha.jl with which you can create artificial neural networks and implement deep learning. This book addresses the challenges of real-world data science problems, including data cleaning, data preparation, inferential statistics, statistical modeling, building high-performance machine learning systems and creating effective visualizations using Julia.
Table of Contents (17 chapters)
Julia for Data Science
Credits
About the Author
About the Reviewer
www.PacktPub.com
Preface

Facilitating language interoperability


Although Julia can be used to write most kinds of code, there are mature libraries for numerical and scientific computing which we would like to exploit. These libraries can be in C, Fortran or Python. Julia allows the ease of using the existing code written in Python, C, or Fortran. This is done by making Julia perform simple and efficient-to-call C, Fortran, or Python functions.

The C/Fortran libraries should be available to Julia. An ordinary but valid call with ccall is made to this code. This is possible when the code is available as a shared library. Julia's JIT generates the same machine instructions as the native C call. Therefore, it is generally no different from calling through a C code with a minimal overhead.

Importing Python code can be beneficial and sometimes needed, especially for data science, because it already has an exhaustive library of implementations of machine learning and statistical functions. For example, it contains scikit-learn and pandas. To use Python in Julia, we require PyCall.jl. To add PyCall.jl do the following:

Pkg.add("PyCall") 

PyCall contains a macro @pyimport that facilitates importing Python packages and provides Julia wrappers for all of the functions and constants therein, including automatic conversion of types between Julia and Python.

PyCall also provides functionalities for lower-level manipulation of Python objects, including a PyObject type for opaque Python objects. It also has a pycall function (similar to Julia's ccall function), which can be used in Julia to call Python functions with type conversions. PyCall does not use the Python program but links directly to the libpython library. During the Pkg.build, it finds the location of the libpython by Punning python.

Calling Python code in Julia

The @pyimport macro automatically makes the appropriate type conversions to Julia types in most of the scenarios based on a runtime inspection of the Python objects. It achieves better control over these type conversions by using lower-level functions. Using PyCall in scenarios where the return type is known can help in improving the performance, both by eliminating the overhead of runtime type inference, and also by providing more type information to the Julia compiler:

  • pycall(function::PyObject, returntype::Type, args...): This calls the given Python function (typically looked up from a module) with the given args... (of standard Julia types which are converted automatically to the corresponding Python types if possible), converting the return value to returntype (use a returntype of PyObject to return the unconverted Python object reference, or PyAny to request an automated conversion).

  • pyimport(s): This imports the Python modules (a string or symbol) and returns a pointer to it (a PyObject). Functions or other symbols in the module may then be looked up by s[name] where the name is a string (for the raw PyObject) or a symbol (for automatic type conversion). Unlike the @pyimport macro, this does not define a Julia module and members cannot be accessed with an s.name.