Julia for Data Science

By: Anshul Joshi

Overview of this book

Julia is a fast, high-performance language that is well suited to data science, with a mature package ecosystem, and it is now feature complete. A well-known Harvard Business Review article called data scientist the sexiest job of the 21st century (https://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century). This book will help you get familiar with Julia's rich and continuously evolving ecosystem. It covers the essentials of data science and gives a high-level overview of advanced statistics and techniques, with practical coverage of statistics and machine learning. You will generate insights by performing inferential statistics and reveal hidden patterns and trends using data mining. You will learn to build statistical models and machine learning systems in Julia, with attractive visualizations. You will then delve into deep learning in Julia and learn the Mocha.jl framework, with which you can create artificial neural networks and implement deep learning. The book addresses the challenges of real-world data science problems, including data cleaning, data preparation, inferential statistics, statistical modeling, building high-performance machine learning systems, and creating effective visualizations using Julia.

Why is ensemble learning superior?

To explain why an ensemble generalizes better than an individual learner, Dietterich gave three reasons why combining learners can lead to a better hypothesis:

The first reason is that the training data may not provide enough information to pick a single best learner. For instance, many learners may perform equally well on the training set. Combining these learners may therefore be a better choice.
The second reason is that the search procedures of the learning algorithms may be imperfect. For instance, even if a best hypothesis exists, the learning algorithms may fail to reach it for various reasons, such as settling on a hypothesis that is only above average. Ensemble learning can compensate by increasing the chance of reaching the best hypothesis.
The third reason is that one target function...
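
The benefit of combining several similarly performing learners can be illustrated with a small calculation. The following is a minimal sketch, not taken from the book: it assumes an idealized setting in which `n` binary classifiers each make independent errors and are each correct with the same probability `p`, and it computes how often a simple majority vote is correct. The function name `majority_vote_accuracy` and the value `p = 0.6` are hypothetical choices made for illustration. Real learners are rarely independent, so the gains in practice are usually smaller.

```julia
# Probability that a majority vote of n classifiers is correct, ASSUMING
# (for illustration only) that each classifier is independently correct
# with probability p. n must be odd so that ties cannot occur.
function majority_vote_accuracy(n::Integer, p::Real)
    isodd(n) || throw(ArgumentError("n must be odd to avoid ties"))
    k_min = div(n, 2) + 1   # smallest number of correct votes that wins
    # Sum the binomial probabilities of k_min, ..., n correct votes.
    return sum(binomial(n, k) * p^k * (1 - p)^(n - k) for k in k_min:n)
end

# Compare a single learner (n = 1) with increasingly large ensembles.
for n in (1, 5, 11, 21)
    acc = majority_vote_accuracy(n, 0.6)
    println("n = ", n, "  majority-vote accuracy = ", round(acc, digits = 3))
end
```

Under these assumptions the majority-vote accuracy grows with the number of learners whenever `p` is above 0.5, which is one way to see why joining several comparable learners can beat picking any single one of them.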