Spark Cookbook

By: Rishi Yadav

Introduction


The following is Wikipedia's definition of machine learning:

"Machine learning is a scientific discipline that explores the construction and study of algorithms that can learn from data."

Essentially, machine learning uses past data to make predictions about the future. It depends heavily on statistical analysis and methodology.

In statistics, there are four types of measurement scales:

| Scale type | Supported operations | Description |
|---|---|---|
| Nominal scale | =, ≠ | Identifies categories. Values are labels and carry no numeric meaning. Example: male, female. |
| Ordinal scale | =, ≠, <, > | Nominal scale plus ranking, for instance from least important to most important. Example: corporate hierarchy. |
| Interval scale | =, ≠, <, >, +, - | Ordinal scale plus distance between observations. Numbers assigned to observations indicate order, and the difference between any two consecutive values is the same. Ratios are not meaningful: a temperature of 60° is not twice as hot as 30°. |
| Ratio scale | =, ≠, <, >, +, -, ×, ÷ | Interval scale plus ratios of observations. Example: $20 is twice as costly as $10. |
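The four scales build on one another, so each one allows the operations of the previous scale plus a few more. The following minimal Python sketch illustrates this, along with the temperature and price examples. The names `SCALE_OPERATIONS`, `supports`, and `celsius_to_kelvin` are hypothetical and exist only for this illustration. The sketch also assumes the temperatures are in degrees Celsius, which the text does not specify.

```python
# Hypothetical lookup: operations that are meaningful on each measurement scale.
# Each scale inherits everything from the scale before it.
NOMINAL = {"=", "≠"}
ORDINAL = NOMINAL | {"<", ">"}
INTERVAL = ORDINAL | {"+", "-"}
RATIO = INTERVAL | {"×", "÷"}

SCALE_OPERATIONS = {
    "nominal": NOMINAL,
    "ordinal": ORDINAL,
    "interval": INTERVAL,
    "ratio": RATIO,
}


def supports(scale, operation):
    """Return True if the operation is meaningful on the given scale."""
    return operation in SCALE_OPERATIONS[scale]


def celsius_to_kelvin(celsius):
    """Convert Celsius (interval scale) to Kelvin (ratio scale, true zero)."""
    return celsius + 273.15


# Interval scale: differences are meaningful, ratios are not.
# Assumption: the 60° and 30° readings are in Celsius.
difference = 60 - 30   # a meaningful 30-degree gap
naive_ratio = 60 / 30  # 2.0, but this depends on where zero happens to sit
kelvin_ratio = celsius_to_kelvin(60) / celsius_to_kelvin(30)  # roughly 1.10

# Ratio scale: zero means "none", so ratios are meaningful.
price_ratio = 20 / 10  # $20 really is twice as costly as $10

print(supports("interval", "÷"))  # False
print(supports("ratio", "÷"))     # True
print(round(kelvin_ratio, 2))
```

Converting to Kelvin shows why the naive ratio misleads: once zero means "no thermal energy", the same two temperatures differ by about 10 percent rather than by a factor of two.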

Another...