Practical Machine Learning

By: Sunila Gollapudi

Overview of this book

This book explores an extensive range of machine learning techniques, uncovering hidden tricks and tips for several types of data through practical, real-world examples. While machine learning can be highly theoretical, this book takes a hands-on approach without losing sight of the underlying principles. Inside, a full exploration of the various algorithms shows how effective machine learning is at tackling the contemporary challenges of big data.

This is the only book you need to implement a whole suite of open source tools, frameworks, and languages in machine learning. We will cover the leading data science languages, Python and R, and the underrated but powerful Julia, as well as a range of big data platforms including Spark, Hadoop, and Mahout. Practical Machine Learning is an essential resource for modern data scientists who want to get to grips with the real-world application of machine learning.

With this book, you will not only learn the fundamentals of machine learning but also dive deep into the complexities of real-world data, before moving on to using Hadoop and its wider ecosystem of tools to process and manage your structured and unstructured data. You will explore different machine learning techniques for both supervised and unsupervised learning. From decision trees to Naïve Bayes classifiers and linear and clustering methods, you will learn strategies for a truly advanced approach to the statistical analysis of data.

The book also explores cutting-edge advancements in machine learning, with worked examples and guidance on deep learning and reinforcement learning. These practical demonstrations and samples help take the theory, and the mystery, out of even the most advanced machine learning methodologies.

Big data and the context of large-scale Machine learning


I covered some of the core aspects of big data in my previous Packt book, Getting Started with Greenplum for Big Data Analytics. In this section, we will quickly recap them and look at their impact on the field of Machine learning:

  • Large-scale refers to data volumes of terabytes, petabytes, exabytes, or more. This is typically a volume that traditional database engines cannot handle. The following chart lists the orders of magnitude used to express data volumes:

    Multiples of bytes

    Name (Symbol)      SI decimal value    Binary usage
    Kilobyte (KB)      10^3                2^10
    Megabyte (MB)      10^6                2^20
    Gigabyte (GB)      10^9                2^30
    Terabyte (TB)      10^12               2^40
    Petabyte (PB)      10^15               2^50
    Exabyte (EB)       10^18               2^60
    Zettabyte (ZB)     10^21               2^70
    Yottabyte (YB)     10^24               2^80
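
    The SI decimal and binary values in the chart are close but not equal, and the gap widens with each step: from 2.4 percent at the kilobyte to about 21 percent at the yottabyte. The following is a minimal Python sketch, not taken from the book, that regenerates the chart's values and computes that gap. The names byte_multiples and binary_excess_percent are hypothetical helpers introduced only for this illustration.

```python
# Illustrative sketch: regenerate the byte-multiple chart and compare
# the SI decimal value with the binary usage for each prefix.
# All names here are hypothetical and chosen for this example only.

PREFIXES = [
    "Kilobyte", "Megabyte", "Gigabyte", "Terabyte",
    "Petabyte", "Exabyte", "Zettabyte", "Yottabyte",
]


def byte_multiples():
    """Return (name, decimal_value, binary_value) for each prefix."""
    rows = []
    for step, name in enumerate(PREFIXES, start=1):
        decimal_value = 10 ** (3 * step)   # 10^3, 10^6, ..., 10^24
        binary_value = 2 ** (10 * step)    # 2^10, 2^20, ..., 2^80
        rows.append((name, decimal_value, binary_value))
    return rows


def binary_excess_percent(decimal_value, binary_value):
    """Percentage by which the binary value exceeds the decimal value."""
    return (binary_value - decimal_value) / decimal_value * 100


if __name__ == "__main__":
    for step, (name, dec, binv) in enumerate(byte_multiples(), start=1):
        excess = binary_excess_percent(dec, binv)
        print(f"{name:<10} 10^{3 * step:<3} 2^{10 * step:<3} +{excess:.1f}%")
```

    The difference matters when sizing storage or reading capacity figures, since a volume quoted in decimal terabytes holds fewer bytes than the same figure read in binary terms.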

  • Data formats that are referred to in this context are distinct; they are generated and consumed, and need not be structured (for example, DBMS...