Machine Learning Engineering with Python

By: Andrew P. McMahon

Overview of this book

Machine learning engineering is a thriving discipline at the interface of software development and machine learning. This book will help developers working with machine learning and Python to put their knowledge to work and create high-quality machine learning products and services. Machine Learning Engineering with Python takes a hands-on approach to help you get to grips with essential technical concepts, implementation patterns, and development methodologies to have you up and running in no time. You'll begin by understanding key steps of the machine learning development life cycle before moving on to practical illustrations and getting to grips with building and deploying robust machine learning solutions. As you advance, you'll explore how to create your own toolsets for training and deployment across all your projects in a consistent way. The book will also help you get hands-on with deployment architectures and discover methods for scaling up your solutions while building a solid understanding of how to use cloud-based tools effectively. Finally, you'll work through examples to help you solve typical business problems. By the end of this book, you'll be able to build end-to-end machine learning services using a variety of techniques and design your own processes for consistently performant machine learning engineering.
Table of Contents (13 chapters)

Section 1: What Is ML Engineering?
Section 2: ML Development and Deployment
Section 3: End-to-End Examples

Scaling with Spark

Apache Spark came from the work of some brilliant researchers at the University of California, Berkeley, where it began as a research project in the late 2000s, and since then it has revolutionized how we tackle problems with large datasets. Before Spark, the dominant paradigm for big data was Hadoop MapReduce, which Spark has since largely displaced.

Spark is a cluster computing framework, meaning it works on the principle that several computers are linked together so that computational tasks can be shared among them, with Spark coordinating those tasks effectively. Whenever we discuss running Spark jobs, we always talk about the cluster we are running on. This consists of the computers that perform the tasks, known as the worker nodes, and the computer that hosts the coordinating workload, known as the head node.

Spark is written in Scala, a language with a strong functional flavor that compiles to bytecode running on the Java Virtual Machine (JVM). Since this is a book about ML engineering in Python, we don't...