Hands-On Machine Learning on Google Cloud Platform

By: Giuseppe Ciaburro, V Kishore Ayyadevara, Alexis Perrier

Overview of this book

Google Cloud Machine Learning Engine combines the services of Google Cloud Platform with the power and flexibility of TensorFlow. With this book, you will not only learn to build and train machine learning models of varying complexity at scale but also host them in the cloud to make predictions. This book focuses on making the most of the Google Machine Learning Platform for large datasets and complex problems. You will learn from scratch how to create powerful machine-learning-based applications for a wide variety of problems by leveraging different data services from the Google Cloud Platform. Applications include NLP, speech-to-text, reinforcement learning, time series, recommender systems, image classification, video content inference, and many others. We will implement a wide variety of deep learning use cases and also make extensive use of the data-related services that make up the Google Cloud Platform ecosystem, such as Firebase, Storage APIs, Datalab, and so forth. This will enable you to integrate machine learning and data processing features into your web and mobile applications. By the end of this book, you will know the main difficulties that you may encounter and the appropriate strategies to overcome them and build efficient systems.

ML and the cloud

In short, artificial intelligence (AI) requires a lot of computing resources, and cloud computing addresses that need.

ML is a new type of microscope and telescope, allowing each of us to push the boundaries of human knowledge and human activities. With ever more powerful ML platforms and open tools, we are able to conquer new realms of knowledge and grow new types of businesses. From the comfort of our laptops, at home, or at the office, we can better understand and predict human behavior in a wide range of domains. Think health care, transportation, energy, financial markets, human communication, human-machine interaction, social network dynamics, economic behavior, and nature (astronomy, global warming, or seismic activity). The list of domains affected by the explosion of AI is truly unlimited. The impact on society? Astounding.

With so many resources available to anyone with an online connection, the barrier to joining the AI revolution has never been lower than it is now. Books, tutorials, MOOCs, and meet-ups, as well as open source libraries in a myriad of languages, are freely available to both the seasoned and the beginner data scientist.

As veteran data scientists know well, data science is always hungry for more computational resources. Classification on the Iris dataset or the MNIST image dataset, or predictive modeling on Titanic passengers, does not reflect real-world data. Real-world data is by essence dirty, incomplete, noisy, multi-sourced, and more often than not, in large volumes. Exploiting these large datasets requires computational power, storage, CPUs, GPUs, and fast I/O.

However, more powerful machines are not sufficient to build meaningful ML applications. Grounded in science, data science requires a scientific mindset with concepts such as reproducibility and reviewing. Both aspects are made easier by working with online accessible resources. Sharing datasets and models and exposing results is always more difficult when the data lives on one person's computer. Reproducing results and maintaining models with new data also requires easy accessibility to assets. And as we work on ever more personalized and critical data (for instance in healthcare), privacy and security concerns become all the more important to the project stakeholders.

This is where the cloud comes in, by offering scalability and accessibility while providing an adequate level of security.

Before diving into GCP, let's learn a bit more about the cloud.

The nature of the cloud

ML projects are resource intensive. From storage to computational power, training models sometimes requires resources that cannot be found on a simple standalone computer. Physical storage constraints have eased in recent years: as we now enjoy reliable terabyte storage at reduced prices, storage is no longer an issue for most data projects outside the realm of big data. Computing power has also increased so much that what required expensive workstations a few years ago can now run on laptops.

However, despite all this amazingly rapid evolution, the power of the standalone PC is finite. There is an upper limit to the volume of data you can store on your machine and to the time you're willing to wait to get your model trained. New frontiers in AI, with speech-to-text, real-time video captioning, self-driving cars, music generation, or chatbots that can fool a human being and pass the Turing test, require ever larger resources. This is especially true of deep learning models, which are too slow on standard CPUs and require GPU-based machines to train in a reasonable amount of time.

ML in the cloud does not face these limitations. What you get with cloud computing is direct access to high-performance computing (HPC). Before the cloud (roughly before AWS launched its Elastic Compute Cloud (EC2) service in 2006), HPC was only available via supercomputers, such as the Cray computers. Cray is a US company that has built some of the most powerful supercomputers since the 1960s. China's Tianhe-2 was for several years the most powerful supercomputer in the world, with a capacity of around 100 petaflops (that's 10^2 × 10^15, or 10 to the power of 17, floating-point operations per second!).

A supercomputer not only costs millions of US dollars but also requires its own physical infrastructure and has huge maintenance costs. It is also out of reach for individuals and for most companies. Engineers and researchers, hungry for HPC, now turn to on-demand cloud infrastructures. Cloud service offers are democratizing access to HPC.

Computing in the cloud is built on a distributed architecture. The processors are distributed across different servers instead of being aggregated in one single machine. With a few clicks or command lines, anyone can spin up massively complex banks of servers in a matter of minutes. The amount of power at your command can be mind-blowing.

Cloud computing can not only handle the most demanding optimization tasks but also carry out a simple regression on a tiny dataset; it is extremely flexible.

To recap, cloud computing offers:

  • Instantaneity: Resources can be made available in a matter of minutes.
  • On-demand: Instances can be put on standby or decommissioned when no longer needed.
  • Diversity: The wide range of operating systems, storage, and database solutions allows the architect to create project-focused architectures, from simple mobile applications to ML APIs.
  • Unlimited resources: While not truly infinite, the volume of storage, computing, and networking resources you can assemble is mind-blowing.
  • GPUs: Most PCs rely on CPUs alone (with the exception of machines optimized for gaming), whereas deep learning requires GPUs to train models in a reasonable amount of time. Cloud computing makes GPUs available at a fraction of the cost of buying GPU machines; see the sketch after this list.
  • Controlled accessibility and security: With granular role definitions, service compartmentalization, encrypted connections, and user-based access control, cloud platforms greatly reduce the risk of intrusion and data loss.
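
As a rough sketch of what this looks like in practice, the following command requests a GPU-equipped VM from the command line. The instance name, zone, machine type, and accelerator model are placeholders, and the exact flags available depend on your Cloud SDK version and on which GPUs are offered in your region:

    # Request a VM with one NVIDIA K80 GPU attached (name, zone, and type are placeholders)
    gcloud compute instances create my-gpu-instance \
        --zone us-east1-c \
        --machine-type n1-standard-8 \
        --accelerator type=nvidia-tesla-k80,count=1 \
        --maintenance-policy TERMINATE

GPU instances cannot be live-migrated during host maintenance, which is why the maintenance policy is set to TERMINATE.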

That said, there are several different types of cloud platforms and offerings on the market.

Public cloud

Depending on customer needs, there are two main cloud models: public (multi-tenant) versus private (single-tenant). These cloud types offer different levels of management, security, and pricing.

A public cloud consists of resources that are located off-site and accessed over the internet. In a public cloud, the infrastructure is typically multi-tenant: multiple customers can share the same underlying hardware or servers. Resources such as networking, storage, power, cooling, and computing are all shared. The customer usually has no visibility of where this infrastructure is hosted beyond choosing a geographic region. The pricing model of a public cloud service is based on the volume of data, the computing power used, and other infrastructure-management-related services; more precisely, on a mix of RAM, vCPUs, disk, and bandwidth.

In a private cloud, the resources are dedicated to a single customer; the architecture is single-tenant instead of multi-tenant. The servers are located on premises or in a remote data center. Customers own (or rent) the infrastructure and are responsible for maintaining it. Private cloud infrastructures are more expensive to operate, as they require dedicated hardware to be secured for a single tenant. Customers of the private cloud have more control over their infrastructure and can therefore meet their compliance and security requirements.

Hybrid clouds are composed of a mix of public clouds and private ones.

The GCP is a public multi-tenant cloud platform. You share the servers you use with other customers and let Google handle the support, the data centers, and the infrastructure.

Managed cloud versus unmanaged cloud

The cloud market has also diversified into two large segments: managed cloud versus unmanaged cloud.

In an unmanaged cloud platform, the infrastructure is self-served. In case of failure, it is the customer's responsibility to have mechanisms in place to restore operations. Unmanaged cloud requires the customer to have the qualified expertise and resources to build, manage, and maintain cloud instances and infrastructures. Focused on self-service applications, unmanaged cloud offerings do not include support in their basic tiers.

In a managed cloud platform, the provider will support the underlying infrastructure by offering monitoring, troubleshooting, and around-the-clock customer service. Managed cloud brings along qualified expertise and resources to the team right away. For many companies, having a service provider to handle their public cloud can be easier and more cost-effective than hiring their own staff to operate their clouds.

The GCP is a public, multi-tenant, and unmanaged cloud service, as are AWS and Azure. Rackspace, on the other hand, is an example of a managed cloud service company; it began offering managed services for GCP in March 2017.

IaaS versus PaaS versus SaaS

Another important distinction concerns how much of the work is done by the user versus the cloud platform provider. Let us take a look at this distinction through the following service levels:

  • Infrastructure as a Service (IaaS): At the minimum level, IaaS, the cloud provider handles the machines, their virtualization, and the required networking. The user is responsible for everything else: OS, middleware, data, and application software. The provider hosts the resources on which the user builds the infrastructure. Google Compute Engine, SQL, DNS, or load balancing are examples of IaaS services within the GCP.
  • Platform as a Service (PaaS): In a PaaS offering, the user is only responsible for the software and the data; everything else is handled by the cloud provider. The provider builds the infrastructure while the user deploys the software. The main advantage of PaaS over IaaS, besides the reduced workload and need for sysadmin resources, is automatic scaling for web applications: the appropriate amount of resources is automatically allocated as demand fluctuates. Examples of PaaS services include Heroku and Google App Engine.
  • Software as a Service (SaaS): In SaaS, the provider is a software company offering services online, while the user consumes the services that are provided. Think Uber, Facebook, or Gmail.

While being mostly an IaaS provider, the GCP also has some PaaS offerings such as the Google App Engine. And its ML APIs (text, speech, video, and image) can be considered as SaaS.
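
To make the distinction concrete, here is a minimal sketch of the two workflows on GCP, assuming the Cloud SDK (gcloud) is installed and a project is configured; the instance name, zone, and app.yaml file are placeholders:

    # IaaS: provision a raw VM; you manage the OS, middleware, and software yourself
    gcloud compute instances create my-vm --zone us-central1-a --machine-type n1-standard-1

    # PaaS: deploy application code to App Engine; Google manages the servers and scaling
    gcloud app deploy app.yaml

With the SaaS-style ML APIs, there is nothing to provision or deploy at all: you simply call the service over HTTP.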

Costs and pricing

Pricing of cloud services is complicated and varies across vendors. The basic cost structure of a cloud service can be broken down as follows (a rough worked example follows the list):

  • Computing costs: The duration for which VMs run, priced per number of vCPUs and per GB of RAM
  • Storage costs: Disks, files, and databases, priced per GB
  • Networking costs: Internal and external, inbound and outbound traffic
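
As a purely illustrative, back-of-the-envelope sketch (the rate below is a hypothetical placeholder, not a current GCP price): a VM billed at $0.20 per hour and running around the clock for a 730-hour month costs roughly 0.20 × 730 ≈ $146, to which per-GB charges for disks and outbound network traffic are added.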

Google's preemptible VMs (the equivalent of AWS Spot Instances) are VMs built on leftover, unused capacity and priced three to four times lower than normal on-demand VMs. However, Compute Engine may terminate (preempt) these instances if it requires access to those resources for other tasks. Preemptible instances are suited to batch processing jobs or workflows that can withstand sudden interruptions. They may also not always be available. In the next chapter, we will learn how to launch preemptible instances from the command line; a brief preview follows.
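
As a minimal sketch only (the instance name, zone, and machine type are placeholders), a preemptible VM is requested with a single extra flag:

    # The --preemptible flag requests the discounted, interruptible capacity;
    # Compute Engine may stop the instance at any time, and always within 24 hours.
    gcloud compute instances create batch-worker \
        --zone us-central1-a \
        --machine-type n1-standard-4 \
        --preemptible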

Google Cloud also recently introduced price reductions for committed use: you get a discount when you reserve instances for a long period of time, typically committing to a usage term of one or three years.

The cost-cutting argument for moving to the cloud holds when your infrastructure evolves quickly and requires scalability and rapid modifications. If your applications are very static with a stable load, the cloud may not result in lower costs. In the end, because the cloud offers much more flexibility and opens the way to implementing new projects quickly, the overall cost is often higher than with a fixed infrastructure. But this flexibility is the true benefit of cloud computing.

See https://cloud.google.com/compute/pricing for the current Google Compute Engine pricing.

Price war

The costs of cloud services have fallen steadily over the past several years. The major public cloud actors have gone through successive phases of price reduction since 2012, when AWS drastically reduced its storage prices to undercut the competition. The main cloud providers reduced their prices 22 times in 2012 and 26 times in 2013, with reductions ranging from 6% to 30% and touching all types of services: computing, storage, bandwidth, and databases. As of January 2014, Amazon had reduced the price of its offerings over 40 times, and these reductions have been matched or exceeded by the other main cloud service providers. Recently, the main actors have further reduced their prices on storage, possibly reigniting the price war. According to a recent study of cloud computing prices by 451 Research, there isn't much data suggesting that the cloud is anywhere near a commodity yet; the firm further predicts that relational databases are likely to be the next price-war battleground.

ML

So, near-instant availability, low cost, flexible architecture, and near-unlimited resources are the advantages of cloud computing, at the expense of extra overhead and recurring costs.

In the global landscape of cloud computing, the GCP is a public, unmanaged IaaS cloud offering with some PaaS and SaaS services. Although AWS, Azure, and GCP are directly comparable for standard cloud services such as computing (EC2, Compute Engine, and so on), databases (Redshift, BigQuery, and so on), networking, and so forth, the Google Cloud approach to ML is quite different from Amazon's or Azure's.

In short, AWS offers either all-in-one services for very specific applications (face recognition, Alexa-related applications) or a predictive analytics platform, called Amazon ML, based on classic (not deep learning) models. Microsoft's offering is more PaaS-centered, with its Cortana Intelligence Suite; Microsoft's ML service is quite similar to AWS's, with more available models.

The GCP ML offering is based on TensorFlow, Google's deep learning library. Google offers a wide range of ML APIs based on pre-trained TensorFlow models for NLP, speech-to-text, translation, and image and video processing. It also offers a platform where you can train your own TensorFlow models and evaluate them with TensorBoard.
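
As a quick illustration of these pre-trained APIs, the following sketch calls the Natural Language API from the command line; it assumes the Cloud SDK is installed with the gcloud ml command group available and the API enabled for your project, and the sample text is arbitrary:

    # Sentiment analysis using the pre-trained Natural Language API
    gcloud ml language analyze-sentiment \
        --content "Training models at scale on GCP is surprisingly straightforward."

The same models are accessible through REST endpoints and client libraries, which is how they are typically integrated into web and mobile applications.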