Machine Learning with BigQuery ML

By: Alessandro Marrandino

Overview of this book

BigQuery ML enables you to easily build machine learning (ML) models with SQL without much coding. This book will help you to accelerate the development and deployment of ML models with BigQuery ML. The book starts with a quick overview of Google Cloud and BigQuery architecture. You'll then learn how to configure a Google Cloud project, understand the architectural components and capabilities of BigQuery, and find out how to build ML models with BigQuery ML. The book teaches you how to apply ML using SQL on BigQuery. You'll analyze the key phases of an ML model's life cycle and get to grips with the SQL statements used to train, evaluate, test, and use a model. As you advance, you'll build a series of practical use cases by applying different ML techniques such as linear regression, binary and multiclass logistic regression, k-means, ARIMA time series, deep neural networks, and XGBoost. Moving on, you'll cover matrix factorization and deep neural networks using BigQuery ML's capabilities. Finally, you'll explore the integration of BigQuery ML with other Google Cloud Platform components such as AI Platform Notebooks and TensorFlow, along with best practices, tips, and tricks for hyperparameter tuning and performance enhancement. By the end of this BigQuery book, you'll be able to build and evaluate your own ML models with BigQuery ML.

Exploring AI and ML services on GCP

Before we get too deep into our look at all of the AI and ML tools of GCP, it is very important to remember that Google is an AI company and embeds AI and ML features within many of its other products, providing the best user experience to its customers. Simply looking at Google's products, we can easily perceive how AI can be a key asset for a business. Some examples follow:

  • Gmail Smart Reply allows users to quickly reply to emails, providing meaningful suggestions according to the context of the conversation.
  • Google Maps is able to precisely predict our time of arrival when we move from one place to another by combining different data sources.
  • Google Translate provides translation services for more than one hundred languages.
  • YouTube and the Google Play Store are able to recommend the best video to watch or the most useful mobile application to install according to user preferences.
  • Google Photos recognizes people, animals, and places in our pictures, simplifying the job of archiving and organizing our photos.

Google proves that leveraging AI and ML capabilities in our business opens new opportunities for us, increases our revenue, saves money and time, and provides better experiences to our customers.

To better understand the richness of the GCP portfolio in terms of AI and ML services, it is important to emphasize that GCP services are able to address all the needs that emerge in a typical life cycle of an ML model:

  1. Ingestion and preparation of the datasets
  2. Building and training of the model
  3. Evaluation and validation
  4. Deployment
  5. Maintenance and further improvements of the model
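To make the five stages more concrete, the following is a minimal sketch in plain Python that walks through them with a toy linear model. It does not use any GCP service: the dataset is made up, and the function names (`train`, `evaluate`, `deploy`) are hypothetical and chosen only for illustration.

```python
from statistics import mean

# 1. Ingestion and preparation: a tiny, made-up dataset of (x, y) pairs,
#    split into training rows and held-out test rows.
raw_rows = [(1.0, 3.1), (2.0, 4.9), (3.0, 7.2), (4.0, 9.0), (5.0, 10.8), (6.0, 13.1)]
train_rows, test_rows = raw_rows[:4], raw_rows[4:]


def train(rows):
    """2. Building and training: fit y = slope * x + intercept by least squares."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in rows) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    intercept = y_bar - slope * x_bar
    return slope, intercept


def evaluate(model, rows):
    """3. Evaluation and validation: mean absolute error on held-out rows."""
    slope, intercept = model
    return mean(abs(y - (slope * x + intercept)) for x, y in rows)


def deploy(model):
    """4. Deployment: expose the trained model as a callable predictor."""
    slope, intercept = model
    return lambda x: slope * x + intercept


model = train(train_rows)
error = evaluate(model, test_rows)
predict = deploy(model)

# 5. Maintenance and further improvements: retrain once more data is available.
retrained_model = train(raw_rows)
```

On GCP, each of these steps can either be hand-coded in a similar way or delegated to a managed service, depending on how much control is needed.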

In the following figure, you can see the entire AI and ML GCP portfolio:

Figure 1.3 – GCP AI and ML services represented by their icons

Each one of the previously mentioned five stages can be fully managed by the user or delegated to the automation capabilities of GCP, according to the customer's needs and skills. For this reason, it is possible to divide the AI and ML services provided by GCP into three subcategories:

  • Core platform services
  • Building blocks
  • Solutions

For each of these subcategories, we'll go through the most important services currently available and some typical users that could benefit from them.

Core platform services

The core AI and ML services are the most granular items that a customer can use on GCP to develop AI and ML use cases. They provide the most control and flexibility to their users in exchange for less automation; users will also need to have more expertise in ML.

Processing units (CPU, GPU, and TPU)

With a traditional Infrastructure-as-a-Service (IaaS) approach, developers can equip their Google Compute Engine instances with powerful processing units to accelerate the training phases of ML models that might otherwise take a long time to run, particularly if complex contexts or large amounts of data need to be processed. Beyond the Central Processing Units (CPUs) that are available on our laptops, GCP offers high-performance Graphics Processing Units (GPUs) made by Nvidia and available in the cloud to speed up computationally heavy jobs. In addition, there are Tensor Processing Units (TPUs), which are specifically designed to support ML workloads and perform matrix calculations.

Deep Learning VM Image

One of the biggest challenges for data scientists is quickly provisioning environments to develop their ML models. For this reason, Google Cloud provides pre-configured Google Compute Engine (GCE) images that can be easily provisioned with a pre-built set of components and libraries dedicated to ML.

In the following screenshot, you can see how these Virtual Machines (VMs) are presented in the GCP marketplace:

Figure 1.4 – Deep Learning VM in the GCP marketplace

Deep Learning VM Image is also optimized for ML workloads and is already pre-configured to use GPUs. When a GCE image is provisioned from the GCP marketplace, it is already configured with the most common ML frameworks and programming languages, such as Python, TensorFlow, scikit-learn, and others. This allows data scientists to focus on the development of the model rather than on the provisioning and configuration of the development environment.

TensorFlow

TensorFlow is an open source framework for math, statistics, and ML. It was launched by Google Brain for internal use at Google and then released under the Apache License 2.0. It is still the core of the most successful Google products. The framework natively supports Python but can also be used with other programming languages such as Java, C++, and Go. It requires ML expertise, but in return it gives users a high degree of customization and flexibility when developing an ML model.

AI Platform

AI Platform is an integrated service of GCP that provides serverless tools to train, evaluate, deploy, and maintain ML models. With this service, data scientists are able to focus only on their code, simplifying all the side activities of ML development, such as provisioning, maintenance, and scalability.

AI Platform Notebooks

AI Platform Notebooks is a fully managed service that provides data scientists with a JupyterLab environment already integrated and connected with all other GCP resources. Similar to Deep Learning VM Image, AI Platform Notebooks instances come pre-configured with the latest versions of the AI and ML frameworks and allow you to develop an ML model with diagrams and written explanations.

All the services described so far require good knowledge of ML and proven experience in hand-coding with the most common programming languages. The core platform services address the needs of data scientists and ML engineers who need full control over and flexibility with the solutions that they're building and who already have strong technical skills.

Building blocks

On top of the core platform services, Google Cloud provides pre-built components that can be used to accelerate the development of new ML use cases. This category encompasses the following aspects:

AutoML

Unlike the services outlined in the previous section, AutoML offers the ability to build ML models even if you have limited expertise in the field. It leverages Google's ML capabilities and allows users to provide their data to train customized versions of algorithms already developed by Google. AutoML currently provides the ability to train models for images (AutoML Vision), video (AutoML Video Intelligence), free text (AutoML Natural Language), translation (AutoML Translation), and structured data (AutoML Tables). When the ML model is trained and ready to use, it is automatically deployed and made available through a REST endpoint.

Pre-built APIs

Google Cloud provides pre-built APIs that leverage ML technology under the surface but are already trained and ready to use. The APIs are exposed through a standard REST interface that can be easily integrated into applications to work with images (Vision API), videos (Video Intelligence API), free text (Natural Language API), translations (Translation API), e-commerce data (Recommendations AI), and conversational scenarios (Speech-to-Text API, Text-to-Speech API, and Dialogflow). Using a pre-built ML API is the best choice for general-purpose applications where generic training datasets can be used.
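As an illustration of what calling a pre-built API through REST can look like, the following minimal sketch builds the JSON body of a label-detection request for the Vision API. It only prepares the request and does not send it: authentication is omitted, the image bytes are a placeholder rather than a real picture, and the helper name `build_label_request` is hypothetical.

```python
import base64
import json

# Public REST endpoint of the Cloud Vision API for image annotation.
VISION_ENDPOINT = "https://vision.googleapis.com/v1/images:annotate"


def build_label_request(image_bytes: bytes, max_results: int = 5) -> str:
    """Return the JSON body for a label-detection request (hypothetical helper)."""
    payload = {
        "requests": [
            {
                # The image is sent inline as a base64-encoded string.
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                # Ask the pre-trained model for labels describing the image.
                "features": [{"type": "LABEL_DETECTION", "maxResults": max_results}],
            }
        ]
    }
    return json.dumps(payload)


# Placeholder bytes, for illustration only; a real call needs actual image data.
body = build_label_request(b"placeholder image bytes")
```

An application would then POST this body to `VISION_ENDPOINT` with valid credentials. No model training is involved, because the model behind the API is already trained by Google.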

BigQuery ML

As BigQuery ML will be discussed in detail in the following sections of this chapter, for the moment you just need to know that this component enables users to build ML models with SQL, using structured data stored in BigQuery and a list of supported algorithms.
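To give a first idea of what building a model with SQL can look like, the following minimal sketch renders a BigQuery ML `CREATE MODEL` statement as a string. The dataset, model, table, and column names are hypothetical, and the helper `create_model_sql` is our own illustration, not part of any Google library. The statement is only printed here, not executed against BigQuery.

```python
def create_model_sql(
    model_name: str, model_type: str, label_column: str, source_table: str
) -> str:
    """Render a BigQuery ML CREATE MODEL statement as a SQL string.

    Hypothetical helper: plain string formatting, so it should only be used
    with trusted, hard-coded names.
    """
    return (
        f"CREATE OR REPLACE MODEL `{model_name}`\n"
        f"OPTIONS (model_type = '{model_type}', "
        f"input_label_cols = ['{label_column}']) AS\n"
        f"SELECT * FROM `{source_table}`"
    )


# Hypothetical dataset, model, table, and column names, for illustration only.
sql = create_model_sql(
    model_name="my_dataset.trip_duration_model",
    model_type="linear_reg",
    label_column="duration",
    source_table="my_dataset.training_data",
)
print(sql)
```

The `model_type` option selects one of the supported algorithms, and the `SELECT` statement after `AS` provides the training data.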

None of the building blocks described here requires any specific knowledge of ML or any proven coding experience with programming languages. In fact, these services are intended for developers or business analysts who are not very familiar with ML but want to start using it quickly and with little effort. On the other hand, a data scientist with ML expertise can also leverage the building blocks to accelerate the development of a model, reducing the time to market of a solution.

To see a summary of the building blocks, their usage, and their target users, let's take a look at the following table:

Figure 1.5 – Building blocks summary table

Now that we've learned the basics of building blocks, let's take a look at the solutions offered by GCP.

Solutions

Following the incremental approach, building blocks and core platform services are also bundled to provide out-of-the-box solutions. These pre-built modules can be adopted by companies and immediately used to improve their business. These solutions are covered in this section.

AI Hub

Google Cloud's AI Hub acts as a marketplace for AI components. It can be used in public mode to share and use assets developed by the community, which actively works on GCP, or it can be used privately to share ML assets inside your company. The goal of this service is to simplify the sharing of valuable assets across different users, favoring re-use and accelerating the deployment of new use cases.

In the following screenshot, you can see AI Hub's home page:

Figure 1.6 – Screenshot of AI Hub on GCP

Now that we've understood the role of AI Hub, let's look at Cloud Talent Solution.

Cloud Talent Solution

Cloud Talent Solution is basically a solution for HR offices that improves the candidate discovery and hiring processes using AI. We will not go any further with the description of this solution, but there will be a link in the Further resources section at the end of this chapter.

Contact Center AI

Contact Center AI is a solution that can be used to improve the effectiveness of the customer experience with a contact center powered by AI and automation. The solution is based on Dialogflow and the Text-to-Speech and Speech-to-Text APIs.

Document AI

This solution is focused on document processing to extract relevant information and streamline business processes that usually require manual effort. The solution is able to parse PDF files, images, and handwritten text and convert this information into a digitally structured format, making it accessible and searchable.

As can be easily seen from their descriptions, the AI solutions provided by Google are more business-oriented and designed to solve specific challenges. They can be configured and customized but are basically dedicated to business users.

Before going on, let's take a look at the following table, which summarizes the concepts explained in this section and provides a clear overview of the different AI and ML service categories:

Figure 1.7 – Summary of GCP AI and ML services

Tip

When you need to develop a new use case, we recommend using pre-built solutions and building blocks before trying to reinvent the wheel. If a building block already satisfies all the requirements of your use case, it can be extremely valuable to use it. It will save time and effort during the development and maintenance phases. Start considering the use of core services only if the use case is complex or so particular that it cannot be addressed with building blocks or solutions.

As we've seen in this section, GCP's AI and ML services are extensive. Now, let's take a closer look at the main topic of this book: Google BigQuery.