Machine Learning with BigQuery ML

By: Alessandro Marrandino

Overview of this book

BigQuery ML enables you to build machine learning (ML) models with SQL and very little additional code. This book will help you to accelerate the development and deployment of ML models with BigQuery ML. The book starts with a quick overview of Google Cloud and BigQuery architecture. You'll then learn how to configure a Google Cloud project, understand the architectural components and capabilities of BigQuery, and find out how to build ML models with BigQuery ML. The book teaches you how to apply ML using SQL on BigQuery. You'll analyze the key phases of an ML model's lifecycle and get to grips with the SQL statements used to train, evaluate, test, and use a model. As you advance, you'll work through a series of practical use cases by applying different ML techniques such as linear regression, binary and multiclass logistic regression, k-means, ARIMA time series, deep neural networks, and XGBoost. Moving on, you'll cover matrix factorization and deep neural networks using BigQuery ML's capabilities. Finally, you'll explore the integration of BigQuery ML with other Google Cloud Platform components such as AI Platform Notebooks and TensorFlow, and discover best practices, tips, and tricks for hyperparameter tuning and performance enhancement. By the end of this BigQuery book, you'll be able to build and evaluate your own ML models with BigQuery ML.

Table of Contents (20 chapters)

  • Section 1: Introduction and Environment Setup
  • Section 2: Deep Learning Networks
  • Section 3: Advanced Models with BigQuery ML
  • Section 4: Further Extending Your ML Capabilities with GCP

Introducing Google Cloud Platform

Starting in 1998 with the launch of Google Search, Google has developed one of the largest and most powerful IT infrastructures in the world. Today, billions of users rely on this infrastructure to access services such as Gmail, YouTube, Google Photos, and Maps. Ten years later, in 2008, Google decided to open its network and IT infrastructure to business customers, turning an infrastructure that was initially developed for consumer applications into a public service and launching Google Cloud Platform (GCP).

The 90+ services that Google currently provides to large enterprises and small- and medium-sized businesses cover the following categories:

  • Compute: Used to support workloads or applications with virtual machines such as Google Compute Engine, containers with Google Kubernetes Engine, or platforms such as App Engine.
  • Storage and databases: Used to store datasets and objects in an easy and convenient way. Some examples are Google Cloud Storage, Cloud SQL, and Spanner.
  • Networking: Used to easily connect different locations and data centers across the globe with Virtual Private Clouds (VPCs), firewalls, and fully managed global routers.
  • Big data: Used to store and process large amounts of information in a structured, semi-structured, or unstructured format. Among these services are Google Dataproc (the managed Hadoop service offered by GCP) and BigQuery, which is the main focus of this book.
  • AI and machine learning: This product area provides various tools for different kinds of users, enabling them to leverage AI and ML in their everyday business. Some examples are TensorFlow, AutoML, Vision APIs, and BigQuery ML, the main focus of this book.
  • Identity, security, and management tools: This area includes all the services that are necessary to prevent unauthorized access, ensure security, and monitor the rest of the cloud infrastructure. Identity and Access Management, Key Management Service, Cloud Logging, and Cloud Audit Logs are just some of these tools.
  • Internet of Things (IoT): Used to connect plants, vehicles, or any other objects to the GCP infrastructure, enabling the development of modern IoT use cases. The core component of this area is Google IoT Core.
  • API management: Tools to expose services to customers and partners through REST APIs, providing the ability to fully leverage the benefits of interconnectivity. In this pillar, Google Apigee is one of the most famous products and is recognized as the leader of this market segment.
  • Productivity: Used to improve productivity and collaboration for companies that want to work with Google and embrace its way of doing business through the tools of Google Workspace (previously G Suite).

Interacting with GCP

All the services just mentioned can be accessed through four different interfaces:

  • Google Cloud Console: The web-based user interface of GCP, easily accessible from compatible web browsers such as Google Chrome, Edge, or Firefox. For the hands-on exercises in this book, we'll mainly use Google Cloud Console:
Figure 1.1 – Screenshot of Google Cloud Console

  • Google Cloud SDK: The client SDK can be installed in order to interact with GCP services through the command line. It is particularly useful for automating tasks and operations by running them from scheduled scripts.
  • Client libraries: The SDK also includes some client libraries to interact with GCP using the most common programming languages, such as Python, Java, and Node.js.
  • REST APIs: Any task or operation performed on GCP can be executed by invoking a specific REST API from any compatible software.
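
To make the REST API option more concrete, the following is a minimal sketch in Python that only builds (and does not send) a request to run a SQL query through the BigQuery REST API. It assumes the `projects/{projectId}/queries` method of the BigQuery v2 API and an OAuth 2.0 bearer token; the function name `build_query_request`, the project ID `my-project`, and the token value are hypothetical placeholders introduced for illustration.

```python
import json
import urllib.request

# Base URL of the BigQuery REST API (v2).
BIGQUERY_API = "https://bigquery.googleapis.com/bigquery/v2"


def build_query_request(project_id, sql, access_token):
    """Build, without sending, a POST request that would run a SQL query.

    The project ID and access token are supplied by the caller; the values
    used below are placeholders, not real credentials.
    """
    url = f"{BIGQUERY_API}/projects/{project_id}/queries"
    # Request body: the SQL text, executed with standard (non-legacy) SQL.
    body = json.dumps({"query": sql, "useLegacySql": False}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
    )


# Hypothetical values, used only to show the shape of the request.
request = build_query_request("my-project", "SELECT 1 AS x", "PLACEHOLDER_TOKEN")
print(request.get_method(), request.full_url)
```

Sending the request (for example, with `urllib.request.urlopen`) would require a valid access token and a project with the BigQuery API enabled. In practice, the client libraries and the Cloud SDK wrap calls of this kind, so you rarely need to build them by hand.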

Now that we've learned how to interact with GCP, let's discover how GCP is different from other cloud providers.

Discovering GCP's key differentiators

GCP is not the only public cloud provider on the market. Other companies operate in the same business, for example, Amazon with Amazon Web Services (AWS), Microsoft with Azure, IBM, and Oracle. For this reason, before we get too deep into this book, it is worth understanding how GCP differs from the other offerings in the cloud market.

Each cloud provider has its own mission, strategy, history, and strengths. Let's take a look at why Google Cloud can be considered different from all the other cloud providers.

Security

Google provides an end-to-end security model for its data centers across the globe, using customized hardware developed and used by Google, and application encryption is enabled by default. The security best practices adopted by Google for GCP are the same as those developed to run applications with more than 1 billion users, such as Gmail and Google Maps.

Global network and infrastructure

At the time of writing, Google's infrastructure is available in 24 different regions, 74 availability zones, and 144 network edge locations, enabling customers to connect to Google's network and ensuring the best experience in terms of bandwidth, network latency, and security. This network allows GCP users to move data across different regions without leaving Google's proprietary network, minimizing the risk of sending information across the public internet. As of today, it is estimated that about 40% of internet traffic goes through Google's proprietary network.

In the following figure, we can see how GCP regions are distributed across the globe:

Figure 1.2 – A map of Google's global availability

The latest version of the map can be seen at the following URL: https://cloud.google.com/about/locations.

Serverless and fully managed approach

Google provides a lot of fully managed and serverless services to allow its customers to focus on high-value activities rather than maintenance operations. A great example is BigQuery, the serverless data warehouse that will be introduced in the next section of this chapter.

Environmental sustainability

100% of the energy used for Google's data centers comes from renewable energy sources. Furthermore, Google has committed to being the first major company to run all its operations, such as its data centers and campuses, carbon-free by 2030.

Pervasive AI

Google is a pioneer of the AI industry and is leveraging AI and ML not only to improve its consumer products, such as Google Photos, but also to improve the performance and efficiency of its data centers. Customers can leverage all of Google's AI and ML expertise by adopting GCP services such as AutoML and BigQuery ML, the latter being the main focus of this book.

Now that we have discussed some of the key elements of GCP as a service, let's look at AI and ML more specifically.