The Self-Taught Cloud Computing Engineer

By: Dr. Logan Song

Overview of this book

The Self-Taught Cloud Computing Engineer is a comprehensive guide to mastering cloud computing concepts by building a broad and deep cloud knowledge base, developing hands-on cloud skills, and achieving professional cloud certifications. Even if you’re a beginner with a basic understanding of computer hardware and software, this book serves as the means to transition into a cloud computing career. Starting with the Amazon cloud, you’ll explore the fundamental AWS cloud services, then progress to advanced AWS cloud services in the domains of data, machine learning, and security. Next, you’ll build proficiency in Microsoft Azure Cloud and Google Cloud Platform (GCP) by examining the common attributes of the three clouds while distinguishing their unique features. You’ll further enhance your skills through practical experience on these platforms with real-life cloud project implementations. Finally, you’ll find expert guidance on cloud certifications and career development. By the end of this cloud computing book, you’ll have become a cloud-savvy professional well-versed in AWS, Azure, and GCP, ready to pursue cloud certifications to validate your skills.
Table of Contents (24 chapters)
Part 1: Learning about the Amazon Cloud
Part 2: Comprehending GCP Cloud Services
Part 3: Mastering Azure Cloud Services
Part 4: Developing a Successful Cloud Career

AWS Glue

As we explained earlier, AWS Glue is an ETL service used to extract data from various sources, transform it into a consistent format and structure, and then load it into a target data repository, such as an S3 bucket or a data warehouse. In an ETL process such as the one used in AWS Glue, the data is typically transformed before it is loaded into the target repository. AWS Glue has the following features:

  • Automatically generate schemas from semi-structured data by using crawlers, which run on your data sources, derive a schema from them, and populate the Data Catalog. Crawlers can run on many data stores, including Amazon S3, Amazon Redshift, most relational databases, and DynamoDB. By using the metadata in the Data Catalog, you can also automatically generate scripts with AWS Glue extensions as the starting point of your AWS Glue jobs.
  • Catalog data and get a unified view with the AWS Glue Data Catalog, which stores metadata including schema information about data...
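
To make the crawler idea more concrete, here is a minimal pure-Python sketch of what "deriving a schema from semi-structured data and populating a catalog" can look like. It is only an illustration and does not call AWS Glue: the names `infer_schema`, `crawl`, `TYPE_NAMES`, and the sample `orders` records are all hypothetical, and the type-widening rule is a simplifying assumption rather than the logic that Glue crawlers actually apply.

```python
import json
from collections import defaultdict

# Hypothetical mapping from Python types to catalog-style type names.
TYPE_NAMES = {int: "bigint", float: "double", str: "string", bool: "boolean"}


def infer_schema(records):
    """Derive a column -> type-name mapping from a list of dict records."""
    seen = defaultdict(set)
    for record in records:
        for column, value in record.items():
            # Unknown types (None, lists, nested objects) fall back to "string".
            seen[column].add(TYPE_NAMES.get(type(value), "string"))

    schema = {}
    for column, types in seen.items():
        if len(types) == 1:
            schema[column] = next(iter(types))
        elif types == {"bigint", "double"}:
            # Assumed rule: mixed integers and floats widen to double.
            schema[column] = "double"
        else:
            # Assumed rule: any other mix is stored as string.
            schema[column] = "string"
    return schema


def crawl(catalog, table_name, json_lines):
    """Parse JSON lines, infer their schema, and record it in the catalog dict."""
    records = [json.loads(line) for line in json_lines if line.strip()]
    catalog[table_name] = {
        "columns": infer_schema(records),
        "record_count": len(records),
    }
    return catalog[table_name]


# Hypothetical semi-structured source: the second record has an extra field.
SAMPLE_LINES = [
    '{"order_id": 1, "customer": "ana", "total": 19.99}',
    '{"order_id": 2, "customer": "raj", "total": 5, "coupon": "SPRING"}',
]

catalog = {}
entry = crawl(catalog, "orders", SAMPLE_LINES)
for column, type_name in sorted(entry["columns"].items()):
    print(f"{column}: {type_name}")
```

The in-memory `catalog` dictionary stands in for the Data Catalog here: once the metadata is stored, later steps can look up a table's columns without rereading the source data.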