Learning AWS - Second Edition

By: Aurobindo Sarkar, Amit Shah

Overview of this book

Amazon Web Services (AWS) is the most popular and widely used cloud platform. Administering and deploying applications on AWS makes them resilient and robust. This book first covers the basic concepts of cloud-based development and then turns to running solutions in the AWS Cloud, which helps those solutions run at scale. It not only guides you through the trade-offs and ideas behind efficient cloud applications, but is also a comprehensive guide to getting the most out of AWS. In the first section, you will begin by looking at the key concepts of AWS, setting up your AWS account, and operating it. This guide also covers cloud service models, which will help you build highly scalable and secure applications on the AWS platform. We will then dive deep into concepts of cloud computing with S3 storage, RDS, and EC2. Next, this book will walk you through VPC, building real-time serverless environments, and deploying serverless APIs with microservices. Finally, this book will teach you to monitor your applications, automate your infrastructure, and deploy with CloudFormation. By the end of this book, you will be well versed in the various services that AWS provides and will be able to leverage AWS infrastructure to accelerate the development process.
Table of Contents (12 chapters)

Using AWS Glue and Amazon Athena

In this section, we will use AWS Glue to create a crawler, an ETL job, and a job that runs the KMeans clustering algorithm on the input data.

We use a publicly available dataset about students' knowledge status on a subject. The dataset and its field descriptions are available for download from the UCI site.

  1. Log in to the AWS Management Console and go to the Glue console. Click on the Add crawler button.
  2. Specify the Crawler name as User Modeling Data Crawler. Click on the Next button.
  3. In the Add a data store screen, select S3 as the Data store, and select the Specified path in my account option. Specify the path for the S3 bucket containing the input data. Click on the Next button.
  4. Select No on the Add another data store screen and click on the Next button.
  5. On the...
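
The crawler settings entered in the console so far (a crawler name and a single S3 path) can also be expressed through the AWS SDK for Python. The following is a minimal sketch, not the book's method: it assumes boto3 is installed and credentials are configured, and the role ARN, database name, and bucket path are hypothetical placeholders that do not come from the text. The helper names build_crawler_request and create_crawler are likewise illustrative.

```python
from typing import Any, Dict


def build_crawler_request(
    name: str, role_arn: str, database_name: str, s3_path: str
) -> Dict[str, Any]:
    """Assemble keyword arguments for the Glue CreateCrawler API call."""
    if not s3_path.startswith("s3://"):
        raise ValueError("s3_path must look like s3://bucket/prefix")
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database_name,
        # One S3 data store, matching "No" on the Add another data store screen.
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }


def create_crawler(request: Dict[str, Any]) -> None:
    """Send the request to AWS Glue (requires boto3 and valid credentials)."""
    import boto3  # imported lazily so the builder above works without boto3

    glue = boto3.client("glue")
    glue.create_crawler(**request)


if __name__ == "__main__":
    request = build_crawler_request(
        name="User Modeling Data Crawler",
        role_arn="arn:aws:iam::123456789012:role/ExampleGlueRole",  # placeholder
        database_name="example_db",  # placeholder
        s3_path="s3://example-bucket/input/",  # placeholder
    )
    print(request)
    # create_crawler(request)  # uncomment to call AWS with real values
```

Keeping the request builder separate from the AWS call makes it possible to inspect or test the settings without touching an AWS account.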