Serverless Architectures with AWS

By: Mohit Gupta

Overview of this book

Serverless Architectures with AWS begins with an introduction to the serverless model and helps you get started with AWS and Lambda. You'll also get to grips with other capabilities of the AWS Serverless Platform and see how AWS supports enterprise-grade serverless applications with and without Lambda. This book will guide you in deploying your first serverless project and exploring the capabilities of Amazon Athena, a serverless interactive query service that makes it easy to analyze data in Amazon Simple Storage Service (Amazon S3) using standard SQL. You'll also learn about AWS Glue, a fully managed ETL service that makes categorizing data easy and cost-effective. You'll study how Amazon Kinesis makes it possible to unleash the potential of real-time data insights and analytics with capabilities such as video streams, data streams, data firehose, and data analytics. Last but not least, you'll be equipped to combine Amazon Kinesis capabilities with AWS Lambda to create lightweight serverless architectures. By the end of the book, you will be ready to create and run your first serverless application that takes advantage of the high availability, security, performance, and scalability of AWS.
Table of Contents (8 chapters)

Chapter 4: Serverless Amazon Athena and the AWS Glue Data Catalog


Solution for Activity 5: Building an AWS Glue catalog for a CSV-Formatted Dataset and Analyzing the Data Using Amazon Athena

  1. Log in to your AWS account.

  2. Upload the data file total-business-inventories-to-sales-ratio.csv (provided with this book) into an S3 bucket. Make sure that the required permissions are in place:

    Figure 4.24: Uploading the data file

  3. Go to the AWS Glue service.

  4. Select Crawlers and click on Add Crawler.

  5. Provide the crawler name and click on Next.

  6. Provide the path of the S3 bucket where the file was uploaded in step 2. Click on Next.

  7. Click on Next, as we don't want to add another data store.

  8. Choose the existing IAM role that was created in Exercise 11: Using AWS Glue to Build a Metadata Repository. Alternatively, you can create a new one. Click on Next.

  9. Keep the frequency set to Run on demand and click on Next.

  10. Either create a new database here or use the dropdown to select an existing one. Click on Next.

  11. Review the settings and click on Finish. You have successfully created the crawler.

  12. Now, go ahead and run the crawler.

  13. Once the crawler run has completed, you will see a new table created in the database that you chose in step 10:

    Figure 4.25: The new table after the crawler run was completed

  14. Go to Tables, and you should see the newly created table, inventory_sales_ratio. Note that the crawler derives the table name from the S3 bucket (or folder) that it scanned.

  15. Go to the Amazon Athena service. You should see the new table under the database that was selected in step 10.

  16. Click on new query and write the following query to get the expected output:

    SELECT month(try(date_parse(observed_date, '%m/%d/%Y'))) AS a, count(*)
    FROM inventory_sales_ratio
    WHERE observed_value < 1.25
    GROUP BY month(try(date_parse(observed_date, '%m/%d/%Y')))
    ORDER BY a;

  17. When the query gets executed, you should see the expected output:

    Figure 4.26: The output after the query has run

  18. Looking at the output, we have a total of 8 months since 1992 in which the inventories-to-sales ratio was below 1.25. The output also gives the count for each month.
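
To make the logic of the query in step 16 easier to follow, here is a minimal Python sketch that mimics it on a few in-memory rows. The column names observed_date and observed_value come from the query itself; the sample rows are invented purely for illustration and are not taken from the book's dataset, and the helper names are hypothetical. The sketch shows how try(date_parse(...)) turns an unparseable date into a NULL group instead of failing the whole query.

```python
import csv
import io
from collections import Counter
from datetime import datetime

# Hypothetical sample rows, invented for illustration only;
# they are NOT taken from the book's data file.
SAMPLE_CSV = """observed_date,observed_value
01/01/1992,1.56
03/01/2011,1.24
03/01/2012,1.23
04/01/2012,1.26
not-a-date,1.20
"""


def month_or_none(text):
    """Mimic month(try(date_parse(text, '%m/%d/%Y'))): None instead of an error."""
    try:
        return datetime.strptime(text, "%m/%d/%Y").month
    except ValueError:
        return None


def count_low_ratio_months(csv_text, threshold=1.25):
    """Count rows below the threshold, grouped by calendar month (1-12 or None)."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        # WHERE observed_value < 1.25
        if float(row["observed_value"]) < threshold:
            # GROUP BY month(...), count(*)
            counts[month_or_none(row["observed_date"])] += 1
    return dict(counts)


result = count_low_ratio_months(SAMPLE_CSV)
```

With these made-up rows, the two March readings below 1.25 fall into one group, and the row with the malformed date is counted under None, which corresponds to a NULL group in the Athena result.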

We have successfully completed the activity.
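
The console steps above can also be scripted. The following is a rough sketch using boto3, assuming that boto3 is installed and that AWS credentials and a default region are configured. The function names, and every name in the usage comments (crawler, role, database, bucket paths), are hypothetical placeholders rather than values from the book.

```python
import time


def build_crawler_request(name, role, database, s3_path):
    """Keyword arguments for glue.create_crawler.

    No Schedule is given, so the crawler only runs on demand (step 9).
    """
    return {
        "Name": name,
        "Role": role,
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }


def run_crawler(request, poll_seconds=15):
    """Create the crawler, start it, and wait until the run has finished."""
    import boto3  # imported lazily so build_crawler_request works without boto3

    glue = boto3.client("glue")
    glue.create_crawler(**request)
    glue.start_crawler(Name=request["Name"])
    # The crawler state returns to READY once the run is over.
    while glue.get_crawler(Name=request["Name"])["Crawler"]["State"] != "READY":
        time.sleep(poll_seconds)


def start_athena_query(sql, database, output_location):
    """Submit a query to Athena and return its execution ID."""
    import boto3

    athena = boto3.client("athena")
    response = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    return response["QueryExecutionId"]


# Example usage (all values are placeholders, not from the book):
# request = build_crawler_request(
#     "my-crawler", "my-glue-role", "my_database", "s3://my-bucket/my-folder/"
# )
# run_crawler(request)
# start_athena_query("SELECT 1", "my_database", "s3://my-bucket/athena-results/")
```

Keeping the request in a separate function makes it possible to inspect the crawler settings before anything is created in the account.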