Data Engineering with AWS

By: Gareth Eagar

Overview of this book

Written by a Senior Data Architect with over twenty-five years of experience in the business, Data Engineering with AWS is a book whose sole aim is to make you proficient in using the AWS ecosystem. Using a thorough and hands-on approach to data, this book will give aspiring and new data engineers a solid theoretical and practical foundation to succeed with AWS. As you progress, you'll be taken through the services and the skills you need to architect and implement data pipelines on AWS. You'll begin by reviewing important data engineering concepts and some of the core AWS services that form a part of the data engineer's toolkit. You'll then architect a data pipeline, review raw data sources, transform the data, and learn how the transformed data is used by various data consumers. You'll also learn about populating data marts and data warehouses along with how a data lakehouse fits into the picture. Later, you'll be introduced to AWS tools for analyzing data, including those for ad-hoc SQL queries and creating visualizations. In the final chapters, you'll understand how the power of machine learning and artificial intelligence can be used to draw new insights from data. By the end of this AWS book, you'll be able to carry out data engineering tasks and implement a data pipeline on AWS independently.
Table of Contents (19 chapters)

  • Section 1: AWS Data Engineering Concepts and Trends
  • Section 2: Architecting and Implementing Data Lakes and Data Lake Houses
  • Section 3: The Bigger Picture: Data Analytics, Data Visualization, and Machine Learning

Hands-on – creating and accessing your AWS account

The projects in this book require you to access an AWS account with administrator privileges. If you already have administrator privileges for an AWS account and know how to access the AWS Management Console, you can skip this section and move on to Chapter 2, Data Marts, Data Lakes, and the Data Lakehouse.

If you are making use of a corporate AWS account, you will want to check with your AWS cloud operations team to ensure that your account has administrative privileges. Even if your daily-use account does not allow full administrative privileges, your cloud operations team may be able to create a sandbox account for you.

What is a sandbox account?

A sandbox account is an account isolated from your corporate production systems with relevant guardrails and governance in place, and is used by many organizations to provide a safe space for teams or individual developers to experiment with cloud services.

If you cannot get administrative access to a corporate account, you will need to create a personal AWS account or work with your cloud operations team to request specific permissions needed to complete each section. Where possible, we will provide links to AWS documentation that will list the required permissions, but the full details of the required permissions will not be covered directly in this book.

Important note about the costs associated with the hands-on tasks in this book

If you are creating a new personal account or using an existing personal account, you will incur and be responsible for AWS costs as you follow along in this book. While some services may fall under AWS free-tier usage, some of the services covered in this book will not. We strongly encourage you to set up budget alerts within your account and to regularly check your billing console.

See the AWS documentation on monitoring your usage and costs at https://docs.aws.amazon.com/awsaccountbilling/latest/aboutv2/monitoring-costs.html.
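Budget alerts can also be defined programmatically once you have an account and credentials. The following is a minimal sketch, not the book's own method: it builds the request for the AWS Budgets CreateBudget operation as exposed by the boto3 library. The budget name, account ID, email address, and 80% threshold are hypothetical values chosen for illustration, and the actual API call is kept in a separate function because it requires boto3 and valid credentials.

```python
import json


def build_budget_request(account_id, monthly_limit_usd, alert_email, threshold_pct=80.0):
    """Build keyword arguments for the AWS Budgets CreateBudget operation."""
    return {
        "AccountId": account_id,
        "Budget": {
            "BudgetName": "hands-on-monthly-cost",  # hypothetical name
            "BudgetLimit": {"Amount": f"{monthly_limit_usd:.2f}", "Unit": "USD"},
            "TimeUnit": "MONTHLY",
            "BudgetType": "COST",
        },
        "NotificationsWithSubscribers": [
            {
                # Alert when actual spend exceeds threshold_pct of the limit.
                "Notification": {
                    "NotificationType": "ACTUAL",
                    "ComparisonOperator": "GREATER_THAN",
                    "Threshold": threshold_pct,
                    "ThresholdType": "PERCENTAGE",
                },
                "Subscribers": [
                    {"SubscriptionType": "EMAIL", "Address": alert_email}
                ],
            }
        ],
    }


def create_budget_alert(request):
    """Send the request to AWS (needs boto3 installed and valid credentials)."""
    import boto3  # third-party; imported lazily so the builder works without it

    boto3.client("budgets").create_budget(**request)


if __name__ == "__main__":
    # Placeholder account ID and email address; replace with your own.
    request = build_budget_request("123456789012", 10, "name@example.com")
    print(json.dumps(request, indent=2))
```

Keeping the request builder separate from the call makes it possible to inspect the budget definition before anything is sent to AWS.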

Creating a new AWS account

To create a new AWS account, you will need the following things:

  • An email address (or alias) that has not been used before to register an AWS account
  • A phone number that can be used for important account verification purposes
  • Your credit or debit card, which will be charged for AWS usage outside of the Free Tier

    Tip regarding the phone number you use when registering

    It is important to keep the contact details for your AWS account up to date, because if you lose access to your account, you will need access to the email address and phone number registered for it. If you expect that your contact number may change in the future, consider registering a virtual number that you will always be able to access and that you can forward to your primary number. One service that enables this is Google Voice (http://voice.google.com).

The following steps will guide you through creating a new AWS account:

  1. Navigate to the AWS landing page at http://aws.amazon.com.
  2. Click on the Create an AWS Account link.
  3. Provide an email address, specify a secure password (one that you have not used elsewhere), and provide a name for your account.

    Tip about reusing an existing email address

    Some email systems support adding a + sign followed by a few characters to the end of the username portion of your email address in order to create a unique email address that still goes to the same mailbox. For example, on such a system, mail sent to an address of the form name+aws@example.com is delivered to the inbox for name@example.com. If you have previously used your primary email address to register an AWS account, you can use this tip to provide a unique email address during registration, but still have emails delivered to your primary account.

  4. Select Professional or Personal for the account type (note that the functionality and tools available are the same no matter which one you pick).
    Figure 1.1 – Contact information during AWS account sign-up

  5. Provide the requested personal information and review the terms of the AWS Customer Agreement. If you agree to the terms, click the checkbox, and then click on Create Account and Continue.
  6. Provide a credit or debit card for payment information and select Verify and Add.
  7. Provide a phone number for a verification text or call, enter the characters shown for the security check, and complete the verification.
    Figure 1.2 – Confirming your identity during AWS account sign-up

  8. Select a support plan.
  9. You will receive a notification that your account is being activated. This usually completes in a few minutes, but it can take up to 24 hours. Check your email to confirm account activation.

    What to do if you don't receive a confirmation email within 24 hours

    If you do not receive an email within 24 hours confirming that your account has been activated, follow the troubleshooting steps provided by AWS Premium Support at https://aws.amazon.com/premiumsupport/knowledge-center/create-and-activate-aws-account/.
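The email tip under step 3 above (adding a + sign and a few characters to the username portion of an address) can be illustrated with a short helper. This is a sketch under assumptions: the function name and the example address are hypothetical, and whether the resulting address is actually delivered depends on your email provider supporting this convention.

```python
def plus_alias(address, tag):
    """Return the address with +tag appended to the username portion."""
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not an email address: {address!r}")
    return f"{local}+{tag}@{domain}"


if __name__ == "__main__":
    # Placeholder address on a reserved example domain.
    print(plus_alias("name@example.com", "aws"))
```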

Accessing your AWS account

Once you have received the email confirming that your account has been activated, follow these steps to access your account and to create a new admin user:

  1. Access the AWS console login page at http://console.aws.amazon.com.
  2. Make sure Root user is selected, and then enter the email address that you used when creating the account.
  3. Enter the password that you set when creating the account.

    Best practices for securing your account

    When you log in using the email address you specified when registering the account, you are logging in as the account's root user. As a best practice, do not use this login for your day-to-day activities; reserve it for tasks that require the root user, such as creating your first Identity and Access Management (IAM) user, deleting the account, or changing your account settings. For more information, see https://docs.aws.amazon.com/IAM/latest/UserGuide/id_root-user.html.

    It is also strongly recommended that you enable Multi-Factor Authentication (MFA) on this and other administrative accounts. To enable this, see https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_mfa_enable_virtual.html.

In the following steps, we are going to create a new IAM administrative user account:

  1. In the AWS Management Console, confirm which Region you are currently in. You can select any Region, such as the Region closest to you geographically.

    Important note about pricing differences in AWS Regions

    Note that pricing for AWS services differs from Region to Region, so take this into account when selecting a Region to use for the exercises in this book and make sure you are always in the same Region when working through the exercises. 

    In the following screenshot, the user is in the Ohio Region (also known as us-east-2):

    Figure 1.3 – AWS Management Console

  2. In the search bar in the top middle of the screen, type in IAM and press Enter. This brings up the console for IAM.
  3. On the left-hand side menu, click Users and then Add user.
  4. Provide a username and select both Programmatic access and AWS Management Console access.
  5. Set a password for the console, and select whether to force a password change on the next login, then click Next: Permissions.
    Figure 1.4 – Creating a new user in the AWS Management Console

  6. For production accounts, it is best practice to grant permissions with a policy of least privilege, giving each user only the permissions they specifically require to perform their role. However, AWS managed policies can be used to cover common use cases in test accounts, and so to simplify the setup of our test account, we will use the AdministratorAccess managed policy. This policy gives full access to all AWS resources in the account.

    On the Set permissions screen, select Attach existing policies directly. From the list of policies, select AdministratorAccess. Then, click Next: Tags.

  7. Optionally, specify tags (key-value pairs), then click Next: Review.
  8. Review the settings, and then click Create user.
  9. Take note of the URL to sign in to your account.
  10. Take note of the access key ID and secret access key as you will need these later. This is the only opportunity you will have to record the secret access key so it is important to safely record this information now:
Figure 1.5 – Successful creation of new IAM user

Important note about protecting your account

Make sure you protect this information, because anyone who has access to your access key ID and secret access key can perform full administrative functions in your account, including deploying resources that you will be responsible for paying for.
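One common way to keep the access key ID and secret access key out of source code is to read them from environment variables. The sketch below assumes the standard variable names AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY; the helper names load_aws_credentials and mask are hypothetical, and this is only one of several reasonable approaches.

```python
import os


def load_aws_credentials(env=os.environ):
    """Read the access key ID and secret access key from environment variables."""
    try:
        return env["AWS_ACCESS_KEY_ID"], env["AWS_SECRET_ACCESS_KEY"]
    except KeyError as missing:
        raise RuntimeError(f"environment variable {missing} is not set") from None


def mask(secret, visible=4):
    """Hide all but the last few characters so the value can be logged safely."""
    if len(secret) <= visible:
        return "*" * len(secret)
    return "*" * (len(secret) - visible) + secret[-visible:]


if __name__ == "__main__":
    try:
        key_id, secret = load_aws_credentials()
        # Never print the secret itself; show a masked key ID only.
        print("Using access key", mask(key_id))
    except RuntimeError as err:
        print(err)
```

Masking values before printing them reduces the chance that a key ends up in a log file or a shared terminal session.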

For the remainder of the tutorials in this book, you should log in using the URL provided and the username and password you set for your IAM user. You should also strongly consider enabling MFA for this account, a recommended best practice for all accounts with administrator permissions.
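For readers who later want to script the console steps above, the following sketch shows roughly equivalent IAM API calls. It is an approximation under assumptions, not the book's procedure: the function expects an IAM client such as the one returned by boto3.client("iam"), the username and password are placeholders, and DryRunIAM is a hypothetical stand-in that records calls so the example runs without contacting AWS.

```python
ADMIN_POLICY_ARN = "arn:aws:iam::aws:policy/AdministratorAccess"


def create_admin_user(iam, username, console_password, force_reset=True):
    """Create an IAM user with console access, admin permissions, and an access key."""
    iam.create_user(UserName=username)
    # Console password, optionally forcing a change at the next login.
    iam.create_login_profile(
        UserName=username,
        Password=console_password,
        PasswordResetRequired=force_reset,
    )
    # Attach the AWS managed policy used for the test account.
    iam.attach_user_policy(UserName=username, PolicyArn=ADMIN_POLICY_ARN)
    # The secret access key is only returned at creation time.
    key = iam.create_access_key(UserName=username)["AccessKey"]
    return key["AccessKeyId"], key["SecretAccessKey"]


class DryRunIAM:
    """Hypothetical stand-in client that records calls instead of contacting AWS."""

    def __init__(self):
        self.calls = []

    def __getattr__(self, name):
        def record(**kwargs):
            self.calls.append((name, kwargs))
            # Placeholder response shaped like a create_access_key result.
            return {"AccessKey": {"AccessKeyId": "EXAMPLE-ID",
                                  "SecretAccessKey": "EXAMPLE-SECRET"}}
        return record


if __name__ == "__main__":
    client = DryRunIAM()  # swap in boto3.client("iam") to run against AWS
    create_admin_user(client, "dataeng-admin", "placeholder-password")
    for name, kwargs in client.calls:
        print(name, sorted(kwargs))
```

Passing the client in as a parameter keeps the function easy to try out with the dry-run stand-in before pointing it at a real account.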