Spark Cookbook

By: Rishi Yadav

Loading data from Amazon S3


Amazon Simple Storage Service (S3) provides developers and IT teams with a secure, durable, and scalable storage platform. The biggest advantage of Amazon S3 is that it requires no up-front IT investment, and companies can add capacity as they need it, just by clicking a button.

Though Amazon S3 can be used with any compute platform, it integrates especially well with Amazon's other cloud services, such as Amazon Elastic Compute Cloud (EC2) and Amazon Elastic Block Store (EBS). For this reason, companies that use Amazon Web Services (AWS) are likely to have a significant amount of data already stored on Amazon S3.

This makes a good case for loading data into Spark from Amazon S3, and that is exactly what this recipe is about.

How to do it...

Let's start with the AWS portal:

  1. Go to http://aws.amazon.com and log in with your username and password.

  2. Once logged in, navigate to Storage & Content Delivery | S3 | Create Bucket.

  3. Enter the bucket name—for example, com.infoobjects.wordcount...
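
Before typing a bucket name into the console, it can help to check it against the naming rules. The following is a minimal sketch in Python, not part of the recipe itself. It assumes the commonly documented S3 rules for general-purpose buckets (3 to 63 characters; lowercase letters, digits, dots, and hyphens; starting and ending with a letter or digit; no consecutive dots; not shaped like an IP address) and checks only that subset. The helper names `is_plausible_bucket_name` and `s3_uri`, the `s3a` scheme default, and the object key `input/sample.txt` are all hypothetical choices made for illustration.

```python
import re

# Subset of the S3 bucket naming rules (an assumption of this sketch,
# not something stated in the recipe): 3-63 characters, lowercase
# letters, digits, dots, and hyphens, starting and ending with a
# letter or digit.
_BUCKET_RE = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")

# Names shaped like an IPv4 address (for example 192.168.1.1) are rejected.
_IPV4_RE = re.compile(r"\d{1,3}(\.\d{1,3}){3}")


def is_plausible_bucket_name(name: str) -> bool:
    """Return True if `name` passes the subset of rules checked here."""
    if not _BUCKET_RE.fullmatch(name):
        return False
    if ".." in name:  # consecutive dots are not allowed
        return False
    if _IPV4_RE.fullmatch(name):
        return False
    return True


def s3_uri(bucket: str, key: str = "", scheme: str = "s3a") -> str:
    """Build a URI of the form scheme://bucket/key.

    The scheme is a parameter because the right one depends on the
    Hadoop S3 connector in use; "s3a" is only this sketch's default.
    """
    if not is_plausible_bucket_name(bucket):
        raise ValueError(f"not a plausible bucket name: {bucket!r}")
    return f"{scheme}://{bucket}/{key.lstrip('/')}"


if __name__ == "__main__":
    print(is_plausible_bucket_name("com.infoobjects.wordcount"))  # True
    print(is_plausible_bucket_name("Com.InfoObjects"))            # False
    # Hypothetical object key, used only to show the URI shape.
    print(s3_uri("com.infoobjects.wordcount", "input/sample.txt"))
```

A passing check does not guarantee that the console will accept the name: bucket names must also be unique across S3, and the service enforces further restrictions that this sketch does not cover.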