Serverless Design Patterns and Best Practices

By: Brian Zambrano

Overview of this book

Serverless applications handle many problems that developers face when running systems and servers. The serverless pay-per-invocation model can also result in drastic cost savings, contributing to its popularity. While it's simple to create a basic serverless application, it's critical to structure your software correctly to ensure it continues to succeed as it grows. Serverless Design Patterns and Best Practices presents patterns that can be adapted to run in a serverless environment. You will learn how to develop applications that are scalable, fault tolerant, and well-tested. The book begins with an introduction to the different design pattern categories available for serverless applications. You will learn the trade-offs between GraphQL and REST and how they fare regarding overall application design in a serverless ecosystem. The book will also show you how to migrate an existing API to a serverless backend using AWS API Gateway. You will learn how to build event-driven applications using queuing and streaming systems, such as AWS Simple Queue Service (SQS) and AWS Kinesis. Patterns for data-intensive serverless applications are also explained, including the lambda architecture and MapReduce. This book will equip you with the knowledge and skills you need to develop scalable and resilient serverless applications confidently.

Processing Enron emails with serverless MapReduce


I've based our example application on the Enron email corpus, which is publicly available on Kaggle. This data is made up of some 500,000 emails from the Enron corporation. In total, this dataset is approximately 1.5 GB. Our job will be to count emails by From-To pair. That is, for each person who sent an email, we will generate a count of the number of times they sent to each particular recipient.
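To make the From-To count concrete, here is a minimal single-process sketch in Python. It is not the MapReduce implementation itself, only an illustration of the quantity being computed. The function name `count_from_to` is hypothetical, and the sketch assumes each email is available as a raw message string with standard `From` and `To` headers.

```python
from collections import Counter
from email import message_from_string
from email.utils import getaddresses, parseaddr


def count_from_to(raw_messages):
    """Count (sender, recipient) pairs across raw email message strings.

    Hypothetical helper: assumes each item is a raw message with
    standard From and To headers.
    """
    counts = Counter()
    for raw in raw_messages:
        msg = message_from_string(raw)
        # parseaddr returns (display name, address); keep only the address.
        sender = parseaddr(msg.get("From", ""))[1].strip().lower()
        if not sender:
            continue  # skip messages without a usable sender
        # A To header may list several recipients; count each one separately.
        for _name, addr in getaddresses([msg.get("To", "")]):
            addr = addr.strip().lower()
            if addr:
                counts[(sender, addr)] += 1
    return counts
```

Keying the counter on the `(sender, recipient)` tuple means partial counts produced from separate chunks of the data can later be combined by simple addition, which is the property a MapReduce design relies on.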

Note

Anyone may download and work with this dataset: https://www.kaggle.com/wcukierski/enron-email-dataset. The original data from Kaggle comes as a single file in CSV format. To make this data work with this example MapReduce program, I broke the single ~1.4 GB file into roughly 100 MB chunks. During this example, it's important to remember that we are starting from 14 separate files on S3.
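The note does not say how the file was split, so the following is only one possible approach, sketched under assumptions. The function name `split_csv`, the output naming scheme, and the size accounting are all hypothetical. The sketch uses Python's `csv` module rather than splitting on raw lines, because an email body stored in a CSV field can itself contain newlines.

```python
import csv


def split_csv(src_path, out_prefix, max_bytes=100 * 1024 * 1024):
    """Split one large CSV into chunks of roughly max_bytes each.

    Hypothetical helper: every chunk repeats the header row, and rows
    are never cut in the middle. Sizes are approximate (field bytes only).
    """
    # Email bodies can exceed the csv module's default field size limit.
    csv.field_size_limit(2**31 - 1)
    paths = []
    out = None
    writer = None
    written = 0
    with open(src_path, newline="", encoding="utf-8") as src:
        reader = csv.reader(src)
        header = next(reader, None)
        if header is None:
            return paths  # empty input, nothing to split
        for row in reader:
            # Start a new chunk on the first row or once the current one is full.
            if out is None or written >= max_bytes:
                if out is not None:
                    out.close()
                path = f"{out_prefix}-{len(paths):03d}.csv"
                out = open(path, "w", newline="", encoding="utf-8")
                writer = csv.writer(out)
                writer.writerow(header)
                paths.append(path)
                written = 0
            writer.writerow(row)
            written += sum(len(field.encode("utf-8")) for field in row)
    if out is not None:
        out.close()
    return paths
```

Each resulting chunk is a valid CSV on its own, so it can be uploaded to S3 and processed independently of the others.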

The data format in our dataset is a CSV with two columns, the first being the email message location (on the mail server, presumably) and the second...