Amazon EC2 Cookbook

Overview of this book

Discover how to perform a complete forensic investigation of large-scale Hadoop clusters using the same tools and techniques employed by forensic experts. This book begins by taking you through the process of forensic investigation and the pitfalls to avoid. It will walk you through Hadoop’s internals and architecture, and you will discover what types of information Hadoop stores and how to access that data. You will learn to identify Big Data evidence using techniques to survey a live system and interview witnesses. After setting up your own Hadoop system, you will collect evidence using techniques such as forensic imaging and application-based extractions. You will analyze Hadoop evidence using advanced tools and techniques to uncover events and statistical information. Finally, data visualization and evidence presentation techniques are covered to help you properly communicate your findings to any audience.
Table of Contents (15 chapters)
Amazon EC2 Cookbook
Credits
About the Authors
About the Reviewer
www.PacktPub.com
Preface
Index

Introduction


To choose the right AWS EC2 instance for your requirements, you need to ask yourself several questions. These include: What is the primary purpose of the EC2 instance being provisioned? How long do you need a particular machine? Do you need high-performance storage? Should you go for dedicated or shared tenancy? Will the machine be used for compute-intensive or memory-intensive processing? What are the scalability, availability, and security requirements? What are your networking requirements? There are several options available for each of these parameters, and our recipes describe them to help you make the right choices.

For low latency, you can host your application in the AWS region nearest to the end user. Each AWS region is a separate geographic area and has multiple isolated locations called availability zones. Each availability zone consists of one or more data centers within the region. Availability zones are used to deploy fault-tolerant and highly available applications: the latency between zones in the same region is very low, and the zones are designed so that a failure in one availability zone does not affect the systems in another.
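One rough way to find the nearest region is to time a TCP connection to each region's public EC2 endpoint and pick the fastest. The following Python sketch is an illustration only, not a recipe from this book. The helper names measure_latency and nearest_region are hypothetical, the region list is an arbitrary sample, and the probe measures latency from the machine running the script, so it should be run from a location representative of your end users.

```python
import socket
import time


def measure_latency(host, port=443, timeout=2.0):
    """Return the TCP connect time to host in seconds, or None if unreachable."""
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.perf_counter() - start
    except OSError:
        return None


def nearest_region(regions, probe=measure_latency):
    """Return (region, latency) for the fastest region, or None if all probes fail.

    probe is injectable so the selection logic can be exercised without
    network access.
    """
    results = {}
    for region in regions:
        # Regional EC2 endpoints follow the pattern ec2.<region>.amazonaws.com.
        latency = probe(f"ec2.{region}.amazonaws.com")
        if latency is not None:
            results[region] = latency
    if not results:
        return None
    best = min(results, key=results.get)
    return best, results[best]


if __name__ == "__main__":
    # Sample regions chosen for illustration; substitute the ones you are considering.
    candidates = ["us-east-1", "eu-west-1", "ap-southeast-1"]
    print(nearest_region(candidates))
```

A single TCP handshake is a noisy measurement, so in practice you would probably repeat the probe several times per region and compare medians before deciding.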