Storm Blueprints: Patterns for Distributed Real-time Computation

Examining the use case


In our use case, we will process advertising logs to determine the most effective campaigns. The batch processing mechanism will process a single large flat file using a Pig script. Pig is a high-level language for data transformation and analysis. It is similar to SQL, and its scripts compile into MapReduce jobs that typically deploy and run on Hadoop infrastructure.

In this chapter, we will convert the Pig script into a topology and deploy that topology using Storm-YARN. This allows us to transition from a batch processing approach to one that is capable of ingesting and reacting to real-time events (for example, clicks on a banner advertisement).

In advertising, an impression is an advertising event that represents an advertisement displayed in front of a user, regardless of whether or not it was clicked. For our analysis, we will track each impression and use a field to indicate whether the user clicked on the advertisement...
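To make the shape of this analysis concrete, the following is a minimal sketch in plain Java, not the Pig script or the Storm topology built in this chapter. It assumes a hypothetical log format of one impression per line, written as `campaignId,clicked`, and it assumes that "most effective" means the highest ratio of clicks to impressions; the text above specifies neither. The class and method names are illustrative only.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CampaignEffectiveness {

    /** Running totals for a single campaign. */
    static final class Stats {
        long impressions;
        long clicks;

        /** Clicks divided by impressions; 0.0 when there are no impressions. */
        double clickThroughRate() {
            return impressions == 0 ? 0.0 : (double) clicks / impressions;
        }
    }

    /**
     * Aggregates impressions and clicks per campaign.
     * Assumed (hypothetical) record format: "campaignId,clicked",
     * where clicked is "true" or "false".
     */
    static Map<String, Stats> aggregate(List<String> lines) {
        Map<String, Stats> byCampaign = new LinkedHashMap<>();
        for (String line : lines) {
            String[] fields = line.trim().split(",");
            if (fields.length != 2) {
                continue; // skip malformed records
            }
            Stats stats = byCampaign.computeIfAbsent(fields[0].trim(), k -> new Stats());
            stats.impressions++; // every record is one impression
            if (Boolean.parseBoolean(fields[1].trim())) {
                stats.clicks++; // the flag marks impressions that were clicked
            }
        }
        return byCampaign;
    }

    /** Returns the campaign with the highest click-through rate, or null if there is none. */
    static String mostEffective(Map<String, Stats> byCampaign) {
        String best = null;
        double bestRate = -1.0;
        for (Map.Entry<String, Stats> entry : byCampaign.entrySet()) {
            double rate = entry.getValue().clickThroughRate();
            if (rate > bestRate) {
                bestRate = rate;
                best = entry.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Made-up sample records, for illustration only.
        List<String> log = Arrays.asList(
                "campaign-a,true",
                "campaign-a,false",
                "campaign-b,false",
                "campaign-b,false",
                "campaign-a,true");
        Map<String, Stats> totals = aggregate(log);
        totals.forEach((id, s) -> System.out.println(
                id + " impressions=" + s.impressions + " clicks=" + s.clicks));
        System.out.println("most effective: " + mostEffective(totals));
    }
}
```

A batch job would run this kind of aggregation over the whole file at once, whereas a real-time version would update the same per-campaign totals as each impression arrives.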