In our use case, we will process advertising campaign logs to determine which campaigns are most effective. The batch processing mechanism will process a single large flat file using a Pig script. Pig is a high-level language that allows users to perform data transformation and analysis. Pig is similar to SQL and compiles down into MapReduce jobs that typically deploy and run on Hadoop infrastructure.
In this chapter, we will convert the Pig script into a topology and deploy that topology using Storm-YARN. This allows us to transition from a batch processing approach to one that is capable of ingesting and reacting to real-time events (for example, clicks on a banner advertisement).
In advertising, an impression is an event that records an advertisement being displayed to a user, regardless of whether it was clicked. For our analysis, we will track each impression and use a field to indicate whether the user clicked on the advertisement...
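To make the idea of an impression with a click indicator concrete, the following is a minimal sketch in Python rather than the chapter's Pig script or topology. The record layout (a campaign identifier paired with a boolean click flag), the sample data, and the function name `click_through_rates` are all hypothetical, and using the click-through rate (clicks divided by impressions) as the measure of effectiveness is an assumption made for illustration.

```python
from collections import defaultdict

# Hypothetical impression records: (campaign_id, clicked).
# Each tuple is one impression; the boolean marks whether it was clicked.
impressions = [
    ("campaign-a", True),
    ("campaign-a", False),
    ("campaign-b", False),
    ("campaign-b", False),
    ("campaign-a", True),
    ("campaign-b", True),
]


def click_through_rates(records):
    """Return clicks / impressions for each campaign in records."""
    counts = defaultdict(lambda: [0, 0])  # campaign -> [impressions, clicks]
    for campaign, clicked in records:
        counts[campaign][0] += 1
        if clicked:
            counts[campaign][1] += 1
    return {c: clicks / shown for c, (shown, clicks) in counts.items()}


rates = click_through_rates(impressions)
# Assumed definition of "most effective": the highest click-through rate.
best = max(rates, key=rates.get)
print(rates, best)
```

A batch job would compute these ratios once over the whole file, while a streaming version would have to keep the per-campaign counts up to date as each impression arrives.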