Cascalog comes with a number of operators; however, for most analyses, we'll need to define our own.
Cascalog defines several categories of operators, each with different properties. Some run in the Map phase of processing, and others run in the Reduce phase. Map-phase operators can take advantage of a number of extra optimizations, so if we can push some of our processing into that stage, we'll get better performance. In this recipe, we'll see which categories of operators are Map-side and which are Reduce-side. We'll also provide an example of each and see how they fit into the larger processing model.
For this recipe, we'll use the same dependencies and includes that we used in the Distributed processing with Cascalog and Hadoop recipe. We'll also use the Doctor Who companion data from that recipe.