In the previous chapters, we looked at the MapReduce and Spark programming APIs to write distributed applications. Although very powerful and flexible, these APIs come with a certain level of complexity and possibly require significant development time.
In an effort to reduce verbosity, we introduced the Pig and Hive frameworks, which compile domain-specific languages, Pig Latin and HiveQL, into a series of MapReduce jobs or Spark DAGs, effectively abstracting the underlying APIs away. Both languages can be extended with user-defined functions (UDFs), which provide a way of mapping complex logic onto the Pig and Hive data models.
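To make the verbosity reduction concrete, consider the classic word count: a raw MapReduce implementation typically spans dozens of lines of Java, while the same computation fits in a few lines of Pig Latin. The sketch below is illustrative; the input path and output directory are placeholders, and the single-column schema is an assumption.

```pig
-- Load each line of a (hypothetical) text file as a single chararray field
lines   = LOAD 'input.txt' AS (line:chararray);
-- Split each line into words, one word per output tuple
words   = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;
-- Group identical words together and count the occurrences in each group
grouped = GROUP words BY word;
counts  = FOREACH grouped GENERATE group AS word, COUNT(words) AS n;
STORE counts INTO 'wordcount_out';
```

When executed, Pig compiles this script into one or more MapReduce jobs (or a Spark DAG, depending on the execution engine), sparing the developer from writing mapper and reducer classes by hand.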
When we need a greater degree of flexibility and modularity, however, things can get tricky. Depending on the use case and the developer's needs, the Hadoop ecosystem offers a vast choice of APIs, frameworks, and libraries. In this chapter, we identify four categories of users and match them with the following relevant tools:
Developers who want to avoid Java in favor of scripting MapReduce...