In this chapter, you learned several techniques for tuning a Cascading workflow. We discussed what to examine when determining a workflow's performance characteristics, and we reviewed a number of best practices for improving the performance of a Cascading application. You also learned how to tune the underlying Hadoop system when Cascading runs on that platform. We discussed how to use checkpoints effectively to limit the processing time lost when failures occur. Finally, we looked at several tools that can help diagnose performance issues in a Cascading application.
In the next chapter, we will put together everything that you have learned so far and create a real-world, fully featured Cascading application. We will build a natural language processing (NLP) application that processes free-form text files and extracts various forms of information from them using OpenNLP software. In this application, we will...