Programming MapReduce with Scalding

By: Antonios Chalkiopoulos

Development lifecycle with testing strategy


The testing strategy described here is closely tied to the software development lifecycle we follow. For data processing applications, that lifecycle starts with a data science phase consisting of two tasks:

  • Data exploration: Analysis of the format, frequency of arrival, and contents of the data

  • Whiteboard design: Definition of the processing algorithm and the mathematical models to be used to generate features

Two development tasks follow:

  • TDD implementation: Conversion of the algorithm into a scalable MapReduce application using Scalding

  • Production deployment and monitoring: Execution, performance enhancement, and monitoring of the MapReduce job