Imagine a situation where we have to carry out jobs at a certain throughput, and each job consists of the same sequence of tasks: a differently sized I/O task (task A), a memory-bound task (task B), and another I/O task (task C). A naive approach would be to create a thread pool and run each job on it, but we soon realize that this is not optimal: because the OS schedules the threads unpredictably, we cannot determine the utilization of each I/O resource. We also observe that even though several concurrent jobs perform similar I/O tasks, this first approach gives us no way to batch them.
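To make the naive approach concrete, here is a minimal Python sketch. The task bodies (`task_a`, `task_b`, `task_c`), the sleep durations, and the pool size are placeholders assumed for illustration; the text does not specify them.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def task_a(job_id):
    """Placeholder for the first I/O task (assumed; simulated with a sleep)."""
    time.sleep(0.01)
    return f"a{job_id}"


def task_b(a_result):
    """Placeholder for the memory-bound task (assumed)."""
    return a_result.upper()


def task_c(b_result):
    """Placeholder for the second I/O task (assumed; simulated with a sleep)."""
    time.sleep(0.01)
    return f"{b_result}:done"


def run_job(job_id):
    # One thread runs the whole job, A then B then C, from start to finish.
    return task_c(task_b(task_a(job_id)))


def run_naive(job_ids, workers=4):
    # A single shared pool: the OS decides which threads run when, so at any
    # moment an unknown mix of A, B, and C tasks may be in flight.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_job, job_ids))
```

Because each worker thread owns a whole job, nothing in this structure lets us cap how many threads hit a given I/O resource at once, or group the A tasks of different jobs together.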
As the next iteration, we split each job into stages (A, B, and C) such that each stage corresponds to one task. Since the tasks are well known, we create one thread pool of appropriate size per stage and execute the tasks in them. The result of task A is required by task B, and B's result is required by task C; we enable this communication via queues. Now, we can tune the...
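The staged design described here (one pool per stage, with queues carrying results from A to B to C) can be sketched roughly as follows. This is an illustrative sketch, not a definitive implementation: the names `start_stage` and `run_staged`, the default pool sizes, and the sentinel-based shutdown are all assumptions, and each pool is hand-rolled from plain threads reading a queue.

```python
import queue
import threading

_STOP = object()  # sentinel telling a worker to exit (an assumed convention)


def start_stage(work, inbox, outbox, workers):
    """Start `workers` threads that apply `work` to items taken from `inbox`
    and put the results on `outbox`."""
    def loop():
        while True:
            item = inbox.get()
            if item is _STOP:
                return
            job_id, payload = item
            outbox.put((job_id, work(payload)))

    threads = [threading.Thread(target=loop) for _ in range(workers)]
    for t in threads:
        t.start()
    return threads


def run_staged(payloads, task_a, task_b, task_c, sizes=(4, 2, 4)):
    """Run every payload through stages A, B, C; `sizes` are per-stage pool sizes."""
    payloads = list(payloads)
    q_a, q_b, q_c, q_out = (queue.Queue() for _ in range(4))
    stages = [
        (start_stage(task_a, q_a, q_b, sizes[0]), q_a),
        (start_stage(task_b, q_b, q_c, sizes[1]), q_b),
        (start_stage(task_c, q_c, q_out, sizes[2]), q_c),
    ]

    # Feed all jobs into stage A, tagged with an id so order can be restored.
    for job_id, payload in enumerate(payloads):
        q_a.put((job_id, payload))

    # Drain the stages in order: a stage is stopped only after the previous
    # stage has finished, so no item can arrive behind the sentinels.
    for threads, inbox in stages:
        for _ in threads:
            inbox.put(_STOP)
        for t in threads:
            t.join()

    results = {}
    while not q_out.empty():
        job_id, value = q_out.get()
        results[job_id] = value
    return [results[i] for i in range(len(payloads))]
```

For example, `run_staged([1, 2, 3], lambda x: x + 1, lambda x: x * 2, lambda x: x - 1)` pushes three jobs through the pipeline. Since each stage has its own pool, the number of threads touching each resource is set explicitly by `sizes` rather than left to the scheduler.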