Let's talk a little bit about what happens when a new container becomes available. On start up, the container process notifies the catalog server that an additional container is available for use:
Once the catalog server knows about the new container, it assigns several partitions to it using a waterfall algorithm. As more containers become available, partitions are migrated from the containers that contain the most partitions, to the containers that contain the fewest partitions. In the previous figure, each of the three existing containers holds three partitions.
When the fourth container starts, the catalog server assigns two shards to it (as seen above). This is done to keep the distribution as even as possible. In this situation, we end up with container0
holding one more shard than the other containers. As it contains an extra shard, it must do 50 percent more work than the other containers.
In order to reduce the disparity in container workload, we must make...