We're going to rearchitect the example from Chapter 6, Deploy Real Applications on Swarm: we'll deploy Spark on Swarm again, but this time with a realistic networking and storage setup.
Spark's storage backend usually runs on Hadoop (HDFS), or on NFS when a plain filesystem is used. For jobs that don't require persistent storage, Spark creates local data on the workers, but for storage-backed computations you need a shared filesystem available on every node, something that Docker volume plugins cannot yet guarantee automatically.
One way to achieve this on Swarm is to create NFS shares on each Docker host and then mount them transparently inside the service containers.
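As a sketch of this approach, Docker's built-in `local` volume driver can mount an NFS export directly into a service's containers. The server address (`10.0.0.10`) and export path (`/export/spark`) below are placeholder assumptions; substitute the values for your own NFS server:

```shell
# Create a Swarm service whose containers mount an NFS export at /data.
# The "local" volume driver passes the nfs type and mount options through
# to the kernel, so every task, on whichever node it lands, sees the share.
docker service create \
  --name spark-worker \
  --mount 'type=volume,source=sparkdata,destination=/data,volume-driver=local,volume-opt=type=nfs,volume-opt=device=:/export/spark,"volume-opt=o=addr=10.0.0.10,rw"' \
  <spark-image>
```

Because the volume is created lazily on each node where a task is scheduled, every Docker host must be able to reach the NFS server and have the NFS client utilities installed.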
Our focus here is not to illustrate the details of Spark jobs and their storage organization, but to introduce an opinionated storage option for Docker and to give an idea of how to organize and scale a fairly complex service on Docker Swarm.