The following sections look in detail at scaling methods on different platforms.
Before the arrival of advanced systems such as Hadoop or Spark, developers had to solve the scaling problem themselves. What methods did they use?
The most basic way to scale a computation is multi-threading or multi-processing. The two approaches are similar: both split the data into chunks and process the chunks in parallel on a single machine, using multiple threads in one case and multiple processes in the other. The key difference is that threads of the same process share one memory space, while each process runs in its own separate memory space.
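The memory difference between threads and processes can be observed directly. The following is a minimal sketch, not taken from the original text: the names counter, increment, run_in_thread and run_in_process are hypothetical and chosen only for illustration. A thread changes the parent's data, while a child process changes only its own copy.

```python
import threading
from multiprocessing import Process

# Module-level state (hypothetical name, for illustration only).
counter = {"value": 0}


def increment():
    # Adds one to whichever copy of `counter` the caller can see.
    counter["value"] += 1


def run_in_thread():
    # A thread runs inside the same process and shares its memory,
    # so the change is visible here after join().
    counter["value"] = 0
    worker = threading.Thread(target=increment)
    worker.start()
    worker.join()
    return counter["value"]


def run_in_process():
    # A child process works on its own copy of `counter`,
    # so the parent's value is left unchanged.
    counter["value"] = 0
    worker = Process(target=increment)
    worker.start()
    worker.join()
    return counter["value"]


if __name__ == "__main__":
    print("value after thread: ", run_in_thread())
    print("value after process:", run_in_process())
```

Because processes do not share memory, results have to be sent back to the parent explicitly, for example through return values collected by a pool or through a queue.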
Parallel computation on a single machine has one fundamental limitation: it is bounded by that machine's resources, such as the number of CPU cores and the available memory.
Let's see a hands-on example of parallel computation. For this we will use the multiprocessing module from Python's standard library.
from multiprocessing import Pool

def f(x):
    return...
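The listing above is cut off, so its function body is not known. As a stand-in, here is a minimal self-contained sketch of the same pattern. It assumes a simple squaring function; the names square and parallel_square and the default of four workers are hypothetical and are not taken from the original example.

```python
from multiprocessing import Pool


def square(x):
    # Placeholder for the per-item work (assumed for illustration).
    return x * x


def parallel_square(numbers, workers=4):
    # Pool.map splits the input into chunks, sends them to the worker
    # processes, and returns the results in the original input order.
    with Pool(processes=workers) as pool:
        return pool.map(square, numbers)


if __name__ == "__main__":
    # The guard keeps child processes from re-running this block on
    # platforms that start workers by importing the main module.
    print(parallel_square(range(10)))
```

The worker function is defined at module level so that it can be pickled and sent to the worker processes. Adding workers beyond the number of available CPU cores usually gives no further speedup for CPU-bound work.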