Fast Data Processing Systems with SMACK Stack
The soul of Spark is the resilient distributed dataset (RDD). Spark has four design goals: keep data in memory (Hadoop does not), distribute it across a cluster, be fault tolerant, and be fast and efficient.
Fault tolerance is achieved, in part, by applying linear operations on small data chunks. Efficiency is achieved by parallelization of operations throughout all parts of the cluster. Performance is achieved by minimizing data replication between cluster members.
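A fundamental concept in Spark is that there are only two types of operations we can do on an RDD: transformations, which lazily describe a new RDD derived from an existing one, and actions, which trigger the computation and return a result.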
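To make the split between transformations and actions concrete, here is a minimal single-machine sketch in plain Python. It is not Spark's API: `ToyRDD`, `from_list`, and the `calls` log are hypothetical names introduced only for illustration, and the sketch ignores partitioning and distribution entirely. It assumes the usual reading of the two operation types, where transformations only record a recipe and actions run it.

```python
from typing import Callable, Iterable, Iterator, List


class ToyRDD:
    """Hypothetical single-machine stand-in for an RDD (not Spark's API)."""

    def __init__(self, source: Callable[[], Iterator]):
        # Store a recipe for producing the data, not the data itself.
        self._source = source

    @classmethod
    def from_list(cls, items: Iterable) -> "ToyRDD":
        data = list(items)
        return cls(lambda: iter(data))

    # Transformations: return a new ToyRDD and compute nothing yet.
    def map(self, f: Callable) -> "ToyRDD":
        return ToyRDD(lambda: (f(x) for x in self._source()))

    def filter(self, pred: Callable) -> "ToyRDD":
        return ToyRDD(lambda: (x for x in self._source() if pred(x)))

    # Actions: run the whole recorded chain and return a plain value.
    def collect(self) -> List:
        return list(self._source())

    def count(self) -> int:
        return sum(1 for _ in self._source())


calls = []  # records every element that square() actually processes


def square(x: int) -> int:
    calls.append(x)
    return x * x


numbers = ToyRDD.from_list(range(1, 6))

# Two chained transformations: nothing has been computed at this point.
evens_squared = numbers.map(square).filter(lambda x: x % 2 == 0)
calls_before_action = len(calls)

# An action: the chain is evaluated now.
result = evens_squared.collect()
```

In this sketch `calls` stays empty until `collect()` runs, and a later `evens_squared.count()` re-runs the whole chain from the original list. That second point loosely mirrors how an uncached RDD can be rebuilt from the operations that produced it, although the real mechanism works per partition across a cluster.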
People are right when they say that computer science is mathematics in a costume. As we've already seen, in functional programming, functions are first-class citizens; the equivalent...