NoSQL Data Models

By: Olivier Pivert

Overview of this book

Most current applications must now handle Big Data environments, and this book addresses the latest issues and hurdles encountered in such environments. The book begins by presenting an overview of NoSQL languages and systems. Then, you’ll evaluate SPARQL queries over large RDF datasets and devise a solution that uses the MapReduce framework to process SPARQL graph patterns. Next, you’ll handle the production of web data, generate a set of links between two different datasets, and overcome different heterogeneity problems. Moving ahead, you’ll take the multi-graph-based approach to overcome challenges faced by the RDF data management community. Finally, you’ll deal with the flexible querying of graph databases and textual data management. By the end of this book, you’ll have gathered essential information on the big data challenges faced by NoSQL databases.

2.5. SPARQL on Apache Spark

2.5.1. Apache Spark

Apache Spark [ZAH 10] is a cluster computing engine that can be understood as a main-memory extension of the MapReduce model. It enables parallel computations on unreliable machines and provides automatic locality-aware scheduling, fault tolerance and load balancing. While both Spark and Hadoop are based on a data flow computation model, Spark is more efficient than Hadoop for applications that frequently reuse working data sets across multiple parallel operations. This efficiency is mainly due to two complementary distributed main-memory data abstractions, as shown in Figure 2.6: (i) Resilient Distributed Datasets (RDD) [ZAH 12], a distributed, lineage-supported, fault-tolerant data abstraction for in-memory computations (whereas Hadoop is mainly disk-based), and (ii) DataFrames (DF), a compressed and schema-enabled data abstraction. Both data abstractions ease the programming task by natively supporting a subset of relational operators...
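
The idea of a lineage-supported, in-memory data set that is reused across several operations can be illustrated with a small toy model. The sketch below is plain Python and does not use Spark or reproduce its API: the class `ToyRDD`, its methods `cache` and `evict`, and the counter `computations` are hypothetical names introduced only for this illustration. It assumes that each derived data set records its parent and the transformation that produced it, so that partitions lost from memory can be recomputed rather than restored from a replica.

```python
from typing import Any, Callable, List, Optional


class ToyRDD:
    """Toy stand-in for an RDD: partitions plus the lineage needed to rebuild them.

    Hypothetical illustration only; this is not Spark's API.
    """

    def __init__(
        self,
        source: Optional[List[List[Any]]] = None,
        parent: Optional["ToyRDD"] = None,
        fn: Optional[Callable[[List[Any]], List[Any]]] = None,
    ) -> None:
        self._source = source    # base partitions, only set for a root data set
        self._parent = parent    # lineage: the data set this one derives from
        self._fn = fn            # lineage: per-partition transformation
        self._keep = False       # whether computed partitions stay in memory
        self._cached: Optional[List[List[Any]]] = None
        self.computations = 0    # number of times the partitions were computed

    def map(self, f: Callable[[Any], Any]) -> "ToyRDD":
        # Lazy: only the lineage is recorded, nothing is computed yet.
        return ToyRDD(parent=self, fn=lambda part: [f(x) for x in part])

    def filter(self, pred: Callable[[Any], bool]) -> "ToyRDD":
        return ToyRDD(parent=self, fn=lambda part: [x for x in part if pred(x)])

    def cache(self) -> "ToyRDD":
        # Ask for the partitions to be kept in memory once computed.
        self._keep = True
        return self

    def evict(self) -> None:
        # Simulate the loss of in-memory partitions (e.g. a failed machine).
        self._cached = None

    def _partitions(self) -> List[List[Any]]:
        if self._cached is not None:
            return self._cached              # reuse the in-memory copy
        self.computations += 1
        if self._parent is None:
            parts = [list(p) for p in self._source or []]
        else:
            # Rebuild from lineage: recompute the parent, then apply fn.
            parts = [self._fn(p) for p in self._parent._partitions()]
        if self._keep:
            self._cached = parts
        return parts

    def collect(self) -> List[Any]:
        return [x for part in self._partitions() for x in part]


# A working data set of two partitions, reused by several actions.
base = ToyRDD(source=[[1, 2, 3], [4, 5, 6]])
evens_squared = base.filter(lambda x: x % 2 == 0).map(lambda x: x * x).cache()

first = evens_squared.collect()            # computed once from lineage
second = evens_squared.collect()           # served from memory
after_reuse = evens_squared.computations   # still a single computation

evens_squared.evict()                      # lose the in-memory partitions
recovered = evens_squared.collect()        # recomputed from lineage
```

In this toy model, the second `collect` does not trigger a recomputation because the partitions are kept in memory, which is the kind of reuse the paragraph above credits for Spark's efficiency. After `evict`, the same result is obtained again by replaying the recorded transformations.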