NoSQL Data Models

By: Olivier Pivert

Overview of this book

Most current applications now have to handle Big Data environments, and this book addresses the latest issues and hurdles encountered in such environments. The book begins by presenting an overview of NoSQL languages and systems. Then, you’ll evaluate SPARQL queries over large RDF datasets and devise a solution that uses the MapReduce framework to process SPARQL graph patterns. Next, you’ll handle the production of web data, generate a set of links between two different datasets, and overcome various heterogeneity problems. Moving ahead, you’ll take a multigraph-based approach to overcome challenges faced by the RDF data management community. Finally, you’ll deal with the flexible querying of graph databases and textual data management. By the end of this book, you’ll have gathered essential information on the big data challenges faced by NoSQL databases.

2.1. Introduction

The Semantic Web is growing rapidly, generating large volumes of Resource Description Framework (RDF) data [W3C 14] stored in the Linked Open Data (LOD) cloud. With datasets ranging from hundreds of millions to billions of triples, RDF triple stores are expected to provide properties such as scalability, high availability, automatic work distribution and fault tolerance. This chapter is dedicated to the problem of evaluating SPARQL queries over large RDF datasets. Section 2.2 introduces the RDF data model and the SPARQL query language. Section 2.3 presents the challenges and solutions for efficiently processing SPARQL queries, in particular basic graph pattern (BGP) expressions. Section 2.4 introduces a specific solution that uses the MapReduce framework [DEA 04] to process SPARQL graph patterns. The chapter concludes with section 2.5, which describes the use of Apache Spark and explains the importance of the physical data layers for the query performance...