NoSQL Data Models

By: Olivier Pivert

Overview of this book

Big Data environments now have to be handled in most current applications, and this book addresses the latest issues and hurdles encountered in such environments. The book begins by presenting an overview of NoSQL languages and systems. Then, you’ll evaluate SPARQL queries over large RDF datasets and devise a solution that uses the MapReduce framework to process SPARQL graph patterns. Next, you’ll handle the production of web data, generate a set of links between two different datasets, and overcome different heterogeneity problems. Moving ahead, you’ll take the multi-graph based approach to overcome challenges faced by the RDF data management community. Finally, you’ll deal with the flexible querying of graph databases and textual data management. By the end of this book, you’ll have gathered essential information on the big data challenges faced by NoSQL databases.

7.7. Experiments

In this section, we study the behavior of TDV computation and of the filtering system in both centralized and NoSQL environments. We also show the impact of several parameters (i.e. novelty threshold, diversity and size of the sliding window) on a real dataset of items. Finally, through a user validation, we study the quality of our system under different settings and with a periodic filtering based on a top-k approach.

7.7.1. Implementation and description of datasets

For our experiments, we used a subset of a real dataset of items acquired over an 8-month campaign from March to October 2010 [TRA 14]. Subscriptions were generated using the ALIAS sampling method [WAL 77]. This produced 10M subscriptions that follow the distribution of term occurrences on the Web and the Web query size reported in [BEI 04], based on a vocabulary of 1.5M distinct terms extracted from the items. The subscription set is characterized, among other things, by a maximum size of 12 terms and an average size of 2.2 terms...
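
To make the subscription generation step more concrete, the following is a minimal sketch of alias sampling in Python, using Vose's construction of Walker's alias tables. It is not the authors' generator: the toy vocabulary, the term weights, the size weights, and the helper names (`build_alias_table`, `alias_draw`, `generate_subscription`) are all assumptions made for illustration. The idea is that, once the tables are built in linear time, each draw from a skewed discrete distribution costs constant time, which matters when producing millions of subscriptions.

```python
import random


def build_alias_table(weights):
    """Build the probability and alias tables for a list of non-negative weights."""
    n = len(weights)
    total = sum(weights)
    # Scale weights so that the average cell has mass exactly 1.
    scaled = [w * n / total for w in weights]
    prob = [0.0] * n
    alias = [0] * n
    small = [i for i, p in enumerate(scaled) if p < 1.0]
    large = [i for i, p in enumerate(scaled) if p >= 1.0]
    while small and large:
        s = small.pop()
        l = large.pop()
        prob[s] = scaled[s]   # cell s keeps its own mass...
        alias[s] = l          # ...and is topped up with mass taken from l
        scaled[l] = scaled[l] + scaled[s] - 1.0
        (small if scaled[l] < 1.0 else large).append(l)
    # Leftover cells have mass 1 up to rounding error.
    for i in small + large:
        prob[i] = 1.0
        alias[i] = i
    return prob, alias


def alias_draw(prob, alias, rng):
    """Draw one index in O(1): pick a cell uniformly, then the cell or its alias."""
    i = rng.randrange(len(prob))
    return i if rng.random() < prob[i] else alias[i]


def generate_subscription(terms, term_tables, sizes, size_tables, rng):
    """Draw a subscription size, then that many distinct terms."""
    size = sizes[alias_draw(*size_tables, rng)]
    chosen = set()
    while len(chosen) < size:  # requires size <= len(terms)
        chosen.add(terms[alias_draw(*term_tables, rng)])
    return chosen


# Hypothetical toy data: five terms with skewed frequencies, sizes 1 to 3.
terms = ["news", "sport", "music", "nosql", "rdf"]
term_tables = build_alias_table([50, 25, 15, 7, 3])
sizes = [1, 2, 3]
size_tables = build_alias_table([3, 5, 2])

rng = random.Random(42)
subscriptions = [
    generate_subscription(terms, term_tables, sizes, size_tables, rng)
    for _ in range(5)
]
print(subscriptions)
```

In a setting like the one described above, the term weights would come from the observed term occurrence distribution and the size weights from the reported query size distribution; the toy values here only illustrate the mechanics.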