NoSQL Data Models

By: Olivier Pivert

Overview of this book

Most current applications now have to handle Big Data environments, and this book addresses the latest issues and hurdles encountered in them. The book begins by presenting an overview of NoSQL languages and systems. Then, you’ll evaluate SPARQL queries over large RDF datasets and devise a solution that uses the MapReduce framework to process SPARQL graph patterns. Next, you’ll handle the production of web data, generate a set of links between two different datasets and overcome different heterogeneity problems. Moving ahead, you’ll take the multi-graph based approach to overcome challenges faced by the RDF data management community. Finally, you’ll deal with the flexible querying of graph databases and textual data management. By the end of this book, you’ll have gathered essential information on the big data challenges faced by NoSQL databases.
Table of Contents (11 chapters)
Preface
List of Authors
Index
End User License Agreement

3.4. Techniques applied to the data linking process

Identity link discovery (also called linkset discovery) requires a three-step process to identify equivalent resources across different datasets: prepare the data (preprocessing, step 1), align the resources (instance matching, step 2) and fix erroneous links generated between some of them (post-processing, step 3). First, the resources need to be represented in a uniform manner. This preprocessing proves necessary when the datasets use different vocabularies, when resource values are expressed in different languages, or when the number of resources and properties to be compared is too high. Second, to establish links, resources are compared on the basis of their values. This comparison can be done at different levels, ranging from the URIs of the resources to the description of their neighborhoods in the RDF graph. Finally, once equivalent resources are connected, some systems perform an additional step to evaluate the generated links and therefore...
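The three steps can be illustrated with a minimal Python sketch. It is not the method of any particular linking system: the function names, the example URIs, the 0.85 threshold, the use of `difflib` string similarity on labels, and the "keep the best link per source resource" filter are all assumptions chosen for illustration.

```python
import unicodedata
from difflib import SequenceMatcher


def normalize(value):
    """Step 1 (preprocessing): lowercase, strip accents, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", value)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_accents.lower().split())


def match(source, target, threshold=0.85):
    """Step 2 (instance matching): compare label values pairwise.

    `source` and `target` map a resource URI to one label value.
    The threshold is an arbitrary illustrative choice.
    """
    links = []
    for s_uri, s_label in source.items():
        for t_uri, t_label in target.items():
            score = SequenceMatcher(
                None, normalize(s_label), normalize(t_label)
            ).ratio()
            if score >= threshold:
                links.append((s_uri, t_uri, score))
    return links


def keep_best_per_source(links):
    """Step 3 (post-processing): keep only the highest-scoring link
    for each source resource (one assumed way to discard dubious links)."""
    best = {}
    for s_uri, t_uri, score in links:
        if s_uri not in best or score > best[s_uri][1]:
            best[s_uri] = (t_uri, score)
    return {s_uri: t_uri for s_uri, (t_uri, _) in best.items()}


# Hypothetical datasets: URI -> label.
source = {
    "http://a.example/paris": "Paris",
    "http://a.example/koeln": "Köln",
}
target = {
    "http://b.example/Paris": "paris ",
    "http://b.example/Koln": "Koln",
    "http://b.example/Parma": "Parma",
}

links = keep_best_per_source(match(source, target))
for s_uri, t_uri in links.items():
    # Each pair is a candidate identity link (for example, owl:sameAs).
    print(s_uri, "<->", t_uri)
```

This sketch compares only a single label per resource. A comparison at the level of the resource neighborhood would also take into account the values of related resources in the RDF graph, and the pairwise loop would need a blocking or indexing strategy when the number of resources is high.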