NoSQL Data Models

By: Olivier Pivert

Overview of this book

Big Data environments must now be handled in most current applications, and this book addresses the latest issues and hurdles encountered in such environments. The book begins by presenting an overview of NoSQL languages and systems. Then, you’ll evaluate SPARQL queries over large RDF datasets and devise a solution that uses the MapReduce framework to process SPARQL graph patterns. Next, you’ll handle the production of web data, generate a set of links between two different datasets, and overcome different heterogeneity problems. Moving ahead, you’ll take the multi-graph based approach to overcome challenges faced by the RDF data management community. Finally, you’ll deal with the flexible querying of graph databases and textual data management. By the end of this book, you’ll have gathered essential information on the big data challenges faced by NoSQL databases.
Table of Contents (11 chapters)
Preface
List of Authors
Index
End User License Agreement

1.5. Bibliography

[ABI 84] ABITEBOUL S., BIDOIT N., “Non first normal form relations to represent hierarchically organized data”, Proceedings of the 3rd ACM SIGACT-SIGMOD Symposium on Principles of Database Systems, PODS’84, New York, USA, pp. 191–200, 1984.

[ALH 09] AL HAJJ HASSAN M., Parallelism and load balancing in the treatment of the join on distributed architectures, PhD thesis, University of Orléans, December 2009.

[ALH 15] AL HAJJ HASSAN M., BAMHA M., “Towards scalability and data skew handling in group by-joins using map reduce model”, International Conference On Computational Science - ICCS 2015, 51, Procedia Computer Science, Reykjavik, Iceland, pp. 70–79, June 2015.

[APA 17a] APACHE SOFTWARE FOUNDATION, Apache Hadoop 2.8, 2017.

[APA 17b] APACHE SOFTWARE FOUNDATION, Apache Spark, 2017.

[BEN 13] BENZAKEN V., CASTAGNA G., NGUYỄN K. et al., “Static and dynamic semantics of NoSQL languages”, SIGPLAN Not., ACM, vol. 48, no. 1, pp. 101–114, January 2013.

[BEN 18] BENZAKEN V., CASTAGNA G., DAYNÈS L. et al., “Language-integrated queries: a BOLDR approach”, The Web Conference 2018, Lyon, France, April 2018.

[BES 17] BESSE P., GUILLOUET B., LOUBES J.-M., “Big data analytics. Three use cases with R, Python and Spark”, in MAUMY-BERTRAND M., SAPORTA G., THOMAS-AGNAN C. (eds), Apprentissage Statistique et Données Massives, Journées d’Etudes en Statistique, Technip, 2017.

[BEY 11] BEYER K.S., ERCEGOVAC V., GEMULLA R. et al., “Jaql: a scripting language for large scale semistructured data analysis”, PVLDB, vol. 4, no. 12, pp. 1272–1283, 2011.

[BLO 70] BLOOM B.H., “Space/time trade-offs in hash coding with allowable errors”, Communication ACM, ACM, vol. 13, no. 7, pp. 422–426, July 1970.

[BRE 00] BREWER E.A., “Towards robust distributed systems (abstract)”, Proceedings of the Nineteenth Annual ACM Symposium on Principles of Distributed Computing, ACM, New York, USA, p. 7, 2000.

[CHE 13a] CHENEY J., LINDLEY S., RADANNE G. et al., “Effective quotation”, CoRR, vol. abs/1310.4780, 2013.

[CHE 13b] CHENEY J., LINDLEY S., WADLER P., “A practical theory of language-integrated query”, SIGPLAN Not., ACM, vol. 48, no. 9, pp. 403–416, September 2013.

[CHE 14] CHENEY J., LINDLEY S., WADLER P., “Query shredding: Efficient relational evaluation of queries over nested multisets (extended version)”, CoRR, vol. abs/1404.7078, 2014.

[COD 70] CODD E.F., “A relational model of data for large shared data banks”, Communication ACM, ACM, vol. 13, no. 6, pp. 377–387, June 1970.

[COU 15] COUILLEC Y., SERRANO M., “Requesting heterogeneous data sources with array comprehensions in Hop.js”, Proceedings of the 15th Symposium on Database Programming Languages, ACM, Pittsburgh, United States, p. 4, October 2015.

[CUR 11] CURÉ O., HECHT R., LE DUC C. et al., “Data integration over NoSQL stores using access path based mappings”, DEXA 2011, 6860 Lecture Notes in Computer Science, Toulouse, France, pp. 481–495, August 2011.

[DEA 04] DEAN J., GHEMAWAT S., “MapReduce: simplified data processing on large clusters”, Proceedings of the 6th Conference on Symposium on Operating Systems Design & Implementation - Volume 6, USENIX Association, Berkeley, USA, p. 10, 2004.

[FAN 16] FANG Y., CHENG R., TANG W. et al., “Scalable algorithms for nearest-neighbor joins on big trajectory data”, IEEE Transactions on Knowledge and Data Engineering, Institute of Electrical and Electronics Engineers, vol. 28, no. 3, 2016.

[FIS 85] FISCHER P.C., SAXTON L.V., THOMAS S.J. et al., “Interactions between dependencies and nested relational structures”, Journal of Computer and System Sciences, vol. 31, no. 3, pp. 343–354, 1985.

[FRI 08] FRISCH A., CASTAGNA G., BENZAKEN V., “Semantic subtyping: dealing set-theoretically with function, union, intersection, and negation types”, Journal ACM, vol. 55, no. 4, pp. 19:1–19:64, September 2008.

[GIL 02] GILBERT S., LYNCH N., “Brewer’s conjecture and the feasibility of consistent, available, partition-tolerant web services”, SIGACT News, vol. 33, no. 2, pp. 51–59, June 2002.

[GÓM 16] GÓMEZ P., CASALLAS R., RONCANCIO C., “Data schema does matter, even in NoSQL systems!”, 2016 IEEE Tenth International Conference on Research Challenges in Information Science (RCIS), Grenoble, France, June 2016.

[GRA 16] GRAUX D., JACHIET L., GENEVÈS P. et al., “SPARQLGX: efficient distributed evaluation of SPARQL with apache spark”, The 15th International Semantic Web Conference, Kobe, Japan, October 2016.

[GRE 15] GREENFIELD P., Keynote speech at PyData 2015: How Python Found its way into Astronomy, New York, USA, 2015.

[HAA 97] HAAS L.M., KOSSMANN D., WIMMERS E.L. et al., “Optimizing queries across diverse data sources”, Proceedings of the 23rd International Conference on Very Large Data Bases, VLDB’97, San Francisco, USA, pp. 276–285, 1997.

[HUS 14] HUSSON A., “Une sémantique statique pour MongoDB”, Journées francophones des langages applicatifs 25, Fréjus, France, pp. 77–92, January 8–11 2014.

[KOL 16a] KOLEV B., PAU R., LEVCHENKO O. et al., “Benchmarking polystores: the CloudMdsQL experience”, in GADEPALLY V. (ed.), IEEE BigData 2016: Workshop on Methods to Manage Heterogeneous Big Data and Polystore Databases, IEEE Computing Society, Washington D.C., United States, December 2016.

[KOL 16b] KOLEV B., VALDURIEZ P., BONDIOMBOUY C. et al., “CloudMdsQL: querying heterogeneous cloud data stores with a common language”, Distributed and Parallel Databases, vol. 34, no. 4, pp. 463–503, December 2016.

[LAN 01] LANEY D., 3D Data Management: Controlling Data Volume, Velocity, and Variety, Report, META Group, February 2001.

[MIL 92] MILNER R., PARROW J., WALKER D., “A calculus of mobile processes, I”, Information and Computation, vol. 100, no. 1, pp. 1–40, 1992.

[PAR 92] PAREDAENS J., VAN GUCHT D., “Converting nested algebra expressions into flat algebra expressions”, ACM Transaction Database Systems, vol. 17, no. 1, pp. 65–93, March 1992.

[PHA 14] PHAN T.-C., Optimization for big joins and recursive query evaluation using intersection and difference filters in MapReduce, Thesis, Blaise Pascal University, July 2014.

[PHA 16] PHAN T.-C., D’ORAZIO L., RIGAUX P., “A theoretical and experimental comparison of filter-based equijoins in MapReduce”, Transactions on Large-Scale Data-and Knowledge-Centered Systems XXV, 9620 Lecture Notes in Computer Science, pp. 33–70, 2016.

[PIL 16] PILOURDAULT J., LEROY V., AMER-YAHIA S., “Distributed evaluation of top-k temporal joins”, Proceedings of the 2016 International Conference on Management of Data, SIGMOD’16, New York, USA, pp. 1027–1039, 2016.

[RAM 03] RAMAKRISHNAN R., GEHRKE J., Database Management Systems, McGraw-Hill, New York, 3rd ed., 2003.

[SER 16] SERRANO M., PRUNET V., “A Glimpse of Hopjs”, International Conference on Functional Programming (ICFP), ACM, Nara, Japan, p. 12, September 2016.

[VAN 17] VANDERPLAS J., Keynote speech at PyCon 2017, 2017.

[W3C 13] W3C, SPARQL 1.1 overview, 2013.

[W3C 14] W3C, RDF 1.1 Concepts and Abstract Syntax, 2014.

Chapter written by Kim NGUYỄN.