Hands-On Big Data Modeling

By: James Lee, Tao Wei, Suresh Kumar Mukhiya

Overview of this book

Modeling and managing data is a central focus of all big data projects. In fact, a database is considered effective only if it rests on a logical and sophisticated data model. This book will help you develop practical skills in modeling your own big data projects and improve the performance of analytical queries for your specific business requirements. To start with, you’ll get a quick introduction to big data and understand the different data modeling and data management platforms for big data. Then you’ll work with structured and semi-structured data with the help of real-life examples. Once you’ve got to grips with the basics, you’ll use the SQL Developer Data Modeler to create your own data models from different file types such as CSV, XML, and JSON. You’ll also learn to create graph data models and explore data modeling with streaming data using real-world datasets. By the end of this book, you’ll be able to design and develop efficient data models for data of varying sizes.
Table of Contents (17 chapters)

Importance and implications of big data modeling and management

Big data has clear economic and scientific significance. A widely held view in research is that the more data a study draws on, the more accurate its results tend to be. Data is generated every second, so the volume of available data is unlikely to shrink and will continue to grow. Much of this data explosion results from the growing number of devices at the periphery of the network, including embedded sensors, smartphones, and tablet computers. All of this data creates new opportunities for data analysts in human genomics, healthcare, oil and gas, search, surveillance, finance, and many other areas. In this section, we explore the benefits of big data management; in the next section, we look at the challenges of big data management in today's market.

Benefits of big data management

As mentioned, big data is a powerful tool. Thoughtful management of big data can produce substantial breakthroughs and lead to sounder business decisions. Its main benefits are as follows:

  • Accelerated revenue: When data is managed correctly and efficiently, it yields value, and that value helps accelerate revenue for businesses of any size, from small firms to large enterprises.
  • Improved customer service: Several studies show that enterprises that mine historical data for business intelligence have improved their customer service, because the mined models help the business overcome bottlenecks in its current system.
  • Improved marketing: Big data analysis provides deeper insight into the business from past and current data, and indicates how to run the business in the future. This offers a guided path toward delivering critical and innovative marketing solutions.
  • Increased efficiency: Identifying new sources of data has become somewhat easier with the introduction of high-speed tools such as Hadoop. These tools help businesses analyze large volumes of data quickly and accelerate decision making.
  • Cost savings: Cloud-based services are attracting attention and have been used successfully in many enterprise data management projects. Tools such as Hadoop can be run on cloud infrastructure, which makes them easier to operate. These systems help reduce costs by providing simpler interfaces on which to store, analyze, and visualize big data.
  • Improved accuracy of analytics: Sound data management practices improve the accuracy and reliability of big data analytics. Data management services provide a better and cheaper way to turn data into business intelligence, which increases the accuracy and precision of analytics.

Challenges in big data management

With data volumes exploding across organizations, businesses are keen to explore solutions that offer opportunities and insights to increase profits. However, big data remains difficult to manage and maintain. Some of the major challenges in the big data management process are as follows:

  • Expanding data stores: The enormous volume of data involved, and the fact that it keeps growing over time, makes data management very complex and challenging. Any operation performed on such a dataset must be handled with care, because it can degrade the quality and performance of the analysis. Moving a database into an analytical solution can also be very complex because data stores and data silos expand continuously.
  • Data and structural complexity: Enterprises typically have both structured and unstructured data, and that data resides in a very wide range of formats, including JSON, CSV, document files, text files, and BLOB data. An enterprise generally runs several thousand applications on its systems, and each of these applications might read from and write to several distinct databases. As a result, simply cataloging what types of data an organization holds in its storage systems is often extraordinarily difficult.
  • Assuring data quality: It is essential for enterprises to ensure that their data is reliable and accurate. As mentioned, the lack of synchronization across data silos and data warehouses can make it hard for managers to know which parts of the data are accurate and complete. Human error adds to the problem: if a user enters the wrong data, the generated output is also incorrect. This is referred to as garbage in, garbage out (GIGO).
  • Low staffing: It is difficult to find qualified staff with sound knowledge of the problem domain. A shortage of data scientists, database administrators (DBAs), data analysts, data modelers, and other big data professionals makes the job of data management very challenging.
  • Lack of executive support: Senior managers often do not appreciate the importance and value of good data management. It can be very difficult to convince them and to present roadmaps showing how these management techniques would benefit the organization. In other words, many executives are satisfied with the solutions they already have in place for the problem domain.
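
The garbage in, garbage out problem described above is often addressed by checking records before they reach the analysis stage. The following Python snippet is a minimal sketch of that idea, not a method prescribed by this book. The field names `customer_id` and `age`, the accepted age range, and the helper names `find_problem` and `split_records` are all hypothetical and chosen only for illustration.

```python
from typing import Any, Optional

Record = dict[str, Any]


def find_problem(record: Record) -> Optional[str]:
    """Return a reason the record is unusable, or None if it passes.

    The rules below are illustrative assumptions, not rules from the book.
    """
    if not record.get("customer_id"):
        return "missing customer_id"
    age = record.get("age")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(age, int) or isinstance(age, bool) or not 0 <= age <= 130:
        return "age missing or out of range"
    return None


def split_records(
    records: list[Record],
) -> tuple[list[Record], list[tuple[Record, str]]]:
    """Separate records that pass the checks from those that fail them."""
    clean: list[Record] = []
    rejected: list[tuple[Record, str]] = []
    for record in records:
        reason = find_problem(record)
        if reason is None:
            clean.append(record)
        else:
            # Keep the reason so the bad entry can be traced and corrected.
            rejected.append((record, reason))
    return clean, rejected


# Hypothetical sample data: one good record and two data-entry mistakes.
sample = [
    {"customer_id": "C001", "age": 34},
    {"customer_id": "", "age": 29},
    {"customer_id": "C003", "age": 340},
]

clean, rejected = split_records(sample)
for record, reason in rejected:
    print(f"rejected {record}: {reason}")
```

Keeping the rejected records together with a reason, rather than silently dropping them, lets a data steward correct the entry at its source instead of allowing it to skew later analysis.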