Modern Big Data Processing with Hadoop

By: V Naresh Kumar, Manoj R Patil, Prashant Shindgikar

Overview of this book

The complex structure of data these days requires sophisticated solutions for data transformation to make the information more accessible to users. This book empowers you to build such solutions with relative ease with the help of Apache Hadoop, along with a host of other Big Data tools. It gives you a complete understanding of data lifecycle management with Hadoop, followed by modeling of structured and unstructured data in Hadoop. It also shows you how to design real-time streaming pipelines by leveraging tools such as Apache Spark, and how to build efficient enterprise search solutions using Elasticsearch. You will learn to build enterprise-grade analytics solutions on Hadoop, and to visualize your data using tools such as Apache Superset. This book also covers techniques for deploying your Big Data solutions on the cloud with Apache Ambari, as well as expert techniques for managing and administering your Hadoop cluster. By the end of this book, you will have all the knowledge you need to build expert Big Data systems.

Data as a Service

Data as a Service (DaaS) is a concept that has become popular in recent times due to the increasing adoption of the cloud. When it comes to data, it might seem a little confusing at first: how can data fit into a service model?

DaaS offers great flexibility to users of the service: they need not worry about the scale, performance, and maintenance of the underlying infrastructure that the service runs on. The infrastructure takes care of this automatically, and because we are dealing with a cloud model, we also get all the benefits of the cloud, such as pay as you go, capacity planning, and so on. This reduces the burden of data management.

Looking at this carefully, we are offloading only the data management part. Data governance must still be well defined, or else we will lose all the benefits of the service model.

So far, we have been talking about the service-in-the-cloud concept. Does this mean that we cannot use it within the Enterprise or even in smaller organizations? The answer is no: DaaS is a generic concept that applies equally well inside an organization.

When we are talking about a service model, we should keep in mind the following things, or else chaos will ensue:

  • Authentication
  • Authorization
  • Auditing

Together, these three controls ensure that only well-defined users, IP addresses, and services can access the data exposed as a service.
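As a minimal sketch of how these three controls can sit in front of a data service, the following Python example checks a caller's token (authentication), checks whether that caller may read a given dataset (authorization), and records every attempt (auditing). The token values, the `PERMISSIONS` table, and the `DataServiceGate` class are hypothetical names invented for illustration. A real deployment would more likely delegate these steps to an identity provider and a central audit store.

```python
import hmac
from dataclasses import dataclass, field

# Hypothetical credentials and permissions, for illustration only.
API_TOKENS = {"token-hr-app": "hr-app", "token-wiki": "wiki"}
PERMISSIONS = {"hr-app": {"employees"}, "wiki": {"documents"}}


@dataclass
class DataServiceGate:
    """Applies authentication, authorization, and auditing to each request."""

    audit_log: list = field(default_factory=list)

    def authenticate(self, token):
        """Return the caller identity for a known token, else None."""
        for known, identity in API_TOKENS.items():
            # compare_digest performs a constant-time comparison.
            if hmac.compare_digest(known.encode(), token.encode()):
                return identity
        return None

    def authorize(self, identity, dataset):
        """Return True if the identity may read the dataset."""
        return dataset in PERMISSIONS.get(identity, set())

    def request(self, token, dataset):
        """Decide whether to serve the dataset and record the attempt."""
        identity = self.authenticate(token)
        allowed = identity is not None and self.authorize(identity, dataset)
        # Audit every attempt, whether it succeeds or not.
        self.audit_log.append((identity or "unknown", dataset, allowed))
        return allowed
```

Calling `DataServiceGate().request("token-hr-app", "employees")` returns `True` under these assumptions, while an unknown token or a dataset outside the caller's permissions returns `False`. In both cases an entry is appended to `audit_log`.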

Let's take an example of an organization that has the following data:

  • Employees
  • Servers and data centers
  • Applications
  • Intranet documentation sites

As you can see, these are all independent datasets. But when we look at the organization as a whole and want it to succeed, there is a lot of overlap between them. We should embrace the DaaS model here so that the applications that are authoritative for each dataset still manage that data, while other applications consume it as a simple service through a REST API. This increases collaboration and fosters innovation within the organization.
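To make the idea of exposing a dataset through a REST API concrete, here is a minimal read-only sketch using only the Python standard library. The in-memory `EMPLOYEES` dictionary stands in for whatever store the authoritative team actually uses. The URL layout (`/employees` and `/employees/<id>`), the sample records, and the port are assumptions made for this example, not something prescribed by the DaaS model.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlsplit

# Hypothetical in-memory stand-in for the authoritative employee store.
EMPLOYEES = {
    "1001": {"name": "Asha", "team": "infrastructure"},
    "1002": {"name": "Ravi", "team": "documentation"},
}


def resolve(path):
    """Map a request path to an (HTTP status, JSON-serializable body) pair."""
    parts = [p for p in urlsplit(path).path.split("/") if p]
    if parts == ["employees"]:
        return 200, sorted(EMPLOYEES)
    if len(parts) == 2 and parts[0] == "employees" and parts[1] in EMPLOYEES:
        return 200, EMPLOYEES[parts[1]]
    return 404, {"error": "not found"}


class EmployeeHandler(BaseHTTPRequestHandler):
    """Serves the employee dataset as JSON over HTTP GET."""

    def do_GET(self):
        status, body = resolve(self.path)
        payload = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(host="127.0.0.1", port=8080):
    """Run the service until interrupted; the port is an arbitrary choice."""
    HTTPServer((host, port), EmployeeHandler).serve_forever()
```

After calling `serve()`, a consumer can fetch `http://127.0.0.1:8080/employees/1001` with any HTTP client and never needs to know how or where the records are stored. The sketch omits the access controls discussed earlier, so it is suitable only for local experimentation.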

Let's take further examples of how this is possible:

  • The team that manages all the employee data in the form of a database can provide a simple Data Service. All other applications can use this dataset without worrying about the underlying infrastructure on which this employee data is stored:
    • This frees the consumers of the data service, who:
      • Need not worry about the underlying infrastructure
      • Need not worry about the protocols that are used to communicate with these data servers
      • Can just focus on the REST model to design the application
    • A typical example of this would be:
      • Storing the employee data in a directory service such as an LDAP server or Microsoft Active Directory
  • The team that manages the infrastructure for the entire organization can design their own system to keep track of the entire hardware inventory of the organization, and can provide a simple data service. The rest of the organization can use this to build applications that are of interest to them:
    • This will make the Enterprise more agile
    • It ensures there is a single source of truth for the data about the entire hardware of the organization
    • It improves trust in the data and increases confidence in the applications that are built on top of this data
  • Every team in the organization might use different technology to build and deploy their applications on to the servers. Following this, they also need to build a data store that keeps track of the active versions of software that are deployed on the servers. Having a data source like this helps the organization in the following ways:
    • Services that are built using this data can constantly monitor and see where the software deployments are happening more often
    • The services can also figure out which applications are vulnerable and are actively deployed in production so that further action can be taken to fix the loopholes, either by upgrading the OS or the software
    • Teams can understand the challenges in the overall software deployment life cycle
    • It provides a single platform for the entire organization to do things in a standard way, which promotes a sense of ownership
  • Documentation is very important for an organization. Instead of running their own infrastructure, with the DaaS model, organizations and teams can focus on the documents that are related to their company and pay only for those. Here, services such as Google Docs and Microsoft Office Online are very popular, as they give us the flexibility to pay as we go and, most importantly, not worry about the technology required to build these.
    • Having such a service model for data will help us do the following:
      • Pay only for the service that is used
      • Increase or decrease the scale of storage as needed
      • Access the data from anywhere if the service is on the Cloud and connected to the internet
      • Access corporate resources when connected via VPN as decided by the Enterprise policy

In the preceding examples, we have seen a variety of applications that are used in Enterprises and how a service model for data can help Enterprises in a variety of ways to bring collaboration, innovation, and trust.

But when it comes to big data, what can DaaS do?

Just like all other kinds of data, big data can also fit into a DaaS model, which provides the same flexibility as we saw previously:

  • No worry about the underlying hardware and technology
  • Scale the infrastructure as needed
  • Pay only for the data that is owned by the Enterprise
  • Operational and maintenance challenges are taken away
  • Data can be made geographically available for high availability
  • Integrated backup and recovery for DR requirements

With these advantages, enterprises can be more agile and build applications that leverage this data as a service.