Architecting Data-Intensive Applications

By: Anuj Kumar

Overview of this book

Are you an architect or a developer who looks at your own applications gingerly while browsing through Facebook and applauding it silently for its data-intensive, yet fluent and efficient, behaviour? This book is your gateway to building smart data-intensive systems by incorporating the core data-intensive architectural principles, patterns, and techniques directly into your application architecture.

This book starts by taking you through the primary design challenges involved in architecting data-intensive applications. You will learn how to implement data curation and data dissemination, depending on the volume of your data. You will then implement your application architecture one step at a time. You will get to grips with implementing the correct message delivery protocols and creating a data layer that doesn’t fail under high traffic. This book will show you how to divide your application into layers, each of which adheres to the single responsibility principle. By the end of this book, you will have learned to streamline your thoughts and make the right choice of technologies and architectural principles based on the problem at hand.

What is a data ecosystem?


An ecosystem is defined as a complex set of relationships between interconnected elements and their environments. For example, the social construct around our daily lives is an ecosystem. We depend on the state to provide us with basic necessities, including food, water, and gas. We rely on our local stores for our daily needs, and so on. Our livelihood is directly or indirectly dependent upon the social construct of our society. The inter-dependency, as well as the inter-connectivity of these social elements, is what defines a society.

 

Along the same lines, a data ecosystem can be defined as a complex set of possibly interconnected data and the environment from which that data originates. Data from social websites, such as Twitter, Facebook, and Instagram; data from connected devices, such as sensors; data from the (Industrial) Internet of Things and SCADA systems; data from your phone; and data from your home router all constitute a data ecosystem to some extent. As we will see in the following sections, this huge variety of data, when connected, can provide insights into previously undiscovered business opportunities.

A complex set of interconnected data

What this phrase implies is that data can be a collection of structured, semi-structured, or unstructured data (hence, a complex set). Additionally, data collected from different sources may be related in some form or other. To put this in perspective, let's look at a very simple use case in which data from different sources can be connected. Imagine you have an online shopping website and you would like to recommend to your visitors the things that they are most likely to buy. For the recommendation to succeed, you may need a lot of relevant information about each person. You may want to know what a person likes and dislikes, what they have been searching for in the last few days, what they have been tweeting about, and what topics they are discussing in public forums. All of these are different sources of data. Even though, at first glance, the data from the individual sources may appear unconnected, all of it pertains to one individual and their likes and dislikes. Establishing such connections across different data sources is key for an organization when it comes to quickly turning an idea into a business opportunity.
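As a minimal sketch of how such a connection might be established, the following Python snippet groups records from three sources under the person they pertain to. Everything in it is an assumption made for illustration: the sample records, the shared `user_id` key, and the `build_profiles` helper are hypothetical, and in practice matching identities across sources is usually much harder than a simple key lookup.

```python
from collections import defaultdict

# Hypothetical sample data: three independent sources that happen to
# share a common identifier ("user_id") for the same person.
searches = [
    {"user_id": "u1", "query": "trail running shoes"},
    {"user_id": "u2", "query": "espresso machine"},
    {"user_id": "u1", "query": "hydration vest"},
]
tweets = [
    {"user_id": "u1", "text": "Signed up for my first marathon!"},
]
forum_posts = [
    {"user_id": "u2", "topic": "best coffee grinders"},
    {"user_id": "u1", "topic": "marathon training plans"},
]


def build_profiles(**sources):
    """Group records from every named source under the user they pertain to."""
    profiles = defaultdict(lambda: defaultdict(list))
    for source_name, records in sources.items():
        for record in records:
            # Keep everything except the key used to connect the sources.
            payload = {k: v for k, v in record.items() if k != "user_id"}
            profiles[record["user_id"]][source_name].append(payload)
    # Convert to plain dicts so that later lookups do not create empty entries.
    return {uid: dict(by_source) for uid, by_source in profiles.items()}


profiles = build_profiles(searches=searches, tweets=tweets, forum_posts=forum_posts)

for source_name, records in profiles["u1"].items():
    print(source_name, records)
```

Once the records are grouped this way, the searches, tweets, and forum posts for user `u1` can be read together, which is the kind of connected view a recommendation feature would start from.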

Data environment

The environment in which the data originates is as important as the data itself. The environment provides us with contextual information to attach to the data, which may further help us in making the correct decision. Having contextual information helps us to understand the relevance as well as the reliability of the data source, which ultimately feeds into the decision-making process. The environment also tells us about the data lineage (to be discussed in detail in Chapter 12, When Data Dissemination Is as Important as Data Itself), which helps us to understand whether the data has been modified during its journey and, if it has, how that affects our use case.
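One way to picture context and lineage travelling with the data is sketched below in Python. This is only an illustration under stated assumptions: the `DataRecord` class, its fields, and the sensor reading are all hypothetical and do not come from the book, and real lineage tracking typically records far more than a list of step names.

```python
from dataclasses import dataclass, field


@dataclass
class DataRecord:
    """Hypothetical record that carries its environment along with its value."""
    value: float
    source: str                                   # where the data originated
    context: dict                                 # contextual information from the environment
    lineage: list = field(default_factory=list)   # names of transformations applied so far

    def transform(self, step_name, func, **context_updates):
        """Return a new record with func applied and the step added to the lineage."""
        return DataRecord(
            value=func(self.value),
            source=self.source,
            context={**self.context, **context_updates},
            lineage=self.lineage + [step_name],
        )

    @property
    def was_modified(self):
        """True if at least one transformation has touched the data."""
        return bool(self.lineage)


# Assumed example: a temperature reading from a made-up sensor.
reading = DataRecord(
    value=71.6,
    source="sensor-42",
    context={"unit": "fahrenheit", "location": "warehouse-3"},
)

celsius = reading.transform(
    "fahrenheit_to_celsius",
    lambda f: round((f - 32) * 5 / 9, 1),
    unit="celsius",
)

print(reading.was_modified, reading.lineage)
print(celsius.was_modified, celsius.lineage, celsius.context["unit"])
```

Because each transformation returns a new record rather than changing the original, a consumer can inspect `lineage` to see whether, and how, a value was modified before deciding how much to rely on it.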

Each organization has its own set of data sources that constitute its specific data ecosystem. Remember that one organization's data sources may not be the same as another organization's.

The data evangelist within the organization should always focus on identifying which sources of data are more relevant than others for the set of use cases that the organization is trying to address.

This feeds into our next topic: what constitutes a data ecosystem?