What is a data ecosystem?
An ecosystem is defined as a complex set of relationships between interconnected elements and their environment. For example, the social construct around our daily lives is an ecosystem. We depend on the state to provide basic necessities, such as food, water, and gas, and we rely on our local stores for our daily needs. Our livelihood depends, directly or indirectly, on the social construct of our society. The interdependency and interconnectivity of these social elements are what define a society.
Along the same lines, a data ecosystem can be defined as a complex set of possibly interconnected data and the environment from which that data originates. Data from social websites, such as Twitter, Facebook, and Instagram; data from connected devices, such as sensors; data from the (Industrial) Internet of Things; data from SCADA systems; data from your phone; and data from your home router all constitute, to some extent, a data ecosystem. As we will see in the following sections, this huge variety of data, once connected, can provide insights that reveal previously undiscovered business opportunities.
A complex set of interconnected data
The heading of this section implies that data can be a collection of structured, semi-structured, or unstructured data (hence, a complex set). Additionally, data collected from different sources may be related in some form or other. To put this in perspective, let's look at a very simple use case in which data from different sources can be connected. Imagine you run an online shopping website and would like to recommend to your visitors the items they are most likely to buy. For the recommendations to succeed, you may need a lot of relevant information about each person: what they like and dislike, what they have searched for in the last few days, what they have been tweeting about, and what topics they are discussing in public forums. Each of these is a different source of data. At first glance, the data from the individual sources may not appear to be connected, but in reality it all pertains to one individual and their likes and dislikes. Establishing such connections across different data sources is key for an organization that wants to quickly turn an idea into a business opportunity.
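To make the idea of connecting seemingly unrelated sources a little more concrete, here is a minimal sketch in Python. It assumes, for illustration only, that every record already carries a shared `user_id`; in practice, linking a social media account to a shop account usually requires a separate identity-resolution step. The source names, field names, and the `build_profiles` helper are all hypothetical.

```python
from collections import defaultdict

# Hypothetical records from three sources that look unrelated at first glance.
# All field names and values are illustrative assumptions, not a real schema.
search_history = [
    {"user_id": "u1", "query": "trail running shoes"},
    {"user_id": "u2", "query": "espresso machine"},
]
social_posts = [
    {"user_id": "u1", "text": "Training for my first half marathon!"},
]
purchases = [
    {"user_id": "u1", "item": "running socks"},
    {"user_id": "u2", "item": "coffee grinder"},
]


def build_profiles(**sources):
    """Group records from every source under the individual they pertain to.

    Assumes each record has a shared 'user_id' key (a simplification).
    Returns {user_id: {source_name: [record_without_user_id, ...]}}.
    """
    profiles = defaultdict(lambda: defaultdict(list))
    for source_name, records in sources.items():
        for record in records:
            # Keep every field except the join key, filed under its source.
            payload = {k: v for k, v in record.items() if k != "user_id"}
            profiles[record["user_id"]][source_name].append(payload)
    # Convert the nested defaultdicts into plain dicts for easier inspection.
    return {uid: dict(by_source) for uid, by_source in profiles.items()}


profiles = build_profiles(search=search_history, social=social_posts, purchases=purchases)
print(profiles["u1"])
```

Once the records are grouped per person, the searches, posts, and purchases for `u1` all point toward running, which is the kind of connected signal a recommendation step could use. A user such as `u2`, with no social posts, simply ends up with fewer sources in their profile.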
Data environment
The environment in which the data originates is as important as the data itself. The environment provides contextual information to attach to the data, which can further help us make the correct decision. Contextual information helps us understand both the relevance and the reliability of the data source, which ultimately feeds into the decision-making process. The environment also tells us about the data lineage (discussed in detail in Chapter 12, When Data Dissemination Is as Important as Data Itself), which helps us understand whether the data was modified during its journey and, if it was, how that affects our use case.
Each organization has its own set of data sources that constitute its specific data ecosystem. Remember that one organization's data sources may not be the same as another's.
The data evangelist within the organization should always focus on identifying which data sources are most relevant to the set of use cases the organization is trying to address.
This leads into our next topic: what constitutes a data ecosystem?