The 3 V's
The 3 V's stand for:
- Volume
- Variety
- Velocity
Volume
Today's world produces petabytes of data from a variety of sources: social media, sensors, blockchain, video, audio, and even transactional systems. How much data is collected depends on the nature of the business, but if you are reading this book, you most likely have huge volumes of data that you need to handle effectively.
Variety
Variety refers to the different formats that data comes in. Relational databases, Excel files, and even simple text files are all examples of different data formats. A system should be capable of handling new varieties of data as and when they arrive, so extensibility is key for a data-intensive system. Data variety can be broadly classified into three major blocks:
- Structured: Data that has a well-defined schema associated with it, for example, relational data and XML-formatted data.
- Semi-structured: Data whose structure can be anticipated but that does not always conform to a set standard. Examples include JSON-formatted data and columnar data.
- Unstructured: Binary large object (BLOB) data, for example, video and audio.
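To make the extensibility point concrete, here is a minimal Python sketch of a parser registry with one entry per block. The names `PARSERS`, `register`, and `ingest` are hypothetical, and an in-memory CSV payload stands in for relational rows. This is an illustration under those assumptions, not a prescribed design.

```python
import csv
import io
import json
from typing import Any, Callable, Dict

# Hypothetical registry: format name -> parser function.
PARSERS: Dict[str, Callable[[bytes], Any]] = {}


def register(fmt: str):
    """Decorator that adds a parser for a new data variety."""
    def wrap(func):
        PARSERS[fmt] = func
        return func
    return wrap


@register("csv")
def parse_csv(raw: bytes):
    # Structured: every row follows the schema given by the header line.
    return list(csv.DictReader(io.StringIO(raw.decode("utf-8"))))


@register("json")
def parse_json(raw: bytes):
    # Semi-structured: records may differ in which keys they carry.
    return json.loads(raw.decode("utf-8"))


@register("blob")
def parse_blob(raw: bytes):
    # Unstructured: keep the bytes opaque and record only their size.
    return {"size_bytes": len(raw)}


def ingest(fmt: str, raw: bytes):
    """Dispatch to the registered parser, failing loudly on unknown formats."""
    try:
        parser = PARSERS[fmt]
    except KeyError:
        raise ValueError(f"no parser registered for format {fmt!r}")
    return parser(raw)
```

With this shape, supporting a new variety means registering one more function; `ingest` itself does not change.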
Velocity
Velocity denotes the speed at which data arrives and becomes stale. There was a time when even month-old data was considered fresh. In today's world, where social media has taken the place of traditional information sources and sensors have replaced human log books, we can't even rely on yesterday's data, as it may already be stale. Data moves in near real time and, if it is not processed properly and in time, may represent a lost opportunity for the business.
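As a rough illustration of staleness, the following Python sketch drops events that are older than a freshness budget. The five-minute `MAX_AGE` and the event layout (a dictionary with a `time` key) are assumptions made for the example; the right budget depends entirely on the business.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical freshness budget; the right value depends on the business.
MAX_AGE = timedelta(minutes=5)


def is_stale(event_time: datetime, now: datetime, max_age: timedelta = MAX_AGE) -> bool:
    """Return True when the event is older than the freshness budget."""
    return now - event_time > max_age


def fresh_only(events, now, max_age=MAX_AGE):
    """Keep only events that are still within the freshness budget."""
    return [e for e in events if not is_stale(e["time"], now, max_age)]
```

Passing `now` in explicitly, for example `datetime.now(timezone.utc)`, keeps the check easy to test against fixed timestamps.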
So far, we have only discussed the data ecosystem: what it consists of, what sharing requirements come with it, and the types of data you can expect to collect. None of this matters unless we connect the data ecosystem and data collection to the value that the data drives for an organization.
Broadly speaking, any data that an organization decides to collect or use has one of two motivations behind it. Either the organization wants to use the data to improve its own systems and processes, or it wants to position itself strategically so that it can generate new opportunities.
Better decision-making, whether quicker or more proactive, contributes directly to a company's revenue.
Improvements in internal capabilities, either via automation or improved business process management, save time and money, thereby giving organizations more opportunities to innovate and, in turn, reducing costs further and opening up new business opportunities.
As you may have already noticed, this is a circle of dependencies and, once an organization finds a balance within this circle, the only way is up.