Chapter 6. Creating a Data Pipeline for Consistent Data Collection, Processing, and Dissemination
In a data-intensive application, data travels in two directions and in two different forms. One form is the data returned to end users as part of a request. Gathering this data is usually synchronous and, in a distributed system, typically involves a variety of data sources. Imagine we are building a context service that tells us everything about a given IP address that tries to access our secured network. The use case is to block every IP address that we know to be associated with malicious users. To keep the example simple, we would do the following:
- Resolve the IP address to the list of domains associated with it
- Check the IP address against the list of IP addresses that are marked as malicious
- Check the domain(s) against the set of domains that are marked as malicious
- Return the answer as good or...
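
The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the context service itself: the names `Verdict`, `check_ip`, and `resolve_domains` are hypothetical, the malicious lists are plain in-memory sets, and the resolver is a stub backed by a dictionary rather than a real DNS lookup. The addresses and domains are placeholders taken from ranges reserved for documentation.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Set


@dataclass
class Verdict:
    """Result of checking one IP address (hypothetical shape, for illustration)."""
    ip: str
    malicious: bool
    reasons: List[str] = field(default_factory=list)


def check_ip(
    ip: str,
    resolve_domains: Callable[[str], Iterable[str]],
    bad_ips: Set[str],
    bad_domains: Set[str],
) -> Verdict:
    reasons: List[str] = []

    # Step 1: resolve the IP address to its associated domains.
    domains = list(resolve_domains(ip))

    # Step 2: check the IP address against the malicious IP list.
    if ip in bad_ips:
        reasons.append(f"IP {ip} is marked as malicious")

    # Step 3: check each resolved domain against the malicious domain list.
    for domain in domains:
        if domain in bad_domains:
            reasons.append(f"domain {domain} is marked as malicious")

    # Step 4: return the answer; any match makes the address malicious.
    return Verdict(ip=ip, malicious=bool(reasons), reasons=reasons)


# Assumed sample data: a stub resolver and two small blocklists.
DOMAINS_BY_IP = {
    "203.0.113.7": ["shop.example", "mail.example"],
    "198.51.100.4": ["phish.example"],
}
BAD_IPS = {"203.0.113.7"}
BAD_DOMAINS = {"phish.example"}


def stub_resolver(ip: str) -> List[str]:
    # Unknown addresses resolve to no domains.
    return DOMAINS_BY_IP.get(ip, [])


if __name__ == "__main__":
    for address in ("203.0.113.7", "198.51.100.4", "192.0.2.1"):
        print(check_ip(address, stub_resolver, BAD_IPS, BAD_DOMAINS))
```

Passing the resolver in as a function keeps the check itself free of network calls, which makes it easy to test. In a real service the resolver might wrap a reverse DNS lookup, and each of the three lookups would likely go to a separate data source.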