In previous chapters, we introduced different databases for different business use cases, as well as Spark, the next-generation compute engine for big data analytics. With these tools, we now have all the necessary building blocks for composing an AI data pipeline.
A typical data pipeline (not limited to AI) looks like the following:
- Collect user feedback from a user application.
- Store all user feedback and data in a data storage system.
- Extract raw user data from the data storage system.
- Preprocess raw data into a predefined format so that data science/AI applications can process it.
- Cook the processed data into a higher-level view so that business stakeholders, such as product managers, can digest it and make data-informed decisions.
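The stages above can be sketched end to end in plain Python. This is a minimal illustration only: the function names, the in-memory list standing in for a storage system, and the feedback schema are all hypothetical, not a real production API.

```python
# Sketch of the five pipeline stages. All names and the record
# schema are hypothetical illustrations.

def collect_feedback():
    # Stage 1: collect user feedback from the user application (stubbed).
    return [
        {"user_id": 1, "rating": "5", "comment": " Great show! "},
        {"user_id": 2, "rating": "2", "comment": "Too slow"},
    ]

def store(records, storage):
    # Stage 2: persist raw feedback in a data storage system
    # (a plain list stands in for the real store here).
    storage.extend(records)

def extract(storage):
    # Stage 3: pull the raw records back out for processing.
    return list(storage)

def preprocess(records):
    # Stage 4: normalize raw data into a predefined schema
    # that downstream data-science/AI code can consume.
    return [
        {
            "user_id": r["user_id"],
            "rating": int(r["rating"]),
            "comment": r["comment"].strip().lower(),
        }
        for r in records
    ]

def cook(records):
    # Stage 5: aggregate into a higher-level view for business users.
    n = len(records)
    avg = sum(r["rating"] for r in records) / n if n else 0.0
    return {"feedback_count": n, "average_rating": avg}

storage = []
store(collect_feedback(), storage)
report = cook(preprocess(extract(storage)))
print(report)  # {'feedback_count': 2, 'average_rating': 3.5}
```

In a real deployment, each stage would typically be a separate job (for example, a Spark job for preprocessing and aggregation) reading from and writing to durable storage rather than passing Python objects in memory.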
Let's imagine you are working in a data-driven company such as Netflix. Data scientists are building data...