While flat files and databases are the most common types of sources that you use with PDI, there are many other types of data sources available. People have started to leverage the capabilities of tools such as Hadoop, NoSQL databases, and cloud services. In this section, you will learn to connect to, read data from, and load data into some of these big data sources with PDI.
Simple Storage Service (S3), part of Amazon Web Services (AWS), is a scalable storage space and a common location for files to be processed. If you have files in S3 and want to read them, you don't have to download them first. PDI allows you to read those files directly from S3.
The step that you will use for doing this is the S3 CSV Input step. This step works very similarly to the CSV Input step. The biggest difference is that, to access the file, you have to provide the bucket name where the file is located and the dual keys to access the bucket...
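To make those inputs concrete, the following is a minimal Python sketch of the same idea outside PDI: a bucket name, a file name within the bucket, and a pair of keys are enough to read a CSV file from S3 without downloading it to disk first. It is not how the S3 CSV Input step is implemented. It assumes the third-party `boto3` library is installed, and the function names `make_s3_client` and `read_csv_from_s3`, as well as the bucket, file, and key values in the comments, are hypothetical.

```python
import csv
import io


def make_s3_client(access_key, secret_key):
    """Build an S3 client from the two keys (assumes boto3 is installed)."""
    import boto3  # imported lazily so the parsing logic can be used without it

    return boto3.client(
        "s3",
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_key,
    )


def read_csv_from_s3(client, bucket, key, delimiter=","):
    """Read a CSV object from a bucket and return its rows as dictionaries.

    The first line of the file is treated as the header row.
    """
    response = client.get_object(Bucket=bucket, Key=key)
    text = response["Body"].read().decode("utf-8")
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


# Hypothetical usage; replace the placeholders with your own values:
# client = make_s3_client("YOUR_ACCESS_KEY", "YOUR_SECRET_KEY")
# rows = read_csv_from_s3(client, "your-bucket", "path/to/file.csv")
```

The client is passed in as a parameter so that the reading logic stays separate from the credentials, much as the PDI step asks for the keys and bucket separately from the CSV settings.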