Pentaho Data Integration Quick Start Guide

By: María Carina Roldán
Overview of this book

Pentaho Data Integration (PDI) is an intuitive graphical environment that combines drag-and-drop design with powerful Extract-Transform-Load (ETL) capabilities. Given its power and flexibility, first attempts to use PDI can be difficult or confusing. This book is designed to reduce that learning curve. It provides the guidance you need to become productive, covering the main features of Pentaho Data Integration. It demonstrates the interactive features of the graphical designer and takes you through the main ETL capabilities the tool offers. By the end of the book, you will be able to use PDI to extract, transform, and load the types of data you encounter on a daily basis.
Table of Contents (15 chapters)


In this chapter, you learned how to get data from different sources, converting it to PDI datasets.

First, you learned how to get data from plain files stored on your local system. You also had the opportunity to configure PDI to access compressed files and files stored in Google Drive.

Having worked with files, you started to interact with relational databases. You learned how to configure a connection to a database, how to explore its content, and how to get data from it.

Finally, you were introduced to sources other than plain files and databases, including XML and JSON sources and sources of system-related information.

Having explored the different options for getting external information, you learned how to combine two or more datasets into a single one. You will use this technique not only for extracting and combining external sources but also in many other situations in your daily PDI work.

Now that you have the data, you are ready to transform it. You will learn how to do so in the next chapter...