Pentaho Data Integration Quick Start Guide

By: María Carina Roldán

Overview of this book

Pentaho Data Integration (PDI) is an intuitive graphical environment that combines drag-and-drop design with powerful Extract-Transform-Load (ETL) capabilities. Given its power and flexibility, first attempts to use the tool can be difficult or confusing. This book shortens the PDI learning curve: it provides the guidance you need to become productive, covers the main features of the tool, demonstrates the interactive features of the graphical designer, and takes you through the main ETL capabilities that PDI offers. By the end of the book, you will be able to use PDI to extract, transform, and load the types of data you encounter on a daily basis.
Table of Contents (15 chapters)

Chapter 3. Extracting Data

Extracting data means getting and combining data from different sources before transforming it. PDI can connect to a long list of data sources, including all kinds of databases, both commercial and open source. It can also read a wide variety of files, both structured and unstructured, including CSV files, properties files, fixed-width text files, and proprietary formats. In particular, this chapter explains how to get data from plain files and relational databases.

The following topics will be covered in this chapter:

  • Getting data from plain files
  • Getting data from relational databases
  • Getting data from other sources
  • Combining different sources into a single dataset
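
PDI performs these tasks graphically, so no code is required to follow the chapter. Purely as an illustration of the idea behind the topics above, the following minimal Python sketch reads rows from a plain CSV source and from a relational table, then combines them into a single dataset. It is not PDI code and does not reflect how PDI works internally; the sample data, the `customers` table, and the helper names (`rows_from_csv`, `rows_from_db`, `combine`) are all hypothetical and chosen only for this example.

```python
import csv
import io
import sqlite3


def rows_from_csv(text):
    """Extract rows from CSV text, converting the id column to an integer."""
    reader = csv.DictReader(io.StringIO(text))
    return [{"id": int(row["id"]), "name": row["name"]} for row in reader]


def rows_from_db(conn, query):
    """Extract rows from a relational database as a list of dictionaries."""
    conn.row_factory = sqlite3.Row  # lets each row be converted to a dict
    return [dict(row) for row in conn.execute(query)]


def combine(*sources):
    """Append the rows of every source into one dataset, in order."""
    dataset = []
    for rows in sources:
        dataset.extend(rows)
    return dataset


# Hypothetical plain-file source (kept in memory so the sketch is self-contained).
CSV_TEXT = "id,name\n1,Alice\n2,Bob\n"

# Hypothetical relational source: an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
conn.execute("INSERT INTO customers VALUES (3, 'Carol')")

file_rows = rows_from_csv(CSV_TEXT)
db_rows = rows_from_db(conn, "SELECT id, name FROM customers")
dataset = combine(file_rows, db_rows)

for row in dataset:
    print(row)
```

Both sources are normalized to the same field names and types before they are combined, which is the same concern that arises when merging streams from different inputs in any ETL tool.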