Pentaho Data Integration Quick Start Guide

By: María Carina Roldán

Overview of this book

Pentaho Data Integration (PDI) is an intuitive, graphical environment that combines drag-and-drop design with powerful Extract-Transform-Load (ETL) capabilities. Given its power and flexibility, initial attempts to use the tool can be difficult or confusing. This book reduces your learning curve with PDI: it provides the guidance you need to become productive, covering the main features of Pentaho Data Integration. It demonstrates the interactive features of the graphical designer and takes you through the main ETL capabilities that the tool offers. By the end of the book, you will be able to use PDI for extracting, transforming, and loading the types of data you encounter on a daily basis.

Getting data from relational databases


Relational databases are among the most common sources of data in any ETL process. PDI lets you connect to and get data from many RDBMS engines, including the most popular ones (for example, Oracle or MySQL) as well as less common engines. The only prerequisite is that a proper JDBC driver exists for the engine. In this section, you will learn how to connect to, explore, and get data from a database.

Connecting to a database and using the database explorer

To use a database's data inside PDI, you must do two things:

  • Install the proper JDBC driver
  • Create a connection to the database
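
As a rough illustration of these two prerequisites outside of PDI, the following Python sketch checks a folder for a driver .jar file and assembles a JDBC URL in the form used by the PostgreSQL JDBC driver. The function names, the `lib` folder argument, and the `localhost` and `sportsdb` values are hypothetical placeholders chosen for this example; PDI's own connection dialog collects the equivalent settings for you, so this is only a sketch of what those settings amount to.

```python
from pathlib import Path


def find_driver_jars(lib_dir, name_hint="postgresql"):
    """Return the sorted names of .jar files in lib_dir whose name contains name_hint.

    lib_dir is whatever folder you copied the driver into (a placeholder here).
    """
    folder = Path(lib_dir)
    if not folder.is_dir():
        return []
    return sorted(
        p.name for p in folder.glob("*.jar") if name_hint in p.name.lower()
    )


def postgresql_jdbc_url(host, database, port=5432):
    """Build a JDBC URL of the form jdbc:postgresql://host:port/database."""
    return f"jdbc:postgresql://{host}:{port}/{database}"


if __name__ == "__main__":
    # "lib", "localhost", and "sportsdb" are illustrative values only.
    print(find_driver_jars("lib"))
    print(postgresql_jdbc_url("localhost", "sportsdb"))
```

An empty list from `find_driver_jars` suggests the driver has not been copied into the folder you checked. The URL helper defaults to port 5432, which is the usual PostgreSQL port.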

Note

For demonstration purposes, we will connect to a PostgreSQL engine where we have installed a sports database, available for download at http://www.sportsdb.org/sd/samples.

Make sure that you have a JDBC driver (a .jar file) for the engine. Once you have it, you will have to copy it into the lib folder in...