
Pentaho Data Integration Quick Start Guide

By: María Carina Roldán

Overview of this book

Pentaho Data Integration (PDI) is an intuitive, graphical environment packed with drag-and-drop design and powerful Extract-Transform-Load (ETL) capabilities. Given its power and flexibility, initial attempts to use Pentaho Data Integration can be difficult or confusing. This book is designed to reduce your learning curve with PDI: it provides the guidance needed to make you productive, covering the main features of Pentaho Data Integration. It demonstrates the interactive features of the graphical designer and takes you through the main ETL capabilities that the tool offers. By the end of the book, you will be able to use PDI to extract, transform, and load the types of data you encounter on a daily basis.
Table of Contents (15 chapters)

Creating a simple transformation


Transformations and jobs are the main PDI artifacts. Transformations are data-flow oriented entities, while jobs are task-oriented. In this book, we will start by learning all about transformations, focusing on jobs later. To get a quick idea of what, exactly, a transformation is, we will start by creating a simple one. This will also allow you to see what it's like to work with Spoon.
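Because a transformation is data-flow oriented, it can help to picture it as a pipeline in which each step hands rows to the next one along a hop. The following Python sketch is only a mental model, not how PDI is implemented: the two functions stand in for the Get System Info and Write to log steps used in the transformation built in this section, and the field name `pdi_version` and the version string `8.1.0` are hypothetical placeholders.

```python
from typing import Dict, Iterable, Iterator, List

# A row is modeled as a dict of field name -> value (a simplification:
# real PDI rows also carry typed metadata, which this sketch ignores).
Row = Dict[str, object]


def get_system_info(version: str) -> Iterator[Row]:
    """Stand-in for the Get System Info step: emits one row with one field.

    The field name 'pdi_version' is hypothetical; in Spoon you pick your own.
    """
    yield {"pdi_version": version}


def write_to_log(rows: Iterable[Row], log: List[str]) -> Iterator[Row]:
    """Stand-in for the Write to log step: records each row, then passes it on."""
    for row in rows:
        log.append(", ".join(f"{name} = {value}" for name, value in row.items()))
        yield row


log: List[str] = []
# The "hop" is simply the connection: the output of one step feeds the next.
for _ in write_to_log(get_system_info("8.1.0"), log):
    pass
print(log)  # ['pdi_version = 8.1.0']
```

Python generators run lazily in a single thread; in PDI, by contrast, the steps of a transformation run concurrently, each passing rows downstream as they become available.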

Our first transformation will find out the current version of PDI (Kettle) and print the value to the log. Proceed as follows:

  • On the Welcome page, click on the New transformation link, located under the WORK link group. Alternatively, press Ctrl + N.
  • A new tab will appear, with the title Transformation 1. It's in this tab that you will create your work.
  • On the left of the screen, under the Design tab, you'll see a tree of folders. Expand the Input folder by double-clicking on it.

Note

On macOS, a single click is enough.

  • Then, left-click on the Get System Info icon and, without releasing the button, drag it to the work area (the blank area that occupies most of the screen). You should see something like this:

Dragging and dropping a step

  • Double-click on the Get System Info icon. A configuration window will show up. Fill in the first row of the grid, as shown in the following screenshot. Note that you don't have to type Kettle version by hand; you can choose it from a list of available options:

Configuring the Get System Info step

  • In the Design tab, double-click on the Utility folder, click on the Write to log icon, and drag and drop it to the work area.
  • Put the mouse cursor over the Get System Info icon and wait until a tiny toolbar shows up, as shown in the following screenshot:

Mouseover assistance toolbar

  • Click on the output connector (the icon highlighted in the preceding image) and drag it towards the Write to log icon. A greyed hop is displayed.
  • When the mouse cursor is over the Write to log step, release the button. A link (called a hop from now on) is created from the first step to the second one. The screen should look as follows:

Connecting steps with a hop

Let's add a colored note to our work, as follows:

  • Right-click anywhere in the work area to bring up a contextual menu.
  • In the menu, select the New Note... option. A note editor will appear.
  • Type a description, such as My first transformation. Select the Font style tab and choose a nice font and some colors for your note, and then click on OK. The following should be the final result:

My first transformation

  • Save the transformation by pressing Ctrl + S. PDI will ask for a destination folder. Select the folder of your choice and give the transformation a name. PDI will save the transformation as a file with a .ktr extension (for example, sample_transformation.ktr).
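Because a .ktr file is plain XML, you can inspect what Spoon saved. The sketch below uses Python's standard `xml.etree.ElementTree` module to list the steps and hops of a transformation. The `SAMPLE_KTR` string is a hypothetical, heavily trimmed fragment, and the element names it assumes (`step` with `name` and `type` children, hops under `order/hop`) should be checked against a file saved by your own PDI version.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Hypothetical, trimmed .ktr fragment. A real file saved by Spoon contains
# many more elements (step settings, GUI coordinates, notes, and so on).
SAMPLE_KTR = """<transformation>
  <info><name>sample_transformation</name></info>
  <order>
    <hop><from>Get System Info</from><to>Write to log</to><enabled>Y</enabled></hop>
  </order>
  <step><name>Get System Info</name><type>SystemInfo</type></step>
  <step><name>Write to log</name><type>WriteToLog</type></step>
</transformation>"""


def summarize_ktr(xml_text: str) -> Tuple[List[Tuple[str, str]], List[Tuple[str, str]]]:
    """Return (steps, hops): steps as (name, type), hops as (from, to)."""
    root = ET.fromstring(xml_text)
    # findtext(..., "") returns "" instead of None when an element is missing.
    steps = [(s.findtext("name", ""), s.findtext("type", "")) for s in root.findall("step")]
    hops = [(h.findtext("from", ""), h.findtext("to", "")) for h in root.findall("order/hop")]
    return steps, hops


steps, hops = summarize_ktr(SAMPLE_KTR)
print(steps)  # [('Get System Info', 'SystemInfo'), ('Write to log', 'WriteToLog')]
print(hops)   # [('Get System Info', 'Write to log')]
```

For a file on disk, replacing `ET.fromstring(xml_text)` with `ET.parse(path).getroot()` lets the parser read the file directly, including its XML declaration.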

Finally, let's run the transformation to see what happens:

  • Click on the Run icon, located in the transformation toolbar:

Run icon in the transformation toolbar

  • A window named Run Options will appear. Click on Run.
  • At the bottom of the screen, you should see a log with the results of the execution:

Execution Results
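Spoon is not the only way to execute a saved transformation: PDI also ships Pan, a command-line tool for running .ktr files (pan.sh on Linux and macOS, Pan.bat on Windows, both in the data-integration folder). The following Python sketch builds such a command line and runs it with `subprocess`. The install directory `data-integration` is a hypothetical placeholder, and the sketch relies only on Pan's `-file` and `-level` options.

```python
import subprocess
from pathlib import PurePosixPath, PureWindowsPath
from typing import List


def build_pan_command(ktr_path: str, pdi_home: str = "data-integration",
                      level: str = "Basic", windows: bool = False) -> List[str]:
    """Build a Pan command line that runs a saved .ktr file outside Spoon.

    pdi_home is a hypothetical install directory; point it at your own.
    """
    base = PureWindowsPath(pdi_home) if windows else PurePosixPath(pdi_home)
    script = base / ("Pan.bat" if windows else "pan.sh")
    return [str(script), f"-file={ktr_path}", f"-level={level}"]


def run_transformation(ktr_path: str, pdi_home: str = "data-integration",
                       level: str = "Basic", windows: bool = False) -> int:
    """Run the transformation with Pan and return its exit code (0 means success)."""
    command = build_pan_command(ktr_path, pdi_home, level, windows)
    return subprocess.run(command).returncode


if __name__ == "__main__":
    print(build_pan_command("sample_transformation.ktr"))
    # ['data-integration/pan.sh', '-file=sample_transformation.ktr', '-level=Basic']
```

Passing the command as a list rather than a single string avoids shell quoting problems when the .ktr path contains spaces.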