Learning Pentaho Data Integration 8 CE - Third Edition

Overview of this book

Pentaho Data Integration (PDI) is an intuitive, graphical environment packed with drag-and-drop design and powerful Extract-Transform-Load (ETL) capabilities. This book shows and explains the new interactive features of Spoon, its revamped look and feel, and the newest features of the tool, including the Transformation and Job Executors and the invaluable Metadata Injection capability. We begin with the installation of the PDI software and then move on to cover all the key PDI concepts. Each chapter introduces new features, letting you gradually get hands-on practice with the tool. First, you will learn to do all kinds of data manipulation and to work with simple plain files. Then, the book teaches you how to work with relational databases inside PDI. Moreover, you will be given a primer on data warehouse concepts and you will learn how to load data into a data warehouse. Throughout the book, you will become familiar with the tool's intuitive, graphical, drag-and-drop design environment. By the end of this book, you will know everything you need to meet your data manipulation requirements. Besides, you will be given best practices and advice for designing and deploying your projects.
Table of Contents (23 chapters)

Introducing transformations


So far, you've just opened Spoon and customized its look and feel. It's time to do some interesting tasks beyond looking around. As mentioned before, in PDI we basically work with two kinds of artifacts: transformations and jobs. In this section, we will introduce transformations. First, we will introduce some basic definitions. Then, we will design, preview, and run our first Transformation.

The basics about transformations

A Transformation is an entity made of steps linked by hops. These steps and hops build paths through which data flows: the data enters or is created in a step, the step applies some kind of transformation to it, and finally, the data leaves that step. Therefore, a Transformation is said to be data flow oriented. Graphically, steps are represented by small boxes, while hops are represented by directional arrows, as depicted in the following sample:

Steps and hops

A Transformation itself is neither a program nor an executable file. It is just plain XML. The Transformation contains metadata, which tells the Kettle engine what to do.
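To make the "just plain XML" point concrete, here is a minimal sketch that reads a transformation definition with the JDK's built-in DOM parser and lists its steps and hops. The XML in `SAMPLE` is a heavily abridged, hand-written illustration, not a file saved by Spoon: we assume a `<transformation>` root, `<step>` elements with a `<name>` child, and `<hop>` elements with `<from>`/`<to>` children, and the `<type>` values are our guess at the plugin IDs. The class name `KtrOutline` is hypothetical; compare against a `.ktr` file saved from your own Spoon before relying on this layout.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class KtrOutline {
    // Heavily abridged, hand-written sample (assumed layout); a real file
    // saved by Spoon contains many more elements.
    static final String SAMPLE =
        "<transformation>"
        + "<info><name>hello world</name><description>My first transformation</description></info>"
        + "<order><hop><from>Data Grid</from><to>User Defined Java Expression</to>"
        + "<enabled>Y</enabled></hop></order>"
        + "<step><name>Data Grid</name><type>DataGrid</type></step>"
        + "<step><name>User Defined Java Expression</name><type>Janino</type></step>"
        + "</transformation>";

    // Parses XML text into a DOM tree, wrapping the parser's checked exceptions.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Not well-formed XML", e);
        }
    }

    // Returns the text of the first direct child element named 'tag', or null.
    static String childText(Element parent, String tag) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE && n.getNodeName().equals(tag)) {
                return n.getTextContent();
            }
        }
        return null;
    }

    // Names of all <step> elements, in document order.
    static List<String> stepNames(Document doc) {
        List<String> names = new ArrayList<>();
        NodeList steps = doc.getElementsByTagName("step");
        for (int i = 0; i < steps.getLength(); i++) {
            names.add(childText((Element) steps.item(i), "name"));
        }
        return names;
    }

    // Each <hop> rendered as "origin -> destination".
    static List<String> hops(Document doc) {
        List<String> result = new ArrayList<>();
        NodeList hopNodes = doc.getElementsByTagName("hop");
        for (int i = 0; i < hopNodes.getLength(); i++) {
            Element hop = (Element) hopNodes.item(i);
            result.add(childText(hop, "from") + " -> " + childText(hop, "to"));
        }
        return result;
    }

    public static void main(String[] args) {
        Document doc = parse(SAMPLE);
        System.out.println("Steps: " + stepNames(doc));
        System.out.println("Hops:  " + hops(doc));
    }
}
```

Nothing in the file is executable by itself; it only describes steps and hops, which is why the Kettle engine is needed to interpret it.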

A step is the minimal unit inside a Transformation. A large set of steps is available, either out of the box or from the Marketplace, as explained before. These steps are grouped into categories such as input, output, or transform. Each step is conceived to accomplish a specific function, ranging from a simple task such as reading a parameter to normalizing a dataset.

A hop is a graphical representation of data flowing between two steps: an origin and a destination. The data that flows through that hop constitutes the output data of the origin step and the input data of the destination step.
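The idea that a hop hands the origin step's output to the destination step as input can be sketched as a toy model in plain Java. This is an illustration under simplifying assumptions, not how the Kettle engine is implemented: here each step processes a whole list of rows at once and the chain is linear, whereas in PDI steps run concurrently and stream rows through the hops. The class, step, and field names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

public class FlowModel {
    // A row is an ordered map of field name -> value.
    // A hop connects an origin step to a destination step.
    static final class Hop {
        final String from;
        final String to;
        Hop(String from, String to) { this.from = from; this.to = to; }
    }

    // Follows the hops of a simple linear chain: the output of each origin
    // step becomes the input of its destination step. Assumes no cycles and
    // at most one outgoing hop per step (a real transformation can branch).
    static List<Map<String, Object>> runChain(
            String start,
            List<Map<String, Object>> rows,
            Map<String, UnaryOperator<List<Map<String, Object>>>> steps,
            List<Hop> hops) {
        String current = start;
        List<Map<String, Object>> data = steps.get(current).apply(rows);
        boolean moved = true;
        while (moved) {
            moved = false;
            for (Hop hop : hops) {
                if (hop.from.equals(current)) {
                    current = hop.to;
                    data = steps.get(current).apply(data);
                    moved = true;
                    break;
                }
            }
        }
        return data;
    }

    // Example step: copies each incoming row and adds a "length" field.
    static List<Map<String, Object>> addLength(List<Map<String, Object>> rows) {
        List<Map<String, Object>> out = new ArrayList<>();
        for (Map<String, Object> row : rows) {
            Map<String, Object> copy = new LinkedHashMap<>(row);
            copy.put("length", row.get("word").toString().length());
            out.add(copy);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, UnaryOperator<List<Map<String, Object>>>> steps = new LinkedHashMap<>();
        steps.put("Source", rows -> rows); // passes rows through unchanged
        steps.put("Add length", FlowModel::addLength);
        List<Hop> hops = List.of(new Hop("Source", "Add length"));
        List<Map<String, Object>> input = List.of(Map.of("word", "kettle"));
        System.out.println(runChain("Source", input, steps, hops));
    }
}
```

Running `main` prints `[{word=kettle, length=6}]`: the row leaves `Source` unchanged and `Add length` appends a field, just as a destination step works on whatever its incoming hop delivers.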

That's enough theory for now. Let's see it in practice.

Creating a Hello World! Transformation

In this section, we will design, preview, and run a simple Hello World! Transformation; simple, but good enough for our first practical example.

Designing a Transformation

Here are the steps to start working on our very first Transformation. All you need to start is a PDI installation:

  1. Open Spoon. From the main menu, navigate to File | New | Transformation.
  2. On the left of the screen, under the Design tab, you'll see a tree of Steps. Expand the Input branch by double-clicking on it.

Note

If you work on Mac OS, a single click is enough.

  3. Left-click on the Data Grid icon and, without releasing the button, drag it onto the main canvas. The screen will look like the following screenshot:

Dragging and dropping a step

Note

The dotted grid appeared as a consequence of the changes we made in the options window. Also, note that we changed the preferred language back to English.

  4. Double-click on the Data Grid step you just put on the canvas, and fill in the Meta tab as follows:

Configuring a metadata tab

  5. Now select the Data tab and fill in the grid with some names, as in the following screenshot. Then click on OK to close the window:

Filling a Data tab

  6. From the Steps tree, double-click on the Scripting branch, click on the User Defined Java Expression icon, and drag and drop it onto the main canvas.
  7. Put the mouse cursor over the Data Grid step and wait until a tiny toolbar shows up next to the Data Grid icon, as shown next:

Mouseover assistance toolbar

  8. Click on the output connector (the icon highlighted in the preceding image) and drag it towards the User Defined Java Expression (UDJE) step. A greyed hop is displayed.
  9. When the mouse cursor is over the UDJE step, release the button. A link (a hop, from now on) is created from the Data Grid step to the UDJE step. The screen should look like this:

Connecting steps with a hop

  10. Double-click the UDJE icon and fill in the grid as shown. Then close the window:

Configuring a UDJE step

Done! We have a draft of our first Transformation: a Data Grid with a list of people's names, and a script step that builds the hello_message field.
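What this draft computes can be pictured with a short plain-Java sketch: one method stands in for the Data Grid (a fixed list of rows) and another for the UDJE step (one expression evaluated per row, with its result appended as a new field). The field name `name`, the sample values, and the expression `"Hello, " + name + "!"` are our assumptions, since the actual settings appear only in the screenshots; only `hello_message` comes from the text. PDI evaluates the UDJE expression itself; this code merely mimics the result.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class HelloWorldSketch {
    // Stand-in for the Data Grid step: fixed rows with one String field.
    // The field name "name" and the values are hypothetical.
    static List<Map<String, Object>> dataGrid() {
        List<Map<String, Object>> rows = new ArrayList<>();
        for (String person : new String[] {"Maria", "John", "Paula"}) {
            Map<String, Object> row = new LinkedHashMap<>();
            row.put("name", person);
            rows.add(row);
        }
        return rows;
    }

    // Stand-in for the UDJE step: evaluates one Java expression per row and
    // appends its value as the new field hello_message (expression assumed).
    static List<Map<String, Object>> userDefinedJavaExpression(List<Map<String, Object>> input) {
        List<Map<String, Object>> output = new ArrayList<>();
        for (Map<String, Object> row : input) {
            String name = (String) row.get("name");
            Map<String, Object> out = new LinkedHashMap<>(row); // incoming fields pass through
            out.put("hello_message", "Hello, " + name + "!");
            output.add(out);
        }
        return output;
    }

    public static void main(String[] args) {
        // The hop: the Data Grid output feeds the UDJE input.
        for (Map<String, Object> row : userDefinedJavaExpression(dataGrid())) {
            System.out.println(row);
        }
    }
}
```

Running it prints one line per person, such as `{name=Maria, hello_message=Hello, Maria!}`; note that the incoming `name` field is kept and the new field is added after it.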

Before continuing, let's add a colored note to our work. This is totally optional, but as your work gets more complicated, it's highly recommended that you comment your transformations:

  1. Right-click anywhere on the canvas to bring up a contextual menu.
  2. In the menu, select the New note option. A note editor appears.
  3. Type a description, such as Hello, World!. Select the Font style tab, choose a font and colors for your note, and then click on OK. This should be the final result:

Hello World Transformation

The final step is to save the work:

  1. From the main menu, navigate to Edit | Settings.... A window appears to specify the Transformation properties. Fill the Transformation name textbox with a simple name, such as hello world. Fill the Description textbox with a short description, such as My first transformation. Finally, provide a clearer explanation in the Extended description textbox, and then click on OK.
  2. From the main menu, navigate to File | Save and save the Transformation in a folder of your choice with the name hello_world.

The next step is to preview the data produced and run the Transformation.

Previewing and running a Transformation

Now we will preview and run the Transformation created earlier. Note the difference between the two:

  • The Preview functionality allows you to see a sample of the data produced for selected steps
  • The Run option effectively runs the whole Transformation
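The difference between the two options can be sketched with a toy model: a preview only needs a sample of rows from the selected step, so it can stop early, while a run processes every row. This is a conceptual illustration with hypothetical names (`PreviewVsRun`, `produceRow`), not Spoon's implementation; in PDI a preview still executes the transformation behind the scenes.

```java
import java.util.ArrayList;
import java.util.List;

public class PreviewVsRun {
    // Counts how many rows the toy upstream step has produced.
    static int produced = 0;

    // Toy stand-in for a step producing its i-th row.
    static String produceRow(int i) {
        produced++;
        return "row " + i;
    }

    // Preview: collect rows from the selected step until the sample is full, then stop.
    static List<String> preview(int totalRows, int sampleSize) {
        List<String> sample = new ArrayList<>();
        for (int i = 1; i <= totalRows && sample.size() < sampleSize; i++) {
            sample.add(produceRow(i));
        }
        return sample;
    }

    // Run: push every row through, keeping only a count of processed rows.
    static int run(int totalRows) {
        int processed = 0;
        for (int i = 1; i <= totalRows; i++) {
            produceRow(i);
            processed++;
        }
        return processed;
    }

    public static void main(String[] args) {
        System.out.println("Preview: " + preview(1_000_000, 5));
        System.out.println("Rows produced for preview: " + produced);
        produced = 0;
        System.out.println("Run processed: " + run(1_000_000));
    }
}
```

The preview produces only 5 of the million rows, while the run touches all of them.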

In our Transformation, we will preview the output of the User Defined Java Expression step:

  1. Select the User Defined Java Expression step by left-clicking on it.
  2. Click on the Preview icon in the toolbar at the top of the main canvas:

Preview icon in the Transformation toolbar

  3. The Transformation debug dialog window will appear. Click on the Quick Launch button.
  4. A window will appear to preview the data generated by the Transformation, as shown in the following screenshot:

Previewing the Hello World Transformation

  5. Close the preview window.

Note

You can preview the output of any step in the Transformation at any time during the design process. You can also preview the data even if you haven't saved your work yet.

Once we have the Transformation ready, we can run it:

  1. Click on the Run icon:

Run icon in the Transformation toolbar

  2. A window named Run Options appears. Click on Run.

Note

You need to save the Transformation before you run it. If you have modified the Transformation without saving it, you will be prompted to do so.

  3. At the bottom of the screen, you should see a log with the result of the execution:

Sample execution results window

Whether you preview or run a Transformation, you'll get an Execution Results window showing what happened. You will learn more about this in Chapter 2, Getting Started with Transformations.