Book Image

Pentaho 3.2 Data Integration: Beginner's Guide

Book Image

Pentaho 3.2 Data Integration: Beginner's Guide

Overview of this book

Pentaho Data Integration (a.k.a. Kettle) is a full-featured open source ETL (Extract, Transform, and Load) solution. Although PDI is a feature-rich tool, effectively capturing, manipulating, cleansing, transferring, and loading data can get complicated.This book is full of practical examples that will help you to take advantage of Pentaho Data Integration's graphical, drag-and-drop design environment. You will quickly get started with Pentaho Data Integration by following the step-by-step guidance in this book. The useful tips in this book will encourage you to exploit powerful features of Pentaho Data Integration and perform ETL operations with ease.Starting with the installation of the PDI software, this book will teach you all the key PDI concepts. Each chapter introduces new features, allowing you to gradually get involved with the tool. First, you will learn to work with plain files, and to do all kinds of data manipulation. Then, the book gives you a primer on databases and teaches you how to work with databases inside PDI. Not only that, you'll be given an introduction to data warehouse concepts and you will learn to load data in a data warehouse. After that, you will learn to implement simple and complex processes.Once you've learned all the basics, you will build a simple datamart that will serve to reinforce all the concepts learned through the book.
Table of Contents (27 chapters)
Pentaho 3.2 Data Integration Beginner's Guide
Credits
Foreword
The Kettle Project
About the Author
About the Reviewers
Preface
Index

Time for action – starting and customizing Spoon


In this tutorial you're going to launch the PDI graphical designer and get familiarized with its main features.

  1. Start Spoon.

    • If your system is Windows, type the following command:

      Spoon.bat
      
    • In other platforms such as Unix, Linux, and so on, type:

      Spoon.sh
      
    • If you didn't make spoon.sh executable, you may type:

      sh Spoon.sh
      
  2. As soon as Spoon starts, a dialog window appears asking for the repository connection data. Click the No Repository button. The main window appears. You will see a small window with the tip of the day. After reading it, close that window.

  3. A welcome! window appears with some useful links for you to see.

  4. Close the welcome window. You can open that window later from the main menu.

  5. Click Options... from the Edit menu. A window appears where you can change various general and visual characteristics. Uncheck the circled checkboxes:

  6. Select the tab window Look Feel.

  7. Change the Grid size and Preferred Language settings as follows:

  8. Click the OK button.

  9. Restart Spoon in order to apply the changes. You should neither see the repository dialog, nor the welcome window. You should see the following screen instead:

What just happened?

You ran for the first time the graphical designer of PDI Spoon, and applied some custom configuration.

From the Look Feel configuration window, you changed the size of the dotted grid that appears in the canvas area while you are working. You also changed the preferred language. In the Option tab window, you chose not to show either the repository dialog or the welcome window at startup. These changes were applied as you restarted the tool, not before.

The second time you launched the tool, the repository dialog didn't show up. When the main window appeared, all the visible texts were shown in French, which was the selected language, and instead of the welcome window, there was a blank screen.

Spoon

This tool that you're exploring in this section is the PDI's desktop design tool. With Spoon you design, preview, and test all your work, that is, transformations and jobs. When you see PDI screenshots, what you are really seeing are Spoon screenshots. The other PDI components that you will meet in the following chapters are executed from terminal windows.

Setting preferences in the Options window

In the tutorial you changed some preferences in the Options window. There are several look and feel characteristics you can change beyond those you changed. Feel free to experiment with this setting.

Note

Remember to restart Spoon in order to see the changes applied.

Tip

If you choose any language as preferred language other than English, you should select a different language as alternative. If you do so, every name or description not translated to your preferred language will be shown in the alternative language.

Just for the curious people: Italian and French are the overall winners of the list of languages to which the tool has been translated from English. Below them follow Korean, Argentinean Spanish, Japanese, and Chinese.

One of the settings you changed was the appearance of the welcome window at start up. The welcome window has many useful links, all related with the tool: wiki pages, news, forum access, and more. It's worth exploring them.

Tip

You don't have to change the settings again to see the welcome window. You can open it from the menu Help | Show the Welcome Screen.

Storing transformations and jobs in a repository

The first time you launched Spoon, you chose No Repository. After that, you configured Spoon to stop asking you for the Repository option. You must be curious about what the repository is and why not to use it. Let's explain it.

As said, the results of working with PDI are Transformations and Jobs. In order to save the Transformations and Jobs, PDI offers two methods:

  • Repository: When you use the repository method you save jobs and transformations in a repository. A repository is a relational database specially designed for this purpose.

  • Files: The files method consists of saving jobs and transformations as regular XML files in the filesystem, with extension kjb and ktr respectively.

The following diagram summarizes this:

You cannot mix the two methods (files and repository) in the same project. Therefore, you must choose the method when you start the tool.

Why did we choose not to work with repository, or in other words, to work with files? This is mainly for the following two reasons:

  • Working with files is more natural and practical for most users.

  • Working with repository requires minimum database knowledge and that you also have access to a database engine from your computer. Having both preconditions would allow you to learn working with both methods. However, it's probable that you haven't.

Throughout this book, we will use the file method. For details of working with repositories, please refer to Appendix A.

Creating your first transformation

Until now, you've seen the very basic elements of Spoon. For sure, you must be waiting to do some interesting task beyond looking around. It's time to create your first transformation.