Pentaho Data Integration Quick Start Guide

By: María Carina Roldán

Overview of this book

Pentaho Data Integration (PDI) is an intuitive, graphical environment that combines drag-and-drop design with powerful Extract-Transform-Load (ETL) capabilities. Given its power and flexibility, initial attempts to use the tool can be difficult or confusing. This book reduces your learning curve with PDI. It provides the guidance needed to make you productive, covering the main features of Pentaho Data Integration. It demonstrates the interactive features of the graphical designer and takes you through the main ETL capabilities that the tool offers. By the end of the book, you will be able to use PDI for extracting, transforming, and loading the types of data you encounter on a daily basis.
Table of Contents (15 chapters)

Understanding the Kettle home directory


When you run Spoon for the first time, a folder named .kettle is created in your home directory by default. This folder is referred to as the Kettle home directory.

The folder contains several configuration files, mainly created and updated by the different PDI tools. Among these files, there is the kettle.properties file.

The kettle.properties file is created along with the .kettle folder the first time you run Spoon. Its purpose is to hold variable definitions with a broad scope: the Java Virtual Machine (JVM) in which the PDI tool runs. Therefore, it's the perfect place to define general settings; some examples are as follows:

  • Database connection settings: host, database name, and so on
  • SMTP settings: SMTP server, port, and so on
  • Common input and output folders
  • Directory to send log files to

Before continuing, let's add some variables to the file. Suppose that you have two folders, named C:/PDI/INPUT and C:/PDI/OUTPUT, which you will use for storing files. The objective will be to add two variables, named INPUT_FOLDER and OUTPUT_FOLDER, containing those values:

  1. Locate the Kettle home directory, that is, the .kettle folder inside your home directory. If you work in Windows, your home directory could be C:\Documents and Settings\<your_name> or C:\Users\<your_name>, depending on which Windows version you have. If you work in Linux (or similar), it will most likely be /home/<your_name>/; in macOS, it will most likely be /Users/<your_name>/.
  2. Edit the kettle.properties file. You will see that it only contains commented sample lines.
  3. You can safely remove the contents of the file and define your own variables by typing the following lines:
       INPUT_FOLDER=C:/PDI/INPUT
       OUTPUT_FOLDER=C:/PDI/OUTPUT

Save the file and restart Spoon, so that it can recognize the variables defined in the file. We will learn how to use these variables in Chapter 2, Getting Familiar with Spoon.
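
If you prefer to script the edit rather than type it by hand, the steps above can be sketched in Python. This is only an illustration, not part of PDI: the function names parse_properties, set_variables, and default_kettle_properties are hypothetical, the default location is assumed to be the .kettle folder in your home directory, and the parser handles only plain KEY=VALUE lines and # comments, not the full Java properties syntax.

```python
from pathlib import Path


def parse_properties(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    variables = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:  # ignore lines that have no '=' separator
            variables[key.strip()] = value.strip()
    return variables


def set_variables(properties_file, updates):
    """Add or overwrite variables in the file, keeping all other lines as they are."""
    path = Path(properties_file)
    lines = path.read_text(encoding="utf-8").splitlines() if path.exists() else []
    remaining = dict(updates)
    result = []
    for raw_line in lines:
        stripped = raw_line.strip()
        key = stripped.partition("=")[0].strip()
        is_definition = bool(stripped) and not stripped.startswith("#") and "=" in stripped
        if is_definition and key in remaining:
            # Replace the existing definition with the new value.
            result.append(f"{key}={remaining.pop(key)}")
        else:
            result.append(raw_line)
    # Append any variables that were not already defined in the file.
    result.extend(f"{key}={value}" for key, value in remaining.items())
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text("\n".join(result) + "\n", encoding="utf-8")


def default_kettle_properties():
    """Assumed default location: <home directory>/.kettle/kettle.properties."""
    return Path.home() / ".kettle" / "kettle.properties"
```

With these helpers, calling set_variables(default_kettle_properties(), {"INPUT_FOLDER": "C:/PDI/INPUT", "OUTPUT_FOLDER": "C:/PDI/OUTPUT"}) would define both variables while leaving any commented sample lines in place. As with a manual edit, Spoon has to be restarted before it recognizes the change.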