Pentaho 3.2 Data Integration: Beginner's Guide

Overview of this book

Pentaho Data Integration (a.k.a. Kettle) is a full-featured open source ETL (Extract, Transform, and Load) solution. Although PDI is a feature-rich tool, effectively capturing, manipulating, cleansing, transferring, and loading data can get complicated.

This book is full of practical examples that will help you to take advantage of Pentaho Data Integration's graphical, drag-and-drop design environment. You will quickly get started with Pentaho Data Integration by following the step-by-step guidance in this book. The useful tips in this book will encourage you to exploit powerful features of Pentaho Data Integration and perform ETL operations with ease.

Starting with the installation of the PDI software, this book will teach you all the key PDI concepts. Each chapter introduces new features, allowing you to gradually get involved with the tool. First, you will learn to work with plain files, and to do all kinds of data manipulation. Then, the book gives you a primer on databases and teaches you how to work with databases inside PDI. Not only that, you'll be given an introduction to data warehouse concepts and you will learn to load data in a data warehouse. After that, you will learn to implement simple and complex processes.

Once you've learned all the basics, you will build a simple datamart that will serve to reinforce all the concepts learned through the book.

Running transformations and jobs from a repository


To run a transformation or job stored in a repository, follow these steps:

  1. Open a terminal window.

  2. Go to the Kettle installation directory.

  3. Run the proper command according to the following table:

    Running a ...     Windows                     Unix-like system

    transformation    pan.bat /rep:<value>        pan.sh /rep:<value>
                              /user:<user>               /user:<user>
                              /pass:<value>              /pass:<value>
                              /trans:<value>             /trans:<value>
                              /dir:<value>               /dir:<value>

    job               kitchen.bat /rep:<value>    kitchen.sh /rep:<value>
                                  /user:<user>               /user:<user>
                                  /pass:<value>              /pass:<value>
                                  /job:<value>               /job:<value>
                                  /dir:<value>               /dir:<value>

In the preceding table:

  • rep is the name of the repository to log into

  • user and pass are the credentials to log into the repository

  • trans and job are the names of the transformation or job to run

  • dir is the name of the directory where the transformation or job is located

The parameters are shown on different lines for you to clearly identify all the options.

Note

When you type the command, you have to write all the parameters on the same line.
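If you would still like to keep one option per line in a script on a Unix-like system, a trailing backslash makes the shell join the lines before the command runs. The following is only a sketch: the repository, user, password, job, and directory values are placeholders, show_job_cmd is a hypothetical helper name, and echo is used so that the command is printed rather than executed.

```shell
#!/bin/sh
# Sketch: a backslash at the end of a line tells the shell to continue on the
# next line, so the program still receives one single command line.
# All option values below are placeholders, not real repository settings.
show_job_cmd() {
    # "echo" only prints the assembled command; remove it to run the job.
    echo kitchen.sh /rep:MY_REPO \
                    /user:PDI_USER \
                    /pass:1234 \
                    /job:my_job \
                    /dir:/MY_WORK/
}

show_job_cmd
```

The printed command has all the options on one line, which is what Kitchen expects to receive.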

Suppose that you work on Windows, you have a repository named MY_REPO, and you log into the repository with user PDI_USER and password 1234. To run a transformation named Hello located in a directory named MY_WORK in that repository, type the following:

pan.bat /rep:"MY_REPO" /user:"PDI_USER" /pass:"1234" /trans:"Hello" /dir:"/MY_WORK/"
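On a Unix-like system, the same run would use pan.sh, and a job would use kitchen.sh, as listed in the table. The sketch below assembles such a command line from its parts and prints it. The helper build_repo_cmd is hypothetical and is not part of Kettle; it only illustrates how the options fit together.

```shell
#!/bin/sh
# Sketch: build the Unix-like command line for a repository run.
# build_repo_cmd is an illustrative helper, not part of Kettle.
build_repo_cmd() {
    # $1 = kind (trans or job), $2 = repository, $3 = user,
    # $4 = password, $5 = transformation or job name, $6 = directory
    case $1 in
        trans) tool=pan.sh ;;       # the table lists pan.sh for transformations
        job)   tool=kitchen.sh ;;   # the table lists kitchen.sh for jobs
        *)     echo "unknown kind: $1" >&2; return 1 ;;
    esac
    printf '%s /rep:"%s" /user:"%s" /pass:"%s" /%s:"%s" /dir:"%s"\n' \
        "$tool" "$2" "$3" "$4" "$1" "$5" "$6"
}

# The Windows example, rewritten for a Unix-like system:
build_repo_cmd trans MY_REPO PDI_USER 1234 Hello /MY_WORK/
```

The printed line can then be typed, or pasted, in a terminal opened in the Kettle installation directory.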

Note

If you defined auto-login, you don't need to provide the repository information (the rep, user, and pass command line parameters) as part of the command.

Specifying command line options

In the examples provided in this appendix, all options are specified by using the /option:value syntax—for example, /trans:"Hello".

Instead of /, you can also use -. Between the name of the option and the value, you can also use = instead of :. This means the options /trans:"Hello" and -trans="Hello" are equivalent.

You may use any combination of /, -, :, and =.

Note

In Windows, the use of - and = may cause problems; it's recommended that you use the /option:value syntax.
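To make the equivalence concrete, the following sketch rewrites any of the four spellings into the recommended /option:value form. The function normalize_opt is a hypothetical helper written for this illustration; Kettle itself does not provide it, and the sketch says nothing about how Kettle parses the options internally.

```shell
#!/bin/sh
# Sketch: rewrite any accepted spelling of an option into /option:value form.
# normalize_opt is an illustrative helper, not part of Kettle.
normalize_opt() {
    opt=$1
    opt=${opt#[-/]}        # drop the leading "/" or "-"
    name=${opt%%[=:]*}     # text before the first ":" or "="
    value=${opt#*[=:]}     # text after the first ":" or "="
    printf '/%s:%s\n' "$name" "$value"
}

# All four spellings produce the same normalized option:
normalize_opt '/trans:Hello'
normalize_opt '-trans=Hello'
normalize_opt '/trans=Hello'
normalize_opt '-trans:Hello'
```

Each call prints /trans:Hello, which is the form recommended for Windows in the preceding note.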

If there are spaces in the values, you can use single quotes ('') or double quotes ("") to keep the values together. If there are no spaces, the quotes are optional.
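The reason the quotes matter can be seen in a Unix-like shell, which splits the command line on spaces before the program sees it. The sketch below counts the arguments that a program would receive; count_args is a hypothetical helper, and the transformation name Hello World is made up for the illustration. The Windows command prompt handles quotes differently, so take this as a demonstration of the shell side only.

```shell
#!/bin/sh
# Sketch: count how many separate arguments the shell hands to a program.
# count_args is an illustrative helper, not part of Kettle.
count_args() {
    echo "$#"
}

count_args /trans:Hello World      # unquoted space: the value is split in two
count_args /trans:"Hello World"    # double quotes keep the value together
count_args /trans:'Hello World'    # single quotes keep the value together
count_args /trans:Hello            # no spaces, so quotes are optional
```

Only the unquoted call reports two arguments; in the other three cases the option arrives as a single argument.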