Pentaho Data Integration Quick Start Guide

By: María Carina Roldán

Overview of this book

Pentaho Data Integration (PDI) is an intuitive graphical environment that combines drag-and-drop design with powerful Extract-Transform-Load (ETL) capabilities. Given its power and flexibility, initial attempts to use the tool can be difficult or confusing. This book shortens the learning curve: it provides the guidance needed to make you productive, covering the main features of PDI. It demonstrates the interactive features of the graphical designer and takes you through the main ETL capabilities that the tool offers. By the end of the book, you will be able to use PDI for extracting, transforming, and loading the types of data you encounter on a daily basis.
Table of Contents (15 chapters)

Preface

Pentaho Data Integration Quick Start Guide provides the guidance needed to get started with Pentaho Data Integration (PDI), covering the main features of the tool. The book shows the interactive features of the graphical designer, and explains the main ETL capabilities that the tool offers.

The book's content is based on PDI 8.1 Community Edition (CE), the latest version. However, it can be used with the Enterprise Edition (EE) as well. Many of the examples will also work with earlier versions of PDI.

Who this book is for

This book is a helpful guide for software developers, business intelligence analysts, IT students, and everyone involved or interested in developing ETL solutions, or more generally in performing any kind of data manipulation.

What this book covers

Chapter 1, Getting Started with PDI, presents the tool. This chapter includes instructions for installing PDI and gives you the opportunity to explore and configure the graphical designer (Spoon).

Chapter 2, Getting Familiar with Spoon, explains the fundamentals of working with Spoon by designing, debugging, and testing a transformation.

Chapter 3, Extracting Data, discusses getting and combining data from different sources. In particular, this chapter explains how to get data from files and databases.

Chapter 4, Transforming Data, explains how to transform data in many ways. Also, it explains how to get system information and predefined variables to be used as part of the data flow.

Chapter 5, Loading Data, explains how to save the output of transformations into files and databases. In addition, it explains how to load data into a datamart.

Chapter 6, Orchestrating your Work, shows how to organize your work through simple PDI jobs. You will learn how to use jobs to sequence tasks, deal with files, send emails, run DDL, and carry out other useful tasks.

To get the most out of this book

PDI is a multiplatform tool, meaning that it can be installed and used under any operating system. The only prerequisite is to have JVM 1.8 installed. You will also need a good text editor, for example, Sublime Text 3 or Notepad++. It's also recommended that you have access to a relational database. The examples in the book were built with PostgreSQL syntax, but you can adapt them to any other engine, as long as there is a JDBC driver for it.

Throughout the chapters, several internet links are provided to complement what is explained. Therefore, having an internet connection while reading is highly recommended.
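If you are not sure which Java version is installed, the following is a minimal sketch of one way to check the JVM prerequisite mentioned above. It is not part of the book's code bundle. The function names are hypothetical, and the script assumes that the java command is on your PATH and that its version banner follows the usual version "..." pattern.

```python
import re
import shutil
import subprocess


def java_major_version(version_text):
    """Return the major Java version found in `java -version` output, or None."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_text)
    if match is None:
        return None
    first = int(match.group(1))
    second = match.group(2)
    # Legacy scheme: "1.8.0_181" means Java 8; newer scheme: "11.0.2" means Java 11.
    if first == 1 and second is not None:
        return int(second)
    return first


def installed_java_major_version():
    """Run `java -version` if java is on the PATH; return its major version or None."""
    if shutil.which("java") is None:
        return None
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    # The JVM prints its version banner to stderr rather than stdout.
    return java_major_version(result.stderr or result.stdout)


if __name__ == "__main__":
    major = installed_java_major_version()
    if major is None:
        print("No Java runtime found on the PATH.")
    elif major == 8:
        print("Java 8 (1.8) found.")
    else:
        print(f"Java {major} found; the book asks for 1.8.")
```

The parsing is kept separate from the subprocess call so that it can be tried on a version string without Java being installed.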

Download the example code files

You can download the example code files for this book from your account at www.packtpub.com. If you purchased this book elsewhere, you can visit www.packtpub.com/support and register to have the files emailed directly to you.

You can download the code files by following these steps:

  1. Log in or register at www.packtpub.com.
  2. Select the SUPPORT tab.
  3. Click on Code Downloads & Errata.
  4. Enter the name of the book in the Search box and follow the onscreen instructions.

Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:

  • WinRAR/7-Zip for Windows
  • Zipeg/iZip/UnRarX for Mac
  • 7-Zip/PeaZip for Linux

The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Pentaho-Data-Integration-Quick-Start-Guide. If the code is updated, the changes will be posted to the existing GitHub repository.

We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Download the color images

We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: http://www.packtpub.com/sites/default/files/downloads/PentahoDataIntegrationQuickStartGuide_ColorImages.pdf.

Conventions used

There are a number of text conventions used throughout this book.

CodeInText: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: "You should specify the full path, for instance: C:/Pentaho/data/ny_cities."

A block of code is set as follows:

SELECT full_name
, injury_type
, to_char(start_date_time, 'yyyy-mm-dd') as injury_date
FROM injury_phases i
JOIN display_names n ON i.person_id = n.id AND entity_type = 'persons'
AND start_date_time BETWEEN '2007-07-01' AND '2007-07-31'
ORDER BY full_name, injury_type

Bold: Indicates a new term, an important word, or words that you see onscreen. For example, words in menus or dialog boxes appear in the text like this. Here is an example: "You can save time by clicking the Get fields to select button, which fills the grid with all the incoming fields."

Note

Warnings or important notes appear like this.

Note

Tips and tricks appear like this.

Get in touch

Feedback from our readers is always welcome.

General feedback: Email [email protected] and mention the book title in the subject of your message. If you have questions about any aspect of this book, please email us at [email protected].

Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report it to us. Please visit www.packtpub.com/submit-errata, select your book, click on the Errata Submission Form link, and enter the details.

Piracy: If you come across any illegal copies of our works in any form on the Internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.

If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.

Reviews

Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!

For more information about Packt, please visit packtpub.com.