
Pentaho Data Integration Quick Start Guide

By: María Carina Roldán

Overview of this book

Pentaho Data Integration (PDI) is an intuitive, graphical environment packed with drag-and-drop design and powerful Extract-Transform-Load (ETL) capabilities. Because it is so powerful and flexible, first attempts to use PDI can be difficult or confusing. This book is the ideal solution: it reduces your learning curve with PDI and gives you the guidance you need to become productive, covering the main features of Pentaho Data Integration. It demonstrates the interactive features of the graphical designer and takes you through the main ETL capabilities that the tool offers. By the end of the book, you will be able to use PDI to extract, transform, and load the types of data you encounter on a daily basis.
Table of Contents (15 chapters)

Sorting and aggregating data


In the previous section, we learned how to work with individual fields, for example, by creating new ones or modifying existing ones. Those operations were applied row by row. In this section, instead of looking at individual rows, we will learn to observe and work on the dataset as a whole.

Sorting data

Sorting a dataset is a very useful and common task. It is easy to do in PDI, and we will demonstrate it with a simple transformation. We will take the survey files that we used in the previous chapter and sort the data by the neighborhood and room_type columns, and then by the reviews column in descending order. To do this, follow these steps:

  1. Open any of the transformations created in the last chapter that read the survey files. Save the transformation under a different name.
  2. Drag a Sort rows step from the Transform folder and create a hop from the Text file input step to this new step.
  3. Double-click the step and...
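The ordering that this transformation aims for can be illustrated outside PDI. What follows is a minimal Python sketch, not PDI's implementation. It assumes that each row is a dict keyed by the field names mentioned above (neighborhood, room_type, reviews). The sample rows, including the room_id field, are hypothetical. The sketch relies on Python's stable sort and applies the sort keys from least to most significant. As a result, rows end up ordered by neighborhood and room_type ascending and, within each group, by reviews descending. Strings are compared with Python's default ordering, which may not match how PDI compares strings.

```python
from operator import itemgetter

# Hypothetical sample rows standing in for the survey file data.
SURVEY_ROWS = [
    {"room_id": 1, "neighborhood": "Centro", "room_type": "Private room", "reviews": 12},
    {"room_id": 2, "neighborhood": "Belgrano", "room_type": "Entire home/apt", "reviews": 5},
    {"room_id": 3, "neighborhood": "Centro", "room_type": "Private room", "reviews": 40},
    {"room_id": 4, "neighborhood": "Belgrano", "room_type": "Entire home/apt", "reviews": 30},
    {"room_id": 5, "neighborhood": "Centro", "room_type": "Entire home/apt", "reviews": 7},
]

# Sort keys as (field, ascending) pairs, most significant first.
SORT_KEYS = [("neighborhood", True), ("room_type", True), ("reviews", False)]


def sort_rows(rows, keys):
    """Return a new list of rows sorted by several fields.

    Python's list.sort is stable (also with reverse=True), so sorting
    by the least significant key first and the most significant key
    last yields the combined multi-key ordering.
    """
    result = list(rows)  # copy so the input list is left untouched
    for field, ascending in reversed(keys):
        result.sort(key=itemgetter(field), reverse=not ascending)
    return result


if __name__ == "__main__":
    for row in sort_rows(SURVEY_ROWS, SORT_KEYS):
        print(row["neighborhood"], row["room_type"], row["reviews"], sep=" | ")
```

With these sample rows, the two Belgrano rows come first (30 reviews before 5), followed by the single Centro "Entire home/apt" row, and then the two Centro "Private room" rows (40 reviews before 12).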