The following table includes all the transformation steps used in the book. For a full list of steps and their descriptions, select Help | Show step plug-in information in Spoon's main menu.
You can also visit http://wiki.pentaho.com/display/EAI/Pentaho+Data+Integration+v3.2.+Steps for a full step reference along with some examples.
| Icon | Name | Purpose | Time for action |
|---|---|---|---|
| | | Aborts a transformation | Aborting when there are too many errors (Chapter 7); also in Chapters 11 and 12 |
| | | Adds one or more constant fields to the stream | Gathering progress and merging all together (Chapter 4); also in Chapters 7, 8, and 9 |
| | | Gets the next value from a sequence | Assigning tasks by Distributing (Chapter 4); also in Chapters 6 and 11 |
| | | Appends two streams in an ordered way | Giving priority to Bouchard by using Append Stream (Chapter 4) |
| | | Creates new fields by performing simple calculations | Reviewing examinations by using the Calculator step (Chapter 3); also in Chapters 6 and 8 |
| | | Updates a junk dimension. Alternatively, it can be used to update a Type I SCD. | Loading a region dimension with a Combination lookup/update step (Chapter 9); also in Chapter 12 |
| | | Writes rows to the executing job. The information is then passed to the next entry in the job. | Splitting the generation of top scores by copying and getting rows (Chapter 11) |
| | | Validates fields based on a set of rules | Checking films file with the Data Validator (Chapter 7) |
| | | Executes a database query using stream values as parameters | Using a Database join step to create a list of suggested products to buy (Chapter 9) |
| | | Looks up values in a database table | Using a Database lookup step to create a list of products to buy (Chapter 9); also in Chapter 12 |
| | | For each incoming row, waits a given time before passing the row to the next step | Generating custom files by executing a transformation for every input row (Chapter 11) |
| | | Deletes data in a database table | Deleting data about discontinued items (Chapter 8) |
| | | Updates or looks up a Type II SCD. Alternatively, it can be used to update a Type I SCD or hybrid dimensions. | Keeping a history of product changes with the Dimension lookup/update step (Chapter 9); also in Chapter 12 |
| | | This step type doesn't do anything! However, it is used often. | Creating a hello world transformation (Chapter 1); also in Chapters 2, 3, 7, and 9 |
| | | Reads data from a Microsoft Excel file | Browsing PDI new features by copying a dataset (Chapter 4); also in Chapter 8 |
| | | Writes data to a Microsoft Excel file | Getting data from an XML file with information about countries (Chapter 2); also in Chapters 4 and 10 |
| | | Splits the stream in two based on a given condition. Alternatively, it can be used to pass only the rows that meet the condition. | Counting frequent words by filtering (Chapter 3); also in Chapters 4, 6, 7, 9, 11, and 12 |
| | | Reads data from a fixed-width file | Calculating Scores with JavaScript (Chapter 5) |
| | | Creates new fields by using formulas. It uses Pentaho's libformula. | Reviewing examination by using the Formula step (Chapter 3); also in Chapters 10 and 11 |
| | | Generates a number of identical rows | Creating a hello world transformation (Chapter 1); also in Chapters 6, 9, and 10 |
| | | Gets data from XML files | Getting data from an XML file with information about countries (Chapter 2); also in Chapters 3 and 9 |
| | | Reads rows from a previous entry in a job | Splitting the generation of top scores by copying and getting rows (Chapter 11) |
| | | Gets information from the system, such as the system date and arguments | Updating a file with news about examination (Chapter 2); also in Chapters 7, 8, 10, 11, and 12 |
| | | Takes the values of environment or Kettle variables and adds them as fields in the stream | Creating the time dimension dataset (Chapter 6) |
| | | Builds aggregates in a group-by fashion. This works only on sorted input. If the input is not sorted, only consecutive duplicate rows are handled correctly. | Calculating World Cup statistics by grouping data (Chapter 3); also in Chapters 4, 7, and 9 |
| | | If a field is null, it changes its value to a constant. It can be applied to all fields of the same data type, or to particular fields. | Enhancing a films file by converting rows to columns (Chapter 6) |
| | | Updates or inserts rows in a database table | Inserting new products or updating existent ones (Chapter 8) |
| | | Runs a subtransformation | Calculating the top scores with a subtransformation (Chapter 11) |
| | | Specifies the input interface of a subtransformation | Calculating the top scores with a subtransformation (Chapter 11) |
| | | Specifies the output interface of a subtransformation | Calculating the top scores with a subtransformation (Chapter 11) |
| | | Allows you to write JavaScript code to modify or create new fields. It's also possible to write Java code. | Calculating Scores with JavaScript (Chapter 5); also in Chapters 6, 7, and 11 |
| | | Creates ranges based on a numeric field | Capturing errors while calculating the age of a film (Chapter 7); also in Chapter 8 |
| | | Evaluates a field with a regular expression | Validating Genres with a Regex Evaluation step (Chapter 7); also in Chapter 12 |
| | | Denormalizes rows by looking up key-value pairs | Enhancing a films file by converting rows to columns (Chapter 6) |
| | | Normalizes denormalized data | Enhancing the matches file by normalizing the dataset (Chapter 6) |
| | | Selects, reorders, or removes fields. Also allows you to change the metadata of fields. | Reading all your files at a time using a single Text file input step (Chapter 2); also in Chapters 3, 4, 6, 7, 8, 9, 11, and 12 |
| | | Sets Kettle variables based on a single input row | Updating a file with news about examinations by setting a variable with the name of the file (Chapter 11); also in Chapter 12 |
| | | Sorts rows based upon field values, ascending or descending | Reviewing examinations by using the Calculator step (Chapter 3); also in Chapters 4, 6, 7, 8, 9, and 11 |
| | | Splits a single string field and creates a new row for each split term | Counting frequent words by filtering (Chapter 3) |
| | | Splits a single field into more than one | Calculating World Cup statistics by grouping data (Chapter 3); also in Chapters 6 and 11 |
| | | Looks up values coming from another stream in the transformation | Finding out which language people speak (Chapter 3); also in Chapter 6 |
| | | Switches a row to a certain target step based on the value of a field | Assigning tasks by filtering priorities with the Switch / Case step (Chapter 4) |
| | | Reads data from a database table | Getting data about shipped orders (Chapter 8); also in Chapters 9, 10, and 12 |
| | | Writes data to a database table | Loading a table with a list of manufacturers (Chapter 8); also in Chapters 9 and 12 |
| | | Reads data from a text file | Reading all your files at a time using a single Text file input step (Chapter 2); also in Chapters 3, 5, 6, 7, 8, and 11 |
| | | Writes data to a text file | Sending the results of matches to a plain file (Chapter 2); also in Chapters 3, 7, 9, 10, and 11 |
| | | Updates data in a database table | Loading a region dimension with a Combination lookup/update step (Chapter 9) |
| | | Maps values of a certain field from one value to another | Browsing PDI new features by copying a dataset (Chapter 4) |
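
The table notes that the group-by aggregation step works correctly only on sorted input. The following Python sketch is not PDI code; it is an analogy that uses `itertools.groupby`, which also aggregates only runs of consecutive rows with the same key. The sample rows, the field names `team` and `goals`, and the helper `group_consecutive` are hypothetical and chosen only for illustration.

```python
from itertools import groupby
from operator import itemgetter


def group_consecutive(rows, key_field, value_field):
    """Sum value_field over each run of consecutive rows sharing key_field."""
    key = itemgetter(key_field)
    return [
        (k, sum(row[value_field] for row in run))
        for k, run in groupby(rows, key=key)
    ]


# Hypothetical sample data: the two "Italy" rows are not adjacent.
rows = [
    {"team": "Italy", "goals": 2},
    {"team": "Brazil", "goals": 1},
    {"team": "Italy", "goals": 3},
]

# Unsorted input: each run is aggregated separately, so "Italy" appears twice.
unsorted_result = group_consecutive(rows, "team", "goals")

# Sorting on the grouping field first gives one aggregate per key.
sorted_rows = sorted(rows, key=itemgetter("team"))
sorted_result = group_consecutive(sorted_rows, "team", "goals")

print(unsorted_result)  # [('Italy', 2), ('Brazil', 1), ('Italy', 3)]
print(sorted_result)    # [('Brazil', 1), ('Italy', 5)]
```

In a transformation, the equivalent precaution is to place a sort on the grouping fields before the aggregation.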