The following table includes all of the transformation steps used in the book. For a full list of the steps and their descriptions, select Help | Step information... in Spoon’s main menu.
Also visit the website http://wiki.pentaho.com/display/EAI/Pentaho+Data+Integration+Steps.
There you will find a full step reference and some examples as well.
Icon |
Name |
Purpose |
Time for action |
---|---|---|---|
Abort |
Aborts a transformation. |
Generating custom messages by setting a variable with the name of the examination file (Chapter 11, Creating Advanced Transformations and Jobs). Also in Chapter 12, Developing and Implementing a Simple Datamart. | |
Add constants |
Adds one or more constants to the input rows. |
Gathering progress and merging all together (Chapter 5, Controlling the Flow of Data). Also in Chapter 8, Working with Databases, Chapter 9, Performing Advanced Operations with Databases, and Chapter 12, Developing and Implementing a Simple Datamart. | |
Add sequence |
Assigning tasks by distributing (Chapter 5, Controlling the Flow of Data). Also in Chapter 11, Creating Advanced Transformations and Jobs, and Chapter 12, Developing and Implementing a Simple Datamart. | ||
Append streams |
Appends two streams in an ordered way. |
Giving priority to Bouchard by using Append Stream (Chapter 5, Controlling the Flow of Data). | |
Calculator |
Creates new fields by performing simple calculations. |
Avoiding errors while converting the estimated time (Chapter 2, Getting started with Transformations). Also in Chapter 3, Manipulating Real-world Data, Chapter 4, Filtering, Searching, and Performing Other Useful Operations with Data, Chapter 5, Controlling the Flow of Data, Chapter 7, Transforming the Rowset, Chapter 8, Working with Databases, Chapter 10, Creating Basic Task Flows, and Chapter 12, Developing and Implementing a Simple Datamart. | |
Clone row |
Generating a range of dates and inspecting (Chapter 2, Getting started with Transformations). Also in Chapter 7, Transforming the Rowset, and Chapter 10, Creating Basic Task Flows. | ||
Combination lookup/update |
Updates a junk dimension in a data-warehouse. Alternatively, looks up information in this dimension. |
Loading a region dimension with a Combination lookup-update step (Chapter 9, Performing Advanced Operations with Databases). Also in Chapter 12, Developing and Implementing a Simple Datamart. | |
Copy rows to result |
Writes rows to the executing job. The information will then be passed to the next entry in the job. |
Generating top average scores by copying and getting rows (Chapter 11, Creating Advanced Transformations and Jobs). | |
Data Grid |
Allows you to create rows of static data in a grid, usually for testing, reference, or demo purposes. |
Avoiding errors while converting the estimated time (Chapter 2, Getting started with Transformations). Also in Chapter 5, Controlling the Flow of Data. | |
Database join |
Executes a database query using stream values as parameters. |
Using a Database join step to create a list of suggested products to buy (Chapter 9, Performing Advanced Operations with Databases). | |
Database lookup |
Looks up values in a database using field values. |
Using a Database lookup step to create a list of products to buy (Chapter 9, Performing Advanced Operations with Databases). Also in Chapter 12, Developing and Implementing a Simple Datamart. | |
Delay row |
Outputs each input row after a delay. |
Generating a range of dates and inspecting (Chapter 2, Getting started with Transformations) Also in Chapter 10, Creating Basic Task Flows. | |
Delete |
Deleting data about discontinued items (Chapter 8, Working with Databases). | ||
Dimension lookup/update |
Updates a slowly changing dimension in a data-warehouse. Alternatively, look up information in this dimension. |
Keeping a history of changes in product by using the Dimension lookup-update step (Chapter 9, Performing Advanced Operations with Databases). Also in Chapter 12, Developing and Implementing a Simple Datamart. | |
Dummy (do nothing) |
This step type doesn’t do anything. It’s useful however when testing, or in certain situations where you want to split streams. |
Creating a hello world transformation (Chapter 1, Getting started with Pentaho Data Integration). Also in Chapter 3, Manipulating Real-world Data, Chapter 4, Filtering, Searching, and Performing Other Useful Operations with Data, Chapter 6, Transforming your Data by Coding, Chapter 7, Transforming the Rowset, Chapter 9, Performing Advanced Operations with Databases, Chapter 11, Creating Advanced Transformations and Jobs, and Chapter 12, Developing and Implementing a Simple Datamart. | |
Filter rows |
Counting frequent words by filtering (Chapter 4, Filtering, Searching, and Performing Other Useful Operations with Data). Also in Chapter 5, Controlling the Flow of Data, Chapter 7, Transforming the Rowset, Chapter 9, Performing Advanced Operations with Databases, Chapter 11, Creating Advanced Transformations and Jobs, and Chapter 12, Developing and Implementing a Simple Datamart. | ||
Formula |
Calculates a formula using Pentaho’s |
Generating custom files by executing a transformation for every input row (Chapter 11, Creating Advanced Transformations and Jobs). | |
Generate random value |
Generates random values. |
Have a go Hero – measuring the performance of input steps (Chapter 3, Manipulating Real-world Data). | |
Generate Rows |
Generates a number of empty or equal rows. |
Creating a hello world transformation (Chapter 1, Getting started with Pentaho Data Integration). Also in Chapter 2, Getting started with Transformations, Chapter 3, Manipulating Real-world Data, Chapter 7, Transforming the Rowset, Chapter 9, Performing Advanced Operations with Databases, Chapter 10, Creating Basic Task Flows, and Chapter 12, Developing and Implementing a Simple Datamart. | |
Get data from XML |
Gets data from an XML file by using |
Getting data from an XML file with information about countries (Chapter 3, Manipulating Real-world Data). Also in Chapter 4, Filtering, Searching, and Performing Other Useful Operations with Data, and Chapter 9, Performing Advanced Operations with Databases. | |
Get rows from result |
Reads rows from a previous entry in a job. |
Generating top average scores by copying and getting rows (Chapter 11, Creating Advanced Transformations and Jobs). | |
Get System Info |
Gets information from the system such as system date, arguments, and so on. |
Reading and writing matching files with flexibility (Chapter 3, Manipulating Real-world Data). Also in Chapter 8, Working with Databases, Chapter 10, Creating Basic Task Flows, Chapter 11, Creating Advanced Transformations and Jobs, and Chapter 12, Developing and Implementing a Simple Datamart. | |
Get Variables |
Determines the values of certain (environment or Kettle) variables and puts them in field values. |
Creating a time dimension dataset (Chapter 7, Transforming the Rowset). Also in Chapter 12, Developing and Implementing a Simple Datamart. | |
Group by |
Builds aggregates in a group by fashion. This works only on a sorted input. If the input is not sorted, only double consecutive rows are handled correctly. |
Calculating football matches statistics by grouping data (Chapter 4, Filtering, Searching, and Performing Other Useful Operations with Data). Also in Chapter 5, Controlling the Flow of Data, Chapter 9, Performing Advanced Operations with Databases, and Chapter 11, Creating Advanced Transformations and Jobs. | |
If field value is null |
Sets a field value to a constant if it is null. |
Enhancing a films file by converting rows to columns (Chapter 7, Transforming the Rowset). | |
Insert/Update |
Updates or insert rows in a database based upon keys. |
Inserting new products or updating existent ones (Chapter 8, Working with Databases). | |
Java Filter |
Generating top average scores by copying and getting rows (Chapter 11, Creating Advanced Transformations and Jobs). | ||
Mapping (subtransformation) |
Run a subtransformation, use |
Calculating statistics with the use of a subtransformation (Chapter 11, Creating Advanced Transformations and Jobs). | |
Mapping input specification |
Specifies the input interface of a mapping. |
Calculating statistics with the use of a subtransformation (Chapter 11, Creating Advanced Transformations and Jobs). | |
Mapping output specification |
Specifies the output interface of a mapping. |
Calculating statistics with the use of a subtransformation (Chapter 11, Creating Advanced Transformations and Jobs). | |
Microsoft Excel Input |
Reads data from Excel and OpenOffice Workbooks ( |
Browsing new features of PDI by copying a dataset (Chapter 5, Controlling the Flow of Data). Also in Chapter 8, Working with Databases, and Chapter 12, Developing and Implementing a Simple Datamart. | |
Microsoft Excel Output |
Getting data from an XML file with information about countries (Chapter 3, Manipulating Real-World Data). Also in Chapter 5, Controlling the Flow of Data. | ||
Modified JavaScript Value |
Allows you to code JavaScript to modify or create new fields. |
Counting frequent words by coding in JavaScript (Chapter 6, Transforming your Data by Coding). Also in Chapter 7, Transforming the Rowset. | |
Number range |
Creates ranges based on a numeric field. |
Creating a simple transformation (Chapter 2, Getting Started with Transformations). Also in Chapter 8, Working with Databases. | |
Regex Evaluation |
Evaluates a field using a regular expression. It can also extract new fields out of an existing field with capturing groups. |
Loading the fact table using a range of dates (Chapter 12, Developing and Implementing a Simple Datamart). | |
Replace in string |
Replaces all occurrences of a word in a string with another word. |
Fixing words before counting them (Chapter 4, Filtering, Searching, and Performing Other Useful Operations with Data). | |
Row denormaliser |
Denormalizes rows by looking up key-value pairs and assigning them to new fields in the output rows. This method aggregates and needs the input rows to be sorted on the grouping fields. |
Enhancing a films file by converting rows to columns (Chapter 7, Transforming the Rowset). | |
Row Normaliser |
Denormalized information can be normalized using this step-type. |
Enhancing the matches files by normalizing the dataset (Chapter 7, Transforming the Rowset). Also in Chapter 12, Developing and Implementing a Simple Datamart. | |
Select values |
Selects, reorders, or removes fields. Optionally, allows you to change the metadata: type, length, and precision. |
Avoiding errors while converting the estimated time (Chapter 2, Getting Started with Transformations). Also in Chapter 3, Manipulating Real-World Data, Chapter 4, Filtering, Searching, and Performing Other Useful Operations with Data, Chapter 5, Controlling the Flow of Data, Chapter 7, Transforming the Rowset, Chapter 8, Working with Databases, Chapter 9, Performing Advanced Operations with Databases, Chapter 10, Creating Basic Task Flows, and Chapter 12, Developing and Implementing a Simple Datamart. | |
Set Variables |
Generating custom messages by setting a variable with the name of the examination file (Chapter 11, Creating Advanced Transformations and Jobs). Also in Chapter 12, Developing and Implementing a Simple Datamart. | ||
Sort rows |
Sorts rows based upon field values (ascending or descending). |
Sorting information about matches (Chapter 4, Filtering, Searching, and Performing Other Useful Operations with Data). Also in Chapter 5, Controlling the Flow of Data, Chapter 7, Transforming the Rowset, Chapter 8, Working with Databases, Chapter 9, Performing Advanced Operations with Databases, and Chapter 11, Creating Advanced Transformations and Jobs. | |
Split field to rows |
Splits a single string field by delimiter and creates a new row for each split term. |
Counting frequent words by filtering (Chapter 4, Filtering, Searching, and Performing Other Useful Operations with Data). Also in Chapter 6, Transforming your Data by Coding. | |
Split Fields |
Splits a single string field into two or more. |
Aggregating football matches data with the Row denormaliser step (Chapter 7, Transforming the Rowset). Also in Chapter 12, Developing and Implementing a Simple Datamart. | |
Stream lookup |
Looks up values coming from another stream in the transformation. |
Finding out which language people speak (Chapter 4, Filtering, Searching, and Performing Other Useful Operations with Data). Also in Chapter 7, Transforming the Rowset, and Chapter 12, Developing and Implementing a Simple Datamart. | |
Switch/Case |
Switches a row to a certain target step based on the case value in a field. |
Assigning tasks by filtering priorities with the Switch Case step (Chapter 5, Controlling the Flow of Data). | |
Table input |
Reads data from a database table. |
Getting data about shipped orders (Chapter 8, Working with Databases). Also in Chapter 9, Performing Advanced Operations with Databases, and Chapter 12, Developing and Implementing a Simple Datamart. | |
Table output |
Loading a table with a list of manufacturers (Chapter 8, Working with Databases). Also in Chapter 9, Performing Advanced Operations with Databases, and Chapter 12, Developing and Implementing a Simple Datamart. | ||
Text file input |
Reads data from a text file. |
Reading results of football matches from files (Chapter 3, Manipulating Real-world Data). Also in Chapter 4, Filtering, Searching, and Performing Other Useful Operations with Data, Chapter 6, Transforming your Data by Coding, Chapter 7, Transforming the Rowset, Chapter 8, Working with Databases, and Chapter 11, Creating Advanced Transformations and Jobs. | |
Text file output |
Writes data to a text file. |
Sending the results of matches to a plain file (Chapter 3, Manipulating Real-world Data). Also in Chapter 4, Filtering, Searching, and Performing Other Useful Operations with Data, Chapter 9, Performing Advanced Operations with Databases, Chapter 10, Creating Basic Task Flows, and Chapter 11, Creating Advanced Transformations and Jobs. | |
Unique rows |
Have a go hero – listing the last match played by each team (Chapter 4, Filtering, Searching, and Performing Other Useful Operations with Data). | ||
Update |
Updates data in a database table based upon keys. |
Loading a region dimension with a Combination lookup-update step (Chapter 9, Performing Advanced Operations with Databases) Also in Chapter 12, Developing and Implementing a Simple Datamart. | |
User Defined Java Class |
Allows you to program a step using Java code. |
Counting frequent words by coding in Java (Chapter 6, Transforming your Data by Coding). | |
User Defined Java Expression |
Calculates the result of a Java Expression using Janino. |
Creating a simple transformation (Chapter 2, Getting Started with Transformations). Also in Chapter 10, Creating Basic Task Flows, and Chapter 11, Creating Advanced Transformations and Jobs. | |
Value Mapper |
Browsing new features of PDI by copying a dataset (Chapter 5, Controlling the Flow of Data). | ||
Write to log |
Writes data to the log. |
Avoiding errors while converting the estimated time (Chapter 2, Getting started with Transformations). Also in Chapter 5, Controlling the Flow of Data, and Chapter 8, Working with Databases. |