This appendix offers advice to keep in mind in your daily work with PDI. If you intend to work seriously with PDI, knowing how to accomplish individual tasks is not enough.
Here are some guidelines that will point you in the right direction:
Outline your ideas on paper before creating a transformation or a job. Don't drop steps randomly on the canvas trying to get things to work; otherwise, you will end up with a transformation or a job that is difficult to understand and might not be of any use.
Document your work. Write at least a simple description in the transformation and job settings windows. Replace the default names of the steps and the job entries with meaningful ones. Use notes to clarify the purpose of the transformations and the jobs. Color-code your notes for a better effect; for example, use one color for notes explaining the purpose of a transformation, and a different color or font for technical notes.
Make your jobs and transformations easy to understand. Arrange the elements on the canvas so that it does not look like a puzzle to solve. Memorize the shortcuts for arrangement and alignment and use them regularly. You will find a full list in Appendix D, Spoon Shortcuts.
Organize the PDI elements in folders. Don't save all of the transformations and jobs in the same folder. Organize them according to the purpose they have.
Make your work flexible and reusable. Make use of arguments, variables, and named parameters. If you identify tasks that are going to be used in several situations, create subtransformations.
Make your work portable (ready for deployment). Do whatever you can so that everything keeps working with minimal or no changes, even if you move your work to another machine or folder, the paths to source or destination files change, or the connection properties of the databases change. To achieve this, use variables instead of fixed names. If you know the values of the variables beforehand, define them in the kettle.properties file. For the names of transformations and jobs, use relative paths based on the ${Internal.Job.Filename.Directory} and ${Internal.Transformation.Filename.Directory} variables.
Avoid overloading your transformations. A transformation should perform one precise task. If it doesn't, consider splitting it into two or more transformations, or creating subtransformations. Your transformations will then be clearer and, in the case of subtransformations, also reusable.
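As a brief illustration of the portability guideline above, here is a minimal sketch of what a kettle.properties file might contain. The variable names and values are hypothetical and chosen only for this example; the file normally lives in the .kettle directory under the user's home directory.

```properties
# kettle.properties (hypothetical names and values, for illustration only)
# Connection settings that differ between machines
DB_HOST=localhost
DB_PORT=5432
DB_NAME=sales_dw

# Locations of source and destination files
INPUT_DIR=/data/incoming
OUTPUT_DIR=/data/processed
```

With definitions like these, a step could refer to ${INPUT_DIR}/sales.csv instead of a fixed path, and a job entry could point to ${Internal.Job.Filename.Directory}/load_sales.ktr (a hypothetical file name), so that moving the work to another machine should only require editing kettle.properties.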
Handle errors. Try to anticipate the kinds of errors that may occur and trap them by validating, handling errors, and acting accordingly: fixing data, taking alternative paths, sending friendly messages to the log files, and so on.
Do everything you can to optimize the PDI performance. You can find a full checklist at http://wiki.pentaho.com/display/COM/PDI+Performance+tuning+check-list.
For tracking the performance of individual steps in a transformation, you can look up the details at http://wiki.pentaho.com/display/EAI/Step+performance+monitoring.
Keep track of the history of your jobs and transformations. You can use a versioning system, such as Subversion or Git. In doing so, you can recover older versions of your jobs and transformations or examine the history of how they changed. For more on Subversion, visit the site http://subversion.tigris.org/. For more on Git, visit the official site http://git-scm.com/. Also, consider upgrading to EE, where versioning is a repository feature.
Bookmark the forum page and visit it frequently. The PDI forum is available at http://forums.pentaho.org/forumdisplay.php?f=135. If you are stuck with something, search for a solution in the forum. If you don't find what you're looking for, create a new thread and describe your question or scenario clearly; the Pentaho community, and the PDI community in particular, is quite active, so you are likely to get a prompt answer. Alternatively, you can meet Pentaho people on the IRC server www.freenode.net, channel #pentaho. On the channel, people discuss all kinds of issues related to all the Pentaho tools, and not just Kettle.