To create a repository, follow these steps:
Open MySQL Command Line Client.
In the command window, type the following:
CREATE DATABASE PDI_REPO;
Open Spoon.
If the repository dialog appears, skip the next step.
Open the repository dialog from the Repository | Connect to repository menu.
Click on New to create a new repository. The repository information dialog shows up. Click on New to create a new database connection.
The database connection window appears. Define a connection to the database you have just created and give the connection a name, PDI_REPO_CONN in this case.
Tip
If you want to refer to the steps on creating the database connection, check out Time for action – creating a connection to the Steel Wheels database section in Chapter 8.
Test the connection to see that it is properly configured.
Click OK to close the database connection window. The Select database connection box will show the created connection.
Give the name MY_REPO to the repository. As the description, type My first repository.
Click on Create or Upgrade.
PDI will ask you if you are sure you want to create the repository on the specified database connection. Answer Yes if you are sure of the settings you entered.
A dialog appears asking if you want to do a dry run to evaluate the generated SQL before execution.
Answer No unless you want to preview the SQL that will create the repository.
A progress window appears, showing you the progress while the repository is being created.
Finally, you see a window with the message Kettle created the repository on the specified connection. Close the dialog window.
Click on OK to close the repository information window. You will be back in the repository dialog, this time with a new repository available in the repository drop-down list.
If you want to start working with the created repository, please refer to the Working with the repository storage system section. If not, click on No Repository. This will close the window.
In MySQL, you created a new database named PDI_REPO. Then you used that database to create a PDI repository.
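If you prefer to script the database creation instead of typing it in the MySQL Command Line Client, the following is a minimal sketch of how that could look. It only builds the command for the standard mysql client and prints it. The function name build_create_db_command and the root user are assumptions made for illustration; adapt them to your own installation.

```python
import re
import shlex


def build_create_db_command(db_name="PDI_REPO", user="root"):
    """Return the argument list for a mysql client call that creates db_name.

    The function name and the default user are illustrative assumptions.
    """
    # Accept only plain identifiers so the name can be placed in the statement safely.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", db_name):
        raise ValueError(f"unsupported database name: {db_name!r}")
    statement = f"CREATE DATABASE {db_name};"
    # -u selects the user, -p makes the client prompt for the password,
    # -e executes the statement and then exits.
    return ["mysql", "-u", user, "-p", "-e", statement]


if __name__ == "__main__":
    # Print the command rather than running it; to execute it, pass the
    # list to subprocess.run on a machine where the mysql client is installed.
    print(shlex.join(build_create_db_command()))
```

Returning a list of arguments rather than a single string keeps the statement intact as one argument, so no extra quoting is needed when the command is eventually executed.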
A Kettle repository is a database that provides you with a storage system for your transformations and jobs. The repository is the alternative to the *.ktr and *.kjb file-based system.
In order to create a new repository, a database must have been created previously. In the tutorial, the repository was created in a MySQL RDBMS. However, you can create your repositories in any relational database.
Note that if the repository has already been created from another machine or by another user, that is, another profile in the operating system, you don't have to create the repository again. In that case, just define the connection to the repository but don't create it again. In other words, follow all the instructions but don't click the Create or Upgrade button.
Once you have created a repository, its name, description, and connection information are stored in a file named repositories.xml, which is located in the PDI home directory. The repository database is populated with a bunch of tables with familiar names such as transformation, job, steps, and steps_type.
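To see which repositories are already defined on a machine, you could read repositories.xml with a short script. The following is a minimal sketch in Python. The XML layout in the sample (a repositories root element holding repository entries with name, description, and connection children) is an assumption and may differ between PDI versions, so open your own file and compare before relying on it. The function name list_repositories is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed layout of repositories.xml; check your own file, since the
# element names may differ between PDI versions.
SAMPLE_REPOSITORIES_XML = """
<repositories>
  <connection>
    <name>PDI_REPO_CONN</name>
    <database>PDI_REPO</database>
  </connection>
  <repository>
    <name>MY_REPO</name>
    <description>My first repository</description>
    <connection>PDI_REPO_CONN</connection>
  </repository>
</repositories>
"""


def list_repositories(xml_text):
    """Return (name, description, connection) for each repository entry."""
    root = ET.fromstring(xml_text.strip())
    found = []
    # Only direct <repository> children of the root are inspected.
    for repo in root.findall("repository"):
        found.append((
            repo.findtext("name", default=""),
            repo.findtext("description", default=""),
            repo.findtext("connection", default=""),
        ))
    return found


if __name__ == "__main__":
    for name, description, connection in list_repositories(SAMPLE_REPOSITORIES_XML):
        print(f"{name}: {description} (connection: {connection})")
```

To try it on a real file, read the file contents into a string and pass that string to list_repositories in place of the sample.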
Note that you may have more than one repository: different repositories for different projects, different repositories for different versions of a project, a repository just for testing new PDI features, another for serious development, and so on. Therefore, it is important that you give the repositories meaningful names and descriptions so that you can tell them apart.