To create a repository, follow these steps:
Open the MySQL command-line client.
In the command window, type the following command:
CREATE DATABASE PDI_REPO;
Open Spoon.
If the repository dialog does not appear automatically, open it from the Tools | Repository | Connect... menu.
Click on the plus icon to create a new repository. A window with two options appears. Select the Kettle database repository option.
The Repository information dialog shows up. Click on New to create a new database connection.
The database connection window appears. Define a connection to the database you have just created and give the connection the name PDI_REPO_CONN.
Tip
In order to create the database connection, refer to the Time for Action – creating a connection to the Steel Wheels database recipe in Chapter 8, Working with Databases.
Test the connection to see that it is properly configured.
Click on OK to close the database connection window. The Select Database Connection box will show the created connection.
Give the repository an ID and a Name, for example, kettle_repo and My First Repo.
Click on Create or Upgrade.
PDI asks whether you are sure you want to create the repository on the specified database connection. Answer Yes (if you are sure of the settings you entered, of course).
A dialog appears asking if you want to do a dry run to evaluate the generated SQL before execution. Answer No, unless you want to preview the SQL that will create the repository.
A progress window shows you the progress while the repository is being created.
Finally, you see a window with the message Kettle created the repository on the specified connection. Close the dialog window.
Click on OK to close the Repository information window. You will be back in the repository dialog, this time with a new repository available in the repository list.
If you want to start working with the created repository, refer to the Working with the repository storage system section. If not, click on Cancel to close the window.
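If the Test button reports a failure, the first thing to rule out is whether the MySQL server can be reached at all. The following Python sketch only checks that a TCP port accepts connections. The host `localhost` and MySQL's default port 3306 are assumptions, so adjust them to match your server. This check is no substitute for Spoon's own Test, which also validates the user, password, and database name through JDBC.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    This only proves that something is listening on the port; it does not
    check credentials or whether the PDI_REPO database exists.
    """
    try:
        # create_connection resolves the host and tries each address in turn
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unknown hosts, and timeouts
        return False


if __name__ == "__main__":
    # Assumed values: a local MySQL server on its default port 3306
    print(port_is_open("localhost", 3306))
```

If this returns False, the problem lies with the server or the network rather than with the connection settings in Spoon.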
In MySQL, you created a new database named PDI_REPO. Then, you used that database to create a PDI database repository.
A Kettle database repository is a database that provides a storage system for your transformations and jobs. The repository is an alternative to the file-based system, in which transformations are saved as *.ktr files and jobs as *.kjb files.
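If you are moving from the file-based system to a repository, it can help to take stock of the files you already have. The Python sketch below groups the *.ktr and *.kjb files under a folder of your choice. It only lists the files and does not use any PDI import facility. The folder you pass in is your own; nothing here is a PDI default.

```python
from pathlib import Path


def find_pdi_files(root: str) -> dict:
    """Group PDI files under root: transformations (*.ktr) and jobs (*.kjb)."""
    base = Path(root)
    return {
        # rglob searches the whole directory tree below root
        "transformations": sorted(p for p in base.rglob("*.ktr") if p.is_file()),
        "jobs": sorted(p for p in base.rglob("*.kjb") if p.is_file()),
    }


if __name__ == "__main__":
    found = find_pdi_files(".")
    print(len(found["transformations"]), "transformations,", len(found["jobs"]), "jobs")
```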
To create a new database repository, the database that will hold it must already exist. In this section, the repository was created in a MySQL RDBMS; however, you can create your repositories in any JDBC-compliant RDBMS.
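Each JDBC driver identifies a database through a URL with its own format. In Spoon, you fill in the host, port, and database fields rather than typing a URL. The Python sketch below shows the URL shapes for a few RDBMSs, to illustrate that the same repository database could be reached through any of these drivers. The function name and the selection of RDBMSs are illustrative only; this is not PDI's own code.

```python
def jdbc_url(rdbms: str, host: str, port: int, database: str) -> str:
    """Build a JDBC URL for a few common RDBMSs (an illustrative subset only)."""
    templates = {
        "mysql": "jdbc:mysql://{host}:{port}/{db}",
        "postgresql": "jdbc:postgresql://{host}:{port}/{db}",
        "sqlserver": "jdbc:sqlserver://{host}:{port};databaseName={db}",
    }
    try:
        template = templates[rdbms.lower()]
    except KeyError:
        raise ValueError(f"no URL template for {rdbms!r}") from None
    return template.format(host=host, port=port, db=database)


if __name__ == "__main__":
    # The MySQL database created in the steps above, on an assumed local server
    print(jdbc_url("mysql", "localhost", 3306, "PDI_REPO"))
```

For the database created earlier, this prints jdbc:mysql://localhost:3306/PDI_REPO.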
Note that if the repository has already been created from another machine or by another user (that is, another profile in the operating system), you don't have to create it again. In that case, just define the connection to the repository: follow all the instructions, but don't click on the Create or Upgrade button.
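When you are unsure whether someone has already created the repository, you can look for one of its tables before deciding whether to click Create or Upgrade. The Python sketch below uses the standard sqlite3 module as a stand-in for MySQL, so it runs anywhere. Using R_TRANSFORMATION as the marker table is an assumption, so check it against the tables your PDI version actually creates. On MySQL, the equivalent lookup would query information_schema.tables, filtering on table_schema and table_name.

```python
import sqlite3


def repository_exists(conn: sqlite3.Connection, marker_table: str) -> bool:
    """Return True if marker_table exists in the connected database.

    SQLite stand-in: on MySQL, query information_schema.tables instead.
    """
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
        (marker_table,),
    ).fetchone()
    return row is not None


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print(repository_exists(conn, "R_TRANSFORMATION"))  # False: empty database
    conn.execute("CREATE TABLE R_TRANSFORMATION (ID_TRANSFORMATION INTEGER)")
    print(repository_exists(conn, "R_TRANSFORMATION"))  # True
```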
Once you have created a repository, its name, description, and connection information are stored in a file named repositories.xml, located in the Kettle home directory (by default, the .kettle folder inside your user's home directory). The repository database is populated with a bunch of tables with familiar names such as R_TRANSFORMATION, R_JOB, R_STEP, and R_STEP_TYPE.
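To see which repositories are defined on your machine, you can read repositories.xml directly. The Python sketch below assumes a simplified layout: a root repositories element holding repository elements, each with name, description, and connection children. The real file also stores the database connection definitions, so compare this with your own file before relying on it. The default path ~/.kettle/repositories.xml is also an assumption, because the KETTLE_HOME setting can change it.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def list_repositories(xml_data) -> list:
    """Return (name, description, connection) for each <repository> element.

    xml_data may be str or bytes; only <repository> elements are inspected.
    """
    root = ET.fromstring(xml_data)
    result = []
    for repo in root.iter("repository"):
        # findtext returns the default when a child element is missing
        result.append((
            repo.findtext("name", default=""),
            repo.findtext("description", default=""),
            repo.findtext("connection", default=""),
        ))
    return result


if __name__ == "__main__":
    path = Path.home() / ".kettle" / "repositories.xml"  # assumed default location
    if path.exists():
        # read_bytes lets the XML declaration's encoding be honoured
        for name, desc, conn in list_repositories(path.read_bytes()):
            print(f"{name}: {desc} (connection {conn})")
```

Listing the entries this way also makes it easy to spot repositories whose names or descriptions are too similar to tell apart.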
Note that you may have more than one repository: different repositories for different projects or for different versions of a project, one just for testing new PDI features and another for serious development, and so on. Therefore, it is important to give your repositories meaningful names and descriptions so you don't confuse them when you have more than one.