Pentaho Analytics for MongoDB Cookbook

By: Joel André Latino, Harris Ward

Overview of this book

MongoDB is an open source, schemaless NoSQL database system that provides high performance, high availability, and easy scalability for large sets of data. Pentaho is a well-known open source analytics platform, and its features for MongoDB are designed to make organizations more agile and scalable while giving applications greater flexibility, faster performance, and lower costs. Whether you are brand new to these tools or a seasoned expert, this book will provide you with the skills you need to create turnkey analytic solutions that deliver insight and drive value for your organization.

The book begins by taking you through Pentaho Data Integration and how it works with MongoDB. You will then be taken through the Kettle Thin JDBC Driver, which enables a Java application to interact with your data over a JDBC connection. This is followed by exploring a MongoDB collection using Pentaho Instaview and creating reports with MongoDB as a data source using Pentaho Report Designer. The book then teaches you how to explore and visualize your data on the Pentaho BI Server using Pentaho Analyzer, and how to create advanced dashboards with your data. The book concludes by highlighting contributions from the Pentaho community.

Working with jobs and filtering MongoDB data using parameters and variables


In this recipe, we will guide you through creating two PDI jobs: one that uses parameters and the other that uses variables. In a PDI process, jobs orchestrate other jobs and transformations in a coordinated way to implement the main business process.

Both jobs will send data down to a subtransformation, which will be a copy of the transformation created in the previous recipe, with a few changes described in this recipe.

Getting ready

To get ready for this recipe, start your ETL development environment, Spoon, and make sure the MongoDB server is running with the data inserted in the previous recipes.
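
A quick way to confirm that the server is reachable and that your data is in place is a small check with the MongoDB Java driver. The following is only a minimal sketch: the host, port, and database name (SteelWheels here) are assumptions, so replace them with the values you used in the previous recipes.

    import com.mongodb.MongoClient;
    import com.mongodb.client.MongoDatabase;

    // Minimal connectivity check. Host, port, and database name are assumptions;
    // adjust them to match the setup from the previous recipes.
    public class CheckMongo {
        public static void main(String[] args) {
            MongoClient client = new MongoClient("localhost", 27017);
            try {
                MongoDatabase db = client.getDatabase("SteelWheels");
                for (String collection : db.listCollectionNames()) {
                    System.out.println("Found collection: " + collection);
                }
            } finally {
                client.close();
            }
        }
    }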

How to do it…

Let's start with jobs and parameters. We can orchestrate the ETL to run in different ways; in this simple case, we are just filtering by the customer name. Perform the following steps:

  1. Let's copy and paste the transformation created in the previous recipe and save it as chapter1-mongodb-map-reduce-writelog.ktr.

  2. Open that transformation using Spoon, and from the Utility category folder, find the Write to log step. Drag and drop it into the working area in the right-side view.

    1. Create a hop between the OUTPUT step and the Write to log step.

    2. Double-click on the Write to log step to open the configuration dialog.

    3. Set Step Name to MapReduce.

    4. Click on the Get Fields button.

    5. Click on OK to finish the configuration.

  3. Let's create a new empty job.

    1. Click on the New file button on the toolbar and select the Job item entry. Alternatively, from the menu bar, go to File | New | Job.

    2. Open the Job properties dialog by pressing Ctrl + J or by right-clicking on the right-hand-side working area and selecting Job settings.

    3. Select the Job tab. Set Job Name to Job Parameters.

    4. Select the Parameters tab and add a parameter entry named CUSTOMER_NAME. Click on OK.

    5. Save the Job with the name job-parameters.

  4. From the General category folder, find the START, Transformation, and Success steps and drag and drop them into the working area in the right-side view.

    1. Create a hop between the START step and the Transformation step.

    2. Then, create a hop from the Transformation step to the Success step.

    3. Double-click on the Transformation step to open the configuration dialog.

    4. Change the Name of job entry property to MapReduce Transf.

    5. Click on the file selection button for the Transformation filename field and, from your filesystem, select the transformation file that you copied earlier, chapter1-mongodb-map-reduce-writelog.ktr.

    6. Select the Parameters tab. By default, the Pass all parameter values down to the sub-transformation option is checked, which means our job parameter will be passed to the transformation.

    7. Click on OK to finish.

    8. Run the job, analyze the results, and check the logs on the Logging tab.
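
If you later need to launch this job outside Spoon, for example from a Java application, the Kettle API can run the .kjb file and supply the CUSTOMER_NAME parameter programmatically. This is only a hedged sketch: the file path and the customer value are placeholders, and it assumes the PDI (Kettle) libraries are on the classpath.

    import org.pentaho.di.core.KettleEnvironment;
    import org.pentaho.di.job.Job;
    import org.pentaho.di.job.JobMeta;

    // Sketch: run job-parameters.kjb and pass the CUSTOMER_NAME named parameter.
    // The file path and the customer value are placeholders.
    public class RunJobWithParameter {
        public static void main(String[] args) throws Exception {
            KettleEnvironment.init();                                    // initialize the Kettle engine
            JobMeta jobMeta = new JobMeta("/path/to/job-parameters.kjb", null);
            jobMeta.setParameterValue("CUSTOMER_NAME", "Some Customer"); // the parameter defined in Job properties
            Job job = new Job(null, jobMeta);
            job.start();
            job.waitUntilFinished();
            if (job.getErrors() > 0) {
                System.err.println("Job finished with errors");
            }
        }
    }

This mirrors what happens when the job is run from Spoon with a parameter value supplied in the execution dialog.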

Now let's do a quick and simple example using variables:

  1. Copy and paste the chapter1-mongodb-map-reduce-writelog transformation. Save it as chapter1-mongodb-map-reduce-writelog-without-parameter.

  2. Open the transformation with Spoon and remove the parameter from Transformation properties.

  3. Copy and paste the last job. Save it as job-variables.

    1. Open the job with Spoon.

    2. In Job properties, change the job name to Job Variables. On the Parameters tab, remove the CUSTOMER_NAME parameter: select it, right-click on it and choose Delete selected lines, or just press Delete on your keyboard.

    3. Click on OK to finish.

  4. From the General category folder, find the Set variables step and drag and drop it into the working area in the right-side view.

    1. Remove the hop between the START step and the MapReduce Transf step.

    2. Create a hop between the START step and the Set variables step.

    3. Then, create a hop between the Set variables step and the MapReduce Transf step.

    4. Double-click on the Set variables step to open the configuration dialog.

    5. Set the Step name property to Set CUSTOMER_NAME.

    6. In the Variables section, create a new variable named CUSTOMER_NAME. Set its value to an existing customer in the database and set Scope type to Valid in the root job.

    7. Click on OK to finish the configuration.

  5. In the MapReduce Transf job entry, change the Transformation filename so that it points to the copy of the transformation without the parameter, chapter1-mongodb-map-reduce-writelog-without-parameter.ktr.

  6. Run the job and analyze the results, checking the logs in the Logging tab.
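
If you want to drive this variable-based job from a Java application instead of Spoon, the variable can be set on the job's variable space before it runs, which is roughly what the Set variables entry does inside the job itself. Again, this is only a sketch; the file path and the customer value are placeholders.

    import org.pentaho.di.core.KettleEnvironment;
    import org.pentaho.di.job.Job;
    import org.pentaho.di.job.JobMeta;

    // Sketch: run job-variables.kjb with CUSTOMER_NAME supplied as a variable
    // rather than as a named parameter. Path and value are placeholders.
    public class RunJobWithVariable {
        public static void main(String[] args) throws Exception {
            KettleEnvironment.init();
            JobMeta jobMeta = new JobMeta("/path/to/job-variables.kjb", null);
            Job job = new Job(null, jobMeta);
            job.setVariable("CUSTOMER_NAME", "Some Customer"); // visible downstream as ${CUSTOMER_NAME}
            job.start();
            job.waitUntilFinished();
            System.out.println("Errors: " + job.getErrors());
        }
    }

Note that the job built in this recipe already sets CUSTOMER_NAME with its own Set variables entry scoped to the root job, so a value set from outside like this would only take effect if you removed or emptied that entry.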

How it works…

Most ETL solutions created in Pentaho Data Integration are built as sets of jobs and transformations.

Transformations are workflows that manipulate data, essentially flowing rows through input, transformation, and output steps.
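
For instance, a single transformation can be executed on its own with the same Kettle API used in the earlier sketches, with rows flowing from its input steps to its output steps. The sketch below is illustrative only; the file path is a placeholder, and the parameter line is only needed if the .ktr still defines that parameter.

    import org.pentaho.di.core.KettleEnvironment;
    import org.pentaho.di.trans.Trans;
    import org.pentaho.di.trans.TransMeta;

    // Sketch: execute a single transformation (.ktr) directly. The path is a placeholder.
    public class RunTransformation {
        public static void main(String[] args) throws Exception {
            KettleEnvironment.init();
            TransMeta transMeta = new TransMeta("/path/to/chapter1-mongodb-map-reduce-writelog.ktr");
            transMeta.setParameterValue("CUSTOMER_NAME", "Some Customer"); // only if the .ktr defines this parameter
            Trans trans = new Trans(transMeta);
            trans.execute(null);          // start all step threads
            trans.waitUntilFinished();    // block until every step has completed
            System.out.println("Errors: " + trans.getErrors());
        }
    }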

Jobs are workflows that orchestrate tasks, where the execution order can depend on the failure or success of the preceding entries.

Variables and parameters are extremely useful features that we can use to create dynamic jobs and transformations. Once set, a variable such as CUSTOMER_NAME can be referenced in later steps using the ${CUSTOMER_NAME} syntax.