Pentaho Analytics for MongoDB Cookbook

By: Joel Andre Latino, Harris Ward

Overview of this book

MongoDB is an open source, schemaless NoSQL database system that provides high performance, high availability, and easy scalability for large sets of data. Pentaho is a well-known open source analytics platform, and the features it offers for MongoDB are designed to make organizations more agile and scalable while giving applications better flexibility, faster performance, and lower costs. Whether you are brand new to these tools or a seasoned expert, this book will provide you with the skills you need to create turnkey analytic solutions that deliver insight and drive value for your organization. The book begins by taking you through Pentaho Data Integration and how it works with MongoDB. You will then be taken through the Kettle Thin JDBC Driver, which enables a Java application to interact with a database. This is followed by exploring a MongoDB collection using Pentaho Instaview and creating reports with MongoDB as a data source using Pentaho Report Designer. The book then teaches you how to explore and visualize your data on the Pentaho BI Server using Pentaho Analyzer, and how to create advanced dashboards with your data. It concludes by highlighting contributions from the Pentaho community.

Working with jobs and filtering MongoDB data using parameters and variables

In this recipe, we guide you through creating two PDI jobs: one uses parameters and the other uses variables. In a PDI process, jobs orchestrate other jobs and transformations in a coordinated way to realize the main business process. These jobs use the transformation created in the previous recipe, with some changes described here.

So, in this recipe, we are going to create two different jobs that send data to a subtransformation. The subtransformation will be a copy of the transformation from the previous recipe.

Getting ready

To get ready for this recipe, start your ETL development environment, Spoon, and make sure the MongoDB server is running with the data inserted in the previous recipes.
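
If you are setting the environment up from scratch, the following commands are a minimal sketch of how you might launch both tools on Linux or Mac OS X; the installation path and data directory shown here are assumptions and will differ on your machine (on Windows, use Spoon.bat and mongod.exe instead):

  # Launch Spoon from the PDI installation directory (assumed path)
  cd /opt/pentaho/data-integration && ./spoon.sh

  # Start the MongoDB server against your data directory (assumed path)
  mongod --dbpath /data/db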

How to do it…

Let's start with jobs and parameters. We can orchestrate the ETL to run in different ways; in this simple case, we are just filtering by the customer name. Perform the following steps:

  1. Let's copy and paste the transformation created in the previous recipe and save it as chapter1-mongodb-map-reduce-writelog.ktr.
  2. Open that transformation using Spoon, and from the Utility category folder, find the Write to log step. Drag and drop it into the working area in the right-side view.
    1. Create a hop between the OUTPUT step and the Write to log step.
    2. Double-click on the Write to Log step to open the configuration dialog.
    3. Set Step Name to MapReduce.
    4. Click on the Get Fields button.
    5. Click on OK to finish the configuration.
  3. Let's create a new empty job.
    1. Click on the New file button on the toolbar and select the Job item entry. Alternatively, from the menu bar, go to File | New | Job.
    2. Open the Job properties dialog by pressing Ctrl + J or by right-clicking on the right-hand-side working area and selecting Job settings.
    3. Select the Job tab. Set Job Name to Job Parameters.
    4. Select the Parameters tab and add a Parameter entry with the name as CUSTOMER_NAME. Click on OK.
    5. Save the Job with the name job-parameters.
  4. From the General category folder, find the START, Transformation, and Success steps and drag and drop them into the working area in the right-side view.
    1. Create a hop between the START step and the Transformation step.
    2. Then, create a hop from the Transformation step to the Success step.
    3. Double-click on the Transformation step to open the configuration dialog.
    4. Change the Name of job entry property to MapReduce Transf.
    5. Click on the browse button of the Transformation filename field and, in your filesystem, select the chapter1-mongodb-map-reduce-writelog.ktr file that you copied earlier.
    6. Select the Parameters tab. By default, the Pass all parameters values down to the sub-transformation option is checked, which means our job parameter will be passed to the transformation.
    7. Click on OK to finish.
    8. Run the job, analyze the results, and check the logs on the Logging tab.
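
Besides running the job from Spoon, you can also execute it with the Kitchen command-line tool and pass the parameter at launch time. This is only a minimal sketch: the path to the job file, the .kjb extension given on save, and the customer value are assumptions you should adjust to your environment:

  # Run the job and override the CUSTOMER_NAME parameter from the command line
  # ("Jane Doe" is a placeholder; use a customer that exists in your data)
  ./kitchen.sh -file=/path/to/job-parameters.kjb -param:CUSTOMER_NAME="Jane Doe" -level=Basic

If you omit the -param option, the job falls back to whatever default value is defined for the parameter on the Parameters tab.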

Now let's do a quick and simple example using variables:

  1. Copy and paste the chapter1-mongodb-map-reduce-writelog transformation. Save it as chapter1-mongodb-map-reduce-writelog-without-parameter.
  2. Open the transformation with Spoon and remove the parameter from Transformation properties.
  3. Copy and paste the last job. Save it as job-variables.
    1. Open the job with Spoon.
    2. In Job properties, change the job name to Job Variables. On the Parameters tab, remove the CUSTOMER_NAME parameter: select the parameter, right-click on it, and select Delete selected lines, or just press Delete on your keyboard.
    3. Click on OK to finish.
  4. From the General category folder, find the Set variables step and drag and drop it into the working area in the right-side view.
    1. Remove the hop between the START step and the MapReduce Transf step.
    2. Create a hop between the START step and the Set variables step.
    3. Then, create a hop between Set Variables and the MapReduce Transf step.
    4. Double-click on the Set Variables step to open the configuration dialog.
    5. Set the Step name property to Set CUSTOMER_NAME.
    6. In the Variables grid, create a new variable named CUSTOMER_NAME. Set its value to an existing customer in the database and set the Scope type to Valid in the root job.
    7. Click on OK to finish the configuration.
  5. In the MapReduce Transf job entry, change the Transformation filename to point to the transformation without the parameter (chapter1-mongodb-map-reduce-writelog-without-parameter.ktr).
  6. Run the job and analyze the results, checking the logs in the Logging tab.
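
The variables job can be run from the command line in the same way, only this time no parameter needs to be passed, because CUSTOMER_NAME is supplied by the Set CUSTOMER_NAME entry inside the job itself. Again, this is a sketch and the file path and .kjb extension are assumptions:

  # Run the variables job; the customer name comes from the Set variables entry inside the job
  ./kitchen.sh -file=/path/to/job-variables.kjb -level=Basic

To point the job at a different customer, edit the Set CUSTOMER_NAME entry in Spoon; alternatively, PDI can pick up variables defined in $HOME/.kettle/kettle.properties, which is convenient for values that rarely change.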

How it works…

Most ETL solutions created in Pentaho Data Integration will be sets of jobs and transformations.

Transformations are workflows that orchestrate the actions that manipulate data, built essentially from input, transformation, and output steps.

Jobs are workflows that orchestrate tasks, where the execution order can depend on the success or failure of each entry.

Variables and parameters are extremely useful features that we can use to create dynamic jobs and transformations.
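
As a concrete illustration of the difference, a named parameter declared on the transformation's Parameters tab can also be supplied when the transformation is launched on its own with the Pan command-line tool, whereas a variable has to be set before the transformation starts (by a parent job's Set variables entry or in kettle.properties). The path below is an assumption and the customer value is a placeholder:

  # Run the parameterized transformation directly with Pan, supplying the parameter at launch
  ./pan.sh -file=/path/to/chapter1-mongodb-map-reduce-writelog.ktr -param:CUSTOMER_NAME="Jane Doe" -level=Basic

Inside step fields, both parameters and variables are referenced with the same ${CUSTOMER_NAME} syntax, which is why the two jobs in this recipe can reuse essentially the same subtransformation.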
