Pentaho Analytics for MongoDB Cookbook

By: Joel Andre Latino, Harris Ward

Overview of this book

MongoDB is an open source, schemaless NoSQL database system. Pentaho, a well-known open source analysis tool, provides high performance, high availability, and easy scalability for large sets of data. The various features in Pentaho for MongoDB are designed to make organizations more agile and scalable, and to give applications better flexibility, faster performance, and lower costs. Whether you are brand new to online learning or a seasoned expert, this book will provide you with the skills you need to create turnkey analytic solutions that deliver insight and drive value for your organization. The book begins by taking you through Pentaho Data Integration and how it works with MongoDB. You will then be taken through the Kettle Thin JDBC Driver, which enables a Java application to interact with a database. This is followed by an exploration of a MongoDB collection using Pentaho Instant view and the creation of reports with MongoDB as a datasource using Pentaho Report Designer. The book then teaches you how to explore and visualize your data in Pentaho BI Server using Pentaho Analyzer. You will then learn how to create advanced dashboards with your data. The book concludes by highlighting contributions of the Pentaho Community.

Learning basic operations with Pentaho Data Integration

The following recipe is aimed at showing you the basic building blocks that you can use for the rest of the recipes in this chapter. We recommend that you work through this simple recipe before you tackle any of the others. If you want, PDI also contains a large selection of sample transformations for you to open, edit, and test. These can be found in the sample directory of PDI.

Getting ready

Before you begin this recipe, make sure that the JAVA_HOME environment variable is set properly. If it is not set, PDI tries to guess its value by default. Note that for this book, we are using Java 1.7. Once this is done, you're ready to launch Spoon, the graphical development environment for PDI. Spoon is started with the scripts located in the PDI home folder: on Windows, execute the spoon.bat script; on Linux or Mac, execute the spoon.sh bash script instead.
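
As a quick pre-flight check before launching Spoon, the two points above (the JAVA_HOME setting and the per-platform start script) can be sketched in a few lines of Python. This is only an illustrative sketch: the helper names `java_home_status` and `spoon_launcher` and the install path `/opt/data-integration` are assumptions made for this example, not part of PDI.

```python
import os
import platform
from pathlib import Path


def java_home_status(env):
    """Return a short description of the JAVA_HOME setting found in env."""
    value = env.get("JAVA_HOME", "").strip()
    if not value:
        return "JAVA_HOME is not set; PDI will try to guess it"
    return f"JAVA_HOME is set to {value}"


def spoon_launcher(pdi_home, system_name):
    """Pick the Spoon start script for the given operating system name."""
    # spoon.bat on Windows, spoon.sh on Linux and Mac, as described above.
    script = "spoon.bat" if system_name == "Windows" else "spoon.sh"
    return Path(pdi_home) / script


print(java_home_status(os.environ))
# "/opt/data-integration" is a hypothetical PDI home folder.
print(spoon_launcher("/opt/data-integration", platform.system()))
```

The script only reports what it finds; it does not start Spoon itself.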

How to do it…

First, we need to configure Spoon to be able to create transformations and/or jobs. To acclimatize to the tool, perform the following steps:

  1. Create a new empty transformation:
    1. Click on the New file button from the toolbar menu and select the Transformation item entry. You can also navigate to File | New | Transformation from the main menu. Ctrl + N also creates a new transformation.
  2. Set a name for the transformation:
    1. Open the Transformation settings dialog by pressing Ctrl + T. Alternatively, you can right-click on the right-hand-side working area and select Transformation settings. Or on the menu bar, select the Settings... item entry from the Edit menu.
    2. Select the Transformation tab.
    3. Set Transformation Name to First Test Transformation.
    4. Click on the OK button.
  3. Save the transformation:
    1. Click on the Save current file button from the toolbar. Alternatively, from the menu bar, go to File | Save. Or finally, use the quick option by pressing Ctrl + S.
    2. Choose the location of your transformation and give it the name chapter1-first-transformation.
    3. Click on the OK button.
  4. Run the transformation using Spoon:
    1. You can run the transformation in any of these ways: click on the green play icon on the transformation toolbar, navigate to Action | Run on the main menu, or simply press F9.
    2. You will get an Execute a transformation dialog. Here, you can set parameters, variables, or arguments if they are required for running the transformation.
    3. Run the transformation by clicking on the Launch button.
  5. Run the transformation in preview mode using Spoon:
    1. In the Transformation debug dialog, select the step whose output data you want to preview.
    2. After selecting the desired output step, you can preview the transformation by either clicking on the magnify icon on the transformation toolbar, going to Action | Preview on the main menu, or simply pressing F10.
    3. You will get a Transformation debug dialog that you can use to define the number of rows you want to see, breakpoints, and the step that you want to analyze.
    4. You can click on the Configure button to define parameters, variables, or arguments. Click on the Quick Launch button to preview the transformation.
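
The steps above run the transformation from inside Spoon. For comparison, PDI also ships a command-line runner called Pan, which the recipe itself does not use. The following sketch only builds the argument list for such a call; the helper name `pan_command`, both paths, and the .ktr file extension on the saved transformation are assumptions made for this example.

```python
from pathlib import PurePosixPath


def pan_command(pdi_home, ktr_file, log_level="Basic"):
    """Build the argument list for running a transformation with pan.sh."""
    return [
        str(PurePosixPath(pdi_home) / "pan.sh"),
        f"-file={ktr_file}",     # transformation file to execute
        f"-level={log_level}",   # logging verbosity
    ]


# Both paths are hypothetical; adjust them to your environment.
cmd = pan_command(
    "/opt/data-integration",
    "/home/user/chapter1-first-transformation.ktr",
)
print(" ".join(cmd))
```

The resulting list could be passed to `subprocess.run(cmd, check=True)` on Linux or Mac once the paths point at a real installation.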

How it works…

In this recipe, we just introduced the Spoon tool, touching on the main basic points for you to manage ETL transformations. We started by creating a transformation. We gave a name to the transformation, First Test Transformation in this case. Then, we saved the transformation in the filesystem with the name chapter1-first-transformation.
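
Note that the transformation name and the file name are two separate things. Transformations are saved as XML, so the name set in the settings dialog can be read back from the saved file. The sketch below uses a heavily trimmed, hypothetical stand-in for such a file (a real one contains many more elements), and the helper name `transformation_name` is an assumption made for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed stand-in for a saved transformation file.
SAMPLE_KTR = """<transformation>
  <info>
    <name>First Test Transformation</name>
  </info>
</transformation>"""


def transformation_name(ktr_xml):
    """Return the text of info/name, or None if the element is missing."""
    root = ET.fromstring(ktr_xml)
    return root.findtext("info/name")


print(transformation_name(SAMPLE_KTR))
```

Here the file would be called chapter1-first-transformation, while the name stored inside it is First Test Transformation.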

Finally, we ran the transformation normally and in debug mode. Understanding how to run a transformation in debug mode is useful for future ETL developments as it helps you understand what is happening inside of the transformation.

There's more…

In the PDI home folder, you will find a large selection of sample transformations and jobs that you can open, edit, and run to better understand the functionality of the diverse steps available in PDI.