Book Image

Pentaho Analytics for MongoDB Cookbook

By : Joel Andre Latino, Harris Ward
Book Image

Pentaho Analytics for MongoDB Cookbook

By: Joel Andre Latino, Harris Ward

Overview of this book

MongoDB is an open source, schemaless NoSQL database system. Pentaho as a famous open source Analysis tool provides high performance, high availability, and easy scalability for large sets of data. The variant features in Pentaho for MongoDB are designed to empower organizations to be more agile and scalable and also enables applications to have better flexibility, faster performance, and lower costs. Whether you are brand new to online learning or a seasoned expert, this book will provide you with the skills you need to create turnkey analytic solutions that deliver insight and drive value for your organization. The book will begin by taking you through Pentaho Data Integration and how it works with MongoDB. You will then be taken through the Kettle Thin JDBC Driver for enabling a Java application to interact with a database. This will be followed by exploration of a MongoDB collection using Pentaho Instant view and creating reports with MongoDB as a datasource using Pentaho Report Designer. The book will then teach you how to explore and visualize your data in Pentaho BI Server using Pentaho Analyzer. You will then learn how to create advanced dashboards with your data. The book concludes by highlighting contributions of the Pentaho Community.
Table of Contents (10 chapters)
9
Index

Loading data from MongoDB to MySQL

In this recipe, we will guide you through extracting data from MongoDB and inserting it into a MySQL database. You will create a simple transformation as you did in the last recipe, but in reverse. You don't have to use MySQL as your database. If you want, you can use any other database. You just need to make sure that you can connect to Pentaho Data Integration via JDBC. However, in this book, we will use MySQL as an example.

Getting ready

Make sure you have created a MySQL database server or some other database type server with a database called SteelWheels. Also make sure that your MongoDB instance is running and launch Spoon.

How to do it…

After you have made sure that you have the databases set up, perform the following steps:

  1. Create a new empty transformation.
    1. Set the name for this transformation to Loading data from MongoDB to MySQL.
    2. Save the transformation with the name chapter1-mongodb-to-mysql.
  2. Select Customers from MongoDB using the MongoDB Input step.
    1. Select the Design tab in the left-hand-side view.
    2. From the Big Data category folder, find the MongoDB Input step and drag and drop it into the working area in the right-hand-side view.
    3. Double-click on the MongoDB Input step to open the configuration dialog.
    4. Set the Step Name property to Select Customers.
    5. Select the Input options tab. Click on Get DBs and select SteelWheels from the Database select box.
    6. After selecting the database, you can click on the Get Collections button and then select Customers Collection from the select box.
    7. As we're just running one MongoDB instance, we'll keep Read preference as primary and will not configure any Tag set specification.
    8. Click on the Query tab. In this section, we'll define the where filter data condition and the fields that we want to extract.
    9. As we just want the customers from USA, we'll write the following query in the Query expression (JSON) field: {"address.country": "USA"}.

      Note

      In this recipe, we are not going to cover the MongoDB aggregation framework, so you can ignore those options for now.

    10. Click on the Fields tab. In this tab, we'll define the output fields that we want. By default, the Output single JSON field comes checked. This means that each document is extracted in the JSON format with the field name defined in the Name of JSON output field. As we want to define the fields, we remove the selection of the Output single JSON field.
    11. Click on the Get fields button and you will get all the fields available from MongoDB. Remove the _id field because it isn't necessary. For deletion, you can select the row of the _id field and press the Delete key from your keyboard, or right-click on the row and select the Delete selected lines option.
    12. Click on OK to finish the MongoDB input configuration.
  3. Let's configure the output of the MongoDB Customers data in the MySQL database.
    1. On the Design tab, from the Output category folder, find the Table Output step and drag and drop it into the working area in the right-hand-side view.
    2. Connect the MongoDB Input step to the Table output step by creating a hop between them.
    3. Double-click on the step to open the Table Output configuration dialog.
    4. Set Step Name to Customers Output.
    5. Click on the New button next to the Database Connection pulldown. This will open the Database Connection dialog.

      Set Connection Name to SteelWheels. Select the Connection Type as MySQL. Set Host Name to localhost, Database Name to SteelWheels, and Port to 3306. Then, set Username and Password to whatever you had set them as. Your setup should look similar to the following screenshot:

      How to do it…
    6. Test this, and if all is well, click on OK to return to the Table Output step.
  4. Insert this data into a MySQL table using the Table Output step:
    1. Set the Target table field to Customers. This is the name of the MySQL table to insert data into.
    2. As we haven't created a customer's table in the MySQL database, we can use a PDI function that will try to generate the required SQL to create the table and structure. Simply click on the SQL button and it will open the Execute SQL dialog. Here, you will see the SQL that PDI will execute to create the customers table. Click on Execute to send this SQL to MySQL and create the table. Then, click on OK.
    3. Click on OK again to exit the Table Output configuration dialog. The transformation is complete. You can now run it to load data from MongoDB to MySQL.

How it works…

In this transformation, we are simply selecting a collection from the MongoDB Input step where the country field is USA. Next, we map this collection to the fields in the PDI stream. Lastly, we insert this data into a MySQL table using the Table Output step. In the Fields tab, we use JSONPath to select the correct data from the MongoDB collection (http://goessner.net/articles/JsonPath/). JSONPath is like XPath for JSON documents.