Pentaho Analytics for MongoDB Cookbook

By: Joel André Latino, Harris Ward

Overview of this book

MongoDB is an open source, schemaless NoSQL database system. Pentaho, a well-known open source analytics platform, provides high performance, high availability, and easy scalability for large sets of data. Pentaho's features for MongoDB are designed to help organizations become more agile and scalable, and to give applications better flexibility, faster performance, and lower costs. Whether you are brand new to data analytics or a seasoned expert, this book will give you the skills you need to create turnkey analytic solutions that deliver insight and drive value for your organization. The book begins by taking you through Pentaho Data Integration and how it works with MongoDB. You will then be taken through the Kettle Thin JDBC Driver, which enables a Java application to interact with a database. This is followed by exploring a MongoDB collection using Pentaho Instaview and creating reports with MongoDB as a data source using Pentaho Report Designer. The book then teaches you how to explore and visualize your data in the Pentaho BI Server using Pentaho Analyzer, and how to create advanced dashboards with your data. It concludes by highlighting contributions of the Pentaho community.

Migrating data from files to MongoDB


In this recipe, we will guide you through creating a transformation that loads data from different files in your filesystem and then writes it to a MongoDB collection. We are going to load data from files called orders.csv, customers.xls, and products.xml. Each of these files contains a key that we can use to join the data in PDI before we send it to the MongoDB Output step.

Getting ready

Start Spoon and take a look at the content of the orders.csv, customers.xls, and products.xml files. This will help you understand what the data looks like before you start loading it into MongoDB.
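If you prefer to check the files from code before building the transformation, a small Python sketch like the following previews a semicolon-delimited file the way the CSV file input step will read it. The sample rows and column names here are illustrative assumptions, not the actual contents of orders.csv:

```python
import csv
import io
from itertools import islice

# Sample rows standing in for orders.csv -- the real file uses the same
# semicolon delimiter, but these column names and values are made up.
SAMPLE = (
    "Order Number;Customer Number;Product Code\n"
    "10100;363;S18_1749\n"
    "10101;128;S18_2795\n"
)

def preview_csv(handle, delimiter=";", rows=5):
    """Return the header plus the first few data rows of a delimited file."""
    reader = csv.reader(handle, delimiter=delimiter)
    return list(islice(reader, rows + 1))

# With a real file you would pass open("orders.csv", newline="") instead.
rows = preview_csv(io.StringIO(SAMPLE))
print(rows[0])  # the header row
```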

How to do it…

You will need the orders.csv, customers.xls, and products.xml files. If you don't have them, they are available from the Packt Publishing website. Make sure that MongoDB is up and running, and then perform the following steps:

  1. Create a new empty transformation.

    1. Set the transformation name to Migrate data from files to MongoDB.

    2. Save the transformation with the name chapter1-files-to-mongodb.

  2. Select data from the orders.csv file using the CSV file input step.

    1. Select the Design tab in the left-hand-side view.

    2. From the Input category folder, find the CSV file input step and drag and drop it into the working area in the right-hand-side view.

    3. Double-click on the step to open the CSV Input configuration dialog.

    4. Set Step Name to Select Orders.

    5. In the Filename field, click on the Browse button, navigate to the location of the .csv file, and select the orders.csv file.

    6. Set the Delimiter field to a semicolon (;).

    7. Now, let's define our output fields by clicking on the Get Fields button. A Sample size dialog will appear; it is used to analyze the format of the data in the CSV file. Click on OK. Then, click on Close in the Scan results dialog.

    8. Click on OK to finish the configuration of the CSV file input.

  3. Select data from the customers.xls file using the Microsoft Excel Input step.

    1. Select the Design tab in the left-hand-side view.

    2. From the Input category folder, find the Microsoft Excel Input step and drag and drop it into the working area in the right-hand-side view.

    3. Double-click on the step to open the Microsoft Excel Input dialog.

    4. Set Step Name to Select Customers.

    5. On the Files tab, in the File or directory field, click on the Browse button and choose the location of the customers.xls file in your filesystem. After that, click on the Add button to add the file to the list of files to be processed.

    6. Select the Sheets tab. Then, click on the Get sheetname(s)... button. You'll be shown an Enter list dialog. Select Sheet1 and click on the > button to add a sheet to the Your selection list. Finally, click on OK.

    7. Select the Fields tab. Then, click on the Get fields from header row... button. This will generate a list of the fields in the spreadsheet. You will have to make one small change: set the Type field for Customer Number from Number to Integer. You can preview the file data by clicking on the Preview rows button.

    8. Click on OK to finish the configuration of the Select Customers step.

  4. Select data from the products.xml file using the Get data from XML step.

    1. Select the Design tab in the left-hand-side view.

    2. From the Input category folder, find the Get data from XML step and drag and drop it into the working area in the right-hand-side view.

    3. Double-click on the step to open the Get data from XML dialog.

    4. Set Step Name to Select Products.

    5. On the File tab, in the File or directory field, click on the Browse button and choose the location of the products.xml file in your filesystem. After that, click on the Add button to add the file to the list of files to be processed.

    6. Select the Content tab. Click on the Get XPath nodes button and select the /products/product option from the list of the Available Paths dialog.

    7. Next, select the Fields tab. Click on the Get fields button and you will get a list of available fields in the XML file. Change the types of the last three fields (stockquantity, buyprice, and MSRP) from Number to Integer. Set the Trim Type to Both for all fields.

  5. Now, let's join the data from the three different files.

    1. Select the Design tab in the left-hand-side view.

    2. From the Lookup category folder, find the Stream lookup step. Drag and drop it onto the working area in the right-hand-side view. Double-click on Stream lookup and change the Step name field to Lookup Customers.

    3. We are going to need two lookup steps for this transformation. Drag and drop another Stream Lookup step onto the design view, and set Step Name to Lookup Products.

    4. Create a hop between the Select Orders step and the Lookup Customers step.

    5. Then, create a hop from the Select Customers step to the Lookup Customers step.

    6. Next, create a hop from the Lookup Customers step to the Lookup Products step.

    7. Finally, create a hop from Select Products to the Lookup Products step.

  6. Let's configure the Lookup Customers step. Double-click on the Lookup Customers step and set the Lookup step field to the Select Customers option.

    1. In the Keys section, set the Field and Lookup Field options to Customer Number.

    2. Click on the Get lookup fields button. This will populate the step with all the available fields from the lookup source. Remove the Customer Number field from the list.

    3. Click on OK to finish.

  7. Let's configure the Lookup Products step. The process is similar to that of the Lookup Customers step but with different values. Double-click on the Lookup Products step and set the Lookup step field to the Select Products option.

    1. In the Keys section, set Field to Product Code and the Lookup Field option to Code.

    2. Click on the Get lookup fields button. This will populate the step with all the available fields from the lookup source. Remove the Code field from the list.

    3. Click on OK to finish.

  8. Now that we have the data joined correctly, we can write the data stream to a MongoDB collection.

    1. On the Design tab, from the Big Data category folder, find the MongoDB Output step and drag and drop it into the working area in the right-hand-side view.

    2. Create a hop between the Lookup Products step and the MongoDB Output step.

    3. Double-click on the MongoDB Output step and change the Step name field to Orders Output.

    4. Select the Output options tab. Click on the Get DBs button and select the SteelWheels option for the Database field. Set the Collection field to Orders. Check the Truncate collection option.

    5. Select the Mongo document fields tab. Click on the Get fields button and you will get a list of fields from the previous step.

    6. Configure the Mongo document output as seen in the following screenshot:

    7. Click on OK.

  9. You can run the transformation and check out MongoDB for the new data. Your transformation should look like the one in this screenshot:
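As a rough code analogue of the Orders Output step configured above, the following sketch shapes a joined row into a MongoDB document. The flat one-field-per-path layout and the sample values are assumptions, since the actual mapping comes from the Mongo document fields screenshot; the pymongo lines are left commented out because they require a running MongoDB instance:

```python
def to_document(row):
    """Turn one joined PDI row (a plain dict) into a MongoDB document.

    Assumes a flat 1:1 field-to-path mapping; a nested layout would
    instead build sub-documents here.
    """
    return dict(row)

# Illustrative joined row -- field names are assumptions, not the real ones.
doc = to_document({
    "Order Number": 10100,
    "Customer Name": "Online Diecast Creations Co.",
    "Product Code": "S18_1749",
})

# Writing it with pymongo would look roughly like this:
# from pymongo import MongoClient
# collection = MongoClient("localhost", 27017)["SteelWheels"]["Orders"]
# collection.drop()          # the code analogue of "Truncate collection"
# collection.insert_one(doc)

print(sorted(doc))  # the document's field names
```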

How it works…

In this transformation, we initially get data from the orders CSV file. This first step populates the primary data stream in PDI. The XLS and XML input steps collect data into two further streams, which we connect to the first one using the Stream lookup steps and the correct keys. Once all of the data is in a single stream, we can load it into the MongoDB collection.
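In code terms, a Stream lookup step amounts to building an in-memory index of the lookup stream keyed on the lookup field, then enriching each row of the main stream. A minimal sketch of the Lookup Customers step, using illustrative field values rather than the real file contents:

```python
# Illustrative rows -- field names and values are assumptions.
orders = [
    {"Order Number": 10100, "Customer Number": 363, "Product Code": "S18_1749"},
    {"Order Number": 10101, "Customer Number": 128, "Product Code": "S18_2795"},
]
customers = [
    {"Customer Number": 363, "Customer Name": "Online Diecast Creations Co."},
    {"Customer Number": 128, "Customer Name": "Blauer See Auto, Co."},
]

# Index the lookup stream on its key field (the Keys section of the step)...
by_customer = {c["Customer Number"]: c for c in customers}

# ...then enrich each main-stream row, dropping the duplicate key field just
# as the recipe removes Customer Number from the lookup fields list.
joined = []
for order in orders:
    extra = dict(by_customer.get(order["Customer Number"], {}))
    extra.pop("Customer Number", None)
    joined.append({**order, **extra})

print(joined[0]["Customer Name"])
```

The Lookup Products step works the same way, except that the main-stream key (Product Code) and the lookup-stream key (Code) have different names.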

You can learn more about the Stream lookup step online at:

http://wiki.pentaho.com/display/EAI/Stream+Lookup