Pentaho Analytics for MongoDB Cookbook

By: Joel André Latino, Harris Ward

Overview of this book

MongoDB is an open source, schemaless NoSQL database system that provides high performance, high availability, and easy scalability for large sets of data. Pentaho is a well-known open source analytics platform, and its features for MongoDB are designed to make organizations more agile and scalable while giving applications greater flexibility, faster performance, and lower costs. Whether you are brand new to these tools or a seasoned expert, this book will provide you with the skills you need to create turnkey analytic solutions that deliver insight and drive value for your organization.

The book begins by taking you through Pentaho Data Integration and how it works with MongoDB. You will then be taken through the Kettle Thin JDBC Driver, which enables a Java application to interact with your data over a JDBC connection. This is followed by exploring a MongoDB collection using Pentaho Instaview and creating reports with MongoDB as a data source using Pentaho Report Designer. The book then teaches you how to explore and visualize your data on the Pentaho BI Server using Pentaho Analyzer, and how to create advanced dashboards with your data. The book concludes by highlighting contributions from the Pentaho community.

Working with jobs and filtering MongoDB data using parameters and variables


In this recipe, we will guide you through creating two PDI jobs: one that uses parameters and the other that uses variables. In a PDI process, jobs orchestrate other jobs and transformations in a coordinated way to implement the main business process.

Both jobs will send data down to a subtransformation, which will be a copy of the transformation created in the previous recipe, with a few changes described in this recipe.

Getting ready

To get ready for this recipe, start your ETL development environment, Spoon, and make sure the MongoDB server is running with the data inserted in the previous recipes.
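
A quick way to confirm that the server is reachable and that your data is in place is a small check with the MongoDB Java driver. The following is only a minimal sketch: the host, port, and database name (SteelWheels here) are assumptions, so replace them with the values you used in the previous recipes.

    import com.mongodb.MongoClient;
    import com.mongodb.client.MongoDatabase;

    // Minimal connectivity check. Host, port, and database name are assumptions;
    // adjust them to match the setup from the previous recipes.
    public class CheckMongo {
        public static void main(String[] args) {
            MongoClient client = new MongoClient("localhost", 27017);
            try {
                MongoDatabase db = client.getDatabase("SteelWheels");
                for (String collection : db.listCollectionNames()) {
                    System.out.println("Found collection: " + collection);
                }
            } finally {
                client.close();
            }
        }
    }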

How to do it…

Let's start with jobs and parameters. We can orchestrate the ETL to run in different ways; in this simple case, we are just filtering by the customer name. Perform the following steps:

  1. Let's copy and paste the transformation created in the previous recipe and save it as chapter1-mongodb-map-reduce-writelog.ktr.

  2. Open that transformation using Spoon, and from the Utility category folder, find the Write to log step. Drag and drop it into the working area in the right-side view.

    1. Create a hop between the OUTPUT step and the Write to log step.

    2. Double-click on the Write to log step to open the configuration dialog.

    3. Set Step Name to MapReduce.

    4. Click on the Get Fields button.

    5. Click on OK to finish the configuration.

  3. Let's create a new empty job.

    1. Click on the New file button on the toolbar and select the Job item entry. Alternatively, from the menu bar, go to File | New | Job.

    2. Open the Job properties dialog by pressing Ctrl + J or by right-clicking on the right-hand-side working area and selecting Job settings.

    3. Select the Job tab. Set Job Name to Job Parameters.

    4. Select the Parameters tab and add a parameter entry named CUSTOMER_NAME. Click on OK.

    5. Save the Job with the name job-parameters.

  4. From the General category folder, find the START, Transformation, and Success steps and drag and drop them into the working area in the right-side view.

    1. Create a hop between the START step and the Transformation step.

    2. Then, create a hop from the Transformation step to the Success step.

    3. Double-click on the Transformation step to open the configuration dialog.

    4. Change the Name of job entry property to MapReduce Transf.

    5. Click on the file selection button for the Transformation filename field and, from your filesystem, select the transformation file that you copied earlier, chapter1-mongodb-map-reduce-writelog.ktr.

    6. Select the Parameters tab. By default, the Pass all parameter values down to the sub-transformation option is checked, which means our job parameter will be passed to the transformation.

    7. Click on OK to finish.

    8. Run the job, analyze the results, and check the logs on the Logging tab.
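
If you later need to launch this job outside Spoon, for example from a Java application, the Kettle API can run the .kjb file and supply the CUSTOMER_NAME parameter programmatically. This is only a hedged sketch: the file path and the customer value are placeholders, and it assumes the PDI (Kettle) libraries are on the classpath.

    import org.pentaho.di.core.KettleEnvironment;
    import org.pentaho.di.job.Job;
    import org.pentaho.di.job.JobMeta;

    // Sketch: run job-parameters.kjb and pass the CUSTOMER_NAME named parameter.
    // The file path and the customer value are placeholders.
    public class RunJobWithParameter {
        public static void main(String[] args) throws Exception {
            KettleEnvironment.init();                                    // initialize the Kettle engine
            JobMeta jobMeta = new JobMeta("/path/to/job-parameters.kjb", null);
            jobMeta.setParameterValue("CUSTOMER_NAME", "Some Customer"); // the parameter defined in Job properties
            Job job = new Job(null, jobMeta);
            job.start();
            job.waitUntilFinished();
            if (job.getErrors() > 0) {
                System.err.println("Job finished with errors");
            }
        }
    }

This mirrors what happens when the job is run from Spoon with a parameter value supplied in the execution dialog.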

Now let's do a quick and simple example using variables:

  1. Copy and paste the chapter1-mongodb-map-reduce-writelog transformation. Save it as chapter1-mongodb-map-reduce-writelog-without-parameter.

  2. Open the transformation with Spoon and remove the parameter from Transformation properties.

  3. Copy and paste the last job. Save it as job-variables.

    1. Open the job with Spoon.

    2. In Job properties, change the job name to Job Variables. On the Parameters tab, remove the CUSTOMER_NAME parameter: select it, right-click on it and choose Delete selected lines, or just press Delete on your keyboard.

    3. Click on OK to finish.

  4. From the General category folder, find the Set variables step and drag and drop it into the working area in the right-side view.

    1. Remove the hop between the START step and the MapReduce Transf step.

    2. Create a hop between the START step and the Set variables step.

    3. Then, create a hop between the Set variables step and the MapReduce Transf step.

    4. Double-click on the Set variables step to open the configuration dialog.

    5. Set the Step name property to Set CUSTOMER_NAME.

    6. In the Variables section, create a new variable named CUSTOMER_NAME. Set its value to an existing customer in the database and set Scope type to Valid in the root job.

    7. Click on OK to finish the configuration.

  5. In the MapReduce Transf job entry, change the Transformation filename so that it points to the copy of the transformation without the parameter, chapter1-mongodb-map-reduce-writelog-without-parameter.ktr.

  6. Run the job and analyze the results, checking the logs in the Logging tab.
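
If you want to drive this variable-based job from a Java application instead of Spoon, the variable can be set on the job's variable space before it runs, which is roughly what the Set variables entry does inside the job itself. Again, this is only a sketch; the file path and the customer value are placeholders.

    import org.pentaho.di.core.KettleEnvironment;
    import org.pentaho.di.job.Job;
    import org.pentaho.di.job.JobMeta;

    // Sketch: run job-variables.kjb with CUSTOMER_NAME supplied as a variable
    // rather than as a named parameter. Path and value are placeholders.
    public class RunJobWithVariable {
        public static void main(String[] args) throws Exception {
            KettleEnvironment.init();
            JobMeta jobMeta = new JobMeta("/path/to/job-variables.kjb", null);
            Job job = new Job(null, jobMeta);
            job.setVariable("CUSTOMER_NAME", "Some Customer"); // visible downstream as ${CUSTOMER_NAME}
            job.start();
            job.waitUntilFinished();
            System.out.println("Errors: " + job.getErrors());
        }
    }

Note that the job built in this recipe already sets CUSTOMER_NAME with its own Set variables entry scoped to the root job, so a value set from outside like this would only take effect if you removed or emptied that entry.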

How it works…

Most ETL solutions created in Pentaho Data Integration are built as sets of jobs and transformations.

Transformations are workflows that manipulate data, essentially flowing rows through input, transformation, and output steps.
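
For instance, a single transformation can be executed on its own with the same Kettle API used in the earlier sketches, with rows flowing from its input steps to its output steps. The sketch below is illustrative only; the file path is a placeholder, and the parameter line is only needed if the .ktr still defines that parameter.

    import org.pentaho.di.core.KettleEnvironment;
    import org.pentaho.di.trans.Trans;
    import org.pentaho.di.trans.TransMeta;

    // Sketch: execute a single transformation (.ktr) directly. The path is a placeholder.
    public class RunTransformation {
        public static void main(String[] args) throws Exception {
            KettleEnvironment.init();
            TransMeta transMeta = new TransMeta("/path/to/chapter1-mongodb-map-reduce-writelog.ktr");
            transMeta.setParameterValue("CUSTOMER_NAME", "Some Customer"); // only if the .ktr defines this parameter
            Trans trans = new Trans(transMeta);
            trans.execute(null);          // start all step threads
            trans.waitUntilFinished();    // block until every step has completed
            System.out.println("Errors: " + trans.getErrors());
        }
    }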

Jobs are workflows that orchestrate tasks, where the execution order can depend on the failure or success of the preceding entries.

Variables and parameters are extremely useful features that we can use to create dynamic jobs and transformations. Once set, a variable such as CUSTOMER_NAME can be referenced in later steps using the ${CUSTOMER_NAME} syntax.