MongoDB and Pentaho go together like yin and yang. They are emerging as a powerful combination for scalable data storage, processing, and analytics. Leading companies are pairing these complementary technologies in development labs and in production to deliver innovative analytics, and these innovations are creating worldwide demand for developers with skills in both Pentaho and MongoDB.
You want to make an impact by creating innovative data storage capabilities or eye-catching data visualizations. Wouldn't it be great if you could quickly ramp up on both technologies and develop a turnkey solution for your organization? However, as with any new and emerging technology combination, organized knowledge on the combined topic is scarce.
Pentaho Analytics for MongoDB will show you how to develop an analytic solution that you can demonstrate to your colleagues. It is a practical guide to get you started with both Pentaho and MongoDB, beginning with basic MongoDB data modeling and querying and then advancing to data integration, analysis, and reporting with Pentaho. Each chapter guides you through using different components of the Pentaho platform to create analytic models and reports using a sample MongoDB database.
Chapter 1, Getting Started with Pentaho and MongoDB, introduces you to the powerful combination of MongoDB and Pentaho and provides step-by-step guidance on how to install and configure both technologies and restore the sample MongoDB data provided with this book.
Chapter 2, MongoDB Database Fundamentals, expands on the topic of data modeling and explains MongoDB database concepts essential to querying MongoDB data with Pentaho.
Chapter 3, Using Pentaho Instaview, shows you how to visualize data by connecting Pentaho to MongoDB. You use Instaview with the sample MongoDB database to analyze and visualize the website clickstream data.
Chapter 4, Modifying and Enhancing Instaview Transformations, introduces Pentaho Data Integration (PDI), the ETL tool used by Instaview to extract, transform, and load data from various data sources.
Chapter 5, Modifying and Enhancing Instaview Metadata, explores metadata by explaining dimensional modeling concepts and how to model metadata to better reflect business requirements.
Chapter 6, Pentaho Report Designer Fundamentals, teaches you the basics of Pentaho Report Designer (PRD) to build pixel-perfect reports sourced directly from MongoDB databases.
Chapter 7, Pentaho Report Designer Prompting and Charting, expands on the previous chapter by teaching you additional advanced PRD features. You can enhance your report with new queries, charts, and a prompt designed to make the report more interactive.
Chapter 8, Deploying Pentaho Analytics to the Web, is all about web-enabling your MongoDB data using Pentaho methods and web interfaces for connecting to, modeling, and analyzing our sample clickstream data in a web browser.
We need the following software for this book:
Pentaho Business Analytics v5.0.2 (64-bit for Windows)
MongoDB v2.2.3 (64-bit for Windows)
This book provides two data sources for use throughout: a MongoDB database of sample web clickstream data and an associated comma-separated values (CSV) file containing geographic data. Both files are available as a free download from http://www.packtpub.com/support.
This book is intended for business analysts, data architects, and developers new to either Pentaho or MongoDB, who want to be able to deliver a complete solution for storing, processing, and visualizing data. It's assumed that you already have experience in defining the data requirements needed to support business processes and exposure to database modeling, SQL query, and reporting techniques.
In this book, you will find a number of styles of text that distinguish between different kinds of information. The following are some examples of these styles and an explanation of their meaning.
Code words in text are shown as follows: "$.event_data[0].event."
When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:
{ $match : { referring_url : "${ReferringURLParam}" } },
{ $unwind : "$event_data" },
{ $group : { _id : "$browser", event_count : { $sum : 1 } } },
{ $sort : { event_count : -1 } }
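The pipeline fragment above chains four aggregation stages. The following minimal pure-Python sketch shows what each stage does to the data. It assumes that Pentaho has already substituted ${ReferringURLParam} with a concrete URL, and the sample documents and the run_pipeline function are hypothetical, invented only for illustration. This emulates the stage semantics; it is not how MongoDB or Pentaho actually executes the pipeline.

```python
from collections import defaultdict

# Hypothetical sample documents; field names come from the pipeline above,
# but the values are invented for illustration only.
SAMPLE_DOCS = [
    {"referring_url": "http://www.example.com/", "browser": "Chrome",
     "event_data": [{"event": "click"}, {"event": "view"}]},
    {"referring_url": "http://www.example.com/", "browser": "Firefox",
     "event_data": [{"event": "view"}]},
    {"referring_url": "http://other.example.com/", "browser": "Chrome",
     "event_data": [{"event": "click"}]},
]


def run_pipeline(docs, referring_url):
    """Hypothetical emulation of the $match/$unwind/$group/$sort stages."""
    # $match: keep documents whose referring_url equals the parameter value
    matched = [d for d in docs if d.get("referring_url") == referring_url]

    # $unwind: emit one copy of each document per element of event_data;
    # documents with a missing or empty array produce no output
    unwound = []
    for d in matched:
        for event in d.get("event_data", []):
            copy = dict(d)
            copy["event_data"] = event
            unwound.append(copy)

    # $group: count the unwound documents per browser ({ $sum : 1 })
    counts = defaultdict(int)
    for d in unwound:
        counts[d.get("browser")] += 1
    grouped = [{"_id": b, "event_count": n} for b, n in counts.items()]

    # $sort: order by event_count descending (-1)
    return sorted(grouped, key=lambda g: g["event_count"], reverse=True)


result = run_pipeline(SAMPLE_DOCS, "http://www.example.com/")
print(result)
```

Because $unwind runs before $group, each element of event_data is counted separately, so event_count is a number of events per browser rather than a number of sessions.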
Any command-line input or output is written as follows:
cd \
move C:\mongodb-win32-* C:\mongodb
New terms and important words are shown in bold. Words that you see on the screen, in menus or dialog boxes for example, appear in the text like this: "Select and drag the CSV file input step onto the canvas."
Feedback from our readers is always welcome. Let us know what you think about this book: what you liked or may have disliked. Reader feedback is important to us because it helps us develop titles that you will really get the most out of.
To send us general feedback, simply send an e-mail to <[email protected]>, and mention the book title in the subject of your message.
If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide on www.packtpub.com/authors.
Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.
You can download the example code files for all Packt books you have purchased from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.
Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books—maybe a mistake in the text or the code—we would be grateful if you would report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/support, selecting your book, clicking on the errata submission form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded to our website, or added to any list of existing errata, under the Errata section of that title.
Piracy of copyright material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works, in any form, on the Internet, please provide us with the location address or website name immediately so that we can pursue a remedy.
Please contact us at <[email protected]> with a link to the suspected pirated material.
We appreciate your help in protecting our authors, and our ability to bring you valuable content.
You can contact us at <[email protected]> if you are having a problem with any aspect of the book, and we will do our best to address it.