Pentaho for Big Data Analytics

By: Manoj R Patil, Feris Thia

Overview of this book

Pentaho accelerates the realization of value from big data with the most complete solution for big data analytics and data integration. The real power of big data analytics is the abstraction between data and analytics. Data can be distributed across the cluster in various formats, and the analytics platform should be able to talk to heterogeneous data stores and fetch the filtered data to enrich its value.

Pentaho for Big Data Analytics is a practical, hands-on guide that provides clear, step-by-step exercises for using Pentaho to take advantage of big data systems, where data beats algorithms, and gives you a good grounding in the capabilities of Pentaho Business Analytics.

This book looks at the key ingredients of the Pentaho Business Analytics platform. We will see how to prepare the Pentaho BI environment and get to grips with the big data ecosystem. The book provides a clear guide to the essential tools of Pentaho Business Analytics, building familiarity with both the design tools for setting up reports and the visualization tools necessary for complete data analysis.

Importing data to Hive


Before beginning the walkthrough, complete the Hive nyse_stocks data preparation described in Appendix A, Big Data Sets. Then follow these steps:

  1. Launch Spoon if you have closed it.

  2. On the File menu, click on New and select Transformation.

  3. On the left-hand side panel, click on the View tab.

  4. Right-click on the Database connections node to open the context menu, and choose New.

The following screenshot shows you how to create a new database connection:

When the Database Connection dialog appears, fill in the following configuration:

  • Connection Name: HIVE2

  • Connection Type: Hadoop Hive 2

  • Host Name: [your working IP address]

  • Database Name: default
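For orientation, the settings above correspond roughly to a HiveServer2-style JDBC URL. Spoon assembles the actual connection string itself, so the following is only a minimal sketch of that mapping. The `HiveConnection` class is a hypothetical helper, not part of Pentaho. Port 10000 is an assumption (the usual HiveServer2 default; the text does not give a port), and `192.0.2.10` is a placeholder for your working IP address.

```python
from dataclasses import dataclass

# Assumption: the usual HiveServer2 default port; the walkthrough does not state one.
DEFAULT_HIVESERVER2_PORT = 10000


@dataclass(frozen=True)
class HiveConnection:
    """Hypothetical helper mirroring the fields of the Database Connection dialog."""

    name: str                  # Connection Name, e.g. HIVE2
    host: str                  # Host Name (your working IP address)
    database: str = "default"  # Database Name
    port: int = DEFAULT_HIVESERVER2_PORT

    def jdbc_url(self) -> str:
        """Return a HiveServer2-style JDBC URL for these settings."""
        return f"jdbc:hive2://{self.host}:{self.port}/{self.database}"


# 192.0.2.10 is a placeholder documentation address; substitute your own host.
conn = HiveConnection(name="HIVE2", host="192.0.2.10")
print(conn.jdbc_url())  # prints jdbc:hive2://192.0.2.10:10000/default
```

If the connection test in Spoon fails, comparing the host, port, and database in a URL like this against your Hive server's settings may help narrow down the mismatch.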

Now follow these steps:

  1. Click on the Test button to verify the connection. If the test succeeds, click on the OK button to close the dialog. The display window will look like the following screenshot:

  2. On the left-hand side panel, click on the Design tab.

  3. In the Input group, click on the Table input step and drag it into the working space. The following screenshot...