Mastering Hadoop

By: Sandeep Karanth

Data analytics workflow


Data analytics involves transforming and inspecting data to extract the meaningful information inherent in it. The extracted information is then used to make decisions or to suggest conclusions. The analytics workflow is shown in the following diagram:

The steps involved in the analytics workflow are as follows:

  1. The first step is to identify the problem to be solved. This is important because the decisions in every later step hinge on it. For example, the problem statement dictates what kind of data to collect and which features are important in representing a solution to the problem. Data analytics requires a lot of domain expertise, so working in a problem space where that expertise is accessible is almost mandatory.

  2. Once the problem is identified, appropriate data needs to be collected. The collected data should be represented in a format that minimizes storage space without losing resolution in the information. Enterprises now need to be aware of compliance and security...