Big Data Forensics: Learning Hadoop Investigations

Table of Contents (15 chapters)
Big Data Forensics – Learning Hadoop Investigations
Credits
About the Author
About the Reviewers
www.PacktPub.com
Preface
Index

Collecting other Hadoop application data and non-Hadoop data


Not all relevant Hadoop data is stored in, or accessed through, Hive, HBase, or even HDFS. Hadoop clusters are typically part of a larger data analysis ecosystem, so data flows into and out of Hadoop from other systems. Data may be transferred and transformed both inside Hadoop and at its ingress and egress points. Because these changes to the data may be relevant to the investigation, the investigator may need to collect data from these other systems as well.

Many other Hadoop applications are available for data analysis and storage. The Apache Software Foundation currently lists many projects and incubator projects that are deployed in production environments. In the course of an investigation, the investigator may encounter established applications such as Cassandra, Chukwa, and Spark, as well as newer ones such as Drill and Tajo. When a new or uncommon application is identified, the investigator can apply the same collection process to each application, which...