Summary
This chapter covered several methods investigators can use to collect data from HDFS. Investigators can collect HDFS data from the host operating system, either by imaging or by collecting logical files. They can also collect HDFS data through the Hadoop shell, through a data transfer tool such as Sqoop, or by other means, such as a custom-developed Java application or an outside party that performs the collection. Each method has its own advantages and disadvantages. The pros and cons of each are covered in the following tables:
| Methods | Pros | Cons |
|---|---|---|
| Host operating system collection | This provides a complete forensic collection | This requires collection across each node and manual reassembly of data blocks for analysis |
| | This follows the standard forensic process | This is a time-consuming and cumbersome process |
| | This captures the system as is, including slack space and deleted files | This requires extra disk space for extraneous collected data |
| Hadoop shell command collection | This collects Hadoop... |