A Hadoop cluster is a system of two or more computers (called nodes) that presents itself to users as a single unified system. The nodes work together to execute applications and perform other tasks, much like a single virtual machine. Hadoop clusters come in variants that cater to different data needs; the key design goals in constructing these platforms are reliability, load balancing, and performance.
In a single-node or pseudo-distributed cluster, all of the essential daemons, such as the NameNode, DataNode, JobTracker, and TaskTracker, run on the same machine. A single-node cluster is a simple configuration used to test Hadoop applications: it simulates a full cluster-like environment with a replication factor of 1.
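The replication factor of 1 mentioned above is set in the HDFS configuration. A minimal sketch of what this looks like in a pseudo-distributed setup, assuming the standard `etc/hadoop/hdfs-site.xml` location and the real `dfs.replication` property (the default is 3):

```xml
<!-- etc/hadoop/hdfs-site.xml -->
<!-- A pseudo-distributed cluster has only one DataNode, so blocks
     cannot be replicated to other machines; setting the replication
     factor to 1 avoids constant under-replication warnings. -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
```

With this in place, every HDFS block is stored exactly once on the local DataNode, which is sufficient for testing application logic but offers none of the fault tolerance of a multi-node cluster.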
A small Hadoop cluster comprises a single master node and multiple worker nodes. The master node runs the JobTracker, TaskTracker, NameNode, and DataNode. A slave or worker node acts as both a DataNode and a TaskTracker, although it is possible to configure workers that serve only one of these roles.
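In the Hadoop 1.x layout implied by the JobTracker/TaskTracker daemons above, worker membership is declared in a plain-text file read by the cluster start-up scripts. A sketch, assuming the conventional `conf/slaves` location; the hostnames are placeholders, not values from the original text:

```
# conf/slaves -- one worker hostname per line.
# Each host listed here starts a DataNode and a TaskTracker
# when the cluster start-up scripts run.
worker1.example.com
worker2.example.com
worker3.example.com
```

The master's own hostname is typically kept out of this file in production, though for a very small cluster the master can also appear here so that it doubles as a worker, matching the master-node description above.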