Having written about the Hadoop ecosystem, I should next write about wiring those blocks together to see them working. Before doing that, I want to quickly document the HDFS/MR paradigm.
If we look at Hadoop at a high level, we can separate it into two parts: storage (HDFS) and processing (MapReduce).
The nodes in a Hadoop cluster store the data in HDFS. HDFS splits a huge volume of data into many smaller blocks and spreads them across the nodes. HDFS runs on top of the Unix filesystem (or whatever native filesystem the host it runs on provides).
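To make the block idea concrete, here is a toy sketch in plain Python. It is not the HDFS API; the function names, the tiny block size, the node names and the round-robin placement are all assumptions I made up for illustration. It only shows the idea of cutting one large file into fixed-size blocks and spreading them over nodes.

```python
# Toy model of block storage. NOT the HDFS API: every name here is hypothetical.

def split_into_blocks(data: bytes, block_size: int) -> list:
    """Cut the data into consecutive chunks of at most block_size bytes."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_blocks(num_blocks: int, nodes: list) -> dict:
    """Assign each block index to a node in simple round-robin order (an assumption)."""
    return {b: nodes[b % len(nodes)] for b in range(num_blocks)}


data = b"x" * 1000                      # stand-in for a large file
blocks = split_into_blocks(data, 256)   # tiny block size, chosen for the demo
placement = place_blocks(len(blocks), ["node1", "node2", "node3"])

for index, node in placement.items():
    print(f"block {index} ({len(blocks[index])} bytes) -> {node}")
```

The last block is smaller than the others because the file size is not a multiple of the block size; joining the blocks in order gives back the original data.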
Searching for the data across multiple nodes, based on a catalog, and aggregating the pieces to arrive at the desired result is called MapReduce processing.
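The same idea can be sketched in a few lines of plain Python. This is only an in-memory imitation of the map, shuffle and reduce steps using word counting as the example, not Hadoop's MapReduce API; the function names and the sample blocks are assumptions of mine.

```python
from collections import defaultdict

# In-memory imitation of MapReduce. NOT Hadoop's API: all names are hypothetical.

def map_phase(block: str):
    """Map: emit a (word, 1) pair for every word found in one block."""
    for word in block.split():
        yield word.lower(), 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: aggregate the values of one key into a single result."""
    return key, sum(values)


# Each string stands in for a block stored on a different node.
text_blocks = ["hadoop stores data", "hadoop processes data", "data data"]

mapped = [pair for block in text_blocks for pair in map_phase(block)]
grouped = shuffle(mapped)
counts = dict(reduce_phase(key, values) for key, values in grouped.items())

print(counts)
```

In a real cluster the map step runs on the nodes that hold the blocks and only the intermediate pairs move across the network; here everything runs in one process just to show the flow.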
I have depicted it diagrammatically below.