The user has submitted the job and has the required permissions. The cluster has free slots, and job setup is complete. We now look at the fourth circle shown below: the Map Task Execution.
The diagram below depicts the Map Task Execution.
- The Task Tracker launches the Map task
- The Map task reads the jar file supplied by the user. This is the code we write in Eclipse. In the entire framework, this is our contribution 🙂
The Map task also reads the job config (input path, output path, etc.). It gets all of this from HDFS, because these files were already uploaded to HDFS earlier.
- The Map task reads the input splits from HDFS
- From the input splits, the Map task creates records (key-value pairs)
- The Map task invokes the user's Mapper once for each record
- The mapper writes intermediate output
- The Map task sorts the intermediate output by key and flushes it to disk
- The Map task informs the Task Tracker that the task is complete
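
The middle steps of this list (create records from a split, call the user's Mapper, sort the output by key) can be sketched in plain Java. This is only an illustration and does not use the Hadoop API: `MapTaskSketch`, the `Mapper` interface, `WORD_COUNT_MAPPER` and `runMapTask` are hypothetical names, the split is assumed to be a small text string where every line is one record, and the sorted output is returned from the method instead of being flushed to disk. Reading from HDFS and reporting back to the Task Tracker are left out.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Simplified, hypothetical model of a map task. Not Hadoop code.
class MapTaskSketch {

    // Stand-in for the user-written Mapper that ships in the job jar.
    interface Mapper {
        void map(long offset, String line, List<Map.Entry<String, Integer>> out);
    }

    // Example user Mapper: emits (word, 1) for every word in the line.
    static final Mapper WORD_COUNT_MAPPER = (offset, line, out) -> {
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(new SimpleEntry<>(word, 1));
            }
        }
    };

    // Runs the map-side steps over one input split held in memory.
    static List<Map.Entry<String, Integer>> runMapTask(String split, Mapper mapper) {
        List<Map.Entry<String, Integer>> buffer = new ArrayList<>();
        long offset = 0;
        // Create one record per line: key = character offset, value = line text.
        for (String line : split.split("\n")) {
            // Invoke the user's Mapper with the record; it writes intermediate output.
            mapper.map(offset, line, buffer);
            offset += line.length() + 1; // +1 for the newline separator
        }
        // Sort the intermediate output by key (a real task would then write it to disk).
        buffer.sort(Map.Entry.comparingByKey());
        return buffer;
    }
}
```

For example, calling `MapTaskSketch.runMapTask("b a\na c", MapTaskSketch.WORD_COUNT_MAPPER)` yields four pairs whose keys come back in the order a, a, b, c.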