Mappers run on individual machines and prepare intermediate results. They accept splits as input. Reducers accept partitions of data as input. Preparing those partitions from the intermediate mapper results is the responsibility of the sort & spill phase, which is the 5th circle given below.
During this phase, the keys emitted by all the mappers are sorted and shuffled, and then sent to the one machine where the reducer will run.
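A minimal sketch of this routing, assuming plain in-memory lists stand in for mapper output files. The names `partition_for` and `shuffle` are hypothetical, and `zlib.crc32` is only a stable stand-in for the key hash (Hadoop's default partitioner uses the key's `hashCode()` modulo the number of reduce tasks).

```python
import zlib
from collections import defaultdict


def partition_for(key: str, num_reducers: int) -> int:
    # Stand-in hash (assumption): Python's built-in hash() is salted per
    # process, so a CRC is used to keep the routing repeatable.
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def shuffle(mapper_outputs, num_reducers):
    """Route every (key, value) pair to one partition, then sort each partition by key."""
    partitions = defaultdict(list)
    for output in mapper_outputs:
        for key, value in output:
            partitions[partition_for(key, num_reducers)].append((key, value))
    # Each reducer receives its partition ordered by key.
    return {p: sorted(pairs, key=lambda kv: kv[0]) for p, pairs in partitions.items()}


# Two mappers, each emitting (word, 1) pairs.
mapper_outputs = [
    [("apple", 1), ("pear", 1)],
    [("apple", 1), ("fig", 1)],
]
shuffled = shuffle(mapper_outputs, num_reducers=2)
```

Because the partition depends only on the key, both `("apple", 1)` pairs end up in the same partition regardless of which mapper produced them.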
- Task tracker 1 (TT1) initiates a reducer task.
- TT1 updates the job tracker about the completion status.
- Similarly, TT2 and any other task trackers also update the job tracker about their completion status.
- The reducer task goes to TT1, where the mapping task has finished, to collect the interim results.
- TT2 reads the mapper output and streams it to the reducer.
- Steps 4 and 5 are repeated for all the other task trackers involved in mapping tasks.
- Once the reducer has received all the mapping results, it performs the sort and spill.
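The collection and sorting steps above can be sketched roughly as follows. This is an illustration only: the `fetch` and `reduce_partition` helpers are hypothetical, in-memory lists stand in for the output each task tracker streams, the reduce function is assumed to be a sum, and spilling to disk is left out.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def fetch(tracker_outputs, tracker_id):
    """Stand-in for a task tracker streaming its key-sorted mapper output."""
    return iter(tracker_outputs[tracker_id])


def reduce_partition(tracker_outputs):
    # Collect one stream per task tracker that ran a mapping task.
    streams = [fetch(tracker_outputs, tid) for tid in sorted(tracker_outputs)]
    # Merge the already-sorted streams into a single key-ordered stream.
    merged = heapq.merge(*streams, key=itemgetter(0))
    # Group equal keys together and apply the reduce function (a sum here).
    return [
        (key, sum(value for _, value in group))
        for key, group in groupby(merged, key=itemgetter(0))
    ]


tracker_outputs = {
    "TT1": [("apple", 1), ("pear", 1)],
    "TT2": [("apple", 1), ("fig", 1)],
}
result = reduce_partition(tracker_outputs)
# result == [("apple", 2), ("fig", 1), ("pear", 1)]
```

Merging pre-sorted streams avoids re-sorting everything on the reducer, which is one reason each mapper's output is sorted before it is handed over.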