We are in the 6th circle today, which covers the reducer. A job submitted by the user was initiated in the 2nd circle, and its setup was completed in the 3rd circle.
The map tasks were executed in the 4th circle, and sort & shuffle was completed in the 5th circle.
The reducer collects the output from all the mappers and applies the user-defined reduce function to it.
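To make this concrete, here is a minimal sketch (in Python, not Hadoop's actual Java API) of a user-defined reduce function, assuming word-count style data: the framework hands the reducer each key together with all the values collected for it from every mapper. The `shuffled` dictionary below is hypothetical sample data standing in for the shuffled mapper output.

```python
def reduce_func(key, values):
    """User-defined reduce: sum the counts emitted for one word."""
    return key, sum(values)

# Hypothetical shuffled input: key -> values gathered from all mappers.
shuffled = {"hadoop": [1, 1, 1], "reduce": [1, 1]}

# The framework calls the reduce function once per key, in sorted key order.
output = [reduce_func(key, values) for key, values in sorted(shuffled.items())]
print(output)  # [('hadoop', 3), ('reduce', 2)]
```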
- The task tracker launches the reduce task.
- The reduce task (not the reduce function) reads the job's jar and XML configuration.
- It executes the shuffle. By the time the reduce task starts, not all mappers may have finished their work, so it goes to the individual mapper machines to collect their output as it becomes available and shuffles (merges) it.
- Once all the map activity is finished, it invokes the user-defined reduce function (there may be one or more reducers).
- Each reducer completes its job and writes its output records to HDFS.
- That output is first stored in a temporary output file.
- Once a reducer has completed its job, the final output is written to that reducer's partition file.
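The steps above can be sketched end to end. This is an illustrative Python simulation under stated assumptions, not Hadoop's implementation: `run_reducer`, the partition file naming (`part-r-00000`), and the sample mapper outputs are all hypothetical stand-ins. It shows one reducer gathering its partition's records from every mapper, grouping values by key, applying the reduce function, writing to a temporary file, and then committing by renaming it to the final partition file.

```python
import os
import tempfile

def run_reducer(partition_id, mapper_outputs, reduce_func, out_dir):
    # Shuffle: gather this partition's (key, value) records from every mapper
    # and group the values by key.
    grouped = {}
    for records in mapper_outputs:
        for key, value in records:
            grouped.setdefault(key, []).append(value)

    # Reduce each key group (in sorted key order) into a temporary file first.
    tmp_path = os.path.join(out_dir, f"_temporary_part-r-{partition_id:05d}")
    final_path = os.path.join(out_dir, f"part-r-{partition_id:05d}")
    with open(tmp_path, "w") as f:
        for key in sorted(grouped):
            f.write(f"{key}\t{reduce_func(key, grouped[key])}\n")

    # Commit: rename the temporary file to the final partition file.
    os.rename(tmp_path, final_path)
    return final_path

# Usage with hypothetical word-count output from two mappers:
out_dir = tempfile.mkdtemp()
mappers = [[("hadoop", 1), ("reduce", 1)], [("hadoop", 1)]]
path = run_reducer(0, mappers, lambda key, values: sum(values), out_dir)
print(open(path).read())
```

The temporary-file-then-rename step mirrors the commit behavior described above: readers never see a partially written partition file, only the complete output once the reducer finishes.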