MapReduce Job Execution Process – Sort & Spill

Hi Hadoopers,

Mappers run on individual machines and prepare intermediate results, accepting input splits as their input. A reducer accepts a partition of the data as its input. Preparing those partitions from the intermediate mapper results is the responsibility of the sort & spill phase, the 5th circle in the diagram below.
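To see how a mapper output key ends up in a particular partition, here is a minimal sketch of the hash-based assignment that Hadoop's default HashPartitioner uses. The class and method names are illustrative, not the actual Hadoop API:

```java
// Sketch of how mapper output keys are assigned to reducer partitions.
// Mirrors the logic of Hadoop's default HashPartitioner, but the class
// and method names here are illustrative only.
public class PartitionSketch {

    // Returns the partition (reducer) index for a given key.
    static int getPartition(String key, int numReducers) {
        // Mask off the sign bit so the result is always non-negative,
        // then take the modulo to pick one of the reducers.
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    public static void main(String[] args) {
        int reducers = 3;
        for (String key : new String[] {"apple", "banana", "cherry"}) {
            System.out.println(key + " -> partition " + getPartition(key, reducers));
        }
    }
}
```

Because the partition depends only on the key, every mapper sends all values for the same key to the same reducer, which is what makes the later grouping possible.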

[Figure: hadoop037-job-submission-1]

 

During this phase, the keys from all the mappers are collected, sorted, and shuffled before being sent to the machine where the reducer will run.

[Figure: hadoop042-sort-n-spill]

  1. Task tracker 1 (TT1) initiates a reducer task.
  2. TT1 updates the job tracker with its completion status.
  3. Similarly, TT2 and any other task trackers update the job tracker with their completion status.
  4. The reducer task goes to TT1, where a map task has finished, to collect the interim results.
  5. TT2 reads its mapper output and streams it to the reducer.
  6. Steps 4 and 5 are repeated for every other task tracker involved in the map tasks.
  7. Once the reducer has received all of the mapper results, it performs the sort and spill.
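The sort performed in step 7 can be sketched as follows: the (key, value) pairs fetched from the various mapper task trackers are merged, ordered by key, and grouped so that each key's values arrive together at the reduce function. This is a simplified in-memory model, not Hadoop's actual on-disk merge; all names are illustrative:

```java
import java.util.*;

// Sketch of the reducer-side sort step: interim (key, value) pairs fetched
// from several mapper task trackers are merged and sorted by key, then
// grouped so each key's values arrive together at the reduce function.
public class SortAndGroupSketch {

    // TreeMap keeps keys in sorted order; the value list collects every
    // value seen for that key, regardless of which mapper produced it.
    static TreeMap<String, List<Integer>> sortAndGroup(
            List<Map.Entry<String, Integer>> fetched) {
        TreeMap<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : fetched) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                   .add(pair.getValue());
        }
        return grouped;
    }

    public static void main(String[] args) {
        // Interim results as fetched from two mapper task trackers.
        List<Map.Entry<String, Integer>> fetched = new ArrayList<>();
        fetched.add(Map.entry("cat", 1));  // from TT1
        fetched.add(Map.entry("ant", 1));  // from TT1
        fetched.add(Map.entry("cat", 1));  // from TT2
        // Keys come out in sorted order, each with all of its values.
        System.out.println(sortAndGroup(fetched)); // prints {ant=[1], cat=[1, 1]}
    }
}
```

The real framework does this merge incrementally, spilling sorted runs to disk when memory fills up, but the end result the reduce function sees is the same: sorted keys, each paired with the full list of its values.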

 
