MapReduce Job Execution Process – Job Submission

Hi Hadoopers,

After publishing many posts about writing MapReduce code, we'll now look at the MR internals: how an MR job is submitted and executed.

[Figure: hadoop037-job-submission-1]

This post talks about the first circle – Job Submission.

We have compiled the MR code and the jar is ready. We execute the job with `hadoop jar xxxxxx`. First the job is submitted to Hadoop; a scheduler then runs it based on cluster capacity and availability.
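The scheduler itself is pluggable and is chosen in the JobTracker's configuration. As an illustrative sketch (assuming the Hadoop 1.x Fair Scheduler contrib jar is on the JobTracker's classpath; the default is the FIFO `JobQueueTaskScheduler`), mapred-site.xml might carry:

```xml
<!-- Illustrative mapred-site.xml fragment: select the Fair Scheduler -->
<property>
  <name>mapred.jobtracker.taskScheduler</name>
  <value>org.apache.hadoop.mapred.FairScheduler</value>
</property>
```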

I want to scribble down quick notes on job submission using the Gantt diagram below.

[Figure: hadoop038-job-submission-2]

  1. The user submits the job to the Job Client.
  2. The Job Client talks to the Job Tracker to get a job ID.
  3. The Job Client creates a staging directory in HDFS. This is where all the files related to the job are uploaded.
  4. The MR code and configuration are uploaded to the staging directory, with their blocks replicated 10 times: the job's jar file, the input splits, the split metadata, and job.xml, which holds the job description.
  5. Splits are computed automatically and the input is read.
  6. The split metadata is uploaded to HDFS.
  7. The job is submitted and is ready to execute.
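Step 5 above can be sketched in plain Java. This is only a simulation of how `FileInputFormat` sizes splits – the formula `max(minSize, min(maxSize, blockSize))` is modeled on Hadoop's input-format code, but the class and method names here are my own, not Hadoop's:

```java
// Illustrative sketch: how input splits are sized during job submission.
// Not actual Hadoop code; names and defaults are assumptions.
public class SplitSketch {

    // Split size the way FileInputFormat computes it:
    // bounded below by minSize and above by maxSize, defaulting to the block size.
    static long splitSize(long minSize, long maxSize, long blockSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Number of splits for a file of the given length (ceiling division).
    static long numSplits(long fileLen, long splitSize) {
        return (fileLen + splitSize - 1) / splitSize;
    }

    public static void main(String[] args) {
        long blockSize = 64L * 1024 * 1024;   // 64 MB HDFS block (1.x default)
        long size = splitSize(1, Long.MAX_VALUE, blockSize);
        long fileLen = 200L * 1024 * 1024;    // a 200 MB input file
        // Three full 64 MB blocks plus one partial block -> 4 splits.
        System.out.println(size + " " + numSplits(fileLen, size));
    }
}
```

With the defaults above, a 200 MB file yields four splits, which is why the split metadata uploaded in step 6 lists one entry (with its block locations) per split.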