After publishing many posts about writing MapReduce code, let's look at the MR internals: how an MR job is submitted and executed.
This post covers the first stage – Job Submission.
Once the MR code is compiled and the jar is ready, we launch the job with `hadoop jar <jar> [mainClass] args...`. The job is first submitted to Hadoop, and a scheduler then runs it based on cluster capacity and availability.
Here are my quick notes on Job Submission, following the Gantt diagram below.
- The user submits the job to the Job Client.
- The Job Client talks to the Job Tracker to get a job ID.
- The Job Client creates a staging directory in HDFS. This is where all the files related to the job get uploaded.
- The job's jar file, the input splits, the split metadata, and job.xml (which holds the job configuration) are uploaded to the staging directory. These files are replicated 10 times by default (`mapred.submit.replication`) so that Task Trackers across the cluster can fetch them quickly.
- Input splits are computed automatically from the input files.
- The split metadata is uploaded to HDFS.
- The job is submitted and is ready to execute.
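The steps above can be sketched in plain Python, with the local filesystem standing in for HDFS. This is only a simulation: the job ID, paths, and sizes are illustrative, and the split-size rule shown is the commonly cited `FileInputFormat` formula `max(minSize, min(maxSize, blockSize))`. The four file names match what the Job Client actually writes to the staging directory.

```python
import math
import os
import tempfile

def split_size(block_size, min_size=1, max_size=float("inf")):
    # FileInputFormat's rule: max(minSize, min(maxSize, blockSize)).
    # With default min/max, the HDFS block size wins.
    return max(min_size, min(max_size, block_size))

def compute_splits(file_size, block_size=128 * 1024 * 1024):
    # Return (offset, length) pairs covering the whole input file.
    size = split_size(block_size)
    n = math.ceil(file_size / size)
    return [(i * size, min(size, file_size - i * size)) for i in range(n)]

# 1. The Job Client gets a job ID from the Job Tracker (faked here).
job_id = "job_201501010000_0001"

# 2. It creates the staging directory (in HDFS on a real cluster).
staging = os.path.join(tempfile.mkdtemp(), ".staging", job_id)
os.makedirs(staging)

# 3. It computes input splits and uploads the job files.
splits = compute_splits(file_size=300 * 1024 * 1024)
for name in ("job.jar", "job.split", "job.splitmetainfo", "job.xml"):
    open(os.path.join(staging, name), "w").close()

print(len(splits))                  # a 300 MB input with 128 MB blocks -> 3 splits
print(sorted(os.listdir(staging)))
```

A 300 MB input file yields three splits here: two full 128 MB splits and one 44 MB remainder, mirroring how split boundaries normally follow HDFS block boundaries.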