I wrote about the first step of MR job execution, job submission, in my earlier post.
In this post, we talk about the 2nd circle, which is job initialization.
I got the job. How will I execute it? This is what the Hadoop elephant is thinking, with a yarn in its trunk!
- Once the job is submitted, it becomes the Job Tracker’s responsibility to initialize it.
- The job XML was uploaded to the staging directory, which was created as described in my earlier post. The Job Tracker reads it and performs the validation.
- Once the XML validation is complete, the job goes to the scheduler for job validation. The scheduler checks whether the user is authorized to run this job, whether the content is allowed, and so on.
- If the job validation also succeeds, the scheduler adds the job and updates the schedule information.
- The job scheduler initializes the job.
- It reads the number of splits needed to execute the job.
- Tasks are created to execute the job. One map task is spawned per split, so a job with many splits gets that many map tasks.
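
The steps above can be sketched in plain Java. This is only a toy model to make the flow concrete, not Hadoop's actual code: the names `JobInitSketch`, `JobConf`, `MapTask`, `validateConf`, `isAuthorized` and `initializeJob` are all hypothetical, the "job XML" is reduced to three fields, and the task ID format is made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model of job initialization. These are NOT Hadoop classes.
class JobInitSketch {

    // Stand-in for the job description read from the job XML in the staging directory.
    static final class JobConf {
        final String jobName;
        final String user;
        final int numSplits;

        JobConf(String jobName, String user, int numSplits) {
            this.jobName = jobName;
            this.user = user;
            this.numSplits = numSplits;
        }
    }

    // Stand-in for a map task bound to exactly one input split.
    static final class MapTask {
        final String taskId;
        final int splitIndex;

        MapTask(String taskId, int splitIndex) {
            this.taskId = taskId;
            this.splitIndex = splitIndex;
        }

        @Override
        public String toString() {
            return taskId + " -> split " + splitIndex;
        }
    }

    // Job Tracker side: basic validation of the job description (assumed rules).
    static boolean validateConf(JobConf conf) {
        return conf.jobName != null && !conf.jobName.isEmpty() && conf.numSplits >= 0;
    }

    // Scheduler side: check that the submitting user is allowed to run jobs.
    static boolean isAuthorized(JobConf conf, List<String> allowedUsers) {
        return allowedUsers.contains(conf.user);
    }

    // Initialization: validate, authorize, then create one map task per split.
    static List<MapTask> initializeJob(JobConf conf, List<String> allowedUsers) {
        if (!validateConf(conf)) {
            throw new IllegalArgumentException("invalid job configuration");
        }
        if (!isAuthorized(conf, allowedUsers)) {
            throw new SecurityException("user not authorized: " + conf.user);
        }
        List<MapTask> tasks = new ArrayList<>();
        for (int i = 0; i < conf.numSplits; i++) {
            // The ID format here is illustrative only.
            tasks.add(new MapTask(String.format("%s_m_%06d", conf.jobName, i), i));
        }
        return tasks;
    }

    public static void main(String[] args) {
        JobConf conf = new JobConf("wordcount", "alice", 3);
        List<MapTask> tasks = initializeJob(conf, Arrays.asList("alice", "bob"));
        for (MapTask task : tasks) {
            System.out.println(task);
        }
    }
}
```

With three splits, the sketch creates three map tasks, one for each split index. An unauthorized user or an invalid description stops the job before any task is created, which mirrors the order of the checks in the list above.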