So today we are looking at the 7th circle: the job cleanup.
An MR job writes many intermediate results and junk files while it runs. Once the job has completed, this junk would keep occupying space on HDFS without being of any benefit. Hence the cleanup task is launched.
- The job tracker informs all the task trackers to perform the cleanup.
- Each task tracker cleans up its work folders.
- Each task tracker also cleans up the temporary directory.
- Once the cleanup task is successful, the task tracker ends the job by writing an empty _SUCCESS file to the job's output directory.
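
To make the order of these steps concrete, here is a minimal sketch that simulates them on a local filesystem. It is not Hadoop code and does not use any Hadoop API. The function names (`cleanup_job`, `demo`) and the directory layout (`tracker0/work`, `spill.out`, `attempt_0`) are made up for illustration. Only the `_temporary` and `_SUCCESS` names follow what a typical job output directory looks like.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Sequence


def cleanup_job(output_dir: Path, work_dirs: Sequence[Path]) -> Path:
    """Simulate job cleanup locally (hypothetical helper, not the Hadoop API)."""
    # Step 1: every simulated task tracker removes its work folder.
    for work_dir in work_dirs:
        shutil.rmtree(work_dir, ignore_errors=True)

    # Step 2: remove the temporary directory under the job output path.
    shutil.rmtree(output_dir / "_temporary", ignore_errors=True)

    # Step 3: only after the cleanup is done, write the empty marker file.
    marker = output_dir / "_SUCCESS"
    marker.touch()
    return marker


def demo() -> dict:
    """Build a fake job layout, run the cleanup, and report what is left."""
    with tempfile.TemporaryDirectory() as root_name:
        root = Path(root_name)
        output_dir = root / "output"

        # Fake leftovers of a finished job: temp data plus one real result.
        temp_dir = output_dir / "_temporary"
        temp_dir.mkdir(parents=True)
        (temp_dir / "attempt_0").write_text("partial output")
        (output_dir / "part-00000").write_text("final result")

        # Fake work folders of two task trackers.
        work_dirs = [root / f"tracker{i}" / "work" for i in range(2)]
        for work_dir in work_dirs:
            work_dir.mkdir(parents=True)
            (work_dir / "spill.out").write_text("junk")

        marker = cleanup_job(output_dir, work_dirs)

        return {
            "output_files": sorted(p.name for p in output_dir.iterdir()),
            "work_dirs_left": any(d.exists() for d in work_dirs),
            "marker_size": marker.stat().st_size,
        }


if __name__ == "__main__":
    print(demo())
```

The point of the ordering is that the marker is written last, so anything that reads the output can treat the presence of `_SUCCESS` as a sign that the job finished and the junk is gone. The real results (`part-00000` here) are left untouched by the cleanup.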