MapReduce Job Execution Process – Job Cleanup

Hi Hadoopers,

Today we are looking at the 7th circle, which is the job cleanup.



An MR job writes many intermediate results and temporary files while it runs. Once the job is completed, this junk occupies space on HDFS and is of no further benefit. Hence the cleanup task is launched.


  1. The job tracker informs all the task trackers to perform the cleanup.
  2. Each task tracker cleans up its work folders.
  3. They clean up the temporary directory.
  4. Once the cleanup task is successful, the task tracker ends the job by writing the _SUCCESS file.
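
The steps above can be imitated on a local disk. This is only a rough sketch, not Hadoop's own code: it assumes a throwaway directory stands in for the job output directory, and the attempt_0 folder and file contents are made up for illustration. It removes the _temporary work folder and then drops an empty _SUCCESS marker.

```shell
#!/bin/sh
# Local simulation only: the names mirror Hadoop's conventions,
# but nothing here talks to HDFS or a real cluster.
set -eu

job_output=$(mktemp -d)                        # stands in for the job output directory
mkdir -p "$job_output/_temporary/attempt_0"    # hypothetical leftover work folder
echo "partial" > "$job_output/_temporary/attempt_0/part-00000"
echo "final"   > "$job_output/part-00000"      # already committed result

cleanup_job() {
    out_dir=$1
    rm -rf "$out_dir/_temporary"    # drop the intermediate files
    : > "$out_dir/_SUCCESS"         # empty marker: the job finished cleanly
}

cleanup_job "$job_output"
ls -A "$job_output"
```

Downstream jobs can then test for the _SUCCESS file before reading the output, instead of guessing whether the job finished.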


Pig Installation and Configuration


Apache Pig is a high-level platform for creating programs that run on Apache Hadoop. The language for this platform is called Pig Latin. Pig can execute its Hadoop jobs in MapReduce, Apache Tez, or Apache Spark. Pig Latin abstracts the programming from the Java MapReduce idiom into a notation which makes MapReduce programming high level, similar to that of SQL for RDBMSs. Pig Latin can be extended using User Defined Functions (UDFs) which the user can write in Java, Python, JavaScript, Ruby or Groovy and then call directly from the language.

This is a continuation of my Hadoop series and follows the folder structure created earlier. Please refer to the earlier posts in the series.

Download and Extract

hadoop@gandhari:~$ wget
hadoop@gandhari:~$ gunzip pig-0.16.0.tar.gz
hadoop@gandhari:~$ tar -xvf pig-0.16.0.tar
hadoop@gandhari:~$ ln -s pig-0.16.0 pig
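
The unpack-and-symlink pattern can be tried without downloading anything. The sketch below builds a tiny stand-in archive (demo-0.1.0 is a made-up name, not a real package), extracts it in one step with tar -xzf instead of gunzip followed by tar -xvf, and creates a version-independent symlink the same way as above.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
cd "$work"

# Build a small stand-in archive; demo-0.1.0 is hypothetical.
mkdir demo-0.1.0
echo "hello" > demo-0.1.0/README
tar -czf demo-0.1.0.tar.gz demo-0.1.0
rm -rf demo-0.1.0

tar -xzf demo-0.1.0.tar.gz    # decompress and extract in a single step
ln -s demo-0.1.0 demo         # stable name that survives version upgrades
cat demo/README
```

Pointing PIG_HOME at the symlink rather than at the versioned folder means a later upgrade only needs the link to be re-created.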

.bashrc Changes

Add the following environment variables to your ~/.bashrc:

export PIG_CONF_DIR=/etc/hadoop/conf
export PIG_CLASSPATH=/etc/hadoop/conf
export PIG_HOME=/opt/hadoop/pig
export PATH=$PATH:/opt/hadoop/pig/bin
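
After reloading the shell (source ~/.bashrc), a quick check like the one below can confirm that the Pig bin directory really ended up on the PATH. It is a small sketch: path_contains is a helper written for this post, not something shipped with Pig or Hadoop.

```shell
#!/bin/sh
# Succeeds when directory $2 is one of the entries in the PATH-style list $1.
path_contains() {
    case ":$1:" in
        *":$2:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

PIG_HOME=${PIG_HOME:-/opt/hadoop/pig}   # default mirrors the export above

if path_contains "$PATH" "$PIG_HOME/bin"; then
    echo "pig bin directory is on PATH"
else
    echo "pig bin directory is missing from PATH; try: source ~/.bashrc"
fi
```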

Sample Pig Job

Let’s copy a file into HDFS using hadoop and read it with Pig.

hadoop@gandhari:~$ ls -alt>test.txt
hadoop@gandhari:~$ hadoop fs -mkdir /pigdata
hadoop@gandhari:~$ hadoop fs -put test.txt /pigdata

Let’s ensure the Hadoop daemons are running.

hadoop@gandhari:~$ jps
7156 ResourceManager
6788 DataNode
6998 SecondaryNameNode
7846 Jps
7276 NodeManager
6671 NameNode

hadoop@gandhari:~$ pig

grunt> a = LOAD '/pigdata/test.txt' as (name:chararray);

grunt> dump a;

2016-08-24 14:01:27,845 [main] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
(total 452532)
(drwxr-xr-x 15 hadoop hadoop      4096 Aug 24 06:00 .)
(-rw-rw-r--  1 hadoop hadoop         0 Aug 24 06:00 text.txt)
(drwxrwxr-x  3 hadoop hadoop      4096 Aug 24 05:58 logs)
(-rw-rw-r--  1 hadoop hadoop       139 Aug 23 16:09 .hivehistory)
(-rw-r--r--  1 hadoop hadoop      4567 Aug 23 15:59 .bashrc)




Error while Starting Hive – DatastoreDriverNotFoundException

Here is a scary exception thrown when I started Hive:

The specified datastore driver ("com.mysql.jdbc.Driver") was not found in the CLASSPATH. Please check your CLASSPATH specification, and the name of the driver.

I had forgotten to copy the MySQL driver to the Hive lib folder. Here is the command to copy it:

cp /usr/share/java/mysql-connector-java-5.1.38.jar /opt/hadoop/hive/lib/
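
To confirm the jar landed where Hive looks for it, a check along these lines can help. It is only a sketch: has_mysql_driver and the HIVE_LIB variable are names invented for this post, and the default path simply repeats the one used in the cp command.

```shell
#!/bin/sh
# Prints every MySQL connector jar found in the given lib directory;
# returns non-zero when none is present.
has_mysql_driver() {
    lib_dir=$1
    found=1
    for jar in "$lib_dir"/mysql-connector-java-*.jar; do
        if [ -f "$jar" ]; then
            echo "found: $jar"
            found=0
        fi
    done
    return $found
}

HIVE_LIB=${HIVE_LIB:-/opt/hadoop/hive/lib}   # HIVE_LIB is made up for this sketch

if has_mysql_driver "$HIVE_LIB"; then
    echo "MySQL driver jar is present in $HIVE_LIB"
else
    echo "no MySQL connector jar in $HIVE_LIB"
fi
```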


Carrois Gothic – A nice Google font for web applications

I like this font and have embedded it in my site.

<link rel='stylesheet' id='superhero-carrois-gothic-css'  href='' type='text/css' media='all' />

body {
font-family: 'Carrois Gothic', sans-serif;
font-size: 1.2rem;
text-align: left;
font-weight: 400;
}

When I tell someone that a font is not readable on the screen, I usually get the reply ‘this font is used in xxxxx device’. My reaction then would be like this!

com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure

Code snippet – Date Picker with jQuery

Here is the snippet for jQuery date picker.


  <link rel="stylesheet" href="//">

Date (Click for calendar): <input type="text" name="mydate" id="mydate"/>

Spring mvc and jQuery DatePicker

We have already added a simple CSS file using the ResourceHandlers. Now it is time to test jQuery in a Spring MVC application. The first step is to display the calendar on the JSP.

Insert the scripts and CSS into the head section of the JSP:
<link rel="stylesheet"

Add the following JavaScript. Replace expenseDate with your field id.

$(document).ready(function() {
    // attach the jQuery UI date picker to the field with id "expenseDate"
    $("#expenseDate").datepicker();
});

That’s all. The calendar should now be rendered in the browser window.