Eclipse project dependency and Maven


I have been unable to schedule posts for the past two days, as I have been busy creating the input data for my ongoing exercise. Unfortunately, I got stuck somewhere along the way.

Today, let’s talk about updating Maven dependencies when you have project dependencies in Eclipse.

Say, I have project dependencies in my Eclipse project.


Eclipse recognizes them, so your code will not show any errors when you use classes from the dependencies.

But Maven doesn’t care about Eclipse project dependencies unless you instruct it to, so my build failed.

The project I rely on is also a Maven project, with the following identifiers.

groupId: jatomrss
artifactId: jatomrss
version: 0.0.5-SNAPSHOT

I define the same in my pom.

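<!-- RSS feed parsing library - local eclipse project -->
<dependency>
    <groupId>jatomrss</groupId>
    <artifactId>jatomrss</artifactId>
    <version>0.0.5-SNAPSHOT</version>
</dependency>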

So what happens?

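[INFO] ------------------------------------------------------------------------
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 4.075 s
[INFO] Finished at: 2016-10-06T05:46:42+08:00
[INFO] Final Memory: 28M/337M
[INFO] ------------------------------------------------------------------------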

Please see my post Adding local libraries to Maven for how to add non-Maven local jars to your Maven projects.

Good day.

Lab 02 – A Simple Hadoop Mapper with Eclipse and Maven

Hi Hadoopers,

All the tasks given below are done on the Hadoop server. I assume you have downloaded the Eclipse IDE for your platform. (I use STS for this demo, as it has the Maven plugin out of the box.)

Create a new Java project.


After opening the Java project, convert it into a Maven project.


After adding Maven capability, add the following Hadoop 2.6.4 dependencies.

        <!-- -->
        <!-- -->




Let’s create a mapper to count the words in a file.

package org.grassfield.hadoop;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

/**
 * @author Pandian
 */
public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    /** This is to store the output string */
    private Text word = new Text();
    /** This is to denote each occurrence of the word */
    private IntWritable one = new IntWritable(1);

    @Override
    protected void map(LongWritable key, Text value, Mapper<LongWritable, Text, Text, IntWritable>.Context context)
            throws IOException, InterruptedException {
        //read line by line
        String line = value.toString();
        //tokenize by comma
        StringTokenizer st = new StringTokenizer(line, ",");
        while (st.hasMoreTokens()) {
            //store the token as the word
            word.set(st.nextToken());
            //register the count
            context.write(word, one);
        }
    }
}



A mapper class cannot be executed by itself, so we write a driver for it.

package org.grassfield.hadoop;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

/**
 * @author pandian
 */
public class WordCountDriver extends Configured implements Tool {
    public WordCountDriver(Configuration conf){
        //Assign the configuration to super class. 
        //Otherwise you will get null pointer for getConf()
        super(conf);
    }

    @Override
    public int run(String[] args) throws Exception {
        //get the configuration
        Configuration conf = getConf();

        //initiate the parser with arguments and configuration
        GenericOptionsParser parser = new GenericOptionsParser(conf, args);
        args = parser.getRemainingArgs();
        //input and output HDFS locations are received as command line argument
        Path input = new Path(args[0]);
        Path output = new Path(args[1]);
        //Mapper Job is defined
        Job job = Job.getInstance(conf, "Word Count Driver");
        job.setJarByClass(WordCountDriver.class);
        //Output format is defined
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        //reducer count is defined. We do not have any reducers for this assignment
        job.setNumReduceTasks(0);
        //File input and output formats are specified
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        FileInputFormat.setInputPaths(job,  input);
        FileOutputFormat.setOutputPath(job, output);
        //Set the mapper class and run the job
        job.setMapperClass(WordCountMapper.class);
        boolean b = job.waitForCompletion(true);
        //report failure to the caller with a non-zero exit code
        return b ? 0 : 1;
    }

    /**
     * @param args input and output files are specified
     * @throws Exception
     */
    public static void main (String[]args) throws Exception{
        Configuration conf = new Configuration();
        WordCountDriver driver = new WordCountDriver(conf);
        System.exit(ToolRunner.run(driver, args));
    }
}



Let’s run Maven with the clean install goals. The jar will be copied to the location shown below.

[INFO] --- maven-install-plugin:2.4:install (default-install) @ WordCount ---
[INFO] Installing D:\workspace_gandhari\WordCount\target\WordCount-0.0.1-SNAPSHOT.jar to C:\Users\pandian\.m2\repository\WordCount\WordCount\0.0.1-SNAPSHOT\WordCount-0.0.1-SNAPSHOT.jar
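
The target path in the install log above follows Maven's local repository layout: groupId (with dots turned into directories), then artifactId, then version, then a file named artifactId-version.jar. Here is a minimal sketch of that mapping, assuming the default layout. The class and method names are mine, purely for illustration, and the result is relative to the repository root (C:\Users\pandian\.m2\repository in the log).

```java
public class LocalRepoPath {
    /**
     * Builds the location of an installed jar relative to the local
     * repository root, using "/" as the separator for readability.
     */
    static String jarPath(String groupId, String artifactId, String version) {
        // Dots in the groupId become directory separators.
        String groupDir = groupId.replace('.', '/');
        String fileName = artifactId + "-" + version + ".jar";
        return String.join("/", groupDir, artifactId, version, fileName);
    }

    public static void main(String[] args) {
        // Coordinates of the WordCount project from the install log.
        System.out.println(jarPath("WordCount", "WordCount", "0.0.1-SNAPSHOT"));
        // Coordinates of the jatomrss project from the earlier post.
        System.out.println(jarPath("jatomrss", "jatomrss", "0.0.5-SNAPSHOT"));
    }
}
```

The first line printed corresponds to the path in the log, apart from the Windows separators.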

I copied the jar file to the Hadoop server as the hadoop user.

Input file

To execute the mapper, we need a file in HDFS. I have the following file already.

hadoop@gandhari:~/jars$ hadoop fs -cat /user/hadoop/lab01/month.txt

hadoop@gandhari:~/jars$ hadoop jar WordCount-0.0.1-SNAPSHOT.jar org.grassfield.hadoop.WordCountDriver /user/hadoop/lab01/month.txt /user/hadoop/lab01/output/10
16/09/09 23:32:22 INFO mapreduce.JobSubmitter: number of splits:1
16/09/09 23:32:23 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
16/09/09 23:32:23 INFO mapreduce.Job: Running job: job_local57180184_0001
16/09/09 23:32:23 INFO mapred.LocalJobRunner: Starting task: attempt_local57180184_0001_m_000000_0
16/09/09 23:32:23 INFO mapred.MapTask: Processing split: hdfs://gandhari:9000/user/hadoop/lab01/month.txt:0+90
16/09/09 23:32:23 INFO output.FileOutputCommitter: Saved output of task 'attempt_local57180184_0001_m_000000_0' to hdfs://gandhari:9000/user/hadoop/lab01/output/10/_temporary/0/task_local57180184_0001_m_000000
16/09/09 23:32:24 INFO mapreduce.Job:  map 100% reduce 0%
16/09/09 23:32:24 INFO mapreduce.Job: Job job_local57180184_0001 completed successfully

Output file

Let’s see if our output file has been created.

hadoop@gandhari:~/jars$ hadoop fs -ls /user/hadoop/lab01/output/10
Found 2 items
-rw-r--r--   3 hadoop supergroup          0 2016-09-09 23:32 /user/hadoop/lab01/output/10/_SUCCESS
-rw-r--r--   3 hadoop supergroup        114 2016-09-09 23:32 /user/hadoop/lab01/output/10/part-m-00000

The part-m prefix denotes that the file is the output of a mapper. Here is the output of our job.

hadoop@gandhari:~/jars$ hadoop fs -cat /user/hadoop/lab01/output/10/part-m-00000
chithirai       1
vaigasi 1
aani    1
aadi    1
aavani  1
purattasi       1
aippasi 1
karthikai       1
margazhi        1
thai    1
thai    1
panguni 1
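
Notice that thai appears twice with a count of 1 each time. With no reducer, nothing sums the counts, so the mapper's records are written out as they are. The sketch below mimics that behaviour in plain Java without any Hadoop classes. The class name and the sample line are my own assumptions, and the tab between word and count assumes the default separator of the text output format.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class MapOnlySketch {
    /** Mimics the mapper: one "token<TAB>1" record per comma-separated token. */
    static List<String> mapLine(String line) {
        List<String> records = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(line, ",");
        while (st.hasMoreTokens()) {
            // Every occurrence gets its own record; duplicates are not merged.
            records.add(st.nextToken() + "\t1");
        }
        return records;
    }

    public static void main(String[] args) {
        // Hypothetical input line, not the real content of month.txt.
        for (String record : mapLine("thai,thai,panguni")) {
            System.out.println(record);
        }
    }
}
```

Summing the duplicates into a single "thai 2" record would be the job of a reducer, which this lab does not use.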

Interesting, isn’t it!

Have a good week.

Adding local libraries to Maven

How do you add user-defined .jar files or in-house libraries to the Maven repository so that a Maven project builds correctly? Here are the steps I took to add two jar files, c:\eg_agent.jar and c:\eg_util.jar, to the Maven repository.

F:\sts-bundle\apache-maven-3.3.9-bin\apache-maven-3.3.9\bin>mvn install:install-file -Dfile=c:\eg_agent.jar -DartifactId=agent -Dversion=6.1.2 -Dpackaging=jar

Once it was added to the repository, I added the dependency as given below.


Here is another one.

F:\sts-bundle\apache-maven-3.3.9-bin\apache-maven-3.3.9\bin>mvn install:install-file -Dfile=c:\eg_util.jar -DartifactId=util -Dversion=6.1.2 -Dpackaging=jar



Oozie mkdistro fails with mvn: command not found

Oozie installation is not as straightforward as that of other applications. When I executed the mkdistro script, it failed with the error given below.

./ line 71: mvn: command not found

Maven is a build tool for Java, and it was not installed on my Ubuntu VM. Hence we need to install it with the command given below to make the script work.

hadoop@gandhari:~/oozie/bin$ sudo apt-get install maven


Maven and jUnit

package junit.framework does not exist


This was another ugly exception I got while building my simple application. Build Failure!


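Unfortunately, I had created the test case inside the src/main/java folder, which Maven doesn’t like, most likely because the JUnit dependency is declared with test scope and is therefore not on the classpath when the main sources are compiled. After moving the test to src/test/java, Maven is happy.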