Eclipse project dependency and Maven

Hi,

I haven’t been able to schedule posts for the past two days, as I got stuck creating the input data for my ongoing exercise.

Today, let’s talk about updating Maven dependencies when you have project dependencies in Eclipse.

Say, I have project dependencies in my Eclipse project.

[Screenshot: project dependencies in Eclipse]

Eclipse recognizes them well, and your code will not show any errors when you use classes from those dependencies.

But Maven doesn’t care about project dependencies unless you instruct it to, so my build process failed. Note that a command-line build can only resolve such a dependency once it has been installed into the local repository (run mvn install from the dependency project first).

The project I rely on is also a Maven project, with the following identifiers:

groupId: jatomrss
artifactId: jatomrss
version: 0.0.5-SNAPSHOT

I declare the same coordinates in my pom:

<!-- RSS feed parsing library - local eclipse project -->
<dependency>
    <groupId>jatomrss</groupId>
    <artifactId>jatomrss</artifactId>
    <version>0.0.5-SNAPSHOT</version>
    <scope>compile</scope>
</dependency>

So what happens when I build now?

[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 4.075 s
[INFO] Finished at: 2016-10-06T05:46:42+08:00
[INFO] Final Memory: 28M/337M
[INFO] ------------------------------------------------------------------------

Please check how to add non-Maven local jars to your Maven projects in my post Adding local libraries to Maven.

Good day.

Lab 02 – A Simple Hadoop Mapper with Eclipse and Maven

Hi Hadoopers,

All the tasks given below are done on the Hadoop server. I assume you have downloaded the Eclipse IDE for your platform. (I use STS for this demo, as it has the Maven plugin out of the box.)

Create a new Java project.

[Screenshot: creating a new Java project]

After creating the Java project, convert it into a Maven project.

[Screenshot: converting the project to a Maven project]

After adding the Maven capability, add the following Hadoop 2.6.4 dependencies.

<dependencies>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>2.6.4</version>
        </dependency>
        <!-- https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-client -->
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>2.6.4</version>
        </dependency>
        <!-- https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-hdfs -->
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-hdfs</artifactId>
            <version>2.6.4</version>
        </dependency>

    </dependencies>
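Since all three Hadoop artifacts share the same version, it is common (though not required) to factor the version into a Maven property so it only needs to be changed in one place. A sketch of that style; the property name hadoop.version is my own choice:

```xml
<properties>
    <hadoop.version>2.6.4</hadoop.version>
</properties>

<dependencies>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>${hadoop.version}</version>
    </dependency>
    <!-- hadoop-client and hadoop-hdfs reference ${hadoop.version} the same way -->
</dependencies>
```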

Mapper


Let’s create a mapper to count the words in a file. Refer to https://hadoop.apache.org/docs/r2.7.0/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html#Example:_WordCount_v1.0 for more details.

package org.grassfield.hadoop;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

/**
 * @author Pandian
 *
 */
public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    /**
     * This is to store the output string
     */
    private Text word = new Text();
    
    
    /**
     * This is to denote each occurrence of the word
     */
    private IntWritable one = new IntWritable(1);

    @Override
    protected void map(LongWritable key, Text value, Mapper<LongWritable, Text, Text, IntWritable>.Context context)
            throws IOException, InterruptedException {
        //read line by line
        String line = value.toString();
        
        //tokenize by comma
        StringTokenizer st = new StringTokenizer(line, ",");
        while (st.hasMoreTokens()) {
            //store the token as the word
            word.set(st.nextToken());
            
            //register the count
            context.write(word, one);
        }
    }

}
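The comma tokenization inside map() can be sketched in isolation as plain Java (the TokenizerDemo class below is just an illustration, outside Hadoop). Note that a line containing no commas comes back as a single token, which is why the one-word-per-line input file used later still works:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class TokenizerDemo {
    // Same splitting the mapper performs: comma-delimited tokens
    public static List<String> tokens(String line) {
        List<String> result = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(line, ",");
        while (st.hasMoreTokens()) {
            result.add(st.nextToken());
        }
        return result;
    }

    public static void main(String[] args) {
        // A comma-separated line yields one token per field
        System.out.println(tokens("thai,thai,panguni"));
        // A line without commas yields a single token
        System.out.println(tokens("margazhi"));
    }
}
```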

Driver

A mapper class cannot be executed by itself, hence we write a driver for it.

package org.grassfield.hadoop;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;
import org.apache.hadoop.util.Tool;

import org.apache.hadoop.io.Text;

/**
 * @author pandian
 *
 */
public class WordCountDriver extends Configured implements Tool {
    public WordCountDriver(Configuration conf){
        //Assign the configuration to super class. 
        //Otherwise you will get null pointer for getConf()
        super(conf);
    }

    @Override
    public int run(String[] args) throws Exception {
        //get the configuration
        Configuration conf = getConf();

        //initiate the parser with arguments and configuration
        GenericOptionsParser parser = new GenericOptionsParser(conf, args);
        args = parser.getRemainingArgs();
        
        //input and output HDFS locations are received as command line argument
        Path input = new Path(args[0]);
        Path output = new Path(args[1]);
        
        //Mapper Job is defined
        Job job = Job.getInstance(conf, "Word Count Driver");
        job.setJarByClass(getClass());
        
        //Output format is defined
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        
        //reducer count is defined. We do not have any reducers for this assignment
        job.setNumReduceTasks(0);
        
        //File input and output formats are specified
        FileInputFormat.setInputPaths(job,  input);
        FileOutputFormat.setOutputPath(job, output);
        
        //Set the mapper class and run the job
        job.setMapperClass(WordCountMapper.class);
        boolean success = job.waitForCompletion(true);
        
        return success ? 0 : 1;
    }
    
    /**
     * @param args input and output files are specified
     * @throws Exception
     */
    public static void main (String[] args) throws Exception {
        Configuration conf = new Configuration();
        WordCountDriver driver = new WordCountDriver(conf);
        //propagate the job status as the process exit code
        System.exit(driver.run(args));
    }

}

Export

Let’s execute Maven with the clean install goals. The jar will be installed to the location given below.

[INFO] --- maven-install-plugin:2.4:install (default-install) @ WordCount ---
[INFO] Installing D:\workspace_gandhari\WordCount\target\WordCount-0.0.1-SNAPSHOT.jar to C:\Users\pandian\.m2\repository\WordCount\WordCount\0.0.1-SNAPSHOT\WordCount-0.0.1-SNAPSHOT.jar

I then copy the jar file to the Hadoop server as the hadoop user.

Input file

To execute the mapper, we need a file in HDFS. I have the following file already:

hadoop@gandhari:~/jars$ hadoop fs -cat /user/hadoop/lab01/month.txt
chithirai
vaigasi
aani
aadi
aavani
purattasi
aippasi
karthikai
margazhi
thai
thai
panguni

hadoop@gandhari:~/jars$ hadoop jar WordCount-0.0.1-SNAPSHOT.jar org.grassfield.hadoop.WordCountDriver /user/hadoop/lab01/month.txt /user/hadoop/lab01/output/10
...
16/09/09 23:32:22 INFO mapreduce.JobSubmitter: number of splits:1
16/09/09 23:32:23 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
16/09/09 23:32:23 INFO mapreduce.Job: Running job: job_local57180184_0001
16/09/09 23:32:23 INFO mapred.LocalJobRunner: Starting task: attempt_local57180184_0001_m_000000_0
16/09/09 23:32:23 INFO mapred.MapTask: Processing split: hdfs://gandhari:9000/user/hadoop/lab01/month.txt:0+90
16/09/09 23:32:23 INFO output.FileOutputCommitter: Saved output of task 'attempt_local57180184_0001_m_000000_0' to hdfs://gandhari:9000/user/hadoop/lab01/output/10/_temporary/0/task_local57180184_0001_m_000000
16/09/09 23:32:24 INFO mapreduce.Job:  map 100% reduce 0%
16/09/09 23:32:24 INFO mapreduce.Job: Job job_local57180184_0001 completed successfully

Output file

Let’s see if our output file was created.

hadoop@gandhari:~/jars$ hadoop fs -ls /user/hadoop/lab01/output/10
Found 2 items
-rw-r--r--   3 hadoop supergroup          0 2016-09-09 23:32 /user/hadoop/lab01/output/10/_SUCCESS
-rw-r--r--   3 hadoop supergroup        114 2016-09-09 23:32 /user/hadoop/lab01/output/10/part-m-00000

The part-m prefix denotes mapper output (a reducer would produce part-r files). Here is the output of our job.

hadoop@gandhari:~/jars$ hadoop fs -cat /user/hadoop/lab01/output/10/part-m-00000
chithirai       1
vaigasi 1
aani    1
aadi    1
aavani  1
purattasi       1
aippasi 1
karthikai       1
margazhi        1
thai    1
thai    1
panguni 1
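Because the job runs with zero reduce tasks, the two thai records stay separate in the output. The aggregation a reducer would perform can be sketched in plain Java (the ReduceSketch class is illustrative only, not part of the job):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ReduceSketch {
    // Sums the counts per word, as a reducer would for the mapper's (word, 1) pairs
    public static Map<String, Integer> aggregate(String[] words) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String w : words) {
            counts.merge(w, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] words = {"chithirai", "thai", "thai", "panguni"};
        // With a reducer, the duplicate "thai" entries collapse into a count of 2
        System.out.println(aggregate(words));
    }
}
```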

Interesting, isn’t it?

Have a good week.

Adding local libraries to Maven

How do you add user-defined .jar files or in-house libraries to the local Maven repository, so that your Maven project builds correctly? Here are the steps I followed to add two jar files, c:\eg_agent.jar and c:\eg_util.jar, to the Maven repository.

F:\sts-bundle\apache-maven-3.3.9-bin\apache-maven-3.3.9\bin>mvn install:install-file -Dfile=c:\eg_agent.jar -DgroupId=com.eg -DartifactId=agent -Dversion=6.1.2 -Dpackaging=jar

Once it is added to the repository, I add the dependency as given below.

<dependency>
    <groupId>com.eg</groupId>
    <artifactId>agent</artifactId>
    <version>6.1.2</version>
</dependency>

Here is another one.

F:\sts-bundle\apache-maven-3.3.9-bin\apache-maven-3.3.9\bin>mvn install:install-file -Dfile=c:\eg_util.jar -DgroupId=com.eg -DartifactId=util -Dversion=6.1.2 -Dpackaging=jar

<dependency>
    <groupId>com.eg</groupId>
    <artifactId>util</artifactId>
    <version>6.1.2</version>
</dependency>

Ref http://stackoverflow.com/questions/29330577/maven-3-3-1-eclipse-dmaven-multimoduleprojectdirectory-system-propery-is-not-s

Oozie mkdistro fails with mvn: command not found

Oozie installation is not as straightforward as that of other applications. When I executed the mkdistro script, it failed with the error given below:

./mkdistro.sh
./mkdistro.sh: line 71: mvn: command not found

Maven is a build tool for Java, and it was not installed in my Ubuntu VM. Hence we need to install it using the command given below to make the script work:

hadoop@gandhari:~/oozie/bin$ sudo apt-get install maven

 

Maven and jUnit

package junit.framework does not exist


This was another ugly error I got while building my simple application. Build failure!


Unfortunately I had created the test case inside the src/main/java folder, which Maven doesn’t like. After moving the test to src/test/java, Maven was happy.

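For the build to compile tests under src/test/java, JUnit must also be declared as a test-scoped dependency in the pom. A typical entry (the version below is only an example):

```xml
<dependency>
    <groupId>junit</groupId>
    <artifactId>junit</artifactId>
    <version>4.12</version>
    <scope>test</scope>
</dependency>
```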