Lab 04 – A Hadoop Reducer demo

Hi Hadoopers,

We have seen two posts demonstrating how to write simple mappers:

Lab 02 – A simple hadoop mapper with Eclipse and Maven

Lab 03 – A Hadoop Mapper to get the category of an RSS feed with Eclipse and Maven

Let's take the Lab 03 exercise and write a reducer program for it.

The output of the Mapper is given below.

Big Data        1
YARN    1
Application Master      1
Resource Manager        1
Task Container  1
Big Data        1
Hadoop  1
Map Reduce      1
MapReduce       1
Job Tracker     1
Task Tracker    1
Big Data        1
HDFS    1
HDFS commands   1
HDFS permissions        1
Big Data        1
HDFS    1
Big Data        1
HDFS commands   1
Big Data        1
HDFS    1
Big Data        1
HDFS    1
HDFS block      1
HDFS replication        1
Big Data        1
HDFS    1
hadoop federation       1
hadoop high availability        1
hadoop rack-aware       1
ZooKeeperFailoverController     1
Big Data        1
HDFS    1
BDFS split      1
HDFS block      1
Big Data        1
Flume   1
Hadoop  1
HBase   1
Hive    1
Hue     1
Oozie   1
Pig     1
sqoop   1
ZooKeeper       1

Reducer

Let's write a reducer by extending the Reducer class.

package org.grassfield.hadoop;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

/**
 * Reducer for Feed Category Mapper
 * @author pandian
 *
 */
public class FeedCategoryReducer extends 
    Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values,
            Context context) throws IOException, InterruptedException {
        int sum=0;
        for (IntWritable value:values){
            sum+=value.get();
        }
        context.write(key, new IntWritable(sum));
    }
}
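Between the map and reduce phases, the framework groups all values emitted for the same key, so our reducer receives, for example, ten 1s for "Big Data" and simply adds them up. Here is a minimal plain-Java sketch of that group-and-sum step (no Hadoop dependencies; the sample pairs are a made-up subset of the mapper output above):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ShuffleSumSketch {
    public static void main(String[] args) {
        // A few (key, 1) pairs, as the mapper would emit them
        String[] mapperOutput = {
            "Big Data", "HDFS", "Big Data", "HDFS", "Big Data", "Hadoop"
        };
        // The shuffle groups values by key; the reducer sums each group
        Map<String, Integer> sums = new LinkedHashMap<>();
        for (String key : mapperOutput) {
            sums.merge(key, 1, Integer::sum);
        }
        System.out.println(sums); // {Big Data=3, HDFS=2, Hadoop=1}
    }
}
```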

Changes to Driver program

Referring to my earlier post, I made the following changes to plug in the reducer.

public int run(String[] args) throws Exception {
        Configuration conf = getConf();
        GenericOptionsParser parser = new GenericOptionsParser(conf, args);
        args = parser.getRemainingArgs();
        Path input = new Path(args[0]);
        Path output = new Path(args[1]);
        Job job = Job.getInstance(conf, "Feed Category Count");
        job.setJarByClass(getClass());
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        job.setReducerClass(FeedCategoryReducer.class);
        job.setNumReduceTasks(3);
        FileInputFormat.setInputPaths(job, input);
        FileOutputFormat.setOutputPath(job, output);
        job.setMapperClass(FeedCategoryCountMapper.class);
        return job.waitForCompletion(true) ? 0 : 1;
    }

Execution

OK, it's time to run the job and check the reducer's output.

hadoop@gandhari:~/jars$ hadoop jar FeedCategoryCount-1.0.jar org.grassfield.hadoop.FeedCategoryCountDriver /user/hadoop/lab03/feed /user/hadoop/output/lab04

The job completed successfully. Let's look at the output from the reducers.

hadoop@gandhari:~/jars$ hadoop fs -ls /user/hadoop/output/lab04
Found 4 items
-rw-r--r--   3 hadoop supergroup          0 2016-09-10 16:05 /user/hadoop/output/lab04/_SUCCESS
-rw-r--r--   3 hadoop supergroup        136 2016-09-10 16:05 /user/hadoop/output/lab04/part-r-00000
-rw-r--r--   3 hadoop supergroup        118 2016-09-10 16:05 /user/hadoop/output/lab04/part-r-00001
-rw-r--r--   3 hadoop supergroup        132 2016-09-10 16:05 /user/hadoop/output/lab04/part-r-00002
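Since we called setNumReduceTasks(3), we get three part files, one per reducer. The default HashPartitioner decides which reducer each key goes to using, essentially, (hash & Integer.MAX_VALUE) % numReduceTasks. One caveat: Hadoop hashes the Text key's UTF-8 bytes, which is not the same hash as Java's String.hashCode, so the plain-Java sketch below only illustrates the formula, not the exact assignments you see above:

```java
public class PartitionSketch {
    // Default HashPartitioner logic: (hash & Integer.MAX_VALUE) % numPartitions.
    // String.hashCode stands in for Text's byte-based hash here.
    static int partition(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        String[] keys = {"Big Data", "HDFS", "Hadoop", "ZooKeeper"};
        for (String key : keys) {
            System.out.println(key + " -> part-r-0000" + partition(key, 3));
        }
    }
}
```

This is why related keys scatter across the three files: placement depends only on the key's hash, not on alphabetical order.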

Let's cat them.

hadoop@gandhari:~/jars$ hadoop fs -cat /user/hadoop/output/lab04/part-r-00000
HDFS    6
HDFS replication        1
Hue     1
Oozie   1
YARN    1
ZooKeeper       1
ZooKeeperFailoverController     1
hadoop federation       1
hadoop high availability        1
hadoop@gandhari:~/jars$ hadoop fs -cat /user/hadoop/output/lab04/part-r-00001
Big Data        10
HDFS block      2
HDFS commands   2
HDFS permissions        1
Hive    1
Map Reduce      1
MapReduce       1
Pig     1
hadoop rack-aware       1
hadoop@gandhari:~/jars$ hadoop fs -cat /user/hadoop/output/lab04/part-r-00002
Application Master      1
BDFS split      1
Flume   1
HBase   1
Hadoop  2
Job Tracker     1
Resource Manager        1
Task Container  1
Task Tracker    1
sqoop   1
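Notice that each part file is sorted by key, with all the uppercase-starting keys before the lowercase ones: Hadoop sorts Text keys by their raw UTF-8 bytes, which for ASCII matches Java String's natural ordering. A quick sketch with a few keys from part-r-00000:

```java
import java.util.Arrays;

public class SortOrderSketch {
    public static void main(String[] args) {
        // String's natural (char-by-char) ordering matches Text's byte-wise
        // comparator for ASCII: uppercase letters sort before lowercase ones.
        String[] keys = {"hadoop federation", "HDFS", "ZooKeeper", "Hue"};
        Arrays.sort(keys);
        System.out.println(Arrays.toString(keys));
        // [HDFS, Hue, ZooKeeper, hadoop federation]
    }
}
```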

See you in another interesting post.


