Lab 06 – A simple Hadoop partitioner

Hi Hadoopers,

Lab 04 – A Hadoop Reducer demo covered a MapReduce program with 3 reducers. By default, Hadoop uses a hashing algorithm to decide which reducer each key goes to. If needed, we can customize this. Customization comes into play when you want to do the load balancing yourself.
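
To make the default behaviour a bit more concrete: Hadoop's HashPartitioner clears the sign bit of the key's hash code and takes the remainder modulo the number of reducers. The sketch below mimics that arithmetic in plain Java. It uses String.hashCode() as a stand-in for Text.hashCode() (an assumption, so the reducer numbers on a real cluster will differ), and the class and method names are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;

public class HashPartitionSketch {

    // Same arithmetic as Hadoop's default HashPartitioner:
    // clear the sign bit, then take the remainder.
    // Assumption: String.hashCode() stands in for Text.hashCode().
    public static int partitionFor(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("Big Data", "HDFS", "Oozie", "YARN");
        for (String key : keys) {
            System.out.println(key + " -> reducer " + partitionFor(key, 3));
        }
    }
}
```

The point is that the reducer is chosen by the hash value alone, so we have no say in which keys end up together.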

Lab 04 follows this process.


MapReduce program without a partitioner

In this exercise, we are going to decide what type of records each reducer will get. I’ll use reducer 1 for keys starting with a vowel, reducer 2 for keys starting with i, j, k, l or m (i is a vowel, so it is already taken by reducer 1), and all the other records go to reducer 3.


MapReduce program with partitioner



Here is the partitioner code that accomplishes this.

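package org.grassfield.hadoop;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

/**
 * Let's decide the way to manage the load of the reducers.
 * Reducer 1 - Keys starting with a vowel
 * Reducer 2 - Keys starting with ijklm (i is already caught by the vowel rule)
 * Reducer 3 - All remaining keys
 * @author pandian
 */
public class FeedCategoryPartitioner extends Partitioner<Text, IntWritable> {

    // Returns partitions 0, 1 and 2, so the job must run with at least 3 reducers.
    @Override
    public int getPartition(Text word, IntWritable count, int numReducer) {
        String s = word.toString();
        // Empty keys go to the first reducer.
        if (s.length() == 0)
            return 0;
        // Assumption: the first letter is compared case-insensitively,
        // which matches the output shown below.
        char first = Character.toLowerCase(s.charAt(0));
        if ("aeiou".indexOf(first) >= 0)
            return 0;
        if ("ijklm".indexOf(first) >= 0)
            return 1;
        return 2;
    }
}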



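Let’s export the jar and copy it to the Hadoop machine. For the custom partitioner to take effect, the driver has to register it with job.setPartitionerClass(FeedCategoryPartitioner.class) and keep three reducers with job.setNumReduceTasks(3). Here is the output.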

hadoop@gandhari:~/jars$ hadoop jar FeedCategoryCount-6.0.jar org.grassfield.hadoop.FeedCategoryCountDriver /user/hadoop/lab03/feed /user/hadoop/output/lab06_01
hadoop@gandhari:~/jars$ hadoop fs -ls /user/hadoop/output/lab06_01
Found 4 items
-rw-r--r--   3 hadoop supergroup          0 2016-09-10 18:41 /user/hadoop/output/lab06_01/_SUCCESS
-rw-r--r--   3 hadoop supergroup         29 2016-09-10 18:41 /user/hadoop/output/lab06_01/part-r-00000
-rw-r--r--   3 hadoop supergroup         39 2016-09-10 18:41 /user/hadoop/output/lab06_01/part-r-00001
-rw-r--r--   3 hadoop supergroup        318 2016-09-10 18:41 /user/hadoop/output/lab06_01/part-r-00002
hadoop@gandhari:~/jars$ hadoop fs -cat /user/hadoop/output/lab06_01/part-r-00000
Application Master      1
Oozie   1
hadoop@gandhari:~/jars$ hadoop fs -cat /user/hadoop/output/lab06_01/part-r-00001
Job Tracker     1
Map Reduce      1
MapReduce       1
hadoop@gandhari:~/jars$ hadoop fs -cat /user/hadoop/output/lab06_01/part-r-00002
BDFS split      1
Big Data        10
Flume   1
HBase   1
HDFS    6
HDFS block      2
HDFS commands   2
HDFS permissions        1
HDFS replication        1
Hadoop  2
Hive    1
Hue     1
Pig     1
Resource Manager        1
Task Container  1
Task Tracker    1
YARN    1
ZooKeeper       1
ZooKeeperFailoverController     1
hadoop federation       1
hadoop high availability        1
hadoop rack-aware       1
sqoop   1
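
As a quick sanity check, the routing rule can be replayed in plain Java without a cluster. The sketch below is not part of the lab jar: the class name PartitionRuleCheck and the method partitionOf are hypothetical, and the rule is assumed to be case-insensitive with vowels tested first. It feeds in a few keys from the output above and compares the result with the part file each key actually landed in.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PartitionRuleCheck {

    // Plain-Java copy of the routing rule (assumed case-insensitive,
    // with vowels tested before the ijklm range).
    public static int partitionOf(String key) {
        if (key.isEmpty()) {
            return 0;
        }
        char first = Character.toLowerCase(key.charAt(0));
        if ("aeiou".indexOf(first) >= 0) {
            return 0;
        }
        if ("ijklm".indexOf(first) >= 0) {
            return 1;
        }
        return 2;
    }

    public static void main(String[] args) {
        // Key -> index of the part file it appeared in, taken from the listing above.
        Map<String, Integer> observed = new LinkedHashMap<>();
        observed.put("Application Master", 0);
        observed.put("Oozie", 0);
        observed.put("Job Tracker", 1);
        observed.put("MapReduce", 1);
        observed.put("Big Data", 2);
        observed.put("hadoop federation", 2);
        observed.put("sqoop", 2);

        for (Map.Entry<String, Integer> e : observed.entrySet()) {
            int computed = partitionOf(e.getKey());
            String verdict = (computed == e.getValue()) ? "ok" : "MISMATCH";
            System.out.println(e.getKey() + " -> part-r-0000" + computed + " " + verdict);
        }
    }
}
```

Keeping the rule in a small static method like this also makes it easy to unit test before exporting the jar.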
