Lab 06 – A simple Hadoop partitioner

Hi Hadoopers,

Lab 04 – A Hadoop Reducer demo talks about a MapReduce program with 3 reducers. By default, Hadoop uses a hashing algorithm to decide which reducer each key goes to, and hence how the load is spread across the reducers. If needed, we can customize this. Customization comes into the picture when you want to do the load balancing yourself.
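For reference, the stock behaviour hashes the key and takes the remainder modulo the number of reducers. Below is a minimal sketch of that default, written against the same <Text, IntWritable> pairs we use in these labs (the class name DefaultStylePartitioner is just for illustration, it is not part of this lab's code):

package org.grassfield.hadoop;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

/**
 * Roughly what Hadoop's default HashPartitioner does:
 * hash the key and take the remainder modulo the number of reducers.
 */
public class DefaultStylePartitioner extends Partitioner<Text, IntWritable> {

    @Override
    public int getPartition(Text key, IntWritable count, int numReduceTasks) {
        // mask off the sign bit so the partition number is never negative
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }
}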

So Lab 04 follows the flow shown below.

[Figure: MapReduce program without a partitioner]

In this exercise, we are going to decide what type of records each reducer will get. I'll send keys starting with a vowel to reducer 1, keys starting with i, j, k, l or m to reducer 2, and all the other records to reducer 3. (Since 'i' is also a vowel, keys starting with 'i' end up in reducer 1.)

[Figure: MapReduce program with a partitioner]

Partitioner

Here is how we accomplished this. The partitioner code is given below.

package org.grassfield.hadoop;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

/**
 * Let's decide the way to manage the load of the reducers.
 * Reducer 1 - keys starting with a vowel
 * Reducer 2 - keys starting with i, j, k, l or m
 * Reducer 3 - everything else
 * @author pandian
 */
public class FeedCategoryPartitioner extends Partitioner<Text, IntWritable> {

    @Override
    public int getPartition(Text word, IntWritable count, int numReducers) {
        String s = word.toString();
        if (s.length() == 0)
            return 0;

        // compare on the lower-cased first character of the key
        char first = Character.toLowerCase(s.charAt(0));

        // partition 0 (reducer 1): keys starting with a vowel
        if ("aeiou".indexOf(first) >= 0)
            return 0;

        // partition 1 (reducer 2): keys starting with i-m
        // ('i' never reaches here; it is already caught by the vowel check)
        if ("ijklm".indexOf(first) >= 0)
            return 1;

        // partition 2 (reducer 3): everything else
        return 2;
    }

}
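
The partitioner takes effect only when the driver registers it and asks for three reduce tasks. The driver changes are not reproduced in this post, so the following is only a rough sketch of how FeedCategoryCountDriver might wire it in; apart from setPartitionerClass() and setNumReduceTasks(), the details (job name, mapper and reducer setup) are assumed from the earlier labs.

package org.grassfield.hadoop;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class FeedCategoryCountDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "feed category count");
        job.setJarByClass(FeedCategoryCountDriver.class);

        // ... mapper and reducer setup from the earlier labs goes here ...

        // The two lines that matter for this lab: register the custom
        // partitioner and run three reducers, one per partition number
        // returned by getPartition().
        job.setPartitionerClass(FeedCategoryPartitioner.class);
        job.setNumReduceTasks(3);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}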

Execution

Let’s export the jar, copy it to the Hadoop machine and run the job. Here is the output.

hadoop@gandhari:~/jars$ hadoop jar FeedCategoryCount-6.0.jar org.grassfield.hadoop.FeedCategoryCountDriver /user/hadoop/lab03/feed /user/hadoop/output/lab06_01
hadoop@gandhari:~/jars$ hadoop fs -ls /user/hadoop/output/lab06_01
Found 4 items
-rw-r--r--   3 hadoop supergroup          0 2016-09-10 18:41 /user/hadoop/output/lab06_01/_SUCCESS
-rw-r--r--   3 hadoop supergroup         29 2016-09-10 18:41 /user/hadoop/output/lab06_01/part-r-00000
-rw-r--r--   3 hadoop supergroup         39 2016-09-10 18:41 /user/hadoop/output/lab06_01/part-r-00001
-rw-r--r--   3 hadoop supergroup        318 2016-09-10 18:41 /user/hadoop/output/lab06_01/part-r-00002
hadoop@gandhari:~/jars$ hadoop fs -cat /user/hadoop/output/lab06_01/part-r-00000
Application Master      1
Oozie   1
hadoop@gandhari:~/jars$ hadoop fs -cat /user/hadoop/output/lab06_01/part-r-00001
Job Tracker     1
Map Reduce      1
MapReduce       1
hadoop@gandhari:~/jars$ hadoop fs -cat /user/hadoop/output/lab06_01/part-r-00002
BDFS split      1
Big Data        10
Flume   1
HBase   1
HDFS    6
HDFS block      2
HDFS commands   2
HDFS permissions        1
HDFS replication        1
Hadoop  2
Hive    1
Hue     1
Pig     1
Resource Manager        1
Task Container  1
Task Tracker    1
YARN    1
ZooKeeper       1
ZooKeeperFailoverController     1
hadoop federation       1
hadoop high availability        1
hadoop rack-aware       1
sqoop   1
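
As expected, part-r-00000 holds the keys starting with vowels (Application Master, Oozie), part-r-00001 holds the keys starting with j–m (Job Tracker, Map Reduce, MapReduce), and part-r-00002 holds everything else.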
