Illegal partition for Big Data (3)


I was writing a sample MapReduce program with a custom Partitioner for the 6th lab exercise. My job failed with the teasing exception given below.

16/09/10 18:28:40 WARN mapred.LocalJobRunner: job_local938197532_0001
java.lang.Exception: Illegal partition for Big Data (3)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(
        at org.apache.hadoop.mapred.LocalJobRunner$
Caused by: Illegal partition for Big Data (3)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(
        at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(
        at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(
        at org.apache.hadoop.mapred.MapTask.runNewMapper(
        at org.apache.hadoop.mapred.LocalJobRunner$Job$
        at java.util.concurrent.Executors$
        at java.util.concurrent.ThreadPoolExecutor.runWorker(
        at java.util.concurrent.ThreadPoolExecutor$
16/09/10 18:28:41 INFO mapreduce.Job: Job job_local938197532_0001 failed with state FAILED due to: NA


Actually there were only 3 reducers, numbered 0, 1 and 2, so the valid partition numbers run from 0 to 2. My partitioner returned 3 instead of 2 for some keys. Returning the correct reducer number from the partitioner solved this problem.
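To see why Hadoop rejects the value, remember that a partition number is legal only when it falls between 0 and numReduceTasks - 1. Here is a minimal sketch of that range check in plain Java; the class and method names are my own for illustration, not Hadoop's:

```java
public class PartitionCheck {

    // A partition is legal only when 0 <= partition < numReduceTasks.
    // Hadoop's map-output collector performs an equivalent check and
    // fails the task with "Illegal partition for <key> (<n>)" otherwise.
    static boolean isValidPartition(int partition, int numReduceTasks) {
        return partition >= 0 && partition < numReduceTasks;
    }

    public static void main(String[] args) {
        // With 3 reducers, the valid partitions are 0, 1 and 2.
        System.out.println(isValidPartition(2, 3)); // true
        System.out.println(isValidPartition(3, 3)); // false - the bug above
    }
}
```

So returning 3 from a partitioner in a 3-reducer job is exactly the off-by-one that produced the stack trace.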


Lab 06 – A simple Hadoop partitioner

Hi Hadoopers,

Lab 04 – A Hadoop Reducer demo describes a MapReduce program with 3 reducers. By default, Hadoop uses a hashing algorithm to decide which reducer each key goes to. If needed, we can customize this: a custom partitioner comes into the scene when you want to do the load balancing yourself.
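For reference, the default HashPartitioner derives the reducer number from the key's hash code. A sketch of the same formula in plain Java (using String keys here instead of Hadoop's Text, and a demo class name of my own):

```java
public class HashPartitionDemo {

    // Mirrors Hadoop's HashPartitioner: mask off the sign bit so the
    // remainder is never negative, then take the hash modulo the
    // number of reducers.
    static int getPartition(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        // Every key deterministically lands on a reducer 0, 1 or 2.
        for (String key : new String[] {"Big Data", "HDFS", "YARN"}) {
            System.out.println(key + " -> " + getPartition(key, 3));
        }
    }
}
```

Because the assignment depends only on the hash, keys spread roughly evenly but with no regard for their content, which is what our custom partitioner changes below.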

So Lab 04 follows this process.


Map Reduce program without a partitioner

In this exercise, we are going to decide what type of records each reducer will get. I’ll use reducer 0 for keys starting with a vowel, reducer 1 for keys starting with i, j, k, l or m, and reducer 2 for all the other records.


MapReduce program with partitioner



Here is how we accomplished this, starting with the partitioner code.

package org.grassfield.hadoop;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

/**
 * Let's decide the way to manage the load of the reducers.
 * Reducer 0 - Keys starting with a vowel
 * Reducer 1 - Keys starting with i, j, k, l or m
 * Reducer 2 - Remaining all
 * @author pandian
 */
public class FeedCategoryPartitioner extends Partitioner<Text, IntWritable> {

    @Override
    public int getPartition(Text word, IntWritable count, int numReducer) {
        String s = word.toString().toLowerCase();
        if (s.length() == 0)
            return 0;
        char c = s.charAt(0);
        // Vowels go to reducer 0 (checked first, so 'i' counts as a vowel)
        if (c == 'a' || c == 'e' || c == 'i' || c == 'o' || c == 'u')
            return 0;
        // Keys starting with i..m go to reducer 1
        if (c >= 'i' && c <= 'm')
            return 1;
        // Everything else goes to reducer 2
        return 2;
    }
}

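For the partitioner to take effect, the driver has to register it and request three reducers. This is only a sketch of the relevant wiring, assuming the job setup lives in FeedCategoryCountDriver as in the earlier labs; the mapper, reducer and other job settings are elided here and your actual driver will differ:

```java
package org.grassfield.hadoop;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class FeedCategoryCountDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "FeedCategoryCount");
        job.setJarByClass(FeedCategoryCountDriver.class);
        // Mapper, reducer and output key/value classes from Lab 04 go here.

        // Plug in our custom partitioner ...
        job.setPartitionerClass(FeedCategoryPartitioner.class);
        // ... and request 3 reducers, so getPartition() may return 0, 1 or 2.
        job.setNumReduceTasks(3);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

If setNumReduceTasks were left at a value smaller than 3, any key routed to a missing reducer would trigger the "Illegal partition" failure shown at the top of this post.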

Let’s export the jar, copy it to the Hadoop machine and run the job. Here is the output.

hadoop@gandhari:~/jars$ hadoop jar FeedCategoryCount-6.0.jar org.grassfield.hadoop.FeedCategoryCountDriver /user/hadoop/lab03/feed /user/hadoop/output/lab06_01
hadoop@gandhari:~/jars$ hadoop fs -ls /user/hadoop/output/lab06_01
Found 4 items
-rw-r--r--   3 hadoop supergroup          0 2016-09-10 18:41 /user/hadoop/output/lab06_01/_SUCCESS
-rw-r--r--   3 hadoop supergroup         29 2016-09-10 18:41 /user/hadoop/output/lab06_01/part-r-00000
-rw-r--r--   3 hadoop supergroup         39 2016-09-10 18:41 /user/hadoop/output/lab06_01/part-r-00001
-rw-r--r--   3 hadoop supergroup        318 2016-09-10 18:41 /user/hadoop/output/lab06_01/part-r-00002
hadoop@gandhari:~/jars$ hadoop fs -cat /user/hadoop/output/lab06_01/part-r-00000
Application Master      1
Oozie   1
hadoop@gandhari:~/jars$ hadoop fs -cat /user/hadoop/output/lab06_01/part-r-00001
Job Tracker     1
Map Reduce      1
MapReduce       1
hadoop@gandhari:~/jars$ hadoop fs -cat /user/hadoop/output/lab06_01/part-r-00002
BDFS split      1
Big Data        10
Flume   1
HBase   1
HDFS    6
HDFS block      2
HDFS commands   2
HDFS permissions        1
HDFS replication        1
Hadoop  2
Hive    1
Hue     1
Pig     1
Resource Manager        1
Task Container  1
Task Tracker    1
YARN    1
ZooKeeper       1
ZooKeeperFailoverController     1
hadoop federation       1
hadoop high availability        1
hadoop rack-aware       1
sqoop   1