java.io.IOException: Illegal partition for Big Data (3)

Hi,

I was writing a sample MapReduce program with a custom Partitioner for the 6th lab exercise. My job failed with the teasing exception given below.

16/09/10 18:28:40 WARN mapred.LocalJobRunner: job_local938197532_0001
java.lang.Exception: java.io.IOException: Illegal partition for Big Data (3)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: java.io.IOException: Illegal partition for Big Data (3)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1079)
        at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:712)
        at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
        at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
        at org.grassfield.hadoop.FeedCategoryCountMapper.map(FeedCategoryCountMapper.java:32)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
        at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
16/09/10 18:28:41 INFO mapreduce.Job: Job job_local938197532_0001 failed with state FAILED due to: NA

Actually there were only 3 reducers, so the valid partition numbers were 0, 1 and 2. My partitioner returned 3 where it should have returned 2. Returning the correct partition number solved this problem.
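
The collector that threw the exception (the MapTask$MapOutputBuffer.collect frame in the trace above) rejects any partition number outside the range 0 to numPartitions - 1. As a minimal illustration of that contract (the class and its key logic below are made up for this example, not my lab code):

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Illustrative only: whatever the logic, getPartition must return a value
// in [0, numPartitions). With 3 reducers that means 0, 1 or 2; returning
// 3 reproduces the "Illegal partition" IOException above.
public class RangeSafePartitioner extends Partitioner<Text, IntWritable> {
    @Override
    public int getPartition(Text key, IntWritable value, int numPartitions) {
        String s = key.toString();
        if (s.isEmpty())
            return 0;
        // The modulo keeps the result inside 0..numPartitions-1.
        return s.charAt(0) % numPartitions;
    }
}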

Lab 06 – A simple hadoop partitioner

Hi Hadoopers,

Lab 04 – A Hadoop Reducer demo talks about a MapReduce program with 3 reducers. By default Hadoop uses a hashing algorithm to decide which reducer receives each key, and hence the load on each reducer. If needed we can customize this; customization comes into the picture when you want to do the load balancing yourself.
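
For reference, the default org.apache.hadoop.mapreduce.lib.partition.HashPartitioner is essentially a one-liner: hash the key, clear the sign bit, and take the result modulo the number of reducers.

import org.apache.hadoop.mapreduce.Partitioner;

// The stock HashPartitioner, shown for reference. The bitwise AND with
// Integer.MAX_VALUE clears the sign bit, so the modulo never goes negative.
public class HashPartitioner<K, V> extends Partitioner<K, V> {
    @Override
    public int getPartition(K key, V value, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }
}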

So Lab 04 follows the process shown below.

[Figure: MapReduce program without a partitioner]

In this exercise, we are going to decide what type of records each reducer will get. I’ll use the first reducer (partition 0) for keys starting with a vowel, the second (partition 1) for keys starting with i to m, and all the other records go to the third (partition 2). Since i is a vowel it is claimed by partition 0 first, so partition 1 effectively gets j to m.

[Figure: MapReduce program with a partitioner]

Partitioner

Here is the partitioner code that accomplishes this.

package org.grassfield.hadoop;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

/**
 * Let's decide the way to manage the load of the reducers.
 * Partition 0 - keys starting with a vowel
 * Partition 1 - keys starting with j, k, l or m
 * Partition 2 - everything else
 * (Note the 0-based numbering: with 3 reducers the valid
 * partitions are 0, 1 and 2.)
 * @author pandian
 */
public class FeedCategoryPartitioner extends Partitioner<Text, IntWritable> {

    @Override
    public int getPartition(Text word, IntWritable count, int numPartitions) {
        String s = word.toString();
        if (s.isEmpty())
            return 0;

        char c = Character.toLowerCase(s.charAt(0));

        // Partition 0 - keys starting with a vowel (a, e, i, o, u)
        if (c == 'a' || c == 'e' || c == 'i' || c == 'o' || c == 'u')
            return 0;

        // Partition 1 - keys starting with j..m. The plan said "i to m",
        // but i is a vowel and is already claimed by partition 0 above.
        if (c >= 'j' && c <= 'm')
            return 1;

        // Partition 2 - everything else
        return 2;
    }

}
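
For the partitioner to take effect, the driver must register it and request three reduce tasks. The full FeedCategoryCountDriver is not listed in this post, but the relevant wiring uses the standard Job API and looks like this:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

// Driver excerpt (the complete FeedCategoryCountDriver is not shown here).
Configuration conf = new Configuration();
Job job = Job.getInstance(conf, "feed category count");
job.setPartitionerClass(FeedCategoryPartitioner.class);
// Three reduce tasks, so the valid partition numbers are 0, 1 and 2.
job.setNumReduceTasks(3);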

Execution

Let’s export the jar, copy it to the Hadoop machine and run the job. Here is the output.

hadoop@gandhari:~/jars$ hadoop jar FeedCategoryCount-6.0.jar org.grassfield.hadoop.FeedCategoryCountDriver /user/hadoop/lab03/feed /user/hadoop/output/lab06_01
hadoop@gandhari:~/jars$ hadoop fs -ls /user/hadoop/output/lab06_01
Found 4 items
-rw-r--r--   3 hadoop supergroup          0 2016-09-10 18:41 /user/hadoop/output/lab06_01/_SUCCESS
-rw-r--r--   3 hadoop supergroup         29 2016-09-10 18:41 /user/hadoop/output/lab06_01/part-r-00000
-rw-r--r--   3 hadoop supergroup         39 2016-09-10 18:41 /user/hadoop/output/lab06_01/part-r-00001
-rw-r--r--   3 hadoop supergroup        318 2016-09-10 18:41 /user/hadoop/output/lab06_01/part-r-00002
hadoop@gandhari:~/jars$ hadoop fs -cat /user/hadoop/output/lab06_01/part-r-00000
Application Master      1
Oozie   1
hadoop@gandhari:~/jars$ hadoop fs -cat /user/hadoop/output/lab06_01/part-r-00001
Job Tracker     1
Map Reduce      1
MapReduce       1
hadoop@gandhari:~/jars$ hadoop fs -cat /user/hadoop/output/lab06_01/part-r-00002
BDFS split      1
Big Data        10
Flume   1
HBase   1
HDFS    6
HDFS block      2
HDFS commands   2
HDFS permissions        1
HDFS replication        1
Hadoop  2
Hive    1
Hue     1
Pig     1
Resource Manager        1
Task Container  1
Task Tracker    1
YARN    1
ZooKeeper       1
ZooKeeperFailoverController     1
hadoop federation       1
hadoop high availability        1
hadoop rack-aware       1
sqoop   1
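
The split matches the plan: part-r-00000 (partition 0) got the keys starting with a vowel (Application Master, Oozie), part-r-00001 (partition 1) got the keys starting with j to m (Job Tracker, Map Reduce, MapReduce), and part-r-00002 (partition 2) got everything else.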