java.lang.Exception: java.io.IOException: Incorrect string value: ‘\xE0\xAE\xB5\xE0\xAF\x87…’

Hi Hadoopers,

This is a nasty exception that killed my reducer task, which updates my MySQL table with the reducer output.

The reason behind this is a Unicode character issue.

The MySQL table was created with a non-Unicode Western encoding, while I was trying to insert multilingual Unicode text. After changing the table collation (and, if needed, the field collation) to utf8_bin, it worked fine.

alter table FeedEntryRecord convert to character set utf8 collate utf8_bin;
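If only specific columns need fixing, the column collation can be changed individually as well. A sketch along these lines should work (the column name and length are placeholders, not my actual schema):

alter table FeedEntryRecord modify description varchar(500) character set utf8 collate utf8_bin;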

 

Lab 14: Sending MapReduce output to JDBC

Hi Hadoopers,

Unfortunately I couldn’t post on time, as I’ve been hit with the flu. Here is the post for today. In this post, let’s see how to send the output of the Reducer to JDBC. I’ll take the Lab 08 – MapReduce using custom class as Key post and modify it.

Mapper

There is no change in the Mapper. It accepts the long offset and Text object as input and emits the custom key EntryCategory with an IntWritable value.

Reducer

The Reducer accepts the output of the Mapper as its input: EntryCategory as the key and IntWritable as the value. It emits DBOutputWritable as the key and NullWritable as the value.

/**
 * 
 */
package org.grassfield.hadoop;

import java.io.IOException;
import java.util.Date;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Reducer;
import org.grassfield.hadoop.entity.DBOutputWritable;
import org.grassfield.hadoop.entity.EntryCategory;

/**
 * Reducer for the Feed Category count job
 * @author pandian
 *
 */
public class FeedCategoryReducer extends 
    Reducer<EntryCategory, IntWritable, DBOutputWritable, NullWritable> {

    @Override
    protected void reduce(EntryCategory key, Iterable<IntWritable> values, Context context) {
        // sum up the counts emitted for this category
        int sum=0;
        for (IntWritable value:values){
            sum+=value.get();
        }
        // prepare one output row for the feed_category table
        DBOutputWritable db = new DBOutputWritable();
        db.setParseDate(new java.sql.Date(new Date().getTime()));
        db.setCategory(key.getCategory());
        db.setCount(sum);
        try {
            context.write(db, NullWritable.get());
        } catch (IOException | InterruptedException e) {
            System.err.println("Error while updating record in database");
            e.printStackTrace();
        }
    }
}

 

DBOutputWritable

Our bean DBOutputWritable must implement the Writable and DBWritable interfaces so that we can update the database.

package org.grassfield.hadoop.entity;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.sql.Date;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.lib.db.DBWritable;

/**
 * Bean for the feed_category table in the feed_analytics database
 * @author pandian
 *
 */
public class DBOutputWritable implements Writable, DBWritable {
    private Date parseDate;
    private String category;
    private int count;

    public Date getParseDate() {
        return parseDate;
    }

    public void setParseDate(Date parseDate) {
        this.parseDate = parseDate;
    }

    public String getCategory() {
        return category;
    }

    public void setCategory(String category) {
        this.category = category;
    }

    public int getCount() {
        return count;
    }

    public void setCount(int count) {
        this.count = count;
    }

    @Override
    public void readFields(ResultSet arg0) throws SQLException {
        throw new RuntimeException("not implemented");
    }

    @Override
    public void write(PreparedStatement ps) throws SQLException {
        ps.setDate(1, this.parseDate);
        ps.setString(2, this.category);
        ps.setInt(3, this.count);
    }

    @Override
    public void readFields(DataInput arg0) throws IOException {
        throw new RuntimeException("not implemented");

    }

    @Override
    public void write(DataOutput arg0) throws IOException {
        throw new RuntimeException("not implemented");
    }
}

Driver

The driver is where I specify my database details. Note the changes in the output key and value classes.

package org.grassfield.hadoop;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.lib.db.DBConfiguration;
import org.apache.hadoop.mapreduce.lib.db.DBOutputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.util.GenericOptionsParser;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
import org.grassfield.hadoop.entity.DBOutputWritable;
import org.grassfield.hadoop.entity.EntryCategory;

/**
 * A MapReduce driver program to count the categories in an RSS XML file. This may
 * not be the right approach to parse XML; it is only for demo purposes.
 * 
 * @author pandian
 *
 */
public class FeedCategoryCountDriver extends Configured
        implements Tool {

    @Override
    public int run(String[] args) throws ClassNotFoundException, IOException, InterruptedException {
        Configuration conf = getConf();
        DBConfiguration.configureDB(
                conf, 
                "com.mysql.jdbc.Driver", 
                "jdbc:mysql://localhost:3306/feed_analytics?useUnicode=true&characterEncoding=UTF-8",
                "feed_analytics",
                "P@ssw0rd");
        GenericOptionsParser parser = new GenericOptionsParser(conf,
                args);
        args = parser.getRemainingArgs();

        Path input = new Path(args[0]);

        Job job = new Job(conf, "Feed Category Count");
        job.setJarByClass(getClass());

        job.setMapOutputKeyClass(EntryCategory.class);
        job.setMapOutputValueClass(IntWritable.class);

        job.setOutputKeyClass(DBOutputWritable.class);
        job.setOutputValueClass(NullWritable.class);
        job.setOutputFormatClass(DBOutputFormat.class);
        
        job.setMapperClass(FeedCategoryCountMapper.class);
        job.setPartitionerClass(FeedCategoryPartitioner.class);
        job.setCombinerClass(FeedCategoryCombiner.class);
        job.setReducerClass(FeedCategoryReducer.class);
        job.setNumReduceTasks(3);
        
        try {
            FileInputFormat.setInputPaths(job, input);
            DBOutputFormat.setOutput(job, 
                    "feed_category", //table name
                    new String[]{"parseDate", "category", "count"}    //fields
            );
        } catch (IOException e) {
            e.printStackTrace();
        }

        return job.waitForCompletion(true)?0:1;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new Configuration(),
                new FeedCategoryCountDriver(), args));
    }
}

Add the MySQL driver to your Maven dependencies. If you don’t use Maven, add the library as an external JAR dependency.

        <dependency>
            <groupId>mysql</groupId>
            <artifactId>mysql-connector-java</artifactId>
            <version>5.1.6</version>
        </dependency>

Copy the JAR to HadoopHome/lib/native and HadoopHome/share/hadoop/mapreduce/lib/.

Restart the Hadoop daemons.

Table Structure & DB setup

Let’s create our table first.


CREATE TABLE `feed_category` (
  `id` bigint(20) NOT NULL,
  `parseDate` timestamp NULL DEFAULT NULL ON UPDATE CURRENT_TIMESTAMP,
  `category` varchar(100) COLLATE utf8_bin NOT NULL,
  `count` int(11) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin;

ALTER TABLE `feed_category`
ADD PRIMARY KEY (`id`);

ALTER TABLE `feed_category`
MODIFY `id` bigint(20) NOT NULL AUTO_INCREMENT, AUTO_INCREMENT=151;
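With this table and the field list passed to DBOutputFormat.setOutput() in the driver, the INSERT statement that DBOutputFormat prepares should look roughly like this (the auto-increment id column is not part of the field list):

INSERT INTO feed_category (parseDate, category, count) VALUES (?, ?, ?);

DBOutputWritable.write(PreparedStatement) fills these three placeholders in the same order.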

Execution

Let’s execute now.

hadoop@gandhari:/opt/hadoop-2.6.4/jars$ hadoop jar FeedCategoryCount-14.jar org.grassfield.hadoop.FeedCategoryCountDriver /user/hadoop/feed/2016-09-24

16/09/24 08:35:46 INFO mapreduce.Job: Counters: 40
        File System Counters
                FILE: Number of bytes read=128167
                FILE: Number of bytes written=1162256
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=1948800
                HDFS: Number of bytes written=0
                HDFS: Number of read operations=12
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=0
        Map-Reduce Framework
                Map input records=1107
                Map output records=623
                Map output bytes=19536
                Map output materialized bytes=4279
                Input split bytes=113
                Combine input records=623
                Combine output records=150
                Reduce input groups=150
                Reduce shuffle bytes=4279
                Reduce input records=150
                Reduce output records=150
                Spilled Records=300
                Shuffled Maps =3
                Failed Shuffles=0
                Merged Map outputs=3
                GC time elapsed (ms)=0
                CPU time spent (ms)=0
                Physical memory (bytes) snapshot=0
                Virtual memory (bytes) snapshot=0
                Total committed heap usage (bytes)=1885339648
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters
                Bytes Read=487200
        File Output Format Counters
                Bytes Written=0
        org.grassfield.hadoop.FeedCategoryCountMapper$MapperRCheck
                INVALID=35
                VALID=1072

So, is my table populated?

[Screenshot: feed_category table populated with the reducer output]

Yes, it is.


Have a good weekend, guys. Let me take some rest before moving on to MRUnit.

org.hibernate.AssertionFailure: null id in entry (don’t flush the Session after an exception occurs)

I’m inserting multiple records into MySQL with Hibernate 5.

After a constraint failure, all subsequent records failed to get inserted, with the error ‘org.hibernate.AssertionFailure: null id in entry (don’t flush the Session after an exception occurs)’. To get rid of this problem, I clear the Hibernate session with session.clear() when the exception occurs.
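A minimal sketch of what I mean (the entity objects, method name, and per-record transaction handling here are illustrative, not my actual code):

import org.hibernate.Session;
import org.hibernate.SessionFactory;
import org.hibernate.Transaction;
import org.hibernate.exception.ConstraintViolationException;

public class BatchInsert {
    // Insert each record in its own transaction and clear the session after a failure,
    // so that one bad record does not block the rest of the batch.
    public static void insertAll(SessionFactory sessionFactory, Iterable<?> records) {
        Session session = sessionFactory.openSession();
        try {
            for (Object record : records) {
                Transaction tx = session.beginTransaction();
                try {
                    session.save(record);
                    tx.commit();
                } catch (ConstraintViolationException e) {
                    tx.rollback();
                    // without this, every later save fails with
                    // "null id in entry (don't flush the Session after an exception occurs)"
                    session.clear();
                }
            }
        } finally {
            session.close();
        }
    }
}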

Data is getting pumped without any problem now.


Spring 4 MVC and JDBC-authenticated Spring Security – 100% Java annotation based configuration

It is my pleasure to introduce another post in this Spring 4 development environment series, building on the steps we have completed so far.

This post is an extension of my earlier post Spring 4 MVC and Spring Security 100% java annotation based configuration. There we used in-memory authentication, which is suitable for a beginner. But my next project already has a database designed, and I need to authenticate against it. So here is the JDBC-based authentication. Please add MySQL and Commons DBCP to your Maven dependencies, as shown below.
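For reference, the two dependencies would look something like this in the pom.xml (the versions are just examples; use whatever matches your setup):

        <dependency>
            <groupId>mysql</groupId>
            <artifactId>mysql-connector-java</artifactId>
            <version>5.1.38</version>
        </dependency>
        <dependency>
            <groupId>org.apache.commons</groupId>
            <artifactId>commons-dbcp2</artifactId>
            <version>2.1.1</version>
        </dependency>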

Modify WebSecurityConfig

The in-memory configuration was defined in WebSecurityConfig.java in our earlier example. Let’s modify it slightly to handle authentication with JDBC. Please look at the method public void configAuthentication(AuthenticationManagerBuilder auth).

package org.grassfield.conf;

import org.apache.commons.dbcp2.BasicDataSource;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.annotation.Configuration;
import org.springframework.security.config.annotation.authentication.builders.AuthenticationManagerBuilder;
import org.springframework.security.config.annotation.web.builders.HttpSecurity;
import org.springframework.security.config.annotation.web.configuration.EnableWebSecurity;
import org.springframework.security.config.annotation.web.configuration.WebSecurityConfigurerAdapter;
import org.springframework.security.config.annotation.web.servlet.configuration.EnableWebMvcSecurity;

@SuppressWarnings("deprecation")
@Configuration
@EnableWebMvcSecurity
@EnableWebSecurity
public class WebSecurityConfig extends WebSecurityConfigurerAdapter {
    /*@Autowired
    private DataSource datasource;*/

    @Override
    protected void configure(HttpSecurity http) throws Exception {
        http.csrf().disable();
        http.authorizeRequests().antMatchers("/", "/home").permitAll()
                .anyRequest().authenticated().and().formLogin()
                .loginPage("/login").permitAll().and().logout().permitAll();
    }

    
    @Autowired
    public void configAuthentication(AuthenticationManagerBuilder auth)
            throws Exception {
        BasicDataSource ds = new BasicDataSource();
        ds.setDriverClassName("com.mysql.jdbc.Driver");
        ds.setUrl("jdbc:mysql://localhost:3306/vriksha");
        ds.setUsername("root");
        ds.setPassword("");
        
        auth.jdbcAuthentication()
        .dataSource(ds)
        .usersByUsernameQuery(
                "select u.user_name, u.password, true from user_info u, user_role r where u.user_role_id=r.id and u.user_name=?")
        .authoritiesByUsernameQuery(
                "select u.user_name, r.name from user_info u, user_role r where u.user_role_id=r.id and u.user_name=?");
    }
}
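The post doesn’t show the table structure, but a minimal pair of tables compatible with the two queries above would look roughly like this (column sizes are arbitrary):

CREATE TABLE user_role (
  id BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  name VARCHAR(50) NOT NULL
);

CREATE TABLE user_info (
  id BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  user_name VARCHAR(50) NOT NULL UNIQUE,
  password VARCHAR(100) NOT NULL,
  user_role_id BIGINT NOT NULL,
  FOREIGN KEY (user_role_id) REFERENCES user_role (id)
);

The literal true in the users-by-username query serves as the enabled flag that Spring Security expects alongside the username and password.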

Here are the screenshots.



Way to Struts 1.2 DataSource


Today I downloaded Struts 1.2.7 and tried to write an application. (I know I’m picking something a bit old.) I configured the datasource in Struts. Alas, when I tried to start my server, Tomcat 6.0, I got an error saying "java.lang.ClassNotFoundException: org.apache.commons.dbcp.BasicDataSource". With some Googling I found that struts-legacy.jar needed to be downloaded and copied to the lib folder of Tomcat. (I couldn’t get the most recent version of struts-legacy; I found something in an archive.) Happily I restarted the server, only to see java.lang.NoClassDefFoundError: org/apache/commons/pool/impl/GenericObjectPool. Oof. Again I needed to download commons-dbcp-1.2.2.jar and commons-pool.jar.

My search came to an end 🙂 It is working fine now.
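For reference, the kind of datasource block I configured in struts-config.xml looks roughly like this (driver, URL, user, and password are placeholders):

<data-sources>
  <data-source type="org.apache.commons.dbcp.BasicDataSource">
    <set-property property="driverClassName" value="com.mysql.jdbc.Driver"/>
    <set-property property="url" value="jdbc:mysql://localhost:3306/test"/>
    <set-property property="username" value="root"/>
    <set-property property="password" value=""/>
    <set-property property="maxActive" value="10"/>
  </data-source>
</data-sources>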

com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure


Today I did a fresh install of Mandriva 2008 Linux. I wrote some Java code to test MySQL connectivity, and it ended with the following exception.

com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure
Last packet sent to the server was 0 ms ago.

At last, a deeper Google search gave me a solution: a MySQL connectivity parameter in /etc/my.cnf.

skip-networking

This parameter was added for some security-related reasons; honestly, I don’t know exactly what it does. I just removed that line, which solved the issue.
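The connectivity test itself was nothing fancy; it was something along these lines (database name, user, and password are placeholders):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class MySqlPing {
    public static void main(String[] args) throws Exception {
        // load the MySQL driver and open a connection to the local server
        Class.forName("com.mysql.jdbc.Driver");
        Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/test", "root", "password");
        Statement st = conn.createStatement();
        ResultSet rs = st.executeQuery("SELECT 1");
        while (rs.next()) {
            System.out.println("Connected, SELECT 1 returned " + rs.getInt(1));
        }
        rs.close();
        st.close();
        conn.close();
    }
}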

How to access MS Access database from JDBC?

By the way, I didn’t test it. I’m just posting it here for my reference. Thanks, Angsuman Chakraborty.

How to access MS Access database from JDBC?

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class AccessDBUtil {
    private static final String accessDBURLPrefix =
            "jdbc:odbc:Driver={Microsoft Access Driver (*.mdb)};DBQ=";
    private static final String accessDBURLSuffix = ";DriverID=22;READONLY=false";

    // Initialize the JDBC-ODBC Bridge Driver
    static {
        try {
            Class.forName("sun.jdbc.odbc.JdbcOdbcDriver");
        } catch (ClassNotFoundException e) {
            System.err.println("JdbcOdbc Bridge Driver not found!");
        }
    }

    /** Creates a Connection to an Access database */
    public static Connection getAccessDBConnection(String filename)
            throws SQLException {
        // normalize Windows path separators for the ODBC URL
        filename = filename.replace('\\', '/').trim();
        String databaseURL = accessDBURLPrefix + filename + accessDBURLSuffix;
        // System.err.println("Database URL: " + databaseURL);
        return DriverManager.getConnection(databaseURL, "", "");
    }
}
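A hypothetical usage of the helper above (the wrapper class name AccessDBUtil, the .mdb path, and the table name are placeholders):

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;

public class AccessDemo {
    public static void main(String[] args) throws Exception {
        // open the Access file through the JDBC-ODBC bridge and run a simple query
        Connection conn = AccessDBUtil.getAccessDBConnection("C:/data/sample.mdb");
        Statement st = conn.createStatement();
        ResultSet rs = st.executeQuery("SELECT * FROM contacts");
        while (rs.next()) {
            System.out.println(rs.getString(1));
        }
        rs.close();
        st.close();
        conn.close();
    }
}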