Setting up Eclipse IDE for Hadoop 2.5.0

Hadoop

You need to look at the following blog posts to understand this post better.

  1. Copying the File to HDFS file system
  2. A java program to read the file from HDFS
  3. A java program to read the file from HDFS – 2
  4. Java program to read a file from Hadoop Cluster 2 (with file seek)
  5. Java program to copy a file from local disk to Hadoop Cluster with progress bar

Until now we have been stuck with the terminal to write the Java programs. Here is how you can set up the Eclipse development environment for Hadoop.

  • This tutorial assumes you have a working Hadoop 2.5.0 setup in your environment.
  • This tutorial assumes you have the m2eclipse (m2e) Maven plugin installed.
  • This tutorial assumes you have the latest version of Maven installed on your system.

Set up a plain Java Maven project; I named mine my-app (a sketch of the setup is shown after this excerpt).

The Cloudera repository is still not available…

View original post 192 more words
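
The excerpt cuts off here. Not from the original post, but for reference: a plain Java Maven project named my-app can be generated from the quickstart archetype (the group ID below is just illustrative):

$ mvn archetype:generate -DgroupId=com.mycompany.app -DartifactId=my-app \
      -DarchetypeArtifactId=maven-archetype-quickstart -DinteractiveMode=false

The Hadoop client libraries would then be pulled in by adding something along these lines to the generated pom.xml:

<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <version>2.5.0</version>
</dependency>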


Java program to copy a file from local disk to Hadoop Cluster with progress bar

Hadoop

You need to look at the following blog posts to understand this post better.

  1. Copying the File to HDFS file system
  2. A java program to read the file from HDFS
  3. A java program to read the file from HDFS – 2
  4. Java program to read a file from Hadoop Cluster 2 (with file seek)

We copied the file using the CLI in example 1 given above. Let's use Java to do the same.

Here is the full Java program to copy a local file to HDFS. You can get it from https://github.com/tomwhite/hadoop-book/blob/3e/ch03/src/main/java/FileCopyWithProgress.java

import java.io.*;
import java.net.*;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.fs.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.util.*;

public class FileCopyWithProgress{
    public static void main(String [] args) throws Exception{
        String localSrc=args[0];
        String dst=args[1];
        InputStream in=new BufferedInputStream(new FileInputStream(localSrc));
        Configuration conf=new Configuration();
        FileSystem fs = FileSystem.get(URI.create(dst), conf);
        OutputStream out=fs.create(new Path(dst),new Progressable(){

View original post 61 more words
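
The excerpt above stops inside the Progressable callback. Going by the FileCopyWithProgress.java source linked above, the program plausibly continues along these lines (a sketch, not the verbatim original):

            // Print a dot whenever Hadoop reports progress on the write.
            public void progress(){
                System.out.print(".");
            }
        });
        // Copy the local stream into HDFS; the final true closes both streams when done.
        IOUtils.copyBytes(in, out, 4096, true);
    }
}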

Java program to read a file from Hadoop Cluster 2 (with file seek)

Hadoop

You need to look at the following blog posts to understand this post better.

  1. Copying the File to HDFS file system
  2. A java program to read the file from HDFS
  3. A java program to read the file from HDFS – 2

The InputStream we used in example 3 given above is actually an FSDataInputStream, which is capable of random access within the file. Hence you can "seek" to any position you prefer and read from there.

Beware: seek is a costly operation!

The code given in example 3 is modified as below. You can get this resource from https://github.com/tomwhite/hadoop-book/blob/3e/ch03/src/main/java/FileSystemDoubleCat.java

import java.io.*;
import java.net.*;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.fs.*;
import org.apache.hadoop.io.*;
public class FileSystemDoubleCat{
    public static void main(String [] args) throws Exception{
        String uri = args[0];
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(uri), conf);
        FSDataInputStream in = null;
        try{
            // First pass: stream the whole file to stdout.
            in = fs.open(new Path(uri));
            IOUtils.copyBytes(in, System.out, 4096, false);
            // Seek back to the beginning and print the file a second time.
            in.seek(0);
            IOUtils.copyBytes(in, System.out, 4096, false);
        }finally{
            IOUtils.closeStream(in);
        }
    }
}

Lets…

View original post 122 more words
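
Not from the original post, but this can be compiled and run the same way as the URLCat example further down (tools.jar plus the current directory on HADOOP_CLASSPATH), e.g. against the anthem file used there:

$ export HADOOP_CLASSPATH=$JAVA_HOME/lib/tools.jar:.
$ bin/hadoop com.sun.tools.javac.Main FileSystemDoubleCat.java
$ bin/hadoop FileSystemDoubleCat hdfs://localhost:9000/user/pandian/anthem.hadoop.txt

Because of the seek(0), the file contents should be printed twice.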

Java program to read a file from Hadoop Cluster 2

Hadoop

You need to look at the following blog posts to understand this post better.

  1. Copying the File to HDFS file system
  2. A java program to read the file from HDFS

We already wrote a program to read a file from HDFS. Why the hell do we do it again? Tom says the URL.setURLStreamHandlerFactory() method we used in the previous program can be called only once per JVM; that's why we execute it inside a static block. In a collaborative development environment, this restriction is difficult to keep track of. (Is it a limitation of URL.setURLStreamHandlerFactory()?)

So we use the FileSystem API to do the same operation – read the file and display it on the console!

import java.io.*;
import java.net.*;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.fs.*;
import org.apache.hadoop.io.*;
public class FileSystemCat{
    public static void main(String [] args) throws Exception{
        String uri=args[0];
        Configuration conf= new Configuration();
  …

View original post 120 more words
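
The excerpt above cuts off right after the Configuration is created. Going by the FileSystemCat example in the same hadoop-book repository (ch03), the rest of the program plausibly looks like this (a sketch, not the verbatim original):

        FileSystem fs = FileSystem.get(URI.create(uri), conf);
        InputStream in = null;
        try{
            // Open the file on HDFS and stream it to stdout.
            in = fs.open(new Path(uri));
            IOUtils.copyBytes(in, System.out, 4096, false);
        }finally{
            IOUtils.closeStream(in);
        }
    }
}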

Java program to read a file from Hadoop Cluster

Hadoop

To learn how to copy a file into the Hadoop cluster, read the following post.

How to copy the file to hadoop cluster

The following tutorial is a continuation of the above post. Here is a Java program that reads the contents and displays them on System.out.

You can download this java file from https://github.com/tomwhite/hadoop-book/blob/3e/ch03/src/main/java/URLCat.java

import java.io.*;
import java.net.*;
import org.apache.hadoop.fs.*;
import org.apache.hadoop.io.*;

public class URLCat{

    static{
        // Register Hadoop's handler so java.net.URL understands hdfs:// URLs.
        // Note: this factory can be set only once per JVM.
        URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory());
    }

    public static void main (String [] args) throws Exception{
        InputStream is = null;
        try{
            is = new URL(args[0]).openStream();
            IOUtils.copyBytes(is, System.out, 4096, false);
        }finally{
            IOUtils.closeStream(is);
        }
    }
}

Compile it.

$ export HADOOP_CLASSPATH=$JAVA_HOME/lib/tools.jar
$ bin/hadoop com.sun.tools.javac.Main URLCat.java

Then run it; the trailing dot adds the current directory to HADOOP_CLASSPATH so Hadoop can find the compiled class.

$ export HADOOP_CLASSPATH=$JAVA_HOME/lib/tools.jar:.
$ bin/hadoop URLCat hdfs://localhost:9000/user/pandian/anthem.hadoop.txt
Jana gaṇa mana adhināyaka jaya he
Bhārata bhāgya bidhātā
Pañjāba Sindhu Gujarāṭa Marāṭhā
Drāviḍa Utkala Baṅga
Bindhya Himācala Yamunā Gaṅgā
Ucchala jaladhi taraṅga
Taba śubha nāme jāge
Taba śubha āśiṣa māge
Gāhe taba jaya gāthā
Jana gaṇa maṅgala dāyaka jaya he

View original post 16 more words

java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused

Hadoop

Let's do some basic filesystem operations. Let's copy the file /home/pandian/anthem.txt to HDFS.


$ bin/hadoop fs -copyFromLocal /home/pandian/anthem.txt hdfs://localhost/usr/pandian
copyFromLocal: Call From pandian-SH560/127.0.1.1 to localhost:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused

Yes, because we haven’t started the server yet. We should make the following changes before we start the server.

Open etc/hadoop/core-site.xml and add the following properties inside the <configuration></configuration> tag.

<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:9000/</value>
</property>
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>

Now, go ahead and start the server.
$ sbin/start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [localhost]
localhost: starting namenode, logging to /home/pandian/hadoop-2.5.0/logs/hadoop-pandian-namenode-pandian-SH560.out
localhost: starting datanode, logging to /home/pandian/hadoop-2.5.0/logs/hadoop-pandian-datanode-pandian-SH560.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /home/pandian/hadoop-2.5.0/logs/hadoop-pandian-secondarynamenode-pandian-SH560.out
starting yarn daemons
starting resourcemanager, logging to /home/pandian/hadoop-2.5.0/logs/yarn-pandian-resourcemanager-pandian-SH560.out
localhost: starting nodemanager, logging to /home/pandian/hadoop-2.5.0/logs/yarn-pandian-nodemanager-pandian-SH560.out
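
Not part of the original post, but a quick way to confirm the daemons actually came up is jps; it should list NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager.

$ jps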

Let's try now.

$ bin/hadoop fs -copyFromLocal /home/pandian/anthem.txt hdfs://localhost/usr/pandian
copyFromLocal:…

View original post 241 more words