Java program to read a file from Hadoop Cluster 2 (with file seek)

This post builds on the following posts, so it helps to read them first.

  1. Copying the File to HDFS file system
  2. A java program to read the file from HDFS
  3. A java program to read the file from HDFS – 2

The InputStream we used in example 3 above is actually an FSDataInputStream, which supports random access within the file. Hence you can “seek” to any position you like and read the content from there.

Beware: seek is a relatively expensive operation, so use it sparingly!

The code given in example 3 is modified as shown below. The original source is available at https://github.com/tomwhite/hadoop-book/blob/3e/ch03/src/main/java/FileSystemDoubleCat.java

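import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

// Prints the file named by the first argument to standard output twice,
// using seek(0) to rewind between the two passes.
public class FileSystemDoubleCat {
    public static void main(String[] args) throws Exception {
        String uri = args[0];
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(uri), conf);
        FSDataInputStream in = null;
        try {
            in = fs.open(new Path(uri));
            // First pass: copy the whole file; 'false' leaves the streams open.
            IOUtils.copyBytes(in, System.out, 4096, false);
            in.seek(0); // go back to the start of the file
            // Second pass: copy the same content again.
            IOUtils.copyBytes(in, System.out, 4096, false);
        } finally {
            IOUtils.closeStream(in);
        }
    }
}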
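
If you do not have a cluster at hand, the same read, rewind, read-again pattern can be tried on a local file. The sketch below is only an analogy: it uses java.io.RandomAccessFile from the JDK instead of the Hadoop API, and the class name LocalDoubleCat and its helper methods are made up for this illustration. It should give a feel for what seek does, not for how HDFS implements it.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical local-file analogue of FileSystemDoubleCat (not Hadoop code).
public class LocalDoubleCat {

    // Copies everything from the file's current position to 'out'.
    public static void copyRemaining(RandomAccessFile in, OutputStream out) throws IOException {
        byte[] buffer = new byte[4096];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
    }

    // Reads the whole file, seeks back to 'offset', then reads again from there.
    public static byte[] catThenCatFrom(Path file, long offset) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (RandomAccessFile in = new RandomAccessFile(file.toFile(), "r")) {
            copyRemaining(in, out);   // first pass, ends at end of file
            in.seek(offset);          // jump back to an absolute position
            copyRemaining(in, out);   // second pass, starting at 'offset'
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("doublecat", ".txt");
        try {
            Files.write(tmp, "hello hdfs\n".getBytes(StandardCharsets.UTF_8));
            // With offset 0 this prints the line "hello hdfs" twice.
            System.out.write(catThenCatFrom(tmp, 0));
            System.out.flush();
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

Passing a non-zero offset, for example catThenCatFrom(tmp, 6), repeats only the tail of the file on the second pass. One difference to keep in mind: RandomAccessFile lets you seek past the end of the file, whereas FSDataInputStream, as far as I know, throws an IOException in that case.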

Let's…
