Reading the following blog posts first will make this post easier to follow:
- Copying the File to HDFS file system
- A java program to read the file from HDFS
- A java program to read the file from HDFS – 2
The InputStream we used in example 3 above is actually an FSDataInputStream, which supports random access within the file. Hence you can seek() to any position in the file and start reading from there.
Beware: seek() is a relatively expensive operation, so use it sparingly!
The code given in example 3 is modified as shown below. The original source is available at https://github.com/tomwhite/hadoop-book/blob/3e/ch03/src/main/java/FileSystemDoubleCat.java
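import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class FileSystemDoubleCat {
    public static void main(String[] args) throws Exception {
        // The file to display, e.g. an hdfs:// URI passed as the first argument
        String uri = args[0];
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(uri), conf);
        FSDataInputStream in = null;
        try {
            in = fs.open(new Path(uri));
            // First pass: copy the whole file to standard output
            IOUtils.copyBytes(in, System.out, 4096, false);
            // Go back to the start of the file
            in.seek(0);
            // Second pass: copy the whole file again
            IOUtils.copyBytes(in, System.out, 4096, false);
        } finally {
            // Close the stream even if a copy fails
            IOUtils.closeStream(in);
        }
    }
}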
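If you want to try the seek idea without a Hadoop cluster, here is a small local sketch. It is only an analogy: it uses java.io.RandomAccessFile from the JDK as a stand-in for FSDataInputStream, and the class name LocalSeekDemo, the methods doubleCat and readFrom, and the sample text are made up for illustration. It assumes a small text file that fits in memory.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical local analogy; RandomAccessFile stands in for FSDataInputStream.
public class LocalSeekDemo {

    // Reads the whole file, rewinds with seek(0), and reads it a second time.
    public static String doubleCat(Path file) throws IOException {
        try (RandomAccessFile in = new RandomAccessFile(file.toFile(), "r")) {
            byte[] buf = new byte[(int) in.length()];
            in.readFully(buf);
            String first = new String(buf, StandardCharsets.UTF_8);
            in.seek(0); // back to the beginning, as in FileSystemDoubleCat
            in.readFully(buf);
            return first + new String(buf, StandardCharsets.UTF_8);
        }
    }

    // Reads from an absolute byte offset to the end of the file.
    // Assumes 0 <= offset <= file length.
    public static String readFrom(Path file, long offset) throws IOException {
        try (RandomAccessFile in = new RandomAccessFile(file.toFile(), "r")) {
            in.seek(offset); // jump straight to the requested position
            byte[] rest = new byte[(int) (in.length() - in.getFilePointer())];
            in.readFully(rest);
            return new String(rest, StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("seek-demo", ".txt");
        try {
            Files.write(tmp, "hello world\n".getBytes(StandardCharsets.UTF_8));
            System.out.print(doubleCat(tmp));   // the file content, twice
            System.out.print(readFrom(tmp, 6)); // everything from byte 6 onwards
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

The second method shows the part that the double-cat program does not exercise: seek takes an absolute byte position, so it can land anywhere in the file, not only at the start.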
Let's…