I am looking for an example that uses the new API to read and write SequenceFiles.
In particular, I need to know how to use this method:
createWriter(Configuration conf, org.apache.hadoop.io.SequenceFile.Writer.Option... opts)
The old definition is not working for me:
SequenceFile.createWriter( fs, conf, path, key.getClass(), value.getClass());
Similarly, I need to know what the code for reading a SequenceFile should be, as the following is deprecated:
SequenceFile.Reader(fs, path, conf);
Here is how to use it:
String uri = args[0];
Configuration conf = new Configuration();
Path path = new Path( uri);
IntWritable key = new IntWritable();
Text value = new Text();
CompressionCodec Codec = new GzipCodec();
SequenceFile.Writer writer = null;
Option optPath = SequenceFile.Writer.file(path);
Option optKey = SequenceFile.Writer.keyClass(key.getClass());
Option optVal = SequenceFile.Writer.valueClass(value.getClass());
Option optCom = SequenceFile.Writer.compression(CompressionType.RECORD, Codec);
writer = SequenceFile.createWriter( conf, optPath, optKey, optVal, optCom);
public class SequenceFilesTest {
    public void testSeqFileReadWrite() throws IOException {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.getLocal(conf);
        Path seqFilePath = new Path("file.seq");
        // Write two records using the Option-based factory method.
        SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                Writer.file(seqFilePath), Writer.keyClass(Text.class),
                Writer.valueClass(IntWritable.class));
        writer.append(new Text("key1"), new IntWritable(1));
        writer.append(new Text("key2"), new IntWritable(2));
        writer.close();
        // Read the records back using the Option-based constructor.
        SequenceFile.Reader reader = new SequenceFile.Reader(conf,
                Reader.file(seqFilePath));
        Text key = new Text();
        IntWritable val = new IntWritable();
        while (reader.next(key, val)) {
            System.err.println(key + "\t" + val);
        }
        reader.close();
    }
}
I'm more than a year late with this answer, but I just got started with Hadoop 2.4.1 :)
Below is the code; someone may find it useful.
Note: it includes the commented-out 1.x code to read and write a sequence file. I was wondering where it picks up the file system from, but when I executed it directly on the cluster, it picked it up properly (probably from core-site.xml, as mentioned in Configuration).
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.SequenceFile.Reader.Option;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.util.ReflectionUtils;
public class SequenceFileOperator {
private Configuration conf = new Configuration();
/*private FileSystem fs;
{
try {
fs = FileSystem.get(URI.create("hdfs://cldx-1336-1202:9000"), conf);
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}*/
public static void main(String[] args) throws IOException {
// TODO Auto-generated method stub
if (args == null || args.length < 2) {
System.out
.println("Following are the possible invocations <operation id> <arg1> <arg2> ...");
System.out
.println("1 <absolute path of directory containing documents> <HDFS path of the sequence file>");
System.out.println("2 <HDFS path of the sequence file>");
return;
}
int operation = Integer.valueOf(args[0]);
SequenceFileOperator docToSeqFileWriter = new SequenceFileOperator();
switch (operation) {
case 1: {
String docDirectoryPath = args[1];
String sequenceFilePath = args[2];
System.out.println("Writing files present at " + docDirectoryPath
+ " to the sequence file " + sequenceFilePath);
docToSeqFileWriter.loadDocumentsToSequenceFile(docDirectoryPath,
sequenceFilePath);
break;
}
case 2: {
String sequenceFilePath = args[1];
System.out.println("Reading the sequence file " + sequenceFilePath);
docToSeqFileWriter.readSequenceFile(sequenceFilePath);
break;
}
}
}
private void readSequenceFile(String sequenceFilePath) throws IOException {
// TODO Auto-generated method stub
/*
 * SequenceFile.Reader sequenceFileReader = new SequenceFile.Reader(fs,
 * new Path(sequenceFilePath), conf);
 */
Option filePath = SequenceFile.Reader.file(new Path(sequenceFilePath));
SequenceFile.Reader sequenceFileReader = new SequenceFile.Reader(conf,
filePath);
Writable key = (Writable) ReflectionUtils.newInstance(
sequenceFileReader.getKeyClass(), conf);
Writable value = (Writable) ReflectionUtils.newInstance(
sequenceFileReader.getValueClass(), conf);
try {
while (sequenceFileReader.next(key, value)) {
System.out
.printf("[%s] %s %s \n",
sequenceFileReader.getPosition(), key,
value.getClass());
}
} finally {
IOUtils.closeStream(sequenceFileReader);
}
}
private void loadDocumentsToSequenceFile(String docDirectoryPath,
String sequenceFilePath) throws IOException {
// TODO Auto-generated method stub
File docDirectory = new File(docDirectoryPath);
if (!docDirectory.isDirectory()) {
System.out
.println("Please provide an absolute path of a directory that contains the documents to be added to the sequence file");
return;
}
/*
 * SequenceFile.Writer sequenceFileWriter =
 * SequenceFile.createWriter(fs, conf, new Path(sequenceFilePath),
 * Text.class, BytesWritable.class);
 */
org.apache.hadoop.io.SequenceFile.Writer.Option filePath = SequenceFile.Writer
.file(new Path(sequenceFilePath));
org.apache.hadoop.io.SequenceFile.Writer.Option keyClass = SequenceFile.Writer
.keyClass(Text.class);
org.apache.hadoop.io.SequenceFile.Writer.Option valueClass = SequenceFile.Writer
.valueClass(BytesWritable.class);
SequenceFile.Writer sequenceFileWriter = SequenceFile.createWriter(
conf, filePath, keyClass, valueClass);
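File[] documents = docDirectory.listFiles();
try {
for (File document : documents) {
RandomAccessFile raf = new RandomAccessFile(document, "r");
byte[] content = new byte[(int) raf.length()];
// Fill the buffer; without this the value would be all zero bytes.
raf.readFully(content);
sequenceFileWriter.append(new Text(document.getName()),
new BytesWritable(content));
raf.close();
}
} finally {
IOUtils.closeStream(sequenceFileWriter);
}
}
}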
For reading, you can use:
Path path = new Path("/bar");
Reader sequenceFileReader = new SequenceFile.Reader(conf, SequenceFile.Reader.file(path));
You need to set SequenceFile as the input format.
You will find an example of reading a SequenceFile from HDFS here.
I am attempting to save a file in the main class of a Hadoop application so it can be read later on by the mapper. The file is an encryption key that will be used to encrypt data. My question here is, where will the data end up if I am writing the file to the working directory?
public class HadoopIndexProject {
private static SecretKey generateKey(int size, String Algorithm) throws UnsupportedEncodingException, NoSuchAlgorithmException {
KeyGenerator keyGen = KeyGenerator.getInstance(Algorithm);
return keyGen.generateKey();
private static IvParameterSpec generateIV() {
byte[] b = new byte[16];
// SecureRandom instead of java.util.Random: a CBC IV should be unpredictable.
new java.security.SecureRandom().nextBytes(b);
return new IvParameterSpec(b);
public static void saveKey(SecretKey key, IvParameterSpec IV, String path) throws IOException {
FileOutputStream stream = new FileOutputStream(path);
//FSDataOutputStream stream = fs.create(new Path(path));
try {
} finally {
/**
 * @param args the command line arguments
 * @throws java.lang.Exception
 */
public static void main(String[] args) throws Exception {
// TODO code application logic here
Configuration conf = new Configuration();
//FileSystem fs = FileSystem.getLocal(conf);
String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
SecretKey KEY;
IvParameterSpec IV;
if (otherArgs.length != 2) {
System.err.println("Usage: Index <in> <out>");
try {
if(! new File("key.dat").exists()) {
KEY = generateKey(128, "AES");
IV = generateIV();
saveKey(KEY, IV, "key.dat");
} catch (NoSuchAlgorithmException ex) {
Logger.getLogger(HadoopIndexMapper.class.getName()).log(Level.SEVERE, null, ex);
conf.set("mapred.textoutputformat.separator", ":");
Job job = Job.getInstance(conf);
job.setJobName("Index creator");
FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
System.exit(job.waitForCompletion(true) ? 0 : 1);
There is no concept of a working directory in HDFS. All relative paths are resolved against /user/<username>, so your file will be located at /user/<username>/key.dat.
In YARN, however, there is the concept of a distributed cache, so you can add additional files for your YARN application there using job.addCacheFile.
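One caveat, judging only by the posted snippet: saveKey opens the file with java.io.FileOutputStream, which writes to the local file system of the machine running the driver, not to HDFS. A relative name such as key.dat is then resolved against the JVM's working directory (the user.dir system property). The sketch below shows that resolution using only the JDK; LocalRelativePath and resolveLocal are names made up for this illustration.

```java
import java.io.File;
import java.nio.file.Path;

final class LocalRelativePath {
    // Resolves a relative name the same way java.io.File and
    // FileOutputStream do: against the JVM's working directory.
    static Path resolveLocal(String name) {
        return new File(name).getAbsoluteFile().toPath();
    }

    public static void main(String[] args) {
        // Prints the local location a relative "key.dat" would be written to.
        System.out.println(resolveLocal("key.dat"));
    }
}
```

To have the key land in HDFS instead, the commented-out FSDataOutputStream line in the question (fs.create(new Path(path))) is the variant that goes through the Hadoop FileSystem API.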
I really need some help. I am working on an assignment for school. I'm supposed to read a text file into a list of strings, encrypt that list, save it, and then decrypt it back into a list of strings. I was able to encrypt and save it, but when I try to decrypt it, I get an error message about bad padding. I really don't know how to fix this problem. Any help would be appreciated.
This is my code for encrypting it:
import java.awt.FileDialog;
import java.awt.Frame;
import java.io.*;
import java.nio.file.*;
import java.security.*;
import java.util.*;
import javax.crypto.BadPaddingException;
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;
import javax.crypto.CipherOutputStream;
import javax.crypto.IllegalBlockSizeException;
import javax.crypto.KeyGenerator;
import javax.crypto.NoSuchPaddingException;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;
public class WordAnalysisMenu {
private static KeyGenerator kgen;
private static SecretKey key;
private static byte[] iv = null;
static Scanner sc = new Scanner(System.in);
private static void init(){
try {
kgen = KeyGenerator.getInstance("AES");
key = kgen.generateKey();
} catch (NoSuchAlgorithmException e) {
public static void main (String [] arge) throws NoSuchAlgorithmException, NoSuchPaddingException,
InvalidKeyException, IOException, InvalidAlgorithmParameterException, IllegalBlockSizeException, BadPaddingException {
List<String> warPeace = new ArrayList<>();
while(true) {
int choice = menu();
switch(choice) {
case 0:
case 1:
//select file for reading
warPeace = readFile();
case 7:
//apply the cipher and save it to a file
Frame frame = new Frame();
FileDialog fileDialog = new FileDialog(frame, "", FileDialog.SAVE);
Path path = Paths.get(fileDialog.getDirectory() + fileDialog.getFile());
iv = ListEncrypter(warPeace, path, key);
case 8:
// read the cipher file, decode, and print
frame = new Frame();
fileDialog = new FileDialog(frame, "", FileDialog.LOAD);
Path inputFile = Paths.get(fileDialog.getDirectory() + fileDialog.getFile());
warPeace = ListDecrypter(inputFile, key, iv);
case 9:
System.out.println("Good bye!");
public static int menu() {
int choice = 0;
System.out.println("\nPlease make a selection: ");
System.out.println("1.\t Select a file for reading.");
System.out.println("7.\t Apply cipher & save");
System.out.println("8.\t Read encoded file");
System.out.println("9.\t Exit");
Scanner scan = new Scanner(System.in);
try {
choice = scan.nextInt();
catch(InputMismatchException ime) {
System.out.println("Invalid input!");
return choice;
public static List<String> readFile() {
List<String> word = new ArrayList<String>();
Frame f = new Frame();
FileDialog saveBox = new FileDialog(f, "Reading text file", FileDialog.LOAD);
String insName = saveBox.getFile();
String fileSavePlace = saveBox.getDirectory();
File inFile = new File(fileSavePlace + insName);
BufferedReader in = null;
try {
in = new BufferedReader(new FileReader(inFile));
String line;
while (((line = in.readLine()) != null)) {
String s = new String();
} catch (IOException io) {
System.out.println("There Was An Error Reading The File");
} finally {
try {
} catch (Exception e) {
return word;
private static byte[] ListEncrypter(List<String> content, Path outputFile, SecretKey key)
throws NoSuchAlgorithmException, NoSuchPaddingException, InvalidKeyException, IOException, IllegalBlockSizeException, BadPaddingException {
Cipher encryptCipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
//Cipher encryptCipher = Cipher.getInstance("AES/CFB8/NoPadding");
encryptCipher.init(Cipher.ENCRYPT_MODE, key);
StringBuilder sb = new StringBuilder();
content.stream().forEach(e -> sb.append(e).append(System.lineSeparator()));
ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
CipherOutputStream cipherOutputStream = new CipherOutputStream(outputStream, encryptCipher);
ByteArrayInputStream inputStream = new ByteArrayInputStream(outputStream.toByteArray());
Files.copy(inputStream, outputFile, StandardCopyOption.REPLACE_EXISTING);
return encryptCipher.getIV();
private static List<String> ListDecrypter(Path inputFile, SecretKey key, byte[] iv) throws
NoSuchAlgorithmException, NoSuchPaddingException, InvalidKeyException,
InvalidAlgorithmParameterException, IOException, IllegalBlockSizeException, BadPaddingException {
Cipher decryptCipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
//Cipher decryptCipher = Cipher.getInstance("AES/CBC/CFB8NoPadding");
IvParameterSpec ivParameterSpec = new IvParameterSpec(iv);
decryptCipher.init(Cipher.DECRYPT_MODE, key, ivParameterSpec);
List<String> fileContent = new ArrayList<>();
String line = null;
ByteArrayInputStream inputStream = new ByteArrayInputStream(Files.readAllBytes(inputFile));
try(CipherInputStream chipherInputStream = new CipherInputStream(new FileInputStream(inputFile.toFile()), decryptCipher);
BufferedReader br = new BufferedReader(new InputStreamReader(chipherInputStream))) {
while ((line = br.readLine()) != null) {
return null;
This is the error message.
Exception in thread "main" java.io.IOException:
javax.crypto.BadPaddingException: Given final block not properly
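With AES/CBC/PKCS5Padding, this exception usually means the decrypting side did not receive exactly what the encrypting side produced. Two common causes are ciphertext that was saved before the final padded block was written (for example, a CipherOutputStream that was never closed) and a different key (a key generated anew on every program start cannot decrypt a file from an earlier run). Which of these applies here cannot be determined from the posted fragment. The following is a minimal round-trip sketch, not the poster's code: it assumes it is acceptable to store the 16-byte IV in front of the ciphertext, and AesCbcRoundTrip and its methods are names invented for the example.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;

final class AesCbcRoundTrip {
    private static final String TRANSFORMATION = "AES/CBC/PKCS5Padding";
    private static final int IV_LENGTH = 16; // AES block size in bytes

    static SecretKey newKey() throws GeneralSecurityException {
        KeyGenerator generator = KeyGenerator.getInstance("AES");
        generator.init(128);
        return generator.generateKey();
    }

    // Returns the random IV followed by the ciphertext.
    static byte[] encrypt(String text, SecretKey key) throws GeneralSecurityException {
        byte[] iv = new byte[IV_LENGTH];
        new SecureRandom().nextBytes(iv);
        Cipher cipher = Cipher.getInstance(TRANSFORMATION);
        cipher.init(Cipher.ENCRYPT_MODE, key, new IvParameterSpec(iv));
        // doFinal emits the last, padded block. Output captured before this
        // step is incomplete and fails to decrypt.
        byte[] body = cipher.doFinal(text.getBytes(StandardCharsets.UTF_8));
        byte[] out = new byte[IV_LENGTH + body.length];
        System.arraycopy(iv, 0, out, 0, IV_LENGTH);
        System.arraycopy(body, 0, out, IV_LENGTH, body.length);
        return out;
    }

    // Expects the layout produced by encrypt: IV first, then ciphertext.
    static String decrypt(byte[] data, SecretKey key) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance(TRANSFORMATION);
        cipher.init(Cipher.DECRYPT_MODE, key, new IvParameterSpec(data, 0, IV_LENGTH));
        byte[] plain = cipher.doFinal(data, IV_LENGTH, data.length - IV_LENGTH);
        return new String(plain, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws GeneralSecurityException {
        SecretKey key = newKey();
        byte[] encrypted = encrypt("first line\nsecond line\n", key);
        System.out.print(decrypt(encrypted, key));
    }
}
```

The same key must be used for both directions, so a program that decrypts in a later run has to persist the key as well as the encrypted file.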
I am trying to run a Java program on my Hadoop system that stores an image in a sequence file and then reads that sequence file back.
The sequence file is created, but the image data is not appended to it.
I am running the code below with this command:
sudo -u hdfs hadoop jar /usr/java_jar/ImageStorage.jar ImageStorage 12e2baa2ae0e455ac40015942b682c4b.jpg
Please help me out here.
import java.io.*;
import java.util.*;
import java.net.*;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.util.*;
import org.apache.hadoop.fs.*;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.SequenceFile.Writer.Option;
import org.apache.hadoop.io.Writable;
public class ImageStorage {
private static void openOutputFile(String args1) throws Exception {
String uri = "hdfs://localhost:8020/";
Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(URI.create(uri), conf);
Path path = new Path("hdfs://localhost:8020/user/img_data/SequenceFileCodecTest.seq");
String string1 = "hdfs://localhost:8020/user/img_data/";
string1 = string1 + args1;
Path inPath = new Path(string1);
FSDataInputStream in = null;
Text key = new Text();
BytesWritable value = new BytesWritable();
SequenceFile.Writer writer = null;
in = fs.open(inPath);
byte buffer[] = new byte[in.available()];
Option optPath = SequenceFile.Writer.file(path);
Option optKey = SequenceFile.Writer.keyClass(key.getClass());
Option optVal = SequenceFile.Writer.valueClass(value.getClass());
Option optCom = SequenceFile.Writer.compression(SequenceFile.CompressionType.BLOCK);
FSDataOutputStream fileOutputStream = fs.append(path);
BufferedWriter br = new BufferedWriter(new OutputStreamWriter(fileOutputStream));
writer.append(new Text(inPath.getName()), new BytesWritable(buffer));
}catch (Exception e) {
System.out.println("Exception MESSAGES = "+e.getMessage());
finally {
System.out.println("last line of the code....!!!!!!!!!!");
private static void openReadFile() throws Exception {
String uri = "hdfs://localhost:8020/";
Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(URI.create(uri), conf);
Path path = new Path("hdfs://localhost:8020/user/img_data/SequenceFileCodecTest.seq");
/* Reading Operations */
org.apache.hadoop.io.SequenceFile.Reader.Option filePath = SequenceFile.Reader.file(path);
SequenceFile.Reader sequenceFileReader = new SequenceFile.Reader(conf,filePath);
Writable key1 = (Writable) ReflectionUtils.newInstance(
sequenceFileReader.getKeyClass(), conf);
Writable value1 = (Writable) ReflectionUtils.newInstance(
sequenceFileReader.getValueClass(), conf);
try {
while (sequenceFileReader.next(key1, value1)) {
System.out.printf("[%s] %s %s \n", sequenceFileReader.getPosition(), key1,value1.getClass());
} finally {
/* Reading operations */
public static void main(String[] args) throws Exception {
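Judging only by the lines shown, two things may explain the missing image data: the buffer is sized with in.available(), which reports how many bytes can be read without blocking rather than the length of the file, and no read call that fills the buffer is visible. A minimal sketch of reading a stream to its end with plain JDK classes follows; StreamBytes and readAll are hypothetical names, and the returned array is what would then be wrapped in a BytesWritable.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

final class StreamBytes {
    // Reads until read() signals end of stream (-1), so the result does
    // not depend on what available() happens to report.
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] sample = {1, 2, 3, 4, 5};
        byte[] copy = readAll(new ByteArrayInputStream(sample));
        System.out.println(copy.length + " bytes read");
    }
}
```

Since FSDataInputStream is an InputStream, the stream returned by fs.open(inPath) in the question could be passed to such a helper before calling writer.append.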
I have a huge amount of data stored in HDFS, but the individual files are very small (a few KB each), so MapReduce processing takes a lot of time.
Can I reduce the processing time? Will SequenceFile be a good option?
Please provide some Java or MR code to convert multiple small text files into a SequenceFile.
SequenceFile would be a good choice in such a scenario. You could do something like this:
public class TextToSequenceConverter {
/**
 * @param args
 * @throws IOException
 * @throws IllegalAccessException
 * @throws InstantiationException
 */
public static void main(String[] args) throws IOException,
InstantiationException, IllegalAccessException {
// TODO Auto-generated method stub
Configuration conf = new Configuration();
conf.addResource(new Path("/hadoop/projects/hadoop-1.0.4/conf/core-site.xml"));
conf.addResource(new Path("/hadoop/projects/hadoop-1.0.4/conf/hdfs-site.xml"));
FileSystem fs = FileSystem.get(conf);
Path inputFile = new Path("/infile");
FSDataInputStream inputStream = fs.open(inputFile);
Path outputFile = new Path("/outfile");
IntWritable key = new IntWritable();
int count = 0;
Text value = new Text();
String str;
SequenceFile.Writer writer = SequenceFile.createWriter(fs, conf,outputFile, key.getClass(), value.getClass());
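while (inputStream.available() > 0) {
// Use the running line number as the key and the line text as the value.
key.set(count++);
str = inputStream.readLine();
value.set(str);
writer.append(key, value);
}
inputStream.close();
writer.close();
System.out.println("SEQUENCE FILE CREATED SUCCESSFULLY........");
}
}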
You might also want to have a look at HAR files.
You might find this a good read :
To convert all the files inside an HDFS directory into a single SequenceFile:
package my.pack;
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;
public class BundleSeq {
/**
 * @param args
 * @throws IOException
 * @throws IllegalAccessException
 * @throws InstantiationException
 */
public static void main(String[] args) throws IOException,
InstantiationException, IllegalAccessException {
// TODO Auto-generated method stub
Configuration conf = new Configuration();
conf.addResource(new Path(
conf.addResource(new Path(
FileSystem fs = FileSystem.get(conf);
Path inputFile = new Path("/bundleinput");
Path outputFile = new Path("/outfile");
FSDataInputStream inputStream;
Text key = new Text();
Text value = new Text();
SequenceFile.Writer writer = SequenceFile.createWriter(fs, conf,
outputFile, key.getClass(), value.getClass());
FileStatus[] fStatus = fs.listStatus(inputFile);
for (FileStatus fst : fStatus) {
String str = "";
// getLen() is the file size in bytes (the original printed the length of the file name).
System.out.println("Processing file : " + fst.getPath().getName() + " and the size is : " + fst.getLen());
inputStream = fs.open(fst.getPath());
key.set(fst.getPath().getName());
while(inputStream.available()>0) {
str = str+inputStream.readLine();
}
value.set(str);
writer.append(key, value);
inputStream.close();
}
writer.close();
System.out.println("SEQUENCE FILE CREATED SUCCESSFULLY........");
}
}
Here filename is the key and file content is the value.
You may override org.apache.hadoop.mapred.lib.CombineFileInputFormat and create your CombinedInputFormat. For implementation see my answer here. And by setting the parameter mapred.max.split.size you may control the size you would like the input files to be combined into.
For more read here.
I have a download action implemented on my Vaadin application but for some reason the downloaded file has the original file's full path as the file name.
Any idea?
You can see the code on this post.
Here's the important part of the code:
package com.bluecubs.xinco.core.server.vaadin;
import com.bluecubs.xinco.core.server.XincoConfigSingletonServer;
import com.vaadin.Application;
import com.vaadin.terminal.DownloadStream;
import com.vaadin.terminal.FileResource;
import java.io.*;
import java.net.URLEncoder;
import java.util.UUID;
import java.util.logging.Level;
import java.util.logging.Logger;
import java.util.zip.CRC32;
import java.util.zip.CheckedInputStream;
/**
 * @author Javier A. Ortiz Bultrón<javier.ortiz.78#gmail.com>
 */
public class FileDownloadResource extends FileResource {
private final String fileName;
private File download;
private File newFile;
public FileDownloadResource(File sourceFile, String fileName,
Application application) {
super(sourceFile, application);
this.fileName = fileName;
protected void cleanup() {
if (newFile != null && newFile.exists()) {
if (download != null && download.exists() && download.listFiles().length == 0) {
public DownloadStream getStream() {
try {
//Copy file to directory for downloading
InputStream in = new CheckedInputStream(new FileInputStream(getSourceFile()),
new CRC32());
download = new File(XincoConfigSingletonServer.getInstance().FileRepositoryPath
+ System.getProperty("file.separator") + UUID.randomUUID().toString());
newFile = new File(download.getAbsolutePath() + System.getProperty("file.separator") + fileName);
OutputStream out = new FileOutputStream(newFile);
byte[] buf = new byte[1024];
int len;
while ((len = in.read(buf)) > 0) {
out.write(buf, 0, len);
final DownloadStream ds = new DownloadStream(
new FileInputStream(newFile), getMIMEType(), fileName);
ds.setParameter("Content-Disposition", "attachment; filename="
+ URLEncoder.encode(fileName, "utf-8"));
return ds;
} catch (final FileNotFoundException ex) {
Logger.getLogger(FileDownloadResource.class.getName()).log(Level.SEVERE, null, ex);
return null;
} catch (IOException ex) {
Logger.getLogger(FileDownloadResource.class.getName()).log(Level.SEVERE, null, ex);
return null;
I already debugged and verified that fileName only contains the file's name not the whole path.
The answer was actually a mix of houman001's answer and this post: https://vaadin.com/forum/-/message_boards/view_message/200534
I went away from the above approach to a simpler working one:
StreamSource ss = new StreamSource() {
byte[] bytes = //Get the file bytes here
InputStream is = new ByteArrayInputStream(bytes);
@Override
public InputStream getStream() {
return is;
}
};
StreamResource sr = new StreamResource(ss, <file name>, <Application Instance>);
getMainWindow().open(sr, "_blank");
Here is my code that works fine (downloading a blob from database as a file), but it's using a Servlet and OutputStream rather than DownloadStream in your case:
public class TextFileServlet extends HttpServlet
public static final String PARAM_BLOB_ID = "id";
private final Logger logger = LoggerFactory.getLogger(TextFileServlet.class);
public void doGet(HttpServletRequest req, HttpServletResponse res) throws IOException
Principal userPrincipal = req.getUserPrincipal();
PersistenceManager pm = PMFHolder.get().getPersistenceManager();
Long id = Long.parseLong(req.getParameter(PARAM_BLOB_ID));
MyFile myfile = pm.getObjectById(MyFile.class, id);
if (!userPrincipal.getName().equals(myfile.getUserName()))
logger.info("TextFileServlet.doGet - current user: " + userPrincipal + " file owner: " + myfile.getUserName());
res.setHeader("Content-Disposition", "attachment;filename=\"" + myfile.getName() + "\"");
I hope it helps you.
StreamResource myResource = createResource(attachmentName);
attachmentName = attachmentName.substring(attachmentName.lastIndexOf("/"));
attachmentName = attachmentName.substring(attachmentName.lastIndexOf("\\"));
FileDownloader fileDownloader = new FileDownloader(myResource);
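A note on the two substring calls above: substring(lastIndexOf("/")) keeps the separator itself in the result, and when a separator does not occur, lastIndexOf returns -1 and substring(-1) throws a StringIndexOutOfBoundsException. A small sketch that handles both separators and their absence is shown here; AttachmentNames and baseName are names made up for the example.

```java
final class AttachmentNames {
    // Returns the text after the last '/' or '\'. If neither occurs,
    // both lastIndexOf calls return -1 and the whole string is returned.
    static String baseName(String path) {
        int cut = Math.max(path.lastIndexOf('/'), path.lastIndexOf('\\'));
        return path.substring(cut + 1);
    }

    public static void main(String[] args) {
        System.out.println(baseName("/var/files/report.pdf"));
        System.out.println(baseName("C:\\files\\report.pdf"));
        System.out.println(baseName("report.pdf"));
    }
}
```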