Hadoop compression - hadoop

I am trying to compress a file using the following code. The compression works fine when the file is small (say 1 GB). But when the file is around 5 GB, the program does not fail; instead it keeps running for 2 days without producing any result.
Based on the info message I get, it looks like a cluster issue, although I am not sure.
Following are the error I am getting and the code I am using:
Error
Code I am using
public void compressData(final String inputFilePath,final String outputPath) throws DataFabricAppendException {
CompressionOutputStream compressionOutputStream = null;
FSDataOutputStream fsDataOutputStream = null;
FSDataInputStream fsDataInputStream = null;
CompressionCodec compressionCodec = null;
CompressionCodecFactory compressionCodecFactory = null;
try {
compressionCodecFactory = new CompressionCodecFactory(conf);
final Path compressionFilePath = new Path(outputPath);
fsDataOutputStream = fs.create(compressionFilePath);
compressionCodec = compressionCodecFactory
.getCodecByClassName(BZip2Codec.class.getName());
compressionOutputStream = compressionCodec
.createOutputStream(fsDataOutputStream);
fsDataInputStream = new FSDataInputStream(fs.open(new Path(
inputFilePath)));
IOUtils.copyBytes(fsDataInputStream, compressionOutputStream, conf,
false);
compressionOutputStream.finish();
} catch (IOException ex) {
throw new DataFabricAppendException(
"Error while compressing non-partitioned file : "
+ inputFilePath, ex);
} catch (Exception ex) {
throw new DataFabricAppendException(
"Error while compressing non-partitioned file : "
+ inputFilePath, ex);
} finally {
try {
if (compressionOutputStream != null) {
compressionOutputStream.close();
}
if (fsDataInputStream != null) {
fsDataInputStream.close();
}
if (fsDataOutputStream != null) {
fsDataOutputStream.close();
}
} catch (IOException e1) {
LOG.warn("Could not close necessary objects");
}
}
}

Related

is it possible to read the content of the file present in the ftp server? [duplicate]

This is re-worded from a previous question (which was probably a bit unclear).
I want to download a text file via FTP from a remote server, read the contents of the text file into a string and then discard the file. I don't need to actually save the file.
I am using the Apache Commons library so I have:
import org.apache.commons.net.ftp.FTPClient;
Can anyone help please, without simply redirecting me to a page with lots of possible answers on?
Not going to do the work for you, but once you have your connection established, you can call retrieveFile and pass it an OutputStream. You can google around and find the rest...
FTPClient ftp = new FTPClient();
...
ByteArrayOutputStream myVar = new ByteArrayOutputStream();
ftp.retrieveFile("remoteFileName.txt", myVar);
Normally I'd leave a comment asking 'What have you tried?'. But now I'm feeling more generous :-)
Here you go:
private void ftpDownload() {
FTPClient ftp = null;
try {
ftp = new FTPClient();
ftp.connect(mServer);
try {
int reply = ftp.getReplyCode();
if (!FTPReply.isPositiveCompletion(reply)) {
throw new Exception("Connect failed: " + ftp.getReplyString());
}
if (!ftp.login(mUser, mPassword)) {
throw new Exception("Login failed: " + ftp.getReplyString());
}
try {
ftp.enterLocalPassiveMode();
if (!ftp.setFileType(FTP.BINARY_FILE_TYPE)) {
Log.e(TAG, "Setting binary file type failed.");
}
transferFile(ftp);
} catch(Exception e) {
handleThrowable(e);
} finally {
if (!ftp.logout()) {
Log.e(TAG, "Logout failed.");
}
}
} catch(Exception e) {
handleThrowable(e);
} finally {
ftp.disconnect();
}
} catch(Exception e) {
handleThrowable(e);
}
}
private void transferFile(FTPClient ftp) throws Exception {
long fileSize = getFileSize(ftp, mFilePath);
InputStream is = retrieveFileStream(ftp, mFilePath);
byte[] contents = downloadFile(is, fileSize); // downloadFile allocates and returns the buffer itself
is.close();
if (!ftp.completePendingCommand()) {
throw new Exception("Pending command failed: " + ftp.getReplyString());
}
}
private InputStream retrieveFileStream(FTPClient ftp, String filePath)
throws Exception {
InputStream is = ftp.retrieveFileStream(filePath);
int reply = ftp.getReplyCode();
if (is == null
|| (!FTPReply.isPositivePreliminary(reply)
&& !FTPReply.isPositiveCompletion(reply))) {
throw new Exception(ftp.getReplyString());
}
return is;
}
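private byte[] downloadFile(InputStream is, long fileSize)
throws Exception {
byte[] buffer = new byte[(int) fileSize]; // assumes the file is small enough for one array
int offset = 0;
// A single read() may return fewer bytes than requested, so loop until the buffer is full.
while (offset < buffer.length) {
int n = is.read(buffer, offset, buffer.length - offset);
if (n == -1) {
return null; // stream ended before the expected number of bytes arrived
}
offset += n;
}
return buffer; // <-- Here is your file's contents !!!
}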
private long getFileSize(FTPClient ftp, String filePath) throws Exception {
long fileSize = 0;
FTPFile[] files = ftp.listFiles(filePath);
if (files.length == 1 && files[0].isFile()) {
fileSize = files[0].getSize();
}
Log.i(TAG, "File size = " + fileSize);
return fileSize;
}
You can just skip the download to local filesystem part and do:
FTPClient ftpClient = new FTPClient();
try {
ftpClient.connect(server, port);
ftpClient.login(user, pass);
ftpClient.enterLocalPassiveMode();
ftpClient.setFileType(FTP.BINARY_FILE_TYPE);
InputStream inputStream = ftpClient.retrieveFileStream("/folder/file.dat");
BufferedReader reader = new BufferedReader(new InputStreamReader(inputStream, "Cp1252"));
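String line;
while ((line = reader.readLine()) != null) { // ready() can return false before the end of the stream
System.out.println(line); // Or whatever
}
reader.close();
ftpClient.completePendingCommand(); // finalize the transfer started by retrieveFileStream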
} catch (IOException ex) {
ex.printStackTrace();
} finally {
try {
if (ftpClient.isConnected()) {
ftpClient.logout();
ftpClient.disconnect();
}
} catch (IOException ex) {
ex.printStackTrace();
}
}
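Since the question asks for the file contents as a string, the stream-based answers above can be finished off with a small helper. This is only a minimal sketch: the class name `StreamText` and the method `readAll` are made up for illustration, and the charset is something you have to know about your file. It uses nothing but the JDK, so it works with any `InputStream`, including the one returned by `retrieveFileStream`.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;

// Hypothetical helper, not part of Apache Commons Net.
class StreamText {
    // Reads the stream to its end and decodes the collected bytes with the given charset.
    static String readAll(InputStream in, Charset charset) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        // read() may return fewer bytes than the chunk size, so loop until -1 (end of stream).
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n);
        }
        return new String(out.toByteArray(), charset);
    }
}
```

With an FTP connection already set up, you would pass it the stream from `ftp.retrieveFileStream(...)`, close that stream, and then call `ftp.completePendingCommand()`. Nothing is written to the local filesystem.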

how to save CoreDocument in Stanford nlp to disk 2

I followed Professor Manning's suggestion to use the ProtobufAnnotationSerializer, but I did something wrong.
I used serializer.writeCoreDocument on a document that works correctly. Later I read the written file with pair = serializer.read and took the remaining stream with InputStream p2 = pair.second. p2 was empty, which resulted in a null pointer exception when running Pair pair3 = serializer.read(p2);
public void writeDoc(CoreDocument document, String filename ) {
AnnotationSerializer serializer = new
ProtobufAnnotationSerializer();
FileOutputStream fos = null;
try {
OutputStream ks = new FileOutputStream(filename);
ks = serializer.writeCoreDocument(document, ks);
ks.flush();
ks.close();
}catch(IOException ioex) {
logger.error("IOException "+ioex);
}
}
public void ReadSavedDoc(String filename) {
// Read
byte[]kb = null;
try {
File initialFile = new File(filename);
InputStream ks = new FileInputStream(initialFile);
ProtobufAnnotationSerializer serializer = new
ProtobufAnnotationSerializer();
InputStream kis = new
ByteArrayInputStream(ks.readAllBytes());
ks.close();
Pair<Annotation, InputStream> pair = serializer.read(kis);
InputStream p2 = pair.second;
int nump2 = p2.available();
logger.info(nump2);
byte[] ba = p2.readAllBytes();
Annotation readAnnotation = pair.first;
Pair<Annotation, InputStream> pair3 = serializer.read(p2);
kis.close();
} catch (IOException e) {
e.printStackTrace();
} catch (ClassNotFoundException e) {
e.printStackTrace();
} catch (ClassCastException e) {
e.printStackTrace();
} catch(Exception ex) {
logger.error("Exception: "+ex);
ex.printStackTrace();
}
}
This line is unnecessary and should be deleted:
Pair<Annotation, InputStream> pair3 = serializer.read(p2);
If you have set up readAnnotation correctly that's the end of the read/write process. p2 is empty because you have read all its contents already.
There is a clear example of how to use serialization here:
https://github.com/stanfordnlp/CoreNLP/blob/master/itest/src/edu/stanford/nlp/pipeline/ProtobufSerializationSanityITest.java
You will have to also build a CoreDocument from an Annotation.
CoreDocument readDocument = new CoreDocument(readAnnotation);

Open a .bat Using a Java Application

I'm trying to open CMD from Java and have it run a .jar, so that the application's output is shown in the window of the .bat file.
Can someone tell me how to do it?
This is the code I have. It does execute the file, but the CMD window doesn't show.
btnTest.addActionListener(new ActionListener() {
public void actionPerformed(ActionEvent arg0) {
String Bat = "C:"+File.separatorChar+"Users"+File.separatorChar+"Gebruiker"+File.separatorChar+"AppData"+File.separatorChar+"Local"+File.separatorChar+"Temp"+File.separatorChar+"hexT"+File.separatorChar+"run.bat";
Runtime rt = Runtime.getRuntime();
try {
rt.exec(Bat);
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
});
Edited: This works for me:
String Bat = "C:\\app.bat"; //Try to use \\ as path separator
try {
Runtime.getRuntime().exec("cmd /c start " + Bat);
} catch (IOException e) {
e.printStackTrace();
}
Define this:
FileWriter writer;
Then, in your try/catch, do the following:
try {
writer = new FileWriter("test.txt");
Process child = rt.exec(Bat);
InputStream input = child.getInputStream();
BufferedInputStream buffer = new BufferedInputStream(input);
BufferedReader commandResult = new BufferedReader(new InputStreamReader(buffer));
String line = "";
while ((line = commandResult.readLine()) != null) {
writer.write(line + "\n");
}
writer.close();
} catch (IOException e) {
e.printStackTrace();
}
This reads the process output line by line through a buffer and writes it to a text file.
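The same idea can be sketched with ProcessBuilder, which also lets you merge the error stream into the normal output so nothing is lost. The class `ProcessLog` and its two methods are hypothetical names, and decoding the output as UTF-8 is an assumption; a Windows console may use a different code page.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper for illustration only.
class ProcessLog {
    // Copies every line from the reader to the writer and returns the number of lines copied.
    static int copyLines(BufferedReader reader, Writer writer) throws IOException {
        int count = 0;
        String line;
        while ((line = reader.readLine()) != null) {
            writer.write(line);
            writer.write(System.lineSeparator());
            count++;
        }
        writer.flush();
        return count;
    }

    // Starts the command, merges stderr into stdout, logs all output to logFile,
    // and returns the exit code of the child process.
    static int runAndLog(List<String> command, Path logFile)
            throws IOException, InterruptedException {
        Process child = new ProcessBuilder(command).redirectErrorStream(true).start();
        try (BufferedReader reader = new BufferedReader(
                 new InputStreamReader(child.getInputStream(), StandardCharsets.UTF_8));
             BufferedWriter writer = Files.newBufferedWriter(logFile, StandardCharsets.UTF_8)) {
            copyLines(reader, writer);
        }
        return child.waitFor();
    }
}
```

For the batch file from the question, a call could pass the command as the list "cmd", "/c", "C:\\app.bat" together with a log path such as test.txt. Reading the output until the end before calling waitFor() matters, because a child process can block once its output pipe fills up.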

How do I read and write UTF-8 encoding from java assets folder? I have the code, but it's not working. I get �D�nde est� / D�nde

private void CopyAssets2() {
AssetManager assetManager = getAssets();
String[] files = null;
try {
files = assetManager.list("");
} catch (IOException e) {
Log.e("File Error", e.getMessage());
}
for (String filename : files) {
InputStream in = null;
OutputStream out = null;
try {
in = assetManager.open(filename);
out = new FileOutputStream("/sdcard/Translate/" + filename);
copyFile2(in, out);
in.close();
in = null;
out.flush();
out.close();
out = null;
} catch (Exception e) {
Log.e("Save Error", e.getMessage());
}
}
}
private void copyFile2(InputStream in, OutputStream out)
throws IOException {
char[] buffer = new char[1024];
Reader reader = new BufferedReader( new InputStreamReader(in, "UTF-8"));
Writer writer = new BufferedWriter(new OutputStreamWriter(out, "UTF-8"));
int read;
while ((read = reader.read(buffer)) != -1) {
writer.write(buffer, 0, read);
}
reader.close();
writer.flush();
writer.close();
}
I'm getting the InputStream from the AssetManager and passing it to a Reader with UTF-8 encoding specified.
I'm also writing to the output file path through a Writer in UTF-8.
The file is read and written, but the encoding is still wrong. I get characters like these:
Where are... = �D�nde est� / D�nde
What am I doing wrong?
Are you sure the input file is encoded in UTF-8? The � you see in the output is a character that is used as a replacement for byte sequences that could not be converted into characters when reading.
You could make a binary copy instead of decoding and encoding text:
byte[] buffer = new byte[1024];
InputStream reader = new BufferedInputStream(in);
OutputStream writer = new BufferedOutputStream(out);
int read;
while ((read = reader.read(buffer)) != -1) {
writer.write(buffer, 0, read);
}
writer.flush(); // push buffered bytes to the underlying stream before it is closed
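To see how the replacement character can appear, here is a minimal sketch. It assumes the asset was saved as ISO-8859-1, which the question does not confirm; the class `EncodingDemo` and its members are made up for the demonstration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the ISO-8859-1 source encoding is an assumption.
class EncodingDemo {
    // The Spanish phrase from the question, written with escapes so the source stays ASCII.
    static final String ORIGINAL = "\u00BFD\u00F3nde est\u00E1";

    // Simulates the bytes of a file that was saved as ISO-8859-1 rather than UTF-8.
    static byte[] latin1Bytes() {
        return ORIGINAL.getBytes(StandardCharsets.ISO_8859_1);
    }

    // Decodes bytes with the given charset; sequences that are invalid in that
    // charset are replaced with U+FFFD, the replacement character.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }
}
```

Decoding `latin1Bytes()` with `StandardCharsets.UTF_8` gives a string containing U+FFFD where the accented letters were, while decoding with `StandardCharsets.ISO_8859_1` gives back the original text. If your file behaves the same way, either re-save it as UTF-8 or read it with its real charset. The binary copy avoids the problem because it never decodes the bytes.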

How to make a save action that checks whether a 'save-as' has already been performed

I have researched this and tried to refer back to my fileChooser.getSelectedFile() from my save-as action, but I cannot work out how to check whether or not a file has been created. Here is my attempted code so far:
Save as code (works well):
public void Save_As() {
fileChooserTest.setApproveButtonText("Save");
int actionDialog = fileChooserTest.showOpenDialog(this);
File fileName = new File(fileChooserTest.getSelectedFile() + ".txt");
try {
if (fileName == null) {
return;
}
BufferedWriter outFile = new BufferedWriter(new FileWriter(fileName));
outFile.write(this.jTextArea2.getText());//put in textfile
outFile.flush(); // redundant, done by close()
outFile.close();
} catch (IOException ex) {
}
}
"Save" code doesn't work:
private void SaveActionPerformed(java.awt.event.ActionEvent evt) {
File f = fileChooserTest.getSelectedFile();
try {
if (f.exists()) {
BufferedWriter bw1 = new BufferedWriter(new FileWriter(fileChooserTest.getSelectedFile() + ".txt"));
bw1 = new BufferedWriter(new FileWriter(fileChooserTest.getSelectedFile() + ".txt"));
String text = ((JTextArea) jTabbedPane1.getSelectedComponent()).getText();
bw1.write(text);
bw1.close();
} else {
Save_As();
}
} catch (Exception ex) {
ex.printStackTrace();
}
}
Instead of storing a reference to the JFileChooser, store a reference to the File (which will be null before any save has been performed). In your SaveActionPerformed method, check whether the file is null. If it is null, do a Save_As and store the selected file in your file variable; if it is not null, do a normal save into that file.
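A rough sketch of that approach, kept free of Swing so the logic is easy to follow. `DocumentSaver` and the `Supplier<File>` parameter are hypothetical; in the real application the supplier would show the file chooser and return the chosen file, or null if the user cancelled.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.function.Supplier;

// Hypothetical class for illustration; not taken from the question's code.
class DocumentSaver {
    private File currentFile;             // stays null until a "Save As" has succeeded
    private final Supplier<File> chooser; // asks the user for a file; null means cancelled

    DocumentSaver(Supplier<File> chooser) {
        this.chooser = chooser;
    }

    // Always asks for a target file, writes the text, then remembers the file.
    boolean saveAs(String text) {
        File chosen = chooser.get();
        if (chosen == null) {
            return false; // user cancelled, nothing is written or remembered
        }
        write(chosen, text);
        currentFile = chosen;
        return true;
    }

    // Writes to the remembered file, or falls back to saveAs on the first save.
    boolean save(String text) {
        if (currentFile == null) {
            return saveAs(text);
        }
        write(currentFile, text);
        return true;
    }

    private static void write(File target, String text) {
        try {
            Files.write(target.toPath(), text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In the Swing code, the supplier could call showSaveDialog on the JFileChooser and return getSelectedFile() only when the result is JFileChooser.APPROVE_OPTION. The Save action then just calls save(...) with the text area contents, and the dialog appears only the first time.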

Resources