I have an interceptor written for Flume; the code is below:
public Event intercept(Event event) {
    byte[] xmlstr = event.getBody();
    InputStream instr = new ByteArrayInputStream(xmlstr);
    // TransformerFactory factory = TransformerFactory.newInstance(TRANSFORMER_FACTORY_CLASS, TRANSFORMER_FACTORY_CLASS.getClass().getClassLoader());
    TransformerFactory factory = TransformerFactory.newInstance();
    Source xslt = new StreamSource(new File("removeNs.xslt"));
    Transformer transformer = null;
    try {
        transformer = factory.newTransformer(xslt);
    } catch (TransformerConfigurationException e1) {
        e1.printStackTrace();
    }
    Source text = new StreamSource(instr);
    OutputStream ostr = new ByteArrayOutputStream();
    try {
        transformer.transform(text, new StreamResult(ostr));
    } catch (TransformerException e) {
        e.printStackTrace();
    }
    event.setBody(ostr.toString().getBytes());
    return event;
}
I'm removing the namespace from my source XML with the removeNs.xslt file, so that I can store the data in HDFS and later load it into Hive. When my interceptor runs, it throws the error below:
ERROR org.apache.flume.source.jms.JMSSource: Unexpected error processing events
java.lang.NullPointerException
at test.intercepter.App.intercept(App.java:59)
at test.intercepter.App.intercept(App.java:82)
at org.apache.flume.interceptor.InterceptorChain.intercept(InterceptorChain.java:62)
at org.apache.flume.channel.ChannelProcessor.processEventBatch(ChannelProcessor.java:146)
at org.apache.flume.source.jms.JMSSource.doProcess(JMSSource.java:258)
at org.apache.flume.source.AbstractPollableSource.process(AbstractPollableSource.java:54)
at org.apache.flume.source.PollableSourceRunner$PollingRunner.run(PollableSourceRunner.java:139)
at java.lang.Thread.run(Thread.java:745)
Can you suggest what the problem is and where?
I found the solution. The problem was nothing other than new File("removeNs.xslt"): it could not find the file. I wasn't sure where to keep this file; later I found the Flume agent's path, but as soon as I restarted the Flume agent it deleted all the files I had kept in the agent directory. So I changed the code and embedded the stylesheet contents directly in my Java code.
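A minimal sketch of that approach, with the stylesheet embedded as a string constant. The XSLT here is a generic namespace-stripping template standing in for removeNs.xslt, so treat it as illustrative (it also needs java.io.StringReader on top of the imports already in use):

// Generic namespace-stripping stylesheet embedded in the class,
// so the interceptor no longer depends on a file on disk (illustrative)
private static final String REMOVE_NS_XSLT =
    "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
    // Copy every element under its local name, dropping the namespace
    + "<xsl:template match=\"*\">"
    + "<xsl:element name=\"{local-name()}\">"
    + "<xsl:apply-templates select=\"@*|node()\"/>"
    + "</xsl:element>"
    + "</xsl:template>"
    // Copy attributes under their local names as well
    + "<xsl:template match=\"@*\">"
    + "<xsl:attribute name=\"{local-name()}\"><xsl:value-of select=\".\"/></xsl:attribute>"
    + "</xsl:template>"
    + "</xsl:stylesheet>";

// Build the transformer from the in-memory stylesheet instead of new File(...)
Source xslt = new StreamSource(new StringReader(REMOVE_NS_XSLT));
Transformer transformer = TransformerFactory.newInstance().newTransformer(xslt);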
Related
Could someone tell me why my app doesn't start when I add the method documentoPDF() to my Spring Boot app? I can reach the line that causes the error (the one after the try declaration), but I don't know why, or how to solve it. Any suggestions? If I remove this invocation, my app starts normally...
public void documentoPDF() {
    Document document = new Document();
    try {
        PdfWriter.getInstance(document, new FileOutputStream("table.pdf"));
    } catch (FileNotFoundException e1) {
        e1.printStackTrace();
    } catch (DocumentException e2) {
        e2.printStackTrace();
    }
}
I am aware of the TransferManager and its .uploadFileList() and .uploadFileDirectory() methods; however, they accept java.io.File types as arguments. I have a collection of byte array input streams containing JPEG image data, and I don't want to write this data out to files just to upload it, either.
So what I need is essentially what the S3 client's PutObjectRequest does, but for a collection of InputStream objects. Also, if one upload fails, I want to abort the whole thing and not upload anything, much like how a database transaction reverses its changes if something goes wrong along the way.
Is this possible with the Java SDK?
Before I share an answer, please consider upgrading: TransferManager is deprecated in favor of TransferManagerBuilder in the AWS SDK for Java, so move to it if TransferManagerBuilder suits your needs.
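A minimal sketch of the builder-based setup (assuming AWS SDK for Java v1; client configuration omitted for brevity):

import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.transfer.TransferManager;
import com.amazonaws.services.s3.transfer.TransferManagerBuilder;

// Build a TransferManager against the default S3 client
TransferManager transferManager = TransferManagerBuilder.standard()
        .withS3Client(AmazonS3ClientBuilder.defaultClient())
        .build();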
Now, since you asked about TransferManager, you could either 1) copy the code below and replace the functionality/arguments with your custom in-memory handling of the input stream, handling it in your own function, or 2) try the second sample, further below, as-is.
GitHub source, modified to work with an InputStream; the related issue is listed there:
private def uploadFile(is: InputStream, s3ObjectName: String, metadata: ObjectMetadata) = {
  try {
    val putObjectRequest = new PutObjectRequest(bucketName, s3ObjectName, is, metadata)
    // TransferManager supports asynchronous uploads and downloads
    val upload = transferManager.upload(putObjectRequest)
    upload.addProgressListener(ExceptionReporter.wrap(UploadProgressListener(putObjectRequest)))
  } catch {
    case e: Exception => throw new RuntimeException(e)
  }
}
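Note that transferManager.upload(...) returns immediately; if you need to block until the upload finishes, and abort the rest of your batch on failure as you described, you can call waitForCompletion() on the returned Upload object, which throws if the transfer fails.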
Bonus: a nice custom answer here using SequenceInputStream:
public void combineFiles() {
    List<String> files = getFiles();
    long totalFileSize = files.stream()
            .map(this::getContentLength)
            .reduce(0L, (f, s) -> f + s);
    try {
        try (InputStream partialFile = new SequenceInputStream(getInputStreamEnumeration(files))) {
            ObjectMetadata resultFileMetadata = new ObjectMetadata();
            resultFileMetadata.setContentLength(totalFileSize);
            s3Client.putObject("bucketName", "resultFilePath", partialFile, resultFileMetadata);
        }
    } catch (IOException e) {
        LOG.error("An error occurred while combining files. {}", e);
    }
}

private Enumeration<? extends InputStream> getInputStreamEnumeration(List<String> files) {
    return new Enumeration<InputStream>() {
        private Iterator<String> fileNamesIterator = files.iterator();

        @Override
        public boolean hasMoreElements() {
            return fileNamesIterator.hasNext();
        }

        @Override
        public InputStream nextElement() {
            try {
                return new FileInputStream(Paths.get(fileNamesIterator.next()).toFile());
            } catch (FileNotFoundException e) {
                System.err.println(e.getMessage());
                throw new RuntimeException(e);
            }
        }
    };
}
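And since the original goal was uploading byte arrays with no files at all, here is a minimal sketch of a purely in-memory upload via PutObjectRequest (the bucket name, key, and method name are illustrative):

import java.io.ByteArrayInputStream;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.model.ObjectMetadata;
import com.amazonaws.services.s3.model.PutObjectRequest;

void uploadJpeg(AmazonS3 s3Client, byte[] jpegBytes, String key) {
    ObjectMetadata metadata = new ObjectMetadata();
    metadata.setContentLength(jpegBytes.length); // known length lets the SDK stream without buffering
    metadata.setContentType("image/jpeg");
    s3Client.putObject(new PutObjectRequest("bucketName", key,
            new ByteArrayInputStream(jpegBytes), metadata));
}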
In my Spring Boot application I receive Strings, and now I want to save them as files in a specific directory.
How can I do so?
I have gone through this, but it covers receiving a file and saving it, whereas I want to write the Strings to files.
I'm using this code, plain Java:
PrintWriter writer = null;
try {
    writer = new PrintWriter("file.txt", "UTF_32");
} catch (FileNotFoundException e) {
    e.printStackTrace();
} catch (UnsupportedEncodingException e) {
    e.printStackTrace();
}
writer.println(data);
writer.close();
But it's not what anyone will probably want; opened in Notepad, the file shows up garbled.
It looks like it's your character encoding, UTF_32.
Notepad does not support UTF-32, only ANSI, UTF-8, and UTF-16.
See:
Can Notepad read UTF-32?
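If the goal is to write the received Strings to files in a specific directory in an encoding Notepad can read, here is a minimal sketch using java.nio and UTF-8 (the directory path and method name are illustrative):

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

void save(String data, String fileName) throws IOException {
    Path dir = Paths.get("/data/output"); // target directory (illustrative)
    Files.createDirectories(dir);         // create it if it does not exist yet
    Files.write(dir.resolve(fileName), data.getBytes(StandardCharsets.UTF_8));
}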
I'm working with BIRT reports and, when I want to generate a PDF file, it never ends. The problem is in the IRunAndRenderTask.run() line. I get no exception when I create the report engine.
Here is the code that creates the BIRT engine and the report designs.
private static IReportEngine birtEngine = null;
private static IReportRunnable examenAuditifReportDesign = null;
private static IReportRunnable bilanCollectifReportDesign = null;
private static Properties configProps = new Properties();

static {
    loadEngineProps(); // Read the BIRT configuration properties
    EngineConfig config = new EngineConfig();
    if (configProps != null) {
        config.setLogConfig(configProps.getProperty("logDirectory"), Level.INFO);
        config.setBIRTHome(configProps.getProperty("birtHome"));
        config.setResourcePath(configProps.getProperty("ressourcePath"));
        config.setEngineHome(configProps.getProperty("birtHome"));
        config.setProperty("birtReportsHome", configProps.getProperty("birtReportsHome"));
    }
    try {
        RegistryProviderFactory.releaseDefault();
        Platform.startup(config);
    } catch (BirtException e) {
        e.printStackTrace();
    }
    IReportEngineFactory factory = (IReportEngineFactory) Platform
            .createFactoryObject(IReportEngineFactory.EXTENSION_REPORT_ENGINE_FACTORY);
    birtEngine = factory.createReportEngine(config);
    try {
        examenAuditifReportDesign = birtEngine.openReportDesign(
                new FileInputStream(birtEngine.getConfig().getProperty("birtReportsHome") + "/examenAuditif.rptdesign"));
        bilanCollectifReportDesign = birtEngine.openReportDesign(
                new FileInputStream(birtEngine.getConfig().getProperty("birtReportsHome") + "/bilanCollectif.rptdesign"));
    } catch (EngineException e) {
        e.printStackTrace();
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    }
}
And here's the code to execute the reports.
IRunAndRenderTask task = birtEngine.createRunAndRenderTask(examenAuditifReportDesign);
task.setParameterValue("RPT_ID_Travailleur", t.getId());
task.setParameterValue("RPT_LATEST_HA", params.getId_latest_ha());
task.setParameterValue("RPT_LATEST_EXAMN", params.getId_latest_examen());
task.setParameterValue("RPT_PREVIOUS_HA", params.getId_previous_ha());
task.setParameterValue("RPT_PREVIOUS_EXAMN", params.getId_previous_examen());
PDFRenderOption options = new PDFRenderOption();
options.setOutputStream(outputStream);
options.setOutputFormat("pdf");
task.setRenderOption(options);
task.run();
task.close();
It is the task.run() line that takes forever; I tried for about an hour to an hour and a half and it never finished.
If anyone can help, it will be really appreciated.
Thank you.
If this report works in the BIRT Eclipse designer, then it should work with the BIRT runtime API; the issue is in your code.
You need to close the outputStream you passed to the PDFRenderOption object yourself, once the task has terminated.
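For example, a minimal sketch of that pattern, assuming the stream is a FileOutputStream you created (the target path is illustrative):

OutputStream outputStream = new FileOutputStream("report.pdf");
try {
    PDFRenderOption options = new PDFRenderOption();
    options.setOutputStream(outputStream);
    options.setOutputFormat("pdf");
    task.setRenderOption(options);
    task.run();
} finally {
    task.close();
    outputStream.close(); // BIRT will not close this stream for you
}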
Add this line to your code:
options.setSupportedImageFormats("PNG;JPG;BMP");
I found the problem, and it wasn't in either of the two APIs.
We were generating a PDF file per worker, zipping all the workers' PDF files together, and saving that zip in a blob in the database. That was the problem.
If the zip is saved to the file system, it works fine.
Thank you all.
I'm trying to get OpenNLP integrated into a map-reduce job on Hadoop, starting with some basic sentence splitting. Within the map function, the following code is run:
public AnalysisFile analyze(String content) {
    InputStream modelIn = null;
    String[] sentences = null;
    // references an absolute path to en-sent.bin
    logger.info("sentenceModelPath: " + sentenceModelPath);
    try {
        modelIn = getClass().getResourceAsStream(sentenceModelPath);
        SentenceModel model = new SentenceModel(modelIn);
        SentenceDetectorME sentenceBreaker = new SentenceDetectorME(model);
        sentences = sentenceBreaker.sentDetect(content);
    } catch (FileNotFoundException e) {
        logger.error("Unable to locate sentence model.");
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        if (modelIn != null) {
            try {
                modelIn.close();
            } catch (IOException e) {
            }
        }
    }
    logger.info("number of sentences: " + sentences.length);
    <snip>
}
When I run my job, I'm getting an error in the log saying "in must not be null!" (source of class throwing error), which means that somehow I can't open an InputStream to the model. Other tidbits:
I've verified that the model file exists in the location sentenceModelPath refers to.
I've added Maven dependencies for opennlp-maxent:3.0.2-incubating, opennlp-tools:1.5.2-incubating, and opennlp-uima:1.5.2-incubating.
Hadoop is just running on my local machine.
Most of this is boilerplate from the OpenNLP documentation. Is there something I'm missing, either on the Hadoop side or the OpenNLP side, that would cause me to be unable to read from the model?
Your problem is the getClass().getResourceAsStream(sentenceModelPath) line. This will try to load a file from the classpath; neither a file in HDFS nor one on the client's local file system is part of the classpath at mapper/reducer runtime, which is why you're seeing the null error (getResourceAsStream() returns null if the resource cannot be found).
To get around this you have a number of options:
Amend your code to load the file from HDFS:
modelIn = FileSystem.get(context.getConfiguration()).open(
new Path("/sandbox/corpus-analysis/nlp/en-sent.bin"));
Amend your code to load the file from the local dir, and use the -files GenericOptionsParser option (which copies to file from the local file system to HDFS, and back down to the local directory of the running mapper / reducer):
modelIn = new FileInputStream("en-sent.bin");
Hard-bake the file into the job jar (in the root dir of the jar), and amend your code to include a leading slash:
modelIn = getClass().getResourceAsStream("/en-sent.bin");
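For instance, a minimal sketch of option 3 inside analyze(), assuming en-sent.bin is packaged at the root of the job jar; note the explicit null check, since getResourceAsStream() returns null rather than throwing FileNotFoundException:

// The leading slash resolves the resource against the root of the jar
InputStream modelIn = getClass().getResourceAsStream("/en-sent.bin");
if (modelIn == null) {
    throw new IllegalStateException("en-sent.bin not found on the classpath");
}
SentenceModel model = new SentenceModel(modelIn);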