How to extract and manipulate data within a NiFi processor - apache-nifi

I'm trying to write a custom NiFi processor which will take in the contents of the incoming flow file, perform some math operations on it, then write the results into an outgoing flow file. Is there a way to dump the contents of the incoming flow file into a string or something? I've been searching for a while now and it doesn't seem that simple. If anyone could point me toward a good tutorial that deals with doing something like that it would be greatly appreciated.

The Apache NiFi Developer Guide documents the process of creating a custom processor very well. In your specific case, I would start with the Component Lifecycle section and the Enrich/Modify Content pattern. Any other processor which does similar work (like ReplaceText or Base64EncodeContent) would be good examples to learn from; all of the source code is available on GitHub.
Essentially you need to implement the #onTrigger() method in your processor class, read the flowfile content and parse it into your expected format, perform your operations, and then re-populate the resulting flowfile content. Your source code will look something like this:
@Override
public void onTrigger(final ProcessContext context, final ProcessSession session) throws ProcessException {
    FlowFile flowFile = session.get();
    if (flowFile == null) {
        return;
    }
    final ComponentLog logger = getLogger();
    final AtomicBoolean error = new AtomicBoolean();
    final AtomicReference<String> result = new AtomicReference<>(null);
    // This uses a lambda function in place of a callback for InputStreamCallback#process()
    session.read(flowFile, in -> {
        long start = System.nanoTime();
        // Read the flowfile content into a String
        // TODO: May need to buffer this if the content is large
        try {
            final String contents = IOUtils.toString(in, StandardCharsets.UTF_8);
            result.set(new MyMathOperationService().performSomeOperation(contents));
            long stop = System.nanoTime();
            if (logger.isDebugEnabled()) {
                final long durationNanos = stop - start;
                DecimalFormat df = new DecimalFormat("#.###");
                logger.debug("Performed operation in " + durationNanos + " nanoseconds (" + df.format(durationNanos / 1_000_000_000.0) + " seconds).");
            }
        } catch (Exception e) {
            error.set(true);
            logger.error(e.getMessage() + " Routing to failure.", e);
        }
    });
    if (error.get()) {
        session.transfer(flowFile, REL_FAILURE);
    } else {
        // Again, a lambda takes the place of the OutputStreamCallback#process()
        FlowFile updatedFlowFile = session.write(flowFile, out -> {
            final String resultString = result.get();
            final byte[] resultBytes = resultString.getBytes(StandardCharsets.UTF_8);
            // TODO: This can use a while loop for performance
            out.write(resultBytes, 0, resultBytes.length);
            out.flush();
        });
        session.transfer(updatedFlowFile, REL_SUCCESS);
    }
}
Daggett is right that the ExecuteScript processor is a good place to start, because it will shorten the development lifecycle (no building NARs, deploying, and restarting NiFi to use it), and once you have the correct behavior you can easily copy/paste it into the generated skeleton and deploy it once.
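If you do go the custom-processor route, the nifi-mock module's TestRunner also shortens the loop by letting you exercise onTrigger() without building or deploying a NAR. A minimal sketch, assuming a processor class named MyMathProcessor that exposes a REL_SUCCESS relationship (these names are illustrative, not from the original answer):

import java.nio.charset.StandardCharsets;
import org.apache.nifi.util.MockFlowFile;
import org.apache.nifi.util.TestRunner;
import org.apache.nifi.util.TestRunners;
import org.junit.Test;

public class MyMathProcessorTest {

    @Test
    public void testPerformsOperation() {
        // Build an in-memory runner around the (hypothetical) processor class
        final TestRunner runner = TestRunners.newTestRunner(MyMathProcessor.class);
        // Enqueue flowfile content exactly as NiFi would deliver it
        runner.enqueue("1 2 3".getBytes(StandardCharsets.UTF_8));
        runner.run();
        // Verify routing and inspect the rewritten content
        runner.assertAllFlowFilesTransferred(MyMathProcessor.REL_SUCCESS, 1);
        final MockFlowFile out = runner.getFlowFilesForRelationship(MyMathProcessor.REL_SUCCESS).get(0);
        System.out.println(new String(out.toByteArray(), StandardCharsets.UTF_8));
    }
}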

Related

Is there a way to batch upload a collection of InputStreams to Amazon S3 using the Java SDK?

I am aware of the TransferManager and the .uploadFileList() and .uploadFileDirectory() methods, however they accept java.io.File types as arguments. I have a collection of byte array input streams containing jpeg image data. I don't want to create in-memory files to store this data before I upload it either.
So what I need is essentially what the S3 client's PutObjectRequest does but for a collection of InputStream objects. Also, if one upload fails, I want to abort the whole thing and not upload anything, much like how a database transaction will reverse the changes if something goes wrong along the way.
Is this possible with the Java SDK?
Before I share an answer, please consider upgrading.
FYI: TransferManager is deprecated; it is now built via TransferManagerBuilder in the AWS SDK for Java, so please consider upgrading if TransferManagerBuilder suits your needs.
Now, since you asked about TransferManager, you could either 1) copy the code below and replace the functionality/arguments with your custom in-memory handling of the input stream, wrapping it in your own function, or 2) use the second sample further below as-is.
GitHub source, modified to work with an InputStream; the related issue is listed here:
private def uploadFile(is: InputStream, s3ObjectName: String, metadata: ObjectMetadata) = {
  try {
    val putObjectRequest = new PutObjectRequest(bucketName, s3ObjectName, is, metadata)
    // TransferManager supports asynchronous uploads and downloads
    val upload = transferManager.upload(putObjectRequest)
    upload.addProgressListener(ExceptionReporter.wrap(UploadProgressListener(putObjectRequest)))
  } catch {
    case e: Exception => throw new RuntimeException(e)
  }
}
Bonus: a nice custom answer here, using a SequenceInputStream:
public void combineFiles() {
    List<String> files = getFiles();
    long totalFileSize = files.stream()
            .map(this::getContentLength)
            .reduce(0L, (f, s) -> f + s);
    try {
        try (InputStream partialFile = new SequenceInputStream(getInputStreamEnumeration(files))) {
            ObjectMetadata resultFileMetadata = new ObjectMetadata();
            resultFileMetadata.setContentLength(totalFileSize);
            s3Client.putObject("bucketName", "resultFilePath", partialFile, resultFileMetadata);
        }
    } catch (IOException e) {
        LOG.error("An error occurred while combining files. {}", e);
    }
}

private Enumeration<? extends InputStream> getInputStreamEnumeration(List<String> files) {
    return new Enumeration<InputStream>() {
        private Iterator<String> fileNamesIterator = files.iterator();

        @Override
        public boolean hasMoreElements() {
            return fileNamesIterator.hasNext();
        }

        @Override
        public InputStream nextElement() {
            try {
                return new FileInputStream(Paths.get(fileNamesIterator.next()).toFile());
            } catch (FileNotFoundException e) {
                System.err.println(e.getMessage());
                throw new RuntimeException(e);
            }
        }
    };
}
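Neither sample above addresses the all-or-nothing requirement from the question. S3 itself has no transactions, so one common workaround is to track what has already been uploaded and delete it if a later upload fails. A minimal sketch, assuming the v1 AmazonS3 client; the class name, bucket handling, and key/stream map are illustrative, not from the original answers:

import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.model.ObjectMetadata;

public class AllOrNothingUploader {

    private final AmazonS3 s3Client;
    private final String bucketName;

    public AllOrNothingUploader(AmazonS3 s3Client, String bucketName) {
        this.s3Client = s3Client;
        this.bucketName = bucketName;
    }

    /** Uploads every stream, or deletes everything uploaded so far if one fails. */
    public void uploadAll(Map<String, InputStream> streamsByKey) {
        List<String> uploadedKeys = new ArrayList<>();
        try {
            for (Map.Entry<String, InputStream> entry : streamsByKey.entrySet()) {
                ObjectMetadata metadata = new ObjectMetadata();
                // Content length should be set when known; otherwise the SDK
                // buffers the stream in memory to compute it.
                s3Client.putObject(bucketName, entry.getKey(), entry.getValue(), metadata);
                uploadedKeys.add(entry.getKey());
            }
        } catch (RuntimeException e) {
            // "Roll back" by removing the objects that did make it to S3
            for (String key : uploadedKeys) {
                s3Client.deleteObject(bucketName, key);
            }
            throw e;
        }
    }
}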

How to read and write files in a reactive way using InputStream and OutputStream

I am trying to read an Excel file, manipulate it or add new data to it, and write it back out. I am also trying to make this a completely reactive process using Flux and Mono. The idea is to return the resulting file or byte array via a web service.
My question is: how do I get an InputStream and OutputStream in a non-blocking way?
I am using the Apache POI library to read and generate the Excel file.
I currently have a solution based on a mix of Mono.fromCallable() and blocking code to get the InputStream.
For example, the web service part is as follows.
@GetMapping(value = API_BASE_PATH + "/download", produces = "application/vnd.ms-excel")
public Mono<ByteArrayResource> download() {
    Flux<TimeKeepingEntry> createExcel = excelExport.createDocument(false);
    return createExcel.then(Mono.fromCallable(() -> {
        ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
        excelExport.getWb().write(outputStream);
        return new ByteArrayResource(outputStream.toByteArray());
    }).subscribeOn(Schedulers.elastic()));
}
And the Processing of the file:
public Flux<TimeKeepingEntry> createDocument(boolean all) {
    Flux<TimeKeepingEntry> entries = null;
    try {
        InputStream inputStream = new ClassPathResource("Timesheet Template.xlsx").getInputStream();
        wb = WorkbookFactory.create(inputStream);
        Sheet sheet = wb.getSheetAt(0);
        log.info("Created document");
        if (all) {
            // all entries
        } else {
            entries = service.findByMonth(currentMonthName)
                    .log("Excel Export - retrievedMonths")
                    .sort(Comparator.comparing(TimeKeepingEntry::getDateOfMonth))
                    .doOnNext(timeKeepingEntry -> this.populateEntry(sheet, timeKeepingEntry));
        }
    } catch (IOException e) {
        log.error("Error Importing File", e);
    }
    return entries;
}
This works well enough but is not really in line with Flux and Mono. Some guidance here would be good. I would prefer to have the whole sequence non-blocking.
Unfortunately the WorkbookFactory.create() operation is blocking, so you have to perform that operation using imperative code. However, fetching each timeKeepingEntry can be done reactively. Your code would look something like this:
public Flux<TimeKeepingEntry> createDocument() {
    return Flux.generate(
            this::getWorkbookSheet,
            (sheet, sink) -> {
                sink.next(getNextTimeKeepingEntryFrom(sheet));
                // The generator must return the (possibly updated) state
                return sheet;
            },
            this::closeWorkbook);
}
This will keep the workbook in memory, but will fetch each entry on demand when the elements of the Flux are requested.
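Since the state supplier and the close callback in that sketch still invoke blocking POI code, the resulting Flux is typically subscribed on a worker thread rather than the event loop. A hedged usage sketch, assuming Reactor 3.3+ for Schedulers.boundedElastic() (older versions would use the question's Schedulers.elastic()):

import reactor.core.publisher.Flux;
import reactor.core.scheduler.Schedulers;

public Flux<TimeKeepingEntry> downloadEntries() {
    // Shift the subscription (and thus the blocking POI calls inside the
    // generate callbacks) onto an elastic worker pool, keeping the WebFlux
    // event loop free.
    return excelExport.createDocument()
            .subscribeOn(Schedulers.boundedElastic());
}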

opendaylight : Storing a string in MDSAL

I have a YANG model (known to MDSAL) which I am using in an OpenDaylight application. In my application, I am presented with a JSON-formatted String which I want to store in the MDSAL database. I could use the builder of the object that I wish to store and set its fields from the JSON-formatted String one by one, but this is laborious and error prone.
Alternatively I could post from within the application to the Northbound API which will eventually write to the MDSAL datastore.
Is there a simpler way to do this?
Thanks,
Assuming that your incoming JSON matches the structure of your YANG model exactly (does it?), I believe what you are really looking for is to transform that JSON into a "binding-independent" internal model (not the setters of the generated Java class) - NormalizedNode & Co. Somewhere in the controller or mdsal project there is a "codec" class that can do this.
You can either search for such code and its usages (I find looking at tests is always useful) in the ODL controller and mdsal projects' source code, or in other ODL projects which do similar things - I'm thinking specifically of browsing around the jsonrpc and daexim projects' sources; this in particular may inspire you: https://github.com/opendaylight/daexim/blob/stable/nitrogen/impl/src/main/java/org/opendaylight/daexim/impl/ImportTask.java
Best of luck.
Based on the information above, I constructed the following (which I am posting here to help others). I still do not know how to get rid of the deprecated reference to SchemaService (perhaps somebody can help).
private void importFromNormalizedNode(final DOMDataReadWriteTransaction rwTrx, final LogicalDatastoreType type,
        final NormalizedNode<?, ?> data) throws TransactionCommitFailedException, ReadFailedException {
    if (data instanceof NormalizedNodeContainer) {
        @SuppressWarnings("unchecked")
        YangInstanceIdentifier yid = YangInstanceIdentifier.create(data.getIdentifier());
        rwTrx.put(type, yid, data);
        // NOTE: the transaction still needs to be submitted (e.g. rwTrx.submit())
        // for the write to take effect.
    } else {
        throw new IllegalStateException("Root node is not instance of NormalizedNodeContainer");
    }
}

private void importDatastore(String jsonData, QName qname) throws TransactionCommitFailedException, IOException,
        ReadFailedException, SchemaSourceException, YangSyntaxErrorException {
    LOG.info("jsonData = " + jsonData);
    // Wrap the JSON string in an InputStream for the GSON-based parser
    byte[] bytes = jsonData.getBytes();
    InputStream is = new ByteArrayInputStream(bytes);
    final NormalizedNodeContainerBuilder<?, ?, ?, ?> builder = ImmutableContainerNodeBuilder.create()
            .withNodeIdentifier(new YangInstanceIdentifier.NodeIdentifier(qname));
    try (NormalizedNodeStreamWriter writer = ImmutableNormalizedNodeStreamWriter.from(builder)) {
        SchemaPath schemaPath = SchemaPath.create(true, qname);
        LOG.info("SchemaPath " + schemaPath);
        SchemaNode parentNode = SchemaContextUtil.findNodeInSchemaContext(schemaService.getGlobalContext(),
                schemaPath.getPathFromRoot());
        LOG.info("parentNode " + parentNode);
        try (JsonParserStream jsonParser = JsonParserStream.create(writer, schemaService.getGlobalContext(),
                parentNode)) {
            try (JsonReader reader = new JsonReader(new InputStreamReader(is))) {
                reader.setLenient(true);
                jsonParser.parse(reader);
                DOMDataReadWriteTransaction rwTrx = domDataBroker.newReadWriteTransaction();
                importFromNormalizedNode(rwTrx, LogicalDatastoreType.CONFIGURATION, builder.build());
            }
        }
    }
}
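For completeness, a hedged usage sketch of the method above: the QName of the top-level container is usually taken from the QNAME constant that the binding generator places on the corresponding Java class. SomeContainer and the JSON payload below are purely illustrative:

public void storeJson() throws Exception {
    // Hypothetical top-level container generated from the YANG model; the
    // binding generator exposes a public static QNAME constant on it.
    String json = "{\"some-container\":{\"some-leaf\":\"some-value\"}}";
    importDatastore(json, SomeContainer.QNAME);
}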

Spring Batch ~ Dynamic commit interval or a custom completion policy

What I have:
A Spring Integration flow that recursively watches a folder for new CSV files and sends them to Spring Batch.
The job: read the CSV file; in the processor, modify some data in the items; then use a custom writer to save the data to the DB.
The problem:
I have a dynamic number of CSVs being sent to the batch, and I want the job's commit interval to be based on the number of items (lines) in each CSV file. In other words, I don't want to commit after every fixed number of items, but at the end of each file. Example: CSV 1 has 200 lines; I want to process all of them, write them, commit, close the transaction, and then read the next CSV.
I have two ideas, but I don't know which one is best or how to implement it:
Get the number of lines in my CSV from the reader and pass it to my commit interval using a job parameter, like #{jobParameters['commit.interval.value']}
Implement a custom CompletionPolicy to replace my commit interval; how would I implement isComplete()? Do you have any examples? A GitHub project?
But before all that, how can I get the number of items?
Could anyone help me? A code sample, maybe?
Thank you in advance.
No answer came, but I found a solution.
I'm using a dynamic commit interval instead of a completion policy.
With Spring Batch Integration, I can use a transformer to send my file to the batch. For that I have a custom class, FileMessageToJobRequest, to which I added this function that counts the lines:
public static int countLines(String filename) throws IOException {
    InputStream is = new BufferedInputStream(new FileInputStream(filename));
    try {
        byte[] c = new byte[1024];
        int count = 0;
        int readChars = 0;
        boolean empty = true;
        while ((readChars = is.read(c)) != -1) {
            empty = false;
            for (int i = 0; i < readChars; ++i) {
                if (c[i] == '\n') {
                    ++count;
                }
            }
        }
        return (count == 0 && !empty) ? 1 : count;
    } finally {
        is.close();
    }
}
and this one to send the job parameters:
@Transformer
public JobLaunchRequest toRequest(Message<File> message) throws IOException {
    JobParametersBuilder jobParametersBuilder = new JobParametersBuilder();
    jobParametersBuilder.addString("commit.interval",
            Integer.toString(countLines(message.getPayload().getAbsolutePath())));
    return new JobLaunchRequest(job, jobParametersBuilder.toJobParameters());
}
and in my job context, I just added this commit-interval="#{jobParameters['commit.interval']}"
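In context, the chunk configuration looks roughly like the sketch below (hedged: the reader/processor/writer bean names are illustrative; late binding of job parameters into commit-interval is a documented Spring Batch feature):

<job id="csvImportJob" xmlns="http://www.springframework.org/schema/batch">
    <step id="csvStep">
        <tasklet>
            <!-- commit-interval is resolved per execution from the job parameter
                 set by FileMessageToJobRequest above -->
            <chunk reader="csvItemReader" processor="csvItemProcessor"
                   writer="customDbItemWriter"
                   commit-interval="#{jobParameters['commit.interval']}"/>
        </tasklet>
    </step>
</job>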
Hope it helps someone in need ;)

How do you save images to a Blackberry device via HttpConnection?

My script fetches XML via HttpConnection and saves it to the persistent store. No problems there.
Then I loop through the saved data to compose a list of image URLs to fetch via a queue.
Each of these requests calls the HttpConnection thread like so:
...
public synchronized void run()
{
    HttpConnection connection = (HttpConnection) Connector.open("http://www.somedomain.com/image1.jpg");
    connection.setRequestMethod("GET");
    String contentType = connection.getHeaderField("Content-type");
    InputStream responseData = connection.openInputStream();
    connection.close();
    outputFinal(responseData, contentType);
}

public synchronized void outputFinal(InputStream result, String contentType) throws SAXException, ParserConfigurationException, IOException
{
    if (contentType.startsWith("text/"))
    {
        // bunch of xml save code that works fine
    }
    else if (contentType.equals("image/png") || contentType.equals("image/jpeg") || contentType.equals("image/gif"))
    {
        // how to save images here?
    }
    else
    {
        // default
    }
}
What I can't find any good documentation on is how one would take the response data and save it to an image stored on the device.
Maybe I just overlooked something very obvious. Any help is very appreciated.
Thanks
I tried following this advice and found the same thing I always find when looking up BlackBerry-specific issues: nothing.
The problem is that every example or post assumes you know everything about the platform.
Here's a simple question: what line of code writes the data I've read to the BlackBerry device? To what path? How do I retrieve it later?
I have this code, but I don't know whether it actually does anything, because I don't know where it is supposedly writing to, or if that's even what it is doing at all:
(filename is determined in a loop, based on the URL that was called.)
FileOutputStream fos = null;
try
{
    fos = new FileOutputStream(File.FILESYSTEM_PATRIOT, filename);
    byte[] buffer = new byte[262144];
    int byteRead;
    while ((byteRead = result.read(buffer)) != -1)
    {
        fos.write(buffer, 0, byteRead);
    }
    fos.flush();
    fos.close();
}
catch (IOException ieo)
{
}
finally
{
    if (fos != null)
    {
        fos.close();
    }
}
The idea is that I have some 600 images pulled from a server. I need to loop through the XML and save each image to the device so that when an entity is called, I can pull the associated image - entity_id.png - from internal storage.
The documentation from RIM does not specify this, nor does it make it easy to begin figuring it out.
This issue does not seem to be addressed on this forum, or others I have searched.
Thanks
You'll need to use the Java FileOutputStream to do the writing. You'll also want to close the connection after reading the data from the InputStream (move outputFinal above your call to close). You can find all kinds of examples regarding FileOutputStream easily.
See here for more. Note that in order to use the FileOutputStream your application must be signed.
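As an extra illustration (not part of the original answer, which points at FileOutputStream), the JSR-75 FileConnection API is another common way to write the downloaded bytes to BlackBerry device storage. A minimal sketch; the store path, folder, and class name are illustrative assumptions:

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import javax.microedition.io.Connector;
import javax.microedition.io.file.FileConnection;

public final class ImageSaver
{
    /** Writes the image bytes from the HTTP response stream to device storage. */
    public static void saveImage(InputStream result, String filename) throws IOException
    {
        // "file:///store/home/user/" is the internal device store; an SD card
        // would be "file:///SDCard/". Both the path and filename are illustrative.
        FileConnection fc = (FileConnection) Connector.open(
                "file:///store/home/user/pictures/" + filename, Connector.READ_WRITE);
        try
        {
            if (!fc.exists())
            {
                fc.create();
            }
            OutputStream os = fc.openOutputStream();
            byte[] buffer = new byte[4096];
            int read;
            while ((read = result.read(buffer)) != -1)
            {
                os.write(buffer, 0, read);
            }
            os.flush();
            os.close();
        }
        finally
        {
            fc.close();
        }
    }
}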
