QueryDNS stuck, how to get out of UnexpectedNamingException? - apache-nifi

Using QueryDNS, some of my incoming flowfiles carry an "invalid" fully qualified domain name.
In this case the QueryDNS processor displays an ugly error message
Failed to process session due to Unexpected NamingException while processing records. Please review your configuration.: org.apache.nifi.processor.exception.ProcessException: Unexpected NamingException while processing records. Please review your configuration.
It returns the flowfile to the incoming queue and will loop indefinitely yielding and trying to process the flowfile. Meanwhile other incoming flowfiles are stuck in the incoming queue and will never get processed since there are only "found" or "not found" relationships available with the processor.
How is it possible to get rid of these flowfiles (in NiFi 1.9.2), for example passing them to a LogAttribute processor ?
QueryDNS stuck

The only way I found to get round this issue was to thoroughly clean/validate the hostnames/IPs I was looking up before it reached the processor.
If I'm honest the processor isn't really fit for purpose for any significant quantity of data. The problem you mention coupled with the lack of caching make it practically useless in production.
In the end we switched to using Logstash for our enrichment rather than NiFi, although, depending on your use case this may not be possible.

Related

Camunda: Receive multiple, different messages at once

I am currently developing a kinda complex workflow with camunda. The goal of this workflow is to orchestrate the execution of different external business processes. Which includes start, overwatch and synchronize these workflows. Everything besides the synchronization works as expected.
Example:
My example has one main workflow which starts multiple sub workflows. The main workflow has to be aware when all sub workflows are finished. Every sub workflow is triggered by a message and sends a message back to the main workflow at the end of execution. Therefore, all sub workflows should be synchronized in the main workflow.
Xml can be accessed on this site: https://pastebin.com/2aj4z0zU
Unfortunately, this leads to numerous message correlation exceptions at the choke point in the main workflow (1st lane, after the first parallel gateway). I am using the following code to correlate the messages:
this.runtimeService.createMessageCorrelation(messageName)
.processInstanceId(processInstanceId)
.setVariables(payload)
.correlate();
The whole workflow is executable and runs without errors, but only if one example_workflow at a time is executed. Starting multiple example_workflows quickly one after another results in this type of exception randomly for every message type:
ENGINE-16004 Exception while closing command context: Cannot correlate message 'PROCESS_B_FINISHED': No process definition or execution matches the parameters org.camunda.bpm.engine.MismatchingMessageCorrelationException: Cannot correlate message 'PROCESS_B_FINISHED': No process definition or execution matches the parameters
at org.camunda.bpm.engine.impl.cmd.CorrelateMessageCmd.execute(CorrelateMessageCmd.java:88) ~[camunda-engine-7.14.0.jar!/:7.14.0]
Currently, the correlation exceptions occur if a postgresql database is used. The same workflow runs much better, but not perfect, when we use a h2 file-based database. All receive tasks are not configured asynchronously, only send tasks are (async before + exclusive).
Questions:
Is this already the best practice to synchronize multiple messages in one workflow?
What could be the reason for the correlation exceptions while using a postgresql database?
Used software:
spring boot application [Version:2.3.4]
camunda [Version:7.14.0]
h2 [Version:1.4.200]
postgresql [Version:42.2.22]
the process model seems to contain sequences where it can run into a deadlock (What if blue is followed directly by green? Or yellow?) or where you have race conditions. If the process has not reached a state where it is in a receiving state for the message, then the message delivery will fail (as indicated in the error message you shared)
(The reason you are observing the CorellationException more frequently on postgresql if the race condition. With this external database some operations take slightly more time, increasing the chance of the race condition occurring).
The process engine needs to be able to match a message to a unique receiver. If there are multiple potential receivers for the same message name, and no other correlation criteria creating a unique match is provided, then the delivery will also fail. You either need to use unique message names per instance or better use a businessKey or a process data which is unique per instance as additional correlation criteria. This is why it does not work when you run multiple process instances.
Modelling a workflow with this parallel message bottleneck leads to a race condition, as mentioned by #rob2universe's post.
To solve this problem, I had firstly to correlate the messages directly. I did this by adding a unique identifier to every message, which was not a big deal due to the fact that an item ID was defined within the payload of every message. Secondly, I had to remove all asynchronous and exclusive markers for every receive task and connected gateways. And thirdly, I had to reset the job executor properties to default values. Limiting the pool size and jobs per acquisition did not benefit the workflow execution.
After all these changes, my workflow now runs as expected with no errors. Unfortunately, due to the described bottleneck optimistic logging exceptions are common, but the workflow engine handles these exceptions without further errors.

When does the Penalty Duration on the InvokeHTTP Processor in NiFi kick in?

I am trying to use the "Penalty Duration" on the Settings of the InvokeHTTP Processor.
I understand that the flow file will be penalized by this duration if the processor determines that there may be a problem with this flow file.
My question is "under what conditions during the processing of InvokeHTTP would the flow file be penalized ?"
Is it when the result is Failure, Retry ?
I did read this post. However, I am still not clear on the penalty conditions for the InvokeHTTP processor
I am using NiFi 1.9.1
Thanks
The invocation is attempted. If it occurs without any exceptions being thrown then a routing decision is made. The routing decision is what decides whether to go to success, response, retry, no retry. This is largely based on the status code returned by the invoked web service. However, if any exception is thrown during the invocation/response handling then the request is sent to 'failure' and is penalized.

Apache NiFi instance hangs on the "Computing FlowFile lineage..." window

My Apache NiFi instance just hangs on the "Computing FlowFile lineage..." for a specific flow. Others work, but it won't show the lineage for this specific flow for any data files. The only error message in the log is related to an error in one of the processors, but I can't see how that would affect the lineage, or stop the page from loading.
This was related to two things...
1) I was using the older (but default) provenance repository, which didn't perform well, resulting in the lag in the UI. So I needed to change it...
#nifi.provenance.repository.implementation=org.apache.nifi.provenance.PersistentProvenanceRepository
nifi.provenance.repository.implementation=org.apache.nifi.provenance.WriteAheadProvenanceRepository
2) Fixing #1 exposed the second issue, which was that the EnforceOrder processor was generating hundreds of provenance events per file, because I was ordering on a timestamp, which had large gaps between the values. This is apparently not a proper use case for the EnforceOrder processor. So I'll have to remove it and find another way to do the ordering.

Best practices to handle errors in NIFI

I'm using NIFI, and i have data flows where I use the following processos :
ExecuteScript
RouteOnAttribute
FetchMapDistribuedCache
InvokeHTTPRequest
EvaluateJSONPath
and two level process group like NIFI FLOW >>> Process group 1 >>> Process group 2, my question is how to handle errors in this case, I have created output port for each processor to output errors outside the process group and in the NIFI Flow I have done a funnel for each error type and then put all those errors catched in Hbase so i can do some reporting later on, and as you can imagine this add multiples relationships and my simple dataflow start to became less visible.
My questions are, what's the best practices to handle errors in processors, and what's the best approach to do some error reporting using NIFI ( Email or PDF )
It depends on the errors you routinely encounter. Some processors may fail to perform a task (an expected but not desired outcome), and route the failed flowfile to REL_FAILURE, a specific relationship which can be connected to a processor to handle these failures, or back to the same processor to be retried. Others (or the same processors in different scenarios) may encounter exceptions, which are unexpected occurrences which cannot be resolved by the processor.
An example of this is PutKafka vs. EncryptContent. If the remote Kafka system is temporarily unavailable, the processor would fail to send the flowfile content. However, retrying after some delay period could be successful if the remote system is once again available. However, decrypting cipher text with the wrong key will always throw an exception, no matter how many times it is attempted or how long the retry delay is.
Many users route the errors to PutEmail processor and report them to a specific user/group who can evaluate the errors and monitor the data flow if necessary. You can also use "Reporting Tasks" to monitor metrics or ingest provenance data as operational data and route that to email/offline storage, etc. to run analytics on it.

How to solve the relationship failure?

I have a processor that appears to be creating FlowFiles correctly (modified a standard processor), but when it goes to commit() the session, an exception is raised:
2016-10-11 12:23:45,700 ERROR [Timer-Driven Process Thread-6] c.s.c.processors.files.GetFileData [GetFileData[id=8f5e644d-591c-4df1-8c79-feea118bd8c0]] Failed to retrieve files due to {}  org.apache.nifi.processor.exception.FlowFileHandlingException: StandardFlowFileRecord transfer relationship not specified
I'm assuming this is supposed to be indicating there's no connection available to commit the transfer; however, there is a "success" relationship registered during init() in same way as original processor did it, and the success relationship out is connected to another processor input as it should be.
Any suggestions for troubleshooting?
What changes did you make to the standard processor? If you are calling methods on the ProcessSession object, ensure that you are saving the latest "version" of the FlowFile returned from those method calls, and transfer only the latest version to "success".
FlowFile references are immutable; often in code you will see an initial reference like "flowFile" pointing at the incoming flow file (from session.get() for example), then it gets updated as the flow file is mutated, such as flowFile = session.putAttribute(flowFile, "myAttribute", "myValue").
Also ensure that you have transferred or removed the latest version of each distinct flow file (not the various references to the same flow file) to some relationship (even Relationship.SELF if need be). If your processor creates a new flow file, ensure that new flow file is transferred. If the incoming flow file is no longer needed, be sure to call session.remove() on it.
There are some common patterns and additional guidance in the NiFi Developer's Guide, including test patterns; your unit test(s) for this processor should be able to flush out this error (by asserting how many flow files should have been transferred to which relationship(s) during the test).

Resources