How to get NiFi processor error into Flowfile attribute? - apache-nifi

I have a PutGCSObject processor for which I want to capture the error into a flow file attribute.
As in the Picture, when there is an error for the Processor, it sends to failure with all the pre-existing attributes as-is.
I want the error message to be a part of the same flow file as an attribute. How can I achieve that ?

There is actually a way to get it.
Here is how i do it:
1: I route all ERROR connections to a main "monitoring process group"
2: Here is my "monitoring process group"
In updateattribute I capture filename as initial_filename
Then in my next step I query the bulletins
I then parse the output as individual attributes.
After I have the parsed bulleting output I use a RouteOnAttribute proc to drop all bulletins I don't need (some of them I have already used and notified on).
Once I only have my actual ERROR bulletin left, I use ExecuteStreamingCommand to run a python script using nipyapi module to get more info about the error, such as where it is in my flow, hierarchy, a description of the processor that failed, some proc stats and also I have metadata catalog about each proc/process group with their custodians and business use case.
This data is then posted to sumologic for logging and also I trigger a series of notifications (Slack + PagerDuty hook to create an incident lifecycle).
I hope this helps

There's no universal way to append error messages as flowfile attributes. Also, we tend to strongly avoid anything like that because of the potential to bubble up error messages with sensitive data to users who might not be authorized to see those details.

Related

Can you wait for user response message with bwmarrin/discordgo?

Is there any option to wait for user response message in discordgo? I'm looking for something similar to discord.js awaitMessages.
No, but you can make a collection that holds message and event information and checking news messages.
Simply
Make a collection/array
Add message information
Check if the incoming message in the message event handler is in the collection
Handle event
Remove from collection
Don't forget set a timeout and clear expired data from collection.
according the docs: awaitMessages
time: Amount of time in milliseconds the collector should run for
max: Number of messages to successfully pass the filter
In Go, you can easily use a routine with just one keyword go, so implementing asynchronous (async) is very simple.
solving ideas:
Create a message storage center: It has the following features:
store all sent messages
Have a garbage collection mechanism: Since we are constantly collecting messages, we need a mechanism to eliminate old messages.
Need a mutex: Considering that the messages may generate race conditions, this lock is added to ensure security
It can generate filters: to communicate with each filter, we need to provide a chan for each filter.
Whenever a new message is created, we add the message to the message center, and it also notifies each filter.
Define your filter function: The message will be collected whenever this criterion is true. for example
Define the callback function: This is the result of the filter function. You can do something based on this result. for example
Full code
I put the full code on the replit and gist.
If you want to run it, you can copy the code from replit and set your token and channel ID (or user ID) in the environment variables to test it.

How to properly create Prometheus metrics with unique field

I have a system that regularly downloads files and parses them. However, sometimes something might go wrong with the parsing and I have the task to create a Prometheus alert for when a certain file fails. My
initial idea is to create a custom counter alert in Prometheus - something like
processed_files_total and use status as label because if the file fails it has FAILED status and if it succeeds - SUCCESS, so supposedly the alert should look like
increase(processed_files_total{status=FAILED}[24h]) > 0 and I hope that this will alert me in case there is at least 1 file with failed status.
The problem comes from the fact that I also want to have the
exact filename in the alert message and since each file has a unique name I'm almost sure that it is not a good idea to put it as label e.g. filename={filename} - According to Prometheus docs -
Do not use labels to store dimensions with high cardinality (many different label values), such as user IDs, email addresses, or other unbounded sets of values.
is there any other way I can achieve getting the filename from the alert or this is the way to go ?
It's a good question.
I think the correct answer is that the alert should notify you that something failed and the resolution is to go to the app's logs to identify the specific file(s) that failed.
Lightning won't strike you for using the filename as a label value in Prometheus if you really must but, I think, as you are, using an unbounded value should give you pause as to whether you're abusing the tool.
Metrics seem intrinsically (hunch) about monitoring aggregate state (an unusual number of files are failing) rather than specific (why did this one fail); logs and tracing tools help with the specific cases.

NiFi How to get the current processor Name and Processor group name through the custom processor using (Java)

I'm Creating the NiFi Custom processor using Java,
one of the requirement is to get the previous processor name and processor group (like a breadcrumb) using java code.
The previous processor name and process group name is not immediately (nor meant to be) available to processors, can you explain more about your use case? You can perhaps use a SiteToSiteProvenanceReportingTask to send provenance information back to your own NiFi instance (an Input Port, e.g.) and find the events that correspond to FlowFiles entering your custom processor, the events should have the source (previous) processor and destination (your custom) processor.
If instead you code your custom processor using InvokeScriptedProcessor with Groovy for example, then you can "bend the rules" and get at the previous processor name and such, as Groovy allows access to private members and you can assume the implementation of the ProcessContext in onTrigger is an instance of StandardProcessContext, so you can get at its members which include upstream connections and thus the previous processor. For a particular FlowFile though, I'm not sure you can use this approach to know which upstream processor it came from.
Alternatively, you could add an UpdateAttribute after each "previous processor" to set attribute(s) with the information about that processor, but that has to be hardcoded and applied to every corresponding part of the flow.
I faced this some time back. I used InvokeHTTP processor and used nifi-api/process-groups/${process_group_id} Web Service
This is how I implemented:
Identify the process group where the error handling should be done. [Action Group]
Create a new process group [Error Handling Group] next to the Action Group and add relationship to transfer files to Error Handling Group.
Use the InvokeHTTP processor and set HTTP Method to GET
Set Remote URL to http://{nifi-instance}:{port}/nifi-api/process-groups/${action_group_process_group_id}
You will get response in JSON which you will have to customize according to your needs
Please let me know if you need the XML file that I am using. I can share that. It just works fine for me

NiFi: Get all the processors name involved in a particular run

I have a nifi template of 30 processors. There are multiple conditional branches are there in the template. Now, I want to add something at the end of template so that I can get the list of all processors name which has executed for a particular run.
How can do this?
Thanks,
You could technically insert an UpdateAttribute processor after every "operational" processor which would add an attribute containing the most recent processor, but #Bryan is correct that the provenance feature exists to provide this information automatically. If you need to operate on it, you can use the SiteToSiteProvenanceReportingTask to send that data to a Remote Process Group (linked to an Input Port on the same instance) and then treat that data as any other in NiFi and examine/transform it.

Best practices to handle errors in NIFI

I'm using NIFI, and i have data flows where I use the following processos :
ExecuteScript
RouteOnAttribute
FetchMapDistribuedCache
InvokeHTTPRequest
EvaluateJSONPath
and two level process group like NIFI FLOW >>> Process group 1 >>> Process group 2, my question is how to handle errors in this case, I have created output port for each processor to output errors outside the process group and in the NIFI Flow I have done a funnel for each error type and then put all those errors catched in Hbase so i can do some reporting later on, and as you can imagine this add multiples relationships and my simple dataflow start to became less visible.
My questions are, what's the best practices to handle errors in processors, and what's the best approach to do some error reporting using NIFI ( Email or PDF )
It depends on the errors you routinely encounter. Some processors may fail to perform a task (an expected but not desired outcome), and route the failed flowfile to REL_FAILURE, a specific relationship which can be connected to a processor to handle these failures, or back to the same processor to be retried. Others (or the same processors in different scenarios) may encounter exceptions, which are unexpected occurrences which cannot be resolved by the processor.
An example of this is PutKafka vs. EncryptContent. If the remote Kafka system is temporarily unavailable, the processor would fail to send the flowfile content. However, retrying after some delay period could be successful if the remote system is once again available. However, decrypting cipher text with the wrong key will always throw an exception, no matter how many times it is attempted or how long the retry delay is.
Many users route the errors to PutEmail processor and report them to a specific user/group who can evaluate the errors and monitor the data flow if necessary. You can also use "Reporting Tasks" to monitor metrics or ingest provenance data as operational data and route that to email/offline storage, etc. to run analytics on it.

Resources