How can i see the Content of Flowfiles in queue via API in NiFi - apache-nifi

I have a use case, where i have to see the content of each FlowFiles by calling the API.
API i have explored so far :
https://<URL>/nifi-api/flowfile-queues/<Queue-ID>/flowfiles/<FlowFile-UUID>/content
I got the response as below :
"The id of the node in the cluster is required."
Note : My NiFi is setting up in three node cluster.
Please suggest.

*Hi Everyone, From below API, You'll get the content of flowfile which present in queue.
https://<URL>/nifi-api/flowfile-queues/<id>/flowfiles/<flowfile-uuid>/content?clusterNodeId=<clusterNodeId>

Related

which elastic search node should I send rest calls to?

if I want to edit my index settings for example I want to change the number of replicas then I would have to send a request with the number of replicas through rest end point. my question is which node should I send the request to ? to any node or specific node in the cluster ?
you can send a request to any node in the cluster, and the request will be appropriately routed to where it needs to be actioned
in this case, the request will be sent to the master node, the settings updated, and then a response sent back to the original node and onto the client

NiFi DistributedMapCacheServer and flowfiles on different nodes

Hi I am using NiFi DistributedMapCacheServer to keep track of processed files in my flow. The issue is that we are working in a cluster and to leverage it we are using load balancing in queues so Flowfiles are not on the same node. Once they are arriving to Put/GetDistributedMapCache that is using DistributedMapCacheClient with fixed name of one of the hosts it only works when arriving Flowfile is on the same node as the one specified in DistributedMapCacheClient- for others we are getting:
FetchDistributedMapCache[id=d4713096-5ae5-1cb4-b777-202948e39e50] Unable to communicate with cache when processing StandardFlowFileRecord[uuid=5b1e8092-5bc5-4213-97a3-fa023691973f,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1587393798960-14, container=default, section=14], offset=983015, length=5996],offset=0,name=bf15d684-4100-4aa5-9fb5-fa0ddb21b140,size=5996] due to No route to host: java.net.NoRouteToHostException: No route to host
Is there any way to set up DMC server/client to work in such case, or can I somehow route all flowfiles to explicitly given node?
This means the hostname/ip-address that you specified in the DistributedMapCacheClient for the location of the server is unreachable by the other nodes in your cluster. Your nodes must be able to communicate since you have a cluster, so you just need to set this to the correct value.

Apache Nifi - Flowfiles are stuck in queue

The flow files are stuck in the queue(Load balance by attribute) and are not read by the next downstream processor(MergeRecord with CSVReader and CSVRecordSetWriter). From the Nifi UI, it appears that flow files are in the queue but when tried to list queue it says "Queue has no flow files". Attempting to empty queue also gives the exact message. Nifi Logs doesn't have any exceptions related to the processor. There are around 80 flow files in queue.
I have tried below action items but all in vain:
Restarting the downstream and upstream(ConvertRecord) processor.
Disabled and enabled CSVReader and CSVRecordSetWriter.
Disabled load balancing.
Flow file expiration set to 3 sec.
Screenshot:
Flowfile:
MergeRecord properties:
CSVReader Service:
CSVRecordSetWriter:
Your merge record processor is running only on the primary node, and likely all the files are on other nodes (since you are load balancing). NiFi is not aware enough to notice that the downstream processor is only running on the primary, so it does not automatically rebalance everything to the primary node. Simply changing the MergeRecord to run on all nodes will allow the files to pass through.
Alas, I have not found a way to get all flow files back on the primary node, you can use the "Single Node" load balance strategy to get all the files on the same node, but it will not necessarily be the primary.
This is probably because the content of the flowfile was deleted. However, the entry of it is still present in the flow-file registry.
if you have a dockerized nifi setup and if you dont have a heavy production flow, you can stop your nifi flow and delete everything in the _*repository folders (flowfile-repository, content repository etc)
(provided you have all you directories mounted and no other data loss is at risk)
Let me know if you need further assistance
You have a miss configuration in the way you load balance your FlowFiles. To check that stop your MergeRecord processor to be able to check and view what's inside your queue.
In the modal window displayed you can check where are your flowfiles waiting, it's highly probable that your FlowFiles are in fact on one of the other node(s) but since the MergeRecord is running on the primary node then it has nothing in its Queue.

NiFi - Choose Queue to execute

Suppose you have an ExecuteScript processor in a NiFi flow.
This processor has 2 incoming queues.
Is there a way to choose from which Queue session.get() will pull the flowfile?
Thanks.
There's no direct way via the API to identify which queue a flow file is coming from. However you can try this:
Add an UpdateAttribute to each upstream flow before ExecuteScript. For each branch, add the same attribute with a different value, say "queue.name" = "A" for one and "queue.name" = "B" for the other
In ExecuteScript you can pass a FlowFileFilter to session.get(), to fetch flow file(s) whose queue.name attribute is "A" or "B". Note that you may get an empty list, and if you need at least one flow file to continue, you can just return if the list is empty.

Access to queue attributes?

I have a number of GenerateTableFetch processors that send Flowfiles to a downstream UpdateAttributes processor. From the UpdateAttributes, the Flowfile is passed to an ExecuteSQL processor:
Is there any way to add an attribute to a flow file coming off a queue with the position of that Flowfile in the queue? For example, After I reset/clear the state for a GenerateTableFetch, I would like to know if this is the first batch of Flowfiles coming from GenerateTableFetch. I can see the position of the FlowFile in the queue, but it would nice is there's a way that I could add that as an attribute that is passed downstream. Is this possible?
This is not an available feature in Apache NiFi. The position of a flowfile in a queue is dynamic, and will change as flowfiles are removed from the queue, either by downstream processing or by flowfile expiration.
If you are simply trying to determine if the queue was empty before a specific flowfile was added, your best solution at this time is probably to use an ExecuteScript processor to get the desired connection via the REST API, then use FlowFileQueue#isActiveQueueEmpty() to determine if the specified queue is currently empty, and add a boolean attribute to the flowfile indicating it is the "first of a batch" or whatever logic you want to apply.
"Batches" aren't really a NiFi concept. Is there a specific action you want to take with the "first" flowfile? Perhaps there is other logic (i.e. the ExecuteSQL processor hasn't operated on a flowfile in x seconds, etc.) that could trigger your desired behavior.

Resources