NiFi - data stuck in queues when load balancing is used - apache-nifi

In Apache NiFi (dockerized, version 1.15), I created a cluster of 3 NiFi nodes. When load balancing is used via the default port 6342, flow files get stuck in some of the queues, specifically the queues on which load balancing is enabled. However, when I try "List queue", the message "The queue has no FlowFiles." is shown:
The part of the NiFi processor group where the issue happens:
Configuration of NiFi queue in which flow files seem to be stuck:
Another problem, maybe not related, is that after this happens, some of the flow files reach the subsequent NiFi processors, but get stuck before the MergeContent processors. This time, the queues can be listed:
The part of code when the second issue occurs:
The configuration of the queue:
The listing of the FlowFiles in the queue:
The MergeContent processor configuration. The parameter "max_num_for_merge_smxs" is set to 100:
Load balancing is used because data are gathered from the SFTP server, and that processor runs only on the Primary node.
If you need more information, please let me know.
Thank you in advance!
Edited:
I put the load-balancing queues between the ConsumeMQTT (running on the Primary node only) and UpdateAttribute processors. Flow files still appear to stay in the load-balancing queue, yet when the queue is listed, the message is "The queue has no FlowFiles.". Please check:
Changed position of the load-balancing queue:
The message that there are no flow files in the queues:
Note that the processors before and after the queue were stopped while doing "List queue".
Edit 2:
I changed the configuration in the nifi.properties to the following:
nifi.cluster.load.balance.connections.per.node=20
nifi.cluster.load.balance.max.thread.count=60
nifi.cluster.load.balance.comms.timeout=30 sec
I also restarted the NiFi containers, so I will monitor the behaviour. For now, there are no stuck flow files in the load-balancing queues; they go on to the processor that follows the queue.

"The queue has no FlowFiles" is normal behaviour of a queue that is feeding into a Merge - the flowfiles are pending to be merged.
The most likely cause of them being "stuck" before a Merge is that you have Round Robin distributed the FlowFiles across many nodes, and then you are setting a Minimum count on the Merge. This minimum is per node and there are not enough FlowFiles on each node to hit the Minimum, so they are stuck waiting for more FlowFiles to trigger the Merge.
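The per-node arithmetic behind this can be sketched as follows. This is only an illustration in plain Python, not NiFi code; the numbers (150 FlowFiles, 3 nodes, a minimum of 100) are hypothetical values chosen to match the setup described in the question.

```python
def round_robin_counts(total_flowfiles: int, nodes: int) -> list[int]:
    """Spread FlowFiles evenly across nodes, as a round-robin queue would."""
    base, extra = divmod(total_flowfiles, nodes)
    return [base + (1 if i < extra else 0) for i in range(nodes)]


def nodes_able_to_merge(total_flowfiles: int, nodes: int, minimum_entries: int) -> int:
    """Count the nodes whose local share reaches the per-node minimum."""
    counts = round_robin_counts(total_flowfiles, nodes)
    return sum(1 for count in counts if count >= minimum_entries)


if __name__ == "__main__":
    # Hypothetical numbers: 150 FlowFiles, 3 nodes, Minimum Number of Entries = 100.
    print(round_robin_counts(150, 3))        # each node's local share
    print(nodes_able_to_merge(150, 3, 100))  # nodes that can trigger a merge
```

With these numbers every node holds 50 FlowFiles, so no node reaches the minimum of 100 and nothing is merged until more data arrives (or a maximum bin age, if configured, expires).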
-- Edit
"The queue has no FlowFiles" is also expected on a queue that is active. In your flow, the load-balancing queue is drained immediately into the output queue of your merge PG's Input port, so there are no FlowFiles sitting around in the load-balancing queue. If you were to STOP the Input ports inside the merge PG, you should be able to list them on the LB queue.
It sounds like you are doing GetSFTP (Primary) and then distributing the files. The better approach would be to use ListSFTP (Primary) -> Load Balance -> FetchSFTP - this would avoid shuffling large files, and would instead load balance the file names between all nodes, with each node then fetching a subset of the files.
Secondly, I would review your Merge config. You have a parameter #{max_num_for_merge_xmsx} defined, but it is set as the Minimum Number of Entries for the Merge, so you are telling Merge to only ever merge once at least #{max_num_for_merge_xmsx} FlowFiles have been reached.

Related

Is there any way to check the current bulk queue size in OpenSearch?

My OpenSearch cluster sometimes returns the error "429 Too Many Requests" when writing data. I know there is a queue, and when the queue is full this error is returned. Is there any API to check the bulk queue status, current size, etc.? Example: queue 150/200 (nearly full)
Yes, you can use the following API call:
GET _cat/thread_pool?v
You will get something like this, where you can see the node name, the thread pool name (look for write), the number of active requests currently being carried out, the number of requests waiting in the queue and finally the number of rejected requests.
node_name name   active queue rejected
node01    search      0     0        0
node01    write       8     2        0
The write thread pool can process as many requests concurrently as 1 + number of CPUs, i.e. that many can be active at the same time. If all active slots are taken and new requests come in, they go directly into the queue (default size 10000). If both active and queue are full, requests start to be rejected.
Your mileage may vary, but when optimizing this, you're looking at:
keeping rejected at 0
minimizing the number of requests in the queue
making sure that active requests get carried out as fast as possible.
Instead of increasing the queue, it's usually preferable to increase the number of CPUs. If you have heavy ingest pipelines kicking in, it's often a good idea to add ingest nodes whose goal is to execute those pipelines, instead of running them on the data nodes.
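To get the "150/200" style figure asked for in the question, the text output of the call above can be parsed. The following is a minimal sketch; it assumes the request also asks for the queue_size column (for example with h=node_name,name,active,queue,queue_size,rejected), that node names contain no spaces, and the SAMPLE text is made up for illustration rather than taken from a real cluster.

```python
def parse_cat_thread_pool(text: str) -> list[dict[str, str]]:
    """Turn whitespace-separated _cat output (with a header row) into dicts."""
    lines = [line for line in text.strip().splitlines() if line.strip()]
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


def write_queue_usage(text: str) -> dict[str, str]:
    """Return 'queued/capacity' for the write pool of every node."""
    usage = {}
    for row in parse_cat_thread_pool(text):
        if row["name"] == "write":
            usage[row["node_name"]] = f'{row["queue"]}/{row["queue_size"]}'
    return usage


# Illustrative sample only (hypothetical values, not real cluster output).
SAMPLE = """\
node_name name   active queue queue_size rejected
node01    search 0      0     1000       0
node01    write  8      150   200        0
"""

if __name__ == "__main__":
    print(write_queue_usage(SAMPLE))
```

Polling this periodically and alerting when the ratio approaches 1 gives an early warning before rejections start.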

Spring Batch - restart behavior upon worker crash

I've been exploring how Spring Batch works in certain failure cases when remote partitioning is used.
Let's say I have 3 worker nodes and 1 manager node. The manager node creates 30 partitions that the workers can pick up. The messaging layer is Kafka.
The workers are up, waiting for work to arrive on the specific topic. The manager node creates the partitions, puts them into the DB and sends the messages on the Kafka topic which has 3 partitions.
All nodes have started the processing but suddenly one node has crashed. The node that has crashed will have the step execution states set to STARTED/STARTING for the partitions it initially has picked up.
Another node will come to the rescue, since the Kafka partitions will get revoked and reassigned, so one of the 2 remaining nodes will read the Kafka partition that the crashed node was reading.
In this case, nothing will happen of course because the original Kafka offset was committed by the crashed node even though the processing hasn't finished. Let's say when partitions get reassigned, I set the consumer back to the topic's beginning - for the partitions it manages.
Awesome, this way the consumer will start consuming messages from the partition of the crashed node.
And here's the catch. Even though some of the step executions that the crashed node processed are already in the COMPLETED state, the new node that took over will reprocess those step executions once more.
This seems strange to me.
Maybe I'm trying to solve this the wrong way, not sure but I appreciate any suggestions how to make the workers fault-tolerant for crashes.
Thanks!
If a StepExecution is marked as COMPLETED in the job repository, it will not be reprocessed. No data will be run again. A new StepExecution may be created (I don't have the code in front of me right now) but when Spring Batch evaluates what to do based on the previous run, it won't process it again. That's a key feature of how Spring Batch's partitioning works. You can send the workers 100 messages to process each partition, but it will only actually get processed once due to the synchronization in the job repository. If you are seeing other behavior, we would need more information (details from your job repository and configuration specifics).
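The idea of "check the shared repository before doing the work" can be illustrated with a toy model. This is not Spring Batch code and does not use its API; the Status enum, the dict standing in for the job repository, and the handle_partition_message function are all hypothetical names, and the sketch ignores the locking a real repository would need.

```python
from enum import Enum


class Status(Enum):
    STARTED = "STARTED"
    COMPLETED = "COMPLETED"


def handle_partition_message(repository: dict, step_execution_id: int, process) -> bool:
    """Process a partition unless the shared repository already marks it COMPLETED.

    Returns True if the partition was processed, False if it was skipped.
    """
    if repository.get(step_execution_id) is Status.COMPLETED:
        return False  # redelivered message for finished work: do nothing
    repository[step_execution_id] = Status.STARTED
    process(step_execution_id)
    repository[step_execution_id] = Status.COMPLETED
    return True


if __name__ == "__main__":
    # Partition 1 was finished by the crashed node; partition 2 was only started.
    repo = {1: Status.COMPLETED, 2: Status.STARTED}
    done = []
    for message in (1, 2, 2):  # messages replayed from the beginning of the topic
        handle_partition_message(repo, message, done.append)
    print(done)
```

In this model, replaying the topic from the beginning only reruns the partition that never reached COMPLETED, however many times its message is redelivered.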

Long duration soak tests in jmeter

JMeter tests are run in master-slave fashion with around 8 slave machines. However, with the remote batching mode set to MODE_STRIPPED_BATCH, I am not able to run tests for more than 64 hours. Throughput is around 450 requests per minute, and each slave machine ends up creating jtl files of around 1.5 GB. All 8 slaves send these to the master (1.5 GB x 8), and the I/O probably becomes too much for the master to handle. The master machine has 16 GB of RAM and around 250 GB of disk storage. I was wondering if the JMeter distributed architecture has any provision to make long-running soak tests possible without unexplained stress on the master machine. Obviously I have the option to abandon the master-slave setup and go for 8 independent nodes; however, in that case I'll run into complications with serving the CSV data files (which I currently serve from the master using the Simple Table Server plugin) and with aggregating the result files. Any suggestions, please? It would be great to be able to run tests for at least 4 days (96 hours or so).
I would suggest to go for an independent JMeter workers + external data collector setup.
Actually, JMeter's out-of-the-box "distributed scaling" abilities are weak, way outdated & overall pretty ridiculous. So are its data collection/aggregation/processing abilities.
This situation actually puzzles me a lot - mind you, rivals are even worse, so there's literally NOTHING in the field (except for, perhaps, some SaaS solutions trying to monetize on this gap).
But it is what it is...
So that's about why-s, now to how-s.
If I were you, I would:
Containerize the JMeter worker
Equip each container with a watchdog to quickly restart the worker if things go south locally (or probably even on schedule to refresh it ultimately). Be that an internal one, or external like cloud services have - doesn't matter.
Set up a timeseries database - I recommend InfluxDB, it's an excellent product & it's free in basic version (which is going to be enough for your purposes).
Flow your test results/metrics into that DB - do not collect them locally! You can do it right from your tests with a pretty simple custom listener (Influx line protocol is ridiculously simple & fast), or you can have an external agent watching the result files as they flow. I just suggest you not use the so-called Backend Listener to do the job - it's garbage, it won't shape your data right, so you'd have to do additional ops to bring them to order.
If you shape your test result/metrics data properly, you'll get 'em already time-synced into a single set - and the further processing options are amazingly powerful!
My expectation is that you're looking for the StrippedAsynch sampler sender mode.
As per the documentation:
Asynch
samples are temporarily stored in a local queue. A separate worker thread sends the samples. This allows the test thread to continue without waiting for the result to be sent back to the client. However, if samples are being created faster than they can be sent, the queue will eventually fill up, and the sampler thread will block until some samples can be drained from the queue. This mode is useful for smoothing out peaks in sample generation. The queue size can be adjusted by setting the JMeter property asynch.batch.queue.size (default 100) on the server node.
StrippedAsynch
remove responseData from successful samples, and use Async sender to send them.
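The behaviour described for the Asynch mode (a bounded local queue drained by a separate sender thread, with the producer blocking once the queue is full) can be modelled in a few lines. This is a generic sketch of that pattern in Python, not JMeter's implementation; run_async_sender and its parameters are hypothetical names, and None is reserved as the end-of-stream marker.

```python
import queue
import threading


def run_async_sender(samples, queue_size: int) -> list:
    """Push samples through a bounded queue drained by a separate sender thread."""
    local_queue = queue.Queue(maxsize=queue_size)
    received = []  # stands in for the results arriving at the client

    def sender():
        while True:
            sample = local_queue.get()
            if sample is None:  # end-of-stream marker
                break
            received.append(sample)

    worker = threading.Thread(target=sender)
    worker.start()
    for sample in samples:
        # put() blocks while the queue is full, which is the point where the
        # sampler thread would stall if sending cannot keep up.
        local_queue.put(sample)
    local_queue.put(None)
    worker.join()
    return received


if __name__ == "__main__":
    results = run_async_sender(range(1000), queue_size=100)
    print(len(results))
```

A larger queue absorbs longer bursts before the producer blocks, at the cost of holding more samples in memory.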
So on each slave node, add the following line to the user.properties file:
mode=StrippedAsynch
and on the master node define asynch.batch.queue.size. Set it high enough that it does not slow down JMeter's throughput, and low enough that it does not overwhelm the master. I would start with 1000.
Another option is using StrippedDiskStore but you will have to manually collect serialized results after test completion (make sure that slave processes will not shut down because the results will be deleted when slave process finishes)
You could use JMeter PerfMon Plugin to monitor memory and network usage on master and slaves.

IIB Collector Node and transactions

I am using a Collector Node in my message flow. It is configured to collect 50 messages or wait for 30 seconds. Under load testing, WebSphere MQ sometimes reports that a long-running transaction has been detected, and the pid corresponds to the pid of the application's execution group. The question is: is it possible that the Collector Node does not commit its internal transaction while waiting for the messages or for the timeout to expire?
The MQInput node is where the transactionality is specified. This is described in the IIB v10 KC page Developing integration solutions > Developing message flows > Message flow behavior > Changing message flow behavior > Configuring transactionality for message flows > Configuring MQ nodes for transactions
If you set the property to Yes (the default option): if a transaction is not already inflight, the node starts a transaction.
The Collector Node does not commit until it times out or reaches the count. See the IIB v10 KC page Reference > Message flow development > Built-in nodes > Collector node
All input messages that are received under sync point from a transaction or thread by the Collector node are stored in internal queues. Storing the input messages under sync point ensures that the messages remain in a consistent state for the outgoing thread to process; such messages are available only at the end of the transaction or thread that propagates the input messages.
A new transaction is created when a message collection is complete, and is propagated to the next node.
Whenever you configure a node (one that is eligible according to the IBM documentation) to work under a transaction, it does not commit until the unit of work is complete. In your case, since 50 messages (if they arrive within 30 seconds) are requested in one unit of work, the message flow containing the Collector node and all other nodes in that flow commits only once all 50 messages have been processed successfully. During this period, the queue manager has to keep the in-flight state in its logs, which is why the log space may need to be increased. So any large unit of work causes this issue, irrespective of the node used.
Since your issue deals with MQ long running transaction, ensure you have enough MQ log space for transaction handling by the queue manager.
To increase the MQ log space, open the file at the path below and increase the number of primary and secondary log files:
==> IBM\WebSphere MQ\qmgrs\QMNAME\qm.ini
Below are the settings that you have to increase. By default the values are 3 and 2. Ensure you have enough disk space for whatever values you choose. Restart your queue manager once the qm.ini file has been updated.
Log:
LogPrimaryFiles=3
LogSecondaryFiles=2
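As a rough guide to how much disk the log settings imply, the total can be estimated from the number of files and the size of each file. The sketch below assumes MQ log pages of 4 KB and takes the pages-per-file value (the LogFilePages setting in the same stanza) as an explicit input; the value 4096 used in the example is an illustrative assumption, so check the actual value in your own qm.ini.

```python
PAGE_SIZE_BYTES = 4 * 1024  # assumption: MQ log data is held in 4 KB pages


def log_space_bytes(primary_files: int, secondary_files: int, log_file_pages: int) -> int:
    """Estimate the maximum disk space used by primary plus secondary log files."""
    return (primary_files + secondary_files) * log_file_pages * PAGE_SIZE_BYTES


if __name__ == "__main__":
    # Defaults from the stanza above (3 and 2); 4096 pages per file is illustrative.
    total = log_space_bytes(3, 2, 4096)
    print(total // (1024 * 1024), "MiB")
```

Rerunning the calculation with the values you plan to set shows whether the disk has enough room before you restart the queue manager.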
Link to the MQ configuration documentation:
https://www.ibm.com/support/knowledgecenter/en/SSFKSJ_9.0.0/com.ibm.mq.con.doc/q018710_.htm
Hope this helps.

HBase operations hang when nodes go offline

I noticed that operations like Put hang forever if nodes go offline (e.g. after a server crash).
Here are the relevant logs from the client:
(AsyncProcess.java:1777) - Left over 1 task(s) are processed on
server(s): [s1.mycompany.com,16020,1519065917510,
s2.mycompany.com,16020,1519065918510,
s3.mycompany.com,16020,1519065917410]
(AsyncProcess.java:1785) - Regions against which left over task(s) are processed: [...]
In my case, s2 and s3 went offline. (p.s. ~50 nodes in cluster)
Shouldn't this problem be handled by HBase? E.g. if region servers go offline, their regions are reassigned to other servers and puts change their destination?
Since HBase is fault tolerant, this problem should not happen.

Resources