How do I get notified when my cache replication is finished? - caching

I have a scenario where I start an Ignite cache node (node1) with cache mode CacheMode.REPLICATED first. Another cache node (node2) with the same cache name will be started later to synchronize data from node1. My question is: how do I get notified when the cache synchronization from node1 to node2 is finished?

You can use synchronous rebalancing for this: https://apacheignite.readme.io/docs/rebalancing#rebalance-modes
If the rebalance mode is SYNC, the new node will not complete its start process until the data is fully replicated.
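A minimal sketch of that configuration in Java (cache and variable names are just examples, not from the question):

// Configure a replicated cache with synchronous rebalancing. With SYNC mode,
// the Ignition.start() call on node2 does not return until the cache data has
// been fully rebalanced from node1, so "startup finished" means "data is in sync".
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.CacheMode;
import org.apache.ignite.cache.CacheRebalanceMode;
import org.apache.ignite.configuration.CacheConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class ReplicatedCacheNode {
    public static void main(String[] args) {
        CacheConfiguration<Integer, String> cacheCfg = new CacheConfiguration<>("myReplicatedCache");
        cacheCfg.setCacheMode(CacheMode.REPLICATED);
        cacheCfg.setRebalanceMode(CacheRebalanceMode.SYNC); // block startup until rebalancing finishes

        IgniteConfiguration cfg = new IgniteConfiguration();
        cfg.setCacheConfiguration(cacheCfg);

        // On node2 this call returns only after the replicated cache is fully synchronized.
        Ignite ignite = Ignition.start(cfg);
        System.out.println("Node started, entries visible: " + ignite.cache("myReplicatedCache").size());
    }
}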

Related

Implementing Non-blocking background cache refresh

In my Spring Boot application, the cache is currently refreshed on a miss. The cache key is a requestKey+versionId mapping, so when the versionId changes the cache starts missing; on a miss, a get query is performed first and the cache is then updated. This extra call and cache update adds latency.
We need a mechanism where, instead of waiting for a cache miss to update the cache, we schedule periodic updates of the cache in the background to reduce the latency caused by cache misses.
This time the cache key will only be requestKey.
I was thinking of creating a separate process or thread that periodically checks whether the version has been updated and refreshes the cache if it has changed. This process can run independently of the main thread and not impact requests.
My question is: is this the best way to achieve this?
Which cache library can I use? It needs to be thread safe.
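One way to get a non-blocking background refresh (a sketch assuming Caffeine as the cache library; the question does not name one) is refreshAfterWrite: readers keep getting the current value while the entry is reloaded asynchronously, so request threads never pay the refresh latency.

import com.github.benmanes.caffeine.cache.Caffeine;
import com.github.benmanes.caffeine.cache.LoadingCache;
import java.util.concurrent.TimeUnit;

public class BackgroundRefreshCache {
    private final LoadingCache<String, String> cache = Caffeine.newBuilder()
            .maximumSize(10_000)
            // After 5 minutes an entry becomes eligible for refresh; the next read
            // triggers an asynchronous reload while the stale value is still served.
            .refreshAfterWrite(5, TimeUnit.MINUTES)
            .build(this::loadFromBackend);

    public String get(String requestKey) {
        return cache.get(requestKey);
    }

    // Placeholder for the real "get query" mentioned in the question, keyed only by requestKey.
    private String loadFromBackend(String requestKey) {
        return "value-for-" + requestKey;
    }
}

Caffeine caches are thread safe; Spring also provides a CaffeineCacheManager if you want to keep using the Spring cache abstraction on top of it.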

Time-sensitive Work Node Disaster Recovery and Status Synchronization

My project has one main node and one worker node; the worker node executes jobs fetched from the main node (most jobs are time-sensitive, like sending an email after three hours).
We expect the worker node to be stateless: it registers with the main node and fetches jobs to do, and when a job is done it sends a finish signal. The main node checks the worker node's health every once in a while; if the worker node has died, it sets that node's jobs back to unfetched.
The point is that we chose HTTP to connect the main and worker nodes, so the main node only gets information when the worker node sends a KEEPALIVE request. This causes some confusion around disaster recovery and status synchronization.
I wanted to know whether this is good practice and what the best way to do it would be.
Thanks in advance.
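A minimal sketch of one common approach on the main node side: treat each KEEPALIVE as renewing a lease, and periodically return the jobs of expired workers to the unfetched state. All class, field and method names here are hypothetical, not from any framework.

import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class WorkerLeaseTracker {
    private static final Duration LEASE_TIMEOUT = Duration.ofMinutes(2); // example value

    private final Map<String, Instant> lastKeepAlive = new ConcurrentHashMap<>();
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

    // Called by the HTTP handler whenever a worker sends a KEEPALIVE request.
    public void onKeepAlive(String workerId) {
        lastKeepAlive.put(workerId, Instant.now());
    }

    // Periodically find workers whose lease has expired and give their jobs back.
    public void start() {
        scheduler.scheduleAtFixedRate(this::expireDeadWorkers, 30, 30, TimeUnit.SECONDS);
    }

    private void expireDeadWorkers() {
        Instant cutoff = Instant.now().minus(LEASE_TIMEOUT);
        lastKeepAlive.forEach((workerId, lastSeen) -> {
            if (lastSeen.isBefore(cutoff)) {
                lastKeepAlive.remove(workerId);
                requeueJobsOf(workerId); // hypothetical: set this worker's jobs back to "unfetched"
            }
        });
    }

    private void requeueJobsOf(String workerId) {
        // persistence-specific; left as a placeholder
    }
}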

Spring Batch - restart behavior upon worker crash

I've been exploring how Spring Batch works in certain failure cases when remote partitioning is used.
Let's say I have 3 worker nodes and 1 manager node. The manager node creates 30 partitions that the workers can pick up. The messaging layer is Kafka.
The workers are up, waiting for work to arrive on the specific topic. The manager node creates the partitions, puts them into the DB and sends the messages on the Kafka topic which has 3 partitions.
All nodes have started processing, but suddenly one node crashes. The crashed node will have its step execution states left at STARTED/STARTING for the partitions it initially picked up.
Another node will come to the rescue: since the Kafka partitions get revoked and reassigned, one of the 2 remaining nodes will read the partition the crashed node had been reading.
In this case, nothing will happen of course because the original Kafka offset was committed by the crashed node even though the processing hasn't finished. Let's say when partitions get reassigned, I set the consumer back to the topic's beginning - for the partitions it manages.
Awesome, this way the consumer will start consuming messages from the partition of the crashed node.
And here's the catch. Even though some of the step executions that the crashed node processed ended up in the COMPLETED state, the new node that took over will reprocess those step executions once more, even though they were already finished by the crashed node.
This seems strange to me.
Maybe I'm trying to solve this the wrong way, not sure but I appreciate any suggestions how to make the workers fault-tolerant for crashes.
Thanks!
If a StepExecution is marked as COMPLETED in the job repository, it will not be reprocessed. No data will be run again. A new StepExecution may be created (I don't have the code in front of me right now) but when Spring Batch evaluates what to do based on the previous run, it won't process it again. That's a key feature of how Spring Batch's partitioning works. You can send the workers 100 messages to process each partition, but it will only actually get processed once due to the synchronization in the job repository. If you are seeing other behavior, we would need more information (details from your job repository and configuration specifics).
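If it helps to verify what the job repository actually recorded for the crashed worker, here is a small sketch using Spring Batch's JobExplorer (the job name is an assumption) that dumps the step execution statuses of the latest run; partitions already marked COMPLETED are the ones that will not be processed again.

import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.JobInstance;
import org.springframework.batch.core.StepExecution;
import org.springframework.batch.core.explore.JobExplorer;

public class StepStatusInspector {
    private final JobExplorer jobExplorer;

    public StepStatusInspector(JobExplorer jobExplorer) {
        this.jobExplorer = jobExplorer;
    }

    // Print the status of every step execution of the most recent instance of the given job.
    public void printStepStatuses(String jobName) {
        for (JobInstance instance : jobExplorer.getJobInstances(jobName, 0, 1)) {
            for (JobExecution execution : jobExplorer.getJobExecutions(instance)) {
                for (StepExecution step : execution.getStepExecutions()) {
                    System.out.printf("%s -> %s%n", step.getStepName(), step.getStatus());
                }
            }
        }
    }
}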

NiFi - data stuck in queues when load balancing is used

In Apache NiFi (dockerized version 1.15), a cluster of 3 NiFi nodes is created. When load balancing is used via the default port 6342, flow files get stuck in some of the queues, specifically in the queue on which load balancing is enabled. However, when "List queue" is tried, the message "The queue has no FlowFiles." is shown:
The part of the NiFi processor group where the issue happens:
Configuration of NiFi queue in which flow files seem to be stuck:
Another problem, maybe not related, is that after this happens, some of the flow files reach the subsequent NiFi processors, but get stuck before the MergeContent processors. This time, the queues can be listed:
The part of the flow where the second issue occurs:
The configuration of the queue:
The listing of the FlowFiles in the queue:
The MergeContent processor configuration. The parameter "max_num_for_merge_smxs" is set to 100:
Load balancing is used because data are gathered from the SFTP server, and that processor runs only on the Primary node.
If you need more information, please let me know.
Thank you in advance!
Edited:
I put the load-balancing queues between the ConsumeMQTT (working on the Primary node only) and UpdateAttribute processors. Flow files seem to stay in the load-balancing queue, but when the listing is done, the message is "The queue has no FlowFiles.". Please check:
Changed position of the load-balancing queue:
The message that there are no flow files in the queues:
Take notice that the processors before and after the queue are stopped while doing "List queue".
Edit 2:
I changed the configuration in the nifi.properties to the following:
nifi.cluster.load.balance.connections.per.node=20
nifi.cluster.load.balance.max.thread.count=60
nifi.cluster.load.balance.comms.timeout=30 sec
I also restarted the NiFi containers, so I will monitor the behaviour. For now, there are no stuck flow files in the load-balancing queues; they go on to the processor that follows the queue.
"The queue has no FlowFiles" is normal behaviour of a queue that is feeding into a Merge - the flowfiles are pending to be merged.
The most likely cause of them being "stuck" before a Merge is that you have Round Robin distributed the FlowFiles across many nodes, and then you are setting a Minimum count on the Merge. This minimum is per node and there are not enough FlowFiles on each node to hit the Minimum, so they are stuck waiting for more FlowFiles to trigger the Merge.
-- Edit
"The queue has no FlowFiles" is also expected on a queue that is active - in your flow, the load balancing queue is drained immediately into the output queue of your merge PGs Input port - so there are no FFs sitting around in the load balancing queue. If you were to STOP the Input ports inside the merge PG, you should be able to list them on the LB queue.
It sounds like you are doing GetSFTP (Primary) and then distributing the files. The better approach would be to use ListSFTP (Primary) -> Load Balance -> FetchSFTP - this would avoid shuffling large files, and would instead load balance the file names between all nodes, with each node then fetching a subset of the files.
Secondly, I would review your Merge config - you have a parameter #{max_num_for_merge_xmsx} defined, but it is set as the Minimum Number of Entries for the Merge - so you are telling Merge to only ever merge once at least #{max_num_for_merge_xmsx} FlowFiles have accumulated.

How to set up a graceful cluster restart and control how shards are allocated in Elasticsearch 6.4.3

I want to set up the restart job to be a graceful restart where a node will not begin to restart until the previous node is back up and operational.
Only once the first node is back up and working will the second node restart, and so on through all the nodes.
Also, I want to control how soon shards start to be reallocated if we lose a node. I need to set it to 5 minutes.
Any suggestions?
This process is documented here
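For the 5-minute reallocation delay specifically, the setting in Elasticsearch 6.x is index.unassigned.node_left.delayed_timeout, and the documented rolling-restart procedure also disables shard allocation (cluster.routing.allocation.enable) while each node is down. A sketch of applying the delay to all indices (the endpoint and index pattern are shown as an example):

# Delay reallocation of shards from a departed node by 5 minutes.
curl -X PUT "localhost:9200/_all/_settings" -H 'Content-Type: application/json' -d'
{
  "settings": {
    "index.unassigned.node_left.delayed_timeout": "5m"
  }
}'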
