Is there a way to limit the number of File Handler instances? - spring

I have a component which uses Spring Integration's file support to kick off a process each time a file arrives at a location. I can see from the log files that two threads/processes/instances are running. Is there a way to limit it to one?
The second process/thread appears to kick off almost immediately after the first, and they interfere with each other: the first instance processes the file, but then the second tries to do the same and hits a FileNotFoundException because the first one has already moved it.

First of all, consider configuring the poller for your file inbound channel adapter with fixedDelay instead of fixedRate. That way the next polling task does not start until the previous one has finished.
Also consider using a filter so the same file is not processed again. I'm not sure what your use case is, but the simple AcceptOnceFileListFilter should be enough. There is a prevent-duplicates option on the channel adapter for convenience. A minimal sketch of both settings is shown after the links below.
See more info in the Reference Manual: https://docs.spring.io/spring-integration/docs/current/reference/html/#files
And also about the poller behavior: https://docs.spring.io/spring-integration/docs/current/reference/html/#channel-adapter-namespace-inbound
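For illustration, here is a minimal Java DSL sketch of both suggestions (a fixedDelay poller plus duplicate prevention), assuming Spring Integration 5.x and a placeholder /path/to/input directory:

```java
import java.io.File;

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.integration.dsl.IntegrationFlow;
import org.springframework.integration.dsl.IntegrationFlows;
import org.springframework.integration.dsl.Pollers;
import org.springframework.integration.file.dsl.Files;

@Configuration
public class FilePollingConfig {

    @Bean
    public IntegrationFlow filePollingFlow() {
        return IntegrationFlows
                // Poll the (placeholder) input directory.
                .from(Files.inboundAdapter(new File("/path/to/input"))
                                // AcceptOnceFileListFilter under the covers:
                                // each file is emitted only once.
                                .preventDuplicates(true),
                        // fixedDelay: the next poll starts only after the
                        // previous polling task has finished.
                        e -> e.poller(Pollers.fixedDelay(5000)))
                .handle(message -> {
                    // Replace with your actual processing; the payload is a java.io.File.
                    System.out.println("Processing " + message.getPayload());
                })
                .get();
    }
}
```

The XML equivalent is the prevent-duplicates attribute on the inbound channel adapter plus a poller child element with fixed-delay.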

Related

kg.apc.jmeter.functions.FifoTimeout does not work if added to user.properties in JMeter

I am using the Inter-Thread Communication plugin to share data between two thread groups:
TG-1: generates an ID -> stores it in the queue named Q1
TG-2: picks an ID from the queue -> does the processing
After the run duration of TG-1 completes, it stops generating and storing IDs in Q1. TG-2 processes all the data in the queue and then keeps waiting for new data in Q1; however, Q1 will never have any more data. My expectation was that when the run duration of TG-2 completed, TG-2 would also finish its job and exit. Why does TG-2 keep waiting for data in Q1? This exhausts the heap space and the test never stops, which is a serious issue.
To prevent this, I tried adding kg.apc.jmeter.functions.FifoTimeout=120 to the user.properties file, as suggested by Dmitri T in one of my previous questions on the same topic. However, this property is not taking effect. Has anybody else experienced the same thing with this plugin? What is the alternative?
We are not telepathic enough to guess what your setup is, which exact components of the Inter-Thread Communication Plugin you're using, or how they're configured.
If you're using the Functions, the timeout works fine for the __fifoPop() function; just make sure to restart JMeter after amending the property. __fifoGet() will simply return an empty value if the queue is empty.
If you're using the jp@gc - Inter-Thread Communication PreProcessor, there is a possibility to specify the timeout directly in the GUI.
Also, it is always possible to stop the test via the Flow Control Action sampler.
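For reference, a minimal user.properties entry matching the question (the value is in seconds, as far as I know, and JMeter only reads it at startup):

```
# user.properties - only picked up after a JMeter restart
kg.apc.jmeter.functions.FifoTimeout=120
```

With that in place, ${__fifoPop(Q1)} should give up after the timeout instead of blocking forever, while ${__fifoGet(Q1)} never blocks at all.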

How can I cancel the whole batch, not on the first failure but after the Nth failure?

I have multiple batches that use different third-party APIs to fetch and store/update data. The connections are made via Laravel's HTTP client, and each batch has about 6k jobs. Because all jobs are important, I need to log the failed ones and notify the user.
Sometimes the response returns an error for all jobs; sometimes it is just a connection error, or an error because the server can't process those requests.
The batch automatically cancels on the first failure. But is there a way to cancel the batch only after multiple failures (on the Nth failure), not just the first?
First turn off normal batch error handling, then implement your own:
Initialize a counter with zero.
Whenever an error occurs, increase that counter.
Whenever that counter reaches/exceeds 5, fail the batch.
The concrete implementation depends on the batch system you are working with; a framework-agnostic sketch follows.
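Here is a minimal, framework-agnostic Java sketch of those three steps. The threshold constant and the cancelBatch callback are illustrative placeholders for whatever your batch system provides (in Laravel you would combine allowFailures() with a counter like this):

```java
import java.util.concurrent.atomic.AtomicInteger;

// Cancel-after-Nth-failure sketch: a thread-safe counter shared by all jobs
// in the batch. FAILURE_THRESHOLD and cancelBatch are illustrative names.
public class NthFailureCanceller {

    private static final int FAILURE_THRESHOLD = 5;

    private final AtomicInteger failures = new AtomicInteger(0);

    /** Call from each job's error handler; cancels the batch on the Nth failure. */
    public void onJobFailed(Runnable cancelBatch) {
        // incrementAndGet is atomic, so concurrent jobs cannot under-count,
        // and the equality check ensures the cancellation runs exactly once.
        if (failures.incrementAndGet() == FAILURE_THRESHOLD) {
            cancelBatch.run();
        }
    }
}
```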

Spring batch file resume after Server failure

I have configured Spring Boot Batch to process a fixed-length flat file. I read and split the columns using FlatFileItemReader with a FixedLengthTokenizer, and write the data into the database using an ItemWriter backed by a JPA repository.
I have a scenario where the server crashes or is stopped while the file is being processed. At that point half of the file has been processed (meaning half of the data has been written into the DB). When the next job runs (once the server is up again), the file has to start from where it stopped.
For example, given a file with 1000 lines where the server shut down after processing 500 rows, the next job has to start from row 501.
I googled for a solution but found nothing relevant. Any help appreciated.
As far as I know, what you are asking (restart at the chunk level) doesn't automatically exist in the Spring Batch API and is something the programmer has to implement on their own.
Spring Batch provides a job restart feature via JobOperator.restart. This is a job-level restart: a new execution id is created for the next run and the whole job reruns, because there are other concerns. For example, somebody may have put in a new file, or renamed an existing file to replace the old one; how would the batch know that it is the same input file content-wise, or that the DB hasn't changed since the last run?
Due to these concerns, it's imperative that the programmer handles these situations via custom code.
The second concern is that when there is a server failure, the job status will still be STARTED and not FAILED, since the crash happens all of a sudden and the framework couldn't update the status correctly.
You need to implement the following steps:
1. Implement custom logic to decide whether the last job execution was successful or a restart is needed.
2. If a restart is needed, mark the previous job execution as FAILED and then use JobOperator.restart(long executionId). For a non-partitioned job, the only useful effect is that the job status is correctly marked as FAILED, but the whole job will restart from the beginning.
There are many scenarios, e.g.:
a) the job status is STARTED but all steps are marked COMPLETED, etc.
b) for a partitioned job, a few steps are completed, a few failed and a few are still started, etc.
3. If a restart is not needed, launch a new job using JobLauncher.run.
So with the above steps you see that a real chunk-level job restart is not achieved, but these steps are the primary things you first need to understand and implement.
Next would be changing your input at job restart, i.e. you devise a mechanism to mark input records as processed for processed chunks (read, processed and written) and a way to know which input records have not been processed; then in the next job run you feed a modified input containing only the records that are still unprocessed. So it is all going to be custom logic specific to your use case.
I am not aware of any built-in mechanism in the framework itself to achieve this. To me, a job restart is a brand-new job execution with modified/reduced input.
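As an illustration of step 2 above, here is a minimal sketch, assuming Spring Batch 4.x with JobExplorer, JobRepository and JobOperator beans wired in; the job name "flatFileJob" is a placeholder for your actual job:

```java
import java.util.Date;
import java.util.List;

import org.springframework.batch.core.BatchStatus;
import org.springframework.batch.core.ExitStatus;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.JobInstance;
import org.springframework.batch.core.explore.JobExplorer;
import org.springframework.batch.core.launch.JobOperator;
import org.springframework.batch.core.repository.JobRepository;

public class CrashRecovery {

    private final JobExplorer jobExplorer;
    private final JobRepository jobRepository;
    private final JobOperator jobOperator;

    public CrashRecovery(JobExplorer jobExplorer, JobRepository jobRepository,
                         JobOperator jobOperator) {
        this.jobExplorer = jobExplorer;
        this.jobRepository = jobRepository;
        this.jobOperator = jobOperator;
    }

    public void restartIfCrashed() throws Exception {
        // Look at the most recent instance of the (placeholder) job.
        List<JobInstance> instances = jobExplorer.getJobInstances("flatFileJob", 0, 1);
        if (instances.isEmpty()) {
            return; // nothing ever ran
        }
        for (JobExecution execution : jobExplorer.getJobExecutions(instances.get(0))) {
            if (execution.getStatus() == BatchStatus.STARTED) {
                // The server died mid-run: the framework never got to update
                // the status, so fix it up before asking for a restart.
                execution.setStatus(BatchStatus.FAILED);
                execution.setExitStatus(ExitStatus.FAILED);
                execution.setEndTime(new Date());
                jobRepository.update(execution);
                // Note: step executions stuck in STARTED may need the same
                // treatment before restart(..) will accept the execution.
                jobOperator.restart(execution.getId());
            }
        }
    }
}
```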

Batch transfer rate upper bound in a channel - batch creation start trigger

From this Stack Overflow question I understand that batches are sent out one at a time (leaving pipelining out of this discussion), meaning a second batch won't be sent until the first one is delivered.
My follow-up question is: what condition starts a batch creation process? If I understand correctly (I could obviously be wrong...), a batch is created/cut (let's say the batch creation process is completed) when BATCHSZ is reached, or BATCHLIM is reached, or BATCHINT (if non-zero) is reached, or the XMITQ is empty. But what starts the batch creation process? Is batch creation synchronous or asynchronous with respect to batch transfer? Does it start only after the previous batch is delivered (synchronous), or is it totally decoupled from the previous batch (e.g. it runs while the previous batch is still in transfer)?
This is a sibling/follow-up question to 1. The intention is to estimate our QRepl-MQ-transfer upper limit. As documented in the entry "[added on Dec.20]" in the first (self-)answer in 1, our observation seems to support that the batch creation process starts synchronously AFTER the previous batch transfer is complete, but I couldn't find IBM references documenting the details.
Thanks for your help.
"our observation seems to support that the batch creation process starts synchronously AFTER the previous batch transfer is complete, but I couldn't find IBM references documenting the details."
Yes, that is how it works. If a second batch started before the first batch finished, you would have newer messages jumping in front of older messages, which could cause all kinds of issues.
Yes, I know, applications are not supposed to rely on messages arriving in a logical order (i.e. 1, 2, 3, etc.), but they do.
Think of the MCA (Message Channel Agent), the process getting messages from the XMITQ, as a security guard at a store on Black Friday. He lets 50 people in from the line (a batch). After enough people leave the store, he lets another 50 people into the store. Would you want ASYNC batching of the line at the store? Absolutely not. The security guard wants order, not chaos.
The same is true for MQ's MCA. It creates a batch of "n" messages, sends them, acknowledges them, then goes on to the next batch.
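To make the sequencing concrete, here is a toy Java sketch (not IBM MQ code; all names are illustrative) of that strictly synchronous loop:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

public class McaBatchingSketch {

    static final int BATCH_SIZE = 50; // stands in for BATCHSZ

    public static void main(String[] args) {
        // Stand-in for the transmission queue (XMITQ).
        Queue<String> xmitq = new ArrayDeque<>(
                List.of("msg1", "msg2", "msg3", "msg4", "msg5"));

        // Strictly sequential: cut a batch, send it, wait for the ack,
        // and only then start cutting the next batch.
        while (!xmitq.isEmpty()) {
            List<String> batch = new ArrayList<>();
            // The batch is also "cut" early when the queue drains
            // (analogous to the empty-XMITQ condition in the question).
            while (batch.size() < BATCH_SIZE && !xmitq.isEmpty()) {
                batch.add(xmitq.poll());
            }
            sendAndAwaitAck(batch); // blocks; no second batch is formed meanwhile
        }
    }

    static void sendAndAwaitAck(List<String> batch) {
        System.out.println("Sent and acknowledged a batch of " + batch.size());
    }
}
```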

How to empty all the queues in Nifi at a time?

I have started exploring NiFi. I have built a flow which is working, but I want to clear all the queues at once so I can test the flow each time I make a change. I know we can stop and start each processor and test step by step, but is there a way to clear all the queues at once?
The easiest way is to stop NiFi, delete the following folders, and start it again:
content_repository
database_repository
flowfile_repository
provenance_repository
Another approach is to use the nifi-api to get the list of all queues (connections) and then call the drop-request endpoint to empty each of them; a sketch follows.
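A rough sketch of that second approach, assuming an unsecured NiFi at http://localhost:8080 and Java 11+: list the root group's connections, then file a drop request for each queue. A real implementation would parse the JSON properly and recurse into nested process groups; the regex here just keeps the sketch dependency-free.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EmptyAllQueues {

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        String base = "http://localhost:8080/nifi-api";

        // 1. List all connections in the root process group.
        HttpResponse<String> connections = client.send(
                HttpRequest.newBuilder(URI.create(base + "/process-groups/root/connections"))
                        .GET().build(),
                HttpResponse.BodyHandlers.ofString());

        // 2. Naively pull every UUID-looking "id" out of the response and ask
        //    NiFi to drop that queue's contents. Ids that are not queues
        //    (e.g. source/destination components) will simply get a 404 back.
        Set<String> seen = new HashSet<>();
        Matcher m = Pattern.compile("\"id\"\\s*:\\s*\"([0-9a-f-]{36})\"")
                .matcher(connections.body());
        while (m.find()) {
            String id = m.group(1);
            if (seen.add(id)) {
                client.send(HttpRequest.newBuilder(
                                URI.create(base + "/flowfile-queues/" + id + "/drop-requests"))
                                .POST(HttpRequest.BodyPublishers.noBody()).build(),
                        HttpResponse.BodyHandlers.ofString());
                System.out.println("Requested drop for " + id);
            }
        }
    }
}
```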
