From this Stack Overflow question I understand that batches are sent out one at a time (leaving pipelining out of this discussion), meaning a second batch won't be sent until the first one is delivered.
My follow-up question is: what condition starts the batch creation process? If I understand correctly (I could obviously be wrong), a batch is created/cut, i.e. the batch creation process completes, when BATCHSZ is reached, BATCHLIM is reached, BATCHINT (if non-zero) is reached, or the XMIT-Q is empty. But what starts a batch creation process? Is batch creation synchronous or asynchronous with respect to batch transfer? Does it start only after the previous batch is delivered (synchronous), or is it completely decoupled from the previous batch (e.g. it can run while the previous batch is still in transfer)?
This is a sibling/follow-up question to 1. The intention is to estimate the upper limit of our QRepl-MQ transfer. As documented in the entry "[added on Dec.20]" in the first (self-)answer to 1, our observation seems to support that the batch creation process starts synchronously AFTER the previous batch transfer is complete, but I couldn't find IBM references documenting the details.
Thanks for your help.
our observation seems to support that the batch creation process starts synchronously AFTER the previous batch transfer is complete, but I couldn't find IBM references documenting the details.
Yes, that is how it works. If a 2nd batch started before the 1st batch finished, you would have newer messages jumping in front of older messages, which could cause all kinds of issues.
Yes, I know, applications are not supposed to rely on messages arriving in a logical order (i.e. 1, 2, 3, etc.), but they do.
Think of the MCA (Message Channel Agent), which is the process getting messages from the XMITQ, as a security guard at a store on Black Friday. He lets in 50 people from the line (a batch). After many people leave the store, he lets another 50 people into the store. Would you want ASYNC batching of the line at the store? Absolutely not. The security guard wants order, not chaos.
The same is true for MQ's MCA. It creates a batch of "n" messages, sends them, acknowledges them, then moves on to the next batch.
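That sequential cycle can be illustrated with a toy model (plain Java, not IBM MQ code; the batch size and message contents are made up). The point is that a new batch is only cut after the previous one completes its send/acknowledge cycle, so message order is preserved:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Simplified model of the MCA's sequential batch cycle (not IBM MQ source).
public class McaModel {
    static final int BATCHSZ = 50; // hypothetical batch size limit

    public static List<List<Integer>> drain(Queue<Integer> xmitq) {
        List<List<Integer>> delivered = new ArrayList<>();
        while (!xmitq.isEmpty()) {
            // 1. cut a batch: up to BATCHSZ messages, or until the queue empties
            List<Integer> batch = new ArrayList<>();
            while (batch.size() < BATCHSZ && !xmitq.isEmpty()) {
                batch.add(xmitq.poll());
            }
            // 2. send the batch and 3. wait for its acknowledgement
            //    (modelled here as the batch simply completing)
            delivered.add(batch);
            // only now does the loop go back and cut the next batch
        }
        return delivered;
    }

    public static void main(String[] args) {
        Queue<Integer> q = new ArrayDeque<>();
        for (int i = 1; i <= 120; i++) q.add(i);
        List<List<Integer>> batches = drain(q);
        System.out.println(batches.size() + " batches, last size "
                + batches.get(batches.size() - 1).size());
    }
}
```

With BATCHSZ 50 and 120 queued messages the model yields three batches of 50, 50 and 20, and message 120 is always delivered last; no later message ever jumps ahead of an earlier one.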
Related
I have multiple batches that use different 3rd-party APIs to fetch and store/update data. The connections are made via Laravel's HTTP client. All batches have about 6k jobs. Because all jobs are important, I need to log the failed ones and notify the user.
Sometimes the response returns an error for all jobs; sometimes it's just a connection error, or an error because the server can't process those requests.
The batch automatically cancels on the first failure. But is there a way to cancel the batch on multiple failures (on the nth failure), not just the first?
First turn off normal batch error handling, then implement your own:
Initialize a counter with zero.
Whenever an error occurs, increase that counter.
Whenever that counter reaches/exceeds 5, fail the batch.
The concrete implementation depends on the batch system you are working with.
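A minimal framework-agnostic sketch of that counter logic (plain Java; the job list and threshold are hypothetical, and in Laravel you would hook the counter into the batch's failure callback instead):

```java
import java.util.List;

// Toy sketch: run jobs, tolerate failures up to a threshold, then cancel.
public class FailureThreshold {

    // Runs the jobs in order and returns the number of failures seen.
    // Skips the remaining jobs once maxFailures is reached.
    public static int runBatch(List<Runnable> jobs, int maxFailures) {
        int failures = 0;                       // 1. initialize a counter with zero
        for (Runnable job : jobs) {
            try {
                job.run();
            } catch (RuntimeException e) {
                failures++;                     // 2. increase it on every error
                // log the failure / notify the user here
                if (failures >= maxFailures) {  // 3. nth failure: cancel the batch
                    break;
                }
            }
        }
        return failures;
    }
}
```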
I have a batch job that reads hundreds of images from an SFTP location, encodes them into Base64, and uploads them via an API using the HTTP connector.
I would like to make the process run quicker, so I'm trying to split the payload in two via scatter-gather, sending payload1 to one batch job in a subflow and payload2 to another batch job in another subflow.
Is this the right approach?
Or is it possible to split the load in just one batch process, ie for one half of the payload to be processed by batch step 1 and second half will be processed by batch step 2 at the same time?
Thank you
No, it is not a good approach. Batch jobs are always executed asynchronously (i.e. using different threads), so there is no benefit in using scatter-gather, and it has the downside of increasing resource usage.
Splitting the payload across different batch steps doesn't make sense either. You should not try to scale by adding steps.
Batch jobs are naturally meant to work in parallel by iterating over an input. The job may be able to handle the splitting itself, or you can split the input payload manually beforehand; then let the batch job handle the concurrency automatically. There are some configurations you can use to tune it, like block size.
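For illustration, a sketch of the tuning knobs on a Mule 4 batch job (attribute values here are made up; tune them against your actual load). `blockSize` controls how many records each thread takes per block, and `maxConcurrency` caps the number of concurrent threads:

```xml
<!-- Hypothetical example: a single batch job iterating the whole payload.
     Records are queued and processed in parallel; no scatter-gather needed. -->
<batch:job jobName="imageUploadBatch" blockSize="50" maxConcurrency="4">
    <batch:process-records>
        <batch:step name="encodeAndUpload">
            <!-- per-record processing (Base64 encode + HTTP upload) goes here -->
        </batch:step>
    </batch:process-records>
</batch:job>
```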
This question already has answers here:
How can you restart a failed spring batch job and let it pick up where it left off?
(3 answers)
Closed 1 year ago.
After reading a lot about restarting in Spring Batch, I've learnt that:
Spring Batch can restart a failed job from the beginning of the step where it failed.
Example: Job1 -> step1, step2, step3 (FAIL) -> then you can restart from step3
I would like a different behaviour, but I didn't find any solution that fits me.
I have a job with a single step.
This step reads a text file (which can have a lot of lines).
I want to protect our system in case of a non-expected ending (for example, our server shutting down abruptly).
In that case, if we have read X lines, I want to resume the job from line X+1 to the end.
Is this possible to achieve?
Thanks in advance.
IMO, if the job stops abruptly at line X and you want to restart at line X+1, that will only happen if the chunk size is 1. With a chunk size of 1, every processed item is committed, so the execution context knows exactly where the job failed and where to restart from.
When your chunk size is greater than 1, say 8, and your job stops abruptly while item 4 is being processed, then the first 3 items processed in that chunk have not been committed to the job execution tables at the time of failure either, and on restart the context starts again from the first item of the same chunk. In this case you will process the first 3 items again!
This can be avoided if you enable graceful shutdown of the Spring Batch job when a kernel interruption happens, so that the current chunk is completely processed before the process ends. This will minimize the number of incidents, but there would still be cases, e.g. when your server's power plug is pulled and the program doesn't get a chance to shut down gracefully.
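The restart arithmetic described above can be sketched as a toy model (plain Java, not Spring Batch itself; item indices are 0-based and the numbers are illustrative):

```java
// Toy model of chunk-oriented restart: progress is committed only at chunk
// boundaries, so a restarted job resumes from the last committed boundary and
// re-processes any items of the interrupted chunk that ran before the crash.
public class ChunkRestartModel {

    // Index the restarted job resumes from, given the item being processed
    // when the crash happened (0-based) and the chunk size.
    public static int restartIndex(int crashAtItem, int chunkSize) {
        return (crashAtItem / chunkSize) * chunkSize; // last committed boundary
    }

    // How many already-processed items get processed a second time.
    public static int reprocessedItems(int crashAtItem, int chunkSize) {
        return crashAtItem - restartIndex(crashAtItem, chunkSize);
    }
}
```

With a chunk size of 8 and a crash while the 4th item (index 3) is in flight, the job resumes at index 0 and re-processes 3 items; with a chunk size of 1 the resume index always equals the failed item, so nothing is repeated.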
Suggestions:
If the process is idempotent, there is no trouble if some items are re-processed.
If the process is not idempotent, some kind of de-duplication check can probably be placed in a CompositeItemProcessor.
For graceful shutdown of the job, I have written some PoC code. You can see it here - https://github.com/innovationchef/batchpay/blob/master/src/main/java/com/innovationchef/batchcommons/JobManager.java
I have a component which uses Spring Integration's file support to kick off a process each time a file arrives at a location. I can see from the log files that two threads/processes/instances are running. Is there a way to limit it to one?
The second process/thread appears to kick off almost immediately after the first, and they interfere with each other. The first instance processes the file, but then the second tries to do the same and hits a FileNotFoundException because the first one has moved it.
First of all, consider configuring the poller for your file inbound channel adapter with fixedDelay instead of fixedRate. This way, the next polling task does not start until the previous one has finished.
Also consider using a filter so the same file is not processed again. Not sure what your use case is, but a simple AcceptOnceFileListFilter should be enough. There is a prevent-duplicates option on the channel adapter for convenience.
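Both suggestions can be combined in one place; a sketch using the Spring Integration Java DSL (the directory path, delay, and bean names are hypothetical; exact DSL class names vary slightly between versions):

```java
import java.io.File;

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.integration.dsl.IntegrationFlow;
import org.springframework.integration.dsl.IntegrationFlows;
import org.springframework.integration.dsl.Pollers;
import org.springframework.integration.file.dsl.Files;

@Configuration
public class FilePollingConfig {

    @Bean
    public IntegrationFlow singleThreadedFileFlow() {
        return IntegrationFlows
                // preventDuplicates(true) installs an AcceptOnceFileListFilter,
                // so each file is picked up only once
                .from(Files.inboundAdapter(new File("/data/inbound"))
                           .preventDuplicates(true),
                      // fixedDelay: the next poll is scheduled only after the
                      // previous polling task completes, so polls never overlap
                      e -> e.poller(Pollers.fixedDelay(5000)))
                .handle(message -> {
                    // process the file here; runs one poll at a time
                })
                .get();
    }
}
```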
See more info in the Reference Manual: https://docs.spring.io/spring-integration/docs/current/reference/html/#files
And also about the poller behavior: https://docs.spring.io/spring-integration/docs/current/reference/html/#channel-adapter-namespace-inbound
I have a CustomReceiver which receives a single event (a String). The received event is used during the Spark application's run time to read data from NoSQL and to apply transformations. When the processing time for each batch was observed to be greater than the batch interval, I set this property:
spark.streaming.backpressure.enabled=true
After that, I expected the CustomReceiver not to trigger and receive the event while a batch takes longer than the batch window, which didn't happen, and a backlog of batches was still being added. Am I missing something here?
Try checking this and this article.
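For what it's worth, a sketch of the related settings (the property names are real Spark Streaming ones; the values are illustrative). Note that backpressure only adapts the ingestion rate through a rate controller; it does not pause a custom receiver outright, which matches the behaviour observed in the question. A hard cap bounds the worst case:

```properties
# spark-defaults.conf sketch; values are illustrative
spark.streaming.backpressure.enabled      true
# starting rate (records/sec) before the rate controller has feedback
spark.streaming.backpressure.initialRate  100
# hard upper bound on records/sec per receiver
spark.streaming.receiver.maxRate          100
```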