Spring Batch Restart - Not picking up correct row - spring

I have a spring batch job that has the following attributes:
commit-interval: 25
skip-limit: 3
In my integration tests, I have injected a fake writer that throws the skippable exception; the writer in turn is injected with a list of ids that cause the exception to be thrown.
In the setup of my test I create 135 rows, and I configure rows "9", "11", "44", "51" and "70" to be the ones that cause the ItemWriter to throw the exception.
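For context, here is a rough Java sketch (using the pre-5.0 Spring Batch builder API) of how such a step and fake writer could be wired; the bean, item type and exception names are assumptions for illustration, not the actual test code.

import java.util.List;
import java.util.Set;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class JobConfig {

    // Step with the attributes described above: chunk size 25, skip limit 3
    @Bean
    public Step importStep(StepBuilderFactory steps, ItemReader<Row> reader, FailingWriter writer) {
        return steps.get("importStep")
                .<Row, Row>chunk(25)               // commit-interval: 25
                .reader(reader)
                .writer(writer)
                .faultTolerant()
                .skipLimit(3)                      // skip-limit: 3
                .skip(SkippableException.class)    // the exception the fake writer throws
                .build();
    }
}

// Fake writer that throws the skippable exception for a configured set of row ids
// (Row and SkippableException are hypothetical; the exception is assumed to extend RuntimeException)
class FailingWriter implements ItemWriter<Row> {

    private final Set<Long> failingIds;

    FailingWriter(Set<Long> failingIds) {
        this.failingIds = failingIds;
    }

    @Override
    public void write(List<? extends Row> items) {
        for (Row row : items) {
            if (failingIds.contains(row.getId())) {
                throw new SkippableException("configured to fail for row " + row.getId());
            }
        }
    }
}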
All works well on the first run: as expected, the job fails after the 3 commits of 50, on row 51, or rather when "something" in the writer detects a skippable exception that has now exceeded the limit of 3. I have also asserted that 9, 11 and 44 are registered in the skip listener, which I would expect.
I realise that the batch job has not individually wrapped the items in transactions before it fails, like it did for 9, 11 and 44, because it already knows that the skip limit has been reached.
However, when I restart the job, the starting row is 74, not 51 as I would expect.
So rows 51 to 73 are skipped?
I cannot figure out why it would skip the chunk that failed completely.
Any help would be appreciated.
David.

The fix for this bug will be included in the next release of Spring Batch.
https://jira.springsource.org/browse/BATCH-2122

Related

ENGINE-16004 Exception while closing command context: Cannot correlate message 'xxxxxx':

Out of 100 orders, 1 or 2 fail with this message.
184 and 258 are callbacks; we receive the callbacks, but Camunda is unable to close the task. We noticed that this happens whenever the two callbacks arrive within milliseconds of each other. If the 258 callback comes before 184, we do close the 184 task, so ordering is not the issue. Please don't tell me to check the message name in the BPMN, because it works for almost all of the orders. Thank you.
ENGINE-16004 Exception while closing command context: Cannot correlate message 'order-m1-i0184': No process definition or execution matches the parameters
org.camunda.bpm.engine.MismatchingMessageCorrelationException: Cannot correlate message 'order-m1-i0184': No process definition or execution matches the parameters
Camunda BPMN: the task is not closed after we receive the callback.
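For reference, a minimal sketch (with assumed names, not the actual code) of how such a callback is typically correlated via the Camunda Java API, and where the MismatchingMessageCorrelationException comes from:

import org.camunda.bpm.engine.MismatchingMessageCorrelationException;
import org.camunda.bpm.engine.RuntimeService;

public class OrderCallbackHandler {

    private final RuntimeService runtimeService;

    public OrderCallbackHandler(RuntimeService runtimeService) {
        this.runtimeService = runtimeService;
    }

    // Correlate an incoming callback (e.g. "order-m1-i0184") to the process
    // instance waiting for it. The exception below is thrown when no execution
    // is currently waiting at a matching message event for these parameters,
    // which can happen when two callbacks arrive milliseconds apart and the
    // first one's wait state has not been committed yet.
    public void onCallback(String messageName, String orderBusinessKey) {
        try {
            runtimeService.createMessageCorrelation(messageName)
                    .processInstanceBusinessKey(orderBusinessKey)   // assumed correlation key
                    .correlate();
        } catch (MismatchingMessageCorrelationException e) {
            // retrying after a short delay is one common mitigation
            throw e;
        }
    }
}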

SpringBoot/MongoDB returns error code 251

We use Spring Boot 3.0.2 (which in turn uses the mongo driver 4.8.2). We use MongoDB 6.0.4, replica set size 1.
We use the reactive stack and the transactional operator to demarcate the transactions like this:
Mono.just(initialState)
    .flatMap(queryDatabase)
    .flatMap(createMongoDocument1InCollectionA)
    .flatMap(updateMongoDocument2InCollectionB)
    .as(transactionalOperator::transactional)
    .retryWhen(retrySpec)
We use read and write concern majority (even if it is not relevant with a replica set size 1). All other settings, e.g. session synchronization, are default.
If we run this code in parallel in multiple threads it often (about 20%) fails with the following message:
Command failed with error 251 (NoSuchTransaction): 'Given transaction number 8 does not match any in-progress transactions. The active transaction number is 7' on server localhost:49633. The full response is {"errorLabels": ["TransientTransactionError"], "ok": 0.0, "errmsg": "Given transaction number 8 does not match any in-progress transactions. The active transaction number is 7", "code": 251, "codeName": "NoSuchTransaction", "$clusterTime": {"clusterTime": {"$timestamp": {"t": 1675160699, "i": 11}}, "signature": {"hash": {"$binary": {"base64": "75m2LsbMsyLuFwtSHZPInaFs4Lo=", "subType": "00"}}, "keyId": 7194760344735055879}}, "operationTime": {"$timestamp": {"t": 1675160699, "i": 11}}}
The retry normally helps here, but not always (when the max attempts count is exceeded). We did not observe this error when just one thread executes a single request. We also noticed that if the requests from the parallel threads do not try to modify the same Mongo document, this error occurs much less often (maybe about 5%).
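For illustration, a minimal sketch of the kind of retrySpec used here; the attempt count and backoff are assumptions, and it retries only errors carrying the TransientTransactionError label, which is why the retry usually, but not always, recovers from error 251:

import java.time.Duration;
import com.mongodb.MongoException;
import reactor.util.retry.Retry;

// Assumed values for illustration: up to 5 attempts with exponential backoff,
// retrying only Mongo errors labelled as transient transaction errors.
Retry retrySpec = Retry.backoff(5, Duration.ofMillis(50))
        .filter(ex -> ex instanceof MongoException
                && ((MongoException) ex).hasErrorLabel(MongoException.TRANSIENT_TRANSACTION_ERROR_LABEL));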
We observed the same error occurring even more often (about 60%) with Spring Boot 2.7.6 (mongo driver 4.6.1) and MongoDB 6.0.4 (or MongoDB 4.2.0). The error message was:
Query failed with error code 251 and error message 'Given transaction number 1 does not match any in-progress transactions. The active transaction number is -1' on server localhost:49372;
This issue would cause significant disruption in a high-concurrency production environment.
Any help explaining and fixing the issue would be much appreciated.

While using Instaloader via command line, how can I force 429 errors to cause requests to be retried after a longer period of time?

I am using Instaloader via command line on Windows 11, with the following command:
.\instaloader --login=MYUSERNAME :saved --dirname-pattern="Saved_Posts\{profile}" --filename-pattern="{profile}-{shortcode}" --no-resume --no-metadata-json --slide 1 --no-captions --no-video-thumbnails --no-iphone
This attempts to download approximately 12,000 saved posts from a profile. Instaloader behaves as expected for several thousand posts, occasionally giving the following error:
Too many queries in the last time. Need to wait 15 seconds, until 13:19.
The process then resumes successfully for several hundred more posts. Eventually, however, I start encountering 429 errors:
JSON Query to graphql/query: 429 Too Many Requests [retrying; skip with ^C]
Number of requests within last 10/11/20/22/30/60 minutes grouped by type:
d6f4427fbe92d846298cf93df0b937d3: 0 0 0 0 0 0
f883d95537fbcd400f466f63d42bd8a1: 0 0 0 1 1 11
* 2b0673e0dc4580674a88d426fe00ea90: 59 64 121 134 191 709
Instagram responded with HTTP error "429 - Too Many Requests". Please
do not run multiple instances of Instaloader in parallel or within
short sequence. Also, do not use any Instagram App while Instaloader
is running.
The request will be retried in 7 seconds, at 14:01.
This error then repeats over and over again, I believe until the default maximum connection attempts limit is reached and it moves on to the next post, which also receives the same error. Importantly, this error does not go away after several hours of these 'slower' requests being made; it seems to persist as long as Instaloader stays open. I have seen these 429 errors with very few requests in the last 60 minutes (i.e. <100), which makes me think I am hitting quite a long shadowban.
I have tried setting the maximum connection attempts to 0 (i.e. retry indefinitely), but this time limit appears to be capped at 666 seconds, or 11 minutes. The error does not seem to clear even leaving Instaloader to send requests every 11 minutes in this way; it is as though each individual request 'resets' the ban or something.
I am looking for a way of resolving this issue, which could include:
Adding a command to force 429 errors to be retried after subsequently longer periods of time (instead of the number of seconds being capped at 666 seconds)
Adding a command that 'preserves' wait times after each 429 error, e.g. if downloading Post 456 fails and retries after 5, then 10, then 15 seconds before successfully downloading, and downloading Post 457 then immediately fails, start the wait for a retry on Post 457 at LEAST at 15 seconds rather than going back to 5 (see the sketch after this list)
Avoiding the 429 errors in the first place, if there appears to be an issue with my command line prompt
Breaking down the request into 'batches' and running one of those prompts every few days. e.g. is there a way to download Saved Posts 1-500, then 500-1000, and so on? (The Saved Posts are not necessarily in chronological order of the post date, which is what I've tried so far)
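To make the second point above concrete, here is a small Java sketch of the behaviour I am describing; it is not an Instaloader option, the numbers are only assumptions, and it simply shows a wait time that is preserved and escalated across consecutive 429s instead of being reset for each post:

import java.time.Duration;

// Illustration only, not an Instaloader feature: a backoff whose state is kept
// across posts, so a 429 on the next post does not reset the wait to the minimum.
class EscalatingBackoff {

    private static final Duration MIN = Duration.ofSeconds(5);   // assumed starting wait
    private static final Duration MAX = Duration.ofMinutes(60);  // assumed upper bound
    private Duration current = MIN;

    // Called after every 429: return the wait to apply now, then double it for next time.
    Duration onTooManyRequests() {
        Duration wait = current;
        Duration doubled = current.multipliedBy(2);
        current = doubled.compareTo(MAX) > 0 ? MAX : doubled;
        return wait;
    }

    // Called after a successful download: shrink slowly instead of resetting.
    void onSuccess() {
        Duration halved = current.dividedBy(2);
        current = halved.compareTo(MIN) < 0 ? MIN : halved;
    }
}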
I have looked at several other posts on 429 errors but the general theme seems to be either:
Wait some time for the issue to clear — have tried this for up to 48 hours, but running the command again starts from post #1 and never makes it to the latter half of posts
Disable iPhone API requests — already done, which helps but does not solve the issue
The 429 errors simply should not be encountered during normal behaviour – well, they are!

KafkaConsumer poll() behavior understanding

Trying to understand (I am new to Kafka) how the poll event loop in Kafka works.
Use case: 25 records on the topic, max poll size set to 5.
max.poll.interval.ms = 5000 // 5 seconds
max.poll.records = 5
Sequence of tasks
Poll the records from the topic.
Process the records in a for loop.
Some processing logic where the record would either pass or fail.
If the logic passes, its offset will be added to a map.
Then it will be committed using a commitSync call.
If it fails, the loop breaks and whatever succeeded before this point is committed. The problem starts after this.
The next poll just keeps moving in batches of 5 even after the error. Is that expected?
What we basically expect is that the loop breaks, the offsets up to the last successfully processed message get committed, and the next poll continues from the failed message.
Example: in the first batch 5 messages are polled, offsets 1 and 2 succeed and are committed, then the 3rd fails. The poll call nevertheless keeps moving to the next batches (5-10, 10-15). If there are errors in between, we expect it to stop at that point: the next poll should start from offset 3 in the first case, or, if it fails at offset 8 in the second batch, from offset 8, not from the next batch of max.poll.records (5 in this case). If it matters, this is a Spring Boot project and enable.auto.commit is false.
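For clarity, a rough Java sketch (not the actual project code) of one iteration of the loop described above, using a plain KafkaConsumer with enable.auto.commit=false; process() and the consumer setup are assumptions:

import java.time.Duration;
import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

// One poll iteration: process in order, remember offsets of successes,
// stop at the first failure, commit only what succeeded.
static void pollOnce(KafkaConsumer<String, String> consumer) {
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
    Map<TopicPartition, OffsetAndMetadata> toCommit = new HashMap<>();
    for (ConsumerRecord<String, String> record : records) {
        try {
            process(record); // hypothetical business logic that passes or fails
            // the committed offset is the position of the *next* record to read
            toCommit.put(new TopicPartition(record.topic(), record.partition()),
                    new OffsetAndMetadata(record.offset() + 1));
        } catch (Exception e) {
            break; // stop here; without a seek, the next poll() still moves on
        }
    }
    if (!toCommit.isEmpty()) {
        consumer.commitSync(toCommit);
    }
}

Committing only the successful offsets is not enough on its own: as the answer below explains, the consumer's fetch position has already advanced, so the next poll() returns the following batch unless you seek() back to the failed offset or restart the consumer.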
I have tried finding this in the documentation, but with no luck.
I also tried tweaking max.poll.interval.ms, with no luck.
EDIT: I have not accepted the answer because there is no direct solution for a custom consumer. Keeping this for informational purposes.
max.poll.interval.ms is in milliseconds, not seconds, so it should be 5000.
Once the records have been returned by the poll (and offsets not committed), they won't be returned again unless you restart the consumer or perform seek() operations on the consumer to reset the offset to the unprocessed ones.
The Spring for Apache Kafka project provides a SeekToCurrentErrorHandler to perform this task for you.
If you are using the consumer yourself (which it sounds like), you must do the seeks.
You can manually seek back to the beginning offset of the poll for all the assigned partitions on failure. I am not sure how to do this when using the Spring consumer.
Sample code for seeking the offset back to the beginning of the poll with a plain consumer.
In the code below I get the list of records per partition and then take the offset of the first record to seek to.
import scala.jdk.CollectionConverters._ // on Scala 2.12 and earlier use scala.collection.JavaConverters._

// Rewind every partition in this poll back to the offset of its first record
def seekBack(records: ConsumerRecords[String, String]): Unit =
  records.partitions().asScala.foreach { partition =>
    val partitionedRecords = records.records(partition)
    val offset = partitionedRecords.get(0).offset()
    consumer.seek(partition, offset)
  }
One caveat: doing this blindly in production is bad, since you don't want to seek back all the time, only when you have a transient error; otherwise you will end up retrying infinitely.

WebSphere FFDC Count, what does it mean?

I am trying to understand what the "Count" column means in a WebSphere FFDC exception log. IBM told us we have received this error 6835 times. I have not found a good guide which explains what this count shows, but from what I have seen it seems to be the number of times this exception has happened since the last JVM restart. The problem is that this does not match up with our logs, as this error appears to be thrown only once per day with our daily restarts, which I can see in SystemOut.log. Also, this count does not seem to change over a week's time in the exception logs. Can anyone help?
Index  Count  Time of last Occurrence    Exception                   SourceId                                                ProbeId
------ ------ -------------------------- --------------------------- ------------------------------------------------------- -------
21     6835   11/19/11 7:00:17:631 UTC   java.util.zip.ZipException  com.ibm.ws.classloader.ClassLoaderUtils.addDependents    238
This website may be helpful: http://wasdynacache.blogspot.com/2011_07_01_archive.html (Section entitled "First Failure Data Capture for your enterprise application with WebSphere Application Server")
FFDC is capturing the state of the application server and/or the
application when an unexpected error or exception occurs the first
time. All subsequent iterations of the same error/exception are
ignored.
I tried seeing if I could find information for you specifically about count, but all FFDC logs can be formatted differently depending on which application calls it and which formatter the call uses. Good luck.
