Spring Batch - conditional step flow for chunk model

I have two steps where step 2 should be skipped if the step 1 processor doesn't return any items after filtering.
I see that ItemListenerSupport can be extended and its afterProcess method can be used:
@Override
public void afterProcess(NumberInfo item, Integer result) {
    super.afterProcess(item, result);
    if (item.isPositive()) {
        stepExecution.setExitStatus(new ExitStatus(NOTIFY));
    }
}
My processing is chunk-based, and I want to set the exit status after all chunks are processed, if any item was left unfiltered. I am currently adding the items left unfiltered to the ExecutionContext and using them in the next step.
How would I prevent the next step from running if all the items of all chunks are filtered out?

For programmatic decisions, you can use a JobExecutionDecider. This API gives you access to the StepExecution of the previous step, so you can base the decision to run the next step on any information from the last step execution and its execution context. In your case, it could be the filter count or anything meaningful to your decision that you store in the execution context upfront.
You can find more details about this API and some code examples in the Programmatic Flow Decisions section of the reference documentation.
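Below is a minimal sketch of such a decider, assuming the decision can be taken from the step's filter count; the status names and the flow wiring are illustrative, not something prescribed by the question.

import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.StepExecution;
import org.springframework.batch.core.job.flow.FlowExecutionStatus;
import org.springframework.batch.core.job.flow.JobExecutionDecider;

public class FilterCountDecider implements JobExecutionDecider {

    @Override
    public FlowExecutionStatus decide(JobExecution jobExecution, StepExecution stepExecution) {
        // readCount - filterCount = number of items that survived the processor
        // across all chunks of the previous step.
        long survivors = stepExecution.getReadCount() - stepExecution.getFilterCount();
        return survivors > 0 ? new FlowExecutionStatus("NOTIFY")
                             : new FlowExecutionStatus("SKIP");
    }
}

In the job definition the decider then sits between the two steps, along the lines of .start(step1()).next(decider()).on("NOTIFY").to(step2()).from(decider()).on("SKIP").end().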

Related

Breaking a possible infinite loop in AWS step functions

I am writing a state machine with the following functionality.
Start state -> Lambda1, which calls an external service's Describe API endpoint to get the state attribute of an item, for example "isOkay" or "isNotOkay" -> Choice state: depending on the state received, if "isOkay" move to the next state, and if "isNotOkay" call Lambda1 again. This repeats until it gets an "isOkay" state. How can I put a limit on this custom retry loop so that I don't get stuck if I never receive an "isOkay" response?
You can pass a counter as part of the step input and have the Lambda increment it on each invocation. When the counter comes back around the loop, it can be checked against a limit; once the limit is exceeded, fail the Lambda with a custom exception and define a separate state to handle that exception.
https://docs.aws.amazon.com/step-functions/latest/dg/input-output-inputpath-params.html
https://docs.aws.amazon.com/step-functions/latest/dg/concepts-error-handling.html
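A minimal sketch of that idea as a Java Lambda handler, assuming the state input is a JSON object carrying a "retryCount" field; the field name, limit and exception are illustrative, not part of the original question.

import java.util.HashMap;
import java.util.Map;

import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;

public class DescribeItemHandler implements RequestHandler<Map<String, Object>, Map<String, Object>> {

    private static final int MAX_RETRIES = 10; // illustrative limit

    @Override
    public Map<String, Object> handleRequest(Map<String, Object> input, Context context) {
        int retryCount = input.get("retryCount") == null
                ? 0
                : ((Number) input.get("retryCount")).intValue();

        if (retryCount >= MAX_RETRIES) {
            // A Catch on this error in the state machine routes to a failure-handling state.
            throw new IllegalStateException("RetryLimitExceeded");
        }

        Map<String, Object> output = new HashMap<>(input);
        output.put("retryCount", retryCount + 1);
        output.put("itemState", describeItem(input)); // call the external Describe API here
        return output;
    }

    // Placeholder for the external service call from the question.
    private String describeItem(Map<String, Object> input) {
        return "isNotOkay";
    }
}

The Choice state can then branch on both the returned item state and the retry count, so the loop ends either with an "isOkay" response or with the retry limit being hit.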

Is it possible to lock some entries in MongoDB and run a query that does not take the locked records into account?

I have a MongoDB that contains a list of "tasks" and two instances of executors. These 2 executors have to read a task from the DB, save it in the state "IN_EXECUTION", and execute the task. Of course I do not want my 2 executors to execute the same task, and this is my problem.
I use a transactional query. This way, when an executor tries to change the state of the task it gets a "write exception" and has to start again and read a new task. The problem with this approach is that sometimes an executor gets a lot of errors before it can save the task state change correctly and execute a new task. So it is as if I had only one executor.
Note:
- I do not want to block my entire DB on read/write because that would slow down the entire process.
- I think it is necessary to save the state of the task because it could be a long task.
I asked whether it is possible to lock only certain records and execute a query on the "not-locked" records, but any advice that solves my problem will be really appreciated.
Thanks in advance.
EDIT 1:
Sorry, I simplified the concept in the question above. Actually I extract n messages that I have to send. I have to send these messages in blocks of 100, so my executors split the extracted messages into blocks of 100 and pass them to other executors.
Each executor extracts the messages and then updates them with the new state. I hope this is clearer now.
@Transactional(readOnly = false, propagation = Propagation.REQUIRED)
public List<PushMessageDB> assignPendingMessages(int limitQuery, boolean sortByClientPriority,
        LocalDateTime now, String senderId) {
    final List<PushMessageDB> messages = repositoryMessage.findByNotSendendAndSpecificError(limitQuery, sortByClientPriority, now);
    long count = repositoryMessage.updateStateAndSenderId(messages, senderId, MessageState.IN_EXECUTION);
    return messages;
}
DB update:
public long updateStateAndSenderId(List<String> ids, String senderId, MessageState messageState) {
    Query query = new Query(Criteria.where(INTERNAL_ID).in(ids));
    Update update = new Update().set(MESSAGE_STATE, messageState).set(SENDER_ID, senderId);
    return mongoTemplate.updateMulti(query, update, PushMessageDB.class).getModifiedCount();
}
You will have to do the locking one-by-one.
Trying to lock 100 records at once and at the same time have a second process also lock 100 records (without any coordination between the two) will almost certainly result in an overlapping set unless you have a huge selection of available records.
Depending on your application, having all work done by one thread (and the other being just a "hot standby") may also be acceptable as long as that single worker does not get overloaded.
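One way to do that one-by-one claiming with Spring Data MongoDB is an atomic findAndModify, which matches and updates a single document in one server-side operation. The sketch below mirrors the field names from the question's code, but those names and the PENDING state are assumptions.

public PushMessageDB claimNextPendingMessage(String senderId) {
    Query query = new Query(Criteria.where("messageState").is(MessageState.PENDING))
            .with(Sort.by(Sort.Direction.ASC, "clientPriority"));

    Update update = new Update()
            .set("messageState", MessageState.IN_EXECUTION)
            .set("senderId", senderId);

    // The match and the update happen atomically on the server, so two
    // executors can never claim the same document; returnNew(true) hands
    // back the document as claimed by this executor.
    return mongoTemplate.findAndModify(
            query,
            update,
            FindAndModifyOptions.options().returnNew(true),
            PushMessageDB.class);
}

Calling this in a loop until 100 messages have been claimed (or nothing is returned) gives each executor a disjoint block without any transaction retries.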

Spring batch Read/write in the same table

I have a Spring Batch application which reads from and writes to the same table. I use pagination for reading the items from the table as my data volume is quite high. When I set the chunk size to more than 1, the page number gets advanced incorrectly and the step fails to read some items from the table.
Any idea?
@Bean
public Step fooStep1() {
    return stepBuilderFactory.get("step1")
            .<foo, foo>chunk(chunkSize)
            .reader(fooTableReader())
            .writer(fooTableWriter())
            .listener(fooStepListener())
            .listener(chunkListener())
            .build();
}
Reader
@Bean
@StepScope
public ItemReader<foo> fooBatchReader(){
    NonSortingRepositoryItemReader<foo> reader = new NonSortingRepositoryItemReader<>();
    reader.setRepository(service.getRepository());
    reader.setPageSize(chunkSize);
    reader.setMethodName("findAllByStatusCode");
    List<Object> arguments = new ArrayList<>();
    arguments.add(statusCode);
    reader.setArguments(arguments);
    return reader;
}
Don't use a pagination reader. The problem is that this reader executes a new query for every chunk. Therefore, if you add or change items in the same table while writing, the queries will not produce the same result.
Dig a little into the code of the pagination reader; it is clearly visible there.
If you modify the same table you are reading from, then you have to ensure that your result set doesn't change during the processing of the whole step, otherwise your results may not be predictable and very likely not what you wanted.
Try using a JdbcCursorItemReader. It creates the query at the beginning of your step, and hence the result set is defined at the beginning and will not change during the processing of the step.
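A minimal sketch of such a reader, assuming a plain SQL query in place of the repository method; the table, column and class names are placeholders, not taken from the question.

@Bean
@StepScope
public JdbcCursorItemReader<Foo> fooCursorReader(DataSource dataSource) {
    return new JdbcCursorItemReaderBuilder<Foo>()
            .name("fooCursorReader")
            .dataSource(dataSource)
            // The cursor is opened once at the start of the step, so later
            // status updates by the writer do not change the result set.
            .sql("SELECT id, status_code, payload FROM foo WHERE status_code = ?")
            .preparedStatementSetter(ps -> ps.setString(1, statusCode))
            .rowMapper(new BeanPropertyRowMapper<>(Foo.class))
            .build();
}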
EDIT
Based on the reader configuration you added, I assume a couple of things:
1. this is not a standard Spring Batch item reader
2. you are using a method called "findAllByStatusCode"; I assume this selects by the status field that gets updated during writing
3. your reader class is named "NonSortingRepositoryItemReader", hence I assume there is no guaranteed ordering of the result list
If 3 is correct, then this is very likely the problem. If the order of the elements is not guaranteed, then using a paging reader will definitely not work.
Every page executes its own select and then moves the pointer to the appropriate position in the result.
E.g., with a page size of 5, the first call returns elements 1-5 of its select and the second call returns elements 6-10 of its select. But since the order is not guaranteed, the element at position 1 in the first call could be at position 6 in the second call and therefore be processed twice, whilst the element at position 6 in the first call could be at position 2 in the second call and therefore never be processed.

IBM Integration Bus, best practices for calling multiple services

So I have this requirement that takes in one document and from that needs to create one or more documents in the output.
In the course of this, it needs to determine whether the document is already there, because there are different operations to apply in the create and update scenarios.
In straight code, this would be simple (conceptually):
InputData in = <something>;
if (getItemFromExternalSystem(in.key1) == null) {
    createItemSpecificToKey1InExternalSystem(in.key1);
}
if (getItemFromExternalSystem(in.key2) == null) {
    createItemSpecificToKey2InExternalSystem(in.key1, in.key2);
}
createItemFromInput(in.key1, in.key2, in.moreData);
In effect a kind of "ensure this data is present".
However, how would I go about achieving this in IIB? If I used a subflow for the get/create cycle, the output of the subflow (whatever the result of its last operation was) would be returned as the new "message" of the flow, but I don't actually care about the value from the "ensure data present" subflow. I need instead to keep working on my original message, but still wait for the different subflows to finish before I can run my final "createItem".
You can use Aggregation nodes. For example, use 3 flows:
- the first would propagate your original message to the third
- the second would invoke the operations createItemSpecificToKey1InExternalSystem and createItemSpecificToKey2InExternalSystem
- the third would aggregate the results of the first and second and invoke createItemFromInput
Have you considered using the Collector node? It will collect your records into N 'collections', and then you can iterate over the collections and output one document per collection.

Java 8 parallel stream + anyMatch - do threads get interrupted once a match is found?

If I have a parallel stream in java 8, and I terminate with an anyMatch, and my collection has an element that matches the predicate, I'm trying to figure out what happens when one thread processes this element.
I know that anyMatch is short circuiting, so that I wouldn't expect further elements to be processed once the matching element is processed. My confusion is about what happens to the other threads, that are presumably in the middle of processing elements. I can think of 3 plausible scenarios:
a) Do they get interrupted?
b) Do they keep processing the element that they are working on, and then, once all the threads are doing nothing, I get my result?
c) Do I get my result, but the threads that were processing other elements continue processing those elements (but don't take on other elements once they are done)?
I have a long-running predicate, where it is very useful to terminate quickly as soon as I know that one element matches. I worry a bit, since I can't find this information in the documentation, that it might be implementation-dependent, which would also be good to know.
Thanks
After some digging through the Java source code, I think I found the answer.
The other threads periodically check to see if another thread has found the answer and, if so, they stop working and cancel any nodes that are not yet running.
java.util.stream.FindOps$FindTask has this method:
private void foundResult(O answer) {
    if (isLeftmostNode())
        shortCircuit(answer);
    else
        cancelLaterNodes();
}
Its parent class, AbstractShortCircuitTask, implements shortCircuit like this:
/**
 * Declares that a globally valid result has been found. If another task has
 * not already found the answer, the result is installed in
 * {@code sharedResult}. The {@code compute()} method will check
 * {@code sharedResult} before proceeding with computation, so this causes
 * the computation to terminate early.
 *
 * @param result the result found
 */
protected void shortCircuit(R result) {
    if (result != null)
        sharedResult.compareAndSet(null, result);
}
And the actual compute() method that does the work has this important loop:
AtomicReference<R> sr = sharedResult;
R result;
while ((result = sr.get()) == null) {
    ... // does the actual fork stuff here
}
where sharedResult is updated by the shortCircuit() method, so compute() will see the result the next time it checks the while-loop condition.
EDIT
So in summary:
- Threads are not interrupted.
- Instead, they periodically check whether another thread has found the answer and stop further processing once it has been found.
- No new threads will be started once the answer has been found.
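This behaviour is easy to observe with a small experiment (not from the original answer): a deliberately slow predicate logs every element it evaluates, and far fewer than all elements are printed once the match has been found.

import java.util.stream.IntStream;

public class AnyMatchShortCircuitDemo {
    public static void main(String[] args) {
        boolean found = IntStream.rangeClosed(1, 1_000)
                .parallel()
                .anyMatch(i -> {
                    System.out.println(Thread.currentThread().getName() + " evaluating " + i);
                    sleep(50); // simulate the long-running predicate
                    return i == 3; // the matching element
                });
        System.out.println("anyMatch returned " + found);
        // Elements already being evaluated run to completion, but the
        // remaining ones are never picked up.
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}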
