Spring Batch remote partitioning | Can the master step complete without completion of the slave step - spring

I have implemented Spring Batch remote partitioning. I don't want my master step to wait for acknowledgment from the slave steps. I want my master step to complete as soon as it has partitioned the data.
Is there any configuration for this in Spring Batch?

If the manager should not wait for workers, what you are describing is not a manager/worker configuration anymore. In a manager/worker setup, the manager divides the work among workers and waits for them to finish (in Spring Batch, you can configure the manager to wait in two different ways: poll the job repository for worker statuses, or gather replies from workers up to a given timeout).
I don't see the rationale behind this "fire-and-forget" approach (who would monitor the status of workers and drive the process accordingly?), and remote partitioning is definitely not suitable for implementing this pattern (at least in my opinion). If you really want to (ab)use remote partitioning for that, you can register a custom PartitionHandler that does not wait for workers (i.e. remove this section).

Related

Service synchronization issue

I've created two services.
One of them (scheduler) only sends requests to the other (backoffice) to perform some "large" operations.
When backoffice receives a request:
It first creates a mark (a key in Redis) to record that the process has started.
Each time a request arrives:
backoffice checks whether the mark exists.
If it exists, the previous process has not finished yet, so the request is skipped.
Otherwise, it performs the large process.
When the process is finished, the key in Redis is removed.
It would be something like this:
if (key exists)
    return;
make long process... (1);
remove key;
The problem arises when the service is destroyed before the process has finished, so it never removes the mark in Redis. That means the process will never run again.
Is there any way to solve this kind of problems?
The way to solve this problem is to use an existing engine, because building a custom, scalable and robust solution for reliable service orchestration is really hard.
I recommend looking at Uber Cadence Workflow, which would allow you to convert your pseudocode into a real production application with minor changes.
You can fire a background job that updates a timestamp under the key, e.g. every minute.
When the service attempts to start the process, it must verify the key's existence (as it does now) plus the timestamp under the key. If the timestamp is more than 1 minute old, the previous attempt is stale and you can start over.
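The heartbeat idea above can be sketched in plain Java as follows. This is only an illustration under stated assumptions: the in-memory map stands in for the Redis key, and the names HeartbeatLock, tryAcquire, heartbeat and release are hypothetical, not part of any Redis client.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical sketch: the map stands in for Redis and stores the last heartbeat per key. */
class HeartbeatLock {
    private final ConcurrentMap<String, Instant> store = new ConcurrentHashMap<>();
    private final Duration staleAfter;

    HeartbeatLock(Duration staleAfter) {
        this.staleAfter = staleAfter;
    }

    /** Returns true if the caller may start: the key is absent or its heartbeat is stale. */
    boolean tryAcquire(String key, Instant now) {
        boolean[] acquired = {false};
        // compute() runs atomically per key, so two callers cannot both take over.
        store.compute(key, (k, last) -> {
            if (last == null || Duration.between(last, now).compareTo(staleAfter) > 0) {
                acquired[0] = true;
                return now; // take (or take over) the mark
            }
            return last; // a live process still holds the mark
        });
        return acquired[0];
    }

    /** Called periodically by the running process to show it is still alive. */
    void heartbeat(String key, Instant now) {
        store.computeIfPresent(key, (k, last) -> now);
    }

    /** Called when the long process finishes normally. */
    void release(String key) {
        store.remove(key);
    }
}
```

If the service dies mid-process, nothing refreshes the timestamp, so the next tryAcquire after the staleness window succeeds instead of being blocked forever. In practice the staleness window would probably be chosen somewhat larger than the heartbeat interval, so that a slightly delayed heartbeat is not mistaken for a crash.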
Sounds like you should be using a message queue to schedule tasks for the backoffice service. Queuing solutions like RabbitMQ allow you to manually acknowledge (or “ack”) that the process is complete. Whenever a subscriber crashes, the queue detects that the connection dropped without acknowledgement and will re-enqueue the same task, which will be picked up by the next available subscriber. Here’s another thread talking about this problem, specifically focused on message queues:
What happens to fetched messages when RabbitMQ consumer crashes?

Understanding the MajorDomo Pattern from NetMQ ZeroMQ

I am trying to understand how best to implement the MDP example in C#, to be used in a Windows service in a multiple-client, single-server environment.
I have read the docs but I am still unclear on the following:
Should all Worker instances be created on startup and left to run?
Should the Workers all be different types of services or just different instances of the same service?
Can I have one Windows service which contains the Broker and Workers, or is it best to split them out into their own services?
The example code I am using is the MajorDomo Pattern sample taken from https://github.com/NetMQ/Samples
Yes, all workers in an MDP environment should be created independently of the requests, since the broker should not know how to create them.
Each worker handles a given "service" (contract). Obviously each contract should have at least one worker.
If you need parallelized handling of requests, and a given worker can only do one at a time, having extra workers for that service could make sense. Generally, however, you would do this if multiple machines were involved (horizontal scaling).
You can have the broker and workers in the same process. HOWEVER, if you want to update only a worker, taking down the broker at the same time can be annoying for the clients. I would recommend letting the broker be its own process, with the workers in one or more other processes.

Master Slave configuration for Spring Boot Microservices

I have a Spring Boot application (microservice) running on two nodes and registered with a Eureka naming server. My requirement is as follows:
An Autosys job will trigger one complex calculation in the microservice, which will take about 45 minutes to complete. The result of this calculation will be saved to a Gemfire cache and a database. I want these two nodes to act as master and slave, where only the master node takes up and executes the request for the complex calculation. Only if the master goes down will the slave become master and be responsible for executing the complex calculation.
Another catch is that while the complex calculation is running, if an ad hoc request for the same calculation comes in, the latest request needs to be rejected with a message saying the calculation is already running.
I explored the possibility to use Apache ZooKeeper but it doesn't seem to satisfy my requirement of serving the request only using Master node.
Is there any way of achieving this?
What about Kafka? It uses ZooKeeper under the covers: https://kafka.apache.org/
You are probably looking for leader election: When does Kafka Leader Election happen?
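Leader election decides which node runs the calculation. The question's second requirement (rejecting an ad hoc request while the calculation is already running) can be handled separately on that node. A minimal sketch, assuming all requests reach the current master and that an in-process flag is enough; the class name CalculationGuard and its method are hypothetical:

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical sketch: allows at most one calculation at a time within one JVM. */
class CalculationGuard {
    private final AtomicBoolean running = new AtomicBoolean(false);

    /**
     * Runs the calculation unless one is already in progress.
     * Returns false if the request was rejected.
     */
    boolean tryRun(Runnable calculation) {
        // compareAndSet flips the flag atomically, so only one caller gets through.
        if (!running.compareAndSet(false, true)) {
            return false;
        }
        try {
            calculation.run();
            return true;
        } finally {
            running.set(false); // always clear the flag, even if the calculation throws
        }
    }
}
```

A request handler could call tryRun and answer "calculation is already running" whenever it returns false. This flag lives in one process only, so it does not replace the leader election itself.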

Spring Batch Flow Job - control number of jobs running simultaneously

I have a complex long-running flow that I'm going to implement based on Spring Batch Flow Job.
My REST API will wait for the incoming request and then (based on each request) initiate a new job execution.
Right now I'm worried about the server resources, because the number of incoming requests is quite big and I'd like to control the number of jobs running simultaneously. Is there any way to tell Spring Batch to run no more than a fixed number of jobs simultaneously (let's say 5) and put the rest of the jobs into a queue to be executed later, for example when one of the previous 5 jobs has finished?
There is not a way to accomplish this in Spring Batch. The reason for this is that the number of concurrent jobs is really an orchestration problem which Spring Batch specifically avoids solving (allowing you to integrate with whatever you want).
That being said, what you're describing can be done in a relatively straightforward manner by implementing a work queue that stores the requests to run a job, and having a service pick up those requests at the other end. The concurrency can be controlled easily with Spring Integration components to prevent the system from being overloaded (assuming you have a mechanism to handle the queue size in question).
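The work-queue idea can be illustrated with plain java.util.concurrent rather than Spring Integration. This is a sketch of the general technique, not the setup the answer has in mind; the class name BoundedJobRunner is hypothetical, and the queue here is in-memory and unbounded, so it does not survive a restart.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch: runs at most N jobs at once and queues the rest in FIFO order. */
class BoundedJobRunner {
    private final ExecutorService pool;

    BoundedJobRunner(int maxConcurrentJobs) {
        // A fixed thread pool keeps extra submissions in its internal queue
        // until one of the worker threads becomes free.
        this.pool = Executors.newFixedThreadPool(maxConcurrentJobs);
    }

    /** Accepts a job request; it starts immediately or waits in the queue. */
    Future<?> submit(Runnable job) {
        return pool.submit(job);
    }

    /** Stops accepting new jobs; already queued jobs still run. */
    void shutdown() {
        pool.shutdown();
    }
}
```

In a Spring Batch application, the submitted Runnable would presumably wrap a call such as jobLauncher.run(job, jobParameters), with the REST endpoint returning as soon as the request is queued.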

Spring Batch or JMS for long running jobs

I have to run very long-running processes on my web service, and I'm looking for a good way to handle the result. The scenario: a user starts such a long-running process via the UI. He then gets a message that his request was accepted and that he should return some time later, so there's no need to display the status of his request or anything like that. I'm just looking for a way to handle the result of the long-running process properly. Since the processes are external programs, my application server is not aware of them, so I have to wait for these programs to terminate. Of course I don't want to use EJBs for this, because they would block for as long as no result is available. Instead I thought of using JMS or Spring Batch. Has anyone had the same problem, or any advice on which solution would be better?
It really depends on what forms of communication your external programs have available. JMS is a very good approach and immediately available in your app server but might not be the best option if your external program is a long running DB query which dumps the result in a text file...
The main advantage of Spring Batch over "just" using JMS as an asynchronous communications channel is its transactional properties, which allow the infrastructure to retry failed jobs, group jobs together and so on. Without knowing more about your specific setup, it is hard to give detailed advice.
Cheers,
I had a similar design requirement: users were sending XML files and I had to generate documents from them. Using JMS in this case is advantageous, since you can always add new instances of these processes, which can consume and execute the jobs in parallel.
You can use a timer task to check status or monitor these processes. Also, you can publish a message to a JMS queue once the processes are completed.
