I'm developing a migration app to migrate data from one database to another, each with a different schema - so we have some processors in between. Currently, we're using the JdbcCursorItemReader in our steps. I'm trying to avoid a temporary network issue causing a failed job hours into a migration.
I tried extending the JdbcCursorItemReader, overriding the open(ExecutionContext ec) and read() methods and annotating them with @Retryable. However, when an exception is thrown in either the open or read method, the job fails - the exception is not caught and retry is not triggered.
I'm now starting to wonder whether the JdbcCursorItemReader can even encounter transient errors that would need to be retried. As I understand it, a long-running connection is opened and the results are streamed. Am I wasting my time trying to make the JdbcCursorItemReader retryable? Is it even possible?
If I used the JdbcPagingItemReader, could I make its read() method retryable?
I'm not too experienced with Spring Batch, any guidance on making the reader more resilient would be greatly appreciated!
Regards,
David
The retry policy of a fault-tolerant chunk-oriented step is not applied to the item reader, see Retry not working with Spring Batch with Java Config and Spring Batch: Using SkipPolicy and RetryPolicy with SpringBoot. The reason is that the contract of an item reader is "forward only": there is no way (in the current interface) to go back to a previous position in case of a retry.
So the path you are exploring is the way to go. You can add retry capabilities to your custom reader either declaratively with an annotation or programmatically with a RetryTemplate.
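For example, the programmatic approach could look roughly like this delegating reader (a sketch using spring-retry's RetryTemplate; the retry limit and back-off period are illustrative assumptions):

```java
import org.springframework.batch.item.ItemReader;
import org.springframework.retry.backoff.FixedBackOffPolicy;
import org.springframework.retry.policy.SimpleRetryPolicy;
import org.springframework.retry.support.RetryTemplate;

// Wraps any ItemReader and retries read() on failure.
public class RetryableReader<T> implements ItemReader<T> {

    private final ItemReader<T> delegate;
    private final RetryTemplate retryTemplate;

    public RetryableReader(ItemReader<T> delegate) {
        this.delegate = delegate;
        RetryTemplate template = new RetryTemplate();
        template.setRetryPolicy(new SimpleRetryPolicy(3)); // up to 3 attempts
        FixedBackOffPolicy backOff = new FixedBackOffPolicy();
        backOff.setBackOffPeriod(2000L); // wait 2s between attempts
        template.setBackOffPolicy(backOff);
        this.retryTemplate = template;
    }

    @Override
    public T read() throws Exception {
        return retryTemplate.execute(context -> delegate.read());
    }
}
```

One caveat for a cursor-based delegate: if the connection is dead, simply calling read() again will keep failing, so inside the retry you may also need to re-open the delegate (via ItemStream) and skip back to the last processed position.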
I have used a Spring Boot scheduler with the @Scheduled annotation along with a fixedRateString of 1 second. This scheduler intermittently stops working for approximately 2 minutes and then starts working again automatically. What can be the possible reasons for this behavior, and is there a resolution?
Below is the code snippet for the scheduler.
1st) Please read SO guidelines
DO NOT post images of code, data, error messages, etc. - copy or type
the text into the question. Please reserve the use of images for
diagrams or demonstrating rendering bugs, things that are impossible
to describe accurately via text.
2nd) To your problem
You use an XML-based Spring configuration where you have configured your scheduler. Then you also use the @Scheduled annotation. You should not mix these two ways of configuring beans.
Also, you use some kind of thread synchronization in this method. Probably some thread is stuck outside the method because of the lock, and this interferes with the functionality that you want.
Remove either the XML configuration or the scheduling annotation, then debug to see why the method behaves as it does. Most probably the cause is one of the two mentioned above: the locks or the duplicate configuration.
I've been researching various batch frameworks and I appreciate the abstraction and out-of-the-box features such as skip, retry, listeners etc. that Spring Batch brings with it and I'm familiar with Spring framework so this is a natural choice.
However, the batch flows I intend to create do not have transactional databases on either end of the read-process-write flow. I want to use Spring Batch to connect two systems through APIs and still leverage the batch framework for tracking job executions, backing the batch app with the Spring Batch database.
The remote APIs support their own batching concepts, so we can, for example, process 100 records and attempt a batch insert, where the entire batch fails when one record is invalid. In that case, I would still like Spring Batch to "rollback" and retry each record individually.
Is it possible to leverage Spring Batch with its backing batch metadata database, to use skip and retry, without holding database connections or using transactions during chunk based processing?
Edit:
Based on Mahmoud's comment, I can use a DataSourceTransactionManager with the JobRepository and a ResourcelessTransactionManager with the chunk steps. So I will define a custom StepBuilderFactory:
@Component
public class MyStepBuilderFactory extends StepBuilderFactory {

    public MyStepBuilderFactory(JobRepository jobRepository) {
        super(jobRepository, new ResourcelessTransactionManager());
    }
}
A chunk-oriented tasklet requires a transaction manager to handle the "rollback" semantics you are looking for. If your step does not interact with a transactional resource, you can configure it to use a ResourcelessTransactionManager. This transaction manager is a NoOp implementation that can be used in such cases.
You can always use a DataSourceTransactionManager with the job repository to track job/step execution meta-data.
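For example, a chunk-oriented step built with a factory wired to a ResourcelessTransactionManager could be declared roughly like this (apiReader, apiWriter, ApiRecord, and TransientApiException are placeholders for your own beans and types):

```java
@Bean
public Step apiMigrationStep(StepBuilderFactory stepBuilderFactory) {
    // The factory is wired with a ResourcelessTransactionManager, so chunk
    // "rollbacks" are no-ops against any real resource; fault tolerance
    // re-drives the failed items instead.
    return stepBuilderFactory.get("apiMigrationStep")
            .<ApiRecord, ApiRecord>chunk(100)
            .reader(apiReader())
            .writer(apiWriter())
            .faultTolerant()
            .retryLimit(3)
            .retry(TransientApiException.class)
            .build();
}
```

On a batch-insert failure, the fault-tolerant machinery falls back to writing the items of the chunk one by one, which gives you the per-record retry behavior you described without holding a database transaction open.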
Background:
I created a Stop Job which finds running jobs with the specified name as this:
jobExplorer.findRunningJobExecutions("job_A")
and then, for each execution of job_A it calls:
jobOperator.stop(execution.getId());
Issue
When I call the stop() method above, it eventually accomplishes what I want, but it still throws an exception:
WARN o.s.b.c.l.s.SimpleJobOperator [main] Cannot find Job object
org.springframework.batch.core.launch.NoSuchJobException: No job configuration with the name [job_A] was registered
at org.springframework.batch.core.configuration.support.MapJobRegistry.getJob(MapJobRegistry.java:66) ~[spring-batch-core-3.0.6.RELEASE.jar:3.0.6.RELEASE]
at org.springframework.batch.core.launch.support.SimpleJobOperator.stop(SimpleJobOperator.java:403) [spring-batch-core-3.0.6.RELEASE.jar:3.0.6.RELEASE]
The Cause
This happens when the stop() method tries to locate job_A in the JobRegistry.
So even though job_A was found in the JobRepository (because the repository looks in the database), it was not found in the JobRegistry, which is a local cache of the job beans created within the runtime environment. Since job_A is running within a different runtime instance, it was never registered locally, and the lookup threw an error.
Concern
Even though job_A stops, I am still concerned about what I may have missed because of the exception.
I have searched this issue and found only general answers on how to stop a job; I did not find anyone explaining how to stop a job running in another runtime.
Any answers would be greatly appreciated.
The JobOperator isn't intended to orchestrate distributed batch environments like you're attempting to do. You really have two options:
Use the JobRepository directly - The part that causes the job to stop successfully in the remote JVM is that the JobRepository is updated and the running job in the other JVM knows to check that periodically. Instead of using the JobOperator to accomplish this, just use the JobRepository directly to make the update.
Use a more robust orchestration tool like Spring Cloud Data Flow - This kind of orchestration (deploying, starting, stopping, etc) for jobs (via Spring Cloud Task) is what Spring Cloud Data Flow is for.
You can read more about Spring Cloud Data Flow at the website: https://cloud.spring.io/spring-cloud-dataflow/
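The first option could be sketched like this (assuming jobExplorer and jobRepository beans are available; this mirrors what SimpleJobOperator.stop does after its registry lookup, minus the lookup that fails in your setup):

```java
// Mark each running execution as STOPPING directly in the job repository.
// The JVM that is actually running job_A checks this status at chunk
// boundaries and stops gracefully - no JobRegistry lookup is involved.
for (JobExecution execution : jobExplorer.findRunningJobExecutions("job_A")) {
    execution.setStatus(BatchStatus.STOPPING);
    jobRepository.update(execution);
}
```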
In addition to what Michael mentioned, you can solve this by adding some interface to the application that lets you pass commands to start or stop your job - something like a web service exposing an endpoint to stop it. The catch is that handling this in a clustered system may be a bit tricky.
I have a situation where the data is to be read from 4 different web services, process it and then store the results in a database table. Also send a notification after this task is complete. The trigger for this process is through a web service call.
Should I write my job as a Spring Batch job, or write the whole read/process code as an async method (using @Async) which is called from the REST controller?
Kindly suggest
In my opinion your choice should be @Async, because Spring Batch was designed for large data processing and is not meant for on-demand processing; typically you create a batch job and then launch it on a schedule. The benefit of that kind of architecture is the reliability of your job, which can be restarted in case of failure, and so on. In your case you have a data integration problem, and I suggest looking at Spring Integration. You could have a Spring Integration pipeline that you start through a REST call.
I hope that this can help you
If a large number of services must be executed, spring-batch would be the choice. Otherwise, there is no need to bring in spring-batch.
In my opinion, the @Async annotation is the easier way.
If both methods work, the simpler the better.
In the end, if there will be more and more services, not just 4, spring-batch would be the better solution, since that is exactly what spring-batch is built for.
I have a web application that uses Struts2 + Spring for the resource injection, basically my DAO. Now I would like to create a thread that periodically polls the database and, if needed, send email notifications to users.
I would like to know how I can implement this in a way that lets the thread use my DAO. I haven't managed to get Spring to inject it the way I've set things up, so I would like to hear suggestions and see if someone can point me in the right direction.
Right now I have a thread started by a ServletContextListener, that just creates a timer and schedules an action every 5 minutes. But I can't get this action to use my DAO. I don't have any need to use this structure, I'm open to using whichever solution works.
Thanks for your help!
Edit: As axtavt suggested, I used Spring's task execution and scheduling support and it works perfectly. My task gets injected with the DAO, but now I get a LazyInitializationException every time I try to access a property of my fetched objects - any suggestion on how to solve that?
Perhaps the best option is to use Spring's own scheduling support, see 25. Task Execution and Scheduling (if necessary - with Quartz, see 25.6 Using the OpenSymphony Quartz Scheduler). This approach allows you to configure your scheduled actions as Spring beans, so you can wire them with other beans such as your DAO.
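For example, a scheduled bean could replace your Timer-based listener like this (UserDao, User, and the method names are placeholders for your own types; scheduling must be enabled via <task:annotation-driven/> in XML or @EnableScheduling):

```java
@Component
public class NotificationPoller {

    private final UserDao userDao; // injected by Spring like any other bean

    @Autowired
    public NotificationPoller(UserDao userDao) {
        this.userDao = userDao;
    }

    // Runs every 5 minutes, replacing the Timer started by the
    // ServletContextListener.
    @Scheduled(fixedRate = 300000)
    public void pollAndNotify() {
        for (User user : userDao.findUsersNeedingNotification()) {
            // send email notification to the user ...
        }
    }
}
```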
Alternatively, you can use the following to obtain any Spring bean in web application (for example, to obtain DAO from your thread):
WebApplicationContextUtils.getWebApplicationContext(servletContext).getBean(...)