Spring Batch Persist Job Meta Data

I want to persist Spring Batch metadata for jobs and steps instead of using MapJobRepository, since it is not recommended for production use.
I have extended DefaultBatchConfigurer and overridden its methods to use my primary DataSource and transaction manager through JobRepositoryFactoryBean.
I have also registered a SimpleJobLauncher bean that uses the custom job repository, with the code below:
@Bean
SimpleJobLauncher simpleJobLauncher() throws Exception {
    SimpleJobLauncher simpleJobLauncher = new SimpleJobLauncher();
    simpleJobLauncher.setTaskExecutor(new SimpleAsyncTaskExecutor());
    simpleJobLauncher.setJobRepository(customRepository());
    return simpleJobLauncher;
}
@Bean
public JobRepository customRepository() throws Exception {
    JobRepositoryFactoryBean jobRepositoryFactoryBean = new JobRepositoryFactoryBean();
    jobRepositoryFactoryBean.setDataSource(myPrimaryDataSource);
    jobRepositoryFactoryBean.setTransactionManager(myTransactionManager);
    jobRepositoryFactoryBean.setDatabaseType("MYSQL");
    jobRepositoryFactoryBean.setTablePrefix("BATCH_");
    jobRepositoryFactoryBean.setMaxVarCharLength(1000);
    return jobRepositoryFactoryBean.getObject();
}
I get the following exception when I launch the job:
java.lang.NullPointerException: null
at org.springframework.batch.core.repository.dao.MapJobExecutionDao.synchronizeStatus(MapJobExecutionDao.java:161)
Please let me know where exactly my configuration is wrong.

From your exception, it looks like Spring Batch is configured to use the Map based job repository and not the persistent one.
Since you extended DefaultBatchConfigurer, you need to override createJobRepository and return your custom job repository. Declaring a bean of type JobRepository in the application context is not sufficient.
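A minimal sketch of what this could look like, assuming Spring Batch 4.x (where DefaultBatchConfigurer has a protected createJobRepository() hook) and that exactly one DataSource and one PlatformTransactionManager are available for injection. The class name PersistentBatchConfigurer is hypothetical; the factory settings are copied from the question.

```java
import javax.sql.DataSource;

import org.springframework.batch.core.configuration.annotation.DefaultBatchConfigurer;
import org.springframework.batch.core.repository.JobRepository;
import org.springframework.batch.core.repository.support.JobRepositoryFactoryBean;
import org.springframework.stereotype.Component;
import org.springframework.transaction.PlatformTransactionManager;

// Hypothetical class name; assumes Spring Batch 4.x and @EnableBatchProcessing.
@Component
public class PersistentBatchConfigurer extends DefaultBatchConfigurer {

    private final DataSource dataSource;
    private final PlatformTransactionManager transactionManager;

    public PersistentBatchConfigurer(DataSource dataSource,
                                     PlatformTransactionManager transactionManager) {
        // Passing the DataSource up is what switches the configurer away
        // from the Map-based repository.
        super(dataSource);
        this.dataSource = dataSource;
        this.transactionManager = transactionManager;
    }

    // Steps built through StepBuilderFactory pick up this transaction manager.
    @Override
    public PlatformTransactionManager getTransactionManager() {
        return transactionManager;
    }

    // Called by DefaultBatchConfigurer during initialization when a DataSource is set.
    @Override
    protected JobRepository createJobRepository() throws Exception {
        JobRepositoryFactoryBean factory = new JobRepositoryFactoryBean();
        factory.setDataSource(dataSource);
        factory.setTransactionManager(transactionManager);
        factory.setDatabaseType("MYSQL");
        factory.setTablePrefix("BATCH_");
        factory.setMaxVarCharLength(1000);
        factory.afterPropertiesSet();
        return factory.getObject();
    }
}
```

With a configurer like this, the job launcher that Spring Batch creates uses the repository returned here, so a separate customRepository() bean should no longer be needed. If an asynchronous launcher is still wanted, createJobLauncher() can be overridden in the same class.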

Related

Spring Batch/Data JPA application not persisting/saving data to Postgres database when calling JPA repository (save, saveAll) methods

I am near my wits' end. I have read and googled endlessly and tried the solutions from all the Google/Stack Overflow posts that describe this similar issue (there are quite a few). Some seemed promising, but nothing has worked for me yet, though I have made some progress and I believe I am on the right track (at this point I believe it has something to do with the transaction manager and a possible conflict between Spring Batch and Spring Data JPA).
References:
Spring boot repository does not save to the DB if called from scheduled job
JpaItemWriter: no transaction is in progress
Similar to the aforementioned posts, I have a Spring Boot application that is using Spring Batch and Spring Data JPA. It reads comma delimited data from a .csv file, then does some processing/transformation, and attempts to persist/save to database using the JPA Repository methods, specifically here .saveAll() (I also tried .save() method and this did the same thing), since I'm saving a List<MyUserDefinedDataType> of a user-defined data type (batch insert).
Now, my code was working fine on Spring Boot starter 1.5.9.RELEASE, but I recently attempted to upgrade to 2.X.X and found, after countless hours of debugging, that only version 2.2.0.RELEASE would persist data to the database. So an upgrade to >= 2.2.1.RELEASE breaks persistence. Everything is read fine from the .csv; it's just that the first time the code flow hits a JPA repository method like .save() or .saveAll(), the application keeps running but nothing gets persisted. I also noticed the Hikari pool logs "active=1 idle=4", whereas the same log on version 1.5.9.RELEASE says active=0 idle=5 immediately after persisting the data, so the application is definitely hanging. I went into the debugger and saw that, after stepping into the repository calls, execution goes into an almost infinite cycle through the Spring AOP libraries and such (all third party), and I don't believe it ever comes back to the application/business logic that I wrote.
3c22fb53ed64 2021-05-20 23:53:43.909 DEBUG
[HikariPool-1 housekeeper] com.zaxxer.hikari.pool.HikariPool - HikariPool-1 - Pool stats (total=5, active=1, idle=4, waiting=0)
Anyway, I tried the most common solutions that worked for other people, which were:
Defining a JpaTransactionManager @Bean and injecting it into the Step function, while keeping the JobRepository using the PlatformTransactionManager. This did not work. Then I also tried using the JpaTransactionManager in the JobRepository @Bean; this also did not work.
Defining a @RestController endpoint in my application to manually trigger this Job, instead of triggering it from my main Application.java class. (I talk about this more below.) As in one of the posts I linked above, the data persisted correctly to the database even on Spring >= 2.2.1, which further makes me suspect that something with the Spring Batch persistence/entity/transaction managers is messed up.
The code is basically this:
BatchConfiguration.java
@Configuration
@EnableBatchProcessing
@Import({DatabaseConfiguration.class})
public class BatchConfiguration {
    // Datasource is a Postgres DB defined in separate IntelliJ project that I add to my pom.xml
    DataSource dataSource;
    @Autowired
    public BatchConfiguration(@Qualifier("dataSource") DataSource dataSource) {
        this.dataSource = dataSource;
    }
    @Bean
    @Primary
    public JpaTransactionManager jpaTransactionManager() {
        final JpaTransactionManager tm = new JpaTransactionManager();
        tm.setDataSource(dataSource);
        return tm;
    }
    @Bean
    public JobRepository jobRepository(PlatformTransactionManager transactionManager) throws Exception {
        JobRepositoryFactoryBean jobRepositoryFactoryBean = new JobRepositoryFactoryBean();
        jobRepositoryFactoryBean.setDataSource(dataSource);
        jobRepositoryFactoryBean.setTransactionManager(transactionManager);
        jobRepositoryFactoryBean.setDatabaseType("POSTGRES");
        return jobRepositoryFactoryBean.getObject();
    }
    @Bean
    public JobLauncher jobLauncher(JobRepository jobRepository) {
        SimpleJobLauncher simpleJobLauncher = new SimpleJobLauncher();
        simpleJobLauncher.setJobRepository(jobRepository);
        return simpleJobLauncher;
    }
    @Bean(name = "jobToLoadTheData")
    public Job jobToLoadTheData() {
        return jobBuilderFactory.get("jobToLoadTheData")
                .start(stepToLoadData())
                .listener(new CustomJobListener())
                .build();
    }
    @Bean
    @StepScope
    public TaskExecutor taskExecutor() {
        ThreadPoolTaskExecutor threadPoolTaskExecutor = new ThreadPoolTaskExecutor();
        threadPoolTaskExecutor.setCorePoolSize(maxThreads);
        threadPoolTaskExecutor.setThreadGroupName("taskExecutor-batch");
        return threadPoolTaskExecutor;
    }
    @Bean(name = "stepToLoadData")
    public Step stepToLoadData() {
        TaskletStep step = stepBuilderFactory.get("stepToLoadData")
                .transactionManager(jpaTransactionManager())
                .<List<FieldSet>, List<myCustomPayloadRecord>>chunk(chunkSize)
                .reader(myCustomFileItemReader(OVERRIDDEN_BY_EXPRESSION))
                .processor(myCustomPayloadRecordItemProcessor())
                .writer(myCustomerWriter())
                .faultTolerant()
                .skipPolicy(new AlwaysSkipItemSkipPolicy())
                .skip(DataValidationException.class)
                .listener(new CustomReaderListener())
                .listener(new CustomProcessListener())
                .listener(new CustomWriteListener())
                .listener(new CustomSkipListener())
                .taskExecutor(taskExecutor())
                .throttleLimit(maxThreads)
                .build();
        step.registerStepExecutionListener(stepExecutionListener());
        step.registerChunkListener(new CustomChunkListener());
        return step;
    }
My main method:
Application.java
@Autowired
@Qualifier("jobToLoadTheData")
private Job loadTheData;
@Autowired
private JobLauncher jobLauncher;
@PostConstruct
public void launchJob() throws JobParametersInvalidException, JobExecutionAlreadyRunningException, JobRestartException, JobInstanceAlreadyCompleteException
{
    JobParameters parameters = (new JobParametersBuilder()).addDate("random", new Date()).toJobParameters();
    jobLauncher.run(loadTheData, parameters);
}
public static void main(String[] args) {
    SpringApplication.run(Application.class, args);
}
Now, normally I'm reading this .csv from Amazon S3 bucket, but since I'm testing locally, I am just placing the .csv in the project directory and reading it directly by triggering the job in the Application.java main class (as you can see above). Also, I do have some other beans defined in this BatchConfiguration class but I don't want to over-complicate this post more than it already is and from the googling I've done, the problem possibly is with the methods I posted (hopefully).
Also, I would like to point out that, similar to one of the other posts on Google/Stack Overflow from a user with a similar problem, I created a @RestController endpoint that simply calls the .run() method of the JobLauncher, passing in the jobToLoadTheData bean, and it triggers the batch insert. Guess what? Data persists to the database just fine, even on Spring >= 2.2.1.
What is going on here? Is this a clue? Is something funky going on with some type of entity or transaction manager? I'll take any advice or tips! I can provide any more information that you may need, so please just ask.
You are defining a bean of type JobRepository and expecting it to be picked up by Spring Batch. This is not correct. You need to provide a BatchConfigurer and override getJobRepository. This is explained in the reference documentation:
You can customize any of these beans by creating a custom implementation of the
BatchConfigurer interface. Typically, extending the DefaultBatchConfigurer
(which is provided if a BatchConfigurer is not found) and overriding the required
getter is sufficient.
This is also documented in the Javadoc of @EnableBatchProcessing. So in your case, you need to define a bean of type BatchConfigurer and override getJobRepository and getTransactionManager, something like:
@Bean
public BatchConfigurer batchConfigurer(EntityManagerFactory entityManagerFactory, DataSource dataSource) {
    return new DefaultBatchConfigurer(dataSource) {
        @Override
        public PlatformTransactionManager getTransactionManager() {
            return new JpaTransactionManager(entityManagerFactory);
        }
        @Override
        public JobRepository getJobRepository() {
            JobRepositoryFactoryBean jobRepositoryFactoryBean = new JobRepositoryFactoryBean();
            jobRepositoryFactoryBean.setDataSource(dataSource);
            jobRepositoryFactoryBean.setTransactionManager(getTransactionManager());
            // set other properties
            return jobRepositoryFactoryBean.getObject();
        }
    };
}
In a Spring Boot context, you could also override the createTransactionManager and createJobRepository methods of org.springframework.boot.autoconfigure.batch.JpaBatchConfigurer if needed.

Tasklet or ItemReader that makes calls to Google Cloud Datastore

I'm attempting to create a Spring Batch tasklet that calls a DatastoreRepository. Tasklet execute step
@Override
public RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext)
        throws Exception {
    String syncId = this.stepExecutionContext.getJobParameters()
            .getString(JobParameterKeys.SYNC_ID);
    SyncJob syncJob = syncJobRepo.findById(Long.parseLong(syncId)).get();
When I attempt to call the syncJobRepo, I receive the exception org.springframework.transaction.NoTransactionException: No transaction aspect-managed TransactionStatus in scope.
I have a custom configured datasource (mySql instance) backing Spring Batch for storing the job execution metadata.
I've attempted to define the DatastoreTransactionManager
@Bean
DatastoreTransactionManager datastoreTransactionManager() {
    DatastoreTransactionManager manager
            = new DatastoreTransactionManager(DatastoreOptions.getDefaultInstance().getService());
    return manager;
}
My configuration is annotated with @EnableBatchProcessing.
The batch job config:
@Bean
public Job customersJob(StepBuilderFactory stepBuilderFactory,
        JobBuilderFactory jobBuilderFactory,
        Tasklet batchCustomerReader,
        SyncJobNotificationListener listener,
        DatastoreTransactionManager datastoreTransactionManager) {
    Step step = stepBuilderFactory.get(CUSTOMERS_BATCH_JOB_LOAD_FROM_ERP_STEP)
            .tasklet(batchCustomerReader)
            .transactionManager(datastoreTransactionManager)
            .listener(listener)
            .build();
    return jobBuilderFactory.get(CUSTOMERS_BATCH_JOB)
            .incrementer(new RunIdIncrementer())
            .start(step)
            .build();
}
I have a custom configured datasource (mySql instance) backing Spring Batch for storing the job execution metadata.
If you use different datasources for Spring Batch meta-data and your business data, then you need to configure an XA transaction manager to synchronize the transaction between both datasources. This way, both data and meta-data are kept in sync in case of failure/restart scenario.
A similar Q/A can be found here: Seperate datasource for jobrepository and writer of Spring Batch

Spring - Instantiating beans results in infinite recursion and (ironic) StackOverflow exception. How to fix?

When I launch my application, for some reason not apparent to me, Spring waits until it instantiates the SchedulerFactoryBean to instantiate the jtaTransactionManager bean. When it does this, Spring goes into an infinite recursion, resulting in a StackOverflow exception.
After tracing the code, I see no circular dependency: the transaction manager is not dependent in any way on the SchedulerAccessor.
In the stack view image at the bottom, the Proxy$98 class is some enhancement of org.springframework.scheduling.quartz.SchedulerAccessor
Edit 1: Update
What is happening is that the SchedulerFactoryBean is being initialized in the preInstantiateSingletons() method of the bean factory. The transaction manager is not a singleton, so it is not pre-initialized. When Spring goes through the advisements, it tries to initialize the bean, but the advisement leads it back to the same pathway.
Edit 2: Internals (or infernals)
The spring class org.springframework.batch.core.configuration.annotation.SimpleBatchConfiguration implements the transactionManager attribute as a LazyProxy.
This is executed well before the initialization code constructs the actual TransactionManager bean. At some point, the class needs to invoke a transaction within the TransactionManager context, which causes the Spring container to try to instantiate the bean. Since there is an advice on the bean proxy, the method interceptor in the SimpleBatchConfiguration class tries to execute the getTransaction() method, which in turn causes the Spring container to try to instantiate the bean, which calls the interceptor, which tries to execute the getTransaction() method, and so on.
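The loop described in Edit 2 can be modelled in plain Java without Spring. The sketch below is only a simplified analogy, not the actual SimpleBatchConfiguration code: TxManager, lazyProxy and createRealManager are hypothetical names, and a depth guard stands in for the real stack overflow so that the demo terminates cleanly.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.function.Supplier;

final class LazyProxyRecursionDemo {

    /** Hypothetical stand-in for a transaction manager interface. */
    interface TxManager {
        String getTransaction();
    }

    /** Safety limit so the demo fails fast instead of overflowing the stack. */
    static final int MAX_DEPTH = 50;

    static int depth = 0;
    static TxManager proxy;

    /** Creates a proxy that looks up its target on every call, like a lazy bean reference. */
    static TxManager lazyProxy(Supplier<TxManager> targetLookup) {
        InvocationHandler handler = (self, method, args) -> {
            if (++depth > MAX_DEPTH) {
                throw new IllegalStateException("recursive bean resolution detected");
            }
            // The interface has a single method, so it is called directly.
            return targetLookup.get().getTransaction();
        };
        return (TxManager) Proxy.newProxyInstance(
                TxManager.class.getClassLoader(),
                new Class<?>[] {TxManager.class},
                handler);
    }

    /** Simulates bean creation that needs a transaction before the bean exists. */
    static TxManager createRealManager() {
        proxy.getTransaction(); // re-enters the proxy, which calls createRealManager() again
        return () -> "tx";
    }

    /** Returns true if resolving through the proxy recursed until the guard tripped. */
    static boolean recursesDuringCreation() {
        depth = 0;
        proxy = lazyProxy(LazyProxyRecursionDemo::createRealManager);
        try {
            proxy.getTransaction();
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("recursed: " + recursesDuringCreation());
    }
}
```

In this model the cycle is broken only by making sure the real target can be created without going through the proxy, which corresponds to constructing the transaction manager before anything requests a transaction from it.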
Edit 3: @EnableBatchProcessing
I use the word "apparent" a lot here because it's guesswork based on the failure modes during startup.
There is (apparently) no way to configure which transaction manager is being used in the @EnableBatchProcessing annotation. Stripping out the @EnableBatchProcessing has eliminated the recursive call, but left me with an apparent circular dependency.
For some unknown reason, even though I have traced and this code is called exactly once, it fails because it thinks the bean named "configurer" is already in creation:
@Bean({ "configurer", "defaultBatchConfigurer" })
@Order(1)
public BatchConfigurer configurer() throws IOException, SystemException {
    DefaultBatchConfigurer result = new DefaultBatchConfigurer(securityDataSource(), transactionManager());
    return result;
}
The code that initiates the recursion is:
protected void registerJobsAndTriggers() throws SchedulerException {
    TransactionStatus transactionStatus = null;
    if (this.transactionManager != null) {
        transactionStatus = this.transactionManager.getTransaction(new DefaultTransactionDefinition());
    }
AppInitializer Startup Code:
@Override
public void onStartup(ServletContext container) throws ServletException {
    Logger logger = LoggerFactory.getLogger(this.getClass());
    try {
        // DB2XADataSource db2DataSource = null;
        AnnotationConfigWebApplicationContext rootContext = new AnnotationConfigWebApplicationContext();
        rootContext.register(DatabaseConfig.class);
        rootContext.register(SecurityConfig.class);
        rootContext.register(ExecutionContextConfig.class);
        rootContext.register(SimpleBatchConfiguration.class);
        rootContext.register(MailConfig.class);
        rootContext.register(JmsConfig.class);
        rootContext.register(SchedulerConfig.class);
        rootContext.refresh();
    } catch (Exception ex) {
        logger.error(ex.getMessage(), ex);
    }
}
Construction of jtaTransactionManager bean in DatabaseConfig
@Bean(destroyMethod = "shutdown")
@Order(1)
public BitronixTransactionManager bitronixTransactionManager() throws IOException, SystemException {
    btmConfig();
    BitronixTransactionManager bitronixTransactionManager = TransactionManagerServices.getTransactionManager();
    bitronixTransactionManager.setTransactionTimeout(3600); // TODO: Make this configurable
    return bitronixTransactionManager;
}
@Bean({ "transactionManager", "jtaTransactionManager" })
@Order(1)
public PlatformTransactionManager transactionManager() throws IOException, SystemException {
    JtaTransactionManager mgr = new JtaTransactionManager();
    mgr.setTransactionManager(bitronixTransactionManager());
    mgr.setUserTransaction(bitronixTransactionManager());
    mgr.setAllowCustomIsolationLevels(true);
    mgr.setDefaultTimeout(3600);
    mgr.afterPropertiesSet();
    return mgr;
}
Construction of SchedulerFactoryBean in SchedulerConfig
@Autowired
@Qualifier("transactionManager")
public void setJtaTransactionManager(PlatformTransactionManager jtaTransactionManager) {
    this.jtaTransactionManager = jtaTransactionManager;
}
@Bean
@Order(3)
public SchedulerFactoryBean schedulerFactoryBean() {
    Properties quartzProperties = new Properties();
    quartzProperties.put("org.quartz.jobStore.driverDelegateClass",
            delegateClass.get(getDatabaseType()));
    quartzProperties.put("org.quartz.jobStore.tablePrefix", getTableSchema()
            + ".QRTZ_");
    quartzProperties.put("org.quartz.jobStore.class",
            org.quartz.impl.jdbcjobstore.JobStoreCMT.class.getName());
    quartzProperties.put("org.quartz.scheduler.instanceName",
            "MxArchiveScheduler");
    quartzProperties.put("org.quartz.threadPool.threadCount", "3");
    SchedulerFactoryBean result = new SchedulerFactoryBean();
    result.setDataSource(securityDataSource());
    result.setNonTransactionalDataSource(nonJTAsecurityDataSource());
    result.setTransactionManager(jtaTransactionManager);
    result.setQuartzProperties(quartzProperties);
    return result;
}
There were several steps to a resolution that were impossibly convoluted to figure out. I ended up monkeying with it until it worked, because the exception messages were not informative.
In the end, here is the result:
Refactored packaging so job/step-scoped and globally scoped beans were in different packages, so the context scan could easily capture the right beans in the right context.
Cloned and modified org.springframework.batch.core.configuration.annotation.SimpleBatchConfiguration to acquire the beans I wanted for my application.
Took out the @EnableBatchProcessing annotation. Since I was already initializing less automagically, everything was initializing twice, which created confusion.
Cleaned up the usage of datasources (XA and non-XA).
Used the @Primary annotation to pick out the correct datasource. (Biting tongue here: is there no way to tell the framework which of several datasources to use without implicitly telling it that in case of questions always use "this one"? Really???)

Need to configure my JPA layer to use a TransactionManager (Spring Cloud Task + Batch register a PlatformTransactionManager unexpectedly)

I am using Spring Cloud Task + Batch in a project.
I plan to use different datasources for business data and Spring audit data on the task. So I configured something like:
@Bean
public TaskConfigurer taskConfigurer() {
    return new DefaultTaskConfigurer(this.singletonNotExposedSpringDatasource());
}
@Bean
public BatchConfigurer batchConfigurer() {
    return new DefaultBatchConfigurer(this.singletonNotExposedSpringDatasource());
}
whereas main datasource is autoconfigured through JpaBaseConfiguration.
The problem comes when SimpleBatchConfiguration+DefaultBatchConfigurer expose a PlatformTransactionManager bean, since JpaBaseConfiguration has a @ConditionalOnMissingBean on PlatformTransactionManager. Therefore Batch's PlatformTransactionManager, bound to the spring.datasource, takes precedence.
So far, this seems to be caused by this bug.
So I tried to emulate what JpaBaseConfiguration does, defining my own PlatformTransactionManager over my biz datasource/entityManager.
@Primary
@Bean
public PlatformTransactionManager appTransactionManager(final LocalContainerEntityManagerFactoryBean appEntityManager) {
    JpaTransactionManager transactionManager = new JpaTransactionManager();
    transactionManager.setEntityManagerFactory(appEntityManager.getObject());
    this.appTransactionManager = transactionManager;
    return transactionManager;
}
Note that I have to define it with a name other than transactionManager, otherwise Spring finds 2 beans and complains (regardless of @Primary!).
But now comes the funny part. When running the tests, everything runs smoothly: tests finish, DDLs are properly created for both the business and Batch/Task databases, and database reads work flawlessly, but business data is not persisted in my testing database, so the final assertThats fail when counting. If I inject PlatformTransactionManager or EntityManager into my test with @Autowired, everything indicates they are the proper ones. But if I debug within entityRepository.save and execute org.springframework.transaction.interceptor.TransactionAspectSupport.currentTransactionStatus(), it seems the DataSourceTransactionManager from Batch's configuration is taking over, so my custom exposed PlatformTransactionManager is not being used.
So I guess the problem is not whether my PlatformTransactionManager is the primary one, but that something is configuring my JPA layer's TransactionInterceptor to use the non-primary bean named transactionManager from Batch.
I also tried making my @Configuration implement TransactionManagementConfigurer and override PlatformTransactionManager annotationDrivenTransactionManager(), but still no luck.
Thus, I guess what I am asking is whether there is a way to configure the primary TransactionManager for the JPA Layer.
The problem comes when SimpleBatchConfiguration+DefaultBatchConfigurer expose a PlatformTransactionManager bean,
As you mentioned, this is indeed what was reported in BATCH-2788. The solution we are exploring is to expose the transaction manager bean only if Spring Batch creates it.
In the meantime you can set the property spring.main.allow-bean-definition-overriding=true to allow bean definition overriding and set the transaction manager you want Spring Batch to use with BatchConfigurer#getTransactionManager. In your case, it would be something like:
@Bean
public BatchConfigurer batchConfigurer() {
    return new DefaultBatchConfigurer(this.singletonNotExposedSpringDatasource()) {
        @Override
        public PlatformTransactionManager getTransactionManager() {
            return new MyTransactionManager();
        }
    };
}
Hope this helps.

Which is the best transaction manager to use in spring batch application in production environment?

Can you please let me know which transaction manager should be used in a Spring Batch application in production? I am using ResourcelessTransactionManager. Is that fine? I am facing this issue when reading data from an external Oracle DB:
[org.springframework.scheduling.support.TaskUtils$LoggingErrorHandler]
(pool-3130-thread-1) Unexpected error occurred in scheduled task.:
org.springframework.transaction.CannotCreateTransactionException:
Could not open JDBC Connection for transaction; nested exception is
java.sql.SQLRecoverableException: Closed Connection
@Bean
public ResourcelessTransactionManager resourcelessTransactionManager() {
    return new ResourcelessTransactionManager();
}
@Bean
public MapJobRepositoryFactoryBean mapJobRepositoryFactory(
        ResourcelessTransactionManager txManager) throws Exception {
    //LOGGER.info("Inside mapJobRepositoryFactory method");
    MapJobRepositoryFactoryBean factory = new MapJobRepositoryFactoryBean(txManager);
    factory.setTransactionManager(txManager);
    factory.setIsolationLevelForCreate("ISOLATION_READ_UNCOMMITTED");
    factory.afterPropertiesSet();
    return factory;
}
@Bean
public JobRepository jobRepository(
        MapJobRepositoryFactoryBean factory) throws Exception {
    //LOGGER.info("Inside jobRepository method");
    return factory.getObject();
}
@Bean
public ThreadPoolTaskExecutor taskExecutor() {
    ThreadPoolTaskExecutor taskExecutor = new ThreadPoolTaskExecutor();
    taskExecutor.setCorePoolSize(5);
    taskExecutor.setMaxPoolSize(10);
    taskExecutor.setQueueCapacity(30);
    return taskExecutor;
}
@Bean
public JobLauncher jobLauncher(JobRepository jobRepository, ThreadPoolTaskExecutor taskExecutor) {
    //LOGGER.info("Inside jobLauncher method");
    SimpleJobLauncher launcher = new SimpleJobLauncher();
    launcher.setTaskExecutor(taskExecutor);
    launcher.setJobRepository(jobRepository);
    final SimpleAsyncTaskExecutor simpleAsyncTaskExecutor = new SimpleAsyncTaskExecutor();
    launcher.setTaskExecutor(simpleAsyncTaskExecutor);
    return launcher;
}
As you said in your question, you are using an Oracle DB, so most likely you don't need a ResourcelessTransactionManager.
What your current code does is store job metadata in a Map-based in-memory structure. My guess is that you wouldn't do that in production and would instead store job metadata in the DB, for later analysis, restartability, etc.
In Spring Batch there are two kinds of transactions: one for your business data and methods, and a second for the job repository. If, say, you want to read from a file, write to a file, and store job metadata in a MapJobRepository, then your code would work fine.
But the moment you define a DataSource, you can't use ResourcelessTransactionManager. In fact, with databases you don't need to define any transaction manager on your own: Spring Batch chunk-oriented processing will take care of it and will store job metadata in the database.
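As a rough sketch of the database-backed alternative, assuming Spring Batch 4.x with @EnableBatchProcessing and a single pooled DataSource bean for the Oracle database defined elsewhere: extending DefaultBatchConfigurer with that DataSource lets Spring Batch create the JDBC job repository and a DataSourceTransactionManager itself, so neither ResourcelessTransactionManager nor MapJobRepositoryFactoryBean is declared. The class name AsyncJdbcBatchConfigurer is hypothetical, and only the launcher is customized to keep the asynchronous behaviour from the question.

```java
import javax.sql.DataSource;

import org.springframework.batch.core.configuration.annotation.DefaultBatchConfigurer;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.batch.core.launch.support.SimpleJobLauncher;
import org.springframework.core.task.SimpleAsyncTaskExecutor;
import org.springframework.stereotype.Component;

// Hypothetical class name; assumes Spring Batch 4.x and exactly one DataSource bean.
@Component
public class AsyncJdbcBatchConfigurer extends DefaultBatchConfigurer {

    public AsyncJdbcBatchConfigurer(DataSource dataSource) {
        // With a DataSource set, the configurer builds a JDBC-backed JobRepository
        // and a DataSourceTransactionManager instead of the Map-based variants.
        super(dataSource);
    }

    // Invoked after the job repository has been created, so getJobRepository()
    // already returns the persistent repository here.
    @Override
    protected JobLauncher createJobLauncher() throws Exception {
        SimpleJobLauncher launcher = new SimpleJobLauncher();
        launcher.setJobRepository(getJobRepository());
        launcher.setTaskExecutor(new SimpleAsyncTaskExecutor());
        launcher.afterPropertiesSet();
        return launcher;
    }
}
```

This assumes the Spring Batch metadata tables already exist in the target schema; the DDL scripts for the supported databases ship with the spring-batch-core artifact.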

Resources