How Spring Batch applications are started - spring-boot

Excuse me, this may be a noob question for some, but it just crossed my mind and I think it is worth fixing my ideas and getting a relevant explanation from some experts.
I just started the Spring Batch tutorial and I am confused about how these applications are started. Let's take this example from the official site:
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

@SpringBootApplication
public class Application {
    public static void main(String[] args) throws Exception {
        SpringApplication.run(Application.class, args);
    }
}
and here is the configuration class:
@Configuration
@EnableBatchProcessing
public class BatchConfiguration {

    @Autowired
    public JobBuilderFactory jobBuilderFactory;

    @Autowired
    public StepBuilderFactory stepBuilderFactory;

    @Autowired
    public DataSource dataSource;

    // tag::jobstep[]
    @Bean
    public Job importUserJob(JobCompletionNotificationListener listener) {
        return jobBuilderFactory.get("importUserJob")
                .incrementer(new RunIdIncrementer())
                .listener(listener)
                .flow(step1())
                .end()
                .build();
    }

    @Bean
    public Step step1() {
        return stepBuilderFactory.get("step1")
                .<Person, Person>chunk(10)
                .reader(reader())
                .processor(processor())
                .writer(writer())
                .build();
    }
    // end::jobstep[]
}
It was mentioned there that the main() method uses Spring Boot's SpringApplication.run() method to launch the application.
It is not clear to me how the job importUserJob gets executed, since there is no explicit code showing how this job is started; there is only configuration (declaration).
On the other hand, I found another example of how to start a Spring application, like this:
public static void main(String[] args) throws Exception {
    GenericApplicationContext context = new AnnotationConfigApplicationContext(MyBatchConfiguration.class);
    JobLauncher jobLauncher = (JobLauncher) context.getBean("jobLauncher");
    Job job = (Job) context.getBean("myJobName"); // this is the bean name of your job
    JobParameters jobParameters = new JobParametersBuilder().toJobParameters(); // added so the snippet compiles
    JobExecution execution = jobLauncher.run(job, jobParameters);
}
Here I can understand that the job is executed via the JobLauncher.

This is Spring's Inversion of Control (IoC, or Dependency Injection) in action. Essentially this is what happens:
The @SpringBootApplication annotation on Application tells Spring that it should search the package of Application and all sub-packages for classes annotated with relevant annotations.
The @Configuration annotation on BatchConfiguration is such a relevant annotation and tells Spring that this class contains information about how the application should be configured. This means that Spring will create an instance of this class and read all its fields and methods in search of more information.
The @Autowired annotations on the various fields in BatchConfiguration tell Spring that, when instantiating this class, these fields should be set to the applicable beans (injected).
The @Bean annotation on importUserJob tells Spring that this method produces a bean; Spring will call this method and register the resulting bean under the Job type (and any interfaces/superclasses it inherits) and under the name importUserJob.
On startup, Spring Boot's batch auto-configuration registers a runner (JobLauncherCommandLineRunner) that collects every bean of type Job from the application context and launches it. That is the "missing" launching code you were looking for; it lives inside Spring Boot rather than in your application.
The @EnableBatchProcessing annotation tells Spring to import additional configuration classes that provide the batch infrastructure beans (such as the builder factories requested for injection with @Autowired).
IoC can be a pretty advanced concept to grasp in the beginning, and Spring definitely isn't the easiest IoC framework out there - though it is my personal favourite - but if you want to know more about the Spring Framework and IoC, I suggest you start by reading the documentation and then continue by searching for tutorials.
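To make the hidden launching step concrete, here is a minimal sketch of what Spring Boot's runner does conceptually (this is not Boot's actual source, just an illustration, and the class name is made up):
@Component
public class LaunchAllJobsRunner implements CommandLineRunner {

    @Autowired
    private JobLauncher jobLauncher;

    @Autowired
    private List<Job> jobs; // Spring injects every bean of type Job found in the context

    @Override
    public void run(String... args) throws Exception {
        // launch each configured job, much like the auto-configured runner does on startup
        for (Job job : jobs) {
            JobParameters parameters = new JobParametersBuilder()
                    .addDate("launchDate", new Date())
                    .toJobParameters();
            jobLauncher.run(job, parameters);
        }
    }
}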

Related

Spring Batch/Data JPA application not persisting/saving data to Postgres database when calling JPA repository (save, saveAll) methods

I am nearly at wits' end. I have read/googled endlessly so far and tried the solutions from all the Google/Stack Overflow posts that have this similar issue (there are quite a few). Some seemed promising, but nothing has worked for me yet, though I have made some progress, and I believe I am on the right track (at this point I believe it's something with the transaction manager and a possible conflict between Spring Batch and Spring Data JPA).
References:
Spring boot repository does not save to the DB if called from scheduled job
JpaItemWriter: no transaction is in progress
Similar to the aforementioned posts, I have a Spring Boot application that uses Spring Batch and Spring Data JPA. It reads comma-delimited data from a .csv file, does some processing/transformation, and attempts to persist/save to the database using the JPA repository methods, specifically .saveAll() here (I also tried the .save() method and it did the same thing), since I'm saving a List<MyUserDefinedDataType> of a user-defined data type (batch insert).
Now, my code was working fine on Spring Boot starter 1.5.9.RELEASE, but I recently attempted to upgrade to 2.X.X and found, after countless hours of debugging, that only version 2.2.0.RELEASE would persist/save data to the database. So an upgrade to >= 2.2.1.RELEASE breaks persistence. Everything is read fine from the .csv; it's just that the first time the code flow hits a JPA repository method like .save() or .saveAll(), the application keeps running but nothing gets persisted. I also noticed the Hikari pool logs "active=1 idle=4", whereas the same log on version 1.5.9.RELEASE says "active=0 idle=5" immediately after persisting the data, so the application is definitely hanging. I went into the debugger and saw that, after jumping into the repository calls, execution goes into an almost infinite cycle through the Spring AOP libraries and such (all third party), and I don't believe it ever comes back to the real application/business logic that I wrote.
3c22fb53ed64 2021-05-20 23:53:43.909 DEBUG
[HikariPool-1 housekeeper] com.zaxxer.hikari.pool.HikariPool - HikariPool-1 - Pool stats (total=5, active=1, idle=4, waiting=0)
Anyway, I tried the most common solutions that worked for other people, which were:
Defining a JpaTransactionManager @Bean and injecting it into the step, while keeping the JobRepository on the PlatformTransactionManager. This did not work. Then I also tried using the JpaTransactionManager in the JobRepository @Bean as well; this also did not work.
Defining a @RestController endpoint in my application to trigger this job manually, instead of doing it from my main Application.java class (I talk about this more below). And as in one of the posts I linked above, the data persisted correctly to the database, even on Spring >= 2.2.1, which makes me suspect even more that something with the Spring Batch persistence/entity/transaction managers is messed up.
The code is basically this:
BatchConfiguration.java
@Configuration
@EnableBatchProcessing
@Import({DatabaseConfiguration.class})
public class BatchConfiguration {

    // builder factories provided by @EnableBatchProcessing (declarations added here so the snippet compiles)
    @Autowired
    private JobBuilderFactory jobBuilderFactory;
    @Autowired
    private StepBuilderFactory stepBuilderFactory;

    // Datasource is a Postgres DB defined in a separate IntelliJ project that I add to my pom.xml
    DataSource dataSource;

    @Autowired
    public BatchConfiguration(@Qualifier("dataSource") DataSource dataSource) {
        this.dataSource = dataSource;
    }

    @Bean
    @Primary
    public JpaTransactionManager jpaTransactionManager() {
        final JpaTransactionManager tm = new JpaTransactionManager();
        tm.setDataSource(dataSource);
        return tm;
    }

    @Bean
    public JobRepository jobRepository(PlatformTransactionManager transactionManager) throws Exception {
        JobRepositoryFactoryBean jobRepositoryFactoryBean = new JobRepositoryFactoryBean();
        jobRepositoryFactoryBean.setDataSource(dataSource);
        jobRepositoryFactoryBean.setTransactionManager(transactionManager);
        jobRepositoryFactoryBean.setDatabaseType("POSTGRES");
        return jobRepositoryFactoryBean.getObject();
    }

    @Bean
    public JobLauncher jobLauncher(JobRepository jobRepository) {
        SimpleJobLauncher simpleJobLauncher = new SimpleJobLauncher();
        simpleJobLauncher.setJobRepository(jobRepository);
        return simpleJobLauncher;
    }

    @Bean(name = "jobToLoadTheData")
    public Job jobToLoadTheData() {
        return jobBuilderFactory.get("jobToLoadTheData")
                .start(stepToLoadData())
                .listener(new CustomJobListener())
                .build();
    }

    @Bean
    @StepScope
    public TaskExecutor taskExecutor() {
        ThreadPoolTaskExecutor threadPoolTaskExecutor = new ThreadPoolTaskExecutor();
        threadPoolTaskExecutor.setCorePoolSize(maxThreads);
        threadPoolTaskExecutor.setThreadGroupName("taskExecutor-batch");
        return threadPoolTaskExecutor;
    }

    @Bean(name = "stepToLoadData")
    public Step stepToLoadData() {
        // maxThreads and chunkSize, as well as the reader/processor/writer beans and
        // listener classes used below, are defined elsewhere in the project (omitted here)
        TaskletStep step = stepBuilderFactory.get("stepToLoadData")
                .transactionManager(jpaTransactionManager())
                .<List<FieldSet>, List<myCustomPayloadRecord>>chunk(chunkSize)
                .reader(myCustomFileItemReader(OVERRIDDEN_BY_EXPRESSION))
                .processor(myCustomPayloadRecordItemProcessor())
                .writer(myCustomerWriter())
                .faultTolerant()
                .skipPolicy(new AlwaysSkipItemSkipPolicy())
                .skip(DataValidationException.class)
                .listener(new CustomReaderListener())
                .listener(new CustomProcessListener())
                .listener(new CustomWriteListener())
                .listener(new CustomSkipListener())
                .taskExecutor(taskExecutor())
                .throttleLimit(maxThreads)
                .build();
        step.registerStepExecutionListener(stepExecutionListener());
        step.registerChunkListener(new CustomChunkListener());
        return step;
    }
}
My main method:
Application.java
@SpringBootApplication // (enclosing class declaration shown for completeness)
public class Application {

    @Autowired
    @Qualifier("jobToLoadTheData")
    private Job loadTheData;

    @Autowired
    private JobLauncher jobLauncher;

    @PostConstruct
    public void launchJob() throws JobParametersInvalidException, JobExecutionAlreadyRunningException, JobRestartException, JobInstanceAlreadyCompleteException {
        JobParameters parameters = new JobParametersBuilder().addDate("random", new Date()).toJobParameters();
        jobLauncher.run(loadTheData, parameters);
    }

    public static void main(String[] args) {
        SpringApplication.run(Application.class, args);
    }
}
Now, normally I read this .csv from an Amazon S3 bucket, but since I'm testing locally, I am just placing the .csv in the project directory and reading it directly by triggering the job from the Application.java main class (as you can see above). Also, I do have some other beans defined in this BatchConfiguration class, but I don't want to over-complicate this post more than it already is, and from the googling I've done, the problem is most likely in the methods I posted (hopefully).
Also, I would like to point out that, similar to one of the other posts on Google/Stack Overflow from a user with a similar problem, I created a @RestController endpoint that simply calls the .run() method of the JobLauncher, passing in the jobToLoadTheData bean, and it triggers the batch insert. Guess what? Data persists to the database just fine, even on Spring >= 2.2.1.
What is going on here? Is this a clue? Is something funky going wrong with some kind of entity or transaction manager? I'll take any advice or tips! I can provide any more information that you may need, so please just ask.
You are defining a bean of type JobRepository and expecting it to be picked up by Spring Batch. This is not correct. You need to provide a BatchConfigurer and override getJobRepository. This is explained in the reference documentation:
You can customize any of these beans by creating a custom implementation of the
BatchConfigurer interface. Typically, extending the DefaultBatchConfigurer
(which is provided if a BatchConfigurer is not found) and overriding the required
getter is sufficient.
This is also documented in the Javadoc of @EnableBatchProcessing. So in your case, you need to define a bean of type BatchConfigurer and override getJobRepository and getTransactionManager, something like:
@Bean
public BatchConfigurer batchConfigurer(EntityManagerFactory entityManagerFactory, DataSource dataSource) {
    return new DefaultBatchConfigurer(dataSource) {
        @Override
        public PlatformTransactionManager getTransactionManager() {
            return new JpaTransactionManager(entityManagerFactory);
        }

        @Override
        public JobRepository getJobRepository() {
            try {
                JobRepositoryFactoryBean jobRepositoryFactoryBean = new JobRepositoryFactoryBean();
                jobRepositoryFactoryBean.setDataSource(dataSource);
                jobRepositoryFactoryBean.setTransactionManager(getTransactionManager());
                // set other properties
                jobRepositoryFactoryBean.afterPropertiesSet(); // initialize manually, since the factory bean is created outside the container
                return jobRepositoryFactoryBean.getObject();
            } catch (Exception e) {
                // getObject() declares a checked exception that the overridden method cannot rethrow
                throw new IllegalStateException("Unable to create the JobRepository", e);
            }
        }
    };
}
In a Spring Boot context, you could also override the createTransactionManager and createJobRepository methods of org.springframework.boot.autoconfigure.batch.JpaBatchConfigurer if needed.

How to test when I don't want to trigger the whole thing

A Spring Boot application:
@SpringBootApplication
@EnableScheduling
@Slf4j
public class MyApplication {

    @Autowired
    private ApplicationEventPublisher publisher;
    ...

    @Bean
    public CommandLineRunner commandLineRunner(ApplicationContext ctx) {
        ...
        // read data from a file and publish an event
    }
}
For an integration test, I have something typical:
@SpringBootTest
public class TestingMyApplicationTests {
    ...
}
After I start a test case in the class, the whole chain of events occurs: the file is read, events are published, and an event listener acts accordingly.
What is the best approach to avoid such a chain of events occurring while running a test?
If you want to avoid starting the whole Spring context for all of your integration tests, you can take a look at the test annotations that create a sliced context:
@WebMvcTest creates a Spring context with only MVC-related beans
@DataJpaTest creates a Spring context with only JPA/JDBC-related beans
etc.
In addition to this, I would also remove the CommandLineRunner from your main Spring Boot entry point class; otherwise the annotations above would also trigger its logic.
You can instead move it out to a separate @Component class:
@Component
public class WhateverInitializer implements CommandLineRunner {

    @Autowired
    private ApplicationEventPublisher publisher;
    // ...

    @Override
    public void run(String... args) throws Exception {
        ...
        // read data from a file and publish an event
    }
}
Apart from this, you can also use @Profile("production") on your Spring beans so that they are only created when a specific profile is active. This way you can include or exclude them from all your integration tests if you don't want, e.g., this startup logic to always run (see the sketch below).
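For illustration, a minimal sketch of the @Profile approach (the class and profile names are just examples): the runner bean only exists when the production profile is active, so a test that activates another profile never triggers it:
@Component
@Profile("production")
public class WhateverInitializer implements CommandLineRunner {

    @Override
    public void run(String... args) throws Exception {
        // read data from a file and publish an event
    }
}

@SpringBootTest
@ActiveProfiles("test") // any profile other than "production" skips the runner
public class TestingMyApplicationTests {
    // the context starts without executing WhateverInitializer
}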

Run a Spring Batch job from the controller

I am trying to run my batch job from a controller. It will be fired either by a cron job or by accessing a specific link.
I am using Spring Boot, no XML, just annotations.
In my current setup I have a service that contains the following beans:
@EnableBatchProcessing
@PersistenceContext
public class batchService {

    @Bean
    public ItemReader<Somemodel> reader() {
        ...
    }

    @Bean
    public ItemProcessor<Somemodel, Somemodel> processor() {
        return new SomemodelProcessor();
    }

    @Bean
    public ItemWriter writer() {
        return new CustomItemWriter();
    }

    @Bean
    public Job importUserJob(JobBuilderFactory jobs, Step step1) {
        return jobs.get("importUserJob")
                .incrementer(new RunIdIncrementer())
                .flow(step1)
                .end()
                .build();
    }

    @Bean
    public Step step1(StepBuilderFactory stepBuilderFactory,
                      ItemReader<Somemodel> reader,
                      ItemWriter<Somemodel> writer,
                      ItemProcessor<Somemodel, Somemodel> processor) {
        return stepBuilderFactory.get("step1")
                .<Somemodel, Somemodel>chunk(100)
                .reader(reader)
                .processor(processor)
                .writer(writer)
                .build();
    }
}
As soon as I put the @Configuration annotation on top of my batchService class, the job starts as soon as I run the application. It finishes successfully; everything is fine. Now I am trying to remove the @Configuration annotation and run the job whenever I want. Is there a way to fire it from the controller?
Thanks!
You need to create an application.yml file in src/main/resources and add the following configuration:
spring.batch.job.enabled: false
With this change, the batch job will no longer execute automatically when Spring Boot starts; it will only be triggered when the specific link is accessed.
Check out my sample code here:
https://github.com/pauldeng/aws-elastic-beanstalk-worker-spring-boot-spring-batch-template
You can launch a batch job programmatically using the JobLauncher, which can be injected into your controller. See the Spring Batch documentation for more details, including this example controller:
@Controller
public class JobLauncherController {

    @Autowired
    JobLauncher jobLauncher;

    @Autowired
    Job job;

    @RequestMapping("/jobLauncher.html")
    public void handle() throws Exception {
        jobLauncher.run(job, new JobParameters());
    }
}
Since you're using Spring Boot, you should leave the @Configuration annotation in there and instead configure your application.properties to not launch the jobs on startup. You can read more about the autoconfiguration options for running jobs at startup (or not) in the Spring Boot documentation here: http://docs.spring.io/spring-boot/docs/current-SNAPSHOT/reference/htmlsingle/#howto-execute-spring-batch-jobs-on-startup
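In .properties format, that is the same single setting as in the application.yml example in the previous answer:
spring.batch.job.enabled=false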

How to select which Spring Batch job to run based on an application argument - Spring Boot Java config

I have two independent Spring Batch jobs in the same project because I want them to use the same infrastructure-related beans. Everything is configured in Java. I would like to know whether there is a proper way to start the jobs independently, based for example on the first application argument passed to the main method. If I run SpringApplication.run, only the second job gets executed, as if by magic.
The main method looks like:
@ComponentScan
@EnableAutoConfiguration
public class Application {

    public static void main(String[] args) {
        SpringApplication app = new SpringApplication(Application.class);
        app.setWebEnvironment(false);
        ApplicationContext ctx = app.run(args);
    }
}
and the two jobs are configured as presented in the Spring Batch Getting Started tutorial on Spring.io. Here is the configuration file of the first job; the second is configured the same way.
@Configuration
@EnableBatchProcessing
@Import({StandaloneInfrastructureConfiguration.class, ServicesConfiguration.class})
public class AddPodcastJobConfiguration {

    @Autowired
    private JobBuilderFactory jobs;

    @Autowired
    private StepBuilderFactory stepBuilderFactory;

    //reader, writer, processor...
}
To enable modularization I created an AppConfig class, where I define factories for the two jobs:
@Configuration
@EnableBatchProcessing(modular = true)
public class AppConfig {

    @Bean
    public ApplicationContextFactory addNewPodcastJobs() {
        return new GenericApplicationContextFactory(AddPodcastJobConfiguration.class);
    }

    @Bean
    public ApplicationContextFactory newEpisodesNotificationJobs() {
        return new GenericApplicationContextFactory(NotifySubscribersJobConfiguration.class);
    }
}
P.S. I am new to Java-based Spring configuration, Spring Boot, and Spring Batch...
Just set the spring.batch.job.names=myJob property. You could set it as a system property when you launch your application (-Dspring.batch.job.names=myJob). If this property is defined, the spring-batch starter will only launch the jobs it names.
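Since Spring Boot also maps command-line arguments onto configuration properties, the same restriction can be applied without a system property, for example (the jar name here is hypothetical):
java -jar my-batch-app.jar --spring.batch.job.names=addNewPodcastJob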
To run the jobs you like from the main method, you can load the required job configuration bean and the JobLauncher from the application context and then run it:
@ComponentScan
@EnableAutoConfiguration
public class ApplicationWithJobLauncher {

    public static void main(String[] args) throws BeansException, JobExecutionAlreadyRunningException, JobRestartException, JobInstanceAlreadyCompleteException, JobParametersInvalidException, InterruptedException {
        Log log = LogFactory.getLog(ApplicationWithJobLauncher.class);
        SpringApplication app = new SpringApplication(ApplicationWithJobLauncher.class);
        app.setWebEnvironment(false);
        ConfigurableApplicationContext ctx = app.run(args);
        JobLauncher jobLauncher = ctx.getBean(JobLauncher.class);
        JobParameters jobParameters = new JobParametersBuilder()
                .addDate("date", new Date())
                .toJobParameters();
        if ("1".equals(args[0])) {
            // addNewPodcastJob
            Job addNewPodcastJob = ctx.getBean("addNewPodcastJob", Job.class);
            JobExecution jobExecution = jobLauncher.run(addNewPodcastJob, jobParameters);
        } else {
            jobLauncher.run(ctx.getBean("newEpisodesNotificationJob", Job.class), jobParameters);
        }
        System.exit(0);
    }
}
What was causing a lot of my confusion was that the second job was executed even though the first job seemed to be "picked up" by the runner... Well, the problem was that in both jobs' configuration files I used the standard method names writer(), reader(), processor() and step(), and the ones from the second job silently "overwrote" the ones from the first job without any warning, because the bean name defaults to the method name. Giving the bean methods job-specific names, as sketched below, avoids this.
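To illustrate the fix (this is just a sketch with made-up item types, not the project's actual code): give each job's bean methods distinct names, so that the two configurations no longer register beans under the same default names:
@Configuration
@EnableBatchProcessing
public class AddPodcastJobConfiguration {
    @Bean
    public ItemReader<Podcast> addPodcastReader() { ... } // instead of reader()
    @Bean
    public Step addPodcastStep() { ... } // instead of step()
}

@Configuration
@EnableBatchProcessing
public class NotifySubscribersJobConfiguration {
    @Bean
    public ItemReader<Subscription> notifySubscribersReader() { ... } // instead of reader()
    @Bean
    public Step notifySubscribersStep() { ... } // instead of step()
}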
I did, though, use an application config class with @EnableBatchProcessing(modular=true) (the same AppConfig class shown in the question above), which I thought would be picked up magically by Spring Boot.
I will write a blog post about it when it is ready, but until then the code is available at https://github.com/podcastpedia/podcastpedia-batch (work/learning in progress)..
There is also the CommandLineJobRunner, which may be helpful. From its Javadoc:
Basic launcher for starting jobs from the command line
Spring Batch auto-configuration is enabled by adding @EnableBatchProcessing (from Spring Batch) somewhere in your context. By default it executes all Jobs in the application context on startup (see JobLauncherCommandLineRunner for details). You can narrow down to a specific job or jobs by specifying spring.batch.job.names (comma-separated job name patterns).
-- Spring Boot Doc
Or disable the auto-execution and run the jobs programmatically from the context using a JobLauncher, based on the args passed to the main method.

Using Quartz with Spring Boot - injection order changes based upon return type of method

I am trying to get Quartz working with Spring Boot, but I am not managing to get the injection working correctly. I am basing my code on the example shown here.
Here is my boot class:
@ComponentScan
@EnableAutoConfiguration
public class MyApp {

    @Autowired
    private DataSource dataSource;

    @Bean
    public JobFactory jobFactory() {
        return new SpringBeanJobFactory();
    }

    @Bean
    public SchedulerFactoryBean quartz() {
        final SchedulerFactoryBean bean = new SchedulerFactoryBean();
        bean.setJobFactory(jobFactory());
        bean.setDataSource(dataSource);
        bean.setConfigLocation(new ClassPathResource("quartz.properties"));
        ...
        return bean;
    }

    public static void main(String[] args) {
        SpringApplication.run(MyApp.class, args);
    }
}
When the quartz() method is invoked by Spring, dataSource is null. However, if I change the return type of the quartz() method to Object, dataSource is correctly injected with the data source created by reading application.properties. The bean is built, everything works, and I get a subsequent error saying that Quartz has been unable to retrieve any jobs from the database, which is normal as I haven't put the schema in place yet.
I have tried adding a @DependsOn("dataSource") annotation on the quartz() method, but that doesn't make any difference.
This class is the only class annotated with @Configuration.
Here are my dependencies (I'm using Maven but present them like this for space reasons):
org.springframework.boot:spring-boot-starter-actuator:1.0.0.RC4
org.springframework.boot:spring-boot-starter-jdbc:1.0.0.RC4
org.springframework.boot:spring-boot-starter-web:1.0.0.RC4
org.quartz-scheduler:quartz:2.2.1
org.springframework:spring-support:2.0.8
And the parent:
org.springframework.boot:spring-boot-starter-parent:1.0.0.RC4
Finally, the content of quartz.properties:
org.quartz.threadPool.threadCount = 3
org.quartz.jobStore.class=org.springframework.scheduling.quartz.LocalDataSourceJobStore
org.quartz.jobStore.driverDelegateClass=org.quartz.impl.jdbcjobstore.PostgreSQLDelegate
What am I doing wrong?
(I have seen this question, but that question initialises the datasource in the @Configuration class)
Your app starts up (with a schema error, which is expected) if I use "org.springframework:spring-context-support:4.0.2.RELEASE" ("org.springframework:spring-support:2.0.8", if it ever existed, must be nearly 10 years old by now and certainly isn't compatible with Boot or Quartz 2).
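That is, in the dependency list from the question, replace
org.springframework:spring-support:2.0.8
with
org.springframework:spring-context-support:4.0.2.RELEASE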
