How does Spring Batch share data between jobs - spring

I have a question about Spring Batch jobs.
I want to share data from one job with another job in the same execution context. Is it possible? If so, how?
My requirement is caching. I have a file where some data is stored. My job runs daily and needs the data from that file. I don't want my job to read the file every day; instead, I want to store the file's data in a cache (a HashMap), so that when the same job runs the next day it uses the data from the cache only. Is this possible in Spring Batch?
Your suggestions are welcome.

You can use a Spring bean with an init method, which initializes your cache at startup.
Add the bean to your application context:
<bean id="yourCacheBean" class="yourpackage.YourCacheBean" init-method="initialize" />
YourCacheBean looks like:
public class YourCacheBean {

    private Map<Object, Object> yourCache;

    public void initialize() {
        // TODO: Initialize your cache, e.g. read the file into yourCache
    }
}
Wire the cache bean into your itemReader, itemProcessor, or itemWriter in job.xml:
<bean id="exampleProcessor" class="yourpackage.ExampleProcessor" scope="step">
    <property name="cacheBean" ref="yourCacheBean" />
</bean>
ExampleProcessor looks like:
public class ExampleProcessor implements ItemProcessor<String, String> {

    private YourCacheBean cacheBean;

    public String process(String item) {
        // Use cacheBean here to look up the cached file data
        return "";
    }

    public void setCacheBean(YourCacheBean cacheBean) {
        this.cacheBean = cacheBean;
    }
}

Create a job to import the file into a database; other jobs can then use the database as a cache.
Another way would be to read the file into a Map<>, serialize the object to a file, and de-serialize it when needed (but I still prefer a database as the cache).
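A minimal sketch of that serialize/de-serialize variant, assuming String keys and values (class, method, and file names are illustrative):

import java.io.*;
import java.util.HashMap;
import java.util.Map;

public class FileCacheStore {

    // Write the populated map to disk so the next run can skip parsing the source file.
    static void save(Map<String, String> cache, File target) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(target))) {
            out.writeObject(new HashMap<>(cache));
        }
    }

    // Restore the map on the next run; fall back to re-reading the source file if it is absent.
    @SuppressWarnings("unchecked")
    static Map<String, String> load(File source) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(source))) {
            return (Map<String, String>) in.readObject();
        }
    }
}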

Spring has a cache annotation (@Cacheable) that may help in this kind of case, and it is really easy to implement. The first call to a method is executed; afterwards, if you call the same method with exactly the same arguments, the value is returned from the cache.
Here you have a little tutorial: http://www.baeldung.com/spring-cache-tutorial
In your case, if the call that reads the file always uses the same arguments, this will work as you want. Just take care of the TTL.
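A minimal sketch of that approach (service, method, and cache names are assumptions; caching must be switched on with @EnableCaching and a configured CacheManager):

import java.util.HashMap;
import java.util.Map;
import org.springframework.cache.annotation.Cacheable;
import org.springframework.stereotype.Service;

@Service
public class FileDataService {

    // The first call with a given path reads the file; later calls with the
    // same path are served from the "fileData" cache without touching disk.
    @Cacheable("fileData")
    public Map<String, String> loadFile(String path) {
        Map<String, String> data = new HashMap<>();
        // ... read and parse the file into data ...
        return data;
    }
}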

Related

Pass data from one writer to another writer after reading from DB

I have to create a batch job where I need to fetch data from one DB and, after processing, dump that data into another DB, where an auto-generated ID is assigned to the persisted data. I then need to send that data, along with the generated ID, to a Solace queue.
Reader (DB1) --data1--> Processor --data2--> Writer (DB2) --data3--> Writer (Solace Publisher)
I am using Spring Boot 2.2.5.RELEASE and spring-boot-starter-batch.
I have created a job having one step that reads data from DB1 and writes it to DB2 via RepositoryItemReader and RepositoryItemWriter respectively. This is working fine.
The next task is to send the persisted data, with its generated IDs, to the Solace stream (using spring-cloud-starter-stream-solace).
I have the questions below. Please assist, as I am totally new to Spring Batch.
How can I get the complete record after it is saved to DB2, based on some parameter? Do I have to write my own RepositoryItemWriter holding the StepExecution context, or can I somehow use the existing RepositoryItemWriter?
Once I have the record, I need to use the Solace stream, where I have a publish method that expects the record to publish as an argument. I think I again need to write my own ItemWriter, and either use the record passed on from the above RepositoryItemWriter via the StepExecutionContext, or query DB2 directly from there based on some parameter?
In either case I need to use the step execution context, but can I use the available RepositoryItemWriter or do I have to write my own?
Is there any other concept that is handy here instead of the above approaches?
Passing data to future steps is a common pattern in Spring Batch. According to the documentation (https://docs.spring.io/spring-batch/docs/current/reference/html/common-patterns.html#passingDataToFutureSteps), you can use the step's ExecutionContext to store and retrieve your generated IDs. In your case the writers are also listeners, with before-step methods annotated with @BeforeStep. For example:
public class DB2ItemWriter implements ItemWriter<Object> {

    private StepExecution stepExecution;

    public void write(List<? extends Object> items) throws Exception {
        // ... persist the items and collect their generated IDs into `ids` ...
        ExecutionContext stepContext = this.stepExecution.getExecutionContext();
        stepContext.put("generatedIds", ids);
    }

    @BeforeStep
    public void saveStepExecution(StepExecution stepExecution) {
        this.stepExecution = stepExecution;
    }
}
and then you retrieve the IDs in the next writer:
public class SolacePublisherItemWriter implements ItemWriter<Object> {

    private List<Long> generatedIds; // ID type assumed; adjust to your entity

    public void write(List<? extends Object> items) throws Exception {
        // ... publish the items together with the retrieved IDs ...
    }

    @BeforeStep
    @SuppressWarnings("unchecked")
    public void retrieveGeneratedIds(StepExecution stepExecution) {
        ExecutionContext stepExecutionContext = stepExecution.getExecutionContext();
        this.generatedIds = (List<Long>) stepExecutionContext.get("generatedIds");
    }
}
"I have created a job having one step that reads data from DB1 and writes data to DB2 via RepositoryItemReader and RepositoryItemWriter respectively. This is working fine."
I would add a second step that reads the data from the table (in which records have been persisted by step 1 and have their IDs generated) and pushes it to Solace using a custom writer.
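A sketch of what that second-step writer could look like (MyRecord and SolacePublisher are placeholders for your entity and for whatever component wraps the Solace stream binding's publish method; the write signature matches Spring Batch 4.x, which Boot 2.2.5 brings in):

import java.util.List;
import org.springframework.batch.item.ItemWriter;

public class SolacePublisherItemWriter implements ItemWriter<MyRecord> {

    // Placeholder for the component that exposes the publish method you mentioned.
    private final SolacePublisher publisher;

    public SolacePublisherItemWriter(SolacePublisher publisher) {
        this.publisher = publisher;
    }

    @Override
    public void write(List<? extends MyRecord> items) {
        for (MyRecord item : items) {
            publisher.publish(item); // each item was read back from DB2, so it already carries its generated ID
        }
    }
}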

Spring Batch CompositeItemProcessor get value from other delegates

I have a CompositeItemProcessor as below:
<bean id="compositeItemProcessor" class="org.springframework.batch.item.support.CompositeItemProcessor">
    <property name="delegates">
        <list>
            <bean class="com.example.itemProcessor1"/>
            <bean class="com.example.itemProcessor2"/>
            <bean class="com.example.itemProcessor3"/>
            <bean class="com.example.itemProcessor4"/>
        </list>
    </property>
</bean>
The issue I have is that within itemProcessor4 I require values from both itemProcessor1 and itemProcessor3.
I have looked at using the step execution context, but this does not work as this is within one step. I have also looked at using @AfterProcess within ItemProcessor1, but this does not work as it isn't called until after ItemProcessor4.
What is the correct way to share data between delegates in a CompositeItemProcessor?
Would a util:map that is updated in itemProcessor1 and read in itemProcessor4 be a solution, given that the commit-interval is set to 1?
Using the step execution context won't work, as it is persisted at chunk boundaries, so it can't be shared between processors within the same chunk.
AfterProcess is called after the registered item processor, which is the composite processor in your case (so after ItemProcessor4). This won't work either.
The only option left is to use some data holder object that you share between the item processors.
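A minimal sketch of such a holder (names are illustrative); the same instance would be injected into itemProcessor1, which fills it, and itemProcessor4, which reads it:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class SharedDataHolder {

    private final Map<String, Object> values = new ConcurrentHashMap<>();

    public void put(String key, Object value) {
        values.put(key, value);
    }

    public Object get(String key) {
        return values.get(key);
    }

    // Clear at the start of each item (e.g. in itemProcessor1) so values
    // from one item never leak into the next.
    public void clear() {
        values.clear();
    }
}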
Hope this helps.
This page states that there are two types of ExecutionContext, one at step level and one at job level:
https://docs.spring.io/spring-batch/trunk/reference/html/patterns.html#passingDataToFutureSteps
You should be able to get the job context from the step context and set keys on it.
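For example, a delegate could grab the job-level context in a @BeforeStep callback and write to it (the key name is illustrative; the API calls are as in Spring Batch 4.x):

import org.springframework.batch.core.StepExecution;
import org.springframework.batch.core.annotation.BeforeStep;
import org.springframework.batch.item.ExecutionContext;

public class ItemProcessor1Support {

    private ExecutionContext jobContext;

    @BeforeStep
    public void saveJobContext(StepExecution stepExecution) {
        // The job-level context is shared across all steps of the job execution.
        this.jobContext = stepExecution.getJobExecution().getExecutionContext();
    }

    public void shareValue(Object value) {
        jobContext.put("itemProcessor1Result", value);
    }
}

Note the caveat from the accepted answer: contexts are only persisted at chunk boundaries, although within a single running JVM values put here should be visible to later delegates immediately.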
I had a similar requirement in my application too. I went with creating a data transfer object, ItemProcessorDto, which is shared by all the ItemProcessors. You can store data in this DTO in the first processor, and all the remaining processors can get the information out of it. In addition, any ItemProcessor can update or retrieve the data from the DTO.
Below is a code snippet:
@Bean
public ItemProcessor1<ItemProcessorDto> itemProcessor1() {
    log.info("Generating ItemProcessor1");
    return new ItemProcessor1();
}

@Bean
public ItemProcessor2<ItemProcessorDto> itemProcessor2() {
    log.info("Generating ItemProcessor2");
    return new ItemProcessor2();
}

@Bean
public ItemProcessor3<ItemProcessorDto> itemProcessor3() {
    log.info("Generating ItemProcessor3");
    return new ItemProcessor3();
}

@Bean
public ItemProcessor4<ItemProcessorDto> itemProcessor4() {
    log.info("Generating ItemProcessor4");
    return new ItemProcessor4();
}

@Bean
@StepScope
public CompositeItemProcessor<ItemProcessorDto, ItemProcessorDto> compositeItemProcessor() {
    log.info("Generating CompositeItemProcessor");
    CompositeItemProcessor<ItemProcessorDto, ItemProcessorDto> compositeItemProcessor = new CompositeItemProcessor<>();
    compositeItemProcessor.setDelegates(Arrays.asList(itemProcessor1(), itemProcessor2(), itemProcessor3(), itemProcessor4()));
    return compositeItemProcessor;
}
@Data
public class ItemProcessorDto {

    private List<String> sharedData_1;
    private Map<String, String> sharedData_2;
}

Reset state before each Spring scheduled (@Scheduled) run

I have a Spring Boot Batch application that needs to run daily. It reads a daily file, does some processing on its data, and writes the processed data to a database. Along the way, the application holds some state such as the file to be read (stored in the FlatFileItemReader and JobParameters), the current date and time of the run, some file data for comparison between read items, etc.
One option for scheduling is to use Spring's @Scheduled, such as:
@Scheduled(cron = "${schedule}")
public void runJob() throws Exception {
    jobRunner.runJob(); // runs the batch job by calling jobLauncher.run(job, jobParameters);
}
The problem here is that state is maintained between runs, so I have to update the file to be read, update the current date and time of the run, clear the cached file data, etc.
Another option is to run the application via a Unix cron job. This would obviously meet the need to clear state between runs, but I prefer to tie the job scheduling to the application instead of the OS (and prefer it to be OS-agnostic). Can the application state be reset between @Scheduled runs?
You could always move the code that performs your task (and more importantly, keeps your state) into a prototype-scoped bean. Then you can retrieve a fresh instance of that bean from the application context every time your scheduled method is run.
Example
I created a GitHub repository which contains a working example of what I'm talking about, but the gist of it is in these two classes:
ScheduledTask.java
Notice the @Scope annotation. It specifies that this component should not be a singleton. The randomNumber field represents the state that we want to reset with every invocation. "Reset" in this case means that a new random number is generated, just to show that it does change.
@Component
@Scope(ConfigurableBeanFactory.SCOPE_PROTOTYPE)
class ScheduledTask {

    private double randomNumber = Math.random();

    void execute() {
        System.out.printf(
                "Executing task from %s. Random number is %f%n",
                this,
                randomNumber
        );
    }
}
TaskScheduler.java
By autowiring the ApplicationContext, you can use it inside the scheduleTask method to retrieve a new instance of ScheduledTask.
@Component
public class TaskScheduler {

    @Autowired
    private ApplicationContext applicationContext;

    @Scheduled(cron = "0/5 * * * * *")
    public void scheduleTask() {
        ScheduledTask task = applicationContext.getBean(ScheduledTask.class);
        task.execute();
    }
}
Output
When running the code, here's an example of what it looks like:
Executing task from com.thomaskasene.example.schedule.reset.ScheduledTask@329c8d3d. Random number is 0.007027
Executing task from com.thomaskasene.example.schedule.reset.ScheduledTask@3c5b751e. Random number is 0.145520
Executing task from com.thomaskasene.example.schedule.reset.ScheduledTask@3864e64d. Random number is 0.268644
Thomas' approach seems to be a reasonable solution, which is why I upvoted it. What is missing is how this can be applied in the case of a Spring Batch job, so I adapted his example a little:
@Component
public class JobCreatorComponent {

    @Bean
    @Scope(ConfigurableBeanFactory.SCOPE_PROTOTYPE)
    public Job createJob() {
        // use the jobBuilderFactory to create your job as usual
        return jobBuilderFactory.get() ...
    }
}
Your component with the launch method:
@Component
public class ScheduledLauncher {

    @Autowired
    private ... jobRunner;

    @Autowired
    private JobCreatorComponent creator;

    @Scheduled(cron = "${schedule}")
    public void runJob() throws Exception {
        // it would probably make sense to check the applicationContext and
        // remove any existing job
        creator.createJob(); // this should create a completely new instance of the Job
        jobRunner.runJob();  // runs the batch job by calling jobLauncher.run(job, jobParameters);
    }
}
I haven't tried out the code, but this is the approach I would try.
When constructing the job, it is important to ensure that all readers, processors, and writers used in the job are completely new instances as well. This means that unless they are instantiated as pure Java objects (not as Spring beans) or as Spring beans with scope "step", you must ensure that a new instance is always used.
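For example, a step-scoped reader bean is rebuilt for every step execution, so no reader state survives into the next scheduled run (a sketch; the job parameter name is illustrative):

import org.springframework.batch.core.configuration.annotation.StepScope;
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.mapping.PassThroughLineMapper;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.FileSystemResource;

@Configuration
public class ReaderConfig {

    // A new reader instance is built per step execution, picking up the
    // current job parameter instead of reusing last run's file and position.
    @Bean
    @StepScope
    public FlatFileItemReader<String> dailyFileReader(
            @Value("#{jobParameters['inputFile']}") String inputFile) {
        FlatFileItemReader<String> reader = new FlatFileItemReader<>();
        reader.setResource(new FileSystemResource(inputFile));
        reader.setLineMapper(new PassThroughLineMapper());
        return reader;
    }
}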
Edited:
How to handle singleton beans
Sometimes singleton beans cannot be avoided; in these cases there must be a way to "reset" them.
A simple approach is to define an interface ResetableBean with a reset method that such beans implement (sketched after the launcher code below). @Autowired can then be used to collect a list of all such beans.
@Component
public class ScheduledLauncher {

    @Autowired
    private List<ResetableBean> resetables;
    ...

    @Scheduled(cron = "${schedule}")
    public void runJob() throws Exception {
        // reset all the singletons
        resetables.forEach(bean -> bean.reset());
        ...

Call certain methods alone in a class using spring/pojo?

I am implementing a health check for my application. I have configured classes for the different logical systems in our application and have written methods which check conditions across the environment, like DB count, logging errors, CPU processes, etc.
Now I have a requirement where I have to check only certain conditions, i.e. certain methods in the class, according to the host.
What is the best way to access those methods via a property file? Please give your suggestions.
Thanks.
I don't like using reflection for this sort of thing. It's too easy for the property files to be changed, and then the system starts generating funky error messages.
I prefer something straightforward like:
controlCheck.properties:
dbCount=true
logger=false
cpuProcess=true
Then the code is sort of like this (not real code):
Properties props = ... // read property file
boolean isDbCount = getBoolean(props, "dbCount"); // returns false if prop not found
// ... repeat for all others ...
CheckUtilities.message("Version " + version); // be sure the status shows the version of this code
if (isDbCount) {
    CheckUtilities.checkDbCount(); // logs all statuses
}
if (/* ... other properties, one by one ... */) {
    // ... run the corresponding check ...
}
There are lots of ways to do it, but this is simple and pretty much foolproof. All the configuration takes place in one properties file, all the code is in one place, and it is easy for a Java programmer to comment out tests that are not relevant or to add new tests. If you add a new test, it doesn't automatically get run everywhere, so you can roll it out on your own schedule, and you can enable a new test with a simple shell script if you like.
If it's not running a particular test when you think it should, there are only two things to check:
Is it in the properties file?
Is the version of the code correct?
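A runnable version of the sketch above, assuming the hypothetical CheckUtilities class exists and controlCheck.properties is on the classpath:

import java.io.InputStream;
import java.util.Properties;

public class HealthCheckRunner {

    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        try (InputStream in = HealthCheckRunner.class.getResourceAsStream("/controlCheck.properties")) {
            if (in != null) {
                props.load(in);
            }
        }
        // Missing or malformed properties default to false, so a check never
        // runs unless it is switched on explicitly.
        if (Boolean.parseBoolean(props.getProperty("dbCount", "false"))) {
            CheckUtilities.checkDbCount();
        }
    }
}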
You can define a different bean for every check you need:
<bean id="dbcountBean" class="DBCountHealtCheck" scope="prototype">
    <!-- bean properties -->
</bean>
Then a bean for the health checker, with the check beans injected:
<bean id="healtChecker" class="HealtChecker" scope="prototype">
    <property name="activeChecker">
        <bean class="PropertiesFactoryBean">
            <property name="location" value="file:path/to/file" />
        </bean>
    </property>
    <property name="dbcount" ref="dbcountBean" />
    <!-- other properties -->
</bean>
class HealtChecker {

    private DBCountHealtCheck dbCountChecker;
    private Properties activeChecker;

    public void setDbcount(DBCountHealtCheck dbCountChecker) {
        this.dbCountChecker = dbCountChecker;
    }

    public void setActiveChecker(Properties activeChecker) {
        this.activeChecker = activeChecker;
    }

    public void check() {
        if ("true".equals(this.activeChecker.get("dbCount"))) {
            this.dbCountChecker.check();
        }
    }
}
One limitation of this solution is that the properties file cannot be reloaded at runtime. If you need that, remove the activeChecker property from HealtChecker, add a property public void setPropertiesLocation(URL propertiesLocation), let HealtChecker implement InitializingBean, and load the properties in afterPropertiesSet().

What is the Best approach to run a long process using Spring

I would like to ask what the best approach is to run a long process using Spring. I have a webapp, and when the client makes a request it hits a Spring controller. This controller gets some parameters from the request, then runs a query and fetches records from the DB.
The number of records from the DB is high, and I need to run some comparison logic on them which may take a long time, so I need to run it separately.
When this process has finished, it should write the final results into an Excel file and mail it.
You can use the @Async annotation to make the request return immediately.
First, write a @Service class to handle your DB and Excel job:
@Service
public class AccountService {

    @Async
    public void executeTask() {
        // DB and Excel job
    }
}
Then, in the controller method, trigger the task:
@Controller
public class TaskController {

    @Autowired
    private AccountService accountService;

    @RequestMapping(value = "as")
    @ResponseBody
    public ResultInfo async() throws Exception {
        accountService.executeTask();
        return new ResultInfo(0, "success", null);
    }
}
Finally, add this to application-context.xml (the Spring config file):
<task:annotation-driven executor="taskExecutor"/>
<task:executor id="taskExecutor" pool-size="10"/>
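If you prefer Java configuration over XML, the annotation-based equivalent is @EnableAsync plus an executor bean; a minimal sketch mirroring the pool size above:

import java.util.concurrent.Executor;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.scheduling.annotation.EnableAsync;
import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor;

@Configuration
@EnableAsync
public class AsyncConfig {

    @Bean
    public Executor taskExecutor() {
        ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
        executor.setCorePoolSize(10); // matches pool-size="10" from the XML
        executor.initialize();
        return executor;
    }
}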
Hope this will help you.
