I have a spring batch job that I'd like to do the following...
Step 1 -
Tasklet - Create a list of dates, store the list of dates in the job execution context.
Step 2 -
JDBC Item Reader - Get list of dates from job execution context.
Get element(0) in the dates list and use it as input for the JDBC query.
Store the element(0) date in the job execution context.
Remove element(0) from the list of dates.
Flat File Item Writer - Get the element(0) date from the job execution context and use it for the file name.
Then, using a job listener, repeat step 2 until no dates remain in the list.
I've created the job and it works fine for the first execution of step 2, but step 2 is not repeating as I want it to. I know this because when I debug through my code it only breaks on the initial run of step 2.
It does, however, continue to log messages like the ones below, as if it were running step 2 even though I know it is not.
2016-08-10 22:20:57.842 INFO 11784 --- [ main] o.s.batch.core.job.SimpleStepHandler : Duplicate step [readStgDbAndExportMasterListStep] detected in execution of job=[exportMasterListCsv]. If either step fails, both will be executed again on restart.
2016-08-10 22:20:57.846 INFO 11784 --- [ main] o.s.batch.core.job.SimpleStepHandler : Executing step: [readStgDbAndExportMasterListStep]
This ends up in a never-ending loop.
Could someone help me figure out, or suggest, why my step 2 is only running once?
Thanks in advance.
I've added three links to PasteBin for my code so as not to pollute this post.
http://pastebin.com/QhExNikm (Job Config)
http://pastebin.com/sscKKWRk (Common Job Config)
http://pastebin.com/Nn74zTpS (Step execution listener)
From your question and your code I deduce that, based on the number of dates you retrieve (this happens before the actual job starts), you will execute a step once for each date.
I suggest a design change. Create a Java class that gets you the dates as a list, and based on that list dynamically create your steps. Something like this:
@Configuration
@EnableBatchProcessing
public class JobConfig {

    @Autowired
    private JobBuilderFactory jobBuilderFactory;

    @Autowired
    private StepBuilderFactory stepBuilderFactory;

    @Autowired
    private JobDatesCreator jobDatesCreator;

    @Bean
    public Job executeMyJob() {
        List<Step> steps = new ArrayList<Step>();
        for (String date : jobDatesCreator.getDates()) {
            steps.add(createStep(date));
        }
        return jobBuilderFactory.get("executeMyJob")
                .start(createParallelFlow(steps))
                .end()
                .build();
    }

    private Step createStep(String date) {
        return stepBuilderFactory.get("readStgDbAndExportMasterListStep" + date)
                .chunk(your_chunksize)
                .reader(your_reader)
                .processor(your_processor)
                .writer(your_writer)
                .build();
    }

    private Flow createParallelFlow(List<Step> steps) {
        SimpleAsyncTaskExecutor taskExecutor = new SimpleAsyncTaskExecutor();
        // max multithreading = -1, no multithreading = 1, smart size = steps.size()
        taskExecutor.setConcurrencyLimit(1);
        List<Flow> flows = steps.stream()
                .map(step -> new FlowBuilder<Flow>("flow_" + step.getName()).start(step).build())
                .collect(Collectors.toList());
        return new FlowBuilder<SimpleFlow>("parallelStepsFlow")
                .split(taskExecutor)
                .add(flows.toArray(new Flow[flows.size()]))
                .build();
    }
}
EDIT: added "jobParameter" input (slightly different approach also)
Somewhere on your classpath add the following example .properties file:
sql.statement="select * from awesome"
and add the following annotation to your JobDatesCreator class
@PropertySource("classpath:example.properties")
You can provide specific SQL statements as a command line argument as well. From the Spring documentation:
you can launch with a specific command line switch (e.g. java -jar
app.jar --name="Spring").
For more info on that see http://docs.spring.io/spring-boot/docs/current/reference/html/boot-features-external-config.html
The class that gets your dates (why use a tasklet for this?):
@PropertySource("classpath:example.properties")
public class JobDatesCreator {

    @Value("${sql.statement}")
    private String sqlStatement;

    @Autowired
    private CommonExportFromStagingDbJobConfig commonJobConfig;

    private List<String> dates;

    @PostConstruct
    private void init() {
        // Execute your logic here for getting the data you need.
        JdbcTemplate jdbcTemplate = new JdbcTemplate(commonJobConfig.onlineStagingDb);
        // access to your sql statement provided in a property file or as a command line argument
        System.out.println("This is the sql statement I provided in my external property: " + sqlStatement);
        // for now..
        dates = new ArrayList<>();
        dates.add("date 1");
        dates.add("date 2");
    }

    public List<String> getDates() {
        return dates;
    }

    public void setDates(List<String> dates) {
        this.dates = dates;
    }
}
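For the JobConfig above to @Autowire the JobDatesCreator, it must be registered as a bean. A minimal sketch, assuming you declare it in your configuration class (component scanning would work as well):

@Bean
public JobDatesCreator jobDatesCreator() {
    return new JobDatesCreator();
}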
I also noticed that you have a lot of duplicate code that you could quite easily refactor. Right now, for each writer you have something like this:
@Bean
public FlatFileItemWriter<MasterList> division10MasterListFileWriter() {
    FlatFileItemWriter<MasterList> writer = new FlatFileItemWriter<>();
    writer.setResource(new FileSystemResource(new File(outDir, MerchHierarchyConstants.DIVISION_NO_10)));
    writer.setHeaderCallback(masterListFlatFileHeaderCallback());
    writer.setLineAggregator(masterListFormatterLineAggregator());
    return writer;
}
Consider using something like this instead:
public FlatFileItemWriter<MasterList> divisionMasterListFileWriter(String divisionNumber) {
    FlatFileItemWriter<MasterList> writer = new FlatFileItemWriter<>();
    writer.setResource(new FileSystemResource(new File(outDir, divisionNumber)));
    writer.setHeaderCallback(masterListFlatFileHeaderCallback());
    writer.setLineAggregator(masterListFormatterLineAggregator());
    return writer;
}
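Hypothetical usage, assuming a second constant such as MerchHierarchyConstants.DIVISION_NO_20 exists alongside DIVISION_NO_10:

// One parameterized factory method, reused per division:
FlatFileItemWriter<MasterList> division10Writer =
        divisionMasterListFileWriter(MerchHierarchyConstants.DIVISION_NO_10);
FlatFileItemWriter<MasterList> division20Writer =
        divisionMasterListFileWriter(MerchHierarchyConstants.DIVISION_NO_20);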
As not all code is available to correctly replicate your issue, this answer is a suggestion/indication of how to solve your problem.
Based on our discussion on Spring batch execute dynamically generated steps in a tasklet, I'm trying to answer the question of how to access jobParameters before the job is actually executed.
I assume that there is a REST call which will execute the batch. In general, this requires the following steps:
1. a piece of code that receives the REST call with its parameters
2. creation of a new Spring context (there are ways to reuse an existing context and launch the job again, but there are some issues when it comes to reuse of steps, readers and writers)
3. launching the job
The simplest solution would be to store the job parameter received from the service as a system property and then access this property when you build up the job in step 3. But this could lead to a problem if more than one user starts the job at the same moment.
There are other ways to pass parameters into the Spring context when it is loaded, but that depends on the way you set up your context.
For instance, if you are using Spring Boot directly for step 2, you could write a method like:
private int startJob(Properties jobParamsAsProps) {
    SpringApplication springApp = new SpringApplication(.. my config classes ..);
    springApp.setDefaultProperties(jobParamsAsProps);
    ConfigurableApplicationContext context = springApp.run();
    ExitCodeGenerator exitCodeGen = context.getBean(ExitCodeGenerator.class);
    int code = exitCodeGen.getExitCode();
    context.close();
    return code;
}
This way, you can access the properties as usual with the standard @Value or @ConfigurationProperties annotations.
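A minimal sketch of building those Properties from the REST call's parameters (the method name and the Map argument are assumptions for illustration):

// Hypothetical helper: turn the REST call's parameters into Properties
// that SpringApplication.setDefaultProperties(..) can consume.
private Properties toJobProperties(Map<String, String> restParams) {
    Properties props = new Properties();
    restParams.forEach(props::setProperty);
    return props;
}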
I would like to gain some clarity on exactly how a Partitioner is supposed to work. I implemented the SimplePartitioner (a helper class provided by Spring) with a JdbcPagingItemReader. It is working for the most part, but the issue is that I see duplicate data/records come through the ItemWriter. At first I thought there was an issue with my query returning dupes, but that is not the case: right off the bat, the first couple of chunks that hit the writer are the same records.
Reading through the docs does not clarify enough about how the Partitioner is supposed to work. Are the dupes on multiple partitions supposed to happen, or is it perhaps a misconfiguration in setting up the steps for partitioning? A simple example or a breakdown of how Partitioners actually work would be most helpful. Example of my setup below:
@Bean
public Job job(@Qualifier("step-one") @Autowired Step stepOne) {
    return jobBuilderFactory.get("my-first-spring-batch")
            .start(stepOne)
            .build();
}

@Bean(name = "PagingItemReader")
@StepScope
public JdbcPagingItemReader<Product> pagingItemReader(@Autowired PagingQueryProvider queryProvider) {
    return new JdbcPagingItemReaderBuilder<Product>().name("paging-reader")
            .dataSource(dataSource)
            .queryProvider(queryProvider)
            .rowMapper(new TheMapper())
            .pageSize(100)
            .maxItemCount(1000)
            .saveState(false)
            .build();
}

@Bean(name = "step-one")
public Step stepOne(@Qualifier("step-two") @Autowired Step stepTwo, @Autowired TaskExecutor taskExecutor, @Autowired SimplePartitioner partitioner) {
    return stepBuilderFactory.get("step-one-partition")
            .partitioner("step-partition", partitioner)
            .gridSize(2)
            .step(stepTwo)
            .taskExecutor(taskExecutor)
            .build();
}

@Bean(name = "step-two")
public Step stepTwo(@Autowired JdbcPagingItemReader<Product> pagingItemReader) {
    return stepBuilderFactory.get("step-two")
            .<Product, Product>chunk(10)
            .reader(pagingItemReader)
            .processor(itemProcessor)
            .writer(itemWriter)
            .build();
}

@Bean
public SimplePartitioner partitioner() {
    SimplePartitioner partitioner = new SimplePartitioner();
    partitioner.partition(10);
    return partitioner;
}
The partitioner is the piece that understands your data and how to partition it. In your case you are reading a db table, so the partitioner should be implemented in a way that partitions the table into a set of non-overlapping partitions, for example by ID:
Partition 1: 1..1000
Partition 2: 1001..2000
etc
This meta-data about the partitions is then communicated to the workers through ExecutionContext instances.
In your case you use the SimplePartitioner, but this partitioner does not partition the table; it only creates "empty" execution contexts with numbered partition names as keys. Hence your workers will all read the same data, which explains the behaviour you are seeing.
A better example to look at is the ColumnRangePartitioner, and how each worker step's reader is configured to read from a given partition; see: https://github.com/spring-projects/spring-batch/blob/8762e3411557aaf887867f8d8594b01127538cb1/spring-batch-samples/src/main/resources/jobs/partitionJdbcJob.xml#L36-L60
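For illustration, here is a minimal sketch in the spirit of that sample. The table name PRODUCT, the ID column, and the minValue/maxValue key names are assumptions:

import java.util.HashMap;
import java.util.Map;
import javax.sql.DataSource;
import org.springframework.batch.core.partition.support.Partitioner;
import org.springframework.batch.item.ExecutionContext;
import org.springframework.jdbc.core.JdbcTemplate;

// Splits the ID range of a table into non-overlapping [minValue, maxValue]
// slices, one ExecutionContext per partition.
public class IdRangePartitioner implements Partitioner {

    private final JdbcTemplate jdbcTemplate;

    public IdRangePartitioner(DataSource dataSource) {
        this.jdbcTemplate = new JdbcTemplate(dataSource);
    }

    @Override
    public Map<String, ExecutionContext> partition(int gridSize) {
        int min = jdbcTemplate.queryForObject("SELECT MIN(ID) FROM PRODUCT", Integer.class);
        int max = jdbcTemplate.queryForObject("SELECT MAX(ID) FROM PRODUCT", Integer.class);
        int targetSize = (max - min) / gridSize + 1;

        Map<String, ExecutionContext> partitions = new HashMap<>();
        int number = 0;
        for (int start = min; start <= max; start += targetSize) {
            ExecutionContext context = new ExecutionContext();
            context.putInt("minValue", start);
            context.putInt("maxValue", Math.min(start + targetSize - 1, max));
            partitions.put("partition" + number++, context);
        }
        return partitions;
    }
}

Each worker reader can then be @StepScope and pick up its own slice, e.g. by injecting @Value("#{stepExecutionContext['minValue']}") into its query parameters.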
I have the following requirement:
An endpoint http://localhost:8080/myapp/jobExecution/myJobName/execute which receives a CSV and uses univocity to apply some validations and generate a List of some POJO.
Send that list to a Spring Batch Job for some processing.
Multiple users could do this.
I want to know if I can achieve this with Spring Batch.
I was thinking of using a queue: put the data in it and execute a Job that pulls objects from that queue. But how can I be sure that, if another person hits the endpoint while another Job is executing, Spring Batch knows which item belongs to which execution?
You can use a queue, or you can take the list of values that was generated after the validation step and store it in the job execution context.
Below is a snippet that stores the list in the job context and reads it back using an ItemReader.
The snippet implements StepExecutionListener in a tasklet step to put the List that was constructed:
@Override
public ExitStatus afterStep(StepExecution stepExecution) {
    // tenantNames is a List<String> which was constructed as an output of an evaluation logic
    stepExecution.getJobExecution().getExecutionContext().put("listOfTenants", tenantNames);
    return ExitStatus.COMPLETED;
}
Now "listOfTenants" are read as part of a Step which has Reader (To allow one thread read at a time), Processor and Writer. You can also store it as a part of Queue and fetch it in a Reader. Snippet for reference,
public class ReaderStep implements ItemReader<String>, StepExecutionListener {

    // assuming SLF4J, since the original snippet references a logger without declaring it
    private static final Logger logger = LoggerFactory.getLogger(ReaderStep.class);

    private List<String> tenantNames;

    @Override
    public void beforeStep(StepExecution stepExecution) {
        try {
            tenantNames = (List<String>) stepExecution.getJobExecution().getExecutionContext()
                    .get("listOfTenants");
            logger.debug("Successfully fetched the tenant list from the context");
        } catch (Exception e) {
            // Exception block
        }
    }

    @Override
    public synchronized String read() throws Exception {
        String tenantName = null;
        if (tenantNames.size() > 0) {
            tenantName = tenantNames.get(0);
            tenantNames.remove(0);
            return tenantName;
        }
        logger.info("Completed reading all tenant names");
        return null;
    }

    // Rest of the overridden methods of this class..
}
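A minimal sketch of wiring this reader into a chunk step (the step name and the placeholder writer are assumptions). Because the reader itself implements StepExecutionListener, Spring Batch should register it as a step listener automatically:

@Bean
public Step tenantProcessingStep(StepBuilderFactory stepBuilderFactory) {
    return stepBuilderFactory.get("tenantProcessingStep")
            .<String, String>chunk(1)
            .reader(new ReaderStep())
            // placeholder writer, just to make the sketch complete
            .writer(items -> items.forEach(System.out::println))
            .build();
}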
Yes. Spring Boot would execute these jobs in different threads, so Spring knows which items belong to which execution.
Note: you can use something like a logging correlation id. This will help you filter the logs for a particular request. https://dzone.com/articles/correlation-id-for-logging-in-microservices
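A minimal sketch of such a correlation id using SLF4J's MDC (the key name "correlationId" is an assumption):

// Tag the current request/job launch with a correlation id, and clear it
// afterwards so the thread can be safely reused.
MDC.put("correlationId", UUID.randomUUID().toString());
try {
    // handle the request / launch the job here
} finally {
    MDC.remove("correlationId");
}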
I have come across a situation where I need to fetch a cron expression from the database and then schedule it in Spring Boot. I am fetching the data using JPA. Now the problem is that in Spring Boot, when I use the @Scheduled annotation, it does not allow me to use the DB value directly, as it takes only constant values. So what I am planning to do is to dynamically generate a properties file and read the cron expression from it. But here I am also facing a problem: the dynamically generated properties file is created in the target directory,
so I can't use it at the time of program loading.
So can anyone assist me with reading the dynamically generated file from the resource folder, or with how to schedule a cron expression fetched from the DB in Spring Boot?
If I place all the details of the cron expression in a properties file, I can schedule the job.
Latest try with dynamically generating the properties file:
@Configuration
public class CronConfiguration {

    @Autowired
    private JobRepository jobRepository;

    @Autowired
    private ResourceLoader resourceLoader;

    @PostConstruct
    protected void initialize() {
        updateConfiguration();
    }

    private void updateConfiguration() {
        Properties properties = new Properties();
        List<Job> morningJobList = new ArrayList<Job>();
        List<String> morningJobCronExp = new ArrayList<String>();
        // Map<String,String> map=new HashMap<>();
        int num = 1;
        System.out.println("started");
        morningJobList = jobRepository.findByDescriptionContaining("Morning Job");
        for (Job job : morningJobList) {
            // morningJobURL.add(job.getJobUrl());
            morningJobCronExp.add(job.getCronExp());
        }
        for (String cron : morningJobCronExp) {
            // note: Properties.store() writes the '=' itself, so the key must not contain one
            properties.setProperty("cron.expression" + num, cron);
            num++;
        }
        Resource propertiesResource = resourceLoader.getResource("classpath:application1.properties");
        try (OutputStream out = new BufferedOutputStream(new FileOutputStream(propertiesResource.getFile()))) {
            properties.store(out, null);
        } catch (Exception ex) {
            // Handle error
            ex.printStackTrace();
        }
    }
}
Still, it is not able to write to the properties file under the resource folder.
Consider using the Quartz Scheduler framework. It stores scheduler info in the DB, so there is no need to implement your own DB communication; it is already provided.
Found this example: https://www.callicoder.com/spring-boot-quartz-scheduler-email-scheduling-example/
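A minimal sketch of scheduling with Quartz, assuming the cron expression has already been loaded from the DB and MyQuartzJob is a hypothetical class implementing org.quartz.Job:

import org.quartz.CronScheduleBuilder;
import org.quartz.JobBuilder;
import org.quartz.JobDetail;
import org.quartz.Scheduler;
import org.quartz.SchedulerException;
import org.quartz.Trigger;
import org.quartz.TriggerBuilder;
import org.quartz.impl.StdSchedulerFactory;

public void scheduleFromDb(String cronExpFromDb) throws SchedulerException {
    // MyQuartzJob is a hypothetical class implementing org.quartz.Job
    JobDetail jobDetail = JobBuilder.newJob(MyQuartzJob.class)
            .withIdentity("morningJob")
            .build();

    // cronExpFromDb is the expression fetched via JPA, e.g. "0 0 8 * * ?"
    Trigger trigger = TriggerBuilder.newTrigger()
            .withIdentity("morningJobTrigger")
            .withSchedule(CronScheduleBuilder.cronSchedule(cronExpFromDb))
            .build();

    Scheduler scheduler = StdSchedulerFactory.getDefaultScheduler();
    scheduler.start();
    scheduler.scheduleJob(jobDetail, trigger);
}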
This is my first code in Spring Batch using Spring Boot; I am implementing a sample use case.
This is the exact pseudo-code I want to implement in Spring Batch; can you please help?
It's not feasible to fetch all hotel details (>3 million records) in one shot and process them, so I decided to fetch one hotel (50,000 records) at a time, process it, and write it to the DB. I want to repeat this step for each and every hotelID as described below. Is this use case suitable for Spring Batch?
List<Integer> allHotelIDs = execute("select distinct(hotelid) from Hotels");
List items = new ArrayList();
allHotelIDs.forEach(hotelID -> {
    Object item = itemReader.jdbcReader(hotelID, dataSource);
    Object processedItem = itemProcessor.process(item);
    items.add(processedItem);
});
itemWriter.write(items);
I am able to pass only one hotelid; how can I invoke the reader multiple times, once for each hotel in the list?
@Bean
Job job(JobBuilderFactory jbf, StepBuilderFactory sbf, DBReaderWriter step1) throws Exception {
    Step db2db = sbf.get("db-db").<Table, List<Tendency>>chunk(1000)
            .reader(step1.jdbcReader(hotelID, dataSource))
            .processor(processor())
            .writer(receivableWriter())
            .build();
    return jbf.get("etl").incrementer(new RunIdIncrementer()).start(db2db).build();
}
Reader code:
@Configuration
public class DBReader {

    @Bean
    public ItemReader<Table> jdbcReader(Integer hotelID, DataSource dataSource) {
        return new JdbcCursorItemReaderBuilder<Table>().dataSource(dataSource).name("jdbc-reader")
                .sql("SELECT * FROM Hotels where hotelid = " + hotelID).rowMapper((rs, i) -> {
                    return read().db(rs, "Hotels");
                }).build();
    }
}
Thanks.
I am writing various jobs using Spring Batch with Java configuration.
I need to get the current state of a job,
e.g.
which steps are currently running (I may have multiple steps running at the same time)
which steps failed (the status and exit code)
etc.
The only examples I see online are of XML-based Spring Batch, and I want to use Java config only.
Thanks.
Another option is to use JobExplorer
Entry point for browsing executions of running or historical jobs and steps. Since the data may be re-hydrated from persistent storage, it may not contain volatile fields that would have been present when the execution was active.
List<JobExecution> jobExecutions = jobExplorer.getJobExecutions(jobInstance);
for (JobExecution jobExecution : jobExecutions) {
    jobExecution.getStepExecutions();
    // read step info
}
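The jobInstance above can be obtained from the same JobExplorer; a small sketch, assuming your job is named "myJob":

// Fetch the most recent instance of the job (starting at index 0, one result).
List<JobInstance> instances = jobExplorer.getJobInstances("myJob", 0, 1);
JobInstance jobInstance = instances.get(0);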
And to create the jobExplorer you can use the factory:
import org.springframework.batch.core.explore.support.JobExplorerFactoryBean;

JobExplorerFactoryBean factory = new JobExplorerFactoryBean();
factory.setDataSource(dataSource);
factory.afterPropertiesSet();
JobExplorer jobExplorer = factory.getObject();
I use these two queries on the Spring Batch metadata tables to learn about job progress and step details.
SELECT * FROM BATCH_JOB_EXECUTION ORDER BY START_TIME DESC;
SELECT * FROM BATCH_STEP_EXECUTION WHERE JOB_EXECUTION_ID=? ORDER BY STATUS;
With the first query I find the JOB_EXECUTION_ID corresponding to my job execution, then use that id in the second query to find details about the specific steps.
Additionally, your config choice (Java or XML) has nothing to do with the Spring Batch metadata. If you are persisting data, it doesn't matter whether it's XML config or Java config.
For Java-based monitoring, you can use the JobExplorer & JobRepository beans to query jobs etc.
e.g. List<JobInstance> from jobExplorer.getJobInstances(...) & jobExplorer.getJobExecutions(jobInstance) etc.
From JobExecutions you can get StepExecutions, and so on.
You might have to set up a JobRegistryBeanPostProcessor bean like below for JobExplorer & JobRepository to work properly.
@Bean
public JobRegistryBeanPostProcessor jobRegistryBeanPostProcessor(JobRegistry jobRegistry) {
    JobRegistryBeanPostProcessor jobRegistryBeanPostProcessor = new JobRegistryBeanPostProcessor();
    jobRegistryBeanPostProcessor.setJobRegistry(jobRegistry);
    return jobRegistryBeanPostProcessor;
}