Spring Batch - Conditional Read from database - spring-boot

This is my first Spring Batch code using Spring Boot; I am implementing a sample use case.
This is the exact pseudo-code I want to implement in Spring Batch. Can you please help?
It's not feasible to fetch all hotel details (>3 million records) in one shot and process them, so I decided to fetch one hotel (50,000 records) at a time, process it, and write it to the DB. I want to repeat this step for each and every hotelID, as described below. Is this use case suitable for Spring Batch?
List<Integer> allHotelIDs = execute("select distinct(hotelid) from Hotels");
List items = new ArrayList();
allHotelIDs.forEach(hotelID -> {
    Object item = itemReader.jdbcReader(hotelID, dataSource);
    Object processedItem = itemProcessor.process(item);
    items.add(processedItem);
});
itemWriter.write(items);
I am able to pass only one hotelID; how can I invoke the reader multiple times, once for each hotel in the list?
@Bean
Job job(JobBuilderFactory jbf, StepBuilderFactory sbf, DBReaderWriter step1) throws Exception {
    Step db2db = sbf.get("db-db").<Table, List<Tendency>>chunk(1000)
            .reader(step1.jdbcReader(hotelID, dataSource))
            .processor(processor())
            .writer(receivableWriter()).build();
    return jbf.get("etl").incrementer(new RunIdIncrementer()).start(db2db).build();
}
Reader code:
@Configuration
public class DBReader {

    @Bean
    public ItemReader<Table> jdbcReader(Integer hotelID, DataSource dataSource) {
        return new JdbcCursorItemReaderBuilder<Table>().dataSource(dataSource).name("jdbc-reader")
                .sql("SELECT * FROM Hotels where hotelid = " + hotelID)
                .rowMapper((rs, i) -> {
                    return read().db(rs, "Hotels");
                }).build();
    }
}
Thanks.
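One way to implement this in Spring Batch, sketched here for reference, is partitioning: a Partitioner creates one ExecutionContext per hotel, and a @StepScope reader picks the id up from the step execution context, so the same worker step runs once per hotelID. A minimal sketch, assuming the db-db step above is exposed as a bean; the partitioner query and the hotelId key are illustrative:
@Bean
public Partitioner hotelPartitioner(JdbcTemplate jdbcTemplate) {
    // One partition (and hence one worker step execution) per distinct hotelid
    return gridSize -> {
        Map<String, ExecutionContext> partitions = new HashMap<>();
        for (Integer id : jdbcTemplate.queryForList("select distinct(hotelid) from Hotels", Integer.class)) {
            ExecutionContext context = new ExecutionContext();
            context.putInt("hotelId", id);
            partitions.put("hotel-" + id, context);
        }
        return partitions;
    };
}

@Bean
@StepScope
public JdbcCursorItemReader<Table> jdbcReader(@Value("#{stepExecutionContext['hotelId']}") Integer hotelId,
        DataSource dataSource) {
    // hotelId is late-bound per partition, so each worker execution reads exactly one hotel
    return new JdbcCursorItemReaderBuilder<Table>().dataSource(dataSource).name("jdbc-reader")
            .sql("SELECT * FROM Hotels where hotelid = ?")
            .preparedStatementSetter(ps -> ps.setInt(1, hotelId))
            .rowMapper((rs, i) -> read().db(rs, "Hotels")) // reusing the row mapper from the question
            .build();
}

@Bean
public Step hotelMasterStep(StepBuilderFactory sbf, Step db2db, Partitioner hotelPartitioner) {
    return sbf.get("hotel-master")
            .partitioner("db-db", hotelPartitioner)
            .step(db2db)
            .gridSize(4) // a hint to the partitioner; concurrency comes from the task executor
            .taskExecutor(new SimpleAsyncTaskExecutor())
            .build();
}
Using a parameterized SQL with a PreparedStatementSetter (instead of concatenating hotelID into the query) also avoids SQL injection.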

Related

Can we get data processed in Spring Batch after batch job is completed?

I am using Spring Batch to read data from the DB, process it, and do some further processing in the writer.
If the chunk size is less than the number of records read by the reader, Spring Batch runs in multiple chunks. I want to do the processing in the writer only once, at the end of all chunk processing; or, if that is not possible, I will remove the writer and process the data obtained in the processor after the batch job is completed. Is this possible?
Below is the code that triggers my Spring Batch job:
private void triggerSpringBatchJob() {
    loggerConfig.logDebug(log, " : Triggering product catalog scheduler ");
    JobParametersBuilder builder = new JobParametersBuilder();
    try {
        // Adding date in buildJobParameters because if not added we will get
        // "A job instance already exists": JobInstanceAlreadyCompleteException
        builder.addDate("date", new Date());
        jobLauncher.run(processProductCatalog, builder.toJobParameters());
    } catch (JobExecutionAlreadyRunningException | JobRestartException | JobInstanceAlreadyCompleteException
            | JobParametersInvalidException e) {
        e.printStackTrace();
    }
}
Below is my Spring Batch configuration:
@Configuration
@EnableBatchProcessing
public class BatchJobProcessConfiguration {

    @Bean
    @StepScope
    RepositoryItemReader<Tuple> reader(SkuRepository skuRepository,
            ProductCatalogConfiguration productCatalogConfiguration) {
        RepositoryItemReader<Tuple> reader = new RepositoryItemReader<>();
        reader.setRepository(skuRepository);
        // query parameters
        List<Object> queryMethodArguments = new ArrayList<>();
        if (productCatalogConfiguration.getSkuId().isEmpty()) {
            reader.setMethodName("findByWebEligibleAndDiscontinued");
            queryMethodArguments.add(productCatalogConfiguration.getWebEligible()); // for web eligible
            queryMethodArguments.add(productCatalogConfiguration.getDiscontinued()); // for discontinued
            queryMethodArguments.add(productCatalogConfiguration.getCbdProductId()); // for cbd products
        } else {
            reader.setMethodName("findBySkuIds");
            queryMethodArguments.add(productCatalogConfiguration.getSkuId()); // for sku ids
        }
        reader.setArguments(queryMethodArguments);
        reader.setPageSize(1000);
        Map<String, Direction> sorts = new HashMap<>();
        sorts.put("sku_id", Direction.ASC);
        reader.setSort(sorts);
        return reader;
    }

    @Bean
    @StepScope
    ItemWriter<ProductCatalogWriterData> writer() {
        return new ProductCatalogWriter();
    }

    @Bean
    ProductCatalogProcessor processor() {
        return new ProductCatalogProcessor();
    }

    @Bean
    SkipPolicy readerSkipper() {
        return new ReaderSkipper();
    }

    @Bean
    Step productCatalogDataStep(ItemReader<Tuple> itemReader, ProductCatalogWriter writer,
            HttpServletRequest request, StepBuilderFactory stepBuilderFactory, BatchConfiguration batchConfiguration) {
        return stepBuilderFactory.get("processProductCatalog")
                .<Tuple, ProductCatalogWriterData>chunk(batchConfiguration.getBatchChunkSize())
                .reader(itemReader).faultTolerant().skipPolicy(readerSkipper()).processor(processor()).writer(writer).build();
    }

    @Bean
    Job productCatalogData(Step productCatalogDataStep, HttpServletRequest request,
            JobBuilderFactory jobBuilderFactory) {
        return jobBuilderFactory.get("processProductCatalog").incrementer(new RunIdIncrementer())
                .flow(productCatalogDataStep).end().build();
    }
}
want to do the processing in writer only once at the end of all batch process completion or if this is not possible then i will remove writer and process the data obtained in processor after batch job is completed. Is this possible?
"at the end of all batch process completion" is key here. If the requirement is to do some processing after all chunks have been "pre-processed", I would keep it simple and use two steps for that:
Step 1: (pre)processes the data as needed and writes it to a temporary storage
Step 2: Here you do whatever you want with the processed data prepared in the temporary storage
A final step would clean up the temporary storage if it is persistent (a file, a staging table, etc.). Otherwise, i.e. if it is in memory, this is optional.
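A minimal sketch of that two-step layout, assuming a staging table and illustrative bean names (none of these come from the original code):
@Bean
Job twoPhaseJob(JobBuilderFactory jbf, Step preprocessStep, Step finalizeStep, Step cleanupStep) {
    return jbf.get("twoPhaseJob")
            .start(preprocessStep) // chunk-oriented: reader -> processor -> writer into a staging table
            .next(finalizeStep)    // runs once, only after ALL chunks of the first step have completed
            .next(cleanupStep)     // optional: truncate the staging table
            .build();
}

@Bean
Step finalizeStep(StepBuilderFactory sbf, JdbcTemplate jdbcTemplate) {
    // A tasklet step executes exactly once, which matches "processing only once at the end"
    return sbf.get("finalizeStep")
            .tasklet((contribution, chunkContext) -> {
                // one-time processing over the staged data goes here, e.g. an aggregate query
                return RepeatStatus.FINISHED;
            })
            .build();
}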

Spring Batch - delete

How can I do the deletion of the entities that I just persisted?
@Bean
public Job job() {
    return this.jobBuilderFactory.get("job")
            .start(this.syncStep())
            .build();
}

@Bean
public Step syncStep() {
    // read
    RepositoryItemReader<Element1> reader = new RepositoryItemReader<>();
    reader.setRepository(repository);
    reader.setMethodName("findElements");
    reader.setArguments(new ArrayList<>(Arrays.asList(ZonedDateTime.now())));
    final HashMap<String, Sort.Direction> sorts = new HashMap<>();
    sorts.put("uid", Sort.Direction.ASC);
    reader.setSort(sorts);
    // write
    RepositoryItemWriter<Element1> writer = new RepositoryItemWriter<>();
    writer.setRepository(otherrepository);
    writer.setMethodName("save");
    return stepBuilderFactory.get("syncStep")
            .<Element1, Element2> chunk(10)
            .reader(reader)
            .processor(processor)
            .writer(writer)
            .build();
}
It is a process of dumping elements. We pass the elements from one table to another.
You can do that in two steps. The first step copies items from one table to another. The second step deletes the items from the source table. The second step should be executed only if the first step succeeds.
There are a few options:
Using a CompositeItemWriter
You could create a second ItemWriter that does the delete logic, for example:
RepositoryItemWriter<Element1> deleteWriter = new RepositoryItemWriter<>();
deleteWriter.setRepository(repository);
deleteWriter.setMethodName("delete");
To execute both writers you can use a CompositeItemWriter:
CompositeItemWriter<Element1> writer = new CompositeItemWriter<>();
// 'saveWriter' would be the writer you currently have
writer.setDelegates(List.of(saveWriter, deleteWriter));
This however won't work if your ItemProcessor transforms the original entity to something completely new. In that case I suggest using PropertyExtractingDelegatingItemWriter.
(Note, according to this question the writers run sequentially and the second writer should not be executed if the first one fails, but I'm not 100% sure on that.)
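For reference, a minimal sketch of the PropertyExtractingDelegatingItemWriter variant, assuming the written item exposes the original entity's id under a property named uid and the repository has a matching deleteById method (both names are assumptions):
PropertyExtractingDelegatingItemWriter<Element2> deleteWriter = new PropertyExtractingDelegatingItemWriter<>();
// For each written item, extracts the 'uid' property and calls repository.deleteById(uid)
deleteWriter.setTargetObject(repository);
deleteWriter.setTargetMethod("deleteById");
deleteWriter.setFieldsUsedAsTargetMethodArguments(new String[] { "uid" });
deleteWriter.afterPropertiesSet();
This way the delete can still run even when the processor transforms Element1 into a new Element2, as long as the id survives the transformation.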
Using a separate Step
Alternatively, you could put the new writer in an entirely separate Step:
@Bean
public Step cleanupStep() {
    // Same reader as before (might want to put this in a separate @Bean)
    RepositoryItemReader<Element1> reader = new RepositoryItemReader<>();
    // ...
    // The 'deleteWriter' from before
    RepositoryItemWriter<Element1> deleteWriter = new RepositoryItemWriter<>();
    // ...
    return stepBuilderFactory.get("cleanupStep")
            .<Element1, Element1> chunk(10)
            .reader(reader)
            .writer(deleteWriter)
            .build();
}
Now you can schedule the two steps individually:
@Bean
public Job job() {
    return this.jobBuilderFactory.get("job")
            .start(this.syncStep())
            .next(this.cleanupStep())
            .build();
}
Using a Tasklet
If you're using a separate step and depending on the amount of data, it might be more interesting to offload it entirely to the database and execute a single delete ... where ... query.
public class CleanupRepositoryTasklet implements Tasklet {

    private final Repository repository;

    public CleanupRepositoryTasklet(Repository repository) {
        this.repository = repository;
    }

    @Override
    public RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) throws Exception {
        repository.customDeleteMethod();
        return RepeatStatus.FINISHED;
    }
}
This Tasklet can then be registered in the same way as before, by declaring a new Step in your configuration:
return this.stepBuilderFactory.get("cleanupStep")
        .tasklet(myTasklet())
        .build();

How to separate steps into classes in Spring Batch

I have tried to find the solution but I cannot... ㅠㅠ
I want to separate the steps in a job, like below:
step1.class -> step2.class -> step3.class -> done
The reason I split it this way is that each step has to use its own query.
@Bean
public Job bundleJob() {
    return jobBuilderFactory.get(JOB_NAME)
            .start(step1)   // bean
            .next(step2)    // bean
            .next(step3())  // and here is the code, e.g. reader, processor, writer
            .build();
}
My goal is to use the data returned by step 1 in step 2.
But jpaItemReader seems to work asynchronously, so the steps don't execute in the order above.
The debug flow looks like this:
readerStep1 -> writerStep1 -> readerStep2 -> writerStep2 -> readerStep3 -> writerStep3
and
-> processorStep1 -> processorStep2 -> processorStep3
That is the big problem for me...
How can I make each step in a job wait for the previous one, including its queries?
Aha! I got it.
The point is how the beans are created in the configuration.
I had annotated all of my steps as beans, so they were all created eagerly by Spring.
The solution is late binding with @JobScope or @StepScope:
@Bean
@StepScope // late-created bean
public ListItemReader<Dto> itemReader() {
    // business logic
    return new ListItemReader<>(dto);
}
To have separate steps in your job you can use a Flow with a TaskletStep. Sharing a snippet for your reference:
@Bean
public Job processJob() throws Exception {
    Flow fetchData = (Flow) new FlowBuilder<>("fetchData")
            .start(fetchDataStep()).build();
    Flow transformData = (Flow) new FlowBuilder<>("transformData")
            .start(transformDataStep()).build();
    Job job = jobBuilderFactory.get("processTenantLifeCycleJob").incrementer(new RunIdIncrementer())
            .start(fetchData).next(transformData).next(processData()).end()
            .listener(jobCompletionListener()).build();
    ReferenceJobFactory referenceJobFactory = new ReferenceJobFactory(job);
    registry.register(referenceJobFactory);
    return job;
}

@Bean
public TaskletStep fetchDataStep() {
    return stepBuilderFactory.get("fetchData")
            .tasklet(fetchDataValue()).listener(fetchDataStepListener()).build();
}

@Bean
@StepScope
public FetchDataValue fetchDataValue() {
    return new FetchDataValue();
}

@Bean
public TaskletStep transformDataStep() {
    return stepBuilderFactory.get("transformData")
            .tasklet(transformValue()).listener(sendReportDataCompletionListener()).build();
}

@Bean
@StepScope
public TransformValue transformValue() {
    return new TransformValue();
}

@Bean
public Step processData() {
    return stepBuilderFactory.get("processData").<String, Data>chunk(chunkSize)
            .reader(processDataReader()).processor(dataProcessor()).writer(processDataWriter())
            .listener(processDataListener())
            .taskExecutor(backupTaskExecutor()).build();
}
In this example I have used two Flows to fetch and transform the data, each executed by its own tasklet class.
In order to return values from steps 1 and 2, you can store them in the job's execution context and retrieve them in the processData step, which has a reader, processor and writer.
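One concrete way to do that hand-off is an ExecutionContextPromotionListener: a step stores values in its own execution context, and the listener promotes the listed keys to the job's execution context when the step ends. A minimal sketch (the key name fetchedData is illustrative, not from the code above):
@Bean
public ExecutionContextPromotionListener promotionListener() {
    ExecutionContextPromotionListener listener = new ExecutionContextPromotionListener();
    // keys written to the step context under these names get copied to the job context
    listener.setKeys(new String[] { "fetchedData" });
    return listener;
}

// inside the fetchData tasklet: store the value in the step's execution context
chunkContext.getStepContext().getStepExecution()
        .getExecutionContext().put("fetchedData", value);

// in a @StepScope bean of a later step: read it back from the job context
@Value("#{jobExecutionContext['fetchedData']}")
private String fetchedData;
The listener has to be registered on the step that produces the value, e.g. .listener(promotionListener()) on fetchDataStep.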

Spring Batch Runtime Exception in Item processor

I am learning Spring Batch and trying to understand how the item processor behaves when an exception occurs.
I am reading data from a CSV file in chunks of 3 records, processing them, and writing them to a database.
My CSV file:
Jill,Doe
Joe,Doe
Justin,Doe
Jane,Doe
John,Doem
Jill,Doe
Joe,Doe
Justin,Doe
Jane,Doe
Batch configuration, reading items in chunks of 3, with a skip limit of 2:
@Configuration
@EnableBatchProcessing
public class BatchConfiguration {

    @Autowired
    public JobBuilderFactory jobBuilderFactory;

    @Autowired
    public StepBuilderFactory stepBuilderFactory;

    @Bean
    public FlatFileItemReader<Person> reader() {
        return new FlatFileItemReaderBuilder<Person>().name("personItemReader").resource(new ClassPathResource("sample-data.csv")).delimited()
                .names(new String[] { "firstName", "lastName" }).fieldSetMapper(new BeanWrapperFieldSetMapper<Person>() {
                    {
                        setTargetType(Person.class);
                    }
                }).build();
    }

    @Bean
    public PersonItemProcessor processor() {
        return new PersonItemProcessor();
    }

    @Bean
    public JdbcBatchItemWriter<Person> writer(DataSource dataSource) {
        return new JdbcBatchItemWriterBuilder<Person>().itemSqlParameterSourceProvider(new BeanPropertyItemSqlParameterSourceProvider<>())
                .sql("INSERT INTO person (first_name, last_name) VALUES (:firstName, :lastName)").dataSource(dataSource).build();
    }

    @Bean
    public Job importUserJob(JobCompletionNotificationListener listener, Step step1) {
        return jobBuilderFactory.get("importUserJob").incrementer(new RunIdIncrementer()).listener(listener).flow(step1).end().build();
    }

    @Bean
    public Step step1(JdbcBatchItemWriter<Person> writer) {
        return stepBuilderFactory.get("step1").<Person, Person> chunk(3).reader(reader()).processor(processor()).writer(writer).faultTolerant().skipLimit(2)
                .skip(Exception.class).build();
    }
}
I am trying to simulate an exception by manually throwing one for a single record in my item processor:
public class PersonItemProcessor implements ItemProcessor<Person, Person> {

    private static final Logger log = LoggerFactory.getLogger(PersonItemProcessor.class);

    @Override
    public Person process(final Person person) throws Exception {
        final String firstName = person.getFirstName().toUpperCase();
        final String lastName = person.getLastName().toUpperCase();
        final Person transformedPerson = new Person(firstName, lastName);
        log.info("Converting (" + person + ") into (" + transformedPerson + ")");
        if (person.getLastName().equals("Doem"))
            throw new Exception("DOOM");
        return transformedPerson;
    }
}
Now, as per the skip limit, when the exception is thrown the item processor reprocesses the chunk and skips the item that throws the error, and the item writer inserts all records into the DB except the one record with the exception.
This is all fine here, because my processor just converts lower-case names to upper case, so it can run many times without side effects.
But let's assume my item processor calls a web service and sends data,
and some exception is thrown after the web service has already been called successfully for some items. The remaining data in the chunk will then be processed again (calling the web service again).
I don't want to call the web service again, because that would send duplicate data, and the web service cannot identify duplicates.
How do I handle such a case? One option is to not skip the exception, but then the one bad record keeps the rest of the chunk from reaching the item writer even though the processor already called the web service for them, so that is not correct either.
The other option is a chunk size of 1, but that may not be efficient when processing thousands of records.
What are the other options?
According to your description, your item processor is not idempotent. However, the Fault tolerance section of the documentation says that the item processor should be idempotent when using a fault tolerant step. Here is an excerpt:
If a step is configured to be fault tolerant (typically by using skip or retry processing), any ItemProcessor used should be implemented in a way that is idempotent.
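If making the processor itself idempotent is not an option, the fault-tolerant step builder also has a processorNonTransactional() flag, which tells Spring Batch to cache the processor's results across a chunk rollback instead of re-running the processor on already-processed items. A sketch based on the step definition above; whether it fits depends on the processor having no transactional side effects that need rolling back:
@Bean
public Step step1(JdbcBatchItemWriter<Person> writer) {
    return stepBuilderFactory.get("step1").<Person, Person> chunk(3)
            .reader(reader())
            .processor(processor())
            .writer(writer)
            .faultTolerant()
            .skipLimit(2)
            .skip(Exception.class)
            // cache processor output across a rollback, so the web service is not
            // called again for items that were already processed successfully
            .processorNonTransactional()
            .build();
}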

Spring Batch Annotated No XML Pass Parameters to Item Reader

I created a simple Boot/Spring Batch 3.0.8.RELEASE job. I created a simple class that implements JobParametersIncrementer, which goes to the database, looks up how many days the query should cover, and puts that value into the JobParameters object.
I need that value in my JdbcCursorItemReader, as it selects data based on one of the looked-up JobParameters, but I cannot figure this out with Java annotations. XML examples are plentiful, not so much for Java.
Below is the BatchConfiguration class that runs the job:
@Autowired
SendJobParms jobParms; // this guy queries the DB and puts data into JobParameters

@Bean
public Job job(@Qualifier("step1") Step step1, @Qualifier("step2") Step step2) {
    return jobs.get("DW_Send").incrementer(jobParms).start(step1).next(step2).build();
}

@Bean
protected Step step2(ItemReader<McsendRequest> reader,
        ItemWriter<McsendRequest> writer) {
    return steps.get("step2")
            .<McsendRequest, McsendRequest> chunk(5000)
            .reader(reader)
            .writer(writer)
            .build();
}

@Bean
public JdbcCursorItemReader reader() {
    JdbcCursorItemReader<McsendRequest> itemReader = new JdbcCursorItemReader<McsendRequest>();
    itemReader.setDataSource(dataSource);
    // want to get access to JobParameter here so I can pull values out for my sql query
    itemReader.setSql("select xxxx where rownum <= JobParameter.getCount()");
    itemReader.setRowMapper(new McsendRequestMapper());
    return itemReader;
}
Change the reader definition as follows (example for a parameter of type Long named paramCount):
@Bean
@StepScope
public JdbcCursorItemReader reader(@Value("#{jobParameters['paramCount']}") Long paramCount) {
    JdbcCursorItemReader<McsendRequest> itemReader = new JdbcCursorItemReader<McsendRequest>();
    itemReader.setDataSource(dataSource);
    itemReader.setSql("select xxxx where rownum <= ?");
    ListPreparedStatementSetter listPreparedStatementSetter = new ListPreparedStatementSetter();
    listPreparedStatementSetter.setParameters(Arrays.asList(paramCount));
    itemReader.setPreparedStatementSetter(listPreparedStatementSetter);
    itemReader.setRowMapper(new McsendRequestMapper());
    return itemReader;
}
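For completeness, a minimal sketch of the incrementer side, i.e. how a class like SendJobParms could put paramCount into the job parameters so the @StepScope reader above can resolve it (the lookup query is an assumption, not from the original post):
public class SendJobParms implements JobParametersIncrementer {

    private final JdbcTemplate jdbcTemplate;

    public SendJobParms(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    @Override
    public JobParameters getNext(JobParameters parameters) {
        // look up how many rows the reader should select (query is illustrative)
        Long count = jdbcTemplate.queryForObject("select days_back from send_config", Long.class);
        JobParameters params = (parameters == null) ? new JobParameters() : parameters;
        return new JobParametersBuilder(params).addLong("paramCount", count).build();
    }
}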
