Spring batch - pass values between reader and processor - spring

I have a requirement where I need to read values from an xls (where a column called netCreditAmount exists) and save the values in database. The requirement is to add the value of netCreditAmount from all the rows and then set this sum in database only for the first row in xls and remaining rows are inserted with their corresponding netCreditAmounts.
How should I go ahead with the implemetation in Spring Batch. Normal reader, processor and writer are working fine but where exactly should i insert this implementation?
Thanks!

Yo can solve this by adding additional tasklet.
job flow can be like below
#Bean
public Job myJob(JobBuilderFactory jobs) throws Exception {
return jobs.get("myJob")
.start(step1LoadAllData()) // This step will load all data in database excpet first row in xls
.next(updateNetCreditAmountStep()) //// This step will be a tasklet. and will update total sum in first row. You can use database sql for sum for this
.build();
}
Tasklet will be something like below
#Component
public class updateNetCreditAmountTasklet implements Tasklet {
#Override
public RepeatStatus execute(StepContribution stepContribution, ChunkContext chunkContext)
throws Exception {
Double sum = jdbctemplate.queryForObject("select sum(netCreditAmount) from XYZ", Double.class);
// nouw update this some in database for first row
return null;
}
}

So what is the problem?
You need to setup your batch job step to use reader-processor-writer.
Reader has interface:
public interface ItemReader<T> {
T read();
}
Processor:
public interface ItemProcessor<I, O> {
O process(I item);
}
So what you need to have same type provided by reader - T; and pass it to processor - I
stepBuilderFactory.get("myCoolStep")
.<I, O>chunk(1)
.reader(myReader)
.processor(myProcessor)
.writer(myWriter)
.build();

Related

Spring Batch - delete

How can I do the deletion of the entities that I just persisted?
#Bean
public Job job() {
return this.jobBuilderFactory.get("job")
.start(this.syncStep())
.build();
}
#Bean
public Step syncStep() {
// read
RepositoryItemReader<Element1> reader = new RepositoryItemReader<>();
reader.setRepository(repository);
reader.setMethodName("findElements");
reader.setArguments(new ArrayList<>(Arrays.asList(ZonedDateTime.now())));
final HashMap<String, Sort.Direction> sorts = new HashMap<>();
sorts.put("uid", Sort.Direction.ASC);
reader.setSort(sorts);
// write
RepositoryItemWriter<Element1> writer = new RepositoryItemWriter<>();
writer.setRepository(otherrepository);
writer.setMethodName("save");
return stepBuilderFactory.get("syncStep")
.<Element1, Element2> chunk(10)
.reader(reader)
.processor(processor)
.writer(writer)
.build();
}
It is a process of dumping elements. We pass the elements from one table to another.
It is a process of dumping elements. We pass the elements from one table to another.
You can do that in two steps. The first step copies items from one table to another. The second step deletes the items from the source table. The second step should be executed only if the first step succeeds.
There are a few options:
Using a CompositeItemWriter
You could create a second ItemWriter that does the delete logic, for example:
RepositoryItemWriter<Element1> deleteWriter = new RepositoryItemWriter<>();
deleteWriter.setRepository(repository);
deleteWriter.setMethodName("delete");
To execute both writers you can use a CompositeItemWriter:
CompositeItemWriter<User> writer = new CompositeItemWriter<>();
// 'saveWriter' would be the writer you currently have
writer.setDelegates(List.of(saveWriter, deleteWriter));
This however won't work if your ItemProcessor transforms the original entity to something completely new. In that case I suggest using PropertyExtractingDelegatingItemWriter.
(Note, according to this question the writers run sequentially and the second writer should not be executed if the first one fails, but I'm not 100% sure on that.)
Using a separate Step
Alternatively, you could put the new writer in an entirely separate Step:
#Bean
public Step cleanupStep() {
// Same reader as before (might want to put this in a separate #Bean)
RepositoryItemReader<Element1> reader = new RepositoryItemReader<>();
// ...
// The 'deleteWriter' from before
RepositoryItemWriter<Element1> deleteWriter = new RepositoryItemWriter<>();
// ...
return stepBuilderFactory.get("cleanupStep")
.<Element1, Element2> chunk(10)
.reader(reader)
.writer(writer)
.build();
}
Now you can schedule the two steps individually:
#Bean
public Job job() {
return this.jobBuilderFactory.get("job")
.start(this.syncStep())
.next(this.cleanupStep())
.build();
}
Using a Tasklet
If you're using a separate step and depending on the amount of data, it might be more interesting to offload it entirely to the database and execute a single delete ... where ... query.
public class CleanupRepositoryTasklet implements Tasklet {
private final Repository repository;
#Override
public RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) throws Exception {
repository.customDeleteMethod();
return RepeatStatus.FINISHED;
}
}
This Tasklet can then be registered in the same way as before, by declaring a new Step in your configuration:
return this.stepBuilderFactory.get("cleanupStep")
.tasklet(myTasklet())
.build();

Spring Batch Single Reader Multiple Processers and Multiple Writers [duplicate]

In Spring batch I need to pass the items read by an ItemReader to two different processors and writer. What I'm trying to achieve is that...
+---> ItemProcessor#1 ---> ItemWriter#1
|
ItemReader ---> item ---+
|
+---> ItemProcessor#2 ---> ItemWriter#2
This is needed because items written by ItemWriter#1 should be processed in a completely different way compared to the ones written by ItemWriter#2.
Moreover, ItemReader reads item from a database, and the queries it executes are so computational expensive that executing the same query twice should be discarded.
Any hint about how to achieve such set up ? Or, at least, a logically equivalent set up ?
This solution is valid if your item should be processed by processor #1 and processor #2
You have to create a processor #0 with this signature:
class Processor0<Item, CompositeResultBean>
where CompositeResultBean is a bean defined as
class CompositeResultBean {
Processor1ResultBean result1;
Processor2ResultBean result2;
}
In your Processor #0 just delegate work to processors #1 and #2 and put result in CompositeResultBean
CompositeResultBean Processor0.process(Item item) {
final CompositeResultBean r = new CompositeResultBean();
r.setResult1(processor1.process(item));
r.setResult2(processor2.process(item));
return r;
}
Your own writer is a CompositeItemWriter that delegate to writer CompositeResultBean.result1 or CompositeResultBean.result2 (look at PropertyExtractingDelegatingItemWriter, maybe can help)
I followed Luca's suggestion to use PropertyExtractingDelegatingItemWriter as writer and I was able to work with two different entities in one single step.
First of all what I did was to define a DTO that stores the two entities/results from the processor
public class DatabaseEntry {
private AccessLogEntry accessLogEntry;
private BlockedIp blockedIp;
public AccessLogEntry getAccessLogEntry() {
return accessLogEntry;
}
public void setAccessLogEntry(AccessLogEntry accessLogEntry) {
this.accessLogEntry = accessLogEntry;
}
public BlockedIp getBlockedIp() {
return blockedIp;
}
public void setBlockedIp(BlockedIp blockedIp) {
this.blockedIp = blockedIp;
}
}
Then I passed this DTO to the writer, a PropertyExtractingDelegatingItemWriter class where I define two customized methods to write the entities into the database, see my writer code below:
#Configuration
public class LogWriter extends LogAbstract {
#Autowired
private DataSource dataSource;
#Bean()
public PropertyExtractingDelegatingItemWriter<DatabaseEntry> itemWriterAccessLogEntry() {
PropertyExtractingDelegatingItemWriter<DatabaseEntry> propertyExtractingDelegatingItemWriter = new PropertyExtractingDelegatingItemWriter<DatabaseEntry>();
propertyExtractingDelegatingItemWriter.setFieldsUsedAsTargetMethodArguments(new String[]{"accessLogEntry", "blockedIp"});
propertyExtractingDelegatingItemWriter.setTargetObject(this);
propertyExtractingDelegatingItemWriter.setTargetMethod("saveTransaction");
return propertyExtractingDelegatingItemWriter;
}
public void saveTransaction(AccessLogEntry accessLogEntry, BlockedIp blockedIp) throws SQLException {
writeAccessLogTable(accessLogEntry);
if (blockedIp != null) {
writeBlockedIp(blockedIp);
}
}
private void writeBlockedIp(BlockedIp entry) throws SQLException {
PreparedStatement statement = dataSource.getConnection().prepareStatement("INSERT INTO blocked_ips (ip,threshold,startDate,endDate,comment) VALUES (?,?,?,?,?)");
statement.setString(1, entry.getIp());
statement.setInt(2, threshold);
statement.setTimestamp(3, Timestamp.valueOf(startDate));
statement.setTimestamp(4, Timestamp.valueOf(endDate));
statement.setString(5, entry.getComment());
statement.execute();
}
private void writeAccessLogTable(AccessLogEntry entry) throws SQLException {
PreparedStatement statement = dataSource.getConnection().prepareStatement("INSERT INTO log_entries (date,ip,request,status,userAgent) VALUES (?,?,?,?,?)");
statement.setTimestamp(1, Timestamp.valueOf(entry.getDate()));
statement.setString(2, entry.getIp());
statement.setString(3, entry.getRequest());
statement.setString(4, entry.getStatus());
statement.setString(5, entry.getUserAgent());
statement.execute();
}
}
With this approach you can get the wanted inital behaviour from a single reader for processing multiple entities and save them in a single step.
You can use a CompositeItemProcessor and CompositeItemWriter
It won't look exactly like your schema, it will be sequential, but it will do the job.
this is the solution I came up with.
So, the idea is to code a new Writer that "contains" both an ItemProcessor and an ItemWriter. Just to give you an idea, we called it PreprocessoWriter, and that's the core code.
private ItemWriter<O> writer;
private ItemProcessor<I, O> processor;
#Override
public void write(List<? extends I> items) throws Exception {
List<O> toWrite = new ArrayList<O>();
for (I item : items) {
toWrite.add(processor.process(item));
}
writer.write(toWrite);
}
There's a lot of things being left aside. Management of ItemStream, for instance. But in our particular scenario this was enough.
So you can just combine multiple PreprocessorWriter with CompositeWriter.
There is an other solution if you have a reasonable amount of items (like less than 1 Go) : you can cache the result of your select into a collection wrapped in a Spring bean.
Then u can just read the collection twice with no cost.

Spring Batch Runtime Exception in Item processor

I am learning spring batch and trying to understand how item processor works, during exception.
I am reading data from csv file in a chunk of 3 records and process it and write it to Database.
my csv file
Jill,Doe
Joe,Doe
Justin,Doe
Jane,Doe
John,Doem
Jill,Doe
Joe,Doe
Justin,Doe
Jane,Doe
Batch Configuration, reading items in chunk of 3 , and skip limit 2
#Configuration
#EnableBatchProcessing
public class BatchConfiguration {
#Autowired
public JobBuilderFactory jobBuilderFactory;
#Autowired
public StepBuilderFactory stepBuilderFactory;
#Bean
public FlatFileItemReader<Person> reader() {
return new FlatFileItemReaderBuilder<Person>().name("personItemReader").resource(new ClassPathResource("sample-data.csv")).delimited()
.names(new String[] { "firstName", "lastName" }).fieldSetMapper(new BeanWrapperFieldSetMapper<Person>() {
{
setTargetType(Person.class);
}
}).build();
}
#Bean
public PersonItemProcessor processor() {
return new PersonItemProcessor();
}
#Bean
public JdbcBatchItemWriter<Person> writer(DataSource dataSource) {
return new JdbcBatchItemWriterBuilder<Person>().itemSqlParameterSourceProvider(new BeanPropertyItemSqlParameterSourceProvider<>())
.sql("INSERT INTO person (first_name, last_name) VALUES (:firstName, :lastName)").dataSource(dataSource).build();
}
#Bean
public Job importUserJob(JobCompletionNotificationListener listener, Step step1) {
return jobBuilderFactory.get("importUserJob").incrementer(new RunIdIncrementer()).listener(listener).flow(step1).end().build();
}
#Bean
public Step step1(JdbcBatchItemWriter<Person> writer) {
return stepBuilderFactory.get("step1").<Person, Person> chunk(3).reader(reader()).processor(processor()).writer(writer).faultTolerant().skipLimit(2)
.skip(Exception.class).build();
}
}
I am trying to simulate a Exception, by throwing Exception manually for one record in my item processor
public class PersonItemProcessor implements ItemProcessor<Person, Person> {
private static final Logger log = LoggerFactory.getLogger(PersonItemProcessor.class);
#Override
public Person process(final Person person) throws Exception {
final String firstName = person.getFirstName().toUpperCase();
final String lastName = person.getLastName().toUpperCase();
final Person transformedPerson = new Person(firstName, lastName);
log.info("Converting (" + person + ") into (" + transformedPerson + ")");
if (person.getLastName().equals("Doem"))
throw new Exception("DOOM");
return transformedPerson;
}
}
Now as per skip limit, when the exception is thrown, the item processor is re processing the chunk and skips the item which throws error and item write also inserts all records in DB , except the one record with exception.
This is all fine, because my processor, it is just converting lower to upper case name, and it can be run many times with out impact.
But lets assume if my item processor, is calling web service and sending data.
and if some exception is thrown after successful calling for web service. then remaining data in the chunk will be processed again (and calling webservice again).
I don't want to call web service again, because it is like sending duplicate data to web service and the webservice system cannot identify duplicate data.
How to handle such case. one option is don't skip Exception, which means my still one record in the chunk will not make it to item writer, even though the processor had called web service. so that is not correct.
other option chunk should be of size 1 , then this may not be efficient in processing thousands of records.
what are the other options ?
According to your description, your item processor is not idempotent. However, the Fault tolerance section of the documentation says that the item processor should be idempotent when using a fault tolerant step. Here is an excerpt:
If a step is configured to be fault tolerant (typically by using skip or retry processing), any ItemProcessor used should be implemented in a way that is idempotent.

Spring Batch memory leak - XML to database using ItemWriter

I had a problem with a Spring Batch job for reading a large XML file (a few million records) and saving the records from it to a database. The job uses chunk of 100 elements and MultiResourceItemReader for reading the XML, ItemProcessor for processed records and ItemWriter for writing records to the database using JPA and EntityManager. The problem is that when call persist operation the job ends up with OutOfMemoryError (I tried to comment writer phase and the problem does not occur).
public class MyClassWriter implements ItemWriter<MyObject> {
#Autowired
private MyDelegate delegate;
#Override
public void write(List<? extends MyObject> items) throws Exception {
...
List<MyObject> foos2 = (List<MyObject>)(List<?>)items;
delegate.setInsert(foos2);
...
}
and
public void setInsert(List<MyObject> list) {
for (MyObject el : list) {
em.persist(el);
}
em.flush();
em.clear(); //I tried to call clear operation too, but not solved problem
}
Any suggestion for me?
It seems the OutOfMemoryException is caused when trying to save too many items at once, try saving the items in batches:
int c = 0;
for (MyObject mo : list) {
em.persist(mo);
if (++c % 1000 == 0) {
em.flush();
}
}
// save any remaining items
em.flush();

Spring Batch: One reader, multiple processors and writers

In Spring batch I need to pass the items read by an ItemReader to two different processors and writer. What I'm trying to achieve is that...
+---> ItemProcessor#1 ---> ItemWriter#1
|
ItemReader ---> item ---+
|
+---> ItemProcessor#2 ---> ItemWriter#2
This is needed because items written by ItemWriter#1 should be processed in a completely different way compared to the ones written by ItemWriter#2.
Moreover, ItemReader reads item from a database, and the queries it executes are so computational expensive that executing the same query twice should be discarded.
Any hint about how to achieve such set up ? Or, at least, a logically equivalent set up ?
This solution is valid if your item should be processed by processor #1 and processor #2
You have to create a processor #0 with this signature:
class Processor0<Item, CompositeResultBean>
where CompositeResultBean is a bean defined as
class CompositeResultBean {
Processor1ResultBean result1;
Processor2ResultBean result2;
}
In your Processor #0 just delegate work to processors #1 and #2 and put result in CompositeResultBean
CompositeResultBean Processor0.process(Item item) {
final CompositeResultBean r = new CompositeResultBean();
r.setResult1(processor1.process(item));
r.setResult2(processor2.process(item));
return r;
}
Your own writer is a CompositeItemWriter that delegate to writer CompositeResultBean.result1 or CompositeResultBean.result2 (look at PropertyExtractingDelegatingItemWriter, maybe can help)
I followed Luca's suggestion to use PropertyExtractingDelegatingItemWriter as writer and I was able to work with two different entities in one single step.
First of all what I did was to define a DTO that stores the two entities/results from the processor
public class DatabaseEntry {
private AccessLogEntry accessLogEntry;
private BlockedIp blockedIp;
public AccessLogEntry getAccessLogEntry() {
return accessLogEntry;
}
public void setAccessLogEntry(AccessLogEntry accessLogEntry) {
this.accessLogEntry = accessLogEntry;
}
public BlockedIp getBlockedIp() {
return blockedIp;
}
public void setBlockedIp(BlockedIp blockedIp) {
this.blockedIp = blockedIp;
}
}
Then I passed this DTO to the writer, a PropertyExtractingDelegatingItemWriter class where I define two customized methods to write the entities into the database, see my writer code below:
#Configuration
public class LogWriter extends LogAbstract {
#Autowired
private DataSource dataSource;
#Bean()
public PropertyExtractingDelegatingItemWriter<DatabaseEntry> itemWriterAccessLogEntry() {
PropertyExtractingDelegatingItemWriter<DatabaseEntry> propertyExtractingDelegatingItemWriter = new PropertyExtractingDelegatingItemWriter<DatabaseEntry>();
propertyExtractingDelegatingItemWriter.setFieldsUsedAsTargetMethodArguments(new String[]{"accessLogEntry", "blockedIp"});
propertyExtractingDelegatingItemWriter.setTargetObject(this);
propertyExtractingDelegatingItemWriter.setTargetMethod("saveTransaction");
return propertyExtractingDelegatingItemWriter;
}
public void saveTransaction(AccessLogEntry accessLogEntry, BlockedIp blockedIp) throws SQLException {
writeAccessLogTable(accessLogEntry);
if (blockedIp != null) {
writeBlockedIp(blockedIp);
}
}
private void writeBlockedIp(BlockedIp entry) throws SQLException {
PreparedStatement statement = dataSource.getConnection().prepareStatement("INSERT INTO blocked_ips (ip,threshold,startDate,endDate,comment) VALUES (?,?,?,?,?)");
statement.setString(1, entry.getIp());
statement.setInt(2, threshold);
statement.setTimestamp(3, Timestamp.valueOf(startDate));
statement.setTimestamp(4, Timestamp.valueOf(endDate));
statement.setString(5, entry.getComment());
statement.execute();
}
private void writeAccessLogTable(AccessLogEntry entry) throws SQLException {
PreparedStatement statement = dataSource.getConnection().prepareStatement("INSERT INTO log_entries (date,ip,request,status,userAgent) VALUES (?,?,?,?,?)");
statement.setTimestamp(1, Timestamp.valueOf(entry.getDate()));
statement.setString(2, entry.getIp());
statement.setString(3, entry.getRequest());
statement.setString(4, entry.getStatus());
statement.setString(5, entry.getUserAgent());
statement.execute();
}
}
With this approach you can get the wanted inital behaviour from a single reader for processing multiple entities and save them in a single step.
You can use a CompositeItemProcessor and CompositeItemWriter
It won't look exactly like your schema, it will be sequential, but it will do the job.
this is the solution I came up with.
So, the idea is to code a new Writer that "contains" both an ItemProcessor and an ItemWriter. Just to give you an idea, we called it PreprocessoWriter, and that's the core code.
private ItemWriter<O> writer;
private ItemProcessor<I, O> processor;
#Override
public void write(List<? extends I> items) throws Exception {
List<O> toWrite = new ArrayList<O>();
for (I item : items) {
toWrite.add(processor.process(item));
}
writer.write(toWrite);
}
There's a lot of things being left aside. Management of ItemStream, for instance. But in our particular scenario this was enough.
So you can just combine multiple PreprocessorWriter with CompositeWriter.
There is an other solution if you have a reasonable amount of items (like less than 1 Go) : you can cache the result of your select into a collection wrapped in a Spring bean.
Then u can just read the collection twice with no cost.

Resources