Multiline AggregateItemReader not working as suggested in spring-batch-samples - spring

I am trying to use the multiline ItemReader following the spring-batch-samples multiline example at https://github.com/spring-projects/spring-batch/tree/main/spring-batch-samples#multiline
I am running into a compilation error, as below.
I am sure it is something related to generics: the step is looking for a class implementing ItemReader, but AggregateItemReader implements ItemReader<List>.
public class AggregateItemReader<T> implements ItemReader<List<T>> {
You can find my code here: https://github.com/arpit9mittal/spring-batch-demo/blob/master/src/main/java/my/demo/batch/BatchConfiguration.java
UPDATE:
I suppressed the generics and updated the AggregateItemReader as below in order to call the ItemStreamReader open() method.
public class AggregateItemReader<T> implements ItemStreamReader<List<T>> {
private static final Log LOG = LogFactory.getLog(AggregateItemReader.class);
private ItemStreamReader<AggregateItem<T>> itemReader;
I noticed that the ItemWriter is writing lists of records instead of one record per line:
[Trade: [isin=UK21341EAH45,quantity=978,price=98.34,customer=customer1], Trade: [isin=UK21341EAH46,quantity=112,price=18.12,customer=customer2]]
[Trade: [isin=UK21341EAH47,quantity=245,price=12.78,customer=customer3], Trade: [isin=UK21341EAH48,quantity=108,price=9.25,customer=customer4], Trade: [isin=UK21341EAH49,quantity=854,price=23.39,customer=customer5]]
[Trade: [isin=UK21341EAH47,quantity=245,price=12.78,customer=customer6], Trade: [isin=UK21341EAH48,quantity=108,price=9.25,customer=customer7], Trade: [isin=UK21341EAH49,quantity=854,price=23.39,customer=customer8]]
And when I try to add a processor, it complains that the processor cannot convert the list into a Trade object.
@Bean
public ItemProcessor<Trade, Trade> processor() {
    return new ItemProcessor<Trade, Trade>() {
        @Override
        public Trade process(Trade item) throws Exception {
            item.setProcessed(true);
            return item;
        }
    };
}
@SuppressWarnings({ "rawtypes", "unchecked" })
@Bean
public Step multilineStep(
        AggregateItemReader reader,
        ItemProcessor processor,
        FlatFileItemWriter writer,
        StepItemReadListener itemReadListener) {
    return stepBuilderFactory.get("multiLineStep")
            .chunk(1)
            .reader(reader)
            .writer(writer)
            .processor(processor)
            .build();
}
ERROR:
java.lang.ClassCastException: java.util.ArrayList cannot be cast to my.demo.batch.multiline.Trade
at my.demo.batch.BatchConfiguration$2.process(BatchConfiguration.java:1) ~[main/:na]
at org.springframework.batch.core.step.item.SimpleChunkProcessor.doProcess(SimpleChunkProcessor.java:134) ~[spring-batch-core-4.3.3.jar:4.3.3]
at org.springframework.batch.core.step.item.SimpleChunkProcessor.transform(SimpleChunkProcessor.java:319) ~[spring-batch-core-4.3.3.jar:4.3.3]
at org.springframework.batch.core.step.item.SimpleChunkProcessor.process(SimpleChunkProcessor.java:210) ~[spring-batch-core-4.3.3.jar:4.3.3]
at org.springframework.batch.core.step.item.ChunkOrientedTasklet.execute(ChunkOrientedTasklet.java:77) ~[spring-batch-core-4.3.3.jar:4.3.3]
HELP:
How can we make it work without suppressing generics?
How can we ensure that the ItemReader returns a list just as it does with chunk processing, so that the ItemProcessor and ItemWriter work as usual?
Is it possible to do so without extending SimpleStepBuilder and SimpleChunkProvider?

You need to be consistent in what the type of the items is that you want to handle on batch level. According to your step definition it is Trade. By calling <Trade, Trade>chunk(1) on the step builder, you declare that your batch should read items of type Trade with a chunk size of 1 (i.e. one at a time) and pass these on to a writer for items of type Trade. In this case, you need to supply a reader of type ItemReader<Trade>, a writer of type ItemWriter<Trade> and optionally a processor of type ItemProcessor<Trade, Trade>.
The problem is that your reader is of type ItemReader<List<Trade>>, i.e. it does not yield a Trade for each invocation of its read method but a list of trades.
If you want to use the AggregateItemReader you need to wrap it into a custom reader that works as an adapter and actually returns Trade items and not List<Trade>.
For example, the custom read method could look like this:
public Trade read() throws Exception {
    if (queue.isEmpty()) {
        List<Trade> trades = aggregateItemReader.read();
        if (trades != null) {
            queue.addAll(trades);
        }
    }
    return queue.poll();
}
with queue initialized as
private Deque<Trade> queue = new ArrayDeque<>();
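For completeness, here is a minimal end-to-end sketch of such an adapter; it is not part of the original answer or the sample project. It assumes the AggregateItemReader<Trade> from the samples (whose read() returns a List<Trade>) is injected as a delegate, and the class name TradeAggregateReaderAdapter is made up:

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

import org.springframework.batch.item.ExecutionContext;
import org.springframework.batch.item.ItemStream;
import org.springframework.batch.item.ItemStreamReader;

// Flattens the List<Trade> chunks produced by the AggregateItemReader into single
// Trade items, so the step can be typed <Trade, Trade> without suppressing generics.
public class TradeAggregateReaderAdapter implements ItemStreamReader<Trade> {

    private final AggregateItemReader<Trade> delegate;
    private final Deque<Trade> queue = new ArrayDeque<>();

    public TradeAggregateReaderAdapter(AggregateItemReader<Trade> delegate) {
        this.delegate = delegate;
    }

    @Override
    public Trade read() throws Exception {
        if (queue.isEmpty()) {
            List<Trade> trades = delegate.read();
            if (trades != null) {
                queue.addAll(trades);
            }
        }
        return queue.poll();
    }

    // Propagate the stream callbacks if the underlying reader is an ItemStream
    // (as in the updated AggregateItemReader that implements ItemStreamReader).
    @Override
    public void open(ExecutionContext executionContext) {
        if (delegate instanceof ItemStream) {
            ((ItemStream) delegate).open(executionContext);
        }
    }

    @Override
    public void update(ExecutionContext executionContext) {
        if (delegate instanceof ItemStream) {
            ((ItemStream) delegate).update(executionContext);
        }
    }

    @Override
    public void close() {
        if (delegate instanceof ItemStream) {
            ((ItemStream) delegate).close();
        }
    }
}

With such an adapter in place, the step from the question could then be declared with consistent generics, for example:

@Bean
public Step multilineStep(TradeAggregateReaderAdapter reader,
                          ItemProcessor<Trade, Trade> processor,
                          FlatFileItemWriter<Trade> writer) {
    return stepBuilderFactory.get("multilineStep")
            .<Trade, Trade>chunk(1)
            .reader(reader)
            .processor(processor)
            .writer(writer)
            .build();
}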

Related

Spring Batch Single Reader Multiple Processers and Multiple Writers [duplicate]

In Spring Batch I need to pass the items read by an ItemReader to two different processors and writers. What I'm trying to achieve is this:
+---> ItemProcessor#1 ---> ItemWriter#1
|
ItemReader ---> item ---+
|
+---> ItemProcessor#2 ---> ItemWriter#2
This is needed because items written by ItemWriter#1 should be processed in a completely different way compared to the ones written by ItemWriter#2.
Moreover, ItemReader reads items from a database, and the queries it executes are so computationally expensive that executing the same query twice should be avoided.
Any hint about how to achieve such a setup? Or, at least, a logically equivalent setup?
This solution is valid if your item should be processed by both processor #1 and processor #2.
You have to create a processor #0 with this signature:
class Processor0 implements ItemProcessor<Item, CompositeResultBean>
where CompositeResultBean is a bean defined as
class CompositeResultBean {
    Processor1ResultBean result1;
    Processor2ResultBean result2;
}
In your processor #0, just delegate the work to processors #1 and #2 and put the results in the CompositeResultBean:
public CompositeResultBean process(Item item) throws Exception {
    final CompositeResultBean r = new CompositeResultBean();
    r.setResult1(processor1.process(item));
    r.setResult2(processor2.process(item));
    return r;
}
Your own writer is a CompositeItemWriter that delegates to a writer for CompositeResultBean.result1 or CompositeResultBean.result2 (look at PropertyExtractingDelegatingItemWriter; it may help).
I followed Luca's suggestion to use PropertyExtractingDelegatingItemWriter as the writer, and I was able to work with two different entities in a single step.
First of all, I defined a DTO that stores the two entities/results from the processor:
public class DatabaseEntry {

    private AccessLogEntry accessLogEntry;
    private BlockedIp blockedIp;

    public AccessLogEntry getAccessLogEntry() {
        return accessLogEntry;
    }

    public void setAccessLogEntry(AccessLogEntry accessLogEntry) {
        this.accessLogEntry = accessLogEntry;
    }

    public BlockedIp getBlockedIp() {
        return blockedIp;
    }

    public void setBlockedIp(BlockedIp blockedIp) {
        this.blockedIp = blockedIp;
    }
}
Then I passed this DTO to the writer, a PropertyExtractingDelegatingItemWriter, for which I defined two customized methods to write the entities into the database; see my writer code below:
@Configuration
public class LogWriter extends LogAbstract {

    @Autowired
    private DataSource dataSource;

    @Bean
    public PropertyExtractingDelegatingItemWriter<DatabaseEntry> itemWriterAccessLogEntry() {
        PropertyExtractingDelegatingItemWriter<DatabaseEntry> propertyExtractingDelegatingItemWriter = new PropertyExtractingDelegatingItemWriter<DatabaseEntry>();
        propertyExtractingDelegatingItemWriter.setFieldsUsedAsTargetMethodArguments(new String[]{"accessLogEntry", "blockedIp"});
        propertyExtractingDelegatingItemWriter.setTargetObject(this);
        propertyExtractingDelegatingItemWriter.setTargetMethod("saveTransaction");
        return propertyExtractingDelegatingItemWriter;
    }

    public void saveTransaction(AccessLogEntry accessLogEntry, BlockedIp blockedIp) throws SQLException {
        writeAccessLogTable(accessLogEntry);
        if (blockedIp != null) {
            writeBlockedIp(blockedIp);
        }
    }

    private void writeBlockedIp(BlockedIp entry) throws SQLException {
        PreparedStatement statement = dataSource.getConnection().prepareStatement("INSERT INTO blocked_ips (ip,threshold,startDate,endDate,comment) VALUES (?,?,?,?,?)");
        statement.setString(1, entry.getIp());
        statement.setInt(2, threshold);
        statement.setTimestamp(3, Timestamp.valueOf(startDate));
        statement.setTimestamp(4, Timestamp.valueOf(endDate));
        statement.setString(5, entry.getComment());
        statement.execute();
    }

    private void writeAccessLogTable(AccessLogEntry entry) throws SQLException {
        PreparedStatement statement = dataSource.getConnection().prepareStatement("INSERT INTO log_entries (date,ip,request,status,userAgent) VALUES (?,?,?,?,?)");
        statement.setTimestamp(1, Timestamp.valueOf(entry.getDate()));
        statement.setString(2, entry.getIp());
        statement.setString(3, entry.getRequest());
        statement.setString(4, entry.getStatus());
        statement.setString(5, entry.getUserAgent());
        statement.execute();
    }
}
With this approach you can get the desired initial behaviour: a single reader that processes multiple entities and saves them in a single step.
You can use a CompositeItemProcessor and a CompositeItemWriter.
It won't look exactly like your schema (it will be sequential), but it will do the job; a rough wiring sketch follows.
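A hedged sketch of that sequential wiring, not taken from the original answer: CompositeItemProcessor chains the processors so the output of the first feeds the second, and CompositeItemWriter calls each delegate writer with the same chunk of items. The Item type and the processor1/processor2/writer1/writer2 bean names are placeholders for your own classes.

@Bean
public CompositeItemProcessor<Item, Item> compositeProcessor(ItemProcessor<Item, Item> processor1,
                                                             ItemProcessor<Item, Item> processor2) {
    // processor1 runs first; its output is handed to processor2
    CompositeItemProcessor<Item, Item> composite = new CompositeItemProcessor<>();
    composite.setDelegates(Arrays.asList(processor1, processor2));
    return composite;
}

@Bean
public CompositeItemWriter<Item> compositeWriter(ItemWriter<Item> writer1, ItemWriter<Item> writer2) {
    // both writers receive the same processed items, one after the other
    CompositeItemWriter<Item> composite = new CompositeItemWriter<>();
    composite.setDelegates(Arrays.asList(writer1, writer2));
    return composite;
}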
This is the solution I came up with.
The idea is to code a new writer that "contains" both an ItemProcessor and an ItemWriter. Just to give you an idea, we called it PreprocessorWriter, and this is the core code:
private ItemWriter<O> writer;
private ItemProcessor<I, O> processor;

@Override
public void write(List<? extends I> items) throws Exception {
    List<O> toWrite = new ArrayList<O>();
    for (I item : items) {
        toWrite.add(processor.process(item));
    }
    writer.write(toWrite);
}
There are a lot of things left aside, management of ItemStream for instance, but in our particular scenario this was enough.
So you can just combine multiple PreprocessorWriters with a CompositeItemWriter.
There is another solution if you have a reasonable amount of items (say, less than 1 GB): you can cache the result of your select into a collection wrapped in a Spring bean.
Then you can just read the collection twice at no cost, as sketched below.
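A small sketch of that caching idea, not from the original answer; all class and bean names here are made up, and the assumption is that the query result fits comfortably in memory:

// Runs the expensive query once and keeps the result for subsequent readers.
@Component
public class QueryResultCache {

    private final JdbcTemplate jdbcTemplate;
    private List<Item> cached;

    public QueryResultCache(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    public synchronized List<Item> getItems() {
        if (cached == null) {
            cached = jdbcTemplate.query("select ...", new ItemRowMapper()); // hypothetical query and row mapper
        }
        return cached;
    }
}

Each step can then read from the cache with a cheap in-memory reader, for example a step-scoped ListItemReader:

@Bean
@StepScope
public ListItemReader<Item> cachedItemReader(QueryResultCache cache) {
    // a fresh reader per step execution iterates over the cached items
    return new ListItemReader<>(cache.getItems());
}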

Spring batch - org.springframework.batch.item.ReaderNotOpenException - jdbccursoritemreader

I get a ReaderNotOpenException when trying to read from the JdbcCursorItemReader in my ItemReader implementation. I checked Stack Overflow but couldn't find a discussion on the JDBC item reader.
Here is the code of the batch config and the item reader implementation (only the relevant code is included).
public class BatchConfig extends DefaultBatchConfigurer {

    @Bean
    public ItemReader<Allocation> allocationReader() {
        return new AllocationReader(dataSource);
    }

    @Bean
    public Step step() {
        return stepBuilderFactory.get("step").<Allocation, Allocation>chunk(1).reader(allocationReader()).processor(allocationProcessor()).writer(allocationWriter()).build();
    }
}

public class AllocationReader implements ItemReader<Allocation> {

    private DataSource ds;
    private String block;
    private StepExecution stepExecution;

    public AllocationReader(DataSource ds) {
        this.ds = ds;
    }

    @BeforeStep
    public void readStep(StepExecution stepExecution) {
        this.stepExecution = stepExecution;
        block = (String) stepExecution.getJobExecution().getExecutionContext().get("blocks");
    }

    @Override
    public Allocation read() throws Exception {
        JdbcCursorItemReader<Allocation> reader = new JdbcCursorItemReader<Allocation>();
        reader.setSql("select * from blocks where block_ref = " + block);
        reader.setDataSource(this.ds);
        reader.setRowMapper(new AllocationMapper());
        return reader.read();
    }
}
I could not declare the item reader as a bean in the batch config because I need @BeforeStep in the item reader to access the StepExecution.
If the item reader's read() return type is changed to JdbcCursorItemReader, it causes a type error in the step's reader().
Let me know what I am missing or whether any other code snippet is required.
You are creating a JdbcCursorItemReader instance in the read method of AllocationReader. This is not correct. The code of this method should be the implementation of the actual read operation, and not for creating an item reader.
I could not declare the item reader as a bean in the batch config because I need @BeforeStep in the item reader to access the StepExecution.
For this use case, you can define the reader as a step-scoped bean and inject attributes from the job execution context as needed. This is explained in the reference documentation here: Late Binding of Job and Step Attributes. In your case, the reader could be defined like this:
@Bean
@StepScope
public JdbcCursorItemReader<Allocation> itemReader(@Value("#{jobExecutionContext['block']}") String block) {
    // use "block" as needed to define the reader
    JdbcCursorItemReader<Allocation> reader = new JdbcCursorItemReader<Allocation>();
    reader.setSql("select * from blocks where block_ref = " + block);
    reader.setDataSource(this.ds);
    reader.setRowMapper(new AllocationMapper());
    return reader;
}
When you define an item reader bean that is an ItemStream, you need to make the bean definition method return at least ItemStreamReader (or the actual implementation type), so that Spring Batch correctly scopes the bean and calls open/update/close appropriately during the step. Otherwise, the open method will not be called and therefore you will get that ReaderNotOpenException.
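As a hedged illustration of that point, the step-scoped reader above could be wired into the step as follows. Because the bean method returns the concrete JdbcCursorItemReader type (an ItemStream), Spring Batch registers it as a stream and calls open/update/close around the chunk-oriented processing; allocationProcessor() and allocationWriter() are the beans from the question.

@Bean
public Step step(JdbcCursorItemReader<Allocation> itemReader) {
    return stepBuilderFactory.get("step")
            .<Allocation, Allocation>chunk(1)
            .reader(itemReader)
            .processor(allocationProcessor())
            .writer(allocationWriter())
            .build();
}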

Spring Batch Runtime Exception in Item processor

I am learning Spring Batch and trying to understand how the item processor behaves when an exception occurs.
I am reading data from a CSV file in chunks of 3 records, processing them, and writing them to a database.
My CSV file:
Jill,Doe
Joe,Doe
Justin,Doe
Jane,Doe
John,Doem
Jill,Doe
Joe,Doe
Justin,Doe
Jane,Doe
Batch configuration, reading items in chunks of 3, with a skip limit of 2:
@Configuration
@EnableBatchProcessing
public class BatchConfiguration {

    @Autowired
    public JobBuilderFactory jobBuilderFactory;

    @Autowired
    public StepBuilderFactory stepBuilderFactory;

    @Bean
    public FlatFileItemReader<Person> reader() {
        return new FlatFileItemReaderBuilder<Person>().name("personItemReader").resource(new ClassPathResource("sample-data.csv")).delimited()
                .names(new String[] { "firstName", "lastName" }).fieldSetMapper(new BeanWrapperFieldSetMapper<Person>() {
                    {
                        setTargetType(Person.class);
                    }
                }).build();
    }

    @Bean
    public PersonItemProcessor processor() {
        return new PersonItemProcessor();
    }

    @Bean
    public JdbcBatchItemWriter<Person> writer(DataSource dataSource) {
        return new JdbcBatchItemWriterBuilder<Person>().itemSqlParameterSourceProvider(new BeanPropertyItemSqlParameterSourceProvider<>())
                .sql("INSERT INTO person (first_name, last_name) VALUES (:firstName, :lastName)").dataSource(dataSource).build();
    }

    @Bean
    public Job importUserJob(JobCompletionNotificationListener listener, Step step1) {
        return jobBuilderFactory.get("importUserJob").incrementer(new RunIdIncrementer()).listener(listener).flow(step1).end().build();
    }

    @Bean
    public Step step1(JdbcBatchItemWriter<Person> writer) {
        return stepBuilderFactory.get("step1").<Person, Person>chunk(3).reader(reader()).processor(processor()).writer(writer).faultTolerant().skipLimit(2)
                .skip(Exception.class).build();
    }
}
I am trying to simulate an exception by throwing one manually for one record in my item processor:
public class PersonItemProcessor implements ItemProcessor<Person, Person> {

    private static final Logger log = LoggerFactory.getLogger(PersonItemProcessor.class);

    @Override
    public Person process(final Person person) throws Exception {
        final String firstName = person.getFirstName().toUpperCase();
        final String lastName = person.getLastName().toUpperCase();
        final Person transformedPerson = new Person(firstName, lastName);
        log.info("Converting (" + person + ") into (" + transformedPerson + ")");
        if (person.getLastName().equals("Doem"))
            throw new Exception("DOOM");
        return transformedPerson;
    }
}
Now, as per the skip limit, when the exception is thrown the item processor reprocesses the chunk, the item that throws the error is skipped, and the item writer inserts all records into the DB except the one record with the exception.
This is all fine, because my processor just converts lowercase names to uppercase and can be run many times without impact.
But let's assume my item processor calls a web service and sends data, and some exception is thrown after the web service has already been called successfully. Then the remaining data in the chunk will be processed again (and the web service called again).
I don't want to call the web service again, because that would send duplicate data, and the web service system cannot identify duplicates.
How do I handle such a case? One option is not to skip the exception, but then one record in the chunk will not make it to the item writer even though the processor had already called the web service, so that is not correct.
Another option is a chunk size of 1, but that may not be efficient for processing thousands of records.
What are the other options?
According to your description, your item processor is not idempotent. However, the Fault tolerance section of the documentation says that the item processor should be idempotent when using a fault tolerant step. Here is an excerpt:
If a step is configured to be fault tolerant (typically by using skip or retry processing), any ItemProcessor used should be implemented in a way that is idempotent.
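One hedged way to move toward idempotency, if the remote system cannot de-duplicate for you, is to make the processor itself remember what it has already sent, keyed by some natural identifier of the item. This is only a sketch: the callWebService call, the key format, and the class name are made up, and the in-memory set would need to be replaced by a persistent store if the "already sent" state must survive restarts.

public class IdempotentPersonProcessor implements ItemProcessor<Person, Person> {

    // Tracks items already pushed to the web service within this run.
    private final Set<String> alreadySent = ConcurrentHashMap.newKeySet();

    @Override
    public Person process(Person person) throws Exception {
        String key = person.getFirstName() + "|" + person.getLastName();
        if (alreadySent.add(key)) {
            // callWebService(person); // hypothetical call, only made the first time this item is seen
        }
        return new Person(person.getFirstName().toUpperCase(), person.getLastName().toUpperCase());
    }
}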

Getting an Error like this - "jobParameters cannot be found on object of type BeanExpressionContext"

We're creating a Spring Batch app that reads data from one database and writes to another. In this process, we need to set the SQL parameters dynamically, as the data to read depends on them.
For this, we created a JdbcCursorItemReader with @StepScope, as I've seen in other articles and tutorials, but it was not successful. The chunk reader in our job actually uses a peekable reader, which internally uses the JdbcCursorItemReader object to perform the actual read operation.
When the job is triggered, we get the error "jobParameters cannot be found on object of type BeanExpressionContext".
Please let me know what I am doing wrong in the bean configuration below.
@Bean
@StepScope
@Scope(proxyMode = ScopedProxyMode.TARGET_CLASS)
public JdbcCursorItemReader<DTO> jdbcDataReader(@Value() String param) throws Exception {
    JdbcCursorItemReader<DTO> databaseReader = new JdbcCursorItemReader<DTO>();
    return databaseReader;
}

// This class extends PeekableReader, and sets JdbcReader (jdbcDataReader) as delegate
@Bean
public DataPeekReader getPeekReader() {
    DataPeekReader peekReader = new DataPeekReader();
    return peekReader;
}

// This is the reader that uses the peekable item reader (getPeekReader) and also specifies the chunk completion policy
@Bean
public DataReader getDataReader() {
    DataReader dataReader = new DataReader();
    return dataReader;
}

// This is the step builder
@Bean
public Step readDataStep() throws Exception {
    return stepBuilderFactory.get("readDataStep")
            .<DTO, DTO>chunk(getDataReader())
            .reader(getDataReader())
            .writer(getWriter())
            .build();
}

@Bean
public Job readReconDataJob() throws Exception {
    return jobBuilderFactory.get("readDataJob")
            .incrementer(new RunIdIncrementer())
            .flow(readDataStep())
            .end()
            .build();
}
Please let me know what I am doing wrong in the bean configuration below.
Your jdbcDataReader(@Value() String param) is incorrect. You need to provide a SpEL expression in the @Value annotation to say which job parameter to inject. Here is an example of how to pass a job parameter to a JdbcCursorItemReader:
@Bean
@StepScope
public JdbcCursorItemReader<DTO> jdbcCursorItemReader(@Value("#{jobParameters['table']}") String table) {
    return new JdbcCursorItemReaderBuilder<DTO>()
            .sql("select * from " + table)
            // set other properties
            .build();
}
You can find more details in the late binding section of the reference documentation.
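As a usage note (a sketch, not part of the original answer): the value injected through #{jobParameters['table']} has to be supplied when the job is launched, for example via a JobParametersBuilder. The table name and the jobLauncher/readReconDataJob references below are placeholders for your own beans.

JobParameters jobParameters = new JobParametersBuilder()
        .addString("table", "MY_DTO_TABLE") // hypothetical table name
        .toJobParameters();
jobLauncher.run(readReconDataJob, jobParameters);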

Resources