Spring Batch: Aggregating records and write count

We have some data coming in through a flat file, e.g.:
EmpCode,Salary,EmpName,...
100,1000,...,...
200,2000,...,...
200,2000,...,...
100,1000,...,...
300,3000,...,...
400,4000,...,...
We would like to aggregate the salary based on the EmpCode and write to the database as
Emp_Code Emp_Salary Updated_Time Updated_User
100 2000 ... ...
200 4000 ... ...
300 3000 ... ...
400 4000 ... ...
I have written the following classes as per Spring Batch:
ItemReader - to read the employee data into an Employee object
A sample EmployeeItemProcessor:
public class EmployeeProcessor implements ItemProcessor<Employee, Employee> {
    @Override
    public Employee process(Employee employee) throws Exception {
        employee.setUpdatedTime(new Date());
        employee.setUpdatedUser("someuser");
        return employee;
    }
}
EmployeeItemWriter:
@Repository
public class EmployeeItemWriter implements ItemWriter<Employee> {
    @Autowired
    private SessionFactory sf;

    @Override
    public void write(List<? extends Employee> employeeList) throws Exception {
        List<Employee> aggEmployeeList = aggregateEmpData(employeeList);
        //write to db using session factory
    }

    private List<Employee> aggregateEmpData(List<? extends Employee> employeeList) {
        Map<String, Employee> map = new HashMap<String, Employee>();
        for (Employee e : employeeList) {
            String empCode = e.getEmpCode();
            if (map.containsKey(empCode)) {
                // employee already seen: add this record's salary to the existing one
                Employee existing = map.get(empCode);
                existing.setSalary(existing.getSalary() + e.getSalary());
            } else {
                map.put(empCode, e);
            }
        }
        return new ArrayList<Employee>(map.values());
    }
}
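For completeness, here is a minimal sketch of what the elided "write to db using session factory" step might look like (an assumption for illustration, not the original poster's code), using the injected Hibernate SessionFactory:
// Hypothetical helper illustrating the elided DB write;
// assumes the session is managed by the surrounding chunk transaction.
private void writeToDb(List<Employee> aggEmployeeList) {
    Session session = sf.getCurrentSession();
    for (Employee emp : aggEmployeeList) {
        session.saveOrUpdate(emp); // insert new rows, update existing ones
    }
}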
XML Configuration
...
<batch:job id="employeeJob">
    <batch:step id="step1">
        <batch:tasklet>
            <batch:chunk reader="employeeItemReader"
                         writer="employeeItemWriter" processor="employeeItemProcessor"
                         commit-interval="100">
            </batch:chunk>
        </batch:tasklet>
    </batch:step>
</batch:job>
...
It is working and serving my purpose. However, I have a couple of questions.
1) When I look at the logs, they show the following (commit-interval=100):
status=COMPLETED, exitStatus=COMPLETED, readCount=2652, filterCount=0, writeCount=2652 readSkipCount=0, writeSkipCount=0, processSkipCount=0, commitCount=27, rollbackCount=0
But after aggregation, only 2515 records were written to the database, while the write count is 2652. Is it because the number of items reaching the ItemWriter is still 2652? How can this be corrected?
2) We are iterating through the list twice: once in the ItemProcessor and then again in the ItemWriter for aggregation. This could become a performance problem when the number of records is higher. Is there a better way to achieve this?

If each line of the input file is an employee object, then your ReadCount is the number of lines in the input file. WriteCount is the sum of the sizes of all the lists passed to the item writer. Your aggregateEmpData function merges several records into one, which is why your DB count is not the same as WriteCount.
If you want WriteCount to be exactly the number of records in the DB, you should do your aggregation in the processor.

Why do the aggregation in the ItemWriter? I'd do it in an ItemProcessor. This would allow the write count to be accurate and separates that concern from the act of actually writing. If you provide some insight into your configuration, we could elaborate more.

I managed to solve it. I did it as follows.
public class EmployeeProcessor implements ItemProcessor<Employee, Employee> {
    private Map<String, Employee> map;

    @Override
    public Employee process(Employee employee) throws Exception {
        employee.setUpdatedTime(new Date());
        employee.setUpdatedUser("someuser");
        String empCode = employee.getEmpCode();
        if (map.containsKey(empCode)) {
            // already seen: add this record's salary to the first occurrence
            Employee existing = map.get(empCode);
            existing.setSalary(existing.getSalary() + employee.getSalary());
            // returning null filters the item, so it never reaches the writer
            return null;
        }
        map.put(empCode, employee);
        return employee;
    }

    @BeforeStep
    public void beforeStep(StepExecution stepExecution) {
        map = new HashMap<String, Employee>();
    }
}
The write count is appearing correctly now: items for which the processor returns null are filtered out, so they show up in filterCount rather than writeCount.

Related

Spring Batch Single Reader Multiple Processors and Multiple Writers [duplicate]

In Spring Batch I need to pass the items read by an ItemReader to two different processor/writer chains. What I'm trying to achieve is this:
+---> ItemProcessor#1 ---> ItemWriter#1
|
ItemReader ---> item ---+
|
+---> ItemProcessor#2 ---> ItemWriter#2
This is needed because items written by ItemWriter#1 should be processed in a completely different way compared to the ones written by ItemWriter#2.
Moreover, ItemReader reads items from a database, and the queries it executes are so computationally expensive that executing the same query twice is not an option.
Any hint on how to achieve such a setup, or at least a logically equivalent one?
This solution is valid if your items should be processed by both processor #1 and processor #2.
You have to create a processor #0 with this signature:
class Processor0 implements ItemProcessor<Item, CompositeResultBean>
where CompositeResultBean is a bean defined as
class CompositeResultBean {
    Processor1ResultBean result1;
    Processor2ResultBean result2;
}
In your processor #0, just delegate the work to processors #1 and #2 and put the results in the CompositeResultBean:
public CompositeResultBean process(Item item) throws Exception {
    final CompositeResultBean r = new CompositeResultBean();
    r.setResult1(processor1.process(item));
    r.setResult2(processor2.process(item));
    return r;
}
Your own writer is then a CompositeItemWriter that delegates to one writer for CompositeResultBean.result1 and another for CompositeResultBean.result2 (look at PropertyExtractingDelegatingItemWriter, it may help).
I followed Luca's suggestion to use PropertyExtractingDelegatingItemWriter as the writer, and I was able to work with two different entities in one single step.
First of all, I defined a DTO that stores the two entities/results from the processor:
public class DatabaseEntry {
    private AccessLogEntry accessLogEntry;
    private BlockedIp blockedIp;

    public AccessLogEntry getAccessLogEntry() {
        return accessLogEntry;
    }

    public void setAccessLogEntry(AccessLogEntry accessLogEntry) {
        this.accessLogEntry = accessLogEntry;
    }

    public BlockedIp getBlockedIp() {
        return blockedIp;
    }

    public void setBlockedIp(BlockedIp blockedIp) {
        this.blockedIp = blockedIp;
    }
}
Then I passed this DTO to the writer, a PropertyExtractingDelegatingItemWriter, for which I defined two customized methods to write the entities into the database; see my writer code below:
@Configuration
public class LogWriter extends LogAbstract {
    @Autowired
    private DataSource dataSource;

    @Bean
    public PropertyExtractingDelegatingItemWriter<DatabaseEntry> itemWriterAccessLogEntry() {
        PropertyExtractingDelegatingItemWriter<DatabaseEntry> propertyExtractingDelegatingItemWriter = new PropertyExtractingDelegatingItemWriter<DatabaseEntry>();
        propertyExtractingDelegatingItemWriter.setFieldsUsedAsTargetMethodArguments(new String[]{"accessLogEntry", "blockedIp"});
        propertyExtractingDelegatingItemWriter.setTargetObject(this);
        propertyExtractingDelegatingItemWriter.setTargetMethod("saveTransaction");
        return propertyExtractingDelegatingItemWriter;
    }

    public void saveTransaction(AccessLogEntry accessLogEntry, BlockedIp blockedIp) throws SQLException {
        writeAccessLogTable(accessLogEntry);
        if (blockedIp != null) {
            writeBlockedIp(blockedIp);
        }
    }

    private void writeBlockedIp(BlockedIp entry) throws SQLException {
        PreparedStatement statement = dataSource.getConnection().prepareStatement("INSERT INTO blocked_ips (ip,threshold,startDate,endDate,comment) VALUES (?,?,?,?,?)");
        statement.setString(1, entry.getIp());
        statement.setInt(2, threshold);
        statement.setTimestamp(3, Timestamp.valueOf(startDate));
        statement.setTimestamp(4, Timestamp.valueOf(endDate));
        statement.setString(5, entry.getComment());
        statement.execute();
    }

    private void writeAccessLogTable(AccessLogEntry entry) throws SQLException {
        PreparedStatement statement = dataSource.getConnection().prepareStatement("INSERT INTO log_entries (date,ip,request,status,userAgent) VALUES (?,?,?,?,?)");
        statement.setTimestamp(1, Timestamp.valueOf(entry.getDate()));
        statement.setString(2, entry.getIp());
        statement.setString(3, entry.getRequest());
        statement.setString(4, entry.getStatus());
        statement.setString(5, entry.getUserAgent());
        statement.execute();
    }
}
With this approach you get the initially wanted behaviour: a single reader processing multiple entities and saving them in a single step.
You can use a CompositeItemProcessor and a CompositeItemWriter. It won't look exactly like your schema, it will be sequential rather than parallel, but it will do the job; see the sketch below.
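A minimal sketch of that wiring (Item, processor1/processor2 and writer1/writer2 are hypothetical placeholders, not from the original post):
@Bean
public CompositeItemProcessor<Item, Item> compositeProcessor(
        ItemProcessor<Item, Item> processor1, ItemProcessor<Item, Item> processor2) {
    CompositeItemProcessor<Item, Item> composite = new CompositeItemProcessor<>();
    // delegates run in sequence: processor2 receives processor1's output
    composite.setDelegates(Arrays.asList(processor1, processor2));
    return composite;
}

@Bean
public CompositeItemWriter<Item> compositeWriter(
        ItemWriter<Item> writer1, ItemWriter<Item> writer2) {
    CompositeItemWriter<Item> composite = new CompositeItemWriter<>();
    // every chunk is passed to both writers, in order
    composite.setDelegates(Arrays.asList(writer1, writer2));
    return composite;
}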
This is the solution I came up with.
The idea is to code a new writer that "contains" both an ItemProcessor and an ItemWriter. Just to give you an idea, we called it PreprocessorWriter, and this is the core code:
public class PreprocessorWriter<I, O> implements ItemWriter<I> {
    private final ItemProcessor<I, O> processor;
    private final ItemWriter<O> writer;

    public PreprocessorWriter(ItemProcessor<I, O> processor, ItemWriter<O> writer) {
        this.processor = processor;
        this.writer = writer;
    }

    @Override
    public void write(List<? extends I> items) throws Exception {
        List<O> toWrite = new ArrayList<O>();
        for (I item : items) {
            toWrite.add(processor.process(item));
        }
        writer.write(toWrite);
    }
}
A lot of things are left aside here (management of ItemStream, for instance), but in our particular scenario this was enough.
You can then combine multiple PreprocessorWriters with a CompositeItemWriter, as sketched below.
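As a rough sketch (again with hypothetical processor and writer beans), the fan-out from the original diagram then becomes:
// one PreprocessorWriter per processor/writer branch; the chunk is read once
// and flows through both branches
CompositeItemWriter<Item> composite = new CompositeItemWriter<>();
composite.setDelegates(Arrays.asList(
        new PreprocessorWriter<>(processor1, writer1),
        new PreprocessorWriter<>(processor2, writer2)));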
There is another solution if you have a reasonable amount of items (say, less than 1 GB): you can cache the result of your select in a collection wrapped in a Spring bean.
Then you can just read the collection twice at no cost.
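A minimal sketch of that idea, assuming a hypothetical MyItem type and an expensive query behind loadItems() (neither is from the original post):
// Singleton holder that runs the expensive select once and caches the result.
@Component
public class CachedItemSource {
    @Autowired
    private JdbcTemplate jdbcTemplate;
    private List<MyItem> cache;

    public synchronized List<MyItem> getItems() {
        if (cache == null) {
            cache = loadItems(); // the expensive query, executed only once
        }
        return cache;
    }

    private List<MyItem> loadItems() {
        // hypothetical query and row mapper
        return jdbcTemplate.query("select ...", new MyItemRowMapper());
    }
}

// Each step then reads from the cache instead of re-running the query.
@Bean
@StepScope
public ListItemReader<MyItem> cachedReader(CachedItemSource source) {
    return new ListItemReader<>(source.getItems());
}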

Spring Batch: How to Insert multiple key-value pairs into Database table for each item

After processing some XML files with a Spring Batch ItemProcessor, the ItemProcessor returns items like this:
MetsModsDef {
    int id;
    String title;
    String path;
    Properties identifiers;
    ....
}
Now I need to save these items into a database, so that (id, title, path) go into the "Work" table, and all the Properties stored in the "identifiers" field go into a key/value table called "Identifier" (work, identitytype, identityValue).
How can I achieve this?
Currently I am using a CompositeItemWriter to split the object and write it into two tables like this:
public ItemWriter<MetsModsDef> MultiTableJdbcWriter(@Qualifier("dataSource") DataSource dataSource) {
    CompositeItemWriter<MetsModsDef> cWriter = new CompositeItemWriter<MetsModsDef>();
    JdbcBatchItemWriter<MetsModsDef> hsqlWorkWriter = new JdbcBatchItemWriterBuilder<MetsModsDef>()
            .itemSqlParameterSourceProvider(new BeanPropertyItemSqlParameterSourceProvider<>())
            .sql("INSERT INTO work (id, title, path, enabled) VALUES (:id, :title, :path, 1)")
            .dataSource(dataSource)
            .build();
    JdbcBatchItemWriter<MetsModsDef> hsqlIdentifierWriter = new JdbcBatchItemWriterBuilder<MetsModsDef>()
            .itemSqlParameterSourceProvider(new BeanPropertyItemSqlParameterSourceProvider<>())
            .sql("INSERT INTO identity (work, identitytype, identityValue) VALUES (:work, :identitytype, :identityValue)")
            .dataSource(dataSource)
            .build();
    List<ItemWriter<? super MetsModsDef>> mWriter = new ArrayList<ItemWriter<? super MetsModsDef>>();
    mWriter.add(hsqlWorkWriter);
    mWriter.add(hsqlIdentifierWriter);
    cWriter.setDelegates(mWriter);
    return cWriter;
}
But this will not work for a property list, since (work, identitytype, identityValue) are not part of my domain object MetsModsDef, which only contains one map of properties that are supposed to go into the Identifier table.
I have found advice on how to do it when writing to a file, and even on using a splitter pattern from Spring Integration (Read one record/item and write multiple records/items using Spring Batch), but I am still not sure how to actually do it when writing out via JDBC or Hibernate (which I assume would be similar).
Thanks for your advice!
In case somebody is interested: after a while I came up with my own solution.
I found one approach extending HibernateItemWriter (for Hibernate writes) on the internet:
Spring-Batch Multi-line record Item Writer with variable number of lines per record
But I did not want to extend classes, so I had to come up with my own (based on what I could research on the internet).
I am not sure how good it is, or how it will handle transactions or rollback (probably badly), but for now it is the only one I have. So if you need one too, have comments on how to improve it, or even have a better one, you are very welcome.
I created my own IdentifierListWriter, which creates the key/value-pair-like objects (here each pair is called an "identifier") for each MetsModsDef item and writes them all out using the JdbcBatchItemWriter identifierWriter, which is passed to it from the configuration:
public class IdentifierListWriter implements ItemWriter<MetsModsDef> {
    private ItemWriter<Identifier> _identifierWriter;

    public IdentifierListWriter(JdbcBatchItemWriter<Identifier> identifierWriter) {
        _identifierWriter = identifierWriter;
    }

    @Override
    @Transactional(readOnly = false, propagation = Propagation.REQUIRED)
    public void write(List<? extends MetsModsDef> items) throws Exception {
        // for each item, turn its properties map into Identifier rows
        for (MetsModsDef item : items) {
            ArrayList<Identifier> ids = new ArrayList<Identifier>();
            for (String key : item.getAllIds().stringPropertyNames()) {
                ids.add(new Identifier(item.getAllIds().getProperty(key),
                        key, item.getId()));
            }
            _identifierWriter.write(ids);
        }
    }
}
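For reference, a minimal sketch of the Identifier bean assumed above (it is not shown in the original post; the property names must match the :identifier, :type and :work named parameters used by identifierWriter below):
public class Identifier {
    private String identifier;
    private String type;
    private int work;

    public Identifier(String identifier, String type, int work) {
        this.identifier = identifier;
        this.type = type;
        this.work = work;
    }

    public String getIdentifier() { return identifier; }
    public String getType() { return type; }
    public int getWork() { return work; }
}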
In the Java configuration I create two JdbcBatchItemWriter beans, one for the "Work" table and one for the "Identifier" table, plus the IdentifierListWriter bean and a CompositeItemWriter bean (MultiTableJdbcWriter) that uses them all to write out the object:
@Bean
@Primary
public ItemWriter<MetsModsDef> MultiTableJdbcWriter(@Qualifier("dataSource") DataSource dataSource) {
    IdentifierListWriter identifierListWriter = new IdentifierListWriter(identifierWriter(dataSource));
    CompositeItemWriter<MetsModsDef> cWriter = new CompositeItemWriter<>();
    cWriter.setDelegates(Arrays.asList(hsqlWorkWriter(dataSource), identifierListWriter));
    return cWriter;
}

@Bean
public JdbcBatchItemWriter<MetsModsDef> hsqlWorkWriter(@Qualifier("dataSource") DataSource dataSource) {
    return new JdbcBatchItemWriterBuilder<MetsModsDef>()
            .itemSqlParameterSourceProvider(new BeanPropertyItemSqlParameterSourceProvider<>())
            .sql("INSERT INTO work (id, title, path, enabled) VALUES (:id, :title, :path, 1)")
            .dataSource(dataSource)
            .build();
}

@Bean
public JdbcBatchItemWriter<Identifier> identifierWriter(@Qualifier("dataSource") DataSource dataSource) {
    return new JdbcBatchItemWriterBuilder<Identifier>()
            .itemSqlParameterSourceProvider(new BeanPropertyItemSqlParameterSourceProvider<>())
            .sql("INSERT INTO identifier (identifier, type, work_id) VALUES (:identifier, :type, :work)")
            .dataSource(dataSource)
            //.afterPropertiesSet()
            .build();
}
Then the multiTableJdbcWriter is called from a step:
@Bean
public Step step1(ItemWriter<MetsModsDef> multiTableJdbcWriter) {
    return stepBuilderFactory.get("step1")
            .<StreamSource, MetsModsDef>chunk(1)
            .reader(new MetsModsReader())
            .processor(metsModsFileProcessor())
            .writer(multiTableJdbcWriter)
            .build();
}

FlatFileFooterCallback - how to get access to StepExecution For Count

I am reading from Oracle and writing to a CSV file. I have one step which reads and writes to the CSV file. I implemented a ChunkListener, so I know how many records were written.
I want to be able to write a file trailer showing the number of records written to my file. I implemented FlatFileFooterCallback but cannot figure out how to get the data from the StepExecution (the "readCount") into my FlatFileFooterCallback.
I guess I am struggling with how to get access to the Job and Step scopes in my writer.
Any examples or links would be helpful. I am using Spring Batch/Boot, so I am fully annotation-based. I can find XML examples, so maybe this annotated stuff is more complicated.
ItemWriter<Object> databaseCsvItemWriter() {
    FlatFileItemWriter<Object> csvFileWriter = new FlatFileItemWriter<>();

    String exportFileHeader = "one,two,three";
    StringHeaderWriter headerWriter = new StringHeaderWriter(exportFileHeader);
    csvFileWriter.setHeaderCallback(headerWriter);

    String exportFilePath = "/tmp/students.csv";
    csvFileWriter.setResource(new FileSystemResource(exportFilePath));

    LineAggregator<McsendRequest> lineAggregator = createRequestLineAggregator();
    csvFileWriter.setLineAggregator(lineAggregator);
    csvFileWriter.setFooterCallback(headerWriter);
    return csvFileWriter;
}
You can implement a CustomFooterCallback as follows:
public class CustomFooterCallback implements FlatFileFooterCallback {

    @Value("#{stepExecution}")
    private StepExecution stepExecution;

    @Override
    public void writeFooter(Writer writer) throws IOException {
        writer.write("footer - number of items read: " + stepExecution.getReadCount());
        writer.write("footer - number of items written: " + stepExecution.getWriteCount());
    }
}
Then in a @Configuration class:
@Bean
@StepScope
public FlatFileFooterCallback customFooterCallback() {
    return new CustomFooterCallback();
}
And use it in the writer:
csvFileWriter.setFooterCallback(customFooterCallback());
This way, you have access to StepExecution in order to read data as needed.

Spring Batch Annotated No XML Pass Parameters to Item Reader

I created a simple Boot/Spring Batch 3.0.8.RELEASE job. I created a simple class that implements JobParametersIncrementer to go to the database, look up how many days the query should cover, and put that into the JobParameters object.
I need that value in my JdbcCursorItemReader, as it selects data based upon one of the looked-up JobParameters, but I cannot figure this out via Java annotations. There are plenty of XML examples, but not so many for Java.
Below is my BatchConfiguration class that runs the job.
@Autowired
SendJobParms jobParms; // this guy queries the DB and puts data into JobParameters

@Bean
public Job job(@Qualifier("step1") Step step1, @Qualifier("step2") Step step2) {
    return jobs.get("DW_Send").incrementer(jobParms).start(step1).next(step2).build();
}

@Bean
protected Step step2(ItemReader<McsendRequest> reader,
                     ItemWriter<McsendRequest> writer) {
    return steps.get("step2")
            .<McsendRequest, McsendRequest> chunk(5000)
            .reader(reader)
            .writer(writer)
            .build();
}

@Bean
public JdbcCursorItemReader<McsendRequest> reader() {
    JdbcCursorItemReader<McsendRequest> itemReader = new JdbcCursorItemReader<McsendRequest>();
    itemReader.setDataSource(dataSource);
    // want to get access to JobParameters here so I can pull values out for my SQL query
    itemReader.setSql("select xxxx where rownum <= JobParameter.getCount()");
    itemReader.setRowMapper(new McsendRequestMapper());
    return itemReader;
}
Change the reader definition as follows (example for a parameter of type Long and name paramCount):
@Bean
@StepScope
public JdbcCursorItemReader<McsendRequest> reader(@Value("#{jobParameters['paramCount']}") Long paramCount) {
    JdbcCursorItemReader<McsendRequest> itemReader = new JdbcCursorItemReader<McsendRequest>();
    itemReader.setDataSource(dataSource);
    itemReader.setSql("select xxxx where rownum <= ?");
    ListPreparedStatementSetter listPreparedStatementSetter = new ListPreparedStatementSetter();
    listPreparedStatementSetter.setParameters(Arrays.asList(paramCount));
    itemReader.setPreparedStatementSetter(listPreparedStatementSetter);
    itemReader.setRowMapper(new McsendRequestMapper());
    return itemReader;
}
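Note: since the reader is now @StepScope, the step definition can pass null for the parameter; Spring creates a scoped proxy and late-binds the real jobParameters value when the step runs. A minimal sketch, reusing the step2 bean shape from the question:
@Bean
protected Step step2(ItemWriter<McsendRequest> writer) {
    return steps.get("step2")
            .<McsendRequest, McsendRequest> chunk(5000)
            // null is a placeholder; the step-scoped proxy resolves paramCount at runtime
            .reader(reader(null))
            .writer(writer)
            .build();
}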
