Spring batch - Processor or Writer? - spring-boot

In my Spring boot and Spring batch application, I have a step like this:
#Bean
public Step step1() {
return stepBuilderFactory.get("step1").<FileInfo, FileInfo>chunk(10).reader(FileInfoItemReader).processor(processor()).writer(writer()).build();
}
My writer is a empty like below:
public class BlankWriter<T> implements ItemWriter<T> {
#Override
public void write(List<? extends T> items) throws Exception {
}
}
Now, in my processor I have this:
public class FileInfoItemProcessor implements ItemProcessor<FileInfo, FileInfo> {
.....
#Override
public FileInfo process(final FileInfo FileInfo) throws Exception {
myCustomStuff () {
......
}
}
public static void myCustomStuff() {
......
......
}
}
Question: As all the objects are passed to processor, I can deal with them in my processor itself rather using any transformations etc AND since my purpose get solved by using processor, is it a good practice? or I must use a writer/custom-writer to get the job done?

I think doing the REST POST call in the writer is more appropriate than doing it in the processor. A REST POST call is a kind of write operation to a remote location.
So you can omit the processor (since it is optional) and move that code to the item writer (instead of using a NoOp item writer with an empty write method).

Related

Using Spring Batch job for processing data fetched from Rest api

There is a requirement where I need to read and process data fetched from a rest api let say restApi1 and write to different rest api let say restApi2 .
For this I am using chunk oriented approach .
But the issue is currently the restApi1 is not paginated .
That endpoint returns a large number of data approximately 10000 .
So if my step failed then while restarting I have to read all the data again and process .
I can not start from where it failed .
Is this thought correct in relation to spring batch processing ?
Kindly suggest some possible approach .
ItemReader
public class MyItemReader extends ItemStreamSupport implements ItemReader<Data> {
private int curIndex = 0;
#Override
public void open(ExecutionContext executionContext) {
this.curIndex = 0;
}
#Override
public void update(ExecutionContext executionContext) {
}
}
ItemProcessor
public class MyItemProcessor extends ItemStreamSupport implements ItemProcessor<Data1, Data2> {
#Override
public Data2 process(Data1 data1) throws Exception {
}
}
ItemWriter
public class MyItemWriter extends ItemStreamSupport implements ItemWriter<Data2> {
#Override
public void write(List<? extends Data2> listOfData) throws Exception {
}
}
You can download data to a file or a staging table in a tasklet step, then make your item reader read items from there.
In case of failure, the tasklet step should not be restarted, whereas your chunk-oriented step would resume from where it left off in the previous run from the file or staging table.

Spring Batch Single Reader Multiple Processers and Multiple Writers [duplicate]

In Spring batch I need to pass the items read by an ItemReader to two different processors and writer. What I'm trying to achieve is that...
+---> ItemProcessor#1 ---> ItemWriter#1
|
ItemReader ---> item ---+
|
+---> ItemProcessor#2 ---> ItemWriter#2
This is needed because items written by ItemWriter#1 should be processed in a completely different way compared to the ones written by ItemWriter#2.
Moreover, ItemReader reads item from a database, and the queries it executes are so computational expensive that executing the same query twice should be discarded.
Any hint about how to achieve such set up ? Or, at least, a logically equivalent set up ?
This solution is valid if your item should be processed by processor #1 and processor #2
You have to create a processor #0 with this signature:
class Processor0<Item, CompositeResultBean>
where CompositeResultBean is a bean defined as
class CompositeResultBean {
Processor1ResultBean result1;
Processor2ResultBean result2;
}
In your Processor #0 just delegate work to processors #1 and #2 and put result in CompositeResultBean
CompositeResultBean Processor0.process(Item item) {
final CompositeResultBean r = new CompositeResultBean();
r.setResult1(processor1.process(item));
r.setResult2(processor2.process(item));
return r;
}
Your own writer is a CompositeItemWriter that delegate to writer CompositeResultBean.result1 or CompositeResultBean.result2 (look at PropertyExtractingDelegatingItemWriter, maybe can help)
I followed Luca's suggestion to use PropertyExtractingDelegatingItemWriter as writer and I was able to work with two different entities in one single step.
First of all what I did was to define a DTO that stores the two entities/results from the processor
public class DatabaseEntry {
private AccessLogEntry accessLogEntry;
private BlockedIp blockedIp;
public AccessLogEntry getAccessLogEntry() {
return accessLogEntry;
}
public void setAccessLogEntry(AccessLogEntry accessLogEntry) {
this.accessLogEntry = accessLogEntry;
}
public BlockedIp getBlockedIp() {
return blockedIp;
}
public void setBlockedIp(BlockedIp blockedIp) {
this.blockedIp = blockedIp;
}
}
Then I passed this DTO to the writer, a PropertyExtractingDelegatingItemWriter class where I define two customized methods to write the entities into the database, see my writer code below:
#Configuration
public class LogWriter extends LogAbstract {
#Autowired
private DataSource dataSource;
#Bean()
public PropertyExtractingDelegatingItemWriter<DatabaseEntry> itemWriterAccessLogEntry() {
PropertyExtractingDelegatingItemWriter<DatabaseEntry> propertyExtractingDelegatingItemWriter = new PropertyExtractingDelegatingItemWriter<DatabaseEntry>();
propertyExtractingDelegatingItemWriter.setFieldsUsedAsTargetMethodArguments(new String[]{"accessLogEntry", "blockedIp"});
propertyExtractingDelegatingItemWriter.setTargetObject(this);
propertyExtractingDelegatingItemWriter.setTargetMethod("saveTransaction");
return propertyExtractingDelegatingItemWriter;
}
public void saveTransaction(AccessLogEntry accessLogEntry, BlockedIp blockedIp) throws SQLException {
writeAccessLogTable(accessLogEntry);
if (blockedIp != null) {
writeBlockedIp(blockedIp);
}
}
private void writeBlockedIp(BlockedIp entry) throws SQLException {
PreparedStatement statement = dataSource.getConnection().prepareStatement("INSERT INTO blocked_ips (ip,threshold,startDate,endDate,comment) VALUES (?,?,?,?,?)");
statement.setString(1, entry.getIp());
statement.setInt(2, threshold);
statement.setTimestamp(3, Timestamp.valueOf(startDate));
statement.setTimestamp(4, Timestamp.valueOf(endDate));
statement.setString(5, entry.getComment());
statement.execute();
}
private void writeAccessLogTable(AccessLogEntry entry) throws SQLException {
PreparedStatement statement = dataSource.getConnection().prepareStatement("INSERT INTO log_entries (date,ip,request,status,userAgent) VALUES (?,?,?,?,?)");
statement.setTimestamp(1, Timestamp.valueOf(entry.getDate()));
statement.setString(2, entry.getIp());
statement.setString(3, entry.getRequest());
statement.setString(4, entry.getStatus());
statement.setString(5, entry.getUserAgent());
statement.execute();
}
}
With this approach you can get the wanted inital behaviour from a single reader for processing multiple entities and save them in a single step.
You can use a CompositeItemProcessor and CompositeItemWriter
It won't look exactly like your schema, it will be sequential, but it will do the job.
this is the solution I came up with.
So, the idea is to code a new Writer that "contains" both an ItemProcessor and an ItemWriter. Just to give you an idea, we called it PreprocessoWriter, and that's the core code.
private ItemWriter<O> writer;
private ItemProcessor<I, O> processor;
#Override
public void write(List<? extends I> items) throws Exception {
List<O> toWrite = new ArrayList<O>();
for (I item : items) {
toWrite.add(processor.process(item));
}
writer.write(toWrite);
}
There's a lot of things being left aside. Management of ItemStream, for instance. But in our particular scenario this was enough.
So you can just combine multiple PreprocessorWriter with CompositeWriter.
There is an other solution if you have a reasonable amount of items (like less than 1 Go) : you can cache the result of your select into a collection wrapped in a Spring bean.
Then u can just read the collection twice with no cost.

Spring Batch Processor Exception Listener?

I have a partitioned Spring Batch job that reads several split up CSV files and processes each in their own thread, then writes the results to a corresponding output file.
If an item fails to process though (an exception is thrown), I want to write that result to an error file. Is there a way to add a writer or listener that can handle this?
Taking this one step further, is there a way to split this up by exception type and write the different exceptions to different files?
You can achieve this by specifying SkipPolicy. Implement this interface and add your own logic.
public class MySkipper implements SkipPolicy {
#Override
public boolean shouldSkip(Throwable exception, int skipCount) throws SkipLimitExceededException {
if (exception instanceof XYZException) {
//doSomething
}
......
}
You can specify this skip policy in your batch.
this.stepBuilders.get("importStep").<X, Y>chunk(10)
.reader(this.getItemReader()).faultTolerant().skipPolicy(....)
.processor(this.getItemProcessor())
.writer(this.getItemWriter())
.build();
One way that I have seen this done is through a combination of a SkipPolicy and a SkipListener.
The policy would allow you to skip over items that threw an exception, such as a FlatFileParseException (skippable exceptions can be configured).
The listener gives you access to the Throwable and the item that caused it (or just Throwable in the case of reads). The skip listener also lets you differentiate between skips in the read/processor/writer if you wanted to handle those separately.
public class ErrorWritingSkipListener<T, S> implements SkipListener<T, S> {
#Override
public void onSkipInRead(final Throwable t) {
// custom logic
}
#Override
public void onSkipInProcess(final T itemThatFailed, final Throwable t) {
// custom logic
}
#Override
public void onSkipInWrite(final S itemThatFailed, final Throwable t) {
// custom logic
}
}
I would recommend using the SkipPolicy only to identify the exceptions you want to write out to your various files, and leveraging the SkipListener to perform the actual file writing logic. That would match up nicely with their intended use as defined by their interfaces.

Spring Batch reader file by file

I'm developing a Spring webapp, using spring boot and spring batch frameworks.
We have a set of complex & different json files, and we need to:
read each file
slightly modify its content
finally store them in mongodb.
The question: It makes sense to use spring batch for this task? As I can see in tutorials examples etc, spring batch is the right tool for line by line processing, but what about file by file?
I don't have problems with the writer (MongoItemWritter) and processer, but I do not see how to implement the reader.
Thanks!
yes you can definetly use Spring Batch.
The item for your Reader can be a File.
public class CustomItemReader implements InitializingBean{
private List<File> yourFiles= null;
public File read() {
if ((yourFiles!= null) && (yourFiles.size() != 0)) {
return yourFiles.remove(0);
}
return null;
}
//Reading Items from Service
private void reloadItems() {
this.yourItems= new ArrayList<File>();
// populate the items
}
#Override
public void afterPropertiesSet() throws Exception {
reloadItems();
}
}
A custom Processor :
public class MyProcessor implements ItemProcessor<File, File> {
#Override
public File process(File arg0) throws Exception {
// Apply any logic to your File before transferring it to the writer
return arg0;
}
}
And A custom Writer :
public class MyWriter{
public void write(File file) throws IOException {
}
}

Spring Batch: One reader, multiple processors and writers

In Spring batch I need to pass the items read by an ItemReader to two different processors and writer. What I'm trying to achieve is that...
+---> ItemProcessor#1 ---> ItemWriter#1
|
ItemReader ---> item ---+
|
+---> ItemProcessor#2 ---> ItemWriter#2
This is needed because items written by ItemWriter#1 should be processed in a completely different way compared to the ones written by ItemWriter#2.
Moreover, ItemReader reads item from a database, and the queries it executes are so computational expensive that executing the same query twice should be discarded.
Any hint about how to achieve such set up ? Or, at least, a logically equivalent set up ?
This solution is valid if your item should be processed by processor #1 and processor #2
You have to create a processor #0 with this signature:
class Processor0<Item, CompositeResultBean>
where CompositeResultBean is a bean defined as
class CompositeResultBean {
Processor1ResultBean result1;
Processor2ResultBean result2;
}
In your Processor #0 just delegate work to processors #1 and #2 and put result in CompositeResultBean
CompositeResultBean Processor0.process(Item item) {
final CompositeResultBean r = new CompositeResultBean();
r.setResult1(processor1.process(item));
r.setResult2(processor2.process(item));
return r;
}
Your own writer is a CompositeItemWriter that delegate to writer CompositeResultBean.result1 or CompositeResultBean.result2 (look at PropertyExtractingDelegatingItemWriter, maybe can help)
I followed Luca's suggestion to use PropertyExtractingDelegatingItemWriter as writer and I was able to work with two different entities in one single step.
First of all what I did was to define a DTO that stores the two entities/results from the processor
public class DatabaseEntry {
private AccessLogEntry accessLogEntry;
private BlockedIp blockedIp;
public AccessLogEntry getAccessLogEntry() {
return accessLogEntry;
}
public void setAccessLogEntry(AccessLogEntry accessLogEntry) {
this.accessLogEntry = accessLogEntry;
}
public BlockedIp getBlockedIp() {
return blockedIp;
}
public void setBlockedIp(BlockedIp blockedIp) {
this.blockedIp = blockedIp;
}
}
Then I passed this DTO to the writer, a PropertyExtractingDelegatingItemWriter class where I define two customized methods to write the entities into the database, see my writer code below:
#Configuration
public class LogWriter extends LogAbstract {
#Autowired
private DataSource dataSource;
#Bean()
public PropertyExtractingDelegatingItemWriter<DatabaseEntry> itemWriterAccessLogEntry() {
PropertyExtractingDelegatingItemWriter<DatabaseEntry> propertyExtractingDelegatingItemWriter = new PropertyExtractingDelegatingItemWriter<DatabaseEntry>();
propertyExtractingDelegatingItemWriter.setFieldsUsedAsTargetMethodArguments(new String[]{"accessLogEntry", "blockedIp"});
propertyExtractingDelegatingItemWriter.setTargetObject(this);
propertyExtractingDelegatingItemWriter.setTargetMethod("saveTransaction");
return propertyExtractingDelegatingItemWriter;
}
public void saveTransaction(AccessLogEntry accessLogEntry, BlockedIp blockedIp) throws SQLException {
writeAccessLogTable(accessLogEntry);
if (blockedIp != null) {
writeBlockedIp(blockedIp);
}
}
private void writeBlockedIp(BlockedIp entry) throws SQLException {
PreparedStatement statement = dataSource.getConnection().prepareStatement("INSERT INTO blocked_ips (ip,threshold,startDate,endDate,comment) VALUES (?,?,?,?,?)");
statement.setString(1, entry.getIp());
statement.setInt(2, threshold);
statement.setTimestamp(3, Timestamp.valueOf(startDate));
statement.setTimestamp(4, Timestamp.valueOf(endDate));
statement.setString(5, entry.getComment());
statement.execute();
}
private void writeAccessLogTable(AccessLogEntry entry) throws SQLException {
PreparedStatement statement = dataSource.getConnection().prepareStatement("INSERT INTO log_entries (date,ip,request,status,userAgent) VALUES (?,?,?,?,?)");
statement.setTimestamp(1, Timestamp.valueOf(entry.getDate()));
statement.setString(2, entry.getIp());
statement.setString(3, entry.getRequest());
statement.setString(4, entry.getStatus());
statement.setString(5, entry.getUserAgent());
statement.execute();
}
}
With this approach you can get the wanted inital behaviour from a single reader for processing multiple entities and save them in a single step.
You can use a CompositeItemProcessor and CompositeItemWriter
It won't look exactly like your schema, it will be sequential, but it will do the job.
this is the solution I came up with.
So, the idea is to code a new Writer that "contains" both an ItemProcessor and an ItemWriter. Just to give you an idea, we called it PreprocessoWriter, and that's the core code.
private ItemWriter<O> writer;
private ItemProcessor<I, O> processor;
#Override
public void write(List<? extends I> items) throws Exception {
List<O> toWrite = new ArrayList<O>();
for (I item : items) {
toWrite.add(processor.process(item));
}
writer.write(toWrite);
}
There's a lot of things being left aside. Management of ItemStream, for instance. But in our particular scenario this was enough.
So you can just combine multiple PreprocessorWriter with CompositeWriter.
There is an other solution if you have a reasonable amount of items (like less than 1 Go) : you can cache the result of your select into a collection wrapped in a Spring bean.
Then u can just read the collection twice with no cost.

Resources