Spring Batch reader file by file

I'm developing a Spring web app, using the Spring Boot and Spring Batch frameworks.
We have a set of complex and different JSON files, and we need to:
read each file,
slightly modify its content,
and finally store them in MongoDB.
The question: does it make sense to use Spring Batch for this task? As far as I can see in tutorials, examples, etc., Spring Batch is the right tool for line-by-line processing, but what about file-by-file?
I don't have problems with the writer (MongoItemWriter) and the processor, but I do not see how to implement the reader.
Thanks!

Yes, you can definitely use Spring Batch.
The item for your reader can be a File.
public class CustomItemReader implements ItemReader<File>, InitializingBean {

    private List<File> yourFiles = null;

    @Override
    public File read() {
        // Hand out one File per read() call; returning null signals the end of the data
        if ((yourFiles != null) && (!yourFiles.isEmpty())) {
            return yourFiles.remove(0);
        }
        return null;
    }

    // Reading items from a service
    private void reloadItems() {
        this.yourFiles = new ArrayList<File>();
        // populate the items
    }

    @Override
    public void afterPropertiesSet() throws Exception {
        reloadItems();
    }
}
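If the files sit in a single directory, reloadItems() could be implemented along these lines (a sketch; the directory path and the .json filter are assumptions, and it needs java.io.File, java.util.ArrayList and java.util.Arrays):

private void reloadItems() {
    this.yourFiles = new ArrayList<>();
    // Hypothetical source directory; replace with your actual input location
    File dir = new File("/data/input");
    File[] jsonFiles = dir.listFiles((d, name) -> name.endsWith(".json"));
    // listFiles returns null when the directory is missing or unreadable
    if (jsonFiles != null) {
        this.yourFiles.addAll(Arrays.asList(jsonFiles));
    }
}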
A custom processor:
public class MyProcessor implements ItemProcessor<File, File> {

    @Override
    public File process(File file) throws Exception {
        // Apply any logic to your File before passing it on to the writer
        return file;
    }
}
And a custom writer:
public class MyWriter implements ItemWriter<File> {

    @Override
    public void write(List<? extends File> files) throws Exception {
        // Persist each file's (modified) content, e.g. via the MongoItemWriter from the question
    }
}
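To wire the three components into a job, a step definition along these lines should do (a sketch assuming an injected StepBuilderFactory, as used elsewhere on this page; the bean name and chunk size are arbitrary):

@Bean
public Step fileStep(CustomItemReader reader, MyProcessor processor, ItemWriter<File> writer) {
    // chunk(1): each File is read, processed and written individually
    return stepBuilderFactory.get("fileStep")
            .<File, File>chunk(1)
            .reader(reader)
            .processor(processor)
            .writer(writer)
            .build();
}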

Related

Using a Spring Batch job for processing data fetched from a REST API

There is a requirement where I need to read and process data fetched from one REST API (say restApi1) and write it to a different REST API (say restApi2).
For this I am using the chunk-oriented approach.
But the issue is that restApi1 is currently not paginated.
That endpoint returns a large amount of data, approximately 10,000 records.
So if my step fails, then on restart I have to read and process all the data again; I cannot resume from where it failed.
Is this understanding correct with respect to Spring Batch processing?
Kindly suggest some possible approaches.
ItemReader
public class MyItemReader extends ItemStreamSupport implements ItemReader<Data1> {

    private int curIndex = 0;

    @Override
    public void open(ExecutionContext executionContext) {
        this.curIndex = 0;
    }

    @Override
    public void update(ExecutionContext executionContext) {
    }

    @Override
    public Data1 read() {
        // fetch and return the record at curIndex from restApi1 (omitted)
        return null;
    }
}
ItemProcessor
public class MyItemProcessor extends ItemStreamSupport implements ItemProcessor<Data1, Data2> {

    @Override
    public Data2 process(Data1 data1) throws Exception {
        // mapping logic omitted
        return null;
    }
}
ItemWriter
public class MyItemWriter extends ItemStreamSupport implements ItemWriter<Data2> {

    @Override
    public void write(List<? extends Data2> listOfData) throws Exception {
        // POST to restApi2 omitted
    }
}
You can download the data to a file or a staging table in a tasklet step, then make your item reader read items from there.
On a restart after a failure, the already completed tasklet step would not be re-executed, whereas your chunk-oriented step would resume from where it left off in the previous run, reading from the file or staging table.
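A sketch of that two-step layout, assuming Spring Batch 4's builder factories (the stagingFileReader() helper and the chunk size are placeholders):

@Bean
public Job apiTransferJob(JobBuilderFactory jobs, StepBuilderFactory steps) {
    // Step 1: one-shot tasklet that calls restApi1 and dumps everything to a staging file
    Step download = steps.get("downloadStep")
            .tasklet((contribution, chunkContext) -> {
                // REST client code omitted; fetch all records, write them to the staging file
                return RepeatStatus.FINISHED;
            })
            .build();

    // Step 2: chunk-oriented step over the staging file; a stateful reader such as
    // FlatFileItemReader stores its position in the execution context, so a restart
    // resumes where the previous run failed
    Step process = steps.get("processStep")
            .<Data1, Data2>chunk(100)
            .reader(stagingFileReader())
            .processor(new MyItemProcessor())
            .writer(new MyItemWriter())
            .build();

    // On restart, the already completed download step is skipped by default
    return jobs.get("apiTransferJob").start(download).next(process).build();
}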

Spring Batch - Processor or Writer?

In my Spring Boot and Spring Batch application, I have a step like this:
@Bean
public Step step1() {
    return stepBuilderFactory.get("step1")
            .<FileInfo, FileInfo>chunk(10)
            .reader(FileInfoItemReader)
            .processor(processor())
            .writer(writer())
            .build();
}
My writer is empty, like below:
public class BlankWriter<T> implements ItemWriter<T> {

    @Override
    public void write(List<? extends T> items) throws Exception {
    }
}
Now, in my processor I have this:
public class FileInfoItemProcessor implements ItemProcessor<FileInfo, FileInfo> {
    .....

    @Override
    public FileInfo process(final FileInfo fileInfo) throws Exception {
        myCustomStuff();
        return fileInfo;
    }

    public static void myCustomStuff() {
        ......
        ......
    }
}
Question: since all the objects are passed to the processor, I can deal with them in the processor itself rather than using any transformations, AND since my purpose is solved by using the processor, is this good practice? Or must I use a writer/custom writer to get the job done?
I think doing the REST POST call in the writer is more appropriate than doing it in the processor. A REST POST call is a kind of write operation to a remote location.
So you can omit the processor (since it is optional) and move that code to the item writer (instead of using a NoOp item writer with an empty write method).
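For illustration, such a writer could look like this (a sketch using RestTemplate; the endpoint URL is a placeholder):

public class RestPostItemWriter implements ItemWriter<FileInfo> {

    private final RestTemplate restTemplate = new RestTemplate();

    @Override
    public void write(List<? extends FileInfo> items) throws Exception {
        for (FileInfo item : items) {
            // POST each processed item to the remote service;
            // replace the URL with the real endpoint
            restTemplate.postForEntity("https://example.org/api/files", item, Void.class);
        }
    }
}

With this in place, step1 would simply use .writer(restPostItemWriter()) and drop the processor.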

How to enable Spring Boot to display a list of files under a directory

I have a folder structure /data/reports on a file system, which contains all reports.
How can I configure a Spring Boot application to serve the contents of this file system?
Currently I have tried a few options, but none are working:
@Configuration
@EnableWebMvc
public class AppConfig implements WebMvcConfigurer {

    @Value(value = "${spring.resources.static-locations:#{null}}")
    private String fileSystem;

    @Override
    public void addResourceHandlers(ResourceHandlerRegistry registry) {
        registry
            .addResourceHandler("/data/reports/**")
            .addResourceLocations(fileSystem)
            .setCachePeriod(3600)
            .resourceChain(true)
            .addResolver(new PathResourceResolver());
    }
}
and in application.properties I have defined
spring.resources.static-locations=file:///data/reports
server.servlet.jsp.init-parameters.listings=true
But in both cases, when I try
http://host:port/application/data/reports
I'm getting a 404.
What am I missing?
Based on the suggestions given, I realized one mistake I'm making: accessing the reports via
http://host:port/application/data/reports
instead of
http://host:port/data/reports
If I use the application context path in the request, those calls go through the RequestDispatcher and try to find a matching RequestMapping, which does not exist. I think I'm convinced so far.
But the problem I'm seeing now is that I'm getting a SocketTimeoutException while trying to read from the resource listed in the URL. I put some breakpoints in the Spring source "ResourceHttpMessageConverter.java":
protected void writeContent(Resource resource, HttpOutputMessage outputMessage)
        throws IOException, HttpMessageNotWritableException {
    try {
        InputStream in = resource.getInputStream(); // It is timing out here
        try {
            StreamUtils.copy(in, outputMessage.getBody());
        }
        catch (NullPointerException ex) {
            // ignore, see SPR-13620
        }
The resource is a small text file with 1 line "Hello World". Yet it is timing out.
The resource in the above class is a FileUrlResource opened on file:///c:/data/reports/sample.txt
On the other hand, I tried to read that resource as
File file = new File("c:/data/reports/sample.txt");
System.out.println(file.exists());
URL url = file.toURI().toURL();
URLConnection con = url.openConnection();
InputStream is = con.getInputStream(); //This works
Thanks
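As a side note on the original 404: with file-system locations, Spring only treats the location as a directory when it ends with a slash, so a minimal configuration along these lines might behave better (the paths are assumptions):

@Configuration
public class ReportsConfig implements WebMvcConfigurer {

    @Override
    public void addResourceHandlers(ResourceHandlerRegistry registry) {
        // Trailing slash matters: "file:///data/reports/" is a directory location,
        // "file:///data/reports" is not
        registry.addResourceHandler("/data/reports/**")
                .addResourceLocations("file:///data/reports/");
    }
}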

Add camel route at runtime using end points configured in property file

I own a Spring application and want to add Camel routes dynamically during application startup. Endpoints are configured in a property file and are loaded at run time.
Using the Java DSL, I am using a for loop to create all routes:
for (int i = 0; i < allEndPoints; i++) {
    DynamcRouteBuilder route = new DynamcRouteBuilder(context, fromUri, toUri);
    camelContext.addRoutes(route);
}

private class DynamcRouteBuilder extends RouteBuilder {

    private final String from;
    private final String to;

    private DynamcRouteBuilder(CamelContext context, String from, String to) {
        super(context);
        this.from = from;
        this.to = to;
    }

    @Override
    public void configure() throws Exception {
        from(from).to(to);
    }
}
but I am getting the below exception while creating the first route itself:
Failed to create route file_routedirect: at: >>> OnException[[class org.apache.camel.component.file.GenericFileOperationFailedException] -> [Log[Exception trapped ${exception.class}], process[Processor#0x0]]] <<< in route: Route(file_routedirect:)[[From[direct:... because of ref must be specified on: process[Processor#0x0]\n\ta
I am not sure what the issue is. Does anyone have a suggestion or a fix for this? Thanks
Well, to create routes in an iteration, it is nice to have an object that holds the different values for one route. Let's call it RouteConfiguration: a simple POJO with String fields for from, to and routeId.
We are using YAML files to configure such things because you get a real list format instead of the "flat lists" of property files (route[0].from, route[0].to).
If you use Spring, you can directly transform such a list of object configurations into a Collection of objects using @ConfigurationProperties.
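For illustration, the value object and its binding might look like this (the class names, the `camel.routes` prefix and the YAML layout are assumptions; the `configuration` field in the route builder below would be an injected instance of the binding class):

// One route's worth of configuration
public class RouteConfiguration {
    private String from;
    private String to;
    private String routeId;
    // getters and setters are required for property binding (omitted for brevity)
}

// Binds a list such as:
// camel:
//   routes:
//     - from: "file:/data/in"
//       to: "file:/data/out"
//       route-id: "file-move"
@ConfigurationProperties(prefix = "camel")
public class CamelRoutesProperties {
    private List<RouteConfiguration> routes = new ArrayList<>();
    public List<RouteConfiguration> getRoutes() { return routes; }
    public void setRoutes(List<RouteConfiguration> routes) { this.routes = routes; }
}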
When you are able to create such a Collection of value objects, you can simply iterate over it. Here is a strongly simplified example.
@Override
public void configure() {
    createConfiguredRoutes();
}

void createConfiguredRoutes() {
    configuration.getRoutes().forEach(this::addRouteToContext);
}

// Implement the route that is added in each iteration
private void addRouteToContext(final RouteConfiguration routeConfiguration) {
    try {
        this.camelContext.addRoutes(new RouteBuilder() {
            @Override
            public void configure() throws Exception {
                from(routeConfiguration.getFrom())
                    .routeId(routeConfiguration.getRouteId())
                    ...
                    .to(routeConfiguration.getTo());
            }
        });
    } catch (Exception e) {
        // addRoutes throws a checked Exception, which a forEach consumer cannot propagate
        throw new RuntimeException(e);
    }
}

Spring Batch: FlatFileItemWriter to InputStream/byte array

So I've created a batch job which generates reports (CSV files). I have been able to generate the files seamlessly using FlatFileItemWriter, but my end goal is to create an InputStream to call a REST service which will store the document, or a byte array to store it in the database.
public class CustomWriter implements ItemWriter<Report> {

    @Override
    public void write(List<? extends Report> reports) throws Exception {
        reports.forEach(this::writeDataToFile);
    }

    private void writeDataToFile(final Report data) {
        try {
            FlatFileItemWriter<Report> writer = new FlatFileItemWriter<>();
            writer.setResource(new FileSystemResource("C:/reports/test-report.csv"));
            writer.setLineAggregator(getLineAggregator());
            writer.afterPropertiesSet();
            writer.open(new ExecutionContext());
            // FlatFileItemWriter writes lists, so wrap the single item
            writer.write(Collections.singletonList(data));
            writer.close();
        } catch (Exception e) {
            // the writer's methods declare checked exceptions, which a forEach consumer cannot propagate
            throw new RuntimeException(e);
        }
    }

    private DelimitedLineAggregator<Report> getLineAggregator() {
        DelimitedLineAggregator<Report> delimitedLineAgg = new DelimitedLineAggregator<Report>();
        delimitedLineAgg.setDelimiter(",");
        delimitedLineAgg.setFieldExtractor(getFieldExtractor());
        return delimitedLineAgg;
    }

    private FieldExtractor<Report> getFieldExtractor() {
        BeanWrapperFieldExtractor<Report> fieldExtractor = new BeanWrapperFieldExtractor<Report>();
        fieldExtractor.setNames(COLUMN_HEADERS.toArray(new String[0]));
        return fieldExtractor;
    }
}
One way I could do this is to store the file locally temporarily and create a new step to pick up the generated files and do the sending/storing, but I would really like to skip that extra step and send/store the report in the first step.
How do I go about doing this?
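If the goal is just a byte array, one option is to skip FlatFileItemWriter entirely and drive the line aggregator by hand. A sketch of a helper that could live in the same CustomWriter class (the hand-off at the end is a placeholder; needs java.nio.charset.StandardCharsets):

private byte[] toCsvBytes(final List<? extends Report> reports) {
    // Reuse the aggregator built above; each call turns one Report into a CSV line
    DelimitedLineAggregator<Report> aggregator = getLineAggregator();
    StringBuilder csv = new StringBuilder();
    for (Report report : reports) {
        csv.append(aggregator.aggregate(report)).append('\n');
    }
    return csv.toString().getBytes(StandardCharsets.UTF_8);
}

The resulting array can be stored in the database directly, or wrapped in a new ByteArrayInputStream(...) for the REST call, so no temporary file or extra step is needed.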
