Parsing a multi-format, multi-line data file in a Spring Batch job

I am writing a Spring Batch job to process the data file shown below and write it into a DB.
The sample file has multiple headers, and each header has a bunch of data rows associated with it.
A header can have millions of records under it, and the flat file can contain any number of headers.
My requirement is to pick only the headers I am concerned with and, for each of those, read all of their data rows.
Each header's data rows also have a different format, so my processor can receive either record type and needs to write it into the DB.
HDR01
A|41|57|Data1|S|62|Data2|9|N|2017-02-01 18:01:05|2017-02-01 00:00:00
A|41|57|Data1|S|62|Data2|9|N|2017-02-01 18:01:05|2017-02-01 00:00:00
HDR02
A|41|57|Data1|S|62|Data2|9|N|
A|41|57|Data1|S|62|Data2|9|N|
I explored PatternMatchingCompositeLineMapper, where I can map each header pattern to a tokenizer and a corresponding FieldSetMapper, but what I need to map is the body rows, not the header lines.
There is no footer either, so I cannot create an end-of-record policy of my own.
I also tried AggregateItemReader, but I don't want to collect all the records of a header before processing them; the rows belonging to a header should be processed in parallel.
@Bean
public LineMapper<VMSFeedStyleInfo> myLineMapper() {
    PatternMatchingCompositeLineMapper<VMSFeedStyleInfo> mapper = new PatternMatchingCompositeLineMapper<>();

    final Map<String, LineTokenizer> tokenizers = new HashMap<>();
    tokenizers.put("*HDR01*", new DelimitedLineTokenizer());
    tokenizers.put("*HDR02*", new DelimitedLineTokenizer());
    tokenizers.put("*", new DelimitedLineTokenizer("|"));
    mapper.setTokenizers(tokenizers);

    Map<String, FieldSetMapper<VMSFeedStyleInfo>> mappers = new HashMap<>();
    try {
        mappers.put("*HDR01*", customMapper());
        mappers.put("*HDR02*", customMapper());
        mappers.put("*", customMapper());
    } catch (Exception e) {
        e.printStackTrace();
    }
    mapper.setFieldSetMappers(mappers);
    return mapper;
}
Can somebody provide some input on how I should achieve this?
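One possible direction (a rough sketch only, not a verified solution): keep the last header seen as reader state, read raw lines through a delegate FlatFileItemReader, and tokenize each data row with the tokenizer registered for the section it belongs to. The class below reuses the question's names (VMSFeedStyleInfo, the HDR01/HDR02 markers, customMapper()-style FieldSetMappers); everything else is an assumption.

import java.util.Map;

import org.springframework.batch.item.ExecutionContext;
import org.springframework.batch.item.ItemStreamReader;
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.mapping.FieldSetMapper;
import org.springframework.batch.item.file.mapping.PassThroughLineMapper;
import org.springframework.batch.item.file.transform.FieldSet;
import org.springframework.batch.item.file.transform.LineTokenizer;
import org.springframework.core.io.Resource;

public class SectionAwareItemReader implements ItemStreamReader<VMSFeedStyleInfo> {

    private final FlatFileItemReader<String> delegate = new FlatFileItemReader<>();
    private final Map<String, LineTokenizer> tokenizers;                  // keyed by header id, e.g. "HDR01"
    private final Map<String, FieldSetMapper<VMSFeedStyleInfo>> mappers;  // keyed by header id
    private String currentHeader;                                         // section currently being read

    public SectionAwareItemReader(Resource resource,
                                  Map<String, LineTokenizer> tokenizers,
                                  Map<String, FieldSetMapper<VMSFeedStyleInfo>> mappers) {
        this.delegate.setResource(resource);
        this.delegate.setLineMapper(new PassThroughLineMapper()); // hand raw lines through unchanged
        this.tokenizers = tokenizers;
        this.mappers = mappers;
    }

    @Override
    public VMSFeedStyleInfo read() throws Exception {
        String line;
        while ((line = delegate.read()) != null) {
            if (line.startsWith("HDR")) {                // a new section starts here
                currentHeader = line.trim();
                continue;                                // header lines are never emitted
            }
            if (!tokenizers.containsKey(currentHeader)) {
                continue;                                // a section we are not interested in
            }
            FieldSet fields = tokenizers.get(currentHeader).tokenize(line);
            return mappers.get(currentHeader).mapFieldSet(fields);
        }
        return null;                                     // end of file
    }

    @Override
    public void open(ExecutionContext ctx) { delegate.open(ctx); }

    @Override
    public void update(ExecutionContext ctx) { delegate.update(ctx); }

    @Override
    public void close() { delegate.close(); }
}

Note that for restartability the currentHeader would also have to be saved to and restored from the ExecutionContext, and parallel processing of the mapped rows can then be handled downstream (for example with a multi-threaded step), independently of the reader.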

Related

Reading a CSV file with a huge number of records and doing a bulk insert using Spring Boot

I am not looking for Spring Batch.
I just want to use Spring Boot and read a CSV file with a huge number of records using a BufferedReader; how can I do a bulk insert if I go with this approach?
Storing all student records in a List, partitioning the list into chunks and calling saveAll is what I was planning to do, though it may not be a good approach.
try (BufferedReader reader = new BufferedReader(new InputStreamReader(
        new FileSystemResource("/tmp/test.csv").getInputStream(), DEFAULT_CHARSET))) {
    reader.lines()
          .skip(1)                                  // skip the CSV header row
          .map(line -> {
              Student student = new Student();      // "Student" entity name assumed from the question
              // student.setName(line.split(",")[0]); ... (column mapping omitted in the question)
              return student;
          })
          .forEach(student -> studentRepository.save(student)); // one insert per record
}
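For the bulk-insert part, here is a minimal sketch (assuming a Student entity, a hypothetical parseStudent(line) mapping method, and a Spring Data repository named studentRepository as in the snippet) that buffers records into chunks and calls saveAll per chunk instead of save per row:

final int chunkSize = 1000;                         // flush in batches instead of row by row
List<Student> buffer = new ArrayList<>(chunkSize);
try (BufferedReader reader = new BufferedReader(new InputStreamReader(
        new FileSystemResource("/tmp/test.csv").getInputStream(), DEFAULT_CHARSET))) {
    reader.lines()
          .skip(1)
          .forEach(line -> {
              buffer.add(parseStudent(line));       // hypothetical CSV-to-entity mapping
              if (buffer.size() == chunkSize) {
                  studentRepository.saveAll(buffer); // one repository call per chunk
                  buffer.clear();
              }
          });
    if (!buffer.isEmpty()) {
        studentRepository.saveAll(buffer);          // flush the last partial chunk
    }
}

Note that saveAll on its own still issues one INSERT per entity; to get real JDBC batching you would also need to enable it (for Hibernate, spring.jpa.properties.hibernate.jdbc.batch_size) or drop down to JdbcTemplate.batchUpdate.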

How to generate multiple files with one controller in Spring Boot

Hi, I have created a Spring Boot application that generates CSV files: it fetches data from a database and writes it into a CSV file. I also added the ability to select which database columns should be included in the CSV. Now I need to download multiple files, each with different columns from the database. I tried repeating the CSV-generation code in the same class, but it simply appends the content meant for the second file to the first file. Please let me know what I can do. For example, if there are four columns (id, amount, currency, name), then id and amount should go into one file and name and currency into another. Following is my code.
Controller:
public void exportCSV(@RequestParam(name = "cohort") String cohort, HttpServletResponse response) throws Exception {
    // set file name and content type
    String filename = "liabilities.csv";
    response.setContentType("text/csv");
    response.setHeader(HttpHeaders.CONTENT_DISPOSITION,
            "attachment; filename=\"" + filename + "\"");
    // configure the CSV writer builder
    StatefulBeanToCsvBuilder<Report> builder = new StatefulBeanToCsvBuilder<Report>(response.getWriter())
            .withQuotechar(CSVWriter.NO_QUOTE_CHARACTER)
            .withSeparator(CSVWriter.DEFAULT_SEPARATOR)
            .withOrderedResults(true);
    // ignore any field except the `id` and `amount` ones
    Arrays.stream(Report.class.getDeclaredFields())
          .filter(field -> !("id".equals(field.getName()) || "amount".equals(field.getName())))
          .forEach(field -> builder.withIgnoreField(Report.class, field));
    // create a CSV writer
    StatefulBeanToCsv<Report> writer = builder.build();
    // write all reports to the CSV file
    writer.write(reportsService.findByCohort(cohort));
}
The above code generates a CSV file with the id and amount columns from the database. What should I do to get currency and name into another CSV and download both at the same time?
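A single HTTP response can only carry one attachment, so one hedged workaround (a sketch, not a drop-in fix) is to stream a zip that contains both CSVs and to turn the column filter into a parameter of a helper method. The writeCsvEntry helper, the mapping path and the field sets below are assumptions; the opencsv builder calls are the ones already used in the question.

@GetMapping("/exportAll")   // assumed mapping; the original request mapping is not shown
public void exportCSVs(@RequestParam(name = "cohort") String cohort,
                       HttpServletResponse response) throws Exception {
    response.setContentType("application/zip");
    response.setHeader(HttpHeaders.CONTENT_DISPOSITION, "attachment; filename=\"liabilities.zip\"");
    List<Report> rows = reportsService.findByCohort(cohort);
    try (ZipOutputStream zip = new ZipOutputStream(response.getOutputStream())) {
        writeCsvEntry(zip, "id-amount.csv", rows, new HashSet<>(Arrays.asList("id", "amount")));
        writeCsvEntry(zip, "name-currency.csv", rows, new HashSet<>(Arrays.asList("name", "currency")));
    }
}

private void writeCsvEntry(ZipOutputStream zip, String entryName, List<Report> rows,
                           Set<String> keepFields) throws Exception {
    zip.putNextEntry(new ZipEntry(entryName));
    Writer writer = new OutputStreamWriter(zip, StandardCharsets.UTF_8);
    StatefulBeanToCsvBuilder<Report> builder = new StatefulBeanToCsvBuilder<Report>(writer)
            .withQuotechar(CSVWriter.NO_QUOTE_CHARACTER)
            .withSeparator(CSVWriter.DEFAULT_SEPARATOR)
            .withOrderedResults(true);
    // ignore every field that is not in the requested column set
    Arrays.stream(Report.class.getDeclaredFields())
          .filter(field -> !keepFields.contains(field.getName()))
          .forEach(field -> builder.withIgnoreField(Report.class, field));
    builder.build().write(rows);
    writer.flush();        // flush but do not close, so the zip stream stays usable for the next entry
    zip.closeEntry();
}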

Spring Batch - create new unique CSV name while writing data using FlatFileItemWriter API

I tried the solution from this post: Spring Batch - create a new file each time instead of overriding it for transferring data from CSV to XML, but it didn't work for the annotation-based approach I used.
fileItemWriter.setResource(new FileSystemResource("csv/employees-#{new java.text.SimpleDateFormat("Mddyyyyhhmmss").format(new java.util.GregorianCalendar().getTime())}.csv"));
My batch job is scheduled to run every hour; it reads a table and writes the data to a CSV file. Each time it writes, I need to create a brand new file, and ideally the file name should be unique, so I was trying to add the date as in that post.
Could anyone tell me what's going wrong?
@Bean(destroyMethod = "")
public FlatFileItemWriter<Employees> employeesWriter() {
    FlatFileItemWriter<Employees> fileItemWriter = new FlatFileItemWriter<>();
    //fileItemWriter.setResource(new FileSystemResource("csv/employees.csv"));
    fileItemWriter.setResource(new FileSystemResource("csv/employees-#{new java.text.SimpleDateFormat("Mddyyyyhhmmss").format(new java.util.GregorianCalendar().getTime())}.csv"));
    fileItemWriter.setHeaderCallback(headerCallback());
    BeanWrapperFieldExtractor<Employees> fieldExtractor = new BeanWrapperFieldExtractor<>();
    fieldExtractor.setNames(new String[] {"employeeNumber", "lastName", "firstName", "extension", "email", "officeCode", "reportsTo", "jobTitle"});
    DelimitedLineAggregator<Employees> lineAggregator = new DelimitedLineAggregator<>();
    lineAggregator.setDelimiter(",");
    lineAggregator.setFieldExtractor(fieldExtractor);
    fileItemWriter.setLineAggregator(lineAggregator);
    fileItemWriter.setShouldDeleteIfEmpty(true);
    return fileItemWriter;
}
Three things:
SpEL expressions are not interpreted when used the way you use them.
The " characters copied from the XML sample will not work in Java config.
The / in csv/... is not a valid character in a file name.
You need to declare your writer as follows:
@Bean
public FlatFileItemWriter<Employees> itemWriter(@Value("employees-#{new java.text.SimpleDateFormat('Mddyyyyhhmmss').format(new java.util.GregorianCalendar().getTime())}.csv") String filename) {
    FlatFileItemWriter<Employees> fileItemWriter = new FlatFileItemWriter<>();
    fileItemWriter.setResource(new FileSystemResource(filename));
    ...
    return fileItemWriter;
}
But I would recommend using a step-scoped item writer and passing the file name as a job parameter rather than using a SpEL expression.
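A minimal sketch of that recommendation, assuming a job parameter named outputFile (the parameter name and the launcher snippet are assumptions; the rest mirrors the writer above):

@Bean
@StepScope
public FlatFileItemWriter<Employees> employeesWriter(
        @Value("#{jobParameters['outputFile']}") String outputFile) {
    FlatFileItemWriter<Employees> fileItemWriter = new FlatFileItemWriter<>();
    fileItemWriter.setResource(new FileSystemResource(outputFile));
    // ... same header callback, field extractor and line aggregator as above
    return fileItemWriter;
}

// when launching the job (e.g. from the hourly scheduler), build a fresh name each run:
JobParameters params = new JobParametersBuilder()
        .addString("outputFile", "csv/employees-" + new SimpleDateFormat("Mddyyyyhhmmss").format(new Date()) + ".csv")
        .toJobParameters();
jobLauncher.run(job, params);

Because the file name is a job parameter, each run also gets its own job instance, which keeps the runs distinguishable in the job repository.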

How to read flat file header and body separately in Spring Batch

I'm doing a simple batch job with Spring Batch and Spring Boot.
I need to read a flat file, separate the header data (first line) from the body data (rest of the lines) for individual business-logic processing, and then write everything into a single file.
As you can see, the header has 5 fields that have to be mapped to one class, and the body has 12 that have to be mapped to a different one.
I first thought of using FlatFileItemReader and skipping the header, then using the skippedLinesCallback to handle that line, but I couldn't figure out how to do it.
I'm new to Spring Batch and Java config. If someone can help me write a solution to my problem I would really appreciate it!
Here is the input file:
01.01.2017|SUBDCOBR|12:21:23|01/12/2016|31/12/2016
01.01.2017|12345678231234|0002342434|BORGIA RUBEN|27-32548987-9|FA|A|2062-
00010443/444/445|142,12|30/08/2017|142,01
01.01.2017|12345673201234|2342434|ALVAREZ ESTHER|27-32533987-9|FA|A|2062-
00010443/444/445|142,12|30/08/2017|142,02
01.01.2017|12345673201234|0002342434|LOPEZ LUCRECIA|27-32553387-9|FA|A|2062-
00010443/444/445|142,12|30/08/2017|142,12
01.01.2017|12345672301234|0002342434|SILVA JESUS|27-32558657-9|NC|A|2062-
00010443|142,12|30/08/2017|142,12
Cheers!
EDIT 1:
This would be my first attempt. My "body" POJO is called DetalleFacturacion and my "header" POJO is CabeceraFacturacion. I thought of building the reader around the DetalleFacturacion POJO, so I can skip the header and deal with it later... however, I'm not sure how to assign the header's data to CabeceraFacturacion.
public FlatFileItemReader<DetalleFacturacion> readerDetalleFacturacion() {
    FlatFileItemReader<DetalleFacturacion> reader = new FlatFileItemReader<>();
    reader.setLinesToSkip(1);
    reader.setResource(new ClassPathResource("/inputFiles/GLEO-MN170100-PROCESO01-SUBDFACT-000001.txt"));

    DefaultLineMapper<DetalleFacturacion> detalleLineMapper = new DefaultLineMapper<>();
    DelimitedLineTokenizer tokenizerDet = new DelimitedLineTokenizer("|");
    tokenizerDet.setNames(new String[] {"fechaEmision", "tipoDocumento", "letra", "nroComprobante",
            "nroCliente", "razonSocial", "cuit", "montoNetoGP", "montoNetoG3",
            "montoExento", "impuestos", "montoTotal"});

    LineCallbackHandler skippedLineCallback = new LineCallbackHandler() {
        @Override
        public void handleLine(String line) {
            String[] headerSeparado = line.split("\\|"); // '|' must be escaped, split() takes a regex
            String printDate = headerSeparado[0];
            String reportIdentifier = headerSeparado[1];
            String tituloReporte = headerSeparado[2];
            String fechaDesde = headerSeparado[3];
            String fechaHasta = headerSeparado[4];
            CabeceraFacturacion cabeceraFacturacion = new CabeceraFacturacion();
            cabeceraFacturacion.setPrintDate(printDate);
            cabeceraFacturacion.setReportIdentifier(reportIdentifier);
            cabeceraFacturacion.setTituloReporte(tituloReporte);
            cabeceraFacturacion.setFechaDesde(fechaDesde);
            cabeceraFacturacion.setFechaHasta(fechaHasta);
        }
    };
    reader.setSkippedLinesCallback(skippedLineCallback);

    detalleLineMapper.setLineTokenizer(tokenizerDet);
    detalleLineMapper.setFieldSetMapper(new DetalleFieldSetMapper());
    detalleLineMapper.afterPropertiesSet();
    reader.setLineMapper(detalleLineMapper);

    // Test to check if data is being saved correctly in CabeceraFacturacion
    CabeceraFacturacion cabeceraFacturacion = new CabeceraFacturacion();
    System.out.println("Print Date: " + cabeceraFacturacion.getPrintDate());
    System.out.println("Report Identifier: " + cabeceraFacturacion.getReportIdentifier());

    return reader;
}
You are correct: you need to use skippedLinesCallback to handle the skipped lines.
Implement the LineCallbackHandler interface and add your processing in its handleLine method.
LineCallbackHandler is passed the raw content of each line that is skipped; if linesToSkip is set to 2, it is called twice.
This is how you can define the reader for this.
Java config - Spring Batch 4:
@Bean
public FlatFileItemReader<POJO> myReader() {
    return new FlatFileItemReaderBuilder<POJO>()
            .name("myReader")
            .resource(new FileSystemResource("resources/players.csv"))
            .linesToSkip(1)                             // the callback is only invoked for skipped lines
            .skippedLinesCallback(skippedLinesCallback)
            .delimited()
            .delimiter(",")
            .names(new String[] {"pro1", "pro2", "pro3"})
            .targetType(POJO.class)
            .build();
}
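If the goal is to keep the parsed header around for later use, one hedged option (a sketch; the class name is an assumption) is to move the parsing into a named LineCallbackHandler that holds the CabeceraFacturacion and register it via reader.setSkippedLinesCallback(...), so the rest of the step can ask it for the header:

public class CabeceraLineCallbackHandler implements LineCallbackHandler {

    private CabeceraFacturacion cabecera;

    @Override
    public void handleLine(String line) {
        String[] campos = line.split("\\|");         // '|' must be escaped in a regex
        cabecera = new CabeceraFacturacion();
        cabecera.setPrintDate(campos[0]);
        cabecera.setReportIdentifier(campos[1]);
        cabecera.setTituloReporte(campos[2]);
        cabecera.setFechaDesde(campos[3]);
        cabecera.setFechaHasta(campos[4]);
    }

    public CabeceraFacturacion getCabecera() {       // read by the processor/writer after the skip
        return cabecera;
    }
}

Declaring this handler as a bean and injecting it into both the reader and the processor lets the body items be enriched with the header data; alternatively the handler could copy the values into the step's ExecutionContext.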

Best strategy to Handle large data in Apache Camel

I am using Apache Camel to generate monthly reports. I have a MySQL query which, when run against my DB, returns around 5 million records (20 columns each). The query itself takes approximately 70 minutes to execute.
To speed up the process, I created 5 seda (worker) routes and used multicast().parallelProcessing(),
which query the DB in parallel for different time ranges; I then merged the results using an aggregator.
Now I have 5 million records in my exchange body (in the form of List<HashMap<String, Object>>). When I try to format this with Camel Bindy to generate a CSV file out of the data, I get a GC overhead limit exceeded error. I tried increasing the Java heap size, but then the transformation takes forever.
Is there any other way to convert this raw data into a well-formatted CSV file? Could Java 8 streams be useful?
Code
from("direct://logs/testLogs")
.routeId("Test_Logs_Route")
.setProperty("Report", simple("TestLogs-${date:now:yyyyMMddHHmm}"))
.bean(Logs.class, "buildLogsQuery") // bean that generates the logs query
.multicast()
.parallelProcessing()
.to("seda:worker1?waitForTaskToComplete=Always&timeout=0", // worker routes
"seda:worker2?waitForTaskToComplete=Always&timeout=0",
"seda:worker3?waitForTaskToComplete=Always&timeout=0",
"seda:worker4?waitForTaskToComplete=Always&timeout=0",
"seda:worker5?waitForTaskToComplete=Always&timeout=0");
All my worker routes look like this
from("seda:worker4?waitForTaskToComplete=Always")
.routeId("ParallelProcessingWorker4")
.log(LoggingLevel.INFO, "Parallel Processing Worker 4 Flow Started")
.setHeader("WorkerId", constant(4))
.bean(Logs.class, "testBean") // appends time-clause to the query based in WorkerID
.to("jdbc:oss-ro-ds")
.to("seda:resultAggregator?waitForTaskToComplete=Always&timeout=0");
Aggregator
from("seda:resultAggregator?waitForTaskToComplete=Always&timeout=0")
.routeId("Aggregator_ParallelProcessing")
.log(LoggingLevel.INFO, "Aggregation triggered for processor ${header.WorkerId}")
.aggregate(header("Report"), new ParallelProcessingAggregationStrategy())
.completionSize(5)
.to("direct://logs/processResultSet")
from("direct://logs/processResultSet")
.routeId("Process_Result_Set")
.bean(Test.class, "buildLogReport");
.marshal(myLogBindy)
.to("direct://deliver/ooma");
Method buildLogReport
public void buildLogReport(List<HashMap<String, Object>> resultEntries, Exchange exchange) throws Exception {
    Map<String, Object> headerMap = exchange.getIn().getHeaders();
    ArrayList<MyLogEntry> reportList = new ArrayList<>();
    while (!resultEntries.isEmpty()) { // a null check never terminates this loop; test for emptiness instead
        HashMap<String, Object> resultEntry = resultEntries.get(0);
        MyLogEntry logEntry = new MyLogEntry();
        logEntry.setA((String) resultEntry.get("A"));
        logEntry.setB((String) resultEntry.get("B"));
        logEntry.setC(((BigDecimal) resultEntry.get("C")).toString());
        if (null != resultEntry.get("D"))
            logEntry.setD(((BigInteger) resultEntry.get("D")).toString());
        logEntry.setE((String) resultEntry.get("E"));
        logEntry.setF((String) resultEntry.get("F"));
        logEntry.setG(((BigDecimal) resultEntry.get("G")).toString());
        logEntry.setH((String) resultEntry.get("H"));
        logEntry.setI(((Long) resultEntry.get("I")).toString());
        logEntry.setJ((String) resultEntry.get("J"));
        logEntry.setK(TimeUtils.convertDBToTZ((Date) resultEntry.get("K"), (String) headerMap.get("TZ")));
        logEntry.setL(((BigDecimal) resultEntry.get("L")).toString());
        logEntry.setM((String) resultEntry.get("M"));
        logEntry.setN((String) resultEntry.get("State"));
        logEntry.setO((String) resultEntry.get("Zip"));
        logEntry.setP("\"" + (String) resultEntry.get("Type") + "\"");
        logEntry.setQ((String) resultEntry.get("Gate"));
        reportList.add(logEntry);
        resultEntries.remove(resultEntry);
    }
    // transform the exchange message
    exchange.getIn().setBody(reportList);
}
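One hedged alternative (a sketch under assumptions: the output path, the toCsvLine(...) row formatter and the decision to pass a file path downstream are not part of the original route) is to append each row to the CSV as it is mapped, instead of materialising millions of MyLogEntry objects and marshalling them in one go:

public void streamLogReport(List<HashMap<String, Object>> resultEntries, Exchange exchange) throws Exception {
    String report = exchange.getProperty("Report", String.class);
    Path target = Paths.get("/tmp", report + ".csv");             // assumed output location
    try (BufferedWriter out = Files.newBufferedWriter(target, StandardCharsets.UTF_8,
            StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
        Iterator<HashMap<String, Object>> it = resultEntries.iterator();
        while (it.hasNext()) {
            HashMap<String, Object> row = it.next();
            out.write(toCsvLine(row, exchange));                  // hypothetical per-row formatter
            out.newLine();
            it.remove();                                          // let the processed entry be garbage collected
        }
    }
    exchange.getIn().setBody(target.toString());                  // hand the file path, not the data, downstream
}

This keeps the heap footprint at roughly one row at a time; if Bindy formatting is required, the same idea can be applied with Camel's splitter in streaming mode so that only small batches are marshalled at once.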
