Process json files in spring batch

I have a zip file containing multiple JSON files. I have unzipped them and
then built POJO objects from the JSON using the code below:
reader = new BufferedReader(new FileReader(file));
Gson gson = new GsonBuilder().create();
Element[] people = gson.fromJson(reader, Element[].class);
But I need to process these JSON files one by one using Spring Batch.
Can someone help me with how I can achieve this in Spring Batch? I want to read the JSON files in chunks of 1000.
My JSON object is very complex. Example:
{
  "students": {
    "subelements": {
      "dep": {
        "data": [
          "XYZ"
        ]
      }
    }
  }
}

Your data structure is not one of the types you could handle with Spring Batch out-of-the-box. See more details here: https://stackoverflow.com/a/51933062/5019386.
So I think in your case, you would need to create a custom item reader to parse a specific fragment of your input file.
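A minimal sketch of such a reader (an illustration on my part, not an out-of-the-box component: it reuses Gson and the Element POJO from your snippet, parses one whole file up front and hands the elements back one at a time; the chunk size of 1000 is then configured on the step, e.g. .<Element, Element>chunk(1000)):

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.Reader;
import java.util.Arrays;
import java.util.Iterator;

import org.springframework.batch.item.ItemReader;

import com.google.gson.Gson;

public class JsonFileItemReader implements ItemReader<Element> {

    private final File file;
    private Iterator<Element> iterator;

    public JsonFileItemReader(File file) {
        this.file = file;
    }

    @Override
    public Element read() throws Exception {
        if (iterator == null) {
            // parse the whole file on the first call
            try (Reader reader = new BufferedReader(new FileReader(file))) {
                Element[] elements = new Gson().fromJson(reader, Element[].class);
                iterator = Arrays.asList(elements).iterator();
            }
        }
        return iterator.hasNext() ? iterator.next() : null; // null tells Spring Batch the input is exhausted
    }
}

To process the files one by one you could, for example, launch one step execution per unzipped file, or adapt the reader to implement ResourceAwareItemReaderItemStream so it can be driven by a MultiResourceItemReader.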

Related

How to receive multipart file upload using reactor-netty (without Spring)?

I've seen there are examples with reactor-netty of how to post files using a multipart form (https://github.com/reactor/reactor-netty/blob/89796a1839a1439a1800424e130515357a827392/src/test/java/reactor/netty/http/client/HttpClientTest.java#L337),
but I couldn't find any information on how to write a server using reactor-netty that can parse multipart data.
It seems that Netty is able to do it using the HttpPostRequestDecoder class, but I cannot see where it fits...
I've also seen that InterfaceHttpData is a parent class of Attribute and FileUpload, but I don't see where I can obtain these objects from the request...
Has anyone ever done this? Any clues?
Thanks a lot
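One way (shown in the snippet below) is to aggregate the incoming request body, rebuild a FullHttpRequest from it and hand that to Netty's HttpPostRequestDecoder: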
request.receive()
       .aggregate()
       .flatMap(byteBuf -> {
           // rebuild a full Netty request so HttpPostRequestDecoder can parse the multipart body
           FullHttpRequest dhr = new DefaultFullHttpRequest(request.version(), request.method(), request.uri(),
                   byteBuf, request.requestHeaders(), EmptyHttpHeaders.INSTANCE);
           // false = keep the decoded data in memory instead of writing it to disk
           HttpPostRequestDecoder postDecoder =
                   new HttpPostRequestDecoder(new DefaultHttpDataFactory(false), dhr, CharsetUtil.UTF_8);
           // loop over the decoded parts
           for (InterfaceHttpData data : postDecoder.getBodyHttpDatas()) {
               if (data.getHttpDataType() == InterfaceHttpData.HttpDataType.Attribute) {
                   // form attribute: ((MemoryAttribute) data)
               } else if (data.getHttpDataType() == InterfaceHttpData.HttpDataType.FileUpload) {
                   // file upload: ((MemoryFileUpload) data)
               }
           }
           postDecoder.destroy();
           dhr.release();
           return Mono.empty(); // replace with whatever response you want to send
       });

Indexing PDF file in ElasticSearch using Java Code

I am trying to index PDF files in Elasticsearch 6.3.2 using Java code. So far I have written the following code to save the PDF in ES. The code is working fine and I am able to save the Base64-encoded string of my PDF in ES. I want to understand whether the approach I am following is correct or not. Is there any better way of doing it?
Following is my code:
InputStream inputStream = new FileInputStream(new File("mypdf.pdf"));
try {
    byte[] fileByteStream = IOUtils.toByteArray(inputStream);
    String base64String = new String(Base64.getEncoder().encodeToString(fileByteStream).getBytes(), "UTF-8");
    String strEncoded = Base64.getEncoder().encodeToString(base64String.getBytes("utf-8"));
    inputStream.close();
    JSONObject correspondenceNode = new JSONObject();
    correspondenceNode.put("data", strEncoded);
    String strSsonValues = correspondenceNode.toString();
    HttpEntity entity = new NStringEntity(strSsonValues, ContentType.APPLICATION_JSON);
    elasticrestClient.put("/2018/documents/1", entity);
} catch (IOException e) {
    e.printStackTrace();
}
Basically what I am doing here is converting the PDF document into a Base64 string and saving it in ES, and while reading, I am converting it back.
Following is the code for decoding:
String responseBody = elasticrestClient.get("/2018/documents/1");
//some code to fetch the hits
JSONObject h = hitsArray.getJSONObject(0);
source = h.getJSONObject("_source");
String object = (source.getString("data"));
byte[] decodedStr = Base64.getDecoder().decode( object );
FileOutputStream fos = new FileOutputStream("download.pdf");
fos.write(Base64.getDecoder().decode(new String( decodedStr, "utf-8" )));
fos.close();
This might be correct for storing BASE64 content in Elasticsearch, but a few pieces might be missing here:
You are not "indexing" the PDF per se in Elasticsearch. If you want to do so, you need to define an ingest pipeline and use the ingest attachment plugin to extract the content from the PDF.
You did not mention the mapping you are using. If you "really" want to keep the binary content around, you might want to define the BASE64 field as a binary data type.
It does not sound like a good idea to me to use Elasticsearch to store large blobs like this.
Instead, I'd extract text and metadata and index that plus a URL to the binary itself. Like:
{
  "content": "Extracted text here",
  "meta": {
    // Meta data there
  },
  "url": "file://path/to/file"
}
You can also look at FSCrawler (including its code) which does basically that.
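If you do go the ingest-attachment route mentioned above, here is a rough sketch of it with the low-level REST client (my own illustration, not part of the original answer; the pipeline name pdf-attachment is made up and the ingest-attachment plugin must be installed on the cluster):

// create a pipeline that runs the attachment processor on the "data" field
String pipelineJson = "{\"description\":\"Extract PDF content\","
        + "\"processors\":[{\"attachment\":{\"field\":\"data\"}}]}";
restClient.performRequest("PUT", "/_ingest/pipeline/pdf-attachment",
        Collections.emptyMap(),
        new NStringEntity(pipelineJson, ContentType.APPLICATION_JSON));

// index the BASE64-encoded document through the pipeline so the text gets extracted
restClient.performRequest("PUT", "/2018/documents/1",
        Collections.singletonMap("pipeline", "pdf-attachment"),
        new NStringEntity("{\"data\":\"" + strEncoded + "\"}", ContentType.APPLICATION_JSON));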

Spring Batch - create new unique CSV name while writing data using FlatFileItemWriter API

I tried the solution from this post: Spring Batch - create a new file each time instead of overriding it for transferring data from CSV to XML, but it didn't work for the annotation-based approach I used.
fileItemWriter.setResource(new FileSystemResource("csv/employees-#{new java.text.SimpleDateFormat("Mddyyyyhhmmss").format(new java.util.GregorianCalendar().getTime())}.csv"));
My batch job is scheduled to run every hour; it reads a table and writes the data to a CSV file. Each run needs to create a new file altogether, and it would be good if the file name were unique, so I was looking to put the date etc. into it as in that post.
Could anyone guide me on what's going wrong?
@Bean(destroyMethod="")
public FlatFileItemWriter<Employees> employeesWriter(){
    FlatFileItemWriter<Employees> fileItemWriter = new FlatFileItemWriter<>();
    //fileItemWriter.setResource(new FileSystemResource("csv/employees.csv"));
    fileItemWriter.setResource(new FileSystemResource("csv/employees-#{new java.text.SimpleDateFormat("Mddyyyyhhmmss").format(new java.util.GregorianCalendar().getTime())}.csv"));
    fileItemWriter.setHeaderCallback(headerCallback());

    BeanWrapperFieldExtractor<Employees> fieldExtractor = new BeanWrapperFieldExtractor<>();
    fieldExtractor.setNames(new String[] {"employeeNumber", "lastName", "firstName", "extension", "email", "officeCode", "reportsTo", "jobTitle"});

    DelimitedLineAggregator<Employees> lineAggregator = new DelimitedLineAggregator<>();
    lineAggregator.setDelimiter(",");
    lineAggregator.setFieldExtractor(fieldExtractor);

    fileItemWriter.setLineAggregator(lineAggregator);
    fileItemWriter.setShouldDeleteIfEmpty(true);
    return fileItemWriter;
}
Could anyone guide me on what's going wrong?
Three things:
SpEL expressions are not interpreted when used like you do
The " copied from the xml sample will not work in Java config
The / in csv/... is not a valid character in a file name
You need to declare your writer as follows:
@Bean
public FlatFileItemWriter itemWriter(@Value("employees-#{new java.text.SimpleDateFormat('Mddyyyyhhmmss').format(new java.util.GregorianCalendar().getTime())}.csv") String filename) {
    FlatFileItemWriter<Employees> fileItemWriter = new FlatFileItemWriter<>();
    fileItemWriter.setResource(new FileSystemResource(filename));
    ...
    return fileItemWriter;
}
But I would recommend using a step-scoped item writer and passing the file name as a job parameter rather than using a SpEL expression.
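For illustration, a sketch of that step-scoped variant (my own assumption, with a hypothetical outputFile job parameter; the header callback, field extractor and line aggregator stay as in your bean):

@Bean
@StepScope
public FlatFileItemWriter<Employees> employeesWriter(
        @Value("#{jobParameters['outputFile']}") String outputFile) {
    FlatFileItemWriter<Employees> writer = new FlatFileItemWriter<>();
    // the file name is now decided by whoever launches the job
    writer.setResource(new FileSystemResource(outputFile));
    // ... same header callback, field extractor and line aggregator as above
    return writer;
}

The hourly scheduler can then build the name once per run, e.g. new JobParametersBuilder().addString("outputFile", "employees-" + System.currentTimeMillis() + ".csv").toJobParameters(), which also makes each job instance unique.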

How to read flat file header and body separately in Spring Batch

I'm doing a simple batch job with Spring Batch and Spring Boot.
I need to read a flat file, separate the header data (first line) from the body data (rest of the lines) for individual business logic processing, and then write everything into a single file.
As you can see, the header has 5 params that have to be mapped to one class, and the body has 12 which have to be mapped to a different one.
I first thought of using FlatFileItemReader and skipping the header, then using the skippedLinesCallback to handle that line, but I couldn't figure out how to do it.
I'm new to Spring Batch and Java config. If someone can help me write a solution for my problem I would really appreciate it!
I leave here the input file:
01.01.2017|SUBDCOBR|12:21:23|01/12/2016|31/12/2016
01.01.2017|12345678231234|0002342434|BORGIA RUBEN|27-32548987-9|FA|A|2062-00010443/444/445|142,12|30/08/2017|142,01
01.01.2017|12345673201234|2342434|ALVAREZ ESTHER|27-32533987-9|FA|A|2062-00010443/444/445|142,12|30/08/2017|142,02
01.01.2017|12345673201234|0002342434|LOPEZ LUCRECIA|27-32553387-9|FA|A|2062-00010443/444/445|142,12|30/08/2017|142,12
01.01.2017|12345672301234|0002342434|SILVA JESUS|27-32558657-9|NC|A|2062-00010443|142,12|30/08/2017|142,12
Cheers!
EDIT 1:
This would be my first attempt. My "body" POJO is called DetalleFacturacion and my "header" POJO is CabeceraFacturacion. I thought of building the reader with the DetalleFacturacion POJO, so I can skip the header and treat it later... however, I'm not sure how to assign the header's data into CabeceraFacturacion.
public FlatFileItemReader<DetalleFacturacion> readerDetalleFacturacion(){
    FlatFileItemReader<DetalleFacturacion> reader = new FlatFileItemReader<>();
    reader.setLinesToSkip(1);
    reader.setResource(new ClassPathResource("/inputFiles/GLEO-MN170100-PROCESO01-SUBDFACT-000001.txt"));

    DefaultLineMapper<DetalleFacturacion> detalleLineMapper = new DefaultLineMapper<>();

    DelimitedLineTokenizer tokenizerDet = new DelimitedLineTokenizer("|");
    tokenizerDet.setNames(new String[] {"fechaEmision", "tipoDocumento", "letra", "nroComprobante",
                                        "nroCliente", "razonSocial", "cuit", "montoNetoGP", "montoNetoG3",
                                        "montoExento", "impuestos", "montoTotal"});

    LineCallbackHandler skippedLineCallback = new LineCallbackHandler() {
        @Override
        public void handleLine(String line) {
            String[] headerSeparado = line.split("\\|"); // escape the pipe, split("|") would split on every character
            String printDate = headerSeparado[0];
            String reportIdentifier = headerSeparado[1];
            String tituloReporte = headerSeparado[2];
            String fechaDesde = headerSeparado[3];
            String fechaHasta = headerSeparado[4];

            CabeceraFacturacion cabeceraFacturacion = new CabeceraFacturacion();
            cabeceraFacturacion.setPrintDate(printDate);
            cabeceraFacturacion.setReportIdentifier(reportIdentifier);
            cabeceraFacturacion.setTituloReporte(tituloReporte);
            cabeceraFacturacion.setFechaDesde(fechaDesde);
            cabeceraFacturacion.setFechaHasta(fechaHasta);
        }
    };
    reader.setSkippedLinesCallback(skippedLineCallback);

    detalleLineMapper.setLineTokenizer(tokenizerDet);
    detalleLineMapper.setFieldSetMapper(new DetalleFieldSetMapper());
    detalleLineMapper.afterPropertiesSet();
    reader.setLineMapper(detalleLineMapper);

    // Test to check if it is saving data correctly in CabeceraFacturacion
    CabeceraFacturacion cabeceraFacturacion = new CabeceraFacturacion();
    System.out.println("Print Date:" + cabeceraFacturacion.getPrintDate());
    System.out.println("Report Identif: " + cabeceraFacturacion.getReportIdentifier());

    return reader;
}
You are correct. You need to use the skippedLinesCallback to handle the skipped lines.
You need to implement the LineCallbackHandler interface and add your processing in the handleLine method.
LineCallbackHandler passes the raw line content of the lines to be skipped. If linesToSkip is set to 2, then this interface is called twice.
This is how you can define the reader for the same.
Java Config - Spring Batch 4
@Bean
public FlatFileItemReader<POJO> myReader() {
    return new FlatFileItemReaderBuilder<POJO>()
            .name("myReader")
            .resource(new FileSystemResource("resources/players.csv"))
            .linesToSkip(1) // the skipped line(s) are handed to the callback
            .skippedLinesCallback(skippedLinesCallback)
            .delimited()
            .delimiter(",")
            .names(new String[] {"pro1", "pro2", "pro3"})
            .targetType(POJO.class)
            .build();
}
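As an illustration only (my sketch, not from the answer): the callback can split the skipped header line and fill your CabeceraFacturacion; you then need to keep it somewhere the rest of the step can reach, for example a holder bean or the step's ExecutionContext.

LineCallbackHandler skippedLinesCallback = line -> {
    String[] campos = line.split("\\|"); // escape the pipe, it is a regex metacharacter
    CabeceraFacturacion cabecera = new CabeceraFacturacion();
    cabecera.setPrintDate(campos[0]);
    cabecera.setReportIdentifier(campos[1]);
    cabecera.setTituloReporte(campos[2]);
    cabecera.setFechaDesde(campos[3]);
    cabecera.setFechaHasta(campos[4]);
    // hand the header over, e.g. store it in a holder bean or put it into the
    // step ExecutionContext from a StepExecutionListener
};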

How to Make a PUT request to Elastic Search with a JSON file using REST Template

I am creating a Spring Boot application where I need to PUT a JSON schema into Elasticsearch. The JSON schema will be in my resources folder (on my classpath). How can I PUT the raw JSON file using RestTemplate?
Any help? Most of the examples on the internet just assume that we have a POJO class to send, but here I don't know the JSON schema's structure; I need to make the request with the raw JSON file.
Assuming the JSON schema contains the mapping/settings for an index, you can put the mapping as shown below:
CreateIndexRequestBuilder createIndexRequestBuilder = client.admin().indices().prepareCreate(index);
// CREATE MAPPING
String mapping_json = new String(Files.readAllBytes(json_mapping_path));
createIndexRequestBuilder.addMapping("my_mapping", mapping_json);
CreateIndexResponse indexResponse = createIndexRequestBuilder.execute().actionGet();
For creating the index, don't worry about the index mapping JSON; if your JSON is never going to change, you can directly create documents using this code:
for (listObject lObject : list) {
    XContentBuilder json = null; // initialise so it is definitely assigned after the try/catch
    try {
        json = XContentFactory.jsonBuilder();
        json.startObject(); // main object start
        json.field(GlobalSearchCosntants.DOCUMENT_ID, lObject.getId());
        json.field(GlobalSearchCosntants.DOCUMENT_NAME, lObject.getName());
        json.field(GlobalSearchCosntants.DOCUMENT_TYPE, lObject.getType());
        json.endObject(); // main object end
    } catch (IOException e1) {
        logger.error("Problem while creating document " + e1.getMessage());
    }
    client.prepareIndex(INDEX_NAME, GlobalSearchCosntants.INDEX_TYPE, id)
          .setSource(json).execute().actionGet();
}
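Since the question asks specifically about RestTemplate, here is a rough sketch of that approach (my own assumption, not taken from the answers above; the file name index-schema.json and the index URL are made up):

// read the raw JSON schema from the classpath and PUT it with RestTemplate
// (getInputStream() throws IOException: handle it or declare it on the calling method)
String schemaJson = StreamUtils.copyToString(
        new ClassPathResource("index-schema.json").getInputStream(),
        StandardCharsets.UTF_8);

HttpHeaders headers = new HttpHeaders();
headers.setContentType(MediaType.APPLICATION_JSON);

new RestTemplate().exchange(
        "http://localhost:9200/my-index",
        HttpMethod.PUT,
        new HttpEntity<>(schemaJson, headers),
        String.class);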
