How to handle exception and skip the wrong csv line as well? - spring

I am reading a csv file using Spring batch. I am reading the content of csv and writing this content to the database in accordance with one of the entity class.
Now there could be certain lines in csv that are wrong and don't match the POJO attributes. To handle this I configured my Step as follows:
Step step = stepBuilderFactory.get("CSV-Step")
        .<Book, Book>chunk(100)
        .faultTolerant()
        .skip(FlatFileParseException.class)
        .skipLimit(1)
        .reader(itemReader)
        .writer(itemWriter)
        .build();
It basically skips the line that causes the FlatFileParseException and goes on with the subsequent lines. Now I also want to log the lines for which parsing could not be done. For this, in my GlobalExceptionHandler annotated with @ControllerAdvice, I made the following method:
@OnReadError
public void handleCsvParseException(FlatFileParseException ex, Throwable throwable) {
    logger.error("! FlatFileParseException, line is: " + ex.getLineNumber());
    logger.error("! FlatFileParseException, input is: " + ex.getInput());
    logger.error("! Message: " + throwable.getMessage());
    logger.error("! Cause: " + throwable.getCause());
}
The thing is that this method is not being called because I have the skip configuration in my Step. How can I ignore the unwanted lines, i.e. skip them but at the same time log information about them? I would appreciate any kind of help.

The SkipListener is what you need. This listener will be called whenever the configured skippable exception occurs during reading, processing or writing items.
In your case, you can implement the logging logic in the SkipListener#onSkipInRead(Throwable t) method. The FlatFileParseException will be passed as a parameter and will give you the necessary context (the line number and the raw input).
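A minimal sketch of such a listener, assuming the Book item type from the question (the class and logger names are illustrative):

```java
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.batch.core.SkipListener;
import org.springframework.batch.item.file.FlatFileParseException;

// Logs every line skipped during reading; names are illustrative.
public class CsvSkipListener implements SkipListener<Book, Book> {

    private static final Logger logger = LoggerFactory.getLogger(CsvSkipListener.class);

    @Override
    public void onSkipInRead(Throwable t) {
        if (t instanceof FlatFileParseException) {
            FlatFileParseException ex = (FlatFileParseException) t;
            logger.error("! FlatFileParseException, line is: " + ex.getLineNumber());
            logger.error("! FlatFileParseException, input is: " + ex.getInput());
        }
    }

    @Override
    public void onSkipInProcess(Book item, Throwable t) { }

    @Override
    public void onSkipInWrite(Book item, Throwable t) { }
}
```

The step from the question would then register it with a .listener(new CsvSkipListener()) call on the fault-tolerant step builder, e.g. after .skipLimit(1).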
Hope this helps.

Related

Formatting violation during execution of webServiceMessage.writeTo

We use WebServiceTemplate.marshalSendAndReceive, to which we pass the URL, the POJO, and a WebServiceMessageCallback.
We also use an interceptor, in which:
ByteArrayTransportOutputStream byteArrayTransportOutputStream = new ByteArrayTransportOutputStream();
webServiceMessage.writeTo(byteArrayTransportOutputStream);
String request = EOL + IntegrationUtils.deleteSensitiveData(new String(byteArrayTransportOutputStream.toByteArray()));
Why, before writeTo, does webServiceMessage hold the SOAP request in formatted form in webServiceMessage.envelope.element, but after writeTo I get the message on one line without any formatting?
What is the problem? What does it depend on?

Problems using CSV Data Set Config and EOF condition

I have a CSV Data Set Config which I am using in a while loop. I have this in a module and use the same module several times in my test.
My first problem is that I have set the while loop with this condition:
${__javaScript("${data}"!="<EOF>")}
The "data" being the first column in my CSV file. This works fine except for the last iteration, where ${data} gets set to "<EOF>" and has all the tests performed on it. I want it to stop before this, not after all the tests have run once on it.
My other problem is that when I use this module again later, ${data} is still set to "<EOF>" and no tests are run at all.
To avoid this <EOF> bit, just put your logic under an If Controller and use "${data}" != "<EOF>" as the "Condition".
See Using the While Controller in JMeter guide for detailed explanation of this and other common use cases.
UPD. Reusing the same CSV file:
Add a Beanshell Sampler after 1st While Controller and before the 2nd While Controller and use the following code in "Script" area:
import org.apache.jmeter.services.FileServer;
FileServer.getFileServer().closeFile("yourfile.csv");
vars.put("data", "");
The above script will "close" the original CSV file so it can be reused later on in the script, and "clear" the ${data} variable, as it contains the <EOF> value.
See How to Use BeanShell: JMeter's Favorite Built-in Component guide for details on using Beanshell scripts in JMeter tests.
If you use a Loop Controller, with the number of CSV lines as the number of iterations, you can avoid that. Just put this code into a Beanshell Sampler:
import org.apache.commons.io.FileUtils;
import java.io.File;

int lines = FileUtils.readLines(new File("/home/username/csv.file")).size();
vars.put("linesCount", String.valueOf(lines));
After that you can use linesCount in the Loop Controller.
If your data variable needs to be reverted to its original state, you can store the default value in another variable, and at the end of the loop revert data to it using a Beanshell pre/post processor.
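As a sketch of that save-and-restore (the dataDefault variable name is illustrative, and vars is the JMeter variable context available in Beanshell):

```java
// Beanshell PreProcessor, before the loop: remember the original value
vars.put("dataDefault", vars.get("data"));

// Beanshell PostProcessor, at the end of the loop: revert "data"
vars.put("data", vars.get("dataDefault"));
```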
EDIT:
Or you could insert an If Controller in your While Controller and process all child elements only if data doesn't equal <EOF>:
${__javaScript("${data}"!="<EOF>")}
None of the previous suggestions for re-using a CSV file worked for me. I ended up doing something different. It's way more complicated than I'd like, but it works.
I posted my answer to that in another post (https://stackoverflow.com/a/64086009/4832515), but I'll copy & paste it incase that link doesn't work in the future.
I couldn't find a simple solution to this. I ended up using Beanshell scripts, which let you write code very similar to Java to do some custom things. I made an example JMeter project to demonstrate how to do this (yes, it's ridiculously complicated, considering all I want to do is repeat the CSV read):
Files:
my file structure:
JMeterExample
|
⊢--JMeterTests.jmx // the JMeter file
⊢--example.csv // the CSV file
contents of my CSV:
guest-id-1,"123 fake street",
guest-id-2,"456 fake street",
guest-id-3,"789 fake street",
So in this thread group, I'm going to have just 1 user, and I'll loop 2 times. I intend to send 1 request per CSV line, so there should be 6 requests sent in total.
Thread Group
User Defined Variables
This is kind of optional, but the filepath is subject to change, and I don't like changing my scripts just for a change in configuration. So I store the CSV filename in a "User Defined Variables" node.
If you are storing the CSV file in the same directory as your JMeter test, you can just specify the filename only.
If you are saving the CSV in a folder other than the directory containing your JMeter file, you will need to supply an absolute path, and then slightly modify the beanshell script below: you'll need to comment out the line that loads the file relatively, and comment in the line that loads from an absolute path.
BeanShell Sampler to parse and store CSV lines
Add a Beanshell Sampler which will basically take in a path, and parse & store each line as a variable. The first line will be stored as a variable called csv_line_0, the 2nd line will be csv_line_1 and so on. I know it's not a clean solution but... I can't find any clean simple way of doing this clean simple task. I copied and pasted my code below.
import org.apache.jmeter.services.FileServer;
import java.io.*;
import java.util.*;

String temp = null;
ArrayList lines = new ArrayList();
BufferedReader bufRdr;

// get the file; vars.get() avoids the ${...} substitution pitfall inside scripts
try {
    // you can use this line below if your csvFilePath is an absolute path
    // File file = new File(vars.get("csvFilePath"));
    // you can use this line below if your csvFilePath is a relative path, relative to where you saved this JMeter file
    File file = new File(FileServer.getFileServer().getBaseDir() + "/" + vars.get("csvFilePath"));
    if (!file.exists()) {
        throw new Exception("ERROR: file " + file.getAbsolutePath() + " not found");
    }
    bufRdr = new BufferedReader(new InputStreamReader(new FileInputStream(file), "UTF8"));
} catch (Exception e) {
    log.error("failed to load file");
    log.error(e.getMessage());
    return;
}

// For each CSV line, save it to a variable
int counter = 0;
while (true) {
    try {
        temp = bufRdr.readLine();
        if (temp == null || temp.equals("<EOF>")) {
            break;
        }
        lines.add(temp);
        vars.put("csv_line_" + String.valueOf(counter), temp);
        counter++;
    } catch (Exception e) {
        log.error("failed to get next line");
        log.error(e.getMessage());
        break;
    }
}
bufRdr.close();

// store the number of CSV lines there are for the loop counter
vars.put("linesCount", String.valueOf(lines.size()));
Loop Controller
Add a Loop Controller that loops once for each CSV line. ${linesCount} is a count of the number of CSV lines and is calculated from the above beanShell script.
Beanshell script to extract data from current CSV Line
This script will run once per CSV line. It will go and grab the current line, and parse out whatever data is on it. You'll have to modify this script to get the data you want. In my example, I only had 2 columns, where column 1 is a "guestId", and column 2 is an "address".
__jm__loopController__idx is a variable JMeter defines for you, and is the index of the loop controller. The variable name is __jm__{loop controller name}__idx.
String index = vars.get("__jm__loopController__idx");
String line = vars.get("csv_line_" + index);
String [] tokens = line.split(",");
vars.put("guestId", tokens[0]);
vars.put("address", tokens[1]);
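One caveat, shown here as a standalone illustration (not part of the JMeter script): a naive split(",") keeps the surrounding quote characters and would break on fields containing embedded commas, so strip the quotes or use a real CSV parser if your data needs it.

```java
public class CsvSplitDemo {
    public static void main(String[] args) {
        // One line from the example CSV above
        String line = "guest-id-1,\"123 fake street\",";
        String[] tokens = line.split(",");
        System.out.println(tokens[0]); // guest-id-1
        System.out.println(tokens[1]); // "123 fake street" (quotes are kept)
    }
}
```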
HTTP Request sampler
Here's the HTTP request that's using the data extracted.
Result
When running this, as desired, I end up sending 6 http requests over to the endpoint I defined.

Getting a field value from a pipe and using it outside the pipe in Hadoop Cascading

Regarding the above subject: is there any way to get the value of a field from a pipe and use that value outside the pipe's scope in Hadoop Cascading? The data has '|' as its delimiter:
first_name|description
Binod|nothing
Rohit|nothing
Ramesh|abc
From the above pipe I need to get a value from the description, whether that is 'nothing' or 'abc'.
Hadoop Cascading is built around the idea of modelling a real-world scenario as data flowing between pipes, executed in parallel on a Map-Reduce Hadoop system.
The surrounding Java program does not execute in step with the rest of the cascading flow (from creating the source tap to the sink tap): Hadoop Cascading runs them as two different processes in independent JVM instances, so they cannot share their values.
Following code and its output shows brief hints:
System.out.println("Before Debugging");
m_eligPipe = new Each(m_eligPipe, new Fields("first_name"), new Debug("On Middle", true));
System.out.println("After Debugging");
Expected output:
Before Debugging
On Middle: ['first_name']
On Middle: ['Binod']
On Middle: ['Rohit']
On Middle: ['Ramesh']
After Debugging
Actual output:
Before Debugging
After Debugging
...
...
On Middle: ['first_name']
On Middle: ['Binod']
On Middle: ['Rohit']
On Middle: ['Ramesh']
I don't understand what you are trying to say. Do you mean to extract the value of the field ${description} outside the scope of the pipe? If so, something like this in pseudo code:
str = get value of description in inputPipe (which is in the scope of the job rather than a function or buffer)
I assume this is what you want: you have a pipe with one field that is the concatenation of ${first_name} and ${description}, and you want the output to be a pipe with a field that is ${description}.
If so, this is what I'd do: implement a function that extracts description and have your flow execute it.
Your function (let's call it ExtractDescriptionFunction) should override the operate method with something like this:
@Override
public void operate(FlowProcess flowProcess, FunctionCall<Tuple> functionCall) {
    TupleEntry arguments = functionCall.getArguments();
    String concatenation = arguments.getString("$input_field_name");
    String[] values = concatenation.split("\\|"); // you might want to have some data sanity check here
    String description = values[1];
    // emit a new Tuple; functionCall.getContext() is null unless initialised in prepare()
    functionCall.getOutputCollector().add(new Tuple(description));
}
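For completeness, a sketch of the surrounding class, assuming the standard Cascading BaseOperation pattern (the class and field names are illustrative):

```java
import cascading.flow.FlowProcess;
import cascading.operation.BaseOperation;
import cascading.operation.Function;
import cascading.operation.FunctionCall;
import cascading.tuple.Fields;
import cascading.tuple.Tuple;
import cascading.tuple.TupleEntry;

// A Function that takes one argument field and emits one "description" field.
public class ExtractDescriptionFunction extends BaseOperation implements Function {

    public ExtractDescriptionFunction() {
        // one argument expected, one output field produced
        super(1, new Fields("description"));
    }

    @Override
    public void operate(FlowProcess flowProcess, FunctionCall functionCall) {
        TupleEntry arguments = functionCall.getArguments();
        String[] values = arguments.getString(0).split("\\|");
        functionCall.getOutputCollector().add(new Tuple(values[1]));
    }
}
```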
Then, in your flow definition, add this:
Pipe outputPipe = new Each(inputPipe, new ExtractDescriptionFunction());
Hope this helps.

Apache Camel / xpath operation result detection

Given a Camel route that is supposed to extract some inner parts of an XML message, create a new message from them, then pass it on.
from(SUB_EXTRACT_XML)
    .setExchangePattern(ExchangePattern.InOut)
    .setBody().xpath("//mmsg:MyMessage/mmsg:AnyPayload/*", namespaces)
    .setBody().simple("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n${in.body}")
    .to(...)
For correct input messages like the following (an "embedded" XML message inside, defined in the schema by xs:any), it works, since the message is what I expect it to be:
<mmsg:MyMessage>
    <mmsg:RandomTags/>
    ...
    <mmsg:AnyPayload> <!-- xs:any in xsd -->
        <some><xml/><here/></some>
    </mmsg:AnyPayload>
</mmsg:MyMessage>
Given there are issues with the XML message, such as the mmsg:AnyPayload tag missing, the XPath can't do its job:
<mmsg:MyMessage>
    <mmsg:RandomTags/>
    ...
    <some><xml/><here/></some>
</mmsg:MyMessage>
The XPath will fail to extract the data and the entire XML message (including mmsg:MyMessage) is passed on, which is not intended. I would rather throw an exception at this stage.
Question:
Is there a way to check whether the XPath expression actually found the element referred to later in the route, or whether it failed to extract the given element(s)?
I know I could have done some schema validation of the message beforehand and rejected rubbish messages, but is there any way to see if an XPath expression fails?
A solution would be to use the choice() DSL in the route like this:
from(SUB_EXTRACT_XML)
    .setExchangePattern(ExchangePattern.InOut)
    .choice()
        .when(xpath("//mmsg:MyMessage/mmsg:AnyPayload", namespaces))
            .setHeader("Status", "OK") // just for another example how to transmit some variable between two routes
            .setBody().simple("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n${in.body}")
        .endChoice()
        .otherwise()
            .log(LoggingLevel.ERROR, "LoggerName", "Error message; Stop the processing")
            .stop()
        .endChoice()
    .end()
    // Just to show the headers are following the route...
    .to("DIRECT_GO_FORWARD");
from("DIRECT_GO_FORWARD")
    .setExchangePattern(ExchangePattern.InOut)
    .choice()
        .when(header("Status").isEqualTo("OK"))
            .bean(new SampleProcessor())
            ...
    .end()
    ...
    .to("...");
The second route is just there to show that you can use the header set in the first route (and the body too).

Spring Batch FlatFileItemWriter leaves empty file

I have the following code:
File overtimeFile = new File(filePath + overtimeFileName);
FlatFileItemWriter<OvertimeSAPExport> overtimeItemWriter =
        new FlatFileItemWriter<OvertimeSAPExport>();
overtimeItemWriter.setResource(new FileSystemResource(overtimeFile));
overtimeItemWriter.setShouldDeleteIfExists(true);
PassThroughLineAggregator<OvertimeSAPExport> lineAggregator =
        new PassThroughLineAggregator<OvertimeSAPExport>();
overtimeItemWriter.setLineAggregator(lineAggregator);
overtimeItemWriter.open(new ExecutionContext());
List<OvertimeSAPExport> overtimeList = overtimeDAO.getSapOvertimeData(locationId, month);
overtimeItemWriter.write(overtimeList);
I have implemented the toString method for OvertimeSAPExport, and when I debug I can see that it enters toString once for each record in the list and gets the correct string from it.
It also creates the file without problems and throws no exceptions my way, but when I look at the file, it's empty.
Could someone PLEASE show me where my mistake is?
Try overtimeItemWriter.close(); and see if the file is flushed to disk. You also need to check whether an ongoing transaction has postponed the writing.
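A minimal sketch of that fix, using the same writer and list as the question's code: the writer buffers output internally, so data typically only reaches the file once close() runs, and putting it in a finally block ensures it runs even if write() throws.

```java
// Open, write, and always close so the buffered output is flushed to the file.
try {
    overtimeItemWriter.open(new ExecutionContext());
    overtimeItemWriter.write(overtimeList);
} finally {
    overtimeItemWriter.close(); // flushes the buffer and releases the file
}
```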
