How to extract fields enveloped in quotes in NiFi?


I have some pipe-delimited files. Each field is enclosed in quotes, like this:
"Created_Date__c"|"CreatedById"|"CreatedDate"|"Guid_c"
"2020-03-02 00:00:00"|"0053i000002XCpAAG"|"2020-03-02 16:01:34"|"94bf83ccf9daf610VgnVCM100000307882a2RCRD"
"2020-03-03 00:00:00"|"0053i000002XCpAAG"|"2020-03-03 09:15:56"|"1a4bb238cdedd610VgnVCM100000307882a2RCRD"
"2020-03-03 00:00:00"|"0053i000002XCpAAG"|"2020-03-03 09:52:33"|"22408baca6fee610VgnVCM100000307882a2RCRD"
I need to cleanse this data so that it looks like this:
Created_Date__c|CreatedById|CreatedDate|Guid_c
2020-03-02 00:00:00|0053i000002XCpAAG|2020-03-02 16:01:34|94bf83ccf9daf610VgnVCM100000307882a2RCRD
2020-03-03 00:00:00|0053i000002XCpAAG|2020-03-03 09:15:56|1a4bb238cdedd610VgnVCM100000307882a2RCRD
2020-03-03 00:00:00|0053i000002XCpAAG|2020-03-03 09:52:33|22408baca6fee610VgnVCM100000307882a2RCRD
I tried using ReplaceText with this configuration:
Search Value: ^"(.*)"$ and Replacement Value: $1. But this configuration is not working and the file is routed to failure. I'm not sure what the issue might be.
I'm open to other suggestions. Thanks in advance.

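I think you should use the regex "(.*?)" instead of ^"(.*)"$. The anchored, greedy pattern matches the whole line once, so it can only strip the outermost pair of quotes, whereas the non-greedy pattern matches each quoted field separately.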
Some online services such as https://www.freeformatter.com/java-regex-tester.html can be useful for testing the regex replacement.
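To see the difference between the two patterns outside NiFi, here is a minimal sketch in Python (not NiFi configuration; Python's `re` uses `\1` where ReplaceText uses `$1`). It runs both patterns against one sample row from the question. Whether the routing to failure is caused by the regex at all is not something this sketch can confirm.

```python
import re

# One data row copied from the question.
line = ('"2020-03-02 00:00:00"|"0053i000002XCpAAG"|'
        '"2020-03-02 16:01:34"|"94bf83ccf9daf610VgnVCM100000307882a2RCRD"')

# Anchored and greedy: the single match spans the whole line,
# so only the first and last quote are dropped.
anchored = re.sub(r'^"(.*)"$', r'\1', line)

# Unanchored and non-greedy: every quoted field is matched on its own,
# so all quotes are dropped.
per_field = re.sub(r'"(.*?)"', r'\1', line)

print(anchored)
print(per_field)
```

The first result still contains the inner quotes around each pipe; the second contains none.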

I think your best option here is a ConvertRecord processor. Configure a CSVReader that infers the schema and uses | as the value separator, and a CSVRecordSetWriter with Quote Mode set to Do Not Quote Values and the same | separator (or whichever separator you need).
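The reader/writer idea can be illustrated outside NiFi with Python's standard `csv` module. This is only an analogy, not the ConvertRecord processor itself: a reader that understands quoted, pipe-separated fields feeds a writer that is told not to quote anything. The function name `strip_quotes` and the shortened two-column sample are made up for the illustration.

```python
import csv
import io

# Shortened sample based on the question's data (two columns only).
raw = ('"Created_Date__c"|"CreatedById"\n'
       '"2020-03-02 00:00:00"|"0053i000002XCpAAG"\n')

def strip_quotes(text):
    """Parse quoted, pipe-separated text and write it back unquoted."""
    reader = csv.reader(io.StringIO(text), delimiter="|", quotechar='"')
    out = io.StringIO()
    writer = csv.writer(
        out,
        delimiter="|",
        quoting=csv.QUOTE_NONE,  # never wrap fields in quotes
        escapechar="\\",         # needed if a field ever contains "|"
        lineterminator="\n",
    )
    writer.writerows(reader)
    return out.getvalue()

cleaned = strip_quotes(raw)
print(cleaned)
```

Unlike a plain regex replacement, a parser-based approach keeps working when a quoted field contains the separator itself.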

Related

How to define special characters in JMeter parameterization

I am trying to parameterize my request using CSV Data Set Config. My input includes double quotes ("), colons (:) and brackets ([]).
Eg: fiscal_year ":["2021",2019]"
I tried this, but in the actual results it is passed as "fiscal_year "":[""2021""
Please let me know what I am missing in the input parameter.

I don't think it's due to the double quotes ("), colons (:) or brackets ([]). CSV stands for comma-separated values, so JMeter treats the comma as a delimiter and reads everything up to the first comma.
So you might want to change the "Delimiter" to something else.
It's hard to come up with a comprehensive solution without seeing at least a couple of lines from your CSV file and the way you're parameterizing the HTTP Request with JMeter Variables.
If you have one entry per line in the CSV file, it might be easier to go for the __StringFromFile() function, which reads the next line from the file each time it is called. See Apache JMeter Functions - An Introduction for more information.

Jmeter- How to pass Comma separated String as 1 value through parametrization

From a CSV file, I need to pass
224,329,429
as a single value to one of the parameters in an HTTP request.
I have parameterized it using CSV Data Set Config, but only 224 is getting passed.
I want 224,329,429 to be treated as a single value.
Please let me know how I can achieve this. Should I change anything in the CSV config or the CSV file to make this work?

Just use the __StringFromFile() function instead of CSV Data Set Config.
The __StringFromFile() function reads the next line from the file each time it is called, so it is a lot easier to use for your particular scenario.
The syntax is as simple as ${__StringFromFile(/path/to/your/file.csv,,,)} and the function can be used anywhere in the script, e.g. directly in the request parameter section.
See Apache JMeter Functions - An Introduction to get started with the JMeter Functions concept and for comprehensive information on this and other JMeter functions.

You should change your delimiter to a character that does not occur in your data, e.g. #
That way you will get the full line for every request.
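The effect of the delimiter choice can be illustrated with a few lines of Python. This is not JMeter code and does not reproduce how CSV Data Set Config is implemented; `first_variable` is a hypothetical helper that only mimics "take the first column of a line".

```python
def first_variable(line, delimiter):
    """Return the first field of a line split on the given delimiter."""
    return line.split(delimiter)[0]

line = "224,329,429"

# With a comma delimiter, only the part before the first comma is returned.
comma_value = first_variable(line, ",")

# With a delimiter that never occurs in the data, the whole line survives.
hash_value = first_variable(line, "#")

print(comma_value)
print(hash_value)
```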

Use the ${__FileToString(dummy.csv,,payloadvar)} function. It does not depend on the file type, which means you can use any file extension, for example .txt, .csv, .excel, etc.
Just keep the string in dummy.csv and it will fetch the whole string.
The benefit of using this function is that it does not treat commas specially, so if your string has comma-separated values this is the best option.

Just use %2C in place of the comma.

How to clean a csv file where fields contains the csv separator and delimiter

I'm currently struggling to clean automatically generated CSV files whose fields contain the CSV separator and the field delimiter, using sed, awk or a script.
The source software has no settings to play with to improve the situation.
Format of the csv:
"111111";"text";"";"text with ; and " sometimes "; or ;" multiple times";"user";
Fortunately, the csv is "well" formatted, the exporting software just doesn't escape or replace "forbidden" chars from the fields.
In the last few days I tried to improve my knowledge of regular expressions and to find an expression to clean the files, but I failed.
What I managed to do so far:
A regex to find the fields (I wanted to find the fields and perform a replace inside them, but I didn't find a way to do it):
(?:";"|^")(.*?)(?=";"|";\n)
A regex that finds a semicolon. It does not work if the semicolon is the last character of the field, and it only finds one per field:
(?:^"|";")(?:.*?)(;)(?:[^"\n].*?)(?=";"|";\n)
A regex to find the double quotes. In online regex testers it seems to pick the first double quote of the line:
(?:^"|";")(?:.*?)[^;](")(?:[^;].*?)(?=";"|";\n)
I thought of adding a space between each character in the fields, then searching for lone semicolons and double quotes, and removing the extra spaces afterwards, but I don't know if that's even possible and it seems like a poor solution anyway.

Any standard library should be able to handle it if there is no explicit error in the CSV itself. This is why we have quote characters and escape characters.
When you create a CSV by yourself, you may forget to handle such cases and end up with them in your final output file. AWK is not a CSV reader but simply a text processing utility.
This is what your row should look like instead:
"111111";"text";"";"text with \; and \" sometimes \"; or ;\" multiple times";"user";
So if you can still re-fetch the data, find a way to export the CSV either through the database's own functionality or through a CSV library for the language you work with.
In Python, this would look like this:
mywriter = csv.writer(csvfile, delimiter=';', quotechar='"', escapechar="\\")
But if you can't create the CSV again, the only hope is that you can expect some pattern within the fields, as in this question: parse a csv file that contains commans in the fields with awk
But this is rarely true of textual data, especially comments or posts on a webpage. Another idea in such situations would be to use '\t' as the separator.
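As one example of "expecting some pattern within the fields", here is a rough Python sketch. It rests on a strong assumption that the question does not guarantee: the three-character sequence `";"` never occurs inside a field, and every line starts with `"` and ends with `";`. Under that assumption the line can be split on `";"` and rewritten with proper escaping. The helper names `repair_line` and `to_csv_line` are made up for this illustration.

```python
import csv
import io

def repair_line(line):
    """Split a badly escaped line into fields, assuming '";"' only
    ever appears between fields and never inside one."""
    body = line.rstrip("\n")
    if body.startswith('"'):
        body = body[1:]      # drop the opening quote of the first field
    if body.endswith('";'):
        body = body[:-2]     # drop the closing '";' of the last field
    return body.split('";"')

def to_csv_line(fields):
    """Write the fields back as one properly quoted CSV line."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter=";", quotechar='"',
                        quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerow(fields)  # embedded quotes are doubled by default
    return out.getvalue()

# Sample row copied from the question.
sample = ('"111111";"text";"";"text with ; and " sometimes "; '
          'or ;" multiple times";"user";')
fields = repair_line(sample)
repaired = to_csv_line(fields)
print(fields)
print(repaired)
```

If a field can contain `";"` itself, the line is genuinely ambiguous and this approach will split it in the wrong place.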

return line of strings between two strings in a ruby variable

I would like to extract a line of text but am having difficulty finding the correct regex. Any help would be appreciated.
String to extract: KSEA 122053Z 21008KT 10SM FEW020 SCT250 17/08 A3044 RMK AO2 SLP313 T01720083 50005
For some reason StackOverflow won't let me paste the XML data here since it includes "<>" characters. Basically, I am trying to extract the data between "raw_text" ... "/raw_text" from an XML document that will always be formatted like the following: http://www.aviationweather.gov/adds/dataserver_current/httpparam?dataSource=metars&requestType=retrieve&format=xml&hoursBeforeNow=3&mostRecent=true&stationString=PHNL%20KSEA
However, the station name, in this case "KSEA", will not always be the same. It will change based on user input into a search variable.
Thanks in advance.

If I can assume that every string you want starts with KSEA, then the answer would be:
.*(KSEA.*?)KSEA.*
Using ? makes .* match as little as possible.
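Since the station name changes, an alternative is to key on the raw_text tags the question mentions rather than on "KSEA". The sketch below is in Python rather than Ruby, and the XML snippet is a hypothetical minimal stand-in for the real response (only the raw_text element is taken from the question; the surrounding element names are assumed).

```python
import re

# Hypothetical, shortened stand-in for the XML response.
xml = (
    "<METAR>"
    "<raw_text>KSEA 122053Z 21008KT 10SM FEW020 SCT250 17/08 A3044 "
    "RMK AO2 SLP313 T01720083 50005</raw_text>"
    "<station_id>KSEA</station_id>"
    "</METAR>"
)

# Non-greedy capture of everything between the opening and closing tag.
# re.DOTALL lets "." match newlines in case the text spans lines.
raw_texts = re.findall(r"<raw_text>(.*?)</raw_text>", xml, flags=re.DOTALL)

print(raw_texts)
```

Because the pattern does not mention the station, it returns one entry per raw_text element regardless of which stations were requested. For anything beyond a quick extraction, a real XML parser is usually the safer choice.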

Overcoming a basic problem with CSV parsing using the FasterCSV gem

I have found a CSV parsing issue with FasterCSV (1.5.0) which seems like a genuine bug, but which I'm hoping there's a workaround for.
Basically, adding a space after the separator (in my case a comma) when the fields are enclosed in quotes generates a MalformedCSVError.
Here's a simple example:
# No quotes on fields -- works fine
FasterCSV.parse_line("one,two,three")
=> ["one", "two", "three"]
# Quotes around fields with no spaces after separators -- works fine
FasterCSV.parse_line("\"one\",\"two\",\"three\"")
=> ["one", "two", "three"]
# Quotes around fields but with a space after the first separator -- fails!
FasterCSV.parse_line("\"one\", \"two\",\"three\"")
=> FasterCSV::MalformedCSVError: Illegal quoting on line 1.
Am I going mad, or is this a bug in FasterCSV?

The MalformedCSVError is correct here.
Leading and trailing spaces in CSV are not ignored; they are considered part of the field. This means you have started a field with a space and then included unescaped double quotes in that field, which causes the illegal quoting error.
Maybe this library is just more strict than others you have used.

Maybe you could set the :col_sep option to ', ' to make it parse files like that.

I had hoped that the :col_sep option might allow a regular expression, but it seems to be used for both reading and writing, which is a shame. The documentation doesn't hold out much hope and your need is probably more immediate than could be satisfied by requesting a change or submitting a patch ;-)
If you're calling #parse_line explicitly, then you could always call
gsub(/,\s*/, ',')
on your input line. That regular expression might need to change significantly if you anticipate the possibility of comma-space within quoted strings. (I'd suggest reposting such a question here with a suitable tag and let the RegEx mavens loose on it should that be the case).
