NiFi ReplaceText Behaving Differently - apache-nifi

I have a CSV file that I've introduced into my pipeline (for testing purposes) in two ways. First, I'm using GetFile to read it in from the server file system. Second, I'm using GenerateFlowFile. The content of these files is identical; I copied and pasted the content from the GetFile output to insert as text into GenerateFlowFile. Yet, when I run these through a ReplaceText processor, I am seeing different results.
The file from GenerateFlowFile works as expected: the regex in ReplaceText is found and replaced with an empty string, exactly as I want. The file from GetFile, however, comes out of ReplaceText unchanged. How is this possible, and how can I fix it?
I tried to create a reproducible example, but I'm only seeing the issue with my data and can't replicate it with non-PII data. If it makes a difference, the regex used in ReplaceText is ^.*"\(Line.*,\n and the replacement value is an empty string set. Essentially, I want to drop the extraneous first line.
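One thing that may be worth checking, though nothing in the question confirms it, is whether the file on disk uses Windows line endings (\r\n), since copying and pasting the content into GenerateFlowFile can silently normalize them to \n. The Python sketch below uses made-up sample rows (not the real data) and Python's re module rather than NiFi's Java regex engine. It shows that a pattern ending in ,\n stops matching as soon as a \r sits between the comma and the newline. The TOLERANT pattern is an assumption for illustration, not something taken from the question.

```python
import re

# The pattern from the question: match the first line, which ends in a comma.
PATTERN = re.compile(r'^.*"\(Line.*,\n')

# Hypothetical sample content; the real (PII) data is not available.
header = '"(Line 1 of 3)",'
body = "id,name\n1,aaa\n"

lf_content = header + "\n" + body      # Unix line ending after the first line
crlf_content = header + "\r\n" + body  # Windows line ending after the first line


def strip_first_line(text: str) -> str:
    """Remove the first match, mimicking a replacement with an empty value."""
    return PATTERN.sub("", text, count=1)


lf_result = strip_first_line(lf_content)
crlf_result = strip_first_line(crlf_content)

print(repr(lf_result))    # first line removed
print(repr(crlf_result))  # unchanged: the comma is followed by \r, not \n

# A more tolerant variant (an assumption) that accepts an optional \r.
TOLERANT = re.compile(r'^.*"\(Line.*,\r?\n')
tolerant_result = TOLERANT.sub("", crlf_content, count=1)
print(repr(tolerant_result))
```

If the two flow files really do differ in this way, comparing their raw bytes (for example with a hex viewer) should make the difference visible.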

Related

Write header line for empty CSVs using Apache NiFi's "CsvRecordSetWriter" Controller and "ConvertRecord" Processor

I'm using NiFi 1.11.4 to read CSV files from an SFTP server, do a few transformations, and then drop them off on GCS. Some of the files contain no content, only a header line. During my transformations I convert the files to the Avro format, but when converting back to CSV, no output file is produced for the files whose content is empty.
I have the following settings for the Processor:
And for the Controller:
I did find the following topic: How to use ConvertRecord and CSVRecordSetWriter to output header (with no data) in Apache NiFi? The comments there state explicitly that ConvertRecord should cover this since 1.8. Sadly, either I misunderstood it, it does not work, or my setup is wrong.
While I could make it work by explicitly writing the schema as a line to the empty files, I wanted to know whether there is a more elegant way.

How to strip the contents from a NiFi flow file?

I'm using a ReplaceText processor and "replacing" all matching characters with nothing. This effectively removes the flow file contents, but it seems inefficient.
I'm done processing with the flow file contents and will just be writing metrics from the attributes that are remaining.
Is there a better way to get rid of the flow file contents?
If you do a Literal Replace with an Evaluation Mode of Entire Text it should be very fast as there is no regex matching or overwriting or anything else, just writing nothing to the outgoing Flow File.

How do I find formatting settings for CSV on Mac?

I have a Python program that extracts data from an API, applies transformations, and converts it to a CSV to be used in Tableau. When I view the file in Excel and Google Sheets, it looks fine: no data formatting or read errors, as it is encoded in standard UTF-8.
When I read it in Tableau, it is a different story. You will notice how the columns lose shape and get parsed incorrectly.
I am thinking it has to do with the fact that my data set is text heavy and contains punctuation, but I have been able to work with data in this format just fine without having to do any custom formatting.
It looks like your CSV has multiline fields (which are quoted).
You'll somehow have to tell the Tableau reader/parser to read your data as quoted (and multiline).
Also check how quotes inside a field are escaped. Usually this is done by doubling the quote, but it could also be done with a backslash.
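As a rough illustration of what quoted multiline fields look like, the sketch below uses Python's standard csv module to write a text-heavy row with every field quoted and then reads it back. The sample rows and column names are made up, and whether Tableau accepts this exact quoting is not something this thread confirms.

```python
import csv
import io

# Hypothetical text-heavy rows: one field holds a comma, quotes, and a newline.
rows = [
    ["id", "comment"],
    ["1", 'She said "hi", then left.\nSecond line of the same field.'],
]

# Write with every field quoted; embedded quotes are escaped by doubling them.
buffer = io.StringIO(newline="")
writer = csv.writer(buffer, quoting=csv.QUOTE_ALL, doublequote=True)
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)

# A quote-aware parser keeps the multiline field together in a single cell.
parsed = list(csv.reader(io.StringIO(csv_text, newline="")))
print(len(parsed))  # number of logical rows, not physical lines
```

Opening the generated file in a plain text editor shows the physical line break inside the quoted field, which is the kind of structure a parser that splits on newlines first would misread.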

Getting inverted commas appended in request while reading from CSV in JMeter?

I am trying to read a CSV file using the CSV Data Set Config element in JMeter in order to test multiple logins, but the values I read from the CSV file arrive with inverted commas (quotation marks) appended. Please tell me how to stop these quotation marks from being passed in the request parameters.
Please find my CSV Data Set Config, Excel file, and request parameter screenshots in the attachments.
JMeter normally doesn't add anything to the variables. Most probably the quotation marks are in the generated CSV file; open it with a plain text editor like Notepad and use the find-and-replace feature to remove them.
If you cannot efficiently control the CSV data, you can use the __strReplace() function to remove the quotation marks on the fly from the variables originating from the CSV Data Set Config, like:
${__strReplace(${Username},\",,)}
You can install the __strReplace() function, as well as other Custom JMeter Functions, using the JMeter Plugins Manager.
I had the same issue. When I opened the CSV file as a plain text file, I saw a Values, value2.
After removing it, it started working as expected.

Convert a CSV file to JSON using Apache NiFi

I am trying to read a CSV from the local file system, convert its content into JSON format using Apache NiFi, and put the resulting JSON file back on the local file system. I have succeeded in converting the first row of the CSV file, but not the other rows. What am I missing?
Input:
1,aaa,loc1
2,bbb,loc2
3,ccc,loc3
My NiFi workflow is here:
http://www.filedropper.com/mycsvtojson
My output is below. It is in the desired format, but I want the same to happen for all the rows.
{ "id" : "1", "name" : "aaa",
"location" : "loc1" }
There are a few different ways this could be done...
A custom Java processor that reads in a CSV and converts to JSON
Using the ExecuteScript processor to do something similar in a Groovy/Jython script
Use SplitText to split your original CSV into single lines, then use your current approach with ExtractText and ReplaceText, and then a MergeContent to merge back together
Use ConvertCSVToAvro and then ConvertAvroToJSON
Although the last option makes an extra conversion to Avro, it might be the easiest solution requiring almost no work.
This question is a bit older, but there is now a ConvertRecord processor in NiFi 1.3 and newer, which should be able to handle this conversion directly for you. It also avoids having to split up the data, since it creates a single JSON array with all of the values, if that is desirable.
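To make the target shape concrete, here is a small Python sketch (not a NiFi configuration) that turns the sample input into a single JSON array with one object per row. The field names id, name, and location are taken from the desired output in the question; the function name csv_to_json is hypothetical.

```python
import csv
import io
import json

# Column names taken from the desired output shown in the question.
FIELD_NAMES = ["id", "name", "location"]


def csv_to_json(csv_text: str) -> str:
    """Convert headerless CSV text into a JSON array of objects."""
    reader = csv.DictReader(io.StringIO(csv_text), fieldnames=FIELD_NAMES)
    records = [dict(row) for row in reader]
    return json.dumps(records, indent=2)


sample = "1,aaa,loc1\n2,bbb,loc2\n3,ccc,loc3\n"
result = csv_to_json(sample)
print(result)
```

All values stay strings here, matching the quoted "1" in the question's output. A record-based approach in NiFi would rely on a schema to decide the types instead.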
