Convert a CSV file to JSON using Apache NiFi

I am trying to read a CSV from the local file system, convert the content to JSON using Apache NiFi, and put the resulting JSON file back on the local file system. I have succeeded in converting the first row of the CSV file, but not the other rows. What am I missing?
Input:
1,aaa,loc1
2,bbb,loc2
3,ccc,loc3
My NiFi workflow is here:
http://www.filedropper.com/mycsvtojson
My output is as below, which is the desired format, but I want this to happen for all the rows:
{ "id" : "1", "name" : "aaa", "location" : "loc1" }

There are a few different ways this could be done...
A custom Java processor that reads in a CSV and converts to JSON
Using the ExecuteScript processor to do something similar in a Groovy/Jython script (see the sketch after this list)
Use SplitText to split your original CSV into single lines, then use your current approach with ExtractText and ReplaceText, and then MergeContent to merge the results back together
Use ConvertCSVToAvro and then ConvertAvroToJSON
Although the last option makes an extra conversion through Avro, it might be the easiest solution, requiring almost no work.
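To illustrate the ExecuteScript option, here is a minimal, untested Jython sketch. The column names (id, name, location) and the single-JSON-array output are assumptions based on the sample input; NiFi binds the session and REL_SUCCESS variables for the script.

    import json
    from org.apache.commons.io import IOUtils
    from java.nio.charset import StandardCharsets
    from org.apache.nifi.processor.io import StreamCallback

    # Rewrites the flow file content: one JSON object per CSV row,
    # collected into a single JSON array.
    class CsvToJson(StreamCallback):
        def process(self, inputStream, outputStream):
            text = IOUtils.toString(inputStream, StandardCharsets.UTF_8)
            rows = []
            for line in text.splitlines():
                if not line.strip():
                    continue
                id_, name, location = line.split(',')  # assumed column order
                rows.append({'id': id_, 'name': name, 'location': location})
            outputStream.write(bytearray(json.dumps(rows).encode('utf-8')))

    flowFile = session.get()
    if flowFile is not None:
        flowFile = session.write(flowFile, CsvToJson())
        session.transfer(flowFile, REL_SUCCESS)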

This question is a bit older, but there is now a ConvertRecord processor in NiFi 1.3 and newer, which should be able to handle this conversion directly for you; it also avoids having to split up the data, by creating a single JSON array with all of the values, if that is desirable.
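A hypothetical configuration for that approach (property names taken from recent NiFi releases; the Avro schema here just mirrors the sample columns):

    ConvertRecord
      Record Reader : CSVReader
        Treat First Line as Header : false
        Schema Access Strategy     : Use 'Schema Text' Property
        Schema Text                : {"type":"record","name":"row","fields":[
                                       {"name":"id","type":"string"},
                                       {"name":"name","type":"string"},
                                       {"name":"location","type":"string"}]}
      Record Writer : JsonRecordSetWriter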

Related

Write header line for empty CSVs using Apache NiFi's "CSVRecordSetWriter" controller service and "ConvertRecord" processor

I'm using NiFi 1.11.4 to read CSV files from an SFTP server, do a few transformations, and then drop them off on GCS. Some of the files contain no content, only a header line. During my transformations I convert the files to the Avro format, but when converting back to CSV, no file output is produced for the files whose content is empty.
I have the following settings for the processor and the controller service (shown as screenshots in the original post):
I did find the following topic: How to use ConvertRecord and CSVRecordSetWriter to output header (with no data) in Apache NiFi? but the comments there state explicitly that ConvertRecord has covered this since 1.8. Sadly, either I understood that incorrectly, it does not work as described, or my setup is wrong.
While I could make it work by explicitly writing the schema as a line to the empty files, I wanted to know if there is a more elegant way.
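For what it's worth, one untested sketch of that workaround, relying on the record.count attribute that ConvertRecord writes (the header value col1,col2,col3 is hypothetical; substitute your real header):

    ConvertRecord -> RouteOnAttribute
      dynamic property "empty" : ${record.count:equals(0)}
      empty     -> ReplaceText (Replacement Strategy : Always Replace,
                                Replacement Value    : col1,col2,col3)
      unmatched -> normal path to GCS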

NiFi ReplaceText Behaving Differently

I have a CSV file that I've introduced into my pipeline (for testing purposes) in two ways. First, I'm using GetFile to read it in from the server file system. Second, I'm using GenerateFlowFile. The content of these files is identical; I copied and pasted the content from the GetFile output to insert as text into GenerateFlowFile. Yet, when I run these through a ReplaceText processor, I am seeing different results.
The file from GenerateFlowFile is working as expected: the regex string in ReplaceText is being found and replaced with an empty string, exactly as I want. But the file from GetFile comes through ReplaceText unchanged. How is this possible, and how can I fix it?
I tried to create a reproducible example, but I'm only seeing the issue with my data and can't replicate it with non-PII data. If it makes a difference, the regex used in ReplaceText is ^.*"\(Line.*,\n and the replacement value is an empty string. Essentially, I want to drop the extraneous first line.
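No answer is recorded here, but one plausible cause worth ruling out: the regex anchors on a bare \n, and a file read from disk by GetFile keeps whatever line endings it had (often \r\n on Windows-produced files), while text pasted into GenerateFlowFile typically has \n only. A quick Python check of that hypothesis:

    import re

    # The regex from the question, which requires "\n" directly after the comma.
    pattern = re.compile(r'^.*"\(Line.*,\n')

    unix_content = 'junk "(Line 1,\nreal,data\n'         # LF endings
    windows_content = 'junk "(Line 1,\r\nreal,data\r\n'  # CRLF endings

    print(bool(pattern.search(unix_content)))     # True
    print(bool(pattern.search(windows_content)))  # False: "\r" sits before "\n"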

MergeRecord - control name of output

I have a fairly simple process that merges XML files into one or more XML files using the MergeRecord processor. I'm then converting them to JSON and writing them out with PutFile. The files come out with fabulous names like 79f000ec-9da1-4b59-a0a8-79cc3bb5e85a.
Is there any way to control those file names, or at least give them an appropriate extension?
Before your PutFile, use an UpdateAttribute processor and rename the flow file by setting its filename attribute, e.g. ${filename}.
Example:
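A hypothetical UpdateAttribute configuration in that spirit, giving the files a readable name and a .json extension:

    UpdateAttribute (added property)
      filename : merged-${now():format('yyyyMMddHHmmss')}.json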

ESQL code for CSV to XML conversion in IIB v10

I want to convert an input CSV file to an XML file using ESQL in IIB v10. Can you please help me with the ESQL code to achieve this? I've provided the input CSV file sample and output XML file sample below:
[Input CSV and output XML samples were attached as images.]
Your question is fundamentally misguided. Using only ESQL for this on Integration Bus is like using a knife to cut down a tree when you have a chainsaw at hand. If you want to convert a CSV file to XML, the proper solution is the following:
1) Define a new DFDL schema to parse the CSV file
2) Define your XSD for the output XML
3) Use the DFDL parser when you read the CSV, and use the structure you created (on the FileInput node, for example; I don't know your exact case)
4) Use a Mapping node to map from your DFDL structure to your XML structure (defined in the XSD)
Note: the last step can also be done with alternative solutions, like Compute nodes (ESQL, Java, C#, PHP).
If you have any additional questions, feel free to contact me

Nifi Merge Json Files Then Turn into JsonArray

Is there a processor / flow I am not considering for converting JSON messages merged together (using MergeContent) into a JSON array? I want to build JSON arrays from multiple files and then pass them to QueryRecord to run SQL. There's a good chance I am missing an out-of-the-box / obvious way to do that. Any help would be greatly appreciated.
I could use a Groovy ExecuteScript, but I wanted to avoid custom code if possible. Thanks!
The flow I have in mind: Messages In -> MergeContent -> ConvertToJsonArray -> QueryRecord.
You can use MergeContent and set the Delimiter Strategy to "Text" and then enter [ , ] for the header, demarcator, and footer respectively.
That will insert the header at the beginning of the flow file, the demarcator between every flow file, and the footer at the end.
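For illustration, a hypothetical MergeContent configuration along those lines, and its effect on three single-object flow files:

    Merge Strategy     : Bin-Packing Algorithm
    Delimiter Strategy : Text
    Header             : [
    Demarcator         : ,
    Footer             : ]

    Input flow files : {"id":1}  {"id":2}  {"id":3}
    Merged output    : [{"id":1},{"id":2},{"id":3}]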
Alternatively, since it looks like you are using the record processors, the latest release should have a MergeRecord processor which can handle this for you if you use a JsonTreeReader and JsonRecordSetWriter.
