Write header line for empty CSVs using Apache NiFi's "CSVRecordSetWriter" Controller and "ConvertRecord" Processor - apache-nifi

I'm using NiFi 1.11.4 to read CSV files from an SFTP server, do a few transformations and then drop them off on GCS. Some of the files contain no content, only a header line. During my transformations I convert the files to the Avro format, but when converting back to CSV, no output file is produced for the files where the content is empty.
I have the following settings for the Processor:
And for the Controller:
I did find the following topic: How to use ConvertRecord and CSVRecordSetWriter to output header (with no data) in Apache NiFi? but the comments there mention explicitly that ConvertRecord should cover this since 1.8. Sadly, either I understood that incorrectly, it does not work, or my setup is wrong.
While I could make it work by explicitly writing the schema as a line to empty files, I wanted to know if there is a more elegant way.
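For reference, the workaround mentioned above (writing the schema's field names as a header line when a file has no records) can be sketched outside NiFi roughly as follows. This is a minimal Python sketch under stated assumptions, not a NiFi configuration: the function name `ensure_header` and the idea of passing the field names in as a list are hypothetical and only illustrate the logic.

```python
import csv
import io


def ensure_header(content, field_names):
    """Return content unchanged if it has any text; otherwise return only a header line.

    field_names is assumed to be the list of column names taken from the schema.
    """
    if content.strip():
        return content
    buffer = io.StringIO()
    # Use the csv module so that names containing commas or quotes are escaped.
    csv.writer(buffer, lineterminator="\n").writerow(field_names)
    return buffer.getvalue()
```

Inside NiFi the same logic would have to live in a scripted or custom component, which is exactly the inelegance the question is trying to avoid.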

Related

Parsing a JSON file without JSON.parse()

This is my first time using Ruby. I'm writing an application that parses data and performs some calculations based on it, the source of which is a JSON file. I'm aware I can use JSON.parse() here but I'm trying to write my program so that it will work with other sources of data. Is there a clear cut way of doing this? Thank you.
When your source file is JSON then use JSON.parse. Do not implement a JSON parser on your own. If the source file is a CSV, then use the CSV class.
When your application should be able to read multiple different formats then just add one Reader class for each data type, like JSONReader, CSVReader, etc. And then decide depending on the file extension which reader to use to read the file.
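The reader-per-format idea can be sketched as below. The sketch is in Python purely for illustration (the question is about Ruby, where `JSON.parse` and the `CSV` class would play the same roles); the names `read_json`, `read_csv`, `READERS` and `load_records` are hypothetical.

```python
import csv
import json
from pathlib import Path


def read_json(path):
    """Parse a JSON file with the standard library parser."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def read_csv(path):
    """Parse a CSV file with a header row into a list of dicts."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


# One reader per supported file extension.
READERS = {".json": read_json, ".csv": read_csv}


def load_records(path):
    """Pick a reader based on the file extension and return the parsed data."""
    suffix = Path(path).suffix.lower()
    try:
        reader = READERS[suffix]
    except KeyError:
        raise ValueError(f"unsupported file type: {suffix!r}") from None
    return reader(path)
```

The rest of the application then only calls `load_records` and never needs to know which format the data came from; supporting a new format means adding one reader and one dictionary entry.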

NiFi ReplaceText Behaving Differently

I have a CSV file that I've introduced into my pipeline (for testing purposes) in two ways. First, I'm using GetFile to read it in from the server file system. Second, I'm using GenerateFlowFile. The content of these files is identical; I copied and pasted the content from the GetFile output to insert as text into GenerateFlowFile. Yet, when I run these through a ReplaceText processor, I am seeing different results.
The file from GenerateFlowFile is working as expected, and the regex string in ReplaceText is being found and replaced with an empty string exactly as I want. But, the file from GetFile is returning a file with no change after running through ReplaceText. How is this possible, and how can I fix this?
I tried to create a reproducible example, but I'm only seeing the issue with my data and can't replicate it with non-PII data. If it makes a difference, the regex used in ReplaceText is ^.*"\(Line.*,\n and the replacement value is an empty string set. Essentially, I want to drop the extraneous first line.
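One thing that may be worth checking, although nothing above confirms it is the cause, is whether the file on disk uses Windows line endings (\r\n) while the pasted text uses plain \n. The regex requires a comma immediately followed by \n, so a \r in between prevents a match. A minimal Python sketch of that effect, using made-up sample data and a hypothetical helper name `drop_first_line`:

```python
import re

# The pattern from the question: a first line containing "(Line and ending in ",\n".
PATTERN = re.compile(r'^.*"\(Line.*,\n')


def drop_first_line(text):
    """Remove the leading line if it matches the pattern; otherwise return text as-is."""
    return PATTERN.sub("", text, count=1)


# Hypothetical sample data, identical except for the line endings.
lf_text = '"Report","(Line items)",\n1,aaa\n'
crlf_text = '"Report","(Line items)",\r\n1,aaa\r\n'

print(repr(drop_first_line(lf_text)))    # first line removed
print(repr(drop_first_line(crlf_text)))  # unchanged: "," is followed by "\r", not "\n"
```

If that turns out to be the difference, matching `\r?\n` instead of `\n` would cover both cases.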

MergeRecord - control name of output

I have a fairly simple process that merges XML files into one or more XML files, using the MergeRecord processor. I'm then converting them into JSON and writing them out with PutFile. The files come out with fabulous names like 79f000ec-9da1-4b59-a0a8-79cc3bb5e85a.
Is there any way to control those file names, or at least give them an appropriate extension?
Before your PutFile, use an UpdateAttribute processor to rename the file by setting the filename attribute (${filename}).
Example:

NiFi: Routing on File Types, e.g. csv, tsv, xlsx

I have a connected SFTP server, and I am trying to route files based on type: .csv, .tsv, and .xlsx. For now, I'm just uploading test files through the command line.
My flow is:
GetSFTP (with correct hostname, etc.) ->
RouteOnAttribute ->
LogAttribute (will dump elsewhere soon, this is just for testing)
My problem, I think, is that I created a property in RouteOnAttribute incorrectly:
Am I correct in assuming that this does not actually pick up on the .csv because it is not technically part of the filename? What would be the correct expression to route on the file type? Thanks!
You need some information that will tell you the type of file.
GetSFTP should be getting the filename from the file on the sftp server, so if those have the appropriate extensions then I would expect your RouteOnAttribute to work correctly.
If the filename does not have the appropriate extension, then the only thing you can do is try to use IdentifyMimeType to determine what type of file it is, and then route on the mime.type attribute.

Convert a CSV file to JSON using Apache NiFi

I am trying to read a CSV from the local file system, convert the content into JSON format using Apache NiFi, and put the JSON file back on the local file system. I have succeeded in converting the first row of the CSV file but not the other rows. What am I missing?
Input:
1,aaa,loc1
2,bbb,loc2
3,ccc,loc3
and my NiFi workflow is here:
http://www.filedropper.com/mycsvtojson
My output is below. It is in the desired format, but I want that to happen for all the rows.
{ "id" : "1", "name" : "aaa",
"location" : "loc1" }
There are a few different ways this could be done...
A custom Java processor that reads in a CSV and converts to JSON
Using the ExecuteScript processor to do something similar in a Groovy/Jython script
Use SplitText to split your original CSV into single lines, then use your current approach with ExtractText and ReplaceText, and then a MergeContent to merge back together
Use ConvertCsvToAvro and then ConvertAvroToJson
Although the last option makes an extra conversion to Avro, it might be the easiest solution requiring almost no work.
This question is a bit older, but there is now a ConvertRecord processor in NiFi 1.3 and newer, which should be able to handle this conversion directly for you. It also avoids having to split up the data, since it creates a single JSON array with all of the values, if that is desirable.
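To make the target result concrete, here is a minimal Python sketch of the conversion itself: every CSV row becomes one JSON object, and all rows end up in a single JSON array. It is not a NiFi flow. The field names are taken from the desired output in the question, and the function name `csv_to_json` is hypothetical.

```python
import csv
import io
import json

# Column names from the desired output; the input has no header row.
FIELD_NAMES = ["id", "name", "location"]


def csv_to_json(csv_text):
    """Convert header-less CSV text into a JSON array with one object per row."""
    reader = csv.DictReader(io.StringIO(csv_text), fieldnames=FIELD_NAMES)
    return json.dumps(list(reader), indent=2)


print(csv_to_json("1,aaa,loc1\n2,bbb,loc2\n3,ccc,loc3\n"))
```

All values stay strings, as in the sample output from the question.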
