Nifi Merge Json Files Then Turn into JsonArray - apache-nifi

Is there a processor or flow I'm not considering for converting JSON messages merged together (using MergeContent) into a JSON array? I want to build JSON arrays from multiple files and then pass them to QueryRecord to run SQL. There's a good chance I'm missing an out-of-the-box/obvious way to do that. Any help would be greatly appreciated.
I could use a Groovy ExecuteScript, but I wanted to avoid custom code if possible. Thanks!
Messages In -> MergeContent -> ConvertToJsonArray -> QueryRecord.

You can use MergeContent, set the Delimiter Strategy to "Text", and then enter "[", ",", and "]" for the Header, Demarcator, and Footer properties respectively.
That will insert the header at the beginning of the flow file, the demarcator between every flow file, and the footer at the end.
Alternatively, since it looks like you are using the record-based processors, the latest release has a MergeRecord processor which handles this for you when used with a JsonTreeReader and JsonRecordSetWriter.
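The header/demarcator/footer approach can be sketched in plain Python (a simulation of what MergeContent produces, not NiFi code):

```python
import json

def merge_as_json_array(flowfiles, header="[", demarcator=",", footer="]"):
    # Mimic MergeContent with Delimiter Strategy "Text": the header goes at
    # the start, the demarcator between each flow file's content, and the
    # footer at the end.
    return header + demarcator.join(flowfiles) + footer

messages = ['{"id": 1}', '{"id": 2}', '{"id": 3}']
merged = merge_as_json_array(messages)
print(merged)  # [{"id": 1},{"id": 2},{"id": 3}]
print(json.loads(merged))  # parses as a valid JSON array for QueryRecord
```

As long as each incoming flow file is itself valid JSON, the merged result is a valid JSON array.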

Related

Write header line for empty CSVs using Apache NiFi's "CsvRecordSetWriter" Controller and "ConvertRecord" Processor

I'm using NiFi 1.11.4 to read CSV files from an SFTP, do a few transformations and then drop them off on GCS. Some of the files contain no content, only a header line. During my transformations I convert the files to the AVRO format, but when converting back to CSV no file output is produced for the files where the content is empty.
I have the following settings for the Processor:
And for the Controller:
I did find the following topic: How to use ConvertRecord and CSVRecordSetWriter to output header (with no data) in Apache NiFi? but the comments there mention explicitly that ConvertRecord should cover this since 1.8. So either I understood it incorrectly, it does not work, or my setup is wrong.
While I could make it work by explicitly writing the schema as a line to the empty files, I wanted to know if there is a more elegant way?

How to store a part of log into a file using LogStash

I'm processing a log file with the Logstash aggregate filter and a grok filter with multiple patterns.
While processing the logs, I want to extract a part of each log line with a regex and store it in a file.
For example, let's say my log is :
id:0422 time:[2013-11-19 02:34:58] level:INFO text:(Lorem Ipsum is simply dummy text of the printing and typesetting industry)
In this log the text will be different every time.
I have a regex with which I can match a part of that text.
If the regex matches something in that text while Logstash is indexing into Elasticsearch, I want to store the match in a file or something similar.
Is it possible to achieve this?
There are different solutions for this:
create a filter using Ruby code that is triggered to write in a specific format once you have all the event data together
create a separate output that is triggered by an if statement and writes to a file; this is the preferred way of working, as it makes clear that it is an output.
Depending on whether you want to send all the data, or have it look different in each destination, you might need the clone filter to clone the event into two different events that can be manipulated independently of each other using tags.
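A minimal pipeline sketch of the second option (the field names, extraction regex, and file path below are assumptions for illustration, not taken from the question):

```conf
filter {
  grok {
    match => { "message" => "id:%{NUMBER:id} time:\[%{TIMESTAMP_ISO8601:ts}\] level:%{LOGLEVEL:level} text:\(%{GREEDYDATA:text}\)" }
  }
  # Clone the event when the extraction regex matches, so one copy can go
  # to a file while the original still goes to Elasticsearch.
  if [text] =~ /your-extraction-regex/ {
    clone { clones => ["to_file"] }
  }
}

output {
  # The clone filter sets the clone's type to the name given in "clones".
  if [type] == "to_file" {
    file { path => "/tmp/extracted.log" }        # assumed path
  } else {
    elasticsearch { hosts => ["localhost:9200"] }
  }
}
```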

NiFi ReplaceText Behaving Differently

I have a CSV file that I've introduced into my pipeline (for testing purposes) in two ways. First, I'm using GetFile to read it in from the server file system. Second, I'm using GenerateFlowFile. The content of these files is identical; I copied and pasted the content from the GetFile output to insert as text into GenerateFlowFile. Yet, when I run these through a ReplaceText processor, I am seeing different results.
The file from GenerateFlowFile is working as expected, and the regex string in ReplaceText is being found and replaced with an empty string exactly as I want. But, the file from GetFile is returning a file with no change after running through ReplaceText. How is this possible, and how can I fix this?
I tried to create a reproducible example, but I'm only seeing the issue with my data and can't replicate it with non-PII data. If it makes a difference, the regex used in ReplaceText is ^.*"\(Line.*,\n and the replacement value is an empty string. Essentially, I want to drop the extraneous first line.
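The question is unanswered here, but one common cause of exactly this symptom is a line-ending mismatch (a hypothesis, not confirmed by the question): files read from disk often end lines with \r\n, while text pasted into GenerateFlowFile ends with \n, so the trailing \n in the regex never matches the file-sourced content. A small Python demonstration of that hypothesis:

```python
import re

# The regex from the question: drop everything up to and including the
# first line, which ends with ",\n".
pattern = re.compile(r'^.*"\(Line.*,\n', re.MULTILINE)

lf_text = 'junk "(Line 1) extra,\nid,name\n1,aaa\n'          # pasted-text style
crlf_text = 'junk "(Line 1) extra,\r\nid,name\r\n1,aaa\r\n'  # file-on-disk style

print(pattern.sub('', lf_text))    # first line removed, as expected
print(pattern.sub('', crlf_text))  # unchanged: "\r\n" never matches ",\n"
```

If this is the cause, making the line ending optional in the regex, e.g. ,\r?\n, or normalizing line endings first, makes both inputs behave the same.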

MergeRecord - control name of output

I have a fairly simple process that merges xml files into one or more XML files, using the MergeRecord processor. I'm then converting them into JSON and writing them out with PutFile. The files come out with fabulous names like 79f000ec-9da1-4b59-a0a8-79cc3bb5e85a.
Is there any way to control those file names, or at least give them an appropriate extension?
Before your PutFile, use an UpdateAttribute processor to rename the file by setting the filename attribute.
Example: filename = ${filename}.json

Convert a CSV file to JSON using Apache NiFi

I am trying to read a csv from local file system and convert the content into JSON format using Apache Nifi and put the JSON format file in the local system. I have succeeded in converting the first row of csv file but not other rows. What am I missing?
Input:
1,aaa,loc1
2,bbb,loc2
3,ccc,loc3
and my nifi workflow is as here:
http://www.filedropper.com/mycsvtojson
My output is as below which is desired format but I want that to happen for all the rows.
{ "id" : "1", "name" : "aaa", "location" : "loc1" }
There are a few different ways this could be done...
A custom Java processor that reads in a CSV and converts to JSON
Using the ExecuteScript processor to do something similar in a Groovy/Jython script
Use SplitText to split your original CSV into single lines, then use your current approach with ExtractText and ReplaceText, and then a MergeContent to merge back together
Use ConvertCsvToAvro and then ConvertAvroToJson
Although the last option makes an extra conversion to Avro, it might be the easiest solution requiring almost no work.
This question is a bit older, but there is now a ConvertRecord processor in NiFi 1.3 and newer, which can handle this conversion directly for you, and it avoids having to split up the data by creating a single JSON array with all of the values, if that is desirable.
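The split/transform/merge option above can be sketched in plain Python to show the intended end result (the field names come from the sample output; this is a simulation of the flow, not NiFi code):

```python
import csv
import io
import json

def csv_rows_to_json_array(csv_text, fieldnames):
    # Split the CSV into rows (like SplitText), turn each row into a JSON
    # object (like ExtractText + ReplaceText), then merge them back into a
    # single JSON array (like MergeContent).
    reader = csv.DictReader(io.StringIO(csv_text), fieldnames=fieldnames)
    return json.dumps(list(reader), indent=2)

csv_text = "1,aaa,loc1\n2,bbb,loc2\n3,ccc,loc3\n"
print(csv_rows_to_json_array(csv_text, ["id", "name", "location"]))
```

Every input row ends up as one object in the array, which is the "all the rows" behavior the question asks for.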
