Issue generating JSON file from AttributesToJSON in NiFi? - apache-nifi

I have a scenario where a list of files comes from the previous processor, and for each file I have to create a JSON file from the attributes of the flowfile. The AttributesToJSON processor configuration has an option to extract flowfile attributes and create a JSON file/object; if we set Include Core Attributes to true, it reads some of the file properties and forms the JSON file.
The output for the above case in my scenario is …
{"fragment.size":"125",
"file.group":"root",
"file.lastModifiedTime":"2020-12-22T15:09:13+0000",
"fragment.identifier":"ee5770ea-8406-400a-a2fd-2362bd706fe0",
"fragment.index":"1",
"file.creationTime":"2020-12-22T15:09:13+0000",
"file.lastAccessTime":"2020-12-22T17:34:22+0000",
"segment.original.filename":"Sample-Spreadsheet-10000-rows.csv",
"file.owner":"root",
"fragment.count":"2",
"file.permissions":"rw-r--r--",
"text.line.count":"1"}
But the files have other properties, such as absolute.path, filename and uuid, that are missing from the above JSON file.
My requirement is to get absolute.path, filename and uuid, concatenate absolute.path + "/" + filename, assign the result to a custom attribute (say filepath), and also add uuid to the JSON object.
So my JSON file should look like
{ "uuid":"file uuid value", "filepath":"absolute.path+/+filename" }
Any inputs on how to get a JSON file of the above form?

If you look at the docs for AttributesToJSON, you can see that you can specify attributes in the Attributes List property. So you could try listing the attributes you want there.
Alternatively, it sounds like you have one FlowFile for each file? You could use UpdateRecord to insert fields. Setting the Replacement Value Strategy to Literal Value lets you use Expression Language to insert values. For example, you could add a property named /filename (the property name is a RecordPath) with value ${filename} to insert the value of the filename attribute into a JSON field called filename.
To concatenate the two values you could use ${allAttributes("absolute.path", "filename"):join('/')} or append().
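The concatenation above can be sketched outside NiFi to show the target output. This is only a Python illustration of the expected result, not NiFi configuration. The function name and the sample attribute values are made up, and the trailing-slash handling assumes that absolute.path may already end with "/".

```python
import json

def build_file_json(attributes):
    """Mimic the desired result: keep uuid and add a combined filepath."""
    # rstrip avoids a double slash in case absolute.path already ends with "/"
    directory = attributes["absolute.path"].rstrip("/")
    filepath = directory + "/" + attributes["filename"]
    return json.dumps({"uuid": attributes["uuid"], "filepath": filepath})

# Hypothetical attribute values for a single FlowFile
example = {
    "absolute.path": "/data/in/",
    "filename": "Sample-Spreadsheet-10000-rows.csv",
    "uuid": "ee5770ea-8406-400a-a2fd-2362bd706fe0",
    "file.owner": "root",
}
print(build_file_json(example))
```

Only uuid and filepath end up in the output; every other attribute is ignored, which is the effect of listing just those two names in Attributes List.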

Related

How to add custom attributes to AttributesToJSON?

Use the UpdateAttribute processor to delete the unnecessary attributes before passing the flowfile to AttributesToJSON, or specify the exact attributes you need in the AttributesToJSON processor.

How to set an Attribute to Array for AttributesToJSON Processor?

NiFi Version 1.8.0
I'm trying to build my JSON, and one of my fields needs to be an array. I thought I could simply use the UpdateAttribute processor to set my attribute to '["arrayItem1", "arrayItem2"]' and then use AttributesToJSON to convert the attribute to JSON, and that it would come out as an array. Unfortunately, it simply turns into a string.
What is the simplest way to set an attribute to be an array, so that the field in my final JSON (when using AttributesToJSON) is that specific array?
EDIT 1
I will have a few SyslogListeners, and I want to set an attribute so I know what data came from where. I want to be able to tag this data, so I thought of adding an UpdateAttribute to set my attribute. I would like this to be an array. So the tags would be:
SyslogListener1 will be ["tag1", "tag2"]
SyslogListener2 will be ["tag3", "tag4"]
SyslogListener3 will be ["tag1", "tag3"]
I thought of just having my flow look like this: SyslogListener -> UpdateAttribute -> all the data is now in the main flow -> AttributesToJSON. However, when I look at my JSON, my field is a string, not an array. How can I make this field an array? What I used to do was use ReplaceText; the only problem is that I didn't want to create a ReplaceText for every single instance. Is there a single processor that could handle this?
Does your incoming flow file have any existing content? If not, you can use ReplaceText to set the content to ["arrayItem1", "arrayItem2"] or whatever you wish the JSON to look like.
If the incoming flow file has existing JSON content, you can add the field explicitly (without attributes) using JoltTransformJSON or UpdateRecord.
Not my ideal solution, but I simply added a ReplaceText for each instance I needed. In my case, that was 7 different tag combinations, so my NiFi flow looks a little ugly. I was hoping for a single-processor solution where I could name my JSON field and make it an array. So my pipeline is:
SyslogListener -> UpdateAttribute (creates our tags attribute with the string tag1, tag2 and the other tag combinations, because I have 7 SyslogListeners in total, each with its own UpdateAttribute) -> data is now in the main pipeline, and some other processing happens here -> AttributesToJSON (builds our JSON from some attributes, including our tags attribute) -> my 7 ReplaceTexts (each checks whether our tags field has "tag1, tag2" and replaces it with ["tag1", "tag2"]; I do this for all 7 cases) -> PutElasticsearchHttp
So I am ingesting rsyslog messages, doing a bit of enriching, turning my data into JSON, then saving it to ES.
If anyone knows a single-processor solution to this, so I don't need 7 unique ReplaceTexts (and more if I need new tags), please share it.
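The seven ReplaceText processors all perform the same conversion, so the logic can be written once. The following is a minimal Python sketch of that conversion, assuming the tags field holds a comma-separated string as described above. The function name and the host field are hypothetical, and where such logic would run inside the flow is left open.

```python
import json

def tags_to_array(document, field="tags"):
    """Turn a comma-separated string field of a JSON document into a JSON array."""
    record = json.loads(document)
    raw = record.get(field)
    if isinstance(raw, str):
        # Split on commas and drop surrounding whitespace and empty entries
        record[field] = [tag.strip() for tag in raw.split(",") if tag.strip()]
    return json.dumps(record)

# "host" is a made-up field standing in for the other attributes in the JSON
print(tags_to_array('{"host": "srv1", "tags": "tag1, tag2"}'))
```

Because the function splits whatever string it finds, a new tag combination would not need a new rule.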

How to remove the flow file attributes in Nifi flow file?

[Screenshot: UpdateAttribute configuration]
It is hard to tell from the screenshots provided, but it looks like the fields you want to remove are part of the content of the flow file, which is different from the attributes of the flow file. UpdateAttribute can only remove attributes, not anything in the content.
In order to modify the content you would need to use a processor specific to the type of content being processed. In your case it looks like JSON, so you could use a ConvertRecord processor with a JsonTreeReader and JsonRecordSetWriter, and configure the writer to have a different schema than the reader. Basically, read in all the fields, but only write out the fields you want.
There is an UpdateRecord processor too, but it doesn't currently have the ability to remove fields.
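The "read all fields, write only some" idea can be pictured with a short Python sketch. It is a simplified model rather than how ConvertRecord is implemented: the function name is made up, the sample record is illustrative, and fields that are absent from a record are simply skipped here.

```python
def project_records(records, writer_fields):
    """Keep only the fields that the (hypothetical) writer schema names."""
    return [
        {name: record[name] for name in writer_fields if name in record}
        for record in records
    ]

# The reader sees every field, the writer schema lists only t2 and t3
records = [{"t1": "test", "t2": "test2", "t3": "test3"}]
print(project_records(records, ["t2", "t3"]))
```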
To delete a FlowFile's attribute, you can use UpdateAttribute and a property named Delete Attributes Expression. You just need to fill it with a regular expression that matches the attributes you want to remove.
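As a rough illustration of how such an expression selects attributes, here is a Python sketch. It assumes that the regular expression has to match the whole attribute name, which is worth verifying against the UpdateAttribute documentation. The function name and the sample attributes are made up.

```python
import re

def delete_matching_attributes(attributes, expression):
    """Return a copy of the attributes without the names matched by the regex."""
    pattern = re.compile(expression)
    # Assumption: the whole attribute name has to match, hence fullmatch
    return {
        name: value
        for name, value in attributes.items()
        if not pattern.fullmatch(name)
    }

attributes = {
    "file.owner": "root",
    "file.group": "root",
    "fragment.index": "1",
    "filename": "a.csv",
}
# Removes file.owner and file.group; filename has no dot after "file" and stays
print(delete_matching_attributes(attributes, r"file\..*"))
```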
But as @Bryan Bende said, it doesn't look like you're trying to remove the FlowFile's attributes, but content.
If you want to remove JSON fields from your content, you can use JoltTransformJSON with the Jolt Transformation DSL set to Remove. Then just provide a specification naming the fields you want to remove. For example, say I want to delete the field t1 from this JSON:
{
"t1": "test",
"t2": "test2",
"t3": "test3"
}
So, my specification would be:
{
"t1": ""
}
You can read more about it here.
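The effect of that specification on the sample document can be modelled in a few lines of Python. This is a deliberately simplified sketch and not Jolt itself: it handles plain keys and nested objects only, ignoring wildcards and arrays, and the function name is made up.

```python
def apply_remove(document, spec):
    """Simplified model of a remove spec: drop every key that the spec names."""
    result = {}
    for key, value in document.items():
        if key not in spec:
            result[key] = value  # key not mentioned in the spec: keep it
        elif isinstance(spec[key], dict) and isinstance(value, dict):
            # Nested spec: descend and remove only the named sub-keys
            result[key] = apply_remove(value, spec[key])
        # Otherwise the key is named in the spec and is dropped
    return result

document = {"t1": "test", "t2": "test2", "t3": "test3"}
spec = {"t1": ""}
print(apply_remove(document, spec))
```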

Nifi - How to insert XML whole content into JSON attribute

I am trying to insert the whole content of a row of an XML file into a JSON attribute (I am a newbie).
I am doing it this way (tell me if there is an easier way, it's good to know):
I have configured ExtractText this way:
And to finish, I configure the ReplaceText, giving it a JSON format:
But the result appears to be wrong (it doesn't work like a normal JSON file, for example if I try to do an HTTP POST):
How can I fix this problem?
cheers
If you are concerned about new lines and the JSON keys/values, then use NiFi Expression Language functions on the extracted attribute (data).
ReplaceText Configs:
Replacement value:
{"name" : "user1","time" : "${now()}","data" : "${data:replaceAll('\s',''):escapeJson()}"}
Use the replaceAll function to replace all spaces and newlines with '', and escapeJson to escape the result.
Set Replacement Strategy to Always Replace.
(or)
Another way of preparing the JSON message is to use the AttributesToJSON processor.
If we use this processor, we need to prepare the attributes/values beforehand with an UpdateAttribute processor.
Flow:
1. SplitXml
2. ExtractText // add a data property to extract the content into a flowfile attribute
3. UpdateAttribute // add name property -> user1
   add time property -> ${now()}
   add data property -> ${data:replaceAll('\s',''):escapeJson()}
4. AttributesToJSON // Attributes List -> name,time,data
   Destination -> flowfile-content
   Include Core Attributes -> false
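The reason escaping matters is that raw XML contains double quotes and newlines, which terminate or break a JSON string when pasted in unescaped. The Python sketch below shows the same idea with a JSON serializer doing the escaping. It is only an illustration: the function name and the sample XML row are made up, and unlike the replaceAll approach above it keeps whitespace instead of stripping it.

```python
import json
from datetime import datetime, timezone

def wrap_xml_in_json(xml_text, name="user1"):
    """Embed raw XML as the string value of the 'data' field."""
    payload = {
        "name": name,
        "time": datetime.now(timezone.utc).isoformat(),
        "data": xml_text,
    }
    # json.dumps escapes quotes, backslashes and newlines inside xml_text
    return json.dumps(payload)

# Hypothetical XML row containing quotes and line breaks
xml_row = '<row id="1">\n  <value>a "quoted" word</value>\n</row>'
message = wrap_xml_in_json(xml_row)
print(message)
```

Parsing the message back yields the original XML string unchanged, which is a quick way to confirm the document is valid JSON before posting it.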

InferAvroSchema Avro Record Name based on flow attribute

I have a common process group that infers an Avro schema based on the file I supply. I want to set the Avro Record Name to a name corresponding to the filename I am supplying, so I used ${filename}. But InferAvroSchema fails with an error saying the record name is empty. Note that before this, I had already set the "filename" attribute on the flowfile, and it has a value: I tested it using ReplaceText to see whether ${filename} resolves to a value.
Unfortunately this looks like a bug in InferAvroSchema. Many of the properties support expression language, but then the processor doesn't evaluate them against the incoming flow file. So it ends up only being able to use a value typed directly into the property (non-EL), or a value from system or environment properties which doesn't really make sense for a lot of these properties.
I created this JIRA for the issue:
https://issues.apache.org/jira/browse/NIFI-2465
The fix is that all of the calls to evaluateAttributeExpressions() should be passing in a flow file like:
context.getProperty(CSV_HEADER_DEFINITION).evaluateAttributeExpressions(inputFlowFile).getValue()
