NiFi Expression Language for JsonPath - string manipulation, split - apache-nifi

I'd like to manipulate the values of two properties in a JSON document using NiFi. I don't know which functions are available in the EvaluateJsonPath processor. Here is an example of the JSON:
{
  "ensemblGeneId":"ENSG00000145982",
  "approvedName":"phenylalanyl-tRNA synthetase 2, mitochondrial",
  "hgncId":"HGNC:21062",
  "nameSynonyms":"\"iGb3 synthase\", \"isoglobotriaosylceramide synthase\"",
  "approvedSymbol":"FARS2",
  "ncbiGeneId":"10667",
  "symbolSynonyms":"IGBS3S, IGB3S"
}
I'd like to process the values of the nameSynonyms and symbolSynonyms properties, converting each to an array of strings, like this:
"nameSynonyms":["iGb3 synthase", "isoglobotriaosylceramide synthase"],
"symbolSynonyms":["IGBS3S", "IGB3S"]
I am thinking of using either the ReplaceText processor or EvaluateJsonPath. With ReplaceText, I would need several instances of the processor, whereas I'd like to apply multiple replace expressions at once. With EvaluateJsonPath, I don't know which expressions could be used to solve this; it might involve split, replace, concat functions, and so on, but I don't know how to use these functions in the processor.
Which approach is more appropriate in this case? Is there another processor that could be used?
Thank you very much!

I think we can do this with a combination of ReplaceText and JoltTransformJSON.
First, I use a ReplaceText processor to replace the escaped quotes in nameSynonyms with an empty string. The processor configuration was attached as a screenshot.
Second, I use a Jolt spec to convert nameSynonyms and symbolSynonyms to arrays.
The Jolt spec is below:
[
  {
    "operation": "modify-overwrite-beta",
    "spec": {
      "nameSynonyms": "=split(',',#(1,nameSynonyms))",
      "symbolSynonyms": "=split(',',#(1,symbolSynonyms))"
    }
  }
]
The output json is as in the screenshot
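As a rough illustration of what the two steps above are meant to produce, here is a minimal Python sketch of the same logic outside NiFi. The function name synonyms_to_arrays is hypothetical, the sample is a shortened version of the JSON from the question, and trimming the space after each comma is an extra assumption of this sketch rather than something the Jolt spec is confirmed to do.

```python
import json


def synonyms_to_arrays(record: dict) -> dict:
    """Return a copy of record with the two synonym fields split into lists."""
    result = dict(record)
    # Step 1 (the ReplaceText step): drop the embedded double quotes.
    names = result["nameSynonyms"].replace('"', "")
    # Step 2 (the split step): split on commas. strip() removes the space
    # that follows each comma; that trimming is an assumption of this sketch.
    result["nameSynonyms"] = [part.strip() for part in names.split(",")]
    result["symbolSynonyms"] = [
        part.strip() for part in result["symbolSynonyms"].split(",")
    ]
    return result


# Shortened sample based on the JSON in the question.
sample = json.loads(r'''{
  "ensemblGeneId": "ENSG00000145982",
  "nameSynonyms": "\"iGb3 synthase\", \"isoglobotriaosylceramide synthase\"",
  "symbolSynonyms": "IGBS3S, IGB3S"
}''')

converted = synonyms_to_arrays(sample)
print(json.dumps(converted, indent=2))
```

Comparing the printed result with the desired output in the question is a quick way to check whether whitespace around the commas needs separate handling in the real flow.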
Hope this helps and if so please accept the answer.

Related

NiFi: change text in FlowFile (Python or ...)

I'm very new to NiFi.
I get data (a FlowFile?) from my "ConsumeKafka" processor, and it looks like this:
I have to delete all the text before '!'. I know a little Python, so with "ExecuteScript" I want to do something like this:
my_string=session.get()
my_string.split('!')[1]
# it returns "ZPLR_CHDN_UPN_ECN....."
but how to do it right?
P.S. Or maybe use "substringAfterLast", but how?
Thanks.
Update:
I have to remove the text between '"Tagname":' and '!'. How can I do it without regex?
If you simply want to split on a bang (!) and only keep the text after it, then you could achieve this with a SplitContent configured as:
Byte Sequence Format: Text
Byte Sequence: !
Keep Byte Sequence: false
Follow this with a RouteOnAttribute configured as:
Routing Strategy: Route to Property name
Add a new dynamic property called "substring_after" with a value: ${fragment.index:equals(2)}
For your input, this will produce 2 FlowFiles - one with the substring before ! and one with the substring after !. The first FlowFile (substring before) will route out of the RouteOnAttribute to the unmatched relationship, while the second FlowFile (substring after) will route to a substring_after relationship. You can auto-terminate the unmatched relationship to drop the text you don't want.
There are downsides to this approach though.
Are you guaranteed that there is only ever a single ! in the content? How would you handle multiple?
You are doing a substring on some JSON as raw text. Splitting on ! will result in a "} left at the end of the string.
These look like log entries, you may want to consider looking into ConsumeKafkaRecord and utilising NiFi's Record capabilities to interpret and manipulate the data more intelligently.
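The two downsides listed above are easy to see in a small Python sketch. The content string is hypothetical, since the real Kafka message from the question is not shown; only the Tagname key and the ! separator come from the question.

```python
# Hypothetical content; the real message from the question is not shown.
content = '{"Tagname":"SOME_PREFIX!ZPLR_CHDN_UPN"}'

# Splitting the raw text on the first "!" keeps the JSON closing characters.
before, after = content.split("!", 1)
print(after)

# With more than one "!", a plain split yields more than two fragments,
# so "keep fragment 2" no longer means "everything after the separator".
fragments = '{"Tagname":"A!B!C"}'.split("!")
print(len(fragments))
```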
On scripting, there are some great cookbooks for learning to script in NiFi, start here: https://community.cloudera.com/t5/Community-Articles/ExecuteScript-Cookbook-part-1/ta-p/248922
Edit:
Given your update, I would use UpdateRecord with a JSON Reader and Writer, and Replacement Value Strategy set to Record Path Value.
This uses the RecordPath syntax to perform transformations on data within Records. Your JSON Object is a Record. This would allow you to have multiple Records within the same FlowFile (rather than 1 line per FlowFile).
Then, add a dynamic property to the UpdateRecord with:
Name: /Tagname
Value: substringAfter(/Tagname, '!' )
What is this doing?
The Name of the property (/Tagname) is a RecordPath to the Tagname key in your JSON. This tells UpdateRecord where to put the result. In your case, we're replacing the value of an existing key (but it could also be a new key if you wanted to add one).
The Value of the property is the expression to evaluate to build the value you want to insert. We are using the substringAfter function, which takes 2 parameters. The first parameter is the RecordPath to the Key in the Record that contains the input String, which is also /Tagname (we're replacing the value of Tagname, with a substring of the original Tagname value). The second parameter is the String to split on, which is !.
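To make the effect of this property concrete, here is a minimal Python sketch that applies the same idea to a list of records. The helper substring_after, the sample records, and the Value field are hypothetical. Returning the input unchanged when the separator is missing is an assumption of this sketch, so check the RecordPath guide for the exact behaviour of substringAfter.

```python
import json


def substring_after(value: str, search: str) -> str:
    """Return the text after the first occurrence of search.

    If search does not occur, the value is returned unchanged
    (an assumption of this sketch).
    """
    _, found, tail = value.partition(search)
    return tail if found else value


# Hypothetical records; only the Tagname key comes from the question.
records = [
    {"Tagname": "SOME_PREFIX!ZPLR_CHDN_UPN", "Value": 1},
    {"Tagname": "NO_SEPARATOR", "Value": 2},
]

# Equivalent in spirit to: Name = /Tagname, Value = substringAfter(/Tagname, '!')
for record in records:
    record["Tagname"] = substring_after(record["Tagname"], "!")

print(json.dumps(records))
```

Because the transformation works on the parsed Tagname value rather than on raw text, the surrounding JSON stays intact.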
If your goal is to get the string between ! and "}, use ReplaceText with the regex (.*)!(.*)"}, capture the second group, and replace the entire content with it.
Please note that this regular expression may not be the best one for your case, but I believe you can solve your problem with a regular expression.
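A quick way to try the regular expression from this answer outside NiFi is a few lines of Python. The content string is hypothetical, and Python writes the back-reference as \2 where ReplaceText uses $2.

```python
import re

# Hypothetical content; the real message from the question is not shown.
content = '{"Tagname":"SOME_PREFIX!ZPLR_CHDN_UPN"}'

# Same pattern as in the answer: group 2 is the text between "!" and '"}'.
pattern = re.compile(r'(.*)!(.*)"}')

# Replace the whole match with the second capture group.
extracted = pattern.sub(r"\2", content)
print(extracted)
```

Note that the first group is greedy, so with several ! characters in the content the match keeps only what follows the last one.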

How to extract fields enveloped in quotes in NiFi?

I have some pipe delimited files. Each field is bounded by quotes like this.
"Created_Date__c"|"CreatedById"|"CreatedDate"|"Guid_c"
"2020-03-02 00:00:00"|"0053i000002XCpAAG"|"2020-03-02 16:01:34"|"94bf83ccf9daf610VgnVCM100000307882a2RCRD"
"2020-03-03 00:00:00"|"0053i000002XCpAAG"|"2020-03-03 09:15:56"|"1a4bb238cdedd610VgnVCM100000307882a2RCRD"
"2020-03-03 00:00:00"|"0053i000002XCpAAG"|"2020-03-03 09:52:33"|"22408baca6fee610VgnVCM100000307882a2RCRD"
I need to cleanse this data so that the output looks like this.
Created_Date__c|CreatedById|CreatedDate|Guid_c
2020-03-02 00:00:00|0053i000002XCpAAG|2020-03-02 16:01:34|94bf83ccf9daf610VgnVCM100000307882a2RCRD
2020-03-03 00:00:00|0053i000002XCpAAG|2020-03-03 09:15:56|1a4bb238cdedd610VgnVCM100000307882a2RCRD
2020-03-03 00:00:00|0053i000002XCpAAG|2020-03-03 09:52:33|22408baca6fee610VgnVCM100000307882a2RCRD
I tried using ReplaceText with this configuration:
Search Value - ^"(.*)"$ and Replacement Value - $1. But this configuration is not working and the file is routed to failure. I'm not sure what the issue might be.
I'm open to other suggestions. Thanks in advance.
I think you should use the regex "(.*?)" instead of ^"(.*)"$.
Some online services such as https://www.freeformatter.com/java-regex-tester.html can be useful for testing the regex replacement.
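The difference between the two patterns can also be checked with a short Python sketch. It uses a shortened line from the question, and Python writes the back-reference as \1 where ReplaceText uses $1. It only compares the regex results and does not explain why the FlowFile was routed to failure.

```python
import re

line = '"2020-03-02 00:00:00"|"0053i000002XCpAAG"|"2020-03-02 16:01:34"'

# Non-greedy pattern: each quoted field is matched and unwrapped separately.
unquoted = re.sub(r'"(.*?)"', r"\1", line)
print(unquoted)

# Anchored greedy pattern: only the outermost pair of quotes is removed.
anchored = re.sub(r'^"(.*)"$', r"\1", line)
print(anchored)
```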
I think your best option here is a ConvertRecord processor: use a CSVReader with an inferred schema and the CSV separator changed to |, and a CSVRecordSetWriter with Quote Mode set to Do Not Quote Values and the separator also set to |.
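The reader/writer idea can be sketched with Python's csv module: parse the data as quoted, pipe-delimited records, then write it back without quoting. This is only an analogy for the ConvertRecord setup, not NiFi code. The escapechar is included because Python's writer needs one in QUOTE_NONE mode if a field ever contains the delimiter.

```python
import csv
import io

# First two lines of the sample data from the question.
raw = (
    '"Created_Date__c"|"CreatedById"|"CreatedDate"|"Guid_c"\n'
    '"2020-03-02 00:00:00"|"0053i000002XCpAAG"|"2020-03-02 16:01:34"'
    '|"94bf83ccf9daf610VgnVCM100000307882a2RCRD"\n'
)

# Reader: pipe-delimited, fields wrapped in double quotes.
reader = csv.reader(io.StringIO(raw), delimiter="|", quotechar='"')

# Writer: same delimiter, no quoting at all.
out = io.StringIO()
writer = csv.writer(
    out,
    delimiter="|",
    quoting=csv.QUOTE_NONE,
    escapechar="\\",
    lineterminator="\n",
)
writer.writerows(reader)

cleaned = out.getvalue()
print(cleaned)
```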

Cannot replace more than one character at a time with a ReplaceText processor in NiFi

I want to insert a comma between braces in a FlowFile.
I receive json objects and merge them together with a process. After the merge I get this:
{"tag":"a","bag":"b"}{"tag":"c","bag":"d"}
I want it to end up like this after the ReplaceText processor:
{"tag":"a","bag":"b"},{"tag":"c","bag":"d"}
However my ReplaceText process doesn't work. I have it set up like this:
With this process, nothing gets replaced.
Am I doing something wrong?
I suggest adding a demarcator in MergeContent, if that is the processor you use for the merge.
But your variant should still work. I tried it on 1.10 and the content was changed.
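Both options can be sketched in Python. The regex below is a hypothetical pattern for the ReplaceText route, since the configuration from the question is not shown, and the join illustrates what a comma demarcator achieves at merge time.

```python
import re

merged = '{"tag":"a","bag":"b"}{"tag":"c","bag":"d"}'

# Option 1: fix the merged text afterwards by inserting a comma between
# a closing and an opening brace (hypothetical pattern).
fixed = re.sub(r"\}\{", "},{", merged)
print(fixed)

# Option 2: put the separator in while merging, which is what a
# demarcator does.
objects = ['{"tag":"a","bag":"b"}', '{"tag":"c","bag":"d"}']
joined = ",".join(objects)
print(joined)
```

The second option avoids a regex pass over the merged content and cannot be confused by a }{ sequence inside a string value.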

I want to convert the text into JSON format using nifi

I have been trying to convert my integer and string values to JSON format using the ReplaceText processor in NiFi, but I'm having trouble with the regular expression. Can anyone suggest a regular expression for the Search Value and a Replacement Value?
Original text format:
{Sensor_id:2.4,locationIP:2.2,Sensor_value:A}
Expected JSON format
{Sensor_id:2.4,locationIP:2.2,Sensor_value:"A"}
Processor configuration :
You can use the regex ([\w_]+):([a-zA-Z]\w*) with the replacement $1:"$2".
But note that valid JSON requires the keys to be quoted as well. For example:
{"Sensor_id":2.4,"locationIP":2.2,"Sensor_value":"A"}
In this case, I would recommend:
Add a ReplaceText processor with the regex ([\w_]+): and replacement "$1":
Link the output of the first ReplaceText to another ReplaceText processor with the regex ([\w_"]+):([a-zA-Z]\w*) and replacement $1:"$2"
I hope it helps
EDIT:
If you want to transform {Sensor_id:2.4,locationIP:2.2,Sensor_value:A} into {"Sensor_id":"2.4","locationIP":"2.2","Sensor_value":"A"} you can use only one regex in a single processor:
Regex: ([\w_]+):([.\w]*)
Replacement: "$1":"$2"
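The three variants above can be checked with Python's re module. This is only a sketch for testing the patterns outside NiFi: the patterns are the ones from the answer, and Python writes back-references as \1 and \2 where ReplaceText uses $1 and $2.

```python
import json
import re

original = "{Sensor_id:2.4,locationIP:2.2,Sensor_value:A}"

# Variant 1: quote only values that start with a letter.
letters_quoted = re.sub(r"([\w_]+):([a-zA-Z]\w*)", r'\1:"\2"', original)

# Variant 2: two chained replacements, first the keys, then letter values.
keys_quoted = re.sub(r"([\w_]+):", r'"\1":', original)
valid_json = re.sub(r'([\w_"]+):([a-zA-Z]\w*)', r'\1:"\2"', keys_quoted)

# Variant 3: a single replacement that quotes both keys and all values.
all_quoted = re.sub(r"([\w_]+):([.\w]*)", r'"\1":"\2"', original)

print(letters_quoted)
print(valid_json)
print(all_quoted)

# Variants 2 and 3 parse as JSON; variant 1 does not because of bare keys.
print(json.loads(valid_json))
print(json.loads(all_quoted))
```

Variant 2 keeps the numbers as JSON numbers, while variant 3 turns every value into a string.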

How to convert a string to a JSON array using NiFi

In NiFi I'm processing a flowfile containing the following attribute:
Key: 'my_array'
Value: '[u'firstElement', u'secondElement']'
I'd like to split flowFile on this array to process each element separately (and then merge). I tried to use SplitJson processor, but it requires JSON content to operate on, so I used AttributesToJSON before it. Unfortunately the produced flowFile's content is:
{"my_array": "[u'firstElement', u'secondElement']"}
And I receive the error
The evaluated value [u'firstElement', u'secondElement'] of $['my_array'] was not a JSON Array compatible type and cannot be split.
Is it possible to convert my_array string to the correct JSON array? Do I need to use ExecuteScript or is there some simpler way?
How about ReplaceText with a Replacement Strategy of Always Replace and a Replacement Value of ${my_array}, followed by SplitJson?
This will replace your FlowFile's content with this attribute's value and then you could SplitJSON on it.
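The attribute value in the question looks like the printed form of a Python list rather than strict JSON. If it has to be turned into a JSON array before splitting, one possible approach, separate from the answer above and shown only as a sketch, is to parse it as a Python literal and serialize it again. How this would be wired into a scripting processor is not covered here.

```python
import ast
import json

# Attribute value as shown in the question.
attribute_value = "[u'firstElement', u'secondElement']"

# literal_eval safely parses Python literals (lists, strings, numbers)
# without executing arbitrary code.
elements = ast.literal_eval(attribute_value)

# Serialize the list as a JSON array with double-quoted strings.
as_json = json.dumps(elements)
print(as_json)
```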
Suppose I want the string "Hashtags": "['tag1','tag2']" (part of my resulting JSON in NiFi) to be changed into "Hashtags": ['tag1','tag2'].
What I do is:
I use ReplaceText with Replacement Strategy set to Regex Replace and a regular expression that matches the text to change. This replaces the matched content of the FlowFile, and then you can continue your process.
