I have been trying to convert my integer and string values to JSON format using the ReplaceText processor in NiFi, but I'm running into problems with the regular expression. Can anyone suggest a regular expression for the Search Value and Replacement Value?
Original text format:
{Sensor_id:2.4,locationIP:2.2,Sensor_value:A}
Expected JSON format:
{Sensor_id:2.4,locationIP:2.2,Sensor_value:"A"}
Processor configuration:
You can use the regex ([\w_]+):([a-zA-Z]\w*) with the replacement $1:"$2".
Note, however, that valid JSON should also have quotes around the keys. For example:
{"Sensor_id":2.4,"locationIP":2.2,"Sensor_value":"A"}
In this case, I would recommend:
Add a ReplaceText processor with the regex ([\w_]+): and replacement "$1":
Link the output of the first ReplaceText to another ReplaceText processor with the regex ([\w_"]+):([a-zA-Z]\w*) and replacement $1:"$2"
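If you want to sanity-check the two-step replacement outside NiFi, here is a quick sketch in Python; NiFi's $1/$2 backreferences become \1/\2 in Python's re module:

```python
import re

text = '{Sensor_id:2.4,locationIP:2.2,Sensor_value:A}'

# Step 1: quote every key
step1 = re.sub(r'([\w_]+):', r'"\1":', text)

# Step 2: quote values that start with a letter (numbers are left alone)
step2 = re.sub(r'([\w_"]+):([a-zA-Z]\w*)', r'\1:"\2"', step1)

print(step2)  # {"Sensor_id":2.4,"locationIP":2.2,"Sensor_value":"A"}
```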
I hope it helps
EDIT:
If you want to transform {Sensor_id:2.4,locationIP:2.2,Sensor_value:A} into {"Sensor_id":"2.4","locationIP":"2.2","Sensor_value":"A"}, you can use a single regex in one processor:
Regex: ([\w_]+):([.\w]*)
Replacement: "$1":"$2"
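The same single-pass replacement can be checked in Python (again, \1/\2 in place of $1/$2):

```python
import re

text = '{Sensor_id:2.4,locationIP:2.2,Sensor_value:A}'

# One pass: quote both the key and the value (numbers become strings too)
result = re.sub(r'([\w_]+):([.\w]*)', r'"\1":"\2"', text)

print(result)  # {"Sensor_id":"2.4","locationIP":"2.2","Sensor_value":"A"}
```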
Related
I am following the NiFi guide to parse delimited file content.
Instead of hard-coding the search text and replacement value, I want to use the contents of two attributes.
processor config
When executed, the processor does not use the attribute content as a regex, even though it is a valid regular expression.
I'm very new to NiFi.
I get data (a FlowFile?) from my ConsumeKafka processor; it looks like
So I have to delete any text before '!'. I know a little Python, so with ExecuteScript I want to do something like this:
my_string = session.get()
my_string.split('!')[1]
# it returns "ZPLR_CHDN_UPN_ECN....."
But how do I do this correctly?
P.S. Or maybe use substringAfterLast, but how?
Thanks.
Update:
I have to remove the text between '"Tagname":' and '!'. How can I do that without regex?
If you simply want to split on a bang (!) and keep only the text after it, you could achieve this with a SplitContent processor configured as:
Byte Sequence Format: Text
Byte Sequence: !
Keep Byte Sequence: false
Follow this with a RouteOnAttribute configured as:
Routing Strategy: Route to Property name
Add a new dynamic property called "substring_after" with a value: ${fragment.index:equals(2)}
For your input, this will produce 2 FlowFiles - one with the substring before ! and one with the substring after !. The first FlowFile (substring before) will route out of the RouteOnAttribute to the unmatched relationship, while the second FlowFile (substring after) will route to a substring_after relationship. You can auto-terminate the unmatched relationship to drop the text you don't want.
There are downsides to this approach though.
Are you guaranteed that there is only ever a single ! in the content? How would you handle multiple?
You are doing a substring on some JSON as raw text. Splitting on ! will result in a "} left at the end of the string.
These look like log entries; you may want to look into ConsumeKafkaRecord and NiFi's Record capabilities to interpret and manipulate the data more intelligently.
On scripting, there are some great cookbooks for learning to script in NiFi, start here: https://community.cloudera.com/t5/Community-Articles/ExecuteScript-Cookbook-part-1/ta-p/248922
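For what it's worth, the core string logic is a one-liner. The content below is a made-up example, and inside ExecuteScript (Jython) you would read and write the FlowFile content through a StreamCallback as the cookbook shows, rather than treating session.get() as a string:

```python
# Hypothetical FlowFile content, for illustration only
content = '{"Tagname":"Channel1.Device1!ZPLR_CHDN_UPN_ECN","Value":42}'

# Split once on the first '!' and keep the tail
after_bang = content.split('!', 1)[1]

print(after_bang)  # ZPLR_CHDN_UPN_ECN","Value":42}
```

Notice the trailing `","Value":42}` left behind: that is the raw-text pitfall described above when splitting JSON as plain text.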
Edit:
Given your update, I would use UpdateRecord with a JSON Reader and Writer, and Replacement Value Strategy set to Record Path Value.
This uses the RecordPath syntax to perform transformations on data within Records. Your JSON Object is a Record. This would allow you to have multiple Records within the same FlowFile (rather than 1 line per FlowFile).
Then, add a dynamic property to the UpdateRecord with:
Name: /Tagname
Value: substringAfter(/Tagname, '!')
What is this doing?
The Name of the property (/Tagname) is a RecordPath to the Tagname key in your JSON. This tells UpdateRecord where to put the result. In your case, we're replacing the value of an existing key (but it could also be a new key if you wanted to add one).
The Value of the property is the expression to evaluate to build the value you want to insert. We are using the substringAfter function, which takes 2 parameters. The first parameter is the RecordPath to the Key in the Record that contains the input String, which is also /Tagname (we're replacing the value of Tagname, with a substring of the original Tagname value). The second parameter is the String to split on, which is !.
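In plain Python terms, the transformation UpdateRecord performs is equivalent to the following; the record content is an assumption for illustration:

```python
import json

# Hypothetical record; only the Tagname key matters here
record = json.loads('{"Tagname": "Channel1.Device1!ZPLR_CHDN_UPN_ECN", "Value": 42}')

# Equivalent of substringAfter(/Tagname, '!'):
# keep everything after the first '!' in the Tagname value
record["Tagname"] = record["Tagname"].partition('!')[2]

print(record["Tagname"])  # ZPLR_CHDN_UPN_ECN
```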
If your purpose is to get the string between ! and "}, use ReplaceText with (.*)!(.*)"}, capture the second group, and replace the entire content with it.
Note that this regular expression may not be the best fit for your case, but I believe you can solve your problem with a regular expression.
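As a sketch of that idea in Python (the sample content is an assumption):

```python
import re

content = '{"Tagname":"Channel1.Device1!ZPLR_CHDN_UPN_ECN"}'

# Capture what sits between '!' and the closing '"}',
# then replace the entire content with that second group
replaced = re.sub(r'(.*)!(.*)"}', r'\2', content)

print(replaced)  # ZPLR_CHDN_UPN_ECN
```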
I'd like to manipulate values from two properties in JSON using NiFi. I don't know which functions are available to use in the EvaluateJsonPath processor. Here's an example of the JSON:
{
"ensemblGeneId":"ENSG00000145982",
"approvedName":"phenylalanyl-tRNA synthetase 2, mitochondrial",
"hgncId":"HGNC:21062",
"nameSynonyms":"\"iGb3 synthase\", \"isoglobotriaosylceramide synthase\"",
"approvedSymbol":"FARS2",
"ncbiGeneId":"10667",
"symbolSynonyms":"IGBS3S, IGB3S"
}
I'd like to treat the values of the nameSynonyms and symbolSynonyms properties, converting each to an array of strings, like this:
"nameSynonyms":["iGb3 synthase", "isoglobotriaosylceramide synthase"],
"symbolSynonyms":["IGBS3S", "IGB3S"]
I'm thinking of using the ReplaceText processor or EvaluateJsonPath. If I use ReplaceText, I would need several of these processors, and I'd like to apply multiple replace expressions at once. With EvaluateJsonPath, on the other hand, I don't know which expressions could solve it (perhaps split, replace, or concat functions), or how to use those functions in the processor.
Which is more appropriate to use in this case? Is there another processor that could be used?
Thank you very much!
I think we can do this with a combination of ReplaceText and JoltTransformJSON. I have attached a screenshot of the flow.
First, I replace the escaped quotes in nameSynonyms with an empty string in the ReplaceText processor. The processor configuration is shown below.
Second, I use a jolt spec to convert nameSynonyms and symbolSynonyms to an array.
Jolt spec is below
[
{
"operation": "modify-overwrite-beta",
"spec": {
"nameSynonyms": "=split(',',#(1,nameSynonyms))",
"symbolSynonyms": "=split(',',#(1,symbolSynonyms))"
}
}
]
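To see what the two steps do, here is the equivalent in plain Python. I split on ', ' so the items come out without a leading space; the Jolt spec's split(',') would leave one on the second item, which you may want to trim:

```python
import json

record = {
    "nameSynonyms": "\"iGb3 synthase\", \"isoglobotriaosylceramide synthase\"",
    "symbolSynonyms": "IGBS3S, IGB3S",
}

# Step 1 (ReplaceText): drop the escaped quotes
record["nameSynonyms"] = record["nameSynonyms"].replace('"', '')

# Step 2 (Jolt modify-overwrite-beta): split the strings into arrays
for key in ("nameSynonyms", "symbolSynonyms"):
    record[key] = record[key].split(', ')

print(json.dumps(record))
```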
The output JSON is shown in the screenshot.
Hope this helps and if so please accept the answer.
In NiFi I'm processing a flowfile containing the following attribute:
Key: 'my_array'
Value: '[u'firstElement', u'secondElement']'
I'd like to split the FlowFile on this array to process each element separately (and then merge them). I tried the SplitJson processor, but it requires JSON content to operate on, so I used AttributesToJSON before it. Unfortunately, the produced FlowFile's content is:
{"my_array": "[u'firstElement', u'secondElement'"}
And I receive the error
The evaluated value [u'firstElement', u'secondElement'] of $['my_array'] was not a JSON Array compatible type and cannot be split.
Is it possible to convert my_array string to the correct JSON array? Do I need to use ExecuteScript or is there some simpler way?
How about ReplaceText with a Replacement Strategy of Always Replace and a Replacement Value of ${my_array}, followed by SplitJson?
This will replace your FlowFile's content with the attribute's value, and then you can SplitJson on it.
Suppose I want the string "Hashtags": "['tag1','tag2']" (as part of my resultant JSON in NiFi) to be changed into "Hashtags": ['tag1','tag2'].
What I do is:
I use ReplaceText with Replacement Strategy: Regex Replace and Replacement Value: a regex expression. This replaces the FlowFile's matched content with the replacement value, and then you can continue your process.
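One regex that would do it (my own suggestion, not from the original answer): match the quoted bracket expression and re-emit only the brackets.

```python
import re

content = '{"Hashtags": "[\'tag1\',\'tag2\']", "text": "hello"}'

# Strip the quotes wrapping the bracketed list; \1 keeps the list itself
fixed = re.sub(r'"Hashtags": "(\[.*?\])"', r'"Hashtags": \1', content)

print(fixed)  # {"Hashtags": ['tag1','tag2'], "text": "hello"}
```

Note that the result matches the desired output above, though it is no longer strictly valid JSON because of the single quotes.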
I needed a StoreFunc implementation that would allow Pig to use multi-byte field delimiters, for example ^^ (\u005E\u005E).
I tried all of these without success:
store B into '/tmp/test/output' using PigStorage('\u005E\u005E');
store B into '/tmp/test/output' using PigStorage('^^');
store B into '/tmp/test/output' using PigStorage('\\^\\^');
Is there an existing StoreFunc implementation, analogous to the LoadFunc org.apache.pig.piggybank.storage.MyRegExLoader, that can take regular expressions as the field separator while writing?
I worked around this by using CONCAT for the first delimiter character and letting PigStorage write the second occurrence.
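Sketched out, the workaround looks roughly like this (relation and field names are assumptions):

```pig
-- Append the first '^' to every field except the last,
-- then let PigStorage('^') supply the second one
C = FOREACH B GENERATE CONCAT(f1, '^'), f2;
STORE C INTO '/tmp/test/output' USING PigStorage('^');
-- a tuple (v1, v2) is written as v1^^v2
```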