I want to merge two FlowFiles by the filename attribute. The UpdateAttribute processor sets filename -> ${UUID()}. Then EvaluateJsonPath sets filename -> $.filename. However, I end up with two FlowFiles whose filename attributes differ, so they cannot be merged:
Output of EvaluateJsonPath: the value of filename is an empty string.
Output of QueryElasticsearchHttp: the value of filename is 1.
How can I make these two outputs carry the same filename value?
If you want to merge by filename, just put an UpdateAttribute right before MergeContent and set filename to a constant value such as 'myfilename'.
There seems to be no reason for the UpdateAttribute at the beginning of the flow, since you overwrite filename in the EvaluateJsonPath processor anyway. Also, I think every flow file should already have filename set to the flow file's UUID, unless it came from a GetFile, which sets filename from the file on disk.
I'm very new to NiFi.
I get data (a FlowFile?) from my "ConsumeKafka" processor; it looks like this:
So I have to delete any text before '!'. I know a little Python, so with "ExecuteScript" I want to do something like this:
my_string=session.get()
my_string.split('!')[1]
# it returns "ZPLR_CHDN_UPN_ECN....."
But how do I do this properly?
P.S. Or maybe use "substringAfterLast", but how?
Thanks.
Update:
I have to remove the text between '"Tagname":' and '!'. How can I do it without regex?
If you simply want to split on a bang (!) and only keep the text after it, then you could achieve this with a SplitContent configured as:
Byte Sequence Format: Text
Byte Sequence: !
Keep Byte Sequence: false
Follow this with a RouteOnAttribute configured as:
Routing Strategy: Route to Property name
Add a new dynamic property called "substring_after" with a value: ${fragment.index:equals(2)}
For your input, this will produce 2 FlowFiles - one with the substring before ! and one with the substring after !. The first FlowFile (substring before) will route out of the RouteOnAttribute to the unmatched relationship, while the second FlowFile (substring after) will route to a substring_after relationship. You can auto-terminate the unmatched relationship to drop the text you don't want.
There are downsides to this approach though.
Are you guaranteed that there is only ever a single ! in the content? How would you handle multiple?
You are doing a substring on some JSON as raw text. Splitting on ! will result in a "} left at the end of the string.
These look like log entries, so you may want to look into ConsumeKafkaRecord and use NiFi's Record capabilities to interpret and manipulate the data more intelligently.
On scripting, there are some great cookbooks for learning to script in NiFi, start here: https://community.cloudera.com/t5/Community-Articles/ExecuteScript-Cookbook-part-1/ta-p/248922
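Since the question asks about Python, the core string logic can be sketched in plain Python as below. This is only a sketch under assumptions: the function name `text_after_bang` and the sample line are hypothetical (the real message is not shown in full), and the ExecuteScript wiring that reads and writes FlowFile content is left out, since the cookbook covers that part.

```python
def text_after_bang(line, strip_json_tail=True):
    """Return the text after the first '!' in line.

    If the line contains no '!', it is returned unchanged.
    """
    # partition splits on the FIRST '!' only, so any later '!' stays in the result
    before, sep, after = line.partition('!')
    if not sep:
        return line
    # Splitting raw JSON text leaves the closing '"}' behind; optionally drop it
    if strip_json_tail and after.endswith('"}'):
        after = after[:-2]
    return after


# Hypothetical sample, shaped like a JSON line with a "Tagname" value
sample = '{"Tagname":"SERVER01!ZPLR_CHDN_UPN_ECN"}'
print(text_after_bang(sample))
```

The two downsides listed above show up directly: a second `!` would survive in the output, and the trailing `"}` has to be trimmed by hand.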
Edit:
Given your update, I would use UpdateRecord with a JSON Reader and Writer, and Replacement Value Strategy set to Record Path Value.
This uses the RecordPath syntax to perform transformations on data within Records. Your JSON Object is a Record. This would allow you to have multiple Records within the same FlowFile (rather than 1 line per FlowFile).
Then, add a dynamic property to the UpdateRecord with:
Name: /Tagname
Value: substringAfter(/Tagname, '!' )
What is this doing?
The Name of the property (/Tagname) is a RecordPath to the Tagname key in your JSON. This tells UpdateRecord where to put the result. In your case, we're replacing the value of an existing key (but it could also be a new key if you wanted to add one).
The Value of the property is the expression to evaluate to build the value you want to insert. We are using the substringAfter function, which takes 2 parameters. The first parameter is the RecordPath to the Key in the Record that contains the input String, which is also /Tagname (we're replacing the value of Tagname, with a substring of the original Tagname value). The second parameter is the String to split on, which is !.
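To make the effect of that property concrete, here is a rough Python equivalent of what the UpdateRecord step does to each record. It is a sketch, not NiFi code: the helper names and sample records are hypothetical, one JSON object per line is assumed, and returning the value unchanged when `!` is absent is my assumption about how `substringAfter` behaves.

```python
import json


def substring_after(value, search):
    """Text after the first occurrence of search; value unchanged if absent (assumed)."""
    head, sep, tail = value.partition(search)
    return tail if sep else value


def update_tagname(json_lines):
    """Replace the Tagname value of every JSON line with the text after '!'."""
    out = []
    for line in json_lines.splitlines():
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        record["Tagname"] = substring_after(record["Tagname"], "!")
        out.append(json.dumps(record))
    return "\n".join(out)


# Hypothetical input: two records in one "FlowFile"
records = '{"Tagname": "SERVER01!ZPLR_A", "Value": 1}\n{"Tagname": "ZPLR_B", "Value": 2}'
print(update_tagname(records))
```

Because the value is parsed as JSON first, nothing like a stray `"}` is left behind, which is the main advantage over splitting the raw text.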
If your goal is to get the string between ! and "}, use ReplaceText with the regular expression (.*)!(.*)"}, capture the second group, and replace the entire content with it.
Please note that this regular expression may not be the best fit for your case, but I believe a regular expression can solve your problem.
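The behaviour of that pattern can be checked outside NiFi with a short Python sketch. ReplaceText uses Java regular expressions, so this is only an approximation; the function name and sample strings are hypothetical, and `\2` here plays the role of `$2` in the Replacement Value.

```python
import re

# Same pattern as in the answer: greedy group, '!', second group, then '"}'
PATTERN = re.compile(r'(.*)!(.*)"}')


def keep_second_group(content):
    """Replace the matched text with capture group 2; no match leaves content as-is."""
    return PATTERN.sub(r'\2', content)


# Hypothetical sample content
print(keep_second_group('{"Tagname":"SERVER01!ZPLR_CHDN_UPN_ECN"}'))
```

Note that the first group is greedy, so with several `!` characters only the text after the last one is kept.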
In NiFi I'm processing a flowfile containing the following attribute:
Key: 'my_array'
Value: '[u'firstElement', u'secondElement']'
I'd like to split flowFile on this array to process each element separately (and then merge). I tried to use SplitJson processor, but it requires JSON content to operate on, so I used AttributesToJSON before it. Unfortunately the produced flowFile's content is:
{"my_array": "[u'firstElement', u'secondElement']"}
And I receive the error
The evaluated value [u'firstElement', u'secondElement'] of $['my_array'] was not a JSON Array compatible type and cannot be split.
Is it possible to convert my_array string to the correct JSON array? Do I need to use ExecuteScript or is there some simpler way?
How about ReplaceText with a Replacement Strategy of Always Replace and a Replacement Value of ${my_array}, followed by SplitJson?
This will replace your FlowFile's content with the attribute's value, and then you can run SplitJson on it.
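One caveat: the attribute value as shown looks like a Python list literal (single quotes, `u` prefixes) rather than JSON, so it may still need converting before a JSON processor accepts it. If you do end up scripting it, the conversion itself is short. This is a sketch in plain Python; the function name is hypothetical and the ExecuteScript plumbing is omitted.

```python
import ast
import json


def python_list_to_json_array(value):
    """Convert a Python list literal such as "[u'a', u'b']" into a JSON array string."""
    # literal_eval only evaluates literals, so it is safe on untrusted text
    parsed = ast.literal_eval(value)
    if not isinstance(parsed, list):
        raise ValueError("expected a list literal")
    return json.dumps(parsed)


print(python_list_to_json_array("[u'firstElement', u'secondElement']"))
```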
Suppose I want the string "Hashtags": "['tag1','tag2']" (part of my resulting JSON in NiFi) changed into "Hashtags": ['tag1','tag2'].
What I do is:
I use ReplaceText with Replacement Strategy: Regex Replace and Replacement Value: a regex expression. This replaces the matched part of the FlowFile's content with the replacement value, and then you can continue your flow.
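The answer does not give the actual expression, so the following is only one possible pattern, tried out in Python rather than in ReplaceText. The pattern and the function name are my own assumptions; in ReplaceText the back-reference would be written `$1` instead of `\1`.

```python
import re


def unquote_array(text):
    """Drop the double quotes that wrap a bracketed list, e.g. "[...]" -> [...]."""
    # Lazy match so each quoted list is handled separately
    return re.sub(r'"(\[.*?\])"', r'\1', text)


sample = '"Hashtags": "[\'tag1\',\'tag2\']"'
print(unquote_array(sample))
```

The result matches the form asked for above, but with single-quoted elements it is still not strict JSON.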
In NiFi, I need to transfer a bunch of JSON files to HDFS. The JSON files have a field called "creationDate" which has the date in UNIX format. I need to use the date in there to funnel the file to HDFS directories that are named after dates, like "2019-01-19" "2019-01-20" "2019-01-21" etc.
At first I used an "EvaluateJsonPath" processor going to a "PutHDFS" processor. The "Evaluate..." processor had "creationDate" as the property and "${creationDate}" as the value. In the PutHDFS processor, for the directory I put "/${creationDate}".
But then I realized that the date in the JSON file has the full timestamp, like "2019-01-19T04:34:28.527722+00:00".
Obviously I don't need all that, just the first eight digits. So how can I turn this big string into a neat 8-digit directory name? Will I need to use a regex, and if so, how can this be implemented? Thanks in advance for any help.
You can use UpdateAttribute and use the date expression language functions to format it.
https://nifi.apache.org/docs/nifi-docs/html/expression-language-guide.html
Example (not specific to your format):
${creationDate:toDate('MM-dd-yyyy'):format('yyyy/MM/dd')}
In UpdateAttribute you would add a new property named creationDate and set its value to an expression like the one above.
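The parse-then-format idea is the same in any language. As an illustration outside NiFi, here is a Python sketch that turns the timestamp from the question into a directory name. The function name is hypothetical, and `datetime.fromisoformat` needs Python 3.7 or later.

```python
from datetime import datetime


def date_directory(timestamp):
    """Parse an ISO-8601 timestamp and return its date part as yyyy-MM-dd."""
    # No time zone conversion is done; the date is taken in the timestamp's own offset
    return datetime.fromisoformat(timestamp).strftime("%Y-%m-%d")


print(date_directory("2019-01-19T04:34:28.527722+00:00"))
```

For this particular format the date is simply the first ten characters, so taking a substring would give the same result without parsing.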
I am trying to modify the names of some files with NiFi by getting a value from a JSON and adding it to the original filename (for example filename.csv (original name) + january (name provided by the incoming JSON)). To do so, I am routing the CSV files to UpdateAttribute to change the filename.
On the other hand, I am receiving a JSON that has an attribute that will be part of the file name.
The EvaluateJsonPath configuration is as follows (I am receiving the value correctly):
And finally I am trying to combine the values in the UpdateAttribute processor (this is where it doesn't work properly):
The response I am getting is _filename.csv
You have quotes around name inside your Expression Language expression, try
${name}_${filename} or
${name:append('_'):append(${filename})}
I am a newbie to Pentaho (installed today). I was able to do a basic transformation in Spoon. Now I need to do something that I can't figure out.
My input looks like this:
2012-09-17|garima|fbhjgjhgj87687jghgj88jgjj|garima#1347868164626|::ffff:120.56.132.137
The 3rd field is an ID, for which I need to get some information from a REST API:
http://api.app.com/app/api/v1/feature/fbhjgjhgj87687jghgj88jgjj
What do I need to do in Spoon to get this done?
Also, the data returned will be in JSON format. How do I parse that?
You should first get your input with a CSV File Input using | as delimiter. Then you can get the 3rd field as a string.
Next you probably need to remove all spaces from this string with a String operations step. Look at the Remove special character column, and select space.
Then you need to concatenate it with your http address http://api.app.com/app/api/v1/feature/. For this you'll use a Calculator step. In this step, first create a new temporary field tmpAddr, with operation Define a constant value for ... (or something like this; sorry, my Spoon is in Portuguese). In the Field A column, enter your http address. It's good practice, once this works, to set your address as a system variable so that if it changes you don't need to replace it everywhere in your transformations (look at menu Edit -> System Variables).
Now on the same Calculator step create another field, let's say MyAddress, with operation A+B. Choose for Field A the field tmpAddr you just created, and for Field B the 3rd field from your input.
Now on your stream you should have the full address as a field MyAddress. Connect a REST client step. Mark Accept URL from field and choose field MyAddress as URL Field Name. Set Application Type to JSON. Set Result Fieldname as MyResult.
If you need further JSON parsing you can add a Json input step. Set Source is defined in a field and select field MyResult as Get Source from field.
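For readers who want to check the logic outside Spoon, the steps above amount to roughly the following Python sketch. It only mirrors the data flow (split, clean, concatenate, parse); the function names and the sample response body are hypothetical, and the actual HTTP call made by the REST client step is left out.

```python
import json

BASE_URL = "http://api.app.com/app/api/v1/feature/"


def build_url(row, delimiter="|", id_index=2):
    """Split a delimited row, take the ID field, strip spaces, and prepend the base URL."""
    fields = row.split(delimiter)
    record_id = fields[id_index].replace(" ", "")
    return BASE_URL + record_id


def parse_result(body):
    """Parse the JSON text returned by the REST call into a Python object."""
    return json.loads(body)


row = "2012-09-17|garima|fbhjgjhgj87687jghgj88jgjj|garima#1347868164626|::ffff:120.56.132.137"
print(build_url(row))

# Hypothetical response body; the real API's fields are not shown in the question
print(parse_result('{"feature": "example"}'))
```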
An alternate approach is to use the "Replace in String" step to append the string.
Set 'use RegEx' to Y
Set 'Search' to (.*)
Set 'Replace with' to http://api.app.com/app/api/v1/feature/$1
Set 'Whole Word' to Y
The parentheses in the regex set up a capture group that you can then insert into your replacement string with the $X syntax.