NiFi: change text in FlowFile (Python or ...) - apache-nifi

I'm very new to NiFi.
I get data (a FlowFile?) from my "ConsumeKafka" processor; it looks like
So I have to delete any text before '!'. I know a little Python, so with "ExecuteScript" I want to do something like this:
my_string=session.get()
my_string.split('!')[1]
# it returns "ZPLR_CHDN_UPN_ECN....."
But how do I do this correctly?
P.S. Or maybe I should use "substringAfterLast", but how?
Thanks.
Update:
I have to remove the text between '"Tagname":' and '!'. How can I do it without regex?

If you simply want to split on a bang (!) and keep only the text after it, you could achieve this with a SplitContent processor configured as:
Byte Sequence Format: Text
Byte Sequence: !
Keep Byte Sequence: false
Follow this with a RouteOnAttribute configured as:
Routing Strategy: Route to Property name
Add a new dynamic property called "substring_after" with a value: ${fragment.index:equals(2)}
For your input, this will produce 2 FlowFiles - one with the substring before ! and one with the substring after !. The first FlowFile (substring before) will route out of the RouteOnAttribute to the unmatched relationship, while the second FlowFile (substring after) will route to a substring_after relationship. You can auto-terminate the unmatched relationship to drop the text you don't want.
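To make the fragment numbering concrete, here is a rough Python model of SplitContent (Keep Byte Sequence: false) followed by RouteOnAttribute. It is not NiFi code: FlowFiles are plain dicts, the function names are hypothetical, and the sample content is made up because the original Kafka message is not shown. It relies on SplitContent numbering fragments from 1, which is why the property tests fragment.index against 2.

```python
from __future__ import annotations


def split_content(content: str, byte_sequence: str = "!") -> list[dict]:
    """Rough model of SplitContent with Keep Byte Sequence = false.

    Each fragment gets a 1-based "fragment.index" attribute.
    """
    parts = content.split(byte_sequence)
    return [
        {"fragment.index": str(i), "content": part}
        for i, part in enumerate(parts, start=1)
    ]


def route_on_attribute(flowfiles: list[dict]) -> dict[str, list[dict]]:
    """Model of RouteOnAttribute with substring_after = ${fragment.index:equals(2)}."""
    routed = {"substring_after": [], "unmatched": []}
    for ff in flowfiles:
        relationship = "substring_after" if ff["fragment.index"] == "2" else "unmatched"
        routed[relationship].append(ff)
    return routed


# Hypothetical message; the real Kafka payload is not shown in the question.
sample = '{"Tagname":"SOME_PREFIX!ZPLR_CHDN_UPN_ECN"}'
routed = route_on_attribute(split_content(sample))
print(routed["substring_after"][0]["content"])  # ZPLR_CHDN_UPN_ECN"}
```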
There are downsides to this approach though.
Are you guaranteed that there is only ever a single ! in the content? How would you handle multiple?
You are doing a substring on some JSON as raw text. Splitting on ! will leave a trailing "} at the end of the string.
These look like log entries, you may want to consider looking into ConsumeKafkaRecord and utilising NiFi's Record capabilities to interpret and manipulate the data more intelligently.
On scripting, there are some great cookbooks for learning to script in NiFi, start here: https://community.cloudera.com/t5/Community-Articles/ExecuteScript-Cookbook-part-1/ta-p/248922
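In an ExecuteScript body, session.get() returns a FlowFile, not its text, so the content has to be read and written through the session callbacks the cookbook describes. The string step itself is ordinary Python. This sketch (function names are hypothetical) contrasts cutting after the first and after the last '!', the latter being the idea behind substringAfterLast. Returning the input unchanged when there is no '!' is a choice made here, not necessarily NiFi's behaviour.

```python
def substring_after(text: str, sep: str) -> str:
    """Text after the first occurrence of sep (input returned unchanged if sep is absent)."""
    _, found, tail = text.partition(sep)
    return tail if found else text


def substring_after_last(text: str, sep: str) -> str:
    """Text after the last occurrence of sep (input returned unchanged if sep is absent)."""
    _, found, tail = text.rpartition(sep)
    return tail if found else text


value = "a!b!c"
print(substring_after(value, "!"))       # b!c
print(substring_after_last(value, "!"))  # c
print(value.split("!")[1])               # b  (split()[1] keeps only the middle piece)
```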
Edit:
Given your update, I would use UpdateRecord with a JSON Reader and Writer, and Replacement Value Strategy set to Record Path Value.
This uses the RecordPath syntax to perform transformations on data within Records. Your JSON Object is a Record. This would allow you to have multiple Records within the same FlowFile (rather than 1 line per FlowFile).
Then, add a dynamic property to the UpdateRecord with:
Name: /Tagname
Value: substringAfter(/Tagname, '!' )
What is this doing?
The Name of the property (/Tagname) is a RecordPath to the Tagname key in your JSON. This tells UpdateRecord where to put the result. In your case, we're replacing the value of an existing key (but it could also be a new key if you wanted to add one).
The Value of the property is the expression to evaluate to build the value you want to insert. We are using the substringAfter function, which takes 2 parameters. The first parameter is the RecordPath to the Key in the Record that contains the input String, which is also /Tagname (we're replacing the value of Tagname, with a substring of the original Tagname value). The second parameter is the String to split on, which is !.
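The per-record effect of that dynamic property can be approximated in plain Python. This is only a model of UpdateRecord, not its implementation: it assumes the FlowFile holds a JSON array of objects with a string Tagname field, and the sample record is hypothetical. Because it works on parsed records rather than raw text, no stray "} is left behind.

```python
import json


def substring_after(value: str, sep: str) -> str:
    _, found, tail = value.partition(sep)
    return tail if found else value


def update_tagname(content: str) -> str:
    """Apply /Tagname = substringAfter(/Tagname, '!') to every record.

    Assumes content is a JSON array of objects (a hypothetical layout).
    """
    records = json.loads(content)
    for record in records:
        tagname = record.get("Tagname")
        if isinstance(tagname, str):
            record["Tagname"] = substring_after(tagname, "!")
    return json.dumps(records)


# Hypothetical record; only the Tagname value is rewritten.
print(update_tagname('[{"Tagname": "SOME_PREFIX!ZPLR_CHDN_UPN_ECN", "value": 1}]'))
# -> [{"Tagname": "ZPLR_CHDN_UPN_ECN", "value": 1}]
```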

If your goal is to get the string between ! and "}, use ReplaceText with the Search Value (.*)!(.*)"} and replace the entire content with the second capture group (Replacement Value: $2).
Please note that this regular expression may not be the best fit for your case, but I believe you can solve your problem with a regular expression.
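A quick way to check what that pattern captures is Python's re module, which treats this expression the same way for simple single-line input. The sketch assumes the Replacement Value is $2 (written \2 in Python) and that the whole content is evaluated at once; the samples are hypothetical. Note that the first (.*) is greedy, so with several '!' characters the result is the text after the last one.

```python
import re

# Search Value from the answer; Replacement Value $2 is written \2 in Python.
SEARCH = re.compile(r'(.*)!(.*)"}')


def keep_second_group(content: str) -> str:
    """Replace each match with its second capture group."""
    return SEARCH.sub(r"\2", content)


print(keep_second_group('{"Tagname":"SOME_PREFIX!ZPLR_CHDN_UPN_ECN"}'))  # ZPLR_CHDN_UPN_ECN
print(keep_second_group('{"Tagname":"a!b!c"}'))                            # c
```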

Related

JMeter - How to pass a comma-separated string as one value through parameterization

From a CSV file, I need to pass
224,329,429
as a single value to one of the parameters in an HTTP request.
I have parameterized it using CSV Data Set Config, but only 224 is being passed.
I want 224,329,429 to be treated as a single value.
Please let me know how I can achieve this. Should I change anything in the CSV config or the CSV file to make this work?
Just use __StringFromFile() function instead of using CSV Data Set Config.
The __StringFromFile() function reads the next line from the file each time it's called, so it seems a lot easier to stick with it for your particular scenario.
The syntax is as simple as ${__StringFromFile(/path/to/your/file.csv,,,)} and the function can be used anywhere in the script, e.g. directly in the request parameters section.
See Apache JMeter Functions - An Introduction to get started with the JMeter Functions concept and comprehensive information on the above and other JMeter functions.
You should change your delimiter to an unused character, e.g. #.
That way you will get the full line for every request.
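Why the delimiter matters can be shown with Python's csv module standing in for CSV Data Set Config. It is not JMeter, only a model of the same splitting rule: with a comma delimiter the first variable receives only 224, while with # the whole line survives. The helper name is hypothetical.

```python
import csv
import io


def first_column(line: str, delimiter: str) -> str:
    """Value a CSV reader would assign to the first variable of this line."""
    row = next(csv.reader(io.StringIO(line), delimiter=delimiter))
    return row[0]


print(first_column("224,329,429", ","))  # 224
print(first_column("224,329,429", "#"))  # 224,329,429
```

Python's reader also keeps a quoted field such as "224,329,429" together; CSV Data Set Config has an "Allow quoted data?" option aimed at the same case.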
Use the ${__FileToString(dummy.csv,,payloadvar)} function. It does not depend on the file extension, so you can use any plain-text file, e.g. .txt or .csv.
Just keep the string in dummy.csv and it will fetch the whole string.
The benefit of this function is that it does not treat commas as delimiters, so if your string contains comma-separated values, this is the best option.
Just use %2C (the URL-encoded comma) in place of each comma.
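With %2C in the file there is no comma left for CSV Data Set Config to split on, and a server that URL-decodes the parameter gets the commas back. This assumes JMeter sends the value as-is; if the parameter's "URL Encode?" box is ticked, the % itself would be encoded again. Python's urllib.parse shows the round trip:

```python
from urllib.parse import quote, unquote

raw = "224,329,429"
encoded = quote(raw, safe="")   # what to put in the CSV file
decoded = unquote(encoded)      # what a decoding server would see

print(encoded)  # 224%2C329%2C429
print(decoded)  # 224,329,429
```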

Applying string manipulations / mathematical operations to the contents of a flow file in NiFi

I have a flow file coming in, which has fixed-width data in the following format:
ABC 0F 15343543543454434 gghhhhhg
ABC 01 433534343434 hjvh
I want to have my output data in the following format:
ABC|15|15343543543454434|gghhhhhg
ABC|1|433534343434|hjvh
To get this output, I need to convert the second field in each line to a base-10 integer and apply a strip operation to all the other fields to trim the whitespace.
I tried using the ReplaceText processor, but I could not find a way to convert the second field to a base-10 integer or apply a strip function to the string fields.
Working with hexadecimal numbers is not easily done in the current release of NiFi. To get it to work, you'd need to use one of the scripting processors, ExecuteScript or InvokeScriptedProcessor.
That said, numeric evaluation is one of my focuses for the upcoming release (which is currently being curated for finalization), and I've been able to create a solution involving just the ReplaceText processor. I used the following configuration:
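For reference, the transformation itself is short in Python. This is the kind of logic an ExecuteScript body would apply to the content, with the FlowFile read/write plumbing omitted and hypothetical function names. It assumes the fields are separated by whitespace, as in the sample; with genuinely fixed-width columns you would slice by column offsets, which the question does not give.

```python
def transform_line(line: str) -> str:
    """'ABC 0F 15343543543454434 gghhhhhg' -> 'ABC|15|15343543543454434|gghhhhhg'."""
    name, hex_code, number, text = line.split()  # split() also drops surrounding whitespace
    return "|".join([name, str(int(hex_code, 16)), number, text])


def transform(content: str) -> str:
    """Apply transform_line to every non-blank line of the content."""
    return "\n".join(transform_line(line) for line in content.splitlines() if line.strip())


print(transform("ABC 0F 15343543543454434 gghhhhhg\nABC 01 433534343434 hjvh"))
```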
Search Value: ^(\w*)\ *(\w*)\ *(\d*)\ *(\w*)$
Replacement Value: $1|${'$2':prepend('0x'):append('p0'):toNumber()}|$3|$4
Replacement Strategy: Regex Replace
Evaluation Mode: Line-by-line
The rest is up to your use case (i.e. whichever character set it is in). The Search Value creates a capture group for each of the sections. Then, in the Replacement Value, I use the second group (the one for the hex digits) in an Expression Language function to convert it to base 10. The purpose of the "prepend" and "append" is that on the current master only decimals/doubles accept hex numbers (I need to improve that), so I just format it as a double.
So while it is unfortunate that this use case isn't currently handled out of the box, it soon will be!
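The prepend/append trick turns the captured hex digits into a hexadecimal floating-point literal such as 0x0Fp0, which equals 15.0. Python's float.fromhex understands the same notation, so this sketch mirrors the ReplaceText configuration (same Search Value, line by line via re.MULTILINE). It is a model, not NiFi's Expression Language; converting the double back to an integer is done here to match the desired output.

```python
import re

# Same Search Value as the ReplaceText configuration, applied line by line.
PATTERN = re.compile(r"^(\w*)\ *(\w*)\ *(\d*)\ *(\w*)$", re.MULTILINE)


def hex_as_double(hex_digits: str) -> float:
    """Mirror prepend('0x'):append('p0'): build a hex float literal like 0x0Fp0."""
    return float.fromhex("0x" + hex_digits + "p0")


def replace_line(match) -> str:
    name, hex_code, number, text = match.groups()
    if not hex_code:
        # Leave lines without a hex field (e.g. a trailing empty line) untouched.
        return match.group(0)
    return f"{name}|{int(hex_as_double(hex_code))}|{number}|{text}"


def replace_text(content: str) -> str:
    return PATTERN.sub(replace_line, content)


print(hex_as_double("0F"))  # 15.0
print(replace_text("ABC 0F 15343543543454434 gghhhhhg\nABC 01 433534343434 hjvh"))
```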
Edit: I've created a Jira to track adding hex -> whole numbers in EL here: https://issues.apache.org/jira/browse/NIFI-2950
Edit2: A commit addressing the issue has been merged to master and will be in versions 1.1+: https://github.com/apache/nifi/commit/c4be800688bf23a3bdea8def75b84c0f4ded243d

How to Select Only Alphanumeric characters from a string in Datastage?

I am facing a problem with my data: a column contains characters other than alphanumerics. For example, the Name column has values like Ravicᅩhandr¬an, with many characters such as (¬ᅩ○`). I need a result like Ravichandran. How can I achieve this? Is there any way to remove them in the Transformer stage?
I tried the Convert function in the Transformer stage, but the problem with Convert is that I don't know in advance which unknown characters will appear; the ones shown above are just examples.
My requirement is that everything other than alphanumeric characters must be removed, and the rest of the string should stay the same.
How can I get this done?
The following Convert function can be used in Transformer stage to remove any kind of unknown/special characters from the column.
Convert(Convert('ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789 ','', Column_Name1),'',Column_Name1)
Example: Convert(Convert('ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789 ','', to_txm.SourceCode),'',to_txm.SourceCode)
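The nested call works in two passes: the inner Convert deletes every allowed character, leaving only the unwanted ones, and the outer Convert deletes exactly those from the original value. Here is a Python model of that idea (not DataStage code, hypothetical function names), assuming Convert deletes characters of the from list that have no counterpart in an empty to list:

```python
ALLOWED = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789 "


def convert_delete(chars: str, value: str) -> str:
    """Model of Convert(chars, '', value): every character listed in chars is removed."""
    return value.translate(str.maketrans("", "", chars))


def keep_alphanumeric(value: str) -> str:
    unwanted = convert_delete(ALLOWED, value)  # inner Convert: only unwanted characters remain
    return convert_delete(unwanted, value)     # outer Convert: drop them from the original


print(keep_alphanumeric("Ravicᅩhandr¬an"))  # Ravichandran
```

The advantage of this shape is that the unwanted characters never have to be listed up front, which is exactly the problem described in the question.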

Spring Batch FlatFileItemWriter to add a comma at the end of each line

We have a request to format rows like this:
886,89,5052299385882,1,
The problem is the last character of the row, which should be a comma. It is an export job for an integration, so this requirement is dictated by the other side. Is there an easy way to achieve this with FlatFileItemWriter?
Currently we modeled our Java representation of the row to have an additional string which is always empty, and told the field extractor to extract the blank field as the last value for row creation, but I am looking for a way to append something to each line.
FlatFileItemWriter has a lineAggregator property.
Write your own implementation of the LineAggregator interface (delegating to your existing aggregator should be enough) and append a comma to the returned string.
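In Java this means implementing LineAggregator<T> and its String aggregate(T item) method, typically delegating to the DelimitedLineAggregator you already use. Keeping to Python for the examples here, the same decorator idea looks like this; the class and helper names are hypothetical stand-ins, not Spring Batch APIs.

```python
from typing import Callable, Sequence


class TrailingCommaLineAggregator:
    """Decorator around another aggregator that appends ',' to every line.

    Hypothetical Python stand-in for a Java LineAggregator<T> implementation.
    """

    def __init__(self, delegate: Callable[[Sequence[object]], str]) -> None:
        self.delegate = delegate

    def aggregate(self, item: Sequence[object]) -> str:
        return self.delegate(item) + ","


def comma_delimited(fields: Sequence[object]) -> str:
    """Hypothetical stand-in for the delimited aggregator you already have."""
    return ",".join(str(field) for field in fields)


line_aggregator = TrailingCommaLineAggregator(comma_delimited)
print(line_aggregator.aggregate((886, 89, 5052299385882, 1)))  # 886,89,5052299385882,1,
```

Delegating keeps the existing field extraction untouched, so the always-empty extra field in the row model is no longer needed.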

How do I concatenate strings in Pentaho Spoon?

I am a newbie to Pentaho (installed it today). I was able to do a basic transformation in Spoon. Now I need to do something I can't figure out.
My input looks like:
2012-09-17|garima|fbhjgjhgj87687jghgj88jgjj|garima#1347868164626|::ffff:120.56.132.137
The 3rd field is an ID, for which I need to get some information from a REST API:
http://api.app.com/app/api/v1/feature/fbhjgjhgj87687jghgj88jgjj
What do I need to do in Spoon to get this done?
Also, the data returned will be in JSON format. How do I parse that?
You should first read your input with a CSV File Input step using | as the delimiter. Then you can get the 3rd field as a string.
Next you probably need to remove all spaces from this string with a String operations step. Look at the Remove special character column, and select space.
Then you need to concatenate it with your HTTP address http://api.app.com/app/api/v1/feature/. For this you'll use a Calculator step. In this step, first create a new temporary field tmpAddr with the operation Define a constant value for ... (or something like this; sorry, my Spoon is in Portuguese). In the Field A column, write your HTTP address. Once this works, it's good practice to set your address as a system variable, so that if it changes you don't need to replace it everywhere in your transformations (look at the menu Edit -> System Variables).
Now on the same Calculator step create another field, let's say MyAddress, with operation A+B. Choose for Field A the field tmpAddr you just created, and for Field B the 3rd field from your input.
Now on your stream you should have the full address as a field MyAddress. Connect a REST client step. Mark Accept URL from field and choose field MyAddress as URL Field Name. Set Application Type to JSON. Set Result Fieldname as MyResult.
If you need further JSON parsing, you can add a Json Input step. Set Source is defined in a field, and select the field MyResult as Get Source from field.
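Outside Spoon, the same chain of steps can be sketched in a few lines of Python to check the expected URL. This is not Pentaho code and the function names are hypothetical; the REST call is left out, and parse_result only marks where JSON decoding happens, since the API's response format is not given.

```python
import json

BASE_URL = "http://api.app.com/app/api/v1/feature/"


def build_url(line: str) -> str:
    fields = line.split("|")                  # CSV File Input with | as delimiter
    feature_id = fields[2].replace(" ", "")   # String operations: remove spaces
    return BASE_URL + feature_id              # Calculator: constant A + field B


def parse_result(body: str) -> dict:
    """Where the Json Input step would decode the MyResult field."""
    return json.loads(body)


line = "2012-09-17|garima|fbhjgjhgj87687jghgj88jgjj|garima#1347868164626|::ffff:120.56.132.137"
print(build_url(line))  # http://api.app.com/app/api/v1/feature/fbhjgjhgj87687jghgj88jgjj
```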
An alternate approach is to use the "Replace in String" step to append the string.
Set 'use RegEx' to Y
Set 'Search' to (.*)
Set 'Replace with' to http://api.app.com/app/api/v1/feature/$1
Set 'Whole Word' to Y
The parentheses in the regex set up a capture group that you can then insert into your replacement string with the $X syntax
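Python's re.sub accepts a similar capture-group replacement (spelled \1 instead of $1), which makes it a quick way to test the expression outside Spoon. This sketch (hypothetical function name) anchors the pattern with ^ and $, because in Python a bare (.*) also matches the empty string at the end of the input and the prefix would be inserted twice.

```python
import re


def prepend_base(value: str) -> str:
    # ^...$ makes the pattern match the whole value exactly once; a bare (.*) would
    # also match the empty string at the end in Python and add the prefix twice.
    return re.sub(r"^(.*)$", r"http://api.app.com/app/api/v1/feature/\1", value, count=1)


print(prepend_base("fbhjgjhgj87687jghgj88jgjj"))
# http://api.app.com/app/api/v1/feature/fbhjgjhgj87687jghgj88jgjj
```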
