I am a newbie to Pentaho (installed today). I was able to do a basic transformation in Spoon. Now I need to do something that I can't figure out.
My input looks like:
2012-09-17|garima|fbhjgjhgj87687jghgj88jgjj|garima#1347868164626|::ffff:120.56.132.137
The 3rd field is an ID, for which I need to get some information from a REST API:
http://api.app.com/app/api/v1/feature/fbhjgjhgj87687jghgj88jgjj
What do I need to do in Spoon to get this done?
Also, the data returned will be in JSON format. How do I parse that?
You should first get your input with a CSV File Input using | as delimiter. Then you can get the 3rd field as a string.
Next you probably need to remove all spaces from this string with a String operations step. Look at the Remove special character column, and select space.
Then you need to concatenate it with your HTTP address http://api.app.com/app/api/v1/feature/. For this you'll use a Calculator step. In this step, first create a new temporary field tmpAddr, with operation Define a constant value for ... (or something like this, sorry, my Spoon is in Portuguese). In the Field A column you'll write your HTTP address. Once this works, it's good practice to set your address as a system variable, so if it changes you don't need to replace it everywhere in your transformations (look at menu Edit -> System Variables).
Now on the same Calculator step create another field, let's say MyAddress, with operation A+B. Choose for Field A the field tmpAddr you just created, and for Field B the 3rd field from your input.
Now on your stream you should have the full address as a field MyAddress. Connect a REST client step. Mark Accept URL from field and choose field MyAddress as URL Field Name. Set Application Type to JSON. Set Result Fieldname as MyResult.
If you need further JSON parsing you can add a Json input step. Set Source is defined in a field and select field MyResult as Get Source from field.
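For readers who want to check the logic outside Spoon, the same chain of steps can be sketched in plain Python. This is only an illustration of what the steps do, not how Pentaho implements them. The function name build_feature_url and the JSON body are hypothetical (the real API's response fields are not known here), and no HTTP request is made.

```python
import json

BASE_URL = "http://api.app.com/app/api/v1/feature/"  # address from the question


def build_feature_url(line: str, base_url: str = BASE_URL) -> str:
    """Mirror the CSV File Input, String operations and Calculator steps."""
    fields = line.split("|")                  # CSV File Input with | as delimiter
    feature_id = fields[2].replace(" ", "")   # 3rd field with spaces removed
    return base_url + feature_id              # Calculator: A + B


line = "2012-09-17|garima|fbhjgjhgj87687jghgj88jgjj|garima#1347868164626|::ffff:120.56.132.137"
my_address = build_feature_url(line)
print(my_address)

# Hypothetical response body standing in for the MyResult field.
my_result = '{"id": "fbhjgjhgj87687jghgj88jgjj", "enabled": true}'
parsed = json.loads(my_result)                # roughly what the Json input step does
print(parsed["enabled"])
```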
An alternate approach is to use the "Replace in String" step to append the string.
Set 'use RegEx' to Y
Set 'Search' to (.*)
Set 'Replace with' to http://api.app.com/app/api/v1/feature/$1
Set 'Whole Word' to Y
The parentheses in the regex set up a capture group that you can then insert into your replacement string with the $X syntax
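To see the capture group at work outside Spoon, here is a minimal Python sketch of the same replacement. Two things differ from the step settings above and are assumptions of this sketch: Python writes the group reference as \1 rather than $1, and the pattern is anchored with ^ and $ so that Python does not also match the empty string at the end of the input. The function name prepend_prefix is hypothetical.

```python
import re


def prepend_prefix(value: str) -> str:
    """Capture the whole field and re-insert it after the URL prefix."""
    # (.*) captures the entire value; \1 places it in the replacement.
    return re.sub(r"^(.*)$", r"http://api.app.com/app/api/v1/feature/\1", value)


print(prepend_prefix("fbhjgjhgj87687jghgj88jgjj"))
```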
I'm very new to NiFi.
I get data (a FlowFile?) from my "ConsumeKafka" processor; it looks like
So, I have to delete any text before '!'. I know a little Python, so with "ExecuteScript" I want to do something like this:
my_string=session.get()
my_string.split('!')[1]
#it return "ZPLR_CHDN_UPN_ECN....."
But how do I do it right?
P.S. Or maybe use "substringAfterLast", but how?
Thanks.
Update:
I have to remove the text between '"Tagname":' and '!'. How can I do it without regex?
If you simply want to split on a bang (!) and only keep the text after it, then you could achieve this with a SplitContent configured as:
Byte Sequence Format: Text
Byte Sequence: !
Keep Byte Sequence: false
Follow this with a RouteOnAttribute configured as:
Routing Strategy: Route to Property name
Add a new dynamic property called "substring_after" with a value: ${fragment.index:equals(2)}
For your input, this will produce 2 FlowFiles - one with the substring before ! and one with the substring after !. The first FlowFile (substring before) will route out of the RouteOnAttribute to the unmatched relationship, while the second FlowFile (substring after) will route to a substring_after relationship. You can auto-terminate the unmatched relationship to drop the text you don't want.
There are downsides to this approach though.
Are you guaranteed that there is only ever a single ! in the content? How would you handle multiple?
You are doing a substring on some JSON as raw text. Splitting on ! will result in a "} left at the end of the string.
These look like log entries; you may want to consider looking into ConsumeKafkaRecord and utilising NiFi's Record capabilities to interpret and manipulate the data more intelligently.
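The first two downsides are easy to reproduce in plain Python. The content strings below are hypothetical stand-ins for the FlowFile content, since the real payload is not shown here.

```python
content = '{"Tagname":"abc!def"}'        # hypothetical FlowFile content

# Splitting raw JSON text on the first bang keeps the closing characters.
before, sep, after = content.partition("!")
print(after)                             # the trailing "} is still attached

multi = '{"Tagname":"abc!def!ghi"}'      # hypothetical content with two bangs
parts = multi.split("!")
print(len(parts))                        # more than two fragments to route
```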
On scripting, there are some great cookbooks for learning to script in NiFi. Start here: https://community.cloudera.com/t5/Community-Articles/ExecuteScript-Cookbook-part-1/ta-p/248922
Edit:
Given your update, I would use UpdateRecord with a JSON Reader and Writer, and Replacement Value Strategy set to Record Path Value.
This uses the RecordPath syntax to perform transformations on data within Records. Your JSON Object is a Record. This would allow you to have multiple Records within the same FlowFile (rather than 1 line per FlowFile).
Then, add a dynamic property to the UpdateRecord with:
Name: /Tagname
Value: substringAfter(/Tagname, '!' )
What is this doing?
The Name of the property (/Tagname) is a RecordPath to the Tagname key in your JSON. This tells UpdateRecord where to put the result. In your case, we're replacing the value of an existing key (but it could also be a new key if you wanted to add one).
The Value of the property is the expression to evaluate to build the value you want to insert. We are using the substringAfter function, which takes 2 parameters. The first parameter is the RecordPath to the Key in the Record that contains the input String, which is also /Tagname (we're replacing the value of Tagname, with a substring of the original Tagname value). The second parameter is the String to split on, which is !.
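As a rough illustration of what this configuration does to one record, here is a plain Python sketch. It is not NiFi code: substring_after and update_tagname are hypothetical helpers, the sample record is made up, and returning the value unchanged when ! is absent is an assumption of this sketch.

```python
import json


def substring_after(value: str, search: str) -> str:
    """Return the text after the first occurrence of search.

    Assumption: the value is returned unchanged when search is absent.
    """
    _, sep, tail = value.partition(search)
    return tail if sep else value


def update_tagname(json_text: str) -> str:
    """Replace the Tagname value with the part after the bang."""
    record = json.loads(json_text)                       # JSON Reader
    record["Tagname"] = substring_after(record["Tagname"], "!")
    return json.dumps(record)                            # JSON Writer


print(update_tagname('{"Tagname": "abc!def"}'))
```

Because the value is parsed as JSON first, the closing quote and brace never end up inside the result, unlike the raw text split.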
If your goal is to get the string between ! and "}, use ReplaceText with the regex (.*)!(.*)"}, capture the second group, and replace the entire content with it.
Please note that this regular expression may not be the best one for your case, but I believe you can solve your problem with a regular expression.
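The regex can be tried out in Python before configuring the processor. The sample content is hypothetical, Python writes the second group as \2, and the closing brace is escaped here for clarity.

```python
import re

PATTERN = r'(.*)!(.*)"\}'


def keep_second_group(content: str) -> str:
    """Replace the whole match with the text captured between ! and "}."""
    return re.sub(PATTERN, r"\2", content)


print(keep_second_group('{"Tagname":"abc!def"}'))       # hypothetical content
print(keep_second_group('{"Tagname":"abc!def!ghi"}'))   # two bangs in the value
```

Note that the first group is greedy, so with several bangs only the text after the last one is kept.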
I am trying to get organizations resource for Name = 'G&A' using the following API
https://xxx/hcmCoreSetupApi/resources/11.13.18.02/organizations/?onlyData=true&q=Name='G&A'
But I am getting the error "URL request parameter A' cannot be used in this context."
Thank you for the help in advance
The ampersand & character is used as a separator between query parameters. If you want to pass an ampersand as part of a query parameter's value, then use its percent-encoded (hexadecimal) form %26 instead of &:
https://xxx/hcmCoreSetupApi/resources/11.13.18.02/organizations/?onlyData=true&q=Name='G%26A'
However, that is still invalid as you have too many equals = characters in that string; so did you intend to have three parameters named onlyData, q and Name? Then you would encode them like this:
https://xxx/hcmCoreSetupApi/resources/11.13.18.02/organizations/?onlyData=true&q=&Name='G%26A'
Or, if you really had intended to have two parameters named onlyData and q=Name, then you would need to encode the equals = character in the parameter name as well:
https://xxx/hcmCoreSetupApi/resources/11.13.18.02/organizations/?onlyData=true&q%3DName='G%26A'
Or, if Name= is part of the value not the key then:
https://xxx/hcmCoreSetupApi/resources/11.13.18.02/organizations/?onlyData=true&q=Name%3D'G%26A'
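If the URL is built in code, a standard library can do the encoding for the last case (Name= as part of the value). The sketch below uses Python's urllib.parse. Keeping the single quotes unencoded via safe="'" is a choice made here to match the URLs above, and the variable names are hypothetical.

```python
from urllib.parse import parse_qsl, quote

BASE = "https://xxx/hcmCoreSetupApi/resources/11.13.18.02/organizations/"

raw_filter = "Name='G&A'"
encoded_filter = quote(raw_filter, safe="'")   # encodes = and &, keeps the quotes
url = f"{BASE}?onlyData=true&q={encoded_filter}"
print(url)

# How a query-string parser splits the two variants:
unencoded = parse_qsl("onlyData=true&q=Name='G&A'", keep_blank_values=True)
encoded = parse_qsl(f"onlyData=true&q={encoded_filter}", keep_blank_values=True)
print(unencoded)   # the text after & becomes a separate parameter A'
print(encoded)     # q carries the full filter expression
```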
I'm using AutoKey to improve my productivity. I created a sample script with an abbreviation that triggers when the Tab key is pressed, and AutoKey then executes the following code:
output = "My phrase"
keyboard.send_keys(output)
This works with simple phrases, but I need to pass parameters in the abbreviation and have AutoKey use those parameters to generate the phrase. For example, with the abbreviation abb-param1-param2, AutoKey should output the phrase:
This is the message, and you send the parameters: parameter 1: param1, parameter 2: param2.
My problem is that I can't get the word before the text cursor in order to split it on the '-' separator and take the last two elements of the resulting array as parameters for the new phrase.
Does anybody know what I can do to solve this problem?
Thank you.
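The string handling described in the question can be sketched in plain Python as follows. It does not cover how AutoKey exposes the typed abbreviation to the script, so the abbreviation argument and the function name build_phrase are hypothetical.

```python
def build_phrase(abbreviation: str) -> str:
    """Split on '-' and use the last two parts as the parameters."""
    parts = abbreviation.split("-")
    if len(parts) < 3:
        raise ValueError("expected something like abb-param1-param2")
    param1, param2 = parts[-2], parts[-1]
    return (
        "This is the message, and you send the parameters: "
        f"parameter 1: {param1}, parameter 2: {param2}."
    )


output = build_phrase("abb-param1-param2")
print(output)
```

Inside an AutoKey script, the resulting string could then be passed to keyboard.send_keys(output) as in the script shown earlier.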
I'm trying to search for all Observations where "blood" is associated with the code using:
GET [base]/Observation?code:text=blood
It appears that the search is matching Observations where the associated text starts with "blood" but not matching on associated text that contains "blood".
Using the following, I get results with a Coding.display of "Systolic blood pressure" but I'd like to also get these Observations by searching using the text "blood".
GET [base]/Observation?code:text=sys
Is there a different modifier I should be using or wildcards I should use?
The servers seem to do as the spec requests: when using the modifier :text on a token search parameter (like code here), the spec says:
":text The search parameter is processed as a string that searches
text associated with the code/value"
If we look at how a server is supposed to search a string, we find:
"By default, a field matches a string query if the value of the field
equals or starts with the supplied parameter value, after both have
been normalized by case and accent."
Now, if code had been a true string search parameter, we could have applied the modifier :contains. However, we cannot stack modifiers, so in this case code:text:contains would be logical, but it is not part of the current specification.
So, I am afraid that there is currently no "standard" way to do what you want.
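The difference between the default string matching quoted above and a contains match can be illustrated with a small Python sketch. This is not server code: the function names are hypothetical, and the accent stripping shown is just one plausible way to normalize by case and accent.

```python
import unicodedata


def normalize(text: str) -> str:
    """Normalize by case and accent (one possible approach)."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def default_string_match(field: str, query: str) -> bool:
    """Default behaviour: the field equals or starts with the query."""
    return normalize(field).startswith(normalize(query))


def contains_match(field: str, query: str) -> bool:
    """What a contains-style match would do instead."""
    return normalize(query) in normalize(field)


display = "Systolic blood pressure"
print(default_string_match(display, "sys"))     # matches at the start
print(default_string_match(display, "blood"))   # does not match
print(contains_match(display, "blood"))         # would match
```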
We have a request to format each row like this:
886,89,5052299385882,1,
The problem is the last character of the row, which must be a comma. It is an export job for an integration, so this requirement is dictated by the other side. Is there an easy way to achieve this with FlatFileItemWriter?
Currently we have modeled our Java representation of the row with an additional string that is always empty, and told the field extractor to extract this blank field as the last value of the row, but I am looking for a way to append something to each line.
FlatFileItemWriter has a lineAggregator property.
Write your own implementation of the LineAggregator interface (delegation should be enough) and append a comma to the returned string.