How do I perform name standardization in NiFi using the UpdateRecord processor? - apache-nifi

In my NiFi flow, I need to perform name standardization for a specific column.
Examples include:
Making the name title case
If it contains "mc" before the rest of the name, such as "mcdonald", make it "McDonald"
and other similar rules.
How do I perform all of these in a single pass in the UpdateRecord processor?
Also, I don't see any function for making a name title case in the NiFi Expression Language. I only see toUpper and toLower. How do I build the logic? Do I need to create a custom property for this?
Please let me know. Thanks.

This is possible with the WordUtils.capitalizeFully method from Apache Commons Lang. See also this question.
ScriptedTransformRecord processor:
Record Reader: JsonTreeReader
Record Writer: JsonRecordSetWriter
Script Language: Groovy
Script Body:
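import org.apache.commons.lang3.text.WordUtils
// Title-case the "text" field of the current record (e.g. "hELLo" -> "Hello")
record.setValue("text", WordUtils.capitalizeFully(record.getValue("text") as String))
// The record returned by the script is the one written to the output
record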
Example input JSON:
[
{
"text": "man OF stEEL"
},
{
"text": "hELLo"
}
]
Output JSON:
[
{
"text": "Man Of Steel"
},
{
"text": "Hello"
}
]
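capitalizeFully covers the title-case rule, but not the question's "McDonald" rule. The sketch below shows both steps in Python purely for illustration; in NiFi the same logic would be ported into the Groovy script body. The function names and the single "Mc" rule are assumptions, and real name data usually needs more rules than this.

```python
import re


def capitalize_fully(text: str) -> str:
    # Mimics WordUtils.capitalizeFully with its default (whitespace) delimiters:
    # each whitespace-separated word gets an upper-case first letter, the rest lower-case.
    return re.sub(r"\S+", lambda m: m.group(0)[0].upper() + m.group(0)[1:].lower(), text)


def standardize_name(name: str) -> str:
    titled = capitalize_fully(name)
    # Hypothetical extra rule: "Mcdonald" -> "McDonald".
    # \b makes "Mc" match only at the start of a word.
    return re.sub(r"\bMc([a-z])", lambda m: "Mc" + m.group(1).upper(), titled)


print(standardize_name("man OF stEEL"))     # Man Of Steel
print(standardize_name("RONALD mcdonald"))  # Ronald McDonald
```

Applying the title case first and the "Mc" fix second keeps each rule simple: the second regex can assume every word already starts with a capital letter.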

Related

Is there a way to write an Expression in Power Automate to retrieve item from SurveyMonkey?

There is no dynamic content you can get from the SurveyMonkey trigger in Power Automate except for the Analyze URL, Created Date, and Link. Is it possible to retrieve the data with an expression so that I can add fields to SharePoint or send emails based on answers to questions?
For instance, here is some JSON data for a county multiple-choice field. I would like to know the county so that the email is sent to the correct person:
{
"id": "753498214",
"answers": [
{
"choice_id": "4963767255",
"simple_text": "Williamson"
}
],
"family": "single_choice",
"subtype": "menu",
"heading": "County where the problem is occurring:"
}
More generally, is there a way to create dynamic fields from the content so that it is more usable?
I am a novice, so your answer will have to assume I know nothing!
Thanks for considering the question.
So far, nothing I have tried has worked!
I was able to get an answer on the Microsoft Power Users support forum.
Put this data in a Compose action:
{
"id": "753498214",
"answers": [
{
"choice_id": "4963767255",
"simple_text": "Williamson"
}
],
"family": "single_choice",
"subtype": "menu",
"heading": "County where the problem is occurring:"
}
Then use these expressions in additional Compose actions:
To get choice_id:
outputs('Compose')?['answers']?[0]?['choice_id']
To get simple_text:
outputs('Compose')?['answers']?[0]?['simple_text']
Here is the reference link where I found the answer:
https://powerusers.microsoft.com/t5/General-Power-Automate/How-to-write-an-expression-to-retrieve-answer/m-p/1960784#M114215
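The `?[...]` operator in those expressions returns null instead of failing when a key or index is missing. As a rough analogue only (not Power Automate itself), the Python sketch below walks the same path through the Compose data; `safe_get` is a hypothetical helper.

```python
import json


def safe_get(value, *path):
    # Rough analogue of Power Automate's ?[...] operator: each step yields
    # None instead of raising when a key or index is missing.
    for step in path:
        if isinstance(value, dict):
            value = value.get(step)
        elif isinstance(value, list) and isinstance(step, int) and 0 <= step < len(value):
            value = value[step]
        else:
            return None
    return value


compose_output = json.loads("""
{"id": "753498214",
 "answers": [{"choice_id": "4963767255", "simple_text": "Williamson"}],
 "family": "single_choice", "subtype": "menu",
 "heading": "County where the problem is occurring:"}
""")

# Same path as outputs('Compose')?['answers']?[0]?['simple_text']
county = safe_get(compose_output, "answers", 0, "simple_text")
print(county)  # Williamson
```

Once the county is available as a Compose output, it can drive a Condition or Switch action that picks the email recipient.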

Azure Data Factory REST API paging with Elasticsearch

While developing a pipeline that uses Elasticsearch as a source, I ran into an issue with paging. I am using the Elasticsearch SQL API. I started by sending requests in Postman, and that works well. The request body looks like this:
{
"query":"SELECT Id,name,ownership,modifiedDate FROM \"core\" ORDER BY Id",
"fetch_size": 20,
"cursor" : ""
}
After the first run, the response body contains a cursor string, which points to the next page. If I send the request in Postman with the cursor value from the previous response, it returns the data for the second page, and so on. I am trying to achieve the same result in Azure Data Factory. For this I am using a Copy activity, which stores the response in Azure Blob Storage. The source setup is as follows:
[Screenshot: Copy activity source configuration]
This is the expression for the body:
{
"query": "SELECT Id,name,ownership,modifiedDate FROM \"#{variables('TableName')}\" ORDER BY Id",
"fetch_size": #{variables('Rows')},
"cursor": ""
}
I have no idea how to set up the pagination rule correctly. The pipeline works, but only for the first request. I've tried setting Headers.cursor with the expression $.cursor, but this leads to an infinite loop and the pipeline fails on an Elasticsearch restriction.
I've also tried reading the documentation at https://learn.microsoft.com/en-us/azure/data-factory/connector-rest#pagination-support, but it has few usage examples and is difficult to understand.
Could somebody help me understand how to build a pipeline that makes use of paging?
The response with the cursor looks like this:
{
"columns": [
{
"name": "companyId",
"type": "integer"
},
{
"name": "name",
"type": "text"
},
{
"name": "ownership",
"type": "keyword"
},
{
"name": "modifiedDate",
"type": "datetime"
}
],
"rows": [
[
2,
"mic Inc.",
"manufacture",
"2021-03-31T12:57:51.000Z"
]
],
"cursor": "g/WuAwFaAXNoRG5GMVpYSjVWR2hsYmtabGRHTm9BZ0FBQUFBRUp6VGxGbUpIZWxWaVMzcGhVWEJITUhkbmJsRlhlUzFtWjNjQUFBQUFCQ2MwNWhaaVIzcFZZa3Q2WVZGd1J6QjNaMjVSVjNrdFptZDP/////DwQBZgljb21wYW55SWQBCWNvbXBhbnlJZAEHaW50ZWdlcgAAAAFmBG5hbWUBBG5hbWUBBHRleHQAAAABZglvd25lcnNoaXABCW93bmVyc2hpcAEHa2V5d29yZAEAAAFmDG1vZGlmaWVkRGF0ZQEMbW9kaWZpZWREYXRlAQhkYXRldGltZQEAAAEP"
}
I finally found the solution; hopefully it will be useful for the community.
Basically, what needs to be done is to split the solution into four steps.
Step 1: Make the first request as in the question description and stage the file to blob storage.
Step 2: Read the blob file, get the cursor value, and set it to a variable.
Step 3: Keep requesting data with a changed body:
{"cursor" : "#{variables('cursor')}" }
The pipeline looks like this:
[Screenshot: pipeline]
The pagination configuration looks like this:
[Screenshot: pagination rule]
It is a workaround, as the server ignores this header, but we need something that allows sending requests in a loop.
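The cursor loop that these steps reproduce in ADF can be sketched outside ADF as follows. This is a minimal illustration, not the answer's pipeline: `post` is a hypothetical callable assumed to stand in for a POST to the Elasticsearch `_sql` endpoint, and `make_fake_post` is an in-memory fake so the loop can run without a cluster.

```python
from typing import Any, Callable, Dict, Iterator, List

Body = Dict[str, Any]


def fetch_all_rows(post: Callable[[Body], Body], query: str, fetch_size: int) -> Iterator[List[Any]]:
    # First request: the SQL query and page size, as in the Postman body.
    page = post({"query": query, "fetch_size": fetch_size})
    while True:
        yield from page.get("rows", [])
        cursor = page.get("cursor")
        # No cursor in the response means this was the last page.
        if not cursor:
            return
        # Later requests send only the cursor, as in Step 3.
        page = post({"cursor": cursor})


def make_fake_post(rows: List[List[Any]], fetch_size: int) -> Callable[[Body], Body]:
    # Hypothetical stand-in for the server; its "cursor" is just an offset
    # string, unlike the opaque cursor Elasticsearch returns.
    def post(body: Body) -> Body:
        start = int(body.get("cursor", 0))
        page: Body = {"rows": rows[start:start + fetch_size]}
        if start + fetch_size < len(rows):
            page["cursor"] = str(start + fetch_size)
        return page
    return post


rows = [[i, f"company {i}"] for i in range(5)]
fetched = list(fetch_all_rows(make_fake_post(rows, 2), 'SELECT Id, name FROM "core" ORDER BY Id', 2))
print(len(fetched))  # 5
```

The key point is the stop condition: the loop ends when a response arrives without a cursor, which is what the infinite-loop setup in the question was missing.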

How do I use FreeFormTextRecordSetWriter

In my NiFi controller I want to configure the FreeFormTextRecordSetWriter, but I have no idea what I should put in the "Text" field. I'm getting the text from my source (in my case GetSolr) and just want to write it out, period.
The documentation and mailing list do not seem to explain how this is done; any help is appreciated.
EDIT: Here are the sample input and the output I want to achieve (as you can see: no transformation needed, plain text, no JSON input).
EDIT: I now realize that I can't tell GetSolr to return just CSV data; I have to use JSON.
So referencing an attribute seems to be fine. What the documentation omits is that the ${flowFile} attribute should contain the complete FlowFile that is returned.
Sample input:
{
"responseHeader": {
"zkConnected": true,
"status": 0,
"QTime": 0,
"params": {
"q": "*:*",
"_": "1553686715465"
}
},
"response": {
"numFound": 3194,
"start": 0,
"docs": [
{
"id": "{402EBE69-0000-CD1D-8FFF-D07756271B4E}",
"MimeType": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
"FileName": "Test.docx",
"DateLastModified": "2019-03-27T08:05:00.103Z",
"_version_": 1629145864291221504,
"LAST_UPDATE": "2019-03-27T08:16:08.451Z"
}
]
}
}
Desired output:
{402EBE69-0000-CD1D-8FFF-D07756271B4E}
BTW, the documentation says this:
The text to use when writing the results. This property will evaluate the Expression Language using any of the fields available in a Record.
Supports Expression Language: true (will be evaluated using flow file attributes and variable registry)
I want to use my source's text, so I'm confused.
You need to use Expression Language as if the record's fields were the FlowFile's attributes.
Example:
Input:
{
"t1": "test",
"t2": "ttt",
"hello": true,
"testN": 1
}
Text property in FreeFormTextRecordSetWriter:
${t1} k!${t2} ${hello}:boolean
${testN}Num
Output (using ConvertRecord):
test k!ttt true:boolean
1Num
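To make the substitution concrete, here is a toy model in Python of what the Text property does with each record: every ${field} is replaced by that field's value. It is not NiFi's implementation (real Expression Language also supports functions), and `render_free_form` is a hypothetical name. Rendering a missing field as empty text is an assumption mirroring how EL treats a missing attribute.

```python
import re


def render_free_form(template: str, record: dict) -> str:
    # Replace each ${field} with the value of that field in the current record.
    def value_of(match):
        value = record.get(match.group(1))
        if value is None:
            return ""  # assumption: a missing field renders as empty text
        if isinstance(value, bool):
            return "true" if value else "false"  # lower-case, as in the output above
        return str(value)
    return re.sub(r"\$\{(\w+)\}", value_of, template)


record = {"t1": "test", "t2": "ttt", "hello": True, "testN": 1}
print(render_free_form("${t1} k!${t2} ${hello}:boolean\n${testN}Num", record))
```

For the Solr case in the question, a Text value of ${id} applied to each document record would produce just the id.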
EDIT:
It seems what you needed was to read from Solr and write a single-column CSV. For that you need to use CSVRecordSetWriter.
I suggest considering an upgrade to 1.9.1: starting from 1.9.0, the schema can be inferred for you.
Otherwise, you can set Schema Access Strategy to Use 'Schema Text' Property,
then use the following schema in Schema Text:
{
"name": "MyClass",
"type": "record",
"namespace": "com.acme.avro",
"fields": [
{
"name": "id",
"type": "string"
}
]
}
This should work.
I'll edit it into my answer. If it works for you, please accept my answer :)
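For reference, the sketch below shows in plain Python the shape of the single-column result: it pulls the id of each document out of the sample GetSolr response and writes a one-column CSV with a header line. It only illustrates the target output; inside NiFi the record reader and CSVRecordSetWriter do this work, and `ids_to_csv` is a hypothetical name. The sample is trimmed to the fields needed here.

```python
import csv
import io
import json

solr_response = json.loads("""
{"response": {"numFound": 3194, "start": 0, "docs": [
  {"id": "{402EBE69-0000-CD1D-8FFF-D07756271B4E}", "FileName": "Test.docx"}
]}}
""")


def ids_to_csv(response: dict) -> str:
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["id"])  # header line for the single "id" column
    for doc in response["response"]["docs"]:
        writer.writerow([doc["id"]])
    return buf.getvalue()


print(ids_to_csv(solr_response), end="")
```

Note that the id is a GUID-like string, so the field has to be typed as a string rather than a number.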

How do I use a NiFi JoltTransformJSON spec?

I want a JoltTransformJSON spec that converts the input below to the output.
I have tried using map-to-list and other syntax, but have not been successful so far.
Expected input:
{
"params": "sn=GH6747246T4JLR6AZ&c=QUERY_RECORD&p=test_station_name&p=station_id&p=result&p=mac_address"
}
Expected output:
{
"queryType": "scan",
"dataSource": "xyz",
"resultFormat": "list",
"columns": ["test_station_name", "station_id", "result", "mac_address"],
"intervals": ["2018-01-01/2018-02-09"],
"filter": {
"type": "selector",
"dimension": "sn",
"value": "GH6747246T4JLR6AZ"
}
}
Except for the contents of columns and the dimension and value attributes, the rest of the fields are hardcoded.
As all of the data is contained in a single JSON key/value, I don't think JoltTransformJSON is the best option here. I actually think writing a simple script in Python/Groovy/Ruby to split the querystring value and write it out as JSON is easier and less complicated to maintain. I would recommend Groovy specifically (you can use the specialized ExecuteGroovyScript processor), as it is the most performant & robust in Apache NiFi and has excellent JSON handling.
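The answer recommends Groovy in ExecuteGroovyScript; the split-and-build logic it describes is sketched here in Python for illustration, to be ported to Groovy for NiFi. The hardcoded values are copied from the expected output in the question, and `params_to_query` is a hypothetical name.

```python
import json
from urllib.parse import parse_qs


def params_to_query(params: str) -> dict:
    # parse_qs keeps repeated keys (p=...&p=...) as a list in their original order.
    qs = parse_qs(params)
    return {
        # Fixed values taken from the expected output in the question.
        "queryType": "scan",
        "dataSource": "xyz",
        "resultFormat": "list",
        "columns": qs.get("p", []),
        "intervals": ["2018-01-01/2018-02-09"],
        "filter": {"type": "selector", "dimension": "sn", "value": qs["sn"][0]},
    }


flow_file = {"params": "sn=GH6747246T4JLR6AZ&c=QUERY_RECORD&p=test_station_name&p=station_id&p=result&p=mac_address"}
result = params_to_query(flow_file["params"])
print(json.dumps(result, indent=2))
```

Parsing the query string with a standard library routine avoids hand-splitting on "&" and "=", which is exactly the part that is awkward to express in a Jolt spec.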

Mandrill API with Handlebars "each-loop" not working

I ran into a problem when using the Mandrill API to send transactional newsletters. I chose Handlebars for the template parameters. The user name is shown correctly, but the data in the list (post titles) is empty. Please let me know if I did anything wrong. Thank you!
The template is as follows, sent to the endpoint /messages/send.json:
func genHTMLTemplate() string {
return "code generated template<br>" +
"Hi {{name}}, <br>" +
"{{#each posts}}<div>" +
"TITLE {{title}}, THIS {{this}}<br>" +
"</div>{{/each}}"
}
The API log in my Settings panel on mandrillapp.com shows these parameters:
{
"key": "xxxxxxxxxx",
"message": {
:
"merge_language": "handlebars",
"global_merge_vars": null,
"merge_vars": [
{
"rcpt": "xxxxxx#gmail.com",
"vars": [
{
"name": "posts",
"content": [
{
"title": "title A"
},
{
"title": "title B"
},
]
},
{
"name": "name",
"content": "John Doe"
}
]
}
],
:
},
:
}
Below is the email received. "title A" and "title B" were expected after "TITLE".
code generated template
Hi John Doe,
TITLE, THIS Array
TITLE, THIS Array
Mandrill decided to create custom handlebars helpers with some horrible, HORRIBLE names:
https://mandrill.zendesk.com/hc/en-us/articles/205582537-Using-Handlebars-for-Dynamic-Content#inline-helpers-available-in-mandrill
title and url will definitely give you grief if your objects happen to have keys named title and url as well. Why they didn't name their helpers something like toTitleCase and encodeUrl is beyond me.
As far as arrays and #each are concerned, you can work around it by using {{this.title}} instead of {{title}}.
After testing with Mandrill's sample code here, I found that the key "title" just doesn't work. I don't know the reason (a reserved keyword in Mandrill?), but if I replace it with "title1", "titleX" or something else, it renders correctly.
{
"name": "posts",
"content": [
{
"title": "blah blah" // "title1" or something else works
}
]
}
When using Handlebars as the merge language, 'title' is a reserved helper name; the helper converts your text to title case. If you write only {{title}}, it is treated as the title helper applied to empty text. Try {{title title}}, which should work, or change the key name to something else (if you don't want your title in title case).
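Renaming the key is the least surprising of these fixes. The Python sketch below builds a /messages/send.json body whose posts use a key other than "title"; "post_title" is a hypothetical name. merge_language and merge_vars mirror the request logged in the question, while message.html and message.to are included as an assumption about the usual Mandrill message fields, since the question elides them.

```python
import json

TEMPLATE = (
    "Hi {{name}}, <br>"
    "{{#each posts}}<div>TITLE {{post_title}}<br></div>{{/each}}"
)


def build_send_body(api_key: str, email: str, name: str, titles: list) -> dict:
    # Each post uses "post_title" (hypothetical) so it cannot clash with
    # Mandrill's built-in title helper.
    posts = [{"post_title": t} for t in titles]
    return {
        "key": api_key,
        "message": {
            "html": TEMPLATE,
            "to": [{"email": email}],
            "merge_language": "handlebars",
            "merge_vars": [{
                "rcpt": email,
                "vars": [
                    {"name": "name", "content": name},
                    {"name": "posts", "content": posts},
                ],
            }],
        },
    }


body = build_send_body("YOUR_API_KEY", "john@example.com", "John Doe", ["title A", "title B"])
print(json.dumps(body, indent=2))
```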
I know this is late, but it could be useful to someone trying to debug this issue now. Note this point in the Mandrill documentation:
There are two main ways to add dynamic content via merge tags: Handlebars or the Mailchimp merge language. You may already be familiar with the Mailchimp merge language from creating and editing Mailchimp templates. We also offer a custom implementation of Handlebars, which is open source and offers greater flexibility.
To set your merge language, navigate to Sending Defaults and select Mailchimp or Handlebars from the Merge Language drop-down menu.
I've run into a similar issue with Sendinblue, where the default configuration does not enable Handlebars, so it won't evaluate them.
