How to use the NiFi JoltTransformJSON spec?

I wish to write a JoltTransformJSON spec that converts the input below to the output below.
I have tried map-to-list and other syntaxes, but have not been successful so far.
Expected input:
{
  "params": "sn=GH6747246T4JLR6AZ&c=QUERY_RECORD&p=test_station_name&p=station_id&p=result&p=mac_addresss"
}
Expected output:
{
  "queryType": "scan",
  "dataSource": "xyz",
  "resultFormat": "list",
  "columns": ["test_station_name", "station_id", "result", "mac_address"],
  "intervals": ["2018-01-01/2018-02-09"],
  "filter": {
    "type": "selector",
    "dimension": "sn",
    "value": "GH6747246T4JLR6AZ"
  }
}
Except for the contents of columns and the dimension and value attributes, the rest of the fields are hardcoded.

As all of the data is contained in a single JSON key/value, I don't think JoltTransformJSON is the best option here. I actually think writing a simple script in Python/Groovy/Ruby to split the querystring value and write it out as JSON is easier and less complicated to maintain. I would recommend Groovy specifically (you can use the specialized ExecuteGroovyScript processor), as it is the most performant & robust in Apache NiFi and has excellent JSON handling.
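For illustration, here is a minimal sketch of what such a script could look like in ExecuteGroovyScript. It assumes the flow file content is exactly the single-key JSON shown above, and the hardcoded values (dataSource, intervals, and so on) are just filled in to match the expected output; adapt names and error handling to your flow.
import groovy.json.JsonOutput
import groovy.json.JsonSlurper
import org.apache.nifi.processor.io.StreamCallback

def flowFile = session.get()
if (!flowFile) return

flowFile = session.write(flowFile, { inputStream, outputStream ->
    // Parse the incoming JSON and split the querystring into key/value pairs.
    def json = new JsonSlurper().parse(inputStream)
    def pairs = json.params.tokenize('&').collect { it.tokenize('=') }

    // Collect every repeated "p" parameter as a column, and pull out "sn".
    def columns = pairs.findAll { it[0] == 'p' }.collect { it[1] }
    def sn = pairs.find { it[0] == 'sn' }?.getAt(1)

    def result = [
        queryType   : 'scan',
        dataSource  : 'xyz',
        resultFormat: 'list',
        columns     : columns,
        intervals   : ['2018-01-01/2018-02-09'],
        filter      : [type: 'selector', dimension: 'sn', value: sn]
    ]
    outputStream.write(JsonOutput.toJson(result).bytes)
} as StreamCallback)

session.transfer(flowFile, REL_SUCCESS)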

Related

How do I perform name standardization in NiFi using the update record processor?

In my NiFi flow, I need to perform name standardization for a specific column.
Examples include:
Making the name title case
If it contains "mc" before something such as "donald", making it "McDonald"
and other such things.
How do I perform all of these in a single pass in the update record processor?
Also, I don't see any function for making a name title case in NiFi Expression Language; I only see upper and lower. How do I build the logic? Do I need to make a custom property for this?
Please let me know. Thanks.
It's possible with the method WordUtils.capitalizeFully. Check also this question.
ScriptedTransformRecord processor:
Record Reader: JsonTreeReader
Record Writer: JsonRecordSetWriter
Script Language: Groovy
Script Body:
import org.apache.commons.lang3.text.WordUtils
// Title-case the "text" field in place, then return the record so it is written out.
record.setValue("text", WordUtils.capitalizeFully(record.getValue("text") as String))
record
Example
Input JSON:
[
  {
    "text": "man OF stEEL"
  },
  {
    "text": "hELLo"
  }
]
Output JSON:
[
  {
    "text": "Man Of Steel"
  },
  {
    "text": "Hello"
  }
]
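Note that capitalizeFully alone does not handle the "mc" rule from the question. One hedged way to extend the script body, assuming names like "mcdonald" should become "McDonald", is a regex pass after title-casing:
import org.apache.commons.lang3.text.WordUtils

def name = WordUtils.capitalizeFully(record.getValue("text") as String)
// Assumed extra rule: upper-case the letter that follows a leading "Mc".
name = name.replaceAll(/\bMc([a-z])/) { full, c -> 'Mc' + c.toUpperCase() }
record.setValue("text", name)
record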

Unmarshalling JSON to use one of the object values as a key

Hello all,
I have JSON that looks something like this:
[
  {
    "Delay": 0.031247,
    "Index": {
      "Currency": "USD",
      "Valoren": "998434"
    },
    "IdentifierType": "Symbol",
    "Identifier": "SPX.INDCBSX"
  },
  {
    "Delay": 0,
    "Index": {
      "Currency": "USD",
      "Valoren": "13190963"
    },
    "IdentifierType": "Symbol",
    "Identifier": "SPDVXT.INDCBSX"
  }
]
I want to unmarshal it in such a way that I can store it as a map[string]interface{}, where the key is the value of the "Identifier" field. For example, data["SPX.INDCBSX"] should give me the complete array element. Of course I can unmarshal it by creating a matching struct, iterating over it, and building a map of this type, but that is a more time-consuming operation.
There's no magic way to tell encoding/json to unmarshal this data for you while selecting a custom field as the key of the map. You have several options:
Implement the json.Unmarshaler interface; see the docs of encoding/json and, for example, this answer.
Unmarshal the data into []map[string]interface{} and then rejigger it into the format you want in a separate loop.
Use something like JSON-to-Go to create the structures representing your data; then doing (2) is much simpler, but this only works if the structure of your JSON data is rigid.
This post can also be helpful. And this one.
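A rough sketch of option (2), using the sample data from the question (trailing commas removed so it is valid JSON):
package main

import (
	"encoding/json"
	"fmt"
)

func main() {
	data := []byte(`[
		{"Delay": 0.031247,
		 "Index": {"Currency": "USD", "Valoren": "998434"},
		 "IdentifierType": "Symbol",
		 "Identifier": "SPX.INDCBSX"},
		{"Delay": 0,
		 "Index": {"Currency": "USD", "Valoren": "13190963"},
		 "IdentifierType": "Symbol",
		 "Identifier": "SPDVXT.INDCBSX"}
	]`)

	// Step 1: unmarshal into a generic slice of maps.
	var items []map[string]interface{}
	if err := json.Unmarshal(data, &items); err != nil {
		panic(err)
	}

	// Step 2: rejigger into a map keyed by each element's "Identifier" value.
	byID := make(map[string]interface{}, len(items))
	for _, item := range items {
		if id, ok := item["Identifier"].(string); ok {
			byID[id] = item
		}
	}

	fmt.Println(byID["SPX.INDCBSX"]) // the complete element for that identifier
}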

How to convert JSON to a collection in Power Apps

I have a Power App that uses a flow from Power Automate.
My flow does an HTTP GET and responds with JSON to Power Apps, like below.
Here is the JSON as text:
{"value": "[{\"dataAreaId\":\"mv\",\"AccountNum\":\"100000\",\"Name\":\"*****L FOOD AB\"},{\"dataAreaId\":\"mv\",\"AccountNum\":\"100001\",\"Name\":\"**** AB\"},{\"dataAreaId\":\"mv\",\"AccountNum\":\"100014\",\"Name\":\"****(SEB)\"},{\"dataAreaId\":\"mv\",\"AccountNum\":\"100021\",\"Name\":\"**** AB\"},{\"dataAreaId\":\"mv\",\"AccountNum\":\"100029\",\"Name\":\"**** AB\"},{\"dataAreaId\":\"mv\",\"AccountNum\":\"500100\",\"Name\":\"**** AB\"},{\"dataAreaId\":\"mv\",\"AccountNum\":\"500210\",\"Name\":\"****\"}]"}
But when I try to convert this JSON to a collection, it doesn't behave like a list.
It just seems like text. Here is how I try to bind the list.
How can I create a collection from JSON to bind to the gallery view?
I found the solution. I finally created a collection from the flow's response.
The flow's name is GetVendor.
The response of the flow is the same JSON shown in the question.
The code below creates a collection from this response:
ClearCollect(_vendorData, MatchAll(GetVendors.Run(_token.value).value, "\{""dataAreaId"":""(?<dataAreaId>[^""]*)"",""AccountNum"":""(?<AccountNum>[^""]*)"",""Name"":""(?<Name>[^""]*)""\}"));
And I could bind AccountNum and Name from the _vendorData collection to the gallery view.
In my case I had the same issue, but I couldn't get the data into the _vendorData collection because the MatchAll regex part was not working correctly, even though I had exactly the same scenario.
My solution was to modify the flow itself: I returned a Response action instead of Respond to a PowerApp or flow, so I could return the full HTTP response.
This caused me some issues too, because when I generated the schema from a sample, I could not register the flow to the Power App; it failed with the error "Failed during http send request".
The solution was to manually review the response schema and change all column types to one of the following three, because others are not supported: string, integer, or boolean. Object and array can be set only on top-level items, never on children, so if you have anything other than those three, replace it with string. And no property can be left with an undefined type.
Basically I like this solution even more, because in Power Apps itself you do not need to do any conversion: simply use the data as-is, because an array is already recognized as a collection and all the properties are already named for you.
Response step schema example is below.
{
  "type": "object",
  "properties": {
    "PropertyOne": {
      "type": "string"
    },
    "PropertyTwo": {
      "type": "integer"
    },
    "PropertyThree": {
      "type": "boolean"
    },
    "PropertyFour": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "PropertyArray1": {
            "type": "string"
          },
          "PropertyArray2": {
            "type": "integer"
          },
          "PropertyArray3": {
            "type": "boolean"
          }
        }
      }
    }
  }
}
It is easy now.
Power Apps has since introduced the ParseJSON function, which makes converting a string to a collection easy.
Table(ParseJSON(JSONString));
In gallery, map columns like - ThisItem.Value.ColumnName
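Combining this with the flow call from the earlier answer might look something like the sketch below; GetVendors and _token are the names used above, and the Text() casts assume all three columns are strings:
ClearCollect(
    _vendorData,
    ForAll(
        Table(ParseJSON(GetVendors.Run(_token.value).value)),
        {
            dataAreaId: Text(ThisRecord.Value.dataAreaId),
            AccountNum: Text(ThisRecord.Value.AccountNum),
            Name: Text(ThisRecord.Value.Name)
        }
    )
);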

Kafka Connect JDBC sink - write Avro field into PG JSONB

I'm trying to build a pipeline where Avro data is written into a Postgres DB. Everything works fine with simple schemas and the AvroConverter for the values. However, I would like to have a nested field written into a JSONB column. There are a couple of problems with this. First, it seems that the Connect plugin does not support STRUCT data. Second, the plugin cannot write directly into the JSONB column.
The second problem can be avoided by adding a cast in PG, as described in this issue. The first problem is proving more difficult. I have tried different transformations but have not been able to get the Connect plugin to interpret one complex field as a string. The schema in question looks something like this (in practice there would be more fields on the first level besides the timestamp):
{
  "namespace": "test.schema",
  "name": "nested_message",
  "type": "record",
  "fields": [
    {
      "name": "timestamp",
      "type": "long"
    },
    {
      "name": "nested_field",
      "type": {
        "name": "nested_field_record",
        "type": "record",
        "fields": [
          {
            "name": "name",
            "type": "string"
          },
          {
            "name": "prop",
            "type": "float",
            "doc": "Some property"
          }
        ]
      }
    }
  ]
}
The message is written in Kafka as
{"timestamp":1599493668741396400,"nested_field":{"name":"myname","prop":377.93887}}
In order to write the contents of nested_field into a single DB column, I would like to interpret this entire field as a string. Is this possible? I have tried the Cast transformation, but it only supports primitive Avro types. Something along the lines of HoistField could work, but I don't see a way to limit it to a single field. Any ideas or advice would be greatly appreciated.
A completely different approach would be to use two Connect plugins and UPSERT into the table: one plugin would use the AvroConverter for all fields except the nested one, while the second uses the StringConverter for the nested field. This feels wrong in all kinds of ways, though.
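For reference, the PG-side cast workaround mentioned at the top is typically a one-line DDL. A sketch, assuming the sink sends the nested value as a varchar destined for a jsonb column:
-- Allow Postgres to implicitly convert the connector's string values to jsonb.
CREATE CAST (varchar AS jsonb) WITH INOUT AS IMPLICIT;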

How do I use FreeFormTextRecordSetWriter

In my NiFi controller I want to configure the FreeFormTextRecordSetWriter, but I have no idea what I should put in the "Text" field. I'm getting the text from my source (in my case GetSolr) and just want to write it out, period.
The documentation and mailing list do not seem to tell me how this is done; any help appreciated.
EDIT: Here are the sample input and output I want to achieve (as you can see: no transformation needed, plain text, no JSON).
EDIT: I now realize that I can't tell GetSolr to return just CSV data; I have to use JSON.
So referencing with an attribute seems to be fine. What the documentation omits is that the ${flowFile} attribute should contain the complete FlowFile that is returned.
Sample input:
{
  "responseHeader": {
    "zkConnected": true,
    "status": 0,
    "QTime": 0,
    "params": {
      "q": "*:*",
      "_": "1553686715465"
    }
  },
  "response": {
    "numFound": 3194,
    "start": 0,
    "docs": [
      {
        "id": "{402EBE69-0000-CD1D-8FFF-D07756271B4E}",
        "MimeType": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
        "FileName": "Test.docx",
        "DateLastModified": "2019-03-27T08:05:00.103Z",
        "_version_": 1629145864291221504,
        "LAST_UPDATE": "2019-03-27T08:16:08.451Z"
      }
    ]
  }
}
Wanted output
{402EBE69-0000-CD1D-8FFF-D07756271B4E}
BTW: The documentation says this:
The text to use when writing the results. This property will evaluate the Expression Language using any of the fields available in a Record.
Supports Expression Language: true (will be evaluated using flow file attributes and variable registry)
I want to use my source's text, so I'm confused.
You need to use expression language as if the record's fields are the FlowFile's attributes.
Example:
Input:
{
  "t1": "test",
  "t2": "ttt",
  "hello": true,
  "testN": 1
}
Text property in FreeFormTextRecordSetWriter:
${t1} k!${t2} ${hello}:boolean
${testN}Num
Output (using ConvertRecord):
test k!ttt true:boolean
1Num
EDIT:
It seems like what you needed was to read from Solr and write a single-column CSV. You need to use CSVRecordSetWriter for that.
Consider upgrading to 1.9.1; starting from 1.9.0, the schema can be inferred for you.
Otherwise, you can set Schema Access Strategy to Use 'Schema Text' Property
and then use the following schema in Schema Text:
{
  "name": "MyClass",
  "type": "record",
  "namespace": "com.acme.avro",
  "fields": [
    {
      "name": "id",
      "type": "string"
    }
  ]
}
This should work.
I'll edit it into my answer. If it works for you, please choose my answer :)
