I'm working in Apache NiFi and I have a question: how do I handle nested arrays in JSON with the QueryRecord processor? For example, I have this JSON:
{
"offerName":"Viatti Strada Asimmetrico V-130 205/55 R16 91V",
"detailedStats":[
{
"type":"mobile",
"clicks":4,
"spending":"2.95"
}
]
}
How can I extract the array to get the following result:
{
"offerName": "Viatti Strada Asimmetrico V-130 205/55 R16 91V",
"type": "mobile",
"clicks": 4,
"spending": "2.95"
}
I read about RPATH, but didn't find good examples.
I tried:
SELECT RPATH(detailedStats, '/detailedStats[1]')
FROM FLOWFILE
But it throws an error. How can I get the expected result with RPATH?
You can select like below via QueryRecord. However, it seems you are having an issue while writing; I used a JsonRecordSetWriter with Inherit Record Schema. There is a good tutorial available if you prefer an Avro schema.
SELECT offerName,
RPATH_STRING(detailedStats, '/type') type,
RPATH_INT(detailedStats, '/clicks') clicks,
RPATH_STRING(detailedStats, '/spending') spending
FROM FLOWFILE
The result is an array, so you should split it downstream, e.g. with SplitJson and a JsonPath expression of $.*.
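For reference, assuming the JsonRecordSetWriter inherits the record schema and writes the record set as a JSON array, the output of the query above for the sample input would look roughly like:
[ {
"offerName": "Viatti Strada Asimmetrico V-130 205/55 R16 91V",
"type": "mobile",
"clicks": 4,
"spending": "2.95"
} ]
Splitting on $.* then yields the flat object you are after.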
An alternative method is to add a JoltTransformJSON processor with a shift-type specification (reached from the Advanced button on the Settings tab), using the following spec
[
{
"operation": "shift",
"spec": {
"detailedStats": {
"*": {
"#(2,offerName)": "offerName",
"*": "&"
}
}
}
}
]
in order to extract your desired result.
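Applied to the sample input, that shift should produce roughly the desired flat object:
{
"offerName": "Viatti Strada Asimmetrico V-130 205/55 R16 91V",
"type": "mobile",
"clicks": 4,
"spending": "2.95"
}
Note that if detailedStats contained more than one element, the matched values would be collected into arrays under the same keys, so you would still need to split or restructure downstream.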
I'm fairly new to NiFi, so my question might be basic.
I would like to rename the JSON key in the flowfile. For example:
{"path":"/home/a/a", "size":"12345"}
and I would like to convert to
{"filename":"/home/a/a", "size":"12345"}
I tried using UpdateAttribute, adding a filename attribute with the value ${path}, but either I'm doing something wrong or it's not meant to be used for this kind of operation.
How could I rename the key in the JSON?
This is the content of your FlowFile, not an attribute, so UpdateAttribute is not the right way to go.
The easiest way to work with the JSON content of FlowFiles is going to be via a Jolt transform (JoltTransformJSON).
Give this spec a try:
[
{
"operation": "shift",
"spec": {
"path": "filename",
"*": {
"#": "&"
}
}
}
]
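With the input above, this spec should produce:
{"filename": "/home/a/a", "size": "12345"}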
You can test Jolt transforms with your input data at the online Jolt demo (jolt-demo.appspot.com) and see what the output will be.
I'm trying to access the values of an array in JSON using FreeFormTextRecordSetWriter.
Input data:
{"data":[["1580860800000","67.2"]],"itemid":5917,"label":"xxx","type":"stacked_element"}
Desired output
{"data1":"1580860800000", "data2":"67.2","itemid":5917,"label":"xxx","type":"stacked_element"}
Can this be done using NiFi Expression Language?
I don't believe FreeFormTextRecordSetWriter currently allows access to nested fields; please feel free to write a Jira to add this capability, or perhaps a FreeFormTextRecordPathWriter to enable the use of RecordPath expressions.
I assume that if you're trying FreeFormTextRecordSetWriter, you know there will always be two entries in the data array. If that's the case, then since the input/output is valid JSON, and if there's one object in the flowfile, you can use JoltTransformJSON with the following spec:
[
{
"operation": "shift",
"spec": {
"data": {
"*": {
"0": "data1",
"1": "data2"
}
},
"*": "&"
}
}
]
If there is more than one JSON object in the file, you can use JoltTransformRecord with a JsonTreeReader and JsonRecordSetWriter and the above spec.
If you don't know how many elements are in the array, you can still split them up with the following spec, but note that the first element has an index of 0 not 1 (so data0 instead of data1):
[
{
"operation": "shift",
"spec": {
"data": {
"*": {
"*": "data&"
}
},
"*": "&"
}
}
]
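For the sample input, that generic spec should give (note the zero-based names):
{"data0":"1580860800000","data1":"67.2","itemid":5917,"label":"xxx","type":"stacked_element"}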
UpdateRecord is another option, but I believe you'd still have to know how many elements are in the array.
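For completeness, a rough UpdateRecord sketch under the same two-element assumption: with a JsonTreeReader, a JsonRecordSetWriter, and Replacement Value Strategy set to Record Path Value, you would add user-defined properties such as
/data1 = /data[0]
/data2 = /data[1]
Note this assumes the writer schema includes the new data1/data2 fields; with a purely inherited schema the added fields may be dropped, which is why the Jolt approach above is usually simpler for JSON.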
For example, there are 8 flow files, and I've converted JSON fields to attributes for each flow file, as follows:
I've added 5 property/value pairs to EvaluateJsonPath, as shown in the picture.
If I need to convert 1000 attributes, setting 1000 property/value pairs in EvaluateJsonPath is too much trouble.
What can I do to do this more easily?
Any help is appreciated! TIA
You don't have to (and shouldn't) split the individual JSON objects out of the array if you intend to keep them as a group (i.e. merge them back in). In most cases the split-transform-merge pattern has been replaced by record-based processors such as UpdateRecord or JoltTransformRecord. In your case, since the data is JSON, you can use JoltTransformJSON with the following spec to change the ID field to ID2 without splitting up the array:
[
{
"operation": "shift",
"spec": {
"*": {
"ID": "[#2].ID2",
"*": "[#2].&"
}
}
}
]
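As a hypothetical illustration (your actual field names come from the screenshot), an input array like
[{"ID":"1","name":"a"},{"ID":"2","name":"b"}]
would come out of that spec as
[{"ID2":"1","name":"a"},{"ID2":"2","name":"b"}]
i.e. the array stays intact and only the key is renamed.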
Note that you can also do this (especially for non-JSON input/output) with JoltTransformRecord; the only difference is that the spec is applied to each object in the array, whereas JoltTransformJSON applies the spec to the entire array. The JoltTransformRecord spec would look like this:
[
{
"operation": "shift",
"spec": {
"ID": "ID2",
"*": "&"
}
}
]
I have a SQL database, and I extract some rows and transform them into JSON to feed a MongoDB. I'm stuck at the transformation step. I have tried this flow:
The process is stalled on the MergeRecord processor, and I don't know why.
The aim is to transform this kind of (simplified) SQL query result:
ID ROUTE_CODE STATUS SITE_ID SITE_CODE
379619 1801300001 10 220429 100001
379619 1801300001 10 219414 014037
379619 1801300001 10 220429 100001
379620 1801300002 10 220429 100001
379620 1801300002 10 219454 014075
379620 1801300002 10 220429 100001
To this json:
[
{
"routeId": "379619",
"routeCode": "1901300001",
"routeStatus": "10",
sites: [
{ "siteId": "220429", "siteCode" : "100001" },
{ "siteId": "219414", "siteCode" : "014037" }
]
},
{
"routeId": "379620",
"routeCode": "1901300002",
"routeStatus": "10",
sites: [
{ "siteId": "220429", "siteCode" : "100001" },
{ "siteId": "219454", "siteCode" : "014075" }
]
}
]
MergeRecord should group by routeId; also, I don't yet know the correct Jolt transform to group the sites into an array...
The flow is stuck because back-pressure has engaged on the queue between ConvertAvroToJSON and MergeRecord, which can be seen from the red indicator showing that the queue has reached its maximum size of 10k flow files. This means the ConvertAvroToJSON processor will no longer execute until the queue drops back below its threshold, but MergeRecord is likely waiting for more files, so the queue isn't going to shrink.
You could change the settings on the queue to increase the threshold to be higher than the number of records you are waiting for, or you could implement the flow differently...
After ExecuteSQL, it looks like 3 processors are being used to basically split, convert to JSON, and re-merge back together. This could be done a lot more efficiently by not splitting and just using ConvertRecord with an Avro reader and a JSON writer; this way you can go ExecuteSQL -> ConvertRecord -> Jolt.
Also, you may want to look at JoltTransformRecord as an alternative to JoltTransformJSON.
After ExecuteSQL (or ExecuteSQLRecord), you can then use PartitionRecord with the following user-defined properties added (property name is left of =, value to the right):
routeId = /ID
routeCode = /ROUTE_CODE
routeStatus = /STATUS
PartitionRecord should use a JSON writer, then you can use JoltTransformJson with the following spec:
[
{
"operation": "shift",
"spec": {
"*": {
"ID": "routeId",
"ROUTE_CODE": "routeCode",
"STATUS": "routeStatus",
"SITE_ID": "sites[#2].siteId",
"SITE_CODE": "sites[#2].siteCode"
}
}
},
{
"operation": "modify-overwrite-beta",
"spec": {
"routeId": "=firstElement(#(1,routeId))",
"routeCode": "=firstElement(#(1,routeCode))",
"routeStatus": "=firstElement(#(1,routeStatus))"
}
}
]
That will group each of the site IDs/codes into the sites field. Then you just need MergeRecord to patch them back together. Unfortunately PartitionRecord doesn't yet support the fragment.* attributes (I have written up NIFI-6139 to cover this improvement), so MergeRecord won't be able to guarantee that all the transformed records from the original input file will be in the same merged flow file. However each merged flow file will contain records with the sites array for some number of unique routeId/routeCode/routeStatus values.
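To make that concrete, a (de-duplicated) partition holding the two distinct rows for route 379619, written as JSON by PartitionRecord, would enter the Jolt transform roughly as
[{"ID":379619,"ROUTE_CODE":"1801300001","STATUS":"10","SITE_ID":"220429","SITE_CODE":"100001"},{"ID":379619,"ROUTE_CODE":"1801300001","STATUS":"10","SITE_ID":"219414","SITE_CODE":"014037"}]
and come out roughly as
{"routeId":379619,"routeCode":"1801300001","routeStatus":"10","sites":[{"siteId":"220429","siteCode":"100001"},{"siteId":"219414","siteCode":"014037"}]}
(the exact types depend on your database column types and the record writer schema).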
I need to change the input date to a SQL-friendly format in order to insert it into the DB. I get errors on both imported_at and processed_at when trying to insert into the DB.
My flow: JoltTransformJSON -> ConvertJsonToSql -> PutSql
Input:
{
"transactionDate": "2018-01-01T18:06:00",
}
My Spec:
[
{
"operation": "shift",
"spec": {
"transactionDate": "processed_at"
}
},
{
"operation": "modify-overwrite-beta",
"spec": {
"processed_at": "=${processed_at.replaceAll('T',' '):toDate('yyyy-MM-dd HH:mm:ss'):format('yyyy-MM-dd HH:mm:ss')}"
}
},
{
"operation": "default",
"spec": {
"processed_at": null,
"imported_at": "${now():format('yyyy-MM-dd HH:mm:ss')}"
}
}
]
My idea was this:
1. shift transactionDate into processed_at
2. override processed_at and transform it into a date via toDate function
3. format it into my desired format via format function
This doesn't work; at best I either get an empty processed_at or the initial value.
I tried:
${processed_at.replaceAll('T',' '):toDate('yyyy-MM-dd HH:mm:ss'):format('yyyy-MM-dd HH:mm:ss')}
${processed_at:toDate('yyyy-MM-ddTHH:mm:ss'):format('yyyy-MM-dd HH:mm:ss')}
Apparently, I cannot access JSON properties with Expression Language in the Jolt spec in the JoltTransformJSON processor.
The way I made it work was:
I added an EvaluateJsonPath processor before JoltTransformJSON and extracted processed_at as a flowfile attribute.
My flow now looks like this: EvaluateJsonPath -> JoltTransformJSON -> ConvertJsonToSql -> PutSql
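For reference, a minimal EvaluateJsonPath configuration for this step (the property name and JsonPath below are my assumption, based on the input shown above) would set Destination = flowfile-attribute and add one user-defined property:
processed_at = $.transactionDate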
In the JoltTransformJSON I now have access to the Flowfile attribute processed_at extracted earlier. In the Jolt spec, I updated the default operation:
{
"operation": "default",
"spec": {
"processed_at": null,
"processed_at": "${processed_at:replace('T', ''):toDate('yyyy-MM-ddHH:mm:ss'):format('yyyy-MM-dd HH:mm:ss.SSS')}"
}
}
The correct SQL date field format in expression language is: yyyy-MM-dd HH:mm:ss.SSS
Now the flow inserts rows into the database.