Find fields which contain a text and replace it with another text - jsonata

In the JSON example below, how can I find all the fields whose names contain the string "Choice" and replace that part of the name with another string, for example "Grade"?
So in the example below, every field named "***Choice" should be renamed to "***Grade".
I have pasted the expected output below. Given that I don't know how many fields will have the string "Choice", I don't want to simply do [$in ~> | ** [firstChoice] | {"firstGrade": firstChoice}, ["firstChoice"] | ;] which is a straight find and replace.
{
"data": {
"resourceType": "Bundle",
"id": "e919c820-71b9-4e4b-a1c8-c2fef62ea911",
"firstChoice": "xxx",
"type": "collection",
"entry": [
{
"resource": {
"resourceType": "Condition",
"id": "SMART-Condition-342",
"code": {
"coding": [
{
"system": "http://snomed.info/sct",
"code": "38341003",
"display": "Essential hypertension",
"firstChoice": "xxx"
}
],
"text": "Essential hypertension"
},
"clinicalStatus": "active",
"secondChoice": "xxx"
},
"search": {
"mode": "match"
}
}
]
}
}
Expected output
{
"data": {
"resourceType": "Bundle",
"id": "e919c820-71b9-4e4b-a1c8-c2fef62ea911",
"firstGrade": "xxx",
"type": "collection",
"entry": [
{
"resource": {
"resourceType": "Condition",
"id": "SMART-Condition-342",
"code": {
"coding": [
{
"system": "http://snomed.info/sct",
"code": "38341003",
"display": "Essential hypertension",
"firstGrade": "xxx"
}
],
"text": "Essential hypertension"
},
"clinicalStatus": "active",
"secondGrade": "xxx"
},
"search": {
"mode": "match"
}
}
]
}
}

There might be simpler ways, but this is an expression I came up with in JSONata:
(
$prefixes := $keys(**)[$ ~> /Choice$/].$substringBefore('Choice');
$reduce($prefixes, function($acc, $prefix) {(
$choice := $prefix & "Choice";
$acc ~> | ** [$lookup($choice)] | {$prefix & "Grade": $lookup($choice)}, [$choice] |
)}, $$)
)
It looks terrible, but I'll explain how I built it up anyway.
You started with the expression
$ ~> | ** [firstChoice] | {"firstGrade": firstChoice}, ["firstChoice"] |
which is fine if you only want to replace one choice, and you know the full name. If you want to replace more than one, then you can chain these together as follows:
$ ~> | ** [firstChoice] | {"firstGrade": firstChoice}, ["firstChoice"] |
~> | ** [secondChoice] | {"secondGrade": secondChoice}, ["secondChoice"] |
~> | ** [thirdChoice] | {"thirdGrade": thirdChoice}, ["thirdChoice"] |
At this point, you could create a higher-order function that takes the choice prefix and returns a partial substitution (note that the |...|...| syntax generates a function). Then you can chain these together for an array of prefixes using the built in $reduce() higher-order function. So you get something like this:
(
$prefixes := ["first", "second", "third"];
$reduce($prefixes, function($acc, $prefix) {(
$choice := $prefix & "Choice";
$acc ~> | ** [$lookup($choice)] | {$prefix & "Grade": $lookup($choice)}, [$choice] |
)}, $$)
)
But if you don't know the set of prefixes up front, and want to select, say, all property names that end in 'Choice', then the following expression will get you that:
$prefixes := $keys(**)[$ ~> /Choice$/].$substringBefore('Choice')
Which then arrives at my final expression. You can experiment with it here in the exerciser on your data.
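For readers who want to cross-check the result outside JSONata, the same idea (walk every object and rename any key that ends in "Choice") can be sketched in Python. This is only a minimal sketch: rename_keys is a hypothetical helper name, not part of JSONata, and the input is a trimmed version of the question's data.

```python
import json


def rename_keys(node, old_suffix="Choice", new_suffix="Grade"):
    """Recursively rename every object key that ends in old_suffix."""
    if isinstance(node, dict):
        renamed = {}
        for key, value in node.items():
            if key.endswith(old_suffix):
                key = key[: -len(old_suffix)] + new_suffix
            renamed[key] = rename_keys(value, old_suffix, new_suffix)
        return renamed
    if isinstance(node, list):
        return [rename_keys(item, old_suffix, new_suffix) for item in node]
    return node  # strings, numbers, booleans, None are left untouched


# Trimmed-down version of the document from the question.
doc = {
    "data": {
        "firstChoice": "xxx",
        "entry": [
            {
                "resource": {
                    "code": {"coding": [{"firstChoice": "xxx"}]},
                    "secondChoice": "xxx",
                }
            }
        ],
    }
}

result = rename_keys(doc)
print(json.dumps(result, indent=2))
```

Unlike the JSONata expression, this does not need to collect the prefixes first, because each key is inspected as it is visited.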

I know it was 2019, but here's an alternative solution to help future readers.
$replace($string(),/(first|second|third)Choice/,"$1Grade")~>$eval()

Related

How do I parse nested JSON with JQ into CSV-aggregated output?

I have a question that is an extension/followup to a previous question I've asked:
How do I concatenate dummy values in JQ based on field value, and then CSV-aggregate these concatenations?
In my bash script, when I run the following jq against my curl result:
curl -u someKey:someSecret someURL 2>/dev/null | jq -r '.schema' | jq -r -c '.fields'
I get back a JSON array as follows:
[
{"name":"id", "type":"int"},
{
"name": "agents",
"type": {
"type": "array",
"items": {
"name": "carSalesAgents",
"type": "record",
"fields": [
{
"name": "agentName",
"type": ["string", "null"],
"default": null
},
{
"name": "agentEmail",
"type": ["string", "null"],
"default": null
},
{
"name": "agentPhones",
"type": {
"type": "array",
"items": {
"name": "SalesAgentPhone",
"type": "record",
"fields": [
{
"name": "phoneNumber",
"type": "string"
}
]
}
},
"default": []
}
]
}
},
"default": []
},
{"name":"description","type":"string"}
]
Note: line breaks and indentation added here for ease of reading. This is all in reality a single blob of text.
My goal is to do a call with jq applied to return the following, given the example above (again lines and spaces added for readability, but only need to return valid JSON blob):
{
"id":1234567890,
"agents": [
{
"agentName": "xxxxxxxxxx",
"agentEmail": "xxxxxxxxxx",
"agentPhones": [
{
"phoneNumber": "xxxxxxxxxx"
},
{
"phoneNumber": "xxxxxxxxxx"
},
{
"phoneNumber": "xxxxxxxxxx"
}
]
},
{
"agentName": "xxxxxxxxxx",
"agentEmail": "xxxxxxxxxx",
"agentPhones": [
{
"phoneNumber": "xxxxxxxxxx"
},
{
"phoneNumber": "xxxxxxxxxx"
},
{
"phoneNumber": "xxxxxxxxxx"
}
]
}
],
"description":"xxxxxxxxxx"
}
To summarise, I am trying to automatically generate templated values that match the "schema" JSON shown above.
So just to clarify, the values for "name" (including their surrounding double-quotes) are concatenated with either:
:1234567890 ...when the "type" for that object is "int"
":xxxxxxxxxx" ...when the "type" for that object is "string"
...and when type is "array" or "record" the appropriate enclosures are added {} or [] with the nested content inside.
...and if it's an array of records, generate TWO records for the output.
The approach I have started down to cater for parsing nested content like this is to have a series of if-then-else's for every combination of each possible jq type.
But this is fast becoming very hard to manage and painful. From my initial scratch efforts...
echo '[{"name":"id","type":"int"},{"name":"test_string","type":"string"},{"name":"string3ish","type":["string","null"],"default":null}]' | jq -c 'map({(.name): (if .type == "int" then 1234567890 else (if .type == "string" then "xxxxxxxxxx" else (if .type|type == "array" then "xxARRAYxx" else "xxUNKNOWNxx" end) end) end)})|add'
I was wondering if anyone knew of a smarter way to do this in bash/shell with JQ.
PS: I have found alternate solutions for such parsing using Java and Python modules, but JQ is preferable for a unique case of limitations around portability. :)
Thanks!
jq supports functions. Those functions can recurse.
#!/usr/bin/env jq -f
# Ignore all but the first type, in the case of "type": ["string", "null"]
def takeFirstTypeFromArray:
if (.type | type) == "array" then
.type = .type[0]
else
.
end;
def sampleData:
takeFirstTypeFromArray |
if .type == "int" then
1234567890
elif .type == "string" then
"xxxxxxxxxx"
elif .type == "array" then # generate two entries for any test array
[(.items | sampleData), (.items | sampleData)]
elif .type == "record" then
(.fields | map({(.name): sampleData}) | add)
elif (.type | type) == "array" then
(.type[] | sampleData)
elif (.type | type) == "object" then
(.type | sampleData)
else
["UNKNOWN", .]
end;
map({(.name): sampleData}) | add
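To make the recursion easier to follow, here is a rough Python sketch of the same logic. It is not a replacement for the jq script (the question asks for jq); sample_data is a hypothetical name, and the schema below is a shortened version of the one in the question.

```python
def sample_data(field):
    """Return a templated value for one schema node, recursing as needed."""
    t = field.get("type")
    if isinstance(t, list):
        # e.g. ["string", "null"]: only the first type is considered
        t = t[0]
    if t == "int":
        return 1234567890
    if t == "string":
        return "xxxxxxxxxx"
    if t == "array":
        # generate two entries for any array
        return [sample_data(field["items"]) for _ in range(2)]
    if t == "record":
        return {f["name"]: sample_data(f) for f in field["fields"]}
    if isinstance(t, dict):
        # nested type definition such as {"type": "array", "items": {...}}
        return sample_data(t)
    return ["UNKNOWN", field]


# Shortened schema, assumed for illustration only.
schema = [
    {"name": "id", "type": "int"},
    {
        "name": "agents",
        "type": {
            "type": "array",
            "items": {
                "name": "carSalesAgents",
                "type": "record",
                "fields": [
                    {"name": "agentName", "type": ["string", "null"], "default": None}
                ],
            },
        },
        "default": [],
    },
    {"name": "description", "type": "string"},
]

template = {f["name"]: sample_data(f) for f in schema}
print(template)
```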

Nested Filtering json file with jq statement

I have a json file with the following structure of each object inside
{
"id": 2400321267,
"data": {
"q": "quinoa black bean and shrimp r",
"r": "quinoa black bean and shrimps r",
"s": "3"
},
"job_id": 1413792,
"results": {
"judgments": [
{
"id": 5022700047,
"unit_state": "good",
"data": {
"rewrite_quality": "1"
}
}
]
}
},
{
"id": 2400321267,
"data": {
"q": "quinoa black bean and shrimp r",
"r": "quinoa black bean and shrimps r",
"s": "3"
},
"job_id": 1413792,
"results": {
"judgments": [
{
"id": 5022700047,
"unit_state": "good",
"data": {
"rewrite_quality": "2"
}
}
]
}
}
and I was trying to use the command jq '.[] | select(any(.Tags[]; .rewrite_quality == "1"))' | less to try to see if the output is correct but I don't see any output.
I want the output to have only entries with rewrite_quality == '1', in this case only the first entry.
Reading between the lines, it would appear that the following filter should achieve the stated goals:
.[]
| select( .results | any(.judgments[]; .data.rewrite_quality == "1"))
"Tags"
If the intent in using ".Tags" was to indicate that it does not matter what path leads to .rewrite_quality, then the filter to use would be:
.[]
| select( any(.. | objects | .rewrite_quality == "1"))
Alternative to using less
If you want a brief indication of whether there are any matches, you could use this filter, which has the added value of revealing how many objects satisfy the criterion:
map(select(any(.. | objects | .rewrite_quality == "1"))) | length
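The "any path" variant can be cross-checked with a small Python sketch. The names walk and has_quality are hypothetical, and the two records are reduced versions of the ones in the question; the point is only to show the idea of visiting every nested object and testing its rewrite_quality key.

```python
def walk(node):
    """Yield the node itself and every value nested inside it."""
    yield node
    if isinstance(node, dict):
        for value in node.values():
            yield from walk(value)
    elif isinstance(node, list):
        for value in node:
            yield from walk(value)


def has_quality(record, wanted="1"):
    """True if any nested object has rewrite_quality equal to wanted."""
    return any(
        isinstance(n, dict) and n.get("rewrite_quality") == wanted
        for n in walk(record)
    )


# Reduced records, assumed for illustration only.
records = [
    {"id": 1, "results": {"judgments": [{"data": {"rewrite_quality": "1"}}]}},
    {"id": 2, "results": {"judgments": [{"data": {"rewrite_quality": "2"}}]}},
]

matches = [r for r in records if has_quality(r)]
print(len(matches), [r["id"] for r in matches])
```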

unable to parse json into csv using jq

I have a JSON file that I want to convert into a CSV file using the jq in a shell script. I want to create a single row from this entire JSON file. I have to extract value from values. The row output should be something like
null,642,642,412,0,null,null
Here is my JSON file
{
"data": [
{
"name": "exits",
"period": "lifetime",
"values": [
{
"value": {}
}
],
"title": "Exits",
"description": "Number of times someone exited the carousel"
},
{
"name": "impressions",
"period": "lifetime",
"values": [
{
"value": 642
}
],
"title": "Impressions",
"description": "Total number of times the media object has been seen"
},
{
"name": "reach",
"period": "lifetime",
"values": [
{
"value": 412
}
],
"title": "Reach",
"description": "Total number of unique accounts that have seen the media object"
},
{
"name": "replies",
"period": "lifetime",
"values": [
{
"value": 0
}
],
"title": "Replies",
"description": "Total number of replies to the carousel"
},
{
"name": "taps_forward",
"period": "lifetime",
"values": [
{
"value": {}
}
],
"title": "Taps Forward",
"description": "Total number of taps to see this story's next photo or video"
},
{
"name": "taps_back",
"period": "lifetime",
"values": [
{
"value": {}
}
],
"title": "Taps Back",
"description": "Total number of taps to see this story's previous photo or video"
}
]
}
I tried using this jq command:
.data | map(.values[].value) | @csv
This is giving the following output:
jq: error (at :70): object ({}) is not valid in a csv row
exit status 5
So when I am getting this empty JSON object it is reflecting an error.
Please Help!!
The row output should be something like
null,642,642,412,0,null,null
Using length==0 here is dubious at best. To check for {} one could write:
jq '.data | map(.values[].value | if . == {} then "null" else . end) | @csv'
Similarly for [].
If you run the command without the @csv part you will see that the output is:
[
{},
642,
412,
0,
{},
{}
]
You can replace the empty objects (objects whose length is 0) with "null":
jq '.data | map(.values[].value) | map(if (type == "object" and length == 0 ) then "null" else . end) | @csv'
Output:
"\"null\",642,412,0,\"null\",\"null\""
Per suggestion from @aaron (see comment). The following can produce the requested output without extra post-processing. Disclaimer: this is not working with my jq 1.5, but working on jqplay with jq 1.6.
jq --raw-output '.data | map(.values[].value) | map(if (type == "object" and length == 0 ) then "null" else . end) | join(",")'
Output:
null,642,412,0,null,null
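As a sanity check on that output, the same substitution can be sketched in Python. This is an illustration only, using a payload reduced to the name and values keys from the question; it joins the values by hand rather than going through a CSV writer, which mirrors the join(",") approach.

```python
# Reduced payload, assumed for illustration: only the keys the row needs.
payload = {
    "data": [
        {"name": "exits", "values": [{"value": {}}]},
        {"name": "impressions", "values": [{"value": 642}]},
        {"name": "reach", "values": [{"value": 412}]},
        {"name": "replies", "values": [{"value": 0}]},
        {"name": "taps_forward", "values": [{"value": {}}]},
        {"name": "taps_back", "values": [{"value": {}}]},
    ]
}

# Flatten every metric's values, like .data | map(.values[].value) in jq.
values = [v["value"] for item in payload["data"] for v in item["values"]]

# An empty object becomes the literal text null; everything else is kept.
row = ",".join("null" if v == {} else str(v) for v in values)
print(row)  # null,642,412,0,null,null
```

Note that 0 is kept as a value here, because only an empty object is compared against, not anything falsy.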

Merge two Apache Avro schemas containing a common array using jq

I've got two Apache Avro schemas (essentially JSON): one is a "common" part shared across many schemas, and the other is specific to a single schema. I'm looking for a way to merge them in a shell script.
base.avsc
{
"type": "record",
"fields": [
{
"name": "id",
"type": "string"
}
]
}
schema1.avsc
{
"name": "schema1",
"namespace": "test",
"doc": "Test schema",
"fields": [
{
"name": "property1",
"type": [
"null",
"string"
],
"default": null,
"doc": "Schema 1 specific field"
}
]
}
jq -s '.[0] * .[1]' base.avsc schema1.avsc doesn't merge the array for me:
{
"type": "record",
"fields": [
{
"name": "property1",
"type": [
"null",
"string"
],
"default": null,
"doc": "Schema 1 specific field"
}
],
"name": "schema1",
"namespace": "test",
"doc": "Test schema"
}
I don't expect to have same keys in the "fields" array. And "type": "record", could be moved into schema1.avsc if that makes it easier.
An expected result should be something like this (the order of the keys doesn't make a difference)
{
"name": "schema1",
"namespace": "test",
"doc": "Test schema",
"type": "record",
"fields": [
{
"name": "property1",
"type": [
"null",
"string"
],
"default": null,
"doc": "Schema 1 specific field"
},
{
"name": "id",
"type": "string"
}
]
}
Can't figure out how to write an expression in jq for what I want.
You need the addition (+) operator to take the union of the two top-level objects, and then you need to combine the "fields" arrays from both files explicitly:
jq -s '.[0] as $o1 | .[1] as $o2 | ($o1 + $o2) |.fields = ($o2.fields + $o1.fields) ' base.avsc schema1.avsc
Answer adapted from pkoppstein's comment on this GitHub post Merge arrays in two json files.
The jq manual says this under the addition operator +
Objects are added by merging, that is, inserting all the key-value pairs from both objects into a single combined object. If both objects contain a value for the same key, the object on the right of the + wins. (For recursive merge use the * operator.)
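That rule is why the "fields" array has to be joined separately: with a plain object merge, the right-hand "fields" simply replaces the left-hand one. A minimal Python sketch of the same two steps, using the two schemas from the question (the variable names are only for illustration):

```python
base = {
    "type": "record",
    "fields": [{"name": "id", "type": "string"}],
}
schema1 = {
    "name": "schema1",
    "namespace": "test",
    "doc": "Test schema",
    "fields": [
        {
            "name": "property1",
            "type": ["null", "string"],
            "default": None,
            "doc": "Schema 1 specific field",
        }
    ],
}

# Step 1: shallow merge; on a key clash the right-hand object wins,
# so at this point "fields" holds only schema1's entries.
merged = {**base, **schema1}

# Step 2: rebuild "fields" by concatenating both arrays explicitly.
merged["fields"] = schema1["fields"] + base["fields"]

print([f["name"] for f in merged["fields"]])
```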
Here's a concise solution that avoids "slurping":
jq --argfile base base.avsc '
$base + .
| .fields += ($base|.fields)
' schema1.avsc
Or you could go with brevity:
jq -s '
.[0].fields as $f | add | .fields += $f
' base.avsc schema1.avsc
As an alternative solution, you may consider handling hierarchical JSON using jtc, a walk-path based Unix utility.
The ask here is merely a recursive merge, which with jtc looks like this:
bash $ <schema1.avsc jtc -mi base.avsc
{
"doc": "Test schema",
"fields": [
{
"default": null,
"doc": "Schema 1 specific field",
"name": "property1",
"type": [
"null",
"string"
]
},
{
"name": "id",
"type": "string"
}
],
"name": "schema1",
"namespace": "test",
"type": "record"
}
bash $
PS> Disclosure: I'm the creator of the jtc - shell cli tool for JSON operations

filter json via bash - case insensitive

I have json code and need to filter it by the value of the attribute DNSName. The filter must be case insensitive.
How can I do that? Is there a possibility to solve it with jq?
This is how I create the json code:
aws elbv2 describe-load-balancers --region=us-west-2 | jq
My unfiltered source json code looks like this:
{
"LoadBalancers": [
{
"IpAddressType": "ipv4",
"VpcId": "vpc-abcdabcd",
"LoadBalancerArn": "arn:aws:elasticloadbalancing:us-west-2:000000000000:loadbalancer/app/MY-LB1/a00000000000000a",
"State": {
"Code": "active"
},
"DNSName": "MY-LB1-123454321.us-west-2.elb.amazonaws.com",
"SecurityGroups": [
"sg-00100100",
"sg-01001000",
"sg-10010001"
],
"LoadBalancerName": "MY-LB1",
"CreatedTime": "2018-01-01T00:00:00.000Z",
"Scheme": "internet-facing",
"Type": "application",
"CanonicalHostedZoneId": "ZZZZZZZZZZZZZ",
"AvailabilityZones": [
{
"SubnetId": "subnet-17171717",
"ZoneName": "us-west-2a"
},
{
"SubnetId": "subnet-27272727",
"ZoneName": "us-west-2c"
},
{
"SubnetId": "subnet-37373737",
"ZoneName": "us-west-2b"
}
]
},
{
"IpAddressType": "ipv4",
"VpcId": "vpc-abcdabcd",
"LoadBalancerArn": "arn:aws:elasticloadbalancing:us-west-2:000000000000:loadbalancer/app/MY-LB2/b00000000000000b",
"State": {
"Code": "active"
},
"DNSName": "MY-LB2-9876556789.us-west-2.elb.amazonaws.com",
"SecurityGroups": [
"sg-88818881"
],
"LoadBalancerName": "MY-LB2",
"CreatedTime": "2018-01-01T00:00:00.000Z",
"Scheme": "internet-facing",
"Type": "application",
"CanonicalHostedZoneId": "ZZZZZZZZZZZZZ",
"AvailabilityZones": [
{
"SubnetId": "subnet-54545454",
"ZoneName": "us-west-2a"
},
{
"SubnetId": "subnet-64646464",
"ZoneName": "us-west-2c"
},
{
"SubnetId": "subnet-74747474",
"ZoneName": "us-west-2b"
}
]
}
]
}
I now want some bash code to filter this result for the record with the DNSName property value MY-LB2-9876556789.us-west-2.elb.amazonaws.com, and need the entire LoadBalancer object back as a result. This is how I wish my result to look like:
{
"IpAddressType": "ipv4",
"VpcId": "vpc-abcdabcd",
"LoadBalancerArn": "arn:aws:elasticloadbalancing:us-west-2:000000000000:loadbalancer/app/MY-LB2/b00000000000000b",
"State": {
"Code": "active"
},
"DNSName": "MY-LB2-9876556789.us-west-2.elb.amazonaws.com",
"SecurityGroups": [
"sg-88818881"
],
"LoadBalancerName": "MY-LB2",
"CreatedTime": "2018-01-01T00:00:00.000Z",
"Scheme": "internet-facing",
"Type": "application",
"CanonicalHostedZoneId": "ZZZZZZZZZZZZZ",
"AvailabilityZones": [
{
"SubnetId": "subnet-54545454",
"ZoneName": "us-west-2a"
},
{
"SubnetId": "subnet-64646464",
"ZoneName": "us-west-2c"
},
{
"SubnetId": "subnet-74747474",
"ZoneName": "us-west-2b"
}
]
}
Does anyone know how to do it?
Update:
This solution works, but is not case insensitive:
aws elbv2 describe-load-balancers --region=us-west-2 | jq -c '.LoadBalancers[] | select(.DNSName | contains("MY-LB2"))'
Update:
This solution seems to work even better:
aws elbv2 describe-load-balancers --region=us-west-2 | jq -c '.LoadBalancers[] | select(.DNSName | match("my-lb2";"i"))'
But I did not have the chance to test in detail yet.
You probably should be using test/2 rather than match/2, but in either case, since the problem description calls for
case-insensitive equality, you would use an anchored regex:
.LoadBalancers[]
| select(.DNSName | test("^my-lb2-9876556789\\.us-west-2\\.elb\\.amazonaws\\.com$";"i"))
With the caveat that ascii_upcase only translates ASCII characters, it might be more efficient to use it:
.LoadBalancers[]
| select(.DNSName | ascii_upcase == "MY-LB2-9876556789.US-WEST-2.ELB.AMAZONAWS.COM")
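If the filtering ever has to happen outside jq, the same case-insensitive equality check can be sketched in Python. The function name find_by_dns_name is hypothetical, and the list below keeps only two keys of each load balancer from the question. str.casefold() is used for the comparison, which also handles non-ASCII characters.

```python
def find_by_dns_name(load_balancers, dns_name):
    """Return the load balancers whose DNSName equals dns_name, ignoring case."""
    wanted = dns_name.casefold()
    return [
        lb for lb in load_balancers
        if lb.get("DNSName", "").casefold() == wanted
    ]


# Reduced input, assumed for illustration only.
load_balancers = [
    {"LoadBalancerName": "MY-LB1",
     "DNSName": "MY-LB1-123454321.us-west-2.elb.amazonaws.com"},
    {"LoadBalancerName": "MY-LB2",
     "DNSName": "MY-LB2-9876556789.us-west-2.elb.amazonaws.com"},
]

found = find_by_dns_name(
    load_balancers, "my-lb2-9876556789.us-west-2.elb.amazonaws.com"
)
print([lb["LoadBalancerName"] for lb in found])
```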
