Get the key values using jq from json - bash

I am looking for a way to find the full key path for a given value taken from a variable. My input comes from an Elasticsearch query result.
For example, I want the full path to the value 9i6O4ERWWB.
The value is always unique; the only things that change are the example.com and template1 keys (I cannot predict their names).
Once I know the key path (here _source.statistics."example.com".template1), I want to increment the "counter" field and update the Elasticsearch document.
My input JSON:
{
"_index": "domains",
"_type": "doc",
"_id": "c66443eb1e6a0850b03a91fdb967f4d1",
"_score": 2.4877305,
"_source": {
"user_id": "c66443eb1e6a0850b03a91fdb967f4d1",
"statistics": {
"test_count": 0,
"datasize": 0,
"example.com": {
"template1": {
"image_id": "iPpDWbaO3YTIEb0pBkW3.png",
"link_id": "4ybOOUJpaBpDaLxPkz1j.html",
"counter": 0,
"subdomain_id": "9i6O4ERWWB"
},
"template2": {
"image_id": "iPpDWasdas322sdaW3.png",
"link_id": "4ybOOd3425sdfsz1j.html",
"counter": 1,
"subdomain_id": "432432sdxWWB"
}
},
"example1.com": {
"template1": {
"image_id": "iPpDWdasdasdasdas3.png",
"link_id": "4ybOOUadsasdadsasd1j.html",
"subdomain_id": "9i6O4ERWWB"
}
}
}
}
}
What I have tried was:
<myfile jq -c 'paths | select(.[-1])'
<myfile jq -c 'paths | select(.[-1] == "subdomain_id")'
but this prints every path, without the values:
["_index"]
["_type"]
["_id"]
["_score"]
["_source"]
["_source","user_id"]
["_source","statistics"]
["_source","statistics","test_count"]
["_source","statistics","datasize"]
["_source","statistics","example.com"]
["_source","statistics","example.com","template1"]
["_source","statistics","example.com","template1","image_id"]
["_source","statistics","example.com","template1","link_id"]
["_source","statistics","example.com","template1","counter"]
["_source","statistics","example.com","template1","subdomain_id"]
["_source","statistics","example.com","template2"]
["_source","statistics","example.com","template2","image_id"]
["_source","statistics","example.com","template2","link_id"]
["_source","statistics","example.com","template2","counter"]
["_source","statistics","example.com","template2","subdomain_id"]
["_source","statistics","example1.com"]
["_source","statistics","example1.com","template1"]
["_source","statistics","example1.com","template1","image_id"]
["_source","statistics","example1.com","template1","link_id"]
["_source","statistics","example1.com","template1","subdomain_id"]
My pseudocode I am trying to write:
seeked_key_value="432432sdxWWB"
jq -n --arg seeked_key_value "$seeked_key_value" \
'paths | select(.[-1].$seeked_key_value'
Expected result: ["_source","statistics","example.com","template2","subdomain_id"], i.e. the path whose value is "432432sdxWWB".
Is this doable with jq in bash?

It's best to avoid grep in cases like this. To meet the exact requirements in the present case, one could write:
jq -c 'paths(scalars) as $p
| [$p, getpath($p)]
| select(.[1] == "9i6O4ERWWB")' input.json
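To take the value from a shell variable, as the pseudocode in the question tries to do, it can be passed in with --arg and compared with getpath. A minimal sketch; the input file here is a hypothetical, trimmed copy of the question's document written to a temporary file:

```shell
#!/usr/bin/env bash
# Hypothetical sample input: a trimmed copy of the question's document.
input=$(mktemp)
trap 'rm -f "$input"' EXIT
cat > "$input" <<'EOF'
{"_source": {"statistics": {"example.com": {
  "template1": {"counter": 0, "subdomain_id": "9i6O4ERWWB"},
  "template2": {"counter": 1, "subdomain_id": "432432sdxWWB"}}}}}
EOF

seeked_key_value="432432sdxWWB"
# --arg makes the shell value available inside jq as the string $seeked_key_value;
# paths(scalars) walks every leaf, and only leaves equal to that string are kept.
path_json=$(jq -c --arg seeked_key_value "$seeked_key_value" '
  paths(scalars) as $p
  | select(getpath($p) == $seeked_key_value)
  | $p' "$input")
printf '%s\n' "$path_json"
```

This prints ["_source","statistics","example.com","template2","subdomain_id"]. If the value occurs more than once (9i6O4ERWWB does in the question's data), one path is printed per occurrence.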
If grep-like matching is really needed, jq's test/1 can always be used.
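For instance, a sketch that keeps only leaves whose value ends in sdxWWB (the regex is just an illustration). test/1 raises an error when its input is not a string, so strings is used first to drop numeric leaves such as counter:

```shell
#!/usr/bin/env bash
# Hypothetical sample input: a trimmed copy of the question's document.
input=$(mktemp)
trap 'rm -f "$input"' EXIT
cat > "$input" <<'EOF'
{"_source": {"statistics": {"example.com": {
  "template1": {"counter": 0, "subdomain_id": "9i6O4ERWWB"},
  "template2": {"counter": 1, "subdomain_id": "432432sdxWWB"}}}}}
EOF

# strings passes only string leaves on to test/1; numbers are skipped instead of erroring.
matches=$(jq -c 'paths(scalars) as $p
  | [$p, getpath($p)]
  | select(.[1] | strings | test("sdxWWB$"))' "$input")
printf '%s\n' "$matches"
```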

You can 'extract' paths using the following:
jq -c 'paths(scalars) as $p | [$p, getpath($p)]' file.json | grep 432432sdxWWB
and the output is:
[["_source","statistics","example.com","template2","subdomain_id"],"432432sdxWWB"]
You could probably refine the jq query to return only the single matching value, but I hope this helps you arrive at a final version :)
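Since the end goal in the question is to increment counter next to the matching subdomain_id, the path does not necessarily have to be extracted first: an update assignment can target every object whose subdomain_id equals the value. A sketch under that assumption; it only rewrites the JSON, and sending the new _source back to Elasticsearch is not shown. Every matching object is updated, and a missing counter (as under example1.com) becomes 1 because null + 1 is 1 in jq:

```shell
#!/usr/bin/env bash
# Hypothetical sample input: a trimmed copy of the question's document.
input=$(mktemp)
trap 'rm -f "$input"' EXIT
cat > "$input" <<'EOF'
{"_source": {"statistics": {"example.com": {
  "template1": {"counter": 0, "subdomain_id": "9i6O4ERWWB"},
  "template2": {"counter": 1, "subdomain_id": "432432sdxWWB"}}}}}
EOF

seeked_key_value="432432sdxWWB"
# .. visits every value, objects keeps only objects, select keeps those whose
# subdomain_id matches, and |= adds 1 to their counter field in place.
updated=$(jq -c --arg v "$seeked_key_value" '
  (.. | objects | select(.subdomain_id == $v) | .counter) |= . + 1
' "$input")
printf '%s\n' "$updated"
```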

Related

How to extract data from a JSON file into a variable

I have JSON in the following format; it is basically a huge file with many such entries.
[
{
"id": "kslhe6em",
"version": "R7.8.0.00_BNK",
"hostname": "abacus-ap-hf-test-001:8080",
"status": "RUNNING",
},
{
"id": "2bkaiupm",
"version": "R7.8.0.00_BNK",
"hostname": "abacus-ap-hotfix-001:8080",
"status": "RUNNING",
},
{
"id": "rz5savbi",
"version": "R7.8.0.00_BNK",
"hostname": "abacus-ap-hf-test-005:8080",
"status": "RUNNING",
},
]
I want to fetch all the hostname values that start with "abacus-ap-hf-test", without the ":8080" suffix, into a variable, and then use those values in further commands inside a for loop, something like the one below. But I am a bit confused about how to extract this information.
HOSTAME="abacus-ap-hf-test-001 abacus-ap-hf-test-005"
for HOSTNAME in $HOSTAME
do
sh ./trigger.sh
done
Change the first line to this:
HOSTAME=$(grep -oP 'hostname": "\K(abacus-ap-hf-test[\w\d-]+)' json.file)
or, if you are sure that every hostname ends with :8080, try this:
HOSTAME=$(grep -oP '(?<="hostname": ")abacus-ap-hf-test[\w\d-]+(?=:8080")' json.file)
Here abacus-ap-hf-test[\w\d-]+ is the part of the regex that is actually returned; the strings around it (hostname": " followed by \K in the first version, the lookbehind and lookahead in the second) only anchor the match so that the result is accurate.
Assuming you have valid JSON, you can get the hostname values using jq:
while read -r hname ; do printf "%s\n" "$hname" ; done < <(jq -r '.[].hostname' j.json)
Output:
abacus-ap-hf-test-001:8080
abacus-ap-hotfix-001:8080
abacus-ap-hf-test-005:8080
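To also apply the prefix filter and strip :8080 as the question asks, the selection can be done inside jq. A sketch, assuming the input is valid JSON (the question's sample has trailing commas, so it is rewritten here without them) and bash 4 or later for mapfile; the trigger.sh call from the question is left as a comment:

```shell
#!/usr/bin/env bash
json_file=$(mktemp)
trap 'rm -f "$json_file"' EXIT
cat > "$json_file" <<'EOF'
[
  {"id": "kslhe6em", "version": "R7.8.0.00_BNK", "hostname": "abacus-ap-hf-test-001:8080", "status": "RUNNING"},
  {"id": "2bkaiupm", "version": "R7.8.0.00_BNK", "hostname": "abacus-ap-hotfix-001:8080", "status": "RUNNING"},
  {"id": "rz5savbi", "version": "R7.8.0.00_BNK", "hostname": "abacus-ap-hf-test-005:8080", "status": "RUNNING"}
]
EOF

# Keep hostnames with the wanted prefix, then remove a trailing ":8080".
mapfile -t hosts < <(jq -r '.[].hostname
  | select(startswith("abacus-ap-hf-test"))
  | rtrimstr(":8080")' "$json_file")

for host in "${hosts[@]}"; do
  printf '%s\n' "$host"   # e.g. run sh ./trigger.sh here
done
```

Reading the results into an array keeps each hostname as one element, so the loop does not depend on word splitting of a space-separated string.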

jq does not show null output

I have the following code in a command-line script:
output_json=$(jq -n \
--argjson ID "${id}" \
--arg Title "${title}" \
--argjson like "\"${like}\"" \
'$ARGS.named')
I pass the id, title and like variables into jq and get the following output:
[
{
"ID": 6,
"Title": "ABC",
"like": ""
},
{
"ID": 22,
"Title": "ABC",
"like": "Yes"
}
]
But, I am trying to get the output in the following format, i.e. with null:
[
{
"ID": 6,
"Title": "ABC",
"like": null
},
{
"ID": 22,
"Title": "ABC",
"like": "Yes"
}
]
I don't quite understand: is it possible to do this at all, or is it a problem with my jq command?
As far as I understand, "like": "" is not the same as "like": null. I am also a little confused now and do not really understand which one is the correct choice.
When using --argjson you need to provide a valid JSON-encoded argument, so if you want to receive null, the value must literally be null. Your command, however, adds quotes around it, so it can never evaluate to null. (Also, it will only be a valid JSON string if it follows JSON escaping for special characters such as the quote character itself.)
If you want to have a JSON string in the regular case, and null in the case where it is empty, import the content of ${like} as string using --arg and without the extra quotes (just as you do with ${title}), then use some jq logic to turn the empty string into null. An if statement would do, for example:
like=
jq -n --arg like "${like}" '{like: (if $like == "" then null else $like end)}'
{
"like": null
}
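Applied to the full command from the question, the same idea can post-process $ARGS.named. A sketch assuming jq 1.6 or later (where $ARGS.named is available), with sample values for id, title and like:

```shell
#!/usr/bin/env bash
# Sample values; an empty like should become null.
id=6
title="ABC"
like=""

# like is passed as a plain string with --arg (no extra quotes),
# then an empty string is turned into null inside jq.
output_json=$(jq -n \
  --argjson ID "${id}" \
  --arg Title "${title}" \
  --arg like "${like}" \
  '$ARGS.named | .like |= (if . == "" then null else . end)')
printf '%s\n' "$output_json"
```

With like=Yes the same command keeps "like": "Yes".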

Can't set different json values with different values

Linux Mint 20.2
Here is report.empty.json:
{
"total": 0,
"project": "",
"Severity": [],
"issues": []
}
I want to set "total" to 500 (an int value) and "project" to "MY_PROJECT".
To do this I use the jq tool.
Here is my bash script:
#!/bin/bash
readonly PROJECT_KEY=MY_PROJECT
readonly PAGE_SIZE=500
jq --arg totalArg "$PAGE_SIZE" '.total = $totalArg' report.empty.json > report.json
jq --arg projectKey "${PROJECT_KEY}" '.project = $projectKey' report.empty.json > report.json
echo "Done"
But it sets only the project key; the total key is not changed.
Contents of report.json:
{
"total": 0,
"project": "MY_PROJECT",
"Severity": [],
"issues": []
}
But I need to update BOTH KEYS.
The result must be:
{
"total": 500,
"project": "MY_PROJECT",
"Severity": [],
"issues": []
}
The second command reads from report.empty.json instead of the already-modified report.json.
You could chain the two jq commands:
jq --arg totalArg "$PAGE_SIZE" '.total = $totalArg' report.empty.json |
jq --arg projectKey "${PROJECT_KEY}" '.project = $projectKey' >report.json
But a better solution is to use just one command:
jq --arg totalArg "$PAGE_SIZE" --arg projectKey "$PROJECT_KEY" '
.total = $totalArg | .project = $projectKey
' report.empty.json >report.json
My proposal for how to populate JSON values using jq.
Inspired by How to process arrays using jq, here is my modified version of your script. (Of course, you could keep the empty JSON template outside the script.)
#!/bin/bash
declare -r projectKey=MY_PROJECT
declare -ir pageSize=500
declare -a issueList=()
declare -i issueCnt=0
declare issueStr='' jqCmd='.project = $projArg | .total = $totArg | .issues=[ '
declare promptMessage='Enter issue (or [return] if none): '
while read -rp "$promptMessage" issue && [ "$issue" ];do
promptMessage='Enter next issue (or [return] if no more): '
issueCnt+=1
issueList+=(--arg is$issueCnt "$issue")
issueStr+="\$is$issueCnt, "
done
jqCmd+="${issueStr%, } ]"
jq --arg totArg "$pageSize" --arg projArg "$projectKey" \
"${issueList[@]}" "( $jqCmd )" <<-EoEmptyJson
{
"total": 0,
"project": "",
"Severity": [],
"issues": []
}
EoEmptyJson
Sample run (I want to add two issues):
./reportJson
Enter issue (or [return] if none): Foo
Enter next issue (or [return] if no more): Bar Baz
Enter next issue (or [return] if no more):
{
"total": "500",
"project": "MY_PROJECT",
"Severity": [],
"issues": [
"Foo",
"Bar Baz"
]
}
No answer (so far) accounts for the requirement that total be of type int. This can be accomplished by using --argjson instead of --arg. Here's my two cents:
jq --argjson total 500 --arg project "MY_PROJECT" '. + {$total, $project}' report.json
{
"total": 500,
"project": "MY_PROJECT",
"Severity": [],
"issues": []
}
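Putting the two points together (a single jq invocation and a numeric total), the question's script might look like the sketch below. The temporary directory and the inline copy of report.empty.json are there only to make the example self-contained:

```shell
#!/usr/bin/env bash
readonly PROJECT_KEY=MY_PROJECT
readonly PAGE_SIZE=500

tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
printf '%s\n' '{"total": 0, "project": "", "Severity": [], "issues": []}' > "$tmpdir/report.empty.json"

# One jq call: --argjson parses "500" as a JSON number, --arg passes the project key as a string.
jq --argjson total "$PAGE_SIZE" --arg project "$PROJECT_KEY" \
  '. + {$total, $project}' "$tmpdir/report.empty.json" > "$tmpdir/report.json"

# Alternative: keep --arg and convert the string explicitly with tonumber.
jq --arg total "$PAGE_SIZE" --arg project "$PROJECT_KEY" \
  '.total = ($total | tonumber) | .project = $project' \
  "$tmpdir/report.empty.json" > "$tmpdir/report2.json"

cat "$tmpdir/report.json"
```

Both variants read from report.empty.json only once and write report.json once, which avoids the overwrite problem from the question.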

JQ query on JSON file

I have the following JSON in a file.
{
"comment": {
"vm-updates": [],
"site-ops-updates": [
{
"comment": {
"message": "You can start maintenance on this resource"
},
"hw-name": "Machine has got missing disks. "
}
]
},
"object_name": "4QXH862",
"has_problems": "yes",
"tags": ""
}
I want to extract "hw-name" from this JSON file using jq. I've tried the combinations below, but none of them worked.
cat jsonfile | jq -r '.comment[].hw-name'
cat json_file.json | jq -r '.comment[].site-ops-updates[].hw-name'
Any help is appreciated.
It should be:
▶ cat jsonfile | jq -r '.comment."site-ops-updates"[]."hw-name"'
Machine has got missing disks.
Or better still:
▶ jq -r '.comment."site-ops-updates"[]."hw-name"' jsonfile
Machine has got missing disks.
From the docs:
If the key contains special characters, you need to surround it with double quotes like this: ."foo$", or else .["foo$"].
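As for why the original attempts fail: without quotes, jq parses hw-name as .hw minus a call to a function called name, which is not defined. Besides the quoted-key form, the bracket form from the docs works too, and if the nesting is not known in advance, a recursive search can find the key anywhere. A sketch with the question's JSON in a temporary file:

```shell
#!/usr/bin/env bash
json_file=$(mktemp)
trap 'rm -f "$json_file"' EXIT
cat > "$json_file" <<'EOF'
{
  "comment": {
    "vm-updates": [],
    "site-ops-updates": [
      {
        "comment": { "message": "You can start maintenance on this resource" },
        "hw-name": "Machine has got missing disks. "
      }
    ]
  },
  "object_name": "4QXH862",
  "has_problems": "yes",
  "tags": ""
}
EOF

# Bracket syntax, equivalent to ."site-ops-updates" and ."hw-name".
hw1=$(jq -r '.comment["site-ops-updates"][]["hw-name"]' "$json_file")
# Recursive variant: every object anywhere in the document that has the key.
hw2=$(jq -r '.. | objects | select(has("hw-name")) | ."hw-name"' "$json_file")
printf '%s\n' "$hw1" "$hw2"
```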

Find ec2 instances with improper or missing tags

I am trying to simply output a list of all instance IDs that do not follow a particular tagging convention.
(1) Tag is missing (tag keys: Environment or Finance)
(2) Environment tag value is not one of (prod, stg, test, dev)
(3) Finance tag value is not one of (GroupA, GroupB)
For (1) I can use the following:
aws ec2 describe-instances --output json --query 'Reservations[*].Instances[?!not_null(Tags[?Key==`Environment`].Value)] | [].InstanceId'
[
"i-12345678901234567",
"i-76543210987654321"
]
But I still need (2) and (3). What if the tag exists but is empty, or has a typo in the value?
"ec2 --query" functionality is limited and I've yet to find a way for it to get me (2) or (3), especially when it comes to inverting results.
I've gone back and forth between modifying the output from the CLI to make it easier to parse in jq, versus trying to wrangle the output in jq.
For (2) and (3), here's a pair of outputs from the CLI that I've tried sending to jq to parse, with sample output for 2 instances:
CLI Sample Output [A]: Tag.Value and Tag.Key need to be paired when searching, and then a set of searches negated/inverted...
aws ec2 describe-instances --output json --query 'Reservations[].Instances[].{ID:InstanceId, Tag: Tags[]}' | jq '.[]'
{
"Tag": [
{
"Value": "GroupA",
"Key": "Finance"
},
{
"Value": "stg",
"Key": "Environment"
},
{
"Value": "true",
"Key": "Backup"
},
{
"Value": "Another Server",
"Key": "Name"
}
],
"ID": "i-87654321"
}
{
"Tag": [
{
"Value": "GroupB",
"Key": "Finance"
},
{
"Value": "Server 1",
"Key": "Name"
},
{
"Value": "true",
"Key": "Backup"
},
{
"Value": "stg",
"Key": "Environment"
}
],
"ID": "i-12345678"
}
CLI Sample Output [B]: the tag value being inside an array has been enough to trigger syntax errors when attempting things like "jq map" or "jq select".
aws ec2 describe-instances --output json --query 'Reservations[].Instances[].{ID:InstanceId, EnvTag: Tags[?Key==`Environment`].Value, FinTag: Tags[?Key==`Finance`].Value}' | jq '.[]'
{
"EnvTag": [
"stg"
],
"ID": "i-87654321",
"FinTag": [
"GroupA"
]
}
{
"EnvTag": [
"stg"
],
"ID": "i-12345678",
"FinTag": [
"GroupB"
]
}
I find most of the time, when I try to expand some solution from a simpler use case, I only ever end up with cryptic syntax errors due to some oddity in the structure of my incoming dataset.
Example Issue 1
Below is an example of how the inverting / negating fails. This is using CLI output B:
aws ec2 describe-instances --output json --query 'Reservations[].Instances[].{ID:InstanceId, EnvTag: Tags[?Key==`Environment`].Value, FinTag: Tags[?Key==`Finance`].Value}' | jq '.[]' | jq 'select(.EnvTag[] | contains ("prod", "dev") | not)'
I would expect the above to return everything except prod and dev. But it looks like the logic is inverted on each item as opposed to the set of contains:
"!A + !B" instead of "!(A or B)"
The resulting dataset returned is a list of everything, including dev and prod.
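The behaviour described above can be reproduced on a single string. A small sketch: the comma makes contains() run once per argument, so each EnvTag value yields one boolean per argument, and select() keeps the object for every true it sees. A single negated any/2 over exact (==) comparisons gives the intended "!(A or B)" instead:

```shell
#!/usr/bin/env bash
# The comma makes contains() run once per argument, so "dev" yields two booleans.
per_arg=$(jq -n -c '"dev" | [contains("prod", "dev") | not]')
printf '%s\n' "$per_arg"   # [true,false]: the first true is enough for select() to keep the input

# Negating one any/2 expresses "!(A or B)" and compares whole strings, not substrings.
kept=$(jq -n -c '["dev", "production", "stg"]
  | map(select(. as $e | any("prod", "dev"; . == $e) | not))')
printf '%s\n' "$kept"      # ["production","stg"]
```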
Example Issue 1.5
I can work around the logic issue by chaining the contains excludes, but then I discover that "contains" won't work for me, as it will pick up typos that still happen to contain the string in question:
aws ec2 describe-instances --output json --query 'Reservations[].Instances[].{ID:InstanceId, EnvTag: Tags[?Key==`Environment`].Value, FinTag: Tags[?Key==`Finance`].Value}' | jq '.[]' | jq 'select(.EnvTag[] | contains ("dev") | not) | select(.EnvTag[] | contains ("stg") | not) | select(.EnvTag[] | contains ("test") | not) | select(.EnvTag[] | contains ("prod") | not) | select (.EnvTag[] | contains ("foo") | not)'
prod != production
"prod" contains("prod") = true
"production" contains ("prod") = true <-- bad :(
The chained select solution below can be greatly simplified. First, in this case there is no need to invoke jq twice: jq '.[]' | jq ... is equivalent to jq '.[] | ...'
Second, the long pipeline of 'select' filters can be condensed, for example to:
select(.EnvTag[]
| (. != "dev" and . != "stg" and . != "prod" and . != "test" and . != "ops"))
or, if your jq has all/2, even more concisely to:
select( . as $in | all( ("dev", "stg", "prod", "test", "ops"); . != $in.EnvTag[]) )
I believe I've found a solution. It may not be optimal, but I've found a way to pipe-chain excludes of exact strings:
aws ec2 describe-instances --output json --query 'Reservations[].Instances[].{ID:InstanceId, EnvTag: Tags[?Key==`Environment`].Value, FinTag: Tags[?Key==`Finance`].Value}' | jq '.[]' | jq 'select(.EnvTag[] != "dev") | select (.EnvTag[] != "stg") | select (.EnvTag[] != "prod") | select (.EnvTag[] != "test") | select (.EnvTag[] != "ops") | .ID'
I verified this by changing an environment tag from "ops" to "oops".
Upon running this query, it returned the single instance with the oops tag.
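A sketch that covers (1), (2) and (3) in one pass over CLI Sample Output [A], i.e. the array printed by aws ec2 describe-instances --output json --query 'Reservations[].Instances[].{ID:InstanceId, Tag: Tags[]}' piped straight into jq without the extra jq '.[]'. The Key/Value pairs are folded into one object per instance, so a missing tag shows up as null and fails the exact-match check. The sample data, including the instance IDs i-bad and i-notag, is hypothetical:

```shell
#!/usr/bin/env bash
cli_output=$(mktemp)
trap 'rm -f "$cli_output"' EXIT
cat > "$cli_output" <<'EOF'
[
  {"ID": "i-87654321", "Tag": [{"Value": "GroupA", "Key": "Finance"}, {"Value": "stg", "Key": "Environment"}]},
  {"ID": "i-12345678", "Tag": [{"Value": "GroupB", "Key": "Finance"}, {"Value": "stg", "Key": "Environment"}]},
  {"ID": "i-bad", "Tag": [{"Value": "GroupA", "Key": "Finance"}, {"Value": "production", "Key": "Environment"}]},
  {"ID": "i-notag", "Tag": null}
]
EOF

# $t becomes e.g. {"Finance": "GroupA", "Environment": "stg"}; an instance is reported
# when either tag is absent or its value is not exactly one of the allowed strings.
bad_ids=$(jq -r '.[]
  | (reduce (.Tag // [])[] as $kv ({}; .[$kv.Key] = $kv.Value)) as $t
  | select(
      (any("prod", "stg", "test", "dev"; . == $t.Environment) | not)
      or (any("GroupA", "GroupB"; . == $t.Finance) | not)
    )
  | .ID' "$cli_output")
printf '%s\n' "$bad_ids"
```

Because the comparison is ==, "production" is reported even though it contains "prod", which is the case contains() could not handle.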
