JQ filter formatting - bash

I have a JSON file I'd like to filter by the ID field, and show the matching Body and Source fields.
Format of the JSON file to filter
[
{
timestamp : 1638550971085,
id : "54f",
body : "Orange",
source : "827261",
}
{
timestamp : 1638550971096,
id : "54f",
body : "Apple",
source : "137261",
}
{
timestamp : 1638550971126,
id : "5da",
body : "Pear",
source : "1da61",
}
]
In this example I would like to filter where id = 54f and show the timestamp (Unix time converted to local time), body and source fields that match, ideally as shown below.
[Timestamp] Orange 827261
[Timestamp] Apple 137261
I have tried this command, but it is showing extra body / source fields outside of the select filter. It also adds a line break between the body and the source, and ideally I'd like them printed on one line (tab separated). I also don't know how to convert the timestamp to a local time string.
more file.json | jq '.[] | select(.Id=="54f").body, .source'

Your JSON input is not proper JSON, as it has:
commas after the .source field but no following field
no commas between the elements of the top-level array
no quotes around the objects' field names
You'd need to address these issues first before proceeding. This is how it should look:
[
{
"timestamp": 1638550971085,
"id": "54f",
"body": "Orange",
"source": "827261"
},
{
"timestamp": 1638550971096,
"id": "54f",
"body": "Apple",
"source": "137261"
},
{
"timestamp": 1638550971126,
"id": "5da",
"body": "Pear",
"source": "1da61"
}
]
Then you can go with the approach below.
localtime (available since jq 1.6) converts a timestamp of seconds since the Unix epoch (so, divide yours by 1000) into a so-called "broken down time" object (see the manual), which you can either process using strftime (see the answer from David Conrad) or parse yourself manually. With .[:3] | .[1] += 1 | join("-") I provide a rather primitive example for demonstration purposes: it concatenates the first three items (year, month, day) with dashes in between, after incrementing the second item (as the month has a 0-based encoding). For padding with zeroes, see the sketch after the example output.
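To see what localtime hands you, here is a small illustration (a hedged example: the time-of-day fields depend on your timezone, so TZ=UTC is set to make the output deterministic):
TZ=UTC jq -n '1638550971 | localtime'
[2021,11,3,17,2,51,5,336]
The fields are year, 0-based month, day, hour, minute, second, weekday and day of the year.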
@tsv creates tabs between the columns:
jq -r '
.[]
| select(.id == "54f")
| [(.timestamp / 1000 | localtime | .[:3] | join("-")), .body, .source]
| @tsv
' file.json
2021-12-3 Orange 827261
2021-12-3 Apple 137261
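If you want zero-padding without strftime, staying with the join approach, a minimal sketch (assuming jq 1.6) could pad each component manually:
jq -r '
.[]
| select(.id == "54f")
| [(.timestamp / 1000 | localtime | .[:3] | .[1] += 1
   | map(tostring | if length < 2 then "0" + . else . end) | join("-")), .body, .source]
| @tsv
' file.json
2021-12-03 Orange 827261
2021-12-03 Apple 137261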

As the other answer states, your JSON is not correct. After fixing that, you can filter and extract the data as that answer suggests, but use the strftime function to format the dates properly:
jq -r '.[] | select(.id == "54f")
| [(.timestamp / 1000 | localtime | strftime("%Y-%m-%d")), .body, .source]
| @tsv' file.json
The use of strftime("%Y-%m-%d") is critical to both displaying the correct month and formatting the date with leading zeroes on single-digit months and days.
2021-12-03 Orange 827261
2021-12-03 Apple 137261
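On typical POSIX systems, localtime and strftime honor the TZ environment variable, so if you need the output in a specific zone you can set it for the invocation (the zone name here is only illustrative):
TZ=America/New_York jq -r '.[] | select(.id == "54f")
| [(.timestamp / 1000 | localtime | strftime("%Y-%m-%d")), .body, .source]
| @tsv' file.json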

Related

How to extract a value by searching for two words in different lines and getting the value of the second one

How to search for a word and, once it's found, save a specific value from the next line in a variable.
The JSON below is only a small part of the file.
Because this specific file's JSON structure is inconsistent and subject to change over time, it needs to be done via search tools like grep, sed or awk.
However, the parameters below will always be the same:
search for the word next
get the line below it
extract everything after page_token= up to, but not including, the boundary "
store it in a variable to be used
test.txt:
"link": [
{
"relation": "search",
"url": "aaa/ww/rrrrrrrrr/aaaaaaaaa/ffffffff/ccccccc/dddd/?token=gggggggg3444"
},
{
"relation": "next",
"url": "aaa/ww/rrrrrrrrr/aaaaaaaaa/ffffffff/ccccccc/dddd/?&_page_token=121_%_#212absa23bababa121212121212121"
},
]
so the desired output in this case is:
PAGE_TOKEN="121_%_#212absa23bababa121212121212121"
my attempt:
PAGE_TOKEN=$(cat test.txt| grep "next" | sed 's/^.*: *//;q')
No luck so far.
This might work for you (GNU sed):
sed -En '/next/{n;s/.*(page_token=)([^"]*).*/\U\1\E"\2"/p}' file
This is essentially a filtering operation, hence the use of the -n option.
Find a line containing next, fetch the next line, format as required and print the result.
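Since the command prints the complete assignment (PAGE_TOKEN="..."), one way to actually get it into a shell variable is to evaluate its output; a sketch, and only advisable if you trust the file's contents:
eval "$(sed -En '/next/{n;s/.*(page_token=)([^"]*).*/\U\1\E"\2"/p}' test.txt)"
echo "$PAGE_TOKEN"
121_%_#212absa23bababa121212121212121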
Presuming your input is valid json, one option is to use:
cat test.json
[{
"relation": "search",
"url": "aaa/ww/rrrrrrrrr/aaaaaaaaa/ffffffff/ccccccc/dddd/?token=gggggggg3444"
},
{
"relation": "next",
"url": "aaa/ww/rrrrrrrrr/aaaaaaaaa/ffffffff/ccccccc/dddd/?&_page_token=121_%_#212absa23bababa121212121212121"
}
]
PAGE_TOKEN=$(cat test.json | jq -r '.[] | select(.relation=="next") | .url | gsub(".*=";"")')
echo "$PAGE_TOKEN"
121_%_#212absa23bababa121212121212121
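A variant of the same idea using jq's capture, which anchors on the _page_token= key instead of stripping everything up to the last = (a sketch against the same cleaned-up test.json):
PAGE_TOKEN=$(jq -r '.[] | select(.relation=="next") | .url | capture("_page_token=(?<t>.*)") | .t' test.json)
echo "$PAGE_TOKEN"
121_%_#212absa23bababa121212121212121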

How can I edit my response to 1 row + add a condition to the jq function

Hey, I am using the following command:
aws ec2 describe-volumes --filters Name=status,Values=available | jq '.Volumes[] | { State: .State, VolumeId: .VolumeId, Tags: .Tags}'
I get the following response:
{
"State": "available",
"VolumeId": "vol-03449dadd29f2067f",
"Tags": [
{
"Key": "sre",
"Value": "test"
}
]
}
1. I want the response to be on one row per volume.
2. I want to check whether the volume has the tag "Name"; if so, I want its value, otherwise the tags don't interest me. How can I do that?
Use the -c option.
Use a query of the form select(any(.Tags[]; CONDITION)),
e.g. select(any(.Tags[]; .Key == "Name"))
You can abbreviate expressions of the form {foo: .foo} to just {foo}
So, depending on your exact requirements, your invocation of jq could look like:
jq -c '.Volumes[] | {State, VolumeId, Tags} | select(any(.Tags[]; .Key == "Name"))'
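If you also want the value of the Name tag rather than the whole Tags array, a sketch along these lines should work (assumptions: at most one Name tag per volume, and .Tags[]? to guard against volumes with no Tags key at all):
aws ec2 describe-volumes --filters Name=status,Values=available |
jq -c '.Volumes[]
| select(any(.Tags[]?; .Key == "Name"))
| {State, VolumeId, Name: (.Tags[] | select(.Key == "Name") | .Value)}'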

jq bash issue with filtering with CONTAINS

I have an issue where I am trying to filter records with contains, but it won't accept a variable that has spaces in it. I am including the JSON and the calls below; I explain what works, and the last call is the one that does not work. I have looked high and low but can't make it work. I have tried hundreds of variations (double quotes escaped and unescaped, with and without), but no luck. Can someone take a look and point me to something that might help?
JSON used to test
_metadatadashjson='{ "meta": { "provisionedExternalId": "" }, "dashboard": { "liveNow": false, "panels": [ { "collapsed": false, "title": "Gyrex Thread Count Gauges", "type": "row", "targets": [ { "expr": "jvm_threads_current{instance=\"192.1.50.22:8055\",job=\"prometheus_gyrex\"}", "refId": "B" } ] }, { "datasource": "Prometheus_16_Docker", "targets": [ { "exemplar": true, "expr": "jvm_threads_current{instance=\"10.32.0.4:8055\",job=\"prometheus_gyrex\"}" } ], "title": ".16 : 3279", "type": "gauge" }, { "description": "", "targets": [ { "expr": "jvm_threads_current{instance=\"10.32.0.7:8055\",job=\"prometheus_gyrex\"}", "refId": "B" } ], "title": ".16 : 3288", "type": "graph" }, { "description": "", "targets": [ { "expr": "jvm_threads_current{instance=\"192.168.2.16:3288\",job=\"prometheus_gyrex\"}", "refId": "C" } ], "title": ".16 : 3288", "type": "graph" } ], "version": 55 }}'
Set the string to search for in key "expr"
exprStrSearch="10.32.0.4:8055"
This works and returns one record:
echo "${_metadatadashjson}" | jq -r --arg EXPRSTRSEARCH "$exprStrSearch" '.dashboard.panels[] | select(.targets[].expr | contains($EXPRSTRSEARCH)) | .targets[].expr'
This works, no problem, and returns two records:
echo "${_metadatadashjson}" | jq -r --arg EXPRSTRSEARCH "$exprStrSearch" '.dashboard.panels[] | select(.targets[].expr | contains("10.32.0.4:8055", "10.32.0.7:8055")) | .targets[].expr'
Change the value to include a space and another string
exprStrSearch="10.32.0.4:8055 10.32.0.7:8055"
Does not work.
echo "${_metadatadashjson}" | jq -r --arg EXPRSTRSEARCH "$exprStrSearch" '.dashboard.panels[] | select(.targets[].expr | contains($EXPRSTRSEARCH)) | .targets[].expr'
None of your data contains "10.32.0.4:8055 10.32.0.7:8055".
You could pass multiple strings to contains(), using a bash array:
strings=("10.32.0.4:8055" "10.32.0.7:8055")
echo "${_metadatadashjson}" |
jq -r --args '.dashboard.panels[] | select(.targets[].expr | contains($ARGS.positional[])) | .targets[].expr' "${strings[@]}"
But contains will evaluate to true for each match, i.e. if one expr contained both strings, it would be selected (and printed) twice.
With test, that won't happen. Here's how you can add the |s between multiple strings, and pass them in a single jq variable (as well as escape all the dots):
strings=("10.32.0.4:8055" "10.32.0.7:8055")
IFS=\|
echo "${_metadatadashjson}" |
jq -r --arg str "${strings[*]//./\\.}" '.dashboard.panels[] | select(.targets[].expr | test($str)) | .targets[].expr'
Both examples print this:
jvm_threads_current{instance="10.32.0.4:8055",job="prometheus_gyrex"}
jvm_threads_current{instance="10.32.0.7:8055",job="prometheus_gyrex"}
Update: I forgot to escape the dots for test. I edited the test example so that all the dots get escaped (with a single backslash). It's regex, so (unescaped) dots will match any character. The contains example matches the strings literally (not regex).
The problem is that the string with the space in it does not in fact occur in the given JSON. It's not too clear what you are trying to do but please note that contains is not symmetric:
"a" | contains("a b")
evaluates to false.
If you intended to write a boolean search criterion, you could use a boolean expression (see the sketch at the end of this answer), or use jq's regular expression machinery, e.g.
test("10.32.0.4:8055|10.32.0.7:8055")
or probably even better:
test("\"(10[.]32[.]0[.]4:8055|10[.]32[.]0[.]7:8055)\"")

jq Compare two files and output the difference in text format

I have 2 files
file_one.json
{
"releases": [
{
"name": "bpm",
"version": "1.1.5"
},
{
"name": "haproxy",
"version": "9.8.0"
},
{
"name": "test",
"version": "10"
}
]
}
and file_two.json
{
"releases": [
{
"name": "bpm",
"version": "1.1.6"
},
{
"name": "haproxy",
"version": "9.8.1"
},
{
"name": "test",
"version": "10"
}
]
}
In file 2 the versions were changed, and I need to echo the new changes.
I have used the following command to see the changes:
diff -C 2 <(jq -S . file_one.json) <(jq -S . file_two.json)
But then I need to format the output to something like this.
I need to output text:
The new versions are:
bpm 1.1.6
haproxy 9.8.1
You may be able to use the following jq command:
jq --slurp -r 'map(.releases) | add
| group_by(.name)
| map(unique | select(length > 1) | max_by(.version))
| map("\(.name) : \(.version)") | join("\n")'
file_one.json file_two.json
It first merges the two releases arrays, groups the elements by name, deduplicates the elements of the resulting arrays, removes the arrays with a single element (the versions that were identical between the two files), then maps the arrays onto their greatest element (by version) and finally formats those for display.
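For the two sample files above, this should print:
bpm : 1.1.6
haproxy : 9.8.1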
A few particularities that might make this solution incorrect for your use:
it doesn't only report version upgrades, but also version downgrades. However, it always returns the greatest version, disregarding which file contains it.
the version comparison is alphabetic. It's okay with your sample, but it can fail for multi-digit versions (e.g. 1.1.5 is considered greater than 1.1.20 because 5 > 2). This could be fixed, but might not be problematic depending on your versioning scheme.
Edit following your updated request in the comments: the following jq command will output the versions changed between the first file and the second. It nicely handles downgrades and somewhat handles products that have appeared or disappeared in the second file (although it always shows the change as version --> null, whether the product appeared or disappeared).
jq --slurp -r 'map(.releases) | add
| group_by(.name)
| map(select(.[0].version != .[1].version))
| map ("\(.[0].name) : \(.[0].version) --> \(.[1].version)")
| join("\n")' file_one.json file_two.json

Generate json file with formatting

I have a curl command which generates json output. I want to add a few characters in generated file to be able to process it further.
Command:
curl -sN --negotiate -u foo:bar "http://hostname/db/tbl_name/" >> db.json
This runs under a for loop which runs it for each db and tbl_name combination. Hence it ends up generating a number of JSON outputs (one for each table) concatenated together without any delimiter.
Output looks like :
{"columns":[{"name":"tbl_id","type":"varchar(50)"},{"name":"cret_timestmp","type":"timestamp"},{"name":"updt_timestmp","type":"timestamp"},{"name":"frst_nm","type":"varchar(50)"},{"name":"last_nm","type":"varchar(50)"},{"name":"acct_num","type":"varchar(15)"},{"name":"r_num","type":"varchar(15)"},{"name":"pid","type":"decimal(15,0)"},{"name":"ami_id","type":"varchar(30)"},{"name":"ssn","type":"varchar(9)"},{"name":"client_id","type":"varchar(30)"},{"name":"client_nm","type":"varchar(100)"},{"name":"info","type":"timestamp"},{"name":"rmx","type":"varchar(10)"},{"name":"id","type":"decimal(12,0)"},{"name":"ingest_timestamp","type":"string"},{"name":"incr_ingest_timestamp","type":"string"}],"database":"db_i","table":"db_tbl"}{"columns":[{"name":"key","type":"varchar(15)"},{"name":"foo_cd","type":"varchar(10)"},{"name":"foo_nm","type":"varchar(56)"},{"name":"tmc_regn_cd","type":"varchar(10)"},{"name":"tmc_mrkt_cd","type":"varchar(20)"},{"name":"mrkt_grp","type":"varchar(30)"},{"name":"ingest_timestamp","type":"string"},{"name":"incr_ingest_timestamp","type":"string"}],"database":"db_i","table":"ss_mv"}{"columns":[{"name":"bar_src_name","type":"string"},{"name":"bar_ent_name","type":"string"},{"name":"from_src","type":"string"},{"name":"reload","type":"string"},{"name":"column_mismatch","type":"string"},{"name":"xx_src_name","type":"string"},{"name":"xx_ent_name","type":"string"}],"database":"db_i","table":"test_table"}
The desired output should start and end with [ and ], and I want to insert "," between the end of one object and the beginning of the next (where its column list starts).
So for ex: if the curl command runs against 3 tables as shown above, then the three generated jsons should be created like :
[{json1},{json2},{json3}]
The numbers 1, 2, 3, etc. correspond to the different tables the curl command runs against in the for loop for a particular db; their JSON should be created in one file, but in the desired format.
instead of what I'm currently getting :
{json1}{json2}{json3}
In the output pasted above, JSON 1 is :
{"columns":[{"name":"tbl_id","type":"varchar(50)"},{"name":"cret_timestmp","type":"timestamp"},{"name":"updt_timestmp","type":"timestamp"},{"name":"frst_nm","type":"varchar(50)"},{"name":"last_nm","type":"varchar(50)"},{"name":"acct_num","type":"varchar(15)"},{"name":"r_num","type":"varchar(15)"},{"name":"pid","type":"decimal(15,0)"},{"name":"ami_id","type":"varchar(30)"},{"name":"ssn","type":"varchar(9)"},{"name":"client_id","type":"varchar(30)"},{"name":"client_nm","type":"varchar(100)"},{"name":"info","type":"timestamp"},{"name":"rmx","type":"varchar(10)"},{"name":"id","type":"decimal(12,0)"},{"name":"ingest_timestamp","type":"string"},
{"name":"incr_ingest_timestamp","type":"string"}],"database":"db_i","table":"db_tbl"}
JSON 2 is :
{"columns":[{"name":"key","type":"varchar(15)"},{"name":"foo_cd","type":"varchar(10)"},{"name":"foo_nm","type":"varchar(56)"},{"name":"tmc_regn_cd","type":"varchar(10)"},{"name":"tmc_mrkt_cd","type":"varchar(20)"},{"name":"mrkt_grp","type":"varchar(30)"},{"name":"ingest_timestamp","type":"string"},{"name":"incr_ingest_timestamp","type":"string"}],"database":"db_i","table":"ss_mv"}
JSON 3 is :
{"columns":[{"name":"bar_src_name","type":"string"},{"name":"bar_ent_name","type":"string"},{"name":"from_src","type":"string"},{"name":"reload","type":"string"},{"name":"column_mismatch","type":"string"},{"name":"xx_src_name","type":"string"},{"name":"xx_ent_name","type":"string"}],"database":"db_i","table":"test_table"}
I hope the requirement is clear. Thanks in advance; I'm looking to achieve this via bash.
Use jq -s.
--slurp/-s: Instead of running the filter for each JSON object in the input, read the entire input stream into a large array and run the filter just once.
Here's an example:
$ cat file.json
{ "key": "value1" }
{ "key": "value2" }
{ "key":
"value3"}{"key": "value4"}
$ jq -s . < file.json
[
{
"key": "value1"
},
{
"key": "value2"
},
{
"key": "value3"
},
{
"key": "value4"
}
]
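Applied to the question's concatenated file from the curl loop, that could be as simple as the following (db_array.json is just an illustrative name for the result):
jq -s . db.json > db_array.json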
I'm not sure if I understood correctly, but I think you are looking for something like
echo "[$(cat *.json | paste -sd ',')]" > result.json
This works by creating a string that starts with [ and ends with ], and in the middle, there are the contents of the json files concatenated (cat) and separated by commas (with the help of paste). That string is echoed and written to a new file.
Presuming input in valid JSONL format (one JSON document per line of input), you can embed a Python script inside your bash script:
slurpjson_py='
import json, sys
json.dump([json.loads(line.strip()) for line in sys.stdin], sys.stdout, indent=4)
sys.stdout.write("\n")
'
slurpjson() { python -c "$slurpjson_py" "$@"; }
If called as:
slurpjson <<EOF
{ "first": "document", "starting": "here" }
{ "second": "document", "ending": "here" }
EOF
...output is correctly:
[
{
"starting": "here",
"first": "document"
},
{
"second": "document",
"ending": "here"
}
]
I managed to achieve this by running the curl command and adding a "," at every line break using
sed 's/$/,/'
Then I removed the last "," and added the enclosing [ and ] using:
for i in *; do cat $i | sed '$ s/.$//' | awk '{print "["$0"]"}' > $json_dir/$i; done
