Custom JSON output formatting with JQ - format

I'd like to use jq to format the output of some known keys in my objects more succinctly.
Sample object:
// test.json
[
{
"target": "some-string",
"datapoints": [
[
123,
456
],
[
789,
101112
]
]
}
]
I'd like to use JQ (with some incantation) to change this to put all the datapoints objects on a single line. E.g.
[
{
"target": "some-string",
"datapoints": [[ 123, 456 ], [ 789, 101112 ]]
}
]
I don't really know if JQ allows this. I searched around - and found custom formatters like https://www.npmjs.com/package/perfect-json which seem to do what I want. I'd prefer to have a portable incantation for this using jq alone (and/or with standard *nix tools).

Use a two-pass approach. In the first, stringify the field using special markers so that in the second pass, they can be removed.
Depending on your level of paranoia, this second pass could be very simple or quite complex. On the simple end of the spectrum, choose markers that simply will not occur elsewhere, perhaps "<q>…</q>", or using some combination of non-ASCII characters. On the complex end of the spectrum, only remove the markers if they occur in the fields in which they are known to be markers.
Both passes could be accomplished with jq, along the lines of:
jq '.[].datapoints |= "<q>\(tojson)</q>"' |
jq -Rr 'sub("<q>(?<s>.*)</q>"; .s)'

Using jq and perl :
jq 'map(.datapoints |= "\u001b\(tojson)\u001b")
' test.json | perl -pe 's/"\\u001b(.*?)\\u001b"/$1/g'

Related

grep multiples results randomly results in bash

I'm making a query into a rest api, from this result i got:
{ "meta": { "query_time": 0.004266858, "pagination": { "offset": 0, "limit": 00, "total": 4 }, "powered_by": "device-api", "trace_id": "foo" }, "resources": [ "foo/bar", "foo/bar/2", "foo/bar/3", "foo/bar/4" ], "errors": [] }
I want to take results only from resources like this:
"resources": [
"foo/bar",
"foo/bar/2",
"foo/bar/3",
"foo/bar/4"
],
Can we share some knowledge? thanks a lot!
PS: these results from resources are random
Don't use grep or other regular expression tools to parse JSON. JSON is structured data and should be processed by a tool designed to read JSON. On the command line jq is a great tool for this purpose. There are many powerful JSON libraries written in other languages if jq isn't what you need.
Once you've extracted the data you care about, you can use the shuf utility to select random lines, e.g. shuf -n 5 would sample five random lines from the input.
With the JSON you've provided this appears to do what I think you want:
jq --raw-output '.resources[]' | shuf -n 2
You may need to tweak the jq syntax slightly if the real JSON has a different structure.

use bash string as jq filter

I don't understand what I'm doing wrong or why this does not work.
test.json file:
[
{
"Header": {
"Region": "US",
"Tenant": "Tenant1",
"Stage": "testing",
"ProductType": "old"
},
"Body": []
},
{
"Header": {
"Region": "EU",
"Tenant": "Tenant2",
"Stage": "development",
"ProductType": "new"
},
"Body": []
}
]
I want to display the values of the .Header.Tenant key. So the simple jq call does its job:
$ jq '[.[].Header.Tenant]' test.json
[
"Tenant1",
"Tenant2"
]
Now I want to assign that jq filter to a bash variable and use it with jq's --arg variable.
And I am getting this:
$ a=".[].Header.Tenant"; jq --arg xx "$a" '[$xx]' test.json
[
".[].Header.Tenant"
]
What is wrong?
jq does not have an eval function for evaluating arbitrary jq expressions, but it does provide functions that can be used to achieve much the same effect, the key idea being that certain JSON values can be used to specify query operations.
In your case, you would have to translate the jq query into a suitable jq operation, such as:
jq --argjson a '["Header","Tenant"]' '
getpath(paths|select( .[- ($a|length) :]== $a))
' test.json
Extending jq's JSON-based query language
More interestingly, you could write your own eval, e.g.
jq --argjson a '[[], "Header","Tenant"]' '
def eval($expr):
if $expr == [] then .
else $expr[0] as $op
| if $op == [] then .[] | eval($expr[1:])
else getpath([$op]) | eval($expr[1:])
end
end;
eval($a)
' test.json
With eval.jq as a module
If the above def of eval were put in a file, say ~/jq/eval.jq, then you could simply write:
jq -L ~/jq --argjson a '[[], "Header","Tenant"]' '
include "eval";
eval($a)' test.json
Or you could specify the search path in the jq program:
jq --argjson a '[[], "Header","Tenant"]' '
include "eval" { "search": "~/jq" };
eval($a)' input.json
Or you could use import ...
TLDR; The following code does the job:
$ a=".[].Header.Tenant"; jq -f <(echo "[$a]") test.json
[
"Tenant1",
"Tenant2"
]
One as well can add/modify the filter in the jq call, if needed:
$ a=".[].Header.Tenant"; jq -f <(echo "[$a]|length") test.json
2
Longer explanation
My ultimate goal was to figure out how I can define the lowest common denominator jq filter in a variable and use it when calling jq, plus add additional parameters if necessary. If you have a really complex jq filter spanning multiple lines that you call frequently, you probably want to template it somehow and use that template when calling jq.
While peak demonstrated how it can be done, I think it is overengineering the simple task.
However, using process substitution combined with the jq's -f option to read a filter from the file does solve my problem.

How to access special character like $,# using jq

I am trying to process some string which has special characters in it like abc123#45 or ab$123 or qwe&123.
I am trying to fetch it in shell like:
In json file : foo=qwe$123
foo=`cat tmp_json | jq -r '.keys.foo'`
But it is coming like :
foo=qwe23
JSON input
{
"metadata": {
"name": "xyz",
"version": 7,
"lastUpdated": 1585551422521
},
"keys": {
"abc": "qwe$123",
"foo": "qwe$123"
}
}
When shell strings contain special characters that you do not want to be interpreted specially by the shell, you have to quote them using single quotes, e.g. foo='qwe$123'
Using bash 4.x, the form
x=`...`
does not present any problems with respect to characters such $, #, or &, though it should be noted that the preferred form for such assignments is x=$(...)
However these forms should only be used with great care because of other special characters.
Generally, it would be better to use an idiom such as:
jq -r .... | while -r read line ; do .... ; done
Depending on your requirements, you might also wish to consider jq's #sh filter.

Bash/*NIX: split a file into multiple files on a substring

Variants of this question have been asked and answered before, but I find that my sed/grep/awk skills are far too rudimentary to work from those to a custom solution since I hardly ever work in shell scripts.
I have a rather large (100K+ lines) text file in which each line defines a GeoJSON object, each such object including a property called "county" (there are, all told, 100 different counties). Here's a snippet:
{"type": "Feature", "properties": {"county":"ALAMANCE", "vBLA": 0, "vWHI": 4, "vDEM": 0, "vREP": 2, "vUNA": 2, "vTOT": 4}, "geometry": {"type":"Polygon","coordinates":[[[-79.537429,35.843303],[-79.542428,35.843303],[-79.542428,35.848302],[-79.537429,35.848302],[-79.537429,35.843303]]]}},
{"type": "Feature", "properties": {"county":"NEW HANOVER", "vBLA": 0, "vWHI": 0, "vDEM": 0, "vREP": 0, "vUNA": 0, "vTOT": 0}, "geometry": {"type":"Polygon","coordinates":[[[-79.532429,35.843303],[-79.537428,35.843303],[-79.537428,35.848302],[-79.532429,35.848302],[-79.532429,35.843303]]]}},
{"type": "Feature", "properties": {"county":"ALAMANCE", "vBLA": 0, "vWHI": 0, "vDEM": 0, "vREP": 0, "vUNA": 0, "vTOT": 0}, "geometry": {"type":"Polygon","coordinates":[[[-79.527429,35.843303],[-79.532428,35.843303],[-79.532428,35.848302],[-79.527429,35.848302],[-79.527429,35.843303]]]}},
I need to split this into 100 separate files, each containing one county's GeoJSONs, and each named xxxx_bins_2016.json (where xxxx is the county's name). I'd also like the final character (comma) at the end of each such file to go away.
I'm doing this in Mac OSX, if that matters. I hope to learn a lot by studying any solutions you could suggest, so if you feel like taking the time to explain the 'why' as well as the 'what' that would be fantastic. Thanks!
EDITED to make clear that there are different county names, some of them two-word names.
jq can kind of do this; it can group the input and output one line of text per group. The shell then takes care of writing each line to an appropriately named file. jq itself doesn't really have the ability to open files for writing that would allow you to do this in a single process.
jq -Rn -c '[inputs[:-1]|fromjson] | group_by(.properties.county)[]' tmp.json |
while IFS= read -r line; do
county=$(jq -r '.[0].properties.county' <<< $line)
jq -r '.[]' <<< "$line" > "$county.txt"
done
[inputs[:-1]|fromjson] reads each line of your file as a string, strips the trailing comma, then parses the line as JSON and wraps the lines into a single array. The resulting array is sorted and grouped by county name, then written to standard output, one group per line.
The shell loop reads each line, extracts the county name from the first element of the group with a call to jq, then uses jq again to write each element of the group to the appropriate file, again one element per line.
(A quick look at https://github.com/stedolan/jq/issues doesn't appear to show any requests yet for an output function that would let you open and write to a file from inside a jq filter. I'm thinking of something like
jq -Rn '... | group_by(.properties.county) | output("\(.properties.county).txt")' tmp.json
without the need for the shell loop.)
If using string parsing rather than proper JSON parsing to extract the county name is acceptable - brittle in general, but would work in this simple case - consider Sam Tolton's GNU awk answer, which has the potential to be by far the simplest and fastest solution.
To complement chepner's excellent answer with a variation that focuses on performance:
jq -Rrn '[inputs[:-1]|fromjson] | .properties.county + "|" + (.|tostring)' file |
awk -F'|' '{ print $2 > ($1 "_bins_2016.json") }'
Shell loops are avoided altogether, which should speed up the operation.
The general idea is:
Use jq to trim the trailing , from each input line, interpret the trimmed string as JSON, extract the county name, then output the trimmed JSON strings prepended with the county name and a distinct separator, |.
Use an awk command to split each line into the prepended county name and the trimmed JSON string, which allows awk to easily construct the output filename and write the JSON string to it.
Note: The awk command keeps all output files open until the script has finished, which means that, in your case, 100 output files will be open simultaneously - a number that shouldn't be a problem, however.
In cases where it is a problem, you can use the following variation, in which jq first sorts the lines by county name, which then allows awk to immediately close the previous output field whenever the next county is reached in the input:
jq -Rrn '
[inputs[:-1]|fromjson] | sort_by(.properties.county)[] |
.properties.county + "|" + (.|tostring)
' file |
awk -F'|' '
prevCounty != $1 { if (outFile) close(outFile); outFile = $1 "_bins_2016.json" }
{ print $2 > outFile; prevCounty = $1 }
'
A simpler version of chepner's answer:
while IFS= read -r line
do
countyName=$(jq --raw-output '.properties.county' <<<"${line: : -1}")
jq <<< "${line: : -1}" >> "$countyName"_bins_2016.json
done<file
The idea is to filter the county name using a jq filter after stripping the , from each line of your input file. Then the line is passed to jq as plain stream to produce a JSON file in prettified format.
If you are from a relatively older version of bash (< 4.0) use "${line%?}" over "${line: : -1}"
For example with the change above, one of your county becomes,
cat ALAMANCE_bins_2016.json
{
"type": "Feature",
"properties": {
"county": "ALAMANCE",
"vBLA": 0,
"vWHI": 0,
"vDEM": 0,
"vREP": 0,
"vUNA": 0,
"vTOT": 0
},
"geometry": {
"type": "Polygon",
"coordinates": [
[
[
-79.527429,
35.843303
],
[
-79.532428,
35.843303
],
[
-79.532428,
35.848302
],
[
-79.527429,
35.848302
],
[
-79.527429,
35.843303
]
]
]
}
}
Note: The current solution could be performance intensive as reading file line by line is an expensive operation, and equally invoking jq for each of the lines.
This will do what you want minus getting rid of the last comma:-
gawk 'match($0, /"county":"([^"]+)/, array){ print >array[1]"_bins_2016.json" }' INPUT_FILE
This will output files in the current path with a filename in the format COUNTRY NAME_bins_2016.json.
The script goes line by line and uses a regex to match the exact term "country":" followed by 1 or more characters that aren't a ". It captures the characters within the quotes and then uses it as part of the filename to append the current line to.
To remove the trailing comma from all .json files in the current path you could use:-
sed -i '$ s/,$//' *.json
If you were certain that the last char was always a comma, a faster solution would be to use truncate:-
truncate -s-1 *.json
Last part taken from this answer: https://stackoverflow.com/a/40568723/1453798
Here is a quickie script that will do the job. It has the virtue of working on most systems without having to install any other tools.
IFS=$'\n'
counties=( $( sed 's/^.*"county":"//;s/".*$//' counties.txt ) )
unset IFS
for county in "${!counties[#]}"
do
county="${counties[$i]}"
filename="$county".out.txt
echo "'$filename'"
grep "\"$county\"" counties.txt > "$filename"
done
The setting of IFS to \n allows the array elements to contain spaces. The sed command strips off all the text up to the start of the county name and all the text after it. The for loop is the form that allows iterating over the array. Finally, the grep command needs to have double quotes around the search string so that counties that are substrings of other counties don't accidentally get put into the wrong file.
See this section of the GNU BASH Reference Manual for more info.

Parsing Json data columnwise in shell

When I run a command I get a response like this
{
"status": "available",
"managed": true,
"name":vdisk7,
"support":{
"status": "supported"
},
"storage_pool": "pfm9253_pfm9254_new",
"id": "ff10abad"-2bf-4ef3-9038-9ae7f18ea77c",
"size":100
},
and hundreds of this type of lists or dictionaries
I want a command that does such sort of a thing
if name = "something",
get the id
Any links that would help me in learning such sort of commands would be highly appreciated
I have tried
awk '{if ($2 == "something") print $0;}'
But I think the response is in Json so the colum wise awk formatting is not working.
Also it's just a single command that I need to run so I would prefer not to use any external library.
JSON parser is better for this task
awk and sed are utilities to parse line-oriented text, but not json. What if your json formatting will change ? (some lines will go on one line ?).
You should use any standard json parser out there. Or use some powerful scripting language, such as PHP, Python, Ruby, etc.
I can provide you with example on how to do it with python.
What if I can't use powerful scripting language ?
If you totally unable to use python, then there is utility jq out there: link
If you have some recent distro, jq maybe already in repositories (example: Ubuntu 13.10 has it in repos).
I can use python!
I would do that using simple python inline script.
For example we have some some_command that returns json as a result.
We have to get value of data["name"].
Here we go:
some_command | python -c "import json, sys; print json.load(sys.stdin)['name']"
It will output vdisk7 in your case
For this to work you need to be sure, json is fully valid.
If you have a list of json objects:
[
{
...
"name": "vdisk17"
...
},
{
...
"name": "vdisk18"
...
},
{
...
"name": "vdisk19"
...
},
...
]
You could use some list comprehensions:
some_command | python -c "import json, sys; [sys.stdout.write(x['name'] + '\n') for x in json.load(sys.stdin)]"
It will output:
vdisk17
vdisk18
vdisk19

Resources