JQ - Argument list too long error - Large Input - shell

I use jq to filter a large JSON file using:
paths=$(jq '.paths | to_entries | map(select(.value[].tags | index("Filter"))) | from_entries' input.json)
and write the result to a new file using:
jq --argjson prefix "$paths" '.paths=$prefix' input.json > output.json
But this fails because $paths is very large (on the order of 100,000 lines).
Error:
jq: Argument list too long
I also went through "/usr/bin/jq: Argument list too long error bash" and understood that it is the same problem, but I did not find a solution there.

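The error comes from the operating system's limit on the total size of command-line arguments, which the expanded "$paths" exceeds. In general, assuming your jq allows it, you could use --argfile or --slurpfile to pass the data in a file instead, but in your case you can simply avoid the issue by invoking jq once instead of twice. For example, to keep things clear: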
( .paths | to_entries | map(select(.value[].tags | index("Filter"))) | from_entries ) as $prefix
| .paths=$prefix
Even better, simply use |=:
.paths |= ( to_entries | map(select(.value[].tags | index("Filter"))) | from_entries)
or better yet, use with_entries.
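As a minimal sketch of the with_entries form and of the --slurpfile fallback mentioned earlier, assuming jq 1.5 or later: the sample document and the file names sample.json and filtered.json are made up for illustration and stand in for the real input.json.

```shell
# Work in a scratch directory; sample.json is a hypothetical stand-in for input.json.
workdir=$(mktemp -d)
cat > "$workdir/sample.json" <<'EOF'
{
  "info": {"title": "demo"},
  "paths": {
    "/a": {"get": {"tags": ["Filter"]}},
    "/b": {"get": {"tags": ["Other"]}},
    "/c": {"post": {"tags": ["Other", "Filter"]}}
  }
}
EOF

# One invocation: filter .paths in place, so no large value is passed as an argument.
one_pass=$(jq -c '.paths |= with_entries(select(.value[].tags | index("Filter")))' "$workdir/sample.json")

# Two invocations: hand the intermediate result over in a file, not on the command line.
# --slurpfile binds $prefix to an array of the values in the file, hence $prefix[0].
jq '.paths | with_entries(select(.value[].tags | index("Filter")))' "$workdir/sample.json" > "$workdir/filtered.json"
two_pass=$(jq -c --slurpfile prefix "$workdir/filtered.json" '.paths = $prefix[0]' "$workdir/sample.json")

printf '%s\n' "$one_pass"
rm -rf "$workdir"
```

Both variants should produce the same document; the single invocation is simpler and avoids the temporary file.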

Related

How to get the index of element using jq

I want to get the index of an element in an array.
Link: https://api.github.com/repos/checkstyle/checkstyle/releases
I want to fetch the tag_name.
I generally use this command to get my latest release tag:
LATEST_RELEASE_TAG=$(curl -s https://api.github.com/repos/checkstyle/checkstyle/releases/latest \
| jq ".tag_name")
But now I want previous releases too, so I want the index of a particular tag name, for example tag_name = "10.3.1".
I am also thinking of using simple arithmetic on that index to get the preceding release, like:
('index of 10.3.1' - 1). Any thoughts on this?
Just index of "checkstyle-10.3.1":
curl -s https://api.github.com/repos/checkstyle/checkstyle/releases | jq '[.[].tag_name] | to_entries | .[] | select(.value=="checkstyle-10.3.1") | .key'
Release before "checkstyle-10.3.1":
curl -s https://api.github.com/repos/checkstyle/checkstyle/releases | jq '([.[].tag_name] | to_entries | .[] | select(.value=="checkstyle-10.3.1") | .key) as $index | .[$index+1]'
Release "checkstyle-10.3.1"
curl -s https://api.github.com/repos/checkstyle/checkstyle/releases | jq '.[] | select(.tag_name=="checkstyle-10.3.1")'
Here's a general-purpose function to get the index of something in an array:
# Return the 0-based index of the first item in the array input
# for which f is truthy, else null
def index_by(f):
  first(foreach .[] as $x (-1;
          .+1;
          if ($x|f) then . else empty end)) // null;
This can be used to retrieve the item immediately before a target as follows:
jq '
def index_by(f): first(foreach .[] as $x (-1; .+1; if ($x|f) then . else empty end)) // null;
index_by(.tag_name == "checkstyle-10.3.1") as $i
| if $i != null and $i > 0 then .[$i - 1] else empty end
'
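Here is a small worked sketch of index_by on made-up data, assuming jq 1.5 or later and an array ordered newest first (which is what the earlier commands rely on when they use $index+1 for the release before). The three tag names are illustrative only.

```shell
# Hypothetical sample, newest release first.
releases='[{"tag_name":"checkstyle-10.3.2"},{"tag_name":"checkstyle-10.3.1"},{"tag_name":"checkstyle-10.3"}]'

# Find the index of the target tag, then step one position towards older releases.
# The bounds check avoids running off the end of the array.
older=$(printf '%s' "$releases" | jq -r '
  def index_by(f): first(foreach .[] as $x (-1; .+1; if ($x|f) then . else empty end)) // null;
  index_by(.tag_name == "checkstyle-10.3.1") as $i
  | if $i != null and $i + 1 < length then .[$i + 1].tag_name else empty end
')

# A tag that is not present yields no output at all.
missing=$(printf '%s' "$releases" | jq -r '
  def index_by(f): first(foreach .[] as $x (-1; .+1; if ($x|f) then . else empty end)) // null;
  index_by(.tag_name == "no-such-tag") as $i
  | if $i != null and $i + 1 < length then .[$i + 1].tag_name else empty end
')

printf '%s\n' "$older"
```

Whether the neighbour you want is at $i + 1 or $i - 1 depends only on the ordering of the array.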

Using jq and outputting specific columns with formatting

Can anyone help me understand how I can print countryCode followed by connectionName and load with a percentage symbol, all on one line and nicely formatted, using only jq (no sed, column, or any other external Unix command)? I cannot seem to print anything other than the one column:
curl --silent "https://api.surfshark.com/v3/server/clusters" | jq -r -c "map(select(.countryCode == "US" and .load <= "99")) | sort_by(.load) | limit(20;.[]) | [.countryCode, .connectionName, .load] | (.[1])
Is this what you wanted?
curl --silent "https://api.surfshark.com/v3/server/clusters" |
jq -r -c 'map(select(.countryCode == "US" and .load <= 99)) |
sort_by(.load) |
limit(20;.[]) |
"\(.countryCode) \(.connectionName) \(.load)%"'

Bash variable parameter expansions complete documentation

There are some lesser known bash variable expansions:
+----------------------------------------------------------+----------------+
| description                                              | expression     |
+----------------------------------------------------------+----------------+
| Remove everything **after** the **last** '7'             | ${var%7*}      |
| Remove everything **after** the **first** '7'            | ${var%%7*}     |
| Remove everything **before** the **first** '7'           | ${var#*7}      |
| Remove everything **before** the **last** '7'            | ${var##*7}     |
| First char upper case                                    | ${var^}        |
| All upper case                                           | ${var^^}       |
| First char lower case                                    | ${var,}        |
| All lower case                                           | ${var,,}       |
| Show how variable was set                                | ${var@A}       |
| ?? something cool ??                                     | ${var@E}       |
| Print variable as though it were the prompt variable PS1 | ${var@P}       |
| ?? something cool ??                                     | ${var@Q}       |
+----------------------------------------------------------+----------------+
I have been struggling to find a source that documents all of these tricks. So far the best one I have found is this cheat sheet. But even that page is missing some of these expansion rules. For the purposes of writing good bash code, and making that code portable I am looking for several things:
What are all of the bash variable expansion tricks?
Where is there a document that shows all of them (with examples ideally)?
What versions of bash do which tricks work with?
Some good pointers on parameter expansions:
https://www.gnu.org/software/bash/manual/html_node/Shell-Parameter-Expansion.html
http://mywiki.wooledge.org/BashFAQ/073
https://wiki.bash-hackers.org/syntax/pe
You missed many, like
single substitution a -> b : ${x/a/b}
multiple substitutions a -> b : ${x//a/b}
offset manipulation: ${x:1:3}
${var-word} if var is defined, use var; otherwise, "word"
${var+word} if var is defined, use "word"; otherwise, nothing
${var=word} if var is defined, use var; otherwise, use "word" AND also assign "word" to var
${var?error} if var is defined, use var; otherwise print "error" and exit
array slice ${files[@]: -4}
Note that most parameter expansions work with arrays too.
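The expansions listed above can be checked with a short script. This is a sketch with made-up variable names and values; it assumes bash 4.4 or later, since the case-modification forms (^, ^^, ,, ,,) need bash 4.0 and the ${var@Q} style operators need bash 4.4. Note that the % and # forms remove the matched '7' as well as the text beyond it.

```shell
#!/usr/bin/env bash
var="a7b7c"
after_last=${var%7*}       # shortest suffix match removed  -> a7b
after_first=${var%%7*}     # longest suffix match removed   -> a
before_first=${var#*7}     # shortest prefix match removed  -> b7c
before_last=${var##*7}     # longest prefix match removed   -> c
first_upper=${var^}        # A7b7c
all_upper=${var^^}         # A7B7C

x="banana"
one_subst=${x/a/o}         # first match only  -> bonana
all_subst=${x//a/o}        # every match       -> bonono
slice=${x:1:3}             # offset 1, length 3 -> ana
quoted=${x@Q}              # value quoted for reuse as shell input -> 'banana'

unset maybe
default=${maybe-fallback}  # maybe is unset, so "fallback" is used
alt=${maybe+present}       # maybe is unset, so this expands to nothing
: "${maybe=assigned}"      # maybe is unset, so it is assigned "assigned"

files=(a.txt b.txt c.txt d.txt e.txt)
last_two=("${files[@]: -2}")   # the space before -2 is required

printf '%s\n' "$after_last" "$after_first" "$before_first" "$before_last" \
  "$all_upper" "$one_subst" "$all_subst" "$slice" "$quoted" "$default" "$maybe" "${last_two[*]}"
```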

Merge multiple jq invocations to sort and limit the content of a stream of objects

I have a json stream of updates for products, and I'm trying to get the last X versions sorted by version (they are sorted by release date currently).
It looks like jq can't sort a stream of objects directly, sort_by only works on arrays, and I couldn't find a way to collect a stream into an array that doesn't involve piping the output of jq -c to jq -s.
My current solution:
< xx \
jq -c '.[] | select(.platform | contains("Unix"))' \
| jq -cs 'sort_by(.version) | reverse | .[]' \
| head -5 \
| jq -C . \
| less
I expected to be able to use
jq '.[] | select(...) | sort_by(.version) | limit(5) | reverse'
but I couldn't find a filter that limits the stream, and sort_by doesn't work on non-arrays.
I am testing this on atlassian's json for releases: https://my.atlassian.com/download/feeds/archived/bamboo.json
In jq you can always collect a stream of results into an array by wrapping the expression in [ ... ], so that array functions such as sort_by can operate on it. Your requirement can then be met with:
jq '[.[] | select(.platform | contains("Unix"))] | sort_by(.version) | limit(5;.[])'
See it working on jq playground (tested on v1.6).
That filter emits the objects as a stream. Adding reverse requires one more level of array wrapping, as below; use reverse[] instead of reverse to output the objects alone rather than an array:
jq '[[.[] | select(.platform | contains("Unix"))] | sort_by(.version) | limit(5;.[]) ] | reverse'
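Note that limit(5; .[]) applied after an ascending sort_by keeps the five lowest versions. If the five highest are wanted, as in the multi-process pipeline from the question, one option is to reverse before slicing. The sketch below assumes jq 1.5 or later and version strings made only of dot-separated numbers; the numeric sort key is a substitution for the plain sort_by(.version), which compares strings and would place "6.10.0" before "6.9.0". The sample records are made up.

```shell
# Hypothetical sample in the shape of the feed: objects with version and platform.
feed='[
  {"version": "6.9.0",  "platform": "Unix, Mac OS X"},
  {"version": "6.10.0", "platform": "Unix, Mac OS X"},
  {"version": "6.10.0", "platform": "Windows"},
  {"version": "6.2.1",  "platform": "Unix, Mac OS X"}
]'

# Collect the stream into an array, sort by numeric version components,
# put the highest first, then keep the first 2 (use .[:5] for five).
newest=$(printf '%s' "$feed" | jq -r '
  [.[] | select(.platform | contains("Unix"))]
  | sort_by(.version | split(".") | map(tonumber))
  | reverse
  | .[:2][]
  | .version
')
printf '%s\n' "$newest"
```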

Strip a particular word from the beginning of string in jq output

I am getting list of values as below using the curl command:
curl -s http://internal.registry.com/v2/_catalog | jq -r '.repositories[0:5] | to_entries | map( .value )[]'
Output:
centos
containersol/consul-server
containersol/mesos-agent
containersol/mesos-master
cybs/address-api
I want the output not to have the prefix cybs/ in it. For example, cybs/address-api should be just address-api.
Just use sub:
curl ... | jq -r '.repositories[0:5][] | sub("^cybs/"; "")'
Also note that to_entries | map( .value ) is a no-op here and should be removed.
Output:
centos
containersol/consul-server
containersol/mesos-agent
containersol/mesos-master
address-api
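When the prefix is a fixed string, ltrimstr is an alternative to sub that needs no regular expression. A small sketch on made-up catalog data, assuming a jq build with regex support for the sub comparison; the last entry is included to show that only a leading cybs/ is removed.

```shell
# Hypothetical catalog response.
catalog='{"repositories": ["centos", "containersol/mesos-agent", "cybs/address-api", "team/cybs/tool"]}'

# ltrimstr removes the given string only when the input starts with it.
with_ltrimstr=$(printf '%s' "$catalog" | jq -r '.repositories[] | ltrimstr("cybs/")')

# The anchored sub from the answer above gives the same result.
with_sub=$(printf '%s' "$catalog" | jq -r '.repositories[] | sub("^cybs/"; "")')

printf '%s\n' "$with_ltrimstr"
```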
