jq with multiple inputs from different sources - shell

How can we mix different input sources when using jq?
For a specific use case, I'd like to add some data from a file into a feed that is piped in from another command's stdout.
$ echo '[{"a": 1}]' > /tmp/a1
$ echo '[{"a": 2}]' > /tmp/a2
$ jq --slurp '.[0] + .[1]' /tmp/a1 /tmp/a2
[
{
"a": 1
},
{
"a": 2
}
]
$ cat /tmp/a1 | jq --slurp '.[0] + .[1]' /tmp/a2 # Expecting the same result
[
{
"a": 2
}
]
As you can see, the last command ignored the piped data.
Right now, I'm forced to save the output of the first operation to a temporary file so that I can do the jq merge before sending the result back over the network. Having a single stream would be much more efficient.

I'd like to add some data from a file into a feed that is piped in from another command's stdout.
There are various ways to do this, depending on the shell and also the version of jq you are using.
Assuming your jq supports the --argfile option, you might find that quite congenial:
cat /tmp/a1 | jq --argfile a2 /tmp/a2 '. + $a2'
Here is another variation that suggests some of the other possibilities:
jq -n --argfile a1 <(cat /tmp/a1) --argfile a2 <(cat /tmp/a2) '$a1 + $a2'
More interestingly:
(cat /tmp/a1 ; cat /tmp/a2) | jq '. + input'
You might also wish to consider using the --slurpfile option instead of --argfile, but note that --slurpfile always "slurps" the file.
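A minimal sketch of the --slurpfile variant, assuming jq 1.5 or later (the jq 1.6 manual already says "Do not use" for --argfile and points to --slurpfile instead). Because --slurpfile binds the variable to an array of every JSON text in the file, the single array stored in the file is $extra[0]. The function name merge_stdin_with_file and the temporary directory are illustrative only.

```shell
#!/usr/bin/env bash
# Sketch: append the array stored in a file to the array arriving on stdin.
# --slurpfile binds $extra to an array of ALL JSON texts in the file,
# so $extra[0] is the first (here: only) array in it.
merge_stdin_with_file() {
  jq -c --slurpfile extra "$1" '. + $extra[0]'
}

# Illustrative data in a throwaway directory instead of /tmp/a1 and /tmp/a2.
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
echo '[{"a": 1}]' > "$tmpdir/a1"
echo '[{"a": 2}]' > "$tmpdir/a2"

cat "$tmpdir/a1" | merge_stdin_with_file "$tmpdir/a2"
# prints [{"a":1},{"a":2}]
```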
And finally an approach that should work for every version of jq:
jq -s '.[0] + .[1]' <(cat /tmp/a1) /tmp/a2
In general, though, it's best to avoid the -s option.
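One way to follow that advice when there are more than two arrays is to combine -n with inputs, which hands the arrays to the filter one at a time instead of first building a slurped array of all of them. This is a sketch assuming jq 1.5 or later; concat_arrays is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Concatenate any number of JSON arrays read from stdin, without -s.
# -n starts the program with null; `inputs` yields each input array in turn.
concat_arrays() {
  jq -c -n 'reduce inputs as $arr ([]; . + $arr)'
}

{ echo '[{"a": 1}]'; echo '[{"a": 2}]'; echo '[{"a": 3}]'; } | concat_arrays
# prints [{"a":1},{"a":2},{"a":3}]
```

The concatenated result is of course still built in memory; the difference is that no intermediate array holding every input is created first.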
A note on slurping
If you compare the outputs produced by:
echo '1 2' |
jq -s --debug-dump-disasm --debug-trace '.[0], .[1]'
and
echo '1 2' |
jq --debug-dump-disasm --debug-trace '., input'
you'll notice the former has to PUSHK_UNDER to store the entire array [1,2],
whereas the latter program just reads the two inputs separately.
In the first program, the memory for the array cannot be freed until after
all the pointers into it have been processed, whereas in the second program,
the memory for . can be freed after the first RET.

You could do this, where cat forwards its stdin followed by a2:
<GENERATE a1> | cat - /tmp/a2 | jq --slurp '.[0] + .[1]'
Or this, which is a compound statement passing the results of two separate commands into a pipe:
{ <GENERATE a1> ; cat /tmp/a2; } | jq --slurp '.[0] + .[1]'
Take care to put spaces around the curly braces and a semicolon before the closing one.
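Both variants can be tried out as below. generate_a1 is a hypothetical function standing in for the <GENERATE a1> placeholder, and the temporary directory replaces /tmp purely for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for <GENERATE a1>: any command that writes JSON to stdout.
generate_a1() { echo '[{"a": 1}]'; }

tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
echo '[{"a": 2}]' > "$tmpdir/a2"

# Variant 1: "cat -" copies its stdin first, then appends the file.
via_cat() { generate_a1 | cat - "$tmpdir/a2" | jq -c --slurp '.[0] + .[1]'; }

# Variant 2: a brace group; note the space after "{" and the ";" before "}".
via_group() { { generate_a1; cat "$tmpdir/a2"; } | jq -c --slurp '.[0] + .[1]'; }

via_cat     # prints [{"a":1},{"a":2}]
via_group   # prints [{"a":1},{"a":2}]
```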

Completing peak's answer: you don't actually need the process substitutions (<(cat ...)), since the files can be named directly:
jq -n --argfile a1 /tmp/a1 --argfile a2 /tmp/a2 '$a1 + $a2'

Related

Why can't jq's slurp handle a combination of here-strings and other files?

I want to merge some json in a file with some json generated at runtime. jq seems to have no difficulty if all the files passed to it are here-strings, or files in the system. But if I try to mix the file types it seems the here-strings are ignored, see snippet below:
Two normal files:
bash-4.2# echo '{"key":0}' > zero
bash-4.2# echo '{"key":1}' > one
bash-4.2# jq --slurp add zero one
{
"key": 1
}
Normal file and here-string (only normal file appears in result!):
bash-4.2# jq --slurp add zero <<< '{"key":1}'
{
"key": 0
}
Here-string first, then normal file (only normal file appears in result!):
bash-4.2# jq --slurp add <<< '{"key":0,"anotherkey":2}' one
{
"key": 1
}
Single here-string (works fine):
bash-4.2# jq --slurp add <<< '{"key":0}'
{
"key": 0
}
Two here-strings (works fine): EDIT: Output is misleading, something else is going on here.
bash-4.2# jq --slurp add <<< '{"key":0}' <<< '{"key":1}'
{
"key": 1
}
My suspicion is that jq works just fine and I am ignorant of how bash resolves the here-strings. But, how would I debug this to improve my understanding?
Note: A very easy workaround would be to evaluate my runtime JSON and produce a file, then merge the two files as above. I really want to know why the mixed examples above (a here-string plus a normal file) don't produce what I would expect.
After reading through the comments this is my understanding:
<<< is evaluated by the shell first and redirects stdin. If jq receives no positional arguments after the filter, it reads from stdin. Therefore all these statements are equivalent:
echo "{}" | jq --slurp add
<<< {} jq --slurp add
jq <<< {} --slurp add
jq --slurp <<< {} add
jq --slurp add <<< {}
If jq does receive positional arguments after the filter, it interprets them as filenames. It adheres to the convention of treating - as stdin.
bash-4.2# echo '{"one":1,"two":1}' > first
bash-4.2# echo '{"three":3}' > third
bash-4.2# jq --slurp add first - third <<< '{"two":2}'
{
"one": 1,
"two": 2,
"three": 3
}
The here-string construct simply redirects standard input. You will separately need to tell jq to read standard input if you call it in a way where it receives file name arguments. The de facto standard way to do that is to specify - as the input (pseudo-) filename.
I believe one of your test cases didn't actually work, and just looked like it did because the input data was constructed so as to be a no-op.
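The misleading "two here-strings" case can be reproduced directly. Each <<< redirects the same file descriptor (stdin), and bash applies redirections from left to right, so only the last here-string reaches the command. Using keys that do not collide makes this visible:

```shell
#!/usr/bin/env bash
# Both here-strings target stdin; the second one replaces the first.
cat <<< 'first' <<< 'second'
# prints: second

# So jq only ever sees the last here-string. With distinct keys,
# "key" is missing from the result:
jq -c --slurp add <<< '{"key":0}' <<< '{"keyx":1}'
# prints: {"keyx":1}
```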
One idea would be to use process substitution which, in essence, provides jq with a (temp) file descriptor it can work with.
Using awk to demonstrate the file descriptor idea:
$ awk '{print FILENAME}' <(echo 'abc')
/dev/fd/63
Demonstrating with a few of your examples:
$ jq --slurp add zero <(echo '{"key":1}')
{
"key": 1
}
$ jq --slurp add zero <(echo '{"keyx":1}')
{
"key": 0,
"keyx": 1
}
$ jq --slurp add <(echo '{"key":0,"anotherkey":2}') one
{
"key": 1,
"anotherkey": 2
}
$ jq --slurp add <(echo '{"key":0}') <(echo '{"key":1}')
{
"key": 1
}
$ jq --slurp add <(echo '{"key":0}') <(echo '{"keyx":1}')
{
"key": 0,
"keyx": 1
}
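Putting both answers together for the original goal (JSON from a file merged with JSON built at runtime), the sketch below shows the "-" placeholder and process substitution side by side. runtime_json is a hypothetical stand-in for the generated JSON, and both helpers should print the same merged object.

```shell
#!/usr/bin/env bash
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
echo '{"key":0}' > "$tmpdir/zero"
runtime_json='{"keyx":1}'   # hypothetical JSON produced at runtime

# "-" makes jq read stdin (here: the here-string) at that position in the file list.
merge_with_dash() { jq -c --slurp add "$tmpdir/zero" - <<< "$runtime_json"; }

# Process substitution gives jq a /dev/fd/N path that it reads like a file.
merge_with_procsub() { jq -c --slurp add "$tmpdir/zero" <(printf '%s\n' "$runtime_json"); }

merge_with_dash      # prints {"key":0,"keyx":1}
merge_with_procsub   # prints {"key":0,"keyx":1}
```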

Bash loop is not working when word contains space

I am using jq to parse some of the data and then running a final loop over it to parse a few more fields.
cluster_list=`databricks --profile hq_dev clusters list --output JSON | jq 'select(.clusters != null) | .clusters[] | [.cluster_name,.autotermination_minutes,.state,.cluster_id] | @csv' | grep -v "job-"`
for cluster in ${cluster_list[@]}
do
cluster_id=`echo $cluster| cut -d "," -f 4 | sed 's/\"//g' | sed 's/\\\//g'`
cluster_name=`echo "${cluster}"| cut -d "," -f 1| sed 's/\"//g' | sed 's/\\\//g'`
echo $cluster_name
done
cluster_list contains the following values:
"\"Test Space Cluster\",15,\"TERMINATED\",\"ddd-dese23-can858\""
"\"GatewayCluster\",15,\"TERMINATED\",\"ddd-ddsd-ddsds\""
"\"delete_later\",15,\"TERMINATED\",\"1120-195800-93839\""
"\"GatewayCluster_old\",15,\"TERMINATED\",\"0108-2y7272-393893\""
It prints the following:
Test
Space
Cluster
GatewayCluster
delete_later
GatewayCluster_old
Desired output:
It shouldn't break onto a new line where there is a space; I perform a few more actions using the name I get here.
Test Space Cluster
GatewayCluster
delete_later
GatewayCluster_old
Your script seems overly complex for what you want to achieve. It's better to use read to store each value in a separate variable, with the input field separator IFS set to a comma:
databricks --profile hq_dev clusters list --output JSON |
jq 'select(.clusters != null) | .clusters[] |
[.cluster_name,.autotermination_minutes,.state,.cluster_id] | @csv' |
grep -v "job-" |
sed 's/\\\?"//g' |
while IFS=, read -r name autotermination_minutes state id ; do
echo "$name"
done
Note: I didn't touch your jq command. The sed line I added removes the quotes, escaped or not. Alternatively, jq's -r option removes the JSON string quoting and its backslash escapes (the double quotes that @csv puts around each field remain), as the man page says:
INVOKING JQ
[...]
--raw-output / -r:
With this option, if the filter's result is a string then it will be written directly to standard output rather than being formatted as a JSON string with quotes. This can be useful for making jq filters talk to non-JSON-based systems.
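A sketch of a variant that drops the sed step entirely, assuming jq 1.6 or later: with -r and the @tsv format, jq prints one raw, tab-separated line per cluster, and the "job-" filter moves into jq. The sample JSON is a hypothetical, trimmed-down stand-in for the databricks output, assumed to have the {"clusters": [...]} shape the original filter expects. Unlike grep -v, which drops a line if "job-" appears in any field, this version only checks the cluster name.

```shell
#!/usr/bin/env bash
# Hypothetical sample shaped like `databricks ... clusters list --output JSON`.
clusters_json='{"clusters":[
 {"cluster_name":"Test Space Cluster","autotermination_minutes":15,"state":"TERMINATED","cluster_id":"ddd-dese23-can858"},
 {"cluster_name":"job-nightly","autotermination_minutes":15,"state":"TERMINATED","cluster_id":"hypothetical-id"},
 {"cluster_name":"GatewayCluster","autotermination_minutes":15,"state":"TERMINATED","cluster_id":"ddd-ddsd-ddsds"}]}'

list_cluster_names() {
  # -r + @tsv: one unquoted, tab-separated line per cluster.
  jq -r 'select(.clusters != null) | .clusters[]
         | select(.cluster_name | contains("job-") | not)
         | [.cluster_name, .autotermination_minutes, .state, .cluster_id] | @tsv' |
  while IFS=$'\t' read -r name minutes state id; do
    printf '%s\n' "$name"   # quoted, so "Test Space Cluster" stays one value
  done
}

printf '%s\n' "$clusters_json" | list_cluster_names
# prints:
# Test Space Cluster
# GatewayCluster
```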

jq and bash: object construction with --arg is not working

Given the following input:
J='{"a":1,"b":10,"c":100}
{"a":2,"b":20,"c":200}
{"a":3,"b":30,"c":300}'
The command
SELECT='a,b'; echo $J | jq -c -s --arg P1 $SELECT '.[]|{a,b}'
produces
{"a":1,"b":10}
{"a":2,"b":20}
{"a":3,"b":30}
but this command produces unexpected results:
SELECT='a,b'; echo $J | jq -c -s --arg P1 $SELECT '.[]|{$P1}'
{"P1":"a,b"}
{"P1":"a,b"}
{"P1":"a,b"}
How does one get jq to treat an arg string literally?
Using tostring gives an error
SELECT='a,b'; echo $J | jq -c -s --arg P1 $SELECT '.[]|{$P1|tostring}'
jq: error: syntax error, unexpected '|', expecting '}' (Unix shell quoting
issues?) at <top-level>, line 1:
.[]|{$SELECT|tostring}
jq: 1 compile error
SELECT needs to be a variable and not hardcoded in the script.
Assuming you want to avoid the risks of "code injection" and that you want the shell variable SELECT to be a simple string such as "a,b", then consider this reduce-free solution along the lines you were attempting:
J='{"a":1,"b":10,"c":100}'
SELECT='a,b'
echo "$J" |
jq -c --arg P1 "$SELECT" '
. as $in | $P1 | split(",") | map( {(.): $in[.]} ) | add'
Output:
{"a":1,"b":10}
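The same filter also works on the question's full three-object stream without -s, since jq runs it once per input object. In the sketch below it is wrapped in a hypothetical pick_keys helper; the key list is still passed as data via --arg rather than being spliced into the program.

```shell
#!/usr/bin/env bash
J='{"a":1,"b":10,"c":100}
{"a":2,"b":20,"c":200}
{"a":3,"b":30,"c":300}'

# pick_keys KEYS: keep only the comma-separated KEYS from each input object.
pick_keys() {
  jq -c --arg keys "$1" '. as $in | $keys | split(",") | map({(.): $in[.]}) | add'
}

printf '%s\n' "$J" | pick_keys 'a,b'
# prints:
# {"a":1,"b":10}
# {"a":2,"b":20}
# {"a":3,"b":30}
```

A key that is absent from an input object comes back with a null value rather than being dropped.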
If you really want your data to be parsed as syntax...
This is not an appropriate use case for --arg. Instead, substitute into the code:
select='a,b'; jq -c -s '.[]|{'"$select"'}' <<<"$j"
Note that this has all the usual caveats of code injection: if the input is uncontrolled, the output (or other behavior of the script, particularly if jq gains more capable I/O features in the future) should be considered uncontrolled as well.
If you want to split the literal string into a list of keys...
Here, we take your select_str (of the form a,b) and generate a map, {"a": "a", "b": "b"}; then we break each data item into entries, keep only the entries whose key is in the map, and there's our output.
jq --arg select_str "$select" '
($select_str
| split(",")
| reduce .[] as $item ({}; .[$item]=$item)) as $select_map
| with_entries(select($select_map[.key]))' <<<"$j"
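One practical difference between this approach and the split/map/add filter earlier is key order: with_entries keeps the keys in the order they appear in each input object, while building the object from the split list follows the order in which the keys were requested. The with_entries version also skips requested keys that the object lacks. A sketch comparing the two; the helper names are hypothetical.

```shell
#!/usr/bin/env bash
j='{"a":1,"b":10,"c":100}'

# Keys in requested order: one object per requested key, then merged with add.
pick_in_request_order() {
  jq -c --arg sel "$1" '. as $in | $sel | split(",") | map({(.): $in[.]}) | add'
}

# Keys in input order: the object's own entries are filtered against a lookup map.
pick_in_input_order() {
  jq -c --arg select_str "$1" '
    ($select_str | split(",") | reduce .[] as $item ({}; .[$item] = $item)) as $select_map
    | with_entries(select($select_map[.key]))'
}

pick_in_request_order 'b,a' <<< "$j"   # prints {"b":10,"a":1}
pick_in_input_order 'b,a' <<< "$j"     # prints {"a":1,"b":10}
```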

using jq to assign multiple output variables

I am trying to use jq to parse information from the TVDB API. I need to pull a couple of fields and assign the values to variables that I can continue to use in my bash script. I know I can easily assign the output to one variable in bash with variable="$(command)", but I need the output to produce multiple variables and I don't want to have to use multiple commands.
I read this documentation:
https://stedolan.github.io/jq/manual/v1.5/#Advancedfeatures
but I don't know if this is relevant to what I am trying to do.
jq '.data' produces the following output:
[
{
"absoluteNumber": 51,
"airedEpisodeNumber": 6,
"airedSeason": 4,
"airedSeasonID": 680431,
"dvdEpisodeNumber": 6,
"dvdSeason": 4,
"episodeName": "We Will Rise",
"firstAired": "2017-03-15",
"id": 5939660,
"language": {
"episodeName": "en",
"overview": "en"
},
"lastUpdated": 1490769062,
"overview": "Clarke and Roan must work together in hostile territory in order to deliver an invaluable asset to Abby and her team."
}
]
I tried jq '.data | {episodeName:$name}' and jq '.data | .episodeName as $name' just to try and get one working. I don't understand the documentation or even if it's what I'm looking for. Is there a way to do what I am trying to do?
You can use separate variables with read:
read var1 var2 var3 < <(echo $(curl -s 'https://api.github.com/repos/torvalds/linux' |
jq -r '.id, .name, .full_name'))
echo "id : $var1"
echo "name : $var2"
echo "full_name : $var3"
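The echo $(...) wrapper joins jq's output lines with single spaces, so read then splits on every space. That is fine for the id and names above, but a value containing spaces (episodeName or overview in the question's data) would spill into the following variables. A sketch that keeps one value per line and reads each line into its own variable, assuming no value contains a newline; the input is a trimmed copy of the question's data.

```shell
#!/usr/bin/env bash
# Trimmed sample of the question's TVDB response.
episode_json='{"data":[{"absoluteNumber":51,"airedSeason":4,"episodeName":"We Will Rise"}]}'

# jq prints one value per line; each `IFS= read -r` consumes exactly one line,
# so spaces inside a value are kept. A brace group (not a subshell) is used,
# so the variables remain set afterwards.
{
  IFS= read -r abs_number
  IFS= read -r season
  IFS= read -r episode_name
} < <(printf '%s\n' "$episode_json" | jq -r '.data[0] | .absoluteNumber, .airedSeason, .episodeName')

echo "absoluteNumber : $abs_number"
echo "airedSeason : $season"
echo "episodeName : $episode_name"
```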
Using an array:
read -a arr < <(echo $(curl -s 'https://api.github.com/repos/torvalds/linux' |
jq -r '.id, .name, .full_name'))
echo "id : ${arr[0]}"
echo "name : ${arr[1]}"
echo "full_name : ${arr[2]}"
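In bash 4 or later, mapfile (also available as readarray) gives the array version the same one-value-per-line behavior, again assuming no value contains a newline. The input is a trimmed copy of the question's data.

```shell
#!/usr/bin/env bash
episode_json='{"data":[{"id":5939660,"episodeName":"We Will Rise","firstAired":"2017-03-15"}]}'

# mapfile stores each output line as one array element; -t strips the trailing newline.
mapfile -t fields < <(printf '%s\n' "$episode_json" | jq -r '.data[0] | .id, .episodeName, .firstAired')

echo "id : ${fields[0]}"
echo "episodeName : ${fields[1]}"
echo "firstAired : ${fields[2]}"
```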
You can also split the jq output on a delimiter character:
IFS='|' read var1 var2 var3 var4 < <(curl '......' | jq -r '.data |
map([.absoluteNumber, .airedEpisodeNumber, .episodeName, .overview] |
join("|")) | join("\n")')
Or use an array:
set -f; IFS='|' data=($(curl '......' | jq -r '.data |
map([.absoluteNumber, .airedEpisodeNumber, .episodeName, .overview] |
join("|")) | join("\n")')); set +f
absoluteNumber, airedEpisodeNumber, episodeName & overview are respectively ${data[0]}, ${data[1]}, ${data[2]}, ${data[3]}. set -f and set +f are used to respectively disable & enable globbing.
For the jq part, all your required fields are mapped and delimited with a '|' character using join("|").
If you are using jq < 1.5, you'll have to convert each number field to a string with tostring; because "," binds more tightly than "|" in jq, each conversion needs its own parentheses, e.g.:
IFS='|' read var1 var2 var3 var4 < <(curl '......' | jq -r '.data |
map([(.absoluteNumber|tostring), (.airedEpisodeNumber|tostring), .episodeName, .overview] |
join("|")) | join("\n")')
jq always produces a stream of zero or more values. For example, to produce the two values corresponding to "episodeName" and "id", you could write:
.data[] | ( .episodeName, .id )
For your purposes, it might be helpful to use the -c command-line option, to ensure each JSON output value is presented on a single line. You might also want to use the -r command-line option, which removes the outermost quotation marks from each output value that is a JSON string.
For further variations, please see the jq FAQ https://github.com/stedolan/jq/wiki/FAQ, e.g. the question:
Q: How can a stream of JSON texts produced by jq be converted into a bash array of corresponding values?
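As a sketch of how -c and -r combine in a script: -c puts each element of .data on one line, a while read loop picks those lines up one at a time, and a second jq call with -r extracts plain strings from each. The two-element sample is hypothetical; only the first entry comes from the question, and print_episodes is an illustrative name.

```shell
#!/usr/bin/env bash
# Hypothetical two-episode sample; the second entry is made up for illustration.
episodes_json='{"data":[{"id":5939660,"episodeName":"We Will Rise"},{"id":1,"episodeName":"Example Episode"}]}'

print_episodes() {
  # -c: one compact JSON object per line, so each loop iteration gets one episode.
  jq -c '.data[]' |
  while IFS= read -r episode; do
    # -r: print the string without its JSON quotes.
    id=$(jq -r '.id' <<< "$episode")
    name=$(jq -r '.episodeName' <<< "$episode")
    printf '%s\t%s\n' "$id" "$name"
  done
}

printf '%s\n' "$episodes_json" | print_episodes
```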
Experimental conversion of the OP's quoted input (tv.dat) to a series of bash variables (and an array). The jq code is mostly borrowed from here and there, but I don't know how to get jq to unroll a nested object (the language field) into a bash array, so the sed code does that (it only handles one level of nesting, but bash arrays are only one level deep anyway):
jq -r ".[] | to_entries | map(\"DAT_\(.key) \(.value|tostring)\") | .[]" tv.dat |
while read a b ; do echo "${a,,}='$b'" ; done |
sed -e '/{.*}/s/"\([^"]*\)":/[\1]=/g;y/{},/() /' -e "s/='(/=(/;s/)'$/)/"
Output:
dat_absolutenumber='51'
dat_airedepisodenumber='6'
dat_airedseason='4'
dat_airedseasonid='680431'
dat_dvdepisodenumber='6'
dat_dvdseason='4'
dat_episodename='We Will Rise'
dat_firstaired='2017-03-15'
dat_id='5939660'
dat_language=([episodeName]="en" [overview]="en")
dat_lastupdated='1490769062'
dat_overview='Clarke and Roan must work together in hostile territory in order to deliver an invaluable asset to Abby and her team.'
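For the scalar fields, jq's @sh format can do the quoting that the sed script does by hand: it wraps strings in single quotes (escaping any embedded single quotes) and prints numbers as-is, so the output can be passed to eval. This sketch skips nested objects such as language, and it assumes every key is a valid shell identifier, since @sh quotes only the values. The input is a trimmed copy of the question's .data array.

```shell
#!/usr/bin/env bash
# Trimmed copy of the question's `.data` array (what tv.dat holds).
tv_json='[{"absoluteNumber":51,"episodeName":"We Will Rise","firstAired":"2017-03-15","language":{"episodeName":"en","overview":"en"}}]'

# One dat_<key>=<value> line per scalar field; objects and arrays are skipped.
assignments=$(printf '%s\n' "$tv_json" | jq -r '
  .[] | to_entries[]
  | select(.value | type != "object" and type != "array")
  | "dat_\(.key | ascii_downcase)=\(.value | @sh)"')

printf '%s\n' "$assignments"
# dat_absolutenumber=51
# dat_episodename='We Will Rise'
# dat_firstaired='2017-03-15'

eval "$assignments"   # assumption: keys are trusted, valid identifiers
echo "$dat_episodename"
```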

Get line number (count of newlines) when piping in bash

I am converting a file of json documents to a file of differently shaped json documents using jq. I need the output documents to have a contiguous positive id. Can I access a variable that equals the number of newlines seen?
gzcat input.gz | jq -r '"{\"id\":???, \"foo\":\(.foo)}"' > output
# can anything take the place of ??? to give 0..n?
If your jq has input_line_number, you might be able to use that. Here is a terminal transcript illustrating what it does:
$ jq 'input_line_number'
"a"
1
"b"
2
(In the above, the input line is immediately followed by the output line.)
Similarly, here is how foreach and inputs can be used together:
$ jq -n 'foreach inputs as $line (0; .+1; "line \(.) is \($line)")'
"abc"
"line 1 is abc"
123
"line 2 is 123"
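Applied to the original question, foreach can number the documents directly, and building the output with object construction instead of string interpolation also takes care of JSON escaping. A sketch assuming jq 1.5 or later: the state starts at -1 so the first id is 0, number_documents is a hypothetical name, and .foo stands in for whatever fields the real documents carry. Because it counts JSON documents rather than newlines, documents that span several lines are still numbered contiguously.

```shell
#!/usr/bin/env bash
# number_documents: emit each input document as {"id": N, "foo": ...}, N = 0, 1, 2, ...
number_documents() {
  jq -c -n 'foreach inputs as $doc (-1; . + 1; {id: ., foo: $doc.foo})'
}

printf '%s\n' '{"foo":"x"}' '{"foo":"y"}' '{"foo":"z"}' | number_documents
# prints:
# {"id":0,"foo":"x"}
# {"id":1,"foo":"y"}
# {"id":2,"foo":"z"}
```

In the question's pipeline this would become gzcat input.gz | number_documents > output.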
If your jq does not have foreach, then you might find reduce adequate for your needs:
$ jq -s -r 'reduce .[] as $line
( [0,""]; .[0]+=1 | .[1] += "line \(.[0]) is \($line)\n")
| .[1]'
Input:
"abc"
123
Output:
line 1 is abc
line 2 is 123
