How to pass a JSON file name to jq?

Currently I call
cat my_file.json | jq
to pretty-print JSON data. I would like to avoid the extra cat, and I am a bit surprised that I can't simply run
jq my_file.json
Can I specify a file name?

You need to specify the jq program to run:
jq . my_file.json
jq -h
The usage line produced by jq -h:
Usage: jq [options] <jq filter> [file...]
Note that the summary produced by invoking jq with the -h option does not (currently) provide a complete listing of the options. For the supported options, see the jq manual: https://stedolan.github.io/jq/manual/
Two undocumented options of note are:
--debug-dump-disasm
--debug-trace
jq .
Under certain circumstances, jq . can be abbreviated to jq but it's always safe to use the full form; a good rule of thumb is: if in doubt, do so.
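To make the argument order concrete, here is a minimal sketch that writes a throwaway file and pretty-prints it. The wrapper name pretty_print and the file contents are made up for illustration; only the `jq . FILE` form comes from the answer above.

```shell
#!/usr/bin/env bash
# pretty_print is a hypothetical wrapper; the point is the argument order.
pretty_print() {
  # Filter first ("." is the identity filter), then one or more file names.
  jq . "$@"
}

# Scratch file with made-up JSON content.
tmp=$(mktemp)
printf '{"name":"demo","tags":["a","b"]}\n' > "$tmp"
pretty_print "$tmp"
rm -f "$tmp"
```

Because the file names are passed through as "$@", the same wrapper should also accept several files at once, matching the [file...] part of the usage line.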

Related

Difference between slurp, null input, and inputs filter

Given the input document:
{"a":1}
{"b":2}
{"c":3,"d":4}
What is the difference between the following jq programs (if any)? They all seem to produce the same output.
jq '[., inputs] | map(to_entries[].value)'
jq -n '[inputs] | map(to_entries[].value)'
jq -s 'map(to_entries[].value)'
In other words, the following (simplified/reduced) invocations seem identical:
jq '[.,inputs]'
jq -n '[inputs]'
jq -s '.'
How are they different? Are there scenarios where one works, but the others don't? Did older versions of jq not support all of them? Is it performance related? Or simply a matter of readability and personal preference?
Bonus points (added later to the question): does the same hold true for the following programs?
jq '., inputs | to_entries[].value'
jq -n 'inputs | to_entries[].value'
jq -s '.[] | to_entries[].value'
jq 'to_entries[].value'
With jq -n '[inputs] ....' and jq '[.,inputs] ....', you are loading the whole file into memory.
A more memory-efficient way to achieve the result as an array is:
jq -n '[inputs | to_entries[].value]'
Those first three programs are equivalent, both functionally and in terms of resource utilization, but they obscure the difference between array-oriented and stream-oriented programming.
In a nutshell, think sed and awk. For more details, see e.g. my A Stream-oriented Introduction to jq, and i.p. the section On the importance of inputs.
Bonus points: does the same hold true for the following programs:
Referring to the last four numbered examples in the Q: (4), (5) and (7) are essentially equivalent; (6) is just silly.
If you're looking for a reason why all these variations exist, please bear in mind that input and inputs were late additions in the development of jq. Perhaps they were late additions because jq was originally envisioned as a very simple and for the most part "purely functional" language.
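As a rough illustration of the array-oriented versus stream-oriented distinction, the sketch below runs the question's three-document input through both styles. It also shows one case where the variants are not interchangeable: dropping -n while keeping only [inputs]. The variable name docs is an assumption made for this example.

```shell
#!/usr/bin/env bash
# The three-document input from the question.
docs='{"a":1}
{"b":2}
{"c":3,"d":4}'

# Array-oriented: collect everything first, then transform the array.
printf '%s\n' "$docs" | jq -c -s 'map(to_entries[].value)'      # [1,2,3,4]

# Stream-oriented: handle one document at a time, emitting values as they come.
printf '%s\n' "$docs" | jq -c -n 'inputs | to_entries[].value'  # 1, 2, 3, 4 on separate lines

# Pitfall: without -n, "." has already consumed the first document,
# so [inputs] alone only sees the remaining two.
printf '%s\n' "$docs" | jq -c '[inputs]'                        # [{"b":2},{"c":3,"d":4}]
```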
Adding even more cases for the sake of completeness:
From the manual:
--raw-input/-R:
Don't parse the input as JSON. Instead, each line of text is passed to the filter as a string. If combined with --slurp, then the entire input is passed to the filter as a single long string.
This means that on one hand
jq -R -n '[inputs]' and
jq -R '[., inputs]'
both produce an array of strings, as each item provided by inputs (and . if it wasn't silenced by -n) corresponds to a line of text from the input document(s), whereas on the other hand
jq -R -s '.'
slurps all characters from the input document(s) into exactly one long string, newlines included.
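The raw-input cases can be compared side by side with a small sketch. The two-line sample text is made up; the flags are the ones quoted from the manual above.

```shell
#!/usr/bin/env bash
# Two lines of made-up text as input.
text='first line
second line'

printf '%s\n' "$text" | jq -c -R -n '[inputs]'   # ["first line","second line"]
printf '%s\n' "$text" | jq -c -R '[., inputs]'   # same array; "." supplies the first line
printf '%s\n' "$text" | jq -c -R -s '.'          # "first line\nsecond line\n"
```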

Unable to print value using jq with grep

I'm using the jq statement below with grep in my code to print a value.
jq '.Subnets[0].Tags' subnet.txt | grep -q "${add}usea1 internal us-east"
This works fine for some values; however, a few values need the grep pattern to be "${add}use* internal us-east". Can I use an asterisk so that all my values are matched?
I get an error when I include the asterisk. Any suggestions?
You have not followed the MCVE guidelines, but as @shellter pointed out, the problem description suggests you just have to use a proper (grep) regex:
grep -q "${add}use.* internal us-east"
However, since you are using jq in any case, it would probably be better to perform the filtering by extending the jq filter, for example as follows:
jq --arg add "$add" '
.Subnets[0].Tags
| select(test("\($add)use.* internal us-east"))
' subnet.txt
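The reason the regex needs `.*` rather than a bare `*` is that, in grep, `*` repeats the preceding character instead of acting as a shell-style wildcard. A minimal sketch follows, assuming made-up tag text (the real contents of subnet.txt are not shown in the question) and a hypothetical helper named matches:

```shell
#!/usr/bin/env bash
# Hypothetical sample values; the real tag strings are not shown in the question.
add="prod-"
line="prod-usea1 internal us-east"

# matches PATTERN TEXT: succeeds if TEXT matches the grep regex PATTERN.
matches() {
  printf '%s\n' "$2" | grep -q -- "$1"
}

# "e*" means "zero or more e", so this pattern cannot skip over "a1".
matches "${add}use* internal us-east" "$line" && echo "star: match" || echo "star: no match"

# ".*" means "any run of characters", which is the wildcard behaviour wanted here.
matches "${add}use.* internal us-east" "$line" && echo "dot-star: match" || echo "dot-star: no match"
```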

Parse a nested variable from YAML file in bash

A complex .yaml file from this link needs to be fed into a bash script that runs as part of an automation program running on an EC2 instance of Amazon Linux 2. Note that the .yaml file in the link above contains many objects, and that I need to extract one of the environment variables defined inside one of the many objects that are defined in the file.
Specifically, how can I extract the 192.168.0.0/16 value of the CALICO_IPV4POOL_CIDR variable into a bash variable?
- name: CALICO_IPV4POOL_CIDR
value: "192.168.0.0/16"
I have read a lot of other postings and blog entries about parsing flatter, simpler .yaml files, but none of those other examples show how to extract a nested value like the value of CALICO_IPV4POOL_CIDR in this question.
As others are commenting, it is recommended to make use of yq (along with jq) if available.
Then please try the following:
value=$(yq -r 'recurse | select(.name? == "CALICO_IPV4POOL_CIDR") | .value' "calico.yaml")
echo "$value"
Output:
192.168.0.0/16
If you're able to install new dependencies, and are planning on dealing with lots of yaml files, yq is a wrapper around jq that can handle yaml. It'd allow a safe (non-grep) way of accessing nested yaml values.
Usage would look something like MY_VALUE=$(yq '.myValue.nested.value' < config-file.yaml)
Alternatively, How can I parse a YAML file from a Linux shell script? has a bash-only parser that you could use to get your value.
The right way to do this is to use a scripting language and a YAML parsing library to extract the field you're interested in.
Here's an example of how to do it in Python. If you were doing this for real you'd probably split it out into multiple functions and have better error reporting. This is literally just to illustrate some of the difficulties caused by the format of calico.yaml, which is several YAML documents concatenated together, not just one. You also have to loop over some of the lists internal to the document in order to extract the field you're interested in.
#!/usr/bin/env python3
import yaml

def foo():
    with open('/tmp/calico.yaml', 'r') as fil:
        docs = yaml.safe_load_all(fil)
        doc = None
        for candidate in docs:
            if candidate["kind"] == "DaemonSet":
                doc = candidate
                break
        else:
            raise ValueError("no YAML document of kind DaemonSet")
    l1 = doc["spec"]
    l2 = l1["template"]
    l3 = l2["spec"]
    l4 = l3["containers"]
    for containers_item in l4:
        l5 = containers_item["env"]
        env = l5
        for entry in env:
            if entry["name"] == "CALICO_IPV4POOL_CIDR":
                return entry["value"]
    raise ValueError("no CALICO_IPV4POOL_CIDR entry")

print(foo())
However, sometimes you need a solution right now and shell scripts are very good at that.
If you're hitting an API endpoint, then the YAML will usually be pretty-printed so you can get away with extracting text in ways that won't work on arbitrary YAML.
Something like the following should be fairly robust:
cat </tmp/calico.yaml | grep -A1 CALICO_IPV4POOL_CIDR | grep value: | cut -d: -f2 | tr -d ' "'
Although it's worth checking at the end with a regex that the extracted value really is valid IPv4 CIDR notation.
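One possible shape for that final check is sketched below in plain bash. The helper name is_ipv4_cidr is hypothetical, and the check is deliberately simple: it verifies the dotted-quad/prefix shape and the numeric ranges, nothing more.

```shell
#!/usr/bin/env bash
# is_ipv4_cidr is a hypothetical helper name, not part of any tool mentioned above.
is_ipv4_cidr() {
  local value=$1 octet
  local re='^([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})/([0-9]{1,2})$'
  [[ $value =~ $re ]] || return 1
  # Range checks that the regex alone does not enforce (10# forces base 10).
  for octet in "${BASH_REMATCH[@]:1:4}"; do
    (( 10#$octet <= 255 )) || return 1
  done
  (( 10#${BASH_REMATCH[5]} <= 32 ))
}

is_ipv4_cidr "192.168.0.0/16" && echo "valid"
is_ipv4_cidr "192.168.0.0/33" || echo "invalid"
```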
The key thing here is grep -A1 CALICO_IPV4POOL_CIDR .
The two-element dictionary you mentioned (shown below) will always appear as one chunk since it's a subtree of the YAML document.
- name: CALICO_IPV4POOL_CIDR
value: "192.168.0.0/16"
The keys in calico.yaml are not sorted alphabetically in general, but in {"name": <something>, "value": <something else>} constructions, name does consistently appear before value.
MYVAR=$(\
curl https://docs.projectcalico.org/v3.3/getting-started/kubernetes/installation/hosted/kubernetes-datastore/calico-networking/1.7/calico.yaml | \
grep -A 1 CALICO_IPV4POOL_CIDR | \
grep value | \
cut -d ':' -f2 | \
tr -d ' "')
Replace curl https://docs.projectcalico.org/v3.3/getting-started/kubernetes/installation/hosted/kubernetes-datastore/calico-networking/1.7/calico.yaml with however you're sourcing the file. That gets piped to grep -A 1 CALICO_IPV4POOL_CIDR. This gives you 2 lines of text: the name line, and the value line. That gets piped to grep value, which now gives us the line we want with just the value. That gets piped to cut -d ':' -f2 which uses the colon as a delimiter and gives us the second field. $(...) executes the enclosed script, and it is assigned to MYVAR. After this script, echo $MYVAR should produce 192.168.0.0/16.
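The same pipeline can be tried offline against a small inline sample before pointing it at the real file. The excerpt below is a trimmed, made-up stand-in for calico.yaml, and extract_cidr is a hypothetical wrapper name.

```shell
#!/usr/bin/env bash
# extract_cidr is a hypothetical wrapper around the pipeline described above.
extract_cidr() {
  grep -A 1 CALICO_IPV4POOL_CIDR | grep value | cut -d ':' -f2 | tr -d ' "'
}

# A trimmed, made-up excerpt standing in for the real calico.yaml.
sample='            - name: CALICO_IPV4POOL_CIDR
              value: "192.168.0.0/16"
            - name: FELIX_IPV6SUPPORT
              value: "false"'

MYVAR=$(printf '%s\n' "$sample" | extract_cidr)
echo "$MYVAR"   # 192.168.0.0/16
```

This approach depends on the value line directly following the name line, so it is worth re-testing whenever the upstream file changes.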
You have two problems there:
How to read a YAML document from a file with multiple documents
How to select the key you want from that YAML document
I have guessed that you need the YAML document of kind 'DaemonSet' from reading Gregory Nisbett's answer.
I will try to only use tools that are likely to be already installed on your system because you mentioned you want to do this in a Bash script. I assume you have JQ because it is hard to do much in Bash without it!
For the YAML library I tend to use Ruby for this because:
Most systems have a Ruby
Ruby's Psych library has been bundled since Ruby 1.9
The PyYAML library in Python is a bit inflexible and sometimes broken compared to Ruby's in my experience
The YAML library in Perl is often not installed by default
It was suggested to use yq, but that won't help so much in this case because you still need a tool that can extract the YAML document.
Having extracted the document I am going to again use Ruby to save the file as JSON. Then we can use jq.
Extracting the YAML document
To get the YAML document using Ruby and save it as JSON:
url=...
curl -s "$url" | \
ruby -ryaml -rjson -e \
"puts YAML.load_stream(ARGF.read)
.select{|doc| doc['kind']=='DaemonSet'}[0].to_json" \
| jq . > calico.json
Further explanation:
The YAML.load_stream reads the YAML documents and returns them all as an Array
ARGF.read reads from a file passed via STDIN
Ruby's select allows easy selection of the YAML document according to its kind key
Then we take the first element ([0]) of the selection and convert it to JSON.
I pass that response through jq . so that it's formatted for human readability but that step isn't really necessary. I could do the same in Ruby but I'm guessing you want Ruby code kept to a minimum.
Selecting the key you want
To select the key you want the following JQ query can be used:
jq -r \
'.spec.template.spec.containers[].env[] | select(.name=="CALICO_IPV4POOL_CIDR") | .value' \
calico.json
Further explanation:
The first part, .spec.template.spec.containers[].env[], iterates over all containers and over all env entries inside them
Then we select the Hash where the name key equals CALICO_IPV4POOL_CIDR and return the value
The -r removes the quotes around the string
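To see the query in isolation, it can be run against a tiny hand-written JSON document. The sample below is a heavily trimmed, made-up stand-in for calico.json that contains only the fields the query touches. The function name pool_cidr is hypothetical.

```shell
#!/usr/bin/env bash
# pool_cidr is a hypothetical wrapper around the jq query shown above; it reads JSON on stdin.
pool_cidr() {
  jq -r '.spec.template.spec.containers[].env[] | select(.name=="CALICO_IPV4POOL_CIDR") | .value'
}

# Made-up stand-in for calico.json (not the real file).
sample='{"spec":{"template":{"spec":{"containers":[
  {"name":"calico-node","env":[
    {"name":"DATASTORE_TYPE","value":"kubernetes"},
    {"name":"CALICO_IPV4POOL_CIDR","value":"192.168.0.0/16"}]},
  {"name":"install-cni","env":[
    {"name":"CNI_CONF_NAME","value":"10-calico.conflist"}]}]}}}}'

printf '%s\n' "$sample" | pool_cidr   # 192.168.0.0/16
```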
Putting it all together:
#!/usr/bin/env bash
url='https://docs.projectcalico.org/v3.3/getting-started/kubernetes/installation/hosted/kubernetes-datastore/calico-networking/1.7/calico.yaml'
curl -s "$url" | \
ruby -ryaml -rjson -e \
"puts YAML.load_stream(ARGF.read)
.select{|doc| doc['kind']=='DaemonSet'}[0].to_json" \
| jq . > calico.json
jq -r \
'.spec.template.spec.containers[].env[] | select(.name=="CALICO_IPV4POOL_CIDR") | .value' \
calico.json
Testing:
▶ bash test.sh
192.168.0.0/16

Parse JQ output through external bash function?

I want to parse data out of a log file that consists of JSON strings, and I wonder if there's a way to use a bash function to perform custom parsing instead of overloading the jq command.
Command:
tail errors.log --follow | jq --raw-output '. | [.server_name, .server_port, .request_file] | @tsv'
Outputs:
8.8.8.8 80 /var/www/domain.com/www/public
I want to parse 3rd column to cut the string to exclude /var/www/domain.com part where /var/www/domain.com is the document root, and /var/www/domain.com/subdomain/public is the public html section of the site. Therefore I would like to leave my output as /subdomain/public (or from the example /www/public).
I wonder if I can somehow inject a bash function to parse .request_file column? Or how would I do that using jq?
I'm having issues piping out the output of any part of this command that would allow me to do any sort of string manipulation.
Use a BashFAQ #1 while read loop to iterate over the lines, and a BashFAQ #100 parameter expansion to perform the desired modifications:
tail -f -- errors.log \
| jq --raw-output --unbuffered \
'[.server_name, .server_port, .request_file] | @tsv' \
| while IFS=$'\t' read -r server_name server_port request_file; do
printf '%s\t%s\t%s\n' "$server_name" "$server_port" "/${request_file#/var/www/*/}"
done
Note the use of --unbuffered, to force jq to flush its output lines immediately rather than buffering them. This has a performance penalty (so it's not default), but it ensures that you get output immediately when reading from a potentially-slow input source.
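The prefix removal inside the loop relies on the ${var#pattern} parameter expansion, which can be tried on its own without tail or jq. This is a small sketch; strip_docroot is a hypothetical helper name, and it assumes every path starts with /var/www/<site>/ as in the question.

```shell
#!/usr/bin/env bash
# strip_docroot is a hypothetical helper; it assumes paths begin with /var/www/<site>/.
strip_docroot() {
  local request_file=$1
  # "#" removes the shortest leading match of the glob /var/www/*/
  printf '/%s\n' "${request_file#/var/www/*/}"
}

strip_docroot "/var/www/domain.com/www/public"        # prints /www/public
strip_docroot "/var/www/domain.com/subdomain/public"  # prints /subdomain/public
```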
That said, it's also easy to remove a prefix in jq, so there's no particular reason to do the above:
tail -f -- errors.log | jq -r '
def withoutPrefix: sub("^([/][^/]+){3}"; "");
[.server_name, .server_port, (.request_file | withoutPrefix)] | @tsv'

How to use `jq` in a shell pipeline?

I can't seem to get jq to behave "normally" in a shell pipeline. For example:
$ curl -s https://api.github.com/users/octocat/repos | jq | cat
results in jq simply printing out its help text*. The same thing happens if I try to redirect jq's output to a file:
$ curl -s https://api.github.com/users/octocat/repos | jq > /tmp/stuff.json
Is jq deliberately bailing out if it determines that it's not being run from a tty? How can I prevent this behavior so that I can use jq in a pipeline?
Edit: it looks like this is no longer an issue in recent versions of jq. I have jq-1.6 now and the examples above work as expected.
* (I realize this example contains a useless use of cat; it's for illustration purposes only)
You need to supply a filter as an argument. To pass the JSON through unmodified other than the pretty printing jq provides by default, use the identity filter .:
curl -s https://api.github.com/users/octocat/repos | jq '.' | cat
Another question I frequently run into is "How do I construct JSON data to supply to other shell commands, for example curl?" The way I do this is by using the --null-input/-n option:
Don’t read any input at all! Instead, the filter is run once using null as the input. This is useful when using jq as a simple calculator or to construct JSON data from scratch.
And an example passing it into curl:
jq -n '{key: "value"}' | curl -d @- \
--url 'https://some.url.com' \
-H 'Content-Type: application/json' \
-H 'Accept: application/json'
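When the JSON has to include shell variables, combining -n with --arg avoids hand-written quoting. The sketch below only builds and prints the payload; the helper name build_payload and the field names are made up for illustration.

```shell
#!/usr/bin/env bash
# build_payload is a hypothetical helper; the field names are made up.
build_payload() {
  # --arg passes shell values in as jq string variables, so quotes and
  # backslashes in them are escaped by jq rather than by hand.
  jq -c -n --arg user "$1" --arg note "$2" '{user: $user, note: $note}'
}

build_payload "octocat" 'says "hi"'   # {"user":"octocat","note":"says \"hi\""}
```

The output of build_payload could then be piped into curl -d @- in the same way as the example above.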
