JQ issues with comments on Json file - bash

I'm using jq (https://stedolan.github.io/jq/) to work with my JSON in bash, and when I read the JSON it throws an error
parse error: Invalid numeric literal at line 2, column 5=
Since my json has some comments
// comment
"spawn": {}
I've been looking through the options and cannot find one that fixes the problem. Any idea how to solve it?

JSON and thus jq do not support comments (in the usual sense) in JSON input. The jq FAQ lists a number of tools that can be used to remove comments, including jsonlint, json5, and any-json. I'd recommend one that can act as a filter.
See https://github.com/stedolan/jq/wiki/FAQ#processing-not-quite-valid-json for links and further details.
——
It might be worth pointing out that jq can be used to process JSON with #-style comments, at least if the JSON is not too large to be processed as a jq program. For example, you could use jq with the -f option to read a JSON file as a jq program.
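A minimal sketch of the -f approach mentioned above. It relies on two facts: # starts a comment in the jq language, and a JSON object literal is also a valid jq expression. The function name and the sample file contents are made up for illustration; -n is used so that jq does not wait for any input. This does not help with //-style comments.

```shell
# Evaluate a file containing JSON plus #-style comments as a jq *program*.
# $1: path to the file (hypothetical helper, not part of jq itself).
read_hash_commented_json() {
  jq -c -n -f "$1"
}

# Made-up sample: not valid JSON because of the comments.
sample=$(mktemp)
cat > "$sample" <<'EOF'
# a full-line comment
{
  "spawn": {},   # a trailing comment
  "count": 2
}
EOF

read_hash_commented_json "$sample"   # prints the object as compact JSON
rm -f "$sample"
```

Since the file is parsed as a program, anything that is special to jq inside strings (for example \( ... ) interpolation) is interpreted by jq rather than treated as data.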

Remove them; JSON does not support comments.
(JSON is defined here; you can see a briefer description of the grammar here.)

I found https://github.com/sindresorhus/strip-json-comments-cli which allows you to do:
cat my_json_with_comments.json | strip-json-comments | jq .

Can be stripped out using sed, eg to remove lines beginning with '//':
cat test.json | sed 's/^ *\/\/.*//' | jq <commands>
sed is a pass-through/stream editor, in this case it's substituting nothing ( // ) for lines that begin with '//'; '//' must be escaped with a backslash character since the '/' is used by sed as a delimiter.
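As a small variation on the command above (a sketch, not the answerer's exact command): deleting the comment lines with d instead of substituting an empty string avoids leaving blank lines behind. The function name and sample input are made up for illustration.

```shell
# Delete whole lines whose first non-blank characters are "//".
# \/\/ is "//" with sed's default delimiter escaped.
strip_line_comments() {
  sed '/^[[:space:]]*\/\//d' "$@"
}

# Made-up sample input; the output could be piped straight into jq.
printf '%s\n' '{' '  // comment' '  "spawn": {}' '}' | strip_line_comments
```

This only handles comments that occupy a whole line; a // comment after a value on the same line is left untouched.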

With this sed, you can remove:
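Lines whose first non-blank character is # (that is what /^[[:blank:]]*#/d deletes; empty lines themselves are left in place)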
Comments, even in the format "key": "value" //my comment
The output is then plain JSON, which jq can process without problems
sed '/^[[:blank:]]*#/d;s/\/\/.*//' my.json | jq '.<your_block>'
reference: https://unix.stackexchange.com/a/157619
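One thing to watch with s/\/\/.*// is that it also cuts string values that contain //, such as URLs. The sketch below contrasts it with a slightly stricter pattern that only treats // as a comment when it starts the line or follows whitespace. The function name and the sample line are made up, and the stricter pattern still assumes that no string value contains whitespace followed by //.

```shell
line='"url": "https://example.com/x", // my comment'

# Naive pattern: everything from the first "//" on is removed,
# which truncates the URL.
printf '%s\n' "$line" | sed 's/\/\/.*//'

# Stricter pattern: "//" must be at the start of the line or follow whitespace,
# so the "//" inside "https://..." is left alone.
strip_trailing_comments() {
  sed -E 's#(^|[[:space:]])//.*$##' "$@"
}
printf '%s\n' "$line" | strip_trailing_comments
```

For anything beyond simple files, a real comment-aware tool such as strip-json-comments is the safer choice.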


Regex: match only string C that is in between string A and string B

How can I write a regex in a shell script that would target only the targeted substring between two given values? Given the example
https://www.stackoverflow.com
How can I match only the ":" in between "https" and "//".
If possible please also explain the approach.
The context is that I need to prepare a file that would fetch a config from the server and append it to the .env file. The response comes as JSON
{
"GRAPHQL_URL": "https://someurl/xyz",
"PUBLIC_TOKEN": "skml2JdJyOcrVdfEJ3Bj1bs472wY8aSyprO2DsZbHIiBRqEIPBNg9S7yXBbYkndX2Lk8UuHoZ9JPdJEWaiqlIyGdwU6O5",
"SUPER_SECRET": "MY_SUPER_SECRET"
}
so I need to adjust it to the .env syntax. What I managed to do this far is
#!/bin/bash
CURL_RESPONSE="$(curl -s url)"
cat <<< ${CURL_RESPONSE} | jq -r '.property.source' | sed -r 's/:/=/g;s/[^a-zA-Z0-9=:_/-]//g' > .env.test
So basically I fetch the data, extract the key I am after with jq, and then use sed to first replace every ":" with "=" and then remove all the quotation marks, semicolons and white space that come from the JSON, keeping only the characters that are necessary.
I am almost there, but the problem is that my GraphQL URL (and any other URL) now looks like this
https=//someurl/xyz
so I need to replace this = that is in between https and // back with the colon.
Thank you very much @Nic3500 for the response, not sure why but I get an error saying that
sed: 1: "s/:/=/g;s#https\(.*\)// ...": \1 not defined in the RE
I searched SO and it seems that it should work since the brackets are escaped and I use -r flag (tried -E but no difference) and I don't know how to apply it. To be honest I assume that the replacement block is this part
#\1#
so how can I let this know to what character should it be replaced?
This is how I tried to use it
#!/bin/bash
CURL_RESPONSE="$(curl -s url)"
cat <<< ${CURL_RESPONSE} | jq -r '.property.source' | sed -r 's/:/=/g;s#https\(.*\)//.*#\1#;s/[^a-zA-Z0-9=:_/-]//g' > .env.test
Hope with this context you would be able to help me.
echo "https://www.stackoverflow.com" | sed 's#https\(.*\)//.*#\1#'
:
sed operator s/regexp/replacement/
regexp: https\(.*\)//.*. So "https" followed by something \(.*\), followed by "//", followed by anything else .*
the parentheses are backslashed because they are not part of the text to match. They group a part of the regex so that it can be referenced in the replacement part of the s### operator.
replacement: \1, means the first group found in the regex \(.*\)
I used s###, but the usual form is s///. Any character can take the place of the / with the s operator. I used # as using / would have been confusing since you use / in the url.
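Regarding the "\1 not defined in the RE" error from the question: with -r or -E, sed uses extended regular expressions, where groups are written with plain ( ) and \( \) match literal parentheses, so the pattern contains no group at all. A short sketch of both syntaxes follows, plus one possible way to turn "https=//" back into "https://". The fix_scheme name is made up for illustration.

```shell
url='https://www.stackoverflow.com'

# Basic regex (sed's default): groups are written \( ... \).
echo "$url" | sed 's#https\(.*\)//.*#\1#'

# Extended regex (-E, or -r in GNU sed): groups are written ( ... ).
echo "$url" | sed -E 's#https(.*)//.*#\1#'

# Applied to the .env problem: restore the colon in "http=//" or "https=//".
fix_scheme() {
  sed -E 's#(https?)=//#\1://#g' "$@"
}
echo 'GRAPHQL_URL=https=//someurl/xyz' | fix_scheme
```

The first two commands both print a single ":", and the last one prints GRAPHQL_URL=https://someurl/xyz.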
The problem is that your sed substitutions are terribly imprecise. Anyway, you want to do it in jq instead, where you have more control over which parts you are substituting, and avoid spawning a separate process for something jq quite easily does natively in the first place.
curl -s url |
jq -r '.property.source | to_entries[] |
"\(.key)=\"\(.value)\""' > .env.test
Tangentially, capturing the output of curl into a variable just so you can immediately cat it once to standard output is just a waste of memory.
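To try the to_entries approach without a server, here is a sketch that runs the same filter on a local sample. The json_to_env name is made up, the sample is modelled on the JSON from the question with the token left out, and the .property.source prefix is dropped because the sample is already the flat object.

```shell
# Convert a flat JSON object (read from stdin) into KEY="value" lines.
json_to_env() {
  jq -r 'to_entries[] | "\(.key)=\"\(.value)\""'
}

json_to_env <<'EOF'
{
  "GRAPHQL_URL": "https://someurl/xyz",
  "SUPER_SECRET": "MY_SUPER_SECRET"
}
EOF
```

The colon inside the URL is never touched, because only the key/value separator is generated by the filter. Values that themselves contain double quotes would need extra escaping, which this sketch does not attempt.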

Difference between slurp, null input, and inputs filter

Given the input document:
{"a":1}
{"b":2}
{"c":3,"d":4}
What is the difference between the following jq programs (if any)? They all seem to produce the same output.
jq '[., inputs] | map(to_entries[].value)'
jq -n '[inputs] | map(to_entries[].value)'
jq -s 'map(to_entries[].value)'
In other words, the following (simplified/reduced) invocations seem identical:
jq '[.,inputs]'
jq -n '[inputs]'
jq -s '.'.
How are they different? Are there scenarios where one works, but the others don't? Did older versions of jq not support all of them? Is it performance related? Or simply a matter of readability and personal preference?
Bonus points (added later to the question): does the same hold true for the following programs?
jq '., inputs | to_entries[].value'
jq -n 'inputs | to_entries[].value'
jq -s '.[] | to_entries[].value'
jq 'to_entries[].value'
With jq -n '[inputs] ....' and jq '[.,inputs] ....', you are loading the whole file into memory.
A more memory-efficient way to achieve the result as an array is:
jq -n '[inputs | to_entries[].value]'
Those first three programs are equivalent, both functionally and in terms of resource utilization, but they obscure the difference between array-oriented and stream-oriented programming.
In a nutshell, think sed and awk. For more details, see e.g. my A Stream-oriented Introduction to jq, and in particular the section On the importance of inputs.
Bonus points: does the same hold true for the following programs:
Referring to the last four numbered examples in the Q: (4), (5) and (7) are essentially equivalent; (6) is just silly.
If you're looking for a reason why all these variations exist, please bear in mind that input and inputs were late additions in the development of jq. Perhaps they were late additions because jq was originally envisioned as a very simple and for the most part "purely functional" language.
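To make the array-oriented versus stream-oriented distinction concrete, here is a small sketch using the three documents from the question. The variable name docs is just for illustration.

```shell
docs='{"a":1}
{"b":2}
{"c":3,"d":4}'

# Array-oriented: each of these first collects all inputs into one array.
echo "$docs" | jq -c    '[., inputs] | map(to_entries[].value)'
echo "$docs" | jq -c -n '[inputs]    | map(to_entries[].value)'
echo "$docs" | jq -c -s 'map(to_entries[].value)'

# Stream-oriented: each document is processed as it is read,
# and the values are emitted one per line instead of as an array.
echo "$docs" | jq -n 'inputs | to_entries[].value'
```

The first three commands each print [1,2,3,4]; the last prints 1, 2, 3 and 4 on separate lines without ever holding all documents in memory at once.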
Adding even more cases for the sake of completeness:
From the manual:
--raw-input/-R:
Don't parse the input as JSON. Instead, each line of text is passed to the filter as a string. If combined with --slurp, then the entire input is passed to the filter as a single long string.
This means that on one hand
jq -R -n '[inputs]' and
jq -R '[., inputs]'
both produce an array of strings, as each item provided by inputs (and . if it wasn't silenced by -n) corresponds to a line of text from the input document(s), whereas on the other hand
jq -R -s '.'
slurps all characters from the input document(s) into exactly one long string, newlines included.
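A quick sketch of the two raw-input cases, using a made-up two-line text:

```shell
text='first line
second line'

# -R with inputs: one string per input line, collected into an array.
echo "$text" | jq -c -R -n '[inputs]'

# -R with -s: the whole input as a single string, newlines included.
echo "$text" | jq -c -R -s '.'
```

The first command prints ["first line","second line"], the second prints "first line\nsecond line\n" (echo adds the trailing newline).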

Unable to print value using jq with grep

I'm using below jq statement with grep in my code to print a value.
jq '.Subnets[0].Tags' subnet.txt | grep -q "${add}usea1 internal us-east"
This works fine for some values; however, a few values need the grep pattern to be "${add}use* internal us-east". Can I use an asterisk so that all my values are matched?
I get an error when I include the asterisk. Any suggestions?
You have not followed the MCVE guidelines, but as @shellter pointed out, the problem description suggests you just have to use the proper (grep) regex:
grep -q "${add}use.* internal us-east"
However, since you are using jq in any case, it would probably be better to perform the filtering by extending the jq filter, for example as follows:
jq --arg add "$add" '
.Subnets[0].Tags
| select(test("\($add)use.* internal us-east"))
' subnet.txt
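Some background on why the asterisk alone does not work: in a regex, * repeats the preceding item, so use* means "us" followed by any number of "e" characters, whereas .* means "any characters". The sketch below also shows one way the jq filter might look if Tags is an array of Key/Value objects, as in AWS CLI output. That structure is an assumption, since the question does not show subnet.txt, and the function name, the prefix and the sample data are made up.

```shell
# $1: prefix to match; JSON is read from stdin.
# Prints the Value of every tag on the first subnet that matches the pattern.
match_subnet_tag() {
  jq -r --arg add "$1" '
    .Subnets[0].Tags[]
    | select(.Value | test("^\($add)use.* internal us-east"))
    | .Value
  '
}

# Hypothetical sample standing in for subnet.txt.
match_subnet_tag 'prod-' <<'EOF'
{"Subnets": [{"Tags": [
  {"Key": "Name", "Value": "prod-usea1 internal us-east"},
  {"Key": "Env",  "Value": "prod"}
]}]}
EOF
```

Note that $add is inserted into the regex as-is, so any regex metacharacters in it are interpreted as such.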

How to keep double quotes in variables in shell

I use jq to process a JSON string, but the shell does not retain the double quotes. I can't add escape characters because the JSON string is sent from an external source. I want to keep the double quotes of the original string.
The json string is dynamically generated, the data content is undefined, I can't use sed to add double quotes
# The json_str is externally sent.
# Assume that the content is "{"name": "John", "age": 0}"
# I want get the name
echo "$json_str" | jq -r ".name"
I expect the output is "John", but the actual output is
parse error: Invalid literal at line 1, column 6
You can either use single quotes
json_str='{"name": "John", "age": 22}'
or escape the double quotes
json_str="{\"name\": \"John\", \"age\": 22}"
Note that this answer applies to the original version of the question.
I expect the output is "John"
Apart from the error introduced by your test case, the use of option -r is the issue:
· --raw-output / -r:
With this option, if the filter's result is a string then it will
be written directly to standard output rather than being formatted
as a JSON string with quotes.
If you don't want colored output, you can use -M instead:
· --colour-output / -C and --monochrome-output / -M:
By default, jq outputs colored JSON if writing to a terminal. You
can force it to produce color even if writing to a pipe or a file
using -C, and disable color with -M.
The quotes in the JSON example are part of the JSON syntax, not part of the content itself.
When you want to have the quotes, you can use
echo "${json_str}" | jq -r ".name" | sed 's/.*/"&"/'
or
name=$(echo "${json_str}" | jq -r ".name" | sed 's/.*/"&"/')
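The points made in the answers above can be checked with a short sketch. The variable names are made up; the sample string is the one from the question.

```shell
# Double-quoting a string that itself contains double quotes: the shell
# consumes the inner quotes, so the variable no longer holds valid JSON.
bad="{"name": "John", "age": 0}"
echo "$bad"                          # {name: John, age: 0}

# Single quotes keep the inner double quotes intact.
json_str='{"name": "John", "age": 0}'

# Default output is JSON-encoded, so the string keeps its quotes.
echo "$json_str" | jq '.name'        # "John"

# -r prints the raw string without quotes.
echo "$json_str" | jq -r '.name'     # John
```

So if the externally supplied string really arrives in a variable with its quotes intact, dropping -r is enough to get "John" with quotes, and no sed step is needed.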

Parse a nested variable from YAML file in bash

A complex .yaml file from this link needs to be fed into a bash script that runs as part of an automation program running on an EC2 instance of Amazon Linux 2. Note that the .yaml file in the link above contains many objects, and that I need to extract one of the environment variables defined inside one of the many objects that are defined in the file.
Specifically, how can I extract the 192.168.0.0/16 value of the CALICO_IPV4POOL_CIDR variable into a bash variable?
- name: CALICO_IPV4POOL_CIDR
value: "192.168.0.0/16"
I have read a lot of other postings and blog entries about parsing flatter, simpler .yaml files, but none of those other examples show how to extract a nested value like the value of CALICO_IPV4POOL_CIDR in this question.
As others are commenting, it is recommended to make use of yq (along with jq) if available.
Then please try the following:
value=$(yq -r 'recurse | select(.name? == "CALICO_IPV4POOL_CIDR") | .value' "calico.yaml")
echo "$value"
Output:
192.168.0.0/16
If you're able to install new dependencies, and are planning on dealing with lots of yaml files, yq is a wrapper around jq that can handle yaml. It'd allow a safe (non-grep) way of accessing nested yaml values.
Usage would look something like MY_VALUE=$(yq '.myValue.nested.value' < config-file.yaml)
Alternatively, How can I parse a YAML file from a Linux shell script? has a bash-only parser that you could use to get your value.
The right way to do this is to use a scripting language and a YAML parsing library to extract the field you're interested in.
Here's an example of how to do it in Python. If you were doing this for real you'd probably split it out into multiple functions and have better error reporting. This is literally just to illustrate some of the difficulties caused by the format of calico.yaml, which is several YAML documents concatenated together, not just one. You also have to loop over some of the lists internal to the document in order to extract the field you're interested in.
#!/usr/bin/env python3
import yaml

def foo():
    with open('/tmp/calico.yaml', 'r') as fil:
        docs = yaml.safe_load_all(fil)
        doc = None
        for candidate in docs:
            if candidate["kind"] == "DaemonSet":
                doc = candidate
                break
        else:
            raise ValueError("no YAML document of kind DaemonSet")
    l1 = doc["spec"]
    l2 = l1["template"]
    l3 = l2["spec"]
    l4 = l3["containers"]
    for containers_item in l4:
        l5 = containers_item["env"]
        env = l5
        for entry in env:
            if entry["name"] == "CALICO_IPV4POOL_CIDR":
                return entry["value"]
    raise ValueError("no CALICO_IPV4POOL_CIDR entry")

print(foo())
However, sometimes you need a solution right now and shell scripts are very good at that.
If you're hitting an API endpoint, then the YAML will usually be pretty-printed so you can get away with extracting text in ways that won't work on arbitrary YAML.
Something like the following should be fairly robust:
cat </tmp/calico.yaml | grep -A1 CALICO_IPV4POOL_CIDR | grep value: | cut -d: -f2 | tr -d ' "'
Although it's worth checking at the end with a regex that the extracted value really is valid IPv4 CIDR notation.
The key thing here is grep -A1 CALICO_IPV4POOL_CIDR .
The two-element dictionary you mentioned (shown below) will always appear as one chunk since it's a subtree of the YAML document.
- name: CALICO_IPV4POOL_CIDR
value: "192.168.0.0/16"
The keys in calico.yaml are not sorted alphabetically in general, but in {"name": <something>, "value": <something else>} constructions, name does consistently appear before value.
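Putting the extraction and the suggested sanity check together, a sketch might look like this. The function names and the sample excerpt are made up for illustration, and the check is deliberately loose: it verifies the shape of the value only, not that each octet is at most 255 or the prefix length at most 32.

```shell
# Extract the value on the line following "name: CALICO_IPV4POOL_CIDR".
# Reads the files given as arguments, or stdin if there are none.
extract_cidr() {
  grep -A1 'CALICO_IPV4POOL_CIDR' "$@" | grep 'value:' | cut -d: -f2 | tr -d ' "'
}

# Loose shape check for IPv4 CIDR notation (a.b.c.d/n).
looks_like_cidr() {
  printf '%s\n' "$1" | grep -Eq '^([0-9]{1,3}\.){3}[0-9]{1,3}/[0-9]{1,2}$'
}

# Hypothetical excerpt standing in for calico.yaml.
sample='            - name: CALICO_IPV4POOL_CIDR
              value: "192.168.0.0/16"
            - name: FELIX_IPV6SUPPORT
              value: "false"'

cidr=$(printf '%s\n' "$sample" | extract_cidr)
if looks_like_cidr "$cidr"; then
  echo "$cidr"
else
  echo "unexpected value: $cidr" >&2
fi
```

With the sample above this prints 192.168.0.0/16. If the layout of the file ever changes so that value no longer follows name directly, the check at the end is what catches it.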
MYVAR=$(\
curl https://docs.projectcalico.org/v3.3/getting-started/kubernetes/installation/hosted/kubernetes-datastore/calico-networking/1.7/calico.yaml | \
grep -A 1 CALICO_IPV4POOL_CIDR | \
grep value | \
cut -d ':' -f2 | \
tr -d ' "')
Replace curl https://docs.projectcalico.org/v3.3/getting-started/kubernetes/installation/hosted/kubernetes-datastore/calico-networking/1.7/calico.yaml with however you're sourcing the file. That gets piped to grep -A 1 CALICO_IPV4POOL_CIDR. This gives you 2 lines of text: the name line, and the value line. That gets piped to grep value, which now gives us the line we want with just the value. That gets piped to cut -d ':' -f2, which uses the colon as a delimiter and gives us the second field. Finally, tr -d ' "' deletes the remaining spaces and double quotes. $(...) runs the enclosed pipeline, and its output is assigned to MYVAR. After this script, echo $MYVAR should produce 192.168.0.0/16.
You have two problems there:
How to read a YAML document from a file with multiple documents
How to select the key you want from that YAML document
I have guessed that you need the YAML document of kind 'DaemonSet' from reading Gregory Nisbett's answer.
I will try to only use tools that are likely to be already installed on your system because you mentioned you want to do this in a Bash script. I assume you have JQ because it is hard to do much in Bash without it!
For the YAML library I tend to use Ruby for this because:
Most systems have a Ruby
Ruby's Psych library has been bundled since Ruby 1.9
The PyYAML library in Python is a bit inflexible and sometimes broken compared to Ruby's in my experience
The YAML library in Perl is often not installed by default
It was suggested to use yq, but that won't help so much in this case because you still need a tool that can extract the YAML document.
Having extracted the document I am going to again use Ruby to save the file as JSON. Then we can use jq.
Extracting the YAML document
To get the YAML document using Ruby and save it as JSON:
url=...
curl -s $url | \
ruby -ryaml -rjson -e \
"puts YAML.load_stream(ARGF.read)
.select{|doc| doc['kind']=='DaemonSet'}[0].to_json" \
| jq . > calico.json
Further explanation:
The YAML.load_stream reads the YAML documents and returns them all as an Array
ARGF.read reads from a file passed via STDIN
Ruby's select allows easy selection of the YAML document according to its kind key
Then we take the first element ([0]) and convert it to JSON.
I pass that response through jq . so that it's formatted for human readability but that step isn't really necessary. I could do the same in Ruby but I'm guessing you want Ruby code kept to a minimum.
Selecting the key you want
To select the key you want the following JQ query can be used:
jq -r \
'.spec.template.spec.containers[].env[] | select(.name=="CALICO_IPV4POOL_CIDR") | .value' \
calico.json
Further explanation:
The first part, .spec.template.spec.containers[].env[], iterates over all containers and over all env entries inside them
Then we select the Hash where the name key equals CALICO_IPV4POOL_CIDR and return the value
The -r removes the quotes around the string
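The query can be tried on its own with a small stand-in document. The sketch below wraps it in a function (the name is made up) and feeds it a heavily trimmed, hypothetical version of calico.json that keeps only the path the query walks.

```shell
# Print the value of CALICO_IPV4POOL_CIDR from a DaemonSet document.
# Reads the files given as arguments, or stdin if there are none.
get_pool_cidr() {
  jq -r '.spec.template.spec.containers[].env[]
         | select(.name == "CALICO_IPV4POOL_CIDR")
         | .value' "$@"
}

# Hypothetical, trimmed stand-in for calico.json.
get_pool_cidr <<'EOF'
{"kind": "DaemonSet",
 "spec": {"template": {"spec": {"containers": [
   {"name": "calico-node",
    "env": [
      {"name": "DATASTORE_TYPE", "value": "kubernetes"},
      {"name": "CALICO_IPV4POOL_CIDR", "value": "192.168.0.0/16"}
    ]}
 ]}}}}
EOF
```

If some container has no env key at all, .env[] raises an error for that container; writing .env[]? instead would skip it silently.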
Putting it all together:
#!/usr/bin/env bash
url='https://docs.projectcalico.org/v3.3/getting-started/kubernetes/installation/hosted/kubernetes-datastore/calico-networking/1.7/calico.yaml'
curl -s $url | \
ruby -ryaml -rjson -e \
"puts YAML.load_stream(ARGF.read)
.select{|doc| doc['kind']=='DaemonSet'}[0].to_json" \
| jq . > calico.json
jq -r \
'.spec.template.spec.containers[].env[] | select(.name=="CALICO_IPV4POOL_CIDR") | .value' \
calico.json
Testing:
▶ bash test.sh
192.168.0.0/16
