jq statement to build JSON from CSV - bash

I have a CSV file that I want to convert to a JSON file, with the quotes from the CSV removed, using jq in a shell script.
Here is the CSV named input.csv:
1,"SC1","Leeds"
2,"SC2","Barnsley"
Here is the jq command:
jq --slurp --raw-input --raw-output \
  'split("\n") | .[:-1] | map(split(",")) |
   map({
     "ListElementCode": .[0],
     "ListElement": "\(.[1]) \(.[2])"
   })' \
  input.csv > output.json
this writes to output.json:
[
  {
    "ListElementCode": "1",
    "ListElement": "\"SC1\" \"Leeds\""
  },
  {
    "ListElementCode": "2",
    "ListElement": "\"SC2\" \"Barnsley\""
  }
]
Any idea how I can remove the quotes around the 2 text values that get put into the ListElement part?

To solve only the most immediate problem, one could write a function that strips quotes if-and-when they exist:
jq -n --raw-input --raw-output '
  def stripQuotes: capture("^\"(?<content>.*)\"$").content // .;
  [inputs | split(",") | map(stripQuotes) |
    {
      "ListElementCode": .[0],
      "ListElement": "\(.[1]) \(.[2])"
    }]
' <in.csv >out.json
That said, to really handle CSV correctly, you can't just split(","): you need to split only on commas that aren't inside quotes (and need to recognize doubled-up quotes as the escaped form of a single quote). Really, I'd use Python instead of jq for this job; as of this writing, the jq cookbook agrees that native jq code is only suited for "trivially simple" CSV files.
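For illustration, a minimal Python sketch (assuming Python 3 and its standard csv and json modules; the field layout mirrors the jq filter above):
python3 - input.csv >output.json <<'EOF'
import csv, json, sys

rows = []
with open(sys.argv[1], newline='') as f:
    for row in csv.reader(f):          # handles quoted fields and doubled-quote escapes
        rows.append({"ListElementCode": row[0],
                     "ListElement": " ".join(row[1:])})
json.dump(rows, sys.stdout, indent=2)
EOF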

As mentioned, a Ruby answer:
ruby -rjson -rcsv -e '
  data = CSV.foreach(ARGV.shift)
            .map do |row|
    {
      ListElementCode: row.first,
      ListElement: row.drop(1).join(" ")
    }
  end
  puts JSON.pretty_generate(data)
' input.csv
[
  {
    "ListElementCode": "1",
    "ListElement": "SC1 Leeds"
  },
  {
    "ListElementCode": "2",
    "ListElement": "SC2 Barnsley"
  }
]

Using a proper CSV/JSON parser in Perl:
#!/usr/bin/env perl
use strict; use warnings;
use JSON::XS;
use Text::CSV qw/csv/;

# input.csv:
# 1,"SC1","Leeds"
# 2,"SC2","Barnsley"

my $vars = [csv in => 'input.csv'];
#use Data::Dumper;
#print Dumper $vars;    # display the data structure

my $o = [ ];
foreach my $a (@{ $vars->[0] }) {
    push @{ $o }, {
        ListElementCode => $a->[0],
        ListElement     => $a->[1] . " " . $a->[2]
    };
}
my $coder = JSON::XS->new->ascii->pretty->allow_nonref;
print $coder->encode($o);
Output
[
   {
      "ListElement" : "SC1 Leeds",
      "ListElementCode" : "1"
   },
   {
      "ListElement" : "SC2 Barnsley",
      "ListElementCode" : "2"
   }
]

Here's an uncomplicated and efficient way to solve this particular problem:
jq -n --raw-input --raw-output '
  [inputs
   | split(",")
   | { "ListElementCode": .[0],
       "ListElement": "\(.[1]|fromjson) \(.[2]|fromjson)"
     } ]' input.csv
Incidentally, there are many robust command-line CSV-to-JSON tools, amongst which I would include:
any-json (https://www.npmjs.com/package/any-json)
csv2json (https://github.com/fadado/CSV)

Related

shell script: Returning wrong output

In the given script, the nested key is not getting appended with the value. I could not figure out where the script is going wrong.
#!/bin/bash
echo "Add the figma json file path"
read path
figma_json="$(echo -e "${path}" | tr -d '[:space:]')"
echo $(cat $figma_json | jq -r '.color | to_entries[] | "\(.key):\(.value| if .value == null then .[] | .value else .value end)"')
Sample input:
{
  "color": {
    "white": {
      "description": "this is just plain white color",
      "type": "color",
      "value": "#ffffffff",
      "extensions": {
        "org.lukasoppermann.figmaDesignTokens": {
          "styleId": "S:40940df38088633aa746892469dd674de8b147eb,",
          "exportKey": "color"
        }
      }
    },
    "gray": {
      "50": {
        "description": "",
        "type": "color",
        "value": "#fafafaff",
        "extensions": {
          "org.lukasoppermann.figmaDesignTokens": {
            "styleId": "S:748a0078c39ca645fbcb4b2a5585e5b0d84e5fd7,",
            "exportKey": "color"
          }
        }
      }
    }
  }
}
Actual output:
white:#ffffffff gray:#fafafaff
Expected output:
white:#ffffffff gray:50:#fafafaff
Here's a solution using tostream instead of to_entries to facilitate simultaneous access to the full path and its value:
jq -r '
  .color | tostream | select(.[0][-1] == "value" and has(1)) | .[0][:-1] + .[1:] | join(":")
' "$figma_json"
white:#ffffffff
gray:50:#fafafaff
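To see why the select works: tostream turns the document into a stream of [path, value] events, and the leaf events whose path ends in "value" are exactly the ones kept. A quick check (a sketch; -c prints one event per line):
jq -c '.color | tostream | select(.[0][-1] == "value" and has(1))' "$figma_json"
[["white","value"],"#ffffffff"]
[["gray","50","value"],"#fafafaff"]
Dropping the trailing "value" from each path and appending the value then yields the joined lines above.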
An approach attempting to demonstrate bash best-practices:
#!/bin/bash
figma_json=$1                                 # prefer command-line arguments to prompts
[[ $figma_json ]] || {
  read -r -p 'Figma JSON file path: ' path    # read -r means backslashes not munged
  figma_json=${path//[[:space:]]/}            # parameter expansion is more efficient than tr
}

jq -r '
  def recurse_for_value($prefix):
    to_entries[]
    | .key as $current_key
    | .value?.value? as $immediate_value
    | if $immediate_value == null then
        .value | recurse_for_value(
          if $prefix != "" then
            $prefix + ":" + $current_key
          else
            $current_key
          end
        )
      else
        if $prefix == "" then
          "\($current_key):\($immediate_value)"
        else
          "\($prefix):\($current_key):\($immediate_value)"
        end
      end
  ;
  .color |
  recurse_for_value("")
' "$figma_json"

jq format when running from a bash script with variable expansion

I've got a jq command that works when running directly from the shell or from within a shell script, but when I try to add variable expansion, I get jq errors for unexpected format or invalid characters. My goal is to have a quick and easy way to update some json configuration.
Here's a simplified example.
The format of the json I'm modifying:
{
  "pets": {
    "some-new-pet": {
      "PetInfo": {
        "name": "my-brand-new-pet",
        "toys": [
          "toy1-postfix",
          "toy2-postfix",
          "toy3-postfix"
        ]
      }
    }
  }
}
The jq without variable expansion:
cat myfile.json | jq '.pets."some-new-pet" += {PetInfo: {name: "my-brand-new-pet"}, toys: ["toy1", "toy2", "toy3"]}}'
The above runs fine, and adds the new pets.some-new-pet entry to my json.
Below is what I'm trying to do with variable expansion that fails.
jq_args="'.pets.\"${PET}\" += {PetInfo: {name: \"${NAME}\"}, toys: [\"${toy1}-postfix\", \"${toy2}-postfix\", \"${toy3}-postfix\"]}}'"
cat myfile.json | jq $jq_args
The error message I get with the above:
jq: error: syntax error, unexpected INVALID_CHARACTER, expecting $end (Unix shell quoting issues?) at <top-level>, line 1: '.pets."some-new-pet"
My file is formatted as utf-8 and uses LF line endings.
I do not recommend constructing a jq filter using variable expansion or printf. It will work for simple cases but will fail if the string contains double quotes, backslashes or control-codes, as they have special meanings inside a JSON string. As an alternative to using printf, jq has a way to pass in variables directly via the command-line, avoiding all these issues.
pet='some-second-pet'
name='my-even-newer-pet'
toy1=toy1
toy2=toy2
toy3=toy3
jq \
  --arg pet "$pet" \
  --arg name "$name" \
  --arg toy1 "$toy1" \
  --arg toy2 "$toy2" \
  --arg toy3 "$toy3" \
  '.pets[$pet] += {
    PetInfo: {name: $name},
    toys: ["\($toy1)-postfix", "\($toy2)-postfix", "\($toy3)-postfix"]
  }' \
  myfile.json
Output:
{
  "pets": {
    "some-new-pet": {
      "PetInfo": {
        "name": "my-brand-new-pet",
        "toys": [
          "toy1-postfix",
          "toy2-postfix",
          "toy3-postfix"
        ]
      }
    },
    "some-second-pet": {
      "PetInfo": {
        "name": "my-even-newer-pet"
      },
      "toys": [
        "toy1-postfix",
        "toy2-postfix",
        "toy3-postfix"
      ]
    }
  }
}
It would be cleaner and less error-prone to format the string using printf:
PET='dog'
NAME='sam'
toy1="t1"
toy2="t2"
toy3="t3"
jq_args=$(printf '.pets."%s" += {PetInfo: {name: "%s"}, toys: ["%s-postfix", "%s-postfix", "%s-postfix"]}' "${PET}" "${NAME}" "${toy1}" "${toy2}" "${toy3}")
echo "$jq_args"
Result:
.pets."dog" += {PetInfo: {name: "sam"}, toys: ["t1-postfix", "t2-postfix", "t3-postfix"]}
Additionally, the embedded single quotes become redundant once you double-quote the variable in the jq call:
cat myfile.json | jq "$jq_args"
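That said, the printf route still breaks as soon as a value contains characters that are special inside a JSON string. A quick demonstration (the NAME value here is made up for illustration):
NAME='sam "the biter"'
jq_args=$(printf '.pets."%s" += {PetInfo: {name: "%s"}}' "$PET" "$NAME")
echo "$jq_args"
# .pets."dog" += {PetInfo: {name: "sam "the biter""}}   <- no longer valid jq
jq "$jq_args" myfile.json                                # fails with a syntax error
The --arg approach from the earlier answer handles such values unchanged.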
Fix your jq code by removing the extra } at the end.
Fix the bash jq call:
Add double quotes "..." around your $jq_args,
so don't use single quotes '...' in your jq_args definition.
Use printf with the -v option to define jq_args:
printf -v jq_args "...format..." value1 value2 ...
So your code becomes:
PET="some-new-pet"
NAME="my-brand-new-pet"
toy1="toy1"
toy2="toy2"
toy3="toy3"
format='.pets."%s" += {PetInfo: {name: "%s"}, toys: ["%s", "%s", "%s"]}'
printf -v jq_args "${format}" "${PET}" "${NAME}" "${toy1}" "${toy2}" "${toy3}"
cat myfile.json | jq "$jq_args"
Output:
{
  "pets": {
    "some-new-pet": {
      "PetInfo": {
        "name": "my-brand-new-pet"
      },
      "toys": [
        "toy1",
        "toy2",
        "toy3"
      ]
    }
  }
}
Notes:
When you define your format, put it in single quotes '...'; it's much easier to read JSON (or XML) without a backslash (\) before each double quote (").
Use printf -v variable_name; it's more readable than var_name=$(printf ...).
By constructing the jq filter ("code") from outer bash variables ("data") you may run into escaping issues, which could break or even divert your filter (see https://en.wikipedia.org/wiki/Code_injection).
Instead, use jq's own mechanism for introducing external data: the --arg parameter.
jq --arg pet "${PET}" \
   --arg name "${NAME}" \
   --arg toy1 "${toy1}-postfix" \
   --arg toy2 "${toy2}-postfix" \
   --arg toy3 "${toy3}-postfix" \
   '
   .pets[$pet] += {PetInfo: {$name, toys: [$toy1, $toy2, $toy3]}}
   ' myfile.json
If you have an unknown number of variables to include, check out jq's --args parameter (note the additional s)
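For example, values passed after --args arrive as an array of strings in $ARGS.positional (a minimal sketch):
jq -n '$ARGS.positional' --args toy1 toy2 toy3
[
  "toy1",
  "toy2",
  "toy3"
]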

use bash string as jq filter

I don't understand what I'm doing wrong or why this does not work.
test.json file:
[
  {
    "Header": {
      "Region": "US",
      "Tenant": "Tenant1",
      "Stage": "testing",
      "ProductType": "old"
    },
    "Body": []
  },
  {
    "Header": {
      "Region": "EU",
      "Tenant": "Tenant2",
      "Stage": "development",
      "ProductType": "new"
    },
    "Body": []
  }
]
I want to display the values of the .Header.Tenant key. So the simple jq call does its job:
$ jq '[.[].Header.Tenant]' test.json
[
  "Tenant1",
  "Tenant2"
]
Now I want to assign that jq filter to a bash variable and use it with jq's --arg variable.
And I am getting this:
$ a=".[].Header.Tenant"; jq --arg xx "$a" '[$xx]' test.json
[
  ".[].Header.Tenant"
]
What is wrong?
jq does not have an eval function for evaluating arbitrary jq expressions, but it does provide functions that can be used to achieve much the same effect, the key idea being that certain JSON values can be used to specify query operations.
In your case, you would have to translate the jq query into a suitable jq operation, such as:
jq --argjson a '["Header","Tenant"]' '
  getpath(paths | select(.[-($a|length):] == $a))
' test.json
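Against test.json above, this should emit the two values as JSON strings (add -r if you want them raw):
"Tenant1"
"Tenant2"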
Extending jq's JSON-based query language
More interestingly, you could write your own eval, e.g.
jq --argjson a '[[], "Header","Tenant"]' '
  def eval($expr):
    if $expr == [] then .
    else $expr[0] as $op
    | if $op == [] then .[] | eval($expr[1:])
      else getpath([$op]) | eval($expr[1:])
      end
    end;
  eval($a)
' test.json
With eval.jq as a module
If the above def of eval were put in a file, say ~/jq/eval.jq, then you could simply write:
jq -L ~/jq --argjson a '[[], "Header","Tenant"]' '
include "eval";
eval($a)' test.json
Or you could specify the search path in the jq program:
jq --argjson a '[[], "Header","Tenant"]' '
include "eval" { "search": "~/jq" };
eval($a)' test.json
Or you could use import ...
TLDR; The following code does the job:
$ a=".[].Header.Tenant"; jq -f <(echo "[$a]") test.json
[
  "Tenant1",
  "Tenant2"
]
One as well can add/modify the filter in the jq call, if needed:
$ a=".[].Header.Tenant"; jq -f <(echo "[$a]|length") test.json
2
Longer explanation
My ultimate goal was to figure out how to keep a lowest-common-denominator jq filter in a variable and use it when calling jq, adding further steps where necessary. If you have a really complex jq filter spanning multiple lines that you call frequently, you probably want to template it somehow and reuse that template when calling jq.
While peak demonstrated how it can be done, I think it is over-engineering this simple task.
However, process substitution combined with jq's -f option, which reads the filter from a file, does solve my problem.
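As a sketch of the templating idea (tenant.jq is a made-up file name), the base filter can live in a file and be extended per call:
echo '[.[].Header.Tenant]' > tenant.jq
jq -f tenant.jq test.json                              # the base filter
jq -f <(cat tenant.jq; echo '| length') test.json      # base filter plus an extra step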

cannot call bash environment variable inside jq

In the below script, I am not able to successfully call the "repovar" variable in the jq command.
cat quayrepo.txt | while read line
do
   export repovar="$line"
   jq -r --arg repovar "$repovar" '.data.Layer| .Features[] | "\(.Name), \(.Version), $repovar"' severity.json > volume.csv
done
The script uses a text file to loop through the repo names.
quayrepo.txt ---> has the list of names; in this case the file contains a single value, "Reponame1".
sample input severity.json file:
{
  "status": "scanned",
  "data": {
    "Layer": {
      "IndexedByVersion": 3,
      "Features": [
        {
          "Name": "elfutils",
          "Version": "0.168-1",
          "Vulnerabilities": [
            {
              "NamespaceName": "debian:9",
              "Severity": "Medium",
              "Name": "CVE-2016-2779"
            }
          ]
        }
      ]
    }
  }
}
desired output:
elfutils, 0.168-1, Medium, Reponame1
Required output: I need to retrieve the value of my environment variable as the last column in my output csv file
You need to surround $repovar with an interpolation, \(...), like the other values:
repovar='qweqe'; jq -r --arg repovar "$repovar" '.data.Layer| .Features[] | "\(.Name), \(.Version), \($repovar)"' tmp.json
Result:
elfutils, 0.168-1, qweqe
There's no need for the export.
#!/usr/bin/env bash
while read line
do
  jq -r --arg repovar "$line" '.data.Layer.Features[] | .Name + ", " + .Version + ", " + $repovar' severity.json
done < quayrepo.txt > volume.csv
with quayrepo.txt as
Reponame1
and severity.json as shown in the question
produces volume.csv containing
elfutils, 0.168-1, Reponame1
To @peak's point, changing > to >> in ...severity.json >> volume.csv appends instead of overwriting on each iteration, producing a multi-line CSV rather than just the last line.
You don't need a while read loop in bash at all; jq itself can loop over your input lines, even when they aren't JSON, letting you run jq only once, not once per line in quayrepo.txt.
jq -rR --slurpfile inJson severity.json <quayrepo.txt >volume.csv '
  ($inJson[0].data.Layer | .Features[]) as $features |
  [$features.Name, $features.Version, .] |
  @csv
'
jq -R specifies raw input, letting jq read lines from quayrepo.txt directly into .
jq --slurpfile varname filename.json reads filename.json into an array of the JSON objects parsed from that file. If the file contains only one object, one needs to refer to $varname[0] to get at it.
@csv converts an array to a CSV output line, correctly handling data with embedded quotes or other oddities that require special processing.
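Note that @csv quotes string fields, so with the sample inputs the single jq invocation should produce a line like:
"elfutils","0.168-1","Reponame1"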

How to manipulate a jq output using bash?

I have the following jq code snippet:
https://jqplay.org/s/QzOttRHoz1
I want to loop over each element of the result array in bash, as this pseudocode shows:
#!/bin/bash
foreach result
  print "My name is {name}, I'm {age} years old"
  print "--"
print "--"
The result would be:
My name is A, I'm 1 years old.
---
My name is B, I'm 2 years old.
---
My name is C, I'm 3 years old.
---
Of course this is a trivial example just to clarify that my goal is to manipulate each array from the jq result individually.
Any suggestions on how to write the pseudo code into valid bash statements?
Saving the json:
{
  "Names": [
    { "Name": "A", "Age": "1" },
    { "Name": "B", "Age": "2" },
    { "Name": "C", "Age": "3" }
  ]
}
as /tmp/input.txt I can run:
</tmp/input.txt jq --raw-output 'foreach .Names[] as $name ([];[];$name | .Name, .Age )' \
| while read -r name && read -r age; do
    printf "My name is %s, I'm %d years old.\n" "$name" "$age"
    printf -- "--\n"
  done
The --raw-output with | .Name, .Age prints two lines per .Names array member, one with the name and another with the age. Then I read two lines at a time with while read && read and use them in the loop body.
If you'd rather have:
["A","1"]
["B","2"]
["C","3"]
that's unfortunate; to do it properly you'd need a full parser that takes strings like "\"" into account. Still, as a rough approach you can:
</tmp/input2.txt sed 's/^\[//;s/\]$//;' \
| while IFS=, read name age; do
    name=${name%\"}
    name=${name#\"}
    age=${age%\"}
    age=${age#\"}
    printf "My name is %s, I'm %d years old.\n" "$name" "$age"
    printf -- "--\n"
  done
The sed removes the leading [ and trailing ] on each line. Then I read two strings separated by , (so values like "a,b","c,d" will be read incorrectly). Each string is then stripped of its leading and trailing ". Finally the usual printf outputs the result.
I have written a simple script to achieve what you need.
My JSON file test.json, similar to your snippet:
{
  "Names": [
    { "Name": "A", "Age": "1" },
    { "Name": "B", "Age": "2" },
    { "Name": "C", "Age": "3" }
  ]
}
My script:
#!/bin/bash
# @base64 encodes each object as a single token so it survives shell word-splitting
for i in $(cat test.json | jq -r '.Names[] | @base64'); do
  _jq() {
    echo "${i}" | base64 --decode | jq -r "${1}"
  }
  echo "My Name is $(_jq '.Name'), I'm $(_jq '.Age') years old"
done
Note that foreach .Names[] as $name ([];[];$name | .Name, .Age )
can be simplified to:
.Names[] | ( .Name, .Age )
or even in this specific case to:
.Names[][]
or for that matter to:
.[][][]
The important point, however, is that foreach is not needed to achieve simple iteration.
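For instance, against /tmp/input.txt above, the simplified filter feeds the same while loop:
</tmp/input.txt jq --raw-output '.Names[] | .Name, .Age'
A
1
B
2
C
3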
