yq v4: print all key value pairs with full key path - yaml

I'm trying to determine the correct syntax for using yq to print all key/value pairs from a given yaml input using yq v4 - with the desired output having the full key "path". This was possible using v3 such as this:
$ cat << EOF | yq r -p pv - '**'
> a:
> b: foo
> c: bar
> EOF
a.b: foo
a.c: bar
but I'm having difficulty wrapping my head around the new syntax.
Any help is greatly appreciated.

$ cat << EOF | yq e '.. | select(. == "*") | {(path | join(".")): .} ' -
> a:
> b: foo
> c: bar
> EOF
a.b: foo
a.c: bar
What does this do? Let's go over it:
.. recursively select all values
select(. == "*") filter for scalar values (i.e. filter out the value of a)
(path | join(".")) gets the path as array and joins the elements with .
{…: .} create a mapping, having the joined paths as keys and their values as values
Edit: to get sequence indexes in square brackets ([0] etc), do
$ cat << EOF | yq e '.. | select(. == "*") | {(path | . as $x | (.[] | select((. | tag) == "!!int") |= (["[", ., "]"] | join(""))) | $x | join(".") | sub(".\[", "[")): .} ' -
This seems like there should be a simpler way to do it, but I don't know yq well enough to figure it out.

Related

How to get the index of element using jq

I want to get the index of a element from the array.
Link: https://api.github.com/repos/checkstyle/checkstyle/releases
I want to fetch the tag_name.
I generally use this command to get my latest release tag:
LATEST_RELEASE_TAG=$(curl -s https://api.github.com/repos/checkstyle/checkstyle/releases/latest \
| jq ".tag_name")
But now I want previous releases too so I want the index of a particular tag name. For eg: tag_name = "10.3.1"
Also, I am thinking to use mathematical reduction to get the other previous release if I get a particular index number like:
( 'index of 10.3.1' - 1) Any thought regarding this?
Just index of "checkstyle-10.3.1":
curl -s https://api.github.com/repos/checkstyle/checkstyle/releases | jq '[.[].tag_name] | to_entries | .[] | select(.value=="checkstyle-10.3.1") | .key'
Release before "checkstyle-10.3.1":
curl -s https://api.github.com/repos/checkstyle/checkstyle/releases | jq '([.[].tag_name] | to_entries | .[] | select(.value=="checkstyle-10.3.1") | .key) as $index | .[$index+1]'
Release "checkstyle-10.3.1"
curl -s https://api.github.com/repos/checkstyle/checkstyle/releases | jq '.[] | select(.tag_name=="checkstyle-10.3.1")'
Here's a general-purpose function to get the index of something in an array:
# Return the 0-based index of the first item in the array input
# for which f is truthy, else null
def index_by(f):
first(foreach .[] as $x (-1;
.+1;
if ($x|f) then . else empty end)) // null;
This can be used to retrieve the item immediately before a target as follows:
jq '
def index_by(f): first(foreach .[] as $x (-1; .+1; if ($x|f) then . else empty end)) // null;
index_by(.tag_name == "checkstyle-10.3.1") as $i
| if $i then .[$i - 1] else empty end
'

Unable to find match with yq in for loop

I have a yaml file called teams.yml that looks like this:
test:
bonjour: hello
loop:
bonjour: allo
I want to loop over an array and get itsvalues matching a key in the yaml. I have tried yq e 'keys | .[] | select(. == "test")' .github/teams.yml and it returns test whereas yq e 'keys | .[] | select(. == "abc")' .github/teams.yml returns nothing which is enough to get the information I am interested in.
The issue is that, when using the same logic in a for loop, yq returns nothing:
#!/bin/bash
yq e 'keys | .[] | select(. == "abc")' .github/teams.yml # Prints nothing
yq e 'keys | .[] | select(. == "test")' .github/teams.yml # Prints 'test'
ar=( "abc" "test" "xyz" )
for i in "${ar[#]}" # Prints nothing
do
yq e 'keys | .[] | select(. == "$i")' .github/teams.yml
done
What explains the lack of output in the for loop?
Call yq with the loop variable set in the environment, and use env within yq to read environment variables. See also the section Env Variable Operators in the manual.
ar=( "abc" "test" "xyz" )
for i in "${ar[#]}" # Prints nothing
do
i="$i" yq e 'keys | .[] | select(. == env(i))' .github/teams.yml
done
Another way could be to import the (preformatted) array from the environment and just make the array subtractions within yq (saving you the looping and calling yq multiple times):
ar='["abc","test","xyz"]' yq e 'env(ar) - (env(ar) - keys) | .[]' .github/teams.yml

Upsert a csv file from a second file using bash

I have a main csv file with records (file1). I then have a second "delta" csv file (file2). I would like to update the main file with the records from the delta file using bash. Existing records should get the new value (replace the row) and new records should be appended.
Example file1
unique_id|value
'1'|'old'
'2'|'old'
'3'|'old'
Example file2
unique_id|value
'1'|'new'
'4'|'new'
Desired outcome
unique_id|value
'1'|'new'
'2'|'old'
'3'|'old'
'4'|'new'
awk -F '|' '
! ($1 in rows){ ids[id_count++] = $1 }
{ rows[$1] = $0 }
END {
for(i=0; i<id_count; i++)
print rows[ids[i]]
}
' old.csv new.csv
Output:
unique_id|value
'1'|'new'
'2'|'old'
'3'|'old'
'4'|'new'
Similar approach using perl
perl -F'\|' -lane '
$id = $F[0];
push #ids, $id unless exists $rows{$id};
$rows{$id} = $_;
END { print $rows{$_} for #ids }
' old.csv new.csv
You could also use an actual database e.g. sqlite
sqlite> create table old (unique_id text primary key, value text);
sqlite> create table new (unique_id text primary key, value text);
# skip headers
sqlite> .sep '|'
sqlite> .import --skip 1 new.csv new
sqlite> .import --skip 1 old.csv old
sqlite> select * from old;
'1'|'old'
'2'|'old'
'3'|'old'
sqlite> insert into old
select * from new where true
on conflict(unique_id)
do update set value=excluded.value;
sqlite> select * from old;
'1'|'new'
'2'|'old'
'3'|'old'
'4'|'new'
I immediately thought of join, but you cannot specify "take this column if there's a match, otherwise use another column, and have either output end up in a single column".
For command-line processing of CSV files, I really like GoCSV. It has its own CSV-aware join command—which is also limited like join (above)—and it has other commands that we can chain together to produce the desired output.
GoCSV uses a streaming/buffered reader/writer for as many subcommands as it can. Every command but join operates in this buffered-in-buffered-out fashion, but join needs to read both sides in total to match. Still, GoCSV is compiled and just really, really fast.
All GoCSV commands read the delimiter to use from the GOCSV_DELIMITER environment variable, so your first order of business is to export that for your pipe delimiter:
export GOCSV_DELIMITER='|'
Joining is easy, just specify the columns from either file to use as the key. I'm also going to rename the columns now so that we're set up for the conditional logic in the next step. If your columns vary from file to file, you'll want to rename each set of columns first, before you join.
I'm telling gocsv join to pick the first columns from both files, -c 1,1 and use an outer join to keep both left and right sides, regardless of match:
gocsv join -c 1,1 -outer file1.csv file2.csv \
| gocsv rename -c 1,2,3,4 -names 'id_left','val_left','id_right','val_right'
| id_left | val_left | id_right | val_right |
|---------|----------|----------|-----------|
| 1 | old | 1 | new |
| 2 | old | | |
| 3 | old | | |
| | | 4 | new |
There's no way to change a value in an existing column based on another column's value, but we can add new columns and use a templating language to define the logic we need.
The following syntax creates two new columns, id_final and val_final. For both columns, if there's a value in val_right that value is used, otherwise val_left is used. This, cominbed with the outer-join of "left then right" from before, gives us the effect of the right side updating/overwriting the left side if the IDs matched:
... \
| gocsv add -name 'id_final' -t '{{ if .id_right }}{{ .id_right }}{{ else }}{{ .id_left }}{{ end }}' \
| gocsv add -name 'val_final' -t '{{ if .val_right }}{{ .val_right }}{{ else }}{{ .val_left }}{{ end }}'
| id_left | val_left | id_right | val_right | id_final | val_final |
|---------|----------|----------|-----------|----------|-----------|
| 1 | old | 1 | new | 1 | new |
| 2 | old | | | 2 | old |
| 3 | old | | | 3 | old |
| | | 4 | new | 4 | new |
Finally, we can select just the "final" fields and rename them back to their original names:
... \
| gocsv select -c 'id_final','val_final' \
| gocsv rename -c 1,2 -names 'unique_id','value'
| unique_id | value |
|-----------|-------|
| 1 | new |
| 2 | old |
| 3 | old |
| 4 | new |
GoCSV has pre-built binaries for modern platforms.
I use Miller and run
mlr --csv --fs "|" join --ul --ur -j unique_id --lp "l#" --rp "r#" -f 01.csv \
then put 'if(is_not_null(${r#value})){$value=${r#value}}else{$value=$value}' \
then cut -x -r -f '#' 02.csv
and I have
unique_id|value
'1'|'new'
'4'|'new'
'2'|'old'
'3'|'old'
I run a full join and I use an if condition, to check if I have value on the right. If I have it, I use it.

yq v4 get root keys based on existence of deeper keys

I have this structure:
foo:
image: 123
bar:
image: 456
baz:
config: "my config"
and I'd like to print the root keys (i.e. foo, bar, baz) based on the existence of the child "image"
In yq version 3 I could do this:
$ yq read test.yaml --printMode p "*.image" | awk -F'.' '{print $1}'
foo
bar
But I can't find the equivalent in v4. The yq + jq solution would be:
$ yq -j e test.yaml | jq -r 'to_entries[] | select(.value | has("image")) | [.key][]'
foo
bar
Any idea how to do this with yq v4?
You can use the path operator to get the path of the matching object containing the tag image
yq e '.[] | select(has("image")) | path | .[]' yaml

Replace string in Nth array

I have a .txt file with strings in arrays which looks like these:
id | String1 | String2 | Counts
1 | Abc | Abb | 0
2 | Cde | Cdf | 0
And i want to add counts, so i need to replace last digit, but i need to change it only for the one line.
I am getting new needed value by this function:
$(awk -F "|" -v i=$idOpen 'FNR == i { gsub (" ", "", $0); print $4}' filename)"
And them I want to replace it with new value, which will be bigger for 1.
And im doing it right in there.
counts=(("$(awk -F "|" -v i=$idOpen 'FNR == i { gsub (" ", "", $0); print $4}' filename)"+1))
Where IdOpen is an id of the array, where i need to replace string.
So i have tried to replace the whole array by these:
counter="$(awk -v i=$idOpen 'BEGIN{FNqR == i}{$7+=1} END{ print $0}' bookmarks)"
N=$idOpen
sed -i "{N}s/.*/${counter}" bookmarks
But it doesn't work!
So is there a way to replace only last string with value which i have got earlier?
As result i need to get:
id | String1 | String2 | Counts
1 | Abc | Abb | 1 # if idOpen was 1 for 1 time
2 | Cde | Cdf | 2 # if idOpen was 2 for 2 times
And the last number will be increased by 1 everytime when i will activate these commands.
awk solution:
setting idOpen variable(for ex. 2):
idOpen=2
awk -F'|' -v i=$idOpen 'NR>1{if($1 == i) $4=" "$4+1}1' OFS='|' file > tmp && mv tmp file
The output(after executing the above command twice):
cat file
id | String1 | String2 | Counts
1 | Abc | Abb | 0
2 | Cde | Cdf | 2
NR>1 - skipping the header line

Resources