Parse yaml file with varying number of key:values - bash

Long story short, I will be parsing yaml files in a directory with bash using yq. My yaml files could look like this:
CLIENT_FIRST_NAME: bob
CLIENT_LAST_NAME: smith
Or
CLIENT_FIRST_NAME: bob
CLIENT_LAST_NAME: smith
CLIENT_MIDDLE_NAME: michael
So I am looping through each file with a for loop and setting variables from the values.
For example:
for f in $FILES
do
FIRSTNAME=$(yq r "$f" CLIENT_FIRST_NAME)
LASTNAME=$(yq r "$f" CLIENT_LAST_NAME)
add client --firstname="${FIRSTNAME}" --lastname="${LASTNAME}"
done
But sometimes there will be a middle name, and I would need to include it:
add client --firstname="${FIRSTNAME}" --lastname="${LASTNAME}" --middlename="${MIDDLENAME}"
The order doesn't matter, I just need to be able to account for additional fields that may show up in the yaml that need to be added to the 'add client' command. EVERY line in the yaml will be added to the command. Every key added will be a viable parameter for the 'add client' command. I don't have to worry about whether or not a key in the yaml is a valid parameter. They WILL be.
I'm curious about the best approach to handling the unknown keys here. Thanks!
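For the general case in the question (any number of keys, each mapping to a flag), one option is to collect the arguments in a bash array. The sketch below is an illustration under stated assumptions, not a YAML parser: it uses bash's own regex matching instead of yq, assumes a flat file of simple KEY: value lines, and derives each flag by dropping the CLIENT_ prefix, removing underscores and lowercasing (so CLIENT_MIDDLE_NAME becomes --middlename, matching the flags in the question). The helper yaml_to_args and the global array ARGS are hypothetical names; ${flag,,} needs bash 4 or later.

```shell
#!/usr/bin/env bash
# Hypothetical helper: fills the global array ARGS with one --flag=value
# entry per "KEY: value" line of a flat YAML file (naive parsing, no yq).
yaml_to_args() {
  local file=$1 line key value flag
  ARGS=()
  while IFS= read -r line || [[ -n $line ]]; do
    # Keep only simple "KEY: value" lines; skip blanks and anything else.
    [[ $line =~ ^([A-Za-z_][A-Za-z0-9_]*):[[:space:]]*(.*)$ ]] || continue
    key=${BASH_REMATCH[1]}
    value=${BASH_REMATCH[2]}
    flag=${key#CLIENT_}   # CLIENT_MIDDLE_NAME -> MIDDLE_NAME
    flag=${flag//_/}      # MIDDLE_NAME -> MIDDLENAME
    flag=${flag,,}        # MIDDLENAME -> middlename (bash 4+)
    ARGS+=("--${flag}=${value}")
  done < "$file"
}

# Demo with a throwaway file. In the real loop you would call:
#   yaml_to_args "$f" && add client "${ARGS[@]}"
demo=$(mktemp)
printf '%s\n' 'CLIENT_FIRST_NAME: bob' 'CLIENT_LAST_NAME: smith' \
  'CLIENT_MIDDLE_NAME: michael' > "$demo"
yaml_to_args "$demo"
printf '%s\n' "${ARGS[@]}"
rm -f "$demo"
```

Because the array is expanded as "${ARGS[@]}", a value containing spaces stays a single argument. Quoted YAML values (for example "bob") would keep their quotes with this naive parser; letting yq do the parsing avoids that.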

I'm assuming yq returns nothing if it doesn't find a key.
I would build the entire flag conditionally, based on whether yq returns anything:
for f in "${FILES[@]}"
do
FIRSTNAME=$(yq r "$f" CLIENT_FIRST_NAME)
MIDDLENAME=$(yq r "$f" CLIENT_MIDDLE_NAME)
LASTNAME=$(yq r "$f" CLIENT_LAST_NAME)
[[ -n $MIDDLENAME ]] && MIDDLENAME="--middlename=${MIDDLENAME}"
add client --firstname="${FIRSTNAME}" --lastname="${LASTNAME}" ${MIDDLENAME:+"$MIDDLENAME"}
done
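How the optional flag is expanded matters when it is empty: a quoted empty variable still passes one empty argument to the command, while ${VAR:+"$VAR"} passes nothing when VAR is empty and exactly one argument (spaces included) when it is set. A small demonstration, using a hypothetical count_args helper in place of add client:

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints how many arguments it received.
count_args() { printf '%s\n' "$#"; }

MIDDLENAME=''
count_args --firstname=bob --lastname=smith "${MIDDLENAME}"              # prints 3: an empty "" argument sneaks in
count_args --firstname=bob --lastname=smith ${MIDDLENAME:+"$MIDDLENAME"} # prints 2

MIDDLENAME='--middlename=mary ann'
count_args --firstname=bob --lastname=smith ${MIDDLENAME:+"$MIDDLENAME"} # prints 3; the space stays inside one argument
```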

This code would be far more efficient if you only ran yq once per input file, not once per data item per input file. Consider:
for f in *.yml; do
{ read -r firstname; read -r middlename; read -r lastname; } < <(
yq -r '(.CLIENT_FIRST_NAME, .CLIENT_MIDDLE_NAME // "", .CLIENT_LAST_NAME)' "$f"
)
add client \
--firstname="$firstname" \
${middlename:+--middlename="$middlename"} \
--lastname="$lastname"
done
Some notes on reading this:
Each read command in bash reads one line, unless -d is used to change the delimiter.
The above yq command outputs one line per data item.
Using // "" causes the empty string, instead of null, to be used when no CLIENT_MIDDLE_NAME is found.
${foo:+...words here...} expands to ...words here... if-and-only-if foo is set to a non-empty value.
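To try the grouped-read and ${...:+...} pattern without yq or the real add client command, the sketch below swaps in two hypothetical stand-ins: fake_yq prints the three lines the yq command above would print for a file with no middle name (the // "" fallback gives an empty line), and add_client prints each argument it receives in brackets.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for:
#   yq -r '(.CLIENT_FIRST_NAME, .CLIENT_MIDDLE_NAME // "", .CLIENT_LAST_NAME)' "$f"
fake_yq() { printf '%s\n' bob '' smith; }

# Hypothetical stand-in for `add client`: shows each argument it gets.
add_client() { printf '[%s]\n' "$@"; }

# One read per line, all in the current shell thanks to process substitution.
{ read -r firstname; read -r middlename; read -r lastname; } < <(fake_yq)

# The middle-name flag disappears entirely because middlename is empty.
add_client \
  --firstname="$firstname" \
  ${middlename:+--middlename="$middlename"} \
  --lastname="$lastname"
```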


Bash: Issue when iterating string with lines [duplicate]

I have the following JSON data in a data.json file:
[
{"original_name":"pdf_convert","changed_name":"pdf_convert_1"},
{"original_name":"video_encode","changed_name":"video_encode_1"},
{"original_name":"video_transcode","changed_name":"video_transcode_1"}
]
I want to iterate through the array and extract the values of each element in a loop. I've looked at jq, but I find it difficult to use for iteration. How can I do that?
Just use a filter that returns each item in the array, then loop over the results. Make sure you use the compact output option (-c) so each result is put on a single line and is treated as one item in the loop.
jq -c '.[]' input.json | while IFS= read -r i; do
# do stuff with $i
done
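One reason to prefer the while-read form over a for loop across unquoted command output: read consumes a whole line, while the for loop also splits on spaces inside the JSON values. In the sketch below, fake_jq_c is a hypothetical stand-in for jq -c '.[]' output (one value deliberately contains a space), IFS= and -r keep each line exactly as printed, and process substitution replaces the pipe so the counter is still set after the loop.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for `jq -c '.[]' input.json`; note the space in "pdf convert".
fake_jq_c() {
  printf '%s\n' '{"original_name":"pdf convert","changed_name":"pdf_convert_1"}' \
                '{"original_name":"video_encode","changed_name":"video_encode_1"}'
}

# while-read: one iteration per line, spaces inside a line are preserved.
while_count=0
while IFS= read -r item; do
  while_count=$((while_count + 1))
  printf 'while: %s\n' "$item"
done < <(fake_jq_c)

# for over unquoted output: the shell also splits on the space inside a value.
for_count=0
for item in $(fake_jq_c); do
  for_count=$((for_count + 1))
done

echo "while-read iterations: $while_count, for iterations: $for_count"
```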
By leveraging the power of Bash arrays, you can do something like:
# read each item in the JSON array to an item in the Bash array
readarray -t my_array < <(jq --compact-output '.[]' input.json)
# iterate through the Bash array
for item in "${my_array[@]}"; do
original_name=$(jq --raw-output '.original_name' <<< "$item")
changed_name=$(jq --raw-output '.changed_name' <<< "$item")
# do your stuff
done
jq has a shell-quoting format string: @sh.
You can use the following to format your JSON data as shell parameters:
cat data.json | jq '.[] | [.original_name, .changed_name] | @sh'
The output will look like:
"'pdf_convert' 'pdf_convert_1'"
"'video_encode' 'video_encode_1'"
"'video_transcode' 'video_transcode_1'"
To process each row, we need to do a couple of things:
Set the bash for-loop to read the entire row, rather than stopping at the first space (default behavior).
Strip the enclosing double-quotes off of each row, so each value can be passed as a parameter to the function which processes each row.
To read the entire row on each iteration of the bash for-loop, set the IFS variable to a newline.
To strip off the double-quotes, we'll run it through the bash shell interpreter using xargs:
stripped=$(echo $original | xargs echo)
Putting it all together, we have:
#!/bin/bash
function processRow() {
original_name=$1
changed_name=$2
# TODO
}
IFS=$'\n' # Each iteration of the for loop should read until we find an end-of-line
for row in $(cat data.json | jq '.[] | [.original_name, .changed_name] | @sh')
do
# Run the row through the shell interpreter to remove enclosing double-quotes
stripped=$(echo $row | xargs echo)
# Call our function to process the row
# eval must be used to interpret the spaces in $stripped as separating arguments
eval processRow $stripped
done
unset IFS # Return IFS to its original value
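If jq is run with -r (raw output), each @sh line is already a list of shell-quoted words with no surrounding double quotes, so the xargs step is not needed and the line can go to eval directly. In this sketch, fake_jq_sh is a hypothetical stand-in that prints what jq -r '.[] | [.original_name, .changed_name] | @sh' data.json would print for the first two rows, plus one made-up row containing a single quote and a space to show the quoting surviving eval; processRow is a placeholder as above.

```shell
#!/usr/bin/env bash
# Placeholder row handler (hypothetical): just reports its two arguments.
processRow() {
  local original_name=$1 changed_name=$2
  printf '%s -> %s\n' "$original_name" "$changed_name"
}

# Hypothetical stand-in for:
#   jq -r '.[] | [.original_name, .changed_name] | @sh' data.json
# @sh writes a single quote inside a value as '\''
fake_jq_sh() {
  printf '%s\n' "'pdf_convert' 'pdf_convert_1'" \
                "'video_encode' 'video_encode_1'" \
                "'it'\\''s here' 'with space'"
}

while IFS= read -r row; do
  # row holds shell-quoted words, so eval turns them into separate arguments
  eval "processRow $row"
done < <(fake_jq_sh)
```

Because @sh quotes every value, eval only ever sees quoted words here; eval on unquoted data would not be safe.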
From "Iterate over json array of dates in bash (has whitespace)":
items=$(echo "$JSON_Content" | jq -c -r '.[]')
while IFS= read -r item; do
echo "$item"
# whatever you are trying to do ...
done <<< "$items"
Try building around this example (source: the original site).
Example:
jq '[foreach .[] as $item ([[],[]]; if $item == null then [[],.[0]] else [(.[0] + [$item]),[]] end; if $item == null then .[1] else empty end)]'
Input [1,2,3,4,null,"a","b",null]
Output [[1,2,3,4],["a","b"]]
None of the answers here worked for me out of the box.
What did work was a combination of a few:
projectList=$(echo "$projRes" | jq -c '.projects[]')
IFS=$'\n' # Read till newline
for project in ${projectList[@]}; do
projectId=$(jq '.id' <<< "$project")
projectName=$(jq -r '.name' <<< "$project")
...
done
unset IFS
NOTE: I'm not using the same data as the question. In this example, assume projRes is the output from an API that returns a JSON list of projects, e.g.:
{
"projects": [
{"id":1,"name":"Project"},
... // array of projects
]
}
An earlier answer in this thread suggested using jq's foreach, but that may be much more complicated than needed, especially given the stated task. Specifically, foreach (and reduce) are intended for certain cases where you need to accumulate results.
In many cases (including some cases where eventually a reduction step is necessary), it's better to use .[] or map(_). The latter is just another way of writing [.[] | _] so if you are going to use jq, it's really useful to understand that .[] simply creates a stream of values.
For example, [1,2,3] | .[] produces a stream of the three values.
To take a simple map-reduce example, suppose you want to find the maximum length of an array of strings. One solution would be [ .[] | length] | max.
Here is a simple example that works in zsh:
DOMAINS='["google","amazon"]'
arr=$(echo "$DOMAINS" | jq -c '.[]')
for d in ${(f)arr}; do
printf 'Here is your domain: %s\n' "$d"
done
I stopped using jq and started using jp, since JMESPath is the same language used by the --query argument of my cloud service, and I find it difficult to juggle both languages at once. You can quickly learn the basics of JMESPath expressions here: https://jmespath.org/tutorial.html
Since you didn't specifically ask for a jq answer but instead, an approach to iterating JSON in bash, I think it's an appropriate answer.
Style points:
I use backticks, which have fallen out of fashion. You can substitute the $(...) form of command substitution.
I use cat to pipe the input contents into the command. Yes, you can also specify the filename as a parameter, but I find this distracting because it breaks my left-to-right reading of the sequence of operations. Of course you can update this from my style to yours.
set -u has no function in this solution, but it is useful when you are fiddling with bash to get something to work: it makes expanding an unset variable an error, so a misspelled variable name fails loudly instead of silently expanding to an empty string.
Here's how I do it:
#!/bin/bash
set -u
# exploit the JMESPath length() function to get a count of list elements to iterate
export COUNT=`cat data.json | jp "length( [*] )"`
# The `seq` command produces the sequence `0 1 2` for our indexes
# The $(( )) operator in bash produces an arithmetic result ($COUNT minus one)
for i in `seq 0 $((COUNT - 1))` ; do
# The list elements in JMESPath are zero-indexed
echo "Here is element $i:"
cat data.json | jp "[$i]"
# Add or replace whatever operation you like here.
done
It is also common to pull the original JSON data from an online API rather than from a local file. In that case, I use a slightly modified technique that caches the full result in a variable:
#!/bin/bash
set -u
# cache the JSON content in a shell variable, downloading it only once
export DATA=`api --profile foo compute instance list --query "bar"`
export COUNT=`echo "$DATA" | jp "length( [*] )"`
for i in `seq 0 $((COUNT - 1))` ; do
echo "Here is element $i:"
echo "$DATA" | jp "[$i]"
done
This second example has the added benefit that if the data is changing rapidly, you are guaranteed to have a consistent count between the elements you are iterating through, and the elements in the iterated data.
This is what I have done so far:
arr=$(echo "$array" | jq -c -r '.[]')
for item in ${arr[@]}; do
original_name=$(echo "$item" | jq -r '.original_name')
changed_name=$(echo "$item" | jq -r '.changed_name')
echo "$original_name $changed_name"
done

Reading filenames from a structured file to a bash script

I have a file with a structured list of filenames (file1.sh, file2.sh, ...) and would like to loop over the file names inside a bash script.
cat /home/flora/logs/9681-T13:17:07.091363777.org
%rec: dynamic
Ptrn: Gnu
File: /home/flora/comint.rc
+ /home/flora/engine.rc
+ /home/flora/playa.rc
+ /home/flora/edva.rc
+ /home/flora/dyna.rc
+ /home/flora/lin.rc
I have started with:
while read -r fl; do
echo "$fl" | grep -oE '[/].+'
done < "$logfl"
But I want to be more specific: match the File: line, then continue reading the remaining lines, using + as a continuation character.
bash doesn't impose a limit on variables (other than memory). That said, I would start by processing the lines one by one:
#!/bin/bash
while read -r _ f
do
process "$f"
done
where process is whatever function you need to implement.
If you want the names in a variable, use an array like this:
#!/bin/bash
while read -r _ f
do
files+=("$f")
done
In either case, pass the input file to the script with:
your_script < /home/flora/logs/27043-T13:09:44.893003954.log
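The loops above read every line, including the %rec: and Ptrn: fields. A sketch that matches the File: line and then keeps reading only while lines start with the + continuation character could look like this; read_file_list is a hypothetical name, and it assumes that any line that is neither File: nor a + continuation ends the list.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the paths listed after "File:", continuing
# through lines that start with "+". Any other line ends the list.
read_file_list() {
  local line in_list=0
  while IFS= read -r line || [[ -n $line ]]; do
    if [[ $line =~ ^File:[[:space:]]*(.*)$ ]]; then
      in_list=1
      # The File: line itself carries the first path
      if [[ -n ${BASH_REMATCH[1]} ]]; then
        printf '%s\n' "${BASH_REMATCH[1]}"
      fi
    elif (( in_list )) && [[ $line =~ ^\+[[:space:]]*(.*)$ ]]; then
      printf '%s\n' "${BASH_REMATCH[1]}"
    else
      in_list=0
    fi
  done
}

# Demo on the start of the log shown in the question.
# Real use (assumption): mapfile -t files < <(read_file_list < "$logfl")
sample=$'%rec: dynamic\nPtrn: Gnu\nFile: /home/flora/comint.rc\n+ /home/flora/engine.rc\n+ /home/flora/playa.rc'
mapfile -t files < <(read_file_list <<< "$sample")
printf 'file: %s\n' "${files[@]}"
```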

Formatting need to change for text file in bash scripting

I have the output below from a text file. The file is long, so I've copied only some rows here.
HP83904B74E6
13569.06
7705.509999999999
HP4DC2EECAA8
4175.1
2604.13
And I want to print it like this:
HP83904B74E6 13569.06 7705.509999999999
HP4DC2EECAA8 4175.1 2604.13
I have tried reading the file line by line with a while loop, storing each value in a variable such as variablename$i so that I can print it as variablename0, and after every 3 lines using an if statement to print variablename0 variablename1 variablename2, but it did not work for me. I am just learning bash.
Use pr:
$ pr -a3t tmp.txt
HP83904B74E6 13569.06 7705.509999999999
HP4DC2EECAA8 4175.1 2604.13
while read -r a; do
read -r b;
read -r c;
echo "$a $b $c";
done < file
you get:
HP83904B74E6 13569.06 7705.509999999999
HP4DC2EECAA8 4175.1 2604.13
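The standard paste utility can do the same grouping: each - operand reads the next line of standard input in turn, so three of them join every three lines, and -d ' ' makes the separator a single space. join3 is only an illustrative wrapper name.

```shell
#!/usr/bin/env bash
# Join every three consecutive input lines with single spaces.
join3() { paste -d ' ' - - -; }

printf '%s\n' HP83904B74E6 13569.06 7705.509999999999 \
              HP4DC2EECAA8 4175.1 2604.13 | join3
```

If the line count is not a multiple of three, paste fills the missing fields of the last row with empty strings.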

converting lines to json in bash

I would like to convert a list into a JSON array. I'm looking at jq for this, but the examples are mostly about parsing JSON (not creating it). It would be nice to know that proper escaping will occur. Each element of my list is a single line, so newline will probably be the best delimiter.
I was also trying to convert a bunch of lines into a JSON array, and was at a standstill until I realized that -s was the only way I could handle more than one line at a time in the jq expression, even if that meant I'd have to parse the newlines manually.
jq -R -s -c 'split("\n")' < just_lines.txt
-R to read raw input
-s to read all input as a single string
-c to not pretty print the output
Easy peasy.
Edit: I'm on jq ≥ 1.4, which is apparently when the split built-in was introduced.
--raw-input, then --slurp
Just summarizing what the others have said in a hopefully quicker to understand form:
cat /etc/hosts | jq --raw-input . | jq --slurp .
will return you:
[
"fe00::0 ip6-localnet",
"ff00::0 ip6-mcastprefix",
"ff02::1 ip6-allnodes",
"ff02::2 ip6-allrouters"
]
Explanation
--raw-input/-R:
Don't parse the input as JSON. Instead, each line of text is passed
to the filter as a string. If combined with --slurp, then the
entire input is passed to the filter as a single long string.
--slurp/-s:
Instead of running the filter for each JSON object in the input,
read the entire input stream into a large array and run the filter
just once.
You can also use jq -R . to format each line as a JSON string and then jq -s (--slurp) to create an array for the input lines after parsing them as JSON:
$ printf %s\\n aa bb|jq -R .|jq -s .
[
"aa",
"bb"
]
The method in chbrown's answer adds an empty element to the end if the input ends with a linefeed, but you can use printf %s "$(cat)" to remove trailing linefeeds:
$ printf %s\\n aa bb|jq -R -s 'split("\n")'
[
"aa",
"bb",
""
]
$ printf %s\\n aa bb|printf %s "$(cat)"|jq -R -s 'split("\n")'
[
"aa",
"bb"
]
If the input lines don't contain ASCII control characters (which have to be escaped in strings in valid JSON), you can use sed:
$ printf %s\\n aa bb|sed 's/["\]/\\&/g;s/.*/"&"/;1s/^/[/;$s/$/]/;$!s/$/,/'
["aa",
"bb"]
Update: If your jq has the inputs builtin, you can simply write:
jq -nR '[inputs]' /etc/hosts
to produce a JSON array of strings. This avoids having to read the text file as a whole.
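The idea behind the sed command (escape backslashes and double quotes, wrap each line in quotes, join with commas) can also be written in pure bash when jq is not available. This is a sketch under the same assumption as the sed version: only \ and " are escaped, so lines must not contain control characters. lines_to_json is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Build a JSON array of strings from stdin, one element per line.
# Only backslashes and double quotes are escaped (no control characters allowed).
lines_to_json() {
  local line out='[' sep=''
  local bs='\' dq='"'
  while IFS= read -r line || [[ -n $line ]]; do
    line=${line//"$bs"/"$bs$bs"}   # \  becomes \\ (must run first)
    line=${line//"$dq"/"$bs$dq"}   # "  becomes \"
    out+="$sep$dq$line$dq"
    sep=','
  done
  printf '%s]\n' "$out"
}

printf '%s\n' aa bb | lines_to_json
```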
Through the jq man page and some experimentation, I found what seems to me a simpler answer.
$ cat test_file.txt | jq -Rsc '. / "\n" - [""]'
["aa","bb"]
The -R is to read without trying to parse json, the -s says to read all of the input as one string, and the -c is for one-line output - not necessary, but it's what I was looking for.
Then, in the filter I pass to jq, '.' takes the input as it is, '/ "\n"' divides (splits) the string on newlines, and '- [""]' removes any empty strings from the resulting array (such as the one left by the final newline).
It's one line and without any complicated constructs, using just simple built in jq features.

Save a newline separated list into several bash variables

I'm relatively new to shell scripting and am writing a script to organize my music library. I'm using awk to parse the id3 tag info and am generating a newline separated list like so:
Kanye West
College Dropout
All Falls Down
I want to store each field in a separate variable so I can easily compose some mkdir and mv commands. I've tried piping the output to IFS=$'\n' read artist album title but each variable remains empty. I'm open to producing a different output from awk, but I still want to know how to parse a newline separated list using bash.
Edit:
It turns out that piping directly to read, like this:
id3info "$filename" | awk "$awkscript" | { read artist; read album; read title; }
WILL NOT WORK. Each command in a pipeline runs in a subshell, so the variables are set in a different process and disappear when the pipeline finishes. I found that using a here-string works best:
{ read artist; read album; read title; } <<< "$(id3info "$filename" | awk "$awkscript")"
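A runnable illustration of the difference: by default bash runs each command of a pipeline in a subshell (unless the lastpipe option is set and job control is off), so reads on the right of a pipe happen in a child process. Process substitution keeps the reads in the current shell, just like the here-string. In this sketch produce_tags is a hypothetical stand-in for the id3info | awk pipeline.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for: id3info "$filename" | awk "$awkscript"
produce_tags() {
  printf '%s\n' 'Kanye West' 'College Dropout' 'All Falls Down'
}

# Pipeline: the { ...; } group runs in a subshell, so its reads are lost
# when the pipeline finishes (bash default, lastpipe off).
piped_artist=''
produce_tags | { read -r piped_artist; read -r piped_album; }
echo "pipeline:       artist='$piped_artist'"

# Process substitution: the group runs in the current shell, like the
# here-string version, so the variables are still set afterwards.
{ read -r artist; read -r album; read -r title; } < <(produce_tags)
echo "process subst.: artist='$artist' album='$album' title='$title'"
```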
read normally reads one line at a time. So, if your id3 info is in the file testfile.txt, you can read it in as follows:
{ read artist ; read album ; read song ; } <testfile.txt
echo "artist='$artist' album='$album' song='$song'"
# insert your mkdir and mv commands....
When run on your test file, the above outputs:
artist='Kanye West' album='College Dropout' song='All Falls Down'
You can just read the file into a bash array and loop through the array like so:
IFS=$'\r\n' content=($(cat "${filepath}"))
for ((idx = 0; idx < ${#content[@]}; idx+=3)); do
artist=${content[idx]}
album=${content[idx+1]}
title=${content[idx+2]}
# use $artist, $album and $title here
done
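A variant of the same stride-3 loop that fills the array with mapfile (bash 4+) instead of word splitting, so IFS does not need to change and lines containing glob characters such as * are not expanded. print_tracks is a hypothetical name, and in this sketch any trailing group of fewer than three lines is skipped.

```shell
#!/usr/bin/env bash
# Read all lines into an array, then walk it three entries at a time.
# mapfile -t keeps spaces inside lines and strips only the trailing newline.
print_tracks() {
  local -a content
  local idx
  mapfile -t content < "$1"
  # idx + 2 < length guarantees a complete artist/album/title group
  for ((idx = 0; idx + 2 < ${#content[@]}; idx += 3)); do
    printf 'artist=%s album=%s title=%s\n' \
      "${content[idx]}" "${content[idx+1]}" "${content[idx+2]}"
  done
}

tracks=$(mktemp)
printf '%s\n' 'Kanye West' 'College Dropout' 'All Falls Down' > "$tracks"
print_tracks "$tracks"
rm -f "$tracks"
```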
Or read three lines in a loop.
yourscript |
while read artist; do # read first line of input
read album # read second line of input
read song # read third line of input
: self-destruct if the genre is rap
done
This loop consumes input lines in groups of three. If the number of input lines is not a multiple of three, the reads inside the loop for the last, incomplete group will fail and leave those variables empty.
You can read the output from awk into an array. E.g.
readarray -t array <<< "$(printf '%s\n' 'Kanye West' 'College Dropout' 'All Falls Down')"
for ((i=0; i<${#array[@]}; i++ )) ; do
echo "array[$i]=${array[$i]}"
done
Produces:
array[0]=Kanye West
array[1]=College Dropout
array[2]=All Falls Down
