How can I parse a YAML file from a Linux shell script? - shell

I wish to provide a structured configuration file which is as easy as possible for a non-technical user to edit (unfortunately it has to be a file) and so I wanted to use YAML. I can't find any way of parsing this from a Unix shell script however.

Here is a bash-only parser that leverages sed and awk to parse simple yaml files:
function parse_yaml {
local prefix=$2
local s='[[:space:]]*' w='[a-zA-Z0-9_]*' fs=$(echo #|tr # '\034')
sed -ne "s|^\($s\):|\1|" \
-e "s|^\($s\)\($w\)$s:$s[\"']\(.*\)[\"']$s\$|\1$fs\2$fs\3|p" \
-e "s|^\($s\)\($w\)$s:$s\(.*\)$s\$|\1$fs\2$fs\3|p" $1 |
awk -F$fs '{
indent = length($1)/2;
vname[indent] = $2;
for (i in vname) {if (i > indent) {delete vname[i]}}
if (length($3) > 0) {
vn=""; for (i=0; i<indent; i++) {vn=(vn)(vname[i])("_")}
printf("%s%s%s=\"%s\"\n", "'$prefix'",vn, $2, $3);
}
}'
}
It understands files such as:
## global definitions
global:
debug: yes
verbose: no
debugging:
detailed: no
header: "debugging started"
## output
output:
file: "yes"
Which, when parsed using:
parse_yaml sample.yml
will output:
global_debug="yes"
global_verbose="no"
global_debugging_detailed="no"
global_debugging_header="debugging started"
output_file="yes"
it also understands yaml files, generated by ruby which may include ruby symbols, like:
---
:global:
:debug: 'yes'
:verbose: 'no'
:debugging:
:detailed: 'no'
:header: debugging started
:output: 'yes'
and will output the same as in the previous example.
typical use within a script is:
eval $(parse_yaml sample.yml)
parse_yaml accepts a prefix argument so that imported settings all have a common prefix (which will reduce the risk of namespace collisions).
parse_yaml sample.yml "CONF_"
yields:
CONF_global_debug="yes"
CONF_global_verbose="no"
CONF_global_debugging_detailed="no"
CONF_global_debugging_header="debugging started"
CONF_output_file="yes"
Note that previous settings in a file can be referred to by later settings:
## global definitions
global:
debug: yes
verbose: no
debugging:
detailed: no
header: "debugging started"
## output
output:
debug: $global_debug
Another nice usage is to first parse a defaults file and then the user settings, which works since the latter settings overrides the first ones:
eval $(parse_yaml defaults.yml)
eval $(parse_yaml project.yml)

I've written shyaml in python for YAML query needs from the shell command line.
Overview:
$ pip install shyaml ## installation
Example's YAML file (with complex features):
$ cat <<EOF > test.yaml
name: "MyName !!"
subvalue:
how-much: 1.1
things:
- first
- second
- third
other-things: [a, b, c]
maintainer: "Valentin Lab"
description: |
Multiline description:
Line 1
Line 2
EOF
Basic query:
$ cat test.yaml | shyaml get-value subvalue.maintainer
Valentin Lab
More complex looping query on complex values:
$ cat test.yaml | shyaml values-0 | \
while read -r -d $'\0' value; do
echo "RECEIVED: '$value'"
done
RECEIVED: '1.1'
RECEIVED: '- first
- second
- third'
RECEIVED: '2'
RECEIVED: 'Valentin Lab'
RECEIVED: 'Multiline description:
Line 1
Line 2'
A few key points:
all YAML types and syntax oddities are correctly handled, as multiline, quoted strings, inline sequences...
\0 padded output is available for solid multiline entry manipulation.
simple dotted notation to select sub-values (ie: subvalue.maintainer is a valid key).
access by index is provided to sequences (ie: subvalue.things.-1 is the last element of the subvalue.things sequence.)
access to all sequence/structs elements in one go for use in bash loops.
you can output whole subpart of a YAML file as ... YAML, which blend well for further manipulations with shyaml.
More sample and documentation are available on the shyaml github page or the shyaml PyPI page.

yq is a lightweight and portable command-line YAML processor
The aim of the project is to be the jq or sed of yaml files.
(https://github.com/mikefarah/yq#readme)
As an example (stolen straight from the documentation), given a sample.yaml file of:
---
bob:
item1:
cats: bananas
item2:
cats: apples
then
yq eval '.bob.*.cats' sample.yaml
will output
- bananas
- apples

My use case may or may not be quite the same as what this original post was asking, but it's definitely similar.
I need to pull in some YAML as bash variables. The YAML will never be more than one level deep.
YAML looks like so:
KEY: value
ANOTHER_KEY: another_value
OH_MY_SO_MANY_KEYS: yet_another_value
LAST_KEY: last_value
Output like-a dis:
KEY="value"
ANOTHER_KEY="another_value"
OH_MY_SO_MANY_KEYS="yet_another_value"
LAST_KEY="last_value"
I achieved the output with this line:
sed -e 's/:[^:\/\/]/="/g;s/$/"/g;s/ *=/=/g' file.yaml > file.sh
s/:[^:\/\/]/="/g finds : and replaces it with =", while ignoring :// (for URLs)
s/$/"/g appends " to the end of each line
s/ *=/=/g removes all spaces before =

Given that Python3 and PyYAML are quite easy dependencies to meet nowadays, the following may help:
yaml() {
python3 -c "import yaml;print(yaml.safe_load(open('$1'))$2)"
}
VALUE=$(yaml ~/my_yaml_file.yaml "['a_key']")

It's possible to pass a small script to some interpreters, like Python. An easy way to do so using Ruby and its YAML library is the following:
$ RUBY_SCRIPT="data = YAML::load(STDIN.read); puts data['a']; puts data['b']"
$ echo -e '---\na: 1234\nb: 4321' | ruby -ryaml -e "$RUBY_SCRIPT"
1234
4321
, wheredata is a hash (or array) with the values from yaml.
As a bonus, it'll parse Jekyll's front matter just fine.
ruby -ryaml -e "puts YAML::load(open(ARGV.first).read)['tags']" example.md

here an extended version of the Stefan Farestam's answer:
function parse_yaml {
local prefix=$2
local s='[[:space:]]*' w='[a-zA-Z0-9_]*' fs=$(echo #|tr # '\034')
sed -ne "s|,$s\]$s\$|]|" \
-e ":1;s|^\($s\)\($w\)$s:$s\[$s\(.*\)$s,$s\(.*\)$s\]|\1\2: [\3]\n\1 - \4|;t1" \
-e "s|^\($s\)\($w\)$s:$s\[$s\(.*\)$s\]|\1\2:\n\1 - \3|;p" $1 | \
sed -ne "s|,$s}$s\$|}|" \
-e ":1;s|^\($s\)-$s{$s\(.*\)$s,$s\($w\)$s:$s\(.*\)$s}|\1- {\2}\n\1 \3: \4|;t1" \
-e "s|^\($s\)-$s{$s\(.*\)$s}|\1-\n\1 \2|;p" | \
sed -ne "s|^\($s\):|\1|" \
-e "s|^\($s\)-$s[\"']\(.*\)[\"']$s\$|\1$fs$fs\2|p" \
-e "s|^\($s\)-$s\(.*\)$s\$|\1$fs$fs\2|p" \
-e "s|^\($s\)\($w\)$s:$s[\"']\(.*\)[\"']$s\$|\1$fs\2$fs\3|p" \
-e "s|^\($s\)\($w\)$s:$s\(.*\)$s\$|\1$fs\2$fs\3|p" | \
awk -F$fs '{
indent = length($1)/2;
vname[indent] = $2;
for (i in vname) {if (i > indent) {delete vname[i]; idx[i]=0}}
if(length($2)== 0){ vname[indent]= ++idx[indent] };
if (length($3) > 0) {
vn=""; for (i=0; i<indent; i++) { vn=(vn)(vname[i])("_")}
printf("%s%s%s=\"%s\"\n", "'$prefix'",vn, vname[indent], $3);
}
}'
}
This version supports the - notation and the short notation for dictionaries and lists. The following input:
global:
input:
- "main.c"
- "main.h"
flags: [ "-O3", "-fpic" ]
sample_input:
- { property1: value, property2: "value2" }
- { property1: "value3", property2: 'value 4' }
produces this output:
global_input_1="main.c"
global_input_2="main.h"
global_flags_1="-O3"
global_flags_2="-fpic"
global_sample_input_1_property1="value"
global_sample_input_1_property2="value2"
global_sample_input_2_property1="value3"
global_sample_input_2_property2="value 4"
as you can see the - items automatically get numbered in order to obtain different variable names for each item. In bash there are no multidimensional arrays, so this is one way to work around. Multiple levels are supported.
To work around the problem with trailing white spaces mentioned by #briceburg one should enclose the values in single or double quotes. However, there are still some limitations: Expansion of the dictionaries and lists can produce wrong results when values contain commas. Also, more complex structures like values spanning multiple lines (like ssh-keys) are not (yet) supported.
A few words about the code: The first sed command expands the short form of dictionaries { key: value, ...} to regular and converts them to more simple yaml style. The second sed call does the same for the short notation of lists and converts [ entry, ... ] to an itemized list with the - notation. The third sed call is the original one that handled normal dictionaries, now with the addition to handle lists with - and indentations. The awk part introduces an index for each indentation level and increases it when the variable name is empty (i.e. when processing a list). The current value of the counters are used instead of the empty vname. When going up one level, the counters are zeroed.
Edit: I have created a github repository for this.

Moving my answer from How to convert a json response into yaml in bash, since this seems to be the authoritative post on dealing with YAML text parsing from command line.
I would like to add details about the yq YAML implementation. Since there are two implementations of this YAML parser lying around, both having the name yq, it is hard to differentiate which one is in use, without looking at the implementations' DSL. There two available implementations are
kislyuk/yq - The more often talked about version, which is a wrapper over jq, written in Python using the PyYAML library for YAML parsing
mikefarah/yq - A Go implementation, with its own dynamic DSL using the go-yaml v3 parser.
Both are available for installation via standard installation package managers on almost all major distributions
kislyuk/yq - Installation instructions
mikefarah/yq - Installation instructions
Both the versions have some pros and cons over the other, but a few valid points to highlight (adopted from their repo instructions)
kislyuk/yq
Since the DSL is the adopted completely from jq, for users familiar with the latter, the parsing and manipulation becomes quite straightforward
Supports mode to preserve YAML tags and styles, but loses comments during the conversion. Since jq doesn't preserve comments, during the round-trip conversion, the comments are lost.
As part of the package, XML support is built in. An executable, xq, which transcodes XML to JSON using xmltodict and pipes it to jq, on which you can apply the same DSL to perform CRUD operations on the objects and round-trip the output back to XML.
Supports in-place edit mode with -i flag (similar to sed -i)
mikefarah/yq
Prone to frequent changes in DSL, migration from 2.x - 3.x
Rich support for anchors, styles and tags. But lookout for bugs once in a while
A relatively simple Path expression syntax to navigate and match yaml nodes
Supports YAML->JSON, JSON->YAML formatting and pretty printing YAML (with comments)
Supports in-place edit mode with -i flag (similar to sed -i)
Supports coloring the output YAML with -C flag (not applicable for JSON output) and indentation of the sub elements (default at 2 spaces)
Supports Shell completion for most shells - Bash, zsh (because of powerful support from spf13/cobra used to generate CLI flags)
My take on the following YAML (referenced in other answer as well) with both the versions
root_key1: this is value one
root_key2: "this is value two"
drink:
state: liquid
coffee:
best_served: hot
colour: brown
orange_juice:
best_served: cold
colour: orange
food:
state: solid
apple_pie:
best_served: warm
root_key_3: this is value three
Various actions to be performed with both the implementations (some frequently used operations)
Modifying node value at root level - Change value of root_key2
Modifying array contents, adding value - Add property to coffee
Modifying array contents, deleting value - Delete property from orange_juice
Printing key/value pairs with paths - For all items under food
Using kislyuk/yq
yq -y '.root_key2 |= "this is a new value"' yaml
yq -y '.drink.coffee += { time: "always"}' yaml
yq -y 'del(.drink.orange_juice.colour)' yaml
yq -r '.food|paths(scalars) as $p | [($p|join(".")), (getpath($p)|tojson)] | #tsv' yaml
Which is pretty straightforward. All you need is to transcode jq JSON output back into YAML with the -y flag.
Using mikefarah/yq
yq w yaml root_key2 "this is a new value"
yq w yaml drink.coffee.time "always"
yq d yaml drink.orange_juice.colour
yq r yaml --printMode pv "food.**"
As of today Dec 21st 2020, yq v4 is in beta and supports much powerful path expressions and supports DSL similar to using jq. Read the transition notes - Upgrading from V3

I just wrote a parser that I called Yay! (Yaml ain't Yamlesque!) which parses Yamlesque, a small subset of YAML. So, if you're looking for a 100% compliant YAML parser for Bash then this isn't it. However, to quote the OP, if you want a structured configuration file which is as easy as possible for a non-technical user to edit that is YAML-like, this may be of interest.
It's inspred by the earlier answer but writes associative arrays (yes, it requires Bash 4.x) instead of basic variables. It does so in a way that allows the data to be parsed without prior knowledge of the keys so that data-driven code can be written.
As well as the key/value array elements, each array has a keys array containing a list of key names, a children array containing names of child arrays and a parent key that refers to its parent.
This is an example of Yamlesque:
root_key1: this is value one
root_key2: "this is value two"
drink:
state: liquid
coffee:
best_served: hot
colour: brown
orange_juice:
best_served: cold
colour: orange
food:
state: solid
apple_pie:
best_served: warm
root_key_3: this is value three
Here is an example showing how to use it:
#!/bin/bash
# An example showing how to use Yay
. /usr/lib/yay
# helper to get array value at key
value() { eval echo \${$1[$2]}; }
# print a data collection
print_collection() {
for k in $(value $1 keys)
do
echo "$2$k = $(value $1 $k)"
done
for c in $(value $1 children)
do
echo -e "$2$c\n$2{"
print_collection $c " $2"
echo "$2}"
done
}
yay example
print_collection example
which outputs:
root_key1 = this is value one
root_key2 = this is value two
root_key_3 = this is value three
example_drink
{
state = liquid
example_coffee
{
best_served = hot
colour = brown
}
example_orange_juice
{
best_served = cold
colour = orange
}
}
example_food
{
state = solid
example_apple_pie
{
best_served = warm
}
}
And here is the parser:
yay_parse() {
# find input file
for f in "$1" "$1.yay" "$1.yml"
do
[[ -f "$f" ]] && input="$f" && break
done
[[ -z "$input" ]] && exit 1
# use given dataset prefix or imply from file name
[[ -n "$2" ]] && local prefix="$2" || {
local prefix=$(basename "$input"); prefix=${prefix%.*}
}
echo "declare -g -A $prefix;"
local s='[[:space:]]*' w='[a-zA-Z0-9_]*' fs=$(echo #|tr # '\034')
sed -n -e "s|^\($s\)\($w\)$s:$s\"\(.*\)\"$s\$|\1$fs\2$fs\3|p" \
-e "s|^\($s\)\($w\)$s:$s\(.*\)$s\$|\1$fs\2$fs\3|p" "$input" |
awk -F$fs '{
indent = length($1)/2;
key = $2;
value = $3;
# No prefix or parent for the top level (indent zero)
root_prefix = "'$prefix'_";
if (indent ==0 ) {
prefix = ""; parent_key = "'$prefix'";
} else {
prefix = root_prefix; parent_key = keys[indent-1];
}
keys[indent] = key;
# remove keys left behind if prior row was indented more than this row
for (i in keys) {if (i > indent) {delete keys[i]}}
if (length(value) > 0) {
# value
printf("%s%s[%s]=\"%s\";\n", prefix, parent_key , key, value);
printf("%s%s[keys]+=\" %s\";\n", prefix, parent_key , key);
} else {
# collection
printf("%s%s[children]+=\" %s%s\";\n", prefix, parent_key , root_prefix, key);
printf("declare -g -A %s%s;\n", root_prefix, key);
printf("%s%s[parent]=\"%s%s\";\n", root_prefix, key, prefix, parent_key);
}
}'
}
# helper to load yay data file
yay() { eval $(yay_parse "$#"); }
There is some documentation in the linked source file and below is a short explanation of what the code does.
The yay_parse function first locates the input file or exits with an exit status of 1. Next, it determines the dataset prefix, either explicitly specified or derived from the file name.
It writes valid bash commands to its standard output that, if executed, define arrays representing the contents of the input data file. The first of these defines the top-level array:
echo "declare -g -A $prefix;"
Note that array declarations are associative (-A) which is a feature of Bash version 4. Declarations are also global (-g) so they can be executed in a function but be available to the global scope like the yay helper:
yay() { eval $(yay_parse "$#"); }
The input data is initially processed with sed. It drops lines that don't match the Yamlesque format specification before delimiting the valid Yamlesque fields with an ASCII File Separator character and removing any double-quotes surrounding the value field.
local s='[[:space:]]*' w='[a-zA-Z0-9_]*' fs=$(echo #|tr # '\034')
sed -n -e "s|^\($s\)\($w\)$s:$s\"\(.*\)\"$s\$|\1$fs\2$fs\3|p" \
-e "s|^\($s\)\($w\)$s:$s\(.*\)$s\$|\1$fs\2$fs\3|p" "$input" |
The two expressions are similar; they differ only because the first one picks out quoted values where as the second one picks out unquoted ones.
The File Separator (28/hex 12/octal 034) is used because, as a non-printable character, it is unlikely to be in the input data.
The result is piped into awk which processes its input one line at a time. It uses the FS character to assign each field to a variable:
indent = length($1)/2;
key = $2;
value = $3;
All lines have an indent (possibly zero) and a key but they don't all have a value. It computes an indent level for the line dividing the length of the first field, which contains the leading whitespace, by two. The top level items without any indent are at indent level zero.
Next, it works out what prefix to use for the current item. This is what gets added to a key name to make an array name. There's a root_prefix for the top-level array which is defined as the data set name and an underscore:
root_prefix = "'$prefix'_";
if (indent ==0 ) {
prefix = ""; parent_key = "'$prefix'";
} else {
prefix = root_prefix; parent_key = keys[indent-1];
}
The parent_key is the key at the indent level above the current line's indent level and represents the collection that the current line is part of. The collection's key/value pairs will be stored in an array with its name defined as the concatenation of the prefix and parent_key.
For the top level (indent level zero) the data set prefix is used as the parent key so it has no prefix (it's set to ""). All other arrays are prefixed with the root prefix.
Next, the current key is inserted into an (awk-internal) array containing the keys. This array persists throughout the whole awk session and therefore contains keys inserted by prior lines. The key is inserted into the array using its indent as the array index.
keys[indent] = key;
Because this array contains keys from previous lines, any keys with an indent level grater than the current line's indent level are removed:
for (i in keys) {if (i > indent) {delete keys[i]}}
This leaves the keys array containing the key-chain from the root at indent level 0 to the current line. It removes stale keys that remain when the prior line was indented deeper than the current line.
The final section outputs the bash commands: an input line without a value starts a new indent level (a collection in YAML parlance) and an input line with a value adds a key to the current collection.
The collection's name is the concatenation of the current line's prefix and parent_key.
When a key has a value, a key with that value is assigned to the current collection like this:
printf("%s%s[%s]=\"%s\";\n", prefix, parent_key , key, value);
printf("%s%s[keys]+=\" %s\";\n", prefix, parent_key , key);
The first statement outputs the command to assign the value to an associative array element named after the key and the second one outputs the command to add the key to the collection's space-delimited keys list:
<current_collection>[<key>]="<value>";
<current_collection>[keys]+=" <key>";
When a key doesn't have a value, a new collection is started like this:
printf("%s%s[children]+=\" %s%s\";\n", prefix, parent_key , root_prefix, key);
printf("declare -g -A %s%s;\n", root_prefix, key);
The first statement outputs the command to add the new collection to the current's collection's space-delimited children list and the second one outputs the command to declare a new associative array for the new collection:
<current_collection>[children]+=" <new_collection>"
declare -g -A <new_collection>;
All of the output from yay_parse can be parsed as bash commands by the bash eval or source built-in commands.

Hard to say because it depends on what you want the parser to extract from your YAML document. For simple cases, you might be able to use grep, cut, awk etc. For more complex parsing you would need to use a full-blown parsing library such as Python's PyYAML or YAML::Perl.

Another option is to convert the YAML to JSON, then use jq to interact with the JSON representation either to extract information from it or edit it.
I wrote a simple bash script that contains this glue - see Y2J project on GitHub

perl -ne 'chomp; printf qq/%s="%s"\n/, split(/\s*:\s*/,$_,2)' file.yml > file.sh

I used to convert yaml to json using python and do my processing in jq.
python -c "import yaml; import json; from pathlib import Path; print(json.dumps(yaml.safe_load(Path('file.yml').read_text())))" | jq '.'

If you need a single value you could a tool which converts your YAML document to JSON and feed to jq, for example yq.
Content of sample.yaml:
---
bob:
item1:
cats: bananas
item2:
cats: apples
thing:
cats: oranges
Example:
$ yq -r '.bob["thing"]["cats"]' sample.yaml
oranges

I know this is very specific, but I think my answer could be helpful for certain users.
If you have node and npm installed on your machine, you can use js-yaml.
First install :
npm i -g js-yaml
# or locally
npm i js-yaml
then in your bash script
#!/bin/bash
js-yaml your-yaml-file.yml
Also if you are using jq you can do something like that
#!/bin/bash
json="$(js-yaml your-yaml-file.yml)"
aproperty="$(jq '.apropery' <<< "$json")"
echo "$aproperty"
Because js-yaml converts a yaml file to a json string literal. You can then use the string with any json parser in your unix system.

A quick way to do the thing now (previous ones haven't worked for me):
sudo wget https://github.com/mikefarah/yq/releases/download/v4.4.1/yq_linux_amd64 -O /usr/bin/yq &&\
sudo chmod +x /usr/bin/yq
Example asd.yaml:
a_list:
- key1: value1
key2: value2
key3: value3
parsing root:
user#vm:~$ yq e '.' asd.yaml
a_list:
- key1: value1
key2: value2
key3: value3
parsing key3:
user#vm:~$ yq e '.a_list[0].key3' asd.yaml
value3

If you have python 2 and PyYAML, you can use this parser I wrote called parse_yaml.py. Some of the neater things it does is let you choose a prefix (in case you have more than one file with similar variables) and to pick a single value from a yaml file.
For example if you have these yaml files:
staging.yaml:
db:
type: sqllite
host: 127.0.0.1
user: dev
password: password123
prod.yaml:
db:
type: postgres
host: 10.0.50.100
user: postgres
password: password123
You can load both without conflict.
$ eval $(python parse_yaml.py prod.yaml --prefix prod --cap)
$ eval $(python parse_yaml.py staging.yaml --prefix stg --cap)
$ echo $PROD_DB_HOST
10.0.50.100
$ echo $STG_DB_HOST
127.0.0.1
And even cherry pick the values you want.
$ prod_user=$(python parse_yaml.py prod.yaml --get db_user)
$ prod_port=$(python parse_yaml.py prod.yaml --get db_port --default 5432)
$ echo prod_user
postgres
$ echo prod_port
5432

You could use an equivalent of yq that is written in golang:
./go-yg -yamlFile /home/user/dev/ansible-firefox/defaults/main.yml -key
firefox_version
returns:
62.0.3

Whenever you need a solution for "How to work with YAML/JSON/compatible data from a shell script" which works on just about every OS with Python (*nix, OSX, Windows), consider yamlpath, which provides several command-line tools for reading, writing, searching, and merging YAML, EYAML, JSON, and compatible files. Since just about every OS either comes with Python pre-installed or it is trivial to install, this makes yamlpath highly portable. Even more interesting: this project defines an intuitive path language with very powerful, command-line-friendly syntax that enables accessing one or more nodes.
To your specific question and after installing yamlpath using Python's native package manager or your OS's package manager (yamlpath is available via RPM to some OSes):
#!/bin/bash
# Read values directly from YAML (or EYAML, JSON, etc) for use in this shell script:
myShellVar=$(yaml-get --query=any.path.no[matter%how].complex source-file.yaml)
# Use the value any way you need:
echo "Retrieved ${myShellVar}"
# Perhaps change the value and write it back:
myShellVar="New Value"
yaml-set --change=/any/path/no[matter%how]/complex --value="$myShellVar" source-file.yaml
You didn't specify that the data was a simple Scalar value though, so let's up the ante. What if the result you want is an Array? Even more challenging, what if it's an Array-of-Hashes and you only want one property of each result? Suppose further that your data is actually spread out across multiple YAML files and you need all the results in a single query. That's a much more interesting question to demonstrate with. So, suppose you have these two YAML files:
File: data1.yaml
---
baubles:
- name: Doohickey
sku: 0-000-1
price: 4.75
weight: 2.7g
- name: Doodad
sku: 0-000-2
price: 10.5
weight: 5g
- name: Oddball
sku: 0-000-3
price: 25.99
weight: 25kg
File: data2.yaml
---
baubles:
- name: Fob
sku: 0-000-4
price: 0.99
weight: 18mg
- name: Doohickey
price: 10.5
- name: Oddball
sku: 0-000-3
description: This ball is odd
How would you report only the sku of every item in inventory after applying the changes from data2.yaml to data1.yaml, all from a shell script? Try this:
#!/bin/bash
baubleSKUs=($(yaml-merge --aoh=deep data1.yaml data2.yaml | yaml-get --query=/baubles/sku -))
for sku in "${baubleSKUs[#]}"; do
echo "Found bauble SKU: ${sku}"
done
You get exactly what you need from only a few lines of code:
Found bauble SKU: 0-000-1
Found bauble SKU: 0-000-2
Found bauble SKU: 0-000-3
Found bauble SKU: 0-000-4
As you can see, yamlpath turns very complex problems into trivial solutions. Note that the entire query was handled as a stream; no YAML files were changed by the query and there were no temp files.
I realize this is "yet another tool to solve the same question" but after reading the other answers here, yamlpath appears more portable and robust than most alternatives. It also fully understands YAML/JSON/compatible files and it does not need to convert YAML to JSON to perform requested operations. As such, comments within the original YAML file are preserved whenever you need to change data in the source YAML file. Like some alternatives, yamlpath is also portable across OSes. More importantly, yamlpath defines a query language that is extremely powerful, enabling very specialized/filtered data queries. It can even operate against results from disparate parts of the file in a single query.
If you want to get or set many values in the data at once -- including complex data like hashes/arrays/maps/lists -- yamlpath can do that. Want a value but don't know precisely where it is in the document? yamlpath can find it and give you the exact path(s). Need to merge multiple data file together, including from STDIN? yamlpath does that, too. Further, yamlpath fully comprehends YAML anchors and their aliases, always giving or changing exactly the data you expect whether it is a concrete or referenced value.
Disclaimer: I wrote and maintain yamlpath, which is based on ruamel.yaml, which is in turn based on PyYAML. As such, yamlpath is fully standards-compliant.

Complex parsing is easiest with a library such as Python's PyYAML or YAML::Perl.
If you want to parse all the YAML values into bash values, try this script. This will handle comments as well. See example usage below:
# pparse.py
import yaml
import sys
def parse_yaml(yml, name=''):
if isinstance(yml, list):
for data in yml:
parse_yaml(data, name)
elif isinstance(yml, dict):
if (len(yml) == 1) and not isinstance(yml[list(yml.keys())[0]], list):
print(str(name+'_'+list(yml.keys())[0]+'='+str(yml[list(yml.keys())[0]]))[1:])
else:
for key in yml:
parse_yaml(yml[key], name+'_'+key)
if __name__=="__main__":
yml = yaml.safe_load(open(sys.argv[1]))
parse_yaml(yml)
test.yml
- folders:
- temp_folder: datasets/outputs/tmp
- keep_temp_folder: false
- MFA:
- MFA: false
- speaker_count: 1
- G2P:
- G2P: true
- G2P_model: models/MFA/G2P/english_g2p.zip
- input_folder: datasets/outputs/Youtube/ljspeech/wavs
- output_dictionary: datasets/outputs/Youtube/ljspeech/dictionary.dict
- dictionary: datasets/outputs/Youtube/ljspeech/dictionary.dict
- acoustic_model: models/MFA/acoustic/english.zip
- temp_folder: datasets/outputs/tmp
- jobs: 4
- align:
- config: configs/MFA/align.yaml
- dataset: datasets/outputs/Youtube/ljspeech/wavs
- output_folder: datasets/outputs/Youtube/ljspeech-aligned
- TTS:
- output_folder: datasets/outputs/Youtube
- preprocess:
- preprocess: true
- config: configs/TTS_preprocess.yaml # Default Config
- textgrid_folder: datasets/outputs/Youtube/ljspeech-aligned
- output_duration_folder: datasets/outputs/Youtube/durations
- sampling_rate: 44000 # Make sure sampling rate is same here as in preprocess config
Script where YAML values are needed:
yaml() {
eval $(python pparse.py "$1")
}
yaml "test.yml"
# What python printed to bash:
folders_temp_folder=datasets/outputs/tmp
folders_keep_temp_folder=False
MFA_MFA=False
MFA_speaker_count=1
MFA_G2P_G2P=True
MFA_G2P_G2P_model=models/MFA/G2P/english_g2p.zip
MFA_G2P_input_folder=datasets/outputs/Youtube/ljspeech/wavs
MFA_G2P_output_dictionary=datasets/outputs/Youtube/ljspeech/dictionary.dict
MFA_dictionary=datasets/outputs/Youtube/ljspeech/dictionary.dict
MFA_acoustic_model=models/MFA/acoustic/english.zip
MFA_temp_folder=datasets/outputs/tmp
MFA_jobs=4
MFA_align_config=configs/MFA/align.yaml
MFA_align_dataset=datasets/outputs/Youtube/ljspeech/wavs
MFA_align_output_folder=datasets/outputs/Youtube/ljspeech-aligned
TTS_output_folder=datasets/outputs/Youtube
TTS_preprocess_preprocess=True
TTS_preprocess_config=configs/TTS_preprocess.yaml
TTS_preprocess_textgrid_folder=datasets/outputs/Youtube/ljspeech-aligned
TTS_preprocess_output_duration_folder=datasets/outputs/Youtube/durations
TTS_preprocess_sampling_rate=44000
Access variables with bash:
echo "$TTS_preprocess_sampling_rate";
>>> 44000

If you know what tags you are interested in and the yaml structure you expect then it is not that hard to write a simple YAML parser in Bash.
In the following example the parser reads a structured YAML file into environment variables, an array and an associative array.
Note: The complexity of this parser is tied to the structure of the YAML file. You will need a separate subroutine for each structured component of the YAML file. Highly structured YAML files might require a more sophisticated approach, eg a generic recursive descent parser.
The xmas.yaml file:
# Xmas YAML example
---
# Values
pear-tree: partridge
turtle-doves: 2.718
french-hens: 3
# Array
calling-birds:
- huey
- dewey
- louie
- fred
# Structure
xmas-fifth-day:
calling-birds: four
french-hens: 3
golden-rings: 5
partridges:
count: 1
location: "a pear tree"
turtle-doves: two
The parser uses mapfile to read the file into memory as an array then cycles through each tag and creates environment variables.
pear-tree:, turtle-doves: and french-hens: end up as simple environment variables
calling-birds: becomes an array
The xmas-fifth-day: structure is represented as an associative array however you could encode these as environment variables if you are not using Bash 4.0 or later.
Comments and white space are ignored.
#!/bin/bash
# -------------------------------------------------------------------
# A simple parser for the xmas.yaml file
# -------------------------------------------------------------------
#
# xmas.yaml tags
# # - Ignored
# - Blank lines are ignored
# --- - Initialiser for days-of-xmas
# pear-tree: partridge - a string
# turtle-doves: 2.718 - a string, no float type in Bash
# french-hens: 3 - a number
# calling-birds: - an array of strings
# - huey - calling-birds[0]
# - dewey
# - louie
# - fred
# xmas-fifth-day: - an associative array
# calling-birds: four - a string
# french-hens: 3 - a number
# golden-rings: 5 - a number
# partridges: - changes the key to partridges.xxx
# count: 1 - a number
# location: "a pear tree" - a string
# turtle-doves: two - a string
#
# This requires the following routines
# ParseXMAS
# parses #, ---, blank line
# unexpected tag error
# calls days-of-xmas
#
# days-of-xmas
# parses pear-tree, turtle-doves, french-hens
# calls calling-birds
# calls xmas-fifth-day
#
# calling-birds
# elements of the array
#
# xmas-fifth-day
# parses calling-birds, french-hens, golden-rings, turtle-doves
# calls partridges
#
# partridges
# parses partridges.count, partridges.location
#
function ParseXMAS()
{
# days-of-xmas
# parses pear-tree, turtle-doves, french-hens
# calls calling-birds
# calls xmas-fifth-day
#
function days-of-xmas()
{
unset PearTree TurtleDoves FrenchHens
while [ $CURRENT_ROW -lt $ROWS ]
do
LINE=( ${CONFIG[${CURRENT_ROW}]} )
TAG=${LINE[0]}
unset LINE[0]
VALUE="${LINE[*]}"
echo " days-of-xmas[${CURRENT_ROW}] ${TAG}=${VALUE}"
if [ "$TAG" = "pear-tree:" ]
then
declare -g PearTree=$VALUE
elif [ "$TAG" = "turtle-doves:" ]
then
declare -g TurtleDoves=$VALUE
elif [ "$TAG" = "french-hens:" ]
then
declare -g FrenchHens=$VALUE
elif [ "$TAG" = "calling-birds:" ]
then
let CURRENT_ROW=$(($CURRENT_ROW + 1))
calling-birds
continue
elif [ "$TAG" = "xmas-fifth-day:" ]
then
let CURRENT_ROW=$(($CURRENT_ROW + 1))
xmas-fifth-day
continue
elif [ -z "$TAG" ] || [ "$TAG" = "#" ]
then
# Ignore comments and blank lines
true
else
# time to bug out
break
fi
let CURRENT_ROW=$(($CURRENT_ROW + 1))
done
}
# calling-birds
# elements of the array
function calling-birds()
{
unset CallingBirds
declare -ag CallingBirds
while [ $CURRENT_ROW -lt $ROWS ]
do
LINE=( ${CONFIG[${CURRENT_ROW}]} )
TAG=${LINE[0]}
unset LINE[0]
VALUE="${LINE[*]}"
echo " calling-birds[${CURRENT_ROW}] ${TAG}=${VALUE}"
if [ "$TAG" = "-" ]
then
CallingBirds[${#CallingBirds[*]}]=$VALUE
elif [ -z "$TAG" ] || [ "$TAG" = "#" ]
then
# Ignore comments and blank lines
true
else
# time to bug out
break
fi
let CURRENT_ROW=$(($CURRENT_ROW + 1))
done
}
# xmas-fifth-day
# parses calling-birds, french-hens, golden-rings, turtle-doves
# calls fifth-day-partridges
#
function xmas-fifth-day()
{
unset XmasFifthDay
declare -Ag XmasFifthDay
while [ $CURRENT_ROW -lt $ROWS ]
do
LINE=( ${CONFIG[${CURRENT_ROW}]} )
TAG=${LINE[0]}
unset LINE[0]
VALUE="${LINE[*]}"
echo " xmas-fifth-day[${CURRENT_ROW}] ${TAG}=${VALUE}"
if [ "$TAG" = "calling-birds:" ]
then
XmasFifthDay[CallingBirds]=$VALUE
elif [ "$TAG" = "french-hens:" ]
then
XmasFifthDay[FrenchHens]=$VALUE
elif [ "$TAG" = "golden-rings:" ]
then
XmasFifthDay[GOLDEN-RINGS]=$VALUE
elif [ "$TAG" = "turtle-doves:" ]
then
XmasFifthDay[TurtleDoves]=$VALUE
elif [ "$TAG" = "partridges:" ]
then
let CURRENT_ROW=$(($CURRENT_ROW + 1))
partridges
continue
elif [ -z "$TAG" ] || [ "$TAG" = "#" ]
then
# Ignore comments and blank lines
true
else
# time to bug out
break
fi
let CURRENT_ROW=$(($CURRENT_ROW + 1))
done
}
function partridges()
{
while [ $CURRENT_ROW -lt $ROWS ]
do
LINE=( ${CONFIG[${CURRENT_ROW}]} )
TAG=${LINE[0]}
unset LINE[0]
VALUE="${LINE[*]}"
echo " partridges[${CURRENT_ROW}] ${TAG}=${VALUE}"
if [ "$TAG" = "count:" ]
then
XmasFifthDay[PARTRIDGES.COUNT]=$VALUE
elif [ "$TAG" = "location:" ]
then
XmasFifthDay[PARTRIDGES.LOCATION]=$VALUE
elif [ -z "$TAG" ] || [ "$TAG" = "#" ]
then
# Ignore comments and blank lines
true
else
# time to bug out
break
fi
let CURRENT_ROW=$(($CURRENT_ROW + 1))
done
}
# ===================================================================
# Load the configuration file
mapfile CONFIG < xmas.yaml
let ROWS=${#CONFIG[#]}
let CURRENT_ROW=0
# +
# #
#
# ---
# -
while [ $CURRENT_ROW -lt $ROWS ]
do
LINE=( ${CONFIG[${CURRENT_ROW}]} )
TAG=${LINE[0]}
unset LINE[0]
VALUE="${LINE[*]}"
echo "[${CURRENT_ROW}] ${TAG}=${VALUE}"
if [ "$TAG" = "---" ]
then
let CURRENT_ROW=$(($CURRENT_ROW + 1))
days-of-xmas
continue
elif [ -z "$TAG" ] || [ "$TAG" = "#" ]
then
# Ignore comments and blank lines
true
else
echo "Unexpected tag at line $(($CURRENT_ROW + 1)): <${TAG}>={${VALUE}}"
break
fi
let CURRENT_ROW=$(($CURRENT_ROW + 1))
done
}
echo =========================================
ParseXMAS
echo =========================================
declare -p PearTree
declare -p TurtleDoves
declare -p FrenchHens
declare -p CallingBirds
declare -p XmasFifthDay
This produces the following output
=========================================
[0] #=Xmas YAML example
[1] ---=
days-of-xmas[2] #=Values
days-of-xmas[3] pear-tree:=partridge
days-of-xmas[4] turtle-doves:=2.718
days-of-xmas[5] french-hens:=3
days-of-xmas[6] =
days-of-xmas[7] #=Array
days-of-xmas[8] calling-birds:=
calling-birds[9] -=huey
calling-birds[10] -=dewey
calling-birds[11] -=louie
calling-birds[12] -=fred
calling-birds[13] =
calling-birds[14] #=Structure
calling-birds[15] xmas-fifth-day:=
days-of-xmas[15] xmas-fifth-day:=
xmas-fifth-day[16] calling-birds:=four
xmas-fifth-day[17] french-hens:=3
xmas-fifth-day[18] golden-rings:=5
xmas-fifth-day[19] partridges:=
partridges[20] count:=1
partridges[21] location:="a pear tree"
partridges[22] turtle-doves:=two
xmas-fifth-day[22] turtle-doves:=two
=========================================
declare -- PearTree="partridge"
declare -- TurtleDoves="2.718"
declare -- FrenchHens="3"
declare -a CallingBirds=([0]="huey" [1]="dewey" [2]="louie" [3]="fred")
declare -A XmasFifthDay=([CallingBirds]="four" [PARTRIDGES.LOCATION]="\"a pear tree\"" [FrenchHens]="3" [GOLDEN-RINGS]="5" [PARTRIDGES.COUNT]="1" [TurtleDoves]="two" )

You can also consider using Grunt (The JavaScript Task Runner). Can be easily integrated with shell. It supports reading YAML (grunt.file.readYAML) and JSON (grunt.file.readJSON) files.
This can be achieved by creating a task in Gruntfile.js (or Gruntfile.coffee), e.g.:
module.exports = function (grunt) {
grunt.registerTask('foo', ['load_yml']);
grunt.registerTask('load_yml', function () {
var data = grunt.file.readYAML('foo.yml');
Object.keys(data).forEach(function (g) {
// ... switch (g) { case 'my_key':
});
});
};
then from shell just simply run grunt foo (check grunt --help for available tasks).
Further more you can implement exec:foo tasks (grunt-exec) with input variables passed from your task (foo: { cmd: 'echo bar <%= foo %>' }) in order to print the output in whatever format you want, then pipe it into another command.
There is also similar tool to Grunt, it's called gulp with additional plugin gulp-yaml.
Install via: npm install --save-dev gulp-yaml
Sample usage:
var yaml = require('gulp-yaml');
gulp.src('./src/*.yml')
.pipe(yaml())
.pipe(gulp.dest('./dist/'))
gulp.src('./src/*.yml')
.pipe(yaml({ space: 2 }))
.pipe(gulp.dest('./dist/'))
gulp.src('./src/*.yml')
.pipe(yaml({ safe: true }))
.pipe(gulp.dest('./dist/'))
To more options to deal with YAML format, check YAML site for available projects, libraries and other resources which can help you to parse that format.
Other tools:
Jshon
parses, reads and creates JSON

I know my answer is specific, but if one already has PHP and Symfony installed, it can be very handy to use Symfony's YAML parser.
For instance:
php -r "require '$SYMFONY_ROOT_PATH/vendor/autoload.php'; \
var_dump(\Symfony\Component\Yaml\Yaml::parse(file_get_contents('$YAML_FILE_PATH')));"
Here I simply used var_dump to output the parsed array but of course you can do much more... :)

Related

Is there a way to unpack a config file to cli flags in general?

Basically what foo(**bar) does in python, here I’d want something like
foo **bar.yaml
and that would become
foo --bar1=1 --bar2=2
Where bar.yaml would be
bar1: 1
bar2: 2
You could use a combination of sed and xargs:
sed -E 's/^(.+):[[:space:]]+(.+)$/--\1=\2/' bar.yaml | xargs -d '\n' foo
sed converts the format of bar.yaml lines (e.g. bar1: 1 -> --bar1=1) and xargs feeds the converted lines as arguments to foo.
You could of course modify/extend the sed part to support other formats or single-dash options like -v.
To test if this does what you want, you can run this Bash script instead of foo:
#!/usr/bin/env bash
echo "Arguments: $#"
for ((i=1; i <= $#; i++)); do
echo "Argument $i: '${!i}'"
done
Here's a version for zsh. Run this code or add it to ~/.zshrc:
function _yamlExpand {
setopt local_options extended_glob
# 'words' array contains the current command line
# yaml filename is the last value
yamlFile=${words[-1]}
# parse 'key : value' lines from file, create associative array
typeset -A parms=("${(#s.:.)${(f)"$(<${yamlFile})"}}")
# trim leading and trailing whitespace from keys and values
# requires extended_glob
parms=("${(kv#)${(kv#)parms##[[:space:]]##}%%[[:space:]]##}")
# add -- and = to create flags
typeset -a flags
for key val in "${(#kv)parms}"; do
flags+=("--${key}='${val}'")
done
# replace the value on the command line
compadd -QU -- "$flags"
}
# add the function as a completion and map it to ctrl-y
compdef -k _yamlExpand expand-or-complete '^Y'
At the zsh shell prompt, type in the command and the yaml file name:
% print -l -- ./bar.yaml▃
With the cursor immediately after the yaml file name, hit ctrl+y. The yaml filename will be replaced with the expanded parameters:
% print -l -- --bar1='1' --bar2='2' ▃
Now you're set; you can hit enter, or add parameters, just like any other command line.
Notes:
This only supports the yaml subset in your example.
You can add more yaml parsing to the function, possibly with yq.
In this version, the cursor must be next to the yaml filename - otherwise the last value in words will be empty. You can add code to detect that case and then alter the words array with compset -n.
compadd and compset are described in the zshcompwid man page.
zshcompsys has details on compdef; the section on autoloaded files describes another way to deploy something like this.

How to iterate over multiple variables and echo them using Shell Script?

Consider the below variables which are dynamic and might change each time. Sometimes there might even be 5 variables, But the length of all the variables will be the same every time.
var1='a b c d e... upto z'
var2='1 2 3 4 5... upto 26'
var3='I II III IV V... upto XXVI'
I am looking for a generalized approach to iterate the variables in a for loop & My desired output should be like below.
a,1,I
b,2,II
c,3,III
d,4,IV
e,5,V
.
.
goes on upto
z,26,XXVI
If I use nested loops, then I get all possible combinations which is not the expected outcome.
Also, I know how to make this work for 2 variables using for loop and shift using below link
https://unix.stackexchange.com/questions/390283/how-to-iterate-two-variables-in-a-sh-script
With paste
paste -d , <(tr ' ' '\n' <<<"$var1") <(tr ' ' '\n' <<<"$var2") <(tr ' ' '\n' <<<"$var3")
a,1,I
b,2,II
c,3,III
d,4,IV
e...z,5...26,V...XXVI
But clearly having to add other parameter substitutions for more varN's is not scalable.
You need to "zip" two variables at a time.
var1='a b c d e...z'
var2='1 2 3 4 5...26'
var3='I II III IV V...XXVI'
zip_var1_var2 () {
set $var1
for v2 in $var2; do
echo "$1,$v2"
shift
done
}
zip_var12_var3 () {
set $(zip_var1_var2)
for v3 in $var3; do
echo "$1,$v3"
shift
done
}
for x in $(zip_var12_var3); do
echo "$x"
done
If you are willing to use eval and are sure it is safe to do so, you can write a single function like
zip () {
if [ $# -eq 1 ]; then
eval echo \$$1
return
fi
a1=$1
shift
x=$*
set $(eval echo \$$a1)
for v in $(zip $x); do
printf '=== %s\n' "$1,$v" >&2
echo "$1,$v"
shift
done
}
zip var1 var2 var3 # Note the arguments are the *names* of the variables to zip
If you can use arrays, then (for example, in bash)
var1=(a b c d e)
var2=(1 2 3 4 5)
var3=(I II III IV V)
for i in "${!var1[#]}"; do
printf '%s,%s,%s\n' "${var1[i]}" "${var2[i]}" "${var3[i]}"
done
Use this Perl one-liner:
perl -le '#in = map { [split] } #ARGV; for $i ( 0..$#{ $in[0] } ) { print join ",", map { $in[$_][$i] } 0..$#in; }' "$var1" "$var2" "$var3"
Prints:
a,1,I
b,2,II
c,3,III
d,4,IV
e,5,V
z,26,XXVI
The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-l : Strip the input line separator ("\n" on *NIX by default) before executing the code in-line, and append it when printing.
The input variables must be quoted with double quotes "like so", to keep the blank-separated words from being treated as separate arguments.
#ARGV is an array of the command line arguments, here $var1, $var2, $var3.
#in is an array of 3 elements, each element being a reference to an array obtained as a result of splitting the corresponding element of #ARGV on whitespace. Note that split splits the string on whitespace by default, but you can specify a different delimiter, it accepts regexes.
The subsequent for loop prints #in elements separated by comma.
SEE ALSO:
perldoc perlrun: how to execute the Perl interpreter: command line switches
perldoc perlvar: Perl predefined variables
The following is (almost) a copy of this answer with a few tweaks that make it fit this question.
The Original Question
First let’s assign a few variables to play with, 26 tokens in each of them:
var1="$(echo {a..z})"
var2="$(echo {1..26})"
var3="$(echo I II III IV \
V{,I,II,III} IX \
X{,I,II,III} XIV \
XV{,I,II,III} XIX \
XX{,I,II,III} XXIV \
XXV XXVI)"
var4="$(echo {A..Z})"
var5="$(echo {010101..262626..10101})"
Now we want a “magic” function that zips an arbitrary number of variables, ideally in pure Bash:
zip_vars var1 # a trivial test
zip_vars var{1..2} # a slightly less trivial test
zip_vars var{1..3} # the original question
zip_vars var{1..4} # more vars, becasuse we can
zip_vars var{1..5} # more vars, because why not
What could zip_vars look like? Here’s one in pure Bash, without any external commands:
zip_vars() {
local var
for var in "$#"; do
local -a "array_${var}"
local -n array_ref="array_${var}"
array_ref=(${!var})
local -ar "array_${var}"
done
local -n array_ref="array_${1}"
local -ir size="${#array_ref[#]}"
local -i i
local output
for ((i = 0; i < size; ++i)); do
output=
for var in "$#"; do
local -n array_ref="array_${var}"
output+=",${array_ref[i]}"
done
printf '%s\n' "${output:1}"
done
}
How it works:
It splits all variables (passed by reference (by variable name)) into arrays. For each variable varX it creates a local array array_varX.
It would be actually way easier if the input variables were already Bash arrays to start with (see below), but … we stick with the original question initially.
It determines the size of the first array and then blindly expects all arrays to be of that size.
For each index i from 0 to size - 1 it concatenates the ith elements of all arrays, separated by ,.
Arrays Make Things Easier
If you use Bash arrays from the very start, the script will be shorter and look simpler and there won’t be any string-to-array conversions.
zip_arrays() {
local -n array_ref="$1"
local -ir size="${#array_ref[#]}"
local -i i
local output
for ((i = 0; i < size; ++i)); do
output=
for arr in "$#"; do
local -n array_ref="$arr"
output+=",${array_ref[i]}"
done
printf '%s\n' "${output:1}"
done
}
arr1=({a..z})
arr2=({1..26})
arr3=( I II III IV
V{,I,II,III} IX
X{,I,II,III} XIV
XV{,I,II,III} XIX
XX{,I,II,III} XXIV
XXV
XXVI)
arr4=({A..Z})
arr5=({010101..262626..10101})
zip_arrays arr1 # a trivial test
zip_arrays arr{1..2} # a slightly less trivial test
zip_arrays arr{1..3} # (almost) the original question
zip_arrays arr{1..4} # more arrays, becasuse we can
zip_arrays arr{1..5} # more arrays, because why not

Shell script to read flat file and replace xml values

I have a flat file like this:
File:
# Environment
Application.Env~DEV
# Identity
Application.ID~999
# Name
Application.Name~appname
An XML like this:
<name>Application/Env</name>
<value>XXX</value>
<name>Application/ID</name>
<value>000</value>
<name>Application/Name</name>
<value>AAA</value>
I'm looking for a script (awk, sed etc) to read the flat file and replace all of the data in the <value> tags in the xml with the data found after the ~ when the <name> tag matches the data before the ~. Ultimately the resulting XML will look like:
<name>Application/Env</name>
<value>DEV</value>
<name>Application/ID</name>
<value>999</value>
<name>Application/Name</name>
<value>appname</value>
Thanks for your help!
Using XMLStarlet, this would look something like the following:
#!/bin/bash
# usage: [script] [flatfile-name] <in.xml >out.xml
flatfile=$1
# store an array of variables, and an array of edit commands
xml_vars=( )
xml_cmd=( )
count=0
while read -r line; do
[[ $line = *"~"* ]] || continue
key=${line%%"~"*} # put everything before the ~ into key
key=${key//"."/"/"} # change "."s to "/"s in key
val=${line#*"~"} # put everything after the ~ into val
# assign key to an XMLStarlet variable to avoid practices that can lead to injection
xml_vars+=( --var "var$count" "'$key'" )
# update the first value following a matching name
xml_cmd+=( -u "//name[.=\$var${count}]/following-sibling::value[1]" \
-v "$val" )
# increment the counter used to assign variable names
(( ++count ))
done <"$flatfile"
if (( ${#xml_cmd[#]} )); then
xmlstarlet ed "${xml_vars[#]}" "${xml_cmd[#]}"
else
cat # no edits to do
fi
This will run a command like the following:
xmlstarlet ed \
--var var0 "Application/Env" \
--var var2 "Application/ID" \
--var var3 "Application/Name" \
-u '//name[.=$var0]/following-sibling::value[1]' -v 'DEV' \
-u '//name[.=$var1]/following-sibling::value[1]' -v '999' \
-u '//name[.=$var2]/following-sibling::value[1]' -v 'appname'
...which replaces the first value after the name Application/Env with DEV, the first value after the name Application/ID with 999, and the first value after the name Application/Name with appname.
A slightly less paranoid approach might instead generate queries like //name[.="Application/Name"]/following-sibling::value[1]; putting the variables out-of-band is being followed as a security practice. Consider what could happen otherwise if the input file contained:
Application.Foo"or 1=1 or .="~bar
...and the resulting XPath were
//name[.="Application/Foo" or 1=1 or .=""]/following-sibling::value[1]
Because 1=1 is always true, this would then match every name, and thus change every value in the file to bar.
Unfortunately, the implementation of XMLStarlet doesn't effectively guard against this; however, using bind variables makes it possible for an implementation to provide such precautions, so a future release could be safe in this context.
Using Perl and XML::XSH2, a wrapper around XML::LibXML:
#!/usr/bin/perl
use warnings;
use strict;
use XML::XSH2;
open my $IN, '<', 'flatfile' or die $!;
$XML::XSH2::Map::replace = { map { chomp; split /~/ } grep /~/, <$IN> };
xsh << 'end.';
open 1.xml ;
for //name {
set following-sibling::value[1]
xsh:lookup('replace', xsh:subst(., '/', '.')) ;
}
save :b ;
end.
I wrapped the XML into a <root> tag to make it well formed.

Reading value from an ini style file with sed/awk

I wrote a simple bash function which would read value from an ini file (defined by variable CONF_FILE) and output it
getConfValue() {
#getConfValue section variable
#return value of a specific variable from given section of a conf file
section=$1
var="$2"
val=$(sed -nr "/\[$section\]/,/\[/{/$var/p}" $CONF_FILE)
val=${val#$var=}
echo "$val"
}
The problem is that it does not ignore comments and runs into troubles if multiple variables within a section names share common substrings.
Example ini file:
[general]
# TEST=old
; TEST=comment
TEST=new
TESTING=this will be output too
PATH=/tmp/test
Running getConfValue general PATH would output /tmp/test as expected, but running getConfValue general TEST shows all the problems this approach has.
How to fix that?
I know there are dedicated tools like crudini or tools for python, perl and php out in the wild, but I do not want to add extra dependencies for simple config file parsing. A solution incorporating awk instead of sed would be just fine too.
Sticking with sed you could anchor your var search to the start of the record using ^ and end it with an equal sign:
"/\[$section\]/,/\[/{/^$var=/p}"
If you are concerned about whitespace in front of your record you could account for that:
"/\[$section\]/,/\[/{/^(\W|)$var=/p}"
That ^(\W|)$var= says "if there is whitespace at the beginning (^(\W) or nothing (|)) before your variable concatenated with an equal sign ($var=)."
If you wanted to switch over to awk you could use something like:
val=$(awk -F"=" -v section=$section -v var=$var '$1=="["section"]"{secFound=1}secFound==1 && $1==var{print $2; secFound=0}' $CONF_FILE)
That awk command splits the record by equal -F"=". Then if the first field in the record is your section ($1=="["section"]") then set variable secFound to 1. Then... if secFound is 1 and the first field is exactly equal to your var variable (secFound==1 && $1==var) then print out the second field ({print $2}) and sets secFound to 0 so we don't pick up any other Test keys.
I encountered this problem and came up with a solution similar to others here.
The main difference is it uses a single awk call to get a response suitable for creating an associative array of the property/value pairs for a section.
This will not ignore the commented properties. Though adding something to do that should not be to hard.
Here's a testing script demonstrating the awk and declare statements used;
#!/bin/bash
#
# Parse a INI style properties file and extract property values for a given section
#
# Author: Alan Carlyle
# License: CC0 (https://creativecommons.org/about/cclicenses/)
#
# Example Input: (example.properties)
# [SEC1]
# item1=value1
# item2="value 2"
#
# [Section 2]
# property 1="Value 1 of 'Section 2'"
# property 2='Single "quoted" value'
#
# Usage:
# $ read_props example.properties "Section 2" property\ 2
# $ Single "quoted" value
#
# Section names and properties with spaces do not need to be quoted.
# Values with spaces must be quoted. Values can use single or double quotes.
# The following characters [ = ] can not be used in names or values.
#
# If the property is not provided the the whole section is outputed.
#
propertiesFile=$1
section=$2
property=$3
# Extract the propetites for the section formated as for associtive array
sectionData="( "$(awk -F'=' -v s="$section" '/^\[/{ gsub(/[\[\]]/, "", $1); f = ($1 == s); next }
NF && f{ print "["$1"]="$2 }' $propertiesFile)" )"
# Create associtive array from extracted section data
declare +x -A "properties=$sectionData"
if [ -z "$property" ]
then
echo $sectionData
else
echo ${properties[$property]}
fi

How to parse a string into variables?

I know how to parse a string into variables in the manner of this SO question, e.g.
ABCDE-123456
becomes:
var1=ABCDE
var2=123456
via, say, cut. I can do that in one script, no problem.
But I have a few dozen scripts which parse strings/arguments all in the same fashion (same arguments & variables, i.e. same parsing strategy).
And sometimes I need to make a change or add a variable to the parsing mechanism.
Of course, I could go through every one of my dozens of scripts and change the parsing manually (even if just copy & paste), but that would be tedious and more error-prone to bugs/mistakes.
Is there a modular way to do parse strings/arguments as such?
I thought of writing a script which parses the string/args into variables and then exports, but the export command does not work form child-to-parent, (only vice-versa).
Something like this might work:
parse_it () {
SEP=${SEP--}
string=$1
names=${#:2}
IFS="$SEP" read $names <<< "$string"
}
$ parse_it ABCDE-123456 var1 var2
$ echo "$var1"
ABCDE
$ echo "$var2"
123456
$ SEP=: parse_it "foo:bar:baz" id1 id2 id3
$ echo $id2
bar
The first argument is the string to parse, the remaining arguments are names of variables that get passed to read as the variables to set. (Not quoting $names here is intentional, as we will let the shell split the string into multiple words, one per variable. Valid variable names consist of only _, letters, and numbers, so there are no worries about undesired word splitting or pathname generation by not quoting $names). The function assumes the string uses a single separator of "-", which can be overridden via the environment.
For more complex parsing, you may want to use a custom regular expression (bash 4 or later required for the -g flag to declare):
parse_it () {
reg_ex=$1
string=$2
shift 2
[[ $string =~ $reg_ex ]] || return
i=1
for name; do
declare -g "$name=${BASH_REMATCH[i++]}"
done
}
$ parse_it '(.*)-(.*):(.*)' "abc-123:xyz" id1 id2 id3
$ echo "$id2"
123
I think what you really want is to write your function in one script and include it in all of your other scripts.
You can include other shell scripts by the source or . command.
For example, you can define your parse function in parseString.sh
function parseString {
...
}
And then in any of your other script, do
source parseString.sh
# now we can call parseString function
parseString abcde-12345

Resources