Append data to another column in a CSV if duplicate is found in first column - bash

I have a CSV with data such as:
somename1,value1
somename1,value2
somename1,value3
anothername1,anothervalue1
anothername1,anothervalue2
anothername1,anothervalue3
I would like to rewrite the CSV so that when a duplicate in column 1 is found, the data is appended as a new column on the first entry.
For instance, the desired output would be:
somename1,value1,value2,value3
anothername1,anothervalue1,anothervalue2,anothervalue3
How can I do this in a shell script?
TIA

Simply removing duplicate lines is not enough here; with Awk you need logic like the following, which builds an array with one element for each unique entry in $1.
The solution creates a hash map indexed by the unique values of $1; each element holds the matching $2 values joined with a , separator. It assumes that lines sharing the same $1 are adjacent in the input, and the order in which for (i in unique) visits the keys is unspecified.
awk 'BEGIN{FS=OFS=","; prev="";}{ if (prev != $1) {unique[$1]=$2;} else {unique[$1]=(unique[$1]","$2)} prev=$1; }END{for (i in unique) print i,unique[i]}' file
anothername1,anothervalue1,anothervalue2,anothervalue3
somename1,value1,value2,value3
A more readable version would be to have something like,
BEGIN {
# set input and output field separator to ',' and initialize
# variable holding last instance of $1 to empty
FS=OFS=","
prev=""
}
{
# Update the value of $2 directly in the hash array only when new
# unique elements are found in $1
if (prev != $1){
unique[$1]=$2
}
else {
unique[$1]=(unique[$1]","$2)
}
# Update the current $1
prev=$1
}
END {
for (i in unique) {
print i,unique[i]
}
}
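The prev comparison only works when rows sharing a first column are adjacent, and for (i in unique) visits keys in an unspecified order. As a minimal sketch, assuming plain CSV without quoted commas, the variant below keeps keys in first-seen order and tolerates non-adjacent duplicates. The function name merge_by_first_column and the order and merged arrays are illustrative names, not part of the answer above.

```shell
# merge_by_first_column is a hypothetical helper name used for illustration.
# It reads CSV from the named files (or stdin) and joins the second column of
# all rows that share a first column, keeping keys in first-seen order.
merge_by_first_column() {
    awk 'BEGIN { FS = OFS = "," }
        !($1 in merged) {            # first time this key appears
            order[++n] = $1          # remember the position of the key
            merged[$1] = $2
            next
        }
        { merged[$1] = merged[$1] OFS $2 }   # later rows: append the value
        END {
            for (i = 1; i <= n; i++)
                print order[i], merged[order[i]]
        }' "$@"
}

printf '%s\n' somename1,value1 anothername1,anothervalue1 somename1,value2 |
    merge_by_first_column
# prints:
# somename1,value1,value2
# anothername1,anothervalue1
```

Called as merge_by_first_column file, it reads the file instead of standard input.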

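FILE=$1
NAMES=$(cut -d',' -f1 "$FILE" | sort -u)
for NAME in $NAMES; do
printf '%s' "$NAME"
# anchor the match to the whole first column so that e.g. "name1" does not also match "name10"
VALUES=$(grep "^$NAME," "$FILE" | cut -d',' -f2)
for VAL in $VALUES; do
printf ',%s' "$VAL"
done
echo ""
done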
Running this with your data generates:
>bash script.sh data1.txt
anothername1,anothervalue1,anothervalue2,anothervalue3
somename1,value1,value2,value3
The filename of your data has to be passed as a parameter. The output can be written to a new file by redirecting:
>bash script.sh data1.txt > data_new.txt

Related

How to merge two or more lines if they start with the same word?

I have a file like this:
AAKRKA HIST1H1B AAGAGAAKRKATGPP
AAKRKA HIST1H1E RKSAGAAKRKASGPP
AAKRLN ACAT1 LMTADAAKRLNVTPL
AAKRLN SUCLG2 NEALEAAKRLNAKEI
AAKRLR GTF2F1 VSEMPAAKRLRLDTG
AAKRMA VCL NDIIAAAKRMALLMA
AAKRPL WIZ YLGSVAAKRPLQEDR
AAKRQK MTA2 SSSQPAAKRQKLNPA
I would like to merge lines whose 1st column is exactly the same. The desired output is:
AAKRKA HIST1H1B,HIST1H1E AAGAGAAKRKATGPP,RKSAGAAKRKASGPP
AAKRLN ACAT1,SUCLG2 LMTADAAKRLNVTPL,NEALEAAKRLNAKEI
AAKRLR GTF2F1 VSEMPAAKRLRLDTG
AAKRMA VCL NDIIAAAKRMALLMA
AAKRPL WIZ YLGSVAAKRPLQEDR
AAKRQK MTA2 SSSQPAAKRQKLNPA
Sometimes there could be more than two lines starting with the same word. How could I reach the desired output with bash/awk?
Thanks for help!
Since this resembles an SQL-like group operation, you can use sqlite3, which can be called from a shell script.
With the given input:
$ cat aqua.txt
AAKRKA HIST1H1B AAGAGAAKRKATGPP
AAKRKA HIST1H1E RKSAGAAKRKASGPP
AAKRLN ACAT1 LMTADAAKRLNVTPL
AAKRLN SUCLG2 NEALEAAKRLNAKEI
AAKRLR GTF2F1 VSEMPAAKRLRLDTG
AAKRMA VCL NDIIAAAKRMALLMA
AAKRPL WIZ YLGSVAAKRPLQEDR
AAKRQK MTA2 SSSQPAAKRQKLNPA
$
Script:
$ cat ./sqlite_join.sh
#!/bin/sh
sqlite3 << EOF
create table data(a,b,c);
.separator ' '
.import $1 data
select a, group_concat(b) , group_concat(c) from data group by a;
EOF
$
Results
$ ./sqlite_join.sh aqua.txt
AAKRKA HIST1H1B,HIST1H1E AAGAGAAKRKATGPP,RKSAGAAKRKASGPP
AAKRLN ACAT1,SUCLG2 LMTADAAKRLNVTPL,NEALEAAKRLNAKEI
AAKRLR GTF2F1 VSEMPAAKRLRLDTG
AAKRMA VCL NDIIAAAKRMALLMA
AAKRPL WIZ YLGSVAAKRPLQEDR
AAKRQK MTA2 SSSQPAAKRQKLNPA
$
This is a two-liner in awk. The first line stores the second and third fields in associative arrays indexed by the first field, accumulating fields that share an index with a leading comma before each one. The second line iterates over the arrays and strips the leading comma on output:
{ second[$1] = second[$1] "," $2; third[$1] = third[$1] "," $3 }
END { for (i in second) print i, substr(second[i],2), substr(third[i],2) }
I made no assumptions about the order of the input or the output. If you want sorted output, pipe the output through sort. You can run the program at https://ideone.com/sbgLNk.
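As a small sketch of the sorting suggestion, the two-liner can be wrapped in a shell function and piped through sort. The name merge_sorted is made up for illustration, and the sketch assumes whitespace-separated input with three fields per line, as in the question.

```shell
# merge_sorted is a hypothetical wrapper around the two-liner above.
merge_sorted() {
    awk '{ second[$1] = second[$1] "," $2; third[$1] = third[$1] "," $3 }
         END { for (i in second) print i, substr(second[i], 2), substr(third[i], 2) }' "$@" |
        sort    # each output line starts with the key, so this orders by key
}
```

For example, merge_sorted aqua.txt would print one merged line per key in sorted key order; values within a group stay in input order.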
try this:
DATAFILE=data.txt
cut -d " " -f1 < $DATAFILE | sort | uniq |
while read key; do
column1="$key"
column2=""
column3=""
grep "^$key " "$DATAFILE" |
while read line; do
set -- $line
[ -n "$column2" ] && [ -n "$2" ] && column2="$column2,"
[ -n "$column3" ] && [ -n "$3" ] && column3="$column3,"
column2="$column2$2"
column3="$column3$3"
echo "$column1 $column2 $column3"
done | tail -n1
done

How to convert CSV to Excel with adding header rows between different data using Shell script?

I want to process a CSV file line by line and add a header row whenever table_name changes.
Sample CSV:
table_name,no.,data
attribute,column_name,definition,data_type,valid_values,notes
archive_rule,1,ID,id,,int,,
archive_rule,2,EXECUTE SEQ,execute_seq,,int,,
archive_rule,3,ARCHIVE RULE NAME,archive_rule_name,,varchar,,
archive_rule,4,ARCHIVE RULE TABLE NAME,archive_rule_table_name,,varchar,,
archive_rule,5,ARCHIVE RULE PK NAME,archive_rule_pk_name,,varchar,,
archive_rule,6,ARCHIVE BATCH SIZE,archive_batch_size,,int,,
archive_rule,7,ACTIVE STATUS,active_status,,varchar,,
archive_table,1,ID,id,,int,,
archive_table,2,ARCHIVE RULE ID,archive_rule_id,,int,,
archive_table,3,EXECUTE SEQ,execute_seq,,int,,
archive_table,4,ARCHIVE DEPEND TABLE ID,archive_depend_table_id,,int,,
archive_table,5,ARCHIVE DEPEND LEVEL,archive_depend_level,,int,,
archive_table,6,ACTIVE STATUS,active_status,,varchar,,
batch_job,1,BATCH JOB ID,batch_job_id,,int,,
batch_job,2,JOB TYPE,job_type,,varchar,,
batch_job,3,JOB NAME,job_name,,varchar,,
batch_job,4,EXECUTION DATE,execution_date,,timestamp,,
batch_job,5,EXECUTION RESULT,execution_result,,varchar,,
batch_job,6,ERROR MESSAGE,error_message,,varchar,,
batch_job,7,REPORT OUTPUT,report_output,,varchar,,
Desired Result:
Data : archive_rule
no.,data attribute,column_name,definition,data_type,valid_values,notes
1,ID,id,,int,,
2,EXECUTE SEQ,execute_seq,,int,,
3,ARCHIVE RULE NAME,archive_rule_name,,varchar,,
4,ARCHIVE RULE TABLE NAME,archive_rule_table_name,,varchar,,
5,ARCHIVE RULE PK NAME,archive_rule_pk_name,,varchar,,
6,ARCHIVE BATCH SIZE,archive_batch_size,,int,,
...
Data: archive_table
no.,data attribute,column_name,definition,data_type,valid_values,notes
1,ID,id,,int,,
2,ARCHIVE RULE ID,archive_rule_id,,int,,
3,EXECUTE SEQ,execute_seq,,int,,
4,ARCHIVE DEPEND TABLE ID,archive_depend_table_id,,int,,
5,ARCHIVE DEPEND LEVEL,archive_depend_level,,int,,
...
Please help me to find a way to get output.
I can only imagine one way here: read the input file line by line, and use cut to extract the first field. This should do the trick:
#! /bin/bash
# accept both process.sh file and process.sh < file
if [ $# -eq 1 ]
then file="$1"
else file=-
fi
#initialize table name to the empty string
cur=""
# process the input line by line after skipping the header
cat "$file" | tail -n +3 | (
while true
do
read line
if [ $? -ne 0 ] # exit loop on end of file or error
then
break
fi
tab=$( echo "$line" | cut -f 1 -d, ) # extract table name
if [ "x$tab" != "x$cur" ]
then
cur=$tab # if a new one remember it
echo "Data: $tab" # and write header
echo "no.,data attribute,column_name,definition,data_type,valid_values,notes"
fi
echo "$line" | cut -f 2- -d, # copy all except first field
done )
But I would use a true script language like Ruby or Python here...
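Staying in the shell, the per-line calls to cut can be avoided with parameter expansion. This is only a sketch: split_by_table is a hypothetical name, and it assumes the two header lines and the fixed header text shown in the question.

```shell
# split_by_table [FILE]  (hypothetical helper; reads stdin when no file is given)
split_by_table() {
    cur=""
    tail -n +3 "$@" | while IFS= read -r line; do
        tab=${line%%,*}     # text before the first comma: the table name
        rest=${line#*,}     # everything after the first comma
        if [ "$tab" != "$cur" ]; then
            cur=$tab        # remember the new table name
            echo "Data: $tab"
            echo "no.,data attribute,column_name,definition,data_type,valid_values,notes"
        fi
        printf '%s\n' "$rest"
    done
}
```

It can be called either as split_by_table file or as split_by_table < file.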
Using awk:
$ awk '
BEGIN { FS=OFS="," } # set field separators
NR==1 { # first record, start building the header
h=$2 OFS $3
next
}
NR==2 { # second record, continue header construct
h=h $0 # space was in the end of record NR==1
next
}
$1!=p { # when the table name changes
print "Data : " $1 # print table name
print h # and header
}
{
for(i=2;i<=NF;i++) # print fields 2->
printf "%s%s",$i,(i==NF?ORS:OFS) # field separator or newline
p=$1 # remember the table name for next record
}' file
Output:
Data : archive_rule
no.,data attribute,column_name,definition,data_type,valid_values,notes
1,ID,id,,int,,
2,EXECUTE SEQ,execute_seq,,int,,
...
Data : archive_table
no.,data attribute,column_name,definition,data_type,valid_values,notes
1,ID,id,,int,,
2,ARCHIVE RULE ID,archive_rule_id,,int,,
...
Data : batch_job
no.,data attribute,column_name,definition,data_type,valid_values,notes
1,BATCH JOB ID,batch_job_id,,int,,
2,JOB TYPE,job_type,,varchar,,
...

Bash XSV auto populate empty values with CSV column

I have a CSV export that I need to map to new values in order to then import it into a different system. I am using ArangoDB to create this data migration mapping.
Below is the full script used:
#!/bin/bash
execute () {
filepath=$1
prefix=$2
keyField=$3
filename=`basename "${filename%.csv}"`
collection="$prefix$filename"
filepath="/data-migration/$filepath"
# Check for "_key" column
if ! xsv headers "$1" | grep -q _key
# Add "_key" column using the keyfield provided
then
xsv select $keyField "$1" | sed -e "1s/$keyField/_key/" > "$1._key"
xsv cat columns "$1" "$1._key" > "$1.cat"
mv "$1.cat" "$1"
rm "$1._key"
fi
# Import CSV into Arango Collection
docker exec arango arangoimp --collection "$collection" --type csv "$filepath" --server.password ''
}
# This single line runs the execute() above
execute 'myDirectory/myFile.csv' prefix_ OLD_ORG_ID__C
So far I've deduced that the $keyField (OLD_ORG_ID__C) parameter passed to the execute() function is used in the loop of the script, which looks for the $keyField column and then copies its values into a newly created _key column using the XSV toolkit.
OLD_ORG_ID__C | _key
A123 -> A123
B123 -> B123
-> ## <-auto populate
Unfortunately, not every row has a value for the OLD_ORG_ID__C column. As a result, the _key for that row is also empty, which causes the import into Arango to fail.
Note: This _key field is necessary for my AQL scripts to work properly
How can I rewrite the loop to auto-index the blank values?
then
xsv select $keyField "$1" | sed -e "1s/$keyField/_key/" > "$1._key"
xsv cat columns "$1" "$1._key" > "$1.cat"
mv "$1.cat" "$1"
rm "$1._key"
fi
Is there a better way to solve this issue? Perhaps xsv sort by the keyField and then auto-populate from the blank rows to the end?
UPDATE: Per the comments/answer I tried something along these lines but so far still not working
#!/bin/bash
execute () {
filepath=$1
prefix=$2
keyField=$3
filename=`basename "${filename%.csv}"`
collection="$prefix$filename"
filepath="/data-migration/$filepath"
# Check for "_key" column
if ! xsv headers "$1" | grep -q _key
# Add "_key" column using the keyfield provided
then
awk -F, 'NR==1 { for(i=1; i<=NF;++i) if ($i == "'$keyField'") field=i; print; next }
$field == "" { $field = "_generated_" ++n }1' $1 > $1-test.csv
fi
}
# import a single collection if needed
execute 'agas/Account.csv' agas_ OLD_ORG_ID__C
This creates an Account-test.csv file, but unfortunately it has neither the "_key" column nor any changes to the OLD_ORG_ID__C values. Preferably, the "_key" values would be populated with auto-numbered values only when OLD_ORG_ID__C is blank; otherwise they should copy the provided value.
If your question is "how can I find from the first header line of a CSV file which field is named OLD_ORG_ID__C, then on subsequent lines put a unique value in this column if it is empty" try something like
awk -F, 'NR==1 { for(i=1; i<=NF;++i) if ($i == "OLD_ORG_ID__C") field=i ; print; next }
$field == "" { $field = "_generated_" ++n }1' file >newfile
This has no provision for coping with complexities like quoted fields with embedded commas. (I have no idea what xsv is but maybe it would be better equipped for such scenarios?)
If I can guess what this code does
xsv select $keyField "$1" |
sed -e "1s/$keyField/_key/" > "$1._key"
then probably you could replace it with something like
xsv select "$keyField" "$1" |
awk 'NR==1 { $0 = "_key" }
/^$/ { $0 = NR } 1' >"$1._key"
to replace the first line with _key (as the sed command did) and replace any subsequent empty lines with their line number.
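If plain awk is acceptable for the whole step, the _key column could also be appended in a single pass. The sketch below is an illustration under assumptions rather than a drop-in fix: add_key_column is a hypothetical function name, the _generated_ prefix is borrowed from the awk snippet earlier in this answer, and like that snippet it does not handle quoted fields with embedded commas.

```shell
# add_key_column KEYFIELD [FILE...]  (hypothetical helper)
# Appends a _key column: the value of KEYFIELD when present, otherwise
# an auto-numbered placeholder.
add_key_column() {
    key_field=$1
    shift
    awk -F, -v OFS=, -v key="$key_field" '
        NR == 1 {
            for (i = 1; i <= NF; i++) if ($i == key) col = i   # locate the key column
            print $0, "_key"                                   # extend the header
            next
        }
        {
            if (col && $col != "") k = $col                    # copy an existing value
            else k = "_generated_" (++n)                       # number the blank ones
            print $0, k
        }' "$@"
}
```

With the question's file this would be called as add_key_column OLD_ORG_ID__C Account.csv > Account-test.csv.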

Want to sort a file based on another file in unix shell

I have 2 files refer.txt and parse.txt
refer.txt contains the following
julie,remo,rob,whitney,james
parse.txt contains
remo/hello/1.0,remo/hello2/2.0,remo/hello3/3.0,whitney/hello/1.0,julie/hello/2.0,julie/hello/3.0,rob/hello/4.0,james/hello/6.0
Now my output.txt should list the entries of parse.txt in the order specified in refer.txt.
For example, output.txt should be:
julie/hello/2.0,julie/hello/3.0,remo/hello/1.0,remo/hello2/2.0,remo/hello3/3.0,rob/hello/4.0,whitney/hello/1.0,james/hello/6.0
I have tried the following code:
sort -nru refer.txt parse.txt
but no luck.
Please assist me. TIA
You can do that using gnu-awk:
awk -F/ -v RS=',|\n' 'FNR==NR{a[$1] = (a[$1])? a[$1] "," $0 : $0 ; next}
{s = (s)? s "," a[$1] : a[$1]} END{print s}' parse.txt refer.txt
Output:
julie/hello/2.0,julie/hello/3.0,remo/hello/1.0,remo/hello2/2.0,remo/hello3/3.0,rob/hello/4.0,whitney/hello/1.0,james/hello/6.0
Explanation:
-F/ # Use field separator as /
-v RS=',|\n' # Use record separator as comma or newline
NR == FNR { # While processing parse.txt
a[$1]=(a[$1])?a[$1] ","$0:$0 # create an array with 1st field as key and value as all the
# records with keys julie, remo, rob etc.
}
{ # while processing the second file refer.txt
s = (s)?s "," a[$1]:a[$1] # aggregate all values by reading key from 2nd file
}
END {print s } # print all the values
In pure native bash (4.x):
# read each file into an array
IFS=, read -r -a values <parse.txt
IFS=, read -r -a ordering <refer.txt
# create a map from content before "/" to comma-separated full values in preserved order
declare -A kv=( )
for value in "${values[@]}"; do
key=${value%%/*}
if [[ ${kv[$key]} ]]; then
kv[$key]+=",$value" # already exists, comma-separate
else
kv[$key]="$value"
fi
done
# go through refer list, putting full value into "out" array for each entry
out=( )
for value in "${ordering[@]}"; do
out+=( "${kv[$value]}" )
done
# print "out" array in comma-separated form
IFS=,
printf '%s\n' "${out[*]}" >output.txt
If you're getting more output fields than you have input fields, you're probably trying to run this with bash 3.x. Since associative array support is mandatory for correct operation, this won't work.
tr , "\n" < refer.txt | cat -n >person_id.txt # 'cat -n' is not POSIX; sed and paste could be used instead
cat person_id.txt | while read person_id person_key
do
print "$person_id" > $person_key
done
tr , "\n" < parse.txt | sed -E 's/^([^\/]*)(\/.*)$/\1 \1\2/' >person_data.txt
cat person_data.txt | while read foreign_key person_data
do
person_id="$(<$foreign_key)"
print "$person_id" " " "$person_data" >>merge.txt
done
sort merge.txt >output.txt
A text book data processing approach, a person id table, a person data table, merged on a common key field, which is the first name of the person:
[person_key] [person_id]
- person id table, a unique sortable 'id' for each person (line number in this instance, since that is the desired sort order), and key for each person (their first name)
[person_key] [person_data]
- person data table, the data for each person indexed by 'person_key'
[person_id] [person_data]
- a merge of the 'person_id' table and 'person_data' table on 'person_key', which can then be sorted on person_id, giving the output as requested
The trick is to implement an associative array using files, the file name being the key (in this instance 'person_key'), the content being the value. [Essentially a random access file implemented using the filesystem.]
This actually adds a step to the otherwise simple but not very efficient task of grepping parse.txt with each value in refer.txt - which is more efficient I'm not sure.
NB: The above code is very unlikely to work out of the box.
NBB: On reflection, probably a better way of doing this would be to use the file system to create a random access file of parse.txt (essentially an index), and to then consider refer.txt as a batch file, submitting it as a job as such, printing out from the parse.txt random access file the data for each of the names read in from refer.txt in turn:
# 1) index data file on required field
cat person_data.txt | while read data
do
key="$(print "$data" | cut -d'/' -f1)" # everything before the first '/'
print "$data" >>./person_data/"$key"
done
# 2) run batch job
cat refer_data.txt | while read key
do
cat ./person_data/"$key" # print the indexed data for this name
done
However having said that, using egrep is probably just as rigorous a solution or at least for small datasets, I would most certainly use this approach given the specific question posed. (Or maybe not! The above could well prove faster as well as being more robust.)
Command
while read line; do
grep -w "^$line" <(tr , "\n" < parse.txt)
done < <(tr , "\n" < refer.txt) | paste -s -d , -
Key points
For both files, commas are translated to newlines using the tr command (without changing the files themselves). This is useful because while read and grep assume that records are separated by newlines rather than commas.
while read will read in every name from refer.txt (e.g. julie, remo, etc.) and then use grep to retrieve lines from parse.txt containing that name.
The ^ in the regex ensures matching is only performed from the start of the string and not in the middle (thanks to @CharlesDuffy's comment below), and the -w option for grep allows whole-word matching only. For example, this ensures that "rob" only matches "rob/..." and not "robby/..." or "throb/...".
The paste command at the end will comma-separate the results. Removing this command will print each result on its own line.
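To see what the ^ anchor and -w contribute, here is a tiny check using made-up entries (the robby and throb lines are not in the question's data):

```shell
# Three candidate lines; only the one whose first whole word is "rob" should match.
matches=$(printf '%s\n' rob/hello/4.0 robby/hello/1.0 throb/hello/2.0 | grep -w '^rob')
echo "$matches"
# prints: rob/hello/4.0
```

Without -w the robby line would also match, and without ^ the pattern could match later in a line.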

Parse out key=value pairs into variables

I have a bunch of different kinds of files I need to look at periodically, and what they have in common is that the lines have a bunch of key=value type strings. So something like:
Version=2 Len=17 Hello Var=Howdy Other
I would like to be able to reference the names directly from awk... so something like:
cat some_file | ... | awk '{print Var, $5}' # prints Howdy Other
How can I go about doing that?
The closest you can get is to parse the variables into an associative array first thing every line. That is to say,
awk '{ delete vars; for(i = 1; i <= NF; ++i) { n = index($i, "="); if(n) { vars[substr($i, 1, n - 1)] = substr($i, n + 1) } } Var = vars["Var"] } { print Var, $5 }'
More readably:
{
delete vars; # clean up previous variable values
for(i = 1; i <= NF; ++i) { # walk through fields
n = index($i, "="); # search for =
if(n) { # if there is one:
# remember value by name. The reason I use
# substr over split is the possibility of
# something like Var=foo=bar=baz (that will
# be parsed into a variable Var with the
# value "foo=bar=baz" this way).
vars[substr($i, 1, n - 1)] = substr($i, n + 1)
}
}
# if you know precisely what variable names you expect to get, you can
# assign to them here:
Var = vars["Var"]
Version = vars["Version"]
Len = vars["Len"]
}
{
print Var, $5 # then use them in the rest of the code
}
$ cat file | sed -r 's/[[:alnum:]]+=/\n&/g' | awk -F= '$1=="Var"{print $2}'
Howdy Other
Or, avoiding the useless use of cat:
$ sed -r 's/[[:alnum:]]+=/\n&/g' file | awk -F= '$1=="Var"{print $2}'
Howdy Other
How it works
sed -r 's/[[:alnum:]]+=/\n&/g'
This places each key,value pair on its own line.
awk -F= '$1=="Var"{print $2}'
This reads the key-value pairs. Since the field separator is chosen to be =, the key ends up as field 1 and the value as field 2. Thus, we just look for lines whose first field is Var and print the corresponding value.
Since discussion in commentary has made it clear that a pure-bash solution would also be acceptable:
#!/bin/bash
case $BASH_VERSION in
''|[0-3].*) echo "ERROR: Bash 4.0 required" >&2; exit 1;;
esac
while read -r -a words; do # iterate over lines of input
declare -A vars=( ) # refresh variables for each line
set -- "${words[@]}" # update positional parameters
for word; do
if [[ $word = *"="* ]]; then # if a word contains an "="...
vars[${word%%=*}]=${word#*=} # ...then set it as an associative-array key
fi
done
echo "${vars[Var]} $5" # Here, we use content read from that line.
done <<<"Version=2 Len=17 Hello Var=Howdy Other"
The <<<"Input Here" could also be <file.txt, in which case lines in the file would be iterated over.
If you wanted to use $Var instead of ${vars[Var]}, then substitute printf -v "${word%%=*}" %s "${word#*=}" in place of vars[${word%%=*}]=${word#*=}, and remove references to vars elsewhere. Note that this doesn't allow for a good way to clean up variables between lines of input, as the associative-array approach does.
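A sketch of that printf -v variant, with an added guard (not part of the answer above) so that only words whose key is a valid shell identifier are assigned. assign_pairs is a hypothetical helper name, and bash is assumed.

```shell
#!/usr/bin/env bash
# assign_pairs WORD...  (hypothetical helper)
# Sets one shell variable per key=value word. Keys named "word" or "key"
# would collide with the local variables of the function itself.
assign_pairs() {
    local word key
    for word in "$@"; do
        [[ $word == *=* ]] || continue            # skip words without "="
        key=${word%%=*}                           # text before the first "="
        [[ $key =~ ^[A-Za-z_][A-Za-z0-9_]*$ ]] || continue
        printf -v "$key" %s "${word#*=}"          # text after the first "="
    done
}

line='Version=2 Len=17 Hello Var=Howdy Other'
read -r -a words <<<"$line"
assign_pairs "${words[@]}"
echo "$Var ${words[4]}"    # prints: Howdy Other
```

The fifth word is taken from the array here (index 4) rather than from $5, so the positional parameters of the script are left alone.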
I will try to explain a very generic way to do this, which you can easily adapt if you want to print out other things.
Assume you have a string which has a format like this:
key1=value1 key2=value2 key3=value3
or, more generically,
key1_fs2_value1_fs1_key2_fs2_value2_fs1_key3_fs2_value3
where fs1 and fs2 are two different field separators.
You would like to make a selection or some operations with these values. To do this, the easiest is to store these in an associative array:
array["key1"] => value1
array["key2"] => value2
array["key3"] => value3
array["key1","full"] => "key1=value1"
array["key2","full"] => "key2=value2"
array["key3","full"] => "key3=value3"
This can be done with the following function in awk:
function str2map(str,fs1,fs2,map, n,tmp) {
n=split(str,map,fs1)
for (;n>0;n--) {
split(map[n],tmp,fs2);
map[tmp[1]]=tmp[2]; map[tmp[1],"full"]=map[n]
delete map[n]
}
}
So, after processing the string, you have the full flexibility to do operations in any way you like:
awk '
function str2map(str,fs1,fs2,map, n,tmp) {
n=split(str,map,fs1)
for (;n>0;n--) {
split(map[n],tmp,fs2);
map[tmp[1]]=tmp[2]; map[tmp[1],"full"]=map[n]
delete map[n]
}
}
{ str2map($0," ","=",map) }
{ print map["Var","full"] }
' file
The advantage of this method is that you can easily adapt your code to print any other key you are interested in, or even make selections based on the values, for example:
(map["Version"] < 3) { print map["var"]/map["Len"] }
The simplest and easiest way is to use the string substitution like this:
property='my.password.is=1234567890=='
name=${property%%=*}
value=${property#*=}
echo "'$name' : '$value'"
The output is:
'my.password.is' : '1234567890=='
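For comparison, the doubled and single forms of each operator behave differently when the value itself contains =. A quick sketch using the same property string (the four variable names are only for illustration):

```shell
property='my.password.is=1234567890=='
name_longest=${property%%=*}     # remove the longest suffix matching "=*"
name_shortest=${property%=*}     # remove the shortest suffix matching "=*"
value_shortest=${property#*=}    # remove the shortest prefix matching "*="
value_longest=${property##*=}    # remove the longest prefix matching "*="
printf '%s\n' "$name_longest" "$name_shortest" "$value_shortest" "[$value_longest]"
# prints:
# my.password.is
# my.password.is=1234567890=
# 1234567890==
# []
```

So %% together with a single # is the pair that splits on the first = only.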
Using bash's set command, we can split the line into positional parameters like awk.
For each word, we'll try to read a name value pair delimited by =.
When we find a value, assign it to the variable named $key using bash's printf -v feature.
#!/usr/bin/env bash
line='Version=2 Len=17 Hello Var=Howdy Other'
set -- $line
for word in "$@"; do
IFS='=' read -r key val <<< "$word"
test -n "$val" && printf -v "$key" %s "$val"
done
echo "$Var $5"
output
Howdy Other
SYNOPSIS
an awk-based solution that doesn't require manually checking the fields to locate the desired key pair :
approach being avoid splitting unnecessary fields or arrays - only performing regex match via function call when needed
only returning FIRST occurrence of input key value. Subsequent matches along the row are NOT returned
i just called it S() cuz it's the closest letter to $
I only included an array (_) of the 3 test values for demo purposes. Those aren't needed. In fact, no state information is being kept at all
caveat being : key-match must be exact - this version of the code isn't for case-insensitive or fuzzy/agile matching
Tested and confirmed working on
- gawk 5.1.1
- mawk 1.3.4
- mawk-2/1.9.9.6
- macos nawk
CODE
# gawk profile, created Fri May 27 02:07:53 2022
{m,n,g}awk '
function S(__,_) {
return \
! match($(_=_<_), "(^|["(_="[:blank:]]")")"(__)"[=][^"(_)"*") \
? "^$" \
: substr(__=substr($-_, RSTART, RLENGTH), index(__,"=")+_^!_)
}
BEGIN { OFS = "\f" # This array is only for testing
_["Version"] _["Len"] _["Var"] # purposes. Feel free to discard at will
} {
for (__ in _) {
print __, S(__) } }'
OUTPUT
Var
Howdy
Len
17
Version
2
So either reference the fields in the usual fashion
- $5, $0, $NF, etc.
or call S(QUOTED_KEY_VALUE), case-sensitive, like S("Version") to get back 2.
As a safeguard, to prevent misinterpreting null strings or invalid inputs as $0, a non-match returns ^$ instead of an empty string.
As a bonus, it can safely handle values in multibyte unicode, both for values and even for keys, regardless of whether ur awk is UTF-8-aware or not :
1 ✜
🤡
2 Version
2
3 Var
Howdy
4 Len
17
5 ✜=🤡 Version=2 Len=17 Hello Var=Howdy Other
I know this question is specifically about awk, but I'm mentioning this because many people come here looking for ways to break down name=value pairs (with or without awk).
I found the approach below simple, straightforward, and very effective, and it copes with multiple spaces or commas as well:
Source: http://jayconrod.com/posts/35/parsing-keyvalue-pairs-in-bash
changes="foo=red bar=green baz=blue"
# use the line below if the pairs are comma-separated instead of space-separated
changes=`echo $changes | tr ',' ' '`
for change in $changes; do
set -- `echo $change | tr '=' ' '`
echo "variable name == $1 and variable value == $2"
#can assign value to a variable like below
eval my_var_$1=$2;
done

Resources