Print string variable that stores the output of a command in Bash [duplicate]

I need to place the output of a command in Bash into a string variable.
Each value should be separated by a space. There are many options to do that, but I cannot use the mapfile or read options (I'm using Bash < 4 on macOS).
This is how I capture the output of the command:
values="$(mycommand | awk 'NR > 2 { printf "%s\n", $2 }')"
where mycommand is just a cloud command that returns some values. echo $values prints the following (which I think is a string ending with \n for each value):
55369972
75369973
85369974
95369975
This is what I'm trying to do: iterate over the variable values so I can print each value individually. Desired output of the for loop:
value: 55369972
value: 75369973
value: 85369974
value: 95369975
but I'm getting this:
value: 55369972 75369973 85369974 95369975
# Getting the id field of the values
values="$(mycommand| awk 'NR > 2 { printf "%s\n", $2 }')"
# Replacing the new line with a space so I can iterate over each value
new_values="${values//$'\n'/ }"
# new_values=("${values//$'\n'/ }")
# Checking if I can print each value correctly
for i in "${new_values[#]}"
# for i in "$new_values"
do
echo "value: ${i}"
done
Also, I cannot use things like
# shellcheck disable=xxx
values=($(echo "${values}" | tr "\n" " "))
as I'm getting ShellCheck errors when checking the code...
Any idea what I'm doing wrong in my code?

Try this:
#!/bin/bash
values="$(mycommand | awk 'NR > 2 { printf "%s\n", $2 }')"
for v in $values; do
    echo "value: $v"
done
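This relies on word splitting of the unquoted $values, which is safe here because the values are plain numbers. If the command's output could ever contain glob characters, a hedged variant is to turn off globbing around the loop:
set -f                # disable pathname expansion so a stray * is not expanded
for v in $values; do
    echo "value: $v"
done
set +f                # re-enable globbing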

Your step that replaces the newlines with spaces turns the result into a single string. If you want to split that string into a list, you should put it in parentheses (based on this answer).
This should do what you are expecting:
# Getting the id field of the values
values="$(mycommand| awk 'NR > 2 { printf "%s\n", $2 }')"
# Replacing the new line with a space
new_values=("${values//$'\n'/ }")
# Checking if I can print the values correctly
for i in ${new_values}
do
    echo "value: ${i}"
done
where new_values=("${values//$'\n'/ }") is the crucial part; then you need to avoid quoting it when you iterate over it (quoting would turn it back into a single string, since the unquoted expansion is what triggers the word splitting).
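If you want a genuine per-line array on Bash 3 (where mapfile isn't available), a minimal sketch is to let IFS do the splitting once, under controlled conditions; this assumes the values never contain glob characters:
# split $values on newlines into a real array (Bash 3 compatible)
old_IFS=$IFS
IFS=$'\n'
new_values=($values)      # unquoted on purpose: IFS splitting builds the array
IFS=$old_IFS

for i in "${new_values[@]}"; do
    echo "value: $i"
done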

Since I can't paste code into the comments, I post an answer, but the credits go to @akathimy above.
This works for me (solution #1):
#!/bin/bash
# Getting the id field of the values
values="55369972 75369973 85369974 95369975"
#
for v in $values; do
echo value: "$v"
done
and this also (solution #2):
#!/bin/bash
# Getting the id field of the values
values="55369972
75369973
85369974
95369975"
#
for v in $values; do
echo value: "$v"
done
Edit: And what about this one (solution #3)?
#!/bin/bash
# Getting the id field of the values
values=("55369972
75369973
85369974
95369975")
#
for v in ${values[@]}; do
    echo value: "$v"
done
This last one works for me, and perhaps also for you. Let me know.

Related

How can I assign each column value to its name?

I have a MetaData.csv file that contains many values to perform an analysis. All I want is to:
1- Read the column names and make variables with the same names.
2- Put the values in each column into those variables, so they can be read by other commands: column_name=its_value.
MetaData.csv:
MAF,HWE,Geno_Missing,Inds_Missing
0.05,1E-06,0.01,0.01
I wrote the following code, but it doesn't work well:
#!/bin/bash
Col_Names=$(head -n 1 MetaData.csv)   # Cut header (comma sep)
Col_Names=$(echo ${Col_Names//,/ })   # Convert header to space sep
Col_Names=($Col_Names)                # Convert header to an array
for i in $(seq 1 ${#Col_Names[@]}); do
    N="$(head -1 MetaData.csv | tr ',' '\n' | nl | grep -w "${Col_Names[$i]}" | tr -d " " | awk -F " " '{print $1}')";
    ${Col_Names[$i]}="$(cat MetaData.csv | cut -d"," -f$N | sed '1d')";
done
Output:
HWE=1E-06: command not found
Geno_Missing=0.01: command not found
Inds_Missing=0.01: command not found
cut: 2: No such file or directory
cut: 3: No such file or directory
cut: 4: No such file or directory
=: command not found
Expected output:
MAF=0.05
HWE=1E-06
Geno_Missing=0.01
Inds_Missing=0.01
Problems:
1- I want to use the array length (${#Col_Names[@]}) as the final iteration, which is 5, but array indexes start from 0 (0-4). So the MAF column was not captured by the loop. The loop also iterates twice (once 0-4 and again 2-4!).
2- When I tried to call values in variables (echo $MAF), they were empty!
Any solution is really appreciated.
This produces the expected output you posted from the sample input you posted:
$ awk -F, -v OFS='=' 'NR==1{split($0,hdr); next} {for (i=1;i<=NF;i++) print hdr[i], $i}' MetaData.csv
MAF=0.05
HWE=1E-06
Geno_Missing=0.01
Inds_Missing=0.01
If that's not all you need then edit your question to clarify your requirements.
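If you also want those name=value lines loaded as real shell variables, one option is to evaluate the generated assignments; this is only safe when you trust the contents of MetaData.csv:
eval "$(awk -F, -v OFS='=' 'NR==1{split($0,hdr); next} {for (i=1;i<=NF;i++) print hdr[i], $i}' MetaData.csv)"
echo "MAF=$MAF HWE=$HWE"   # MAF=0.05 HWE=1E-06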
If I'm understanding your requirements correctly, would you please try something like:
#!/bin/bash
nr=1                                 # initialize the input line number to 1
while IFS=, read -r -a ary; do       # split the line on "," and assign the fields to "ary"
    if (( nr == 1 )); then           # handle the header line
        col_names=("${ary[@]}")      # assign the column names
    else                             # handle the body lines
        for (( i = 0; i < ${#ary[@]}; i++ )); do
            # assign the variable named "${col_names[i]}" to the input field
            printf -v "${col_names[i]}" '%s' "${ary[i]}"
        done
        # now you can access the values via the column names
        echo "Fnames=$Fnames"
        echo "MAF=$MAF"
        fname_list+=("$Fnames")      # create a list of Fnames
    fi
    (( nr++ ))                       # increment the input line number
done < MetaData.csv
echo "${fname_list[@]}"              # print the list of Fnames
Output:
Fnames=19.vcf.gz
MAF=0.05
Fnames=20.vcf.gz
MAF=
Fnames=21.vcf.gz
MAF=
Fnames=22.vcf.gz
MAF=
19.vcf.gz 20.vcf.gz 21.vcf.gz 22.vcf.gz
The statement IFS=, read -r -a ary is mostly equivalent to your
first three lines; it splits the input on "," and assigns the
field values to the array variable ary.
There are several ways to use a variable's value as a variable name
(Indirect Variable References). printf -v VarName Value is one of them.
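A minimal illustration of writing and reading through a variable name (reusing the MAF example from the question):
name="MAF"
printf -v "$name" '%s' "0.05"   # write through the name: assigns $MAF
echo "$MAF"                     # 0.05
echo "${!name}"                 # read back through the name: also 0.05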
[EDIT]
Based on the OP's updated input file, here is an another version:
#!/bin/bash
nr=1                                 # initialize the input line number to 1
while IFS=, read -r -a ary; do       # split the line on "," and assign the fields to "ary"
    if (( nr == 1 )); then           # handle the header line
        col_names=("${ary[@]}")      # assign the column names
    else                             # handle the body lines
        for (( i = 0; i < ${#ary[@]}; i++ )); do
            # assign the variable named "${col_names[i]}" to the input field
            printf -v "${col_names[i]}" '%s' "${ary[i]}"
        done
    fi
    (( nr++ ))                       # increment the input line number
done < MetaData.csv
for n in "${col_names[@]}"; do       # iterate over the variable names
    echo "$n=${!n}"                  # print the variable name and its value
done
# you can also spell out the variable names literally as follows:
echo "MAF=$MAF HWE=$HWE Geno_Missing=$Geno_Missing Inds_Missing=$Inds_Missing"
Output:
MAF=0.05
HWE=1E-06
Geno_Missing=0.01
Inds_Missing=0.01
MAF=0.05 HWE=1E-06 Geno_Missing=0.01 Inds_Missing=0.01
As for the output, the first four lines are printed by echo "$n=${!n}" and the last line is printed by echo "MAF=$MAF ....
You can choose either statement depending on your usage of the variables in the following code.
I don't really think you can implement a robust CSV reader/parser in Bash, but you can implement one that works to some extent with simple CSV files. For example, a very simple bash-implemented CSV reader might look like this:
#!/bin/bash
set -e
ROW_NUMBER='0'
HEADERS=()
while IFS=',' read -ra ROW; do
    if test "$ROW_NUMBER" == '0'; then
        for (( I = 0; I < ${#ROW[@]}; I++ )); do
            HEADERS["$I"]="${ROW[I]}"
        done
    else
        declare -A DATA_ROW_MAP
        for (( I = 0; I < ${#ROW[@]}; I++ )); do
            DATA_ROW_MAP[${HEADERS["$I"]}]="${ROW[I]}"
        done
        # DEMO {
        echo -e "${DATA_ROW_MAP['Fnames']}\t${DATA_ROW_MAP['Inds_Missing']}"
        # } DEMO
        unset DATA_ROW_MAP
    fi
    ROW_NUMBER=$((ROW_NUMBER + 1))
done
Note that it has multiple disadvantages:
it only works with ,-separated fields (truly "C"SV);
it cannot handle multiline records;
it cannot handle field escapes;
it considers the first row always represents a header row.
This is why many commands produce and consume \0-delimited data instead: that control character is much easier to handle. As a side note, test looks like the only candidate for an external command here, but it is actually a bash builtin (and could in any case be re-implemented using case), so the parser spawns no external processes at all.
Example of use (with the demo output):
./read-csv.sh < MetaData.csv
19.vcf.gz 0.01
20.vcf.gz
21.vcf.gz
22.vcf.gz
I wouldn't recommend using this parser at all, but would recommend using a more CSV-oriented tool (Python would probably be the easiest choice; or, if your favorite language, as you mentioned, is R, then that is another option for you: Run R script from command line).

Looping through multiline CSV rows in bash

I have the following csv file with 3 columns:
row1value1,row1value2,"row1
multi
line
value"
row2value1,row2value2,"row2
multi
line
value"
Is there a way to loop through its rows like this (the following does not work; it reads physical lines):
while read $ROW
do
#some code that uses $ROW variable
done < file.csv
Using gnu-awk you can do this using FPAT:
awk -v RS='"\n' -v FPAT='"[^"]*"|[^,]*' '{
print "Record #", NR, " =======>"
for (i=1; i<=NF; i++) {
sub(/^"/, "", $i)
printf "Field # %d, value=[%s]\n", i, $i
}
}' file.csv
Record # 1 =======>
Field # 1, value=[row1value1]
Field # 2, value=[row1value2]
Field # 3, value=[row1
multi
line
value]
Record # 2 =======>
Field # 1, value=[row2value1]
Field # 2, value=[row2value2]
Field # 3, value=[row2
multi
line
value]
However, as I commented above a dedicated CSV parser using PHP, Perl or Python will be more robust for this job.
Here is a pure bash solution. The multiline_csv.sh script translates the multiline csv into standard csv by replacing the newline characters between quotes with some replacement string. So the usage is
./multiline_csv.sh CSVFILE SEP
I placed your example data in a file called ./multi.csv. Running the command ./multiline_csv.sh ./multi.csv "\n" yielded the following output:
[ericthewry@eric-arch-pc stackoverflow]$ ./multiline_csv.sh ./multi.csv "\n"
r1c2,r1c2,"row1\nmulti\nline\nvalue"
r2c1,r2c2,"row2\nmultiline\nvalue"
This can be easily translated back to the original csv file using printf:
[ericthewry@eric-arch-pc stackoverflow]$ printf "$(./multiline_csv.sh ./multi.csv "\n")\n"
r1c2,r1c2,"row1
multi
line
value"
r2c1,r2c2,"row2
multiline
value"
This might be an Arch-specific quirk of echo/sprintf (I'm not sure), but you could use some other separator string like ~~~++??//NEWLINE\\??++~~~ that you could sed out if need be.
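For instance, to turn such a placeholder back into real newlines afterwards (a sketch, assuming GNU sed and a hypothetical separator NL_SENTINEL):
./multiline_csv.sh ./multi.csv 'NL_SENTINEL' | sed 's/NL_SENTINEL/\n/g'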
# multiline_csv.sh
open=0

line_is_open(){
    quote="$2"
    (printf "$1" | sed -e "s/\(.\)/\1\n/g") | (while read char; do
        if [[ "$char" = '"' ]]; then
            open=$((($open + 1) % 2))
        fi
    done && echo $open)
}

cat "$1" | while read ln ; do
    flatline="${ln}"
    open=$(line_is_open "${ln}" $open)
    until [[ "$open" = "0" ]]; do
        if read newln
        then
            flatline="${flatline}$2${newln}"
            open=$(line_is_open "${newln}" $open)
        else
            break
        fi
    done
    echo "${flatline}"
done
Once you've done this translation, you can proceed as you would normally via the while read $ROW do ... done method.
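Concretely, the flattened rows can then be consumed one logical record at a time (a sketch, assuming the script above is saved as ./multiline_csv.sh):
./multiline_csv.sh file.csv '\n' | while IFS= read -r ROW; do
    printf 'record: %s\n' "$ROW"   # $ROW now holds one complete logical record
done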

Parse out key=value pairs into variables

I have a bunch of different kinds of files I need to look at periodically, and what they have in common is that the lines have a bunch of key=value type strings. So something like:
Version=2 Len=17 Hello Var=Howdy Other
I would like to be able to reference the names directly from awk... so something like:
cat some_file | ... | awk '{print Var, $5}' # prints Howdy Other
How can I go about doing that?
The closest you can get is to parse the variables into an associative array first thing every line. That is to say,
awk '{ delete vars; for(i = 1; i <= NF; ++i) { n = index($i, "="); if(n) { vars[substr($i, 1, n - 1)] = substr($i, n + 1) } } Var = vars["Var"] } { print Var, $5 }'
More readably:
{
    delete vars;                   # clean up previous variable values
    for (i = 1; i <= NF; ++i) {    # walk through the fields
        n = index($i, "=");        # search for =
        if (n) {                   # if there is one:
            # remember the value by name. The reason I use
            # substr over split is the possibility of
            # something like Var=foo=bar=baz (that will
            # be parsed into a variable Var with the
            # value "foo=bar=baz" this way).
            vars[substr($i, 1, n - 1)] = substr($i, n + 1)
        }
    }
    # if you know precisely what variable names you expect to get, you can
    # assign to them here:
    Var = vars["Var"]
    Version = vars["Version"]
    Len = vars["Len"]
}
{
    print Var, $5                  # then use them in the rest of the code
}
$ cat file | sed -r 's/[[:alnum:]]+=/\n&/g' | awk -F= '$1=="Var"{print $2}'
Howdy Other
Or, avoiding the useless use of cat:
$ sed -r 's/[[:alnum:]]+=/\n&/g' file | awk -F= '$1=="Var"{print $2}'
Howdy Other
How it works
sed -r 's/[[:alnum:]]+=/\n&/g'
This places each key,value pair on its own line.
awk -F= '$1=="Var"{print $2}'
This reads the key-value pairs. Since the field separator is chosen to be =, the key ends up as field 1 and the value as field 2. Thus, we just look for lines whose first field is Var and print the corresponding value.
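The same pipeline works for any key; for example, asking for Len instead prints everything from that pair up to the next key:
$ sed -r 's/[[:alnum:]]+=/\n&/g' file | awk -F= '$1=="Len"{print $2}'
17 Hello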
Since discussion in commentary has made it clear that a pure-bash solution would also be acceptable:
#!/bin/bash
case $BASH_VERSION in
''|[0-3].*) echo "ERROR: Bash 4.0 required" >&2; exit 1;;
esac
while read -r -a words; do                 # iterate over lines of input
    declare -A vars=( )                    # refresh the variables for each line
    set -- "${words[@]}"                   # update the positional parameters
    for word; do
        if [[ $word = *"="* ]]; then       # if a word contains an "="...
            vars[${word%%=*}]=${word#*=}   # ...then set it as an associative-array key
        fi
    done
    echo "${vars[Var]} $5"                 # Here, we use content read from that line.
done <<<"Version=2 Len=17 Hello Var=Howdy Other"
The <<<"Input Here" could also be <file.txt, in which case lines in the file would be iterated over.
If you wanted to use $Var instead of ${vars[Var]}, then substitute printf -v "${word%%=*}" %s "${word#*=}" in place of vars[${word%%=*}]=${word#*=}, and remove references to vars elsewhere. Note that this doesn't allow for a good way to clean up variables between lines of input, as the associative-array approach does.
I will try to explain a very generic way of doing this, which you can easily adapt if you want to print out other stuff.
Assume you have a string which has a format like this:
key1=value1 key2=value2 key3=value3
or more generic
key1_fs2_value1_fs1_key2_fs2_value2_fs1_key3_fs2_value3
With fs1 and fs2 two different field separators.
You would like to make a selection or some operations with these values. To do this, the easiest is to store these in an associative array:
array["key1"] => value1
array["key2"] => value2
array["key3"] => value3
array["key1","full"] => "key1=value1"
array["key2","full"] => "key2=value2"
array["key3","full"] => "key3=value3"
This can be done with the following function in awk:
function str2map(str,fs1,fs2,map,    n,tmp) {
    n = split(str, map, fs1)
    for (; n > 0; n--) {
        split(map[n], tmp, fs2);
        map[tmp[1]] = tmp[2]; map[tmp[1], "full"] = map[n]
        delete map[n]
    }
}
So, after processing the string, you have the full flexibility to do operations in any way you like:
awk '
function str2map(str,fs1,fs2,map,    n,tmp) {
    n = split(str, map, fs1)
    for (; n > 0; n--) {
        split(map[n], tmp, fs2);
        map[tmp[1]] = tmp[2]; map[tmp[1], "full"] = map[n]
        delete map[n]
    }
}
{ str2map($0, " ", "=", map) }
{ print map["Var","full"] }
' file
The advantage of this method is that you can easily adapt your code to print any other key you are interested in, or even make selections based on this, example:
(map["Version"] < 3) { print map["var"]/map["Len"] }
The simplest and easiest way is to use the string substitution like this:
property='my.password.is=1234567890=='
name=${property%%=*}
value=${property#*=}
echo "'$name' : '$value'"
The output is:
'my.password.is' : '1234567890=='
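For reference, here is the difference between the shortest (single symbol) and longest (doubled symbol) matches on the same string:
property='my.password.is=1234567890=='
echo "${property%%=*}"   # my.password.is              (%% strips the longest suffix matching =*)
echo "${property%=*}"    # my.password.is=1234567890=  (%  strips the shortest such suffix)
echo "${property#*=}"    # 1234567890==                (#  strips the shortest prefix matching *=)
echo "${property##*=}"   # (empty)                     (## strips the longest such prefix)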
Using bash's set command, we can split the line into positional parameters like awk.
For each word, we'll try to read a name value pair delimited by =.
When we find a value, assign it to the variable named $key using bash's printf -v feature.
#!/usr/bin/env bash
line='Version=2 Len=17 Hello Var=Howdy Other'
set -- $line
for word in "$@"; do
    IFS='=' read -r key val <<< "$word"
    test -n "$val" && printf -v "$key" '%s' "$val"
done
echo "$Var $5"
output
Howdy Other
SYNOPSIS
an awk-based solution that doesn't require manually checking the fields to locate the desired key pair:
the approach avoids splitting unnecessary fields or arrays - a regex match via a function call is only performed when needed
only the FIRST occurrence of the input key is returned; subsequent matches along the row are NOT returned
i just called it S() cuz it's the closest letter to $
I only included an array (_) of the 3 test values for demo purposes. Those aren't needed; in fact, no state information is kept at all
caveat: the key match must be exact - this version of the code isn't for case-insensitive or fuzzy/agile matching
Tested and confirmed working on
- gawk 5.1.1
- mawk 1.3.4
- mawk-2/1.9.9.6
- macos nawk
CODE
# gawk profile, created Fri May 27 02:07:53 2022
{m,n,g}awk '
function S(__,_) {
return \
! match($(_=_<_), "(^|["(_="[:blank:]]")")"(__)"[=][^"(_)"*") \
? "^$" \
: substr(__=substr($-_, RSTART, RLENGTH), index(__,"=")+_^!_)
}
BEGIN { OFS = "\f" # This array is only for testing
_["Version"] _["Len"] _["Var"] # purposes. Feel free to discard at will
} {
for (__ in _) {
print __, S(__) } }'
OUTPUT
Var
Howdy
Len
17
Version
2
So either call the fields in BAU fashion ($5, $0, $NF, etc.), or call S(QUOTED_KEY_VALUE), case-sensitive, like S("Version") to get back 2. As a safeguard, to prevent mis-interpreting null strings or invalid inputs as $0, a non-match returns ^$ instead of an empty string.
As a bonus, it can safely handle values in multibyte unicode, both for values and even for keys, regardless of whether your awk is UTF-8-aware or not:
1 ✜
🤡
2 Version
2
3 Var
Howdy
4 Len
17
5 ✜=🤡 Version=2 Len=17 Hello Var=Howdy Other
I know this question is specifically about awk, but I'm mentioning this because many people come here looking for ways to break down name=value pairs (with or without awk as such).
I found the approach below simple, straightforward, and very effective at handling multiple spaces/commas as well:
Source: http://jayconrod.com/posts/35/parsing-keyvalue-pairs-in-bash
change="foo=red bar=green baz=blue"
#use below if var is in CSV (instead of space as delim)
change=`echo $change | tr ',' ' '`
for change in $changes; do
set -- `echo $change | tr '=' ' '`
echo "variable name == $1 and variable value == $2"
#can assign value to a variable like below
eval my_var_$1=$2;
done
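Since eval executes arbitrary text, a safer sketch of the same loop uses printf -v for the assignment (keeping the hypothetical my_var_ prefix from above):
changes="foo=red bar=green baz=blue"
for change in $changes; do
    key=${change%%=*}
    val=${change#*=}
    printf -v "my_var_$key" '%s' "$val"   # direct assignment, no eval
done
echo "$my_var_foo"   # red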

Using bash and awk to print to a specific column in a new document

I am trying to use bash and awk together with a nested for loop to print data out into columns beside each other.
so far this is what I have:
for k in {1..147..3}
do
for i in "52" "64" "60" "70" "74"
do
awk -v x="${i}" -F, 'match ($0,x) { print $k }' all.csv > final.csv
done
done
echo "script has run"
I need to print the information into column k of the new file; however, that does not work.
so in the csv file data is like this:
52,9/05,6109
52,9/06,6119
64,9/05,7382
64,9/06,7392
64,9/07,3382
60,9/06,3829
...
I want my output like this:
52,9/05,6109,64,9/05,7382,60,9/06,3829
52,9/06,6119,64,9/06,7392
,,,64,9/07,3382
basically, all the 52s in the first column, the 64s in the fourth column, the 60s in the seventh column, and so on.
Instead of print $k, use printf "%s,",$k.
printf is the print formatter function that is common to many languages. %s tells it the first argument should be a string.
Note that awk won't get the $k from the shell, so you'll need to add -v k=$k.
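Putting those fixes together, the inner command might look like the sketch below. Note I've also swapped match($0,x) for an exact first-field comparison (so 52 doesn't accidentally match 520) and changed > to >> so each pass appends instead of overwriting. This still won't produce the side-by-side layout by itself (that would need something like paste), but it shows how to pass both shell variables into awk:
for k in {1..147..3}; do
    for i in 52 64 60 70 74; do
        awk -v x="$i" -v k="$k" -F, '$1 == x { printf "%s,", $k }' all.csv >> final.csv
    done
done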

Substitute value with result of calling function on value in unix shell

I have a text stream that looks like this:
----------------------------------------
s123456789_9780
heartbeat:test @ 1344280205000000: '0'
heartbeat:test @ 1344272490000000: '0'
Those long numbers are timestamps in microseconds. I would like to run this output through some sort of pipe that will change those timestamps to a more human-understandable date.
I have a date command that can do that, given just the timestamp (with its trailing colon):
$ date --date=@$(echo 1344272490000000: | sed 's/.......$//') +%Y/%d/%m-%H:%M:%S
2012/06/08-10:01:30
I would like to end up with something like this:
----------------------------------------
s123456789_9780
heartbeat:test @ 2012/06/08-12:10:05: '0'
heartbeat:test @ 2012/06/08-10:01:30: '0'
I don't think sed will allow me to match the timestamp and replace it with the value of calling a shell function on it (although I'd love to be shown wrong). Perhaps awk can do it? I'm not very familiar with awk.
The other part that seems tricky to me is letting the lines that don't match through without modification.
I could of course write a Python program that would do this, but I'd rather keep this in shell if possible (this is generated inside a shell script, and I'd rather not have dependencies on outside files).
This might work for you (GNU sed):
sed '/@ /!b;s//&\n/;h;s/.*\n//;s#\(.\{10\}\)[^:]*\(:.*\)#date --date=@\1 +%Y/%d/%m-%H:%M:%S"\2"#e;H;g;s/\n.*\n//' file
Explanation:
/@ /!b bail out and just print any lines that don't contain an @ followed by a space
s//&\n/ insert a newline after the above pattern
h copy the pattern space (PS) to the hold space (HS)
s/.*\n// delete up to and including the @ followed by a space
s#\(.\{10\}\)[^:]*\(:.*\)#date --date=@\1 +%Y/%d/%m-%H:%M:%S"\2"#e from what's remaining in the PS, make a back reference of the first 10 characters and from the : to the end of the string. Have these passed to the date command and evaluate the result into the PS
H append the PS to the HS inserting a newline at the same time
g copy the HS into the PS
s/\n.*\n// remove the original section of the string
Bash with a little sed, preserving the whitespace of the input:
while read -r; do
    parts=($REPLY)
    if [[ ${parts[0]} == "heartbeat:test" ]]; then
        dateStr=$(date --date=@${parts[2]%000000:} +%Y/%d/%m-%H:%M:%S)
        REPLY=$(echo "$REPLY" | sed "s#[0-9]\+000000:#$dateStr#")
    fi
    printf "%s\n" "$REPLY"
done
How about:
while read s1 at tm s2
do
    tm=${tm%000000:}
    echo $s1 $at $(date --date @$tm +%Y/%d/%m-%H:%M:%S)
done < yourfile
I would also like to see a sed solution, but it is a bit beyond my sed-fu. As awk supports strftime it is fairly straight forward here:
awk '
/^ *heartbeat/ {
gsub(".{7}$", "", $3)
$3 = strftime("%Y/%d/%m-%T", $3)
print " ", $1, $3
}
$0 !~ /heartbeat/' file
Output:
s123456789_9780
heartbeat:test 2012/06/08-21:10:05
heartbeat:test 2012/06/08-19:01:30
$3 is the microsecond field. gsub converts the timestamp to seconds.
The $0 !~ pattern makes sure non-heartbeat lines are printed ({ print } is the implicit default action).
This does it mostly within bash using your date command:
#!/bin/bash
IFS=$
while read a ; do
case "$a" in
*" # "[0-9]*) pre=${a% # *}
a=${a#$pre # }
post=${a##*:}
a=${a%??????:$post}
echo "$pre$(date --date=#$a +%Y/%d/%m-%H:%M:%S):$post"
;;
*) echo "$a" ;;
esac
done <<.
----------------------------------------
s123456789_9780
heartbeat:test @ 1344280205000000: '0'
heartbeat:test @ 1344272490000000: '0'
.
