values to array from variable names with pattern - bash

I have an unknown number of variable names matching the pattern rundate*. For example, rundate=180618 && rundate2=180820. I know from here that I can collect multiple variable names into a third variable: alld=(`echo "${!rundate*}"`), and while attempting to solve my problem I figured out how to collect multiple array indices into a third variable: alld_indices=(`echo "${!alld[@]}"`). But how do I collect multiple values into a third variable alld_values, such that echo ${alld_values[@]} gives 180618 180820? I know from here how I can get the first value: firstd_value=(`echo "${!alld}"`). I suspect I've already seen the answer in my searching but did not recognize it. Happy to delete my question if that is the case. Thanks!

#!/usr/bin/env bash
# set up some test data
rundate="180618"
rundate1="180820"
rundate2="Values With Spaces Work Too"
# If we know all values are numeric, we can use a regular indexed array;
# otherwise, the below would need to be "declare -A alld=( )"
alld=( ) # initialize an array
for v in "${!rundate@}"; do # using @ instead of * avoids IFS-related bugs
  alld[${v#rundate}]=${!v}  # populate the array, using varname w/o prefix as key
done
# print our results
printf 'Full array definition:\n '
declare -p alld # emits code that, if run, will redefine the array
echo; echo "Indexes only:"
printf ' - %s\n' "${!alld[@]}" # "${!varname[@]}" expands to the list of keys
echo; echo "Values only:"
printf ' - %s\n' "${alld[@]}" # "${varname[@]}" expands to the list of values
...properly emits as output:
Full array definition:
declare -a alld=([0]="180618" [1]="180820" [2]="Values With Spaces Work Too")
Indexes only:
- 0
- 1
- 2
Values only:
- 180618
- 180820
- Values With Spaces Work Too
...as you can see running at https://ideone.com/yjSD1J

eval in a loop will do it.
$: for v in ${!rundate*}
> do eval "alld_values+=( \$$v )"
> done
$: echo "${alld_values[#]}"
180618 180820
or
$: eval "alld_values=( $( sed 's/ / $/g' <<< " ${!rundate*}" ) )"
or
$: echo "alld_values=( $( sed 's/ / $/g' <<< " ${!rundate*}" ) )" > tmp && . tmp

Related

how to assign each of multiple lines in a file as different variable?

This is probably a very simple question. I looked at other answers but couldn't come up with a solution. I have a 365-line date file, as below:
01-01-2000
02-01-2000
I need to read this file line by line and assign each day to a separate variable. like this,
d001=01-01-2000
d002=02-01-2000
I tried while read commands but couldn't get them to work. Assigning them one by one takes a lot of time. How can I do it quickly?
Trying to create individually named variables is a waste of time and not really supported. Better to use an associative array:
#!/bin/bash
declare -A array
while read -r line; do
  printf -v key 'd%03d' $((++c))
  array[$key]=$line
done < file
Output
for i in "${!array[#]}"; do echo "key=$i value=${array[$i]}"; done
key=d001 value=01-01-2000
key=d002 value=02-01-2000
Assumptions:
an array is acceptable
array index should start with 1
Sample input:
$ cat sample.dat
01-01-2000
02-01-2000
03-01-2000
04-01-2000
05-01-2000
One bash/mapfile option:
unset d # make sure variable is not currently in use
mapfile -t -O1 d < sample.dat # load each line from file into separate array location
This generates:
$ typeset -p d
declare -a d=([1]="01-01-2000" [2]="02-01-2000" [3]="03-01-2000" [4]="04-01-2000" [5]="05-01-2000")
$ for i in "${!d[#]}"; do echo "d[$i] = ${d[i]}"; done
d[1] = 01-01-2000
d[2] = 02-01-2000
d[3] = 03-01-2000
d[4] = 04-01-2000
d[5] = 05-01-2000
In OP's code, references to $d001 now become ${d[1]}.
A quick one-liner would be:
eval $(awk 'BEGIN{cnt=0}{printf "d%3.3d=\"%s\"\n",cnt,$0; cnt++}' your_file)
eval makes the shell variables known inside your script or shell. Use echo $d000 to show the first of the newly defined variables (note the counter starts at 0 here, so the first variable is d000, not d001). There should be no shell special characters (like * and $) inside your_file. Remove the eval $() wrapper to see the raw output of the awk command. The \" quoting around %s is there to allow spaces in the variable values; if you don't have any spaces in your_file you can remove the \" before and after %s.
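If the file might contain shell special characters, a hedged eval-free sketch (bash only) that builds the d001-style variables the question asks for, without the contents ever being re-parsed by the shell:
c=0
while IFS= read -r line; do
  printf -v name 'd%03d' $((++c))  # build the name d001, d002, ...
  printf -v "$name" '%s' "$line"   # assign the line to that variable, verbatim
done < your_file
echo "$d001"   # -> 01-01-2000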

Is it possible to save perl hash into bash array?

I have done some processing in perl and got the result in perl's hash data structure. Usually in bash, when I try to retrieve a result from another script, like
output=$(perl -E '...')
I get the output as a string. Is it possible to save the result in a bash array?
Assuming the perl variable %hash is an associative array, please try:
declare -A "output=($(perl -e '
$hash{"foo"} = "xx"; # just an example
$hash{"bar"} = "yy"; # ditto
for (keys %hash) {print "[\"$_\"]=\"$hash{$_}\"\n"}'))"
for i in "${!output[#]}"; do
echo "$i => ${output[$i]}" # see the result
done
The outermost double quotes around output=.. are required to tell declare
to evaluate the argument.
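A quick illustration of why those quotes matter; a minimal sketch with hypothetical values:
kv='["foo"]="xx" ["bar"]="yy"'
declare -A "ok=($kv)"  # one word: declare sees the whole ok=([..]=.. [..]=..) compound assignment
declare -p ok          # -> declare -A ok=([bar]="yy" [foo]="xx" )
# without the outer quotes, $kv would be word-split into fragments before
# declare ever sees it, and the compound-assignment syntax would be broken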
[Update]
Considering tripleee's comment, here is a version that is robust against special characters:
mapfile -d "" -t a < <(perl -e '
$hash{"baz"} = "boo"; # example
$hash{"foo"} = "x\"x"; # example with a double quote
$hash{"bar"} = "y\ny"; # example with a newline
print join("\0", %hash), "\0"') # use a nul byte as a delimiter
declare -A output # bash associative array
for ((i = 0; i < ${#a[#]}; i+=2 )); do
output[${a[i]}]=${a[i+1]} # key and value pair
done
for i in "${!output[#]}"; do
echo "$i => ${output[$i]}" # see the result
done
The conversion from perl variables to bash variables works only if they are free of null bytes (\0), as perl can store null bytes in strings, but bash cannot.
At least, we can use that limitation to print the hash in perl with null delimiters and safely parse it in bash again:
declare -A "array=($(
perl -e 'print join("\0", %hash), "\0"' |
xargs -0 printf '[%q]=%q '
))"
Please note that neither %q nor -0 are specified by POSIX. For a more portable solution see tshiono's answer.
If the hash is very big, such that ARG_MAX might be exceeded, you should ensure that xargs does not split a key-value pair across two calls to printf. To do so, add the option -n2 (or any other even number 2n, where you are sure that n key-value pairs never exceed ARG_MAX).
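For reference, that is the same command with xargs limited to two arguments per printf call:
declare -A "array=($(
  perl -e 'print join("\0", %hash), "\0"' |
  xargs -0 -n2 printf '[%q]=%q '
))"
Each printf invocation then emits exactly one [key]=value pair, so no pair can straddle an argument-list boundary.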

How can I assign each column value to its name?

I have a MetaData.csv file that contains many values to perform an analysis. All I want is:
1- Read the column names and create variables named after them.
2- Put the values in each column into those variables so they can be read by other commands: column_name=Its_value
MetaData.csv:
MAF,HWE,Geno_Missing,Inds_Missing
0.05,1E-06,0.01,0.01
I wrote the following code but it doesn't work well:
#!/bin/bash
Col_Names=$(head -n 1 MetaData.csv) # Cut header (comma sep)
Col_Names=$(echo ${Col_Names//,/ }) # Convert header to space sep
Col_Names=($Col_Names)              # Convert header to an array
for i in $(seq 1 ${#Col_Names[@]}); do
  N="$(head -1 MetaData.csv | tr ',' '\n' | nl | grep -w "${Col_Names[$i]}" | tr -d " " | awk -F " " '{print $1}')"
  ${Col_Names[$i]}="$(cat MetaData.csv | cut -d"," -f$N | sed '1d')"
done
Output:
HWE=1E-06: command not found
Geno_Missing=0.01: command not found
Inds_Missing=0.01: command not found
cut: 2: No such file or directory
cut: 3: No such file or directory
cut: 4: No such file or directory
=: command not found
Expected output:
MAF=0.05
HWE=1E-06
Geno_Missing=0.01
Inds_Missing=0.01
Problems:
1- I want to use the array length (${#Col_Names[@]}) as the final iteration value, which is 5, but the array indices start from 0 (0-4). So the MAF column was not captured by the loop. The loop also iterates twice (once 0-4 and again 2-4!).
2- When I tried to call the values in the variables (echo $MAF), they were empty!
Any solution is really appreciated.
This produces the expected output you posted from the sample input you posted:
$ awk -F, -v OFS='=' 'NR==1{split($0,hdr); next} {for (i=1;i<=NF;i++) print hdr[i], $i}' MetaData.csv
MAF=0.05
HWE=1E-06
Geno_Missing=0.01
Inds_Missing=0.01
If that's not all you need then edit your question to clarify your requirements.
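If what you actually need is shell variables rather than printed lines, one hedged option (safe only as long as the CSV values contain no shell metacharacters) is to source that same awk output:
source <(awk -F, -v OFS='=' 'NR==1{split($0,hdr); next} {for (i=1;i<=NF;i++) print hdr[i], $i}' MetaData.csv)
echo "$MAF" # -> 0.05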
If I'm understanding your requirements correctly, would you please try something like:
#!/bin/bash
nr=1                             # initialize input line number to 1
while IFS=, read -r -a ary; do   # split the line on "," and assign the fields to "ary"
  if (( nr == 1 )); then         # handle the header line
    col_names=("${ary[@]}")      # assign column names
  else                           # handle the body lines
    for (( i = 0; i < ${#ary[@]}; i++ )); do
      printf -v "${col_names[i]}" "${ary[i]}"
      # assign the input field to the variable "${col_names[i]}"
    done
    # now you can access the values via their column names
    echo "Fnames=$Fnames"
    echo "MAF=$MAF"
    fname_list+=("$Fnames")      # create a list of Fnames
  fi
  (( nr++ ))                     # increment the input line number
done < MetaData.csv
echo "${fname_list[@]}"          # print the list of Fnames
Output:
Fnames=19.vcf.gz
MAF=0.05
Fnames=20.vcf.gz
MAF=
Fnames=21.vcf.gz
MAF=
Fnames=22.vcf.gz
MAF=
19.vcf.gz 20.vcf.gz 21.vcf.gz 22.vcf.gz
The statement IFS=, read -r -a ary is mostly equivalent to your
first three lines; it splits the input on "," and assigns the
field values to the array variable ary.
There are several ways to use a variable's value as a variable name
(Indirect Variable References). printf -v VarName Value is one of them.
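A two-line illustration of that indirect assignment:
name=MAF
printf -v "$name" '%s' 0.05 # assigns to the variable whose name is stored in $name
echo "$MAF"                 # -> 0.05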
[EDIT]
Based on the OP's updated input file, here is another version:
#!/bin/bash
nr=1                             # initialize input line number to 1
while IFS=, read -r -a ary; do   # split the line on "," and assign the fields to "ary"
  if (( nr == 1 )); then         # handle the header line
    col_names=("${ary[@]}")      # assign column names
  else                           # handle the body lines
    for (( i = 0; i < ${#ary[@]}; i++ )); do
      printf -v "${col_names[i]}" "${ary[i]}"
      # assign the input field to the variable "${col_names[i]}"
    done
  fi
  (( nr++ ))                     # increment the input line number
done < MetaData.csv
for n in "${col_names[@]}"; do   # iterate over the variable names
  echo "$n=${!n}"                # print variable name and its value
done
# you can also specify the variable names literally as follows:
echo "MAF=$MAF HWE=$HWE Geno_Missing=$Geno_Missing Inds_Missing=$Inds_Missing"
Output:
MAF=0.05
HWE=1E-06
Geno_Missing=0.01
Inds_Missing=0.01
MAF=0.05 HWE=1E-06 Geno_Missing=0.01 Inds_Missing=0.01
As for the output, the first four lines are printed by echo "$n=${!n}" and the last line is printed by echo "MAF=$MAF ....
You can choose either statement depending on your usage of the variables in the following code.
I don't really think you can implement a robust CSV reader/parser in Bash, but you can implement something that works to some extent with simple CSV files. For example, a very simple bash-implemented CSV reader might look like this:
#!/bin/bash
set -e
ROW_NUMBER='0'
HEADERS=()
while IFS=',' read -ra ROW; do
  if test "$ROW_NUMBER" == '0'; then
    for (( I = 0; I < ${#ROW[@]}; I++ )); do
      HEADERS["$I"]="${ROW[I]}"
    done
  else
    declare -A DATA_ROW_MAP
    for (( I = 0; I < ${#ROW[@]}; I++ )); do
      DATA_ROW_MAP[${HEADERS["$I"]}]="${ROW[I]}"
    done
    # DEMO {
    echo -e "${DATA_ROW_MAP['Fnames']}\t${DATA_ROW_MAP['Inds_Missing']}"
    # } DEMO
    unset DATA_ROW_MAP
  fi
  ROW_NUMBER=$((ROW_NUMBER + 1))
done
Note that it has multiple disadvantages:
it only works with ,-separated fields (truly "C"SV);
it cannot handle multiline records;
it cannot handle field escapes;
it considers the first row always represents a header row.
This is why many commands produce and consume \0-delimited data instead: that control character is far easier to handle. As an aside, test is actually a bash builtin, so the loop above executes no external commands at all; the same check could equally be written with case or [[, as sketched below.
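Purely cosmetic, but for completeness, the case form of that header-row check:
case "$ROW_NUMBER" in
  0) echo "header row" ;;           # first line: column names
  *) echo "data row $ROW_NUMBER" ;; # every other line: a record
esac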
Example of use (with the demo output):
./read-csv.sh < MetaData.csv
19.vcf.gz 0.01
20.vcf.gz
21.vcf.gz
22.vcf.gz
I wouldn't recommend using this parser at all, but would recommend using a more CSV-oriented tool (Python would probably be the easiest choice; or, if your favorite language, as you mentioned, is R, then this is another option for you: Run R script from command line).

Generate a column for each file matching a glob

I'm having difficulties with something that sounds relatively simple. I have a few data files with single values in them as shown below:
data1.txt:
100
data2.txt:
200
data3.txt:
300
I have another file called header.txt; it's a template file that contains the header as shown below:
Data_1 Data2 Data3
- - -
I'm trying to add the data from the data*.txt files to the last line of Master.txt
The desired output would be something like this:
Data_1 Data2 Data3
- - -
100 200 300
I'm actively working on this but I'm not sure where to begin. This doesn't need to be implemented in pure shell -- use of standard UNIX tools such as awk or sed is entirely reasonable.
paste is the key tool:
#!/bin/bash
exec >>Master.txt
cat header.txt
paste $'-d\n' data1.txt data2.txt data3.txt |
while read line1
do
  read line2
  read line3
  printf '%-10s %-10s %-10s\n' "$line1" "$line2" "$line3"
done
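For the simple one-value-per-file case shown, plain paste with its default tab delimiter may already be enough (columns come out tab-separated rather than aligned); a minimal sketch:
cat header.txt >> Master.txt
paste data1.txt data2.txt data3.txt >> Master.txt # appends 100<TAB>200<TAB>300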
As a native-bash implementation:
#!/usr/bin/env bash
case $BASH_VERSION in ''|[123].*) echo "ERROR: Bash 4.0+ needed" >&2; exit 1;; esac
declare -A keys=( )       # define an associative array (a string->string map)
for f in data*.txt; do    # iterate over data*.txt files
  name=${f%.txt}          # for each, remove the ".txt" extension to get our name...
  keys[${name^}]=$(<"$f") # capitalize the first letter, and read the file to get the value
done
{ # start a group so we can redirect output just once
  printf '%s\t' "${!keys[@]}"; echo    # first line: keys in our associative array
  printf '%s\t' "${keys[@]//*/-}"; echo # second line: convert values to dashes
  printf '%s\t' "${keys[@]}"; echo     # third line: print the values unmodified
} >>Master.txt # all the above with output redirected to Master.txt
Most of the magic here is performed by parameter expansions:
${f%.txt} trims the .txt extension from the end of $f
${name^} capitalizes the first letter of $name
"${keys[#]}" expands to all values in the array named keys
"${keys[#]//*/-} replaces * (everything) in each key with the fixed string -.
"${!keys[#]}" expands to the names of entries in the associative array keys.

Storing Bash function parameters in variables

I want to define and get all the required variables that are passed to a function as parameters:
function explain_vars() {
  echo "Explaining vars '$@':" >&2
  for _var in "$@"; do
    printf " $_var: '${!_var}'\n" >&2
  done
  printf "\n" >&2
}
function read_params() (
  ## Define local variables
  _vars=(v1 v2 v3)
  local ${_vars[@]}
  ## Read variables from input:
  read ${_vars[@]} <<< "$@"
  explain_vars "${_vars[@]}"
)
The read puts all parameters into the specified variables; the default delimiter here is space. So if I pass a multi-word string as the second parameter, read stores only its first word in the second variable, and everything remaining goes into the last variable:
$ read_params one "two dot one" "three" "four"
Explaining vars 'v1 v2 v3':
v1: 'one'
v2: 'two'
v3: 'dot one three four'
As we can see, variable v2 is no longer synchronized with the given parameters. Moreover, it fails at reading empty strings:
$ read_params one "" " " '' ' ' "two dot one" "three" "four"
Explaining vars 'v1 v2 v3':
v1: 'one'
v2: 'two'
v3: 'dot one three four'
By looping through the all-parameters variable $@ inside the function it is possible to distinguish the parameters:
function raw_params() (
  echo "Explaining raw parameters:"
  for _v in "${@}"; do
    printf " '$_v'\n"
  done
)
$ raw_params one "" " " '' ' ' "two dot one" "three" "four"
Explaining raw parameters:
'one'
''
' '
''
' '
'two dot one'
'three'
'four'
To me the read command offers a quick way of defining, controlling and checking the parameters that are passed to functions. However, this works only for single-word, non-empty parameters. Is it possible to read all the parameters into variables the way read does, but respecting spaces and empty parameters? Or is there maybe a better approach?
From the original question it seems the read command is not correctly understood: read is a builtin which reads one line from standard input, the IFS variable is used as the field separator, and the -d option allows changing the record delimiter (the default is newline); for more information see read in the bash manual.
The function arguments are retrieved using the special variable "$@"; the bash syntax to assign them to an array is just
_vars=( "$#" ) # no space between variable name and = and no space between = and (
As a space is not valid in a variable name, the ${!_var} expansion will fail with the error bash: ...: bad substitution if _var contains a string with a space.
The function keyword is redundant because of the (); using parentheses around the body of a function instead of braces { ; } starts a new subshell.
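Putting those points together, a minimal sketch that captures all parameters verbatim, empty strings included:
store_params() {
  local _vars=( "$@" )   # copies every argument as-is, empties included
  echo "got ${#_vars[@]} parameters"
  local _v
  for _v in "${_vars[@]}"; do printf " '%s'\n" "$_v"; done
}
store_params one "" "two dot one"   # -> got 3 parameters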
I'm not sure what you are hoping to accomplish with this code, but this would appear to solve your problem for the case of three input parameters. Perhaps it shows you a way forward even if it doesn't completely do what you want.
read_params () (
  ## Define local variables
  _vars=(v1 v2 v3)
  local ${_vars[@]}
  local i
  for ((i=1; i<=$#; ++i)); do
    ## Read variables from input:
    printf -v "${_vars[i-1]}" "${!i}"
  done
  explain_vars "${_vars[@]}"
)
I rewrote my function in a script according to tripleee's answer. printf works fine except for assigning empty values ('' and "") to arrays. For that, we need to format the string we pass to printf -v with '%s'. For the example I pass an array of parameters to my function twice: once to assign them to my regular local variables, and a second time to pass the same parameters into my last local array variable. Here is the code:
$ cat parse-local-vars.sh
#!/usr/bin/env bash
function explain_vars() {
  echo "Explaining vars '$@':" >&2
  for _var in "$@"; do
    printf " $_var: '${!_var}'\n" >&2
  done
}
function parse_params() (
  #
  # Stores given parameters in defined local variables _vars.
  # The last variable will be treated as an array and the
  # remaining parameters will be stored therein.
  #
  ## Define local variables
  local _vars=(v1 v_empty_double_quote v_spaced_double_quote v_empty_single_quote v_spaced_single_quote v2_1 v3 v4 args)
  local ${_vars[@]}
  ## Make sure we assign parameters to variables
  [ ${#_vars[@]} -gt 0 ] \
    || return 1
  _args_pos=$(( ${#_vars[@]}-1 ))
  _args_counter=0
  local p
  for ((p=1; p<=$#; ++p)); do
    ## Read variables from input:
    if [ $p -le $_args_pos ]; then
      #printf -v "${_vars[p-1]}" '%s' "${!p}"
      printf -v "${_vars[p-1]}" "${!p}"
    else
      #printf -v "${_vars[_args_pos]}[$_args_counter]" '%s' "${!p}"
      printf -v "${_vars[_args_pos]}[$_args_counter]" "${!p}" # Without the '%s', assigning an empty value to an array does not work
      _args_counter=$(( _args_counter+1 ))
    fi
  done
  explain_vars "${_vars[@]}"
  echo "explaining array args[@]: '${args[@]}'"
  for _v in "${args[@]}"; do
    echo " >'$_v'"
  done
)
params_to_test=(one "" " " '' ' ' "two dot one" "three" "four")
parse_params "${params_to_test[@]}" "${params_to_test[@]}"
As one can see, here I use printf -v without formatting the parameter string (no use of '%s'):
$ bash parse-local-vars.sh
Explaining vars 'v1 v_empty_double_quote v_spaced_double_quote v_empty_single_quote v_spaced_single_quote v2_1 v3 v4 args':
v1: 'one'
v_empty_double_quote: ''
v_spaced_double_quote: ' '
v_empty_single_quote: ''
v_spaced_single_quote: ' '
v2_1: 'two dot one'
v3: 'three'
v4: 'four'
args: 'one'
explaining array args[@]: 'one two dot one three four' (6 values)
>'one'
>' '
>' '
>'two dot one'
>'three'
>'four'
The empty parameters "" and '' are not passed to the array (6 values).
Allowing the string formatting of printf by toggling the following comments:
printf -v "${_vars[_args_pos]}[$_args_counter]" '%s' "${!p}"
#printf -v "${_vars[_args_pos]}[$_args_counter]" "${!p}" # Without the '%s' as,signing empty variable to an array does not work
Leads to this output:
$ bash parse-local-vars.sh
Explaining vars 'v1 v_empty_double_quote v_spaced_double_quote v_empty_single_quote v_spaced_single_quote v2_1 v3 v4 args':
v1: 'one'
v_empty_double_quote: ''
v_spaced_double_quote: ' '
v_empty_single_quote: ''
v_spaced_single_quote: ' '
v2_1: 'two dot one'
v3: 'three'
v4: 'four'
args: 'one'
explaining array args[@]: 'one two dot one three four' (8 values)
>'one'
>''
>' '
>''
>' '
>'two dot one'
>'three'
>'four'
This is the expected result, as the empty strings are correctly assigned to the array (8 values). I haven't fully figured out what is going on when passing empty strings to an array with printf -v, but using an explicit format string with printf -v seems to be the safe way to go. Any corrections, explanations and improvements are welcome.
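One reason an explicit '%s' is always the safer choice, as a hedged aside: without it the value itself becomes printf's format string, so printf-special characters in the data are interpreted instead of stored:
val='100%s'
printf -v direct "$val"    # value used AS the format: the %s is consumed -> '100'
printf -v safe '%s' "$val" # value passed as data -> '100%s'
echo "direct='$direct' safe='$safe'"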
