How to create a TSV file using Bash that takes as input some shell variables? - bash

I have some shell variables which hold different types of values:
variable 1: 72.9%
variable 2: 27.1%
variable 3: Y
variable 4: 8756
I want to be able to print the values of these variables to a tab-separated file, and possibly even have the names of the variables as column headers.
output:
variable1 variable2 variable3 variable4
72.9% 27.1% Y 8756
Any ideas?

Relatively easy -- you just need to read the values one line at a time into individual array variables and then provide the formatted output, e.g.
#!/bin/bash
declare -a name
declare -a num
declare -a value
while read -r a b c; do
name+=( "$a" )
num+=( "$b" )
value+=( "$c" )
done < "$1"
## C-style loop used to index both name & num for headings
for ((i = 0; i < ${#name[@]}; i++)); do
printf "%s\t" "${name[i]}${num[i]%:}"
done
echo
for i in "${value[@]}"; do
printf "%s\t\t" "$i"
done
echo
Which will result in tab separated headings and values (you may need to play with the spacing a bit -- e.g. using 2 tabs on value output)
Example Use/Output
$ bash headings.sh csvdata.txt
variable1 variable2 variable3 variable4
72.9% 27.1% Y 8756
If you have the variables in the script itself, you take the same approach. With a variable you already have the name, but you will still need an array holding the names as well as the values in order to loop over them and produce the output you want. Whether you write a temp file and read the values back in, or use arrays to store the names of the variables (created by string concatenation of the name and num above), the process is the same.
Variables Already In Script
As mentioned above, you will take a similar approach, only here, you choose the heading prefix, and just use the loop counter to add the number at the end of whatever name you choose, then simply loop over the values you have stored in the array, e.g.
#!/bin/bash
foo="72.9%"
bar="27.1%"
baz="Y"
buz="8756"
declare -a value
value=( "$foo" "$bar" "$baz" "$buz" )
for ((i = 0; i < ${#value[@]}; i++)); do
printf "%s\t" "variable$((i+1))"
done
echo
for i in "${value[@]}"; do
printf "%s\t\t" "$i"
done
echo
Example Use/Output
(the same)
$ bash headings2.sh
variable1 variable2 variable3 variable4
72.9% 27.1% Y 8756
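If you only ever have these few fixed variables, you can also skip the loop entirely and write the TSV with two printf calls using literal tab format specifiers (a minimal sketch; out.tsv is just an example filename):
#!/bin/bash
foo="72.9%"
bar="27.1%"
baz="Y"
buz="8756"
# header row, then the value row, tab separated
printf 'variable1\tvariable2\tvariable3\tvariable4\n'  >  out.tsv
printf '%s\t%s\t%s\t%s\n' "$foo" "$bar" "$baz" "$buz" >> out.tsv
The looped version above is still preferable once the number of variables grows or varies.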

Related

how to assign each of multiple lines in a file to a different variable?

this is probably a very simple question. I looked at other answers but couldn't come up with a solution. I have a 365-line date file, as below:
01-01-2000
02-01-2000
I need to read this file line by line and assign each day to a separate variable, like this:
d001=01-01-2000
d002=02-01-2000
I tried while read commands but couldn't get them to work. It takes a lot of time to do them one by one. How can I do it quickly?
Trying to create dynamically named variables is a waste of time and not really supported. Better to use an associative array:
#!/bin/bash
declare -A array
while read -r line; do
printf -v key 'd%03d' $((++c))
array[$key]=$line
done < file
Output
for i in "${!array[@]}"; do echo "key=$i value=${array[$i]}"; done
key=d001 value=01-01-2000
key=d002 value=02-01-2000
Assumptions:
an array is acceptable
array index should start with 1
Sample input:
$ cat sample.dat
01-01-2000
02-01-2000
03-01-2000
04-01-2000
05-01-2000
One bash/mapfile option:
unset d # make sure variable is not currently in use
mapfile -t -O1 d < sample.dat # load each line from file into separate array location
This generates:
$ typeset -p d
declare -a d=([1]="01-01-2000" [2]="02-01-2000" [3]="03-01-2000" [4]="04-01-2000" [5]="05-01-2000")
$ for i in "${!d[@]}"; do echo "d[$i] = ${d[i]}"; done
d[1] = 01-01-2000
d[2] = 02-01-2000
d[3] = 03-01-2000
d[4] = 04-01-2000
d[5] = 05-01-2000
In OP's code, references to $d001 now become ${d[1]}.
A quick one-liner would be:
eval $(awk 'BEGIN{cnt=0}{printf "d%3.3d=\"%s\"\n",cnt,$0; cnt++}' your_file)
eval makes the shell variables known inside your script or shell. Use echo $d000 to show the first one of the newly defined variables. There should be no shell special characters (like * and $) inside your_file. Remove eval $() to see the result of the awk command. The \" quoted %s is to allow spaces in the variable values. If you don't have any spaces in your_file you can remove the \" before and after %s.
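For reference, if you remove the eval $( ) wrapper, the awk command alone should print assignments like the following for the two sample dates (cnt starts at 0 here, hence d000):
$ awk 'BEGIN{cnt=0}{printf "d%3.3d=\"%s\"\n",cnt,$0; cnt++}' your_file
d000="01-01-2000"
d001="02-01-2000"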

How to iterate over multiple variables and echo them using Shell Script?

Consider the below variables, which are dynamic and might change each time. Sometimes there might even be 5 variables, but the length of all the variables will be the same every time.
var1='a b c d e... upto z'
var2='1 2 3 4 5... upto 26'
var3='I II III IV V... upto XXVI'
I am looking for a generalized approach to iterate over the variables in a for loop, and my desired output should be like below.
a,1,I
b,2,II
c,3,III
d,4,IV
e,5,V
.
.
goes on upto
z,26,XXVI
If I use nested loops, then I get all possible combinations, which is not the expected outcome.
Also, I know how to make this work for 2 variables using a for loop and shift, per the link below:
https://unix.stackexchange.com/questions/390283/how-to-iterate-two-variables-in-a-sh-script
With paste
paste -d , <(tr ' ' '\n' <<<"$var1") <(tr ' ' '\n' <<<"$var2") <(tr ' ' '\n' <<<"$var3")
a,1,I
b,2,II
c,3,III
d,4,IV
e...z,5...26,V...XXVI
But clearly having to add other parameter substitutions for more varN's is not scalable.
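One way to keep paste but avoid writing a new process substitution per variable is a small loop that adds one column at a time. A hedged sketch (zip_cols is a hypothetical helper; it takes the variable names and relies on the values being space-separated tokens):
zip_cols() {
    local out name
    out=$(tr ' ' '\n' <<<"${!1}")    # first variable becomes the first column
    shift
    for name in "$@"; do             # paste each remaining variable on as another column
        out=$(paste -d, <(printf '%s\n' "$out") <(tr ' ' '\n' <<<"${!name}"))
    done
    printf '%s\n' "$out"
}
zip_cols var1 var2 var3    # a,1,I ... z,26,XXVI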
You need to "zip" two variables at a time.
var1='a b c d e...z'
var2='1 2 3 4 5...26'
var3='I II III IV V...XXVI'
zip_var1_var2 () {
set $var1
for v2 in $var2; do
echo "$1,$v2"
shift
done
}
zip_var12_var3 () {
set $(zip_var1_var2)
for v3 in $var3; do
echo "$1,$v3"
shift
done
}
for x in $(zip_var12_var3); do
echo "$x"
done
If you are willing to use eval and are sure it is safe to do so, you can write a single function like
zip () {
if [ $# -eq 1 ]; then
eval echo \$$1
return
fi
a1=$1
shift
x=$*
set $(eval echo \$$a1)
for v in $(zip $x); do
printf '=== %s\n' "$1,$v" >&2   # diagnostic copy of each pair, written to stderr; remove if not needed
echo "$1,$v"
shift
done
}
zip var1 var2 var3 # Note the arguments are the *names* of the variables to zip
If you can use arrays, then (for example, in bash)
var1=(a b c d e)
var2=(1 2 3 4 5)
var3=(I II III IV V)
for i in "${!var1[@]}"; do
printf '%s,%s,%s\n' "${var1[i]}" "${var2[i]}" "${var3[i]}"
done
Use this Perl one-liner:
perl -le '@in = map { [split] } @ARGV; for $i ( 0..$#{ $in[0] } ) { print join ",", map { $in[$_][$i] } 0..$#in; }' "$var1" "$var2" "$var3"
Prints:
a,1,I
b,2,II
c,3,III
d,4,IV
e,5,V
z,26,XXVI
The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-l : Strip the input line separator ("\n" on *NIX by default) before executing the code in-line, and append it when printing.
The input variables must be quoted with double quotes "like so", to keep the blank-separated words from being treated as separate arguments.
@ARGV is an array of the command line arguments, here $var1, $var2, $var3.
@in is an array of 3 elements, each element being a reference to an array obtained as a result of splitting the corresponding element of @ARGV on whitespace. Note that split splits the string on whitespace by default, but you can specify a different delimiter; it accepts regexes.
The subsequent for loop prints the @in elements separated by commas.
SEE ALSO:
perldoc perlrun: how to execute the Perl interpreter: command line switches
perldoc perlvar: Perl predefined variables
The following is (almost) a copy of this answer with a few tweaks that make it fit this question.
The Original Question
First let’s assign a few variables to play with, 26 tokens in each of them:
var1="$(echo {a..z})"
var2="$(echo {1..26})"
var3="$(echo I II III IV \
V{,I,II,III} IX \
X{,I,II,III} XIV \
XV{,I,II,III} XIX \
XX{,I,II,III} XXIV \
XXV XXVI)"
var4="$(echo {A..Z})"
var5="$(echo {010101..262626..10101})"
Now we want a “magic” function that zips an arbitrary number of variables, ideally in pure Bash:
zip_vars var1 # a trivial test
zip_vars var{1..2} # a slightly less trivial test
zip_vars var{1..3} # the original question
zip_vars var{1..4} # more vars, because we can
zip_vars var{1..5} # more vars, because why not
What could zip_vars look like? Here’s one in pure Bash, without any external commands:
zip_vars() {
local var
for var in "$@"; do
local -a "array_${var}"
local -n array_ref="array_${var}"
array_ref=(${!var})
local -ar "array_${var}"
done
local -n array_ref="array_${1}"
local -ir size="${#array_ref[@]}"
local -i i
local output
for ((i = 0; i < size; ++i)); do
output=
for var in "$@"; do
local -n array_ref="array_${var}"
output+=",${array_ref[i]}"
done
printf '%s\n' "${output:1}"
done
}
How it works:
It splits all variables (passed by reference (by variable name)) into arrays. For each variable varX it creates a local array array_varX.
It would actually be much easier if the input variables were already Bash arrays to start with (see below), but … we stick with the original question initially.
It determines the size of the first array and then blindly expects all arrays to be of that size.
For each index i from 0 to size - 1 it concatenates the ith elements of all arrays, separated by ,.
Arrays Make Things Easier
If you use Bash arrays from the very start, the script will be shorter and look simpler and there won’t be any string-to-array conversions.
zip_arrays() {
local -n array_ref="$1"
local -ir size="${#array_ref[@]}"
local -i i
local output
for ((i = 0; i < size; ++i)); do
output=
for arr in "$@"; do
local -n array_ref="$arr"
output+=",${array_ref[i]}"
done
printf '%s\n' "${output:1}"
done
}
arr1=({a..z})
arr2=({1..26})
arr3=( I II III IV
V{,I,II,III} IX
X{,I,II,III} XIV
XV{,I,II,III} XIX
XX{,I,II,III} XXIV
XXV
XXVI)
arr4=({A..Z})
arr5=({010101..262626..10101})
zip_arrays arr1 # a trivial test
zip_arrays arr{1..2} # a slightly less trivial test
zip_arrays arr{1..3} # (almost) the original question
zip_arrays arr{1..4} # more arrays, because we can
zip_arrays arr{1..5} # more arrays, because why not
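For example, with the arrays above defined in the current shell, the three-array call reproduces the desired output from the question (first lines shown):
$ zip_arrays arr{1..3} | head -n 3
a,1,I
b,2,II
c,3,III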

Arithmetic operations using numbers from grep

I have FILE from which I can extract two numbers using grep. The numbers appear in the last column.
$ grep number FILE
number1: 123
number2: 456
I would like to assign the numbers to variables, e.g. $num1 and $num2, and do some arithmetic operations using the variables.
How can I do this using bash commands?
Assumptions:
we want to match on lines that start with the string number
we will always find 2 matches for ^number from the input file
not interested in storing values in an array
Sample data:
$ cat file.dat
number1: 123
not a number: abc
number: 456
We'll use awk to find the desired values and print all to a single line of output:
$ awk '/^number/ { printf "%s ",$2 }' file.dat
123 456
From here we can use read to load the variables:
$ read -r num1 num2 < <(awk '/^number/ { printf "%s ",$2 }' file.dat)
$ typeset -p num1 num2
declare -- num1="123"
declare -- num2="456"
$ echo ".${num1}.${num2}."
.123.456.
NOTE: periods added as visual delimiters
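Once num1 and num2 are loaded, ordinary arithmetic expansion works directly on them, e.g.:
$ echo "$(( num1 + num2 ))"
579
$ echo "$(( num2 - num1 ))"
333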
Firstly, you need to extract the numbers from the file. Assuming the file is always in the format stated, you can use a while loop combined with the read command to read the numbers into a named variable, one row at a time.
You can then use the $(( )) operator to perform integer arithmetic to keep a running total of the incoming numbers.
For example:
#!/bin/bash
declare -i total=0 # -i declares an integer.
while read discard number; do # read returns false at EOF. discard is ignored.
total=$((total+number)) # Variables don't need '$' prefix in this case.
done < FILE # while loop passes STDIN to the 'read' command.
echo "Total is: ${total}"

How can I assign each column value to Its name?

I have a MetaData.csv file that contains many values to perform an analysis. All I want is:
1- Read the column names and create variables named after the columns.
2- Put the value in each column into its variable, so it can be read by other commands: column_name=Its_value
MetaData.csv:
MAF,HWE,Geno_Missing,Inds_Missing
0.05,1E-06,0.01,0.01
I wrote the following code but it doesn't work well:
#!/bin/bash
Col_Names=$(head -n 1 MetaData.csv) # Cut header (comma sep)
Col_Names=$(echo ${Col_Names//,/ }) # Convert header to space sep
Col_Names=($Col_Names) # Convert header to an array
for i in $(seq 1 ${#Col_Names[@]}); do
N="$(head -1 MetaData.csv | tr ',' '\n' | nl | grep -w "${Col_Names[$i]}" | tr -d " " | awk -F " " '{print $1}')";
${Col_Names[$i]}="$(cat MetaData.csv | cut -d"," -f$N | sed '1d')";
done
Output:
HWE=1E-06: command not found
Geno_Missing=0.01: command not found
Inds_Missing=0.01: command not found
cut: 2: No such file or directory
cut: 3: No such file or directory
cut: 4: No such file or directory
=: command not found
Expected output:
MAF=0.05
HWE=1E-06
Geno_Missing=0.01
Inds_Missing=0.01
Problems:
1- I want to use the array length (${#Col_Names[@]}) as the final iteration, which is 5, but the array index starts from 0 (0-4). So the MAF column was not captured by the loop. The loop also iterates twice (once 0-4 and again 2-4!).
2- When I tried to call values in variables (echo $MAF), they were empty!
Any solution is really appreciated.
This produces the expected output you posted from the sample input you posted:
$ awk -F, -v OFS='=' 'NR==1{split($0,hdr); next} {for (i=1;i<=NF;i++) print hdr[i], $i}' MetaData.csv
MAF=0.05
HWE=1E-06
Geno_Missing=0.01
Inds_Missing=0.01
If that's not all you need then edit your question to clarify your requirements.
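If the end goal is actual shell variables named after the headers, one option is to source that same awk output (a sketch that assumes the values contain no spaces or shell metacharacters):
source <(awk -F, -v OFS='=' 'NR==1{split($0,hdr); next} {for (i=1;i<=NF;i++) print hdr[i], $i}' MetaData.csv)
echo "MAF=$MAF HWE=$HWE Geno_Missing=$Geno_Missing Inds_Missing=$Inds_Missing"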
If I'm understanding your requirements correctly, would you please try something like:
#!/bin/bash
nr=1 # initialize input line number to 1
while IFS=, read -r -a ary; do # split the line on "," and assign the fields to the array "ary"
if (( nr == 1 )); then # handle the header line
col_names=("${ary[@]}") # assign column names
else # handle the body lines
for (( i = 0; i < ${#ary[@]}; i++ )); do
printf -v "${col_names[i]}" "${ary[i]}"
# assign the input field to the variable named "${col_names[i]}"
done
# now you can access the values via its column name
echo "Fnames=$Fnames"
echo "MAF=$MAF"
fname_list+=("$Fnames") # create a list of Fnames
fi
(( nr++ )) # increment the input line number
done < MetaData.csv
echo "${fname_list[#]}" # print the list of Fnames
Output:
Fnames=19.vcf.gz
MAF=0.05
Fnames=20.vcf.gz
MAF=
Fnames=21.vcf.gz
MAF=
Fnames=22.vcf.gz
MAF=
19.vcf.gz 20.vcf.gz 21.vcf.gz 22.vcf.gz
The statement IFS=, read -r -a ary is mostly equivalent to your
first three lines; it splits the input on "," and assigns the
field values to the array variable ary.
There are several ways to use a variable's value as a variable name
(Indirect Variable References). printf -v VarName Value is one of them.
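As a minimal standalone illustration of printf -v assigning through a name held in another variable:
name="MAF"
printf -v "$name" '%s' "0.05"   # assigns 0.05 to the variable whose name is stored in $name
echo "$MAF"                     # prints: 0.05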
[EDIT]
Based on the OP's updated input file, here is an another version:
#!/bin/bash
nr=1 # initialize input line number to 1
while IFS=, read -r -a ary; do # split the line on "," and assign the fields to the array "ary"
if (( nr == 1 )); then # handle the header line
col_names=("${ary[@]}") # assign column names
else # handle the body lines
for (( i = 0; i < ${#ary[@]}; i++ )); do
printf -v "${col_names[i]}" "${ary[i]}"
# assign the input field to the variable named "${col_names[i]}"
done
fi
(( nr++ )) # increment the input line number
done < MetaData.csv
for n in "${col_names[@]}"; do # iterate over the variable names
echo "$n=${!n}" # print variable name and its value
done
# you can also specify the variable names literally as follows:
echo "MAF=$MAF HWE=$HWE Geno_Missing=$Geno_Missing Inds_Missing=$Inds_Missing"
Output:
MAF=0.05
HWE=1E-06
Geno_Missing=0.01
Inds_Missing=0.01
MAF=0.05 HWE=1E-06 Geno_Missing=0.01 Inds_Missing=0.01
As for the output, the first four lines are printed by echo "$n=${!n}" and the last line is printed by echo "MAF=$MAF ....
You can choose either statement depending on your usage of the variables in the following code.
I don't really think you can implement a robust CSV reader/parser in Bash, but you can make one work to some extent with simple CSV files. For example, a very simple bash-implemented CSV parser might look like this:
#!/bin/bash
set -e
ROW_NUMBER='0'
HEADERS=()
while IFS=',' read -ra ROW; do
if test "$ROW_NUMBER" == '0'; then
for (( I = 0; I < ${#ROW[@]}; I++ )); do
HEADERS["$I"]="${ROW[I]}"
done
else
declare -A DATA_ROW_MAP
for (( I = 0; I < ${#ROW[@]}; I++ )); do
DATA_ROW_MAP[${HEADERS["$I"]}]="${ROW[I]}"
done
# DEMO {
echo -e "${DATA_ROW_MAP['Fnames']}\t${DATA_ROW_MAP['Inds_Missing']}"
# } DEMO
unset DATA_ROW_MAP
fi
ROW_NUMBER=$((ROW_NUMBER + 1))
done
Note that it has multiple disadvantages:
it only works with ,-separated fields (truly "C"SV);
it cannot handle multiline records;
it cannot handle field escapes;
it considers the first row always represents a header row.
This is why many commands produce and consume \0-delimited data instead: that control character is simply easier to handle. One thing I'm not sure about is whether test is the only external command executed by bash here (I believe it is, but it could probably be re-implemented using case so that no external test is executed).
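For what it's worth, the case-based check speculated about above might look like the sketch below (in bash, test and [ are in fact builtins, so no external process is spawned either way):
case "$ROW_NUMBER" in
    '0')
        # header row: remember the column names, as in the if-branch above
        for (( I = 0; I < ${#ROW[@]}; I++ )); do
            HEADERS["$I"]="${ROW[I]}"
        done
        ;;
    *)
        # data row: build DATA_ROW_MAP exactly as in the else-branch above
        ;;
esac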
Example of use (with the demo output):
./read-csv.sh < MetaData.csv
19.vcf.gz 0.01
20.vcf.gz
21.vcf.gz
22.vcf.gz
I wouldn't recommend using this parser at all, but would recommend using a more CSV-oriented tool (Python would probably be the easiest choice; or, since you mentioned your favorite language is R, this might be another option for you: Run R script from command line).

Read multi variable csv bash build multi line file from it

I had what I thought was a simple concept which I could easily do, as I had done something similar before.
I have an input file input.csv
1a,1b
2a,2b
I would like the following output
Output file 1
This is variable 1 named 1a ok
This is variable 2 named 1b ok
Output file 2
This is variable 1 named 2a ok
This is variable 2 named 2b ok
I thought I could do something similar to below
i=1
while IFS=, read var1 var2; do
echo This is variable 1 named "var1" > filenamei
echo This is variable 2 named "var2" >> filenamei
i=i+1
done </inputfile.csv
I previously wrote code to take a single variable from a long file and write output to a single file and it worked fine. Like below
Input file
a
b
Single output file
This is A
This is B
Script was
while read p;do
echo this is "$p" >> outputfile
done < inputfile
Been through lots of different errors but getting nowhere.
This is easy with a double loop: the outer loop iterates over the lines and the inner one over the comma-separated fields. How about:
#!/bin/bash
i=1
while read -r line; do
ifs_back="$IFS"
IFS=","
set -- $line
for ((j=1; j<=$#; j++)); do
echo This is variable "$j" named "${!j}" >> "filename${i}"
done
IFS="$ifs_back"
i=$((i+1))
done < "inputfile.csv"
Explanations:
In order to split the input line on commas, we temporarily set IFS to "," and then assign the fields to the positional parameters $1, $2, ...
The loop counter j for the inner loop starts at 1 and ends at $#, the number of fields.
We can access the value of the positional parameter via ${!j}.
As cleanup after the inner loop, we restore IFS and increment i for the next line.
The code above is flexible in the number of lines and fields, so it would work with the input:
1a,1b
2a,2b
3a,3b
as well as with:
1a,1b,1c
2a,2b,2c
3a,3b,3c
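With the original two-line sample input, a fresh run writes filename1 and filename2; the first would contain the following (note the echo above doesn't append the trailing "ok" shown in the desired output, so add it to the echo if you need it):
$ cat filename1
This is variable 1 named 1a
This is variable 2 named 1b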
Hope this helps.
