Read a multi-variable CSV in bash and build multi-line files from it - bash

I had what I thought was a simple task which I could easily do, since I had done something similar before.
I have an input file input.csv
1a,1b
2a,2b
I would like the following output
Output file 1
This is variable 1 named 1a ok
This is variable 2 named 1b ok
Output file 2
This is variable 1 named 2a ok
This is variable 2 named 2b ok
I thought I could do something similar to below
i=1
while IFS=, read var1 var2; do
echo This is variable 1 named "var1" > filenamei
echo This is variable 2 named "var2" >> filenamei
i=i+1
done </inputfile.csv
I previously wrote code to take a single variable from a long file and write output to a single file and it worked fine. Like below
Input file
a
b
Single output file
This is A
This is B
Script was
while read p; do
  echo this is "$p" >> outputfile
done < inputfile
I've been through lots of different errors but am getting nowhere.

This can be done with a double loop: the outer loop iterates over lines and the inner one over the comma-separated fields. How about:
#!/bin/bash
i=1
while read -r line; do
  ifs_back="$IFS"
  IFS=","
  set -- $line
  for ((j=1; j<=$#; j++)); do
    echo This is variable "$j" named "${!j}" >> "filename${i}"
  done
  IFS="$ifs_back"
  i=$((i+1))
done < "inputfile.csv"
Explanations:
In order to split the input line on commas, we temporarily set IFS to "," and then assign the fields to the positional parameters $1, $2, ...
The loop counter j for the inner loop starts at 1 and ends at $#, the number of fields.
We can access the value of each positional parameter via the indirect expansion ${!j} (see the small sketch below).
As a clean-up of the inner loop, we restore IFS and increment i for the next line.
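As a minimal illustration of that indirect expansion (with hypothetical values, not part of the original post):
set -- 1a 1b        # positional parameters: $1=1a, $2=1b
j=2
echo "${!j}"        # indirect expansion: prints the value of $2, i.e. "1b"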
The code at the top of this answer is flexible with respect to the number of lines and fields, so it would also work with the input:
1a,1b
2a,2b
3a,3b
as well as with:
1a,1b,1c
2a,2b,2c
3a,3b,3c
Hope this helps.

Related

How to assign each of multiple lines in a file to a different variable?

This is probably a very simple question. I looked at other answers but couldn't come up with a solution. I have a 365-line date file, like below:
01-01-2000
02-01-2000
I need to read this file line by line and assign each day to a separate variable. like this,
d001=01-01-2000
d002=02-01-2000
I tried while read commands but couldn't get them to work. It takes a lot of time to assign them one by one. How can I do it quickly?
Trying to create individually named variables like that is a waste of time and not really supported. Better to use an associative array:
#!/bin/bash
declare -A array
while read -r line; do
  printf -v key 'd%03d' $((++c))
  array[$key]=$line
done < file
Output
for i in "${!array[@]}"; do echo "key=$i value=${array[$i]}"; done
key=d001 value=01-01-2000
key=d002 value=02-01-2000
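An individual entry can then be looked up by its key, for example:
echo "${array[d001]}"    # prints 01-01-2000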
Assumptions:
an array is acceptable
array index should start with 1
Sample input:
$ cat sample.dat
01-01-2000
02-01-2000
03-01-2000
04-01-2000
05-01-2000
One bash/mapfile option:
unset d # make sure variable is not currently in use
mapfile -t -O1 d < sample.dat # load each line from file into separate array location
This generates:
$ typeset -p d
declare -a d=([1]="01-01-2000" [2]="02-01-2000" [3]="03-01-2000" [4]="04-01-2000" [5]="05-01-2000")
$ for i in "${!d[@]}"; do echo "d[$i] = ${d[i]}"; done
d[1] = 01-01-2000
d[2] = 02-01-2000
d[3] = 03-01-2000
d[4] = 04-01-2000
d[5] = 05-01-2000
In OP's code, references to $d001 now become ${d[1]}.
A quick one-liner would be:
eval $(awk 'BEGIN{cnt=0}{printf "d%3.3d=\"%s\"\n",cnt,$0; cnt++}' your_file)
eval makes the shell variables known inside your script or shell. Use echo $d000 to show the first one of the newly defined variables. There should be no shell special characters (like * and $) inside your_file. Remove eval $() to see the result of the awk command. The \" quoted %s is to allow spaces in the variable values. If you don't have any spaces in your_file you can remove the \" before and after %s.
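For reference, running just the awk part (without the eval) on the two sample lines shown above would print assignments like:
$ awk 'BEGIN{cnt=0}{printf "d%3.3d=\"%s\"\n",cnt,$0; cnt++}' your_file
d000="01-01-2000"
d001="02-01-2000"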

How can I assign each column value to its name?

I have a MetaData.csv file that contains many values to perform an analysis. All I want to do is:
1- Read the column names and make variables named after them.
2- Put the values in each column into variables that can be read by other commands, i.e. column_name=its_value.
MetaData.csv:
MAF,HWE,Geno_Missing,Inds_Missing
0.05,1E-06,0.01,0.01
I wrote the following code but it doesn't work well:
#!/bin/bash
Col_Names=$(head -n 1 MetaData.csv)   # Cut header (comma sep)
Col_Names=$(echo ${Col_Names//,/ })   # Convert header to space sep
Col_Names=($Col_Names)                # Convert header to an array
for i in $(seq 1 ${#Col_Names[@]}); do
  N="$(head -1 MetaData.csv | tr ',' '\n' | nl | grep -w "${Col_Names[$i]}" | tr -d " " | awk -F " " '{print $1}')"
  ${Col_Names[$i]}="$(cat MetaData.csv | cut -d"," -f$N | sed '1d')"
done
Output:
HWE=1E-06: command not found
Geno_Missing=0.01: command not found
Inds_Missing=0.01: command not found
cut: 2: No such file or directory
cut: 3: No such file or directory
cut: 4: No such file or directory
=: command not found
Expected output:
MAF=0.05
HWE=1E-06
Geno_Missing=0.01
Inds_Missing=0.01
Problems:
1- I want to use the array length (${#Col_Names[@]}) as the final iteration value, which is 5, but the array index starts from 0 (0-4), so the MAF column was not captured by the loop. The loop also iterates twice (once over 0-4 and again over 2-4!).
2- When I tried to call values in variables (echo $MAF), they were empty!
Any solution is really appreciated.
This produces the expected output you posted from the sample input you posted:
$ awk -F, -v OFS='=' 'NR==1{split($0,hdr); next} {for (i=1;i<=NF;i++) print hdr[i], $i}' MetaData.csv
MAF=0.05
HWE=1E-06
Geno_Missing=0.01
Inds_Missing=0.01
If that's not all you need then edit your question to clarify your requirements.
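If the goal is to go further and actually create shell variables from that output, one possible sketch (assuming the header names are valid variable names and the values contain no spaces or shell metacharacters, which holds for the sample file) is to emit assignments and source them:
source <(awk -F, 'NR==1{split($0,hdr); next} {for (i=1;i<=NF;i++) print hdr[i]"="$i}' MetaData.csv)
echo "$MAF"    # 0.05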
If I'm understanding your requirements correctly, would you please try something like:
#!/bin/bash
nr=1                              # initialize input line number to 1
while IFS=, read -r -a ary; do    # split the line on "," and assign the fields to the array "ary"
  if (( nr == 1 )); then          # handle the header line
    col_names=("${ary[@]}")       # assign column names
  else                            # handle the body lines
    for (( i = 0; i < ${#ary[@]}; i++ )); do
      printf -v "${col_names[i]}" "${ary[i]}"
      # assign the input field to the variable named "${col_names[i]}"
    done
    # now you can access the values via their column names
    echo "Fnames=$Fnames"
    echo "MAF=$MAF"
    fname_list+=("$Fnames")       # create a list of Fnames
  fi
  (( nr++ ))                      # increment the input line number
done < MetaData.csv
echo "${fname_list[@]}"           # print the list of Fnames
Output:
Fnames=19.vcf.gz
MAF=0.05
Fnames=20.vcf.gz
MAF=
Fnames=21.vcf.gz
MAF=
Fnames=22.vcf.gz
MAF=
19.vcf.gz 20.vcf.gz 21.vcf.gz 22.vcf.gz
The statement IFS=, read -r -a ary is mostly equivalent to your
first three lines; it splits the input on "," and assigns the
field values to the array variable ary.
There are several ways to use a variable's value as a variable name
(Indirect Variable References). printf -v VarName Value is one of them.
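A tiny standalone illustration of printf -v (the variable name and value here are just examples):
colname="MAF"
printf -v "$colname" '0.05'   # assigns the string 0.05 to the variable named MAF
echo "$MAF"                   # prints 0.05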
[EDIT]
Based on the OP's updated input file, here is an another version:
#!/bin/bash
nr=1                              # initialize input line number to 1
while IFS=, read -r -a ary; do    # split the line on "," and assign the fields to the array "ary"
  if (( nr == 1 )); then          # handle the header line
    col_names=("${ary[@]}")       # assign column names
  else                            # handle the body lines
    for (( i = 0; i < ${#ary[@]}; i++ )); do
      printf -v "${col_names[i]}" "${ary[i]}"
      # assign the input field to the variable named "${col_names[i]}"
    done
  fi
  (( nr++ ))                      # increment the input line number
done < MetaData.csv
for n in "${col_names[@]}"; do    # iterate over the variable names
  echo "$n=${!n}"                 # print variable name and its value
done
# you can also specify the variable names literally as follows:
echo "MAF=$MAF HWE=$HWE Geno_Missing=$Geno_Missing Inds_Missing=$Inds_Missing"
Output:
MAF=0.05
HWE=1E-06
Geno_Missing=0.01
Inds_Missing=0.01
MAF=0.05 HWE=1E-06 Geno_Missing=0.01 Inds_Missing=0.01
As for the output, the first four lines are printed by echo "$n=${!n}" and the last line is printed by echo "MAF=$MAF ....
You can choose either statement depending on your usage of the variables in the following code.
I don't really think you can implement a robust CSV reader/parser in Bash, but you can implement one that works to some extent with simple CSV files. For example, a very simple bash-implemented CSV parser might look like this:
#!/bin/bash
set -e
ROW_NUMBER='0'
HEADERS=()
while IFS=',' read -ra ROW; do
  if test "$ROW_NUMBER" == '0'; then
    for (( I = 0; I < ${#ROW[@]}; I++ )); do
      HEADERS["$I"]="${ROW[I]}"
    done
  else
    declare -A DATA_ROW_MAP
    for (( I = 0; I < ${#ROW[@]}; I++ )); do
      DATA_ROW_MAP[${HEADERS["$I"]}]="${ROW[I]}"
    done
    # DEMO {
    echo -e "${DATA_ROW_MAP['Fnames']}\t${DATA_ROW_MAP['Inds_Missing']}"
    # } DEMO
    unset DATA_ROW_MAP
  fi
  ROW_NUMBER=$((ROW_NUMBER + 1))
done
Note that it has multiple disadvantages:
it only works with ,-separated fields (truly "C"SV);
it cannot handle multiline records;
it cannot handle field escapes;
it considers the first row always represents a header row.
This is why many commands produce and consume \0-delimited data instead, just because this control character may be easier to use. Now what I'm not sure about is whether test is the only external command executed by bash (I believe it is, but it could probably be re-implemented using case so that no external test is executed).
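As an aside, a typical pattern for consuming \0-delimited data in bash looks like this (illustrative only, not part of the parser above):
while IFS= read -r -d '' file; do
  printf 'got: %s\n' "$file"
done < <(find . -type f -print0)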
Example of use (with the demo output):
./read-csv.sh < MetaData.csv
19.vcf.gz 0.01
20.vcf.gz
21.vcf.gz
22.vcf.gz
I wouldn't recommend using this parser at all, but would recommend using a more CSV-oriented tool (Python would probably be the easiest choice; or, if your favorite language, as you mentioned, is R, then this is probably another option for you: Run R script from command line).

How to create a TSV file using Bash that takes as input some shell variables?

I have some shell variables which equal different types of values :
variable 1: 72.9%
variable 2: 27.1%
variable 3: Y
variable 4: 8756
I want to be able to print the values of these variables to a tab-separated file, and possibly even have the names of the variables as column headers.
output:
variable1 variable2 variable3 variable4
72.9% 27.1% Y 8756
Any ideas?
Relatively easy, you just need to read the values one line-at-a-time into individual array variables and then provide the formatted output, e.g.
#!/bin/bash
declare -a name
declare -a num
declare -a value
while read -r a b c; do
  name+=( "$a" )
  num+=( "$b" )
  value+=( "$c" )
done < "$1"
## C-style loop used to index both name & num for headings
for ((i = 0; i < ${#name[@]}; i++)); do
  printf "%s\t" "${name[i]}${num[i]%:}"
done
echo
for i in "${value[@]}"; do
  printf "%s\t\t" "$i"
done
echo
Which will result in tab separated headings and values (you may need to play with the spacing a bit -- e.g. using 2 tabs on value output)
Example Use/Output
$ bash headings.sh csvdata.txt
variable1 variable2 variable3 variable4
72.9% 27.1% Y 8756
If you have the variables in the script itself, you will have to take the same approach. With a variable you have the name, but you will still need to create an array holding the names as well as the values in order to loop over them and produce the output you want. Whether you write a temporary file and read the values in, or use arrays to store the variable names (built by string concatenation with a number, as above), the process will be the same.
Variables Already In Script
As mentioned above, you will take a similar approach, only here, you choose the heading prefix, and just use the loop counter to add the number at the end of whatever name you choose, then simply loop over the values you have stored in the array, e.g.
#!/bin/bash
foo="72.9%"
bar="27.1%"
baz="Y"
buz="8756"
declare -a value
value=( "$foo" "$bar" "$baz" "$buz" )
for ((i = 0; i < ${#value[@]}; i++)); do
  printf "%s\t" "variable$((i+1))"
done
echo
for i in "${value[@]}"; do
  printf "%s\t\t" "$i"
done
echo
Example Use/Output
(the same)
$ bash headings2.sh
variable1 variable2 variable3 variable4
72.9% 27.1% Y 8756
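A more compact alternative is a single printf per row (a sketch assuming exactly these four variables and that no value contains a tab; out.tsv is just an example file name):
printf 'variable1\tvariable2\tvariable3\tvariable4\n' > out.tsv
printf '%s\t%s\t%s\t%s\n' "$foo" "$bar" "$baz" "$buz" >> out.tsv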

How can I set a variable = null in for loop?

I have this code in Elastix2.5 (CentOS):
for variable in $(while read line; do myarray[ $index]="$line"; index=$(($index+1)); echo "$line"; done < prueba);
This extracts the values from each line of the "prueba" file.
The prueba file contains passwords like this:
Admin1234
Hello543
Chicken5444
Dino6759
3434Cars4
Adminis5555
But $variable only gets values from lines that contain text; I need it to also get NULL (empty) values from the blank lines. How can I do it?
Your problem is the use of a for loop with a command substitution ($(...)); let's look at this simple example:
$ for v in $(echo 'line_1'; echo ''; echo 'line_3'); do echo "$v"; done
line_1
line_3
Note how the empty string produced by the 2nd echo command is effectively discarded.
Analogously, any empty lines produced by your while loop are discarded.
The solution is to avoid for loops altogether for parsing command output:
In your case, simply use only the while loop for iterating over the input file:
while read -r line; do
  myarray[index++]="$line"
done < prueba
printf '%s\n' "${myarray[@]}"
-r was added to ensure that read doesn't modify the input (doesn't try to interpret \-prefixed sequences) - this is good practice in general.
Note how incrementing the index was moved directly into the array subscript (index++).
printf '%s\n' "${myarray[@]}" prints all array elements after the file's been read, demonstrating that empty lines were read as well.
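To see the contrast with the for loop example above, feeding the same three lines to a while read loop keeps the empty line:
$ printf 'line_1\n\nline_3\n' | while read -r v; do echo "[$v]"; done
[line_1]
[]
[line_3]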
You can use is_null function.
is_null($a)
http://php.net/manual/en/function.is-null.php

Loop two variables through one command in shell

I want to run a shell script that can simultaneously loop through two variables, so that I can have an input and an output file name.
I feel like this isn't too hard of a concept, but any help is appreciated.
Files = "File1,
File2,
...
FileN
"
Output = OutFile1,
Outfile2,
...
OutfileN
"
and in theory my code would be:
for File in $Files
do
COMMAND --file $File --ouput $Output
done
Obviously, there needs to be another loop but I'm stuck, any help is appreciated.
You don't really need to loop 2 variables, just use 2 BASH arrays:
input=("File1" "File2" "File3")
output=("OutFile1" "OutFile2" "OutFile3")
for ((i=0; i<${#input[@]}; i++)); do
  echo "Processing input=${input[$i]} and output=${output[$i]}"
done
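For reference, the loop above prints:
Processing input=File1 and output=OutFile1
Processing input=File2 and output=OutFile2
Processing input=File3 and output=OutFile3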
zsh enables multiple loop variables before the list.
#!/bin/zsh
input2output=(
  'File1' 'Outfile1'
  'File2' 'Outfile2'
)
for input output in $input2output
do
  echo "[$input] --> [$output]"
done
Quoting from the zsh (5.9) manual, man zshmisc:
for name ... [ in word ... ] term do list done
More than one parameter name can appear before the list of words. If N names are given, then on each execution of the loop the next N words are assigned to the corresponding parameters. If there are more names than remaining words, the remaining parameters are each set to the empty string.
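A quick illustration of that last rule (more names than remaining words) in zsh, with made-up values:
for a b in 1 2 3; do echo "[$a][$b]"; done
# [1][2]
# [3][]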
