AWK: parsing arguments in a loop - bash

I'm trying to write a simple script that will display the fields specified by the user as bash arguments. For example, I've got a text file that looks like this:
1 2 3 4 5
1 2 3 4 5
a b c d e
And for example user types:
./script.sh text 1 2 5
Where $1 = text, and other parameters (like $2 $3 and $4) are the fields, so output will look like this:
1 2 5
1 2 5
a b e
I've got this code, which prints all the columns given as arguments, but one below the other:
#!/bin/bash
text="$1"
shift
for x in $@; do
awk '{print $var}' var="$x" $text
done
Output for example ./script.sh text 1 2 5:
1
1
a
2
2
b
5
5
e
I guess output looks like that because loop "for" is outside of AWK. Is it a good solution for this task to place the loop inside AWK? I tried a few things but always have trouble with the syntax.
Thank you for your time and help!

file="$1"
shift
awk -v flds="$*" 'BEGIN{n=split(flds,f)} {for (i=1;i<=n;i++) printf "%s%s", $(f[i]), (i<n?OFS:ORS)}' "$file"

You don't need to loop over the parameters; pass all of them to awk with the -v option:
awk -v v1="$2" -v v2="$3" -v v3="$4" '{print $v1, $v2, $v3}' "$1"
You may want to perform additional checks such as whether the file ($1) contains enough fields, the file ($1) exists etc. But the idea is the same.
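The additional checks mentioned above could look roughly like the sketch below. The function name print_fields and the error messages are made up for illustration, and the field list is handed to awk as one space-separated string so that any number of fields works, not just three.

```shell
#!/bin/bash
# Hypothetical helper (not from the question): print_fields FILE FIELD...
print_fields() {
    local file=$1
    shift

    # The file must exist and be readable.
    if [ ! -r "$file" ]; then
        echo "cannot read file: $file" >&2
        return 1
    fi

    # At least one field number is required.
    if [ "$#" -lt 1 ]; then
        echo "no field numbers given" >&2
        return 1
    fi

    # Every remaining argument must be a positive integer without leading zeros.
    local f
    for f in "$@"; do
        case $f in
            ''|*[!0-9]*|0*) echo "not a field number: $f" >&2; return 1 ;;
        esac
    done

    # One awk call and one pass over the file; the loop over fields is inside awk.
    # A field number larger than NF simply prints as an empty string.
    awk -v flds="$*" '
        BEGIN { n = split(flds, f, " ") }
        {
            for (i = 1; i <= n; i++)
                printf "%s%s", $(f[i]), (i < n ? OFS : ORS)
        }' "$file"
}
```

Called as print_fields text 1 2 5, this should print the three requested columns side by side on each line.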
Your code reads the file multiple times, each time extracting only one field; to get the desired output, each line must be processed for all requested fields at once.
Pass the columns to awk and split them into an array and print the column corresponding to each value in the array:
file=$1
shift
p="$*"
awk -v l="$p" 'BEGIN{t=split(l,a," ")} {for (i=1;i<=t;i++) printf "%s ", $(a[i]); printf "\n"}' "$file"

Related

Modify values of one column based on values of another column on a line-by-line basis

I'm looking to use bash/awk/sed in order to modify a document.
The document contains multiple columns. Column 5 currently has the value "A" at every row. Column six is composed of increasing numbers. I'm attempting a script that goes through the document line by line, checks the value of Column 6, if the value is greater than a certain integer (specifically 275) the value of Column 5 in that same line is changed to "B".
while IFS="" read -r line ; do
awk 'BEGIN {FS = " "}'
Num=$(awk '{print $6}' original.txt)
if [ $Num > 275 ] ; then
awk '{ gsub("A","B",$5) }'
fi
done < original.txt >> edited.txt
For the above, I've tried setting the residueNum variable both inside and outside of the while loop.
I've also tried using a for loop and cat:
awk 'BEGIN {FS = " "}' original.txt
Num=$(awk '{print $6}' heterodimer_P49913/unrelaxed_model_1.pdb)
integer=275
for data in $Num ; do
if [ $data > $integer ] ; then
##Change value in other column to "B" for all lines containing column 6 values greater than "integer"
fi
done
Thanks in advance.
GNU AWK does not need an external while loop (it loops over the input lines implicitly); if you need further explanation, read the awk info page. Let the content of file.txt be
1 2 3 4 A 100
1 2 3 4 A 275
1 2 3 4 A 300
and task to be
checks the value of Column 6, if the value is greater than a certain
integer (specifically 275) the value of Column 5 in that same line is
changed to "B".
then it might be done using GNU AWK in the following way
awk '$6>275{$5="B"}{print}' file.txt
which gives output
1 2 3 4 A 100
1 2 3 4 A 275
1 2 3 4 B 300
Explanation: the action that sets the 5th field ($5) to B is applied conditionally, only to rows where the value of the 6th field is greater than 275. The print action is applied unconditionally to all lines. Note that the change, if applied, is made before printing.
(tested in GNU Awk 5.0.1)
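If the threshold and the replacement value should not be hard-coded, they can be passed in with -v. The following is only a sketch: the function name relabel and the awk variable names limit and repl are invented here, and columns 5 and 6 are taken from the question.

```shell
# Hypothetical wrapper: relabel FILE LIMIT NEW_VALUE
relabel() {
    awk -v limit="$2" -v repl="$3" '
        $6 + 0 > limit + 0 { $5 = repl }   # "+ 0" forces a numeric comparison
        { print }                          # print every line, changed or not
    ' "$1"
}
```

Usage would be something like relabel original.txt 275 B > edited.txt. Be aware that assigning to $5 makes awk rebuild the line with single spaces between fields, so any column alignment in the original file is lost on the modified lines.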

How can one dynamically create a new csv from selected columns of another csv file?

I dynamically iterate through a csv file and select columns that fit the criteria I need. My CSV is separated by commas.
I save these indexes to an array that looks like
echo "${cols_needed[@]}"
1 3 4 7 8
I then need to write these columns to a new file. I have tried cut, awk and paste commands, but because the array is created dynamically I can't seem to find a command that selects them all at once.
awk -v fields=${cols_needed[@]} 'BEGIN{ n = split(fields,f) }
{ for (i=1; i<=n; ++i) printf "%s%s", $f[i], (i<n?OFS:ORS) }' test.csv
This throws an error, as it cannot split the fields unless I hard-code them as a space-separated string (and even then it only handles 2):
fields="1 2"
I have tried to dynamically create -f parameters, but can only do so with one variable in a loop like so
for item in "${cols_needed[@]}";
do
cat test.csv | cut -f$item
done
which outputs one column at a time.
And I have tried to dynamically create it with commas - input as 1,3,4,7...
cat test.csv | cut -f${cols_needed[@]};
which also does not work!
Any help is appreciated! I understand awk does not work like bash and we cannot pass variables around in the same way. I feel like I'm going around in circles a bit! Thanks in advance.
Your first approach is ok, just:
change -v fields=${cols_needed[@]} to -v fields="${cols_needed[*]}", to pass the array as a single shell word
add FS=OFS="," to BEGIN, after splitting (you want to split on spaces, before FS is changed to ,)
ie. BEGIN {n = split(fields, f); FS=OFS=","}
Also, if there are no commas embedded in quoted csv fields, you can use cut:
IFS=,; cut -d, -f "${cols_needed[*]}" test.csv
If there are embedded commas, you can use gawk's FPAT, to only split fields on unquoted commas.
Here's an example using that.
# prepend $ to each number
for i in "${cols_needed[@]}"; do
fields[j++]="\$$i"
done
IFS=,
gawk -v FPAT='([^,]+)|(\"[^\"]+\")' -v OFS=, "{print ${fields[*]}}" test.csv
Injecting shell code in to an awk command is generally not great practice, but it's ok here IMO.
Expanding on my comments re: passing the bash array into awk:
Passing the array in as an awk variable:
$ cols_needed=(1 3 4 7 8)
$ typeset -p cols_needed
declare -a cols_needed=([0]="1" [1]="3" [2]="4" [3]="7" [4]="8")
$ awk -v fields="${cols_needed[*]}" 'BEGIN{n=split(fields,f); for (i=1;i<=n;i++) print i,f[i]}'
1 1
2 3
3 4
4 7
5 8
Passing the array in as a 'file' via process substitution:
$ awk 'FNR==NR{f[++n]=$1;next} END {for (i=1;i<=n;i++) print i,f[i]}' <(printf "%s\n" "${cols_needed[@]}")
1 1
2 3
3 4
4 7
5 8
As for OP's main question of extracting a specific set of columns from a .csv file ...
Borrowing dawg's .csv file:
$ cat file.csv
1,2,3,4,5,6,7,8
11,12,13,14,15,16,17,18
21,22,23,24,25,26,27,28
Expanding on the suggestion for passing the bash array in as an awk variable:
awk -v fields="${cols_needed[*]}" '
BEGIN { FS=OFS=","
n=split(fields,f," ")
}
{ pfx=""
for (i=1;i<=n;i++) {
printf "%s%s", pfx, $(f[i])
pfx=OFS
}
print ""
}
' file.csv
NOTE: this assumes OP has provided a valid list of column numbers; if there's some doubt as to the validity of the input (column) numbers then OP can add some logic to address said doubts (eg, are they integers? are they positive integers? do they reference a field (in file.csv) that actually exists?, etc)
This generates:
1,3,4,7,8
11,13,14,17,18
21,23,24,27,28
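The validation hinted at in the NOTE above could be sketched like this. It is only one possible approach: the function name select_cols_checked, the messages and the exit codes are assumptions, and /dev/stderr is assumed to be usable as an output file (gawk handles it specially, and it exists as a device on Linux).

```shell
cols_needed=(1 3 4 7 8)   # example values

# Hypothetical wrapper: select_cols_checked FILE
select_cols_checked() {
    awk -v fields="${cols_needed[*]}" '
        BEGIN {
            FS = OFS = ","
            n = split(fields, f, " ")
            # Reject anything that is not a positive integer.
            for (i = 1; i <= n; i++)
                if (f[i] !~ /^[1-9][0-9]*$/) {
                    printf("invalid column number: %s\n", f[i]) > "/dev/stderr"
                    exit 1
                }
        }
        {
            # Stop if this line has fewer fields than a requested column.
            for (i = 1; i <= n; i++)
                if (f[i] + 0 > NF) {
                    printf("line %d has only %d fields\n", FNR, NF) > "/dev/stderr"
                    exit 2
                }
            for (i = 1; i <= n; i++)
                printf "%s%s", $(f[i]), (i < n ? OFS : ORS)
        }
    ' "$1"
}
```

Run as select_cols_checked file.csv, it should behave like the version above for valid input and stop with a non-zero exit status otherwise.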
Suppose you have this variable in bash:
$ echo "${cols_needed[@]}"
3 4 7 8
And this CSV file:
$ cat file.csv
1,2,3,4,5,6,7,8
11,12,13,14,15,16,17,18
21,22,23,24,25,26,27,28
You can select columns of that csv file in awk this way:
awk '
BEGIN{FS=OFS=","}
FNR==NR{split($0, cols," "); next}
{
s=""
for (e=1;e<=length(cols); e++)
s=e<length(cols) ? s $(cols[e]) OFS : s $(cols[e])
print s
}' <(echo "${cols_needed[@]}") file.csv
Prints:
3,4,7,8
13,14,17,18
23,24,27,28
Or, you can do:
awk -v cw="${cols_needed[*]}" '
BEGIN{FS=OFS=","; split(cw, cols," ")}
{
s=""
for (e=1;e<=length(cols); e++)
s=e<length(cols) ? s $(cols[e]) OFS : s $(cols[e])
print s
}' file.csv
# same output
BTW, you can do this entirely with cut:
cut -d ',' -f $(IFS=, ; echo "${cols_needed[*]}") file.csv
3,4,7,8
13,14,17,18
23,24,27,28
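One difference between the two tools is worth knowing before choosing: cut always emits the selected fields in the order they appear in the input, whatever order they were requested in, while the awk versions print them in the order given. A small demonstration with inline sample data (the variable names cols, line, by_cut and by_awk are just for this example):

```shell
cols=(8 3)                 # deliberately not in ascending order
line='1,2,3,4,5,6,7,8'

# cut: fields come out in file order
by_cut=$(printf '%s\n' "$line" | cut -d ',' -f "$(IFS=,; echo "${cols[*]}")")

# awk: fields come out in the order requested
by_awk=$(printf '%s\n' "$line" |
    awk -v cw="${cols[*]}" '
        BEGIN { FS = OFS = ","; n = split(cw, c, " ") }
        { for (i = 1; i <= n; i++) printf "%s%s", $(c[i]), (i < n ? OFS : ORS) }')

echo "cut: $by_cut"   # cut: 3,8
echo "awk: $by_awk"   # awk: 8,3
```

So if the column indexes are collected in an order that matters, awk is probably the safer choice.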

bash cycle - output according to string from file

How can I name the output file after the string in the 4th column of the output (that is, the 4th column of the i-th row of the input)?
I tried:
for i in {1..321}; do
awk '(FNR==i) {outfile = $4 print $0 >> outfile}' RV1_phase;
done
or
for i in {1..321}; do
awk '(FNR==i) {outfile = $4; print $0}' RV1_phase > "$outfile";
done
input file:
1 2 2 a
4 5 6 f
4 4 5 f
....
....
desired output for i=1
name: a
1 2 2 a
The aim: I have data that I plot in gnuplot, and I would like to plot a set of figures named after that string so I know which point comes from which file. The points will be coloured. I need files for plotting in gnuplot, so I would like to create them using the loop from my question.
Simply
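for i in {1..321}; do
awk -v i="$i" 'FNR==i {print $0 >> ($4)}' RV1_phase;
done
The problem with your first attempt was that you didn't use a ; to separate the assignment to outfile from the print command. The separate variable isn't necessary, though. Note also that awk cannot see the shell variable i on its own; it has to be passed in, for example with -v i="$i".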
You don't need a bash loop, either:
awk '1 <= FNR && FNR <= 321 {print $0 >> ($4)}' RV1_phase
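If the 4th column takes many distinct values, awk may end up with a lot of output files open at once. A possible variant, sketched below, closes each file after writing to it. The function name split_by_col4 and the optional line limit argument are made up for this example; the default of 321 comes from the question.

```shell
# Hypothetical wrapper: split_by_col4 FILE [MAXLINES]
split_by_col4() {
    awk -v max="${2:-321}" '
        FNR > max { exit }        # stop after the requested number of lines
        NF >= 4 {
            out = $4              # file name taken from the 4th column
            print $0 >> out       # append, so reopening does not truncate
            close(out)            # keep the number of open files small
        }
    ' "$1"
}
```

For example, split_by_col4 RV1_phase writes into the current directory. Because it appends, running it twice will duplicate the lines, so remove old output files first.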

Sum values of specific columns using awk

So I have a file which looks like this:
1 4 6
2 5
3
I want to sum only specific columns, let's say the first and third.
And the output should look like this:
7
2
3
I store numbers of columns (arguments) in a variable:
x=${@:2} (because I omit the first passed argument, which is the $filename)
How can I calculate this using awk in a bash script?
I was thinking about something like this:
for i in ${@:2}
do
awk -v c=$i '{sum+=$c;print sum}' $fname
done
But it does not work properly.
How about something like this:
$ awk -v c="1 3" 'BEGIN{split(c,a)}{c=0;for(i in a) c+=$a[i]; print c}' file
7
2
3
Explained:
$ awk -v c="1 3" ' # the desired column list space-separated
BEGIN {
split(c,a) # if not space-separated, change it here
}
{
c=0; # reusing col var as count var. recycle or die!
for(i in a) # after split desired cols are in a arr, ie. a[1]=1, a[2]=3
c+=$a[i]; # sum em up
print c # print it
}' file
EDIT: changed comma-separation to space-separation.
awk '{print $1 + $3}' file
7
2
3
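To tie this back to the script arguments from the question: the loop in the question runs awk once per column and never resets sum, which is why it prints running totals per column instead of one sum per line. A sketch that takes the file name first and the column numbers after it (the function name sum_cols is invented here):

```shell
# Hypothetical helper: sum_cols FILE COLUMN...
sum_cols() {
    local fname=$1
    shift
    awk -v cols="$*" '
        BEGIN { n = split(cols, a, " ") }
        {
            sum = 0                        # reset for every line
            for (i = 1; i <= n; i++)
                sum += $(a[i])             # a missing field counts as 0
            print sum
        }' "$fname"
}
```

Inside a script this could be called as sum_cols "$@", so that ./script.sh file 1 3 sums the first and third columns of each line.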

find numbers divisible by 3 in csv file using shell script

I have csv file having content like below :
1|2|3
4|5|6
7|8|9
Now I would like to find the numbers which are divisible by 3 using shell scripting.
I would like to use awk command for this. I am learning shell scripting. So could you please help me out to find solution.
awk -F'|' '{for(i=1;i<=NF;i++)if(!($i%3))print $i}' file
this awk one-liner should do.
With your example, the cmd outputs:
3
6
9
Using GNU awk, which allows for a multi-character record separator:
awk -v RS='[|[:space:]]+' '$0 % 3 == 0' file
This sets the record separator to one or more pipes or space characters, printing each record that divides evenly by 3.
You can use this awk:
awk -v RS='\\||\n' '$0 % 3 == 0' file
3
6
9
-v RS='\\||\n' sets the input record separator to | or newline, thus giving us each number in $0.
$0 % 3 == 0 (modulo) makes sure a record is printed only when it divides evenly by 3.
Here's how to read a CSV file into a variable:
http://www.cyberciti.biz/faq/unix-linux-bash-read-comma-separated-cvsfile/
Here's how to find the length of the array to which the CSVs are saved:
L=${#array[@]}
Now run a for loop like this (bash arrays are indexed from 0, and the arithmetic has to go inside $(( ))):
for i in $(seq 0 $((L - 1))); do
if [ $(( ${array[i]} % 3 )) -eq 0 ]; then
echo "${array[i]}"
fi
done
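Putting the reading and the loop together, a pure-bash version might look like the sketch below. It assumes the file uses | as the separator, as in the question, and that the values are plain non-negative decimal integers; anything else is skipped. The function name divisible_by_3 is made up.

```shell
# Hypothetical helper: divisible_by_3 FILE
divisible_by_3() {
    local -a array
    local n
    # Split each line on | into the array, then test every element.
    while IFS='|' read -r -a array; do
        for n in "${array[@]}"; do
            [[ $n =~ ^[0-9]+$ ]] || continue   # skip empty or non-numeric values
            if (( 10#$n % 3 == 0 )); then      # 10# avoids octal parsing of e.g. 09
                echo "$n"
            fi
        done
    done < "$1"
}
```

With the sample file from the question, divisible_by_3 file should print 3, 6 and 9 on separate lines.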

Resources