So I have a file which looks like this:
1 4 6
2 5
3
I want to sum only specific columns, let's say the first and third.
And the output should look like this:
7
2
3
I store numbers of columns (arguments) in a variable:
x=${@:2} (because I omit the first argument, which is the $filename)
How can I calculate this using awk in a bash script?
I was thinking about something like this:
for i in ${@:2}
do
awk -v c=$i '{sum+=$c;print sum}' $fname
done
But it does not work properly.
How about something like this:
$ awk -v c="1 3" 'BEGIN{split(c,a)}{c=0;for(i in a) c+=$a[i]; print c}' file
7
2
3
Explained:
$ awk -v c="1 3" ' # the desired column list space-separated
BEGIN {
split(c,a) # if not space-separated, change it here
}
{
c=0; # reusing col var as count var. recycle or die!
for(i in a) # after split desired cols are in a arr, ie. a[1]=1, a[2]=3
c+=$a[i]; # sum em up
print c # print it
}' file
EDIT: changed comma-separation to space-separation.
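The one-liner above can be wrapped so that the column list comes from the script's arguments, as the question asks. This is only a sketch: sum_cols is a hypothetical helper name, and it assumes every argument after the file name is a valid column number.

```shell
# sum_cols FILE COL... : for each line of FILE, print the sum of the listed columns.
# sum_cols is a hypothetical name chosen for this sketch.
sum_cols() {
    local fname=$1
    shift                 # the remaining arguments are the column numbers
    local IFS=' '         # make "$*" join the arguments with single spaces
    awk -v cols="$*" '
        BEGIN { n = split(cols, a, " ") }     # a[1]..a[n] hold the column numbers
        {
            sum = 0
            for (i = 1; i <= n; i++)
                sum += $(a[i])                # a missing field counts as 0
            print sum
        }' "$fname"
}

# usage: sum_cols file 1 3
```

Iterating with a counter instead of for (i in a) keeps the columns in the order given, which does not matter for a sum but avoids relying on awk's unspecified array traversal order.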
awk '{print $1 + $3}' file
7
2
3
I dynamically iterate through a csv file and select columns that fit the criteria I need. My CSV is separated by commas.
I save these indexes to an array that looks like
echo "${cols_needed[@]}"
1 3 4 7 8
I then need to write these columns to a new file. Because the array is created dynamically, I can't find a command that selects them all at once. I have tried cut, awk and paste.
awk -v fields=${cols_needed[@]} 'BEGIN{ n = split(fields,f) }
{ for (i=1; i<=n; ++i) printf "%s%s", $f[i], (i<n?OFS:ORS) }' test.csv
This throws an error, as it cannot split the fields unless I hard-code them as a space-separated string (and even then it only handles 2):
fields="1 2"
I have tried to dynamically create -f parameters, but can only do so with one variable in a loop like so
for item in "${cols_needed[@]}";
do
cat test.csv | cut -f$item
done
which outputs one column at a time.
And I have tried to dynamically create it with commas - input as 1,3,4,7...
cat test.csv | cut -f${cols_needed[@]};
which also does not work!
Any help is appreciated! I understand awk does not work like bash and we cannot pass variables around in the same way. I feel like I'm going around in circles a bit! Thanks in advance.
Your first approach is ok, just:
change -v fields=${cols_needed[@]} to -v fields="${cols_needed[*]}", to pass the array as a single shell word
add FS=OFS="," to BEGIN, after splitting (split uses the current FS, so it has to split on spaces before FS is changed to ,)
i.e. BEGIN {n = split(fields, f); FS=OFS=","}
Also, if there are no commas embedded in quoted csv fields, you can use cut:
IFS=,; cut -d, -f "${cols_needed[*]}" test.csv
If there are embedded commas, you can use gawk's FPAT, to only split fields on unquoted commas.
Here's an example using that.
# prepend $ to each number
for i in "${cols_needed[@]}"; do
fields[j++]="\$$i"
done
IFS=,
# the input file has to be named, otherwise gawk reads standard input
gawk -v FPAT='([^,]+)|(\"[^\"]+\")' -v OFS=, "{print ${fields[*]}}" test.csv
Injecting shell code in to an awk command is generally not great practice, but it's ok here IMO.
Expanding on my comments re: passing the bash array into awk:
Passing the array in as an awk variable:
$ cols_needed=(1 3 4 7 8)
$ typeset -p cols_needed
declare -a cols_needed=([0]="1" [1]="3" [2]="4" [3]="7" [4]="8")
$ awk -v fields="${cols_needed[*]}" 'BEGIN{n=split(fields,f); for (i=1;i<=n;i++) print i,f[i]}'
1 1
2 3
3 4
4 7
5 8
Passing the array in as a 'file' via process substitution:
$ awk 'FNR==NR{f[++n]=$1;next} END {for (i=1;i<=n;i++) print i,f[i]}' <(printf "%s\n" "${cols_needed[@]}")
1 1
2 3
3 4
4 7
5 8
As for OP's main question of extracting a specific set of columns from a .csv file ...
Borrowing dawg's .csv file:
$ cat file.csv
1,2,3,4,5,6,7,8
11,12,13,14,15,16,17,18
21,22,23,24,25,26,27,28
Expanding on the suggestion for passing the bash array in as an awk variable:
awk -v fields="${cols_needed[*]}" '
BEGIN { FS=OFS=","
n=split(fields,f," ")
}
{ pfx=""
for (i=1;i<=n;i++) {
printf "%s%s", pfx, $(f[i])
pfx=OFS
}
print ""
}
' file.csv
NOTE: this assumes OP has provided a valid list of column numbers; if there's some doubt as to the validity of the input (column) numbers then OP can add some logic to address said doubts (eg, are they integers? are they positive integers? do they reference a field (in file.csv) that actually exists?, etc)
This generates:
1,3,4,7,8
11,13,14,17,18
21,23,24,27,28
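The validation mentioned in the NOTE could look roughly like the following. It is a sketch under assumptions: select_cols is a hypothetical wrapper name, the exit codes 2 and 3 are arbitrary, and the checks are limited to "positive integer" and "the line has enough fields".

```shell
# select_cols CSV COL... : print the listed columns of a comma-separated file,
# refusing column numbers that are not positive integers or that do not exist.
# select_cols and the exit codes are assumptions made for this sketch.
select_cols() {
    local csv=$1
    shift
    local IFS=' '         # make "$*" join the column numbers with spaces
    awk -v fields="$*" '
        BEGIN {
            FS = OFS = ","
            n = split(fields, f, " ")
            for (i = 1; i <= n; i++) {
                if (f[i] !~ /^[1-9][0-9]*$/) {
                    print "invalid column number: " f[i] > "/dev/stderr"
                    exit 2
                }
                if (f[i] + 0 > max) max = f[i] + 0    # highest requested column
            }
        }
        NF < max {
            print "line " FNR " has fewer than " max " fields" > "/dev/stderr"
            exit 3
        }
        { for (i = 1; i <= n; i++) printf "%s%s", $(f[i]), (i < n ? OFS : ORS) }
    ' "$csv"
}

# usage: select_cols file.csv "${cols_needed[@]}"
```

Checking the column list once in BEGIN and the field count once per line keeps the per-record loop identical to the unchecked version.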
Suppose you have this variable in bash:
$ echo "${cols_needed[@]}"
3 4 7 8
And this CSV file:
$ cat file.csv
1,2,3,4,5,6,7,8
11,12,13,14,15,16,17,18
21,22,23,24,25,26,27,28
You can select columns of that csv file in awk this way:
awk '
BEGIN{FS=OFS=","}
FNR==NR{split($0, cols," "); next}
{
s=""
for (e=1;e<=length(cols); e++)
s=e<length(cols) ? s $(cols[e]) OFS : s $(cols[e])
print s
}' <(echo "${cols_needed[@]}") file.csv
Prints:
3,4,7,8
13,14,17,18
23,24,27,28
Or, you can do:
awk -v cw="${cols_needed[*]}" '
BEGIN{FS=OFS=","; split(cw, cols," ")}
{
s=""
for (e=1;e<=length(cols); e++)
s=e<length(cols) ? s $(cols[e]) OFS : s $(cols[e])
print s
}' file.csv
# same output
BTW, you can do this entirely with cut:
cut -d ',' -f $(IFS=, ; echo "${cols_needed[*]}") file.csv
3,4,7,8
13,14,17,18
23,24,27,28
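One caveat when choosing between the two approaches: cut writes the selected fields in the order they appear in the input and writes each field only once, whatever the order of the list, whereas the awk loops follow the list as given. A small sketch of the difference, where with_cut and with_awk are hypothetical helper names and the column list is made up for illustration:

```shell
# Hypothetical column list: out of order, with a repeated column.
cols_needed=(7 3 3)

# cut: IFS is changed only inside the subshell, so the caller's IFS is untouched.
with_cut() {
    (IFS=,; cut -d ',' -f "${cols_needed[*]}" "$1")
}

# awk: fields are printed in exactly the order listed, repeats included.
with_awk() {
    awk -v cw="${cols_needed[*]}" '
        BEGIN { FS = OFS = ","; n = split(cw, cols, " ") }
        { for (e = 1; e <= n; e++) printf "%s%s", $(cols[e]), (e < n ? OFS : ORS) }
    ' "$1"
}

# For the input line 1,2,3,4,5,6,7,8:
#   with_cut file.csv   prints 3,7
#   with_awk file.csv   prints 7,3,3
```

If the columns only ever need to come out in file order, cut is the shorter option; otherwise the awk version is the one that matches the list.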
How can I name the output file after the string in the 4th column of the output (that is, the 4th column of the i-th row of the input)?
I tried:
for i in {1..321}; do
awk '(FNR==i) {outfile = $4 print $0 >> outfile}' RV1_phase;
done
or
for i in {1..321}; do
awk '(FNR==i) {outfile = $4; print $0}' RV1_phase > "$outfile";
done
input file:
1 2 2 a
4 5 6 f
4 4 5 f
....
....
desired output for i=1
name: a
1 2 2 a
The aim: I have data that I plot in gnuplot, and I would like to plot a set of figures named after the string, so that I know which point comes from which file. The points will be coloured. I need files to plot in gnuplot, so I would like to create them using the loop from my question.
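Simply
for i in {1..321}; do
awk -v i="$i" '(FNR==i) {print $0 >> $4}' RV1_phase;
done
The problem with your first attempt was that you didn't use a ; to separate the assignment to outfile from the print command. The separate variable isn't necessary, though. Note also that the shell's i is not visible inside the single-quoted awk program, so it has to be passed in with -v.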
You don't need a bash loop, either:
awk '1 <= FNR && FNR <= 321 {print $0 >> $4}' RV1_phase;
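If the 4th column takes many distinct values, keeping every output file open can run into the limit on open files in some awk implementations. A hedged variant that closes each file after writing is sketched below; split_by_col4 is a hypothetical name, the default of 321 rows is taken from the question, and the sketch assumes the 4th field is safe to use as a file name.

```shell
# split_by_col4 FILE [MAX] : append each of the first MAX rows of FILE (default 321)
# to a file in the current directory named after the row's 4th field.
# split_by_col4 is a hypothetical name used for this sketch.
split_by_col4() {
    awk -v max="${2:-321}" '
        FNR <= max && $4 != "" {
            out = $4
            print $0 >> out     # append, so rows sharing a name end up in one file
            close(out)          # release the file handle before the next row
        }' "$1"
}

# usage: split_by_col4 RV1_phase
```

Because the redirection is >>, running the function twice appends the rows a second time, so the output files should be removed between runs.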
I have this file:
1
2
3
4
a
b
c
XY
Z
I want to convert every block into a TAB separated line, and append the current timestamp at the last column to get an output like this:
1 2 3 4 1548915098
a b c 1548915098
XY Z 1548915098
I can use awk to do it like this:
awk '$(NF+1)=systime()' RS= OFS="\t" file
where an empty RS is equivalent to setting RS="\n\n+".
But I want to use a Ruby one-liner to do it. I've come up with this:
ruby -a -ne 'BEGIN{@lines=Array.new}; if ($_ !~ /^$/) then @lines.push($_.chomp) else (puts @lines.push(Time.now.to_i.to_s).join "\t"; @lines=Array.new) unless @lines.empty? end; END{puts @lines.push(Time.now.to_i.to_s).join "\t" unless @lines.empty?}' file
which is somewhat awkward.
Is there any elegant way to do this?
And is there any ruby equivalent to awk's RS, NF, and OFS?
Thanks :)
$ awk '$(NF+1)=systime()' RS= OFS="\t" ip.txt
1 2 3 4 1548917728
a b c 1548917728
XY Z 1548917728
$ # .to_s is optional here, since join converts each element to a string
$ ruby -00 -lane '$F.append(Time.now.to_i.to_s); puts $F.join("\t")' ip.txt
1 2 3 4 1548917730
a b c 1548917730
XY Z 1548917730
-00 paragraph mode
-a auto split, results available from $F array
-l chomps record separator
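On the awk side, systime() is an extension rather than part of POSIX awk, so the command above may not run under every awk. A sketch that takes the timestamp from the shell instead is shown here; join_blocks is a hypothetical name, and it assumes a date command that supports +%s (GNU and BSD date do).

```shell
# join_blocks FILE [TIMESTAMP] : turn each blank-line-separated block of FILE into
# one TAB-separated line with the timestamp appended as the last column.
# join_blocks is a hypothetical name; TIMESTAMP defaults to the current epoch time.
join_blocks() {
    local ts=${2:-$(date +%s)}
    # RS= selects paragraph mode; assigning to $(NF+1) rebuilds $0 using OFS
    awk -v RS= -v OFS='\t' -v ts="$ts" '{ $(NF+1) = ts; print }' "$1"
}

# usage: join_blocks ip.txt
```

Passing the timestamp as an optional second argument also makes the output reproducible, which is convenient when comparing runs.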
I have a set of CSV files to which I wish to add a field at the end of each line.
The first field is an ID, some ten-digit number:
id,2nd_field,...,last_field
1234567890,Smith,...,Arkansas
1234567891,Jones,...,California
1234567892,White,...,
I want to add another field at the end where the value is based on modulo 3 (id % 3) of the ID:
id,2nd_field,...,last_field,added_field
1234567890,Smith,...,Arkansas,x
1234567891,Jones,...,California,y
1234567892,White,...,,z
Please take into account the fact that the last_field could be null or blank.
How can I do this using sed or awk? I'm a newbie with these tools, so please also provide some explanation of your script. Thanks.
Using awk:
awk 'BEGIN{FS=OFS=","} NR==1{print $0, "added_field"; next}
($1%3)==0{p="x"} ($1%3)==1{p="y"} ($1%3)==2{p="z"} {print $0, p}' file
Output:
id,2nd_field,...,last_field,added_field
1234567890,Smith,...,Arkansas,x
1234567891,Jones,...,California,y
1234567892,White,...,,z
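Since the question mentions a set of CSV files, the same logic can be applied in a loop. The following is a sketch only: add_mod_field is a hypothetical name, and writing each result next to its input with an .out suffix is an assumption, not something the question specifies.

```shell
# add_mod_field DIR : for every DIR/*.csv, write a copy with the extra column
# to DIR/<name>.out. The function name and the .out suffix are assumptions.
add_mod_field() {
    local f
    for f in "$1"/*.csv; do
        [ -e "$f" ] || continue                      # the glob matched nothing
        awk '
            BEGIN { FS = OFS = ","; split("x,y,z", map, ",") }   # map[1]=x, map[2]=y, map[3]=z
            NR == 1 { print $0, "added_field"; next }            # header line
            { print $0, map[$1 % 3 + 1] }                        # id % 3 is 0, 1 or 2
        ' "$f" > "${f%.csv}.out"
    done
}

# usage: add_mod_field /path/to/csv_dir
```

Appending with print $0, p leaves the existing fields untouched, so an empty last_field simply shows up as two consecutive commas before the new value.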
$ cat tst.awk
BEGIN { FS=OFS=","; split("y,z,x",map) }
{ print $0, (NR>1 ? map[($1-1)%3+1] : "added_field") }
$ awk -f tst.awk file
id,2nd_field,...,last_field,added_field
1234567890,Smith,...,Arkansas,x
1234567891,Jones,...,California,y
1234567892,White,...,,z
The above just uses split() to create a mapping of:
map[1] = y
map[2] = z
map[3] = x
and then accesses it when needed via the common (VALUE-1)%N+1 idiom, which maps the mod N results for values 1,2,..,N-1,N to 1,2,..,N-1,N instead of 1,2,..,N-1,0:
map[($1-1)%3+1]
e.g.:
$ awk 'BEGIN{ for (i=1;i<=6;i++) print i, i%3, (i-1)%3+1 }'
1 1 1
2 2 2
3 0 3
4 1 1
5 2 2
6 0 3
I'm trying to write a simple script that will display the fields specified by the user as bash arguments. For example, I've got a text file that looks like this:
1 2 3 4 5
1 2 3 4 5
a b c d e
And, for example, the user types:
./script.sh text 1 2 5
Where $1 = text, and the other parameters (like $2, $3 and $4) are the fields, so the output will look like this:
1 2 5
1 2 5
a b e
I've got this code, which prints all the columns given as arguments, but one below the other:
#!/bin/bash
text="$1"
shift
for x in $@; do
awk '{print $var}' var="$x" $text
done
Output for example ./script.sh text 1 2 5:
1
1
a
2
2
b
5
5
e
I guess the output looks like that because the for loop is outside of awk. Would placing the loop inside awk be a good solution for this task? I tried a few things but always had trouble with the syntax.
Thank you for your time and help!
file="$1"
shift
awk -v flds="$*" 'BEGIN{n=split(flds,f)} {for (i=1;i<=n;i++) printf "%s%s", $(f[i]), (i<n?OFS:ORS)}' "$file"
You don't need to loop over the params; pass all of them to awk with the -v option:
awk -v v1="$2" -v v2="$3" -v v3="$4" '{print $v1, $v2, $v3}' "$1"
You may want to perform additional checks such as whether the file ($1) contains enough fields, the file ($1) exists etc. But the idea is the same.
In your code, you are reading the file multiple times, each checking for only a particular field but to get your desired output, each line must be checked for multiple fields at the same time.
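The additional checks mentioned above could be done in the shell before awk is called. The sketch below is one way to arrange that, not a definitive version: print_fields is a hypothetical name, the return codes 1 and 2 are arbitrary, and it accepts any number of fields instead of exactly three.

```shell
# print_fields FILE FIELD... : print the requested fields of each line of FILE,
# after checking the argument count and that FILE is readable.
# print_fields and the return codes are assumptions made for this sketch.
print_fields() {
    if [ "$#" -lt 2 ]; then
        echo "usage: print_fields FILE FIELD..." >&2
        return 1
    fi
    local file=$1
    shift
    if [ ! -r "$file" ]; then
        echo "cannot read file: $file" >&2
        return 2
    fi
    local IFS=' '         # make "$*" join the field numbers with spaces
    awk -v flds="$*" '
        BEGIN { n = split(flds, f, " ") }
        { for (i = 1; i <= n; i++) printf "%s%s", $(f[i]), (i < n ? OFS : ORS) }
    ' "$file"
}

# usage: print_fields text 1 2 5
```

The file is read once and all requested fields are printed per line, which is what produces the side-by-side output the question asks for.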
Pass the columns to awk and split them into an array and print the column corresponding to each value in the array:
file=$1
shift
p="$*"
# split the list once in BEGIN; "%s " avoids treating the data as a printf format
awk -v l="$p" 'BEGIN{t=split(l,a," ")} {for (i=1;i<=t;i++) printf "%s ", $(a[i]); printf "\n"}' "$file"