awk with loops to reorder columns - bash

I am trying to reorder the columns of a comma-separated file by writing an awk program. My little program to reorder the columns is:
awk -v column=number 'BEGIN {FS=","; ORS="\n"; OFS=","; n=column} {for (i=1; i<=NF; i++){if (i!=n) $(i+1)=$i else $1=$i} {print $0}' file_name
I would like to put the column given by number first and then the remaining ones, but it does not work.

You are overwriting fields as you iterate. You should instead "bubble" the value from position column down to the first position.
Consider how you would move column 3 here:
1 2 3 4
to
3 1 2 4
For example with this input file:
$ cat table
1 2 3 4
2 3 4 5
3 4 5 6
4 5 6 7
You could do it like this (separators changed to whitespace for readability):
$ awk -v col=3 '{val=$col; for (i=col; i>1; i--) $i=$(i-1); $1=val; print $0}' table
3 1 2 4
4 2 3 5
5 3 4 6
6 4 5 7
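The same bubble approach works on your comma-separated data if you set FS and OFS accordingly (a sketch, using made-up sample rows):

```shell
# Move column 3 to the front of comma-separated input:
# shift columns 1..col-1 right by one, then drop the saved value into $1.
printf '1,2,3,4\n2,3,4,5\n' |
awk -v col=3 'BEGIN{FS=OFS=","} {val=$col; for (i=col; i>1; i--) $i=$(i-1); $1=val; print}'
```

Assigning to any field rebuilds the record with OFS, so the output comes back comma-separated.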

You can simply print the required column first and then the rest of the columns:
$ awk -v column=3 -F "," '{n=column; printf "%s",$column; for (i=1; i<NF; i++){if (i!=n) printf ",%s",$i} print ","$NF}' file
Input:
10,Hello,meow,20,30
hello,world,34,meow,60
Output:
meow,10,Hello,20,30
34,hello,world,meow,60

Related

How to print the data that have the largest length using awk?

I have this input:
1 happy,t,c,d
2 t,c
3 e,fgh,k
4 yk,j,f
5 leti,j,f,g
First, I want to print the length of each comma-separated item, which should yield:
1 5,1,1,1
2 1,1
3 1,3,1
4 2,1,1
5 4,1,1,1
And then I want to select the max of those lengths for each line, finally creating:
1 5
2 1
3 3
4 2
5 4
How can I do this in awk?
1) For the first task I have tried:
awk -v col=$2 -F',' '{OFS=","; for(i=1; i<=NF; i++) print length($i);}' test.data.txt
Which doesn't output correct data:
7
1
1
1
3
1
3
3
1
4
1
1
6
1
1
1
The only problem is that I am not able to use the -v option properly to read only the second column, so every item ends up in one output column, and the lengths include the first column and the space between the two columns.
2) To select the max value, I am doing:
awk -F',' '{OFS="\t"; m=length($1); for(i=1; i<=NF; i++) if (length($i) > m) m=length($i); print m}' test.data.txt
This works properly, but because the field separator is only the comma, the line number and the first item end up together in $1, so their combined length is compared, giving me:
7
3
3
4
6
instead of:
5
1
3
2
4
Lastly, I want to merge these two steps into one. Any suggestions for improvements?
awk -F'[, ]' -v OFS="," '{m=length($2);for (i=3;i<=NF;i++) if (length($i) > m) m=length($i)}{print $1,m}' file
1,5
2,1
3,3
4,2
5,4
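You can check it against a couple of the sample lines (a sketch):

```shell
# Fields split on both space and comma, so $1 is the line number
# and $2..$NF are the items; track the longest item per line.
printf '1 happy,t,c,d\n2 t,c\n' |
awk -F'[, ]' -v OFS="," '{m=length($2); for (i=3;i<=NF;i++) if (length($i) > m) m=length($i)} {print $1,m}'
```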
For the first case:
awk -F'[, ]' -v OFS="," '{printf "%s",$1;for (i=2;i<=NF;i++) printf "%s%s",(i==2?" ":OFS),length($i)}{print ""}'
1 5,1,1,1
2 1,1
3 1,3,1
4 2,1,1
5 4,1,1,1
Shorter alternative:
awk -F'[, ]' -v OFS="," '{printf "%s ",$1;for (i=2;i<=NF;i++) printf "%s%s",length($i),(i==NF?ORS:OFS)}'
While print outputs its arguments and ends the line by appending a newline, printf does not start a new line on its own.
PS: Thanks Ed Morton for the valuable comment.
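A minimal way to see that difference:

```shell
# printf emits exactly its text; print appends ORS (a newline) at the end.
echo x | awk '{printf "a"; printf "b"; print "c"}'
```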
We start with this data file:
$ cat data
1 happy,t,c,d
2 t,c
3 e,fgh,k
4 yk,j,f
5 leti,j,f,g
For the first task:
$ awk '{n=split($2,a,/,/); printf "%s %s",$1,length(a[1]); for(i=2; i<=n; i++) printf ",%s",length(a[i]); print""}' data
1 5,1,1,1
2 1,1
3 1,3,1
4 2,1,1
5 4,1,1,1
For the second task:
$ awk '{n=split($2,a,/,/); max=length(a[1]); for(i=2; i<=n; i++) if (length(a[i])>max)max=length(a[i]); print $1,max}' data
1 5
2 1
3 3
4 2
5 4
How it works
For the second task:
n=split($2,a,/,/)
We split up the contents of field 2 into array a
max=length(a[1])
We assign the length of the first element of array a to the awk variable max.
for(i=2; i<=n; i++) if (length(a[i])>max)max=length(a[i])
If any succeeding element of array a is larger than max, we update max.
print $1,max
We print the first field and the value of max.
Trying to golf this down:
gawk -F'[ ,]' '{m=0;for(i=2;i<=NF;i++){l=length($i);if(l>m)m=l}print$1,m}'
perl -MList::Util=max -F'\s+|,' -lne'$,=" ";print shift(@F),max map{length}@F'
perl -MList::Util=max -F'\s+|,' -lne'print"@{[shift(@F),max map{length}@F]}"'
perl -MList::Util=max -F'\s+|,' -lpe'$_="@{[shift(@F),max map{length}@F]}"'
ruby -F'[ ,]' -lape'$_="#{$F[0]} #{$F[1..-1].map{|e|e.size}.max}"'

Can anyone explain how these scripts work?

matrix
1 2 3
4 5 6
7 8 9
awk '{for (i=1;i<=NF+1-NR;i++) printf "%s%s",$i,FS; print""}' matrix
1 2 3
4 5
7
awk '{for (i=1;i<=NF;i++) if (NR>=1 && NR==i) {for (j=1;j<=i-1;j++) printf " ";print $(i-0)}}' matrix
1
 5
  9
NF stores the number of fields in the current record. Each line of your matrix has 3 elements, so NF is 3.
NR is the number of the record (line) currently being processed; it is a dynamic value that goes from 1 to 3 for your matrix.
You should have a look and do some research on awk: http://www.thegeekstuff.com/2010/01/8-powerful-awk-built-in-variables-fs-ofs-rs-ors-nr-nf-filename-fnr/
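A quick way to watch both variables change per line (a sketch, using the same 3x3 matrix):

```shell
# NR advances with each record; NF is recomputed for every record.
printf '1 2 3\n4 5 6\n7 8 9\n' | awk '{print "NR=" NR, "NF=" NF}'
```

The first script then prints NF+1-NR fields per row, which is why each line gets one field shorter.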

How to edit a few lines in a column using awk?

I have an ASCII data file, e.g.:
ifile.txt
2
3
2
3
4
5
6
4
I would like to multiply all the numbers from the 6th line onward by 3. So the output file will be:
ofile.txt
2
3
2
3
4
15
18
12
My algorithm/script is:
awk '{if ($1<line 6); printf "%10.5f\n", $1}' ifile.txt > ofile.txt
awk '{if ($1>=line 6); printf "%10.5f\n", $1*3}' ifile.txt >> ofile.txt
The simplest way to do this is:
awk 'NR >= 6 { $1 *= 3 } 1' ifile.txt
Multiply the first field by 3 when the record (line) number NR is at least 6.
The structure of an awk program is condition { action }, where the default condition is true and the default action is { print }, so the 1 at the end is the shortest way of always printing every line.
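If you also want the fixed-width float format from your attempt, combine the multiplication with a printf action (a sketch on your sample data):

```shell
# Multiply lines 6 onward by 3, then format every value as a padded float.
printf '2\n3\n2\n3\n4\n5\n6\n4\n' |
awk 'NR >= 6 { $1 *= 3 } { printf "%10.5f\n", $1 }'
```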

Sum of all rows of all columns - Bash

I have a file like this
1 4 7 ...
2 5 8
3 6 9
And I would like to have as output
6 15 24 ...
That is, for each column, the sum over all the lines. I know that to sum all the lines of a certain column (say column 1) you can do something like this:
awk '{sum+=$1;}END{print $1}' infile > outfile
But I can't do it automatically for all the columns.
One more awk
awk '{for(i=1;i<=NF;i++)$i=(a[i]+=$i)}END{print}' file
Output
6 15 24
Explanation
for(i=1;i<=NF;i++) Loop over every field in the record
$i=(a[i]+=$i) Add the field's value to its running sum a[i] and overwrite the field with that sum
END{print} Print the final line, whose fields now hold the column totals
As with the other answers, this retains the order of the fields regardless of how many there are.
You want to sum every column differently. Hence, you need an array, not a scalar:
$ awk '{for (i=1;i<=NF;i++) sum[i]+=$i} END{for (i in sum) print sum[i]}' file
6
15
24
This stores sum[column] and finally prints it.
To have the output in the same line, use:
$ awk '{for (i=1;i<=NF;i++) sum[i]+=$i} END{for (i in sum) printf "%d%s", sum[i], (i==NF?"\n":" ")}' file
6 15 24
This uses the trick printf "%d%s", sum[i], (i==NF?"\n":" "): print the number followed by a character. If we are at the last field, that character is a newline; otherwise, just a space.
There is a very simple command called numsum to do this:
numsum -c FileName
-c --- Print out the sum of each column.
For example:
cat FileName
1 4 7
2 5 8
3 6 9
Output :
numsum -c FileName
6 15 24
Note:
If the command is not installed in your system, you can do it with this command:
apt-get install num-utils
echo "1 4 7
2 5 8
3 6 9 " \
| awk '{for (i=1;i<=NF;i++){
sums[i]+=$i;maxi=i}
}
END{
for(i=1;i<=maxi;i++){
printf("%s ", sums[i])
}
print ""}'
output
6 15 24
My recollection is that you can't rely on for (i in sums) to produce the keys in any particular order, but maybe this is "fixed" in newer versions of gawk.
In case you're using an old-school Unix awk, this solution will keep your output in the same column order, regardless of how "wide" your file is.
IHTH
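If you do have GNU awk, you can force for (i in sum) to traverse in numeric index order via PROCINFO (a sketch; this is gawk-specific):

```shell
# "@ind_num_asc" makes for-in visit array indices in ascending numeric order,
# so the column sums come out in their original left-to-right order.
printf '1 4 7\n2 5 8\n3 6 9\n' |
gawk 'BEGIN{PROCINFO["sorted_in"]="@ind_num_asc"}
      {for (i=1;i<=NF;i++) sum[i]+=$i}
      END{for (i in sum) printf "%s%s", sum[i], (i==NF?"\n":" ")}'
```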
AWK Program
#!/usr/bin/awk -f
{
print($0);
len=split($0,a);
if (maxlen < len) {
maxlen=len;
}
for (i=1;i<=len;i++) {
b[i]+=a[i];
}
}
END {
for (i=1;i<=maxlen;i++) {
printf("%s ", b[i]);
}
print ""
}
Output
1 2 3 4 5
1 2 3 4 5
1 2 3 4 5
3 6 9 12 15
Your approach is correct; you just printed $1 instead of sum in the END block. Try this:
awk '{sum+=$1;} END{print sum;}' infile > outfile
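For example, on a small column of numbers:

```shell
# Accumulate field 1 across all lines, then print the total once at the end.
printf '1\n2\n3\n' | awk '{sum+=$1} END{print sum}'
```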

How can I use awk to sort columns by the last value of a column?

I have a file like this (with hundreds of lines and columns)
1 2 3
4 5 6
7 88 9
and I would like to re-order columns basing on the last line values (or a specific line values)
1 3 2
4 6 5
7 9 88
How can I use awk (or other) to accomplish this task?
Thank you in advance for your help
EDIT: I would like to thank everybody and to apologize if I wasn't clear enough.
What I would like to do is:
take a line (for example the last one);
reorder the columns of the matrix using the sorted values of the chosen line to determine the order.
So, the last line is 7 88 9, which sorted is 7 9 88, then the three columns have to be reordered in a way such that, in this case, the last two columns are swapped.
A four-column more generic example, based on the last line again:
Input:
1 2 3 4
4 5 6 7
7 88.0 9 -3
Output:
4 1 3 2
7 4 6 5
-3 7 9 88.0
Here's a quick, dirty and improvable solution (edited because the OP clarified that the numbers are floating point):
$ cat test.dat
1 2 3
4 5 6
.07 .88 -.09
$ awk "{print $(printf '$%d%.0s\n' \
$(i=0; for x in $(tail -n1 test.dat); do
echo $((++i)) $x
done |
sort -k2g) | paste -sd,)}" test.dat
3 1 2
6 4 5
-.09 .07 .88
To see what's going on there (or at least part of it):
$ echo "{print $(printf '$%d%.0s\n' \
$(i=0; for x in $(tail -n1 test.dat); do
echo $((++i)) $x
done |
sort -k2g) | paste -sd,)}" test.dat
{print $3,$1,$2} test.dat
To make it work for an arbitrary line, replace tail -n1 with tail -n+$L|head -n1
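An equivalent, slightly tidier way to extract an arbitrary line L for the sort key is sed (a sketch, POSIX sed):

```shell
# Print only line number L of the input (here L=2).
L=2
printf '1 2 3\n4 5 6\n7 8 9\n' | sed -n "${L}p"
```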
This problem can be elegantly solved using GNU awk's array sorting feature. GNU awk allows you to control array traversal using PROCINFO. So two passes of the file are required, the first pass to split the last record into an array and the second pass to loop through the indices of the array in value order and output fields based on indices. The code below probably explains it better than I do.
awk 'BEGIN{PROCINFO["sorted_in"] = "@val_num_asc"};
NR == FNR {for (x in arr) delete arr[x]; split($0, arr)};
NR != FNR{sep=""; for (x in arr) {printf "%s%s", sep, $x; sep=" "} print ""}' file.txt file.txt
4 1 3 2
7 4 6 5
-3 7 9 88.0
Update:
Create a file called transpose.awk like this:
{
for (i=1; i<=NF; i++) {
a[NR,i] = $i
}
}
NF>p { p = NF }
END {
for(j=1; j<=p; j++) {
str=a[1,j]
for(i=2; i<=NR; i++){
str=str OFS a[i,j];
}
print str
}
}
Now here is the script that should do the work for you:
awk -f transpose.awk file | sort -n -k $(awk 'NR==1{print NF}' file) | awk -f transpose.awk
1 3 2
4 6 5
7 9 88
I am using transpose.awk twice here: once to transpose rows into columns, then I sort numerically by the last column, and then I transpose back. It may not be the most efficient solution, but it works as per the OP's requirements.
transposing awk script courtesy of @ghostdog74, from An efficient way to transpose a file in Bash
