How to print the data that has the largest length using awk? - bash

I have this input:
1 happy,t,c,d
2 t,c
3 e,fgh,k
4 yk,j,f
5 leti,j,f,g
I want to print the length of each item in the second column (with comma as the delimiter), which should yield:
1 5,1,1,1
2 1,1
3 1,3,1
4 2,1,1
5 4,1,1,1
Then I want to keep only the maximum of those lengths for each line, finally producing:
1 5
2 1
3 3
4 2
5 4
How can I do this in awk?
1) For the first task I have tried:
awk -v col=$2 -F',' '{OFS=","; for(i=1; i<=NF; i++) print length($i);}' test.data.txt
Which doesn't output correct data:
7
1
1
1
3
1
3
3
1
4
1
1
6
1
1
1
The problem is that I can't get the -v option to restrict processing to that column. As a result, all the data ends up in one column, and the length of the first item also counts column 1 and the space between column 1 and column 2.
2) To select the max value, I am doing:
awk -F',' '{OFS="\t"; m=length($1); for(i=1; i<=NF; i++) if (length($i) > m) m=length($i); print m}' test.data.txt
This works, but because column 1 is still part of the first item, its length (plus the separating space) is added to the max values, giving me:
7
3
3
4
6
instead of:
5
1
3
2
4
Lastly, I want to merge these two processes in one go. Any suggestions on improvements?

awk -F'[, ]' -v OFS="," '{m=length($2);for (i=3;i<=NF;i++) if (length($i) > m) m=length($i)}{print $1,m}' file
1,5
2,1
3,3
4,2
5,4
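The command above relies on -F'[, ]', a bracket expression that makes either a comma or a space act as a field separator, so the id lands in $1 and each item gets its own field from $2 onward. A small sketch to visualise that split (show_fields is a hypothetical helper name):

```shell
#!/bin/sh
# Print the field count and every field of each line, as split by [, ].
show_fields() {
  awk -F'[, ]' '{
    printf "NF=%d:", NF            # number of fields on this line
    for (i = 1; i <= NF; i++)
      printf " [%s]", $i           # bracket each field to make it visible
    print ""                       # finish the output line
  }'
}

printf '1 happy,t,c,d\n2 t,c\n' | show_fields
```

This should print NF=5: [1] [happy] [t] [c] [d] and NF=3: [2] [t] [c], which is why the max loop starts comparing at $2.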
For the first case:
awk -F'[, ]' -v OFS="," '{printf "%s",$1;for (i=2;i<=NF;i++) printf "%s%s",(i==2?" ":OFS),length($i)}{print ""}' file
1 5,1,1,1
2 1,1
3 1,3,1
4 2,1,1
5 4,1,1,1
Shorter alternative:
awk -F'[, ]' -v OFS="," '{printf "%s ",$1;for (i=2;i<=NF;i++) printf "%s%s",length($i),(i==NF?ORS:OFS)}' file
In awk, print outputs its data followed by ORS (a newline by default), whereas printf never ends the line on its own.
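A tiny sketch of that difference (demo_print_printf is a hypothetical name); it also shows that print ends its output with whatever ORS is set to:

```shell
#!/bin/sh
# print appends ORS after its output; printf writes only what its format says.
demo_print_printf() {
  awk 'BEGIN {
    printf "a"; printf "b"          # no newline between or after these
    print ""                        # print adds ORS, ending the line "ab"
    print "c"                       # "c" followed by ORS
    ORS = "|"; print "d"; print "e" # with ORS changed, each print ends in "|"
    printf "\n"                     # printf needs an explicit newline
  }'
}

demo_print_printf
```

The expected output is three lines: ab, c and d|e|.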
PS: Thanks Ed Morton for the valuable comment.

We start with this data file:
$ cat data
1 happy,t,c,d
2 t,c
3 e,fgh,k
4 yk,j,f
5 leti,j,f,g
For the first task:
$ awk '{n=split($2,a,/,/); printf "%s %s",$1,length(a[1]); for(i=2; i<=n; i++) printf ",%s",length(a[i]); print""}' data
1 5,1,1,1
2 1,1
3 1,3,1
4 2,1,1
5 4,1,1,1
For the second task:
$ awk '{n=split($2,a,/,/); max=length(a[1]); for(i=2; i<=n; i++) if (length(a[i])>max)max=length(a[i]); print $1,max}' data
1 5
2 1
3 3
4 2
5 4
How it works
For the second task:
n=split($2,a,/,/)
We split the contents of field 2 on commas into array a; split returns the number of elements, which we store in n.
max=length(a[1])
We assign the length of the first element of array a to the awk variable max.
for(i=2; i<=n; i++) if (length(a[i])>max)max=length(a[i])
If any subsequent element of array a is longer than max, we update max.
print $1,max
We print the first field and the value of max.
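The question also asked to merge both steps into one go. A minimal sketch combining the two split-based loops into a single pass; the output layout (id, item lengths, max on one line) and the function name lengths_and_max are my own choices, not taken from the answers:

```shell
#!/bin/sh
# One pass: print the id, the comma-separated item lengths, and the max length.
lengths_and_max() {
  awk '{
    n = split($2, a, ",")           # items of column 2; n = how many
    max = 0
    out = ""
    for (i = 1; i <= n; i++) {
      len = length(a[i])
      out = out (i > 1 ? "," : "") len   # build e.g. "5,1,1,1"
      if (len > max) max = len           # track the longest item
    }
    print $1, out, max
  }' "$@"
}

printf '1 happy,t,c,d\n3 e,fgh,k\n' | lengths_and_max
```

This should print 1 5,1,1,1 5 and 3 1,3,1 3. Passing a file name (e.g. lengths_and_max data) also works, since "$@" is handed to awk.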

Trying to golf this down:
gawk -F'[ ,]' '{m=0;for(i=2;i<=NF;i++){l=length($i);if(l>m)m=l}print$1,m}'
perl -MList::Util=max -F'\s+|,' -lne'$,=" ";print shift(@F),max map{length}@F'
perl -MList::Util=max -F'\s+|,' -lne'print"@{[shift(@F),max map{length}@F]}"'
perl -MList::Util=max -F'\s+|,' -lpe'$_="@{[shift(@F),max map{length}@F]}"'
ruby -F'[ ,]' -lape'$_="#{$F[0]} #{$F[1..-1].map{|e|e.size}.max}"'

Related

awk with loops to reorder columns

I am trying to reorder the columns of a file by writing an awk program. The file looks like:
My little program to reorder the columns is:
awk -v column=number 'BEGIN {FS=","; ORS="\n"; OFS=","; n=column} {for (i=1; i<=NF; i++){if (i!=n) $(i+1)=$i else $1=$i} {print $0}' file_name
I would like to put the column given by number first, followed by the remaining ones, but it does not work.
You are overwriting fields as you iterate. Instead, you should "bubble" the value from position column to the first position.
Consider how you would move column 3 here:
1 2 3 4
to
3 1 2 4
For example with this input file:
$ cat table
1 2 3 4
2 3 4 5
3 4 5 6
4 5 6 7
You could do it like this (separators changed to whitespace for readability):
$ awk -v col=3 '{val=$col; for (i=col; i>1; i--) $i=$(i-1); $1=val; print $0}' table
3 1 2 4
4 2 3 5
5 3 4 6
6 4 5 7
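For the comma-separated data in the question, the same bubbling loop works once both FS and OFS are set to a comma, because assigning to a field rebuilds $0 using OFS. A minimal sketch (move_col_first is a hypothetical function name):

```shell
#!/bin/sh
# Move column $1 to the front of comma-separated input by shifting the
# fields before it one position to the right ("bubbling").
move_col_first() {
  awk -v col="$1" 'BEGIN { FS = OFS = "," }
  {
    val = $col                      # save the column being moved
    for (i = col; i > 1; i--)       # shift $1..$(col-1) right by one
      $i = $(i - 1)
    $1 = val                        # put the saved value in front
    print                           # $0 is rebuilt with OFS=","
  }'
}

printf '10,Hello,meow,20,30\nhello,world,34,meow,60\n' | move_col_first 3
```

This should print meow,10,Hello,20,30 and 34,hello,world,meow,60.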
You can simply print the required column first and then the rest of columns.
$ awk -v column=3 -F "," '{printf "%s", $column; for (i=1; i<=NF; i++) if (i!=column) printf ",%s", $i; print ""}' file
Input:
10,Hello,meow,20,30
hello,world,34,meow,60
Output:
meow,10,Hello,20,30
34,hello,world,meow,60

Can anyone explain how these scripts work?

matrix
1 2 3
4 5 6
7 8 9
awk '{for (i=1;i<=NF+1-NR;i++) printf "%s%s",$i,FS; print""}' matrix
1 2 3
4 5
7
awk '{for (i=1;i<=NF;i++) if (NR>=1 && NR==i) {for (j=1;j<=i-1;j++) printf " ";print $(i-0)}}' matrix
1
 5
  9
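NF stores the number of fields in a record. In your matrix each line has 3 elements, so NF is 3.
NR holds the number of the record currently being processed, i.e. the line number; for your matrix it goes from 1 to 3.
So in the first script the loop bound NF+1-NR is 3, 2 and 1 on lines 1, 2 and 3, which prints one field fewer on each line. In the second script, only the field whose index i equals the line number NR is printed, after i-1 spaces, which gives the diagonal.
For more on awk's built-in variables, see: http://www.thegeekstuff.com/2010/01/8-powerful-awk-built-in-variables-fs-ofs-rs-ors-nr-nf-filename-fnr/?ref=binfind.com/web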
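To see those values directly, a small sketch that prints NR, NF and the first script's loop bound NF+1-NR for every line (show_nr_nf is a hypothetical name):

```shell
#!/bin/sh
# For each line, print NR (line number), NF (field count) and NF+1-NR.
show_nr_nf() {
  awk '{ print "NR=" NR, "NF=" NF, "bound=" (NF + 1 - NR) }'
}

printf '1 2 3\n4 5 6\n7 8 9\n' | show_nr_nf
```

On the 3x3 matrix this should print NR=1 NF=3 bound=3, then NR=2 NF=3 bound=2, then NR=3 NF=3 bound=1.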

How to edit a few lines in a column using awk?

I have an ASCII data file, e.g.:
ifile.txt
2
3
2
3
4
5
6
4
I would like to multiply all the numbers from the 6th line onward by 3. So the output file will be:
ofile.txt
2
3
2
3
4
15
18
12
My algorithm/script is:
awk '{if ($1<line 6); printf "%10.5f\n", $1}' ifile.txt > ofile.txt
awk '{if ($1>=line 6); printf "%10.5f\n", $1*3}' ifile.txt >> ofile.txt
The simplest way to do this is:
awk 'NR >= 6 { $1 *= 3 } 1' ifile.txt
Multiply the first field by 3 when the record (line) number NR is 6 or greater.
The structure of an awk program is condition { action }, where the default condition is true and the default action is { print }, so the 1 at the end is the shortest way of always printing every line.
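The question's attempt used two passes and a %10.5f format. A minimal single-pass sketch that keeps that format; following the sample output, where line 6 (the value 5) becomes 15, it tests NR >= start. The function name scale_from_line and its start/factor arguments are hypothetical:

```shell
#!/bin/sh
# Multiply field 1 by factor from line start onward; format every value.
scale_from_line() {
  awk -v start="$1" -v factor="$2" '{
    v = (NR >= start) ? $1 * factor : $1   # the condition picks the value
    printf "%10.5f\n", v
  }'
}

printf '2\n3\n2\n3\n4\n5\n6\n4\n' | scale_from_line 6 3
```

The last three output lines should be 15.00000, 18.00000 and 12.00000, each right-aligned in a 10-character field.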

Sum of all rows of all columns - Bash

I have a file like this
1 4 7 ...
2 5 8
3 6 9
And I would like to have as output
6 15 24 ...
That is the sum of all the lines for all the columns. I know that to sum all the lines of a certain column (say column 1) you can do like this:
awk '{sum+=$1;}END{print $1}' infile > outfile
But I can't do it automatically for all the columns.
One more awk
awk '{for(i=1;i<=NF;i++)$i=(a[i]+=$i)}END{print}' file
Output
6 15 24
Explanation
{for (i=1;i<=NF;i++) Loop over every field, from 1 to NF
$i=(a[i]+=$i) Add the field's value to the running total a[i] and store that total back in the field
END{print} Print the last line, which now contains the sums
As with the other answers, this keeps the fields in their original order regardless of how many there are.
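To see why END{print} outputs the totals, it helps to print every line while the loop runs: each row is overwritten with the running column sums, so the last row holds the grand totals. A small sketch (running_sums is a hypothetical name):

```shell
#!/bin/sh
# Same loop as the answer above, but print every row of running totals.
running_sums() {
  awk '{ for (i = 1; i <= NF; i++) $i = (a[i] += $i); print }'
}

printf '1 4 7\n2 5 8\n3 6 9\n' | running_sums
```

This should print 1 4 7, then 3 9 15, then 6 15 24; the last of these is what END{print} shows.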
You want to sum every column differently. Hence, you need an array, not a scalar:
$ awk '{for (i=1;i<=NF;i++) sum[i]+=$i} END{for (i=1;i<=NF;i++) print sum[i]}' file
6
15
24
This stores sum[column] and finally prints it.
To have the output in the same line, use:
$ awk '{for (i=1;i<=NF;i++) sum[i]+=$i} END{for (i=1;i<=NF;i++) printf "%d%s", sum[i], (i==NF?"\n":" ")}' file
6 15 24
This uses the trick printf "%d%s", sum[i], (i==NF?"\n":" "): print the number followed by one character, which is a newline if we are at the last field and a space otherwise.
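One caveat with the single-line variant: in END, NF holds the field count of the last record read, so if the last row is narrower than earlier rows the newline comes too early. A minimal sketch that tracks the widest row instead and loops over the indices in explicit order (col_sums is a hypothetical name):

```shell
#!/bin/sh
# Column sums on one line; width is the widest row seen, not the last NF.
col_sums() {
  awk '{
    for (i = 1; i <= NF; i++) sum[i] += $i
    if (NF > width) width = NF     # remember the widest row
  }
  END {
    for (i = 1; i <= width; i++)   # explicit order, not for (i in sum)
      printf "%s%s", sum[i] + 0, (i == width ? "\n" : " ")
  }'
}

printf '1 4 7\n2 5\n' | col_sums
```

With the ragged input above this should print 3 9 7; on the 3x3 example it prints 6 15 24.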
There is a very simple command called numsum to do this:
numsum -c FileName
-c --- Print out the sum of each column.
For example:
cat FileName
1 4 7
2 5 8
3 6 9
Output :
numsum -c FileName
6 15 24
Note:
If the command is not installed on your system, you can install it on Debian/Ubuntu with:
apt-get install num-utils
echo "1 4 7
2 5 8
3 6 9 " \
| awk '{for (i=1;i<=NF;i++){
          sums[i]+=$i;maxi=i}
      }
      END{
        for(i=1;i<=maxi;i++){
          printf("%s ", sums[i])
        }
        print ""}'
output
6 15 24
My recollection is that you can't rely on for (i in sums) to produce the keys in any particular order; POSIX leaves the traversal order unspecified, although GNU awk lets you set it with PROCINFO["sorted_in"].
In case you're using an old-line Unix awk, this solution will keep your output in the same column order, regardless of how "wide" your file is.
IHTH
AWK Program
#!/usr/bin/awk -f
{
    print($0);
    len=split($0,a);
    if (maxlen < len) {
        maxlen=len;
    }
    for (i=1;i<=len;i++) {
        b[i]+=a[i];
    }
}
END {
    for (i=1;i<=maxlen;i++) {
        printf("%s ", b[i]);
    }
    print ""
}
Output
1 2 3 4 5
1 2 3 4 5
1 2 3 4 5
3 6 9 12 15
Your attempt is almost correct; you just need to print sum instead of $1 in the END block. Try this:
awk '{sum+=$1;} END{print sum;}' infile > outfile

how to sum up matrices in multiple files using bash or awk

If I have an arbitrary number of files, say n files, and each file contains a matrix, how can I use bash or awk to sum up all the matrices in each file and get an output?
For example, if n=3, and I have these 3 files with the following contents
$ cat mat1.txt
1 2 3
4 5 6
7 8 9
$ cat mat2.txt
1 1 1
1 1 1
1 1 1
$ cat mat3.txt
2 2 2
2 2 2
2 2 2
I want to get this output:
$ cat output.txt
4 5 6
7 8 9
10 11 12
Is there a simple one liner to do this?
Thanks!
$ awk '{for (i=1;i<=NF;i++) total[FNR","i]+=$i;} END{for (j=1;j<=FNR;j++) {for (i=1;i<=NF;i++) printf "%3i ",total[j","i]; print "";}}' mat1.txt mat2.txt mat3.txt
4 5 6
7 8 9
10 11 12
This will automatically adjust to different size matrices. I don't believe that I have used any GNU features so this should be portable to OSX and elsewhere.
How it works:
This command reads each line of each matrix, one matrix at a time.
For each line read, the following command is executed:
for (i=1;i<=NF;i++) total[FNR","i]+=$i
This loops over every column on the line and adds it to the array total.
GNU awk has multidimensional arrays but, for portability, they are not used here. awk's arrays are associative and this creates an index from the file's line number, FNR, and the column number i, by combining them together with a comma. The result should be portable.
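awk also has a portable built-in form of this trick: writing total[FNR, i] joins the subscripts with the SUBSEP variable (by default "\034") instead of a literal comma. A minimal sketch of the same sum using that form, tracking the row and column counts explicitly rather than relying on FNR and NF inside END; sum_matrices is a hypothetical name and the temporary files only exist for the demo:

```shell
#!/bin/sh
# Element-wise sum of same-sized matrices stored in the files given as arguments.
sum_matrices() {
  awk '{
    for (c = 1; c <= NF; c++) total[FNR, c] += $c   # key is FNR SUBSEP c
    if (FNR > rows) rows = FNR     # rows per matrix
    if (NF > cols)  cols = NF      # columns per matrix
  }
  END {
    for (r = 1; r <= rows; r++)
      for (c = 1; c <= cols; c++)
        printf "%s%s", total[r, c] + 0, (c == cols ? "\n" : " ")
  }' "$@"
}

tmp=$(mktemp -d)
printf '1 2 3\n4 5 6\n7 8 9\n' > "$tmp/mat1.txt"
printf '1 1 1\n1 1 1\n1 1 1\n' > "$tmp/mat2.txt"
printf '2 2 2\n2 2 2\n2 2 2\n' > "$tmp/mat3.txt"
sum_matrices "$tmp/mat1.txt" "$tmp/mat2.txt" "$tmp/mat3.txt"
rm -r "$tmp"
```

With the question's three matrices this should print 4 5 6, 7 8 9 and 10 11 12.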
After all the matrices have been read, the results in total are printed:
END{for (j=1;j<=FNR;j++) {for (i=1;i<=NF;i++) printf "%3i ",total[j","i]; print ""}}
Here, j loops over each line up to the total number of lines, FNR. Then i loops over each column up to the total number of columns, NF. For each row and column, the total is printed via printf "%3i ",total[j","i]. This prints the total as a 3-character-wide integer. If your numbers are floats or are bigger, adjust the format accordingly.
At the end of each row, the print "" statement causes a newline character to be printed.
You can use awk with paste:
awk -v n=3 '{for (i=1; i<=n; i++) printf "%s%s", ($i + $(i+n) + $(i+n*2)),
(i==n)?ORS:OFS}' <(paste mat{1,2,3}.txt)
4 5 6
7 8 9
10 11 12
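The paste version above hard-codes three files. A sketch that generalises it to any number of files, assuming every matrix has the same number of columns n, passed as the first argument (sum_pasted is a hypothetical name). After paste, column c of each matrix sits at fields c, c+n, c+2n, and so on:

```shell
#!/bin/sh
# Usage: sum_pasted N FILE...   (N = columns per matrix)
sum_pasted() {
  n=$1; shift
  paste "$@" | awk -v n="$n" '{
    for (c = 1; c <= n; c++) {
      s = 0
      for (f = c; f <= NF; f += n) s += $f   # same column in every file
      printf "%s%s", s, (c == n ? ORS : OFS)
    }
  }'
}

tmp=$(mktemp -d)
printf '1 2 3\n4 5 6\n7 8 9\n' > "$tmp/mat1.txt"
printf '1 1 1\n1 1 1\n1 1 1\n' > "$tmp/mat2.txt"
printf '2 2 2\n2 2 2\n2 2 2\n' > "$tmp/mat3.txt"
sum_pasted 3 "$tmp/mat1.txt" "$tmp/mat2.txt" "$tmp/mat3.txt"
rm -r "$tmp"
```

This should again print 4 5 6, 7 8 9 and 10 11 12. awk's default field splitting handles both the tabs inserted by paste and the spaces inside each file.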
GNU awk has multi-dimensional arrays.
gawk '
{
    for (i=1; i<=NF; i++)
        m[i][FNR] += $i
}
END {
    for (y=1; y<=FNR; y++) {
        for (x=1; x<=NF; x++)
            printf "%d ", m[x][y]
        print ""
    }
}
' mat{1,2,3}.txt
