Split multiple lines after matching a pattern - bash

Sorry for a newbie question, but I have a file that looks like this, and I want to capture the lines after a certain string, which is /aggr.
/aggr0_usts_nz_3001/plex0/rg0:
9g.10.0 0 4.08 0.00 .... . 4.08 1.00 41 0.00 .... .
1a.10.1 0 4.08 0.00 .... . 4.08 1.00 10 0.00 .... .
9g.10.4 0 4.08 0.00 .... . 4.08 1.00 49 0.00 .... .
1a.10.1 0 0.00 0.00 .... . 0.00 .... . 0.00 .... .
9g.10.4 0 0.00 0.00 .... . 0.00 .... . 0.00 .... .
/aggr1_usts_nz_3001/plex0/rg0:
1e.00.0 0 0.00 0.00 .... . 0.00 .... . 0.00 .... .
9o.01.44 0 0.00 0.00 .... . 0.00 .... . 0.00 .... .
1e.00.1 4 994.04 994.04 1.44 119 0.00 .... . 0.00 .... .
9o.01.41 4 981.91 981.91 1.41 141 0.00 .... . 0.00 .... .
1e.00.4 4 811.19 811.19 1.14 149 0.00 .... . 0.00 .... .
9o.01.14 4 809.99 809.99 1.14 119 0.00 .... . 0.00 .... .
1e.00.1 4 980.86 980.86 1.19 144 0.00 .... . 0.00 .... .
9o.01.11 4 998.89 998.89 1.11 140 0.00 .... . 0.00 .... .
/aggr1_usts_nz_3001/plex0/rg1:
9a.10.14 0 0.00 0.00 .... . 0.00 .... . 0.00 .... .
9e.40.14 0 0.00 0.00 .... . 0.00 .... . 0.00 .... .
1g.11.14 4 999.10 999.10 1.16 110 0.00 .... . 0.00 .... .
1o.41.14 4 996.90 996.90 1.44 118 0.00 .... . 0.00 .... .
9a.10.11 4 911.11 911.11 1.44 116 0.00 .... . 0.00 .... .
9e.40.11 4 919.48 919.48 1.11 141 0.00 .... . 0.00 .... .
1g.11.11 4 900.44 900.44 1.16 146 0.00 .... . 0.00 .... .
1o.41.11 1 694.19 694.19 1.19 109 0.00 .... . 0.00 .... .
9a.10.14 4 941.44 941.44 1.61 111 0.00 .... . 0.00 .... .
I want to take the lines after, say, /aggr0 and redirect them to a file. So, for example, file1 would have this information:
/aggr0_usts_nz_3001/plex0/rg0:
9g.10.0 0 4.08 0.00 .... . 4.08 1.00 41 0.00 .... .
1a.10.1 0 4.08 0.00 .... . 4.08 1.00 10 0.00 .... .
9g.10.4 0 4.08 0.00 .... . 4.08 1.00 49 0.00 .... .
1a.10.1 0 0.00 0.00 .... . 0.00 .... . 0.00 .... .
9g.10.4 0 0.00 0.00 .... . 0.00 .... . 0.00 .... .
and then file2 would have this information, and so on:
/aggr1_usts_nz_3001/plex0/rg0:
1e.00.0 0 0.00 0.00 .... . 0.00 .... . 0.00 .... .
9o.01.44 0 0.00 0.00 .... . 0.00 .... . 0.00 .... .
1e.00.1 4 994.04 994.04 1.44 119 0.00 .... . 0.00 .... .
9o.01.41 4 981.91 981.91 1.41 141 0.00 .... . 0.00 .... .
1e.00.4 4 811.19 811.19 1.14 149 0.00 .... . 0.00 .... .
9o.01.14 4 809.99 809.99 1.14 119 0.00 .... . 0.00 .... .
1e.00.1 4 980.86 980.86 1.19 144 0.00 .... . 0.00 .... .
9o.01.11 4 998.89 998.89 1.11 140 0.00 .... . 0.00 .... .
So it's like segregating the information in the file.
I had the command below, but since the number of lines after each aggr header is not the same, it only shows the fixed number of lines that was defined (7 here):
for i in `cat sample.txt`; do
echo $i | grep aggr* -A 7
done
but it's only showing the matches themselves (the for loop splits the file into words, so each grep invocation sees a single word and -A 7 has nothing more to print).
This command prints 2 lines after matching the pattern; however, what I want is to redirect each block to its own file:
awk '/aggr/{x=NR+2}(NR<=x){print}' sample.txt
Any idea how I can accomplish this?

You may use this awk, which starts a new output file each time a line beginning with /aggr is seen and writes every line (header included) to the current file:
awk '/^\/aggr/ {close(fn); fn = "file" ++fNo ".txt"} {print > fn}' file
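With the sample above saved as sample.txt, a run is as simple as (a sketch; three /aggr headers in, three files out):
awk '/^\/aggr/ {close(fn); fn = "file" ++fNo ".txt"} {print > fn}' sample.txt
ls file*.txt
file1.txt  file2.txt  file3.txt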

This might work for you (GNU csplit):
csplit -n1 -ffile file '/^\/aggr/' '{*}'
This will produce 4 files from your example (file0, file1, file2, file3), where file0 is empty. If you don't mind the numbers starting from zero, use:
csplit -zn1 -ffile file '/^\/aggr/' '{*}'
This will elide the first empty file.
For a sed solution using bash (GNU sed is required for the e flag):
sed -En '/^\/aggr/!b;x;s/^/1/;x;:a;H;$!{n;/^\/aggr/!ba};x
s/^(\S+)\n(.*)/echo "\2" > file\1;echo $((\1+1))/e;x;$!ba' file
In essence, this gathers up a split of the file in the hold space and writes it out when it encounters the next split or the end of the file.
The file number is prepended as the first line of the hold space when the first split is encountered; after each split is written out, the file number is incremented using standard bash arithmetic and replaces the contents of the hold space.
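For comparison, the same splitting logic in plain bash (my sketch, not from the original answer; it assumes every section header starts with /aggr, as in the sample):
n=0
while IFS= read -r line; do
  [[ $line == /aggr* ]] && ((n++))             # new header: move on to the next file
  ((n > 0)) && printf '%s\n' "$line" >> "file$n.txt"
done < sample.txt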

Related

Get max value of a column within a fixed timestamp

I have a file with a timestamp and data in 12 columns. This data is dumped every second, and I need to pick the MAX value of the 6th column within every minute. I am not even sure where to start. I thought of doing the following, but I do not know how to get one line out of each minute group. Also, what if the data spans more than 24 hours? Then I cannot use this approach. I think somehow I need to create a group of 60 rows and then sort the data out of it, but I am not sure how to do that.
cat file |sort -k6 -r |awk '!a[$1]++' |sort -k1
For example, input data:
16:06:00 0 1.01 0.00 4.04 1.00 0.00 0.00 0.00 0.00 0.00 94.95
16:06:01 0 0.00 0.00 2.00 2.00 0.00 0.00 0.00 0.00 0.00 98.00
16:06:02 0 3.03 0.00 6.06 5.00 0.00 0.00 0.00 0.00 0.00 90.91
16:06:03 0 4.08 1.02 2.04 2.00 0.00 0.00 0.00 0.00 0.00 92.86
...
...
16:06:59 0 4.08 1.02 2.04 3.00 0.00 0.00 0.00 0.00 0.00 92.86
16:07:00 0 1.01 0.00 4.04 4.00 0.00 0.00 0.00 0.00 0.00 94.95
16:07:01 0 0.00 0.00 2.00 5.00 0.00 0.00 0.00 0.00 0.00 98.00
16:07:02 0 3.03 0.00 6.06 9.00 0.00 0.00 0.00 0.00 0.00 90.91
16:07:03 0 4.08 1.02 2.04 0.00 0.00 0.00 0.00 0.00 0.00 92.86
...
...
16:07:59 0 4.08 1.02 2.04 0.00 0.00 0.00 0.00 0.00 0.00 92.86
...
...
Expected output:
16:06:02 0 3.03 0.00 6.06 5.00 0.00 0.00 0.00 0.00 0.00 90.91
16:07:02 0 3.03 0.00 6.06 9.00 0.00 0.00 0.00 0.00 0.00 90.91
awk to the rescue!
$ awk ' {split($1,a,":"); k=a[1]a[2]}
max[k]<$6 {max[k]=$6; maxR[k]=$0}
END {for(r in maxR) print maxR[r]}' file
16:06:02 0 3.03 0.00 6.06 5.00 0.00 0.00 0.00 0.00 0.00 90.91
16:07:02 0 3.03 0.00 6.06 9.00 0.00 0.00 0.00 0.00 0.00 90.91
note that max is not initialized (implicitly initialized to zero), so if all the values are negative this is not going to work. The workaround is simple, but perhaps not needed in this context.
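If negative values are possible, one simple workaround (a sketch, not part of the original answer) is to seed the max with the first value seen for each key:
$ awk '{split($1,a,":"); k=a[1]a[2]}
!(k in max) || max[k]<$6 {max[k]=$6; maxR[k]=$0}
END {for(r in maxR) print maxR[r]}' file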
This alternative assumes time-sorted records and prints the max in one-minute intervals, so different dates will not be merged.
$ awk '{split($1,a,":"); k=a[1]a[2]}
p!=k {if(p) print maxR; p=k; max=$6; maxR=$0; next}
max<$6 {max=$6; maxR=$0}
END {print maxR}' file
16:06:02 0 3.03 0.00 6.06 5.00 0.00 0.00 0.00 0.00 0.00 90.91
16:07:02 0 3.03 0.00 6.06 9.00 0.00 0.00 0.00 0.00 0.00 90.91
Using Perl
$ cat monk.log
16:06:00 0 1.01 0.00 4.04 1.00 0.00 0.00 0.00 0.00 0.00 94.95
16:06:01 0 0.00 0.00 2.00 2.00 0.00 0.00 0.00 0.00 0.00 98.00
16:06:02 0 3.03 0.00 6.06 5.00 0.00 0.00 0.00 0.00 0.00 90.91
16:06:03 0 4.08 1.02 2.04 2.00 0.00 0.00 0.00 0.00 0.00 92.86
16:06:59 0 4.08 1.02 2.04 3.00 0.00 0.00 0.00 0.00 0.00 92.86
16:07:00 0 1.01 0.00 4.04 4.00 0.00 0.00 0.00 0.00 0.00 94.95
16:07:01 0 0.00 0.00 2.00 5.00 0.00 0.00 0.00 0.00 0.00 98.00
16:07:02 0 3.03 0.00 6.06 9.00 0.00 0.00 0.00 0.00 0.00 90.91
16:07:03 0 4.08 1.02 2.04 0.00 0.00 0.00 0.00 0.00 0.00 92.86
16:07:59 0 4.08 1.02 2.04 0.00 0.00 0.00 0.00 0.00 0.00 92.86
$ perl -F'/\s+/' -lane ' $F[0]=~/(.*):/ and $x=$1 ; if( $F[5]>$kv{$x} ) { $kv{$x}=$F[5]; $kv2{$x}=$_ } END { print "$kv2{$_}" for(keys %kv) } ' monk.log
16:06:02 0 3.03 0.00 6.06 5.00 0.00 0.00 0.00 0.00 0.00 90.91
16:07:02 0 3.03 0.00 6.06 9.00 0.00 0.00 0.00 0.00 0.00 90.91
or
$ perl -F'/\s+/' -lane ' $F[0]=~/(.*):/ ; if( $F[5]>$kv{$1} ) { $kv{$1}=$F[5]; $kv2{$1}=$_ } END { print "$kv2{$_}" for(keys %kv) } ' monk.log
16:07:02 0 3.03 0.00 6.06 9.00 0.00 0.00 0.00 0.00 0.00 90.91
16:06:02 0 3.03 0.00 6.06 5.00 0.00 0.00 0.00 0.00 0.00 90.91
awk + sort
$ cat monk.log
16:06:00 0 1.01 0.00 4.04 1.00 0.00 0.00 0.00 0.00 0.00 94.95
16:06:01 0 0.00 0.00 2.00 2.00 0.00 0.00 0.00 0.00 0.00 98.00
16:06:02 0 3.03 0.00 6.06 5.00 0.00 0.00 0.00 0.00 0.00 90.91
16:06:03 0 4.08 1.02 2.04 2.00 0.00 0.00 0.00 0.00 0.00 92.86
16:06:59 0 4.08 1.02 2.04 3.00 0.00 0.00 0.00 0.00 0.00 92.86
16:07:00 0 1.01 0.00 4.04 4.00 0.00 0.00 0.00 0.00 0.00 94.95
16:07:01 0 0.00 0.00 2.00 5.00 0.00 0.00 0.00 0.00 0.00 98.00
16:07:02 0 3.03 0.00 6.06 9.00 0.00 0.00 0.00 0.00 0.00 90.91
16:07:03 0 4.08 1.02 2.04 0.00 0.00 0.00 0.00 0.00 0.00 92.86
16:07:59 0 4.08 1.02 2.04 0.00 0.00 0.00 0.00 0.00 0.00 92.86
$ awk ' { split($1,t,":"); $(NF+1)=t[1]t[2] }1 ' monk.log | sort -k13,13n -k6,6nr | awk ' !a[$NF] { a[$NF]++ ; NF--; print} '
16:06:02 0 3.03 0.00 6.06 5.00 0.00 0.00 0.00 0.00 0.00 90.91
16:07:02 0 3.03 0.00 6.06 9.00 0.00 0.00 0.00 0.00 0.00 90.91
or
$ awk ' split($1,t,":") && $(NF+1)=t[1]t[2] ' monk.log | sort -k13,13n -k6,6nr | awk ' !a[$NF] { a[$NF]++ ; NF--; print} '
16:06:02 0 3.03 0.00 6.06 5.00 0.00 0.00 0.00 0.00 0.00 90.91
16:07:02 0 3.03 0.00 6.06 9.00 0.00 0.00 0.00 0.00 0.00 90.91

Grep not parsing the whole file

I want to use grep to pick lines not containing "WAT" in a file of 425409 lines, 26.8 MB in size, UTF-8 encoded.
The file looks like this:
ATOM 1 N ALA 1 9.979 -15.619 28.204 1.00 0.00
ATOM 2 H1 ALA 1 9.594 -15.053 28.938 1.00 0.00
ATOM 3 H2 ALA 1 9.558 -15.358 27.323 1.00 0.00
ATOM 12 O ALA 1 7.428 -16.246 28.335 1.00 0.00
ATOM 13 N HID 2 7.563 -18.429 28.562 1.00 0.00
ATOM 14 H HID 2 6.557 -18.369 28.638 1.00 0.00
ATOM 15 CA HID 2 8.082 -19.800 28.535 1.00 0.00
ATOM 24 HE1 HID 2 8.603 -23.670 33.041 1.00 0.00
ATOM 25 NE2 HID 2 8.012 -23.749 30.962 1.00 0.00
ATOM 29 O HID 2 5.854 -20.687 28.537 1.00 0.00
ATOM 30 N GLN 3 7.209 -21.407 26.887 1.00 0.00
ATOM 31 H GLN 3 8.168 -21.419 26.566 1.00 0.00
ATOM 32 CA GLN 3 6.271 -22.274 26.157 1.00 0.00
[16443 lines]
ATOM 16425 C116 PA 1089 -34.635 6.968 -0.185 1.00 0.00
ATOM 16426 H16R PA 1089 -35.669 7.267 -0.368 1.00 0.00
ATOM 16427 H16S PA 1089 -34.579 5.878 -0.218 1.00 0.00
ATOM 16428 H16T PA 1089 -34.016 7.366 -0.990 1.00 0.00
ATOM 16429 C115 PA 1089 -34.144 7.493 1.177 1.00 0.00
ATOM 16430 H15R PA 1089 -33.101 7.198 1.305 1.00 0.00
ATOM 16431 H15S PA 1089 -34.179 8.585 1.197 1.00 0.00
ATOM 16432 C114 PA 1089 -34.971 6.910 2.342 1.00 0.00
ATOM 16433 H14R PA 1089 -35.147 5.847 2.166 1.00 0.00
[132284 lines]
ATOM 60981 O WAT 7952 -46.056 -5.515 -56.245 1.00 0.00
ATOM 60982 H1 WAT 7952 -45.185 -5.238 -56.602 1.00 0.00
ATOM 60983 H2 WAT 7952 -46.081 -6.445 -56.561 1.00 0.00
TER
ATOM 60984 O WAT 7953 -51.005 -3.205 -46.712 1.00 0.00
ATOM 60985 H1 WAT 7953 -51.172 -3.159 -47.682 1.00 0.00
ATOM 60986 H2 WAT 7953 -51.051 -4.177 -46.579 1.00 0.00
TER
ATOM 60987 O WAT 7954 -49.804 -0.759 -49.284 1.00 0.00
ATOM 60988 H1 WAT 7954 -48.962 -0.677 -49.785 1.00 0.00
ATOM 60989 H2 WAT 7954 -49.868 0.138 -48.903 1.00 0.00
[many lines until the end]
TER
END
I have used grep -v 'WAT' file.txt, but it only returned the first 16179 lines not containing "WAT", and I can see that there are more lines not containing "WAT". For instance, the following line (and many others) does not appear in the output:
ATOM 16425 C116 PA 1089 -34.635 6.968 -0.185 1.00 0.00
In order to figure out what was happening, I tried grep ' ' file.txt. This command should return every line in the file, but it only returned the first 16179 lines too.
I also tried tail -408977 file.txt | grep ' ' and it returned all the lines selected by tail. Then I tried tail -408978 file.txt | grep ' ' and the output was totally empty, zero lines.
I am working on a "normal" 64 bit system, Kubuntu.
Thanks a lot for the help!
When I try, I get:
$: grep WAT file.txt
Binary file file.txt matches
grep is assuming it's a binary file. Add -a:
-a, --text equivalent to --binary-files=text
$: grep -a WAT file.txt|head -3
ATOM 29305 O WAT 4060 -75.787 -79.125 25.925 1.00 0.00 O
ATOM 29306 H1 WAT 4060 -76.191 -78.230 25.936 1.00 0.00 H
ATOM 29307 H2 WAT 4060 -76.556 -79.670 25.684 1.00 0.00 H
Your file has two NULs at the end of each of lines 16426, 16428, 16430, and 16432.
$: tr "\0" '#' <file.txt | grep -n '#'
16426:ATOM 16421 KA CAL 1085 -20.614 -22.960 18.641 1.00 0.00 ##
16428:ATOM 16422 KA CAL 1086 20.249 21.546 19.443 1.00 0.00 ##
16430:ATOM 16423 KA CAL 1087 22.695 -19.700 19.624 1.00 0.00 ##
16432:ATOM 16424 KA CAL 1088 -22.147 19.317 17.966 1.00 0.00 ##
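If you would rather repair the file once instead of passing -a on every call, deleting the NULs with tr makes grep treat the file as text again (a sketch; the filenames file_clean.txt and no_wat.txt are arbitrary):
$: tr -d '\0' <file.txt >file_clean.txt
$: grep -v WAT file_clean.txt >no_wat.txt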

Sort by highest value in any field

How can I sort a file based on values in columns 2-8?
Essentially I want ascending order based on the highest value that appears on the line in any of those fields, ignoring columns 1, 9, and 10. That is, the line with the highest value should be the last line of the file, the second-largest value should be on the second-to-last line, and so on. If the next number in ascending order appears on multiple lines (like A/B below), I don't care about the order in which they get printed.
I've looked at using sort but can't figure out an easy way to do what I want...
I'm a bit stumped, any ideas?
Input:
#1 2 3 4 5 6 7 8 9 10
A 0.00 0.00 0.01 0.23 0.19 0.07 0.26 0.52 0.78
B 0.00 0.00 0.02 0.26 0.19 0.09 0.20 0.56 0.76
C 0.00 0.00 0.02 0.16 0.20 0.22 2.84 0.60 3.44
D 0.00 0.00 0.02 0.29 0.22 0.09 0.28 0.62 0.90
E 0.00 0.00 0.90 0.09 0.18 0.05 0.24 1.21 1.46
F 0.00 0.00 1.06 0.03 0.04 0.01 0.00 1.13 1.14
G 0.00 0.00 1.11 0.10 0.31 0.08 0.64 1.60 2.25
H 0.00 0.00 1.39 0.03 0.04 0.01 0.01 1.47 1.48
I 0.00 0.00 1.68 0.16 0.55 0.24 5.00 2.63 7.63
J 0.00 0.00 6.86 0.52 1.87 0.59 12.79 9.83 22.62
K 0.00 0.00 7.26 0.57 2.00 0.64 11.12 10.47 21.59
Expected output:
#1 2 3 4 5 6 7 8 9 10
A 0.00 0.00 0.01 0.23 0.19 0.07 (0.26) 0.52 0.78
B 0.00 0.00 0.02 (0.26) 0.19 0.09 0.20 0.56 0.76
D 0.00 0.00 0.02 (0.29) 0.22 0.09 0.28 0.62 0.90
E 0.00 0.00 (0.90) 0.09 0.18 0.05 0.24 1.21 1.46
F 0.00 0.00 (1.06) 0.03 0.04 0.01 0.00 1.13 1.14
G 0.00 0.00 (1.11) 0.10 0.31 0.08 0.64 1.60 2.25
H 0.00 0.00 (1.39) 0.03 0.04 0.01 0.01 1.47 1.48
C 0.00 0.00 0.02 0.16 0.20 0.22 (2.84) 0.60 3.44
I 0.00 0.00 1.68 0.16 0.55 0.24 (5.00) 2.63 7.63
K 0.00 0.00 7.26 0.57 2.00 0.64 (11.12) 10.47 21.59
J 0.00 0.00 6.86 0.52 1.87 0.59 (12.79) 9.83 22.62
Preprocess the data: print the max of columns 2 through 8 at the start of each line, then sort, then remove the added column:
awk '
NR==1{print "x ", $0}
NR>1{
max = $2;
for( i = 3; i <= 8; i++ )
if( $i > max )
max = $i;
print max, $0
}' OFS=\\t input-file | sort -n | cut -f 2-
Another pure awk variant (this one requires GNU awk, for PROCINFO["sorted_in"]):
$ awk 'NR==1; # print header
NR>1{ #For other lines,
a=$2;
ai=2;
for(i=3;i<=8;i++){
if($i>a){
a=$i;
ai=i;
}
} # Find the max number in the line
$ai= "(" $ai ")"; # decoration - mark highest with ()
g[$0]=a;
}
function cmp_num_val(i1, v1, i2, v2) {return (v1 - v2);} # sorting function
END{
PROCINFO["sorted_in"]="cmp_num_val"; # assign sorting function
for (a in g) print a; # print
}' sortme.txt | column -t # column -t for formatting.
#1 2 3 4 5 6 7 8 9 10
A 0.00 0.00 0.01 0.23 0.19 0.07 (0.26) 0.52 0.78
B 0.00 0.00 0.02 (0.26) 0.19 0.09 0.20 0.56 0.76
D 0.00 0.00 0.02 (0.29) 0.22 0.09 0.28 0.62 0.90
E 0.00 0.00 (0.90) 0.09 0.18 0.05 0.24 1.21 1.46
F 0.00 0.00 (1.06) 0.03 0.04 0.01 0.00 1.13 1.14
G 0.00 0.00 (1.11) 0.10 0.31 0.08 0.64 1.60 2.25
H 0.00 0.00 (1.39) 0.03 0.04 0.01 0.01 1.47 1.48
C 0.00 0.00 0.02 0.16 0.20 0.22 (2.84) 0.60 3.44
I 0.00 0.00 1.68 0.16 0.55 0.24 (5.00) 2.63 7.63
K 0.00 0.00 7.26 0.57 2.00 0.64 (11.12) 10.47 21.59
J 0.00 0.00 6.86 0.52 1.87 0.59 (12.79) 9.83 22.62

Unix Shell: Summing up values, one per line but skipping every nth row

I am trying to design a Unix shell script (preferably generic sh) that will take a file whose contents are numbers, one per line. These numbers are the CPU idle times from mpstat, obtained by:
cat ${PARSE_FILE} | awk '{print $13}' | grep "^[!0-9]" > temp.txt
So the file is a list of numbers, like:
46.19
93.41
73.60
99.40
95.80
96.00
77.10
99.20
52.76
81.18
69.38
89.80
97.00
97.40
76.18
97.10
What these values really are: line 1 is for Core 1, line 2 for Core 2, and so on for X cores (in my case 8), so every 9th line is for Core 1 again, etc.
The original file looks something like this:
10/28/2013 Linux 2.6.32-358.el6.x86_64 (host) 10/28/2013 _x86_64_
(32 CPU)
10/28/2013
10/28/2013 02:25:05 PM CPU %usr %nice %sys %iowait %irq %soft %steal %guest %idle
10/28/2013 02:25:15 PM 0 51.20 0.00 2.61 0.00 0.00 0.00 0.00 0.00 46.19
10/28/2013 02:25:15 PM 1 6.09 0.00 0.50 0.00 0.00 0.00 0.00 0.00 93.41
10/28/2013 02:25:15 PM 2 25.20 0.00 1.20 0.00 0.00 0.00 0.00 0.00 73.60
10/28/2013 02:25:15 PM 3 0.40 0.00 0.20 0.00 0.00 0.00 0.00 0.00 99.40
10/28/2013 02:25:15 PM 4 3.80 0.00 0.40 0.00 0.00 0.00 0.00 0.00 95.80
10/28/2013 02:25:15 PM 5 3.70 0.00 0.30 0.00 0.00 0.00 0.00 0.00 96.00
10/28/2013 02:25:15 PM 6 21.70 0.00 1.20 0.00 0.00 0.00 0.00 0.00 77.10
10/28/2013 02:25:15 PM 7 0.70 0.00 0.10 0.00 0.00 0.00 0.00 0.00 99.20
10/28/2013 02:25:25 PM 0 45.03 0.00 1.61 0.00 0.00 0.60 0.00 0.00 52.76
10/28/2013 02:25:25 PM 1 17.82 0.00 1.00 0.00 0.00 0.00 0.00 0.00 81.18
10/28/2013 02:25:25 PM 2 29.62 0.00 1.00 0.00 0.00 0.00 0.00 0.00 69.38
10/28/2013 02:25:25 PM 3 9.70 0.00 0.40 0.00 0.00 0.10 0.00 0.00 89.80
10/28/2013 02:25:25 PM 4 2.40 0.00 0.60 0.00 0.00 0.00 0.00 0.00 97.00
10/28/2013 02:25:25 PM 5 2.00 0.00 0.60 0.00 0.00 0.00 0.00 0.00 97.40
10/28/2013 02:25:25 PM 6 22.92 0.00 0.90 0.00 0.00 0.00 0.00 0.00 76.18
10/28/2013 02:25:25 PM 7 2.40 0.00 0.50 0.00 0.00 0.00 0.00 0.00 97.10
I'm trying to design a script that will take the number of cores and this file as variables and give me the average for each core, and I'm not sure how to do this. Here is what I have:
cat ${PARSE_FILE} | awk '{print $13}' | grep "^[!0-9]" > temp.txt
NUMBER_OF_CORES=8
NUMBER_OF_LINES=`awk ' END { print NR } ' temp.txt`
NUMBER_OF_VALUES=`echo "scale=0;${NUMBER_OF_LINES}/${NUMBER_OF_CORES}" | bc`
for i in `seq 1 ${NUMBER_OF_CORES}`
do
awk 'NR % $i == 0' temp.txt
echo Core: ${i} Average: xx
done
So I have the number of values per core (lines divided by cores), which tells me I need every nth line, but I'm not sure how to do this cleanly. Basically, I need to loop through the file "NUMBER_OF_CORES" times, each pass taking every "NUMBER_OF_CORES"th line, summing those values, and dividing by "NUMBER_OF_VALUES".
Will this do?
awk '/CPU/&&/idle/{f=1;next}f{a[$4]+=$13;b[$4]++}END{for(i in a){print i,a[i]/b[i]}}' your_file
Actually, the number of cores is not needed here: it will calculate the average idle time for all the cores present in the file.
Tested:
> cat temp
10/28/2013 Linux 2.6.32-358.el6.x86_64 (host) 10/28/2013 _x86_64_
(32 CPU)
10/28/2013
10/28/2013 02:25:05 PM CPU %usr %nice %sys %iowait %irq %soft %steal %guest %idle
10/28/2013 02:25:15 PM 0 51.20 0.00 2.61 0.00 0.00 0.00 0.00 0.00 46.19
10/28/2013 02:25:15 PM 1 6.09 0.00 0.50 0.00 0.00 0.00 0.00 0.00 93.41
10/28/2013 02:25:15 PM 2 25.20 0.00 1.20 0.00 0.00 0.00 0.00 0.00 73.60
10/28/2013 02:25:15 PM 3 0.40 0.00 0.20 0.00 0.00 0.00 0.00 0.00 99.40
10/28/2013 02:25:15 PM 4 3.80 0.00 0.40 0.00 0.00 0.00 0.00 0.00 95.80
10/28/2013 02:25:15 PM 5 3.70 0.00 0.30 0.00 0.00 0.00 0.00 0.00 96.00
10/28/2013 02:25:15 PM 6 21.70 0.00 1.20 0.00 0.00 0.00 0.00 0.00 77.10
10/28/2013 02:25:15 PM 7 0.70 0.00 0.10 0.00 0.00 0.00 0.00 0.00 99.20
10/28/2013 02:25:25 PM 0 45.03 0.00 1.61 0.00 0.00 0.60 0.00 0.00 52.76
10/28/2013 02:25:25 PM 1 17.82 0.00 1.00 0.00 0.00 0.00 0.00 0.00 81.18
10/28/2013 02:25:25 PM 2 29.62 0.00 1.00 0.00 0.00 0.00 0.00 0.00 69.38
10/28/2013 02:25:25 PM 3 9.70 0.00 0.40 0.00 0.00 0.10 0.00 0.00 89.80
10/28/2013 02:25:25 PM 4 2.40 0.00 0.60 0.00 0.00 0.00 0.00 0.00 97.00
10/28/2013 02:25:25 PM 5 2.00 0.00 0.60 0.00 0.00 0.00 0.00 0.00 97.40
10/28/2013 02:25:25 PM 6 22.92 0.00 0.90 0.00 0.00 0.00 0.00 0.00 76.18
10/28/2013 02:25:25 PM 7 2.40 0.00 0.50 0.00 0.00 0.00 0.00 0.00 97.10
> nawk '/CPU/&&/idle/{f=1;next}f{a[$4]+=$13;b[$4]++}END{for(i in a){print i,a[i]/b[i]}}' temp
2 71.49
3 94.6
4 96.4
5 96.7
6 76.64
7 98.15
0 49.475
1 87.295
>
The script below, countCores.sh, is based on the data you gave in temp.txt. This may not be what you want, but it will give you some ideas. I wasn't sure what overall total average you wanted, so I just chose to show the average of the values in column one for all 8 cores. I also used cat -n to represent the core number. Hope this helps. VonBell
#!/bin/bash
#Execute As: countCores.sh temp.txt 8
AllCoreTotals=0
DataFile="$1"
NumCores="$2"
AllCoreTotals=0
NumLines="`cat -n $DataFile|cut -f1|tail -1|tr -d " "`"
PrtCols="`echo $NumLines / $NumCores|bc`"
clear;echo;echo
echo "============================================================="
pr -t${PrtCols} $DataFile|tr -d "\t"|tr -s " " "+"|bc |\
while read CoreTotal
do
CoreAverage=`echo $CoreTotal / $PrtCols|bc`
echo "$CoreTotal Core Average $CoreAverage"
AllCoreTotals="`echo $CoreTotal + $AllCoreTotals|bc`"
echo "$AllCoreTotals" > AllCoreTot.tmp
done|cat -n
AllCoreAverage=`cat AllCoreTot.tmp`
AllCoreAverage="`echo $AllCoreAverage / $NumCores|bc`"
echo "============================================================="
echo "(Col One) Total Core Average: $AllCoreAverage "
rm $DataFile
rm AllCoreTot.tmp
Why not do it for all cores at the same time:
awk -f prog.awk ${PARSE_FILE}
Then in prog.awk put:
{ if ((NF == 13) && ($4 != "CPU"))
{ SUM[$4] += $13;
CNT[$4]++;
}
}
END { for(loop in SUM)
{ printf("CPU: %d Total: %d Count: %d Average: %d\n",
loop, SUM[loop], CNT[loop], SUM[loop]/CNT[loop]);
}
}
If you want to do it on one line:
awk '{if ((NF == 13) && ($4 != "CPU")){SUM[$4] += $13;CNT[$4]++;}} END {for(loop in SUM){printf("CPU: %d Total: %d Count: %d Average: %d\n", loop, SUM[loop], CNT[loop], SUM[loop]/CNT[loop]);}}' ${PARSE_FILE}
After some more study, this snippet seems to do the trick:
#Parse logs to get CPU averages for cores
PARSE_FILE=`ls ~/logs/*mpstat*`
echo "Parsing ${PARSE_FILE}..."
cat ${PARSE_FILE} | awk '{print $13}' | grep "^[!0-9]" > temp.txt
NUMBER_OF_CORES=8
NUMBER_OF_LINES=`awk ' END { print NR } ' temp.txt`
NUMBER_OF_VALUES=`echo "scale=0;${NUMBER_OF_LINES}/${NUMBER_OF_CORES}" | bc`
TOTAL=0
for i in `seq 1 ${NUMBER_OF_CORES}`
do
sed -n $i'~'$NUMBER_OF_CORES'p' temp.txt > temp2.txt
SUM=`awk '{s+=$0} END {print s}' temp2.txt`
AVERAGE=`echo "scale=0;${SUM}/${NUMBER_OF_VALUES}" | bc`
echo Core: ${i} Average: `expr 100 - ${AVERAGE}`
TOTAL=$((TOTAL+${AVERAGE}))
done
TOTAL_AVERAGE=`echo "scale=0;${TOTAL}/${NUMBER_OF_CORES}" | bc`
echo "Total Average: `expr 100 - ${TOTAL_AVERAGE}`"
rm temp*.txt
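One note: the first~step address used with sed -n above is a GNU sed extension. On systems without GNU sed, the same every-Nth-line selection can be done with awk (a sketch using the loop variables from the snippet above):
awk -v i=$i -v n=${NUMBER_OF_CORES} 'NR % n == i % n' temp.txt > temp2.txt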

Is it possible to create a graph with a shell script

I want to create a graph file using a shell script. For example, I want to make a graph of the sar output of my system:
sar 1 10
05:36:32 AM CPU %user %nice %system %iowait %steal %idle
05:36:33 AM all 0.00 0.00 0.00 0.00 0.00 100.00
05:36:34 AM all 0.00 0.00 0.00 0.00 0.00 100.00
05:36:35 AM all 0.00 0.00 0.00 0.00 0.00 100.00
05:36:36 AM all 0.00 0.00 0.00 0.00 0.00 100.00
05:36:37 AM all 0.00 0.00 0.00 0.00 0.00 100.00
05:36:38 AM all 0.00 0.00 0.00 0.00 0.00 100.00
05:36:39 AM all 0.00 0.00 0.00 0.00 0.00 100.00
05:36:40 AM all 0.00 0.00 0.00 0.00 0.00 100.00
05:36:41 AM all 0.00 0.00 0.00 0.00 0.00 100.00
05:36:42 AM all 0.00 0.00 0.00 0.00 0.00 100.00
Average: all 0.00 0.00 0.00 0.00 0.00 100.00
As a visualizer you can use Gnuplot.
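For example, a minimal sketch (the filenames sar.txt, cpu.dat, and cpu.png are my own choices): capture the sar output, extract the time and %idle columns, and feed them to gnuplot:
sar 1 10 > sar.txt
awk '$3 == "all" {print $1, $9}' sar.txt > cpu.dat    # keep time and %idle; skips the header and Average: lines
gnuplot <<'EOF'
set terminal png size 800,400
set output 'cpu.png'
set xdata time
set timefmt '%H:%M:%S'
set format x '%H:%M:%S'
set ylabel '%idle'
plot 'cpu.dat' using 1:2 with lines title 'CPU idle'
EOF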
