filter output of vmstat and iostat - bash

I am gathering statistics data using iostat and vmstat, running each one for 10 seconds at regular intervals. However, I don't want to print the whole output. For iostat I only want to show the number of reads and writes, displayed as columns. With vmstat, I just want to show the free, cache and buffer columns. How can I do this? Any filter I try just returns this result:
The systems are Ubuntu 12.04, both the desktop and the server-only version. They are run using VMware Player.
ms total merged
0 0 0
0 0 0
0 0 0
0 0 0
758118 836340 1892
0 0 0
0 0 0

Assuming the output formats are as follows:
> iostat -dx sda
Linux 3.13.0-45-generic (hostname obscured) 03/22/2015 _x86_64_ (8 CPU)
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 7.02 30.64 4.48 8.32 174.81 789.29 150.64 0.86 67.48 10.76 98.01 1.06 1.36
> vmstat
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
1 0 3728772 969952 614416 29911568 3 13 22 99 1 4 48 5 47 0 0
You can do the following for iostat (every 10 seconds if you'd like to):
device_name=sda # or whatever device name you want
iostat -dx ${device_name} | awk 'NR==4 { print $4 " " $5 }'
Example output (r/s w/s):
4.48 8.32
If you need a count greater than 1, do this:
iostat -dx ${device_name} ${interval} ${count} | awk 'NR==1 || /^$/ || /^Device:/ {next}; { print $4 " " $5 }'
Example output (for device_name=sda; interval=1; count=5):
10.24 8.88
0.00 0.00
0.00 2.00
0.00 0.00
0.00 0.00
And you can do the following for vmstat (every 10 seconds if you'd like to):
vmstat | awk 'NR==3 {print $4 " " $5 " " $6}'
Example output (free buff cache):
969952 614416 29911568
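If you want to poll both commands every 10 seconds and print one combined line per sample, a minimal sketch could look like the following (the field positions are taken from the layouts shown above, and the device name is just an example):

```shell
#!/bin/sh
# Hypothetical polling helper: one line per sample with iostat r/s, w/s
# followed by vmstat free, buff, cache. Field positions assume the
# iostat -dx / vmstat layouts shown above.
device_name=sda

poll_once() {
    iostat -dx "${device_name}" | awk 'NR==4 { printf "%s %s ", $4, $5 }'
    vmstat | awk 'NR==3 { print $4, $5, $6 }'
}

# e.g. sample every 10 seconds:
#   while true; do poll_once; sleep 10; done
```

Redirect the loop's output to a file if you want to keep the history instead of printing it.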


Split column into multiple based on match/delimiter using bash awk

I have a dataset in a single column that I would like to split into any number of new columns each time a certain string is found (in this case 'male_position').
>cat test.file
male_position
0.00
0.00
1.05
1.05
1.05
1.05
3.1
5.11
12.74
30.33
40.37
40.37
male_position
0.00
1.05
2.2
4.0
4.0
8.2
25.2
30.1
male_position
1.0
5.0
I would like the script to start a new tab-separated column each time 'male_position' is encountered, and print each line/data point below it (added to that column) until the next occurrence of 'male_position':
script.awk test.file > output
0.00 0.00 1.0
0.00 1.05 5.0
1.05 2.2
1.05 4.0
1.05 4.0
1.05 8.2
3.1 25.2
5.11 30.1
12.74
30.33
40.37
40.37
Any ideas?
update -
I have tried to adapt code based on this post (Linux split a column into two different columns in a same CSV file):
cat script.awk
BEGIN {
line = 0; #Initialize at zero
}
/male_position/ { #every time we hit the delimiter
line = 0; #reset line to zero
}
!/male_position/{ #otherwise
a[line] = a[line]" "$0; # Add the new input line to the output line
line++; # increase the counter by one
}
END {
for (i in a )
print a[i] # print the output
}
Results....
$ awk -f script.awk test.file
1.05 2.2
1.05 4.0
1.05 4.0
1.05 8.2
3.1 25.2
5.11 30.1
12.74
30.33
40.37
40.37
0.00 0.00 1.0
0.00 1.05 5.0
UPDATE 2 #######
I can recreate the expected output with the test.file case. Running the script (script.awk, see above) on Linux with that test file seemed to work. However, that simple example file only has a decreasing number of data points in each successive group between the delimiters (male_position). When a later group has more entries than an earlier one, the output seems to fail...
cat test.file2
male_position
0.00
0.00
1.05
1.05
1.05
1.05
3.1
5.11
12.74
male_position
0
5
10
male_position
0
1
2
3
5
awk -f script.awk test.file2
0.00 0 0
0.00 5 1
1.05 10 2
1.05 3
1.05 5
1.05
3.1
5.11
12.74
There is no 'padding' of the lines after the last observation for a given column, so a column with more values than the preceding column has its extra values fall in line with the previous column (the 3 and the 5 are in column 2, when they should be in column 3).
Here's a csplit+paste solution
$ csplit --suppress-matched -zs test.file2 /male_position/ {*}
$ ls
test.file2 xx00 xx01 xx02
$ paste xx*
0.00 0 0
0.00 5 1
1.05 10 2
1.05 3
1.05 5
1.05
3.1
5.11
12.74
From man csplit
csplit - split a file into sections determined by context lines
-z, --elide-empty-files
remove empty output files
-s, --quiet, --silent
do not print counts of output file sizes
--suppress-matched
suppress the lines matching PATTERN
/male_position/ is the regex used to split the input file
{*} specifies to create as many splits as possible
use -f and -n options to change the default output file names
paste xx* to paste the files column wise, TAB is default separator
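To try the csplit + paste approach without touching real data, here is a self-contained sketch that runs it on a small sample file in a temporary directory (GNU csplit assumed, since --suppress-matched is a GNU option):

```shell
# Self-contained run of the csplit + paste approach: split a sample file
# on the 'male_position' delimiter, then paste the pieces side by side.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1
printf 'male_position\n0.00\n1.05\nmale_position\n0\n5\n10\n' > sample.file
csplit --suppress-matched -zs sample.file /male_position/ '{*}'
paste xx*
```

Note that paste naturally pads shorter groups with empty fields, so a longer later column stays aligned.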
The following awk may help you with the same.
awk '/male_position/{count++;max=val>max?val:max;val=1;next} {array[val++,count]=$0} END{max=val>max?val:max;for(i=1;i<max;i++){for(j=1;j<=count;j++){printf("%s%s",array[i,j],j==count?ORS:OFS)}}}' OFS="\t" Input_file
Adding a non-one liner form of solution too now.
awk '
/male_position/{
  count++;
  max=val>max?val:max;
  val=1;
  next
}
{
  array[val++,count]=$0
}
END{
  max=val>max?val:max;
  for(i=1;i<max;i++){
    for(j=1;j<=count;j++){ printf("%s%s",array[i,j],j==count?ORS:OFS) }
  }
}
' OFS="\t" Input_file
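For a quick way to try this approach without creating Input_file, here is a self-contained variant fed by a here-doc; its END block folds the final group into max before printing, so a long last group is not cut off:

```shell
# Here-doc version of the transpose-by-delimiter approach: each
# 'male_position' starts a new column, data lines fill the current one.
result=$(awk '
/male_position/ { count++; max = (val > max ? val : max); val = 1; next }
{ array[val++, count] = $0 }
END {
  max = (val > max ? val : max)   # fold the final group in as well
  for (i = 1; i < max; i++)
    for (j = 1; j <= count; j++)
      printf("%s%s", array[i, j], (j == count ? ORS : OFS))
}' OFS="\t" <<EOF
male_position
0.00
1.05
male_position
0
5
10
EOF
)
printf '%s\n' "$result"
```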

Grep rows from top command based on a condition

[xxxxx@xxxx3 ~]$ top
top - 16:29:00 up 197 days, 19:06, 12 users, load average: 19.16, 21.08, 21.58
Tasks: 3668 total, 21 running, 3646 sleeping, 0 stopped, 1 zombie
Cpu(s): 14.1%us, 6.8%sy, 0.0%ni, 79.1%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 264389504k total, 53305000k used, 211084504k free, 859908k buffers
Swap: 134217720k total, 194124k used, 134023596k free, 12854016k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
19938 jai_web 20 0 3089m 2.9g 7688 R 100.0 1.1 0:10.26 Engine
19943 jai_web 20 0 3089m 2.9g 7700 R 100.0 1.1 0:10.14 Engine
20147 jai_web 20 0 610m 454m 3556 R 78.4 0.2 0:02.54 java
77169 jai_web 20 0 9414m 1.4g 29m S 21.3 0.6 38:51.69 java
20160 jai_web 20 0 362m 196m 3336 R 16.7 0.1 0:00.54 java
272287 jai_web 20 0 20.1g 2.0g 5784 S 15.1 0.8 165:39.50 java
26597 jai_web 20 0 6371m 134m 3444 S 9.6 0.1 429:41.97 java
From the snippet of top output above, I want to grep the PIDs whose TIME+ value is greater than 10:00.00 and that belong to a 'java' process,
so I am expecting grep output as below:
77169 jai_web 20 0 9414m 1.4g 29m S 21.3 0.6 **38:51.69** java
272287 jai_web 20 0 20.1g 2.0g 5784 S 15.1 0.8 **165:39.58** java
26597 jai_web 20 0 6371m 134m 3444 S 9.6 0.1 **429:41.97** java
I have tried the below:
top -p "$(pgrep -d ',' java)"
But it doesn't satisfy my condition. Please assist.
I would just do this for a one-time analysis:
$ top -n 1 -b | awk '$NF=="java" && $(NF-1) >= "10:00.00"'
Ok here is what I came up with...
You need to get the output of top, filter only the java lines, then check each line to see if the TIME is bigger than your limit. Here is what I did:
#!/bin/bash
#
tmpfile="/tmp/top.output"
top -o TIME -n 1 | grep java >$tmpfile
# filter each line and keep only the ones where TIME is bigger than a certain value
limit=10
while read line
do
# take the line and keep only the 11th field, which is the time value
# In that time value, keep only the first number
timevalue=$(echo $line | awk '{print $11}' | cut -d':' -f1)
# compare timevalue to the limit we set
if [ $timevalue -gt $limit ]
then
# output the entire line
echo $line
fi
done <$tmpfile
# cleanup
rm -f /tmp/top.output
The trick here is to extract the TIME value, keeping only its first number (the minutes); the remaining digits are not significant as long as the minutes exceed the limit.
Someone might know of a way to do it via grep, but I doubt it; I have never seen conditionals in grep.
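One caveat with comparing TIME+ as text: the comparison is lexicographic, so for instance 9:59.99 sorts after 10:00.00. A small sketch that converts TIME+ (minutes:seconds.hundredths) to seconds before comparing, assuming TIME+ is the next-to-last field and COMMAND the last, as in the output above:

```shell
# Filter java processes whose TIME+ (e.g. 165:39.50, i.e.
# minutes:seconds.hundredths) amounts to at least 10 minutes,
# comparing numerically rather than as strings.
java_over_10min() {
  awk '$NF == "java" {
    split($(NF-1), t, ":")          # t[1] = minutes, t[2] = seconds
    if (t[1] * 60 + t[2] >= 600)    # 10:00.00 -> 600 seconds
      print
  }'
}
# usage: top -b -n 1 | java_over_10min
```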

awk condition always TRUE in a loop [duplicate]

This question already has answers here:
How do I use shell variables in an awk script?
(7 answers)
Closed 7 years ago.
Good morning,
I'm sorry this question will seem trivial to some. It has been driving me mad for hours. My problem is the following:
I have these two files:
head <input file>
SNP CHR BP A1 A2 OR P
chr1:751343 1 751343 A T 0.85 0.01
chr1:751756 1 751756 T C 1.17 0.01
rs3094315 1 752566 A G 1.14 0.0093
rs3131972 1 752721 A G 0.88 0.009
rs3131971 1 752894 T C 0.87 0.01
chr1:753405 1 753405 A C 1.17 0.01
chr1:753425 1 753425 T C 0.87 0.0097
rs2073814 1 753474 G C 1.14 0.009
rs2073813 1 753541 A G 0.85 0.0095
and
head <interval file>
1 112667912 114334946
1 116220516 117220516
1 160997252 161997252
1 198231312 199231314
2 60408994 61408994
2 64868452 65868452
2 99649474 100719272
2 190599907 191599907
2 203245673 204245673
2 203374196 204374196
I would like to use a bash script to remove all lines from the input file in which the BP column lies within an interval specified in the interval file and in which the CHR column matches the first column of the interval file.
Here is the code I've been working with (although a simpler solution would be welcomed):
while read interval; do
chr=$(echo $interval | awk '{print $1}')
START=$(echo $interval | awk '{print $2}')
STOP=$(echo $interval | awk '{print $3}')
awk '$2!=$chr {print} $2==$chr && ($3<$START || $3>$STOP) {print}' < input_file > tmp
mv tmp <input file>
done <
My problem is that no lines are removed from the input file. Even if the command
awk '$2==1 && ($3>112667912 && $3<114334946) {print}' < input_file | wc -l
returns >4000 lines, so the lines clearly are in the input file.
Thank you very much for your help.
You can try perl instead of awk. The reason is that in perl you can create a hash of arrays to save the data of the interval file, and look it up more easily when processing your input, like:
perl -lane '
$. == 1 && next;
@F == 3 && do {
push @{$h{$F[0]}}, [@F[1..2]];
next;
};
@F == 7 && do {
$ok = 1;
if (exists $h{$F[1]}) {
for (@{$h{$F[1]}}) {
if ($F[2] > $_->[0] and $F[2] < $_->[1]) {
$ok = 0;
last;
}
}
}
printf qq|%s\n|, $_ if $ok;
};
' interval input
$. skips the header of the interval file. @F checks the number of columns, and the push creates the hash of arrays.
Your test data is not accurate because no line would be filtered out, so I changed it to:
SNP CHR BP A1 A2 OR P
chr1:751343 1 751343 A T 0.85 0.01
chr1:751756 1 112667922 T C 1.17 0.01
rs3094315 1 752566 A G 1.14 0.0093
rs3131972 1 752721 A G 0.88 0.009
rs3131971 1 752894 T C 0.87 0.01
chr1:753405 2 753405 A C 1.17 0.01
chr1:753425 1 753425 T C 0.87 0.0097
rs2073814 1 199231312 G C 1.14 0.009
rs2073813 2 204245670 A G 0.85 0.0095
So you can run it and get as result:
SNP CHR BP A1 A2 OR P
chr1:751343 1 751343 A T 0.85 0.01
rs3094315 1 752566 A G 1.14 0.0093
rs3131972 1 752721 A G 0.88 0.009
rs3131971 1 752894 T C 0.87 0.01
chr1:753405 2 753405 A C 1.17 0.01
chr1:753425 1 753425 T C 0.87 0.0097
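If you would rather stay in awk, the same idea (load the intervals first, then filter the input) can be sketched like this; the file names and the helper function are placeholders:

```shell
# awk sketch: read the interval file first (NR==FNR), then drop input
# rows whose BP ($3) falls strictly inside any interval for a matching
# CHR ($2). The header line of the input file is kept.
filter_intervals() {  # $1 = interval file, $2 = input file
  awk '
  NR == FNR { chr[NR] = $1; lo[NR] = $2; hi[NR] = $3; n = NR; next }
  FNR == 1  { print; next }   # keep the input header
  {
    for (i = 1; i <= n; i++)
      if ($2 == chr[i] && $3 > lo[i] && $3 < hi[i]) next
    print
  }' "$1" "$2"
}
# usage: filter_intervals interval_file input_file > filtered
```

This avoids re-reading the input file once per interval, which is what the while-read loop in the question does.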

Print awk for empty row

I have one problem. I am using this code in bash and awk:
#!/bin/bash
awk 'BEGIN {print "CHR\tSTART\tSTOP\tPOCET_READU\tGCcontent"}'
for z in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
do
export $z
for i in {0..249480000..60000}
do
u=$i
let "u +=60000"
export $i
export $u
samtools view /home/filip/Desktop/AMrtin\ Hynek/highThan89MapQ.bam chr$z:$i-$u | awk '{ n=length($10); print gsub(/[GCCgcs]/,"",$10)/n;}'| awk -v chr="chr"$z -v min=$i -v max=$u '{s+=$1}END{print chr,"\t",min,"\t",max,"\t",NR,"\t",s/NR}'
done
done
From this I am getting the result like this:
chr1 60000 120000 30 0.333
chr3 540000 600000 10 0.555
The step of the loop is 60000, but if NR is 0 then s/NR fails (we cannot divide by 0) and that row is missing from the output. Instead, when NR=0 I want to get:
chr1 0 60000 N/A N/A
chr1 60000 120000 30 0.333
chr3 480000 540000 N/A N/A
chr3 540000 600000 10 0.555
I tried use condition like
{s+=$1}END{print chr,"\t",min,"\t",max,"\t",NR,"\t",s/NR; if (S/NR == "") print chr,"\t",min,"\t",max,"\t","N/A","\t","N/A"}'
But it doesn't work.
Could you help me please?
The problem is you're dividing by zero, which is an error. You need to test NR before doing the division.
awk -v chr="chr"$z -v min=$i -v max=$u '
{s+=$1}
END {print chr, "\t", min, "\t", max, "\t", (NR ? NR : "N/A"), "\t", (NR ? s/NR : "N/A")}'
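A quick way to see both branches of that ternary, with input lines and with none:

```shell
# Demo of the NR guard: with input, print the count and average; with
# empty input, the END block prints N/A instead of dividing by zero.
avg_or_na() {
  awk '{ s += $1 } END { print (NR ? NR : "N/A"), (NR ? s / NR : "N/A") }'
}
printf '0.2\n0.4\n' | avg_or_na   # -> 2 0.3
printf '' | avg_or_na             # -> N/A N/A
```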

How to Extract some Fields from Real Time Output of a Command in Bash script

I want to extract some fields from the output of the command xentop. It's like the top command; it provides an ongoing look at CPU usage, memory usage, etc. in real time.
If I run this command in batch mode, I will have its output as you see in a file:
NAME STATE CPU(sec) CPU(%) MEM(k) MEM(%) MAXMEM(k) MAXMEM(%) VCPUS NETS NETTX(k) NETRX(k) VBDS VBD_OO VBD_RD VBD_WR VBD_RSECT VBD_WSECT SSID
Domain-0 -----r 13700 33.0 7127040 85.9 no limit n/a 8 0 0 0 0 0 0 0 0 0 0
fed18 -----r 738 190.6 1052640 12.7 1052672 12.7 3 1 259919 8265 1 0 82432 22750 2740966 1071672 0
and running this
cat file| tr '\r' '\n' | sed 's/[0-9][;][0-9][0-9][a-Z]/ /g' | col -bx | awk '{print $1,$4,$6}'
on this file gives me what I want
NAME CPU(%) MEM(%)
Domain-0 33.0 85.9
fed18 190.6 12.7
but my script doesn't work on the real-time output of xentop. I even tried running xentop just once by setting the iteration option to 1 (xentop -i 1), but it does not work!
How can I pipe the output of xentop, as non-real-time output, to my script?
It may not be sending any output to the standard output stream. There are several ways of sending output to the screen without using stdout. A quick Google search didn't provide much information about how it works internally.
I use xentop version 1.0 on XenServer 7.0, like:
[root@xen] xentop -V
xentop 1.0
[root@xen] cat /etc/centos-release
XenServer release 7.0.0-125380c (xenenterprise)
If you want to save the xentop output, you can do it with the '-b' (batch mode) and '-i' (number of iterations before exiting) options:
[root@xen] xentop -b -i 1
NAME STATE CPU(sec) CPU(%) MEM(k) MEM(%) MAXMEM(k) MAXMEM(%) VCPUS NETS NETTX(k) NETRX(k) VBDS VBD_OO VBD_RD VBD_WR VBD_RSECT VBD_WSECT SSID
Domain-0 -----r 132130 0.0 4194304 1.6 4194304 1.6 16 0 0 0 0 0 0 0 0 0 0
MY_VM --b--- 5652 0.0 16777208 6.3 16915456 6.3 4 0 0 0 1 - - - - - 0
[root@xen] xentop -b -i 1 > output.txt
[root@xen] cat output.txt
NAME STATE CPU(sec) CPU(%) MEM(k) MEM(%) MAXMEM(k) MAXMEM(%) VCPUS NETS NETTX(k) NETRX(k) VBDS VBD_OO VBD_RD VBD_WR VBD_RSECT VBD_WSECT SSID
Domain-0 -----r 132130 0.0 4194304 1.6 4194304 1.6 16 0 0 0 0 0 0 0 0 0 0
MY_VM --b--- 5652 0.0 16777208 6.3 16915456 6.3 4 0 0 0 1 - - - - - 0
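Once you have batch output, extracting the NAME, CPU(%) and MEM(%) columns is a plain awk step; a sketch assuming the column layout shown above (fields 1, 4 and 6):

```shell
# Pull NAME, CPU(%) and MEM(%) out of xentop batch output; the column
# positions (1, 4, 6) assume the header layout shown above.
xentop_fields() {
  awk '{ print $1, $4, $6 }'
}
# usage: xentop -b -i 1 | xentop_fields
```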
