Grep rows from top command based on a condition - bash

[xxxxx@xxxx3 ~]$ top
top - 16:29:00 up 197 days, 19:06, 12 users, load average: 19.16, 21.08, 21.58
Tasks: 3668 total, 21 running, 3646 sleeping, 0 stopped, 1 zombie
Cpu(s): 14.1%us, 6.8%sy, 0.0%ni, 79.1%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 264389504k total, 53305000k used, 211084504k free, 859908k buffers
Swap: 134217720k total, 194124k used, 134023596k free, 12854016k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
19938 jai_web 20 0 3089m 2.9g 7688 R 100.0 1.1 0:10.26 Engine
19943 jai_web 20 0 3089m 2.9g 7700 R 100.0 1.1 0:10.14 Engine
20147 jai_web 20 0 610m 454m 3556 R 78.4 0.2 0:02.54 java
77169 jai_web 20 0 9414m 1.4g 29m S 21.3 0.6 38:51.69 java
20160 jai_web 20 0 362m 196m 3336 R 16.7 0.1 0:00.54 java
272287 jai_web 20 0 20.1g 2.0g 5784 S 15.1 0.8 165:39.50 java
26597 jai_web 20 0 6371m 134m 3444 S 9.6 0.1 429:41.97 java
From the snippet of the top command above, I want to grep the PIDs that belong to the 'java' process and whose TIME+ value is greater than 10:00.00 (i.e., at least ten minutes of CPU time),
so I am expecting grep output as below:
77169 jai_web 20 0 9414m 1.4g 29m S 21.3 0.6 **38:51.69** java
272287 jai_web 20 0 20.1g 2.0g 5784 S 15.1 0.8 **165:39.50** java
26597 jai_web 20 0 6371m 134m 3444 S 9.6 0.1 **429:41.97** java
I have tried the below:
top -p "$(pgrep -d ',' java)"
But it doesn't satisfy my condition. Please assist.

I would just do this for a one-time analysis.
$ top -n 1 -b | awk '$NF=="java" && $(NF-1) >= "10:00.00"'
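One caveat (my note, not part of the original answer): the comparison above is a string comparison, so for example "9:59.99" sorts above "10:00.00" lexically. A safer variant is to split TIME+ on the colon and compare the minutes numerically:
$ top -n 1 -b | awk '$NF == "java" { split($(NF-1), t, ":"); if (t[1] + 0 >= 10) print }'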

Ok here is what I came up with...
You need to get the output of top, filter only the java lines, then check each line to see if the TIME is bigger than your limit. Here is what I did:
#!/bin/bash
#
tmpfile="/tmp/top.output"
# -b (batch mode) is needed so top's output can be piped
top -b -o TIME -n 1 | grep java >"$tmpfile"
# filter each line and keep only the ones where TIME+ is bigger than a certain value
limit=10
while read -r line
do
    # take the line and keep only the 11th field, which is the TIME+ value;
    # in that value, keep only the first number (the minutes before the ':')
    timevalue=$(echo "$line" | awk '{print $11}' | cut -d':' -f1)
    # compare timevalue to the limit we set
    if [ "$timevalue" -gt "$limit" ]
    then
        # output the entire line
        echo "$line"
    fi
done <"$tmpfile"
# cleanup
rm -f "$tmpfile"
The trick here is to extract the TIME+ value and keep only its first number, the minutes. The other digits are not significant, as long as the minutes figure is bigger than 10.
Someone might know of a way to do it via grep, but I doubt it; I have never seen conditionals in grep.

How to convert file size to human readable and print with other columns?

I want to convert the 5th column in this command output to human readable format.
For example, if this is my input:
-rw-rw-r-- 1 bhagyaraj bhagyaraj 280000 Jun 17 18:34 demo1
-rw-rw-r-- 1 bhagyaraj bhagyaraj 2800000 Jun 17 18:34 demo2
-rw-rw-r-- 1 bhagyaraj bhagyaraj 28000000 Jun 17 18:35 demo3
To something like this:
-rw-rw-r-- 280K demo1
-rw-rw-r-- 2.8M demo2
-rw-rw-r-- 28M demo3
I tried this command, but it returns only the file size column.
ls -l | tail -n +2 | awk '{print $5 | "numfmt --to=si"}'
ls is just an example; my real use case is very large, and repeated execution must be avoided.
Any help would be appreciated :)
Just use -h --si
-h, --human-readable with -l and -s, print sizes like 1K 234M 2G etc.
--si likewise, but use powers of 1000 not 1024
So the command would be
ls -lh --si | tail -n +2
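If you also want only the permissions, size and name columns, as in the desired output above, you can additionally pipe through awk; a small sketch, assuming the usual ls -l layout where $1 is the mode, $5 the size and $9 the file name:
ls -lh --si | tail -n +2 | awk '{ print $1, $5, $9 }'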
If you don't use ls, and the command you intend to run doesn't have an option similar to ls's -h --si, then numfmt already has the --field option to specify which column you want to format. For example:
$ df | LC_ALL=en_US.UTF-8 numfmt --header --field 2-4 --to=si
Filesystem 1K-blocks Used Available Use% Mounted on
udev 66M 0 66M 0% /dev
tmpfs 14M 7.2K 14M 1% /run
/dev/mapper/vg0-lv--0 4.1G 3.7G 416M 90% /
tmpfs 5.2K 4 5.2K 1% /run/lock
/dev/nvme2n1p1 524K 5.4K 518K 2% /boot/efi
Unfortunately, although numfmt does try to preserve the columnation, it fails when there is large variation in line length after inserting group separators, as you can see above. So sometimes you might still need to reformat the table with column:
df | LC_ALL=en_US.UTF-8 numfmt --header --field 2-4 --to=si | column -t -R 2,3,4,5
The -R 2,3,4,5 option is for right alignment, but some column versions, like the default one in Ubuntu, don't support it, so you may need to remove that option.
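Where -R is unavailable, right alignment can also be produced with awk's printf; a minimal sketch with hard-coded column widths (the widths are my assumption, adjust to taste):
df | LC_ALL=en_US.UTF-8 numfmt --header --field 2-4 --to=si |
awk '{
    # right-align the numeric columns with fixed widths, pass the rest through
    printf "%-28s %9s %8s %9s %5s", $1, $2, $3, $4, $5
    for (i = 6; i <= NF; i++) printf " %s", $i
    print ""
}'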
Alternatively, you can also use awk to format only the columns you want, for example column 5 in the case of ls:
$ ls -l demo* | awk -v K=1e3 -v M=1e6 -v G=1e9 'function format(v) {
if (v > G) return v/G "G"; else if (v > M) return v/M "M";
else if (v > K) return v/K "K"; else return v
} { $5 = format($5); print $0 }' | column -t
-rw-rw-r-- 1 ph ph 280K Jun 18 09:23 demo1
-rw-rw-r-- 1 ph ph 2.8M Jun 18 09:24 demo2
-rw-rw-r-- 1 ph ph 28M Jun 18 09:23 demo3
-rw-rw-r-- 1 ph ph 2.8G Jun 18 09:30 demo4
And column 2, 3, 4 in case of df
# M=1000 and G=1000000 because df output is 1K-block, not bytes
$ df | awk -v M=1000 -v G=1000000 'function format(v) {
if (v > G) return v/G "G"; else if (v > M) return v/M "M"; else return v
}
{
# Format only columns 2, 3 and 4, ignore header
if (NR > 1) { $2 = format($2); $3 = format($3); $4 = format($4) }
print $0
}' OFS="\t" | column -t
Filesystem 1K-blocks Used Available Use% Mounted on
udev 65.8273G 0 65.8273G 0% /dev
tmpfs 13.1772G 7M 13.1702G 1% /run
/dev/mapper/vg0-lv--0 4073.78G 3619.05G 415.651G 90% /
tmpfs 65.8861G 0 65.8861G 0% /dev/shm
tmpfs 5.12M 4 5.116M 1% /run/lock
tmpfs 65.8861G 0 65.8861G 0% /sys/fs/cgroup
/dev/nvme2n1p2 999.32M 363.412M 567.096M 40% /boot
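If the long fractions above bother you, one tweak (my variation, not part of the original answer) is to round inside the function with sprintf:
$ df | awk -v M=1000 -v G=1000000 'function format(v) {
    if (v > G) return sprintf("%.1fG", v/G)
    else if (v > M) return sprintf("%.1fM", v/M)
    else return v
}
{
    # format only columns 2, 3 and 4, ignore the header
    if (NR > 1) { $2 = format($2); $3 = format($3); $4 = format($4) }
    print $0
}' OFS="\t" | column -t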
UPDATE 1:
if you need just a barebones module for byte-size formatting (it's set up for base-2 now, but modifying it for --si should be trivial):
{m,g}awk '
BEGIN { OFS="="
_____=log(__=(_+=_+=_^=_)^(____=++_))
} gsub("[.]0+ bytes",
" -bytes-",
$(!__*($++NF = sprintf("%#10.*f %s" ,____,
(_ = $!!__) / __^(___=int(log(_)/_____)),
!___ ? "bytes" : substr("KMGTPEZY",___,!!__)"iB"))))^!__'
=
734 734 -bytes-
180043 175.82324 KiB
232819 227.36230 KiB
421548373 402.01986 MiB
838593829 799.74540 MiB
3739382399 3.48257 GiB
116601682159 108.59378 GiB
147480014471 137.35147 GiB
11010032230111 10.01357 TiB
19830700070261 18.03592 TiB
111120670776601 101.06366 TiB
15023323323323321 13.34339 PiB
85255542224555233 75.72213 PiB
444444666677766611 394.74616 PiB
106941916666944416909 92.75733 EiB
111919999919911191991 97.07513 EiB
767777766776776776777767 650.33306 ZiB
5558888858993555888686669 4.59821 YiB
========================
this is probably waaaay overkill, but I wrote it a while back; it can calculate the human-readable value, as well as a comma-formatted form of the raw byte value, supporting everything from kilobit to yottabyte,
with options for:
base 2 or base 10 (enter 10 or "M/m" for metric)
bytes (B) or bits (b)
The only thing that needs to be hard-coded in are the letters themselves, since they grow linearly upon either
every 3rd power of 10 (1,000), or
every 5th power of 4 (1,024)
{m,g}awk '
BEGIN {
    FS = OFS = "="
}
$!NF = substr(bytesformat($2, 10, "B"), 1, 15)\
       substr(bytesformat($2, 2, "B"), 1, 15)\
       bytesformat($2, 2, "b")

# functions, listed alphabetically
function bytesformat(_,_______,________,__, ___, ____, _____, ______)
{
    _____=__=(____^=___*=((____=___+=___^= "")/___)+___+___)
    ___/=___
    sub("^0+","",_)
    ____=_____-= substr(_____,index(_____,index(_____,!__))) * (_______~"^(10|[Mm])$")
    _______=length((____)____)^(________~"^b(it)?$")
    if ((____*__) < (_______*_)) {
        do {
            ____*=_____
            ++___
        } while ((____*__) < (_______*_))
    }
    __=_
    sub("(...)+$", ",&", __)
    gsub("[^#-.][^#-.][^#-.]", "&,", __)
    gsub("[,]*$|^[,]+", "", __)
    sub("^[.]", "0&", __)
    return \
        sprintf("%10.4f %s%s | %s byte%.*s",
            _=="" ? +_ : _/(_____^___)*_______,
            substr("KMGTPEZY", ___, _^(_<_)),
            --_______ ? "b" : "B", __=="" ? +__ : __, (_^(_<_))<_, "s")
}'
In this sample, it's showing metric bytes, binary bytes, binary bits, and the raw input byte value:
180.0430 KB | 175.8232 KB | 1.3736 Mb | 180,043 bytes
232.8190 KB | 227.3623 KB | 1.7763 Mb | 232,819 bytes
421.5484 MB | 402.0199 MB | 3.1408 Gb | 421,548,373 bytes
838.5938 MB | 799.7454 MB | 6.2480 Gb | 838,593,829 bytes
3.7394 GB | 3.4826 GB | 27.8606 Gb | 3,739,382,399 bytes
116.6017 GB | 108.5938 GB | 868.7502 Gb | 116,601,682,159 bytes
147.4800 GB | 137.3515 GB | 1.0731 Tb | 147,480,014,471 bytes
11.0100 TB | 10.0136 TB | 80.1085 Tb | 11,010,032,230,111 bytes
19.8307 TB | 18.0359 TB | 144.2873 Tb | 19,830,700,070,261 bytes
111.1207 TB | 101.0637 TB | 808.5093 Tb | 111,120,670,776,601 bytes
15.0233 PB | 13.3434 PB | 106.7471 Pb | 15,023,323,323,323,321 bytes
85.2555 PB | 75.7221 PB | 605.7771 Pb | 85,255,542,224,555,233 bytes
444.4447 PB | 394.7462 PB | 3.0840 Eb | 444,444,666,677,766,611 bytes
106.9419 EB | 92.7573 EB | 742.0586 Eb | 106,941,916,666,944,416,909 bytes
111.9200 EB | 97.0751 EB | 776.6010 Eb | 111,919,999,919,911,191,991 bytes
767.7778 ZB | 650.3331 ZB | 5.0807 Yb | 767,777,766,776,776,776,777,767 bytes
5.5589 YB | 4.5982 YB | 36.7856 Yb | 5,558,888,858,993,555,888,686,669 bytes

Wildcard symbol with grep -F

I have the following file
0 0
0 0.001
0 0.032
0 0.1241
0 0.2241
0 0.42
0.0142 0
0.0234 0
0.01429 0.01282
0.001 0.224
0.098 0.367
0.129 0
0.123 0.01282
0.149 0.16
0.1345 0.216
0.293 0
0.2439 0.01316
0.2549 0.1316
0.2354 0.5
0.3345 0
0.3456 0.0116
0.3462 0.316
0.3632 0.416
0.429 0
0.42439 0.016
0.4234 0.3
0.5 0
0.5 0.33
0.5 0.5
Notice that the two columns are sorted ascending, first by the first column and then by the second one. The minimum value is 0 and the maximum is 0.5.
I would like to count the number of lines that are:
0 0
and store that number in a file called "0_0". In this case, this file should contain "1".
Then, the same for those that are:
0 0.0*
For example,
0 0.032
and call it "0_0.0" (it should contain "2"), and so on for all combinations, considering only the first decimal digit (0 0.1*, 0 0.2* ... 0.0* 0, 0.0* 0.0* ... 0.5 0.5).
I am using this loop:
for i in 0 0.0 0.1 0.2 0.3 0.4 0.5
do
for j in 0 0.0 0.1 0.2 0.3 0.4 0.5
do
grep -F ""$i" "$j"" file | wc -l > "$i"_"$j"
done
done
rm 0_0 #this 0_0 output is badly done, the good way is with the next command, which accepts \n
pcregrep -M "0 0\n" file | wc -l > 0_0
The problem is that for example, line
0.0142 0
will not be matched by the iteration "0.0 0", since there are digits after the "0.0". Removing the -F option from grep, in order to match all numbers that start with "0.0", will not work either, since the dot will then be treated as a wildcard, and therefore in the iteration "0.1 0" the line
0.0142 0
will be counted, because "." matches any character, so 0.0142 matches 0"anything"1.
I hope I am making myself clear!
Is there any way to include a wildcard symbol with grep -F, like in:
for i in 0 0.0 0.1 0.2 0.3 0.4 0.5
do
for j in 0 0.0 0.1 0.2 0.3 0.4 0.5
do
grep -F ""$i"* "$j"*" file | wc -l > "$i"_"$j"
done
done
(Please notice the asterisks after the variables in the grep command).
Thank you!
Don't use shell loops just to manipulate text; that's what the guys who invented shell also invented awk to do. See why-is-using-a-shell-loop-to-process-text-considered-bad-practice.
It sounds like all you need is:
awk '{cnt[substr($1,1,3)"_"substr($2,1,3)]++} END{ for (pair in cnt) {print cnt[pair] > pair; close(pair)} }' file
That will be vastly more efficient than your nested shell loops approach.
Here's what it'll be outputting to the files it creates:
$ awk '{cnt[substr($1,1,3)"_"substr($2,1,3)]++} END{for (pair in cnt) print pair "\t" cnt[pair]}' file
0.0_0.3 1
0_0.4 1
0.5_0 1
0.2_0.5 1
0.4_0.3 1
0.0_0 2
0.1_0.0 1
0.3_0 1
0.1_0.1 1
0.1_0.2 1
0.3_0.0 1
0_0 1
0.1_0 1
0.5_0.3 1
0.4_0 1
0.3_0.3 1
0.2_0.0 1
0_0.0 2
0.5_0.5 1
0.3_0.4 1
0.2_0.1 1
0.0_0.0 1
0_0.1 1
0_0.2 1
0.4_0.0 1
0.2_0 1
0.0_0.2 1
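For the record, an answer to the literal question: grep -F matches fixed strings only, so it has no wildcard. What does work is dropping -F and escaping the dot so it stays literal, then anchoring the pattern. A sketch of the original loop along those lines (assuming the two fields are separated by a single space, as in the sample file):
for i in 0 0.0 0.1 0.2 0.3 0.4 0.5
do
    for j in 0 0.0 0.1 0.2 0.3 0.4 0.5
    do
        # ${i//./\\.} escapes the dot; [0-9]* allows extra decimals; anchors prevent partial matches
        grep -c -E "^${i//./\\.}[0-9]* ${j//./\\.}[0-9]*$" file > "${i}_${j}"
    done
done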

filter output of vmstat and iostat

I am gathering statistics using iostat and vmstat, running each one for 10 seconds regularly. However, I don't want to print the whole output. For iostat I want to show only the number of reads and writes, displayed as columns. With vmstat, I just want to show the free, cache and buffer columns. How can I do this? Any filters I use just return the result below.
The systems are Ubuntu 12.04, both the desktop terminal and the server-only version; they run under VMware Player.
ms total merged
0 0 0
0 0 0
0 0 0
0 0 0
758118 836340 1892
0 0 0
0 0 0
Assuming the output formats are as follows:
> iostat -dx sda
Linux 3.13.0-45-generic (hostname obscured) 03/22/2015 _x86_64_ (8 CPU)
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 7.02 30.64 4.48 8.32 174.81 789.29 150.64 0.86 67.48 10.76 98.01 1.06 1.36
> vmstat
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
1 0 3728772 969952 614416 29911568 3 13 22 99 1 4 48 5 47 0 0
You can do the following for iostat (every 10 seconds if you'd like to):
device_name=sda # or whatever device name you want
iostat -dx ${device_name} | awk 'NR==4 { print $4 " " $5 }'
Example output (r/s w/s):
4.48 8.32
If you need a count greater than 1, do this:
iostat -dx ${device_name} ${interval} ${count} | awk 'NR==1 || /^$/ || /^Device:/ {next}; { print $4 " " $5 }'
Example output (for device_name=sda; interval=1; count=5):
10.24 8.88
0.00 0.00
0.00 2.00
0.00 0.00
0.00 0.00
And you can do the following for vmstat (every 10 seconds if you'd like to):
vmstat | awk 'NR==3 {print $4 " " $5 " " $6}'
Example output (free buff cache):
969952 614416 29911568
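For a continuous feed instead of a one-shot snapshot, vmstat also takes an interval argument; a small sketch (assuming procps vmstat, where -n prints the header only once, and fflush() keeps awk from buffering the pipe):
vmstat -n 10 | awk 'NR > 2 { print $4 " " $5 " " $6; fflush() }'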

How to Extract some Fields from Real Time Output of a Command in Bash script

I want to extract some fields from the output of the command xentop. It's like the top command: it provides an ongoing look at CPU usage, memory usage, etc., in real time.
If I run this command in batch mode, I get its output in a file, as you can see:
NAME STATE CPU(sec) CPU(%) MEM(k) MEM(%) MAXMEM(k) MAXMEM(%) VCPUS NETS NETTX(k) NETRX(k) VBDS VBD_OO VBD_RD VBD_WR VBD_RSECT VBD_WSECT SSID
Domain-0 -----r 13700 33.0 7127040 85.9 no limit n/a 8 0 0 0 0 0 0 0 0 0 0
fed18 -----r 738 190.6 1052640 12.7 1052672 12.7 3 1 259919 8265 1 0 82432 22750 2740966 1071672 0
and running this
cat file | tr '\r' '\n' | sed 's/[0-9][;][0-9][0-9][a-Z]/ /g' | col -bx | awk '{print $1,$4,$6}'
on this file gives me what I want
NAME CPU(%) MEM(%)
Domain-0 33.0 85.9
fed18 190.6 12.7
but my script doesn't work on the real-time output of xentop. I even tried to run xentop just once by setting the iteration option to 1 (xentop -i 1), but it does not work!
How can I pipe the output of xentop, as "not" real-time, to my script?
It may not be sending any output to the standard output stream. There are several ways of sending output to the screen without using stdout. A quick Google search didn't provide much information about how it works internally.
I use xentop version 1.0 on XenServer 7.0, like:
[root@xen] xentop -V
xentop 1.0
[root@xen] cat /etc/centos-release
XenServer release 7.0.0-125380c (xenenterprise)
If you want to save the xentop output, you can do it with the '-b' (batch mode) and '-i' (number of iterations before exiting) options:
[root@xen] xentop -b -i 1
NAME STATE CPU(sec) CPU(%) MEM(k) MEM(%) MAXMEM(k) MAXMEM(%) VCPUS NETS NETTX(k) NETRX(k) VBDS VBD_OO VBD_RD VBD_WR VBD_RSECT VBD_WSECT SSID
Domain-0 -----r 132130 0.0 4194304 1.6 4194304 1.6 16 0 0 0 0 0 0 0 0 0 0
MY_VM --b--- 5652 0.0 16777208 6.3 16915456 6.3 4 0 0 0 1 - - - - - 0
[root@xen] xentop -b -i 1 > output.txt
[root@xen] cat output.txt
NAME STATE CPU(sec) CPU(%) MEM(k) MEM(%) MAXMEM(k) MAXMEM(%) VCPUS NETS NETTX(k) NETRX(k) VBDS VBD_OO VBD_RD VBD_WR VBD_RSECT VBD_WSECT SSID
Domain-0 -----r 132130 0.0 4194304 1.6 4194304 1.6 16 0 0 0 0 0 0 0 0 0 0
MY_VM --b--- 5652 0.0 16777208 6.3 16915456 6.3 4 0 0 0 1 - - - - - 0
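With batch mode there are no terminal escape sequences to strip, so the tr/sed/col cleanup from the question should not be needed; a minimal sketch, assuming the column layout shown above:
xentop -b -i 1 | awk '{ print $1, $4, $6 }'
which, on the sample above, should print:
NAME CPU(%) MEM(%)
Domain-0 0.0 1.6
MY_VM 0.0 6.3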

bash 'while read line' efficiency with big file

I was using a while loop to process a task,
which read records from a big file of about 10 million lines.
I found that the processing became slower and slower as time went by,
and I made a simulated script with 1 million lines, as below, which reveals the problem.
But I still don't know why. How does the read command work?
seq 1000000 > seq.dat
while read s;
do
if [ `expr $s % 50000` -eq 0 ];then
echo -n $( expr `date +%s` - $A) ' ';
A=`date +%s`;
fi
done < seq.dat
The terminal outputs the time interval:
98 98 98 98 98 97 98 97 98 101 106 112 121 121 127 132 135 134
Each number is the time taken for 50,000 lines; the processing obviously becomes slower as it goes.
Using your code, I saw the same pattern of increasing times (right from the beginning!). If you want faster processing, you should rewrite using shell internal features. Here's my bash version:
tabChar=" " # put a real tab char here, of course
seq 1000000 > seq.dat
while read s;
do
if (( ! ( s % 50000 ) )) ;then
echo $s "${tabChar}" $( expr `date +%s` - $A)
A=$(date +%s);
fi
done < seq.dat
edit
fixed a bug: the output indicated each line was being processed; now only every 50,000th line gets the timing treatment. D'oh!
was
if (( s % 50000 )) ;then
fixed to
if (( ! ( s % 50000 ) )) ;then
output now (ksh93, where echo ${.sh.version} gives Version JM 93t+ 2010-05-24):
50000
100000 1
150000 0
200000 1
250000 0
300000 1
350000 0
400000 1
450000 0
500000 1
550000 0
600000 1
650000 0
700000 1
750000 0
800000 1
850000 0
900000 1
950000 0
1e+06 1
output bash
50000 480
100000 3
150000 2
200000 3
250000 3
300000 2
350000 3
400000 3
450000 2
500000 2
550000 3
600000 2
650000 2
700000 3
750000 3
800000 2
850000 2
900000 3
950000 2
As to why your original test case is taking so long ... not sure. I was surprised to see both the time for each test cycle AND the increase in time. If you really need to understand this, you may need to spend time instrumenting more test stuff. Maybe you'd see something running truss or strace (depending on your base OS).
I hope this helps.
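For instance, a syscall summary from strace would be one place to start (a sketch assuming Linux with strace installed; truss -c is the Solaris analogue):
# -c prints a summary count of syscalls instead of a full trace; -f follows forks;
# strace writes its summary to stderr, so redirect that
strace -c -f bash -c 'while read -r s; do :; done < seq.dat' 2> syscall-summary.txt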
Read is a comparatively slow process, as the author of "Learning the Korn Shell" points out*. (Just above Section 7.2.2.1.) There are other programs, such as awk or sed that have been highly optimized to do what is essentially the same thing: read from a file one line at a time and perform some operations using that input.
Not to mention, that you're calling an external process every time you're doing subtraction or taking the modulus, which can get expensive. awk has both of those functionalities built in.
As the following test points out, awk is quite a bit faster:
#!/usr/bin/env bash
seq 1000000 |
awk '
BEGIN {
command = "date +%s"
prevTime = 0
}
$1 % 50000 == 0 {
command | getline currentTime
close(command)
print currentTime - prevTime
prevTime = currentTime
}
'
Output:
1335629268
0
0
0
0
0
0
0
0
0
0
0
0
0
0
1
0
0
0
0
Note that the first number is equivalent to date +%s. Just like in your test case, I let the first match be.
Note
*Yes, the author is talking about the Korn Shell, not bash as the OP tagged, but bash and ksh are rather similar in a lot of ways, and bash borrows many of its features from ksh. So I would assume that the read command is not drastically different from one shell to another.
