BASH - extract integer from logfile with sed

I've got the following logfile and I'd like to extract the number of dropped packets (in the following example the number is 0):
ITGDec version 2.8.1 (r1023)
Compile-time options: bursty multiport
----------------------------------------------------------
Flow number: 1
From 192.168.1.2:0
To 192.168.1.2:8999
----------------------------------------------------------
Total time = 2.990811 s
Total packets = 590
Minimum delay = 0.000033 s
Maximum delay = 0.000169 s
Average delay = 0.000083 s
Average jitter = 0.000010 s
Delay standard deviation = 0.000016 s
Bytes received = 241900
Average bitrate = 647.048576 Kbit/s
Average packet rate = 197.270907 pkt/s
Packets dropped = 0 (0.00 %)
Average loss-burst size = 0.000000 pkt
----------------------------------------------------------
__________________________________________________________
**************** TOTAL RESULTS ******************
__________________________________________________________
Number of flows = 1
Total time = 2.990811 s
Total packets = 590
Minimum delay = 0.000033 s
Maximum delay = 0.000169 s
Average delay = 0.000083 s
Average jitter = 0.000010 s
Delay standard deviation = 0.000016 s
Bytes received = 241900
Average bitrate = 647.048576 Kbit/s
Average packet rate = 197.270907 pkt/s
Packets dropped = 0 (0.00 %)
Average loss-burst size = 0 pkt
Error lines = 0
----------------------------------------------------------
I'm trying with the following command:
cat logfile | grep -m 1 dropped | sed -n 's/.*=\([0-9]*\) (.*/\1/p'
but nothing gets printed.
Thank you
EDIT: I just wanted to tell you that the "Packets dropped" line is printed in the following way in the code of the program:
printf("Packets dropped = %13lu (%3.2lf %%)\n", (long unsigned int) 0, (double) 0);

It will be easier to use awk here:
awk '/Packets dropped/{print $4}' logfile
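If you only want the first occurrence (the per-flow line rather than the one under TOTAL RESULTS), exit after the first match:
awk '/Packets dropped/{print $4; exit}' logfile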

Aside from the problem in your sed expression (it doesn't allow a space after the =), you don't really need a pipeline here.
grep would suffice:
grep -m 1 -oP 'dropped\s*=\s*\K\d+' logfile
You could have fixed your sed expression by permitting space after the =:
sed -n 's/.*= *\([0-9]*\) (.*/\1/p'
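To see why the original expression printed nothing: the %13lu format right-justifies the count in a 13-character field, so the log line contains a run of spaces between the = and the digits, which .*=\([0-9]*\) (.* cannot match. A quick reproduction, with the padded line pasted in literally:
$ echo 'Packets dropped =             0 (0.00 %)' | sed -n 's/.*=\([0-9]*\) (.*/\1/p'
$ echo 'Packets dropped =             0 (0.00 %)' | sed -n 's/.*= *\([0-9]*\) (.*/\1/p'
0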

Avoiding your use of cat and grep, in plain sed:
sed -n 's/^Packets dropped[=[:space:]]\+\([0-9]\+\).*/\1/p' logfile
This matches:
- any line starting with "Packets dropped"
- one or more whitespace or "=" characters
- one or more digits (which are captured)
The rest (.*) is discarded.
With the -r option as well, you can lose a few backslashes:
sed -nr 's/^Packets dropped[=[:space:]]+([0-9]+).*/\1/p' logfile

sed -n '/Packets dropped/ s/.*[[:space:]]\([0-9]\{1,\}\)[[:space:]].*/\1/p' YourFile
But this prints both lines (detail + summary) where the info is written.
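If you only want the first of the two, one option (GNU sed) is to quit right after printing:
sed -n '/Packets dropped/{s/.*[[:space:]]\([0-9]\{1,\}\)[[:space:]].*/\1/p;q}' YourFile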

Related

Is there a way to analyze packet intervals in order to output # of packets per second?

I have about 54,000 packets to analyze and I am trying to determine the average # of packets per second (as well as the min and max # of packets during a given second)
My input file is a single column of the packet times (see sample below):
0.004
0.015
0.030
0.050
..
..
1999.99
I've used awk to determine the timing deltas but can't figure out a way to parse out the chunks of time to get an output of:
0-1s = 10 packets
1-2s = 15 packets
etc
Here is an example of how you can use awk to get the desired output.
Suppose your original input file is sample.txt. The first thing to do is reverse-sort it (sort -nr); then you can supply awk with the newly sorted data along with the time variable through awk's -v argument. Perform your tests inside awk, making use of next to skip lines and exit to quit the awk script when needed.
#!/bin/bash
#
for i in 0 1 2 3
do
    sort -nr sample.txt | awk -v time="$i" '
        $1 >= time + 1 { next }                # newer than this window: skip
        $1 >= time     { number += 1; next }   # inside [time, time+1): count it
                       { exit }                # older than the window: stop reading
        END { printf "[ %d - %d [ : %d records\n", time, time + 1, number }
    '
done
Here's the sample file:
0.1
0.2
0.8
.
.
0.94
.
.
1.5
1.9
.
3.0
3.6
Here's the program's output:
[ 1 - 2 [ : 5 records
[ 2 - 3 [ : 8 records
[ 3 - 4 [ : 2 records
Hope this helps!
Would you please try the following:
With bash:
max=0
while read -r line; do
    i=${line%.*}                # extract the integer part
    a[$i]=$(( ${a[$i]} + 1 ))   # increment the array element
    (( i > max )) && max=$i     # update the maximum index
done < sample.txt

# report the summary
for (( i=0; i<=max; i++ )); do
    printf "%d-%ds = %d packets\n" "$i" $(( i+1 )) "${a[$i]}"
done
With AWK:
awk '
{
    i = int($0)
    a[i]++
    if (i > max) max = i
}
END {
    for (i = 0; i <= max; i++)
        printf("%d-%ds = %d packets\n", i, i+1, a[i])
}' sample.txt
sample.txt:
0.185
0.274
0.802
1.204
1.375
1.636
1.700
1.774
1.963
2.044
2.112
2.236
2.273
2.642
2.882
3.000
3.141
5.023
5.082
Output:
0-1s = 3 packets
1-2s = 6 packets
2-3s = 6 packets
3-4s = 2 packets
4-5s = 0 packets
5-6s = 2 packets
Hope this helps.

Splitting a large file into small files

I have a text file which has thousands of number values like
1
2
3
4
5
.
.
.
.
n
I know we can use awk to separate these values. But is there a way in which one can fetch the first 10, 20, 40, 80, 160, ..., n values into different text files?
I was using Python to do so, but it takes a lot of time to separate these files. Here is the sample Python code:
import numpy as np
from itertools import islice

data = np.loadtxt('ABC.txt',
                  unpack=True,
                  delimiter=',',
                  skiprows=1)
n = 10
iterator = list(islice(data[0], n))
for item in range(n):
    np.savetxt('output1.txt', iterator, delimiter=',', fmt='%10.5f')
iterator = list(islice(data[0], n*2))
for item in iterator:
    np.savetxt('output2.txt', iterator, delimiter=',', fmt='%10.5f')
iterator = list(islice(data[0], n*4))
for item in iterator:
    np.savetxt('output3.txt', iterator, delimiter=',', fmt='%10.5f')
iterator = list(islice(data[0], n*8))
for item in iterator:
    np.savetxt('output4.txt', iterator, delimiter=',', fmt='%10.5f')
and so on.
Is there a better way to do this in bash or in python. Thank you in advance!
An inefficient but quick-to-implement approach:
s=5; for i in {1..10}; do ((s*=2)); head -$s file > sub$i; done
Since the files are overlapping, there will be better ways, but based on the size of the file and how many times it needs to be repeated, this might be good enough.
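One of those better ways is a single awk pass that appends each line to every prefix file that should contain it; a sketch (the output file names and the count of 5 output files are assumptions):
awk -v n=10 -v files=5 '{
    lim = n
    for (k = 1; k <= files; k++) {      # output1.txt gets 10 lines, output2.txt gets 20, ...
        if (NR <= lim) print > ("output" k ".txt")
        lim *= 2
    }
}' file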
You didn't provide any sample input or expected output, and the text of your question is ambiguous, so this is just a guess, but this MAY be what you're looking for:
$ seq 1000 | awk -v c=10 'NR==c{print; c=2*c}'
10
20
40
80
160
320
640
If not then edit your question to clarify.
SED is your friend:
$ numlines=$( wc -l big_text_file.txt | cut -d' ' -f1 )
$ step=100
$ echo $numlines
861
$ for (( ii=1; ii<=$numlines; ii+=$step )); do echo $ii,$(( ii+step-1 ))w big_text_file.${ii}.txt; done > break.sed
$ cat break.sed
1,100w big_text_file.1.txt
101,200w big_text_file.101.txt
201,300w big_text_file.201.txt
301,400w big_text_file.301.txt
401,500w big_text_file.401.txt
501,600w big_text_file.501.txt
601,700w big_text_file.601.txt
701,800w big_text_file.701.txt
801,900w big_text_file.801.txt
$ sed -n -f break.sed big_text_file.txt
$ wc -l big_text_file*.txt
100 big_text_file.101.txt
100 big_text_file.1.txt
100 big_text_file.201.txt
100 big_text_file.301.txt
100 big_text_file.401.txt
100 big_text_file.501.txt
100 big_text_file.601.txt
100 big_text_file.701.txt
61 big_text_file.801.txt
861 big_text_file.txt
1722 total
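For fixed-size chunks like these, coreutils split can do the same job in one command; a simpler alternative, assuming GNU split for the --additional-suffix option:
split -l 100 -d --additional-suffix=.txt big_text_file.txt big_text_file.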

Calculate CPU per process

I'm trying to write a script that gives back the CPU usage (in %) for a specific process. I need to use /proc/PID/stat because ps aux is not present on the embedded system.
I tried this:
#!/usr/bin/env bash
PID=$1
PREV_TIME=0
PREV_TOTAL=0
while true; do
    TOTAL=$(grep '^cpu ' /proc/stat | awk '{sum=$2+$3+$4+$5+$6+$7+$8+$9+$10; print sum}')
    sfile=`cat /proc/$PID/stat`
    PROC_U_TIME=$(echo $sfile | awk '{print $14}')
    PROC_S_TIME=$(echo $sfile | awk '{print $15}')
    PROC_CU_TIME=$(echo $sfile | awk '{print $16}')
    PROC_CS_TIME=$(echo $sfile | awk '{print $17}')
    let "PROC_TIME=$PROC_U_TIME+$PROC_CU_TIME+$PROC_S_TIME+$PROC_CS_TIME"
    CALC="scale=2 ;(($PROC_TIME-$PREV_TIME)/($TOTAL-$PREV_TOTAL)) *100"
    USER=`bc <<< $CALC`
    PREV_TIME="$PROC_TIME"
    PREV_TOTAL="$TOTAL"
    echo $USER
    sleep 1
done
But it doesn't give the correct value when I compare it to top. Does anyone know where I made a mistake?
Thanks
Under a normal invocation of top (no arguments), the %CPU column is the proportion of ticks used by the process against the total ticks provided by one CPU, over a period of time.
From the top.c source, the %CPU field is calculated as:
float u = (float)p->pcpu * Frame_tscale;
where pcpu for a process is the elapsed user time + system time since the last display:
hist_new[Frame_maxtask].tics = tics = (this->utime + this->stime);
...
if(ptr) tics -= ptr->tics;
...
// we're just saving elapsed tics, to be converted into %cpu if
// this task wins it's displayable screen row lottery... */
this->pcpu = tics;
and:
et = (timev.tv_sec - oldtimev.tv_sec)
+ (float)(timev.tv_usec - oldtimev.tv_usec) / 1000000.0;
Frame_tscale = 100.0f / ((float)Hertz * (float)et * (Rc.mode_irixps ? 1 : Cpu_tot));
Hertz is 100 ticks/second on most systems (grep 'define HZ' /usr/include/asm*/param.h), et is the elapsed time in seconds since the last displayed frame, and Cpu_tot is the number of CPUs (but the 1 is what's used by default).
So, the equation on a system using 100 ticks per second for a process over T seconds is:
(curr_utime + curr_stime - (last_utime + last_stime)) / (100 * T) * 100
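As a worked example (numbers assumed for illustration): a process that accumulates 150 ticks of utime + stime over a 3-second frame on a 100 ticks/second system shows up as:
(150 - 0) / (100 * 3) * 100 = 50 %CPU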
The script becomes:
#!/bin/bash
PID=$1
SLEEP_TIME=3 # seconds
HZ=100 # ticks/second
prev_ticks=0
while true; do
    sfile=$(cat /proc/$PID/stat)
    utime=$(awk '{print $14}' <<< "$sfile")
    stime=$(awk '{print $15}' <<< "$sfile")
    ticks=$(($utime + $stime))
    pcpu=$(bc <<< "scale=4 ; ($ticks - $prev_ticks) / ($HZ * $SLEEP_TIME) * 100")
    prev_ticks="$ticks"
    echo $pcpu
    sleep $SLEEP_TIME
done
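To try it (the script name and PID are placeholders): save it as pcpu.sh, make it executable, and pass a PID:
chmod +x pcpu.sh
./pcpu.sh 1234    # prints the process's %CPU every 3 seconds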
The key difference between this approach and that of your original script is that top computes its CPU time percentages against 1 CPU, whereas you were attempting to compute them against the aggregate total for all CPUs. It's also true that you can compute the exact aggregate ticks over a period of time as Hertz * time * n_cpus, and it may not necessarily be the case that the numbers in /proc/stat sum exactly to that:
$ grep 'define HZ' /usr/include/asm*/param.h
/usr/include/asm-generic/param.h:#define HZ 100
$ grep ^processor /proc/cpuinfo | wc -l
16
$ t1=$(awk '/^cpu /{sum=$2+$3+$4+$5+$6+$7+$8+$9+$10; print sum}' /proc/stat) ; sleep 1 ; t2=$(awk '/^cpu /{sum=$2+$3+$4+$5+$6+$7+$8+$9+$10; print sum}' /proc/stat) ; echo $(($t2 - $t1))
1602

Script to extract highest latency from traceroute

I'm looking for a script that can extract the line with the highest-latency hop from a traceroute. Ideally it would look at the max or the average of the 3 values on each line. How can I do that?
This is what I tried so far:
traceroute www.google.com | awk '{printf "%s\t%s\n", $2, $3+$4+$5; }' | sort -rgk2 | head -n1
traceroute -w10 www.google.com | awk '{printf "%s\t%s\n", $2, ($3+$4+$5)/3; }' | sort -rgk2 | head -n1
It seemed a step in the right direction, except some of the values coming back from a traceroute are *, so both the sum and the average provide a wrong value.
Update
Got one step further:
traceroute www.cnn.com | awk '{count = 0;sum = 0;for (i=3; i<6; i++){ if ($i != "*") {sum += $i;count++;}}; printf "%s\t%s\t%s\t%s\n", $2, count, sum, sum/count }' | sort -rgk2
Now I need to handle the case where columns 4 and 5 are missing. Sometimes traceroute only returns three stars, like this:
17 207.88.13.153 235.649ms 234.864ms 239.316ms
18 * * *
You will have to:
- Kick off a traceroute
- Collect each line of output (a pipe would likely work well here)
- Use a tool like awk to:
  - analyze the line and extract the information you want
  - compare the values you just got with previous values and store the current line if appropriate
  - at the end of the input, print the stored value
Try:
$ traceroute 8.8.8.8 | awk ' BEGIN { FPAT="[0-9]+\\.[0-9]{3} ms" }
/[\\* ]{3}/ {next}
NR>1 {
for (i=1;i<4;i++) {gsub("*","5000.00 ms",$i)}
av = (gensub(" ms","",1,$1) + gensub(" ms","",1,$2) + gensub(" ms","",1,$3))/3
if (av > worst) {
ln = $0
worst = av
}
}
END { print "Highest:", ln, " Average:", worst, "ms"}'
which gives:
Highest: 6 72.14.242.166 (72.14.242.166) 7.383 ms 72.14.232.134 (72.14.232.134) 7.865 ms 7.768 ms Average: 7.672 ms
If there are three asterisks (* * *), the script assumes that the hop isn't responding with an ICMP reply and ignores it completely. If there are one or two * in a line, it gives them the value of 5.0 seconds.
Stephan, you could try pchar, a derivative of pathchar. It should be in the Ubuntu repository.
It takes a while to run, though, so you need some patience. It will show you throughput, which will be much better than latency for determining the bottleneck.
http://www.caida.org/tools/taxonomy/perftaxonomy.xml
Here is an example:
rayd#raydHPEliteBook8440p ~ sudo pchar anddroiddevs.com
pchar to anddroiddevs.com (31.221.38.104) using UDP/IPv4
Using raw socket input
Packet size increments from 32 to 1500 by 32
46 test(s) per repetition
32 repetition(s) per hop
0: 192.168.0.20 (raydHPEliteBook8440p.local)
Partial loss: 0 / 1472 (0%)
Partial char: rtt = 6.553065 ms, (b = 0.000913 ms/B), r2 = 0.241811
stddev rtt = 0.196989, stddev b = 0.000244
Partial queueing: avg = 0.012648 ms (13848 bytes)
Hop char: rtt = 6.553065 ms, bw = 8759.575088 Kbps
Hop queueing: avg = 0.012648 ms (13848 bytes)
1: 80.5.69.1 (cpc2-glfd6-2-0-gw.6-2.cable.virginm.net)
Use mtr --raw -c 1 google.com. It's way faster and easier to parse.
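A sketch of how that raw output can be parsed, assuming mtr's documented raw format where "h <hop> <address>" lines name hops and "p <hop> <microseconds>" lines carry timings:
mtr --raw -c 1 www.google.com | awk '
    $1 == "h" { host[$2] = $3 }                        # remember each hop address
    $1 == "p" && $3 > worst { worst = $3; hop = $2 }   # track the slowest probe
    END { printf "Highest: hop %s (%s) %.3f ms\n", hop, host[hop], worst / 1000 }'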

Is there a shell command to delay a buffer?

I am looking for a shell command X such as, when I execute:
command_a | X 5000 | command_b
the stdout of command_a is written to the stdin of command_b (at least) 5 seconds later.
A kind of delaying buffer.
As far as I know, buffer/mbuffer can write at a constant rate (a fixed number of bytes per second). Instead, I would like a constant delay in time (t=0 is when X reads a chunk of command_a's output; at t=5000 it must write this chunk to command_b).
[edit] I've implemented it: https://github.com/rom1v/delay
I know you said you're looking for a shell command, but what about using a subshell to your advantage? Something like:
command_a | (sleep 5; command_b)
So to grep a file cat-ed through (I know, I know, bad use of cat, but just an example):
cat filename | (sleep 5; grep pattern)
A more complete example:
$ cat testfile
The
quick
brown
fox
$ cat testfile | (sleep 5; grep brown)
# A 5-second sleep occurs here
brown
Or even, as Michael Kropat recommends, a group command with sleep would also work (and is arguably more correct). Like so:
$ cat testfile | { sleep 5; grep brown; }
Note: don't forget the semicolon after your command (here, the grep brown), as it is necessary!
As it seemed such a command did not exist, I implemented it in C:
https://github.com/rom1v/delay
delay [-b <dtbufsize>] <delay>
Something like this?
#!/bin/bash
while read -r line
do
    sleep 5
    echo "$line"
done
Save the file as "slowboy", then do
chmod +x slowboy
and run as
command_a | ./slowboy | command_b
This might work
time_buffered () {
    delay=$1
    while read -r line; do
        printf "%d %s\n" "$(date +%s)" "$line"
    done | while read -r ts line; do
        now=$(date +%s)
        if (( now - ts < delay )); then
            sleep $(( delay - (now - ts) ))   # sleep only the remaining time
        fi
        printf "%s\n" "$line"
    done
}
commandA | time_buffered 5 | commandB
The first loop tags each line of its input with a timestamp and immediately feeds it to the second loop. The second loop checks the timestamp of each line, and will sleep if necessary until $delay seconds after it was first read before outputting the line.
Your question intrigued me, and I decided to come back and play with it. Here is a basic implementation in Perl. It's probably not portable (ioctl), tested on Linux only.
The basic idea is:
- read available input every X microseconds
- store each input chunk in a hash, with the current timestamp as key
- also push the current timestamp onto a queue (array)
- look up the oldest timestamps on the queue and write + discard data from the hash if delayed long enough
- repeat
Max buffer size
There is a max size for stored data. If reached, additional data will not be read until space becomes available after writing.
Performance
It is probably not fast enough for your requirements (several Mb/s). My max throughput was 639 Kb/s, see below.
Testing
# Measure max throughput:
$ pv < /dev/zero | ./buffer_delay.pl > /dev/null
# Interactive manual test, use two terminal windows:
$ mkfifo data_fifo
terminal-one $ cat > data_fifo
terminal-two $ ./buffer_delay.pl < data_fifo
# now type in terminal-one and see it appear delayed in terminal-two.
# It will be line-buffered because of the terminals, not a limitation
# of buffer_delay.pl
buffer_delay.pl
#!/usr/bin/perl

use strict;
use warnings;
use IO::Select;
use Time::HiRes qw(gettimeofday usleep);

require 'sys/ioctl.ph';

$|++;

my $delay_usec = 3 * 1000000;             # (3s) delay in microseconds
my $buffer_size_max = 10 * 1024 * 1024;   # (10 Mb) max bytes our buffer is allowed to contain.
                                          # When buffer is full, incoming data will not be read
                                          # until space becomes available after writing
my $read_frequency = 10;                  # Approximate read frequency in Hz (will not be exact)

my %buffer;                               # the data we are delaying, saved in chunks by timestamp
my @timestamps;                           # keys to %buffer, used as a queue
my $buffer_size = 0;                      # num bytes currently in %buffer, compare to $buffer_size_max
my $time_slice = 1000000 / $read_frequency;   # microseconds, min time for each discrete read-step
my $sel = IO::Select->new([\*STDIN]);
my $overflow_unread = 0;                  # Num bytes waiting when $buffer_size_max is reached

while (1) {
    my $now = sprintf "%d%06d", gettimeofday;   # timestamp, used to label incoming chunks

    # input available?
    if ($overflow_unread || $sel->can_read($time_slice / 1000000)) {

        # how much?
        my $available_bytes;
        if ($overflow_unread) {
            $available_bytes = $overflow_unread;
        }
        else {
            $available_bytes = pack("L", 0);
            ioctl(STDIN, FIONREAD(), $available_bytes);
            $available_bytes = unpack("L", $available_bytes);
        }

        # will it fit?
        my $remaining_space = $buffer_size_max - $buffer_size;
        my $try_to_read_bytes = $available_bytes;
        if ($try_to_read_bytes > $remaining_space) {
            $try_to_read_bytes = $remaining_space;
        }

        # read input
        if ($try_to_read_bytes > 0) {
            my $input_data;
            my $num_read = read(STDIN, $input_data, $try_to_read_bytes);
            die "read error: $!" unless defined $num_read;
            exit if $num_read == 0;       # EOF

            $buffer{$now} = $input_data;  # save input
            push @timestamps, $now;       # save the timestamp
            $buffer_size += length $input_data;
            if ($overflow_unread) {
                $overflow_unread -= length $input_data;
            }
            elsif (length $input_data < $available_bytes) {
                $overflow_unread = $available_bytes - length $input_data;
            }
        }
    }

    # write + delete any data old enough
    my $then = $now - $delay_usec;        # when data is old enough
    while (scalar @timestamps && $timestamps[0] < $then) {
        my $ts = shift @timestamps;
        print $buffer{$ts} if defined $buffer{$ts};
        $buffer_size -= length $buffer{$ts};
        die "Serious problem\n" unless $buffer_size >= 0;
        delete $buffer{$ts};
    }

    # usleep any remaining time up to $time_slice
    my $time_left = (sprintf "%d%06d", gettimeofday) - $now;
    usleep($time_slice - $time_left) if $time_slice > $time_left;
}
Feel free to post comments and suggestions below!
