I use the script below to get the average response time for a specific website. It works just fine. I just need to convert some of the values in the response: I want to view time_total in milliseconds and size_download in KB. And at the end of the command, where I print the average response time, I also want it in milliseconds. Any help is really appreciated.
for ((i=1;i<=50;i++)); do curl -w 'Return Code: %{http_code}; Bytes Received: %{size_download}; Response Time: %{time_total}\n' "https://www.google.com" -m 2 -o /dev/null -s; done | tee /dev/tty | awk '{ sum += $NF; n++ } END { if (n > 0) print "Average Response Time =", sum/n }'
Three answers here...
As this question is tagged shell (not bash), here are three different ways of doing this: using awk + bash, shell + bc, or bash alone.
1. Using awk to process output and compute averages
As forking repeatedly (running curl ... | awk ... at each iteration) is a resource killer, I prefer to run awk only once, doing the whole output-formatting job inside awk:
iter=10
for ((i=iter;i--;));do
    curl -w '%{http_code} %{size_download} %{time_total}\n' \
        "https://www.google.com" -m 2 -o /dev/null -s
    sleep .02
done | awk -v iter=$iter '
    BEGIN {
        ttim=0;tsiz=0;
        printf " %-3s %8s %11s %10s\n","Res","Size","Time","Rate"
    };
    {
        printf " %-3s %7.2fK %9.2fms %7.2fK/s\n", \
            $1, $2/1024, $3*1000, $2/$3/1024;
        tsiz+=$2;ttim+=$3;
    };
    END {
        printf "Tot %7.2fK %9.2fms\nAvg %7.2fK %9.2fms %7.2fK/s\n", \
            tsiz/1024, ttim*1000, tsiz/iter/1024, ttim/iter*1000, tsiz/ttim/1024;
    }'
May produce:
Res Size Time Rate
200 14.61K 128.48ms 113.71K/s
200 14.75K 131.06ms 112.52K/s
200 14.73K 131.71ms 111.85K/s
200 14.72K 130.24ms 113.05K/s
200 14.66K 134.68ms 108.86K/s
200 14.69K 131.39ms 111.79K/s
200 14.63K 131.15ms 111.53K/s
200 14.70K 126.26ms 116.42K/s
200 14.71K 129.08ms 113.98K/s
200 14.68K 131.23ms 111.86K/s
Tot 146.88K 1305.28ms
Avg 14.69K 130.53ms 112.53K/s
2. Working on averages using shell
As the shell doesn't work with floating-point numbers, you have to use bc or another subprocess to resolve these operations.
As forking something like var=$(echo '3*4'|bc) at every iteration is a resource killer, I prefer to run bc only once, as a background process. One advantage of this is that bc can hold running totals (overall download size and time here).
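The core of the trick, as a minimal standalone sketch (the /tmp fifo names are illustrative; the real script below uses mktemp):
mkfifo /tmp/bcin /tmp/bcout
exec 3<>/tmp/bcin 4<>/tmp/bcout   # open read-write so opening never blocks
bc -l <&3 >&4 &                   # one long-lived bc for the whole run
echo 'tot=0' >&3                  # bc keeps its variables between requests
echo 'tot+=1.5; tot' >&3          # a bare expression makes bc print its value
read -r result <&4                # result=1.5, with no extra fork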
First part: Init some variables and run bc
Create two fifos for the backgrounded bc, plus one fifo for curl, to make parsing of curl's output easier.
Declare some variables, and a numfmt function inside bc, for later use...
Note about numfmt: this function computes a human-readable representation of an integer value and outputs two values:
the octal character code of b, K, M, G, T or P, and
a floating-point number: the submitted value divided by the matching power of 1024.
The character can be printed from its octal code with printf '%b' \\$value in the shell.
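For example, 113 is the octal code of K, as a quick check in the shell shows:
$ value=113
$ printf '%b\n' "\\$value"
K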
#!/bin/sh
target=${1:-https://www.google.com}
iter=${2:-10}
delay=${3:-.02}
tempdir=$(mktemp -d)
bcin="$tempdir/bcin" bcout="$tempdir/bcout" curlout="$tempdir/curlout"
mkfifo "$bcin" "$bcout" "$curlout"
exec 3<>"$bcin"; exec 4<>"$bcout"
cleanUp() { [ -e "$bcin" ] && rm "$bcin" "$bcout" "$curlout" && rmdir "$tempdir"
exit;}
trap cleanUp 0 1 2 3 6 15
bc -l <&3 >&4 &
mBc() { echo >&3 "$2"; read -r $1 <&4 ;}
cat >&3 <<EOInitBc
unit[0]=142;unit[1]=113;unit[2]=115;unit[3]=107;unit[4]=124;unit[5]=120
define void numfmt (s) {
if (s==0) { print "0 0\n"; return;};
p=l(s)/l(1024);
scale=0;
p=p/1;
scale=20;
print unit[p]," ",s/1024^p,"\n";
};
tsiz=0;ttim=0;
EOInitBc
# read variables from curl
checkHttpRes() {
curl -sm2 -w'%{http_code} %{size_download} %{time_total}' -o/dev/null \
"$1" >"$curlout" &
read -r cod siz tim <"$curlout"
mBc rate "tsiz+=$siz;ttim+=$tim;$siz/$tim;"
mBc mtim "$tim*1000"
mBc 'hunit hsz' "numfmt($siz)"
mBc 'hurt hrat' "numfmt($rate)"
printf ' %-3s %7.2f%b %9.2fms %7.2f%b/s\n' "$cod" "$hsz" "\\$hunit" \
"$mtim" "$hrat" "\\$hurt"
}
# Last part: main routine
printf ' %-3s %8s %11s %10s\n' Res Size Time Rate
i="$iter"
while [ "$i" -gt 0 ];do
checkHttpRes "$target"
sleep "$delay"
i=$((i-1))
done
mBc 'hutsz htsz' "numfmt(tsiz)"
mBc 'huasz hasz' "numfmt(tsiz/$iter)"
mBc ttim "1000*ttim"
mBc atim "1000*ttim/$iter"
mBc 'huart hart' "numfmt(tsiz/ttim)"
printf 'Tot %7.2f%b %9.2fms\nAvg %7.2f%b %9.2fms %7.2f%b/s\n' \
"$htsz" "\\$hutsz" "$ttim" "$hasz" "\\$huasz" "$atim" "$hart" "\\$huart"
Sample run:
$ ./curlStat.sh http://www.google.com 10 .1
Res Size Time Rate
200 14.76K 141.84ms 104.09K/s
200 14.65K 136.21ms 107.53K/s
200 14.61K 136.74ms 106.86K/s
200 14.67K 138.08ms 106.26K/s
200 14.70K 130.56ms 112.56K/s
200 14.65K 135.72ms 107.97K/s
200 14.68K 135.28ms 108.53K/s
200 14.64K 134.20ms 109.07K/s
200 14.70K 136.32ms 107.82K/s
200 14.71K 136.19ms 108.00K/s
Tot 146.77K 1361.14ms
Avg 14.68K 136.11ms 107.83K/s
3. Last: bash
This question is tagged shell, but the for ((i=... syntax is a bashism. So here is a compact pure-bash version:
#!/bin/bash
target=${1:-https://www.google.com} iter=${2:-10} delay=${3:-.02}
tsz(){ local i=$(($1>=1<<50?5:$1>=1<<40?4:$1>=1<<30?3:$1>=1<<20?2:$1>1023?1:0
)) a=(b K M G T P);((i>4?i+=-2:0))&&a=(${a[@]:2})&&set -- $(($1>>20)) $2;local\
r=00$((1000*$1/(1024**i)));printf -v $2 %.2f%s ${r::-3}.${r: -3} ${a[i]};}
declare -i ttim=0 tsiz=0
checkHttpRes() { local code size time ustim hsz hrat
read -r code size time < <(
curl -sm2 -w'%{http_code} %{size_download} %{time_total}' -o/dev/null "$1"
)
printf -v ustim '%.6f' "$time"
ustim=$((10#${ustim/.}))
tsz "$size" hsz
tsz $(( size*10**7/ustim/10 )) hrat
ttim+=ustim tsiz+=size ustim=00$ustim
printf ' %-3s %8s %9.2fms %8s/s\n' "$code" "$hsz" \
"${ustim::-3}.${ustim: -3}" "$hrat"
}
printf ' %-3s %8s %11s %10s\n' Res Size Time Rate
for ((i=iter;i--;)) ;do
checkHttpRes "$target"
sleep "$delay"
done
tsz $tsiz htsz
ustim=00$ttim uatim=00$((ttim/iter))
tsz $((tsiz/iter)) hasz
tsz $(( tsiz*10**7/ttim/10 )) hart
printf 'Tot %8s %9.2fms\nAvg %8s %9.2fms %8s/s\n' "$htsz" \
"${ustim::-3}.${ustim: -3}" "$hasz" "${uatim::-3}.${uatim: -3}" "$hart"
Without any fork to bc or awk, all operations are done in pseudo-float using shifted integers.
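For example, the millisecond formatting relies on padding an integer microsecond count, then splicing in a decimal point (a minimal illustration):
us=132790                                 # time_total as integer microseconds
us=00$us                                  # pad so both slices below always exist
printf '%.2fms\n' "${us::-3}.${us: -3}"   # prints 132.79ms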
Run against the same target, the script produces the same kind of result:
Res Size Time Rate
200 14.68K 132.79ms 110.55K/s
200 14.68K 135.59ms 108.24K/s
200 14.68K 132.31ms 110.99K/s
200 14.75K 141.66ms 104.15K/s
200 14.66K 139.90ms 104.79K/s
200 14.71K 140.07ms 105.00K/s
200 14.68K 142.74ms 102.86K/s
200 14.64K 133.42ms 109.71K/s
200 14.72K 135.62ms 108.56K/s
200 14.71K 139.16ms 105.72K/s
Tot 146.92K 1373.25ms
Avg 14.69K 137.32ms 106.98K/s
You can just pipe the curl output through awk and format it as you want like this:
curl -w 'Return Code: %{http_code}; Bytes Received: %{size_download}; Response Time: %{time_total}\n' "https://www.google.com" -m 2 -o /dev/null -s | awk '{printf "Return Code: %d; KiB Received: %f; Response Time(ms): %f\n", $3, $6/1024, $9*1000}'
So the oneliner is the following:
for ((i=1;i<=50;i++)); do curl -w 'Return Code: %{http_code}; Bytes Received: %{size_download}; Response Time: %{time_total}\n' "https://www.google.com" -m 2 -o /dev/null -s | awk '{printf "Return Code: %d; KiB Received: %f; Response Time(ms): %f\n", $3, $6/1024, $9*1000}'; done | tee /dev/tty | awk '{ sum += $NF; n++ } END { if (n > 0) print "Average Response Time =", sum/n }'
You can also format the numbers as you want by putting, for example, %.2f for 2-decimal precision or %d for integers...
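For instance, with two decimal places (same fields as above):
curl -w 'Return Code: %{http_code}; Bytes Received: %{size_download}; Response Time: %{time_total}\n' "https://www.google.com" -m 2 -o /dev/null -s | awk '{printf "Return Code: %d; KiB Received: %.2f; Response Time(ms): %.2f\n", $3, $6/1024, $9*1000}'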
Related
I need to parse the output of ldapsearch and keep only the attributes with numeric values.
I also need to transform the output to make it usable in Prometheus monitoring.
This is the output of a raw ldapsearch:
# 389, snmp, monitor
dn: cn=389,cn=snmp,cn=monitor
cn: 389
objectClass: top
objectClass: extensibleObject
anonymousbinds: 9
unauthbinds: 9
simpleauthbinds: 122256
strongauthbinds: 0
bindsecurityerrors: 27869
inops: 24501385
readops: 17933653
compareops: 24852
addentryops: 14205
removeentryops: 0
modifyentryops: 378287
modifyrdnops: 0
listops: 0
searchops: 19194674
onelevelsearchops: 117
wholesubtreesearchops: 1260904
referrals: 0
chainings: 0
securityerrors: 2343
errors: 4694375
connections: 1075
connectionseq: 4720927
bytesrecv: 1608469180
bytessent: -424079608
entriesreturned: 19299393
referralsreturned: 0
I execute this query in order to remove the fields that are not numerical, and also the dn/cn fields if they have numbers, e.g. cn=389.
${LDAPSEARCH} -LLL -H ${LDAP_URI} -x -D "${BINDDN}" -w ${LDAP_PASSWD} -b "${cn}" -s base | sed '/^cn\|^dn/d' | awk -F: '{ if ( $1 != "connection" && $2 ~ /[[:digit:]$]/) printf "dsee_%s\n", $1 $2}'
But I need to modify the printf so that it prints the fields like this:
dsee_modifyrdnops{node="vm1",cn="389"} 0
dsee_listops{node="vm1",cn="1389"} 0
dsee_strongauthbinds{node="vm1",cn="389"} 0
dsee_readops{node="vm1",cn="389"} 37194588
I have difficulties adding the curly brackets and quotes to the printf command.
What would be the best way to improve the awk/sed command and modify the printf output?
In plain bash:
#!/bin/bash
node=vm1
while IFS=: read -r key val; do
[[ $key = cn ]] && { cn=${val# }; continue; }
if [[ $val =~ ^\ -?[0-9]+(\.[0-9]*)?$ ]]; then
printf 'dsee_%s{node="%s",cn="%s"}%s\n' "$key" "$node" "$cn" "$val"
fi
done < <( your_raw_ldapsearch_command )
something along these lines:
$ cat tst.awk
BEGIN {
FS=":[[:blank:]]*"
qq="\""
node="vm1"
}
$1=="cn" {cn=$2}
$1!~/^((cn|dn)$|connection)/ && $2~/^[[:digit:]]+$/ {
printf("dsee_%s{node=%s%s%s,cn=%s%s%s} %d\n", $1, qq, node, qq, qq, cn, qq, $2)
}
$ awk -f tst.awk myFile
dsee_anonymousbinds{node="vm1",cn="389"} 9
dsee_unauthbinds{node="vm1",cn="389"} 9
dsee_simpleauthbinds{node="vm1",cn="389"} 122256
dsee_strongauthbinds{node="vm1",cn="389"} 0
dsee_bindsecurityerrors{node="vm1",cn="389"} 27869
dsee_inops{node="vm1",cn="389"} 24501385
dsee_readops{node="vm1",cn="389"} 17933653
dsee_compareops{node="vm1",cn="389"} 24852
dsee_addentryops{node="vm1",cn="389"} 14205
dsee_removeentryops{node="vm1",cn="389"} 0
dsee_modifyentryops{node="vm1",cn="389"} 378287
dsee_modifyrdnops{node="vm1",cn="389"} 0
dsee_listops{node="vm1",cn="389"} 0
dsee_searchops{node="vm1",cn="389"} 19194674
dsee_onelevelsearchops{node="vm1",cn="389"} 117
dsee_wholesubtreesearchops{node="vm1",cn="389"} 1260904
dsee_referrals{node="vm1",cn="389"} 0
dsee_chainings{node="vm1",cn="389"} 0
dsee_securityerrors{node="vm1",cn="389"} 2343
dsee_errors{node="vm1",cn="389"} 4694375
dsee_bytesrecv{node="vm1",cn="389"} 1608469180
dsee_entriesreturned{node="vm1",cn="389"} 19299393
dsee_referralsreturned{node="vm1",cn="389"} 0
I have my command below and I want to have the result on the same line, with delimiters. My command:
Array=("GET" "POST" "OPTIONS" "HEAD")
echo $(date "+%Y-%m-%d %H:%M")
for i in "${Array[@]}"
do
cat /home/log/myfile_log | grep "$(date "+%d/%b/%Y:%H")"| awk -v last5=$(date --date="-5 min" "+%M") -F':' '$3>=last5 && $3<last5+5{print}' | egrep -a "$i" | wc -l
done
The result is:
2019-01-01 13:27
1651
5760
0
0
I want to have the result below:
2019-01-01 13:27,1651,5760,0,0
It looks (to me) like the overall objective is to scan /home/log/myfile_log for entries that have occurred within the last 5 minutes and which match one of the 4 entries in ${Array[@]}, keeping count of the matches along the way and finally printing the current date and the counts on a single line of output.
I've opted for a complete rewrite that uses awk's abilities of pattern matching, keeping counts and generating a single line of output:
date1=$(date "+%Y-%m-%d %H:%M") # current date
date5=$(date --date="-5 min" "+%M") # date from 5 minutes ago
awk -v d1="${date1}" -v d5="${date5}" -F":" '
BEGIN { keep=0 # init some variables
g=0
p=0
o=0
h=0
}
$3>=d5 && $3<d5+5 { keep=1 } # do we keep processing this line?
!keep { next } # if not then skip to next line
/GET/ { g++ } # increment our counters
/POST/ { p++ }
/OPTIONS/ { o++ }
/HEAD/ { h++ }
{ keep=0 } # reset keep flag for next line
# print results to single line of output
END { printf "%s,%s,%s,%s,%s\n", d1, g, p, o, h }
' <(grep "$(date '+%d/%b/%Y:%H')" /home/log/myfile_log)
NOTE: The OP may need to revisit the <(grep "$(date ...)" /home/log/myfile_log) to handle timestamp periods that span hours, days, months and years, e.g., 14:59 - 16:04, 12/31/2019 23:59 - 01/01/2020 00:04, etc.
Yeah, it's a bit verbose but a bit easier to understand; the OP can rewrite/reduce as they see fit.
To get the execution time of any executable, say a.out, I can simply write time ./a.out. This will output a real time, user time and system time.
Is it possible to write a bash script that runs the program numerous times and calculates and outputs the average real execution time?
You could write a loop, collect the output of the time command, and pipe it to awk to compute the average:
avg_time() {
#
# usage: avg_time n command ...
#
n=$1; shift
(($# > 0)) || return # bail if no command given
for ((i = 0; i < n; i++)); do
{ time -p "$@" &>/dev/null; } 2>&1 # ignore the output of the command
# but collect time's output in stdout
done | awk '
/real/ { real = real + $2; nr++ }
/user/ { user = user + $2; nu++ }
/sys/ { sys = sys + $2; ns++}
END {
if (nr>0) printf("real %f\n", real/nr);
if (nu>0) printf("user %f\n", user/nu);
if (ns>0) printf("sys %f\n", sys/ns)
}'
}
Example:
avg_time 5 sleep 1
would give you
real 1.000000
user 0.000000
sys 0.000000
This can be easily enhanced to:
sleep for a given amount of time between executions
sleep for a random time (within a certain range) between executions, as in the sketch below
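For example, a variant with a random sub-second pause between runs could look like this (a sketch; avg_time_spaced is a hypothetical name and RANDOM is a bash built-in):
avg_time_spaced() {
    n=$1; shift
    (($# > 0)) || return                     # bail if no command given
    for ((i = 0; i < n; i++)); do
        { time -p "$@" &>/dev/null; } 2>&1   # keep only time's output
        sleep "0.$((RANDOM % 900 + 100))"    # random 0.100-0.999 s pause
    done | awk '/real/ { sum += $2; nr++ }
                END    { if (nr > 0) printf("real %f\n", sum/nr) }'
}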
Meaning of time -p from man time:
-p
When in the POSIX locale, use the precise traditional format
"real %f\nuser %f\nsys %f\n"
(with numbers in seconds) where the number of decimals in the
output for %f is unspecified but is sufficient to express the
clock tick accuracy, and at least one.
You may want to check out this command-line benchmarking tool as well:
sharkdp/hyperfine
Total execution time vs. sum of single execution times
Careful! Dividing a sum of N rounded execution times is imprecise!
Instead, we could divide the total execution time of N iterations by N:
avg_time_alt() {
local -i n=$1
local foo real sys user
shift
(($# > 0)) || return;
{ read foo real; read foo user; read foo sys ;} < <(
{ time -p for((;n--;)){ "$@" &>/dev/null ;} ;} 2>&1
)
printf "real: %.5f\nuser: %.5f\nsys : %.5f\n" $(
bc -l <<<"$real/$n;$user/$n;$sys/$n;" )
}
Nota: This uses bc instead of awk to compute the average. For the demo, we create a temporary bc script:
printf >/tmp/test-pi.bc "scale=%d;\npi=4*a(1);\nquit\n" 60
This computes π to 60 decimals, then exits quietly. (You can adapt the number of decimals to your host.)
Demo:
avg_time_alt 1000 sleep .001
real: 0.00195
user: 0.00008
sys : 0.00016
avg_time_alt 1000 bc -ql /tmp/test-pi.bc
real: 0.00172
user: 0.00120
sys : 0.00058
Whereas codeforester's function will answer:
avg_time 1000 sleep .001
real 0.000000
user 0.000000
sys 0.000000
avg_time 1000 bc -ql /tmp/test-pi.bc
real 0.000000
user 0.000000
sys 0.000000
Alternative, inspired by choroba's answer, using Linux's /proc
Ok, you could consider:
avgByProc() {
local foo start end n=$1 e=$1 values times
shift;
export n;
{
read foo;
read foo;
read foo foo start foo
} < /proc/timer_list;
mapfile values < <(
for((;n--;)){ "$@" &>/dev/null;}
read -a endstat < /proc/self/stat
{
read foo
read foo
read foo foo end foo
} </proc/timer_list
printf -v times "%s/100/$e;" ${endstat[@]:13:4}
bc -l <<<"$((end-start))/10^9/$e;$times"
)
printf -v fmt "%-7s: %%.5f\\n" real utime stime cutime cstime
printf "$fmt" ${values[#]}
}
This is based on /proc:
man 5 proc | grep [su]time\\\|timer.list | sed 's/^/> /'
(14) utime %lu
(15) stime %lu
(16) cutime %ld
(17) cstime %ld
/proc/timer_list (since Linux 2.6.21)
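For example, to peek at those four fields for the current shell (a quick check; bash arrays are 0-indexed, so man's fields 14-17 sit at indices 13-16):
read -a stat < /proc/self/stat
echo "utime=${stat[13]} stime=${stat[14]} cutime=${stat[15]} cstime=${stat[16]}"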
Now:
avgByProc 1000 sleep .001
real : 0.00242
utime : 0.00015
stime : 0.00021
cutime : 0.00082
cstime : 0.00020
Where utime and stime represent user time and system time for bash itself, and cutime and cstime represent child user time and child system time, which are the most interesting.
Nota: in this case, the (sleep) command won't use a lot of resources.
avgByProc 1000 bc -ql /tmp/test-pi.bc
real : 0.00175
utime : 0.00015
stime : 0.00025
cutime : 0.00108
cstime : 0.00032
This becomes clearer...
Of course, as timer_list and self/stat are accessed successively but not atomically, differences between real (nanosecond-based) and c?[su]time (tick-based, i.e. 1/100th sec) may appear!
From bashoneliners
adapted to transform (,) to (.) for i18n support
hardcoded to 10, adapt as needed
returns only the "real" value, the one you most likely want
Oneliner
for i in {1..10}; do time $@; done 2>&1 | grep ^real | sed s/,/./ | sed -e s/.*m// | awk '{sum += $1} END {print sum / NR}'
I made a "fuller" version
outputs the results of every execution so you know the right thing is executed
shows every run time, so you can glance for outliers
But really, if you need advanced stuff just use hyperfine.
GREEN='\033[0;32m'
PURPLE='\033[0;35m'
RESET='\033[0m'
# example: perf sleep 0.001
# https://serverfault.com/questions/175376/redirect-output-of-time-command-in-unix-into-a-variable-in-bash
perfFull() {
TIMEFORMAT=%R # `time` outputs only a number, not 3 lines
export LC_NUMERIC="en_US.UTF-8" # `time` outputs `0.100` instead of local format, like `0,100`
times=10
echo -e -n "\nWARMING UP ${PURPLE}$@${RESET}"
$@ # execute passed parameters
echo -e -n "RUNNING ${PURPLE}$times times${RESET}"
exec 3>&1 4>&2 # redirects subshell streams
durations=()
for _ in `seq $times`; {
durations+=(`{ time $@ 1>&3 2>&4; } 2>&1`) # passes stdout through so only `time` is captured
}
exec 3>&- 4>&- # reset subshell streams
printf '%s\n' "${durations[@]}"
total=0
for duration in "${durations[@]}"; {
total=$(bc <<< "scale=3;$total + $duration")
}
average=($(bc <<< "scale=3;$total/$times"))
echo -e "${GREEN}$average average${RESET}"
}
It's probably easier to record the start and end time of the execution and divide the difference by the number of executions.
#!/bin/bash
times=10
start=$(date +%s)
for ((i=0; i < times; i++)) ; do
run_your_executable_here
done
end=$(date +%s)
bc -l <<< "($end - $start) / $times"
I used bc to calculate the average, as bash doesn't support floating-point arithmetic.
To get more precision, you can switch to nanoseconds:
start=$(date +%s.%N)
and similarly for $end.
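Putting the nanosecond variant together (a sketch assuming GNU date, whose %N prints nanoseconds):
#!/bin/bash
times=10
start=$(date +%s.%N)
for ((i=0; i < times; i++)) ; do
    run_your_executable_here
done
end=$(date +%s.%N)
bc -l <<< "($end - $start) / $times"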
I have a bash question (when using awk). I'm extracting every instance of the first and fifth columns in a text file and piping them to a new file with the following code,
cut -f4 test170201.rawtxt | awk '/stream_0/ { print $1, $5 }' > testLogFile.txt
This is part of the file (test170201.rawtxt) I'm extracting the data from, columns Timestamp and Loss,
Timestamp Stream Status Seq Loss Bytes Delay
17/02/01.10:58:25.212577 stream_0 OK 80281 0 1000 38473
17/02/01.10:58:25.213401 stream_0 OK 80282 0 1000 38472
17/02/01.10:58:25.215560 stream_0 OK 80283 0 1000 38473
17/02/01.10:58:25.216645 stream_0 OK 80284 0 1000 38472
This is the result I'm getting in testLogFile.txt
17/02/01.10:58:25.212577 0
17/02/01.10:58:25.213401 0
17/02/01.10:58:25.215560 0
17/02/01.10:58:25.216645 0
However, I want the Timestamp to be written in epoch in the file above. Is there an easy way of modifying the code I already have to do this?
Given:
$ cat file
Timestamp Stream Status Seq Loss Bytes Delay
17/02/01.10:58:25.212577 stream_0 OK 80281 0 1000 38473
17/02/01.10:58:25.213401 stream_0 OK 80282 0 1000 38472
17/02/01.10:58:25.215560 stream_0 OK 80283 0 1000 38473
17/02/01.10:58:25.216645 stream_0 OK 80284 0 1000 38472
You can write a Bash script to do what you are looking for:
while IFS= read -r line || [[ -n "$line" ]]; do
if [[ "$line" =~ ^[[:digit:]]{2}/[[:digit:]]{2}/[[:digit:]]{2} ]]
then
arr=($line)
ts=${arr[0]}
dec=${ts##*.} # fractional seconds
# GNU date may need different flags:
epoch=$(date -j -f "%y/%m/%d.%H:%M:%S" "${ts%.*}" "+%s")
printf "%s.%s\t%s\n" "$epoch" "$dec" "${arr[4]}"
fi
done <file >out_file
$ cat out_file
1485975505.212577 0
1485975505.213401 0
1485975505.215560 0
1485975505.216645 0
For GNU date, try:
while IFS= read -r line || [[ -n "$line" ]]; do
if [[ "$line" =~ ^[[:digit:]]{2}/[[:digit:]]{2}/[[:digit:]]{2} ]]
then
arr=($line)
ts="20${arr[0]}"
d="${ts%%.*}"
tmp="${ts%.*}"
tm="${tmp#*.}"
dec="${ts##*.}" # fractional seconds
epoch=$(date +"%s" --date="$d $tm" )
printf "%s.%s\t%s\n" "$epoch" "$dec" "${arr[4]}"
fi
done <file >out_file
For a GNU awk solution, you can do:
awk 'function epoch(s){
split(s, dt, /[/:. ]/)
s="20" dt[1] " " dt[2] " " dt[3] " " dt[4] " " dt[5] " " dt[6]
return mktime(s) "." dt[7]}
/^[0-9][0-9]/ { print epoch($1), $5 }' file >out_file
If you don't want the fractional second included in the epoch, they are easily removed.
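For instance, in the GNU awk version, returning mktime(s) alone (without appending dt[7]) drops the fraction:
awk 'function epoch(s){
       split(s, dt, /[/:. ]/)
       return mktime("20" dt[1] " " dt[2] " " dt[3] " " dt[4] " " dt[5] " " dt[6])}
     /^[0-9][0-9]/ { print epoch($1), $5 }' file >out_file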
awk -F '[.[:blank:]]+' '
# use separator for dot and space (to avoid trailing time info)
{
# for line other than header
if( NR>1) {
# time is set for format "YYYY MM DD HH MM SS [DST]"
# prepare with valuable info
T = "20"$1 " " $2
# use correct separator
gsub( /[\/:]/, " ", T)
# convert to epoch
E = mktime( T)
# print result, appending the fractional part (%s keeps its leading zeros)
printf("%d.%s %s\n", E, $3, $7)
}
else {
# print header (line 1)
print $1 " "$7
}
}
' test170201.rawtxt \
> Redirected.file
Self-commented; the code is longer for understanding purposes.
Uses GNU awk for the mktime function, which is not available in POSIX or older versions.
A somewhat optimized one-liner hereafter:
awk -F '[.[:blank:]]+' '{if(NR>1){T="20"$1" "$2;gsub(/[\/:]/," ", T);$1=mktime(T)}print $1" "$7}' test170201.rawtxt
Using GNU awk
Input
$ cat f
Timestamp Stream Status Seq Loss Bytes Delay
17/02/01.10:58:25.212577 stream_0 OK 80281 0 1000 38473
17/02/01.10:58:25.213401 stream_0 OK 80282 0 1000 38472
17/02/01.10:58:25.215560 stream_0 OK 80283 0 1000 38473
17/02/01.10:58:25.216645 stream_0 OK 80284 0 1000 38472
Output
$ awk '
BEGIN{cyear = strftime("%y",systime())}
function epoch(v, datetime){
sub(/\./," ",v);
split(v,datetime,/[/: ]/);
datetime[1] = datetime[1] <= cyear ? 2000+datetime[1] : 1900+datetime[1];
return mktime(datetime[1] " " datetime[2] " " datetime[3] " " datetime[4]" " datetime[5]" " datetime[6])
}
/stream_0/{
print epoch($1),$5
}' f
1485926905 0
1485926905 0
1485926905 0
1485926905 0
To write to a new file, just redirect as below:
cut -f4 test170201.rawtxt | awk '
BEGIN{cyear = strftime("%y",systime());}
function epoch(v, datetime){
sub(/\./," ",v);
split(v,datetime,/[/: ]/);
datetime[1] = datetime[1] <= cyear ? 2000+datetime[1] : 1900+datetime[1];
return mktime(datetime[1] " " datetime[2] " " datetime[3] " " datetime[4]" " datetime[5]" " datetime[6])
}
/stream_0/{
print epoch($1),$5
}' > testLogFile.txt
How to split file by percentage of no. of lines?
Let's say I want to split my file into 3 portions (60%/20%/20% parts), I could do this manually, -_- :
$ wc -l brown.txt
57339 brown.txt
$ bc <<< "57339 / 10 * 6"
34398
$ bc <<< "57339 / 10 * 2"
11466
$ bc <<< "34398 + 11466"
45864
bc <<< "34398 + 11466 + 11475"
57339
$ head -n 34398 brown.txt > part1.txt
$ sed -n 34399,45864p brown.txt > part2.txt
$ sed -n 45865,57339p brown.txt > part3.txt
$ wc -l part*.txt
34398 part1.txt
11466 part2.txt
11475 part3.txt
57339 total
But I'm sure there's a better way!
There is a utility that takes as arguments the line numbers that should become the first of each respective new file: csplit. This is a wrapper around its POSIX version:
#!/bin/bash
usage () {
printf '%s\n' "${0##*/} [-ks] [-f prefix] [-n number] file arg1..." >&2
}
# Collect csplit options
while getopts "ksf:n:" opt; do
case "$opt" in
k|s) args+=(-"$opt") ;; # k: no remove on error, s: silent
f|n) args+=(-"$opt" "$OPTARG") ;; # f: filename prefix, n: digits in number
*) usage; exit 1 ;;
esac
done
shift $(( OPTIND - 1 ))
fname=$1
shift
ratios=("$@")
len=$(wc -l < "$fname")
# Sum of ratios and array of cumulative ratios
for ratio in "${ratios[@]}"; do
(( total += ratio ))
cumsums+=("$total")
done
# Don't need the last element
unset cumsums[-1]
# Array of numbers of first line in each split file
for sum in "${cumsums[@]}"; do
linenums+=( $(( sum * len / total + 1 )) )
done
csplit "${args[@]}" "$fname" "${linenums[@]}"
After the name of the file to split up, it takes the ratios for the sizes of the split files relative to their sum, i.e.,
percsplit brown.txt 60 20 20
percsplit brown.txt 6 2 2
percsplit brown.txt 3 1 1
are all equivalent.
Usage similar to the case in the question is as follows:
$ percsplit -s -f part -n 1 brown.txt 60 20 20
$ wc -l part*
34403 part0
11468 part1
11468 part2
57339 total
Numbering starts with zero, though, and there is no txt extension. The GNU version supports a --suffix-format option that would allow for .txt extension and which could be added to the accepted arguments, but that would require something more elaborate than getopts to parse them.
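For reference, the GNU option is -b/--suffix-format, which takes a printf-style format; a direct invocation with explicit split lines (the line numbers from the question) might look like:
csplit -s -f part -b '%d.txt' brown.txt 34399 45865
# creates part0.txt (lines 1-34398), part1.txt (34399-45864), part2.txt (45865-end)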
This solution plays nice with very short files (split file of two lines into two) and the heavy lifting is done by csplit itself.
$ cat file
a
b
c
d
e
$ cat tst.awk
BEGIN {
split(pcts,p)
nrs[1]   # the first output file always starts at line 1
for (i=1; i in p; i++) {
pct += p[i]
nrs[int(size * pct / 100) + 1]
}
}
NR in nrs{ close(out); out = "part" ++fileNr ".txt" }
{ print $0 " > " out }
$ awk -v size=$(wc -l < file) -v pcts="60 20 20" -f tst.awk file
a > part1.txt
b > part1.txt
c > part1.txt
d > part2.txt
e > part3.txt
Change the " > " to just > to actually write to the output files.
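After that change, the same invocation writes the files instead of printing (counts per the example above):
$ awk -v size=$(wc -l < file) -v pcts="60 20 20" -f tst.awk file
$ wc -l part*.txt
  3 part1.txt
  1 part2.txt
  1 part3.txt
  5 total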
Usage
The following bash script allows you to specify the percentages like
./split.sh brown.txt 60 20 20
You can also use the placeholder . which fills the percentage up to 100%.
./split.sh brown.txt 60 20 .
The split files are written to
part1-brown.txt
part2-brown.txt
part3-brown.txt
The script always generates as many part files as numbers specified.
If the percentages sum up to 100, cat part* will always generate the original file (no duplicated or missing lines).
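A quick way to check that property (the part-N prefixes keep the glob in order for up to 9 parts):
$ ./split.sh brown.txt 60 20 20
$ cat part*-brown.txt | cmp - brown.txt && echo identical
identical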
Bash Script: split.sh
#! /bin/bash
file="$1"
fileLength=$(wc -l < "$file")
shift
part=1
percentSum=0
currentLine=1
for percent in "$@"; do
[ "$percent" == "." ] && ((percent = 100 - percentSum))
((percentSum += percent))
if ((percent < 0 || percentSum > 100)); then
echo "invalid percentage" 1>&2
exit 1
fi
((nextLine = fileLength * percentSum / 100))
if ((nextLine < currentLine)); then
printf "" # create empty file
else
sed -n "$currentLine,$nextLine"p "$file"
fi > "part$part-$file"
((currentLine = nextLine + 1))
((part++))
done
BEGIN {
split(w, weight)
total = 0
for (i in weight) {
weight[i] += total
total = weight[i]
}
}
FNR == 1 {
if (NR!=1) {
write_partitioned_files(weight,a)
split("",a,":") #empty a portably
}
name=FILENAME
}
{a[FNR]=$0}
END {
write_partitioned_files(weight,a)
}
function write_partitioned_files(weight, a) {
split("",threshold,":")
size = length(a)
for (i in weight){
threshold[length(threshold)] = int((size * weight[i] / total)+0.5)+1
}
l=1
part=0
for (i in threshold) {
close(out)
out = name ".part" ++part
for (;l<threshold[i];l++) {
print a[l] " > " out
}
}
}
Invoke as:
awk -v w="60 20 20" -f above_script.awk file_to_split1 file_to_split2 ...
Replace " > " with > in script to actually write partitioned files.
The variable w expects space separated numbers. Files are partitioned in that proportion. For example "2 1 1 3" will partition files into four with number of lines in proportion of 2:1:1:3. Any sequence of numbers adding up to 100 can be used as percentages.
For large files the array a may consume too much memory. If that is an issue, here is an alternative awk script:
BEGIN {
split(w, weight)
for (i in weight) {
total += weight[i]; weight[i] = total #cumulative sum
}
}
FNR == 1 {
#get number of lines. take care of single quotes in filename.
name = gensub("'", "'\"'\"'", "g", FILENAME)
"wc -l '" name "'" | getline size
split("", threshold, ":")
for (i in weight){
threshold[length(threshold)+1] = int((size * weight[i] / total)+0.5)+1
}
part=1; close(out); out = FILENAME ".part" part
}
{
if(FNR>=threshold[part]) {
close(out); out = FILENAME ".part" ++part
}
print $0 " > " out
}
This passes through each file twice: once to count lines (via wc -l), and a second time while writing the partitioned files. Invocation and effect are similar to the first method.
I like Benjamin W.'s csplit solution, but it's so long...
#!/bin/bash
# usage ./splitpercs.sh file 60 20 20
n=`wc -l <"$1"` || exit 1
echo $* | tr ' ' '\n' | tail -n+2 | head -n`expr $# - 1` |
awk -v n=$n 'BEGIN{r=1} {r+=n*$0/100; if(r > 1 && r < n){printf "%d\n",r}}' |
uniq | xargs csplit -sfpart "$1"
(the if(r > 1 && r < n) and uniq bits are to prevent creating empty files or strange behavior for small percentages, files with small numbers of lines, or percentages that add to over 100.)
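A sample run (a sketch; csplit's default suffixes give part00, part01, ..., and the counts match the csplit answer above for the same input):
$ ./splitpercs.sh brown.txt 60 20 20
$ wc -l part*
 34403 part00
 11468 part01
 11468 part02
 57339 total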
I just followed your lead and made what you do manually into a script. It may not be the fastest or "best", but if you understand what you are doing now and can just "scriptify" it, you may be better off should you need to maintain it.
#!/bin/bash
# thisScript.sh yourfile.txt 20 50 10 20
YOURFILE=$1
shift
# changed to cat | wc so I dont have to remove the filename which comes from
# wc -l
LINES=$(cat $YOURFILE | wc -l )
startpct=0;
PART=1;
for pct in "$@"
do
# I am assuming that each parameter is on top of the last
# so 10 30 10 would become 10, 10+30 = 40, 10+30+10 = 50, ...
endpct=$( echo "$startpct + $pct" | bc)
# your math but changed parts of 100 instead of parts of 10.
# change bc <<< to echo "..." | bc
# so that one can capture the output into a bash variable.
FIRSTLINE=$( echo "$LINES * $startpct / 100 + 1" | bc )
LASTLINE=$( echo "$LINES * $endpct / 100" | bc )
# use sed every time because the special case for head
# doesn't really help performance.
sed -n $FIRSTLINE,${LASTLINE}p $YOURFILE > part${PART}.txt
((PART++))
startpct=$endpct
done
# get the rest if the percentages don't add up to 100%
if [[ $( echo "$endpct < 100" | bc ) -gt 0 ]] ; then
    FIRSTLINE=$( echo "$LINES * $endpct / 100 + 1" | bc )
    sed -n $FIRSTLINE,${LINES}p $YOURFILE > part${PART}.txt
fi
wc -l part*.txt