Subtract two variables in Bash

I have the script below to subtract the counts of files between two directories but the COUNT= expression does not work. What is the correct syntax?
#!/usr/bin/env bash
FIRSTV=`ls -1 | wc -l`
cd ..
SECONDV=`ls -1 | wc -l`
COUNT=expr $FIRSTV-$SECONDV ## -> gives 'command not found' error
echo $COUNT

Try this Bash syntax instead of the external program expr:
count=$((FIRSTV-SECONDV))
BTW, the correct syntax of using expr is:
count=$(expr $FIRSTV - $SECONDV)
But keep in mind using expr is going to be slower than the internal Bash syntax I provided above.
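If you're curious, you can see the difference yourself. A rough sketch (timings will vary by machine); the point is that $(( )) runs in-process while expr forks once per call:
time for i in {1..1000}; do c=$((i - 1)); done          # builtin arithmetic, in-process
time for i in {1..1000}; do c=$(expr $i - 1); done      # forks an expr process 1000 times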

You just need a little extra whitespace around the minus sign, and backticks:
COUNT=`expr $FIRSTV - $SECONDV`
Be aware of the exit status:
The exit status is 0 if EXPRESSION is neither null nor 0, 1 if EXPRESSION is null or 0.
Keep this in mind when using the expression in a bash script in combination with set -e which will exit immediately if a command exits with a non-zero status.
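For example, this minimal sketch exits at the assignment, because a result of 0 makes expr return exit status 1:
#!/usr/bin/env bash
set -e
FIRSTV=5
SECONDV=5
COUNT=$(expr $FIRSTV - $SECONDV)   # result is 0, so expr exits 1 and set -e aborts here
echo "never reached: $COUNT"
Appending || true to the assignment is one common way to keep set -e from killing the script in this case.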

You can use:
((count = FIRSTV - SECONDV))
to avoid invoking a separate process, as per the following transcript:
pax:~$ FIRSTV=7
pax:~$ SECONDV=2
pax:~$ ((count = FIRSTV - SECONDV))
pax:~$ echo $count
5

This is how I always do maths in Bash:
count=$(echo "$FIRSTV - $SECONDV"|bc)
echo $count
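The main reason to reach for bc is that Bash's $(( )) arithmetic is integer-only, while bc handles fractions:
echo $((7 / 2))              # 3   (integer division)
echo "scale=2; 7 / 2" | bc   # 3.50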

Whitespace is important: expr expects its operands and operators as separate arguments. You also have to capture the output, like this:
COUNT=$(expr $FIRSTV - $SECONDV)
but it's more common to use the builtin arithmetic expansion:
COUNT=$((FIRSTV - SECONDV))

For simple integer arithmetic, you can also use the builtin let command.
ONE=1
TWO=2
let "THREE = $ONE + $TWO"
echo $THREE
3
For more info on let, see help let in Bash.

As an alternative to the three methods suggested above, you can try let, which carries out arithmetic operations on variables, as follows:
let COUNT=$FIRSTV-$SECONDV
or
let COUNT=FIRSTV-SECONDV
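One caveat: if you want spaces around the operator, or operators like * that the shell would otherwise glob-expand, quote the expression:
let "COUNT = FIRSTV - SECONDV"   # quoting allows the spaces
let "COUNT = FIRSTV * SECONDV"   # and stops * from being expanded by the shell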

Diff Real Positive Numbers
diff_real () {
echo "df=($1 - $2); if (df < 0) { df=df* -1}; print df" | bc -l;
}
Usage
var_a=10
var_b=4
output=$(diff_real $var_a $var_b)
# 6
#########
var_a=4
var_b=10
output=$(diff_real $var_a $var_b)
# 6
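If the inputs are integers, a pure-Bash variant needs no bc at all, since arithmetic expansion supports the ternary operator. A minimal sketch:
diff_int () {
    echo $(( $1 > $2 ? $1 - $2 : $2 - $1 ))
}
output=$(diff_int 4 10)
# 6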

Use Bash:
#!/bin/bash
# /home/victoria/test.sh
START=$(date +"%s") ## seconds since Epoch
for i in $(seq 1 10)
do
sleep 1.5
END=$(date +"%s") ## integer
TIME=$((END - START)) ## integer
AVG_TIME=$(python -c "print(float($TIME/$i))") ## int to float
printf 'i: %i | elapsed time: %0.1f sec | avg. time: %0.3f\n' $i $TIME $AVG_TIME
done
Output
$ ./test.sh
i: 1 | elapsed time: 1.0 sec | avg. time: 1.000
i: 2 | elapsed time: 3.0 sec | avg. time: 1.500
i: 3 | elapsed time: 5.0 sec | avg. time: 1.667
i: 4 | elapsed time: 6.0 sec | avg. time: 1.500
i: 5 | elapsed time: 8.0 sec | avg. time: 1.600
i: 6 | elapsed time: 9.0 sec | avg. time: 1.500
i: 7 | elapsed time: 11.0 sec | avg. time: 1.571
i: 8 | elapsed time: 12.0 sec | avg. time: 1.500
i: 9 | elapsed time: 14.0 sec | avg. time: 1.556
i: 10 | elapsed time: 15.0 sec | avg. time: 1.500
$
Get current time in seconds since the Epoch on Linux, Bash

Related

Calculating percentile for each request from a log file based on start time and end time using bash script

I have a simulation.log file with results like the ones below, and I want to calculate the 5th, 25th, 95th and 99th percentile of each request with a shell script by reading the file.
Below is a sample simulation.log file where 1649410339141 and 1649410341026 are start and end time in milliseconds.
REQUEST1 somelogprinted TTP123099SM000202 002 1649410339141 1649410341026 OK
REQUEST2 somelogprinted TTP123099SM000202 001 1649410339141 1649410341029 OK
......
I tried the code below, but it did not give me any result (I am not a Unix developer):
FILE=filepath
sort -n $* > $FILE
N=$(wc -l $FILE | awk '{print $1}')
P50=$(dc -e "$N 2 / p")
P90=$(dc -e "$N 9 * 10 / p")
P99=$(dc -e "$N 99 * 100 / p")
echo ";; 50th, 90th and 99th percentiles for $N data points"
awk "FNR==$P50 || FNR==$P90 || FNR==$P99" $FILE
Sample output:
Request | 5thpercentile | 25Percentile | 95Percentile | 99Percentile
Request1 | 657 | 786 | 821 | 981
Request2 | 453 | 654 | 795 | 854
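One possible approach (a sketch, not a tested solution): GNU awk can group the durations per request, sort them, and pick nearest-rank percentiles. It assumes field 1 is the request name and fields 5 and 6 are the start/end times in milliseconds; asort() and arrays of arrays are gawk extensions, so plain POSIX awk will not run this:
gawk '
{ dur[$1][++n[$1]] = $6 - $5 }                 # duration per request, in ms
END {
    print "Request | 5thpercentile | 25Percentile | 95Percentile | 99Percentile"
    for (r in dur) {
        asort(dur[r])                          # sort this request durations ascending
        printf "%s | %d | %d | %d | %d\n", r,
            dur[r][rank(0.05, n[r])], dur[r][rank(0.25, n[r])],
            dur[r][rank(0.95, n[r])], dur[r][rank(0.99, n[r])]
    }
}
function rank(p, cnt,   i) {                   # nearest-rank index: ceil(p * cnt), minimum 1
    i = int(p * cnt); if (i < p * cnt) i++
    return (i < 1) ? 1 : i
}
' simulation.log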

Mean of execution time of a program

I have the following bash code (A.cpp, B.cpp and C.txt are filenames in the current directory):
#!/bin/bash
g++ A.cpp -o A
g++ B.cpp -o B
Inputfiles=(X Y Z U V)
for j in "${Inputfiles[@]}"
do
echo $j.txt:
i=1
while [ $i -le 5 ]
do
./A $j.txt
./B C.txt
echo ""
i=`expr $i + 1`
done
echo ""
done
rm -f A B
One execution of ./A and ./B is one execution of my program. I run my program 5 times for each input file in the array 'Inputfiles'. I want the average execution time of my program over each input file. How can I do so?
(Earlier, I tried to add time and clock functions within the A.cpp and B.cpp files, but I am not able to add the execution times of both files to get the execution time of a program.)
If I understand correctly what average you would like to calculate, I think the code below will serve your purpose.
Some explanations on the additions to your script:
Lines 6 - 14 declare a function that expects three arguments and updates the accumulated total time, in seconds
Line 26 initializes variable total_time.
Lines 31, 38 execute programs A and B respectively, using the Bash time keyword to collect the execution time. >/dev/null "discards" A's and B's outputs. 2>&1 redirects stderr to stdout so that grep can get time's output (a nice explanation can be found here). grep real keeps only the real line from time's output; you could refer to this post for an explanation of time's output and choose the specific time of your interest. awk '{print $2}' keeps only the numeric part of grep's output.
Lines 32, 39 store the minutes part to the corresponding variable
Lines 33-34, 40-41 trim the seconds part of real_time variable
Lines 35, 42 accumulate the total time by calling function accumulate_time
Line 46 calculates the average time by dividing with 5
Converted the while loop to a nested for loop and introduced the iterations variable; not strictly part of the initial question, but it makes the number of iterations reusable
1 #!/bin/bash
2
3 # Function that receives three arguments (total time,
4 # minutes and seconds) and returns the accumulated time in
5 # seconds
6 function accumulate_time() {
7 total_time=$1
8 minutes=$2
9 seconds=$3
10
11 accumulated_time_secs=$(echo "$minutes * 60 + $seconds + $total_time" | bc )
12 echo "$accumulated_time_secs"
13
14 }
15
16 g++ A.cpp -o A
17 g++ B.cpp -o B
18 Inputfiles=(X Y Z U V)
19
20 iterations=5
21
22 for j in "${Inputfiles[@]}"
23 do
24 echo $j.txt:
25 # Initialize total_time
26 total_time=0.0
27
28 for i in $(seq 1 $iterations)
29 do
30 # Execute A and capture its real time
31 real_time=`{ time ./A $j.txt >/dev/null; } 2>&1 | grep real | awk '{print $2}'`
32 minutes=${real_time%m*}
33 seconds=${real_time#*m}
34 seconds=${seconds%s*}
35 total_time=$(accumulate_time "$total_time" "$minutes" "$seconds")
36
37 # Execute B and capture its real time
38 real_time=`{ time ./B C.txt >/dev/null; } 2>&1 | grep real | awk '{print $2}'`
39 minutes=${real_time%m*}
40 seconds=${real_time#*m}
41 seconds=${seconds%s*}
42 total_time=$(accumulate_time "$total_time" "$minutes" "$seconds")
43 echo ""
44 done
45
46 average_time=$(echo "scale=3; $total_time / $iterations" | bc)
47 echo "Average time for input file $j is: $average_time"
48 done
49
50 rm -f A B

How do I split the input into chunks of six entries each using Bash?

This is the script which I run to produce the raw data in data_tripwire.sh:
#!/bin/sh
LOG=/var/log/syslog-ng/svrs/sec2tes1
for count in 6 5 4 3 2 1 0
do
MONTH=`date -d"$count month ago" +"%Y-%m"`
CBS=`bzcat $LOG/$MONTH*.log.bz2|grep 10.55.22.41 |sort|uniq | wc -l`
echo $CBS >> /home/secmgr/attmrms1/data_tripwire1.sh
done
for count in 6 5 4 3 2 1 0
do
MONTH=`date -d"$count month ago" +"%Y-%m"`
GFS=`bzcat $LOG/$MONTH*.log.bz2|grep 10.55.22.31 |sort|uniq | wc -l`
echo $GFS >> /home/secmgr/attmrms1/data_tripwire1.sh
done
for count in 6 5 4 3 2 1 0
do
MONTH=`date -d"$count month ago" +"%Y-%m"`
HR1=`bzcat $LOG/$MONTH*.log.bz2|grep 10.55.10.1 |sort|uniq | wc -l `
echo $HR1 >> /home/secmgr/attmrms1/data_tripwire1.sh
done
for count in 6 5 4 3 2 1 0
do
MONTH=`date -d"$count month ago" +"%Y-%m"`
HR2=`bzcat $LOG/$MONTH*.log.bz2|grep 10.55.21.12 |sort|uniq | wc -l`
echo $HR2 >> /home/secmgr/attmrms1/data_tripwire1.sh
done
for count in 6 5 4 3 2 1 0
do
MONTH=`date -d"$count month ago" +"%Y-%m"`
PAYROLL=`bzcat $LOG/$MONTH*.log.bz2|grep 10.55.21.18 |sort|uniq | wc -l`
echo $PAYROLL >> /home/secmgr/attmrms1/data_tripwire1.sh
done
for count in 6 5 4 3 2 1 0
do
MONTH=`date -d"$count month ago" +"%Y-%m"`
INCV=`bzcat $LOG/$MONTH*.log.bz2|grep 10.55.22.71 |sort|uniq | wc -l`
echo $INCV >> /home/secmgr/attmrms1/data_tripwire1.sh
done
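Incidentally, the six near-identical loops could be collapsed into one by iterating over the addresses. A sketch that preserves the same output order (all seven months for one server, then the next server):
#!/bin/sh
LOG=/var/log/syslog-ng/svrs/sec2tes1
for ip in 10.55.22.41 10.55.22.31 10.55.10.1 10.55.21.12 10.55.21.18 10.55.22.71
do
    for count in 6 5 4 3 2 1 0
    do
        MONTH=`date -d"$count month ago" +"%Y-%m"`
        N=`bzcat $LOG/$MONTH*.log.bz2 | grep $ip | sort | uniq | wc -l`
        echo $N >> /home/secmgr/attmrms1/data_tripwire1.sh
    done
done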
data_tripwire.sh
91
58
54
108
52
18
8
81
103
110
129
137
84
15
14
18
11
17
12
6
1
28
6
14
8
8
0
0
28
24
25
23
21
13
9
4
18
17
18
30
13
3
I want to process the first 6 entries (91, 58, 54, 108, 52, 18) from the output above, then break out of the loop. After that it should continue with the next 6 entries, then break out of the loop again, and so on.
The problem now is that it reads all 42 numbers without breaking out of the loop.
This is the output of the table
Tripwire
Month CBS GFS HR HR Payroll INCV
cb2db1 gfs2db1 hr2web1 hrm2db1 hrm2db1a incv2svr1
2013-07 85 76 12 28 26 4
2013-08 58 103 18 6 24 18
2013-09 54 110 11 14 25 17
2013-10 108 129 17 8 23 18
2013-11 52 137 12 8 21 30
2013-12 18 84 6 0 13 13
2014-01 8 16 1 0 9 3
The problem now is that it reads all 42 numbers, from 85 to 3.
I want to make a loop which runs from July till January for one server, then does the mean and standard deviation calculation which is already written below.
Once that is done, it should continue with the next cycle of 6 numbers for the next server and do the same as in the initial cycle. Assistance is required with the for loops (with break and continue in them), or anything simpler.
This is my standard deviation calculation
count=0 # Number of data points; global.
SC=3 # Scale to be used by bc. three decimal places.
E_DATAFILE=90 # Data file error
## ----------------- Set data file ---------------------
if [ ! -z "$1" ] # Specify filename as cmd-line arg?
then
datafile="$1" # ASCII text file,
else #+ one (numerical) data point per line!
datafile=/home/secmgr/attmrms1/data_tripwire1.sh
fi # See example data file, below.
if [ ! -e "$datafile" ]
then
echo "\""$datafile"\" does not exist!"
exit $E_DATAFILE
fi
Calculate the mean
arith_mean ()
{
local rt=0 # Running total.
local am=0 # Arithmetic mean.
local ct=0 # Number of data points.
while read value # Read one data point at a time.
do
rt=$(echo "scale=$SC; $rt + $value" | bc)
(( ct++ ))
done
am=$(echo "scale=$SC; $rt / $ct" | bc)
echo $am; return $ct # This function "returns" TWO values!
# Caution: This little trick will not work if $ct > 255!
# To handle a larger number of data points,
#+ simply comment out the "return $ct" above.
} <"$datafile" # Feed in data file.
sd ()
{
mean1=$1 # Arithmetic mean (passed to function).
n=$2 # How many data points.
sum2=0 # Sum of squared differences ("variance").
avg2=0 # Average of $sum2.
sdev=0 # Standard Deviation.
while read value # Read one line at a time.
do
diff=$(echo "scale=$SC; $mean1 - $value" | bc)
# Difference between arith. mean and data point.
dif2=$(echo "scale=$SC; $diff * $diff" | bc) # Squared.
sum2=$(echo "scale=$SC; $sum2 + $dif2" | bc) # Sum of squares.
done
avg2=$(echo "scale=$SC; $sum2 / $n" | bc) # Avg. of sum of squares.
sdev=$(echo "scale=$SC; sqrt($avg2)" | bc) # Square root =
echo $sdev # Standard Deviation.
} <"$datafile" # Rewinds data file.
Showing the output
mean=$(arith_mean); count=$? # Two returns from function!
std_dev=$(sd $mean $count)
echo
echo "<tr><th>Servers</th><th>"Number of data points in \"$datafile"\"</th> <th>Arithmetic mean (average)</th><th>Standard Deviation</th></tr>" >> $HTML
echo "<tr><td>cb2db1<td>$count<td>$mean<td>$std_dev</tr>" >> $HTML
echo "<tr><td>gfs2db1<td>$count<td>$mean<td>$std_dev</tr>" >> $HTML
echo "<tr><td>hr2web1<td>$count<td>$mean<td>$std_dev</tr>" >> $HTML
echo "<tr><td>hrm2db1<td>$count<td>$mean<td>$std_dev</tr>" >> $HTML
echo "<tr><td>hrm2db1a<td>$count<td>$mean<td>$std_dev</tr>" >> $HTML
echo "<tr><td>incv21svr1<td>$count<td>$mean<td>$std_dev</tr>" >> $HTML
echo
I want to split the input into chunks of six entries each with the arithmetic mean and the sd of the entries 1..6, then of the entries 7..12, then of 13..18 etc.
This is the output table I want.
Tripwire
Month CBS GFS HR HR Payroll INCV
cb2db1 gfs2db1 hr2web1 hrm2db1 hrm2db1a incv2svr1
2013-07 85 76 12 28 26 4
2013-08 58 103 18 6 24 18
2013-09 54 110 11 14 25 17
2013-10 108 129 17 8 23 18
2013-11 52 137 12 8 21 30
2013-12 18 84 6 0 13 13
2014-01 8 16 1 0 9 3
* Standard deviation (7 mths): 31.172 35.559 5.248 8.935 5.799 8.580
* Mean (7 mths): 54.428 94.285 11.142 9.142 20.285 14.714
paste - - - - - - < data_tripwire.sh | while read -a values; do
# values is an array with 6 values
# ${values[0]} .. ${values[5]}
arith_mean "${values[@]}"
done
This means you have to rewrite your function so they don't use read: change
while read value
to
for value in "$@"
@Matt: yes, change both functions to iterate over their arguments instead of reading from stdin. Then you will pass the data file (now called "data_tripwire1.sh", a terrible file extension for data; use .txt or .dat) into paste to reformat the data so that the first 6 values form the first row. Read each line into the array values (using read -a values) and invoke the functions:
arith_mean () {
local sum=$(IFS=+; echo "$*")
echo "scale=$SC; ($sum)/$#" | bc
}
sd () {
local mean=$1
shift
local sum2=0
for i in "$@"; do
sum2=$(echo "scale=$SC; $sum2 + ($mean-$i)^2" | bc)
done
echo "scale=$SC; sqrt($sum2/$#)"|bc
}
paste - - - - - - < data_tripwire1.sh | while read -a values; do
mean=$(arith_mean "${values[@]}")
sd=$(sd $mean "${values[@]}")
echo "${values[@]} $mean $sd"
done | column -t
91 58 54 108 52 18 63.500 29.038
8 81 103 110 129 137 94.666 42.765
84 15 14 18 11 17 26.500 25.811
12 6 1 28 6 14 11.166 8.648
8 8 0 0 28 24 11.333 10.934
25 23 21 13 9 4 15.833 7.711
18 17 18 30 13 3 16.500 7.973
Note you don't need to return a fancy value from the functions: you know how many points you pass in.
Based on Glenn's answer, I propose this, which needs very few changes to the original:
paste - - - - - - < data_tripwire.sh | while read -a values
do
for value in "${values[@]}"
do
echo "$value"
done | arith_mean
for value in "${values[@]}"
do
echo "$value"
done | sd
done
You can type (or copy & paste) this code directly in an interactive shell. It should work out of the box. Of course, this is not feasible if you intend to use this often, so you can put that code into a text file, make that executable and call that text file as a shell script. In this case you should add #!/bin/bash as first line in that file.
Credit to Glenn Jackman for the use of paste - - - - - - which is the real solution I'd say.
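To see what paste - - - - - - actually does: each - reads one line from standard input in turn, so six of them fold the stream into rows of six, tab-separated:
$ seq 12 | paste - - - - - -
1	2	3	4	5	6
7	8	9	10	11	12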
The functions will now read only six items of the data file at a time.
arith_mean ()
{
local rt=0 # Running total.
local am=0 # Arithmetic mean.
local ct=0 # Number of data points.
while read value # Read one data point at a time.
do
rt=$(echo "scale=$SC; $rt + $value" | bc)
(( ct++ ))
done
am=$(echo "scale=$SC; $rt / $ct" | bc)
echo $am; return $ct # This function "returns" TWO values!
# Caution: This little trick will not work if $ct > 255!
# To handle a larger number of data points,
#+ simply comment out the "return $ct" above.
} < <(awk -v block=$i 'NR > (6 * (block - 1)) && NR < (6 * block + 1) {print}' "$datafile") # Feed in data file.
sd ()
{
mean1=$1 # Arithmetic mean (passed to function).
n=$2 # How many data points.
sum2=0 # Sum of squared differences ("variance").
avg2=0 # Average of $sum2.
sdev=0 # Standard Deviation.
while read value # Read one line at a time.
do
diff=$(echo "scale=$SC; $mean1 - $value" | bc)
# Difference between arith. mean and data point.
dif2=$(echo "scale=$SC; $diff * $diff" | bc) # Squared.
sum2=$(echo "scale=$SC; $sum2 + $dif2" | bc) # Sum of squares.
done
avg2=$(echo "scale=$SC; $sum2 / $n" | bc) # Avg. of sum of squares.
sdev=$(echo "scale=$SC; sqrt($avg2)" | bc) # Square root =
echo $sdev # Standard Deviation.
} < <(awk -v block=$i 'NR > (6 * (block - 1)) && NR < (6 * block + 1) {print}' "$datafile") # Rewinds data file.
From the main script you will need to set the blocks to read:
for((i=1; i <= $(( $(wc -l $datafile | sed 's/[A-Za-z \/]*//g') / 6 )); i++))
do
mean=$(arith_mean); count=$? # Two returns from function!
std_dev=$(sd $mean $count)
done
Of course it is better to move the wc -l outside of the loop for faster execution. But you get the idea.
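For example (a sketch), hoisting the line count out of the loop header also lets you drop the sed, since wc -l < file prints the bare count with no filename:
blocks=$(( $(wc -l < "$datafile") / 6 ))   # computed once, before the loop
for (( i = 1; i <= blocks; i++ ))
do
    mean=$(arith_mean); count=$?
    std_dev=$(sd $mean $count)
done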
The syntax error occurred between < and ( because of the space; there shouldn't be a space between them. Sorry for the typo.
cat <(awk -F: '{print $1}' /etc/passwd) works.
cat < (awk -F: '{print $1}' /etc/passwd) syntax error near unexpected token `('

/proc/uptime in Mac OS X

I need the EXACT same output as Linux's "cat /proc/uptime".
For example, with /proc/uptime, you'd get
1884371.64 38646169.12
but with any Mac alternative, like "uptime", you'd get
20:25 up 20:26, 6 users, load averages: 3.19 2.82 2.76
I need it to be exactly like cat /proc/uptime, but on Mac OS X.
Got it...
$ sysctl -n kern.boottime | cut -c14-18
87988
Then I just converted that to readable format (don't remember how):
1 Days 00:26:28
There simply is no "/proc" directory on the Macintosh.
On macOS, you can run a command like:
sysctl kern.boottime
and you'll get a response like:
kern.boottime: { sec = 1362633455, usec = 0 } Wed Mar 6 21:17:35 2013
boottime=`sysctl -n kern.boottime | awk '{print $4}' | sed 's/,//g'`
unixtime=`date +%s`
timeAgo=$(($unixtime - $boottime))
uptime=`awk -v time=$timeAgo 'BEGIN { seconds = time % 60; minutes = int(time / 60 % 60); hours = int(time / 60 / 60 % 24); days = int(time / 60 / 60 / 24); printf("%.0f days, %.0f hours, %.0f minutes, %.0f seconds", days, hours, minutes, seconds); exit }'`
echo $uptime
Will return something like
1 Day, 20 hours, 10 minutes, 55 seconds
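If the goal is output shaped exactly like /proc/uptime, here is a minimal sketch along the same lines. Note that the second /proc/uptime field (cumulative idle time) has no direct macOS equivalent, so only the first field is reproduced:
boot=$(sysctl -n kern.boottime | awk '{print $4}' | tr -d ',')   # boot time, epoch seconds
now=$(date +%s)
printf '%d.00\n' $((now - boot))   # e.g. 1884371.00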
Here is what I do to get the values, instead of the cut method:
sysctl kern.boottime | awk '{print $5}'
where $5 selects the fifth whitespace-separated field of the string (note that it still carries the trailing comma).
Example
$1 gives you "kern.boottime:"
$2 gives you "{"
$3 gives you "sec"
from the string
kern.boottime: { sec = 1604030189, usec = 263821 } Fri Oct 30 09:26:29 2020

bash 'while read line' efficiency with big file

I was using a while loop to process a task,
which reads records from a big file of about 10 million lines.
I found that the processing becomes slower and slower as time goes by,
so I made a simulated script with 1 million lines, as below, which reveals the problem.
But I still don't know why; how does the read command work?
seq 1000000 > seq.dat
while read s;
do
if [ `expr $s % 50000` -eq 0 ];then
echo -n $( expr `date +%s` - $A) ' ';
A=`date +%s`;
fi
done < seq.dat
The terminal outputs the time interval:
98 98 98 98 98 97 98 97 98 101 106 112 121 121 127 132 135 134
At about the 500,000-line mark, the processing becomes noticeably slower.
Using your code, I saw the same pattern of increasing times (right from the beginning!). If you want faster processing, you should rewrite using shell internal features. Here's my bash version:
tabChar=" " # put a real tab char here, of course
seq 1000000 > seq.dat
while read s;
do
if (( ! ( s % 50000 ) )) ;then
echo $s "${tabChar}" $( expr `date +%s` - $A)
A=$(date +%s);
fi
done < seq.dat
edit
Fixed bug: the output indicated each line was being processed; now only every 50,000th line gets the timing treatment. Doah!
was
if (( s % 50000 )) ;then
fixed to
if (( ! ( s % 50000 ) )) ;then
output now (ksh; echo ${.sh.version} gives Version JM 93t+ 2010-05-24):
50000
100000 1
150000 0
200000 1
250000 0
300000 1
350000 0
400000 1
450000 0
500000 1
550000 0
600000 1
650000 0
700000 1
750000 0
800000 1
850000 0
900000 1
950000 0
1e+06 1
output of bash:
50000 480
100000 3
150000 2
200000 3
250000 3
300000 2
350000 3
400000 3
450000 2
500000 2
550000 3
600000 2
650000 2
700000 3
750000 3
800000 2
850000 2
900000 3
950000 2
As to why your original test case is taking so long ... not sure. I was surprised to see both the time for each test cycle AND the increase in time. If you really need to understand this, you may need to spend time instrumenting more test stuff. Maybe you'd see something running truss or strace (depending on your base OS).
I hope this helps.
Read is a comparatively slow process, as the author of "Learning the Korn Shell" points out*. (Just above Section 7.2.2.1.) There are other programs, such as awk or sed that have been highly optimized to do what is essentially the same thing: read from a file one line at a time and perform some operations using that input.
Not to mention that you're calling an external process every time you do subtraction or take the modulus, which can get expensive. awk has both of those functionalities built in.
As the following test points out, awk is quite a bit faster:
#!/usr/bin/env bash
seq 1000000 |
awk '
BEGIN {
command = "date +%s"
prevTime = 0
}
$1 % 50000 == 0 {
command | getline currentTime
close(command)
print currentTime - prevTime
prevTime = currentTime
}
'
Output:
1335629268
0
0
0
0
0
0
0
0
0
0
0
0
0
0
1
0
0
0
0
Note that the first number is equivalent to date +%s. Just like in your test case, I left the first match in.
Note
*Yes, the author is talking about the Korn Shell, not bash as the OP tagged, but bash and ksh are rather similar in a lot of ways; bash borrows much of its syntax and behavior from ksh. So I would assume that the read command is not drastically different from one shell to another.
