BASH printf numbers with 1 char suffix - bash

I'm trying to format a number in BASH. I'd like to replicate the byte/packet number output from iptables.
Here are some examples:
258
591K
55273
37G
22244
2212
6127K
12M
114K
As you can see:
there is no thousands separator,
the field is a max of 5 characters wide,
each suffix is either: none, K, M, G, etc...
I've searched the documentation on printf but have been unable to find anything that can format a number this way. Does anyone know how to do this?
Thanks.

You could build custom formatting with awk, something like this:
awk 'BEGIN{ u[0]=""; u[1]="K"; u[2]="M"; u[3]="G"} { n = $1; i = 0; while(n >= 1000) { i+=1; n= int(n/1000) } print n u[i] } '
Input sample :
258
591000
55273
37000000000
22244
2212
6127000
12000000
114000
Output :
258
591K
55K
37G
22K
2K
6M
12M
114K

This has to be done programmatically, but it's not hard:
#!/bin/sh
humanFormat() {
    x=$1    # number to format
    if [ "$x" -ge 1000000000 ]; then
        x="$((x / 1000000000))G"
    elif [ "$x" -ge 1000000 ]; then
        x="$((x / 1000000))M"
    elif [ "$x" -ge 1000 ]; then
        x="$((x / 1000))K"
    fi
    echo "$x"
}
(edited to fix execution order)
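Neither answer reproduces the sample output in the question exactly: there, 55273 is printed unscaled and 6127K keeps four digits. The following is a minimal sketch under an assumed rule inferred from those examples only (not taken from the iptables source): values up to 99999 are printed as they are, and larger values are divided by 1000 until at most four digits remain. The function name human_format is hypothetical, and the division truncates rather than rounds.

```shell
# Hypothetical helper: format a non-negative integer in at most 5 characters.
# Assumed rule (inferred from the question's examples): keep values up to
# 99999 unchanged, otherwise divide by 1000 until at most 4 digits remain.
# Uses plain (global) variables so it runs under both sh and bash.
human_format() {
    n=$1
    suffix=""
    if [ "$n" -gt 99999 ]; then
        for suffix in K M G T; do
            n=$(( n / 1000 ))            # integer division, truncates
            [ "$n" -le 9999 ] && break   # fits in 4 digits plus suffix
        done
    fi
    printf '%s%s\n' "$n" "$suffix"
}
```

For example, human_format 55273 prints 55273, human_format 6127000 prints 6127K and human_format 37000000000 prints 37G.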

Related

Optimally finding the index of the maximum element in BASH array

I am using bash in order to process software responses on-the-fly and I am looking for a way to find the
index of the maximum element in the array.
The data that gets fed to the bash script is like this:
25 9
72 0
3 3
0 4
0 7
And so I create two arrays. There is
arr1 = [ 25 72 3 0 0 ]
arr2 = [ 9 0 3 4 7 ]
And what I need is to find the index of the maximum number in arr1 in order to use it also for arr2.
But I would like to see if there is a quick - optimal way to do this.
Would it maybe be better to use a dictionary structure [key][value] with the data I have? Would this make the process easier?
I have also found [1] (from user jhnc) but I don't quite think it is what I want.
My brute - force approach is the following:
function MAX {
arr1=( 25 72 3 0 0 )
arr2=( 9 0 3 4 7 )
local indx=0
local max=${arr1[0]}
local flag
for ((i=1; i<${#arr1[@]};i++)); do
#To avoid invalid arithmetic operators when items are floats/doubles
flag=$( python <<< "print(${arr1[${i}]} > ${max})")
if [ $flag == "True" ]; then
indx=${i}
max=${arr1[${i}]}
fi
done
echo "MAX:INDEX = ${max}:${indx}"
echo "${arr1[${indx}]}"
echo "${arr2[${indx}]}"
}
This approach obviously will work, BUT, is it the optimal one? Is there a faster way to perform the task?
arr1 = [ 99.97 0.01 0.01 0.01 0 ]
arr2 = [ 0 6 4 3 2 ]
In this example, if an array contains floats then I would get a
syntax error: invalid arithmetic operator (error token is ".97)
So, I am using
flag=$( python <<< "print(${arr1[${i}]} > ${max})")
In order to overcome this issue.
Finding a maximum is inherently an O(n) operation. But there's no need to spawn a Python process on each iteration to perform the comparison. Write a single awk script instead.
awk 'BEGIN {
split(ARGV[1], a1);
split(ARGV[2], a2);
max=a1[1];
indx=1;
for (i in a1) {
if (a1[i] > max) {
indx = i;
max = a1[i];
}
}
print "MAX:INDEX = " max ":" (indx - 1)
print a1[indx]
print a2[indx]
}' "${arr1[*]}" "${arr2[*]}"
The two shell arrays are passed as space-separated strings to awk, which splits them back into awk arrays.
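If the index is needed back in the shell, for instance to look up the matching element of the second array, the same idea can be wrapped in a function. This is only a sketch: the name max_index is hypothetical, the values are passed one per line instead of as a single string, and on ties the first occurrence wins.

```shell
# Hypothetical helper: print the zero-based index of the largest argument.
# The comparison happens in a single awk process, so decimal values work.
max_index() {
    printf '%s\n' "$@" | awk '
        # $1 + 0 forces a numeric comparison; ">" keeps the first maximum.
        NR == 1 || $1 + 0 > max { max = $1 + 0; idx = NR - 1 }
        END { print idx }'
}
```

In bash, with the arrays from the question, i=$(max_index "${arr1[@]}") stores the index, after which "${arr1[i]}" and "${arr2[i]}" give the corresponding pair.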
It's difficult to do this efficiently if you really do need to compare floats. Bash can't do floating-point arithmetic, which means invoking an external program for every comparison. However, comparing every number in bash is not necessarily needed.
Here is a fast, pure bash, integer-only solution using comparison (arr1_max starts out unset, which the arithmetic comparison treats as 0, so it assumes the maximum is positive):
#!/bin/bash
arr1=( 25 72 3 0 0)
arr2=( 9 0 3 4 7)
# Get the maximum, and also save its index(es)
for i in "${!arr1[@]}"; do
if ((arr1[i]>arr1_max)); then
arr1_max=${arr1[i]}
max_indexes=($i)
elif [[ "${arr1[i]}" == "$arr1_max" ]]; then
max_indexes+=($i)
fi
done
# Print the results
printf '%s\n' \
"Array1 max is $arr1_max" \
"The index(s) of the maximum are:" \
"${max_indexes[@]}" \
"The corresponding values from array 2 are:"
for i in "${max_indexes[@]}"; do
echo "${arr2[i]}"
done
Here is another method that can handle floats. Comparison in bash is avoided altogether; instead, the much faster sort(1) is used, and it only runs once, rather than starting a new python instance for every number.
#!/bin/bash
arr1=( 25 72 3 0 0)
arr2=( 9 0 3 4 7)
arr1_max=$(printf '%s\n' "${arr1[@]}" | sort -n | tail -1)
for i in "${!arr1[@]}"; do
[[ "${arr1[i]}" == "$arr1_max" ]] &&
max_indexes+=($i)
done
# Print the results
printf '%s\n' \
"Array 1 max is $arr1_max" \
"The index(s) of the maximum are:" \
"${max_indexes[@]}" \
"The corresponding values from array 2 are:"
for i in "${max_indexes[@]}"; do
echo "${arr2[i]}"
done
Example output:
Array 1 max is 72
The index(s) of the maximum are:
1
The corresponding values from array 2 are:
0
Unless you need those arrays, you can also feed your input script directly in to something like this:
#!/bin/bash
input-script |
sort -nr |
awk '
(NR==1) {print "Max: "$1"\nCorresponding numbers:"; max = $1}
{if (max == $1) print $2; else exit}'
Example (with some extra numbers):
$ echo \
'25 9
72 0
72 11
72 4
3 3
3 14
0 4
0 1
0 7' |
sort -nr |
awk '(NR==1) {max = $1; print "Max: "$1"\nCorresponding numbers:"}
{if (max == $1) print $2; else exit}'
Max: 72
Corresponding numbers:
4
11
0
You can also do it entirely in awk, including the sorting (asort requires GNU awk):
$ echo \
'25 9
72 0
72 11
72 4
3 3
3 14
0 4
0 1
0 7' |
awk '
{
col1[a++] = $1
line[a-1] = $0
}
END {
asort(col1)
col1_max = col1[a-1]
print "Max is "col1_max"\nCorresponding numbers are:"
for (i in line) {
if (line[i] ~ col1_max"\\s") {
split(line[i], max_line)
print max_line[2]
}
}
}'
Max is 72
Corresponding numbers are:
0
11
4
Or, to get just the maximum of column 1 and any single corresponding number from column 2, as simply as possible:
$ echo \
'25 9
72 0
3 3
0 4
0 7' |
sort -nr |
head -1
72 0

Need help to find average, min and max values in shell script from text file

I'm working on a shell script right now. I need to loop through a text file, grab the text from it, and find the average number, max number and min number from each line of numbers then print them in a chart with the name of each line. This is the text file:
Experiment1 9 8 1 2 9 0 2 3 4 5
collect1 83 39 84 2 1 3 0 9
jump1 82 -1 9 26 8 9
exp2 22 0 7 1 0 7 3 2
jump2 88 7 6 5
taker1 5 5 44 2 3
so far all I can do is loop through it and print each line like so:
#!/bin/bash
while read line
do
echo $line
done < mystats.txt
I'm a beginner and nothing I've found online has helped me.
One way, using perl for all the calculations:
$ perl -MList::Util=min,max,sum -anE 'BEGIN { say "Name\tAvg\tMin\tMax" }
$n = shift @F; say join("\t", $n, sum(@F)/@F, min(@F), max(@F))' mystats.txt
Name Avg Min Max
Experiment1 4.3 0 9
collect1 27.625 0 84
jump1 22.1666666666667 -1 82
exp2 5.25 0 22
jump2 26.5 5 88
taker1 11.8 2 44
It uses autosplit mode (-a) to split each line into an array (much like awk), and the standard List::Util module's math functions to calculate the mean, min, and max of each line's numbers.
And here's a pure bash version using nothing but builtins (though I don't recommend doing this; among other things, bash doesn't do floating point math, so the averages are truncated to integers):
#!/usr/bin/env bash
printf "Name\tAvg\tMin\tMax\n"
while read name nums; do
read -a numarr <<< "$nums"
total=0
min=${numarr[0]}
max=${numarr[0]}
for n in "${numarr[@]}"; do
(( total += n ))
if [[ $n -lt $min ]]; then
min=$n
fi
if [[ $n -gt $max ]]; then
max=$n
fi
done
(( avg = total / ${#numarr[*]} ))
printf "%s\t%d\t%d\t%d\n" "$name" "$avg" "$min" "$max"
done < mystats.txt
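If the shell has to be used anyway, the truncated averages can be improved with fixed-point arithmetic: scale the total before dividing and insert the decimal point when printing. The sketch below is only an illustration; the function name fixed_avg and the choice of two decimal places are assumptions, the result is truncated rather than rounded, and at least one argument is required.

```shell
# Hypothetical helper: integer-only mean with two decimal places.
# Uses plain (global) variables so it runs under both sh and bash.
fixed_avg() {
    total=0
    for n in "$@"; do
        total=$(( total + n ))
    done
    # Multiply by 100 before dividing so two decimals survive the truncation.
    scaled=$(( total * 100 / $# ))
    sign=""
    if [ "$scaled" -lt 0 ]; then
        sign="-"
        scaled=$(( -scaled ))
    fi
    printf '%s%d.%02d\n' "$sign" "$(( scaled / 100 ))" "$(( scaled % 100 ))"
}
```

For the jump1 line, fixed_avg 82 -1 9 26 8 9 prints 22.16, where the perl version above reports 22.1666666666667.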
Using awk:
awk '{
min = $2; max = $2; sum = $2;
for (i=3; i<=NF; i++) {
if (min > $i) min = $i;
if (max < $i) max = $i;
sum+=$i }
printf "for %-20s min=%10i max=%10i avg=%10.3f\n", $1, min, max, sum/(NF-1) }' mystats.txt

Is there a way to analyze packet intervals in order to output # of packets per second?

I have about 54,000 packets to analyze and I am trying to determine the average # of packets per second (as well as the min and max # of packets during a given second)
My input file is a single column of the packet times (see sample below):
0.004
0.015
0.030
0.050
..
..
1999.99
I've used awk to determine the timing deltas but can't figure out a way to parse out the chunks of time to get an output of:
0-1s = 10 packets
1-2s = 15 packets
etc
Here is an example of how you can use awk to get the desired output.
Suppose your original input file is sample.txt. First reverse-sort it (sort -nr), then pass the sorted data to awk along with the time variable, using awk's -v option. Perform the tests inside awk, using next to skip lines and exit to quit the awk script when needed.
#!/bin/bash
#
for i in 0 1 2 3
do
sort -nr sample.txt |awk -v time=$i 'BEGIN{number=0}''{
if($1>=(time+1)){next}
else if( $1>=time && $1 <(time+1))
{number+=1}
else{
printf "[ %d - %d [ : %d records\n",time,time+1,number;exit}
}'
done
Here's the sample file:
0.1
0.2
0.8
.
.
0.94
.
.
1.5
1.9
.
3.0
3.6
Here's the program's output:
[ 1 - 2 [ : 5 records
[ 2 - 3 [ : 8 records
[ 3 - 4 [ : 2 records
Hope this helps !
Would you please try the following:
With bash:
max=0
while read -r line; do
i=${line%.*} # extract the integer part
a[$i]=$(( ${a[$i]} + 1 )) # increment the array element
(( i > max )) && max=$i # update the maximum index
done < sample.txt
# report the summary
for (( i=0; i<=max; i++ )); do
printf "%d-%ds = %d packets\n" "$i" $(( i+1 )) "${a[$i]}"
done
With AWK:
awk '
{
i = int($0)
a[i]++
if (i > max) max = i
}
END {
for (i=0; i<=max; i++)
printf("%d-%ds = %d packets\n", i, i+1, a[i])
}' sample.txt
sample.txt:
0.185
0.274
0.802
1.204
1.375
1.636
1.700
1.774
1.963
2.044
2.112
2.236
2.273
2.642
2.882
3.000
3.141
5.023
5.082
Output:
0-1s = 3 packets
1-2s = 6 packets
2-3s = 6 packets
3-4s = 2 packets
4-5s = 0 packets
5-6s = 2 packets
Hope this helps.
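The question also asked for the average, minimum and maximum number of packets per second. Building on the per-second counting idea above, a possible sketch follows. The function name packet_stats is hypothetical, seconds with no packets count as 0, and the average is taken over every second from 0 up to the last second that contains a packet.

```shell
# Hypothetical helper: summarise packets per second for a file of timestamps.
packet_stats() {
    awk '
        # Only count lines that look like a non-negative decimal number.
        $1 ~ /^[0-9]+(\.[0-9]+)?$/ {
            i = int($1)
            count[i]++
            if (i > last) last = i
        }
        END {
            min = max = count[0] + 0
            for (i = 0; i <= last; i++) {
                c = count[i] + 0          # empty seconds count as 0
                if (c < min) min = c
                if (c > max) max = c
                total += c
            }
            printf "avg=%.2f min=%d max=%d\n", total / (last + 1), min, max
        }' "$1"
}
```

With the sample.txt shown above, packet_stats sample.txt prints avg=3.17 min=0 max=6.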

Calculate Median in Multiple Rows

I have a file named numbers that simply contains a bunch of random numbers:
1 2 3
7 5 9
2 2 9
5 4 5
7 2 6
I have to create a script that finds the median for each row, and here is my code:
while read -a row
do
for i in "${row[@]}"
do
length=`expr ${#row[@]} % 2`
if [ $length -ne 0 ] ; then
mid=`expr ${#row[@]} / 2`
echo ${row[middle]}
elif [ $length -eq 0 ] ; then
val1=`expr ${#row[@]} / 2`
val2=`expr (${$row[@]} / 2) + 1`
mid=`expr ($val1 + $val2) / 2`
echo $mid
done | sort -n
done < numbers
However, this doesn't work; it shows an error instead. What mistake did I make in this code? Also, I still haven't figured out the proper place to put the sort -n, since the row needs to be sorted before calculating the median, right?
Bash can only do integer arithmetic, so you need a tool like bc to compute the average of the two middle values:
#!/bin/bash
while read -a n ; do
n=($(IFS=$'\n' ; echo "${n[*]}" | sort -n))
len=${#n[@]}
if (( len % 2 )) ; then
echo ${n[ len / 2 ]}
else
bc -l <<< "scale=1; (${n[ len / 2 - 1 ]} + ${n[ len / 2 ]}) / 2"
fi
done
I'd probably reach for a higher level language, e.g. Perl:
#!/usr/bin/perl
use warnings;
use strict;
while (<>) {
my @n = sort { $a <=> $b } split;
print @n % 2 ? $n[ @n / 2 ]
: ($n[ @n / 2 - 1 ] + $n[ @n / 2 ]) / 2,
"\n";
}
I just had to awk it, for the fun of it.
Notice that I don't use an if, but fractional offsets on the indexes instead.
awk '{
split($0,a) # create array a from input line
asort(a,b) # sort array into array b (gnu awk specific)
# add the median to itself (odd NF) or add the two middle values (even NF), then divide by 2
print ( b[int(NF/2+0.7)] + b[int(NF/2+1.2)] )/2
}' numbers
Shortened (67 chars):
awk '{split($0,a);asort(a,b);print(b[int(NF/2+0.7)]+b[int(NF/2+1.2)])/2}' numbers
66 chars golf :-)
awk '{split($0,a);asort(a,b);$0=(b[int(NF/2+0.7)]+b[int(NF/2+1.2)])/2}1' numbers
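Since asort is specific to GNU awk, a more portable variant can leave the sorting to sort(1) and use awk only to pick the middle element(s). This is a sketch rather than a drop-in replacement for the answers above; the function name median is hypothetical.

```shell
# Hypothetical helper: print the median of its numeric arguments.
median() {
    printf '%s\n' "$@" | sort -n | awk '
        { v[NR] = $1 }                    # values arrive already sorted
        END {
            if (NR % 2) {
                print v[(NR + 1) / 2]     # odd count: the middle element
            } else {
                m = (v[NR / 2] + v[NR / 2 + 1]) / 2
                print m                   # even count: mean of the middle two
            }
        }'
}
```

In bash it could be applied per row with while read -r -a row; do median "${row[@]}"; done < numbers. For the row 7 5 9 it prints 7, and median 1 2 3 4 prints 2.5.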

Bash scripting: Find minimum value in a script

I am writing a script that finds the minimum value in a string. The string is given to me with a cat <file> and then I parse each number inside that string. The string only contains a set of numbers separated by spaces.
This is the code:
echo $FREQUENCIES
for freq in $FREQUENCIES
do
echo "Freq: $freq"
if [ -z "$MINFREQ" ]
then
MINFREQ=$freq
echo "Assigning MINFREQ for the first time with $freq"
elif [ $MINFREQ -gt $freq ]
then
MINFREQ=$freq
echo "Replacing MINFREQ with $freq"
fi
done
Here is the output I get:
800000 700000 600000 550000 500000 250000 125000
Freq: 800000
Assigning MINFREQ for the first time with 800000
Freq: 700000
Replacing MINFREQ with 700000
Freq: 600000
Replacing MINFREQ with 600000
Freq: 550000
Replacing MINFREQ with 550000
Freq: 500000
Replacing MINFREQ with 500000
Freq: 250000
Replacing MINFREQ with 250000
Freq: 125000
Replacing MINFREQ with 125000
Freq:
: integer expression expected
The problem is that the last value, for some reason, is empty or contains whitespace (I am not sure why). I tried testing whether the variable was set with if [ -n "$freq" ], but this test doesn't seem to work here; the loop still goes through the if statement for the last value.
Could someone please help me figure out why the last time the loop executes, $freq is set to empty or whitespace and how to avoid this please?
EDIT:
Using od -c fed with echo "<<$freq>>":
0000000 < < 8 0 0 0 0 0 > > \n
0000013
0000000 < < 7 0 0 0 0 0 > > \n
0000013
0000000 < < 6 0 0 0 0 0 > > \n
0000013
0000000 < < 5 5 0 0 0 0 > > \n
0000013
0000000 < < 5 0 0 0 0 0 > > \n
0000013
0000000 < < 2 5 0 0 0 0 > > \n
0000013
0000000 < < 1 2 5 0 0 0 > > \n
0000013
0000000 < < \r > > \n
0000006
There seems to be an extra \r (from the file).
Thank you very much!
If you're only working with integer values, you can validate your string using regex:
elif [[ $freq =~ ^[0-9]+$ && $MINFREQ -gt $freq ]]
For the error problem: you might have some extra white space in $FREQUENCIES?
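Since the od output in the question shows that the stray value is a carriage return (typical of a file with DOS line endings), another option is to strip carriage returns before looping. The sketch below assumes integer values; the function name min_freq is hypothetical.

```shell
# Hypothetical helper: print the smallest integer in a whitespace-separated
# string, ignoring carriage returns left over from DOS line endings.
min_freq() {
    cleaned=$(printf '%s' "$1" | tr -d '\r')
    min=""
    for freq in $cleaned; do              # unquoted on purpose: split on whitespace
        if [ -z "$min" ] || [ "$freq" -lt "$min" ]; then
            min=$freq
        fi
    done
    echo "$min"
}
```

It would be called as min_freq "$FREQUENCIES".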
Another solution with awk:
echo $FREQUENCIES | awk '{min=$1;for (i=2;i<=NF;i++) {if ( $i<min ) { min=$i } } ; print min }'
If it's a really long variable, you can go with:
echo $FREQUENCIES | awk -v RS=" " 'NR==1 {min=$0} {if ( $0<min ) { min=$0 } } END {print min }'
(It sets the record separator to a space, sets min to the value of the very first record, then checks for every record whether it is smaller than min, and finally prints min.)
HTH
If you are using bash you have arithmetic expressions and the "if unset: use value and assign" parameter substitution:
#!/bin/bash
for freq in "$@"; do
(( minfreq = freq < ${minfreq:=freq} ? freq : minfreq ))
done
echo $minfreq
use:
./script 800000 700000 600000 550000 500000 250000 125000
Data :
10,
10.2,
-3,
3.8,
3.4,
12
Minimum :
echo -e "10\n10.2\n-3\n3.8\n3.4\n12" | sort -n | head -1
Output: -3
Maximum :
echo -e "10\n10.2\n-3\n3.8\n3.4\n12" | sort -nr | head -1
Output: 12
How?
1. Print the values line by line.
2. Sort them numerically (in reverse to get the maximum).
3. Print only the first line.
Simple! This may not be the best method, but I am sure it is easy for learners.
echo $FREQUENCIES | awk '{for (;NF-1;NF--) if ($1>$NF) $1=$NF} 1'
compare first and last field
set first field to the smaller of the two
remove last field
once one field remains, print