I have collected vmstat data in a file. It gives details about free, buffer and cache.
Since I'm interested in finding the memory usage, I should do the following computation for each line of vmstat output: USED = TOTAL - (FREE + BUFFER + CACHE), where TOTAL is the total RAM memory and USED is the instantaneous memory usage.
TOTAL memory = 4042928 (4 GB)
My code is here:
grep -v procs $1 | grep -v free | awk '{USED=4042928-$4-$5-$6;print $USED}' > test.dat
Running it produces this error:
awk: program limit exceeded: maximum number of fields size=32767
FILENAME="-" FNR=1 NR=1
For a start, you should not be printing $USED; the variable in awk is USED:
pax> vmstat
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
0 0 0 1402804 258392 2159316 0 0 54 79 197 479 1 2 93 3
pax> vmstat | egrep -v 'procs|free' | awk '{USED=4042928-$4-$5-$6;print USED}'
222780
What is most likely to be happening in your case is that you're using an awk with that limitation of about 32000 fields per record.
Because your fields 4, 5 and 6 are respectively 25172, 664 and 8520 (from one of your comments), your USED value becomes 4042928-25172-664-8520 or 4008572.
If you tried to print USED, that's what you'd get but, because you're trying to print $USED, it thinks you want $4008572 (field number 4008572) which is just a little bit beyond the 32000 range.
Interestingly, if you have a lot more free memory, you wouldn't get the error but you'd still get an erroneous value :-)
By the way, gawk doesn't have this limitation, it simply prints an empty field (see, for example, section 11.9 here).
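A quick way to see the difference (assuming gawk is installed):
$ echo test | gawk '{print $4008572}'    # prints an empty line
whereas an awk with the 32767-field limit aborts with the error shown in the question.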
You can just do it with one awk command:
awk 'NR>2{print 4042928-$4-$5-$6}' "$1"
(NR>2 skips the two header lines, so neither the pipe from vmstat nor the two greps is needed when reading from the file.)
As regards integer-adding one-liners, several shell scripting solutions have been proposed;
however, on closer look at each of the chosen solutions, there are inherent limitations:
awk ones would choke on arbitrary precision and integer size (it behaves C-like, after all)
bc ones would rather be unhappy with arbitrarily long inputs: (sed 's/$/+\\/g';echo 0)|bc
Understanding that there may be portability issues on top of that across platforms (see [1] [2]), which is undesired,
is there a generic solution which is a winner on both practicality and brevity?
Hint: SunOS and MacOSX are examples where portability would be an issue.
E.g., could the dc command handle arbitrarily large 2^n inputs, integer or otherwise?
[1] awk: https://stackoverflow.com/a/450821/1574494, https://stackoverflow.com/a/25245025/1574494, and "Printing long integers in awk"
[2] bc: Bash command to sum a column of numbers
An optimal solution for dc(1) sums the inputs as they are read:
$ jot 1000000 | sed '2,$s/$/+/;$s/$/p/' | dc
500000500000
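To see what that sed feeds to dc, run it on a small input (the last line receives both the + and the p):
$ jot 5 | sed '2,$s/$/+/;$s/$/p/'
1
2+
3+
4+
5+p
dc pushes the first value, and each subsequent N+ adds as soon as it is read, so the stack never holds more than two numbers; the final p prints the running total.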
The one I usually use is paste -sd+|bc:
$ time seq 1 20000000 | paste -sd+|bc
200000010000000
real 0m10.092s
user 0m10.854s
sys 0m0.481s
(For strict Posix compliance, paste needs to be provided with an explicit argument: paste -sd+ -|bc. Apparently that is necessary with the BSD paste implementation installed by default on OS X.)
However, that will fail for larger inputs, because bc buffers an entire expression in memory before evaluating it. On my system, bc ran out of memory trying to add 100 million numbers, although it was able to do 70 million. But other systems may have smaller capacities.
Since bc has variables, you could avoid long lines by repetitively adding to a variable instead of constructing a single long expression. This is (as far as I know) 100% Posix compliant, but there is a 3x time penalty:
$ time seq 1 20000000|sed -e's/^/s+=/;$a\' -es|bc
200000010000000
real 0m29.224s
user 0m44.119s
sys 0m0.820s
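To see the expression stream that sed builds for bc here, try a small input:
$ seq 3 | sed -e's/^/s+=/;$a\' -es
s+=1
s+=2
s+=3
s
Each s+=N statement accumulates without printing (in bc, assignments do not echo their value), and the bare s appended after the last line prints the total; that is why no long expression ever builds up.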
Another way to handle the case where the input size exceeds bc's buffering capacity would be to use the standard xargs tool to add the numbers in groups:
$ time seq 1 100000000 |
> IFS=+ xargs sh -c 'echo "$*"' _ | bc | paste -sd+ | bc
5000000050000000
real 1m0.289s
user 1m31.297s
sys 0m19.233s
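The IFS=+ trick works because "$*" joins the positional parameters with the first character of IFS, so each xargs batch turns into one addition expression for bc. On a small input (on a system where the pipeline above works), the intermediate stream is easy to inspect:
$ seq 1 5 | IFS=+ xargs sh -c 'echo "$*"' _
1+2+3+4+5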
The number of input lines used by each xargs evaluation will vary from system to system, but it will normally be in the hundreds and it might be much more. Obviously, the xargs | bc invocations could be chained arbitrarily to increase capacity.
It might be necessary to limit the size of the xargs expansion using the -s switch on systems where ARG_MAX exceeds the capacity of the bc command. Aside from performing an experiment to establish the bc buffer limit, there is no portable way to establish what that limit might be, but it certainly should be no less than LINE_MAX, which is guaranteed to be at least 2048. Even with 100-digit addends, that will allow a reduction by a factor of 20, so a chain of 10 xargs|bc pipes would handle over 10^13 addends, assuming you were prepared to wait a couple of months for that to complete.
As an alternative to constructing a large fixed-length pipeline, you could use a function to recursively pipe the output from xargs|bc until only one value is produced:
radd () {
    if read a && read b; then
        # More than one value remains: push the two values just read back in
        # front of the rest of the input, sum in batches with xargs|bc, and recurse.
        { printf '%s\n%s\n' "$a" "$b"; cat; } |
            IFS=+ xargs -s $MAXLINE sh -c 'echo "$*"' _ |
            bc | radd
    else
        # Only one value left: that is the final sum.
        echo "$a"
    fi
}
If you use a very conservative value for MAXLINE, the above is quite slow, but with plausible larger values it is not much slower than the simple paste|bc solution:
$ time seq 1 20000000 | MAXLINE=2048 radd
200000010000000
real 1m38.850s
user 0m46.465s
sys 1m34.503s
$ time seq 1 20000000 | MAXLINE=60000 radd
200000010000000
real 0m12.097s
user 0m17.452s
sys 0m5.090s
$ time seq 1 100000000 | MAXLINE=60000 radd
5000000050000000
real 1m3.972s
user 1m31.394s
sys 0m27.946s
As well as the bc solutions, I timed some other possibilities. As shown above, with an input of 20 million numbers, paste|bc took 10 seconds. That's almost identical to the time used by adding 20 million numbers with
gawk -M '{s+=$0} END{print s}'
Programming languages such as python and perl proved to be faster:
# 9.2 seconds to sum 20,000,000 integers
python -c $'import sys\nprint(sum(int(x) for x in sys.stdin))'
# 5.1 seconds
perl -Mbignum -lne '$s+=$_; END{print $s}'
I was unable to test dc -f - -e '[+z1<r]srz1<rp' on large inputs, since its performance appears to be quadratic (or worse); it summed 25 thousand numbers in 3 seconds, but it took 19 seconds to sum 50 thousand and 90 seconds to do 100 thousand.
Although bc is not the fastest and memory limitations require awkward workarounds, it has the advantage of working out of the box on Posix-compliant systems without the necessity to install enhanced versions of any standard utility (awk) or programming languages not required by Posix (perl and python).
You can use gawk with the -M flag:
$ seq 1 20000000 | gawk -M '{s+=$0} END{print s}'
200000010000000
Or Perl with bignum enabled:
$ seq 1 20000000 | perl -Mbignum -lne '$s+=$_; END{print $s}'
200000010000000
$ seq 1000|(sum=0;while read num; do sum=`echo $sum+$num|bc -l`;done;echo $sum)
500500
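A note on this loop: it forks one bc per input line, which is slow. If arbitrary precision is not required, POSIX shell arithmetic avoids the forks entirely (a sketch, limited to the shell's native integer width):
seq 1000 | { sum=0; while read num; do sum=$((sum+num)); done; echo "$sum"; }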
Also, this one will not win a top-speed prize; however, it IS:
a one-liner, yes
portable
adds lists of any length
adds numbers of any precision (each number's length limited only by MAXLINE)
does not rely on external tools such as python/perl/awk/R etc.
with a stretch, you may call it elegant too ;-)
come on guys, show me a better way to do this!
It seems that the following does the trick:
$ seq 1000|dc -f - -e '[+z1<r]srz1<rp'
500500
but, is it the optimal solution?
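For what it's worth, here is how that dc program decomposes (my reading of the dc(1) operators; z pushes the current stack depth):
[+z1<r]sr    define macro r: add the top two values, then run r again while the depth exceeds 1
z1<r         kick off the loop if more than one number was read
p            print the final sum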
jot really slows you down:
( time ( jot 100000000 | pvZ -i 0.2 -l -cN in0 |
mawk2 '{ __+=$_ } END { print __ }' FS='\n' ) )
in0: 100M 0:00:17 [5.64M/s] [ <=> ]
( jot 100000000 | nice pv -pteba -i 1 -i 0.2 -l -cN in0 |
mawk2 FS='\n'; )
26.43s user 0.78s system 153% cpu 17.730 total
5000000050000000
Using another awk instance to generate the sequence shaves off 39.7%:
( time (
mawk2 -v __='100000000' '
BEGIN { for(_-=_=__=+__;_<__;) {
print ++_ } }' |
pvZ -i 0.2 -l -cN in0 |
mawk2 '{ __+=$_ } END{ print __ }' FS='\n' ))
in0: 100M 0:00:10 [9.37M/s] [ <=> ]
( mawk2 -v __='100000000' 'BEGIN {…}' | )
19.44s user 0.68s system 188% cpu 10.687 total
5000000050000000
For the bc option, gnu-paste is quite a bit faster than bsd-paste in this regard, but both absolutely pale compared to awk, while perl is only slightly behind:
time jot 15000000 | pvE9 | mawk2 '{ _+=$__ } END { print _ }'
out9: 118MiB 0:00:02 [45.0MiB/s] [45.0MiB/s] [ <=> ]
112500007500000
jot 15000000 2.60s user 0.03s system 99% cpu 2.640 total
pvE 0.1 out9 0.01s user 0.05s system 2% cpu 2.640 total
mawk2 '{...}' 1.09s user 0.03s system 42% cpu 2.639 total
perl -Mbignum -lne '$s+=$_; END{print $s}' 1.36s user 0.03s system 52% cpu 2.662 total # perl 5.36
time jot 15000000 | pvE9 | gpaste -sd+ -|bc
out9: 118MiB 0:00:02 [45.3MiB/s] [45.3MiB/s] [ <=> ]
112500007500000
jot 15000000 2.59s user 0.03s system 99% cpu 2.627 total
pvE 0.1 out9 0.01s user 0.05s system 2% cpu 2.626 total
gpaste -sd+ - 0.27s user 0.03s system 11% cpu 2.625 total # gnu-paste
bc 4.55s user 0.46s system 66% cpu 7.544 total
time jot 15000000 | pvE9 | paste -sd+ -|bc
out9: 118MiB 0:00:05 [22.7MiB/s] [22.7MiB/s] [ <=> ]
112500007500000
jot 15000000 2.63s user 0.03s system 51% cpu 5.207 total
pvE 0.1 out9 0.01s user 0.06s system 1% cpu 5.209 total
paste -sd+ - 5.14s user 0.05s system 99% cpu 5.211 total # bsd-paste
bc 4.53s user 0.40s system 49% cpu 10.029 total
I want to know how much memory is being used and how much is free on my system. I ran the free command, and the following is my output:
total used free shared buffers cached
Mem: 16334624 16199712 134912 372780 333456 4673092
-/+ buffers/cache: 11193164 5141460
Swap: 4194300 806484 3387816
Now I want to get rid of the first column and the last line. So I used free | sed -n 1,3p | cut -d " " -f2-, and the following is the output:
total used free shared buffers cached
16334624 16200348 134276 372732 333520 4658336
buffers/cache: 11208492 5126132
Now I want to arrange the values with labels, e.g.
total = 16334624, used = 16200348, and so on... and finally buffers/cache used/free = 11208492/5126132
Any idea how I can do this?
Using awk it can be done like this:
free | awk 'NR==2 { printf("total = %s\nused = %s\nfree = %s\nshared = %s\nbuff/cache = %s\navailable = %s\n", $2,$3,$4,$5,$6,$7)}'
The condition NR==2 selects only the second line to apply the action to.
The fields $2 to $7 are the numeric columns of the free output.
Each \n in the printf starts a new line.
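On a system with the newer free layout, this prints something like (values here are illustrative):
total = 16334624
used = 16200348
free = 134276
shared = 372732
buff/cache = 4991856
available = 4673092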
How can I extend the answer in "Linux command for percentage of memory that is free" so I can get the total memory, divided by two and rounded up to the next 1G boundary? For example,
# free -h
total used free shared buff/cache available
Mem: 7.4G 2.0G 3.0G 150M 2.4G 5.0G
Swap: 0B 0B 0B
I want the value of free -h | awk '{...}' to be 4: total Mem = 7.4G, so 7.4/2 = 3.7, which rounds up to 4. Make sense?
Assuming everything is in G, you need:
free -h | awk 'NR==2{print(int(substr($2,1,length($2)-1)/2+.5))}'
Explanation:
To print the second line, you need
free -h | awk 'NR==2{print}'
To print the second field of the second line, you need
free -h | awk 'NR==2{print($2)}' # prints 7.7G
To remove the final G, you need a substring which starts at the 1st character and contains one character fewer than the whole string.
free -h | awk 'NR==2{print(substr($2,1,length($2)-1))}' # prints 7.7
To get half of it, you need
free -h | awk 'NR==2{print(substr($2,1,length($2)-1)/2)}' # prints 3.85
To round to the closest integer, you need to add 0.5 and truncate the fractional part (which is done by the int function):
free -h | awk 'NR==2{print(int(substr($2,1,length($2)-1)/2+.5))}' # prints 4
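Note that adding 0.5 rounds to the nearest integer. If you always want to round up (which is what the +1 in the question suggests), a ceiling works the same way (a sketch, with the same single-letter-suffix assumption as above):
free -h | awk 'NR==2{x=substr($2,1,length($2)-1)/2; print int(x)+(x>int(x))}'   # 7.4G -> 4, 8.0G -> 4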
Try this:
awk '$1=="MemTotal:"{t=$2}END{printf("%.0f GB\n", t/2/1024/1024)}' /proc/meminfo
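Here MemTotal in /proc/meminfo is given in kB, so t/2/1024/1024 is half the total in GB, and %.0f rounds it to the nearest integer. For example, with MemTotal: 7800000 kB this prints 4 GB (7800000/2/1024/1024 is about 3.72).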
I am wondering how you can get the system CPU usage and present it in percent using bash, for example.
Sample output:
57%
In case there is more than one core, it would be nice if an average percentage could be calculated.
Take a look at cat /proc/stat
grep 'cpu ' /proc/stat | awk '{usage=($2+$4)*100/($2+$4+$5)} END {print usage "%"}'
EDIT: please read the comments before copy-pasting this or using it for any serious work. It was neither tested nor used much; it's an idea for people who do not want to install a utility, or who need something that works on any distribution. Some people think you can "apt-get install" anything.
NOTE: this is not the current CPU usage, but the overall CPU usage in all the cores since the system bootup. This could be very different from the current CPU usage. To get the current value top (or similar tool) must be used.
Current CPU usage can be potentially calculated with:
awk '{u=$2+$4; t=$2+$4+$5; if (NR==1){u1=u; t1=t;} else print ($2+$4-u1) * 100 / (t-t1) "%"; }' \
<(grep 'cpu ' /proc/stat) <(sleep 1;grep 'cpu ' /proc/stat)
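For reference, the leading fields on the cpu line of /proc/stat are, in order (see proc(5)):
cpu  user nice system idle iowait irq softirq steal ...
so $2+$4 counts user+system time and $5 is idle time, all in clock ticks; nice, iowait and the irq fields are ignored by these snippets.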
You can try:
top -bn1 | grep "Cpu(s)" | \
sed "s/.*, *\([0-9.]*\)%* id.*/\1/" | \
awk '{print 100 - $1"%"}'
Try mpstat from the sysstat package
> sudo apt-get install sysstat
> mpstat
Linux 3.0.0-13-generic (ws025) 02/10/2012 _x86_64_ (2 CPU)
03:33:26 PM CPU %usr %nice %sys %iowait %irq %soft %steal %guest %idle
03:33:26 PM all 2.39 0.04 0.19 0.34 0.00 0.01 0.00 0.00 97.03
Then use some cut or grep to parse the info you need:
mpstat | grep -A 5 "%idle" | tail -n 1 | awk -F " " '{print 100 - $12}'
Might as well throw up an actual response with my solution, which was inspired by Peter Liljenberg's:
$ mpstat | awk '$12 ~ /[0-9.]+/ { print 100 - $12"%" }'
0.75%
This will use awk to print out 100 minus the 12th field (idle), with a percentage sign after it. awk will only do this for a line where the 12th field has numbers and dots only ($12 ~ /[0-9.]+/).
You can also average five samples, one second apart:
$ mpstat 1 5 | awk 'END{print 100-$NF"%"}'
Test it like this:
$ mpstat 1 5 | tee /dev/tty | awk 'END{print 100-$NF"%"}'
EDITED: I noticed that in another user's reply %idle was field 12 instead of field 11. The awk has been updated to account for the %idle field being variable.
This should get you the desired output:
mpstat | awk '$3 ~ /CPU/ { for(i=1;i<=NF;i++) { if ($i ~ /%idle/) field=i } } $3 ~ /all/ { print 100 - $field }'
If you want a simple integer rounding, you can use printf:
mpstat | awk '$3 ~ /CPU/ { for(i=1;i<=NF;i++) { if ($i ~ /%idle/) field=i } } $3 ~ /all/ { printf("%d%%",100 - $field) }'
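As a check against the sample header shown earlier: %idle is the 12th column of the CPU header line and the all line shows 97.03 idle, so the loop sets field=12 and the command prints 100 - 97.03 = 2.97.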
Do this to see the overall CPU usage. This calls python3 and uses the cross-platform psutil module.
printf "%b" "import psutil\nprint('{}%'.format(psutil.cpu_percent(interval=2)))" | python3
The interval=2 part says to measure the total CPU load over a blocking period of 2 seconds.
Sample output:
9.4%
The python program it contains is this:
import psutil
print('{}%'.format(psutil.cpu_percent(interval=2)))
Placing time in front of the call proves it takes the specified interval time of about 2 seconds in this case. Here is the call and output:
$ time printf "%b" "import psutil\nprint('{}%'.format(psutil.cpu_percent(interval=2)))" | python3
9.5%
real 0m2.127s
user 0m0.119s
sys 0m0.008s
To view the output for individual cores as well, let's use this python program below. First, I obtain a python list (array) of "per CPU" information, then I average everything in that list to get a "total % CPU" type value. Then I print the total and the individual core percents.
Python program:
import psutil
cpu_percent_cores = psutil.cpu_percent(interval=2, percpu=True)
avg = sum(cpu_percent_cores)/len(cpu_percent_cores)
cpu_percent_total_str = ('%.2f' % avg) + '%'
cpu_percent_cores_str = [('%.2f' % x) + '%' for x in cpu_percent_cores]
print('Total: {}'.format(cpu_percent_total_str))
print('Individual CPUs: {}'.format(' '.join(cpu_percent_cores_str)))
This can be wrapped up into an incredibly ugly 1-line bash script like this if you like. I had to be sure to use only single quotes (''), NOT double quotes ("") in the Python program in order to make this wrapping into a bash 1-liner work:
printf "%b" \
"\
import psutil\n\
cpu_percent_cores = psutil.cpu_percent(interval=2, percpu=True)\n\
avg = sum(cpu_percent_cores)/len(cpu_percent_cores)\n\
cpu_percent_total_str = ('%.2f' % avg) + '%'\n\
cpu_percent_cores_str = [('%.2f' % x) + '%' for x in cpu_percent_cores]\n\
print('Total: {}'.format(cpu_percent_total_str))\n\
print('Individual CPUs: {}'.format(' '.join(cpu_percent_cores_str)))\n\
" | python3
Sample output: notice that I have 8 cores, so there are 8 numbers after "Individual CPUs:":
Total: 10.15%
Individual CPUs: 11.00% 8.50% 11.90% 8.50% 9.90% 7.60% 11.50% 12.30%
For more information on how the psutil.cpu_percent(interval=2) python call works, see the official psutil.cpu_percent(interval=None, percpu=False) documentation here:
psutil.cpu_percent(interval=None, percpu=False)
Return a float representing the current system-wide CPU utilization as a percentage. When interval is > 0.0 compares system CPU times elapsed before and after the interval (blocking). When interval is 0.0 or None compares system CPU times elapsed since last call or module import, returning immediately. That means the first time this is called it will return a meaningless 0.0 value which you are supposed to ignore. In this case it is recommended for accuracy that this function be called with at least 0.1 seconds between calls. When percpu is True returns a list of floats representing the utilization as a percentage for each CPU. First element of the list refers to first CPU, second element to second CPU and so on. The order of the list is consistent across calls.
Warning: the first time this function is called with interval = 0.0 or None it will return a meaningless 0.0 value which you are supposed to ignore.
Going further:
I use the above code in my cpu_logger.py script in my eRCaGuy_dotfiles repo.
References:
Stack Overflow: How to get current CPU and RAM usage in Python?
Stack Overflow: Executing multi-line statements in the one-line command-line?
How to display a float with two decimal places?
Finding the average of a list
Related
https://unix.stackexchange.com/questions/295599/how-to-show-processes-that-use-more-than-30-cpu/295608#295608
https://askubuntu.com/questions/22021/how-to-log-cpu-load
Is there any easy way to convert JCL SORT to Shell Script?
Here is the JCL SORT:
OPTION ZDPRINT
SORT FIELDS=(15,1,CH,A)
SUM FIELDS=(16,8,25,8,34,8,43,8,52,8,61,8),FORMAT=ZD
OUTREC BUILD=(14X,15,54,13X)
Only bytes 15 onwards, for a length of 54, are relevant from the input data; they hold the key and the source values for the summation. Other bytes from the input are not important.
Assuming the data is printable.
The data is sorted on the one-byte key, and each value for records with the same key is summed, separately, for each of the six numbers. A single record is written, per key, with the summed values and with other data (those single bytes in between and at the end) from the first record. The sort is "unstable" (meaning that the order of records presented to the summation is not reproducible from one execution to the next), so the byte values should theoretically be the same on all records, or be irrelevant.
The output, for each key, is presented as a record containing 14 blanks (14X) then the 54 bytes starting at position 15 (which is the one-byte key) and then followed by 13 blanks (13X). The numbers should be right-aligned and left-zero-filled [OP to confirm, and amend sample data and expected output].
Assuming the sum will only contain positive number and will not be signed, and that for any number which is less than 999999990 there will be leading zeros for any unused positions (numbers are character, right-aligned and left-zero-filled).
Assuming the one-byte key will only be alphabetic.
The data has already been converted to ASCII from EBCDIC.
Sample Input:
00000000000000A11111111A11111111A11111111A11111111A11111111A111111110000000000000
00000000000000B22222222A22222222A22222222A22222222A22222222A222222220000000000000
00000000000000C33333333A33333333A33333333A33333333A33333333A333333330000000000000
00000000000000A44444444B44444444B44444444B44444444B44444444B444444440000000000000
Expected Output:
A55555555A55555555A55555555A55555555A55555555A55555555
B22222222A22222222A22222222A22222222A22222222A22222222
C33333333A33333333A33333333A33333333A33333333A33333333
(14 preceding blanks and 13 trailing blanks)
Expected volume: tens of thousands of records.
I have figured out an answer:
awk -v FIELDWIDTHS="14 1 8 1 8 1 8 1 8 1 8 1 8 13" \
'{if(!($2 in a)) {a[$2]=$2; c[$2]=$4; e[$2]=$6; g[$2]=$8; i[$2]=$10; k[$2]=$12} \
b[$2]+=$3; d[$2]+=$5; f[$2]+=$7; h[$2]+=$9; j[$2]+=$11; l[$2]+=$13;} END \
{for(id in a) printf("%14s%s%s%s%s%s%s%s%s%s%s%s%s%13s\n","",a[id],b[id],c[id],d[id],e[id],f[id],g[id],h[id],i[id],j[id],k[id],l[id],"");}' input
Explanation:
1) Split each record into fixed-width fields:
awk -v FIELDWIDTHS="14 1 8 1 8 1 8 1 8 1 8 1 8 13"
2) Let $2 be the key; $4, $6, $8, $10 and $12 are only set the first time a key is seen:
{if(!($2 in a)) {a[$2]=$2; c[$2]=$4; e[$2]=$6; g[$2]=$8; i[$2]=$10; k[$2]=$12}
3) The others are summed up:
b[$2]+=$3; d[$2]+=$5; f[$2]+=$7; h[$2]+=$9; j[$2]+=$11; l[$2]+=$13;} END
4) Print one line per key:
{for(id in a) printf("%14s%s%s%s%s%s%s%s%s%s%s%s%s%13s\n","",a[id],b[id],c[id],d[id],e[id],f[id],g[id],h[id],i[id],j[id],k[id],l[id],"");}
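Note that FIELDWIDTHS is a gawk extension; a plain POSIX awk would have to carve the columns out with substr instead. A minimal sketch of the same idea, handling just the key (byte 15) and the first summed number (bytes 16-23), with positions taken from the layout above:
awk '{k=substr($0,15,1); s[k]+=substr($0,16,8)} END{for(k in s) printf("%14s%s%08d\n","",k,s[k])}' input
Extending it to the other five numbers and the separator bytes follows the same pattern.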
Okay, I have tried something.
1) Extract duplicate keys from the file and store them in a duplicates file:
awk '{k=substr($0,1,15);a[k]++}END{for(i in a)if(a[i]>1)print i}' sample > duplicates
OR
awk '{k=substr($0,1,15);print k}' sample | sort | uniq -c | awk '$1>1{print $2}' > duplicates
2) For duplicates, do the calculation and create newfile with the specified format:
while read line
do
grep ^$line sample | awk -F[A-Z] -v key=$line '{for(i=2;i<=7;i++)f[i]=f[i]+$i}END{printf("%14s"," ");for(i=2;i<=7;i++){printf("%s%.8s",substr(key,15,1),f[i]);if(i==7)printf("%13s\n"," ")}}' > newfile
done < duplicates
3) For unique ones, format and append to newfile:
grep -v -f duplicates sample | sed 's/0/ /g' >> newfile ## breaks if a 0 occurs within the data rather than only at the start and end of a row
OR
grep -v -f duplicates sample | awk '{printf("%14s%s%13s\n"," ",substr($0,15,54)," ")}' >> newfile
If you have any doubts, let me know.