Convert integers to floats in awk - bash

cat file.txt
1
1.6
0
0.7
2
3
3.7
The expected output is:
1.0
1.6
0.0
0.7
2.0
3.0
3.7
Any awk or bash solution? Wherever there is an integer, convert it to a float.

In awk it is just a matter of indicating the format you want to use to print the results. Use printf and ask for 1 decimal digit:
$ awk '{printf ("%.1f\n", $1)}' file
1.0
1.6
0.0
0.7
2.0
3.0
3.7
You can say %e for scientific format, %.5f to get 5 decimal digits, etc. See the awk printf documentation for more details.
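For example, with the same file, a couple of format variations (a quick sketch; %e defaults to 6 digits after the decimal point):
$ awk '{printf ("%e\n", $1)}' file | head -2
1.000000e+00
1.600000e+00
$ awk '{printf ("%.5f\n", $1)}' file | head -2
1.00000
1.60000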

Similarly to the solution with awk, you can use printf with bash too, e.g. via xargs:
cat file.txt | xargs -n 1 printf "%.1f\n"
The formatting options of printf are similar.
Note that printf (in either awk or bash) outputs a rounded value, so if you have 0.77 in your input, the result of the %.1f format will be 0.8.
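A quick way to see that rounding behaviour with bash's printf builtin (note that locales using a decimal comma may need LC_NUMERIC=C):
$ printf "%.1f\n" 0.77
0.8
$ printf "%.1f\n" 0.74
0.7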

Related

How to cut one character after the dot in a shell script variable

I need to remove all characters after the first one after the dot.
Example:
the temperature is 28.34567 C°
I need only 28.3
I've tried with cut -d'.' -f1 but that cuts everything after the dot.
Thanks a lot
If you are using Bash:
$ var=23.123
$ [[ $var =~ [0-9]*(\.[0-9]{,1})? ]] && echo ${BASH_REMATCH[0]}
Output:
23.1
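If the number is embedded in a longer string like the one in the question, a variant that requires at least one digit before the dot (a sketch, so the match cannot start on an empty string) would be:
$ line='the temperature is 28.34567 C°'
$ [[ $line =~ [0-9]+(\.[0-9])? ]] && echo "${BASH_REMATCH[0]}"
28.3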
You could add this line to your script if you have Python 3:
python3 -c "print(f\"{28.34567:.1f}\")"
This solution rounds the result to the nearest value rather than truncating it.
Output:
28.3
There are a few ways to do this, each with its own issues. The trivial solution is sed. Something like:
$ echo "the temperature is 28.37567 C°" | sed -e 's/\([[:digit:]]\.[0-9]\)[0-9]*/\1/g'
the temperature is 28.3 C°
but you probably don't want truncation. Rounding is probably more appropriate, in which case:
$ echo "the temperature is 28.37567 C°" | awk '{$4 = sprintf("%.1f", $4)}1'
the temperature is 28.4 C°
but that's pretty fragile in matching the 4th field. You could add a loop to check all the fields, but this gives the idea. Also note that awk will squeeze all your whitespace.
Two brute-force ways, depending on whether you want to keep the Celsius sign or not:
mawk '$!_=int(10*$_)/10' <<<'28.34567 C°'
28.3 C°
mawk '$!NF=int(10*$_)/10' <<<'28.34567 C°'
28.3

Awk multiplication gives me a different value than the normal multiplication of 2 numbers [duplicate]

I have a pipe delimited feed file which has several fields. Since I only need a few, I thought of using awk to capture them for my testing purposes. However, I noticed that printf changes the value if I use "%d". It works fine if I use "%s".
Feed File Sample:
[jaypal:~/Temp] cat temp
302610004125074|19769904399993903|30|15|2012-01-13 17:20:02.346000|2012-01-13 17:20:03.307000|E072AE4B|587244|316|13|GSM|1|SUCC|0|1|255|2|2|0|213|2|0|6|0|0|0|0|0|10|16473840051|30|302610|235|250|0|7|0|0|0|0|0|10|54320058002|906|722310|2|0||0|BELL MOBILITY CELLULAR, INC|BELL MOBILITY CELLULAR, INC|Bell Mobility|AMX ARGENTINA SA.|Claro aka CTI Movil|CAN|ARG|
I am interested in capturing the second column which is 19769904399993903.
Here are my tests:
[jaypal:~/Temp] awk -F"|" '{printf ("%d\n",$2)}' temp
19769904399993904 # Value is changed
However, the following two tests works fine -
[jaypal:~/Temp] awk -F"|" '{printf ("%s\n",$2)}' temp
19769904399993903 # Value remains same
[jaypal:~/Temp] awk -F"|" '{print $2}' temp
19769904399993903 # Value remains same
So is this a limitation of "%d" in not being able to handle long integers? If that's the case, why would it add one to the number instead of, say, truncating it?
I have tried this with BSD and GNU versions of awk.
Version Info:
[jaypal:~/Temp] gawk --version
GNU Awk 4.0.0
Copyright (C) 1989, 1991-2011 Free Software Foundation.
[jaypal:~/Temp] awk --version
awk version 20070501
Starting with GNU awk 4.1 you can use --bignum or -M
$ awk 'BEGIN {print 19769904399993903}'
19769904399993904
$ awk --bignum 'BEGIN {print 19769904399993903}'
19769904399993903
See the gawk manual, § Command-Line Options.
I believe the underlying numeric format in this case is an IEEE double. So the changed value is a result of floating point precision errors. If it is actually necessary to treat the large values as numerics and to maintain accurate precision, it might be better to use something like Perl, Ruby, or Python which have the capabilities (maybe via extensions) to handle arbitrary-precision arithmetic.
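For instance, Python integers have arbitrary precision, so a sketch like this keeps the value exact (assuming python3 is available):
$ python3 -c 'print(19769904399993903 * 3)'
59309713199981709
$ cut -d'|' -f2 temp | python3 -c 'import sys; print(int(sys.stdin.read()) * 2)'
39539808799987806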
UPDATE: Recent versions of GNU awk support arbitrary precision arithmetic. See the GNU awk manual for more info.
ORIGINAL POST CONTENT:
XMLgawk supports arbitrary precision arithmetic on floating-point numbers.
So, if installing xgawk is an option:
zsh-4.3.11[drado]% awk --version |head -1; xgawk --version | head -1
GNU Awk 4.0.0
Extensible GNU Awk 3.1.6 (build 20080101) with dynamic loading, and with statically-linked extensions
zsh-4.3.11[drado]% awk 'BEGIN {
x=665857
y=470832
print x^4 - 4 * y^4 - 4 * y^2
}'
11885568
zsh-4.3.11[drado]% xgawk -lmpfr 'BEGIN {
MPFR_PRECISION = 80
x=665857
y=470832
print mpfr_sub(mpfr_sub(mpfr_pow(x, 4), mpfr_mul(4, mpfr_pow(y, 4))), 4 * y^2)
}'
1.0000000000000000000000000
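With a modern gawk built with MPFR/GMP support you no longer need xgawk; the -M option handles the same example (whose exact value is 1). A sketch:
$ gawk -M 'BEGIN { x = 665857; y = 470832; print x^4 - 4 * y^4 - 4 * y^2 }'
1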
This was partially answered by @Mark Wilkins and @Dennis Williamson already, but I found out that the largest integer a 64-bit double can handle without losing precision is 2^53.
See gawk's reference page:
http://www.gnu.org/software/gawk/manual/gawk.html#Integer-Programming
(Sorry if my answer is too old. Figured I'd still share for the next person before they spend too much time on this like I did.)
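You can see that 2^53 boundary directly with gawk (without -M):
$ gawk 'BEGIN { print 2^53, 2^53 + 1, 2^53 + 2 }'
9007199254740992 9007199254740992 9007199254740994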
You're running into awk's floating-point representation issues. I don't think you can find a workaround within the awk framework to perform arithmetic on huge numbers accurately.
The only possible (and crude) way I can think of is to break the huge number into smaller chunks, perform your math, and join them again, or better yet use Perl/PHP/Tcl/bsh, etc., scripting languages that are more powerful than awk.
Using nawk on Solaris 11, I convert the number to a string by concatenating an empty string to the end, and then use %15s as the format string:
printf("%15s\n", bignum "")
Another caveat about precision: the errors pile up with extra operations:
echo 19769904399993903 | mawk2 '{ CONVFMT = "%.2000g";
OFMT = "%.20g";
} {
print;
print +$0;
print $0/1.0
print $0^1.0;
print exp(-log($0))^-1;
print exp(1*log($0))
print sqrt(exp(exp(log(20)-log(10))*log($0)))
print (exp(exp(log(6)-log(3))*log($0)))^2^-1
}'
19769904399993903
19769904399993904
19769904399993904
19769904399993904
19769904399993912
19769904399993908
19769904399993628    <- off by -275
19769904399993768    <- off by -135
The first few are off by less than 10; the last two expressions have triple-digit deltas.
For any of the versions that require calling helper math functions, simply passing the -M bignum flag is insufficient; one must also set the PREC variable. For this example, setting PREC=64 and OFMT="%.17g" should suffice.
Beware of setting OFMT too high relative to PREC, otherwise you'll see oddities like this:
gawk -M -v PREC=256 -e '{ CONVFMT="%.2000g"; OFMT="%.80g";... } '
19769904399993903
19769904399993903.000000000000000000000000000000000000000000000000000000000003734
19769904399993903.000000000000000000000000000000000000000000000000000000000003734
19769904399993903.000000000000000000000000000000000000000000000000000000000003734
19769904399993903.000000000000000000000000000000000000000000000000000000000003734
since 80 significant digits require a precision of at least 265.75, so basically 266 bits. But gawk is fast enough that you can probably safely pre-set it at PREC=4096 or 8192 instead of having to worry about it every time.
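A minimal sketch of the -M/PREC combination, assuming a gawk built with MPFR/GMP:
$ echo 19769904399993903 | gawk -M -v PREC=113 '{ OFMT = "%.17g"; print $0 / 1.0 }'
19769904399993903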

awk-if with variables doesn't work properly sometimes

In a text file, I want to classify my data according to its range.
For example, 8.1, 9.1, and 9.9 are all in [8,10). I used variables called left and right in place of 8 and 10 respectively in the awk if condition, but it doesn't work properly.
My data like this:
9.1 aa
9.2 bb
10.1 cc
11.9 dd
Then my scripts like this:
left=8;right=10 #left=10;right=12
echo "["$left","$right"]:"
cat data | awk '{if(($1>="'$left'")&&($1<"'$right'")) print $2}' | xargs
The result is empty.
[8,10]:
But if I use 8 and 10 directly (without variables), it's OK. And when I use left=10, right=12, it also works properly.
I also found that when left=98, right=100, it didn't work either. So why does it sometimes not work? Thanks a lot!
With awk's option -v:
left=8;right=10
awk -v l="$left" -v r="$right" '{if($1>=l&&$1<r) print $2}' data
or with environment variables:
export left=8 right=10
awk '{if($1>=ENVIRON["left"]&&$1<ENVIRON["right"]) print $2}' data
Output:
aa
bb
You are performing string comparison but want numeric comparison (lexically, "9.1" < "10" is false because "9" sorts after "1"). Just swap the quotes so the variables end up unquoted inside the awk program:
left=8;right=10 #left=10;right=12
echo "["$left","$right"]:"
cat data | awk '{if(($1>='"$left"')&&($1<'"$right"')) print $2}' | xargs
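A one-liner that shows why [8,10) fails as a string comparison while [10,12) happens to work:
$ awk 'BEGIN { print ("9.1" < "10"), (9.1 < 10) }'
0 1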

Bash Math Oddity (Floating Point Division)

So I'm having some trouble with bash / bc math here..
I'm trying to print the filesize of a backup after I move it to my gdrive via rclone. I get the filesize via an rclone ls statement with awk print $1, which works great.
In my specific example, I get the value of 1993211 (bytes).
So in my printing code I try to divide this by 1048576 to get it into MB, which should give me 1.9 MB.
However,
$ expr 1993211 / 1048576 | bc -l
prints 1
I've tried various other math options listed here (including via python / node) and I always get 1 or 1.0. How is this possible?
The calculation should be 1993211 / 1048576 = 1.90087413788
Any idea what's going on here?
That's because expr does integer division.
To get floating-point division you could run:
bc -l <<< '1993211 / 1048576'
which returns: 1.90087413787841796875
or you can set the number of decimals using scale:
bc -l <<< 'scale=5; 1993211 / 1048576'
which returns: 1.90087
In the command expr 1993211 / 1048576 | bc -l, expr divides 1993211 by 1048576 using integer division (because that's what expr knows how to do), gets "1" as the result, and prints it. bc -l receives that "1" as input, and since there's no operation specified (expr already did that), it just prints it.
What you want is to pass the expression "1993211 / 1048576" directly as input to bc -l:
$ echo "1993211 / 1048576" | bc -l
1.90087413787841796875
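Since the byte count already comes from an awk command, another option is to do the division and formatting right there (a sketch with the value from the question):
$ awk 'BEGIN { printf "%.1f MB\n", 1993211 / 1048576 }'
1.9 MB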

error in bash script

I have input files of this format
real 0.00
user 0.00
sys 0.00
real 0.00
user 0.00
sys 0.00
real 0.00
user 0.00
sys 0.00
I'm writing a bash script to get the average of the 'real' values. This is the script I've written
#! /bin/sh
# FILES=/home/myfiles
for f in $FILES
do
echo " Processing $f file.."
sum=0;
grep real $f | while read LINE; do
value=$(sed "s/[^0-9]//g")
#value=`awk "^[0-9]"`
echo $value
$sum+=$value
done
#average=$sum/10;
#echo $average
done
But I'm getting an error on this statement:
$sum+=$value
Any solutions, please?
Use this:
sum+=$value
Otherwise you'd be saying "0+=$value"
Also, you can do:
grep real $f | while read LINE value; do
That'll avoid the need to sed/awk.
bash does not support floating-point arithmetic; it only supports integers. If you don't care how you get the result, awk is better equipped for this:
awk '/real/ {sum += $2} END {print sum}' files*
The /real/ says, "look for the lines containing the word real"; then {sum += $2} adds the second field to sum. By default, a variable like sum starts out as empty or zero, depending on context. Finally, the END pattern says, "after processing all the files, print the sum."
It's better to use awk or some programming language that does file processing as well as floating-point math all in one; bash does not support floating-point math. The problem with your script is that you call the external sed command for every "real" line you find, which is a performance hit.
awk '/real/{s+=$2;c++}END{print "average is: " s/c}' file
Here is a quick try, based on our discussion above. I tried not to change your script too much; here is a summary:
- Values are dumped into an array (the pipe you had causes the math to occur in a subshell, so the results would not make it up to the 'main' shell).
- Added more processing in the sed expression to strip leading zeros (but make sure there is at least one digit).
- I didn't follow why you divided by 10 to calculate the average, so I use the actual count of items in the array.
- Finally, assuming you want the average in the same unit of measurement as the input, I am printing with printf the result of dividing by 100.
#! /bin/sh
# FILES=/home/myfiles
FILES=a
for f in $FILES
do
echo " Processing $f file.."
values=($(grep real ${f} | sed -e "s/[^0-9]//g" -e "s/^0*//" -e "s/^$/0/"))
sum=0;
for value in "${values[@]}"; do
echo $value
((sum+=value))
done
average=$((sum/${#values[@]}));
printf "AVG: %d.%02d\n" $((average/100)) $((average%100))
done
