Why does printf (Unix) use round half down? - bash

Why does printf behave in such an uncommon way?
> printf %.0f 2.5
> 2
> printf %.0f 2.51
> 3
Is there an advantage of this behaviour that compensates for the probable misunderstandings (like this one)?

It's not strictly round-down:
> printf '%.0f\n' 2.5
2
> printf '%.0f\n' 3.5
4
This is a form of rounding used to combat bias when rounding a large number of values: roughly half of them will be rounded down and the other half rounded up. The rule is: when the value is exactly halfway between two integers, round to the even one, i.e. round down if the integer portion is even and up if it is odd. This is commonly called round half to even, or banker's rounding.
This is, however, only an explanation of a particular rounding scheme, which is not guaranteed to be used by all implementations of printf.
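For example, on an implementation that rounds half to even, sweeping over successive half-way values (all of which are exactly representable in binary) shows the alternation:
for x in 0.5 1.5 2.5 3.5 4.5 5.5; do
    printf '%s -> %.0f\n' "$x" "$x"   # each value is exactly .5 away from two integers
done
# typical output: 0.5 -> 0, 1.5 -> 2, 2.5 -> 2, 3.5 -> 4, 4.5 -> 4, 5.5 -> 6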

From the POSIX specification for the printf command:
The floating-point formatting conversion specifications of printf() are not required because all arithmetic in the shell is integer arithmetic. The awk utility performs floating-point calculations and provides its own printf function. The bc utility can perform arbitrary-precision floating-point arithmetic, but does not provide extensive formatting capabilities. (This printf utility cannot really be used to format bc output; it does not support arbitrary precision.) Implementations are encouraged to support the floating-point conversions as an extension.
Thus: %f isn't even required to exist at all; anything it may or may not do is entirely unspecified by the relevant standard.
Similarly, the POSIX standard for the printf() function offers no more guidance; it explicitly leaves the rounding of the low-order digit implementation-defined:
f, F
The double argument shall be converted to decimal notation in the style "[-]ddd.ddd", where the number of digits after the radix character is equal to the precision specification. If the precision is missing, it shall be taken as 6; if the precision is explicitly zero and no '#' flag is present, no radix character shall appear. If a radix character appears, at least one digit appears before it. The low-order digit shall be rounded in an implementation-defined manner.
A double argument representing an infinity shall be converted in one of the styles "[-]inf" or "[-]infinity"; which style is implementation-defined. A double argument representing a NaN shall be converted in one of the styles "[-]nan(n-char-sequence)" or "[-]nan"; which style, and the meaning of any n-char-sequence, is implementation-defined. The F conversion specifier produces "INF", "INFINITY", or "NAN" instead of "inf", "infinity", or "nan", respectively.

Related

Convert string currency to float with ruby

I have the following string:
"1.273,08"
And I need to convert to float and the result must be:
1273.08
I tried some code using gsub but I can't solve this.
How can I do this conversion?
You have already received two good answers on how to massage your String into your desired format using String#delete and String#tr.
But there is a deeper problem.
The decimal value 1 273.08₁₀ cannot be accurately represented as an IEEE 754-2019 / ISO/IEC 60559:2020 binary64 floating point value.
Just like the value 1/3rd can easily be represented in ternary (0.1₃) but has an infinite representation in decimal (0.33333333…₁₀, i.e. 0.[3]…₁₀) and thus cannot be accurately represented, the value 8/100th can easily be represented in decimal (0.08₁₀) but has an infinite representation in binary (0.0001010001111010111000010100011110101110000101…₂, i.e. 0.[00010100011110101110]…₂). In other words, it is impossible to express 1 273.08₁₀ as a Ruby Float.
And that's not specific to Ruby, or even to programming, that is just basic high school maths: you cannot represent this number in binary, period, just like you cannot represent 1/3rd in decimal, or π in any integer base.
And of course, computers don't have infinite memory, so not only does 1 273.08₁₀ have an infinite representation in binary, but as a Float, it will also be cut off after 64 bits. The closest possible value to 1 273.08₁₀ as an IEEE 754-2019 / ISO/IEC 60559:2020 binary64 floating point value is 1 273.079 999 999 999 927 240 423 858 17₁₀, which is less than 1 273.08₁₀.
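You can observe that stored value from the shell as well, assuming a printf implementation that supports the floating-point conversions (an extension, as discussed in the first question):
printf '%.26f\n' 1273.08
# on a typical IEEE 754 binary64 system this prints
# 1273.07999999999992724042385817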
That is why you should never represent money using binary numbers: everybody will expect it to be decimal, not binary; if I write a cheque, I write it in decimal, not binary. People will expect that it is impossible to represent $ 1/3rd, they will expect that it is impossible to represent $ π, but they will not expect and not accept that if they put $ 1273.08 into their account, they will actually end up with slightly less than that.
The correct way to represent money would be to use a specialized Money datatype, or at least to use the bigdecimal library from the standard library:
require 'bigdecimal'
BigDecimal('1.273,08'.delete('.').tr(',', '.'))
#=> 0.127308e4
I would do
"1.273,08".delete('.') # delete '.' from the string
.tr(',', '.') # replace ',' with '.'
.to_f # translate to float
#=> 1273.08
So, the string uses . as a thousands separator and , as the decimal separator:
str = "1.273,08"
str.gsub('.','').gsub(',', '.').to_f

d3.format "none" type not rounding

The d3 documentation states that the (none) format type works "like g, but trims insignificant trailing zeros". The g format type uses "either decimal or exponent notation, rounded to significant digits."
Mike Bostock explained that "The none format type trims trailing zeros, but the precision is interpreted as significant digits rather than the number of digits past the decimal point."
If I use d3.format('.2')(2.0), I get 2 (trailing zeros are dropped).
But when I use d3.format('.2')(2.001) the result is 2.001: No rounding happens. I would have expected the result to be 2.0 (rounding to two significant digits, but keeping the zero), or 2 (rounding to two significant digits, then dropping the zero).
Is this a bug, or am I misunderstanding the syntax?
This happened because I was using an old version of d3 (3.5.17, which ships with the current version of plot.ly 1.27.1).
In that version of d3, the (none) format type doesn't exist. It was introduced in 2015.
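For comparison, here is what an analogous significant-digit format does in C-style printf (the shell's printf is used here purely as an illustration; it is not d3):
printf '%.2g\n' 2.001   # -> 2    (rounded to two significant digits, trailing zero trimmed)
printf '%.2g\n' 2.051   # -> 2.1  (rounded to two significant digits)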

Representation of negative integers

Does ISO-Prolog have any prescriptions / recommendations
regarding the representation of negative integers and operations on them? 2's complement, maybe?
Asking as a programmer/user: Are there any assumptions I can safely make when performing bit-level operations on negative integers?
ISO/IEC 13211-1 has several requirements for integers, but a concrete representation is not required. If the integer representation is bounded, one of the following conditions holds
7.1.2 Integer
...
minint = -(maxint)
minint = -(maxint+1)
Further, the evaluable functors listed in 9.4 Bitwise functors, that is (>>)/2, (<<)/2, (/\)/2, (\/)/2, (\)/1, and xor/2 are implementation defined for negative values. E.g.,
9.4.1 (>>)/2 – bitwise right shift
9.4.1.1 Description
...
The value shall be implementation defined depending on whether the shift is logical (fill with zeros) or arithmetic (fill with a copy of the sign bit). The value shall be implementation defined if VS is negative, or VS is larger than the bit size of an integer.
Note that implementation defined means that a conforming processor has to document this in the accompanying documentation. So before using a conforming processor, you have to read the manual.
De facto, every current Prolog processor (that I am aware of) provides arithmetic right shift and uses 2's complement.
Strictly speaking these are two different questions:
Actual physical representation: this isn't visible at the Prolog level, and therefore the standard quite rightly has nothing to say about it. Note that many Prolog systems have two or more internal representations (e.g. two's complement fixed size and sign+magnitude bignums) but present a single integer type to the programmer.
Results of bitwise operations: while the standard defines these operations, it leaves much of their behaviour implementation defined. This is a consequence of (a) not having a way to specify the width of a bit pattern, and (b) not committing to a specific mapping between negative numbers and bit patterns.
This not only means that all bitwise operations on negative numbers are officially not portable, but also has the curious effect that the result of bitwise negation is totally implementation-defined (even for positive arguments): Y is \1 could legally give -2, 268435454, 2147483646, 9223372036854775806, etc. All you know is that negating twice returns the original number.
In practice, fortunately, there seems to be a consensus towards "The bitwise arithmetic operations behave as if operating on an unlimited length two's complement representation".
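For illustration only (this is shell arithmetic, not Prolog, and it uses a fixed 64-bit width rather than unbounded integers), the same de-facto convention looks like this on typical platforms:
echo $(( ~1 ))        # -2 : bitwise complement under two's complement
echo $(( -8 >> 1 ))   # -4 : arithmetic right shift, the sign bit is copied in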

Bash - stripping of the last digits of a number - which one is better in terms of semantic?

Consider this string containing an integer:
nanoseconds=$(date +%s%N)
when I want to strip off the last six characters, what would be semantically better?
Stripping just the characters off the string
nanoseconds=$(date +%s%N)
milliseconds=${nanoseconds%??????}
or dividing the value by 1000000
milliseconds=$((nanoseconds / 1000000))
EDIT
Sorry for not being clear. It's basically for doing a conversion from nanoseconds to milliseconds. I think I answered my own question...
Both give the same result for this input, but in general I would consider the former method to be safer. The first method is explicit and does precisely what you want to do: remove a substring from the back of the string.
The other one is a mathematical operation that relies on correct rounding. Although I cannot imagine where it would fail, I would prefer the first method.
Unless, of course, what you really want is not stripping the last three characters but dividing by 1000 :-)
Post scriptum: hah, of course I know where it would fail. Let value="123". ${value%???} strips the last three digits, as intended, leaving an empty string. $(( value / 1000 )) results in value equal to "0" (a string of length of 1).
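To see the difference concretely in the shell:
value=123
echo "[${value%???}]"          # prints []  : all three characters stripped, nothing left
echo "[$(( value / 1000 ))]"   # prints [0] : integer division yields the one-character string 0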
EDIT: since we know now that it is not about stripping characters, but rounding, clearly dividing by 1000 is the correct way of approaching the problem :-)
The clearest method when strings are involved is probably substring expansion (slicing), in shells that support it.
s=$(LC_TIME=C date +%s.%N) s=${s::-6}   # drop six of the nine %N digits, leaving seconds.milliseconds
Fortunately it appears GNU date at least defaults to zero-padding for %N, so division should be reliable. (Note that both of these methods truncate rather than round.)
(( s=(10#$(LC_TIME=C date +%s%N))/1000000 ))   # integer milliseconds since the epoch
If you want to round, you can do a bit better than these using printf
printf -v milliseconds %.3f "$(LC_TIME=C date +%s.%N)"   # seconds.milliseconds, rounded
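And if what you actually want is an integer count of milliseconds since the epoch, rounded rather than truncated, a small arithmetic sketch does it (adding 500000 before the integer division rounds half up):
ns=$(date +%s%N)
ms=$(( (ns + 500000) / 1000000 ))
echo "$ms"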
ksh93's printf supports %N so there's no need for date. The conversion can be automatic. If you have (a modern) ksh available you should definitely use it.
typeset -T MsTime=(
    typeset -lF6 .=0
    # get discipline: refresh the value with the current time on every read
    function get {
        ((.sh.value=$(LC_TIME=C printf '%(%s.%N)T')))
    }
)
MsTime milliseconds
print -r "$milliseconds"

Meaning of # in Scheme number literals

DrRacket running R5RS says that 1### is a perfectly valid Scheme number and prints a value of 1000.0. This leads me to believe that the pound signs (#) specify inexactness in a number, but I'm not certain. The spec also says that it is valid syntax for a number literal, but it does not say what those signs mean.
Any ideas as to what the # signs in Scheme number literals signify?
The hash syntax was introduced in 1989. There was a discussion about inexact numbers on the Scheme authors' mailing list, which contains several nice ideas. Some caught on and some didn't.
http://groups.csail.mit.edu/mac/ftpdir/scheme-mail/HTML/rrrs-1989/msg00178.html
One idea that stuck was introducing the # to stand for an unknown digit.
If you have a measurement with two significant digits, you can indicate with 23## that the digits 2 and 3 are known but the trailing digits are unknown. If you write 2300, you can't see that the two zeros aren't to be trusted. When I saw the syntax I expected 23## to evaluate to 2350, but (I believe) the interpretation is implementation-dependent. Many implementations interpret 23## as 2300.
The syntax was formally introduced here:
http://groups.csail.mit.edu/mac/ftpdir/scheme-mail/HTML/rrrs-1989/msg00324.html
EDIT
From http://groups.csail.mit.edu/mac/ftpdir/scheme-reports/r3rs-html/r3rs_8.html#SEC52
An attempt to produce more digits than are available in the internal machine representation of a number will be marked with a "#" filling the extra digits. This is not a statement that the implementation knows or keeps track of the significance of a number, just that the machine will flag attempts to produce 20 digits of a number that has only 15 digits of machine representation:
3.14158265358979##### ; (flo 20 (exactness s))
EDIT2
Gerald Jay Sussman writes about why they introduced the syntax here:
http://groups.csail.mit.edu/mac/ftpdir/scheme-mail/HTML/rrrs-1994/msg00096.html
Here are the R4RS and R5RS docs regarding numerical constants:
R4RS 6.5.4 Syntax of numerical constants
R5RS 6.2.4 Syntax of numerical constants.
To wit:
If the written representation of a number has no exactness prefix, the constant may be either inexact or exact. It is inexact if it contains a decimal point, an exponent, or a "#" character in the place of a digit, otherwise it is exact.
Not sure the # characters mean anything beyond that, other than standing in for the digit 0.
