How can an Oracle NUMBER have a Scale larger than the Precision?

The documentation states: "Precision can range from 1 to 38. Scale can range from -84 to 127".
How can the scale be larger than the precision? Shouldn't the Scale range from -38 to 38?

The question could be: why not?
Try the following SQL.
select cast(0.0001 as number(2,5)) num,
to_char(cast(0.0001 as number(2,5))) cnum,
dump(cast(0.0001 as number(2,5))) dmp
from dual
What you see is that you can hold small numbers in that sort of structure.
It might not be required very often, but I'm sure somewhere there is someone who is storing very precise but very small numbers.

According to Oracle Documentation:
Scale can be greater than precision, most commonly when e notation is used (where the fractional part can have many digits). When scale is greater than precision, the precision specifies the maximum number of significant digits to the right of the decimal point. For example, a column defined as NUMBER(4,5) requires a zero for the first digit after the decimal point and rounds all values past the fifth digit after the decimal point.
Here's how I see it:
When Precision is greater than Scale (e.g. NUMBER(8,5)), there is no problem; this is straightforward. Precision means the number will have a total of 8 digits, 5 of which are to the right of the decimal point, so the integer part will have 3 digits. This is easy.
When Precision is smaller than Scale (e.g. NUMBER(2,5)), this means three things:
The number will not have any integer part, only a fractional part. The 0 in the integer part is not counted in the calculations; you write .12345, not 0.12345. In fact, if you supply even one non-zero digit in the integer part, you will always get an error.
The Scale represents the total number of digits the fractional part will have, 5 in this case. So it can be .12345 or .00098, but no more than 5 digits in total.
The fractional part is divided into two parts: significant digits and leading zeros. The number of significant digits is given by Precision, and the minimum number of leading zeros equals (Scale - Precision). Example:
For NUMBER(2,5), the number must have at least 3 zeros at the start of the fractional part, followed by 2 significant digits (which could contain a zero as well). So 3 zeros + 2 significant digits = 5, which is the Scale.
In brief, when you see for example NUMBER(6,9), this tells us that the fractional part will have 9 digits in total, starting with an obligatory 3 zeros and followed by up to 6 significant digits.
Here are some examples :
SELECT CAST(.0000123 AS NUMBER(6,9)) FROM dual; -- prints: 0.0000123; .000|012300
SELECT CAST(.000012345 AS NUMBER(6,9)) FROM dual; -- prints: 0.0000123; .000|012345
SELECT CAST(.123456 AS NUMBER(3,4)) FROM dual; -- ERROR! must have at least 1 leading zero (4-3=1)
SELECT CAST(.013579 AS NUMBER(3,4)) FROM dual; -- prints: 0.0136; max 4 digits, .013579 rounded to .0136

Thanks to everyone for the answers. It looks like the precision is the number of significant digits.
select cast(0.000123 as number(2,5)) from dual
results in:
.00012
Where
select cast(0.00123 as number(2,5)) from dual
and
select cast(0.000999 as number(2,5)) from dual
both result in:
ORA-01438: value larger than specified precision allowed for this column
the second one due to rounding: 0.000999 rounds to 0.00100 at scale 5, which no longer fits in 2 digits of precision.

According to Oracle Documentation:
Scale can be greater than precision, most commonly when e notation is used. When scale is greater than precision, the precision specifies the maximum number of significant digits to the right of the decimal point. For example, a column defined as NUMBER(4,5) requires a zero for the first digit after the decimal point and rounds all values past the fifth digit after the decimal point.
It is good practice to specify the scale and precision of a fixed-point number column for extra integrity checking on input. Specifying scale and precision does not force all values to a fixed length. If a value exceeds the precision, then Oracle returns an error. If a value exceeds the scale, then Oracle rounds it.

The case where Scale is larger than Precision could be summarized this way:
Number of digits on the right of decimal point = Scale
Minimum number of zeroes right of decimal = Scale - Precision
--this will work
select cast(0.123456 as number(5,5)) from dual;
returns 0.12346
-- but this
select cast(0.123456 as number(2,5)) from dual;
--will return "ORA-1438 value too large".
--It will not return err with at least 5-2 = 3 zeroes:
select cast(0.000123456 as number(2,5)) from dual;
returns 0.00012
-- and of course this will work too
select cast(0.0000123456 as number(2,5)) from dual;
returning 0.00001
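The same rule can be sketched outside the database. Below is a hedged Go approximation (fitsNumber is a made-up helper, and float64 is only a stand-in for Oracle's exact decimal arithmetic): a value fits NUMBER(p,s) when, after rounding to s decimal places, it is still below 10^(p-s).

package main

import (
	"fmt"
	"math"
)

// fitsNumber is a rough float64 sketch of the rule above (not Oracle's exact
// decimal arithmetic): round the value to s decimal places, then accept it
// only if |rounded| < 10^(p-s), i.e. if it still fits p significant digits.
func fitsNumber(x float64, p, s int) (rounded float64, ok bool) {
	scale := math.Pow(10, float64(s))
	rounded = math.Round(x*scale) / scale
	return rounded, math.Abs(rounded) < math.Pow(10, float64(p-s))
}

func main() {
	for _, x := range []float64{0.123456, 0.000123456, 0.0000123456, 0.000999} {
		r, ok := fitsNumber(x, 2, 5)
		fmt.Printf("%-12g -> NUMBER(2,5): rounded=%.5f fits=%v\n", x, r, ok)
	}
}

It reproduces the accept/reject pattern of the CAST examples above, including the 0.000999 case that fails only because of rounding.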

Hmm, as I understand the reference, the precision is the count of digits.
maximum precision of 126 binary digits, which is roughly equivalent to 38 decimal digits
In Oracle you have the type NUMBER(precision,scale), where precision is the total number of digits and scale is the number of digits right of the decimal point. Scale can be omitted, in which case it means zero. Precision can be unspecified (e.g. NUMBER(*,10)) - this means the total number of digits is as needed, but there are 10 digits right of the decimal point.
If the scale is less than zero, the value will be rounded to scale digits left of the decimal point.
I think that if you reserve more digits right of the decimal point than the precision allows in the whole number, you get something like 0.00000000123456, but I am not 100% sure.

Related

IEEE 754: rationale for format: subnormal and normal numbers

Can someone please clarify:
Why exactly is the format of subnormal numbers ±(0.F) × 2^-126 and not ±(1.F) × 2^-127?
Why exactly is the format of normal numbers ±(1.F) × 2^exp and not, say, ±(11.F) × 2^exp, or, say, ±(10.F) × 2^exp?
A floating-point format represents numbers using a sign (− or +), an exponent (an integer in some range emin to emax, inclusive), and a significand that is a numeral of p digits in base b, where b is the fixed base for the format and p is called the precision. We will consider a binary format, in which b is two.
Let the digits of the significand be f_0, f_{−1}, f_{−2}, …, f_{1−p}, so the significand is Σ_{−p<i≤0} f_i·b^i, and the value represented is (−1)^s · b^e · Σ_{−p<i≤0} f_i·b^i, where s is a bit for the sign and e is the exponent.
If f_0 is zero, we can omit it from the sum, and the value represented equals (−1)^s · b^e · Σ_{−p<i≤−1} f_i·b^i = (−1)^s · b^{e−1} · Σ_{−p<i≤−1} f_i·b^{i+1} = (−1)^s · b^{e−1} · Σ_{1−p<i≤0} f_{i−1}·b^i. Therefore, when f_0 is zero and e is not emin, there are two representations of the number. Encoding both of them would be wasteful, so we desire an encoding scheme that does not encode both representations.
We accomplish this:
Some value E encodes the exponent e. The values of s and f_{−1} to f_{1−p} are stored directly as bits.
If E is zero, e is emin and f_0 is zero.
If E is not zero, e is E−bias and f_0 is one, where bias is 1−emin.
(A special value of E may be reserved to represent infinities and NaNs, not discussed further here.)
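To make this encoding concrete, here is a small Go sketch (my own illustration, not from the answer) that splits a float32 into its stored fields and reports whether the implicit leading significand bit is 1 (normal, E ≠ 0) or 0 (subnormal, E = 0):

package main

import (
	"fmt"
	"math"
)

// decode splits a float32 into its stored fields: sign bit, biased exponent
// E (8 bits) and fraction F (23 bits). Per the scheme above, E == 0 means
// e = emin = -126 with implicit leading bit 0 (subnormal or zero), any other
// E (except all-ones) means e = E - 127 with implicit leading bit 1 (normal).
func decode(x float32) {
	b := math.Float32bits(x)
	sign := b >> 31
	E := (b >> 23) & 0xFF
	F := b & 0x7FFFFF
	switch {
	case E == 0:
		fmt.Printf("%g: sign=%d subnormal/zero: (0.%023b) x 2^-126\n", x, sign, F)
	case E == 0xFF:
		fmt.Printf("%g: sign=%d Inf/NaN encoding\n", x, sign)
	default:
		fmt.Printf("%g: sign=%d normal: (1.%023b) x 2^%d\n", x, sign, F, int(E)-127)
	}
}

func main() {
	decode(1.5)                         // normal: (1.100...) x 2^0
	decode(math.SmallestNonzeroFloat32) // smallest subnormal
	decode(0)
}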
This representation and this encoding scheme answer the questions:
Why exactly is the format of subnormal numbers ±(0.F) × 2^-126 and not ±(1.F) × 2^-127?
Subnormal numbers of the form ±(1.F) × 2^-127 would fail to include zero and would include numbers not in the represented numbers of the format, as they would have numbers with non-zero digits below that of the lowest non-zero digit in the chosen set. (The lowest digit of the form described in the first paragraph corresponds to b^{emin+(1−p)}, whereas numbers in the form ±(1.F) × 2^-127 would have a lowest digit corresponding to b^{emin−1+(1−p)}.)
Why exactly is the format of normal numbers ±(1.F) × 2^exp and not, say, ±(11.F) × 2^exp, or, say, ±(10.F) × 2^exp?
Where the decimal point (or “radix point”) lies in the significand is irrelevant, as long as it is fixed. A representation described using the decimal point just after the first digit, as used herein, is equivalent to a representation using decimal point after the last digit or at any other position, with a suitable adjustment to the exponent bounds: The same set of numbers is represented and the arithmetic properties are identical. So, in considering the difference between 1.F and 11.F, we do not care where the decimal point lies. However, we do care about how many digits are represented. A floating-point format uses a representation with a fixed number of digits. 11.F has one more digit than 1.F, and we have no reason to encode that.
As for the difference between 11.F and 10.F, the reason the normal/subnormal distinction exists is because arithmetically there are two representations of the same number if the first digit is zero and the exponent is not at the minimum. Specifying one form as normal form allows us to eliminate these duplicate representations. However 11.F and 10.F represent different numbers, so there is no duplicate to eliminate and no reason to say one of these is normal and the other is not.
I checked the properties of both formats using a simplified example. For the sake of simplicity I use the formats 0.F × 10^-2 and 1.F × 10^-3, where F has 2 decimal digits and there is no ±.
Min (non-zero) / max values:
Format         Min value (non-zero)      Max value
0.F × 10^-2    0.01 × 10^-2 = 0.0001     0.99 × 10^-2 = 0.0099
1.F × 10^-3    1.00 × 10^-3 = 0.001      9.99 × 10^-3 = 0.00999
Comparing the representable values of the two formats: starting from 0.001, the format 1.F × 10^-3 cannot represent any smaller values, while the format 0.F × 10^-2 still can.
Conclusion: compared with format 1.F × 10^-3, the format 0.F × 10^-2:
gives more dynamic range: log10(max_real / min_real): 1.99 vs 0.99
gives less precision: fewer values can be represented: 100 vs 900
It seems that for subnormals IEEE 754 preferred more dynamic range at the cost of precision. That is why the format of subnormal numbers is ±(0.F) × 2^-126 and not ±(1.F) × 2^-127.
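The toy comparison can be recomputed directly; here is a short Go sketch using the values from the table above (same two-digit F assumption):

package main

import (
	"fmt"
	"math"
)

// Recompute the toy comparison: format 0.F x 10^-2 vs 1.F x 10^-3,
// with F having two decimal digits (min/max taken from the table above).
func main() {
	min1, max1, count1 := 0.0001, 0.0099, 100 // 0.F x 10^-2: F = 00..99
	min2, max2, count2 := 0.001, 0.00999, 900 // 1.F x 10^-3: 1.00..9.99
	fmt.Printf("0.F x 10^-2: %d values, log10(max/min) = %.4f\n", count1, math.Log10(max1/min1)) // ~1.99
	fmt.Printf("1.F x 10^-3: %d values, log10(max/min) = %.4f\n", count2, math.Log10(max2/min2)) // ~0.99
}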

How to compute random bfloat number in Maxima CAS

The Maxima CAS random function takes a floating point number as input and gives a floating point number as output.
I need a floating point number with more digits, so I use bfloat with increased precision.
I have tried:
random(1.0b0)
bfloat(random(1.0));
The best result was:
bfloat(%pi)/6.000000000000000000000000000000000000000000b0
5.235987755982988730771072305465838140328615665625176368291574320513027343810348331046724708903528447b-1
but it is not random.
One way to generate a random bigfloat is to generate an integer with the appropriate number of bits and then rescale it to get a number in the range 0 to 1.
Note that random(n) returns an integer in the range 0 to n - 1 when n is an integer, therefore: bfloat(random(10^fpprec) / 10^fpprec).
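The same rescaling idea, transplanted to Go's math/big for comparison (this is an analogue of the Maxima one-liner, not Maxima code; randomBigFloat is a made-up helper name):

package main

import (
	"crypto/rand"
	"fmt"
	"math/big"
)

// randomBigFloat mirrors the rescaling idea above: draw a uniform integer
// below 10^digits and divide by 10^digits, giving a high-precision value
// in [0, 1).
func randomBigFloat(digits int) (*big.Float, error) {
	limit := new(big.Int).Exp(big.NewInt(10), big.NewInt(int64(digits)), nil)
	n, err := rand.Int(rand.Reader, limit) // uniform in [0, 10^digits)
	if err != nil {
		return nil, err
	}
	prec := uint(float64(digits)*3.33) + 1 // ~3.33 bits per decimal digit
	num := new(big.Float).SetPrec(prec).SetInt(n)
	den := new(big.Float).SetPrec(prec).SetInt(limit)
	return num.Quo(num, den), nil
}

func main() {
	x, err := randomBigFloat(100) // roughly analogous to fpprec: 100 in Maxima
	if err != nil {
		panic(err)
	}
	fmt.Println(x.Text('e', 50))
}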

fmt.Printf with width and precision fields in %g behaves unexpectedly

I am trying to get some floats formatted with the same width using fmt.Printf().
For example, given the float values 0.0606060606060606, 0.3333333333333333, 0.05, 0.4 and 0.1818181818181818, I would like to get each value formatted in, say, 10 runes:
0.06060606
0.33333333
0.05
0.4
0.18181818
But I can't understand how it's done. Documentation says that
For floating-point values, width sets the minimum width of the field
and precision sets the number of places after the decimal, if
appropriate, except that for %g/%G it sets the total number of digits.
For example, given 123.45 the format %6.2f prints 123.45 while %.4g
prints 123.5. The default precision for %e and %f is 6; for %g it is
the smallest number of digits necessary to identify the value
uniquely.
So, if I use %f, a larger number will not fit in the 10-character constraint, therefore %g is required. To get a minimum width of 10 I use %10g, and to get a maximum of 9 digits (+1 for the dot) I use %.9g, but combining them as %10.9g does not behave as I expect:
0.0606060606
0.333333333
0.05
0.4
0.181818182
How come I get some strings of 10 runes, others of 11 runes and others of 12 runes?
In particular, it seems that %.9g does not produce 9 digits in total. See for example: http://play.golang.org/p/ie9k8bYC7r
Firstly, we need to understand the documentation correctly:
width sets the minimum width of the field and precision sets the number of places after the decimal, if appropriate, except that for %g/%G it sets the total number of digits.
This line is grammatically correct, but the "it" in the last part of the sentence is confusing: it actually refers to the precision, not the width.
Therefore, let's look at some examples:
123.45
12312.2
1.6069
0.6069
0.0006069
If you print them with fmt.Printf("%.4g"), it gives you
123.5
1.231e+04
1.607
0.6069
0.0006069
Only 4 significant digits, excluding the decimal point and exponent. But wait, what happens with the last two examples? Aren't those more than 4 digits?
This is the confusing part of the printing: leading 0s are not counted as significant digits, and the value is not shrunk to exponent notation when there are fewer than 4 leading zeros.
Let's look at 0 behavior using the example below:
package main
import "fmt"
func main() {
fmt.Printf("%.4g\n", 0.12345)
fmt.Printf("%.4g\n", 0.012345)
fmt.Printf("%.4g\n", 0.0012345)
fmt.Printf("%.4g\n", 0.00012345)
fmt.Printf("%.4g\n", 0.000012345)
fmt.Printf("%.4g\n", 0.0000012345)
fmt.Printf("%.4g\n", 0.00000012345)
fmt.Printf("%g\n", 0.12345)
fmt.Printf("%g\n", 0.012345)
fmt.Printf("%g\n", 0.0012345)
fmt.Printf("%g\n", 0.00012345)
fmt.Printf("%g\n", 0.000012345)
fmt.Printf("%g\n", 0.0000012345)
fmt.Printf("%g\n", 0.00000012345)
}
and the output:
0.1235
0.01235
0.001234
0.0001234
1.234e-05
1.234e-06
1.235e-07
0.12345
0.012345
0.0012345
0.00012345
1.2345e-05
1.2345e-06
1.2345e-07
So you can see: when there are fewer than 4 leading 0s, they are kept as-is, and the value is shrunk to exponent notation when there are more than that.
Ok, the next thing is the width. From the documentation, width only specifies the minimum width, including the decimal point and exponent. This means that if you have more digits than the width allows, the output will shoot past the width.
Remember, width is taken into account as the last step, which means the precision field has to be satisfied first.
Let's go back to your case. You specified %10.9g, which means you want a total of 9 significant digits, excluding leading 0s, and a minimum width of 10 including the decimal point and exponent, and the precision takes priority.
0.0606060606060606: taking 9 significant digits without the leading 0 gives you 0.0606060606; since that is already 12 wide, it exceeds the minimum width of 10;
0.3333333333333333: taking 9 significant digits gives you 0.333333333; since that is already 11 wide, it exceeds the minimum width of 10;
0.05: taking 9 significant digits gives you 0.05; since that is less than width 10, it is padded with another 6 characters to reach a width of 10;
0.4: same as above;
0.1818181818181818: taking 9 significant digits gives you 0.181818182 with rounding; since that is already 11 wide, it exceeds the minimum width of 10.
So this explains why you got the funny printing.
Yes, I agree: it gives precedence to the precision field, not to the width.
So when we need fixed columns for printing, we need to write our own formatting function.
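One way to write such a function, as a hedged Go sketch (fitG is a hypothetical helper, not part of fmt): keep lowering the %g precision until the result fits the column, then pad.

package main

import (
	"fmt"
	"strconv"
)

// fitG lowers the number of significant digits until the %g-style result
// fits in width bytes, then right-aligns it. Assumes ASCII output, so
// bytes == runes.
func fitG(v float64, width int) string {
	for prec := width - 1; prec > 0; prec-- {
		s := strconv.FormatFloat(v, 'g', prec, 64)
		if len(s) <= width {
			return fmt.Sprintf("%*s", width, s) // pad on the left to width
		}
	}
	return strconv.FormatFloat(v, 'g', 1, 64)
}

func main() {
	values := []float64{0.0606060606060606, 0.3333333333333333, 0.05, 0.4, 0.1818181818181818}
	for _, v := range values {
		fmt.Println(fitG(v, 10))
	}
}

For the five values in the question this prints 0.06060606, 0.33333333, 0.05, 0.4 and 0.18181818, each right-aligned in 10 characters.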

How does ORACLE DB sum NUMBER(*,s) with many records?

I am wondering how Oracle sums NUMBER(9,2) values with SUM(numWithScale/7).
This is because I am wondering how the error will propagate with a large number of records.
Let's say I have a table EMP_SAL with columns EMP_ID and numWithScale, numWithScale being a salary.
To make it simple, let us make the numWithScale column NUMBER(9,2): 9 digits of precision with 2 decimal places to round to. All of the numbers in the table are random values from 10.00 to 20.00 (e.g. 10.12, 20.00, 19.95).
I divide by 7 in my calculation to produce digits at the end that round up or down.
Now, I sum all of the employees' salaries with SUM(numWithScale/7).
Will the sum round each time it adds a record, or does Oracle round after the calculation is complete? The error from rounding can be +/-0.01, and with many additions followed by roundings the error adds up. Or does it round at the end, so that I don't have to worry about the error accumulating (unless I use the result in many more calculations)?
Also, will Oracle return the sum as the more precise NUMBER (38-digit precision, floating point), or will it round to the second decimal, NUMBER(9,2), when returning the value?
Will MSSQL behave pretty much the same way (even though the syntax is different)?
Oracle performs operations in the order you specify.
So, if you write this query:
select SUM(numWithScale/7) from some_table -- (1)
each of the values is divided by 7 and rounded to the maximum available precision: a NUMBER with 38 significant digits. After that, all the quotients are summed.
In case of this query:
select SUM(numWithScale)/7 from some_table -- (2)
all numWithScale values are summed and only after that divided by 7. In this case there is no precision loss for each record; only the result of dividing sum() by 7 is rounded to 38 significant digits.
This problem is common in calculation algorithms. Each time you divide a value by 7 you produce a small calculation error because of the limited number of digits representing the number:
numWithScale/7 => quotient + delta.
While summing these values you get
sum(quotient) + sum(delta).
If numWithScale followed an ideal uniform distribution and some_table contained an infinite number of records, then sum(delta) would tend to zero. But that happens only in theory. In practical cases sum(delta) grows and introduces a significant error. This is the case for query (1).
On the other hand, summing can't introduce a rounding error if implemented properly. So for query (2) a rounding error is introduced only in the last step, when the whole sum is divided by 7. Therefore the delta for this query is not affected by the number of records.
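Here is a rough illustration of that accumulation in Go (float64 with an exaggerated per-row rounding step standing in for Oracle's 38-digit rounding, so the effect is visible; this is not Oracle semantics):

package main

import (
	"fmt"
	"math"
)

// round2 stands in for the per-row rounding of numWithScale/7 and is
// deliberately coarse (2 decimals instead of 38 significant digits) so the
// accumulated drift is visible. The fixed salary makes the per-row delta
// systematic; with random salaries part of it would cancel, as noted above.
func round2(x float64) float64 { return math.Round(x*100) / 100 }

func main() {
	const rows = 1000000
	const salary = 15.37 // a NUMBER(9,2)-style value repeated on every row

	sumOfQuotients := 0.0 // like SUM(numWithScale/7) with per-row rounding
	total := 0.0
	for i := 0; i < rows; i++ {
		sumOfQuotients += round2(salary / 7)
		total += salary
	}
	quotientOfSum := total / 7 // like SUM(numWithScale)/7: one division at the end

	fmt.Printf("sum of rounded quotients: %.2f\n", sumOfQuotients)
	fmt.Printf("quotient of sum:          %.2f\n", quotientOfSum)
	fmt.Printf("accumulated drift:        %.2f\n", sumOfQuotients-quotientOfSum)
}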
Number scale and precision are only relevant as column or variable constraints.
When you attempt to store a number that exceeds defined precision it will raise an exception:
create table num (a number(5,2));
insert into num values (123456.789);
=> ORA-01438: value larger than specified precision allowed for this column
When you attempt to store a number that exceeds defined scale it will be rounded:
insert into num values (123.456789);
select a from num;
=> 123.46
Precision and scale do not matter when you read data and perform any calculations on it...
select 100000 + a / 100 from num;
=> 100001.2346
...unless you want to store it back into column with constraints, so above rules apply:
update num set a = a / 100;
select a from num;
=> 1.23
numWithScale/7 will be converted to NUMBER (i.e. it will not be rounded to number(9,2)).

Generating strongly biased random numbers for tests

I want to run tests with randomized inputs and need to generate 'sensible' random
numbers, that is, numbers that match good enough to pass the tested function's
preconditions, but hopefully wreak havoc deeper inside its code.
math.random() (I'm using Lua) produces uniformly distributed random
numbers. Scaling these up will give far more big numbers than small numbers,
and there will be very few integers.
I would like to skew the random numbers (or generate new ones using the old
function as a randomness source) in a way that strongly favors 'simple' numbers,
but will still cover the whole range, i.e., extending up to positive/negative infinity
(or ±1e309 for double). This means:
numbers up to, say, ten should be most common,
integers should be more common than fractions,
numbers ending in 0.5 should be the most common fractions,
followed by 0.25 and 0.75; then 0.125,
and so on.
A different description: Fix a base probability x such that probabilities
will sum to one and define the probability of a number n as x^k
where k is the generation in which n is constructed as a surreal
number[1]. That assigns x to 0, x^2 to -1 and +1,
x^3 to -2, -1/2, +1/2 and +2, and so on. This
gives a nice description of something close to what I want (it skews a bit too
much), but is near-unusable for computing random numbers. The resulting
distribution is nowhere continuous (it's fractal!), I'm not sure how to
determine the base probability x (I think for infinite precision it would be
zero), and computing numbers based on this by iteration is awfully
slow (spending near-infinite time to construct large numbers).
Does anyone know of a simple approximation that, given a uniformly distributed
randomness source, produces random numbers very roughly distributed as
described above?
I would like to run thousands of randomized tests, quantity/speed is more
important than quality. Still, better numbers mean less inputs get rejected.
Lua has a JIT, so performance is usually not much of an issue. However, jumps based
on randomness will break every prediction, and many calls to math.random()
will be slow, too. This means a closed formula will be better than an
iterative or recursive one.
[1] Wikipedia has an article on surreal numbers, with
a nice picture. A surreal number is a pair of two surreal
numbers, i.e. x := {n|m}, and its value is the number in the middle of the
pair, i.e. (for finite numbers) {n|m} = (n+m)/2 (as rational). If one side
of the pair is empty, that's interpreted as increment (or decrement, if right
is empty) by one. If both sides are empty, that's zero. Initially, there are
no numbers, so the only number one can build is 0 := { | }. In generation
two one can build numbers {0| } =: 1 and { |0} =: -1, in three we get
{1| } =: 2, {|1} =: -2, {0|1} =: 1/2 and {-1|0} =: -1/2 (plus some
more complex representations of known numbers, e.g. {-1|1} = 0). Note that
e.g. 1/3 is never generated by finite numbers because it is an infinite
fraction – the same goes for floats, 1/3 is never represented exactly.
How's this for an algorithm?
Generate a random float in (0, 1) with a library function
Generate a random integral roundoff point according to a desired probability density function (e.g. 0 with probability 0.5, 1 with probability 0.25, 2 with probability 0.125, ...).
'Round' the float by that roundoff point (e.g. floor(float_val * 2^roundoff + 0.5))
Generate a random integral exponent according to another PDF (e.g. 0, 1, 2, 3 with probability 0.1 each, and decreasing thereafter)
Multiply the rounded float by 2^exponent.
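A Go sketch of one reading of that algorithm (the choice of PDFs, the added random sign, and interpreting the rounding step as keeping roundoff binary places of the float are my assumptions where the outline leaves room):

package main

import (
	"fmt"
	"math"
	"math/rand"
)

// geometric returns k with probability 1/2^(k+1): 0 half the time,
// 1 a quarter of the time, and so on.
func geometric(r *rand.Rand) int {
	k := 0
	for r.Intn(2) == 1 {
		k++
	}
	return k
}

// simpleRandom follows the outline above under the stated assumptions.
func simpleRandom(r *rand.Rand) float64 {
	f := r.Float64()         // step 1: uniform in [0, 1)
	roundoff := geometric(r) // step 2: 0 most of the time
	scale := math.Pow(2, float64(roundoff))
	q := math.Floor(f*scale+0.5) / scale // step 3: keep roundoff binary places
	exp := geometric(r) + geometric(r)   // step 4: small exponents favoured
	v := q * math.Pow(2, float64(exp))   // step 5: scale by a power of two
	if r.Intn(2) == 0 {
		v = -v // random sign, not in the original outline
	}
	return v
}

func main() {
	r := rand.New(rand.NewSource(42))
	for i := 0; i < 10; i++ {
		fmt.Println(simpleRandom(r))
	}
}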
For a surreal-like decimal expansion, you need a random binary number.
Even bits tell you whether to stop or continue, odd bits tell you whether to go right or left on the tree:
> 0... => 0.0 [50%] Stop
> 100... => -0.5 [<12.5%] Go, Left, Stop
> 110... => 0.5 [<12.5%] Go, Right, Stop
> 11100... => 0.25 [<3.125%] Go, Right, Go, Left, Stop
> 11110... => 0.75 [<3.125%] Go, Right, Go, Right, Stop
> 1110100... => 0.125
> 1110110... => 0.375
> 1111100... => 0.625
> 1111110... => 0.875
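Here is a small Go sketch of just this tree walk (surrealFromBits is a made-up name; generating the random bits is what the next paragraph covers):

package main

import "fmt"

// surrealFromBits walks the tree described above: bits are read in pairs,
// where the first bit of each pair means continue (1) or stop (0) and the
// second means go right (1) or left (0). The first step moves by 0.5 and
// every later step moves by half the previous amount.
func surrealFromBits(bits string) float64 {
	value, step := 0.0, 0.5
	for i := 0; i+1 < len(bits); i += 2 {
		if bits[i] == '0' { // stop bit
			break
		}
		if bits[i+1] == '1' {
			value += step // go right
		} else {
			value -= step // go left
		}
		step /= 2
	}
	return value
}

func main() {
	for _, b := range []string{"0", "100", "110", "11100", "1110100", "1111110"} {
		fmt.Println(b, "=>", surrealFromBits(b)) // matches the table above
	}
}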
One way to quickly generate a random binary number is by looking at the decimal digits of math.random() and replacing 0-4 with '0' and 5-9 with '1':
0.8430419054348022
becomes
1000001010001011
which becomes -0.5
0.5513009827118367
becomes
1100001101001011
which becomes 0.5
etc
Haven't done much lua programming, but in Javascript you can do:
Math.random().toString().substring(2).split("").map(
function(digit) { return digit >= "5" ? 1 : 0 }
);
or true binary expansion:
Math.random().toString(2).substring(2)
Not sure which is more genuinely "random" -- you'll need to test it.
You could generate surreal numbers in this way, but most of the results will be decimals in the form a/2^b, with relatively few integers. On Day 3, only 2 integers are produced (-3 and 3) vs. 6 decimals, on Day 4 it is 2 vs. 14, and on Day n it is 2 vs (2^n-2).
If you add two uniform random numbers from math.random(), you get a new distribution which has a "triangle" like distribution (linearly decreasing from the center). Adding 3 or more will get a more 'bell curve' like distribution centered around 0:
math.random() + math.random() + math.random() - 1.5
Dividing by a random number will get a truly wild number:
A/(math.random()+1e-300)
This will return results between A and (theoretically) A*1e+300,
though my tests show that 50% of the time the results are between A and 2*A
and about 75% of the time between A and 4*A.
Putting them together, we get:
round(6*(math.random()+math.random()+math.random() - 1.5)/(math.random()+1e-300))
This has over 70% of the number returned between -9 and 9 with a few big numbers popping up rarely.
Note that the average and sum of this distribution will tend to diverge towards a large negative or positive number, because the more times you run it, the more likely it is for a small number in the denominator to cause the number to "blow up" to a large number such as 147,967 or -194,137.
See gist for sample code.
Josh
You can immediately calculate the nth born surreal number.
Example, the 1000th Surreal number is:
convert to binary:
1000 dec = 1111101000 bin
1's become pluses and 0's minuses:
1111101000
+++++-+---
The first bit contributes 0. Each subsequent bit in the leading run of identical bits adds +1 (for 1's) or -1 (for 0's); after that run, each remaining bit contributes 1/2, 1/4, 1/8, etc., added for a 1 and subtracted for a 0.
1 1 1 1 1 0 1 0 0 0
+ + + + + - + - - -
0 1 1 1 1 h h h h h
+0+1+1+1+1-1/2+1/4-1/8-1/16-1/32
= 3+17/32
= 113/32
= 3.53125
The binary length in bits of this representation is equal to the day on which that number was born.
Left and right numbers of a surreal number are the binary representation with its tail stripped back to the last 0 or 1 respectively.
Surreal numbers have an even distribution between -1 and 1 where half of the numbers created to a particular day will exist. 1/4 of the numbers exists evenly distributed between -2 to -1 and 1 to 2 and so on. The max range will be negative to positive integers matching the number of days you provide. The numbers go to infinity slowly because each day only adds one to the negative and positive ranges and days contain twice as many numbers as the last.
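For reference, the bit rule worked through above can be written as a short Go sketch (positive n only; nthSurreal is a made-up name, and the only check is that it reproduces the 1000 → 3.53125 example):

package main

import "fmt"

// nthSurreal mechanizes the rule above: the leading bit contributes 0,
// each following bit equal to the leading bit adds +1, and every bit
// after that run adds or subtracts 1/2, 1/4, 1/8, ... according to the bit.
func nthSurreal(n uint64) float64 {
	var bits []uint64
	for m := n; m > 0; m >>= 1 {
		bits = append([]uint64{m & 1}, bits...) // binary digits, MSB first
	}
	value := 0.0
	i := 1
	for ; i < len(bits) && bits[i] == bits[0]; i++ {
		value++ // the leading run: +1 per bit
	}
	step := 0.5
	for ; i < len(bits); i++ {
		if bits[i] == 1 {
			value += step
		} else {
			value -= step
		}
		step /= 2
	}
	return value
}

func main() {
	fmt.Println(nthSurreal(1000)) // 3.53125, matching the worked example
}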
Edit:
A good name for this bit representation is "sinary"
Negative numbers are transpositions. ex:
100010101001101s -> negative number (always starts 10...)
111101010110010s -> positive number (always starts 11...)
and we notice that all bits flip except the first one, which is a transposition.
NaN is => 0s (since all other numbers start with 1), which makes it ideal for representation in bit registers in a computer since leading zeros are required (we don't make ternary computers anymore... too bad)
All Conway surreal algebra can be done on these numbers without needing to convert to binary or decimal.
The sinary format can be seen as a one plus a simple one's counter with a 2's complement decimal representation attached.
Here is an incomplete report on finary (similar to sinary): https://github.com/peawormsworth/tools/blob/master/finary/Fine%20binary.ipynb
