How to get a random integer in BigQuery?

I want to get a random integer between 0 and 9 in BigQuery. I tried the classic
SELECT CAST(10*RAND() AS INT64)
but it's producing numbers between 0 and 10
Adding this question as the results might surprise programmers used to CAST doing a TRUNC in most other languages.
Note the weird distribution of results: with the CAST approach, 0 and 10 each show up about half as often as the digits 1 through 9, since CAST rounds and only values in [0, 0.5) map to 0 and only [9.5, 10) map to 10.

Update 2019:
Now you can just do this:
SELECT fhoffa.x.random_int(0,10)
(blog post about persisted UDFs)
To get random integers between 0 and n (9 in this case) you need to FLOOR before CAST:
SELECT CAST(FLOOR(10*RAND()) AS INT64)
This is because the SQL standard doesn't specify whether a CAST to integer should TRUNC or ROUND the float being cast. BigQuery's standard SQL implementation chooses to ROUND, so the classic formula with a bare CAST won't work as intended. Make sure to FLOOR (or TRUNC) your random number first, and then CAST (to get an INT64 instead of a FLOAT64).
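For instance, a quick check of the two behaviors (expected results shown in comments):
SELECT
  CAST(9.7 AS INT64)        AS cast_rounds,       -- 10
  CAST(FLOOR(9.7) AS INT64) AS floor_then_cast    -- 9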
From the SQL standard:
Whenever an exact or approximate numeric value is assigned to an
exact numeric value site, an approximation of its value that
preserves leading significant digits after rounding or truncating is
represented in the declared type of the target. The value is
converted to have the precision and scale of the target. The choice
of whether to truncate or round is implementation-defined.
https://github.com/twitter/mysql/blob/master/strings/decimal.c#L42

Another option would be
SELECT MOD(CAST(10*RAND() AS INT64), 10)
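If you need this pattern in several queries but don't want to depend on a shared persistent UDF like fhoffa.x.random_int, a temporary function can wrap the FLOOR-then-CAST formula; the function name and bounds below are just an illustration:
CREATE TEMP FUNCTION random_int(lo INT64, hi INT64) AS (
  CAST(FLOOR(lo + RAND() * (hi - lo + 1)) AS INT64)  -- uniform integer in [lo, hi]
);
SELECT random_int(0, 9) AS r;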

Related

Storage, computation, and unpacking multiple Boolean values in Oracle and PowerBI

In my work, I often need to translate the data I find in the database into Pass/Fail or True/False values. For example, we may want to know if a patient's blood pressure was taken during a particular visit, with the only possible values being "True" or "False". In fact, there are dozens of such fields. Additionally, we actually don't need to do all of these things at every visit. There is another set of matching Boolean values that indicate if something, such as a blood pressure, was required.
I have researched this problem before and came up with little. Oracle does not support a Boolean datatype. However, I recently learned that Oracle DOES support bit-wise operators (logical AND, OR, XOR, NOT, etc.) and learned that the best way to store the data and use these operators is to:
Use the RAW datatype to store a large count of bits. Each bit can correspond to a particular real-world concept such as "Blood Pressure Done" or "Height Measurement Done".
Use bit-wise operators in the Oracle package UTL_RAW to do calculations on multiple RAW values to derive results such as "what was required AND done".
I have not yet determined that I want to go all the way down this rabbit hole. But if I do, I have one challenge, as yet unsolved: how do I unpack the RAW values into individual truth values, elegantly? Oracle will display the values only in hexadecimal, which is not convenient when we want to see the actual bits. It would be nice if I could carry out this operation in SQL for testing purposes. It is also necessary that I do this in Power BI so the results are formatted for the customer's needs. I could write a function, if one does not exist yet.
In resolving this challenge, I wish to not increase the size of the solution considerably. I am dealing with millions of rows and wish to have the space-savings from using the RAW datatype (storing each value as a single bit) but also have an output layer of unpacked bits into the required dozens of True-False columns for the customer's needs in seeing the details.
I feel like this problem has been present for me since I began working on these kinds of business problems over ten years ago. Certainly, I am not the only analyst who wonders: Has it been solved yet?
Edit 4/28:
I completed the Oracle-side of the work. I implemented a package with the following header file (I don't think I can show all the code in the body without permission from my employer, but feel free to ask for a peek).
With the Oracle-side of the project wrapped up, I have yet to figure out how to unpack these RAW values (called 'Binary' in Power BI) into their individual bits within Power BI. I need a visualization that will carry out a "bits to columns" operation on the fly or something like that.
Also, it would be nice to have an aggregation over a column of RAWs based on a single bit position, so we can, for example, determine what percentage of the rows have a particular bit set to 1 without explicitly transforming all the data into columns, one column per bit (see the SQL sketch after the package spec below).
CREATE OR REPLACE PACKAGE bool AS

  byte_count   CONSTANT INTEGER := 8; --must be an integer <= 2000
  nibble_count CONSTANT INTEGER := byte_count * 2;

  --construct a bitstring with a single '1', or with all zeros (pass 0).
  FUNCTION raw_bitstring ( bit_to_set_to_one_in INTEGER ) RETURN RAW;

  --visualize the bits as zeros and ones in a VARCHAR2 field.
  FUNCTION binary_string ( input_in RAW ) RETURN VARCHAR2;

  --takes an input RAW, sets a single bit to zero or one,
  --and returns the altered RAW.
  FUNCTION set_bit ( raw_bitstring_in RAW,
                     bit_loc_in       INTEGER,
                     set_to_in        INTEGER ) RETURN RAW;

  --returns the value (0 or 1) of the indicated bit as an INTEGER.
  FUNCTION bit_to_integer ( raw_bitstring_in RAW, bit_loc_in INTEGER ) RETURN INTEGER;

  --counts all the 1's in a RAW and returns the count as an INTEGER.
  FUNCTION bit_sum ( raw_bitstring_in IN RAW ) RETURN INTEGER;

END bool;
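Building on the spec above, the "check it in SQL" and per-bit aggregation wishes can be sketched directly in queries; the table VISITS and its columns VISIT_ID and FLAGS are hypothetical stand-ins for wherever the RAW bitstrings end up stored:
--per-row truth value for, say, bit 3 ("Blood Pressure Done")
SELECT v.visit_id,
       bool.bit_to_integer(v.flags, 3) AS blood_pressure_done
  FROM visits v;

--share of rows with bit 3 set, without unpacking every bit into its own column
SELECT AVG(bool.bit_to_integer(v.flags, 3)) AS pct_bp_done
  FROM visits v;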

What is the purpose of arbitrary precision constants in Go?

Go features untyped exact numeric constants with arbitrary size and precision. The spec requires all compilers to support integers to at least 256 bits, and floats to at least 272 bits (256 bits for the mantissa and 16 bits for the exponent). So compilers are required to faithfully and exactly represent expressions like this:
const (
    PI       = 3.1415926535897932384626433832795028841971
    Prime256 = 84028154888444252871881479176271707868370175636848156449781508641811196133203
)
This is interesting...and yet I cannot find any way to actually use any such constant that exceeds the maximum precision of the 64 bit concrete types int64, uint64, float64, complex128 (which is just a pair of float64 values). Even the standard library big number types big.Int and big.Float cannot be initialized from large numeric constants -- they must instead be deserialized from string constants or other expressions.
The underlying mechanics are fairly obvious: the constants exist only at compile time and must be coerced to some value representable at runtime in order to be used at runtime. They are a language construct that exists only in code and during compilation. You cannot retrieve the raw value of a constant at runtime; it is not stored at some address in the compiled program itself.
So the question remains: Why does the language make such a point of supporting enormous constants when they cannot be used in practice?
TL;DR: Go's arbitrary-precision constants give you the ability to work with "real" numbers rather than "boxed" numbers, so artifacts like overflow, underflow, and infinity corner cases are avoided. You can work at higher precision, and only the result has to be converted to a limited-precision type, mitigating the effect of intermediate errors.
From The Go Blog: Constants (emphasis mine, addressing your question):
Numeric constants live in an arbitrary-precision numeric space; they are just regular numbers. But when they are assigned to a variable the value must be able to fit in the destination. We can declare a constant with a very large value:
const Huge = 1e1000
—that's just a number, after all—but we can't assign it or even print it. This statement won't even compile:
fmt.Println(Huge)
The error is, "constant 1.00000e+1000 overflows float64", which is true. But Huge might be useful: we can use it in expressions with other constants and use the value of those expressions if the result can be represented in the range of a float64. The statement,
fmt.Println(Huge / 1e999)
prints 10, as one would expect.
In a related way, floating-point constants may have very high precision, so that arithmetic involving them is more accurate. The constants defined in the math package are given with many more digits than are available in a float64. Here is the definition of math.Pi:
Pi = 3.14159265358979323846264338327950288419716939937510582097494459
When that value is assigned to a variable, some of the precision will be lost; the assignment will create the float64 (or float32) value closest to the high-precision value. This snippet
pi := math.Pi
fmt.Println(pi)
prints 3.141592653589793.
Having so many digits available means that calculations like Pi/2 or other more intricate evaluations can carry more precision until the result is assigned, making calculations involving constants easier to write without losing precision. It also means that there is no occasion in which the floating-point corner cases like infinities, soft underflows, and NaNs arise in constant expressions. (Division by a constant zero is a compile-time error, and when everything is a number there's no such thing as "not a number".)
See related: How does Go perform arithmetic on constants?

Convert a long character field to numeric, NOT scientific notation (SAS)

I need to join two tables. One table has householdid as CHAR(30), which appears to be center-aligned, and the other has householdid as a 20-digit numeric. I need to convert to the 20-digit numeric, but when I do that the value appears truncated, perhaps because of the strange alignment (not all of the 30 positions are actually needed).
When I try to keep the full 30 positions as a numeric I instead get a conversion to scientific notation, so of course this will not work as a key ID for later operations.
As long as the number is converted properly, it doesn't matter what format it has. A format just tells SAS how to show you the number. Behind the scenes, it is just a DOUBLE.
1.0 = 1 = 1e0
Now if you have converted to a number and cannot get a join, then look at the informat you used to read it in.
try
num_id = input(strip(char_id),best32.);
Strip removes leading and trailing blanks. The BEST32. INFORMAT tries its "best" to read the number up to 32 characters in length.
You cannot store a 20 digit number as a numeric in SAS. SAS stores all numbers as 8 byte floating point and so does not have enough bits to represent that many digits uniquely. You can ask SAS what is the largest integer it can represent exactly by using the CONSTANT() function.
data _null_;
  x = constant('EXACTINT', 8);
  put x= comma32.;
run;
x=9,007,199,254,740,992
Read and store your 20 and 30 digit strings as character variables.
Use the BESTD32. format. It tends to work out pretty well for long key variables. Depending on the length of the variable, you can change 32 to whatever width you need.
Based on the comments under the original question, the only thing you can do is convert all ID fields to strings and use the strings to do the joins. @Reeza suggested this in one of the comments, but it should have been posted as an answer.
I assume you are pulling this information out of another database/system that allows for greater numeric precision than SAS does. If you don't convert the values to strings when they are read into SAS, then you run the risk of losing precision.
If you lose precision, the ID in SAS is likely to become very slightly different to the ID in the original system, which can cause problems when searching the original system for an ID obtained from SAS.
Be sure you don't read the numbers into SAS as numeric and then convert them to strings. If you do it this way, you are still losing precision as soon as the numbers are stored in SAS as numeric variables.
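If the IDs are being pulled from a relational source, one option is to do the character conversion on the database side so SAS never sees the value as a number at all; a rough sketch (table and column names are hypothetical):
SELECT CAST(householdid AS VARCHAR(30)) AS householdid_char
  FROM households;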

How to convert fixed-point VHDL type back to float?

I am using the IEEE fixed-point package in VHDL.
It works well, but I am now facing a problem concerning the string representation of these values in a test bench: I would like to dump them to a text file.
I have found that it is indeed possible to write ufixed or sfixed directly using:
write(buf, to_string(x)); --where x is either sfixed or ufixed (and buf : line)
But then I get values like 11110001.10101 (for sfixed q8.5 representation).
So my question: how do I convert these fixed-point numbers back to reals (and then to strings)?
The variable needs to be split into two std_logic_vector parts: the integer part and the fractional part. The integer part can be converted to a string using the standard conversion, but for the fractional part the string conversion is a bit different.
For the integer part you use a loop: repeatedly divide by 10 and convert the modulo remainder into an ASCII character, building the string up from the lowest digit to the highest digit.
The fractional part also needs a loop, but there you multiply by 10, take the floor to isolate a digit and get the corresponding character, subtract that digit from the fraction, and repeat.
This is the concept; I worked it out in MATLAB to test it and am making a VHDL version that I will share soon. I was surprised not to find such a useful function anywhere. Of course the fixed-point format can vary, Q(N,M) where N and M can take all sorts of values, while for floating point it is standardized.

Oracle Floats vs Number

I'm seeing conflicting references in Oracles documentation. Is there any difference between how decimals are stored in a FLOAT and a NUMBER types in the database?
As I recall from C et al., a float has accuracy limitations that an int doesn't have. E.g., for floats, 0.1 (base 10) is approximated as 0.000110011001100110011001101 (base 2), which equals roughly 0.100000001490116119384765625 (base 10). However, for ints, 5 (base 10) is exactly 101 (base 2).
Which is why the following won't terminate as expected in C:
float i;
for (i = 0; i != 10; )
{
    i += 0.1;
}
However I see elsewhere in Oracle's documentation that FLOAT has been defined as a NUMBER. And as I understand it, Oracle's implementation of the NUMBER type does not run into the same problem as C's float.
So, what's the real story here? Has Oracle deviated from the norm of what I expect to happen with floats/FLOATs?
(I'm sure it's a bee-fart-in-a-hurricane of difference for what I'll be using them for, but I know I'm going to have questions if 0.1*10 comes out to 1.00000000000000001)
Oracle's BINARY_FLOAT stores the data internally using IEEE 754 floating-point representation, like C and many other languages do. When you fetch these values from the database and store them in an IEEE 754 data type in the host language, the value can be copied without transformation.
Oracle's FLOAT data type, by contrast, is a synonym for the ANSI SQL NUMERIC data type, called NUMBER in Oracle. This is an exact numeric, a scaled decimal data type that doesn't have the rounding behavior of IEEE 754. But if you fetch these values from the database and put them into a C or Java float, you can lose precision during that step.
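A quick way to see the difference from a SQL prompt (the d suffix marks a BINARY_DOUBLE literal; the exact display depends on your client):
SELECT 0.1d + 0.1d + 0.1d AS binary_double_sum,  --IEEE 754 arithmetic: not exactly 0.3
       0.1  + 0.1  + 0.1  AS number_sum          --NUMBER arithmetic: exactly 0.3
  FROM dual;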
The Oracle BINARY_FLOAT and BINARY_DOUBLE are mostly equivalent to the IEEE 754 standard but they are definitely not stored internally in the standard IEEE 754 representation.
For example, a BINARY_DOUBLE takes 9 bytes of storage vs. IEEE's 8. Also the double floating number -3.0 is represented as 3F-F7-FF-FF-FF-FF-FF-FF which if you use real IEEE would be C0-08-00-00-00-00-00-00. Notice that bit 63 is 0 in the Oracle representation while it is 1 in the IEEE one (if 's' is the sign bit, according to IEEE, the sign of the number is (-1)^s). See the very good IEEE 754 calculators at http://babbage.cs.qc.cuny.edu/IEEE-754/
You can easily verify this if you have a BINARY_DOUBLE column BD in table T with the query:
select BD,DUMP(BD) from T
Now all of that is fine and interesting (maybe) but when one works in C and gets a numeric value from Oracle (by binding a variable to a numeric column of any kind), one typically gets the result in a real IEEE double as is supported by C. Now this value is subject to all of the usual IEEE inaccuracies.
If one wants to do precise arithmetic one can either do it in PL/SQL or using special precise-arithmetic C libraries.
For Oracle's own explanation of their numeric data types see: http://download.oracle.com/docs/cd/B19306_01/server.102/b14220/datatype.htm#i16209
Oracle's NUMBER is in fact a decimal (base-10) floating-point representation.
FLOAT is just an alias for NUMBER and does the exact same thing.
If you want binary (base-2) floats, you need to use Oracle's BINARY_FLOAT or BINARY_DOUBLE datatypes.
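One rough way to convince yourself (the table name is mine): DUMP reports the internal type code, and a FLOAT column typically comes back as Typ=2, the same code as NUMBER, with identical bytes:
CREATE TABLE float_demo (f FLOAT(126), n NUMBER);
INSERT INTO float_demo VALUES (0.1, 0.1);
SELECT DUMP(f) AS float_storage, DUMP(n) AS number_storage FROM float_demo;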
Bill's answer about Oracle's FLOAT is only correct for later versions (say 11g); in Oracle 8i, the documentation says:
You can specify floating-point numbers with the form discussed in
"NUMBER Datatype". Oracle also supports the ANSI datatype FLOAT. You
can specify this datatype using one of these syntactic forms:
FLOAT specifies a floating-point number with decimal precision 38, or
binary precision 126. FLOAT(b) specifies a floating-point number with
binary precision b. The precision b can range from 1 to 126. To
convert from binary to decimal precision, multiply b by 0.30103. To
convert from decimal to binary precision, multiply the decimal
precision by 3.32193. The maximum of 126 digits of binary precision is
roughly equivalent to 38 digits of decimal precision.
That sounds like quadruple precision (126 bits of binary precision). If I am not mistaken, IEEE 754 only requires b = 2, p = 24 for single precision and p = 53 for double precision. The differences between 8i and 11g caused a lot of confusion when I was looking into a conversion plan between Oracle and PostgreSQL.
Like the PLS_INTEGER type mentioned previously, the BINARY_FLOAT and BINARY_DOUBLE types in Oracle 10g use machine arithmetic and require less storage space, both of which make them more efficient than the NUMBER type.
Only BINARY_FLOAT and BINARY_DOUBLE support NaN values; they are not for precise calculations.
