Hive is removing decimals after cast? - hadoop

In Hive I was performing some tests with operations and got a behaviour that I could not understand.
When doing the division directly, Hive returns the result below with 17 decimal places:
select 500/3260;
> 0.15337423312883436
But when I try casting the operands to a decimal type, the result is:
select cast(500 as decimal(38,18)) / cast(3260 as decimal(38,18));
> 0.153374
I would like to have 18 decimals, but only 6 are displayed.
Could you please explain why it gives this result?
Thank you in advance for your help.

I think it's due to the decimal division rules detailed here (the link is for SQL Server, but apparently the behaviour is the same in Hive).
When the computed result precision exceeds 38, it is reduced to 38 and the scale is reduced along with it, but never below 6. In this case both the integral part and the scale have to be reduced, and the resulting type is decimal(38,6): the result is rounded to 6 decimal places, or an overflow error is thrown if the integral part can't fit into 32 digits.
The SQL Server implementation is referenced in this Hive document (page 3).
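As a rough sketch of that rule (assuming Hive derives the division result type as scale = max(6, s1 + p2 + 1) and precision = (p1 - s1 + s2) + scale, then caps the precision at 38 while reducing the scale, but never below 6), declaring the operands with a smaller precision keeps the computed type under the cap, so more decimals survive:
-- decimal(38,18) / decimal(38,18): computed decimal(95,57), adjusted to decimal(38,6)
select cast(500 as decimal(38,18)) / cast(3260 as decimal(38,18));
> 0.153374
-- decimal(10,4) / decimal(10,4): computed decimal(25,15), no adjustment needed
select cast(500 as decimal(10,4)) / cast(3260 as decimal(10,4));
> 0.153374233128834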

Related

How to change the precision of a math function in oracle

How can I change the precision of the ln and power functions in Oracle? I'm getting very precise results - 40 digits. The problem is that I have a huge table, therefore the calculations take very long, and I don't need that kind of precision. Standard 7 or 16 digits would be fine and would probably speed up the computation. Note that I'm not asking about the round function, because it would only change the format of the result and would not influence the computation itself.
Edit
My real query is complicated, so to keep things simple, let us consider
select ln(2) from dual;
As a result, I'm getting
.6931471805599453094172321214581765680782
whereas I would like to get, e.g., .69314718, and not by rounding the final result .6931471805599453094172321214581765680782; I want to avoid computing those additional digits in the first place.
Just trunc the ln to avoid rounding.
select trunc(ln(2),7),ln(2) from dual;
Outputs:
0.6931471 0.693147180559945
It turned out that converting the arguments to binary_double is the perfect solution for my efficiency problem. For binary_double arguments, the power and ln functions produce binary_double results. Now both of my queries are evaluated in a couple of minutes instead of 1 hour 15 minutes and 40 minutes, respectively.
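For example (a hedged sketch of that approach), converting the arguments up front keeps the whole computation in 64-bit floating point instead of Oracle's full-precision NUMBER arithmetic:
-- LN and POWER return binary_double when given binary_double arguments
select ln(to_binary_double(2)), power(to_binary_double(2), to_binary_double(0.5)) from dual;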

Visual Foxpro 9 Odd behaviour with large numeric values

Can someone please explain this behaviour and suggest a way around it?
In the command window in VFP 9:
Test 1
a = 7003602346555440
? a
Displays correct value.
Test 2
a = 7003602346555438
? a
Still fine.
Test 3
a = 7003602346555439
? a
Displays incorrect value of 7003602346555440
Test 4
? a=7003602346555439
Returns .T. as you'd expect.
Test 5
? VAL("7003602346555439")
Displays incorrect value of 7003602346555440
Clearly something odd is going on with converting the numeric value into its textual representation for display, but can anyone suggest a way to avoid this and ensure I always get the correct text version of the number?
Source from this article
SUMMARY
Visual FoxPro is documented as having 16 digits of precision. This is an approximation: the actual maximum exactly representable number is 9007199254740992 (2^53).
MORE INFORMATION
Floating point numbers are stored in 8-byte or 64-bit representations. There are 12 bits of overhead, leaving 52 bits to store the number. There is one more implied bit that gives you 2^53 as the maximum. The maximum number that can be stored by Visual FoxPro is 2^1023. The highest power of two that is printed out exactly using the ? command with the default setting of SET DECIMALS TO 2 is 2^43.
The following code demonstrates this:
SET DECIMALS TO 2
? 2^43 && All digits displayed
? 2^44 && Scientific notation
SET DECIMALS TO 5
? 2^53 && Maximum exact number
? 2^53 - 1 && Correct result
? 2^53 + 1 && Incorrect result: rounded in floating point
? 2^1023 && Cannot display: *'s will be printed
? 2^1022 && Can display
Even though the documentation says that val() will round up after 16 digits, in practice it rounds at 16 digits and above. The example you are showing uses 16 digits, and that is what causes val() to round up.
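The same 2^53 ceiling applies to anything that stores numbers as IEEE-754 doubles, not just VFP. As a side illustration in SQL (BigQuery's FLOAT64 used here purely as an example of a double), casting just past that limit shows the same rounding:
-- 2^53 = 9007199254740992 is the point past which not every integer fits exactly in a double;
-- 2^53 + 1 gets rounded back down when stored as FLOAT64
SELECT CAST(9007199254740993 AS FLOAT64) AS just_past_2_53  -- comes back as 9007199254740992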

How to get a random integer in BigQuery?

I want to get a random integer between 0 and 9 in BigQuery. I tried the classic
SELECT CAST(10*RAND() AS INT64)
but it's producing numbers between 0 and 10
Adding this question as the results might surprise programmers used to CAST doing a TRUNC, as it does in most other languages.
Note the weird distribution of results: because CAST rounds, 0 and 10 each show up only about half as often as the digits 1 through 9.
Update 2019:
Now you can just do this:
SELECT fhoffa.x.random_int(0,10)
(blog post about persisted UDFs)
To get random integers between 0 and n (9 in this case) you need to FLOOR before CAST:
SELECT CAST(FLOOR(10*RAND()) AS INT64)
This is because the SQL standard doesn't specify whether a CAST to integer should TRUNC or ROUND the float being cast. BigQuery's standard SQL implementation chooses to ROUND, so the classic formula with a bare CAST won't work as intended. Make sure to FLOOR (or TRUNC) your random number first, and then CAST it (to get an INT64 instead of a FLOAT64).
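As a quick sanity check (a sketch; the sample size is arbitrary), you can tabulate each formula over many rows and compare the counts per value:
-- FLOOR + CAST gives each of 0..9 roughly equal counts;
-- swapping in the bare CAST version produces 0..10, with 0 and 10 underrepresented
SELECT CAST(FLOOR(10*RAND()) AS INT64) AS n, COUNT(*) AS c
FROM UNNEST(GENERATE_ARRAY(1, 100000))
GROUP BY n
ORDER BY n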
From the SQL standard:
Whenever an exact or approximate numeric value is assigned to an exact numeric value site, an approximation of its value that preserves leading significant digits after rounding or truncating is represented in the declared type of the target. The value is converted to have the precision and scale of the target. The choice of whether to truncate or round is implementation-defined.
https://github.com/twitter/mysql/blob/master/strings/decimal.c#L42
Another option would be
SELECT MOD(CAST(10*RAND() AS INT64), 10)

Hive float data type with unix timestamp

I am using a Hive external table with a float data type (by mistake) to store Unix epoch timestamps.
When I issue a where clause on the table, it returns some odd results. For example:
select event_time from tbl_name where event_time=1478649561 limit 10
Output :
1478649600
Could anyone help me understand what's wrong here?
This is not so much a Hive issue as it is a general issue of floating-point precision. You are seeing the effect of 1478649561 (an integral value) being approximated in floating-point representation to fit into the 32-bit Java float type.
To see this in a simpler form, outside the context of Hive, here is a sample Scala REPL session that converts 1478649561 to a Java Float and prints the result.
scala> printf("%f", new java.lang.Float(1478649561))
1478649600.000000
As you can see, this was approximated to 1478649600. There is no way to avoid this if the data is stored as a 32-bit floating point number.
I highly suggest storing the data in an integral type or the DATE type to avoid this problem. The Apache Hive documentation on Hive Data Types discusses the different data types in detail. If you're interested in digging into the full details of floating-point precision, the Wikipedia Floating point article is a decent starting point.
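To see the same effect without leaving Hive (a hedged sketch; recent Hive versions allow SELECT without a FROM clause), compare the value cast to FLOAT with the same value kept integral. The predicate in the question matches rows stored as 1478649600 because the stored FLOAT is already the approximated value, and the comparison is presumably carried out in the same approximate type:
-- FLOAT carries only about 7 significant decimal digits, so 1478649561 is approximated
select cast(1478649561 as float)  as as_float,   -- approximately 1.4786496E9 (i.e. 1478649600)
       cast(1478649561 as bigint) as as_bigint;  -- exactly 1478649561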

How to do high precision float point arithmetics in mathematica

In Mma, for example, I want to calculate
1.0492843824838929890231*0.2323432432432432^3
But it does not show the full precision. I tried N and various other functions, but none seemed to work. How can I achieve this? Many thanks.
When you specify numbers using a decimal point, Mathematica takes them to have MachinePrecision, roughly 16 digits, hence the results typically have fewer than 16 meaningful digits. You can get infinite precision by using rational/algebraic numbers. If you want finite precision that's better than the default, specify your numbers like this:
123.23`100
This makes Mathematica interpret the number as having 100 digits of precision. So you can do
ans=1.0492843824838929890231`100*0.2323432432432432`100^3
Check the precision of the final answer using Precision:
Precision[ans]
Check tutorial/ArbitraryPrecisionNumbers for more details
You may do:
r[x_]:=Rationalize[x,0];
n = r@1.0492843824838929890231 (r@0.2323432432432432)^3
Out:
228598965838025665886943284771018147212124/17369643723462006556253010609136949809542531
And now, for example
N[n,100]
0.01316083216659453615093767083090600540780118249299143245357391544869\
928014026433963352910151464006549
Sometimes you just want to see more of the machine precision result. These are a few methods.
(1) Put the cursor at the end of the output line, and press Enter (not on the numeric keypad) to copy the output to a new input line, showing all digits.
(2) Use InputForm as in InputForm[1.0/7]
(3) Change the setting of PrintPrecision using the Options Inspector.
