How to implement percentile in Hive? - hadoop

Can anyone please tell me how to implement a percentile in Hive?
I tried the percentile function, but I was not able to get the expected result.
Example code will greatly help.

Use the percentile function, as per the product documentation:
Returns the exact pth percentile of a column in the group (does not work with floating point types). p must be between 0 and 1. NOTE: A true percentile can only be computed for integer values. Use PERCENTILE_APPROX if your input is non-integral.
If you are not able to get the 'expected result', then you are going to need to add a lot more detail to your question: what the data is, what your query is, and what result you expected.
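To work out what result to expect on a small sample, here is a minimal Python sketch of an exact percentile that interpolates linearly between the two nearest ranks. That Hive's percentile interpolates in exactly this way is an assumption, not something the quoted documentation states, and the function name exact_percentile is hypothetical.

```python
import math


def exact_percentile(values, p):
    """Exact p-th percentile (0 <= p <= 1) with linear interpolation.

    Assumption: interpolation between the two closest ranks; the quoted
    Hive documentation does not spell out the interpolation rule.
    """
    if not 0 <= p <= 1:
        raise ValueError("p must be between 0 and 1")
    ordered = sorted(values)
    if not ordered:
        return None
    # Fractional position of the requested percentile in the sorted data.
    position = (len(ordered) - 1) * p
    lower = math.floor(position)
    upper = math.ceil(position)
    if lower == upper:
        return float(ordered[lower])
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])
```

Comparing the output of a function like this with the Hive result for the same handful of integer rows is one way to narrow down whether the data, the query, or the expectation is off.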

Related

Google Sheets: Get First Value in Column

I need to find the first value in a column range. I've previously had help to find the last value in a column range, such as:
=INDEX(E4:E188, MATCH(999^99, (E4:E188)))
However I haven't figured out how to convert the above formula to return the first value instead. Any suggestions would be appreciated.
Although I'm not sure whether I have correctly understood your expected result, how about the following sample formula?
Sample formula:
=INDEX(FILTER(E4:E188,E4:E188),1)
Reference
FILTER

Dax - how to get average value per day where column x value is y?

I'm trying to figure out how to get the average value per day where the type is, for example, "Test". This would show
the average value for all rows which have the type "Test", averaged within each day. So all days would
probably get different average values. How would this be done in DAX syntax? Below is a combination of SQL
and normal text which might help explain what I'm trying to achieve.
select average Values per day from mytable where type = test
I'm providing an example table that might make it easier to understand, it's a quite simple question but I
fail to find information on how to solve it, any help or input is highly appreciated, thanks!
You can filter the dataset to calculate the test value average.
AverageTestValues =
CALCULATE ( AVERAGE ( myTable[Values] ), FILTER ( myTable, [Type] = "Test" ) )
This expression calculates the average of the Test values in a given context. If you use the measure in a total context, you will get the average of all rows of type Test.
Let me know if this helps.
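To illustrate what the measure should return when it is evaluated per day, here is a minimal Python sketch of the same logic: keep only the rows of the wanted type, then average within each day. The rows below are hypothetical sample data, since the example table from the question is not reproduced here, and the function name is an assumption.

```python
from collections import defaultdict


def average_per_day(rows, wanted_type="Test"):
    """Average the values of rows matching wanted_type, grouped by day."""
    totals = defaultdict(lambda: [0.0, 0])  # day -> [running sum, row count]
    for day, row_type, value in rows:
        if row_type == wanted_type:
            totals[day][0] += value
            totals[day][1] += 1
    return {day: total / count for day, (total, count) in totals.items()}


# Hypothetical sample rows: (day, type, value).
rows = [
    ("2020-01-01", "Test", 10),
    ("2020-01-01", "Test", 20),
    ("2020-01-01", "Other", 99),
    ("2020-01-02", "Test", 5),
]
print(average_per_day(rows))
```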

percentile_approx in hive returning zero

I have been trying to check percentile_approx for a set of users. The intention is to get the top 25% of customers in the data set. To check that, I ran the following Hive query.
select percentile_approx(amount, 0.75)
from sales
However, the value returned from this query is 0.0. I am not sure what the problem is. When I run this query over a sample of a few records, the result is what I expect.
Can anyone please shed some light on this?
Note - I am trying to find the percentile in a data set containing more than 3.3 M records.
Try this method:
select percentile_approx(cast(amount as double), ARRAY(0.75))
from sales
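Note that percentile_approx() accepts any numeric column, including floating point types; it is percentile() that works only on integer data. Please make sure that the column you apply it to is actually numeric, and cast it to double if it is not.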

Percentiles for multiple columns

I have a table with around 200-250 columns and I want to compute the percentile for each of these columns.
Hive provides the function percentile(int_exp, p), which returns the pth percentile value of the column int_exp. But it seems redundant to repeat the same call for each of the 250 columns. Is there a way to find the percentile of all columns in one go?
Unfortunately you will have to call the percentile function for each column. One suggestion is that you could dynamically generate this query using some other language (e.g. Java, Ruby, Python, etc.)
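As a minimal sketch of that suggestion in Python: the function below builds a single SELECT with one percentile() call per column. The table name, the column names and the _pct alias suffix are hypothetical, and every column is assumed to be an integer type (use percentile_approx for non-integral columns).

```python
def build_percentile_query(table, columns, p=0.5):
    """Return one HiveQL SELECT that calls percentile() for every column."""
    if not columns:
        raise ValueError("at least one column is required")
    # One percentile() expression per column, aliased as <column>_pct.
    select_items = [
        f"percentile(`{col}`, {p}) AS `{col}_pct`" for col in columns
    ]
    return "SELECT\n  " + ",\n  ".join(select_items) + f"\nFROM {table}"


# Hypothetical table and column names, for illustration only.
columns = ["col_a", "col_b", "col_c"]
print(build_percentile_query("my_table", columns, 0.75))
```

In practice the column list could be read from the table's metadata rather than typed by hand, and the generated string passed to whatever Hive client you already use.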

Oracle spool Number rounding

I am calculating the sum of all sales orders in a SQL query (multiplying the quantity and price of each order and applying the sum function; assume each sales order has only one item), and I am spooling the output to a CSV file using spool C:\scripts\output.csv.
The numeric output I get is truncated/rounded; e.g. the SQL output 122393446 appears in the CSV as 122400000.
I searched Google and Stack Overflow, but I could not find any hints about what can be done to prevent this.
Any clues?
Thanks
I think it is an xls display issue.
Save the file as xls, then format the column as a number, for example with 2 decimals.
Initially I thought it might have something to do with the width of the number format, which is normally 10 (NUMWIDTH) in sqlplus, but your result has a numeric width of 9, so that cannot be the problem. Please check whether your query uses a numeric type that doesn't have the required precision and thus produces inexact calculations.
