I need help with a Hive query.
I wrote this Hive query:
select to_date(from_unixtime(epoch)) as date, count1 , count2, count3 from table1 where count1=168
This gives me result as follows:
date count1 count2 count3
7-15-2015 168 3 7
7-15-2015 168 1 5
7-15-2015 168 4 3
and similarly for other dates
....
Finally, I need to write a query that returns the median value of count2 and count3 for each date.
For example, I need output like this:
date count1 count2 count3
7-15-2015 168 3 5
and similarly for other dates
I know I need to group by date and then write a subquery on that,
but I am not able to write the correct query.
Can anyone help me with this?
The median is the 2nd quartile, 5th decile, and 50th percentile. We can calculate the 50th percentile using the percentile function in Hive:
select to_date(from_unixtime(epoch)) as date
, count1
, percentile(count2,0.5) as median_ct2
, percentile(count3,0.5) as median_ct3
from table1
where count1=168
group by to_date(from_unixtime(epoch)), count1;
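To make the grouping logic concrete, here is a minimal Python sketch of the same idea: group the rows by (date, count1) and take the median of count2 and count3 within each group. It uses the sample rows from the question; the function name medians_by_group is made up for illustration, and this is plain Python rather than Hive's own implementation of percentile.

```python
from collections import defaultdict
from statistics import median

# Sample rows from the question: (date, count1, count2, count3)
rows = [
    ("7-15-2015", 168, 3, 7),
    ("7-15-2015", 168, 1, 5),
    ("7-15-2015", 168, 4, 3),
]

def medians_by_group(rows):
    """Group by (date, count1) and return the median of count2 and count3."""
    groups = defaultdict(lambda: ([], []))
    for date, count1, count2, count3 in rows:
        c2_values, c3_values = groups[(date, count1)]
        c2_values.append(count2)
        c3_values.append(count3)
    return {
        key: (median(c2_values), median(c3_values))
        for key, (c2_values, c3_values) in groups.items()
    }

print(medians_by_group(rows))  # {('7-15-2015', 168): (3, 5)}
```

For the sample date this gives 3 for count2 and 5 for count3, which is the output the question asks for.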
My tables are as below
MS_ISM_ISSUE
ISSUE_ID ISSUE_DUE_DATE ISSUE_SOURCE_TYPE
I1 25-11-2018 1
I2 25-12-2018 1
I3 27-03-2019 2
MS_ISM_SOURCE_SETUP
SOURCE_ID MODULE_NAME
1 IT-Compliance
2 Risk Assessment
I have written following query.
with rs as
(select
count(ISSUE_ID) as ISSUE_COUNT, src.MODULE_NAME,
case
when ISSUE_DUE_DATE<sysdate then 'Overdue'
when ISSUE_DUE_DATE between sysdate and sysdate + 90 then 'Within 3 months'
when ISSUE_DUE_DATE>sysdate+90 then 'Beyond 90 days'
end as date_range
from MS_ISM_ISSUE issue, MS_ISM_SOURCE_SETUP src
where issue.Issue_source_type = src.source_id
group by src.MODULE_NAME, case
when ISSUE_DUE_DATE<sysdate then 'Overdue'
when ISSUE_DUE_DATE between sysdate and sysdate + 90 then 'Within 3 months'
when ISSUE_DUE_DATE>sysdate+90 then 'Beyond 90 days'
end)
select ISSUE_COUNT,MODULE_NAME, DATE_RANGE,
(select count(ISSUE_COUNT) from rs where rs.MODULE_NAME=MODULE_NAME) as total from rs;
The output of the code is as below.
ISSUE_COUNT MODULE_NAME DATE_RANGE Total
1 IT-Compliance Overdue 3
1 IT-Compliance Within 3 months 3
1 Risk Assessment Beyond 90 days 3
The result is correct up to the 3rd column. In the 4th column I want the total issue count for the given module name. So in the case above, the Total column should be 2 for the first and second rows (since there are 2 issues for IT-Compliance) and 1 for the third row (since there is one issue for Risk Assessment).
Essentially, I want to use the current row's MODULE_NAME in the last where clause. How do I achieve this in a query?
OK, this condition
where rs.MODULE_NAME=MODULE_NAME
is essentially the same as if you wrote
where MODULE_NAME = MODULE_NAME
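which is simply always true (if there are no nulls in module_name), because the unqualified MODULE_NAME resolves to the rs inside the subquery, not to the outer row.
Try using a different table alias for the inner query and the outer query, and sum the issue counts rather than counting rows, e.g.
select sum(ISSUE_COUNT) from rs rs2 where rs2.MODULE_NAME=rs.MODULE_NAME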
You can also use an analytic function instead of your subquery, something like
select ISSUE_COUNT,
MODULE_NAME,
DATE_RANGE,
SUM(ISSUE_COUNT) OVER (PARTITION BY RS.MODULE_NAME) AS TOTAL
from rs
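The name-resolution problem can be reproduced outside Oracle. The sketch below is only an illustration, under the assumption that an in-memory SQLite table named rs stands in for the CTE and already holds the three grouped rows from the question. It runs the original shadowed subquery and the version with a separate inner alias side by side.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Stand-in for the "rs" CTE: one row per (module, date range) with its issue count.
conn.execute("CREATE TABLE rs (issue_count INTEGER, module_name TEXT, date_range TEXT)")
conn.executemany(
    "INSERT INTO rs VALUES (?, ?, ?)",
    [
        (1, "IT-Compliance", "Overdue"),
        (1, "IT-Compliance", "Within 3 months"),
        (1, "Risk Assessment", "Beyond 90 days"),
    ],
)

# Both sides of the comparison bind to the inner rs, so every row matches.
shadowed = conn.execute(
    "SELECT module_name, "
    "(SELECT COUNT(issue_count) FROM rs WHERE rs.module_name = module_name) "
    "FROM rs ORDER BY module_name, date_range"
).fetchall()

# With a distinct inner alias, rs.module_name refers to the outer row.
correlated = conn.execute(
    "SELECT module_name, "
    "(SELECT SUM(issue_count) FROM rs rs2 WHERE rs2.module_name = rs.module_name) "
    "FROM rs ORDER BY module_name, date_range"
).fetchall()

print(shadowed)    # every row gets 3
print(correlated)  # 2, 2 for IT-Compliance and 1 for Risk Assessment
```

The first query returns 3 on every row, as in the question, while the second returns the per-module totals.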
I have a column MONTHLY_SPEND in a table, with data type NUMBER. I am trying to write a query which returns the number of trailing zeros in each value of the column.
e.g.
1000 will return 3
14322 will return 0
1230 will return 1
1254000.65 will return 0
I tried using the mod operator with 10, but did not get the expected result. Any help is appreciated. Please note that the database is Oracle and we can't create a procedure or function.
select nvl(length(regexp_substr(column, '0+$')), 0) from table;
Here is one way to find it:
create table spend
(Monthly_spend NUMBER);
Begin
insert into spend values (1000);
insert into spend values (14322);
insert into spend values (1230);
insert into spend values (1254000.65);
End;
This query works for this data:
select Monthly_spend,REGEXP_COUNT(Monthly_spend,0)
from spend
where Monthly_spend not like '%.%' ;
If you also have a value like 102 and it should return zero, then try the query below:
select Monthly_spend,case when substr(Monthly_spend,-1,1)=0 THEN REGEXP_COUNT(Monthly_spend,0) ELSE 0 END from spend;
Here is the final query, which also handles values like 2300120 or 230012000:
select Monthly_spend,
case when substr(Monthly_spend,-1,1)=0 and REGEXP_COUNT(trim (0 from Monthly_spend),0)<=0 THEN REGEXP_COUNT(Monthly_spend,0)
when REGEXP_COUNT(trim (0 from Monthly_spend),0)>0 THEN LENGTH(Monthly_spend) - LENGTH(trim (0 from Monthly_spend))
ELSE 0 END from spend;
Output :
1000 3
1254000.65 0
14322 0
1230 1
102 0
2300120 1
230012000 3
You can try this, a simple solution.
select length(to_char(col1))-length(rtrim(to_char(col1), '0')) no_of_trailing_zeros from your_table;
select length(to_char('123.120'))-length(rtrim(to_char('123.120'), '0')) no_of_trailing_zeros from dual;
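The length-minus-rtrim idea can be checked quickly outside the database. The Python sketch below is only an illustration: the function name trailing_zeros is made up, and the inputs are the text forms of the sample values, which is an assumption about what to_char would produce for them.

```python
def trailing_zeros(text):
    """Count trailing '0' characters: full length minus length after stripping them."""
    return len(text) - len(text.rstrip("0"))

# Sample values from this thread, written as text.
samples = ["1000", "14322", "1230", "1254000.65", "102", "2300120", "230012000"]

for value in samples:
    print(value, trailing_zeros(value))
```

Only zeros at the very end of the text are counted, so 102 and 1254000.65 give 0, while 230012000 gives 3.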
I have a scenario where I need to compare 2 or more dates against a given period.
I can do this for 1 date using BETWEEN. The challenge is when I have 2 dates to compare in parallel: I get a single-row subquery error.
select A
from ORDER
where Date1 between sysdate and (sysdate-10)
The above query works fine for a single date. Please help me find a solution for when I have Date 1 and Date 2 and need to compare both against the same period (sysdate and (sysdate-10)). I may have more than 2 dates as well.
Thanks
Shankar
Not having a proper description of your tables or the data they contain, it is difficult to know what you want.
Perhaps something like:
SELECT A
FROM ORDER
GROUP BY A
HAVING COUNT( CASE WHEN datecolumn BETWEEN SYSDATE - 10 AND SYSDATE THEN 1 ELSE NULL END ) > 0
I need to write a query that calculates the difference between the last month-end and the current month-end, and the difference between the last year-end and the current month-end. I created a sample database in sqlfiddle http://sqlfiddle.com/#!4/b9749
In my database the most important date is always the month-end. As you can see in the sample, there are other dates as well, but I can't use values from those dates. When I run this query with the condition that date ='2014-04-30', the result should be like this:
date product amount last_month_diff last_year_end_diff
2014-04-30 a1 350 -150 650
2014-04-30 b1 123 -123 1877
When I run this query with the condition that date ='2014-05-31', the result should be like this:
date product amount last_month_diff last_year_end_diff
2014-05-31 a1 400 -50 600
2014-05-31 b1 500 -377 1500
2014-05-31 c1 200 0 0
And when I run this query with the condition that date ='2014-06-30', the result should be like this:
date product amount last_month_diff last_year_end_diff
2014-06-30 b1 780 -280 1220
2014-06-30 c1 100 100 0
At first I thought I could use analytic functions (lag), but I may have many dates between two month-ends and I don't know how to achieve the expected result.
Try something like the query below.
with input_date as (select to_date('2014-04-30', 'YYYY-MM-DD') d from dual),
sot_tot as (select product,
sum(case when extract(month from date_) = extract(month from d) then amount else 0 end) amount,
sum(case when extract(month from date_) = extract(month from last_day(add_months(d, -1))) then amount else 0 end) previous_month_amount
from sot, input_date
where date_ <= d
group by product)
select product, amount, previous_month_amount - amount as previous_month_diff
from sot_tot
I was not able to understand what you mean by the difference between last month-end and month-end and the difference between last year-end and month-end. However, the solution will be very similar to the one above, and you can play with the sum and case combination to achieve the result you want.
How do I select a range of data depending on a date range?
I have this data in my payment table, with dates in the format dd/mm/yyyy:
Id Date Amount
1 4/1/2011 300
2 10/1/2011 200
3 27/1/2011 100
4 4/2/2011 300
5 22/2/2011 400
6 1/3/2011 500
7 1/1/2012 600
The closing date is the 27th of every month, so I would like to put all the data from the 27th until the 26th of the next month into one group.
That is, I would like the output to look like this:
Group 1
1 4/1/2011 300
2 10/1/2011 200
Group 2
1 27/1/2011 100
2 4/2/2011 300
3 22/2/2011 400
Group 3
1 1/3/2011 500
Group 4
1 1/1/2012 600
The context of your question is not clear. Are you querying a database?
If so, you are asking about datetime values, but it seems your column is in string format.
First of all, convert your data to a datetime data type (or some equivalent; which DB engine are you using?), and then use a grouping criterion like this:
GROUP BY datepart(month, dateadd(day, -26, [datefield])), DATEPART(year, dateadd(day, -26, [datefield]))
EDIT:
So, you are using LINQ?
Different language, same logic:
.GroupBy(x => DateTime
.ParseExact(x.Date, "d/M/yyyy", CultureInfo.InvariantCulture) // assuming the date field is stored as a string; "mm" would mean minutes, and "d/M" accepts one- or two-digit day and month
.AddDays(-26)
.ToString("yyyyMM"));
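The shift-by-26-days trick can be checked against the sample data with a short Python sketch. This is an illustration only: the names payments and period_key are made up, and the dates are assumed to be day/month/year strings as in the question.

```python
from datetime import datetime, timedelta

# Sample rows from the question: (id, date as d/m/yyyy text, amount)
payments = [
    (1, "4/1/2011", 300),
    (2, "10/1/2011", 200),
    (3, "27/1/2011", 100),
    (4, "4/2/2011", 300),
    (5, "22/2/2011", 400),
    (6, "1/3/2011", 500),
    (7, "1/1/2012", 600),
]

def period_key(date_text):
    """Shift the date back 26 days so the 27th..26th window falls in one calendar month."""
    shifted = datetime.strptime(date_text, "%d/%m/%Y") - timedelta(days=26)
    return shifted.strftime("%Y%m")

# Collect payment ids per period, in input order.
groups = {}
for payment_id, date_text, amount in payments:
    groups.setdefault(period_key(date_text), []).append(payment_id)

print(groups)
# {'201012': [1, 2], '201101': [3, 4, 5], '201102': [6], '201112': [7]}
```

Each key is the month in which the period starts, so 201101 covers 27/1/2011 through 26/2/2011. The four groups match the grouping requested in the question.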
If you are going to do this frequently, it would be worth investing in a table that assigns a unique identifier to each month and the start and end dates:
CREATE TABLE MonthEndings
(
MonthID INTEGER NOT NULL PRIMARY KEY,
StartDate DATE NOT NULL,
EndDate DATE NOT NULL
);
INSERT INTO MonthEndings VALUES(201101, '27/12/2010', '26/01/2011');
INSERT INTO MonthEndings VALUES(201102, '27/01/2011', '26/02/2011');
INSERT INTO MonthEndings VALUES(201103, '27/02/2011', '26/03/2011');
INSERT INTO MonthEndings VALUES(201201, '27/12/2011', '26/01/2012');
You can then group accurately using:
SELECT M.MonthID, P.Id, P.Date, P.Amount
FROM Payments AS P
JOIN MonthEndings AS M ON P.Date BETWEEN M.StartDate and M.EndDate
ORDER BY M.MonthID, P.Date;
Any group headings etc. are best handled outside the DBMS: the SQL gets you the data in the correct sequence, and the software retrieving the data presents it to the user.
If you can't translate SQL to LINQ, that makes two of us. Sorry, I have never used LINQ, so I've no idea what is involved.
SELECT *, CASE WHEN datepart(day,date)<27 THEN datepart(month,date)
ELSE datepart(month,date) % 12 + 1 END as group_name
FROM payment