Impala : Running sum of 1 hour - hadoop

I want to count records of each ID with in 1 Hour. I tried out some IMPALA queries but without any luck.
I have input data as follows:
And expected output would be:
I tried :
select
concat(month,'/',day,'/',year,' ',hour,':',minute) time, id,
count(1) over(partition by id order by concat(month,'/',day,'/',year,' ',hour,':',minute) range between '1 hour' PRECEDING AND CURRENT ROW) request
from rt_request
where
concat(year,month,day,hour) >= '2019020318'
group by id, concat(month,'/',day,'/',year,' ',hour,':',minute)
But I got exception.
RANGE is only supported with both the lower and upper bounds UNBOUNDED or one UNBOUNDED and the other CURRENT ROW.
Any suggestion/help would be appreciated.
Thank you in advance!

I think you are looking for counts for the same hour across days for a given id. You can simply use row_number to do this.
select time,id,row_number() over(partition by id,hour order by concat(month,'/',day,'/',year,' ',hour,':',minute)) as total
from tbl

Related

How to calculate longest period between two specific dates in SQL?

I have problem with the task which looks like that I have a table Warehouse containing a list of items that a company has on stock. This
table contains the columns ItemID, ItemTypeID, InTime and OutTime, where InTime (OutTime)
specifies the point in time where a respective item has entered (left) the warehouse. I have to calculate the longest period that the company has gone without an item entering or leaving the warehouse. I am trying to resolve it this way:
select MAX(OutTime-InTime) from Warehouse where OutTime is not null
Is my understanding correct? Because I believe that it is not ;)
You want the greatest gap between any two consecutive actions (item entering or leaving the warehouse). One method is to unpivot the in and out times to rows, then use lag() to get the date of the "previous" action. The final step is aggregation:
select max(x_time - lag_x_time) max_time_diff
from warehouse w
cross apply (
select x_time, lag(x.x_time) over(order by x.x_time) lag_x_time
from (
select w.in_time as x_time from dual
union all select w.out_time from dual
) x
) x
You can directly perform date calculation in oracle.
The result is calculated in days.
If you want to do it in hours, multiply the result by 24.
To calculate the duration in [day], and check all the information in the table:
SELECT round((OutTime - InTime)) as periodDay, Warehouse .*
FROM Warehouse
WHERE OutTime is not null
ORDER BY periodDay DESC
To calculate the duration in [hour]:
SELECT round((OutTime - InTime)*24) AS periodHour, Warehouse .*
FROM Warehouse
WHERE OutTime is not null
ORDER periodHour DESC
round() is used to remove the digits.
Select only the record with maximum period.
SELECT *
FROM Warehouse
WHERE (OutTime - InTime) =
( SELECT MAX(OutTime - InTime) FROM Warehouse)
Select only the record with maximum period, with the period indicated.
SELECT (OutTime - InTime) AS period, Warehouse.*
FROM Warehouse
WHERE (OutTime - InTime) =
( SELECT MAX(OutTime - InTime) FROM Warehouse)
When finding the longest period, the condition where OutTime is null is not needed.
SQL Server has DateDiff, Oracle you can just take one date away from the other.
The code looks ok. Oracle has a live SQL tool where you can test out queries in your browser that should help you.
https://livesql.oracle.com/

How to SELECT the MAX Time Difference Between Any 2 Consecutive Rows Per Value?

Just had a user answer this correctly for TSQL, but wondering how best to achieve this now in SQL Developer/PLSQL seeing as there is no DATEDIFF function.
Table I want to query on has some 'CODE' values, which can naturally have multiple primary key records ('OccsID') in a table 'Occs'. There is also a datetime column called 'CreateDT' for each OccsID.
Just want to find the maximum possible time variance between any 2 consecutive rows in 'Occs', per 'CODE'.
If you subtract the "next" date and "this" date (using the LEAD analytic function), you'll get the date difference. Then fetch the maximum difference per code. Something like this:
with diff as
(select occsid,
code,
nvl(lead(createdt) over (partition by code order by createdt), createdt) - createdt date_diff
from test
)
select code,
max(date_diff)
from diff
group by code;
Assuming that this T-SQL version works for you (from the prior question)
SELECT x.code, MAX(x.diff_sec) FROM
(
SELECT
code,
DATEDIFF(
SECOND,
CreateDT,
LEAD(CreateDT) OVER(PARTITION BY CODE ORDER BY CreateDT) --next row's createdt
) as diff_sec
FROM Occs
)x
GROUP BY x.code
The simplest option is just to subtract the two dates to get a difference in days. You can then multiply to get the difference in hours, minutes, or seconds
SELECT x.code, MAX(x.diff_day), MAX(x.diff_sec)
FROM
(
SELECT
code,
CreateDT -
LEAD(CreateDT) OVER(PARTITION BY CODE ORDER BY CreateDT) as diff_day,
24*60*60* (CreateDT -
LEAD(CreateDT) OVER(PARTITION BY CODE ORDER BY CreateDT)) as diff_sec
FROM Occs
)x
GROUP BY x.code

Hadoop Hive MAX gives multiple results

I am trying to get a maximum value from a count selecting 2 label srcip and max, but everytime I include srcip I have to use group by srcip at the end and gives me result as the max wasnt even there.
When I write the query like this it gives me the correct max value but I want to select srcip as well.
Select max(count1) as maximum
from (SELECT srcip,count(srcip) as count1 from data group by srcip)t;
But when I do include srcip in the select I get result as there was no max function
Select srcip,max(count1) as maximum
from (SELECT srcip,count(srcip) as count1 from data group by srcip)t
group by srcip;
I would expect from this a single result but I get multiple.
Anyone has any ideas?
You may do ORDER BY count DESC with LIMIT 1 to get the scrip with MAX of count.
SELECT srcip, count(srcip) as count1
from data group by srcip
ORDER BY count1 DESC LIMIT 1
let's consider you have a data like this.
Table
Let's see what happens when you run following query, what happens to data.
Query
SELECT srcip,count(srcip) as count1 from data group by srcip
Output: table1
Now let's see what happens you run your outer query on above table .
Select srcip,max(count1) as maximum from table1 group by srcip
Same Output
Reason being your query says to select srcip and maximum of count from each group of srcip. And we have 3 groups, so 3 rows.
The query below returns exact one row having the max count and the associated scrip. This is the query based on the expected result; you would rather look more into sql and earlier comments, then progress to hive analytical queries.
Some people could argue that there is better way to optimize this query for your expected result but this should give you a motivation to look more into Hive analytical queries.
select scrip, count1 as maximum from (select srcip, count(scrip) over (PARTITION by scrip) as count1, row_number() over (ORDER by scrip desc) as row_num from data) q1 having row_num = 1;

Oracle Moving Average without Current Row

Would like to calculate moving average in oracle for the last 30 records excluding current row.
select crd_nb, flng_month, curr_0380,
avg(curr_0380) over (partition by crd_nb order by flng_month ROWS 30 PRECEDING) mavg
from us_ordered
The above SQL calculates moving average for 30 records including current row.
Is there any way to calculate moving average excluding current row.?
select
crd_nb,
flng_month,
curr_0380,
avg(curr_0380) over (
partition by crd_nb
order by flng_month
ROWS between 30 PRECEDING and 1 preceding
) mavg
from us_ordered
#be_here_now's answer is clearly superior. I'm leaving mine in place nonetheless, as it's still functional, if needlessly complex.
An answer would to be to get the sum and count individually, subtract out the current row then divide the two results. It's a little ugly, but it should work:
SELECT crd_nb,
flng_month,
curr_0380,
( SUM (curr_0380)
OVER (PARTITION BY crd_nb
ORDER BY flng_month ROWS 30 PRECEDING)
- curr_0380)
/ ( COUNT (curr_0380)
OVER (PARTITION BY crd_nb
ORDER BY flng_month ROWS 30 PRECEDING)
- 1)
mavg
FROM us_ordered
If curr_0380 can be null, you'd have to tweak this slightly so that the current row is removed only if it's not null.

Oracle Daily count/average over a year

I'm pulling two pieces of information over a specific time period, but I would like to fetch the daily average of one tag and the daily count of another tag. I'm not sure how to do daily averages over a specific time period, can anyone provide some advice? Below were my first ideas on how to handle this however to change every date would be annoying. Any help is appreciated thanks
SELECT COUNT(distinct chargeno), to_char(chargetime, 'mmddyyyy') AS chargeend
FROM batch_index WHERE plant=1 AND chargetime>to_date('2012-06-18:00:00:00','yyyy-mm-dd:hh24:mi:ss')
AND chargetime<to_date('2012-07-19:00:00:00','yyyy-mm-dd:hh24:mi:ss')
group by chargetime;
The working version of the daily sum
SELECT to_char(bi.chargetime, 'mmddyyyy') as chargtime, SUM(cv.val)*0.0005
FROM Charge_Value cv, batch_index bi WHERE cv.ValueID =97
AND bi.chargetime<=to_date('2012-07-19','yyyy-mm-dd')
AND bi.chargeno = cv.chargeno AND bi.typ=1
group by to_char(bi.chargetime, 'mmddyyyy')
seems like in the first one you want to change the group to the day - not the time... (plus i dont think you need to specify all those 0's for seconds..)
SELECT COUNT(distinct chargeno), to_char(chargetime, 'mmddyyyy') AS chargeend
FROM batch_index WHERE plant=1 AND chargetime>to_date('2012-06-18','yyyy-mm-dd')
AND chargetime<to_date('2012-07-19','yyyy-mm-dd')
group by to_char(chargetime, 'mmddyyyy') ;
not 100% I'm following your question, but if you just want to do aggregates (sums, avg), then do just that. I threw in the rollup just in case that is what you were looking for
with fakeData as(
select trunc(level *.66667) nr
, trunc(2*level * .33478) lvl --these truncs just make the doubles ints
,trunc(sysdate+trunc(level*.263784123)) dte --note the trunc, this gets rid of the to_char to drop the time
from dual
connect by level < 600
) --the cte is just to create fake data
--below is just some aggregates that may help you
select sum(nr) daily_sum_of_nr
, avg(nr) daily_avg_of_nr
, count(distinct lvl) distinct_lvls_per_day
, count(lvl) count_of_nonNull_lvls_per_day
, dte days
from fakeData
group by rollup(dte)
--if you want the query to supply a total for the range, you may use rollup ( http://psoug.org/reference/rollup.html )

Resources