I have:
ID | BRAND_ID | CUST_ID | EXPIRY_DATE | CREATED_DATE
1  | 1        | 22      | 2018-02-02  | 2018-01-01 00:00:00
2  | 1        | 22      | 2018-02-02  | 2018-02-02 00:00:00
3  | 1        | 22      | 2019-02-02  | 2018-02-02 00:05:00
4  | 1        | 22      | 2019-02-02  | 2018-02-02 00:05:00
5  | 1        | 22      | 2018-02-02  | 2018-02-02 00:07:00
6  | 1        | 22      | 2018-02-02  | 2018-02-02 00:07:00
I'm trying to get the latest (newest) records, grouping by CUST_ID:
->groupBy('CUST_ID')
->orderBy('CREATED_DATE', 'desc')
but I'm getting the first two rows when I add groupBy, not the last two.
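In plain SQL terms, the usual way to keep the newest row per customer is a ROW_NUMBER window rather than GROUP BY, since GROUP BY collapses each group to one row before any ORDER BY runs. A minimal sketch, assuming a database that supports window functions and a table named my_table (the question does not name it; the columns are taken from the sample above):

select id, brand_id, cust_id, expiry_date, created_date
from (select t.*,
             -- rn = 1 marks the newest row per customer
             row_number() over (partition by cust_id
                                order by created_date desc) as rn
      from my_table t) x
where rn = 1;

The Laravel builder can express the same idea with a raw expression or a self-join on the maximum CREATED_DATE per CUST_ID.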
I have code that's partitioning my data for a gaps and island solution. The data itself is reporting on user activity, time spent working, and idle time based on logged timestamps and activities. My code is working great, but every once in a while I have a user_id that logs a string of activities for one application, goes idle, then returns to the same application to log additional activity. Based on my current code, it looks like the user spent nearly two hours in one application when in reality there was significant downtime in the middle. I want to "force" the creation of an island, restarting the partition if there is a lapse of greater than 30 minutes between activities.
ACTIVITY_DATE    | USER_ID | APPL_ID | PR1 | PR2
-----------------|---------|---------|-----|----
11/20/2020 10:55 | A       | 9340    | 1   | 1
11/20/2020 10:55 | A       | 9340    | 2   | 2
11/20/2020 10:58 | A       | 9340    | 3   | 3
11/20/2020 10:58 | A       | 9340    | 4   | 4
11/20/2020 10:59 | A       | 9340    | 5   | 5
11/20/2020 13:09 | A       | 9340    | 6   | 6
11/20/2020 13:09 | A       | 9340    | 7   | 7
11/20/2020 13:10 | A       | 9340    | 8   | 8
11/20/2020 13:10 | A       | 9340    | 9   | 9
11/20/2020 17:12 | A       | 8354    | 10  | 1
11/20/2020 17:14 | A       | 8354    | 11  | 2
11/20/2020 17:14 | A       | 8354    | 12  | 3
The final result needs to restart the partition for column PR2 at the sixth row in this example, because the gap between logged activities for the same appl_id exceeds 30 minutes:
ACTIVITY_DATE    | USER_ID | APPL_ID | PR1 | PR2
-----------------|---------|---------|-----|----
11/20/2020 10:55 | A       | 9340    | 1   | 1
11/20/2020 10:55 | A       | 9340    | 2   | 2
11/20/2020 10:58 | A       | 9340    | 3   | 3
11/20/2020 10:58 | A       | 9340    | 4   | 4
11/20/2020 10:59 | A       | 9340    | 5   | 5
11/20/2020 13:09 | A       | 9340    | 6   | 1
11/20/2020 13:09 | A       | 9340    | 7   | 2
11/20/2020 13:10 | A       | 9340    | 8   | 3
11/20/2020 13:10 | A       | 9340    | 9   | 4
11/20/2020 17:12 | A       | 8354    | 10  | 1
11/20/2020 17:14 | A       | 8354    | 11  | 2
11/20/2020 17:14 | A       | 8354    | 12  | 3
Here's my current code:
select activity_date, user_id, appl_id,
row_number() over(partition by user_id order by activity_date) rn1,
row_number() over(partition by user_id, appl_id order by activity_date) rn2
from
(select
activity_date, user_id, appl_id, count(*)
from mytable tt
where
user_id in ('A', 'B', 'C')
and activity_date >= trunc(sysdate - 4,'DD')
and activity_date <= trunc(sysdate - 3,'DD')
group by
activity_date, user_id, appl_id) tt
You can use MATCH_RECOGNIZE: each match collects a run of rows for one user and application in which every next activity starts within 30 minutes of the previous one, and MATCH_NUMBER() labels the runs so that PR2 can be renumbered inside each island:
SELECT activity_date,
user_id,
appl_id,
pr1,
ROW_NUMBER() OVER ( PARTITION BY user_id, appl_id, mno ORDER BY pr1 )
AS pr2
FROM (
SELECT t.*,
ROW_NUMBER() OVER ( PARTITION BY user_id ORDER BY activity_date) AS pr1
FROM table_name t
)
MATCH_RECOGNIZE(
PARTITION BY user_id, appl_id
ORDER BY pr1
MEASURES
MATCH_NUMBER() AS mno
ALL ROWS PER MATCH
PATTERN ( activities* last_activity )
DEFINE activities AS
NEXT(activity_date) <= LAST(activity_date) + INTERVAL '30' MINUTE
)
ORDER BY user_id, pr1;
Which, for the sample data:
CREATE TABLE table_name ( ACTIVITY_DATE, USER_ID, APPL_ID ) AS
SELECT DATE '2020-11-20' + INTERVAL '10:55' HOUR TO MINUTE, 'A', 9340 FROM DUAL UNION ALL
SELECT DATE '2020-11-20' + INTERVAL '10:55' HOUR TO MINUTE, 'A', 9340 FROM DUAL UNION ALL
SELECT DATE '2020-11-20' + INTERVAL '10:58' HOUR TO MINUTE, 'A', 9340 FROM DUAL UNION ALL
SELECT DATE '2020-11-20' + INTERVAL '10:58' HOUR TO MINUTE, 'A', 9340 FROM DUAL UNION ALL
SELECT DATE '2020-11-20' + INTERVAL '10:59' HOUR TO MINUTE, 'A', 9340 FROM DUAL UNION ALL
SELECT DATE '2020-11-20' + INTERVAL '13:09' HOUR TO MINUTE, 'A', 9340 FROM DUAL UNION ALL
SELECT DATE '2020-11-20' + INTERVAL '13:09' HOUR TO MINUTE, 'A', 9340 FROM DUAL UNION ALL
SELECT DATE '2020-11-20' + INTERVAL '13:10' HOUR TO MINUTE, 'A', 9340 FROM DUAL UNION ALL
SELECT DATE '2020-11-20' + INTERVAL '13:10' HOUR TO MINUTE, 'A', 9340 FROM DUAL UNION ALL
SELECT DATE '2020-11-20' + INTERVAL '17:12' HOUR TO MINUTE, 'A', 8354 FROM DUAL UNION ALL
SELECT DATE '2020-11-20' + INTERVAL '17:14' HOUR TO MINUTE, 'A', 8354 FROM DUAL UNION ALL
SELECT DATE '2020-11-20' + INTERVAL '17:14' HOUR TO MINUTE, 'A', 8354 FROM DUAL;
Outputs:
ACTIVITY_DATE | USER_ID | APPL_ID | PR1 | PR2
:------------------ | :------ | ------: | --: | --:
2020-11-20 10:55:00 | A | 9340 | 1 | 1
2020-11-20 10:55:00 | A | 9340 | 2 | 2
2020-11-20 10:58:00 | A | 9340 | 3 | 3
2020-11-20 10:58:00 | A | 9340 | 4 | 4
2020-11-20 10:59:00 | A | 9340 | 5 | 5
2020-11-20 13:09:00 | A | 9340 | 6 | 1
2020-11-20 13:09:00 | A | 9340 | 7 | 2
2020-11-20 13:10:00 | A | 9340 | 8 | 3
2020-11-20 13:10:00 | A | 9340 | 9 | 4
2020-11-20 17:12:00 | A | 8354 | 10 | 1
2020-11-20 17:14:00 | A | 8354 | 11 | 2
2020-11-20 17:14:00 | A | 8354 | 12 | 3
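If MATCH_RECOGNIZE is not available (it requires Oracle 12c or later), the same islands can be built with LAG and a running sum of gap flags. A sketch only, under the same table_name and column assumptions as above:

SELECT activity_date, user_id, appl_id, pr1,
       ROW_NUMBER() OVER ( PARTITION BY user_id, appl_id, grp ORDER BY pr1 ) AS pr2
FROM (
  SELECT g.*,
         -- running count of the flags: every flagged row starts a new island
         SUM(gap_flag) OVER ( PARTITION BY user_id, appl_id ORDER BY pr1 ) AS grp
  FROM (
    SELECT n.*,
           -- flag a row when the previous activity for the same user and
           -- application is absent or more than 30 minutes older
           CASE
             WHEN activity_date <= LAG(activity_date)
                                     OVER ( PARTITION BY user_id, appl_id ORDER BY pr1 )
                                   + INTERVAL '30' MINUTE
             THEN 0 ELSE 1
           END AS gap_flag
    FROM (
      SELECT t.*,
             ROW_NUMBER() OVER ( PARTITION BY user_id ORDER BY activity_date ) AS pr1
      FROM table_name t
    ) n
  ) g
)
ORDER BY user_id, pr1;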
I have a table as below, and I need a select that returns the value per minute within the current quarter of an hour. For example, if it is now 15:19, the select should return the TIMESTAMP and VALUE rows for this quarter, between 15:15 and 15:30.
That is, I need the select to return the minutes logged so far in the current quarter of an hour. The DB is Oracle.
TIMESTAMP | VALUE
11/11/2019 15:09 | 45
11/11/2019 15:10 | 10
11/11/2019 15:11 | 15
11/11/2019 15:12 | 35
11/11/2019 15:13 | 55
11/11/2019 15:14 | 25
11/11/2019 15:15 | 20
11/11/2019 15:16 | 22
11/11/2019 15:17 | 12
11/11/2019 15:18 | 10
11/11/2019 15:19 | 21
I have tried TRUNC, but without success.
You need to truncate to 15-minute boundaries.
You can do it using the following logic:
select *
from your_table
where your_timestamp_col
      between trunc(systimestamp, 'dd') + floor(to_char(systimestamp, 'sssss.ff') / 900) / 96
          and trunc(systimestamp, 'dd') + ceil(to_char(systimestamp, 'sssss.ff') / 900) / 96
Here, 900 is the number of seconds in 15 minutes and 96 is the number of such quarters in a day (24 hours * 4 quarters = 96). For example, at 15:19 there are 55140 seconds past midnight: floor(55140 / 900) = 61, and 61/96 of a day is 15:15, while ceil gives 62/96, which is 15:30.
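If the SSSSS arithmetic feels opaque, the same window can be written with TRUNC to the hour plus the floored minutes. A sketch only, assuming the table is named readings (the question does not name it) and quoting the column because TIMESTAMP is a reserved word:

select r."TIMESTAMP", r.value
from readings r
-- trunc(...,'hh24') is the top of the current hour; adding m/1440 adds m minutes
where r."TIMESTAMP" >= trunc(sysdate, 'hh24')
                       + floor(to_number(to_char(sysdate, 'mi')) / 15) * 15 / 1440
  and r."TIMESTAMP" <  trunc(sysdate, 'hh24')
                       + (floor(to_number(to_char(sysdate, 'mi')) / 15) + 1) * 15 / 1440;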
Cheers!!
How can I calculate, for each id, the time difference between the current row and the next one for the dataset below:
time id
2012-03-16 23:50:00 1
2012-03-16 23:56:00 1
2012-03-17 00:08:00 1
2012-03-17 00:10:00 2
2012-03-17 00:12:00 2
2012-03-17 00:20:00 2
2012-03-20 00:43:00 3
and get the following result:
time id tdiff
2012-03-16 23:50:00 1 6
2012-03-16 23:56:00 1 12
2012-03-17 00:08:00 1 NA
2012-03-17 00:10:00 2 2
2012-03-17 00:12:00 2 8
2012-03-17 00:20:00 2 NA
2012-03-20 00:43:00 3 NA
It looks like you need the result in minutes, per id. You can use diff() within a groupby:
import pandas as pd

# first convert to datetime with the right format
data['time'] = pd.to_datetime(data['time'], format='%Y-%m-%d %H:%M:%S')
# per-id gap to the previous row, converted from a timedelta to minutes
data['tdiff'] = data.groupby('id')['time'].diff().dt.total_seconds() / 60
print(data)
Output:
time id tdiff
0 2012-03-16 23:50:00 1 NaN
1 2012-03-16 23:56:00 1 6.0
2 2012-03-17 00:08:00 1 12.0
3 2012-03-17 00:10:00 2 NaN
4 2012-03-17 00:12:00 2 2.0
5 2012-03-17 00:20:00 2 8.0
6 2012-03-20 00:43:00 3 NaN
Note that groupby().diff() attaches each gap to the later row (NaN on the first row of each group), while your expected result attaches it to the earlier row (NA on the last). A negated reverse diff gives that shape: -data.groupby('id')['time'].diff(-1).dt.total_seconds() / 60.
I need to capture the rows where there is a change in the value of a specific column, "Toggle".
I have the data below:
ID ROW Toggle Date
661 1 1 2017-03-01
661 2 1 2017-03-02
661 3 1 2017-03-03
661 4 1 2017-03-04
661 5 1 2017-03-05
661 6 1 2017-03-06
661 7 1 2017-03-07
661 8 1 2017-03-08
661 9 1 2017-03-09
661 10 1 2017-03-10
661 11 1 2017-03-11
661 12 1 2017-03-12
661 13 1 2017-03-13
661 14 1 2017-03-14
661 15 1 2017-03-15
661 16 1 2017-03-16
661 17 1 2017-03-17
661 18 1 2017-03-18
661 19 1 2017-03-19
661 20 1 2017-03-20
661 21 1 2017-03-21
661 22 1 2017-03-22
661 23 1 2017-03-23
661 24 1 2017-03-24
661 25 1 2017-03-25
661 26 1 2017-03-26
661 27 1 2017-03-27
661 28 1 2017-03-28
661 29 1 2017-03-29
661 30 1 2017-03-30
661 31 1 2017-03-31
661 32 1 2017-04-01
661 33 1 2017-04-02
661 34 1 2017-04-03
661 35 1 2017-04-04
661 36 1 2017-04-05
661 37 0 2017-04-06
661 38 0 2017-04-07
661 39 0 2017-04-08
661 40 0 2017-04-09
Query used:
select b.id, b.ROW, b.tog, b.ts
from
(select id, ts, tog,
ROW_NUMBER() OVER (order by ts ASC) as ROW
from database.source_table
where id = 661
) b
Can anyone help me with the query so that I can fetch only 1st and 37th row from source table?
Use row_number() plus a filter. This query will output the 1st and 37th rows:
select b.id, b.ROW, b.toggle, b.date
from
(select id, date, toggle,
ROW_NUMBER() OVER (partition by id, toggle order by date ASC) as rn,
ROW_NUMBER() OVER (partition by id order by date ASC) as ROW
from test_table
where id = 661
) b
where rn=1
order by date asc
Result:
OK
661 1 1 2017-03-01
661 37 0 2017-04-06
Time taken: 192.38 seconds, Fetched: 2 row(s)
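One caveat: rn = 1 returns the first row of each distinct toggle value, so it works only while the column changes once. If Toggle can flip back and forth, comparing each row with LAG catches every change point. A sketch against the same test_table and column names:

select b.id, b.ROW, b.toggle, b.date
from
(select id, date, toggle,
        -- previous toggle value for the same id, in date order
        lag(toggle) over (partition by id order by date ASC) as prev_toggle,
        ROW_NUMBER() OVER (partition by id order by date ASC) as ROW
from test_table
where id = 661
) b
where b.prev_toggle is null or b.toggle <> b.prev_toggle
order by b.date asc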
I have the following dataset (table: stk):
S_Date Qty OOS (Out of Stock - 1 true, 0 false)
01/01/2013 0 1
02/01/2013 0 1
03/01/2013 0 1
04/01/2013 5 0
05/01/2013 0 1
06/01/2013 0 1
And what I want is:
S_Date Qty Cumulative_Days_OOS
01/01/2013 0 1
02/01/2013 0 2
03/01/2013 0 3
04/01/2013 5 0 -- No longer out of stock
05/01/2013 0 1
06/01/2013 0 2
The closest I've got so far is the following SQL:
SELECT
S_DATE, QTY,
SUM(OOS) OVER (PARTITION BY OOS ORDER BY S_DATE) CUMULATIVE_DAYS_OOS
FROM
STK
GROUP BY
S_DATE, QTY, OOS
ORDER BY
1
This gives me the following output:
S_Date Qty Cumulative_Days_OOS
01/01/2013 0 1
02/01/2013 0 2
03/01/2013 0 3
04/01/2013 5 0
05/01/2013 0 4
06/01/2013 0 5
It is close to what I want but, understandably, the sum keeps running.
Is it possible to reset this cumulative sum and start it again?
I've tried searching around on stackoverflow and google, but I'm not really sure what I should be searching for.
Any help much appreciated.
You need to identify the groups of consecutive days where oos = 1 or 0. This can be done by using the LAG function to find where the oos column changes and then summing over those change flags.
with x (s_date,qty,oos,chg) as (
select s_date,qty,oos,
case when oos = lag(oos,1) over (order by s_date)
then 0
else 1
end
from stk
)
select s_date,qty,oos,
sum(chg) over (order by s_date) grp
from x;
Output:
| S_DATE | QTY | OOS | GRP |
|--------------------------------|-----|-----|-----|
| January, 01 2013 00:00:00+0000 | 0 | 1 | 1 |
| January, 02 2013 00:00:00+0000 | 0 | 1 | 1 |
| January, 03 2013 00:00:00+0000 | 0 | 1 | 1 |
| January, 04 2013 00:00:00+0000 | 5 | 0 | 2 |
| January, 05 2013 00:00:00+0000 | 0 | 1 | 3 |
| January, 06 2013 00:00:00+0000 | 0 | 1 | 3 |
Then you can sum oos, partitioned by the grp column, to get the consecutive OOS days.
with x (s_date,qty,oos,chg) as (
select s_date,qty,oos,
case when oos = lag(oos,1) over (order by s_date)
then 0
else 1
end
from stk
),
y (s_date,qty,oos,grp) as (
select s_date,qty,oos,
sum(chg) over (order by s_date)
from x
)
select s_date,qty,oos,
sum(oos) over (partition by grp order by s_date) cum_days_oos
from y;
Output:
| S_DATE | QTY | OOS | CUM_DAYS_OOS |
|--------------------------------|-----|-----|--------------|
| January, 01 2013 00:00:00+0000 | 0 | 1 | 1 |
| January, 02 2013 00:00:00+0000 | 0 | 1 | 2 |
| January, 03 2013 00:00:00+0000 | 0 | 1 | 3 |
| January, 04 2013 00:00:00+0000 | 5 | 0 | 0 |
| January, 05 2013 00:00:00+0000 | 0 | 1 | 1 |
| January, 06 2013 00:00:00+0000 | 0 | 1 | 2 |
First we need to divide the rows into groups. In this case you can use the count of 0 values before the current row as the group number, and then use SUM() OVER within these groups. To get 0 when OOS = 0 you can use a CASE statement, or simply OOS*SUM(OOS), since OOS is always 0 or 1.
Something like this:
select T1.*,
OOS*SUM(OOS) OVER (PARTITION BY GRP ORDER BY S_DATE) CUMULATIVE_DAYS_OOS
FROM
(
select T.*,
(select count(*) from STK where S_Date<T.S_Date and OOS=0) GRP
FROM STK T
) T1
ORDER BY S_Date
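The correlated subquery recounts the table once per row; the same group number can be computed in a single pass with a window function. A sketch against the same STK table:

select T1.*,
       OOS*SUM(OOS) OVER (PARTITION BY GRP ORDER BY S_Date) CUMULATIVE_DAYS_OOS
FROM
(
  select T.*,
         -- running count of in-stock rows: it ticks up exactly when stock
         -- arrives, which is what should start a new group
         SUM(1 - OOS) OVER (ORDER BY S_Date) GRP
  FROM STK T
) T1
ORDER BY S_Date

An in-stock row opens its own group here, but multiplying by OOS still forces its CUMULATIVE_DAYS_OOS to 0, matching the expected output.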