I have a table similar to the below. My goal is to remove groups for each date where the status moves to either 'Cancelled' or 'Failed', while retaining groups per day that contain other status changes.
Group  Status              Date
A      Pending             2021-01-01 08:00:00
A      Cancelled           2021-01-01 13:00:00
A      Pending             2021-01-02 08:00:00
A      Failed              2021-01-02 13:00:00
A      Pending             2021-01-03 08:00:00
A      Pending Settlement  2021-01-03 13:00:00
A      Pending             2021-01-04 08:00:00
A      Settled             2021-01-04 13:00:00
B      Pending             2021-01-01 08:00:00
B      Cancelled           2021-01-01 13:00:00
B      Pending             2021-01-02 08:00:00
B      Failed              2021-01-02 13:00:00
B      Pending             2021-01-03 08:00:00
B      Pending Settlement  2021-01-03 13:00:00
B      Pending             2021-01-04 08:00:00
B      Settled             2021-01-04 13:00:00
My first attempt was something like:
select GROUP, STATUS, DATE
from TABLE TBL
, (
select GROUP, STATUS, DATE
from TABLE
where STATUS in ('Cancelled','Failed')
) FLAG
where (TBL.GROUP <> FLAG.GROUP and TBL.DATE <> FLAG.DATE)
;
However, it seems to be taking exceptionally long (>10 mins), even when applying date filters. EDIT: My expected output is shown below:
Group  Status              Date
A      Pending             2021-01-03 08:00:00
A      Pending Settlement  2021-01-03 13:00:00
A      Pending             2021-01-04 08:00:00
A      Settled             2021-01-04 13:00:00
B      Pending             2021-01-03 08:00:00
B      Pending Settlement  2021-01-03 13:00:00
B      Pending             2021-01-04 08:00:00
B      Settled             2021-01-04 13:00:00
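You may use the last_value() window function to get the last status of each group per day and then apply your filter against it. The window frame has to be extended to the end of the partition; with the default frame, which ends at the current row, last_value() would just return each row's own status.
SELECT "GROUP",
       "STATUS",
       "DATE"
FROM (SELECT "GROUP",
             "STATUS",
             "DATE",
             last_value("STATUS") OVER (PARTITION BY "GROUP",
                                                     trunc("DATE")
                                        ORDER BY "DATE" ASC
                                        ROWS BETWEEN UNBOUNDED PRECEDING
                                                 AND UNBOUNDED FOLLOWING) lv
      FROM "TABLE") x
WHERE lv NOT IN ('Cancelled',
                 'Failed');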
Edit:
To filter out days where the status was 'Cancelled' or 'Failed' at any time during the day, you can use, for example, the windowed version of count() with a CASE expression that gives a non-NULL value when the status is 'Cancelled' or 'Failed', and NULL (the default) otherwise.
SELECT "GROUP",
       "STATUS",
       "DATE"
FROM (SELECT "GROUP",
             "STATUS",
             "DATE",
             count(CASE
                     WHEN "STATUS" IN ('Cancelled',
                                       'Failed') THEN
                       0
                   END) OVER (PARTITION BY "GROUP",
                                           trunc("DATE")) c
      FROM "TABLE") x
WHERE c = 0;
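To check the filtering logic without a database, here is a minimal Python sketch of the same idea: count the 'Cancelled'/'Failed' rows per (group, calendar day) and keep only rows whose count is zero. The sample rows mirror the table in the question; the function name keep_clean_days is made up for this illustration, and this is not a replacement for the SQL above.

```python
from collections import defaultdict
from datetime import datetime

BLOCKED = {"Cancelled", "Failed"}

# Rebuild the sample data from the question: two rows per group per day.
DAILY = [("01", "Cancelled"), ("02", "Failed"),
         ("03", "Pending Settlement"), ("04", "Settled")]
ROWS = []
for group in "AB":
    for day, final_status in DAILY:
        ROWS.append((group, "Pending", f"2021-01-{day} 08:00:00"))
        ROWS.append((group, final_status, f"2021-01-{day} 13:00:00"))


def day_of(timestamp):
    """Truncate a timestamp string to its calendar day, like trunc("DATE")."""
    return datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S").date()


def keep_clean_days(rows):
    """Keep rows whose (group, day) partition has no blocked status."""
    blocked_count = defaultdict(int)
    for group, status, ts in rows:
        # Mirrors count(CASE WHEN status IN (...) THEN 0 END) OVER (PARTITION BY ...)
        blocked_count[(group, day_of(ts))] += status in BLOCKED
    return [(g, s, ts) for g, s, ts in rows
            if blocked_count[(g, day_of(ts))] == 0]


result = keep_clean_days(ROWS)
for row in result:
    print(row)
```

Only the rows for 2021-01-03 and 2021-01-04 remain, for both groups.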
Related
Imagine a table with the following rows
order  date
1      2021-01-01 00:00:00
2      2021-01-01 01:00:00
3      2021-01-01 02:00:00
4      2021-01-01 03:00:00
5      2021-01-02 00:00:00
6      2021-01-03 00:00:00
7      2021-01-04 00:00:00
8      2021-01-04 01:00:00
9      2021-01-04 02:00:00
10     2021-01-06 00:00:00
I am using a cursor to loop through a database result one by one:
cursor = conn.parse("SELECT * FROM table ORDER BY date ASC")
while result = cursor.fetch_hash()
# Do some processing on the result
# ...
end
I want to store the date of the last processed row in a variable, to be used later in a different part of the code. My current solution is this:
last_date = ""
cursor = conn.parse("SELECT * FROM table ORDER BY date ASC")
while result = cursor.fetch_hash()
last_date = result["date"]
# Do some processing on the result
# ...
end
last_date is updated on each iteration, and once the loop ends it holds 2021-01-06 00:00:00, which is what I want.
Is there a better or more elegant way of doing this? I can't think of an alternative.
I have a table of employees with their birth dates stored as strings in one column.
I cannot modify the table, so I created a view that converts the birth date to a real date (using TO_DATE).
Now I would like to list the employees whose birthday fell within the last 15 days and the employees whose birthday falls within the next 15 days.
So the comparison should be based on the day and the month only.
I managed to get, for example, all employees born in April with EXTRACT, but, as you have probably guessed, when I run the query on 25 April I would also like to see the upcoming birthdays in May.
How can I do that (Oracle 12c)?
Thank you 🙂
Using the hiredate column in table scott.emp for testing:
select empno, ename, hiredate
from scott.emp
where add_months(trunc(hiredate),
12 * round(months_between(sysdate, hiredate) / 12))
between trunc(sysdate) - 15 and trunc(sysdate) + 15
;
EMPNO ENAME HIREDATE
---------- ---------- ----------
7566 JONES 04/02/1981
7698 BLAKE 05/01/1981
7788 SCOTT 04/19/1987
This will produce the wrong result in the following situation: if someone's birthday is Feb. 28 in a non-leap year, their birthday in a leap year (calculated with the ADD_MONTHS function in the query) will be considered to be Feb. 29. So, they will be excluded if running the query on, say, Feb. 13 2024 (even though they should be included), and they will be included if running the query on March 15 2024 (even though they should be excluded). If you can live with this (those people will be recognized in the wrong window once every four years), then this may be all you need. Otherwise that situation will require further tweaking.
For people born on Feb. 29 (in a leap year, obviously), their birthday in a non-leap-year is considered to be Feb. 28. With this convention, the query will always work correctly for them. Whether this convention is appropriate in your locale, only your business users can tell you. (Local laws and regulations may matter, too - depending on what you are using this for.)
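The edge cases above can be reproduced with a small Python sketch. It is only an illustration under stated assumptions: anniversary() imitates the last-day-of-month behaviour of ADD_MONTHS for whole years, and in_window() checks the previous, current and next year as a stand-in for round(months_between(...) / 12). Both function names are made up for this example.

```python
import calendar
from datetime import date, timedelta


def anniversary(birth: date, year: int) -> date:
    """Birthday in `year`, imitating ADD_MONTHS: last day of month maps to last day of month."""
    last_src = calendar.monthrange(birth.year, birth.month)[1]
    last_dst = calendar.monthrange(year, birth.month)[1]
    day = last_dst if birth.day == last_src else min(birth.day, last_dst)
    return date(year, birth.month, day)


def in_window(birth: date, today: date, days: int = 15) -> bool:
    """True if the nearest computed anniversary is within `days` of `today`."""
    window = timedelta(days=days)
    # Assumption: trying the years around `today` is equivalent to picking
    # the nearest anniversary via round(months_between(...) / 12).
    return any(abs(anniversary(birth, today.year + k) - today) <= window
               for k in (-1, 0, 1))


born_feb_28 = date(2001, 2, 28)  # Feb. 28 in a non-leap year
born_feb_29 = date(2000, 2, 29)  # Feb. 29 in a leap year

print(anniversary(born_feb_28, 2024))            # shifted to the end of February 2024
print(in_window(born_feb_28, date(2024, 2, 13)))  # missed although Feb. 28 is 15 days away
print(in_window(born_feb_28, date(2024, 3, 15)))  # matched although Feb. 28 is 16 days back
print(anniversary(born_feb_29, 2023))            # treated as Feb. 28 in a non-leap year
```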
You can use ddd format model:
DDD - Day of year (1-366).
For example:
SQL> with v(dt) as (
2 select date'2020-01-01'+level-1 from dual connect by date'2020-01-01'+level-1<date'2021-01-01'
3 )
4 select *
5 from v
6 where
7 not abs(
8 to_number(to_char(date'&dt','ddd'))
9 -to_number(to_char(dt ,'ddd'))
10 ) between 15 and 350;
Enter value for dt: 2022-01-03
DT
-------------------
2020-01-01 00:00:00
2020-01-02 00:00:00
2020-01-03 00:00:00
2020-01-04 00:00:00
2020-01-05 00:00:00
2020-01-06 00:00:00
2020-01-07 00:00:00
2020-01-08 00:00:00
2020-01-09 00:00:00
2020-01-10 00:00:00
2020-01-11 00:00:00
2020-01-12 00:00:00
2020-01-13 00:00:00
2020-01-14 00:00:00
2020-01-15 00:00:00
2020-01-16 00:00:00
2020-01-17 00:00:00
2020-12-19 00:00:00
2020-12-20 00:00:00
2020-12-21 00:00:00
2020-12-22 00:00:00
2020-12-23 00:00:00
2020-12-24 00:00:00
2020-12-25 00:00:00
2020-12-26 00:00:00
2020-12-27 00:00:00
2020-12-28 00:00:00
2020-12-29 00:00:00
2020-12-30 00:00:00
2020-12-31 00:00:00
30 rows selected.
NB: This example doesn't analyze leap years.
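For comparison, the same day-of-year test can be sketched in Python. This only mirrors the condition in the query above ("not between 15 and 350" on the absolute difference of the day numbers), so it shares the same limitation regarding leap years; near_by_day_of_year is a name invented for this illustration.

```python
from datetime import date, timedelta


def near_by_day_of_year(a: date, b: date) -> bool:
    """True when the day-of-year numbers differ by less than 15 or by more than 350."""
    diff = abs(a.timetuple().tm_yday - b.timetuple().tm_yday)
    return not (15 <= diff <= 350)


reference = date(2022, 1, 3)
# Every day of 2020, like the CONNECT BY generator in the query.
days_2020 = [date(2020, 1, 1) + timedelta(days=i) for i in range(366)]
matches = [d for d in days_2020 if near_by_day_of_year(reference, d)]

print(len(matches), matches[0], matches[-1])
```

The wrap-around at the year boundary is what the upper bound of 350 takes care of: late-December dates have a large day number but are still close to early January.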
Similar to mathguy's answer, but translating the current date back to the birth year (rather than translating the birth year forwards):
SELECT *
FROM employees
WHERE birth_date BETWEEN ADD_MONTHS(
TRUNC(SYSDATE),
ROUND(MONTHS_BETWEEN(birth_date, SYSDATE)/12)*12
) - INTERVAL '15' DAY
AND ADD_MONTHS(
TRUNC(SYSDATE),
ROUND(MONTHS_BETWEEN(birth_date, SYSDATE)/12)*12
) + INTERVAL '15' DAY;
Then, for the sample data:
CREATE TABLE employees (name, birth_date) AS
SELECT 'Alice', DATE '2020-02-28' FROM DUAL UNION ALL
SELECT 'Betty', DATE '2020-02-29' FROM DUAL UNION ALL
SELECT 'Carol', DATE '2021-02-28' FROM DUAL UNION ALL
SELECT 'Debra', DATE '2022-04-28' FROM DUAL UNION ALL
SELECT 'Emily', DATE '2021-03-30' FROM DUAL UNION ALL
SELECT 'Fiona', DATE '2021-03-31' FROM DUAL;
If today's date is 2022-04-16 then the output is:
NAME   BIRTH_DATE
Debra  28-APR-22
If today's date is 2022-03-15 then the output is:
NAME   BIRTH_DATE
Betty  29-FEB-20
Carol  28-FEB-21
Emily  30-MAR-21
That is, on 15 March the query matches birth dates from 28th February to 30th March in a non-leap year and from 29th February to 30th March in a leap year.
This is presentation table:
ID PRESENTATIONDAY PRESENTATIONSTART PRESENTATIONEND PRESENTATIONSTARTDATE PRESENTATIONENDDATE
622 Monday 12:00:00 02:00:00 01-05-2016 04-06-2016
623 Tuesday 12:00:00 02:00:00 01-05-2016 04-06-2016
624 Wednesday 08:00:00 10:00:00 01-05-2016 04-06-2016
625 Thursday 10:00:00 12:00:00 01-05-2016 04-06-2016
I would like to insert availabledate in schedule table. This is my current query :
insert into SCHEDULE (studentID,studentName,projectTitle,supervisorID,
supervisorName,examinerID,examinerName,exavailableID,
availableday,availablestart,availableend,
availabledate) -- PROBLEM STARTS HERE
values (?,?,?,?,?,?,?,?,?,?,?,?);
The value of availabledate is retrieved based on the exavailableID. For example, if exavailableID = 2, then availableday = Monday, availablestart = 12pm and availableend = 2pm.
The dates may only be chosen between PRESENTATIONSTARTDATE and PRESENTATIONENDDATE from the presentation table.
In the presentation table, PRESENTATIONDAY, PRESENTATIONDATESTART and PRESENTATIONDATEEND are matched against availableday, availablestart and availableend to get a list of all possible dates.
This is the query to get list of all possible dates based on particular days:
select
A.PRESENTATIONID,
A.PRESENTATIONDAY,
A.PRESENTATIONDATESTART+delta LIST_DATE
from
PRESENTATION A,
(
select level-1 as delta
from dual
connect by level-1 <= (
select max(PRESENTATIONDATEEND- PRESENTATIONDATESTART) from PRESENTATION
)
)
where A.PRESENTATIONDATESTART+delta <= A.PRESENTATIONDATEEND
and
a.presentationday = trim(to_char(A.PRESENTATIONDATESTART+delta, 'Day'))
order by 1,2,3;
This query result is:
622 Monday 02-05-2016 12:00:00
...
622 Monday 30-05-2016 12:00:00
623 Tuesday 03-05-2016 12:00:00
...
623 Tuesday 31-05-2016 12:00:00
624 Wednesday 04-05-2016 12:00:00
...
624 Wednesday 01-06-2016 12:00:00
625 Thursday 05-05-2016 12:00:00
...
625 Thursday 02-06-2016 12:00:00
The dates from the SELECT query should be assigned automatically when inserting into the schedule table. However, each date can be used only 4 times. Once a date has been used 4 times, the next date is used. For example, for Monday, it moves from '02-05-2016' to '09-05-2016'.
How can I combine these two queries (INSERT and SELECT) to get a result like this:
StudentName projectTitle SupervisorID ExaminerID availableday availablestart availableend availabledate
abc Hello 1024 1001 MONDAY 12.00pm 2.00pm 02-05-2016
def Hi 1024 1001 MONDAY 12.00pm 2.00pm 02-05-2016
ghi Hey 1002 1004 MONDAY 12.00pm 2.00pm 02-05-2016
xxx hhh 1020 1011 MONDAY 12.00pm 2.00pm 02-05-2016
jkl hhh 1027 1010 MONDAY 12.00pm 2.00pm 09-05-2016
try ttt 1001 1011 MONDAY 12.00pm 2.00pm 09-05-2016
654 bbb 1007 1012 MONDAY 12.00pm 2.00pm 09-05-2016
gyg 888 1027 1051 MONDAY 12.00pm 2.00pm 09-05-2016
yyi 333 1004 1022 TUESDAY 12.00pm 2.00pm 03-05-2016
fff 111 1027 1041 TUESDAY ..
ggg 222 1032 1007 TUESDAY .. .. .. ..
hhh 444 1007 1001 TUESDAY 12.00pm 2.00pm 03-05-2016
and so on :)
In short, I would like to use the list of dates from the presentation table, based on the day, start time and end time, in the insert query, where each date is used only 4 times. Thank you!
I am not sure this kind of syntax works with oracle (and have no good way to check), but changing the select part of insert like this may or may not work.
select
A.PRESENTATIONID,
A.PRESENTATIONDAY,
A.PRESENTATIONDATESTART+delta LIST_DATE
from
PRESENTATION A,
(
select level-1 as delta
from dual
connect by level-1 <= (
select max(PRESENTATIONDATEEND - PRESENTATIONDATESTART) from PRESENTATION
)
),
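--MIGHT NEED ADDITIONAL LOGIC FOR THE EXAVAILABLEID COMPARISON
--LATERAL (Oracle 12c and later) lets this inline view refer to A; count(S.*) is not valid in Oracle, so count(*) is used
LATERAL (SELECT count(*) as counter FROM SCHEDULE S WHERE S.EXAVAILABLEID=A.PRESENTATIONID) C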
where A.PRESENTATIONDATESTART+delta <= A.PRESENTATIONDATEEND
and
a.presentationday = trim(to_char(A.PRESENTATIONDATESTART+delta, 'Day'))
and
C.counter<4
order by 1,2,3;
EDIT: Changed the operator. Had >= before. Placed the WHERE check at the right place. Deleted aliases.
EDIT2: changed the syntax to where the counter select statement is a part of the from clause.
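The allocation rule from the question (list the matching weekdays in the presentation period, then use each date at most 4 times before moving on) can be sketched outside the database as follows. This is a plain Python illustration, not Oracle code; the function names dates_for_weekday and assign_dates are invented for the example, and the student names are taken from the expected output in the question.

```python
from datetime import date, timedelta


def dates_for_weekday(start: date, end: date, weekday: int) -> list:
    """All dates in [start, end] that fall on `weekday` (Monday is 0)."""
    found = []
    current = start
    while current <= end:
        if current.weekday() == weekday:
            found.append(current)
        current += timedelta(days=1)
    return found


def assign_dates(students: list, candidate_dates: list, per_date: int = 4) -> list:
    """Give each student the earliest date that has been used fewer than `per_date` times."""
    schedule = []
    for index, student in enumerate(students):
        slot = index // per_date  # every block of `per_date` students shares one date
        if slot >= len(candidate_dates):
            raise ValueError("not enough dates for all students")
        schedule.append((student, candidate_dates[slot]))
    return schedule


# Presentation period 01-05-2016 to 04-06-2016, Monday slot.
mondays = dates_for_weekday(date(2016, 5, 1), date(2016, 6, 4), 0)
students = ["abc", "def", "ghi", "xxx", "jkl", "try", "654", "gyg"]
schedule = assign_dates(students, mondays)

for student, day in schedule:
    print(student, day.strftime("%d-%m-%Y"))
```

The first four students receive 02-05-2016 and the next four receive 09-05-2016, matching the expected result in the question.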
I have a scenario where in I have to aggregate data for a dynamic 24 hour period.
For example, if a user selects the FROM date as Jan 05 2016 8:00 AM and the TO date as Jan 10 2016 2:00 AM, the data in the output should be aggregated from Jan 05 2016 8:00 AM to Jan 06 2016 7:59 AM as 1 day (Jan 05 2016), and so on:
Jan 5 2016 - Jan 5 2016 8:00 AM to Jan 6 2016 7:59 AM
Jan 6 2016 - Jan 6 2016 8:00 AM to Jan 7 2016 7:59 AM
Jan 7 2016 - Jan 7 2016 8:00 AM to Jan 8 2016 7:59 AM
Jan 8 2016 - Jan 8 2016 8:00 AM to Jan 9 2016 7:59 AM
Jan 9 2016 - Jan 9 2016 8:00 AM to Jan 10 2016 2:00 AM
To achieve this, I subtracted 8 hours from the date column in the fact table and joined it to the Date Dimension. The query looks like this:
SELECT D.DAY_FMT,SUM(F.MEASURE) from FACT F
INNER JOIN DATES D ON
to_number(to_char((F.DATESTIME - 0.3333333),'YYYYMMDD')) = D.DATEID
WHERE F.DATESTIME between to_timestamp ('05-Jan-16 08.00.00.000000000 AM')
and to_timestamp ('10-Jan-16 02.00.00.000000000 AM')
GROUP BY D.DAY_FMT
Note 1: If the From Time is 06:00 AM then we would be subtracting 0.25 (days) instead of 0.3333333 (days)
Note 2: The Fact table has billions of rows.
Is there any way to improve the performance of the above query?
In Oracle the date and the time are stored together. You don't need to join on equality, and you don't need to wrap the date within any functions. (And why timestamps?) Having all the computations (if any are even needed) on the "right hand side" of conditions means the computations are done just once, the same for every row, instead of separately for each row.
select f.day_fmt, sum(f.measure) as some_col_name
from fact f
where f.datestime >= to_date('05-Jan-16 08:00:00 AM', 'dd-Mon-yy hh:mi:ss AM')
and f.datestime < to_date('10-Jan-16 02:00:00 AM', 'dd-Mon-yy hh:mi:ss AM')
group by f.day_fmt;
Edit: Based on further clarification from OP - suppose the data is in table "fact" - with columns day_fmt, measure, and datestime. The assignment is to aggregate (sum) measure, grouped by day_fmt and also grouped by 24-hour intervals, starting from a date-time chosen by the user and ending with a date-time chosen by the user. Solution below.
with user_input (sd, ed) as (
select to_date('05-Jan-16 08:00:00 AM', 'dd-Mon-yy hh:mi:ss AM'),
to_date('10-Jan-16 02:00:00 AM', 'dd-Mon-yy hh:mi:ss AM') from dual
),
prep (dt) as (
select (select sd from user_input) + level - 1 from dual
connect by level < (select ed - sd from user_input) + 1
union
select ed from user_input
),
dates (from_datetime, to_datetime) as (
select dt, lead(dt) over (order by dt) from prep
)
select f.day_fmt, d.from_datetime, d.to_datetime, sum(f.measure) as some_column_name
from fact f inner join dates d
on f.datestime >= d.from_datetime and f.datestime < d.to_datetime
where d.to_datetime is not null
group by f.day_fmt, d.from_datetime, d.to_datetime
order by f.day_fmt, d.from_datetime;
By not using function calls wrapped around f.datestime, you can take advantage of an index defined on this column of the "fact" table (an index you already have or one you can create now, to help speed up your queries).
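To make the interval logic concrete, here is a minimal Python sketch of how the 24-hour buckets are derived from the user's start and end date-times: each bucket is half-open (start included, end excluded), and the last one is cut off at the user's end date-time. The function name day_buckets is an assumption made for this illustration.

```python
from datetime import datetime, timedelta


def day_buckets(start: datetime, end: datetime) -> list:
    """Split [start, end) into consecutive 24-hour intervals; the last may be shorter."""
    buckets = []
    lower = start
    while lower < end:
        upper = min(lower + timedelta(days=1), end)
        buckets.append((lower, upper))
        lower = upper
    return buckets


buckets = day_buckets(datetime(2016, 1, 5, 8, 0), datetime(2016, 1, 10, 2, 0))
for lower, upper in buckets:
    # A fact row belongs to this bucket when lower <= datestime < upper.
    print(lower, "->", upper)
```

For the example input this yields five buckets, the last one running from Jan 9 2016 8:00 AM to Jan 10 2016 2:00 AM.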
I have below data:
Order
order_id order_name order_date order_status
1 iphone 20130102 13:20:00 cancelled
1 blackberry 20130102 13:00:00 cancelled
1 ipad 20130102 13:00:00 cancelled
Person
person_id person_name order_id
1 harshini 1
I want to retrieve the data below when I query based on order_date between 20130102 13:00:00 to 2013 13:20:00. It means the last cancelled order.
person_name order_name order_date
harshini blackberry 20130102 13:00:00
Just try this:
select p.person_name, o.order_name, o.order_status
from order_1 o, person p
where p.order_id = o.order_id
and o.order_date = (select max(order_date) from order_1)
select person_name, order_name, order_date
from(
select
o.order_id, o.order_name, o.order_date, o.order_status,
p.person_id, p.person_name, p.order_id,
row_number() over (/*partition by person_id*/ order by order_date desc) as rnk
from order_1 o join person p on (o.order_id = p.order_id) -- ORDER is a reserved word, so the table is assumed to be named order_1
where o.order_status = 'cancelled'
and o.order_date between
to_date('20130102 13:00:00','yyyymmdd hh24:mi:ss') and
to_date('20130102 13:20:00','yyyymmdd hh24:mi:ss')
)
where rnk = 1;
But see the comments, you have some issues in your design.