Oracle query to find records that took more than 24 hours to process - oracle

I need to find the records in my tables that took more than 24 hours to load into the DW.
For this I have two tables:
Table 1: contains the stats about each load
Table 2: contains the stats about when we received each file to load
I want only the records that took more than 24 hours to load.
The date on which a file was received is in table 2, whereas the date its load finished is in table 1. Table 2 may have more than one entry for each file.
I have written the query below, but it is slow:
SELECT
rcd.file_date,
rcd.recived_on as "Date received On",
rcd.loaded_On "Date Processed On",
to_char(rcd.recived_on,'DY') as "Day",
round((rcd.loaded_On - rcd.recived_on)*24,2) as "time required"
FROM (
SELECT
tbl1.file_date,
(SELECT tbl2.recived_on
FROM ( SELECT recived_on
FROM table2
Where fileName = tbl1.feedName
order by recived_on) tbl2
WHERE rownum = 1) recived_on,
tbl1.loaded_On,
to_char(tbl2.recived_on,'DY'),
round((tbl1.loaded_On - tbl2.recived_on)*24,2)
FROM Table1 tbl1 ,
Table1 tbl2
WHERE
tbl1.id=tbl2.id
AND tbl1.FileState = 'Success'
AND trunc(loaded_On) between '25-Feb-2020' AND '03-Mar-2020'
) rcd
WHERE (rcd.loaded_On - rcd.recived_on)*24 > 24;

I think much of your problem stems from the correlated subquery in the column list of your inner query. Try using an analytic function instead, something like this:
SELECT rcd.file_date,
rcd.recived_on AS "Date received On",
rcd.loaded_On "Date Processed On",
to_char(rcd.recived_on, 'DY') AS "Day",
round((rcd.loaded_On - rcd.recived_on) * 24, 2) AS "time required"
FROM (SELECT tbl1.file_date,
MIN(tbl2.recived_on) OVER (PARTITION BY tbl2.filename) AS recived_on,
tbl1.loaded_On
FROM Table1 tbl1
INNER JOIN Table1 tbl2 ON tbl1.id = tbl2.id
WHERE tbl1.FileState = 'Success'
AND trunc(tbl1.loaded_On) BETWEEN DATE '2020-02-25' AND DATE '2020-03-03') rcd
WHERE (rcd.loaded_On - rcd.recived_on) * 24 > 24;
Also, you were selecting some columns in the inner query and not using them, so I removed them.
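For comparison, here is a minimal sketch of the same idea (take the earliest receipt per file once, then join it to the loads) that can be run locally. It uses SQLite through Python rather than Oracle, so julianday() stands in for Oracle date subtraction, and the table names loads and receipts, their columns and the sample rows are all assumptions made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-ins for Table1 (loads) and table2 (receipts).
    CREATE TABLE loads (feed_name TEXT, loaded_on TEXT, file_state TEXT);
    CREATE TABLE receipts (file_name TEXT, received_on TEXT);
    INSERT INTO loads VALUES
        ('feed_a', '2020-02-27 10:00:00', 'Success'),
        ('feed_b', '2020-02-26 09:00:00', 'Success');
    INSERT INTO receipts VALUES
        ('feed_a', '2020-02-25 08:00:00'),
        ('feed_a', '2020-02-26 08:00:00'),
        ('feed_b', '2020-02-26 01:00:00');
""")

rows = conn.execute("""
    WITH first_receipt AS (
        -- One row per file: the earliest time it was received.
        SELECT file_name, MIN(received_on) AS received_on
        FROM receipts
        GROUP BY file_name
    )
    SELECT l.feed_name,
           r.received_on,
           l.loaded_on,
           ROUND((julianday(l.loaded_on) - julianday(r.received_on)) * 24, 2) AS hours_taken
    FROM loads l
    JOIN first_receipt r ON r.file_name = l.feed_name
    WHERE l.file_state = 'Success'
      AND (julianday(l.loaded_on) - julianday(r.received_on)) * 24 > 24
""").fetchall()

for row in rows:
    print(row)
```

With this sample data only feed_a qualifies, because its first receipt is more than a day before its load finished. Aggregating the receipts before the join also avoids getting one output row per receipt.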

Related

ClickHouse correlated subquery

I have a table where each date contains data snapshots from 15 days back to 50 days ahead. If I run SELECT WHERE date = '2022-01-27' against it, I get rows with block_date from 2022-01-13 to 2022-03-18. I need to build reports per month, so for date = '2022-01-27' I need to see rows with block_date from 2022-01-01 to 2022-01-31. I can find the missing information by going back a day.
I tried the query below to achieve this, but I got an error, and according to these docs: https://clickhouse.com/docs/en/sql-reference/operators/exists/ I can't use EXISTS here. How can I modify my query?
SELECT
`date`,
block_date,
total_plan_volume_grp20
FROM schema.table
UNION ALL
SELECT
addDays(`date`, 1),
block_date,
total_plan_volume_grp20
FROM schema.table t1
where exists (
select true
from schema.table t2
where t2.`date` = addDays(t1.`date`, 1))
UNION ALL
SELECT
addDays(`date`, 2),
block_date,
total_plan_volume_grp20
FROM schema.table t1
where exists (
select true
from schema.table t2
where t2.`date` = addDays(t1.`date`, 2))

Trying to limit the results of a group by query to a range of dates

I'm trying to limit a query's results to the latest 14 distinct PROCESS_DATE dates. To do this, I used a CTE to retrieve the latest and earliest dates of the range.
I would like to plug these two values into a GROUP BY statement so that I get results between the two dates.
But I get this error when I run the query in Oracle:
ORA-00904: "MAX_PROCESS_DATE": invalid identifier
00904. 00000 - "%s: invalid identifier"
*Cause:
*Action:
Error at Line: 24 Column: 60
Line 24, column 60 is the WHERE PROCESS_DATE >= MIN_PROCESS_DATE AND .... MAX_PROCESS_DATE part of the GROUP BY statement.
If this is the wrong way to go about the task, please suggest a better query. If it's on the right track, how do I fix it so that it runs successfully?
WITH cteQUERYRANGE AS
(
SELECT MAX(PROCESS_DATE) AS MAX_PROCESS_DATE, MIN(PROCESS_DATE) AS MIN_PROCESS_DATE FROM
(
SELECT DISTINCT PROCESS_DATE FROM PAYMENTS
ORDER BY PROCESS_DATE DESC
FETCH FIRST 14 ROWS ONLY
)
)
SELECT PROGRAM_CODE AS PROGRAM, BWE_DATE AS "BWE DATE", PROCESS_DATE AS "PROCESSED DATE", GROSS_AMOUNT AS ENTITLEMENTS, FPUC, LWA
FROM PAYMENTS
WHERE PROCESS_DATE >= MIN_PROCESS_DATE AND PROCESS_DATE <= MAX_PROCESS_DATE
GROUP BY PROGRAM_CODE, BWE_DATE, PROCESS_DATE, GROSS_AMOUNT, FPUC, LWA
ORDER BY PROCESS_DATE DESC;
To me, it looks like this (see 3 comments within code):
WITH ctequeryrange AS(SELECT MAX(process_date) AS max_process_date,
MIN(process_date) AS min_process_date
FROM(SELECT DISTINCT process_date
FROM payments
ORDER BY process_date DESC
FETCH FIRST 14 ROWS ONLY)
)
SELECT DISTINCT program_code AS program, --> distinct
bwe_date AS "BWE DATE",
process_date AS "PROCESSED DATE",
gross_amount AS entitlements,
fpuc,
lwa
FROM payments cross join ctequeryrange --> cross join
WHERE process_date >= min_process_date
AND process_date <= max_process_date
ORDER BY process_date DESC; --> no group by
If you want to use columns from a CTE, you have to reference it in the FROM clause somehow. As it returns only one row, a cross join is safe.
As there are no aggregates in your query, there is no need for GROUP BY; DISTINCT will do.
Note, though, that FETCH FIRST 14 ROWS won't make the final query return 14 rows (if that was your intention), as the CTE itself returns only one row.
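The point about referencing the CTE can be tried out with a small sketch. This one runs on SQLite through Python instead of Oracle, so LIMIT replaces FETCH FIRST, the range is cut to the latest 2 dates to keep the data small, and the trimmed-down payments table and its rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, trimmed-down version of the PAYMENTS table.
    CREATE TABLE payments (program_code TEXT, process_date TEXT, gross_amount REAL);
    INSERT INTO payments VALUES
        ('UI', '2021-01-01', 100), ('UI', '2021-01-02', 110),
        ('UI', '2021-01-03', 120), ('UI', '2021-01-04', 130);
""")

rows = conn.execute("""
    WITH query_range AS (
        SELECT MAX(process_date) AS max_process_date,
               MIN(process_date) AS min_process_date
        FROM (SELECT DISTINCT process_date
              FROM payments
              ORDER BY process_date DESC
              LIMIT 2)            -- SQLite spelling of FETCH FIRST 2 ROWS ONLY
    )
    SELECT p.program_code, p.process_date, p.gross_amount
    FROM payments p
    CROSS JOIN query_range q      -- without this join, q's columns are not visible
    WHERE p.process_date BETWEEN q.min_process_date AND q.max_process_date
    ORDER BY p.process_date DESC
""").fetchall()

for row in rows:
    print(row)
```

Removing the CROSS JOIN line makes the query fail because the range columns cannot be resolved, which is the SQLite counterpart of ORA-00904.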

Add next unique value to SQL column

I have two tables which I am trying to join based on two criteria. One of the criteria is that a date from t1 is between a date in t2 and the next date in t2. The other is that the name from t1 matches the name from t2.
I.e. if t2 looks like this:
Record Name Date
1 A1234 2016-01-03 04:58:00
2 A1234 2015-12-15 08:34:00
3 A5678 2016-01-04 03:14:00
4 A1234 2016-01-05 21:06:00
Then:
Any records from t1 for Name A1234 with dates between 2016-01-03 04:58:00 and 2016-01-05 21:06:00 would be joined to record 1.
Any records from t1 for Name A1234 with dates between 2015-12-15 08:34:00 and 2016-01-03 04:58:00 would be joined to record 2
Any records from t1 for A1234 after the date of record 4 would be joined to record 4
Any records from t1 for A5678 would be joined to record 3 because there's only one date.
My initial approach was to use a correlated subquery to find the next date. However, because of the large number of records, I estimated it would take over a year to execute, since it searches all of t2 for the next later date on each iteration. The original SQLite query:
CREATE TABLE outputtable AS SELECT * FROM t1, t2 d
WHERE t1.Name = d.Name AND t1.Date BETWEEN d.Date AND (
SELECT * FROM (
SELECT Date from t2
WHERE t2.Name = d.Name
ORDER BY Date ASC )
WHERE Date > d.Date
LIMIT 1 )
Now, I would like to find the next date only once for all records in t2 and create a new column in t2 that contains the next date. This way, I only search for the next date about 400,000 times instead of 56 billion times, significantly improving my performance.
Thus the output of the query I'm looking for would make t2 look like this:
Record Name Date Next_Date
1 A1234 2016-01-03 04:58:00 2016-01-05 21:06:00
2 A1234 2015-12-15 08:34:00 2016-01-03 04:58:00
3 A5678 2016-01-04 03:14:00 2999-12-31 23:59:59
4 A1234 2016-01-05 21:06:00 2999-12-31 23:59:59
Then I would be able to simply query whether t1.Date is between t2.Date and t2.Next_Date.
How can I build a query that will add the next date to a new column in t2?
Rather than add the new column, you should just be able to use a query like the one below to join the tables:
SELECT
T1.*,
T2_1.*
FROM
T1
INNER JOIN T2 T2_1 ON
T2_1.Name = T1.Name AND
T2_1.some_date < T1.some_date
LEFT OUTER JOIN T2 T2_2 ON
T2_2.Name = T1.Name AND
T2_2.some_date > T2_1.some_date
LEFT OUTER JOIN T2 T2_3 ON
T2_3.Name = T1.Name AND
T2_3.some_date > T2_1.some_date AND
T2_3.some_date < T2_2.some_date
WHERE
T2_3.Name IS NULL
You can do the same with NOT EXISTS, but this method often has better performance.
You can speed up (sub)queries by using proper indexes.
To check which indexes are actually used, use EXPLAIN QUERY PLAN.
Your original query, without any indexes, would be executed by SQLite 3.10.0 like this:
0|0|0|SCAN TABLE t1
0|1|1|SEARCH TABLE t2 AS d USING AUTOMATIC COVERING INDEX (name=?)
0|0|0|EXECUTE CORRELATED SCALAR SUBQUERY 1
1|0|0|SCAN TABLE t2
1|0|0|USE TEMP B-TREE FOR ORDER BY
(The "automatic" index is created temporarily just for this query; the optimizer has estimated that this would still be faster than not using any index.)
In this case, you get the best query plan by indexing all columns used for lookups:
create index i1nd on t1(name, date);
create index i2nd on t2(name, date);
0|0|1|SCAN TABLE t2 AS d
0|1|0|SEARCH TABLE t1 USING INDEX i1nd (name=? AND date>? AND date<?)
0|0|0|EXECUTE CORRELATED SCALAR SUBQUERY 1
1|0|0|SEARCH TABLE t2 USING COVERING INDEX i2nd (name=? AND date>?)
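If you want to check a plan like this from a script, a minimal sketch with Python's sqlite3 module could look as follows. It only covers the inner "next date" lookup on an empty t2 with the index from above; the column types are assumptions, and the exact wording of the plan text differs between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t2 (record INTEGER PRIMARY KEY, name TEXT, date TEXT);
    CREATE INDEX i2nd ON t2(name, date);
""")

# Ask SQLite how it would run the "next later date for this name" lookup.
plan = conn.execute("""
    EXPLAIN QUERY PLAN
    SELECT date FROM t2
    WHERE name = ? AND date > ?
    ORDER BY date
    LIMIT 1
""", ("A1234", "2016-01-03 04:58:00")).fetchall()

# Each plan row has four columns; the last one is the human-readable detail.
details = [row[3] for row in plan]
for line in details:
    print(line)
```

The detail line should mention i2nd, which tells you the lookup is served by the index rather than by a full scan of t2.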
I've used this method successfully on tables with around 1 million rows. Obviously, creating an index that covers this query will help performance.
This approach uses RANK to create a value to join against. After creating the rank in a CTE (I use one for readability; adjust for style or personal preference), a self-join matches rnk to rnk + 1, i.e. the next date.
Here's an example of what the code looks like using your sample values.
IF OBJECT_ID('tempdb..#T2') IS NOT NULL
DROP TABLE #T2
CREATE TABLE #T2
(
Record INT NOT NULL PRIMARY KEY,
Name VARCHAR(10),
[DATE] DATETIME,
)
INSERT INTO #T2
VALUES (1, 'A1234', '2016-01-03 04:58:00'),
(2, 'A1234', '2015-12-15 08:34:00'),
(3, 'A5678', '2016-01-04 03:14:00'),
(4, 'A1234', '2016-01-05 21:06:00');
WITH Rank_Dates
AS (Select *
,rank() OVER(PARTITION BY #t2.name ORDER BY #t2.date DESC) AS rnk
FROM #T2)
select RD1.Record,
RD1.Name,
RD1.DATE,
COALESCE (RD2.DATE, '2999-12-31 23:59:59') AS NEXT_DATE
FROM Rank_Dates RD1
LEFT JOIN Rank_Dates RD2
ON RD1.rnk = RD2.rnk + 1
AND RD1.Name = RD2.Name
ORDER BY RD1.Record -- ORDER BY is optional
;
EDIT: added code output below.
The code above produces the following output.
Record Name DATE NEXT_DATE
1 A1234 2016-01-03 04:58:00.000 2016-01-05 21:06:00.000
2 A1234 2015-12-15 08:34:00.000 2016-01-03 04:58:00.000
3 A5678 2016-01-04 03:14:00.000 2999-12-31 23:59:59.000
4 A1234 2016-01-05 21:06:00.000 2999-12-31 23:59:59.000
As an aside: would using CURRENT_TIMESTAMP in place of the hard-coded '2999-12-31 23:59:59.000' produce a similar result?
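Since the question is about SQLite, another option worth trying is the LEAD window function, which returns the next value within a partition in a single pass. The sketch below assumes SQLite 3.25 or later (window functions are not available before that) and uses the sample rows from the question; it is one possible way to get the Next_Date column, not the only one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t2 (record INTEGER PRIMARY KEY, name TEXT, date TEXT);
    INSERT INTO t2 VALUES
        (1, 'A1234', '2016-01-03 04:58:00'),
        (2, 'A1234', '2015-12-15 08:34:00'),
        (3, 'A5678', '2016-01-04 03:14:00'),
        (4, 'A1234', '2016-01-05 21:06:00');
""")

rows = conn.execute("""
    SELECT record, name, date,
           -- Next date for the same name; the third argument is the
           -- default used when there is no later row.
           LEAD(date, 1, '2999-12-31 23:59:59')
               OVER (PARTITION BY name ORDER BY date) AS next_date
    FROM t2
    ORDER BY record
""").fetchall()

for row in rows:
    print(row)
```

The result can be stored with CREATE TABLE ... AS SELECT and then joined to t1 with a plain BETWEEN on date and next_date.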

recursive cte working very slow

I want to group rows based on certain columns: if the data in these columns is the same in consecutive rows, assign them the same group number, and if it changes, assign a new one. This becomes complex because the same data can appear again in later rows, which must get another group number because they are not contiguous with the previous group.
I used a recursive CTE for this, and it gives the correct output, but it is so slow that iterating over 75k+ rows takes about 15 minutes. The code I used is:
WITH
cte AS (SELECT ROW_NUMBER () OVER (ORDER BY Patient_ID, Opnamenummer, SPECIALISMEN, Opnametype, OntslagDatumTijd) AS RowNumber,
Opnamenummer, Patient_ID, AfdelingsCode, Opnamedatum, Opnamedatumtijd, Ontslagdatum, Ontslagdatumtijd, IsSpoedopname, OpnameType, IsNuOpgenomen, SpecialismeCode, Specialismen
FROM t_opnames)
SELECT * INTO #ttt FROM cte;
WITH cte2 AS (SELECT TOP 1 RowNumber,
1 AS GroupNumber,
Opnamenummer, Patient_ID, AfdelingsCode, Opnamedatum, Opnamedatumtijd, Ontslagdatum, Ontslagdatumtijd, IsSpoedopname, OpnameType, IsNuOpgenomen, SpecialismeCode, Specialismen
FROM #ttt
ORDER BY RowNumber
UNION ALL
SELECT c1.RowNumber,
CASE
WHEN c2.Afdelingscode <> c1.Afdelingscode
OR c2.Patient_ID <> c1.Patient_ID
OR c2.Opnametype <> c1.Opnametype
THEN c2.GroupNumber + 1
ELSE c2.GroupNumber
END AS GroupNumber,
c1.Opnamenummer,c1.Patient_ID,c1.AfdelingsCode,c1.Opnamedatum,c1.Opnamedatumtijd,c1.Ontslagdatum,c1.Ontslagdatumtijd,c1.IsSpoedopname,c1.OpnameType,c1.IsNuOpgenomen, SpecialismeCode, Specialismen
FROM cte2 c2
JOIN #ttt c1 ON c1.RowNumber = c2.RowNumber + 1
)
SELECT *
FROM cte2
OPTION (MAXRECURSION 0) ;
DROP TABLE #ttt
I tried to improve performance by putting the output of the first CTE in a temp table. That helped, but it is still too slow. How can I speed this code up so that it runs in under 10 seconds for 75k+ records? The output before cancelling the query is: Screenshot. As visible from the image, the data is the same in columns Afdelingscode, Patient_ID and Opnametype in RowNumber 3, 5 and 6, but they have different GroupNumber values because those rows are not consecutive.
Without data it's not easy to test, but I would first try dropping the temporary table and using both CTEs from start to end, i.e.:
;WITH
cte AS (...),
cte2 AS (...)
select * from cte2
OPTION (MAXRECURSION 0);
Without knowing the indexes etc.: for instance, you do a lot of ordering in the first CTE. Is this supported by indexes (or one multicolumn index) or not?
Without the data I can't experiment with it, but looking at this:
CASE
WHEN c2.Afdelingscode <> c1.Afdelingscode
OR c2.Patient_ID <> c1.Patient_ID
OR c2.Opnametype <> c1.Opnametype
THEN c2.GroupNumber + 1
ELSE c2.GroupNumber
I would look at the PARTITION BY clause of ROW_NUMBER.
So try running this:
WITH
cte AS (
SELECT ROW_NUMBER () OVER (PARTITION BY Afdelingscode , Patient_ID ,Opnametype ORDER BY Patient_ID, Opnamenummer, SPECIALISMEN, Opnametype, OntslagDatumTijd ) AS RowNumber,
Opnamenummer, Patient_ID, AfdelingsCode, Opnamedatum, Opnamedatumtijd, Ontslagdatum, Ontslagdatumtijd, IsSpoedopname, OpnameType, IsNuOpgenomen
FROM t_opnames)
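A different technique from the one above, and one that avoids recursion altogether, is the usual "gaps and islands" pattern: flag each row whose key columns differ from the previous row with LAG, then take a running SUM of the flags as the group number. The sketch below shows it on SQLite (3.25 or later) through Python; the opnames table, its reduced column set and the sample rows are assumptions for illustration, and in SQL Server the same LAG and SUM ... OVER constructs would be used with the real ordering columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, reduced version of t_opnames; row_no plays the role of RowNumber.
    CREATE TABLE opnames (row_no INTEGER PRIMARY KEY, patient_id INTEGER,
                          afdeling TEXT, opnametype TEXT);
    INSERT INTO opnames VALUES
        (1, 1, 'A', 'X'), (2, 1, 'A', 'X'), (3, 1, 'B', 'X'),
        (4, 1, 'A', 'X'), (5, 2, 'A', 'X'), (6, 2, 'A', 'X');
""")

rows = conn.execute("""
    WITH flagged AS (
        SELECT row_no,
               -- 0 when all key columns equal the previous row, else 1.
               -- The first row has no previous row, so it also gets 1.
               CASE WHEN patient_id = LAG(patient_id) OVER w
                     AND afdeling   = LAG(afdeling)   OVER w
                     AND opnametype = LAG(opnametype) OVER w
                    THEN 0 ELSE 1 END AS is_new_group
        FROM opnames
        WINDOW w AS (ORDER BY row_no)
    )
    SELECT row_no,
           SUM(is_new_group) OVER (ORDER BY row_no) AS group_number
    FROM flagged
    ORDER BY row_no
""").fetchall()

for row in rows:
    print(row)
```

Row 4 has the same values as rows 1 and 2 but still gets a new group number, because row 3 sits between them. Each window function needs only one ordered pass over the data, so this usually scales far better than stepping through the rows one recursion level at a time.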

Aggregate only new rows from source table

I have a source table with a timestamp column (YYYY.MM.DD HH24:MI:SS) and a target table with rows aggregated on a daily basis (date column: YYYY.MM.DD).
My problem is: how do I bring new data from source to target and aggregate it?
I tried:
select
a.Sales,
trunc(a.timestamp,'DD') as TIMESTAMP,
count(1) as COUNT
from
tbl_Source a
where trunc(a.timestamp,'DD') > nvl((select MAX(b.TIME_TO_DAY)from tbl_target b), to_date('01.01.1975 00:00:00','dd.mm.yyyy hh24:mi:ss'))
group by a.sales,
trunc(a.Timestamp,'DD')
The problem with that is: when I have a row with timestamp '2013.11.15 00:01:32' and the max day in the target is the 14th of November, it will only aggregate the 15th. If I used >= instead of >, some rows would get loaded twice.
It looks like you are looking for a merge statement: If the day is already present in tbl_target then update the count else insert the record.
merge into tbl_target dest
using
(
select sales, trunc(timestamp) as theday , count(*) as sales_count
from tbl_Source
where trunc(timestamp) >= ( select nvl(max(time_to_day),to_date('01.01.1975','dd.mm.yyyy')) from tbl_target )
group by sales, trunc(timestamp)
) src
on (src.theday = dest.time_to_day)
when matched then update set
dest.sales_count = src.sales_count
when not matched then
insert (time_to_day, sales_count)
values (src.theday, src.sales_count)
;
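To see the "update the day if it exists, otherwise insert it" behaviour in action without an Oracle instance, here is a rough sketch on SQLite (3.24 or later) through Python. SQLite has no MERGE, so INSERT ... ON CONFLICT DO UPDATE is swapped in for it. The column ts, the primary key on time_to_day and the sample rows are assumptions, and the aggregation is simplified to one row per day.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical minimal schema; ts stands in for the timestamp column.
    CREATE TABLE tbl_source (sales TEXT, ts TEXT);
    CREATE TABLE tbl_target (time_to_day TEXT PRIMARY KEY, sales_count INTEGER);
    INSERT INTO tbl_source VALUES
        ('a', '2013-11-14 10:00:00'),
        ('a', '2013-11-15 00:01:32');
""")

# Re-aggregate every day from the latest loaded day onwards and upsert it.
REFRESH = """
    INSERT INTO tbl_target (time_to_day, sales_count)
    SELECT date(ts), COUNT(*)
    FROM tbl_source
    WHERE date(ts) >= COALESCE((SELECT MAX(time_to_day) FROM tbl_target), '1975-01-01')
    GROUP BY date(ts)
    ON CONFLICT (time_to_day) DO UPDATE SET sales_count = excluded.sales_count
"""

conn.execute(REFRESH)   # first load: the 14th and the 15th, one row each
conn.execute("INSERT INTO tbl_source VALUES ('a', '2013-11-15 09:30:00')")
conn.execute(REFRESH)   # the 15th is recounted and overwritten, not duplicated
rows = conn.execute("SELECT * FROM tbl_target ORDER BY time_to_day").fetchall()

for row in rows:
    print(row)
```

Because the latest day is always recomputed from the source, using >= no longer loads rows twice: the existing row for that day is replaced with the fresh count.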
As far as I understand your question, you need to get everything since the last reload of the target table.
The problem is that you need this date, but it is truncated during the update.
If my guess is correct, the only option is to store the reload date in an additional column, because there is no way to recover it from the data presented here.
About your query:
count(*) and count(1) perform the same (shown many times, at least in versions 10 and 11), so prefer count(*); count(1) looks really ugly.
Prefer coalesce to nvl: coalesce short-circuits, whereas nvl always evaluates both arguments, so coalesce can be faster when the second argument is expensive.
I would write your query like this:
with t as (select max(b.time_to_day) mx from tbl_target b)
select a.sales,trunc(a.timestamp,'dd') as timestamp,count(*) as count
from tbl_source a,t
where trunc(a.timestamp,'dd') > t.mx or t.mx is null
group by a.sales,trunc(a.timestamp,'dd')
Does this fit your needs:
WHERE trunc(a.timestamp,'DD') > nvl((select MAX(b.TIME_TO_DAY) + 1 - 1/(24*60*60) from tbl_target b), to_date('01.01.1975 00:00:00','dd.mm.yyyy hh24:mi:ss'))
i.e. instead of 2013-11-15 00:00:00, compare to 2013-11-15 23:59:59
Update:
This one?
WHERE trunc(a.timestamp,'DD') BETWEEN nvl((select MAX(b.TIME_TO_DAY) from ...) AND nvl((select MAX(b.TIME_TO_DAY) + 1 - 1/(24*60*60) from ...)
