recursive cte working very slow - performance

I want to Group the rows based on certain columns, i.e. if data is same in these columns in continuous rows, then assign same Group Number to them, and if its changed, assign new one. This become complex as the same data in the columns could appear later in some other rows, so they have to be given another Group Number as they are not in continuous rows with previous group.
I used cte for this purpose and it is giving correct output also, but is so slow that iterating over 75k+ rows takes about 15 minutes. The code I used is:
WITH
cte AS (SELECT ROW_NUMBER () OVER (ORDER BY Patient_ID, Opnamenummer, SPECIALISMEN, Opnametype, OntslagDatumTijd) AS RowNumber,
Opnamenummer, Patient_ID, AfdelingsCode, Opnamedatum, Opnamedatumtijd, Ontslagdatum, Ontslagdatumtijd, IsSpoedopname, OpnameType, IsNuOpgenomen, SpecialismeCode, Specialismen
FROM t_opnames)
SELECT * INTO #ttt FROM cte;
WITH cte2 AS (SELECT TOP 1 RowNumber,
1 AS GroupNumber,
Opnamenummer, Patient_ID, AfdelingsCode, Opnamedatum, Opnamedatumtijd, Ontslagdatum, Ontslagdatumtijd, IsSpoedopname, OpnameType, IsNuOpgenomen, SpecialismeCode, Specialismen
FROM #ttt
ORDER BY RowNumber
UNION ALL
SELECT c1.RowNumber,
CASE
WHEN c2.Afdelingscode <> c1.Afdelingscode
OR c2.Patient_ID <> c1.Patient_ID
OR c2.Opnametype <> c1.Opnametype
THEN c2.GroupNumber + 1
ELSE c2.GroupNumber
END AS GroupNumber,
c1.Opnamenummer,c1.Patient_ID,c1.AfdelingsCode,c1.Opnamedatum,c1.Opnamedatumtijd,c1.Ontslagdatum,c1.Ontslagdatumtijd,c1.IsSpoedopname,c1.OpnameType,c1.IsNuOpgenomen, SpecialismeCode, Specialismen
FROM cte2 c2
JOIN #ttt c1 ON c1.RowNumber = c2.RowNumber + 1
)
SELECT *
FROM cte2
OPTION (MAXRECURSION 0) ;
DROP TABLE #ttt
I tried to improve performance by putting output of cte in a temp table. That increased the performance, but still its too slow. So, how can I increase the performance of this code to run it under 10 seconds for 75k+ records? The output before cancelling the query is: Screenshot. As visible from the image, data is same in columns Afdelingscode,Patient_ID and Opnametype in RowNumber 3,5 and 6, but they have different GroupNumber because of concurrency of the rows.

Without data its not that easy to test but i would try first to not use temporary table and just use both cte from start to end, ie;
;WITH
cte AS (...),
cte2 AS (...)
select * from cte2
OPTION (MAXRECURSION 0);
Without knowing indices etc... for instance, you do a lot of ordering in the first cte. Is this supported by indices (or one multicolumn index) or not?
Without the data i don't have the option to play with it but looking at this:
CASE
WHEN c2.Afdelingscode <> c1.Afdelingscode
OR c2.Patient_ID <> c1.Patient_ID
OR c2.Opnametype <> c1.Opnametype
THEN c2.GroupNumber + 1
ELSE c2.GroupNumber
i would try to take a look at partition by statement in row_number
So try to run this:
WITH
cte AS (
SELECT ROW_NUMBER () OVER (PARTITION BY Afdelingscode , Patient_ID ,Opnametype ORDER BY Patient_ID, Opnamenummer, SPECIALISMEN, Opnametype, OntslagDatumTijd ) AS RowNumber,
Opnamenummer, Patient_ID, AfdelingsCode, Opnamedatum, Opnamedatumtijd, Ontslagdatum, Ontslagdatumtijd, IsSpoedopname, OpnameType, IsNuOpgenomen
FROM t_opnames)

Related

Trying to display top 3 amount from a table using sql query in oracle 11g..column is of varchar type

Am trying to list top 3 records from atable based on some amount stored in a column FTE_TMUSD which is of varchar datatype
below is the query i tried
SELECT *FROM
(
SELECT * FROM FSE_TM_ENTRY
ORDER BY FTE_TMUSD desc
)
WHERE rownum <= 3
ORDER BY FTE_TMUSD DESC ;
o/p i got
972,9680,963 -->FTE_TMUSD values which are not displayed in desc
I am expecting an o/p which will display the top 3 records of values
That should work; inline view is ordered by FTE_TMUSD in descending order, and you're selecting values from it.
What looks suspicious are values you specified as the result. It appears that FTE_TMUSD's datatype is VARCHAR2 (ah, yes - it is, you said so). It means that values are sorted as strings, not numbers - and it seems that you expect numbers. So, apply TO_NUMBER to that column. Note that it'll fail if column contains anything but numbers (for example, if there's a value 972C).
Also, an alternative to your query might be use of analytic functions, such as row_number:
with temp as
(select f.*,
row_number() over (order by to_number(f.fte_tmusd) desc) rn
from fse_tm_entry f
)
select *
from temp
where rn <= 3;

Passing a parameter to a WITH clause query in Oracle

I'm wondering if it's possible to pass one or more parameters to a WITH clause query; in a very simple way, doing something like this (taht, obviously, is not working!):
with qq(a) as (
select a+1 as increment
from dual
)
select qq.increment
from qq(10); -- should get 11
Of course, the use I'm going to do is much more complicated, since the with clause should be in a subquery, and the parameter I'd pass are values taken from the main query....details upon request... ;-)
Thanks for any hint
OK.....here's the whole deal:
select appu.* from
(<quite a complex query here>) appu
where not exists
(select 1
from dual
where appu.ORA_APP IN
(select slot from
(select distinct slots.inizio,slots.fine from
(
with
params as (select 1900 fine from dual)
--params as (select app.ora_fine_attivita fine
-- where app.cod_agenda = appu.AGE
-- and app.ora_fine_attivita = appu.fine_fascia
--and app.data_appuntamento = appu.dataapp
--)
,
Intervals (inizio, EDM) as
( select 1700, 20 from dual
union all
select inizio+EDM, EDM from Intervals join params on
(inizio <= fine)
)
select * from Intervals join params on (inizio <= fine)
) slots
) slots
where slots.slot <= slots.fine
)
order by 1,2,3;
Without going in too deep details, the where condition should remove those records where 'appu.ORA_APP' match one of the records that are supposed to be created in the (outer) 'slots' table.
The constants used in the example are good for a subset of records (a single 'appu.AGE' value), that's why I should parametrize it, in order to use the commented 'params' table (to be replicated, then, in the 'Intervals' table.
I know thats not simple to analyze from scratch, but I tried to make it as clear as possible; feel free to ask for a numeric example if needed....
Thanks

how can i make this merge/ update statement more efficient , its taking too much time

MERGE INTO ////////1 GFO
USING
(SELECT *
FROM
(SELECT facto/////rid,
p-Id,
PRE/////EDATE,
RU//MODE,
cre///date,
ROW_NUMBER() OVER (PARTITION BY facto/////id ORDER BY cre///te DESC) col
FROM ///////////2
) x
WHERE x.col = 1) UFD
ON (GFO.FACTO-/////RID=UFD.FACTO////RID)
WHEN MATCHED THEN UPDATE
SET
GFO.PRE////DATE=UFD.PRE//////DATE
WHERE UFD.CRE/////DATE IS NOT NULL
AND UFD.RU//MODE= 'S'
AND GFO.P////ID=:2
hi every1, my above merge statement is taking too long , it has to run 40 times on table 1 using table2 each having 4millions plus records, for 40 different p--id, please suggest more efficient way as currently its taking 40+ minutes.
its updating only one colummn using a column from table2.t
i am unable to execute the query, its returning
Error: cannot fetch last explain plan from PLAN_TABLE
EXPLAIN PLAN IMAGE
HERE IS THE SCREENSHOT OF EXPLAIN PLAN
cost
The shown plan seems to by OK, the observed problem stems from the LOOP over P_ID that do not scale.
I assume you performs something like this (strongly simplified) - assuming the P_ID to be processed are in table TAB_PID
begin
for cur in (select p_id from tab_pid) loop
merge INTO tab1 USING tab2 ON (tab1.r_id = tab2.r_id)
WHEN MATCHED THEN
UPDATE SET tab1.col1=tab2.col1 WHERE p_id = cur.p_id;
end loop;
end;
/
HASH JOIN on large tables (in NO PARALLEL mode) with elapsed time 60 seconds is not a catastrophic result. But looping 40 times makes your 40 minutes.
So I'd sugesst to try to integrate the loop in the MERGE statement, without knowing details something like this (mayby you'll need also ajdust the MERGE JOIN condition).
merge INTO tab1 USING tab2 ON (tab1.r_id = tab2.r_id)
WHEN MATCHED THEN
UPDATE SET tab1.col1=tab2.col1
WHERE p_id in (select p_id from tab_pid);

Need to make a query more efficient

I have a query which I need to make more efficient.
I am breaking it down into sections to see where the efficiency floors are, I currently have a few Nested Select statements, are these a performance problem?
Here is an example of one of them:
SELECT AgreementID,
DueDate,
UpdatedAmountDue AS AmountDue,
COALESCE((SELECT SUM(UpdatedAmountDue)
FROM RepaymentBreakdown AS B
WHERE CONVERT(datetime, CONVERT(varchar, DueDate, 103), 103) <=
CONVERT(datetime, CONVERT(varchar, R.DueDate, 103), 103)
AND B.AgreementID = R.AgreementID),0) AS DueTD,
RN=ROW_NUMBER() OVER (Partition BY R.AgreementID ORDER BY DueDate)
FROM RepaymentBreakdown AS R
Is there a more clean and efficient way of getting the data of DueTD?
Basically, for each line of a repayment schedule result, I want to get:
AgreementID,
DueDate,
AmountDue,
AmountDueToDate (DueTD)
RowNumber.
The table I am querying is structured as follows:
AgreementID (int),
DueDate (datetime),
AmountDue (decimal(9,2)),
UpdatedAmountDue (decimal(9,2))*
*UpdatedAmountDue is always referenced as it is the moving figure, AmountDue is always fixed, as a reference value.
So, I think you could get performance boost just by removing convert, like this:
select
AgreementID,
DueDate,
UpdatedAmountDue as AmountDue,
(
select sum(B.UpdatedAmountDue)
from RepaymentBreakdown as B
where B.DueDate <= R.DueDate and B.AgreementID = R.AgreementID
) as UpdatedAmountDue
from RepaymentBreakdown AS R
The fastest way I know to calculate running total in SQL Server 2008 would be to use recursive CTE, see my answer here Calculate a Running Total in SqlServer. In your case the query would be smth like this:
create table #t (....., primary key (AgreementID, ord))
insert into #t (AgreementID, DueDate, UpdatedAmountDue, ord)
select AgreementID, DueDate, UpdatedAmountDue, row_number() over (partition by AgreementID, DueDate order by DueDate asc)
;with
CTE_RunningTotal
as
(
select T.ord, T.AgreementID, T.DueDate, T.UpdatedAmountDue as T.AmountDue, T.UpdatedAmountDue
from #t as T
where T.ord = 1
union all
select T.ord, T.AgreementID, T.DueDate, T.UpdatedAmountDue as T.AmountDue, T.UpdatedAmountDue + C.UpdatedAmountDue as UpdatedAmountDue
from CTE_RunningTotal as C
inner join #t as T on T.ord = C.ord + 1 and T.AgreementID = C.AgreementID
)
select AgreementID, DueDate, AmountDue, UpdatedAmountDue
from CTE_RunningTotal as C
option (maxrecursion 0)
Your conversion of the datetime to a date has several issues.
First, it is not guaranteed to always produce correct results depending on your servers language settings. If you need to do String manipulation on a datetime value always use CONVERT(,,126).
But more importantly, it prevents index usage. Instead use CAST(DueDate AS DATE) as the optimizer recognizes that conversion to be index-safe.
Afterwards you might want to add an index on AgreementId,DueDate and either INCLUDE UpdatedAmountDue or better make it clustered.
Assuming UpdatedAmountDue cannot be NULL, you can get rid of the COALESCE too, as the sum always includes the current row.

How to get records randomly from the oracle database?

I need to select rows randomly from an Oracle DB.
Ex: Assume a table with 100 rows, how I can randomly return 20 of those records from the entire 100 rows.
SELECT *
FROM (
SELECT *
FROM table
ORDER BY DBMS_RANDOM.RANDOM)
WHERE rownum < 21;
SAMPLE() is not guaranteed to give you exactly 20 rows, but might be suitable (and may perform significantly better than a full query + sort-by-random for large tables):
SELECT *
FROM table SAMPLE(20);
Note: the 20 here is an approximate percentage, not the number of rows desired. In this case, since you have 100 rows, to get approximately 20 rows you ask for a 20% sample.
SELECT * FROM table SAMPLE(10) WHERE ROWNUM <= 20;
This is more efficient as it doesn't need to sort the Table.
SELECT column FROM
( SELECT column, dbms_random.value FROM table ORDER BY 2 )
where rownum <= 20;
In summary, two ways were introduced
1) using order by DBMS_RANDOM.VALUE clause
2) using sample([%]) function
The first way has advantage in 'CORRECTNESS' which means you will never fail get result if it actually exists, while in the second way you may get no result even though it has cases satisfying the query condition since information is reduced during sampling.
The second way has advantage in 'EFFICIENT' which mean you will get result faster and give light load to your database.
I was given an warning from DBA that my query using the first way gives loads to the database
You can choose one of two ways according to your interest!
In case of huge tables standard way with sorting by dbms_random.value is not effective because you need to scan whole table and dbms_random.value is pretty slow function and requires context switches. For such cases, there are 3 additional methods:
1: Use sample clause:
https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/SELECT.html#GUID-CFA006CA-6FF1-4972-821E-6996142A51C6
https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/SELECT.html#GUID-CFA006CA-6FF1-4972-821E-6996142A51C6
for example:
select *
from s1 sample block(1)
order by dbms_random.value
fetch first 1 rows only
ie get 1% of all blocks, then sort them randomly and return just 1 row.
2: if you have an index/primary key on the column with normal distribution, you can get min and max values, get random value in this range and get first row with a value greater or equal than that randomly generated value.
Example:
--big table with 1 mln rows with primary key on ID with normal distribution:
Create table s1(id primary key,padding) as
select level, rpad('x',100,'x')
from dual
connect by level<=1e6;
select *
from s1
where id>=(select
dbms_random.value(
(select min(id) from s1),
(select max(id) from s1)
)
from dual)
order by id
fetch first 1 rows only;
3: get random table block, generate rowid and get row from the table by this rowid:
select *
from s1
where rowid = (
select
DBMS_ROWID.ROWID_CREATE (
1,
objd,
file#,
block#,
1)
from
(
select/*+ rule */ file#,block#,objd
from v$bh b
where b.objd in (select o.data_object_id from user_objects o where object_name='S1' /* table_name */)
order by dbms_random.value
fetch first 1 rows only
)
);
To randomly select 20 rows I think you'd be better off selecting the lot of them randomly ordered and selecting the first 20 of that set.
Something like:
Select *
from (select *
from table
order by dbms_random.value) -- you can also use DBMS_RANDOM.RANDOM
where rownum < 21;
Best used for small tables to avoid selecting large chunks of data only to discard most of it.
Here's how to pick a random sample out of each group:
SELECT GROUPING_COLUMN,
MIN (COLUMN_NAME) KEEP (DENSE_RANK FIRST ORDER BY DBMS_RANDOM.VALUE)
AS RANDOM_SAMPLE
FROM TABLE_NAME
GROUP BY GROUPING_COLUMN
ORDER BY GROUPING_COLUMN;
I'm not sure how efficient it is, but if you have a lot of categories and sub-categories, this seems to do the job nicely.
-- Q. How to find Random 50% records from table ?
when we want percent wise randomly data
SELECT *
FROM (
SELECT *
FROM table_name
ORDER BY DBMS_RANDOM.RANDOM)
WHERE rownum <= (select count(*) from table_name) * 50/100;

Resources