Exists query with not equal running extremely slow - Oracle

I am trying to modify a query that someone else wrote which is taking a really long time to run. The problem has to do with the <> portion of the exists query. Any idea how this can be changed to run quicker?
SELECT m.level4 center, cc.description, m.employeename, m.empno,
TO_DATE (ct.tsdate, 'dd-mon-yyyy') tsdate, ct.starttime, ct.endtime,
ct.DURATION,
NVL (DECODE (ct.paycode, ' ', 'REG', ct.paycode), 'REG') paycode,
ct.titlecode, ct.costcenter, m.tsgroup
FROM clairvia_text ct, MASTER m, costcenteroutbound cc
WHERE ct.recordtype = '1'
AND ct.empno = m.empno
AND m.level4 = cc.center
AND EXISTS (
SELECT ct1.recordtype,ct1.empno,ct1.tsdate,ct1.processdate
FROM clairvia_text ct1
WHERE ct.recordtype = ct1.recordtype
AND ct.empno = ct1.empno
AND ct.tsdate = ct1.tsdate
AND ct.processdate = ct1.processdate
group by ct1.recordtype,ct1.empno,ct1.tsdate,ct1.processdate
having count(*) < 2)

Oracle can be finicky with exists statements and subqueries. A couple of things to try (see the sketch below):
Change the exists to an "in"
Change the exists to a group by statement with a "having count(1) > 1". This could even be changed into a join.
I'm assuming indexes are not an issue.
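A minimal sketch of the "in" rewrite, assuming (as the original HAVING count(*) < 2 implies) that the intent is to keep only (recordtype, empno, tsdate, processdate) combinations that appear exactly once; untested against your data:
-- Sketch only: the correlated EXISTS becomes a multi-column IN
SELECT m.level4 center, cc.description, m.employeename, m.empno,
       TO_DATE (ct.tsdate, 'dd-mon-yyyy') tsdate, ct.starttime, ct.endtime,
       ct.duration,
       NVL (DECODE (ct.paycode, ' ', 'REG', ct.paycode), 'REG') paycode,
       ct.titlecode, ct.costcenter, m.tsgroup
FROM clairvia_text ct, MASTER m, costcenteroutbound cc
WHERE ct.recordtype = '1'
AND ct.empno = m.empno
AND m.level4 = cc.center
AND (ct.recordtype, ct.empno, ct.tsdate, ct.processdate) IN (
    SELECT ct1.recordtype, ct1.empno, ct1.tsdate, ct1.processdate
    FROM clairvia_text ct1
    GROUP BY ct1.recordtype, ct1.empno, ct1.tsdate, ct1.processdate
    HAVING count(*) < 2)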

You can use the analytic function COUNT here to eliminate duplicated rows.
select * from (
SELECT m.level4 center, cc.description, m.employeename, m.empno,
TO_DATE (ct.tsdate, 'dd-mon-yyyy') tsdate, ct.starttime, ct.endtime,
ct.DURATION,
NVL (DECODE (ct.paycode, ' ', 'REG', ct.paycode), 'REG') paycode,
ct.titlecode, ct.costcenter, m.tsgroup,
count(1) over (partition by ct.recordtype,ct.empno,ct.tsdate,ct.processdate
order by null) cnt
FROM clairvia_text ct, MASTER m, costcenteroutbound cc
WHERE ct.recordtype = '1'
AND ct.empno = m.empno
AND m.level4 = cc.center
) where cnt=1
I do not have your structures and data, so I ran similar queries against all_tab_cols; the first query took ~500s on my laptop and the second query ~2s.
-- slow test
select count(1)
from all_tab_cols atc
where exists (
select 1
from all_tab_cols atc1
where atc1.column_name = atc.column_name
group by column_name
having count(1) = 1)
-- fast test
select count(1)
from (
select column_name,
count(1) over (partition by atc.column_name order by null) cnt
from all_tab_cols atc
)
where cnt = 1

Related

SQL to count the number of partitions that have statistics and those with missing statistics

I am on Oracle 11g.
For each table, I want to check the number of partitions that have been analyzed and the number of partitions that have not been analyzed.
At the moment, I am using the SQL below:
COMPUTE SUM OF "UNANALYZED" ON REPORT
COMPUTE SUM OF "ANALYZED" ON REPORT
COMPUTE SUM OF "TOTAL" ON REPORT
BREAK ON REPORT
select t1.table_name,
decode(t2.unanalyzed,null,0,t2.unanalyzed) unanalyzed,
decode(t3.analyzed,null,0,t3.analyzed) analyzed,
t1.total
from
( SELECT table_name, count(1) total
FROM DBA_TAB_PARTITIONS p
WHERE 1=1
AND table_owner = 'ABC'
GROUP BY table_name ) t1 ,
( SELECT table_name, count(1) unanalyzed
FROM DBA_TAB_PARTITIONS p
WHERE 1=1
AND table_owner = 'ABC'
AND last_analyzed is NULL
GROUP BY table_name ) t2 ,
( SELECT table_name, count(1) analyzed
FROM DBA_TAB_PARTITIONS p
WHERE 1=1
AND table_owner = 'ABC'
AND last_analyzed is NOT NULL
GROUP BY table_name ) t3
where t1.table_name = t2.table_name (+)
and t1.table_name = t3.table_name (+)
order by t1.table_name
;
It is working as I wanted it to.
I just want to know if there is an alternative to this SQL that will give the same result?
Something shorter or simpler maybe, or something that uses an analytic function?
Thanks.
I would do it like this:
SELECT
table_owner,
table_name,
SUM(DECODE(last_analyzed, NULL, 1, 0)) AS unanalyzed,
SUM(DECODE(last_analyzed, NULL, 0, 1)) AS analyzed,
COUNT(*) as total
FROM
dba_tab_partitions
WHERE
table_owner = 'ABC'
GROUP BY
table_owner,
table_name;
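If you also want the report totals that your COMPUTE SUM ... ON REPORT commands produced, a ROLLUP variant can fold them into the query itself (a sketch, same logic as above):
-- the composite ROLLUP adds one grand-total row after the per-table rows
SELECT table_owner,
       table_name,
       SUM(DECODE(last_analyzed, NULL, 1, 0)) AS unanalyzed,
       SUM(DECODE(last_analyzed, NULL, 0, 1)) AS analyzed,
       COUNT(*) AS total
FROM dba_tab_partitions
WHERE table_owner = 'ABC'
GROUP BY ROLLUP ((table_owner, table_name));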
PS: I kept your logic, but if last_analyzed was 5 years ago, is it really analyzed?
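If you want to treat old statistics as stale rather than analyzed, a small variation (a sketch; the 30-day cutoff is an assumption, use whatever threshold fits):
SELECT table_owner,
       table_name,
       SUM(CASE WHEN last_analyzed IS NULL THEN 1 ELSE 0 END) AS unanalyzed,
       -- assumed threshold: anything older than 30 days counts as stale
       SUM(CASE WHEN last_analyzed < SYSDATE - 30 THEN 1 ELSE 0 END) AS stale,
       SUM(CASE WHEN last_analyzed >= SYSDATE - 30 THEN 1 ELSE 0 END) AS fresh,
       COUNT(*) AS total
FROM dba_tab_partitions
WHERE table_owner = 'ABC'
GROUP BY table_owner, table_name;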

Oracle SQL -- Finding count of rows that match date maximum in table

I am trying to use a query to return the count of rows whose date matches the maximum date for that column in the table.
Oracle SQL: version 11.2:
The following syntax would seem to be correct (to me), and it compiles and runs. However, instead of returning JUST the count for the maximum, it returns several counts, more or less as if the "HAVING" clause wasn't there.
Select ourDate, Count(1) as OUR_COUNT
from schema1.table1
group by ourDate
HAVING ourDate = max(ourDate) ;
How can this be fixed, please?
Your HAVING clause compares each group's ourDate with the MAX(ourDate) of that same group, and those are always equal, so every group passes the filter. You can use:
SELECT MAX(ourDate) AS ourDate,
COUNT(*) KEEP (DENSE_RANK LAST ORDER BY ourDate) AS ourCount
FROM schema1.table1
or:
SELECT ourDate,
COUNT(*) AS our_count
FROM (
SELECT ourDate,
RANK() OVER (ORDER BY ourDate DESC) AS rnk
FROM schema1.table1
)
WHERE rnk = 1
GROUP BY ourDate
Which, for the sample data:
CREATE TABLE table1 (ourDate) AS
SELECT SYSDATE FROM DUAL CONNECT BY LEVEL <= 5 UNION ALL
SELECT SYSDATE - 1 FROM DUAL;
Both output:
OURDATE              OUR_COUNT
-------------------  ---------
2022-06-28 13:35:01          5
I don't know if I understand what you want. Try this:
Select x.ourDate, Count(1) as OUR_COUNT
from schema1.table1 x
where x.ourDate = (select max(y.ourDate) from schema1.table1 y)
group by x.ourDate
One option is to use a subquery which fetches the maximum date:
select ourdate, count(*)
from table1
where ourdate = (select max(ourdate)
from table1)
group by ourdate;
Or, a more modern approach (if your database version supports it; 11g doesn't, though):
select ourdate, count(*)
from table1
group by ourdate
order by ourdate desc
fetch first 1 rows only;
You can use this SQL query:
select MAX(ourDate),COUNT(1) as OUR_COUNT
from schema1.table1
where ourDate = (select MAX(ourDate) from schema1.table1)
group by ourDate;

Adding filters in subquery from CTE quadruples run time

I am working on an existing query for an SSRS report that focuses on aggregated financial aid data split out into 10 aggregations. The user wants to be able to select students included in that aggregated data based on new vs. returning status and 'selected for verification.' For the new/returning status, I added a CTE to return the earliest admit date for a student. 2 of the 10 data fields are created by a subquery. I have been trying for 3 days to get the subquery to use the CTE fields for a filter, but it won't work: either they're ignored or I get a 'not a group by expression' error. If I put the join to the CTE within the subquery, the query time jumps from 45 seconds to 400 seconds. This shouldn't be that complicated! What am I missing? I have added some of the code... 3 of the chunks work - paid_something doesn't.
with stuStatus as
(select
person_uid, min(year_admitted) admit_year
from academic_study
where aid_year between :AidYearStartParameter and :AidYearEndParameter
group by person_uid)
--- above code added to get student information not originally in qry
select
finaid_applicant_status.aid_year
, count(1) as fafsa_cnt --works
, sum( --works
case
when (
package_complete_date is not null
and admit.status is not null
)
then 1
else 0
end
) as admit_and_package
, (select count(*) -- doesn't work
from (
select distinct award_by_aid_year.person_uid
from
award_by_aid_year
where
award_by_aid_year.aid_year = finaid_applicant_status.aid_year
and award_by_aid_year.total_paid_amount > 0 )dta
where
(
(:StudentStatusParameter = 'N' and stuStatus.admit_year = finaid_applicant_status.aid_year)
OR
(:StudentStatusParameter = 'R' and stuStatus.admit_year <> finaid_applicant_status.aid_year)
OR :StudentStatusParameter = '%'
)
)
as paid_something
, sum( --works
case
when exists (
select
1
from
award_by_person abp
where
abp.person_uid = fafsa.person_uid
and abp.aid_year = fafsa.aid_year
and abp.award_paid_amount > 0
) and fafsa.requirement is not null
then 1
else 0
end
) as paid_something_fafsa
from
finaid_applicant_status
join finaid_tracking_requirement fafsa
on finaid_applicant_status.person_uid = fafsa.person_uid
and finaid_applicant_status.aid_year = fafsa.aid_year
and fafsa.requirement = 'FAFSA'
left join finaid_tracking_requirement admit
on finaid_applicant_status.person_uid = admit.person_uid
and finaid_applicant_status.aid_year = admit.aid_year
and admit.requirement = 'ADMIT'
and admit.status in ('M', 'P')
left outer join stuStatus
on finaid_applicant_status.person_uid = stuStatus.person_uid
where
finaid_applicant_status.aid_year between :AidYearStartParameter and :AidYearEndParameter
and (
(:VerifiedParameter = '%') OR
(:VerifiedParameter <> '%' AND finaid_applicant_status.verification_required_ind = :VerifiedParameter)
)
and
(
(:StudentStatusParameter = 'N' and (stuStatus.admit_year IS NULL OR stuStatus.admit_year = finaid_applicant_status.aid_year ))
OR
(:StudentStatusParameter = 'R' and stuStatus.admit_year <> finaid_applicant_status.aid_year)
OR :StudentStatusParameter = '%'
)
group by
finaid_applicant_status.aid_year
order by
finaid_applicant_status.aid_year
Not sure if this helps, but you have something like this:
select aid_year, count(1) c1,
(select count(1)
from (select distinct person_uid
from award_by_aid_year a
where a.aid_year = fas.aid_year))
from finaid_applicant_status fas
group by aid_year;
This query throws ORA-00904 FAS.AID_YEAR: invalid identifier. This is because fas.aid_year is nested too deeply in the subquery: the inline view cannot see the outer query's columns.
If you are able to modify your subquery from select count(1) from (select distinct sth from ... where year = fas.year) to select count(distinct sth) from ... where year = fas.year, then it has a chance of working.
select aid_year, count(1) c1,
(select count(distinct person_uid)
from award_by_aid_year a
where a.aid_year = fas.aid_year) c2
from finaid_applicant_status fas
group by aid_year
The simplified demo above shows the non-working and working queries. Of course your query is much more complicated, but this is something you could check.
Also, maybe you can use dbfiddle or sqlfiddle to set up a test case? Or show us sample (anonymized) data and the required output for it?

How can I set a maximum value using LISTAGG

I have read the other questions and answers and they do not help with my issue. I am asking if there is a way to set a limit on the number of results returned in LISTAGG.
I am using this query
HR--Any baby with a HR<80
AS
SELECT fm.y_inpatient_dat, h.pat_id, h.pat_enc_csn_id,
LISTAGG(meas_value, '; ') WITHIN GROUP (ORDER BY fm.recorded_time)
abnormal_HR_values
from
ip_flwsht_meas fm
join pat_enc_hsp h on fm.y_inpatient_dat = h.inpatient_data_id
where fm.flo_meas_id in ('8' ) and (to_number(MEAS_VALUE) <80)
AND fm.recorded_time between (select start_date from dd) AND (select end_date from dd)
group by fm.y_inpatient_dat,h.pat_id, h.pat_enc_csn_id)
and I get the following error:
ORA-01489: result of string concatenation is too long
I have researched online how to set a size limit, but I can't seem to make it work. Can someone please advise how to set a limit so the result does not exceed 50 characters?
In Oracle 12.2, you can use ON OVERFLOW TRUNCATE in the LISTAGG, like:
LISTAGG(meas_value, '; ' ON OVERFLOW TRUNCATE) WITHIN GROUP (ORDER BY fm.recorded_time)
Then you can surround that with a SUBSTR() to get the first 50 characters.
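Put together with your query, that looks something like this (a sketch for 12.2+, assuming your dd date-range CTE is in scope as in the original; untested against your tables):
SELECT fm.y_inpatient_dat, h.pat_id, h.pat_enc_csn_id,
       -- ON OVERFLOW TRUNCATE stops ORA-01489; SUBSTR caps the result at 50 chars
       SUBSTR(LISTAGG(meas_value, '; ' ON OVERFLOW TRUNCATE)
                WITHIN GROUP (ORDER BY fm.recorded_time), 1, 50) abnormal_HR_values
FROM ip_flwsht_meas fm
JOIN pat_enc_hsp h ON fm.y_inpatient_dat = h.inpatient_data_id
WHERE fm.flo_meas_id IN ('8') AND TO_NUMBER(meas_value) < 80
  AND fm.recorded_time BETWEEN (SELECT start_date FROM dd) AND (SELECT end_date FROM dd)
GROUP BY fm.y_inpatient_dat, h.pat_id, h.pat_enc_csn_id;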
Pre 12.2, you need to restructure the query to limit the number of rows that get seen by the LISTAGG. Here is an example that uses DBA_OBJECTS (so people without your tables can run it). It will only aggregate the first three values for each object type.
SELECT object_type,
listagg(object_name, ', ') within group ( order by object_name) first_three
FROM (
SELECT object_type,
object_name,
row_number() over ( partition by object_type order by object_name ) ord
FROM dba_objects
WHERE owner = 'SYS'
)
WHERE ord <= 3
GROUP BY object_type
ORDER BY object_type;
The idea is to number the rows that you want to aggregate and then only aggregate the first X of them, where "X" is small enough not to overflow the max length of a VARCHAR2. "X" will depend on your data.
Or, if you don't want the truncation at 50 characters to happen mid-values and/or you don't know how many values are safe to allow, you can replace the ord expression with a running_length expression to keep a running count of the length and cap it off before it gets to your limit (of 50 chars). That expression would be a SUM(length()) OVER (...). Like this:
SELECT object_type,
listagg(object_name, ', ') within group ( order by object_name) first_50_char
FROM (
SELECT object_type,
object_name,
sum(length(object_name || ', '))
over ( partition by object_type order by object_name ) running_len
FROM dba_objects
WHERE owner = 'SYS'
)
WHERE running_len <= 50+2 -- +2 because the last one won't have a trailing delimiter
GROUP BY object_type
ORDER BY object_type;
With your query, all that put together would look like this:
SELECT y_inpatient_dat,
pat_id,
pat_enc_csn_id,
LISTAGG(meas_value, '; ') WITHIN GROUP ( ORDER BY recorded_time ) abnormal_HR_values
FROM (
SELECT fm.y_inpatient_dat,
h.pat_id,
h.pat_enc_csn_id,
meas_value,
fm.recorded_time,
SUM(length(meas_value || '; ')) OVER ( PARTITION BY fm.y_inpatient_dat, h.pat_id, h.pat_enc_csn_id
ORDER BY fm.recorded_time ) running_len
FROM ip_flwsht_meas fm
INNER JOIN pat_enc_hsp h on fm.y_inpatient_dat = h.inpatient_data_id
WHERE fm.flo_meas_id in ('8' ) and (to_number(MEAS_VALUE) <80)
AND fm.recorded_time BETWEEN
(SELECT start_date FROM dd) AND (SELECT end_date FROM dd)
)
WHERE running_len <= 50+2
GROUP BY y_inpatient_dat, pat_id, pat_enc_csn_id;

How to make selecting random rows faster in Oracle with a table with millions of rows

Is there a way to make selecting random rows faster in Oracle with a table that has millions of rows? I tried to use sample(x) and dbms_random.value and it's taking a long time to run.
Thanks!
Using appropriate values of sample(x) is the fastest way. It's block-random and row-random within blocks, so if you only want one random row:
select dbms_rowid.rowid_relative_fno(rowid) as fileno,
dbms_rowid.rowid_block_number(rowid) as blockno,
dbms_rowid.rowid_row_number(rowid) as offset
from (select rowid from [my_big_table] sample (.01))
where rownum = 1
I'm using a subpartitioned table, and I'm getting pretty good randomness even grabbing multiple rows:
select dbms_rowid.rowid_relative_fno(rowid) as fileno,
dbms_rowid.rowid_block_number(rowid) as blockno,
dbms_rowid.rowid_row_number(rowid) as offset
from (select rowid from [my_big_table] sample (.01))
where rownum <= 5
FILENO BLOCKNO OFFSET
---------- ---------- ----------
152 2454936 11
152 2463140 32
152 2335208 2
152 2429207 23
152 2746125 28
I suspect you should probably tune your SAMPLE clause to use an appropriate sample size for what you're fetching.
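For example, one way to pick a sample percentage is to derive it from the optimizer's row estimate (a sketch; MY_BIG_TABLE and :rows_wanted are placeholders, and num_rows is only as fresh as your statistics):
-- percentage that should yield roughly :rows_wanted rows
select round(100 * :rows_wanted / num_rows, 6) as sample_pct
from user_tables
where table_name = 'MY_BIG_TABLE';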
Start with Adam's answer first, but if SAMPLE just isn't fast enough, even with the ROWNUM optimization, you can use block samples:
....FROM [table] SAMPLE BLOCK (0.01)
This applies the sampling at the block level instead of for each row. This does mean that it can skip large swathes of data from the table so the sample percent will be very rough. It's not unusual for a SAMPLE BLOCK with a low percentage to return zero rows.
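For example (a sketch; my_big_table is a placeholder):
-- block-level sampling with the ROWNUM cap; if it returns zero rows,
-- re-run it or raise the percentage
select *
from my_big_table sample block (0.01)
where rownum <= 5;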
Here's the same question on AskTom:
http://asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUESTION_ID:6075151195522
If you know how big your table is, use sample block as described above. If you don't, you can modify the routine below to get however many rows you want.
Copied from: http://asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUESTION_ID:6075151195522#56174726207861
create or replace function get_random_rowid
( table_name varchar2
) return urowid
as
sql_v varchar2(100);
urowid_t dbms_sql.urowid_table;
cursor_v integer;
status_v integer;
rows_v integer;
begin
for exp_v in -6..2 loop
exit when (urowid_t.count > 0);
if (exp_v < 2) then
sql_v := 'select rowid from ' || table_name
|| ' sample block (' || power(10, exp_v) || ')';
else
sql_v := 'select rowid from ' || table_name;
end if;
cursor_v := dbms_sql.open_cursor;
dbms_sql.parse(cursor_v, sql_v, dbms_sql.native);
dbms_sql.define_array(cursor_v, 1, urowid_t, 100, 0);
status_v := dbms_sql.execute(cursor_v);
loop
rows_v := dbms_sql.fetch_rows(cursor_v);
dbms_sql.column_value(cursor_v, 1, urowid_t);
exit when rows_v != 100;
end loop;
dbms_sql.close_cursor(cursor_v);
end loop;
if (urowid_t.count > 0) then
return urowid_t(trunc(dbms_random.value(0, urowid_t.count)));
end if;
return null;
exception when others then
if (dbms_sql.is_open(cursor_v)) then
dbms_sql.close_cursor(cursor_v);
end if;
raise;
end;
/
show errors
The solution below is not an exact answer to this question, but it fits many scenarios where you select a row, use it for some purpose, and then update its status to "used" or "done" so that you do not select it again.
Solution:
The query below is useful, but if your table is large you will definitely face a performance problem with it (I tried it and did):
SELECT * FROM
( SELECT * FROM table
ORDER BY dbms_random.value )
WHERE rownum = 1
So if you limit rownum as below, you can work around the performance problem. By increasing the rownum limit you reduce the chance of collisions, but in this case you will always get rows from the same 1000 rows. If you get a row from those 1000 and update its status to "USED", you will almost always get a different row every time you query for "ACTIVE".
SELECT * FROM
( SELECT * FROM table
where rownum < 1000
and status = 'ACTIVE'
ORDER BY dbms_random.value )
WHERE rownum = 1
Update the row's status after selecting it. If you cannot update it, that means another transaction has already used it; you should then get a new row and update its status. By the way, the probability of two different transactions getting the same row is about 0.001, since rownum is 1000.
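A sketch of the same claim-a-row idea using FOR UPDATE SKIP LOCKED, so concurrent sessions simply skip rows another transaction has already locked instead of colliding (my_table, status, and the 'ACTIVE'/'USED' values are the placeholders from above; note this claims the first available row rather than a random one):
declare
  cursor c is
    select rowid
      from my_table
     where status = 'ACTIVE'
       for update skip locked;
  v_rid rowid;
begin
  open c;
  fetch c into v_rid;  -- with SKIP LOCKED, rows are locked as they are fetched
  close c;
  if v_rid is not null then
    update my_table
       set status = 'USED'  -- claim the row; the lock is held until commit
     where rowid = v_rid;
  end if;
  commit;
end;
/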
Someone said sample(x) is the fastest way.
But for me this method works slightly faster than the sample(x) method.
It should take a fraction of a second (0.2 in my case) no matter what the size of the table is. If it takes longer, the hints (--+ leading(e) use_nl(e t) rowid(t)) can help.
SELECT *
FROM My_User.My_Table
WHERE ROWID = (SELECT MAX(t.ROWID) KEEP(DENSE_RANK FIRST ORDER BY dbms_random.value)
FROM (SELECT o.Data_Object_Id,
e.Relative_Fno,
e.Block_Id + TRUNC(Dbms_Random.Value(0, e.Blocks)) AS Block_Id
FROM Dba_Extents e
JOIN Dba_Objects o ON o.Owner = e.Owner AND o.Object_Type = e.Segment_Type AND o.Object_Name = e.Segment_Name
WHERE e.Segment_Name = 'MY_TABLE'
AND(e.Segment_Type, e.Owner, e.Extent_Id) =
(SELECT MAX(e.Segment_Type) AS Segment_Type,
MAX(e.Owner) AS Owner,
MAX(e.Extent_Id) KEEP(DENSE_RANK FIRST ORDER BY Dbms_Random.Value) AS Extent_Id
FROM Dba_Extents e
WHERE e.Segment_Name = 'MY_TABLE'
AND e.Owner = 'MY_USER'
AND e.Segment_Type = 'TABLE')) e
JOIN My_User.My_Table t
ON t.Rowid BETWEEN Dbms_Rowid.Rowid_Create(1, Data_Object_Id, Relative_Fno, Block_Id, 0)
AND Dbms_Rowid.Rowid_Create(1, Data_Object_Id, Relative_Fno, Block_Id, 32767))
Version with retries when no rows returned:
WITH gen AS ((SELECT --+ inline leading(e) use_nl(e t) rowid(t)
MAX(t.ROWID) KEEP(DENSE_RANK FIRST ORDER BY dbms_random.value) Row_Id
FROM (SELECT o.Data_Object_Id,
e.Relative_Fno,
e.Block_Id + TRUNC(Dbms_Random.Value(0, e.Blocks)) AS Block_Id
FROM Dba_Extents e
JOIN Dba_Objects o ON o.Owner = e.Owner AND o.Object_Type = e.Segment_Type AND o.Object_Name = e.Segment_Name
WHERE e.Segment_Name = 'MY_TABLE'
AND(e.Segment_Type, e.Owner, e.Extent_Id) =
(SELECT MAX(e.Segment_Type) AS Segment_Type,
MAX(e.Owner) AS Owner,
MAX(e.Extent_Id) KEEP(DENSE_RANK FIRST ORDER BY Dbms_Random.Value) AS Extent_Id
FROM Dba_Extents e
WHERE e.Segment_Name = 'MY_TABLE'
AND e.Owner = 'MY_USER'
AND e.Segment_Type = 'TABLE')) e
JOIN MY_USER.MY_TABLE t ON t.ROWID BETWEEN Dbms_Rowid.Rowid_Create(1, Data_Object_Id, Relative_Fno, Block_Id, 0)
AND Dbms_Rowid.Rowid_Create(1, Data_Object_Id, Relative_Fno, Block_Id, 32767))),
Retries(Cnt, Row_Id) AS (SELECT 1, gen.Row_Id
FROM Dual
LEFT JOIN gen ON 1=1
UNION ALL
SELECT Cnt + 1, gen.Row_Id
FROM Retries
LEFT JOIN gen ON 1=1
WHERE Retries.Row_Id IS NULL AND Retries.Cnt < 10)
SELECT *
FROM MY_USER.MY_TABLE
WHERE ROWID = (SELECT Row_Id
FROM Retries
WHERE Row_Id IS NOT NULL)
Can you use pseudorandom rows?
select * from (
select * from ... where... order by ora_hash(rowid)
) where rownum<100
