I've written a query that works, but it looks super clunky. The table only has a few hundred records in it right now, but it will eventually hold hundreds of thousands of records, possibly millions, so performance might become an issue.
I'm wondering if there is a cleaner way to do this.
Thanks.
with objects as
(
select object_type, object_name
from pt_objectshistory
where export_guid = 'PTGAA5V0H2U1XAQYFLQ0QXGWF0OY7Z'
),
distinct_objects as
(select distinct * from objects),
o_count as
(select count(*) ocount from objects),
do_count as
(select count(*) docount from distinct_objects)
select
o_count.ocount,
do_count.docount,
o_count.ocount - do_count.docount delta
from o_count join do_count on 1=1
There is no GROUP BY, so you just count the number of (distinct) rows in the table:
select count(*), count(distinct object_type || object_name)
from pt_objectshistory
where export_guid = 'PTGAA5V0H2U1XAQYFLQ0QXGWF0OY7Z'
Thanks to @MT0: object_type || object_name can be ambiguous and collapse genuinely distinct rows, because pairs like 'abc' || 'def' and 'ab' || 'cdef' both produce 'abcdef'.
So, depending on the data you have, adding a separator can help, e.g. object_type || ';' || object_name.
You should also test the performance to check whether this solution (concatenating columns) is really faster than counting via a subselect/CTE.
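For comparison, here is a sketch of that subselect variant, reusing the question's table and GUID; whether it really beats the concatenation is something to measure on your data:
select count(*) as ocount,
       (select count(*)
          from (select distinct object_type, object_name
                  from pt_objectshistory
                 where export_guid = 'PTGAA5V0H2U1XAQYFLQ0QXGWF0OY7Z')) as docount
  from pt_objectshistory
 where export_guid = 'PTGAA5V0H2U1XAQYFLQ0QXGWF0OY7Z'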
You may also group by the (object_type, object_name) pairs and then sum up the result:
select sum(gcount) as ocount, count(*) as docount
from (select count(*) as gcount, object_type, object_name
from pt_objectshistory
where export_guid = 'PTGAA5V0H2U1XAQYFLQ0QXGWF0OY7Z'
group by object_type, object_name) temp
I am on Oracle 11g.
For each table, I want to check the number of partitions that have been analyzed and the number of partitions that have not been analyzed.
At the moment, I am using the SQL below:
COMPUTE SUM OF "UNANALYZED" ON REPORT
COMPUTE SUM OF "ANALYZED" ON REPORT
COMPUTE SUM OF "TOTAL" ON REPORT
BREAK ON REPORT
select t1.table_name,
decode(t2.unanalyzed,null,0,t2.unanalyzed) unanalyzed,
decode(t3.analyzed,null,0,t3.analyzed) analyzed,
t1.total
from
( SELECT table_name, count(1) total
FROM DBA_TAB_PARTITIONS p
WHERE 1=1
AND table_owner = 'ABC'
GROUP BY table_name ) t1 ,
( SELECT table_name, count(1) unanalyzed
FROM DBA_TAB_PARTITIONS p
WHERE 1=1
AND table_owner = 'ABC'
AND last_analyzed is NULL
GROUP BY table_name ) t2 ,
( SELECT table_name, count(1) analyzed
FROM DBA_TAB_PARTITIONS p
WHERE 1=1
AND table_owner = 'ABC'
AND last_analyzed is NOT NULL
GROUP BY table_name ) t3
where t1.table_name = t2.table_name (+)
and t1.table_name = t3.table_name (+)
order by t1.table_name
;
It works the way I want it to.
I just want to know if there is an alternative to this SQL that will give the same result?
Something shorter or simpler maybe, or something that uses analytic functions?
Thanks.
I would do it like this:
SELECT
table_owner,
table_name,
SUM(DECODE(last_analyzed, NULL, 1, 0)) AS unanalyzed,
SUM(DECODE(last_analyzed, NULL, 0, 1)) AS analyzed,
COUNT(*) as total
FROM
dba_tab_partitions
WHERE
table_owner = 'ABC'
GROUP BY
table_owner,
table_name;
PS: I kept your logic, but if last_analyzed was five years ago, is it really analyzed?
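As a side note (not part of the original answer), the DECODE calls can equivalently be written as conditional aggregation with CASE, which some find easier to read:
SELECT table_owner,
       table_name,
       -- count rows where last_analyzed is missing vs. present
       SUM(CASE WHEN last_analyzed IS NULL THEN 1 ELSE 0 END) AS unanalyzed,
       SUM(CASE WHEN last_analyzed IS NULL THEN 0 ELSE 1 END) AS analyzed,
       COUNT(*) AS total
FROM dba_tab_partitions
WHERE table_owner = 'ABC'
GROUP BY table_owner, table_name;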
I have read the other questions and answers, and they do not help with my issue. I am asking if there is a way to set a limit on the number of results returned in LISTAGG.
I am using this query:
-- HR is one CTE of a larger query; dd (the date range) is defined elsewhere in it
HR -- any baby with a HR < 80
AS
(SELECT fm.y_inpatient_dat, h.pat_id, h.pat_enc_csn_id,
        LISTAGG(meas_value, '; ') WITHIN GROUP (ORDER BY fm.recorded_time)
          abnormal_HR_values
 FROM ip_flwsht_meas fm
 JOIN pat_enc_hsp h ON fm.y_inpatient_dat = h.inpatient_data_id
 WHERE fm.flo_meas_id IN ('8') AND (to_number(MEAS_VALUE) < 80)
   AND fm.recorded_time BETWEEN (SELECT start_date FROM dd) AND (SELECT end_date FROM dd)
 GROUP BY fm.y_inpatient_dat, h.pat_id, h.pat_enc_csn_id)
and I get the following error:
ORA-01489: result of string concatenation is too long
I have researched online how to set a size limit, but I can't seem to make it work. Can someone please advise how to set a limit so the result does not exceed 50 characters?
In Oracle 12.2, you can use an ON OVERFLOW clause in the LISTAGG so the aggregation truncates instead of raising ORA-01489, like:
LISTAGG(meas_value, '; ' ON OVERFLOW TRUNCATE) WITHIN GROUP (ORDER BY fm.recorded_time)
Then you can surround that with a SUBSTR() to get the first 50 characters.
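Put together with the question's tables, that could look like this (a sketch; dd is the date-range CTE from the question):
SELECT fm.y_inpatient_dat, h.pat_id, h.pat_enc_csn_id,
       -- truncate on overflow, then keep only the first 50 characters
       SUBSTR(LISTAGG(meas_value, '; ' ON OVERFLOW TRUNCATE)
                WITHIN GROUP (ORDER BY fm.recorded_time), 1, 50) AS abnormal_HR_values
FROM ip_flwsht_meas fm
JOIN pat_enc_hsp h ON fm.y_inpatient_dat = h.inpatient_data_id
WHERE fm.flo_meas_id IN ('8') AND to_number(meas_value) < 80
  AND fm.recorded_time BETWEEN (SELECT start_date FROM dd) AND (SELECT end_date FROM dd)
GROUP BY fm.y_inpatient_dat, h.pat_id, h.pat_enc_csn_id;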
Pre-12.2, you need to restructure the query to limit the number of rows that are seen by the LISTAGG. Here is an example that uses DBA_OBJECTS (so people without your tables can run it). It keeps only the first three values for each object type.
SELECT object_type,
listagg(object_name, ', ') within group ( order by object_name) first_three
FROM (
SELECT object_type,
object_name,
row_number() over ( partition by object_type order by object_name ) ord
FROM dba_objects
WHERE owner = 'SYS'
)
WHERE ord <= 3
GROUP BY object_type
ORDER BY object_type;
The idea is to number the rows that you want to aggregate and then only aggregate the first X of them, where "X" is small enough not to overflow the max length of a VARCHAR2. "X" will depend on your data.
Or, if you don't want the truncation at 50 characters to happen mid-value, and/or you don't know how many values are safe to allow, you can replace the ord expression with a running_len expression that keeps a running total of the length and caps it before it reaches your limit (of 50 chars). That expression is a SUM(length()) OVER (...). Like this:
SELECT object_type,
listagg(object_name, ', ') within group ( order by object_name) first_50_char
FROM (
SELECT object_type,
object_name,
sum(length(object_name || ', '))
over ( partition by object_type order by object_name ) running_len
FROM dba_objects
WHERE owner = 'SYS'
)
WHERE running_len <= 50+2 -- +2 because the last one won't have a trailing delimiter
GROUP BY object_type
ORDER BY object_type;
With your query, all that put together would look like this:
SELECT y_inpatient_dat,
pat_id,
pat_enc_csn_id,
LISTAGG(meas_value, '; ') WITHIN GROUP ( ORDER BY recorded_time ) abnormal_HR_values
FROM (
SELECT fm.y_inpatient_dat,
h.pat_id,
h.pat_enc_csn_id,
meas_value,
fm.recorded_time,
SUM(length(meas_value || '; '))
over ( partition by fm.y_inpatient_dat, h.pat_id, h.pat_enc_csn_id
order by fm.recorded_time ) running_len
FROM ip_flwsht_meas fm
INNER JOIN pat_enc_hsp h on fm.y_inpatient_dat = h.inpatient_data_id
WHERE fm.flo_meas_id in ('8' ) and (to_number(MEAS_VALUE) <80)
AND fm.recorded_time BETWEEN
(SELECT start_date FROM dd) AND (SELECT end_date FROM dd)
)
WHERE running_len <= 50+2
GROUP BY y_inpatient_dat, pat_id, pat_enc_csn_id;
I'm trying to adapt a query that works in MSSQL to Oracle. The full query is much bigger (this part is just one field from it), but I managed to reduce it to something simpler.
SELECT CASE WHEN COUNT(*) > 0 THEN COUNT(*)
ELSE (SELECT COUNT(*) FROM table2)
END
FROM table1
The error I'm getting is:
ORA-00937: not a single-group group function
Can someone tell me where the problem is, or how I can rewrite it?
You can try with this query:
SELECT CASE WHEN (SELECT COUNT(*) FROM table1) > 0 then (SELECT COUNT(*) FROM table1)
ELSE (SELECT COUNT(*) FROM table2)
END
FROM dual;
It is still ugly but it works :)
Update:
To explain how it works, there are two cases:
If there are records in table1, then show how many records there are.
If table1 is empty, then give the number of records from table2.
DUAL is Oracle's dummy table.
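As an aside (a sketch, not from the original answer), you can avoid counting table1 twice by pushing the count into an inline view:
SELECT CASE WHEN t1_count > 0 THEN t1_count
            ELSE (SELECT COUNT(*) FROM table2)  -- fall back to table2's count
       END AS result_count
FROM (SELECT COUNT(*) AS t1_count FROM table1);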
I think that NikNik's answer is cleaner, but another solution would be:
SELECT *
FROM (SELECT CASE
WHEN Count(*) > 0 THEN Count(*)
ELSE (SELECT Count(*)
FROM table2)
END
FROM table1
GROUP BY table1.primarykey1,
table1.primarykey2)
WHERE ROWNUM = 1
Here is my query:
SELECT ID As Col1,
(
SELECT VID FROM TABLE2 t
WHERE (a.ID=t.ID or a.ID=t.ID2)
AND t.STARTDTE =
(
SELECT MAX(tt.STARTDTE)
FROM TABLE2 tt
WHERE (a.ID=tt.ID or a.ID=tt.ID2) AND tt.STARTDTE < SYSDATE
)
) As Col2
FROM TABLE1 a
Table1 has 48,850 records and Table2 has 15,944,098 records.
I have separate indexes on TABLE2: on ID; on ID & STARTDTE; on STARTDTE; and on ID, ID2 & STARTDTE.
The query is still too slow. How can this be improved? Please help.
I'm guessing that the OR in the inner queries is interfering with the optimizer's ability to use indexes. Also, I wouldn't recommend a solution that scans all of TABLE2, given its size.
This is why, in this case, I would suggest using a function that efficiently retrieves the information you are looking for (two index scans per call):
CREATE OR REPLACE FUNCTION getvid(p_id table1.id%TYPE)
  RETURN table2.vid%TYPE IS
  l_result table2.vid%TYPE;
BEGIN
  SELECT vid
    INTO l_result
    FROM (SELECT vid, startdte
            FROM ( -- most recent row where p_id matches ID
                  SELECT vid, startdte
                    FROM table2 t
                   WHERE t.id = p_id
                     AND t.startdte < SYSDATE
                   ORDER BY t.startdte DESC)
           WHERE rownum = 1
           UNION ALL
          SELECT vid, startdte
            FROM ( -- most recent row where p_id matches ID2
                  SELECT vid, startdte
                    FROM table2 t
                   WHERE t.id2 = p_id
                     AND t.startdte < SYSDATE
                   ORDER BY t.startdte DESC)
           WHERE rownum = 1
           ORDER BY startdte DESC) -- keep the newer of the two candidates
   WHERE rownum = 1;
  RETURN l_result;
END;
Your SQL would become:
SELECT ID As Col1,
getvid(a.id) vid
FROM TABLE1 a
Make sure you have indexes on both table2(id, startdte DESC) and table2(id2, startdte DESC). The order of the columns in the index is very important.
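For reference, the DDL might look like this (the index names are made up):
-- hypothetical names; the column order is what matters
CREATE INDEX table2_id_startdte_ix ON table2 (id, startdte DESC);
CREATE INDEX table2_id2_startdte_ix ON table2 (id2, startdte DESC);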
Possibly try the following, though untested.
WITH max_times AS
(SELECT a.ID, MAX(t.STARTDTE) AS Startdte
FROM TABLE1 a, TABLE2 t
WHERE (a.ID=t.ID OR a.ID=t.ID2)
AND t.STARTDTE < SYSDATE
GROUP BY a.ID)
SELECT b.ID As Col1, tt.VID
FROM TABLE1 b
LEFT OUTER JOIN max_times mt
ON (b.ID = mt.ID)
LEFT OUTER JOIN TABLE2 tt
ON ((mt.ID=tt.ID OR mt.ID=tt.ID2)
AND mt.startdte = tt.startdte)
You can look at analytic functions to avoid having to hit the second table twice. Something like this might work:
SELECT id AS col1, vid
FROM (
SELECT t1.id, t2.vid, RANK() OVER (PARTITION BY t1.id ORDER BY
CASE WHEN t2.startdte < TRUNC(SYSDATE) THEN t2.startdte ELSE null END
NULLS LAST) AS rn
FROM table1 t1
JOIN table2 t2 ON t1.ID IN (t2.ID, t2.ID2)
)
WHERE rn = 1;
The inner select gets the id and vid values from the two tables with a simple join on id or id2. The RANK function calculates a ranking for each matching row in the second table based on the startdte. It's complicated a bit by your wanting to filter on that date, so I've used a CASE to effectively ignore any dates today or later by changing the evaluated value to null; that means the ORDER BY in the OVER clause needs NULLS LAST so those rows are ignored.
I'd suggest you run the inner select on its own first - maybe with just a couple of id values for brevity - to see what it's doing and what ranks are being allocated.
The outer query is then just picking the top-ranked result for each id.
You may still get duplicates, though; if table2 has more than one row for an id with the same startdte, they'll get the same rank, but then you may have had that situation before. You may need to add more fields to the ORDER BY to break ties in a way that makes sense to you, as in the sketch below.
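For example (a sketch with an assumed tie-breaker, not part of the original answer), switching RANK() to ROW_NUMBER() with t2.vid as an extra ORDER BY column guarantees exactly one row per id:
-- t2.vid as the tie-breaker is an assumption about your data
SELECT t1.id, t2.vid,
       ROW_NUMBER() OVER (PARTITION BY t1.id ORDER BY
           CASE WHEN t2.startdte < TRUNC(SYSDATE) THEN t2.startdte ELSE null END
           NULLS LAST, t2.vid) AS rn
FROM table1 t1
JOIN table2 t2 ON t1.ID IN (t2.ID, t2.ID2)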
But this is largely speculation without being able to see where your existing query is actually slow.
I'm trying to make my query run as quickly as possible, but I'm struggling to get it under 5 seconds.
I think it's because I'm referencing two linked databases.
Here's my query:
select column2, column3, column4
from table1#dev
where column1 in (
select distinct column2
from table2#dev
where column3 > 0
)
order by column1
Is there a way to optimise this query any more?
I've tried using a join, but it seems to make the query run longer.
Thanks in advance
EDIT
From further investigation, the DRIVING_SITE hint makes it run very quickly, like this:
select /*+ DRIVING_SITE(table1) */ t1.column2, t1.column3, t1.column4
from table1#dev t1, table2#dev t2
WHERE t2.column3 > 0
But as soon as I add the distinct column2 back in, it runs really slowly.
First, there is no need for DISTINCT inside an IN subquery. The query can be written as:
select *
from table1#dev
where column1 in (
select column2
from table2#dev
where column3 > 0
)
order by column1
Second, there are (at least) two more ways to write it. Either with JOIN:
select t1.*
from table1#dev t1
join table2#dev t2
on t2.column2 = t1.column1
and t2.column3 > 0
group by
t1.id, t1.column1, ...
order by t1.column1
or (my preference) with EXISTS:
select t1.*
from table1#dev t1
where exists
( select *
from table2 t2
where t2.column2 = t1.column1
and t2.column3 > 0
)
order by column1
In any case, you should check the execution plans for all of them.
I would expect performance to be best if you have an index on table1(column1) and, for table2, either an index on (column2) or a composite index on (column3, column2).
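As DDL, those suggestions might look like this (index names are hypothetical, and since the tables live behind the #dev link, the indexes would be created on that remote database):
-- hypothetical index names for the suggestion above
CREATE INDEX table1_column1_ix ON table1 (column1);
CREATE INDEX table2_c3_c2_ix ON table2 (column3, column2);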
I agree with Shannon above, but are you able to create a view on the dev server?
Also, select * is a bit naughty - it is better to name the fields you really want. For very large datasets, that will give you a performance improvement too.
Am I missing something in believing that this will work?
select t1.*
from table1 t1, table2 t2
where t1.column1 = t2.column2(+)
and t2.column3 > 0;