order by rownum — is it correct or not? - oracle

I have a canonical top-N query against an Oracle database suggested by all FAQs and HowTos:
select ... from (
select ... from ... order by ...
) where rownum <= N
It works perfectly on Oracle 11, i.e. it returns the top-N records in the order specified in the inner select.
However it breaks on Oracle 12. It still returns the same top-N records, but they may get shuffled. The final order of these records is non-deterministic.
I googled but haven't found any related discussions. It looks like everyone else always gets the correct record order from such a select.
One finding was interesting though. I saw that some people use (without an explanation, unfortunately) an additional order by rownum clause in the outer select:
select ... from (
select ... from ... order by ...
) where rownum <= N
order by rownum
(both occurrences of rownum here refer to the Oracle pseudocolumn; it's not something returned by the inner select)
It appears to work. But with Oracle optimizer you can never be sure if it's just luck or a really correct solution.
The question is: does order by rownum guarantee correct ordering in this case or not, and why? My colleagues and I could not come to an agreement about it.
P.S. I'm aware of other ways to select top-N records, e.g. using row_number analytic function and fetch first clause introduced in Oracle 12. I'm also aware that I can repeat the same order by ... on the outer select. The question is about order by rownum only — is it correct or not.
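For reference, the fetch first and repeated order by variants would look roughly like this (column and table names are placeholders):
-- repeating the same order by on the outer select
select * from (
  select col1, col2 from some_table order by col1
) where rownum <= 10
order by col1

-- Oracle 12c fetch first syntax
select col1, col2
from some_table
order by col1
fetch first 10 rows only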

The inner query and the outer query may or may not return rows in the same order, so the rownum values can come back in a different order. Since rownum is assigned over the already-ordered rows of the inner query, the best thing to do if you want the top N records is to alias rownum in the inner query and use that alias in the outer query:
select ... from (
  select rownum rn, t.* from (
    select ... from ... order by ...
  ) t
) where rn <= N
order by rn

Related

Oracle11g select query with pagination

I am facing a big performance problem when trying to get a paginated list of objects from an Oracle 11g database.
As far as I know and as much as I have checked online, the only way to achieve pagination in Oracle 11g is the following:
Example: [page=1, size=100]
SELECT * FROM
(
SELECT pagination.*, rownum r__ FROM
(
select * from "TABLE_NAME" t
inner join X on X.id = t.id
inner join .....
where ......
order by ...
) pagination
WHERE rownum <= 200
)
WHERE r__ > 100
The problem with this query is that the innermost query, which fetches data from the table "TABLE_NAME", returns a huge amount of data and causes the overall query to take 8 seconds (there are around 2 million records returned after applying the where clause, and it contains 9 or 10 join clauses).
The reason for this is that the innermost query fetches all the data that satisfies the where clause, then the second query takes the first 200 rows, and the third excludes the first 100 to get the second page's data we need.
Isn't there a way to do this in one query, fetching just the second page's data we need, without all these steps and the resulting performance issues?
Thank you!!
It depends on your sorting options (order by ...): the database needs to sort the whole dataset before applying the outer where rownum <= 200 filter because of your order by clause.
It would fetch only 200 rows if you removed the order by clause. In some cases Oracle can avoid sort operations (for example, if it can use an index to return the requested data in the required order). By the way, Oracle uses optimized sorting for rownum < N predicates: it doesn't sort the full dataset, it just keeps the top N records.
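For example, if the inner query filtered on one column and ordered by another, an index like this (hypothetical names) could let Oracle read the rows already in the required order and stop early instead of sorting:
-- hypothetical: inner query is ... where t.status = 'A' order by t.created_date
create index tab_status_created_ix on "TABLE_NAME" (status, created_date);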
You can investigate sort operations in more depth using the sort trace event: alter session set events '10032 trace name context forever, level 10';
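For example (a sketch; the trace file is written to the session trace directory):
alter session set events '10032 trace name context forever, level 10';
-- run the top-N query here
alter session set events '10032 trace name context off';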
Furthermore, sometimes it's better to use analytic functions like
select *
from (
  select
    t1.*,
    t2.*,
    row_number() over ([partition by ...] order by ...) rn
  from t1, t2
  where ...
)
where rn <= 200
  and rn >= 100
because in some specific cases Oracle can transform your query to push sorting and sort filter predicates to the earliest possible steps.

A fast query that selects the number of rows in each table

I want a query that selects the number of rows in each table,
but the statistics are NOT up to date, so a query like this will not be accurate:
select table_name, num_rows from user_tables
I want to query several schemas, and each schema has at least 500 tables, some of which contain a lot of columns. It would take me days to update the statistics for all of them.
On the Ask Tom site, a function is suggested that includes this query:
'select count(*)
from ' || p_tname INTO l_columnValue;
Such a query with count(*) is really slow and will not give me results quickly.
Is there a query that can tell me quickly how many rows are in each table?
You said in a comment that you want to delete (drop?) empty tables. If you don't want an exact count but only want to know if a table is empty you can do a shortcut count:
select count(*) from table_name where rownum < 2;
The optimiser will stop when it reaches the first row - the execution plan shows a 'count stopkey' operation - so it will be fast. It will return zero for an empty table, and one for a table with any data - you have no idea how much data, but you don't seem to care.
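You can see that for yourself with an explain plan (using the same table_name placeholder):
explain plan for
select count(*) from table_name where rownum < 2;

select * from table(dbms_xplan.display);
-- the plan should show a COUNT STOPKEY operation above the table or index access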
You still have a slight race condition between the count and the drop, of course.
This seems like a very odd thing to want to do - either your application uses the table, in which case dropping it will break something even if it's empty; or it doesn't, in which case it shouldn't matter whether it holds any (presumably redundant) data, and it can be dropped regardless. If you think there might be confusion, that sounds like your source control (including DDL) needs some work, maybe?
To check if either table in two schemas have a row, just count from both of them; either with a union:
select max(c) from (
select count(*) as c from schema1.table_name where rownum < 2
union all
select count(*) as c from schema2.table_name where rownum < 2
);
... or with greatest and two sub-selects, e.g.:
select greatest(
(select count(*) from schema1.table_name where rownum < 2),
(select count(*) from schema2.table_name where rownum < 2)
) from dual;
Either would return one if either table has any rows, and would only return zero if they were both empty.
Full Disclosure: I had originally suggested a query that specifically counts a column that's (a) indexed and (b) not null. @AlexPoole and @JustinCave pointed out (please see their comments below) that Oracle will optimize a COUNT(*) to do this anyway. As such, this answer has been altered significantly.
There's a good explanation here for why User_Tables shouldn't be used for accurate row counts, even when statistics are up to date.
If your tables have indexes which can be used to speed up the count by doing an index scan rather than a table scan, Oracle will use them. This will make the counts faster, though not by any means instantaneous. That said, this is the only way I know to get an accurate count.
To check for empty (zero row) tables, please use the answer posted by Alex Poole.
You could make a table to hold the counts of each table. Then, set a trigger to run on INSERT for each of the tables you're counting that updates the main table.
You'd also need to include a trigger for DELETE.
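A minimal sketch of that idea, assuming a counts table called row_counts and a tracked table called my_table (both names are made up):
create table row_counts (
  table_name varchar2(128) primary key,
  row_count  number not null
);

-- seed the counter once with the current row count
insert into row_counts (table_name, row_count)
select 'MY_TABLE', count(*) from my_table;

create or replace trigger my_table_rowcount_trg
after insert or delete on my_table
for each row
begin
  if inserting then
    update row_counts set row_count = row_count + 1 where table_name = 'MY_TABLE';
  else
    update row_counts set row_count = row_count - 1 where table_name = 'MY_TABLE';
  end if;
end;
/
Note that this serializes concurrent sessions on the single counter row, so it really only suits tables with light DML.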

Does Oracle's ROWNUM build the whole table before it extracts the rows you want?

I need to make a navigation panel that shows only a subset of a possibly large result set. This subset is 20 records before and 20 records after the current record set. As I navigate the results through the navigation panel, I'll be applying a sliding-window design using ROWNUM to get the next subset. My question is: does Oracle's ROWNUM build the whole table before it extracts the rows you want? Or is it intelligent enough to only generate the rows I need? I googled and couldn't find an explanation of this.
The pre-analytic-function method for doing this would be:
select col1, col2 from (
  select col1, col2, rownum rn from (
    select col1, col2 from the_table order by sort_column
  )
  where rownum <= 20
)
where rn > 10
The Oracle optimizer will recognize in this case that it only needs to get the top 20 rows to satisfy the inner query. It will likely have to look at all the rows (unless, say, the sort column is indexed in a way that lets it avoid the sort altogether) but it will not need to do a full sort of all the rows.
Your solution will not work (as Bob correctly pointed out) but you can use row_number() to do what you want:
SELECT col1,
       col2
FROM (
  SELECT col1,
         col2,
         row_number() over (order by some_column) as rn
  FROM your_table
) t
WHERE rn BETWEEN 10 AND 20
Note that this solution has the added benefit that you can order the final result by different criteria if you want to.
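For example, the same rows ordered by col2 in the final output instead (same hypothetical names as above):
SELECT col1,
       col2
FROM (
  SELECT col1,
         col2,
         row_number() over (order by some_column) as rn
  FROM your_table
) t
WHERE rn BETWEEN 10 AND 20
ORDER BY col2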
Edit: forgot to answer your initial question:
With the above solution, yes Oracle will have to build the full result in order to find out the correct numbering.
With 11g and above you might improve your query using the query cache.
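If that means the server result cache, a minimal sketch (under the assumption that the underlying data changes rarely) would be adding the RESULT_CACHE hint:
SELECT /*+ RESULT_CACHE */ col1, col2
FROM (
  SELECT col1,
         col2,
         row_number() over (order by some_column) as rn
  FROM your_table
) t
WHERE rn BETWEEN 10 AND 20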
Concerning the question's title.
See http://www.orafaq.com/wiki/ROWNUM and this in-depth explanation by Tom Kyte.
Concerning the question's goal.
This should be what you're looking for: Paging with Oracle
I don't think your design is quite going to work out as you've planned. Oracle assigns values to ROWNUM in the order that they are produced by the query - the first row produced is assigned ROWNUM=1, the second is assigned ROWNUM=2, etc. Notice that in order to have ROWNUM=21 assigned the query must first return the first twenty rows and thus if you write a query which says
SELECT *
FROM MY_TABLE
WHERE ROWNUM >= 21 AND
ROWNUM <= 40
no rows will be returned because in order for there to be rows with ROWNUM >= 21 the query must first return all the rows with ROWNUM <= 20.
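The usual workaround is to assign ROWNUM to an alias in an inline view first, for example (a sketch; without an ORDER BY the "first" 40 rows are arbitrary):
SELECT *
FROM (SELECT t.*, ROWNUM rn
      FROM MY_TABLE t
      WHERE ROWNUM <= 40)
WHERE rn >= 21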
I hope this helps.
It's an old question but you should try this - http://www.inf.unideb.hu/~gabora/pagination/results.html

Oracle Select Query, Order By + Limit Results

I am new to Oracle and working with a fairly large database. I would like to perform a query that will select the desired columns, order by a certain column and also limit the results. According to everything I have read, the below query should be working but it is returning "ORA-00918: column ambiguously defined":
SELECT * FROM (
  SELECT * FROM EAI.EAI_EVENT_LOG e,
                EAI.EAI_EVENT_LOG_MESSAGE e1
  WHERE e.SOURCE_URL LIKE '%.XML'
  ORDER BY e.REQUEST_DATE_TIME DESC
) WHERE ROWNUM <= 20
Any suggestions would be greatly appreciated :D
The error message means your result set contains two columns with the same name. Each column in a query's projection needs to have a unique name. Presumably you have a column (or columns) with the same name in both EAI_EVENT_LOG and EAI_EVENT_LOG_MESSAGE.
You also want to join on that column. At the moment you are generating a cross join between the two tables. In other words, if you have a hundred records in EAI_EVENT_LOG and two hundred records in EAI_EVENT_LOG_MESSAGE your result set will be twenty thousand records (without the rownum). This is probably not your intention.
"By switching to innerjoin, will that eliminate the error with the
current code?"
No, you'll still need to handle having two columns with the same name. Basically this comes from using SELECT * across multiple tables. SELECT * is bad practice. It's convenient, but it is always better to specify the exact columns you want in the query's projection. That way you can include (say) e.TRANSACTION_ID and exclude e1.TRANSACTION_ID, and avoid the ORA-00918 exception.
Maybe you have some columns with identical names in both the EAI_EVENT_LOG and EAI_EVENT_LOG_MESSAGE tables? Instead of SELECT *, list all the columns you want to select.
The other problem I see is that you are selecting from two tables but not joining them in the WHERE clause, hence the result set will be the cross product of those two tables.
You need to stop using SQL '89 implicit join syntax.
Not because it doesn't work, but because it is evil.
Right now you have a cross join, which in 99.9% of cases is not what you want.
Also, every sub-select needs to have its own alias.
SELECT * FROM
(SELECT e.*, e1.* FROM EAI.EAI_EVENT_LOG e
INNER JOIN EAI.EAI_EVENT_LOG_MESSAGE e1 on (......)
WHERE e.SOURCE_URL LIKE '%.XML'
ORDER BY e.REQUEST_DATE_TIME DESC) s WHERE ROWNUM <= 20
Please specify a join criterion on the dotted line.
Normally you do a join on a key field, e.g. ON (e.id = e1.event_id)
It's a bad idea to use select *; it's better to specify exactly which fields you want:
SELECT e.field1 as customer_id
,e.field2 as customer_name
.....

Table Join Efficiency Question

When joining across tables (as in the examples below), is there an efficiency difference between joining the tables directly and joining subqueries that contain only the needed columns?
In other words, is there a difference in efficiency between these two queries?
SELECT result
FROM result_tbl
JOIN test_tbl USING (test_id)
JOIN sample_tbl USING (sample_id)
JOIN (SELECT request_id
FROM request_tbl
WHERE request_status='A') USING(request_id)
vs
SELECT result
FROM (SELECT result, test_id FROM result_tbl)
JOIN (SELECT test_id, sample_id FROM test_tbl) USING(test_id)
JOIN (SELECT sample_id FROM sample_tbl) USING(sample_id)
JOIN (SELECT request_id
FROM request_tbl
WHERE request_status='A') USING(request_id)
The only way to find out for sure is to run both with tracing turned on and then look at the trace file. But in all probability they will be treated the same: the optimizer will merge all the inline views into the main statement and come up with the same query plan.
It doesn't matter. It may actually be WORSE since you are taking control away from the optimizer which generally knows best.
However, remember that if you are doing a JOIN and only including a column from one of the tables, it is QUITE OFTEN better to re-write it as a series of EXISTS statements -- because that's what you really mean. JOINs (with some exceptions) will join all matching rows, which is a lot more work for the optimizer to do.
e.g.
SELECT t1.id1
FROM table1 t1
INNER JOIN table2 ON something = something
should almost always be
SELECT id1
FROM table1 t1
WHERE EXISTS( SELECT *
FROM table2
WHERE something = something )
For simple queries the optimizer may reduce the query plans into identical ones. Check it out on your DBMS.
Also this is a code smell and probably should be changed:
JOIN (SELECT request_id
FROM request_tbl
WHERE request_status='A')
to
SELECT result
FROM request
WHERE EXISTS(...)
AND request_status = 'A'
No difference.
You can tell by running EXPLAIN PLAN on both those statements - Oracle knows that all you want is the "result" column, so it only does the minimum necessary to get the data it needs - you should find that the plans will be identical.
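For example, for the first form (the second is checked the same way):
EXPLAIN PLAN FOR
SELECT result
FROM result_tbl
JOIN test_tbl USING (test_id)
JOIN sample_tbl USING (sample_id)
JOIN (SELECT request_id
      FROM request_tbl
      WHERE request_status='A') USING(request_id);

SELECT * FROM table(dbms_xplan.display);
-- repeat for the second query and compare the two plans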
The Oracle optimiser does, sometimes, "materialize" a subquery (i.e. run the subquery and keep the results in memory for later reuse), but this is rare and only occurs when the optimiser believes this will result in a performance improvement; in any case, Oracle will do this "materialization" whether you specified the columns in the subqueries or not.
Obviously if the only place the "results" column is stored is in the blocks (along with the rest of the data), Oracle has to visit those blocks - but it will only keep the relevant info (the "result" column and other relevant columns, e.g. "test_id") in memory when processing the query.
