Oracle11g select query with pagination - oracle

I am facing a serious performance problem when fetching a paginated list of objects from an Oracle 11g database.
As far as I know, and from everything I have found online, the only way to paginate in Oracle 11g is the following:
Example: [page=1, size=100]
SELECT * FROM
(
SELECT pagination.*, rownum r__ FROM
(
select * from "TABLE_NAME" t
inner join X on X.id = t.id
inner join .....
where ......
order by ......
) pagination
WHERE rownum <= 200
)
WHERE r__ > 100
The problem with this query is that the innermost query, which fetches data from "TABLE_NAME", returns a huge amount of data and makes the whole query take 8 seconds (around 2 million rows remain after the where clause is applied, and the query contains 9 or 10 joins).
The reason is that the innermost query fetches every row that satisfies the where clause, then the second query keeps the first 200 rows, and the third discards the first 100 to leave the second page's data.
Is there a way to do this in one query that fetches only the second page's data, without all these steps and the performance cost they cause?
Thank you!

It depends on your sort (order by ...): because of the order by clause, the database has to read and order the whole dataset before it can apply the outer where rownum <= 200.
It will fetch only 200 rows if you remove the order by clause. In some cases Oracle can avoid the sort operation (for example, if it can use an index to read the requested data in the required order). By the way, Oracle uses an optimized sort for rownum <= N predicates: it does not fully sort the dataset, it just keeps the top N records.
You can investigate sort operations in more depth with the sort trace event: alter session set events '10032 trace name context forever, level 10';
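A quick way to see which of these cases applies is to look at the execution plan. The sketch below is only an illustration: the table orders and its columns id and created_at are hypothetical stand-ins for the real query, which the question elides.

```sql
-- Hypothetical table and columns, used only to illustrate the plan check.
EXPLAIN PLAN FOR
SELECT *
FROM (
  SELECT p.*, ROWNUM r__
  FROM (
    SELECT t.id, t.created_at
    FROM orders t
    ORDER BY t.created_at
  ) p
  WHERE ROWNUM <= 200
)
WHERE r__ > 100;

-- Show the plan that was just explained.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

A SORT ORDER BY STOPKEY step in the output suggests the optimized top-N sort is in use. An index scan feeding COUNT STOPKEY with no sort step suggests the sort was avoided altogether.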
Furthermore, sometimes it's better to use analytic functions like
select *
from (
select
t1.*
,t2.*
,row_number()over([partition by ...] order by ...) rn
from t1
,t2
where ...
)
where rn <= 200
and rn > 100
because in some specific cases Oracle can transform your query to push sorting and sort filter predicates to the earliest possible steps.

Related

order by rownum — is it correct or not?

I have a canonical top-N query against an Oracle database suggested by all FAQs and HowTos:
select ... from (
select ... from ... order by ...
) where rownum <= N
It works perfectly on Oracle 11, i.e. it returns top-N records in the order specified in inner select.
However it breaks on Oracle 12. It still returns the same top-N records, but they may get shuffled. The final order of these records is non-deterministic.
I googled but haven't found any related discussions. Looks like everyone else is always getting the correct record order from such select.
One finding was interesting though. I saw that some people use (without an explanation, unfortunately) an additional order by rownum clause in the outer select:
select ... from (
select ... from ... order by ...
) where rownum <= N
order by rownum
(both rownum's here are references to the Oracle pseudocolumn; it's not something returned by inner select)
It appears to work. But with Oracle optimizer you can never be sure if it's just luck or a really correct solution.
The question is: does order by rownum guarantee correct ordering in this case or not, and why? My colleagues and I could not come to an agreement about it.
P.S. I'm aware of other ways to select top-N records, e.g. using row_number analytic function and fetch first clause introduced in Oracle 12. I'm also aware that I can repeat the same order by ... on the outer select. The question is about order by rownum only — is it correct or not.
The inner and outer queries may or may not return rows in the same order, so rownum may be assigned in a different order. Since rownum itself is already ordered, the best way to get the top N records is to alias rownum in the inner query and use that alias in the outer query.
select ... from (
select rownum rn ... from ...
) where rn <= N
order by rn
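For comparison, here is a sketch of two forms in which the final order is stated explicitly rather than left to the pseudocolumn. The table events and its columns id and created_at are hypothetical, and N is fixed at 10 for the example. The first form keeps the order by in the innermost query, captures rownum as a real column, and sorts on that column. The second uses the fetch first clause that the question mentions for Oracle 12.

```sql
-- Form 1: capture ROWNUM as a column (rn) after the ordered inline view,
-- then sort the final result on that captured column.
SELECT id, created_at
FROM (
  SELECT s.*, ROWNUM AS rn
  FROM (
    SELECT id, created_at
    FROM events              -- hypothetical table
    ORDER BY created_at DESC
  ) s
  WHERE ROWNUM <= 10
)
ORDER BY rn;

-- Form 2 (Oracle 12c and later): row-limiting clause on the ordered query.
SELECT id, created_at
FROM events
ORDER BY created_at DESC
FETCH FIRST 10 ROWS ONLY;
```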

Why are subqueries in selects slow compared to joins?

I have two queries. The first retrieves some aggregate from another table as a column, using a subquery in the select (returns a string concatenation of a column for all rows).
The second query does the same with a subselect in the from clause whose results are then joined. This second query, however, computes the aggregate over the complete table before joining, yet it is much faster (286ms vs 7645ms).
I don't understand why the subquery is so much slower, given that the second query computes the aggregate over a table with 175k rows (on PostgreSQL 9.5). A subselect is much easier to integrate into a query builder, so I would like to use that, and the second query will slow down as the number of records increases. Is there a way to speed up the subselect?
Query 1:
select kp_No,
(select string_agg(description,E'\n') from (select nt_Text as description from fgeNote where nt_kp_No=fgeContact.kp_No order by nt_No DESC limit 3) as subquery) as description
from fgeContact
where kp_k_No=729;
Explain: https://explain.depesz.com/s/8sL
Query 2:
select kp_No, NoteSummary
from fgeContact
LEFT JOIN
(select nt_kp_No, string_agg(nt_Text,E'\n') as NoteSummary
from
(select nt_kp_No, nt_Text from fgeNote ORDER BY nt_No DESC) as sortquery
group by nt_kp_No) as joinquery
ON joinquery.nt_kp_No=kp_No
where kp_k_No=729;
Explain: https://explain.depesz.com/s/yk9W
This is because the second query retrieves all rows in a single scan, while in the first one the subquery is executed once for each selected row of the master table, so each time the table has to be scanned again.
Even with index scans, this is usually more expensive than scanning the whole table once, even sequentially, and picking only the rows of interest (in fact, a sequential scan is much faster than an index scan when selecting more than a few rows, because index access carries some overhead).
But that depends on the actual data distribution too. It's perfectly possible that, for a different value of kp_k_No, the first query would become the faster one if the table contains only one or a few rows with that value.
It's a matter of testing and estimating the different situations that will occur...
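One variation that may be worth testing, offered here as a sketch rather than a guaranteed improvement, keeps the per-contact "latest 3 notes" logic of Query 1 but expresses it as a LATERAL join (available in PostgreSQL 9.5) and backs it with an index so that each lookup can avoid rescanning fgeNote. The index name is hypothetical, and whether a similar index already exists is not stated in the question.

```sql
-- Hypothetical index: lets each per-contact lookup read the newest notes directly.
CREATE INDEX fgenote_kp_no_nt_no_idx ON fgeNote (nt_kp_No, nt_No DESC);

SELECT c.kp_No, n.description
FROM fgeContact c
LEFT JOIN LATERAL (
  -- Aggregate the three most recent notes for this contact, newest first.
  SELECT string_agg(s.nt_Text, E'\n' ORDER BY s.nt_No DESC) AS description
  FROM (
    SELECT nt_Text, nt_No
    FROM fgeNote
    WHERE nt_kp_No = c.kp_No
    ORDER BY nt_No DESC
    LIMIT 3
  ) s
) n ON true
WHERE c.kp_k_No = 729;
```

Comparing the EXPLAIN ANALYZE output of this form with both original queries should show whether the per-row lookups become cheap enough for the data in question.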

Write a nested select statement with a where clause in Hive

I have a requirement to do a nested select within a where clause in a Hive query. A sample code snippet would be as follows:
select *
from TableA
where TA_timestamp > (select timestmp from TableB where id="hourDim")
Is this possible, or am I doing something wrong here? I am getting an error when running the above script.
To elaborate on what I am trying to do: there is a Cassandra keyspace to which I publish statistics with a timestamp. Periodically (hourly, for example) these stats are summarized using Hive, and the summarized data is stored separately with the corresponding hour. So when the query runs for the second time (and on subsequent runs), it should only process the new data (i.e. timestamp > previous_execution_timestamp). I am trying to do that by storing the latest executed timestamp in a separate Hive table and then using that value to filter the raw stats.
Can this be achieved using Hive?
Subqueries inside a WHERE clause are not supported in Hive:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SubQueries
However, often you can use a JOIN statement instead to get to the same result:
https://karmasphere.com/hive-queries-on-table-data#join_syntax
For example, this query:
SELECT a.KEY, a.value
FROM a
WHERE a.KEY IN
(SELECT b.KEY FROM b);
can be rewritten to:
SELECT a.KEY, a.value
FROM a LEFT SEMI JOIN b ON (a.KEY = b.KEY)
Looking at the business requirements underlying your question, you might get more efficient results by partitioning your Hive table by hour. If the data can be written with the hour as the partition key, your query to update the summary will be much faster and require fewer resources.
Partitions can get out of hand when they number in the millions, but this does not seem like a case that will test that limit.
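Applied to the query in the question, the join rewrite could look like the sketch below. It assumes that the subquery on TableB returns exactly one row for id "hourDim", which the question implies but does not state. It also assumes a Hive version that accepts CROSS JOIN and is not configured to reject Cartesian products.

```sql
-- The single-row subquery is joined to every row of TableA,
-- and the comparison moves into the WHERE clause.
SELECT a.*
FROM TableA a
CROSS JOIN (
  SELECT timestmp
  FROM TableB
  WHERE id = 'hourDim'
) b
WHERE a.TA_timestamp > b.timestmp;
```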
It will work if you use IN:
select *
from TableA
where TA_timestamp in (select timestmp from TableB where id="hourDim")
EXPLANATION: The operators >, < and = need exactly one value on the right-hand side, whereas here the subquery can return multiple values, which only an IN clause can accept.

hibernate oracle rownum issue

SELECT * FROM (
select *
from tableA
where ColumnA = 'randomText'
ORDER BY columnL ASC
) WHERE ROWNUM <= 25
On execution, due to some Oracle optimization, this query takes about 14 minutes. If I remove the where clause, the query executes in seconds. Most of the columns of the table are indexed, including the ones mentioned above. I do not have much flexibility over the structure of the query because I use Hibernate.
This query returns results instantly too, with the correct result:
SELECT *
FROM (
select *
from tableA,
dual
where ColumnA = 'randomText'
ORDER BY columnL ASC
) WHERE ROWNUM <= 25
Is there something I can do using Hibernate?
UPDATE: I use EntityManager.createQuery(), together with setMaxResults(25) and setFirstResult(). The query above is what Hibernate generates, according to the logs.
I can't reproduce explain plans that exactly match your queries, but it seems Oracle is using a different index for each of the two queries.
Can you create an index containing columnA and columnL?
If you have an index only containing columnA, you MIGHT be able to drop that without a large effect on performance of other queries.
An alternative would be to add a hint that forces the index used by the faster query, but this would require you to use native SQL.
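The composite index suggested above could be created as in the sketch below. The index name is hypothetical. The column order, with the equality column first and the sort column second, is an assumption about what would let Oracle both filter on ColumnA and read rows already ordered by columnL.

```sql
-- Hypothetical index name; covers the filter column and the sort column together.
CREATE INDEX tablea_cola_coll_ix ON tableA (ColumnA, columnL);
```

Whether the optimizer actually picks this index for the Hibernate-generated query is something to confirm against the real data.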
Does this mean you are using Hibernate/JPA? If so, I guess you are using EntityManager.createNativeQuery() to create the query? Try removing your rownum restriction and use .setMaxResults(25) on the Query instead.
Anyway, why do you need the outer select? Wouldn't
select *
from tableA
where ColumnA = 'randomText'
AND ROWNUM <= 25
ORDER BY columnL ASC
produce the desired results?

Oracle Select Query, Order By + Limit Results

I am new to Oracle and working with a fairly large database. I would like to perform a query that will select the desired columns, order by a certain column and also limit the results. According to everything I have read, the below query should be working but it is returning "ORA-00918: column ambiguously defined":
SELECT * FROM(SELECT * FROM EAI.EAI_EVENT_LOG e,
EAI.EAI_EVENT_LOG_MESSAGE e1 WHERE e.SOURCE_URL LIKE '%.XML'
ORDER BY e.REQUEST_DATE_TIME DESC) WHERE ROWNUM <= 20
Any suggestions would be greatly appreciated :D
The error message means your result set contains two columns with the same name. Each column in a query's projection needs to have a unique name. Presumably you have a column (or columns) with the same name in both EAI_EVENT_LOG and EAI_EVENT_LOG_MESSAGE.
You also want to join on that column. At the moment you are generating a cross join between the two tables. In other words, if you have a hundred records in EAI_EVENT_LOG and two hundred records in EAI_EVENT_LOG_MESSAGE, your result set will be twenty thousand records (without the rownum). This is probably not your intention.
"By switching to innerjoin, will that eliminate the error with the current code?"
No, you'll still need to handle having two columns with the same name. Basically this comes from using SELECT * on multiple tables. SELECT * is bad practice. It's convenient, but it is always better to specify the exact columns you want in the query's projection. That way you can include (say) e.TRANSACTION_ID and exclude e1.TRANSACTION_ID, and avoid the ORA-00918 exception.
Maybe you have some columns in both EAI_EVENT_LOG and EAI_EVENT_LOG_MESSAGE tables having identical names? Instead of SELECT * list all columns you want to select.
Another problem I see is that you are selecting from two tables without joining them in the WHERE clause, so the result set will be the cross product of those two tables.
You need to stop using SQL '89 implicit join syntax.
Not because it doesn't work, but because it is evil.
Right now you have a cross join, which in 99.9% of cases is not what you want.
Also, every sub-select needs to have its own alias.
SELECT * FROM
(SELECT e.*, e1.* FROM EAI.EAI_EVENT_LOG e
INNER JOIN EAI.EAI_EVENT_LOG_MESSAGE e1 on (......)
WHERE e.SOURCE_URL LIKE '%.XML'
ORDER BY e.REQUEST_DATE_TIME DESC) s WHERE ROWNUM <= 20
Please specify a join criterion on the dotted line.
Normally you join on a key field, e.g. ON (e.id = e1.event_id)
It's a bad idea to use select *; it's better to specify exactly which fields you want:
SELECT e.field1 as customer_id
,e.field2 as customer_name
.....
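Putting the advice from these answers together, the query might end up looking like the sketch below. SOURCE_URL and REQUEST_DATE_TIME come from the question. TRANSACTION_ID is the column one answer used as a "say" example, and EVENT_ID and MESSAGE_TEXT are hypothetical names chosen only for illustration, since the real join key and message columns are not given.

```sql
SELECT *
FROM (
  SELECT e.TRANSACTION_ID,        -- taken from e only, so the name is not duplicated
         e.SOURCE_URL,
         e.REQUEST_DATE_TIME,
         e1.MESSAGE_TEXT          -- hypothetical column
  FROM EAI.EAI_EVENT_LOG e
  INNER JOIN EAI.EAI_EVENT_LOG_MESSAGE e1
    ON e1.EVENT_ID = e.EVENT_ID   -- hypothetical join key
  WHERE e.SOURCE_URL LIKE '%.XML'
  ORDER BY e.REQUEST_DATE_TIME DESC
)
WHERE ROWNUM <= 20;
```

Because every column in the inner projection now has a distinct name, the outer SELECT * no longer has anything ambiguous to resolve.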
