Sub Query Issues - oracle

Our DB server changed recently, and since then correlated subqueries have started causing performance issues.
Example :
select * from table1 a where col1 =
(select max(col1) from table1 b where a.p1=b.p1)
This pattern appears in many places, so I am NOT looking to change the queries, but any database-level changes should be fine. I am looking for which DB parameters could cause the performance issue.

The optimizer should be able to rewrite your query into the following, more efficient form. (It probably can't if your real query is much more complicated, with many joins and so on; I assume your example is simplified just to illustrate the problem.)
select * from table1 where (p1, col1) in
(select p1, max(col1) from table1 group by p1);
If the optimizer doesn't (or can't, for some reason) rewrite the query this way, then it will obviously be slow, since it must read the table repeatedly - once for each row in the table.
Another common way to get the same result (but experimentation shows that it is often slower, even though it only reads the base table once, vs. twice with the solution above) uses analytic functions:
select * -- or just the columns from table1, without rnk
from (
select t.*,
dense_rank() over (partition by p1
order by col1 desc nulls last) as rnk
from table1 t
)
where rnk = 1
;
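To check that the three formulations above return the same rows, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for Oracle (an assumption made purely so the example is runnable; it says nothing about Oracle's execution plans or timings). The sample data is made up, and it contains no NULLs, so the NULLS LAST clause is left out.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for Oracle (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (p1 TEXT, col1 INTEGER)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?)",
    [("a", 1), ("a", 3), ("a", 3), ("b", 2), ("b", 5), ("c", 4)],  # made-up rows
)

# 1. Correlated subquery, as in the question.
correlated = """
    SELECT p1, col1 FROM table1 a
    WHERE col1 = (SELECT MAX(col1) FROM table1 b WHERE a.p1 = b.p1)
    ORDER BY p1, col1
"""
# 2. Multi-column IN against one grouped pass over the table.
grouped = """
    SELECT p1, col1 FROM table1
    WHERE (p1, col1) IN (SELECT p1, MAX(col1) FROM table1 GROUP BY p1)
    ORDER BY p1, col1
"""
# 3. Analytic function: keep the rows ranked first within each p1.
ranked = """
    SELECT p1, col1 FROM (
        SELECT t.*, DENSE_RANK() OVER (PARTITION BY p1 ORDER BY col1 DESC) AS rnk
        FROM table1 t
    )
    WHERE rnk = 1
    ORDER BY p1, col1
"""
results = [conn.execute(q).fetchall() for q in (correlated, grouped, ranked)]
print(results[0])
```

All three keep ties (both rows with p1 = 'a' and col1 = 3 are returned), which is why DENSE_RANK rather than ROW_NUMBER matches the original query.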

Related

Select N arbitrary rows from a very large table in Oracle

I have a very large table MY_TABLE (100 million rows). I wish to select a sample of 5, say, records from this table.
What I can think of is getting 5 arbitrary primary keys as follows, this uses fast full scan as the explain plan shows:
select MY_PRIMARY_KEY_COLUMN from (select MY_PRIMARY_KEY_COLUMN, rownum as rn from MY_TABLE) where rn <=5
and then getting the records corresponding to these primary keys.
However, this is still very slow.
Can it be done more efficiently?
It looks like I got confused. As the commenters noticed, there should have been no problem with the query
select * from MY_TABLE where rownum <=5
but I somehow started to look at
select MY_PRIMARY_KEY_COLUMN from (select MY_PRIMARY_KEY_COLUMN, rownum as rn from MY_TABLE) where rn <=5
which indeed runs very slowly.
Sorry for wasting everyone's time, the select * from MY_TABLE where rownum <=5 works perfectly.

Select distinct on specific columns but select other columns also in hive

I have a Hive table with around 80 columns. I need to apply the distinct clause to some of the columns and also get the first values from the other columns. Below is a representation of what I am trying to achieve.
select distinct(col1,col2,col3),col5,col6,col7
from abc where col1 = 'something';
All the columns mentioned above are text columns. So I cannot apply group by and aggregate functions.
You can use the row_number function to solve the problem.
create table temp as
select *, row_number() over (partition by col1,col2,col3) as rn
from abc
where col1 = 'something';
select *
from temp
where rn=1
You can also sort the table while partitioning.
row_number() over (partition by col1,col2,col3 order by col4 asc) as rn
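As a rough illustration of the row_number approach, the sketch below runs the same pattern through Python's built-in sqlite3 module instead of Hive (an assumption, chosen only so the example is self-contained). The table contents are invented, and an inline subquery replaces the temp table.

```python
import sqlite3

# SQLite stands in for Hive here (assumption); the data is made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE abc (col1 TEXT, col2 TEXT, col3 TEXT, col4 INTEGER, col5 TEXT)"
)
conn.executemany(
    "INSERT INTO abc VALUES (?, ?, ?, ?, ?)",
    [
        ("something", "x", "y", 2, "second"),
        ("something", "x", "y", 1, "first"),
        ("something", "x", "z", 7, "only"),
        ("other", "x", "y", 0, "filtered out"),
    ],
)

# Number the rows inside each (col1, col2, col3) group, lowest col4 first,
# then keep only the first row of every group.
rows = conn.execute("""
    SELECT col1, col2, col3, col5
    FROM (
        SELECT abc.*,
               ROW_NUMBER() OVER (PARTITION BY col1, col2, col3
                                  ORDER BY col4 ASC) AS rn
        FROM abc
        WHERE col1 = 'something'
    )
    WHERE rn = 1
    ORDER BY col3
""").fetchall()
print(rows)
```

Without the ORDER BY inside the window, which row counts as "first" in each group is not defined, so adding an explicit ordering column is usually the safer choice.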
DISTINCT is the most overused and least understood keyword in SQL. It is not a function: it is applied last, over your entire result set, and removes duplicates using ALL columns in your select. You can do a GROUP BY with a string; in fact that is the answer here:
SELECT col1,col2,col3,COLLECT_SET(col4),COLLECT_SET(col5),COLLECT_SET(col6)
FROM abc WHERE col1 = 'something'
GROUP BY col1,col2,col3;
Now that I re-read your question though, I'm not really sure what you are after. You might have to join the table to an aggregate of itself.

How to use Oracle hints or other optimization to fix function in where clause performance issue?

This is slow:
select col_x from table_a where col_y in (select col_y from table_b) and fn(col_x)=0;
But I know that this will return 4 rows fast, and that I can run fn() on 4 values fast.
So I do some testing, and I see that this is fast:
select fn(col_x) from table_a where col_y in (select col_y from table_b);
When using the fn() in the where clause, Oracle is running it on every row in table_a. How can I make it so Oracle first uses the col_y filter, and only runs the function on the matched rows?
For example, conceptually, I thought this would work:
with taba as (
select fn(col_x) x from table_a where col_y in (select col_y from table_b)
)
select * from taba where x=0;
because I thought Oracle would run the with clause first, but Oracle is "optimizing" this query and making this run exactly the same as the first query above where fn(col_x)=0 is in the where clause.
I would like this to run just as a query and not in a pl/sql block. It seems like there should be a way to give oracle a hint, or do some other trick, but I can't figure it out. BTW, table is indexed on col_y and it is being used as an access predicate. Stats are up to date.
There are two ways you could work around it:
1) Add 'AND rownum >= 0' to the subquery to force materialization,
OR
2) Use a CASE expression inside the query to force the evaluation order (this may or may not work).
This works, but if anyone has a better answer, please share:
select col_x
from table_a
where col_y in (select col_y from table_b)
and exists (select 1 from dual where fn(col_x)=0);
Kind of kludgy, but works. Takes a query running in 60+ seconds down to .1 seconds.
You could try the HAVING clause in your query. This clause is not executed until the base query is completed, and then the HAVING clause is run on the resulting rows. It's typically used with aggregate functions, but could be useful in your case.
select col_x
from table_a
where col_y in (select col_y from table_b)
having fn(col_x)=0;
A HAVING clause restricts the results of a GROUP BY in a
SelectExpression. The HAVING clause is applied to each group of the
grouped table, much as a WHERE clause is applied to a select list. If
there is no GROUP BY clause, the HAVING clause is applied to the
entire result as a single group. The SELECT clause cannot refer
directly to any column that does not have a GROUP BY clause. It can,
however, refer to constants, aggregates, and special registers.
http://docs.oracle.com/javadb/10.8.3.0/ref/rrefsqlj14854.html
1) Why not try joining table_a and table_b on col_y?
select a.col_x from table_a a,table_b b
where a.col_y = b.col_y
and fn(col_x) = 0
2) NO_PUSH_PRED -
select /*+ NO_PUSH_PRED(v) */ col_x from (
select col_x from table_a where col_y in (select col_y from table_b)
) v
where fn(col_x) =0
3) Exists and PUSH_SUBQ.
select col_x from table_a a
where exists( select /*+ PUSH_SUBQ */ 1 from table_b b where a.col_y = b.col_y )
and fn(col_x) = 0;

select * from (select first_name, last_name from employees)

I understand the meaning of this statement, but I don't understand why we need it.
This is equivalent to
select first_Name, last_name from employees
I see this type of statement in many examples. Can you please explain when we need this? In practice, do we use this type of statement?
Can you please explain when we need this?
These are called Derived Tables.
A "derived table" is essentially a statement-local temporary table
created by means of a subquery in the FROM clause of a SQL SELECT
statement. It exists only in memory and behaves like a standard view
or table.
In SQL, subqueries can only see values from parent queries one level deep.
In practice, do we use this type of statement?
The most common use of it is the classic row-limiting query using ROWNUM.
Row-Limiting query:
SELECT *
FROM (SELECT *
FROM emp
ORDER BY sal DESC)
WHERE ROWNUM <= 5;
Pagination query:
SELECT eno
FROM (SELECT e.empno eno,
ROWNUM rn
FROM (SELECT empno
FROM emp
ORDER BY sal DESC) e)
WHERE rn <= 5;
This particular statement is useless, you're right, but there are many occasions when you need a subselect because you can't do everything in one statement. Off the top of my head, one example is combining aggregate functions: getting the min, max and avg of a sum.
select min(t.summed), max(t.summed), avg(t.summed)
from (select type, sum(value) as summed from table1 group by type) t
This is just off the top of my head, but I have encountered many occasions where subselects in the FROM clause were necessary. Once the statements are complex enough you'll see it.
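The aggregate-of-an-aggregate case can be tried out with a small sketch. It uses Python's built-in sqlite3 module rather than Oracle (an assumption, for the sake of a runnable example), and the rows in table1 are invented.

```python
import sqlite3

# SQLite stands in for Oracle (assumption); the rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (type TEXT, value INTEGER)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?)",
    [("a", 1), ("a", 2), ("b", 10), ("c", 4), ("c", 5)],
)

# The inner query produces one sum per type (a=3, b=10, c=9);
# the outer query then aggregates over those sums.
row = conn.execute("""
    SELECT MIN(t.summed), MAX(t.summed), AVG(t.summed)
    FROM (SELECT type, SUM(value) AS summed FROM table1 GROUP BY type) t
""").fetchone()
print(row)
```

The outer MIN, MAX and AVG need the per-type sums to exist as rows first, which is the job the derived table does.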

Best practice for pagination in Oracle?

Problem: I need to write stored procedure(s) that will return a result set of a single page of rows and the total number of rows.
Solution A: I create two stored procedures, one that returns a results set of a single page and another that returns a scalar -- total rows. The Explain Plan says the first sproc has a cost of 9 and the second has a cost of 3.
SELECT *
FROM ( SELECT ROW_NUMBER() OVER ( ORDER BY D.ID DESC ) AS RowNum, ...
) AS PageResult
WHERE RowNum >= #from
AND RowNum < #to
ORDER BY RowNum
SELECT COUNT(*)
FROM ...
Solution B: I put everything in a single sproc, by adding the same TotalRows number to every row in the result set. This solution feels hackish, but has a cost of 9 and only one sproc, so I'm inclined to use this solution.
SELECT *
FROM ( SELECT ROW_NUMBER() OVER ( ORDER BY D.ID DESC ) RowNum, COUNT(*) OVER () TotalRows,
WHERE RowNum >= from
AND RowNum < to
ORDER BY RowNum;
Is there a best-practice for pagination in Oracle? Which of the aforementioned solutions is most used in practice? Is any of them considered just plain wrong? Note that my DB is and will stay relatively small (less than 10GB).
I'm using Oracle 11g and the latest ODP.NET with VS2010 SP1 and Entity Framework 4.4. I need the final solution to work within the EF 4.4. I'm sure there are probably better methods out there for pagination in general, but I need them working with EF.
If you're already using analytics (ROW_NUMBER() OVER ...) then adding another analytic function on the same partitioning will add a negligible cost to the query.
On the other hand, there are many other ways to do pagination, one of them using rownum:
SELECT *
FROM (SELECT A.*, rownum rn
FROM (SELECT *
FROM your_table
ORDER BY col) A
WHERE rownum <= :Y)
WHERE rn >= :X
This method will be superior if you have an appropriate index on the ordering column. In this case, it might be more efficient to use two queries (one for the total number of rows, one for the result).
Both methods are appropriate but in general if you want both the number of rows and a pagination set then using analytics is more efficient because you only query the rows once.
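A minimal sketch of the single-query analytic approach (page rows plus total count in one pass) is shown below. It runs against Python's built-in sqlite3 module instead of Oracle, which is an assumption made only to keep the example runnable; the table d and the bind names :lo and :hi are hypothetical.

```python
import sqlite3

# SQLite stands in for Oracle (assumption); table d and its rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE d (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO d VALUES (?, ?)", [(i, f"row{i}") for i in range(1, 13)]
)

# ROW_NUMBER() numbers the rows by descending id, and COUNT(*) OVER ()
# attaches the total row count to every row of the same result set.
page = conn.execute(
    """
    SELECT id, rn, total_rows
    FROM (
        SELECT id,
               ROW_NUMBER() OVER (ORDER BY id DESC) AS rn,
               COUNT(*) OVER () AS total_rows
        FROM d
    )
    WHERE rn >= :lo AND rn < :hi
    ORDER BY rn
    """,
    {"lo": 6, "hi": 11},  # hypothetical bounds: rows 6 through 10
).fetchall()
print(page)
```

The total is repeated on every row of the page, which is the trade-off Solution B in the question describes.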
In Oracle 12c you can use OFFSET and FETCH for pagination.
Example -
Suppose you have Table tab from which data needs to be fetched on the basis of DATE datatype column dt in descending order using pagination.
page_size:=5
select * from tab
order by dt desc
OFFSET (nvl(page_no,1)-1)*page_size ROWS FETCH NEXT page_size ROWS ONLY;
Explanation:
page_no=1
page_size=5
OFFSET 0 ROWS FETCH NEXT 5 ROWS ONLY - Fetch 1st 5 rows only
page_no=2
page_size=5
OFFSET 5 ROWS FETCH NEXT 5 ROWS ONLY - Fetch next 5 rows
and so on.
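The offset arithmetic in the explanation can be written as a small helper. This is only a sketch: the function name page_offset is hypothetical, and treating a missing page number as page 1 is an assumption about the intended default.

```python
def page_offset(page_no, page_size):
    """Return how many rows to skip before the requested 1-based page.

    A missing page number (None) is treated as the first page (assumption).
    """
    if page_no is None:
        page_no = 1
    if page_no < 1 or page_size < 1:
        raise ValueError("page_no and page_size must be positive")
    return (page_no - 1) * page_size


for page_no in (1, 2, 3):
    print(page_no, page_offset(page_no, 5))
```

With a page size of 5, page 1 skips 0 rows and page 2 skips 5, matching the OFFSET values listed above.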
Reference pages -
https://dba-presents.com/index.php/databases/oracle/31-new-pagination-method-in-oracle-12c-offset-fetch
https://oracle-base.com/articles/12c/row-limiting-clause-for-top-n-queries-12cr1#paging
This may help:
SELECT * FROM
( SELECT deptno, ename, sal, ROW_NUMBER() OVER (ORDER BY ename) Row_Num FROM emp)
WHERE Row_Num BETWEEN 5 and 10;
A clean way to organize your SQL code is through the WITH clause.
The reduced version further down also returns the total number of results and the total page count.
For example
WITH SELECTION AS (
SELECT FIELDA, FIELDB, FIELDC FROM TABLE),
NUMBERED AS (
SELECT
ROW_NUMBER() OVER (ORDER BY FIELDA) RN,
SELECTION.*
FROM SELECTION)
SELECT
(SELECT COUNT(*) FROM NUMBERED) TOTAL_ROWS,
NUMBERED.*
FROM NUMBERED
WHERE
RN BETWEEN ((:page_size*:page_number)-:page_size+1) AND (:page_size*:page_number)
This code gives you a paged resultset with two more fields:
TOTAL_ROWS with the total rows of your full SELECTION
RN the row number of the record
It requires two parameters, :page_size and :page_number, to slice your SELECTION.
Reduced Version
Here SELECTION already includes the ROW_NUMBER() field:
WITH SELECTION AS (
SELECT
ROW_NUMBER() OVER (ORDER BY FIELDA) RN,
FIELDA,
FIELDB,
FIELDC
FROM TABLE)
SELECT
:page_number PAGE_NUMBER,
CEIL((SELECT COUNT(*) FROM SELECTION ) / :page_size) TOTAL_PAGES,
:page_size PAGE_SIZE,
(SELECT COUNT(*) FROM SELECTION ) TOTAL_ROWS,
SELECTION.*
FROM SELECTION
WHERE
RN BETWEEN ((:page_size*:page_number)-:page_size+1) AND (:page_size*:page_number)
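To see how the :page_size and :page_number binds slice the result, here is a sketch of the WITH-based query run through Python's built-in sqlite3 module (a stand-in for Oracle, which is an assumption; the items table and its rows are invented). The total page count is computed in Python rather than with CEIL, since CEIL is not guaranteed to be available in every SQLite build.

```python
import sqlite3

# SQLite stands in for Oracle (assumption); the items table is made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (fielda INTEGER, fieldb TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)", [(i, f"item{i}") for i in range(1, 24)]
)

query = """
WITH selection AS (
    SELECT fielda, fieldb FROM items
),
numbered AS (
    SELECT ROW_NUMBER() OVER (ORDER BY fielda) AS rn, selection.*
    FROM selection
)
SELECT (SELECT COUNT(*) FROM numbered) AS total_rows, numbered.*
FROM numbered
WHERE rn BETWEEN (:page_size * :page_number) - :page_size + 1
             AND (:page_size * :page_number)
ORDER BY rn
"""
page_size, page_number = 10, 3
rows = conn.execute(
    query, {"page_size": page_size, "page_number": page_number}
).fetchall()

# Ceiling division in Python: total_rows / page_size, rounded up.
total_rows = rows[0][0]
total_pages = -(-total_rows // page_size)
print(rows, total_pages)
```

With 23 rows and a page size of 10, page 3 covers row numbers 21 to 30, so only the last three rows come back.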
Try this:
select * from ( select * from "table" order by "column" desc ) where ROWNUM > 0 and ROWNUM <= 5;
I also faced a similar issue. I tried all the above solutions and none of them improved performance. I have a table with millions of records and I need to display them on screen in pages of 20. I did the following to solve the issue.
Add a new column ROW_NUMBER in the table.
Make the column as primary key or add a unique index on it.
Use the population program (in my case, Informatica), to populate the column with rownum.
Fetch Records from the table using between statement. (SELECT * FROM TABLE WHERE ROW_NUMBER BETWEEN LOWER_RANGE AND UPPER_RANGE).
This method is effective if we need to do an unconditional pagination fetch on a huge table.
Sorry, this one works with sorting:
SELECT * FROM (SELECT ROWNUM rnum,a.* FROM (SELECT * FROM "tabla" order by "column" asc) a) WHERE rnum BETWEEN "firstrange" AND "lastrange";

Resources