I have an Oracle view (using 11gR2) which is a join of three other views, i.e.:
create or replace view main_vw as
select a.*, b.*, c.*
from a_vw a, b_vw b, c_vw c
where a.b_id = b.b_id
and a.c_id = c.c_id
Doing "select * from main_vw" hangs for hours, and EXPLAIN PLAN shows a very inefficient query plan. Yet, if create the logically equivalent steps as:
create table a_tbl as select * from a_vw;
create table b_tbl as select * from b_vw;
create table c_tbl as select * from c_vw;
select a.*, b.*, c.*
from a_tbl a, b_tbl b, c_tbl c
where a.b_id = b.b_id and a.c_id = c.c_id;
All four statements complete in under 5 seconds.
Is there any way I can use hints or something to get Oracle's optimizer to evaluate the sub-views first, and then join them as if they were tables?
I've looked at the hints 'QB_NAME', 'NO_EXPAND', 'NO_REWRITE' to no avail...
Note: a_vw, b_vw, and c_vw in this example are quite complex queries, and they do reference base tables in common. They also reference a settings table, whose contents is customized at the session level and affects what is returned. So, I cannot create tables from these views.
Use ROWNUM to force Oracle to evaluate inline views independently.
create or replace view main_vw as
select a.*, b.*, c.*
from
(select * from a_vw where rownum >= 1 /*prevent transformations*/) a,
(select * from b_vw where rownum >= 1 /*prevent transformations*/) b,
(select * from c_vw where rownum >= 1 /*prevent transformations*/) c
where a.b_id = b.b_id
and a.c_id = c.c_id
This looks odd at first. The ROWNUM doesn't appear to do anything. But ROWNUM is a special pseudo-column used for sorting that "can affect view optimization". In practice it prevents all optimizations and is the only safe way to completely isolate code. This method is also useful for type-safety, such as a Entity-Attribute-Value pattern where everything is stored as a string and must be processed in a specific order.
As you've already discovered, hints are difficult to get right. Even if you do get them right now there's a good chance they won't work properly when the query is modified by another developer in the future. To prevent them from removing this cryptic predicate make sure to add a comment.
This won't necessarily solve the root performance problem but it should at least make it exponentially easier to solve. The over-all explain plan should include three sections that match the explain plans for each inline view. If each inline view runs well you only need to worry about the two joins between them.
Related
I'm having some difficulty with joining a view to another table. This is on an Oracle RAC system running 11.2
I'll try and give as much detail as possible without going into specific table structures as my company would not like that.
You all know how this works. "Hey, can you write some really ugly software to implement our crazy ideas?"
The idea of what they wanted me to do was to make a view where the end user wouldn't know if they were going after the new table or the old table so one of the tables is a parameter table that will return "ON" or "OFF" and is used in the case statements.
There are some not too difficult but nested case statements in the select clause
I have a view:
create view my_view as
select t1.a as a, t1.b as b, t1.c as c,
sum(case when t2.a = 'xx' then case when t3.a then ... ,
case when t2.a = 'xx' then case when t3.a then ... ,
from table1 t1
join table t2 on (t1.a = t2.a etc...)
full outer join t3 on (t1.a = t3.a etc...)
full outer join t4 on (t1.a = t4.a etc...)
group by t1.a, t1.b, t2.c, and all the ugly case statements...
Now, when I run the query
select * from my_view where a='xxx' and b='yyy' and c='zzz'
the query runs great and the cost is 10.
However, when I join this view with another table everything falls apart.
select * from my_table mt join my_view mv on (mt.a = mv.a and mt.b=mv.b and mt.c=mv.c) where ..."
everything falls apart with a cost though the roof.
What I think is happening is the predicates are not getting pushed to the view. As such, the view is now doing full tables scans and joining everything to everything and then finally removing all the rows.
Every hint, tweak, or anything I've done doesn't appear to help.
When looking at the plan it looks like it has the predicates.
But this happens after everything is joined.
Sorry if this is cryptic but any help would be greatly appreciated.
Since you have the view with a "GROUP BY", predicates could not be pushed to the inner query
Also, you have the group by functions in a case statement, which could also make it worse for the optimizer
Oracle introduces enhancements to Optimizer every version/release/patch. It is hard to say what is supported in the version you're running. However, you can try:
See if removing the case from the GROUP BY function will make any difference
Otherwise, you have to take the GROUP BY and GROUP BY functions from the view to the outer most query
After many keyboard indentations on my forehead I may have tricked Oracle into pushing the predicates. I don't know exactly why this works but simplifying things may have helped.
I changed all my ON clauses to USING clauses and in this way the column names now match the columns from which I'm joining to. On some other predicates that were constants I added in a where clause to the view.
The end result is I can now join this view with another table and the cost is reasonable and the plan shows that the predicates are being pushed.
Thank you to everybody who looked at this problem.
I have a quite complicated view (using several layers of views across several database links) which takes a second to return all of it's rows. But, when I ask for distinct rows, it takes considerably more time. I stopped waiting after 4 minutes.
To make my self as clear as possible:
select a, b from compicated_view; -- takes 1 sec (returns 6 rows)
select distinct a, b from compicated_view; -- takes at least 4 minutes
I find that pretty weird, but hey, that's how it is. I guess Oracle messed something up when planing that query. Now, is there a way to force Oracle to first finish the select without distinct, and then do a "select distinct *" on the results? I looked into optimizer hints, but I can't find anything about hinting the order in which distinct is applied (this is first time I'm optimizing a query, obviously :-/).
I'm using Oracle SQl Developer on Oracle 10g EE.
Try:
SELECT DISTINCT A,B FROM (
SELECT A,B FROM COMPLICATED_VIEW
WHERE rownum > 0 );
this forces to materialize the subquery and prevents from view merging/predicate pushing, and likely from changing the original plan of the view.
You may also try NO_MERGE hint:
SELECT /*+ NO_MERGE(alias) */
DISTINCT a,b
FROM (
SELECT a,b FROM COMPLICATED_VIEW
) alias
Since you haven't posted details... try the following:
SELECT DISTINCT A,B
FROM
(SELECT A,B FROM COMPLICATED_VIEW);
Disclaimer: I'm a developer and not a DBA.
I've been a huge fan of the USING clause in Oracle since I accidentally stumbled upon it and have used it in place of the old-fashioned ON clause to join fact tables with dimension tables ever since. To me, it creates a much more succinct SQL and produces a more concise result set with no unnecessary duplicated columns.
However, I was asked yesterday by a colleague to convert all my USING clauses into ONs. I will check with him and ask him what his reasons are. He works much more closely with the database than I do, so I assume he has some good reasons.
I have not heard back from him (we work in different timezones), but I wonder if there are any guidelines or best practices regarding the use of the "using" clause? I've googled around quite a bit, but have not come across anything definitive. In fact, I've not even even a good debate anywhere.
Can someone shed some light on this? Or provide a link to a good discussion on the topic?
Thank you!
You're presumably already aware of the distinction, but from the documentation:
ON condition Use the ON clause to specify a join condition. Doing so
lets you specify join conditions separate from any search or filter
conditions in the WHERE clause.
USING (column) When you are specifying an equijoin of columns that
have the same name in both tables, the USING column clause indicates
the columns to be used. You can use this clause only if the join
columns in both tables have the same name. Within this clause, do not
qualify the column name with a table name or table alias.
So these would be equivalent:
select e.ename, d.dname
from emp e join dept d using (deptno);
select e.ename, d.dname
from emp e join dept d on d.deptno = e.deptno;
To a large extent which you use is a matter of style, but there are (at least) two situations where you can't use using: (a) when the column names are not the same in the two tables, and (b) when you want to use the joining column:
select e.ename, d.dname, d.deptno
from emp e join dept d using(deptno);
select e.ename, d.dname, d.deptno
*
ERROR at line 1:
ORA-25154: column part of USING clause cannot have qualifier
You can of course just leave off the qualifier and select ..., deptno, as long as you don't have another table with the same column that isn't joined using it:
select e.ename, d.dname, deptno
from emp e join dept d using (deptno) join mytab m using (empno);
select e.ename, d.dname, deptno
*
ERROR at line 1:
ORA-00918: column ambiguously defined
In that case you can only select the qualified m.deptno. (OK, this is rather contrived...).
The main reason I can see for avoiding using is just consistency; since you sometimes can't use it, occasionally switching to on for those situations might be a bit jarring. But again that's more about style than any deep technical reason.
Perhaps your colleague is simply imposing (or suggesting) coding standards, but only they will know that. It also isn't quite clear if you're being asked to change some new code you've written that is going through review, or old code. If it's the latter then regardless of the reasons for them preferring on, I think you'd need to get a separate justification for modifying proven code, as there's a risk of introducing new problems even when the modified code is retested - quite apart from the cost/effort involved in the rework and retesting.
A couple of things strike me about your question though. Firstly you describes the on syntax as 'old-fashioned', but I don't think that's fair - both are valid and current (as of SQL:2011 I think, but citation needed!). And this:
produces a more concise result set with no unnecessary duplicated columns.
... which I think suggests you're using select *, otherwise you would just select one of the values, albeit with a couple of extra characters for the qualifier. Using select * is generally considered bad practice (here for example) for anything other than ad hoc queries and some subqueries.
Related question.
It seems the main difference is syntactic: the columns are merged in a USING join.
In all cases this means that you can't access the value of a joined column from a specific table, in effect some SQL will not compile, for example:
SQL> WITH t AS (SELECT 1 a, 2 b, 3 c FROM dual),
2 v AS (SELECT 1 a, 2 b, 3 c FROM dual)
3 SELECT t.* FROM t JOIN v USING (a);
SELECT t.* FROM t JOIN v USING (a)
^
ORA-25154: column part of USING clause cannot have qualifier
In an outer join this means you can't access the outer table value:
SQL> WITH t AS (SELECT 1 a, 2 b, 3 c FROM dual),
2 v AS (SELECT NULL a, 2 b, 3 c FROM dual)
3 SELECT * FROM t LEFT JOIN v USING (a)
4 WHERE v.a IS NULL;
WHERE v.a IS NULL
^
ORA-25154: column part of USING clause cannot have qualifier
This means that there is no equivalent for this anti-join syntax with the USING clause:
SQL> WITH t AS (SELECT 1 a, 2 b, 3 c FROM dual),
2 v AS (SELECT NULL a, 2 b, 3 c FROM dual)
3 SELECT * FROM t LEFT JOIN v ON v.a = t.a
4 WHERE v.a IS NULL;
A B C A B C
---------- ---------- ---------- - ---------- ----------
1 2 3
Apart from this, I'm not aware of any difference once the SQL is valid.
However, since it seems this syntax is less commonly used, I wouldn't be surprised if there were specific bugs that affect only the USING clause, especially in early versions where ANSI SQL was introduced. I haven't found anything on MOS that could confirm this, partly because the USING word is ubiquitous in bug descriptions.
If the reason for not using this feature is because of bugs, it seems to me the burden of the proof lies with your colleague: the bugs must be referenced/documented, so that the ban can eventually be lifted once the bugs are patched (database upgrade...).
If the reason is cosmetic or part of a coding convention, surely it must be documented too.
With USING you also cannot do a join like:
select a.id,aval,bval,cval
from a
left join b on a.id = b.id
left join c on c.id = b.id;
that is, only give the column from C when it is matched to a row in the B table.
I would request for help in understanding which all RDBMS from Oracle, DB2, Sybase support a common table expression (CTE) in a sub-query. I am aware that PostgreSQL does while MS SQL Server does not.
SELECT a.*, b.*
FROM (WHERE aa as (
<<select_query>),
SELECT *
FROM aa
WHERE <<criteria>>
) as a
LEFT JOIN (
WITH bb as (
<<select_query>
),
select * from bb inner join tbl_c on <<innerjoin>> where <<criteria>>
) as b
on <<join_expr>>
I am unable to define the with clause outside the sub-queries - both the queries are dynamically generated w.r.t. the columns, criteria, security, etc.
Also, the above query itself may be used in another query as a sub-query.
In summary, the principle is dynamically generated views, re-usable later. Some queries may have upto 10-12 such dynamic views being merged together as well.
The problem is that the application is supposed to be database-agnostic at least so far as PG, Oracle & DB2 are concerned and features not supported by one are not implemented at all.
Yes, you can use CTE's in subqueries in Oracle. From the Oracle 11g docs:
You can specify this clause in any top-level SELECT statement and in
most types of subqueries. The query name is visible to the main query
and to all subsequent subqueries. For recursive subquery factoring,
the query name is even visible to the subquery that defines the query
name itself.
As an example, this works in Oracle:
SELECT a.*, b.*
FROM (WITH aa AS
(
SELECT LEVEL l1, mod(level, 5) m1 FROM dual CONNECT BY LEVEL < 50
)
SELECT * FROM aa WHERE m1 < 3) a LEFT JOIN
(WITH bb AS
(
SELECT LEVEL l2, mod(level, 5) m2 FROM dual CONNECT BY LEVEL < 50
)
SELECT * FROM bb WHERE m2 BETWEEN 1 AND 4) b
ON a.l1 = b.l2;
That's not directly an answer to your question, but maybe you can think about this:
SQL Server seems to limit the semantics (not necessarily the syntax) of SQL where it makes sense to do so. For instance, you cannot have a subquery with an ORDER BY clause, if you don't also specify a TOP n clause. This makes sense, as ordered subqueries are pointless unless they have limits. Other RDBMS allow for such pointlessness.
In your case (that's just a guess), having CTE's in subqueries only makes limited sense, because you can rewrite your whole query in a way that the CTE's are declared at the top-most level. The only difference you'll have is the scope and maybe the readability of each declaration.
On the other hand, CTE's allow for recursive queries, which might be very hard to apply when CTE's are declared in subqueries...
Since you need to implement database-agnostic SQL, I recommend you do not make heavy use of CTE's yet. If CTE's are simple, you can always rewrite them as simple views...
The newer Microsoft SQL Server versions do support CTE's.
While PostgreSQL supports CTE's, they are an optimisation barrier which prevents predicate or join pushing into the CTE query. This makes them less effective in many cases than a simple subquery.
Why inline views are used..??
There are many different reasons for using inline views. Some things can't be done without inline views, for example:
1) Filtering on the results of an analytic function:
select ename from
( select ename, rank() over (order by sal desc) rnk
from emp
)
where rnk < 4;
2) Using ROWNUM on ordered results:
select ename, ROWNUM from
( select ename
from emp
order by ename
);
Other times they just make it easier to write the SQL you want to write.
The inline view is a construct in Oracle SQL where you can place a query in the SQL FROM, clause, just as if the query was a table name.
Inline views provide
Bind variables can be introduced inside the statement to limit the data
Better control over the tuning
Visibility into the code
To get top N ordered rows.
SELECT name, salary,
FROM (SELECT name, salary
FROM emp
ORDER BY salary DESC)
WHERE rownum <= 10;
An inline view can be regarded as an intermediate result set that contributes to the required data set in some way. Sometimes it is entirely a matter of improving maintainability of the code, and sometimes it is logically neccessary.
From the Oracle Database Concepts document there are the inline view concept definition:
An inline view is not a schema object.
It is a subquery with an alias
(correlation name) that you can use
like a view within a SQL statement.
About the subqueries look in Using Subqueries from the Oracle SQL Reference manual. It have a very nice pedagogic information.
Anyway, today is preferred to use the Subquery Factoring Clause that is a more powerfull way of use inline views.
As an example of all together:
WITH
dept_costs AS (
SELECT department_name, SUM(salary) dept_total
FROM employees e, departments d
WHERE e.department_id = d.department_id
GROUP BY department_name),
avg_cost AS
SELECT * FROM dept_costs
WHERE dept_total >
(SELECT avg FROM (SELECT SUM(dept_total)/COUNT(*) avg
FROM dept_costs)
)
ORDER BY department_name;
There you can see one of all:
An inline view query: SELECT SUM...
A correlated subquery: SELECT avg FROM...
A subquery factoring: dept_costs AS (...
What are they used for?:
To avoid creating an intermediate view object: CREATE VIEW ...
To simplify some queries that a view cannot be helpfull. For instance, when the view need to filter from the main query.
You will often use inline views to break your query up into logical parts which helps both readability and makes writing more complex queries a bit easier.
Jva and Tony Andrews provided some good examples of simple cases where this is useful such as Top-N or Pagination queries where you may want to perform a query and order its results before using that as a part of a larger query which in turn might feed a query doing some other processing, where the logic for these individual queries would be difficult to achieve in a single query.
Another case they can be very useful is if you are writing a query that joins various tables together and want to perform aggregation on some of the tables, separating group functions and the processing into different inline views before performing the joins makes managing cardinality a lot easier. If you want some examples, I would be happy to provide them to make it more clear.
Factored subqueries (where you list your queries in the WITH clause at the start of the query) and inline views also often bring performance benefits. If you need to access the results of the subquery multiple times, you only need to run it once and it can be materialized as a Global Temporary Table (how the optimizer acts isn't totally black and white so I won't go into it here but you can do your own research - for example, see http://jonathanlewis.wordpress.com/2007/07/26/subquery-factoring-2/)