Disclaimer: I'm a developer and not a DBA.
I've been a huge fan of the USING clause in Oracle since I accidentally stumbled upon it, and I have used it in place of the old-fashioned ON clause to join fact tables with dimension tables ever since. To me, it creates much more succinct SQL and produces a more concise result set with no unnecessary duplicated columns.
However, I was asked yesterday by a colleague to convert all my USING clauses into ONs. I will check with him and ask him what his reasons are. He works much more closely with the database than I do, so I assume he has some good reasons.
I have not heard back from him (we work in different timezones), but I wonder if there are any guidelines or best practices regarding the use of the "using" clause? I've googled around quite a bit, but have not come across anything definitive. In fact, I haven't even found a good debate on it anywhere.
Can someone shed some light on this? Or provide a link to a good discussion on the topic?
Thank you!
You're presumably already aware of the distinction, but from the documentation:
ON condition: Use the ON clause to specify a join condition. Doing so lets you specify join conditions separate from any search or filter conditions in the WHERE clause.
USING (column): When you are specifying an equijoin of columns that have the same name in both tables, the USING column clause indicates the columns to be used. You can use this clause only if the join columns in both tables have the same name. Within this clause, do not qualify the column name with a table name or table alias.
So these would be equivalent:
select e.ename, d.dname
from emp e join dept d using (deptno);
select e.ename, d.dname
from emp e join dept d on d.deptno = e.deptno;
To a large extent which you use is a matter of style, but there are (at least) two situations where you can't use using: (a) when the column names are not the same in the two tables, and (b) when you want to refer to the joining column with a table qualifier:
select e.ename, d.dname, d.deptno
from emp e join dept d using(deptno);
select e.ename, d.dname, d.deptno
*
ERROR at line 1:
ORA-25154: column part of USING clause cannot have qualifier
You can of course just leave off the qualifier and select ..., deptno instead, as long as you don't have another table in the query with a column of the same name that isn't joined with using:
select e.ename, d.dname, deptno
from emp e join dept d using (deptno) join mytab m using (empno);
select e.ename, d.dname, deptno
*
ERROR at line 1:
ORA-00918: column ambiguously defined
In that case you can only select the qualified m.deptno. (OK, this is rather contrived...).
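For completeness, the version that does compile in that contrived case is the one that picks up mytab's copy of the column (still using the made-up mytab from above):
select e.ename, d.dname, m.deptno
from emp e join dept d using (deptno) join mytab m using (empno);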
The main reason I can see for avoiding using is just consistency; since you sometimes can't use it, occasionally switching to on for those situations might be a bit jarring. But again that's more about style than any deep technical reason.
Perhaps your colleague is simply imposing (or suggesting) coding standards, but only they will know that. It also isn't quite clear if you're being asked to change some new code you've written that is going through review, or old code. If it's the latter then regardless of the reasons for them preferring on, I think you'd need to get a separate justification for modifying proven code, as there's a risk of introducing new problems even when the modified code is retested - quite apart from the cost/effort involved in the rework and retesting.
A couple of things strike me about your question though. Firstly, you describe the on syntax as 'old-fashioned', but I don't think that's fair - both are valid and current (as of SQL:2011 I think, but citation needed!). And this:
produces a more concise result set with no unnecessary duplicated columns.
... which I think suggests you're using select *, otherwise you would just select one of the values, albeit with a couple of extra characters for the qualifier. Using select * is generally considered bad practice (here for example) for anything other than ad hoc queries and some subqueries.
Related question.
It seems the main difference is syntactic: the columns are merged in a USING join.
In all cases this means that you can't reference the value of a joined column qualified by a specific table; in effect, some SQL will not compile, for example:
SQL> WITH t AS (SELECT 1 a, 2 b, 3 c FROM dual),
2 v AS (SELECT 1 a, 2 b, 3 c FROM dual)
3 SELECT t.* FROM t JOIN v USING (a);
SELECT t.* FROM t JOIN v USING (a)
^
ORA-25154: column part of USING clause cannot have qualifier
In an outer join this means you can't access the outer table value:
SQL> WITH t AS (SELECT 1 a, 2 b, 3 c FROM dual),
2 v AS (SELECT NULL a, 2 b, 3 c FROM dual)
3 SELECT * FROM t LEFT JOIN v USING (a)
4 WHERE v.a IS NULL;
WHERE v.a IS NULL
^
ORA-25154: column part of USING clause cannot have qualifier
This means that there is no equivalent for this anti-join syntax with the USING clause:
SQL> WITH t AS (SELECT 1 a, 2 b, 3 c FROM dual),
2 v AS (SELECT NULL a, 2 b, 3 c FROM dual)
3 SELECT * FROM t LEFT JOIN v ON v.a = t.a
4 WHERE v.a IS NULL;
A B C A B C
---------- ---------- ---------- - ---------- ----------
1 2 3
Apart from this, I'm not aware of any difference once the SQL is valid.
However, since it seems this syntax is less commonly used, I wouldn't be surprised if there were specific bugs that affect only the USING clause, especially in the early versions where the ANSI join syntax was introduced. I haven't found anything on MOS that could confirm this, partly because the word USING is ubiquitous in bug descriptions.
If the reason for not using this feature is because of bugs, it seems to me the burden of proof lies with your colleague: the bugs must be referenced/documented, so that the ban can eventually be lifted once the bugs are patched (database upgrade...).
If the reason is cosmetic or part of a coding convention, surely it must be documented too.
With USING you also cannot do a join like:
select a.id,aval,bval,cval
from a
left join b on a.id = b.id
left join c on c.id = b.id;
that is, a join where C is matched on B's id rather than A's, so you only get a value from C when the row was actually matched in the B table.
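A sketch of why (assuming a, b and c each have an id column, as the ON version above implies): with chained USING joins the shared column is coalesced, so the second join compares c's id against that coalesced value, which for a left join is always a's value, and there is no way to say "b's id specifically":
select id, aval, bval, cval
from a
left join b using (id)
-- the join below uses the coalesced id, which for a left join is always a's value, not b's
left join c using (id);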
I'm having some difficulty joining a view to another table. This is on an Oracle RAC system running 11.2.
I'll try and give as much detail as possible without going into specific table structures as my company would not like that.
You all know how this works. "Hey, can you write some really ugly software to implement our crazy ideas?"
The idea was to make a view where the end user wouldn't know whether they were going after the new table or the old table. One of the tables involved is a parameter table that returns "ON" or "OFF" and is used in the case statements.
There are some nested, but not too difficult, case statements in the select clause.
I have a view:
create view my_view as
select t1.a as a, t1.b as b, t1.c as c,
sum(case when t2.a = 'xx' then case when t3.a then ... ,
case when t2.a = 'xx' then case when t3.a then ... ,
from table1 t1
join table2 t2 on (t1.a = t2.a etc...)
full outer join t3 on (t1.a = t3.a etc...)
full outer join t4 on (t1.a = t4.a etc...)
group by t1.a, t1.b, t2.c, and all the ugly case statements...
Now, when I run the query
select * from my_view where a='xxx' and b='yyy' and c='zzz'
the query runs great and the cost is 10.
However, when I join this view with another table, everything falls apart.
select * from my_table mt join my_view mv on (mt.a = mv.a and mt.b = mv.b and mt.c = mv.c) where ...
The cost goes through the roof.
What I think is happening is that the predicates are not getting pushed into the view. As such, the view is doing full table scans, joining everything to everything, and only then removing all the rows.
No hint, tweak, or anything else I've tried appears to help.
When looking at the plan it looks like it has the predicates, but they are applied only after everything is joined.
Sorry if this is cryptic but any help would be greatly appreciated.
Since the view has a GROUP BY, the predicates cannot be pushed into the inner query.
Also, the aggregate functions are applied to case expressions, which could make things even harder for the optimizer.
Oracle introduces enhancements to the optimizer with every version/release/patch, so it is hard to say what is supported in the version you're running. However, you can try:
See if removing the case expressions from the aggregate functions makes any difference
Otherwise, move the GROUP BY and the aggregate functions out of the view and into the outermost query, as sketched below
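A minimal sketch of that second suggestion, reusing the anonymized names from the question (t4 and the real nested CASE logic are left out): keep only the joins in a base view, and do the aggregation in the outer query that applies the predicates (or that joins to your other table), so the filters have a chance to be pushed into the joins:
create or replace view my_view_base as
select t1.a as a, t1.b as b, t1.c as c,
       t2.a as t2_a, t3.a as t3_a
from table1 t1
join table2 t2 on t1.a = t2.a
full outer join t3 on t1.a = t3.a;

select a, b, c,
       sum(case when t2_a = 'xx' then 1 else 0 end) as xx_total   -- stand-in for the real nested CASEs
from my_view_base
where a = 'xxx' and b = 'yyy' and c = 'zzz'
group by a, b, c;
The aggregation then runs over the already-filtered rows instead of over the whole join.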
After many keyboard indentations on my forehead I may have tricked Oracle into pushing the predicates. I don't know exactly why this works but simplifying things may have helped.
I changed all my ON clauses into USING clauses, so that the column names now match the columns I'm joining on. Some other predicates that were constants I moved into a where clause in the view.
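Roughly, the shape of that change (the flag and switch columns here are invented stand-ins; the real view is as sketched in my question):
create or replace view my_view as
select a, t1.b as b,
       sum(case when t2.flag = 'xx' then 1 else 0 end) as xx_total   -- stand-in for the real nested CASEs
from table1 t1
join table2 t2 using (a)
full outer join t3 using (a)
where t2.switch = 'ON'            -- hypothetical constant predicate, moved into the view's where clause
group by a, t1.b;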
The end result is I can now join this view with another table and the cost is reasonable and the plan shows that the predicates are being pushed.
Thank you to everybody who looked at this problem.
We're in the process of upgrading from Oracle 11g to 12c, and have noticed that queries on user_cons_columns seem to be quite a bit slower.
For example this is about 4 times as slow, even on a smaller dataset:
select uc.search_condition
from user_constraints uc inner join user_cons_columns ucc on ucc.CONSTRAINT_NAME = uc.CONSTRAINT_NAME
where ucc.table_name = :upper_table_name
and ucc.column_name = :upper_column
Could it just be a matter of gathering statistics?
In my experience, selects from user_constraints, user_cons_columns and other data dictionary views have been slow for several major Oracle versions, not just 12c. Running dbms_stats.gather_dictionary_stats sped up the first query below by 10-20%.
But what really helped was to rewrite the query to select from WITH clause "tables" using the /*+ materialize */ hint, instead of selecting directly from the user_ views.
This query is very slow on my setup, about 150 seconds (it returns all foreign keys on a list of tables, including table and column names at both ends of the foreign keys):
select
cc.table_name, cc.position, cc.constraint_name, cc.column_name,
cr.table_name r_table_name, ccr.constraint_name r_constraint_name, ccr.column_name r_column_name
from user_constraints c
join user_cons_columns cc on cc.constraint_name=c.constraint_name and
cc.owner=c.owner and
cc.table_name=c.table_name
join user_constraints cr on cr.owner=c.r_owner and
cr.constraint_name=c.r_constraint_name and
cr.constraint_type in ('P','U')
join user_cons_columns ccr on ccr.constraint_name=cr.constraint_name and
ccr.owner=cr.owner and
ccr.table_name=cr.table_name and
ccr.position=cc.position
where c.constraint_type='R'
and c.table_name in ('TABLE_A', 'TABLE_B', ........a list of about 157 table names.......)
order by cc.table_name, cc.position, constraint_name, column_name, cc.position;
After rewriting it to this, the query takes just 1-8 seconds:
with
uc as (select /*+materialize*/ owner,table_name,constraint_name,constraint_type,r_owner,r_constraint_name from user_constraints),
ucc as (select /*+materialize*/ owner,table_name,constraint_name,position,column_name from user_cons_columns)
select
cc.table_name, cc.position, cc.constraint_name, cc.column_name,
cr.table_name r_table_name, ccr.constraint_name r_constraint_name, ccr.column_name r_column_name
from uc c
join ucc cc on cc.constraint_name=c.constraint_name and cc.owner=c.owner and cc.table_name=c.table_name
join uc cr on cr.owner=c.r_owner and cr.constraint_name=c.r_constraint_name and cr.constraint_type in ('P','U')
join ucc ccr on ccr.constraint_name=cr.constraint_name and ccr.owner=cr.owner and ccr.table_name=cr.table_name and ccr.position=cc.position
where c.constraint_type='R'
and c.table_name in ('TABLE_A', 'TABLE_B', ........a list of about 157 table names.......)
order by cc.table_name, cc.position, constraint_name, column_name, cc.position;
I also tried * instead of listing just the needed columns in the with tables, but that didn't help. I'm guessing that's because Oracle ignores /*+ materialize */ hints if too much data would have to be remembered/cached.
1. Gather dictionary stats.
begin
dbms_stats.gather_dictionary_stats;
end;
/
2. Gather fixed object stats.
begin
dbms_stats.gather_fixed_objects_stats;
end;
/
There are also a few rare data dictionary objects that are never analyzed unless you specifically gather statistics on them with dbms_stats.gather_table_stats, along the lines of the sketch below.
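For those, the call looks something like this (the table name below is just a placeholder, not a recommendation for any particular object):
begin
  -- hypothetical example: gather stats for one specific dictionary base table
  dbms_stats.gather_table_stats(ownname => 'SYS', tabname => 'SOME_DICTIONARY_TABLE');
end;
/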
3. Look for broken data dictionary objects. In some rare cases character set problems can cause data dictionary performance problems. Run an EXPLAIN PLAN on the SELECT and look for anything "weird", like NLSSORT in the predicates that would prevent an index access.
4. Check My Oracle Support. I've seen bugs before for data dictionary views that degrade with new versions. Sometimes there's an alternate version of the data dictionary view that fixes the problem. I searched on My Oracle Support and "Data Dictionary Select Taking A Very Long Time in 12c (Doc ID 2251730.1)" may be relevant here. I can't post the contents of that article here so go to support.oracle.com and check out the workaround in that bug report.
5. Consider yourself lucky. If you only have one performance problem, and it's only four times slower, I'd consider that a successful upgrade.
I'm a bit late to this party, but as suggested by Burleson, use the /*+ RULE */ hint with your queries on the Oracle data dictionary. This effectively turns off the cost-based optimizer.
Many have said not to use hints, and the RULE hint has been deprecated, but it makes a huge difference in my case. One of my DBA_IND_COLUMNS queries that took 18 MINUTES to run now takes less than a second (Oracle 12cR1). I'm at a loss to say why this works...
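For example, a query of the kind I mean (the owner and table names are made up):
select /*+ RULE */ index_name, column_name, column_position
from dba_ind_columns
where table_owner = 'MY_SCHEMA'   -- made-up owner
and table_name = 'MY_TABLE';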
I have an Oracle view (using 11gR2) which is a join of three other views, i.e.:
create or replace view main_vw as
select a.*, b.*, c.*
from a_vw a, b_vw b, c_vw c
where a.b_id = b.b_id
and a.c_id = c.c_id
Doing "select * from main_vw" hangs for hours, and EXPLAIN PLAN shows a very inefficient query plan. Yet, if create the logically equivalent steps as:
create table a_tbl as select * from a_vw;
create table b_tbl as select * from b_vw;
create table c_tbl as select * from c_vw;
select a.*, b.*, c.*
from a_tbl a, b_tbl b, c_tbl c
where a.b_id = b.b_id and a.c_id = c.c_id;
All four statements complete in under 5 seconds.
Is there any way I can use hints or something to get Oracle's optimizer to evaluate the sub-views first, and then join them as if they were tables?
I've looked at the hints 'QB_NAME', 'NO_EXPAND', 'NO_REWRITE' to no avail...
Note: a_vw, b_vw, and c_vw in this example are quite complex queries, and they do reference base tables in common. They also reference a settings table, whose contents are customized at the session level and affect what is returned. So, I cannot create tables from these views.
Use ROWNUM to force Oracle to evaluate inline views independently.
create or replace view main_vw as
select a.*, b.*, c.*
from
(select * from a_vw where rownum >= 1 /*prevent transformations*/) a,
(select * from b_vw where rownum >= 1 /*prevent transformations*/) b,
(select * from c_vw where rownum >= 1 /*prevent transformations*/) c
where a.b_id = b.b_id
and a.c_id = c.c_id
This looks odd at first. The ROWNUM doesn't appear to do anything. But ROWNUM is a special pseudo-column whose presence "can affect view optimization". In practice it prevents the transformations and is the only safe way to completely isolate code. This method is also useful for type-safety, such as an Entity-Attribute-Value pattern where everything is stored as a string and must be processed in a specific order.
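A sketch of that type-safety case, with invented table and column names: isolating the inner block with ROWNUM means the attribute filter is applied before the TO_NUMBER conversion, so non-numeric strings never reach it:
select str_value
from (
  select str_value
  from eav_table                 -- hypothetical EAV table storing every value as a string
  where attribute_name = 'AGE'
  and rownum >= 1                -- isolates this query block, so the 'AGE' filter is applied first
)
-- safe: non-numeric values never reach the conversion below
where to_number(str_value) > 21;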
As you've already discovered, hints are difficult to get right. Even if you do get them right now, there's a good chance they won't work properly when the query is modified by another developer in the future. To prevent them from removing this cryptic predicate, make sure to add a comment.
This won't necessarily solve the root performance problem, but it should at least make it much easier to solve. The overall explain plan should include three sections that match the explain plans for each inline view. If each inline view runs well, you only need to worry about the two joins between them.
I have a quite complicated view (using several layers of views across several database links) which takes a second to return all of its rows. But when I ask for distinct rows, it takes considerably more time. I stopped waiting after 4 minutes.
To make myself as clear as possible:
select a, b from complicated_view; -- takes 1 sec (returns 6 rows)
select distinct a, b from complicated_view; -- takes at least 4 minutes
I find that pretty weird, but hey, that's how it is. I guess Oracle messed something up when planning that query. Now, is there a way to force Oracle to first finish the select without distinct, and then do a "select distinct *" on the results? I looked into optimizer hints, but I can't find anything about hinting the order in which distinct is applied (this is the first time I'm optimizing a query, obviously :-/).
I'm using Oracle SQL Developer on Oracle 10g EE.
Try:
SELECT DISTINCT A,B FROM (
SELECT A,B FROM COMPLICATED_VIEW
WHERE rownum > 0 );
This forces Oracle to materialize the subquery and prevents view merging/predicate pushing, so it is likely to preserve the original plan of the view.
You may also try the NO_MERGE hint:
SELECT /*+ NO_MERGE(alias) */
DISTINCT a,b
FROM (
SELECT a,b FROM COMPLICATED_VIEW
) alias
Since you haven't posted details... try the following:
SELECT DISTINCT A,B
FROM
(SELECT A,B FROM COMPLICATED_VIEW);
When joining across tables (as in the examples below), is there an efficiency difference between joining the tables directly and joining subqueries containing only the needed columns?
In other words, is there a difference in efficiency between these two queries?
SELECT result
FROM result_tbl
JOIN test_tbl USING (test_id)
JOIN sample_tbl USING (sample_id)
JOIN (SELECT request_id
FROM request_tbl
WHERE request_status='A') USING(request_id)
vs
SELECT result
FROM (SELECT result, test_id FROM result_tbl)
JOIN (SELECT test_id, sample_id FROM test_tbl) USING(test_id)
JOIN (SELECT sample_id FROM sample_tbl) USING(sample_id)
JOIN (SELECT request_id
FROM request_tbl
WHERE request_status='A') USING(request_id)
The only way to find out for sure is to run both with tracing turned on and then look at the trace file. But in all probability they will be treated the same: the optimizer will merge all the inline views into the main statement and come up with the same query plan.
It doesn't matter. It may actually be WORSE since you are taking control away from the optimizer which generally knows best.
However, remember that if you are doing a JOIN and only including columns from one of the tables, it is QUITE OFTEN better to re-write it as a series of EXISTS statements -- because that's what you really mean. JOINs (with some exceptions) will join all matching rows, which is a lot more work for the optimizer to do.
e.g.
SELECT t1.id1
FROM table1 t1
INNER JOIN table2 ON something = something
should almost always be
SELECT id1
FROM table1 t1
WHERE EXISTS( SELECT *
FROM table2
WHERE something = something )
For simple queries the optimizer may reduce the query plans into identical ones. Check it out on your DBMS.
Also this is a code smell and probably should be changed:
JOIN (SELECT request_id
FROM request_tbl
WHERE request_status='A')
to
SELECT result
FROM request
WHERE EXISTS(...)
AND request_status = 'A'
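Put together against the tables from the question, the EXISTS version might look something like this (a sketch; it assumes test_tbl carries sample_id and sample_tbl carries request_id, as the USING joins imply):
SELECT r.result
FROM result_tbl r
WHERE EXISTS (SELECT *
              FROM test_tbl t
              JOIN sample_tbl s ON s.sample_id = t.sample_id
              JOIN request_tbl q ON q.request_id = s.request_id
              WHERE t.test_id = r.test_id
              AND q.request_status = 'A')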
No difference.
You can tell by running EXPLAIN PLAN on both those statements - Oracle knows that all you want is the "result" column, so it only does the minimum necessary to get the data it needs - you should find that the plans will be identical.
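For example, one way to do that comparison is to display each plan with DBMS_XPLAN (shown here for the first version of the query) and diff the output:
EXPLAIN PLAN FOR
SELECT result
FROM result_tbl
JOIN test_tbl USING (test_id)
JOIN sample_tbl USING (sample_id)
JOIN (SELECT request_id
      FROM request_tbl
      WHERE request_status='A') USING(request_id);

SELECT * FROM TABLE(dbms_xplan.display);

-- repeat with the second version and compare the two plans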
The Oracle optimiser does, sometimes, "materialize" a subquery (i.e. run the subquery and keep the results in memory for later reuse), but this is rare and only occurs when the optimiser believes this will result in a performance improvement; in any case, Oracle will do this "materialization" whether you specified the columns in the subqueries or not.
Obviously if the only place the "result" column is stored is in the blocks (along with the rest of the data), Oracle has to visit those blocks - but it will only keep the relevant info (the "result" column and other relevant columns, e.g. "test_id") in memory when processing the query.