Oracle db Joins vs FROM a,b,c - oracle

I've been working with SQL Server for some time, and there I would
join tables here, there and everywhere in my queries.
Now I have a project on an Oracle db, and as I looked through
procedures some Oracle programmer wrote, with complex data selection
queries, I noticed she never ever used a join.
Question:
Is there anything specific about Oracle that discourages using joins, or is it a human factor?

No. Oracle, like any other reasonable relational database, is more efficient when you do set-based operations and when you do joins rather than procedurally emulating joins (with, say, nested cursor loops).
My guess, however, is that you are not really talking about code that lacks joins. My guess is that you are talking about code that uses a different join syntax than you are accustomed to. Both
SELECT a.*
FROM a
JOIN b ON (a.a_id = b.a_id)
JOIN c ON (b.b_id = c.b_id)
and
SELECT a.*
FROM a,
b,
c
WHERE a.a_id = b.a_id
AND b.b_id = c.b_id
are queries that join a to b to c. The two queries are identical: the Oracle parser will actually internally rewrite the first query into the second. The only difference is that the first query uses the newer SQL 99 syntax to specify its joins.
Historically, Oracle was relatively late to adopt the SQL 99 syntax, there is a tremendous amount of code that was written before the SQL 99 syntax was available, and quite a few Oracle folks prefer the old style syntax out of habit if nothing else. For all those reasons, it's relatively common to find Oracle based projects using the older join syntax exclusively. There is nothing inherently wrong with that (though I personally prefer the newer syntax).
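To see the equivalence of the two join syntaxes for yourself, here is a minimal sketch that runs both queries from this answer against the same data and compares the rows. It uses Python's built-in sqlite3 module with SQLite as a stand-in for Oracle (an assumption made only so the example runs anywhere); the column layout and the sample rows for a, b and c are invented for illustration.

```python
import sqlite3

# SQLite stands in for Oracle; tables a, b, c and their rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (a_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE b (b_id INTEGER PRIMARY KEY, a_id INTEGER);
    CREATE TABLE c (c_id INTEGER PRIMARY KEY, b_id INTEGER);
    INSERT INTO a VALUES (1, 'one'), (2, 'two'), (3, 'three');
    INSERT INTO b VALUES (10, 1), (20, 2), (30, 2);
    INSERT INTO c VALUES (100, 10), (200, 20), (300, 30), (400, 99);
""")

# SQL 99 (ANSI) join syntax.
ansi_sql = """
    SELECT a.*
    FROM a
    JOIN b ON (a.a_id = b.a_id)
    JOIN c ON (b.b_id = c.b_id)
    ORDER BY a.a_id
"""

# Older comma-separated FROM list with join conditions in WHERE.
old_style_sql = """
    SELECT a.*
    FROM a, b, c
    WHERE a.a_id = b.a_id
    AND b.b_id = c.b_id
    ORDER BY a.a_id
"""

ansi_rows = conn.execute(ansi_sql).fetchall()
old_style_rows = conn.execute(old_style_sql).fetchall()
print(ansi_rows == old_style_rows)
```

This only demonstrates that both forms return the same rows; it says nothing about how Oracle's optimizer plans them.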

Related

When do we use WITH clause, and what are main benefits of it?

I was working on a query optimization task. One of the ways to improve the queries was to use the WITH clause. I noticed that it did a very good job and led to shorter execution times, but I am not sure when I should use the WITH clause, and is there any risk in using it?
Here is one of the queries that I am working on :
WITH MY_TABLE AS
( SELECT PROD_KY,
sum(GROUPISPRIVATE) AS ISPRIVATE,
sum(GROUPISSHARED) AS ISSHARED
FROM
(
SELECT GRP_PROD_CUSTOMER.PROD_KY,
1 as GROUPISPRIVATE,
0 as GROUPISSHARED
FROM CUSTOMER
JOIN GRP_CUSTOMER ON GRP_CUSTOMER.CUST_KY = CUSTOMER.CUST_KY
JOIN GRP_PROD_CUSTOMER ON GRP_PROD_CUSTOMER.GRP_KY = GRP_CUSTOMER.GRP_KY
GROUP BY GRP_PROD_CUSTOMER.PROD_KY
)
GROUP BY PROD_KY
)
SELECT * FROM MY_TABLE;
is there any risk of using it?
Yes. Oracle may decide to materialize the subquery, which means writing its result set to disk and then reading it back (except it might not mean that in 12cR2 or later). That unexpected I/O could be a performance hit. Not always, and usually we can trust the optimizer to make the correct choice. However, Oracle has provided us with hints to tell the optimizer how to handle the result set: /*+ materialize */ to, um, materialize it and /*+ inline */ to keep it in memory.
I start with this potential downside because I think it's important to understand that the WITH clause is not a silver bullet and it won't improve every single query, and may even degrade performance. For instance I share the scepticism of the other commenters that the query you posted is in any way faster because you re-wrote it as a common table expression.
Generally, the use cases for the WITH clause are:
We want to use the result set from the subquery multiple times
with cte as
( select blah from meh )
select *
from t1
join t2 on t1.id = t2.id
where t1.col1 in ( select blah from cte )
and t2.col2 not in ( select blah from cte)
We want to build a cascade of subqueries:
with cte as
( select id, blah from meh )
, cte2 as
( select t2.*, cte.blah
from cte
join t2 on t2.id = cte.id)
, cte3 as
( select t3.*, cte2.*
from cte2
join t3 on t3.col2 = cte2.something )
….
This second approach is beguiling and can be useful for implementing complex business logic in pure SQL. But it can lead to a procedural mindset and lose the power of sets and joins. This too is a risk.
We want to use a recursive WITH clause. This allows us to replace Oracle's own CONNECT BY syntax with a more standard approach.
In 12c and later we can write user-defined functions in the WITH clause. This is a powerful feature, especially for users who need to implement some logic in PL/SQL but only have SELECT access to the database.
For the record I have seen some very successful and highly performant uses of the second type of WITH clause. However I have also seen uses of WITH when it would have been just as easy to write an inline view. For instance, this is just using the WITH clause as syntactic sugar ...
with cte as
( select id, blah from meh )
select t2.*, cte.blah
from t2
join cte on cte.id = t2.id
… and would be clearer as ...
select t2.*, cte.blah
from t2
join ( select id, blah from meh ) cte on cte.id = t2.id
The WITH clause was introduced in Oracle to match the SQL-99 standard.
Its main purpose is to reduce complexity and repetitive code.
Let's say you need to find the average salary of a department (d1) and then fetch all the departments with a higher average salary than that department (d1).
The WITH clause can make multiple references to the same subquery more efficient and readable.
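One possible reading of the salary example above, sketched with Python's sqlite3 module (SQLite stands in for Oracle here, and the emp table, its columns and its rows are hypothetical): the dept_avg subquery is defined once in the WITH clause and then referenced twice, once in the main query and once in the comparison against department 10.

```python
import sqlite3

# SQLite stands in for Oracle; the emp table and its rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emp (empno INTEGER PRIMARY KEY, deptno INTEGER, sal REAL);
    INSERT INTO emp VALUES
        (1, 10, 1000), (2, 10, 3000),
        (3, 20, 5000), (4, 20, 7000),
        (5, 30, 2500);
""")

# dept_avg is written once and referenced twice below.
sql = """
    WITH dept_avg AS (
        SELECT deptno, AVG(sal) AS avg_sal
        FROM emp
        GROUP BY deptno
    )
    SELECT deptno, avg_sal
    FROM dept_avg
    WHERE avg_sal > (SELECT avg_sal FROM dept_avg WHERE deptno = 10)
    ORDER BY deptno
"""

above_dept_10 = conn.execute(sql).fetchall()
print(above_dept_10)
```

Without the WITH clause the GROUP BY subquery would have to be written out twice, which is the repetition this answer is talking about.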
The MATERIALIZE and INLINE optimizer hints can be used to influence the decision. The undocumented MATERIALIZE hint tells the optimizer to resolve the subquery as a global temporary table, while the INLINE hint tells it to process the query inline. Whether to use a hint depends purely on the logic that the query implements.
In Oracle 12c, declaring PL/SQL blocks in the WITH clause was introduced.
Refer to the Oracle documentation for details.
Cheers!!
Your query is rather useless in terms of WITH statement (aka Common Table Expression, CTE)
Anyway, using the WITH clause brings several benefits:
The query is more readable (in my opinion)
You can use the same subquery several times in the main query. You can even cascade them.
Oracle can materialize the subquery, i.e. Oracle may create a temporary table and store the result of the subquery in it. This can give better performance.
The WITH clause may be processed as an inline view or resolved as a temporary table. The SQL WITH clause is very similar to the use of Global temporary tables. This technique is often used to improve query speed for complex subqueries and enables the Oracle optimizer to push the necessary predicates into the views.
The advantage of the latter is that repeated references to the subquery may be more efficient as the data is easily retrieved from the temporary table, rather than being requeried by each reference. You should assess the performance implications of the WITH clause on a case-by-case basis.
You can read more here:
http://www.dba-oracle.com/t_with_clause.htm
https://oracle-base.com/articles/misc/with-clause
One point to consider is that different RDBMS handle the WITH clause (aka common table expressions (CTE), aka subquery factoring) differently:
Oracle may use materialization or inlining (as already explained in the answer provided by APC)
Postgres always uses materialization in releases up to 11 (so there a CTE is an optimization fence). In Postgres 12 the behaviour changes and is similar to Oracle's approach: https://info.crunchydata.com/blog/with-queries-present-future-common-table-expressions. You even have something that almost looks like a hint (though it is known that Postgres does not use hints...)
In SQL Server a CTE is currently always inlined, as explained in https://erikdarlingdata.com/2019/08/what-would-materialized-ctes-look-like-in-sql-server/
So depending on the RDBMS you use and its version, your mileage may vary.

Oracle SQL sub query vs inner join

First, I read about the SELECT statement in the Oracle docs.
I have some questions about Oracle's SELECT behaviour when my query contains a select, a join and a where clause.
See the information below:
My sample table:
[ P_IMAGE_ID ]
IMAGE_ID (PK)
FILE_NAME
FILE_TYPE
...
...
[ P_IMG_TAG ]
IMG_TAG_ID (PK)
IMAGE_ID (FK)
TAG
...
...
My requirement is: get the distinct images whose tag is "70702".
Method 1: Select -> Join -> Where -> Distinct
SELECT DISTINCT PID.IMAGE_ID
, PID.FILE_NAME
FROM P_IMAGE_ID PID
INNER JOIN P_IMG_TAG PTAG
ON PTAG.IMAGE_ID = PID.IMAGE_ID
WHERE PTAG.TAG = '70702';
I think the query behaviour should be like:
join tables -> hit where clause -> distinct select
I used Oracle SQL Developer to get the explain plan:
Method 1 costs 76.
Method 2: Select -> Where -> Where -> Distinct
SELECT DISTINCT PID.IMAGE_ID
, PID.FILE_NAME
FROM P_IMAGE_ID PID
WHERE PID.IMAGE_ID IN
(
SELECT PTAG.IMAGE_ID
FROM P_IMG_TAG PTAG
WHERE PTAG.TAG = '70702'
);
I think the second query's behaviour should be like:
hit where clause -> hit where clause -> distinct select
I used Oracle SQL Developer to get the explain plan too:
Method 2 costs 76 too. Why?
I believed that applying the where clause first, to reduce the work the database does and avoid the table join, should perform better than the join query, but now that I have tested it I am confused: why are the costs of the 2 methods equal?
Or have I misunderstood something?
Here is the list of my questions:
Why are the costs of the 2 methods above equal?
If the sub-select Tag = '70702' returns thousands or millions of rows or more, a table join should be better, right?
If the sub-select Tag = '70702' returns few rows, using the sub-select to reduce the data processed is better, right?
When I use method 1 (Select -> Join -> Where -> Distinct), does that mean the database joins the tables before it hits the where clause?
Someone told me that when I move the where condition Tag = '70702' into the join clause
(i.e. INNER JOIN P_IMG_TAG PTAG ON PTAG.IMAGE_ID = PID.IMAGE_ID AND PTAG.TAG = '70702' ) performance may be better. Is that right?
I read the topics subselect vs outer join and subquery or inner join, but both are for SQL Server, and I am not sure the same applies to an Oracle database.
The DBMS takes your query and executes something. But it doesn't execute steps that correspond to SQL statement parts in the order they appear in an SQL statement.
Read about "relational query optimization", which could just as well be called "relational query implementation". Eg for Oracle.
Any language processor takes declarations and calls as input and implements the described behaviour in terms of internal data structures and operations, maybe through one or more levels of "intermediate code" running on a "virtual machine", eventually down to physical machines. But even just staying in the input language, SQL queries can be rearranged into other SQL queries that return the same value but perform significantly better under simple and general implementation assumptions. Just as you know that your question's queries always return the same thing for a given database, the DBMS can know. Part of how it knows is that there are many rules for taking a relational algebra expression and generating a different but same-valued expression. Certain rewrite rules apply under certain limited circumstances. There are rules that take into consideration SQL-level relational things like primary keys, unique columns, foreign keys and other constraints. Other rules use implementation-oriented SQL-level things like indexes and statistics. This is the "relational query rewriting" part of relational query optimization.
Even when two different but equivalent queries generate different plans, the cost can be similar because the plans are so similar. Here, both a HASH and SORT index are UNIQUE. (It would be interesting to know what the few top plans were for each of your queries. It is quite likely that those few are the same for both, but that the plan that is more directly derived from the particular input expression is the one that is offered when there's little difference.)
The way to get the DBMS to find good query plans is to write the most natural expression of a query that you can find.
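The point that the two queries in the question always return the same thing for a given database can be checked with a small sketch. It uses Python's sqlite3 module with SQLite standing in for Oracle (an assumption for runnability); the table and column names come from the question, while the sample rows are invented. It compares results only and makes no claim about Oracle's plans or costs.

```python
import sqlite3

# SQLite stands in for Oracle; the rows are invented sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE P_IMAGE_ID (IMAGE_ID INTEGER PRIMARY KEY, FILE_NAME TEXT);
    CREATE TABLE P_IMG_TAG (
        IMG_TAG_ID INTEGER PRIMARY KEY,
        IMAGE_ID INTEGER REFERENCES P_IMAGE_ID (IMAGE_ID),
        TAG TEXT
    );
    INSERT INTO P_IMAGE_ID VALUES (1, 'a.png'), (2, 'b.png'), (3, 'c.png');
    INSERT INTO P_IMG_TAG VALUES
        (1, 1, '70702'), (2, 1, '70702'), (3, 2, '11111'), (4, 3, '70702');
""")

# Method 1 from the question: join, then filter, then DISTINCT.
method1_sql = """
    SELECT DISTINCT PID.IMAGE_ID, PID.FILE_NAME
    FROM P_IMAGE_ID PID
    INNER JOIN P_IMG_TAG PTAG
        ON PTAG.IMAGE_ID = PID.IMAGE_ID
    WHERE PTAG.TAG = '70702'
    ORDER BY PID.IMAGE_ID
"""

# Method 2 from the question: filter through an IN subquery.
method2_sql = """
    SELECT DISTINCT PID.IMAGE_ID, PID.FILE_NAME
    FROM P_IMAGE_ID PID
    WHERE PID.IMAGE_ID IN (
        SELECT PTAG.IMAGE_ID
        FROM P_IMG_TAG PTAG
        WHERE PTAG.TAG = '70702'
    )
    ORDER BY PID.IMAGE_ID
"""

method1_rows = conn.execute(method1_sql).fetchall()
method2_rows = conn.execute(method2_sql).fetchall()
print(method1_rows == method2_rows)
```

Because the two forms are equivalent, an optimizer is free to rewrite one into the other, which is why the order of clauses in the SQL text tells you little about the order of execution.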

Reason for poor performance of ANSI joins in Oracle 9i

Please before flagging as duplicate, read the last paragraphs.
In an Oracle 9i database this query runs in 0.18 seconds:
select
count(*)
from
dba_synonyms s,dba_objects t
where
s.TABLE_OWNER = t.OWNER(+) and
s.TABLE_NAME = t.OBJECT_NAME(+) and
s.DB_LINK is null and
t.OWNER is null;
...but this one runs in an appalling 120 seconds!:
select
count(*)
from
dba_synonyms s left join dba_objects t
on ( s.TABLE_OWNER = t.OWNER and s.TABLE_NAME = t.OBJECT_NAME )
where
s.DB_LINK is null and
t.OWNER is null;
Notice the only difference is using proprietary Oracle join syntax vs ANSI join syntax.
This question is not a duplicate of this one because that other question is about a very complex query involving more than 9 tables, and the only answer points out that the queries are very different besides the syntax usage (mainly the order of the tables).
In my case it is an extremely simple query, a mere join between two relations with no major complications or differences, including the order of tables.
Is this a bug in Oracle 9i?
What is the cause of such a dramatic difference in performance?
UPDATE:
These are the execution plans; notice that the explain plan for query one shows no cardinality, cost or bytes info:
Fast, old-style join:
Slow, ANSI join:
Well, after a long time with no answers, I've done some testing.
I ran the same queries in 10g and 11g, and in both versions the one with ANSI joins and the one with the WHERE joins run in less than 1 second.
As the problem exists only in 9i, the same version in which support for ANSI joins was introduced, I assume it's a bug in 9i that may or may not have been fixed in a patch.
Fortunately, as I mentioned, from 10g onwards both flavors of joins perform well.

Is there an Oracle official recommendation on the use of explicit ANSI JOINs vs implicit joins?

Note: I am not asking you to tell me “use explicit joins” but looking for Oracle official position if any on that subject.
From Oracle database documentation (also appears in 9i and 11g documentations):
Oracle recommends that you use the FROM clause OUTER JOIN syntax
rather than the Oracle join operator. Outer join queries that use the
Oracle join operator (+) are subject to the following rules and
restrictions […]
In other words, Oracle advises to prefer the first of these two forms:
FROM a LEFT JOIN b ON b.x = a.x
vs
FROM a, b WHERE b.x(+) = a.x
However, I have never found in any Oracle documentation a single recommendation to use preferably one of those two forms:
FROM a INNER JOIN b ON b.x = a.x
vs
FROM a, b WHERE b.x = a.x
Is there a paragraph I missed?
There are a number of notes on the Oracle Support site about issues with ANSI join syntax, with workarounds recommending the Oracle syntax.
Bug 5188321 wrong results (no rows) OR ORA-1445 from ANSI outer join
Versions affected: Versions >= 9.2.0.1 but < 11
Description
Wrong results or an ORA-1445 can be returned with a query involving a
very large select list count when ANSI OUTER JOIN syntax is used.
Workaround
Use native oracle outer join syntax
or
reduce the select list count.
Bug 5368296 ANSI join SQL may not report ORA-918 for ambiguous column
Versions affected: Versions < 11
Description
****
Note: This fix introduces the problem described in bug 7318276
One off fixes for that bug address the issue here also.
****
ORA-918 is not reported for an ambiguous column in a query
involving an ANSI join of more than 2 tables/objects.
eg:
-- 2 table join, returns ORA-918
SELECT empno
FROM emp a JOIN emp b on a.empno = b.empno;
-- 3 table join does not report ORA-918 when it should ...
SELECT empno
FROM emp a JOIN emp b on a.empno = b.empno
JOIN emp c on a.empno = c.empno;
Bug 7670135 Long parse time compiling ANSI join
Versions affected: Versions BELOW 11.2
Description
A query having ANSI join(s) may take noticeable time during query compilation,
especially if the query includes an NVL() function.
Workaround:
Use ORACLE join instead of ANSI join
From the Oracle Press - Oracle OCP 11g all in one exam guide
And from asktom (who is non committal)
Historically there have been bugs related to ANSI syntax, in fact even the
10.2.0.4 projected issues list includes 10 bugs/issues related to ANSI syntax.
In the past I've encountered some of these bugs myself, and have continued to use
and advocate the "traditional" Oracle style.
I'd like to know if you feel that the implementation of ANSI syntax is now equally
robust compared to the traditional syntax.
Followup February 19, 2008 - 5pm Central time zone:
unfortunately, there are bugs in non-ansi joins too, probably more than 10 in fact.
I personally do not use the new syntax (except in the rare case of a full outer join,
a truly rare beast to encounter). I have no comment on it really.
See also earlier question on same topic
Difference between Oracle's plus (+) notation and ansi JOIN notation?
I also found this statement in a document but no reference as to where it came from
"Starting with Oracle 9i, Oracle recommends that SQL developers use the ANSI join syntax instead of the Oracle proprietary (+) syntax. There are several reasons for this recommendation, including:
• Easier to segregate and read (without mixing up join versus restriction code)
• Easier to construct join code correctly (especially in the case of “outer” joins)
• Portable syntax will work on all other ANSI compliant databases, such as MS SQL Server, DB2, MySQL, PostgreSQL, et al
• Since it’s the universally accepted standard, it’s the general target for all future database and third party vendors’ tools
• The proprietary Oracle outer-join (+) syntax can only be used in one direction at a time, it cannot perform a full outer join
• Plus these additional limitations from the Oracle documentation:
o The (+) operator can be applied only to a column, not to an arbitrary expression. However, an arbitrary expression can contain one or more columns marked with the (+) operator.
o A condition containing the (+) operator cannot be combined with another condition using the OR logical operator.
o A condition cannot use the IN comparison condition to compare a column marked with the (+) operator with an expression.
o A condition cannot compare any column marked with the (+) operator with a sub-query."
Thus it’s time to embrace the ANSI join syntax – and move into the 21st century
I haven't seen it if there is. The reason for preferring ANSI syntax for outer joins in particular (apart from the non-standard, Oracle-specific (+) symbol) is that more outer joins are expressible using the ANSI syntax. The restriction "ORA-01417: a table may be outer joined to at most one other table" applies to (+) outer joins but not to ANSI outer joins. Other restrictions on (+) that do not apply to ANSI outer joins are documented here.
One highly respected Oracle expert actually recommends sticking to the old syntax for inner joins - see Jonathan Lewis's blog. He says there that ANSI joins are transformed to traditional Oracle joins under the covers anyway. I don't agree with him 100% (I prefer ANSI joins myself in general), but would not claim to have a fraction of his knowledge on the topic.
In a nutshell, ANSI outer joins are technically superior to old (+) joins, whereas with inner joins it is more just a matter of style.

Common Table Expression in Sub-Query

I would like help in understanding which RDBMS among Oracle, DB2 and Sybase support a common table expression (CTE) in a sub-query. I am aware that PostgreSQL does while MS SQL Server does not.
SELECT a.*, b.*
FROM (WITH aa as (
<<select_query>>
)
SELECT *
FROM aa
WHERE <<criteria>>
) as a
LEFT JOIN (
WITH bb as (
<<select_query>>
)
select * from bb inner join tbl_c on <<innerjoin>> where <<criteria>>
) as b
on <<join_expr>>
I am unable to define the with clause outside the sub-queries - both the queries are dynamically generated w.r.t. the columns, criteria, security, etc.
Also, the above query itself may be used in another query as a sub-query.
In summary, the principle is dynamically generated views, re-usable later. Some queries may have up to 10-12 such dynamic views being merged together as well.
The problem is that the application is supposed to be database-agnostic at least so far as PG, Oracle & DB2 are concerned and features not supported by one are not implemented at all.
Yes, you can use CTE's in subqueries in Oracle. From the Oracle 11g docs:
You can specify this clause in any top-level SELECT statement and in
most types of subqueries. The query name is visible to the main query
and to all subsequent subqueries. For recursive subquery factoring,
the query name is even visible to the subquery that defines the query
name itself.
As an example, this works in Oracle:
SELECT a.*, b.*
FROM (WITH aa AS
(
SELECT LEVEL l1, mod(level, 5) m1 FROM dual CONNECT BY LEVEL < 50
)
SELECT * FROM aa WHERE m1 < 3) a LEFT JOIN
(WITH bb AS
(
SELECT LEVEL l2, mod(level, 5) m2 FROM dual CONNECT BY LEVEL < 50
)
SELECT * FROM bb WHERE m2 BETWEEN 1 AND 4) b
ON a.l1 = b.l2;
That's not directly an answer to your question, but maybe you can think about this:
SQL Server seems to limit the semantics (not necessarily the syntax) of SQL where it makes sense to do so. For instance, you cannot have a subquery with an ORDER BY clause, if you don't also specify a TOP n clause. This makes sense, as ordered subqueries are pointless unless they have limits. Other RDBMS allow for such pointlessness.
In your case (that's just a guess), having CTE's in subqueries only makes limited sense, because you can rewrite your whole query in a way that the CTE's are declared at the top-most level. The only difference you'll have is the scope and maybe the readability of each declaration.
On the other hand, CTE's allow for recursive queries, which might be very hard to apply when CTE's are declared in subqueries...
Since you need to implement database-agnostic SQL, I recommend you do not make heavy use of CTE's yet. If CTE's are simple, you can always rewrite them as simple views...
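The rewrite suggested above, moving CTE declarations out of the subqueries to the top-most level, can be sketched as follows. This uses Python's sqlite3 module, so it shows SQLite behaviour only and is not evidence about Oracle, DB2 or Sybase; the nums table is a hypothetical stand-in for the CONNECT BY row generator in the Oracle example, and the nested form assumes a SQLite build that accepts WITH inside a FROM subquery.

```python
import sqlite3

# SQLite only; nums is a made-up helper table holding 1..49.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nums (n INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO nums VALUES (?)", [(n,) for n in range(1, 50)])

# CTEs declared inside each subquery.
nested_sql = """
    SELECT a.l1, a.m1, b.l2, b.m2
    FROM (WITH aa AS (SELECT n AS l1, n % 5 AS m1 FROM nums)
          SELECT * FROM aa WHERE m1 < 3) a
    LEFT JOIN
         (WITH bb AS (SELECT n AS l2, n % 5 AS m2 FROM nums)
          SELECT * FROM bb WHERE m2 BETWEEN 1 AND 4) b
      ON a.l1 = b.l2
    ORDER BY a.l1
"""

# The same CTEs hoisted to the top level of the statement.
hoisted_sql = """
    WITH aa AS (SELECT n AS l1, n % 5 AS m1 FROM nums),
         bb AS (SELECT n AS l2, n % 5 AS m2 FROM nums)
    SELECT a.l1, a.m1, b.l2, b.m2
    FROM (SELECT * FROM aa WHERE m1 < 3) a
    LEFT JOIN (SELECT * FROM bb WHERE m2 BETWEEN 1 AND 4) b
      ON a.l1 = b.l2
    ORDER BY a.l1
"""

nested_rows = conn.execute(nested_sql).fetchall()
hoisted_rows = conn.execute(hoisted_sql).fetchall()
print(nested_rows == hoisted_rows)
```

For dynamically generated SQL, hoisting means the generator has to collect CTE definitions and keep their names unique across fragments, which is the scoping trade-off mentioned above.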
The newer Microsoft SQL Server versions do support CTE's.
While PostgreSQL supports CTE's, they are an optimisation barrier which prevents predicate or join pushing into the CTE query. This makes them less effective in many cases than a simple subquery.
