When do we use WITH clause, and what are main benefits of it?

When do we use WITH clause, and what are main benefits of it? - oracle

I was working on task about optimization queries. One of the improvement ways was using WITH clause. I notice that it did very good job, and it lead to shorter time of execution, but i am not sure now, when should I use WITH clause and is there any risk of using it?
Here is one of the queries that I am working on :
WITH MY_TABLE AS
( SELECT PROD_KY,
sum(GROUPISPRIVATE) AS ISPRIVATE,
sum(GROUPISSHARED) AS ISSHARED
FROM
(
SELECT GRP_PROD_CUSTOMER.PROD_KY,
1 as ISPRIVATE,
0 as ISSHARED
FROM CUSTOMER
JOIN GRP_CUSTOMER ON GRP_CUSTOMER.CUST_KY = CUSTOMER.CUST_KY
JOIN GRP_PROD_CUSTOMER ON GRP_PROD_CUSTOMER.GRP_KY = GRP_CUSTOMER.GRP_KY
GROUP BY GRP_PROD_CUSTOMER.PROD_KY
)
GROUP BY PROD_KY
)
SELECT * FROM MY_TABLE;

is there any risk of using it?
Yes. Oracle may decide to materialize the subquery, which means writing its result set to disk and then reading it back (except it might not mean that in 12cR2 or later). That unexpected I/O could be a performance hit. Not always, and usually we can trust the optimizer to make the correct choice. However, Oracle has provided us with hints to tell the optimizer how to handle the result set: /*+ materialize */ to um materialize it and /*+ inline */ to keep it in memory.
I start with this potential downside because I think it's important to understand that the WITH clause is not a silver bullet and it won't improve every single query, and may even degrade performance. For instance I share the scepticism of the other commenters that the query you posted is in any way faster because you re-wrote it as a common table expression.
Generally, the use cases for the WITH clause are:
We want to use the result set from the subquery multiple times
with cte as
( select blah from meh )
select *
from t1
join t2 on t1.id = t2.id
where t1.col1 in ( select blah from cte )
and t2.col2 not in ( select blah from cte)
We want to be build a cascade of subqueries:
with cte as
( select id, blah from meh )
, cte2 as
( select t2.*, cte.blah
from cte
join t2 on t2.id = cte.id)
, cte3 as
( select t3.*, cte2.*
from cte2
join t3 on t3.col2 = cte2.something )
….
This second approach is beguiling and can be useful for implementing complex business logic in pure SQL. But it can lead to a procedural mindset and lose the power sets and joins. This too is a risk.
We want to use recursive WITH clause. This allows us to replace Oracle's own CONNECT BY syntax with a more standard approach. Find out more
In 12c and later we can write user-defined functions in the WITH clause. This is a powerful feature, especially for users who need to implement some logic in PL/SQL but only have SELECT access to the database. Find out more
For the record I have seen some very successful and highly performative uses of the second type of WITH clause. However I have also seen uses of WITH when it would have been just as easy to write an inline view. For instance, this is just using the WITH clause as syntactic sugar ...
with cte as
( select id, blah from meh )
select t2.*, cte.blah
from t2
join cte on cte.id = t2.id
… and would be clearer as ...
select t2.*, cte.blah
from t2
join ( select id, blah from meh ) cte on cte.id = t2.id

WITH clause is introduced in oracle to match SQL-99 standard.
The main purpose is to reduce the complexity and repetitive code.
Lets say you need to find the average salary of one department and then need to fetch all the department(d1) with more than average salary of that department(d1).
This can make multiple references to the subquery more efficient and readable.
The MATERIALIZE and INLINE optimizer hints can be used to influence the decision. The undocumented MATERIALIZE hint tells the optimizer to resolve the subquery as a global temporary table, while the INLINE hint tells it to process the query inline. Decision to use the hint is purely depends on logic that we are going to implement in query.
In oracle 12c, declaration of PL/SQL Block in WITH clause is introduced.
You must refer it from oracle documents.
Cheers!!

Your query is rather useless in terms of WITH statement (aka Common Table Expression, CTE)
Anyway, using the WITH clause brings several benefits:
The query is better readable (in my opinion)
You can use the same subquery several times in the main query. You can even cascade them.
Oracle can materialize the subquery, i.e. Oracle may create a temporary table and stores result of the subquery in it. This can give better performance.

The WITH clause may be processed as an inline view or resolved as a temporary table. The SQL WITH clause is very similar to the use of Global temporary tables. This technique is often used to improve query speed for complex subqueries and enables the Oracle optimizer to push the necessary predicates into the views.
The advantage of the latter is that repeated references to the subquery may be more efficient as the data is easily retrieved from the temporary table, rather than being requeried by each reference. You should assess the performance implications of the WITH clause on a case-by-case basis.
You can read more here:
http://www.dba-oracle.com/t_with_clause.htm
https://oracle-base.com/articles/misc/with-clause

one point to consider is, that different RDBMS handle the with clause - aka common table expressions (CTE) aka subquery factoring - differently:
Oracle may use a materialization or an inlining (as already explained in the answer provided by APC)
postgres always uses a materialization in releases up to 11 (so here a CTE is an optimization fence). In postgres 12 the behaviour changes and is similar to Oracles approach: https://info.crunchydata.com/blog/with-queries-present-future-common-table-expressions. You even have something that almost looks like a hint (though it is known that postgres does not use hints...)
in SQL Server currently a CTE is always inlined, as explained in https://erikdarlingdata.com/2019/08/what-would-materialized-ctes-look-like-in-sql-server/
So depending on the RDBMS you use and its version your mileage may vary.

Related

Oracle 11 joining with view has high cost

I'm having some difficulty with joining a view to another table. This is on an Oracle RAC system running 11.2
I'll try and give as much detail as possible without going into specific table structures as my company would not like that.
You all know how this works. "Hey, can you write some really ugly software to implement our crazy ideas?"
The idea of what they wanted me to do was to make a view where the end user wouldn't know if they were going after the new table or the old table so one of the tables is a parameter table that will return "ON" or "OFF" and is used in the case statements.
There are some not too difficult but nested case statements in the select clause
I have a view:
create view my_view as
select t1.a as a, t1.b as b, t1.c as c,
sum(case when t2.a = 'xx' then case when t3.a then ... ,
case when t2.a = 'xx' then case when t3.a then ... ,
from table1 t1
join table t2 on (t1.a = t2.a etc...)
full outer join t3 on (t1.a = t3.a etc...)
full outer join t4 on (t1.a = t4.a etc...)
group by t1.a, t1.b, t2.c, and all the ugly case statements...
Now, when I run the query
select * from my_view where a='xxx' and b='yyy' and c='zzz'
the query runs great and the cost is 10.
However, when I join this view with another table everything falls apart.
select * from my_table mt join my_view mv on (mt.a = mv.a and mt.b=mv.b and mt.c=mv.c) where ..."
everything falls apart with a cost though the roof.
What I think is happening is the predicates are not getting pushed to the view. As such, the view is now doing full tables scans and joining everything to everything and then finally removing all the rows.
Every hint, tweak, or anything I've done doesn't appear to help.
When looking at the plan it looks like it has the predicates.
But this happens after everything is joined.
Sorry if this is cryptic but any help would be greatly appreciated.

Since you have the view with a "GROUP BY", predicates could not be pushed to the inner query
Also, you have the group by functions in a case statement, which could also make it worse for the optimizer
Oracle introduces enhancements to Optimizer every version/release/patch. It is hard to say what is supported in the version you're running. However, you can try:
See if removing the case from the GROUP BY function will make any difference
Otherwise, you have to take the GROUP BY and GROUP BY functions from the view to the outer most query

After many keyboard indentations on my forehead I may have tricked Oracle into pushing the predicates. I don't know exactly why this works but simplifying things may have helped.
I changed all my ON clauses to USING clauses and in this way the column names now match the columns from which I'm joining to. On some other predicates that were constants I added in a where clause to the view.
The end result is I can now join this view with another table and the cost is reasonable and the plan shows that the predicates are being pushed.
Thank you to everybody who looked at this problem.

Oracle SQL sub query vs inner join

At first, I seen the select statement on Oracle Docs.
I have some question about oracle select behaviour, when my query contain select,join,where.
see this below for information:
My sample table:
[ P_IMAGE_ID ]
IMAGE_ID (PK)
FILE_NAME
FILE_TYPE
...
...
[ P_IMG_TAG ]
IMG_TAG_ID (PK)
IMAGE_ID (FK)
TAG
...
...
My requirement are: get distinct of image when it's tag is "70702".
Method 1: Select -> Join -> Where -> Distinct
SELECT DISTINCT PID.IMAGE_ID
, PID.FILE_NAME
FROM P_IMAGE_ID PID
INNER JOIN P_IMG_TAG PTAG
ON PTAG.IMAGE_ID = PID.IMAGE_ID
WHERE PTAG.TAG = '70702';
I think the query behaviour should be like:
join table -> hint where cause -> distinct select
I use Oracle SQL developer to get the explain plan:
Method 1 cost 76.
Method 2: Select -> Where -> Where -> Distinct
SELECT DISTINCT PID.IMAGE_ID
, PID.FILE_NAME
FROM P_IMAGE_ID PID
WHERE PID.IMAGE_ID IN
(
SELECT PTAG.IMAGE_ID
FROM P_IMG_TAG PTAG
WHERE PTAG.TAG = '70702'
);
I think the second query behaviour should be like:
hint where cause -> hint where cause -> distinct select
I use Oracle SQL developer to get the explain plan too:
Method 2 cost 76 too. Why?
I believe when I try where cause first for reduce the database process and avoid join table that query performance should be better than the table join query, but now when I test it, I am confused, why 2 method cost are equal ?
Or am I misunderstood something ?
List of my question here:
Why 2 method above cost are equal ?
If the result of sub select Tag = '70702' more than thousand or million or more, use join table should be better alright ?
If the result of sub select Tag = '70702' are least, use sub select for reduce data query process is better alright ?
When I use method 1 Select -> Join -> Where -> Distinct mean the database process table joining before hint where cause alright ?
Someone told me when i move hint cause Tag = '70702' into join cause
(ie. INNER JOIN P_IMG_TAG PTAG ON PAT.IMAGE_ID = PID.IMAGE_ID AND PTAG.TAG = '70702' ) it's performance may be better that's alright ?
I read topic subselect vs outer join and subquery or inner join but both are for SQL Server, I don't sure that may be like Oracle database.

The DBMS takes your query and executes something. But it doesn't execute steps that correspond to SQL statement parts in the order they appear in an SQL statement.
Read about "relational query optimization", which could just as well be called "relational query implementation". Eg for Oracle.
Any language processor takes declarations and calls as input and implements the described behaviour in terms of internal data structures and operations, maybe through one or more levels of "intermediate code" running on a "virtual machine", eventually down to physical machines. But even just staying in the input language, SQL queries can be rearranged into other SQL queries that return the same value but perform significantly better under simple and general implementation assumptions. Just as you know that your question's queries always return the same thing for a given database, the DBMS can know. Part of how it knows is that there are many rules for taking a relational algebra expression and generating a different but same-valued expression. Certain rewrite rules apply under certain limited circumstances. There are rules that take into consideration SQL-level relational things like primary keys, unique columns, foreign keys and other constraints. Other rules use implementation-oriented SQL-level things like indexes and statistics. This is the "relational query rewriting" part of relational query optimization.
Even when two different but equivalent queries generate different plans, the cost can be similar because the plans are so similar. Here, both a HASH and SORT index are UNIQUE. (It would be interesting to know what the few top plans were for each of your queries. It is quite likely that those few are the same for both, but that the plan that is more directly derived from the particular input expression is the one that is offered when there's little difference.)
The way to get the DBMS to find good query plans is to write the most natural expression of a query that you can find.

Common Table Expression in Sub-Query

I would request for help in understanding which all RDBMS from Oracle, DB2, Sybase support a common table expression (CTE) in a sub-query. I am aware that PostgreSQL does while MS SQL Server does not.
SELECT a.*, b.*
FROM (WHERE aa as (
<<select_query>),
SELECT *
FROM aa
WHERE <<criteria>>
) as a
LEFT JOIN (
WITH bb as (
<<select_query>
),
select * from bb inner join tbl_c on <<innerjoin>> where <<criteria>>
) as b
on <<join_expr>>
I am unable to define the with clause outside the sub-queries - both the queries are dynamically generated w.r.t. the columns, criteria, security, etc.
Also, the above query itself may be used in another query as a sub-query.
In summary, the principle is dynamically generated views, re-usable later. Some queries may have upto 10-12 such dynamic views being merged together as well.
The problem is that the application is supposed to be database-agnostic at least so far as PG, Oracle & DB2 are concerned and features not supported by one are not implemented at all.

Yes, you can use CTE's in subqueries in Oracle. From the Oracle 11g docs:
You can specify this clause in any top-level SELECT statement and in
most types of subqueries. The query name is visible to the main query
and to all subsequent subqueries. For recursive subquery factoring,
the query name is even visible to the subquery that defines the query
name itself.
As an example, this works in Oracle:
SELECT a.*, b.*
FROM (WITH aa AS
(
SELECT LEVEL l1, mod(level, 5) m1 FROM dual CONNECT BY LEVEL < 50
)
SELECT * FROM aa WHERE m1 < 3) a LEFT JOIN
(WITH bb AS
(
SELECT LEVEL l2, mod(level, 5) m2 FROM dual CONNECT BY LEVEL < 50
)
SELECT * FROM bb WHERE m2 BETWEEN 1 AND 4) b
ON a.l1 = b.l2;

That's not directly an answer to your question, but maybe you can think about this:
SQL Server seems to limit the semantics (not necessarily the syntax) of SQL where it makes sense to do so. For instance, you cannot have a subquery with an ORDER BY clause, if you don't also specify a TOP n clause. This makes sense, as ordered subqueries are pointless unless they have limits. Other RDBMS allow for such pointlessness.
In your case (that's just a guess), having CTE's in subqueries only makes limited sense, because you can rewrite your whole query in a way that the CTE's are declared at the top-most level. The only difference you'll have is the scope and maybe the readability of each declaration.
On the other hand, CTE's allow for recursive queries, which might be very hard to apply when CTE's are declared in subqueries...
Since you need to implement database-agnostic SQL, I recommend you do not make heavy use of CTE's yet. If CTE's are simple, you can always rewrite them as simple views...

The newer Microsoft SQL Server versions do support CTE's.

While PostgreSQL supports CTE's, they are an optimisation barrier which prevents predicate or join pushing into the CTE query. This makes them less effective in many cases than a simple subquery.

oracle-inline view

Why inline views are used..??

There are many different reasons for using inline views. Some things can't be done without inline views, for example:
1) Filtering on the results of an analytic function:
select ename from
( select ename, rank() over (order by sal desc) rnk
from emp
)
where rnk < 4;
2) Using ROWNUM on ordered results:
select ename, ROWNUM from
( select ename
from emp
order by ename
);
Other times they just make it easier to write the SQL you want to write.

The inline view is a construct in Oracle SQL where you can place a query in the SQL FROM, clause, just as if the query was a table name.
Inline views provide
Bind variables can be introduced inside the statement to limit the data
Better control over the tuning
Visibility into the code

To get top N ordered rows.
SELECT name, salary,
FROM (SELECT name, salary
FROM emp
ORDER BY salary DESC)
WHERE rownum <= 10;

An inline view can be regarded as an intermediate result set that contributes to the required data set in some way. Sometimes it is entirely a matter of improving maintainability of the code, and sometimes it is logically neccessary.

From the Oracle Database Concepts document there are the inline view concept definition:
An inline view is not a schema object.
It is a subquery with an alias
(correlation name) that you can use
like a view within a SQL statement.
About the subqueries look in Using Subqueries from the Oracle SQL Reference manual. It have a very nice pedagogic information.
Anyway, today is preferred to use the Subquery Factoring Clause that is a more powerfull way of use inline views.
As an example of all together:
WITH
dept_costs AS (
SELECT department_name, SUM(salary) dept_total
FROM employees e, departments d
WHERE e.department_id = d.department_id
GROUP BY department_name),
avg_cost AS
SELECT * FROM dept_costs
WHERE dept_total >
(SELECT avg FROM (SELECT SUM(dept_total)/COUNT(*) avg
FROM dept_costs)
)
ORDER BY department_name;
There you can see one of all:
An inline view query: SELECT SUM...
A correlated subquery: SELECT avg FROM...
A subquery factoring: dept_costs AS (...
What are they used for?:
To avoid creating an intermediate view object: CREATE VIEW ...
To simplify some queries that a view cannot be helpfull. For instance, when the view need to filter from the main query.

You will often use inline views to break your query up into logical parts which helps both readability and makes writing more complex queries a bit easier.
Jva and Tony Andrews provided some good examples of simple cases where this is useful such as Top-N or Pagination queries where you may want to perform a query and order its results before using that as a part of a larger query which in turn might feed a query doing some other processing, where the logic for these individual queries would be difficult to achieve in a single query.
Another case they can be very useful is if you are writing a query that joins various tables together and want to perform aggregation on some of the tables, separating group functions and the processing into different inline views before performing the joins makes managing cardinality a lot easier. If you want some examples, I would be happy to provide them to make it more clear.
Factored subqueries (where you list your queries in the WITH clause at the start of the query) and inline views also often bring performance benefits. If you need to access the results of the subquery multiple times, you only need to run it once and it can be materialized as a Global Temporary Table (how the optimizer acts isn't totally black and white so I won't go into it here but you can do your own research - for example, see http://jonathanlewis.wordpress.com/2007/07/26/subquery-factoring-2/)

Table Join Efficiency Question

When joining across tables (as in the examples below), is there an efficiency difference between joining on the tables or joining subqueries containing only the needed columns?
In other words, is there a difference in efficiency between these two tables?
SELECT result
FROM result_tbl
JOIN test_tbl USING (test_id)
JOIN sample_tbl USING (sample_id)
JOIN (SELECT request_id
FROM request_tbl
WHERE request_status='A') USING(request_id)
vs
SELECT result
FROM (SELECT result, test_id FROM result_tbl)
JOIN (SELECT test_id, sample_id FROM test_tbl) USING(test_id)
JOIN (SELECT sample_id FROM sample_tbl) USING(sample_id)
JOIN (SELECT request_id
FROM request_tbl
WHERE request_status='A') USING(request_id)

The only way to find out for sure is to run both with tracing turned on and then look at the trace file. But in all probability they will be treated the same: the optimizer will merge all the inline views into the main statement and come up with the same query plan.

It doesn't matter. It may actually be WORSE since you are taking control away from the optimizer which generally knows best.
However, remember if you are doing a JOIN and only including a column from one of the tables that it is QUITE OFTEN better to re-write it as a series of EXISTS statements -- because that's what you really mean. JOINs (with some exceptions) will join matching rows which is a lot more work for the optimizer to do.
e.g.
SELECT t1.id1
FROM table1 t1
INNER JOIN table2 ON something = something
should almost always be
SELECT id1
FROM table1 t1
WHERE EXISTS( SELECT *
FROM table2
WHERE something = something )
For simple queries the optimizer may reduce the query plans into identical ones. Check it out on your DBMS.
Also this is a code smell and probably should be changed:
JOIN (SELECT request_id
FROM request_tbl
WHERE request_status='A')
to
SELECT result
FROM request
WHERE EXISTS(...)
AND request_status = 'A'

No difference.
You can tell by running EXPLAIN PLAN on both those statements - Oracle knows that all you want is the "result" column, so it only does the minimum necessary to get the data it needs - you should find that the plans will be identical.
The Oracle optimiser does, sometimes, "materialize" a subquery (i.e. run the subquery and keep the results in memory for later reuse), but this is rare and only occurs when the optimiser believes this will result in a performance improvement; in any case, Oracle will do this "materialization" whether you specified the columns in the subqueries or not.
Obviously if the only place the "results" column is stored is in the blocks (along with the rest of the data), Oracle has to visit those blocks - but it will only keep the relevant info (the "result" column and other relevant columns, e.g. "test_id") in memory when processing the query.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio