Query in DB2 vs Oracle - oracle

I have a query with multiple inner joins. It involves two views, one of which is based on four tables, and there are four tables in total (including the two views).
The same query runs against the same amount of source data in both Oracle and DB2. Surprisingly, DB2 takes 2 minutes to load 3 million records, while Oracle takes two hours. The same indexes exist on all source tables in both environments. Do views behave differently when used in joins in the two environments (Oracle vs DB2)?
Here is a dummy version of the query:
INSERT INTO TABLE_A
SELECT
    adf.column1,
    adf.column2,
    dd.column3,
    SUM(otl.column4) column4,
    SUM(otl.column5) column5,
    (CASE WHEN SUM(otl.column5) = 0 THEN 0
          ELSE ROUND(CAST(SUM(otl.column4) AS DECIMAL(19,2)) / ABS(CAST(SUM(otl.column4) AS DECIMAL(18,2))), 4)
     END) taxl_unrlz_cgl_pct
FROM view_a adf
INNER JOIN table_b hr
    ON hr.hh_ref_id = adf.hh_ref_id
    AND hr.col_typ_cd = 'FIRM'
    AND hr.col_end_dt = TO_DATE('1/1/2900','MM/DD/YYYY')
INNER JOIN dw.table_c ar
    ON ar.colb_id = adf.colb_id
    AND ar.col_cd = '#'
    AND ar.col_num BETWEEN 10000000 AND 89999999
    AND ar.col_dt IS NULL
INNER JOIN table_d dd
    ON dd.col_id = adf.col_id
INNER JOIN view2 otl
    ON otl.cola_id = ar.cola_id
GROUP BY adf.column1, adf.column2, dd.column3;

Technically, both DB2 and Oracle try to rewrite the query you coded into the most efficient form possible. But one of the common (though not frequent) issues I have seen with multi-table views is the DBMS being unable to rewrite the query in terms of the underlying tables. Depending on the complexity of the view itself, and sometimes on the additional joins, the DBMS may fail to rewrite the query against the underlying tables and therefore cannot use the indexes on those tables. When this happens, the view acts like a materialized table (work table) and the query does a table scan on that materialized result.
There is no consistent pattern for when this happens, so you will need to check case by case.
Given the gap you mention (2 hours vs 2 minutes), that is most probably what is happening here. Check the access path on both Oracle and DB2, and make sure that statistics are up to date on both systems so that the access paths are based on the latest stats. Otherwise it won't be an apples-to-apples comparison.
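On the Oracle side, the check described above could be sketched as follows. This is only an outline: the schema owner DW is an assumption (only dw.table_c is qualified in the dummy query), and the statement being explained is a cut-down SELECT rather than the full INSERT.

```sql
-- Refresh optimizer statistics so the plan reflects current data volumes.
-- 'DW' as the owner is an assumption; substitute the real schema.
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(ownname => 'DW', tabname => 'TABLE_C', cascade => TRUE);
END;
/

-- Ask the optimizer for its plan without running the statement.
EXPLAIN PLAN FOR
SELECT adf.column1, adf.column2, SUM(otl.column4)
FROM view_a adf
INNER JOIN dw.table_c ar ON ar.colb_id = adf.colb_id
INNER JOIN view2 otl ON otl.cola_id = ar.cola_id
GROUP BY adf.column1, adf.column2;

-- Show the plan that was just explained.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

A VIEW step fed by full table scans in the output would be a hint that the view was not merged into the outer query. On DB2, RUNSTATS and the explain facility play the corresponding roles.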

Related

ORA-02019:connection description for remote database not found - left join in a view

I have 3 tables:
table1: id, person_code
table2: id, address, person_code_foreing (same as person_code in table1), admission_date_1
table3: id, id_table2, admission_date_2, something
(the tables are fictitious)
I'm trying to create a view that takes information from these 3 tables using left joins. I'm doing it this way because the first table has some records whose person_code does not appear in the other tables, and I want the view to return those records as well:
CREATE OR REPLACE VIEW schema.my_view AS
SELECT t1.name, t2.adress, t3.something
FROM schema.table1#ambient1 t1
LEFT JOIN schema.table2#ambient1 t2
    ON t1.person_code = t2.person_code_foreing
LEFT JOIN schema.table3#ambient1 t3
    ON t3.id_table2 = t2.id
    AND t1.admission_date_1 = t2.admission_date_2;
This view needs to be created in another environment (ambient2).
I tried using a subquery, but I also need a left join there, and this is what confuses me: are the subquery and the left join both the problem, or just the left join?
Has this happened to anyone?
How did you resolve it?
Thanks a lot.
ORA-2019 indicates that your database link (#ambient1) does not exist, or is not visible to the current user. You can confirm by checking the ALL_DB_LINKS view, which should list all links to which the user has access:
select owner, db_link from all_db_links;
Also keep in mind that Oracle may perform the joins in the database making the call, not the remote database. In that case it has to pull the entire contents of all three tables over the network, write them into TEMP for the join, and throw them away again, every time you run a query. You would also lose the benefit of any indexes on the data and most likely wind up with full table scans on the temp tables within your local database.
I don't know if this is an option for you, but from a performance perspective and given that it isn't joining with anything in the local database, it would make much more sense to create the view in the remote database and just query that through the database link. That way all of the joins are performed efficiently where the data lives, only the result set is pushed over the network, and your client database SQL becomes much simpler.
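The suggestion above could look roughly like the following. It is a sketch, assuming the database link is named ambient1 and is referenced with Oracle's `@` link syntax. The table and column names are the fictitious ones from the question, the join conditions follow the table definitions given there, and the view is assumed to be visible to the user the link connects as.

```sql
-- Run in the remote database (the one that holds table1, table2, table3).
CREATE OR REPLACE VIEW my_view AS
SELECT t1.person_code, t2.address, t3.something
FROM table1 t1
LEFT JOIN table2 t2
  ON t1.person_code = t2.person_code_foreing
LEFT JOIN table3 t3
  ON t3.id_table2 = t2.id
 AND t3.admission_date_2 = t2.admission_date_1;

-- Run in the local database: only the joined result set crosses the link.
SELECT person_code, address, something
FROM my_view@ambient1;
```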
I managed to make it work. Apparently ambient2 doesn't like my left join, so I used only a subquery and the (+) operator. This is how it worked:
CREATE OR REPLACE VIEW schema.my_view
SELECT t1.name, all.adress, all.something
from schema.table1#ambient1 t1,(select * from
schema.table3#ambient1 t3, schema.table2#ambient1 t2
where t3.id_table2 = t2.id(+)
and (t1.admission_date_1=t2.admission_date_2 or t1.admission_date is null))
all
where t1.person_code = t2.person_code_foreing(+);
I tested whether a query using a right join works in ambient2 (with 2 tables created there), and it does. I had thought there was a problem with that environment.
It makes no sense to me why this kind of join raises that error in my case.
Are the versions different? I don't know, and I can't find any official documentation about it.
Maybe some of you have a clue.
It's a mystery to me :))
Thanks.

Replacing NOT IN with NOT EXISTS and an OUTER JOIN in Oracle Database 12c

I understand that the performance of our queries is improved when we use EXISTS and NOT EXISTS in the place of IN and NOT IN, however, is performance improved further when we replace NOT IN with an OUTER JOIN as opposed to NOT EXISTS?
For example, the following query selects all models from a PRODUCT table that are not in another table called PC. For the record, no model values in the PRODUCT or PC tables are null:
select model
from product
where not exists(
select *
from pc
where product.model = pc.model);
The following OUTER JOIN will display the same results:
select product.model
from product left join pc
on pc.model = product.model
where pc.model is null;
Seeing as these both return the same values, which option should we use to better improve the performance of our queries?
The query plan will tell you. It depends on the data and the tables. The OUTER JOIN and NOT EXISTS versions are equivalent.
However, regarding your opening sentence: NOT IN and NOT EXISTS are not the same if model can be NULL. You say model cannot be null, so you might find they all get the same plan anyway. But for that to hold, the database must be told there cannot be nulls (with a NOT NULL constraint), as opposed to there simply not being any. If you don't declare it, the optimizer may produce a different plan for each query, which may result in different performance depending on your actual data. This is generally true and particularly true for Oracle, whose B-tree indexes do not store entries in which every indexed column is NULL.
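The difference between NOT IN and NOT EXISTS when NULLs are present can be seen with a small experiment. The tables product_demo and pc_demo below are hypothetical and exist only for this illustration.

```sql
-- Hypothetical demo tables, not part of the original question.
CREATE TABLE product_demo (model VARCHAR2(10));
CREATE TABLE pc_demo (model VARCHAR2(10));

INSERT INTO product_demo VALUES ('A');
INSERT INTO product_demo VALUES ('B');
INSERT INTO pc_demo VALUES ('A');
INSERT INTO pc_demo VALUES (NULL);

-- NOT IN returns no rows: for 'B', the comparison with NULL is UNKNOWN,
-- so the whole NOT IN predicate is never TRUE.
SELECT model
FROM product_demo
WHERE model NOT IN (SELECT model FROM pc_demo);

-- NOT EXISTS returns 'B': the NULL row simply never matches the correlation.
SELECT p.model
FROM product_demo p
WHERE NOT EXISTS (SELECT 1 FROM pc_demo c WHERE c.model = p.model);
```

Declaring pc_demo.model as NOT NULL rules out the first behaviour and lets the optimizer treat the two forms alike.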
Check out EXPLAIN PLAN
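A minimal way to compare the two candidates is to explain each one and display the result, assuming the product and pc tables from the question and a plan table available to the session (Oracle provides one by default).

```sql
-- Plan for the NOT EXISTS version.
EXPLAIN PLAN FOR
SELECT model
FROM product
WHERE NOT EXISTS (SELECT * FROM pc WHERE product.model = pc.model);

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

-- Plan for the OUTER JOIN version.
EXPLAIN PLAN FOR
SELECT product.model
FROM product
LEFT JOIN pc ON pc.model = product.model
WHERE pc.model IS NULL;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

DBMS_XPLAN.DISPLAY shows the most recently explained statement, so each display call follows its own EXPLAIN PLAN.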

Oracle join using Hint USE_NL USE_HASH

What is the best way to force the execution plan to use only nested loop joins for all tables with the USE_NL hint in one case,
and only hash joins for all tables with the USE_HASH hint in the other case?
I want to run both queries, see which one has the lower cost in the execution plan, and use that one. Please suggest.
My doubt is about the order in which I should list all 4 tables inside the hint, like below:
USE_NL(bl1_gain_adj,customers,bl1_gain,bl1_reply_code)
SELECT bl1_gain_adj.adj_seq_no,
       bl1_gain_adj.amount_currency,
       bl1_gain_adj.gain_seq_no,
       customers.loan_key,
       customers.customer_key
FROM
       bl1_gain_adj,
       customers,
       bl1_gain,
       bl1_reply_code
WHERE
       bl1_gain.loan_key = customers.loan_key
       AND bl1_gain.customer_key = customers.customer_key
       AND bl1_gain.receiver_customer = customers.customer_no
       AND bl1_gain.cycle_seq_no = customers.cycle_seq_no
       AND bl1_reply_code.gain_code = bl1_gain.gain_code
       AND bl1_reply_code.revenue_code = 'RC'
       AND bl1_gain_adj.gain_seq_no = bl1_gain.gain_seq_no
       AND bl1_gain_adj.customer_key = bl1_gain.customer_key;
Records in tables
---------------
bl1_gain_adj = 100 records
customers = 10 Million records
bl1_gain = 1 Million records
bl1_reply_code = 100 million records
Keeping aside the choice of the most appropriate hint for your query (if any), the order you write the table names/aliases in the USE_NL hint does not matter.
According to Oracle documentation:
Note that USE_NL(table1 table2) is not considered a multi-table hint
because it is a shortcut for USE_NL(table1) and USE_NL(table2)
About USE_NL, Oracle says:
The USE_NL hint instructs the optimizer to join each specified table
to another row source with a nested loops join, using the specified
table as the inner table.
That is, if you write USE_NL(table1 table2 table3 table4) this means "use all these tables as inner tables in a nested loop join"; if your query only has these 4 tables, the hint will be ignored for at least one table: to use a table as inner, we need another table to use as outer, so it's impossible to use all the tables as inner.
LEADING does something different, regarding the order in which tables are scanned:
The LEADING hint instructs the optimizer to use the specified set of
tables as the prefix in the execution plan.
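Combining the two hints, one possible way to request the same join method throughout is to fix the join order with LEADING and name every table except the first one in USE_NL or USE_HASH. The join order below (smallest table first) is an assumption chosen for illustration, and the optimizer can still disregard a hint it cannot honor, so the resulting plan should be checked.

```sql
-- Nested loops only: bl1_gain_adj drives, the other three are inner tables.
SELECT /*+ LEADING(bl1_gain_adj bl1_gain customers bl1_reply_code)
           USE_NL(bl1_gain customers bl1_reply_code) */
       bl1_gain_adj.adj_seq_no,
       bl1_gain_adj.amount_currency,
       bl1_gain_adj.gain_seq_no,
       customers.loan_key,
       customers.customer_key
FROM   bl1_gain_adj, customers, bl1_gain, bl1_reply_code
WHERE  bl1_gain.loan_key = customers.loan_key
AND    bl1_gain.customer_key = customers.customer_key
AND    bl1_gain.receiver_customer = customers.customer_no
AND    bl1_gain.cycle_seq_no = customers.cycle_seq_no
AND    bl1_reply_code.gain_code = bl1_gain.gain_code
AND    bl1_reply_code.revenue_code = 'RC'
AND    bl1_gain_adj.gain_seq_no = bl1_gain.gain_seq_no
AND    bl1_gain_adj.customer_key = bl1_gain.customer_key;

-- Hash joins only: same join order, different join method.
SELECT /*+ LEADING(bl1_gain_adj bl1_gain customers bl1_reply_code)
           USE_HASH(bl1_gain customers bl1_reply_code) */
       bl1_gain_adj.adj_seq_no,
       bl1_gain_adj.amount_currency,
       bl1_gain_adj.gain_seq_no,
       customers.loan_key,
       customers.customer_key
FROM   bl1_gain_adj, customers, bl1_gain, bl1_reply_code
WHERE  bl1_gain.loan_key = customers.loan_key
AND    bl1_gain.customer_key = customers.customer_key
AND    bl1_gain.receiver_customer = customers.customer_no
AND    bl1_gain.cycle_seq_no = customers.cycle_seq_no
AND    bl1_reply_code.gain_code = bl1_gain.gain_code
AND    bl1_reply_code.revenue_code = 'RC'
AND    bl1_gain_adj.gain_seq_no = bl1_gain.gain_seq_no
AND    bl1_gain_adj.customer_key = bl1_gain.customer_key;
```

Because the query uses no table aliases, the hints refer to the table names directly; if aliases were added, the hints would have to use the aliases instead.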

Oracle SQL sub query vs inner join

First, I looked at the SELECT statement in the Oracle docs.
I have some questions about Oracle's SELECT behaviour when my query contains SELECT, JOIN, and WHERE.
See below for details:
My sample tables:
[ P_IMAGE_ID ]
IMAGE_ID (PK)
FILE_NAME
FILE_TYPE
...
...
[ P_IMG_TAG ]
IMG_TAG_ID (PK)
IMAGE_ID (FK)
TAG
...
...
My requirement is: get the distinct images whose tag is "70702".
Method 1: Select -> Join -> Where -> Distinct
SELECT DISTINCT PID.IMAGE_ID
, PID.FILE_NAME
FROM P_IMAGE_ID PID
INNER JOIN P_IMG_TAG PTAG
ON PTAG.IMAGE_ID = PID.IMAGE_ID
WHERE PTAG.TAG = '70702';
I think the query behaviour should be:
join tables -> hit WHERE clause -> distinct select
I used Oracle SQL Developer to get the explain plan:
Method 1 cost: 76.
Method 2: Select -> Where -> Where -> Distinct
SELECT DISTINCT PID.IMAGE_ID
, PID.FILE_NAME
FROM P_IMAGE_ID PID
WHERE PID.IMAGE_ID IN
(
SELECT PTAG.IMAGE_ID
FROM P_IMG_TAG PTAG
WHERE PTAG.TAG = '70702'
);
I think the second query's behaviour should be:
hit WHERE clause -> hit WHERE clause -> distinct select
I used Oracle SQL Developer to get the explain plan for this one too:
Method 2 cost: 76 as well. Why?
I believed that applying the WHERE clause first, to reduce the work the database does and avoid joining the tables, would perform better than the join query. But now that I have tested it, I am confused: why are the costs of the two methods equal?
Or have I misunderstood something?
My questions:
Why are the costs of the two methods above equal?
If the subselect for Tag = '70702' returns thousands or millions of rows or more, is the join the better choice?
If the subselect for Tag = '70702' returns few rows, is the subselect better because it reduces the data the query has to process?
When I use Method 1 (Select -> Join -> Where -> Distinct), does the database join the tables before it hits the WHERE clause?
Someone told me that moving the condition Tag = '70702' into the join clause
(i.e. INNER JOIN P_IMG_TAG PTAG ON PTAG.IMAGE_ID = PID.IMAGE_ID AND PTAG.TAG = '70702') may perform better. Is that right?
I read the topics "subselect vs outer join" and "subquery or inner join", but both are about SQL Server, and I am not sure the same applies to Oracle.
The DBMS takes your query and executes something. But it doesn't execute steps that correspond to SQL statement parts in the order they appear in an SQL statement.
Read about "relational query optimization", which could just as well be called "relational query implementation". Eg for Oracle.
Any language processor takes declarations and calls as input and implements the described behaviour in terms of internal data structures and operations, maybe through one or more levels of "intermediate code" running on a "virtual machine", eventually down to physical machines. But even just staying in the input language, SQL queries can be rearranged into other SQL queries that return the same value but perform significantly better under simple and general implementation assumptions. Just as you know that your question's queries always return the same thing for a given database, the DBMS can know. Part of how it knows is that there are many rules for taking a relational algebra expression and generating a different but same-valued expression. Certain rewrite rules apply under certain limited circumstances. There are rules that take into consideration SQL-level relational things like primary keys, unique columns, foreign keys and other constraints. Other rules use implementation-oriented SQL-level things like indexes and statistics. This is the "relational query rewriting" part of relational query optimization.
Even when two different but equivalent queries generate different plans, the cost can be similar because the plans are so similar. Here, both a HASH and SORT index are UNIQUE. (It would be interesting to know what the few top plans were for each of your queries. It is quite likely that those few are the same for both, but that the plan that is more directly derived from the particular input expression is the one that is offered when there's little difference.)
The way to get the DBMS to find good query plans is to write the most natural expression of a query that you can find.
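To illustrate how many equivalent formulations there are, the two statements below express the same requirement in yet other ways. This is a sketch based on the sample tables in the question. Whether the optimizer arrives at the same plan for them is something to confirm with EXPLAIN PLAN rather than assume.

```sql
-- Semi-join form: IMAGE_ID is the primary key of P_IMAGE_ID, so each image
-- is returned at most once and DISTINCT is not needed.
SELECT PID.IMAGE_ID,
       PID.FILE_NAME
FROM   P_IMAGE_ID PID
WHERE  EXISTS (SELECT 1
               FROM   P_IMG_TAG PTAG
               WHERE  PTAG.IMAGE_ID = PID.IMAGE_ID
               AND    PTAG.TAG = '70702');

-- Filter moved into the ON clause: for an INNER JOIN this is logically
-- the same as keeping the condition in WHERE.
SELECT DISTINCT PID.IMAGE_ID,
       PID.FILE_NAME
FROM   P_IMAGE_ID PID
INNER JOIN P_IMG_TAG PTAG
       ON  PTAG.IMAGE_ID = PID.IMAGE_ID
       AND PTAG.TAG = '70702';
```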

BatchUpdate using an Oracle view

I have a complex Oracle view which takes around ~ 2 - 3 seconds to execute.
I'm trying to insert values from the Oracle view into a table.
I'm using JdbcTemplate BatchUpdate() to insert multiple values into the table.
In BatchUpdate(), PreparedStatement is used to set values.
Will using an Oracle view cause any performance issues?
With a PreparedStatement, SQL statements are precompiled. But in the case of a view, will the view be executed each time the insert query is fired?
Views are just SQL-statements. They are not slower or faster than the underlying SQL-query.
However, when using complex views (multi-table joins and aggregation) built on top of other complex views, the optimizer may get confused and try to outsmart itself, leading to really bad execution plans. The problems tend to be even worse if you don't have constraints and referential integrity in place.
A final note: if you are merely pulling data out of the database to stuff it back in, you would probably achieve better performance by doing the entire operation in the database instead. For example, say you pull "order lines" from the database and then update the "order header" with an "Order Total Qty". In this case you should probably do something like the following instead:
merge
 into order_header h
using (select order_id, sum(order_qty) as order_total_qty
         from order_line
        group by order_id
      ) l
   on (h.order_id = l.order_id)
 when matched then
      update
         set h.order_total_qty = l.order_total_qty;
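Applied to the original question, the same idea would be a single INSERT ... SELECT, so that the view is evaluated inside the database instead of its rows being fetched and re-inserted through JDBC batches. The names target_table, complex_view, col_a, and col_b are hypothetical placeholders.

```sql
-- Hypothetical names: target_table is the table being loaded,
-- complex_view is the slow view from the question.
INSERT INTO target_table (col_a, col_b)
SELECT v.col_a,
       v.col_b
FROM   complex_view v;
```

From Spring this can still be issued through JdbcTemplate as one statement, with any filter values passed as bind parameters.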
