WITH Clause performance issue in Oracle 11g

Table myfirst3 has 4 columns and 1.2 million records.
Table mtl_object_genealogy has over 10 million records.
Running the code below takes a very long time. How can I tune this query using the WITH clause?
WITH level1 as (
SELECT mln_parent.lot_number,
mln_parent.inventory_item_id,
gen.lot_num, --fg_lot
gen.segment1,
gen.rcv_date
FROM mtl_lot_numbers mln_parent,
(SELECT MOG1.parent_object_id,
p.segment1,
p.lot_num,
p.rcv_date
FROM mtl_object_genealogy MOG1 ,
myfirst3 p
START WITH MOG1.object_id = p.gen_object_id
AND (MOG1.end_date_active IS NULL OR MOG1.end_date_active > SYSDATE)
CONNECT BY nocycle PRIOR MOG1.parent_object_id = MOG1.object_id
AND (MOG1.end_date_active IS NULL OR MOG1.end_date_active > SYSDATE)
UNION all
SELECT p1.gen_object_id,
p1.segment1,
p1.lot_num,
p1.rcv_date
FROM myfirst3 p1 ) gen
WHERE mln_parent.gen_object_id = gen.parent_object_id )
select /*+ NO_CPU_COSTING */ *
from level1;
(execution plan omitted)
CREATE TABLE APPS.MYFIRST3
(
TO_ORGANIZATION_ID NUMBER,
LOT_NUM VARCHAR2(80 BYTE),
ITEM_ID NUMBER,
FROM_ORGANIZATION_ID NUMBER,
GEN_OBJECT_ID NUMBER,
SEGMENT1 VARCHAR2(40 BYTE),
RCV_DATE DATE
);
CREATE TABLE INV.MTL_OBJECT_GENEALOGY
(
OBJECT_ID NUMBER NOT NULL,
OBJECT_TYPE NUMBER NOT NULL,
PARENT_OBJECT_ID NUMBER NOT NULL,
START_DATE_ACTIVE DATE NOT NULL,
END_DATE_ACTIVE DATE,
GENEALOGY_ORIGIN NUMBER,
ORIGIN_TXN_ID NUMBER,
GENEALOGY_TYPE NUMBER
);
CREATE INDEX INV.MTL_OBJECT_GENEALOGY_N1 ON INV.MTL_OBJECT_GENEALOGY(OBJECT_ID);
CREATE INDEX INV.MTL_OBJECT_GENEALOGY_N2 ON INV.MTL_OBJECT_GENEALOGY(PARENT_OBJECT_ID);

Your explain plan shows some very big numbers. The optimizer reckons the final result set will be about 3,227,000,000,000 rows. Just returning that many rows will take some time.
All table accesses are Full Table Scans. As you have big tables, that will eat time too.
As for improvements, it's pretty hard for us to understand the logic of your query. This is your data model, your business rules, your data. You haven't explained anything, so all we can do is guess.
Why are you using the WITH clause? You only use the level1 result set once, so just use a regular FROM clause.
Why are you using UNION ALL? That operation just duplicates the records retrieved from myfirst3 (all those values are already included as rows where MOG1.object_id = p.gen_object_id).
The MERGE JOIN CARTESIAN operation is interesting. Oracle uses it to implement transitive closure. It is an expensive operation but that's because treewalking a hierarchy is an expensive thing to do. It is unfortunate for you that you are generating all the parent-child relationships for a table with 27 million records. That's bad.
The full table scans aren't the problem. There are no filters on myfirst3, so obviously the database has to get all the records. If there were one parent for each myfirst3 record, that would be 10% of the contents of mtl_object_genealogy, so a full table scan would be efficient; but you're rolling up the entire hierarchy, so it's like you're looking at a much greater chunk of the table.
Your indexes are irrelevant in the face of such numbers. What might help is a composite index on mtl_object_genealogy(OBJECT_ID, PARENT_OBJECT_ID, END_DATE_ACTIVE).
You want all the levels of PARENT_OBJECT_ID for the records in myfirst3. If you run this query often and mtl_object_genealogy is a slowly changing table you should consider materializing the transitive closure into a table which just has records for all the permutations of leaf records and parents.
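A hedged sketch of that materialization (the closure table and index names are made up; the refresh strategy depends on how often mtl_object_genealogy changes):
-- one row per (starting object, ancestor) pair, i.e. the transitive closure
CREATE TABLE mtl_gen_closure AS
SELECT CONNECT_BY_ROOT MOG.object_id AS leaf_object_id,
       MOG.parent_object_id
FROM mtl_object_genealogy MOG
CONNECT BY NOCYCLE PRIOR MOG.parent_object_id = MOG.object_id
  AND (MOG.end_date_active IS NULL OR MOG.end_date_active > SYSDATE);
CREATE INDEX mtl_gen_closure_n1 ON mtl_gen_closure (leaf_object_id);
Queries can then join myfirst3.gen_object_id straight to leaf_object_id with no tree-walk at run time.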
To sum up:
Ditch the WITH clause
Drop the UNION ALL
Tune the tree-walk with a composite index (or materialize it). A sketch combining the first two points follows.
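Put together, the rewrite might look something like this (a sketch only, since the intended business logic is guesswork on our side):
SELECT mln_parent.lot_number,
       mln_parent.inventory_item_id,
       gen.lot_num, --fg_lot
       gen.segment1,
       gen.rcv_date
FROM mtl_lot_numbers mln_parent,
     (SELECT MOG1.parent_object_id,
             p.segment1,
             p.lot_num,
             p.rcv_date
      FROM mtl_object_genealogy MOG1,
           myfirst3 p
      START WITH MOG1.object_id = p.gen_object_id
        AND (MOG1.end_date_active IS NULL OR MOG1.end_date_active > SYSDATE)
      CONNECT BY NOCYCLE PRIOR MOG1.parent_object_id = MOG1.object_id
        AND (MOG1.end_date_active IS NULL OR MOG1.end_date_active > SYSDATE)
     ) gen
WHERE mln_parent.gen_object_id = gen.parent_object_id;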

Related

Performance of date time concatenation into timestamp

Oracle 12C, non partitioned, no ASM.
This is the background. I have a table with multiple columns, 3 of them being -
TRAN_DATE DATE
TRAN_TIME TIMESTAMP(6)
FINAL_DATETIME TIMESTAMP(6) NOT NULL
The table has around 70 million records. What I want to do is concatenate the tran_date and the tran_time field and update the final_datetime field with that output, for all 70 million records.
This is the query I have -
update MYSCHEMA.MYTAB set FINAL_DATETIME = to_timestamp( to_char(tran_date, 'YYYY/MM/DD') || ' ' || to_char(TRAN_TIME, 'HH24:MI:SS.FF'), 'YYYY/MM/DD HH24:MI:SS.FF' )
Eg:
At present (for one record)
TRAN_DATE=01-DEC-16
TRAN_TIME=01-JAN-70 12.35.58.000000 AM /*I need only the time part from this*/
FINAL_DATETIME=22-OCT-18 04.37.18.870000 PM
After the update, FINAL_DATETIME needs to be
01-DEC-16 12.35.58.000000 AM
TO_TIMESTAMP requires two character-string conversions, and I fear this will slow the update down a lot. Any suggestions?
What more can I do to increase performance? No one else will be using the table at this point, so, I do have the option to
Drop indices
Turn off logging
and anything more anyone can suggest.
Any help is appreciated.
I would prefer the CTAS method; your job would be simpler if you didn't have indexes, triggers, and constraints on your table.
Create a new table for the column to be modified.
CREATE TABLE mytab_new
NOLOGGING
AS
SELECT /*+ FULL(mytab) PARALLEL(mytab, 10) */ --hint to speed up the select.
CAST(tran_date AS TIMESTAMP) + ( tran_time - trunc(tran_time) ) AS final_datetime
FROM mytab;
I have included only one (the final) column in the new table, because storing the other two is a waste of resources. You may include the other columns in the SELECT, apart from the two now-redundant ones.
Read up on logging/nologging to learn about the NOLOGGING option in the CTAS.
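Before running the full CTAS, the expression can be sanity-checked against the example row from the question (the literals below are hard-coded purely for illustration):
SELECT CAST(DATE '2016-12-01' AS TIMESTAMP)
       + ( TIMESTAMP '1970-01-01 00:35:58' - TRUNC(TIMESTAMP '1970-01-01 00:35:58') ) AS final_datetime
FROM dual;
-- expected: 01-DEC-16 12.35.58.000000 AM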
The next step is to rebuild the indexes, triggers, and constraints on the new table mytab_new, using the definitions from mytab, if any exist.
Then rename the tables:
rename mytab to mytab_bkp;
rename mytab_new to mytab;
You may drop the table mytab_bkp after validating the new table, or later when you feel you no longer need it.

Define index for sparse column

I have a table with columns 'A' and 'B'.
'A' is a column with 90% NULLs and 10% different values, and most of the time I query for records with one or two of those values.
'B' is a column where 90% of the values are '1' and 10% are different values, and most of the time I query for records with one or two of those values.
There are DML transactions on this table most of the time.
Now, I don't know whether defining an index on these columns is a good idea. If yes, which type of index?
In principle a bitmap index would be best in such a situation. However, in a multi-user environment with frequent DML they are not suitable - you would slow down your application significantly through locking, and perhaps even get deadlocks.
Maybe you can optimize your application with smart partitioning and Partial Indexes (a new feature in Oracle 12c).
The two CREATE TABLE statements below should be equivalent.
CREATE TABLE YOUR_TABLE (a INTEGER, b INTEGER, ... more COLUMNS)
PARTITION BY LIST (a) SUBPARTITION BY LIST (b) (
PARTITION part_a_NULL VALUES (NULL) (
SUBPARTITION part_a_NULL_b_1 VALUES (1) INDEXING OFF,
SUBPARTITION part_a_NULL_b_other VALUES (DEFAULT) INDEXING ON
),
PARTITION part_a_others VALUES (DEFAULT) (
SUBPARTITION part_a_others_b_1 VALUES (1) INDEXING OFF,
SUBPARTITION part_a_others_b_other VALUES (DEFAULT) INDEXING ON
)
);
CREATE TABLE YOUR_TABLE (a INTEGER, b INTEGER, ... more COLUMNS)
PARTITION BY LIST (a) SUBPARTITION BY LIST (b)
SUBPARTITION TEMPLATE (
SUBPARTITION b_1 VALUES (1) INDEXING OFF,
SUBPARTITION b_other VALUES (DEFAULT) INDEXING ON
)
(
PARTITION part_a_NULL VALUES (NULL),
PARTITION part_a_others VALUES (DEFAULT)
);
CREATE INDEX IND_A ON YOUR_TABLE (A) LOCAL INDEXING PARTIAL;
CREATE INDEX IND_B ON YOUR_TABLE (B) LOCAL INDEXING PARTIAL;
This way your indexes will consume only about 10% of the space they would otherwise need. If your WHERE condition is WHERE A IS NULL or WHERE B = 1, the Oracle optimizer would skip such indexes anyway.
Verify with this query that INDEXING is enabled on the desired subpartitions:
SELECT table_name, partition_name, subpartition_name, indexing
FROM USER_TAB_SUBPARTITIONS
WHERE table_name = 'YOUR_TABLE';
Update
I just realized this is actually overkill, because NULL values in column A do not create any index entry anyway. So it can be simplified to:
CREATE TABLE YOUR_TABLE (a INTEGER, b INTEGER, ... more COLUMNS)
PARTITION BY LIST (b) (
PARTITION part_b_1 VALUES (1) INDEXING OFF,
PARTITION part_b_other VALUES (DEFAULT) INDEXING ON
);
For example, if you have index a_b_idx on A, B (in that order):
a) select ... from ... where A = ... will use index
b) select ... from ... where B = ... will not use index
On the other side, if you have index b_a_idx on B, A:
a) select ... from ... where A = ... will not use index
b) select ... from ... where B = ... will use index
Oracle can't use the second column of an index if the query doesn't filter on the first column, since in regular cases an index is a tree-like structure: column1 -> column2 -> column3 -> etc.
You need an index on column A only, or on columns (A, B), if you run queries like a).
You need an index on column B only, or on columns (B, A), if you run queries like b).
Oracle doesn't store entries in which all indexed columns are NULL, but it can store a NULL for A if B contains a non-NULL value.
Sometimes it's more fruitful to read the whole table into memory and ignore the indexes. The optimizer can do that if the expected result set is big and the query touches all records, since jumping from index to record costs more than plain sequential reads.
Also, this sometimes happens erroneously for tables without statistics, so you either need jobs running ALTER TABLE ... COMPUTE STATISTICS, or Oracle 11+, which can gather such statistics without jobs.
Most of the time, another index is a good thing for queries but a bad thing for updates and disk usage. Each index takes disk space, and each update of a record updates every index. So for heavily updated tables it's not good to have many indexes, but for frequently queried tables it's better to have indexes covering all common cases.
For most flat queries (without joins/subqueries/hierarchies) only one index is used, so having an index on each column is generally just a waste of disk space. You need a multicolumn index to optimize WHERE A = ... AND B = ..., as sketched below.
As for index type, you probably need simple non-unique indexes.
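A minimal sketch of such a multicolumn index (table and column names are hypothetical):
-- helps: WHERE a = :1 AND b = :2, and WHERE a = :1 alone
-- does not help: WHERE b = :2 alone (the leading column is not filtered)
CREATE INDEX your_table_ab_idx ON your_table (a, b);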
Column A
Let's assume that you create an index named columnA_index. In general, indexes in an RDBMS do not include NULL values, which means there are no entries in columnA_index pointing to records with NULL values. Thus, the following query
Q1: select * from MyTable where A is null;
will result in a table scan instead (or the DBMS opts to use an index on another column, if any).
However, since 10% of the records have 'different values', columnA_index will of course help with queries such as:
Q2: select * from MyTable where A = '123';
In the above example, if the query returns < 1% of the records, columnA_index is helpful. Depending on how selective the query is, the index greatly improves performance. You can create an index of a type suitable for the datatype of column A.
Column B
Similarly, an index on B will not help
Q3: select * from MyTable where B = 1;
but it will help with the less frequent values:
Q4: select * from MyTable where B = '456';
NULL values
So far, I have said that indexes do not help with NULL values. Therefore, if you need to run query Q1 most of the time, I suggest the following ideas:
Make sure that your version of the DBMS supports including NULL values in indexes. For example, Oracle 11g does, but not the versions before that.
Plan to create a function-based index here, again with Oracle; you can at least take the idea (see the sketch after this list).
Redesign the logic of your application / your need to query on NULL values. I prefer this approach.
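A hedged sketch of the function-based-index workaround mentioned above (it assumes A is a character column; the index name and sentinel value are made up):
-- map NULL to a sentinel value so those rows get index entries
CREATE INDEX mytable_a_fbi ON MyTable (NVL(A, '~NULL~'));
-- query with the same expression so the index can be used
SELECT * FROM MyTable WHERE NVL(A, '~NULL~') = '~NULL~';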

Oracle performance Issue

I need help with query performance.
I have a table A joined to a view, and it takes 7 seconds to get the results. But when I run a select query on the view alone, I get the results in 1 second.
I have created indexes on table A, but there is no improvement in the query.
SELECT
ITEM_ID, BARCODE, CONTENT_TYPE_CODE, DEPARTMENT, DESCRIPTION, ITEM_NUMBER, FROM_DATE,
TO_DATE, CONTACT_NAME, FILE_LOCATION, FILE_LOCATION_UPPER, SOURCE_LOCATION,
DESTRUCTION_DATE, SOURCE, LABEL_NAME, ARTIST_NAME, TITLE, SELECTION_NUM, REP_IDENTIFIER,
CHECKED_OUT
FROM View B,
table A
where B.item_id=A.itemid
and status='VALID'
AND session_id IN ('naveen13122016095800')
ORDER BY item_id,barcode;
CREATE TABLE A
(
ITEMID NUMBER,
USER_NAME VARCHAR2(25 BYTE),
CREATE_DATE DATE,
SESSION_ID VARCHAR2(240 BYTE),
STATUS VARCHAR2(20 BYTE)
)
CREATE UNIQUE INDEX A_IDX1 ON A(ITEMID);
CREATE INDEX A_IDX2 ON A(SESSION_ID);
CREATE INDEX A_IDX3 ON A(STATUS);
So querying the view joined to a table is slower than querying the view alone? This is not surprising, is it?
Anyway, it doesn't make much sense to create separate indexes on the fields. The DBMS will pick one index (if any) to access the table. You can try a composite index:
CREATE UNIQUE INDEX A_IDX4 ON A(status, session_id, itemid);
But the DBMS will still only use this index when it sees an advantage over simply reading the full table. That means if the DBMS expects to have to read a big amount of records anyway, it won't access them indirectly via the index.
At last, two remarks concerning your query:
Don't use those outdated comma-separated joins. They are less readable and more prone to errors than explicit ANSI joins (FROM View B JOIN table A ON B.item_id = A.itemid).
Use qualifiers for all columns when working with more than one table or view in your query (AND A.status = 'VALID' ...). Both remarks are applied in the sketch below.
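A sketch of the query with both remarks applied (it keeps the placeholder names from the question and assumes status and session_id belong to table A):
SELECT B.item_id, B.barcode, B.content_type_code -- ...and the remaining view columns as in the original
FROM View B
JOIN A ON B.item_id = A.itemid
WHERE A.status = 'VALID'
AND A.session_id = 'naveen13122016095800'
ORDER BY B.item_id, B.barcode;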
UPDATE: I see now that you are not selecting any columns from the table, so why join it at all? It seems you are merely checking whether a record exists in the table, so use EXISTS or IN accordingly. (This may not make it faster, but it is a lot more readable at least.)
SELECT
ITEM_ID, BARCODE, CONTENT_TYPE_CODE, DEPARTMENT, DESCRIPTION, ITEM_NUMBER, FROM_DATE,
TO_DATE, CONTACT_NAME, FILE_LOCATION, FILE_LOCATION_UPPER, SOURCE_LOCATION,
DESTRUCTION_DATE, SOURCE, LABEL_NAME, ARTIST_NAME, TITLE, SELECTION_NUM, REP_IDENTIFIER,
CHECKED_OUT
FROM View
WHERE item_id IN
(
SELECT itemid
FROM A
WHERE status = 'VALID'
AND session_id IN ('naveen13122016095800')
)
ORDER BY item_id, barcode;

optimize query with minus oracle

I want to optimize a query with MINUS that takes too much time. Any help is appreciated.
I have two tables A and B,
Table A: ID, value
Table B: ID
I want all records of Table A that are not in Table B, showing the value.
For that I had something like:
Select ID, value
FROM A
WHERE value > 70
MINUS
Select ID
FROM B;
This query is just taking too long ... any tips on how to improve this simple query?
Thank you for your attention.
Are ID and value indexed?
The performance of MINUS versus NOT EXISTS depends on several factors:
It really depends on a bunch of factors.
A MINUS will do a full table scan on both tables unless there is some criteria in the where clause of both queries that allows an index range scan. A MINUS also requires that both queries have the same number of columns, and that each column has the same data type as the corresponding column in the other query (or one convertible to the same type). A MINUS will return all rows from the first query where there is not an exact match, column for column, with the second query. A MINUS also requires an implicit sort of both queries.
NOT EXISTS will read the sub-query once for each row in the outer query. If the correlation field (you are running a correlated sub-query?) is an indexed field, then only an index scan is done.
The choice of which construct to use depends on the type of data you want to return, and also on the relative sizes of the two tables/queries. If the outer table is small relative to the inner one, and the inner table is indexed (preferably with a unique index, but not required) on the correlation field, then NOT EXISTS will probably be faster, since the index lookup will be pretty fast and only executed relatively few times. If both tables are roughly the same size, then MINUS might be faster, particularly if you can live with only seeing the fields you are comparing on.
Minus operator versus 'not exists' for faster SQL query - Oracle Community Forums
You could use NOT EXISTS like so:
SELECT a.ID, a.value
FROM A
WHERE a.value > 70
AND NOT EXISTS (
    SELECT b.ID
    FROM B
    WHERE b.ID = a.ID
);
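An equivalent LEFT JOIN anti-join is also worth trying if the optimizer does not already transform the NOT EXISTS into an anti-join (a sketch using the same tables):
SELECT a.ID, a.value
FROM A
LEFT JOIN B ON b.ID = a.ID
WHERE a.value > 70
AND b.ID IS NULL; -- keep only rows with no match in B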
EDIT: I've produced some dummy data in two tables to demonstrate the performance gain from indexing. Note: I did this in MySQL since I don't have Oracle on my MacBook.
Table A has 2600 records with 2 columns: ID, val.
ID is an auto-increment integer.
val is a varchar(255).
Table B has one column, ID, with more records than Table A, auto-incremented in gaps of 3.
You can reproduce this if you wish: Pastebin - SQL Dummy Data
Here is the query I will be using:
select a.id, a.val from tablea a
where length(a.val) > 3
and not exists(
select b.id from tableb b where b.id = a.id
);
Without indexes, the runtime is 986 ms, returning 1685 rows.
Now we add the indexes:
ALTER TABLE `tablea` ADD INDEX `id` (`id`);
ALTER TABLE `tableb` ADD INDEX `id` (`id`);
With indexes, the runtime is 14 ms for the same 1685 rows. That's 1.42% of the time it took without indexes!
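For the Oracle tables in the question, the equivalent indexes would be along these lines (the index names are made up):
CREATE INDEX a_id_idx ON A (ID);
CREATE INDEX b_id_idx ON B (ID);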

T-SQL - wrong query execution plan behaviour

One of our queries degraded after generating load on the DB.
Our query is a join between 3 tables:
The Base table, which contains 10M rows.
The EventPerson table, which contains 5000 rows.
EventPerson788, which is empty.
It seems that the optimizer scans the index on EventPerson instead of seeking it. This is a script for replicating the issue:
--Create Tables
CREATE TABLE [dbo].[BASE](
[ID] [bigint] NOT NULL,
[IsActive] BIT,
PRIMARY KEY CLUSTERED ([ID] ASC)
)ON [PRIMARY]
GO
CREATE TABLE [dbo].[EventPerson](
[DUID] [bigint] NOT NULL,
[PersonInvolvedID] [bigint] NULL,
PRIMARY KEY CLUSTERED ([DUID] ASC)
) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [EventPerson_IDX] ON [dbo].[EventPerson]
(
[PersonInvolvedID] ASC
)
CREATE TABLE [dbo].[EventPerson788](
[EntryID] [bigint] NOT NULL,
[LinkedSuspectID] [bigint] NULL,
[sourceid] [bigint] NULL,
PRIMARY KEY CLUSTERED ([EntryID] ASC)
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[EventPerson788] WITH CHECK
ADD CONSTRAINT [FK7A34153D3720F84A]
FOREIGN KEY([sourceid]) REFERENCES [dbo].[EventPerson] ([DUID])
GO
ALTER TABLE [dbo].[EventPerson788] CHECK CONSTRAINT [FK7A34153D3720F84A]
GO
CREATE NONCLUSTERED INDEX [EventPerson788_IDX]
ON [dbo].[EventPerson788] ([LinkedSuspectID] ASC)
GO
--POPULATE BASE TABLE
DECLARE @I BIGINT=1
WHILE (@I<10000000)
BEGIN
begin transaction
INSERT INTO BASE(ID) VALUES(@I)
SET @I+=1
if (@I%10000=0 )
begin
commit;
end;
END
go
--POPULATE EventPerson TABLE
DECLARE @I BIGINT=1
WHILE (@I<5000)
BEGIN
BEGIN TRANSACTION
INSERT INTO EventPerson(DUID,PersonInvolvedID) VALUES(@I,(SELECT TOP 1 ID FROM BASE ORDER BY NEWID()))
SET @I+=1
IF(@I%10000=0 )
COMMIT TRANSACTION ;
END
GO
This is the query:
select
count(EventPerson.DUID)
from
EventPerson
inner loop join
Base on EventPerson.DUID = base.ID
left outer join
EventPerson788 on EventPerson.DUID = EventPerson788.sourceid
where
(EventPerson.PersonInvolvedID = 37909 or
EventPerson788.LinkedSuspectID = 37909)
AND BASE.IsActive = 1
Do you have any idea why the optimizer decides to use an index scan instead of an index seek?
Workarounds already tried:
Analyze tables and build statistics.
Rebuild Indices.
Try the FORCESEEK hint
None of the above persuaded the optimizer to run an index seek on EventPerson and a seek on the Base table.
Thanks for your help.
The scan is there because of the or condition and the outer join against EventPerson788.
It must return rows from EventPerson either when EventPerson.PersonInvolvedID = 37909 or when there exist rows in EventPerson788 with EventPerson788.LinkedSuspectID = 37909. The latter means that every row in EventPerson has to be checked against the join.
The fact that EventPerson788 is empty cannot be used by the query optimizer, since the query plan is saved and reused later, when there might be matching rows in EventPerson788.
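A hedged side note, not from the original answer: OPTION (RECOMPILE) prevents the plan from being cached, so the optimizer may take advantage of EventPerson788 being empty right now. Whether it actually produces a seek here is untested:
select count(EventPerson.DUID)
from EventPerson
inner loop join Base on EventPerson.DUID = base.ID
left outer join EventPerson788 on EventPerson.DUID = EventPerson788.sourceid
where (EventPerson.PersonInvolvedID = 37909
       or EventPerson788.LinkedSuspectID = 37909)
  and BASE.IsActive = 1
option (recompile) -- compile a fresh plan using current row counts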
Update:
You can rewrite your query using UNION ALL instead of OR to get a seek on EventPerson.
select count(EventPerson.DUID)
from
(
select EventPerson.DUID
from EventPerson
where EventPerson.PersonInvolvedID = 1556 and
not exists (select *
from EventPerson788
where EventPerson788.LinkedSuspectID = 1556)
union all
select EventPerson788.sourceid
from EventPerson788
where EventPerson788.LinkedSuspectID = 1556
) as EventPerson
inner join BASE
on EventPerson.DUID=base.ID
where
BASE.IsActive=1
Well, you're asking SQL Server to count the rows of the EventPerson table - so why do you expect a seek to be better than a scan here?
For a COUNT, the SQL Server optimizer will almost always use a scan - it needs to count the rows, after all - all of them. It will do a clustered index scan if no other non-nullable columns are indexed.
If you have an index on a small, non-nullable column (e.g. an ID INT or something like that), it would probably scan that index instead (less data to read to count all rows).
But in general: a seek is great for selecting one or a few rows - but it sucks if you're dealing with all rows (like for a count).
You can easily observe this behavior if you're using the AdventureWorks sample database.
When doing a COUNT(*) on the Sales.SalesOrderDetail table, which has over 120000 rows, like this:
SELECT COUNT(*) FROM Sales.SalesOrderDetail
then you'll get an index scan on IX_SalesOrderDetail_ProductID - it just doesn't pay off to do seeks on over 120000 entries!
However, if you do the same operation on a smaller set of data, like this:
SELECT COUNT(*) FROM Sales.SalesOrderDetail
WHERE ProductID = 897
then you get back 2 rows out of all of them - and SQL Server will now use an index seek on that same index.
