How to query tree table structure which has no subnodes - oracle

I have data organized as a tree in Oracle 11g.
CREATE TABLE FOLDER
(
NAME VARCHAR2(20),
ID NUMBER(10, 0),
PARENT_ID NUMBER(10, 0)
}
(I think this structure is obvious and I should explain it).
Trying to query only data that has no subfolders. Using this query (not sure its correct):
SELECT * FROM folder f WHERE NOT EXISTS (
SELECT *
FROM folder cf
WHERE f.id= cf.parent)
Is there possibility to use more optimal query for this purpose, for example using COUNT ?

You could use a hierarchical query:
select name, id, parent_id from (
select name, id, parent_id, connect_by_isleaf as isleaf
from folder
start with parent_id = parent_id -- from comments
connect by nocycle parent_id = prior id
)
where isleaf = 1;
The inner query gets the hierarchy plus a flag set using the connect_by_isleaf pseudocolumn:
The CONNECT_BY_ISLEAF pseudocolumn returns 1 if the current row is a leaf of the tree defined by the CONNECT BY condition. Otherwise it returns 0. This information indicates whether a given row can be further expanded to show more of the hierarchy.
You need the nocycle parameter because the fact you have the parent ID set to the same as the ID in the top-level row of the hierarchy means the prior condition is met by that row itself, causing a loop. The combination of the starting condition and the nocycle will let this work.
Essentially that gives you the whole of the original table plus an extra column with a 1 or a 0. You can run that inner query on its own to see what it produces.
You're only looking for the lead nodes, this which have no sub-folders, so the outer query then filters on the generated isleaf column, so you only see those which are leaves.

Related

Sorting a table with a column with SQL Server 2012

I have a table like this :
How can I sort it out in the form below :
Each set of records has been marked by a FlagID at end
Each set of records has been marked by a FlagID at end
I assume this means that each record is implicitly associated with the first non-null FlagID value that occurs at or after its position along the ID primary key. Thus, we can use a correlated subquery to project this implicit FlagID value for each record, sort by it, then sort by your Row column as tiebreaker for each set.
SELECT *
FROM YourTable T1
ORDER BY
(
SELECT TOP 1 T2.FlagID
FROM YourTable T2
WHERE T2.ID <= T1.ID
AND T2.FlagID IS NOT NULL
ORDER BY T2.ID DESC
),
T1.Row
However, if you're able to alter the database content, I would recommend you to explicitly populate all the FlagID fields, as this would make your life easier. If you do so, then the query becomes:
SELECT *
FROM YourTable T1
ORDER BY T1.FlagID,
T1.Row

optimize query with minus oracle

Wanted to optimize a query with the minus that it takes too much time ... if they can give thanked help.
I have two tables A and B,
Table A: ID, value
Table B: ID
I want all of Table A records that are not in Table B. Showing the value.
For it was something like:
Select ID, value
FROM A
WHERE value> 70
MINUS
Select ID
FROM B;
Only this query is taking too long ... any tips how best this simple query?
Thank you for attention
Are ID and Value indexed?
The performance of Minus and Not Exists depend:
It really depends on a bunch of factors.
A MINUS will do a full table scan on both tables unless there is some
criteria in the where clause of both queries that allows an index
range scan. A MINUS also requires that both queries have the same
number of columns, and that each column has the same data type as the
corresponding column in the other query (or one convertible to the
same type). A MINUS will return all rows from the first query where
there is not an exact match column for column with the second query. A
MINUS also requires an implicit sort of both queries
NOT EXISTS will read the sub-query once for each row in the outer
query. If the correlation field (you are running a correlated
sub-query?) is an indexed field, then only an index scan is done.
The choice of which construct to use depends on the type of data you
want to return, and also the relative sizes of the two tables/queries.
If the outer table is small relative to the inner one, and the inner
table is indexed (preferrable a unique index but not required) on the
correlation field, then NOT EXISTS will probably be faster since the
index lookup will be pretty fast, and only executed a relatively few
times. If both tables a roughly the same size, then MINUS might be
faster, particularly if you can live with only seeing fields that you
are comparing on.
Minus operator versus 'not exists' for faster SQL query - Oracle Community Forums
You could use NOT EXISTS like so:
SELECT a.ID, a.Value
From a
where a.value > 70
and not exists(
Select b.ID
From B
Where b.ID = a.ID)
EDIT: I've produced some dummy data and two datasets for testing to prove the performance increases of indexing. Note: I did this in MySQL since I don't have Oracle on my Macbook.
Table A has 2600 records with 2 columns: ID, val.
ID is an autoincrement integer
Val varchar(255)
Table b has one column, but more records than Table A. Autoincrement (in gaps of 3)
You can reproduce this if you wish: Pastebin - SQL Dummy Data
Here is the query I will be using:
select a.id, a.val from tablea a
where length(a.val) > 3
and not exists(
select b.id from tableb b where b.id = a.id
);
Without Indexes, the runtime is 986ms with 1685 rows.
Now we add the indexes:
ALTER TABLE `tablea` ADD INDEX `id` (`id`);
ALTER TABLE `tableb` ADD INDEX `id` (`id`);
With Indexes, the runtime is 14ms with 1685 rows. That's 1.42% the time it took without indexes!

T-SQL - wrong query execution plan behaviour

One of our queries degraded after generating load on the DB.
Our query is a join between 3 tables:
Base table which contain 10 M rows.
EventPerson table which contain 5000 rows.
EventPerson788 which is empty.
It seems that the optimizer scans the index on the EventPerson instead of seek, this the script for replicating the issue:
--Create Tables
CREATE TABLE [dbo].[BASE](
[ID] [bigint] NOT NULL,
[IsActive] BIT
PRIMARY KEY CLUSTERED ([ID] ASC)
)ON [PRIMARY]
GO
CREATE TABLE [dbo].[EventPerson](
[DUID] [bigint] NOT NULL,
[PersonInvolvedID] [bigint] NULL,
PRIMARY KEY CLUSTERED ([DUID] ASC)
) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [EventPerson_IDX] ON [dbo].[EventPerson]
(
[PersonInvolvedID] ASC
)
CREATE TABLE [dbo].[EventPerson788](
[EntryID] [bigint] NOT NULL,
[LinkedSuspectID] [bigint] NULL,
[sourceid] [bigint] NULL,
PRIMARY KEY CLUSTERED ([EntryID] ASC)
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[EventPerson788] WITH CHECK
ADD CONSTRAINT [FK7A34153D3720F84A]
FOREIGN KEY([sourceid]) REFERENCES [dbo].[EventPerson] ([DUID])
GO
ALTER TABLE [dbo].[EventPerson788] CHECK CONSTRAINT [FK7A34153D3720F84A]
GO
CREATE NONCLUSTERED INDEX [EventPerson788_IDX]
ON [dbo].[EventPerson788] ([LinkedSuspectID] ASC)
GO
--POPOLATE BASE TABLE
DECLARE #I BIGINT=1
WHILE (#I<10000000)
BEGIN
begin transaction
INSERT INTO BASE(ID) VALUES(#I)
SET #I+=1
if (#I%10000=0 )
begin
commit;
end;
END
go
--POPOLATE EventPerson TABLE
DECLARE #I BIGINT=1
WHILE (#I<5000)
BEGIN
BEGIN TRANSACTION
INSERT INTO EventPerson(DUID,PersonInvolvedID) VALUES(#I,(SELECT TOP 1 ID FROM BASE ORDER BY NEWID()))
SET #I+=1
IF(#I%10000=0 )
COMMIT TRANSACTION ;
END
GO
This the query :
select
count(EventPerson.DUID)
from
EventPerson
inner loop join
Base on EventPerson.DUID = base.ID
left outer join
EventPerson788 on EventPerson.DUID = EventPerson788.sourceid
where
(EventPerson.PersonInvolvedID = 37909 or
EventPerson788.LinkedSuspectID = 37909)
AND BASE.IsActive = 1
Do you have any idea why the optimizer decides to use index scan instead of index seek?
Workaround that already done :
Analyze tables and build statistics.
Rebuild Indices.
Try the FORCESEEK hint
None of the above persuaded the optimizer to run an index seek on EventPerson and seek on the base tables.
Thanks for your help .
The scan is there because of the or condition and the outer join against EventPerson788.
Either it will return rows from EventPerson when EventPerson.PersonInvolvedID = 37909 or when the there exists rows in EventPerson788 where EventPerson788.LinkedSuspectID = 37909. The last part means that every row in EventPerson has to be checked against the join.
The fact that EventPerson788 is empty can not be used by the query optimizer since the query plan is saved to be reused later when there might be matching rows in EventPerson788.
Update:
You can rewrite your query using a union all instead of or to get a seek in EventPerson.
select count(EventPerson.DUID)
from
(
select EventPerson.DUID
from EventPerson
where EventPerson.PersonInvolvedID = 1556 and
not exists (select *
from EventPerson788
where EventPerson788.LinkedSuspectID = 1556)
union all
select EventPerson788.sourceid
from EventPerson788
where EventPerson788.LinkedSuspectID = 1556
) as EventPerson
inner join BASE
on EventPerson.DUID=base.ID
where
BASE.IsActive=1
Well, you're asking SQL Server to count the rows of the EventPerson table - so why do you expect a seek to be better than a scan here?
For a COUNT, the SQL Server optimizer will almost always use a scan - it needs to count the rows, after all - all of them... it will do a clustered index scan, if no other non-nullable columns are indexed.
If you have an index on a small, non-nullable column (e.g. on a ID INT or something like that), it would probably do a scan on that index instead (less data to read to count all rows).
But in general: seek is great for selecting one or a few rows - but it sucks if you're dealing with all rows (like for a count)
You can easily observe this behavior if you're using the AdventureWorks sample database.
When doing a COUNT(*) on the Sales.SalesOrderDetail table which has over 120000 rows like this:
SELECT COUNT(*) FROM Sales.SalesOrderDetail
then you'll get an index scan on IX_SalesOrderDetail_ProductID - it just doesn't pay off to do seeks on over 120000 entries!
However, if you do the same operation on a smaller set of data, like this:
SELECT COUNT(*) FROM Sales.SalesOrderDetail
WHERE ProductID = 897
then you get back 2 rows out of all of them - and SQL Server will now use an index seek on that same index.

oracle find difference between 2 tables

I have 2 tables that are the same structure. One is a temp one and the other is a prod one. The entire data set gets loaded each time and sometimes this dataset will have deleted records from the prior datasets. I load the dataset into temp table first and if any records were deleted I want to deleted them from the prod table also.
So how can I find the records that exist in prod but not in temp? I tried outer join but it doesn't seem to be working. It's returning all the records from the table in the left or right depending on doing left or right outer join.
I then also want to delete those records in the prod table.
One way would be to use the MINUS operator
SELECT * FROM table1
MINUS
SELECT * FROM table2
will show all the rows in table1 that do not have an exact match in table2 (you can obviously specify a smaller column list if you are only interested in determining whether a particular key exists in both tables).
Another would be to use a NOT EXISTS
SELECT *
FROM table1 t1
WHERE NOT EXISTS( SELECT 1
FROM table2 t2
WHERE t1.some_key = t2.some_key )
How about something like:
SELECT * FROM ProdTable WHERE ID NOT IN
(select ID from TempTable);
It'd work the same as a DELETE statement as well:
DELETE FROM ProdTable WHERE ID NOT IN
(select ID from TempTable);
MINUS can work here
The following statement combines results with the MINUS operator, which returns only rows returned by the first query but not by the second:
SELECT * FROM prod
MINUS
SELECT * FROM temp;
Minus will only work if the table structure is same

Procedure to alter and update table on hierarchical relationship to see if there are any children

I have a hierarchical table on Oracle pl/sql. something like:
create table hierarchical (
id integer primary key,
parent_id references hierarchical ,
name varchar(100));
I need to create a procedure to alter that table so I get a new field that tells, for each node, if it has any children or not.
Is it possible to do the alter and the update in one single procedure?
Any code samples would be much appreciated.
Thanks
You can not do the ALTER TABLE (DDL) and the UPDATE (DML) in a single step.
You will have to do the ALTER TABLE, followed by the UPDATE.
BEGIN
EXECUTE IMMEDIATE 'ALTER TABLE hierarchical ADD child_count INTEGER';
--
EXECUTE IMMEDIATE '
UPDATE hierarchical h
SET child_count = ( SELECT COUNT(*)
FROM hierarchical h2
WHERE h2.parent_id = h.id )';
END;
Think twice before doing this though. You can easily find out now if an id has any childs with a query.
This one would give you the child-count of all top-nodes for example:
SELECT h.id, h.name, COUNT(childs.id) child_count
FROM hierarchical h
LEFT JOIN hierarchical childs ON ( childs.parent_id = h.id )
WHERE h.parent_id IS NULL
GROUP BY h.id, h.name
Adding an extra column with redundant data will make changing your data more difficult, as you will always have to update the parent too, when adding/removing childs.
If you just need to know whether children exist, the following query can do it without the loop or the denormalized column.
select h.*, connect_by_isleaf as No_children_exist
from hierarchical h
start with parent_id is null
connect by prior id = parent_id;
CONNECT_BY_LEAF returns 0 if the row has children, 1 if it does not.
I think you could probably get the exact number of children through a clever use of analytic functions and the LEVEL pseudo-column, but I'm not sure.

Resources