Oracle, merge tables, insert missing values while increasing id

I have two tables: one with values (FACT table) whose single column I'm trying to merge into a key/value table (FACT_ATTRIBUTE table). The key is a number and the value is a varchar2. The FACT_ATTRIBUTE table already has some entries in it, and I am trying to write a package procedure that merges in new ones. We are trying to do away with triggers and sequences. I have a bit of SQL that selects entries not already in the table, but I am struggling with how to create the key value: the key should just be one greater than the current max.
For example, say the FACT_ATTRIBUTE table already has entries up to key 5, and a new entry "value 6" comes in. The insert into FACT_ATTRIBUTE would be (6, 'value 6').
Here is where I am at:
INSERT INTO FACT_ATTRIBUTE (
    KEY, VALUE
)
VALUES (
    (SELECT (MAX(KEY) + 1) FROM FACT_ATTRIBUTE),
    (SELECT DISTINCT ENTRY
     FROM FACT fact_table
     WHERE NOT EXISTS (
         SELECT 1 FROM FACT_ATTRIBUTE fa
         WHERE fa.VALUE = fact_table.ENTRY
     ))
);
The problem is obvious: the inner select returns multiple rows, while the insert takes a single value. One constraint is that it cannot crash on null (if there are no new values). Is there a good way to accomplish this? My SQL skills are not the sharpest. Thanks in advance for any help!

One approach that might solve your problem is converting your insert statement from VALUES to SELECT. Try this -
INSERT INTO FACT_ATTRIBUTE (
    KEY, VALUE
)
SELECT (SELECT NVL(MAX(key), 0) FROM fact_attribute) + ROWNUM AS new_key,
       q.entry
FROM (
    SELECT DISTINCT f.entry
    FROM fact f
    LEFT OUTER JOIN fact_attribute fa
        ON f.entry = fa.value
    WHERE fa.value IS NULL
) q;
It might need a little syntax polish because I haven't had a server to check this on, but the gist seems right to me: use a left outer join to find all of the entries not already in FACT_ATTRIBUTE, and give each one a key based on the current maximum key in the table plus an offset.

This is the final solution that got me what I needed:
DECLARE
    max_id NUMBER;
BEGIN
    -- If the table is empty, start the keys at 1 (0 + ROWNUM)
    SELECT NVL(MAX(fa.KEY), 0)
    INTO max_id
    FROM FACT_ATTRIBUTE fa;

    INSERT INTO FACT_ATTRIBUTE (KEY, VALUE)
    SELECT max_id + ROWNUM, DATA
    FROM (
        SELECT DISTINCT f.DATA
        FROM FACT f
        WHERE NOT EXISTS (
            SELECT 1
            FROM FACT_ATTRIBUTE fa
            WHERE fa.VALUE = f.DATA
        )
        AND f.DATA IS NOT NULL
    );
END;
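Since the goal was a package procedure, here is a minimal sketch of that wrapping (the package and procedure names are made up, not from the original post). Note that MAX(KEY) + ROWNUM is only safe if two sessions cannot run the merge at the same time, so the sketch serializes callers with a table lock:

CREATE OR REPLACE PACKAGE fact_pkg AS
    PROCEDURE merge_attributes;
END fact_pkg;
/
CREATE OR REPLACE PACKAGE BODY fact_pkg AS
    PROCEDURE merge_attributes IS
        max_id NUMBER;
    BEGIN
        -- Serialize concurrent callers so two sessions cannot both
        -- read the same MAX(KEY); the lock is released on commit.
        LOCK TABLE fact_attribute IN EXCLUSIVE MODE;

        SELECT NVL(MAX(key), 0) INTO max_id FROM fact_attribute;

        INSERT INTO fact_attribute (key, value)
        SELECT max_id + ROWNUM, data
        FROM (
            SELECT DISTINCT f.data
            FROM fact f
            WHERE NOT EXISTS (
                SELECT 1 FROM fact_attribute fa
                WHERE fa.value = f.data
            )
            AND f.data IS NOT NULL
        );
    END merge_attributes;
END fact_pkg;
/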

Related

Oracle access varray elements in SQL

I'm playing around with array support in Oracle and hit a roadblock regarding array access within a SQL query. I'm using the following schema:
create type smallintarray as varray(10) of number(3,0);

create table tbl (
    id number(19,0) not null,
    the_array smallintarray,
    primary key (id)
);
What I would like to do is get the id and the first element, i.e. the element at index 1, of the array. In PostgreSQL I could write select id, the_array[1] from tbl t, but I don't see how I could do that with Oracle. I read that array access by index is only possible in PL/SQL, which would be fine if I could return a "decorated cursor" to achieve the same result through JDBC, but I don't know if that's possible.
DECLARE
    c1 SYS_REFCURSOR;
    varr smallintarray;
BEGIN
    OPEN c1 FOR SELECT t.id, t.THE_ARRAY FROM tbl t;
    -- SELECT t.THE_ARRAY INTO varr FROM tbl t;
    -- return a "decorated cursor" with varr(1) at select item position 1
    dbms_sql.return_result(c1);
END;
You can do this in plain SQL; it's not pretty, but it does work. You would prefer that Oracle had syntax to hide this from the programmer (and perhaps it does, at least in the most recent versions; I am still stuck at 12.2).
select t.id, q.array_element
from tbl t
cross apply (
    select column_value as array_element,
           rownum as ord
    from table(t.the_array)
) q
where q.ord = 1;
EDIT: If the order in which the table operator generates the elements is a concern, you could do something like this (in Oracle 12.1 and higher; in older versions the function can't be part of the query itself, but it can be defined on its own):
with
    function select_element(arr smallintarray, i integer)
    return number
    as
    begin
        return arr(i);
    end;
select id, select_element(the_array, 1) as the_array_1
from tbl
/
First of all, please don't do that in production. Use child tables instead of storing arrays within a table.
That said, the answer to your question is to use the column as a table source:
SELECT t.id, ta.*
from tbl t,
     table(t.THE_ARRAY) ta
order by column_value
-- offset 1 row -- in case you ever need to skip a row
fetch first 1 row only;
UPD: as for ordering the array, I can only say that playing with the "asc/desc" parameters gave me the results I expected - the output was ordered ascending or descending.
UPD2: found a good link describing the performance issues that might come up with this approach.

Compare differences before insert into oracle table

Could you please tell me how to compare differences between a table and my select query, and insert those results into a separate table? My plan is to create a base table (named RESULT) using a select statement and populate it with the current result set. Then, the next day, I would like to create a procedure that compares the same select with the RESULT table and inserts the differences into another table called DIFFERENCES.
Any ideas?
Thanks!
You can create the RESULT_TABLE using CTAS as follows:
CREATE TABLE RESULT_TABLE
AS SELECT ... -- YOUR QUERY
Then you can use the following procedure which calculates the difference between your query and data from RESULT_TABLE:
CREATE OR REPLACE PROCEDURE FIND_DIFF
AS
BEGIN
    INSERT INTO DIFFERENCES
    -- data present in the query but not in RESULT_TABLE
    (SELECT ... -- YOUR QUERY
     MINUS
     SELECT * FROM RESULT_TABLE)
    UNION
    -- data present in RESULT_TABLE but not in the query
    (SELECT * FROM RESULT_TABLE
     MINUS
     SELECT ...); -- YOUR QUERY
END;
/
I have used UNION between the two MINUS queries, taking the difference in both orders, so that deleted data is also inserted into the DIFFERENCES table. If that is not a requirement, remove the query before or after the UNION according to your needs.
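To see why both MINUS directions are needed, here is a tiny runnable illustration (the sample rows and WITH names are made up): the first MINUS returns the newly added row, the second the deleted row, and the UNION combines them.

with query_rows as (
    select 1 as id from dual union all
    select 2 as id from dual
),
result_rows as (
    select 2 as id from dual union all
    select 3 as id from dual
)
select id
from (
    (select id from query_rows
     minus
     select id from result_rows)   -- returns 1: the newly added row
    union
    (select id from result_rows
     minus
     select id from query_rows)    -- returns 3: the deleted row
);
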
-- Create a table with results from the query, and ID as primary key
create table result_t as
select id, col_1, col_2, col_3
from <some-query>;
-- Create a table with new rows, deleted rows or updated rows
create table differences_t as
select id
-- Old values
,b.col_1 as old_col_1
,b.col_2 as old_col_2
,b.col_3 as old_col_3
-- New values
,a.col_1 as new_col_1
,a.col_2 as new_col_2
,a.col_3 as new_col_3
-- Execute the query once again
from <some-query> a
-- Outer join to also detect new/deleted rows
full join result_t b using(id)
-- Null aware comparison
where decode(a.col_1, b.col_1, 1, 0) = 0
or decode(a.col_2, b.col_2, 1, 0) = 0
or decode(a.col_3, b.col_3, 1, 0) = 0;
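The DECODE calls are what make the comparison null-aware: unlike the = operator, DECODE treats two NULLs as equal, so a column changing from NULL to a value (or back) is still flagged as a difference. A quick check:

-- DECODE considers NULL equal to NULL, so this returns 'same':
select decode(null, null, 'same', 'different') as null_compare
from dual;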

Cursor for loop using a selection instead of a table (Oracle)

I'm writing a procedure to fill a child table from a parent table. The child table, however, has more fields than the parent table (as it should be). I've conjured up a cursor which points to a selection, which is essentially a join of multiple tables.
Here's the code I have so far:
CREATE OR REPLACE PROCEDURE Pop_occ_lezione
AS
    x Lezione%rowtype;
    CURSOR cc IS
        WITH y as (
            SELECT Codice_corso,
                   nome_modulo,
                   Data_inizio_ed_modulo diem,
                   Giorno_lezione,
                   ora_inizio_lezione o_i,
                   ora_fine_lezione o_f,
                   anno,
                   id_cdl,
                   nome_sede,
                   locazione_modulo loc
            FROM lezione
            join ( select id_cdl, anno, codice_corso from corso ) using (codice_corso)
            join ( select codice_corso, locazione_modulo from modulo ) using (codice_corso)
            join ( select nome_sede, id_cdl from cdl ) using (id_cdl)
            WHERE
                case
                    when extract (month from Data_inizio_ed_modulo) < 9 then extract (year from Data_inizio_ed_modulo) - 1
                    else extract (year from Data_inizio_ed_modulo)
                end = extract (year from sysdate+365)
        )
        SELECT *
        FROM y
        WHERE sem_check(y.diem, sysdate+365) = 1;
BEGIN
    OPEN cc;
    LOOP
        FETCH cc INTO x;
        EXIT WHEN cc%NOTFOUND;
        INSERT INTO Occr_lezione
        VALUES (
            x.Codice_corso,
            x.Nome_modulo,
            x.diem,
            x.giorno_lezione,
            x.Ora_inizio_lezione,
            to_date(to_char(next_day(sysdate, x.Giorno_lezione),'DD-MM-YYYY') || to_char(x.Ora_inizio_lezione,' hh24:mi'),'dd-mm-yyyy hh24:mi'),
            to_date(to_char(next_day(sysdate, x.Giorno_lezione),'DD-MM-YYYY') || to_char(x.Ora_fine_lezione,' hh24:mi'),'dd-mm-yyyy hh24:mi'),
            x.nome_sede,
            0,
            x.loc
        );
    END LOOP;
    CLOSE cc;
END;
/
But of course it won't work, because the variable x has the type of my initial table row, which has fewer columns than my selection. Unfortunately, as far as I know, a rowtype variable is needed to cycle through a cursor in order to fetch data from it. Can you see the contradiction? How can I change the code? Is there a type of variable which can be crafted to reflect a row from my query result? Or maybe a way to cycle through the data in the cursor without using a support variable? Or maybe something entirely different? Please let me know.
Ok, so as suggested I tried something like this:
INSERT INTO Occr_lezione (
    Codice_corso,
    Nome_modulo,
    Data_inizio_ed_modulo,
    Giorno_lezione,
    Ora_inizio_lezione,
    Ora_fine_lezione,
    Anno,
    Id_cdl,
    Nome_sede,
    Locazione_modulo
)
WITH y as (
    SELECT Codice_corso,
           Nome_modulo,
           Data_inizio_ed_modulo,
           Giorno_lezione,
           Ora_inizio_lezione,
           Ora_fine_lezione,
           Anno,
           Id_cdl,
           Nome_sede,
           Locazione_modulo
    FROM Lezione
    join ( select Id_cdl, Anno, Codice_corso from Corso ) using (codice_corso)
    join ( select Codice_corso, Locazione_modulo from Modulo ) using (Codice_corso)
    join ( select Nome_sede, Id_cdl from Cdl ) using (id_cdl)
    WHERE
        case
            when extract (month from Data_inizio_ed_modulo) < 9 then extract (year from Data_inizio_ed_modulo) - 1
            else extract (year from Data_inizio_ed_modulo)
        end = extract (year from sysdate+365)
)
SELECT *
FROM y
WHERE sem_check(y.Data_inizio_ed_modulo, sysdate+365) = 1;
END;
/
But it says PL/SQL: ORA-00904: "LOCAZIONE_MODULO": invalid identifier
which isn't true, because the query returns a table in which such a column is present... am I missing something?
The code compiles with no errors; the error occurs when I try to fire the procedure.
As you can see in the table Occr_lezione:
CREATE TABLE Occr_lezione (
Codice_corso varchar2(20) NOT NULL,
Nome_modulo varchar2(50) NOT NULL,
Data_inizio_ed_modulo date NOT NULL,
Giorno_lezione number(1) NOT NULL,
Ora_inizio_lezione date NOT NULL,
Data_inizio_occr_lezione date,
Data_fine_occr_lezione date NOT NULL,
Nome_sede varchar2(30) NOT NULL,
Num_aula varchar2(3) NOT NULL,
Tipo_aula varchar2(20) NOT NULL,
--
CONSTRAINT fk_Occr_lezione_lezione FOREIGN KEY (Codice_corso,Nome_modulo,Data_inizio_ed_modulo,Giorno_lezione,Ora_inizio_lezione) REFERENCES Lezione(Codice_corso,Nome_modulo,Data_inizio_ed_modulo,Giorno_lezione,Ora_inizio_lezione) ON DELETE CASCADE,
CONSTRAINT fk_Occr_lezione_aula FOREIGN KEY (Nome_sede,Num_aula,Tipo_aula) REFERENCES Aula(Nome_sede,Num_aula,Tipo_aula) ON DELETE SET NULL,
CONSTRAINT pk_Occr_lezione PRIMARY KEY (Codice_corso,Nome_modulo,Data_inizio_ed_modulo,Giorno_lezione,Ora_inizio_lezione,Data_inizio_occr_lezione),
CHECK ( trunc(Data_inizio_occr_lezione) = trunc(Data_fine_occr_lezione) ), -- start date = end date // daily booking
CHECK ( Data_inizio_occr_lezione < Data_fine_occr_lezione ) -- start time < end time // temporal consistency
);
there is no column named Locazione_modulo; however, the last column Tipo_aula has the same type and size as Locazione_modulo:
CREATE TABLE Modulo (
Codice_corso varchar2(20) NOT NULL,
Nome_modulo varchar2(50),
Locazione_modulo varchar2(20) NOT NULL,
--
CONSTRAINT fk_Modulo_Corso FOREIGN KEY(Codice_corso) REFERENCES Corso(Codice_corso) ON DELETE CASCADE,
CONSTRAINT pk_Modulo PRIMARY KEY(Codice_corso,Nome_modulo),
CHECK (Locazione_modulo IN ('Aula','Laboratorio','Conferenze'))
);
So it should be irrelevant, right?
If you really want to use explicit cursors, you can declare x to be of type cc%rowtype:
CREATE OR REPLACE PROCEDURE Pop_occ_lezione
AS
CURSOR cc IS ...
x cc%rowtype;
...
Unless you are using explicit cursors because you want to be able to explicitly fetch the data into local collections that you can leverage later in your procedure, code using implicit cursors tends to be preferable. That eliminates the need to OPEN, FETCH, and CLOSE the cursor or to write an EXIT condition, and it implicitly does a bulk fetch to minimize context shifts.
BEGIN
    FOR x IN cc
    LOOP
        INSERT INTO Occr_lezione ...
    END LOOP;
END;
Of course, in either case, I would hope that you'd choose more meaningful names for your local variables; x and cc don't tell you anything about what the variables are doing.
If all you are doing is taking data from one set of tables and inserting it into another table, it would be more efficient to write a single INSERT statement rather than coding a PL/SQL loop.
INSERT INTO Occr_lezione( <<column list>> )
SELECT <<column list>>
FROM <<tables you are joining together in the cursor definition>>
WHERE <<conditions from your cursor definition>>

T-SQL - wrong query execution plan behaviour

One of our queries degraded after generating load on the DB.
Our query is a join between 3 tables:
BASE, which contains 10M rows.
EventPerson, which contains 5000 rows.
EventPerson788, which is empty.
It seems that the optimizer scans the index on EventPerson instead of doing a seek. This is the script for replicating the issue:
--Create Tables
CREATE TABLE [dbo].[BASE](
[ID] [bigint] NOT NULL,
[IsActive] BIT,
PRIMARY KEY CLUSTERED ([ID] ASC)
) ON [PRIMARY]
GO
CREATE TABLE [dbo].[EventPerson](
[DUID] [bigint] NOT NULL,
[PersonInvolvedID] [bigint] NULL,
PRIMARY KEY CLUSTERED ([DUID] ASC)
) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [EventPerson_IDX] ON [dbo].[EventPerson]
(
[PersonInvolvedID] ASC
)
CREATE TABLE [dbo].[EventPerson788](
[EntryID] [bigint] NOT NULL,
[LinkedSuspectID] [bigint] NULL,
[sourceid] [bigint] NULL,
PRIMARY KEY CLUSTERED ([EntryID] ASC)
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[EventPerson788] WITH CHECK
ADD CONSTRAINT [FK7A34153D3720F84A]
FOREIGN KEY([sourceid]) REFERENCES [dbo].[EventPerson] ([DUID])
GO
ALTER TABLE [dbo].[EventPerson788] CHECK CONSTRAINT [FK7A34153D3720F84A]
GO
CREATE NONCLUSTERED INDEX [EventPerson788_IDX]
ON [dbo].[EventPerson788] ([LinkedSuspectID] ASC)
GO
--POPULATE BASE TABLE
DECLARE @I BIGINT = 1
BEGIN TRANSACTION
WHILE (@I < 10000000)
BEGIN
    -- set IsActive so the test query's IsActive = 1 filter matches rows
    INSERT INTO BASE(ID, IsActive) VALUES(@I, 1)
    SET @I += 1
    IF (@I % 10000 = 0)
    BEGIN
        COMMIT;
        BEGIN TRANSACTION;
    END;
END
COMMIT;
go
--POPULATE EventPerson TABLE
DECLARE @I BIGINT = 1
BEGIN TRANSACTION
WHILE (@I < 5000)
BEGIN
    INSERT INTO EventPerson(DUID, PersonInvolvedID)
    VALUES(@I, (SELECT TOP 1 ID FROM BASE ORDER BY NEWID()))
    SET @I += 1
    IF (@I % 10000 = 0)
    BEGIN
        COMMIT TRANSACTION;
        BEGIN TRANSACTION;
    END;
END
COMMIT TRANSACTION
GO
This is the query:
select
count(EventPerson.DUID)
from
EventPerson
inner loop join
Base on EventPerson.DUID = base.ID
left outer join
EventPerson788 on EventPerson.DUID = EventPerson788.sourceid
where
(EventPerson.PersonInvolvedID = 37909 or
EventPerson788.LinkedSuspectID = 37909)
AND BASE.IsActive = 1
Do you have any idea why the optimizer decides to use index scan instead of index seek?
Workarounds already tried:
Analyzing the tables and rebuilding statistics.
Rebuilding the indexes.
The FORCESEEK hint.
None of the above persuaded the optimizer to run an index seek on EventPerson and a seek on the BASE table.
Thanks for your help.
The scan is there because of the or condition and the outer join against EventPerson788.
The query has to return rows from EventPerson either when EventPerson.PersonInvolvedID = 37909 or when there are rows in EventPerson788 where EventPerson788.LinkedSuspectID = 37909. The latter means that every row in EventPerson has to be checked against the join.
The fact that EventPerson788 is empty cannot be used by the query optimizer, since the query plan is saved to be reused later, when there might be matching rows in EventPerson788.
Update:
You can rewrite your query using a union all instead of or to get a seek in EventPerson.
select count(EventPerson.DUID)
from
(
select EventPerson.DUID
from EventPerson
where EventPerson.PersonInvolvedID = 1556 and
not exists (select *
from EventPerson788
where EventPerson788.LinkedSuspectID = 1556)
union all
select EventPerson788.sourceid
from EventPerson788
where EventPerson788.LinkedSuspectID = 1556
) as EventPerson
inner join BASE
on EventPerson.DUID=base.ID
where
BASE.IsActive=1
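Another option worth testing (a sketch only, not guaranteed to change the plan): since the cached plan is what stops the optimizer from exploiting the currently empty EventPerson788, OPTION (RECOMPILE) makes it compile a fresh plan on every execution based on current statistics:

select
    count(EventPerson.DUID)
from
    EventPerson
inner loop join
    Base on EventPerson.DUID = base.ID
left outer join
    EventPerson788 on EventPerson.DUID = EventPerson788.sourceid
where
    (EventPerson.PersonInvolvedID = 37909 or
     EventPerson788.LinkedSuspectID = 37909)
    and BASE.IsActive = 1
option (recompile); -- trades compile CPU per execution for plan quality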
Well, you're asking SQL Server to count the rows of the EventPerson table - so why do you expect a seek to be better than a scan here?
For a COUNT, the SQL Server optimizer will almost always use a scan - it needs to count the rows, after all - all of them. It will do a clustered index scan if no other non-nullable columns are indexed.
If you have an index on a small, non-nullable column (e.g. on an ID INT or something like that), it will probably scan that index instead (less data to read when counting all rows).
But in general: a seek is great for selecting one or a few rows - but it sucks if you're dealing with all rows (like for a count).
You can easily observe this behavior if you're using the AdventureWorks sample database.
When doing a COUNT(*) on the Sales.SalesOrderDetail table which has over 120000 rows like this:
SELECT COUNT(*) FROM Sales.SalesOrderDetail
then you'll get an index scan on IX_SalesOrderDetail_ProductID - it just doesn't pay off to do seeks on over 120000 entries!
However, if you do the same operation on a smaller set of data, like this:
SELECT COUNT(*) FROM Sales.SalesOrderDetail
WHERE ProductID = 897
then you get back 2 rows out of all of them - and SQL Server will now use an index seek on that same index.

Oracle finding last row inserted

Let's say I have a table whose values (which are varchar) are:
values
a
o
g
t
And I insert a new value called V:
values
V
a
o
g
t
Is there a way or a query that can tell which value was inserted into the column last? The desired query would be something like select * from dual where row_num = count(*) (just an example), and the result would be V.
Rows in a table have no inherent order. rownum is a pseudocolumn that's part of the select, so it isn't useful here. There is no way to tell where in the storage a new row will physically be placed, so you can't rely on rowid, for example.
The only way to do this reliably is to have a timestamp column (maybe set by a trigger so you don't have to worry about it). That would let you order the rows by timestamp and find the row with the highest (most recent) timestamp.
You are still restricted by the precision of the timestamp, as I discovered creating a SQL Fiddle demo; without forcing a small gap between the inserts, the timestamps were all the same, but then it only seems to support timestamp(3). That probably won't be a significant issue in the real world, unless you're doing bulk inserts, but then the last row inserted is still a bit of an arbitrary concept.
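For illustration, a minimal sketch of the timestamp variant (the table and trigger names are made up for the demo):

create table t_ts(data varchar2(10), created_at timestamp);

create trigger bi_t_ts
before insert on t_ts
for each row
begin
    :new.created_at := systimestamp;
end;
/

-- most recently inserted row; ties are possible if timestamps collide
select data from (
    select data, row_number() over (order by created_at desc) as rn
    from t_ts
)
where rn = 1;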
As quite correctly pointed out in the comments, if the actual time doesn't need to be known, a numeric field populated by a sequence would be more reliable and performant; another SQL Fiddle demo is here, and this is the gist:
create table t42(data varchar2(10), id number);

create sequence seq_t42;

create trigger bi_t42
before insert on t42
for each row
begin
    :new.id := seq_t42.nextval;
end;
/

insert into t42(data) values ('a');
insert into t42(data) values ('o');
insert into t42(data) values ('g');
insert into t42(data) values ('t');
insert into t42(data) values ('V');

-- returns 'V', the most recently inserted value
select data from (
    select data, row_number() over (order by id desc) as rn
    from t42
)
where rn = 1;
