I'm looking for the fastest way to check for duplicates while processing an import in SQL.
For the most part there will be no duplicates, so I am using an IF so that ROW_NUMBER() OVER (PARTITION BY ...) only runs when there are duplicates. The duplicate branch also dumps the errors to a table.
The issue I am having is that I can't use SELECT ... INTO a temp table, because the same temp table cannot be created in both the IF and the ELSE branch.
So I decided to duplicate the MERGE in the IF and the ELSE. Is there a better/faster way to accomplish this?
--@ImportID is pulled from the filename and datetime
DROP TABLE IF EXISTS [#Duplicate];
SELECT [IDColumn],
COUNT(*) AS [Count]
INTO [#Duplicate]
FROM [TableOfImport]
GROUP BY [IDColumn]
HAVING COUNT(*) > 1;
--SELECT * INTO [#Unique] FROM (
-- [#Duplicate] is empty when there are no duplicates, so test for rows instead of MAX(count)
IF NOT EXISTS (SELECT 1 FROM [#Duplicate])
BEGIN
    PRINT 'Unique';
    SELECT *
    --INTO #UniqueInfo
    FROM [TableOfImport];
    -- MERGE STATEMENT HERE --
END
ELSE
BEGIN
    PRINT 'Duplicates';
    INSERT INTO [#error] ([Import]
    , [Row]
    , [Error])
    SELECT @ImportID
    , [IDColumn]
    , 'Duplicate'
    FROM [#Duplicate];
    SELECT *
    --INTO #UniqueInfo
    FROM (
        SELECT *, ROW_NUMBER() OVER (PARTITION BY [IDColumn] ORDER BY [IDColumn] DESC) AS rn
        FROM [TableOfImport]
    ) x
    WHERE x.rn = 1;
    -- SAME MERGE STATEMENT HERE --
END
--)
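One possible way to avoid the branching, shown only as a minimal sketch and not as tested T-SQL: number the rows per ID once, then send rn = 1 to the merge and rn > 1 to the error log. The sketch uses Python's built-in sqlite3 module (SQLite 3.25 or later for window functions) as a stand-in for the real database. The payload column, the Numbered temp table and the function name split_import are made up for illustration.

```python
import sqlite3

def split_import(rows):
    """Return (unique_rows, duplicate_ids) for a list of (id, payload) rows."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE TableOfImport (IDColumn INTEGER, payload TEXT)")
    con.executemany("INSERT INTO TableOfImport VALUES (?, ?)", rows)
    # Number the rows once; both result sets below read from this single pass.
    con.execute("""
        CREATE TEMP TABLE Numbered AS
        SELECT IDColumn, payload,
               ROW_NUMBER() OVER (PARTITION BY IDColumn ORDER BY rowid) AS rn
        FROM TableOfImport
    """)
    # rn = 1: one row per ID, this is what would feed the MERGE.
    unique_rows = con.execute(
        "SELECT IDColumn, payload FROM Numbered WHERE rn = 1 ORDER BY IDColumn"
    ).fetchall()
    # rn > 1: IDs that occurred more than once, this is what would feed the error table.
    duplicate_ids = [row[0] for row in con.execute(
        "SELECT DISTINCT IDColumn FROM Numbered WHERE rn > 1 ORDER BY IDColumn"
    )]
    con.close()
    return unique_rows, duplicate_ids
```

With this shape there is only one merge statement, and the error insert simply affects zero rows when the import has no duplicates. Whether it is actually faster than the IF/ELSE version would need to be measured on the real data.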
I have a function that returns the greatest of three dates taken from three tables.
create or replace FUNCTION fn_max_date_val(
pi_user_id IN number)
RETURN DATE
IS
l_modified_dt DATE;
l_mod1_dt DATE;
l_mod2_dt DATE;
ret_user_id DATE;
BEGIN
SELECT MAX(last_modified_dt)
INTO l_modified_dt
FROM table1
WHERE id = pi_user_id;
-- this table contains a million records
SELECT nvl(MAX(last_modified_ts),sysdate-90)
INTO l_mod1_dt
FROM table2
WHERE table2_id=pi_user_id;
-- this table contains CLOB data and 800,000 records; table3 has no user_id, so it has to be fetched via table2, as shown below
SELECT nvl(MAX(last_modified_dt),sysdate-90)
INTO l_mod2_dt
FROM table3
WHERE table2_id IN
(SELECT id FROM table2 WHERE table2_id=pi_user_id
);
-- GREATEST compares the DATE values directly; no dynamic SQL or string conversion is needed
ret_user_id := greatest(l_modified_dt, l_mod1_dt, l_mod2_dt);
RETURN ret_user_id;
EXCEPTION
WHEN OTHERS THEN
return SYSDATE;
END;
This function works perfectly fine and executes within a second.
-- random user_id , just to test the functionality
SELECT fn_max_date_val(100) as max_date FROM DUAL
MAX_DATE
--------
27-02-14
For reference purposes I have used the table names table1, table2 and table3, but my business case is similar to what I describe below.
I need to get the details from table1 along with the highest modified date among the three tables.
I did something like this.
SELECT a.id,a.name,a.value,fn_max_date_val(id) as max_date
FROM table1 a where status_id ='Active';
The above query executes perfectly fine and returns results in milliseconds. The problem came when I tried to use ORDER BY.
SELECT a.id,a.name,a.value,a.status_id,last_modified_dt,fn_max_date_val(id) as max_date
FROM table1 a where status_id ='Active'
order by status_id desc,last_modified_dt desc ;
-- It took almost 300 seconds to complete
I also tried adding an index on status_id and last_modified_dt, but no luck. Is there a right way to do this?
What if you write the query like this?
select a.*, fn_max_date_val(id) as max_date
from
(SELECT a.id,a.name,a.value,a.status_id,last_modified_dt
FROM table1 a where status_id ='Active'
order by status_id desc,last_modified_dt desc) a;
What if you don't use the function and do something like this:
SELECT a.id,a.name,a.value,a.status_id,last_modified_dt, x.max_date
FROM table1 a
(
select max(max_date) as max_date
from (
SELECT MAX(last_modified_dt) as max_date
FROM table1 t1
WHERE t1.id = a.id
union
SELECT nvl(MAX(last_modified_ts),sysdate-90) as max_date
FROM table2 t2
WHERE t2.table2_id=a.id
...
) y
) x
where a.status_id ='Active'
order by status_id desc,last_modified_dt desc;
The syntax might contain errors, but the idea is something like that, with the third table added to the derived table too.
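To make the "no function" idea concrete, here is a minimal sketch that aggregates table2 and table3 once per user and joins the results to table1, so the per-row function call disappears. It runs on SQLite through Python's sqlite3 module, not on Oracle, so MAX(a, b, c) plays the role of GREATEST, COALESCE plays the role of NVL, and date('now', '-90 days') stands in for sysdate-90. The sample rows, the ISO date strings and the names make_demo_db and latest_dates are assumptions for illustration.

```python
import sqlite3

def make_demo_db():
    """Build a tiny in-memory copy of the three tables (made-up sample data)."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE table1 (id INTEGER PRIMARY KEY, status_id TEXT, last_modified_dt TEXT);
        CREATE TABLE table2 (id INTEGER PRIMARY KEY, table2_id INTEGER, last_modified_ts TEXT);
        CREATE TABLE table3 (table2_id INTEGER, last_modified_dt TEXT);
        INSERT INTO table1 VALUES (1, 'Active', '2999-01-01'),
                                  (2, 'Active', '2999-05-01'),
                                  (3, 'Closed', '2999-09-01');
        INSERT INTO table2 VALUES (10, 1, '2999-03-01');
        INSERT INTO table3 VALUES (10, '2999-02-01');
    """)
    return con

def latest_dates(con):
    """Return (id, max_date) for active rows, newest first, using joins only."""
    return con.execute("""
        SELECT a.id,
               MAX(a.last_modified_dt,
                   COALESCE(m2.dt, date('now', '-90 days')),
                   COALESCE(m3.dt, date('now', '-90 days'))) AS max_date
        FROM table1 a
        -- one aggregated row per user from table2
        LEFT JOIN (SELECT table2_id, MAX(last_modified_ts) AS dt
                   FROM table2
                   GROUP BY table2_id) m2 ON m2.table2_id = a.id
        -- table3 has no user id, so it is reached through table2
        LEFT JOIN (SELECT t2.table2_id, MAX(t3.last_modified_dt) AS dt
                   FROM table3 t3
                   JOIN table2 t2 ON t3.table2_id = t2.id
                   GROUP BY t2.table2_id) m3 ON m3.table2_id = a.id
        WHERE a.status_id = 'Active'
        ORDER BY a.last_modified_dt DESC
    """).fetchall()
```

Each of the large tables is scanned and grouped once instead of being probed once per row of table1. Whether that wins on Oracle depends on the indexes and the optimizer, so treat it as something to try, not a guaranteed fix.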
I have a table that looks like this :
CREATE OR REPLACE TYPE tip_orase AS VARRAY(10) of VARCHAR2(50)
/
CREATE table excursie_try (
cod_excursie NUMBER(4),
denumire VARCHAR2(20),
orase tip_orase,
status varchar2(20)
);
And I need to find the 'cod_excursie' of the entry that has the lowest number of entries in orase.
I can do this the long way: count the number of cities for each entry, select the minimum, and then query for the 'cod_excursie' of the entry with that count.
Is there a simpler way? I tried something like:
select cod_excursie
from excursie_try, (select max(orase.count()) m
from excursie_try) T
where orase.count = T.m
and ROWNUM <= 1;
but it does not work. Any ideas, or do I have to take the LONG way?
Try this:
select cod_excursie from (
select et.cod_excursie,
(select count(*) from table(et.orase)) n
from excursie_try et order by 2 desc
) where rownum = 1;
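(select count(*) from table(et.orase)) is a scalar subquery; TABLE() exposes the varray as a rowset so that its elements can be counted.
order by 2 desc in the subquery plus where rownum = 1 is the usual top-N pattern. As written it returns the entry with the most cities, matching the max() in your attempt; drop desc to get the entry with the fewest.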
I have a table ( table1 ) that represents item grouping and another table ( table2 ) that represents the items themselves.
table1.id is referenced as a foreign key by table2, and in every record of table1 I also store information such as the total number of records in table2 associated with that record and the sums of various fields, so that I can show the grouping and a summary of what's in it without having to query table2.
Usually items in table2 are added/removed one at a time, so I update table1 to reflect the changes in table2.
A new requirement arose: chosen items in a group must be moved to a new group. I thought of it as a 3-step operation:
1. create a new group in table1
2. update the chosen records in table2 to point to the newly created record in table1
3. subtract from the old group the number of records and the sums of the other fields I need to show, and add them to the new group; I can find that data simply by querying table2 for the items associated with the new group.
I came up with the following statement that works.
update table1 t1 set
countitems = (
case t1.id
when 1 then t1.countitems - ( select count( t2.id ) from table2 t2 where t2.id = 2 )
when 2 then ( select count( t2.id ) from table2 t2 where t2.id = 2 )
end
),
sumitems = (
case t1.id
when 1 then t1.sumitems - ( select sum( t2.num ) from table2 t2 where t2.id = 2 )
when 2 then ( select sum( t2.num ) from table2 t2 where t2.id = 2 )
end
)
where t1.id in( 1, 2 );
Is there a way to rewrite the statement without having to repeat the subquery every time?
Thanks
Piero
You can use a cursor with BULK COLLECT and a FORALL update on the rowid. That way you simply write the join query that produces the desired result and update the table with those values. I always use this pattern and make slight adjustments each time.
declare
  -- one row per table1 record with its recomputed totals
  cursor cur_cur
  is
    select t1.rowid row_id
    , count(t2.id) countitems
    , sum(t2.num) sumitems
    from table1 t1
    join table2 t2 on t1.id = t2.t1_id
    group by t1.rowid
    order by row_id
  ;
  type type_rowid_array is table of rowid index by binary_integer;
  type type_countitems_array is table of table1.countitems%type;
  type type_sumitems_array is table of table1.sumitems%type;
  arr_rowid type_rowid_array;
  arr_countitems type_countitems_array;
  arr_sumitems type_sumitems_array;
  v_commit_size number := 10000;
begin
  open cur_cur;
  loop
    fetch cur_cur bulk collect into arr_rowid, arr_countitems, arr_sumitems limit v_commit_size;
    -- stop before FORALL when the last fetch returned nothing
    exit when arr_rowid.count = 0;
    forall i in 1 .. arr_rowid.count
      update table1 tab
      SET tab.countitems = arr_countitems(i)
      , tab.sumitems = arr_sumitems(i)
      where tab.rowid = arr_rowid(i)
    ;
    commit;
  end loop;
  close cur_cur;
  commit;
exception
  when others
  then rollback;
  raise_application_error(-20000, 'ERROR updating table1(countitems,sumitems) - '||sqlerrm);
end;
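For only two groups, a lighter option may be to write the aggregate once and assign both columns from it in a single UPDATE. Note that this recomputes the totals of both groups from table2 instead of subtracting, so it is a different technique from the CASE statement in the question. The sketch below runs on SQLite through Python's sqlite3 module (row-value assignment in UPDATE needs SQLite 3.15 or later). The t1_id column name follows the answer above, and the sample data and the function names are made up.

```python
import sqlite3

def make_groups_db():
    """Two groups and three items; group 1 still carries stale totals."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE table1 (id INTEGER PRIMARY KEY, countitems INTEGER, sumitems INTEGER);
        CREATE TABLE table2 (id INTEGER PRIMARY KEY, t1_id INTEGER, num INTEGER);
        INSERT INTO table1 VALUES (1, 3, 60), (2, 0, 0);
        -- items 2 and 3 have already been moved to the new group 2
        INSERT INTO table2 VALUES (1, 1, 10), (2, 2, 20), (3, 2, 30);
    """)
    return con

def refresh_summaries(con, group_ids):
    """Recompute countitems and sumitems for the given groups from table2."""
    marks = ",".join("?" for _ in group_ids)
    # The subquery appears once and fills both columns for every listed group.
    con.execute(f"""
        UPDATE table1
        SET (countitems, sumitems) = (
            SELECT COUNT(*), COALESCE(SUM(t2.num), 0)
            FROM table2 t2
            WHERE t2.t1_id = table1.id
        )
        WHERE id IN ({marks})
    """, list(group_ids))
    con.commit()
```

Usage: call refresh_summaries(con, [old_group_id, new_group_id]) right after repointing the items in table2. Oracle also accepts the SET (col1, col2) = (subquery) form, but the statement here has only been written against SQLite.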
What is the easiest way to INSERT a row if it doesn't exist, in PL/SQL (oracle)?
I want something like:
IF NOT EXISTS (SELECT * FROM table WHERE name = 'jonny') THEN
INSERT INTO table VALUES ("jonny", null);
END IF;
But it's not working.
Note: this table has 2 fields, say, name and age. But only name is PK.
INSERT INTO table
SELECT 'jonny', NULL
FROM dual -- Not Oracle? No need for dual, drop that line
WHERE NOT EXISTS (SELECT NULL -- canonical way, but you can select
-- anything as EXISTS only checks existence
FROM table
WHERE name = 'jonny'
)
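To see the INSERT ... SELECT ... WHERE NOT EXISTS pattern behave, here is a small runnable sketch. It uses SQLite through Python's sqlite3 module, where no DUAL table is needed, so it illustrates the logic and is not Oracle code. The table name people and the function names are made up for the example.

```python
import sqlite3

def make_people_db():
    """Table with two fields, name and age; only name is the primary key."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE people (name TEXT PRIMARY KEY, age INTEGER)")
    return con

def insert_if_missing(con, name, age=None):
    """Insert (name, age) only when no row with that name exists yet."""
    con.execute(
        "INSERT INTO people (name, age) "
        "SELECT ?, ? "
        "WHERE NOT EXISTS (SELECT 1 FROM people WHERE name = ?)",
        (name, age, name),
    )
    con.commit()

def all_people(con):
    return con.execute("SELECT name, age FROM people ORDER BY name").fetchall()
```

Calling insert_if_missing twice with the same name leaves a single row, and the second call does not overwrite the age stored by the first.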
Assuming you are on 10g or later, you can also use the MERGE statement. This allows you to insert the row if it doesn't exist and ignore the row if it does exist. People tend to think of MERGE when they want to do an "upsert" (INSERT if the row doesn't exist and UPDATE if the row does exist), but the UPDATE part is optional now, so it can also be used here.
SQL> create table foo (
2 name varchar2(10) primary key,
3 age number
4 );
Table created.
SQL> ed
Wrote file afiedt.buf
1 merge into foo a
2 using (select 'johnny' name, null age from dual) b
3 on (a.name = b.name)
4 when not matched then
5 insert( name, age)
6* values( b.name, b.age)
SQL> /
1 row merged.
SQL> /
0 rows merged.
SQL> select * from foo;
NAME AGE
---------- ----------
johnny
If name is a PK, then just insert and catch the error. The reason to do this rather than any check is that it will work even with multiple clients inserting at the same time. If you check and then insert, you have to hold a lock during that time, or expect the error anyway.
The code for this would be something like
BEGIN
INSERT INTO table( name, age )
VALUES( 'johnny', null );
EXCEPTION
WHEN dup_val_on_index
THEN
NULL; -- Intentionally ignore duplicates
END;
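The same "insert and catch the error" idea can be tried out in a few lines. This sketch uses SQLite through Python's sqlite3 module, where sqlite3.IntegrityError plays the role of Oracle's dup_val_on_index. The table name people and the function names are assumptions for the example.

```python
import sqlite3

def make_people_db():
    """Table with a primary key on name, as in the question."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE people (name TEXT PRIMARY KEY, age INTEGER)")
    return con

def insert_ignoring_duplicate(con, name, age=None):
    """Try the insert; return True if a row was added, False if the key already existed."""
    try:
        # 'with con' commits on success and rolls back if the insert raises.
        with con:
            con.execute("INSERT INTO people (name, age) VALUES (?, ?)", (name, age))
        return True
    except sqlite3.IntegrityError:
        # Intentionally ignore duplicates; the primary key did the checking.
        return False
```

The database enforces uniqueness here, so there is no window between a check and the insert in which another session could sneak in the same key.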
I found the examples a bit tricky to follow for the situation where you want to ensure a row exists in the destination table (especially when you have two columns as the primary key), but the primary key might not exist there at all so there's nothing to select.
This is what worked for me:
MERGE INTO table1 D
USING (
-- These are the row(s) you want to insert.
SELECT
'val1' AS FIELD_A,
'val2' AS FIELD_B
FROM DUAL
) S ON (
-- This is the criteria to find the above row(s) in the
-- destination table. S refers to the rows in the SELECT
-- statement above, D refers to the destination table.
D.FIELD_A = S.FIELD_A
AND D.FIELD_B = S.FIELD_B
)
-- This is the INSERT statement to run for each row that
-- doesn't exist in the destination table.
WHEN NOT MATCHED THEN INSERT (
FIELD_A,
FIELD_B,
FIELD_C
) VALUES (
S.FIELD_A,
S.FIELD_B,
'val3'
)
The key points are:
The SELECT statement inside the USING block must always return rows. If there are no rows returned from this query, no rows will be inserted or updated. Here I select from DUAL so there will always be exactly one row.
The ON condition is what sets the criteria for matching rows. If ON does not have a match then the INSERT statement is run.
You can also add a WHEN MATCHED THEN UPDATE clause if you want more control over the updates too.
Using parts of @benoit's answer, I will use this:
DECLARE
  varTmp NUMBER := 0;
BEGIN
  -- check: varTmp becomes 1 if the row exists, 0 otherwise
  SELECT nvl((SELECT 1 FROM table WHERE name = 'john'), 0) INTO varTmp FROM dual;
  -- insert only when the row is missing
  IF (varTmp = 0) THEN
    INSERT INTO table VALUES ('john', null);
  END IF;
END;
Sorry for not using any of the given answers in full, but I need the IF check because my code is much more complex than this example table with name and age fields, and I need the code to be very clear. Thanks, I learned a lot! I'll accept @benoit's answer.
In addition to the perfect and valid answers given so far, there is also the ignore_row_on_dupkey_index hint you might want to use:
create table tq84_a (
name varchar2 (20) primary key,
age number
);
insert /*+ ignore_row_on_dupkey_index(tq84_a(name)) */ into tq84_a values ('Johnny', 77);
insert /*+ ignore_row_on_dupkey_index(tq84_a(name)) */ into tq84_a values ('Pete' , 28);
insert /*+ ignore_row_on_dupkey_index(tq84_a(name)) */ into tq84_a values ('Sue' , 35);
insert /*+ ignore_row_on_dupkey_index(tq84_a(name)) */ into tq84_a values ('Johnny', null);
select * from tq84_a;
The hint (available since Oracle 11g Release 2) is described in the Oracle documentation on Tahiti.
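For readers without an Oracle instance at hand, the effect of the four inserts above can be reproduced with a rough analogue. The sketch uses SQLite's INSERT OR IGNORE through Python's sqlite3 module; it is a different mechanism from the Oracle hint, but it has the same visible outcome of skipping a row whose key already exists. The function name load_names is made up.

```python
import sqlite3

def load_names(rows):
    """Insert (name, age) rows, silently skipping names that already exist."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE tq84_a (name TEXT PRIMARY KEY, age INTEGER)")
    # OR IGNORE drops any row that would violate the primary key.
    con.executemany("INSERT OR IGNORE INTO tq84_a VALUES (?, ?)", rows)
    con.commit()
    result = con.execute("SELECT name, age FROM tq84_a ORDER BY name").fetchall()
    con.close()
    return result
```

Feeding it the same four rows keeps the first 'Johnny' with age 77 and discards the second one.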
You can use this syntax:
INSERT INTO table_name ( name, age )
select 'jonny', 18 from dual
where not exists(select 1 from table_name where name = 'jonny');
If it opens a pop-up asking to "enter substitution variable", run this before the above query:
set define off;
INSERT INTO table_name ( name, age )
select 'jonny', 18 from dual
where not exists(select 1 from table_name where name = 'jonny');
You should use Merge:
For example:
MERGE INTO employees e
USING (SELECT * FROM hr_records WHERE start_date > ADD_MONTHS(SYSDATE, -1)) h
ON (e.id = h.emp_id)
WHEN MATCHED THEN
UPDATE SET e.address = h.address
WHEN NOT MATCHED THEN
INSERT (id, address)
VALUES (h.emp_id, h.address);
or
MERGE INTO employees e
USING hr_records h
ON (e.id = h.emp_id)
WHEN MATCHED THEN
UPDATE SET e.address = h.address
WHEN NOT MATCHED THEN
INSERT (id, address)
VALUES (h.emp_id, h.address);
https://oracle-base.com/articles/9i/merge-statement
CTE and only CTE :-)
Just throw out the extra stuff. Here is an almost complete and verbose form that covers most cases; you can use a more concise form.
INSERT INTO reports r
(r.id, r.name, r.key, r.param)
--
-- Invoke this script from "WITH" to the end (";")
-- to debug and see prepared values.
WITH
-- Some new data to add.
newData AS(
SELECT 'Name 1' name, 'key_new_1' key FROM DUAL
UNION SELECT 'Name 2' NAME, 'key_new_2' key FROM DUAL
UNION SELECT 'Name 3' NAME, 'key_new_3' key FROM DUAL
),
-- Any single row for copying with each new row from "newData",
-- if you will of course.
copyData AS(
SELECT r.*
FROM reports r
WHERE r.key = 'key_existing'
-- ! Prevent more than one row from being returned.
AND ROWNUM = 1 -- placeholder: replace with whatever guarantees a single row in your case
),
-- Last used ID from the "reports" table (it depends on your case).
-- (not going to work with concurrent transactions)
maxId AS (SELECT MAX(id) AS id FROM reports)
--
-- Some construction of all data for insertion.
SELECT maxId.id + ROWNUM, newData.name, newData.key, copyData.param
FROM copyData
-- matrix multiplication :)
-- (or a recursion if you're imperative coder)
CROSS JOIN newData
CROSS JOIN maxId
--
-- Let's prevent re-insertion.
WHERE NOT EXISTS (
SELECT 1 FROM reports rs
WHERE rs.name IN(
SELECT name FROM newData
));
I call it "IF NOT EXISTS" on steroids. It works for me, and it is what I mostly use.