Selecting duplicate rows from a table (ignoring the primary key) - oracle

I have an Oracle table and I want to find out whether it contains any duplicate rows (i.e. rows where all column values are equal). The problem is that every row has a unique primary key, which I want to exclude from the comparison, since it prevents any two rows from ever being completely equal.
Is there a way to ignore the primary key when doing such a task (instead of listing all columns except the primary key column), so that I can find the duplicate rows?

No, just list all columns except the primary key columns in the GROUP BY clause:
CREATE TABLE mytable (
pk NUMBER PRIMARY KEY,
c1 NUMBER NOT NULL,
c2 NUMBER
);
INSERT INTO mytable (pk, c1, c2) VALUES (100, 1, 1);
INSERT INTO mytable (pk, c1, c2) VALUES (101, 1, 1);
INSERT INTO mytable (pk, c1, c2) VALUES (102, 2, 1);
INSERT INTO mytable (pk, c1, c2) VALUES (103, 2, null);
INSERT INTO mytable (pk, c1, c2) VALUES (104, 2, null);
SELECT c1, c2
FROM mytable
GROUP BY c1, c2
HAVING COUNT(*) > 1;
   C1     C2
----- ------
    1      1
    2 (null)
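If you also want to see which primary key values are involved in each duplicate group, a small variation on the above (just a sketch, still using the mytable example) pulls the full rows with an analytic count over the same non-key columns; like GROUP BY, PARTITION BY treats nulls as equal:
SELECT pk, c1, c2
FROM (SELECT pk, c1, c2,
             COUNT(*) OVER (PARTITION BY c1, c2) AS dup_count
      FROM mytable)
WHERE dup_count > 1
ORDER BY c1, c2, pk;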
To find the non-primary-key columns, you could use the following query, although for most tables it will be quicker simply to type the column names than to paste and run it:
SELECT column_name
FROM user_tab_columns co
WHERE co.table_name = 'MYTABLE'
AND NOT EXISTS (
      SELECT *
      FROM user_constraints pk
      JOIN user_cons_columns pc USING (owner, constraint_name)
      WHERE pk.table_name = co.table_name
      AND constraint_type = 'P'
      AND co.column_name = pc.column_name)
ORDER BY co.column_id;

You're going to have to list the other columns explicitly.
Potentially, you could use dynamic SQL to generate the query that you want. But that is unlikely to be terribly helpful if this is just for a single table. If you were trying to automate a process of comparing dozens or hundreds of tables, a dynamic SQL approach would potentially be easier to manage.
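As a rough illustration of that dynamic approach (a sketch only, assuming Oracle 11g Release 2 or later for LISTAGG), you can let the data dictionary build the column list and then run the statement it generates:
SELECT 'SELECT '
       || LISTAGG(co.column_name, ', ') WITHIN GROUP (ORDER BY co.column_id)
       || ' FROM mytable GROUP BY '
       || LISTAGG(co.column_name, ', ') WITHIN GROUP (ORDER BY co.column_id)
       || ' HAVING COUNT(*) > 1' AS generated_sql
FROM user_tab_columns co
WHERE co.table_name = 'MYTABLE'
AND co.column_name NOT IN (
      SELECT pc.column_name
      FROM user_constraints pk
      JOIN user_cons_columns pc
        ON pc.owner = pk.owner
       AND pc.constraint_name = pk.constraint_name
      WHERE pk.table_name = 'MYTABLE'
      AND pk.constraint_type = 'P');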

Related

PLSQL code requirement for partition comparison in oracle

Is there any way to compare the values in one partition with those in another partition of the same table? The requirement is this: I have a table with 5 partitions and two NOT NULL columns. Col1 holds distinct values, while col2 can contain duplicates. I need to compare each partition with the remaining 4 partitions on the basis of the distinct col2 values in each partition; if a value matches between two partitions, a new table should be created as the union of those two partitions.
If the col2 values of one partition have no match in any of the other partitions, a new table of the same structure should be created (without any union).
Note:
I want to automate this process through PLSQL code.
Currently what I am doing manually:
I have one table with five partitions. For example, the table structure is:
create table PART_TEST1
(col1 int not null,
col2 int not null)
partition by range (col2)
(partition part1 values less than (10),
partition part2 values less than (20),
partition part3 values less than (30),
partition part4 values less than (40),
partition part5 values less than (maxvalue));
Data distribution:
col1 has distinct values like 1, 2, 3, and so on.
col2 has values like 1, 2, -1, 1, 2, 3, 4, 1, and so on.
col2 has duplicate values, and my goal is to find the distinct values per partition, by partition name, like:
select distinct col2 from PART_TEST1 partition (part1);
For example, the output of the above query is:
Col2
1
2
Then I query the next partition to look for matching values:
select distinct col2 from PART_TEST1 partition (part2);
For example, the output of the above query is:
Col2
2
3
So part1 and part2 have one common value, 2, and two non-common values, 1 and 3.
So my final queries are:
create table 'TABLE_NAME' as select * from part_test1 where col2 = 1;
create table 'TABLE_NAME' as select * from part_test1 where col2 = 3;
create table 'TABLE_NAME' as
(select * from part_test1 where col2 = 2
union
select * from part_test1 where col2 = 2);
Hopefully this gives some clarity about my problem. I am new to PL/SQL and am not able to compare the partition values. Also, if I do manage to compare the values, how can I store the output of the comparison query and then finally create the table? And I am thinking that I need to compare each partition with the rest of the partitions in some kind of loop.
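As a starting point for the looping part only (a minimal sketch using the PART_TEST1 table above; it does not do the comparison or the table creation, and since a partition name cannot be a bind variable the query has to be built as a string):
DECLARE
  v_cnt NUMBER;
BEGIN
  FOR p IN (SELECT partition_name
            FROM user_tab_partitions
            WHERE table_name = 'PART_TEST1'
            ORDER BY partition_position)
  LOOP
    -- count the distinct col2 values in this partition
    EXECUTE IMMEDIATE
      'SELECT COUNT(DISTINCT col2) FROM part_test1 PARTITION (' ||
      p.partition_name || ')'
      INTO v_cnt;
    DBMS_OUTPUT.PUT_LINE(p.partition_name || ': ' || v_cnt || ' distinct col2 values');
  END LOOP;
END;
/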

Delete data returned from subquery in oracle

I have two tables. If the number of rows in table1 exceeds a predefined limit (say 2), I need to copy the excess contents of table1 to table2 and delete those same rows from table1.
I used the below query to insert the excess data from table1 to table2.
insert into table2
SELECT * FROM table1 WHERE ROWNUM < ((select count(*) from table1)-2);
Now I need the delete query to remove those same rows from table1.
Thanks in advance.
A straightforward approach would be interim storage in a temporary table. Its content can be used both to determine the data to be deleted from table1 and as the source to feed table2.
Assume (slightly abusing notation) <pk> to be the PK column (or a column of any candidate key) of table1 - usually there'll be some key that comprises only 1 column.
create global temporary table t_interim on commit preserve rows as
( SELECT <pk> pkc FROM table1 WHERE ROWNUM < ((select count(*) from table1)-2) );
insert into table2
select * from table1 where <pk> IN (
select pkc from t_interim
);
delete from table1 where <pk> IN (
select pkc from t_interim
);
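Since a global temporary table is a permanent object whose rows are private to each session, a repeating process would normally create it once and just repopulate it on each run. A sketch of that variation, assuming table1's key column is a NUMBER named id:
-- one-time setup: the definition is permanent, its rows disappear at commit
create global temporary table t_interim (pkc number) on commit delete rows;

-- each run
insert into t_interim (pkc)
select id from table1 where rownum < ((select count(*) from table1) - 2);

insert into table2
select * from table1 where id in (select pkc from t_interim);

delete from table1 where id in (select pkc from t_interim);

commit; -- also empties t_interim, ready for the next run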
Alternative
If the key of table1 spans more than 1 column, use an EXISTS clause instead, as follows (<ck_i> denoting the i-th component of a candidate key of table1):
create global temporary table t_interim on commit preserve rows as
( SELECT <ck_1> ck1, <ck_2> ck2, ..., <ck_n> ckn FROM table1 WHERE ROWNUM < ((select count(*) from table1)-2) );
insert into table2
select * from table1 t
where exists (
select 1
from t_interim t_i
where t.ck_1 = t_i.ck1
and t.ck_2 = t_i.ck2
...
and t.ck_n = t_i.ckn
)
;
delete from table1 t
where exists (
select 1
from t_interim t_i
where t.ck_1 = t_i.ck1
and t.ck_2 = t_i.ck2
...
and t.ck_n = t_i.ckn
)
;
(Technically you could adjust the first scheme by synthesizing a single key from the components of a candidate key, e.g. by concatenating them. However, you risk introducing ambiguities ( (a bc, ab c) -> (abc, abc) ) or running into implementation limits (max. varchar length) with that method.)
Note
In case the table doesn't have a PK, you can apply the technique using any candidate key of table1. There will always be one; in the extreme case it's the set of all columns.
This situation may be the right time to improve the db design and add a (synthetic) pk column to table1 (and to any other tables in the system that lack one).

SQL delete rows not in another table

I'm looking for a good SQL approach (Oracle database) to fulfill the following requirements:
Delete rows from Table A that are not present in Table B.
Both tables have identical structure
Some fields are nullable
The number of columns and rows is large (more than 100k rows and 20-30 columns to compare)
Every single field of every single row needs to be compared from Table A against table B.
This requirement comes from a process that must run every day, as changes will arrive in Table B.
In other words: Table A Minus Table B => Delete the records from the Table A
delete from Table A
where (field1, field2, field3) in
(select field1, field2, field3
from Table A
minus
select field1, field2, field3
from Table B);
It's very important to mention that a normal MINUS inside the DELETE fails, as it does not take the nulls in nullable fields into consideration (the comparison is unknown for Oracle, hence no match).
I also tried EXISTS with success, but I have to use the NVL function to replace the nulls with dummy values, which I don't want, because I cannot guarantee that the dummy value used in NVL will never appear as a valid value in the field.
Does anybody know a way to accomplish such a thing? Please remember that performance and nullable fields are "a must".
Thanks ever
decode finds sameness (even if both values are null):
decode( field1, field2, 1, 0 ) = 1
To delete rows in table1 not found in table2:
delete table1 t
where t.rowid in (select t1.rowid
from table1 t1
left outer join table2 t2
on decode(t1.field1, t2.field1, 1, 0) = 1
and decode(t1.field2, t2.field2, 1, 0) = 1
and decode(t1.field3, t2.field3, 1, 0) = 1
/* ... */
where t2.rowid is null /* no matching row found */
)
To make use of existing indexes, expand the decode into an explicit comparison in the join condition:
...
left outer join table2 t2
on (t1.index_field1=t2.index_field1 or
t1.index_field1 is null and t2.index_field1 is null)
and ...
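For comparison, the same null-safe matching can be spelled out without decode, using explicit IS NULL tests (a sketch, with field1..field3 as placeholders as above):
delete from table1 t1
where not exists (
  select 1
  from table2 t2
  where (t1.field1 = t2.field1 or (t1.field1 is null and t2.field1 is null))
  and   (t1.field2 = t2.field2 or (t1.field2 is null and t2.field2 is null))
  and   (t1.field3 = t2.field3 or (t1.field3 is null and t2.field3 is null))
  /* ... repeat for the remaining columns ... */
);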
Use a left outer join and test for null in your where clause
delete a
from a
left outer join b on a.x = b.x
where b.x is null
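A side note on the above (not from the original answer): this join-style DELETE is SQL Server syntax; Oracle does not accept a join directly in a DELETE, but the same idea can be expressed through a subquery, for example:
DELETE FROM a
WHERE a.rowid IN (
  SELECT a2.rowid
  FROM a a2
  LEFT OUTER JOIN b ON a2.x = b.x
  WHERE b.x IS NULL
);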
Have you considered the Oracle SQL MERGE statement?
Use bulk operations for a huge number of records; performance-wise it will be faster.
And use a join between the two tables to get the rows to be deleted. Nullable columns can be compared using some default value.
Also, if you want Table A to end up the same as Table B, why not truncate Table A and then insert the data from Table B?
Assuming you have the same PK field available on each table... (Having a PK or some other unique key is critical for this.)
create table table_a (id number, name varchar2(25), dob date);
insert into table_a values (1, 'bob', to_date('01-01-1978','MM-DD-YYYY'));
insert into table_a values (2, 'steve', null);
insert into table_a values (3, 'joe', to_date('05-22-1989','MM-DD-YYYY'));
insert into table_a values (4, null, null);
insert into table_a values (5, 'susan', to_date('08-08-2005','MM-DD-YYYY'));
insert into table_a values (6, 'juan', to_date('11-17-2001', 'MM-DD-YYYY'));
create table table_b (id number, name varchar2(25), dob date);
insert into table_b values (1, 'bob', to_date('01-01-1978','MM-DD-YYYY'));
insert into table_b values (2, 'steve',to_date('10-14-1992','MM-DD-YYYY'));
insert into table_b values (3, null, to_date('05-22-1989','MM-DD-YYYY'));
insert into table_b values (4, 'mary', to_date('12-08-2012','MM-DD-YYYY'));
insert into table_b values (5, null, null);
commit;
-- confirm minus is working
select id, name, dob
from table_a
minus
select id, name, dob
from table_b;
-- from the minus, re-query to just get the key, then delete by key
delete table_a where id in (
select id from (
select id, name, dob
from table_a
minus
select id, name, dob
from table_b)
);
commit;
select * from table_a;
But, if at some point in time, tableA is to be reset to the same as tableB, why not, as another answer suggested, truncate tableA and select all from tableB.
100K is not huge. I can do ~100K truncate and insert on my laptop instance in less than 1 second.
DELETE FROM purchase WHERE clientcode NOT IN (
SELECT clientcode FROM client );
This deletes the rows from the purchase table whose clientcode is not in the client table. The clientcode of the purchase table references the clientcode of the client table.
DELETE FROM TABLE1 WHERE FIELD1 NOT IN (SELECT CLIENT1 FROM TABLE2);
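One caveat worth adding to the two NOT IN answers above, especially since the question stresses nullable fields: NOT IN matches nothing at all if the subquery returns even one NULL, so NOT EXISTS is the safer spelling. For example (same tables as the first of those answers):
DELETE FROM purchase p
WHERE NOT EXISTS (
  SELECT 1 FROM client c WHERE c.clientcode = p.clientcode
);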

plsql: inserting Multiple rows from select and ignore duplicates

I am using Oracle 10g with TOAD.
I need to insert lakhs (hundreds of thousands) of records using INSERT ... SELECT.
BEGIN
INSERT INTO MYTABLE(C1,C2,C3)
SELECT C1,C2,C3 FROM MYTABLE2 WHERE C1>100;
EXCEPTION
WHEN DUP_VAL_ON_INDEX THEN NULL;
COMMIT;
END;
The problem I am facing is that if this select query returns rows which already exist in MYTABLE, then the whole transaction is rolled back.
Is there a way to insert all the non-existent rows, ignoring the duplicate rows, and then commit the transaction?
Use the DISTINCT keyword:
BEGIN
INSERT INTO MYTABLE(C1,C2,C3)
SELECT distinct C1,C2,C3 FROM MYTABLE2 WHERE C1>100;
EXCEPTION
WHEN DUP_VAL_ON_INDEX THEN NULL;
COMMIT;
END;
Instead of trying to handle the exception, you can avoid these rows in the first place, e.g., by using the minus operator:
INSERT INTO mytable (c1, c2, c3)
SELECT c1, c2, c3
FROM mytable2
WHERE c1 > 100
MINUS
SELECT c1, c2, c3
FROM mytable
WHERE c1 > 100; -- not necessary, but should improve performance
There are many ways to do it. First of all, you can try something like this:
insert into mytable(c1, c2, c3)
select distinct c1, c2, c3 from mytable2 where c1 > 100
minus
select c1, c2, c3 from mytable;
Otherwise, you can use something like
insert into mytable(c1, c2, c3)
select c1, c2, c3 from mytable2 where c1 > 100
log errors into myerrtable reject limit unlimited;
And so on...
More detail about error logging, a feature introduced in 10g Release 2:
SQL> create table garbage(id integer);
Table created
SQL> insert into garbage select rownum from dual connect by level <= 10;
10 rows inserted
SQL> insert into garbage values (3);
1 row inserted
SQL> insert into garbage values (5);
1 row inserted
SQL> create table uniq(id integer not null primary key);
Table created
SQL> insert into uniq select * from garbage;
ORA-00001: unique constraint (TEST.SYS_C0010568) violated
SQL> select count(*) from uniq;
COUNT(*)
----------
0
SQL> exec dbms_errlog.create_error_log('uniq', 'uniq_err');
PL/SQL procedure successfully completed
SQL> insert into uniq select * from garbage
2 log errors into uniq_err reject limit unlimited;
10 rows inserted
SQL> select * from uniq;
ID
---------------------------------------
1
2
3
4
5
6
7
8
9
10
10 rows selected
SQL> select ora_err_mesg$, id from uniq_err;
ORA_ERR_MESG$ ID
---------------------------------------------------------------------- --
ORA-00001: unique constraint (TEST.SYS_C0010568) violated 3
ORA-00001: unique constraint (TEST.SYS_C0010568) violated 5
Thought I would make this an answer, but it really depends on what you're trying to achieve.
You can check to see if the data is already in table2 using:
INSERT INTO mytable2 (c1, c2, c3)
SELECT DISTINCT c1,c2,c3 FROM mytable t1
WHERE c1 > 100
AND NOT EXISTS
(select 1 from mytable2 t2 where t2.c1 = t1.c1 and t2.c2 = t1.c2 and t2.c3 = t1.c3);
or you can use a merge like this:
MERGE INTO mytable2 m2
USING (SELECT DISTINCT c1, c2, c3 FROM mytable) m1
ON (m1.c1 = m2.c1 and m1.c2 = m2.c2 and m1.c3 = m2.c3)
WHEN NOT MATCHED THEN INSERT (c1, c2, c3) VALUES (m1.c1, m1.c2, m1.c3)
where m1.c1 > 100;
In both examples, you will only insert unique rows into mytable2.

How to update a table with null values with data from other table at one time?

I have 2 tables, A and B. Table A has two columns, pkey (primary key) and col1. Table B also has two columns, pr_key (primary key but not a foreign key) and column1. Both tables have 4 rows. Table B has no values in column1, while table A has column1 values for all 4 rows. So my data looks like this:
Table A
pkey   col1
A      10
B      20
C      30
D      40

Table B
pr_key column1
A      null
B      null
C      null
D      null
I want to update table B to set the column1 value of each row equal to the column1 value of the equivalent row from table A in a single DML statement.
It should be something like this (it depends on the SQL implementation you use, but in general the following is rather standard; in particular it should work in MS-SQL and in MySQL):
INSERT INTO tblB (pr_key, column1)
SELECT pkey, col1
FROM tblA
-- WHERE some condition (if you don't want 100% of A to be copied)
The question is a bit unclear as to the nature of tblB's pr_key. If for some reason this were a default/auto-incremented key for that table, it could simply be omitted from both the column list (in parentheses) and the SELECT that follows; in that case a new value would be generated for each inserted row.
Edit: It appears the OP actually wants to update table B with values from A.
The syntax for this should then be something like the following (this UPDATE ... FROM join form is SQL Server style; in Oracle you would use MERGE or a correlated subquery, as in the answers below):
UPDATE tblB
SET Column1 = A.Col1
FROM tblA AS A
JOIN tblB AS B ON B.pr_key = A.pkey
This may perform better:
MERGE INTO tableB
USING (select pkey, col1 from tableA) a
ON (tableB.pr_key = a.pkey)
WHEN MATCHED THEN UPDATE
SET tableB.column1 = a.col1;
It sounds like you want to do a correlated update. The syntax for that in Oracle is
UPDATE tableB b
SET column1 = (SELECT a.column1
FROM tableA a
WHERE a.pkey = b.pr_key)
WHERE EXISTS( SELECT 1
FROM tableA a
WHERE a.pkey = b.pr_key )
The WHERE EXISTS clause isn't necessary if tableA and tableB each have 4 rows and have the same set of keys in each. It is much safer to include that option, though, to avoid updating column1 values of tableB to NULL if there is no matching row in tableA.
