How to clear table in oracle - oracle

I have a data table in Oracle 8.1 with about a million rows. Many of the rows are duplicates on the same columns. I need to know the fastest way to clean up this data.
For example I have:
id name surname date
21 'john' 'smith' '2012 12 12';
21 'john' 'smith' '2012 12 13';
21 'john' 'smith' '2012 12 14';
....
Now I need to delete the first two rows, since they are duplicates on the first three columns, and keep the row with the latest date.

If there are really lots of duplicates, I'd recommend recreating the table with only the clean data:
CREATE TABLE tmp_table AS
SELECT id, name, surname, max(d) as d
FROM your_table
GROUP BY id, name, surname;
and then replace the original table with the new one:
RENAME your_table TO old_table;
RENAME tmp_table TO your_table;
Don't forget to move indexes, constraints and privileges...
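The recreate-and-rename idea above can be tried on a toy table. This is only a minimal sketch: it uses Python's built-in sqlite3 module as a stand-in for Oracle (so the rename is written as ALTER TABLE ... RENAME TO), and the table names, column name d and sample rows are assumptions made for illustration.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for Oracle (assumption).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical table and sample data; dates are ISO strings so MAX() orders them correctly.
cur.execute("CREATE TABLE your_table (id INTEGER, name TEXT, surname TEXT, d TEXT)")
cur.executemany(
    "INSERT INTO your_table VALUES (?, ?, ?, ?)",
    [
        (21, "john", "smith", "2012-12-12"),
        (21, "john", "smith", "2012-12-13"),
        (21, "john", "smith", "2012-12-14"),
        (22, "jane", "doe", "2012-12-01"),
    ],
)

# Build the clean copy: one row per (id, name, surname) carrying the latest date.
cur.execute(
    """
    CREATE TABLE tmp_table AS
    SELECT id, name, surname, MAX(d) AS d
    FROM your_table
    GROUP BY id, name, surname
    """
)

# Swap the tables by renaming (SQLite syntax; Oracle uses RENAME ... TO ...).
cur.execute("ALTER TABLE your_table RENAME TO old_table")
cur.execute("ALTER TABLE tmp_table RENAME TO your_table")

rows = cur.execute(
    "SELECT id, name, surname, d FROM your_table ORDER BY id"
).fetchall()
print(rows)
```

The old table is kept under a new name, so nothing is lost until it is dropped explicitly.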

delete from mytab t where
exists (select * from mytab where id=t.id and name=t.name and surname=t.surname
and date1 > t.date1)
How fast this is depends on your Oracle parameters. An index on (id,name,surname) might help.
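The correlated delete above can be checked on a small data set. This is a rough sketch using Python's sqlite3 module rather than Oracle; the table name mytab, the column name date1, the index name and the sample rows are all assumptions for illustration.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for Oracle (assumption).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical table; date1 holds ISO date strings so ">" compares chronologically.
cur.execute("CREATE TABLE mytab (id INTEGER, name TEXT, surname TEXT, date1 TEXT)")
cur.executemany(
    "INSERT INTO mytab VALUES (?, ?, ?, ?)",
    [
        (21, "john", "smith", "2012-12-12"),
        (21, "john", "smith", "2012-12-13"),
        (21, "john", "smith", "2012-12-14"),
        (22, "jane", "doe", "2012-12-01"),
    ],
)

# Index on the duplicate key, as suggested in the answer.
cur.execute("CREATE INDEX mytab_key ON mytab (id, name, surname)")

# Delete every row for which a newer row with the same key exists.
cur.execute(
    """
    DELETE FROM mytab
    WHERE EXISTS (
        SELECT 1
        FROM mytab AS newer
        WHERE newer.id = mytab.id
          AND newer.name = mytab.name
          AND newer.surname = mytab.surname
          AND newer.date1 > mytab.date1
    )
    """
)
deleted = cur.rowcount

remaining = cur.execute(
    "SELECT id, name, surname, date1 FROM mytab ORDER BY id"
).fetchall()
print(deleted, remaining)
```

Rows that tie on the latest date are all kept by this approach, since none of them has a strictly newer sibling.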

If possible, I'd go for a CTAS (create table as select), truncate the original table, and copy the data back:
-- create the temp table (it contains only the latest values for a given (id, name, surname) triple)
CREATE TABLE tmp as
SELECT id, name, surname, date1 from
(select
t1.*,
row_number() over (partition by id, name, surname order by date1 desc) rn
from mytab t1)
where rn = 1;
-- clear the original table
TRUNCATE TABLE mytab;
-- copy the data back
INSERT /*+ APPEND */ INTO mytab(id,name,surname,date1)
(SELECT id,name,surname,date1 from tmp);

Related

Can we use an insert overwrite after using insert all

In Snowflake I am trying to insert updated records into a table. Then I want to flag the records that were just inserted as the most recent ones and save that as the final table output, in a new column called ACTIVE that will be either true or false. I am having trouble incorporating some sort of table-update step into my current query. I need everything to be contained in the same query rather than broken up into separate parts.
I have my table as follows
CREATE TABLE IF NOT EXISTS MY_TABLE
(
LINK_ID BINARY NOT NULL,
LOAD TIMESTAMP NOT NULL,
SOURCE STRING NOT NULL,
SOURCE_DATE TIMESTAMP NOT NULL,
ORDER BIGINT NOT NULL,
ID BINARY NOT NULL,
ATTRIBUTE_ID BINARY NOT NULL
);
I have records being inserted in this way:
INSERT ALL
WHEN HAS_DATA AND ID_SEQ_NUM > 1 AND (SELECT COUNT(1) FROM MY_TABLE WHERE ID = KEY) = 0 THEN
INTO MY_TABLE VALUES (
LINK_KEY,
TIME,
DATASET_NAME,
DATASET_DATE,
ORDER_NUMBER,
O_KEY,
OA_KEY
)
SELECT *
FROM TEST_TABLE;
I would like my final table from this to be the output as
SELECT *, ORDER != MAX(ORDER) OVER (PARTITION BY ID) AS ACTIVE
FROM MY_TABLE;
This is so I can identify the most recent record per ID group as ACTIVE/TRUE and the previous records within that ID group as INACTIVE/FALSE
I tried to use an insert overwrite method like this
INSERT ALL
WHEN HAS_DATA AND ID_SEQ_NUM > 1 AND (SELECT COUNT(1) FROM MY_TABLE WHERE ID = KEY) = 0 THEN
INTO MY_TABLE VALUES (
LINK_KEY,
TIME,
DATASET_NAME,
DATASET_DATE,
ORDER_NUMBER,
O_KEY,
OA_KEY
)
INSERT OVERWRITE INTO MY_TABLE
SELECT *, RSRC_OFFSET != MAX(RSRC_OFFSET) OVER (PARTITION BY ID) AS ACTIVE
FROM L_OPTION_OPTION_ALLOCATION_TEST
SELECT *
FROM MY_TABLE;
However, it seems the insert overwrite doesn't work in this way (also I am not sure if I can just add a new column to the table like this?). Is there a way I can incorporate it into this query or a different way to update the table with this new ACTIVE column within this query itself?
Also I am using INSERT ALL here because I actually have multiple different tables I am inserting into at once, but this is the current table that I am trying to modify.
You can use the overwrite option with conditional multi-table inserts.
Starting with your current statement:
INSERT ALL
WHEN HAS_DATA AND ID_SEQ_NUM > 1 AND (SELECT COUNT(1) FROM MY_TABLE WHERE ID = KEY) = 0 THEN
INTO MY_TABLE VALUES (
LINK_KEY,
TIME,
DATASET_NAME,
DATASET_DATE,
ORDER_NUMBER,
O_KEY,
OA_KEY
)
SELECT *
FROM TEST_TABLE;
Add the overwrite option immediately after the insert command:
INSERT OVERWRITE ALL
WHEN HAS_DATA AND ID_SEQ_NUM > 1 AND (SELECT COUNT(1) FROM MY_TABLE WHERE ID = KEY) = 0 THEN
INTO MY_TABLE VALUES (
LINK_KEY,
TIME,
DATASET_NAME,
DATASET_DATE,
ORDER_NUMBER,
O_KEY,
OA_KEY
)
SELECT *
FROM TEST_TABLE;
Note that this will truncate and insert into ALL tables in the multi-table insert. There is no way to be selective about which tables get truncated and which don't.
https://docs.snowflake.com/en/sql-reference/sql/insert-multi-table.html#optional-parameters

SQL delete rows not in another table

I'm looking for a good SQL approach (Oracle database) to fulfill the following requirements:
Delete rows from Table A that are not present in Table B.
Both tables have identical structure
Some fields are nullable
The number of columns and rows is huge (more than 100k rows and 20-30 columns to compare)
Every single field of every single row needs to be compared from Table A against table B.
This requirement comes from a process that must run every day, as changes will come from Table B.
In other words: Table A Minus Table B => Delete the records from the Table A
delete from Table A
where (field1, field2, field3) in
(select field1, field2, field3
from Table A
minus
select field1, field2, field3
from Table B);
It's very important to mention that a normal MINUS within a DELETE ... IN clause fails, because the IN comparison does not take the nulls in nullable fields into consideration (the result is unknown to Oracle, so there is no match).
I also tried EXISTS with success, but I had to use the NVL function to replace the nulls with dummy values, which I don't want, as I cannot guarantee that the replacement value in NVL will never appear as a valid value in the field.
Does anybody know a way to accomplish such thing? Please remember performance and nullable fields as "a must".
Thanks ever
decode finds sameness (even if both values are null):
decode( field1, field2, 1, 0 ) = 1
To delete rows in table1 not found in table2:
delete table1 t
where t.rowid in (select t1.rowid
from table1 t1
left outer join table2 t2
on decode(t1.field1, t2.field1, 1, 0) = 1
and decode(t1.field2, t2.field2, 1, 0) = 1
and decode(t1.field3, t2.field3, 1, 0) = 1
/* ... */
where t2.rowid is null /* no matching row found */
)
To use existing indexes:
...
left outer join table2 t2
on (t1.index_field1=t2.index_field1 or
t1.index_field1 is null and t2.index_field1 is null)
and ...
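The null-safe matching idea can be sketched outside Oracle as well. The example below uses Python's sqlite3 module, where the IS operator treats two NULLs as equal and so plays the role that decode(a, b, 1, 0) = 1 plays above. It also swaps the rowid outer join for a NOT EXISTS anti-join. Both substitutions, along with the table contents, are assumptions made for illustration.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for Oracle (assumption).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.execute("CREATE TABLE table1 (field1 TEXT, field2 TEXT, field3 TEXT)")
cur.execute("CREATE TABLE table2 (field1 TEXT, field2 TEXT, field3 TEXT)")

cur.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?)",
    [
        ("a", None, "x"),  # identical row exists in table2, NULL included
        ("b", "k", None),  # table2 has a non-NULL field3 for this key
        ("c", "m", "y"),   # no counterpart in table2 at all
    ],
)
cur.executemany(
    "INSERT INTO table2 VALUES (?, ?, ?)",
    [
        ("a", None, "x"),
        ("b", "k", "z"),
    ],
)

# Delete table1 rows that have no null-safe match in table2.
# "x IS y" is true when both are NULL or both hold the same value.
cur.execute(
    """
    DELETE FROM table1
    WHERE NOT EXISTS (
        SELECT 1
        FROM table2 AS t2
        WHERE t2.field1 IS table1.field1
          AND t2.field2 IS table1.field2
          AND t2.field3 IS table1.field3
    )
    """
)

remaining = cur.execute("SELECT field1, field2, field3 FROM table1").fetchall()
print(remaining)
```

With a plain = comparison the first row would also be deleted, because NULL = NULL does not evaluate to true.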
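Use a left outer join and test for null in your where clause (this DELETE ... FROM ... JOIN form is SQL Server/MySQL syntax, which Oracle does not accept as written):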
delete a
from a
left outer join b on a.x = b.x
where b.x is null
Have you considered the Oracle SQL MERGE statement?
Use bulk operations for a huge number of records; performance-wise they will be faster.
Use a join between the two tables to get the rows to be deleted. Nullable columns can be compared by substituting some default value.
Also, if you want Table A to end up the same as Table B, why don't you truncate Table A and then insert the data from Table B?
Assuming you have the same PK field available on each table... (Having a PK or some other unique key is critical for this.)
create table table_a (id number, name varchar2(25), dob date);
insert into table_a values (1, 'bob', to_date('01-01-1978','MM-DD-YYYY'));
insert into table_a values (2, 'steve', null);
insert into table_a values (3, 'joe', to_date('05-22-1989','MM-DD-YYYY'));
insert into table_a values (4, null, null);
insert into table_a values (5, 'susan', to_date('08-08-2005','MM-DD-YYYY'));
insert into table_a values (6, 'juan', to_date('11-17-2001', 'MM-DD-YYYY'));
create table table_b (id number, name varchar2(25), dob date);
insert into table_b values (1, 'bob', to_date('01-01-1978','MM-DD-YYYY'));
insert into table_b values (2, 'steve',to_date('10-14-1992','MM-DD-YYYY'));
insert into table_b values (3, null, to_date('05-22-1989','MM-DD-YYYY'));
insert into table_b values (4, 'mary', to_date('12-08-2012','MM-DD-YYYY'));
insert into table_b values (5, null, null);
commit;
-- confirm minus is working
select id, name, dob
from table_a
minus
select id, name, dob
from table_b;
-- from the minus, re-query to just get the key, then delete by key
delete table_a where id in (
select id from (
select id, name, dob
from table_a
minus
select id, name, dob
from table_b)
);
commit;
select * from table_a;
But if at some point tableA is to be reset to match tableB, why not, as another answer suggested, truncate tableA and insert everything from tableB?
100K is not huge. I can do ~100K truncate and insert on my laptop instance in less than 1 second.
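The minus-then-delete-by-key steps can be replayed on the same sample rows. This is a sketch only: it uses Python's sqlite3 module, where the set-difference operator is spelled EXCEPT instead of MINUS, and the dates are stored as ISO strings rather than Oracle DATE values. Both are assumptions of the sketch.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for Oracle (assumption).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.execute("CREATE TABLE table_a (id INTEGER, name TEXT, dob TEXT)")
cur.execute("CREATE TABLE table_b (id INTEGER, name TEXT, dob TEXT)")

cur.executemany(
    "INSERT INTO table_a VALUES (?, ?, ?)",
    [
        (1, "bob", "1978-01-01"),
        (2, "steve", None),
        (3, "joe", "1989-05-22"),
        (4, None, None),
        (5, "susan", "2005-08-08"),
        (6, "juan", "2001-11-17"),
    ],
)
cur.executemany(
    "INSERT INTO table_b VALUES (?, ?, ?)",
    [
        (1, "bob", "1978-01-01"),
        (2, "steve", "1992-10-14"),
        (3, None, "1989-05-22"),
        (4, "mary", "2012-12-08"),
        (5, None, None),
    ],
)

# Set difference compares whole rows and treats NULLs as equal to each other,
# so only the key is carried forward into the DELETE.
cur.execute(
    """
    DELETE FROM table_a
    WHERE id IN (
        SELECT id FROM (
            SELECT id, name, dob FROM table_a
            EXCEPT
            SELECT id, name, dob FROM table_b
        )
    )
    """
)

remaining = cur.execute("SELECT id, name, dob FROM table_a ORDER BY id").fetchall()
print(remaining)
```

Only the row that is identical in both tables survives; every row that differs in any column, or is missing from table_b, is removed.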
DELETE FROM purchase WHERE clientcode NOT IN (
SELECT clientcode FROM client );
This deletes the rows from the purchase table whose clientcode is not in the client table. The clientcode column of the purchase table references the clientcode column of the client table.
DELETE FROM TABLE1 WHERE FIELD1 NOT IN (SELECT CLIENT1 FROM TABLE2);

Oracle Comma Separated Value (ID) in a Column. How to get Description for each Value in a Comma Separated string.

Sorry for the confusing title. I did not understand it myself when I read it a second time.
So here is the detailed description.
I have a table, say "Awards", which has the following columns:
Name,
Amount,
Employee
and another table, "Employee", which has the following columns:
Emp_Id,
Emp_Name
In the Employee column of the "Awards" table I have a value such as "01,20", whose parts are actually employee IDs referencing the "Employee" table.
So is there any way I can get the employee names in a select query on "Awards"?
Here is one method:
select a.*, e.emp_name
from Awards a join
Employee e
on ','||a.employee||',' like '%,'||e.emp_id||',%';
This will return the employee names on separate lines. If you want them in a list, then you would need to concatenate them together (and the best function for doing that depends on your version of Oracle).
By the way, this is a very bad data structure. You should have an association table AwardEmployee that has one row for each award and each employee.
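The delimiter-wrapped LIKE join can be tried on a couple of rows. The following is a minimal sketch using Python's sqlite3 module in place of Oracle. The sample awards, the employee names and the choice to store emp_id as text (so that leading zeros survive) are assumptions for illustration.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for Oracle (assumption).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# emp_id is stored as text so that "01" keeps its leading zero (assumption).
cur.execute("CREATE TABLE Awards (name TEXT, amount INTEGER, employee TEXT)")
cur.execute("CREATE TABLE Employee (emp_id TEXT, emp_name TEXT)")

cur.executemany(
    "INSERT INTO Awards VALUES (?, ?, ?)",
    [("Best Team", 500, "01,20"), ("Solo", 100, "02")],
)
cur.executemany(
    "INSERT INTO Employee VALUES (?, ?)",
    [("01", "Alice"), ("20", "Bob"), ("02", "Carol"), ("1", "Dave")],
)

# Wrapping both sides in commas makes "1" match only a whole list element,
# never a fragment of "01" or "20".
rows = cur.execute(
    """
    SELECT a.name, e.emp_name
    FROM Awards a
    JOIN Employee e
      ON (',' || a.employee || ',') LIKE ('%,' || e.emp_id || ',%')
    ORDER BY a.name, e.emp_name
    """
).fetchall()
print(rows)
```

Each award comes back once per matching employee, which is the "separate lines" shape described above.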
Given below is a query that turns the comma-separated employee IDs into rows, which I put in a subquery to get the names. Please edit it as per your requirements.
Select emp_name from employee where emp_id in (
SELECT trim(x.column_value.extract('e/text()')) COLUMNS
from awards t, table (xmlsequence(xmltype('<e><e>' || replace(Employee,',','</e><e>')||
'</e></e>').extract('e/e'))) x )
I have changed the database (added one more table) and have already started changing the code. For the report in question I have used the following:
WITH t AS
(
Select emp_name from employee where emp_id in (
select regexp_substr(Employee ,'[^,]+', 1, level) from awards
connect by regexp_substr((select Employee from awards ), '[^,]+', 1, level) is
not null)
)
SELECT LTRIM(SYS_CONNECT_BY_PATH(emp_name, ','),',') emp_name
FROM ( SELECT emp_name,
ROW_NUMBER() OVER (ORDER BY emp_name) FILA
FROM t )
WHERE CONNECT_BY_ISLEAF = 1
START WITH FILA = 1
CONNECT BY PRIOR FILA = FILA - 1
This is temporary, and I understand very little of it.
Thanks for your help and suggestions.

Delete Duplicate rows in Vertica database

Vertica allows duplicates to be inserted into the tables. I can view those using the 'analyze_constraints' function.
How to delete duplicate rows from Vertica tables?
You should try to avoid/limit using DELETE with a large number of records. The following approach should be more effective:
Step 1 Create a new table with the same structure / projections as the one containing duplicates:
create table mytable_new like mytable including projections ;
Step 2 Insert into this new table de-duplicated rows:
insert /*+ direct */ into mytable_new select <column list> from (
select * , row_number() over ( partition by <pk column list> ) as rownum from <table-name>
) a where a.rownum = 1 ;
Step 3 rename the original table (the one containing dups):
alter table mytable rename to mytable_orig ;
Step 4 rename the new table:
alter table mytable_new rename to mytable ;
That's all.
The answer of Mauro is correct, but there is an error in the sql of step 2. So, the complete way of working by avoiding DELETE should then be as follows:
Step 1 Create a new table with the same structure / projections as the one containing duplicates:
create table mytable_new like mytable including projections ;
Step 2 Insert into this new table de-duplicated rows:
insert /*+ direct */ into mytable_new select <column list> from (
select * , row_number() over ( partition by <pk column list> ) as rownum from mytable
) a where a.rownum = 1 ;
Step 3 rename the original table (the one containing dups):
alter table mytable rename to mytable_orig ;
Step 4 rename the new table:
alter table mytable_new rename to mytable ;
Off the top of my head, and not a great answer so let's let this be the final word, you can delete both and insert one back in.
You can delete duplicates from Vertica tables by creating a temporary table and generating pseudo row_ids. Here are a few steps, useful especially if you are removing duplicates from very large and wide tables. In the example below, I assume that the k1 and k2 rows each have more than one duplicate.
-- Step 1: Find the duplicates
select keys, count(1) from large-table-1
where [where-conditions]
group by 1
having count(1) > 1
order by count(1) desc ;
-- Step 2: Dump the duplicates into temp table
create table test.large-table-1-dups
like large-table-1;
alter table test.large-table-1-dups -- add row_num column (pseudo row_id)
add column row_num int;
insert into test.large-table-1-dups
select *, ROW_NUMBER() OVER(PARTITION BY key)
from large-table-1
where key in ('k1', 'k2'); -- where, say, k1 has n and k2 has m exact dups
-- Step 3: Remove duplicates from the temp table
delete from test.large-table-1-dups
where row_num > 1;
select * from test.large-table-1-dups;
-- Sanity test. Should have 1 row each of k1 & k2 rows above
-- Step 4: Delete all duplicates from main table...
delete from large-table-1
where key in ('k1', 'k2');
-- Step 5: Insert data back into main table from temp dedupe data
alter table test.large-table-1-dups
drop column row_num;
insert into large-table-1
select * from test.large-table-1-dups;
Step 1: Create an intermediate table and load the data from the original table into it along with a row number.
In the sample below, data is copied from Table1 to Table2 along with a row_num column:
select * into Table2 from (select *, ROW_NUMBER() OVER(PARTITION BY A,B order by C)as row_num from Table1 ) A;
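Step 2: Delete data from Table1 using the Table2 created in the step above. Note that this removes every copy of a duplicated (A, B) pair from Table1, including the first one, so re-insert the row_num = 1 rows from Table2 if one copy must be kept: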
DELETE FROM Table1 WHERE EXISTS (SELECT NULL FROM Table2
where Table2.A=Table1.A
and Table2.B=Table1.B
and row_num > 1);
Step 3: Drop the table created in step 1, i.e. Table2:
Drop Table Table2;
You should have a look at this answer from the PostgreSQL wiki which also works for Vertica:
DELETE
FROM
tablename
WHERE
id IN(
SELECT
id
FROM
(
SELECT
id,
ROW_NUMBER() OVER(
partition BY column1,
column2,
column3
ORDER BY
id
) AS rnum
FROM
tablename
) t
WHERE
t.rnum > 1
);
It deletes all duplicate entries but the one with the lowest id.
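The ROW_NUMBER-based delete can be exercised on a toy table. This sketch uses Python's sqlite3 module instead of Vertica or PostgreSQL, which assumes the bundled SQLite is version 3.25 or later (the first release with window functions). The table contents are made up for illustration.

```python
import sqlite3

# In-memory SQLite database as a stand-in for Vertica/PostgreSQL (assumption).
# Window functions need SQLite 3.25 or newer.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.execute(
    "CREATE TABLE tablename ("
    "id INTEGER PRIMARY KEY, column1 TEXT, column2 TEXT, column3 TEXT)"
)
cur.executemany(
    "INSERT INTO tablename VALUES (?, ?, ?, ?)",
    [
        (1, "a", "b", "c"),
        (2, "a", "b", "c"),
        (3, "a", "b", "c"),
        (4, "x", "y", "z"),
        (5, "x", "y", "z"),
        (6, "p", "q", "r"),
    ],
)

# Number the rows inside each duplicate group by id and delete all but the first.
cur.execute(
    """
    DELETE FROM tablename
    WHERE id IN (
        SELECT id
        FROM (
            SELECT id,
                   ROW_NUMBER() OVER (
                       PARTITION BY column1, column2, column3
                       ORDER BY id
                   ) AS rnum
            FROM tablename
        ) t
        WHERE t.rnum > 1
    )
    """
)

remaining_ids = [row[0] for row in cur.execute("SELECT id FROM tablename ORDER BY id")]
print(remaining_ids)
```

Changing the ORDER BY inside the window (for example to a timestamp column in descending order) changes which member of each group is kept.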

ORACLE: How to select previous different value?

I have table that stores employee job name, it has the following columns:
id; date_from; date_to; emp_id; jobname_id; grade;
Each emp_id can have many consecutive records with the same jobname_id due to many grade changes.
How can I select previous different jobname_id omitting those that are the same like the most current one?
This solution uses the FIRST_VALUE() analytic function to identify each employee's current job. It then filters for all the jobs which don't match that one:
select distinct emp_id
, jobname_id
from ( select emp_id
, jobname_id
, first_value(jobname_id) over (partition by emp_id
order by date_from desc) as current_job
from employee
where emp_id = 1234 )
where jobname_id != current_job
order by emp_id, jobname_id
/
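The FIRST_VALUE idea can be checked on a few rows. This is a hedged sketch using Python's sqlite3 module in place of Oracle (SQLite 3.25 or later is assumed for window functions). It uses the column names from the question with made-up sample rows. The second query, which narrows the result to the single most recent different job, is an addition for illustration and not part of the answer above.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for Oracle (assumption).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.execute(
    "CREATE TABLE employee ("
    "id INTEGER, date_from TEXT, date_to TEXT, "
    "emp_id INTEGER, jobname_id INTEGER, grade INTEGER)"
)
cur.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "2010-01-01", "2011-01-01", 1234, 10, 1),
        (2, "2011-01-01", "2012-01-01", 1234, 20, 1),
        (3, "2012-01-01", "2013-01-01", 1234, 30, 1),
        (4, "2013-01-01", None, 1234, 30, 2),  # current job, after a grade change
    ],
)

# All earlier jobs that differ from the current one.
other_jobs = cur.execute(
    """
    SELECT DISTINCT emp_id, jobname_id
    FROM (
        SELECT emp_id, jobname_id,
               FIRST_VALUE(jobname_id) OVER (
                   PARTITION BY emp_id ORDER BY date_from DESC
               ) AS current_job
        FROM employee
        WHERE emp_id = 1234
    )
    WHERE jobname_id != current_job
    ORDER BY emp_id, jobname_id
    """
).fetchall()

# Illustrative extension: only the most recent job that differs from the current one.
previous_job = cur.execute(
    """
    SELECT jobname_id
    FROM (
        SELECT jobname_id, date_from,
               FIRST_VALUE(jobname_id) OVER (
                   PARTITION BY emp_id ORDER BY date_from DESC
               ) AS current_job
        FROM employee
        WHERE emp_id = 1234
    )
    WHERE jobname_id != current_job
    ORDER BY date_from DESC
    LIMIT 1
    """
).fetchone()[0]

print(other_jobs, previous_job)
```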
Will this work for your issue?
SELECT DISTINCT
e1.emp_id,
e1.jobname_id
FROM employee e1
WHERE NOT EXISTS
(SELECT 1
FROM employee e2
WHERE e1.emp_id = e2.emp_id
AND e1.jobname_id = e2.jobname_id
AND SYSDATE BETWEEN e2.date_from
AND NVL(e2.date_to, SYSDATE + 1));
(This assumes your table is named "employee" and emp_id is the PK value).
It selects unique emp_id, jobname_id values where the emp_id, jobname_id values are not current.
EDIT: I agree with Chin Boon that fundamentally this is a design issue and perhaps that should be addressed rather than working around the problem.

Resources