You can insert into table B the records that are in table A but not yet in B:
insert into B
select * from A
minus
select * from B
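The insert above can be tried out with a minimal sketch, using Python's built-in sqlite3 module as a stand-in for Oracle. SQLite spells MINUS as EXCEPT, and the columns and sample rows here are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE A (id INTEGER, name TEXT);
    CREATE TABLE B (id INTEGER, name TEXT);
    INSERT INTO A VALUES (1, 'x'), (2, 'y'), (3, 'z');
    INSERT INTO B VALUES (1, 'x');

    -- SQLite spells Oracle's MINUS as EXCEPT
    INSERT INTO B
    SELECT * FROM A
    EXCEPT
    SELECT * FROM B;
    """
)

# B now holds its original row plus the two rows it was missing
rows_in_b = conn.execute("SELECT id, name FROM B ORDER BY id").fetchall()
print(rows_in_b)  # [(1, 'x'), (2, 'y'), (3, 'z')]
```

Note that the set operator also collapses duplicate rows within A, so identical rows in A are inserted only once.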
Records that are in B but not in A can be selected like this:
select * from B
minus
select * from A
But how do you delete those records?
Assume the tables have no primary key or unique key.
You can do it like this:
delete from a2
where (COLUMN1, COLUMN2, COLUMN3, ...) in (select * from a2
minus
select * from a1);
It works, but you have to list the column names in the WHERE clause. Is there no way to write the delete as neatly as insert into ... select * from ...?
If you want to continue with your current approach, you should not use select * in your minus query. Rather, you should always explicitly list the columns you want to compare.
But I would use a NOT EXISTS query here:
DELETE
FROM tableB b
WHERE NOT EXISTS (SELECT 1 FROM tableA a
WHERE a.col1 = b.col1 AND
a.col2 = b.col2 AND
a.col3 = b.col3 AND ...);
Best practice going forward would be to have a primary key column in the A table. Then you would only need to check that one column against the B table.
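The NOT EXISTS delete above can be checked with a small sketch, again using Python's sqlite3 module as a stand-in for Oracle. The names tableA, tableB and col1 to col3 follow the answer; the sample rows are invented, and the table is referenced by name rather than by alias to stay portable across SQLite versions.

```python
import sqlite3

def delete_b_rows_missing_from_a(conn):
    """Delete every row of tableB that has no identical row in tableA."""
    conn.execute(
        """
        DELETE FROM tableB
        WHERE NOT EXISTS (SELECT 1 FROM tableA a
                          WHERE a.col1 = tableB.col1
                            AND a.col2 = tableB.col2
                            AND a.col3 = tableB.col3)
        """
    )

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tableA (col1 INTEGER, col2 TEXT, col3 TEXT);
    CREATE TABLE tableB (col1 INTEGER, col2 TEXT, col3 TEXT);
    INSERT INTO tableA VALUES (1, 'x', 'p'), (2, 'y', 'q');
    INSERT INTO tableB VALUES (1, 'x', 'p'), (2, 'y', 'CHANGED'), (3, 'z', 'r');
    """
)
delete_b_rows_missing_from_a(conn)

# Only the row that also exists in tableA survives
remaining_b = conn.execute(
    "SELECT col1, col2, col3 FROM tableB ORDER BY col1"
).fetchall()
print(remaining_b)  # [(1, 'x', 'p')]
```

One difference from the MINUS version is worth keeping in mind: = never matches NULL, so a tableB row with NULL in a compared column is deleted even if tableA holds an identical row, whereas MINUS treats NULLs as equal.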
I have a set of partitions from 1 to 20.
I have this query:
select * from table1 partition (1);
I would like to run the same statement on two or three partitions at once, but not on the whole table.
What would be the correct query to do it?
select * from table1 partition (1)
UNION ALL
select * from table1 partition (2)
UNION ALL
select * from table1 partition (3)
--etc.
;
I'm using Oracle 11g. I want to know why these two queries give different results.
Logically they look the same:
select * from tableA where
exists (select * from tableB where tableA.ID != tableB.ID);
select * from tableA where
not exists (select * from tableB where tableA.ID = tableB.ID);
In the first one I'm selecting everything that does not exist.
In the second one I'm not selecting everything that exists.
Note that "exists" changed to "not exists" and "!=" changed to "=".
They look the same, right? But they give totally different results.
This statement is probably going to return all values in A:
select *
from tableA
where exists (select * from tableB where tableA.ID != tableB.ID);
The only time a row of tableA fails to match is when its ID equals every non-NULL ID in tableB (or when tableB has no non-NULL IDs at all, or the row's own ID is NULL). So, if tableB has at least two rows with different IDs, every tableA row with a non-NULL ID will be returned.
This statement:
select *
from tableA
where not exists (select * from tableB where tableA.ID = tableB.ID);
says that there is no ID in tableB that matches the ID in tableA. This would be what you want 99% of the time.
The first statement returns A values that differ from at least one B value.
The second statement returns A values that differ from every B value.
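To make the difference concrete, here is a minimal sketch that runs both queries on a tiny data set, with Python's sqlite3 module standing in for Oracle 11g. The table names come from the question; the ID values are assumptions chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tableA (ID INTEGER);
    CREATE TABLE tableB (ID INTEGER);
    INSERT INTO tableA VALUES (1), (2), (3);
    INSERT INTO tableB VALUES (1), (2);
    """
)

# "Some row in tableB has a different ID": true for every row of tableA here
exists_not_equal = [row[0] for row in conn.execute(
    "SELECT ID FROM tableA WHERE EXISTS "
    "(SELECT * FROM tableB WHERE tableA.ID != tableB.ID) ORDER BY ID"
)]

# "No row in tableB has the same ID": true only for ID 3
not_exists_equal = [row[0] for row in conn.execute(
    "SELECT ID FROM tableA WHERE NOT EXISTS "
    "(SELECT * FROM tableB WHERE tableA.ID = tableB.ID) ORDER BY ID"
)]

print(exists_not_equal)  # [1, 2, 3]
print(not_exists_equal)  # [3]
```

Because tableB holds two different IDs, each tableA row finds at least one tableB row it differs from, so the first query filters out nothing.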
Vertica allows duplicates to be inserted into the tables. I can view those using the 'analyze_constraints' function.
How do I delete duplicate rows from Vertica tables?
You should try to avoid/limit using DELETE with a large number of records. The following approach should be more effective:
Step 1: Create a new table with the same structure / projections as the one containing duplicates:
create table mytable_new like mytable including projections ;
Step 2: Insert the de-duplicated rows into this new table:
insert /*+ direct */ into mytable_new select <column list> from (
select * , row_number() over ( partition by <pk column list> ) as rownum from <table-name>
) a where a.rownum = 1 ;
Step 3: Rename the original table (the one containing dups):
alter table mytable rename to mytable_orig ;
Step 4: Rename the new table:
alter table mytable_new rename to mytable ;
That's all.
Mauro's answer is correct, but there is an error in the SQL of step 2. The complete procedure that avoids DELETE should be as follows:
Step 1: Create a new table with the same structure / projections as the one containing duplicates:
create table mytable_new like mytable including projections ;
Step 2: Insert the de-duplicated rows into this new table:
insert /*+ direct */ into mytable_new select <column list> from (
select * , row_number() over ( partition by <pk column list> ) as rownum from mytable
) a where a.rownum = 1 ;
Step 3: Rename the original table (the one containing dups):
alter table mytable rename to mytable_orig ;
Step 4: Rename the new table:
alter table mytable_new rename to mytable ;
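The four steps above can be rehearsed on a toy table. This is only a sketch under stated assumptions: it uses Python's sqlite3 module instead of Vertica, so CREATE TABLE ... AS SELECT ... WHERE 0 stands in for CREATE TABLE ... LIKE ... INCLUDING PROJECTIONS, there is no direct hint, and the columns id and name are hypothetical. Window functions need SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE mytable (id INTEGER, name TEXT);
    INSERT INTO mytable VALUES
        (1, 'a'), (1, 'a'), (2, 'b'), (3, 'c'), (3, 'c'), (3, 'c');

    -- Step 1: empty copy of the structure
    CREATE TABLE mytable_new AS SELECT * FROM mytable WHERE 0;

    -- Step 2: keep exactly one row per (id, name) group
    INSERT INTO mytable_new
    SELECT id, name FROM (
        SELECT id, name,
               ROW_NUMBER() OVER (PARTITION BY id, name) AS rn
        FROM mytable
    ) a WHERE a.rn = 1;

    -- Steps 3 and 4: swap the tables by renaming
    ALTER TABLE mytable RENAME TO mytable_orig;
    ALTER TABLE mytable_new RENAME TO mytable;
    """
)

deduped = conn.execute("SELECT id, name FROM mytable ORDER BY id").fetchall()
original_count = conn.execute("SELECT COUNT(*) FROM mytable_orig").fetchone()[0]
print(deduped)         # [(1, 'a'), (2, 'b'), (3, 'c')]
print(original_count)  # 6
```

The original rows stay available in mytable_orig until you decide to drop it, which makes the swap easy to undo.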
Off the top of my head, and not a great answer, so let's not let this be the final word: you can delete both copies and insert one back in.
You can delete duplicates from Vertica tables by creating a temporary table and generating pseudo row_ids. Here are the steps, which are especially useful when removing duplicates from very large and wide tables. In the example below, I assume that keys k1 and k2 each have more than one duplicate row.
-- Step 1: Find the duplicates
select key, count(1) from large-table-1
where [where-conditions]
group by 1
having count(1) > 1
order by count(1) desc ;
-- Step 2: Dump the duplicates into temp table
create table test.large-table-1-dups
like large-table-1;
alter table test.large-table-1-dups -- add row_num column (pseudo row_id)
add column row_num int;
insert into test.large-table-1-dups
select *, ROW_NUMBER() OVER(PARTITION BY key)
from large-table-1
where key in ('k1', 'k2'); -- where, say, k1 has n and k2 has m exact dups
-- Step 3: Remove duplicates from the temp table
delete from test.large-table-1-dups
where row_num > 1;
select * from test.large-table-1-dups;
-- Sanity test. Should have 1 row each of k1 & k2 rows above
-- Step 4: Delete all duplicates from main table...
delete from large-table-1
where key in ('k1', 'k2');
-- Step 5: Insert data back into main table from temp dedupe data
alter table test.large-table-1-dups
drop column row_num;
insert into large-table-1
select * from test.large-table-1-dups;
Step 1: Create an intermediate table and load the data from the original table into it along with a row number.
In the sample below, data is copied from Table1 to Table2 along with a row_num column:
select * into Table2 from (select *, ROW_NUMBER() OVER(PARTITION BY A,B order by C)as row_num from Table1 ) A;
Step 2: Delete data from Table1 using the Table2 created in the step above:
DELETE FROM Table1 WHERE EXISTS (SELECT NULL FROM Table2
where Table2.A=Table1.A
and Table2.B=Table1.B
and row_num > 1);
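Step 3: Note that the delete in step 2 removes every row of a duplicated (A, B) group from Table1, including the copy you want to keep, because Table1 has no row_num to tell the copies apart. Re-insert the row_num = 1 rows of those groups from Table2 (listing the columns explicitly, since Table2 has the extra row_num column), then drop Table2, the table created in step 1: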
Drop Table Table2;
You should have a look at this answer from the PostgreSQL wiki which also works for Vertica:
DELETE FROM tablename
WHERE id IN (
    SELECT id
    FROM (
        SELECT id,
               ROW_NUMBER() OVER (PARTITION BY column1, column2, column3
                                  ORDER BY id) AS rnum
        FROM tablename
    ) t
    WHERE t.rnum > 1
);
It deletes all duplicate entries but the one with the lowest id.
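A quick way to convince yourself of that behavior is the following sketch, which runs the same statement through Python's sqlite3 module (SQLite 3.25 or newer, used here only as a stand-in for Vertica or PostgreSQL). The table and column names follow the answer; the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tablename (id INTEGER, column1 TEXT, column2 TEXT, column3 TEXT);
    INSERT INTO tablename VALUES
        (1, 'a', 'b', 'c'),
        (2, 'a', 'b', 'c'),
        (3, 'x', 'y', 'z'),
        (4, 'a', 'b', 'c'),
        (5, 'x', 'y', 'z');

    -- Number the rows inside each duplicate group and delete all but the first
    DELETE FROM tablename
    WHERE id IN (
        SELECT id FROM (
            SELECT id,
                   ROW_NUMBER() OVER (PARTITION BY column1, column2, column3
                                      ORDER BY id) AS rnum
            FROM tablename
        ) t
        WHERE t.rnum > 1
    );
    """
)

surviving_ids = [row[0] for row in conn.execute("SELECT id FROM tablename ORDER BY id")]
print(surviving_ids)  # [1, 3]
```

This approach relies on id being unique per row. If whole rows are duplicated, id included, the IN filter cannot single out one copy and the table-swap approach is the safer choice.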
I have created a view that pulls data from different tables. I used 10 select statements and combined their results using UNION ALL.
I want to add a primary key column to my view, because I have to create an XML file from the data in this view and my XML building application needs a primary key column for some of its processing.
I added rownum to all my select statements, but that returned duplicate IDs, because rownum starts from 1 in each select statement.
Then I created a sequence and tried to use nextval. But I can't use a sequence, because my select statements have group by and order by.
Is there any way to do that?
You can do a select over the union, for example:
SELECT rownum(),*
FROM (SELECT * FROM tableA UNION ALL SELECT * FROM tableB)
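UPDATED: rownum is a pseudocolumn, not a function, and * must be qualified with the subquery alias when other columns are selected alongside it: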
SELECT rownum, t.*
FROM (SELECT * FROM tableA UNION ALL SELECT * FROM tableB) t
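The idea of numbering the rows once, outside the union, can be sketched with Python's sqlite3 module. SQLite has no rownum pseudocolumn, so ROW_NUMBER() OVER () is used as a stand-in here (SQLite 3.25 or newer); the name column, the row_id alias and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tableA (name TEXT);
    CREATE TABLE tableB (name TEXT);
    INSERT INTO tableA VALUES ('a1'), ('a2'), ('a3');
    INSERT INTO tableB VALUES ('b1'), ('b2');
    """
)

# Number the rows of the combined result rather than of each branch
rows = conn.execute(
    """
    SELECT ROW_NUMBER() OVER () AS row_id, t.*
    FROM (SELECT * FROM tableA UNION ALL SELECT * FROM tableB) t
    """
).fetchall()

row_ids = [row[0] for row in rows]
print(sorted(row_ids))  # [1, 2, 3, 4, 5]
```

The numbers are unique within one execution of the query, but without an explicit ordering the same row is not guaranteed to get the same number on every run, so they work as a per-result key rather than a stable identifier.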