Remove duplicated records based on pair of columns? - ruby

I have a table with the folowwing columns:
id
app_id
tag_id
created_at
updated_at
The pair app_id and tag_id should be uniq. How to remove oldest duplicated records based on updated_at column to determine "oldest" and pair (app_id, tag_id) to determine "duplicated"?

ActiveRecord::Base.connection.execute <<-SQL
delete from my_table where id in (
select t1.id
from my_table t1
inner join (
select app_id, tag_id, min(updated_at) as oldest
from my_table
group by app_id, tag_id
) t2
on t1.app_id = t2.app_id
and t1.tag_id = t2.tag_id
and t1.updated_at = t2.oldest
)
SQL
Replace my_table with your actual table name.

Related

How to find record from a very big HIVE table where column header__timestamp,header__change_seq should be latest update and id should unique

I have to find record from the hive table where Id, der__timestamp, header__change_seq should be unique but in table (Id, der__timestamp, header__change_seq) can duplicate so in this case i have to fetch only one record if records are getting duplicate .
select b.*
from (SELECT ID, max(COALESCE(header__timestamp))
max_modified,MAX(CAST(header__change_seq AS DECIMAL(38,0))) max_sequence
FROM table_name group by ID) a
join table_name b on (a.id=b.id and
a.max_modified=b.header__timestamp and
a.max_sequence=b.header__change_seq)
So the total number of distinct id is count-->244441250
but through above query i am getting count-->244442548
due to some duplicate records but i have to find only distinct id where (header__change_seq and header__timestamp) should max .
#Rahul; please try this one. It makes use of row_number() so in case of duplicate id, header_timestamp and hearder_change_seq, it will select only one record. Hope it helps.
select *
from (
select *,
row_number() over ( partition by id order by header__timestamp desc, header__change_seq desc) as rnk
from table_name) t
where t.rnk = 1;

SQL delete rows not in another table

I'm looking for a good SQL approach (Oracle database) to fulfill the next requirements:
Delete rows from Table A that are not present in Table B.
Both tables have identical structure
Some fields are nullable
Amount of columns and rows is huge (more 100k rows and 20-30 columns to compare)
Every single field of every single row needs to be compared from Table A against table B.
Such requirement is owing to a process that must run every day as changes will come from Table B.
In other words: Table A Minus Table B => Delete the records from the Table A
delete from Table A
where (field1, field2, field3) in
(select field1, field2, field3
from Table A
minus
select field1, field2, field3
from Table B);
It's very important to mention that a normal MINUS within DELETE clause fails as does not take the nulls on nullable fields into consideration (unknown result for oracle, then no match).
I also tried EXISTS with success, but I have to use NVL function to replace the nulls with dummy values, which I don't want it as I cannot guarantee that the value replaced in NVL will not come as a valid value in the field.
Does anybody know a way to accomplish such thing? Please remember performance and nullable fields as "a must".
Thanks ever
decode finds sameness (even if both values are null):
decode( field1, field2, 1, 0 ) = 1
To delete rows in table1 not found in table2:
delete table1 t
where t.rowid in (select t1.rowid
from table1 t1
left outer join table2 t2
on decode(t1.field1, t2.field1, 1, 0) = 1
and decode(t1.field2, t2.field2, 1, 0) = 1
and decode(t1.field3, t2.field3, 1, 0) = 1
/* ... */
where t2.rowid is null /* no matching row found */
)
to use existing indexes
...
left outer join table2 t2
on (t1.index_field1=t2.index_field1 or
t1.index_field1 is null and t2.index_field1 is null)
and ...
Use a left outer join and test for null in your where clause
delete a
from a
left outer join b on a.x = b.x
where b.x is null
Have you considered ORALCE SQL MERGE statement?
Use Bulk operation for huge number of records. Performance wise it will be faster.
And use join between two table to get rows to be delete. Nullable columns can be compared with some default value.
Also, if you want Table A to be similar as Table B, why don't you truncate table A and then insert data from table b
Assuming you the same PK field available on each table...(Having a PK or some other unique key is critical for this.)
create table table_a (id number, name varchar2(25), dob date);
insert into table_a values (1, 'bob', to_date('01-01-1978','MM-DD-YYYY'));
insert into table_a values (2, 'steve', null);
insert into table_a values (3, 'joe', to_date('05-22-1989','MM-DD-YYYY'));
insert into table_a values (4, null, null);
insert into table_a values (5, 'susan', to_date('08-08-2005','MM-DD-YYYY'));
insert into table_a values (6, 'juan', to_date('11-17-2001', 'MM-DD-YYYY'));
create table table_b (id number, name varchar2(25), dob date);
insert into table_b values (1, 'bob', to_date('01-01-1978','MM-DD-YYYY'));
insert into table_b values (2, 'steve',to_date('10-14-1992','MM-DD-YYYY'));
insert into table_b values (3, null, to_date('05-22-1989','MM-DD-YYYY'));
insert into table_b values (4, 'mary', to_date('12-08-2012','MM-DD-YYYY'));
insert into table_b values (5, null, null);
commit;
-- confirm minus is working
select id, name, dob
from table_a
minus
select id, name, dob
from table_b;
-- from the minus, re-query to just get the key, then delete by key
delete table_a where id in (
select id from (
select id, name, dob
from table_a
minus
select id, name, dob
from table_b)
);
commit;
select * from table_a;
But, if at some point in time, tableA is to be reset to the same as tableB, why not, as another answer suggested, truncate tableA and select all from tableB.
100K is not huge. I can do ~100K truncate and insert on my laptop instance in less than 1 second.
> DELETE FROM purchase WHERE clientcode NOT IN (
> SELECT clientcode FROM client );
This deletes the rows from the purchase table whose clientcode are not in the client table. The clientcode of purchase table references the clientcode of client table.
DELETE FROM TABLE1 WHERE FIELD1 NOT IN (SELECT CLIENT1 FROM TABLE2);

Fetch single record from duplicate rows from oracle table

I have a table user_audit_records_tbl which has multiple rows for a single user ,Every time user logs in one entry is made into this table so i want a select query which will fetch a latest single record for each user, I have a query which uses IN clause.
Table Name : user_audit_records_tbl
Record_id Number Primary Key,
user_id varchar Primary Key ,
user_ip varchar,
.
.
etc
Current query i am using is
select * from user_audit_records_tbl where record_id in (select
max(record_id) from user_audit_records_tbl
group by user_id);
but was just wondering if anybody has better solution for this since this table has huge volumns.
You can use the first/last function
select max(Record_id) as Record_id,
user_id,
max(user_ip) keep (dense_rank last order by record_id) as user_ip,
...
from user_audit_records_tbl
group by user_id
No sure if it will be more efficient.
EDIT : As above query is less efficient, may be you could try an exist clause
select *
from user_audit_records_tbl A
where exists ( select 1
from user_audit_records_tbl B
where A.user_id = B.user_id
group by B.user_id
having max(B.record_id) = A.record_id
)
But maybe, you should look on the index side instead of the query side.
select *
from ( select row_number() over ( partition by user_id order by record_id desc) row_nr,
a.*
from user_audit_records_tbl a
)
where row_nr = 1
;

Return all null values between two dates

I have the following query and I need it to return all the null values between those two dates.
select cust_first_name
from customers
join orders using(customer_id)
where order_date between (to_date('01-01-2007','DD-MM-YYYY'))
and (to_date('31-12-2008','DD-MM-YYYY'));
Sounds like what you want is customers with no orders within the given date range. The join you are using finds the opposite of that.
You could do this with an outer join, in which case you need to apply the date filter prior to the join. It's probably easier and more readable to use a NOT IN or NOT EXISTS subquery:
select cust_first_name
from customers
WHERE customers.customer_id NOT IN (
SELECT orders.customer_id from orders
where order_date between (to_date('01-01-2007','DD-MM-YYYY'))
and (to_date('31-12-2008','DD-MM-YYYY'))
)
Here is an example of how to do what you want.
The key part is doing a left join on your orders table, and then simply doing a not between date1 and date2
declare #customers table (
id int identity(1,1),
first_name nvarchar(50),
last_name nvarchar(50)
)
declare #orders table (
id int identity(1,1),
customer_id int,
order_date datetime
)
insert into #customers(first_name, last_name) values ('bob', 'gates')
insert into #customers(first_name, last_name) values ('cyril', 'smith')
insert into #customers(first_name, last_name) values ('harry', 'potter')
insert into #orders(customer_id, order_date) values (1, '2007-02-01')
insert into #orders(customer_id, order_date) values (2, '2015-02-15')
insert into #orders(customer_id, order_date) values (3, '2008-02-15')
select
customers.id
,customers.first_name
,customers.last_name
from #customers customers
left join #orders orders on orders.customer_id = customers.id
where orders.id is null
or orders.order_date not between ('2007-01-01') and ('2008-12-31')
group by
customers.id
,customers.first_name
,customers.last_name;

MERGING DATA OF TWO TABLES

I want to write a query which finds the difference between two tables and writes updates or new data into third table. My two tables have identical column names. Third table which captures changes have extra column called comment. I would like to insert the comment whether it is a new row or updated row based on the row modification.
**TABLE1 (BACKUP)**
KEY,FIRST_NAME,LAST_NAME,CITY
1,RAM,KUMAR,INDIA
2,TOM,MOODY,ENGLAND
3,MOHAMMAD,HAFEEZ,PAKISTAN
4,MONIKA,SAM,USA
5,MIKE,PALEDINO,USA
**TABLE2 (CURRENT)**
KEY,FIRST_NAME,LAST_NAME,CITY
1,RAM,KUMAR,USA
2,TOM,MOODY,ENGLAND
3,MOHAMMAD,HAFEEZ,PAKISTAN
4,MONIKA,SAM,INDIA
5,MIKE,PALEDINO,USA
6,MAHELA,JAYA,SL
**TABLE3 (DIFFERENCE FROM TABLE2 TO TABLE1)**
KEY,FIRST_NAME,LAST_NAME,CITY,COMMENT
1,RAM,KUMAR,USA,UPDATE
4,MONIKA,SAM,INDIA,UPDATE
6,MAHELA,JAYA,SL,INSERT
table scripts
DROP TABLE TABLE1;
DROP TABLE TABLE2;
DROP TABLE TABLE3;
CREATE TABLE TABLE1
(
KEY NUMBER,
FIRST_NAME VARCHAR2(100),
LAST_NAME VARCHAR2(100),
CITY VARCHAR2(50)
);
/
CREATE TABLE TABLE2
(
KEY NUMBER,
FIRST_NAME VARCHAR2(100),
LAST_NAME VARCHAR2(100),
CITY VARCHAR2(50)
);
/
CREATE TABLE TABLE3
(
KEY NUMBER,
FIRST_NAME VARCHAR2(100),
LAST_NAME VARCHAR2(100),
CITY VARCHAR2(50),
COMMENTS VARCHAR2(200)
);
/
INSERT ALL
INTO TABLE1
VALUES(1,'RAM','KUMAR','INDIA')
INTO TABLE1 VALUES(2,'TOM','MOODY','ENGLAND')
INTO TABLE1 VALUES(3,'MOHAMMAD','HAFEEZ','PAKISTAN')
INTO TABLE1 VALUES(4,'MONIKA','SAM','USA')
INTO TABLE1 VALUES(5,'MIKE','PALEDINO','USA')
SELECT 1 FROM DUAL;
/
INSERT ALL
INTO TABLE2
VALUES(1,'RAM','KUMAR','USA')
INTO TABLE2 VALUES(2,'TOM','MOODY','ENGLAND')
INTO TABLE2 VALUES(3,'MOHAMMAD','HAFEEZ','PAKISTAN')
INTO TABLE2 VALUES(4,'MONIKA','SAM','INDIA')
INTO TABLE2 VALUES(5,'MIKE','PALEDINO','USA')
INTO TABLE2 VALUES(6,'MAHELA','JAYA','SL')
SELECT 1 FROM DUAL;
I was using the merge statement to accomplish the same. but i have hit a roadblock in merge statement , it's rhrowing an error "SQL Error: ORA-00905: missing keyword
00905. 00000 - "missing keyword"" I dont understand where is the error. please help
INSERT INTO TABLE3
SELECT KEY,FIRST_NAME,LAST_NAME,CITY,NULL AS COMMENTS FROM TABLE2
MINUS
SELECT KEY,FIRST_NAME,LAST_NAME,CITY,NULL AS COMMENTS FROM TABLE1
;
MERGE INTO TABLE3 A
USING TABLE1 B
ON (A.KEY=B.KEY)
WHEN MATCHED THEN
UPDATE SET A.COMMENTS='UPDATED'
WHEN NOT MATCHED THEN
UPDATE SET A.COMMENTS='INSERTED';
There is no such WHEN NOT MATCHED THEN UPDATE clause, you should use WHEN NOT MATCHED THEN INSERT. Refer to MERGE for details.
A few assumptions made about the data:
An INSERT event will be a record identified by its key in table2 (current data) that does not have a matching key in the original back-up table: table1.
An UPDATE event is a field that exists in both table1 and table2 for the same KEY but is not the same.
Records which did not change between tables are not to be recorded in table3.
Example Query: Check for Updates
SELECT UPD_QUERY.NEW_CITY, 'UPDATED' as COMMENTS
FROM (SELECT CASE WHEN REPLACE(CURR.CITY, BKUP.CITY,'') IS NOT NULL THEN CURR.CITY
ELSE NULL END as NEW_CITY
FROM table1 BKUP, table2 CURR
WHERE BKUP.KEY = CURR.KEY) UPD_QUERY
WHERE UPD_QUERY.NEW_CITY is NOT NULL;
You can repeat this comparison method for the other fields:
SELECT UPD_QUERY.*
FROM (SELECT CURR.KEY,
CASE WHEN REPLACE(CURR.FIRST_NAME, BKUP.FIRST_NAME,'') IS NOT NULL
THEN CURR.FIRST_NAME
ELSE NULL END as FIRST_NAME,
CASE WHEN REPLACE(CURR.LAST_NAME, BKUP.LAST_NAME,'') IS NOT NULL
THEN CURR.LAST_NAME
ELSE NULL END as LAST_NAME,
CASE WHEN REPLACE(CURR.CITY, BKUP.CITY,'') IS NOT NULL
THEN CURR.CITY
ELSE NULL END as CITY
FROM table1 BKUP, table2 CURR
WHERE BKUP.KEY = CURR.KEY) UPD_QUERY
WHERE COALESCE(UPD_QUERY.FIRST_NAME, UPD_QUERY.LAST_NAME, UPD_QUERY.CITY)
is NOT NULL;
NOTE: This could get unwieldy very quickly if the number of columns compared are many. Since the target table design (table3) requires not only identification of a change, but the field and its new value are also recorded.
Example Query: Look for Newly Added Records
SELECT CURR.*, 'INSERTED' as COMMENTS
FROM table2 CURR, table1 BKUP
WHERE CURR.KEY = BKUP.KEY(+)
AND BKUP.KEY is NULL;
Basically MERGE forces the operation: MATCHED=UPDATE (or DELETE), NOT MATCHED = INSERT. It's in the docs.
You can do what you want but you need two insert statements with different set operators,
For UPDATED:
Insert into table3
table1 INTERSECT table2
For INSERTED:
Insert into table3
table2 MINUS table1

Resources