Removing duplicate data from a column - oracle

I have a table with the following structure:
Id Pro_id name price
----------------------------------------
1 001 ABC 200
1 002 XYZ 100
1 003 XYZ 150
2 004 PQR 100
2 005 PQR 100
2 006 LMN 200
2 007 LMN 300
2 008 DEF 150
As you can see, there are some duplicate names in the 'name' column.
I want to remove all the duplicate names (keeping only the first entered row for each name and removing the rest).
So my table should look like this:
Id Pro_id name price
----------------------------------------
1 001 ABC 200
1 002 XYZ 100
2 004 PQR 100
2 006 LMN 200
2 008 DEF 150
I tried the following to find the duplicate names:
SELECT ID, NAME, count(NAME) FROM TABLENAME
GROUP BY ID, NAME HAVING count(NAME)>1
But now I am stuck on how to delete the records.
Any ideas?

You may try the SQL below (it works in MySQL, but Oracle does not support this multi-table DELETE syntax):
delete t1.* from tablename t1
inner join
tablename t2 ON t1.name = t2.name
AND t1.Pro_id > t2.Pro_id
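Oracle does not accept the multi-table DELETE above, but the same self-join can be moved into a subquery that collects the identifiers of the rows to delete. The sketch below is a minimal, runnable check of that idea using Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for Oracle (an assumption: SQLite's rowid plays the role of Oracle's ROWID). Table and column names follow the question; ROWS and dedupe_with_self_join are hypothetical names for illustration.

```python
import sqlite3

# Sample data from the question: (Id, Pro_id, name, price).
# Pro_id is stored as text so the leading zeros survive.
ROWS = [
    (1, "001", "ABC", 200), (1, "002", "XYZ", 100), (1, "003", "XYZ", 150),
    (2, "004", "PQR", 100), (2, "005", "PQR", 100), (2, "006", "LMN", 200),
    (2, "007", "LMN", 300), (2, "008", "DEF", 150),
]


def dedupe_with_self_join(rows):
    """Delete every row that has a same-name partner with a smaller Pro_id."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tablename (Id INTEGER, Pro_id TEXT, name TEXT, price INTEGER)")
    conn.executemany("INSERT INTO tablename VALUES (?, ?, ?, ?)", rows)
    # The self-join from the MySQL answer, used only to find the rows to delete.
    conn.execute(
        """
        DELETE FROM tablename
        WHERE rowid IN (
            SELECT t1.rowid
            FROM tablename t1
            INNER JOIN tablename t2
                ON t1.name = t2.name
               AND t1.Pro_id > t2.Pro_id
        )
        """
    )
    remaining = conn.execute("SELECT * FROM tablename ORDER BY Pro_id").fetchall()
    conn.close()
    return remaining


if __name__ == "__main__":
    for row in dedupe_with_self_join(ROWS):
        print(row)
```

For this sample data it keeps Pro_id 001, 002, 004, 006 and 008, which is the table the question asks for. Comparing Pro_id as text works here only because all values have the same width.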

There is no "first" in SQL, as the order of rows returned by a SELECT is undefined unless you use ORDER BY. The following keeps the row with the minimum Pro_id for each duplicated name, but you are free to choose a different aggregate:
DELETE FROM tablename
WHERE Pro_id NOT IN (SELECT MIN(Pro_id) FROM tablename GROUP BY name);
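To see that only the aggregate decides which copy survives, here is a minimal sketch that runs the same NOT IN delete against an in-memory SQLite database (via Python's sqlite3 module, standing in for Oracle) with either MIN or MAX. The helper name dedupe_keep and the allow-list of aggregates are illustrative assumptions, not part of the answer.

```python
import sqlite3

# Sample data from the question: (Id, Pro_id, name, price).
ROWS = [
    (1, "001", "ABC", 200), (1, "002", "XYZ", 100), (1, "003", "XYZ", 150),
    (2, "004", "PQR", 100), (2, "005", "PQR", 100), (2, "006", "LMN", 200),
    (2, "007", "LMN", 300), (2, "008", "DEF", 150),
]

ALLOWED_AGGREGATES = {"MIN", "MAX"}


def dedupe_keep(rows, aggregate="MIN"):
    """Keep one row per name: the one whose Pro_id equals aggregate(Pro_id)."""
    if aggregate not in ALLOWED_AGGREGATES:
        # The aggregate is spliced into the SQL text, so only allow known names.
        raise ValueError(f"unsupported aggregate: {aggregate}")
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tablename (Id INTEGER, Pro_id TEXT, name TEXT, price INTEGER)")
    conn.executemany("INSERT INTO tablename VALUES (?, ?, ?, ?)", rows)
    conn.execute(
        "DELETE FROM tablename "
        f"WHERE Pro_id NOT IN (SELECT {aggregate}(Pro_id) FROM tablename GROUP BY name)"
    )
    remaining = conn.execute("SELECT * FROM tablename ORDER BY Pro_id").fetchall()
    conn.close()
    return remaining


if __name__ == "__main__":
    print(dedupe_keep(ROWS, "MIN"))
    print(dedupe_keep(ROWS, "MAX"))
```

One caveat worth knowing: if the subquery ever returns a NULL (for example, a name whose Pro_id values are all NULL), `Pro_id NOT IN (...)` is never true and the statement deletes nothing.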

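In Oracle you can also use the ROWID pseudocolumn and keep the row with the lowest ROWID in each group of duplicates (the lowest ROWID is not guaranteed to be the first row inserted):
DELETE FROM table_name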
WHERE rowid NOT IN
(
SELECT MIN(rowid)
FROM table_name
GROUP BY column1, column2, column3...
) ;

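You can try something like this, which numbers the rows for each name in Pro_id order with ROW_NUMBER() and deletes every row after the first: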
delete from table1
where rowid in
(
select rid
from
(
select rowid as rid,
row_number() over (partition by name order by pro_id) as rn
from table1
)
where rn > 1
)
I haven't tested it.
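Since the answer above is untested, here is a minimal sketch that runs the same ROW_NUMBER() delete against an in-memory SQLite database through Python's sqlite3 module. This assumes SQLite 3.25 or later (the first release with window functions) and treats SQLite's rowid as a stand-in for Oracle's ROWID; the helper name dedupe_row_number is hypothetical.

```python
import sqlite3

# Sample data from the question: (Id, Pro_id, name, price).
ROWS = [
    (1, "001", "ABC", 200), (1, "002", "XYZ", 100), (1, "003", "XYZ", 150),
    (2, "004", "PQR", 100), (2, "005", "PQR", 100), (2, "006", "LMN", 200),
    (2, "007", "LMN", 300), (2, "008", "DEF", 150),
]


def dedupe_row_number(rows):
    """Number rows per name in Pro_id order and delete every row with rn > 1."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (Id INTEGER, Pro_id TEXT, name TEXT, price INTEGER)")
    conn.executemany("INSERT INTO table1 VALUES (?, ?, ?, ?)", rows)
    conn.execute(
        """
        DELETE FROM table1
        WHERE rowid IN (
            SELECT rid
            FROM (
                SELECT rowid AS rid,
                       ROW_NUMBER() OVER (PARTITION BY name ORDER BY Pro_id) AS rn
                FROM table1
            )
            WHERE rn > 1
        )
        """
    )
    remaining = conn.execute("SELECT * FROM table1 ORDER BY Pro_id").fetchall()
    conn.close()
    return remaining


if __name__ == "__main__":
    for row in dedupe_row_number(ROWS):
        print(row)
```

The ORDER BY inside OVER () decides which copy gets rn = 1 and is kept; ordering by Pro_id keeps the smallest one, the same choice as the MIN-based answer.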

Related

I want to delete duplicates with a condition from a table in PL/SQL

I want to delete the duplicate rows using PL/SQL. A sample of the table is below:
Policy #  Price  Dealno for Loan #  Price of Loan  PersonID
----------------------------------------------------------
123       10     Loan123            1,000          abc
123       10     Loan123            3,000          abc
456       10     Loan456            500            xyz
456       10     Loan456            500            null
As you can see, for Policy #123 I want to keep the row with the highest Price of Loan, that is, the one with a Price of Loan of 3,000.
For Policy #456, I want to keep the row without a null PersonID.
Is there a way to achieve that in PL/SQL?
Thank you

This query identifies whether a row should be kept (rn = 1) or is a duplicate copy (rn > 1), based on your definition:
select POLICY#, PRICE, LOAN#, PRICE_LOAN, PERSON_ID,
row_number() over (partition by POLICY# order by PRICE_LOAN desc, PERSON_ID nulls last) as rn
from tab
;
POLICY# PRICE LOAN# PRICE_LOAN PER RN
---------- ---------- -------- ---------- --- ----------
123 10 loan123 3000 abc 1
123 10 loan123 1000 abc 2
456 10 loan4563 500 xyz 1
456 10 loan4563 500 2
Note that you use ROW_NUMBER() with PARTITION BY on the key that should be unique, and with an ORDER BY that puts the row to keep first.
So to get only the duplicates, use this query:
with rn as (
select POLICY#, PRICE, LOAN#, PRICE_LOAN, PERSON_ID,
row_number() over (partition by POLICY# order by PRICE_LOAN desc, PERSON_ID nulls last) as rn
from tab
)
select * from rn where rn > 1;
POLICY# PRICE LOAN# PRICE_LOAN PER RN
---------- ---------- -------- ---------- --- ----------
123 10 loan123 1000 abc 2
456 10 loan4563 500 2
Based on this, you can write the DELETE statement (enclose it in BEGIN ... END; if you insist on PL/SQL):
delete from tab where rowid in
(
with rn as (
select POLICY#, PRICE, LOAN#, PRICE_LOAN, PERSON_ID,
row_number() over (partition by POLICY# order by PRICE_LOAN desc, PERSON_ID nulls last) as rn
from tab
)
select rowid from rn where rn > 1
);
You can check whether the delete worked:
select * from tab;
POLICY# PRICE LOAN# PRICE_LOAN PER
---------- ---------- -------- ---------- ---
123 10 loan123 3000 abc
456 10 loan4563 500 xyz
... and then COMMIT.
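Here is a minimal sketch of the whole delete, run against an in-memory SQLite database via Python's sqlite3 module as a stand-in for Oracle. Two assumptions: the columns are renamed to plain identifiers (policy_no, loan_no, price_loan, person_id) to keep the SQL portable, and NULLS LAST is emulated with `person_id IS NULL`, which sorts non-null values first, so the sketch does not depend on NULLS LAST support. The names POLICY_ROWS and delete_policy_duplicates are hypothetical.

```python
import sqlite3

# Sample rows from the question: (policy_no, price, loan_no, price_loan, person_id).
POLICY_ROWS = [
    (123, 10, "Loan123", 1000, "abc"),
    (123, 10, "Loan123", 3000, "abc"),
    (456, 10, "Loan456", 500, "xyz"),
    (456, 10, "Loan456", 500, None),
]


def delete_policy_duplicates(rows):
    """Keep, per policy, the row with the highest price_loan, preferring a non-null person_id."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tab (policy_no INTEGER, price INTEGER, loan_no TEXT,"
        " price_loan INTEGER, person_id TEXT)"
    )
    conn.executemany("INSERT INTO tab VALUES (?, ?, ?, ?, ?)", rows)
    conn.execute(
        """
        DELETE FROM tab
        WHERE rowid IN (
            SELECT rid
            FROM (
                SELECT rowid AS rid,
                       ROW_NUMBER() OVER (
                           PARTITION BY policy_no
                           -- highest loan price first, then non-null person_id first
                           ORDER BY price_loan DESC, person_id IS NULL, person_id
                       ) AS rn
                FROM tab
            )
            WHERE rn > 1
        )
        """
    )
    conn.commit()
    remaining = conn.execute("SELECT * FROM tab ORDER BY policy_no").fetchall()
    conn.close()
    return remaining


if __name__ == "__main__":
    for row in delete_policy_duplicates(POLICY_ROWS):
        print(row)
```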

Select matching strings from another table - Oracle

I have an Oracle database table, Table A, with some strings in one of its columns. Now I want to get the matching records from Table B for each value in that column of Table A. For example:
Table A
Name
-----------
ABC
DEE
GHI
JKL
Table B
Name
-----------
ABC
DEF
GHI
JKL
MNO
PQR
Now I want each string in Table A to be checked against Table B's column, and if an almost identical string is found, it should appear next to the original value, as below:
Table Output
Name Matched
--------|----------
ABC | ABC
DEE | DEF
GHI | GHI
JKL | JKL
I have tried the following query:
with data as(
SELECT Name FROM TABLE_A UNION ALL
SELECT Name FROM TABLE_B
)
SELECT Name
FROM
(
SELECT t.*,utl_match.edit_distance_similarity(upper(Name),upper('DEE')) eds
FROM data t
ORDER BY eds DESC
)
WHERE rownum = 1
But the problem is that with this query I can check only one record at a time, and only against a hard-coded string. Is there any way to check the whole column of Table A, value by value, against Table B and produce a result for each string?

Not too clever (hint: performance issue, since every row of A is compared with every row of B), but see if it helps. It might be OK if there aren't too many rows involved.
You need lines 21 onwards; lines 1-20 only create the sample data.
I set the similarity threshold to greater than 80; adjust it if needed (which is very likely, as the data you posted is only sample data).
SQL> WITH ta (name)
2 AS (SELECT 'ABC' FROM DUAL
3 UNION ALL
4 SELECT 'DEE' FROM DUAL
5 UNION ALL
6 SELECT 'GHI' FROM DUAL
7 UNION ALL
8 SELECT 'JKL' FROM DUAL),
9 tb (name)
10 AS (SELECT 'ABC' FROM DUAL
11 UNION ALL
12 SELECT 'DEF' FROM DUAL
13 UNION ALL
14 SELECT 'GHI' FROM DUAL
15 UNION ALL
16 SELECT 'JKL' FROM DUAL
17 UNION ALL
18 SELECT 'MNO' FROM DUAL
19 UNION ALL
20 SELECT 'PQR' FROM DUAL)
21 SELECT ta.name,
22 tb.name,
23 UTL_MATCH.jaro_winkler_similarity (ta.name, tb.name) sim
24 FROM ta, tb
25 WHERE UTL_MATCH.jaro_winkler_similarity (ta.name, tb.name) > 80
26 ;
NAM NAM SIM
--- --- ----------
ABC ABC 100
DEE DEF 82
GHI GHI 100
JKL JKL 100
SQL>
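To see where a score such as 82 for DEE/DEF comes from, here is a sketch of the textbook Jaro-Winkler formula (prefix scale 0.1, common prefix capped at 4 characters) in plain Python, scaled to 0-100 and truncated to an integer, followed by the same compare-everything-and-threshold filter. This is an independent reimplementation for illustration, not Oracle's UTL_MATCH code; Oracle's exact rounding and any internal adjustments are assumptions, although for this sample data it produces the same pairs and scores as the listing above. The function names are hypothetical.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters only count as matching within this distance of each other.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order (transpositions).
    out_of_order = 0
    k = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order / 2
    return (matches / len1 + matches / len2 + (matches - transpositions) / matches) / 3


def jaro_winkler_similarity(s1: str, s2: str) -> int:
    """Jaro-Winkler score scaled to 0-100 (truncated), boosting a shared prefix."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return int((j + prefix * 0.1 * (1 - j)) * 100)


def fuzzy_matches(names_a, names_b, threshold=80):
    """Compare every name in A with every name in B (a cross join) and keep close pairs."""
    result = []
    for a in names_a:
        for b in names_b:
            sim = jaro_winkler_similarity(a, b)
            if sim > threshold:
                result.append((a, b, sim))
    return result


if __name__ == "__main__":
    table_a = ["ABC", "DEE", "GHI", "JKL"]
    table_b = ["ABC", "DEF", "GHI", "JKL", "MNO", "PQR"]
    for row in fuzzy_matches(table_a, table_b):
        print(row)
```

The nested loop makes the performance warning concrete: the work grows with rows(A) × rows(B).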

How to return non-empty rows for a given ID - Hive

I have a table X:
ID A B
--------------
1 abc 27
1 - 28
2 - 33
3 xyz 41
3 - 07
I need this output:
ID A B
--------------
1 abc 27
2 - 33
3 xyz 41
I tried:
max(A) OVER (PARTITION BY ID) as the_value
but it did not work: I can still see all the rows in the output table.
Has anybody come across a similar situation and found a solution?

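You can use this simple trick to get the full record for which some column is at its maximum. Here the column to maximize is A rather than B: a non-empty value such as 'abc' sorts after '-', whereas max(B) would pick the row with 28 for ID 1. (This assumes the empty values are stored as '-' or as an empty string; if they are NULL, max(A) ignores them and an ID with only NULLs, such as ID 2, drops out of the join.)
select original.* from
(select ID, max(A) as A from Tbl group by ID) maxA
inner join
(select * from Tbl) original
on original.ID = maxA.ID and original.A = maxA.A
Of course, this is overkill code. You can also do:
select Tbl.* from
(select ID, max(A) as A from Tbl group by ID) maxA
inner join
Tbl
on Tbl.ID = maxA.ID and Tbl.A = maxA.A
but the first version is more of a template for whatever you want to do with further columns, fields, conditions, joins, etc.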
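A minimal sketch of the join-back-to-the-maximum trick, run against an in-memory SQLite database through Python's sqlite3 module as a stand-in for Hive, with the question's sample data ('-' stored as a literal string, which is an assumption about how the empty values look). The helper rows_with_max is hypothetical; it takes the column to maximize so you can compare A with B.

```python
import sqlite3

# Sample rows from the question: (ID, A, B); '-' is kept as a literal value.
X_ROWS = [
    (1, "abc", 27), (1, "-", 28),
    (2, "-", 33),
    (3, "xyz", 41), (3, "-", 7),
]


def rows_with_max(rows, column):
    """Return the full rows whose `column` equals the per-ID maximum of that column."""
    if column not in ("A", "B"):
        # The column name is spliced into the SQL, so restrict it to known columns.
        raise ValueError(f"unknown column: {column}")
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Tbl (ID INTEGER, A TEXT, B INTEGER)")
    conn.executemany("INSERT INTO Tbl VALUES (?, ?, ?)", rows)
    query = f"""
        SELECT original.*
        FROM (SELECT ID, MAX({column}) AS max_value FROM Tbl GROUP BY ID) mx
        INNER JOIN Tbl original
            ON original.ID = mx.ID AND original.{column} = mx.max_value
        ORDER BY original.ID
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(rows_with_max(X_ROWS, "A"))
    print(rows_with_max(X_ROWS, "B"))
```

Maximizing A returns the three rows the question asks for, because 'abc' and 'xyz' sort after '-'; maximizing B returns (1, '-', 28) for ID 1 instead.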

sql query to get the column data in one row

I have the data below in a table called data_tab:
sn code
2 101
2
2 202
5 103
5
5
How can I query it to see the result in one row per sn, like this:
sn code1 code2 code3
2 101 202
5 103

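This gives the intended output. Rows with an empty code are filtered out first, and the codes are numbered in ascending order within each sn (ordering by null would number them arbitrarily, so an empty code could land in CODE_1):
select sn,
max(decode(rn,1,code)) as CODE_1
,max(decode(rn,2,code)) as CODE_2
,max(decode(rn,3,code)) as CODE_3
from
(
select sn,
code,
row_number() over (partition by sn order by code) rn
from data_tab
where code is not null -- skip empty codes so they do not take a slot
)
group by sn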
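A minimal sketch of the same pivot, run against an in-memory SQLite database via Python's sqlite3 module. SQLite has no DECODE, so each decode(rn, n, code) is swapped for the equivalent CASE WHEN rn = n THEN code END; the empty codes are stored as NULL, which is an assumption about the sample data. The names DATA_TAB and codes_in_one_row are hypothetical.

```python
import sqlite3

# Sample rows from the question: (sn, code); empty codes are NULL (None).
DATA_TAB = [(2, 101), (2, None), (2, 202), (5, 103), (5, None), (5, None)]


def codes_in_one_row(rows):
    """Pivot up to three non-empty codes per sn into code_1 .. code_3."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE data_tab (sn INTEGER, code INTEGER)")
    conn.executemany("INSERT INTO data_tab VALUES (?, ?)", rows)
    query = """
        SELECT sn,
               MAX(CASE WHEN rn = 1 THEN code END) AS code_1,
               MAX(CASE WHEN rn = 2 THEN code END) AS code_2,
               MAX(CASE WHEN rn = 3 THEN code END) AS code_3
        FROM (
            -- number the non-empty codes 1, 2, 3 ... within each sn
            SELECT sn, code,
                   ROW_NUMBER() OVER (PARTITION BY sn ORDER BY code) AS rn
            FROM data_tab
            WHERE code IS NOT NULL
        )
        GROUP BY sn
        ORDER BY sn
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in codes_in_one_row(DATA_TAB):
        print(row)
```

Each extra output column needs one more MAX(CASE ...) line, because the number of columns is fixed in the query text.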

Sql Query Problem

I have a problem when joining tables (LEFT JOIN):
table1:
id1 amt1
1 100
2 200
3 300
table2:
id2 amt2
1 150
2 250
2 350
My query:
select id1,amt1,id2,amt2 from table1
left join table2 on table2.id2=table1.id1
The output I get is:
id1 amt1 id2 amt2
row1: 1 100 1 150
row2: 2 200 2 250
row3: 2 200 2 350
I want the output in row3 to be:
2 null 2 350
i.e. I want to avoid repeating the data (amt1).
friends help!

LEAD and LAG give access to the following and previous rows in Oracle.
SELECT id1, decode(amt1, lag(amt1) over (order by id1, id2), '', amt1) amt1,
id2, amt2
FROM table1 left join table2 on table2.id2=table1.id1
ORDER BY id1, id2
The order of the query and the order given to the lag function should be the same.
Explanation:
If the current amt1 is the same as the preceding amt1 (preceding in the given order), then omit the value.
EDIT
According to your comment, add an additional check for id changes.
SELECT id1,
decode(id1, lag(id1) over (order by id1, id2),
decode(amt1, lag(amt1) over (order by id1, id2), '', amt1),
amt1) amt1,
id2, amt2
FROM table1 left join table2 on table2.id2=table1.id1
ORDER BY id1, id2
Use the same LAG feature to check for id changes. The expression is a bit more complex, but it is comparable to a nested IF statement.
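A minimal sketch of the LAG approach, including the id check from the edit, run against an in-memory SQLite database via Python's sqlite3 module. The nested DECODE is written as a single CASE with AND, which has the same effect, and amt2 is added to the ORDER BY as a tie-breaker (an addition, not in the answer) because both table2 rows for id 2 have id2 = 2, so id1, id2 alone does not fix their order. The join uses table2.id2 = table1.id1; the helper name join_without_repeats is hypothetical.

```python
import sqlite3

# Sample data from the question.
TABLE1 = [(1, 100), (2, 200), (3, 300)]
TABLE2 = [(1, 150), (2, 250), (2, 350)]


def join_without_repeats(t1_rows, t2_rows):
    """LEFT JOIN the tables, blanking amt1 when it repeats within the same id1."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (id1 INTEGER, amt1 INTEGER)")
    conn.execute("CREATE TABLE table2 (id2 INTEGER, amt2 INTEGER)")
    conn.executemany("INSERT INTO table1 VALUES (?, ?)", t1_rows)
    conn.executemany("INSERT INTO table2 VALUES (?, ?)", t2_rows)
    query = """
        SELECT id1,
               -- blank amt1 when the previous row has the same id1 and amt1
               CASE WHEN id1 = LAG(id1) OVER (ORDER BY id1, id2, amt2)
                     AND amt1 = LAG(amt1) OVER (ORDER BY id1, id2, amt2)
                    THEN NULL ELSE amt1 END AS amt1,
               id2, amt2
        FROM table1
        LEFT JOIN table2 ON table2.id2 = table1.id1
        ORDER BY id1, id2, amt2
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in join_without_repeats(TABLE1, TABLE2):
        print(row)
```

As the answer says, the window's ORDER BY and the query's ORDER BY are kept identical, so "previous row" means the row printed just above.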

select distinct id1,amt1,id2,amt2 from table1 left join table2 on table2.id2=table1.id1
Try this?
