Cannot delete from view without exactly one key-preserved table workaround - oracle

I have a table x_visa. I want to delete the duplicate rows from this table.
The query I am using to find them is:
select * from (SELECT x_visa.*,
ROW_NUMBER() over (partition by effective_start_date, effective_end_date, person_id,
business_group_id, legislation_code , current_visa_permit, visa_permit_type, visa_permit_id, configuration_id
order by person_id) AS rn
from x_visa) T
WHERE rn > 1;
The delete statement gives an error: ORA-01752: cannot delete from view without exactly one key-preserved table
delete from
(select * from (SELECT x_visa.*,
ROW_NUMBER() over (partition by effective_start_date, effective_end_date, person_id,
business_group_id, legislation_code , current_visa_permit, visa_permit_type, visa_permit_id, configuration_id
order by person_id) AS rn
from x_visa) T
WHERE rn > 1 );
Is there a workaround to delete the duplicate data from this table?

Every row has a rowid pseudocolumn that identifies it uniquely, so you can delete the rows whose rowid is returned by your query.
delete from x_visa where rowid in (/*YOUR QUERY*/);
So we have:
delete from x_visa where rowid in (select r from (SELECT x_visa.rowid r, x_visa.*,
ROW_NUMBER() over (partition by effective_start_date, effective_end_date, person_id,
business_group_id, legislation_code , current_visa_permit, visa_permit_type, visa_permit_id, configuration_id
order by person_id) AS rn
from x_visa) T
WHERE rn > 1 );
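To confirm the result, it can help to count the duplicate groups before and after the delete. This is only a sketch, assuming that the same nine columns used in the PARTITION BY above define a duplicate:

```sql
-- Lists every duplicated key combination and how many copies of it exist.
-- Once the delete has run, this query should return no rows.
SELECT effective_start_date, effective_end_date, person_id,
       business_group_id, legislation_code, current_visa_permit,
       visa_permit_type, visa_permit_id, configuration_id,
       COUNT(*) AS copies
FROM   x_visa
GROUP  BY effective_start_date, effective_end_date, person_id,
          business_group_id, legislation_code, current_visa_permit,
          visa_permit_type, visa_permit_id, configuration_id
HAVING COUNT(*) > 1;
```

GROUP BY and PARTITION BY both put NULL values into the same group, so this check should agree with the ROW_NUMBER() numbering.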

Related

Update a column of table with ROW_NUMBER

I am new to Oracle. I need to update a column of a table with ROW_NUMBER(), i.e.
UPDATE tablefull
SET newcolumn=ROW_NUMBER() OVER (PARTITION BY columnid ORDER BY datecolumn)-1
Since window functions are not allowed in UPDATE, I tried joining the table with a subquery on the same table to do the update:
update a
set a.newcolumn= b.upnum
from tablefull a
INNER JOIN (SELECT columnid,ROW_NUMBER() OVER (PARTITION BY columnid ORDER BY datecolumn)-1 AS upnum
FROM tablefull) b ON b.columnid=a.columnid
Since UPDATE with a join is also not possible in Oracle, this did not work either.
Can anyone help me update newcolumn with ROW_NUMBER() OVER (PARTITION BY columnid ORDER BY datecolumn)-1 in Oracle?
Use the query with row_number as the source of a merge statement:
merge into tablefull tgt
using (select rowid rwd, columnid,
row_number() over (partition by columnid order by datecolumn) - 1 rn
from tablefull) src
on (src.rwd = tgt.rowid and tgt.columnid = src.columnid)
when matched then update set newcolumn = rn;
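To see the effect on a small data set, here is a sketch with three sample rows. The table and column names follow the question, but the column types and the data are assumptions made up for illustration. Because rowid already identifies the row, the ON clause here joins on rowid alone.

```sql
-- Hypothetical sample table; column types are assumed.
CREATE TABLE tablefull (
  columnid   NUMBER,
  datecolumn DATE,
  newcolumn  NUMBER
);

INSERT INTO tablefull (columnid, datecolumn) VALUES (1, DATE '2020-01-01');
INSERT INTO tablefull (columnid, datecolumn) VALUES (1, DATE '2020-01-02');
INSERT INTO tablefull (columnid, datecolumn) VALUES (2, DATE '2020-01-01');

-- Number the rows per columnid, starting at 0, and write the number back.
MERGE INTO tablefull tgt
USING (SELECT rowid AS rwd,
              ROW_NUMBER() OVER (PARTITION BY columnid ORDER BY datecolumn) - 1 AS rn
       FROM   tablefull) src
ON (src.rwd = tgt.rowid)
WHEN MATCHED THEN UPDATE SET tgt.newcolumn = src.rn;

-- newcolumn is now 0 and 1 for columnid 1, and 0 for columnid 2.
SELECT columnid, datecolumn, newcolumn
FROM   tablefull
ORDER  BY columnid, datecolumn;
```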

Column name become invalid after referred as result of aggregate function MIN()

SELECT cust_detl.*,
MIN(CREATION_TIMESTAMP) OVER (PARTITION BY CUST_ID) AS MIN_TIMESTAMP
FROM CUST_DETAILS cust_detl
WHERE CREATION_TIMESTAMP=MIN_TIMESTAMP;
The above query should select all columns from table CUST_DETAILS for the rows with the oldest value in the CREATION_TIMESTAMP column.
Any idea why MIN_TIMESTAMP is reported as an invalid identifier?
These are the columns that should display:
SELECT
CUSTOMER_DTL_SEQ.nextval,
CUST_ID,
CUST_REF_ID,
CUST_NAME,
CUST_ADDRESS,
CREATION_TIMESTAMP
FROM
(
SELECT
CUSTOMER_DTL_SEQ.nextval,
cust_detl.CUST_ID,
cust_detl.CUST_REF_ID,
cust_detl.CUST_NAME,
cust_detl.CUST_ADDRESS,
cust_detl.CREATION_TIMESTAMP,
MIN(CREATION_TIMESTAMP) OVER (PARTITION BY CUST_ID) AS min_timestamp
FROM cust_details cust_detl
)
WHERE CREATION_TIMESTAMP = min_timestamp;
I need to select the CREATION_TIMESTAMP column as well, and only the rows with the minimum timestamp should be selected. The problem is that the sequence with nextval is not allowed here. I need the sequence in the query because this statement will later be used in an INSERT INTO ... SELECT.
The PK needs to be incremented.
The alias is not yet valid at that point: the WHERE clause is evaluated before the SELECT list, so MIN_TIMESTAMP does not exist when the filter runs. You need to compute it in a subquery before you can filter on it.
SELECT * FROM
(SELECT cust_detl.*,
MIN(CREATION_TIMESTAMP) OVER (PARTITION BY CUST_ID) AS MIN_TIMESTAMP
FROM CUST_DETAILS cust_detl)
WHERE CREATION_TIMESTAMP=MIN_TIMESTAMP;
UPDATE: I don't know which columns your table has, but if you need only specific columns, the query looks like this (assuming you need only cust_id, column1, column2 and column3 in the select list):
SELECT cust_id,
column1,
column2,
column3
FROM (SELECT cust_detl.cust_id,
cust_detl.column1,
cust_detl.column2,
cust_detl.column3,
cust_detl.creation_timestamp,
MIN(creation_timestamp) over(PARTITION BY cust_id) AS min_timestamp
FROM cust_details cust_detl)
WHERE creation_timestamp = min_timestamp;
If this still doesn't solve your problem, then post the list of columns from the table and the expected output.
Update 2: Fetch the sequence value in the outer query; this query should work fine.
SELECT customer_dtl_seq.nextval,
cust_id,
cust_ref_id,
cust_name,
cust_address,
creation_timestamp
FROM (SELECT cust_detl.cust_id,
cust_detl.cust_ref_id,
cust_detl.cust_name,
cust_detl.cust_address,
cust_detl.creation_timestamp,
MIN(creation_timestamp) over(PARTITION BY cust_id) AS min_timestamp
FROM cust_details cust_detl)
WHERE creation_timestamp = min_timestamp;
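Since the question mentions that the statement will feed an INSERT, the same shape can be sketched as an INSERT ... SELECT. The target table cust_details_oldest and its column list are hypothetical, chosen only for illustration; the sequence and source columns are taken from the question.

```sql
-- cust_details_oldest and its columns are hypothetical; adjust them to the real target table.
INSERT INTO cust_details_oldest
       (id, cust_id, cust_ref_id, cust_name, cust_address, creation_timestamp)
SELECT customer_dtl_seq.NEXTVAL,      -- sequence is referenced only in the outermost SELECT
       cust_id,
       cust_ref_id,
       cust_name,
       cust_address,
       creation_timestamp
FROM  (SELECT cust_detl.cust_id,
              cust_detl.cust_ref_id,
              cust_detl.cust_name,
              cust_detl.cust_address,
              cust_detl.creation_timestamp,
              MIN(cust_detl.creation_timestamp)
                OVER (PARTITION BY cust_detl.cust_id) AS min_timestamp
       FROM   cust_details cust_detl)
WHERE  creation_timestamp = min_timestamp;
```

If several rows of one customer share the same minimum timestamp, all of them are inserted, each with its own sequence value.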

How to find record from a very big HIVE table where column header__timestamp,header__change_seq should be latest update and id should unique

I have to find records in a Hive table where (id, header__timestamp, header__change_seq) should be unique. The table can contain duplicates of (id, header__timestamp, header__change_seq), and in that case I have to fetch only one of the duplicate records.
select b.*
from (SELECT ID, max(COALESCE(header__timestamp)) max_modified,
MAX(CAST(header__change_seq AS DECIMAL(38,0))) max_sequence
FROM table_name group by ID) a
join table_name b on (a.id=b.id and
a.max_modified=b.header__timestamp and
a.max_sequence=b.header__change_seq)
The total number of distinct ids is 244441250, but the above query returns 244442548 rows because of some duplicate records.
I need exactly one row per id: the one where header__timestamp and header__change_seq are at their maximum.
@Rahul, please try this one. It uses row_number(), so in case of duplicate id, header__timestamp and header__change_seq it will select only one record. Hope it helps.
select *
from (
select *,
row_number() over ( partition by id order by header__timestamp desc, header__change_seq desc) as rnk
from table_name) t
where t.rnk = 1;
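One way to check the result against the counts in the question is to compare the number of returned rows with the number of distinct ids. This is a sketch that reuses the same table and column names; the two counts should be equal if the query keeps exactly one row per id.

```sql
-- total_rows and distinct_ids should be identical when each id survives exactly once.
SELECT COUNT(*)           AS total_rows,
       COUNT(DISTINCT id) AS distinct_ids
FROM (
  SELECT id,
         ROW_NUMBER() OVER (PARTITION BY id
                            ORDER BY header__timestamp DESC,
                                     header__change_seq DESC) AS rnk
  FROM table_name
) t
WHERE t.rnk = 1;
```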

Need rowid from a query

I am trying to get the rowid from a query. My query is :
Table: test ( PersonId number, AssetId number);
Query:
with abc as(
select
personid ,
row_number() over(partition by personid order by personid,carid) rnk
from test
--group by personid,carid,rowid
)
select rowid, abc.* from abc ;
and it throws an error:
ORA-01446: cannot select ROWID from, or sample, a view with DISTINCT, GROUP BY, etc.
Is there any way I can get the rowid this way, or is it not allowed in Oracle? Can anyone share some thoughts? Thanks.
Try including rowid inside the with subquery (I used an alias):
with abc as(
select
personid ,
rowid as r,
row_number() over(partition by personid order by personid,carid) rnk
from test
/***This was unwanted in question***/
--group by personid,carid,rowid
)
select r, personid, rnk
from abc ;
Using GROUP BY with ROWID will not aggregate rows as ROWID is unique so the size of each group will always be 1.
You can just do:
SELECT personid,
ROW_NUMBER() OVER ( PARTITION BY personid ORDER BY carid ) AS rnk,
ROWID
FROM test;

How to optimize this SELECT with sub query Oracle

Here is my query,
SELECT ID As Col1,
(
SELECT VID FROM TABLE2 t
WHERE (a.ID=t.ID or a.ID=t.ID2)
AND t.STARTDTE =
(
SELECT MAX(tt.STARTDTE)
FROM TABLE2 tt
WHERE (a.ID=tt.ID or a.ID=tt.ID2) AND tt.STARTDTE < SYSDATE
)
) As Col2
FROM TABLE1 a
Table1 has 48850 records and Table2 has 15944098 records.
I have separate indexes in TABLE2 on: ID; ID & STARTDTE; STARTDTE; ID, ID2 & STARTDTE.
The query is still too slow. How can this be improved? Please help.
I'm guessing that the OR in the inner queries is interfering with the optimizer's ability to use indexes. Also, I wouldn't recommend a solution that scans all of TABLE2, given its size.
This is why in this case I would suggest using a function that efficiently retrieves the information you are looking for (two index scans per call):
CREATE OR REPLACE FUNCTION getvid(p_id table1.id%TYPE)
RETURN table2.vid%TYPE IS
l_result table2.vid%TYPE;
BEGIN
SELECT vid
INTO l_result
FROM (SELECT vid, startdte
FROM (SELECT vid, startdte
FROM table2 t
WHERE t.id = p_id
AND t.startdte < SYSDATE
ORDER BY t.startdte DESC)
WHERE rownum = 1
UNION ALL
SELECT vid, startdte
FROM (SELECT vid, startdte
FROM table2 t
WHERE t.id2 = p_id
AND t.startdte < SYSDATE
ORDER BY t.startdte DESC)
WHERE rownum = 1
ORDER BY startdte DESC)
WHERE rownum = 1;
RETURN l_result;
END;
Your SQL would become:
SELECT ID As Col1,
getvid(a.id) vid
FROM TABLE1 a
Make sure you have indexes on both table2(id, startdte DESC) and table2(id2, startdte DESC). The order of the index is very important.
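As a sketch, the two indexes could be created as follows. The index names are hypothetical; the column lists are the ones named above.

```sql
-- Index names are made up for illustration.
-- Leading column matches the equality predicate, second column matches the ORDER BY.
CREATE INDEX table2_id_startdte_ix  ON table2 (id,  startdte DESC);
CREATE INDEX table2_id2_startdte_ix ON table2 (id2, startdte DESC);
```

With the lookup column first and startdte second, each inner query in the function can read the newest matching entry straight from the index instead of sorting all rows for that id.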
Possibly try the following, though untested.
WITH max_times AS
(SELECT a.ID, MAX(t.STARTDTE) AS Startdte
FROM TABLE1 a, TABLE2 t
WHERE (a.ID=t.ID OR a.ID=t.ID2)
AND t.STARTDTE < SYSDATE
GROUP BY a.ID)
SELECT b.ID As Col1, tt.VID
FROM TABLE1 b
LEFT OUTER JOIN max_times mt
ON (b.ID = mt.ID)
LEFT OUTER JOIN TABLE2 tt
ON ((mt.ID=tt.ID OR mt.ID=tt.ID2)
AND mt.startdte = tt.startdte)
You can look at analytic functions to avoid having to hit the second table twice. Something like this might work:
SELECT id AS col1, vid
FROM (
SELECT t1.id, t2.vid, RANK() OVER (PARTITION BY t1.id ORDER BY
CASE WHEN t2.startdte < TRUNC(SYSDATE) THEN t2.startdte ELSE null END
DESC NULLS LAST) AS rn
FROM table1 t1
JOIN table2 t2 ON t1.id IN (t2.id, t2.id2)
)
WHERE rn = 1;
The inner select gets the id and vid values from the two tables with a simple join on id or id2. The rank function calculates a ranking for each matching row in the second table based on the startdte. It's complicated a bit by you wanting to filter on that date, so I've used a case to effectively ignore any dates today or later by changing the evaluated value to null, and in this instance that means the order by in the over clause needs nulls last so they're ignored.
I'd suggest you run the inner select on its own first, maybe with just a couple of id values for brevity, to see what it's doing and what ranks are being allocated.
The outer query is then just picking the top-ranked result for each id.
You may still get duplicates though: if table2 has more than one row for an id with the same startdte, they'll get the same rank, but then you may have had that situation before. You may need to add more fields to the order by to break ties in a way that makes sense to you.
But this is largely speculation without being able to see where your existing query is actually slow.
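The tie-breaking idea mentioned above could be sketched like this. It swaps RANK() for ROW_NUMBER(), which never assigns the same number twice within a partition, and uses vid as the tie-breaker; that choice is an assumption, so pick whatever column makes sense for your data. The date filter sits in the LEFT JOIN condition so that ids without a matching row still come back with a NULL vid, as in the original scalar subquery.

```sql
SELECT id AS col1, vid
FROM (
  SELECT t1.id,
         t2.vid,
         ROW_NUMBER() OVER (
           PARTITION BY t1.id
           ORDER BY t2.startdte DESC,   -- newest qualifying row first
                    t2.vid              -- assumed tie-breaker for equal startdte
         ) AS rn
  FROM table1 t1
  LEFT JOIN table2 t2
         ON t1.id IN (t2.id, t2.id2)
        AND t2.startdte < SYSDATE
)
WHERE rn = 1;
```

Like the other suggestions, this is untested against the real tables, so compare its execution plan with the existing query before relying on it.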
