Fetch single record from duplicate rows from Oracle table

I have a table, user_audit_records_tbl, which has multiple rows for a single user. Every time a user logs in, one entry is made in this table, so I want a select query that fetches the latest single record for each user. I have a query which uses an IN clause.
Table Name : user_audit_records_tbl
Record_id Number Primary Key,
user_id varchar Primary Key ,
user_ip varchar,
.
.
etc
The current query I am using is
select * from user_audit_records_tbl where record_id in (select
max(record_id) from user_audit_records_tbl
group by user_id);
but I was wondering if anybody has a better solution, since this table holds a huge volume of data.

You can use the FIRST/LAST function:
select max(Record_id) as Record_id,
user_id,
max(user_ip) keep (dense_rank last order by record_id) as user_ip,
...
from user_audit_records_tbl
group by user_id
Not sure if it will be more efficient.
EDIT: As the above query is less efficient, maybe you could try an EXISTS clause:
select *
from user_audit_records_tbl A
where exists ( select 1
from user_audit_records_tbl B
where A.user_id = B.user_id
group by B.user_id
having max(B.record_id) = A.record_id
)
But maybe you should look at the indexing side instead of the query side.

select *
from ( select row_number() over ( partition by user_id order by record_id desc) row_nr,
a.*
from user_audit_records_tbl a
)
where row_nr = 1
;
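As a rough check that the IN query from the question and the row_number() query above pick the same rows, here is a minimal sketch using Python's built-in sqlite3 module instead of Oracle. The sample rows are made up for illustration, only three of the table's columns are modelled, and window functions are assumed to be available (SQLite 3.25 or later). It says nothing about relative performance on Oracle.

```python
import sqlite3

# Hypothetical sample data; only the column names come from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_audit_records_tbl ("
    "record_id INTEGER PRIMARY KEY, user_id TEXT, user_ip TEXT)"
)
conn.executemany(
    "INSERT INTO user_audit_records_tbl VALUES (?, ?, ?)",
    [
        (1, "alice", "10.0.0.1"),
        (2, "bob", "10.0.0.2"),
        (3, "alice", "10.0.0.3"),
        (4, "bob", "10.0.0.4"),
        (5, "carol", "10.0.0.5"),
    ],
)

# Approach from the question: IN with a grouped MAX subquery.
in_query = """
    SELECT record_id, user_id, user_ip
    FROM user_audit_records_tbl
    WHERE record_id IN (SELECT MAX(record_id)
                        FROM user_audit_records_tbl
                        GROUP BY user_id)
    ORDER BY user_id
"""

# Analytic approach: number each user's rows newest-first, keep row 1.
row_number_query = """
    SELECT record_id, user_id, user_ip
    FROM (SELECT a.*,
                 ROW_NUMBER() OVER (PARTITION BY user_id
                                    ORDER BY record_id DESC) AS row_nr
          FROM user_audit_records_tbl a)
    WHERE row_nr = 1
    ORDER BY user_id
"""

latest_by_in = conn.execute(in_query).fetchall()
latest_by_row_number = conn.execute(row_number_query).fetchall()
print(latest_by_row_number)
```

Both result lists contain one row per user, the one with the highest record_id.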

Related

Can we use an insert overwrite after using insert all

In Snowflake I am trying to insert updated records into a table. Then I want to identify the records that were just inserted as the most recent records and save that as the final table output in a new column called ACTIVE, which will be either true or false. I am having an issue incorporating some sort of updated-table segment into my current query. I need everything to be contained in the same query rather than broken up into separate parts.
I have my table as follows
CREATE TABLE IF NOT EXISTS MY_TABLE
(
LINK_ID BINARY NOT NULL,
LOAD TIMESTAMP NOT NULL,
SOURCE STRING NOT NULL,
SOURCE_DATE TIMESTAMP NOT NULL,
ORDER BIGINT NOT NULL,
ID BINARY NOT NULL,
ATTRIBUTE_ID BINARY NOT NULL
);
I have records being inserted in this way:
INSERT ALL
WHEN HAS_DATA AND ID_SEQ_NUM > 1 AND (SELECT COUNT(1) FROM MY_TABLE WHERE ID = KEY) = 0 THEN
INTO MY_TABLE VALUES (
LINK_KEY,
TIME,
DATASET_NAME,
DATASET_DATE,
ORDER_NUMBER,
O_KEY,
OA_KEY
)
SELECT *
FROM TEST_TABLE;
I would like my final table from this to be the output as
SELECT *, ORDER != MAX(ORDER) OVER (PARTITION BY ID) AS ACTIVE
FROM MY_TABLE;
This is so I can identify the most recent record per ID group as ACTIVE/TRUE and the previous records within that ID group as INACTIVE/FALSE
I tried to use an insert overwrite method like this
INSERT ALL
WHEN HAS_DATA AND ID_SEQ_NUM > 1 AND (SELECT COUNT(1) FROM MY_TABLE WHERE ID = KEY) = 0 THEN
INTO MY_TABLE VALUES (
LINK_KEY,
TIME,
DATASET_NAME,
DATASET_DATE,
ORDER_NUMBER,
O_KEY,
OA_KEY
)
INSERT OVERWRITE INTO MY_TABLE
SELECT *, RSRC_OFFSET != MAX(RSRC_OFFSET) OVER (PARTITION BY ID) AS ACTIVE
FROM L_OPTION_OPTION_ALLOCATION_TEST
SELECT *
FROM MY_TABLE;
However, it seems the insert overwrite doesn't work in this way (also I am not sure if I can just add a new column to the table like this?). Is there a way I can incorporate it into this query or a different way to update the table with this new ACTIVE column within this query itself?
Also I am using INSERT ALL here because I actually have multiple different tables I am inserting into at once, but this is the current table that I am trying to modify.
You can use the overwrite option with conditional multi-table inserts.
Starting with your current statement:
INSERT ALL
WHEN HAS_DATA AND ID_SEQ_NUM > 1 AND (SELECT COUNT(1) FROM MY_TABLE WHERE ID = KEY) = 0 THEN
INTO MY_TABLE VALUES (
LINK_KEY,
TIME,
DATASET_NAME,
DATASET_DATE,
ORDER_NUMBER,
O_KEY,
OA_KEY
)
SELECT *
FROM TEST_TABLE;
Add the overwrite option immediately after the insert command:
INSERT OVERWRITE ALL
WHEN HAS_DATA AND ID_SEQ_NUM > 1 AND (SELECT COUNT(1) FROM MY_TABLE WHERE ID = KEY) = 0 THEN
INTO MY_TABLE VALUES (
LINK_KEY,
TIME,
DATASET_NAME,
DATASET_DATE,
ORDER_NUMBER,
O_KEY,
OA_KEY
)
SELECT *
FROM TEST_TABLE;
Note that this will truncate and insert ALL tables in the multi-table insert. There is not a way to be selective about which tables get truncated and inserted and which don't.
https://docs.snowflake.com/en/sql-reference/sql/insert-multi-table.html#optional-parameters
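The ACTIVE flag described in the question can be sketched independently of the insert statement. The example below runs on SQLite through Python's sqlite3 module, not on Snowflake, with a cut-down table and made-up rows; order_num is a hypothetical stand-in for the ORDER column. Note that the question's expression uses `!=`, which would flag the most recent row as false; this sketch uses `=` so that the most recent row per ID is the active one, matching the stated intent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# order_num is a hypothetical stand-in for the question's ORDER column.
conn.execute("CREATE TABLE my_table (id TEXT, order_num INTEGER)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?)",
    [("A", 1), ("A", 2), ("A", 3), ("B", 1)],
)

# A row is active when its order_num equals the maximum within its id group.
active_query = """
    SELECT id,
           order_num,
           order_num = MAX(order_num) OVER (PARTITION BY id) AS active
    FROM my_table
    ORDER BY id, order_num
"""
flagged_rows = conn.execute(active_query).fetchall()
print(flagged_rows)
```

SQLite reports the boolean as 1 or 0; only the last row of each ID group carries 1.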

How to find records in a very big Hive table where header__timestamp and header__change_seq are the latest and id is unique

I have to find records from the Hive table where (id, header__timestamp, header__change_seq) should be unique, but in the table (id, header__timestamp, header__change_seq) can be duplicated, so in that case I have to fetch only one record.
select b.*
from (SELECT ID, max(COALESCE(header__timestamp))
max_modified,MAX(CAST(header__change_seq AS DECIMAL(38,0))) max_sequence
FROM table_name group by ID) a
join table_name b on (a.id=b.id and
a.max_modified=b.header__timestamp and
a.max_sequence=b.header__change_seq)
The total number of distinct ids is 244441250,
but the above query returns 244442548 rows
due to some duplicate records. I need exactly one row per distinct id, the one where header__change_seq and header__timestamp are at their maximum.
@Rahul, please try this one. It makes use of row_number(), so in case of duplicate id, header__timestamp and header__change_seq, it will select only one record. Hope it helps.
select *
from (
select *,
row_number() over ( partition by id order by header__timestamp desc, header__change_seq desc) as rnk
from table_name) t
where t.rnk = 1;
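To illustrate why the join on the maximum values returns extra rows while row_number() does not, here is a small sketch on SQLite through Python's sqlite3 module (not Hive). The rows and the payload column are made up for illustration, and header__change_seq is stored as an integer for simplicity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# payload is a hypothetical extra column; the rows are invented sample data.
conn.execute(
    "CREATE TABLE table_name (id INTEGER, header__timestamp TEXT, "
    "header__change_seq INTEGER, payload TEXT)"
)
conn.executemany(
    "INSERT INTO table_name VALUES (?, ?, ?, ?)",
    [
        (1, "2020-01-01", 5, "a"),
        (1, "2020-01-02", 7, "b"),
        (1, "2020-01-02", 7, "c"),  # exact duplicate of the latest key
        (2, "2020-01-01", 1, "d"),
    ],
)

# Join on the per-id maximum values: both tied rows for id 1 come back.
join_query = """
    SELECT b.id, b.payload
    FROM (SELECT id,
                 MAX(header__timestamp) AS max_modified,
                 MAX(header__change_seq) AS max_sequence
          FROM table_name
          GROUP BY id) a
    JOIN table_name b
      ON a.id = b.id
     AND a.max_modified = b.header__timestamp
     AND a.max_sequence = b.header__change_seq
"""

# row_number() keeps exactly one row per id, even when the keys tie.
dedup_query = """
    SELECT id, payload
    FROM (SELECT *,
                 ROW_NUMBER() OVER (PARTITION BY id
                                    ORDER BY header__timestamp DESC,
                                             header__change_seq DESC) AS rnk
          FROM table_name) t
    WHERE t.rnk = 1
"""

join_rows = conn.execute(join_query).fetchall()
dedup_rows = conn.execute(dedup_query).fetchall()
print(len(join_rows), len(dedup_rows))
```

Which of the tied rows survives is arbitrary unless another column is added to the ORDER BY as a tie-breaker.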

Oracle join rows in order where order is defined differently on each table

I have two tables
TABLE_A with columns project_id, id and load_date
and TABLE_B with columns project_id, delete_flag and delete_date
where TABLE_A.load_date is a new column and I want to populate it based on TABLE_B.delete_date for historic data. Basically, a file has been repeatedly loaded into the system and historically we didn't keep track of when it was loaded. However, each time the file is re-loaded, the previous version of it is updated in TABLE_B with a delete_date (i.e. a soft delete). The previous version just stays in TABLE_A without any changes.
I would like to populate TABLE_A.load_date based on matching projects in TABLE_B. The oldest row in TABLE_A (smallest TABLE_A.id) matches the oldest row in TABLE_B (oldest delete_date), etc. So the rows should match up if you keep picking the next one in order from each table. But I don't know how to turn that into an Oracle statement. What I've got so far is this which doesn't deal with matching on row order:
MERGE INTO TABLE_A a
USING
(
SELECT PROJECT_ID, DELETE_DATE
FROM TABLE_B
WHERE DELETE_FLAG = 'Y'
ORDER BY DELETE_DATE ASC
) b ON (a.PROJECT_ID = b.PROJECT_ID)
WHEN MATCHED THEN UPDATE
SET a.LOAD_DATE = b.DELETE_DATE;
This merge should do the job, if I understood your criteria correctly:
merge into table_a ta
using (
select pid project_id, id, delete_date
from (
select project_id pid, id,
row_number() over (partition by project_id order by id) rn
from table_a) a
join (
select project_id pid, delete_date,
row_number() over (partition by project_id order by delete_date ) rn
from table_b
where delete_flag='Y') b using (pid, rn) ) tb
on (ta.project_id = tb.project_id and ta.id = tb.id)
when matched then update
set ta.load_date = tb.delete_date
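The pairing idea in the merge above (number the rows of each table per project, then join on that number) can be tried out in isolation. This is a minimal sketch on SQLite through Python's sqlite3 module, with made-up rows; SQLite has no MERGE, so the update is applied as a separate parameterised UPDATE, which is an assumption of this sketch and not part of the Oracle answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_a (project_id TEXT, id INTEGER, load_date TEXT)")
conn.execute(
    "CREATE TABLE table_b (project_id TEXT, delete_flag TEXT, delete_date TEXT)"
)
# Invented sample data: P1 was loaded three times, so two versions were deleted.
conn.executemany(
    "INSERT INTO table_a (project_id, id) VALUES (?, ?)",
    [("P1", 10), ("P1", 11), ("P1", 12), ("P2", 20)],
)
conn.executemany(
    "INSERT INTO table_b VALUES (?, ?, ?)",
    [("P1", "Y", "2021-02-07"), ("P1", "Y", "2021-01-05"), ("P2", "Y", "2021-03-01")],
)

# Number rows per project in each table, then match the n-th with the n-th.
pairing_query = """
    SELECT a.project_id, a.id, b.delete_date
    FROM (SELECT project_id, id,
                 ROW_NUMBER() OVER (PARTITION BY project_id ORDER BY id) AS rn
          FROM table_a) a
    JOIN (SELECT project_id, delete_date,
                 ROW_NUMBER() OVER (PARTITION BY project_id
                                    ORDER BY delete_date) AS rn
          FROM table_b
          WHERE delete_flag = 'Y') b
      ON a.project_id = b.project_id AND a.rn = b.rn
    ORDER BY a.project_id, a.id
"""
pairs = conn.execute(pairing_query).fetchall()

# Apply each matched delete_date as the load_date of its table_a row.
conn.executemany(
    "UPDATE table_a SET load_date = ? WHERE project_id = ? AND id = ?",
    [(delete_date, project_id, row_id) for project_id, row_id, delete_date in pairs],
)
updated = conn.execute(
    "SELECT project_id, id, load_date FROM table_a ORDER BY project_id, id"
).fetchall()
print(updated)
```

Rows in table_a without a counterpart in table_b (here id 12 of P1) keep a NULL load_date.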

Oracle Comma Separated Value (ID) in a Column. How to get Description for each Value in a Comma Separated string.

Sorry for the confusing title. I did not understand it myself when I read it a second time.
So here is the detailed description.
I have a table, say "Awards", which has the following columns:
Name,
Amount,
Employee
and another table, "Employee", which has the following columns:
Emp_Id,
Emp_Name
In the Employee column of the "Awards" table I have the value "01,20", which are actually employee IDs referencing the "Employee" table.
So is there any way I can get the employee names in a select query on "Awards"?
Here is one method:
select a.*, e.emp_name
from Awards a join
Employee e
on ','||a.employee||',' like '%,'||e.emp_id||',%';
This will return the employee names on separate rows. If you want them in a list, you would need to concatenate them together (the best function for doing that depends on your version of Oracle).
By the way, this is a very bad data structure. You should have an association table, AwardEmployee, that has one row for each award and each employee.
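The delimiter trick in the join above (wrapping both sides in commas so that only whole IDs match) can be checked with a small sketch. It runs on SQLite through Python's sqlite3 module, not on Oracle, and the award and employee rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Awards (name TEXT, amount INTEGER, employee TEXT)")
conn.execute("CREATE TABLE Employee (emp_id TEXT, emp_name TEXT)")
# Invented sample data; '1' and '02' are there to show they do not match.
conn.execute("INSERT INTO Awards VALUES ('Best Team', 500, '01,20')")
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?)",
    [("01", "Ann"), ("20", "Bob"), ("02", "Cid"), ("1", "Dee")],
)

# ',01,20,' is searched for ',<emp_id>,' so partial IDs cannot match.
like_join_query = """
    SELECT a.name, e.emp_name
    FROM Awards a
    JOIN Employee e
      ON ',' || a.employee || ',' LIKE '%,' || e.emp_id || ',%'
    ORDER BY e.emp_id
"""
award_employees = conn.execute(like_join_query).fetchall()
print(award_employees)
```

Without the surrounding commas, an ID such as "1" would also match "01", which is why both sides are padded.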
Given below is a query that turns the comma-separated employee IDs into rows; I put it in a subquery to get the names. Please edit it as per your requirements.
Select emp_name from employee where emp_id in (
SELECT trim(x.column_value.extract('e/text()')) COLUMNS
from awards t, table (xmlsequence(xmltype('<e><e>' || replace(Employee,',','</e><e>')||
'</e></e>').extract('e/e'))) x )
I have changed the database (added one more table) and have already started changing the code. For the report in question I have used the following:
WITH t AS
(
Select emp_name from employee where emp_id in (
select regexp_substr(Employee ,'[^,]+', 1, level) from awards
connect by regexp_substr((select Employee from awards ), '[^,]+', 1, level) is
not null)
)
SELECT LTRIM(SYS_CONNECT_BY_PATH(emp_name, ','),',') emp_name
FROM ( SELECT emp_name,
ROW_NUMBER() OVER (ORDER BY emp_name) FILA
FROM t )
WHERE CONNECT_BY_ISLEAF = 1
START WITH FILA = 1
CONNECT BY PRIOR FILA = FILA - 1
This is temporary, and I understand very little of it.
Thanks for your help and suggestions.

ORACLE: How to select previous different value?

I have a table that stores employee job names. It has the following columns:
id; date_from; date_to; emp_id; jobname_id; grade;
Each emp_id can have many consecutive records with the same jobname_id due to many grade changes.
How can I select the previous, different jobname_id, omitting records that have the same jobname_id as the most current one?
This solution uses the FIRST_VALUE() analytic function to identify each employee's current job. It then filters for all the jobs which don't match that one:
select distinct emp_id
, jobname_id
from ( select emp_id
, jobname_id
, first_value(jobname_id) over (partition by emp_id
order by date_from desc) as current_job
from employee
where emp_id = 1234 )
where jobname_id != current_job
order by emp_id, jobname_id
/
Will this work for your issue:
SELECT DISTINCT
e1.emp_id,
e1.jobname_id
FROM employee e1
WHERE NOT EXISTS
(SELECT 1
FROM employee e2
WHERE e1.emp_id = e2.emp_id
AND e1.jobname_id = e2.jobname_id
AND SYSDATE BETWEEN e2.date_from
AND NVL(e2.date_to, SYSDATE + 1));
(This assumes your table is named "employee" and emp_id is the PK value).
It selects unique emp_id, jobname_id values where the emp_id, jobname_id values are not current.
EDIT: I agree with Chin Boon that fundamentally this is a design issue and perhaps that should be addressed rather than working around the problem.
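A minimal sketch of the FIRST_VALUE() idea, run on SQLite through Python's sqlite3 module instead of Oracle. The table name "employee" and all rows are assumptions made for illustration; the columns follow the question. The last query goes one step beyond the answers by keeping only the most recent of the differing jobs, which is one possible reading of "previous different jobname_id".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (id INTEGER PRIMARY KEY, date_from TEXT, "
    "date_to TEXT, emp_id INTEGER, jobname_id INTEGER, grade INTEGER)"
)
# Invented history: jobs 10 -> 20 -> 30, with grade changes inside 20 and 30.
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "2018-01-01", "2018-12-31", 1234, 10, 1),
        (2, "2019-01-01", "2019-12-31", 1234, 20, 1),
        (3, "2020-01-01", "2020-12-31", 1234, 20, 2),
        (4, "2021-01-01", "2021-12-31", 1234, 30, 1),
        (5, "2022-01-01", None, 1234, 30, 2),
    ],
)

# Tag every row with the employee's current job (newest date_from first).
tagged = """
    SELECT jobname_id, date_from,
           FIRST_VALUE(jobname_id) OVER (PARTITION BY emp_id
                                         ORDER BY date_from DESC) AS current_job
    FROM employee
    WHERE emp_id = 1234
"""

# All jobs that differ from the current one.
other_jobs = conn.execute(
    f"SELECT DISTINCT jobname_id FROM ({tagged}) "
    "WHERE jobname_id != current_job ORDER BY jobname_id"
).fetchall()

# Only the most recent job that differs from the current one.
previous_job = conn.execute(
    f"SELECT jobname_id FROM ({tagged}) "
    "WHERE jobname_id != current_job ORDER BY date_from DESC LIMIT 1"
).fetchone()
print(other_jobs, previous_job)
```

The ISO-formatted date strings sort chronologically, which is what lets the text column stand in for a real date type here.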

Resources