When I select, only one column is checked without duplicates - Oracle

I have 2 tables like this:
first table
+----+----------+----------+
| pk | user_one | user_two |
+----+----------+----------+
second table
+----+--------+----------+------------------+---------+
| pk | sender | receiver | fk of firsttable | content |
+----+--------+----------+------------------+---------+
The first and second tables have a one-to-many (1:N) relationship.
There are many records in the second table:
+-----+--------+----------+------------------+----------------------+
| pk  | sender | receiver | fk of firsttable | content              |
+-----+--------+----------+------------------+----------------------+
| 120 | car224 | car223   | 1                | test message1 to 223 |
| 121 | car224 | car223   | 1                | test message2 to 223 |
| 122 | car224 | car225   | 21               | test message1 to 225 |
| 123 | car224 | car225   | 21               | test message2 to 225 |
| 124 | car224 | car225   | 21               | test message3 to 225 |
| 125 | car224 | car225   | 21               | test message4 to 225 |
+-----+--------+----------+------------------+----------------------+
For rows that share the same fk value, I want to keep only the row with the largest pk.
I've changed the column names above to make them easier to understand.
Here is the actual SQL I've tried so far:
select *
  from (select rownum rn,
               mr.mrno,
               mr.user_one,
               mr.user_two,
               m.mno,
               m.content
          from tbl_messagerelation mr,
               tbl_message m
         where (mr.user_one = 'car224' or mr.user_two = 'car224')
           and m.rowid in (select max(rowid)
                             from tbl_message
                            group by m.mno)
           and rownum <= 1*20)
 where rn > (1-1) * 20
And this is the result:
+---------+-------+----------+----------+-------------------------+----------------------+
| rn | mrno | user_one | user_two | mno(pk of second table) | content |
+---------+-------+----------+----------+-------------------------+----------------------+
| 1 | 1 | car224 | car223 | 125 | test message4 to 225 |
| 2 | 21 | car224 | car225 | 125 | test message4 to 225 |
+---------+-------+----------+----------+-------------------------+----------------------+
My desired result is something like this:
+---------+---------+----------+--------------------+----------------------+
| fk | sender | receiver | pk of second table | content |
+---------+---------+----------+--------------------+----------------------+
| 1 | car224 | car223 | 121 | test message2 to 223 |
| 21      | car224  | car225   | 125                | test message4 to 225 |
+---------+---------+----------+--------------------+----------------------+

Your table description, compared with your query, is confusing me. However, what I could understand was that you are probably looking for row_number().
One important piece of advice is to use standard explicit JOIN syntax rather than the outdated a,b syntax for joins. The join keys were not clear to me, so replace the ? placeholders appropriately in your final query.
select *
  from (select mr.*, m.*,
               row_number() over (partition by m.fk order by m.pk desc) as rn
          from tbl_messagerelation mr
          join tbl_message m on mr.? = m.?)
 where rn = 1
Or perhaps you don't need that join at all:
select *
  from (select m.*,
               row_number() over (partition by m.fk order by m.pk desc) as rn
          from tbl_message m)
 where rn = 1
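If you also want to keep the pagination from your original query, the deduplication goes in the innermost block and the rownum windowing wraps around it. A sketch, reusing the generic fk/pk column names and the same ? join placeholders from above, with a page size of 20:
select *
  from (select t.*, rownum rn2
          from (select mr.mrno, mr.user_one, mr.user_two, m.mno, m.content,
                       row_number() over (partition by m.fk order by m.pk desc) rn
                  from tbl_messagerelation mr
                  join tbl_message m on mr.? = m.?
                 where mr.user_one = 'car224'
                    or mr.user_two = 'car224') t
         where t.rn = 1
           and rownum <= 1*20)
 where rn2 > (1-1)*20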

Related

Delete query in Oracle DB is fast but select query running too long

I have the below table with 160,000 rows. When I use SELECT ID FROM mytable WHERE id NOT IN (SELECT MAX(id) FROM mytable GROUP BY user_id); the query runs for a very long time and does not finish (I waited for 1 hour), but when I use DELETE FROM mytable WHERE id NOT IN (SELECT MAX(id) FROM mytable GROUP BY user_id); the query completes in 0.5 seconds. Why?
+----+-------------+------+---------+-----
| id | MyTimestamp | Name | user_id | ...
+----+-------------+------+---------+-----
| 0  | 1657640396  | John | 123581  | ...
| 1  | 1657638832  | Tom  | 168525  | ...
| 2  | 1657640265  | Tom  | 168525  | ...
| 3  | 1657640292  | John | 123581  | ...
| 4  | 1657640005  | Jack | 896545  | ...
+----+-------------+------+---------+-----
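The intent of the NOT IN query is to keep every row whose id is not the maximum for its user_id. A rewrite that often avoids a slow anti-join plan is an analytic function; a sketch, assuming id is NOT NULL and using the same column names as the question:
-- Same rows as the NOT IN version: every id that is not
-- the per-user_id maximum.
SELECT id
FROM (SELECT id,
             MAX(id) OVER (PARTITION BY user_id) AS max_id
      FROM mytable)
WHERE id <> max_id;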

Pull interlinked records based on rank and latest timestamp

I have a table like below.
myTable:
---------------------------------------------------------------------------------
id | ref | type | status | update_dt
---------------------------------------------------------------------------------
id1 | m1123 | 10 | 1 | 03-NOV-22 10.44.64.104000000 AM
id1 | m2123 | 10 | 2 | 03-NOV-22 10.44.64.104000000 AM
id1 | s1123 | 20 | | 03-NOV-22 10.44.64.104000000 AM
id1 | s2123 | 20 | | 03-NOV-22 10.44.54.104000000 AM
id1 | p1123 | 30 | | 03-NOV-22 10.44.54.104000000 AM
id2 | m1234 | 10 | | 02-NOV-22 10.44.64.104000000 AM
id2 | s1234 | 20 | | 02-NOV-22 10.44.54.104000000 AM
id2 | s2234 | 20 | | 02-NOV-22 10.44.54.104000000 AM
id3 | m1345 | 10 | 1 | 01-NOV-22 10.44.64.104000000 AM
id3 | s1345 | 20 | | 01-NOV-22 10.44.64.104000000 AM
id3 | s2345 | 20 | | 01-NOV-22 10.44.54.104000000 AM
---------------------------------------------------------------------------------
My requirement looks pretty complex to me, and I have tried but have not gotten all the way there. Here are my requirements:
1. From the table, I have to pull records of type 10 and 20 alone, with type 10 having status either null or 1.
2. For the type 10 comparison, I need to convert update_dt to epoch and pull all the type 10 records above a specific epoch.
3. Type 10 records are linked to type 20 records by the id; they have the same id.
4. For all the records pulled in step 2, I need to pull their corresponding type 20 records, but only the latest one based on update_dt.
5. If multiple records of type 20 have the same update_dt from step 4, any one of them can be picked.
Given the above requirements, for a sample epoch corresponding to Nov 1 2022 11 AM (1667300400), I need to get a result like this:
-----------------------------------------------------------------------------------------------
ref1 | ref2 | ref1_update_dt | ref2_update_dt
-----------------------------------------------------------------------------------------------
m1123 | s1123 | 03-NOV-22 10.44.64.104000000 AM | 03-NOV-22 10.44.64.104000000 AM
m1234 | s2234 | 02-NOV-22 10.44.64.104000000 AM | 02-NOV-22 10.44.54.104000000 AM
-----------------------------------------------------------------------------------------------
I tried the below, but didn't quite get there.
WITH cte_latest AS (
    SELECT t1.ref       ref1,
           t2.ref       ref2,
           t1.update_dt ref1_update_dt,
           t2.update_dt ref2_update_dt,
           RANK() OVER (ORDER BY t2.update_dt DESC) rank_temp
    FROM myTable t1
    JOIN myTable t2 ON t1.id = t2.id
    WHERE t1.type = 10
      AND (t1.status IS NULL OR t1.status = 1)
      AND t2.type = 20
      AND (CAST(t1.update_dt AS DATE) - TO_DATE('01/01/1970', 'DD/MM/YYYY')) * 24 * 60 * 60 > '1667300400'
)
SELECT ref1,
       ref2,
       ref1_update_dt,
       ref2_update_dt
FROM cte_latest
WHERE rank_temp = 1
ORDER BY ref1_update_dt;
Please help.
RANK will return the same number when there are multiple type 20 records with the same update_dt, so you will want to use ROW_NUMBER instead. That will ensure that each type 20 row gets a unique number to break any ties, per rule #5.
Also, you will need to partition the ROW_NUMBER by the id of the type 10 records. That will cause the numbering to reset at 1 for each type 10 record id. Without partitioning, every row in the result set would get a unique number.
ROW_NUMBER() OVER (PARTITION BY t1.id ORDER BY t2.update_dt DESC)
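Dropped into the original query, the full statement might look like this (a sketch; only the ranking expression changes from the attempt above, and the epoch literal is written as a number rather than a string):
WITH cte_latest AS (
    SELECT t1.ref       ref1,
           t2.ref       ref2,
           t1.update_dt ref1_update_dt,
           t2.update_dt ref2_update_dt,
           ROW_NUMBER() OVER (PARTITION BY t1.id
                              ORDER BY t2.update_dt DESC) rank_temp
    FROM myTable t1
    JOIN myTable t2 ON t1.id = t2.id
    WHERE t1.type = 10
      AND (t1.status IS NULL OR t1.status = 1)
      AND t2.type = 20
      AND (CAST(t1.update_dt AS DATE) - TO_DATE('01/01/1970', 'DD/MM/YYYY')) * 24 * 60 * 60 > 1667300400
)
SELECT ref1,
       ref2,
       ref1_update_dt,
       ref2_update_dt
FROM cte_latest
WHERE rank_temp = 1
ORDER BY ref1_update_dt;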

monetdb full outer join resulting in varchar type_digits=0

I am using MonetDB v11.29.7 "Mar2018-SP1" on a Windows 10 x64 operating system. When I perform a full outer join of two tables on their respective varchar columns with lengths > 0 (type_digits > 0), the resulting column in the target table is a varchar column with type_digits=0, although the column data seems to display the proper, non-null varchar records.
I am not sure how to interpret column information of type=varchar and type_digits=0. This state is causing issues in the subsequent handling/extraction of data via Python interfaces (UDFs), as the expected Python dtype for this column's data is ambiguous for numpy conversion.
I have provided a simple example whereby I create two small tables (dummy4 and dummy5) with two columns each and then create a third table (dummy6) using a full outer join.
For table dummy6 and column "key", I would have expected type_digits=32 (as per the "key" columns in the two source tables dummy4 & dummy5). Additionally, how should I interpret the type=varchar and type_digits=0 state? What would be the proper handling/expectation when accessing/allocating a Python/numpy array for extracting the "key" column of table dummy6 (via Python UDFs) in this case?
create table dummy4(key varchar(32), val int);
insert into dummy4 values('AAAAAAAA',1);
insert into dummy4 values('BBBBBBBBB',2);
select * from dummy4;
+-----------+------+
| key | val |
+===========+======+
| AAAAAAAA | 1 |
| BBBBBBBBB | 2 |
+-----------+------+
create table dummy5(key varchar(32), val int);
insert into dummy5 values('CCCCCCCC',3);
insert into dummy5 values('DDDDDDDD',4);
select * from dummy5;
+----------+------+
| key | val |
+==========+======+
| CCCCCCCC | 3 |
| DDDDDDDD | 4 |
+----------+------+
create table dummy6 as select key, dummy4.val as "val4", dummy5.val as "val5" from dummy4 full outer join dummy5 using (key);
select * from dummy6;
+-----------+------+------+
| key | val4 | val5 |
+===========+======+======+
| AAAAAAAA | 1 | null |
| BBBBBBBBB | 2 | null |
| CCCCCCCC | null | 3 |
| DDDDDDDD | null | 4 |
+-----------+------+------+
select t.name as "table_name", t.id as "table_id", c.id as "column_id", c.name as "column_name", c.type, c.type_digits from sys.tables t JOIN sys.columns c ON c.table_id = t.id where t.name = 'dummy4';
+------------+----------+-----------+-------------+---------+-------------+
| table_name | table_id | column_id | column_name | type | type_digits |
+============+==========+===========+=============+=========+=============+
| dummy4 | 78445 | 78443 | key | varchar | 32 |
| dummy4 | 78445 | 78444 | val | int | 32 |
+------------+----------+-----------+-------------+---------+-------------+
select t.name as "table_name", t.id as "table_id", c.id as "column_id", c.name as "column_name", c.type, c.type_digits from sys.tables t JOIN sys.columns c ON c.table_id = t.id where t.name = 'dummy5';
+------------+----------+-----------+-------------+---------+-------------+
| table_name | table_id | column_id | column_name | type | type_digits |
+============+==========+===========+=============+=========+=============+
| dummy5 | 78449 | 78447 | key | varchar | 32 |
| dummy5 | 78449 | 78448 | val | int | 32 |
+------------+----------+-----------+-------------+---------+-------------+
select t.name as "table_name", t.id as "table_id", c.id as "column_id", c.name as "column_name", c.type, c.type_digits from sys.tables t JOIN sys.columns c ON c.table_id = t.id where t.name = 'dummy6';
+------------+----------+-----------+-------------+---------+-------------+
| table_name | table_id | column_id | column_name | type | type_digits |
+============+==========+===========+=============+=========+=============+
| dummy6 | 78457 | 78454 | key | varchar | 0 |
| dummy6 | 78457 | 78455 | val4 | int | 32 |
| dummy6 | 78457 | 78456 | val5 | int | 32 |
+------------+----------+-----------+-------------+---------+-------------+
In fact, this was a MonetDB bug and it was fixed today. The fix will be featured in the upcoming Nov2019 release.
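If upgrading is not an option right away, one possible workaround (an untested assumption on my part, not something from the bug report) is to make the key column's type explicit in the CTAS with a cast:
create table dummy6 as
select cast(key as varchar(32)) as key,   -- explicit type so the result column keeps type_digits=32
       dummy4.val as "val4",
       dummy5.val as "val5"
from dummy4 full outer join dummy5 using (key);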

Get records from multiple Hive tables without join

I have 2 tables:
Table1 desc:
count int
Table2 desc:
count_val int
I get the fields count and count_val from the above tables and insert them into another audit table (table3).
Table3 desc:
count int
count_val int
I am trying to log the record counts of these 2 tables into the audit table for each job run.
Any suggestions are appreciated. Thanks!
If you want just aggregations (like sums), the solution is to use UNION:
INSERT INTO TABLE audit
SELECT SUM(count),
       SUM(count_val)
FROM (
    SELECT t1.count,
           0 AS count_val
    FROM table1 t1
    UNION ALL
    SELECT 0 AS count,
           t2.count_val
    FROM table2 t2
) unioned;
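If what you actually want to log is row counts per job run rather than sums of existing columns, the same UNION ALL shape works with COUNT(*); a sketch (cnt and cnt_val are just illustrative aliases):
INSERT INTO TABLE audit
SELECT SUM(cnt),
       SUM(cnt_val)
FROM (
    SELECT COUNT(*) AS cnt, 0 AS cnt_val FROM table1
    UNION ALL
    SELECT 0 AS cnt, COUNT(*) AS cnt_val FROM table2
) unioned;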
Otherwise a join is required, because you have to match your rows somehow; that is how relational algebra (the theory behind SQL) works.
==table1==
| count|
|------|
| 12 |
| 751 |
| 167 |
===table2===
| count_val|
|----------|
| 1991 |
| 321 |
| 489 |
| 7201 |
| 3906 |
===audit===
| count | count_val|
|-------|----------|
| ??? | ??? |

Slow Update When Using Oracle PL/SQL Table

We're using a PL/SQL table (named pTable) to collect a number of ids to be updated.
However, the statement
UPDATE aTable
   SET aColumn = 1
 WHERE id IN (SELECT COLUMN_VALUE
                FROM TABLE (pTable));
takes a long time to execute.
It seems that the optimizer comes up with a very bad execution plan: instead of using the index defined on id (the primary key), it decides to do a full table scan on aTable. pTable usually contains very few values (in most cases just one).
What can we do to make this faster? The best we've come up with is to handle low pTable.Count (1 and 2) as special cases, but that is certainly not very elegant.
Thanks for all the great suggestions. I wrote about this issue in my blog at http://smartercoding.blogspot.com/2010/01/performance-issues-using-plsql-tables.html.
You can try the cardinality hint. This is good if you know (roughly) the number of rows in the collection.
UPDATE aTable
   SET aColumn = 1
 WHERE id IN (SELECT /*+ cardinality( pt 10 ) */ COLUMN_VALUE
                FROM TABLE (pTable) pt);
Here's another approach. Create a temporary table:
create global temporary table pTempTable ( id int primary key )
on commit delete rows;
To perform the update, populate pTempTable with the contents of pTable and execute:
update
(
  select aColumn
    from aTable aa join pTempTable pp on aa.id = pp.id
)
set aColumn = 1;
This should perform reasonably well without resorting to optimizer hints.
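The population step is mentioned but not shown above; it might look like this (a sketch, assuming pTable's type is a schema-level collection that SQL can read through TABLE()):
-- Copy the collection's ids into the temporary table before running the update.
insert into pTempTable ( id )
select column_value
  from table ( pTable );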
The bad execution plan is probably unavoidable (unfortunately). There is no statistics information for the PL/SQL table, so the optimizer has no way of knowing that there are few rows in it. Is it possible to use hints in an UPDATE? If so, you might force use of the index that way.
It helped to tell the optimizer to use the "correct" index instead of going on a wild full-table scan:
UPDATE /*+ INDEX(aTable PK_aTable) */ aTable
   SET aColumn = 1
 WHERE id IN (SELECT COLUMN_VALUE
                FROM TABLE (CAST (pdarllist AS list_of_keys)));
I couldn't apply this solution to more complicated scenarios, but found other workarounds for those.
You could try adding a ROWNUM < ... clause.
In this test a ROWNUM < 30 changes the plan to use an index.
Of course that depends on your set of values having a reasonable maximum size.
create table atable (acolumn number, id number);
insert into atable select rownum, rownum from dual connect by level < 150000;
alter table atable add constraint atab_pk primary key (id);
exec dbms_stats.gather_table_stats(ownname => user, tabname => 'ATABLE');
create type type_coll is table of number(4);
/
declare
  v_coll type_coll;
begin
  v_coll := type_coll(1,2,3,4);
  UPDATE aTable
     SET aColumn = 1
   WHERE id IN (SELECT COLUMN_VALUE
                  FROM TABLE (v_coll));
end;
/
PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------
UPDATE ATABLE SET ACOLUMN = 1 WHERE ID IN (SELECT COLUMN_VALUE FROM TABLE (:B1 ))
----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------
| 0 | UPDATE STATEMENT | | | | 142 (100)| |
| 1 | UPDATE | ATABLE | | | | |
|* 2 | HASH JOIN RIGHT SEMI | | 1 | 11 | 142 (8)| 00:00:02 |
| 3 | COLLECTION ITERATOR PICKLER FETCH| | | | | |
| 4 | TABLE ACCESS FULL | ATABLE | 150K| 1325K| 108 (6)| 00:00:02 |
----------------------------------------------------------------------------------------------
declare
  v_coll type_coll;
begin
  v_coll := type_coll(1,2,3,4);
  UPDATE aTable
     SET aColumn = 1
   WHERE id IN (SELECT COLUMN_VALUE
                  FROM TABLE (v_coll)
                 where rownum < 30);
end;
/
/
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------
UPDATE ATABLE SET ACOLUMN = 1 WHERE ID IN (SELECT COLUMN_VALUE FROM TABLE (:B1 ) WHERE
ROWNUM < 30)
---------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------------
| 0 | UPDATE STATEMENT | | | | 31 (100)| |
| 1 | UPDATE | ATABLE | | | | |
| 2 | NESTED LOOPS | | 1 | 22 | 31 (4)| 00:00:01 |
| 3 | VIEW | VW_NSO_1 | 29 | 377 | 29 (0)| 00:00:01 |
| 4 | SORT UNIQUE | | 1 | 58 | | |
|* 5 | COUNT STOPKEY | | | | | |
| 6 | COLLECTION ITERATOR PICKLER FETCH| | | | | |
|* 7 | INDEX UNIQUE SCAN | ATAB_PK | 1 | 9 | 0 (0)| |
---------------------------------------------------------------------------------------------------
I wonder if the MATERIALIZE hint in the subselect from the PL/SQL table would force a temp table instantiation and help the optimizer?
UPDATE aTable
   SET aColumn = 1
 WHERE id IN (SELECT /*+ MATERIALIZE */ COLUMN_VALUE
                FROM TABLE (pTable));
