Oracle - SQL - Nested Queries vs. WHERE Clause - oracle

I have 3 tables:
1. BIG (~6 million records, indexed on ID1 and some other columns, not partitioned, on DB instance 1)
2. VBIG (~6 billion records, indexed on ID2 and some other columns, partitioned on the DATE field, on DB instance 2)
3. VVBIG (10-15% larger than VBIG, indexed on ID1, ID2 and some other columns, partitioned on the DATE field, on DB instance 2)
For a given DATE and a few other filter conditions, I use data from these 3 tables to run some processing. I have to decide between the following 2 queries.
select /*+ ORDERED */
column1, column2
from
BIG, VBIG, VVBIG
where
BIG.ID1 = VVBIG.ID1 and
VBIG.ID2 = VVBIG.ID2 and
VBIG.DATE = VVBIG.DATE and
VBIG.DATE = '1-Jan-2015' and
BIG.CL1 = 'XYZ' and
VVBIG.CL1 = 'ABC'
OR
select /*+ ORDERED */
column1, column2
from
(select /*+ parallel */ * from BIG
where BIG.CL1 = 'XYZ') BIG,
(select /*+ parallel */ * from VBIG
where VBIG.DATE = '1-Jan-2015') VBIG,
(select /*+ parallel */ * from VVBIG
where VVBIG.DATE = '1-Jan-2015' and VVBIG.CL1 = 'ABC') VVBIG
where
BIG.ID1 = VVBIG.ID1 and
VBIG.ID2 = VVBIG.ID2 and
VBIG.DATE = VVBIG.DATE
I am not sure whether Oracle is playing tricks or whether it is the distributed DB architecture, but my explain plan changes randomly.
My tests with synthetic data show better performance with option #2.
Is there a way I can be sure that this is the correct choice?
Also, my performance DBA suggested using
/*+ use_hash( BIG, VBIG, VVBIG ) full(BIG) full(VBIG) full(VVBIG) */
instead of the ORDERED hint. Is that advisable, given that I get a MERGE JOIN CARTESIAN with his suggested change?
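One way to see what the optimizer chose for each variant is to record both plans under separate statement IDs and print them one after the other. The following is only a sketch: the statement IDs opt1 and opt2, the aliases b, v and w, the selected columns, and the column name TXN_DATE are made up for illustration (DATE is a reserved word in Oracle, so the real column must be named differently), and any database links needed to reach instance 2 are left out.

```sql
-- Hypothetical names: TXN_DATE stands in for the DATE column of the question,
-- and b.ID1, v.ID2 stand in for column1, column2.
EXPLAIN PLAN SET STATEMENT_ID = 'opt1' FOR
SELECT /*+ ORDERED */ b.ID1, v.ID2
FROM   BIG b, VBIG v, VVBIG w
WHERE  b.ID1 = w.ID1
AND    v.ID2 = w.ID2
AND    v.TXN_DATE = w.TXN_DATE
AND    v.TXN_DATE = DATE '2015-01-01'
AND    b.CL1 = 'XYZ'
AND    w.CL1 = 'ABC';

EXPLAIN PLAN SET STATEMENT_ID = 'opt2' FOR
SELECT /*+ ORDERED */ b.ID1, v.ID2
FROM   (SELECT * FROM BIG   WHERE CL1 = 'XYZ') b,
       (SELECT * FROM VBIG  WHERE TXN_DATE = DATE '2015-01-01') v,
       (SELECT * FROM VVBIG WHERE TXN_DATE = DATE '2015-01-01'
                              AND CL1 = 'ABC') w
WHERE  b.ID1 = w.ID1
AND    v.ID2 = w.ID2
AND    v.TXN_DATE = w.TXN_DATE;

-- Print each recorded plan from the default PLAN_TABLE.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY('PLAN_TABLE', 'opt1', 'TYPICAL'));
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY('PLAN_TABLE', 'opt2', 'TYPICAL'));
```

If both plans come out identical, the optimizer has most likely merged the inline views, and the two formulations are equivalent as far as it is concerned.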

Related

Oracle | Update huge table after comparing values with other table

I have two huge tables. Let's call them the ITEM table (1807236 records) and the ITEM_PROD_DUMP table (796369 records).
I need to update two columns (total_volume_amount, total_volume_uom) in the ITEM table with the values from the second table, ITEM_PROD_DUMP, where the primary key (SYS_ITEM_ID) matches.
I have written a query to do so. It works, but only for a handful of records. For this many records, it just keeps on running.
Can anyone please help me write a correct and optimal query?
Query I have written:
update item i set i.total_volume_amount = (select ipd.total_volume_amount
from item_prod_dump ipd
where i.sys_item_id = ipd.sys_item_id),
i.total_volume_uom = (select ipd.total_volume_uom
from item_prod_dump ipd
where i.sys_item_id = ipd.sys_item_id)
where exists (select ipd.total_volume_amount
from item_prod_dump ipd
where i.sys_item_id = ipd.sys_item_id);
Use a MERGE statement. Pure and simple. 1.8 million records is not a "huge" number of records.
merge into item t
using ( SELECT *
FROM item_prod_dump ipd ) u
on ( t.sys_item_id = u.sys_item_id )
when matched then update set t.total_volume_amount = u.total_volume_amount,
t.total_volume_uom = u.total_volume_uom;
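If many rows in ITEM already hold the target values, it may help to skip them, since every updated row generates undo and redo. A sketch under that assumption (whether it pays off depends on how many rows actually differ, which the question does not say). It relies on the optional WHERE clause of the merge update and on DECODE, which treats two NULLs as equal:

```sql
MERGE INTO item t
USING (SELECT sys_item_id,
              total_volume_amount,
              total_volume_uom
       FROM   item_prod_dump) u          -- only the columns that are needed
ON (t.sys_item_id = u.sys_item_id)
WHEN MATCHED THEN UPDATE
  SET t.total_volume_amount = u.total_volume_amount,
      t.total_volume_uom    = u.total_volume_uom
  -- DECODE(a, b, 1, 0) returns 1 when a = b or both are NULL,
  -- so only rows where at least one column differs are updated.
  WHERE DECODE(t.total_volume_amount, u.total_volume_amount, 1, 0) = 0
     OR DECODE(t.total_volume_uom,    u.total_volume_uom,    1, 0) = 0;
```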

Oracle performance issues when using a subquery in an "IN" operator

I have two queries that look almost the same, but Oracle shows very different performance for them.
Query A
Create Table T1 as Select * from FinalView1 where CustomerID in ('A0000001','A000002')
Query B
Create Table T1 as Select * from FinalView1 where CustomerID in (select distinct CustomerID from CriteriaTable)
The CriteriaTable has 800 rows, but they all belong to CustomerID 'A0000001' and 'A000002'.
This means the subquery "select distinct CustomerID from CriteriaTable" returns only the same two elements ('A0000001','A000002') that were entered manually in Query A.
The following is the query behind FinalView1:
create or replace view FinalView1_20200716 as
select
Customer_ID,
<Some columns>
from
Table1_20200716 T1
INNER join Table2_20200716 T2 on
T1.Invoice_number = T2.Invoice_number
and
T1.line_id = T2.line_id
left join Table3_20200716 T3 on
T3.id = T1.Customer_ID
left join Table4_20200716 T4 on
T4.Shipping_ID = T1.Shipping_ID
left join Table5_20200716 Table5 on
Table5.Invoice_ID = T1.Invoice_ID
left join Table6_20200716 T6 on
T6.Shipping_ID = T4.Shipping_ID
left join First_Order first on
first.Shipping_ID = T1.Shipping_ID
;
Table1_20200716, Table2_20200716, Table3_20200716, Table4_20200716, Table5_20200716 and Table6_20200716 are views on the corresponding tables that use the temporal validity feature. For example, this is the query behind Table1_20200716:
Create or replace view Table1_20200716 as
select
*
from Table1 as of period for <period name> to_date('20200716','yyyymmdd')
However, "First_Order" is just a normal table.
The following is the performance of both queries (according to the explain plan):
Query A:
Cardinality: 102
Cost : 204
Total Runtime: 5 secs max
Query B:
Cardinality:27921981
Cost: 14846
Total Runtime:20 mins until user cancelled
All tables are indexed on the columns used to join against the other tables in FinalView1. According to the explain plan, these indexes were all used except for the one on the First_Order table.
Query A used a unique index on the First_Order table, while Query B performed a full scan.
For Query B, I expected Oracle to run the subquery first and feed its result into the IN operator before executing the main query, so it should have had only a minor impact on performance.
Thanks in advance!
As mentioned in my comment 2 days ago, someone had posted a solution and then removed it, even though the answer actually works. After waiting for 2 days, I decided to post that solution myself.
The solution suggested that the "in" operator was slowing the query down and that I should replace it with an inner join:
Create Table T1 as
Select
FV.*
from
FinalView1 FV
inner join (
select distinct
CustomerID
from
CriteriaTable
) CT on CT.customerid = FV.customerID;
The result from the explain plan was worse than before:
Cardinality: 28364465 (from 27921981)
Cost: 15060 (from 14846)
However, it takes only 17 secs, which is very good!
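A worse estimate combined with a much faster run suggests comparing the optimizer's estimated row counts with the actual ones. A sketch of one way to do that, with both statements run in the same session. The GATHER_PLAN_STATISTICS hint and DBMS_XPLAN.DISPLAY_CURSOR are standard Oracle features; running the join as a plain SELECT instead of inside the CREATE TABLE is a simplification made here.

```sql
-- Execute the query with row-source statistics collection switched on.
-- All rows have to be fetched for the actual counts to be complete.
SELECT /*+ GATHER_PLAN_STATISTICS */
       FV.*
FROM   FinalView1 FV
       INNER JOIN (SELECT DISTINCT CustomerID
                   FROM   CriteriaTable) CT
               ON CT.CustomerID = FV.CustomerID;

-- Show the plan of the last statement executed in this session,
-- with estimated (E-Rows) and actual (A-Rows) row counts per step.
SELECT *
FROM   TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL, NULL, 'ALLSTATS LAST'));
```

Steps where E-Rows and A-Rows differ by orders of magnitude are the places where the cost figure is least trustworthy. In SQL*Plus, SERVEROUTPUT should be off, otherwise the "last statement" may not be the query itself.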

In Elasticsearch, how can I establish join query with conditions and later perform percentile and count functions?

I have a set of tables in my database: table A holds a set of categories and table B a set of repositories. A and B are related by categoryid. Table C holds a set of properties for a repoId. Tables C and B are associated by repoId.
Table C can have multiple values for a repoId.
Each property in table C is a numeric string such as 12345XXXX (at most 10 characters). I have to find the most frequent first 6 characters among the values in table C, and the count of repoIds associated with them, for a particular categoryid in table A.
Table A (set of categories) ---------> Table B (set of repositories, associated with A by categoryid) ---------> Table C (set of FMProperties against a repoId)
Currently this is achieved using joins and substring queries on these tables, and it is very slow.
I have to implement this functionality using Elasticsearch, and I don't have a clear view of how to start.
Do I create separate documents / indexes for tables A, B and C, or fetch the data using a SQL query and create a single document?
And how can we apply the analytics part explained above?
I am very new to this technology, but I am following the tutorials provided on the Elasticsearch site.
Please find below the MySQL query for this logic:
(select 'fms' as fmstype, C.fmscode as fmsCode,
count(C.repoId) as countOffms from tableC C, tableB B
where B.repoId = C.repoId and B.categoryid = 175
group by C.fmscode
order by countOffms desc
limit 1)
UNION ALL
(select 'fms6' as fmstype, t1.fmscode, t2.countOffms from tableC t1
inner join
(
select substring(C.fmscode,1,6) as first6,
count(C.repoId) as countOffms from tableC C, tableB B
where B.repoId = C.repoId and B.categoryid = 175 and length(C.fmscode) = 6
group by substring(C.fmscode,1,6) order by countOffms desc
limit 1 ) t2
ON
substring(t1.fmscode,1,6) = t2.first6 and length(t1.fmscode) = 6
group by t1.fmscode
order by count(t1.fmscode) desc
limit 1)

Copy records from one table to another with pl-sql

I want to copy records from one table to another.
Only the records from table 1 that do not yet exist in table 2 should be copied.
If duplicate records exist in table 1, only the record with the longest name should be copied to table 2.
I have already implemented a query that almost does what I want.
The problem arises when several names share the same maximum length.
In these cases, my query returns more than one record, and I want to insert just one new record into table 2.
Does anyone know how I can fix this?
Here is my code:
For x in (Select distinct xdd.id_t, xdd.name_t
From table1 xdd
Where xdd.id_t not in (Select distinct det.id_t2
From table2 det)
And LENGTH(xdd.name_t) in (Select Max(LENGTH(xdd2.name_t))
From table1 xdd2
Where xdd2.id_t = xdd.id_t)
) Loop
Insert into table2 (id_t2, name_t2)
Values (x.id_t, x.name_t);
End loop;
Can you give me an example to solve this?
Sure. If I understood the requirements correctly, the merge statement will look similar to this one.
We use the row_number() analytic function to pick, among duplicate records, the one with the longest name_t:
merge into table_two t2
using(
select id_t
, name_t
from (select id_t
, name_t
, row_number() over(partition by id_t
order by length(name_t) desc) as rn
from table_one) q
where q.rn = 1
) t1
on (t2.id_t = t1.id_t)
when not matched then
insert(id_t, name_t)
values(t1.id_t, t1.name_t)
SQLFiddle demo
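With only length(name_t) in the ORDER BY of row_number(), the database is free to pick any of several equally long names, and a NULL name would sort first in descending order. A sketch of a deterministic variant, assuming that alphabetical order is an acceptable tie-breaker (the question does not say which of the equally long names should win):

```sql
MERGE INTO table_two t2
USING (
  SELECT id_t, name_t
  FROM  (SELECT id_t,
                name_t,
                ROW_NUMBER() OVER (
                  PARTITION BY id_t
                  -- longest name first, NULL names last,
                  -- ties broken alphabetically (assumed rule)
                  ORDER BY LENGTH(name_t) DESC NULLS LAST, name_t
                ) AS rn
         FROM   table_one) q
  WHERE  q.rn = 1
) t1
ON (t2.id_t = t1.id_t)
WHEN NOT MATCHED THEN
  INSERT (id_t, name_t)
  VALUES (t1.id_t, t1.name_t);
```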
This is a merge statement that should "upsert" data from table 1 into table 2. Matching keys are updated only when the name in table 1 is longer than the one in table 2, and inserts occur when keys from table 1 are not matched in table 2.
MERGE INTO table2 D
USING (SELECT table1.id_t, table1.name_t FROM table1) S
ON (D.id_t2 = S.id_t)
WHEN MATCHED THEN UPDATE SET D.name_t2 = S.name_t
WHERE (LENGTH(S.name_t) > LENGTH(D.name_t2))
WHEN NOT MATCHED THEN INSERT (D.id_t2, D.name_t2)
VALUES (S.id_t, S.name_t);

Oracle Query : Single update statement

I need to code an Oracle query for the logic below; any help is appreciated.
I have a table with 8 columns, of which 3 need to be considered for a specific piece of business logic.
Table data (with the 3 columns)
A B C
071699 01 I
071699 01W
071699 02W
071699 01W I
071699 02W
more rows.
The amount of data varies by case: there can be one or more rows per A-B combination, and usually column C is populated for at least 1 of these combinations.
This table has over 100K distinct A values.
Logic I need to implement:
Check, for a specific A value, how many combinations (A-B) we have.
For a specific A, check whether any combination (A-B) has column C populated.
Take the value from the populated C column and update the other combinations of the same A in the same table.
Data before (only showing specific rows)
A B C
071699 01 I
071699 01W
071699 02W
Data After Query
A B C
071699 01 I
071699 01W I
071699 02W I
I have a SQL Server query that implements this logic in a single statement, but it does not work in Oracle; I get the error "single-row subquery returns more than one row".
SQL Server Query
update c
set c.colC = u.colC
from data_table u
join data_table c on u.colA = c.colA and u.colB <> c.colB
and u.colB = (select MIN(colB) from data_table
where colA = u.colA and colC is not null)
and u.colC is not null and c.colC is null
Any help writing an equivalent Oracle version is appreciated.
Oracle doesn't allow joins in update queries unless you have a unique column which guarantees a 1-1 mapping which definitely doesn't apply in your case. So about the best you can do here is a nested subquery. It ain't pretty but it will work. I wasn't able to completely match up the logic you said you needed to implement with the logic that was in the SQL Server update statement, so I went with the statement.
UPDATE data_table u
SET u.COLC =(
SELECT c.ColC
FROM data_table c
WHERE c.ColA = u.ColA
and c.ColB =(
SELECT MIN( ColB )
FROM data_table
WHERE ColA = c.ColA
AND ColC IS NOT NULL
)
)
where u.ColC is null;
I tried various ways to do this task in a single query but could not achieve it due to Oracle limitations. Below is the solution that worked for me:
I split the query into two parts: one insert statement and one update.
First query: insert rows into a temp table, taking the distinct colA - colC rows where colC is populated.
Second query: use the tmp table created in step 1 to update the main data table by joining on colA.
If anyone has a better solution, please post it and I'll surely try it.
I resolved this issue with set-based code after trying various methods such as splitting into two tmp tables, a cursor, etc. Below is the sample code.
This query is 3-5X faster than any other solution I tried.
UPDATE data_table a SET C = (
WITH comp AS (
SELECT DISTINCT A, C FROM data_table a
WHERE B = (SELECT MIN(B) FROM data_table
WHERE A = a.A
AND C IS NOT NULL)
)
SELECT
CASE
when a.C IS NULL THEN c.C
when a.C IS NOT NULL THEN a.C
END
FROM comp c
WHERE a.A = c.A AND c.C IS NOT NULL
);
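Another set-based formulation that may be worth testing is a MERGE whose source returns exactly one row per colA, so that the "more than one row" error cannot occur. This is a sketch, not a measured alternative. It uses the colA/colB/colC names from the SQL Server query and, like that query, takes colC from the row with the smallest colB:

```sql
MERGE INTO data_table t
USING (
  -- One row per colA: the colC value that belongs to the smallest colB
  -- among the rows where colC is populated.
  SELECT colA,
         MIN(colC) KEEP (DENSE_RANK FIRST ORDER BY colB) AS colC
  FROM   data_table
  WHERE  colC IS NOT NULL
  GROUP  BY colA
) s
ON (t.colA = s.colA)
WHEN MATCHED THEN UPDATE
  SET t.colC = s.colC
  WHERE t.colC IS NULL;   -- leave already populated rows untouched
```

Because the source is grouped by colA, each target row matches at most one source row, which is what MERGE requires.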
