SQL Select query Performance improvement - performance

I've a bunch of queries like this :
-- get all data that exists in source but not yet in destination
SELECT
*
INTO #temp
FROM source T010T
WHERE NOT EXISTS
(
SELECT TOP 1 1 FROM destination P510T
WHERE WH_CD = T010T.WH_CD
AND POS_NO = T010T.POS_NO
AND SLIP_NO = T010T.TRAN_NO
AND OPE_DATE = T010T.SL_REC_DATE
)
-- process the data
....
-- insert data into destination
Insert into destination select * From #temp
I'm wondering will this way of approach will affect the performance? Because I haven't got the real data to test and this is running locally so I'm kind of scared that when put into reality, those queries will be a pain in the a##!
Is there any better alternatives?
p/s : columns on both table used in the comparison are all primary keys primarykey(wh_cd,pos_no,slip_no,ope_date) ...

Try with Left Join instead:
SELECT
*
INTO #temp
FROM source T010T
LEFT JOIN destination P510T
ON WH_CD = T010T.WH_CD
AND POS_NO = T010T.POS_NO
AND SLIP_NO = T010T.TRAN_NO
AND OPE_DATE = T010T.SL_REC_DATE
WHERE P510T.WH_CD IS NULL

Related

Oracle mutliple left outer joins only max() rows

I have a statement that is used quiet often but does perform pretty poorly.
So i want to optimize it as good as possible. The statement consist of various parts and unions.
One part that is pretty costly is the following.
select *
from rc050 F
LEFT OUTER JOIN
(SELECT fi_nr,
fklz,
rvc_status,
lfdnr,
txt
FROM rc0531 temp
WHERE lfdnr =
(SELECT MAX(lfdnr) FROM rc0531 WHERE fi_nr = temp.fi_nr AND fklz = temp.fklz
)
) SUB_TABLE1
ON F.fklz = SUB_TABLE1.fklz
AND F.fi_nr = SUB_TABLE1.fi_nr
LEFT OUTER JOIN
(SELECT fi_nr,
fklz,
rvc_status,
lfdnr,
txt
FROM rc0532 temp
WHERE lfdnr =
(SELECT MAX(lfdnr) FROM rc0532 WHERE fi_nr = temp.fi_nr AND fklz = temp.fklz
)
) SUB_TABLE2
ON F.fklz = SUB_TABLE2.fklz
AND F.fi_nr = SUB_TABLE2.fi_nr
LEFT OUTER JOIN
(SELECT fi_nr,
fklz,
rvc_status,
lfdnr,
txt
FROM rc05311 temp
WHERE lfdnr =
(SELECT MAX(lfdnr)
FROM rc05311
WHERE fi_nr = temp.fi_nr
AND fklz = temp.fklz
)
) SUB_TABLE11
ON F.fklz = SUB_TABLE11.fklz
AND F.fi_nr = SUB_TABLE11.fi_nr
where F.fklz != ' '
This is a part where I have to load from these rc0531 ... rc05311 tables
what there latest entry is. There are 11 of these tables so this is broken down.
As you can see I'm currently joining each table via subquery and need only the latest entry so i need an additional subquery to get the max(lfdnr).
This all works good so far but i wanna know if there is a more efficient way to perform this.
In my main select I need to be able to address each column from each of these tables.
Do you guys have any suggestions ?
This currently runs in 1.3 sec on 13k rows to get a decent boost this should get down to 0.1 sec or so. Again this is only one problem in a bigger statement full of unefficient declarations.
It's not possible to optimize a SQL query. Oracle is going to take your query and determine an execution plan that it uses to determine the information you've asked for. You might completely rewrite the query and, if it leads to the same execution plan, you'll get exactly the same performance. To tune a query, you need to determine the execution plan.
You may well benefit from an index on each of these tables, on the (fi_nr, fklz) columns or possibly even (fi_nr, fklz, lfdnr), depending on how selective that gets. There's no way to tell in advance from this information, you'll just have to try.
You should also remove the select * and select only the columns you want. If it's possible to get just the needed information from an index, Oracle will not need to retrieve the actual table row.
replace the left joins from :
LEFT OUTER JOIN (
SELECT fi_nr,fklz,rvc_status,lfdnr,txt FROM rc0531 temp
WHERE lfdnr = (SELECT MAX(lfdnr) FROM rc0531 WHERE fi_nr = temp.fi_nr AND fklz = temp.fklz)
) SUB_TABLE1
ON F.fklz = SUB_TABLE1.fklz
AND F.fi_nr = SUB_TABLE1.fi_nr
to this:
LEFT OUTER JOIN (
SELECT fi_nr,fklz,rvc_status,lfdnr,txt FROM rc0531 inner join
(SELECT fi_nr, fklz, MAX(lfdnr) lfdnr FROM rc0531 group by fi_nr, fklz)x on x.fi_nr=rc0531.fi_nr and x.fklz=rc0531.fklz and x.lfdnr=rc0531.lfdnr
) SUB_TABLE1
ON F.fklz = SUB_TABLE1.fklz
AND F.fi_nr = SUB_TABLE1.fi_nr
let me know if it got to 0.1 sec, i think it will go

Using Merge in Oracle for updating a parent table using Joins

I have a SQL statement as below, I couldn't execute the statement in Oracle as it doesn't support Joins in Update. I have tried using updating using a temp table and completed the job, however, I would like to use the Merge statement and failed to write the code.
Let me know if there is a possibility in Oracle use joins in an update which doesn't add any performance overhead.
UPDATE A SET
YearStartPD = B.PERIOD_CODE,
YearEndPD = A.PERIOD_CODE,
PY_YearStartPD = C.PERIOD_CODE,
PY_YearEndPD = A.PY_WeekEndPD
FROM dbo.DIM_TIME A
INNER JOIN
(
select PERIOD_CODE,WEEK_PERIOD_CODE,YEAR_CODE from dbo.DIM_TIME
WHERE WEEK_PERIOD_CODE = '01'
) B ON B.Year_Code = A.Year_Code
LEFT JOIN
(
select PERIOD_CODE,WEEK_PERIOD_CODE,YEAR_CODE from dbo.DIM_TIME
WHERE WEEK_PERIOD_CODE = '01'
) C ON C.Year_Code = A.Year_Code - 1

Update statement with joins in Oracle

I need to update one column in table A with the result of a multiplication of one field from table A with one field from table B.
It would be pretty simple to do this in T-SQL, but I can't write the correct syntax in Oracle.
What I've tried:
UPDATE TABLE_A
SET TABLE_A.COLUMN_TO_UPDATE =
(select TABLE_A.COLUMN_WITH_SOME_VALUE * TABLE_B.COLUMN_WITH_PERCENTAGE
from TABLE_A
INNER JOIN TABLE_B
ON TABLE_A.PRODUCT_ID = TABLE_B.PRODUCT_ID
AND TABLE_A.SALES_CHANNEL_ID = TABLE_B.SALES_CHANNEL_ID)
WHERE TABLE_A.MONTH_ID IN (201601, 201602, 201603);
But I keep getting errors. Could anybody help me, please?
I generally prefer to use the below format for such cases since this will ensure there's no update performed if there's no data in the table(query extracted temp table) whereas in the above solution provided by Brian Leach will update the new value as null if there's no record present in the 2nd table but exists in the first table.
UPDATE
(
select TABLE_A.COLUMN_TO_UPDATE
, TABLE_A.PRODUCT_ID
, TABLE_A.COLUMN_WITH_SOME_VALUE * TABLE_B.COLUMN_WITH_PERCENTAGE as value
from TABLE_A
INNER JOIN TABLE_B
ON TABLE_A.PRODUCT_ID = TABLE_B.PRODUCT_ID
AND TABLE_A.SALES_CHANNEL_ID = TABLE_B.SALES_CHANNEL_ID
AND TABLE_A.MONTH_ID IN (201601, 201602, 201603)
) DATA
SET DATA.COLUMN_TO_UPDATE = DATA.value;
This solution can cause key preserved value issues which shouldn't be an issue here since i expect a single row in both the tables for one product(ID).
More on Key Preserved table concept in inner join can be found here
https://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:548422757486
#Jayesh Mulwani raiesed a valid point, this will set the value to null if there is no matching record. This may or may not be the desired result. If it isn't, and no change is desirect, you can change the select statement to:
coalesce((SELECT table_b.column_with_percentage
FROM table_b
WHERE table_a.product_id = table_b.product_id AND table_a.sales_channel_id = table_b.sales_channel_id),1)
If this is the desired outcome, Jayesh's solution will be more efficient as it will only update matching records.
UPDATE table_a
SET table_a.column_to_update = table_a.column_with_some_value
* (SELECT table_b.column_with_percentage
FROM table_b
WHERE table_a.product_id = table_b.product_id
AND table_a.sales_channel_id = table_b.sales_channel_id)
WHERE table_a.month_id IN (201601, 201602, 201603);

Nested select in hiveQL

In one of my use case, i have two tables namely flow and conf. The flow table contains list of all flight data. It has columns creationdate,datafilename,aircraftid. The conf table contains configuration information. It has columns configdate, aircraftid, configurationame. There are multiple versions of configurations created for one aircraft type. So, when we process a datafilename, we need to identify the aircraftid from the flow table, and pick up the configuration from conf table that was created just before the datafilename was created. So, i tried this,
FROM (
SELECT
F_FILE_CREATION_DATE,
F_FILE_ARCHIVED_RELATIVE_PATH,
F_FILE_ARCHIVED_NAME,
K_AIRCRAFT
from T_FLOW f )x left join
(
select c.config_date, c.aircraft_id, c.configurationfrom t_conf c
) y on y.aircraft_id = x.K_AIRCRAFT
select
x.F_FILE_CREATION_DATE,
x.F_FILE_ARCHIVED_RELATIVE_PATH,
x.F_FILE_ARCHIVED_NAME,
x.K_AIRCRAFT,
y.config_date,
y.aircraft_id,
y.configuration;
This picks up all the configurations created for the aircraft which is obvious as there is no condition to check conf.config_date < flow.f_file_creation_date. I tried to include this condition like this,
FROM (
SELECT
F_FILE_CREATION_DATE,
F_FILE_ARCHIVED_RELATIVE_PATH,
F_FILE_ARCHIVED_NAME,
K_AIRCRAFT
from T_FLOW f )x join
(
select c.config_date, c.aircraft_id, c.FILEFILTER from t_conf c
) y on y.aircraft_id = x.K_AIRCRAFT where y.config_date < x.f_file_creation_date
select
x.F_FILE_CREATION_DATE,
x.F_FILE_ARCHIVED_RELATIVE_PATH,
x.F_FILE_ARCHIVED_NAME,
x.K_AIRCRAFT,
y.config_date,
y.aircraft_id,
y.filefilter;
This time failed with the error
required (...)+ loop did not match anything at input 'where' in statement
Can someone give me a hint or two where i am going wrong and on how to fix this?
select f.f_file_creation_date
,f.f_file_archived_relative_path
,f.f_file_archived_name
,f.k_aircraft
,c.config_date
,c.aircraft_id
,c.filefilter
from t_flow as f
join (select config_date
,aircraft_id
,filefilter
,lead (config_date,1,date '3000-01-01') over
(
partition by aircraft_id
order by config_date
) as next_config_date
from t_conf
) c
on c.aircraft_id =
f.k_aircraft
where f.f_file_creation_date >= c.config_date
and f.f_file_creation_date < c.next_config_date
Please read carefully
Posting a question
When you post a data related question -
Supply a data sample: source data + required results.
It is going to be more clear than any explanation you give.
It will also supply a common background for further discussions and a way for you and others to verify the correctness of the given solutions.
Supply the size properties (records/volume) of the tables.
It is important for performance considerations ans might impact the given solution.
SQL
Hive currently does not support any JOIN condition type other than equijoin (e.g. t1.X = t2.X and t1.Y = t2.Y). This is why you get an error.
If you are doing an inner join (and not outer join) then you can move the non-equijoin conditions to the WHERE clause.
Stick to ISO SQL standard. There is a conventional order for SQL clauses: SELECT-FROM-WHERE...
You gain nothing from esoteric syntax except for esoteric error messages.
There is no reason what so ever to use sub-queries in order to narrow the columns list.
Just to make it perfectly clear - There isn't any performance gain doing that. More than that, if it would have work as you assume (and it does not) the performance would have been worse, not better.
I can't reproduce your error. I guess your query is valid.
What version do you use for Hive ? I tested this query with hive 2.1.1.
DROP TABLE IF EXISTS t_flow;
CREATE TABLE IF NOT EXISTS t_flow (
f_file_creation_date DATE
, f_file_archived_relative_path STRING
, f_file_archived_name STRING
, k_aircraft STRING
);
-- Conf table contains configuration information.
-- It has columns configdate, aircraftid, configurationame
DROP TABLE IF EXISTS t_conf;
CREATE TABLE IF NOT EXISTS t_conf (
config_date DATE
, aircraft_id STRING
, filefilter STRING
);
SELECT
x.f_file_creation_date,
x.f_file_archived_relative_path,
x.f_file_archived_name,
x.k_aircraft,
y.config_date,
y.aircraft_id,
y.filefilter
FROM
(SELECT
f_file_creation_date,
f_file_archived_relative_path,
f_file_archived_name,
k_aircraft
FROM t_flow f) x
JOIN
(SELECT
c.config_date,
c.aircraft_id,
c.filefilter
FROM t_conf c) y on y.aircraft_id = x.k_aircraft where y.config_date < x.f_file_creation_date;

Oracle Table Variables

Using Oracle PL/SQL is there a simple equivalent of the following set of T-SQL statements? It seems that everything I am finding is either hopelessly outdated or populates a table data type with no explanation on how to use the result other than writing values to stdout.
declare #tempSites table (siteid int)
insert into #tempSites select siteid from site where state = 'TX'
if 10 > (select COUNT(*) from #tempSites)
begin
insert into #tempSites select siteid from site where state = 'OK'
end
select * from #tempSites ts inner join site on site.siteId = ts.siteId
As #AlexPoole points out in his comment, this is a fairly contrived example.
What I am attempting to do is get all sites that meet a certain set of criteria, and if there are not enough matches, then I am looking to use a different set of criteria.
Oracle doesn't have local temporary tables, and global temporary tables don't look appropriate here.
You could use a common table expression (subquery factoring):
with tempSites (siteId) as (
select siteid
from site
where state = 'TX'
union all
select siteid
from site
where state = 'OK'
and (select count(*) from site where state = 'TX') < 10
)
select s.*
from tempSites ts
join site s on s.siteid = ts.siteid;
That isn't quite the same thing, but gets all the TX IDs, and only includes the OK ones if the count of TX ones - which has to be repeated - is less than 10. The CTE is then joined back to the original table, which all seems a bit wasteful; you're hitting the same table three times.
You could use a subquery directly in a filter instead:
select *
from site
where state = 'TX'
or (state = 'OK'
and (select count(*) from site where state = 'TX') < 10);
but again the TX sites have to be retrieved (or at least counted) a second time.
You can do this with a single hit of the table using an inline view (or CTE if you prefer) with an analytic count - which add the count of TX rows to the columns in the actual table, so you'd probably want to exclude that dummy column from the final result set (but using * is bad practice anyway):
select * -- but list columns, excluding tx_count
from (
select s.*,
count(case when state = 'TX' then state end) over (partition by null) as tx_count
from site s
where s.state in ('TX', 'OK')
)
where state = 'TX'
or (state = 'OK' and tx_count < 10);
From your description of your research it sounds like what you've been looking at involved PL/SQL code populating a collection, which you could still do, but it's probably overkill unless your real situation is much more complicated.

Resources