How does the Oracle (+) join work in this scenario - oracle

I have a question about Oracle's (+) outer-join operator in a case where two tables are joined on two columns, but only one of the join conditions carries the (+). I am converting a large number of queries to ANSI join syntax and am curious about this one.
The query using the (+) join works, but when I convert it to an ANSI left/right join the results are different. See the sample code.
--OLD
select *
from tbl1 a, tbl2 b
where a.col1 = b.col1 (+)
and a.col2 = b.col2
--CONVERTED
select *
from tbl1 a
left join tbl2 b on a.col1 = b.col1 and a.col2 = b.col2
Is there a way to make the CONVERTED code work just like the OLD?
Thanks.
RS..

Your first query does not perform a left outer join. It behaves as an inner join, because one of the join conditions on tbl2 has no (+). This is one reason the newer join syntax is preferable: it is more readable and less error-prone.
If you really want to keep the old (+) syntax, you can use the following code.
select *
from tbl1 a, tbl2 b
where a.col1 = b.col1 (+)
and a.col2 = b.col2 (+)
The code above performs a proper left outer join, because every condition on tbl2 is now tagged with (+).
However, it is recommended that you use the newer join format, that is, the LEFT JOIN syntax.
Hope this clears up your doubts.
Cheers!!

If you simply want to convert to ANSI syntax, then you should know that your old query is executed as an inner join and not as an outer join. The converted query should be:
select *
from tbl1 a
join tbl2 b on (
b.col1 = a.col1
and b.col2 = a.col2
)
If you think you have stumbled on a bug, and that the query should be executed as an outer join, then your query will be:
select *
from tbl1 a
left join tbl2 b on (
b.col1 = a.col1
and b.col2 = a.col2
)
If you want to reproduce the same mistake as the original query, i.e. a query that appears to be an outer join while actually being executed as an inner join, the query will be:
select *
from tbl1 a
left join tbl2 b on (
b.col1 = a.col1
)
where b.col2 = a.col2
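To see the difference between the three ANSI variants above, here is a minimal sketch that runs them against an in-memory SQLite database from Python. SQLite is only a stand-in (it has no (+) operator), and the sample rows are invented for illustration; the table and column names follow the question.

```python
import sqlite3

# In-memory database with made-up sample rows (assumption: integer columns).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl1 (col1 INTEGER, col2 INTEGER);
    CREATE TABLE tbl2 (col1 INTEGER, col2 INTEGER);
    INSERT INTO tbl1 VALUES (1, 10), (2, 20), (3, 30);
    -- (1, 10) matches on both columns; (2, 99) matches on col1 only
    INSERT INTO tbl2 VALUES (1, 10), (2, 99);
""")

def rows(sql):
    """Run a query and return all rows as a list of tuples."""
    return conn.execute(sql).fetchall()

# Plain inner join on both columns.
inner = rows("""
    SELECT a.col1, b.col1 FROM tbl1 a
    JOIN tbl2 b ON b.col1 = a.col1 AND b.col2 = a.col2
    ORDER BY a.col1
""")

# True outer join: both conditions live in the ON clause.
left_on = rows("""
    SELECT a.col1, b.col1 FROM tbl1 a
    LEFT JOIN tbl2 b ON b.col1 = a.col1 AND b.col2 = a.col2
    ORDER BY a.col1
""")

# Outer join whose second condition sits in WHERE: unmatched rows are filtered out again.
left_where = rows("""
    SELECT a.col1, b.col1 FROM tbl1 a
    LEFT JOIN tbl2 b ON b.col1 = a.col1
    WHERE b.col2 = a.col2
    ORDER BY a.col1
""")

print(inner)       # rows matched on both columns
print(left_on)     # every tbl1 row, NULL-extended where tbl2 has no match
print(left_where)  # same rows as the inner join
```

With this data the third query returns exactly the rows of the inner join, because `b.col2 = a.col2` is never true for a NULL-extended row.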

Related

Oracle CASE statement optimization

Is there a way to optimize this statement in terms of performance?
SELECT
CASE
WHEN A.COL1 IN ('A','F','G','K','L') THEN 'VALUE1'
WHEN A.COL1 IS NULL AND B.COL1 IN ('A','F','G','K','L') THEN 'VALUE1'
ELSE NULL
END AS VALUES_COLUMN
FROM
TABLE A LEFT JOIN TABLE B ON A.COD = B.COD
I was thinking about using an OR expression to avoid redundant code and reduce the number of comparisons, like this:
SELECT
CASE
WHEN A.COL1 IN ('A','F','G','K','L') OR B.COL1 IN ('A','F','G','K','L') THEN 'VALUE1'
ELSE NULL
END AS VALUES_COLUMN
FROM
TABLE A LEFT JOIN TABLE B ON A.COD = B.COD
Thanks
I don't know if this is "optimized" (could mean several different things), but it's shorter:
SELECT
CASE
WHEN NVL(A.COL1, B.COL1) IN ('A','F','G','K','L') THEN 'VALUE1'
ELSE NULL
END AS VALUES_COLUMN
FROM
TABLE A LEFT JOIN TABLE B ON A.COD = B.COD
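One thing worth checking before swapping in the OR version: it drops the `A.COL1 IS NULL` condition, so it can return 'VALUE1' where the original returns NULL (when A.COL1 holds a non-listed value and B.COL1 a listed one). The sketch below compares the three forms on invented rows in SQLite from Python. The table names `ta` and `tb` are hypothetical, and `COALESCE` stands in for Oracle's `NVL`, which SQLite does not provide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ta (cod INTEGER, col1 TEXT);
    CREATE TABLE tb (cod INTEGER, col1 TEXT);
    INSERT INTO ta VALUES (1, 'A'), (2, NULL), (3, 'Z'), (4, NULL);
    INSERT INTO tb VALUES (1, 'Z'), (2, 'F'), (3, 'A');  -- no row for cod 4
""")

CODES = "('A','F','G','K','L')"
TEMPLATE = """
    SELECT a.cod, CASE {branches} ELSE NULL END
    FROM ta a LEFT JOIN tb b ON a.cod = b.cod
    ORDER BY a.cod
"""

def run(branches):
    """Run the CASE expression with the given WHEN branches."""
    return conn.execute(TEMPLATE.format(branches=branches)).fetchall()

# Original two-branch CASE from the question.
original = run(
    f"WHEN a.col1 IN {CODES} THEN 'VALUE1' "
    f"WHEN a.col1 IS NULL AND b.col1 IN {CODES} THEN 'VALUE1'"
)
# Proposed OR version (no IS NULL check).
with_or = run(f"WHEN a.col1 IN {CODES} OR b.col1 IN {CODES} THEN 'VALUE1'")
# NVL-style version, written with COALESCE for SQLite.
with_nvl = run(f"WHEN COALESCE(a.col1, b.col1) IN {CODES} THEN 'VALUE1'")

print(original)
print(with_or)   # differs from the original for cod 3
print(with_nvl)  # matches the original
```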

What is the purpose of (+) operator in a where clause, other than outer joins, in Oracle SQL?

I have some very old Oracle SQL code that I need to review (see below), and I am trying to understand what the (+) operator is doing in the where clause after its first use.
select *
from table_a a,
table_b b
where
a.id = b.id (+)
and b.seq_nb (+) = 1
and b.type_cd (+) = 'DOLLR'
I thought (+) was an outer join equivalent, so
from table_a a,
table_b b
where
a.id = b.id (+)
would be the same as
from table_a a left outer join table_b b on a.id=b.id
so how can you have outer joins to hard-coded values as below?
b.seq_nb (+) = 1
and b.type_cd (+) = 'DOLLR'
Any help would be greatly appreciated, thank you!
It's the same as:
select *
from table_a a
left outer join table_b b
on a.id = b.id
and b.type_cd = 'DOLLR'
and b.seq_nb = 1
This is sometimes also referred to as a "filtered outer join".
It is equivalent to an outer join with a derived table:
select *
from table_a a
left outer join (
select *
from table_b
where type_cd = 'DOLLR'
and seq_nb = 1
) b on a.id = b.id
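As a quick check of the two equivalent forms above, and of how they differ from putting the same filters in the where clause, here is a small sketch using SQLite from Python. SQLite stands in for Oracle here, and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (id INTEGER);
    CREATE TABLE table_b (id INTEGER, seq_nb INTEGER, type_cd TEXT);
    INSERT INTO table_a VALUES (1), (2), (3);
    -- id 1 passes both filters, id 2 fails seq_nb = 1, id 3 has no row at all
    INSERT INTO table_b VALUES (1, 1, 'DOLLR'), (2, 2, 'DOLLR');
""")

# Filters inside the ON clause: the "filtered outer join".
filter_in_on = conn.execute("""
    SELECT a.id, b.id FROM table_a a
    LEFT OUTER JOIN table_b b
      ON a.id = b.id AND b.type_cd = 'DOLLR' AND b.seq_nb = 1
    ORDER BY a.id
""").fetchall()

# Same thing written as an outer join to a pre-filtered derived table.
derived_table = conn.execute("""
    SELECT a.id, b.id FROM table_a a
    LEFT OUTER JOIN (
        SELECT * FROM table_b WHERE type_cd = 'DOLLR' AND seq_nb = 1
    ) b ON a.id = b.id
    ORDER BY a.id
""").fetchall()

# Filters in WHERE instead: NULL-extended rows are discarded, so only matches remain.
filter_in_where = conn.execute("""
    SELECT a.id, b.id FROM table_a a
    LEFT OUTER JOIN table_b b ON a.id = b.id
    WHERE b.type_cd = 'DOLLR' AND b.seq_nb = 1
    ORDER BY a.id
""").fetchall()

print(filter_in_on)
print(derived_table)
print(filter_in_where)
```

The first two queries keep every row of table_a; the third keeps only the row that actually matched.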

Hive - how to reuse a sub-query in hive with optimal performance

What is the best way to structure/write a query in Hive when I have a complex sub-query that is repeated multiple times throughout the select statement?
I originally created a temporary table for the sub-query which was refreshed before each run. Then I began to use a CTE as part of the original query (discarding the temp table) for readability and noticed degraded performance. This made me curious about which implementation methods are best with respect to performance when needing to reuse sub-queries.
The data I am working with contains upwards of 10 million records. Below is an example of the query I wrote that made use of a CTE.
with temp as (
select
a.id,
x.type,
y.response
from sandbox.tbl_form a
left outer join sandbox.tbl_formStatus b
on a.id = b.id
left outer join sandbox.tbl_formResponse y
on b.id = y.id
left outer join sandbox.tbl_formType x
on y.id = x.typeId
where b.status = 'Completed'
)
select
a.id,
q.response as user,
r.response as system,
s.response as agent,
t.response as owner
from sandbox.tbl_form a
left outer join (
select * from temp x
where x.type= 'User'
) q
on a.id = q.id
left outer join (
select * from temp x
where x.type= 'System'
) r
on a.id = r.id
left outer join (
select * from temp x
where x.type= 'Agent'
) s
on a.id = s.id
left outer join (
select * from temp x
where x.type= 'Owner'
) t
on a.id = t.id;
There are issues in your query.
1) In the CTE you have three left joins without an ON clause. This may cause serious performance problems, because joins without an ON clause are CROSS JOINs.
2) By the way, the where b.status = 'Completed' clause turns the LEFT join with table b into an inner join. Without an ON clause, though, it still multiplies all records from a by all records from b that pass the where filter.
3) Most probably you do not need the CTE at all. Just join correctly with ON clauses, use case when type='User' then response end, and aggregate with min() or max() by id:
select a.id,
max(case when x.type='User' then y.response end) as user,
max(case when x.type='System' then y.response end) as system,
...
from sandbox.tbl_form a
left outer join sandbox.tbl_formStatus b
on a.id = b.id
left outer join sandbox.tbl_formResponse y
on b.id = y.id
left outer join sandbox.tbl_formType x
on y.id = x.typeId
where b.status = 'Completed' --if you want LEFT JOIN add --or b.status is null
group by a.id
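The conditional-aggregation idea in point 3 can be tried on a toy data set. The sketch below uses SQLite from Python rather than Hive, and collapses the joins into a single hypothetical table `form_response(id, type, response)` so that only the pivot step is shown; the table, its rows, and the `*_resp` column aliases are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, already-joined data: one row per (form id, response type).
    CREATE TABLE form_response (id INTEGER, type TEXT, response TEXT);
    INSERT INTO form_response VALUES
        (1, 'User',   'u1'),
        (1, 'System', 's1'),
        (2, 'Agent',  'a2');
""")

# One pass over the data: each CASE picks the response for one type,
# and MAX() collapses the rows of a form into a single output row.
pivot = conn.execute("""
    SELECT id,
           MAX(CASE WHEN type = 'User'   THEN response END) AS user_resp,
           MAX(CASE WHEN type = 'System' THEN response END) AS system_resp,
           MAX(CASE WHEN type = 'Agent'  THEN response END) AS agent_resp
    FROM form_response
    GROUP BY id
    ORDER BY id
""").fetchall()

print(pivot)
```

The source is scanned once, instead of once per type as with the four filtered sub-queries in the question.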

How to implement left join on data range in hive

I want to convert the Oracle logic below to Hive.
Logic:
Select a.id,a.name,b.desc from table a left join table b on
a.num between b.min_num and b.max_num;
Could anyone help me achieve the above logic in Hive?
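With this solution you control the performance.
The b ranges are split into sub-ranges of size x, as small as you want.
Too big an x will in practice cause a CROSS JOIN.
Too small an x might generate a huge set from b (x=1 will generate every value of every b range).
The second and third queries below show those two extremes: a plain cross join, and a full expansion of every value in each b range.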
set hivevar:x=100;
select a.id
,a.name
,b.desc
from table_a as a
left join
(select a.id
,b.desc
from table_a as a
inner join
(select b.min_num div ${hivevar:x} + pe.pos as sub_range_id
,b.*
from table_b as b
lateral view
posexplode(split(space(cast (b.max_num div ${hivevar:x} - b.min_num div ${hivevar:x} as int)),' ')) pe
) as b
on a.num div ${hivevar:x} =
b.sub_range_id
where a.num between b.min_num and b.max_num
) b
on b.id =
a.id
;
select a.id
,a.name
,b.desc
from table_a as a
left join (select a.id
,b.desc
from table_a as a
cross join table_b as b
where a.num between b.min_num and b.max_num
) b
on b.id =
a.id
;
select a.id
,a.name
,b.desc
from table_a as a
left join (select b.min_num + pe.pos as num
,b.desc
from table_b as b
lateral view
posexplode(split(space(b.max_num-b.min_num),' ')) pe
) b
on b.num =
a.num
;
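The sub-range idea in the first query can be illustrated outside Hive. The following is a rough Python sketch, not a translation of the Hive plan: each b range is registered under every bucket `n // x` it touches, each a row looks up only its own bucket, and the exact `between` check is applied afterwards. The function name, tuple layouts, and sample rows are assumptions, and values are assumed to be non-negative integers.

```python
from collections import defaultdict

def range_left_join(a_rows, b_rows, x=100):
    """Left join a to b on a.num BETWEEN b.min_num AND b.max_num.

    a_rows: list of (id, name, num); b_rows: list of (min_num, max_num, desc).
    x is the sub-range (bucket) width, playing the role of ${hivevar:x}.
    """
    # Register each b range under every sub-range id it overlaps.
    buckets = defaultdict(list)
    for min_num, max_num, desc in b_rows:
        for sub_range_id in range(min_num // x, max_num // x + 1):
            buckets[sub_range_id].append((min_num, max_num, desc))

    result = []
    for id_, name, num in a_rows:
        # Equi-join on the bucket id, then apply the exact range filter.
        candidates = buckets.get(num // x, [])
        matches = [desc for lo, hi, desc in candidates if lo <= num <= hi]
        if matches:
            result.extend((id_, name, desc) for desc in matches)
        else:
            result.append((id_, name, None))  # left join: keep unmatched a rows
    return result

a_rows = [(1, "n1", 150), (2, "n2", 999), (3, "n3", 250)]
b_rows = [(100, 260, "r1"), (240, 300, "r2")]
joined = range_left_join(a_rows, b_rows, x=100)
print(joined)
```

A larger x means fewer bucket entries per range but more candidates to filter per a row; a smaller x does the opposite, which is the trade-off described above.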

LEFT OUTER JOIN WHEN IT HAS MULTIPLE TABLE SELECT QUERY

Currently I have joined two tables using an inner join, like the following:
SELECT A.*,B.*
FROM A,B
WHERE A.COLUMN_A = B.COLUMN_B
Now I want to left outer join another table, say Table C, to the above result.
So I did the following:
SELECT A.*,B.*
FROM A,B
LEFT OUTER JOIN C ON B.COLUMN_X = C.COLUMN_X
WHERE A.COLUMN_A = B.COLUMN_B
This executes without errors in SQL Navigator, but I cannot see any output in the result.
Is anything wrong in this query? Please advise.
Change it to use proper join syntax, like:
SELECT A.*,B.*
FROM A
INNER JOIN B ON A.COLUMN_A = B.COLUMN_B
LEFT OUTER JOIN C ON B.COLUMN_X = C.COLUMN_X;
Better still, change all of them to outer joins:
SELECT A.*,B.*
FROM A
LEFT JOIN B ON A.COLUMN_A = B.COLUMN_B
LEFT OUTER JOIN C ON B.COLUMN_X = C.COLUMN_X;
Use this
SELECT A.*,B.*,C.*
FROM A
INNER JOIN B
ON A.COLUMN_A = B.COLUMN_B
LEFT OUTER JOIN C
ON B.COLUMN_X = C.COLUMN_X
If you absolutely have to use the legacy syntax, then use this, but I wouldn't recommend it.
SELECT A.*,B.*,C.*
FROM A,B,C
where A.COLUMN_A = B.COLUMN_B
AND
B.COLUMN_X = C.COLUMN_X (+)
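For reference, here is a small sketch of the inner-join-then-left-join form suggested above, run against SQLite from Python with invented rows. SQLite does not support the (+) syntax, so only the ANSI form is shown; the data types are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (COLUMN_A INTEGER);
    CREATE TABLE B (COLUMN_B INTEGER, COLUMN_X TEXT);
    CREATE TABLE C (COLUMN_X TEXT);
    INSERT INTO A VALUES (1), (2);
    INSERT INTO B VALUES (1, 'x1'), (2, 'x2');
    INSERT INTO C VALUES ('x1');  -- no C row for 'x2'
""")

# A and B are inner joined; C is optional, so its columns may come back NULL.
inner_then_left = conn.execute("""
    SELECT A.COLUMN_A, B.COLUMN_X, C.COLUMN_X
    FROM A
    INNER JOIN B ON A.COLUMN_A = B.COLUMN_B
    LEFT OUTER JOIN C ON B.COLUMN_X = C.COLUMN_X
    ORDER BY A.COLUMN_A
""").fetchall()

# For comparison: inner joining C as well drops the row without a C match.
all_inner = conn.execute("""
    SELECT A.COLUMN_A, B.COLUMN_X, C.COLUMN_X
    FROM A
    INNER JOIN B ON A.COLUMN_A = B.COLUMN_B
    INNER JOIN C ON B.COLUMN_X = C.COLUMN_X
    ORDER BY A.COLUMN_A
""").fetchall()

print(inner_then_left)
print(all_inner)
```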
