Difference in the following using Hive Left Outer Join - hadoop

I had an issue with the LEft outer join and my query was little bit wierd. Tried searching for it , but did not get any clues.
This is happening due to were clause has been added.
The query below is not working as expected.
SELECT count(*) from
BN_DATA_TEMP.EDS_DELIVERYITEM_GBQ_SAMPLE_DATA A
LEFT OUTER JOIN BN_DATA_TEMP.TBL_EDS_DELIVERYITEMS_DATES B
on (
to_Date(substr(A.LASTUPDATEDTIME,1,10)) = '2020-05-31'
and to_Date(substr(B.LASTUPDATEDTIME,1,10)) = '2020-05-31'
and A.deliveryid = B.deliveryid )
where B.deliveryid is null;
Trying to get Records in A but not in B. But the count is very different and it is wrong.
I need to consider only the dates with 2020-05-31 from both the tables.
Modified like the below and it gave the correct result.
select count(*) from
(SELECT * from
BN_DATA_TEMP.EDS_DELIVERYITEM_GBQ_SAMPLE_DATA where to_Date(substr(LASTUPDATEDTIME,1,10)) = '2020-05-31' ) A
LEFT OUTER JOIN
(select * from
BN_DATA_TEMP.TBL_EDS_DELIVERYITEMS_DATES where to_Date(substr(LASTUPDATEDTIME,1,10)) = '2020-05-31' ) B
on A.deliveryid = B.deliveryid
where B.deliveryid is null;
I have never use much of Left outer joins for Data validation. So thought of posting over here and learn the same.
Thanks in Advance.

Related

How do you "globally" tune a SQL query so it's driven by records in a view if the view name keeps changing but query structure stays the same?

I have a Crystal report running through an application that takes a long time to run due to an inefficient query that takes 15 minutes. We are running Oracle 19.4. CURSOR_SHARING = FORCE for the database and this is required per the vendor. See query below.
The problem is, the view name in the query (such as TW_RPT_11263_7833_199916 in the example below) changes based on the query run inside the application to provide a filtered list of record IDs. Each time the report is run based on a different application query, there is a different SQL ID depending on that particular view's selection criteria.
So, a SQL profile can be generated but it only works for one query/one view. Generating a SQL Profile even with the FORCE option did not make the query faster when it has a different view name TW_RPT_####_#####, and it did not use the sql_profile as seen in v$sql.
Adding a hint to the query works great; the query runs in 1 second (see SQL below). However with a different view name per user, this means that applying a hint would only work for one view and that specific query ID. Also I do not know how it would be possible to inject this hint; it's a Crystal report. Also I do not know if it's possible to use hints with pattern matching such as /*+ USE_HASH(TW_RPT_%) */ or to use some other technique that would change the hint depending on the view name.
The PR table has 2 million rows, whereas the view only has a few rows, and so the view needs to drive the query.
QUERY with hint USE_HASH takes <1 second, whereas without hint, it takes 15 minutes:
SELECT /*+ USE_HASH(TW_RPT_11263_7833_199916)*/ "PR"."ID", "PR"."NAME", "TW_V_IMPACT_LEVEL"."S_VALUE", "PROJECT"."NAME", "PR_1"."ID", "PROJECT_1"."NAME", "PR_1"."NAME", "PR_STATUS_TYPE_1"."NAME", "PR_STATUS_TYPE"."NAME", "PR_1"."PARENT_ID", "PROJECT_2"."NAME", "TW_RPT_11263_7833_199916"."ID", "TW_V_DESCRIPTION"."TEXT", "TW_V_MATERIAL_CONTINUATION_DEC"."TEXT", "TW_V_DESCRIPTION_1"."TEXT", "TW_V_JUSTIFICATION"."TEXT", "TW_V_CLOSURE_SUMMARY"."TEXT", "TW_V_QI_CLOSURE_SUMMARY"."TEXT" FROM (((((((((((((("TRACKWISE_OWNER"."PR" "PR" LEFT OUTER JOIN "TRACKWISE_OWNER"."PR" "PR_1" ON "PR"."ID"="PR_1"."ROOT_PARENT_ID") LEFT OUTER JOIN "TRACKWISE_OWNER"."TW_V_DESCRIPTION" "TW_V_DESCRIPTION" ON "PR"."ID"="TW_V_DESCRIPTION"."PR_ID") LEFT OUTER JOIN "TRACKWISE_OWNER"."TW_V_MATERIAL_CONTINUATION_DEC" "TW_V_MATERIAL_CONTINUATION_DEC" ON "PR"."ID"="TW_V_MATERIAL_CONTINUATION_DEC"."PR_ID") LEFT OUTER JOIN "TRACKWISE_OWNER"."TW_V_IMPACT_LEVEL" "TW_V_IMPACT_LEVEL" ON "PR"."ID"="TW_V_IMPACT_LEVEL"."PR_ID") LEFT OUTER JOIN "TRACKWISE_OWNER"."PROJECT" "PROJECT" ON "PR"."PROJECT_ID"="PROJECT"."ID") LEFT OUTER JOIN "TRACKWISE_OWNER"."PR_STATUS_TYPE" "PR_STATUS_TYPE" ON "PR"."STATUS_TYPE"="PR_STATUS_TYPE"."ID") LEFT OUTER JOIN "TRACKWISE_OWNER"."TW_V_CLOSURE_SUMMARY" "TW_V_CLOSURE_SUMMARY" ON "PR"."ID"="TW_V_CLOSURE_SUMMARY"."PR_ID") LEFT OUTER JOIN "TRACKWISE_OWNER"."TW_V_QI_CLOSURE_SUMMARY" "TW_V_QI_CLOSURE_SUMMARY" ON "PR"."ID"="TW_V_QI_CLOSURE_SUMMARY"."PR_ID") LEFT OUTER JOIN "TRACKWISE_OWNER"."PROJECT" "PROJECT_1" ON "PR_1"."PROJECT_ID"="PROJECT_1"."ID") LEFT OUTER JOIN "TRACKWISE_OWNER"."TW_V_DESCRIPTION" "TW_V_DESCRIPTION_1" ON "PR_1"."ID"="TW_V_DESCRIPTION_1"."PR_ID") LEFT OUTER JOIN "TRACKWISE_OWNER"."PR_STATUS_TYPE" "PR_STATUS_TYPE_1" ON "PR_1"."STATUS_TYPE"="PR_STATUS_TYPE_1"."ID") LEFT OUTER JOIN "TRACKWISE_OWNER"."PR" "PR_2" ON "PR_1"."PARENT_ID"="PR_2"."ID") LEFT OUTER JOIN "TRACKWISE_OWNER"."TW_V_JUSTIFICATION" "TW_V_JUSTIFICATION" ON "PR_1"."ID"="TW_V_JUSTIFICATION"."PR_ID") LEFT OUTER JOIN "TRACKWISE_OWNER"."PROJECT" "PROJECT_2" ON "PR_2"."PROJECT_ID"="PROJECT_2"."ID") INNER JOIN "TRACKWISE_OWNER"."TW_RPT_11263_7833_199916" "TW_RPT_11263_7833_199916" ON "PR"."ID"="TW_RPT_11263_7833_199916"."ID" WHERE ("PROJECT"."NAME"='Quality Investigation - SC' OR "PROJECT"."NAME"='Quality Issue')
I am looking for any ideas to help Oracle figure out the best join order for a query having this structure, regardless of the name of the view (TW_RPT_####-######). An assumption can definitely be made that the view will always have considerably fewer rows than the PR table.
Here is an example view created by the application based on what the end user specifies in the application query before running the report:
**TW_RPT_11263_7833_199916:**
CREATE OR REPLACE FORCE EDITIONABLE VIEW "TRACKWISE_OWNER"."TW_RPT_11263_7833_199916" ("ID") DEFAULT COLLATION "USING_NLS_COMP" AS
SELECT DISTINCT PR.id
FROM
pr, project , Project_member, Group_member
WHERE
project.id = pr.project_id AND
pr.id IN (
SELECT
pr_addtl_data.pr_id
FROM
pr_addtl_data
WHERE
pr_addtl_data.pr_id = pr.id AND
pr_addtl_data.data_field_id = 573 AND
pr_addtl_data.n_value IN (6164231)
) AND PR.project_parent_id IN(366,279,395,396) AND Project_member.project_id = PR.project_parent_id AND Group_member.project_member_id = Project_member.id AND Project_member.person_rel_id = 13836 AND ((Project_member.view_all = 1) OR (Project_member.view_self_created = 1 and PR.created_by_rel_id = 13836) OR (Project_member.view_assigned_to = 1 and PR.responsible_rel_id = 13836) OR (Project_member.view_group_created = 1 and PR.user_group_id = Group_member.user_group_id) OR (Project_member.view_by_entity = 1 and PR.entity_id = 1251));
The result from the view is two record IDs as follows, and this returns in milliseconds:
2012202 and 2012397
One option is to use a Command as the data source for the report.
A parameter can control the Table/View used in the Command syntax.
Just use better naming for aliases: use something more common than TW_RPT_11263_7833_199916. For example, instead of INNER JOIN "TRACKWISE_OWNER"."TW_RPT_11263_7833_199916" "TW_RPT_11263_7833_199916" use INNER JOIN "TRACKWISE_OWNER"."TW_RPT_11263_7833_199916" "TW_RPT_JOINED" and use it for your hints
SELECT /*+ USE_HASH(TW_RPT_JOINED)*/
"PR"."ID",
"PR"."NAME",
"TW_V_IMPACT_LEVEL"."S_VALUE",
"PROJECT"."NAME",
"PR_1"."ID",
"PROJECT_1"."NAME",
"PR_1"."NAME",
"PR_STATUS_TYPE_1"."NAME",
"PR_STATUS_TYPE"."NAME",
"PR_1"."PARENT_ID",
"PROJECT_2"."NAME",
"TW_RPT_JOINED"."ID",
"TW_V_DESCRIPTION"."TEXT",
"TW_V_MATERIAL_CONTINUATION_DEC"."TEXT",
"TW_V_DESCRIPTION_1"."TEXT",
"TW_V_JUSTIFICATION"."TEXT",
"TW_V_CLOSURE_SUMMARY"."TEXT",
"TW_V_QI_CLOSURE_SUMMARY"."TEXT"
FROM
(
(
(
(
(
(
(
(
(
(
(
(
(
("TRACKWISE_OWNER"."PR" "PR"
LEFT OUTER JOIN "TRACKWISE_OWNER"."PR" "PR_1"
ON "PR"."ID"="PR_1"."ROOT_PARENT_ID"
)
LEFT OUTER JOIN "TRACKWISE_OWNER"."TW_V_DESCRIPTION" "TW_V_DESCRIPTION"
ON "PR"."ID"="TW_V_DESCRIPTION"."PR_ID"
)
LEFT OUTER JOIN "TRACKWISE_OWNER"."TW_V_MATERIAL_CONTINUATION_DEC" "TW_V_MATERIAL_CONTINUATION_DEC"
ON "PR"."ID"="TW_V_MATERIAL_CONTINUATION_DEC"."PR_ID"
)
LEFT OUTER JOIN "TRACKWISE_OWNER"."TW_V_IMPACT_LEVEL" "TW_V_IMPACT_LEVEL"
ON "PR"."ID"="TW_V_IMPACT_LEVEL"."PR_ID"
)
LEFT OUTER JOIN "TRACKWISE_OWNER"."PROJECT" "PROJECT"
ON "PR"."PROJECT_ID"="PROJECT"."ID"
)
LEFT OUTER JOIN "TRACKWISE_OWNER"."PR_STATUS_TYPE" "PR_STATUS_TYPE"
ON "PR"."STATUS_TYPE"="PR_STATUS_TYPE"."ID"
)
LEFT OUTER JOIN "TRACKWISE_OWNER"."TW_V_CLOSURE_SUMMARY" "TW_V_CLOSURE_SUMMARY"
ON "PR"."ID"="TW_V_CLOSURE_SUMMARY"."PR_ID"
)
LEFT OUTER JOIN "TRACKWISE_OWNER"."TW_V_QI_CLOSURE_SUMMARY" "TW_V_QI_CLOSURE_SUMMARY"
ON "PR"."ID"="TW_V_QI_CLOSURE_SUMMARY"."PR_ID"
)
LEFT OUTER JOIN "TRACKWISE_OWNER"."PROJECT" "PROJECT_1"
ON "PR_1"."PROJECT_ID"="PROJECT_1"."ID"
)
LEFT OUTER JOIN "TRACKWISE_OWNER"."TW_V_DESCRIPTION" "TW_V_DESCRIPTION_1"
ON "PR_1"."ID"="TW_V_DESCRIPTION_1"."PR_ID"
)
LEFT OUTER JOIN "TRACKWISE_OWNER"."PR_STATUS_TYPE" "PR_STATUS_TYPE_1"
ON "PR_1"."STATUS_TYPE"="PR_STATUS_TYPE_1"."ID"
)
LEFT OUTER JOIN "TRACKWISE_OWNER"."PR" "PR_2"
ON "PR_1"."PARENT_ID"="PR_2"."ID"
)
LEFT OUTER JOIN "TRACKWISE_OWNER"."TW_V_JUSTIFICATION" "TW_V_JUSTIFICATION"
ON "PR_1"."ID"="TW_V_JUSTIFICATION"."PR_ID"
)
LEFT OUTER JOIN "TRACKWISE_OWNER"."PROJECT" "PROJECT_2"
ON "PR_2"."PROJECT_ID"="PROJECT_2"."ID"
)
INNER JOIN "TRACKWISE_OWNER"."TW_RPT_11263_7833_199916" "TW_RPT_JOINED"
ON "PR"."ID"="TW_RPT_JOINED"."ID"
WHERE ("PROJECT"."NAME"='Quality Investigation - SC' OR "PROJECT"."NAME"='Quality Issue')
PS. "Wonderful" SQL generator - why so many (((()))))...

ORACLE - Update Multiple Columns with Values from Nested Join

I'm practically new in using oracle and I bumped into a blocker. Below is the query that I created based on what I have researched online to update multiple columns of a table with values from a nested join statement.
UPDATE
(
SELECT
A.COLUMN1 OLD_COLUMN1,
BC.COLUMN1 NEW_COLUMN1,
A.BALANCE OLD_COLUMN2,
BC.COLUMN2_MIN NEW_COLUMN2,
A.COLUMN3 OLD_COLUMN3,
BC.COLUMN3 NEW_COLUMN3
FROM TABLE_A A
INNER JOIN
(
SELECT B.TWWID,
B.ITEMDATE,
B.COLUMN2_MIN,
C.COLUMN3,
C.COUNTRYID,
C.COLUMN1
FROM TABLE_B B
LEFT OUTER JOIN TABLE_C C
ON TO_CHAR(B.ID) = TO_CHAR(C.ID)
) BC
ON A.ID = BC.ID
AND A.DATE = BC.DATE
)ABCUPDATE
SET ABCUPDATE.OLD_COLUMN1 = ABCUPDATE.NEW_COLUMN1,
ABCUPDATE.OLD_COLUMN2 = ABCUPDATE.NEW_COLUMN2,
ABCUPDATE.OLD_COLUMN3 = ABCUPDATE.NEW_COLUMN3;
Selecting the sub-query returns the expected results but when I run the update script as a whole an error is returned.
ORA-01779: cannot modify a column which maps to a non key-preserved
table
Can anyone please explain why I encounter this error and what adjustments can I do to the script to make it work?
Thanks in advance!

Oracle: Which one of the queries is more efficient?

Query 1
select student.identifier,
id_tab.reporter_name,
non_id_tab.reporter_name
from student_table student
inner join id_table id_tab on (student.is_NEW = 'Y'
and student.reporter_id = id_tab.reporter_id
and id_tab.name in('name1','name2'))
inner join id_table non_id_tab on (student.non_reporter_id = non_id_tab.reporter_id)
Query 2
select student.identifier,
id_tab.reporter_name,non_id_tab.reporter_name
from student_table student,
id_table id_tab,
id_table non_id_tab
where student.is_NEW = 'Y'
and student.reporter_id = id_tab.reporter_id
and id_tab.name in('name1','name2')
and student.non_reporter_id = non_id_tab.reporter_id
Since these two queries produce exactly same output,I am assuming they are syntactically same(please correct me if I am wrong).
I was wondering whether either of them is more efficient that the other.
Can anyone help me here please?
I would rewrite it as follows, using the ON only for JOIN conditions and moving the filters to a WHERE condition:
...
from student_table student
inner join id_table id_tab on ( student.reporter_id = id_tab.reporter_id )
inner join id_table non_id_tab on (student.non_reporter_id = non_id_tab.reporter_id)
where student.is_NEW = 'Y'
and id_tab.name in('name1','name2')
This should give a more readable query; however, no matter how you write it (the ANSI join is highly preferrable), you should check the explain plans to understand how the query will be executed.
In terms of performance, there should be no difference.
Execution Plans created by the Oracle optimizer do not differ.
In terms of readability, joining tables inside the WHERE clause is an old style (SQL89).
From SQL92 and higher, it is recommended to use the JOIN syntax.

Linq-to-SQL left join on left join/multiple left joins in one Linq-to-SQL statement

I'm trying to rewrite SQL procedure to Linq, it all went well and works fine, as long as it works on small data set. I couldn't really find answer to this anywhere. Thing is, I have 3 joins in the query, 2 are left joins and 1 is inner join, they all join to each other/like a tree. Below you can see SQL procedure:
SELECT ...
FROM sprawa s (NOLOCK)
LEFT JOIN strona st (NOLOCK) on s.ident = st.id_sprawy
INNER JOIN stan_szczegoly ss (NOLOCK) on s.kod_stanu = ss.kod_stanu
LEFT JOIN broni b (NOLOCK) on b.id_strony = st.ident
What I'd like to ask you is a way to translate this to Linq. For now I have this:
var queryOne = from s in db.sprawa
join st in db.strona on s.ident equals st.id_sprawy into tmp1
from st2 in tmp1.DefaultIfEmpty()
join ss in db.stan_szczegoly on s.kod_stanu equals ss.kod_stanu
join b in db.broni on st2.ident equals b.id_strony into tmp2
from b2 in tmp2.DefaultIfEmpty()
select new { };
Seems alright, but when checked with SQL Profiler, query that is sent to database looks like that:
SELECT ... FROM [dbo].[sprawa] AS [Extent1]
LEFT OUTER JOIN [dbo].[strona] AS [Extent2]
ON [Extent1].[ident] = [Extent2].[id_sprawy]
INNER JOIN [dbo].[stan_szczegoly] AS [Extent3]
ON [Extent1].[kod_stanu] = [Extent3].[kod_stanu]
INNER JOIN [dbo].[broni] AS [Extent4]
ON ([Extent2].[ident] = [Extent4].[id_strony]) OR
(([Extent2].[ident] IS NULL) AND ([Extent4].[id_strony] IS NULL))
As you can see both SQL queries are bit different. Effect is the same, but latter works incomparably slower (less than a second to over 30 minutes). There's also a union made, but it shouldn't be the problem. If asked for I'll paste code for it.
I'd be grateful for any advice on how to better the performance of my Linq statement or how to write it in a way that is translated properly.
I guess I found the solution:
var queryOne = from s in db.sprawa
join st in db.strona on s.ident equals st.id_sprawy into tmp1
where tmp1.Any()
from st2 in tmp1.DefaultIfEmpty()
join ss in db.stan_szczegoly on s.kod_stanu equals ss.kod_stanu
join b in db.broni on st2.ident equals b.id_strony into tmp2
where tmp2.Any()
from b2 in tmp2.DefaultIfEmpty()
select new { };
In other words where table.Any() after each into table statement. It doesn't make translation any better but has sped up execution time from nearly 30minutes(!) to about 5 seconds.
This has to be used carefully though, because it MAY lead to losing some records in result set.

PL SQL - Join 2 tables and return max from right table

Trying to retrive the MAX doc in the right table.
SELECT F43.PDDOCO,
F43.PDSFXO,
F43.PDLNID,
F43.PDAREC/100 As Received,
F431.PRAREC/100,
max(F431.PRDOC)
FROM PRODDTA.F43121 F431
LEFT OUTER JOIN PRODDTA.F4311 F43
ON
F43.PDKCOO=F431.PRKCOO
AND F43.PDDOCO=F431.PRDOCO
AND F43.PDDCTO=F431.PRDCTO
AND F43.PDSFXO=F431.PRSFXO
AND F43.PDLNID=F431.PRLNID
WHERE F431.PRDOCO = 401531
and F431.PRMATC = 2
and F43.PDLNTY = 'DC'
Group by
F43.PDDOCO,
F43.PDSFXO,
F43.PDLNID,
F43.PDAREC,
F431.PRAREC/100
This query is still returning the two rows in the right table. Fairly new to SQL and struggling with the statement. Any help would be appreciated.
Without seeing your data it is difficult to tell where the problem might so I will offer a few suggestions that could help.
First, you are joining with a LEFT JOIN on the PRODDTA.F4311 but you have in the WHERE clause a filter for that table. You should move the F43.PDLNTY = 'DC' to the JOIN condition. This is causing the query to act like an INNER JOIN.
Second, you can try using a subquery to get the MAX(PRDOC) value. Then you can limit the columns that you are grouping on which could eliminate the duplicates. The query would them be similar to the following:
SELECT F43.PDDOCO,
F43.PDSFXO,
F43.PDLNID,
F43.PDAREC/100 As Received,
F431.PRAREC/100,
F431.PRDOC
FROM PRODDTA.F43121 F431
INNER JOIN
(
-- subquery to get the max
-- then group by the distinct columns
SELECT PDKCOO, max(PRDOC) MaxPRDOC
FROM PRODDTA.F43121
WHERE PRDOCO = 401531
and PRMATC = 2
GROUP BY PDKCOO
) f2
-- join the subquery result back to the PRODDTA.F43121 table
on F431.PRDOC = f2.MaxPRDOC
AND F431.PDKCOO = f2.PDKCOO
LEFT OUTER JOIN PRODDTA.F4311 F43
ON F43.PDKCOO=F431.PRKCOO
AND F43.PDDOCO=F431.PRDOCO
AND F43.PDDCTO=F431.PRDCTO
AND F43.PDSFXO=F431.PRSFXO
AND F43.PDLNID=F431.PRLNID
AND F43.PDLNTY = 'DC' -- move this filter to the join instead of the WHERE
WHERE F431.PRDOCO = 401531
and F431.PRMATC = 2
If you provide your table structures and some sample data, it will be easier to determine the issue.

Resources