Joins are not giving the right answers - oracle

1) select count(*) from LCL_RKM_AuditForm; -- O/P: 868
2) select count(*) from RKM_KnowledgeArticleManager; -- O/P: 8511
3) select count(*) from
LCL_RKM_AuditForm A
right outer join
RKM_KnowledgeArticleManager B
on A.ARTICLE_ID=B.DocID; -- O/P: 9216
4) select count(*) from
LCL_RKM_AuditForm A
left outer join
RKM_KnowledgeArticleManager B
on A.ARTICLE_ID=B.DocID; -- O/P: 1973
5) select count(*) from
LCL_RKM_AuditForm A,RKM_KnowledgeArticleManager B
where A.ARTICLE_ID=B.DocID; -- O/P: 1973
My understanding is this:
A left outer join displays all the rows in table A and the common values in table B.
A right outer join displays all the rows in table B and the common values in table A.
What does "common values" refer to? If it is a left outer join, it should give only 868 results, right? And if it is a right outer join, it should give only 8511 results, right?
In the 5th statement I have used a WHERE clause, so it should give me only 868 entries, right?
Please help me with this.

Your expected results appear to be based on the false assumption that there is a one-to-one mapping between rows in the two tables.
For a standard inner join (as in your last query), every matching combination of rows from the two tables is returned. Since you are getting more results than there are rows in the first table, it must be true that a given row in the first table may have multiple matching rows in the second table.
For instance, if there is one row in table A with ArticleID = 1, and two rows in Table B with DocID = 1, then a join of the two tables on these fields will produce 2 rows.
When you change to an outer join, you will get at least as many rows as the inner join, and potentially more. An outer join returns the same rows as the corresponding inner join; in addition, for every row in the preserved table (A for a LEFT OUTER JOIN, B for a RIGHT OUTER JOIN) that has no match in the other table, it returns that row once, with NULL values for the other table's columns.
Your LEFT OUTER JOIN returns the same number of rows as the inner join; this implies that every row in table A has at least one matching row in table B.
Your RIGHT OUTER JOIN returns many more rows: 9216, versus 1973 for the inner join. This implies that 9216 - 1973 = 7243 rows in table B have no matching row in table A.
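The counting argument above can be reproduced on toy data. This is only a minimal sketch: it uses Python's built-in sqlite3 module rather than Oracle, the tables a and b and their contents are made up for illustration, and the right outer join is written as a left outer join with the tables swapped so that it also runs on older SQLite builds.

```python
import sqlite3

# Hypothetical toy data: article 1 has two matching docs, article 2 has one,
# and docs 3 and 4 have no matching article at all.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (article_id INTEGER);
    CREATE TABLE b (doc_id INTEGER);
    INSERT INTO a VALUES (1), (2);
    INSERT INTO b VALUES (1), (1), (2), (3), (4);
""")

def count(sql):
    """Run a COUNT(*) query and return the single number it produces."""
    return conn.execute(sql).fetchone()[0]

inner_count = count(
    "SELECT COUNT(*) FROM a JOIN b ON a.article_id = b.doc_id")
left_count = count(
    "SELECT COUNT(*) FROM a LEFT OUTER JOIN b ON a.article_id = b.doc_id")
# Equivalent to "a RIGHT OUTER JOIN b": every row of b is preserved.
right_count = count(
    "SELECT COUNT(*) FROM b LEFT OUTER JOIN a ON a.article_id = b.doc_id")

print(inner_count, left_count, right_count)  # 3 3 5
```

Table a has only 2 rows, yet the inner join returns 3 because article 1 matches twice. The left join adds nothing (every a row has a match), while the right join adds the 2 unmatched b rows, which is the same pattern as the 1973 / 1973 / 9216 counts in the question.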

Related

Oracle how to use distinct command in join

I am trying to get a list of distinct values from one column while getting the key column data by an inner join on another table, as below. Table a and table b both hold the key column client.
Table b holds a product column, which has a range of values against a range of client numbers.
Table a holds only the client number.
Table b holds the client number and product:
Client  Product
1       A
1       B
2       B
3       C
I want to find the list of distinct product values where the client is in both table a and table b.
Any suggestions welcome.
"find the list of distinct product values where the client is in table a and table b"
As you will notice below, DISTINCT is applied in the SELECT clause, not in the join:
SELECT DISTINCT
b.Product
FROM TABLEA a
INNER JOIN TABLEB b ON a.Client = b.Client
;
The inner join ensures that client exists in both A and B and then the "select distinct" removes any repetition in the list of products.
SELECT
b.Product
, COUNT(*) AS countof
FROM TABLEA a
INNER JOIN TABLEB b ON a.Client = b.Client
GROUP BY
b.Product
;
The query above is an alternative that also produces a distinct list of products where clients are in both A and B: it uses GROUP BY, with the added bonus that you can do extra things such as counting how often a product is referenced.
try it at SQLFiddle.com
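As a quick check of both queries, here is a small sketch using Python's sqlite3 module instead of Oracle. Table b is filled with the sample rows from the question; the contents of table a (clients 1 and 2) are an assumption made purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tablea (client INTEGER);
    CREATE TABLE tableb (client INTEGER, product TEXT);
    -- assumed contents of table a: clients 1 and 2 only
    INSERT INTO tablea VALUES (1), (2);
    -- sample rows from the question
    INSERT INTO tableb VALUES (1, 'A'), (1, 'B'), (2, 'B'), (3, 'C');
""")

# SELECT DISTINCT removes the repeated product 'B'.
distinct_products = [row[0] for row in conn.execute("""
    SELECT DISTINCT b.product
    FROM tablea a
    INNER JOIN tableb b ON a.client = b.client
    ORDER BY b.product
""")]

# GROUP BY gives the same list plus a count per product.
product_counts = conn.execute("""
    SELECT b.product, COUNT(*) AS countof
    FROM tablea a
    INNER JOIN tableb b ON a.client = b.client
    GROUP BY b.product
    ORDER BY b.product
""").fetchall()

print(distinct_products)  # ['A', 'B']
print(product_counts)     # [('A', 1), ('B', 2)]
```

Product C does not appear because client 3 is missing from the assumed table a, so the inner join drops that row before DISTINCT or GROUP BY is applied.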

Can Mapside Join and Reduce side join have different O/P

The code below is in PROD and runs daily; I am trying to optimize it.
I see that set hive.auto.convert.join=FALSE; makes it do a reduce-side join, which runs for 2.5 hours and produces a row count of 2324381 records.
If I set hive.auto.convert.join=TRUE; it does a map-side join, runs for only 20 minutes, and produces a row count of 5766529 records.
I need to know why the row counts differ. Is this correct? Is it okay that the row counts differ? I was under the impression that the output of the query should remain the same irrespective of which join is happening.
The source data is the same in both cases, and every other condition is the same except for the Hive setting I am changing.
INSERT OVERWRITE TABLE krish
SELECT
s.svcrqst_id,
s.svcrqst_lupdusr_id,
s.svcrqst_lstupd_dts as svcrqst_lupdt,
f.crsr_lupdt,
s.svcrqst_crt_dts,
s.svcrqst_asrqst_ind,
s.svcrtyp_cd,
s.svrstyp_cd,
s.asdplnsp_psuniq_id as psuniq_id,
s.svcrqst_rtnorig_in,
s.cmpltyp_cd,
s.catsrsn_cd,
s.apealvl_cd,
s.cnstnty_cd,
s.svcrqst_vwasof_dt,
f.crsr_master_claim_index,
t.svcrqct_cds,
r.sum_reason_cd,
r.sum_reason
from
table1 s
left outer join
(
select distinct
lpad(trim(i_srtp_sr_sbtyp_cd), 3, '0') as i_srtp_sr_sbtyp_cd,
lpad(trim(i_srtp_sr_typ_cd), 3, '0') as i_srtp_sr_typ_cd,
sum_reason_cd,
sum_reason
from table2
) r
on lpad(trim(s.svcrtyp_cd), 3, '0')=r.i_srtp_sr_typ_cd
and lpad(trim(s.svrstyp_cd), 3, '0')=r.i_srtp_sr_sbtyp_cd
left outer join table3 f
on trim(s.svcrqst_id)=trim(f.crsr_sr_id)
left outer join table4 t
on t.svcrqst_id=s.svcrqst_id
where
( year(s.svcrqst_lstupd_dts)=${hiveconf:YEAR} and month(s.svcrqst_lstupd_dts)=${hiveconf:MONTH} and day(s.svcrqst_lstupd_dts)=${hiveconf:DAY} )
or
( year(f.crsr_lupdt)=${hiveconf:YEAR} and month(f.crsr_lupdt)=${hiveconf:MONTH} and day(f.crsr_lupdt)=${hiveconf:DAY} )
;
After doing some more research on my data, I recreated all my source tables partitioned and bucketed on the same column and reran my HQL.
This time the map-side join and the reduce-side join produced the same row counts.
I think that in the previous run, because the data was not partitioned, the map-side and reduce-side joins produced different output.

Oracle Performance on columns joining Non Unique Index col with Unique Index col

I have a query like the one shown below:
select a.a1, a.a2, c.c1
from a inner join b
on a.nonUniqueIndexCol = b.uniqueIndexCol1
left outer join c
on c.nonUniqueIndexCol = b.uniqueIndexCol2
Are the joins going to suffer a performance hit because a non-unique index column and a unique index column are compared for equality? I don't have the explain plan.

Hive command to execute NOT IN clause

I have two tables, tab1 and tab2.
tab1(T1)  tab2(T2)
a1        b1
b1        c1
c1        f1
d1        g1
I am looking for the values from table T1 that are not present in T2.
In this case, the output should be a1 and d1.
I have tried the following query but couldn't get the right result.
select distinct tab1.T1 from tab1 left semi join tab2 on (tab1.T1!=tab2.T2);
SELECT t1.str
FROM tab1 t1
LEFT OUTER JOIN tab2 t2 ON t1.str = t2.str
WHERE t2.str IS NULL;
Result:
OK
a1
d1
"Why is the t2.str is null condition there": Left outer joins ensure that all values from the first table are included in the results. So then what happens when there are no values in the second table: in that case all of the columns from the second table are reported as null.
So in the case above we are searching precisely for the rows where the second table's entry is missing, and thus we:
Choose one of the never-empty (i.e. NOT NULL) columns from table two.
So: is str an always-present column in table two? If not, then please choose another one.
Specify the condition "table2-alias"."table2-never-null-column" IS NULL. That means the record has no match under the join condition, and thus we have found the records existing only in table 1.
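The steps above can be tried on the sample data from the question. This is a sketch that uses Python's sqlite3 module rather than Hive, so it only illustrates the LEFT OUTER JOIN ... IS NULL pattern itself; the column names t1 and t2 are taken from the question's table headers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tab1 (t1 TEXT);
    CREATE TABLE tab2 (t2 TEXT);
    INSERT INTO tab1 VALUES ('a1'), ('b1'), ('c1'), ('d1');
    INSERT INTO tab2 VALUES ('b1'), ('c1'), ('f1'), ('g1');
""")

# Keep every tab1 row, then keep only those where the tab2 side is NULL,
# i.e. the rows that found no match in tab2.
only_in_tab1 = [row[0] for row in conn.execute("""
    SELECT tab1.t1
    FROM tab1
    LEFT OUTER JOIN tab2 ON tab1.t1 = tab2.t2
    WHERE tab2.t2 IS NULL
    ORDER BY tab1.t1
""")]

print(only_in_tab1)  # ['a1', 'd1']
```

The test uses IS NULL rather than = NULL, because a comparison with NULL using = never evaluates to true.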

PL SQL - Join 2 tables and return max from right table

Trying to retrieve the MAX doc in the right table.
SELECT F43.PDDOCO,
F43.PDSFXO,
F43.PDLNID,
F43.PDAREC/100 As Received,
F431.PRAREC/100,
max(F431.PRDOC)
FROM PRODDTA.F43121 F431
LEFT OUTER JOIN PRODDTA.F4311 F43
ON
F43.PDKCOO=F431.PRKCOO
AND F43.PDDOCO=F431.PRDOCO
AND F43.PDDCTO=F431.PRDCTO
AND F43.PDSFXO=F431.PRSFXO
AND F43.PDLNID=F431.PRLNID
WHERE F431.PRDOCO = 401531
and F431.PRMATC = 2
and F43.PDLNTY = 'DC'
Group by
F43.PDDOCO,
F43.PDSFXO,
F43.PDLNID,
F43.PDAREC,
F431.PRAREC/100
This query is still returning the two rows in the right table. I am fairly new to SQL and struggling with the statement. Any help would be appreciated.
Without seeing your data it is difficult to tell where the problem might be, so I will offer a few suggestions that could help.
First, you are joining to PRODDTA.F4311 with a LEFT JOIN, but you have a filter on that table in the WHERE clause. That filter causes the query to act like an INNER JOIN, so you should move F43.PDLNTY = 'DC' to the JOIN condition.
Second, you can try using a subquery to get the MAX(PRDOC) value. Then you can limit the columns that you are grouping on, which could eliminate the duplicates. The query would then be similar to the following:
SELECT F43.PDDOCO,
F43.PDSFXO,
F43.PDLNID,
F43.PDAREC/100 As Received,
F431.PRAREC/100,
F431.PRDOC
FROM PRODDTA.F43121 F431
INNER JOIN
(
-- subquery to get the max
-- then group by the distinct columns
SELECT PRKCOO, max(PRDOC) MaxPRDOC
FROM PRODDTA.F43121
WHERE PRDOCO = 401531
and PRMATC = 2
GROUP BY PRKCOO
) f2
-- join the subquery result back to the PRODDTA.F43121 table
on F431.PRDOC = f2.MaxPRDOC
AND F431.PRKCOO = f2.PRKCOO
LEFT OUTER JOIN PRODDTA.F4311 F43
ON F43.PDKCOO=F431.PRKCOO
AND F43.PDDOCO=F431.PRDOCO
AND F43.PDDCTO=F431.PRDCTO
AND F43.PDSFXO=F431.PRSFXO
AND F43.PDLNID=F431.PRLNID
AND F43.PDLNTY = 'DC' -- move this filter to the join instead of the WHERE
WHERE F431.PRDOCO = 401531
and F431.PRMATC = 2
If you provide your table structures and some sample data, it will be easier to determine the issue.
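The first suggestion, about where the filter on the LEFT JOIN table goes, can be seen on toy data. This is a sketch with Python's sqlite3 module, not the real JD Edwards tables: receipts and po_lines are hypothetical stand-ins for F43121 and F4311, and their columns and rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- hypothetical stand-ins for F43121 (receipts) and F4311 (po_lines)
    CREATE TABLE receipts (line_id INTEGER);
    CREATE TABLE po_lines (line_id INTEGER, line_type TEXT);
    INSERT INTO receipts VALUES (1), (2), (3);
    INSERT INTO po_lines VALUES (1, 'DC'), (2, 'XX');
""")

# Filter in WHERE: rows whose po_lines side is NULL or not 'DC' are dropped,
# so the LEFT JOIN behaves like an INNER JOIN.
filter_in_where = conn.execute("""
    SELECT r.line_id, p.line_type
    FROM receipts r
    LEFT OUTER JOIN po_lines p ON p.line_id = r.line_id
    WHERE p.line_type = 'DC'
    ORDER BY r.line_id
""").fetchall()

# Filter in ON: every receipts row is kept; rows without a 'DC' match get NULL.
filter_in_on = conn.execute("""
    SELECT r.line_id, p.line_type
    FROM receipts r
    LEFT OUTER JOIN po_lines p
        ON p.line_id = r.line_id AND p.line_type = 'DC'
    ORDER BY r.line_id
""").fetchall()

print(filter_in_where)  # [(1, 'DC')]
print(filter_in_on)     # [(1, 'DC'), (2, None), (3, None)]
```

Which of the two results is wanted depends on whether rows without a 'DC' line should still appear; the sketch only shows that the two placements are not equivalent.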

Resources