Hive command to execute NOT IN clause - hadoop

I have two tables, tab1 and tab2.
tab1 (T1)   tab2 (T2)
a1          b1
b1          c1
c1          f1
d1          g1
I am looking for the values from table T1 that are not present in T2.
In this case, the output should be a1 d1
I have tried with the following query but couldn't get the right solution.
select distinct tab1.T1 from tab1 left semi join tab2 on (tab1.T1!=tab2.T2);

SELECT t1.str
FROM tab1 t1
LEFT OUTER JOIN tab2 t2 ON t1.str = t2.str
WHERE t2.str IS NULL;
Result:
OK
a1
d1
"Why is the t2.str IS NULL condition there?" A left outer join ensures that every row from the first table is included in the result. When a row has no match in the second table, all of the columns from the second table are reported as NULL for that row.
So in the case above we are searching precisely for rows whose second-table entries are missing, and thus we:
Choose one of the never-empty (that is, NOT NULL) columns from table two.
So: is str an always-present column? If not, then please choose another one.
Specify the condition "table2-alias"."table2-never-null-column" IS NULL. That means no matching record was found by the join condition, and thus we have found the records that exist only in table 1.
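The anti-join pattern above can be checked against the sample data from the question. This is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for Hive (an assumption made only so the example runs anywhere). The column name str follows the answer's query rather than the question's T1/T2.

```python
import sqlite3

# In-memory SQLite database standing in for Hive (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tab1 (str TEXT);
    CREATE TABLE tab2 (str TEXT);
    INSERT INTO tab1 VALUES ('a1'), ('b1'), ('c1'), ('d1');
    INSERT INTO tab2 VALUES ('b1'), ('c1'), ('f1'), ('g1');
""")

# Keep every tab1 row, then retain only those with no tab2 match.
anti_join_sql = """
    SELECT t1.str
    FROM tab1 t1
    LEFT OUTER JOIN tab2 t2 ON t1.str = t2.str
    WHERE t2.str IS NULL
    ORDER BY t1.str
"""
only_in_tab1 = [row[0] for row in conn.execute(anti_join_sql)]
print(only_in_tab1)
```

The rows b1 and c1 find a partner in tab2, so their t2.str is not NULL and the WHERE clause drops them.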

Related

Hash Join with Partition restriction from third table

My current problem is in 11g, but I am also interested in how this might be solved smarter in later versions.
I want to join two tables. Table A has 10 million rows. Table B is huge, with a billion records across about a thousand partitions; one partition has around 10 million records. I am not joining on the partition key. For most rows of Table A, one or more rows in Table B will be found.
Example:
select * from table_a a
inner join table_b b on a.ref = b.ref
The above will return about 50 million rows, and those results come from only about 30 partitions of table b. I am assuming a hash join is the correct join here, hashing table a and FTSing/index-scanning table b.
So, 970 partitions were scanned for no reason. And, I have a third query that could tell oracle which 30 partitions to check for the join.
Example of third query:
select partition_id from table_c
This query gives exactly the 30 partitions for the query above.
To my question:
In PL/SQL one can solve this by
select the 30 partition_ids into a variable (be it just a select listagg(partition_id,',') ... into v_partitions from table_c)
Execute my query like so:
execute immediate 'select * from table_a a
inner join table_b b on a.ref = b.ref
where b.partition_id in ('||v_partitions||')' into ...
Let's say this completes in 10 minutes.
Now, how can I do this in the same amount of time with pure SQL?
Just simply writing
select * from table_a a
inner join table_b b on a.ref = b.ref
where b.partition_id in (select partition_id from table_c)
does not do the trick it seems, or I might be aiming at the wrong plan.
The plan I think I want is
hash join
  table a
  nested loop
    table c
    partition pruning here
      table b
But, this does not come back in 10 minutes.
So, how to do this in SQL and what execution plan to aim at? One variation I have not tried yet that might be the solution is
nested loop
  table c
  hash join
    table a
    partition pruning here (pushed predicate from the join to c)
      table b
Another feeling I have is that the solution might lie in joining table a to table c (not sure on what though) and then joining this result to table b.
I am not asking you to type everything out for me. Just a general concept of how to do this (getting partition restriction from a query) in SQL - what plan should I aim at?
thank you very much! Peter
I'm not an expert at this, but I think Oracle generally does the joins first, then applies the where conditions. So you might get the plan you want by moving the partition pruning up into a join condition:
select * from table_a a
inner join table_b b on a.ref = b.ref
and b.partition_id in (select partition_id from table_c);
I've also seen people try to do this sort of thing with an inline view:
select * from table_a a
inner join (select * from table_b
where partition_id in (select partition_id from table_c)) b
on a.ref = b.ref;
Thank you all for your discussions with me on this one. In my case this was solved (not by me) by adding a join path between table_c and table_a and by overloading the join conditions as below. This was possible by adding the column partition_id to table_a:
select * from
table_c c
JOIN table_a a ON (a.partition_id = c.partition_id)
JOIN table_b b ON (b.partition_id = c.partition_id and b.partition_id = a.partition_id and b.ref = a.ref)
And this is the plan you want:
leading(c,b,a) use_nl(c,b) swap_join_inputs(a) use_hash(a)
So you get:
hash join
  table a
  nested loop
    table c
    partition list iterator
      table b

Oracle how to use distinct command in join

I am trying to get a list of distinct values from one column while matching on the key column through an inner join with another table, as below. Table a and table b both hold the key column client.
Table b holds the column product, which has a range of values against a range of client numbers.
Table a holds only the client number.
Table b holds the client number and product:
Client  Product
1       A
1       B
2       B
3       C
I want to find the list of distinct product values where the client is in table a and table b
Any suggestions welcome
find the list of distinct product values where the client is in table
a and table b
As you will see below, DISTINCT is applied to the select list, not to the join:
SELECT DISTINCT
b.Product
FROM TABLEA a
INNER JOIN TABLEB b ON a.Client = b.Client
;
The inner join ensures that client exists in both A and B and then the "select distinct" removes any repetition in the list of products.
SELECT
b.Product
, COUNT(*) AS countof
FROM TABLEA a
INNER JOIN TABLEB b ON a.Client = b.Client
GROUP BY
b.Product
;
An alternative, which also makes a distinct list of products where clients are in A and B is to use group by, with the added bonus that this way you can do some extra stuff like counting how often a product is referenced.
try it at SQLFiddle.com
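Both queries can be tried on the question's sample data. This is a minimal sketch using Python's built-in sqlite3 module rather than Oracle. The contents of table a are not given in the question, so the clients 1 and 2 below are an assumption for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tablea (client INTEGER);
    CREATE TABLE tableb (client INTEGER, product TEXT);
    -- Assumption: table a holds clients 1 and 2 (not listed in the question).
    INSERT INTO tablea VALUES (1), (2);
    INSERT INTO tableb VALUES (1, 'A'), (1, 'B'), (2, 'B'), (3, 'C');
""")

# DISTINCT removes repeated products after the join has filtered clients.
distinct_products = [row[0] for row in conn.execute("""
    SELECT DISTINCT b.product
    FROM tablea a
    INNER JOIN tableb b ON a.client = b.client
    ORDER BY b.product
""")]

# GROUP BY gives the same list plus a count per product.
product_counts = conn.execute("""
    SELECT b.product, COUNT(*) AS countof
    FROM tablea a
    INNER JOIN tableb b ON a.client = b.client
    GROUP BY b.product
    ORDER BY b.product
""").fetchall()

print(distinct_products)
print(product_counts)
```

Product C does not appear because client 3 is absent from table a in this sample, and product B is counted twice because two matching clients reference it.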

Joins are not giving the right answers

1) select count(*) from LCL_RKM_AuditForm; -- O/P: 868
2) select count(*) from RKM_KnowledgeArticleManager; -- O/P: 8511
3) select count(*) from
LCL_RKM_AuditForm A
right outer join
RKM_KnowledgeArticleManager B
on A.ARTICLE_ID=B.DocID; -- O/P: 9216
4) select count(*) from
LCL_RKM_AuditForm A
left outer join
RKM_KnowledgeArticleManager B
on A.ARTICLE_ID=B.DocID; -- O/P: 1973
5) select count(*) from
LCL_RKM_AuditForm A,RKM_KnowledgeArticleManager B
where A.ARTICLE_ID=B.DocID; -- O/P: 1973
My understanding is that:
A left outer join displays all the values in table A and the common values in table B.
A right outer join displays all the values in table B and the common values in table A.
What does "common values" refer to? If it is a left outer join, it should give only 868 results, right? And if it is a right outer join, it should give only 8511 results, right?
In the 5th statement I have used a WHERE clause, which means it should give me only 868 entries, right?
Please help me with this.
Your expected results appear to be based on the false assumption that there is a one-to-one mapping between rows in the two tables.
For a standard inner join (as in your last query), every matching combination of rows from the two tables is returned. Since you are getting more results than there are rows in the first table, it must be true that a given row in the first table may have multiple matching rows in the second table.
For instance, if there is one row in table A with ArticleID = 1, and two rows in Table B with DocID = 1, then a join of the two tables on these fields will produce 2 rows.
When you change to an outer join, you will get at least as many rows as the inner join, and potentially more. An outer join returns the same rows as the corresponding inner join; plus, for any row in the preserved table (the left table of a LEFT OUTER JOIN, the right table of a RIGHT OUTER JOIN) that has no match in the other table, it returns that one row, with NULL values for the columns from the other table.
Your LEFT OUTER JOIN returns the same number of rows as the inner join; this implies that every row in table A has at least one matching row in table B.
Your RIGHT OUTER JOIN returns many more rows. This implies that there are many rows in table B that have no matching row in table A.
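The same pattern of counts can be reproduced on a tiny data set. This is a minimal sketch using Python's built-in sqlite3 module. The table names audit and article and their contents are hypothetical stand-ins for the two tables in the question. The right outer join is written as a left outer join with the tables swapped, which is equivalent and avoids depending on RIGHT JOIN support in older SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical miniature versions of tables A and B.
    CREATE TABLE audit (article_id INTEGER);
    CREATE TABLE article (doc_id INTEGER);
    INSERT INTO audit VALUES (1), (2);
    -- doc_id 1 appears twice (one-to-many); 3 and 4 have no audit row.
    INSERT INTO article VALUES (1), (1), (2), (3), (4);
""")

def count(sql):
    """Run a COUNT(*) query and return the single number."""
    return conn.execute(sql).fetchone()[0]

inner_count = count(
    "SELECT COUNT(*) FROM audit a JOIN article b ON a.article_id = b.doc_id")
left_count = count(
    "SELECT COUNT(*) FROM audit a LEFT JOIN article b ON a.article_id = b.doc_id")
# A RIGHT OUTER JOIN B is the same as B LEFT OUTER JOIN A.
right_count = count(
    "SELECT COUNT(*) FROM article b LEFT JOIN audit a ON a.article_id = b.doc_id")

print(inner_count, left_count, right_count)
```

With two rows in audit, the inner join still returns three rows because article 1 matches twice. The left join adds nothing since every audit row has a match, and the right join adds the two unmatched article rows.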

Comparing two tables, if rows are different, run query in Oracle

Suppose my two tables have the same columns. One column is the ID, and the other one is the text. Is it possible to implement the following pseudocode in PL/SQL?
Compare each row (They will have the same ID)
If anything is different about them
Run a couple of queries: an Update, and an Insert
ElseIf they are the same
Do nothing
Else the row does not exist
So add the row to the table compared on
Is it easy to do this using PL/SQL, or should I create a standalone application to do this logic?
As your tables have the same columns, you can use NATURAL JOIN to easily check whether two corresponding rows are identical, without needing to update your code if a column is added to the tables.
In addition, using an OUTER JOIN allows you to find the rows present in one table but not in the other.
So, you can use something like this to achieve your purpose:
for rec in (
SELECT T.ID ID1,
U.ID ID2,
V.EQ
FROM T
FULL OUTER JOIN U ON T.ID = U.ID
FULL OUTER JOIN (SELECT ID, 1 EQ FROM T NATURAL JOIN U) V ON U.ID = V.ID)
loop
if rec.id1 is null
then
-- row in U but not in T
null; -- placeholder: PL/SQL requires at least one statement per branch
elsif rec.id2 is null
then
-- row in T but not in U
null;
elsif rec.eq is null
then
-- row present in both tables
-- but content mismatch
null;
end if;
end loop;
Else the row does not exist
So add the row to the table compared on
Does this condition mean that rows can be missing from either table? If only from one, then:
insert into t1 (id, text)
select id, text
from t2
minus
select id, text
from t1;
If rows can be missing from both tables, you also need the mirrored query, which inserts into table t2 the rows from t1.
If anything is different about them
If you need one action for any amount of different rows, then use something like this:
select count(*)
into a
from t1, t2
where t1.id = t2.id and t1.text <> t2.text;
if a > 0 then
...
otherwise:
for i in (
select *
from t1, t2
where t1.id = t2.id and t1.text <> t2.text) loop
<do something>
end loop;
A MERGE statement is what you need.
Here is the syntax:
MERGE INTO TARGET_TABLE
USING SOURCE_TABLE
ON (CONDITION)
WHEN MATCHED THEN
UPDATE SET (DO YOUR UPDATES)
WHEN NOT MATCHED THEN
(INSERT YOUR NEW ROWS)
Google MERGE syntax for more about the statement.
Just use MINUS.
query_1
MINUS
query_2
In your case, if you really want to use PL/SQL, select the count into a local variable, then add logic such as: if count > 0 then do the other work.
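The comparison logic discussed in these answers can be sketched end to end on a small example. The following uses Python's built-in sqlite3 module instead of Oracle, so EXCEPT takes the place of MINUS, and a separate UPDATE and INSERT stand in for MERGE, which SQLite does not provide. The sample rows are hypothetical, and t1 is assumed to be the table being brought in line with t2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (id INTEGER PRIMARY KEY, text TEXT);
    CREATE TABLE t2 (id INTEGER PRIMARY KEY, text TEXT);
    INSERT INTO t1 VALUES (1, 'a'), (2, 'b');
    INSERT INTO t2 VALUES (1, 'a'), (2, 'x'), (3, 'c');
""")

# Rows present in both tables whose content differs.
changed_ids = [row[0] for row in conn.execute("""
    SELECT t1.id FROM t1 JOIN t2 ON t1.id = t2.id
    WHERE t1.text <> t2.text
    ORDER BY t1.id
""")]

# IDs that exist in t2 but not in t1 (EXCEPT is SQLite's MINUS).
missing_ids = [row[0] for row in conn.execute(
    "SELECT id FROM t2 EXCEPT SELECT id FROM t1 ORDER BY id")]

# Apply the differences: update changed rows, then insert missing ones.
conn.execute("""
    UPDATE t1 SET text = (SELECT t2.text FROM t2 WHERE t2.id = t1.id)
    WHERE EXISTS (SELECT 1 FROM t2
                  WHERE t2.id = t1.id AND t2.text <> t1.text)
""")
conn.execute("""
    INSERT INTO t1 (id, text)
    SELECT id, text FROM t2 WHERE id NOT IN (SELECT id FROM t1)
""")
synced = conn.execute("SELECT id, text FROM t1 ORDER BY id").fetchall()
print(changed_ids, missing_ids, synced)
```

Note that the <> comparison does not flag rows where one side is NULL, so nullable columns would need an explicit NULL check.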

suppress oracle data not found exception

I have a scenario where I need multiple column values to be put into multiple variables. The problem I am facing is that while one column value is present, the others need not be, so I end up with a DATA NOT FOUND exception, whereas I want to suppress it and put empty values into the remaining variables.
select nvl(A,''), nvl(B,''), nvl(C,'')
into A1, B1, C1
from SAMPLE_TABLE
where name ='W6';
Value of A in the table can be 'hello', Value of B is null and Value of C is null in the table.
When the statement is executed inside the body of a stored procedure, I do not want the DATA NOT FOUND exception; instead I want A1 to have the value 'hello', and B1 and C1 to be ''. I will be running this dynamically and the where condition keeps changing, so I do not want to open that many cursors either. Can anyone please let me know how I can accomplish this?
Your analysis isn't quite correct. You only receive the NO_DATA_FOUND exception if the whole row is missing, i.e. the WHERE condition name ='W6' doesn't select any rows.
To avoid the error, you can use exception handling:
BEGIN
select A, B, C
into A1, B1, C1
from SAMPLE_TABLE where name ='W6';
EXCEPTION
WHEN NO_DATA_FOUND THEN
A1 := 'hello';
B1 := NULL;
C1 := NULL;
END;
Update:
If you want to select NULL values even if the WHERE condition matches no row, then you can try the following query:
SELECT t.A, t.B, t.C
FROM DUAL
LEFT JOIN SAMPLE_TABLE t ON t.name = 'W6';
Update 2: Query with exactly one row:
This query should always return a single row:
SELECT A, B, C
INTO A1, B1, C1
FROM (
SELECT t.A, t.B, t.C
FROM DUAL
LEFT JOIN SAMPLE_TABLE t ON t.name = 'W6'
) x
WHERE ROWNUM <= 1;
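The always-one-row behaviour of the left join can be verified quickly. This is a minimal sketch using Python's built-in sqlite3 module. SQLite has no DUAL table or ROWNUM, so a one-row subquery and LIMIT 1 are used in their place (assumptions for illustration), and the name 'W7' is a hypothetical value with no matching row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sample_table (name TEXT, a TEXT, b TEXT, c TEXT);
    INSERT INTO sample_table VALUES ('W6', 'hello', NULL, NULL);
""")

# The one-row subquery plays the role of Oracle's DUAL; the LEFT JOIN
# keeps that row even when no sample_table row matches the name.
one_row_sql = """
    SELECT t.a, t.b, t.c
    FROM (SELECT 1 AS dummy) d
    LEFT JOIN sample_table t ON t.name = ?
    LIMIT 1
"""
found = conn.execute(one_row_sql, ("W6",)).fetchone()
absent = conn.execute(one_row_sql, ("W7",)).fetchone()
print(found)
print(absent)
```

In both cases exactly one row comes back, so a SELECT ... INTO built on this shape would not hit the no-rows condition.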

Resources