How can I solve this hive sql problem? (Like join in Hive) - hadoop

I know Hive only provides equi-joins, for example the SQL statement below.
select *
from A join B
on A.c1 = B.c2
where 1=1;
But I want to run a LIKE join query in Hive, for example the SQL statement below.
select *
from A join B
on A.c1 like B.c2
where 1=1;
Please let me know if you know the solution in Hive.

How about this:
ON A.c1 LIKE concat('%',B.c2,'%')
concat wraps the B.c2 value in % wildcards, so the LIKE operator matches any A.c1 that contains it.
The whole query would be:
select * from A join B on A.c1 LIKE concat('%',B.c2,'%') where 1=1;
Version: Hive 2.1.1
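To make the pattern concrete, here is a minimal sketch that runs the same kind of LIKE join against an in-memory SQLite database through Python's sqlite3 module. SQLite is only a stand-in for Hive here: the sample rows are made up, and SQLite spells concatenation as || where Hive uses concat. The second query shows a commonly used fallback, a cross join filtered in WHERE, which may help if a given Hive version rejects a non-equality condition in ON. Note that a cross join compares every row of A with every row of B, so it can be expensive on large tables.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for Hive (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (c1 TEXT);
    CREATE TABLE B (c2 TEXT);
    INSERT INTO A VALUES ('hadoop hive'), ('spark sql'), ('pig latin');
    INSERT INTO B VALUES ('hive'), ('sql');
""")

# LIKE condition directly in ON.
# Hive equivalent of the pattern: concat('%', B.c2, '%')
like_join = """
    SELECT A.c1, B.c2
    FROM A JOIN B
      ON A.c1 LIKE '%' || B.c2 || '%'
    ORDER BY A.c1
"""

# Fallback: cross join first, then filter the pairs in WHERE.
cross_filter = """
    SELECT A.c1, B.c2
    FROM A CROSS JOIN B
    WHERE A.c1 LIKE '%' || B.c2 || '%'
    ORDER BY A.c1
"""

like_rows = conn.execute(like_join).fetchall()
cross_rows = conn.execute(cross_filter).fetchall()
print(like_rows)
print(cross_rows)
```

Both queries return the same pairs; only the place where the LIKE condition is written differs.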

Related

Subqueries with select in hive

Team,
I have two temporary tables, a and b, holding the values 5 and 6 in the columns a.ref1 and b.ref2 respectively.
I am trying to use these values in another SQL query like
"select c.col1, d.col1,d.col2 from c join d on a.id=d.id where d.col1=(schema_name).a.ref1 or
d.col2=(schema_name).b.ref2"
I get an error like
"Invalid table alias or column reference"
Any thoughts on why it behaves like this? I tried passing the temp table values with a select query, but this does not work in Hive. Any further assistance would be appreciated.
You can do this with a common table expression (CTE), which may also make your query perform better.
It would look like this:
WITH
refined_d AS
(
SELECT
d.id,
d.col1,
d.col2
FROM
d
INNER JOIN
(schema_name).a
ON
( (schema_name).a.ref1 = d.col1)
INNER JOIN
(schema_name).b
ON
( (schema_name).b.ref2 = d.col2)
)
SELECT
c.col1,
d.col1,
d.col2
FROM
c
JOIN
refined_d d
ON
c.id=d.id;
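One thing worth checking in the query above: the two INNER JOINs keep only rows of d that match both a.ref1 and b.ref2, while the WHERE clause in the question used OR. The sketch below compares the two behaviours on made-up rows, using Python's sqlite3 module as a stand-in for Hive. The table contents are assumptions, as is the idea that a and b each hold exactly one row (5 and 6, as described in the question); the schema prefix is dropped for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (ref1 INTEGER);
    CREATE TABLE b (ref2 INTEGER);
    CREATE TABLE c (id INTEGER, col1 TEXT);
    CREATE TABLE d (id INTEGER, col1 INTEGER, col2 INTEGER);
    INSERT INTO a VALUES (5);
    INSERT INTO b VALUES (6);
    INSERT INTO c VALUES (1, 'c1'), (2, 'c2'), (3, 'c3'), (4, 'c4');
    INSERT INTO d VALUES (1, 5, 0), (2, 0, 6), (3, 5, 6), (4, 1, 1);
""")

# Two inner joins: a row of d must match a.ref1 AND b.ref2.
both_match = """
    WITH refined_d AS (
        SELECT d.id, d.col1, d.col2
        FROM d
        INNER JOIN a ON a.ref1 = d.col1
        INNER JOIN b ON b.ref2 = d.col2
    )
    SELECT c.col1, d.col1, d.col2
    FROM c JOIN refined_d d ON c.id = d.id
    ORDER BY c.id
"""

# Cross join with the single-row tables, then filter with OR,
# which mirrors the condition in the question.
either_match = """
    WITH refined_d AS (
        SELECT d.id, d.col1, d.col2
        FROM d CROSS JOIN a CROSS JOIN b
        WHERE d.col1 = a.ref1 OR d.col2 = b.ref2
    )
    SELECT c.col1, d.col1, d.col2
    FROM c JOIN refined_d d ON c.id = d.id
    ORDER BY c.id
"""

and_rows = conn.execute(both_match).fetchall()
or_rows = conn.execute(either_match).fetchall()
print(and_rows)
print(or_rows)
```

If a or b could hold more than one row, the cross join variant would return duplicate rows of d, so it is only a reasonable shape under the single-row assumption.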

Oracle join select result

I've got this problem:
I have a select statement, which is rather time consuming.
I have to join the result with itself.
I want to do something like this:
Select table1.*, table2.Consumption
from (heavy select statement) table1 left outer join
(same heavy statement) table2
on table1."id" = table2."id" and table1."Year" -1 = table2."Year"
I don't want to fetch the same data twice. I would rather reuse the result of table1 as table2. Is this possible?
I need this for an application that executes queries but cannot use CREATE or anything similar; otherwise I would store the data in a table.
You can use a common table expression (CTE) and materialize the results of the heavy select statement:
WITH heavy AS ( SELECT /*+ MATERIALIZE */ ... (heavy select statement) )
Select table1.*, table2.Consumption
from heavy table1 left outer join
heavy table2
on table1."id" = table2."id" and table1."Year" -1 = table2."Year"
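As a rough illustration of the shape of this query, the sketch below defines the expensive statement once in a CTE and joins it to itself under two aliases. It runs on SQLite through Python's sqlite3 module, so it shows only the structure of the query, not Oracle's materialization behaviour: the hint is Oracle-specific and is left out here. The readings table, its columns, and the aggregate standing in for the heavy statement are all made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE readings (id INTEGER, yr INTEGER, consumption INTEGER);
    INSERT INTO readings VALUES
        (1, 2020, 10), (1, 2020, 5), (1, 2021, 20), (2, 2021, 7);
""")

# The aggregate plays the role of the "heavy select statement" (hypothetical).
# It is written once and referenced twice, as table1 and table2.
query = """
    WITH heavy AS (
        SELECT id, yr, SUM(consumption) AS consumption
        FROM readings
        GROUP BY id, yr
    )
    SELECT table1.id, table1.yr, table1.consumption,
           table2.consumption AS previous_consumption
    FROM heavy table1
    LEFT OUTER JOIN heavy table2
      ON table1.id = table2.id AND table1.yr - 1 = table2.yr
    ORDER BY table1.id, table1.yr
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Rows with no entry for the previous year get NULL (None in Python) in the last column, which is what the left outer join is for.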

Hive Joins query

I have two tables in hive:
Table 1:
1,Nail,maher,24,6.2
2,finn,egan,23,5.9
3,Hadm,Sha,28,6.0
4,bob,hope,55,7.2
Table 2 :
1,Nail,maher,24,6.2
2,finn,egan,23,5.9
3,Hadm,Sha,28,6.0
4,bob,hope,55,7.2
5,john,hill,22,5.5
6,todger,hommy,11,2.2
7,jim,cnt,99,9.9
8,will,hats,43,11.2
Is there any way in Hive to retrieve the new data in table 2 that doesn't exist in table 1?
In other database tools you would use an inner left/right join, but that doesn't exist in Hive. Any suggestions on how this could be achieved?
If you are using Hive version >= 0.13 you can use this query:
SELECT * FROM A WHERE A.firstname, A.lastname ... IN (SELECT B.firstname, B.lastname ... FROM B);
But I'm not sure whether Hive supports multiple columns in the IN clause.
If not, something like this could work:
SELECT * FROM A WHERE A.firstname IN (SELECT B.firstname FROM B) AND A.lastname IN (SELECT b.lastname FROM B) ...;
It might be wiser to concatenate the fields together before testing for NOT IN:
SELECT *
FROM t2
WHERE CONCAT(t2.firstname, t2.lastname, CAST(t2.val1 as STRING), CAST(t2.val2 as STRING)) NOT IN
(SELECT CONCAT(t1.firstname, t1.lastname, CAST(t1.val1 as STRING), CAST(t1.val2 as STRING))
FROM t1)
Performing sequential NOT IN sub-queries may give you erroneous results.
From the above example, a new record with the values ('nail','egan',28, 7.2) would not show up as new with sequential NOT IN statements.
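The difference can be checked on the sample data. The sketch below uses Python's sqlite3 module as a stand-in for Hive and compares three approaches: sequential NOT IN per column, NOT IN on a concatenated key, and a LEFT JOIN that keeps only unmatched rows. The column names (firstname, lastname, val1, val2) follow the query above; the id column, the '|' separator, and the extra row with id 9 are assumptions added for the example. The separator is there because concatenating without one can make different rows look identical, for example ('ab', 'c') and ('a', 'bc').

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (id INTEGER, firstname TEXT, lastname TEXT, val1 INTEGER, val2 REAL);
    CREATE TABLE t2 (id INTEGER, firstname TEXT, lastname TEXT, val1 INTEGER, val2 REAL);
    INSERT INTO t1 VALUES
        (1, 'Nail', 'maher', 24, 6.2), (2, 'finn', 'egan', 23, 5.9),
        (3, 'Hadm', 'Sha', 28, 6.0), (4, 'bob', 'hope', 55, 7.2);
    INSERT INTO t2 SELECT * FROM t1;
    -- Row 9 is new as a whole, but each of its values already occurs in t1.
    INSERT INTO t2 VALUES (5, 'john', 'hill', 22, 5.5), (9, 'Nail', 'egan', 28, 7.2);
""")

def ids(sql):
    """Run a query that selects t2.id and return the ids as a list."""
    return [row[0] for row in conn.execute(sql)]

# Sequential NOT IN: each column is tested on its own.
sequential = ids("""
    SELECT t2.id FROM t2
    WHERE t2.firstname NOT IN (SELECT firstname FROM t1)
      AND t2.lastname NOT IN (SELECT lastname FROM t1)
      AND t2.val1 NOT IN (SELECT val1 FROM t1)
      AND t2.val2 NOT IN (SELECT val2 FROM t1)
    ORDER BY t2.id
""")

# NOT IN on a concatenated key (|| is SQLite's concat), with a separator.
concatenated = ids("""
    SELECT t2.id FROM t2
    WHERE t2.firstname || '|' || t2.lastname || '|' || t2.val1 || '|' || t2.val2 NOT IN
        (SELECT t1.firstname || '|' || t1.lastname || '|' || t1.val1 || '|' || t1.val2 FROM t1)
    ORDER BY t2.id
""")

# Anti-join: keep rows of t2 that have no full match in t1.
anti_join = ids("""
    SELECT t2.id FROM t2
    LEFT JOIN t1
      ON t1.firstname = t2.firstname AND t1.lastname = t2.lastname
     AND t1.val1 = t2.val1 AND t1.val2 = t2.val2
    WHERE t1.id IS NULL
    ORDER BY t2.id
""")

print(sequential, concatenated, anti_join)
```

The sequential version misses row 9, while the concatenated key and the anti-join both report it. The anti-join uses only equality conditions, so it is an equi-join and avoids the string conversion altogether.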

Convert Oracle (Cross Join?) to Netezza when using comma separated table list instead of JOIN keywords

Below is some Oracle PL/SQL code that joins tables without using actual JOIN keywords. This looks like a cross join? How would I convert it to Netezza SQL? That's where I'm stuck.
SELECT COUNT(*)
FROM TABLE_A A, TABLE_A B
WHERE A.X = 'Y' AND A.PATH LIKE '/A/A/A'
AND B.X = 'Z' AND B.PATH LIKE '/B/B/B';
Oracle Cross Join:
http://www.sqlguides.com/sql_cross_join.php
Here's what I tried so far:
SELECT *
from TABLE_A A
cross join (
select * from TABLE_A
) B
WHERE
A.X = 'Y' AND A.PATH LIKE '/A/A/A'
AND B.X = 'Z' AND B.PATH LIKE '/B/B/B';
EDIT:
a_horse_with_no_name:
When I use either syntax in Netezza with COUNT(*) at the very beginning, it works and returns a count of 60, which matches the first query above when run in Oracle. Without the WHERE clause, Netezza returns 125316 results, which also matches the first query above when run in Oracle. When I use either syntax in Netezza with SELECT * at the very beginning, I get this error:
ERROR [HY000] ERROR: Record size 70418 exceeds internal limit of 65535 bytes
I had to list explicit columns in Netezza when doing a CROSS JOIN; using SELECT * throws the error shown in the EDIT to my question. I also had to specify an empty escape string (ESCAPE '') for the LIKE patterns that use the '%' wildcard. Thank you a_horse_with_no_name. Cheers! "Where everybody knows your name." ;-)
select A.CODE, B.CODE, LOWER(A.DIM), LOWER(B.DIM)
FROM TABLE_A A
cross join TABLE_A B
WHERE A.PATH LIKE '\A\A\A%' ESCAPE '' AND A.X = 'Y'
AND B.PATH LIKE '\B\B\B%' ESCAPE '' AND B.X = 'Y'

HQL query to join 2 tables with the same key

I am trying to do this in HQL:
select A.a, A.a1, B.b, B.b1 from A, B
where A.x=B.x;
The join is simple to write in SQL, but I run into a problem when writing it in HQL.
Would you please give me the HQL syntax for the join?
Thanks for the help.
Maybe something like this would work:
select ai.a, ai.a1, bi.b, bi.b1
from A ai, B bi where ai.x = bi.x
