Hive - create table by select columns from different tables - hadoop

Here are my hive tables:
table1:
|a |b |c |
----------
|a1|b1|c1|
|a2|b2|c2|
|a3|b3|c3|
|a4|b4|c4|
|a5|b5|c5|
table2:
|x |y |z |
----------
|x1|y1|z1|
|x2|y2|z2|
|x3|y3|z3|
|x4|y4|z4|
|x5|y5|z5|
Desired output:
|a |b |x |y |
-------------
|a1|b1|x1|y1|
|a2|b2|x2|y2|
|a3|b3|x3|y3|
|a4|b4|x4|y4|
|a5|b5|x5|y5|
Is this possible in Hive? Any help would be appreciated. Thank you!

You seem to want to "line up" the rows of both tables. Assuming that column a can be used to order the records in table1 (and column x in table2), you can use row_number() as follows:
select t1.a, t1.b, t2.x, t2.y
from (select t1.*, row_number() over(order by a) rn from table1 t1) t1
inner join (select t2.*, row_number() over(order by x) rn from table2 t2) t2
on t1.rn = t2.rn
If the tables may have a different number of rows, and you want to retain "additional" rows, you can just change the inner join to a full join.
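A minimal sketch of the same row_number() pairing, assuming SQLite (through Python's sqlite3 module, SQLite 3.25 or later for window functions) as a stand-in for Hive. The table contents are the sample rows from the question.

```python
import sqlite3

# In-memory SQLite database standing in for Hive (assumption: SQLite >= 3.25)
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (a TEXT, b TEXT, c TEXT);
    CREATE TABLE table2 (x TEXT, y TEXT, z TEXT);
""")
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)",
                 [(f"a{i}", f"b{i}", f"c{i}") for i in range(1, 6)])
conn.executemany("INSERT INTO table2 VALUES (?, ?, ?)",
                 [(f"x{i}", f"y{i}", f"z{i}") for i in range(1, 6)])

# Number each table's rows independently, then pair rows that share a number
rows = conn.execute("""
    SELECT t1.a, t1.b, t2.x, t2.y
    FROM (SELECT *, row_number() OVER (ORDER BY a) AS rn FROM table1) t1
    JOIN (SELECT *, row_number() OVER (ORDER BY x) AS rn FROM table2) t2
      ON t1.rn = t2.rn
    ORDER BY t1.rn
""").fetchall()

for row in rows:
    print(row)
```

The pairing is only meaningful if the ORDER BY columns give a stable, intended order; without them row_number() numbers rows arbitrarily.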

Related

How to get mismatch records of two tables from same database in hive?

For example:
select username, country from table1
Minus
Select username, country from table2;
The MINUS query above works in an RDBMS, but I want the same result in Hive. Can we use joins in Hive to get it? If so, how do I write the Hive query to get the correct result?
Set operations (MINUS/EXCEPT/INTERSECT in addition to UNION) are supported as of Hive 2.3.0 (released on 17 July 2017).
https://issues.apache.org/jira/browse/HIVE-12764
Demo
create table table1 (username string, country string);
create table table2 (username string, country string);
insert into table1 values ('Danny','USA'),('Danny','USA'),('David','UK');
insert into table2 values ('David','UK'),('Michal','France');
select username, country from table1
minus
Select username, country from table2
;
+--------------+-------------+
| _u1.username | _u1.country |
+--------------+-------------+
| Danny | USA |
+--------------+-------------+
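The demo can be reproduced as a small sketch, assuming SQLite (Python's sqlite3 module) as a stand-in engine. SQLite spells the operator EXCEPT; per the answer above, Hive 2.3.0+ also accepts MINUS. Like the Hive demo output, the set operation removes duplicates, so Danny/USA appears once.

```python
import sqlite3

# SQLite stands in for Hive 2.3+ here; same tables and rows as the demo
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (username TEXT, country TEXT);
    CREATE TABLE table2 (username TEXT, country TEXT);
    INSERT INTO table1 VALUES ('Danny', 'USA'), ('Danny', 'USA'), ('David', 'UK');
    INSERT INTO table2 VALUES ('David', 'UK'), ('Michal', 'France');
""")

# EXCEPT keeps rows of the first query that are absent from the second, deduplicated
rows = conn.execute("""
    SELECT username, country FROM table1
    EXCEPT
    SELECT username, country FROM table2
""").fetchall()
print(rows)  # [('Danny', 'USA')]
```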
In older Hive versions you can use:
select username
,country
from ( select 1 tab,username, country from table1
union all select 2 tab,username, country from table2
) t
group by username
,country
having count(case when tab = 2 then 1 end) = 0
;
+----------+---------+
| username | country |
+----------+---------+
| Danny | USA |
+----------+---------+
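A sketch of why the UNION ALL / GROUP BY version works, assuming SQLite (Python's sqlite3 module) as a stand-in engine: each row is tagged with its source table, and count() ignores the NULLs produced by the CASE, so a group survives only if none of its rows came from table2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (username TEXT, country TEXT);
    CREATE TABLE table2 (username TEXT, country TEXT);
    INSERT INTO table1 VALUES ('Danny', 'USA'), ('Danny', 'USA'), ('David', 'UK');
    INSERT INTO table2 VALUES ('David', 'UK'), ('Michal', 'France');
""")

# Tag rows by origin (1 = table1, 2 = table2), then drop any group touched by table2
rows = conn.execute("""
    SELECT username, country
    FROM (SELECT 1 AS tab, username, country FROM table1
          UNION ALL
          SELECT 2 AS tab, username, country FROM table2) t
    GROUP BY username, country
    HAVING count(CASE WHEN tab = 2 THEN 1 END) = 0
""").fetchall()
print(rows)  # [('Danny', 'USA')]
```

Grouping also deduplicates, which matches the MINUS result.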
You can use a LEFT JOIN as follows:
select table1.username, table1.country
from table1 left join table2
on table1.username=table2.username and table1.country=table2.country
where table2.username is NULL and table2.country is NULL;
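One difference from MINUS is worth checking: the LEFT JOIN anti-join keeps duplicate rows from table1. The sketch below assumes SQLite (Python's sqlite3 module) as a stand-in engine and reuses the demo rows; adding DISTINCT gives the MINUS-style result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (username TEXT, country TEXT);
    CREATE TABLE table2 (username TEXT, country TEXT);
    INSERT INTO table1 VALUES ('Danny', 'USA'), ('Danny', 'USA'), ('David', 'UK');
    INSERT INTO table2 VALUES ('David', 'UK'), ('Michal', 'France');
""")

anti_join = """
    SELECT {distinct} table1.username, table1.country
    FROM table1 LEFT JOIN table2
      ON table1.username = table2.username AND table1.country = table2.country
    WHERE table2.username IS NULL AND table2.country IS NULL
"""

# Unmatched table1 rows come back once per occurrence
rows = conn.execute(anti_join.format(distinct="")).fetchall()
# DISTINCT collapses them, matching MINUS/EXCEPT semantics
distinct_rows = conn.execute(anti_join.format(distinct="DISTINCT")).fetchall()
print(rows)
print(distinct_rows)
```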
Yes. Since MINUS and NOT EXISTS may not work in older Hive versions, you can perform the MINUS operation with the LEFT JOIN below.
SELECT t1.username, t1.country
FROM
(select username, country from table1) t1
LEFT JOIN
(Select username, country from table2) t2
ON t1.username =t2.username
AND t1.country =t2.country
WHERE t2.username IS NULL
Important: put the IS NULL check in the WHERE clause, not as an extra AND in the join condition; the two give different results.
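The ON-versus-WHERE note can be demonstrated with a short sketch, assuming SQLite (Python's sqlite3 module) as a stand-in engine. In a LEFT JOIN, an IS NULL test inside ON can never match a real table2 row, so every table1 row is kept; only the WHERE version filters out the matched rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (username TEXT, country TEXT);
    CREATE TABLE table2 (username TEXT, country TEXT);
    INSERT INTO table1 VALUES ('Danny', 'USA'), ('Danny', 'USA'), ('David', 'UK');
    INSERT INTO table2 VALUES ('David', 'UK'), ('Michal', 'France');
""")

# Correct: join first, then keep only rows with no table2 match
where_rows = conn.execute("""
    SELECT t1.username, t1.country
    FROM table1 t1
    LEFT JOIN table2 t2
      ON t1.username = t2.username AND t1.country = t2.country
    WHERE t2.username IS NULL
""").fetchall()

# Pitfall: the IS NULL test is part of the join condition, so nothing is filtered
on_rows = conn.execute("""
    SELECT t1.username, t1.country
    FROM table1 t1
    LEFT JOIN table2 t2
      ON t1.username = t2.username AND t1.country = t2.country
     AND t2.username IS NULL
""").fetchall()

print(sorted(where_rows))
print(sorted(on_rows))
```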

How to update table in Hive 0.13?

My Hive version is 0.13. I have two tables, table_1 and table_2
table_1 contains:
customer_id | items | price | updated_date
------------+-------+-------+-------------
10 | watch | 1000 | 20170626
11 | bat | 400 | 20170625
table_2 contains:
customer_id | items | price | updated_date
------------+----------+-------+-------------
10 | computer | 20000 | 20170624
I want to update the records of table_2 with the rows from table_1 when the customer_id already exists in table_2; otherwise, the row should be appended to table_2.
Since Hive 0.13 does not support UPDATE, I tried using a join, but it fails.
You can use either row_number() or a FULL JOIN. Here is an example using row_number(); table_2 is overwritten, and rows coming from table_1 (new_flag = 1) take priority over existing rows with the same customer_id:
insert overwrite table table_2
select customer_id, items, price, updated_date
from
(
select customer_id, items, price, updated_date,
row_number() over(partition by customer_id order by new_flag desc) rn
from
(
select customer_id, items, price, updated_date, 0 as new_flag
from table_2
union all
select customer_id, items, price, updated_date, 1 as new_flag
from table_1
) all_data
)s where rn=1;
Also see this answer for update using FULL JOIN: https://stackoverflow.com/a/37744071/2700344
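A minimal sketch of the row_number() upsert, assuming SQLite (Python's sqlite3 module) as a stand-in engine and following the question's wording: table_2 is the target and table_1 holds the incoming rows. SQLite has no INSERT OVERWRITE, so the sketch emulates it by clearing the target and inserting the merged result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_1 (customer_id INTEGER, items TEXT, price INTEGER, updated_date TEXT);
    CREATE TABLE table_2 (customer_id INTEGER, items TEXT, price INTEGER, updated_date TEXT);
    INSERT INTO table_1 VALUES (10, 'watch', 1000, '20170626'), (11, 'bat', 400, '20170625');
    INSERT INTO table_2 VALUES (10, 'computer', 20000, '20170624');
""")

# Incoming rows (table_1) get new_flag = 1, so they rank first within each customer_id
merged = conn.execute("""
    SELECT customer_id, items, price, updated_date
    FROM (
        SELECT customer_id, items, price, updated_date,
               row_number() OVER (PARTITION BY customer_id ORDER BY new_flag DESC) AS rn
        FROM (
            SELECT customer_id, items, price, updated_date, 0 AS new_flag FROM table_2
            UNION ALL
            SELECT customer_id, items, price, updated_date, 1 AS new_flag FROM table_1
        ) all_data
    ) s
    WHERE rn = 1
""").fetchall()

# Emulate INSERT OVERWRITE: replace the target's contents with the merged rows
conn.execute("DELETE FROM table_2")
conn.executemany("INSERT INTO table_2 VALUES (?, ?, ?, ?)", merged)
final = conn.execute("SELECT * FROM table_2 ORDER BY customer_id").fetchall()
print(final)
```

Customer 10 is replaced by the watch row and customer 11 is appended.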

Retrieve from Oracle db key value pair

I need to retrieve 3 values with different keys from a key-value pair table.
My database schema is as follows. I need to get from table1 to table3: take E_SUBID from table1 and join table2 on E_SUBID. Once table1 and table2 are joined, I need to take E_CID from table2 and join it with E_CID in table3 to get "Attr_Value", using E_CID as the criterion.
Table1
--------------------------
| E_SUBID | B_LocationID |
| 1       | 100          |
| 2       | 101          |
| 3       | 102          |
Table2
-------------------
| E_CID | E_SUBID |
| 10    | 1       |
| 11    | 2       |
| 12    | 3       |
Table3
----------------------------------
| E_CID | Attr_name | Attr_Value |
| 10    | Product   | Samsung    |
| 10    | Model     | Smartphone |
| 10    | usage     | daily      |
| 11    | Product   | Apple      |
| 11    | Model     | Ipad       |
| 11    | usage     | everyday   |
| 12    | Model     | smartwatch |
I have managed to join table1, table2 and table3, but I cannot get the required output, which is as follows:
OUTPUT
| Product | Model      | Usage    |
| Samsung | Smartphone | daily    |
| Apple   | Ipad       | everyday |
| null    | smartwatch | null     |
The query that joins table1, table2 and table3 is as follows:
select distinct t3.Attr_value as Product
from table1 t1, table2 t2, table3 t3
where t1.E_SUBID = t2.E_SUBID and
t2.E_CID = t3.E_CID and
t3.Attr_name=?????
order by Product;
Thank you for your time.
In a case like this, you can join to table3 as often as you need to for each attribute name you wish to display:
select
p.attr_value product,
m.attr_value "model", -- Quotes to escape reserved word
u.attr_value usage
from table1 t1
join table2 t2 on t1.e_subid = t2.e_subid
left outer join table3 p on t2.e_cid = p.e_cid and p.attr_name = 'Product'
left outer join table3 m on t2.e_cid = m.e_cid and m.attr_name = 'Model'
left outer join table3 u on t2.e_cid = u.e_cid and u.attr_name = 'Usage'
order by 1;
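A sketch of the one-join-per-attribute approach, assuming SQLite (Python's sqlite3 module) as a stand-in for Oracle and the sample data from the question. Because string comparison is case-sensitive in both engines, the sketch uses the lowercase 'usage' that appears in the sample Table3; the literal must match however the attribute name is actually stored.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (e_subid INTEGER, b_locationid INTEGER);
    CREATE TABLE table2 (e_cid INTEGER, e_subid INTEGER);
    CREATE TABLE table3 (e_cid INTEGER, attr_name TEXT, attr_value TEXT);
    INSERT INTO table1 VALUES (1, 100), (2, 101), (3, 102);
    INSERT INTO table2 VALUES (10, 1), (11, 2), (12, 3);
    INSERT INTO table3 VALUES
        (10, 'Product', 'Samsung'), (10, 'Model', 'Smartphone'), (10, 'usage', 'daily'),
        (11, 'Product', 'Apple'),   (11, 'Model', 'Ipad'),       (11, 'usage', 'everyday'),
        (12, 'Model', 'smartwatch');
""")

# One LEFT JOIN to table3 per attribute; a missing attribute simply yields NULL
rows = conn.execute("""
    SELECT p.attr_value AS product,
           m.attr_value AS model,
           u.attr_value AS usage
    FROM table1 t1
    JOIN table2 t2 ON t1.e_subid = t2.e_subid
    LEFT JOIN table3 p ON t2.e_cid = p.e_cid AND p.attr_name = 'Product'
    LEFT JOIN table3 m ON t2.e_cid = m.e_cid AND m.attr_name = 'Model'
    LEFT JOIN table3 u ON t2.e_cid = u.e_cid AND u.attr_name = 'usage'
    ORDER BY t2.e_cid
""").fetchall()
for row in rows:
    print(row)
```

The attribute filter sits in each ON clause, not in WHERE, so E_CID 12 is kept even though it has no Product or usage row.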
Edit
Based on the comment: by making the joins to table3 optional (outer joins), the query returns all rows, whether or not a Product, Model or Usage has been defined.
Try the query below. Basically, you are trying to transpose the rows of table3 into columns.
Select Product, "Model", Usage
From
(
Select
t1.E_SUBID,
t2.E_CID,
Max(Case when T3.Attr_name = 'Product' Then T3.Attr_Value else null end) Product,
max(Case when T3.Attr_name = 'Model' Then T3.Attr_Value else null end) "Model",
max(Case when T3.Attr_name = 'Usage' Then T3.Attr_Value else null end) Usage
From Table1 t1,
Table2 t2,
Table3 t3
Where
t1.E_SUBID = t2.E_SUBID
and t2.E_CID = t3.E_CID
group by t1.E_SUBID, t2.E_CID
);
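The MAX(CASE ...) pivot can be checked with a sketch, assuming SQLite (Python's sqlite3 module) as a stand-in for Oracle and the question's sample data (again matching the lowercase 'usage' stored in the sample Table3). Each group holds one E_CID's attribute rows; every CASE keeps the value for one attribute name and NULL elsewhere, and MAX picks that value, or NULL if the attribute is missing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (e_subid INTEGER, b_locationid INTEGER);
    CREATE TABLE table2 (e_cid INTEGER, e_subid INTEGER);
    CREATE TABLE table3 (e_cid INTEGER, attr_name TEXT, attr_value TEXT);
    INSERT INTO table1 VALUES (1, 100), (2, 101), (3, 102);
    INSERT INTO table2 VALUES (10, 1), (11, 2), (12, 3);
    INSERT INTO table3 VALUES
        (10, 'Product', 'Samsung'), (10, 'Model', 'Smartphone'), (10, 'usage', 'daily'),
        (11, 'Product', 'Apple'),   (11, 'Model', 'Ipad'),       (11, 'usage', 'everyday'),
        (12, 'Model', 'smartwatch');
""")

# Collapse each E_CID's key-value rows into one row with a column per attribute
rows = conn.execute("""
    SELECT max(CASE WHEN t3.attr_name = 'Product' THEN t3.attr_value END) AS product,
           max(CASE WHEN t3.attr_name = 'Model'   THEN t3.attr_value END) AS model,
           max(CASE WHEN t3.attr_name = 'usage'   THEN t3.attr_value END) AS usage
    FROM table1 t1
    JOIN table2 t2 ON t1.e_subid = t2.e_subid
    JOIN table3 t3 ON t2.e_cid = t3.e_cid
    GROUP BY t1.e_subid, t2.e_cid
    ORDER BY t2.e_cid
""").fetchall()
for row in rows:
    print(row)
```

Compared with the multi-join version, this reads table3 once, at the cost of an aggregation.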

Aggregate columns but distinct terms should be inserted

I have two tables and I want to merge them:
TERMS_TABLE
ID | TERMS
309 | 'hardware'
309 | 'software'
309 | 'computer'
TFIDF_TABLE
ID | TERMS
309 |'computer,phone,mp3....'
Now I want to add the TERMS values of TERMS_TABLE to the TERMS column of TFIDF_TABLE, but if TFIDF_TABLE already contains a term from TERMS_TABLE, that term should not be inserted into NEW_TFIDF_TABLE.
The result should be:
NEW_TFIDF_TABLE
ID | TERMS
309 |'computer,phone,mp3....,hardware,software'
How can I do that ?
If you use Oracle 11g Release 2 (which introduced LISTAGG), you can try this:
select t4.id, t4.terms || case when t3.terms is not null then ','||t3.terms end terms from
(
select t1.id, listagg(t1.terms,',') within group (order by t1.terms) terms
from terms_table t1 join tfidf_table t2 on t1.id=t2.id
where instr(t2.terms,t1.terms)=0
group by t1.id )
t3 right outer join tfidf_table t4 on t3.id=t4.id
On Oracle 10g you could try the undocumented WM_CONCAT function instead (it is not available in Oracle 12c and later):
select t4.id, t4.terms || case when t3.terms is not null then ','||t3.terms end terms from
(
select t1.id, wm_concat(t1.terms) terms
from terms_table t1 join tfidf_table t2 on t1.id=t2.id
where instr(t2.terms,t1.terms)=0
group by t1.id )
t3 right outer join tfidf_table t4 on t3.id=t4.id
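The merge rule itself (keep the existing list, append only terms not yet present) can be sketched in plain Python. This is a swapped-in technique for illustration, not the SQL above: it compares whole comma-separated tokens, whereas INSTR(t2.terms, t1.terms) = 0 is a substring test, so a term like 'phone' would be treated as already present inside 'smartphone'. The function name merge_terms and the short sample list "computer,phone,mp3" are hypothetical stand-ins for the question's elided data.

```python
from typing import List


def merge_terms(existing: str, new_terms: List[str]) -> str:
    """Append terms that are not already in a comma-separated list.

    Existing terms keep their order; added terms are sorted, mirroring
    LISTAGG ... WITHIN GROUP (ORDER BY terms).
    """
    current = [t for t in existing.split(",") if t]
    present = set(current)
    added = sorted({t for t in new_terms if t not in present})
    return ",".join(current + added)


print(merge_terms("computer,phone,mp3", ["hardware", "software", "computer"]))
# computer,phone,mp3,hardware,software
```

If exact-token matching is needed in Oracle as well, one common approach is to compare against the list wrapped in delimiters, e.g. instr(','||t2.terms||',', ','||t1.terms||',') = 0.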

Update query Oracle

I have a table TB1 (table1) with the following columns:
ID | date
---------------------
1 | 12-JUL-10
2 | 12-JUL-10
3 | 12-JUL-10
4 | 12-JUL-10
.
.
.
10000 | 12-JUL-10
table2
ID | date
---------------------
1 | 12-JAN-09
2 | 12-JUL-09
3 | 12-JUL-09
4 | 12-JUL-08
.
.
.
5800 | 12-JUL-08
How can I update the date in table2 for the rows whose ID also exists in table1?
Thanks :)
In general:
UPDATE table2 t2
SET date_col = (SELECT t1.date_col
FROM table1 t1
WHERE t1.id = t2.id)
WHERE EXISTS (
SELECT 1
FROM table1 t1
WHERE t1.id = t2.id )
If you can be guaranteed that every ID in table2 exists in table1 (or if you want the date_col set to NULL if there is no match), you can eliminate the WHERE EXISTS. But generally you only want to do an update if there is a matching record.
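The effect of the WHERE EXISTS clause can be seen in a small sketch, assuming SQLite (Python's sqlite3 module) as a stand-in for Oracle, a hypothetical date_col stored as text, and a few hypothetical rows in the shape of the question's tables. Without WHERE EXISTS, a table2 row with no match in table1 gets date_col set to NULL by the correlated subquery.

```python
import sqlite3

# Hypothetical sample rows: table2 id 4 has no counterpart in table1
SETUP = """
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, date_col TEXT);
    CREATE TABLE table2 (id INTEGER PRIMARY KEY, date_col TEXT);
    INSERT INTO table1 VALUES (1, '12-JUL-10'), (2, '12-JUL-10'), (3, '12-JUL-10');
    INSERT INTO table2 VALUES (1, '12-JAN-09'), (2, '12-JUL-09'), (4, '12-JUL-08');
"""


def run_update(with_exists: bool):
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    # Correlated subquery: fetch the matching table1 date for each table2 row
    sql = """UPDATE table2
             SET date_col = (SELECT t1.date_col FROM table1 t1 WHERE t1.id = table2.id)"""
    if with_exists:
        # Restrict the update to rows that actually have a match
        sql += " WHERE EXISTS (SELECT 1 FROM table1 t1 WHERE t1.id = table2.id)"
    conn.execute(sql)
    return conn.execute("SELECT id, date_col FROM table2 ORDER BY id").fetchall()


print(run_update(True))   # id 4 keeps its old date
print(run_update(False))  # id 4 becomes NULL
```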
You can also update through an inline view. This is trickier to get right: Oracle only allows it when the updated table (table2) is key-preserved in the join, which in practice means table1.id needs a primary key or unique constraint; otherwise the statement fails with ORA-01779.
update (
select
t1.id as t1_id,
t1.date_col as t1_date,
t2.id as t2_id,
t2.date_col as t2_date
from
table1 t1
join table2 t2 on (t1.id = t2.id)
)
set t2_date = t1_date
