Multiple Table references in HIVE update statement - hadoop

The UPDATE statement below works perfectly in an RDBMS but does not work in Hive. Currently, Hive does not allow an UPDATE to reference multiple tables (the query below references both TableA and TableB).
UPDATE A
FROM TableA A, TableB B
SET DepartmentId = B.DepartmentId
WHERE A.CustomerId = B.CustomerId ;
How can I achieve the same result in Hive? Is there an alternative?

Since Hive doesn't support row-level inserts and updates, there are a few workarounds. Rewriting the table, as suggested in another answer, is one of them.
Another way is to run the same SELECT and INSERT OVERWRITE the result back into the same table.
INSERT OVERWRITE TABLE TableA
SELECT A.c1,A.c2, ... , B.DepartmentId , ..
FROM TableA A, TableB B
WHERE A.CustomerId = B.CustomerId ;
The effect is the same as updating the table in place.
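One thing to watch with the rewrite approach: an inner join drops every TableA row that has no match in TableB, whereas an UPDATE would leave those rows untouched. The following is a minimal sketch of the rewrite using LEFT JOIN and COALESCE instead. It runs against an in-memory SQLite database as a stand-in for Hive (SQLite has no INSERT OVERWRITE, so the overwrite is emulated with a staging table), and the two-column table layout and sample values are assumptions made for illustration.

```python
import sqlite3

def rewrite_with_departments(conn):
    """Emulate INSERT OVERWRITE: rebuild TableA with DepartmentId taken from TableB."""
    cur = conn.cursor()
    # Stage the new contents first, because the SELECT reads the table being replaced.
    cur.execute("""
        CREATE TEMP TABLE staged AS
        SELECT A.CustomerId,
               COALESCE(B.DepartmentId, A.DepartmentId) AS DepartmentId
        FROM TableA A
        LEFT JOIN TableB B ON A.CustomerId = B.CustomerId
    """)
    cur.execute("DELETE FROM TableA")
    cur.execute("INSERT INTO TableA SELECT CustomerId, DepartmentId FROM staged")
    cur.execute("DROP TABLE staged")
    conn.commit()

# Hypothetical sample data: customer 3 has no row in TableB.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (CustomerId INTEGER, DepartmentId INTEGER);
    CREATE TABLE TableB (CustomerId INTEGER, DepartmentId INTEGER);
    INSERT INTO TableA VALUES (1, 10), (2, 20), (3, 30);
    INSERT INTO TableB VALUES (1, 99), (2, 88);
""")
rewrite_with_departments(conn)
rows = conn.execute(
    "SELECT CustomerId, DepartmentId FROM TableA ORDER BY CustomerId"
).fetchall()
print(rows)  # customers 1 and 2 get the new department, customer 3 keeps its old one
```

In Hive itself the same SELECT (LEFT JOIN plus COALESCE) would presumably go directly under INSERT OVERWRITE TABLE TableA, with the full column list of TableA.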

Hive tables are immutable, so an update is not possible. You can always rewrite the entire table:
CREATE TABLE TableA_new
AS
SELECT A.c1,A.c2, ... , B.DepartmentId , ..
FROM TableA A, TableB B
WHERE A.CustomerId = B.CustomerId ;
This answer is only partially true: UPDATE is available as of Hive 0.14 (on transactional tables) ;) GL

Related

Load data into newly added columns in Impala

Let's say I have a table1 in schema1 like this:
Stu_ID | Math
1 | A
2 | B
3 | B+
Now, I want to add a new column, for instance, Literature, into table1 in schema1.
ALTER TABLE schema1.table1
ADD COLUMN Literature STRING;
Table1 now looks like
Stu_ID | Math | Literature
1 | A | NULL
2 | B | NULL
3 | B+ | NULL
I want to load data from table2 in schema2 based on the respective Stu_ID. Is there a way to do so? I have thought of UPDATE, but as far as I understand, Impala only supports updating Kudu tables. Please correct me if I'm wrong.
Instead of UPDATE, you can use INSERT OVERWRITE.
insert overwrite schema1.table1
select
t1.stu_id, t1.Math, t2.Literature
from schema1.table1 t1
left join schema2.table2 t2 ON t1.stu_id=t2.stu_id
This replaces the entire contents of table1 with the old data plus the new column. The left join keeps students that have no row in table2 (their Literature stays NULL).

Deleting data from one table using data from a second table

I have a table, table1, with a million rows, and a second table, table2, with an identical structure but only 106 rows. How can I delete these 106 rows from table1?
Both tables have the fields id, date, param0, param1, param2.
Presuming that uniqueness is enforced through the ID column in both tables, then:
delete from table1 a
where exists (select null
from table2 b
where b.id = a.id
);
Otherwise, add some more columns (into the where clause) which will help you delete only rows you really want.
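As a rough illustration of the advice above, here is a sketch that matches on an extra column as well as the ID and previews the number of affected rows before deleting. It uses an in-memory SQLite database as a stand-in for the real one; the sample rows and the choice of param0 as the extra matching column are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, param0 TEXT);
    CREATE TABLE table2 (id INTEGER PRIMARY KEY, param0 TEXT);
    INSERT INTO table1 VALUES (1, 'a'), (2, 'b'), (3, 'c'), (4, 'd');
    INSERT INTO table2 VALUES (2, 'b'), (4, 'd');
""")

# Shared predicate: a table1 row qualifies if table2 holds a row with the same id and param0.
match = """
    EXISTS (SELECT 1 FROM table2
            WHERE table2.id = table1.id
              AND table2.param0 = table1.param0)
"""

# Preview how many rows the DELETE would hit before running it.
to_delete = conn.execute("SELECT COUNT(*) FROM table1 WHERE " + match).fetchone()[0]

# Run the actual delete and record how many rows were removed.
deleted = conn.execute("DELETE FROM table1 WHERE " + match).rowcount
conn.commit()

remaining = conn.execute("SELECT id, param0 FROM table1 ORDER BY id").fetchall()
print(to_delete, deleted, remaining)
```

If the preview count differs from the expected number of rows (106 in the question), that is a hint that the matching columns are not selective enough.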

Using multiple select statement inside insert statement in Hive

I'm new to Hive. I have two tables like this:
table1:
id;value
1;val1
2;val2
3;val3
table2
num;desc;refVal
1;desc;0
2;descd;0
3;desc;0
I want to create a new table3 that contains:
num;desc;refVal
1;desc;3
2;descd;3
3;desc;3
Here, num and desc are columns from table2, and refVal is the maximum value of the id column in table1.
Can someone guide me to solve this?
First, you have to create a table to hold the result (the column types here are assumed):
CREATE TABLE my_new_table (num INT, `desc` STRING, refVal INT);
After that, insert into this table using the following syntax:
INSERT INTO TABLE my_new_table
[PARTITION (partcol1=val1, partcol2=val2 ...)]
select_statement1;
In select_statement1 you can use the same SELECT you would normally use to join the tables and pick the columns you need.
For more information, check the Hive documentation on INSERT.
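The answer leaves the SELECT itself open. One possible shape for it, sketched here under assumptions: compute MAX(id) in a one-row subquery and cross-join it to every row of table2. The example runs on an in-memory SQLite database rather than Hive, creates the table directly from the SELECT, and double-quotes the column name desc because it collides with a keyword (Hive would use backticks).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, value TEXT);
    CREATE TABLE table2 (num INTEGER, "desc" TEXT, refVal INTEGER);
    INSERT INTO table1 VALUES (1, 'val1'), (2, 'val2'), (3, 'val3');
    INSERT INTO table2 VALUES (1, 'desc', 0), (2, 'descd', 0), (3, 'desc', 0);

    -- One-row subquery holding MAX(id), cross-joined to every row of table2.
    CREATE TABLE table3 AS
    SELECT t2.num, t2."desc", m.max_id AS refVal
    FROM table2 t2
    CROSS JOIN (SELECT MAX(id) AS max_id FROM table1) m;
""")

table3 = conn.execute('SELECT num, "desc", refVal FROM table3 ORDER BY num').fetchall()
print(table3)
```

With the sample data from the question, every row of table3 carries refVal 3, the largest id in table1.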

Data Migration - Verify Data loaded where Primary Key can change

I am currently trying to write SQL to verify the counts of the data that has been migrated from one application to another.
One of the main tables that is being migrated sometimes contains a primary key that already exists in the target application so it needs to be changed. This results in my counts not matching up.
I have a reference table for these changed primary keys but I'm not sure how to incorporate this reference table into my left join.
I really don't know how to include the condition where the key from Table A could be the key on Table B or the new key stored on the Reference table?
select count(*)
from table_b b
left join table_a a on
b.key = a.key
where a.key is null;
The reference table is really simple: two columns, old_number and new_number. It will only contain entries where the key in table A needed to be changed before being loaded into table B.
old_number, new_number
12345678, 13345678
23456781, 24456781
How can I include this scenario?
select count(*)
from table_b b
left join table_a a on
b.key = (a.key or new_number if it exists)
where a.key is null;
So, if the query can include the new_numbers in the reference table then the migration count should match the count in Table A.
This should work (add the two counts together):
select count(*) from table_b b, table_a a where b.key = a.key
UNION ALL
select count(*) from table_b b, reference_table re where b.key = re.new_number;
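An alternative that stays closer to the original left-join check is to translate each table_b key back to its old number through the reference table before joining to table_a. The sketch below is one way to do that, assuming each new_number appears at most once in the reference table; it runs on an in-memory SQLite database, and the extra rows (11111111 and 99999999) are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (key INTEGER);
    CREATE TABLE table_b (key INTEGER);
    CREATE TABLE reference_table (old_number INTEGER, new_number INTEGER);

    INSERT INTO table_a VALUES (11111111), (12345678), (23456781);
    -- 99999999 is a hypothetical row with no source in table_a.
    INSERT INTO table_b VALUES (11111111), (13345678), (24456781), (99999999);
    INSERT INTO reference_table VALUES (12345678, 13345678), (23456781, 24456781);
""")

# Map a table_b key back to its old number when it was renumbered,
# otherwise use the key as is, then look for the source row in table_a.
unmatched = conn.execute("""
    SELECT COUNT(*)
    FROM table_b b
    LEFT JOIN reference_table r ON b.key = r.new_number
    LEFT JOIN table_a a ON a.key = COALESCE(r.old_number, b.key)
    WHERE a.key IS NULL
""").fetchone()[0]
print(unmatched)
```

A result of zero would mean every table_b row traces back to a table_a row, either directly or through a renumbered key.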

How to copy all constraints and data from one schema to another in Oracle

I am using Toad for Oracle 12c. I need to copy a table and its data (40M) from one schema to another (prod to test). However, there is a unique key (not the PK of this table) on the record_Id column, which holds values like 3.000*******19E15. About 2M rows appear to have the same number (I believe because the numbers are very large), yet they are unique in prod. When I try to copy the data, it violates the unique key on that column. I am using Toad's "export data to another schema" function to copy the data.
When I execute these queries in prod:
select count(*) from table_name
OR
select count(distinct(record_id)) from table_name
both queries return exactly the same count.
I don't have DBA permission. How do I copy all the data without violating the unique key of the table?
Thanks in advance!
You can use an upsert to conditionally INSERT or UPDATE, or you can write a small procedure for this.
You could also consider NOT EXISTS, but with data this large it might not be resource-efficient.
insert into prod_tab
select * from other_tab t1 where NOT exists (
select 1 from prod_tab t2 where t1.id = t2.id
);
In Oracle you can use a MERGE statement for that.
The following statement proceeds as follows for each source row:
if the source record_id does not yet exist in the target table, a new record is inserted;
otherwise, the existing record is updated with the source values.
For the sake of the example, I assumed that there are two other columns in the table: column1 and column2.
MERGE INTO target_table t1
USING (SELECT * FROM source_table) t2
ON (t1.record_id = t2.record_id)
WHEN MATCHED THEN UPDATE SET
t1.column1 = t2.column1,
t1.column2 = t2.column2
WHEN NOT MATCHED THEN INSERT
(record_id, column1, column2) VALUES (t2.record_id, t2.column1, t2.column2);
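To make the MERGE semantics concrete, here is a sketch that reproduces its two branches on an in-memory SQLite database. SQLite has no MERGE, so this is a swapped-in emulation (an UPDATE for matched rows followed by an INSERT for unmatched ones), not Oracle syntax; the sample rows are invented.

```python
import sqlite3

def merge_rows(conn):
    """Emulate MERGE with two statements: update matches, then insert the rest."""
    # WHEN MATCHED: copy source values onto target rows with the same record_id.
    conn.execute("""
        UPDATE target_table
        SET column1 = (SELECT s.column1 FROM source_table s
                       WHERE s.record_id = target_table.record_id),
            column2 = (SELECT s.column2 FROM source_table s
                       WHERE s.record_id = target_table.record_id)
        WHERE EXISTS (SELECT 1 FROM source_table s
                      WHERE s.record_id = target_table.record_id)
    """)
    # WHEN NOT MATCHED: insert source rows whose record_id is not in the target yet.
    conn.execute("""
        INSERT INTO target_table (record_id, column1, column2)
        SELECT s.record_id, s.column1, s.column2
        FROM source_table s
        WHERE NOT EXISTS (SELECT 1 FROM target_table t
                          WHERE t.record_id = s.record_id)
    """)
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE target_table (record_id INTEGER UNIQUE, column1 TEXT, column2 TEXT);
    CREATE TABLE source_table (record_id INTEGER UNIQUE, column1 TEXT, column2 TEXT);
    INSERT INTO target_table VALUES (1, 'old', 'x'), (2, 'keep', 'y');
    INSERT INTO source_table VALUES (1, 'new', 'z'), (3, 'add', 'w');
""")
merge_rows(conn)
merged = conn.execute(
    "SELECT record_id, column1, column2 FROM target_table ORDER BY record_id"
).fetchall()
print(merged)
```

Record 1 is overwritten with the source values, record 2 is left alone because the source has no row for it, and record 3 is inserted, so the unique key on record_id is never violated.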
