Workaround for delete in hive with join conditions - hadoop

I'm trying to convert a SQL DELETE query to a Hive one. I'm using Hive 0.12, which doesn't support DELETE.
Below is the SQL query:
Delete from t1 c where exists(select 1 from t2 a where
a.emplid=c.employee_id and a.project_status='test')
I tried using NOT IN for the above query, but for some reasons we cannot use NOT IN in our queries.
Below is the Hive query I have written, but it is not giving correct results. I'm pretty new to Hive. Can anyone help with this?
INSERT Overwrite table t1
select * from t1 c left outer join t2 a on (c.employee_id=a.employee_id)
where a.project_status= 'test'
and a.employee_id is null

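Move the project_status='test' condition into the subquery or into the ON clause. In the WHERE clause it filters out the unmatched rows (where a.project_status is NULL), so the query returns nothing. Also, you should select columns only from table c.
Example with the filter in the subquery: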
insert overwrite table t1
select c.*
from t1 c
left join (select employee_id
from t2
where project_status='test'
) a on (c.employee_id=a.employee_id)
where a.employee_id is null;
Example with the additional condition in the ON clause:
insert overwrite table t1
select c.*
from t1 c
left join t2 a on (c.employee_id=a.employee_id and a.project_status='test')
where a.employee_id is null;
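To see why the placement of the filter matters, here is a minimal sketch that runs both SELECT variants against an in-memory SQLite database. SQLite only stands in for Hive here, and the sample rows and the name column are made up for illustration; the join logic is the same.

```python
import sqlite3

# Hypothetical sample data; SQLite is used only as a stand-in for Hive.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (employee_id INTEGER, name TEXT);
    CREATE TABLE t2 (employee_id INTEGER, project_status TEXT);
    INSERT INTO t1 VALUES (1, 'ann'), (2, 'bob'), (3, 'cid');
    INSERT INTO t2 VALUES (1, 'test'), (2, 'live');
""")

# Filter in WHERE: a.project_status = 'test' can never be true on a row
# where a.employee_id IS NULL, so every row is discarded.
broken = conn.execute("""
    SELECT c.employee_id
    FROM t1 c LEFT JOIN t2 a ON c.employee_id = a.employee_id
    WHERE a.project_status = 'test' AND a.employee_id IS NULL
""").fetchall()

# Filter in ON: only 'test' rows count as matches, and the WHERE clause
# keeps the t1 rows that found no such match (the rows to retain).
kept = conn.execute("""
    SELECT c.employee_id
    FROM t1 c LEFT JOIN t2 a
      ON c.employee_id = a.employee_id AND a.project_status = 'test'
    WHERE a.employee_id IS NULL
    ORDER BY c.employee_id
""").fetchall()

print(broken)  # []
print(kept)    # [(2,), (3,)]
```

Employee 1 has a 'test' row in t2 and is the only one dropped, which is what the original DELETE ... WHERE EXISTS would have removed.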

Related

Hive: Insert the records that are not present

I need to insert records into table t1 from another table t2, inserting only the records that are not already in t1.
I tried this query:
insert into table t1 select * from t2 where id not in (select id from t1);
But I get the error:
Correlating expression cannot contain qualified column reference.
Can anybody suggest a query to do this?
Qualify the column as t2.id. Yet another ridiculous Hive limitation:
insert into table t1 select * from t2 where t2.id not in (select id from t1);
You can also use the command below:
insert into table t1 select t2.* from t2 left join t1 on t2.id=t1.id where t1.id is NULL;
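As a rough check of the LEFT JOIN form, the sketch below runs it twice against an in-memory SQLite database (a stand-in for Hive, with made-up rows and a hypothetical val column). The second run should add nothing, because every t2 id is then present in t1.

```python
import sqlite3

# Hypothetical sample data; SQLite is used only as a stand-in for Hive.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (id INTEGER, val TEXT);
    CREATE TABLE t2 (id INTEGER, val TEXT);
    INSERT INTO t1 VALUES (1, 'a'), (2, 'b');
    INSERT INTO t2 VALUES (2, 'b'), (3, 'c'), (4, 'd');
""")

# Anti-join: keep only the t2 rows that have no partner in t1.
insert_missing = """
    INSERT INTO t1
    SELECT t2.* FROM t2 LEFT JOIN t1 ON t2.id = t1.id
    WHERE t1.id IS NULL
"""

conn.execute(insert_missing)
first = conn.execute("SELECT id FROM t1 ORDER BY id").fetchall()

# Running it again inserts nothing, since no t2 row is unmatched any more.
conn.execute(insert_missing)
second = conn.execute("SELECT id FROM t1 ORDER BY id").fetchall()

print(first)   # [(1,), (2,), (3,), (4,)]
print(second)  # [(1,), (2,), (3,), (4,)]
```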

Select each row and insert into another table

At first glance, this is my query:
INSERT INTO T2
SELECT T1.*, T3.DATETIME, T3.STATEID
FROM T1 WHERE T3.DATETIME BETWEEN '01-NOV-15' AND SYSDATE
LEFT OUTER JOIN T3
ON (T1.DOCID = T3.DOCID);
T1 and T2 are almost identical in terms of number of columns; T2 only has two extra columns (CREATIONDATE, STATEID).
Every row of T1 needs to be inserted into T2, plus the two additional columns CREATIONDATE and STATEID, which are only available in T3. T1 and T3 share DOCID, which could be used to join them.
Some issues remain; help is much appreciated.
I am getting this error:
Error report -
SQL Error: ORA-00933: SQL command not properly ended
00933. 00000 - "SQL command not properly ended"
The problem is in the position of the WHERE clause, which has to come after the list of joined tables:
INSERT INTO T2
SELECT T1.*, T3.DATETIME, T3.STATEID
FROM T1
LEFT OUTER JOIN T3
ON (T1.DOCID = T3.DOCID)
WHERE T3.DATETIME BETWEEN '01-NOV-15' AND SYSDATE;
Your syntax is incorrect; WHERE comes after the join:
INSERT INTO T2
SELECT T1.*, T3.DATETIME, T3.STATEID
FROM T1
LEFT OUTER JOIN T3
ON (T1.DOCID = T3.DOCID)
WHERE T3.DATETIME BETWEEN '01-NOV-15' AND SYSDATE

subquery returning row in HQL

I want to do the following:
INSERT INTO Table0(value1, value2)
SELECT
(SELECT t1.something1 FROM Table1 t1 WHERE t1.id = :t1id),
(SELECT max(t2.something2) FROM Table2 t2 WHERE t2.some = :t2Some)
FROM Table1, Table2
But Hibernate complains that I want to insert an entity (Table1) as a string (value1). It looks like subqueries in HQL return entities instead of column values. Can I force it not to do so?
I know that I can do this:
INSERT INTO Table0(value1, value2)
SELECT t1.something1, max(t2.something2) FROM Table1 t1, Table2 t2 WHERE ...
but it generates bad SQL for Oracle, because there is HIBERNATE_SEQUENCE.NEXTVAL in the SELECT and Oracle does not allow that.
It looks like the first query works now; maybe it was an issue with the Hibernate version.

Distinct in hive

I want to use DISTINCT in my Hive query. The query looks like this:
Insert overwrite table tablename select distinct a.id AS ID , a.id AS SID from a left join b on a.id = b.id;
In the above query I want to insert the same value into two different columns. With DISTINCT the query doesn't work; without it, it works.

I want to update T1 using T3 column and by using three table relationships

UPDATE TABLE1 T1 SET T1.CENTERNAME=
(SELECT AC.CENTERNAME
FROM TABLE2 T2 INNER JOIN TABLET3 AN ON T2.CENTERID = T3.LOCATIONID
INNER JOIN TABLE1 T1 ON T3.LOG_ID = T1.LOGID
WHERE TRUNC(T1.ROW_DATE)='25-MAR-2014');
This gives the error 'ORA-01427: single-row subquery returns more than one row'.
The error message
ORA-01427: single-row subquery returns more than one row
means, er, the sub-query returns more than one row. That is, this part of your statement ...
(SELECT AC.CENTERNAME
FROM TABLE2 T2 INNER JOIN TABLET3 AN ON T2.CENTERID = T3.LOCATIONID
INNER JOIN TABLE1 T1 ON T3.LOG_ID = T1.LOGID
WHERE TRUNC(T1.ROW_DATE)='25-MAR-2014')
returns more than one row. The error occurs because the SET part of the UPDATE assigns with the equality operator (SET T1.CENTERNAME=), so the subquery can supply only one value.
Without more details about your data structure it is hard to be certain, but I suspect what you really want is something like this:
UPDATE TABLE1 T1
SET T1.CENTERNAME= (SELECT T2.CENTERNAME
FROM TABLE2 T2
INNER JOIN TABLE3 T3
ON T2.CENTERID = T3.LOCATIONID
WHERE T3.LOG_ID = T1.LOGID )
WHERE TRUNC(T1.ROW_DATE)='25-MAR-2014'
/
(I've tidied up your redaction to make the aliases consistent.)
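The difference between the two subqueries can be sketched with an in-memory SQLite database. This is only an illustration under assumptions: the rows are invented, dates are stored as plain text instead of using TRUNC, and SQLite is not Oracle (it will not raise ORA-01427 itself). What it does show is that the uncorrelated subquery yields several rows, while the correlated one yields one value per updated row.

```python
import sqlite3

# Hypothetical sample data; SQLite is used only as a stand-in for Oracle.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (logid INTEGER, row_date TEXT, centername TEXT);
    CREATE TABLE table2 (centerid INTEGER, centername TEXT);
    CREATE TABLE table3 (log_id INTEGER, locationid INTEGER);
    INSERT INTO table1 VALUES (10, '2014-03-25', NULL),
                              (11, '2014-03-25', NULL),
                              (12, '2014-03-26', NULL);
    INSERT INTO table2 VALUES (1, 'North'), (2, 'South');
    INSERT INTO table3 VALUES (10, 1), (11, 2), (12, 1);
""")

# The uncorrelated subquery joins TABLE1 again, so it returns one row per
# matching log entry: more than one value for a single assignment.
uncorrelated = conn.execute("""
    SELECT t2.centername
    FROM table2 t2
    JOIN table3 t3 ON t2.centerid = t3.locationid
    JOIN table1 t1 ON t3.log_id = t1.logid
    WHERE t1.row_date = '2014-03-25'
""").fetchall()

# The correlated form ties the subquery to the row being updated, and the
# date filter moves to the outer UPDATE.
conn.execute("""
    UPDATE table1
    SET centername = (SELECT t2.centername
                      FROM table2 t2
                      JOIN table3 t3 ON t2.centerid = t3.locationid
                      WHERE t3.log_id = table1.logid)
    WHERE row_date = '2014-03-25'
""")
rows = conn.execute(
    "SELECT logid, centername FROM table1 ORDER BY logid"
).fetchall()

print(len(uncorrelated))  # 2
print(rows)               # [(10, 'North'), (11, 'South'), (12, None)]
```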
