Hive: Insert the records that are not present - hadoop

I need to insert records into table t1 from another table t2, inserting only the records that are not already in t1.
When I use this query:
insert into table t1 select * from t2 where id not in (select id from t1);
I get this error:
Correlating expression cannot contain unqualified column references.
Can anybody suggest a query to do this?

Use t2.id, i.e. qualify the column in the outer query.
Yet another ridiculous Hive limitation:
insert into table t1 select * from t2 where t2.id not in (select id from t1);

You can also use the command below:
insert into table t1 select t2.* from t2 left join t1 on t2.id=t1.id where t1.id is NULL;
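The left-join form above is an anti-join: it keeps only the t2 rows that have no partner in t1. A minimal sketch of that behaviour follows, assuming SQLite (through Python's sqlite3 module) as a stand-in for Hive; the name column and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (id INTEGER, name TEXT);
    CREATE TABLE t2 (id INTEGER, name TEXT);
    INSERT INTO t1 VALUES (1, 'a'), (2, 'b');
    INSERT INTO t2 VALUES (2, 'b'), (3, 'c'), (4, 'd');
""")

# Anti-join: t2 rows whose id has no match in t1 come back with t1.id = NULL,
# so the WHERE clause keeps exactly the rows that are missing from t1.
conn.execute("""
    INSERT INTO t1
    SELECT t2.* FROM t2
    LEFT JOIN t1 ON t2.id = t1.id
    WHERE t1.id IS NULL
""")

ids_after_insert = [row[0] for row in conn.execute("SELECT id FROM t1 ORDER BY id")]
print(ids_after_insert)
```

Only ids 3 and 4 are added; id 2 is already in t1, so it is not duplicated.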

Related

Oracle join clause where varchar2(4 byte) causing issue

It was tough to choose the wording for the title.
I have 2 tables I want to join together via a lg_code. Both columns are VARCHAR2(4 byte). I am running into an issue where table1 lg_code = 0003 and table2 lg_code = 3. The three 0's are causing an issue with the join and not returning all the data needed. How would I go about writing the join clause to fix this issue?
Code:
select * from table1 t1 JOIN table2 t2 ON t1.LG_CODE = t2.LG_CODE
I would suggest converting the value of the column lg_code to a number first and then making the join:
SELECT * FROM table1 t1
JOIN table2 t2 ON to_number(t1.LG_CODE) = to_number(t2.LG_CODE)
You can also use LTRIM() on them:
SELECT * FROM table1 t1
JOIN table2 t2 ON LTRIM(t1.LG_CODE, '0') = LTRIM(t2.LG_CODE, '0');
Reportedly, newer versions of Oracle SQL*Plus trim automatically.
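To see why the plain join misses rows and the trimmed join does not, here is a small sketch. It assumes SQLite via Python's sqlite3 in place of Oracle (SQLite also has a two-argument ltrim), and the label/descr columns and sample codes are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (lg_code TEXT, label TEXT);
    CREATE TABLE table2 (lg_code TEXT, descr TEXT);
    INSERT INTO table1 VALUES ('0003', 'x'), ('0010', 'y');
    INSERT INTO table2 VALUES ('3', 'three'), ('10', 'ten');
""")

# Plain string comparison: '0003' and '3' are different strings, so no match.
plain_matches = conn.execute("""
    SELECT COUNT(*) FROM table1 t1
    JOIN table2 t2 ON t1.lg_code = t2.lg_code
""").fetchone()[0]

# Strip leading zeros on both sides before comparing.
trimmed_pairs = conn.execute("""
    SELECT t1.lg_code, t2.lg_code FROM table1 t1
    JOIN table2 t2 ON ltrim(t1.lg_code, '0') = ltrim(t2.lg_code, '0')
    ORDER BY t1.lg_code
""").fetchall()

print(plain_matches, trimmed_pairs)
```

One caveat worth checking on Oracle itself: a code made only of zeros trims to an empty string, which Oracle treats as NULL, so such rows would not join with the LTRIM variant.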

Workaround for delete in hive with join conditions

I'm trying to convert a SQL delete query to a Hive one. I'm using Hive 0.12, which doesn't support DELETE.
Below is the SQL query:
Delete from t1 c where exists(select 1 from t2 a where
a.emplid=c.employee_id and a.project_status='test')
I tried using NOT IN for the above query, but for some reasons we cannot use NOT IN in our queries.
Below is the Hive query I have written, but I'm not sure about it, as it's not giving correct results. I'm pretty new to Hive. Can anyone help with this?
INSERT Overwrite table t1
select * from t1 c left outer join t2 a on (c.employee_id=a.employee_id)
where a.project_status= 'test'
and a.employee_id is null
Move project_status='test' condition to the subquery or into the on clause. Also you should select columns only from table c.
Example with filter in the subquery:
insert overwrite table t1
select c.*
from t1 c
left join (select employee_id
from t2
where project_status='test'
) a on (c.employee_id=a.employee_id)
where a.employee_id is null;
Example with additional condition in the ON:
insert overwrite table t1
select c.*
from t1 c
left join t2 a on (c.employee_id=a.employee_id and a.project_status='test')
where a.employee_id is null;
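The difference between filtering in WHERE and filtering in ON can be checked on a tiny data set. This is a sketch only, assuming SQLite via Python's sqlite3 instead of Hive, with made-up sample rows; it runs just the SELECT part, since INSERT OVERWRITE is Hive-specific.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (employee_id INTEGER);
    CREATE TABLE t2 (employee_id INTEGER, project_status TEXT);
    INSERT INTO t1 VALUES (1), (2), (3);
    INSERT INTO t2 VALUES (1, 'test'), (2, 'live');
""")

# Filter in WHERE: a row cannot have project_status = 'test' and a NULL
# employee_id from the same joined t2 row, so nothing survives.
filter_in_where = conn.execute("""
    SELECT c.employee_id FROM t1 c
    LEFT JOIN t2 a ON c.employee_id = a.employee_id
    WHERE a.project_status = 'test' AND a.employee_id IS NULL
""").fetchall()

# Filter in ON: only 'test' rows count as matches, and unmatched t1 rows
# (the ones a DELETE would have left alone) are kept.
filter_in_on = [row[0] for row in conn.execute("""
    SELECT c.employee_id FROM t1 c
    LEFT JOIN t2 a ON c.employee_id = a.employee_id
                  AND a.project_status = 'test'
    WHERE a.employee_id IS NULL
    ORDER BY c.employee_id
""")]

print(filter_in_where, filter_in_on)
```

Employee 1 has a 'test' row and is dropped; employees 2 and 3 remain, which is what the original DELETE would leave in t1.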

How do I insert values in a table using inner join?

I am trying to insert data into table1.col1 using following query.
INSERT INTO table1 t1( t1.col1)
SELECT t2.col1
FROM table2 t2
WHERE t1.col2= t2.col2;
Apparently, it doesn't work (flawed logic, maybe). How can I achieve similar results?
Let me know if I don't make sense.
INSERT INTO table1 (col1)
SELECT t2.col1
FROM table2 t2
INNER JOIN table1 t1 on t1.col2= t2.col2;
INSERT INTO table1 (col1)
SELECT t2.col1
FROM table1 t1,table2 t2
WHERE t1.col2= t2.col2;
It seems you need a MERGE statement with MATCHED (for rows already existing in table1) and NOT MATCHED (for rows not yet inserted into table1) options:
MERGE INTO table1 t1
USING table2 t2
ON (t1.col2 = t2.col2)
WHEN MATCHED THEN
UPDATE SET t1.col1 = t2.col1
WHEN NOT MATCHED THEN
INSERT (col1,col2)
VALUES (t2.col1, t2.col2);
So, I was not looking to insert but to update...stupid question I know :)
This is what I was looking for.
update table1 t1 set t1.col1 = (select t2.col1 from table2 t2 where t1.col2 = t2.col2);
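One thing to watch with a correlated-subquery update like this: rows in table1 without a match in table2 get col1 set to NULL, because the subquery returns no row for them. A hedged sketch of a guarded variant follows, assuming SQLite via Python's sqlite3 rather than Oracle and hypothetical sample values; the EXISTS guard is an addition, not part of the original answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (col1 TEXT, col2 INTEGER);
    CREATE TABLE table2 (col1 TEXT, col2 INTEGER);
    INSERT INTO table1 VALUES ('old-1', 1), ('old-2', 2);
    INSERT INTO table2 VALUES ('new-1', 1);
""")

# The EXISTS guard restricts the update to rows that have a match in table2,
# so the unmatched row (col2 = 2) keeps its old value instead of becoming NULL.
conn.execute("""
    UPDATE table1
    SET col1 = (SELECT t2.col1 FROM table2 t2 WHERE t2.col2 = table1.col2)
    WHERE EXISTS (SELECT 1 FROM table2 t2 WHERE t2.col2 = table1.col2)
""")

rows_after_update = conn.execute(
    "SELECT col1, col2 FROM table1 ORDER BY col2"
).fetchall()
print(rows_after_update)
```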

Spring-data-JPA : Join query best practices

I have two tables T1 and T2.
I have to fetch the records from table T1 for which anotherColumn is null in T2, or for which no matching row exists in T2.
Table T1 Entity relation with T2
@OneToMany(mappedBy="t2")
private List<T2> t2s;
Table T2 Entity relation with T1
@ManyToOne
@JoinColumn(name="pId")
private T1 t1;
In the above scenario, it should return 2nd and 3rd records from Table T1.
@Query("select t1 from T1 t1 where NOT EXISTS (select t2 from T2 t2 where t1.id = t2.pId) OR EXISTS (select t2 from T2 t2 where t1.id = t2.pId OR t2.anotherColumn=null)")
public List<T2> findDisconnected();
Since I'm using inner subqueries, it is taking more time.
Could someone please help me with the following?
1) How can I optimize the above query?
2) What is the best way to use join queries in Spring-data-jpa?
Is this what you are looking for?
select * from T1 t1 full join T2 t2 on t1.id = t2.pId where t2.anotherColumn is NULL
Here you full join the two tables and fetch all the records that have a null value in anotherColumn.
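The idea behind this join can be tried on sample data. The sketch below is an assumption-laden stand-in: it uses SQLite via Python's sqlite3 rather than JPA, hypothetical rows, and a LEFT JOIN in place of the FULL JOIN (a left join is enough when every T2 row points at an existing T1 row). A T1 row with no T2 partner gets NULL in every T2 column, so a single IS NULL test covers both "no row in T2" and "anotherColumn is null".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T1 (id INTEGER PRIMARY KEY);
    CREATE TABLE T2 (id INTEGER PRIMARY KEY, pId INTEGER, anotherColumn TEXT);
    INSERT INTO T1 VALUES (1), (2), (3);
    -- T1 row 1 has a filled-in child, row 2 has a child with NULL, row 3 has none.
    INSERT INTO T2 VALUES (10, 1, 'x'), (11, 2, NULL);
""")

# DISTINCT guards against a parent appearing once per qualifying child row.
disconnected_ids = [row[0] for row in conn.execute("""
    SELECT DISTINCT a.id
    FROM T1 a
    LEFT JOIN T2 b ON a.id = b.pId
    WHERE b.anotherColumn IS NULL
    ORDER BY a.id
""")]
print(disconnected_ids)
```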

Old SQL to the New. table by table joins

In the queries currently running against our Oracle database, I have come across some SQL that does a table-by-table join through nested subqueries. I want to understand this, so could someone explain it? I am a newbie to this.
SELECT *
FROM ra_customer_trx_all
WHERE customer_trx_id IN
(SELECT customer_trx_id
FROM AR_PAYMENT_SCHEDULES_ALL
WHERE payment_schedule_ID IN
(SELECT payment_schedule_ID
FROM AR_RECEIVABLE_APPLICATIONS_ALL
WHERE applied_customer_trx_id =
(SELECT customer_trx_id FROM ra_customer_trx_all WHERE trx_number = '34054')));
1st: select the transaction record from ra_customer_trx_all where trx_number = '34054'
we are looking for its customer_trx_id
select t4.customer_trx_id from ra_customer_trx_all t4 where t4.trx_number = '34054'
2nd: select all records from the receivable applications table that were applied to the ID from step 1
select t3.payment_schedule_ID from AR_RECEIVABLE_APPLICATIONS_ALL t3 where t3.applied_customer_trx_id = (prev select)
3rd: select all records from the payment schedules table that have the payment schedule IDs from step 2
select t2.customer_trx_id from AR_PAYMENT_SCHEDULES_ALL t2 where t2.payment_schedule_ID in (prev select)
4th: select all records from ra_customer_trx_all that have the customer_trx_ids from step 3
select * from ra_customer_trx_all t1 where t1.customer_trx_id in (prev select)
Summary:
If trx means transaction, the logic is:
select all customer transaction records whose payment schedules appear in RECEIVABLE_APPLICATIONS entries applied to transaction number 34054
SELECT t1.*
FROM ra_customer_trx_all t1
inner join AR_PAYMENT_SCHEDULES_ALL t2 on t2.customer_trx_id = t1.customer_trx_id
inner join AR_RECEIVABLE_APPLICATIONS_ALL t3 on t3.payment_schedule_ID = t2.payment_schedule_ID
inner join ra_customer_trx_all t4 on t4.customer_trx_id = t3.applied_customer_trx_id
where t4.trx_number = '34054'
You can replace
select *
from tableA
where columnA in (select columnB
from tableB
where columnB1 in (select ...))
with
select *
from tableA, tableB
where tableA.columnA = tableB.columnB
and tableB.columnB1 in (select ...)
Apply this pattern sequentially to each subquery.
Short explanation: you open the outer brackets after the IN keyword, move the table from the inner FROM clause to the outer one, and add a condition to the WHERE clause: the column before IN has to be equal to the column in the SELECT clause of the subquery.
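The rewrite is equivalent only when the joined table has at most one matching row per key: IN returns each outer row once, while a join returns it once per match. A small sketch of that difference, assuming SQLite via Python's sqlite3 and hypothetical sample rows for the tableA/tableB pattern above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (columnA INTEGER);
    CREATE TABLE tableB (columnB INTEGER, columnB1 INTEGER);
    INSERT INTO tableA VALUES (1), (2);
    -- Two tableB rows share the key 1, so a join matches tableA row 1 twice.
    INSERT INTO tableB VALUES (1, 10), (1, 10), (2, 99);
""")

with_in = conn.execute("""
    SELECT columnA FROM tableA
    WHERE columnA IN (SELECT columnB FROM tableB WHERE columnB1 IN (10))
""").fetchall()

with_join = conn.execute("""
    SELECT tableA.columnA FROM tableA, tableB
    WHERE tableA.columnA = tableB.columnB
      AND tableB.columnB1 IN (10)
""").fetchall()

# DISTINCT restores the IN semantics when duplicates are possible.
with_join_distinct = conn.execute("""
    SELECT DISTINCT tableA.columnA FROM tableA, tableB
    WHERE tableA.columnA = tableB.columnB
      AND tableB.columnB1 IN (10)
""").fetchall()

print(with_in, with_join, with_join_distinct)
```

If the join column is unique in the inner table, the plain join and the IN form return the same rows and DISTINCT is unnecessary.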
