Compare two tables in Hive without applying JOINs - hadoop

I have two tables, TableA and TableB, both with the same set of columns, C1 and C2. I need to check whether the two tables contain the same data. How can this be done without using a JOIN? I tried the MINUS operator, i.e.:
SELECT * FROM TableA
MINUS
SELECT * FROM TableB
But this is not supported in Hive. Maybe Impala has this set operator?
Please suggest how to do this without JOINs. Thanks.

You can try:
SELECT *
FROM T1
WHERE NOT EXISTS (SELECT * FROM T2 WHERE T1.X = T2.Y)
Here T1.X and T2.Y are the "key" columns being compared.
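The NOT EXISTS idea above can be sketched end to end as follows. This is a minimal illustration that runs against an in-memory SQLite database through Python's sqlite3 module, not against Hive; the sample rows and the helper name rows_missing_from are assumptions made for the example. It checks both directions, because one table being a subset of the other does not make them equal. Rows containing NULLs and differing duplicate counts are not handled by this sketch.

```python
import sqlite3

# In-memory SQLite database standing in for Hive (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (C1 INTEGER, C2 TEXT);
    CREATE TABLE TableB (C1 INTEGER, C2 TEXT);
    INSERT INTO TableA VALUES (1, 'x'), (2, 'y'), (3, 'z');
    INSERT INTO TableB VALUES (1, 'x'), (2, 'y'), (4, 'w');
""")

def rows_missing_from(conn, left, right):
    """Return rows of `left` that have no row in `right` equal on C1 and C2."""
    sql = f"""
        SELECT l.C1, l.C2
        FROM {left} l
        WHERE NOT EXISTS (
            SELECT 1 FROM {right} r
            WHERE r.C1 = l.C1 AND r.C2 = l.C2
        )
        ORDER BY l.C1
    """
    return conn.execute(sql).fetchall()

only_in_a = rows_missing_from(conn, "TableA", "TableB")
only_in_b = rows_missing_from(conn, "TableB", "TableA")

# The tables hold the same data only if neither direction finds extra rows.
tables_match = not only_in_a and not only_in_b
print(only_in_a, only_in_b, tables_match)
```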

create table student
(
id integer,
subject string,
total_score integer
);
insert into student
(id, subject, total_score)
values
(1, 'math', 90);
insert into student
(id, subject, total_score)
values
(1, 'science', 100);
insert into student
(id, subject, total_score)
values
(2, 'math', 90);
insert into student
(id, subject, total_score)
values
(2, 'science', 80);
---------- MINUS ---------
select id,subject,
total_score
from ( select max (id) id,
subject,
total_score,
count (*)
from (
select *
from student
where id = 1
union all
select *
from student
where id = 2
) merged_data
group by subject, total_score
having count (*) = 1
) minus_data
where id is not null;
id subject total_score
2 science 80
1 science 100
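The same UNION ALL plus GROUP BY trick can be applied to two separate tables. The sketch below is an assumption-laden illustration: it uses SQLite via Python's sqlite3 rather than Hive, invents sample rows, and adds a hypothetical src marker column so that each differing row reports which table it came from. Grouping also places NULLs in the same group, so nullable columns need no special treatment in this variant.

```python
import sqlite3

# SQLite stands in for Hive here (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (C1 INTEGER, C2 TEXT);
    CREATE TABLE TableB (C1 INTEGER, C2 TEXT);
    INSERT INTO TableA VALUES (1, 'math'), (2, 'science');
    INSERT INTO TableB VALUES (1, 'math'), (2, 'art');
""")

# Tag every row with its source table, stack both tables, and keep only
# the (C1, C2) combinations that occur in exactly one of the two sources.
diff = conn.execute("""
    SELECT C1, C2, MIN(src) AS only_in
    FROM (
        SELECT C1, C2, 'A' AS src FROM TableA
        UNION ALL
        SELECT C1, C2, 'B' AS src FROM TableB
    ) merged
    GROUP BY C1, C2
    HAVING COUNT(DISTINCT src) = 1
    ORDER BY C1, C2
""").fetchall()

print(diff)  # an empty list would mean both tables hold the same distinct rows
```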

Related

ORDER BY BASED ON COLUMN

I have two tables, PRODUCTS and LOOKUP. I want to order the rows of the PRODUCTS table by the KEY column, following the order of the values in the F_KEY column of the LOOKUP table.
CREATE TABLE PRODUCTS
(
ID INT,
KEY VARCHAR(50)
)
INSERT INTO PRODUCTS
VALUES (1, 'EGHS'), (2, 'PFE'), (3, 'EGHS'),
(4, 'PFE'), (5, 'ABC')
CREATE TABLE LOOKUP (F_KEY VARCHAR(50))
INSERT INTO LOOKUP VALUES('PFE,EGHS,ABC')
Now I want to order the records in the PRODUCTS table by KEY, following the order (PFE, EGHS, ABC) stored in the LOOKUP table.
Example output:
PRODUCTS
ID F_KEY
-----------
2 PFE
4 PFE
1 EGHS
3 EGHS
5 ABC
I use this query, but it is not working
SELECT *
FROM PRODUCTS
ORDER BY (SELECT F_KEY FROM LOOKUP)
You can split the string using XML. First convert the string to XML by replacing each comma with closing and opening XML tags.
Once that is done, you can assign an incrementing number using ROW_NUMBER(), as follows.
;WITH cte
AS (SELECT dt,
Row_number()
OVER(
ORDER BY (SELECT 1)) RN
FROM (SELECT Cast('<X>' + Replace(F.f_key, ',', '</X><X>')
+ '</X>' AS XML) AS xmlfilter
FROM [lookup] F)F1
CROSS apply (SELECT fdata.d.value('.', 'varchar(500)') AS DT
FROM f1.xmlfilter.nodes('X') AS fdata(d)) O)
SELECT P.*
FROM products P
LEFT JOIN cte C
ON C.dt = P.[key]
ORDER BY C.rn
Output:
ID F_KEY
-----------
2 PFE
4 PFE
1 EGHS
3 EGHS
5 ABC
You may do it like this:
SELECT ID, [KEY] FROM PRODUCTS
ORDER BY
CASE [KEY]
WHEN 'PFE' THEN 1
WHEN 'EGHS' THEN 2
WHEN 'ABC' THEN 3
END
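When the order should come from the LOOKUP row rather than from a hard-coded CASE, one option is to sort by the position of each key inside the comma-separated string. The sketch below is only an illustration under stated assumptions: it runs on SQLite through Python's sqlite3 (the question targets SQL Server), and it relies on SQLite's instr function. Wrapping both strings in commas avoids partial matches, and keys missing from the list get position 0 and therefore sort first.

```python
import sqlite3

# SQLite stands in for SQL Server here (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (id INTEGER, "key" TEXT);
    INSERT INTO products VALUES
        (1, 'EGHS'), (2, 'PFE'), (3, 'EGHS'), (4, 'PFE'), (5, 'ABC');
    CREATE TABLE lookup (f_key TEXT);
    INSERT INTO lookup VALUES ('PFE,EGHS,ABC');
""")

# Sort each product by where ',KEY,' first appears inside ',PFE,EGHS,ABC,'.
ordered = conn.execute("""
    SELECT p.id, p."key"
    FROM products p
    CROSS JOIN lookup l
    ORDER BY instr(',' || l.f_key || ',', ',' || p."key" || ','), p.id
""").fetchall()

for row in ordered:
    print(row)
```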

Insert data into one table from another table avoiding duplicates

I've got a table as follows
Table1
ID Name Tag
-----------------
1 N1 2.1
2 N2 3.5
3 N1 3.5
4 N3 8.1
I create a new table Table2 with ID and Name (unique constraint) and I want to insert Table1's contents into Table2 avoiding duplicates, in the sense that I want only 1, 2 and 4 from Table1 in Table2.
I've tried this, but it doesn't work: I get a unique constraint error (Oracle SQL).
INSERT INTO TABLE2 (ID, NAME)
SELECT ID, NAME
FROM TABLE1
WHERE NAME NOT IN (SELECT NAME FROM TABLE2);
Please can someone point me in the right direction?
Sorry for not making myself clear. Table2 is a brand new table. I want the first values inserted, the following duplicates should be ignored. So in my case, N1, N2 get inserted, N1 is dupe so it is ignored, N3 is inserted
OK - from your description, I understand table t2 is currently empty, and you want to copy the rows where id is in (1, 2, 4) from table t1 to table t2.
Why your code fails:
You seem to believe that the condition is applied to the first row in t1, it passes so it is inserted into t2, then the condition is applied to the second row in t1 (using what is already inserted in t2), etc. - and you don't understand why there is any attempt to insert ALL the rows from t1 into t2. Why doesn't the third row fail the WHERE clause?
Good question! The reason is that operations are done on a SET basis. The WHERE condition uses table t2 AS IT WAS before the INSERT operation began. So for ALL rows, the WHERE clause compares to an empty table t2.
How to fix this... Decide which id you want to add when there are duplicate names. For example, one way to get the result you said you wanted is to select MIN(id) for each name. Moreover, you still want to check if the name exists in t2 already (since you may do this again in the future, when t2 is already partially populated).
insert into t2 ( id, name )
select min(id), name
from t1
where name not in (select name from t2)
group by name
;
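The set-based behaviour described above can be reproduced in a small experiment. This is a sketch that assumes SQLite (through Python's sqlite3) as a stand-in for Oracle, with the table and column names taken from the answer; the flag name naive_failed is made up for the example. The naive statement fails because both 'N1' rows pass the NOT IN test against the still-empty t2, while the MIN(id) ... GROUP BY version inserts one row per name.

```python
import sqlite3

# SQLite stands in for Oracle here (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (id INTEGER, name TEXT, tag REAL);
    INSERT INTO t1 VALUES
        (1, 'N1', 2.1), (2, 'N2', 3.5), (3, 'N1', 3.5), (4, 'N3', 8.1);
    CREATE TABLE t2 (id INTEGER, name TEXT UNIQUE);
""")

# Naive version: the subquery sees t2 as it was before the statement began
# (empty), so both 'N1' rows qualify and the unique constraint is violated.
try:
    conn.execute(
        "INSERT INTO t2 (id, name) "
        "SELECT id, name FROM t1 WHERE name NOT IN (SELECT name FROM t2)"
    )
    naive_failed = False
except sqlite3.IntegrityError:
    naive_failed = True

# Fixed version: collapse duplicates first by picking MIN(id) per name.
conn.execute(
    "INSERT INTO t2 (id, name) "
    "SELECT MIN(id), name FROM t1 "
    "WHERE name NOT IN (SELECT name FROM t2) "
    "GROUP BY name"
)
rows = conn.execute("SELECT id, name FROM t2 ORDER BY id").fetchall()
print(naive_failed, rows)
```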
You can also try this:
Insert into tb2(Field1, Field2)
SELECT Field1, Field2
FROM tb1
WHERE NOT EXISTS (SELECT 1 FROM tb2 WHERE tb2.Field1 = tb1.Field1) ;
This is how I understood the question:
SQL> create table table2
2 (id number,
3 name varchar2(2),
4 tag number,
5 constraint pk_t2 primary key (id, name)
6 );
Table created.
SQL>
SQL> insert into table2 (id, name, tag)
2 with test (id, name, tag) as
3 (select 1, 'N1', 2.1 from dual union
4 select 2, 'N2', 3.5 from dual union
5 select 3, 'N1', 3.5 from dual union
6 select 4, 'N3', 8.1 from dual
7 )
8 select min(id), name, max(tag)
9 from test
10 group by name;
3 rows created.
SQL>
SQL> select * from table2 order by id;
ID NA TAG
---------- -- ----------
1 N1 3,5
2 N2 3,5
4 N3 8,1
SQL>
When a combination of two or more columns must be unique, create a unique index on them (this example uses MySQL syntax).
Run this query:
ALTER TABLE TABLE2 ADD UNIQUE unique_index( id, name);
and then
INSERT INTO TABLE2 (id,name,tag) VALUES(1, "N1", 3.5 )
ON DUPLICATE KEY UPDATE tag=3.5
This also updates the tag when the row already exists.
Check whether the id and name from Table1 already exist in Table2, and insert only if they do not.
If the unique constraint on TABLE2 is a composite key then run this:
INSERT INTO TABLE2 (ID, NAME)
SELECT A.ID, A.NAME
FROM TABLE1 A
WHERE NOT EXISTS (SELECT NULL FROM TABLE2 B WHERE A.ID=B.ID AND A.NAME=B.NAME);
If there are two unique constraints; one on the id, and the other on the name then run this instead:
INSERT INTO TABLE2 (ID, NAME)
SELECT A.ID, A.NAME
FROM TABLE1 A
WHERE NOT EXISTS (SELECT NULL FROM TABLE2 B WHERE A.ID=B.ID OR A.NAME=B.NAME);
For Oracle, in case you need to get values from two different tables.
In the example below, the index is incremented.
INSERT INTO TABLE1
(INDEX, REMARKS, NAME, AGE)
(SELECT (SELECT COALESCE(MAX(INDEX),0) FROM TABLE1)+1,
'any remarks',
t2.NAME, t2.age from TABLE2 t2 where t2.name = 'apple')
Explanation
Match the numbers below: (1)-(1), (2)-(2) ...
INSERT INTO TABLE1
(INDEX, -- index increment (1)
REMARKS, -- hard-coded (2)
NAME, -- from table2 (3)
AGE) -- from table2 (4)
(SELECT -- this part gets the values from another table
(SELECT COALESCE(MAX(INDEX),0) FROM TABLE1)+1, -- increment (1)
'any remarks', -- hard-coded value (2)
t2.NAME, -- from table2 (3)
t2.age -- from table2 (4)
from TABLE2 t2 where t2.name = 'apple') -- condition for table2

Concat Results of 2 Select Queries into 1 Column (oracle)

I'm trying to insert a record into my table, but for one column I want the concatenated results of two SELECT statements: each statement fetches its value, and the two are concatenated into a single value to be inserted into that column.
insert into ABC (Name,City,Age)
Values ('John',(
(Select City from TableA where ID=1)concat(Select City from TableA where ID=2)),'22')
The values could also be comma-separated, but I don't know what to use here.
Try this one:
INSERT INTO ABC (Name, City, Age)
VALUES ('John',
(
(SELECT City FROM TableA WHERE ID = 1) ||
(SELECT City FROM TableA WHERE ID = 2)
),
'22');
But make sure that ... WHERE ID = 1 and ... WHERE ID = 2 each return exactly one row.
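A runnable sketch of the || approach follows. It assumes SQLite via Python's sqlite3 instead of Oracle, invented city values, and a comma separator; declaring ID as the primary key is one way to guarantee that each scalar subquery returns at most one row. Note that engines differ when a scalar subquery yields several rows: Oracle raises a single-row subquery error, whereas SQLite silently uses the first row.

```python
import sqlite3

# SQLite stands in for Oracle here (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (ID INTEGER PRIMARY KEY, City TEXT);
    INSERT INTO TableA VALUES (1, 'Paris'), (2, 'Rome');
    CREATE TABLE ABC (Name TEXT, City TEXT, Age INTEGER);
""")

# Each scalar subquery yields one City; || joins them with a comma between.
conn.execute("""
    INSERT INTO ABC (Name, City, Age)
    VALUES ('John',
            (SELECT City FROM TableA WHERE ID = 1) || ',' ||
            (SELECT City FROM TableA WHERE ID = 2),
            22)
""")

row = conn.execute("SELECT Name, City, Age FROM ABC").fetchone()
print(row)
```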
Using a cross join to select from the two tables produces a nice clear statement:
insert into ABC (Name,City,Age)
select 'John', concat(t1.city, t2.city), 22
from TableA t1
cross join TableA t2
where t1.ID = 1
and t2.ID = 2
/
Use the CONCAT() or CONCAT_WS() function for this:
insert into ABC (Name,City,Age) Values (
'John',
CONCAT_WS(' ', (Select City from TableA where ID=1), (Select City from TableA where ID=2)),
'22'
)

SQL delete rows not in another table

I'm looking for a good SQL approach (Oracle database) to fulfill the next requirements:
Delete rows from Table A that are not present in Table B.
Both tables have identical structure
Some fields are nullable
The number of columns and rows is huge (more than 100k rows and 20-30 columns to compare)
Every single field of every single row needs to be compared from Table A against Table B.
This requirement comes from a process that must run every day, as changes will come from Table B.
In other words: Table A minus Table B => delete those records from Table A
delete from Table A
where (field1, field2, field3) in
(select field1, field2, field3
from Table A
minus
select field1, field2, field3
from Table B);
It's very important to mention that a normal MINUS within the DELETE fails, as it does not take the NULLs in nullable fields into consideration (the result is unknown to Oracle, so there is no match).
I also tried EXISTS with success, but I had to use the NVL function to replace the NULLs with dummy values, which I don't want because I cannot guarantee that the replacement value will never appear as a valid value in the field.
Does anybody know a way to accomplish this? Please keep in mind that performance and nullable fields are a must.
Thanks
decode finds sameness (even if both values are null):
decode( field1, field2, 1, 0 ) = 1
To delete rows in table1 not found in table2:
delete table1 t
where t.rowid in (select t1.rowid
from table1 t1
left outer join table2 t2
on decode(t1.field1, t2.field1, 1, 0) = 1
and decode(t1.field2, t2.field2, 1, 0) = 1
and decode(t1.field3, t2.field3, 1, 0) = 1
/* ... */
where t2.rowid is null /* no matching row found */
)
To use existing indexes:
...
left outer join table2 t2
on (t1.index_field1=t2.index_field1 or
t1.index_field1 is null and t2.index_field1 is null)
and ...
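The NULL-aware comparison can be tried out on a small data set. The sketch below is an illustration under assumptions, not the Oracle statement itself: it runs on SQLite through Python's sqlite3, uses SQLite's IS operator as a stand-in for the decode(x, y, 1, 0) = 1 test (both treat two NULLs as a match), and swaps the outer join for NOT EXISTS. The sample rows are invented.

```python
import sqlite3

# SQLite stands in for Oracle here (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (field1 INTEGER, field2 TEXT, field3 TEXT);
    CREATE TABLE table_b (field1 INTEGER, field2 TEXT, field3 TEXT);
    INSERT INTO table_a VALUES (1, 'bob', NULL), (2, NULL, 'x'), (3, 'joe', 'y');
    INSERT INTO table_b VALUES (1, 'bob', NULL), (2, 'sue', 'x'), (3, 'joe', 'y');
""")

# Delete every row of table_a that has no field-by-field match in table_b.
# `IS` is true when both sides are NULL, unlike `=`.
conn.execute("""
    DELETE FROM table_a
    WHERE NOT EXISTS (
        SELECT 1 FROM table_b b
        WHERE b.field1 IS table_a.field1
          AND b.field2 IS table_a.field2
          AND b.field3 IS table_a.field3
    )
""")

remaining = conn.execute("SELECT * FROM table_a ORDER BY field1").fetchall()
print(remaining)
```

Row 1 survives even though field3 is NULL on both sides, which is exactly the case that a plain equality comparison would miss.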
Use a left outer join and test for null in your where clause
delete a
from a
left outer join b on a.x = b.x
where b.x is null
Have you considered the Oracle SQL MERGE statement?
Use a bulk operation for a huge number of records; performance-wise it will be faster.
Use a join between the two tables to get the rows to be deleted. Nullable columns can be compared using some default value.
Also, if you want Table A to be the same as Table B, why not truncate Table A and then insert the data from Table B?
Assuming you have the same PK field available on each table... (Having a PK or some other unique key is critical for this.)
create table table_a (id number, name varchar2(25), dob date);
insert into table_a values (1, 'bob', to_date('01-01-1978','MM-DD-YYYY'));
insert into table_a values (2, 'steve', null);
insert into table_a values (3, 'joe', to_date('05-22-1989','MM-DD-YYYY'));
insert into table_a values (4, null, null);
insert into table_a values (5, 'susan', to_date('08-08-2005','MM-DD-YYYY'));
insert into table_a values (6, 'juan', to_date('11-17-2001', 'MM-DD-YYYY'));
create table table_b (id number, name varchar2(25), dob date);
insert into table_b values (1, 'bob', to_date('01-01-1978','MM-DD-YYYY'));
insert into table_b values (2, 'steve',to_date('10-14-1992','MM-DD-YYYY'));
insert into table_b values (3, null, to_date('05-22-1989','MM-DD-YYYY'));
insert into table_b values (4, 'mary', to_date('12-08-2012','MM-DD-YYYY'));
insert into table_b values (5, null, null);
commit;
-- confirm minus is working
select id, name, dob
from table_a
minus
select id, name, dob
from table_b;
-- from the minus, re-query to just get the key, then delete by key
delete table_a where id in (
select id from (
select id, name, dob
from table_a
minus
select id, name, dob
from table_b)
);
commit;
select * from table_a;
But if, at some point in time, tableA is to be reset to match tableB, why not, as another answer suggested, truncate tableA and insert everything from tableB?
100K is not huge. I can do ~100K truncate and insert on my laptop instance in less than 1 second.
DELETE FROM purchase WHERE clientcode NOT IN (
SELECT clientcode FROM client );
This deletes the rows from the purchase table whose clientcode values are not in the client table. The clientcode column of the purchase table references the clientcode column of the client table.
DELETE FROM TABLE1 WHERE FIELD1 NOT IN (SELECT CLIENT1 FROM TABLE2);

How to bind horizontal values of a table to a vertical values of another table in oracle database

I have two tables.
The columns whose names start with "attribute" change meaning based on the department. The descriptions of the attributes are in the second table.
My requirement is to get the value of each attribute with its primary key, based on the department.
Honestly, I am stuck on this problem in my program. I have no permission to change the tables, and there is no common unique key column. I would appreciate it if anyone could provide a suggestion.
with a as (
select a.*, row_number() over (partition by department order by attributeID) rn
from attributes a),
e as (
select employeeId, department, attribute1, 1 rn from employees union all
select employeeId, department, attribute2, 2 rn from employees union all
select employeeId, department, attribute3, 3 rn from employees
)
select e.employeeId, a.attributeid, e.department, a.attribute, a.meaning,
e.attribute1 as value
from e join a on a.department=e.department and a.rn=e.rn
order by e.employeeId, a.attributeid
Test data and output:
create table employees (employeeID number(3), name varchar2(10), department varchar2(5), age number(3), attribute1 varchar2(10), attribute2 varchar2(10), attribute3 varchar2(10));
insert into employees values (1, 'john', 'IT', 22, 'attr1val1', 'attr2val2', null);
insert into employees values (2, 'jane', 'HR', 32, 'attr1val3', 'attr2val4', 'attr3val5');
insert into employees values (3, 'joe', 'HR', 23, 'attr1val6', 'attr2val7', 'attr3val8');
insert into employees values (4, 'jack', 'IT', 45, 'attr1val9', 'attr2val10', null);
create table attributes (attributeID number(3), department varchar2(10), attribute varchar2(10), meaning varchar2(10));
insert into attributes values (1, 'IT', 'attribute1', 'laptoptype');
insert into attributes values (2, 'IT', 'attribute2', 'networkloc');
insert into attributes values (3, 'HR', 'attribute1', 'location');
insert into attributes values (4, 'HR', 'attribute2', 'position');
insert into attributes values (5, 'HR', 'attribute3', 'allocation');
EMPLOYEEID ATTRIBUTEID DEPARTMENT ATTRIBUTE MEANING VALUE
---------- ----------- ---------- ---------- ---------- ----------
1 1 IT attribute1 laptoptype attr1val1
1 2 IT attribute2 networkloc attr2val2
2 3 HR attribute1 location attr1val3
2 4 HR attribute2 position attr2val4
2 5 HR attribute3 allocation attr3val5
3 3 HR attribute1 location attr1val6
3 4 HR attribute2 position attr2val7
3 5 HR attribute3 allocation attr3val8
4 1 IT attribute1 laptoptype attr1val9
4 2 IT attribute2 networkloc attr2val10
Edit: Explanation
In the answer I used the with clause just to divide the solution into readable steps. You can move the subqueries into the from clause of the main query if that is more comfortable for you. Anyway: subquery a reads data from table attributes and numbers the rows, so for each department they are always numbered from 1. I used row_number() for that. Subquery e unions (all) the required attributes and numbers them accordingly. The numbers generated in both subqueries are then used in the main join: a.department=e.department and a.rn=e.rn.
Alternative 1 - if you are using Oracle 11g you could use unpivot. See what is generated by the subquery, and how it is joined with the attributes table:
with e as (
select employeeId, name, department, attribute, value from employees
unpivot (value for attribute in ("ATTRIBUTE1", "ATTRIBUTE2", "ATTRIBUTE3"))
)
select e.employeeId, a.attributeid, e.department, a.attribute,
a.meaning, e.value
from e join attributes a on a.department=e.department
and lower(a.attribute)=lower(e.attribute)
order by e.employeeId, a.attributeid;
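For engines without UNPIVOT, the same reshaping can be approximated with UNION ALL, joining on the attribute name as in Alternative 1. The sketch below is an assumption-based illustration: it uses SQLite via Python's sqlite3 rather than Oracle, a cut-down version of the test tables, and a hand-written subquery e that emits one row per employee and attribute column.

```python
import sqlite3

# SQLite stands in for Oracle here (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (employeeID INTEGER, department TEXT,
                            attribute1 TEXT, attribute2 TEXT, attribute3 TEXT);
    INSERT INTO employees VALUES
        (1, 'IT', 'attr1val1', 'attr2val2', NULL),
        (2, 'HR', 'attr1val3', 'attr2val4', 'attr3val5');
    CREATE TABLE attributes (attributeID INTEGER, department TEXT,
                             attribute TEXT, meaning TEXT);
    INSERT INTO attributes VALUES
        (1, 'IT', 'attribute1', 'laptoptype'),
        (2, 'IT', 'attribute2', 'networkloc'),
        (3, 'HR', 'attribute1', 'location'),
        (4, 'HR', 'attribute2', 'position'),
        (5, 'HR', 'attribute3', 'allocation');
""")

# Subquery e turns the three attribute columns into (attribute, value) rows;
# the join then attaches the department-specific meaning to each of them.
rows = conn.execute("""
    WITH e AS (
        SELECT employeeID, department,
               'attribute1' AS attribute, attribute1 AS value FROM employees
        UNION ALL
        SELECT employeeID, department, 'attribute2', attribute2 FROM employees
        UNION ALL
        SELECT employeeID, department, 'attribute3', attribute3 FROM employees
    )
    SELECT e.employeeID, a.attributeID, a.meaning, e.value
    FROM e
    JOIN attributes a
      ON a.department = e.department AND a.attribute = e.attribute
    ORDER BY e.employeeID, a.attributeID
""").fetchall()

for row in rows:
    print(row)
```

Because IT defines no attribute3, the join drops that combination, so employee 1 yields two rows and employee 2 yields three.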
Alternative 2 - with a hierarchical row generator (subquery r), implemented with connect by, which simply creates the numbers 1, 2, 3. These are then joined with employees, and the proper attribute is attached as value in the case expression. The rest is done in a similar way to the original answer.
with a as (
select a.*, row_number() over (partition by department order by attributeID) rn
from attributes a),
r as (select level rn from dual connect by level<=3),
e as (
select employeeId, department, rn,
case when r.rn = 1 then attribute1
when r.rn = 2 then attribute2
when r.rn = 3 then attribute3
end value
from employees cross join r
)
select e.employeeId, a.attributeid, e.department, a.attribute,
a.meaning, e.value
from e join a on a.department=e.department and a.rn=e.rn
order by e.employeeId, a.attributeid
All three versions gave me the same output. I also tested the first option on a similar table with 100k rows and got the output in a few seconds (for 5 attributes). Please test all solutions and try to understand them. If you can use the unpivot version, I would prefer it.
Sorry for the delayed explanation and any language mistakes.
The WITH clause was added in Oracle 9.2 and should do the trick. For the other attributes, just add more subqueries where the filter is att.attribute = 'attribute2' or 'attribute3'...
WITH e AS
(SELECT emp.employee_ID, emp.department, emp.attribute1
FROM employee emp),
a AS (SELECT att.attribute_id, att.department, att.attribute, att.meaning
FROM attribute_TYPE att
WHERE att.attribute = 'attribute1')
SELECT e.employee_ID, a.attribute_id, e.department, a.attribute,
a.meaning, e.attribute1
FROM e JOIN a ON e.department = a.department
