Using multiple select statement inside insert statement in Hive - shell

I'm new in Hive. I have three tables like this:
table1:
id;value
1;val1
2;val2
3;val3
table2
num;desc;refVal
1;desc;0
2;descd;0
3;desc;0
I want to create a new table3 that contains:
num;desc;refVal
1;desc;3
2;descd;3
3;desc;3
Where num and desc are columns from table2 and refVal is the max value of column id in table1
Can someone guide me to solve this?

First, you have to create an table to hold this.
CREATE TABLE my_new_table;
After that, you have to insert into this table, as showed here
INSERT INTO TABLE my_new_table
[PARTITION (partcol1=val1, partcol2=val2 ...)]
select_statement1;
In the select_statement1 you can use the same select you would normally use to join and select the columns you need.
For more informations, you can check here

Related

How to copy all constrains and data form one schema to another in oracle

I am using Toad for oracle 12c. I need to copy a table and data (40M) from one shcema to another (prod to test). However there is an unique key(not the PK for this table) called record_Id col which has something data like this 3.000*******19E15. About 2M rows has same numbers(I believe its because very large number) which are unique in prod. When I try to copy it violets the unique key of that col. I am using toad "export data to another schema" function to copy the data.
when I execute query in prod
select count(*) from table_name
OR
select count(distinct(record_id) from table_name
Both query gives the exact same numbers of data.
I don't have DBA permission. How do I copy all data without violating unique key of the table.
Thanks in advance!
You can use UPSERT for decisional INSERT or UPDATE or you may write small procedure for this.
you may consider to use NOT EXISTS, but your data is big and it might not be resource efficient.
insert into prod_tab
select * from other_tab t1 where NOT exists (
select 1 from prod_tab t2 where t1.id = t2.id
);
In Oracle you can use a MERGE query for that.
The following query proceeds as follows for each data row :
if the source record_id does not yet exist in the target table, a new record is inserted
else, the existing record is updated with source values
For the sake of the example, I assumed that there are two other columns in the table : column1 and column2.
MERGE INTO target_table t1
USING (SELECT * from source_table t2)
ON (t1.record_id = t2.record_id)
WHEN MATCHED THEN UPDATE SET
t1.column1 = t2.column1,
t1.column2 = t2.column2
WHEN NOT MATCHED THEN INSERT
(record_id, column1, column2) VALUES (t2.record_id, t2.column1, t2.column2)

Is there a way to prevent insertion of duplicate rows in Hive?

I have an ORC Table. I populate it using the data from some other table as follows:
INSERT INTO TABLE orc_table_name SELECT * FROM other_table_name
Is there any way I can prevent inserting of duplicate entries into the ORC Table?
you can use not in command See a general code below: it inserts records to the orc_table_name based on the fact that value1 from TABLE_1 was not inserted before.
INSERT INTO orc_table_name
(Value1, Value2)
SELECT t1.Value1,
t1.Value2
FROM TABLE_1 t1
WHERE t1.Value1 NOT IN (SELECT Value1 FROM orc_table_name)
INSERT INTO orc_table_name(field1,field2....fieldn)
select field1,field2... field(n-1),MIN(fieldn) as fieldn
from other_table_name
Group By field1,field2...field(n-1)

how to use one sql insert data to two table?

I have two table,and they are connected by one field : B_ID of table A & id of table B.
I want to use sql to insert data to this two table.
how to write the insert sql ?
1,id in table B is auto-increment.
2,in a stupid way,I can insert data to table B first,and then select the id from table B,then add the id to table A as message_id.
You cannot insert data to multiple tables in one SQL statement. Just insert data first to B table and then table A. You could use RETURNING statement to get ID value and get rid of additional select statement between inserts.
See: https://oracle-base.com/articles/misc/dml-returning-into-clause
Have you heard about AFTER INSERT trigger? I think it is what you are looking for.
Something like this might do what you want:
CREATE OR REPLACE TRIGGER TableB_after_insert
AFTER INSERT
ON TableB
FOR EACH ROW
DECLARE
v_id int;
BEGIN
/*
* 1. Select your id from TableB
* 2. Insert data to TableA
*/
END;
/

sql statement to select and insert one value from one table rest from outside the tables

I have two db2 tables, table one contains rows of constant data with id as one of the columns. second table contains id column which is foreign key to id column in first table and has 3 more varchar columns.
I am trying to insert rows into second table, where entry into id col is based on a where clause and the remaining columns get values from outside.
table1 has columns id, t1c1, t1c2, t1c3
table2 has columns id, t2c1, t2c2, t2c3
to insert into table2, I am trying this query:
insert into table2 values
(select id from table1 where t1c2 like 'xxx', 'abc1','abc2');
I know it is something basic I am missing here.
Please help correcting this query.

Hive multiple insert goes wrong with the DISTINCT select statement

I read this code from "Hadoop the Definitive Guide":
SELECT a.ad_id, a.campaign_id, a.account_id, b.user_id
FROM dim_ads a JOIN impression_logs b ON (b.ad_id = a.ad_id)
WHERE b.dateid = '2008-12-01') x
INSERT OVERWRITE DIRECTORY 'results_gby_adid'
SELECT x.ad_id, count(1), count(DISTINCT x.user_id) GROUP BY x.ad_id
INSERT OVERWRITE DIRECTORY 'results_gby_campaignid'
SELECT x.campaign_id, count(1), count(DISTINCT x.user_id) GROUP BY x.campaign_id
INSERT OVERWRITE DIRECTORY 'results_gby_accountid'
SELECT x.account_id, count(1), count(DISTINCT x.user_id) GROUP BY x.account_id;
but as my test, using several DISTINCT cannot get right results.
my hiveql as below:
CREATE TABLE IF NOT EXISTS a (logindate int, id int);
then
load local file to this table...
CREATE TABLE IF NOT EXISTS user (id INT) PARTITIONED BY (logindate INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE;
then
if inserting table separately:
INSERT OVERWRITE TABLE user PARTITION(logindate=20130120) SELECT DISTINCT(id) FROM a WHERE logindate=20130120;
INSERT OVERWRITE TABLE user PARTITION(logindate=20130121) SELECT DISTINCT(id) FROM a WHERE logindate=20130121;
the results are correct;
but if choosing the next multiple insert hql:
FROM a
INSERT OVERWRITE TABLE user PARTITION(logindate=20130120) SELECT DISTINCT(id) WHERE logindate=20130120
INSERT OVERWRITE TABLE user PARTITION(logindate=20130121) SELECT DISTINCT(id) WHERE logindate=20130121;
the results are not correct, both partitions have the same number of records, seems like select from DISTINCT(id) WHERE logindate=20130120 OR logindate=20130121
so is it a bug or did I write some wrong syntax?
DISTINCT has a bit of an odd history in the code as an alias to group by.
If there is a bug, then the version of hive you are using would be important to know since bugs are addressed in each release.
This might work:
FROM a
INSERT OVERWRITE TABLE user PARTITION(logindate=20130120) SELECT id WHERE logindate=20130120 GROUP BY id
INSERT OVERWRITE TABLE user PARTITION(logindate=20130121) SELECT id WHERE logindate=20130121 GROUP BY id;
if that doesn't work, this will definitely work...even though it isn't the approach you were attempting to use...
FROM (select distinct id, logindate from a where logindate in ('20130120','20130121')) subq_a
INSERT OVERWRITE TABLE user PARTITION(logindate=20130120) SELECT id WHERE logindate=20130120
INSERT OVERWRITE TABLE user PARTITION(logindate=20130120) SELECT id WHERE logindate=20130121;

Resources