How to copy all constraints and data from one schema to another in Oracle - oracle

I am using Toad for Oracle 12c. I need to copy a table and its data (40M) from one schema to another (prod to test). However, the table has a unique key (not the PK) on the record_Id column, which holds values like 3.000*******19E15. About 2M rows appear to have the same number (I believe because the values are very large), yet they are unique in prod. When I try to copy the data, it violates the unique key on that column. I am using Toad's "export data to another schema" function to copy the data.
When I execute either of these queries in prod
select count(*) from table_name
OR
select count(distinct(record_id)) from table_name
both queries return exactly the same count.
I don't have DBA permission. How do I copy all the data without violating the table's unique key?
Thanks in advance!

You can use an upsert (a conditional INSERT or UPDATE), or you may write a small procedure for this.
You may also consider using NOT EXISTS, but your data set is large and this might not be resource-efficient.
insert into prod_tab
select * from other_tab t1 where NOT exists (
select 1 from prod_tab t2 where t1.id = t2.id
);

In Oracle you can use a MERGE query for that.
The following query proceeds as follows for each data row:
if the source record_id does not yet exist in the target table, a new record is inserted;
otherwise, the existing record is updated with the source values.
For the sake of the example, I assumed that there are two other columns in the table: column1 and column2.
MERGE INTO target_table t1
USING (SELECT * FROM source_table) t2
ON (t1.record_id = t2.record_id)
WHEN MATCHED THEN UPDATE SET
t1.column1 = t2.column1,
t1.column2 = t2.column2
WHEN NOT MATCHED THEN INSERT
(record_id, column1, column2) VALUES (t2.record_id, t2.column1, t2.column2);
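Before merging, it may be worth checking whether the source values really collide. This is only a sketch, and it assumes record_id is an Oracle NUMBER column and that the 3.000*******19E15 form is just how the client displays it; if the export writes the values in that displayed form, distinct keys could collapse into one. The TM9 format model asks TO_CHAR for the full stored digits:

```sql
-- Assumption: record_id is a NUMBER column; table_name is the placeholder from the question.
-- TM9 returns the digits in fixed notation (for results of up to 64 characters).
SELECT record_id,
       TO_CHAR(record_id, 'TM9') AS full_digits
FROM   table_name
WHERE  ROWNUM <= 10;
```

If full_digits differs between rows that look identical in the grid, the collision is probably introduced during the export rather than present in the data.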

Related

Using multiple select statement inside insert statement in Hive

I'm new to Hive. I have three tables like this:
table1:
id;value
1;val1
2;val2
3;val3
table2
num;desc;refVal
1;desc;0
2;descd;0
3;desc;0
I want to create a new table3 that contains:
num;desc;refVal
1;desc;3
2;descd;3
3;desc;3
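Where num and desc are columns from table2 and refVal is the max value of column id in table1.
Can someone guide me to solve this?

First, you have to create a table to hold the result.
-- Column types are assumed; adjust them to your data. desc is a reserved word, hence the backticks.
CREATE TABLE my_new_table (num INT, `desc` STRING, refVal INT);
After that, you have to insert into this table, as shown here: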
INSERT INTO TABLE my_new_table
[PARTITION (partcol1=val1, partcol2=val2 ...)]
select_statement1;
In select_statement1 you can use the same SELECT you would normally use to join the tables and pick the columns you need.
For more information, you can check here
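For the tables in the question, the select statement could look like the following sketch. It assumes table3 already exists with the columns num, desc and refVal; the aliases t2 and m and the column name max_id are made up for illustration. Because desc is a reserved word in Hive, it is quoted with backticks.

```sql
-- Attach the single MAX(id) value from table1 to every row of table2.
INSERT INTO TABLE table3
SELECT t2.num,
       t2.`desc`,
       m.max_id AS refVal
FROM   table2 t2
CROSS JOIN (SELECT MAX(id) AS max_id FROM table1) m;
```

The subquery returns exactly one row, so the cross join does not multiply the rows of table2; it only adds the same max_id to each of them.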

Is there a way to prevent insertion of duplicate rows in Hive?

I have an ORC Table. I populate it using the data from some other table as follows:
INSERT INTO TABLE orc_table_name SELECT * FROM other_table_name
Is there any way I can prevent duplicate entries from being inserted into the ORC table?

You can use NOT IN. See the general code below: it inserts records into orc_table_name only if their Value1 from TABLE_1 has not been inserted before.
INSERT INTO orc_table_name
(Value1, Value2)
SELECT t1.Value1,
t1.Value2
FROM TABLE_1 t1
WHERE t1.Value1 NOT IN (SELECT Value1 FROM orc_table_name)
INSERT INTO orc_table_name(field1,field2....fieldn)
select field1,field2... field(n-1),MIN(fieldn) as fieldn
from other_table_name
Group By field1,field2...field(n-1)
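A hedged variation on the NOT IN answer: when the subquery column can contain NULL, NOT IN can filter out every row, so an anti-join written as a LEFT OUTER JOIN may be safer. The sketch below reuses the names from that answer and assumes orc_table_name has exactly the two columns Value1 and Value2; the aliases s and o are illustrative.

```sql
-- Keep only source rows whose Value1 has no match in the target table.
INSERT INTO TABLE orc_table_name
SELECT DISTINCT s.Value1, s.Value2
FROM   TABLE_1 s
LEFT OUTER JOIN orc_table_name o
       ON s.Value1 = o.Value1
WHERE  o.Value1 IS NULL;
```

Rows whose Value1 is NULL never match the join, so they are always inserted. DISTINCT additionally drops rows that are repeated within TABLE_1 itself.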

Hive Timestamp aggregation

I have two Hive tables. One is updated on an hourly basis by the Java API team (they call the API and store the results in Hive table1). I now have to aggregate the latest data and store it in another table called table2 (only the newly loaded data, because the old data has already been aggregated and stored). For that I have used the query below:
set maxtime = select max(lastactivitytimestamp) from table2;
insert into table2 select * from table1 where lastactivitytimestamp > unix_timestamp('${hivevar:maxtime}');
I am not getting any result. But when I give the timestamp value manually I am getting data, like below:
insert into table2 select * from table1 where lastactivitytimestamp > unix_timestamp('2014-08-18 15:23:26.754');
Is it possible to pass dynamic values to unix_timestamp?

Try removing the single quotes from the unix_timestamp() call, like this:
insert into table2 select * from table1 where lastactivitytimestamp > unix_timestamp(${hivevar:maxtime});
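An alternative that avoids variables altogether, offered as a sketch: as far as I know, Hive's set command stores the text after the equals sign as a literal string rather than running it as a query, so maxtime would never hold a timestamp. Computing the maximum in a subquery and joining it in sidesteps this. The sketch assumes lastactivitytimestamp has the same type in both tables; the aliases t1 and m are illustrative.

```sql
-- Copy only the rows that are newer than the latest row already in table2.
INSERT INTO TABLE table2
SELECT t1.*
FROM   table1 t1
CROSS JOIN (SELECT MAX(lastactivitytimestamp) AS maxtime FROM table2) m
WHERE  t1.lastactivitytimestamp > m.maxtime;
```

If table2 is empty, MAX returns NULL and no rows are selected, so the very first load needs a separate plain insert. Some Hive configurations reject Cartesian products in strict mode, which would have to be relaxed for the CROSS JOIN.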

Oracle: find difference between 2 tables

I have 2 tables that have the same structure. One is a temp one and the other is a prod one. The entire data set gets loaded each time, and sometimes this dataset will have deleted records from the prior datasets. I load the dataset into the temp table first, and if any records were deleted I want to delete them from the prod table also.
So how can I find the records that exist in prod but not in temp? I tried outer join but it doesn't seem to be working. It's returning all the records from the table in the left or right depending on doing left or right outer join.
I then also want to delete those records in the prod table.

One way would be to use the MINUS operator
SELECT * FROM table1
MINUS
SELECT * FROM table2
will show all the rows in table1 that do not have an exact match in table2 (you can obviously specify a smaller column list if you are only interested in determining whether a particular key exists in both tables).
Another would be to use NOT EXISTS
SELECT *
FROM table1 t1
WHERE NOT EXISTS( SELECT 1
FROM table2 t2
WHERE t1.some_key = t2.some_key )

How about something like:
SELECT * FROM ProdTable WHERE ID NOT IN
(select ID from TempTable);
The same condition works in a DELETE statement as well:
DELETE FROM ProdTable WHERE ID NOT IN
(select ID from TempTable);

MINUS can work here.
The following statement combines results with the MINUS operator, which returns only rows returned by the first query but not by the second:
SELECT * FROM prod
MINUS
SELECT * FROM temp;
MINUS will only work if both tables have the same structure.
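One caveat with the NOT IN delete above: if TempTable.ID contains even a single NULL, NOT IN evaluates to unknown for every row and nothing is deleted. A NOT EXISTS version, sketched here with the same table names (the aliases p and t are illustrative), does not have that problem:

```sql
-- Delete prod rows whose ID has no counterpart in the freshly loaded temp table.
DELETE FROM ProdTable p
WHERE  NOT EXISTS (SELECT 1
                   FROM   TempTable t
                   WHERE  t.ID = p.ID);
```

Either form deletes every prod row if TempTable is empty, so it is worth checking that the load succeeded before running it.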

How to duplicate all data in a table except for a single column that should be changed

I have a question regarding a unified insert query against tables with different data structures (Oracle). Let me elaborate with an example:
tb_customers (
id NUMBER(3), name VARCHAR2(40), archive_id NUMBER(3)
)
tb_suppliers (
id NUMBER(3), name VARCHAR2(40), contact VARCHAR2(40), xxx, xxx,
archive_id NUMBER(3)
)
The only column that is present in all tables is [archive_id]. The plan is to create a new archive of the dataset by copying (duplicating) all records to a different database partition and incrementing the archive_id for those records accordingly. [archive_id] is always part of the primary key.
My problem is with select statements to do the actual duplication of the data. Because the columns are variable, I am struggling to come up with a unified select statement that will copy the data and update the archive_id.
One solution (that works), is to iterate over all the tables in a stored procedure and do a:
CREATE TABLE temp as (SELECT * from ORIGINAL_TABLE);
UPDATE temp SET archive_id=something;
INSERT INTO ORIGINAL_TABLE (select * from temp);
DROP TABLE temp;
I do not like this solution very much as the DDL commands muck up all restore points.
Does anyone else have any solution?

How about creating a global temporary table for each base table?
create global temporary table tb_customers$ as select * from tb_customers;
create global temporary table tb_suppliers$ as select * from tb_suppliers;
You don't need to create and drop these each time, just leave them as-is.
Your archive process is then a single transaction...
insert into tb_customers$ select * from tb_customers;
update tb_customers$ set archive_id = :v_new_archive_id;
insert into tb_customers select * from tb_customers$;
insert into tb_suppliers$ select * from tb_suppliers;
update tb_suppliers$ set archive_id = :v_new_archive_id;
insert into tb_suppliers select * from tb_suppliers$;
commit; -- this will clear the global temporary tables
Hope this helps.

I would suggest not having a single SQL statement for all tables and just using an insert.
insert into tb_customers_2
select id, name, 'new_archive_id' from tb_customers;
insert into tb_suppliers_2
select id, name, contact, xxx, xxx, 'new_archive_id' from tb_suppliers;
Or, if you really need a single SQL statement for all of them, at least pre-create all the temp tables (as temporary tables) and leave them in place for next time. Then just use dynamic SQL to refer to the temp table.
insert into ORIGINAL_TABLE_TEMP (SELECT * from ORIGINAL_TABLE);
UPDATE ORIGINAL_TABLE_TEMP SET archive_id=something;
INSERT INTO NEW_TABLE (select * from ORIGINAL_TABLE_TEMP);
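The dynamic SQL idea can also be taken one step further so that no temp table is needed: build the column list from the data dictionary and swap archive_id for a bind variable. This is only a sketch; it assumes the table belongs to the current schema, that LISTAGG is available (Oracle 11g Release 2 or later), and that the old and new archive ids are 1 and 2. The variable names are made up for illustration.

```sql
DECLARE
  v_table  VARCHAR2(30) := 'TB_CUSTOMERS';  -- assumed table name, upper case as stored in the dictionary
  v_cols   VARCHAR2(4000);                  -- target column list
  v_select VARCHAR2(4000);                  -- same list with archive_id replaced by a bind
BEGIN
  SELECT LISTAGG(column_name, ', ') WITHIN GROUP (ORDER BY column_id),
         LISTAGG(CASE WHEN column_name = 'ARCHIVE_ID' THEN ':new_id'
                      ELSE column_name END, ', ') WITHIN GROUP (ORDER BY column_id)
  INTO   v_cols, v_select
  FROM   user_tab_columns
  WHERE  table_name = v_table;

  -- Binds are positional: :new_id comes first (select list), :old_id second (WHERE).
  EXECUTE IMMEDIATE
    'INSERT INTO ' || v_table || ' (' || v_cols || ') ' ||
    'SELECT ' || v_select || ' FROM ' || v_table ||
    ' WHERE archive_id = :old_id'
    USING 2, 1;  -- assumed values: new archive id 2, old archive id 1
END;
/
```

Wrapping the block in a loop over the table names would give one code path for all tables, with no DDL involved.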
