Bulk Copy from one server to another - insert

I have a situation where I need to copy part of the data from one server to another. The table schemas are exactly the same. I need to move partial data from the source, which may or may not already exist in the destination table. The solution I'm considering is to use bcp to export the data to a text (or .dat) file, take that file to the destination (the two servers are on different networks and not accessible at the same time), and then import the data there. There are some conditions I need to satisfy:
I need to export only a subset of the data from the table, not all of it. My client will give me the IDs that need to be moved from source to destination. I have around 3000 records in the master table, and the same in the child tables; I expect only about 300 records to be moved.
If a record already exists in the destination, the client will instruct me whether to ignore or overwrite it on a case-by-case basis. 90% of the time we need to ignore the record without overwriting, but log it to a log file.
Please help me with the best approach. I thought of using BCP with the query option to filter the data, but when importing, how do I skip inserting the records that already exist? And how do I overwrite them when that is needed?

Unfortunately BCPing into a table is an all-or-nothing deal; you can't select which rows to bring in.
What I'd do is:
1. Create a table on the source database; this will store the IDs of the rows you need to move. You can now BCP out just the rows you need (a sketch of the command follows this list).
2. On the destination database, create a new Work In Progress (WIP) table and BCP the rows into it.
3. Once they are in there, write a script that decides whether or not each WIP row goes into the destination table.
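For step 1, a hedged example of the bcp-out command; the server, database, and object names (SourceServer, SourceDb, SourceTable, IdsToMove) are placeholders, -n writes native format, and -T uses a trusted connection:
bcp "select s.* from SourceDb.dbo.SourceTable s join SourceDb.dbo.IdsToMove m on m.id = s.id" queryout BcpFileName.dat -n -S SourceServer -T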
Hope this helps.
Update
By Work In Progress (WIP) tables I don't mean #temp tables; you can't BCP into a temp table (at least, I'd be very surprised if you could).
I mean a table you'd create with the same structure as the destination table: BCP into that, script the WIP rows into the destination table, then drop the WIP table.
You haven't said which RDBMS you're using; assuming SQL Server, something like the following (untried code):
-- the following creates a new, empty table with a schema identical to the destination table
select * into WIP_Destination from Destination
where 1 = 0

-- BCP in the rows (format options such as DATAFILETYPE may be needed, depending on how the file was exported)
BULK INSERT WIP_Destination from 'BcpFileName.dat'

-- Insert new rows into Destination
insert into Destination
select * from WIP_Destination
where id not in (select id from Destination)

-- Update existing rows in Destination (only if you need to overwrite)
update d
set d.field1 = w.field1,
    d.field2 = w.field2,
    d.field3 = w.field3,
    . . .
from Destination d inner join WIP_Destination w on d.id = w.id

drop table WIP_Destination
Update 2
OK, so you can insert into temporary tables; I just tried it (I didn't have time the other day, sorry).
On the problem of the master/detail records (we're now moving off the subject of the original question; if I were you I'd open a new question for this topic, as you'll get more answers than just mine):
You can write a stored procedure that steps through the new rows to add.
So, you loop through the rows in your temp table (these rows carry the original id from the source database), insert each row into the Destination table, and use SCOPE_IDENTITY() to get the id of the newly inserted row. Now that you have both the old id and the new id, you can build an insert statement for the detail rows like this:
insert into Destination_Detail
select @newId, field1, field2 . . . from #temp_Destination_Detail
where Id = @oldId
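If it helps, here is a minimal T-SQL sketch of that loop; it assumes the staged master rows live in #temp_Destination (carrying the original source id in an id column), that Destination generates its own IDENTITY id, and hypothetical columns field1 and field2:

declare @oldId int, @newId int;

declare c cursor for
    select id from #temp_Destination;   -- staged master rows carrying the source ids

open c;
fetch next from c into @oldId;
while @@fetch_status = 0
begin
    -- insert the master row; the IDENTITY value becomes the new id
    insert into Destination (field1, field2)
    select field1, field2 from #temp_Destination where id = @oldId;

    set @newId = scope_identity();

    -- re-parent the detail rows from the old id to the new one
    insert into Destination_Detail
    select @newId, field1, field2 from #temp_Destination_Detail where Id = @oldId;

    fetch next from c into @oldId;
end

close c;
deallocate c;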
Hope this helps [if it has helped you are allowed to upvote this answer, even if it's not the answer you're going to select :)]
Thanks
BW

Related

PLSQL Daily record of changes on table, then select from day

Oracle PL/SQL question: one table should be archived day by day. The table holds about 50,000 records, but only a few of them change during a day. A second table (the destination/history table) has one additional field, import_date. Two days = 100,000 records, whereas it should be 50,000 + a few records carrying the information about the changes made during the day.
I need a simple solution to copy data from the source table to the destination like a log: only the changes are copied/registered. But I should still be able to check the data set of the source table as of a given day.
Is there a mechanism like MERGE, or something similar?
Normally you'd have a day_table and a master_table. All records are loaded from the day_table into master, and only master is manipulated, with the day table used to store the raw data.
You could add a new column to master, such as date_modified, and have the app update this field when a record changes, or a flag used to indicate that it has changed.
Another way to do this is to have an active/latest flag. Instead of changing the record, it is duplicated, with a flag set to indicate which is the newer and which the older version. This might make comparison easier,
e.g. select * from master_table where record = 'abcd'
This would show 2 rows - the original loaded at 1pm and the modified active one changed at 2pm.
There's no need to have another table; you could then base a view on this flag,
e.g. CHANGED_RECORDS_VIEW = select * from master_table where flag = 'Y'
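As a concrete (hedged) sketch of that flag idea, with changed_flag, date_modified, col1 and col2 as assumed column names:

-- instead of updating the existing row, keep it and insert the new version flagged as changed
insert into master_table (record, col1, col2, changed_flag, date_modified)
values ('abcd', 'new value 1', 'new value 2', 'Y', sysdate);

-- the view from above, exposing only the changed rows
create or replace view changed_records_view as
select * from master_table where changed_flag = 'Y';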
I once faced a similar issue; please find the solution below.
The tables we had:
The master table, which always receives records and keeps growing.
A backup table to store all the master records on a daily basis.
Solution:
From morning to evening, records are inserted into and updated in the master table. New records were identified via a timestamp: whenever a record is inserted or updated, the corresponding timestamp is set on it.
At night, we created a job schedule to run a procedure (CREATE_JOB; please check the Oracle documentation for further details) at exactly 10:00 pm to bulk collect all the records in the master table for today's date and insert them into the backup table.
The scenario I have explained should help you; please also check out the concept of job scheduling. Thank you.
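For reference, a hedged sketch of such a nightly job using DBMS_SCHEDULER; master_table, backup_table and the last_modified timestamp column are assumptions, and backup_table is assumed to have the same columns plus a trailing import_date:

begin
  dbms_scheduler.create_job(
    job_name        => 'NIGHTLY_MASTER_BACKUP',
    job_type        => 'PLSQL_BLOCK',
    job_action      => 'begin
                          insert into backup_table
                          select m.*, sysdate
                          from master_table m
                          where m.last_modified >= trunc(sysdate);
                          commit;
                        end;',
    start_date      => systimestamp,
    repeat_interval => 'FREQ=DAILY; BYHOUR=22; BYMINUTE=0',   -- 10:00 pm every day
    enabled         => TRUE);
end;
/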

How to compare table data structure

How to compare table data structure:
1. Any table added or deleted.
2. Any column in the tables added or deleted.
My job is to verify whether any tables or columns have been added or deleted on the 1st of every month.
My plan is to run a SQL query and take a copy of the entire list of tables and their data types only (NO DATA), save it in a txt file or something, and use it as a baseline; next month, run the same SQL query, get the results, and compare against the file. Is that possible? Please help with the SQL query that can do this job.
This query will give you a list of all tables and their columns for a given user (just replace ABCD in this query with the user you have to audit; provided you have access to all of that user's tables, this will work).
SELECT table_name,
       column_name
FROM   all_tab_columns
WHERE  owner = 'ABCD'
ORDER BY table_name,
         column_id;
This answers your question, but I have to agree with a_horse_with_no_name that this is not a good way to implement change control, most notably because the changes have already happened by the time you detect them.
The query is very basic and doesn't give you all the information you'd need to see whether a column has changed (or any information about other object types, etc.), but then you only asked about additions and deletions of tables and columns, and you can compare this script's output with previous outputs to answer your allotted task.
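If you also want to capture data types and keep the baseline inside the database rather than in a text file, here is a hedged sketch of the compare (the schema_baseline table name is an assumption):

-- 1st of the month: take a snapshot to use as the baseline
create table schema_baseline as
select table_name, column_name, data_type
from   all_tab_columns
where  owner = 'ABCD';

-- next month: tables/columns that have disappeared since the baseline
select table_name, column_name, data_type from schema_baseline
minus
select table_name, column_name, data_type from all_tab_columns where owner = 'ABCD';

-- ...and the ones that have been added
select table_name, column_name, data_type from all_tab_columns where owner = 'ABCD'
minus
select table_name, column_name, data_type from schema_baseline;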

Hive delete duplicate records

In Hive, how can I delete duplicate records? Below is my case.
First, I load data from the product table into products_rcfileformat. There are 25 rows of records in the product table:
FROM products INSERT OVERWRITE TABLE products_rcfileformat
SELECT *;
Second, I load data from the product table into products_rcfileformat again. There are still 25 rows of records in the product table, but this time I'm NOT using the OVERWRITE clause:
FROM products INSERT INTO TABLE products_rcfileformat
SELECT *;
When I query the data it gives me 50 rows in total, which is correct.
Checking HDFS, it seems HDFS makes another copy of the file, xxx_copy_1, instead of appending to 000000_0.
Now I want to remove the records that were read from xxx_copy_1. How can I achieve this with a Hive command? If I'm not mistaken, I can remove the xxx_copy_1 file using the hdfs dfs -rm command and then rerun the insert overwrite command, but I want to know whether this can be done with a Hive command, for example something like a DELETE statement.
Partition your data such that the rows you want to drop (use the row_number window function to identify them) sit in a partition of their own. You can then drop that partition without impacting the rest of your table. This is a fairly sustainable model, even if your dataset grows quite large.
More detail about partitioning:
www.tutorialspoint.com/hive/hive_partitioning.htm
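A hedged sketch of that idea, writing into a new partitioned table; the column names (product_id, product_name, price) and the products_dedup table are assumptions made for illustration:

-- target table with the duplicate flag as a partition column
CREATE TABLE products_dedup (product_id INT, product_name STRING, price DOUBLE)
PARTITIONED BY (dup_flag INT);

SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

-- rank identical keys; the first copy lands in partition 0, duplicates in partition 1
INSERT OVERWRITE TABLE products_dedup PARTITION (dup_flag)
SELECT product_id, product_name, price,
       CASE WHEN rn = 1 THEN 0 ELSE 1 END AS dup_flag
FROM (
  SELECT product_id, product_name, price,
         row_number() OVER (PARTITION BY product_id ORDER BY product_id) AS rn
  FROM products_rcfileformat
) t;

-- dropping the duplicate partition removes those rows without rewriting the rest
ALTER TABLE products_dedup DROP PARTITION (dup_flag = 1);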
Checking HDFS, it seems HDFS makes another copy of the file, xxx_copy_1, instead of appending to 000000_0
Files in HDFS are write-once and cannot be edited in place; since the Hive warehouse files (or whatever the table's location is) live in HDFS, Hive has to create a second file rather than append to the existing one.
Now I want to remove the records that were read from xxx_copy_1. How can I achieve this with a Hive command?
Please check this post - Removing DUPLICATE rows in hive based on columns.
Let me know if you are satisfied with the answer there. I have another method that removes duplicate entries, but it may not work the way you want.

Archiving data in Oracle

I am trying to archive my data from one table to another. Please find my requirement below.
I have a table A and another table B.
I need to find all the records in A that are older than a particular date.
After identifying the records, I need to move them to table B.
Once the data is moved to table B, I need to delete those records from table A.
I am planning to use a stored procedure with the number of days to archive as a parameter.
I need to check for errors while inserting into table B: if an insert fails, the corresponding records must not be deleted from table A. Likewise, if the records are successfully inserted into table B but the delete from table A fails, I need to roll back the rows inserted into table B.
I need to archive on a daily basis, and there will be at least a million records to archive.
I started coding this using FORALL with SAVE EXCEPTIONS but got stuck on the logic.
Can anyone help me with this logic?
First of all, I doubt such 'archiving' is a good idea. It's like transferring soup from one plate to another with a teaspoon. Better solutions exist for almost every such task, say, using partitioning and maybe partition exchange.
But if you still wish to do this, you could write something like this:
procedure Move_Many_Records is
begin
  savepoint MMR;

  -- a direct-path (/*+ APPEND */) insert would prevent the select from TARGET in the
  -- delete below from running in the same transaction, so a conventional insert is used
  insert into TARGET (fields)
  select fields from SOURCE where {condition};

  delete from SOURCE
  where id in (select id from TARGET);

  commit;
exception
  when others then
    rollback to savepoint MMR;
    My_Alerts.Shit_Happens('Failed to move records!');
    raise;
end;
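Alternatively, if you want to stay with the FORALL / SAVE EXCEPTIONS approach you started, here is a minimal sketch; table A with id and created_date columns, archive table B with the same structure, and the batch size are all assumptions:

create or replace procedure archive_rows (p_days in number) is
  cursor c_src is
    select *
    from   A
    where  created_date < trunc(sysdate) - p_days;

  type t_rows is table of A%rowtype;
  type t_ids  is table of A.id%type index by pls_integer;

  l_rows t_rows;
  l_ids  t_ids;

  bulk_errors exception;
  pragma exception_init(bulk_errors, -24381);
begin
  open c_src;
  loop
    fetch c_src bulk collect into l_rows limit 10000;
    exit when l_rows.count = 0;

    -- copy the keys into a scalar collection for the delete below
    for i in 1 .. l_rows.count loop
      l_ids(i) := l_rows(i).id;
    end loop;

    begin
      -- insert the batch into the archive table, collecting per-row errors
      forall i in 1 .. l_rows.count save exceptions
        insert into B values l_rows(i);

      -- delete the same batch from the source only after all inserts succeeded
      forall i in 1 .. l_rows.count
        delete from A where id = l_ids(i);

      commit;
    exception
      when bulk_errors then
        -- undo the whole batch so nothing is deleted when some inserts failed,
        -- then log the failing rows (a failed delete simply propagates uncommitted)
        rollback;
        for j in 1 .. sql%bulk_exceptions.count loop
          dbms_output.put_line('row ' || sql%bulk_exceptions(j).error_index ||
                               ' failed: ' || sqlerrm(-sql%bulk_exceptions(j).error_code));
        end loop;
    end;
  end loop;
  close c_src;
end;
/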

Handling Delete followed by Insert in Informatica Designer

I am working on Informatica PowerCenter Designer 8.1.1
I have a source table with three columns, which are
PORT_ID
ISSUE_ID
MKT_VAL
I need to sync the data from my source table to a target table that contains the same three columns in a different database.
There's a 1:n relationship between PORT_ID and ISSUE_ID
While doing this data sync, I have to do a DELETE followed by an INSERT, the reason being that the number of ISSUE_IDs mapped to a PORT_ID can change. Let's say that initially the data was like this in both source and target:
PORT_ID ISSUE_ID
1 A
1 B
1 C
The data in source gets changed to:
PORT_ID ISSUE_ID
1 A
1 B
1 D
Because of this, during my sync I first have to delete all rows mapped to PORT_ID = 1 and then insert the incoming records.
I am not able to figure out how to get this done in the Mapping Designer. Can someone give me some inputs?
The most common way this is done is using a pre-query. If port_id and issue_id are unique within the table, you could use....
delete from tgt_table
where (port_id, issue_id)
not in (select port_id, issue_id
from src_table
);
commit;
Second way:
If these two columns can be added as a key in your mapping, then you can check insert, update, and delete (all three) for how target rows are treated, to make sure the target data ends up the same as the source data. In most cases, however, the business rules are more complex than this, so this feature is rarely used.
Another common implementation is to mark rows for delete based on a lookup on the target table:
Source -> Lookup (target_table) -> Expression (flag indicating whether the value exists) -> mark for delete -> delete using an Update Strategy transformation
Write a simple stored procedure that does the following (see the sketch after this list):
1) The delete statement (given by Rajesh):
delete from tgt_table
where (port_id, issue_id)
  not in (select port_id, issue_id
          from src_table);
commit;
2) An insert of the source rows that are missing from the target:
insert into tgt_table (port_id, issue_id, mkt_val)
select port_id, issue_id, mkt_val
from   src_table s
where  (s.port_id, s.issue_id) not in (select port_id, issue_id from tgt_table);
commit;
3) Use a dummy source and target in the mapping, and call the stored procedure using a Stored Procedure transformation.
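A hedged sketch of that stored procedure, assuming Oracle and the three columns from the question (the procedure name sync_tgt_table is made up):

create or replace procedure sync_tgt_table as
begin
  -- step 1: remove target rows that no longer exist in the source
  delete from tgt_table
  where (port_id, issue_id) not in (select port_id, issue_id from src_table);

  -- step 2: bring over source rows that are missing from the target
  insert into tgt_table (port_id, issue_id, mkt_val)
  select s.port_id, s.issue_id, s.mkt_val
  from   src_table s
  where  (s.port_id, s.issue_id) not in (select port_id, issue_id from tgt_table);

  commit;
end;
/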
You can create a dynamic lookup on the target table.
To create a dynamic lookup:
-> go to the lookup properties,
-> check the dynamic lookup cache box
-> and then check the insert else update box.
As soon as you do that, a new port called NewLookupRow will appear in the Ports tab.
You can use this port to check whether a record is an insert or an update, using the following values:
0 - no change
1 - insert
2 - update
Now you can update the target accordingly.
Hope this helps..
Cheers.
I don't think we need a dynamic lookup here, as the requirement implies there are no duplicates in the source.
Why don't you do a regular lookup and update the records using an Update Strategy instead of delete and insert?
