Read and update an Oracle table with 20 million records - oracle

I have a situation where I am doing a data fix from a backup.
There are two tables: MAIN_TABLE (PrimaryKey, Value) and BACKUP (PrimaryKey, Value).
I want to find all the records in MAIN_TABLE with value=0, then fetch the value for the same primary key from BACKUP and update MAIN_TABLE.
There are 20 million records with value=0.
Both the fetch and the update are done via the primary key.
Questions:
Stored procedure or script?
The fetch and the update are done on the same table; any concerns?
Roughly how long do you think it will take (ballpark figure), and how can I test this?
The solution I was thinking of:
Open a cursor on MAIN_TABLE with my condition (value=0), fetch the value from BACKUP for each row, and update. Commit every 10K updates in a loop.
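A minimal PL/SQL sketch of that cursor-plus-batched-commit idea (MAIN_TABLE, BACKUP, PrimaryKey, and Value are the names from the question; the 10K batch size is from the plan above; everything else is an assumption, not a tuned implementation):
DECLARE
  l_count PLS_INTEGER := 0;
BEGIN
  FOR r IN (SELECT primarykey FROM main_table WHERE value = 0) LOOP
    UPDATE main_table m
    SET    m.value = (SELECT b.value FROM backup b WHERE b.primarykey = r.primarykey)
    WHERE  m.primarykey = r.primarykey;
    l_count := l_count + 1;
    IF MOD(l_count, 10000) = 0 THEN
      COMMIT; -- commit every 10K updates, as planned
    END IF;
  END LOOP;
  COMMIT;
END;
/
Note that committing while the driving cursor is still open risks ORA-01555 on a busy table, and row-by-row updates like this will be much slower than a single set-based statement such as the MERGE suggested below.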
Any thoughts?

You can give Oracle's MERGE a try.
Make sure you test against test tables before applying the query to the main tables.
MERGE INTO main_table m
USING backup_table b
ON (m.primary_key = b.primary_key)
WHEN MATCHED THEN
UPDATE SET m.value = b.value
WHERE m.value = 0;

In SQL Server the same update-from-a-join would be written as:
UPDATE main
SET main.value = back.value
FROM MAIN_TABLE AS main
JOIN BACKUP_TABLE AS back ON main.pk = back.pk
WHERE main.value = 0
Note that this UPDATE ... FROM syntax is SQL Server only; in Oracle you would use MERGE (as above) or a correlated subquery.
Here is where I found how to do this:
https://chartio.com/resources/tutorials/how-to-update-from-select-in-sql-server/

Related

Update 62 Million Records in Oracle

I have to update 62 million records in a production database. It's a simple update statement.
It's a pretty big table.
The total record count in that table is 1,251,797,271.
Can I use the bulk collect approach for updating the records?
Please let me know what the best approach is.
The update statement looks like this:
UPDATE CASHFLOW_HIST
SET EFF_DT = '03-JAN-2019'
WHERE EFF_DT= '01-JAN-2019'
Note: I'm not looking for the approach of creating a new table, dropping the original table, and renaming the new table to the original name instead of updating a table with millions of records.
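For reference, a rough sketch of the bulk collect / FORALL pattern asked about here (CASHFLOW_HIST and EFF_DT come from the question; the ROWID batching, the 50,000 limit, and the per-batch commit are illustrative assumptions, not a tuned recipe):
DECLARE
  CURSOR c_rows IS
    SELECT rowid
    FROM   cashflow_hist
    WHERE  eff_dt = TO_DATE('01-JAN-2019', 'DD-MON-YYYY');
  TYPE t_rids IS TABLE OF ROWID;
  l_rids t_rids;
BEGIN
  OPEN c_rows;
  LOOP
    FETCH c_rows BULK COLLECT INTO l_rids LIMIT 50000;
    EXIT WHEN l_rids.COUNT = 0;
    FORALL i IN 1 .. l_rids.COUNT
      UPDATE cashflow_hist
      SET    eff_dt = TO_DATE('03-JAN-2019', 'DD-MON-YYYY')
      WHERE  rowid  = l_rids(i);
    COMMIT; -- per-batch commits keep undo small but lengthen the total runtime
  END LOOP;
  CLOSE c_rows;
END;
/
That said, for a one-off change a single set-based UPDATE is usually simpler and often faster; batching mainly helps when undo, redo, or lock duration is the constraint.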

Oracle Create Table as Select * from Another_Table same tablespace

I didn't design the DB, so don't judge me on this.
I have a log table that is receiving A LOT of entries. I only need to keep a day or so of data in this log table. My initial thought was:
In a single transaction:
1. rename the log table
2. create the original log table from the renamed log table
3. commit the trx and life goes on
The second time this happens I drop the renamed table and do it all over again. This will run as an Oracle job once a day.
The original question:
Would anyone know: if I specify a tablespace name for the first table, like so:
create table "my_user"."first_table" (pkid number, full_name varchar2(50)) nologging tablespace "my_custom_tablespace";
Then I do something like:
create table second_table as select * from first_table where 1=2 -- because I only want the structure
Will my second_table be in the same tablespace?
Thanks in advance for your help.
If you are on Enterprise Edition with partitioning, then a simpler solution is to go with an interval partitioned table, with one partition per day. Then truncate the partitions when you don't need them.
If not, then go with two tables, a synonym to point to the 'current' one that is being inserted into, and a view that selects from a union of the two tables. The nightly job would truncate the 'old' table and switch the synonym to make it the 'new' one.
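A rough sketch of both suggestions (all object names and date literals are placeholders, not taken from the question):
-- Option 1: interval partitioning (Enterprise Edition with the Partitioning option), one partition per day
CREATE TABLE app_log (
  pkid     NUMBER,
  log_date DATE,
  message  VARCHAR2(4000)
)
PARTITION BY RANGE (log_date)
INTERVAL (NUMTODSINTERVAL(1, 'DAY'))
(
  PARTITION p_start VALUES LESS THAN (DATE '2024-01-01')
);
-- Nightly cleanup: empty the partition that holds a given day's rows
ALTER TABLE app_log TRUNCATE PARTITION FOR (DATE '2024-01-01');
-- Option 2: two tables behind a synonym; the nightly job truncates the "old" table
-- and repoints the synonym so new inserts land in it (a view would UNION ALL both tables)
TRUNCATE TABLE app_log_b;
CREATE OR REPLACE SYNONYM current_app_log FOR app_log_b;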

Optimizing a delete... where query with rownum

I'm working with an application that has a large amount of outdated data clogging up a table in my database. Ideally, I'd want to delete all entries in the table whose reference date is too old:
delete outdatedTable where referenceDate < :deletionCutoffDate
If this statement were to be run, it would take ages to complete, so I'd rather break it up into chunks with the following:
delete outdatedTable where referenceDate < :deletionCutoffDate and rownum <= 10000
In testing, this works surprisingly slowly. The following query, however, runs dramatically faster:
delete outdatedTable where rownum <= 10000
I've been reading through multiple blogs and similar questions on Stack Overflow, but I haven't yet found a straightforward description of how/whether using rownum affects the Oracle optimizer when there are other WHERE clauses in the query. In my case, it seems as if Oracle checks
referenceDate < :deletionCutoffDate
on every single row, executes a massive select on all matching rows, and only then filters out the top 10000 rows to return. Is this in fact the case? If so, is there any clever way to make Oracle stop checking the WHERE clause as soon as it has found enough matching rows?
How about a different approach, without so much DML on the table? As a permanent solution for the future, you could go for table partitioning (a rough sketch of the steps follows below):
1. Create a new table with the required partition(s).
2. Move ONLY the required rows from your existing table to the new partitioned table.
3. Once the new table is populated, add the required constraints and indexes.
4. Drop the old table.
In future, you would just need to DROP the old partitions.
CTAS (create table as select) is another way; however, if you want the new table to be partitioned, you would have to go for the exchange partition concept.
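The sketch of steps 1-4 (the column list, partition names, and date boundaries are assumptions for illustration; outdatedTable and referenceDate come from the question):
-- 1. New partitioned table (only two columns shown; the real table would list them all)
CREATE TABLE outdatedTable_new (
  pk            NUMBER,
  referenceDate DATE
)
PARTITION BY RANGE (referenceDate)
(
  PARTITION p_2018 VALUES LESS THAN (DATE '2019-01-01'),
  PARTITION p_2019 VALUES LESS THAN (DATE '2020-01-01'),
  PARTITION p_max  VALUES LESS THAN (MAXVALUE)
);
-- 2. Move only the rows you still need (the literal stands in for :deletionCutoffDate)
INSERT /*+ APPEND */ INTO outdatedTable_new (pk, referenceDate)
SELECT pk, referenceDate
FROM   outdatedTable
WHERE  referenceDate >= DATE '2019-01-01';
COMMIT;
-- 3. Add the required constraints and indexes on outdatedTable_new here, then:
-- 4. Swap the tables
DROP TABLE outdatedTable;
RENAME outdatedTable_new TO outdatedTable;
-- In future, ageing out a whole period is just dropping its partition:
-- ALTER TABLE outdatedTable DROP PARTITION p_2018;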
First of all, you should read about SQL statements' execution plans and learn how to explain them. That will help you find answers to questions like this.
Generally, one single delete is more effective than several chunked ones. Its main disadvantage is heavy use of undo tablespace.
If you wish to delete most rows of a table, a much faster way is usually this trick:
create table new_table as select * from old_table where referenceDate >= :date_limit;
drop table old_table;
rename new_table to old_table;
... recreate indexes and other stuff ...
If you wish to do it more than once, partitioning is a much better way. If the table is partitioned by date, you can select the current data quickly and you can drop the partition with outdated data in milliseconds.
Finally, partitioning is a way to avoid 'deleting outdated records' at all. Sometimes we need old data, and it's sad if we delete it with our own hands. With partitioning you can archive outdated partitions outside of the database, and reattach them when you need to access the old data.
This is an old request, but I'd like to show another approach (also using partitions).
Depending on what you consider old, you could create corresponding partitions (optimally exactly two, one current and one old, but you could just as well make more), e.g. by the parity of the year (in practice the expression goes into a virtual column, since the partition key must be a column):
PARTITION BY LIST ( year_parity ) -- virtual column: mod(extract(year from referenceDate), 2)
(
PARTITION year_odd VALUES (1),
PARTITION year_even VALUES (0)
);
This could as well be months (Jan, Feb, ... Dec), decades (XX0X, XX1X, ... XX9X), half years (first_half, second_half), etc. Anything circular.
Then whenever you want to get rid of old data, truncate:
ALTER TABLE mytable TRUNCATE PARTITION year_even;
delete from your_table
where PK not in
(select PK from your_table where rownum <= ...) -- these are the records you want to keep

T-SQL Performance dilemma when looping through a Big table (details inside)

Let's say I have a Big and a Bigger table.
I need to cycle through the Big table, which is indexed but not sequential (since it is a filter of the sequentially indexed Bigger table).
For this example, let's say I needed to cycle through about 20000 rows.
Should I do 20000 of these:
set @currentID = (select min(ID) from myData where ID > @currentID)
or create a (big) temporary, sequentially indexed table (a copy of the Big table) and do 20000 of these:
set @Row = @Row + 1
?
I imagine that doing 20000 filters of the Bigger table just to fetch the next ID is heavy, but so must be filling a big (Big sized) temporary table just to add a dummy identity column.
Is the solution somewhere else?
For example, if I could loop through the results of the select statement (the filter of the Bigger table that produces the "table", actually a result set, called Big) without needing to create temporary tables, that would be ideal, but I seem to be unable to add something like an IDENTITY(1,1) dummy column to the results.
Thanks!
You may want to consider finding out how to do your work set-based instead of RBAR (row by agonizing row). That said, for very big tables you may want to avoid a temp table, so that you are sure you are working with live data if you suspect the proc may run for a while in production. If your proc fails, you'll be able to pick up where you left off; if you use a temp table and your proc crashes, you could lose track of the work that hasn't been completed yet.
You need to provide more information on what your end result is. It is only very rarely necessary to do row-by-row processing, and it is almost always the worst possible choice from a performance perspective. This article will get you started on how to do many tasks in a set-based manner:
http://wiki.lessthandot.com/index.php/Cursors_and_How_to_Avoid_Them
If you just want a temp table with an identity, here are two methods:
-- Method 1: create the temp table first, then insert into it
create table #temp ( test varchar(10), id int identity)
insert #temp (test)
select test from mytable
-- Method 2: SELECT ... INTO creates the temp table on the fly
select test, identity(int) as id into #temp from mytable
I think a join will serve your purposes better.
SELECT BIG.*, BIGGER.* -- add additional calcs here involving BIG and BIGGER
FROM TableBig BIG WITH (NOLOCK)
JOIN TableBigger BIGGER WITH (NOLOCK)
ON BIG.ID = BIGGER.ID
This will limit the set you are working with to the rows present in both tables. But again, it comes down to the specifics of your solution.
Remember, you can do bulk inserts and bulk updates in this manner as well.
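For instance, a set-based update driven by the same join might look like this (TableBig, TableBigger, and ID come from the snippet above; SomeColumn is a placeholder column name):
UPDATE BIG
SET BIG.SomeColumn = BIGGER.SomeColumn -- SomeColumn is a placeholder
FROM TableBig AS BIG
JOIN TableBigger AS BIGGER ON BIG.ID = BIGGER.ID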

Bulk Copy from one server to another

I have a situation where I need to copy part of the data from one server to another. The table schemas are exactly the same. I need to move partial data from the source, which may or may not already exist in the destination table. The solution I'm thinking of is: use bcp to export the data to a text (or .dat) file, take that file to the destination (the two servers are on different networks and not accessible at the same time), and then import the data at the destination. There are some conditions I need to satisfy:
I need to export only a subset of the data from the table, not the whole table. My client is going to give me the IDs which need to be moved from source to destination. I have around 3000 records in the master table, and the same in the child tables too. I expect only about 300 records to be moved.
If a record exists in the destination, the client will instruct us whether to ignore it or overwrite it, case by case. 90% of the time we need to ignore the records without overwriting, but log them in a log file.
Please help me with the best approach. I thought of using bcp with the queryout option to filter the data, but while importing, how do I skip inserting the existing records? And how do I overwrite when that is needed?
Unfortunately, BCPing into a table is an all-or-nothing deal; you can't select which rows to bring in.
What I'd do is . . .
1. Create a table on the source database; this will store the IDs of the rows you need to move. You can now BCP out the rows that you need.
2. On the destination database, create a new Work In Progress (WIP) table, and BCP the rows in there.
3. Once they are in there, you can write a script that will decide whether or not a WIP row goes into the destination table.
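As an illustration, the export in step 1 might look something like this (server, database, table, and file names are placeholders; -T assumes Windows authentication and -n keeps the native format):
bcp "SELECT s.* FROM SourceDb.dbo.SourceTable s JOIN SourceDb.dbo.IdsToMove i ON i.Id = s.Id" queryout "C:\temp\ToMove.dat" -S SourceServer -T -n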
Hope this helps.
Update
By Work In Progress (WIP) tables I don't mean #temp tables; you can't BCP into a temp table (at least I'd be very surprised if you could).
I mean a table you'd create with the same structure as the destination table: BCP into that, script the WIP rows into the destination table, then drop the WIP table.
You haven't said which RDBMS you're using; assuming SQL Server, something like the following (untried code) . . .
-- following creates new table with identical schema to destination table
select * into WIP_Destination from Destination
where 1 = 0
-- BCP in the rows
BULK INSERT WIP_Destination from 'BcpFileName.dat'
-- Insert new rows into Destination
insert into Destination
select * from WIP_Destination
where id not in (select id from Destination)
-- Update existing rows in destination
Update Destination
set field1 = w.field1,
field2 = w.field2,
field3 = w.field3,
. . .
from Destination d inner join WIP_Destination w on d.id = w.id
Drop table WIP_Destination
Update 2
OK, so you can insert into temporary tables, I just tried it (I didn't have time the other day, sorry).
On the problem of the master/detail records (we're now moving off the subject of the original question; if I were you, I'd open a new question for this topic, as you'll get more answers than just mine):
You can write an SP that will step through the new rows to add.
So, you loop through the rows in your temp table (these rows still have the original id on them from the source database), insert each row into the Destination table, and use SCOPE_IDENTITY to get the id of the newly inserted row. Now that you have the old id and the new id, you can create an insert statement for the detail rows like . . .
insert into Destination_Detail
select @newId, field1, field2 . . . from #temp_Destination_Detail
where Id = @oldId
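For completeness, the master-row step that produces @newId might look roughly like this (#temp_Destination, @oldId, and the column names are placeholders I've assumed):
declare @oldId int, @newId int
-- ... inside the loop over the WIP/temp rows ...
insert into Destination (field1, field2)
select field1, field2 from #temp_Destination where Id = @oldId
set @newId = SCOPE_IDENTITY() -- identity value generated for the row just inserted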
Hope this helps [if it has helped you are allowed to upvote this answer, even if it's not the answer you're going to select :)]
Thanks
BW
