I have to update 62 million records in a production database. It's a simple UPDATE statement.
It's a pretty big table.
The total number of records in that table is 1,251,797,271.
Can I use the BULK COLLECT method to update the records?
Please let me know what the best approach is.
The update statement looks like this:
UPDATE CASHFLOW_HIST
SET EFF_DT = '03-JAN-2019'
WHERE EFF_DT= '01-JAN-2019'
Note: I'm not looking for the following method:
create a new table, then drop the original table and rename the new table to the original name, instead of updating a table with millions of records.
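The BULK COLLECT approach asked about could be sketched roughly as follows. This is only an illustration under stated assumptions: EFF_DT is assumed to be a DATE column, and the batch size of 10,000 is an arbitrary choice that would need testing against the real table.

```sql
DECLARE
  -- Collect the ROWIDs of the rows to change (assumes EFF_DT is a DATE column).
  CURSOR c_rows IS
    SELECT rowid
    FROM   cashflow_hist
    WHERE  eff_dt = DATE '2019-01-01';

  TYPE t_rowids IS TABLE OF ROWID;
  l_rowids t_rowids;
BEGIN
  OPEN c_rows;
  LOOP
    -- Fetch one batch at a time so memory use stays bounded.
    FETCH c_rows BULK COLLECT INTO l_rowids LIMIT 10000;
    EXIT WHEN l_rowids.COUNT = 0;

    -- Send the whole batch of updates to the SQL engine in one call.
    FORALL i IN 1 .. l_rowids.COUNT
      UPDATE cashflow_hist
      SET    eff_dt = DATE '2019-01-03'
      WHERE  rowid = l_rowids(i);

    -- Committing per batch keeps undo small, but fetching across commits
    -- can raise ORA-01555 on a busy system; test before using in production.
    COMMIT;
  END LOOP;
  CLOSE c_rows;
END;
/
```

Whether this beats a single UPDATE statement depends on the environment, so it is worth timing both on a copy of the data first.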
I have queries that take an existing large table and build tables off of it for reporting. The problem is that the source tables have 60-80MM+ records and take a long time to recreate. I'd like to be able to identify which records are new so I can add just the new records to the reporting tables.
To me, the best way to identify this is to have an identity column. Is there any significant cost to creating this and adding it to the table?
Separately, is it possible to create a materialized view that takes data from one of these tables but adds a sequence as part of the materialized view? That is, something like:
create materialized view some_materialized_view as
select somesequence.nextval, source_table.*
from source_table?
You can add a sequence-based column to your table, but, as Gary suggests, I wouldn't do that.
The task you are about to solve is so common that other solutions have already been implemented.
The first built-in option that comes to mind is the system change number (SCN), a kind of Oracle-internal clock. By default, a table records the SCN per block (usually 8K, typically containing many rows), but you can set a table up to record the SCN of the last change for every row. Then you can track the rows that are new or changed and have not yet been copied to your reporting tables.
CREATE TABLE t (c1 NUMBER) ROWDEPENDENCIES;
INSERT INTO t VALUES (1);
COMMIT;
SELECT c1, ora_rowscn FROM t;
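A reporting job could then pick up only the rows committed since its previous run. This is a minimal sketch, assuming the table t from the example above; the bind variable :last_scn (the SCN saved at the end of the previous run) is hypothetical. Note that ROWDEPENDENCIES has to be specified when the table is created.

```sql
-- Record the current SCN before copying, so the next run can start from it
-- (querying v$database requires the appropriate SELECT privilege).
SELECT current_scn FROM v$database;

-- Rows inserted or changed since the previous run.
-- :last_scn is a hypothetical bind variable holding the SCN saved last time.
SELECT c1, ora_rowscn
FROM   t
WHERE  ora_rowscn > :last_scn;
```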
Secondly, I would think of adding a date column. With 60-80 million rows I wouldn't do this with ALTER TABLE xxx ADD (d DATE DEFAULT SYSDATE), but with rename, create as select, drop:
CREATE TABLE t AS SELECT * FROM all_objects;
RENAME t TO told;
CREATE TABLE t AS SELECT sysdate AS d, told.* FROM told;
ALTER TABLE t MODIFY d DATE DEFAULT SYSDATE;
DROP TABLE told;
Thirdly, I would read up on materialized views. I have never had the chance to use this at work, but in theory you should be able to set up a materialized view log on your 80-million-row table that records changes and is used to update dependent materialized views.
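A rough sketch of that setup is shown below. The names source_table and report_mv are hypothetical, and the example assumes source_table has a primary key; whether fast refresh is possible depends on the actual query behind the view.

```sql
-- The log records changes to source_table (assumes it has a primary key).
CREATE MATERIALIZED VIEW LOG ON source_table WITH PRIMARY KEY;

-- A simple single-table materialized view that can be refreshed incrementally.
CREATE MATERIALIZED VIEW report_mv
  REFRESH FAST ON DEMAND
AS
  SELECT * FROM source_table;

-- Apply only the logged changes ('F' requests a fast refresh).
BEGIN
  DBMS_MVIEW.REFRESH('REPORT_MV', 'F');
END;
/
```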
And fourthly, I'd look into partitioning your large table on the (newly introduced) date column, so that identifying the new rows becomes faster. Sadly, that depends on your version and Oracle license.
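As an illustration only, a table partitioned by month on the date column might look like the sketch below. The table name t_part, the monthly interval and the starting boundary are assumptions; interval partitioning requires Oracle 11g or later and the Partitioning option.

```sql
-- Hypothetical table, range-partitioned by month on the date column d.
-- New monthly partitions are created automatically as rows arrive.
CREATE TABLE t_part (
  d  DATE DEFAULT SYSDATE NOT NULL,
  c1 NUMBER
)
PARTITION BY RANGE (d)
INTERVAL (NUMTOYMINTERVAL(1, 'MONTH'))
(
  PARTITION p_initial VALUES LESS THAN (DATE '2019-01-01')
);

-- A filter on d lets the optimizer skip partitions that cannot match.
SELECT c1
FROM   t_part
WHERE  d >= DATE '2019-06-01';
```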
I have a situation where I am doing a data fix from a backup.
There are two tables: MAIN TABLE (PrimaryKey, Value) and BACKUP (PrimaryKey, Value).
I want to find all the records in the MAIN table with value = 0, then fetch the value for the same primary key from the BACKUP table and update the MAIN table.
There are 20 million records with value = 0.
Both the updates and the fetches are done using the primary key.
Questions:
Stored procedure or script?
The fetch and the update are done on the same table. Any concerns?
How much time do you think it will take, as a ballpark figure? How can I test it?
The solution I was thinking of:
Open a cursor on the MAIN table with my condition (value = 0), then fetch the value from BACKUP and update. Commit every 10K updates in a loop.
Any thoughts?
You can give Oracle's MERGE a try.
Make sure you test on test tables before applying the query to the main tables.
MERGE INTO main_table m
USING backup_table b
ON (m.primary_key = b.primary_key)
WHEN MATCHED THEN
UPDATE SET m.value = b.value
WHERE m.value = 0;
-- Note: UPDATE ... FROM is SQL Server (T-SQL) syntax, per the tutorial linked below
UPDATE main
SET main.value = back.value
FROM MAIN_TABLE AS main
JOIN BACKUP_TABLE AS back ON main.pk = back.pk
WHERE main.value = 0;
Here is where I found how to do this:
https://chartio.com/resources/tutorials/how-to-update-from-select-in-sql-server/
So, I am a beginner with Oracle and would really appreciate your help.
I have 3 tables: EMPLOYEES, PERSONAL_DATA and RECORDS. I want to create an UPDATE trigger that, when it fires, takes the old values of the EMPLOYEES row, finds that employee's personal data in the PERSONAL_DATA table using the OLD id, and inserts all of that data (the OLD values from EMPLOYEES and the values fetched from PERSONAL_DATA) into the RECORDS table. I have been trying to use a SELECT statement to fetch the information from the PERSONAL_DATA table, but the compiler throws an error.
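One way such a trigger could be written is sketched below. The question does not show the column names or the exact error, so every column here (id, name, employee_id, address, changed_on) is hypothetical, and the employees table is assumed to be named EMPLOYEES. A frequent cause of compile errors in this situation is a bare SELECT without an INTO clause inside PL/SQL; using INSERT ... SELECT avoids that.

```sql
-- Hypothetical column names throughout; adjust to the real table definitions.
CREATE OR REPLACE TRIGGER trg_employees_history
  AFTER UPDATE ON employees
  FOR EACH ROW
BEGIN
  -- Combine the old employee values with the matching personal data
  -- and store them in RECORDS in a single statement.
  INSERT INTO records (employee_id, employee_name, address, changed_on)
  SELECT :OLD.id, :OLD.name, pd.address, SYSDATE
  FROM   personal_data pd
  WHERE  pd.employee_id = :OLD.id;
END;
/
```

If an employee has no row in PERSONAL_DATA, this version inserts nothing; an outer join against DUAL or a separate INSERT would be needed to keep the old employee values in that case.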
I need to update a column (all rows) in a table that has 150 million records.
Creating a duplicate table with the updates applied and dropping the previous table would be the best way, but there is no disk space available to hold the duplicate table.
So how can I perform the update in less time? The table is partitioned.
I am using Oracle 12c.
The cleanest approach is NOT to update the table, but to create a new table that contains the updated column values. For instance, let's say I needed to set a column called old_value to the max of some value. Instead of updating old_table, one does:
create table new_table as select foo, bar, max(old_value) over () as old_value from old_table;
drop table old_table;
rename new_table to old_table;
If you need even more speed, you can create the table using a parallel query with NOLOGGING, thereby generating very little redo and no undo. More details can be found here: https://asktom.oracle.com/pls/asktom/f?p=100:11:0::NO::P11_QUESTION_ID:6407993912330
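A parallel, NOLOGGING version of the create-as-select could look roughly like this. The degree of parallelism (8) is an arbitrary assumption, and the analytic MAX(...) OVER () is just one way to attach the overall maximum to every row. NOLOGGING has no effect if the database runs in FORCE LOGGING mode, and indexes, constraints and grants have to be recreated on the new table.

```sql
-- Build the replacement table with a direct-path, parallel create-as-select.
CREATE TABLE new_table
  NOLOGGING
  PARALLEL 8
AS
SELECT /*+ PARALLEL(o, 8) */
       o.foo,
       o.bar,
       MAX(o.old_value) OVER () AS old_value  -- overall maximum on every row
FROM   old_table o;

-- Restore the usual attributes once the load is done.
ALTER TABLE new_table LOGGING;
ALTER TABLE new_table NOPARALLEL;
```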
I need to update some tables in my application from warehouse tables that are refreshed weekly or biweekly, and my tables should be updated based on those. My tables are referenced by foreign keys in other tables, so I cannot just truncate them and reinsert all the data every time. Instead, I have to take the delta and update accordingly, based on a few primary key columns that don't change. I need some input on how to implement this approach.
My approach:
Check the last updated time of those tables and views.
If it is more recent, compare each row in my table with the warehouse table based on the primary key.
Update each column that is different.
Do nothing if there is no change in the columns.
Insert if there is a new record.
My Question:
How do I implement this? Is writing PL/SQL code a good and efficient way, given that the expected number of records is around 800K?
Please provide any sample code or links.
I would go for PL/SQL with the BULK COLLECT and FORALL method. You can use MINUS in your cursor to calculate the difference and reduce the amount of data to process.
You can check this site for more information about bulk collect, forall and engines: http://www.oracle.com/technetwork/issue-archive/2012/12-sep/o52plsql-1709862.html
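The combination of MINUS, BULK COLLECT and FORALL could be sketched as follows. The table names wh_table and app_table and the columns id, col1 and col2 are hypothetical stand-ins for the real warehouse and application tables, and the batch size of 1,000 is an arbitrary choice.

```sql
DECLARE
  -- Rows that are new or different in the warehouse table.
  CURSOR c_delta IS
    SELECT id, col1, col2 FROM wh_table
    MINUS
    SELECT id, col1, col2 FROM app_table;

  TYPE t_ids  IS TABLE OF app_table.id%TYPE;
  TYPE t_col1 IS TABLE OF app_table.col1%TYPE;
  TYPE t_col2 IS TABLE OF app_table.col2%TYPE;

  l_ids  t_ids;
  l_col1 t_col1;
  l_col2 t_col2;
BEGIN
  OPEN c_delta;
  LOOP
    -- Fetch the delta in batches to limit memory use.
    FETCH c_delta BULK COLLECT INTO l_ids, l_col1, l_col2 LIMIT 1000;
    EXIT WHEN l_ids.COUNT = 0;

    -- Update existing rows and insert new ones, one batch per engine call.
    FORALL i IN 1 .. l_ids.COUNT
      MERGE INTO app_table a
      USING (SELECT l_ids(i) AS id, l_col1(i) AS col1, l_col2(i) AS col2
             FROM   dual) s
      ON (a.id = s.id)
      WHEN MATCHED THEN
        UPDATE SET a.col1 = s.col1, a.col2 = s.col2
      WHEN NOT MATCHED THEN
        INSERT (id, col1, col2) VALUES (s.id, s.col1, s.col2);
  END LOOP;
  CLOSE c_delta;

  COMMIT;
END;
/
```

The single COMMIT at the end keeps the run atomic and avoids fetching across commits; with around 800K rows that is usually manageable, but it is an assumption worth checking against the available undo space.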
There are many parts to your question above and I will answer as best I can:
While it is possible to disable the referencing foreign keys, truncate the table, repopulate it with the updated data and then re-enable the foreign keys, given the requirements you describe I don't believe truncating the table each time is optimal.
Yes, in principle PL/SQL is a good way to achieve what you want, as this is too complex to deal with in native SQL and PL/SQL is an efficient alternative.
Conceptually, the approach I would take is as follows:
Initial set up:
create a sequence called activity_seq
Add an "activity_id" column of type number to your source tables with a unique constraint
Add a trigger to the source table/s setting activity_id = activity_seq.nextval for each insert / update of a table row
create some kind of master table to hold the "last processed activity id" value
Then bi/weekly:
1. retrieve the value of "last processed activity id" from the master table
2. select all rows in the source table/s having activity_id value > "last processed activity id" value
3. iterate through the selected source rows and update the target if a match is found based on whatever your match criterion is, or if no match is found then insert a new row into the target (I assume there is no delete as you do not mention it)
4. on completion, update the master table "last processed activity id" to the greatest value of activity_id for the source rows processed in step 3 above.
(please note that, depending on your environment and the number of rows processed, the above process may need to be split and repeated over a number of transactions)
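The process above could be sketched roughly as follows. All object names (source_table, target_table, sync_master, the columns id, col1 and col2) are hypothetical, and the row-by-row iteration is replaced here by a single MERGE statement, which is a swapped-in technique rather than what the steps literally describe.

```sql
-- One-time setup (hypothetical names throughout).
CREATE SEQUENCE activity_seq;

ALTER TABLE source_table
  ADD (activity_id NUMBER CONSTRAINT uq_source_activity UNIQUE);

CREATE OR REPLACE TRIGGER trg_source_activity
  BEFORE INSERT OR UPDATE ON source_table
  FOR EACH ROW
BEGIN
  -- Direct assignment from a sequence needs Oracle 11g or later.
  :NEW.activity_id := activity_seq.NEXTVAL;
END;
/

CREATE TABLE sync_master (last_activity_id NUMBER NOT NULL);
INSERT INTO sync_master VALUES (0);
COMMIT;

-- Periodic run. Rows that existed before the setup have a NULL
-- activity_id and are not picked up; they need a separate initial load.
DECLARE
  l_last NUMBER;
  l_max  NUMBER;
BEGIN
  -- Step 1: read (and lock) the last processed activity id.
  SELECT last_activity_id INTO l_last FROM sync_master FOR UPDATE;

  -- Fix the upper bound so rows changed during the run are left for next time.
  SELECT NVL(MAX(activity_id), l_last)
  INTO   l_max
  FROM   source_table
  WHERE  activity_id > l_last;

  -- Steps 2 and 3: update matching target rows, insert the rest.
  MERGE INTO target_table t
  USING (SELECT id, col1, col2
         FROM   source_table
         WHERE  activity_id > l_last
         AND    activity_id <= l_max) s
  ON (t.id = s.id)
  WHEN MATCHED THEN
    UPDATE SET t.col1 = s.col1, t.col2 = s.col2
  WHEN NOT MATCHED THEN
    INSERT (id, col1, col2) VALUES (s.id, s.col1, s.col2);

  -- Step 4: remember how far this run got.
  UPDATE sync_master SET last_activity_id = l_max;
  COMMIT;
END;
/
```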
I hope this proves helpful