Deleting very large table records where id not in another table - performance

I have a table values with 80 million records, and another table values_history with 250 million records.
I want to filter the values_history table and keep only the rows whose id is present in the values table.
delete from values_history where id not in (select id from values);
This query takes so long that I have to abort the process.
Any ideas to speed it up?
Can I delete the records in batches, say 1,000,000 at a time?
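For the batching idea, a minimal sketch, assuming MySQL/MariaDB since the dialect isn't stated (NOT EXISTS replaces the NOT IN subquery, LIMIT caps each pass, and values is quoted because VALUES is a reserved word):

delete from values_history
where not exists (select 1 from `values` v where v.id = values_history.id)
limit 1000000;
-- repeat until the statement affects 0 rows, committing between passes

Each pass removes at most one million orphaned rows, so the transaction and its undo stay bounded.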

I extracted the required records and inserted them into a temp table, which took 2 hours. After that I dropped the table, then inserted the extracted data back into the main table; the whole process took around 4 hours, which is fine for me. I had dropped the foreign keys and all other constraints beforehand.
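In outline, that copy-and-swap approach looks like this sketch (table names as in the question; assuming MySQL/MariaDB, and assuming constraints and secondary indexes have already been dropped as described):

create table values_history_keep as
select vh.*
from values_history vh
join `values` v on v.id = vh.id;   -- keep only rows whose id still exists

truncate table values_history;     -- fast: resets the table rather than deleting row by row

insert into values_history select * from values_history_keep;
drop table values_history_keep;

The join touches each table once, which is typically far cheaper than evaluating a NOT IN subquery against 80 million ids.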

Related

MySQL Workbench shows running query but query not in process list, eventually times out at 7200 seconds

Purpose: Remove duplicate records from a large table.
My Process:
Create Table 2 with 9 fields, no indexes, and the same data types per field as Table 1.
Insert those 9 fields, for all records, into Table 2 from the existing Table 1.
Table 1 contains 71+ million rows, 232 columns, and many duplicate records.
No joins. No WHERE clause.
Table 1 contains several indexes.
8 fields are required to get unique records.
I'm trying to set up a process to de-dup large tables, using DENSE_RANK partitioning to identify the most recently entered duplicate. Thus, those 8 required fields from Table 1, plus the auto-increment from Table 1, are loaded into Table 2.
Version: 10.5.17 MariaDB
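The Table 2 load looks roughly like this sketch (col1..col8 are stand-ins for the real field names, id for the auto-increment; MariaDB 10.2+ supports window functions):

INSERT INTO table2 (id, col1, col2, col3, col4, col5, col6, col7, col8, dr)
SELECT id, col1, col2, col3, col4, col5, col6, col7, col8,
       DENSE_RANK() OVER (PARTITION BY col1, col2, col3, col4, col5, col6, col7, col8
                          ORDER BY id DESC) AS dr
FROM table1;

Rows with dr = 1 are the most recently entered copy of each duplicate group.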
The next steps would be:
Create new Table 3 identical to table 1 but with no indexes.
Load all data from Table 1 into Table 3, joining Table 1 to Table 2 on the auto-increment fields, where Table 2's Dense_Rank field value = 1 (see the sketch after this list). This inserts ~17 million unique records.
Drop any existing Foreign_Keys related to Table 1
Truncate Table 1
Insert all records from Table 3 into Table 1
Nullify columns in related tables where the foreign key values in Table 1 no longer exist
re-create Foreign Keys that had been dropped.
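The load/truncate/reload steps in sketch form (same stand-in names as above):

INSERT INTO table3
SELECT t1.*
FROM table1 t1
JOIN table2 t2 ON t2.id = t1.id
WHERE t2.dr = 1;          -- only the most recently entered copy of each duplicate group

TRUNCATE TABLE table1;    -- after the dependent foreign keys have been dropped

INSERT INTO table1 SELECT * FROM table3;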
Creating a test instance of an existing system, I can accomplish everything I need - but only the first time. If I then drop Table 2 before refreshing Table 1 as outlined immediately above, re-create it, and try to reload it, Workbench shows the query running until the 7200-second timeout.
While the insert into Table 2 is running, opening a second instance of Workbench and selecting a count of records in Table 2 after 15 minutes gives me the 71+ million records I'm looking for, but Workbench continues running until the timeout.
The query shows up in SHOW PROCESSLIST for those 15 minutes, but disappears around the 15-minute mark - presumably once all records are loaded.
I have tried running with timeouts set to 0 as well as 86,400 seconds (no read timeout and a 24-hour timeout, respectively), but the query still times out at 7200.0xx seconds, i.e. 2 hours, every time.
The exact error message I get is: Error Code: 2013. Lost connection to MySQL server during query 7200.125 sec
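To rule out a limit enforced by the server rather than by Workbench, the relevant variables can be checked from any client (standard MySQL/MariaDB variables; max_statement_time is MariaDB-specific, 0 meaning unlimited):

SHOW GLOBAL VARIABLES LIKE '%timeout%';
SHOW GLOBAL VARIABLES LIKE 'max_statement_time';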
I have tried running the insert statement with COMMIT and without.
This is being done in a Test Instance set up for this development where I am the only user, and only a single table is in use during the insert process.
Finding one idea online, I ran the following suggested query to identify locked tables, but got an error message saying the table does not exist:
SELECT TRX_ID, TRX_REQUESTED_LOCK_ID, TRX_MYSQL_THREAD_ID, TRX_QUERY
FROM INNODB_TRX
and, of course, with only a single table being called by a single user in the system nothing should be locked.
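For what it's worth, the 'table does not exist' error is most likely because INNODB_TRX lives in the information_schema database; the schema-qualified form should run:

SELECT TRX_ID, TRX_REQUESTED_LOCK_ID, TRX_MYSQL_THREAD_ID, TRX_QUERY
FROM information_schema.INNODB_TRX;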
As noted above, I can complete the entire process a single time. But when I try to re-run it up to the point just before truncating Table 1, so I can start over, I consistently fail: Table 2 never gets released after being loaded again.
The reason it is important for me to test a second iteration is that once this process is successful it will be applied to several database instances that were not set up just for testing it, and if it only works on a newly created database instance that has had no other processing performed, it may not be dependable.

Update 62 Million Records in Oracle

I have to update 62 million records in a production database. It's a simple update statement.
It's a pretty big table: the total record count is 1,251,797,271.
Can I use the BULK COLLECT approach for updating the records?
Please let me know what the best approach is.
The update statement looks like this:
UPDATE CASHFLOW_HIST
SET EFF_DT = '03-JAN-2019'
WHERE EFF_DT= '01-JAN-2019'
Note: I'm not looking for the approach of creating a new table, dropping the original table, and renaming the new table to the original name, instead of updating a table with millions of records.
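For reference, a minimal BULK COLLECT / FORALL sketch for a batched update (assuming EFF_DT is a DATE column; committing per batch keeps undo bounded, but can expose the open cursor to ORA-01555, so treat this as a starting point rather than a definitive implementation):

DECLARE
  CURSOR c IS
    SELECT rowid FROM cashflow_hist WHERE eff_dt = DATE '2019-01-01';
  TYPE t_rids IS TABLE OF ROWID;
  l_rids t_rids;
BEGIN
  OPEN c;
  LOOP
    FETCH c BULK COLLECT INTO l_rids LIMIT 50000;  -- one batch per pass
    EXIT WHEN l_rids.COUNT = 0;
    FORALL i IN 1 .. l_rids.COUNT
      UPDATE cashflow_hist
         SET eff_dt = DATE '2019-01-03'
       WHERE rowid = l_rids(i);
    COMMIT;
  END LOOP;
  CLOSE c;
END;
/

A single set-based UPDATE is usually faster if the undo tablespace can absorb it; the batched form mainly helps when undo or lock duration is the constraint.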

PL/SQL: Daily record of changes on a table, then select from a given day

Oracle PL/SQL question: one table should be archived day by day. The table holds about 50,000 records, but only a few records change during a day. A second table (the destination/history table) has one additional field, import_date. Two full daily copies would mean 100,000 records; it should instead be 50,000 plus the few records carrying information about the changes made during the day.
I need a simple solution to copy data from the source table to the destination like a log - only changes are copied/registered. But I should still be able to check the source table's dataset as of a given day.
Is there a mechanism like MERGE for this?
Normally you'd have a day_table and a master_table. All records are loaded from the day_table into the master, and only the master is manipulated, with the day table used to store the raw data.
You could add a new column to the master, such as date_modified, and have the app update this field when a record changes, or use a flag to indicate that it has changed.
Another way to do this is to have an active/latest flag. Instead of changing the record, it is duplicated, with the flag set to indicate which is the newer and which the old record. This might be easier for comparison,
e.g. select * from master_table where record = 'abcd'
This would show 2 rows - the original loaded at 1pm and the modified active one changed at 2pm.
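In sketch form, the write path for that pattern (the payload column name is a stand-in):

UPDATE master_table
SET flag = 'N'
WHERE record = 'abcd' AND flag = 'Y';  -- retire the current version

INSERT INTO master_table (record, payload, flag, date_modified)
VALUES ('abcd', 'new value', 'Y', SYSDATE);   -- add the new active version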
There's no need to have another table; you could then base a view on this flag,
e.g. CREATE VIEW changed_records_view AS SELECT * FROM master_table WHERE flag = 'Y';
I once faced a similar issue; the solution we used is below.
Tables we had:
A master table, which always has records and keeps growing.
A backup table, to store all the master records on a daily basis.
Solution:
From morning to evening, records are inserted and updated in the master table. New and changed records were identified by a timestamp: whenever a record is inserted or updated, the corresponding timestamp column is set.
At night, a scheduled job (see DBMS_SCHEDULER.CREATE_JOB in the Oracle documentation) runs a procedure at exactly 10:00 pm to bulk collect all the records in the master table for today's date and insert them into the backup table.
This scenario should fit your case as well; the job scheduling concept is worth reading up on.
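A sketch of such a nightly job, assuming stand-in names master_table and backup_table, a last_modified timestamp column, and that backup_table has the master's columns plus a trailing import_date:

BEGIN
  DBMS_SCHEDULER.CREATE_JOB(
    job_name        => 'ARCHIVE_MASTER_JOB',
    job_type        => 'PLSQL_BLOCK',
    job_action      => 'BEGIN
                          INSERT INTO backup_table
                          SELECT m.*, SYSDATE FROM master_table m
                          WHERE m.last_modified >= TRUNC(SYSDATE);
                          COMMIT;
                        END;',
    start_date      => SYSTIMESTAMP,
    repeat_interval => 'FREQ=DAILY;BYHOUR=22;BYMINUTE=0',  -- 10:00 pm daily
    enabled         => TRUE);
END;
/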

Creating a record history table - How do I create a record on creation?

For a project, I want to have a "History" table for my records. I have two tables for this (example) system:
RECORDS
  ID
  NAME
  CREATE_DATE
RECORDS_HISTORY
  ID
  RECORDS_ID
  LOG_DATE
  LOG_TYPE
  MESSAGE
When I insert a record into RECORDS, how can I automatically create an associated entry in RECORDS_HISTORY where RECORDS_ID is equal to the newly inserted ID in RECORDS?
I currently have a sequence on the ID in RECORDS to automatically increment when a new row is inserted, but I am unsure how to prepopulate a record in RECORDS_HISTORY that will look like this for each newly created (not updated) record.
INSERT INTO RECORDS_HISTORY (RECORDS_ID, LOG_DATE, LOG_TYPE, MESSAGE) VALUES (<records.id>, sysdate(), 'CREATED', 'Record created')
How can I create this associated _HISTORY record on creation?
You didn't mention the DB you are working with; I assume it's Oracle. The most obvious answer is: use an on-insert trigger. You can even get the ID (from the sequence) back from the insert statement into table RECORDS. Disadvantages of this solution: triggers are somewhat 'hidden' code, they can slow down massive inserts, and you consume roughly double the disk space by storing partially redundant data. What if RECORDS gets updated or deleted? Can that happen, and do you have to take care of it as well? The big question is: what is your goal?
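A minimal sketch of such a trigger, assuming Oracle and a hypothetical sequence records_history_seq for the history table's own ID:

CREATE OR REPLACE TRIGGER trg_records_after_insert
AFTER INSERT ON records
FOR EACH ROW
BEGIN
  INSERT INTO records_history (id, records_id, log_date, log_type, message)
  VALUES (records_history_seq.NEXTVAL, :NEW.id, SYSDATE, 'CREATED', 'Record created');
END;
/

:NEW.id carries the value the RECORDS sequence assigned to the freshly inserted row, so the history row is linked automatically.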
There are proved historisation concepts around. Have a look at this: https://en.wikipedia.org/wiki/Slowly_changing_dimension

ORACLE Table Loading Speed

This is a new issue that I haven't run into before.
I have a table that at one point contained over 100k records; it's an event log for a dev environment.
It took up to 10 seconds to load the table (simply clicking on it to view the data in the table).
I removed all but 30 rows and it still takes 7 seconds to load.
I'm using Toad, and it gives me a dialog box that says "Statement Processing..."
Any ideas?
The following are some select statements and how long they took:
select * from log;                      -- 21 rows in 10 sec
select * from log where id = 120000;    -- 1 row in 1 msec
select * from log where user = 35000;   -- 9 rows in 7 sec
The id is the PK; there is no index on the user field.
I have a view containing all of the fields, sitting on top of this table, and it runs just as slowly.
If you issue a "select * from event_log_table", then you are scanning the entire table with a full table scan. It has to scan through all allocated segments to see if there are rows in there. If your table once contained over 100K rows, then it has allocated at least the amount of space to be able to hold those 100K+ rows. Please see: http://download.oracle.com/docs/cd/B19306_01/server.102/b14231/schema.htm#sthref2100
Now if you delete rows, the space is still allocated to this table, and Oracle still has to scan all space. It works like a high water mark.
To reduce the high water mark, you can issue a TRUNCATE TABLE command, which resets the high water mark. But then you'll lose ALL rows.
And there is an option to shrink the space in the table. You can read about it and its preconditions here:
http://download.oracle.com/docs/cd/B19306_01/server.102/b14200/statements_3001.htm#sthref5117
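In short, the shrink option looks like this (row movement is one of the preconditions; see the linked docs for the rest):

ALTER TABLE log ENABLE ROW MOVEMENT;
ALTER TABLE log SHRINK SPACE;

Unlike TRUNCATE, this lowers the high water mark while keeping the remaining rows.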
Regards,
Rob.
I would understand this better if you had started off with a 100M-record table. But just in case, try gathering optimizer statistics. If that doesn't help, drop and recreate the indexes on that table.
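Gathering the statistics is a one-liner (table name as in the question):

BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(ownname => USER, tabname => 'LOG');
END;
/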
