I'm trying to isolate duplicates in a 500MB database and have tried two ways to do it. The first creates a new table and groups:
CREATE TABLE test_table as
SELECT * FROM items WHERE 1 GROUP BY title;
But it's been running for an hour and in MySQL Admin it says the status is Locked.
The other way I tried was to delete duplicates with this:
DELETE bad_rows.*
FROM items AS bad_rows
INNER JOIN (
    SELECT title, MIN(id) AS min_id
    FROM items
    GROUP BY title
    HAVING COUNT(*) > 1
) AS good_rows
    ON good_rows.title = bad_rows.title
    AND good_rows.min_id <> bad_rows.id;
...and this one has been running for 24 hours now, with Admin telling me its status is Sending data...
Do you think either of these queries is actually still running? How can I find out if it's hung? (I'm on Apple OS X 10.5.7.)
You can do this:
alter ignore table items add unique index(title);
This will add a unique index and at the same time remove any duplicates, which will prevent any future duplicates from occurring. Make sure you do a backup before running this command.
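On the "is it hung" question: MySQL can tell you what each connection is doing. A quick check (the thread id in KILL is a placeholder):
SHOW FULL PROCESSLIST;   -- one row per connection: its state and how long the current statement has run
KILL 12345;              -- if a thread is genuinely stuck, kill it by the Id shown above
A state of Locked typically means the statement is waiting for a table lock held by another statement, so the CREATE TABLE ... GROUP BY may simply be blocked behind the long-running DELETE rather than hung.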
Could you recommend approaches that would make the MERGE operation in SQL work faster?
I believe the question is about knowledge and experience and should not be considered opinion based, since anything that makes the operation faster is appropriate for the question, and the faster the operation becomes, the better the answer.
In my particular case I have approximately 1.7 million records, which I fetch in a recurring job and use to update the existing records. To lock the real table ([LegalContractors]) as little as possible, I use a temporary table ([LegalContractorTemps]) into which I add all the records from C# (not SQL) code, and after that I run the MERGE.
Here is what I am trying:
DELETE FROM [dbo].[LegalContractorTemps] WHERE [Code] IS NULL;
DELETE FROM [dbo].[LegalContractorTemps]
WHERE [Id] IN (
SELECT [Id]
FROM [dbo].[LegalContractorTemps] [Temp]
JOIN (
SELECT [Code], [Status], MAX([Id]) as [MaxId]
FROM [dbo].[LegalContractorTemps]
GROUP BY [Code], [Status]
HAVING COUNT([Id]) > 1
) [TempGroup]
ON ([Temp].[Code] = [TempGroup].[Code] AND [Temp].[Status] = [TempGroup].[Status] AND [MaxId] != [Id])
);
CREATE UNIQUE INDEX [CodeStatus]
ON [dbo].[LegalContractorTemps] ([Code], [Status]);
SELECT GETDATE() AS [beginTime];
MERGE [dbo].[LegalContractors] AS TblTarget
USING [dbo].[LegalContractorTemps] AS TblSource
ON (TblSource.[Code] = TblTarget.[Code] AND TblSource.[Status] = TblTarget.[Status])
WHEN NOT MATCHED BY TARGET THEN
INSERT ([Code], [ShortName], [Name], [LegalAddress], [Status], [LastModified])
VALUES (TblSource.[Code], TblSource.[ShortName], TblSource.[Name], TblSource.[LegalAddress], TblSource.[Status], GETDATE())
WHEN MATCHED AND
(TblTarget.[ShortName] != TblSource.[ShortName] OR
TblTarget.[Name] != TblSource.[Name] OR
TblTarget.[LegalAddress] != TblSource.[LegalAddress]) THEN
UPDATE SET
TblTarget.[ShortName] = TblSource.[ShortName],
TblTarget.[Name] = TblSource.[Name],
TblTarget.[LegalAddress] = TblSource.[LegalAddress],
TblTarget.[LastModified] = GETDATE()
WHEN NOT MATCHED BY SOURCE THEN
DELETE;
SELECT GETDATE() AS [endTime];
DROP INDEX [CodeStatus] ON [dbo].[LegalContractorTemps];
Right now the code shown above runs in approximately 2 minutes.
I found this answer, but I was not able to apply it to my case, because I need the WHEN NOT MATCHED clause and will have to perform a full scan anyway (whether or not I use the MERGE).
I would consider doing a modified flush and fill, rather than doing a MERGE at all.
The method I've had the most success with uses partition switching. You build three identical tables: the main table that your users pull from, a staging table that you use for applying CRUD operations, and a holding table that you'll use only during the transition period after your updates.
This will require a little re-tooling to shift your LastModified logic right into the CRUD operations you're performing during your updates.
Then, after the staging table is ready for prime time, truncate yesterday's copy of the holding table. Next, switch the data from the main table to the now-empty holding table. Switch the data from staging to main. Probably wrap all of that in an explicit transaction.
Boom. Your table is up-to-date. And you have a backup copy of yesterday's data in the holding table, just in case.
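A minimal sketch of that switch sequence, assuming three tables with identical schemas, indexes, and filegroups (which ALTER TABLE ... SWITCH requires); the staging and holding table names here are hypothetical:
BEGIN TRANSACTION;
    TRUNCATE TABLE [dbo].[LegalContractorsHolding];      -- discard yesterday's copy
    ALTER TABLE [dbo].[LegalContractors]
        SWITCH TO [dbo].[LegalContractorsHolding];       -- main -> holding (metadata-only)
    ALTER TABLE [dbo].[LegalContractorsStaging]
        SWITCH TO [dbo].[LegalContractors];              -- staging -> main (metadata-only)
COMMIT TRANSACTION;
Because SWITCH only updates metadata, the main table is unavailable for an instant regardless of row count.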
Tons of additional detail in these articles:
Comparison: Switching Tables vs. sp_rename
Why You Should Switch in Staging Tables Instead of Renaming Them
I need to update a table once per period; however, periods are either 4 or 5 weeks long.
To schedule a SQL query to run this process, I have been attempting to use a CREATE TABLE statement as the result of a successful WHEN clause.
Is this possible? I am not getting anywhere and cannot find anything that states one way or the other.
e.g.
select
case when 1=1 then
(create table db.case_test_1 as
select 'Y' as test)
end as test
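For what it's worth: CASE is an expression, not a statement, so DDL cannot appear inside it in any mainstream dialect. The usual workaround is to make the DDL conditional in procedural code. A minimal sketch, assuming a T-SQL-style dialect (the question's dialect is not stated; db.case_test_1 is kept from the example):
IF 1 = 1
BEGIN
    -- SELECT ... INTO is the create-table-as-select equivalent here
    SELECT 'Y' AS test
    INTO db.case_test_1;
END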
I've got a SQLite database (first project with SQLite) where I need to delete a bulk of records at once, around 14,000 records. The query is as follows (I changed the names for readability):
delete from table_1 where table_2_id in (
select id from table_2 where table_3_id in (
select id from table_3
where (deleted = 1 or
table_4_id in (select id from table_4 where deleted = 1))));
This query takes about 8 minutes to delete. But when I do
select * from table_1 where table_2_id in (
select id from table_2 where table_3_id in (
select id from table_3
where (deleted = 1 or
table_4_id in (select id from table_4 where deleted = 1))));
it gives me a result in 3 seconds.
I tried using a transaction, changing the cache size, and changing the journal mode, but I cannot get better performance. What am I missing?
I had the same problem. The solution was to divide it into many smaller chunks, and run them again and again until no more rows are affected (sqlite3_changes() returns zero).
Of course the operation does not complete any sooner this way, but the table is not locked continuously for too long.
Hope this helps someone.
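A sketch of one chunked pass (assuming table_1 is an ordinary rowid table), deleting at most 1000 rows per statement; the host code re-runs it until sqlite3_changes() reports zero:
DELETE FROM table_1
WHERE rowid IN (
    SELECT rowid FROM table_1
    WHERE table_2_id IN (
        SELECT id FROM table_2 WHERE table_3_id IN (
            SELECT id FROM table_3
            WHERE deleted = 1
               OR table_4_id IN (SELECT id FROM table_4 WHERE deleted = 1)))
    LIMIT 1000
);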
The database was 30MB, and I also turned off the virus scanner, but without any result. So I tried multiple things: I copied the database, deleted all foreign keys, tried again, and it was much faster. Then I put an index on all foreign key columns, and the delete was fast as well. So that was the solution, though I do not know why the select was fast while the delete was super slow. But I hope this helps anyone!
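A likely explanation for the select/delete asymmetry: when foreign keys are enforced, each deleted row can trigger lookups in the related tables (to enforce the constraints and any cascades), and without indexes on the foreign-key columns each lookup is a full scan; a plain SELECT never pays that cost. The indexes would look something like this (column names taken from the query above):
CREATE INDEX idx_table_1_table_2_id ON table_1(table_2_id);
CREATE INDEX idx_table_2_table_3_id ON table_2(table_3_id);
CREATE INDEX idx_table_3_table_4_id ON table_3(table_4_id);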
Do you have an index on any of the columns involved? If so, consider dropping it for a large delete and then rebuilding it. If you don't, try adding one.
I have a Python script to delete entries in a SQLite DB; lfl is a list of files that I want to delete from the DB. The IN statement speeds up the process a lot:
# Delete in chunks of 500; parameter placeholders avoid quoting problems in file names.
while lfl:
    print("Deleting entries in DB:", len(lfl))
    chunk = lfl[:500]
    placeholders = ",".join("?" * len(chunk))
    cursor.execute("DELETE FROM md5t WHERE file IN (%s)" % placeholders, chunk)
    db.commit()
    del lfl[:500]
I was trying to get the count from a table with millions of entries. My query looks somewhat like this:
Select count(*)
from Users
where status = 'A' and office_id = '000111' and user_type = 'C'
Status can be A or C, and user type can be C or R.
Status, office_id, and user_type are all strings.
Around 10 million rows match the conditions, and the query is taking a lot of time. I just want the total count.
I would appreciate it if anyone could tell me why it's taking this much time, and whether there is a workaround.
Do let me know in case of any more details required.
The database engine is Oracle 11g
Edit: I added an index for all three columns. Still there's no improvement. I also tried the query below, but it always returns the total count of the table without checking the conditions.
SELECT COUNT(office_id_key)
FROM Users
WHERE EXISTS (SELECT * FROM Users WHERE status = 'A' AND office_id = '000111' AND user_type = 'C')
Why not simply create indexes on the table on age and place? That way your search will be faster than scanning the entire table for these values.
CREATE INDEX age_index ON Employee(age);
CREATE INDEX place_index ON Employee(place);
This should speed up the process.
AMENDED BASED ON QUERY CHANGE
CREATE INDEX status_index ON Users(status);
CREATE INDEX office_id_index ON Users(office_id);
CREATE INDEX user_type_index ON Users(user_type);
You'll want to create the following multi-column index on the Users table to improve the query:
(office_id, status, user_type)
The database can use a "covering" index to satisfy COUNT(*) without touching the table itself. Create the index with the columns in that order, putting the most selective (highest-cardinality) column first.
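Concretely, that could look like the following (the index name is illustrative):
CREATE INDEX users_office_status_type_ix ON Users (office_id, status, user_type);
Since every column the query references is in the index, Oracle can satisfy the count from the index alone, without visiting the table.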
After adding the indexes, I think changing WHERE to WHERE EXISTS with a subquery may help as well.
Edit 2: I removed EXISTS, as it was returning everything as valid; usually the subquery has multiple joins, but I guess the single-table case returns all true. I read that COUNT is optimized to act similarly to EXISTS when it has only one table and no WHERE clause, so I treat the subquery results as a table. Hopefully this gives the same quick results.
select count(1) from
(select 1 from Employee where age = '25' and place = 'bricksgate')
Edit: When you use WHERE EXISTS, the DB server doesn't load your data into memory, and it also takes advantage of the indexes, because you will be reading values from the indexes rather than doing costly table lookups. You may also want to change COUNT(*) to COUNT(place), so that it only touches an indexed field.
In your original query, the data was fetched via table lookups and loaded into memory just to be counted.
count(1) works faster than count(*)
I am getting
ORA-30926: unable to get a stable set of rows in the source tables
in the following query:
MERGE INTO table_1 a
USING
(SELECT a.ROWID row_id, 'Y'
FROM table_1 a ,table_2 b ,table_3 c
WHERE a.mbr = c.mbr
AND b.head = c.head
AND b.type_of_action <> '6') src
ON ( a.ROWID = src.row_id )
WHEN MATCHED THEN UPDATE SET in_correct = 'Y';
I've checked table_1 and it has data, and I've also run the inner query (src) on its own, which also returns data.
Why does this error occur, and how can it be resolved?
This is usually caused by duplicates in the query specified in the USING clause. This probably means that table_1 is a parent table and the same ROWID is returned several times.
You could quickly solve the problem by using a DISTINCT in your query (in fact, if 'Y' is a constant value you don't even need to put it in the query).
Assuming your query is correct (don't know your tables) you could do something like this:
MERGE INTO table_1 a
USING
(SELECT DISTINCT a.ROWID row_id
FROM table_1 a ,table_2 b ,table_3 c
WHERE a.mbr = c.mbr
AND b.head = c.head
AND b.type_of_action <> '6') src
ON ( a.ROWID = src.row_id )
WHEN MATCHED THEN UPDATE SET in_correct = 'Y';
You're probably trying to update the same row of the target table multiple times. I just encountered the same problem in a MERGE statement I developed. Make sure your update does not touch the same record more than once in the execution of the MERGE.
A further clarification to the use of DISTINCT to resolve error ORA-30926 in the general case:
You need to ensure that the set of data specified by the USING() clause has no duplicate values of the join columns, i.e. the columns in the ON() clause.
In OP's example where the USING clause only selects a key, it was sufficient to add DISTINCT to the USING clause. However, in the general case the USING clause may select a combination of key columns to match on and attribute columns to be used in the UPDATE ... SET clause. Therefore in the general case, adding DISTINCT to the USING clause will still allow different update rows for the same keys, in which case you will still get the ORA-30926 error.
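A hypothetical illustration (every name below is invented): two distinct source rows share the key value 1 but carry different attribute values, so DISTINCT keeps both and the MERGE still fails:
MERGE INTO target t
USING (SELECT DISTINCT key_col, attr_col FROM source) s  -- rows (1,'A') and (1,'B') both survive DISTINCT
ON (t.key_col = s.key_col)
WHEN MATCHED THEN UPDATE SET t.attr_col = s.attr_col;    -- key 1 would be updated twice: ORA-30926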
This is an elaboration of DCookie's answer and point 3.1 in Tagar's answer, which from my experience may not be immediately obvious.
How to Troubleshoot ORA-30926 Errors? (Doc ID 471956.1)
1) Identify the failing statement
alter session set events '30926 trace name errorstack level 3';
or, to turn the event off again:
alter system set events '30926 trace name errorstack off';
and watch for .trc files in UDUMP when it occurs.
2) Having found the SQL statement, check if it is correct (perhaps using explain plan or tkprof to check the query execution plan) and analyze or compute statistics on the tables concerned if this has not recently been done. Rebuilding (or dropping/recreating) indexes may help too.
3.1) Is the SQL statement a MERGE?
Evaluate the data returned by the USING clause to ensure that there are no duplicate values in the join columns. Modify the MERGE statement to include a deterministic WHERE clause (see the ROW_NUMBER sketch after this list).
3.2) Is this an UPDATE statement via a view?
If so, try populating the view result into a table and try updating the table directly.
3.3) Is there a trigger on the table? Try disabling it to see if it still fails.
3.4) Does the statement contain a non-mergeable view in an 'IN-Subquery'? This can result in duplicate rows being returned if the query has a "FOR UPDATE" clause. See Bug 2681037
3.5) Does the table have unused columns? Dropping these may prevent the error.
4) If modifying the SQL does not cure the error, the issue may be with the table, especially if there are chained rows.
4.1) Run the 'ANALYZE TABLE VALIDATE STRUCTURE CASCADE' statement on all tables used in the SQL to see if there are any corruptions in the table or its indexes.
4.2) Check for, and eliminate, any CHAINED or migrated ROWS on the table. There are ways to minimize this, such as the correct setting of PCTFREE.
Use Note 122020.1 - Row Chaining and Migration
4.3) If the table is additionally Index Organized, see:
Note 102932.1 - Monitoring Chained Rows on IOTs
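For step 3.1, one common way to make the USING set deterministic is to keep exactly one row per join key with ROW_NUMBER(); this is only a sketch, and every name in it (target, source, key_col, attr_col, updated_at) is a placeholder:
MERGE INTO target t
USING (
    SELECT key_col, attr_col
    FROM (
        SELECT key_col, attr_col,
               ROW_NUMBER() OVER (PARTITION BY key_col ORDER BY updated_at DESC) rn
        FROM source
    )
    WHERE rn = 1   -- exactly one source row per key
) s
ON (t.key_col = s.key_col)
WHEN MATCHED THEN UPDATE SET t.attr_col = s.attr_col;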
Had the error today on a 12c database, and none of the existing answers fit (no duplicates, no non-deterministic expressions in the WHERE clause). My case was related to the other possible cause of the error, according to Oracle's message text:
ORA-30926: unable to get a stable set of rows in the source tables
Cause: A stable set of rows could not be got because of large dml activity or a non-deterministic where clause.
The merge was part of a larger batch, and was executed on a live database with many concurrent users. There was no need to change the statement. I just committed the transaction before the merge, then ran the merge separately, and committed again. So the solution was found in the suggested action of the message:
Action: Remove any non-deterministic where clauses and reissue the dml.
SQL Error: ORA-30926: unable to get a stable set of rows in the source tables
30926. 00000 - "unable to get a stable set of rows in the source tables"
*Cause: A stable set of rows could not be got because of large dml
activity or a non-deterministic where clause.
*Action: Remove any non-deterministic where clauses and reissue the dml.
This error occurred for me because of duplicate records (16K of them).
I tried it with a unique constraint, and it worked.
But when I tried the merge again without the unique constraint, the same problem occurred.
The second time it was due to commit: if no commit is done after the merge, the same error is shown.
Without the unique constraint, the query works if a commit is issued after each merge operation.
I was not able to resolve this after several hours. Eventually I just did a select with the two tables joined, created an extract, and created individual SQL update statements for the 500 rows in the table. Ugly, but it beats spending hours trying to get a query to work.
As someone explained earlier, your MERGE statement probably tries to update the same row more than once, and that does not work (it would cause ambiguity).
Here is one simple example: a MERGE that tries to mark some products as found when they match given search patterns:
CREATE TABLE patterns(search_pattern VARCHAR2(20));
INSERT INTO patterns(search_pattern) VALUES('Basic%');
INSERT INTO patterns(search_pattern) VALUES('%thing');
CREATE TABLE products (id NUMBER,name VARCHAR2(20),found NUMBER);
INSERT INTO products(id,name,found) VALUES(1,'Basic instinct',0);
INSERT INTO products(id,name,found) VALUES(2,'Basic thing',0);
INSERT INTO products(id,name,found) VALUES(3,'Super thing',0);
INSERT INTO products(id,name,found) VALUES(4,'Hyper instinct',0);
MERGE INTO products p USING
(
SELECT search_pattern FROM patterns
) o
ON (p.name LIKE o.search_pattern)
WHEN MATCHED THEN UPDATE SET p.found=1;
SELECT * FROM products;
If the patterns table contains the Basic% and Super% patterns, then the MERGE works and the first three products are updated (found). But if the patterns table contains the Basic% and %thing patterns, then the MERGE does NOT work, because it would try to update the second product twice, and this causes the problem. MERGE does not work if some records would be updated more than once. You might ask: why not just update it twice!?
Here the first update and the second update would happen to write the same value, but only by accident. Now look at this scenario:
CREATE TABLE patterns(code CHAR(1),search_pattern VARCHAR2(20));
INSERT INTO patterns(code,search_pattern) VALUES('B','Basic%');
INSERT INTO patterns(code,search_pattern) VALUES('T','%thing');
CREATE TABLE products (id NUMBER,name VARCHAR2(20),found CHAR(1));
INSERT INTO products(id,name,found) VALUES(1,'Basic instinct',NULL);
INSERT INTO products(id,name,found) VALUES(2,'Basic thing',NULL);
INSERT INTO products(id,name,found) VALUES(3,'Super thing',NULL);
INSERT INTO products(id,name,found) VALUES(4,'Hyper instinct',NULL);
MERGE INTO products p USING
(
SELECT code,search_pattern FROM patterns
) s
ON (p.name LIKE s.search_pattern)
WHEN MATCHED THEN UPDATE SET p.found=s.code;
SELECT * FROM products;
Now the first product's name matches the Basic% pattern and it will be updated with code B, but the second product matches both patterns and cannot be updated with both codes B and T at the same time (ambiguity)!
That's why the DB engine complains. Don't blame it! It knows what it is doing! ;-)
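If each product genuinely may match several patterns, one way out is to rank the matches and keep at most one per product. A sketch against the tables above (the winning pattern is picked arbitrarily, by alphabetical code):
MERGE INTO products p USING
(
    SELECT id, code
    FROM (
        SELECT pr.id, pa.code,
               ROW_NUMBER() OVER (PARTITION BY pr.id ORDER BY pa.code) rn
        FROM products pr
        JOIN patterns pa ON pr.name LIKE pa.search_pattern
    )
    WHERE rn = 1   -- at most one pattern per product
) s
ON (p.id = s.id)
WHEN MATCHED THEN UPDATE SET p.found = s.code;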