SQLite Delete Slow - performance

I have a SQLite database (my first project with SQLite) where I need to delete a bulk of records at once, about 14,000 records. The query is as follows (I changed the names for readability):
delete from table_1 where table_2_id in (
select id from table_2 where table_3_id in (
select id from table_3
where (deleted = 1 or
table_4_id in (select id from table_4 where deleted = 1))));
This query takes about 8 minutes to delete. But when I do
select * from table_1 where table_2_id in (
select id from table_2 where table_3_id in (
select id from table_3
where (deleted = 1 or
table_4_id in (select id from table_4 where deleted = 1))));
it gives me a result in 3 seconds.
I tried using a transaction, tuning the cache size, and changing the journal mode, but I cannot get any better performance. What am I missing?

I had the same problem. The solution was to divide the delete into many smaller chunks, and run it again and again until no more rows are affected (sqlite3_changes() returns zero).
Of course the operation does not complete any sooner this way, but the table is not locked continuously for too long.
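A minimal sketch of that chunked approach against the schema from the question (the 1000-row batch size is arbitrary, and this assumes table_1 is an ordinary rowid table):
DELETE FROM table_1
WHERE rowid IN (
    -- same filter as the original delete, capped to one batch per statement;
    -- re-run until sqlite3_changes() reports zero affected rows
    SELECT rowid FROM table_1
    WHERE table_2_id IN (
        SELECT id FROM table_2 WHERE table_3_id IN (
            SELECT id FROM table_3
            WHERE deleted = 1
               OR table_4_id IN (SELECT id FROM table_4 WHERE deleted = 1)))
    LIMIT 1000
);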
Hope this helps someone.

The database was 30 MB and I also turned off the virus scanner, but without any result. So I tried multiple things: I copied the database, deleted all foreign keys, and tried again, and it was much faster. Then instead I put an index on all the foreign-key columns, and the delete was also fast. So that was the solution, but I do not know why the select is fast while the delete is super slow. Hope this helps anyone!
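For reference, the fix amounts to something like this (the index names are made up, one per foreign-key column; the exact set depends on the real schema):
CREATE INDEX idx_table_1_table_2_id ON table_1(table_2_id);
CREATE INDEX idx_table_2_table_3_id ON table_2(table_3_id);
CREATE INDEX idx_table_3_table_4_id ON table_3(table_4_id);
A plausible explanation: with foreign key enforcement on, every deleted row triggers lookups in the tables that reference it, and without indexes on those foreign-key columns each lookup is a full table scan; a plain SELECT does none of that per-row checking.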

Do you have an index on any of the columns involved? If so, consider dropping it for a large delete and then rebuilding it. If you don't, try adding one.
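The drop-and-rebuild variant, sketched (the index name is hypothetical):
DROP INDEX idx_table_1_table_2_id;
-- run the bulk DELETE here
CREATE INDEX idx_table_1_table_2_id ON table_1(table_2_id);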

I have a Python script to delete entries in a SQLite DB; lfl is a list of files that I want to delete from the DB. An IN statement speeds up the process a lot:
while len(lfl) > 0:
    print("Deleting entries in DB:", len(lfl))
    # bind each 500-file chunk as parameters to one IN (...) list
    chunk = lfl[:500]
    sql = "delete from md5t where file in (%s)" % ",".join("?" * len(chunk))
    cursor.execute(sql, chunk)
    db.commit()
    del lfl[:500]

Related

Sqlite3 slow UPDATE

I have this huge database (8 GB / 14,126,762 rows) with two tables, table1 (very big) and table2 (much smaller), and I need to reduce a value in table1 using table2 values.
When I ran some tests using a smaller database (5 MB), it was fine. But now, when I run it on the bigger database, it takes forever and I don't know if it is working at all.
For instance, it takes 12 minutes to create the database with the INSERT command.
The troublesome transaction is the following:
UPDATE table1
SET vl_empenho = vl_empenho -
    (SELECT vl_estorno
       FROM table2
      WHERE table1.cd_ugestora = table2.cd_ugestora
        AND table1.dt_ano = table2.dt_ano
        AND table1.nu_empenho = table2.nu_empenho)
WHERE cd_ugestora IN
    (SELECT table2.cd_ugestora
       FROM table2
      WHERE table1.dt_ano = table2.dt_ano
        AND table1.nu_empenho = table2.nu_empenho);
I'm not proficient in SQLite; the statement gives me what I wanted, but I don't know if it is redundant.
Thanks for any help!
After reading the comments and a related Stack Overflow question, I made an index for each column used in the query, and also set:
PRAGMA synchronous = OFF;
PRAGMA journal_mode = MEMORY;
With that, the UPDATE mentioned above took about 20 minutes and the INSERT command 6 minutes, which I find appropriate considering the file size (now 10 GB).
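For reference, the indexing step might have looked like this: one composite index per table covering the three correlated columns lets each per-row lookup become a single index probe (column names are from the query above; the index names are made up):
CREATE INDEX idx_table2_lookup ON table2(cd_ugestora, dt_ano, nu_empenho);
CREATE INDEX idx_table1_lookup ON table1(cd_ugestora, dt_ano, nu_empenho);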
Thanks for all the attention!
EDIT: Regarding the comment from David Stein, it is TRUE! You CAN easily corrupt your database with these options. In my case it was a very replaceable, rebuildable database with no sensitive data; I could rebuild it anytime I wanted and I was its only user, so I needed it blazing fast.
Maybe that's not your situation.

Optimal way to DELETE specified rows from Oracle

I have a project that needs to occasionally delete several tens of thousands of rows from one of six tables of varying sizes, which have about 30 million rows between them. Because of the structure of the data I've been given, I don't know which of the six tables holds any given row that needs to be deleted, so I have to run all deletes against all tables. I've built an INDEX against the ID column to try and speed things up, but it can be removed if that'll help.
My problem is that I can't seem to find an efficient way to actually perform the delete. For the purposes of my testing I'm running 7384 row deletions against a single test table which has about 9400 rows. I've tested a number of possible query solutions in Oracle SQL Developer:
7384 separate DELETE statements took 203 seconds:
delete from TABLE1 where ID=1000001356443294;
delete from TABLE1 where ID=1000001356443296;
etc...
7384 separate SELECT statements took 57 seconds:
select ID from TABLE1 where ID=1000001356443294
select ID from TABLE1 where ID=1000001356443296
etc...
7384 separate DELETE from (SELECT) statements took 214 seconds:
delete from (select ID from TABLE1 where ID=1000001356443294);
delete from (select ID from TABLE1 where ID=1000001356443296);
etc...
1 SELECT statement that has 7384 OR clauses in the where took 127.4s:
select ID from TABLE1 where ID=1000001356443294 or ID = 1000001356443296 or ...
1 DELETE from (SELECT) statement that has 7384 OR clauses in the where took 74.4s:
delete from (select ID from TABLE1 where ID=1000001356443294 or ID = 1000001356443296 or ...)
While the last may be the fastest, further testing shows it is still very slow when scaled up from the 9000-row table to even just a 200,000-row table (which is still < 1% of the final tableset size), where the same statement takes 14 minutes to run. While > 50% faster per row, that still extrapolates to about a day when run against the full dataset. I have it on good authority that the piece of software we used to use for this task could do it in about 20 minutes.
So my questions are:
Is there a better way to delete?
Should I use a round of SELECT statements (i.e., like the second test) to discover which table any given row is in and then shoot off delete queries? Even that looks quite slow but...
Is there anything else I can do to speed the deletes up? I don't have DBA-level access or knowledge.
In advance of my questions being answered, this is how I'd go about it:
Minimize the number of statements issued and the work they do.
All scenarios assume you have a table of IDs (PURGE_IDS) to delete from TABLE_1, TABLE_2, etc.
Consider Using CREATE TABLE AS SELECT for really large deletes
If there's no concurrent activity, and you're deleting 30+ % of the rows in one or more of the tables, don't delete; perform a create table as select with the rows you wish to keep, and swap the new table out for the old table. INSERT /*+ APPEND */ ... NOLOGGING is surprisingly cheap if you can afford it. Even if you do have some concurrent activity, you may be able to use Online Table Redefinition to rebuild the table in-place.
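A sketch of the keep-and-swap variant (table names follow the PURGE_IDS assumption above; re-creating indexes, constraints, and grants on the new table is omitted):
CREATE TABLE TABLE_1_KEEP NOLOGGING AS
  SELECT T.*
    FROM TABLE_1 T
   WHERE NOT EXISTS (SELECT 1 FROM PURGE_IDS P WHERE P.ID = T.ID);
RENAME TABLE_1 TO TABLE_1_OLD;
RENAME TABLE_1_KEEP TO TABLE_1;
DROP TABLE TABLE_1_OLD;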
Don't run DELETE statements you know won't delete any rows
If an ID value exists in at most one of the six tables, then keep track of which IDs you've deleted - and don't try to delete those IDs from any of the other tables.
CREATE TABLE TABLE1_PURGE NOLOGGING
AS
SELECT ID FROM PURGE_IDS INNER JOIN TABLE_1 ON PURGE_IDS.ID = TABLE_1.ID;
DELETE FROM TABLE_1 WHERE ID IN (SELECT ID FROM TABLE1_PURGE);
DELETE FROM PURGE_IDS WHERE ID IN (SELECT ID FROM TABLE1_PURGE);
DROP TABLE TABLE1_PURGE;
and repeat.
Manage Concurrency if you have to
Another way is to use PL/SQL looping over the tables, issuing a rowcount-limited delete statement. This is most likely appropriate if there's significant insert/update/delete concurrent load against the tables you're running the deletes against.
declare
  l_sql varchar2(4000);
begin
  for i in (select table_name
              from all_tables
             where table_name in ('TABLE_1', 'TABLE_2', ...)
             order by table_name)
  loop
    l_sql := 'delete from ' || i.table_name ||
             ' where id in (select id from purge_ids)' ||
             ' and rownum <= 1000000';
    loop
      commit;
      execute immediate l_sql;
      exit when sql%rowcount <> 1000000; -- if we delete fewer than 1,000,000 rows,
    end loop;                            -- no more rows need to be deleted!
  end loop;
  commit;
end;
Store all the IDs to be deleted in a table. Then there are three ways:
1) Loop through the IDs in that table, deleting one row at a time and committing every X rows. X can be 100 or 1000. This works in an OLTP environment and you can control the locks.
2) Use an Oracle bulk delete.
3) Use a correlated delete query (sketched below).
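For option 3, the correlated delete might look like this (PURGE_IDS as the staging table of IDs, matching the assumption used elsewhere on this page):
DELETE FROM TABLE_1 T
 WHERE EXISTS (SELECT 1 FROM PURGE_IDS P WHERE P.ID = T.ID);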
A single query is usually faster than multiple queries because of less context switching, and possibly less parsing.
First, disabling the index during the deletion would be helpful.
Try a MERGE INTO statement:
1) Create a temp table with the IDs and an additional column from TABLE1, and test with the following:
MERGE INTO table1 src
USING (SELECT id, col1
         FROM test_merge_delete) tgt
ON (src.id = tgt.id)
WHEN MATCHED THEN
  UPDATE SET src.col1 = tgt.col1
  DELETE WHERE src.id = tgt.id
I have tried this code and it works fine in my case.
DELETE FROM NG_USR_0_CLIENT_GRID_NEW WHERE rowid IN
( SELECT rowid FROM
(
SELECT wi_name, relationship, ROW_NUMBER() OVER (ORDER BY rowid DESC) RN
FROM NG_USR_0_CLIENT_GRID_NEW
WHERE wi_name = 'NB-0000001385-Process'
)
WHERE RN=2
);

MySQL Query still executing after a day...?

I'm trying to isolate duplicates in a 500 MB database and have tried two ways to do it. One is creating a new table and grouping:
CREATE TABLE test_table as
SELECT * FROM items WHERE 1 GROUP BY title;
But it's been running for an hour and in MySQL Admin it says the status is Locked.
The other way I tried was to delete duplicates with this:
DELETE bad_rows.*
from items as bad_rows
inner join (
select post_title, MIN(id) as min_id
from items
group by title
having count(*) > 1
) as good_rows on good_rows.post_title = bad_rows.post_title;
...and this one has been running for 24 hours now, with Admin telling me it's Sending data...
Do you think either of these queries is actually still running? How can I find out if one is hung? (I'm on Apple OS X 10.5.7.)
You can do this:
alter ignore table items add unique index(title);
This will add a unique index and at the same time remove any duplicates, which will prevent any future duplicates from occurring. Make sure you do a backup before running this command.
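Note that ALTER IGNORE TABLE was removed in MySQL 5.7, so on newer servers you'd need a different route; one common alternative is a multi-table DELETE that keeps the lowest id per title (a sketch, using the column names from the question):
DELETE b
  FROM items a
  JOIN items b
    ON a.title = b.title
   AND a.id < b.id;
Once the duplicates are gone, a plain ALTER TABLE items ADD UNIQUE INDEX (title); prevents them from coming back.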

Oracle command hangs when using view for "WHERE x IN..." subquery

I'm working on a web service that fetches data from an Oracle data source in chunks and passes it back to an indexing/search tool in XML format. I'm the C#/.NET guy, and am kind of fuzzy on parts of Oracle.
Our Oracle team gave us the following script to run, and it works well:
SELECT ROWID, [columns]
FROM [table]
WHERE ROWID IN (
SELECT ROWID
FROM (
SELECT ROWID
FROM [table]
WHERE ROWID > '[previous_batch_last_rowid]'
ORDER BY ROWID
)
WHERE ROWNUM <= 10000
)
ORDER BY ROWID
10,000 rows is an arbitrary but reasonable chunk size and ROWID is sufficiently unique for our purposes to use as a UID since each indexing run hits only one table at a time. Bracketed values are filled in programmatically by the web service.
Now we're going to start adding views to the indexing, each of which will union a few separate tables. Since ROWID would no longer function as a unique identifier, they added a column to the views (VIEW_UNIQUE_ID) that concatenates the ROWIDs from the component tables to construct a UID for each union.
But this script does not work, even though it follows the same form as the previous one:
SELECT VIEW_UNIQUE_ID, [columns]
FROM [view]
WHERE VIEW_UNIQUE_ID IN (
SELECT VIEW_UNIQUE_ID
FROM (
SELECT VIEW_UNIQUE_ID
FROM [view]
WHERE VIEW_UNIQUE_ID > '[previous_batch_last_view_unique_id]'
ORDER BY VIEW_UNIQUE_ID
)
WHERE ROWNUM <= 10000
)
ORDER BY VIEW_UNIQUE_ID
It hangs indefinitely with no response from the Oracle server. I've waited 20+ minutes and the SQLTools dialog box indicating a running query remains the same, with no progress or updates.
I've tested each subquery independently and each works fine and takes a very short amount of time (<= 1 second), so the view itself is sound. But as soon as the inner two SELECT queries are added with "WHERE VIEW_UNIQUE_ID IN...", it hangs.
Why doesn't this query work for views? In what important way are they not interchangeable here?
Updated: the architecture of the solution stipulates that it is to be stateless, so I shouldn't try to make the web service preserve any index state information between requests from consumers.
they added a column to the views (VIEW_UNIQUE_ID) that concatenates the ROWIDs from the component tables to construct a UID for each union.
God, that is the most obscene idea I've seen in a long time.
Let's say the view is a simple one like
SELECT C.CUST_ID, C.CUST_NAME, O.ORDER_ID, C.ROWID||':'||O.ROWID VIEW_UNIQUE_ID
FROM CUSTOMER C JOIN ORDERS O ON C.CUST_ID = O.CUST_ID
Every time you want to do the
SELECT VIEW_UNIQUE_ID
FROM [view]
WHERE VIEW_UNIQUE_ID > '[previous_batch_last_view_unique_id]'
ORDER BY VIEW_UNIQUE_ID
It has to build that entire result set, apply the filter, and order it. For anything other than trivially sized tables, that will be a nightmare.
Stop using the database to paginate/chunk the data here and do that in the client. Open the database connection, execute the query, fetch the first ten thousand rows, index them, then fetch the next ten thousand. Don't close and re-open the query for each chunk; only close it after you've processed every row. You'll also be able to forget about ordering.
For stateless, you need to re-architect. The whole thing with concatenated ROWIDs will not fly.
Start by putting the records to be processed into a fresh table, then you can flag them/process them/delete them in chunks.
INSERT INTO pending_table
SELECT 'N' state_flag, v.* FROM view v;
<start looping here>
UPDATE pending_table
SET state_flag = 'P'
WHERE ROWNUM < 10000;
COMMIT;
SELECT * FROM pending_table
WHERE state_flag = 'P';
<client processing>
DELETE FROM pending_table
WHERE state_flag = 'P';
<go back to start of loop, and keep going until pending_table is empty>

How to delete large data from Oracle 9i DB?

I have a table that is 5 GB, and I was trying to delete rows like below:
delete from tablename
where to_char(screatetime,'yyyy-mm-dd') <'2009-06-01'
But it runs for a long time with no response. Meanwhile I tried to check whether anything was blocking it with the query below:
select l1.sid, ' IS BLOCKING ', l2.sid
from v$lock l1, v$lock l2
where l1.block =1 and l2.request > 0
and l1.id1=l2.id1
and l1.id2=l2.id2
But I didn't find any blocking either.
How can I delete this large amount of data without any problems?
5GB is not a useful measurement of table size. The total number of rows matters. The number of rows you are going to delete as a proportion of the total matters. The average length of the row matters.
If the proportion of the rows to be deleted is tiny it may be worth your while creating an index on screatetime which you will drop afterwards. This may mean your entire operation takes longer, but crucially, it will reduce the time it takes for you to delete the rows.
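That temporary-index idea, sketched (the index name is hypothetical; note the predicate is written against the bare column so the index can actually be used, as pointed out further down):
CREATE INDEX IX_TABLENAME_SCREATETIME ON tablename (screatetime);
DELETE FROM tablename
 WHERE screatetime < to_date('2009-06-01','yyyy-mm-dd');
DROP INDEX IX_TABLENAME_SCREATETIME;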
On the other hand, if you are deleting a large chunk of rows you might find it better to:
Create a copy of the table using
create table t1_copy as select * from t1
where screatetime >= to_date('2009-06-01','yyyy-mm-dd')
Swap the tables using the rename command.
Re-apply constraints and indexes to the new T1.
Another thing to bear in mind is that deletions eat more UNDO than other transactions, because they take more information to rollback. So if your records are long and/or numerous then your DBA may need to check the UNDO tablespace (or rollback segs if you're still using them).
Finally, have you done any investigation to see where the time is actually going? DELETE statements are just another query, and they can be tackled using the normal panoply of tuning tricks.
Use a query condition to export necessary rows
Truncate table
Import rows
If there is an index on screatetime your query may not be using it. Change your statement so that your where clause can use the index.
delete from tablename where screatetime < to_date('2009-06-01','yyyy-mm-dd')
It runs MUCH faster when you lock the table first. Also change the where clause, as suggested by Rene.
LOCK TABLE tablename IN EXCLUSIVE MODE;
DELETE FROM tablename
where screatetime < to_date('2009-06-01','yyyy-mm-dd');
EDIT: If the table cannot be locked, because it is constantly accessed, you can choose the salami tactic to delete those rows:
BEGIN
  LOOP
    DELETE FROM tablename
     WHERE screatetime < to_date('2009-06-01','yyyy-mm-dd')
       AND ROWNUM <= 10000;
    EXIT WHEN SQL%ROWCOUNT = 0;
    COMMIT;
  END LOOP;
END;
Overall, this will be slower, but it won't burst your rollback segment and you can see the progress in another session (i.e., the number of rows in tablename goes down). And if you have to kill it for some reason, the rollback won't take forever and you won't have lost all the work done so far.
