I am using Java with mybatis.
I have a query like this and I need to execute this for 2000 values on key_b. That means I need to run the sql for 2000 times. Which is reasonably slow.
DELETE FROM my_table
WHERE key_a = xxx
AND key_b = yyy
Now I came up with another solution, this time I am sending 1000 values in IN clause for key_b. Which means only two query I am executing. I was expecting this one to be faster at least. But this seems to be even slower than the above one. Here is the sql.
DELETE FROM my_table
WHERE key_a = xxxx
AND key_b IN (y1, y2, ... y1000)
For more information, the key_b is the Primary Key. And the key_a is a Foreign key and has an Index.
Another thing, I've tried to take out the session and make a commit after all the sqls are executed. But It didn't improve that much.
you can use temp table for this:
I mean if you have a table which has id column.
And then You can insert your values to that table like this:
insert into temp_table
select 1 from dual -- your ids
union all
select 2 from dual
union all
select 3 from dual
union all
......
after you fill your temp_table you can run just this:
DELETE FROM my_table
WHERE key_a = xxxx
AND key_b IN
(
select id from temp_table
);
I recommend sticking with 1st approach: called prepared Delete statement in a Java loop over id collection. Off course with ExecutorType REUSE or BATCH, so that statement is prepared once and run for every record.
Furthermore, I discourage trying to bind thousands of parameters.
Anyway, I fear this is the best you can do since Delete operation will check integrity constraints, probably update index, for every record. That is not "bulked".
Related
All the Oracles out here,
I have an Oracle PL/SQL procedure but very small data that can run on the query. I suspect that when the data gets large, the query might start performing back. Are there ways in which I can check for performance and take corrective measure even before the data build up? If I wait for the data buildup, it might get too late.
Do you have any general & practical suggestions for me? Searching the internet did not get me anything convincing.
Better to build yourself some test data to get an idea of how things will perform. Its easy to get started, eg
create table MY_TEST as select * from all_objects;
gives you approx 50,000 rows typically. You can scale that easily with
create table MY_TEST as select a.* from all_objects a ,
( select 1 from dual connect by level <= 10);
Now you have 500,000 rows
create table MY_TEST as select a.* from all_objects a ,
( select 1 from dual connect by level <= 10000);
Now you have 500,000,000 rows!
If you want unique values per row, then add rownum eg
create table MY_TEST as select rownum r, a.* from all_objects a ,
( select 1 from dual connect by level <= 10000);
If you want (say) 100,000 distinct values in a column then TRUNC or MOD. You can also use DBMS_RANDOM to generate random numbers, strings etc.
Also check out Morten's test data generator
https://github.com/morten-egan/testdata_ninja
for some domain specific data, and also the Oracle sample schemas on github which can also be scaled using techniques above.
https://github.com/oracle/db-sample-schemas
I created a table in oracle like
CREATE TABLE suppliers AS (SELECT * FROM companies WHERE id > 1000);
I would like to know the complete select statement which was used to create this table.
I have already tried get_ddl but it is not giving the select statement. Can you please let me know how to get the select statement?
If you're lucky one of these statements will show the DDL used to generate the table:
select *
from gv$sql
where lower(sql_fulltext) like '%create table suppliers%';
select *
from dba_hist_sqltext
where lower(sql_text) like '%create table%';
I used the word lucky because GV$SQL will usually only have results for a few hours or days, until the data is purged from the shared pool. DBA_HIST_SQLTEXT will only help if you have AWR enabled, the statement was run in the last X days that AWR is configured to hold data (the default is 8), the statement was run after the last snapshot collection (by default it happens every hour), and the statement ran long enough for AWR to think it's worth saving.
And for each table Oracle does not always store the full SQL. For security reasons, DDL statements are often truncated in the data dictionary. Don't be surprised if the text suddenly cuts off after the first N characters.
And depending on how the SQL is called the case and space may be different. Use lower and lots of wildcards to increase the chance of finding the statement.
TRY THIS:
select distinct table_name
from
all_tab_columns where column_name in
(
select column_name from
all_tab_columns
where table_name ='SUPPLIERS'
)
you can find table which created from table
Some tables' data need to be updated(or deleted , inserted) in my system.
But I want to know which data are updated,deleted and inserted.
So before the data are changed ,I will backup the table in different schema
just like this:
create table backup_table as select * from schema1.testtable
and after the data are changed,I want to find the difference between backup_table
and testtable ,and I want to save the difference into a table in the backup schema.
the sql I will run is like this:
CREATE TABLE TEST_COMPARE_RESULT
AS
SELECT 'BEFORE' AS STATUS, T1.*
FROM (
SELECT * FROM backup_table
MINUS
SELECT * FROM schema1.testtable
) T1
UNION ALL
SELECT 'AFTER' AS STATUS, T2.*
FROM (
SELECT * FROM schema1.testtable
MINUS
SELECT * FROM backup_table
) T2
What I am worried about is that I heared about that the minus operation will use
a lot of system resource.In my sysytem, some table size will be over 700M .So I want to
know how oracle will read the 700M data in memory (PGA??) or the temporary tablespace?
and How I should make sure that the resource are enough to to the compare operation?
Minus is indeed a resource intensive task. It need to read both tables and do sorts to compare the two tables. However, Oracle has advanced techniques to do this. It won't load the both tables in memory(SGA) if can't do it. It will use, yes, temporary space for sorts. But I would recommend you to have a try. Just run the query and see what happens. The database won't suffer and allways you can stop the execution of statement.
What you can do to improve the performance of the query:
First, if you have columns that you are sure that won't changed, don't include them.
So, is better to write:
select a, b from t1
minus
select a, b from t2
than using a select * from t, if there are more than these two columns, because the work is lesser.
Second, if the amount of data to compare si really big for your system(too small temp space), you should try to compare them on chunks:
select a, b from t1 where col between val1 and val2
minus
select a, b from t2 where col between val1 and val2
Sure, another possibility than minus is to have some log columns, let's say updated_date. selecting with where updated_date greater than start of process will show you updated records. But this depends on how you can alter the database model and etl code.
I have a project that needs to occasionally delete several tens of thousands of rows from one of six tables of varying sizes but that have about 30million rows between them. Because of the structure of the data I've been given, I don't know which of the six tables has the row that needs to be deleted in it so I have to run all deletes against all tables. I've built an INDEX against the ID column to try and speed things up, but it can be removed if that'll speed things up.
My problem is, that I can't seem to find an efficient way to actually perform the delete. For the purposes of my testing I'm running 7384 delete rows against single test-table which has about 9400 rows. I've tested a number of possible query solutions in Oracle SQL Developer:
7384 separate DELETE statements took 203 seconds:
delete from TABLE1 where ID=1000001356443294;
delete from TABLE1 where ID=1000001356443296;
etc...
7384 separate SELECT statements took 57 seconds:
select ID from TABLE1 where ID=1000001356443294
select ID from TABLE1 where ID=1000001356443296
etc...
7384 separate DELETE from (SELECT) statements took 214 seconds:
delete from (select ID from TABLE1 where ID=1000001356443294);
delete from (select ID from TABLE1 where ID=1000001356443296);
etc...
1 SELECT statement that has 7384 OR clauses in the where took 127.4s:
select ID from TABLE1 where ID=1000001356443294 or ID = 1000001356443296 or ...
1 DELETE from (SELECT) statement that has 7384 OR clauses in the where took 74.4s:
delete from (select ID from TABLE1 where ID=1000001356443294 or ID = 1000001356443296 or ...)
While the last may be the fastest, upon further testing its still very slow when scaled up from the 9000 row table to even just a 200,000 row table (which is still < 1% of the final tableset size) where the same statement takes 14mins to run. While > 50% faster per row, that still extrapolates up to about a day when being run against the full dataset. I have it on good authority that the piece of software we used to us to do this task could do it in about 20mins.
So my questions are:
Is there a better way to delete?
Should I use a round of SELECT statements (i.e., like the second test) to discover which table any given row is in and then shoot off delete queries? Even that looks quite slow but...
Is there anything else I can do to speed the deletes up? I don't have DBA-level access or knowledge.
In advance of my questions being answered, this is how I'd go about it:
Minimize the number of statements and the work they do issued in relative terms.
All scenarios assume you have a table of IDs (PURGE_IDS) to delete from TABLE_1, TABLE_2, etc.
Consider Using CREATE TABLE AS SELECT for really large deletes
If there's no concurrent activity, and you're deleting 30+ % of the rows in one or more of the tables, don't delete; perform a create table as select with the rows you wish to keep, and swap the new table out for the old table. INSERT /*+ APPEND */ ... NOLOGGING is surprisingly cheap if you can afford it. Even if you do have some concurrent activity, you may be able to use Online Table Redefinition to rebuild the table in-place.
Don't run DELETE statements you know won't delete any rows
If an ID value exists in at most one of the six tables, then keep track of which IDs you've deleted - and don't try to delete those IDs from any of the other tables.
CREATE TABLE TABLE1_PURGE NOLOGGING
AS
SELECT ID FROM PURGE_IDS INNER JOIN TABLE_1 ON PURGE_IDS.ID = TABLE_1.ID;
DELETE FROM TABLE1 WHERE ID IN (SELECT ID FROM TABLE1_PURGE);
DELETE FROM PURGE_IDS WHERE ID IN (SELECT ID FROM TABLE1_PURGE);
DROP TABLE TABLE1_PURGE;
and repeat.
Manage Concurrency if you have to
Another way is to use PL/SQL looping over the tables, issuing a rowcount-limited delete statement. This is most likely appropriate if there's significant insert/update/delete concurrent load against the tables you're running the deletes against.
declare
l_sql varchar2(4000);
begin
for i in (select table_name from all_tables
where table_name in ('TABLE_1', 'TABLE_2', ...)
order by table_name);
loop
l_sql := 'delete from ' || i.table_name ||
' where id in (select id from purge_ids) ' ||
' and rownum <= 1000000';
loop
commit;
execute immediate l_sql;
exit when sql%rowcount <> 1000000; -- if we delete less than 1,000,000
end loop; -- no more rows need to be deleted!
end loop;
commit;
end;
Store all the to be deleted ID's into a table. Then there are 3 ways.
1) loop through all the ID's in the table, then delete one row at a time for X commit interval. X can be a 100 or 1000. It works on OLTP environment and you can control the locks.
2) Use Oracle Bulk Delete
3) Use correlated delete query.
Single query is usually faster than multiple queries because of less context switching, and possibly less parsing.
First, disabling the index during the deletion would be helpful.
Try with a MERGE INTO statement :
1) create a temp table with IDs and an additional column from TABLE1 and test with the following
MERGE INTO table1 src
USING (SELECT id,col1
FROM test_merge_delete) tgt
ON (src.id = tgt.id)
WHEN MATCHED THEN
UPDATE
SET src.col1 = tgt.col1
DELETE
WHERE src.id = tgt.id
I have tried this code and It's working fine in my case.
DELETE FROM NG_USR_0_CLIENT_GRID_NEW WHERE rowid IN
( SELECT rowid FROM
(
SELECT wi_name, relationship, ROW_NUMBER() OVER (ORDER BY rowid DESC) RN
FROM NG_USR_0_CLIENT_GRID_NEW
WHERE wi_name = 'NB-0000001385-Process'
)
WHERE RN=2
);
I have table with "varchar2" as primary key.
It has about 1 000 000 Transactions per day.
My app wakes up every 5 minute to generate text file by querying only new record.
It will remember last point and process only new records.
Do you have idea how to query with good performance?
I am able to add new column if necessary.
What do you think this process should do by?
plsql?
java?
Everyone here is really really close. However:
Scott Bailey's wrong about using a bitmap index if the table's under any sort of continuous DML load. That's exactly the wrong time to use a bitmap index.
Everyone else's answer about the PROCESSED CHAR(1) check in ('Y','N')column is right, but missing how to index it; you should use a function-based index like this:
CREATE INDEX MY_UNPROCESSED_ROWS_IDX ON MY_TABLE
(CASE WHEN PROCESSED_FLAG = 'N' THEN 'N' ELSE NULL END);
You'd then query it using the same expression:
SELECT * FROM MY_TABLE
WHERE (CASE WHEN PROCESSED_FLAG = 'N' THEN 'N' ELSE NULL END) = 'N';
The reason to use the function-based index is that Oracle doesn't write index entries for entirely NULL values being indexed, so the function-based index above will only contain the rows with PROCESSED_FLAG = 'N'. As you update your rows to PROCESSED_FLAG = 'Y', they'll "fall out" of the index.
Well, if you can add a new column, you could create a Processed column, which will indicate processed records, and create an index on this column for performance.
Then the query should only be for those rows that have been newly added, and not processed.
This should be easily done using sql queries.
Ah, I really hate to add another answer when the others have come so close to nailing it. But
As Ponies points out, Oracle does have a hidden column (ORA_ROWSCN - System Change Number) that can pinpoint when each row was modified. Unfortunately, the default is that it gets the information from the block instead of storing it with each row and changing that behavior will require you to rebuild a really large table. So while this answer is good for quieting the SQL Server fella, I'd not recommend it.
Astander is right there but needs a few caveats. Add a new column needs_processed CHAR(1) DEFAULT 'Y' and add a BITMAP index. For low cardinality columns ('Y'/'N') the bitmap index will be faster. Once you have the rest is pretty easy. But you've got to be careful not select the new rows, process them and mark them as processed in one step. Otherwise, rows could be inserted while you are processing that will get marked processed even though they have not been.
The easiest way would be to use pl/sql to open a cursor that selects unprocessed rows, processes them and then updates the row as processed. If you have an aversion to walking cursors, you could collect the pk's or rowids into a nested table, process them and then update using the nested table.
In MS SQL Server world where I work, we have a 'version' column of type 'timestamp' on our tables.
So, to answer #1, I would add a new column.
To answer #2, I would do it in plsql for performance.
Mark
"astander" pretty much did the work for you. You need to ALTER your table to add one more column (lets say PROCESSED)..
You can also consider creating an INDEX on the PROCESSED ( a bitmap index may be of some advantage, as the possible value can be only 'y' and 'n', but test it out ) so that when you query it will use INDEX.
Also if sure, you query only for every 5 mins, check whether you can add another column with TIMESTAMP type and partition the table with it. ( not sure, check out again ).
I would also think about writing job or some thing and write using UTL_FILE and show it front end if it can be.
If performance is really a problem and you want to create your file asynchronously, you might want to use Oracle Streams, which will actually get modification data from your redo log withou affecting performance of the main database. You may not even need a separate job, as you can configure Oracle Streams to do Asynchronous replication of the changes, through which you can trigger the file creation.
Why not create an extra table that holds two columns. The ID column and a processed flag column. Have an insert trigger on the original table place it's ID in this new table. Your logging process can than select records from this new table and mark them as processed. Finally delete the processed records from this table.
I'm pretty much in agreement with Adam's answer. But I'd want to do some serious testing compared to an alternative.
The issue I see is that you need to not only select the rows, but also do an update of those rows. While that should be pretty fast, I'd like to avoid the update. And avoid having any large transactions hanging around (see below).
The alternative would be to add CREATE_DATE date default sysdate. Index that. And then select records where create_date >= (start date/time of your previous select).
But I don't have enough data on the relative costs of setting a sysdate as default vs. setting a value of Y, updating the function based vs. date index, and doing a range select on the date vs. a specific select on a single value for the Y. You'll probably want to preserve stats or hint the query to use the index on the Y/N column, and definitely want to use a hint on a date column -- the stats on the date column will almost certainly be old.
If data are also being added to the table continuously, including during the period when your query is running, you need to watch out for transaction control. After all, you don't want to read 100,000 records that have the flag = Y, then do your update on 120,000, including the 20,000 that arrived when you query was running.
In the flag case, there are two easy ways: SET TRANSACTION before your select and commit after your update, or start by doing an update from Y to Q, then do your select for those that are Q, and then update to N. Oracle's read consistency is wonderful but needs to be handled with care.
For the date column version, if you don't mind a risk of processing a few rows more than once, just update your table that has the last processed date/time immediately before you do your select.
If there's not much information in the table, consider making it Index Organized.
What about using Materialized view logs? You have a lot of options to play with:
SQL> create table test (id_test number primary key, dummy varchar2(1000));
Table created
SQL> create materialized view log on test;
Materialized view log created
SQL> insert into test values (1, 'hello');
1 row inserted
SQL> insert into test values (2, 'bye');
1 row inserted
SQL> select * from mlog$_test;
ID_TEST SNAPTIME$$ DMLTYPE$$ OLD_NEW$$ CHANGE_VECTOR$$
---------- ----------- --------- --------- ---------------------
1 01/01/4000 I N FE
2 01/01/4000 I N FE
SQL> delete from mlog$_test where id_test in (1,2);
2 rows deleted
SQL> insert into test values (3, 'hello');
1 row inserted
SQL> insert into test values (4, 'bye');
1 row inserted
SQL> select * from mlog$_test;
ID_TEST SNAPTIME$$ DMLTYPE$$ OLD_NEW$$ CHANGE_VECTOR$$
---------- ----------- --------- --------- ---------------
3 01/01/4000 I N FE
4 01/01/4000 I N FE
I think this solution should work..
What you need to do following steps
For the first run, you will have to copy all records. In first run you need to execute following query
insert into new_table(max_rowid) as (Select max(rowid) from yourtable);
Now next time when you want to get only newly inserted values, you can do it by executing follwing command
Select * from yourtable where rowid > (select max_rowid from new_table);
Once you are done with processing above query, simply truncate new_table and insert max(rowid) from yourtable
I think this should work and would be fastest solution;