Oracle command hangs when using view for "WHERE x IN..." subquery - oracle

I'm working on a web service that fetches data from an oracle data source in chunks and passes it back to an indexing/search tool in XML format. I'm the C#/.NET guy, and am kind of fuzzy on parts of Oracle.
Our Oracle team gave us the following script to run, and it works well:
SELECT ROWID, [columns]
FROM [table]
WHERE ROWID IN (
SELECT ROWID
FROM (
SELECT ROWID
FROM [table]
WHERE ROWID > '[previous_batch_last_rowid]'
ORDER BY ROWID
)
WHERE ROWNUM <= 10000
)
ORDER BY ROWID
10,000 rows is an arbitrary but reasonable chunk size and ROWID is sufficiently unique for our purposes to use as a UID since each indexing run hits only one table at a time. Bracketed values are filled in programmatically by the web service.
Now we're going to start adding views to the indexing, each of which will union a few separate tables. Since ROWID would no longer function as a unique identifier, they added a column to the views (VIEW_UNIQUE_ID) that concatenates the ROWIDs from the component tables to construct a UID for each union.
But this script does not work, even though it follows the same form as the previous one:
SELECT VIEW_UNIQUE_ID, [columns]
FROM [view]
WHERE VIEW_UNIQUE_ID IN (
SELECT VIEW_UNIQUE_ID
FROM (
SELECT VIEW_UNIQUE_ID
FROM [view]
WHERE VIEW_UNIQUE_ID > '[previous_batch_last_view_unique_id]'
ORDER BY VIEW_UNIQUE_ID
)
WHERE ROWNUM <= 10000
)
ORDER BY VIEW_UNIQUE_ID
It hangs indefinitely with no response from the Oracle server. I've waited 20+ minutes and the SQLTools dialog box indicating a running query remains the same, with no progress or updates.
I've tested each subquery independently and each works fine and takes a very short amount of time (<= 1 second), so the view itself is sound. But as soon as the inner two SELECT queries are added with "WHERE VIEW_UNIQUE_ID IN...", it hangs.
Why doesn't this query work for views? In what important way are they not interchangeable here?
Updated: the architecture of the solution stipulates that it is to be stateless, so I shouldn't try to make the web service preserve any index state information between requests from consumers.

they added a column to the views
(VIEW_UNIQUE_ID) that concatenates the
ROWIDs from the component tables to
construct a UID for each union.
God, that is the most obscene idea I've seen in a long time.
Let's say the view is a simple one like
SELECT C.CUST_ID, C.CUST_NAME, O.ORDER_ID, C.ROWID||':'||O.ROWID VIEW_UNIQUE_ID
FROM CUSTOMER C JOIN ORDER O ON C.CUST_ID = O.CUST_ID
Every time you want to do the
SELECT VIEW_UNIQUE_ID
FROM [view]
WHERE VIEW_UNIQUE_ID > '[previous_batch_last_view_unique_id]'
ORDER BY VIEW_UNIQUE_ID
It has to build that entire result set, apply the filter, and order it. For anything other than trivially sized tables, that will be a nightmare.
Stop using the database to paginate/chunk the data here and do that in the client. Open the database connection, execute the query, fetch the first ten thousand rows from the query, index them, fetch the next ten thousand. Don't close and reopen the query each time, only after you've processed each row. You'll be able to forget about ordering.

For stateless, you need to re-architect. The whole thing with concatenated ROWIDs will not fly.
Start by putting the records to be processed into a fresh table, then you can flag them/process them/delete them in chunks.
INSERT INTO pending_table
SELECT 'N' state_flag, v.* FROM view v;
<start looping here>
UPDATE pending_table
SET state_flag = 'P'
WHERE ROWNUM < 10000;
COMMIT;
SELECT * FROM pending_table
WHERE state_flag = 'P';
<client processing>
DELETE FROM pending_table
WHERE state_flag = 'P';
<go back to start of loop, and keep going until pending_table is empty>

Related

How can you check query performance with small data set

All the Oracles out here,
I have an Oracle PL/SQL procedure but very small data that can run on the query. I suspect that when the data gets large, the query might start performing back. Are there ways in which I can check for performance and take corrective measure even before the data build up? If I wait for the data buildup, it might get too late.
Do you have any general & practical suggestions for me? Searching the internet did not get me anything convincing.
Better to build yourself some test data to get an idea of how things will perform. Its easy to get started, eg
create table MY_TEST as select * from all_objects;
gives you approx 50,000 rows typically. You can scale that easily with
create table MY_TEST as select a.* from all_objects a ,
( select 1 from dual connect by level <= 10);
Now you have 500,000 rows
create table MY_TEST as select a.* from all_objects a ,
( select 1 from dual connect by level <= 10000);
Now you have 500,000,000 rows!
If you want unique values per row, then add rownum eg
create table MY_TEST as select rownum r, a.* from all_objects a ,
( select 1 from dual connect by level <= 10000);
If you want (say) 100,000 distinct values in a column then TRUNC or MOD. You can also use DBMS_RANDOM to generate random numbers, strings etc.
Also check out Morten's test data generator
https://github.com/morten-egan/testdata_ninja
for some domain specific data, and also the Oracle sample schemas on github which can also be scaled using techniques above.
https://github.com/oracle/db-sample-schemas

A fast query that selects the number of rows in each table

I want a query that selects the number of rows in each table
but they are NOT updated statistically .So such query will not be accurate:
select table_name, num_rows from user_tables
i want to select several schema and each schema has minimum 500 table some of them contain a lot of columns . it will took for me days if i want to update them .
from the site ask tom he suggest a function includes this query
'select count(*)
from ' || p_tname INTO l_columnValue;
such query with count(*) is really slow and it will not give me fast results.
Is there a query that can give me how many rows are in table in a fast way ?
You said in a comment that you want to delete (drop?) empty tables. If you don't want an exact count but only want to know if a table is empty you can do a shortcut count:
select count(*) from table_name where rownum < 2;
The optimiser will stop when it reaches the first row - the execution plan shows a 'count stopkey' operation - so it will be fast. It will return zero for an empty table, and one for a table with any data - you have no idea how much data, but you don't seem to care.
You still have a slight race condition between the count and the drop, of course.
This seems like a very odd thing to want to do - either your application uses the table, in which case dropping it will break something even if it's empty; or it doesn't, in which case it shouldn't matter whether it has (presumably redundant) and it can be dropped regardless. If you think there might be confusion, that sounds like your source (including DDL) control needs some work, maybe?
To check if either table in two schemas have a row, just count from both of them; either with a union:
select max(c) from (
select count(*) as c from schema1.table_name where rownum < 2
union all
select count(*) as c from schema2.table_name where rownum < 2
);
... or with greatest and two sub-selects, e.g.:
select greatest(
(select count(*) from schema1.table_name where rownum < 2),
(select count(*) from schema2.table_name where rownum < 2)
) from dual;
Either would return one if either table has any rows, and would only return zero f they were both empty.
Full Disclosure: I had originally suggested a query that specifically counts a column that's (a) indexed and (b) not null. #AlexPoole and #JustinCave pointed out (please see their comments below) that Oracle will optimize a COUNT(*) to do this anyway. As such, this answer has been altered significantly.
There's a good explanation here for why User_Tables shouldn't be used for accurate row counts, even when statistics are up to date.
If your tables have indexes which can be used to speed up the count by doing an index scan rather than a table scan, Oracle will use them. This will make the counts faster, though not by any means instantaneous. That said, this is the only way I know to get an accurate count.
To check for empty (zero row) tables, please use the answer posted by Alex Poole.
You could make a table to hold the counts of each table. Then, set a trigger to run on INSERT for each of the tables you're counting that updates the main table.
You'd also need to include a trigger for DELETE.

Compare two tables with minus operation in oracle

Some tables' data need to be updated(or deleted , inserted) in my system.
But I want to know which data are updated,deleted and inserted.
So before the data are changed ,I will backup the table in different schema
just like this:
create table backup_table as select * from schema1.testtable
and after the data are changed,I want to find the difference between backup_table
and testtable ,and I want to save the difference into a table in the backup schema.
the sql I will run is like this:
CREATE TABLE TEST_COMPARE_RESULT
AS
SELECT 'BEFORE' AS STATUS, T1.*
FROM (
SELECT * FROM backup_table
MINUS
SELECT * FROM schema1.testtable
) T1
UNION ALL
SELECT 'AFTER' AS STATUS, T2.*
FROM (
SELECT * FROM schema1.testtable
MINUS
SELECT * FROM backup_table
) T2
What I am worried about is that I heared about that the minus operation will use
a lot of system resource.In my sysytem, some table size will be over 700M .So I want to
know how oracle will read the 700M data in memory (PGA??) or the temporary tablespace?
and How I should make sure that the resource are enough to to the compare operation?
Minus is indeed a resource intensive task. It need to read both tables and do sorts to compare the two tables. However, Oracle has advanced techniques to do this. It won't load the both tables in memory(SGA) if can't do it. It will use, yes, temporary space for sorts. But I would recommend you to have a try. Just run the query and see what happens. The database won't suffer and allways you can stop the execution of statement.
What you can do to improve the performance of the query:
First, if you have columns that you are sure that won't changed, don't include them.
So, is better to write:
select a, b from t1
minus
select a, b from t2
than using a select * from t, if there are more than these two columns, because the work is lesser.
Second, if the amount of data to compare si really big for your system(too small temp space), you should try to compare them on chunks:
select a, b from t1 where col between val1 and val2
minus
select a, b from t2 where col between val1 and val2
Sure, another possibility than minus is to have some log columns, let's say updated_date. selecting with where updated_date greater than start of process will show you updated records. But this depends on how you can alter the database model and etl code.

Optimal way to DELETE specified rows from Oracle

I have a project that needs to occasionally delete several tens of thousands of rows from one of six tables of varying sizes but that have about 30million rows between them. Because of the structure of the data I've been given, I don't know which of the six tables has the row that needs to be deleted in it so I have to run all deletes against all tables. I've built an INDEX against the ID column to try and speed things up, but it can be removed if that'll speed things up.
My problem is, that I can't seem to find an efficient way to actually perform the delete. For the purposes of my testing I'm running 7384 delete rows against single test-table which has about 9400 rows. I've tested a number of possible query solutions in Oracle SQL Developer:
7384 separate DELETE statements took 203 seconds:
delete from TABLE1 where ID=1000001356443294;
delete from TABLE1 where ID=1000001356443296;
etc...
7384 separate SELECT statements took 57 seconds:
select ID from TABLE1 where ID=1000001356443294
select ID from TABLE1 where ID=1000001356443296
etc...
7384 separate DELETE from (SELECT) statements took 214 seconds:
delete from (select ID from TABLE1 where ID=1000001356443294);
delete from (select ID from TABLE1 where ID=1000001356443296);
etc...
1 SELECT statement that has 7384 OR clauses in the where took 127.4s:
select ID from TABLE1 where ID=1000001356443294 or ID = 1000001356443296 or ...
1 DELETE from (SELECT) statement that has 7384 OR clauses in the where took 74.4s:
delete from (select ID from TABLE1 where ID=1000001356443294 or ID = 1000001356443296 or ...)
While the last may be the fastest, upon further testing its still very slow when scaled up from the 9000 row table to even just a 200,000 row table (which is still < 1% of the final tableset size) where the same statement takes 14mins to run. While > 50% faster per row, that still extrapolates up to about a day when being run against the full dataset. I have it on good authority that the piece of software we used to us to do this task could do it in about 20mins.
So my questions are:
Is there a better way to delete?
Should I use a round of SELECT statements (i.e., like the second test) to discover which table any given row is in and then shoot off delete queries? Even that looks quite slow but...
Is there anything else I can do to speed the deletes up? I don't have DBA-level access or knowledge.
In advance of my questions being answered, this is how I'd go about it:
Minimize the number of statements and the work they do issued in relative terms.
All scenarios assume you have a table of IDs (PURGE_IDS) to delete from TABLE_1, TABLE_2, etc.
Consider Using CREATE TABLE AS SELECT for really large deletes
If there's no concurrent activity, and you're deleting 30+ % of the rows in one or more of the tables, don't delete; perform a create table as select with the rows you wish to keep, and swap the new table out for the old table. INSERT /*+ APPEND */ ... NOLOGGING is surprisingly cheap if you can afford it. Even if you do have some concurrent activity, you may be able to use Online Table Redefinition to rebuild the table in-place.
Don't run DELETE statements you know won't delete any rows
If an ID value exists in at most one of the six tables, then keep track of which IDs you've deleted - and don't try to delete those IDs from any of the other tables.
CREATE TABLE TABLE1_PURGE NOLOGGING
AS
SELECT ID FROM PURGE_IDS INNER JOIN TABLE_1 ON PURGE_IDS.ID = TABLE_1.ID;
DELETE FROM TABLE1 WHERE ID IN (SELECT ID FROM TABLE1_PURGE);
DELETE FROM PURGE_IDS WHERE ID IN (SELECT ID FROM TABLE1_PURGE);
DROP TABLE TABLE1_PURGE;
and repeat.
Manage Concurrency if you have to
Another way is to use PL/SQL looping over the tables, issuing a rowcount-limited delete statement. This is most likely appropriate if there's significant insert/update/delete concurrent load against the tables you're running the deletes against.
declare
l_sql varchar2(4000);
begin
for i in (select table_name from all_tables
where table_name in ('TABLE_1', 'TABLE_2', ...)
order by table_name);
loop
l_sql := 'delete from ' || i.table_name ||
' where id in (select id from purge_ids) ' ||
' and rownum <= 1000000';
loop
commit;
execute immediate l_sql;
exit when sql%rowcount <> 1000000; -- if we delete less than 1,000,000
end loop; -- no more rows need to be deleted!
end loop;
commit;
end;
Store all the to be deleted ID's into a table. Then there are 3 ways.
1) loop through all the ID's in the table, then delete one row at a time for X commit interval. X can be a 100 or 1000. It works on OLTP environment and you can control the locks.
2) Use Oracle Bulk Delete
3) Use correlated delete query.
Single query is usually faster than multiple queries because of less context switching, and possibly less parsing.
First, disabling the index during the deletion would be helpful.
Try with a MERGE INTO statement :
1) create a temp table with IDs and an additional column from TABLE1 and test with the following
MERGE INTO table1 src
USING (SELECT id,col1
FROM test_merge_delete) tgt
ON (src.id = tgt.id)
WHEN MATCHED THEN
UPDATE
SET src.col1 = tgt.col1
DELETE
WHERE src.id = tgt.id
I have tried this code and It's working fine in my case.
DELETE FROM NG_USR_0_CLIENT_GRID_NEW WHERE rowid IN
( SELECT rowid FROM
(
SELECT wi_name, relationship, ROW_NUMBER() OVER (ORDER BY rowid DESC) RN
FROM NG_USR_0_CLIENT_GRID_NEW
WHERE wi_name = 'NB-0000001385-Process'
)
WHERE RN=2
);

CTE With Insert In Oracle

i am running a query in oracle with CTE.
When i execute the query it works fine in select statement but when i use insert statement it takes ample of time to execute.Any help here is the code
INSERT INTO port_weeklydailypricesTest (co_code,start_dtm,end_dtm)
SELECT * FROM
(
WITH CTE(co_code, start_dtm, end_dtm) AS
(
SELECT co_code ,
CAST(NEXT_DAY(MIN(dlyprice_date),'FRIDAY')-6 AS DATE) start_dtm ,
CAST(NEXT_DAY(MIN(dlyprice_date),'FRIDAY') AS DATE) end_dtm
FROM feed_dlyprice
GROUP BY co_code
UNION ALL
SELECT co_code ,
CAST(TO_CHAR(end_dtm + INTERVAL '1' DAY,'DD-MON-YYYY') AS DATE),
CAST(TO_CHAR(end_dtm + INTERVAL '7' DAY,'DD-MON-YYYY') AS DATE)
FROM CTE
WHERE CAST(end_dtm AS DATE) <= TO_CHAR(TO_DATE(SYSDATE+1,'DD-MON-YYYY'))
)
SELECT co_code,start_dtm,end_dtm
FROM CTE
);
If, as you say, the performance of the SELECT on its own is satisfactory the problem must lie with the INSERT part of the statement.
There are a number of things which might cause an insert to run slow:
The most likely is the presence of a trigger on the target table which executes something very expensive.
Another possibility is that the insert is waiting on a locked resource (say some other process has an exclusive table level lock on the target table, or some other shared resource such as a code control table).
it could be a storage allocation issue, chaining or row migration, too many indexes or lots of derived columns.
perhaps it is down to hardware - underpowered network, dodgy interconnects, a bad disk.
This is by no means exhaustive. The items at the top are application issues which you should be able to investigate and resolve. The further down the list you go the more likely it is that you will need the assistance on an on-site DBA.

Resources