SDO_JOIN table join on large tables no longer returns results after upgrade from Oracle 11.2.0.1 to 11.2.0.4 - oracle

I have the same database on 2 different Oracle servers, one is 11.2.0.1.0 and the other is 11.2.0.4.0.
I have the same 2 geometry tables in both databases and run the following query on both servers. When run on an 11.2.0.1.0 version of Oracle, the query runs for a few minutes and I get results, the same query when run on 11.2.0.4.0 runs for about 3 seconds and returns no results.
The BLPUs table holds 36 million points and the PD_B2 table holds a polygon. I am trying to find all the points that fall in the polygon.
Other spatial queries do return rows but they take hours and hours, whereas the table join suggested in the Oracle Spatial documentation takes 15 minutes to return all the points.
SELECT /*+ ordered */ a.uprn
FROM TABLE(SDO_JOIN('BLPUS', 'GEOLOC', 'PD_B2', 'GEOLOC','mask=ANYINTERACT')) c, blpus a, PD_B2 b
WHERE c.rowid1 = a.rowid
AND c.rowid2 = b.rowid;
The spatial queries below return an empty SDO_ROWIDSET() when run on the 11.2.0.4 server:
select SDO_JOIN('BLPUS', 'GEOLOC', 'PD_B2', 'GEOLOC','mask=ANYINTERACT')
from dual;
select SDO_JOIN('BLPUS', 'GEOLOC', 'PD_B2', 'GEOLOC')
from dual;
On the 11.2.0.1 server they return results.
I have discovered that a much smaller table of points does work on 11.2.0.4, so it seems that there is a size limit on 11.2.0.4 when using SDO_JOIN, whereas 11.2.0.1 seems to cope with the large table.
Does anyone know why this is or if there is an actual limit on table size when using SDO_JOIN?

This is strange. I see no reason why SDO_JOIN will not work the same way in 11.2.0.4. At least I have not seen that sort of behavior before. It looks like a bug to me and I suggest you file a service request with Oracle Support so we can take a look. You may need to provide a dump of the tables - or at least of a small enough subset that demonstrates the problem.
That said, there are a few things to check: did you apply the 11.2.0.4 patch on the same database? I.e. nothing changed in terms of table structures or content, grants, etc.?
When you say that the query returns no rows, does it do so immediately? Or does it perform some processing before completing without anything being returned?
How large is the PD_B2 table ?
What happens when you do:
select SDO_JOIN('BLPUS', 'GEOLOC', 'PD_B2', 'GEOLOC','mask=ANYINTERACT')
from dual;
Does this also return nothing ?
What happens if you do
select SDO_JOIN('BLPUS', 'GEOLOC', 'PD_B2', 'GEOLOC')
from dual;
Both should return something that looks like this:
SDO_ROWIDSET(SDO_ROWIDPAIR('AAAW3LAAGAAAADmAAu', 'AAAW3TAAGAAAAg7AAC'), SDO_ROWIDPAIR('AAAW3LAAGAAAADmABE', 'AAAW3TAAGAAAAgrAAA'),...)
You will see this if you run the query in sqlplus. [If you use a GUI (like TOAD or SQLDeveloper) then you may not see that. All those GUIs have problems dealing with objects or arrays.]
But the fact that your 11.2.0.4 tests complete very quickly probably means that you get an empty result back, maybe like this:
SDO_ROWIDSET()
and that confirms that something did not work.
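A client-independent way to check, using the same arguments as your query, is to count the rowid pairs instead of displaying the object:
SELECT count(*)
FROM TABLE(SDO_JOIN('BLPUS', 'GEOLOC', 'PD_B2', 'GEOLOC', 'mask=ANYINTERACT'));
If that returns 0 on 11.2.0.4 but a positive number on 11.2.0.1, you have a clean test case to attach to the service request.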
Now, from what you say, PD_B2 only contains one row? If so, then there is no reason whatsoever to go via SDO_JOIN. A straightforward spatial query is easier to write. SDO_JOIN only makes sense when both tables being joined contain multiple rows. Then again, if one of the tables is very small (like the PD_B2 table in your case), it falls back to a simple query anyway.
A simple query would look like this:
SELECT a.uprn
FROM blpus a, PD_B2 b
WHERE sdo_anyinteract (a.geoloc, b.geoloc) = 'TRUE';
Does this return what you expect in 11.2.0.4?
You may also want to examine the cost of the query and the query plan used. In sqlplus, do this:
set timing on
set autotrace traceonly
then run the above query. The effect is that sqlplus will not display any results - it will still fetch and format the output, it just will not print it. At the end you will get a printout of the query plan used as well as some execution statistics and the elapsed time. Please add those results to your question.
Running the query from within your application should have a similar profile: similar response time and database-side costs - assuming you do fetch all the 1.3 million rows.
BTW, what do you do with those results? Do you output them as a report? Do you save them into a table for later analysis? Surely you do not want to show them all on a map?
To clarify: SDO_JOIN is only for matching many to many geometries. For matching one to many, the simple SDO_ANYINTERACT() is what you need. As a matter of fact, SDO_JOIN will automatically fall back to a simple SDO_ANYINTERACT when one of the sets is very much smaller than the other. I can't see how SDO_JOIN could be faster in your circumstances (i.e. when searching for all objects that match one object) since it will perform the same as an SDO_ANYINTERACT and will require extra joins to the two tables.
Going back to the original issue - the difference in behavior of SDO_JOIN between 11.2.0.1 and 11.2.0.4 - that is, in my opinion, definitely a bug that you need to report to Oracle Support.

Related

Oracle: Return Large Dataset with Cursor in Procedure

I've seen lots of posts regarding the use of cursors in PL/SQL to return data to a calling application, but none of them touch on the issue I believe I'm having with this technique. I am fairly new to Oracle, but have extensive experience with MSSQL Server. In SQL Server, when building queries to be called by an application for returning data, I usually put the SELECT statement inside a stored proc with/without parameters, and let the stored proc execute the statement(s) and return the data automatically. I've learned that with PL/SQL, you must store the resulting dataset in a cursor and then consume the cursor.
We have a query that doesn't necessarily return huge amounts of rows (~5K - 10K rows), however the dataset is very wide as it's composed of 1400+ columns. Running the SQL query itself in SQL Developer returns results instantaneously. However, calling a procedure that opens a cursor for the same query takes 5+ minutes to finish.
CREATE OR REPLACE PROCEDURE PROCNAME(RESULTS OUT SYS_REFCURSOR)
AS
BEGIN
OPEN RESULTS FOR
<SELECT_query_with_1400+_columns>
...
END;
After doing some debugging to try to get to the root cause of the slowness, I'm leaning towards the cursor returning one row at a time very slowly. I can actually see this real-time by converting the proc code into a PL/SQL block and using DBMS_SQL.return_result(RESULTS) after the SELECT query. When running this, I can see each row show up in the Script output window in SQL Developer one at a time. If this is exactly how the cursor returns the data to the calling application, then I can definitely see how this is the bottleneck as it could take 5-10 minutes to finish returning all 5K-10K rows. If I remove columns from the SELECT query, the cursor displays all the rows much faster, so it does seem like the large amount of columns is an issue using a cursor.
Knowing that running the SQL query by itself returns instant results, how could I get this same performance out of a cursor? It doesn't seem like it's possible. Is the answer putting the embedded SQL in the application code and not using a procedure/cursor to return data in this scenario? We are using Oracle 12c in our environment.
Edit: Just want to address how I am testing performance using the regular SELECT query vs the PL/SQL block with cursor method:
SELECT (takes ~27 seconds to return ~6K rows):
SELECT <1400+_columns>
FROM <table_name>;
PL/SQL with cursor (takes ~5-10 minutes to return ~6K rows):
DECLARE RESULTS SYS_REFCURSOR;
BEGIN
OPEN RESULTS FOR
SELECT <1400+_columns>
FROM <table_name>;
DBMS_SQL.return_result(RESULTS);
END;
Some of the comments are referencing what happens in the console application once all the data is returned, but I am only speaking about the performance of the two methods described above within Oracle / SQL Developer. Hope this helps clarify the point I'm trying to convey.
You can run a SQL Monitor report for the two executions of the SQL; that will show you exactly where the time is being spent. I would also consider running the two approaches in separate snapshot intervals and checking into the output from an AWR Differences report and ADDM Compare Report; you'd probably be surprised at the amazing detail these comparison reports provide.
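For example (a sketch - SQL Monitor requires the Diagnostics and Tuning Packs, and the SQL text fragment below is a placeholder you would replace):
SELECT sql_id, status, sql_text
FROM v$sql_monitor
WHERE sql_text LIKE '%your_distinctive_fragment%';
SELECT DBMS_SQLTUNE.REPORT_SQL_MONITOR(sql_id => '&sql_id', type => 'TEXT', report_level => 'ALL')
FROM dual;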
Also, even though more than 255 columns in a table is a "no-no" according to Oracle, as it will fragment each record across more than one database block and thus increase the IO needed to retrieve the results, I suspect the difference between the two approaches is not an IO problem, since you report that fetching all the results via straight SQL is fast. Therefore, I suspect more of a memory problem. As you probably know, PL/SQL code uses the Program Global Area (PGA), so I would check the parameter pga_aggregate_target and bump it up to say 5 GB (just guessing). An ADDM report run for the interval when the code ran will tell you if the advisor recommends a change to that parameter.
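A minimal sketch of that check and change (the 5 GB value is only a guess, as noted - size it to your workload and available memory):
-- in sqlplus
show parameter pga_aggregate_target
-- requires appropriate privileges
ALTER SYSTEM SET pga_aggregate_target = 5G SCOPE = BOTH;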

SP using table/index with volatile statistics that differ at compile and run time

I’m a longtime MSSQL developer who finds himself back in PL/SQL for the first time since Oracle 7. I’m looking for some tuning advice regarding a large export stored procedure, which is sporadically and not very reproducibly running slow at certain points. This happens around some static working tables which it truncates, fills and uses as part of the export. The code in outline typically looks like this:
create or replace Procedure BigMultiPurposeExport as
begin
-- about 2000 lines of other code
INSERT INTO WORK_TABLE_5 SELECT WHATEVER1 FROM WHEREVER1;
INSERT INTO WORK_TABLE_5 SELECT WHATEVER2 FROM WHEREVER2;
INSERT INTO WORK_TABLE_5 SELECT WHATEVER3 FROM WHEREVER3;
INSERT INTO WORK_TABLE_5 SELECT WHATEVER4 FROM WHEREVER4;
-- WORK_TABLE_5 now has 0 to ~500k rows whose content can vary drastically from run to run
-- e.g. one hourly run exports 3 whale sightings, next exports all tourist visits to Kenya this decade
-- about 1000 lines of other code
INSERT INTO OUTPUT_TABLE_3
SELECT THIS, THAT, THE_OTHER
FROM BUSINESS_TABLE_1 BT1
INNER JOIN BUSINESS_TABLE_2 ON etc -- typical join on indexed columns
INNER JOIN BUSINESS_TABLE_3 ON etc -- typical join on indexed columns
INNER JOIN BUSINESS_TABLE_4 ON etc -- typical join on indexed columns
LEFT OUTER JOIN WORK_TABLE_1 ON etc -- typical join on indexed columns
LEFT OUTER JOIN WORK_TABLE_2 ON etc -- typical join on indexed columns
LEFT OUTER JOIN WORK_TABLE_3 ON etc -- typical join on indexed columns
LEFT OUTER JOIN WORK_TABLE_4 ON etc -- typical join on indexed columns
LEFT OUTER JOIN WORK_TABLE_5 WT5 ON BT1.ID = WT5.BT1_ID AND WT5.RECORD_TYPE = 21
-- join above is now supported by indexes on BUSINESS_TABLE_1 (ID) and WORK_TABLE_5 (BT1_ID, RECORD_TYPE), originally wasn't
LEFT OUTER JOIN WORK_TABLE_6 ON etc -- typical join on indexed columns
LEFT OUTER JOIN WORK_TABLE_7 ON etc -- typical join on indexed columns
-- about 4000 lines of other code
end;
That final insert into OUTPUT_TABLE_3 usually runs in under 10 seconds, but once in a while on certain customer servers it times out at our default 99 minutes. Then we have them take the timeout off and run it on a Friday night, and it finishes but takes 16 hours.
I narrowed the problem down to the join to WORK_TABLE_5, which had no index support, and put an index on the join terms. The next run took 4 seconds. But success has been intermittent: the customer occasionally gets slow runs when they drastically change their export selection (i.e. drastically change the data in WORK_TABLE_5). And if we update statistics and rebuild indexes after a timed-out export, it runs fine at the next attempt.
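The supporting index looks something like this (the index name here is illustrative):
CREATE INDEX WT5_BT1_IX ON WORK_TABLE_5 (BT1_ID, RECORD_TYPE);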
So, I am wondering about how best to handle truncating/filling static work tables with static indexes, statistics updated overnight, and a stored procedure compiled when the statistics are nothing like runtime.
I have a few general questions about things I'd like to understand better:
Is the nature of the data in the work table going to substantially affect the query plan? Does Oracle form its query plan when you compile the stored procedure? Could we get a highly inappropriate query plan if we compile the stored procedure with the table empty and then use a table with 500k rows at runtime?
I expect that if this were an ad-hoc script then updating statistics on the problem table just before selecting from it would eliminate the sporadic slowdowns. But what if I were to update statistics inside the stored procedure, which is compiled with different statistics from runtime?
Anything else you'd like to add...
Thanks for any advice. I hope my MSSQL preconceptions haven't led me too far off base.
This is happening in Oracle 11g, but the code is deployed to assorted customers using Oracle 10 through 12 and I'd like to cater to all of those if possible.
-- Joel
Huge differences in table or index sizes can most definitely cause performance problems. The solution is to add statistics gathering to the procedure instead of relying on the default statistics jobs.
If you've been away from Oracle since version 7, the most important new feature is the Cost Based Optimizer. Oracle now builds query execution plans based on the optimizer statistics of tables, indexes, columns, expressions, system statistics, outlines, directives, dynamic sampling, etc. If you're a full time Oracle developer you should probably spend a day reading about optimizer statistics. Start with Managing Optimizer Statistics and DBMS_STATS in the official documentation.
Eventually the stored procedure should look like this:
--1: Insert into working tables.
insert into work_table...
--2: Gather statistics on working tables.
dbms_stats.gather_table_stats('SCHEMA_NAME', 'WORK_TABLE', ...);
--3: Use working tables.
insert into other_table select * from work_table...
There are so many statistics features that it's hard to know exactly what parameters to use in that second step above. Here are some guesses about features you might find useful (a combined sketch follows this list):
DEGREE - One reason people avoid gathering statistics inside a process is the time it takes. You can significantly improve the run time by setting the degree, although this also uses significantly more resources.
NO_INVALIDATE - It can be tricky to know when exactly the statistics are "set" for a query. Gathering statistics usually quickly invalidates execution plans that were based on old statistics, but not always. If you want to be 100% sure that the next query is using the latest statistics, you want to set NO_INVALIDATE=>FALSE.
ESTIMATE_PERCENT - In 11g and above you definitely want to use the default, which uses a faster algorithm. In 10g and below you may need to set the value to something low to make the gathering fast enough.
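Putting those together, step 2 above might look like this (a sketch - the schema name, table name and degree are illustrative):
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname          => 'SCHEMA_NAME',
    tabname          => 'WORK_TABLE_5',
    estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE, -- default; fast on 11g+
    degree           => 4,                           -- parallel gathering to reduce elapsed time
    no_invalidate    => FALSE,                       -- make dependent cursors re-parse with the new stats
    cascade          => TRUE);                       -- also gather index statistics
END;
/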
Although Oracle 10g and above comes with default statistics gathering jobs you cannot rely on them for a few reasons:
They are scheduled and may not run at the right time. If a process significantly changes the data then new stats are needed right away, not at 10 PM. If there are a lot of tables that need to be analyzed the job may not get to them all in one day.
Many DBAs disable the jobs. This is ridiculous and almost always a mistake, but you'll find many DBAs that disabled the job because they think they can do it better. Instead of working with the auto tasks and setting preferences, many DBAs like to throw the whole thing out and replace it with a custom procedure that rots over time.

After upgrading from Sql Server 2008 to Sql Server 2016 a stored procedure that was fast is now slow

We have a stored procedure that returns all of the records that fall within a geospatial region ("geography"). It uses a CTE (with), some unions, some inner joins and returns the data as XML; nothing controversial or cutting edge but also not trivial.
This stored procedure has served us well for many years on SQL Server 2008. It has been running within 1 sec on a relatively slow server. We have just migrated to SQL Server 2016 on a super fast server with lots of memory and super fast SSDs.
The entire database and associated application is going really fast on this new server and we are very happy with it. However this one stored procedure is running in 16 sec rather than 1 sec - against exactly the same parameters and exactly the same dataset.
We have updated the indexes and statistics on this database. We have also changed the compatibility level of the database from 100 to 130.
Interestingly, I have re-written the stored procedure to use a temporary table and 'insert' rather than using the CTE. This has brought the time down from 16 sec to 4 sec.
The execution plan does not provide any obvious insights into where a bottleneck may be.
We are a bit stuck for ideas. What should we do next? Thanks in advance.
--
I have now spent more time on this problem than I care to admit. I have boiled the stored procedure down to the following query to demonstrate the problem.
drop table #T
declare @viewport sys.geography=convert(sys.geography,0xE610000001041700000000CE08C22D7740C002370B7670F4624000CE08C22D7740C002378B5976F4624000CE08C22D7740C003370B3D7CF4624000CE08C22D7740C003378B2082F4624000CE08C22D7740C003370B0488F4624000CE08C22D7740C004378BE78DF4624000CE08C22D7740C004370BCB93F4624000CE08C22D7740C004378BAE99F4624000CE08C22D7740C005370B929FF4624000CE08C22D7740C005378B75A5F4624000CE08C22D7740C005370B59ABF462406F22B7698E7640C005370B59ABF462406F22B7698E7640C005378B75A5F462406F22B7698E7640C005370B929FF462406F22B7698E7640C004378BAE99F462406F22B7698E7640C004370BCB93F462406F22B7698E7640C004378BE78DF462406F22B7698E7640C003370B0488F462406F22B7698E7640C003378B2082F462406F22B7698E7640C003370B3D7CF462406F22B7698E7640C002378B5976F462406F22B7698E7640C002370B7670F4624000CE08C22D7740C002370B7670F4624001000000020000000001000000FFFFFFFF0000000003)
declare @outputControlParameter nvarchar(max) = 'a value passed in through a parameter to the stored proc that controls the nature of data to return. This is not the solution you are looking for'
create table #T
(value int)
insert into #T
select 136561 union
select 16482 -- These values are sourced from parameters into the stored proc
select
[GeoServices_Location].[GeographicServicesGatewayId],
[GeoServices_Location].[Coordinate].Lat,
[GeoServices_Location].[Coordinate].Long
from GeoServices_Location
inner join GeoServices_GeographicServicesGateway
on GeoServices_Location.GeographicServicesGatewayId = GeoServices_GeographicServicesGateway.GeographicServicesGatewayId
where
(
(len(@outputControlParameter) > 0 and GeoServices_Location.GeographicServicesGatewayId in (select value from #T))
or (len(@outputControlParameter) = 0 and GeoServices_Location.Coordinate.STIntersects(@viewport) = 1)
)
and GeoServices_GeographicServicesGateway.PrimarilyFoundOnLayerId IN (3,8,9,5)
GO
With the stored procedure boiled down to this, it runs in 0 sec on SQL Server 2008 and 5 sec on SQL Server 2016
http://www.filedropper.com/newserver-slowexecutionplan
http://www.filedropper.com/oldserver-fastexecutionplan
SQL Server 2016 is choking on the geospatial Intersects call, with 94% of the time spent there. SQL Server 2008 is spending its time on a bunch of other steps, including hash matching, parallelism and other standard stuff.
Remember this is the same database. One has just been copied to a SQL Server 2016 machine and had its compatibility level increased.
To get around the problem I have actually rewritten the stored procedure so that SQL Server 2016 does not choke. I now have it running in 250 msec. However this should not have happened in the first place, and I am concerned that there are other previously finely tuned queries or stored procedures that are now not running efficiently.
Thanks in advance.
--
Furthermore, I had a suggestion to add the trace flag -T6534 to the startup parameters of the service. It made no difference to the query time. I also tried adding option(QUERYTRACEON 6534) to the end of the query, but again it made no difference.
From the query plans you provided I see that the spatial index is not used on the newer server version.
Use a spatial index hint to make sure the query optimizer chooses the plan with the spatial index:
select
[GeoServices_Location].[GeographicServicesGatewayId],
[GeoServices_Location].[Coordinate].Lat,
[GeoServices_Location].[Coordinate].Long
from GeoServices_Location with (index ([spatial_index_name]))...
I see that the problem with the hint is the OR operation in the query predicate, so my suggestion with the hint actually won't help in this case.
However, I see that the predicate depends on @outputControlParameter, so rewriting the query to separate these two cases might help (see my proposal below).
Also, from your query plans I see that the query plan on SQL 2008 is parallel while on SQL 2016 it is serial. Use option (recompile, querytraceon 8649) to force a parallel plan (this should help if your new super-fast server has more cores than the old one).
if (len(@outputControlParameter) > 0)
select
[GeoServices_Location].[GeographicServicesGatewayId],
[GeoServices_Location].[Coordinate].Lat,
[GeoServices_Location].[Coordinate].Long
from GeoServices_Location
inner join GeoServices_GeographicServicesGateway
on GeoServices_Location.GeographicServicesGatewayId = GeoServices_GeographicServicesGateway.GeographicServicesGatewayId
where
GeoServices_Location.GeographicServicesGatewayId in (select value from #T)
and GeoServices_GeographicServicesGateway.PrimarilyFoundOnLayerId IN (3,8,9,5)
option (recompile, querytraceon 8649)
else
select
[GeoServices_Location].[GeographicServicesGatewayId],
[GeoServices_Location].[Coordinate].Lat,
[GeoServices_Location].[Coordinate].Long
from GeoServices_Location with (index ([SPATIAL_GeoServices_Location]))
inner join GeoServices_GeographicServicesGateway
on GeoServices_Location.GeographicServicesGatewayId = GeoServices_GeographicServicesGateway.GeographicServicesGatewayId
where
GeoServices_Location.Coordinate.STIntersects(@viewport) = 1
and GeoServices_GeographicServicesGateway.PrimarilyFoundOnLayerId IN (3,8,9,5)
option (recompile, querytraceon 8649)
check the growth settings of the data/log files on the new server vs the old server, for both the DB the query runs on and tempdb (see the sketch after this list)
check the log for I/O buffer errors
check recovery model of the DB's - simple vs full/bulk
is this a consistent behavior? maybe a process is running during the execution?
regarding statistics/indexes - are you sure they were gathered on a correct data sample? (look at the plan)
many more things can be checked/done - but there is not enough info in this question.
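For some of these checks, a couple of quick queries against the documented catalog views (a sketch; run on both servers, and in both the application database and tempdb):
SELECT name, type_desc, size, growth, is_percent_growth
FROM sys.database_files;
SELECT name, recovery_model_desc
FROM sys.databases;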

Oracle accessing multiple databases

I'm using Oracle SQL Developer version 4.02.15.21.
I need to write a query that accesses multiple databases. All that I'm trying to do is get a list of all the IDs present in "TableX" (there is an instance of TableX in each of these databases, but with different values) in each database and union all of the results together into one big list.
My problem comes with accessing more than 4 databases -- I get this error: ORA-02020: too many database links in use. I cannot change the INIT.ORA file's open_links maximum limit.
So I've tried dynamically opening/closing these links:
SELECT Local.PUID FROM TableX Local
UNION ALL
----
SELECT Xdb1.PUID FROM TableX@db1 Xdb1;
ALTER SESSION CLOSE DATABASE LINK db1
UNION ALL
----
SELECT Xdb2.PUID FROM TableX@db2 Xdb2;
ALTER SESSION CLOSE DATABASE LINK db2
UNION ALL
----
SELECT Xdb3.PUID FROM TableX@db3 Xdb3;
ALTER SESSION CLOSE DATABASE LINK db3
UNION ALL
----
SELECT Xdb4.PUID FROM TableX@db4 Xdb4;
ALTER SESSION CLOSE DATABASE LINK db4
UNION ALL
----
SELECT Xdb5.PUID FROM TableX@db5 Xdb5;
ALTER SESSION CLOSE DATABASE LINK db5
However this produces 'ORA-02081: database link is not open' on whichever db link is being closed last.
Can someone please suggest an alternative or adjustment to the above?
Please provide a small sample of your suggestion with syntactically correct SQL if possible.
If you can't change the open_links setting, you cannot have a single query that selects from all the databases you want to query.
If your requirement is to query a large number of databases via database links, it seems highly reasonable to change the open_links setting. If you have one set of people telling you that you need to do X (query data from a large number of tables) and another set of people telling you that you cannot do X, it almost always makes sense to have those two sets of people talk and figure out which imperative wins.
If we can solve the problem without writing a single query, then you have options. You can write a bit of PL/SQL, for example, that selects the data from each table in turn and does something with it. Depending on the number of database links involved, it may make sense to write a loop that generates a dynamic SQL statement for each database link, executes the SQL, and then closes the database link.
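A minimal sketch of that loop, assuming the link names are known in advance and the rows are collected into a hypothetical local staging table called TABLEX_ALL_PUIDS:
DECLARE
  TYPE t_links IS TABLE OF VARCHAR2(128);
  l_links t_links := t_links('db1', 'db2', 'db3', 'db4', 'db5');
  TYPE t_ids IS TABLE OF NUMBER;
  l_ids t_ids;
BEGIN
  FOR i IN 1 .. l_links.COUNT LOOP
    -- dynamic SQL because the link name is not known until runtime
    EXECUTE IMMEDIATE 'SELECT puid FROM TableX@' || l_links(i)
      BULK COLLECT INTO l_ids;
    FORALL j IN 1 .. l_ids.COUNT
      INSERT INTO tablex_all_puids (puid) VALUES (l_ids(j));
    -- end the distributed transaction so the link is no longer in use, then close it
    COMMIT;
    EXECUTE IMMEDIATE 'ALTER SESSION CLOSE DATABASE LINK ' || l_links(i);
  END LOOP;
END;
/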
If you need to provide a user with the ability to run a single query that returns all the data, you can write a pipelined table function that implements this sort of loop with dynamic SQL and then let the user query the pipelined table function. This isn't really a single query that fetches the data from all the tables, but it is as close as you're likely to get without modifying the open_links limit.

Same query, slow on Oracle 9, fast on Oracle 10

We have a table called t_reading, with the following schema:
MEAS_ASS_ID NUMBER(12,0)
READ_DATE DATE
READ_TIME VARCHAR2(5 BYTE)
NUMERIC_VAL NUMBER
CHANGE_REASON VARCHAR2(240 BYTE)
OLD_IND NUMBER(1,0)
This table is indexed as follows:
CREATE INDEX RED_X4 ON T_READING
(
"OLD_IND",
"READ_DATE" DESC,
"MEAS_ASS_ID",
"READ_TIME"
)
This exact table (with the same data) exists on two servers, the only difference is the Oracle version installed on each one.
The query in question is:
SELECT * FROM t_reading WHERE OLD_IND = 0 AND MEAS_ASS_ID IN (5022, 5003) AND read_date BETWEEN to_date('30/10/2012', 'dd/mm/yyyy') AND to_date('31/10/2012', 'dd/mm/yyyy');
This query executes in less than a second on Oracle 10, and around a minute in Oracle 9.
Are we missing something?
EDIT:
Execution plan for Oracle 9:
Execution plan for Oracle 10:
"Are we missing something?"
Almost certainly, but it's difficult for us to tell you what.
There were some performance improvements in the CBO from 9i to 10g but it's unlikely to make that much difference. So it must be some variation in your systems, which is obviously the hardest thing for us to diagnose, blind and remote as we are.
So, the first things to rule out are general system differences - disk speeds, I/O bottlenecks, memory sizing, etc. You say you have two servers; do they have different specs? Whilst it will require assistance from a sysadmin type to investigate these things, we can discount them with a single question: is it just this query, or can you reproduce this effect with many different queries?
If is just the query, there are at least three possible explanations.
One is data distribution. How was the data populated in the two databases? If the 10g database was loaded from an export of the 9i database, was it sorted in some fashion? Even if it wasn't, it is possible that the ETL process has compacted and organised the data and built the fresh indexes in a way which improves the access times.
Another is statistics. Are the 10g statistics fresh and realistic, whilst the 9i statistics are stale and misleading?
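A quick way to compare the two databases (assuming you can connect as the owning schema on each):
SELECT table_name, num_rows, last_analyzed
FROM user_tables
WHERE table_name = 'T_READING';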
A third possibility is a stored execution plan. (You have posted a query with literals, this only applies to queries with bind variables.) Searches on date ranges are notoriously hard to tune. A date range of to_date('30/10/2012', 'dd/mm/yyyy') AND to_date('31/10/2012', 'dd/mm/yyyy') suits one sort of plan, whereas date range of to_date('01/01/2010', 'dd/mm/yyyy') AND to_date('31/10/2012', 'dd/mm/yyyy') may well suit a different approach. If the extant plan on the 9i database suits a broader range then queries for a narrow range may take a long time.
While I've been typing this you have published the explain plans. The killer detail is at the bottom of the 9i plan:
Note: rule-based optimization
You haven't got any stats for the table or the index, so the optimizer is applying the dumb defaults of the RBO. You should really address this, but it's not a simple task. You may need to gather stats for all your tables. You may need to change the OPTIMIZER_MODE in the init.ora file. You may need to undertake a regression test of all the queries on your database. So, it's not something you should do lightly.
In the meantime, if this query is bugging you, you'll need to wrangle the Rule-Based Optimizer the old-fashioned way. Find out more.
A couple of potential explanations:
You're range scanning different indexes.
Assuming that you've got the same index on your 10g table and have just given it a different name, the explain plans are different.
The main worry I would have is the lack of information in the rows, bytes and cost column of the explain plan on your 9i query. Oracle 9i does not collect statistics by default and this detail would indicate that you have not collected statistics on this table. Use dbms_stats to gather statistics on your table and the indexes. Specifically the procedure gather_table_stats:
BEGIN
DBMS_STATS.GATHER_TABLE_STATS (
ownname => user,
tabname => 'T_READING',
estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE,
method_opt => 'FOR ALL INDEXED COLUMNS',
cascade => TRUE -- gather index statistics
);
END;
/
There are plenty of other options if you're interested. Assuming the indexes are different this may help the CBO (assuming it's "turned on") to pick the correct index.
The other options include what server they are on and what the database parameters are. If they're on different servers then the relative "power", disk-speed, I/O and a never-ending list of other options could easily cause a difference. If the database parameters are different then you have the same problem.
Database tuning is an art as much as a science. Oracle has a whole book on it and there are plenty of other resources out there.
A few observations:
your index is a DESCENDING index. This is a function-based index and, as such, it won't work as expected under the RULE optimizer.
your 9i plan shows access only on OLD_IND, your 10g plan (you cut off the important predicate bits) shows a range scan + inlist iterator, so depending on that RED_PK, it may be accessing on MEAS_ASS_ID which is perhaps more selective.
in terms of indexing too: your query filters with OLD_IND equality, MEAS_ASS_ID equality and a READ_DATE range scan, so a better index is (OLD_IND, MEAS_ASS_ID, READ_DATE) - put the range-scanned column last to cut down on IO.
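For example (the index name is just illustrative):
CREATE INDEX RED_X5 ON T_READING (OLD_IND, MEAS_ASS_ID, READ_DATE);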
Have you tried running an explain plan for the query on the two servers? The query optimiser for 9i is different from the one for 10g; the 10g query optimiser is much faster and parallelised. Check out the following link: Upgrading Query Optimiser.
EXPLAIN PLAN FOR SELECT * FROM t_reading WHERE OLD_IND = 0 AND MEAS_ASS_ID IN (5022, 5003) AND read_date BETWEEN to_date('30/10/2012', 'dd/mm/yyyy') AND to_date('31/10/2012', 'dd/mm/yyyy');
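Then display the plan on each server (assuming DBMS_XPLAN is available, i.e. 9.2 onwards):
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);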
