Oracle data conversion - BULK COLLECT/FORALL won't work

I'm trying to find a better way to pull data from one table into another as part of a larger processing project. I thought I could do it through BULK COLLECT and FORALL, and pick up significant speed, but I don't think I can handle individual column references using BULK COLLECT...
I have a data / application migration project (MSSQL to Oracle 11.2) that I inherited. I'm trying to optimize and check it end to end... The first step of the process is to import the legacy data (database table, 4.5M records, 170 columns, all in string format) into another table.
The initial conversion was cursor-based, looping row-by-row, with each column going through at least one function for cleaning/conversion. It worked, but on the test system it took too long -- over 12 hours to translate 4.5 million records from one table to another with very simple functions.
On a local implementation I have access to, they wound up limiting the run to 13,000 unit ID numbers, which comes to about 220K records.
I set up an even more limited dev system on my laptop for testing alternative techniques -- and can get over 5 times the import speed, but that's still cursor/row-by-row. I've set the table to NOLOGGING and use the APPEND hint. I've tested with/without indexes. I can't do SELECT INTO with that size table -- it just chokes.
Is there another / better technique? How else can I pick up conversion speed? Am I doing it wrong with the BULK COLLECT (i.e. is there a way to reference the individual fields)?
If anybody has any insight, please chip in! I am including a very stripped-down version of the procedure, so I can show my usage attempt. This same thing (pretty much) runs as a regular cursor loop, just not with the FORALL and (i) subscripts. The error I get is ORA-00913: too many values. I have been over the full insert statement, matching fields to values. I've checked the data transformation functions -- they work for regular columns as parameters. I am wondering if they don't work with BULK COLLECT and/or FORALL because of the subscripts?
UPDATED INFORMATION:
This is on a restricted-access system, and up until now (waiting for accounts), I've been having to remote diagnose the "real" (customer) DEV system, by running against a local system -- profiling code, data, timing, etc. My recommendations were put in by another developer, who would feed me back results. Seriously. However...
@Mark, @Justin - Normally, I would get rid of any cursors not ?absolutely? needed, and use SELECT INTO where possible. That's usually my first recommendation on older PL/SQL code... ("Why. So. Cursor?" wearing Joker make-up). That's the first thing I tried on the local system, but it just slowed the server down to a crawl and I quit testing. That was before the NOLOGGING change was implemented -- that's what I'll attempt when I can touch the dev system.
After looking at the timing, queries, joins, indexes, and crying, I recommended NOLOGGING and converting to INSERT /*+ APPEND */ -- which bought time in other processes, mainly tables built off joins.
re: the " OID <= '000052000' " -- when they set up their first converted code on the customer dev system, they had to limit the number of records converted from the PMS_OHF table. Originally, they could get 13,000 personnel identifiers to process in a reasonable amount of time. Those 13,000 IDs come to about 220K records, so that's what they were moving when I came on board. Some rewrites, join corrections, and the NOLOGGING/INSERT APPEND change made a big enough difference that they went on. On the local system, I thought 13,000 was too small -- I don't think I get a meaningful comparison against the legacy result -- so I upped it, and upped it. I should be brave and try a full conversion on the laptop dev system -- here I can at least watch what's going on through EM... the gov't won't allow their DBAs to use it. (!?)
BIGGER INFORMATION: -- after pondering the 00913 error again, and thinking back to other projects, I realized the earlier errors came up when more than one element was passed to a function that expected a single element... which points me back to my trying to use subscripted field names in a BULK COLLECT loop. I re-watched a couple of Steven Feuerstein YT presentations, and I think it finally sank in. The simple web examples... I was making my types horizontally, not vertically (or vice-versa)... in order to get my function calls to work, I think I have to make a TYPE for each field, and an ARRAY/TABLE of that TYPE. Suddenly that's 170 of them. I'm thinking that I will look at some Tom Kyte lessons on manual parallelism, and ask whether I'll have access to the new (11.2?) DBMS_PARALLEL_EXECUTE interface -- which I doubt. Also, not knowing more about the customer dev system, other than descriptions best termed "inadequate", I don't know whether parallelism would be a huge help. I need to read up on it.
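(For reference, a minimal DBMS_PARALLEL_EXECUTE sketch -- the task name, chunk size, and parallel level below are made up, and the column lists are elided the same way they are in the procedure further down; this only illustrates the chunk-by-ROWID pattern, not the real conversion:)
BEGIN
  DBMS_PARALLEL_EXECUTE.CREATE_TASK('CONVERT_OHF');             -- hypothetical task name
  DBMS_PARALLEL_EXECUTE.CREATE_CHUNKS_BY_ROWID(
      task_name   => 'CONVERT_OHF',
      table_owner => 'TEST',
      table_name  => 'PMS_OHF',
      by_row      => TRUE,
      chunk_size  => 10000);                                     -- rows per chunk (made up)
  DBMS_PARALLEL_EXECUTE.RUN_TASK(
      task_name      => 'CONVERT_OHF',
      sql_stmt       => 'INSERT INTO TEST.OPTM_SHIST ( ... )
                         SELECT ... FROM TEST.PMS_OHF
                         WHERE ROWID BETWEEN :start_id AND :end_id',
      language_flag  => DBMS_SQL.NATIVE,
      parallel_level => 4);                                      -- number of parallel jobs (made up)
  DBMS_PARALLEL_EXECUTE.DROP_TASK('CONVERT_OHF');
END;
/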
All I know is, I have to get some full runs completed or I won't feel comfortable saying that our results are "close enough" to the legacy results. We might not have much choice over a multi-day full run for our testing.
PROCEDURE CONVERT_OHF_FA IS
CURSOR L_OHF IS -- Cursor used to get SOURCE TABLE data
SELECT *
FROM TEST.PMS_OHF -- OHF is legacy data source
where OID <= '000052000' -- limits OHF data to a smaller subset
ORDER BY ID ;
L_OHF_DATA TEST.PMS_OHF%ROWTYPE;
L_SHDATA TEST.OPTM_SHIST%ROWTYPE;
Type hist_Array is table of TEST.PMS_OHF%ROWTYPE;
SHF_INPUT hist_array ;
Type Ohist_Array is table of TEST.OPTM_SHIST%ROWTYPE;
TARG_SHIST ohist_Array ;
n_limit number := 1000 ;
BEGIN
begin
OPEN L_OHF;
LOOP
FETCH L_OHF BULK COLLECT INTO SHF_INPUT LIMIT n_limit ;
FORALL i in 1 .. n_limit
INSERT INTO TEST.OPTM_SHIST
( -- There are 170 columns in target table, requiring diff't xformations
RECORD_NUMBER , UNIQUE_ID , STRENGTH_YEAR_MONTH , FY , FM , ETHNIC ,
SOURCE_CODE_CURR , SOURCE_CODE_CURR_STAT ,
-- ... a LOT more fields
DESG_DT_01 ,
-- and some place holders for later
SOURCE_CALC , PSID , GAIN_CURR_DT_CALC
)
values
( -- examples of xformations
SHF_INPUT.ID(i) ,
'00000000000000000000000' || SHF_INPUT.IOD(i) ,
TEST.PMS_UTIL.STR_TO_YM_DATE( SHF_INPUT.STRYRMO(i) ) ,
TEST.PMS_UTIL.STR_TO_YEAR( SHF_INPUT.STRYRMO(i) ) ,
TEST.PMS_UTIL.STR_TO_MONTH( SHF_INPUT.STRYRMO(i) ) ,
TEST.PMS_UTIL.REMOVE_NONASCII( SHF_INPUT.ETHNIC(i) ) ,
-- ... there are a lot of columns
TEST.PMS_UTIL.REMOVE_NONASCII( SUBSTR( SHF_INPUT.SCCURPRICL(i),1,2 ) ) ,
TEST.PMS_UTIL.REMOVE_NONASCII( SUBSTR( SHF_INPUT.SCCURPRICL(i),3,1 ) ) ,
-- an example of other transformations
( case
when (
(
SHF_INPUT.STRYRMO(i) >= '09801'
AND
SHF_INPUT.STRYRMO(i) < '10900'
)
OR
(
SHF_INPUT.STRYRMO(i) = '10901'
AND
SHF_INPUT.DESCHGCT01(i) = '081'
)
)
then TEST.PMS_UTIL.STR_TO_DATE( SHF_INPUT.DESCHGCT01(i) || SHF_INPUT.DESCHGST01(i) )
else TEST.PMS_UTIL.STR_TO_DATE( SHF_INPUT.DESCHGDT01(i) )
end ),
-- below are fields that will be filled later
null , -- SOURCE_CALC ,
SHF_INPUT.OID(i) ,
null -- GAIN_CURR_DT_CALC
) ;
EXIT WHEN L_OHF%NOTFOUND; -- exit when last row is fetched
END LOOP;
COMMIT;
close L_OHF;
END;
end CONVERT_OHF_FA ;

execute immediate 'alter session enable parallel dml';
INSERT /*+ APPEND PARALLEL */ INTO TEST.OPTM_SHIST(...)
SELECT ...
FROM TEST.PMS_OHF
WHERE OID <= '000052000';
This is The Way to do large data loads. Don't be fooled by all the fancy PL/SQL options like bulk collect, pipelined tables, etc. They are rarely any faster or easier to use than plain old SQL. The main benefit of those features is to improve the performance of a row-by-agonizing-row process without significant refactoring.
In this case it looks like there's already virtually no logic in PL/SQL. Almost all the PL/SQL can be thrown out and replaced with a single query. This makes it much easier to modify, debug, add parallelism, etc.
Some other tips:
The ORDER BY is probably not helpful for data loads, unless you're trying to do something fancy with indexes, like improving the clustering factor or rebuilding without sorting.
Ensure your functions are declared as DETERMINISTIC if the output is always identical for the same input. This may help Oracle avoid re-calling the function for the same value. For even better performance, you can inline all of the functions in the SQL statement, but that can get messy.
If you still need to use BULK COLLECT, use the hint APPEND_VALUES, not APPEND. Both of these tips are shown in the sketch below.
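A minimal, self-contained sketch of those two tips -- the table, the function body, and the string format here are made up for illustration, not taken from the real PMS_UTIL code:
CREATE TABLE demo_target (raw_val VARCHAR2(10), yr NUMBER);

CREATE OR REPLACE FUNCTION str_to_year_demo (p_str IN VARCHAR2)
  RETURN NUMBER
  DETERMINISTIC                              -- same input always produces the same output
IS
BEGIN
  RETURN TO_NUMBER(SUBSTR(p_str, 2, 4));     -- illustrative conversion only
END;
/

DECLARE
  TYPE t_str_tab IS TABLE OF VARCHAR2(10);
  l_rows t_str_tab := t_str_tab('0200901', '0200902');
BEGIN
  FORALL i IN 1 .. l_rows.COUNT
    INSERT /*+ APPEND_VALUES */ INTO demo_target (raw_val, yr)   -- direct-path insert of the bound values
    VALUES (l_rows(i), str_to_year_demo(l_rows(i)));
  COMMIT;                                                        -- required before the table is queried again
END;
/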

After dropping this for other issues, I picked this up again today.
Someone sent me a snippet of their similar code and I decided I was going to sit down and just brute-force through the issue: Go to the minimum # of columns and match values, and increase columns/values and recompile...
And then it hit me... my index was in the wrong place.
INCORRECT form:
SHF_INPUT.ID(i) ,
'00000000000000000000000' || SHF_INPUT.IOD(i) ,
TEST.PMS_UTIL.STR_TO_YM_DATE( SHF_INPUT.STRYRMO(i) ) ,
CORRECT form:
SHF_INPUT(i).ID ,
'00000000000000000000000' || SHF_Input(i).IOD ,
TEST.PMS_UTIL.STR_TO_YM_DATE( SHF_Input(i).STRYRMO ) ,
I blame it on looking at early multi-column bulk collect examples and assuming I could convert them to %ROWTYPE examples off the top of my head. I got impatient and didn't check.
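For completeness, a minimal, self-contained version of the corrected pattern -- src_table, tgt_table, and the SUBSTR conversion are hypothetical stand-ins, not the real 170-column tables:
DECLARE
  CURSOR c_src IS SELECT id, stryrmo FROM src_table;
  TYPE t_src IS TABLE OF c_src%ROWTYPE;
  l_src   t_src;
  n_limit PLS_INTEGER := 1000;
BEGIN
  OPEN c_src;
  LOOP
    FETCH c_src BULK COLLECT INTO l_src LIMIT n_limit;
    EXIT WHEN l_src.COUNT = 0;                  -- test the collection, not %NOTFOUND
    FORALL i IN 1 .. l_src.COUNT                -- COUNT, not n_limit: the last fetch is usually short
      INSERT INTO tgt_table (id, yr)
      VALUES (l_src(i).id,                      -- subscript on the collection, field name after it
              TO_NUMBER(SUBSTR(l_src(i).stryrmo, 2, 4)));
  END LOOP;
  CLOSE c_src;
  COMMIT;
END;
/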
Thank you for your help and recommendations.

Related

SP using table/index with volatile statistics that differ at compile and run time

I’m a longtime MSSQL developer who finds himself back in PL/SQL for the first time since Oracle 7. I’m looking for some tuning advice on a large export stored procedure, which is sporadically and not very reproducibly running slow at certain points. This happens around some static working tables which it truncates, fills and uses as part of the export. The code in outline typically looks like this:
create or replace procedure BigMultiPurposeExport as
begin
-- about 2000 lines of other code
INSERT INTO WORK_TABLE_5 SELECT WHATEVER1 FROM WHEREVER1;
INSERT INTO WORK_TABLE_5 SELECT WHATEVER2 FROM WHEREVER2;
INSERT INTO WORK_TABLE_5 SELECT WHATEVER3 FROM WHEREVER3;
INSERT INTO WORK_TABLE_5 SELECT WHATEVER4 FROM WHEREVER4;
-- WORK_TABLE_5 now has 0 to ~500k rows whose content can vary drastically from run to run
-- e.g. one hourly run exports 3 whale sightings, next exports all tourist visits to Kenya this decade
-- about 1000 lines of other code
INSERT INTO OUTPUT_TABLE_3
SELECT THIS, THAT, THE_OTHER
FROM BUSINESS_TABLE_1 BT1
INNER JOIN BUSINESS_TABLE_2 ON etc -- typical join on indexed columns
INNER JOIN BUSINESS_TABLE_3 ON etc -- typical join on indexed columns
INNER JOIN BUSINESS_TABLE_4 ON etc -- typical join on indexed columns
LEFT OUTER JOIN WORK_TABLE_1 ON etc -- typical join on indexed columns
LEFT OUTER JOIN WORK_TABLE_2 ON etc -- typical join on indexed columns
LEFT OUTER JOIN WORK_TABLE_3 ON etc -- typical join on indexed columns
LEFT OUTER JOIN WORK_TABLE_4 ON etc -- typical join on indexed columns
LEFT OUTER JOIN WORK_TABLE_5 WT5 ON BT1.ID = WT5.BT1_ID AND WT5.RECORD_TYPE = 21
-- join above is now supported by indexes on BUSINESS_TABLE_1 (ID) and WORK_TABLE_5 (BT1_ID, RECORD_TYPE), originally wasn't
LEFT OUTER JOIN WORK_TABLE_6 ON etc -- typical join on indexed columns
LEFT OUTER JOIN WORK_TABLE_7 ON etc -- typical join on indexed columns
-- about 4000 lines of other code
end;
That final insert into OUTPUT_TABLE_3 usually runs in under 10 seconds, but once in a while on certain customer servers it times out at our default 99 minutes. Then we have them take the timeout off and run it on a Friday night, and it finishes, but takes 16 hours.
I narrowed the problem down to the join to WORK_TABLE_5, which had no index support, and put an index on the join terms. The next run took 4 seconds. But success has been intermittent; the customer occasionally gets some slow runs when they drastically change their export selection (i.e. drastically change the data in WORK_TABLE_5). And if we update statistics and rebuild indexes after a timed-out export, it runs fine at the next attempt.
So, I am wondering about how best to handle truncating/filling static work tables with static indexes, statistics updated overnight, and a stored procedure compiled when the statistics are nothing like runtime.
I have a few general questions about things I'd like to understand better:
Is the nature of the data in the work table going to substantially affect the query plan? Does Oracle form its query plan when you compile the stored procedure? Could we get a highly inappropriate query plan if we compile the stored procedure with the table empty and then use a table with 500k rows at runtime?
I expect that if this were an ad-hoc script then updating statistics on the problem table just before selecting from it would eliminate the sporadic slowdowns. But what if I were to update statistics inside the stored procedure, which is compiled with different statistics from runtime?
Anything else you'd like to add...
Thanks for any advice. I hope my MSSQL preconceptions haven't put me too far off base.
This is happening in Oracle 11g, but the code is deployed to assorted customers using Oracle 10 through 12 and I'd like to cater to all of those if possible.
-- Joel
Huge differences in table or index sizes can most definitely cause performance problems. The solution is to add statistics gathering to the procedure instead of relying on the default statistics jobs.
If you've been away from Oracle since version 7, the most important new feature is the Cost Based Optimizer. Oracle now builds query execution plans based on the optimizer statistics of tables, indexes, columns, expressions, system statistics, outlines, directives, dynamic sampling, etc. If you're a full time Oracle developer you should probably spend a day reading about optimizer statistics. Start with Managing Optimizer Statistics and DBMS_STATS in the official documentation.
Eventually the stored procedure should look like this:
--1: Insert into working tables.
insert into work_table...
--2: Gather statistics on working tables.
dbms_stats.gather_table_stats('SCHEMA_NAME', 'WORK_TABLE', ...);
--3: Use working tables.
insert into other_table select * from work_table...
There are so many statistics features it's hard to know exactly what parameters to use in that second step above. Here are some guesses about some features you might find useful (a combined sketch follows this list):
DEGREE - One reason people avoid gathering statistics inside a process is the time it takes. You can significantly improve the run time by setting the degree, although this also uses significantly more resources.
NO_INVALIDATE - It can be tricky to know when exactly the statistics are "set" for a query. Gathering statistics usually quickly invalidates execution plans that were based on old statistics. But not always. If you want to be 100% sure that the next query is using the latest statistics, you want to set NO_INVALIDATE=>FALSE.
ESTIMATE_PERCENT - In 11g and above you definitely want to use the default, which uses a faster algorithm. In 10g and below you may need to set the value to something low to make the gathering fast enough.
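Putting those three parameters together -- a sketch only, with placeholder schema and table names:
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname          => 'SCHEMA_NAME',
    tabname          => 'WORK_TABLE_5',
    degree           => 4,                               -- parallel gathering; faster but more resources
    no_invalidate    => FALSE,                           -- make dependent cursors re-parse with the new stats
    estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE);    -- the default; lower it manually on 10g if needed
END;
/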
Although Oracle 10g and above comes with default statistics gathering jobs, you cannot rely on them, for a few reasons:
They are scheduled and may not run at the right time. If a process significantly changes the data then new stats are needed right away, not at 10 PM. If there are a lot of tables that need to be analyzed the job may not get to them all in one day.
Many DBAs disable the jobs. This is ridiculous and almost always a mistake. But you'll find many DBAs that disabled the job because they think they can do it better. Instead of working with the auto tasks and setting preferences (a small example follows), many DBAs like to throw the whole thing out and replace it with a custom procedure that rots over time.
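For example, a hedged sketch of adjusting the automatic job's behaviour for one table rather than disabling it (schema/table names are placeholders; STALE_PERCENT is a documented DBMS_STATS preference):
BEGIN
  DBMS_STATS.SET_TABLE_PREFS(
    ownname => 'SCHEMA_NAME',
    tabname => 'WORK_TABLE_5',
    pname   => 'STALE_PERCENT',
    pvalue  => '5');   -- re-gather once 5% of the rows have changed, instead of the 10% default
END;
/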

Oracle Temporary Table to convert a Long Raw to Blob

Questions have been asked in the past that seem to address pieces of my full question, but I'm not finding a totally good answer.
Here is the situation:
I'm importing data from an old, but operational and production, Oracle server.
One of the columns is created as LONG RAW.
I will not be able to convert the table to a BLOB.
I would like to use a global temporary table to pull out data each time I call to the server.
This feels like a good answer, from here: How to create a temporary table in Oracle
CREATE GLOBAL TEMPORARY TABLE newtable
ON COMMIT PRESERVE ROWS
AS SELECT
MYID, TO_LOB("LONGRAWDATA") "BLOBDATA"
FROM oldtable WHERE .....;
I do not want the table hanging around, and I'd only do a chunk of rows at a time, to pull out the old table in pieces, each time killing the table. Is it acceptable behavior to do the CREATE, then do the SELECT, then DROP?
Thanks..
--- EDIT ---
Just as a follow up, I decided to take an even different approach to this.
Branching the strong-oracle package, I was able to do what I originally hoped to do, which was to pull the data directly from the table without doing a conversion.
Here is the issue I've posted. If I am allowed to publish my code to a branch, I'll post a follow up here for completeness.
Oracle ODBC Driver Release 11.2.0.1.0 says that Prefetch for LONG RAW data types is supported, which is true.
One caveat is that LONG RAW can technically be up to 2GB in size. I had to set a hard max size of 10MB in the code, which is adequate for my use, so far at least. This probably could be a variable sent in to the connection.
This fix is a bit off original topic now however, but it might be useful to someone else.
With Oracle GTTs, it is not necessary to drop and create the table each time, and you don't need to worry about data "hanging around." In fact, it's inadvisable to drop and re-create it. The structure itself persists, but the data in it does not. The data only persists within each session. You can test this by opening up two separate clients, loading data with one, and you will notice it's not there in the second client.
In effect, each time you open a session, it's like you are reading a completely different table, which was just truncated.
If you want to empty the table within your stored procedure, you can always truncate it. Within a stored proc, you will need to execute immediate if you do this.
This is really handy, but it also can make debugging a bear if you are implementing GTTs through code.
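For reference, a minimal sketch of the two GTT flavors (table names made up); the definition is permanent, the rows are not:
CREATE GLOBAL TEMPORARY TABLE gtt_keep  (id NUMBER) ON COMMIT PRESERVE ROWS;  -- rows live until the session ends
CREATE GLOBAL TEMPORARY TABLE gtt_purge (id NUMBER) ON COMMIT DELETE ROWS;    -- rows vanish at each COMMIT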
Out of curiosity, why a chunk at a time and not the entire dataset? What kind of data volumes are you talking about?
-- EDIT --
Per our comments conversation, this is very raw and untested, but I hope it will give you an idea what I mean:
CREATE OR REPLACE PROCEDURE LOAD_DATA
AS
  TOTAL_ROWS     number;
  TABLE_ROWS     number := 1;
  ROWS_AT_A_TIME number := 100;
BEGIN
  select count(*)
  into TOTAL_ROWS
  from oldtable;
  WHILE TABLE_ROWS <= TOTAL_ROWS
  LOOP
    execute immediate 'truncate table MY_TEMP_TABLE';
    -- ROWNUM has to be assigned in an inner query before it can be range-filtered;
    -- TO_LOB stays in the select list of the INSERT's subquery, as required.
    insert into MY_TEMP_TABLE
    SELECT MYID, TO_LOB(LONGRAWDATA) as BLOBDATA
    FROM oldtable
    WHERE MYID in ( select MYID
                    from ( select MYID, ROWNUM as RN from oldtable )
                    where RN between TABLE_ROWS and TABLE_ROWS + ROWS_AT_A_TIME - 1 );
    insert into MY_REMOTE_TABLE@MY_REMOTE_SERVER   -- @ = database link to the remote server
    select * from MY_TEMP_TABLE;
    commit;
    TABLE_ROWS := TABLE_ROWS + ROWS_AT_A_TIME;
  END LOOP;
  commit;
end LOAD_DATA;

When do we say "let's use %ROWTYPE" and not %TYPE?

If I have a table myTable with around 15 columns and I want to SELECT just around 5 columns, is it worth it to have a rec myTable%ROWTYPE (the 10 cols will not be used at all) or just create a TYPE rec... and manually create fields like (col1 myTable.col1%TYPE...) for all 5 columns?
Up to what extent do we usually create manual types vs rowtypes?
How do I declare a rowtype for SELECT queries with joins? For example, rec tableA%ROWTYPE + probably another field or two from tableB, but all in one record type? Do I create it manually? (If only PL/SQL had something like "extends" or record inheritance.)
Sorry for the long questions, I hope I make sense.
Thanks in advance.
My preference is to always use %ROWTYPE, or better yet a cursor FOR loop, to declare the variables. Back In The Day (tm) we used to worry A LOT about memory use, and thus declaring individual fields might have made sense if it saved us a few bytes. But nowadays, IMO, when we have memory space measured in gigabytes, the greater code complexity outweighs the value of saving a few bytes. As far as execution speed goes - oh, please. We're in a world where some of the most common languages run under a virtual machine. I've spent hours hand-optimizing assembler to wring the last freaking cycle out of a tight loop. Java makes me laugh. If we're running software under VMs, what we're saying is "We've got too many cycles to burn!".
To address each issue:
I prefer using cursor FOR loops for just about everything. This is part of continually trying to answer Cunningham's Question - "What's The Simplest Thing That Could Possibly Work?". IMO using rowtype variables is The Simplest Thing. Using them eliminates the need to deal with NO_DATA_FOUND and TOO_MANY_ROWS exceptions explicitly - so I'll favor
FOR aRow IN (SELECT col1, col2, col3
FROM myTable
WHERE some_col = some_value)
LOOP
NULL; -- do something useful here
END LOOP;
over
DECLARE
var1 myTable.COL1%TYPE;
var2 myTable.COL2%TYPE;
var3 myTable.COL3%TYPE;
BEGIN
SELECT col1, col2, col3
INTO var1, var2, var3
FROM myTable
WHERE some_col = some_value;
NULL; -- do something useful here
EXCEPTION
WHEN NO_DATA_FOUND THEN
NULL; -- do something appropriate here
WHEN TOO_MANY_ROWS THEN
NULL; -- do something appropriate here
END;
even if I know that some_col is unique - largely because where I work what is unique today can be made non-unique tomorrow at the whim of some developer on some team I don't even know exists. It's called "defensive programming", and if it keeps me from being called at 2:00 AM I'm a happy camper.
Someone's going to complain that doing this requires the overhead of opening a cursor when it isn't absolutely necessary. In a whole bunch of years of programming I have never encountered a situation where doing this slowed a program down so much that the code had to be rewritten to use a stand-alone singleton SELECT. I suppose that somewhere in the dark depths of the digital jungle, out in the wilds where clock speeds are still measured in single-digit megahertz, memory is measured in kilobytes, those bytes have only got seven bits, and the Ravenous Bugblatter Beast of Traal awaits the un-towelled this may not apply - but where I'm at (and believe me, the cutting edge it ain't) it's a good-enough strategy.
I very seldom create stand-alone variables to hold the results of SELECTs or cursors, largely because of Rule #1 above.
To correctly quote Obi-Wan, "Use the cursor FOR loop, Luke!". (Seriously - that's what he really said. All the "force" stuff is just a bunch of junk dreamed up by some hard-of-hearing film director. Feh!). Writing something like
FOR aRow IN (SELECT *
FROM table1 t1
INNER JOIN table2 t2 on (t2.fieldx = t1.fieldx)
INNER JOIN table3 t3 on (t3.fieldy = t2.fieldy))
LOOP
NULL; -- whatever
END LOOP;
is, once again, The Simplest Thing That Could Possibly Work. Simpler is gooder.
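If you really do need a declared record variable for the join case from the question, one option is to anchor it to a cursor -- a minimal sketch, reusing the made-up table1/table2 names above (extra_col is hypothetical):
DECLARE
  CURSOR c_join IS
    SELECT t1.*, t2.extra_col                         -- the join's select list defines the record
    FROM table1 t1
    INNER JOIN table2 t2 ON (t2.fieldx = t1.fieldx);
  aRow c_join%ROWTYPE;                                -- record shaped like that select list
BEGIN
  OPEN c_join;
  FETCH c_join INTO aRow;
  CLOSE c_join;
END;
/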
:-)
Share and enjoy.
A good place to start is the documentation:
http://docs.oracle.com/cd/B19306_01/appdev.102/b14261/overview.htm#sthref159
(Oracle Database PL/SQL User's Guide and Reference)
Then improve on it with an article written by Steven Feuerstein:
http://www.oracle.com/technetwork/issue-archive/2012/12-may/o32plsql-1578019.html

SELECT * FROM TABLE(pipelined function): can I be sure of the order of the rows in the result?

In the following example, will I always get “1, 2”, or is it possible to get “2, 1” and can you tell me where in the documentation you see that guarantee if it exists?
If the answer is yes, it means that without ORDER BY nor ORDER SIBLINGS there is a way to be sure of the result set order in a SELECT statement.
CREATE TYPE temp_row IS OBJECT(x number);
/
CREATE TYPE temp_table IS TABLE OF temp_row;
/
CREATE FUNCTION temp_func
RETURN temp_table PIPELINED
IS
BEGIN
PIPE ROW(temp_row(1));
PIPE ROW(temp_row(2));
RETURN;  -- a pipelined function should end with a bare RETURN
END;
/
SELECT * FROM table(temp_func());
Thank you.
I don't think that there's anywhere in the documentation that guarantees the order that data will be returned in.
There's an old Tom Kyte thread from 2003 (so might be out of date) which states that relying on the implicit order would not be advisable, for the same reasons as you would not rely on the order in ordinary SQL.
1st: is the order of rows returned from the table function within a
SQL statement the exact same order in which the entries were "piped"
into the internal collection (so that no order by clause is needed)?
...
Followup May 18, 2003 - 10am UTC:
1) maybe, maybe not, I would not count on it. You should not count
on the order of rows in a result set without having an order by. If
you join or do something more complex then simply "select * from
table( f(x) )", the rows could well come back in some other order.
empirically -- they appear to come back as they are piped. I do not
believe it is documented that this is so.
In fact, collections of type NESTED TABLE are documented to explicitly
not have the ability to preserve order.
To be safe, you should do as you always would in a query, state an explicit ORDER BY, if you want the query results ordered.
Having said that I've taken your function and run 10 million iterations, to check whether the implicit order was ever broken; it wasn't.
SQL> begin
2 for i in 1 .. 10000000 loop
3 for j in ( SELECT a.*, rownum as rnum FROM table(temp_func()) a ) loop
4
5 if j.x <> j.rnum then
6 raise_application_error(-20000,'It broke');
7 end if;
8 end loop;
9 end loop;
10 end;
11 /
PL/SQL procedure successfully completed.
This procedural logic works differently to table-based queries. The reason that you cannot rely on orders in a select from a table is that you cannot rely on the order in which the RDBMS will identify rows as part of the required set. This is partly because of execution plans changing, and partly because there are very few situations in which the physical order of rows in a table is predictable.
However here you are selecting from a function that does guarantee the order in which the rows are emitted from the function. In the absence of joins, aggregations, or just about anything else (ie. for a straight "select ... from table(function)") I would be pretty certain that the row order is deterministic.
That advice does not apply where there is a table involved unless there is an explicit order-by, so if you load your pl/sql collection from a query that does not use an order-by then of course the order of rows in the collection is not deterministic.
The AskTom link in the accepted answer is broken at this time, but I found a newer yet very similar question. After some "misunderstanding ping-pong", Connor McDonald finally concedes that the ordering is stable under certain conditions involving parallelism and ref cursors, and only for current releases. Citation:
Parallelism is the (potential) risk here.
As it currently stands, a pipelined function can be run in parallel only if it takes a ref cursor as input. There is of course no guarantee that this will not change in future.
So you could run on the assumption that in current releases you will get the rows back in order, but you could never 100% rely on it being the case now and forever more.
So no guarantee is given for future releases.
The function in question would pass this criterion, hence it should provide stable ordering. However, I wouldn't personally trust it. My case (when I found this question) was even simpler: selecting from a collection specified literally - select column_value from table(my_collection(5,3,7,2)) - and I preferred explicit pairing between data and index anyway. It's not so hard and not much longer.
Oracle should learn from Postgres, where this situation is solved by unnest(array) with ordinality, which is a clearly understandable, trustworthy and well-documented feature.
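A sketch of that "explicit pairing" idea applied to the original pipelined function (the *2 names are made up to avoid clashing with the types above):
CREATE TYPE temp_row2 IS OBJECT (seq NUMBER, x NUMBER);
/
CREATE TYPE temp_table2 IS TABLE OF temp_row2;
/
CREATE OR REPLACE FUNCTION temp_func2
RETURN temp_table2 PIPELINED
IS
BEGIN
  PIPE ROW(temp_row2(1, 1));   -- carry an explicit sequence number with each row
  PIPE ROW(temp_row2(2, 2));
  RETURN;
END;
/
SELECT x FROM table(temp_func2()) ORDER BY seq;   -- the order is now stated, not assumed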

History records, missing records, filling in the blanks

I have a table that contains a history of costs by location. These are updated on a monthly basis.
For example
Location1, $500, 01-JAN-2009
Location1, $650, 01-FEB-2009
Location1, $2000, 01-APR-2009
if I query for March 1, I want to return the value for Feb 1, since March 1 does not exist.
I've written a query using an Oracle analytic, but that takes too much time (it would be fine for a report, but we are using this to allow the user to see the data visually through the front end and switch dates; requerying takes too long, as the table is something like 1 million rows).
So, the next thought I had was to simply update the table with the missing data. In the case above, I'd simply add in a record identical to 01-FEB-2009 except set the date to 01-MAR-2009.
I was wondering if you all had thoughts on how to best do this.
My plan had been to simply create a cursor for a location, fetch the first record, then fetch the next, and if the next record was not for the next month, insert a record for the missing month.
A little more information:
CREATE TABLE MAXIMO.FCIHIST_BY_MONTH
(
LOCATION VARCHAR2(8 BYTE),
PARKALPHA VARCHAR2(4 BYTE),
LO2 VARCHAR2(6 BYTE),
FLO3 VARCHAR2(1 BYTE),
REGION VARCHAR2(4 BYTE),
AVG_DEFCOST NUMBER,
AVG_CRV NUMBER,
FCIDATE DATE
)
And then the query I'm using (the system will pass in the date and the parkalpha). The table is approx 1 million rows, and, again, while it takes a reasonable amount of time for a report, it takes way too long for an interactive display:
select location, avg_defcost, avg_crv, fcimonth, fciyear,fcidate from
(select location, avg_defcost, avg_crv, fcimonth, fciyear, fcidate,
max(fcidate) over (partition by location) my_max_date
from FCIHIST_BY_MONTH
where fcidate <='01-DEC-2008'
and parkalpha='SAAN'
)
where fcidate=my_max_date;
The best way to do this is to create a PL/SQL stored procedure that works backwards from the present and runs queries that fail to return data. Each month that it fails to return data it inserts a row for the missing data.
create or replace PROCEDURE fill_in_missing_data IS
  cursor have_data_on_date is
    select location, trunc(the_date) have_date
    from the_table
    group by location, trunc(the_date)
    order by 1, 2 desc
  ;
  a_date date;
  n_days_to_insert number;
BEGIN
  a_date := trunc(sysdate);
  for r1 in have_data_on_date loop
    if r1.have_date < a_date then
      -- insert dates in a loop
      n_days_to_insert := a_date - r1.have_date; -- Might be off by 1, need to test.
      for day_offset in 1 .. n_days_to_insert loop
        -- insert missing day
        insert into the_table ( location, the_date, amount )
        values ( r1.location, a_date - day_offset, 0 );
      end loop;
    end if;
    a_date := r1.have_date;
    -- this is a little tricky - I am going to test this and update it in a few minutes
  end loop;
END;
Filling in the missing data will (if you are careful) make the queries much simpler and run faster.
I would also add a flag to the table to indicate that a row is filled-in missing data, so that if you need to remove it (or create a view without it) later, you can.
I have filled in missing data, and also filled in dummy data so that outer joins were not necessary, to improve query performance a number of times. It is not "clean" and "perfect", but I follow Leflar's #1 Law: "always go with what works."
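A sketch of that flag idea against the table from the question (the column and view names are made up):
ALTER TABLE MAXIMO.FCIHIST_BY_MONTH ADD (IS_FILLED_IN CHAR(1) DEFAULT 'N');

-- rows inserted by the gap-filling job get 'Y', so they can be excluded or removed later
CREATE OR REPLACE VIEW MAXIMO.FCIHIST_REAL_ONLY AS
  SELECT * FROM MAXIMO.FCIHIST_BY_MONTH WHERE IS_FILLED_IN = 'N';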
You can create a job in Oracle that will automatically run at off-peak times to fill in the missing data. Take a look at: This question on stackoverflow about creating jobs.
What is your precise use case underlying this request?
In every system I have worked on, if there is supposed to be a record for MARCH and there isn't a record for MARCH the users would like to know that fact. Apart from anything they might want to investigate why the MARCH record is missing.
Now if this is basically a performance issue, then you ought to tune the query. Or if it is a presentation issue - you want to generate a matrix of twelve rows, and that is difficult if a month doesn't have a record for some reason - then that is a different matter, with a variety of possible solutions.
But seriously, I think it is a bad practice for the database to invent replacements for missing records.
edit
I see from your recent comment on your question that it did turn out to be a performance issue - indexes fixed the problem. So I feel vindicated.
