Commit In loop gives wrong output? - oracle

I am trying to insert the numbers 1 to 10, except 6 and 8, into the table messages1, but when I fetch them from messages1 the output comes back in this order:
4
5
7
9
10
1
2
3
It should be like this
1
2
3
4
5
7
9
10
The logic itself works fine when I omit the commit or put it somewhere else.
Please explain why this is happening.
This is my code:
BEGIN
  FOR i IN 1..10
  LOOP
    IF i <> 6 AND i <> 8
    THEN
      INSERT INTO messages1
      VALUES (i);
    END IF;
    COMMIT;
  END LOOP;
END;
/
SELECT * FROM messages1;

If you don't use ORDER BY, you should assume the order the results appear in is undefined. Often the results are in the same order they were inserted in, but it's not guaranteed.
Bottom line, if you want your results in some specific order, use ORDER BY.
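For the table in the question, a minimal sketch of the fix might look like the following. It sorts by position (ORDER BY 1) only because the question never names the single column of messages1; with a known column name you would normally write that name instead.

```sql
-- Sort explicitly instead of relying on the order the rows happen to be stored in.
-- "1" refers to the first column of the select list; messages1 has a single column here.
SELECT *
FROM   messages1
ORDER  BY 1;
```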

As Matti says, you need an explicit ORDER BY clause to guarantee that the rows are returned in the order you want.
When you have pending (i.e. uncommitted) changes, you are generally the only one able to see them, because they haven't yet been added to the data store where the other data is. Oracle maintains a separate list of pending changes, which it uses to alter the results it gets from the main data store. In your example the rows from this list happen to be returned in order; as there is very little data in the example, Oracle presumably doesn't need to split the pending data in any way to optimise its storage.
Once the data is committed it goes into the main database storage and can be ordered in any number of possible ways, depending on how the table and partition are set up.
So in short, the data is coming from two different places before and after the commit. It just so happens that they return it in different orders, and you shouldn't rely on either of them always behaving like that.

Related

On Oracle Database, can I perform aggregate functions on the results from a RETURNING clause?

Let's say I have a statement, dedupe_sql that performs a deduplication routine on a certain set of records.
I can output which keys have had their duplicates removed like this:
DECLARE
  dedupe_sql   VARCHAR2( 4000 BYTE ) := '...The details are not important...';
  deleted_keys DBMS_SQL.VARCHAR2_TABLE;
BEGIN
  -- In dynamic SQL the column list (RETURNING key INTO a bind placeholder)
  -- belongs inside dedupe_sql itself; EXECUTE IMMEDIATE names only the target.
  EXECUTE IMMEDIATE dedupe_sql
    RETURNING BULK COLLECT INTO deleted_keys;
  FOR key_index IN 1 .. deleted_keys.COUNT
  LOOP
    DBMS_OUTPUT.PUT_LINE( 'Deleted: ' || deleted_keys( key_index ) );
  END LOOP;
END;
But now let's say that each key could have multiple duplicates removed, and for auditing purposes, my output needs to display each removed key only once, followed by the number of occurrences that I removed.
And let's say that the input data set contains a huge volume of data. (And so, potentially, does the set of rows to be deleted.)
Is there any Oracle construct that would allow me to BULK COLLECT the results of my delete operation, and then run a COUNT/GROUP BY aggregate function on the result?
I have thought of creating a table to hold the results, which I would then drop when I no longer needed it, but I am wondering if there's a good way to do this in memory instead.
Many thanks!
Generally, a regular delete will perform better than a bulk delete, and the only reason to use a bulk delete is if you can't do the delete in a single statement.
I may have misunderstood what you are after, but I would suggest doing the grouping and counting as a select. If you just want to log some statistics, a select over the same scope gives you the counts, and you can then delete that scope.
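A rough sketch of such a select, run before the delete, could look like this. The names source_table and key are placeholders (the question does not show the real table), and the sketch assumes the dedupe keeps exactly one row per key.

```sql
-- Assumption: source_table and its key column are placeholders, and the dedupe
-- keeps exactly one row per key, so COUNT(*) - 1 rows are removed for each key.
SELECT key,
       COUNT(*) - 1 AS rows_to_remove
FROM   source_table
GROUP  BY key
HAVING COUNT(*) > 1
ORDER  BY key;
```

This keeps the aggregation in SQL rather than in a PL/SQL collection, so no temporary table is needed for the audit output.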

Oracle: difference between max(id)+1 and sequence.nextval

I am using Oracle.
What is the difference between creating an ID using max(id)+1 and using sequence.nextval? Where and when should each be used?
Like:
insert into student (id,name) values ((select max(id)+1 from student), 'abc');
and
insert into student (id,name) values (SQ_STUDENT.nextval, 'abc');
SQ_STUDENT.nextval sometimes gives a duplicate record error...
Please help me with this doubt.
With the select max(id) + 1 approach, two sessions inserting simultaneously will see the same current max ID from the table, and both insert the same new ID value. The only way to use this safely is to lock the table before starting the transaction, which is painful and serialises the transactions. (And as Stijn points out, values can be reused if the highest record is deleted). Basically, never use this approach. (There may very occasionally be a compelling reason to do so, but I'm not sure I've ever seen one).
The sequence guarantees that the two sessions will get different values, and no serialisation is needed. It will perform better and be safer, easier to code and easier to maintain.
The only way you can get duplicate errors using the sequence is if records already exist in the table with IDs above the sequence value, or if something is still inserting records without using the sequence. So if you had an existing table with manually entered IDs, say 1 to 10, and you created a sequence with a default start-with value of 1, the first insert using the sequence would try to insert an ID of 1 - which already exists. After trying that 10 times the sequence would give you 11, which would work. If you then used the max-ID approach to do the next insert that would use 12, but the sequence would still be on 11 and would also give you 12 next time you called nextval.
The sequence and table are not related. The sequence is not automatically updated if a manually-generated ID value is inserted into the table, so the two approaches don't mix. (Among other things, the same sequence can be used to generate IDs for multiple tables, as mentioned in the docs).
If you're changing from a manual approach to a sequence approach, you need to make sure the sequence is created with a start-with value that is higher than all existing IDs in the table, and that everything that does an insert uses the sequence only in the future.
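As a rough sketch of that last point, the block below creates the sequence so that it starts just above the highest existing ID. It assumes the student table and the SQ_STUDENT name from the question, that the sequence does not exist yet, and that nothing inserts into the table while it runs. CREATE SEQUENCE needs a literal START WITH value, which is why the statement is built dynamically.

```sql
DECLARE
  l_start NUMBER;
BEGIN
  -- NVL covers an empty table, where MAX(id) is NULL.
  SELECT NVL(MAX(id), 0) + 1
  INTO   l_start
  FROM   student;

  -- DDL cannot take bind variables, so the value is concatenated in.
  EXECUTE IMMEDIATE
    'CREATE SEQUENCE sq_student START WITH ' || l_start || ' INCREMENT BY 1';
END;
/
```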
Using a sequence works if you intend to have multiple users. Using a max does not.
If you do a max(id) + 1 and you allow multiple users, then multiple sessions that are both operating at the same time will regularly see the same max and, thus, will generate the same new key. Assuming you've configured your constraints correctly, that will generate an error that you'll have to handle. You'll handle it by retrying the INSERT which may fail again and again if other sessions block you before your session retries but that's a lot of extra code for every INSERT operation.
It will also serialize your code. If I insert a new row in my session and go off to lunch before I remember to commit (or my client application crashes before I can commit), every other user will be prevented from inserting a new row until I get back and commit or the DBA kills my session, forcing a rollback.
To add to the other answers, a couple of issues.
Your max(id)+1 syntax will also fail if there are no rows in the table already, so use:
Coalesce(Max(id),0) + 1
There's nothing wrong with this technique if you only have a single process that inserts into the table, as might be the case with a data warehouse load, and if max(id) is fast (which it probably is).
It also avoids the need for code to synchronise values between tables and sequences if you are moving or restoring data to a test system, for example.
You can extend this method to multirow insert by using:
Coalesce(max(id),0) + rownum
I expect that might serialise a parallel insert, though.
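One way the multi-row variant might be written is sketched below. The staging_student table is hypothetical, and, as noted above, this is only safe when a single process inserts into the table.

```sql
-- Assumption: staging_student(name) is a hypothetical source of new rows.
-- The scalar subquery reads the current maximum; ROWNUM then numbers the
-- incoming rows 1, 2, 3, ... on top of it.
INSERT INTO student (id, name)
SELECT (SELECT COALESCE(MAX(id), 0) FROM student) + ROWNUM,
       s.name
FROM   staging_student s;
```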
Some techniques don't work well with these methods. They rely of course on being able to issue the select statement, so SQL*Loader might be ruled out. However SQL*Loader has support for this technique in general through the SEQUENCE parameter of the column specification: http://docs.oracle.com/cd/E11882_01/server.112/e22490/ldr_field_list.htm#i1008234
Assuming MAX(ID) is actually fast enough, wouldn't it be possible to:
First get MAX(ID)+1
Then get NEXTVAL
Compare the two and advance the sequence if NEXTVAL is smaller than MAX(ID)+1
Use NEXTVAL in the INSERT statement
In that case I would have a fully stable procedure, and manual inserts would also be allowed without worrying about updating the sequence.

SELECT * FROM TABLE(pipelined function): can I be sure of the order of the rows in the result?

In the following example, will I always get “1, 2”, or is it possible to get “2, 1” and can you tell me where in the documentation you see that guarantee if it exists?
If the answer is yes, it means that without ORDER BY nor ORDER SIBLINGS there is a way to be sure of the result set order in a SELECT statement.
CREATE TYPE temp_row IS OBJECT(x number);
/
CREATE TYPE temp_table IS TABLE OF temp_row;
/
CREATE FUNCTION temp_func
RETURN temp_table PIPELINED
IS
BEGIN
PIPE ROW(temp_row(1));
PIPE ROW(temp_row(2));
END;
/
SELECT * FROM table(temp_func());
Thank you.
I don't think that there's anywhere in the documentation that guarantees the order that data will be returned in.
There's an old Tom Kyte thread from 2003 (so might be out of date) which states that relying on the implicit order would not be advisable, for the same reasons as you would not rely on the order in ordinary SQL.
1st: is the order of rows returned from the table function within a
SQL statement the exact same order in which the entries were "piped"
into the internal collection (so that no order by clause is needed)?
...
Followup May 18, 2003 - 10am UTC:
1) maybe, maybe not, I would not count on it. You should not count
on the order of rows in a result set without having an order by. If
you join or do something more complex then simply "select * from
table( f(x) )", the rows could well come back in some other order.
empirically -- they appear to come back as they are piped. I do not
believe it is documented that this is so.
In fact, collections of type NESTED TABLE are documented to explicitly
not have the ability to preserve order.
To be safe, you should do as you always would in a query, state an explicit ORDER BY, if you want the query results ordered.
Having said that I've taken your function and run 10 million iterations, to check whether the implicit order was ever broken; it wasn't.
begin
  for i in 1 .. 10000000 loop
    for j in ( SELECT a.*, rownum as rnum FROM table(temp_func()) a ) loop
      if j.x <> j.rnum then
        raise_application_error(-20000,'It broke');
      end if;
    end loop;
  end loop;
end;
/
PL/SQL procedure successfully completed.
This procedural logic works differently to table-based queries. The reason that you cannot rely on orders in a select from a table is that you cannot rely on the order in which the RDBMS will identify rows as part of the required set. This is partly because of execution plans changing, and partly because there are very few situations in which the physical order of rows in a table is predictable.
However here you are selecting from a function that does guarantee the order in which the rows are emitted from the function. In the absence of joins, aggregations, or just about anything else (ie. for a straight "select ... from table(function)") I would be pretty certain that the row order is deterministic.
That advice does not apply where there is a table involved unless there is an explicit order-by, so if you load your pl/sql collection from a query that does not use an order-by then of course the order of rows in the collection is not deterministic.
The AskTom link in the accepted answer is broken at this time, but I found a newer yet very similar question. After some "misunderstanding ping-pong", Connor McDonald finally admits that the ordering is stable under certain conditions involving parallelism and ref cursors, and only in current releases. Quote:
Parallelism is the (potential) risk here.
As it currently stands, a pipelined function can be run in parallel only if it takes a ref cursor as input. There is of course no guarantee that this will not change in future.
So you could run on the assumption that in current releases you will get the rows back in order, but you could never 100% rely on it being the case now and forever more.
So no guarantee is given for future releases.
The function in question would pass this criterion, hence it should provide stable ordering. However, I wouldn't personally trust it. My case (when I found this question) was even simpler: selecting from a collection specified literally, select column_value from table(my_collection(5,3,7,2)), and I preferred explicit pairing between data and index anyway. It's not so hard and not much longer.
Oracle should learn from Postgres, where this situation is solved by unnest(array) with ordinality, which is a clearly understandable, trustworthy and well-documented feature.
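For the pipelined function in the question, explicit pairing between data and index could be sketched as follows. The _seq names are hypothetical variants of the question's types, introduced only for illustration.

```sql
-- Hypothetical variants of the question's types, with an explicit position (seq).
CREATE TYPE temp_row_seq IS OBJECT (seq NUMBER, x NUMBER);
/
CREATE TYPE temp_table_seq IS TABLE OF temp_row_seq;
/
CREATE FUNCTION temp_func_seq
  RETURN temp_table_seq PIPELINED
IS
BEGIN
  PIPE ROW(temp_row_seq(1, 1));  -- first row, value 1
  PIPE ROW(temp_row_seq(2, 2));  -- second row, value 2
  RETURN;
END;
/
-- The order now comes from the data itself, not from how the rows were piped.
SELECT x
FROM   TABLE(temp_func_seq())
ORDER  BY seq;
```

With the position carried in the row, the ORDER BY gives a documented guarantee instead of relying on observed behaviour.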

Incrementing Oracle Sequence by certain amount

I am programming a Windows Application (in Qt 4.6) which - at some point - inserts any number of datasets between 1 and around 76000 into some oracle (10.2) table. The application has to retrieve the primary keys, or at least the primary key range, from a sequence. It will then store the IDs in a list which is used for Batch Execution of a prepared query.
(Note: Triggers shall not be used, and the sequence is used by other tasks as well)
In order to avoid calling the sequence X times, I would like to increment the sequence by X instead.
What I have found out so far, is that the following code would be possible in a procedure:
ALTER SEQUENCE my_sequence INCREMENT BY X;
SELECT my_sequence.CURRVAL + 1, my_sequence.NEXTVAL
INTO v_first_number, v_last_number
FROM dual;
ALTER SEQUENCE my_sequence INCREMENT BY 1;
I have two major concerns though:
I have read that ALTER SEQUENCE produces an implicit commit. Does this mean the transaction started by the Windows Application will be committed? If so, can you somehow avoid it?
Is this concept multi-user proof? Or could the following thing happen:
Sequence is at 10,000
Session A sets increment to 2,000
Session A selects 10,001 as first and 12,000 as last
Session B sets increment to 5,000
Session A sets increment to 1
Session B selects 12,001 as first and 12,001 as last
Session B sets increment to 1
Even if the procedure is rather quick, it is not that unlikely in my application that two different users cause the procedure to be called almost simultaneously.
1) ALTER SEQUENCE is DDL so it implicitly commits before and after the statement. The database transaction started by the Windows application will be committed. If you are using a distributed transaction coordinator other than the Oracle database, hopefully the transaction coordinator will commit the entire distributed transaction, but transaction coordinators sometimes have problems with commits that they are not aware of.
There is nothing that you can do to prevent DDL from committing.
2) The scenario you outline with multiple users is quite possible. So it doesn't sound like this approach would behave correctly in your environment.
You could potentially use the DBMS_LOCK package to ensure that only one session is calling your procedure at any point in time and then call the sequence N times from a single SQL statement. But if other processes are also using the sequence, there is no guarantee that you'll get a contiguous set of values.
CREATE PROCEDURE some_proc( p_num_rows  IN  NUMBER,
                            p_first_val OUT NUMBER,
                            p_last_val  OUT NUMBER )
AS
  l_lockhandle       VARCHAR2(128);
  l_lock_return_code INTEGER;
BEGIN
  dbms_lock.allocate_unique( 'SOME_PROC_LOCK',
                             l_lockhandle );
  l_lock_return_code := dbms_lock.request( lockhandle        => l_lockhandle,
                                           lockmode          => dbms_lock.x_mode,
                                           release_on_commit => true );
  if( l_lock_return_code IN (0, 4) ) -- Success or already owned
  then
    <<do something>>
  end if;
  -- dbms_lock.release is a function, so its return code has to be captured
  l_lock_return_code := dbms_lock.release( l_lockhandle );
END;
Altering the sequence in this scenario is a really bad idea, particularly in a multi-user environment. Your transaction will be committed and you will probably get several race-condition data bugs or integrity errors.
It would be appropriate if you had legacy data already imported and wanted to insert new data with IDs from the sequence. Then you could alter the sequence to move currval to the max existing ...
It seems to me that here you want to generate IDs from the sequence. That need not be done by
select seq.nextval into l_variable from dual;
insert into table (id, ...) values (l_variable, ....);
You can use the sequence directly in the insert:
insert into table (id, ...) values (seq.nextval, ....);
and optionally get the assigned value back by
insert into table (id, ...) values (seq.nextval, ....)
returning id into l_variable;
It certainly is possible even for bulk operations with execBatch, either just creating the IDs or even returning them. I am not sure about the right syntax in Java, but it will be something along the lines of
insert into table (id, ...) values (seq.nextval, ....)
returning id bulk collect into l_cursor;
and you'll be given a ResultSet to browse the assigned numbers.
You can't prevent the implicit commit.
Your solution is not multi user proof. It is perfectly possible that another session will have 'restored' the increment to 1, just as you described.
I would suggest you keep fetching values one by one from the sequence, store these IDs one by one on your list and have the batch execution operate on that list.
What is the reason that you want to fetch a contiguous block of values from the sequence? I would not be too worried about performance, but maybe there are other requirements that I don't know of.
In Oracle, you can use the following query to get the next N values from a sequence that increments by one:
select level, PDQ_ACT_COMB_SEQ.nextval as seq from dual connect by level <= 5;
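To hand those values back to a caller in one round trip, the same query can be bulk-collected into a PL/SQL collection. This is only a sketch: it uses the my_sequence name from the question, a fixed count of 5, and a local collection type id_list introduced here for illustration. If other sessions use the sequence at the same time, the values are unique but not necessarily contiguous.

```sql
DECLARE
  TYPE id_list IS TABLE OF NUMBER;
  l_ids id_list;
BEGIN
  -- CONNECT BY LEVEL generates 5 rows, and NEXTVAL is evaluated once per row.
  SELECT my_sequence.NEXTVAL
  BULK COLLECT INTO l_ids
  FROM   dual
  CONNECT BY LEVEL <= 5;

  FOR i IN 1 .. l_ids.COUNT LOOP
    DBMS_OUTPUT.PUT_LINE( 'Reserved ID: ' || l_ids(i) );
  END LOOP;
END;
/
```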

How to protect a running column within Oracle/PostgreSQL (kind of MAX-result locking or something)

I need advice on the following situation with Oracle/PostgreSQL:
I have a db table with a "running counter" and would like to protect it in the following situation with two concurrent transactions:
T1                                          T2
SELECT MAX(C) FROM TABLE WHERE CODE='xx'
-- C for new : result + 1
                                            SELECT MAX(C) FROM TABLE WHERE CODE='xx';
                                            -- C for new : result + 1
INSERT INTO TABLE...
                                            INSERT INTO TABLE...
So, in both cases, the column value for INSERT is calculated from the old result added by one.
From this, some running counter handled by the db would be fine. But that wouldn't work because
the counter values or existing rows are sometimes changed
sometimes I'd like there to be multiple counter "value groups" (as with the CODE mentioned) : with different values for CODE the counters would be independent.
With some other databases this can be handled with the SERIALIZABLE isolation level, but at least with Oracle and PostgreSQL, phantom reads are prevented and yet the table still ends up with two distinct rows with the same counter value. This seems to have to do with predicate locking, i.e. locking "all the possible rows covered by the query"; some other DBs end up locking the whole table or something.
SELECT ... FOR UPDATE statements seem to be for other purposes and don't even seem to work with the MAX() function.
Setting a UNIQUE constraint on the column would probably be the solution, but are there other ways to prevent the situation?
b.r. Touko
EDIT: One more option could probably be manual locking even though it doesn't appear nice to me..
Both Oracle and PostgreSQL support what are called sequences, and they are the perfect fit for your problem. You can have a regular int column, but define one sequence per group, and do a single query like
--PostgreSQL
insert into table (id, ... ) values (nextval('sequence_name_for_group_xx'), ... )
--Oracle
insert into table (id, ... ) values (sequence_name_for_group_xx.nextval, ... )
Increments in sequences are atomic, so your problem just wouldn't exist. It's only a matter of creating the required sequences, one per group.
the counter values or existing rows are sometimes changed
You should put a unique constraint on that column if this would be a problem for your app. Doing so would guarantee that a transaction at SERIALIZABLE isolation level aborts if it tries to use the same id as another transaction.
One more option could probably be manual locking even though it doesn't appear nice to me..
Manual locking in this case is pretty easy: just take a SHARE UPDATE EXCLUSIVE or stronger lock on the table before selecting the maximum. This will kill concurrent performance, though.
sometimes I'd like there to be multiple counter "value groups" (as with the CODE mentioned) : with different values for CODE the counters would be independent.
This leads me to the Right Solution for this problem: sequences. Set up several sequences, one for each "value group" you want to get IDs in their own range. See Section 9.15 of The Manual for the details of sequences and how to use them; it looks like they're a perfect fit for you. Sequences will never give the same value twice, but might skip values: if a transaction gets the value '2' from a sequence and aborts, the next transaction will get the value '3' rather than '2'.
The sequence answer is common, but might not be right. The viability of this solution depends on what you actually need. If what you semantically want is "some guaranteed to be unique number" then that is what a sequence is for. However, if what you want is to make sure that your value increases by exactly one on each insert (as you have asked), then DO NOT USE A SEQUENCE! I have run into this trap before myself. Sequences are not guaranteed to be sequential! They can skip numbers. Depending on what sort of optimizations you have configured, they can skip LOTS of numbers. Even if you have things configured just right so that you shouldn't skip any numbers, that is not guaranteed, and is not what sequences are for. So, you are only asking for trouble if you (mis)use them like that.
One step better solution is to bundle the select into the insert, like so:
INSERT INTO table(code, c, ...)
VALUES ('XX', (SELECT MAX(c) + 1 AS c FROM table WHERE code = 'XX'), ...);
(I haven't test run that query, but I'm pretty sure it should work. My apologies if it doesn't.) But, doing something like that reflects the semantic intent of what you are trying to do. However, this is inefficient, because you have to do a scan for MAX, and the inference I am taking from your sample is that you have a small number of code values relative to the size of the table, so you are going to do an expensive, full table scan on every insert. That isn't good. Also, this doesn't even get you the ACID guarantee you are looking for. The select is not transactionally tied to the insert. You can't "lock" the result of the MAX() function. So, you could still have two transactions running this query and they both do the sub-select and get the same max, both add one, and then both try to insert. It's a much smaller window, but you may still technically have a race condition here.
Ultimately, I would challenge that you probably have the wrong data model if you are trying to increment on insert. You should insert with a unique key, most commonly a sequence value (at least as an easy, surrogate key for any natural key). That gets the data safely inserted. Then, if you need a count of things, then have one table that stores your counts.
CREATE TABLE code_counts (
code VARCHAR(2), --or whatever
count NUMBER
);
If you really want to store the code count of each item as it is inserted, the separate count table also allows you to do so correctly, transactionally, like so:
UPDATE code_counts SET count = count + 1 WHERE code = 'XX' RETURNING count INTO :count;
INSERT INTO table(code, c, ...) VALUES ('XX', :count, ...);
COMMIT;
The key is that the update locks the row in the counter table and reserves that value for you. Then your insert uses that value, and all of that is committed as one transactional change. You have to do this in a transaction. Having a separate count table avoids the full table scan of doing SELECT MAX().... In essence, this re-implements a sequence, but it also guarantees you sequential, ordered use.
Without knowing your whole problem domain and data model, it is hard to say, but abstracting your counts out to a separate table like this where you don't have to do a select max to get the right value is probably a good idea. Assuming, of course, that a count is what you really care about. If you are just doing logging or something where you want to make sure things are unique, then use a sequence, and a timestamp to sort by.
Note that I'm saying not to sort by a sequence either. Basically, never trust a sequence to be anything other than unique. Because when you get to caching sequence values on a multi-node system, your application might even consume them out of order.
This is why you should use the Serial datatype, which defers the lookup of C to the time of insert (which uses table locks, I presume). You would then not specify C; it would be generated automatically. If you need C for some intermediate calculation, you would need to save first, then read C and finally update with the derived values.
Edit: Sorry, I didn't read your whole question. What about solving your other problems with normalization? Just create a second table for each specific type (for each x where A='x'), where you have another auto increment. Manually edited sequences could be another column in the same table, which uses the generated sequence as a base (i.e if pk = 34 you can have another column mypk='34Changed').
You can create a sequential column by using a sequence as the default value.
First, you have to create the sequence counter:
CREATE SEQUENCE SEQ_TABLE_1 START WITH 1 INCREMENT BY 1;
Then you can use it as the default value (PostgreSQL syntax shown):
CREATE TABLE T (
COD NUMERIC(10) DEFAULT NEXTVAL('SEQ_TABLE_1') NOT NULL,
column1...
column2...
);
Now you don't need to worry about the sequence when inserting rows:
INSERT INTO T (column1, column2) VALUES (value1, value2);
Regards.

Resources