Oracle: difference between max(id)+1 and sequence.nextval - oracle

I am using Oracle
What is difference when we create ID using max(id)+1 and using sequance.nexval,where to use and when?
Like:
insert into student (id,name) values (select max(id)+1 from student, 'abc');
and
insert into student (id,name) values (SQ_STUDENT.nextval, 'abc');
SQ_STUDENT.nextval sometime gives error that duplicate record...
please help me on this doubt

With the select max(id) + 1 approach, two sessions inserting simultaneously will see the same current max ID from the table, and both insert the same new ID value. The only way to use this safely is to lock the table before starting the transaction, which is painful and serialises the transactions. (And as Stijn points out, values can be reused if the highest record is deleted). Basically, never use this approach. (There may very occasionally be a compelling reason to do so, but I'm not sure I've ever seen one).
The sequence guarantees that the two sessions will get different values, and no serialisation is needed. It will perform better and be safer, easier to code and easier to maintain.
The only way you can get duplicate errors using the sequence is if records already exist in the table with IDs above the sequence value, or if something is still inserting records without using the sequence. So if you had an existing table with manually entered IDs, say 1 to 10, and you created a sequence with a default start-with value of 1, the first insert using the sequence would try to insert an ID of 1 - which already exists. After trying that 10 times the sequence would give you 11, which would work. If you then used the max-ID approach to do the next insert that would use 12, but the sequence would still be on 11 and would also give you 12 next time you called nextval.
The sequence and table are not related. The sequence is not automatically updated if a manually-generated ID value is inserted into the table, so the two approaches don't mix. (Among other things, the same sequence can be used to generate IDs for multiple tables, as mentioned in the docs).
If you're changing from a manual approach to a sequence approach, you need to make sure the sequence is created with a start-with value that is higher than all existing IDs in the table, and that everything that does an insert uses the sequence only in the future.

Using a sequence works if you intend to have multiple users. Using a max does not.
If you do a max(id) + 1 and you allow multiple users, then multiple sessions that are both operating at the same time will regularly see the same max and, thus, will generate the same new key. Assuming you've configured your constraints correctly, that will generate an error that you'll have to handle. You'll handle it by retrying the INSERT which may fail again and again if other sessions block you before your session retries but that's a lot of extra code for every INSERT operation.
It will also serialize your code. If I insert a new row in my session and go off to lunch before I remember to commit (or my client application crashes before I can commit), every other user will be prevented from inserting a new row until I get back and commit or the DBA kills my session, forcing a reboot.

To add to the other answers, a couple of issues.
Your max(id)+1 syntax will also fail if there are no rows in the table already, so use:
Coalesce(Max(id),0) + 1
There's nothing wrong with this technique if you only have a single process that inserts into the table, as might be the case with a data warehouse load, and if max(id) is fast (which it probably is).
It also avoids the need for code to synchronise values between tables and sequences if you are moving restoring data to a test system, for example.
You can extend this method to multirow insert by using:
Coalesce(max(id),0) + rownum
I expect that might serialise a parallel insert, though.
Some techniques don't work well with these methods. They rely of course on being able to issue the select statement, so SQL*Loader might be ruled out. However SQL*Loader has support for this technique in general through the SEQUENCE parameter of the column specification: http://docs.oracle.com/cd/E11882_01/server.112/e22490/ldr_field_list.htm#i1008234

Assuming MAX(ID) is actually fast enough, wouldn't it be possible to:
First get MAX(ID)+1
Then get NEXTVAL
Compare those two and increase sequence in case NEXTVAL is smaller then MAX(ID)+1
Use NEXTVAL in INSERT statement
In that case I would have a fully stable procedure and manual inserts would also be allowed without worrying about updating the sequence

Related

Consecutive application threads and uncommitted data in Oracle

Our application reads a record from an Oracle 'Event' table. When the event record exists we update the 'count' field of that record. If the record doesn't exist we insert it. So we want only 1 record for a particular event in the table.
The problem with this is probably quite predictable: one application thread will read the table, see the event is not there, insert the new event and commit. But before it commits a second thread will also read the table and see the event is not there. And then both threads will insert the event and we end up with 2 records for the same event.
I guess synchronizing access to this particular method in our application will prevent this problem, but what is the best option in Oracle to prevent this? Will MERGE for example always prevent this problem?
Serialising access to the procedure that implements this functionality would be trivial to implement, using DBMS_LOCK to define and take an exclusive lock.
Serialising through SQL based methods is practically impossible, due to the read consistency model.
CREATE TABLE EVENTS (ID NUMBER PRIMARY KEY, COUNTER NUMBER NOT NULL);
MERGE INTO EVENTS
USING (SELECT ID, COUNTER FROM DUAL LEFT JOIN EVENTS ON EVENTS.ID = :EVENT_ID) SRC
ON (EVENTS.ID = SRC.ID)
WHEN MATCHED THEN UPDATE SET COUNTER = SRC.COUNTER + 1
WHEN NOT MATCHED THEN INSERT (ID, COUNTER) VALUES (:EVENT_ID, 1);
Simple SQL securing single record for each ID and consistently increasing the counter no matter what application fires it or number of concurrent thread. You don't need to code anything at all and it's very lightweight as well.
It also doesn't produce any exception related to data consistency so you don't need any special handling.
UPDATE: It actually produces unique violation exception if both threads are inserting. I thought the second merge would switch to update, but it doesn't.
UPDATE: Just tested the same case on SQL Server and when executing in parallel and the record doesn't exist one MERGE inserts and the second updates.

Incrementing Oracle Sequence by certain amount

I am programming a Windows Application (in Qt 4.6) which - at some point - inserts any number of datasets between 1 and around 76000 into some oracle (10.2) table. The application has to retrieve the primary keys, or at least the primary key range, from a sequence. It will then store the IDs in a list which is used for Batch Execution of a prepared query.
(Note: Triggers shall not be used, and the sequence is used by other tasks as well)
In order to avoid calling the sequence X times, I would like to increment the sequence by X instead.
What I have found out so far, is that the following code would be possible in a procedure:
ALTER SEQUENCE my_sequence INCREMENT BY X;
SELECT my_sequence.CURVAL + 1, my_sequence.NEXTVAL
INTO v_first_number, v_last_number
FROM dual;
ALTER SEQUENCE my_sequence INCREMENT BY 1;
I have two major concerns though:
I have read that ALTER SEQUENCE produces an implicit commit. Does this mean the transaction started by the Windows Application will be commited? If so, can you somehow avoid it?
Is this concept multi-user proof? Or could the following thing happen:
Sequence is at 10,000
Session A sets increment to 2,000
Session A selects 10,001 as first and 12,000 as last
Session B sets increment to 5,000
Session A sets increment to 1
Session B selects 12,001 as first and 12,001 as last
Session B sets increment to 1
Even if the procedure would be rather quick, it is not that unlikely in my application that two different users cause the procedure to be called almost simultaneously
1) ALTER SEQUENCE is DDL so it implicitly commits before and after the statement. The database transaction started by the Windows application will be committed. If you are using a distributed transaction coordinator other than the Oracle database, hopefully the transaction coordinator will commit the entire distributed transaction but transaction coordinators will sometimes have problems with commits issued that it is not aware of.
There is nothing that you can do to prevent DDL from committing.
2) The scenario you outline with multiple users is quite possible. So it doesn't sound like this approach would behave correctly in your environment.
You could potentially use the DBMS_LOCK package to ensure that only one session is calling your procedure at any point in time and then call the sequence N times from a single SQL statement. But if other processes are also using the sequence, there is no guarantee that you'll get a contiguous set of values.
CREATE PROCEDURE some_proc( p_num_rows IN NUMBER,
p_first_val OUT NUMBER,
p_last_val OUT NUMBER )
AS
l_lockhandle VARCHAR2(128);
l_lock_return_code INTEGER;
BEGIN
dbms_lock.allocate_unique( 'SOME_PROC_LOCK',
l_lockhandle );
l_lock_return_code := dbms_lock.request( lockhandle => l_lockhandle,
lockmode => dbms_lock.x_mode,
release_on_commit => true );
if( l_lock_return_code IN (0, 4) ) -- Success or already owned
then
<<do something>>
end if;
dbms_lock.release( l_lockhandle );
END;
Altering the sequence in this scenario is really bad idea. Particularly in multiuser environment. You'll get your transaction committed and probably several "race condition" data bugs or integrity errors.
It would be appropriate if you had legacy data alredy imported and want to insert new data with ids from sequence. Then you may alter the sequence to move currval to max existing ...
It seems to me that here you want to generate Ids from the sequence. That need not to be done by
select seq.nextval into l_variable from dual;
insert into table (id, ...) values (l_variable, ....);
You can use the sequence directly in the insert:
insert into table values (id, ...) values (seq.nextval, ....);
and optionally get the assigned value back by
insert into table values (id, ...) values (seq.nextval, ....)
returning id into l_variable;
It certainly is possible even for bulk operations with execBatch. Either just creating the ids or even returning them. I am not sure about the right syntax in java but it will be something about the lines
insert into table values (id, ...) values (seq.nextval, ....)
returning id bulk collect into l_cursor;
and you'll be given a ResultSet to browse the assigned numbers.
You can't prevent the implicit commit.
Your solution is not multi user proof. It is perfectly possible that another session will have 'restored' the increment to 1, just as you described.
I would suggest you keep fetching values one by one from the sequence, store these IDs one by one on your list and have the batch execution operate on that list.
What is the reason that you want to fetch a contiguous block of values from the sequence? I would not be too worried about performance, but maybe there are other requirements that I don't know of.
In Oracle, you can use following query to get next N values from a sequence that increments by one:
select level, PDQ_ACT_COMB_SEQ.nextval as seq from dual connect by level <= 5;

Inserting into Oracle the wrong way - how to deal with it?

I've just found the following code:
select max(id) from TABLE_NAME ...
... do some stuff ...
insert into TABLE_NAME (id, ... )
VALUES (max(id) + 1, ...)
I can create a sequence for the PK, but there's a bunch of existing code (classic asp, existing asp.net apps that aren't part of this project) that's not going to use it.
Should I just ignore it, or is there a way to fix it without going into the existing code?
I'm thinking that the best option is just to do:
insert into TABLE_NAME (id, ... )
VALUES (select max(id) + 1, ...)
Options?
You can create a trigger on the table that overwrites the value for ID with a value that you fetch from a sequence.
That way you can still use the other existing code and have no problems with concurrent inserts.
If you cannot change the other software and they still do the select max(id)+1 insert that is most unfortunate. What you then can do is:
For your own insert use a sequence and populate the ID field with -1*(sequence value).
This way the insert will not interfere with the existing programs, but also not conflict with the existing programs.
(of do the insert without a value for id and use a trigger to populate the ID with the negative value of a sequence).
As others have said, you can override the max value in a database trigger using a sequence. However, that could cause problems if any of the application code uses that value like this:
select max(id) from TABLE_NAME ...
... do some stuff ...
insert into TABLE_NAME (id, ... )
VALUES (max(id) + 1, ...)
insert into CHILD_TABLE (parent_id, ...)
VALUES (max(id) + 1, ...)
Use a seqeunce in a before insert row trigger. select max(id) + 1 doesn't work in a multi concerrency environment.
This quickly turns in to a discussion of application architecture, especially when the question boils down to "what should I do?"
Primary keys in Oracle really need to come from sequences and since you're dealing with complex insert logic (parent/child inserts, at least) in your application code, you should go into the existing code, as you say (since triggers probably won't help you).
On one extreme you could take away direct SQL access from applications and make them call services so the insert/update/delete code can be centralized. Or you could rewrite your code using some sort of MVC architecture. I'm assuming both are overkill for your situation.
Is the id column at least set to be a true primary key so there's a constraint that will keep duplicates from occurring? If not, start there.
Once the primary key is in place, or if it already is, it's only a matter of time until inserts start to fail; you'll know when they start to fail, right? If not, get on the error-logging.
Now fix the application code. While you're in there, you should at least write and call helper code so your database interactions are in as few places as possible. Then provide some leadership to the other developers and make sure they use the helper code too.
Big question: does anybody rely on the value of the PK? If not I would recommend using a trigger, fetching the id from a sequence and setting it. The inserts wouldn't specify and id at all.
I am not sure but the
insert into TABLE_NAME (id, ... )
VALUES (select max(id) + 1, ...)
might cause problems when to sessions reach that code. It might be that oracle reads the table (calculating max(id)) and then trys to get the lock on the PK for insertion. If that is the case two concurrent session might try to use the same id, causing an exception in the second session.
You could add some logging to the trigger, to check if inserts get processed that already have an ID set. So you know you have still to hunt down some place where the old code is used.
It can be done by fetching the max value in a variable and then just insert it in the table like
Declare
v_max int;
select max(id) into v_max from table;
insert into table values((v_max+rownum),val1,val2....,valn);
commit;
This will create a sequence in a single as well as Bulk inserts.

How to protect a running column within Oracle/PostgreSQL (kind of MAX-result locking or something)

I'd need advice on following situation with Oracle/PostgreSQL:
I have a db table with a "running counter" and would like to protect it in the following situation with two concurrent transactions:
T1 T2
SELECT MAX(C) FROM TABLE WHERE CODE='xx'
-- C for new : result + 1
SELECT MAX(C) FROM TABLE WHERE CODE='xx';
-- C for new : result + 1
INSERT INTO TABLE...
INSERT INTO TABLE...
So, in both cases, the column value for INSERT is calculated from the old result added by one.
From this, some running counter handled by the db would be fine. But that wouldn't work because
the counter values or existing rows are sometimes changed
sometimes I'd like there to be multiple counter "value groups" (as with the CODE mentioned) : with different values for CODE the counters would be independent.
With some other databases this can be handled with SERIALIZABLE isolation state but at least with Oracle&Postgre the phantom reads are prevented but as the result the table ends up with two distinct rows with same counter value. This seems to have to do with the predicate locking, locking "all the possible rows covered by the query" - some other db:s end up to lock the whole table or something..
SELECT ... FOR UPDATE -statements seem to be for other purposes and don't even seem to work with MAX() -function.
Setting an UNIQUE contraint on the column would probably be the solution but are there some other ways to prevent the situation?
b.r. Touko
EDIT: One more option could probably be manual locking even though it doesn't appear nice to me..
Both Oracle and PostgreSQL support what's called sequences and the perfect fit for your problem. You can have a regular int column, but define one sequence per group, and do a single query like
--PostgreSQL
insert into table (id, ... ) values (nextval(sequence_name_for_group_xx), ... )
--Oracle
insert into table (id, ... ) values (sequence_name_for_group_xx.nextval, ... )
Increments in sequences are atomic, so your problem just wouldn't exist. It's only a matter of creating the required sequences, one per group.
the counter values or existing rows are sometimes changed
You should to put a unique constraint on that column if this would be a problem for your app. Doing so would guarantee a transaction at SERIALIZABLE isolation level would abort if it tried to use the same id as another transaction.
One more option could probably be manual locking even though it doesn't appear nice to me..
Manual locking in this case is pretty easy: just take a SHARE UPDATE EXCLUSIVE or stronger lock on the table before selecting the maximum. This will kill concurrent performance, though.
sometimes I'd like there to be multiple counter "value groups" (as with the CODE mentioned) : with different values for CODE the counters would be independent.
This leads me to the Right Solution for this problem: sequences. Set up several sequences, one for each "value group" you want to get IDs in their own range. See Section 9.15 of The Manual for the details of sequences and how to use them; it looks like they're a perfect fit for you. Sequences will never give the same value twice, but might skip values: if a transaction gets the value '2' from a sequence and aborts, the next transaction will get the value '3' rather than '2'.
The sequence answer is common, but might not be right. The viability of this solution depends on what you actually need. If what you semantically want is "some guaranteed to be unique number" then that is what a sequence is for. However, if what you want is to make sure that your value increases by exactly one on each insert (as you have asked), then DO NOT USE A SEQUENCE! I have run into this trap before myself. Sequences are not guaranteed to be sequential! They can skip numbers. Depending on what sort of optimizations you have configured, they can skip LOTS of numbers. Even if you have things configured just right so that you shouldn't skip any numbers, that is not guaranteed, and is not what sequences are for. So, you are only asking for trouble if you (mis)use them like that.
One step better solution is to bundle the select into the insert, like so:
INSERT INTO table(code, c, ...)
VALUES ('XX', (SELECT MAX(c) + 1 AS c FROM table WHERE code = 'XX'), ...);
(I haven't test run that query, but I'm pretty sure it should work. My apologies if it doesn't.) But, doing something like that reflects the semantic intent of what you are trying to do. However, this is inefficient, because you have to do a scan for MAX, and the inference I am taking from your sample is that you have a small number of code values relative to the size of the table, so you are going to do an expensive, full table scan on every insert. That isn't good. Also, this doesn't even get you the ACID guarantee you are looking for. The select is not transactionally tied to the insert. You can't "lock" the result of the MAX() function. So, you could still have two transactions running this query and they both do the sub-select and get the same max, both add one, and then both try to insert. It's a much smaller window, but you may still technically have a race condition here.
Ultimately, I would challenge that you probably have the wrong data model if you are trying to increment on insert. You should insert with a unique key, most commonly a sequence value (at least as an easy, surrogate key for any natural key). That gets the data safely inserted. Then, if you need a count of things, then have one table that stores your counts.
CREATE TABLE code_counts (
code VARCHAR(2), --or whatever
count NUMBER
);
If you really want to store the code count of each item as it is inserted, the separate count table also allows you to do so correctly, transactionally, like so:
UPDATE code_counts SET count = count + 1 WHERE code = 'XX' RETURNING count INTO :count;
INSERT INTO table(code, c, ...) VALUES ('XX', :count, ...);
COMMIT;
The key is that the update locks the counter table and reserves that value for you. Then your insert uses that value. And all of that is committed as one transactional change. You have to do this in a transaction. Having a separate count table avoids the full table scan of doing SELECT MAX().... In essense, what this does is re-implements a sequence, but it also guarantees you sequencial, ordered use.
Without knowing your whole problem domain and data model, it is hard to say, but abstracting your counts out to a separate table like this where you don't have to do a select max to get the right value is probably a good idea. Assuming, of course, that a count is what you really care about. If you are just doing logging or something where you want to make sure things are unique, then use a sequence, and a timestamp to sort by.
Note that I'm saying not to sort by a sequence either. Basically, never trust a sequence to be anything other than unique. Because when you get to caching sequence values on a multi-node system, your application might even consume them out of order.
This is why you should use the Serial datatype, which defers the lookup of C to the time of insert (which uses table locks i presume). You would then not specify C, but it would be generated automatically. If you need C for some intermediate calculation, you would need to save first, then read C and finally update with the derived values.
Edit: Sorry, I didn't read your whole question. What about solving your other problems with normalization? Just create a second table for each specific type (for each x where A='x'), where you have another auto increment. Manually edited sequences could be another column in the same table, which uses the generated sequence as a base (i.e if pk = 34 you can have another column mypk='34Changed').
You can create sequential collumn by using sequence as default value:
First, you have to create the sequence counter:
CREATE SEQUENCE SEQ_TABLE_1 START WITH 1 INCREMENT BY 1;
So, you can use it as default value:
CREATE TABLE T (
COD NUMERIC(10) DEFAULT NEXTVAL('SEQ_TABLE_1') NOT NULL,
collumn1...
collumn2...
);
Now you don't need to worry about sequence on inserting rows:
INSERT INTO T (collumn1, collumn2) VALUES (value1, value2);
Regards.

Manually inserting data in a table(s) with primary key populated with sequence

I have a number of tables that use the trigger/sequence column to simulate auto_increment on their primary keys which has worked great for some time.
In order to speed the time necessary to perform regression testing against software that uses the db, I create control files using some sample data, and added running of these to the build process.
This change is causing most of the tests to crash though as the testing process installs the schema from scratch, and the sequences are returning values that already exist in the tables. Is there any way to programtically say "Update sequences to max value in column" or do I need to write out a whole script by hand that updates all these sequences, or can I/should I change the trigger that substitutes the null value for the sequence to some how check this (though I think this might cause the mutating table problem)?
You can generate a script to create the sequences with the start values you need (based on their existing values)....
SELECT 'CREATE SEQUENCE '||sequence_name||' START WITH '||last_number||';'
FROM ALL_SEQUENCES
WHERE OWNER = your_schema
(If I understand the question correctly)
Here's a simple way to update a sequence value - in this case setting the sequence to 1000 if it is currently 50:
alter sequence MYSEQUENCE increment by 950 nocache;
select MYSEQUENCE_S.nextval from dual;
alter sequence MYSEQUENCE increment by 1;
Kudos to the creators of PL/SQL Developer for including this technique in their tool.
As part of your schema rebuild, why not drop and recreate the sequence?

Resources