Inserting into Oracle the wrong way - how to deal with it?

I've just found the following code:
select max(id) from TABLE_NAME ...
... do some stuff ...
insert into TABLE_NAME (id, ... )
VALUES (max(id) + 1, ...)
I can create a sequence for the PK, but there's a bunch of existing code (classic asp, existing asp.net apps that aren't part of this project) that's not going to use it.
Should I just ignore it, or is there a way to fix it without going into the existing code?
I'm thinking that the best option is just to do:
insert into TABLE_NAME (id, ... )
VALUES (select max(id) + 1, ...)
Options?

You can create a trigger on the table that overwrites the value for ID with a value that you fetch from a sequence.
That way you can still use the other existing code and have no problems with concurrent inserts.
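A minimal sketch of such a trigger, with made-up sequence and trigger names (the start value is illustrative and should be above the current MAX(id); on Oracle versions before 11g you would SELECT the sequence value INTO :NEW.id FROM dual instead of assigning it directly):

CREATE SEQUENCE table_name_seq START WITH 1000 INCREMENT BY 1;  -- start above the current MAX(id)

CREATE OR REPLACE TRIGGER table_name_id_trg
BEFORE INSERT ON table_name
FOR EACH ROW
BEGIN
  -- Ignore whatever ID the application supplied and always take the sequence value
  :NEW.id := table_name_seq.NEXTVAL;
END;
/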
If you cannot change the other software and it still does the select max(id)+1 insert, that is most unfortunate. What you can then do is:
For your own inserts, use a sequence and populate the ID field with -1 * (sequence value).
This way your inserts will not interfere with the existing programs, and will not conflict with the IDs they generate.
(Or do the insert without a value for ID and use a trigger to populate the ID with the negative value of a sequence.)

As others have said, you can override the max value in a database trigger using a sequence. However, that could cause problems if any of the application code uses that value like this:
select max(id) from TABLE_NAME ...
... do some stuff ...
insert into TABLE_NAME (id, ... )
VALUES (max(id) + 1, ...)
insert into CHILD_TABLE (parent_id, ...)
VALUES (max(id) + 1, ...)

Use a sequence in a before-insert row trigger. select max(id) + 1 doesn't work in a concurrent environment.

This quickly turns in to a discussion of application architecture, especially when the question boils down to "what should I do?"
Primary keys in Oracle really need to come from sequences and since you're dealing with complex insert logic (parent/child inserts, at least) in your application code, you should go into the existing code, as you say (since triggers probably won't help you).
On one extreme you could take away direct SQL access from applications and make them call services so the insert/update/delete code can be centralized. Or you could rewrite your code using some sort of MVC architecture. I'm assuming both are overkill for your situation.
Is the id column at least set to be a true primary key so there's a constraint that will keep duplicates from occurring? If not, start there.
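For example (the constraint name is an assumption):

ALTER TABLE table_name ADD CONSTRAINT table_name_pk PRIMARY KEY (id);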
Once the primary key is in place, or if it already is, it's only a matter of time until inserts start to fail; you'll know when they start to fail, right? If not, get on the error-logging.
Now fix the application code. While you're in there, you should at least write and call helper code so your database interactions are in as few places as possible. Then provide some leadership to the other developers and make sure they use the helper code too.

Big question: does anybody rely on the value of the PK? If not, I would recommend using a trigger, fetching the ID from a sequence and setting it. The inserts wouldn't specify an ID at all.
I am not sure, but the
insert into TABLE_NAME (id, ... )
VALUES (select max(id) + 1, ...)
might cause problems when two sessions reach that code. It might be that Oracle reads the table (calculating max(id)) and then tries to get the lock on the PK for insertion. If that is the case, two concurrent sessions might try to use the same ID, causing an exception in the second session.
You could add some logging to the trigger to check whether inserts get processed that already have an ID set. That way you know you still have to hunt down some place where the old code is used.
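A hedged sketch of that logging, assuming an audit table created for the purpose and the sequence-backed trigger described above:

CREATE OR REPLACE TRIGGER table_name_id_trg
BEFORE INSERT ON table_name
FOR EACH ROW
BEGIN
  IF :NEW.id IS NOT NULL THEN
    -- The application supplied its own ID: record it so the old code can be hunted down
    INSERT INTO legacy_id_log (supplied_id, logged_at) VALUES (:NEW.id, SYSDATE);
  END IF;
  :NEW.id := table_name_seq.NEXTVAL;
END;
/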

It can be done by fetching the max value into a variable and then using it in the insert, like:
DECLARE
  v_max INT;
BEGIN
  SELECT MAX(id) INTO v_max FROM table_name;
  INSERT INTO table_name
    SELECT (v_max + ROWNUM), val1, val2, ..., valn
    FROM   source_query;   -- placeholder for whatever supplies the rows
  COMMIT;
END;
This generates sequential IDs for single-row as well as bulk inserts.


Oracle: difference between max(id)+1 and sequence.nextval

I am using Oracle.
What is the difference between creating the ID using max(id)+1 and using sequence.nextval? Where should each be used, and when?
Like:
insert into student (id,name) values (select max(id)+1 from student, 'abc');
and
insert into student (id,name) values (SQ_STUDENT.nextval, 'abc');
SQ_STUDENT.nextval sometimes gives an error that the record is a duplicate...
Please help me clear up this doubt.
With the select max(id) + 1 approach, two sessions inserting simultaneously will see the same current max ID from the table, and both insert the same new ID value. The only way to use this safely is to lock the table before starting the transaction, which is painful and serialises the transactions. (And as Stijn points out, values can be reused if the highest record is deleted). Basically, never use this approach. (There may very occasionally be a compelling reason to do so, but I'm not sure I've ever seen one).
The sequence guarantees that the two sessions will get different values, and no serialisation is needed. It will perform better and be safer, easier to code and easier to maintain.
The only way you can get duplicate errors using the sequence is if records already exist in the table with IDs above the sequence value, or if something is still inserting records without using the sequence. So if you had an existing table with manually entered IDs, say 1 to 10, and you created a sequence with a default start-with value of 1, the first insert using the sequence would try to insert an ID of 1 - which already exists. After trying that 10 times the sequence would give you 11, which would work. If you then used the max-ID approach to do the next insert that would use 12, but the sequence would still be on 11 and would also give you 12 next time you called nextval.
The sequence and table are not related. The sequence is not automatically updated if a manually-generated ID value is inserted into the table, so the two approaches don't mix. (Among other things, the same sequence can be used to generate IDs for multiple tables, as mentioned in the docs).
If you're changing from a manual approach to a sequence approach, you need to make sure the sequence is created with a start-with value that is higher than all existing IDs in the table, and that everything that does an insert uses the sequence only in the future.
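For example (the names and the value 1040 are purely illustrative):

-- Suppose SELECT MAX(id) FROM student currently returns 1040
CREATE SEQUENCE sq_student START WITH 1041 INCREMENT BY 1;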
Using a sequence works if you intend to have multiple users. Using a max does not.
If you do a max(id) + 1 and you allow multiple users, then multiple sessions that are operating at the same time will regularly see the same max and, thus, will generate the same new key. Assuming you've configured your constraints correctly, that will generate an error that you'll have to handle. You'll handle it by retrying the INSERT, which may fail again and again if other sessions get in before your session retries, but that's a lot of extra code for every INSERT operation.
It will also serialize your code. If I insert a new row in my session and go off to lunch before I remember to commit (or my client application crashes before I can commit), every other user will be prevented from inserting a new row until I get back and commit, or until the DBA kills my session.
To add to the other answers, a couple of issues.
Your max(id)+1 syntax will also fail if there are no rows in the table already, so use:
Coalesce(Max(id),0) + 1
There's nothing wrong with this technique if you only have a single process that inserts into the table, as might be the case with a data warehouse load, and if max(id) is fast (which it probably is).
It also avoids the need for code to synchronise values between tables and sequences if you are moving or restoring data to a test system, for example.
You can extend this method to multirow insert by using:
Coalesce(max(id),0) + rownum
I expect that might serialise a parallel insert, though.
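A sketch of that multi-row form, with placeholder table and column names:

INSERT INTO target_table (id, payload)
SELECT (SELECT COALESCE(MAX(id), 0) FROM target_table) + ROWNUM,
       s.payload
FROM   staging_table s;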
Some techniques don't work well with these methods. They rely, of course, on being able to issue the select statement, so SQL*Loader might be ruled out. However, SQL*Loader has support for this technique in general through the SEQUENCE parameter of the column specification: http://docs.oracle.com/cd/E11882_01/server.112/e22490/ldr_field_list.htm#i1008234
Assuming MAX(ID) is actually fast enough, wouldn't it be possible to:
1. First get MAX(ID)+1
2. Then get NEXTVAL
3. Compare those two and increase the sequence in case NEXTVAL is smaller than MAX(ID)+1
4. Use NEXTVAL in the INSERT statement
In that case I would have a fully stable procedure, and manual inserts would also be allowed without worrying about updating the sequence.
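A hedged sketch of those steps, borrowing the table and sequence names from the question; it still isn't safe if two sessions do manual inserts at the same moment, but it keeps the sequence from falling behind:

DECLARE
  v_needed NUMBER;
  v_next   NUMBER;
BEGIN
  SELECT COALESCE(MAX(id), 0) + 1 INTO v_needed FROM student;
  SELECT sq_student.NEXTVAL INTO v_next FROM dual;
  -- Bump the sequence until it has caught up with manually inserted IDs
  WHILE v_next < v_needed LOOP
    SELECT sq_student.NEXTVAL INTO v_next FROM dual;
  END LOOP;
  INSERT INTO student (id, name) VALUES (v_next, 'abc');
END;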

How to insert while avoiding unique constraints with oracle

We have a process that aggregates some data and inserts the results into another table that we use for efficient querying. The problem we're facing is that we now have multiple aggregators running at roughly the same time.
We use the original record's ID as the primary key in this new table - a unique constraint. However, if two aggregation processes are running at the same time, one of them will error with a unique constraint violation.
Is there a way to specify some kind of locking mechanism which will make the second writer wait until the first is finished? Alternatively, is there a way to tell Oracle to ignore that specific row and continue with the rest?
Unfortunately it's not practical to reduce the aggregation to a single process, as the following procedures rely on an up-to-date version of the data being available, and those procedures do need to scale out.
Edit:
The following is my [redacted] query:
INSERT INTO agg_table
SELECT h.id, h.col, h.col2
FROM   history h
JOIN   call c ON c.callid = h.callid
WHERE  h.id > (SELECT COALESCE(MAX(id), 0) FROM agg_table)
It is possible to run an INSERT statement with an error logging clause. The example from the Oracle docs is as follows:
INSERT INTO dw_empl
SELECT employee_id, first_name, last_name, hire_date, salary, department_id
FROM employees
WHERE hire_date > sysdate - 7
LOG ERRORS INTO err_empl ('daily_load') REJECT LIMIT 25
Alternatively, you could try using a MERGE statement. You would be merging into the summary table with a select from the detail table. If a match is not found, you INSERT, and if it is found, you UPDATE. I believe this solution will handle your concurrency issues, but you would need to test it.
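A hedged sketch of such a MERGE, reusing the column names from the [redacted] query above (the update branch is only a guess at what you would refresh on a match):

MERGE INTO agg_table a
USING (SELECT h.id, h.col, h.col2
       FROM   history h
       JOIN   call c ON c.callid = h.callid) src
ON (a.id = src.id)
WHEN MATCHED THEN
  UPDATE SET a.col = src.col, a.col2 = src.col2
WHEN NOT MATCHED THEN
  INSERT (id, col, col2) VALUES (src.id, src.col, src.col2);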
Have a look at the FOR UPDATE clause. If you write the SELECT statement with a FOR UPDATE clause within a transaction before your update/insert statements, you will be able to "lock" the required records.
Serialising the inserts is probably the best way, as there's no method that will get you round the problem of the multiple inserts being unable to see what each one is doing.
DBMS_Lock is probably the appropriate serialisation mechanism.
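A sketch of that serialisation with DBMS_LOCK (the lock name and timeout are arbitrary, and the session needs EXECUTE on DBMS_LOCK):

DECLARE
  v_handle VARCHAR2(128);
  v_result INTEGER;
BEGIN
  DBMS_LOCK.ALLOCATE_UNIQUE('agg_table_load', v_handle);
  v_result := DBMS_LOCK.REQUEST(v_handle, DBMS_LOCK.X_MODE, timeout => 60);
  IF v_result IN (0, 4) THEN   -- 0 = got the lock, 4 = this session already holds it
    -- run the aggregation INSERT here
    v_result := DBMS_LOCK.RELEASE(v_handle);
  END IF;
END;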

Stored Procedure: Cursor is bad?

I read somewhere that 99% of time you don't need to use a cursor.
But I can't think of any other way besides using a cursor in the following situation.
Select t.flag
From Dual t;
Let's say this returns 4 rows of either 'Y' or 'N'. I want the procedure to trigger something if it finds a 'Y'. I usually declare a cursor and loop until %NOTFOUND. Please tell me if there is a better way.
Also, if you have any idea, when is the best time to use a cursor?
EDIT: Instead of inserting the flags, what if I want to do "If 'Y' then trigger something"?
Your case definitely falls into the 99%.
You can easily do the conditional insert using insert into ... select.... It's just a matter of making a select that returns the result that you want to insert.
If you want to insert one record for each 'Y' then use a query with where flag = 'Y'. If you only want to insert a single record depending on whether there are at least one 'Y', then you can add distinct to the query.
A cursor is useful when you are doing something more complicated. I, for example, use a cursor when I need to insert or update records in one table and also, for each record, insert or update one or more records in several other tables.
Something like this:
INSERT INTO TBL_FLAG (col)
SELECT ID FROM Dual where flag = 'Y'
You will usually see a performance gain when using set-based instead of procedural operations, because most modern DBMSs are optimised for set-based operations.
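For the EDIT in the question ("If 'Y' then trigger something"), the same set-based idea works without a cursor; a minimal sketch, assuming a table my_flags holding the flag column:

DECLARE
  v_found NUMBER;
BEGIN
  SELECT COUNT(*) INTO v_found
  FROM   my_flags
  WHERE  flag = 'Y' AND ROWNUM = 1;   -- stop at the first 'Y'

  IF v_found > 0 THEN
    NULL;   -- trigger whatever needs to happen here
  END IF;
END;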
Well, the example doesn't quite make sense, but you can always write an INSERT ... SELECT statement instead of what I think you are describing.
Cursors are best used when a column value from one table will be used repeatedly in multiple queries on different tables.
Suppose the values of the id_test column are fetched from MY_TEST_TBL using a cursor CUR_TEST. Now this id_test column is a foreign key in MY_TEST_TBL. If we want to use id_test to insert or update any rows in the tables A_TBL, B_TBL and C_TBL, then in this case it's best to use a cursor instead of complex queries.
Hope this helps to explain the purpose of cursors.
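A rough sketch of the pattern described, with all names hypothetical:

DECLARE
  CURSOR cur_test IS SELECT id_test FROM my_test_tbl;
BEGIN
  FOR r IN cur_test LOOP
    UPDATE a_tbl SET processed_date = SYSDATE WHERE id_test = r.id_test;
    INSERT INTO b_tbl (id_test, created_date) VALUES (r.id_test, SYSDATE);
    -- ... similar work against c_tbl ...
  END LOOP;
END;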

select * through dblink

I have some trouble when trying to update a table by looping over a cursor that selects from a source table through a dblink.
I have two database DB1, DB2.
They are two different database instance.
And I am using this following statement in DB1:
DECLARE
  CURSOR TestCursor IS
    SELECT a.*, 'A' TEST_COL_A, 'B' TEST_COL_B
    FROM rpt.SOURCE@DB2 a;
BEGIN
  FOR C1 IN TestCursor LOOP
    INSERT INTO RPT.TARGET
    (
      /* The COMPANY_NAME and CUST_ID are selected from the SOURCE table in DB2 */
      COMPANY_NAME, CUST_ID, TEST_COL_A, TEST_COL_B
    )
    VALUES
    (
      C1.COMPANY_NAME, C1.CUST_ID, C1.TEST_COL_A, C1.TEST_COL_B
    );
  END LOOP;
  /* Some code... */
END;
Everything works fine until I add a column "NEW_COL" to SOURCE@DB2.
The inserted data has the wrong values.
The value of TEST_COL_A, as I expect, should be 'A'.
However, it contains the value of NEW_COL, which I added to the SOURCE table.
And the value of TEST_COL_B contains 'A'.
Has anyone encountered the same issue?
It seems like Oracle caches the table columns when it compiles.
Is there any way to add a column to the source table without recompiling?
According to this:

Oracle Database does not manage dependencies among remote schema objects other than local-procedure-to-remote-procedure dependencies.

For example, assume that a local view is created and defined by a query that references a remote table. Also assume that a local procedure includes a SQL statement that references the same remote table. Later, the definition of the table is altered.

Therefore, the local view and procedure are never invalidated, even if the view or procedure is used after the table is altered, and even if the view or procedure now returns errors when used. In this case, the view or procedure must be altered manually so that errors are not returned. In such cases, lack of dependency management is preferable to unnecessary recompilations of dependent objects.
In this case you aren't quite seeing errors, but the cause is the same. You also wouldn't have a problem if you used explicit column names instead of *, which is usually safer anyway. If you're using * you can't avoid recompiling (unless, I suppose, the * is the last item in the select list, in which case any extra columns on the end wouldn't cause a problem - as long as their names didn't clash).
I recommend that you use a single set processing insert statement in DB1 rather than a row at a time cursor for loop for the insert, for example:
INSERT INTO RPT.TARGET
SELECT COMPANY_NAME, CUST_ID, 'A' TEST_COL_A, 'B' TEST_COL_B
FROM rpt.SOURCE@DB2;
Rationale:
Set processing will almost always outperform row-at-a-time processing [which is really slow-at-a-time processing].
Set processing the insert is a scalable solution. If the application needs to scale to tens of thousands of rows or millions of rows, the row-at-a-time solution will not scale.
Also, using the select * construct is dangerous for the reason you encountered [and other similar reasons].

Oracle Populate backup table from primary table

The program that I am currently assigned to has a requirement that I copy the contents of a table to a backup table, prior to the real processing.
During code review, a coworker pointed out that
INSERT INTO BACKUP_TABLE
SELECT *
FROM PRIMARY_TABLE
is unduly risky, as it is possible for the tables to have different columns, and different column orders.
I am also under the constraint to not create/delete/rename tables. ~Sigh~
The columns in the table are expected to change, so simply hard-coding the column names is not really the solution I am looking for.
I am looking for ideas on a reasonable non-risky way to get this job done.
Does the backup table stay around? Does it keep the data permanently, or is it just a copy of the current values?
Too bad about not being able to create/delete/rename/copy. Otherwise, if it's short term, just used in case something goes wrong, then you could drop it at the start of processing and do something like
create table backup_table as select * from primary_table;
Your best option may be to make the select explicit, as
insert into backup_table (<list of columns>) select <list of columns> from primary_table;
You could generate that by building a SQL string from the data dictionary, then doing execute immediate. But you'll still be at risk if the backup_table doesn't contain all the important columns from the primary_table.
Might just want to make it explicit, and raise a major error if backup_table doesn't exist, or any of the columns in primary_table aren't in backup_table.
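A hedged sketch of that dictionary-driven approach (it assumes both tables live in the current schema and an Oracle version with LISTAGG, i.e. 11gR2 or later):

DECLARE
  v_cols VARCHAR2(4000);
BEGIN
  -- Columns present in both tables, in the primary table's column order
  SELECT LISTAGG(p.column_name, ', ') WITHIN GROUP (ORDER BY p.column_id)
  INTO   v_cols
  FROM   user_tab_columns p
  JOIN   user_tab_columns b
    ON   b.table_name  = 'BACKUP_TABLE'
   AND   b.column_name = p.column_name
  WHERE  p.table_name  = 'PRIMARY_TABLE';

  EXECUTE IMMEDIATE
    'INSERT INTO backup_table (' || v_cols || ') SELECT ' || v_cols ||
    ' FROM primary_table';
END;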
How often do you change the structure of your tables? Your method should work just fine provided the structure doesn't change. Personally I think your DBAs should give you a mechanism for dropping the backup table and recreating it, such as a stored procedure. We had something similar at my last job for truncating certain tables, since truncating is frequently much faster than DELETE FROM table.
Is there a reason that you can't just list out the columns in the tables? So
INSERT INTO backup_table( col1, col2, col3, ... colN )
SELECT col1, col2, col3, ..., colN
FROM primary_table
Of course, this requires that you revisit the code when you change the definition of one of the tables to determine if you need to make code changes, but that's generally a small price to pay for insulating yourself from differences in column order, differences in column names, and irrelevant differences in table definitions.
If I had this situation, I would retrieve the column definitions for the two tables right at the beginning of the problem. Then, if they were identical, I would proceed with the simple:
INSERT INTO BACKUP_TABLE
SELECT *
FROM PRIMARY_TABLE
If they were different, I would only proceed if there were no critical columns missing from the backup table. In this case I would use this form for the backup copy:
INSERT INTO BACKUP_TABLE (<list of columns>)
SELECT <list of columns>
FROM PRIMARY_TABLE
But I'd also worry about what would happen if I simply stopped the program with an error, so I might even have a backup plan where I would use the second form for the columns that are in both tables, and also dump a text file with the PK and any columns that are missing from the backup. Also log an error even though it appears that the program completed normally. That way, you could recover the data if the worst happened.
Really, this is a symptom of bad processes somewhere which should be addressed, but defensive programming can help to make it someone else's problem, not yours. If they don't notice the log error message which tells them about the text dump with the missing columns, then it's not your fault.
But, if you don't code defensively, and the worst happens, it will be partly your fault.
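A simple dictionary check can drive that up-front comparison (current schema assumed):

-- Columns in the primary table that the backup table is missing
SELECT column_name FROM user_tab_columns WHERE table_name = 'PRIMARY_TABLE'
MINUS
SELECT column_name FROM user_tab_columns WHERE table_name = 'BACKUP_TABLE';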
You could try something like:
CREATE TABLE secondary_table AS SELECT * FROM primary_table;
Not sure if that automatically copies data. If not:
CREATE TABLE secondary_table AS SELECT * FROM primary_table WHERE 1 = 0;
INSERT INTO secondary_table SELECT * FROM primary_table;
Edit:
Sorry, didn't read your post completely: especially the constraints part. I'm afraid I don't know how. My guess would be using a procedure that first describes both tables and compares them, before creating a lengthy insert / select query.
Still, if you're using a backup-table, I think it's pretty important it matches the original one exactly.
