The program that I am currently assigned to has a requirement that I copy the contents of a table to a backup table, prior to the real processing.
During code review, a coworker pointed out that
INSERT INTO BACKUP_TABLE
SELECT *
FROM PRIMARY_TABLE
is unduly risky, as it is possible for the tables to have different columns, and different column orders.
I am also under the constraint to not create/delete/rename tables. ~Sigh~
The columns in the table are expected to change, so simply hard-coding the column names is not really the solution I am looking for.
I am looking for ideas on a reasonable non-risky way to get this job done.
Does the backup table stay around? Does it keep the data permanently, or is it just a copy of the current values?
Too bad about not being able to create/delete/rename/copy. Otherwise, if it's short term, just used in case something goes wrong, then you could drop it at the start of processing and do something like
create table backup_table as select * from primary_table;
Your best option may be to make the select explicit, as
insert into backup_table (<list of columns>) select <list of columns> from primary_table;
You could generate that by building a SQL string from the data dictionary, then doing execute immediate. But you'll still be at risk if the backup_table doesn't contain all the important columns from the primary_table.
Might just want to make it explicit, and raise a major error if backup_table doesn't exist, or any of the columns in primary_table aren't in backup_table.
How often do you change the structure of your tables? Your method should work just fine provided the structure doesn't change. Personally I think your DBAs should give you a mechanism for dropping the backup table and recreating it, such as a stored procedure. We had something similar at my last job for truncating certain tables, since truncating is frequently much faster than DELETE FROM TABLE;.
Is there a reason that you can't just list out the columns in the tables? So
INSERT INTO backup_table( col1, col2, col3, ... colN )
SELECT col1, col2, col3, ..., colN
FROM primary_table
Of course, this requires that you revisit the code when you change the definition of one of the tables to determine if you need to make code changes, but that's generally a small price to pay for insulating yourself from differences in column order, differences in column names, and irrelevent differences in table definitions.
If I had this situation, I would retrieve the column definitions for the two tables right at the beginning of the problem. Then, if they were identical, I would proceed with the simple:
INSERT INTO BACKUP_TABLE
SELECT *
FROM PRIMARY_TABLE
If they were different, I would only proceed if there were no critical columns missing from the backup table. In this case I would use this form for the backup copy:
INSERT INTO BACKUP_TABLE (<list of columns>)
SELECT <list of columns>
FROM PRIMARY_TABLE
But I'd also worry about what would happen if I simply stopped the program with an error, so I might even have a backup plan where I would use the second form for the columns that are in both tables, and also dump a text file with the PK and any columns that are missing from the backup. Also log an error even though it appears that the program completed normally. That way, you could recover the data if the worst happened.
Really, this is a symptom of bad processes somewhere which should be addressed, but defensive programming can help to make it someone else's problem, not yours. If they don't notice the log error message which tells them about the text dump with the missing columns, then its not your fault.
But, if you don't code defensively, and the worst happens, it will be partly your fault.
You could try something like:
CREATE TABLE secondary_table AS SELECT * FROM primary_table;
Not sure if that automatically copies data. If not:
CREATE TABLE secondary_table AS SELECT * FROM primary_table LIMIT 1;
INSERT INTO secondary_table SELECT * FROM primary_table;
Edit:
Sorry, didn't read your post completely: especially the constraints part. I'm afraid I don't know how. My guess would be using a procedure that first describes both tables and compares them, before creating a lengthy insert / select query.
Still, if you're using a backup-table, I think it's pretty important it matches the original one exactly.
Related
Is this considered good practice?
Been trying to merge data from 2 different tables (dsb_nb_users_option & dsb_nb_default_options) for the number of users existing on dsb_nb_users table.
Would a JOIN statement be the best option?
Or a sub-query would work better for me to access the data laying on dsb_nb_users table?
This might be an operation i will have to perform a few times so i want to understand the mechanics of it.
INSERT INTO dsb_nb_users_option(dsb_nb_users_option.code, dsb_nb_users_option.code_value, dsb_nb_users_option.status)
SELECT dsb_nb_default_option.code, dsb_nb_default_option.code_value, dsb_nb_default_option.status
FROM dsb_nb_default_options
WHERE dsb_nb_users.user_id IS NOT NULL;
Thank you for your time!!
On the face of it, I see nothing wrong with your query to achieve your goal. That said, I see several things worth pointing out.
First, please learn to format your code - for your own sanity as well as that of others who have to read it. At the very least, put each column name on a line of its own, indented. A good IDE tool like SQL Developer will do this for you. Like this:
INSERT INTO dsb_nb_users_option (
dsb_nb_users_option.code,
dsb_nb_users_option.code_value,
dsb_nb_users_option.status
)
SELECT
dsb_nb_default_option.code,
dsb_nb_default_option.code_value,
dsb_nb_default_option.status
FROM
dsb_nb_default_options
WHERE
dsb_nb_users.user_id IS NOT NULL;
Now that I've made your code more easily readable, a couple of other things jump out at me. First, it is not necessary to prefix every column name with the table name. So your code gets even easier to read.
INSERT INTO dsb_nb_users_option (
code,
code_value,
status
)
SELECT
code,
code_value,
status
FROM
dsb_nb_default_options
WHERE
dsb_nb_users.user_id IS NOT NULL;
Note that there are times you need to qualify a column name, either because oracle requires it to avoid ambiguity to the parser, or because the developer needs it to avoid ambiguity to himself and those that follow. In this case we usually use table name aliasing to shorten the code.
select a.userid,
a.username,
b.user_mobile_phone
from users a,
join user_telephones b on a.userid=b.userid;
Finally, and more critical your your overall design, It appears that you are unnecessarily duplicating data across multiple tables. This goes against all the rules of data design. Have you read up on 'data normalization'? It's the foundation of relational database theory.
I'm using an external tool that scans tables in my database. It uses dba_objects.last_ddl_time to determine which tables have been scanned. Obviously, this strategy does not work if the table data is modified in between scans so sometimes I have to help it...
I need a way to "bump" the Last DDL time without actually changing anything.
I'm looking for the simplest possible instant DDL statement that can be executed on any table, knowing just the table name.
I have sysdba privileges.
Edit:
For example, I can use comment on table xxx is 'Boom'; but then I lose the original comment. I know how to fix this, but then it is no longer an small and easy statement I can quickly time in sql*plus
Changing LOGGING/NOLOGGING is pretty fast (though not instant).
If you set the LOGGING attribute back to itself, it will notch the LAST_DDL_TIME without making any real change to the table. This example below tries to touch every table except sys tabels (presumably you'd want more limits here)
BEGIN
FOR TABLE_POINTER IN (SELECT OWNER, TABLE_NAME, DECODE(LOGGING,'YES','LOGGING','NOLOGGING') DO_LOGGING
FROM DBA_TABLES WHERE OWNER NOT IN ('SYSTEM','SYS','SYSBACKUP','MDSYS' --etc. other restrictions here
))
LOOP
EXECUTE IMMEDIATE UTL_LMS.FORMAT_MESSAGE('ALTER TABLE %s.%s %s',TABLE_POINTER.OWNER, TABLE_POINTER.TABLE_NAME, TABLE_POINTER.DO_LOGGING);
END LOOP;
END;
/
EDIT: The above wouldn't work with temp tables. An alternative such as setting PCT_FREE to itself or another suitable attribute may be preferable. You may need to handle IOTs, Partitioned Tables, etc. differently than the rest of the tables as well.
I am using Oracle
What is difference when we create ID using max(id)+1 and using sequance.nexval,where to use and when?
Like:
insert into student (id,name) values (select max(id)+1 from student, 'abc');
and
insert into student (id,name) values (SQ_STUDENT.nextval, 'abc');
SQ_STUDENT.nextval sometime gives error that duplicate record...
please help me on this doubt
With the select max(id) + 1 approach, two sessions inserting simultaneously will see the same current max ID from the table, and both insert the same new ID value. The only way to use this safely is to lock the table before starting the transaction, which is painful and serialises the transactions. (And as Stijn points out, values can be reused if the highest record is deleted). Basically, never use this approach. (There may very occasionally be a compelling reason to do so, but I'm not sure I've ever seen one).
The sequence guarantees that the two sessions will get different values, and no serialisation is needed. It will perform better and be safer, easier to code and easier to maintain.
The only way you can get duplicate errors using the sequence is if records already exist in the table with IDs above the sequence value, or if something is still inserting records without using the sequence. So if you had an existing table with manually entered IDs, say 1 to 10, and you created a sequence with a default start-with value of 1, the first insert using the sequence would try to insert an ID of 1 - which already exists. After trying that 10 times the sequence would give you 11, which would work. If you then used the max-ID approach to do the next insert that would use 12, but the sequence would still be on 11 and would also give you 12 next time you called nextval.
The sequence and table are not related. The sequence is not automatically updated if a manually-generated ID value is inserted into the table, so the two approaches don't mix. (Among other things, the same sequence can be used to generate IDs for multiple tables, as mentioned in the docs).
If you're changing from a manual approach to a sequence approach, you need to make sure the sequence is created with a start-with value that is higher than all existing IDs in the table, and that everything that does an insert uses the sequence only in the future.
Using a sequence works if you intend to have multiple users. Using a max does not.
If you do a max(id) + 1 and you allow multiple users, then multiple sessions that are both operating at the same time will regularly see the same max and, thus, will generate the same new key. Assuming you've configured your constraints correctly, that will generate an error that you'll have to handle. You'll handle it by retrying the INSERT which may fail again and again if other sessions block you before your session retries but that's a lot of extra code for every INSERT operation.
It will also serialize your code. If I insert a new row in my session and go off to lunch before I remember to commit (or my client application crashes before I can commit), every other user will be prevented from inserting a new row until I get back and commit or the DBA kills my session, forcing a reboot.
To add to the other answers, a couple of issues.
Your max(id)+1 syntax will also fail if there are no rows in the table already, so use:
Coalesce(Max(id),0) + 1
There's nothing wrong with this technique if you only have a single process that inserts into the table, as might be the case with a data warehouse load, and if max(id) is fast (which it probably is).
It also avoids the need for code to synchronise values between tables and sequences if you are moving restoring data to a test system, for example.
You can extend this method to multirow insert by using:
Coalesce(max(id),0) + rownum
I expect that might serialise a parallel insert, though.
Some techniques don't work well with these methods. They rely of course on being able to issue the select statement, so SQL*Loader might be ruled out. However SQL*Loader has support for this technique in general through the SEQUENCE parameter of the column specification: http://docs.oracle.com/cd/E11882_01/server.112/e22490/ldr_field_list.htm#i1008234
Assuming MAX(ID) is actually fast enough, wouldn't it be possible to:
First get MAX(ID)+1
Then get NEXTVAL
Compare those two and increase sequence in case NEXTVAL is smaller then MAX(ID)+1
Use NEXTVAL in INSERT statement
In that case I would have a fully stable procedure and manual inserts would also be allowed without worrying about updating the sequence
I read somewhere that 99% of time you don't need to use a cursor.
But I can't think of any other way beside using a cursor in this following situation.
Select t.flag
From Dual t;
Let's say this return 4 rows of either 'Y' or 'N'. I want the procedure to trigger something if it finds 'Y'. I usually declare a cursor and loop until %NOTFOUND. Please tell me if there is a better way.
Also, if you have any idea, when is the best time to use a cursor?
EDIT: Instead of inserting the flags, what if I want to do "If 'Y' then trigger something"?
Your case definitely falls into the 99%.
You can easily do the conditional insert using insert into ... select.... It's just a matter or making a select that returns the result that you want to insert.
If you want to insert one record for each 'Y' then use a query with where flag = 'Y'. If you only want to insert a single record depending on whether there are at least one 'Y', then you can add distinct to the query.
A cursor is useful when you make something more complicated. I for example use a cursor when need to insert or update records in one table, and also for each record insert or update one or more records into several other tables.
Something like this:
INSERT INTO TBL_FLAG (col)
SELECT ID FROM Dual where flag = 'Y'
You will usually see a performance gain when using set based instead of procedural operations because most modern DBMS are setup to perform set based operations. You can read more here.
well the example doesnt quite make sense..
but you can always write an insert as select statement instead of what i think you are describing
Cursors are best to use when an column value form one table will be used repeatedly in multiple queries on different tables.
Suppose the values of id_test column are fetched from MY_TEST_TBL using a cursor CUR_TEST. Now this id_test column is a foreign key in MY_TEST_TBL. If we want to use id_test to insert or update any rows in table A_TBL,B_TBL and C_TBL, then in this case it's best to use cursors instead of using complex queries.
Hope this might help to understand the purpose of cursors
I've just found the following code:
select max(id) from TABLE_NAME ...
... do some stuff ...
insert into TABLE_NAME (id, ... )
VALUES (max(id) + 1, ...)
I can create a sequence for the PK, but there's a bunch of existing code (classic asp, existing asp.net apps that aren't part of this project) that's not going to use it.
Should I just ignore it, or is there a way to fix it without going into the existing code?
I'm thinking that the best option is just to do:
insert into TABLE_NAME (id, ... )
VALUES (select max(id) + 1, ...)
Options?
You can create a trigger on the table that overwrites the value for ID with a value that you fetch from a sequence.
That way you can still use the other existing code and have no problems with concurrent inserts.
If you cannot change the other software and they still do the select max(id)+1 insert that is most unfortunate. What you then can do is:
For your own insert use a sequence and populate the ID field with -1*(sequence value).
This way the insert will not interfere with the existing programs, but also not conflict with the existing programs.
(of do the insert without a value for id and use a trigger to populate the ID with the negative value of a sequence).
As others have said, you can override the max value in a database trigger using a sequence. However, that could cause problems if any of the application code uses that value like this:
select max(id) from TABLE_NAME ...
... do some stuff ...
insert into TABLE_NAME (id, ... )
VALUES (max(id) + 1, ...)
insert into CHILD_TABLE (parent_id, ...)
VALUES (max(id) + 1, ...)
Use a seqeunce in a before insert row trigger. select max(id) + 1 doesn't work in a multi concerrency environment.
This quickly turns in to a discussion of application architecture, especially when the question boils down to "what should I do?"
Primary keys in Oracle really need to come from sequences and since you're dealing with complex insert logic (parent/child inserts, at least) in your application code, you should go into the existing code, as you say (since triggers probably won't help you).
On one extreme you could take away direct SQL access from applications and make them call services so the insert/update/delete code can be centralized. Or you could rewrite your code using some sort of MVC architecture. I'm assuming both are overkill for your situation.
Is the id column at least set to be a true primary key so there's a constraint that will keep duplicates from occurring? If not, start there.
Once the primary key is in place, or if it already is, it's only a matter of time until inserts start to fail; you'll know when they start to fail, right? If not, get on the error-logging.
Now fix the application code. While you're in there, you should at least write and call helper code so your database interactions are in as few places as possible. Then provide some leadership to the other developers and make sure they use the helper code too.
Big question: does anybody rely on the value of the PK? If not I would recommend using a trigger, fetching the id from a sequence and setting it. The inserts wouldn't specify and id at all.
I am not sure but the
insert into TABLE_NAME (id, ... )
VALUES (select max(id) + 1, ...)
might cause problems when to sessions reach that code. It might be that oracle reads the table (calculating max(id)) and then trys to get the lock on the PK for insertion. If that is the case two concurrent session might try to use the same id, causing an exception in the second session.
You could add some logging to the trigger, to check if inserts get processed that already have an ID set. So you know you have still to hunt down some place where the old code is used.
It can be done by fetching the max value in a variable and then just insert it in the table like
Declare
v_max int;
select max(id) into v_max from table;
insert into table values((v_max+rownum),val1,val2....,valn);
commit;
This will create a sequence in a single as well as Bulk inserts.