Oracle trigger efficiency vs scheduled jobs - oracle

Certain invalid characters sometimes get inserted into database due to improper keyboard configuration of end-users.
We have a pl/sql script that iterates through all columns and replaces the invalid characters. The script takes less than one minute to complete (but our system is still beta and is not used widely) .
One solution is to create a trigger that replaces the invalid characters right before insertion into the tables and one solution would be using a scheduled job to run the so-called script.
Which one is more efficient ?

If you use a scheduled job, there is still some time during which the invalid characters are inside your data. If it is very important that this is not the case, use a trigger. A trigger will ensure that the database only contains valid characters.
I don't think efficiency is very important here. A trigger can easily keep up with data entry by users.

Not a fan of triggers, to be honest.
You could improve the efficiency of a scheduled job by providing an index that identifies the rows that need to be updated.
In general the syntax would be something like:
create index my_speshul_index on my_table(
case when some_expression then 1 else null end);
some_expression is the expression that detects the presence of these characters -- possibly a regexp_like.
Then you would query:
select ... from my_table
where case when some_expression then 1 else null end;

Another way to go would be to allow the data to enter the database as originally typed, but create a virtual column that would contain the cleaned data.
However, that's only workable if the problem is limited to a few columns. Since it can be all user-enterable columns, a trigger with regexp_replace seems to be the best way to go. Depending on what characters you allow, it could be as simple as a series of statements like
:new.columnA := regexp_replace(:new.ColumnA,'[^[:print:]]','');
That is, just remove all the non-printable characters.

Related

Can N function cause problems with existing queries?

We use Oracle 10g and Oracle 11g.
We also have a layer to automatically compose queries, from pseudo-SQL code written in .net (something like SqlAlchemy for Python).
Our layer currently wraps any string in single quotes ' and, if contains non-ANSI characters, it automatically compose the UNISTR with special characters written as unicode bytes (like \00E0).
Now we created a method for doing multiple inserts with the following construct:
INSERT INTO ... (...)
SELECT ... FROM DUAL
UNION ALL SELECT ... FROM DUAL
...
This algorithm could compose queries where the same string field is sometimes passed as 'my simple string' and sometimes wrapped as UNISTR('my string with special chars like \00E0').
The described condition causes a ORA-12704: character set mismatch.
One solution is to use the INSERT ALL construct but it is very slow compared to the one used now.
Another solution is to instruct our layer to put N in front of any string (except for the ones already wrapped with UNISTR). This is simple.
I just want to know if this could cause any side-effect on existing queries.
Note: all our fields on DB are either NCHAR or NVARCHAR2.
Oracle ref: http://docs.oracle.com/cd/B19306_01/server.102/b14225/ch7progrunicode.htm
Basicly what you are asking is, is there a difference between how a string is stored with or without the N function.
You can just check for yourself consider:
SQL> create table test (val nvarchar2(20));
Table TEST created.
SQL> insert into test select n'test' from dual;
1 row inserted.
SQL> insert into test select 'test' from dual;
1 row inserted.
SQL> select dump(val) from test;
DUMP(VAL)
--------------------------------------------------------------------------------
Typ=1 Len=8: 0,116,0,101,0,115,0,116
Typ=1 Len=8: 0,116,0,101,0,115,0,116
As you can see identical so no side effect.
The reason this works so beautifully is because of the elegance of unicode
If you are interested here is a nice video explaining it
https://www.youtube.com/watch?v=MijmeoH9LT4
I assume that you get an error "ORA-12704: character set mismatch" because your data inside quotes considered as char but your fields is nchar so char is collated using different charsets, one using NLS_CHARACTERSET, the other NLS_NCHAR_CHARACTERSET.
When you use an UNISTR function, it converts data from char to nchar (in any case that also converts encoded values into characters) as the Oracle docs say:
"UNISTR takes as its argument a text literal or an expression that
resolves to character data and returns it in the national character
set."
When you convert values explicitly using N or TO_NCHAR you only get values in NLS_NCHAR_CHARACTERSET without decoding. If you have some values encoded like this "\00E0" they will not be decoded and will be considered unchanged.
So if you have an insert such as:
insert into select N'my string with special chars like \00E0',
UNISTR('my string with special chars like \00E0') from dual ....
your data in the first inserting field will be: 'my string with special chars like \00E0' not 'my string with special chars like à'. This is the only side effect I'm aware of. Other queries should already use NLS_NCHAR_CHARACTERSET encoding, so it shouldn't be any problem using an explicit conversion.
And by the way, why not just insert all values as N'my string with special chars like à'? Just encode them into UTF-16 (I assume that you use UTF-16 for nchars) first if you use different encoding in 'upper level' software.
use of n function - you have answers already above.
If you have any chance to change the charset of the database, that would really make your life easier. I was working on huge production systems, and found the trend that because of storage space is cheap, simply everyone moves to AL32UTF8 and the hassle of internationalization slowly becomes the painful memories of the past.
I found the easiest thing is to use AL32UTF8 as the charset of the database instance, and simply use varchar2 everywhere. We're reading and writing standard Java unicode strings via JDBC as bind variables without any harm, and fiddle.
Your idea to construct a huge text of SQL inserts may not scale well for multiple reasons:
there is a fixed length of maximum allowed SQL statement - so it won't work with 10000 inserts
it is advised to use bind variables (and then you don't have the n'xxx' vs unistr mess either)
the idea to create a new SQL statement dynamically is very resource unfriedly. It does not allow Oracle to cache any execution plan for anything, and will make Oracle hard parse your looong statement at each call.
What you're trying to achieve is a mass insert. Use the JDBC batch mode of the Oracle driver to perform that at light-speed, see e.g.: http://viralpatel.net/blogs/batch-insert-in-java-jdbc/
Note that insert speed is also affected by triggers (which has to be executed) and foreign key constraints (which has to be validated). So if you're about to insert more than a few thousands of rows, consider disabling the triggers and foreign key constraints, and enable them after the insert. (You'll lose the trigger calls, but the constraint validation after insert can make an impact.)
Also consider the rollback segment size. If you're inserting a million of records, that will need a huge rollback segment, which likely will cause serious swapping on the storage media. It is a good rule of thumb to commit after each 1000 records.
(Oracle uses versioning instead of shared locks, therefore a table with uncommitted changes are consistently available for reading. The 1000 records commit rate means roughly 1 commit per second - slow enough to benefit of write buffers, but quick enough to not interfer with other humans willing to update the same table.)

Oracle: difference between max(id)+1 and sequence.nextval

I am using Oracle
What is difference when we create ID using max(id)+1 and using sequance.nexval,where to use and when?
Like:
insert into student (id,name) values (select max(id)+1 from student, 'abc');
and
insert into student (id,name) values (SQ_STUDENT.nextval, 'abc');
SQ_STUDENT.nextval sometime gives error that duplicate record...
please help me on this doubt
With the select max(id) + 1 approach, two sessions inserting simultaneously will see the same current max ID from the table, and both insert the same new ID value. The only way to use this safely is to lock the table before starting the transaction, which is painful and serialises the transactions. (And as Stijn points out, values can be reused if the highest record is deleted). Basically, never use this approach. (There may very occasionally be a compelling reason to do so, but I'm not sure I've ever seen one).
The sequence guarantees that the two sessions will get different values, and no serialisation is needed. It will perform better and be safer, easier to code and easier to maintain.
The only way you can get duplicate errors using the sequence is if records already exist in the table with IDs above the sequence value, or if something is still inserting records without using the sequence. So if you had an existing table with manually entered IDs, say 1 to 10, and you created a sequence with a default start-with value of 1, the first insert using the sequence would try to insert an ID of 1 - which already exists. After trying that 10 times the sequence would give you 11, which would work. If you then used the max-ID approach to do the next insert that would use 12, but the sequence would still be on 11 and would also give you 12 next time you called nextval.
The sequence and table are not related. The sequence is not automatically updated if a manually-generated ID value is inserted into the table, so the two approaches don't mix. (Among other things, the same sequence can be used to generate IDs for multiple tables, as mentioned in the docs).
If you're changing from a manual approach to a sequence approach, you need to make sure the sequence is created with a start-with value that is higher than all existing IDs in the table, and that everything that does an insert uses the sequence only in the future.
Using a sequence works if you intend to have multiple users. Using a max does not.
If you do a max(id) + 1 and you allow multiple users, then multiple sessions that are both operating at the same time will regularly see the same max and, thus, will generate the same new key. Assuming you've configured your constraints correctly, that will generate an error that you'll have to handle. You'll handle it by retrying the INSERT which may fail again and again if other sessions block you before your session retries but that's a lot of extra code for every INSERT operation.
It will also serialize your code. If I insert a new row in my session and go off to lunch before I remember to commit (or my client application crashes before I can commit), every other user will be prevented from inserting a new row until I get back and commit or the DBA kills my session, forcing a reboot.
To add to the other answers, a couple of issues.
Your max(id)+1 syntax will also fail if there are no rows in the table already, so use:
Coalesce(Max(id),0) + 1
There's nothing wrong with this technique if you only have a single process that inserts into the table, as might be the case with a data warehouse load, and if max(id) is fast (which it probably is).
It also avoids the need for code to synchronise values between tables and sequences if you are moving restoring data to a test system, for example.
You can extend this method to multirow insert by using:
Coalesce(max(id),0) + rownum
I expect that might serialise a parallel insert, though.
Some techniques don't work well with these methods. They rely of course on being able to issue the select statement, so SQL*Loader might be ruled out. However SQL*Loader has support for this technique in general through the SEQUENCE parameter of the column specification: http://docs.oracle.com/cd/E11882_01/server.112/e22490/ldr_field_list.htm#i1008234
Assuming MAX(ID) is actually fast enough, wouldn't it be possible to:
First get MAX(ID)+1
Then get NEXTVAL
Compare those two and increase sequence in case NEXTVAL is smaller then MAX(ID)+1
Use NEXTVAL in INSERT statement
In that case I would have a fully stable procedure and manual inserts would also be allowed without worrying about updating the sequence

Oracle DBMS package command to export table content as INSERT statement

Is there any subprogram similar to DBMS_METADATA.GET_DDL that can actually export the table data as INSERT statements?
For example, using DBMS_METADATA.GET_DDL('TABLE', 'MYTABLE', 'MYOWNER') will export the CREATE TABLE script for MYOWNER.MYTABLE. Any such things to generate all data from MYOWNER.MYTABLE as INSERT statements?
I know that for instance TOAD Oracle or SQL Developer can export as INSERT statements pretty fast but I need a more programmatically way for doing it. Also I cannot create any procedures or functions in the database I'm working.
Thanks.
As far as I know, there is no Oracle supplied package to do this. And I would be skeptical of any 3rd party tool that claims to accomplish this goal, because it's basically impossible.
I once wrote a package like this, and quickly regretted it. It's easy to get something that works 99% of the time, but that last 1% will kill you.
If you really need something like this, and need it to be very accurate, you must tightly control what data is allowed and what tools can be used to run the script. Below is a small fraction of the issues you will face:
Escaping
Single inserts are very slow (especially if it goes over a network)
Combining inserts is faster, but can run into some nasty parsing bugs when you start inserting hundreds of rows
There are many potential data types, including custom ones. You may only have NUMBER, VARCHAR2, and DATE now, but what happens if someone adds RAW, BLOB, BFILE, nested tables, etc.?
Storing LOBs requires breaking the data into chunks because of VARCHAR2 size limitations (4000 or 32767, depending on how you do it).
Character set issues - This will drive you ¿¿¿¿¿¿¿ insane.
Enviroment limitations - For example, SQL*Plus does not allow more than 2500 characters per line, and will drop whitespace at the end of your line.
Referential Integrity - You'll need to disable these constraints or insert data in the right order.
"Fake" columns - virtual columns, XML lobs, etc. - don't import these.
Missing partitions - If you're not using INTERVAL partitioning you may need to manually create them.
Novlidated data - Just about any constraint can be violated, so you may need to disable everything.
If you want your data to be accurate you just have to use the Oracle utilities, like data pump and export.
Why don't you use regular export ?
If you must you can generate the export script:
Let's assume a Table myTable(Name VARCHAR(30), AGE Number, Address VARCHAR(60)).
select 'INSERT INTO myTable values(''' || Name || ','|| AGE ||',''' || Address ||''');' from myTable
Oracle SQL Developer does that with it's Export feature. DDL as well as data itself.
Can be a bit unconvenient for huge tables and likely to cause issues with cases mentioned above, but works well 99% of the time.

How to protect a running column within Oracle/PostgreSQL (kind of MAX-result locking or something)

I'd need advice on following situation with Oracle/PostgreSQL:
I have a db table with a "running counter" and would like to protect it in the following situation with two concurrent transactions:
T1 T2
SELECT MAX(C) FROM TABLE WHERE CODE='xx'
-- C for new : result + 1
SELECT MAX(C) FROM TABLE WHERE CODE='xx';
-- C for new : result + 1
INSERT INTO TABLE...
INSERT INTO TABLE...
So, in both cases, the column value for INSERT is calculated from the old result added by one.
From this, some running counter handled by the db would be fine. But that wouldn't work because
the counter values or existing rows are sometimes changed
sometimes I'd like there to be multiple counter "value groups" (as with the CODE mentioned) : with different values for CODE the counters would be independent.
With some other databases this can be handled with SERIALIZABLE isolation state but at least with Oracle&Postgre the phantom reads are prevented but as the result the table ends up with two distinct rows with same counter value. This seems to have to do with the predicate locking, locking "all the possible rows covered by the query" - some other db:s end up to lock the whole table or something..
SELECT ... FOR UPDATE -statements seem to be for other purposes and don't even seem to work with MAX() -function.
Setting an UNIQUE contraint on the column would probably be the solution but are there some other ways to prevent the situation?
b.r. Touko
EDIT: One more option could probably be manual locking even though it doesn't appear nice to me..
Both Oracle and PostgreSQL support what's called sequences and the perfect fit for your problem. You can have a regular int column, but define one sequence per group, and do a single query like
--PostgreSQL
insert into table (id, ... ) values (nextval(sequence_name_for_group_xx), ... )
--Oracle
insert into table (id, ... ) values (sequence_name_for_group_xx.nextval, ... )
Increments in sequences are atomic, so your problem just wouldn't exist. It's only a matter of creating the required sequences, one per group.
the counter values or existing rows are sometimes changed
You should to put a unique constraint on that column if this would be a problem for your app. Doing so would guarantee a transaction at SERIALIZABLE isolation level would abort if it tried to use the same id as another transaction.
One more option could probably be manual locking even though it doesn't appear nice to me..
Manual locking in this case is pretty easy: just take a SHARE UPDATE EXCLUSIVE or stronger lock on the table before selecting the maximum. This will kill concurrent performance, though.
sometimes I'd like there to be multiple counter "value groups" (as with the CODE mentioned) : with different values for CODE the counters would be independent.
This leads me to the Right Solution for this problem: sequences. Set up several sequences, one for each "value group" you want to get IDs in their own range. See Section 9.15 of The Manual for the details of sequences and how to use them; it looks like they're a perfect fit for you. Sequences will never give the same value twice, but might skip values: if a transaction gets the value '2' from a sequence and aborts, the next transaction will get the value '3' rather than '2'.
The sequence answer is common, but might not be right. The viability of this solution depends on what you actually need. If what you semantically want is "some guaranteed to be unique number" then that is what a sequence is for. However, if what you want is to make sure that your value increases by exactly one on each insert (as you have asked), then DO NOT USE A SEQUENCE! I have run into this trap before myself. Sequences are not guaranteed to be sequential! They can skip numbers. Depending on what sort of optimizations you have configured, they can skip LOTS of numbers. Even if you have things configured just right so that you shouldn't skip any numbers, that is not guaranteed, and is not what sequences are for. So, you are only asking for trouble if you (mis)use them like that.
One step better solution is to bundle the select into the insert, like so:
INSERT INTO table(code, c, ...)
VALUES ('XX', (SELECT MAX(c) + 1 AS c FROM table WHERE code = 'XX'), ...);
(I haven't test run that query, but I'm pretty sure it should work. My apologies if it doesn't.) But, doing something like that reflects the semantic intent of what you are trying to do. However, this is inefficient, because you have to do a scan for MAX, and the inference I am taking from your sample is that you have a small number of code values relative to the size of the table, so you are going to do an expensive, full table scan on every insert. That isn't good. Also, this doesn't even get you the ACID guarantee you are looking for. The select is not transactionally tied to the insert. You can't "lock" the result of the MAX() function. So, you could still have two transactions running this query and they both do the sub-select and get the same max, both add one, and then both try to insert. It's a much smaller window, but you may still technically have a race condition here.
Ultimately, I would challenge that you probably have the wrong data model if you are trying to increment on insert. You should insert with a unique key, most commonly a sequence value (at least as an easy, surrogate key for any natural key). That gets the data safely inserted. Then, if you need a count of things, then have one table that stores your counts.
CREATE TABLE code_counts (
code VARCHAR(2), --or whatever
count NUMBER
);
If you really want to store the code count of each item as it is inserted, the separate count table also allows you to do so correctly, transactionally, like so:
UPDATE code_counts SET count = count + 1 WHERE code = 'XX' RETURNING count INTO :count;
INSERT INTO table(code, c, ...) VALUES ('XX', :count, ...);
COMMIT;
The key is that the update locks the counter table and reserves that value for you. Then your insert uses that value. And all of that is committed as one transactional change. You have to do this in a transaction. Having a separate count table avoids the full table scan of doing SELECT MAX().... In essense, what this does is re-implements a sequence, but it also guarantees you sequencial, ordered use.
Without knowing your whole problem domain and data model, it is hard to say, but abstracting your counts out to a separate table like this where you don't have to do a select max to get the right value is probably a good idea. Assuming, of course, that a count is what you really care about. If you are just doing logging or something where you want to make sure things are unique, then use a sequence, and a timestamp to sort by.
Note that I'm saying not to sort by a sequence either. Basically, never trust a sequence to be anything other than unique. Because when you get to caching sequence values on a multi-node system, your application might even consume them out of order.
This is why you should use the Serial datatype, which defers the lookup of C to the time of insert (which uses table locks i presume). You would then not specify C, but it would be generated automatically. If you need C for some intermediate calculation, you would need to save first, then read C and finally update with the derived values.
Edit: Sorry, I didn't read your whole question. What about solving your other problems with normalization? Just create a second table for each specific type (for each x where A='x'), where you have another auto increment. Manually edited sequences could be another column in the same table, which uses the generated sequence as a base (i.e if pk = 34 you can have another column mypk='34Changed').
You can create sequential collumn by using sequence as default value:
First, you have to create the sequence counter:
CREATE SEQUENCE SEQ_TABLE_1 START WITH 1 INCREMENT BY 1;
So, you can use it as default value:
CREATE TABLE T (
COD NUMERIC(10) DEFAULT NEXTVAL('SEQ_TABLE_1') NOT NULL,
collumn1...
collumn2...
);
Now you don't need to worry about sequence on inserting rows:
INSERT INTO T (collumn1, collumn2) VALUES (value1, value2);
Regards.

Manually inserting data in a table(s) with primary key populated with sequence

I have a number of tables that use the trigger/sequence column to simulate auto_increment on their primary keys which has worked great for some time.
In order to speed the time necessary to perform regression testing against software that uses the db, I create control files using some sample data, and added running of these to the build process.
This change is causing most of the tests to crash though as the testing process installs the schema from scratch, and the sequences are returning values that already exist in the tables. Is there any way to programtically say "Update sequences to max value in column" or do I need to write out a whole script by hand that updates all these sequences, or can I/should I change the trigger that substitutes the null value for the sequence to some how check this (though I think this might cause the mutating table problem)?
You can generate a script to create the sequences with the start values you need (based on their existing values)....
SELECT 'CREATE SEQUENCE '||sequence_name||' START WITH '||last_number||';'
FROM ALL_SEQUENCES
WHERE OWNER = your_schema
(If I understand the question correctly)
Here's a simple way to update a sequence value - in this case setting the sequence to 1000 if it is currently 50:
alter sequence MYSEQUENCE increment by 950 nocache;
select MYSEQUENCE_S.nextval from dual;
alter sequence MYSEQUENCE increment by 1;
Kudos to the creators of PL/SQL Developer for including this technique in their tool.
As part of your schema rebuild, why not drop and recreate the sequence?

Resources