Identical Oracle db setups: exception on just one of them - oracle

edit: Look to the end of this question for what caused the error and how I found out.
I have a very strange exception thrown on me from Hibernate when I run an app that does batch inserts of data into an oracle database. The error comes from the Oracle database, ORA-00001, which
" means that an attempt has been made to
insert a record with a duplicate
(unique) key. This error will also be
generated if an existing record is
updated to generate a duplicate
(unique) key."
The error is weird because I have created the same table (exactly same definition) on another machine where I do NOT get the same error if I use that through my app. AND all the data get inserted into the database, so nothing is really rejected.
There has to be something different between the two setups, but the only thing I can see that is different is the banner output that I get when issuing
select * from v$version where banner like 'Oracle%';
The database that gives me trouble:
Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - Prod
The one that works:
Oracle Database 10g Release 10.2.0.3.0 - 64bit Production
Table definitions, input, and the app I wrote is the same for both. The table involved is basically a four column table with a composite id (serviceid, date, value1, value2) - nothing fancy.
Any ideas on what can be wrong? I have started out clean several times, dropping both tables to start on equal grounds, but I still get the error from the database.
Some more of the output:
Caused by: java.sql.BatchUpdateException: ORA-00001: unique constraint (STATISTICS.PRIMARY_KEY_CONSTRAINT) violated
at oracle.jdbc.driver.DatabaseError.throwBatchUpdateException(DatabaseError.java:367)
at oracle.jdbc.driver.OraclePreparedStatement.executeBatch(OraclePreparedStatement.java:8728)
at org.hibernate.jdbc.BatchingBatcher.doExecuteBatch(BatchingBatcher.java:70)
How I found out what caused the problem
Thanks to APC and ik_zelf I was able to pinpoint the root cause of this error. It turns out the Quartz scheduler was wrongly configured for the production database (where the error turned up).
For the job running against the non-failing oracle server I had <cronTriggerExpression>0/5 * * * * ?</cronTriggerExpression> which ran the batch job every five seconds. I figured that once a minute was sufficent for the other oracle server, and set the quartz scheduler up with * */1 * * * ?. This turns out to be wrong, and instead of running every minute, this ran every second!
Each job took approximately 1.5-2 seconds, and thus two or more jobs were running concurrently, thus causing simultaneous inserts on the server. So instead of inserting 529 elements, I was getting anywhere from 1000 to 2000 inserts. Changing the crontrigger expression to the same as the other one, running every five seconds, fixed the problem.
To find out what was wrong I had to set true in hibernate.cfg.xml and disable the primary key constraint on the table.
-- To catch exceptions
-- to find the offending rows run the following query
-- SELECT * FROM uptime_statistics, EXCEPTIONS WHERE MY_TABLE.rowid = EXCEPTIONS.row_id;
create table exceptions(row_id rowid,
owner varchar2(30),
table_name varchar2(30),
constraint varchar2(30));
-- This table was set up
CREATE TABLE MY_TABLE
(
LOGDATE DATE NOT NULL,
SERVICEID VARCHAR2(255 CHAR) NOT NULL,
PROP_A NUMBER(10,0),
PROP_B NUMBER(10,0),
CONSTRAINT PK_CONSTRAINT PRIMARY KEY (LOGDATE, SERVICEID)
);
-- Removed the constraint to see what was inserted twice or more
alter table my_table
disable constraint PK_CONSTRAINT;
-- Enable this later on to find rows that offend the constraints
alter table my_table
enable constraint PK_CONSTRAINT
exceptions into exceptions;

You have a unique compound constraint. ORA-00001 means that you have two or more rows which have duplicate values in ServiceID, Date, Value1 and/or Value2. You say the input is the same for both databases. So either:
you are imagining that your program is hurling ORA-00001
you are mistaken that the input is the same in both runs.
The more likely explanation is the second one: one or more of your key columns is populated by an external source or default value (e.g. code table for ServiceId or SYSDATE for the date column). In your failing database this automatic population is failing to provide a unique value. There can be any number of reasons why this might be so, depending on what mechanism(s) you're using. Remember that in a unique compound key NULL entries count. That is, you can have any number of records (NULL,NULL.NULL,NULL) but only one for (42,NULL,NULL,NULL).
It is hard for us to guess what the actual problem might be, and almost as hard for you (although you do have the advantage of being the code's author, which ought to grant you some insight). What you need is some trace statements. My preferred solution would be to use Bulk DML Exception Handling but then I am a PL/SQL fan. Hibernate allows you to hook in some logging to your programs: I suggest you switch it on. Code is a heck of a lot easier to debug when it has decent instrumentation.
As a last resort, disable the constraint before running the batch insert. Afterwards re-enable it like this:
alter table t42
enable constraint t42_uk
exceptions into my_exceptions
/
This will fail if you have duplicate rows but crucially the MY_EXCEPTIONS table will list all the rows which clash. That at least will give you some clue as to the source of the duplication. If you don't already have an exceptions table you will have to run a script: $ORACLE_HOME/rdbms/admin/utlexcptn.sql ( you may need a DBA to gain access to this directory).
tl;dr
insight requires information: instrument your code.

The one that has problems is a EE and the other looks like a SE database. I expect that the first is on quicker hardware. If that is the case, and your date column is filled using SYSDATE, it could very well be that the time resolution is not enough; that you get duplicate date values. If the other columns of your data are also not unique, you get ORA-00001.
It's a long shot but at first sight I would look into this direction.
Can you use an exception table to identify the data? See Reporting Constraint Exceptions

My guess would be the service id. Whatever service_id hibernate is using for the 'fresh' insert has already been used.
Possibly the table is empty in one database but populated in another
I'm betting though that the service_id is sequence generated and the sequence number is out of sync with the data content. So you have the same 1000 rows in the table but doing
SELECT service_id_seq.nextval FROM DUAL
in one database gives a lower number than the other. I see this a lot where the sequence has been created (eg out of source control) and data has been imported into the table from another database.

Related

Postgres (AWS Aurora) is not enforcing unique index/constraint

We are using Postgres for our production database, it's technically an Amazon AWS Aurora database using the 10.11 engine version. It doesn't seem to be under any unreasonable load (100-150 concurrent connections, CPU always under 10%, about 50% of the memory used, spikes to 300 write IOPS / 1500 read IOPS per second).
We like to ensure really good data consistency, so we make extensive use of foreign keys, triggers to validate data as it's being inserted/updated and also lots of unique constraints.
Most of the writes originate from simple REST API requests, which result in very standard insert and update queries. However, in some cases we also use triggers and functions to handle more complicated logic. For example, an update to one table will result in some fairly complicated cascading updates to other tables.
All queries are always wrapped in transactions, and for the most part we do not make use of explicit locking.
So what's wrong?
We have many (dozens of rows, across dozens of tables) instances where data exists in the database which does not conform to our unique constraints.
Sometimes the created_at and updated_at timestamps for the offending rows are identical, other times they are very similar (within half a second). This leads me to believe that this is being caused by a race condition.
We're not certain, but are fairly confident that the thing in common with these records is that the writes either triggered a function (the record was written from a simple insert or update, and caused several other tables to be updated) or that the write came from a function (a different record was written from a simple insert or update, which triggered a function that wrote the offending data).
From what I have been able to research, unique constraints/indexes are incredibly reliable and "just work". Is this true? If so, then why might this be happening?
Here is an example of some offending data, I've had to black out some of it, but I promise you the values in the user_id field are identical. As you will see below, there is a unique index across user_id, position, and undeleted. So the presence of this data should be impossible.
Here is an export of table structure:
-- Table Definition ----------------------------------------------
CREATE TABLE guides.preferences (
id uuid DEFAULT gen_random_uuid() PRIMARY KEY,
user_id uuid NOT NULL REFERENCES users.users(id),
guide_id uuid NOT NULL REFERENCES users.users(id),
created_at timestamp without time zone NOT NULL,
updated_at timestamp without time zone NOT NULL,
undeleted boolean DEFAULT true,
deleted_at timestamp without time zone,
position integer NOT NULL CHECK ("position" >= 0),
completed_meetings_count integer NOT NULL DEFAULT 0,
CONSTRAINT must_concurrently_set_deleted_at_and_undeleted CHECK (undeleted IS TRUE AND deleted_at IS NULL OR undeleted IS NULL AND deleted_at IS NOT NULL),
CONSTRAINT preferences_guide_id_user_id_undeleted_unique UNIQUE (guide_id, user_id, undeleted),
CONSTRAINT preferences_user_id_position_undeleted_unique UNIQUE (user_id, position, undeleted) DEFERRABLE INITIALLY DEFERRED
);
COMMENT ON COLUMN guides.preferences.undeleted IS 'Set simultaneously with deleted_at to flag this as deleted or undeleted';
COMMENT ON COLUMN guides.preferences.deleted_at IS 'Set simultaneously with deleted_at to flag this as deleted or undeleted';
-- Indices -------------------------------------------------------
CREATE UNIQUE INDEX preferences_pkey ON guides.preferences(id uuid_ops);
CREATE UNIQUE INDEX preferences_user_id_position_undeleted_unique ON guides.preferences(user_id uuid_ops,position int4_ops,undeleted bool_ops);
CREATE INDEX index_preferences_on_user_id_and_guide_id ON guides.preferences(user_id uuid_ops,guide_id uuid_ops);
CREATE UNIQUE INDEX preferences_guide_id_user_id_undeleted_unique ON guides.preferences(guide_id uuid_ops,user_id uuid_ops,undeleted bool_ops);
We're really stumped by this, and hope that someone might be able to help us. Thank you!
I found it the reason! We have been building a lot of new functionality over the last few months, and have been running lots of migrations to change schema and update data. Because of all the triggers and functions in our database, it often makes sense to temporarily disable triggers. We do this with “set session_replication_role = ‘replica’;”.
Turns out that this also disables all deferrable constraints, because deferrable constraints and foreign keys are trigger based. As you can see from the schema in my question, the unique constraint in question is set as deferrable.
Mystery solved!

Operations on certain tables won't finish

We have a table TRANSMISSIONS(ID, NAME) which behaves funny in the following ways:
The statement to add a foreign key in another table referencing TRANSMISSIONS.ID won't finish
The statement to add a column to TRANSMISSIONS won't finish
The statement to disable/drop a unique constraint won't finish
The statement to disable/drop a trigger won't finish
TRANSMISSION's primary key is ID, there is also a unique constraint on NAME - therefore there are indexes on ID and NAME. We also have a trigger which creates values for column ID using a sequence, so that INSERT statements do not need to provide a value for ID.
Besides TRANSMISSIONS, there are two more tables behaving like this. For other tables, the above-mentioned statements work fine.
The database is used in an application with Hibernate and due to an incorrect JPA configuration we produced high values for ID during a time. Note that we use the trigger only for "manual" INSERT statements and that Hibernate produces ID values itself, also using the sequence.
The first thought was that the problems were due to the high IDs but we have the problems also with tables that never had such high IDs.
Anyways we suspected that the indexes might be fragmented somehow and called ALTER INDEX TRANSMISSIONS_PK SHRINK SPACE COMPACT, which ran through but showed no effect.
Also, we wanted to call ALTER TABLE TRANSMISSIONS SHRINK SPACE COMPACT which didn't work because we needed to call first ALTER TABLE TRANSMISSIONS ENABLE ROW MOVEMENT which never finished.
We have another instance of the database which does not behave in such a funny way. So we think it might be that in the course of running the application the database got somehow into an inconsistent state.
Does someone have any suggestions what might have gone out of control/into an inconsitent state?
More hints:
There are no locks present on any objects in the database (according to information in v$lock and v$locked_object)
We tried all these statements in SQL Developer and also using SQLPlus (command-line tool).

Sybase ASE remote row insert locking

Im working on an application which access a Sybase ASE 15.0.2 ,where the current code access a remote database
(CIS) to insert a row using a proxy table definition (the destination table is a DOL - DRL table - The PK
row is defined as identity ,and is always growing). The current code performs a select to check if the row
already exists to avoid duplicate data to be inserted.
Since the remote table also have a PK definition on the table, i do understand that the PK verification will
be done again prior to commiting the row.
Im planning to remove the select check since its being effectively done again by the PK verification,
but im concerned about if when receiving a file with many duplicates, the table may suffer
some unecessary contention when the data is tried to be commited.
Its not clear to me if Sybase ASE tries to hold the last row and writes the data prior to check for the
duplicate. Also, if the table is very big, im concerned also about the time it will spend looking the
whole index to find duplicates.
I've found some documentation for SQL anywhere, but not ASE in the following link
http://dcx.sybase.com/1200/en/dbusage/insert-how-transact.html
The best i could find is the following explanation
https://groups.google.com/forum/?fromgroups#!topic/comp.databases.sybase/tHnOqptD7X8
But it doesn't enlighten in details how the row is locked (and if there is any kind of
optimization to write it ahead or at the same time of PK checking)
, and also if it will waste a full PK look if im positively inserting a row which the PK
positively greater than the last row commited
Thanks
Alex
Unlike SqlAnywhere there is no option for ASE to set wait_for_commit. The primary key constraint is checked during the insert and not at the commit time. The problem as I understand from your post I see is if you have a mass insert from a file that may contain duplicates is to load into a temp table , check for duplicates, remove the duplicates and then insert the unique ones. Mass insert are lot faster though it still checks for primary key violations. However there is no cost associated as there is no rolling back. The insert statement is always all or nothing. Even if one row is duplicate the entire insert statement will fail. Check before insert in more of error free approach as opposed to use of constraint to the verification because it is going to fail and rollback is going to again be costly.
Thanks Mike
The link does have a very quick explanation about the insert from the CIS perspective. Its a variable to keep an eye on given that CIS may become a representative time consumer
if its performing data and syntax checking if it will be done again when CIS forwards the insert statement to the target server. I was afraid that CIS could have some influence beyond the network traffic/time over the locking/PK checking
Raju
I do agree that avoiding the PK duplication by checking if the row already exists by running a select and doing in a batch, but im currently looking for a stop gap solution, and that may be to perform the insert command in batches of about 50 rows and leave the
duplicate key check for the PK.
Hopefully the PK check will be done over a join of the 50 newly inserted rows, and thus
avoid to traverse the index for each single row...
Ill try to test this and comment back
Alex

Best way to add column with default value while under load

When adding a column to a table that has a default value and a constraint of not null. Is it better to run as a single statement or to break it into steps while the database is under load.
ALTER TABLE user ADD country VARCHAR2(4) DEFAULT 'GB' NOT NULL
VERSUS
ALTER TABLE user ADD country VARCHAR2(2)
UPDATE user SET country = 'GB'
COMMIT
ALTER TABLE user MODIFY country DEFAULT 'GB' NOT NULL
Performance depends on the Oracle version you use. Locks are generated anyway.
If version <= Oracle 11.1 then #1 does the same as #2. It is slow anyway.
Beginning with Oracle 11.2, Oracle introduced a great optimization for the first statement (one command doing it all). You don't need to change the command - Oracle just behaves differently. It stores the default value only in data dictionary instead of updating each physical row.
But I also have to say, that I encountered some bugs in the past related to this feature (in Oracle 11.2.0.1)
failure of traditional import if export was done with direct=Y
merge statement can throw an ORA-600 [13013] (internal oracle error)
a performance problem in queries using such tables
I think this issues are fixed in current version 11.2.0.3, so I can recommend to use this feature.
Some time ago we have evaluated possible solutions of the same problem. On our project we had to remove all indexes on table, perform altering and restore indexes back.
If your system needs to be using the table then DBMS_Redefinition is really your only choice.

Restricting number of records allowed in a table in a way which can't be subverted

We have a web application (Grails) which we are going to sell licenses for based on the number of users. There is a table in the database (Oracle 10g) which holds users. Customers will host their own copy of the software and database. Can someone suggest strategies for limiting the number of records which are allowed to exist in the user table in a way which can't reasonably be subverted by the customer? Thanks.
You should at least consider avoiding all technical means here and instead insisting that your customer sign an SLSA with an audit provision, and then audit here and there.
All these technical means introduce risks of failure, ranging from flat-out crashes to mysterious performance problems. The more stealthy and devious, the more stealthy and devious the bugs.
It will depend on your definition of "reasonably". If they're hosting the database, they'll always be able to allow more rows.
The simplest possible solution would be an AFTER STATEMENT trigger that counted the number of rows and threw an exception if too many rows had been inserted. They could, of course, drop or disable that trigger. On the other hand, your application could also query the data dictionary to verify that the trigger was present and enabled.
You could make it more difficult for them to remove the trigger by creating a DDL trigger that looked for statements that affected this trigger or the table in question and disallowed them. That would require that the attacker find and remove that trigger as well before they could remove the trigger on the table.
You could deliver a database job (DBMS_SCHEDULER or DBMS_JOB) that periodically ran, looked for the statement and DDL triggers and re-created them if they were missing. The attacker could figure out that there was a database job that was recreating the objects and remove that job, then remove the DDL trigger, then remove the statement trigger. In this job, you could potentially send a notification back to you (via email or http or something else) alerting you to the issue though that may be tricky from a networking standpoint-- your customer's firewall may not allow outbound HTTP requests from the database server back to your servers.
If you have a license key that is being checked, you can embed the number of users allowed in that license key and bounce that against the number of rows in the table during the login table.
If the customer doesn't have access to modify the table definition, you could use a simple set of constraints on the table:
CREATE TABLE user_table
(id NUMBER PRIMARY KEY
,name VARCHAR2(100) NOT NULL
,rn NUMBER NOT NULL
,CONSTRAINT rn_check CHECK (rn = TRUNC(rn) AND rn BETWEEN 1 AND 30)
,CONSTRAINT rn_uk UNIQUE (rn)
);
Now, the column rn must take an integer value between 1 and 30, and duplicates are not allowed: thus, a maximum of 30 rows may be added.

Resources