Does a unique constraint on multiple columns have performance issues? - Oracle - performance

I am using an Oracle database and have a table of customer records. I want to put a unique key constraint on multiple columns, like:
CUST_ID (NUMBER),
CUST_NAME (VARCHAR2),
CUST_NIC_NO (VARCHAR2)
which together will make the unique key.
When a new record is inserted through Forms 6i and an ORA-00001 error comes back, the user will be informed that it was a duplicate record.
Please advise whether there will be any database performance issue once this table grows to 50,000 records or more.
If this is not a good practice for avoiding duplicate records, please suggest another approach.
Regards.

Unique constraints are enforced through an index, so there are additional reads involved in the enforcement process. However, the performance impact of the constraint is minimal compared to the cost of resolving duplicate keys already in the database, not to mention the business impact of that kind of data corruption.
Besides, 50,000 rows is a toy-sized table. Seriously, you won't be able to measure the difference between an insert with and without the constraint.
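For reference, a sketch of the composite constraint the question describes (the table name and normalised column names are placeholders; adjust them to your schema):
-- Declare the composite unique key; Oracle enforces it through an index.
ALTER TABLE customers
  ADD CONSTRAINT customers_uk UNIQUE (cust_id, cust_name, cust_nic_no);
-- Any insert or update that repeats an existing (cust_id, cust_name, cust_nic_no)
-- combination now fails with ORA-00001, which the form can trap and report
-- as a duplicate record.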

Related

Postgres (AWS Aurora) is not enforcing unique index/constraint

We are using Postgres for our production database; it's technically an Amazon AWS Aurora database using the 10.11 engine version. It doesn't seem to be under any unreasonable load (100-150 concurrent connections, CPU always under 10%, about 50% of the memory used, spikes to 300 write / 1500 read IOPS).
We like to ensure really good data consistency, so we make extensive use of foreign keys, triggers to validate data as it's being inserted/updated and also lots of unique constraints.
Most of the writes originate from simple REST API requests, which result in very standard insert and update queries. However, in some cases we also use triggers and functions to handle more complicated logic. For example, an update to one table will result in some fairly complicated cascading updates to other tables.
All queries are always wrapped in transactions, and for the most part we do not make use of explicit locking.
So what's wrong?
We have many (dozens of rows, across dozens of tables) instances where data exists in the database which does not conform to our unique constraints.
Sometimes the created_at and updated_at timestamps for the offending rows are identical, other times they are very similar (within half a second). This leads me to believe that this is being caused by a race condition.
We're not certain, but we are fairly confident that what these records have in common is that each write either triggered a function (the record was written by a simple insert or update and caused several other tables to be updated) or came from a function (a different record was written by a simple insert or update, which triggered a function that wrote the offending data).
From what I have been able to research, unique constraints/indexes are incredibly reliable and "just work". Is this true? If so, then why might this be happening?
Here is an example of some offending data (I've had to black out some of it, but I promise you the values in the user_id field are identical). As you will see below, there is a unique index across user_id, position, and undeleted, so the presence of this data should be impossible.
Here is an export of table structure:
-- Table Definition ----------------------------------------------
CREATE TABLE guides.preferences (
id uuid DEFAULT gen_random_uuid() PRIMARY KEY,
user_id uuid NOT NULL REFERENCES users.users(id),
guide_id uuid NOT NULL REFERENCES users.users(id),
created_at timestamp without time zone NOT NULL,
updated_at timestamp without time zone NOT NULL,
undeleted boolean DEFAULT true,
deleted_at timestamp without time zone,
position integer NOT NULL CHECK ("position" >= 0),
completed_meetings_count integer NOT NULL DEFAULT 0,
CONSTRAINT must_concurrently_set_deleted_at_and_undeleted CHECK (undeleted IS TRUE AND deleted_at IS NULL OR undeleted IS NULL AND deleted_at IS NOT NULL),
CONSTRAINT preferences_guide_id_user_id_undeleted_unique UNIQUE (guide_id, user_id, undeleted),
CONSTRAINT preferences_user_id_position_undeleted_unique UNIQUE (user_id, position, undeleted) DEFERRABLE INITIALLY DEFERRED
);
COMMENT ON COLUMN guides.preferences.undeleted IS 'Set simultaneously with deleted_at to flag this as deleted or undeleted';
COMMENT ON COLUMN guides.preferences.deleted_at IS 'Set simultaneously with deleted_at to flag this as deleted or undeleted';
-- Indices -------------------------------------------------------
CREATE UNIQUE INDEX preferences_pkey ON guides.preferences(id uuid_ops);
CREATE UNIQUE INDEX preferences_user_id_position_undeleted_unique ON guides.preferences(user_id uuid_ops, position int4_ops, undeleted bool_ops);
CREATE INDEX index_preferences_on_user_id_and_guide_id ON guides.preferences(user_id uuid_ops, guide_id uuid_ops);
CREATE UNIQUE INDEX preferences_guide_id_user_id_undeleted_unique ON guides.preferences(guide_id uuid_ops, user_id uuid_ops, undeleted bool_ops);
We're really stumped by this, and hope that someone might be able to help us. Thank you!
I found the reason! We have been building a lot of new functionality over the last few months, and have been running lots of migrations to change the schema and update data. Because of all the triggers and functions in our database, it often makes sense to temporarily disable triggers. We do this with set session_replication_role = 'replica';.
It turns out that this also disables all deferrable constraints, because deferrable constraints and foreign keys are trigger-based. As you can see from the schema in my question, the unique constraint in question is declared DEFERRABLE.
Mystery solved!
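For anyone who hits the same thing, here is a rough sketch of the failure mode and a query to find the damage afterwards (it assumes the guides.preferences schema above):
-- As described above, 'replica' stops ordinary triggers from firing, and with them
-- the trigger-based enforcement of DEFERRABLE constraints.
SET session_replication_role = 'replica';
-- ... migrations or data fixes run here; duplicates can now slip past
--     preferences_user_id_position_undeleted_unique ...
SET session_replication_role = 'origin';  -- back to normal enforcement

-- Afterwards, look for rows that violate the deferred unique constraint
-- (rows with undeleted IS NULL are skipped, since NULLs never conflict):
SELECT user_id, "position", count(*)
FROM guides.preferences
WHERE undeleted IS TRUE
GROUP BY user_id, "position"
HAVING count(*) > 1;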

Enable Constraint - Performance Impact

The statement below consumes a huge amount of time for a table containing 70 million records.
ALTER TABLE <table-name> ENABLE CONSTRAINT <constraint-name>
Does Oracle scan all the rows in the table while enabling the constraint?
Even though the constraint did get enabled, the process just hung for more than 5 hours.
Any ideas on how this can be optimized?
As others have said, depending on the constraint type you can skip validating the existing data with ALTER TABLE ... ENABLE NOVALIDATE CONSTRAINT ..., and then check that data with a separate procedure or query.
You can find the documentation on that here: https://docs.oracle.com/cd/B28359_01/server.111/b28310/general005.htm#ADMIN11546
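A sketch of that approach (the table and constraint names are placeholders):
-- Enforce the constraint for new DML only; the existing 70 million rows are not scanned.
ALTER TABLE big_table ENABLE NOVALIDATE CONSTRAINT big_table_ck;
-- Later, if required, validate the existing rows as a separate step at a quieter time.
ALTER TABLE big_table MODIFY CONSTRAINT big_table_ck VALIDATE;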

MiniProfiler SqlServerStorage becomes quite slow

We use mini profiler in two ways:
On developer machines with the pop-up
In our staging/prod environments with SqlServerStorage storing to MS SQL
After a few weeks we find that writing to the profiling DB takes a long time (seconds), and is causing real issues on the site. Truncating all profiler tables resolves the issue.
Looking through the SqlServerStorage code, it appears the inserts also do a check to make sure a row with that id doesn't already exist. Is this to ensure DB-agnostic code? It seems this would introduce a massive penalty as the number of rows increases.
How would I go about removing the performance penalty from the performance profiler? Is anyone else experiencing this slow down? Or is it something we are doing wrong?
Cheers for any help or advice.
Hmm, it looks like I made a huge mistake in how that MiniProfilers table was created when I forgot that a primary key is clustered by default... and the clustered index is a GUID column, a very big no-no.
Because data is physically stored on disk in the same order as the clustered index (indeed, one could say the table is the clustered index), SQL Server has to keep every newly inserted row in that physical order. This becomes a nightmare to keep sorted when we're using essentially a random number.
The fix is to add an auto-increasing int and switch the primary key to that, just like all the other tables (why I overlooked this, I don't remember... we don't use this storage provider here on Stack Overflow or this issue would have been found long ago).
I'll update the table creation scripts and provide you with something to migrate your current table in a bit.
Edit
After looking at this again, the main MiniProfilers table could just be a heap, meaning no clustered index. All access to the rows is by that guid ID column, so no physical ordering would help.
If you don't want to recreate your MiniProfiler SQL tables, you can use this script to make the primary key nonclustered:
-- first remove the clustered index from the primary key
declare @clusteredIndex varchar(50);
select @clusteredIndex = name
from sys.indexes
where type_desc = 'CLUSTERED'
and object_name(object_id) = 'MiniProfilers';
exec ('alter table MiniProfilers drop constraint ' + @clusteredIndex);
-- and then make it non-clustered
alter table MiniProfilers add constraint
PK_MiniProfilers primary key nonclustered (Id);
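And if you would rather go with the auto-increasing int approach mentioned above instead of leaving the table a heap, a rough sketch (the RowId column and index names are my own placeholders):
-- Add a surrogate identity column and cluster the table on it instead of the GUID.
alter table MiniProfilers add RowId int identity(1,1) not null;
create clustered index IX_MiniProfilers_RowId on MiniProfilers (RowId);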
Another Edit
Alrighty, I've updated the creation scripts and added indexes for most querying - see the code here in GitHub.
I would highly recommend dropping all your existing tables and rerunning the updated script.

One large table partitioned and then subpartitioned or several smaller partitioned tables?

I currently have several audit tables that audit specific tables' data.
e.g. ATAB_AUDIT, BTAB_AUDIT and CTAB_AUDIT auditing inserts, updates and deletes from ATAB, BTAB and CTAB respectively.
These audit tables are partitioned by year.
As the columns in these audit tables are identical (change_date, old_value, new_value, etc.), would it be beneficial to use one large audit table, add a column holding the name of the table that generated the audit record (table_name), partition it by table_name, and then subpartition by year?
The database is Oracle 11g on Solaris.
Why or why not do this?
Many thanks in advance.
I would guess that the performance characteristics would be quite similar with either approach. I would make this decision based solely on how you decide to model your data, that is, how your application(s) wish to interact with the database. I don't think your partitioning strategy would affect this decision (at least in this example).
Both approaches are valid, but sometimes people get carried away with the single-table approach and end up putting all their data in one big table. There's a name for this (anti)pattern, but it slips my mind.
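For what it's worth, the single-table variant the question describes would look roughly like this on 11g (a sketch only; the column list and partition bounds are illustrative, and the names come from the question):
-- One audit table, list-partitioned by source table and range-subpartitioned by year.
CREATE TABLE combined_audit (
  table_name  VARCHAR2(30)   NOT NULL,
  change_date DATE           NOT NULL,
  old_value   VARCHAR2(4000),
  new_value   VARCHAR2(4000)
)
PARTITION BY LIST (table_name)
SUBPARTITION BY RANGE (change_date)
SUBPARTITION TEMPLATE (
  SUBPARTITION y2011 VALUES LESS THAN (TO_DATE('2012-01-01', 'YYYY-MM-DD')),
  SUBPARTITION y2012 VALUES LESS THAN (TO_DATE('2013-01-01', 'YYYY-MM-DD')),
  SUBPARTITION y_max VALUES LESS THAN (MAXVALUE)
)
(
  PARTITION p_atab VALUES ('ATAB'),
  PARTITION p_btab VALUES ('BTAB'),
  PARTITION p_ctab VALUES ('CTAB')
);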

Restricting number of records allowed in a table in a way which can't be subverted

We have a web application (Grails) which we are going to sell licenses for based on the number of users. There is a table in the database (Oracle 10g) which holds users. Customers will host their own copy of the software and database. Can someone suggest strategies for limiting the number of records which are allowed to exist in the user table in a way which can't reasonably be subverted by the customer? Thanks.
You should at least consider avoiding all technical means here and instead insisting that your customer sign an SLSA with an audit provision, and then auditing here and there.
All these technical means introduce risks of failure, ranging from flat-out crashes to mysterious performance problems. The more stealthy and devious the mechanism, the more stealthy and devious the bugs.
It will depend on your definition of "reasonably". If they're hosting the database, they'll always be able to allow more rows.
The simplest possible solution would be an AFTER STATEMENT trigger that counted the number of rows and threw an exception if too many rows had been inserted. They could, of course, drop or disable that trigger. On the other hand, your application could also query the data dictionary to verify that the trigger was present and enabled.
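A minimal sketch of such a statement trigger (the table name and the limit of 30 are placeholders):
-- Statement-level trigger: counts rows after each insert and rejects the statement
-- once the licensed limit is exceeded.
CREATE OR REPLACE TRIGGER app_users_limit_trg
  AFTER INSERT ON app_users
DECLARE
  v_count NUMBER;
BEGIN
  SELECT COUNT(*) INTO v_count FROM app_users;
  IF v_count > 30 THEN
    RAISE_APPLICATION_ERROR(-20001, 'Licensed number of users exceeded');
  END IF;
END;
/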
You could make it more difficult for them to remove the trigger by creating a DDL trigger that looked for statements that affected this trigger or the table in question and disallowed them. That would require that the attacker find and remove that trigger as well before they could remove the trigger on the table.
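A rough sketch of such a DDL trigger (object names are again placeholders; creating it requires the ADMINISTER DATABASE TRIGGER privilege):
-- Blocks DDL (DROP, ALTER, including ALTER TRIGGER ... DISABLE) against the
-- protected table and its statement trigger.
CREATE OR REPLACE TRIGGER protect_user_limit_trg
  BEFORE DDL ON DATABASE
BEGIN
  IF ora_dict_obj_name IN ('APP_USERS', 'APP_USERS_LIMIT_TRG') THEN
    RAISE_APPLICATION_ERROR(-20002, 'DDL against this object is not allowed');
  END IF;
END;
/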
You could deliver a database job (DBMS_SCHEDULER or DBMS_JOB) that periodically ran, looked for the statement and DDL triggers, and re-created them if they were missing. The attacker could figure out that there was a database job recreating the objects and remove that job, then remove the DDL trigger, then remove the statement trigger. In this job, you could potentially send a notification back to you (via email, HTTP, or something else) alerting you to the issue, though that may be tricky from a networking standpoint: your customer's firewall may not allow outbound HTTP requests from the database server back to your servers.
If you have a license key that is being checked, you can embed the number of allowed users in that license key and bounce it against the number of rows in the table during the login process.
If the customer doesn't have access to modify the table definition, you could use a simple set of constraints on the table:
CREATE TABLE user_table
(id NUMBER PRIMARY KEY
,name VARCHAR2(100) NOT NULL
,rn NUMBER NOT NULL
,CONSTRAINT rn_check CHECK (rn = TRUNC(rn) AND rn BETWEEN 1 AND 30)
,CONSTRAINT rn_uk UNIQUE (rn)
);
Now, the column rn must take an integer value between 1 and 30, and duplicates are not allowed: thus, a maximum of 30 rows may be added.
