Postgres (AWS Aurora) is not enforcing unique index/constraint

We are using Postgres for our production database; technically it's an Amazon AWS Aurora database running engine version 10.11. It doesn't seem to be under any unreasonable load (100-150 concurrent connections, CPU always under 10%, about 50% of memory used, spikes to 300 write IOPS / 1,500 read IOPS).
We like to ensure really good data consistency, so we make extensive use of foreign keys, triggers that validate data as it is inserted or updated, and lots of unique constraints.
Most of the writes originate from simple REST API requests, which result in very standard insert and update queries. However, in some cases we also use triggers and functions to handle more complicated logic. For example, an update to one table will result in some fairly complicated cascading updates to other tables.
All queries are always wrapped in transactions, and for the most part we do not make use of explicit locking.
So what's wrong?
We have many (dozens of rows, across dozens of tables) instances where data exists in the database which does not conform to our unique constraints.
Sometimes the created_at and updated_at timestamps for the offending rows are identical; other times they are very similar (within half a second). This leads me to believe that a race condition is the cause.
We're not certain, but we're fairly confident that the thing these records have in common is that the write either fired a function (the record was written by a simple insert or update and caused several other tables to be updated) or was itself performed by a function (a different record was written by a simple insert or update, which fired a function that wrote the offending data).
From what I have been able to research, unique constraints/indexes are incredibly reliable and "just work". Is this true? If so, then why might this be happening?
Here is an example of some offending data. I've had to black out some of it, but I promise you the values in the user_id field are identical. As you will see below, there is a unique index across user_id, position, and undeleted, so the presence of this data should be impossible.
Here is an export of table structure:
-- Table Definition ----------------------------------------------
CREATE TABLE guides.preferences (
    id uuid DEFAULT gen_random_uuid() PRIMARY KEY,
    user_id uuid NOT NULL REFERENCES users.users(id),
    guide_id uuid NOT NULL REFERENCES users.users(id),
    created_at timestamp without time zone NOT NULL,
    updated_at timestamp without time zone NOT NULL,
    undeleted boolean DEFAULT true,
    deleted_at timestamp without time zone,
    position integer NOT NULL CHECK ("position" >= 0),
    completed_meetings_count integer NOT NULL DEFAULT 0,
    CONSTRAINT must_concurrently_set_deleted_at_and_undeleted CHECK (undeleted IS TRUE AND deleted_at IS NULL OR undeleted IS NULL AND deleted_at IS NOT NULL),
    CONSTRAINT preferences_guide_id_user_id_undeleted_unique UNIQUE (guide_id, user_id, undeleted),
    CONSTRAINT preferences_user_id_position_undeleted_unique UNIQUE (user_id, position, undeleted) DEFERRABLE INITIALLY DEFERRED
);
COMMENT ON COLUMN guides.preferences.undeleted IS 'Set simultaneously with deleted_at to flag this as deleted or undeleted';
COMMENT ON COLUMN guides.preferences.deleted_at IS 'Set simultaneously with deleted_at to flag this as deleted or undeleted';
-- Indices -------------------------------------------------------
CREATE UNIQUE INDEX preferences_pkey ON guides.preferences(id uuid_ops);
CREATE UNIQUE INDEX preferences_user_id_position_undeleted_unique ON guides.preferences(user_id uuid_ops, position int4_ops, undeleted bool_ops);
CREATE INDEX index_preferences_on_user_id_and_guide_id ON guides.preferences(user_id uuid_ops, guide_id uuid_ops);
CREATE UNIQUE INDEX preferences_guide_id_user_id_undeleted_unique ON guides.preferences(guide_id uuid_ops, user_id uuid_ops, undeleted bool_ops);
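(For reference, violations like this can be surfaced with a simple grouping query; a minimal sketch against the schema above:)
-- Sketch: find rows that violate the (user_id, position, undeleted) unique key.
SELECT user_id, position, undeleted, count(*) AS duplicates
FROM guides.preferences
GROUP BY user_id, position, undeleted
HAVING count(*) > 1;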
We're really stumped by this, and hope that someone might be able to help us. Thank you!

I found the reason! We have been building a lot of new functionality over the last few months, and have been running lots of migrations to change schema and update data. Because of all the triggers and functions in our database, it often makes sense to temporarily disable triggers. We do this with "set session_replication_role = 'replica';".
It turns out that this also disables all deferrable constraints, because deferrable constraints and foreign keys are enforced by internal triggers. As you can see from the schema in my question, the unique constraint in question is declared as deferrable.
Mystery solved!
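For anyone who wants to see the effect, here is a minimal sketch of what we believe happened (the table is hypothetical, but it mirrors the deferrable unique constraint above):
-- Hypothetical table mirroring the deferrable unique constraint from the schema above.
CREATE TABLE t (
    user_id integer NOT NULL,
    position integer NOT NULL,
    CONSTRAINT t_user_id_position_unique UNIQUE (user_id, position) DEFERRABLE INITIALLY DEFERRED
);

BEGIN;
SET session_replication_role = 'replica';          -- disables triggers, including the internal ones that enforce deferrable constraints
INSERT INTO t (user_id, position) VALUES (1, 1);
INSERT INTO t (user_id, position) VALUES (1, 1);   -- duplicate, but no error because the deferred recheck never fires
COMMIT;                                            -- succeeds, and the duplicate rows are persisted
SET session_replication_role = 'origin';           -- restore normal enforcement for subsequent writes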

Related

Operations on certain tables won't finish

We have a table TRANSMISSIONS(ID, NAME) which behaves funny in the following ways:
The statement to add a foreign key in another table referencing TRANSMISSIONS.ID won't finish
The statement to add a column to TRANSMISSIONS won't finish
The statement to disable/drop a unique constraint won't finish
The statement to disable/drop a trigger won't finish
The primary key of TRANSMISSIONS is ID, and there is also a unique constraint on NAME; therefore there are indexes on ID and NAME. We also have a trigger which creates values for column ID using a sequence, so that INSERT statements do not need to provide a value for ID.
Besides TRANSMISSIONS, there are two more tables behaving like this. For other tables, the above-mentioned statements work fine.
The database is used in an application with Hibernate, and due to an incorrect JPA configuration we produced very high values for ID for a time. Note that we use the trigger only for "manual" INSERT statements and that Hibernate produces ID values itself, also using the sequence.
The first thought was that the problems were due to the high IDs, but we also have the problems with tables that never had such high IDs.
Anyway, we suspected that the indexes might be fragmented somehow and ran ALTER INDEX TRANSMISSIONS_PK SHRINK SPACE COMPACT, which ran through but had no effect.
We also wanted to run ALTER TABLE TRANSMISSIONS SHRINK SPACE COMPACT, which didn't work because we first needed to run ALTER TABLE TRANSMISSIONS ENABLE ROW MOVEMENT, which never finished.
We have another instance of the database which does not behave in this funny way, so we think that in the course of running the application the database somehow got into an inconsistent state.
Does anyone have suggestions as to what might have gone out of control or into an inconsistent state?
More hints:
There are no locks present on any objects in the database (according to information in v$lock and v$locked_object)
We tried all these statements in SQL Developer and also using SQL*Plus (command-line tool).
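The kind of lock check we mean is roughly this (a sketch over the standard dictionary views, not necessarily the exact query we ran):
-- Sketch: list sessions holding or waiting for locks on database objects.
SELECT lo.session_id, lo.oracle_username, o.object_name, l.type, l.lmode, l.request
FROM v$locked_object lo
JOIN dba_objects o ON o.object_id = lo.object_id
JOIN v$lock l ON l.sid = lo.session_id;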

Creating a Trigger

How do I create a trigger so that nobody is able to rent a movie if their unpaid balance exceeds 50 dollars?
What you have here is a cross-row table constraint - i.e. you can't just put a single Oracle CONSTRAINT on a column, as these can only look at data within a single row at a time.
Oracle has support for only two cross-row constraint types - uniqueness (e.g. primary keys and unique constraints) and referential integrity (foreign keys).
In your case, you'll have to hand-code the constraint yourself - and with that comes the responsibility to ensure that the constraint is not violated in the presence of multiple sessions, each of which cannot see data inserted/updated by other concurrent sessions (at least, until they commit).
A simplistic approach is to add a trigger that issues a query to count how many records conflict with the new record; but this won't work, because the trigger cannot see rows that have been inserted or updated by other sessions but not yet committed. So the trigger will sometimes allow a member to rent a movie even though they are over the limit, as long as (for example) they get two cashiers to enter the data at separate terminals at the same time.
One way to get around this problem is to put some element of serialization in - e.g. the trigger would first request a lock on the member record (e.g. with a SELECT FOR UPDATE) before it's allowed to check the rentals; that way, if a 2nd session tries to insert rentals, it will wait until the first session does a commit or rollback.
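A sketch of that serialization approach (the table and column names here, members, rentals and unpaid_balance, are hypothetical since the schema isn't shown):
-- Sketch only: lock the member row first, then check the cross-row condition.
CREATE OR REPLACE TRIGGER rentals_balance_check
BEFORE INSERT ON rentals
FOR EACH ROW
DECLARE
    v_balance members.unpaid_balance%TYPE;
BEGIN
    -- Serialize on the member row so concurrent sessions queue here until commit/rollback.
    SELECT unpaid_balance
      INTO v_balance
      FROM members
     WHERE member_id = :NEW.member_id
       FOR UPDATE;

    IF v_balance > 50 THEN
        RAISE_APPLICATION_ERROR(-20001, 'Unpaid balance exceeds 50; rental not allowed.');
    END IF;
END;
/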
Another way around this problem is to use an aggregating Materialized View, which would be based on a query that is designed to find any rows that fail the test; the expectation is that the MV will be empty, and you put a table constraint on the MV such that if a row was ever to appear in the MV, the constraint would be violated. The effect of this is that any statement that tries to insert rows that violate the constraint will cause a constraint violation when the MV is refreshed.
Writing the query for this based on your design is left as an exercise for the reader :)
If you want to restrict something about your table data, then you should have a look at constraints, not triggers.
Constraints ensure that some condition holds for your table data, as in your example.
Triggers fire when some action (i.e. INSERT, UPDATE, DELETE) takes place, and you can then do some work as a reaction to that action.

Does a unique constraint on multiple columns have performance issues? - Oracle

I am using an Oracle database and I have a table for customer records, and I want to put a unique key constraint on multiple varchar2 columns, like:
CUST_ID (Number),
CUST_Name(varchar2),
Cust_N.I.C_NO(varchar2) will make a unique key.
When inserting a new record through Forms 6i, if an ORA-00001 error comes up, the user will be informed that it was a duplicate record.
Please advise me whether there will be any database performance issue when the records in this table exceed 50,000 or more.
If this is not a good practice for avoiding duplicate records, then please suggest another approach.
regards.
Unique constraints are enforced through an index, so there are additional reads involved in the enforcement process. However, the performance impact of the constraint is minimal compared to the performance impact incurred by resolving duplicate keys in the database, not to mention the business impact of such data corruption.
Besides, 50,000 rows is a toy-sized table. Seriously, you won't be able to measure the difference between an insert with and without the constraint.
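If you do go with the constraint, the definition itself is a one-liner; a sketch based on the columns in the question (the table name customers and the exact column spellings are assumptions):
-- Sketch: composite unique key on the columns from the question.
ALTER TABLE customers
    ADD CONSTRAINT customers_uk UNIQUE (cust_id, cust_name, cust_nic_no);
-- Oracle enforces this through an index; a duplicate insert raises ORA-00001,
-- which the form can catch and report as a duplicate record.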

Oracle - index-organized table

What is the use case of an IOT (Index Organized Table)?
Let's say I have a table like:
id
Name
surname
I know what an IOT is, but I'm a bit confused about its use cases.
Your three columns don't make a good use case.
IOTs are most useful when you often access many consecutive rows from a table. You then define a primary key such that the required order is represented.
A good example could be time series data such as historical stock prices. In order to draw a chart of the stock price of a share, many rows are read with consecutive dates.
So the primary key would be stock ticker (or security ID) and the date. The additional columns could be the last price and the volume.
A regular table - even with an index on ticker and date - would be much slower because the actual rows would be distributed over the whole disk. This is because you cannot influence the order of the rows and because data is inserted day by day (and not ticker by ticker).
In an index-organized table, the data for the same ticker ends up on a few disk pages, and the required disk pages can be easily found.
Setup of the table:
CREATE TABLE MARKET_DATA
(
    TICKER     VARCHAR2(20 BYTE) NOT NULL ENABLE,
    P_DATE     DATE NOT NULL ENABLE,
    LAST_PRICE NUMBER,
    VOLUME     NUMBER,
    CONSTRAINT MARKET_DATA_PK PRIMARY KEY (TICKER, P_DATE) ENABLE
)
ORGANIZATION INDEX;
Typical query:
SELECT TICKER, P_DATE, LAST_PRICE, VOLUME
FROM MARKET_DATA
WHERE TICKER = 'MSFT'
AND P_DATE BETWEEN SYSDATE - 1825 AND SYSDATE
ORDER BY P_DATE;
Think of index-organized tables as indexes. We all know the point of an index: to improve access speed to particular rows of data. There is a performance optimisation trick of building compound indexes on sub-sets of columns which can be used to satisfy commonly-run queries. If an index can completely satisfy the columns in a query's projection, the optimizer knows it doesn't have to read from the table at all.
IOTs are just this approach taken to its logical conclusion: build the index and throw away the underlying table.
There are two criteria for deciding whether to implement a table as an IOT:
It should consist of a primary key (one or more columns) and at most one other column (okay, perhaps two other columns at a stretch, but that's a warning flag).
The only access route for the table is the primary key (or its leading columns).
That second point is the one which catches most people out, and is the main reason why the use cases for IOT are pretty rare. Oracle don't recommend building other indexes on an IOT, so that means any access which doesn't drive from the primary key will be a Full Table Scan. That might not matter if the table is small and we don't need to access it through some other path very often, but it's a killer for most application tables.
It is also likely that a candidate table will have a relatively small number of rows, and is likely to be fairly static. But this is not a hard-and-fast rule; certainly a huge, volatile table which matched the two criteria listed above could still be considered for implementation as an IOT.
So what makes a good candidate for index organization? Reference data. Most code lookup tables look something like this:
code        number not null primary key
description varchar2(30) not null
Almost always we're only interested in getting the description for a given code. So building it as an IOT will save space and reduce the access time to get the description.
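A sketch of such a lookup table built as an IOT (generic names):
-- Sketch: a code/description lookup table stored as an index-organized table.
CREATE TABLE lookup_codes
(
    code        NUMBER       NOT NULL,
    description VARCHAR2(30) NOT NULL,
    CONSTRAINT lookup_codes_pk PRIMARY KEY (code)
)
ORGANIZATION INDEX;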

SQL Timeout and indices

Changing and finding stuff in a database containing a few dozen tables, with around half a million rows in the big ones, I'm running into timeouts quite often.
Some of these timeouts I don't understand. For example, I have this table:
CREATE TABLE dbo.[VPI_APO]
(
    [Key] bigint IDENTITY(1,1) NOT NULL CONSTRAINT [PK_VPI_APO] PRIMARY KEY,
    [PZN] nvarchar(7) NOT NULL,
    [Key_INB] nvarchar(5) NOT NULL
) ON [PRIMARY]
GO
ALTER TABLE dbo.[VPI_APO] ADD CONSTRAINT [IX_VPI_APOKey_INB] UNIQUE NONCLUSTERED
(
    [PZN],
    [Key_INB]
) ON [PRIMARY]
GO
I often get timeouts when I search an item in this table like this (during inserting high volumes of items):
SELECT [Key] FROM dbo.[VPI_APO] WHERE ([PZN] = @Search1) AND ([Key_INB] = @Search2)
These timeouts when searching on the unique constraint columns happen quite often. I expected unique constraints to have the same benefits as indexes; was I mistaken? Do I need an index on these fields, too?
Or will I have to search differently to benefit of the constraint?
I'm using SQL Server 2008 R2.
A unique constraint creates an index. Based on the query and the constraint defined, you should be in a covering situation, meaning the index provides everything the query needs without having to go back to the clustered index or the heap to retrieve data. I suspect that your index is not sufficiently selective or that your statistics are out of date. First, try updating the statistics with sp_updatestats. If that doesn't change the behavior, try UPDATE STATISTICS VPI_APO WITH FULLSCAN. If neither of those works, you need to examine the selectivity of the index using DBCC SHOW_STATISTICS.
My first thought is parameter sniffing.
If you're on SQL Server 2008, try "OPTIMIZE FOR UNKNOWN" rather than parameter masking.
My second thought is to change the unique constraint to a unique index and INCLUDE the Key column explicitly. Internally they are the same, but as an index you have some more flexibility (e.g. filtered indexes, INCLUDE columns, etc.).
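Sketches of both suggestions, using the names from the question (the parameter names assume T-SQL variables, and the new index name is arbitrary):
-- 1) Sidestep parameter sniffing with OPTIMIZE FOR UNKNOWN (SQL Server 2008+).
SELECT [Key]
FROM dbo.[VPI_APO]
WHERE [PZN] = @Search1 AND [Key_INB] = @Search2
OPTION (OPTIMIZE FOR UNKNOWN);

-- 2) Replace the unique constraint with a unique index that INCLUDEs [Key],
--    so the lookup is fully covered by the index.
ALTER TABLE dbo.[VPI_APO] DROP CONSTRAINT [IX_VPI_APOKey_INB];
CREATE UNIQUE NONCLUSTERED INDEX [IX_VPI_APO_PZN_Key_INB]
ON dbo.[VPI_APO] ([PZN], [Key_INB])
INCLUDE ([Key]);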
