Redshift: How to fix serializable isolation violation (1023) caused by concurrent MERGE operations?

My use case is to extract, transform and load data incrementally and in real time from x number of Lambda functions. I expect multiple Lambda functions to be running concurrently, with Redshift remaining available for read queries.
Since Redshift doesn't enforce primary key constraints, I'm following the AWS documentation's merge examples ("Example of a merge that replaces existing rows") to enforce unique rows. This method works fine when only one instance of the Lambda function is running.
-- Start a new transaction
begin transaction;
-- Delete any rows from SALES that exist in STAGESALES, because they are updates
-- The join includes a redundant predicate to collocate on the distribution key
-- A filter on saletime enables a range-restricted scan on SALES
delete from sales
using stagesales
where sales.salesid = stagesales.salesid
and sales.listid = stagesales.listid
and sales.saletime > '2008-11-30';
-- Insert all the rows from the staging table into the target table
insert into sales
select * from stagesales;
-- End transaction and commit
end transaction;
-- Drop the staging table
drop table stagesales;
But as soon as more than one Lambda function runs concurrently against the same table, I receive:
"ERROR: 1023 DETAIL: Serializable isolation violation on table in Redshift" when performing operations in a transaction concurrently with another session.
How should I modify this example to allow it to run in a concurrent environment?

The issue you are running into is that you have multiple Lambda functions executing DML on the same table concurrently. Redshift requires concurrent transactions to be serializable, i.e. they must not modify the same data at the same time; when they do, Redshift aborts one or more of the transactions so that everything which does get executed remains serializable.
Your current design will not work properly when scaled to more than one Lambda function because of this restriction in the way Redshift works. You will need to devise a way of managing the Lambda functions so that conflicting DML statements are not run concurrently on the same table. It's not clear why you are using multiple Lambda functions for this, so I can't comment on what an alternative would look like.

Did you try locking the table in each function's code? That will prevent other transactions from modifying the data while yours runs. Alternatively, you could give each Lambda its own staging table and have a separate merge job running periodically that combines the data from all of them and merges it into the final table. A sketch of the locking approach is below.
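For example, a minimal sketch of the table-locking approach, reusing the tables from the question (LOCK takes an exclusive lock that is only released when the transaction ends, so concurrent Lambdas queue behind each other instead of aborting):

begin transaction;
-- Take an exclusive lock up front; other sessions' DML on SALES now waits
-- until this transaction commits, instead of raising error 1023
lock sales;
-- Same merge-by-replacement as in the question
delete from sales
using stagesales
where sales.salesid = stagesales.salesid
and sales.listid = stagesales.listid
and sales.saletime > '2008-11-30';
insert into sales
select * from stagesales;
end transaction;

Note that this serializes the writers, so the Lambdas effectively take turns modifying SALES.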

A 1023 is a retriable error.
If it happens only from time to time, you can consider catching it in your Lambda function and simply submitting the query again.

Related

Handling data in global temporary tables in case of transaction rollback

I have a job that runs with multiple instances, i.e. the code base for all instances is the same, but each instance works on the set of data allocated to it, so as to achieve parallelism and better throughput for the application.
These jobs use a global temporary table (GTT) for working through the data, as there are multiple complex operations performed before the final output is computed.
In case of failure, the transaction is rolled back (as it should be), but with this I'm also losing the data in the GTT.
Is there a way the records in the GTT can be copied over to another permanent table while rolling back the transaction?
I know it sounds weird, but this is a practical problem I'm facing.
I need to somehow store the data from the session table if any SQL fails, while still rolling back the transaction because one of the statements has failed.
Thanks.
Hm, maybe something like this:
create a permanent table which will hold the GTT data in case of failure
create an autonomous transaction procedure which inserts into the permanent table with select * from the GTT and commits
in the exception handler section, call that procedure and then roll back
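A rough sketch of these steps (hypothetical names: gtt_work for the global temporary table, gtt_backup for the permanent copy; whether the autonomous transaction can see the calling transaction's uncommitted GTT rows depends on how the GTT is defined, so test this in your environment):

-- one-off: permanent table with the same shape as the GTT
create table gtt_backup as select * from gtt_work where 1 = 0;

create or replace procedure save_gtt_on_failure is
  pragma autonomous_transaction;
begin
  insert into gtt_backup select * from gtt_work;
  commit;  -- commits only this autonomous transaction
end;
/

-- inside the job:
begin
  -- ... normal processing against gtt_work ...
  null;
exception
  when others then
    save_gtt_on_failure;  -- copy the rows out first
    rollback;             -- then roll back the main transaction
    raise;
end;
/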
The only way is writing the required data out before your rollback.
You can use UTL_FILE to store the data in a file; later, you can use Oracle's external table feature to retrieve the data back into a table.
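A minimal sketch of the UTL_FILE route (assumes a directory object named DATA_DIR exists and a GTT named gtt_work with columns col1 and col2; all of these names are placeholders):

create or replace procedure dump_gtt_to_file is
  l_file utl_file.file_type;
begin
  l_file := utl_file.fopen('DATA_DIR', 'gtt_dump.csv', 'w');
  for r in (select col1, col2 from gtt_work) loop
    -- file I/O is not transactional, so these lines survive the later rollback
    utl_file.put_line(l_file, r.col1 || ',' || r.col2);
  end loop;
  utl_file.fclose(l_file);
end;
/

Call dump_gtt_to_file in the exception handler before issuing the rollback.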
Cheers!!

create plsql trigger for any DML operations performed in any tables

I have around 500 tables in the DB. If any DML operation is performed on one of those tables, a trigger should fire to capture that DML activity and load it into an audit table. I don't want to write 500 individual triggers. Is there a simple method to achieve this?
To switch on high-level auditing of DML statements for all tables:
AUDIT INSERT TABLE, UPDATE TABLE, DELETE TABLE;
What objects we can manage depends on what privileges we have. Find out more.
AUDIT will write basic information to the audit trail. The destination depends on the value of the AUDIT_TRAIL parameter. If the parameter is set to db the output is written to a database table: we can see our trail in USER_AUDIT_TRAIL or (if we have the privilege) everything in DBA_AUDIT_TRAIL.
The audit trail is high level, which means it records that user FOX updated the EMP table but doesn't tell us which records or what the actual changes were. We can implement granular auditing by creating Fine-Grained Audit policies. This requires a lot more work on our part so we may decide not to enable it for all our tables. Find out more.
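For illustration, a short sketch of both levels (the high-level trail assumes AUDIT_TRAIL = db; SCOTT.EMP is a placeholder table):

-- high-level trail: who did what, and when
select username, obj_name, action_name, timestamp
from dba_audit_trail
where obj_name = 'EMP';

-- fine-grained policy on a single table (repeat per table that needs it);
-- the detailed records then appear in DBA_FGA_AUDIT_LOG
begin
  dbms_fga.add_policy(
    object_schema   => 'SCOTT',
    object_name     => 'EMP',
    policy_name     => 'EMP_DML_AUDIT',
    statement_types => 'INSERT, UPDATE, DELETE');
end;
/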
Triggers are used on tables only, not the entire database. Ignoring the complexity of maintaining disparate data types, data use, context of various tables and their use, what you are looking for would be extremely complex, something no RDBMS has addressed at the database level.
There is some information on triggers at this link:
https://docs.oracle.com/cd/A57673_01/DOC/server/doc/SCN73/ch15.htm
You could place a trigger on each table that calls the same procedure ... but then all that complexity comes into play.
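One possible way to avoid hand-writing 500 triggers is to generate them from the data dictionary. A sketch, assuming a shared audit procedure log_dml_activity(table_name, action) that you would write yourself (the name is a placeholder):

begin
  for t in (select table_name from user_tables) loop
    execute immediate
      'create or replace trigger aud_' || substr(t.table_name, 1, 26) ||
      ' after insert or update or delete on ' || t.table_name ||
      ' begin ' ||
      '   log_dml_activity(''' || t.table_name || ''', ' ||
      '     case when inserting then ''INSERT'' ' ||
      '          when updating  then ''UPDATE'' ' ||
      '          else ''DELETE'' end); ' ||
      ' end;';
  end loop;
end;
/

These are statement-level triggers, so they record that a table was touched, not which rows changed; capturing the actual row values would need row-level (FOR EACH ROW) triggers, and that is where the complexity mentioned above comes in.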

Disable queries on table while updating

I have a PL/SQL script that clears (via delete from statements) and populates several dependent tables like this:
delete from table-A
insert into table-A values(...)
delete from table-B
insert into table-B values(...)
These operations take about 10 seconds to complete, and I'd like to block all SQL queries that try to read data from table-A or table-B while the tables are being updated. Those queries should wait and continue execution once table-A and table-B are completely updated.
What is the proper way to do this?
As others have pointed out, Oracle's basic concurrency model is that writers do not block readers and readers do not block writers. You can't stop a simple select from running. Your queries will see the data as of the SCN that they started executing (assuming that you're using the default read committed transaction isolation level) so they will have a consistent view of the data before your updates started.
You could potentially acquire a custom named lock using dbms_lock.request. You would need to acquire this lock before running your updates and every session that queries the tables would also need to acquire the lock before it starts to query the tables. That will, obviously, decrease the scalability of your application but it will accomplish what you appear to be asking for. Presumably, the sessions doing queries can acquire the lock in shared mode while the session doing the updates would need to acquire it in exclusive mode.
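A sketch of what the writer side could look like with DBMS_LOCK (the lock name TABLE_A_B_REFRESH is a placeholder; the querying sessions would do the same but request dbms_lock.s_mode before reading):

declare
  l_handle varchar2(128);
  l_result integer;
begin
  -- map an application-chosen name to a lock handle
  -- (note: allocate_unique issues a commit of its own)
  dbms_lock.allocate_unique('TABLE_A_B_REFRESH', l_handle);

  -- exclusive mode for the session doing the delete/insert
  l_result := dbms_lock.request(lockhandle        => l_handle,
                                lockmode          => dbms_lock.x_mode,
                                timeout           => 30,
                                release_on_commit => true);
  if l_result <> 0 then
    raise_application_error(-20001, 'could not get refresh lock, status ' || l_result);
  end if;

  -- delete from / insert into table-A and table-B here

  commit;  -- release_on_commit => true frees the lock for waiting readers
end;
/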

Delete and select on table impact when done simultaneously

I have a table ABC used in procedures with a DELETE at the start and a SELECT at the end (the DELETE has no WHERE clause).
Now suppose process A invokes the procedure and is at the SELECT on table ABC, while at the same time another process B invokes the procedure and has reached the DELETE on ABC, which has no WHERE clause.
My question is: will process A still find the data, given that the DELETE with no WHERE clause is happening at the same time?
In other words, is there any synchronization between the sessions on the table?
I'd suggest you read about Oracle multi-versioning and ACID transactions
http://docs.oracle.com/cd/E18283_01/server.112/e16508/consist.htm
https://en.wikipedia.org/?title=ACID
Changes made in a session within a transaction are not visible to any other session until a commit is issued; each session sees its own version of the data until it commits or rolls back.
Oracle starts a transaction by default, unlike some other database servers. Other database servers have their own defaults and different implementations of ACID.
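A small two-session illustration of that behaviour (ABC is the table from the question):

<<SESSION B>>
SQL> delete from abc;   /* no WHERE clause; not yet committed */

<<SESSION A>>
SQL> select count(*) from abc;   /* still sees all the rows that existed when the query started */

<<SESSION B>>
SQL> commit;

<<SESSION A>>
SQL> select count(*) from abc;   /* now returns 0 */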

How to synchronize multiple machines to insert one single row?

I have some code that is running on several machines, and accesses an Oracle database. I use this database (among other things) as a synchronization object between the different machines by locking rows.
The problem I have is that when my process is starting, there is nothing in the database yet to rely on for synchronization, and my processes get Oracle exceptions about a violated unique constraint, since they all try to insert at the same time.
My solution for now is to catch that precise exception and ignore it, but I don't really like having exceptions being thrown in the normal workflow of my application.
Is there a better way to "test and insert" atomically in a database? Locking the whole table/partition when inserting a row is not an acceptable solution.
I checked merge into, thinking it was my solution, but it produces the same problem.
You probably want to use DBMS_LOCK, which allows application code to implement the same locking model the Oracle database uses for locking rows and other resources. You can create an enqueue of type 'UL' (user lock), define a resource name, and then have multiple sessions lock to their hearts' content, without any dependence on data in a table somewhere. It supports both exclusive and shared locking, so you can have some processes that run concurrently (if they take a shared lock) and other processes that run exclusively (if they take an exclusive lock), and they will automatically queue behind any shared locks being held by the other type of process.
It's a very flexible locking model, and you don't need to rely on any data in any table to implement it.
See the Oracle PL/SQL Packages and Types Reference, for the full scoop on the DBMS_LOCK package.
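A sketch of how the startup "test and insert" could be serialized with DBMS_LOCK instead of relying on the unique constraint (the lock name SYNC_ROW_BOOTSTRAP and the table/column names sync_rows and sync_key are placeholders):

declare
  l_handle varchar2(128);
  l_result integer;
  l_count  pls_integer;
begin
  dbms_lock.allocate_unique('SYNC_ROW_BOOTSTRAP', l_handle);
  l_result := dbms_lock.request(l_handle, dbms_lock.x_mode,
                                timeout => 60, release_on_commit => true);
  if l_result = 0 then
    -- only one machine at a time gets past this point
    select count(*) into l_count from sync_rows where sync_key = 'STARTUP';
    if l_count = 0 then
      insert into sync_rows (sync_key) values ('STARTUP');
    end if;
    commit;  -- releases the lock; no unique-constraint exception to swallow
  end if;
end;
/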
Hope that helps.
You won't get an error immediately if your PK is policed by a non-unique index, consider:
<<SESSION 1>>
SQL> create table afac (
       id number,
       constraint afac_pk primary key (id)
         deferrable /* will cause the PK to be policed by a non-unique index */
     );
Table created.
SQL> insert into afac values (1);
1 row created.
<<SESSION 2>>
SQL> insert into afac values (1); /* Will cause session 2 to be blocked */
Session 2 will be blocked until session 1 either commits or rolls back. I don't know if this mechanism is compatible with your requirement though.
