Deadlock when updating the same database record from multiple connection sessions concurrently - Oracle

We have implemented a client-server, socket-based application to process shopping cart requests; we receive thousands of them daily.
To process requests concurrently we use a multi-threaded architecture, with an Oracle connection pool (set to what we believe is an optimal size) for database operations. Our business process has a main database table, and the same set of rows must be updated by multiple threads over multiple connection sessions at the same time. We are now running into deadlocks because multiple threads update the same rows concurrently from different sessions, and we are also seeing primary key violations on some tables. Sometimes the database also ends up blocked when the same data is inserted from multiple sessions concurrently.
Please suggest a good approach to handling these problems.

There are a few different general solutions to writing multithreaded code that does not encounter deadlocks. The simplest is to ensure that you always lock resources in the same order.
A deadlock occurs when one session holds a lock on A and wants a lock on B while another session holds a lock on B and wants a lock on A. If you ensure that your code always locks A before B (or B before A), you can be guaranteed that you won't have a deadlock.
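For example, if each request needs to lock two rows, a sketch like this (the table and bind names are illustrative, not from the question) makes every session acquire its row locks in ascending primary-key order:
-- Take the locks one key at a time, always lowest key first,
-- before performing any updates in this transaction.
SELECT * FROM cart_items WHERE item_id = :id_low  FOR UPDATE;   -- :id_low < :id_high
SELECT * FROM cart_items WHERE item_id = :id_high FOR UPDATE;
UPDATE cart_items SET quantity = quantity + :delta WHERE item_id IN (:id_low, :id_high);
COMMIT;   -- releases both locks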
As for your comment about primary key violations, are you using something other than an Oracle sequence to generate your primary keys? If so, that is almost certainly the problem. Oracle sequences are explicitly designed to provide unique primary keys in the case where you have multiple sessions doing simultaneous inserts.
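If you are not already using one, a minimal sketch of the sequence-based approach (object names are illustrative) looks like this:
CREATE SEQUENCE cart_seq START WITH 1 INCREMENT BY 1 CACHE 100;
-- Every session gets a distinct value from the sequence, so concurrent
-- inserts cannot collide on the primary key.
INSERT INTO carts (id, customer_id, created_at)
VALUES (cart_seq.NEXTVAL, :customer_id, SYSDATE);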

Related

RethinkDB changefeeds performance: architectural advice?

I am building an application with RethinkDB and I'm about to switch to using changefeeds. But I'm facing an architectural choice and I'd like to get some advice.
My application currently loads all user data from several tables on user login (sending all of it to the frontend), and then processes requests from the frontend, altering the database, and preparing and sending changed items to users. I'd like to switch that over to changefeeds. The way I see it, I have two choices:
Set up a single changefeed for each table. Filter by users logged in to a particular server, and distribute the changes to users manually. These changefeeds are never closed, i.e. they have the lifetime of my servers.
When a user logs in, set up an individual changefeed for that user, for that user's data only (using a getAll with a secondary index). Maintain as many changefeeds as there are currently logged in users. Close them when users log out.
Solution #1 has a big disadvantage: RethinkDB changefeeds have no concept of time (or version number) the way, for example, Kafka does. This means that there is no way to a) load the initial data and then b) get exactly the changes that happened since that load. There is a time window in which changes can be lost: between the initial data load (a) and the moment the changefeed is set up (b). I find this worrying.
Solution #2 seems better, because includeInitial can be used to get initial data, and then get subsequent changes without interruption. I'd have to deal with initial load performance (it's faster to load a single dump of all data than process thousands of updates), but it seems more "correct". But what about scaling? I'm planning to handle up to 1k users per server — is RethinkDB prepared to handle thousands of changefeeds, each being essentially a getAll query? The actual activity in these changefeeds will be very low, it's just the number that I'm worried about.
The RethinkDB manual is a bit terse about changefeed scaling, saying that:
Changefeeds perform well as they scale, although they create extra intracluster messages in proportion to the number of servers with open feed connections on each write.
Solution #2 creates many more feeds, but the number of servers with open feed connections is actually the same for both solutions. And "changefeeds perform well as they scale" isn't quite enough to go on :-)
I'd also be interested to know what are recommended practices for handling server restarts/upgrades and disconnections. The way I see it, if anything happens to RethinkDB, clients have to perform a full data load (using includeInitial) after reconnecting, because there is no way to know what changes have been lost during downtime. Is that what people do?
RethinkDB should be able to handle thousands of changefeeds just fine if it's on reasonable hardware. One thing some people do to lower network load in that case is put a proxy node on the same machine as their app server and connect to that, since the proxy node knows enough to deduplicate the changefeed messages coming in over the network, and it takes a lot of CPU/memory load off of their main cluster.
Currently the only way to recover from a crash is to restart the changefeed using includeInitial. There are plans to add write timestamps in the future, but handling deletes is complicated in that case.

DB transactionality in API exit

This link states that the exit function can operate in the application's unit of work.
Imagine the application starts a UOW:
MQPUT a message to a queue
Insert a record in table T1 of a DB
Also, we have a Put_After exit function that also inserts a record in table T2 of the same DB.
As per the above link, WebSphere MQ, acting as an XA transaction manager, will treat the insertion into T1 and T2 as a single XA transaction.
My question is, will DB2 treat the two insertions as a single transaction?
Whether the inserts are one or two transactions depends on whether both are contained between the same MQBEGIN and MQCOMMIT calls. That said, the more you do within an exit the slower and less reliable MQ becomes. For example, an API exit that copies messages to another queue uses only the resources of the QMgr.
However, an API exit that calls a DB traverses the network, possibly including the latency of DNS lookups, creates a TLS session (assuming the DB credentials are to be kept private), authenticates and signs on to the DB, issues the XA PREPARE statements and the rest of the XA protocol, performs any inserts, then issues the COMMIT and the XA transaction completion.
Now imagine doing that for every message PUT or GET.
Also, if any of these actions taking place off the QMgr fails, it definitely kills the transaction, possibly kills the app, and depending on FASTPATH and other options, possibly trashes the QMgr.
It would be a much better design to put the before/after images to a separate MQ queue and have a completely independent program load those into the DB. This keeps the API calls extremely fast as they remain within MQ. You might lose only 50% of your throughput instead of the 80%-90% you would lose if every PUT or GET entailed two external calls from the API exit and all the network traversal. This technique of offloading messages to a second queue for subsequent upload to the DB is in fact a common approach to posting audit messages.

Oracle database as a single synchronization point for two separate web applications

I am considering using an Oracle database to synchronize concurrent operations from two or more web applications on separate servers. The database is the single infrastructure element in common for those applications.
There is a good chance that two or more applications will attempt to perform the same operation at the exact same moment (cron invoked). I want to use the database to let one application decide that it will be the one which will do the work, and that the others will not do it at all.
The general idea is to perform a select/insert of the node's ID that is somehow atomic and visible to all connections. Only the node whose ID matches the first inserted node ID returned by the select would do the work.
It was suggested to me that a merge statement could be of use here. However, after doing some research, I found a discussion which states that the merge statement is not designed to be called concurrently from multiple sessions.
Another option is to lock a table. By definition, only one node will be able to lock the table and do the insert, then the select. After the lock is released, the other instances will see the inserted value and will not perform the work.
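As a rough sketch (Oracle syntax; the table and column names are only illustrative), the lock-table variant would look something like this:
LOCK TABLE job_election IN EXCLUSIVE MODE;
-- Only the session holding the lock reaches this point. It records itself
-- as the winner if no node has done so yet, then commits, releasing the lock.
INSERT INTO job_election (job_name, winner_node)
SELECT 'nightly_job', :my_node_id FROM dual
WHERE NOT EXISTS (SELECT 1 FROM job_election WHERE job_name = 'nightly_job');
COMMIT;
-- Afterwards every node reads winner_node to decide whether it does the work.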
What other solutions would you consider? I frown on workarounds with random delays, or even using oracle exceptions to notify a node that it should not do the work. I'd prefer a clean solution.
I ended up going with SELECT FOR UPDATE. It works as intended. It is important to remember to commit the transaction as soon as the needed update is made, so that other nodes don't hang waiting for the value.
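A minimal sketch of what that looks like, assuming a one-row coordination table (names are illustrative):
SELECT owner_node, run_id
FROM job_control
WHERE job_name = 'nightly_job'
FOR UPDATE;   -- only one session at a time gets past this point
-- If run_id still belongs to the previous run, this node claims the job:
UPDATE job_control
SET owner_node = :my_node_id, run_id = :new_run_id
WHERE job_name = 'nightly_job';
COMMIT;       -- commit promptly so the other nodes don't hang on the lock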

One data store. Multiple processes. Will this SQL prevent race conditions?

I'm trying to create a Ruby script that spawns several concurrent child processes, each of which needs to access the same data store (a queue of some type) and do something with the data. The problem is that each row of data should be processed only once, and a child process has no way of knowing whether another child process might be operating on the same data at the same instant.
I haven't picked a data store yet, but I'm leaning toward PostgreSQL simply because it's what I'm used to. I've seen the following SQL fragment suggested as a way to avoid race conditions, because the UPDATE clause supposedly locks the table row before the SELECT takes place:
UPDATE jobs
SET status = 'processed'
WHERE id = (
SELECT id FROM jobs WHERE status = 'pending' LIMIT 1
) RETURNING id, data_to_process;
But will this really work? It doesn't seem intuitive that Postgres (or any other database) could lock the table row before performing the SELECT, since the SELECT has to be executed to determine which table row needs to be locked for updating. In other words, I'm concerned that this SQL fragment won't really prevent two separate processes from selecting and operating on the same table row.
Am I being paranoid? And are there better options than traditional RDBMSs to handle concurrency situations like this?
As you said, use a queue. The standard solution for this in PostgreSQL is PgQ. It has all these concurrency problems worked out for you.
Do you really want many concurrent child processes that must operate serially on a single data store? I suggest that you create one writer process who has sole access to the database (whatever you use) and accepts requests from the other processes to do the database operations you want. Then do the appropriate queue management in that thread rather than making your database do it, and you are assured that only one process accesses the database at any time.
The situation you are describing is called "Non-repeatable read". There are two ways to solve this.
The preferred way would be to set the transaction isolation level to at least REPEATABLE READ. This means that concurrent updates of the nature you described will fail: if two processes update the same rows in overlapping transactions, one of them will be cancelled, its changes will be ignored, and it will return an error. That transaction will have to be retried. This is achieved by calling
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ
At the start of the transaction. I can't seem to find documentation that explains an idiomatic way of doing this for Ruby; you may have to emit that SQL explicitly.
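If you do end up issuing it by hand, a minimal sketch (reusing the jobs table from the question) would be:
BEGIN;
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
UPDATE jobs
SET status = 'processed'
WHERE id = (SELECT id FROM jobs WHERE status = 'pending' LIMIT 1)
RETURNING id, data_to_process;
-- on a serialization failure, ROLLBACK and retry the whole transaction
COMMIT;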
The other option is to manage the locking of tables explicitly, which can cause a transaction to block (and possibly deadlock) until the table is free. Transactions won't fail in the same way as they do above, but contention will be much higher, and so I won't describe the details.
That's pretty close to the approach I took when I wrote pg_message_queue, which is a simple queue implementation for PostgreSQL. Unlike PgQ, it requires no components outside of PostgreSQL to use.
It will work just fine. MVCC will come to the rescue.
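A variant that makes the row lock explicit (this sketch assumes PostgreSQL 9.5 or later for SKIP LOCKED, which the question does not specify) also lets concurrent workers skip rows that another worker is already processing:
UPDATE jobs
SET status = 'processed'
WHERE id = (
SELECT id FROM jobs WHERE status = 'pending'
LIMIT 1
FOR UPDATE SKIP LOCKED
) RETURNING id, data_to_process;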

How do you react to the absence of an event in a distributed system?

I have a system that collects session data. A session consists of a number of distinct events, for example "session started" and "action X performed". There is no way to determine when a session ends, so instead heartbeat events are sent at regular intervals.
This is the main complication: without a way to determine if a session has ended the only way is to try to react to the absence of an event, i.e. no more heartbeats. How can I do this efficiently and correctly in a distributed system?
Here is some more background to the problem:
The events must then be assembled into objects representing sessions. The session objects are later updated with additional data from other systems, and eventually they are used to calculate things like the number of sessions, average session length, etc.
The system must scale horizontally, so there are multiple servers that receive the events, and multiple servers that process them. Events belonging to the same session can be sent to and processed by different servers. This means that there's no guarantee that they will be processed in order, and there are additional complications that meant that events can be duplicated (and there's always the risk that some are lost, either before they reach our servers, or when processed).
Most of this exists already, but I have no good solution to how to efficiently and correctly determine when a session has ended. The way I do it now is to periodically search through the collection of "incomplete" session objects looking for any that have not been updated in an amount of time equal to two heartbeats, and moving these to another collection of "complete" sessions. This operation is time consuming and inefficient, and it doesn't scale well horizontally. Basically it consists of sorting a table on a column representing the last timestamp and filtering out any rows that aren't old enough. Sounds simple, but it's hard to parallelize: if you do it too often you won't be doing anything else, because the database will be busy filtering your data, and if you don't do it often enough each run will be slow because there's too much to process.
I'd like to react to when a session has not been updated for a while, not poll every session to see if it's been updated.
Update: Just to give you a sense of scale; there are hundreds of thousands of sessions active at any time, and eventually there will be millions.
One possibility that comes to mind:
In your database table that keeps track of sessions, add a timestamp field (if you don't have one already) that records the last time the session was "active". Update the timestamp whenever you get a heartbeat.
When you create a session, schedule a "timer event" to fire after some suitable delay to check whether the session should be expired. When the timer event fires, check the session's timestamp to see if there's been more activity during the interval that the timer was waiting. If so, the session is still active, so schedule another timer event to check again later. If not, the session has timed out, so remove it.
If you use this approach, each session will always have one server responsible for checking whether it's expired, but different servers can be responsible for different sessions, so the workload can be spread around evenly. When a heartbeat comes in, it doesn't matter which server handles it, because it just updates a timestamp in a database that's (presumably) shared between all the servers.
There's still some polling involved since you'll get periodic timer events that make you check whether a session is expired even when it hasn't expired. That could be avoided if you could just cancel the pending timer event each time a heartbeat arrives, but with multiple servers that's tricky: the server that handles the heartbeat may not be the same one that has the timer scheduled. At any rate, the database query involved is lightweight: just looking up one row (the session record) by its primary key, with no sorting or inequality comparisons.
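In SQL terms (the table and column names here are assumed, not from the question), the two operations involved are tiny:
-- On every heartbeat, regardless of which server receives it:
UPDATE sessions SET last_seen = :heartbeat_time WHERE session_id = :session_id;
-- When a session's timer event fires, a single primary-key lookup decides
-- whether to expire the session or schedule another check:
SELECT last_seen FROM sessions WHERE session_id = :session_id;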
So you're collecting heartbeats; I'm wondering if you could have a batch process (or something) that ran across the collected heartbeats looking for patterns that implied the end of a session.
The level of accuracy is governed by how regular the heartbeats are and how often you scan across the collected heartbeats.
The advantage is you're processing all heartbeats through a single mechanism (in one spot - you don't have to poll each heartbeat on its own), so that should be able to scale - if it's a database-centric solution it should be able to cope with lots of data, right?
There might be a more elegant solution but my brain's a bit full just now :)
