How to lock an object during a distributed transaction - microservices

I have been reading about microservices and distributed transactions. Most articles talk about two-phase commit (2PC) or the Saga pattern, but they do not go into detail on how an object is locked so that others can't access that data while the transaction has not yet completed.
Say I have a customer service and an order service, and I initiate a request to lock a customer's funds until the order has been processed. In a distributed system, how is this achieved?
In databases, is it possible to explicitly lock a row and then have a later request unlock it? Or is this done with a "locked" field on the customers table that the first transaction sets to locked, and which is set back to unlocked (or cleared) once the order is complete?
If there are some examples with code samples, that would be great.

Most articles talk about two-phase commit or the Saga pattern, but they do not go into detail on how an object is locked so that others can't access that data while the transaction has not yet completed.
2PC is defined as a blocking protocol. That means that if the transaction manager which coordinates the 2PC transaction goes down, the 2PC cannot be resolved; the transaction manager is then a single point of failure.
If you can guarantee that a failed transaction manager is restarted, then even though the 2PC protocol is formally blocking, you have the assurance that the transaction manager will become available again and the resolution won't be blocked forever.
2PC uses locks; they are a fundamental element of the protocol. The transaction manager communicates with the participants, i.e. the resources; here the participant is the database. When 2PC runs, the prepare call means that the database takes persistent locks on all rows that participated in the transaction. Those locks are released when the transaction manager calls commit.
It's important to understand that before prepare the transaction is in-flight (not persistent); it's held only in memory. After prepare is called, the transaction state is stored persistently until commit is called (and at that point the protocol may be blocked by an unavailable transaction manager: the lock is persistent and the system waits for the transaction manager to release it).
That's locking from the 2PC perspective. But there are also transaction locks from the database's perspective.
When you update a row, the transaction is in-flight (held in memory). At that time the database needs to ensure that concurrent updates won't corrupt your data. One way is to lock the row and not permit concurrent updates.
However, most databases do not lock the row by default; depending on the isolation level, they use snapshot isolation instead (MVCC, https://en.wikipedia.org/wiki/Snapshot_isolation). That effectively means the row is locked optimistically and the database permits other transactions to update it.
But the 2PC prepare cannot be processed optimistically: when the database replies 'OK' to the prepare request from the transaction manager, the row really is locked.
Also, you can't manage this locking by hand. If you try to do so, you ruin the 2PC guarantee of consistency.
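To answer the explicit row-locking part of the original question: outside of a 2PC-managed transaction, an ordinary local database transaction can take a pessimistic lock with SELECT ... FOR UPDATE, which blocks other writers on that row until the transaction commits or rolls back. A minimal JDBC sketch; the customer table, its columns, and the funds check are only illustrative assumptions:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    public class ReserveFunds {

        public static void reserve(String jdbcUrl, long customerId, long amount) throws Exception {
            try (Connection con = DriverManager.getConnection(jdbcUrl)) {
                con.setAutoCommit(false);                          // start a local transaction
                try {
                    long balance;
                    try (PreparedStatement lock = con.prepareStatement(
                            "SELECT balance FROM customer WHERE id = ? FOR UPDATE")) {
                        lock.setLong(1, customerId);
                        try (ResultSet rs = lock.executeQuery()) { // this transaction now holds the row lock
                            if (!rs.next()) {
                                throw new IllegalStateException("no such customer");
                            }
                            balance = rs.getLong(1);
                        }
                    }
                    if (balance < amount) {
                        throw new IllegalStateException("insufficient funds"); // rolled back below
                    }
                    try (PreparedStatement upd = con.prepareStatement(
                            "UPDATE customer SET balance = balance - ? WHERE id = ?")) {
                        upd.setLong(1, amount);
                        upd.setLong(2, customerId);
                        upd.executeUpdate();
                    }
                    con.commit();                                  // commit releases the row lock
                } catch (Exception e) {
                    con.rollback();                                // rollback also releases the row lock
                    throw e;
                }
            }
        }
    }

Note that with this approach there is no need for a manual "locked" flag column: the database releases the lock automatically at commit or rollback, so a crash cannot leave the row stuck in a locked state the way a forgotten flag would.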
As in your example, there are a customer service and an order service, and the 2PC transaction spans both services. The customer service updates its database and the order service updates its database as well; both database transactions are still in-flight. When the request finishes, the transaction manager commands the in-flight transactions to commit and runs the 2PC: it invokes prepare on the customer service's database transaction, then on the order service's, and then it calls commit on both.
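As a sketch of what that looks like in application code, assuming a JTA transaction manager and two XA-capable datasources (only standard JTA and JDBC calls are used; the table names and amounts are made up):

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import javax.sql.DataSource;
    import javax.transaction.UserTransaction;

    public class PlaceOrder {

        // Both datasources must be XA datasources enlisted with the JTA transaction manager.
        public void placeOrder(UserTransaction ut, DataSource customerDs, DataSource orderDs,
                               long customerId, long orderId, long amount) throws Exception {
            ut.begin();                                    // one global transaction spanning both databases
            try {
                try (Connection c = customerDs.getConnection();
                     PreparedStatement ps = c.prepareStatement(
                         "UPDATE customer SET balance = balance - ? WHERE id = ?")) {
                    ps.setLong(1, amount);
                    ps.setLong(2, customerId);
                    ps.executeUpdate();                    // in-flight in the customer database
                }
                try (Connection c = orderDs.getConnection();
                     PreparedStatement ps = c.prepareStatement(
                         "INSERT INTO orders (id, customer_id, amount) VALUES (?, ?, ?)")) {
                    ps.setLong(1, orderId);
                    ps.setLong(2, customerId);
                    ps.setLong(3, amount);
                    ps.executeUpdate();                    // in-flight in the order database
                }
                ut.commit();                               // the TM runs 2PC: prepare both, then commit both
            } catch (Exception e) {
                ut.rollback();                             // the TM rolls back both in-flight transactions
                throw e;
            }
        }
    }

The single ut.commit() is what drives the prepare/commit exchange against both databases; the application code never manages the row locks itself.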
If you use the saga pattern, the saga spans both services. From the transaction perspective, the customer service creates a database in-flight transaction and commits it immediately. Then the call goes to the order service, where the same happens. When the request finishes, the saga checks that everything ran fine; if a failure happened, a compensation callback is invoked.
Failure handling is "the trouble" from an ease-of-use perspective. With a saga you have to implement the failure resolution yourself in the compensation callback; with 2PC the failure resolution is handled automatically by the rollback call.
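For comparison, a minimal orchestration-style saga sketch; the service clients and their methods are hypothetical, the point is only the commit-immediately-then-compensate shape:

    public class OrderSaga {

        // Hypothetical clients for the two services (REST, messaging, etc.; the transport does not matter here).
        interface CustomerClient {
            void deductFunds(long customerId, long amount);
            void refundFunds(long customerId, long amount);
        }

        interface OrderClient {
            void confirmOrder(long orderId);
        }

        private final CustomerClient customers;
        private final OrderClient orders;

        public OrderSaga(CustomerClient customers, OrderClient orders) {
            this.customers = customers;
            this.orders = orders;
        }

        public void placeOrder(long customerId, long orderId, long amount) {
            // Step 1: a local transaction in the customer service, committed immediately.
            customers.deductFunds(customerId, amount);
            try {
                // Step 2: a local transaction in the order service, committed immediately.
                orders.confirmOrder(orderId);
            } catch (Exception e) {
                // Compensation: semantically undo step 1, because a rollback is no longer possible.
                customers.refundFunds(customerId, amount);
                throw e;
            }
        }
    }

Note that refundFunds is the compensation you have to design and maintain yourself; once the first local transaction has committed, there is no automatic rollback to fall back on.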
A note: I tried to summarize 2PC here: https://developer.jboss.org/wiki/TwoPhaseCommit2PC
I'm not sure whether the explanation is comprehensible enough, but you can check it out and let me know what's explained badly there. Thanks.

In the microservices world, transaction boundaries live within a service; the services rely on eventual consistency. So in your example the order service would send a request (synchronous or asynchronous, depending on application semantics and scale requirements) like "Deduct x amount from customer y for order z".
The customer service would perform the action on the customer record in a local transaction and return a response such as "Order z successfully processed" or "Order z processing failed".
The order service can then trigger the confirmation or failure flow for the order depending on the response received.
Applications typically have to choose between availability and ACID-style strong consistency. Most microservice-based scenarios demand availability and high scalability rather than strong consistency, which means communication between services is asynchronous and a consistent state is reached eventually.
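For concreteness, here is a rough sketch of the customer-service side, assuming the deduct request arrives as a message or an HTTP call and is applied in a purely local database transaction (the table, columns, and response strings are assumptions):

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import javax.sql.DataSource;

    public class DeductFundsHandler {

        private final DataSource ds;

        public DeductFundsHandler(DataSource ds) {
            this.ds = ds;
        }

        // Handles "Deduct x amount from customer y for order z" and reports the outcome.
        public String handle(long customerId, long orderId, long amount) throws Exception {
            try (Connection con = ds.getConnection()) {
                con.setAutoCommit(false);                  // the transaction boundary is local to this service
                try (PreparedStatement ps = con.prepareStatement(
                         "UPDATE customer SET balance = balance - ? WHERE id = ? AND balance >= ?")) {
                    ps.setLong(1, amount);
                    ps.setLong(2, customerId);
                    ps.setLong(3, amount);
                    int updated = ps.executeUpdate();      // 0 rows means unknown customer or insufficient funds
                    con.commit();
                    return updated == 1
                            ? "Order " + orderId + " successfully processed"
                            : "Order " + orderId + " processing failed";
                } catch (Exception e) {
                    con.rollback();
                    throw e;
                }
            }
        }
    }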

Related

system design - How to update the cache only after persisting to the database?

After watching this awesome talk by Martin Kleppmann about how Kafka can be used to stream events so that we can get rid of 2-phase commits, I have a couple of questions about updating a cache only when the database has been updated properly.
Problem Statement
Let's say you have a Redis cache which stores the user's profile pic and a Postgres database which is used for all user-related operations (creation, updating, deletion, etc.).
I want to update my Redis cache only when a new user has been successfully added to my database.
How can I do that using Kafka?
If I take the example given in the video, the workflow would look something like this:
User registers
Request is handled by the User Registration microservice
The User Registration microservice inserts a new entry into the users table
It then generates a User Creation event in the user_created topic
The cache population microservice consumes the newly created User Creation event
The cache population microservice updates the Redis cache
The problem is: what happens if the User Registration microservice crashes just after writing to the database but before it manages to send the event to Kafka?
What would be the correct way of handling this?
Does the User Registration microservice keep track of the last event it published? How can it do that reliably? Does it write to a DB? Then the problem starts all over again: what if it published the event to Kafka but failed before it could update its last known offset?
There are three broad approaches one can take for this:
There's the transactional outbox pattern, wherein, in the same transaction as inserting the new entry into the user table, a corresponding user-creation event is inserted into an outbox table. Some process then eventually queries that outbox table, publishes the events it finds to Kafka, and deletes them. Since the inserts are in the same transaction, either both occur or neither occurs; barring a bug in the process which publishes the outbox to Kafka, this guarantees that every user insert eventually has an associated event published (at least once) to Kafka.
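A minimal JDBC sketch of the outbox write; the table names and the event payload format are assumptions, and the relay process that reads the outbox and publishes to Kafka is not shown:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import javax.sql.DataSource;

    public class UserRegistration {

        private final DataSource ds;

        public UserRegistration(DataSource ds) {
            this.ds = ds;
        }

        public void register(long userId, String name, String profilePicUrl) throws Exception {
            try (Connection con = ds.getConnection()) {
                con.setAutoCommit(false);       // user row and outbox row commit or roll back together
                try (PreparedStatement insertUser = con.prepareStatement(
                         "INSERT INTO users (id, name, profile_pic) VALUES (?, ?, ?)");
                     PreparedStatement insertOutbox = con.prepareStatement(
                         "INSERT INTO outbox (topic, payload) VALUES ('user_created', ?)")) {
                    insertUser.setLong(1, userId);
                    insertUser.setString(2, name);
                    insertUser.setString(3, profilePicUrl);
                    insertUser.executeUpdate();

                    insertOutbox.setString(1, "{\"userId\":" + userId + "}");
                    insertOutbox.executeUpdate();

                    con.commit();               // atomic: either both rows exist or neither does
                } catch (Exception e) {
                    con.rollback();
                    throw e;
                }
            }
        }
    }

A separate poller then selects rows from the outbox table, publishes them to Kafka, and deletes them once the broker acknowledges the write; that poller is where the at-least-once delivery guarantee comes from.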
There's a more event-sourcingish pattern, where you publish the user creation event to Kafka and then some consuming process inserts into the user table based on the event. Since this happens with a delay, this strongly suggests that the user registration service needs to keep state of which users it has published creation events for (with the combination of Kafka and Postgres being the source of truth for this). Since Kafka allows a message to be consumed by arbitrarily many consumers, a different consumer can then update Redis.
Change data capture (e.g. Debezium) can be used to tie into Postgres' write-ahead log (as Postgres actually event sources under the hood...) and publish an event that essentially says "this row was inserted into the user table" to Kafka. A consumer of that event can then translate that into a user created event.
CDC in some sense moves the transactional outbox into the infrastructure, at the cost of requiring that the context it inherently throws away be reconstructed later (which is not always possible).
That said, I'd strongly advise against having ____ creation be a microservice and I'd likewise strongly advise against a RInK store like Redis. Both of these smell like attempts to paper over architectural deficiencies by adding microservices and caches.
The one-foot-on-the-way-to-event-sourcing approach isn't one I'd recommend, but if one starts there, the requirement to make the registration service stateful suddenly opens up possibilities which may remove the need for Redis, limit the need for a Kafka-like thing, and allow you to treat the existence of a DB as an implementation detail.

DB transactionality in API exit

This link states that the exit function can operate in the application's unit of work.
Imagine the application starts a UOW:
MQPUT a message to a queue
Insert a record in table T1 of a DB
We also have a Put_After exit function that inserts a record into table T2 of the same DB.
As per the above link, WebSphere MQ, acting as an XA transaction manager, will treat the insertions into T1 and T2 as a single XA transaction.
My question is: will DB2 treat the two insertions as a single transaction?
Whether the inserts form one transaction or two depends on whether both are contained between the same MQBEGIN and MQCMIT calls. That said, the more you do within an exit, the slower and less reliable MQ becomes. For example, an API exit that copies messages to another queue uses only the resources of the QMgr.
However, an API exit that calls a DB traverses the network (possibly including the latency of DNS lookups), creates a TLS session (assuming the DB credentials are to be kept private), authenticates and signs on to the DB, issues the XA PREPARE statements and the rest of the XA protocol, performs any inserts, then issues the COMMIT and the XA transaction completion.
Now imagine doing that for every message PUT or GET.
Also, if any of these actions taking place off the QMgr fails, it definitely kills the transaction, possibly kills the app, and, depending on FASTPATH and other options, possibly trashes the QMgr.
It would be a much better design to put the before/after images onto a separate MQ queue and have a completely independent program load them into the DB. This keeps the API calls extremely fast because they stay within MQ. You might lose only 50% of your throughput instead of the 80-90% you would lose if every PUT or GET entailed two external calls from the API exit plus all the network traversal. Offloading messages to a second queue for subsequent upload to the DB is in fact a common approach to posting audit messages.
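A rough sketch of the offloading idea using the IBM MQ classes for Java; the queue name is made up, and a real API exit would do the equivalent through the C MQI, so treat this purely as an illustration of "put a copy to an audit queue under syncpoint and let another program drain it later":

    import com.ibm.mq.MQMessage;
    import com.ibm.mq.MQPutMessageOptions;
    import com.ibm.mq.MQQueue;
    import com.ibm.mq.MQQueueManager;
    import com.ibm.mq.constants.CMQC;

    public class AuditOffload {

        public static void putAuditCopy(MQQueueManager qmgr, String auditQueueName, byte[] image)
                throws Exception {
            MQQueue auditQueue = qmgr.accessQueue(auditQueueName, CMQC.MQOO_OUTPUT);
            try {
                MQMessage copy = new MQMessage();
                copy.write(image);                          // before/after image of the application message

                MQPutMessageOptions pmo = new MQPutMessageOptions();
                pmo.options = CMQC.MQPMO_SYNCPOINT;         // stay inside the application's unit of work
                auditQueue.put(copy, pmo);                  // cheap: the message never leaves the QMgr
            } finally {
                auditQueue.close();
            }
            // The application's normal MQCMIT/MQBACK resolves the unit of work; a separate,
            // independent program later reads the audit queue and inserts the data into the DB.
        }
    }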

Spanner's Read-Only Transaction

I understand Spanner's read-only transactions within a single Paxos group.
But how does a read-only transaction over more than one Paxos group work? The paper says that it uses TT.now().latest as the timestamp and then performs a snapshot read at that timestamp. But why does this work?
In each replica there is a safe time. The safe time is the timestamp of the last write transaction within the replica. A replica can serve a read if the requested timestamp <= safe time.
The paper also says that the snapshot read at the given timestamp (the second phase of the read-only transaction) may need to wait until the replicas are up to date. But what happens if no write transaction ever occurs after the read-only transaction starts? Then the safe time never advances and the read is blocked forever?
AFAICT, the point is that once a process sees that TT.now().latest has passed, no other process will ever be assigned that timestamp, so any future write transaction will have a commit time (and hence safe time) greater than it. The process performing the snapshot read therefore only needs to wait until that timestamp has passed.
Spanner is now available as a service on Google Cloud Platform.
Here are the docs on how the read-only transactions work:
https://cloud.google.com/spanner/docs/transactions#read-only_transactions
==
A Cloud Spanner read-only transaction executes a set of reads at a single logical point in time, both from the perspective of the read-only transaction itself and from the perspective of other readers and writers to the Cloud Spanner database. This means that read-only transactions always observe a consistent state of the database at a chosen point in the transaction history.
==
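For completeness, a small sketch with the Cloud Spanner Java client showing several reads inside one read-only transaction (the project, instance, database, and tables are placeholders):

    import com.google.cloud.spanner.DatabaseClient;
    import com.google.cloud.spanner.DatabaseId;
    import com.google.cloud.spanner.ReadOnlyTransaction;
    import com.google.cloud.spanner.ResultSet;
    import com.google.cloud.spanner.Spanner;
    import com.google.cloud.spanner.SpannerOptions;
    import com.google.cloud.spanner.Statement;

    public class ReadOnlyExample {

        public static void main(String[] args) {
            Spanner spanner = SpannerOptions.newBuilder().build().getService();
            try {
                DatabaseClient client = spanner.getDatabaseClient(
                        DatabaseId.of("my-project", "my-instance", "my-database")); // placeholders

                // Every read inside this block observes the same snapshot timestamp,
                // even if the rows are served by different Paxos groups.
                try (ReadOnlyTransaction tx = client.readOnlyTransaction()) {
                    try (ResultSet rs = tx.executeQuery(
                            Statement.of("SELECT id, balance FROM accounts"))) {
                        while (rs.next()) {
                            System.out.println(rs.getLong("id") + " " + rs.getLong("balance"));
                        }
                    }
                    try (ResultSet rs = tx.executeQuery(
                            Statement.of("SELECT COUNT(*) AS n FROM orders"))) {
                        while (rs.next()) {
                            System.out.println("orders: " + rs.getLong("n"));
                        }
                    }
                }
            } finally {
                spanner.close();
            }
        }
    }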

Achieving ACID properties using JDBC?

First of all, I would like to confirm: is it the responsibility of the developer to uphold these properties, or the responsibility of a transaction API like JDBC?
Below is my understanding of how we achieve the ACID properties with JDBC:
Atomicity: as there is one transaction associated with the connection, we either commit or roll back, so there are no partial updates. Hence achieved.
Consistency: when some data integrity constraint is violated (say, a check constraint), a SQLException is thrown. The programmer then achieves a consistent database by rolling back the transaction?
One question on the above: say we complete transaction 1, and a SQLException is thrown during transaction 2 as explained above. If we now catch the exception and commit, will the first transaction be committed?
Isolation: provided by the JDBC APIs. But this leads to the problem of concurrent updates, so that has to be dealt with manually, right?
Durability: provided by the JDBC APIs.
Please let me know if the above understanding is right.
ACID transactional integrity is implemented by the database, not by the API (like JDBC) or by the application. Your application's responsibility is to choose a database and a database configuration that supports whatever transactional integrity you need, and to correctly identify the transaction boundaries in your application.
When an exception is thrown, your application has to determine whether it is appropriate to rollback the entire transaction or to proceed with additional processing. It may be appropriate if your application is processing orders from a vendor, for example, to process the 99 orders that succeed and log the 1 order that failed somewhere for users to investigate. On the other hand, you may reject all 100 orders because 1 failed. It depends what your application is doing.
In general, you only have one transaction open at a time (or, more accurately, one transaction per connection). So if you are working in transaction 2, transaction 1 by definition has already completed-- it was either committed or rolled back previously. Exceptions thrown in transaction 2 have no impact on transaction 1.
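To make that concrete, a small JDBC sketch with two sequential transactions on one connection (the table and the failing constraint are invented):

    import java.sql.Connection;
    import java.sql.SQLException;
    import java.sql.Statement;

    public class TwoTransactions {

        public static void run(Connection con) throws SQLException {
            con.setAutoCommit(false);

            // Transaction 1
            try (Statement st = con.createStatement()) {
                st.executeUpdate("INSERT INTO orders (id, qty) VALUES (1, 10)");
            }
            con.commit();                       // transaction 1 is now durable, full stop

            // Transaction 2 starts implicitly with the next statement
            try (Statement st = con.createStatement()) {
                st.executeUpdate("INSERT INTO orders (id, qty) VALUES (2, -5)"); // say a CHECK (qty > 0) rejects this
                con.commit();
            } catch (SQLException e) {
                con.rollback();                 // undoes only transaction 2; transaction 1 stays committed
            }
        }
    }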
Depending on the transaction isolation level your application requests (and the isolation levels your database supports), as well as the mechanics of your application, lost updates are something you may need to be concerned about. If you set your isolation level to read committed, it is possible to read a value as 'A' in transaction 1, wait for a user to do something, update the value to 'B', and commit, without realizing that transaction 2 updated the value to 'C' between the time you read the data and the time you wrote it. This may be a problem you need to deal with, or it may be fine for the last person to update the row to "win".
Your database, on the other hand, should take care of the automatic locking that prevents two transactions from simultaneously updating the same row of the same table. It may do this by locking more than is strictly necessary but it will serialize the updates somehow.
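One common application-level guard against the lost-update scenario described above is a version (or timestamp) column checked in the WHERE clause; a sketch, where the version column is an assumption about your schema:

    import java.sql.Connection;
    import java.sql.PreparedStatement;

    public class OptimisticUpdate {

        // Returns true if the update won; false if another transaction changed the row first.
        public static boolean updateValue(Connection con, long id, String newValue, long versionRead)
                throws Exception {
            try (PreparedStatement ps = con.prepareStatement(
                     "UPDATE widget SET value = ?, version = version + 1 WHERE id = ? AND version = ?")) {
                ps.setString(1, newValue);
                ps.setLong(2, id);
                ps.setLong(3, versionRead);     // the version seen when the row was originally read
                return ps.executeUpdate() == 1; // 0 rows means a concurrent transaction got there first
            }
        }
    }

If the update reports zero affected rows, the application re-reads the row and decides whether to retry or surface a conflict to the user.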

AUTONOMOUS_TRANSACTION: pros and cons

Can autonomous transactions be dangerous? If yes, in which situations? When are autonomous transactions necessary?
Yes, autonomous transactions can be dangerous.
Consider the situation where you have your main transaction, which has inserted/updated/deleted rows. If you then, within that, start an autonomous transaction, one of the following holds:
(1) It does not query any data at all. This is the 'safe' situation. It can be useful for logging information independently of the primary transaction, so the log entry can be committed without impacting the primary transaction (handy for logging error information when you expect the primary transaction to be rolled back).
(2) It only queries data that has not been updated by the primary transaction. This is safe, but superfluous; there is no point to the autonomous transaction.
(3) It queries data that has been updated by the primary transaction. This smacks of a poorly thought-through design, since you've overwritten something and then need to go back to see what it was before you overwrote it. Sometimes people think that an autonomous transaction will still see the uncommitted changes of the primary transaction; it won't. It reads the currently committed state of the database, plus any changes made within the autonomous transaction itself. Some people (often trying autonomous transactions in response to mutating-table trigger errors) don't care what state the data is in when they read it, and those people simply shouldn't be allowed access to a database.
(4) It tries to update/delete data that hasn't been updated by the primary transaction. Again, this smacks of poor design: those changes will be committed (or rolled back) whether or not the primary transaction succeeds. Worse, you risk issue (5), since it is hard to determine, within an autonomous transaction, whether the data has been updated by the primary transaction.
(5) It tries to update/delete data that has already been updated by the primary transaction, in which case it will deadlock and end up in an ugly mess.
Can autonomous transactions be dangerous?
Yes.
If yes, in which situations?
When they're misused. For example, when used to make changes to data that should be rolled back if the rest of the parent transaction is rolled back. Misusing them can cause data corruption, because some portions of a change are committed while others are not.
When are autonomous transactions necessary?
They are necessary when the effects of one transaction must survive, regardless of whether the parent transaction is committed or rolled back. A good example is a procedure which logs the progress and activity of a process to a database table.
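A sketch of that logging pattern, driven here from JDBC by executing an anonymous PL/SQL block (the app_log table is an assumption); the PRAGMA is what makes the insert commit independently of whatever the caller has pending:

    import java.sql.CallableStatement;
    import java.sql.Connection;

    public class AutonomousLogger {

        public static void log(Connection con, String message) throws Exception {
            // The block below runs as its own transaction: its COMMIT persists only the log row
            // and does not commit whatever the caller has pending on this connection.
            String block =
                "DECLARE " +
                "  PRAGMA AUTONOMOUS_TRANSACTION; " +
                "BEGIN " +
                "  INSERT INTO app_log (logged_at, msg) VALUES (SYSTIMESTAMP, ?); " +
                "  COMMIT; " +
                "END;";
            try (CallableStatement cs = con.prepareCall(block)) {
                cs.setString(1, message);
                cs.execute();
            }
        }
    }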
When are autonomous transactions necessary?
Check my question: How can a LOCK survive COMMIT, or how can changes to a LOCKed table be propagated to another session without COMMIT and without losing the LOCK.
We ingest business configurations sequentially and must forbid parallel processing.
I take a lock on the configurations table and update other tables accordingly. I commit each batch of updates to those other tables, as we can't afford to hold one transaction across all records; the probability of collision would be near 0.99.
Each failure caused by concurrent access is persisted to a log for a later update attempt.
