Spanner's Read-Only Transaction - time

I understand how Spanner's read-only transactions work within a single Paxos group.
But how does a read-only transaction over more than one Paxos group work? The paper says that it uses TT.now().latest as the timestamp and then performs a snapshot read at that timestamp. But why does this work?
Each replica maintains a safe time: the timestamp of the last write transaction applied at that replica. The replica is up to date for a read at timestamp t if t <= safe time.
The paper also says that the snapshot read at the given timestamp (the second phase of the read-only transaction) may need to wait until the replicas are up to date. What happens if, after the read transaction, no write transaction ever occurs? Then the safe time will never advance and the read transaction will be blocked forever?

AFAICT, the point is that once a process has seen TT.now().latest pass, no other process will ever be assigned that timestamp, so any future write transaction will have a commit time (and hence safe time) greater than it. The process performing the snapshot read therefore only needs to wait until that timestamp has passed.
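To make that concrete, here is a minimal sketch of the two phases, assuming hypothetical TrueTime and Replica interfaces (nowLatest, safeTime and snapshotRead are stand-ins, not the real Spanner API):

```java
import java.util.List;

// Minimal sketch of a multi-group read-only transaction, under the assumptions above.
class ReadOnlyTxnSketch {
    interface TrueTime { long nowLatest(); }            // upper bound on the absolute time
    interface Replica {
        long safeTime();                                 // timestamp of the last applied write
        List<String> snapshotRead(long timestamp);       // MVCC read at the given timestamp
    }

    static List<String> read(TrueTime tt, Replica replica) throws InterruptedException {
        long sRead = tt.nowLatest();                     // phase 1: assign the read timestamp
        while (replica.safeTime() < sRead) {             // phase 2: wait until the replica has
            Thread.sleep(1);                             // applied all writes up to sRead
        }
        return replica.snapshotRead(sRead);              // every group serves the same snapshot
    }
}
```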

Spanner is now available as a service on Google Cloud Platform.
Here are the docs on how the read-only transactions work:
https://cloud.google.com/spanner/docs/transactions#read-only_transactions
==
A Cloud Spanner read-only transaction executes a set of reads at a single logical point in time, both from the perspective of the read-only transaction itself and from the perspective of other readers and writers to the Cloud Spanner database. This means that read-only transactions always observe a consistent state of the database at a chosen point in the transaction history.
==
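For reference, a read-only transaction with the Cloud Spanner Java client looks roughly like this (the project, instance, database and table names are placeholders):

```java
import com.google.cloud.spanner.*;

public class SpannerReadOnlyExample {
    public static void main(String[] args) {
        Spanner spanner = SpannerOptions.newBuilder().build().getService();
        try {
            DatabaseClient client = spanner.getDatabaseClient(
                    DatabaseId.of("my-project", "my-instance", "my-database"));
            // Every read inside this transaction observes the same snapshot of the database.
            try (ReadOnlyTransaction txn = client.readOnlyTransaction();
                 ResultSet rs = txn.executeQuery(
                         Statement.of("SELECT SingerId, FirstName FROM Singers"))) {
                while (rs.next()) {
                    System.out.println(rs.getLong(0) + " " + rs.getString(1));
                }
            }
        } finally {
            spanner.close();
        }
    }
}
```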

Related

Can an Oracle query executed after a commit return values from before the commit when that commit is done with COMMIT_WRITE = NOWAIT?

I have a third-party Java library that, at one point, gets a JDBC connection, starts a transaction, does several batch updates with PreparedStatement.addBatch(), executes the batch, commits the transaction and closes the connection. Almost immediately afterwards (within <10 milliseconds), the library gets another connection and queries one of the records affected by the update.
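To make the sequence concrete, it is roughly the following (the ORDERS table, its columns and the DataSource are invented for illustration; the library's real SQL will differ):

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import javax.sql.DataSource;

// Hypothetical reconstruction of the sequence the library performs.
public class UpdateThenRequery {
    static void run(DataSource ds) throws SQLException {
        try (Connection c1 = ds.getConnection()) {
            c1.setAutoCommit(false);
            try (PreparedStatement ps =
                     c1.prepareStatement("UPDATE ORDERS SET STATUS = ? WHERE ID = ?")) {
                ps.setString(1, "PROCESSED");
                ps.setLong(2, 42L);
                ps.addBatch();                 // ...several updates added to the batch
                ps.executeBatch();
            }
            c1.commit();                       // with COMMIT_WAIT=NOWAIT this call returns
        }                                      // without waiting for the redo to be written
        // <10 ms later, on a different connection:
        try (Connection c2 = ds.getConnection();
             PreparedStatement ps =
                 c2.prepareStatement("SELECT STATUS FROM ORDERS WHERE ID = ?")) {
            ps.setLong(1, 42L);
            try (ResultSet rs = ps.executeQuery()) {
                if (rs.next()) {
                    System.out.println(rs.getString(1)); // occasionally the pre-update value
                }
            }
        }
    }
}
```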
For the library to function properly, that query should return the updated record. However, in some rare cases I can see (using P6Spy) that the query returns the record with its pre-update values (and the library fails at some point further on due to the unexpected data).
I'm trying to understand why this would happen, and I found that in my database (Oracle 19c) there is a parameter COMMIT_WAIT that essentially allows a commit call not to block until the commit has finished, giving an asynchronous commit. So I used SHOW PARAMETERS to see the value of that parameter and found that COMMIT_WAIT is set to NOWAIT (and COMMIT_LOGGING is set to BATCH).
I began to speculate that what was happening was that the call to commit() just started the operation (without waiting for it to finish), and perhaps the next query occurred while the operation was still in progress, returning the value of the record from before the transaction. (The isolation level for all connections is Connection.TRANSACTION_READ_COMMITTED.)
Can COMMIT_WAIT set to NOWAIT cause that kind of scenario? I have read that using NOWAIT carries a lot of risks, but they mostly refer to things like loss of durability if the database crashes.
Changing the commit behavior should not affect database consistency and should not cause wrong results to be returned.
A little background - Oracle uses REDO for durability (recovering data after an error) and uses UNDO for consistency (making sure the correct results are always returned for any point-in-time). To improve performance, there are many tricks to reduce REDO and UNDO. But changing the commit behavior doesn't reduce the amount of logical REDO and UNDO, it only delays and optimizes the REDO physical writes.
Before a commit happens, and even before your statements return, the UNDO data used for consistency has been written to memory. Changing the commit behavior won't stop the changes from making their way to the UNDO tablespace.
Per the Database Reference for COMMIT_WAIT, "Also, [the parameter] can violate the durability of ACID (Atomicity, Consistency, Isolation, Durability) transactions if the database shuts down unexpectedly." Since the manual is already talking about the "D" in ACID, I assume it would also explicitly mention if the parameter affects the "C".
On the other hand, the above statements are all just theory. It's possible that there's some UNDO optimization bug that's causing the parameter to break something. But I think that would be extremely unlikely. Oracle goes out of its way to make sure that data is never lost or incorrect. (I know because even when I don't want REDO or UNDO it's hard to turn them off.)

How to lock an object during a distributed transaction

I have been reading about microservices and distributed transactions. Most articles talk about two-phase commit (2PC) or the Saga pattern, but do not go into detail on how an object is locked so that others can't access that data while the transaction has not completed.
If I have a customer service and an order service and I initiate a request to lock a customer's funds until the order has been processed, how is this achieved in a distributed system?
In databases, is it possible to explicitly lock a row and then have another request unlock it? Or is this achieved with a "locked" field on the customers table, which the first transaction sets to locked and which, once the order is complete, is set back to unlocked (or cleared)?
If there are some examples with code samples, that would be great.
Most articles talk about two-phase commit (2PC) or the Saga pattern, but do not go into detail on how an object is locked so that others can't access that data while the transaction has not completed.
2PC is defined as a blocking protocol. That means that if the transaction manager, which drives the 2PC transaction, is down, the 2PC outcome can't be resolved. The transaction manager is thus a single point of failure.
If you can ensure that a failed transaction manager gets restarted, then even though the 2PC protocol is said to be blocking, you have the assurance that the transaction manager will become available again and the resolution won't stay blocked.
2PC also uses locks. They are required as a fundamental element of the protocol. The transaction manager communicates with the participants (the resources); here the participant is the database. When the 2PC starts running, the prepare call means that the database takes persistent locks on all rows that participated in the transaction. These locks are released when the transaction manager calls commit.
It's important to understand that before the 2PC starts, the transaction is in-flight (not persistent): it's held in memory. After prepare is called, the transaction state is stored persistently until commit is called (and during that window the protocol may be blocked by an unavailable transaction manager; the lock is persistent and the system waits for the transaction manager to release it).
That's locking from the 2PC perspective. But there are also transaction locks from the database's perspective.
When you update a row in the database, the transaction is in-flight (stored in memory). At that time the database needs to ensure that concurrent updates won't corrupt your data. One way is to lock the row and not permit concurrent updates.
But most databases do not lock the row by default (depending on the isolation level), because they use snapshot isolation (MVCC, https://en.wikipedia.org/wiki/Snapshot_isolation). In particular, that means the row is locked optimistically and the database permits other transactions to update the row.
But the 2PC prepare can't be processed optimistically. When the database replies 'OK' to the prepare request from the transaction manager, the row is simply locked.
Plus, you can't manage this locking by hand. If you try to do so, you ruin the 2PC consistency guarantee.
In your example there is a customer service and an order service, and the 2PC transaction spans both services. The customer service updates its database and the order service updates its database as well; at that point there are still in-flight transactions running in the databases. Then the request finishes and the transaction manager commands the in-flight transactions to commit. It runs the 2PC: it invokes prepare on the customer service's database transaction, then on the order service's transaction, and then it calls commit on both.
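In Java terms, this is the sequence a transaction manager drives through the standard XAResource interface. A heavily simplified sketch (no recovery log, no heuristic outcomes; the Xid and the two enlisted resources are assumed to be provided by the environment):

```java
import javax.transaction.xa.XAException;
import javax.transaction.xa.XAResource;
import javax.transaction.xa.Xid;

// Simplified 2PC driver: prepare both participants, then commit both.
public class TwoPhaseCommitSketch {
    static void commitBoth(Xid xid, XAResource customerDb, XAResource orderDb)
            throws XAException {
        try {
            // Phase 1: prepare. Each database persists the transaction state and
            // keeps the rows it touched locked until the outcome is decided.
            customerDb.prepare(xid);
            orderDb.prepare(xid);
        } catch (XAException vote) {
            // A participant voted 'no': roll everything back.
            customerDb.rollback(xid);
            orderDb.rollback(xid);
            throw vote;
        }
        // Phase 2: commit. The locks taken at prepare are released here.
        customerDb.commit(xid, false);
        orderDb.commit(xid, false);
    }
}
```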
If you use the saga pattern, the saga spans both services. From the transaction perspective, the customer service creates an in-flight database transaction and commits it immediately. Then the call goes to the order service, where the same happens. When the request finishes, the saga checks that everything ran fine; if a failure happened, a compensation callback is called.
Failure handling is "the trouble" from the ease-of-use perspective. With a saga you need to implement the failure resolution yourself in the compensation callback; with 2PC the failure resolution is handled automatically by the rollback call.
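A minimal sketch of the saga variant for the same two services; the service interfaces and method names are hypothetical, and the compensation is the callback mentioned above:

```java
// Each service commits its local transaction immediately; a failure triggers
// the compensation callback for the steps that already committed.
public class OrderSagaSketch {
    interface CustomerService {
        void reserveFunds(long customerId, long amount);   // local tx, committed immediately
        void releaseFunds(long customerId, long amount);   // compensation callback
    }
    interface OrderService {
        void createOrder(long orderId, long customerId);   // local tx, committed immediately
    }

    static void placeOrder(CustomerService customers, OrderService orders,
                           long orderId, long customerId, long amount) {
        customers.reserveFunds(customerId, amount);         // step 1 already committed
        try {
            orders.createOrder(orderId, customerId);        // step 2
        } catch (RuntimeException failure) {
            customers.releaseFunds(customerId, amount);     // compensate step 1 by hand
            throw failure;
        }
    }
}
```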
A note: I tried to summarize 2PC here: https://developer.jboss.org/wiki/TwoPhaseCommit2PC
I'm not entirely sure the explanation is comprehensible enough, but feel free to check it and let me know what's explained poorly there. Thanks.
In the microservices world, transaction boundaries lie within a service, and the services rely on eventual consistency. So in your example the order service will send a request (synchronous or asynchronous, depending on application semantics and scale requirements) like "Deduct amount x from customer y for order z".
The customer service would perform the action on the customer record in a local transaction and return a response such as "Order z successfully processed" or "Order z processing failed".
The order service can then trigger the confirmation or failure handling of the order depending on the response received.
Applications typically have to choose between availability and ACID-style strong consistency. Most microservice-based scenarios demand availability and high scalability rather than strong consistency, which means the communication between services is asynchronous and a consistent state is achieved eventually.
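A small sketch of that asynchronous exchange (the service interface and the confirm/fail handlers are invented for illustration):

```java
import java.util.concurrent.CompletableFuture;

// Sketch of the asynchronous, eventually consistent flow described above.
public class EventuallyConsistentOrderFlow {
    interface CustomerService {
        // "Deduct amount x from customer y for order z"
        CompletableFuture<Boolean> deduct(long customerId, long amount, long orderId);
    }

    static void process(CustomerService customers, long customerId, long amount, long orderId) {
        customers.deduct(customerId, amount, orderId)
                 .thenAccept(ok -> {
                     if (ok) {
                         confirmOrder(orderId);   // "Order z successfully processed"
                     } else {
                         failOrder(orderId);      // "Order z processing failed"
                     }
                 });
        // The order stays pending until the reply arrives; the overall state
        // becomes consistent eventually rather than atomically.
    }

    static void confirmOrder(long orderId) { /* mark the order confirmed */ }
    static void failOrder(long orderId)    { /* mark the order failed */ }
}
```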

In a SQL Server Always On configuration, will a transaction log backup to NUL break the Always On configuration?

Imagine we have two nodes participating in SQL Server 2012 Always On. This is a test instance. During one of the index rebuild operations the log grew really big (250 GB). We are unable to back it up due to space constraints. What if we back up the transaction log to NUL (just to shrink it down) – will that break Always On?
AlwaysOn is a (marketing) umbrella term that covers both Availability Groups (AGs) and Failover Cluster Instances (FCIs). From context, I assume you are asking about AGs?
For both FCIs and AGs, the short answer is the same: performing transaction log backups (regardless of the destination) will not "break" your HA capabilities. However, I would urge you to NEVER EVER back up to NUL:, unless you don't care about the data in your database. Taking a log backup to NUL: (regardless of whether you are using an AG, FCI, or neither) will break your log backup chain and prevent point-in-time recovery.
If you are using an Availability Group, SQL Server does not use transaction log backups to synchronize between nodes. It uses the transaction log itself, and therefore will not clear the transaction log if there is log data that needs to be synchronized to another node. That is to say: if your AG synchronization is behind, your transaction log will continue to fill/grow until synchronization catches up, regardless of the number of transaction log backups performed.
There are multiple reasons your transaction log might continue to grow, and AG synchronization is just one of those reasons. If SQL Server cannot reuse the transaction log because of unsynchronized transactions in the AG, the log_reuse_wait_desc column in sys.databases will show the value "AVAILABILITY_REPLICA".
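For example, you can check that column for every database like this (the JDBC URL and credentials are placeholders):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Check why the transaction log cannot be truncated.
public class LogReuseWaitCheck {
    public static void main(String[] args) throws SQLException {
        String url = "jdbc:sqlserver://localhost;databaseName=master;encrypt=false";
        try (Connection con = DriverManager.getConnection(url, "user", "password");
             Statement st = con.createStatement();
             ResultSet rs = st.executeQuery(
                     "SELECT name, log_reuse_wait_desc FROM sys.databases")) {
            while (rs.next()) {
                // AVAILABILITY_REPLICA means AG synchronization is holding the log
                System.out.println(rs.getString("name") + ": "
                        + rs.getString("log_reuse_wait_desc"));
            }
        }
    }
}
```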
Getting back to your root problem: Rebuilding an index made your transaction log get really, really big.
When you perform an ALTER INDEX...REBUILD, SQL Server creates the entire new index (a size-of-data operation), and must be able to roll back the index creation if it errors or is killed prior to completion. Therefore, you may see the log_reuse_wait_desc column in sys.databases showing as "ACTIVE_TRANSACTION" during a very large, long-running index rebuild. The rebuild itself would prevent you from reusing the log, and would cause the log to grow.

Dirty reading in Hibernate

Dirty Read: the definition states that dirty reading occurs when a transaction reads data from a row that has been modified by another transaction but not yet committed.
Assuming the definition is correct, I am unable to fathom any such situation.
Due to the principle of isolation, transaction A cannot see the uncommitted data of a row that has been modified by transaction B. If transaction B simply has not committed, how can transaction A see it in the first place? That seems possible only when both operations are performed within the same transaction.
Can someone please explain what I am missing here?
"Dirty", or uncommitted reads (UR) are a way to allow non-blocking reads. Reading uncommitted data is not possible in an Oracle database due to the multi-version concurrency control employed by Oracle; instead of trying to read other transactions' data each transaction gets its own snapshot of data as they existed (committed) at the start of the transaction. As a result all reads are essentially non-blocking.
In databases that use lock-based concurrency control, e.g DB2, uncommitted reads are possible. A transaction using the UR isolation level ignores locks placed by other transactions, and thus it is able to access rows that have been modified but not yet committed.
Hibernate, being an abstraction layer on top of a database, offers the UR isolation level support for databases that have the capability.
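For illustration, requesting uncommitted reads on a database that supports them might look like this with plain JDBC (the DataSource is assumed; in Hibernate the equivalent is the hibernate.connection.isolation configuration property set to the value of Connection.TRANSACTION_READ_UNCOMMITTED):

```java
import java.sql.Connection;
import java.sql.SQLException;
import javax.sql.DataSource;

// Sketch: requesting uncommitted reads on a database that supports them (e.g. DB2).
public class UncommittedReadExample {
    static void readDirty(DataSource ds) throws SQLException {
        try (Connection con = ds.getConnection()) {
            // Ignore other transactions' locks; rows modified but not yet committed
            // elsewhere become visible to this connection.
            con.setTransactionIsolation(Connection.TRANSACTION_READ_UNCOMMITTED);
            // ... run queries here. Oracle does not support this level, so such a
            // call fails there and reads stay consistent via MVCC.
        }
    }
}
```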

Oracle transaction read-consistency?

I have a problem understanding read consistency in database (Oracle).
Suppose I am the manager of a bank. A customer has acquired a lock (which I don't know about) and is doing some updating. Now, after they have acquired the lock, I am viewing their account information and trying to do something with it. But because of read consistency I will see the data as it existed before the customer got the lock. So won't that affect the inputs I am getting and the decisions I am going to make during that period?
The point about read consistency is this: suppose the customer rolls back their changes? Or suppose those changes fail because of a constraint violation or some system failure?
Until the customer has successfully committed their changes those changes do not exist. Any decision you might make on the basis of a phantom read or a dirty read would have no more validity than the scenario you describe. Indeed they have less validity, because the changes are incomplete and hence inconsistent. Concrete example: if the customer's changes include making a deposit and making a withdrawal, how valid would your decision be if you had looked at the account when they had made the deposit but not yet made the withdrawal?
Another example: a long running batch process updates the salary of every employee in the organisation. If you run a query against employees' salaries do you really want a report which shows you half the employees with updated salaries and half with their old salaries?
edit
Read consistency is achieved by using the information in the UNDO tablespace (rollback segments in the older implementation). When a session reads data from a table which is being changed by another session, Oracle retrieves the UNDO information which has been generated by that second session and substitutes it for the changed data in the result set presented to the first session.
If the reading session is a long-running query it might fail due to the notorious ORA-1555: snapshot too old. This means the UNDO extent which contained the information necessary to assemble a read-consistent view has been overwritten.
Locks have nothing to do with read consistency. In Oracle writes don't block reads. The purpose of locks is to prevent other processes from attempting to change rows we are interested in.
For systems with a large number of users, where users may "hold" a lock for a long time, the Optimistic Offline Lock pattern is usually used, i.e. include a version check in the UPDATE ... WHERE statement.
You can use a date, a version id or something else as the row version. The pseudocolumn ORA_ROWSCN may also be used, but you need to read up on it first.
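A sketch of that pattern with plain JDBC (the ACCOUNTS table, its columns and the DataSource are invented for illustration):

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import javax.sql.DataSource;

// Optimistic Offline Lock: the version read earlier is re-checked in the WHERE
// clause, so a concurrent change is detected instead of blocked.
public class OptimisticLockUpdate {
    static void withdraw(DataSource ds, long accountId, long amount, long expectedVersion)
            throws SQLException {
        try (Connection con = ds.getConnection();
             PreparedStatement ps = con.prepareStatement(
                     "UPDATE ACCOUNTS SET BALANCE = BALANCE - ?, VERSION = VERSION + 1 "
                   + "WHERE ID = ? AND VERSION = ?")) {
            ps.setLong(1, amount);
            ps.setLong(2, accountId);
            ps.setLong(3, expectedVersion);
            if (ps.executeUpdate() == 0) {
                // No row matched: someone else updated it since we read the version.
                throw new IllegalStateException("Account changed by another user; retry");
            }
        }
    }
}
```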
When a record is locked due to changes or an explicit lock statement, an entry is made into the header of that block. This is called an ITL (interested transaction list). When you come along to read that block, your session sees this and knows where to go to get the read consistent copy from the rollback segment.

Resources