Commits in the absence of locks in CockroachDB - cockroachdb

I'm trying to understand how ACID in CockroachDB works without locks, from an application programmer's point of view. Would like to use it for an accounting / ERP application.
When two users update the same database field (e.g. a general ledger account total field) at the same time what does CockroachDB do? Assuming each is updating many other non-overlapping fields at the same time as part of the respective transactions.
Will the aborted application's commit process be informed about this immediately at the time of the commit?
Do we need to take care of additional possibilities than, for example, in ACID/locking PostgreSQL when we write the database access code in our application?
Or is writing code for accessing CockroachDB for all practical purposes the same as for accessing a standard RDBMS with respect to commits and in general.
Of course, ignoring performance issues / joins, etc.

I'm trying to understand how ACID in CockroachDB works without locks, from an application programmer's point of view. Would like to use it for an accounting / ERP application.
CockroachDB does have locks, but uses different terminology. Some of the existing documentation that talks about optimistic concurrency control is currently being updated.
When two users update the same database field (e.g. a general ledger account total field) at the same time what does CockroachDB do? Assuming each is updating many other non-overlapping fields at the same time as part of the respective transactions.
One of the transactions will block waiting for the other to commit. If a deadlock between the transactions is detected, one of the two transactions involved in the deadlock will be aborted.
Will the aborted application's commit process be informed about this immediately at the time of the commit?
Yes.
Do we need to take care of additional possibilities than, for example, in ACID/locking PostgreSQL when we write the database access code in our application?
Or is writing code for accessing CockroachDB for all practical purposes the same as for accessing a standard RDBMS with respect to commits and in general.
At a high-level there is nothing additional for you to do. CockroachDB defaults to serializable isolation which can result in more transaction restarts that weaker isolation levels, but comes with the advantage that the application programmer doesn't have to worry about anomalies.

Related

What's the performance penalty of long lived DB transactions interleaved with one another?

Could anyone provide an explanation or point me to a good source where it is explained the impact of long lived database transactions when there are other transactions involved?
I'm having difficulties trying to understand what is the real impact in the performance of an application of having transactions where most of the queries are reads and maybe a couple or three are writes, given the different isolation levels.
Mostly I would like to understand it in the situation where:
Neither the rows read nor the rows updated are involved in any other transaction.
The rows read are involved in another transaction but not the rows being updated and this other transaction is read only.
The rows read are involved in another transaction but not the rows being updated and this other transaction is modifying some data being read. I understand here it also affects whether the data is read before or after is being modified.
Both the rows read and the rows updated are involved in another transaction also modifying the data.
These questions come in the context of an application using micro services where all application layer services are annotated with #Transactional using JPA and PostgreSQL and, to transform the data, they need to do some network calls to other micro services within the transaction to fetch some other values.

Dynamically List contents of a table in database that continously updates

It's kinda real-world problem and I believe the solution exists but couldn't find one.
So We, have a Database called Transactions that contains tables such as Positions, Securities, Bogies, Accounts, Commodities and so on being updated continuously every second whenever a new transaction happens. For the time being, We have replicated master database Transaction to a new database with name TRN on which we do all the querying and updating stuff.
We want a sort of monitoring system ( like htop process viewer in Linux) for Database that dynamically lists updated rows in tables of the database at any time.
TL;DR Is there any way to get a continuous updating list of rows in any table in the database?
Currently we are working on Sybase & Oracle DBMS on Linux (Ubuntu) platform but we would like to receive generic answers that concern most of the platform as well as DBMS's(including MySQL) and any tools, utilities or scripts that can do so that It can help us in future to easily migrate to other platforms and or DBMS as well.
To list updated rows, you conceptually need either of the two things:
The updating statement's effect on the table.
A previous version of the table to compare with.
How you get them and in what form is completely up to you.
The 1st option allows you to list updates with statement granularity while the 2nd is more suitable for time-based granularity.
Some options from the top of my head:
Write to a temporary table
Add a field with transaction id/timestamp
Make clones of the table regularly
AFAICS, Oracle doesn't have built-in facilities to get the affected rows, only their count.
Not a lot of details in the question so not sure how much of this will be of use ...
'Sybase' is mentioned but nothing is said about which Sybase RDBMS product (ASE? SQLAnywhere? IQ? Advantage?)
by 'replicated master database transaction' I'm assuming this means the primary database is being replicated (as opposed to the database called 'master' in a Sybase ASE instance)
no mention is made of what products/tools are being used to 'replicate' the transactions to the 'new database' named 'TRN'
So, assuming part of your environment includes Sybase(SAP) ASE ...
MDA tables can be used to capture counters of DML operations (eg, insert/update/delete) over a given time period
MDA tables can capture some SQL text, though the volume/quality could be in doubt if a) MDA is not configured properly and/or b) the DML operations are wrapped up in prepared statements, stored procs and triggers
auditing could be enabled to capture some commands but again, volume/quality could be in doubt based on how the DML commands are executed
also keep in mind that there's a performance hit for using MDA tables and/or auditing, with the level of performance degradation based on individual config settings and the volume of DML activity
Assuming you're using the Sybase(SAP) Replication Server product, those replicated transactions sent through repserver likely have all the info you need to know which tables/rows are being affected; so you have a couple options:
route a copy of the transactions to another database where you can capture the transactions in whatever format you need [you'll need to design the database and/or any customized repserver function strings]
consider using the Sybase(SAP) Real Time Data Streaming product (yeah, additional li$ence is required) which is specifically designed for scenarios like yours, ie, pull transactions off the repserver queues and format for use in downstream systems (eg, tibco/mqs, custom apps)
I'm not aware of any 'generic' products that work, out of the box, as per your (limited) requirements. You're likely looking at some different solutions and/or customized code to cover your particular situation.

Reduce durability to improve performace in db2

In db2 10.5, is it possible that transaction commit does not wait for log IO to finish and return control to the client, like SQL server's delayed durability?
Is there any way to reduce the number of log IOs when there are large number of serial small transactions?
Db2 on Linux/Unit/Windows, at version 11.1 does not currently support lazy commits as offered by some versions of Microsoft SQL-server.
For some kinds of processing, especially large batches, it is often very convenient to use unlogged global temporary tables for intermediate tables. That is the one convenient way to eliminate logging overhead, although the use case is limited to specific scenarios. Such tables (declared global temporary tables, or created global temporary tables) let you do fast processing without incurring logging overhead, although you have to design your batches (typically your stored procedures) to work with these kinds of tables, including the ability to restart after failures and recover etc.
If you have high frequency discrete OLTP transactions that include both insert AND update (not batch) , you should concentrate on optimising your active-logging configuration. For example to ensure your active logs are on the fastest media, to ensure your Db2 is never waiting for log files, to ensure that the logbufsz is adequate, to ensure that the bufferpool cleaning is optimal etc, to ensure that the size of your transaction-log files is compatible with your RTO and RPO Service-levels.
Db2 LUW has a database configuration parameter called mincommit. The value of mincommit indicates how many transactions will commit before flushing out the log buffer to disk. This is probably what you are looking for.
From Db2 LUW v10.5 and higher, this parameter is ignored, and the value is only meaningful on versions of Db2 LUW up to v10.1.
For older versions of Db2 LUW, the recommendation is to leave the value at 1. In most cases there is no performance improvement while at the same time introducing risk. Hence, the configuration parameter has been deprecated in version 10.1. My advice: Do not use it even though it still exists.

Transaction Performance Impact of Materialized View Logs

I have been researching using materialized views for data aggregation and reporting purposes for a company that is largely centered around transactions (using an Oracle db). The current reporting system is dependent upon a series of views that obscure a lot of the complex data logic of the application. These views place a heavy burden on the system when they are called.
We are interested in using the "fast refresh" for incremental updates to perform some of the complex query logic prior to use in reporting; however, there is a concern within the organization that the materialized view logs (which are required for this fast refresh) will have an impact on our current transaction performance in the database. This performance is very essential to our organization therefore there is a great fear of any change.
Here is an example of the type of materialized view log we would need to implement:
create materialized view log on transaction
with rowid, sequence(transaction_id,account_id,order_id,currency_id,price,transaction_date,payment_processor_id)
including new values;
We would not be using the "on commit" clause for updates but rather the "on demand" clause in creation of the view, as we understand this would have a performance impact.
Will implementing this type of logging affect database transaction performance? I imagine that it must slightly affect performance as there is an additional write procedure (to the log) that is wrapped in the commit, but I cannot find any reference to this in the Oracle documentation. Any literature or advice on this subject would be greatly appreciated.
Thanks for your help!
Yes, there will be an impact. The materialized log needs to be maintained synchronously so the transactions will need to insert a new row into the materialized view log for every row that is modified in the base table. How great an impact depends heavily on the system. If your system is I/O bound and you've optimized it so that physically writing the changes to the base table is a significant fraction of the wait time, the impact will be much greater than if your system is CPU bound and most of your wait time is spent reading data or performing computations.
If you are really concerned about the performance of the OLTP system, it would make sense to offload reporting to a different database on a different server. You can replicate the data to the reporting server using Streams (or GoldenGate if you can afford the additional licensing) which will have less of an impact on the source than materialized views because the redo information can be read asynchronously (and can be read on the reporting server rather than putting that workload on the production server). You could then define materialized views on the reporting server where they won't have any impact on the OLTP server. Or you could create a logical standby database as your reporting server and create the materialized views there. Either way, moving the reporting workload off the production server and reading the redo data asynchronously will protect the performance of the production server.

One data store. Multiple processes. Will this SQL prevent race conditions?

I'm trying to create a Ruby script that spawns several concurrent child processes, each of which needs to access the same data store (a queue of some type) and do something with the data. The problem is that each row of data should be processed only once, and a child process has no way of knowing whether another child process might be operating on the same data at the same instant.
I haven't picked a data store yet, but I'm leaning toward PostgreSQL simply because it's what I'm used to. I've seen the following SQL fragment suggested as a way to avoid race conditions, because the UPDATE clause supposedly locks the table row before the SELECT takes place:
UPDATE jobs
SET status = 'processed'
WHERE id = (
SELECT id FROM jobs WHERE status = 'pending' LIMIT 1
) RETURNING id, data_to_process;
But will this really work? It doesn't seem intuitive the Postgres (or any other database) could lock the table row before performing the SELECT, since the SELECT has to be executed to determine which table row needs to be locked for updating. In other words, I'm concerned that this SQL fragment won't really prevent two separate processes from select and operating on the same table row.
Am I being paranoid? And are there better options than traditional RDBMSs to handle concurrency situations like this?
As you said, use a queue. The standard solution for this in PostgreSQL is PgQ. It has all these concurrency problems worked out for you.
Do you really want many concurrent child processes that must operate serially on a single data store? I suggest that you create one writer process who has sole access to the database (whatever you use) and accepts requests from the other processes to do the database operations you want. Then do the appropriate queue management in that thread rather than making your database do it, and you are assured that only one process accesses the database at any time.
The situation you are describing is called "Non-repeatable read". There are two ways to solve this.
The preferred way would be to set the transaction isolation level to at least REPEATABLE READ. This will mean that any row that concurrent updates of the nature you described will fail. if two processes update the same rows in overlapping transactions one of them will be canceled, its changes ignored, and will return an error. That transaction will have to be retried. This is achieved by calling
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ
At the start of the transaction. I can't seem to find documentation that explains an idiomatic way of doing this for ruby; you may have to emit that sql explicitly.
The other option is to manage the locking of tables explicitly, which can cause a transaction to block (and possibly deadlock) until the table is free. Transactions won't fail in the same way as they do above, but contention will be much higher, and so I won't describe the details.
That's pretty close to the approach I took when I wrote pg_message_queue, which is a simple queue implementation for PostgreSQL. Unlike PgQ, it requires no components outside of PostgreSQL to use.
It will work just fine. MVCC will come to the rescue.

Resources