Cache Coherency with Tarantool

I understand that Tarantool has ACID transactions within a stored procedure. My question is: does it also make sure the in-memory data is in sync with the persistent file-system data? For example, if I change 5 records using a stored procedure and something goes wrong while writing the changes to the WAL file, will the in-memory cache roll back to the original values for ALL 5 records?
Also, while an update transaction is in progress, will other readers see 'dirty' uncommitted records or a consistent view of the records as they existed before the transaction started?
Thanks

Tarantool has special functions for transaction control[1] inside a stored procedure, but they come with limitations; for instance, you can't call fiber.yield()[2] (including anything that yields under the hood, e.g. fio, sockets and so on) between box.begin() and box.commit(). You can find more about transaction control here: https://tarantool.io/en/doc/1.9/book/box/atomic.html?highlight=yield.
Tarantool also supports fsync via the wal_mode configuration option[3].
[1] https://tarantool.io/en/doc/1.9/book/box/box_txn_management.html?highlight=commit#lua-function.box.commit
[2] https://tarantool.io/en/doc/1.9/reference/reference_lua/fiber.html?highlight=yield#lua-function.fiber.yield
[3] https://tarantool.io/en/doc/1.9/reference/configuration/index.html?highlight=fsync#confval-wal_mode
Readers can see uncommitted data only if the user's code uses fibers and yields inside a transaction without realizing it; in other words, it is possible only when there is a logical error in the user's own code.
You are welcome.

Related

Dynamically list the contents of a table in a database that continuously updates

It's kind of a real-world problem and I believe a solution exists, but I couldn't find one.
We have a database called Transactions that contains tables such as Positions, Securities, Bogies, Accounts, Commodities and so on, which are updated continuously, every second, whenever a new transaction happens. For the time being, we have replicated the master database Transactions to a new database named TRN, on which we do all the querying and updating.
We want a sort of monitoring system (like the htop process viewer in Linux) for the database that dynamically lists the updated rows in its tables at any time.
TL;DR: Is there any way to get a continuously updating list of rows in any table in the database?
Currently we are working with Sybase and Oracle DBMSs on the Linux (Ubuntu) platform, but we would like generic answers that apply to most platforms and DBMSs (including MySQL), as well as any tools, utilities or scripts that can do this, so that it will be easier for us to migrate to other platforms and/or DBMSs in the future.
To list updated rows, you conceptually need either of the two things:
The updating statement's effect on the table.
A previous version of the table to compare with.
How you get them and in what form is completely up to you.
The 1st option allows you to list updates with statement granularity while the 2nd is more suitable for time-based granularity.
Some options from the top of my head:
Write to a temporary table
Add a field with transaction id/timestamp
Make clones of the table regularly
AFAICS, Oracle doesn't have built-in facilities to get the affected rows, only their count.
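As a rough illustration of the second option (a transaction id/timestamp field): the sketch below uses the Positions table from the question with an invented last_modified column; the exact ALTER TABLE and bind-variable syntax varies slightly between DBMSs.
-- Invented column; it must be set by the application (or by a trigger)
-- on every INSERT and UPDATE.
ALTER TABLE Positions ADD last_modified TIMESTAMP;

-- The monitor then polls for rows changed since it last looked.
SELECT *
FROM Positions
WHERE last_modified > :last_poll_time
ORDER BY last_modified;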
Not a lot of details in the question so not sure how much of this will be of use ...
'Sybase' is mentioned but nothing is said about which Sybase RDBMS product (ASE? SQLAnywhere? IQ? Advantage?)
by 'replicated master database transaction' I'm assuming this means the primary database is being replicated (as opposed to the database called 'master' in a Sybase ASE instance)
no mention is made of what products/tools are being used to 'replicate' the transactions to the 'new database' named 'TRN'
So, assuming part of your environment includes Sybase(SAP) ASE ...
MDA tables can be used to capture counters of DML operations (e.g. insert/update/delete) over a given time period
MDA tables can capture some SQL text, though the volume/quality could be in doubt if a) MDA is not configured properly and/or b) the DML operations are wrapped up in prepared statements, stored procs and triggers
auditing could be enabled to capture some commands but again, volume/quality could be in doubt based on how the DML commands are executed
also keep in mind that there's a performance hit for using MDA tables and/or auditing, with the level of performance degradation based on individual config settings and the volume of DML activity
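For what it's worth, a hypothetical MDA query could look roughly like the sketch below; the table and column names (monOpenObjectActivity, RowsInserted, and so on) and the required configuration options should be verified against the documentation for your ASE version.
-- Hypothetical sketch: per-object DML counters from the MDA tables.
SELECT ObjectID,
       object_name(ObjectID, DBID) AS TableName,
       RowsInserted,
       RowsUpdated,
       RowsDeleted
FROM master..monOpenObjectActivity
WHERE DBID = db_id('TRN')
ORDER BY RowsUpdated DESC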
Assuming you're using the Sybase(SAP) Replication Server product, those replicated transactions sent through repserver likely have all the info you need to know which tables/rows are being affected; so you have a couple options:
route a copy of the transactions to another database where you can capture the transactions in whatever format you need [you'll need to design the database and/or any customized repserver function strings]
consider using the Sybase(SAP) Real Time Data Streaming product (yeah, an additional li$ence is required) which is specifically designed for scenarios like yours, i.e., pull transactions off the repserver queues and format them for use in downstream systems (e.g., tibco/mqs, custom apps)
I'm not aware of any 'generic' products that work, out of the box, as per your (limited) requirements. You're likely looking at some different solutions and/or customized code to cover your particular situation.

Dirty Reading in Hibernate

Dirty Read: The definition states that
dirty reading occurs when a transaction reads data from a row that has been modified by another transaction but not yet committed.
Assuming the definition is correct, I am unable to fathom any such situation.
Due to the principle of isolation, transaction A cannot see the uncommitted data of a row that has been modified by transaction B. If transaction B has simply not committed, how can transaction A see it in the first place? It seems only possible when both operations are performed within the same transaction.
Can someone please explain what I am missing here?
"Dirty", or uncommitted reads (UR) are a way to allow non-blocking reads. Reading uncommitted data is not possible in an Oracle database due to the multi-version concurrency control employed by Oracle; instead of trying to read other transactions' data each transaction gets its own snapshot of data as they existed (committed) at the start of the transaction. As a result all reads are essentially non-blocking.
In databases that use lock-based concurrency control, e.g. DB2, uncommitted reads are possible. A transaction using the UR isolation level ignores locks placed by other transactions, and thus it is able to access rows that have been modified but not yet committed.
Hibernate, being an abstraction layer on top of a database, offers the UR isolation level support for databases that have the capability.
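For example, in DB2 a single statement can opt into uncommitted reads with the WITH UR clause (a minimal sketch; accounts and its columns are just example names):
-- Returns rows even if another transaction has modified them and not yet committed.
SELECT account_id, balance
FROM accounts
WITH UR;
On the Hibernate side this usually means requesting the READ UNCOMMITTED JDBC isolation level (java.sql.Connection.TRANSACTION_READ_UNCOMMITTED, value 1), for instance via the hibernate.connection.isolation property, provided the underlying database supports it.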

One data store. Multiple processes. Will this SQL prevent race conditions?

I'm trying to create a Ruby script that spawns several concurrent child processes, each of which needs to access the same data store (a queue of some type) and do something with the data. The problem is that each row of data should be processed only once, and a child process has no way of knowing whether another child process might be operating on the same data at the same instant.
I haven't picked a data store yet, but I'm leaning toward PostgreSQL simply because it's what I'm used to. I've seen the following SQL fragment suggested as a way to avoid race conditions, because the UPDATE clause supposedly locks the table row before the SELECT takes place:
UPDATE jobs
SET status = 'processed'
WHERE id = (
SELECT id FROM jobs WHERE status = 'pending' LIMIT 1
) RETURNING id, data_to_process;
But will this really work? It doesn't seem intuitive that Postgres (or any other database) could lock the table row before performing the SELECT, since the SELECT has to be executed to determine which row needs to be locked for updating. In other words, I'm concerned that this SQL fragment won't really prevent two separate processes from selecting and operating on the same table row.
Am I being paranoid? And are there better options than traditional RDBMSs to handle concurrency situations like this?
As you said, use a queue. The standard solution for this in PostgreSQL is PgQ. It has all these concurrency problems worked out for you.
Do you really want many concurrent child processes that must operate serially on a single data store? I suggest that you create one writer process who has sole access to the database (whatever you use) and accepts requests from the other processes to do the database operations you want. Then do the appropriate queue management in that thread rather than making your database do it, and you are assured that only one process accesses the database at any time.
The situation you are describing is called "Non-repeatable read". There are two ways to solve this.
The preferred way would be to set the transaction isolation level to at least REPEATABLE READ. This means that concurrent updates of the nature you described will fail: if two processes update the same rows in overlapping transactions, one of them will be canceled, its changes ignored, and an error returned. That transaction will have to be retried. This is achieved by calling
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ
at the start of the transaction. I can't seem to find documentation that explains an idiomatic way of doing this from Ruby; you may have to emit that SQL explicitly.
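A minimal sketch of that approach, reusing the jobs table from the question (the retry handling is only indicated in a comment, not implemented):
BEGIN;
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;

-- If a concurrent transaction has already updated the chosen row, this
-- statement fails with a serialization error (SQLSTATE 40001).
UPDATE jobs
SET status = 'processed'
WHERE id = (SELECT id FROM jobs WHERE status = 'pending' LIMIT 1)
RETURNING id, data_to_process;

COMMIT; -- on a 40001 error, ROLLBACK and retry the whole block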
The other option is to manage the locking of tables explicitly, which can cause a transaction to block (and possibly deadlock) until the table is free. Transactions won't fail in the same way as they do above, but contention will be much higher, and so I won't describe the details.
That's pretty close to the approach I took when I wrote pg_message_queue, which is a simple queue implementation for PostgreSQL. Unlike PgQ, it requires no components outside of PostgreSQL to use.
It will work just fine. MVCC will come to the rescue.

The best way to track data changes in Oracle

As the title says, what's the best way to track data changes in Oracle? I just want to know which rows are being updated/deleted/inserted.
At first I thought about triggers, but I would need to write a trigger on each table and then record the affected rowids in my change table, which isn't great. Then I searched Google and learned about new concepts: materialized view logs and Change Data Capture.
A materialized view log looks good to me because I can compare it with the original table and get the differing records, even the differences in individual fields; I think this is the same as creating/copying a new table from the original (but I don't know what the difference is).
The Change Data Capture component is complicated for me :), so I don't want to spend time researching it.
Does anybody have experience with the best way to track data changes in Oracle?
You'll want to have a look at the AUDIT statement. It gathers all auditing records in the SYS.AUD$ table.
Example:
AUDIT insert, update, delete ON t BY ACCESS
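You can then check what was recorded by querying the audit trail, for example (this assumes the audit_trail initialization parameter is set to DB so that records land in the database):
SELECT username, action_name, timestamp, obj_name
FROM dba_audit_trail
WHERE obj_name = 'T'
ORDER BY timestamp;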
Regards,
Rob.
You might want to take a look at Golden Gate. This makes capturing changes a snap, at a price but with good performance and quick setup.
If performance is no issue, triggers and audit could be a valid solution.
If performance is an issue and Golden Gate is considered too expensive, you could also use Logminer or Change Data Capture. Given this choice, my preference would go for CDC.
As you see, there are quite a few options, near realtime and offline.
Coding a solution by hand also has a price, Golden Gate is worth investigating.
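If you want to experiment with LogMiner, a session looks roughly like the sketch below; the redo log file name and the schema name are made up, and the exact options should be checked against the DBMS_LOGMNR documentation for your release.
-- Register a redo/archive log and start LogMiner using the online catalog.
EXECUTE DBMS_LOGMNR.ADD_LOGFILE(LOGFILENAME => '/u01/oradata/redo01.log', OPTIONS => DBMS_LOGMNR.NEW);
EXECUTE DBMS_LOGMNR.START_LOGMNR(OPTIONS => DBMS_LOGMNR.DICT_FROM_ONLINE_CATALOG);

-- Each changed row shows up with its reconstructed SQL.
SELECT scn, operation, seg_name, sql_redo
FROM v$logmnr_contents
WHERE seg_owner = 'YOUR_SCHEMA';

EXECUTE DBMS_LOGMNR.END_LOGMNR;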
Oracle does this for you via redo logs, it depends on what you're trying to do with this info. I'm assuming your need is replication (track changes on source instance and propagate to 1 or more target instances).
If thats the case, you may consider Oracle streams (other options such as Advanced Replication, but you'll need to consider your needs):
From Oracle:
When you use Streams, replication of a DML or DDL change typically includes three steps:
A capture process or an application creates one or more logical change records (LCRs) and enqueues them into a queue. An LCR is a message with a specific format that describes a database change. A capture process reformats changes captured from the redo log into LCRs, and applications can construct LCRs. If the change was a data manipulation language (DML) operation, then each LCR encapsulates a row change resulting from the DML operation to a shared table at the source database. If the change was a data definition language (DDL) operation, then an LCR encapsulates the DDL change that was made to a shared database object at a source database.
A propagation propagates the staged LCR to another queue, which usually resides in a database that is separate from the database where the LCR was captured. An LCR can be propagated to a number of queues before it arrives at a destination database.
At a destination database, an apply process consumes the change by applying the LCR to the shared database object. An apply process can dequeue the LCR and apply it directly, or an apply process can dequeue the LCR and send it to an apply handler. In a Streams replication environment, an apply handler performs customized processing of the LCR and then applies the LCR to the shared database object.
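As a hedged illustration of the capture step only, registering a single table for Streams capture looks roughly like this (the schema, queue and process names are invented; check the DBMS_STREAMS_ADM documentation for the exact parameters in your release):
-- Invented names: schema src, table positions, queue owned by strm_admin.
BEGIN
  DBMS_STREAMS_ADM.ADD_TABLE_RULES(
    table_name   => 'src.positions',
    streams_type => 'capture',
    streams_name => 'capture_positions',
    queue_name   => 'strm_admin.streams_queue',
    include_dml  => TRUE,
    include_ddl  => FALSE);
END;
/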

Two Phase Commit/Shared Transaction

The scenario is this
We have two applications, A and B, both of which are running in separate database (Oracle 9i) transactions
Application A - inserts some data into the database, then calls Application B
Application B - inserts some data into the database, related (via foreign keys) to A's data. Returns an "ID" to Application A
Application A - uses ID to insert further data, including the ID from B
Now, because these are separate transactions, but both rely on data from each other's transactions, we need to commit between the calls to each application. This of course makes it very difficult to roll back if anything goes wrong.
How would you approach this problem with minimal refactoring of the code? Surely this kind of thing is a common problem in the SOA world?
------ Update --------
I have not been able to find anything in Oracle 9i, however Oracle 11g provides DBMS_XA, which does exactly what I was after.
You have three options:
Redesign the application so that you don't have two different processes (both with database connections) writing to the database and roll it into a single app.
Create application C that handles all the database transactions for A and B.
Roll your own two phase commit. Application C acts as the coordinator. C signals A and B to ask if they're ready to commit. A and B do their processing, and respond to C with either a "ready" or a "fail" reply (note that there should be a timeout on C to avoid an infinite wait if one process hangs or dies). If both reply ready then C tells them to commit. Otherwise it sends a rollback signal.
Note that you may run into issues with option 3 if app A is relying on foreign keys from app B (which you didn't state, so this may not be an issue). Oracle's read consistency would probably prevent this from being allowed, since app A's transaction will begin before app B. Just a warning.
A few suggestions:
Use compensating transactions. Basically, you make it possible to undo the transaction you did earlier. The hard part is figuring out which transactions to roll back.
Commit the data of applications A and B to the database using a flag indicating that it is only temporary. Then, after everything checks out fine, modify the flag to indicate that the data is final. During the night, run a batch job to flush out data that has not been finalized.
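A minimal sketch of the flag idea (the table name, columns and retention window below are invented for illustration):
-- Both applications insert their rows marked as temporary.
INSERT INTO orders (id, payload, status, created_at)
VALUES (1001, '...', 'PENDING', SYSDATE);

-- Once everything checks out fine, the data is finalized.
UPDATE orders SET status = 'FINAL' WHERE id = 1001;

-- Nightly batch job: flush out data that was never finalized.
DELETE FROM orders
WHERE status = 'PENDING'
AND created_at < SYSDATE - 1;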
You could probably insert the data from Application A into a 'temporary' area so that Application B can do the inserts of both A and B without changing much in either application. It's not particularly elegant but it might do the trick.
In another scenario you could add a 'confirmation' flag field to your data which is updated after the entire process has run successfully. If it fails at one point, it might be easier to track down the records you need to roll back (in effect, delete).
I like both solutions presented, so I avoided posting this for a while. But you could also make an update to the main table, having saved the state of the affected rows in some cache beforehand.
This could be combined with the two-tier approach (the traffic-cop system Zathrus proposed), although it really wouldn't be needed for neonski's solution of using a "sketchpad" table or tables. The drawback of this is that you would have to have your procs/logic consult the main table from the work area or the work area from the main table, or perhaps store your flag in the main table and set it back when you commit the data to the main table.
A lady on our team is designing something like that for our realtime system, using permanent work tables.
App_A =={0}=> database # App_A stores information for App_B
App_A ------> App_B # App_A starts App_B
App_B <={0}== database # App_B retrieves the information
App_B =={1}=> database # App_B stores more information
App_A <={2}== App_B # App_B returns 'ID' to App_A
App_A ={2,3}> database # App_A stores 'ID' and additional data
Is it just me, or does it seem like Application B is essentially just a subroutine of A? I mean, Application B doesn't do anything until A asks it, and Application A doesn't do anything until B returns an ID, which means it makes little sense to have them in different applications, or even separate threads.
