I have a flight reservation program that uses MSSQL.
For reserving flights, I want to know whether I should use an isolation level or locks.
(This is sample code; my question is about the isolation level for this situation, not about how to do the reservation.)
My Database has a table for inventory like:
Inventory Table
------------------------
id (Pk),
FlightNumber,
Total,
Sold
Now, if someone wants to reserve a flight, I use this code in a transaction:
DECLARE @total int;
DECLARE @sold int;
SELECT @total = Total, @sold = Sold FROM Inventory WHERE FlightNumber = 'F3241b';
IF @total - @sold > 0
BEGIN
    UPDATE Inventory SET Sold = Sold + 1 WHERE FlightNumber = 'F3241b';
    PRINT 'Reserve Complete'
END
ELSE
    PRINT 'this flight is full'
I have these questions:
Q1: Should I use locks or isolation levels? Does either one have a performance benefit?
Q2: Depending on the answer to Q1, which isolation level or lock should I use?
If you're looking to see what isolation level will make the sample code work as it stands, rather than what is the best way to solve the problem addressed by the sample code, you would need the guarantees of at least REPEATABLE READ.
Databases which use strict two-phase locking (S2PL) for concurrency allow READ COMMITTED transactions to drop shared locks at the completion of each statement, or even earlier, so between the time transaction A checks availability and the time it claims the seats, someone else could come through with transaction B and read again, without causing either transaction to fail. Transaction A might block transaction B briefly, but both would update, and you could be over-sold.
In databases using multi-version concurrency control (MVCC) for concurrency, reads don't block writes and writes don't block reads. At READ COMMITTED, each statement uses a new snapshot of the database based on what has committed, and in at least some (I know this is true in PostgreSQL), concurrent writes are resolved without error. So even if transaction A was in the process of updating the sold count, or had done so and not committed, transaction B would see the old count and proceed to update. When it attempted the update, it could block waiting for the previous update, but once that committed, it would find the new version of the row, check whether it meets the selection criteria, update if it does and ignore the row if not, and proceed to commit without error. So, again, you are over-sold.
I guess that answers Q2, if you choose to use transaction isolation. The problem can be solved at a lower isolation level by modifying the example code to take explicit locks, but that will usually cause more blocking than using an isolation level which is strict enough to handle it automatically.
You are overly complicating things. All your queries can be replaced with:
Update inventory
set Sold = Sold + 1
where FlightNumber = 'F3241b'
AND Total - Sold > 0 -- Important!
If the flight is full, the UPDATE won't take place (the second condition is not met) and it will return 0 modified rows. If this is the case it means the flight is full. Otherwise the query modifies the Sold value and returns 1 modified row.
In this case any isolation level is fine because a single query is always atomic. This is somewhat similar to optimistic-locking.
BTW, this query can easily be adapted to allow an arbitrary number of reservations to be made atomically:
Update inventory
set Sold = Sold + @seats
where FlightNumber = 'F3241b'
AND Total - Sold >= @seats
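The affected-row check described above can be tried out with a minimal sketch. It assumes Python's built-in sqlite3 module as a stand-in for SQL Server, and the helper name `reserve` is made up for illustration. The statement itself is the conditional UPDATE from this answer.

```python
import sqlite3

def reserve(conn: sqlite3.Connection, flight: str, seats: int = 1) -> bool:
    """Try to book `seats` seats on `flight`; return True if the booking succeeded."""
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            "UPDATE Inventory SET Sold = Sold + ? "
            "WHERE FlightNumber = ? AND Total - Sold >= ?",
            (seats, flight, seats),
        )
    # rowcount is the number of rows the UPDATE modified:
    # 1 means the seats were booked, 0 means there were not enough seats left.
    return cur.rowcount == 1

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Inventory ("
    "id INTEGER PRIMARY KEY, FlightNumber TEXT, Total INTEGER, Sold INTEGER)"
)
conn.execute("INSERT INTO Inventory (FlightNumber, Total, Sold) VALUES ('F3241b', 3, 1)")
conn.commit()

print(reserve(conn, "F3241b", 2))  # two seats are still free, so this books them
print(reserve(conn, "F3241b", 1))  # the flight is now full, so nothing is updated
```

The caller never reads `Total` and `Sold` separately, so there is no window between the check and the write.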
See this link that explains SNAPSHOT ISOLATION level in SQL Server.
http://msdn.microsoft.com/en-us/library/ms345124(v=sql.90).aspx
They talk about a car rental application.
If you need a more restrictive isolation level you could move to IsolationLevel Serializable. But be warned that this is prone to locking and might affect your performance.
I started using Spring a few months ago and have a question about transactions. A Java method in my Spring Batch job first runs a select to get the first 100 rows with status 'NOT COMPLETED' and then updates the selected rows to change the status to 'IN PROGRESS'. Since I'm processing around 10 million records, I want to run multiple instances of my batch job, each with multiple threads. Within a single instance, I made the method synchronized so that two threads never fetch the same set of records. But if I run multiple instances of the batch job (multiple JVMs), there is a high probability that both instances fetch the same set of records, even if I use "optimistic" or "pessimistic" locking or "select for update", since we cannot lock records during selection. An example is shown below. Transaction 1 fetches 100 records and, meanwhile, Transaction 2 also fetches 100 records. If I enable locking, Transaction 2 waits until Transaction 1 has updated and committed, but Transaction 2 then performs the same update again.
Is there any way in Spring to make Transaction 2's select wait until Transaction 1's select has completed?
Transaction 1            Transaction 2
fetch 100 records
                         fetch 100 records
update 100 records
commit
                         update 100 records
                         commit
@Transactional
public synchronized List<Student> processStudentRecords() {
    // Fetch the next batch of rows whose status is 'NOT COMPLETED'
    List<Student> students = getNotCompletedRecords();
    if (null != students && students.size() > 0) {
        // Mark the fetched rows as 'IN PROGRESS'
        updateStatusToInProgress(students);
    }
    return students;
}
Note: I cannot perform the update first and then select. I would appreciate any suggestions for an alternative approach.
Transaction synchronization should be left to the database server and not managed at the application level. From the database server point of view, no matter how many JVMs (threads) you have, those are concurrent database clients asking for read/write operations. You should not bother yourself with such concerns.
What you should do though is try to minimize contention as much as possible in the design of your solution, for example, by using the (remote) partitioning technique.
if I run multiple instances of my batch job (multiple JVMs), there is high probability that same set of records might be fetched by both the instances even if I use "optimistic" or "pessimistic lock" or "select for update" since we cannot lock records during selection
Partitioning the data will, by design, remove all these problems. If you give each instance its own set of data to work on, there is no chance that a worker selects the same set of records as another worker. Michael gave a detailed example in this answer: https://stackoverflow.com/a/54889092/5019386.
(Logical) partitioning, however, will not solve the contention problem, since all workers would read from and write to the same table, but that's the nature of the problem you are trying to solve. What I'm saying is that you don't need to start locking and unlocking the table in your design; leave this to the database. Some database servers like Oracle can write data of the same table to different partitions on disk to optimize concurrent access (which might help if you use partitioning), but again that's Oracle's business, not Spring's (or any other framework's).
Not everybody can afford Oracle so I would look for a solution at the conceptual level. I have successfully used the following solution ("Pseudo" physical partitioning) to a problem similar to yours:
Step 1 (in serial): copy/partition unprocessed data to temporary tables
Step 2 (in parallel): run multiple workers on these tables instead of the source table with millions of rows.
Step 3 (in serial): copy/update processed data back to the original table
Step 2 removes the contention problem. Usually, the cost of (Step 1 + Step 3) is negligible compared to Step 2 (even more negligible if Step 2 is done in serial). This works well if the processing is the bottleneck.
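The three steps can be sketched as follows. This is a minimal illustration that assumes Python's built-in sqlite3 module, a made-up `records(id, status)` table, and staging tables named `part_0`, `part_1`, and so on. Rows are assigned to workers by `id % workers`, which is just one possible partitioning rule. The workers run one after another here, whereas a real job would run Step 2 in parallel.

```python
import sqlite3

def partition(conn, workers):
    """Step 1 (serial): copy unprocessed rows into one staging table per worker."""
    for w in range(workers):
        conn.execute(f"CREATE TABLE part_{w} (id INTEGER PRIMARY KEY, status TEXT)")
        conn.execute(
            f"INSERT INTO part_{w} SELECT id, status FROM records "
            "WHERE status = 'NOT COMPLETED' AND id % ? = ?",
            (workers, w),
        )

def process(conn, w):
    """Step 2 (parallel in a real job): worker w touches only its own staging table."""
    conn.execute(f"UPDATE part_{w} SET status = 'COMPLETED'")

def merge(conn, workers):
    """Step 3 (serial): copy the processed rows back and drop the staging tables."""
    for w in range(workers):
        conn.execute(
            f"UPDATE records SET status = "
            f"(SELECT p.status FROM part_{w} AS p WHERE p.id = records.id) "
            f"WHERE id IN (SELECT id FROM part_{w})"
        )
        conn.execute(f"DROP TABLE part_{w}")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany(
    "INSERT INTO records VALUES (?, 'NOT COMPLETED')", [(i,) for i in range(1, 11)]
)

partition(conn, 3)
for worker in range(3):
    process(conn, worker)
merge(conn, 3)
conn.commit()
```

Because each row lands in exactly one staging table, no two workers can ever pick up the same record, and no explicit locking is needed in the application.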
Hope this helps.
I have an e-commerce site written with Spring Boot + Angular. I need to maintain a counter in my product table to track how many items have been sold. But the counter sometimes becomes inaccurate when many users order the same item concurrently.
In my service code, I have the following transactional declaration:
@Transactional(propagation = Propagation.REQUIRES_NEW, isolation = Isolation.READ_COMMITTED)
in which, after persisting the order (using CrudRepository.save()), I do a select query to sum the quantities ordered so far, hoping the select query will count all orders that have been committed. But that doesn't seem to be the case: from time to time, the counter is less than the actual number.
The same issue happens in my other use case: limiting the quantity of a product. I use the same transaction isolation setting. In the code, I do a select query to see how many have been sold and throw an out-of-stock error if we can't fulfill the order. But for hot items, we sometimes oversell the item because each thread doesn't see the orders just committed in other threads.
So is READ_COMMITTED the right isolation level for my use case? Or should I use pessimistic locking for this use case?
UPDATE 05/13/17
I chose Ruben's approach as I know more about java than database so I took the easier road for me. Here's what I did.
@Transactional(propagation = Propagation.REQUIRES_NEW, isolation = Isolation.SERIALIZABLE)
public void updateOrderCounters(Purchase purchase, ACTION action)
I use JpaRepository, so I don't work with the entityManager directly. Instead, I just put the code that updates the counters in a separate method and annotated it as above. It seems to work well so far. I have seen more than 60 concurrent connections placing orders with no overselling, and the response time seems OK as well.
Depending on how you retrieve the total sold items count the available options might differ :
1. If you calculate the sold items count dynamically via a sum query on orders
I believe in this case the option you have is using SERIALIZABLE isolation level for the transaction, since this is the only one which supports range locks and prevents phantom reads.
However, I would not really recommend this isolation level, since it has a major performance impact on your system (or use it very carefully, and only in a few well-designed spots).
Links : https://dev.mysql.com/doc/refman/5.7/en/innodb-transaction-isolation-levels.html#isolevel_serializable
2. If you maintain a counter on product or some other row associated with the product
In this case I would probably recommend using row level locking eg select for update in a service method which checks the availability of the product and increments the sold items count. The high level algorithm of the product placement could be similar to the steps below :
Retrieve the row storing the remaining/sold item count using a select for update query (@Lock(LockModeType.PESSIMISTIC_WRITE) on a repository method).
Make sure that the retrieved row has up to date field values since it could be retrieved from the Hibernate session level cache (hibernate would just execute select for update query on the id just to acquire the lock). You can achieve this by calling 'entityManager.refresh(entity)'.
Check the count field of the row and if the value is fine with your business rules then increment/decrement it.
Save the entity, flush the hibernate session, and commit the transaction (explicitly or implicitly).
Pseudocode is below:
@Transactional
public Product performPlacement(@Nonnull final Long id) {
    Assert.notNull(id, "Product id should not be null");
    entityManager.flush();
    final Product product = entityManager.find(Product.class, id, LockModeType.PESSIMISTIC_WRITE);
    // Make sure to get the latest version from the database after acquiring the lock,
    // since if a load was performed in the same Hibernate session then Hibernate
    // will only acquire the lock but use fields from the cache
    entityManager.refresh(product);
    // Execute check and booking operations
    // This method call could just check if availableCount > 0
    if (product.isAvailableForPurchase()) {
        // This method could potentially just decrement the available count, e.g. --availableCount
        product.registerPurchase();
    }
    // Persist the updated product
    entityManager.persist(product);
    entityManager.flush();
    return product;
}
This approach makes sure that no two threads or transactions will ever perform a check and update concurrently on the same row storing the count of a product.
However, because of that, it will also degrade performance somewhat, so it is essential to make sure the atomic increment/decrement is used as late in the purchase flow as possible and as rarely as possible (e.g., right in the checkout handling routine when the customer hits pay). Another useful trick for minimizing the effect of the lock is to put the 'count' column not on the product itself but in a different table associated with the product. This prevents you from locking the product rows, since the locks are acquired on a different row/table combination that is used purely during the checkout stage.
Links: https://dev.mysql.com/doc/refman/5.7/en/innodb-locking-reads.html
Summary
Please note that both of the techniques introduce extra synchronization points in your system hence reducing throughput. So please make sure to carefully measure the impact it has on your system via performance test or any other technique which is being used in your project for measuring the throughput.
Quite often, online shops choose to risk overselling/overbooking some items rather than hurt performance.
Hope this helps.
With these transaction settings, you should see the stuff that is committed. But still, your transaction handling isn't water tight. The following might happen:
Let's say you have one item in stock left.
Now two transactions start, each ordering one item.
Both check the inventory and see: "Fine enough stock for me."
Both commit.
Now you oversold.
Isolation level serializable should fix that. BUT
the isolation levels available in different databases vary widely, so I don't think it is actually guaranteed to give you the requested isolation level
this seriously limits scalability. The transactions doing this should be as short and as rare as possible.
Depending on the database you are using, it might be a better idea to implement this with a database constraint. In Oracle, for example, you could create a materialized view calculating the complete stock and put a constraint on the result to be non-negative.
Update
For the materialized view approach you do the following.
Create a materialized view that calculates the value you want to constrain, e.g. the sum of orders. Make sure the materialized view gets updated in the transaction that changes the content of the underlying tables.
For Oracle this is achieved by the ON COMMIT clause.
ON COMMIT Clause
Specify ON COMMIT to indicate that a fast refresh is to occur whenever the database commits a transaction that operates on a master table of the materialized view. This clause may increase the time taken to complete the commit, because the database performs the refresh operation as part of the commit process.
See https://docs.oracle.com/cd/B19306_01/server.102/b14200/statements_6002.htm for more details.
Put a check constraint on that materialized view to encode the constraint that you want, e.g. that the value is never negative. Note that a materialized view is just another table, so you can create constraints just as you normally would.
See for example https://www.techonthenet.com/oracle/check.php
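The general idea of letting a constraint reject an oversell can be tried without Oracle. The sketch below is not the materialized view approach itself: it assumes a plain `stock` counter column with a CHECK constraint, uses Python's built-in sqlite3 module, and the names `products`, `open_db` and `order_one` are made up for illustration.

```python
import sqlite3

def open_db(stock: int) -> sqlite3.Connection:
    """Create an in-memory database with one product and the given stock."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE products ("
        "id INTEGER PRIMARY KEY, "
        "stock INTEGER NOT NULL CHECK (stock >= 0))"  # the database enforces the rule
    )
    conn.execute("INSERT INTO products (id, stock) VALUES (1, ?)", (stock,))
    conn.commit()
    return conn

def order_one(conn: sqlite3.Connection, product_id: int) -> bool:
    """Decrement the stock; return False if the constraint rejects the order."""
    try:
        with conn:  # commit on success, roll back on error
            conn.execute(
                "UPDATE products SET stock = stock - 1 WHERE id = ?", (product_id,)
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint failed: stock would have become negative.
        return False

db = open_db(stock=1)
print(order_one(db, 1))  # the last item is sold
print(order_one(db, 1))  # rejected by the constraint, stock stays at 0
```

With the rule living in the schema, application code that forgets the availability check still cannot commit a negative stock.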
I'm trying to create a Ruby script that spawns several concurrent child processes, each of which needs to access the same data store (a queue of some type) and do something with the data. The problem is that each row of data should be processed only once, and a child process has no way of knowing whether another child process might be operating on the same data at the same instant.
I haven't picked a data store yet, but I'm leaning toward PostgreSQL simply because it's what I'm used to. I've seen the following SQL fragment suggested as a way to avoid race conditions, because the UPDATE clause supposedly locks the table row before the SELECT takes place:
UPDATE jobs
SET status = 'processed'
WHERE id = (
SELECT id FROM jobs WHERE status = 'pending' LIMIT 1
) RETURNING id, data_to_process;
But will this really work? It doesn't seem intuitive that Postgres (or any other database) could lock the table row before performing the SELECT, since the SELECT has to be executed to determine which table row needs to be locked for updating. In other words, I'm concerned that this SQL fragment won't really prevent two separate processes from selecting and operating on the same table row.
Am I being paranoid? And are there better options than traditional RDBMSs to handle concurrency situations like this?
As you said, use a queue. The standard solution for this in PostgreSQL is PgQ. It has all these concurrency problems worked out for you.
Do you really want many concurrent child processes that must operate serially on a single data store? I suggest that you create one writer process who has sole access to the database (whatever you use) and accepts requests from the other processes to do the database operations you want. Then do the appropriate queue management in that thread rather than making your database do it, and you are assured that only one process accesses the database at any time.
The situation you are describing is called "Non-repeatable read". There are two ways to solve this.
The preferred way would be to set the transaction isolation level to at least REPEATABLE READ. This means that concurrent updates of the nature you described will fail: if two processes update the same rows in overlapping transactions, one of them will be canceled, its changes discarded, and an error returned. That transaction will have to be retried. This is achieved by calling
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ
at the start of the transaction. I can't seem to find documentation that explains an idiomatic way of doing this in Ruby; you may have to emit that SQL explicitly.
The other option is to manage the locking of tables explicitly, which can cause a transaction to block (and possibly deadlock) until the table is free. Transactions won't fail in the same way as they do above, but contention will be much higher, and so I won't describe the details.
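The retry that the REPEATABLE READ option requires can be sketched as a small wrapper. This is only an outline with no real database behind it: `SerializationFailure` is a hypothetical stand-in for whatever exception your driver raises for PostgreSQL's SQLSTATE 40001, and `run_with_retry` is a made-up helper name.

```python
import time

class SerializationFailure(Exception):
    """Hypothetical stand-in for the driver error raised on SQLSTATE 40001."""

def run_with_retry(transaction, attempts=3, delay=0.0):
    """Call `transaction()` and retry it when it is canceled by a conflict.

    `transaction` is expected to open a transaction, do its work, and commit.
    Any other exception is passed straight through to the caller.
    """
    for attempt in range(1, attempts + 1):
        try:
            return transaction()
        except SerializationFailure:
            if attempt == attempts:
                raise  # give up after the last attempt
            time.sleep(delay)  # optional pause before trying again
```

The whole transaction has to be re-run from the start, including its reads, because the data it saw the first time may no longer be current.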
That's pretty close to the approach I took when I wrote pg_message_queue, which is a simple queue implementation for PostgreSQL. Unlike PgQ, it requires no components outside of PostgreSQL to use.
It will work just fine. MVCC will come to the rescue.
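Whether the subquery form is safe can depend on the database and the isolation level. A variant that does not rely on that is to repeat the status test in the UPDATE's own WHERE clause and look at the affected row count. The sketch below assumes Python's built-in sqlite3 module as a stand-in for PostgreSQL and reuses the `jobs` table from the question. The helper name `claim_job` is made up.

```python
import sqlite3

def claim_job(conn: sqlite3.Connection):
    """Claim one pending job; return (id, data_to_process) or None if none is left."""
    while True:
        row = conn.execute(
            "SELECT id, data_to_process FROM jobs "
            "WHERE status = 'pending' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None  # the queue is empty
        with conn:
            cur = conn.execute(
                # The status test is repeated here, so the UPDATE only succeeds
                # if nobody else has claimed the row since we selected it.
                "UPDATE jobs SET status = 'processed' "
                "WHERE id = ? AND status = 'pending'",
                (row[0],),
            )
        if cur.rowcount == 1:
            return row  # we won the race for this job
        # Another worker claimed it first; loop and pick the next pending job.

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE jobs (id INTEGER PRIMARY KEY, status TEXT, data_to_process TEXT)"
)
conn.executemany(
    "INSERT INTO jobs (status, data_to_process) VALUES ('pending', ?)",
    [("first",), ("second",)],
)
conn.commit()

print(claim_job(conn))
print(claim_job(conn))
print(claim_job(conn))
```

A worker that loses the race sees zero affected rows and simply moves on, so no job is handed out twice.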
I read that Oracle maintains row versions to deal with concurrency. I want to run an update query on a very big real-time database but this update job must alter the most recent version of the row.
Is this possible via PL/SQL or simply SQL?
Edited below:
Let me clarify the scenario, a real-life issue that we faced on a very large database. Our client is a well-known cell phone service provider.
Our database has a table that manages records of the current balance left on the customer's cell phone account. Among the other columns of the table, one column stores the amount of recharge done and one other column manages the current active balance left.
We have two independent PL/SQL scripts. One script is automatically fired when the customer recharges his phone and updates his balance.
The second script deducts certain charges from the customers' accounts. It is a batch job, as it applies to all customers, and it is scheduled to run at certain intervals during the day. When it runs, it loads 50,000 records into memory, updates certain columns, and performs a bulk update back to the table.
The issue happened like this:
A customer, whose ID is 101, went to his local shop to get his phone recharged and paid the amount. But just before his phone was recharged, the scheduled time arrived and the second script fired. It loaded the records of 50,000 customers into memory, and this customer's record was among them.
Before the second script's batch update finished, the first script successfully recharged the customer's account.
As a result, in the actual table the column "CurrentAccountBalance" was updated to 150, but the in-memory records the second script was working on still held the customer's old balance, i.e. 100.
The second script had to deduct 10 from "CurrentAccountBalance". The customer's "CurrentAccountBalance" should have ended up as 140, but this issue made his balance 90.
How can we deal with this issue?
I think what you want is what is anyway happening if you UPDATE.
It is true that Oracle keeps old data for a while, but just to support consistent reads. That is, read operations that see only the state as it was at start of the transaction--even if the data was overwritten in the meantime. It's called Multi Version Concurrency Control and can be controlled by the Transaction Isolation Level.
You can explicitly request the most recent version by selecting FOR UPDATE; that adds a lock on the record so that nobody else can update it in the meantime (until your transaction ends).
However, if you need to write anything (e.g., UPDATE) Oracle works always on the most recent version.
As @Markus suggested, you have a race condition. If you're loading records into memory and working on them before updating the rows in the table, and something else may try to update them in the meantime, then you need to lock them while you work on them. (I'm assuming whatever you're doing is too complicated to do as a simple one-step update.) Something like this would work:
DECLARE
    CURSOR c IS SELECT * FROM current_balance_table FOR UPDATE;
    new_value current_balance_table.CurrentAccountBalance%TYPE;
BEGIN
    FOR r IN c LOOP
        /* Do whatever calculations you need */
        new_value := r.CurrentAccountBalance - 10;
        UPDATE current_balance_table SET CurrentAccountBalance = new_value
        WHERE CURRENT OF c;
    END LOOP;
END;
The problem now is that all records are locked for the duration of the loop, so your customer in the shop will either not be able to update their balance, or will have a long wait before the update takes effect, though when it does it will work on the updated value you stored. So you'd have to break the cursor up into small chunks, balancing the performance of your script against the impact on anyone else trying to update the same table.
One option would be to have an outer cursor selecting all the customers you're targeting with no locking, and then an inner one that locks the balance record for that customer while that row is calculated and updated. You'd have to commit after each inner loop to release the lock for that row. This involves a lot more locking/unlocking and committing after every row update slows things down a lot. But it minimises the impact on the individual customer in the shop, as only a single row is locked at a time and the length of time that is locked is minimised. So, you need to find the right balance.
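The race from the question, with its balances of 100, 150, 140 and 90, can be replayed deterministically. This is a minimal sketch that assumes Python's built-in sqlite3 module and a single connection that interleaves the two scripts by hand. The `CustomerId` column and the helper names are made up. The second variant is not the cursor approach above but the simple one-step update, which only applies when the deduction can be expressed relative to the stored value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE current_balance_table ("
    "CustomerId INTEGER PRIMARY KEY, CurrentAccountBalance INTEGER)"
)

def write_back_stale(stale_balance):
    """Batch job writes back a value computed from its in-memory copy."""
    conn.execute(
        "UPDATE current_balance_table SET CurrentAccountBalance = ? WHERE CustomerId = 101",
        (stale_balance - 10,),
    )

def deduct_in_place(stale_balance):
    """Batch job ignores its copy and lets the database apply the deduction."""
    conn.execute(
        "UPDATE current_balance_table "
        "SET CurrentAccountBalance = CurrentAccountBalance - 10 WHERE CustomerId = 101"
    )

def run(batch_update):
    """Replay the scenario: load, recharge in between, then run the batch update."""
    conn.execute("DELETE FROM current_balance_table")
    conn.execute("INSERT INTO current_balance_table VALUES (101, 100)")
    # The batch job loads the balance into memory.
    stale = conn.execute(
        "SELECT CurrentAccountBalance FROM current_balance_table WHERE CustomerId = 101"
    ).fetchone()[0]
    # Meanwhile the recharge script adds 50 to the stored balance.
    conn.execute(
        "UPDATE current_balance_table "
        "SET CurrentAccountBalance = CurrentAccountBalance + 50 WHERE CustomerId = 101"
    )
    batch_update(stale)
    return conn.execute(
        "SELECT CurrentAccountBalance FROM current_balance_table WHERE CustomerId = 101"
    ).fetchone()[0]

print(run(write_back_stale))  # the recharge is lost
print(run(deduct_in_place))   # the recharge survives
```

When the calculation really is too complex for one statement, the locking cursor shown above (or a version check on write) is needed instead.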
I have a problem understanding read consistency in database (Oracle).
Suppose I am the manager of a bank. A customer has acquired a lock (which I don't know about) and is doing some updating. After he has got the lock, I view his account information and try to do something with it. But because of read consistency, I will see the data as it existed before the customer got the lock. So won't that affect the inputs I am getting and the decisions I am going to make during that period?
The point about read consistency is this: suppose the customer rolls back their changes? Or suppose those changes fail because of a constraint violation or some system failure?
Until the customer has successfully committed their changes those changes do not exist. Any decision you might make on the basis of a phantom read or a dirty read would have no more validity than the scenario you describe. Indeed they have less validity, because the changes are incomplete and hence inconsistent. Concrete example: if the customer's changes include making a deposit and making a withdrawal, how valid would your decision be if you had looked at the account when they had made the deposit but not yet made the withdrawal?
Another example: a long running batch process updates the salary of every employee in the organisation. If you run a query against employees' salaries do you really want a report which shows you half the employees with updated salaries and half with their old salaries?
edit
Read consistency is achieved by using the information in the UNDO tablespace (rollback segments in the older implementation). When a session reads data from a table which is being changed by another session, Oracle retrieves the UNDO information which has been generated by that second session and substitutes it for the changed data in the result set presented to the first session.
If the reading session is a long-running query, it might fail due to the notorious ORA-1555: snapshot too old. This means the UNDO extent which contained the information necessary to assemble a read-consistent view has been overwritten.
Locks have nothing to do with read consistency. In Oracle writes don't block reads. The purpose of locks is to prevent other processes from attempting to change rows we are interested in.
For systems that have a large number of users, where users may "hold" the lock for a long time, the Optimistic Offline Lock pattern is usually used, i.e. use the version in the UPDATE ... WHERE statement.
You can use a date, a version id, or something else as the row version. The pseudocolumn ORA_ROWSCN may also be used, but you need to read up on it first.
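The version check in the UPDATE ... WHERE statement can be sketched like this. It assumes Python's built-in sqlite3 module rather than Oracle, an integer `version` column, and made-up names (`accounts`, `save_balance`, `StaleRowError`).

```python
import sqlite3

class StaleRowError(Exception):
    """Raised when the row changed after the caller read it."""

def save_balance(conn, account_id, new_balance, expected_version):
    """Write the balance only if the row still has the version the caller read."""
    with conn:
        cur = conn.execute(
            "UPDATE accounts SET balance = ?, version = version + 1 "
            "WHERE id = ? AND version = ?",
            (new_balance, account_id, expected_version),
        )
        if cur.rowcount == 0:
            # Someone else updated the row in the meantime (or it does not exist).
            raise StaleRowError(f"account {account_id} was modified concurrently")

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER, version INTEGER)"
)
conn.execute("INSERT INTO accounts VALUES (101, 100, 1)")
conn.commit()

# Two users read the same row, both see version 1.
balance, version = conn.execute(
    "SELECT balance, version FROM accounts WHERE id = 101"
).fetchone()

save_balance(conn, 101, balance + 50, version)      # first writer succeeds
try:
    save_balance(conn, 101, balance - 10, version)  # second writer holds a stale version
except StaleRowError as exc:
    print("retry needed:", exc)
```

No lock is held while the user is thinking. The loser of the race finds out at write time and has to re-read the row and try again.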
When a record is locked due to changes or an explicit lock statement, an entry is made into the header of that block. This is called an ITL (interested transaction list). When you come along to read that block, your session sees this and knows where to go to get the read consistent copy from the rollback segment.