Let's say I have a function which carries out a lot of CRUD operations, and also assume that this function is going to get executed without any exception (100% success). Is it better to have a transaction for the entire function, or a transaction commit for each CRUD operation? Basically, I wanted to know whether using many transaction commits has an impact on the memory and time consumption while executing a function which has a lot of CRUD operations.
Transaction boundaries should be defined by your business logic.
If your application has 100 CRUD operations to do, and each is completely independent of the others, maybe a commit after each is appropriate. Think about this: is it OK for a user running a report against your database to see only half of the CRUD operations?
A transaction is a set of updates that must all happen together or not at all, because a partial transaction would represent an inconsistent or inaccurate state.
Commit at the end of every transaction - that's it. No more, no less. It's not about performance, releasing locks, or managing server resources. Those are all real technical issues, but you don't solve them by committing halfway through a logical unit of work. Commit frequency is not a valid "tuning trick".
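As a minimal illustration, here is a PL/SQL sketch of one logical unit of work with a single commit at the end; the ORDERS/ORDER_LINES/CUSTOMERS tables and the sample values are hypothetical.

DECLARE
  v_order_id NUMBER := 1001;   -- sample values for the sketch
  v_cust_id  NUMBER := 42;
BEGIN
  INSERT INTO orders (order_id, customer_id) VALUES (v_order_id, v_cust_id);
  INSERT INTO order_lines (order_id, line_no, item_id) VALUES (v_order_id, 1, 7);
  UPDATE customers SET last_order_id = v_order_id WHERE customer_id = v_cust_id;
  COMMIT;        -- one commit for the whole logical unit of work
EXCEPTION
  WHEN OTHERS THEN
    ROLLBACK;    -- all or nothing: a half-finished order never becomes visible
    RAISE;
END;
/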
EDIT
To answer your actual question:
Basically, I wanted to know whether using many transaction commits has an impact on the memory and time consumption while executing the function which has a lot of CRUD operations.
Committing frequently will actually slow you down. Every time you do a regular commit, Oracle has to make sure that anything in the redo log buffers is flushed to disk, and your COMMIT will wait for that process to complete.
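If you want to see this on your own system, a rough way is to compare the instance statistics before and after a commit-per-row run versus a commit-once run of the same job. This assumes you can query V$SYSSTAT; exact statistic names can vary slightly between versions.

SELECT name, value
FROM   v$sysstat
WHERE  name IN ('user commits', 'redo writes', 'redo synch writes');
-- Run both versions of the job and watch how these counters grow: the
-- frequent-commit run generates far more synchronous redo flushes, each of
-- which the committing session has to wait for.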
Also, there is little or no memory saving in frequent commits. Almost all of your transaction's work and any held locks are written to redo log buffers and/or database block buffers in memory. Oracle flushes both of those to disk in the background as often as it needs to in order to manage memory. Yes, that's right -- your dirty, uncommitted database blocks can be written to disk. No commit necessary.
The only resource that a really huge transaction can blow out is UNDO space. But, again, you don't fix that problem by committing halfway through a logical unit of work. If your logical unit of work is really that huge, size your database with an appropriate amount of UNDO space.
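If you want to measure rather than guess, here is a rough sketch of checking how much undo active transactions are actually holding before resizing anything. It uses the standard V$TRANSACTION to V$SESSION join; column availability can vary by version.

SELECT s.sid,
       s.username,
       t.used_ublk AS undo_blocks,
       t.used_urec AS undo_records
FROM   v$transaction t
       JOIN v$session s ON s.taddr = t.addr;
-- If one logical unit of work genuinely needs a huge amount of undo,
-- size the UNDO tablespace for it rather than chopping the transaction up.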
My response is "it depends." Does the transaction involve data in only one table or several? Are you performing inserts, updates, or deletes? With an INSERT, no other session can see your data until it is committed, so technically there is no rush. However, if you update a row in a table where the exact same row may need to be updated by another session in short order, you do not want to hold that row any longer than absolutely necessary. What constitutes a logical unit of work, how much UNDO the table and index changes consume, and concurrent DML demand for the same rows all come into play when choosing the commit frequency.
I am working on a big DB-driven application that sometimes needs a huge data import. Data is imported from Excel spreadsheets, and at the start of the process (for about 500 rows) the data is processed relatively quickly, but later it slows down significantly. The import generates 6 linked entities per row of Excel that are flushed after processing every line. My guess is that all those entities are getting cached by Doctrine and just build up. My idea is to clear out all that cache every 200 rows, but I could not find how to clear it from within the code (the console is not an option at this stage). Any assistance or links would be much appreciated.
I suppose that the cause may lie not in Doctrine but in the database transaction log buffer size. The documentation says
A large log buffer enables large transactions to run without a need to write the log to disk before the transactions commit. Thus, if you have big transactions, making the log buffer larger saves disk I/O.
Most likely you insert your data in one big transaction. When the buffer is full, it is written to disk, which is much slower than writing to memory.
There are several possible solutions.
Increase the buffer size so that the transaction fits into the buffer.
Split the transaction into several parts that fit into the buffer.
In the second case, keep in mind that each transaction carries overhead of its own, so wrapping each insert in a separate transaction will also reduce performance.
I recommend wrapping about 500 rows in a transaction, because this seems to be a size that fits in the buffer.
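A schematic sketch of the second option in plain SQL, assuming MySQL/InnoDB (which the quoted documentation passage appears to describe); the staging_table/target_table names and the batch_id column marking each ~500-row chunk are invented, and the same chunking idea applies whatever layer issues the statements.

START TRANSACTION;
INSERT INTO target_table (col1, col2)
SELECT col1, col2 FROM staging_table WHERE batch_id = 1;   -- roughly 500 rows
COMMIT;

START TRANSACTION;
INSERT INTO target_table (col1, col2)
SELECT col1, col2 FROM staging_table WHERE batch_id = 2;
COMMIT;

-- For the first option, check (and if needed raise) the buffer size:
-- SHOW VARIABLES LIKE 'innodb_log_buffer_size';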
When there is only one writer to a Berkeley DB, is it worth using transactions?
Do transactions cause a significant slowdown? (In percent, please.)
You use transactions if you require the atomicity that they provide. Perhaps you need to abort the transaction, undoing everything in it? Or perhaps you need the guarantee that, should the application fail, a partially completed transaction is aborted. Your choice of transactions is based on atomicity, not performance. If you need it, you need it.
If you don't need atomicity, you may not need durability either, and giving up durability is significantly faster!
Transactions with DB_INIT_TXN in Berkeley DB are not significantly slower than other models, although generally maintaining a transactional log requires all data to be written to the log before being written to the database.
For a single writer and multiple readers, try the DB_INIT_CDB model, because the code is much simpler. Locks in the INIT_CDB model are per-table, so overall throughput might be worse than with an INIT_TXN model because of coarse-grained per-table lock contention.
Performance will depend on access patterns more than on whether one uses the DB_INIT_TXN or DB_INIT_CDB model.
I'm no DBA, I just want to learn about Oracle's Multi-Version Concurrency model.
When launching a DML operation, the first step in the MVCC protocol is to bind an undo segment. The question is: why can one undo segment only serve one active transaction?
Thank you for your time.
Multi-Version Concurrency is probably the most important concept to grasp when it comes to Oracle. It is good for programmers to understand it even if they don't want to become DBAs.
There are a few aspects to this, but they all come down to efficiency: undo management is overhead, so minimizing the number of cycles devoted to it contributes to the overall performance of the database.
A transaction can consist of many statements and generate a lot of undo: it might insert a single row, or it might delete thirty thousand. It is better to assign one empty UNDO block at the start than to continually scout around for partially filled blocks with enough space.
Following on from that, sharing undo blocks would require the kernel to track usage at a much finer granularity, which is just added complexity.
When the transaction completes, the undo is released (unless it is still needed; see the next point). The fewer blocks the transaction has used, the fewer latches have to be reset. Plus, if blocks were shared, we would have to free shards of a block, which is just more effort.
The key thing about MVCC is read consistency. This means that all the records returned by a long-running query will appear in the state they had when the query started. So if I issue a SELECT on the EMP table which takes fifteen minutes to run, and halfway through you commit an update of all the salaries, I won't see your change. The database does this by retrieving the undo data from the blocks your transaction used. Again, this is a lot easier when all the undo data is collocated in one or two blocks.
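Here is a small two-session sketch of that read consistency, reusing the EMP example above; it is schematic, with comments marking which session runs what.

-- Session A, at time T0: starts a long-running query
SELECT ename, sal FROM emp;        -- every row returned reflects the data as of T0

-- Session B, a few minutes later:
UPDATE emp SET sal = sal * 1.1;
COMMIT;

-- Session A's query, still running, keeps returning the pre-update salaries.
-- Oracle rebuilds those row versions from the undo written by Session B's
-- transaction, which is cheap precisely because that undo sits in only a
-- handful of blocks.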
"why one undo segment can only serve for one active transaction?"
It is simply a design decision. That is how undo segments are designed to work. I guess that it was done to address some of the issues that could occur with the previous rollback mechanism.
Rollback (which is still available but deprecated in favor of undo) included explicit creation of rollback segments by the DBA, and multiple transactions could be assigned to a single rollback segment. This had some drawbacks, most obviously that if one transaction assigned to a given segment generated enough rollback data that the segment was full (and could no longer extend), then other transactions using the same segment would be unable to perform any operation that would generate rollback data.
I'm surmising that one design goal of the new undo feature was to prevent this sort of inter-transaction dependency. Therefore, they designed the mechanism so that the DBA sizes and creates the undo tablespace, but the management of segments within it is done internally by Oracle. This allows the use of dedicated segments by each transaction. They can still cause problems for each other if the tablespace fills up (and cannot autoextend), but at the segment level there is no possibility of one transaction causing problems for another.
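For completeness, a rough sketch of the DBA-facing side of the undo model: you size the tablespace (optionally letting it autoextend) and Oracle manages the segments inside it. The file name, sizes, and retention value here are invented for the example.

CREATE UNDO TABLESPACE undotbs_big
  DATAFILE '/u01/oradata/ORCL/undotbs_big01.dbf' SIZE 10G
  AUTOEXTEND ON NEXT 1G MAXSIZE 100G;

ALTER SYSTEM SET undo_tablespace = 'UNDOTBS_BIG';
ALTER SYSTEM SET undo_retention  = 3600;   -- seconds of undo kept for read consistency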
I have a very large set of data (~3 million records) which needs to be merged with updates and new records on a daily schedule. I have a stored procedure that actually breaks up the record set into 1000 record chunks and uses the MERGE command with temp tables in an attempt to avoid locking the live table while the data is updating. The problem is that it doesn't exactly help. The table still "locks up" and our website that uses the data receives timeouts when attempting to access the data. I even tried splitting it up into 100 record chunks and even tried a WAITFOR DELAY '000:00:5' to see if it would help to pause between merging the chunks. It's still rather sluggish.
I'm looking for any suggestions, best practices, or examples on how to merge large sets of data without locking the tables.
Thanks
Change your front end to use NOLOCK or READ UNCOMMITTED when doing the selects.
You can't use NOLOCK on a MERGE, INSERT, or UPDATE, as the records must be locked in order to perform the update. However, you can NOLOCK the SELECTs.
Note that you should use this with caution. If dirty reads are okay, then go ahead. However, if the reads require the updated data then you need to go down a different path and figure out exactly why merging 3M records is causing an issue.
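A minimal sketch of the read-side change; the table and column names are placeholders.

-- Per-query hint:
SELECT OrderId, Status
FROM dbo.Orders WITH (NOLOCK);

-- Or the equivalent session-level setting:
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SELECT OrderId, Status FROM dbo.Orders;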
I'd be willing to bet that most of the time is spent reading data from the disk during the merge command and/or working around low memory situations. You might be better off simply stuffing more ram into your database server.
Ideally you would have enough RAM to pull the whole database into memory as needed. For example, if you have a 4GB database, make sure you have 8GB of RAM, in an x64 server of course.
I'm afraid I've had quite the opposite experience. We were performing updates and insertions where the source table had only a fraction of the number of rows of the target table, which was in the millions.
When we combined the source table records across the entire operational window and then performed the MERGE just once, we saw a 500% increase in performance. My explanation for this is that you pay for the up-front analysis of the MERGE command just once instead of over and over again in a tight loop.
Furthermore, I am certain that merging 1.6 million rows (source) into 7 million rows (target), as opposed to 400 rows into 7 million rows over 4000 distinct operations (in our case) leverages the capabilities of the SQL server engine much better. Again, a fair amount of the work is in the analysis of the two data sets and this is done only once.
Another question I have to ask is whether you are aware that the MERGE command performs much better with indexes on both the source and target tables? I would refer you to the following link:
http://msdn.microsoft.com/en-us/library/cc879317(v=SQL.100).aspx
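To make the shape of that concrete, here is a rough sketch of a single MERGE over the whole batch with both sides indexed on the join key; the table, column, and index names are invented for the example.

CREATE INDEX IX_StagingOrders_OrderId ON dbo.StagingOrders (OrderId);

MERGE dbo.Orders AS tgt
USING dbo.StagingOrders AS src
      ON tgt.OrderId = src.OrderId
WHEN MATCHED THEN
    UPDATE SET tgt.Status = src.Status,
               tgt.Amount = src.Amount
WHEN NOT MATCHED BY TARGET THEN
    INSERT (OrderId, Status, Amount)
    VALUES (src.OrderId, src.Status, src.Amount);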
From personal experience, the main problem with MERGE is that, since it takes page locks, it precludes any concurrency in your INSERTs directed at a table. So if you go down this road it is fundamental that you batch all updates that will hit a table through a single writer.
For example: we had a table on which INSERTs took a crazy 0.2 seconds per entry, with most of that time seemingly wasted on transaction latching, so we switched over to using MERGE. Some quick tests showed that it allowed us to insert 256 entries in 0.4 seconds, or even 512 in 0.5 seconds. We tested this with load generators and all seemed to be fine, until it hit production and everything blocked to hell on the page locks, resulting in a much lower total throughput than with the individual INSERTs.
The solution was not only to batch the entries from a single producer into one MERGE operation, but also to batch the batches from all producers going to an individual DB into a single MERGE operation, through an additional level of queuing (previously also a single connection per DB, but using MARS to interleave all the producers' calls to the stored procedure doing the actual MERGE transaction). This way we were able to handle many thousands of INSERTs per second without problems.
Having the NOLOCK hints on all of your front-end reads is an absolute must, always.
I have to simultaneously load data into a table and run queries on it. Because of data nature, I can trade integrity for performance. How can I minimize the overhead of transactions?
Unfortunately, alternatives like MySQL cannot be used (due to non-technical reasons).
Other than the general optimization practices that apply to all databases such as eliminating full table scans, removing unused or inefficient indexes, etc., etc., here are a few things you can do.
Run in No Archive Log mode. This sacrifices recoverability for speed.
For inserts use the /*+ APPEND */ hint (see the sketch below). This puts data into the table above the high-water mark, which does not create undo for the table data. The disadvantage is that existing free space is not used.
On the hardware side, RAID 0 over a larger number of smaller disks will give you the best insert performance, but depending on your usage RAID 10 with its better read performance may provide a better fit.
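Here is a short sketch of the direct-path insert suggestion from the list above; the table names are placeholders.

INSERT /*+ APPEND */ INTO target_table
SELECT * FROM staging_table;
COMMIT;   -- required: after a direct-path insert the session must commit
          -- before it can query the table again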
This said, I don't think you will gain much from any of these changes.
Perhaps I'm missing something, but since in Oracle readers don't block writers and writers don't block readers, what exactly is the problem you are trying to solve?
From the perspective of the sessions that are reading the data, sessions that are doing inserts aren't really adding any overhead (updates might add a bit of overhead as the reader would have to look at data in the UNDO tablespace in order to reconstruct a read-consistent view of the data). From the perspective of the sessions that are inserting the data, sessions that are doing reads aren't really adding any overhead. Of course, your system as a whole might have a bottleneck that causes the various sessions to contend for resources (i.e. if your inserts are using up 100% of the available I/O bandwidth, that is going to slow down queries that have to do physical I/O), but that isn't directly related to the type of operations the different sessions are doing: you can flood an I/O subsystem with a bunch of reporting users just as easily as with a bunch of insert sessions.
You want transaction isolation read uncommitted. I don't recommend it but that's what you asked for :)
This will allow you to breach transaction isolation and read uncommitted inserted data.
Please read this Ask Tom article: http://www.oracle.com/technology/oramag/oracle/05-nov/o65asktom.html.
UPDATE: I was actually mistaken; Oracle doesn't really support the read uncommitted isolation level, they just mention it :).
How about you try disabling all constraints in your table, then inserting all the data, then enabling them back again?
i.e. alter session set constraints=deferred;
However, if you had not set the constraints in your table to deferrable during table creation, there might be a slight problem.
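A rough sketch of how the pieces fit together, assuming the constraint is created (or recreated) as deferrable; all names are illustrative only.

ALTER TABLE orders ADD CONSTRAINT fk_orders_customer
  FOREIGN KEY (customer_id) REFERENCES customers (customer_id)
  DEFERRABLE INITIALLY IMMEDIATE;

ALTER SESSION SET CONSTRAINTS = DEFERRED;
-- ... run the bulk load here ...
COMMIT;   -- deferred constraints are checked at commit time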
What kind of performance volumes are you looking at? Are inserts batched or numerous small ones?
Before banging your head against the wall trying to think of clever ways to have good performance, did you create any simple prototypes which would give you a better picture of the out-of-the-box performance? It could easily turn out that you don't need to do anything special to meet the goals.