A question about Oracle undo segment binding

I'm no DBA, I just want to learn about Oracle's Multi-Version Concurrency model.
When launching a DML operation, the first step in the MVCC protocol is to bind an undo segment. The question is: why can one undo segment serve only one active transaction?
thank you for your time~~

Multi-Version Concurrency is probably the most important concept to grasp when it comes to Oracle. It is good for programmers to understand it even if they don't want to become DBAs.
There are a few aspects to this, but they all come down to efficiency: undo management is overhead, so minimizing the number of cycles devoted to it contributes to the overall performance of the database.
A transaction can consist of many statements and generate a lot of undo: it might insert a single row, or it might delete thirty thousand. It is better to assign one empty UNDO block at the start than to continually scout around for partially filled blocks with enough space.
Following on from that, sharing undo blocks would require the kernel to track usage at a much finer granularity, which is just added complexity.
When the transaction completes, the undo is released (unless it is still needed for read consistency; see the next point). The fewer blocks the transaction has used, the fewer latches have to be reset. Plus, if the blocks were shared we would have to free shards of a block, which is just more effort.
The key thing about MVCC is read consistency. This means that all the records returned by a long-running query will appear in the state they had when the query started. So if I issue a SELECT on the EMP table which takes fifteen minutes to run, and halfway through you commit an update of all the salaries, I won't see your change. The database does this by retrieving the undo data from the blocks your transaction used. Again, this is a lot easier when all the undo data is collocated in one or two blocks.
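A minimal sketch of that behaviour, assuming the classic EMP demo table (the session markers are comments, not real syntax):
    -- Session 1: start a long-running query.
    SELECT ename, sal FROM emp;

    -- Session 2: meanwhile, change the data and commit.
    UPDATE emp SET sal = sal * 1.10;
    COMMIT;

    -- Session 1: rows fetched after that commit still show the old salaries,
    -- because Oracle rebuilds them from the undo written by session 2.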

"why one undo segment can only serve for one active transaction?"
It is simply a design decision. That is how undo segments are designed to work. I guess that it was done to address some of the issues that could occur with the previous rollback mechanism.
Rollback management (which is still available but deprecated in favor of undo) involved explicit creation of rollback segments by the DBA, and multiple transactions could be assigned to a single rollback segment. This had some drawbacks, most obviously that if one transaction assigned to a given segment generated enough rollback data to fill the segment (and it could extend no further), then other transactions using the same segment would be unable to perform any operation that generated rollback data.
I'm surmising that one design goal of the new undo feature was to prevent this sort of inter-transaction dependency. Therefore, they designed the mechanism so that the DBA sizes and creates the undo tablespace, but the management of segments within it is done internally by Oracle. This allows the use of dedicated segments by each transaction. They can still cause problems for each other if the tablespace fills up (and cannot autoextend), but at the segment level there is no possibility of one transaction causing problems for another.
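For illustration only, a rough sketch of the two models (segment, tablespace, and file names are made up):
    -- Old manual model: the DBA creates rollback segments explicitly,
    -- and several transactions may be assigned to the same segment.
    CREATE ROLLBACK SEGMENT rbs01 TABLESPACE rbs_ts;

    -- Automatic undo management: the DBA only creates and sizes the undo
    -- tablespace; Oracle manages the segments inside it and gives each
    -- transaction its own.
    CREATE UNDO TABLESPACE undotbs1 DATAFILE '/u01/oradata/undotbs1_01.dbf' SIZE 2G;
    ALTER SYSTEM SET undo_management = AUTO SCOPE = SPFILE;
    ALTER SYSTEM SET undo_tablespace = undotbs1;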

Related

Will shrinking and lowering the high water mark cause issues in OLTP systems?

Newb here. We have an old Oracle 10g instance that we have to keep alive until it is replaced. The nightly jobs have been very slow, causing some issues. Every other week there is a large process that does large amounts of DML (deletes, inserts, updates). Some of these tables have 2+ million rows. I noticed that on some of the tables the HWM is higher than expected, and a database advisor check I ran in Toad recommended shrinking some tables. I am concerned that the tables may need the space for DML operations: will shrinking them make the process faster or slower?
We cannot add CPU due to licensing costs.
If you are accessing the tables with full scans and have a lot of empty space below the HWM, then yes, definitely reorg those (alter table move). There is no downside, only benefit. But if your slow jobs are using indexes, then the benefit will be minimal.
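If you go that route, a minimal sketch (table and index names are hypothetical; the shrink variant needs an ASSM tablespace and row movement):
    -- MOVE rebuilds the table compactly; it invalidates indexes, so rebuild them.
    ALTER TABLE big_table MOVE;
    ALTER INDEX big_table_pk REBUILD;

    -- Alternative on 10g+ that also lowers the HWM in place:
    ALTER TABLE big_table ENABLE ROW MOVEMENT;
    ALTER TABLE big_table SHRINK SPACE;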
Don't assume that your slow jobs are due to space fragmentation. Use ASH (v$active_session_history) and SQL monitor (v$sql_plan_monitor) data or a graphical tool that utilizes this data to explore exactly what your queries are doing. Understand how to read execution plans and determine whether the correct plan is being used for your data. Tuning is unfortunately not a simple thing that can be addressed with a question on this forum.
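As a starting point, something like this hedged sketch (note that v$active_session_history requires the Diagnostics Pack license) shows which SQL and wait events dominated the last hour:
    SELECT sql_id, event, COUNT(*) AS samples
    FROM   v$active_session_history
    WHERE  sample_time > SYSDATE - 1/24
    GROUP  BY sql_id, event
    ORDER  BY samples DESC;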
In general, shrinking tables or rebuilding indexes should speed up reads of the table, or anything that does full table scans. It should not affect other DML operations.
When selecting or searching data, all of the empty blocks in the table and any indexes used by the query must still be read, so rebuilding them to reduce empty space and lower the high water mark will generally improve performance. This is especially true in indexes, where space lost to deleted rows is not recovered for reuse.

Oracle Transaction Commit

Let's say I have a function which carries out a lot of CRUD operations, and also assume that this function is going to get executed without any exception (100% success). Is it better to have a transaction for the entire function or transaction commits for each CRUD operation? Basically, I wanted to know whether using many transaction commits has an impact on the memory and time consumption while executing the function which has a lot of CRUD operations.
Transaction boundaries should be defined by your business logic.
If your application has 100 CRUD operations to do, and each is completely independent of the others, maybe a commit after each is appropriate. Think about this: is it OK for a user running a report against your database to see only half of the CRUD operations?
A transaction is a set of updates that must all happen together or not at all, because a partial transaction would represent an inconsistent or inaccurate state.
Commit at the end of every transaction - that's it. No more, no less. It's not about performance, releasing locks, or managing server resources. Those are all real technical issues, but you don't solve them by committing halfway through a logical unit of work. Commit frequency is not a valid "tuning trick".
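As a sketch of what a logical unit of work means here (the accounts table and amounts are made up):
    -- Both updates belong to one business transaction: a half-done transfer
    -- must never be visible, so there is exactly one commit, at the end.
    UPDATE accounts SET balance = balance - 100 WHERE account_id = 1;
    UPDATE accounts SET balance = balance + 100 WHERE account_id = 2;
    COMMIT;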
EDIT
To answer your actual question:
Basically, I wanted to know whether using many transaction commits has an impact on the memory and time consumption while executing the function which has a lot of CRUD operations.
Committing frequently will actually slow you down. Every time you do a regular commit, Oracle has to make sure that anything in the redo log buffers is flushed to disk, and your COMMIT will wait for that process to complete.
Also, there is little or no memory saving in frequent commits. Almost all your transaction's work and any held locks are written to redo log buffers and/or database block buffers in memory. Oracle will flush both of those to disk in the background as often as it needs to in order to manage memory. Yes, that's right -- your dirty, uncommitted database blocks can be written to disk. No commit necessary.
The only resource that a really huge transaction can blow out is UNDO space. But, again, you don't fix that problem by committing half way through a logical unit of work. If your logical unit of work is really that huge, size your database with an appropriate amount of UNDO space.
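To make the point concrete, a hypothetical PL/SQL sketch (table names invented); the commented-out per-row commit is the pattern to avoid:
    BEGIN
      FOR r IN (SELECT id FROM staging_rows) LOOP
        UPDATE target_table SET processed = 'Y' WHERE id = r.id;
        -- COMMIT;  -- per-row commit: a log file sync wait on every iteration
      END LOOP;
      COMMIT;        -- one commit for the whole logical unit of work
    END;
    /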
My response is "it depends." Does the transaction involve data in only one table or several? Are you performing inserts, updates, or deletes? With an INSERT, no other session can see your data until it is committed, so technically there is no rush. However, if you update a row in a table where the exact same row may need to be updated by another session in short order, you do not want to hold the row any longer than absolutely necessary. What constitutes a logical unit of work, how much UNDO the table and index changes involved consume, and concurrent DML demand for the same rows all come into play when choosing the commit frequency.

SQL Server - Merging large tables without locking the data

I have a very large set of data (~3 million records) which needs to be merged with updates and new records on a daily schedule. I have a stored procedure that actually breaks up the record set into 1000 record chunks and uses the MERGE command with temp tables in an attempt to avoid locking the live table while the data is updating. The problem is that it doesn't exactly help. The table still "locks up" and our website that uses the data receives timeouts when attempting to access the data. I even tried splitting it up into 100 record chunks, and also tried a WAITFOR DELAY '000:00:5' to pause between merging the chunks. It's still rather sluggish.
I'm looking for any suggestions, best practices, or examples on how to merge large sets of data without locking the tables.
Thanks
Change your front end to use NOLOCK or READ UNCOMMITTED when doing the selects.
You can't NOLOCK a MERGE, INSERT, or UPDATE, as the records must be locked in order to perform the update. However, you can NOLOCK the SELECTs.
Note that you should use this with caution. If dirty reads are okay, then go ahead. However, if the reads require the updated data then you need to go down a different path and figure out exactly why merging 3M records is causing an issue.
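For example, a hedged sketch of the read side (table and column names are illustrative):
    -- WITH (NOLOCK) is effectively READ UNCOMMITTED: the query will not wait
    -- for the MERGE's locks, but it may see dirty (uncommitted) data.
    SELECT OrderId, Status
    FROM   dbo.Orders WITH (NOLOCK)
    WHERE  CustomerId = 42;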
I'd be willing to bet that most of the time is spent reading data from the disk during the merge command and/or working around low memory situations. You might be better off simply stuffing more RAM into your database server.
An ideal amount would be enough RAM to pull the whole database into memory as needed. For example, if you have a 4GB database, then make sure you have 8GB of RAM... in an x64 server, of course.
I'm afraid I've had quite the opposite experience. We were performing updates and insertions where the source table had only a fraction of the number of rows of the target table, which was in the millions.
When we combined the source table records across the entire operational window and then performed the MERGE just once, we saw a 500% increase in performance. My explanation for this is that you are paying for the up front analysis of the MERGE command just once instead of over and over again in a tight loop.
Furthermore, I am certain that merging 1.6 million rows (source) into 7 million rows (target), as opposed to 400 rows into 7 million rows over 4000 distinct operations (in our case) leverages the capabilities of the SQL server engine much better. Again, a fair amount of the work is in the analysis of the two data sets and this is done only once.
Another question I have to ask as well is whether you are aware that the MERGE command performs much better with indexes on both the source and target tables? I would refer you to the following link:
http://msdn.microsoft.com/en-us/library/cc879317(v=SQL.100).aspx
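For reference, the shape of the single set-based MERGE described above might look like this (table and column names are illustrative, and both Id columns are assumed to be indexed):
    MERGE INTO dbo.Target AS t
    USING dbo.Staging AS s
        ON t.Id = s.Id
    WHEN MATCHED THEN
        UPDATE SET t.Value = s.Value
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (Id, Value) VALUES (s.Id, s.Value);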
From personal experience, the main problem with MERGE is that since it takes page locks it precludes any concurrency in your INSERTs directed at a table. So if you go down this road it is fundamental that you batch all updates that will hit a table into a single writer.
For example: we had a table on which INSERT took a crazy 0.2 seconds per entry, most of this time seemingly being wasted on transaction latching, so we switched it over to using MERGE. Some quick tests showed that it allowed us to insert 256 entries in 0.4 seconds, or even 512 in 0.5 seconds. We tested this with load generators and all seemed to be fine, until it hit production and everything blocked to hell on the page locks, resulting in a much lower total throughput than with the individual INSERTs.
The solution was not only to batch the entries from a single producer into a MERGE operation, but also to batch the batches from all producers going to an individual DB into a single MERGE operation, through an additional level of queue (previously it was also a single connection per DB, but using MARS to interleave all the producers' calls to the stored procedure doing the actual MERGE transaction). This way we were then able to handle many thousands of INSERTs per second without problem.
Having the NOLOCK hints on all of your front-end reads is an absolute must, always.

What makes Oracle more scalable?

Oracle seems to have a reputation for being more scalable than other RDBMSes. After working with it a bit, I can say that it's more complex than other RDBMSes, but I haven't really seen anything that makes it more scalable than other RDBMSes. But then again, I haven't really worked on it in a whole lot of depth.
What features does Oracle have that are more scalable?
Oracle's RAC architecture is what makes it scalable: it can load balance across nodes, and parallel queries can be split up and pushed to other nodes for processing.
Some of the tricks, like loading blocks from another node's buffer cache instead of going to disk, make performance a lot more scalable.
Also, the maintainability of RAC with rolling upgrades helps make the operation of a large system more sane.
There is also a different aspect of scalability: storage scalability. ASM makes increasing the storage capacity very straightforward. A well-designed ASM-based solution should scale past hundreds of terabytes without needing to do anything very special.
Whether these make Oracle more scalable than other RDBMSs, I don't know. But I think I would feel less happy about trying to scale up a non-Oracle database.
Cursor sharing is (or was) a big advantage over the competition.
Basically, the same query plan is used for matching queries. An application will have a standard set of queries it issues (e.g. get the orders for this customer id). The simple way is to treat every query individually: if you see 'SELECT * FROM ORDERS WHERE CUSTOMER_ID = :b1', you look at whether table ORDERS has an index on CUSTOMER_ID, etc. As a result, you can spend as much work looking up metadata to get a query plan as actually retrieving the data. With simple keyed lookups, a query plan is easy. Complex queries with multiple tables joined on skewed columns are harder.
Oracle has a cache of query plans, and older/less used plans are aged out as new ones are required.
If you don't cache query plans, there's a limit to how smart you can make your optimizer, as the more smarts you code into it, the bigger the impact on each query processed. Caching queries means you only incur that overhead the first time you see the query.
The 'downside' is that for cursor sharing to be effective you need to use bind variables. Some programmers don't realise that, write code that doesn't get shared, and then complain that Oracle isn't as fast as MySQL.
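A quick illustration of the difference (the ORDERS table is from the example above):
    -- Literal values: every statement is textually different, so each one
    -- gets its own parse and plan.
    SELECT * FROM orders WHERE customer_id = 101;
    SELECT * FROM orders WHERE customer_id = 102;

    -- Bind variable: one shared statement, parsed once, plan reused.
    SELECT * FROM orders WHERE customer_id = :b1;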
Another advantage of Oracle is the UNDO log. As a change is made, the 'old version' of the data is written to an undo log. Other databases keep old versions of the record in the same place as the record; this requires VACUUM-style cleanup operations, or you bump into space and organisation issues. This is most relevant in databases with high update or delete activity.
Also, Oracle doesn't have a central lock registry. A lock bit is stored on each individual data record, and SELECT doesn't take a lock. In databases where SELECT locks, you could have multiple users reading data and locking each other or preventing updates, introducing scalability limits. Other databases would lock a record when a SELECT was done to ensure that no-one else could change that data item (so it would be consistent if the same query or transaction looked at the table again). Oracle uses UNDO for its read consistency model (i.e. looking up the data as it appeared at a specific point in time).
Tom Kyte's "Expert Oracle Database Architecture" from Apress does a good job of describing Oracle's architecture, with some comparisons to other RDBMSs. Worth reading.

How to minimize transaction overhead in Oracle?

I have to simultaneously load data into a table and run queries on it. Because of the nature of the data, I can trade integrity for performance. How can I minimize the overhead of transactions?
Unfortunately, alternatives like MySQL cannot be used (due to non-technical reasons).
Other than the general optimization practices that apply to all databases such as eliminating full table scans, removing unused or inefficient indexes, etc., etc., here are a few things you can do.
Run in NOARCHIVELOG mode. This sacrifices recoverability for speed.
For inserts use the /*+ APPEND */ hint. This puts data into the table above the high water mark which does not create UNDO. The disadvantage is that existing free space is not used.
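A minimal sketch of that hint (table names are made up; note that after a direct-path insert the session must commit before it can query the table again):
    INSERT /*+ APPEND */ INTO target_table
    SELECT * FROM staging_table;
    COMMIT;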
On the hardware side, RAID 0 over a larger number of smaller disks will give you the best insert performance, but depending on your usage RAID 10 with its better read performance may provide a better fit.
This said, I don't think you will gain much from any of these changes.
Perhaps I'm missing something, but since in Oracle readers don't block writers and writers don't block readers, what exactly is the problem you are trying to solve?
From the perspective of the sessions that are reading the data, sessions that are doing inserts aren't really adding any overhead (updates might add a bit of overhead as the reader would have to look at data in the UNDO tablespace in order to reconstruct a read-consistent view of the data). From the perspective of the sessions that are inserting the data, sessions that are doing reads aren't really adding any overhead. Of course, your system as a whole might have a bottleneck that causes the various sessions to contend for resources (i.e. if your inserts are using up 100% of the available I/O bandwidth, that is going to slow down queries that have to do physical I/O), but that isn't directly related to the type of operations that the different sessions are doing-- you can flood an I/O subsystem with a bunch of reporting users just as easily as with a bunch of insert sessions.
You want transaction isolation read uncommitted. I don't recommend it but that's what you asked for :)
This will allow you to breach transaction isolation and read uncommitted inserted data.
Please read this Ask Tom article: http://www.oracle.com/technology/oramag/oracle/05-nov/o65asktom.html.
UPDATE: I was actually mistaken. Oracle doesn't really support the read uncommitted isolation level; they just mention it :).
How about you try disabling all constraints in your table, then inserting all the data, then enabling them back again?
i.e. ALTER SESSION SET CONSTRAINTS = DEFERRED;
However, if you did not declare the constraints in your table as deferrable during table creation, there might be a slight problem.
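For what it's worth, a hedged sketch of both pieces (constraint and table names are invented):
    -- The constraint must have been declared deferrable in the first place:
    ALTER TABLE child_table ADD CONSTRAINT fk_child_parent
      FOREIGN KEY (parent_id) REFERENCES parent_table (id)
      DEFERRABLE INITIALLY IMMEDIATE;

    -- Then, in the loading session, checking is postponed until commit:
    ALTER SESSION SET CONSTRAINTS = DEFERRED;
    -- ... run the inserts ...
    COMMIT;  -- constraints are validated here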
What kind of data volumes are you looking at? Are the inserts batched, or are there numerous small ones?
Before banging your head against the wall trying to think of clever ways to have good performance, did you create any simple prototypes which would give you a better picture of the out-of-the-box performance? It could easily turn out that you don't need to do anything special to meet the goals.
