I'm trying to work out why my Oracle 19c database is "suddenly" experiencing high commit waits. Looking in V$ACTIVE_SESSION_HISTORY and DBA_HIST_ACTIVE_SESS_HISTORY shows me that lots of sessions are waiting on "log file sync" and the blocking session is the LGWR process. Not a sign of a problem in itself, but a couple of months ago (before a recent set of product updates) it wasn't doing that, so I'm trying to understand what has changed. Either some code changes made over the last 2 months have caused this, or potentially the I/O system is experiencing a problem.
Because it's an OLTP system we have many different types of transaction, and I'm finding it difficult to filter out the noise from the performance views. What I'd like to be able to do is identify the sessions which are doing most commits, and also the sessions that are doing the "largest" commits, and then I can trace these back to see which pieces of code are responsible etc.
I would therefore like to be able to create a table such as this:
SESSION_ID
SESSION_SERIAL#
COMMIT_COUNT
COMMIT_SIZE
1
12345
3
132436
For commit size, I guessed I would need to use something like the "wait time" as an approximation and was hoping that the TM_DELTA_DB_TIME column would help me out here, but not sure how to measure the number of commits. I had hoped that the XID column would allow me to see the transaction boundaries but it's usually NULL.
And now I've stopped to question why there isn't an easier way to do this, and whether I'm going about it the wrong way. Surely I can't be the only person to want more in-depth understanding of the commit activity within their Oracle database. Or am I asking for data that doesn't exist in the views?
If anybody has some tips for where to look I would be very grateful!
Related
I'm looking at doing a data-fix and need to be able to prove that the data I have intended to change is the only data changed. (For example - that a trigger hasn't modified additional columns that I wasn't expecting)
I've been looking at Oracle's Flashback Query process, which would be great, except that this is not enabled on the database in question.
Since this check would be carried out prior to committing the transaction, Oracle must have the "before" information squirreled away somewhere, and I wondered if there is any way of accessing this undo information?
Otherwise, I would potentially have to make a temporary copy of each table and do a compare between the live table and the backup, which may also result in inconsistencies between the backup query time and the update transaction time.
While I'm expecting the answer "no", I'm hoping someone can point me in a better direction than that which I appear to be headed at present.
Thanks!
I apologize if this is too vague, but it is a random issue that occurs with many types of statements. Google and Stack Overflow searches have failed me. Here is what I am experiencing, I hope that someone out there has seen or at least heard of this happening and possibly knows of a solution.
From time to time, with no apparent rhyme or reason, statements that I run through PL/SQL Developer against our Oracle databases do not "stick". Last week I ran an update on table A, a commit for the update statement, then a truncate on table B and an insert to table B followed by another commit. Everything seemed to work fine, as in I received no errors. I was, of course, able to query the changes and see that they were made. However, upon logging out and then back in, the changes had not been committed. Even the truncate command had not worked "stuck" - and truncates do not need a commit performed.
Some details that may be helpful: I am logging into the database server through PL/SQL on a shared account that is used by my team only to gain access to the schema (multiple schemas on each server, each schema has one shared login/PW). Of the 12 people on my team, I am the only one experiencing this issue. I have asked our database administration team to investigate my profile setup and have been told that my profile looks the same as my teammates' profiles. We are forced to go through Citrix to connect to our production database servers. I can only have one instance of PL/SQL open at any time through Citrix, so I typically have PL/SQL connected to several schemas, but I have never been running SQL on more than one schema simultaneously. I'm not even sure if that's possible, but I thought I would mention it. I typically have 3-4 windows open within PL/SQL, each connected to a different schema.
My manager was directly involved in a case where something similar to this happened. I ran four update commands, and committed each one in between; then he ran a select statement only to find that my updates had not actually committed.
I hope that one of my fellow Overflowers' has seen or heard of this issue, or at least may be able to provide me with a direction to follow to attempt to get to the bottom of this.
"it has begun to reflect poorly on me and damage my reputation in the company."
What would really reflect poorly on you would be you believing that an Oracle RDBMS is a magical or random device, or, even worse, sentient and conducting a personal vendetta against you. Computers may seem vindictive but that is always us projecting onto them ;-)
The way to burnish your reputation would be through an informed investigation of the situation. Databases do not randomly lose transactions. So, what is going on?
Possible culprits:
Triggers: does table A have an UPDATE trigger which suppresses some of your SQL?
Synonyms: are tables A and B really the tables you think they are?
Ownership: are these tables in another schema which has row level security enabled (although that should through an error message if you violate a policy)?
PL/SQL Developer configuration: is the IDE hiding error messages or are you not spotting them?
Object types: are tables A and B really tables? Could they be views with INSTEAD OF triggers suppressing some of your SQL?
Object types: or could A and B be materialized views and your session has QUERY_REWRITE_INTEGRITY=stale_tolerated?
If that last one seems a bit of a stretch there other similarly esoteric explanations, involving data flashback, pipelined functions and other malarky. This a category of explanation which indicates a colleague is pranking you.
How to proceed:
Try different tools. SQL*Plus (or the new SQL Command Line) may produce a different outcome. Rule out PL/SQL Developer.
Write some test cases. Strive to establish reproducible test cases: given a certain set-up this SQL statement always leads to a given outcome (SQL always sticks or always does not).
Eliminate bugs or "funnies" in the queries you use to check the results.
Use the data dictionary to understand the characteristics and associated objects of the troublesome tables. You need to understand what causes the different outcomes. What distinguishes a row where the UPDATE holds compared to one where it does not?
I have used PL/SQL Developer for over a decade and I have never known it silently undo successful truncate operations. If it can do that, AA should add it as a menu item. It seems more likely that you ran the commands against the wrong database connection.
I can feel your frustration, sorry you're going through this. I am surprised, however, that at a large company, your change control process is like this. I don't work for a large multi-national company, but any changes done to a production database are first approved by management and run by the DBAs (or in your case, your team). Every script that is run does a few things:
Lists the database instance information its connecting to. For example:
select host_name, instance_name, version, startup_time from v$instance;
Spools the output to a file (the DBAs typically use sqlplus, but I'm sure PL/SQL Developer can do the same)
Shows the current date and time (in the beginning and end of the script)
The output file is saved to a change control server (the directory structure makes it easy to pull any changes for a given instance and/or given timeframe)
Exits on any errors:
WHENEVER SQLERROR EXIT SQL.SQLCODE
Any additional checks that need to be run post script (select counts, etc)
Shows each command that is being run (set echo on), including the commits!
All of this would allow you to not only verify that the script was run successfully, but would allow you to CYOA. Perhaps you can talk with your team about putting some of this in place in your own environment. Hope that helps.
I have no way of knowing if my issue is fixed or not, but here is what I've done:
1. I contacted our company's Citrix team to request that they give my team the ability to have several instances of PL/SQL open. This has been done and so will eliminate the need for one instance with multiple DB connections.
2. I contacted the DBA's and had them remove my old profile, then create a new one with a new username.
So far, all SQL I've run under these new conditions has been just fine. However, I have no way of recreating the issue I'm experiencing so I am just continuing on about my business and hoping for the best.
Should I find a few months from now that I have not experienced this issue again I will update this post in case anyone else experiences it.
Thank you all for the accusations of operator error (screenshots prove that this is not operator error but why should you believe me when my own co-workers have accused me of faking the screenshots) and for the moral support.
I need your opinion in this situation. I’ll try to explain the scenario. I have a Windows service that stores data in an Oracle database periodically. The table where this data is being stored is partitioned by date (Interval-Date Range Partitioning). The database also has a dbms_scheduler job that, among other operations, truncates and drops older partitions.
This approach has been working for some time, but recently I had an ORA-00054 error. After some investigation, the error was reproduced with the following steps:
Open one sqlplus session, disable auto-commit, and insert data in the
partitioned table, without committing the changes;
Open another sqlplus session and truncate/drop an old partition (DDL
operations are automatically committed, if I’m not mistaken). We
will then get the ORA-00054 error.
There are some constraints worthy to be mentioned:
I don’t have DBA access to the database;
This is a legacy application and a complete refactoring isn’t
feasible;
So, in your opinion, is there any way of dropping these old partitions, without the risk of running into an ORA-00054 error and without the intervention of the DBA? I can just delete the data, but the number of empty partitions will grow everyday.
Many thanks in advance.
This error means somebody (or something) is working with the data in the partition you are trying to drop. That is, the lock is granted at the partition level. If nobody was using the partition your job could drop it.
Now you say this is a legacy app and you don't want to, or can't, refactor it. Fair enough. But there is clearly something not right if you have a process which is zapping data that some other process is using. I don't agree with #tbone's suggestion of just looping until the lock is released: you can't just get rid of data which somebody is using with establishing why they are still working with data that they apparently should not be using.
So, the first step is to find out what the locking session is doing. Why are they still amending this data your background job wants to retire? Here's a script which will help you establish which session has the lock.
Except that you "don't have DBA access to the database". Hmmm, that's a curly one. Basically this is not a problem which can be resolved without DBA access.
It seems like you have several issues to deal with. Unfortunately for you, they are political and architectural rather than technical, and there's not much we can do to help you further.
How about wrapping the truncate or drop in pl/sql that tries the operation in a loop, waiting x seconds between tries, for a max num of tries. Then use dbms_scheduler to call that procedure/function.
Maybe this can help. Seems to be the same issue as the one that you discribe.
(ignore the comic sans, if you can) :)
Right now the process that we're using for inserting sets of records is something like this:
(and note that "set of records" means something like a person's record along with their addresses, phone numbers, or any other joined tables).
Start a transaction.
Insert a set of records that are related.
Commit if everything was successful, roll back otherwise.
Go back to step 1 for the next set of records.
Should we be doing something more like this?
Start a transaction at the beginning of the script
Start a save point for each set of records.
Insert a set of related records.
Roll back to the savepoint if there is an error, go on if everything is successful.
Commit the transaction at the beginning of the script.
After having some issues with ORA-01555 and reading a few Ask Tom articles (like this one), I'm thinking about trying out the second process. Of course, as Tom points out, starting a new transaction is something that should be defined by business needs. Is the second process worth trying out, or is it a bad idea?
A transaction should be a meaningful Unit Of Work. But what constitutes a Unit Of Work depends upon context. In an OLTP system a Unit Of Work would be a single Person, along with their address information, etc. But it sounds as if you are implementing some form of batch processing, which is loading lots of Persons.
If you are having problems with ORA-1555 it is almost certainly because you are have a long running query supplying data which is being updated by other transactions. Committing inside your loop contributes to the cyclical use of UNDO segments, and so will tend to increase the likelihood that the segments you are relying on to provide read consistency will have been reused. So, not doing that is probably a good idea.
Whether using SAVEPOINTs is the solution is a different matter. I'm not sure what advantage that would give you in your situation. As you are working with Oracle10g perhaps you should consider using bulk DML error logging instead.
Alternatively you might wish to rewrite the driving query so that it works with smaller chunks of data. Without knowing more about the specifics of your process I can't give specific advice. But in general, instead of opening one cursor for 10000 records it might be better to open it twenty times for 500 rows a pop. The other thing to consider is whether the insertion process can be made more efficient, say by using bulk collection and FORALL.
Some thoughts...
Seems to me one of the points of the asktom link was to size your rollback/undo appropriately to avoid the 1555's. Is there some reason this is not possible? As he points out, it's far cheaper to buy disk than it is to write/maintain code to handle getting around rollback limitations (although I had to do a double-take after reading the $250 pricetag for a 36Gb drive - that thread started in 2002! Good illustration of Moore's Law!)
This link (Burleson) shows one possible issue with savepoints.
Is your transaction in actuality steps 2,3, and 5 in your second scenario? If so, that's what I'd do - commit each transaction. Sounds a bit to me like scenario 1 is a collection of transactions rolled into one?
I have a query editor (Toad) looking at the database.
At the same time, I am also debugging an application with its own separate connection.
My application starts a transaction, does some updates, and then makes decisions based on some SELECT statements. Because the update statements (which are many and complex) are not committed yet, the results my application gets from its SELECT are not the same as what I get if I run the same statement in Toad.
Currently I get around this by dumping the query output from the app into a text file, and reading that.
Is there a better way to peek inside another oracle session, and see what that session sees, before the commit is complete?
Another way to ask this is: Under Oracle, can I enable dirty reads between only two sessions, without affecting anyone else's session?
No, Oracle does not permit dirty reads. Also, since the changes may not have physically been written to disk, you won't find them in the data files.
The log writer will write any pending data changes at least every three seconds, so you may be able to use the Log Miner stuff to pick it out from there.
But in general, your best bet is to include your own debugging information which you can easily switch on and off as required.
It's not a full answer I know, but while there are no dead reads, there are locks that can give you some idea what is going on.
In session 1 if you insert a row with primary key 7, then you will not see it when you select from session 2. (That would be a dirty read).
However, if you attempt an insert from session 2 using the primary key of 7 then it will block behind session 1 as it has to wait and see if session 1 will commit or rollback. You can use "WAIT 10" to wait 10 seconds for this to happen.
A similar story exists for updates or anything that would cause a unique constraint violation.
Can you not just set the isolation level in the session you want to peak at to 'read uncommitted' with an alter session command or a logon trigger (I have not tried this myself) temporarily?
What I prefer to do (in general) is place debug statements in the code that remain there permanently, but are turned off in production - Tom Kyte's debug.f package is a useful place to start - http://asktom.oracle.com/tkyte/debugf