I need your opinion in this situation. I’ll try to explain the scenario. I have a Windows service that stores data in an Oracle database periodically. The table where this data is being stored is partitioned by date (Interval-Date Range Partitioning). The database also has a dbms_scheduler job that, among other operations, truncates and drops older partitions.
This approach has been working for some time, but recently I had an ORA-00054 error. After some investigation, the error was reproduced with the following steps:
Open one sqlplus session, disable auto-commit, and insert data in the
partitioned table, without committing the changes;
Open another sqlplus session and truncate/drop an old partition (DDL
operations are automatically committed, if I’m not mistaken). We
will then get the ORA-00054 error.
There are some constraints worthy to be mentioned:
I don’t have DBA access to the database;
This is a legacy application and a complete refactoring isn’t
feasible;
So, in your opinion, is there any way of dropping these old partitions, without the risk of running into an ORA-00054 error and without the intervention of the DBA? I can just delete the data, but the number of empty partitions will grow everyday.
Many thanks in advance.
This error means somebody (or something) is working with the data in the partition you are trying to drop. That is, the lock is granted at the partition level. If nobody was using the partition your job could drop it.
Now you say this is a legacy app and you don't want to, or can't, refactor it. Fair enough. But there is clearly something not right if you have a process which is zapping data that some other process is using. I don't agree with #tbone's suggestion of just looping until the lock is released: you can't just get rid of data which somebody is using with establishing why they are still working with data that they apparently should not be using.
So, the first step is to find out what the locking session is doing. Why are they still amending this data your background job wants to retire? Here's a script which will help you establish which session has the lock.
Except that you "don't have DBA access to the database". Hmmm, that's a curly one. Basically this is not a problem which can be resolved without DBA access.
It seems like you have several issues to deal with. Unfortunately for you, they are political and architectural rather than technical, and there's not much we can do to help you further.
How about wrapping the truncate or drop in pl/sql that tries the operation in a loop, waiting x seconds between tries, for a max num of tries. Then use dbms_scheduler to call that procedure/function.
Maybe this can help. Seems to be the same issue as the one that you discribe.
(ignore the comic sans, if you can) :)
Related
I apologize if this is too vague, but it is a random issue that occurs with many types of statements. Google and Stack Overflow searches have failed me. Here is what I am experiencing, I hope that someone out there has seen or at least heard of this happening and possibly knows of a solution.
From time to time, with no apparent rhyme or reason, statements that I run through PL/SQL Developer against our Oracle databases do not "stick". Last week I ran an update on table A, a commit for the update statement, then a truncate on table B and an insert to table B followed by another commit. Everything seemed to work fine, as in I received no errors. I was, of course, able to query the changes and see that they were made. However, upon logging out and then back in, the changes had not been committed. Even the truncate command had not worked "stuck" - and truncates do not need a commit performed.
Some details that may be helpful: I am logging into the database server through PL/SQL on a shared account that is used by my team only to gain access to the schema (multiple schemas on each server, each schema has one shared login/PW). Of the 12 people on my team, I am the only one experiencing this issue. I have asked our database administration team to investigate my profile setup and have been told that my profile looks the same as my teammates' profiles. We are forced to go through Citrix to connect to our production database servers. I can only have one instance of PL/SQL open at any time through Citrix, so I typically have PL/SQL connected to several schemas, but I have never been running SQL on more than one schema simultaneously. I'm not even sure if that's possible, but I thought I would mention it. I typically have 3-4 windows open within PL/SQL, each connected to a different schema.
My manager was directly involved in a case where something similar to this happened. I ran four update commands, and committed each one in between; then he ran a select statement only to find that my updates had not actually committed.
I hope that one of my fellow Overflowers' has seen or heard of this issue, or at least may be able to provide me with a direction to follow to attempt to get to the bottom of this.
"it has begun to reflect poorly on me and damage my reputation in the company."
What would really reflect poorly on you would be you believing that an Oracle RDBMS is a magical or random device, or, even worse, sentient and conducting a personal vendetta against you. Computers may seem vindictive but that is always us projecting onto them ;-)
The way to burnish your reputation would be through an informed investigation of the situation. Databases do not randomly lose transactions. So, what is going on?
Possible culprits:
Triggers: does table A have an UPDATE trigger which suppresses some of your SQL?
Synonyms: are tables A and B really the tables you think they are?
Ownership: are these tables in another schema which has row level security enabled (although that should through an error message if you violate a policy)?
PL/SQL Developer configuration: is the IDE hiding error messages or are you not spotting them?
Object types: are tables A and B really tables? Could they be views with INSTEAD OF triggers suppressing some of your SQL?
Object types: or could A and B be materialized views and your session has QUERY_REWRITE_INTEGRITY=stale_tolerated?
If that last one seems a bit of a stretch there other similarly esoteric explanations, involving data flashback, pipelined functions and other malarky. This a category of explanation which indicates a colleague is pranking you.
How to proceed:
Try different tools. SQL*Plus (or the new SQL Command Line) may produce a different outcome. Rule out PL/SQL Developer.
Write some test cases. Strive to establish reproducible test cases: given a certain set-up this SQL statement always leads to a given outcome (SQL always sticks or always does not).
Eliminate bugs or "funnies" in the queries you use to check the results.
Use the data dictionary to understand the characteristics and associated objects of the troublesome tables. You need to understand what causes the different outcomes. What distinguishes a row where the UPDATE holds compared to one where it does not?
I have used PL/SQL Developer for over a decade and I have never known it silently undo successful truncate operations. If it can do that, AA should add it as a menu item. It seems more likely that you ran the commands against the wrong database connection.
I can feel your frustration, sorry you're going through this. I am surprised, however, that at a large company, your change control process is like this. I don't work for a large multi-national company, but any changes done to a production database are first approved by management and run by the DBAs (or in your case, your team). Every script that is run does a few things:
Lists the database instance information its connecting to. For example:
select host_name, instance_name, version, startup_time from v$instance;
Spools the output to a file (the DBAs typically use sqlplus, but I'm sure PL/SQL Developer can do the same)
Shows the current date and time (in the beginning and end of the script)
The output file is saved to a change control server (the directory structure makes it easy to pull any changes for a given instance and/or given timeframe)
Exits on any errors:
WHENEVER SQLERROR EXIT SQL.SQLCODE
Any additional checks that need to be run post script (select counts, etc)
Shows each command that is being run (set echo on), including the commits!
All of this would allow you to not only verify that the script was run successfully, but would allow you to CYOA. Perhaps you can talk with your team about putting some of this in place in your own environment. Hope that helps.
I have no way of knowing if my issue is fixed or not, but here is what I've done:
1. I contacted our company's Citrix team to request that they give my team the ability to have several instances of PL/SQL open. This has been done and so will eliminate the need for one instance with multiple DB connections.
2. I contacted the DBA's and had them remove my old profile, then create a new one with a new username.
So far, all SQL I've run under these new conditions has been just fine. However, I have no way of recreating the issue I'm experiencing so I am just continuing on about my business and hoping for the best.
Should I find a few months from now that I have not experienced this issue again I will update this post in case anyone else experiences it.
Thank you all for the accusations of operator error (screenshots prove that this is not operator error but why should you believe me when my own co-workers have accused me of faking the screenshots) and for the moral support.
The app is connected to an oracle 11G database using the JDBC driver provided from the official website. When many users (Around 50) from different instances connected to the same schema start using the application, i experience some freezes all around the app and when i run a query to get the locking sessions and the locked objects i find only "Row Exclusive" lock type, which normally should not lock all the table and permits multiple sessions to perfom DML queries. Thus my question is when can a row exclusive table lock the whole table or else provoque these freezes.
Note: i have looked around in forums and saw some MAXTRANS and ITL configurations, could these parameters be generating these freezes ?
Thank you
i think you have your terms confused.. "Row Exclusive" locks mean 'i have locked this row.. no other session is allowed to update it'.
so if you have 50 sessions all trying to update or delete a specific row then yes.. you are going to have contention. and that will seriously limit your performance.
so I would guess that its possible your application is missing a commit statment that would free the lock after the row has been modified.
you say you are using sequences.. are you using an actually oracle sequence (ie create sequence my_seq; ) or are you doing to custom thing that like select max(id)+1 from sequence_table which would be another bad idea.
Maybe it's too early to blame Oracle. It can be a servlet container configuration such as not enough exec threads. Or it can be an internal contention. Many things can go wrong. A quick way to identify the bottleneck is to get a thread dump when the application is experiencing "some freezes all around the app" and see where your threads as stuck. You can get a thread dump by sending kill -3 to your Java process. Post it here and will be happy to look at it.
We're supposed to update some columns in a table 'tab1' with some values(which can be picked up from a different table 'tab2'). Now 'tab1' is getting new records inserted almost every few seconds(from MQ by a different system).
We want to design a solution that will update 'tab1' as soon as there is a new record added to 'tab1'. It doesn't have to be done in the same moment as the record is added, but the sooner its updated, the better. We were considering what can be the best way to do it:
1) First we thought of a 'before insert' trigger on tab1, so we can update the record - but that design was vetted out by our Architect, since the organization doesn't allow use of database triggers(don't know why, but that is a restriction, we have been asked to live with)
2) Second we thought, we will create a stored procedure which will perform the updates to records in 'tab1'. This stored procedure will be called within an long-running loop from a shell script. After every iteration there will be a pause of lets say 3 secs and then next loop will kick off, which will again call the stored proc. So this job will run 12 AM to 11:59 PM and then restarted every night.
My question is - is there a database only solution to this? Any other solutions are also welcome, but simplicity of design will be a huge plus. One colleague was wondering if there is a 'trigger-like' solution, which will perform the job within the database itself - so we don't have to write a shell script.
Any pointers will be appreciated!
Triggers The obvious solution.
DBMS_SCHEDULER Another obvious solution.
Continuous Query Notification This would be a "trigger-like" solution. It's meant to call an application when the results of a specific query would be different. But you can call PL/SQL instead of an application, and the query could be a simple select * from tab1; which would fire on any table changes. Normally I'd hope an architect would be to look at this solution and say, "a trigger would be a lot simpler".
DBMS_JOBS This is the old version of DBMS_SCHEDULER and is not as good. But it's different and maybe it won't be caught as an unauthorized feature.
Ignore the Architect The problem isn't that he disapproved of using triggers or jobs; there may be legitimate reasons to ban those technologies. The problem is that he rejected a sound idea without clearly articulating why it wasn't allowed. If he understood databases, or cared about your project, or acted like a professional, he would have said something like, "Oh, I'm sorry, I know that's the typical way to do this, but we don't allow it because of X, Y, Z."
To answer your questions:
Q: Is there a database only solution to this?
Unlikely, given all the limitations on your architecture.
Q: Any other solutions are also welcomed
It seems your likely solution is to have your application handle what would normally be handled by a trigger or stored procedure. Just do it all in one transaction.
Right now the process that we're using for inserting sets of records is something like this:
(and note that "set of records" means something like a person's record along with their addresses, phone numbers, or any other joined tables).
Start a transaction.
Insert a set of records that are related.
Commit if everything was successful, roll back otherwise.
Go back to step 1 for the next set of records.
Should we be doing something more like this?
Start a transaction at the beginning of the script
Start a save point for each set of records.
Insert a set of related records.
Roll back to the savepoint if there is an error, go on if everything is successful.
Commit the transaction at the beginning of the script.
After having some issues with ORA-01555 and reading a few Ask Tom articles (like this one), I'm thinking about trying out the second process. Of course, as Tom points out, starting a new transaction is something that should be defined by business needs. Is the second process worth trying out, or is it a bad idea?
A transaction should be a meaningful Unit Of Work. But what constitutes a Unit Of Work depends upon context. In an OLTP system a Unit Of Work would be a single Person, along with their address information, etc. But it sounds as if you are implementing some form of batch processing, which is loading lots of Persons.
If you are having problems with ORA-1555 it is almost certainly because you are have a long running query supplying data which is being updated by other transactions. Committing inside your loop contributes to the cyclical use of UNDO segments, and so will tend to increase the likelihood that the segments you are relying on to provide read consistency will have been reused. So, not doing that is probably a good idea.
Whether using SAVEPOINTs is the solution is a different matter. I'm not sure what advantage that would give you in your situation. As you are working with Oracle10g perhaps you should consider using bulk DML error logging instead.
Alternatively you might wish to rewrite the driving query so that it works with smaller chunks of data. Without knowing more about the specifics of your process I can't give specific advice. But in general, instead of opening one cursor for 10000 records it might be better to open it twenty times for 500 rows a pop. The other thing to consider is whether the insertion process can be made more efficient, say by using bulk collection and FORALL.
Some thoughts...
Seems to me one of the points of the asktom link was to size your rollback/undo appropriately to avoid the 1555's. Is there some reason this is not possible? As he points out, it's far cheaper to buy disk than it is to write/maintain code to handle getting around rollback limitations (although I had to do a double-take after reading the $250 pricetag for a 36Gb drive - that thread started in 2002! Good illustration of Moore's Law!)
This link (Burleson) shows one possible issue with savepoints.
Is your transaction in actuality steps 2,3, and 5 in your second scenario? If so, that's what I'd do - commit each transaction. Sounds a bit to me like scenario 1 is a collection of transactions rolled into one?
I'm trying to test the utility of a new summary table for my data.
So I've created two procedures to fetch the data of a certain interval, each one using a different table source. So on my C# console application I just call one or another. The problem start when I want to repeat this several times to have a good pattern of response time.
I got something like this: 1199,84,81,81,81,81,82,80,80,81,81,80,81,91,80,80,81,80
Probably my Oracle 10g is making an inappropriate caching.
How I can solve this?
EDIT: See this thread on asktom, which describes how and why not to do this.
If you are in a test environment, you can put your tablespace offline and online again:
ALTER TABLESPACE <tablespace_name> OFFLINE;
ALTER TABLESPACE <tablespace_name> ONLINE;
Or you can try
ALTER SYSTEM FLUSH BUFFER_CACHE;
but again only on test environment.
When you test on your "real" system, the times you get after first call (those using cached data) might be more interesting, as you will have cached data. Call the procedure twice, and only consider the performance results you get in subsequent executions.
Probably my Oracle 10g is making a
inappropriate caching.
Actually it seems like Oracle is doing some entirely appropriate caching. If these tables are going to be used a lot then you would hope to have them in cache most of the time.
edit
In a comment on Peter's response Luis said
flushing before the call I got some
interesting results like:
1370,354,391,375,352,511,390,375,326,335,435,334,334,328,337,314,417,377,384,367,393.
These findings are "interesting" because the flush means the calls take a bit longer than when the rows are in the DB cache but not as long as the first call. This is almost certainly because the server has stored the physical records in its physical cache. The only way to avoid that, to truely run against an empty cache is to reboot the server before every test.
Alternatively learn to tune queries properly. Understanding how the database works is a good start. And EXPLAIN PLAN is a better tuning aid than the wall-clock. Find out more.