I'm seeing some strange behaviour in my system.
There is a process, "proc_1", which is run against a group of people. When the group is under 5000-6000 people, everything works fine. But recently it was run against a group of 12000 (it took about 3 hours) and I got this:
ORA-01086: savepoint never established in this session or is invalid
Something crashed and tried to roll back to a savepoint, but there was none.
At first I checked whether any commits/rollbacks had been added by mistake - that looked fine.
Then I put a bug into the code to force a crash, and whether it crashed after 10 minutes, 20 minutes, or an hour, every process reported the actual underlying error (a zero-divide).
I have a few guesses left, but out of curiosity:
Can a savepoint die within a session when the session runs for too long?
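For what it's worth, a savepoint doesn't expire with time: it lives exactly as long as its transaction, so any COMMIT or full ROLLBACK (including one issued inside a called procedure) discards it. A minimal sketch of that failure mode, using Python's sqlite3 module since SQLite savepoints behave analogously here:

```python
import sqlite3

# SQLite analogue: a COMMIT releases every savepoint in the transaction,
# so a later ROLLBACK TO fails - the same shape of failure as ORA-01086.
conn = sqlite3.connect(":memory:")
conn.isolation_level = None          # manage transactions by hand
cur = conn.cursor()
cur.execute("CREATE TABLE t (n INTEGER)")

cur.execute("BEGIN")
cur.execute("SAVEPOINT sp1")
cur.execute("INSERT INTO t VALUES (1)")
cur.execute("COMMIT")                # releases sp1 along with the transaction

try:
    cur.execute("ROLLBACK TO sp1")   # the savepoint no longer exists
except sqlite3.OperationalError as exc:
    print("rollback failed:", exc)
```

So if ORA-01086 appears only on the long runs, it may be worth checking for a code path that commits (or fully rolls back) only when the data volume or elapsed time is large, ending the transaction before the ROLLBACK TO is reached.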
My Oracle DBA has set up a task with the following repeat_interval:
Start Date :"30/JAN/20 08:00AM"
Repeat_interval: "FREQ=DAILY; INTERVAL=0; BYMINUTE=15"
Can I ask what "INTERVAL=0" means?
Does it mean this task will run daily from 8 AM and repeat every 15 minutes until it succeeds?
I tried to get the answer from Google, but all I found was what INTERVAL=1 means - nothing for 0.
So it would be great if anyone could shed some light here.
Thanks in advance!
INTERVAL is the number of increments of the FREQ value between executions. I believe in this case that a value of 0 or 1 would be the same. The schedule as shown would execute once per day (FREQ=DAILY), at approximately 15 minutes past a random hour (BYMINUTE=15, but BYHOUR and BYSECOND are not set).
The schedule has nothing to do with whether or not the previous execution succeeded. The start date is only the date at which the job was enabled, not when it actually starts processing.
If you want it to run every 15 minutes from the moment you enable it, you should set as follows:
FREQ=MINUTELY; INTERVAL=15
If you want it to run exactly on the quarter hour, then this:
FREQ=MINUTELY; BYMINUTE=0,15,30,45; BYSECOND=0
If you want it to run every day at 8am, then this:
FREQ=DAILY; BYHOUR=8; BYMINUTE=0; BYSECOND=0
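The calendar arithmetic behind two of those variants can be sketched in plain Python. This only mimics the semantics for illustration - Oracle's own calendaring evaluator is the real authority:

```python
from datetime import datetime, timedelta

def minutely_interval(start, interval, count):
    """FREQ=MINUTELY; INTERVAL=n - fire every n minutes from the start date."""
    return [start + timedelta(minutes=interval * i) for i in range(count)]

def quarter_hours(start, count):
    """FREQ=MINUTELY; BYMINUTE=0,15,30,45; BYSECOND=0 - return the next
    `count` quarter-hour marks strictly after `start`."""
    runs, t = [], start.replace(second=0, microsecond=0)
    while len(runs) < count:
        t += timedelta(minutes=1)
        if t.minute in (0, 15, 30, 45):
            runs.append(t)
    return runs

start = datetime(2020, 1, 30, 8, 0)
print(minutely_interval(start, 15, 3))                # 08:00, 08:15, 08:30
print(quarter_hours(datetime(2020, 1, 30, 8, 7), 2))  # 08:15, 08:30
```

The difference matters: INTERVAL anchors the cadence on the start date, while a BYMINUTE list pins firings to the wall clock regardless of when the job was enabled.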
I have these instructions:
$prd = $modelp->loadByAttribute('sku', $psku);
$prd->setStatus((int)$status);
$prd->save();
I checked, and the product had indeed been saved and its status changed; however, the last instruction runs for hours. You may think it's the instructions that follow - no, I'm sure (I can check that against a written list), and if I skip these instructions the program finishes quickly. I read all the products, and only one product has to be changed; that's where it hangs.
So I have to terminate the program and ask my customer to change such products (sometimes just one) manually, because of this problem...
What can be the reason (version 1.5)?
I got this error finally:
SQLSTATE[HY000]: General error: 1205 Lock wait timeout exceeded; try restarting transaction
I am the only one working on the system, so what can be the reason?
This probably has to do with the re-index of prices on UPDATE ON SAVE... that takes a lock sometimes...
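The shape of this failure - one session holding a lock in an open transaction while another session waits and times out - is easy to reproduce outside MySQL. A hedged sketch using SQLite; the "reindexer" and "saver" roles are illustrative:

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "shop.db")

# Session 1 (think: the price reindexer) holds a write lock in an open transaction.
reindexer = sqlite3.connect(path, isolation_level=None)
reindexer.execute("CREATE TABLE product (sku TEXT, status INTEGER)")
reindexer.execute("INSERT INTO product VALUES ('ABC123', 1)")
reindexer.execute("BEGIN IMMEDIATE")                # take the write lock...
reindexer.execute("UPDATE product SET status = 2")  # ...and hold it

# Session 2 (the product save) waits for the lock and gives up.
saver = sqlite3.connect(path, timeout=0.2)          # give up after 0.2 s
try:
    saver.execute("UPDATE product SET status = 0")
except sqlite3.OperationalError as exc:
    print("save failed:", exc)                      # database is locked
```

InnoDB's version of "give up" is the 1205 error above, controlled by innodb_lock_wait_timeout; the fix is to find who holds the lock (here, the reindexer) rather than to raise the timeout.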
I can't simply stop it, and it continues to read blocks and use rollback segments. It's a simple select, but I fear it won't stop...
The session is marked as killed.
What can I do?
I've found some extra info on the following link:
http://oracleunix.wordpress.com/2006/08/06/alter-system-kill-session-marked-for-killed-forever/
but if I launch the following query it returns 241 records. What does that mean?
SELECT spid
  FROM v$process
 WHERE NOT EXISTS (SELECT 1
                     FROM v$session
                    WHERE paddr = addr);
If the session you killed had a large open transaction, it will have to roll back all those changes, so you should see the amount of undo in use go down, not up.
Try this query:
select vt.used_ublk
  from v$transaction vt, v$session vs
 where vs.taddr = vt.addr
   and vs.sid = &&sid;
Now, if you run the above query multiple times in succession, is used_ublk falling or increasing? If it's falling, then the session is rolling back.
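The "run it repeatedly and watch the trend" step can be automated. A small sketch, where sample_used_ublk is a hypothetical callable that would run the query above through a driver such as python-oracledb (the stub values below are made up):

```python
import time

def is_rolling_back(sample_used_ublk, polls=3, pause=1.0):
    """Sample the undo-block count a few times; a strictly decreasing
    series means the killed session is still rolling its transaction back."""
    readings = []
    for _ in range(polls):
        readings.append(sample_used_ublk())
        time.sleep(pause)
    return all(later < earlier for earlier, later in zip(readings, readings[1:]))

# Stub sampler standing in for the v$transaction query (illustrative values).
samples = iter([900, 640, 410])
print(is_rolling_back(lambda: next(samples), pause=0))  # True - undo is shrinking
```

If the count is flat or rising instead, the session isn't rolling back and you are in the "marked for kill forever" situation the question describes.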
Hope that helps.
I'm going to assume that the session you killed was just a select, as you state, and that you're operating on a *nix variant.
If you're running an update or delete then waiting for the rollback to complete would be best. You can check the amount of rollback by using the following query, which I've shamelessly stolen from orafaq because I don't remember these things off the top of my head:
select rn.name "Rollback Segment", rs.rssize/1024 "Size (KB)", rs.gets "Gets",
       rs.waits "Waits", (rs.waits/rs.gets)*100 "% Waits",
       rs.shrinks "# Shrinks", rs.extends "# Extends"
  from sys.v_$rollname rn, sys.v_$rollstat rs
 where rn.usn = rs.usn;
First off, a select shouldn't be using rollback... if it does, then you've probably got a function that does some DML somewhere, which isn't a very good idea. You also don't mention whether this select is using a database link; if it is, that clears things up a little bit.
If the select is not using a database link and is not doing any DML, then the link you've found will do everything you need. Your 241 rows should be mostly identical - there may be more than one value if you have more than one process with this problem. I would change the query to:
select p.*
  from v$process p
  left outer join v$session s
    on p.addr = s.paddr
 where s.saddr is null;
This means that you can check the username that owns the process, the terminal it was run from, and the program that is running before doing anything drastic. You don't want to go around killing the wrong thing.
You can then go directly to your box and issue a SIGTERM: kill 1234. This sends a terminate signal to the process at the OS level and should get rid of it.
As an addendum, if your session is using a database link then killing it on the box it was running from is normally not enough. You may also have to kill it on the box that you're selecting from. Try the standard Oracle kill first and then scale it to OS level.
This should work. However, it's possible to get a lot more drastic; I've had to recently after a slave VM started accepting connections incoming and then not sending an error or returning a value.
Warning: The more violent you get to the box the more violent it will be to you and the more likely things are to go wrong.
The next step up from a sigterm is a sigkill. This is a signal to the OS to kill a process without asking any questions. On *nix this is kill -9 1234. This should rarely be necessary. If you were doing DML it will stop any rollback and may make it difficult to recover the database to a consistent state in the event of failure.
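The SIGTERM-then-SIGKILL escalation can be demonstrated safely with a throwaway process (Python on *nix; sleep merely stands in for the stuck server process):

```python
import os
import signal
import subprocess
import time

# A throwaway process standing in for the stuck server process.
proc = subprocess.Popen(["sleep", "60"])

os.kill(proc.pid, signal.SIGTERM)      # polite: equivalent to `kill 1234`
time.sleep(0.2)
if proc.poll() is None:                # still alive? escalate.
    os.kill(proc.pid, signal.SIGKILL)  # equivalent to `kill -9 1234`
proc.wait()
print("terminated by signal", -proc.returncode)
```

The negative return code is the signal number that ended the process, which is how you can confirm after the fact whether the polite signal was enough.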
If this still doesn't work then you have major problems. In the example given with the VM we ended up doing the following in order to stop the problem. Most of these are not recommended :-).
Oracle - alter system kill session 'sid,serial#'
OS - kill 1234
OS - kill -9 1234
Oracle - shutdown immediate - this is actually politer than kill -9... It doesn't send a sigkill to the OS, and it waits for processes to roll back, etc. But it's always good to be polite to your database.
Oracle - shutdown abort - this is about the same as a sigkill. It's a signal to the database to stop everything immediately and die (confusing terminology, I know).
OS - reboot
Yes, that's right: reboot didn't work. Once you've reached this stage, you'd better hope you're using a VM. We ended up deleting it...
I'm having a problem where no matter what I try all Passenger instances are destroyed after an idle period (5 minutes, but sometimes longer). I've read the Passenger docs and related questions/answers on Stack Overflow.
My global config looks like this:
PassengerMaxPoolSize 6
PassengerMinInstances 1
PassengerPoolIdleTime 300
And my virtual config:
PassengerMinInstances 1
The above should ensure that at least one instance is kept alive after the idle timeout. I'd like to avoid setting PassengerPoolIdleTime to 0 as I'd like to clean up all but one idle instance.
I've also added the ruby binary to my CSF ignore list to prevent the long running process from being culled.
Is there somewhere else I should be looking?
Have you tried setting PassengerMinInstances to something other than 1, like 3, to see whether that works?
OK, I found the answer for you at this link: http://groups.google.com/group/phusion-passenger/browse_thread/thread/7557f8ef0ff000df/62f5c42aa1fe5f7e . Look at the last comment by the Phusion guy.
Is there a way to ensure that I always have 10 processes up and
running, and that each process only serves 500 requests before being
shut down?
"Not at this time. But the current behavior is such that the next time
it determines that more processes need to be spawned, it will make sure
at least PassengerMinInstances processes exist."
I have to say their documentation doesn't seem to match the current behavior.
This seems to be quite a common problem for people running Apache on WHM/cPanel:
http://techiezdesk.wordpress.com/2011/01/08/apache-graceful-restart-requested-every-two-hours/
Enabling piped logging sorted the problem out for me.
We have an application calling two Oracle databases using two connections (which are kept open throughout the application). For certain functionality, we use distributed transactions. We have Enlist=false in the connection string and manually enlist the connections in the transaction.
The problem comes in a scenario where we update the same record very frequently within a distributed transaction, and we see a delay before the data committed in the previous run becomes visible.
ex.
using (OracleConnection connection1 = new OracleConnection())
{
    using (OracleConnection connection2 = new OracleConnection())
    {
        connection1.ConnectionString = connection1String;
        connection1.Open();
        connection2.ConnectionString = connection2String;
        connection2.Open();

        // for 100 times, do an update
        {
            // check the previously updated value
            connection1.EnlistTransaction(currentTransaction);
            connection2.EnlistTransaction(currentTransaction);
            // do an update using connection1
            // do some updates with connection2
        }
    }
}
As in the above code fragment, we do an update and then check the previously updated value in the next iteration. The issue comes up when we run this frequently against a single record: we don't see the committed update from the previous iteration, even though it was committed. When this happens, the update is visible to other applications after a very, very small delay, and even within our own code it's visible if we debug and run the line again.
It's almost as if there is a delay in the commit, even though the commit call has already returned in the code.
Anyone have any ideas?
It turned out that there's no way to control this behavior through ODAC. So the only viable solution was to implement retry behavior in our code: since this occurs very rarely, when it happens we delay 10 seconds and retry the same operation.
Additional details on what I found can be found here.
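The retry workaround described above can be sketched like this (all names are illustrative, not from any real API; the real transient check would match the specific stale-read condition):

```python
import time

def with_retry(operation, retries=3, delay=10.0, is_transient=lambda exc: True):
    """Run `operation`; on a transient failure, wait `delay` seconds and retry.
    Non-transient failures and the final attempt re-raise immediately."""
    for attempt in range(retries):
        try:
            return operation()
        except Exception as exc:
            if attempt == retries - 1 or not is_transient(exc):
                raise
            time.sleep(delay)

# Usage: a read that returns stale data twice before the commit becomes visible.
calls = {"n": 0}
def read_after_commit():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("stale value read")
    return "fresh value"

print(with_retry(read_after_commit, delay=0))  # fresh value, on the third attempt
```

Keeping the retry count low and the delay generous (as in the 10-second choice above) suits a failure that is rare but slow to clear.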