I have a 400 line sql query which is throwing exception withing 30 seconds
ORA-03113: end-of-file on communication channel
Below are things to note:
I have set the timeout as 10 mins
There is one last condition when removed resolves this error.
This error came only recently when I analyzed indexes.
The troubling condition is like this:
AND UPPER (someMultiJoin.someColumn) LIKE UPPER ('%90936%')
So my assumption is that the query is getting terminated from the server side apparently because its identified as a resource hog.
Is my assumption appropriate ? How should I go about to fix this problem ?
EDIT: I tried to get the explain plan of faulty query but the explain plan query also gives me an ORA-03113 error. I understand that my query is not very performant but why should that be a reason for ORA-03113 error. I am trying to run the query from toad and there are no alert log or trace generated, my db version is
Oracle9i Enterprise Edition Release 9.2.0.7.0 - Production
One possible cause of this error is a thread crash on the server side. Check whether the Oracle server has generated any trace files, or logged any errors in its alert log.
You say that removing one condition from the query causes the problem to go away. How long does the query take to run without that condition? Have you checked the execution plans for both versions of the query to see if adding that condition is causing some inefficient plan to be chosen?
I've had similar connection dropping issues with certain variations on a query. In my case connections dropped when using rownum under certain circumstances. It turned out to be a bug that had a workaround by adjusting a certain Oracle Database configuration setting. We went with a workaround until a patch could be installed. I wish I could remember more specifics or find an old email on this but I don't know that the specifics would help address your issue. I'm posting this just to say that you've probably encountered a bug and if you have access to Oracle's support site (support.oracle.com) you'll likely find that others have reported it.
Edit:
I had a quick look at Oracle support. There are more than 1000 bugs related to ORA-03113 but I found one that may apply:
Bug 5015257: QUERY FAILS WITH ORA-3113 AND COREDUMP WHEN QUERY_REWRITE_ENABLED='TRUE'
To summarize:
Identified in 9.2.0.6.0 and fixed in 10.2.0.1
Running a particular query
(not identified) causes ORA-03113
Running explain on query does the
same
There is a core file in
$ORACLE_HOME/dbs
Workaround is to set
QUERY_REWRITE_ENABLED to false: alter
system set query_rewrite_enabled =
FALSE;
Another possibility:
Bug 3659827: ORA-3113 FROM LONG RUNNING QUERY
9.2.0.5.0 through 10.2.0.0
Problem: Customer has long running query that consistently produces ORA-3113 errros.
On customers system they receive core.log files but do not receive any errors
in the alert.log. On test system I used I receivded ORA-7445 errors.
Workaround: set "_complex_view_merging"=false at session level or instance level.
You can safely remove the "UPPER" on both parts if you are using the like with numbers (that are not case sensitive), this can reduce the query time to check the like sentence
AND UPPER (someMultiJoin.someColumn) LIKE UPPER ('%90936%')
Is equals to:
AND someMultiJoin.someColumn LIKE '%90936%'
Numbers are not affected by UPPER (and % is independent of character casing).
From the information so far it looks like an back-end crash, as Dave Costa suggested some time ago. Were you able to check the server logs?
Can you get the plan with set autotrace traceonly explain? Does it happen from SQL*Plus locally, or only with a remote connection? Certainly sounds like an ORA-600 on the back-end could be the culprit, particularly if it's at parse time. The successful run taking longer than the failing one seems to rule out a network problem. I suspect it's failing quite quickly but the client is taking up to 30 seconds to give up on the dead connection, or the server is taking that long to write trace and core files.
Which probably leaves you the option of patching (if you can find a relevant fix for the specific ORA-600 on Metalink) or upgrading the DB; or rewriting the query to avoid it. You may get some ideas for how to do that from Metalink if it's a known bug. If you're lucky it might be as simple as a hint, if the extra condition is having an unexpected impact on the plan. Is someMultiJoin.someColumn part of an index that's used in the successful version? It's possible the UPPER is confusing it and you could persuade it back on to the successful plan by hinting it to use the index anyway, but that's obviously rather speculative.
It means you have been disconnected. This not likely to be due to being a resource hog.
I have seen where the connection to the DB is running over a NAT and because there is no traffic it closes the tunnel and thus drops the connection. Generally if you use connection pooling you won't get this.
As #Daniel said, the network connection to the server is being broken. You might take a look at End-of-file on communication channel to see if it offers any useful suggestions.
Share and enjoy.
This is often a bug in the Cost Based Optimizer with complex queries.
What you can try to do is to change the execution plan. E.g. use WITH to pull some subquerys out. Or use the SELECT /*+ RULE */ hint to prevent Oracle from using the CBO. Also dropping the statistics helps, because Oracle then uses another execution plan.
If you can update the database, make a test installation of 9.2.0.8 and see if the error is gone there.
Sometimes it helps to make a dump of the schema, drop everything in it and import the dump again.
I was having the same error, in my case what was causing it was the length of the query.
By reducing said length, I had no more problems.
Related
I'm trying to implement MonetDB in three machines, one master and two replicas in lazy logical replication.
For now I'm trying to implement in only machine with the following commands I took from this old issue in only one machine for now.
Everything goes according to plan until the first problem I have: When trying to create tables or inserting stuff I get the following errors I was not able to find on google:
Error in optimizer wlc: TypeException:user.main[17]:'wlc.predicate' undefined in: X_0:any := wlc.predicate("alpha":str, "id":str);
Error in optimizer wlc: TypeException:user.main[50]:'wlc.predicate' undefined in: X_0:any := wlc.predicate("beta":str, "id":str);
Error in optimizer wlc: TypeException:user.main[77]:'wlc.depend' undefined in: X_0:any := wlc.depend("beta":str, X_1:lng);
I got around this by setting optpipe to minimal_pipe but I wanted to know why this is happening so I don't have to do this.
The second problem I have when trying CALL wlr.replicate:
Perhaps a missing wlr.master() call.
How do I correctly set-up replication?
Thanks in advance.
The wlc/wlr features are experimental and de facto deprecated in current releases of MonetDB and completely removed starting from the next major release. Replication in MonetDB is a topic currently under revision. You might be better off formulating a feature request on MonetDB's githup page.
You might also consider looking into the concepts of replicate and remote tables. But those are definitely not solutions by themselves and if used as such, implement replication on the SQL layer instead of the infrastructural layer.
But on the short run, I do not expect that the open source community can help you out here much. Consider commercial support otherwise if feasible.
In an environment where online processing and batch processing is simultaneous, is there a way to devise the parameter open_cursors?
I am trying to look so that I can optimize our testing environment for open_cursor parameter. I have already checked the Oracle Performance Tuning guide but still I am unable to understand how to arrive to this number.
Will running load runner tests help me get to this number? Please let me know if any more info is needed to help.
Do you actually have a problem? open_cursors is a limit on the number of cursors a single session can have open. It is not a system-wide limit. The proper value isn't influenced by load or what happens in some other session.
The default value is almost always more than sufficient for a properly written application. If you have an application that has long-running sessions and cursor leaks, increasing the value may let you run longer before you start to encounter problems while you find and address the cursor leaks but if you have a leak you'll eventually run out no matter what your setting. In the vast majority of cases, when people get an error related to open_cursors, the proper solution is to find and fix the bug that is leaking cursors rather than to change open_cursors.
The error I'm getting is "no more data to read from socket" on this line
COALESCE( NonRewrite.SALES_LOG_DATE, SYSDATE) "SoldDate"
Oddly, I can just do NonRewrite.SALES_LOG_DATE and there are no null values anyway. If I do SYSDATE alone it gives me the same error.
If I replace SYSDATE with TO_DATE('18-MAR-2016')), I do not receive an error.
"no more data to read from socket" is a generic error that doesn't really tell you the problem. That error means a database process crashed so hard it didn't even raise a proper exception and the connection died unexpectedly.
When that happens Oracle stores an error message in the alert log, which you can find in the path found from this query: select value from v$parameter where name = 'background_dump_dest';. Look for a file named alert*.log.
In that file there will probably be an ORA-600 or ORA-7445 error around the same time as the exception. The error usually has several parameters, for example ORA-00600: internal error code, arguments: [ktfbtgex-7], [1015817], [1024], [1015816], [], [], [], [].
The first parameter is usually the most important one. If you're lucky you can Google it and find an answer. But usually you'll need to login to support.oracle.com and search for that error. There's a special page for that, search for the "ora-600 tool". That will bring up a page to search for the first parameter of the error message.
Hopefully that tool will bring up specific documents that explain the problem. There may be a patch, or a workaround, or possibly no information at all. It's usually easiest to workaround the problem by avoiding some very specific combination of features, possibly by slightly re-writing the query.
Post the error message, the exact Oracle version, and the entire query and someone may be able to help. If the query is large you'll want to shrink it as much as possible. Shrinking the query and making a reproducible test case may take a few hours but is necessary to truly understand the problem. People who don't spend the time doing that usually end up avoiding important features and give bad advice to other developers like "avoid SYSDATE!".
These types of errors may take a long time to fix.
We have a query that takes 2 seconds to run in Sql Server Management Studio but it takes 13 seconds to be shown on a client screen.
I used dotTrace to profile my source code and noticed there is this SNIReadSync method (part of ADO.net assemblies)that takes a lot of time to do its job(9 seconds).I ran my source over server so I could omit the network effects and the result was the same.
It doesn't matter if I'm using OleDBConnection or SqlConnection.
It doesn't matter if I'm using a DataReader or a DataSet.
Connection pooling does not solve this issue(as my result shows).
I googled this issue and I couldn't find an answer to the question that what this method is actually doing and how we can improve it.
here's what I found on StakOverFlow that's not helpful either:
https://stackoverflow.com/questions/1610874/snireadsync-executing-between-120-500-ms-for-a-simple-query-what-do-i-look-for
Ignoring SNIReadSync for a moment (I think this might be a red herring).
The symptoms you are describing sound like an incorrectly cached query plan.
Please update your statistics (or rebuild indexes) and see if it still occurs.
I'm in a situation where I have many connections that are "IDLE in transaction". I
spotted them myself. I could kill them, but that wouldn't prevent them
from happening again. In fact, it happens regularily.
Is there a way to get the list of SQL statements that were previously
executed as part of a given transaction?
If I could get that, it would be a hell lot easier to figure out the
misbehaving client.
There is some work being done right now on the pgsql-hackers mailing list toward adding exactly this capability, under the title "display previous query string of idle-in-transaction". Where that looks to be going is that pg_stat_activity will have a new column named something like "last_query" that includes the info you want.
Until that's done and available in probably the next release, the suggestion from depesz is probably as good as you're going to get here--unless you want to start grabbing early patches working on this feature as they trickle out.
Basically - you have to turn on all statements logging, with time of execution. best way to achieve it is to use log_min_duration_statement with value of 0, and using log_line_prefix such that is it will include information required to match lines coming form the same backend.
I generally use log_line_prefix = '%m %u#%d %p %r '.
Afterward you can write some tool to help you hunt idle-in-transaction, or you can use mine.