We have a number of .NET applications that use the Oracle Data Provider for .NET. This generally works fine, but occasionally we get a burst of errors from one particular application (sometimes from others, but very rarely), where it's claiming something went wrong with the Oracle Data Access Client. The errors are sometimes slightly different, but all relate to User Defined Types. One example of a failure would be:
Inner Exception:
Message: Object reference not set to an instance of an object.
Source: Oracle.DataAccess
Stack Trace:
at Oracle.DataAccess.Types.OracleUdtDescriptor.get_UdtTypeName()
at Oracle.DataAccess.Client.OracleDataReader.RetrieveSchemaTable(DataTable& dataTable, Boolean isFromEx)
at Oracle.DataAccess.Client.OracleDataReader.GetSchemaTableCopy(DataTable& dataTable, Boolean isFromEx)
at Oracle.DataAccess.Client.OracleDataReader.GetSchemaTable()
The SQL that causes this is:
SELECT a.SECTION_NO,
INITCAP(a.SECTION_NAME) AS SECTION_NAME,
a.SITE_USRN,
a.NETWORK_HIERARCHY,
a.ROAD_CLASS,
a.CENTRAL_ASSET_ID,
a.GEOM,
SDO_NN_DISTANCE(1) AS dist
FROM SCHEMANAME.TABLE_NAME a
WHERE SDO_NN(a.GEOM,
sdo_geometry(2001, 27700, sdo_point_type(366646,101677,NULL), NULL, NULL), 'sdo_num_res=1',1)
= 'TRUE'
There's no pattern to when this happens, and 95% of the time these requests go through with no problem, but at least once a day this will happen. It'll keep occurring for 1 or 2 minutes (so I'll get 10-20 error alerts, all saying about the same thing), then it'll sort itself out and it'll be fine again. I really don't understand what the problem is. I don't know much about User Defined Types, and searching for the particular error returns very few relevant results.
Another example of a similar error message we sometimes get:
Inner Exception:
Message: Object reference not set to an instance of an object.
Source: Oracle.DataAccess
Stack Trace:
at Oracle.DataAccess.Types.OracleUdtDescriptor.GetMetaDataTable()
at Oracle.DataAccess.Client.OracleDataReader.GetCachedOracleUdtDescriptor(Int32 index)
at Oracle.DataAccess.Client.OracleDataReader.RetrieveSchemaTable(DataTable& dataTable, Boolean isFromEx)
at Oracle.DataAccess.Client.OracleDataReader.GetSchemaTableCopy(DataTable& dataTable, Boolean isFromEx)
at Oracle.DataAccess.Client.OracleDataReader.GetSchemaTable()
As you can see, they are similar but not exactly the same, which makes tracking it down even harder. The only possibilities I can think of are that it has something to do with pulling back the GEOM column, but we do that in a huge number of places and don't get this error, or possibly something to do with pulling back the SDO_NN_DISTANCE value, but again, we do that in multiple places with no errors.
As far as I can tell there are no special or exotic column types being returned there, apart from the GEOM, which, considering we are using Oracle Spatial, shouldn't be a problem.
One fix I have attempted is to add a setting to my Web.config called StatementCacheWithUdts.
<oracle.dataaccess.client>
<settings>
<add name="StatementCacheWithUdts" value="0"/>
</settings>
</oracle.dataaccess.client>
This was based on a similar problem I found somewhere (cannot find the link anymore, stupidly forgot to write it down), but this didn't resolve the problem.
Does anyone have any pointers whatsoever as to what I can try?
Further Details
Oracle version: Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
ODP.Net version: 4.121.2.0 ODAC RELEASE 3
Related
When trying to run a fairly basic query using the driver provided by Simba, I'm running into issues where the "nanosecond" value is negative, causing IllegalArgumentException.
When writing a simple query that returns a Timestamp value, what comes back is an epoch value that is initially stored in a Double. Going through and debugging for example, I can see that the value coming back from the query is "1.498571169792E9". This corresponds to a timestamp of "Tuesday, June 27, 2017 1:46:09.792 PM" according to epochconverter.com, which is exactly what it should be.
Continuing to step through the code, we eventually try to use BQCoreUtils.convertUnixToTimestamp(). Now, while I've tried to disassemble the class (thanks IntelliJ!), I can't quite figure out what's going on. It eventually tries to create a new TimestampTz() which is an extension of java.sql.Timestamp, but the value getting passed for nanos is negative. This of course prompts Java to throw an IllegalArgumentException, but I can't figure out what I need to do to avoid this.
I also don't know if there's a simpler explanation for what's going on. Ultimately though, it appears that there's a driver bug. BQCoreUtils.convertUnixToTimestamp doesn't properly safeguard against the nanos calculation being negative.
The dumb question then is: has anyone else experienced issues querying BigQuery where Timestamp values trigger exceptions?
Update: Since this is happening in a Timestamp created by the JDBC driver, it does appear to be a bug in the JDBC driver itself. Please file it under https://issuetracker.google.com/issues?q=componentid:187149.
Original:
The timestamp resolution in BigQuery is microseconds, and it looks like the value you are providing is in seconds, so you should multiply it by 1000000.
With reference to Google issue tracker,
This should be resolved with versions newer than 1.1.1 of the drivers, which also addressed other timestamp-related issues.
If the issue persists, please report it on the Google issue tracker and they will re-open the issue to examine it.
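As a rough illustration of one generic way a negative nanos value can arise, the sketch below splits an epoch value into whole seconds plus a nanosecond remainder. This is not the driver's code (the internals of BQCoreUtils are not shown above), and all function names here are hypothetical. Truncating toward zero leaves a negative remainder for negative inputs, while floor division keeps the remainder in the 0 to 999,999,999 range that java.sql.Timestamp accepts.

```python
from typing import Tuple

MICROS_PER_SECOND = 1_000_000


def to_micros(epoch_seconds: float) -> int:
    """Convert fractional epoch seconds to integer microseconds (BigQuery's resolution)."""
    return round(epoch_seconds * MICROS_PER_SECOND)


def split_floor(epoch_micros: int) -> Tuple[int, int]:
    """Split into (seconds, nanos) with floor division; nanos is never negative."""
    seconds, micros = divmod(epoch_micros, MICROS_PER_SECOND)
    return seconds, micros * 1000


def split_truncate(epoch_micros: int) -> Tuple[int, int]:
    """Split by truncating toward zero; nanos goes negative for negative inputs."""
    seconds = int(epoch_micros / MICROS_PER_SECOND)
    micros = epoch_micros - seconds * MICROS_PER_SECOND
    return seconds, micros * 1000


if __name__ == "__main__":
    micros = to_micros(1.498571169792E9)  # the value from the question
    print(split_floor(micros))
    print(split_floor(-1_500_000), split_truncate(-1_500_000))
```

Whether the driver actually hits this case is unknown; the sketch only shows why a seconds/nanos split needs floor semantics (or an explicit check) before the nanos value is handed to a Timestamp.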
The error I'm getting is "no more data to read from socket" on this line
COALESCE( NonRewrite.SALES_LOG_DATE, SYSDATE) "SoldDate"
Oddly, I can just do NonRewrite.SALES_LOG_DATE and there are no null values anyway. If I do SYSDATE alone it gives me the same error.
If I replace SYSDATE with TO_DATE('18-MAR-2016'), I do not receive an error.
"no more data to read from socket" is a generic error that doesn't really tell you the problem. That error means a database process crashed so hard it didn't even raise a proper exception and the connection died unexpectedly.
When that happens Oracle stores an error message in the alert log, which you can find in the path found from this query: select value from v$parameter where name = 'background_dump_dest';. Look for a file named alert*.log.
In that file there will probably be an ORA-600 or ORA-7445 error around the same time as the exception. The error usually has several parameters, for example ORA-00600: internal error code, arguments: [ktfbtgex-7], [1015817], [1024], [1015816], [], [], [], [].
The first parameter is usually the most important one. If you're lucky you can Google it and find an answer. But usually you'll need to log in to support.oracle.com and search for that error. There's a special page for that; search for the "ora-600 tool". That will bring up a page to search for the first parameter of the error message.
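The alert-log search described above can be sketched as a small script. This is a minimal sketch, not an Oracle tool: the function name is hypothetical, and it assumes only that the error lines contain an ORA-600 or ORA-7445 code followed by bracketed arguments, as in the example message above.

```python
import re
from typing import Iterable, List, Tuple

# Matches ORA-600 / ORA-7445 with or without zero padding (e.g. ORA-00600).
INTERNAL_ERROR = re.compile(r"ORA-0*(600|7445)\b")
# Captures the contents of each [bracketed] argument.
BRACKETED = re.compile(r"\[([^\]]*)\]")


def find_internal_errors(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (error_code, first_argument) for each internal-error line."""
    hits = []
    for line in lines:
        match = INTERNAL_ERROR.search(line)
        if not match:
            continue
        # Only look at the text after the error code for arguments.
        args = BRACKETED.findall(line[match.end():])
        first = args[0] if args else ""
        hits.append(("ORA-" + match.group(1), first))
    return hits


if __name__ == "__main__":
    sample = [
        "some unrelated line",
        "ORA-00600: internal error code, arguments: [ktfbtgex-7], "
        "[1015817], [1024], [1015816], [], [], [], []",
    ]
    print(find_internal_errors(sample))
```

In practice the lines would come from the alert*.log file in the directory returned by the background_dump_dest query, for example by passing an open file object to the function.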
Hopefully that tool will bring up specific documents that explain the problem. There may be a patch, or a workaround, or possibly no information at all. It's usually easiest to work around the problem by avoiding some very specific combination of features, possibly by slightly rewriting the query.
Post the error message, the exact Oracle version, and the entire query and someone may be able to help. If the query is large you'll want to shrink it as much as possible. Shrinking the query and making a reproducible test case may take a few hours but is necessary to truly understand the problem. People who don't spend the time doing that usually end up avoiding important features and give bad advice to other developers like "avoid SYSDATE!".
These types of errors may take a long time to fix.
When running a sproc with SqlDataAdapter.fill(), I noticed it was taking upwards of 90 seconds when running the same sproc in management studio took only 1-2 seconds. I started messing around with the parameters to try to find the issue, and I eventually did, though it's a strange one. I discovered that if I simply declared three new variables in the sproc and directly copied the contents of the parameters into them, and then used those new variables in the body of the sproc, the fill() method dropped to 1-2 seconds just like running the sproc directly in management studio. In other words, doing this:
CREATE PROCEDURE [dbo].[TestProc]
@location nvarchar(100), @startTime datetime, @endTime datetime
AS
declare @location2 nvarchar(100), @endTime2 datetime, @startTime2 datetime
set @location2 = @location
set @startTime2 = @startTime
set @endTime2 = @endTime
--... query using @location2, @startTime2, @endTime2
If I changed even just one of the references in the query body from @startTime2 back to @startTime (the actual parameter passed in from C#), the query jumped right back up to around 90s or even longer.
SO.... why in the world does SQLDataAdapter or SQL Server care what I do with its parameters once they're passed into the sproc? Why would this affect execution time? Any guidance of how to root out this issue further is greatly appreciated. Thanks!
Edit: Although I could've sworn there was a difference between running the query from C# using SqlDataAdapter and using management studio, as of right now, I can't replicate the difference. Now, management studio also takes > 90 seconds to run the sproc when I do NOT copy the parameters. This is a huge relief, because it means the problem isn't somehow with C#, and it's just a more run-of-the-mill (though still strange) SQL Server issue. One of the guys on my team who's an excellent SQL guy is looking at the execution path of the sproc when run with and without first copying the parameters. If we figure it out, I'll post the answer here. Thanks for the help so far!
It's undoubtedly a case of parameter sniffing and improper reuse of execution plans that were created with a different set of parameters that had a very different optimal access pattern.
The sudden change to the two different-style accesses being the same (rather than one quick) strongly suggests that the cached execution plan was updated to a version that now performs slowly with both access methods, or your data or your parameters changed.
In my experience the general culprit in this sort of small/huge difference in execution time is use of a nested loop join where a hash match is actually required. (For a very small number of rows the nested loop is superior; past a certain fairly low threshold, the hash match becomes less expensive. Unless you're lucky enough that your inputs are both sorted by the join criteria, a merge join is rare to find, as sorting large sets tends to be more expensive than hash matching.)
The reason that your parameter tweaking in the SP fixed the problem is that then SQL Server became aware you were doing something to the parameters by setting them to some value (ignoring what you'd set them to) and it had to compute a new execution plan, so it threw out the old one and designed a new access path based on the current set of parameters, getting better results.
If this problem persists then playing with SP recompilation/clearing the plan cache combined with using different parameters that must deal with hugely different number of rows may reveal where the problem is. Look at the execution plan that is used to run the SP with different parameters and see the effects of different access strategies being employed in the wrong conditions.
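The plan-reuse effect described in this answer can be illustrated with a toy model. This is only a sketch under made-up assumptions: the cost formulas, the table size, and the PlanCache class are hypothetical and do not reflect how SQL Server actually costs or caches plans. It shows a plan chosen for a small first parameter being reused for a much larger one.

```python
import math

INNER_ROWS = 1_000_000  # hypothetical size of the joined table


def nested_loop_cost(outer_rows: int) -> float:
    # Toy model: one index seek into the inner table per outer row.
    return outer_rows * math.log2(INNER_ROWS)


def hash_match_cost(outer_rows: int) -> float:
    # Toy model: one pass over each input to build and probe the hash table.
    return float(outer_rows + INNER_ROWS)


COSTS = {"nested_loop": nested_loop_cost, "hash_match": hash_match_cost}


def best_plan(outer_rows: int) -> str:
    """Pick the cheaper join strategy for this row count."""
    return min(COSTS, key=lambda name: COSTS[name](outer_rows))


class PlanCache:
    """Compiles a plan for the first parameters seen, then reuses it."""

    def __init__(self) -> None:
        self._plan = None

    def run(self, outer_rows: int) -> float:
        if self._plan is None:
            self._plan = best_plan(outer_rows)  # "sniffed" from the first call
        return COSTS[self._plan](outer_rows)

    def clear(self) -> None:
        self._plan = None


if __name__ == "__main__":
    cache = PlanCache()
    cache.run(10)                  # small input: nested loop gets cached
    slow = cache.run(500_000)      # large input reuses the nested loop plan
    cache.clear()                  # analogous to forcing a recompile
    fast = cache.run(500_000)      # plan rebuilt for the large input
    print(slow, fast)
```

In this model the large call is several times cheaper once the cached plan is discarded, which is the kind of gap that recompiling with different parameters is meant to expose.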
I have a 400-line SQL query which throws this exception within 30 seconds:
ORA-03113: end-of-file on communication channel
Below are things to note:
I have set the timeout as 10 mins
There is one last condition when removed resolves this error.
This error came only recently when I analyzed indexes.
The troubling condition is like this:
AND UPPER (someMultiJoin.someColumn) LIKE UPPER ('%90936%')
So my assumption is that the query is being terminated from the server side, apparently because it's identified as a resource hog.
Is my assumption correct? How should I go about fixing this problem?
EDIT: I tried to get the explain plan of the faulty query, but the explain plan query also gives me an ORA-03113 error. I understand that my query is not very performant, but why should that be a reason for an ORA-03113 error? I am running the query from Toad, and no alert log or trace is generated. My DB version is
Oracle9i Enterprise Edition Release 9.2.0.7.0 - Production
One possible cause of this error is a thread crash on the server side. Check whether the Oracle server has generated any trace files, or logged any errors in its alert log.
You say that removing one condition from the query causes the problem to go away. How long does the query take to run without that condition? Have you checked the execution plans for both versions of the query to see if adding that condition is causing some inefficient plan to be chosen?
I've had similar connection dropping issues with certain variations on a query. In my case connections dropped when using rownum under certain circumstances. It turned out to be a bug that had a workaround by adjusting a certain Oracle Database configuration setting. We went with a workaround until a patch could be installed. I wish I could remember more specifics or find an old email on this but I don't know that the specifics would help address your issue. I'm posting this just to say that you've probably encountered a bug and if you have access to Oracle's support site (support.oracle.com) you'll likely find that others have reported it.
Edit:
I had a quick look at Oracle support. There are more than 1000 bugs related to ORA-03113 but I found one that may apply:
Bug 5015257: QUERY FAILS WITH ORA-3113 AND COREDUMP WHEN QUERY_REWRITE_ENABLED='TRUE'
To summarize:
Identified in 9.2.0.6.0 and fixed in 10.2.0.1
Running a particular query (not identified) causes ORA-03113
Running explain on query does the same
There is a core file in $ORACLE_HOME/dbs
Workaround is to set QUERY_REWRITE_ENABLED to false: alter system set query_rewrite_enabled = FALSE;
Another possibility:
Bug 3659827: ORA-3113 FROM LONG RUNNING QUERY
9.2.0.5.0 through 10.2.0.0
Problem: Customer has long running query that consistently produces ORA-3113 errors.
On customer's system they receive core.log files but do not receive any errors in the alert.log. On test system I used I received ORA-7445 errors.
Workaround: set "_complex_view_merging"=false at session level or instance level.
You can safely remove the UPPER on both sides when the LIKE pattern contains only digits (which have no case). This can also reduce the time spent evaluating the LIKE condition.
AND UPPER (someMultiJoin.someColumn) LIKE UPPER ('%90936%')
Is equivalent to:
AND someMultiJoin.someColumn LIKE '%90936%'
Numbers are not affected by UPPER (and % is independent of character casing).
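The equivalence can be checked informally outside the database. The sketch below mimics the two conditions with Python string operations; it is only an illustration, since Python's upper() is not Oracle's UPPER, and the helper names are hypothetical.

```python
def like_contains(value: str, fragment: str) -> bool:
    """Mimic: value LIKE '%fragment%' as a plain substring test."""
    return fragment in value


def upper_like_contains(value: str, fragment: str) -> bool:
    """Mimic: UPPER(value) LIKE UPPER('%fragment%')."""
    return fragment.upper() in value.upper()


if __name__ == "__main__":
    samples = ["ab90936cd", "AB90936CD", "x9093y6", "90936", ""]
    # With a digits-only fragment both conditions agree on every sample.
    print(all(like_contains(s, "90936") == upper_like_contains(s, "90936")
              for s in samples))
    # With letters in the fragment they can differ, so UPPER matters there.
    print(like_contains("abc", "B"), upper_like_contains("abc", "B"))
```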
From the information so far it looks like a back-end crash, as Dave Costa suggested some time ago. Were you able to check the server logs?
Can you get the plan with set autotrace traceonly explain? Does it happen from SQL*Plus locally, or only with a remote connection? Certainly sounds like an ORA-600 on the back-end could be the culprit, particularly if it's at parse time. The successful run taking longer than the failing one seems to rule out a network problem. I suspect it's failing quite quickly but the client is taking up to 30 seconds to give up on the dead connection, or the server is taking that long to write trace and core files.
Which probably leaves you the option of patching (if you can find a relevant fix for the specific ORA-600 on Metalink) or upgrading the DB; or rewriting the query to avoid it. You may get some ideas for how to do that from Metalink if it's a known bug. If you're lucky it might be as simple as a hint, if the extra condition is having an unexpected impact on the plan. Is someMultiJoin.someColumn part of an index that's used in the successful version? It's possible the UPPER is confusing it and you could persuade it back on to the successful plan by hinting it to use the index anyway, but that's obviously rather speculative.
It means you have been disconnected. This is not likely to be due to being a resource hog.
I have seen where the connection to the DB is running over a NAT and because there is no traffic it closes the tunnel and thus drops the connection. Generally if you use connection pooling you won't get this.
As @Daniel said, the network connection to the server is being broken. You might take a look at End-of-file on communication channel to see if it offers any useful suggestions.
Share and enjoy.
This is often a bug in the Cost Based Optimizer with complex queries.
What you can try to do is change the execution plan. E.g. use WITH to pull some subqueries out. Or use the SELECT /*+ RULE */ hint to prevent Oracle from using the CBO. Dropping the statistics can also help, because Oracle then uses another execution plan.
If you can update the database, make a test installation of 9.2.0.8 and see if the error is gone there.
Sometimes it helps to make a dump of the schema, drop everything in it and import the dump again.
I was having the same error, in my case what was causing it was the length of the query.
By reducing said length, I had no more problems.
We have a query that takes 2 seconds to run in SQL Server Management Studio but takes 13 seconds to be shown on a client screen.
I used dotTrace to profile my source code and noticed that a method called SNIReadSync (part of the ADO.NET assemblies) takes a long time to do its job (9 seconds). I ran my code on the server itself to rule out network effects, and the result was the same.
It doesn't matter if I'm using OleDBConnection or SqlConnection.
It doesn't matter if I'm using a DataReader or a DataSet.
Connection pooling does not solve this issue (as my results show).
I googled this issue and couldn't find an answer explaining what this method actually does or how we can improve it.
Here's what I found on Stack Overflow, which isn't helpful either:
https://stackoverflow.com/questions/1610874/snireadsync-executing-between-120-500-ms-for-a-simple-query-what-do-i-look-for
Ignoring SNIReadSync for a moment (I think this might be a red herring).
The symptoms you are describing sound like an incorrectly cached query plan.
Please update your statistics (or rebuild indexes) and see if it still occurs.