I'm having a ton of trouble making a consistent connection to SQL Server while debugging, lately. We're spending most of our time diagnosing the issue from a server/networking level, but I wondered if there is a common configuration issue that might be at work here.
Here is some general information about the problem. Frankly, I don't even know what information would be relevant. So, any questions welcome.
This is a web forms project using LINQ-to-SQL for data access
It's a template that is used to create many different sites using a config setting
Most of the data retrieval calls are made with ObjectTrackingEnabled = false
The SQL Server is mirrored, so the connection string has a fail over
partner specified, a connection timeout of 30 seconds, and a Network=dbmssocn
The problem is inconsistent - but frequent - and
only affects the site when debugging locally.
In production, the sites log errors to the DB unless a connection fails in which case they send an email; we have received no dropped connection emails from the prod sites
We have not experienced any large bump in traffic in terms of queries running against the SQL Server
UPDATE: After writing the question, I suspected that perhaps the mirrored server setup and time out was causing the issue. Only, after setting up the connection string so that a different one is used depending on whether the site is being run in debug mode or not, I am still having the same trouble connecting.
This ended up being a bad NIC.
Related
We have recently upgraded from Oracle.ManagedDataAccess.EntityFramework to Oracle.EntityFrameworkCore (we are on .net standard 2.0). When we connect to the database we use proxy credentials, with the following connection string:
User Id=changingUserId;Data Source=dbname;Proxy User Id=proxyUserId;Proxy Password=proxyUserPassword;
The UserID element changes based on who is connecting.
The problem we have is that the connection pools are no longer working as expected, with many connections being spawning and not closed - we very quickly reach the pool size limit and everything grinds to halt. Before the upgrade, pools would increase and decrease in size, but they now only grow!
Reading the oracle docs, it appears it requires the connection string to be identical for connection pooling to work correctly, but I don't see how this is possible when we are using proxy users. Has anyone else come across this/got around it or am I missing something?
Thanks
Chris
We have found a work around, adding the users password into the connection string makes it work as expected - no more filling up the connection pool/connection numbers are again rising and falling.
User Id=changingUserId;Password=usersPwd;Data Source=dbname;Proxy User Id=proxyUserId;Proxy Password=proxyUserPassword;
This isn't ideal for us - authentication/authorisation is handled elsewhere - but it will do for now. We are raising a call with Oracle as I suspect this is a bug in their library.
We are working on an ASP.NET 5 Web API project that is in production now but we are experiencing an issue where it becomes unresponsive intermittently throughout the day.
A few notes about the application architecture. It is an ASP.NET Web API project using a MariaDB database on a separate EC2 instance within the same private network. The connection string uses the private IP of the database server to avoid any name resolution issues. The site is hosted via IIS 10.
The application itself has been developed carefully following the best practices provided by Microsoft. Heavy focus on async operations, minimizing query response times and offloading more expensive operations into background services.
The app is extremely responsive. It performs with sub 100ms responses on almost all requests, even the more complicated requests, and all the way up until it becomes unresponsive this high level of performance remains the same. We tend to see between 10-30 requests per second and 300-500 select queries per second at peak usage so not too extreme. However, randomly (2-3 times over a 24 hour period) it will begin hanging on requests and simply not respond to the request. During this time, the database is still extremely responsive and we are never over 300 connections out of our 512 connection limit.
The resources on the application server itself are never really taxed much at all. The CPU never gets above ~20% and the memory usage sits around 20-30%.
If I were to stop the site in IIS and start it again while this is happening, it will quickly come back online. If I don't it will be down for a few minutes until IIS finally kills it due to a failed health check. There are no real errors generated as a response to the issue other than typical errors caused by the hanging of the process such as connection terminated errors. The only thing I have seen before that gave me pause was the fact that there a few connection timeouts when getting the connection from the pool, but like I said, the connections to the server are never close to the limit.
Also, this app and version has been in production for months and it wasn't until the traffic volume started to grow that we started seeing these issues. At this point, I am at a loss for next steps of troubleshooting and I'm seeking suggestions.
In IIS App Pool advanced settings set Start Mode to AlwaysRunning
I never found a root cause for this issue, however, after updating to newer versions of .NET MVC this issue went away. My best guess is that changes with the Kestrel possibly resolved this issue, although, I have no idea what specific change that might have been. I have gone through the change logs a few times and didn't see anything that specifically jumped out at me.
My VB6 program uses ADODB to do a lot of SQL (2000) CRUD.
Sometimes the network connection between the remote clients and the data center somehow "drops" resulting in the impossibility to establish new connections (so users launching the program can't use it).
The issue is the following:
Anyone who is using the program at the moment of the "drop" can continue using it with no issues whatoever, perform every operation, update data, read data, and everything seems like is working normally.
User then proceeds to fire up a "sum-up" report which lists everything that was done (before or after the "drop").
If we then check the database, all data regarding whatever was done after the network drop is not there. User goes back into the program and everything is as it was before the network drop.
It seems like all queries where somehow performed in-memory ? I'm at a loss about how to even approach the issue (I'm familiar enough with VB6 to work with the source code but I don't know a lot about ADODB).
I haven't yet tried to replicate the behavior due to limited customer's availability (development environment is housed in their offices), I'll try starting up the program from the IDE then rip out the network cable.
Provided I can replicate the issue, how do I fix this ? Is there some setting I'm not aware of ?
On a side note: the issue is sporadic (it happened a handful of times during the last year, and the software is being used heavily and on a daily basis by mutiple concurrent users).
After reading up on Disconnected Recordsets, it seems that's what's behind this odd behavior I'm experiencing.
This is not something that can be simply "turned off".
I have an asp.net mvc4 web api interface that gets about 54k requests a day.
http://myserv.x.com/api/123/getstuff?whatstuff=thisstuff
I have 3 web servers behind a load balancer that are setup to handle the http requests.
On average response times are ~300ms. However, lately something has gone awry (or maybe it has always been there) as there is sporadic behavior of response times coming back in 10-20sec. This would be for the same request hitting the same server directly instead of through the load balancer.
GIVEN:
- System has been passed down to me so there may be gaps with IIS confiuration, etc,.
- Database: SQL Server 2008R2
- Web Servers: Windows Server 2008R2 Enterprise SP1
- IIS 7.5
- Using MemoryCache aggressively with Model and Business Objects with eviction set to 2hrs
- Looked at the logs but really don't see anything significantly relevant
- One application pool...no other LOB applications running on this server
Assumptions & Ask:
Somehow I'm thinking that something is recycling the application pool or IIS worker threads are shutting down and restarting thus causing each new request to warmup and recache itself. It's so sporadic that it's tough to trouble shoot right now. The same request to the same server comes back fast as expected (back to back N requests) since it was cached in about 300ms....but wait about 5-10-20min and that same request to the same server takes 16seconds.
I have limited tracing to go by as these are prod systems so I can only expose so much logging details. Any help and information attacking this or similar behavior somebody else has run into is appreciated. Thx
UPDATE:
The w3wpe.exe process grows to ~3G. Somehow it gets wiped out and the PID changes so itself or something is killing it every 3-4min I see tons of warnings in my webserver (IIS) log:
A process serving application pool 'MyApplication' suffered a fatal
communication error with the Windows Process Activation Service. The
process id was '1732'. The data field contains the error number.
After 4-5 days of assessing IIS and configuration vs internal code issues I finally found the issue with little to no help with windbg or debugdiag IIS tools. Those tools contain so much information even with mini dumps or log trace stacks that they can be red herrings. Best bet was to reproduce it by setting up a "copy intelligently" instance of a production system, which we did not have at the time and took a bit for ops to set something up.
Needless to say the problem had to do with over cacheing business objects. There was one race condition where updates on a certain table were updating an attribute to that corresponding business object (updates were coming from multiple servers) which was causing an OOC stackoverflow that pretty much caused the cacheing to recursively cache itself to death thus causing the w3wp.exe process to die and psuedo-recycle itself. It was one of those edge cases that was incredibly hard to test and repro in a non-production environment.
We have a fairly standard client/server application built using MS RPC. Both client and server are implemented in C++. The client establishes a session to the server, then makes repeated calls to it over a period of time before finally closing the session.
Periodically, however, especially under heavy load conditions, we are seeing an RPC exception show up with code 1754: RPC_S_NOTHING_TO_EXPORT.
It appears that this happens in the middle of a session. The user is logged on for a while, making successful calls, then one of the calls inexplicably returns this error. As far as we can tell, the server receives no indication that anything went wrong - and it definitely doesn't see the call the client made.
The error code appears to have permanent implications, as well. Having the client retry the connection doesn't work, either. However, if the user has multiple user sessions active simultaneously between the same client and server, the other connections are unaffected.
In essence, I have two questions:
Does anyone know what RPC_S_NOTHING_TO_EXPORT means? The MSDN documentation simply says: "No interfaces have been exported." ... Huh? The session was working fine for numerous instances of the same call up until this point...
Does anyone have any ideas as to how to identify the real problem? Note: Capturing network traffic is something we would rather avoid, if possible, as the problem is sporadic enough that we would likely go through multiple gigabytes of traffic before running into an occurrence.
Capturing network traffic would be one of the best ways to tackle this issue. If you can't do that, could you dump the client process and debug with WinDBG or Visual Studio? Perhaps compare a dump when operating normally versus in the error state?