I have a service being load tested by a third party. A few minutes after starting, we start to see requests hanging for a very long period of time and the caller ultimately times out (after 60 seconds).
They are testing with 15 users with each user using two devices at once, so a total of 30 connections.
The service is a simple façade to a more complex operation, calling an external system. Benchmarking our communications to the external system looks as though everything is responding in the time we would expect (sub 200ms).
The IIS logs reveals a bunch of very high requests (> 200sec) which ultimately do return a 200 and have Win32 error code ERROR_NETNAME_DELETD (error 64). I have checked the Service Log and can match up the response to the request (based on the SOAP message id) and can see that we do eventually respond with the correct information (although the client has long given up).
Any ideas as to what could be causing this behavior? We're hosting in IIS using wsHttpBinding and we're using WS-Security with x509 certificates (message & transport encryption).
We don't have benchmark logging inside of our service but the code is a very simple mapping of the WCF request to the server request, making the request, and mapping the response to the WCF response. We do this manually and there is no parsing involved (straight assignments).
After a detailed investigation, including getting Microsoft support involved we were hitting up against the serviceThrottling defaults, specifically the maxConcurrentSessions. We determined this from perfmon - there is a counter for this. We were unsure as to why we saw this as the service behaved when called with a .NET client.
It turns out that the Java consumer of this application, using CXF, was not respecting the WSDL (specifically the bit about WS-SecureConversation) and closing sessions out when it closed its connection.
Our solution was to jack up the maxConcurrentSessions to a high number, set the inactivityTimeout down low (to a minute) to force session abandonment. In addition, we set establishSecurityContext to false to avoid the WSS negotiation consuming an additional session.
The solution is inelegant as the service logs are littered with errors about forced session closures, but it fixed the issue we were seeing here. Unfortunately we had a requirement for WS-Security so our solution needed to stick with that.
I hope this helps someone as this was an interesting and time consuming problem to pin down.
Related
Oh the joyous question of HTTP vs WebSockets is at it again, however even after quit a bit of reading on the hundreds of versus blog posts, SO questions, etc, etc.. I'm still at a complete loss as to what I should be working towards for our application. In this post I will be supplying information on application functionality, and the types of requests/responses used in our application currently.
Currently our application is a sloppy piece of work, thrown together using AngularJS and AJAX requests to a Apache server running PHP, namely XAMPP. With the launch of our application I've noticed that we're having problems with response times when the server is under any kind of load. This probably has something to do with the sloppy architecture of our server, the hardware, and the fact that our MySQL database isn't exactly optimized.
However, with such a loyal fanbase and investors seeing potential in our application and giving us a chance to roll out a 2.0 I've been studying hard into how to turn this application into a powerhouse of low latency scalability. Honestly the best option would be hire someone with experience, but unfortunately I'm a hobbyist, and a one-man-army without much experience.
After some extensive research, I've decided on writing the backend using NodeJS this time. However I'm having a hard time deciding on HTTP or Websockets. Here's the types of transactions that are done between the Server/Client.
Client sends a request to the server in JSON format. The request has a few different things.
A request id (For processing logic based on the request)
The data associated with the request ID.
The server receives the request, polls the database (if necessary) and then responds to the client in JSON format. Sometimes the server is serving files to the client. Namely images in Base64 format.
Currently the application (When being used) sends a request to the server every time an interface is changed, which on average for our application is once every few seconds. Every action on our interfaces sends another request to the server. The application also sends requests to check for notifications/messages every 8 seconds, (or two seconds depending on if they're on the messaging interface).
Currently here are the benefits I see of a stated connection over a stateless connection with our application.
If the connection is stated, I can eliminate the requests for notifications and messages, as the server can just tell the client whenever one comes available. This can eliminate x(n)/4 requests per second to the server alone.
Handling something like a disconnection from the server is as simple as attempting to reconnect, opposed to handling timeouts/errors per request, this would only be handled on the socket.
Additional security can be obtained by removing security keys for database interaction, this should prevent the possibility of Hijacking(?) of a session_key and using it to manipulate or access another users data. The session_key is only needed due to there being no state in the AJAX setup.
However, I'm someone who started learning programming through TCP game server emulation. So I understand some benefits of a STATED connection, while I don't understand the benefits of a STATELESS connection very much at all. I know they both have their benefits and quirks, but I'm curious what would be the best approach for us.
We're mainly looking for Scalability, as we had a local application launch and managed to bottleneck at nearly 10,000 users in under 48 hours. Luckily I announced this as a BETA and the users are cutting me a lot of slack after learning that I did it all on my own as a learning project. I've disabled registrations while looking into improving the application's front and backend.
IMPORTANT:
If using WebSockets, would we be able to asynchronously download pictures from the server like we can with AJAX? For example, I can make 5 requests to the server using AJAX for 5 different images, and they will all start downloading immediately, using a stated connection would I have to wait for each photo to be streamed before moving to the next request? Would this only bottle-neck a single user, or every user that is waiting on a request to be completed?
It all boils down on how your application works and how it needs to scale. I would use bare WebSockets rather than any wrapper, since it is an already easy to use API and your hands won't be tied when you need to scale out.
Here some links that will give you insight, although not concrete answers to your questions because as I said, it depends on your expectations.
Hard downsides of long polling?
WebSocket/REST: Client connections?
Websockets, and identifying unique peers[PHP]
How HTML5 Web Sockets Interact With Proxy Servers
If your question is Should I use HTTP over Websockets ?, the response is: You should not.
Even if it is faster because you don't lose time opening the connection, you lose also all the HTTP specification like verbs (GET, POST, PATCH, PUT, ...), path, body, and also response, status code. This seams simple but you'll have to re-implement all or part of these protocol things.
So you should use Ajax, as long as it is one ponctual request.
When you need to make an ajax request every 2 seconds, you need in fact that the server sends you data, not YOU request server to check Api change (if changed). So this is a sign that you should implement a websocket server.
I have an asp.net mvc4 web api interface that gets about 54k requests a day.
http://myserv.x.com/api/123/getstuff?whatstuff=thisstuff
I have 3 web servers behind a load balancer that are setup to handle the http requests.
On average response times are ~300ms. However, lately something has gone awry (or maybe it has always been there) as there is sporadic behavior of response times coming back in 10-20sec. This would be for the same request hitting the same server directly instead of through the load balancer.
GIVEN:
- System has been passed down to me so there may be gaps with IIS confiuration, etc,.
- Database: SQL Server 2008R2
- Web Servers: Windows Server 2008R2 Enterprise SP1
- IIS 7.5
- Using MemoryCache aggressively with Model and Business Objects with eviction set to 2hrs
- Looked at the logs but really don't see anything significantly relevant
- One application pool...no other LOB applications running on this server
Assumptions & Ask:
Somehow I'm thinking that something is recycling the application pool or IIS worker threads are shutting down and restarting thus causing each new request to warmup and recache itself. It's so sporadic that it's tough to trouble shoot right now. The same request to the same server comes back fast as expected (back to back N requests) since it was cached in about 300ms....but wait about 5-10-20min and that same request to the same server takes 16seconds.
I have limited tracing to go by as these are prod systems so I can only expose so much logging details. Any help and information attacking this or similar behavior somebody else has run into is appreciated. Thx
UPDATE:
The w3wpe.exe process grows to ~3G. Somehow it gets wiped out and the PID changes so itself or something is killing it every 3-4min I see tons of warnings in my webserver (IIS) log:
A process serving application pool 'MyApplication' suffered a fatal
communication error with the Windows Process Activation Service. The
process id was '1732'. The data field contains the error number.
After 4-5 days of assessing IIS and configuration vs internal code issues I finally found the issue with little to no help with windbg or debugdiag IIS tools. Those tools contain so much information even with mini dumps or log trace stacks that they can be red herrings. Best bet was to reproduce it by setting up a "copy intelligently" instance of a production system, which we did not have at the time and took a bit for ops to set something up.
Needless to say the problem had to do with over cacheing business objects. There was one race condition where updates on a certain table were updating an attribute to that corresponding business object (updates were coming from multiple servers) which was causing an OOC stackoverflow that pretty much caused the cacheing to recursively cache itself to death thus causing the w3wp.exe process to die and psuedo-recycle itself. It was one of those edge cases that was incredibly hard to test and repro in a non-production environment.
It seems this is a common question/problem but despite checking out a number of proposed solutions nothing worked for us so far.
The app
It's a simple chat app, that puts a new interface on an existing app's JSON library. We proxy all the calls to their app to avoid x-domain restrictions (IE8).
ASP.net MVC3 App;
It's hosted in IIS6, W2K3 SP2. DEV svr has 1gig ram, TEST svr has 4gig ram.
The problem
When we approach 20 concurrent users, requests start lagging - no issues in Event Viewer to be found. It looks like calls are just queued. There are NO 503's returned.
What we've tried
We're using AsyncController to long-poll a 3rd party webservice for results
Hosted in IIS6
We're using the TPL to call their service in our AsyncController method
We've modified processModel and set maxWorkerThreads=100.
We've looked at this how-to but the HTTP.SYS config looks to service an infinite number of threads so we haven't bothered adding the reg keys.
The 3rd party service can handle lots of concurrent requests (and is in a web farm, so we're fairly confident we're the weakest link)
what are we missing? - any help greatly appreciated
Well... almost four weeks later and I thought I'd update this in case anyone wants to find out what helped us overcome these limitations (we're cramming around 100 simul connections on our DEV server, 1gig Xeon).
AsyncControllers
If you've got a potentially long waiting request (i.e. long polling) then use them.
Feel free to use TaskFactory but be sure to mark it as a long running process, if there is risk you could exception in your thread, be sure to use ContinueWith so you can decrement the operations, and return the error to your caller.
ServicePointManager
If you're making downstream calls (i.e. WebService/3rd party API) then make sure you have increased DefaultConnectionLimit from the default of 2 simultaneous connections.
A rough guide is 8 * Num cores so you don't starve outgoing connection resources.
See this MSDN article on DefaultConnectionLimit for more info.
IOCP vs RestSharp
I love RestSharp's API, it's fantastic but it's probably meant more for client side programming, not for proxying requests. My mistake!! Use HttpWebRequest and the Begin/End methods to make use of IOCP
If you're looking to reverse proxy or url rewrite, check out URL Rewriter, a great library available freely on CodePlex
In the end, our issue wasn't with incoming requests, it was with requests being proxied to a third party, we weren't supplying enough connections and thus they all queued up lagging the whole system. Happy to say after lots of reading, investigation and coding we've resolved it.
Can somebody explain what ajax-push is? From what I understand it involves leaving HTTP connections open for a long time and reconnecting as needed. It seems to be used in chat systems a lot.
I have also heard when using ajax-push in Java it is important to use something with the NIO-connetors or grizzle serlvet api? Again, I'm just researching what it exactly.
In normal AJAX (call it pull) you ask the server for something and you get it immediately. This is fine when you want to get some data from the server now. But what if something happens on the server and the server wants to push that event to the client(s)?
Technically this is implemented using so called long polling - the browser opens the HTTP connection and waits for the response. As long as there is nothing interesting on the server side, it waits. But when something happens, the server sends the response and the client receives it immediately. This is a huge advantage over normal polling where you ask the server every few seconds - it generates a lot of traffic and still introduces noticeable latency.
The only problem with this approach is the number of pending HTTP connections. Old-school Java servlet containers aren't quite capable of handling such amount of connections due to one-thread-per-connection limitation - they quickly run out of memory. Even though the HTTP threads aren't doing anything (waiting for some other part of the system to wake them up and give them the response), they occupy memory.
However there are plenty of solutions nowadays:
Tomcat NIO connectors
Atmosphere Ajax Push/Comet library
Servlet 3.0 #Async (most portable)
Container-specific features, but Servlet 3.0, if available, should be considered superior.
I am creating a service which receives some data from mobile phones and saves it to the database.
The phone is sending the data every 250 ms. As I noticed that the delay for data storing is increasing I tried to run WireShark and write a log as well.
I noticed that the web requests from mobile phone are being made without the delay (checked with WireShark), but the in the service log I noticed the request is received every second and a half or almost two seconds.
Does anyone know where could be the problem or the way to test and determine the cause of such delay?
I am creating a service with WCF (webHttpBinding) and the database is MS SQL.
By the way the log stores the time of http request and also the time of writing data to the database. As mentioned above the request is received every 1.5 - 2 seconds and after that it takes 50 ms to store data to the database.
Thanks!
My first guess after reading the question was that maybe you are submitting data so fast, the database server is hitting a write-contention lock (e.g. AutoNumber fields?)
If your database platform is SQL Server, take a look at http://www.sql-server-performance.com/articles/per/lock_contention_nolock_rowlock_p1.aspx
Anyway please post more information about the overall architecture of the system... what softwares/platforms are used at what parts etc...
Maybe there is some limitation in the connection imposed by the service provider?
What happens if you (for testing) don't write to the database and just log the page hits in the server log with timestamp?
Check that you do not have any tracing running on the web services, this can really kill perf.