WCF connections close very slowly with SSL - performance

We have WCF services that operate over multiple protocols for different customers. Most work fine, but when we use SSL the connections take a long time to close. Opening a connection is no problem, but closing is very slow.
The strangest behavior is that the close time is directly proportional to the amount of data transmitted on the connection. If only a few bytes are sent from the server to the client, the connection closes almost instantly, but after a search that returns several hundred rows, closing the connection takes as long as the search itself. It looks as if the results are sent back to the server for verification before the connection is allowed to close.
An error is almost never thrown, but the close time essentially doubles the time each call takes.
Here are the basic settings:
Custom binding
Binary encoding
Reliable session, Ordered=true
Binding element is HttpsTransportBindingElement
Certificate validation through a RemoteCertificateValidationCallback
All of the proxies are constructed programmatically with ChannelFactory.

We found that the problem was ReliableSession. Before the session closes, ReliableSession tries to verify everything that was sent, on the next connection. This sounds like a good idea, but it is essentially worthless: even if something failed to verify, by then it is too late to do anything about it.
Bottom line: ReliableSession isn't very reliable.

Just a theory: it could be writing to a log when the proxy closes, so you take an extra hit for decryption, or it may not be caching HTTPS results.
Do you have any WCF logging turned on?
Does the CPU spike when you close the proxy?
Could you check if it is actually sending two requests to the server?

Related

Can/Should an HTTP read_timeout be retried?

I'm on a network that usually causes a ton of connection timeout issues, and occasionally I run into read timeout issues as well. Retrying the code whenever a connect timeout happens fixes the problem with connecting to the server. Is it safe to retry the code whenever I get a read_timeout, or could the response become corrupted? I'm using Ruby with the Net::HTTP client, but I guess this could apply to other languages as well.
A read_timeout means that the server did not send any data within the configured timeout. A corrupted response is unlikely, because TCP delivers the data that does arrive reliably and in order.
Whether it is safe to retry depends on what operation you are performing and what guarantees the service you are interacting with gives you.
In general, GET should be safe to retry.
POST/PUT may need special handling (e.g. rereading some state before deciding to retry), because these usually change something on the server, and a timeout does not tell you whether the server already applied the change.
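To make the GET-versus-POST distinction concrete, here is a minimal Python sketch of a retry wrapper. The question uses Ruby's Net::HTTP, where a read timeout surfaces as Net::ReadTimeout; with Python's http.client it surfaces as socket.timeout, an alias of TimeoutError since Python 3.10. The names call_with_retry and RETRYABLE_METHODS, and the choice of GET, HEAD and OPTIONS as automatically retryable, are assumptions for illustration. The caller supplies a send function that performs one complete request attempt.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Methods this sketch resends automatically after a read timeout.
# Assumption: adjust to the guarantees of the service you are calling.
RETRYABLE_METHODS = frozenset({"GET", "HEAD", "OPTIONS"})


def call_with_retry(method: str, send: Callable[[], T],
                    max_attempts: int = 3, backoff_seconds: float = 0.5) -> T:
    """Run send(); on TimeoutError, retry only if the HTTP method is retryable."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    retryable = method.upper() in RETRYABLE_METHODS
    for attempt in range(1, max_attempts + 1):
        try:
            return send()
        except TimeoutError:
            # For POST and similar, the server may already have applied the
            # change, so surface the error and let the caller reread state.
            if not retryable or attempt == max_attempts:
                raise
            time.sleep(backoff_seconds * attempt)  # linear backoff between attempts
    raise AssertionError("unreachable")
```

Failing fast on non-retryable methods leaves the "reread state, then decide" step to the caller, which is the special handling the answer describes.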

What do the times mean in Google Chrome's timeline in the network panel?

When troubleshooting performance with Google Chrome's network panel, I often see different times and wonder what they mean.
Can someone confirm that I understand these correctly:
Blocking: Time blocked by the browser's limit on concurrent requests to the same domain (???)
Waiting: Waiting for a connection from the server (???)
Sending: Time spent transferring the file from the server to the browser (???)
Receiving: Time spent by the browser analyzing and decoding the file (???)
DNS Lookup: Time spent resolving the hostname.
Connecting: Time spent establishing a socket connection.
Now how would someone fix long blocking times?
Now how would someone fix long waiting times?
Sending is time spent uploading the data/request to the server. It occurs between blocking and waiting. For example, if I post back an ASPX page this would indicate the amount of time it took to upload the request (including the values of the forms and the session state) back to the ASP server.
Waiting is the time after the request has been sent, but before a response from the server has been received. Basically this is the time spent waiting for a response from the server.
Receiving is the time spent downloading the response from the server.
Blocking is the amount of time between the UI thread starting the request and the HTTP GET request getting onto the wire.
The order these occur in is:
Blocking*
DNS Lookup
Connecting
Sending
Waiting
Receiving
*Blocking and DNS Lookup might be swapped.
The network tab does not show the time the browser spends processing the response.
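Because each phase starts when the previous one ends, the start of every phase is the sum of the durations before it, and the request's total time is where the last phase ends. A small Python sketch of that layout, assuming the order listed above and made-up millisecond values; the phase keys and the waterfall function are hypothetical names, not Chrome's own field names.

```python
from itertools import accumulate

# Phase order from the answer above (Blocking and DNS Lookup may be swapped).
PHASES = ("blocking", "dns_lookup", "connecting", "sending", "waiting", "receiving")


def waterfall(durations_ms: dict[str, float]) -> list[tuple[str, float, float]]:
    """Return (phase, start_ms, end_ms) for each phase; missing phases count as 0."""
    values = [durations_ms.get(phase, 0.0) for phase in PHASES]
    # Each phase starts where the previous one ended, so starts are prefix sums.
    starts = [0.0, *accumulate(values)][:-1]
    return [(phase, start, start + d) for phase, start, d in zip(PHASES, starts, values)]
```

With blocking 2, DNS 10, connecting 20, sending 1, waiting 100 and receiving 5 ms, receiving starts at 133 ms and the request ends at 138 ms; in that example, Waiting dominates, which points at the server rather than the client.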
If you have long blocking times then the machine running the browser is running slowly. You can fix this by upgrading the machine (more RAM, faster processor, etc.) or by reducing its workload (turn off services you do not need, closing programs, etc.).
Long wait times indicate that your server is taking a long time to respond to requests. This means either:
The request takes a long time to process (like if you are pulling a large amount of data from the database, large amounts of data need to be sorted, or a file has to be found on an HDD which needs to spin up).
Your server is receiving too many requests to handle them all in a reasonable amount of time (it might take 0.02 seconds to process a request, but when you have 1000 requests there will be a noticeable delay).
The two issues (long waiting + long blocking) are related. If you can reduce the workload on the server by caching, adding servers, and reducing the work required for active pages, you should see improvements in both areas.
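The 0.02-second example works out as simple arithmetic: if 1000 requests arrive at once and a single worker handles them one after another, the last one finishes after 1000 * 0.02 = 20 seconds. A sketch of that estimate, assuming every request costs the same and all arrive together; last_request_delay and its workers parameter are hypothetical, with workers standing in for adding servers.

```python
import math


def last_request_delay(requests: int, seconds_per_request: float, workers: int = 1) -> float:
    """Seconds until the last of a burst of simultaneous requests completes.

    Assumes every request arrives at once, each takes the same time, and
    `workers` requests can be processed in parallel.
    """
    if requests < 0 or workers < 1:
        raise ValueError("need requests >= 0 and workers >= 1")
    rounds = math.ceil(requests / workers)  # batches processed one after another
    return rounds * seconds_per_request
```

With four workers the same burst clears in about 5 seconds, which is the effect of adding servers or otherwise reducing per-request work.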
You can read a detailed official explanation from the Google team. It is a really helpful resource, and the information you want is under the Timeline view section.
Resource network timing shows the same information as the resource bar in the Timeline view. Answering your question:
DNS lookup: Time spent performing the DNS lookup (the browser needs to find the IP address of site.com, and this takes time).
Blocking: Time the request spent waiting for an already established connection to become available for re-use. As was said in another answer it does not depend on your server - this is client's problem.
Connecting: Time it took to establish a connection, including TCP handshakes/retries, DNS lookup, and time connecting to a proxy or negotiating a secure-socket layer (SSL). Depends on network congestion.
Sending: Time spent sending the request. It depends on the size of the data sent (usually small, since a request is almost always tiny unless you submit a big image or a huge amount of text), network congestion, and the proximity of client to server.
Waiting: Time spent waiting for the initial response. This is mostly how long your server takes to process the request and respond (how fast it computes things, fetches records from the database, and so on), plus the network round trip.
Receiving: Time spent receiving the response data. This is similar to sending, but now you are getting data from the server (the response is usually bigger than the request), so it also depends on the size, connection quality, and so on.
"Blocking: Time the request spent waiting for an already established connection to become available for re-use. As was said in another answer it does not depend on your server - this is client's problem."
I do not agree with the statement above. All else being equal (same machine, same workload), my browser shows very little "Blocking" time for one website and a long blocking time for another.
So if the time spent waiting for one of the six connections (plus proxy negotiation**) is high, it is mostly due to the cascading effect of the server's slowness, or of a badly designed page (too much being sent across the wire, too many times).
** Whatever "Proxy Negotiation" means! Nobody explains it very well, particularly when no local or CDN proxy is actually involved.

How does Google Docs autosave work?

Okay, I know it sounds generic. But I mean on an AJAX level. I've tried using Firebug to track the NET connections and posts and it's a mystery. Does anyone know how they do the instant autosave constantly without DESTROYING the network / browser?
My guess (and this is only a guess) is that Google uses a push service. This seems the most viable option, given that their chat client (which is also integrated into the window) uses it to deliver "real-time" messages with minimal latency.
I'm betting they have a whole setup that manages everything connection-related and sends flags to trigger specific elements. You won't see connection triggers, because the initial page visit establishes the connection, which then stays open for as long as you have the page open. For example:
You visit the page
The browser establishes a connection to a host such as api.docs.google.com (an example name), which remains open
The client-side code then sends various commands and receives an assortment of responses.
These commands are sent back and forth until you either:
Lose the connection (timeout, etc.), in which case it is re-established
The browser window is closed
Here is how I picture a typical exchange:
CLIENT: DOC_FETCH mydocument.doc
SERVER: DOC_CONTENT mydocument.doc 15616 ...
CLIENT: DOC_AUTOSAVE mydocument.doc 24335 ...
CLIENT: IM collaboratorName Hi Joe!
SERVER: IM_OK collaboratorName OK
SERVER: AUTOSAVE_OK mydocument.doc OK
The DOC_FETCH command says "I want the data." The server replies with the corresponding DOC_CONTENT <docname> <length> <contents>. Then the client triggers DOC_AUTOSAVE <docname> <length> <content>. Given the number of potential simultaneous requests, I would bet they keep the "context" in the requests and responses, so that once something is sent, its reply can be matched up with it. In this example, the client knows that IM_OK matches the second request (IM) and AUTOSAVE_OK matches the first request (DOC_AUTOSAVE), something like how AOL's IM protocol works.
Again, this is only a guess.
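The "context" idea can be sketched as a request id that the client prefixes to each command and the server echoes in its reply. This is a hypothetical wire format in Python (the Correlator class and the numeric-id prefix are assumptions, not Google's actual protocol), showing how out-of-order replies such as IM_OK arriving before AUTOSAVE_OK are still matched to the right request.

```python
import itertools


class Correlator:
    """Tag outgoing commands with an id and match replies back to them."""

    def __init__(self) -> None:
        self._next_id = itertools.count(1)
        self.pending: dict[int, str] = {}  # request id -> command awaiting a reply

    def send(self, command: str, *args: str) -> str:
        """Build the line to put on the wire, e.g. '1 DOC_AUTOSAVE mydocument.doc 24335'."""
        request_id = next(self._next_id)
        self.pending[request_id] = command
        return " ".join([str(request_id), command, *args])

    def receive(self, line: str) -> tuple[str, str]:
        """Return (original command, reply body) for a reply line."""
        id_text, reply = line.split(" ", 1)
        command = self.pending.pop(int(id_text))  # KeyError if the id is unknown
        return command, reply
```

Because replies carry the id rather than relying on arrival order, many requests can share one long-lived connection without waiting on each other.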
--
To prove this, use something like Ethereal (now Wireshark) and check whether the information is being transferred in the background.

IE save-on-unload bug

I have a dynamic, AJAX-heavy app, and I save the state when the user closes the browser window.
It works fine in all browsers except IE: after I close the application tab twice, I can no longer connect to the server.
My theory is that the connection to the server fails to complete while the tab is being closed, so IE7 thinks it has two outstanding connections to the server and therefore queues new connections indefinitely.
Has anyone experienced this? Is there a workaround or solution?
In IE, if you use a long-polling AJAX request, you have to close the XHR connection on 'unload'. Otherwise the browser keeps it alive, even after you navigate away from your site. These kept-alive connections then cause the hang, because the browser hits its maximum open-connection limit (two connections per server in IE7).
This problem does not happen in other browsers.
Well, you can get around the connection limit easily enough: create a wildcard domain and have your app round-robin across subdomains, e.g. a.rsrc.dmvnoc.com, b.rsrc.dmvnoc.com, etc., as I do for my netMail application. Without this trick, preloading all the images takes almost 30 seconds on a LAN (because of MSIE's low connection limit); with it, the images download in about a second.
If you need to combine scripts with this trick, just set document.domain to the parent domain in the new scripts.
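A minimal Python sketch of the round-robin idea, assuming a wildcard DNS record so that every prefix resolves to the same server (ShardPicker, its prefixes parameter, and the https scheme are hypothetical choices). It remembers which subdomain each path was given, so a resource is always requested from the same host and the browser cache still works.

```python
import itertools


class ShardPicker:
    """Hand out subdomains of a wildcard domain in round-robin order."""

    def __init__(self, parent_domain: str, prefixes: str = "abcd") -> None:
        self._hosts = itertools.cycle(f"{p}.{parent_domain}" for p in prefixes)
        self._assigned: dict[str, str] = {}  # path -> host it was first given

    def url_for(self, path: str) -> str:
        # New paths take the next host in rotation; known paths keep their host
        # so the same image is never downloaded twice under different names.
        if path not in self._assigned:
            self._assigned[path] = next(self._hosts)
        return f"https://{self._assigned[path]}/{path.lstrip('/')}"
```

Each extra hostname gets its own per-server connection allowance in the browser, which is why spreading images across a, b, c, ... lets more of them download in parallel.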
However, you might want to checkpoint the state on every change anyway: the user might lose their network connection, or their computer might crash. If you want to reduce network traffic, have the client simply set a cookie that contains the relevant state. You can fit an awful lot in there (3000 bytes or so), and the server receives it automatically with the next request, where it can save the results (as it does now) and remove the cookie to signal that the state has been saved.
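A sketch of the cookie checkpoint in Python, assuming the client stores percent-encoded JSON under a hypothetical cookie named pending_state (in the browser, by assigning the name=value pair to document.cookie) and the server reads it from the Cookie header on the next request. encode_state mirrors what the client-side script would do; encode_state and save_from_request are illustrative names, and the 3000-byte budget follows the estimate above.

```python
import json
from http.cookies import SimpleCookie
from urllib.parse import quote, unquote

COOKIE_NAME = "pending_state"   # hypothetical cookie name
MAX_COOKIE_BYTES = 3000         # conservative budget, per the estimate above


def encode_state(state: dict) -> str:
    """Build the name=value pair the client would store as a cookie."""
    pair = f"{COOKIE_NAME}={quote(json.dumps(state, separators=(',', ':')))}"
    if len(pair) > MAX_COOKIE_BYTES:  # pair is pure ASCII after quoting
        raise ValueError("state too large to checkpoint in a cookie")
    return pair


def save_from_request(cookie_header: str):
    """Server side: return (state, Set-Cookie value that deletes it), or (None, None)."""
    cookies = SimpleCookie()
    cookies.load(cookie_header)
    if COOKIE_NAME not in cookies:
        return None, None
    state = json.loads(unquote(cookies[COOKIE_NAME].value))
    # Max-Age=0 tells the browser to drop the cookie, i.e. "state has been saved".
    return state, f"{COOKIE_NAME}=; Max-Age=0; Path=/"
```

The server would persist the returned state and send the second value back in a Set-Cookie header; until it does, the checkpoint simply rides along on every request.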

Meaning/cause of RPC Exception 'No interfaces have been exported.'

We have a fairly standard client/server application built using MS RPC. Both client and server are implemented in C++. The client establishes a session to the server, then makes repeated calls to it over a period of time before finally closing the session.
Periodically, however, especially under heavy load conditions, we are seeing an RPC exception show up with code 1754: RPC_S_NOTHING_TO_EXPORT.
It appears that this happens in the middle of a session. The user is logged on for a while, making successful calls, then one of the calls inexplicably returns this error. As far as we can tell, the server receives no indication that anything went wrong - and it definitely doesn't see the call the client made.
The error also appears to be permanent: having the client retry the connection doesn't help. However, if the user has multiple sessions active simultaneously between the same client and server, the other connections are unaffected.
In essence, I have two questions:
Does anyone know what RPC_S_NOTHING_TO_EXPORT means? The MSDN documentation simply says: "No interfaces have been exported." ... Huh? The session was working fine for numerous instances of the same call up until this point...
Does anyone have any ideas as to how to identify the real problem? Note: Capturing network traffic is something we would rather avoid, if possible, as the problem is sporadic enough that we would likely go through multiple gigabytes of traffic before running into an occurrence.
Capturing network traffic would be one of the best ways to tackle this issue. If you can't do that, could you dump the client process and debug it with WinDbg or Visual Studio? Perhaps compare a dump taken during normal operation with one taken in the error state?
