Stream WriteHeaders takes too much time in WCF - performance

A similar question was asked at NewRelic stream & writeHeaders.
I am profiling my WCF services with New Relic. One WCF service calls another WCF service.
My assumption is that while calling the other WCF service, when the request is created, some internal step writes headers to the request stream, and this is sometimes slow.
The traces I found in New Relic tell me that a particular method of one of my WCF services, which calls a method of my other WCF service, takes around 50-60 seconds, of which 95-100% of the time is consumed by System.Net.ConnectStream.WriteHeaders.
Stream[url of WCF service/soap]: WriteHeaders -> 99.78% of time (approx. 49 seconds).
I don't understand what this is or how to reduce this time.
I have searched, but I couldn't find what ConnectStream actually does, or any details about it that would help me lessen the amount of time it's taking.
Please let me know your suggestions.

It sounds like you're streaming a large file up from a client, catching it in one WCF web service, then re-writing the data into a new HttpWebRequest, then sending it to another host. I think I'd be tempted to try buffering the data from the client to your web service rather than streaming.
I've spent the last year working on a project that sounds similar to what you're doing. The difference between streaming and buffering is this:
Streaming reads (from the source) and then writes (to the target) the data in an iterative process you don't have much control over. If the source file is large (a gigabyte or more), the WCF request/response will iterate a dozen or more times back and forth between the client and host before the request is complete.
Buffering, on the other hand, accumulates the entire content of the target file BEFORE filling the request and sending it to the host, thus speeding up the process. And since the performance penalty incurred by buffering (the time required to accumulate the bytes in memory) is placed on the client, it's generally not a problem.
So when buffering data from the client, your host will receive one HTTP request with a complete byte array (let's say) that's ready to be repackaged into the request you're passing on to the second, target WCF host. At that point, again, you have the choice between buffering and streaming. On the host, where performance matters, streaming the request to the second host will improve your scalability but (again) potentially hurt your raw speed.
On the client side:
With binding
.TransferMode = TransferMode.Buffered ' instead of TransferMode.Streamed
.MessageEncoding = WSMessageEncoding.Text
.TextEncoding = System.Text.Encoding.UTF8
.MaxReceivedMessageSize = Integer.MaxValue
.ReaderQuotas.MaxArrayLength = Integer.MaxValue
.ReaderQuotas.MaxBytesPerRead = Integer.MaxValue
.ReaderQuotas.MaxDepth = Integer.MaxValue
.ReaderQuotas.MaxNameTableCharCount = Integer.MaxValue
.ReaderQuotas.MaxStringContentLength = Integer.MaxValue
.MaxBufferSize = Integer.MaxValue
.MaxBufferPoolSize = Integer.MaxValue
On the host side:
With binding
.TransferMode = TransferMode.Buffered
.MaxReceivedMessageSize = Integer.MaxValue

I've seen the same thing before when the service you're calling is stalling, or is flooded with too many concurrent connections. If the issue is the former, profiling your WCF service may help identify the root cause; maybe it's slow to respond due to database access or some other I/O-bound process. If the issue is the latter, it may be something you can resolve by tuning the performance of the service (http://msdn.microsoft.com/en-us/library/ee377061(v=bts.10).aspx).
This can also manifest itself as "BeginRequest" for an ASP.NET application in New Relic. BeginRequest or WriteHeaders rarely means the problem is really with sending the data itself (though it could be, if you have large payloads). In regular calls where the transmitted data is small, a slow time to connect or a slow response will show up in these two areas.

Related

How to implement a cache in a Vertx application?

I have an application that at some point has to perform REST requests towards another (non-reactive) system. It happens that a high number of requests are performed towards exactly the same remote resource (the resulting HTTP request is the same).
I was thinking of avoiding flooding the other system by using a simple cache in my app.
I am in full control of the cache and I know the proper moments at which to invalidate it, so this is not an issue. Without this cache, I'm running into other issues, like connection timeouts or read timeouts, with the other system having trouble under the high load.
// cache the in-flight Future so concurrent callers share one REST request
Map<String, Future<Element>> cache = new ConcurrentHashMap<>();

Future<Element> lookupElement(String id) {
    String key = createKey(id);
    return cache.computeIfAbsent(key, k -> performRESTRequest(id))
        .onSuccess(element -> {
            // some further processing
        });
}
As I mentioned, lookupElement() is invoked from different worker threads with the same id.
The first thread will enter computeIfAbsent and perform the remote request, while the other threads will be blocked by the ConcurrentHashMap.
However, when the first thread finishes, the waiting threads will receive the same Future object. Imagine 30 "clients" reacting to the same Future instance.
In my case this works quite fine and fast up to a particular load, but when the processing input of the app increases, resulting in even more invocations of lookupElement(), my app becomes slower and slower (although it reports 300% CPU usage, it logs slowly) until it starts to report OutOfMemoryError.
My questions are:
Do you see any Vertx specific issue with this approach?
Is there a more Vertx friendly caching approach I could use when there is a high concurrency on the same cache key?
Is it a good practice to cache the Future?
So, it's a bit unusual to respond to my own question, but I managed to solve the problem.
I was having two dilemmas:
Are ConcurrentHashMap and computeIfAbsent() appropriate for Vert.x?
Is it safe to cache a Future object?
I am using this caching approach in two places in my app: one for protecting the REST endpoint, and one for a more complex database query.
What was happening is that for the database query there were up to 1300 "clients" waiting for a response, or 1300 listeners waiting for an onSuccess() of the same Future. When the Future completed, strange things were happening: some kind of thread strangulation.
I did a bit of refactoring to eliminate this concurrency on the same resource/key, but I kept both caches, and things went back to normal.
In conclusion, I think my caching approach is safe as long as we have enough spreading, in other words, as long as we don't have such high concurrency on the same resource. Having 20-30 listeners on the same Future works just fine.

Batching generation of http responses

I'm trying to find an architecture for the following scenario. I'm building a REST service that performs some computation that can be quickly batch computed. Let's say that computing 1 "item" takes 50ms, and computing 100 "items" takes 60ms.
However, the nature of the client is that only 1 item needs to be processed at a time. So if I have 100 simultaneous clients, and I write the typical request handler that sends one item and generates a response, I'll end up using 5000ms, but I know I could compute the same in 60ms.
I'm trying to find an architecture that works well in this scenario. I.e., I would like to have something that merges data from many independent requests, processes that batch, and generates the equivalent responses for each individual client.
For context, the service in question is Python+Django+DRF based, but I'm curious about what kind of architectural solutions/patterns apply here and whether anything that solves this already exists.
At first you might think of a reverse proxy that detects all pattern-specific queries, collects these queries, and sends them to your application in an HTTP/1.1 pipeline (pipelining is a way to send a large number of queries one after another and receive all the HTTP responses, in the same order, at the end, without waiting for a response after each query).
But:
Pipelining is very hard to do well.
You would have to code the reverse proxy yourself, as I do not know of an existing way to do it.
One slow response in the pipeline blocks all the other responses.
You need an HTTP server able to hand several queries at once to your application layer, which essentially never happens unless the HTTP server is coded directly into your application, because HTTP is usually made to work on only one query at a time (for example, you never receive 2 queries at once in a PHP environment: you receive the first one, send the response, and then receive the next one, even if the connection contains 2 queries).
So the better idea would be to do this on the application side. You could identify matching queries and wait a small amount of time (10 ms?) to see whether other matching queries are also incoming. You will need a way to communicate between several parallel workers here (say you have 50 application workers and 10 of them have received queries that could be treated in the same batch). This means of communication could be a database (a very fast one) or some shared memory, depending on the technology used.
Then, when enough time has been spent waiting (10 ms?), or when a large number of queries has been received, one of the workers collects all the queries, runs the batch, and tells every other worker that a result is available (here again you need a central point of communication, like LISTEN/NOTIFY in PostgreSQL, shared memory, a message queue service, etc.).
Finally, every worker is responsible for sending the right HTTP response.
The key here is having a system where the time you lose trying to share request processing is smaller than the time saved by batching several queries together, and under low traffic this overhead should stay reasonable (since you will always lose some time waiting for nothing). And of course you are also adding complexity to the system, making it harder to maintain, etc.
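To make the application-side idea above concrete, here is a minimal single-process sketch in Python. It is illustrative only: compute_batch(), handle_request() and the other names are made up for this example, and a multi-worker Django/DRF deployment would still need the cross-worker coordination described above (shared memory, a queue, LISTEN/NOTIFY, ...).
import queue
import threading
import time
from concurrent.futures import Future

WINDOW_SECONDS = 0.010    # how long to wait for more requests to join a batch
MAX_BATCH_SIZE = 100

_pending = queue.Queue()  # (item, Future) pairs waiting to be batched

def compute_batch(items):
    # stands in for the real batch computation (e.g. 60 ms for 100 items)
    return [item * 2 for item in items]

def _batcher_loop():
    while True:
        first = _pending.get()                     # block until at least one request arrives
        batch = [first]
        deadline = time.monotonic() + WINDOW_SECONDS
        while len(batch) < MAX_BATCH_SIZE:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break
            try:
                batch.append(_pending.get(timeout=remaining))
            except queue.Empty:
                break
        results = compute_batch([item for item, _ in batch])
        for (_, fut), result in zip(batch, results):
            fut.set_result(result)                 # wakes up the waiting request handler

threading.Thread(target=_batcher_loop, daemon=True).start()

def handle_request(item):
    # called from each per-request handler; blocks until the batched result is ready
    fut = Future()
    _pending.put((item, fut))
    return fut.result()
Each request handler just calls handle_request(); the background thread absorbs the 10 ms waiting cost and pays the batch computation once per window instead of once per request.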

How does a non-forking web server work?

Non-forking (aka single-threaded or select()-based) web servers like lighttpd or nginx are gaining more and more in popularity.
While there is a multitude of documents explaining forking servers (at various levels of detail), documentation for non-forking servers is sparse.
I am looking for a bird's-eye view of how a non-forking web server works. (Pseudo-)code or a state machine diagram, stripped down to the bare minimum, would be great.
I am aware of the following resources and found them helpful:
The World of SELECT()
thttpd source code
Lighttpd internal states
However, I am interested in the principles, not implementation details.
Specifically:
Why is this type of server sometimes called non-blocking, when select() essentially blocks?
Processing of a request can take some time. What happens with new requests during this time when there is no specific listener thread or process? Is the request processing somehow interrupted or time sliced?
Edit:
As I understand it, while a request is processed (e.g. a file read or a CGI script run), the server cannot accept new connections. Wouldn't this mean that such a server could miss a lot of new connections if a CGI script runs for, let's say, 2 seconds or so?
Basic pseudocode:
setup
while true
    select/poll/kqueue
    with fd needing action do
        read/write fd
        if fd was read and well-formed request in buffer
            service request
    other stuff
Though select() & friends block, socket I/O is not blocking. You're only blocked until you have something fun to do.
Processing an individual request normally involves reading from a file descriptor backed by a file (static resource) or a process (dynamic resource) and then writing to the socket. This can be done handily without keeping much state.
So "service request" above typically means opening a file, adding it to the list for select, and noting that stuff read from there goes out to a certain socket. Substitute FastCGI for the file when appropriate.
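To make the loop concrete, here is a heavily simplified sketch of the same idea using Python's selectors module. It is illustrative only: it treats one read as one complete request and ignores partial reads/writes, which a real server like lighttpd or nginx must handle.
# Minimal single-threaded, non-blocking server loop (illustrative sketch only).
import selectors
import socket

sel = selectors.DefaultSelector()

def accept(listener):
    conn, _addr = listener.accept()
    conn.setblocking(False)                       # never block on this socket
    sel.register(conn, selectors.EVENT_READ, handle)

def handle(conn):
    data = conn.recv(4096)                        # the fd is ready, so this won't block
    if data:
        # "service request": here we just answer; a real server would parse the
        # request and start non-blocking reads from a file or FastCGI backend
        conn.send(b"HTTP/1.0 200 OK\r\nContent-Length: 2\r\n\r\nok")
    sel.unregister(conn)
    conn.close()

listener = socket.socket()
listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
listener.bind(("", 8080))
listener.listen()
listener.setblocking(False)
sel.register(listener, selectors.EVENT_READ, accept)

while True:                                       # the select loop
    for key, _events in sel.select():             # blocks until some fd is ready
        key.data(key.fileobj)                     # call accept() or handle()
Note that the only place this thread blocks is sel.select(); every registered descriptor is serviced from the same loop.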
EDIT:
Not sure about the others, but nginx has 2 processes: a master and a worker. The master does the listening and then feeds the accepted connection to the worker for processing.
select() PLUS non-blocking I/O essentially allows you to manage/respond to multiple connections as they come in, in a single thread (multiplexing), versus having multiple threads/processes handle one socket each. The goal is to minimize the ratio of server footprint to number of connections.
It is efficient because it takes a very high number of active socket connections to saturate this single thread (since we can do non-blocking I/O on multiple file descriptors).
The rationale is that it takes very little time to acknowledge that bytes are available, interpret them, and then decide on the appropriate bytes to put on the output stream. The actual I/O work is handled without blocking this server thread.
This type of server is always waiting for a connection, by blocking on select(). Once it gets one, it handles the connection, then revisits select() in an infinite loop. In the simplest case, this server thread does NOT block at any other time besides when it is setting up the I/O.
If a second connection comes in, it will be handled the next time the server gets to select(). At this point, the first connection could still be receiving, and we can start sending to the second connection, from the very same server thread. This is the goal.
Search for "multiplexing network sockets" for additional resources.
Or try Unix Network Programming by Stevens, Fenner, Rudoff

What is a good pattern for using AsyncSockets in .NET 3.5 when initiating several client connections?

I'm re-building an IM gateway and hope to take advantage of the new performance features in AsyncSockets for .NET 3.5.
My existing implementation simply creates packets and forwards IM requests from users to the various IM networks as required, handling request/response streams for each connected user's session (socket).
I presently have to cope with IAsyncResult, and as you know it's not very pretty or scalable.
My confusion is basically this:
1) In using the new Begin/End and SocketAsyncEventArgs in 3.5, do we still need to create one SocketAsyncEventArgs per socket?
2) Do we gain anything by pre-initializing, say, 20,000 client connections, since we know the expected max connections per server is 20,000?
3) Do we still need to use an LOH (large object heap) allocated byte[] to handle receive data, as shown in the SocketServers example on MSDN? We are not building a server per se, but are still handling a lot of independent receives for each connected socket.
4) Maybe there is a better pattern altogether for what I'm trying to achieve?
Thanks in advance.
Charles.
1) IAsyncResult/Begin/End is a completely different system from the "xAsync" methods that use SocketAsyncEventArgs. You're better off using SocketAsyncEventArgs and dropping Begin/End entirely.
2) Not really. Initialize a smaller number (50? 100?) and use an intermediate class (i.e. a "resource pool") to manage them. As more requests come in, grow the pool by another 50 or 100, for example (see the sketch after this list). The tough part is efficiently "scaling down" the number of pooled items as resource requirements drop. A large number of sockets/buffers/etc. will consume a large amount of memory, so it's better to only allocate them in batches as the server requires.
3) You don't need to use it, but it's still a good idea. The buffer will still be "pinned" during each call.
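The grow-in-batches pool idea from point 2 is language-agnostic; a rough sketch of just that idea (shown here in Python, not the .NET SocketAsyncEventArgs API, and with made-up names) might look like this:
# Illustrative grow-in-batches resource pool (not tied to any real socket API).
import threading

class BatchPool:
    def __init__(self, create_resource, batch_size=50):
        self._create = create_resource   # factory for whatever is being pooled
        self._batch_size = batch_size
        self._free = []
        self._lock = threading.Lock()

    def acquire(self):
        with self._lock:
            if not self._free:
                # pool exhausted: allocate another batch instead of one item at a time
                self._free.extend(self._create() for _ in range(self._batch_size))
            return self._free.pop()

    def release(self, resource):
        with self._lock:
            self._free.append(resource)

# usage: pool = BatchPool(lambda: bytearray(4096)); buf = pool.acquire(); ...; pool.release(buf)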

Distributed time synchronization and web applications

I'm currently trying to build an application that inherently needs good time synchronization across the server and every client. There are alternative designs for my application that can do away with this need for synchronization, but my application quickly begins to suck when it's not present.
In case I am missing something, my basic problem is this: firing an event in multiple locations at exactly the same moment. As best I can tell, the only way of doing this requires some kind of time synchronization, but I may be wrong. I've tried modeling the problem differently, but it all comes back to either a) a sucky app, or b) requiring time synchronization.
Let's assume I Really Really Do Need synchronized time.
My application is built on Google AppEngine. While AppEngine makes no guarantees about the state of time synchronization across its servers, usually it is quite good, on the order of a few seconds (i.e. better than NTP), however sometimes it sucks badly, say, on the order of 10 seconds out of sync. My application can handle 2-3 seconds out of sync, but 10 seconds is out of the question with regards to user experience. So basically, my chosen server platform does not provide a very reliable concept of time.
The client part of my application is written in JavaScript. Again we have a situation where the client has no reliable concept of time either. I have done no measurements, but I fully expect some of my eventual users to have computer clocks that are set to 1901, 1970, 2024, and so on. So basically, my client platform does not provide a reliable concept of time.
This issue is starting to drive me a little mad. So far the best thing I can think to do is implement something like NTP on top of HTTP (this is not as crazy as it may sound). This would work by commissioning 2 or 3 servers in different parts of the Internet, and using traditional means (PTP, NTP) to try to ensure their sync is at least on the order of hundreds of milliseconds.
I'd then create a JavaScript class that implemented the NTP intersection algorithm using these HTTP time sources (and the associated roundtrip information that is available from XMLHTTPRequest).
As you can tell, this solution also sucks big time. Not only is it horribly complex, it only solves one half of the problem, namely giving the clients a good notion of the current time. I then have to compromise on the server side, either by allowing the clients to tell the server the current time according to them when they make a request (a big security no-no, but I can mitigate some of the more obvious abuses), or by having the server make a single request to one of my magic HTTP-over-NTP servers and hoping that request completes speedily enough.
These solutions all suck, and I'm lost.
Reminder: I want a bunch of web browsers, hopefully as many as 100 or more, to be able to fire an event at exactly the same time.
Let me summarize, to make sure I understand the question.
You have an app that has a client and server component. There are multiple servers that can each be servicing many (hundreds) of clients. The servers are more or less synced with each other; the clients are not. You want a large number of clients to execute the same event at approximately the same time, regardless of which server happens to be the one they connected to initially.
Assuming that I described the situation more or less accurately:
Could you have the servers keep certain state for each client (such as the initial time of connection, in server time), and when the time of the event that needs to happen is known, notify each client with a message containing the number of milliseconds after that initial value that need to elapse before firing the event?
To illustrate:
client A connects to server S at time t0 = 0
client B connects to server S at time t1 = 120
server S decides an event needs to happen at time t3 = 500
server S sends a message to A:
S->A : {eventName, 500}
server S sends a message to B:
S->B : {eventName, 380}
This does not rely on the client time at all; just on the client's ability to keep track of time for some reasonably short period (a single session).
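A rough sketch of the server-side bookkeeping follows (Python, purely illustrative; on_client_connect(), schedule_event() and send() are hypothetical names, and the real client here would be JavaScript that notes its own connection moment and uses setTimeout with the received delay):
# Illustrative server-side bookkeeping for the relative-delay scheme above.
import time

connect_time = {}   # client_id -> server time (monotonic) at the moment of connection

def on_client_connect(client_id):
    connect_time[client_id] = time.monotonic()

def schedule_event(event_name, event_time, send):
    # event_time is on the same server monotonic timebase as connect_time;
    # send(client_id, message) is a hypothetical transport function
    for client_id, connected_at in connect_time.items():
        delay_ms = int((event_time - connected_at) * 1000)
        # each client fires the event delay_ms after its own connection moment
        send(client_id, {"event": event_name, "fireAfterMs": delay_ms})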
It seems to me like you need to listen for a broadcast event from a server in many different places. Since you can accept 2-3 seconds of variation, you could just put all your clients into long-lived comet-style requests and simply act on the response from the server when it arrives. Sounds to me like the clients wouldn't need to deal with time at all this way?
You could use Ajax to do this, so you'd avoid any client-side lockups while waiting for new data.
I may be missing something totally here.
If you can assume that the clocks are reasonably stable, that is, they may be set wrong but are ticking at more or less the right rate:
Have the servers get their offset from a single defined source (e.g. one of your servers, or a database server or something).
Then have each client calculate its offset from its server (with possible round-trip complications if you want a lot of accuracy).
Store that, then use the combined offset on each client to trigger the event at the right time.
(client-time-to-trigger-event) = (scheduled-time) + (client-to-server-difference) + (server-to-reference-difference)
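In code, the combination above is just the sum of two measured clock differences (illustrative sketch; get_peer_time() is a hypothetical call, e.g. a small HTTP endpoint returning the peer's current clock, and the round-trip correction is the usual halving assumption):
# Illustrative: estimate "my clock minus the peer's clock" with one round trip,
# then combine the differences as in the formula above.
import time

def measure_difference(get_peer_time):
    t0 = time.time()
    peer_now = get_peer_time()              # hypothetical call to read the peer's clock
    t1 = time.time()
    midpoint = t0 + (t1 - t0) / 2           # assume the peer read its clock mid round-trip
    return midpoint - peer_now              # my-clock minus peer-clock

def client_time_to_trigger_event(scheduled_time, client_to_server_diff, server_to_reference_diff):
    # scheduled_time is expressed on the reference clock; the two differences
    # translate it into this client's local clock
    return scheduled_time + client_to_server_diff + server_to_reference_diff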
Time synchronization is very hard to get right and in my opinion the wrong way to go about it. You need an event system which can notify registered observers every time an event is dispatched (observer pattern). All observers will be notified simultaneously (or as close as possible to that), removing the need for time synchronization.
To accommodate latency, the browser should be sent the timestamp of the event dispatch, and it should wait a little longer than what you expect the maximum latency to be. This way all events will be fired at the same time on all browsers.
Google found a way to define time as being absolute. That sounds heretical to a physicist and with respect to General Relativity: time flows at a different pace depending on your position in space and time, on Earth, in the Universe...
You may want to have a look at Google Spanner database: http://en.wikipedia.org/wiki/Spanner_(database)
I guess it is used now by Google and will be available through Google Cloud Platform.
