I'm wondering about the trade-offs between two approaches to handling HTTP timeouts between two services. Service A is trying to implement retry functionality when calling service B.
Approach 1: This is the typical approach (e.g. Ethernet proto). Perform a request with fixed timeout T. If timeout occurs, sleep for X and retry the request. Increase X exponentially.
Approach 2: Instead of sleeping between retries, increase the actual HTTP timeout value (say, exponentially). In both cases, consider a max-bound.
For Ethernet, this makes sense because of its low-level location in the network stack. However, for an application-level retry mechanism, would approach 2 be more appropriate? In a situation where there are high levels of network congestion, I would think #2 is better for a couple of reasons:
Sending additional TCP connection requests will only flood the network more
You're basically guaranteed to not receive a response when you're sleeping (because you already timed out and/or tore down the socket), whereas if you instead just allowed the TCP request to remain outstanding (or kept the socket open if the connection has at least been established), you at least have the possibility of success occurring.
Any thoughts on this?
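For concreteness, here is a rough sketch of both approaches in Java (using java.net.http); the endpoint, timeout values, caps, and retry counts are illustrative assumptions, not anything from the question:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.net.http.HttpTimeoutException;
import java.time.Duration;

public class RetryApproaches {
    private static final HttpClient CLIENT = HttpClient.newHttpClient();
    private static final URI SERVICE_B = URI.create("http://service-b.internal/api"); // hypothetical endpoint

    // Approach 1: fixed request timeout T, exponentially growing sleep X between retries.
    static HttpResponse<String> fixedTimeoutWithBackoff(int maxRetries) throws Exception {
        Duration timeout = Duration.ofSeconds(2);   // fixed T
        long sleepMillis = 500;                     // initial X
        for (int attempt = 0; ; attempt++) {
            try {
                HttpRequest req = HttpRequest.newBuilder(SERVICE_B).timeout(timeout).build();
                return CLIENT.send(req, HttpResponse.BodyHandlers.ofString());
            } catch (HttpTimeoutException e) {
                if (attempt >= maxRetries) throw e;
                Thread.sleep(sleepMillis);
                sleepMillis = Math.min(sleepMillis * 2, 30_000); // exponential X, capped
            }
        }
    }

    // Approach 2: no sleeping; grow the request timeout itself on each retry, up to a max bound.
    static HttpResponse<String> growingTimeout(int maxRetries) throws Exception {
        Duration timeout = Duration.ofSeconds(2);   // initial T
        for (int attempt = 0; ; attempt++) {
            try {
                HttpRequest req = HttpRequest.newBuilder(SERVICE_B).timeout(timeout).build();
                return CLIENT.send(req, HttpResponse.BodyHandlers.ofString());
            } catch (HttpTimeoutException e) {
                if (attempt >= maxRetries) throw e;
                timeout = timeout.multipliedBy(2);                 // exponential T
                if (timeout.compareTo(Duration.ofSeconds(60)) > 0) // max bound
                    timeout = Duration.ofSeconds(60);
            }
        }
    }
}
```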
On a high-packet-loss network (e.g. cellular, or wi-fi near the limits of its range), there's a distinct possibility that your requests will continue to time out forever if the timeout is too short. So increasing the timeout is often a good idea.
And retrying the request immediately often works, and if it doesn't, waiting a while might make no difference (e.g. if you no longer have a network connection). For example, on iOS, your best bet is to use reachability, and if reachability determines that the network is down, there's no reason to retry until it isn't.
My general thoughts are that for short requests (i.e. not uploading/downloading large files) if you haven't received any response from the server at all after 3-5 seconds, start a second request in parallel. Whichever request returns a header first wins. Cancel the other one. Keep the timeout at 90 seconds. If that fails, see if you can reach generate_204.
If generate_204 works, the problem could be a server issue. Retry immediately, but flag the server as suspect. If that retry fails a second time (after a successful generate_204 response), start your exponential backoff waiting for the server (with a cap on the maximum interval).
If the generate_204 request doesn't respond, your network is dead. Wait for a network change, trying only very occasionally (e.g. every couple of minutes minimum).
If the network connectivity changes (i.e. if you suddenly have Wi-Fi), restart any waiting connections after a few seconds. There's no reason to wait the full time at that point, because everything has changed.
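As a rough illustration of the generate_204 check described above, here is a sketch that classifies a failure as either a suspect server or a dead network before choosing how to back off; the probe URL and the 5-second timeouts are assumptions on my part:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

enum FailureKind { SERVER_SUSPECT, NETWORK_DOWN }

public class ConnectivityProbe {
    private static final HttpClient CLIENT = HttpClient.newBuilder()
            .connectTimeout(Duration.ofSeconds(5))
            .build();
    // Any always-up endpoint that returns 204 works; this is just one common choice.
    private static final URI PROBE = URI.create("http://connectivitycheck.gstatic.com/generate_204");

    /** Called after a request to our own server has failed or timed out. */
    static FailureKind classifyFailure() {
        try {
            HttpRequest req = HttpRequest.newBuilder(PROBE)
                    .timeout(Duration.ofSeconds(5))
                    .build();
            HttpResponse<Void> resp = CLIENT.send(req, HttpResponse.BodyHandlers.discarding());
            if (resp.statusCode() == 204) {
                // The network is fine, so the problem is probably our server:
                // retry it once immediately, then fall back to exponential backoff.
                return FailureKind.SERVER_SUSPECT;
            }
        } catch (Exception ignored) {
            // fall through: the probe itself failed
        }
        // No usable network: wait for a connectivity change, probing only occasionally.
        return FailureKind.NETWORK_DOWN;
    }
}
```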
But obviously there's no correct answer. This approach is fairly aggressive. Others might take the opposite approach. It all depends on what your goals are.
There's not much point in sleeping when you could be doing useful work, or in using a shorter timeout than you can really tolerate. I would use (2).
The idea that Ethernet or indeed anything uses (1) seems fanciful. Do you have a citation?
Related
One of my colleagues asked me what the difference between Circuit Breaker and Retry is, but I was not able to answer him correctly. All I know is that a circuit breaker is useful when there is a heavy request load, but that can also be achieved using retries. So when should I use Circuit Breaker and when should I Retry?
Also, is it possible to use both on the same API?
The Retry pattern enables an application to retry an operation in hopes of success.
The Circuit Breaker pattern prevents an application from performing an operation that is likely to fail.
Retry - The Retry pattern is useful in scenarios of transient failures. What does this mean? Failures that are "temporary," lasting only for a short amount of time, are transient. A momentary loss of network connectivity, a brief moment when the service goes down or is unresponsive, and related timeouts are examples of transient failures.
As the failure is transient, retrying after some time could possibly give us the result needed.
Circuit Breaker - The Circuit Breaker pattern is useful in scenarios of long-lasting faults. Consider a loss of connectivity or the failure of a service that takes some time to repair itself. In such cases, it may not be of much use to keep retrying often if it is indeed going to take a while to hear back from the server. The Circuit Breaker pattern is meant to prevent an application from performing an operation that is likely to fail.
The Circuit Breaker keeps a tally of the number of recent failures and, based on a pre-determined threshold, determines whether the request should be sent to the server under stress or not.
Several years ago I wrote a resilience catalog to describe different mechanisms. I originally created this document for co-workers and then shared it publicly. Please allow me to quote the relevant parts here.
Retry
Categories: reactive, after the fact
The relation between retries and attempts: n retries means at most n+1 attempts. The +1 is the initial request, if it fails (for whatever reason) then retry logic kicks in. In other words, the 0th step is executed with 0 delay penalty.
There are situations where your requested operation relies on a resource that might not be reachable at a certain point in time. In other words, there can be a temporal issue which will be gone sooner or later. This sort of issue can cause transient failures. With retries you can overcome these problems by attempting to redo the same operation at a specific moment in the future. To be able to use this mechanism, the following criteria should be met:
The potentially introduced observable impact is acceptable
The operation can be redone without any irreversible side effect
The introduced complexity is negligible compared to the promised reliability
Let’s review them one by one:
The word failure indicates that the effect is observable by the requester as well, for example via higher latency, reduced throughput, etc. If the “penalty” (delay or reduced performance) is unacceptable then retry is not an option for you.
This requirement is also known as idempotent operation. If I call the action with the same input several times then it will produce the exact same result. In other words, the operation acts like it only depends on its parameter and nothing else influences the result (like other objects' state).
Even though this condition is one of the most crucial, it is the one that is almost always forgotten. As always, there are trade-offs (if I introduce Z then it will increase X but it might decrease Y).
We should be fully aware of them, otherwise they will give us unwanted surprises at the least expected time.
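To make the mechanism concrete, here is a minimal retry sketch, assuming the wrapped operation is idempotent; the delay values and the blanket Exception catch are illustrative only:

```java
import java.util.concurrent.Callable;

public final class Retry {
    /**
     * Executes an idempotent operation with at most {@code retries} retries,
     * i.e. retries + 1 attempts in total (the 0th attempt runs with no delay).
     */
    static <T> T withRetries(Callable<T> operation, int retries, long initialDelayMillis)
            throws Exception {
        long delay = initialDelayMillis;
        for (int attempt = 0; ; attempt++) {
            try {
                return operation.call();
            } catch (Exception transientFailure) {
                if (attempt >= retries) throw transientFailure; // retry budget exhausted
                Thread.sleep(delay);                            // the delay penalty must be acceptable
                delay = Math.min(delay * 2, 30_000);            // capped exponential backoff
            }
        }
    }
}
```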
Circuit Breaker
Categories: proactive, before the fact
It is hard to categorize the circuit breaker because it is pro- and reactive at the same time. It detects that a given downstream system is malfunctioning (reactive) and it protects the downstream systems from being flooded with new requests (proactive).
This is one of the most complex patterns, mainly because it uses different states to define different behaviours. Before we jump into the details, let's see why this tool exists at all:
Circuit breaker detects failures and prevents the application from trying to perform the action that is doomed to fail (until it is safe to retry) - Wikipedia
So, this tool works as a mini data and control plane. The requests go through this proxy, which examines the responses (if any) and it counts subsequent failures. If a predefined threshold is reached then the transfer is suspended temporarily and it fails immediately.
Why is it useful?
It prevents cascading failures. In other words the transient failure of a downstream system should not be propagated to the upstream systems. By concealing the failure we are actually preventing a chain reaction (domino effect) as well.
How does it know when a transient failure is gone?
It must somehow determine when it would be safe to operate again as a proxy. For example, it can use the same detection mechanism that was used during the original failure detection. So it works like this: after a given period of time it allows a single request to go through and examines the response. If it succeeds then the downstream is treated as healthy. Otherwise nothing changes (no request is transferred through this proxy), only the timer is reset.
What states does it use?
The circuit breaker can be in any of the following states: Closed, Open, HalfOpen (a minimal sketch of this state machine follows the list).
Closed: It allows any request. It counts successive failed requests.
If the successive failed count is below the threshold and the next request succeeds then the counter is set back to 0.
If the predefined threshold is reached then it transitions into Open
Open: It rejects any request immediately. It waits a predefined amount of time.
If that time is elapsed then it transitions into HalfOpen
HalfOpen: It allows only one request. It examines the response of that request:
If the response indicates success then it transitions into Closed
If the response indicates failure then it transitions back to Open
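Here is a minimal, single-threaded sketch of that state machine; the threshold, open duration, and exception type are illustrative assumptions:

```java
import java.time.Duration;
import java.time.Instant;
import java.util.concurrent.Callable;

public final class CircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;
    private final Duration openDuration;
    private State state = State.CLOSED;
    private int successiveFailures = 0;
    private Instant openedAt;

    CircuitBreaker(int failureThreshold, Duration openDuration) {
        this.failureThreshold = failureThreshold;
        this.openDuration = openDuration;
    }

    synchronized <T> T call(Callable<T> request) throws Exception {
        if (state == State.OPEN) {
            if (Instant.now().isAfter(openedAt.plus(openDuration))) {
                state = State.HALF_OPEN;            // allow a single trial request through
            } else {
                throw new IllegalStateException("circuit is open; failing fast");
            }
        }
        try {
            T result = request.call();
            successiveFailures = 0;                 // success resets the counter
            state = State.CLOSED;                   // HALF_OPEN trial succeeded -> CLOSED
            return result;
        } catch (Exception failure) {
            if (state == State.HALF_OPEN) {
                trip();                             // trial failed: back to OPEN, reset timer
            } else if (++successiveFailures >= failureThreshold) {
                trip();                             // threshold reached: CLOSED -> OPEN
            }
            throw failure;
        }
    }

    private void trip() {
        state = State.OPEN;
        openedAt = Instant.now();
        successiveFailures = 0;
    }
}
```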
Resiliency strategy
The above two mechanisms / policies are not mutually exclusive, on the contrary. They can be combined via the escalation mechanism. If the inner policy can't handle the problem it can propagate one level up to an outer policy.
When you try to perform a request while the Circuit Breaker is Open then it will throw an exception. Your retry policy could trigger for that and adjust its sleep duration (to avoid unnecessary attempts).
The downstream system can also inform upstream that it is receiving too many requests by responding with a 429 status code. The Circuit Breaker could also trigger for this and use the Retry-After header's value for its sleep duration.
So, the whole point of this section is that you can define a protocol between client and server for overcoming transient failures together.
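As a rough sketch of that escalation, the following wraps the CircuitBreaker sketch above in an outer retry policy and honours a 429 Retry-After hint; the thresholds and the naive Retry-After parsing (seconds only) are assumptions:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

public class ResilientClient {
    private static final HttpClient CLIENT = HttpClient.newHttpClient();
    // Reuses the CircuitBreaker sketch above: 5 successive failures trip it for 30 seconds.
    private static final CircuitBreaker BREAKER = new CircuitBreaker(5, Duration.ofSeconds(30));

    static HttpResponse<String> get(URI uri, int retries) throws Exception {
        long sleepMillis = 1_000;
        for (int attempt = 0; ; attempt++) {
            try {
                HttpResponse<String> resp = BREAKER.call(() ->
                        CLIENT.send(HttpRequest.newBuilder(uri).build(),
                                    HttpResponse.BodyHandlers.ofString()));
                if (resp.statusCode() == 429) {
                    // The server says it is overloaded; honour Retry-After if present.
                    sleepMillis = resp.headers().firstValueAsLong("Retry-After").orElse(1) * 1_000;
                    throw new IllegalStateException("throttled by server");
                }
                return resp;
            } catch (Exception failure) {           // includes "circuit is open" fast failures
                if (attempt >= retries) throw failure;
                Thread.sleep(sleepMillis);          // the inner problem escalates to the outer retry
                sleepMillis = Math.min(sleepMillis * 2, 60_000);
            }
        }
    }
}
```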
When performing AJAX requests, I have always tried to do as few as possible since there is an overhead to each request having to open the http connection to send the data. Since a websocket connection is constantly open, is there any cost outside of the obvious packet bandwidth to sending a request?
For example. Over the space of 1 minute, a client will send 100kb of data to the server. Assuming the client does not need a response to any of these requests, is there any advantage to queuing packets and sending them in one big burst vs sending them as they are ready?
In other words, is there an overhead to the stopping and starting data transfer for a connection that is constantly open?
I want to make a multiplayer browser game as real time as possible, but I don't want to find that hundreds of tiny requests per minute, compared to a larger consolidated request, are causing the server additional stress. I understand that if the client needs a response it will be slower, as there is a lot of waiting from the back and forth. I will consider this and only consolidate when it is appropriate. The more small requests per minute, the better the user experience, but I don't know what toll that will take on the server.
You are correct that a webSocket message will have lower overhead for a given message transmission than sending the same message via an Ajax call because the webSocket connection is already established and because a webSocket message has lower overhead than an HTTP request.
First off, there's always less overhead in sending one larger transmission vs. sending lots of smaller transmissions. That's just the nature of TCP. Every TCP packet gets separately processed and acknowledged so sending more of them costs a bit more overhead. Whether that difference is relevant or significant and worth writing extra code for or worth sacrificing some element of your user experience (because of the delay for batching) depends entirely upon the specifics of a given situation.
Since you've described a situation where your client gets the best experience if there is no delay and no batching of packets, then it seems that what you should do is not implement the batching and test out how your server handles the load with lots of smaller packets when it gets pretty busy. If that works just fine, then stay with the better user experience. If you have issues keeping up with the load, then seriously profile your server and find out where the main bottleneck to performance is (you will probably be surprised about where the bottleneck actually is as it is often not where you think it will be - that's why you have to profile and measure to know where to concentrate your energy for improving the scalability).
FYI, due to the implementation of Nagle's algorithm in most implementations of TCP, the TCP stack itself does small amounts of batching for you if you are sending multiple requests fairly closely spaced in time or if sending over a slower link.
It's also possible to implement a dynamic system where as long as your server is able to keep up, you keep with the smaller and more responsive packets, but if your server starts to get busy, you start batching in order to reduce the number of separate transmissions.
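If you do end up batching, a minimal sketch might look like the following; the Transport interface stands in for whatever send call your webSocket library exposes, and the flush interval is an assumption:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Buffers small outgoing messages and flushes them as one frame at a fixed interval. */
public class MessageBatcher {
    /** Hypothetical transport: e.g. a webSocket session's binary send method. */
    public interface Transport { void send(byte[] frame) throws IOException; }

    private final Transport transport;
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

    public MessageBatcher(Transport transport, long flushIntervalMillis) {
        this.transport = transport;
        scheduler.scheduleAtFixedRate(this::flush, flushIntervalMillis,
                flushIntervalMillis, TimeUnit.MILLISECONDS);
    }

    /** Queue a small message instead of sending it immediately. */
    public synchronized void enqueue(byte[] message) {
        buffer.writeBytes(message);
    }

    /** Send everything queued since the last flush as a single larger frame. */
    private synchronized void flush() {
        if (buffer.size() == 0) return;
        try {
            transport.send(buffer.toByteArray());
        } catch (IOException e) {
            // a real client would reconnect or surface the error here
        }
        buffer.reset();
    }
}
```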
I developed a server for a custom TCP/IP-based protocol with Netty. Writing it was a pleasure.
Right now I am testing performance. I wrote a test application with Netty that simply connects lots (20,000+) of "clients" to the server (a for-loop with Thread.wait(1) after each bootstrap connect). As soon as a client channel is connected it sends a login request to the server, which checks the account and sends a login response.
The overall performance seems to be quite OK. All clients are logged in within 60 s. But what's not so good is the spread of waiting times per connection. I have extremely fast logins and extremely slow ones, varying from 9 ms to 40,000 ms over the whole test run. Is it somehow possible to share waiting time among the requesting channels (FIFO)?
I measured a lot of significant timestamps and found a strange phenomenon. I have a lot of connections where the server's "channel-connected" timestamp is way after the client's (up to 19 seconds). I also have the "normal" case, where they match and just the time between client send and server reception is several seconds. And there are cases of everything in between. How can it be that the client's and server's "channel-connected" timestamps are so far apart?
What is for sure is that the client immediately receives the server's login response after it has been sent.
Tuning:
I think I read most of the performance articles around here. I am using the OrderedMemoryAwareThreadPoolExecutor with 200 threads on a 4-core Hyper-Threading i7 for the incoming connections, and I also start the server application with the known aggressive options. I also completely tweaked my Win7 TCP stack.
The server runs very smoothly on my machine. CPU usage and memory consumption are at roughly 50% of what could be used.
Too much information:
I also started two of my test apps from two separate machines, "attacking" the server in parallel with 15,000 connections each. There I had about 800 connections that got a timeout from the server. Any comments here?
Best regards and cheers to Netty,
Martin
Netty has a dedicated boss thread that accepts incoming connections. If the boss thread accepts a new connection, it forwards the connection to a worker thread. Because of this, the latency between the acceptance and the actual socket read might be larger than expected under load. Although we are looking into different ways to improve the situation, in the meantime you might want to increase the number of worker threads so that each worker thread handles fewer connections.
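For reference, in Netty 3 the worker pool size can be set when the channel factory is created; a sketch (the thread count is illustrative):

```java
import java.util.concurrent.Executors;

import org.jboss.netty.bootstrap.ServerBootstrap;
import org.jboss.netty.channel.socket.nio.NioServerSocketChannelFactory;

public class ServerSetup {
    public static ServerBootstrap createBootstrap() {
        // The default worker count is 2 * cores; raising it means each worker owns fewer channels.
        int workerThreads = 32;
        return new ServerBootstrap(
                new NioServerSocketChannelFactory(
                        Executors.newCachedThreadPool(),   // boss threads: accept connections
                        Executors.newCachedThreadPool(),   // worker threads: read/write on accepted channels
                        workerThreads));
    }
}
```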
If you think it's performing way worse than non-Netty application, please feel free to file an issue with reproducing test case. We will try to reproduce and fix the problem.
I'm working on a UDP server/client configuration. The client sends the server a single packet, which varies in size but is usually <500 bytes. The server responds essentially instantly with a single outgoing packet, usually smaller than the incoming request packet. Complete transactions always consist of a single packet exchange.
If the client doesn't see the response within T amount of time, it retries R times, increasing T by X before each retry, before finally giving up and returning an error. Currently, R is never changed.
Is there any special logic to choosing optimum initial T (wait time), R (retries), and X (wait increase)? How persistent should retries be (ie, what minimum R to use) to reach some approximation of a "reliable" protocol?
This is similar to question 5227520. Googling "tcp retries" and "tcp retransmission" leads to lots of suggestions that have been tried over the years. Unfortunately, no single solution appears optimum.
I'd choose T to start at 2 or 3 seconds. My increase X would be half of T (doubling T seems popular, but you quickly get long timeouts). I'd adjust R on the fly to be at least 5 and more if necessary so my total timeout is at least a minute or two.
I'd be careful not to leave R and T too high if subsequent transactions are usually quicker; you might want to lower R and T as your stats allow so you can retry and get a quick response instead of leaving R and T at their max (especially if your clients are human and you want to be responsive).
Keep in mind: you're never going to be as reliable as an algorithm that retries more than you, if those retries succeed. On the other hand, if your server is always available and always "responds essentially instantly" then if the client fails to see a response it's a failure out of your server's control and the only thing that can be done is for the client to retry (although a retry can be more than just resending, such as closing/reopening the connection, trying a backup server at a different IP, etc).
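A minimal sketch of that client-side loop, using the starting values suggested above (T = 2 s, X = T/2, R = 5); the buffer size, address, and payload handling are placeholders:

```java
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.net.SocketTimeoutException;

public class UdpRequester {
    static byte[] request(byte[] payload, InetAddress server, int port) throws Exception {
        int timeoutMillis = 2_000;      // initial T
        final int increase = 1_000;     // X = T / 2
        final int retries = 5;          // R
        try (DatagramSocket socket = new DatagramSocket()) {
            byte[] reply = new byte[512];
            for (int attempt = 0; attempt <= retries; attempt++) {
                socket.send(new DatagramPacket(payload, payload.length, server, port));
                socket.setSoTimeout(timeoutMillis);
                try {
                    DatagramPacket response = new DatagramPacket(reply, reply.length);
                    socket.receive(response);   // blocks until a reply arrives or the timeout fires
                    return java.util.Arrays.copyOf(reply, response.getLength());
                } catch (SocketTimeoutException e) {
                    timeoutMillis += increase;   // grow T by X before the next retry
                }
            }
        }
        throw new java.io.IOException("no response after " + (retries + 1) + " attempts");
    }
}
```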
The minimum timeout should be the path latency, or half the Round-Trip-Time (RTT).
See RFC 908 — Reliable Data Protocol.
The big question is deciding what happens after one timeout: do you reset to the same timeout or do you double up? This is a complicated decision based on the size and frequency of the communication and how fair you wish to play with others.
If you are finding packets are frequently lost and latency is a concern then you want to look at either keeping the same timeout or having a slow ramp up to exponential timeouts, e.g. 1x, 1x, 1x, 1x, 2x, 4x, 8x, 16x, 32x.
If bandwidth isn't much of a concern but latency really is, then follow UDP-based Data Transfer Protocol (UDT) and force the data through with low timeouts and redundant delivery. This is useful for WAN environments, especially intercontinental distances and why UDT is frequently found within WAN accelerators.
More likely latency isn't that much of a concern and fairness to other protocols is preferred, then use a standard back-off pattern, 1x, 2x, 4x, 8x, 16x, 32x.
Ideally the implementation of the protocol handling should be advanced enough to automatically derive the optimum timeout and retry periods. When there is no data loss you do not need redundant delivery; when there is data loss you need to increase delivery. For timeouts you may wish to consider reducing the timeout under optimum conditions, then slowing down when congestion occurs to prevent synchronized broadcast storms.
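To make the two schedules above concrete, here is a tiny helper that computes the timeout multiplier per attempt; the length of the flat start is a parameter:

```java
public class BackoffSchedule {
    /**
     * Timeout multiplier for a given attempt: stay at 1x for the first flatSteps
     * attempts, then double each time (capped so the shift cannot overflow).
     */
    static long multiplier(int attempt, int flatSteps) {
        if (attempt < flatSteps) return 1L;
        return 1L << Math.min(attempt - flatSteps + 1, 16);
    }
    // multiplier(i, 1) for i = 0..5 -> 1, 2, 4, 8, 16, 32          (standard back-off)
    // multiplier(i, 4) for i = 0..8 -> 1, 1, 1, 1, 2, 4, 8, 16, 32 (slow ramp-up)
}
```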
I'm trying to work out in my head the best way to structure a Cocoa app that's essentially a concurrent download manager. There's a server the app talks to, the user makes a big list of things to pull down, and the app processes that list. (It's not using HTTP or FTP, so I can't use the URL-loading system; I'll be talking across socket connections.)
This is basically the classic producer-consumer pattern. The trick is that the number of consumers is fixed, and they're persistent. The server sets a strict limit on the number of simultaneous connections that can be open (though usually at least two), and opening new connections is expensive, so in an ideal world, the same N connections are open for the lifetime of the app.
One way to approach this might be to create N threads, each of which would "own" a connection, and wait on the request queue, blocking if it's empty. Since the number of connections will never be huge, this is not unreasonable in terms of actual system overhead. But conceptually, it seems like Cocoa must offer a more elegant solution.
It seems like I could use an NSOperationQueue, and call setMaxConcurrentOperationCount: with the number of connections. Then I just toss the download requests into that queue. But I'm not sure, in that case, how to manage the connections themselves. (Just put them on a stack, and rely on the queue to ensure I don't over/under-run? Throw in a dispatch semaphore along with the stack?)
Now that we're in the brave new world of Grand Central Dispatch, does that open up any other ways of tackling this? At first blush, it doesn't seem like it, since GCD's flagship ability to dynamically scale concurrency (and mentioned in Apple's recommendations on Changing Producer-Consumer Implementations) doesn't actually help me. But I've just scratched the surface of reading about it.
EDIT:
In case it matters: yes, I am planning on using the asynchronous/non-blocking socket APIs to do the actual communication with the server. So the I/O itself does not have to be on its own thread(s). I'm just concerned with the mechanics of queuing up the work, and (safely) doling it out to the connections, as they become available.
If you're using CFSocket's non-blocking calls for I/O, I agree, that should all happen on the main thread, letting the OS handle the concurrency issues, since you're just copying data and not really doing any computation.
Beyond that, it sounds like the only other work your app needs to do is maintain a queue of items to be downloaded. When any one of the transfers is complete, the CFSocket call back can initiate the transfer of the next item on the queue. (If the queue is empty, decrement your connection count, and if something is added to an empty queue, start a new transfer.) I don't see why you need multiple threads for that.
Maybe you've left out something important, but based on your description the app is I/O bound, not CPU bound, so all of the concurrency stuff is just going to make more complicated code with minimal impact on performance.
Do it all on the main thread.
For posterity's sake, after some discussion elsewhere, the solution I think I'd adopt for this is basically:
Have a queue of pending download operations, initially empty.
Have a set containing all open connections, initially empty.
Have a mutable array (queue, really) of idle open connections, initially empty.
When the user adds a download request:
If the array of idle connections is not empty, remove one and assign the download to it.
If there are no idle connections, but the number of total connections has not reached its limit, open a new connection, add it to the set, and assign the download to it.
Otherwise, enqueue the download for later.
When a download completes: if there are queued requests, dequeue one and give it to the connection; otherwise, add the connection to the idle list.
All of that work would take place on the main thread. The work of decoding the results of each download would be offloaded to GCD, so it can handle throttling the concurrency, and it doesn't clog the main thread.
Opening a new connection might take a while, so the process of creating a new one might be a tad more complicated in actual practice (say, enqueue the download, initiate the connection process, and then dequeue it when the connection is fully established). But I still think my perception of the possibility of race conditions was overstated.
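This isn't Cocoa, but as a language-agnostic illustration of the bookkeeping in the steps above, here is a sketch in Java with a hypothetical Connection type; the real app would keep the same structures in Cocoa collections on the main thread as described:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Single-threaded scheduler mirroring the queue/idle-list bookkeeping described above. */
public class DownloadScheduler {
    /** Hypothetical persistent connection to the server. */
    interface Connection { void start(Object download); }

    private final int maxConnections;
    private final List<Connection> allConnections = new ArrayList<>();
    private final Deque<Connection> idleConnections = new ArrayDeque<>();
    private final Deque<Object> pendingDownloads = new ArrayDeque<>();

    DownloadScheduler(int maxConnections) { this.maxConnections = maxConnections; }

    /** The user added a download request. */
    void enqueue(Object download) {
        if (!idleConnections.isEmpty()) {
            idleConnections.pop().start(download);          // reuse an idle connection
        } else if (allConnections.size() < maxConnections) {
            Connection c = openNewConnection();             // open another one, up to the limit
            allConnections.add(c);
            c.start(download);
        } else {
            pendingDownloads.add(download);                 // everything busy: queue for later
        }
    }

    /** A connection finished its download. */
    void onDownloadComplete(Connection connection) {
        if (!pendingDownloads.isEmpty()) {
            connection.start(pendingDownloads.pop());       // hand it the next queued request
        } else {
            idleConnections.push(connection);               // park it on the idle list
        }
    }

    private Connection openNewConnection() {
        return download -> { /* placeholder: open the socket, then start the transfer */ };
    }
}
```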