Using the same etcd client object to watch and campaign causes the mvcc_watcher_total metric to keep increasing; has anyone encountered the same problem? - etcd

I use the same etcd client object to both watch and campaign, and this causes the mvcc_watcher_total metric to keep increasing. Over time, etcd cannot tolerate this growth and stops working.
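For reference, a minimal sketch of the setup being described, using the Go clientv3 API; the endpoints, the watched key, and the election prefix are placeholders, and the comment about leaked watchers reflects one reading of the problem rather than a confirmed diagnosis.

```go
package main

import (
	"context"
	"log"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
	"go.etcd.io/etcd/client/v3/concurrency"
)

func main() {
	// One shared client used for both the watch and the election campaign.
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"localhost:2379"},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()

	// Campaign for leadership on the shared client.
	sess, err := concurrency.NewSession(cli) // keeps a lease alive for the election
	if err != nil {
		log.Fatal(err)
	}
	defer sess.Close()
	election := concurrency.NewElection(sess, "/my-election")
	go func() {
		if err := election.Campaign(context.Background(), "candidate-1"); err != nil {
			log.Println("campaign:", err)
		}
	}()

	// Watch on the same client. Cancelling the context is what releases the
	// server-side watcher; if new watches are opened repeatedly without
	// cancelling the old ones, mvcc_watcher_total can keep growing.
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	for resp := range cli.Watch(ctx, "/my-key", clientv3.WithPrefix()) {
		for _, ev := range resp.Events {
			log.Printf("%s %q = %q", ev.Type, ev.Kv.Key, ev.Kv.Value)
		}
	}
}
```

Each watch opened on a client holds a server-side watcher until its context is cancelled or the client is closed, so one thing worth checking is whether a new watch is created on every campaign round while the previous contexts are never cancelled.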

Related

AWS Kinesis - Avoiding stalled shards

I am using Kinesis to process events in a micro-service architecture. Events are partitioned at a client project level to ensure all events related to the same project are processed in the correct sequence. Currently, if there is an error processing one of the events, this can cause the events from other partitions to also become blocked. I had hoped that by increasing the parallelisation factor and bisecting the batch on error, the other Lambda processors would be able to continue processing events from other partitions. This is largely the case, but there are still times when multiple partitions become stuck, presumably because Kinesis sometimes decides to always allocate several partitions to the same Lambda processor.
My question is: is there any way to avoid this in Kinesis, or will I need to start making use of a dead-letter queue and removing events that are repeatedly failing? The downside to this is that I don't really want to continue processing further events for the same partition once there is a failure, as the state of the micro-service is likely to be corrupt at that point, and I would rather our team manually address whatever issue has occurred before continuing to play events from the failed partition.
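On the "stop a partition after a failure without blocking the others" point, a rough sketch of one way to express that in a Go Lambda handler, assuming ReportBatchItemFailures is enabled on the event source mapping; processRecord and all names are placeholders. Note that a shard checkpoints on a single sequence number, so records with other partition keys that sit behind the failed record in the same shard will still be re-delivered.

```go
package main

import (
	"context"
	"log"

	"github.com/aws/aws-lambda-go/events"
	"github.com/aws/aws-lambda-go/lambda"
)

// processRecord is a stand-in for the real per-event business logic.
func processRecord(rec events.KinesisEventRecord) error {
	log.Printf("processing %s from partition %s",
		rec.Kinesis.SequenceNumber, rec.Kinesis.PartitionKey)
	return nil
}

// handler treats each partition key independently: a failure freezes that
// partition key (its remaining records are reported as failures so the
// checkpoint does not advance past them) while other keys keep flowing.
func handler(ctx context.Context, e events.KinesisEvent) (events.KinesisEventResponse, error) {
	failed := map[string]bool{} // partition keys that have already failed in this batch
	var resp events.KinesisEventResponse

	for _, rec := range e.Records {
		pk := rec.Kinesis.PartitionKey
		if failed[pk] || processRecord(rec) != nil {
			failed[pk] = true
			resp.BatchItemFailures = append(resp.BatchItemFailures,
				events.KinesisBatchItemFailure{ItemIdentifier: rec.Kinesis.SequenceNumber})
		}
	}
	return resp, nil
}

func main() {
	lambda.Start(handler)
}
```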

Spring Cloud Stream: Proper handling for StreamListener when it takes a long time to process the message

When a StreamListener takes a long time (longer than max.poll.interval.ms) to process a message, that consumer stays occupied and new messages are assigned to other partitions. Once the time exceeds max.poll.interval.ms, a rebalance happens and the same situation repeats with another consumer. So this message circulates around all the partitions and keeps hogging resources.
However, this does not happen very often; only a few messages somehow take such a long time to process, and it is not something we can control.
Can we commit the offset and send the message to a DLQ after a few rounds of rebalancing? If yes, how can we do that? If not, what is the proper way to handle this kind of situation?
Increasing max.poll.interval.ms will have no impact on performance (except it will take longer to detect a consumer that is really dead).
Going through a rebalance each time you process this "bad" record is much more damaging to performance.
You can, however, do what you want with a custom SeekToCurrentErrorHandler together with a recoverer such as the DeadLetterPublishingRecoverer. You would also need a rebalance listener to count the rebalances and some mechanism to share the state from the error handler across instances (the standard one only keeps state in memory).
Quite complicated, I think.
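Not the Spring API, but a language-agnostic sketch of the "count failures in shared state, then divert to a dead-letter topic" idea described above; Store, Publisher, and every other name here are hypothetical stand-ins for whatever shared store and producer the application already has.

```go
// Package retry sketches a shared-state failure counter that routes a record
// to a dead-letter topic after a bounded number of attempts.
package retry

// Store is a stand-in for shared state visible to all consumer instances
// (the point the answer makes: in-memory state alone is not enough).
type Store interface {
	Incr(recordKey string) (int, error) // returns the new failure count
	Reset(recordKey string) error
}

// Publisher is a stand-in for the client that writes to the DLQ topic.
type Publisher interface {
	PublishToDLQ(recordKey string, payload []byte) error
}

type DLQRouter struct {
	MaxAttempts int
	Counts      Store
	DLQ         Publisher
}

// OnFailure is called each time processing of a record fails (for example
// after a rebalance re-delivers it). It returns true when the record has
// been shipped to the DLQ and the offset can safely be committed.
func (r *DLQRouter) OnFailure(recordKey string, payload []byte) (bool, error) {
	n, err := r.Counts.Incr(recordKey)
	if err != nil {
		return false, err
	}
	if n < r.MaxAttempts {
		return false, nil // let the normal retry/rebalance path run again
	}
	if err := r.DLQ.PublishToDLQ(recordKey, payload); err != nil {
		return false, err
	}
	return true, r.Counts.Reset(recordKey)
}
```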

Adjusting HTTP Timeout versus backoff during retries

I'm wondering about the trade-offs between two approaches to handling HTTP timeouts between two services. Service A is trying to implement retry functionality when calling service B.
Approach 1: This is the typical approach (e.g. the Ethernet protocol). Perform a request with a fixed timeout T. If the timeout occurs, sleep for X and retry the request. Increase X exponentially.
Approach 2: Instead of sleeping between retries, increase the actual HTTP timeout value (say, exponentially). In both cases, impose a maximum bound.
For Ethernet, this makes sense because of its low-level position in the network stack. However, for an application-level retry mechanism, would approach 2 be more appropriate? In a situation with high levels of network congestion, I would think #2 is better for a couple of reasons:
Sending additional TCP connection requests will only flood the network more
You're basically guaranteed to not receive a response when you're sleeping (because you already timed out and/or tore down the socket), whereas if you instead just allowed the TCP request to remain outstanding (or kept the socket open if the connection has at least been established), you at least have the possibility of success occurring.
Any thoughts on this?
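To make the two approaches concrete, a minimal sketch in Go; the 2-second starting timeout, the 60-second cap, and the backoff values are arbitrary placeholders, not values from the question.

```go
// Package retrydemo contrasts a fixed per-attempt timeout with backoff
// (approach 1) against a growing per-attempt timeout (approach 2).
package retrydemo

import (
	"context"
	"io"
	"net/http"
	"time"
)

// fetchOnce performs a single GET bounded by the given overall timeout.
func fetchOnce(url string, timeout time.Duration) ([]byte, error) {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	return io.ReadAll(resp.Body)
}

// Approach 1: fixed per-attempt timeout T, exponentially growing sleep.
func fetchWithBackoff(url string, attempts int) ([]byte, error) {
	const timeout = 2 * time.Second
	backoff := 500 * time.Millisecond
	var lastErr error
	for i := 0; i < attempts; i++ {
		body, err := fetchOnce(url, timeout)
		if err == nil {
			return body, nil
		}
		lastErr = err
		time.Sleep(backoff) // nothing useful happens while sleeping
		backoff *= 2
	}
	return nil, lastErr
}

// Approach 2: retry immediately, but give each attempt a longer timeout,
// up to a maximum bound.
func fetchWithGrowingTimeout(url string, attempts int) ([]byte, error) {
	timeout := 2 * time.Second
	var lastErr error
	for i := 0; i < attempts; i++ {
		body, err := fetchOnce(url, timeout)
		if err == nil {
			return body, nil
		}
		lastErr = err
		timeout *= 2
		if timeout > 60*time.Second {
			timeout = 60 * time.Second // max bound
		}
	}
	return nil, lastErr
}
```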
On a high-packet-loss network (e.g. cellular, or wi-fi near the limits of its range), there's a distinct possibility that your requests will continue to time out forever if the timeout is too short. So increasing the timeout is often a good idea.
And retrying the request immediately often works, and if it doesn't, waiting a while might make no difference (e.g. if you no longer have a network connection). For example, on iOS, your best bet is to use reachability, and if reachability determines that the network is down, there's no reason to retry until it isn't.
My general thoughts are that for short requests (i.e. not uploading/downloading large files) if you haven't received any response from the server at all after 3-5 seconds, start a second request in parallel. Whichever request returns a header first wins. Cancel the other one. Keep the timeout at 90 seconds. If that fails, see if you can reach generate_204.
If generate_204 works, the problem could be a server issue. Retry immediately, but flag the server as suspect. If that retry fails a second time (after a successful generate_204 response), start your exponential backoff waiting for the server (with a cap on the maximum interval).
If the generate_204 request doesn't respond, your network is dead. Wait for a network change, trying only very occasionally (e.g. every couple of minutes minimum).
If the network connectivity changes (i.e. if you suddenly have Wi-Fi), restart any waiting connections after a few seconds. There's no reason to wait the full time at that point, because everything has changed.
But obviously there's no correct answer. This approach is fairly aggressive. Others might take the opposite approach. It all depends on what your goals are.
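The "3-5 seconds, then start a second request in parallel" part of the approach above maps fairly directly onto goroutines. A sketch under the assumption that racing on the full response body (rather than the first header byte) is acceptable; the names and durations are made up.

```go
// Package hedged races two identical GETs: the second is started only if
// the first has produced nothing after a stagger delay, the first result
// wins, and the context cancellation aborts the loser.
package hedged

import (
	"context"
	"io"
	"net/http"
	"time"
)

type result struct {
	body []byte
	err  error
}

func raceGet(url string, stagger, overall time.Duration) ([]byte, error) {
	ctx, cancel := context.WithTimeout(context.Background(), overall)
	defer cancel() // cancelling the context also aborts the losing request

	results := make(chan result, 2)
	attempt := func() {
		req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
		if err != nil {
			results <- result{err: err}
			return
		}
		resp, err := http.DefaultClient.Do(req)
		if err != nil {
			results <- result{err: err}
			return
		}
		defer resp.Body.Close()
		body, err := io.ReadAll(resp.Body)
		results <- result{body: body, err: err}
	}

	go attempt()
	hedge := time.NewTimer(stagger)
	defer hedge.Stop()

	started, finished := 1, 0
	var lastErr error
	for finished < started {
		select {
		case <-hedge.C:
			started++ // the first request is still silent: fire the hedge request
			go attempt()
		case r := <-results:
			finished++
			if r.err == nil {
				return r.body, nil
			}
			lastErr = r.err
		}
	}
	return nil, lastErr
}
```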
There's not much point in sleeping when you could be doing useful work, or in using a shorter timeout than you can really tolerate. I would use (2).
The idea that Ethernet or indeed anything uses (1) seems fanciful. Do you have a citation?

Efficiently implementing Birman-Schiper-Stephenson(BSS) protocol's delay queue

I am implementing the Birman-Schiper-Stephenson protocol in a distributed system, with the current assumption that the peer set of any node doesn't change. As the protocol dictates, messages that arrive at a node out of causal order have to be put in a 'delay queue'. My problem is with the organisation of the delay queue, where we must impose some kind of order on the messages. After deciding the order, we will have to build a 'wake-up' procedure which efficiently searches the queue after the current timestamp is modified, to find out whether one of the delayed messages can be 'woken up' and accepted.
I was thinking of segregating the delayed messages into bins based on the points at which their vector timestamps differ from this node's timestamp. But the number of bins can be very large, and maintaining them won't be efficient.
Please suggest some designs for such a queue (or queues).
Sorry about the delay -- didn't see your question until now. Anyhow, if you look at Isis2.codeplex.com you'll see that in Isis2, I have a causalsend implementation that employs the same vector timestamp scheme we described in the BSS paper. What I do is to keep my messages in a partial order, sorted by VT, and then when a delivery occurs I can look at the delayed queue and deliver off the front of the queue until I find something that isn't deliverable. Everything behind it will be undeliverable too.
But in fact there is a deeper insight here: you actually never want to allow the queue of delayed messages to get very long. If the queue gets longer than a few messages (say, 50 or 100) you run into the problem that the guy with the queue could be holding quite a few bytes of data and may start paging or otherwise running slowly. So it becomes a self-perpetuating cycle in which because he has a queue, he is very likely to be dropping messages and hence enqueuing more and more. Plus in any case from his point of view, the urgent thing is to recover that missed message that caused the others to be out of order.
What this adds up to is that you need a flow control scheme in which the amount of pending asynchronous stuff is kept small. But once you know the queue is small, searching every single element won't be very costly! So this deeper perspective says flow control is needed no matter what, and then because of flow control (if you have a flow control scheme that works) the queue is small, and because the queue is small, the search won't be costly!
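To make the deliverability check and the wake-up scan concrete, a small Go sketch under stated assumptions: all names are invented, and rather than maintaining the VT-sorted partial order described in the answer it simply rescans the whole queue, leaning on the flow-control argument above that the queue stays short enough for that to be cheap.

```go
// Package bssdelay is a sketch of the delay queue discussed in this thread;
// it is not the Isis2 implementation.
package bssdelay

// Message is an incoming multicast carrying its sender's vector timestamp.
type Message struct {
	Sender int   // index of the sending process
	VT     []int // vector timestamp attached by the sender
	Data   []byte
}

// deliverable is the BSS condition: the sender's entry must be exactly one
// ahead of our clock, and no other entry may exceed what we have delivered.
func deliverable(m Message, local []int) bool {
	for k := range local {
		if k == m.Sender {
			if m.VT[k] != local[k]+1 {
				return false
			}
		} else if m.VT[k] > local[k] {
			return false
		}
	}
	return true
}

// DelayQueue holds messages that arrived out of causal order.
type DelayQueue struct {
	pending []Message
}

// Add parks a message that is not yet deliverable.
func (q *DelayQueue) Add(m Message) {
	q.pending = append(q.pending, m)
}

// WakeUp is called whenever the local clock advances (i.e. after a delivery).
// Flow control is assumed to keep the queue short, so it rescans the whole
// queue until nothing further becomes deliverable, updating the local clock
// for each message it hands back.
func (q *DelayQueue) WakeUp(local []int) []Message {
	var delivered []Message
	for {
		found := -1
		for i, m := range q.pending {
			if deliverable(m, local) {
				found = i
				break
			}
		}
		if found < 0 {
			return delivered
		}
		m := q.pending[found]
		q.pending = append(q.pending[:found], q.pending[found+1:]...)
		local[m.Sender]++ // this sender's next message has now been delivered
		delivered = append(delivered, m)
	}
}
```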

Sync client time to server time, i.e. make the client application independent of the local computer time

Ok, so the situation is as follows.
I have a server with services for a game; a particular command from the server sends a timestamp for when the next game round should commence. To get this perfectly synced on all connected clients, I also have a web service that returns a timestamp of the server's current time.
What I know: the time between the request being sent and the answer being received.
What I don't know: where the latency lies, in client processing, server processing, or bandwidth issues.
What is the best practice to get a reasonable result here? I guess that GPS must have solved this in some fashion, but I've been unable to find a good pattern.
What I do now is add half the latency of the request to the server timestamp, but it's not quite good enough. This may have to do with the fact that the time between send and receive can be as high as 11 seconds.
Suggestions?
There are many common solutions for syncing time between machines, including a correct PLL implementation as done by NTPD with RTP. This is useful to you if you can change the machine's local time. If not, perhaps you should do more or less what you did, but drop sync points where the latency is unreasonable.
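A minimal sketch of that adjustment in Go, with the latency filter this answer suggests; the web-service call, the 2-second cutoff, and all names are placeholders.

```go
// Package clocksync sketches the "add half the round trip to the server
// timestamp" adjustment from the question, discarding slow samples.
package clocksync

import (
	"errors"
	"time"
)

const maxAcceptableRTT = 2 * time.Second // discard sync points slower than this

// EstimateOffset returns how far the local clock is from the server clock;
// adding the offset to time.Now() approximates the server's current time.
// fetchServerTime stands in for the asker's web-service call.
func EstimateOffset(fetchServerTime func() (time.Time, error)) (time.Duration, error) {
	sent := time.Now()
	serverTime, err := fetchServerTime()
	received := time.Now()
	if err != nil {
		return 0, err
	}

	rtt := received.Sub(sent)
	if rtt > maxAcceptableRTT {
		return 0, errors.New("round trip too slow; discard this sample")
	}

	// Assume the server read its clock roughly halfway through the round trip.
	estimatedServerNow := serverTime.Add(rtt / 2)
	return estimatedServerNow.Sub(received), nil
}
```

A common refinement is to take several samples and keep only the one with the smallest round trip (or a median of the survivors), which is effectively the "drop sync points where the latency is unreasonable" advice applied automatically.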
The best practice is usually not to synchronise the absolute times but to work with relative times instead.
