Measure RTT of WebsocketServer: Single Message or bulk of messages? - performance

I recently changed something in the WebsocketServer implementation of one of my projects that (presumably) decreases the Round-Trip-Time (RTT) of its messages. I want to measure the RTT of both implementations to compare them. For this, I am sending m messages of s bytes from c different connections.
The thing I am wondering now is: should the RTT be measured for each message separately - remembering the sending time of each message and when its response arrives - or should I only record the time the first message was sent and the time the response to the last message arrives? Which one is more accurate?
I would probably go for the first approach; what made me wonder is that websocket-benchmark seems to use the latter approach. Is this just an oversight, or is there a reasoning behind it?
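For what it's worth, here is a minimal sketch of the first approach in Python using the websockets library, assuming an echo-style server (the URI, message count and payload size below are placeholders):

import asyncio
import time
import websockets  # pip install websockets

SERVER_URI = "ws://localhost:8765"   # placeholder: assumes the server echoes messages back
NUM_MESSAGES = 100                   # m
PAYLOAD = b"x" * 512                 # s bytes

async def measure_rtt():
    rtts = []
    async with websockets.connect(SERVER_URI) as ws:
        for _ in range(NUM_MESSAGES):
            sent_at = time.perf_counter()   # remember the send time of this message
            await ws.send(PAYLOAD)
            await ws.recv()                 # wait for the matching response
            rtts.append(time.perf_counter() - sent_at)
    print(f"mean RTT: {sum(rtts) / len(rtts) * 1000:.2f} ms")

asyncio.run(measure_rtt())

Note that this sends the messages serially; if you want several messages in flight at once you would have to tag each one with an id so that responses can be paired with their requests.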

Related

What additional overheads are there to sending a packet over a websocket connection?

When performing AJAX requests, I have always tried to do as few as possible since there is an overhead to each request having to open the http connection to send the data. Since a websocket connection is constantly open, is there any cost outside of the obvious packet bandwidth to sending a request?
For example: over the space of 1 minute, a client will send 100kb of data to the server. Assuming the client does not need a response to any of these requests, is there any advantage to queuing packets and sending them in one big burst vs sending them as they are ready?
In other words, is there an overhead to the stopping and starting of data transfer for a connection that is constantly open?
I want to make a multiplayer browser game as real time as possible, but I don't want to find that hundreds of tiny requests per minute, compared to a larger consolidated request, are causing the server additional stress. I understand that if the client needs a response it will be slower, as there is a lot of waiting in the back and forth. I will consider this and only consolidate when it is appropriate. The more small requests per minute, the better the user experience, but I don't know what toll it will take on the server.
You are correct that a webSocket message will have lower overhead for a given message transmission than sending the same message via an Ajax call because the webSocket connection is already established and because a webSocket message has lower overhead than an HTTP request.
First off, there's always less overhead in sending one larger transmission vs. sending lots of smaller transmissions. That's just the nature of TCP. Every TCP packet gets separately processed and acknowledged so sending more of them costs a bit more overhead. Whether that difference is relevant or significant and worth writing extra code for or worth sacrificing some element of your user experience (because of the delay for batching) depends entirely upon the specifics of a given situation.
Since you've described a situation where your client gets the best experience if there is no delay and no batching of packets, then it seems that what you should do is not implement the batching and test out how your server handles the load with lots of smaller packets when it gets pretty busy. If that works just fine, then stay with the better user experience. If you have issues keeping up with the load, then seriously profile your server and find out where the main bottleneck to performance is (you will probably be surprised about where the bottleneck actually is as it is often not where you think it will be - that's why you have to profile and measure to know where to concentrate your energy for improving the scalability).
FYI, due to the implementation of Nagle's algorithm in most implementations of TCP, the TCP stack itself does small amounts of batching for you if you are sending multiple requests fairly closely spaced in time or if sending over a slower link.
It's also possible to implement a dynamic system where as long as your server is able to keep up, you keep with the smaller and more responsive packets, but if your server starts to get busy, you start batching in order to reduce the number of separate transmissions.
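A minimal sketch of that dynamic approach (all names below are made up; it assumes the transport object exposes a send(str) method and that the server periodically reports whether it is keeping up):

import json
import time

class AdaptiveSender:
    def __init__(self, transport, batch_interval=0.05):
        self.transport = transport        # assumed to expose send(str)
        self.batch_interval = batch_interval
        self.server_busy = False          # toggled by load reports from the server
        self.pending = []
        self.last_flush = time.monotonic()

    def send(self, message):
        if not self.server_busy:
            self.transport.send(json.dumps([message]))    # small, responsive packets
            return
        self.pending.append(message)                      # batch while the server is busy
        if time.monotonic() - self.last_flush >= self.batch_interval:
            self.flush()

    def flush(self):
        if self.pending:
            self.transport.send(json.dumps(self.pending))  # one larger transmission
            self.pending.clear()
        self.last_flush = time.monotonic()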

Com port queue latency metering

I have two programs (host and slave) communicating over a com port. In the simplest case, the host sends a command to the slave and waits for a response, then does it again. But this means that each side has to wait for the other for every transaction. So I use a queue so the second command can be sent before the first response comes back. This keeps things flowing faster.
But I need a way of metering the use of the queue so that there are never more than N command/response pairs in route at any time. So for example if N is 3, I will wait to send the fourth command until I get the first response back, etc. And it must keep track of which response goes with which command.
One thought I had is to tag each command with an integer modulo counter which is also returned with the response. This would ensure that the command and response are always paired correctly and I can do a modulo compare to be able to meter the commands always N ahead of the responses.
What I am wondering is: is there a better way? Isn't this a somewhat common thing to do?
(I am using Python, but that is not important.)
Using a sequence number and modulo arithmetic is in fact quite a common way to both acknowledge messages received and tell the sender when it can send more messages - see e.g. http://en.wikipedia.org/wiki/Sliding_window_protocol. Unfortunately for you, the obvious example, TCP, is unusual in that it uses a sequence number based on byte counts, not message counts, but the principle is much the same - TCP just has an extra degree of flexibility.
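A minimal sketch of that counter idea (the port interface, window size and modulo are illustrative): tag each command with a sequence number, hold back new commands once N are outstanding, and pair each response with its command by that number.

from collections import OrderedDict

class CommandWindow:
    # Sketch: at most window_size command/response pairs in flight at once.
    def __init__(self, port, window_size=3, modulo=256):
        self.port = port                  # assumed to expose write() and read_response()
        self.window_size = window_size    # N
        self.modulo = modulo
        self.next_seq = 0
        self.in_flight = OrderedDict()    # seq -> command awaiting its response

    def send(self, command):
        while len(self.in_flight) >= self.window_size:
            self.handle_response()        # block until a slot frees up
        seq = self.next_seq
        self.next_seq = (self.next_seq + 1) % self.modulo
        self.in_flight[seq] = command
        self.port.write(bytes([seq]) + command)    # tag the command with its sequence number

    def handle_response(self):
        seq, payload = self.port.read_response()   # the slave echoes the sequence number back
        command = self.in_flight.pop(seq)          # pair the response with its command
        return command, payload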

What are good UDP timeout and retry values?

I'm working on a UDP server/client configuration. The client sends the server a single packet, which varies in size but is usually <500 bytes. The server responds essentially instantly with a single outgoing packet, usually smaller than the incoming request packet. Complete transactions always consist of a single packet exchange.
If the client doesn't see the response within T amount of time, it retries R times, increasing T by X before each retry, before finally giving up and returning an error. Currently, R is never changed.
Is there any special logic to choosing optimum initial T (wait time), R (retries), and X (wait increase)? How persistent should retries be (ie, what minimum R to use) to reach some approximation of a "reliable" protocol?
This is similar to question 5227520. Googling "tcp retries" and "tcp retransmission" leads to lots of suggestions that have been tried over the years. Unfortunately, no single solution appears optimum.
I'd choose T to start at 2 or 3 seconds. My increase X would be half of T (doubling T seems popular, but you quickly get long timeouts). I'd adjust R on the fly to be at least 5 and more if necessary so my total timeout is at least a minute or two.
I'd be careful not to leave R and T too high if subsequent transactions are usually quicker; you might want to lower R and T as your stats allow so you can retry and get a quick response instead of leaving R and T at their max (especially if your clients are human and you want to be responsive).
Keep in mind: you're never going to be as reliable as an algorithm that retries more than you, if those retries succeed. On the other hand, if your server is always available and always "responds essentially instantly" then if the client fails to see a response it's a failure out of your server's control and the only thing that can be done is for the client to retry (although a retry can be more than just resending, such as closing/reopening the connection, trying a backup server at a different IP, etc).
The minimum timeout should be the path latency, or half the Round-Trip-Time (RTT).
See RFC 908 — Reliable Data Protocol.
The big question is deciding what happens after one timeout: do you reset to the same timeout or do you double up? This is a complicated decision based on the size and frequency of the communication and how fairly you wish to play with others.
If you are finding packets are frequently lost and latency is a concern then you want to look at either keeping the same timeout or having a slow ramp up to exponential timeouts, e.g. 1x, 1x, 1x, 1x, 2x, 4x, 8x, 16x, 32x.
If bandwidth isn't much of a concern but latency really is, then follow UDP-based Data Transfer Protocol (UDT) and force the data through with low timeouts and redundant delivery. This is useful for WAN environments, especially intercontinental distances and why UDT is frequently found within WAN accelerators.
More likely, latency isn't that much of a concern and fairness to other protocols is preferred; in that case use a standard back-off pattern, 1x, 2x, 4x, 8x, 16x, 32x.
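As a rough sketch of the retry loop with a configurable back-off schedule (the multipliers are the patterns mentioned above; the address, base timeout and buffer size are placeholders):

import socket

SLOW_RAMP = [1, 1, 1, 1, 2, 4, 8, 16, 32]   # slow ramp up to exponential timeouts
STANDARD = [1, 2, 4, 8, 16, 32]             # standard back-off pattern

def request_with_retries(payload, server=("127.0.0.1", 9999),
                         base_timeout=2.0, schedule=STANDARD):
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        for multiplier in schedule:
            sock.settimeout(base_timeout * multiplier)
            sock.sendto(payload, server)
            try:
                response, _addr = sock.recvfrom(2048)
                return response              # single-packet exchange succeeded
            except socket.timeout:
                continue                     # no response in time, retry with the next timeout
        raise TimeoutError("no response after all retries")
    finally:
        sock.close()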
Ideally the implementation of the protocol handling should be advanced enough to automatically derive the optimum timeout and retry periods. When there is no data loss you do not need redundant delivery; when there is data loss you need to increase redundancy. For timeouts you may wish to consider reducing the timeout in optimum conditions, then slowing down when congestion occurs to prevent synchronized broadcast storms.

How to take latency differences into consideration when verifying location differences with timestamps (anti-cheating)?

When you have a multiplayer game where the server is receiving movement (location) information from the client, you want to verify this information as an anti-cheating measure.
This can be done like this:
maxPlayerSpeed = 300; // = 300 pixels every 1 second
if ((1000 / (getTime() - oldTimestamp) * (newPosX - oldPosX)) > maxPlayerSpeed)
{
    disconnect(player); // this is illegal!
}
This is a simple example, only taking the X coords into consideration. The problem here is that the oldTimestamp is stored as soon as the last location update was received by the server. This means that if there was a lag spike at that time, the old timestamp will be received much later relatively than the new location update by the server. This means that the time difference will not be accurate.
Example:
Client says: I am now at position 5x10
Lag spike: server receives this message at timestamp 500 (it should normally arrive at like 30)
....1 second movement...
Client says: I am now at position 20x15
No lag spike: server receives message at timestamp 1530
The server will now think that the time difference between these two locations is 1030. However, the real time difference is 1500. This could cause the anti-cheating detection to think that 1030 is not long enough, thus kicking the client.
Possible solution: let the client send a timestamp while sending, so that the server can use these timestamps instead
Problem: the problem with that solution is that the player could manipulate the client to send a timestamp that is not legal, so the anti-cheating system won't kick in. This is not a good solution.
It is also possible to simply allow maxPlayerSpeed * 2 speed (for example), however this basically allows speed hacking up to twice as fast as normal. This is not a good solution either.
So: do you have any suggestions on how to fix this "server timestamp & latency" issue in order to make my anti-cheating measures worthwhile?
No no no.. with all due respect this is all wrong, and how NOT to do it.
The remedy is not trusting your clients. Don't make the clients send their positions, make them send their button states! View the button states as requests where the clients say "I'm moving forwards, unless you object". If the client sends a "moving forward" message and can't move forward, the server can ignore that or do whatever it likes to ensure consistency. In that case, the client only fools itself.
As for speed-hacks made possible by packet flooding, keep a packet counter. Eject clients who send more packets within a certain timeframe than the settings allow. Clients should send one packet per tick/frame/world timestep. It's handy to name the packets based on time in whole timestep increments. Excessive packets for the same timestep can then be identified and ignored. Note that sending the same packet several times is a good idea when using UDP, to mitigate packet loss.
Again, never trust the client. This can't be emphasized enough.
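A minimal sketch of such a per-client packet counter (the limit, window and names are illustrative):

import time
from collections import defaultdict, deque

MAX_PACKETS = 70          # illustrative: ~60 ticks per second plus a little slack
WINDOW_SECONDS = 1.0

recent_packets = defaultdict(deque)   # client_id -> receive times within the window

def register_packet(client_id, now=None):
    # Returns False (eject the client) if it exceeds the allowed packet rate.
    now = time.monotonic() if now is None else now
    times = recent_packets[client_id]
    times.append(now)
    while times and now - times[0] > WINDOW_SECONDS:
        times.popleft()               # drop receive times that fell out of the window
    return len(times) <= MAX_PACKETS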
Smooth out lag spikes by filtering. Or to put this another way, instead of always comparing their new position to the previous position, compare it to the position of several updates ago. That way any short-term jitter is averaged out. In your example the server could look at the position before the lag spike and see that overall the player is moving at a reasonable speed.
For each player, you could simply hold the last X positions, or you might hold a lot of recent positions plus some older positions (eg 2, 3, 5, 10 seconds ago).
Generally you'd be performing interpolation/extrapolation on the server anyway within the normal movement speed bounds to hide the jitter from other players - all you're doing is extending this to your cheat checking mechanism as well. All legitimate speed-ups are going to come after an apparent slow-down, and interpolation helps cover that sort of error up.
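For example, a sketch of that windowed check (the window length and data structures are illustrative): keep recent positions with their receive times and compare against the oldest sample rather than only the previous one.

import math
import time
from collections import deque

MAX_PLAYER_SPEED = 300    # pixels per second, as in the question
WINDOW = 10               # compare against the position roughly 10 updates ago

position_history = {}     # player_id -> deque of (receive_time, x, y)

def check_speed(player_id, x, y):
    history = position_history.setdefault(player_id, deque(maxlen=WINDOW))
    now = time.monotonic()
    ok = True
    if history:
        old_time, old_x, old_y = history[0]          # oldest sample in the window
        distance = math.hypot(x - old_x, y - old_y)
        elapsed = now - old_time
        if elapsed > 0 and distance / elapsed > MAX_PLAYER_SPEED:
            ok = False                               # moved too far for the elapsed time
    history.append((now, x, y))
    return ok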
Regardless of opinions on the approach, what you are looking for is the speed threshold that is considered "cheating".
Given a distance and a time increment, you can trivially see if they moved "too far" based on your cheat threshold.
time = thisTime - lastTime;
speed = distance / time;
If (speed > threshold) dudeIsCheating();
The times used for measurement are the server's packet-receive times. While it seems trivial, this means calculating a distance for every character movement, which can end up very expensive. The better route is for the server to calculate position based on velocity, and that becomes the character's position. The client never communicates a position or absolute velocity; instead, the client sends a "percent of max" velocity.
To clarify:
This was just for the cheating check. Your code allows lag or long processing on the server to affect the outcome. The formula should be:
maxPlayerSpeed = 300; // = 300 pixels every 1 second
if (maxPlayerSpeed <
    (distanceTraveled(oldPos, newPos) / (receiveNewest() - receiveLast())))
{
    disconnect(player); // this is illegal!
}
This compares the player's rate of travel against the maximum rate of travel. The timestamps are determined by when you receive the packet, not when you process the data. You can use whichever method you like to determine the updates to send to the clients, but for the threshold check that detects cheating, the above will not be impacted by lag.
Receive packet 1 at second 1: Character at position 1
Receive packet 2 at second 100: Character at position 3000
distance traveled = 2999
time = 99
rate = 30
No cheating occurred.
Receive packet 3 at second 101: Character at position 3301
distance traveled = 301
time = 1
rate = 301
Cheating detected.
What you are calling a "lag spike" is really just high latency in packet delivery. But it doesn't matter, since you aren't going by when the data is processed; you go by when each packet was received. If you keep the time calculations independent of your game-tick processing (as they should be, since the movement happened during that "tick"), high and low latency only affect how sure the server is of the character's position, which you use interpolation and extrapolation to resolve.
If the client is out of sync enough to where they haven't received any corrections to their position and are wildly out of sync with the server, there is significant packet loss and high latency which your cheating check will not be able to account for. You need to account for that at a lower layer with the handling of actual network communications.
For any game data, the ideal method is for all systems except the server to run behind by 100-200ms. Say you have an intended update every 50ms. The client receives the first and second. The client doesn't have any data to display until it receives the second update. Over the next 50 ms, it shows the progression of changes as it has already occurred (ie, it's on a very slight delayed playback). The client sends its button states to the server. The local client also predicts the movement, effects, etc. based on those button presses but only sends the server the "button state" (since there are a finite number of buttons, there are a finite number of bits necessary to represent each state, which allows for a more compact packet format).
The server is the authoritative simulation, determining the actual outcomes. The server sends updates every, say, 50ms to the clients. Rather than interpolating between two known frames, the server instead extrapolates positions, etc. for any missing data. The server knows what the last real position was. When it receives an update, the next packet sent to each of the clients includes the updated information. The client should then receive this information prior to reaching that point in time and the players react to it as it occurs, not seeing any odd jumping around because it never displayed an incorrect position.
It's possible to have the client be authoritative for some things, or to have a client act as the authoritative server. The key is determining how much trust you place in the client and what the impact of that trust is.
The client should be sending updates regularly, say, every 50 ms. That means that during a 500 ms "lag spike" (a delay in packet reception), either all packets sent within the delay period will be delayed by a similar amount or the packets will arrive out of order. The underlying networking should handle these delays gracefully (by discarding packets that have an overly large delay, enforcing in-order packet delivery, etc.). The end result is that with proper packet handling, the issues anticipated should not occur. Additionally, not receiving explicit character locations from the client, and instead having the server explicitly correct the client and only receive control states from the client, would prevent this issue.
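As an aside, a sketch of the compact "button state" packet mentioned above, packing each button into one bit (the button set and layout are made up for illustration):

import struct

BUTTONS = ["forward", "back", "left", "right", "jump", "fire"]   # illustrative button set

def pack_input(tick, pressed):
    # Pack the tick number and the set of pressed buttons into 6 bytes.
    mask = 0
    for i, name in enumerate(BUTTONS):
        if name in pressed:
            mask |= 1 << i
    return struct.pack("<IH", tick, mask)    # 4-byte tick + 2-byte button bitmask

def unpack_input(data):
    tick, mask = struct.unpack("<IH", data)
    pressed = {name for i, name in enumerate(BUTTONS) if mask & (1 << i)}
    return tick, pressed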

How Does the Network Time Protocol Work?

The Wikipedia entry doesn't give details and the RFC is way too dense. Does anyone around here know, in a very general way, how NTP works?
I'm looking for an overview that explains how Marzullo's algorithm (or a modification of it) is employed to translate a timestamp on a server into a timestamp on a client. Specifically what mechanism is used to produce accuracy which is, on average, within 10ms when that communication takes place over a network with highly variable latency which is frequently several times that.
(This isn't Marzullo's algorithm. That's only used by the high-stratum servers to get really accurate time using several sources. This is how an ordinary client gets the time, using only one server)
First of all, NTP timestamps are stored as seconds since January 1, 1900. 32 bits for the number of seconds, and 32 bits for the fractions of a second.
The synchronization is tricky. The client stores the timestamp (say A) (all these values are in seconds) when it sends the request. The server sends a reply consisting of the "true" time when it received the packet (call that X) and the "true" time it will transmit the packet (Y). The client will receive that packet and log the time when it received it (B).
NTP assumes that the time spent on the network is the same for sending and receiving. Over enough intervals over sane networks, it should average out to be so. We know that the total transit time from sending the request to receiving the response was B-A seconds. We want to remove the time that the server spent processing the request (Y-X), leaving only the network traversal time, so that's B-A-(Y-X). Since we're assuming the network traversal time is symmetric, the amount of time it took the response to get from the server to the client is [B-A-(Y-X)]/2. So we know that the server sent its response at time Y, and it took us [B-A-(Y-X)]/2 seconds for that response to get to us.
So the true time when we received the response is Y+[B-A-(Y-X)]/2 seconds. And that's how NTP works.
Example (in whole seconds to make the math easy):
Client sends request at "wrong" time 100. A=100.
Server receives request at "true" time 150. X=150.
The server is slow, so it doesn't send out the response until "true" time 160. Y=160.
The client receives the response at "wrong" time 120. B=120.
Client determines the time spent on the network is B-A-(Y-X) = 120-100-(160-150) = 10 seconds.
Client assumes the amount of time it took for the response to get from the server to the client is 10/2=5 seconds.
Client adds that time to the "true" time when the server sent the response to estimate that it received the response at "true" time 165 seconds.
Client now knows that it needs to add 45 seconds to its clock.
In a proper implementation, the client runs as a daemon, all the time. Over a long period of time with many samples, NTP can actually determine if the computer's clock is slow or fast, and automatically adjust it accordingly, allowing it to keep reasonably good time even if it is later disconnected from the network. Together with averaging the responses from the server, and application of more complicated thinking, you can get incredibly accurate times.
There's more, of course, to a proper implementation than that, but that's the gist of it.
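The calculation above as a small sketch (the variable names follow the A/X/Y/B convention used here; the call at the bottom is the worked example):

def ntp_offset(A, X, Y, B):
    # Returns (estimated one-way network delay, offset to add to the client clock).
    round_trip = (B - A) - (Y - X)     # total transit time minus server processing time
    one_way = round_trip / 2           # assume symmetric network delay
    true_receive_time = Y + one_way    # "true" time when the response arrived
    offset = true_receive_time - B     # correction to apply to the client clock
    return one_way, offset

print(ntp_offset(A=100, X=150, Y=160, B=120))   # -> (5.0, 45.0)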
The NTP client asks all of its NTP servers what time it is.
The different servers will give different answers, with different confidence levels, because the requests will take different amounts of time to travel from the client to the server and back.
Marzullo's algorithm will find the smallest range of time values consistent with all of the answers provided.
You can be more confident of the accuracy of the answer from this algorithm than of that from any single time servers because the intersection of several sets will likely contain fewer elements than any individual set.
The more servers you query, the more constraints you'll have on the possible answer, and the more accurate your clock will be.
If you are using timestamps to decide ordering, specific times may not be necessary. You could use Lamport clocks instead, which are less of a pain than network synchronization. A Lamport clock can tell you what came "first", but not the exact difference in times; it doesn't care what the computer's clock actually says.
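A minimal sketch of a Lamport clock for comparison (it only produces an ordering, never a wall-clock time):

class LamportClock:
    def __init__(self):
        self.time = 0

    def tick(self):                      # call for a local event or before sending a message
        self.time += 1
        return self.time

    def observe(self, received_time):    # call when receiving a message carrying a timestamp
        self.time = max(self.time, received_time) + 1
        return self.time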
The trick is that some packets are fast, and the fast packets give you tight constraints on the time.
