Unclear on Nagle's Algorithm

I've been doing some research on Nagle's algorithm out of idle curiosity. I understand the basic concept behind it (TCP packets contain a significant amount of overhead, especially when dealing with small payloads), but I'm not sure I grok the implementation.
I was reading this article on Wikipedia, but I'm still unclear on how it works. Let's take the example of a Telnet connection. The connection is established and I begin typing. Let's say I type three characters (cat, for example) and hit return. Now we're talking cat\r\n which is still only 5 bytes. I'd think this would not get sent until we queue up enough bytes to send - and yet, it does get sent immediately (from a user perspective), since cat is immediately executed upon hitting return.
I think I have a fundamental misunderstanding here on how the algorithm works, specifically regarding the bit where "if there is unconfirmed data still in the pipe, enqueue, else send immediately."

The data gets sent immediately only if the server has already acknowledged all previous data from you (or this is your first contact with it in this session). So, as the server gets busier and slower to respond, in order to avoid swamping it with too many small packets, the data gets queued until either the outstanding data is acknowledged or a full-sized packet has accumulated.
So whether data gets sent immediately can only be determined in the context of previous messages, if any.
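To make the "enqueue if there is unconfirmed data, else send immediately" rule concrete, here is a toy model of the sender-side decision only. It is a sketch, not a real TCP stack: the class name NagleSender, the MSS value of 1460 bytes, and the simplification that one ACK covers everything outstanding are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

MSS = 1460  # assumed maximum segment size in bytes


@dataclass
class NagleSender:
    """Toy model of Nagle's send decision; ignores windows and timers."""
    unacked: int = 0                                   # bytes sent, not yet acknowledged
    buffer: bytearray = field(default_factory=bytearray)
    sent: list = field(default_factory=list)           # segments put "on the wire"

    def write(self, data: bytes) -> None:
        """The application hands over some bytes."""
        self.buffer += data
        self._try_send()

    def ack(self) -> None:
        """Simplification: the peer acknowledges everything outstanding."""
        self.unacked = 0
        self._try_send()

    def _try_send(self) -> None:
        # A full segment always goes out; a small one goes out only
        # when nothing is still in flight.
        while self.buffer and (len(self.buffer) >= MSS or self.unacked == 0):
            segment = bytes(self.buffer[:MSS])
            del self.buffer[:MSS]
            self.sent.append(segment)
            self.unacked += len(segment)
```

Typing c, a, t and return one keystroke at a time puts "c" on the wire at once, holds "at\r\n" in the buffer, and releases it as a single segment when the ACK for "c" comes back. On a fast link that ACK arrives within milliseconds, which is why the command still appears to run immediately.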

Read this post; it is quite in-depth and clarified a lot of things for me.

Related

How to manage repeated requests on a cached server while the result arrives

In the context of a highly requested web service written in Go, I am considering caching some computations. For that, I am thinking of using Redis.
My application is susceptible to receiving an avalanche of requests containing the same payload, each of which triggers a costly computation. A cache would pay off by allowing the result to be computed only once.
Consider the following figure extracted from here
I use this figure because I think it helps me illustrate the problem. The figure considers the two general cases: the book is in the cache, or it is not. However, the figure does not consider the transitory case when a book is being retrieved from the database and other "get-same-book" requests arrive. In this case, I would like to queue the repeated requests temporarily until the book is retrieved. Next, once the book has already arrived, the queued requests are replied with the result which would remain in the cache for fast retrieval of future requests.
So my question asks for approaches to implementing this requirement. I'm considering using a kind of table on the server (repository) that records the status of a database query (computing, ready), but this seems a little complicated, because I would need to handle some race conditions.
So I would like to know if anyone knows this pattern or if Redis itself implements it in some way (I have not found it in my searches, but I suspect that using a Redis lock would be possible).
You can design it as you have described, but there are some things that are important.
Use a unique key
Use a unique key for each book, and if the book is ever changed, that key should also change. This design makes your step (6), saving the book in Redis, an idempotent operation (you can do it many times with the same result). So you will avoid any race condition with "get-same-book".
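One way to get a key that changes whenever the content changes is to derive it from the request payload itself. The following is a minimal sketch under that assumption; the function name cache_key and the "compute" prefix are made up for illustration, and it assumes the payload is JSON-serializable.

```python
import hashlib
import json


def cache_key(payload: dict, prefix: str = "compute") -> str:
    """Build a deterministic key from the payload (hypothetical helper)."""
    # Sorting keys makes the key independent of dict ordering.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return f"{prefix}:{digest}"
```

Two requests with the same payload map to the same key, so writing the result twice stores the same value under the same key, which is what makes the write idempotent.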
Idempotent requests OR asynchronous messages
I would like to queue the repeated requests temporarily until the book is retrieved. Next, once the book has already arrived, the queued requests are replied with the result
I would not recommend queueing requests as you describe. If the request is a cache miss, let it retrieve the data from the database, but design the operation to be idempotent. Alternatively, you could handle all requests asynchronously and use a message queue, e.g. NATS or RabbitMQ, but the complexity grows with that solution.
Serializing requests
My problem is that while that second of computation where the result is not still gotten, too many repeated requests can arrive and due to the cost I need to avoid to repeat their computations. I need to find a way of retaining them while the result of the first request arrives.
It sounds like you want to have your computations serialized instead of doing them concurrently because you want to avoid doing the same computation twice. To solve this, you should let the requests initiate the computation, e.g. by putting the input on a queue, then do the computations in serial order (but still possibly concurrently if they have different keys), and finally notify the client or, as a better solution, have the client subscribe for updates.
Redis does have support for Pub/Sub, but whether it fits depends on what requirements you have on the clients. I would recommend a solution without locks, for scalability.
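For comparison, the behaviour the question asks for (hold repeated requests while the first one computes, then answer them all) is sometimes called "single flight", and within a single process it can be sketched quite compactly. Note that this is a different technique from the queue or Pub/Sub approach above: it does use a small in-process lock, it does not coordinate across several server instances, and the class name SingleFlight and its method do are assumptions for illustration.

```python
import threading
from concurrent.futures import Future


class SingleFlight:
    """Coalesce concurrent calls for the same key into one computation."""

    def __init__(self):
        self._lock = threading.Lock()
        self._in_flight = {}  # key -> Future shared by all waiting callers

    def do(self, key, compute):
        with self._lock:
            fut = self._in_flight.get(key)
            leader = fut is None
            if leader:
                fut = Future()
                self._in_flight[key] = fut
        if not leader:
            # Someone else is already computing this key: wait for it.
            return fut.result()
        try:
            fut.set_result(compute())
        except Exception as exc:
            fut.set_exception(exc)
        finally:
            with self._lock:
                del self._in_flight[key]
        return fut.result()
```

The lock is held only while looking up or registering the key, never during the computation itself. In a real service the leader would also write the result to the cache before returning, so that later requests never reach this path.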

What additional overheads are there to sending a packet over a websocket connection?

When performing AJAX requests, I have always tried to do as few as possible since there is an overhead to each request having to open the http connection to send the data. Since a websocket connection is constantly open, is there any cost outside of the obvious packet bandwidth to sending a request?
For example, over the space of 1 minute, a client will send 100kb of data to the server. Assuming the client does not need a response to any of these requests, is there any advantage to queuing packets and sending them in one big burst vs sending them as they are ready?
In other words, is there an overhead to the stopping and starting data transfer for a connection that is constantly open?
I want to make a multiplayer browser game as real time as possible, but I don't want to find that 100s of tiny requests per minute compared to a larger consolidated request is causing the server additional stress. I understand that if the client needs a response it will be slower as there is a lot of waiting from the back and forth. I will consider this and only consolidate when it is appropriate. The more smaller requests per minute, the better user experience, but I don't know what toll it will have on the server.
You are correct that a webSocket message will have lower overhead for a given message transmission than sending the same message via an Ajax call because the webSocket connection is already established and because a webSocket message has lower overhead than an HTTP request.
First off, there's always less overhead in sending one larger transmission vs. sending lots of smaller transmissions. That's just the nature of TCP. Every TCP packet gets separately processed and acknowledged so sending more of them costs a bit more overhead. Whether that difference is relevant or significant and worth writing extra code for or worth sacrificing some element of your user experience (because of the delay for batching) depends entirely upon the specifics of a given situation.
Since you've described a situation where your client gets the best experience if there is no delay and no batching of packets, then it seems that what you should do is not implement the batching and test out how your server handles the load with lots of smaller packets when it gets pretty busy. If that works just fine, then stay with the better user experience. If you have issues keeping up with the load, then seriously profile your server and find out where the main bottleneck to performance is (you will probably be surprised about where the bottleneck actually is as it is often not where you think it will be - that's why you have to profile and measure to know where to concentrate your energy for improving the scalability).
FYI, because most TCP implementations include Nagle's algorithm, the TCP stack itself does small amounts of batching for you if you are sending multiple requests fairly closely spaced in time or if sending over a slower link.
It's also possible to implement a dynamic system where as long as your server is able to keep up, you keep with the smaller and more responsive packets, but if your server starts to get busy, you start batching in order to reduce the number of separate transmissions.
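A rough sketch of that dynamic approach is below. Everything in it is an assumption for illustration: the class name AdaptiveBatcher, the is_busy callback (how you detect a busy server is up to you, e.g. from response times), and the max_batch limit.

```python
class AdaptiveBatcher:
    """Send at once while the server keeps up; batch when it is busy."""

    def __init__(self, send, is_busy, max_batch=10):
        self._send = send            # callable that transmits a list of messages
        self._is_busy = is_busy      # callable returning True when the server struggles
        self._max_batch = max_batch
        self._pending = []

    def submit(self, message):
        self._pending.append(message)
        # Flush immediately when the server is fine, or when the batch is full.
        if not self._is_busy() or len(self._pending) >= self._max_batch:
            self.flush()

    def flush(self):
        if self._pending:
            batch, self._pending = self._pending, []
            self._send(batch)
```

A real version would also call flush() from a timer, so that a half-full batch does not wait forever while the server is busy and no new messages arrive.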

ZeroMQ pattern for load balancing work across workers based on idleness

I have a single producer and n workers that I only want to give work to when they're not already processing a unit of work and I'm struggling to find a good zeroMQ pattern.
1) REQ/REP
The producer is the requestor and creates a connection to each worker. It tracks which worker is busy and round-robins to idle workers
Problem:
How to be notified of responses and still able to send new work to idle workers without dedicating a thread in the producer to each worker?
2) PUSH/PULL
Producer pushes into one socket that all workers feed off, and workers push into another socket that the producer listens to.
Problem:
Has no concept of worker idleness, i.e. work gets stuck behind long units of work
3) PUB/SUB
Non-starter, since there is no way to make sure work doesn't get lost
4) Reverse REQ/REP
Each worker is the REQ end and requests work from the producer and then sends another request when it completes the work
Problem:
Producer has to block on a request for work until there is work (since each recv has to be paired with a send). This prevents workers from responding with work completion
Could be fixed with a separate completion channel, but the producer still needs some polling mechanism to detect new work and stay on the same thread.
5) PAIR per worker
Each worker has its own PAIR connection allowing independent sending of work and receipt of results
Problem:
Same problem as REQ/REP with requiring a thread per worker
As much as zeroMQ is non-blocking/async under the hood, I cannot find a pattern that allows my code to be asynchronous as well, rather than blocking in many many dedicated threads or polling spin-loops in fewer. Is this just not a good use case for zeroMQ?
Your problem is solved with the Load Balancing Pattern in the ZMQ Guide. It's all about flow control whilst also being able to send and receive messages. The producer will only send work requests to idle workers, whilst the workers are able to send and receive other messages at all times, e.g. abort, shutdown, etc.
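The core idea of that pattern, as I understand it, is that workers announce when they are ready and the producer keeps a queue of idle workers, handing out work only when both a job and an idle worker exist. The sketch below shows just that bookkeeping in plain Python with no sockets; the class and method names are hypothetical and none of this is ZeroMQ API.

```python
from collections import deque


class IdleDispatcher:
    """Bookkeeping for 'only give work to idle workers' (no networking)."""

    def __init__(self):
        self._idle = deque()      # worker ids that have asked for work
        self._backlog = deque()   # jobs waiting for a free worker
        self.assignments = []     # (worker_id, job) pairs in dispatch order

    def worker_ready(self, worker_id):
        """Called when a worker starts up or finishes its current job."""
        self._idle.append(worker_id)
        self._dispatch()

    def submit(self, job):
        """Called when the producer has a new unit of work."""
        self._backlog.append(job)
        self._dispatch()

    def _dispatch(self):
        # Pair jobs with workers only while both are available.
        while self._idle and self._backlog:
            self.assignments.append(
                (self._idle.popleft(), self._backlog.popleft())
            )
```

Because both events (a worker becoming ready, a job arriving) funnel into the same dispatch step, a single thread polling one socket for worker messages can drive the whole thing.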
Push/Pull is your answer.
When you send a message in ZeroMQ, all that happens initially is that it sits in a queue waiting to be delivered to the destination(s). When it has been successfully transferred it is removed from the queue. The queue is limited in length, and the limit can be set by changing a socket's high water mark.
There is a/some background thread(s) that manage all this on your behalf, and your calls to the ZeroMQ API are simply issuing instructions to that/those threads. The threads at either end of a socket connection are collaborating to marshall the transfer of messages, i.e. a sender won't send a message unless the recipient can receive it.
Consider what this means in a push/pull set up. Suppose one of your pull workers is falling behind. It won't then be accepting messages. That means that messages being sent to it start piling up until the high water mark is reached. ZeroMQ will no longer send messages to that pull worker. In fact AFAIK in ZeroMQ, a pull worker whose queue is more full than those of its peers will receive fewer messages, so the workload is evened out across all workers.
So What Does That Mean?
Just send the messages. Let 0MQ sort it out for you.
Whilst there's no explicit flag saying 'already busy', if messages can be sent at all then that means that some pull worker somewhere is able to receive it solely because it has kept up with the workload. It will therefore be best placed to process new messages.
There are limitations. If all the workers are full up then no messages are sent and you get blocked in the push when it tries to send another message. You can discover this only (it seems) by timing how long the zmq_send() took.
Don't Forget the Network
There's also the matter of network bandwidth to consider. Messages queued in the push will transfer at the rate at which they're consumed by the recipients, or at the speed of the network (whichever is slower). If your network is fundamentally too slow, then it's the Wrong Network for the job.
Latency
Of course, messages piling up in buffers represents latency. This can be restricted by setting the high water mark to be quite low.
This won't cure a high latency problem, but it will allow you to find out that you have one. If you have an inadequate number of pull workers, a low high water mark will result in message sending failing/blocking sooner.
Actually I think in ZeroMQ it blocks for push/pull; you'd have to measure elapsed time in the call to zmq_send(), or pass the ZMQ_DONTWAIT flag and check for an EAGAIN error, to discover whether things had got bottled up.
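The effect of a low high water mark can be illustrated without ZeroMQ at all. In the sketch below Python's standard queue.Queue stands in for a socket's outgoing buffer and maxsize plays the role of the high water mark. This is only an analogy, not the ZeroMQ API, and try_send is a hypothetical helper.

```python
import queue


def try_send(outbox: queue.Queue, message) -> bool:
    """Return False instead of blocking when the bounded buffer is full."""
    try:
        outbox.put_nowait(message)
        return True
    except queue.Full:
        # The consumer is not keeping up: the caller learns this right away
        # rather than the message silently adding to latency in a buffer.
        return False
```

With a small maxsize the sender learns after only a few messages that the consumers have fallen behind; with a large one the same shortfall shows up much later, as latency.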
Thought about Nanomsg?
Nanomsg is a reboot of ZeroMQ; one of the same guys is involved. There are many things I prefer about it, and ultimately I think it will replace ZeroMQ. It has some fancier patterns which are more universally usable (PAIR works on all transports, unlike in ZeroMQ). Also, the patterns are essentially a pluggable component in the source code, so it is far simpler for patterns to be developed and integrated than in ZeroMQ. There is a discussion on the differences here
Philosophical Discussion
Actor Model
ZeroMQ is definitely in the realms of Actor Model programming. Messages get stuffed into queues / channels / sockets, and at some undetermined point in time later they emerge at the recipient end to be processed.
The danger of this type of architecture is that a system can harbour a potential deadlock without anyone knowing it.
Suppose you have a system where messages pass both ways down a chain of processes, say instructions in one way and results in the other. It is possible that one of the processes will be trying to send a message whilst the recipient is actually also trying to send a message back to it.
That only works so long as the queues aren't full and can (temporarily) absorb the messages, allowing everyone to move on.
But suppose the network briefly became a little busy for some reason, and that delayed message transfer. The message send might then fail because the high water mark had been reached. Whoops! No one is then sending anything to anyone anymore!
CSP
A development of the Actor Model, called Communicating Sequential Processes, was invented to solve this problem. It has a restriction; there is no buffering of messages at all. No process can complete sending a message until the recipient has received all the data.
The theoretical consequence of this was that it was then possible to mathematically analyse a system design and pronounce it to be free of deadlock. The practical consequence is that if you've built a system that can deadlock, it will do so every time. That's actually not so bad; it'll show up in testing, not post-deployment.
Curiously this is hinted at in the documentation of Microsoft's Task Parallel Library, where they advocate setting buffer lengths to zero in the interests of achieving a more robust application.
It'd be like setting the ZeroMQ high water mark to zero, but in zmq_setsockopt() a high water mark of 0 means "no limit", not nought. The default is non-zero...
CSP is much more suited to real time applications. Any shortage of available workers immediately results in an inability to send messages (so your system knows it's failed to keep up with the real time demand) instead of resulting in an increased latency as data is absorbed by sockets, etc. (which is far harder to discover).
Unfortunately almost every communications technology we have (Ethernet, TCP/IP, ZeroMQ, nanomsg, etc) leans towards Actor Model. Everything has some sort of buffer somewhere, be it a packet buffer on a NIC or a socket buffer in an operating system.
Thus to implement CSP in the real world one has to implement flow control on top of the existing transports. This takes work, and it's slightly inefficient. But if a system needs it, it's definitely the way to go.
Personally I'd love to see 0MQ and Nanomsg adopt it as a behavioural option.

Why is it legit to use no-op to fill gaps between paxos events?

I am learning Paxos algorithm (http://research.microsoft.com/en-us/um/people/lamport/pubs/paxos-simple.pdf) and there is one point I do not understand.
We know that events follow a time order, and it can happen that, say, events 1-5 and 10 are decided, but 6-9 and 11 onwards are not yet decided. In the paper above, it says we simply fill in the gap at 6-9 with no-op values, and simply record new events from 11 on.
So in this case, since event 10 is already recorded, we know some kinds of events must have happened between 5 and 10 but were not recorded by Paxos due to some failures. If we simply fill in no-op values, these events will be lost in our recording.
Even worse, if, as the paper I linked above says, events are in fact commands from the client, then missing a few commands in the middle might make the entire set of operations illegal (if none of the commands can be skipped or the order of them matters).
So why is it still legit for Paxos to fill no-op values for gaps between events? (The entire set of records might be invalid because of the no-op values, as I worried above.) Also, is there a better way to recover from such gaps instead of using no-op?
This is a multi-part answer.
Proposing no-op values is the way to discover commands that haven't got to the node yet. We don't simply fill each slot in the gap with a no-op command: we propose each slot is filled with a no-op. If any of the peers have accepted a command already, it will return that command in the Prepare-ack message and the proposer will use that command in the Accept round instead of the no-op.
For example, assume a node was behind a temporary network partition and was unable to play with the others for slots 6-9. It knows it missed out upon learning the command in slot 10. It then proposes no-ops to learn what was decided in those slots.
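Here is a small in-memory sketch of that gap-filling step for a single slot. It is a simplification with no networking, one set of acceptors per slot, and made-up names (Acceptor, fill_slot, NOOP), but it shows the key rule: the proposer offers a no-op, and if any acceptor in the quorum reports a previously accepted command, that command is used instead.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

NOOP = "no-op"


@dataclass
class Acceptor:
    promised: int = -1                            # highest ballot promised
    accepted: Optional[Tuple[int, str]] = None    # (ballot, command) or None

    def prepare(self, ballot: int):
        """Phase 1: promise, and report any command already accepted."""
        if ballot > self.promised:
            self.promised = ballot
            return True, self.accepted
        return False, None

    def accept(self, ballot: int, value: str) -> bool:
        """Phase 2: accept unless a higher ballot was promised."""
        if ballot >= self.promised:
            self.promised = ballot
            self.accepted = (ballot, value)
            return True
        return False


def fill_slot(acceptors, ballot: int) -> Optional[str]:
    """Propose NOOP for one slot; return the chosen value, or None if no quorum."""
    quorum = len(acceptors) // 2 + 1
    replies = [a.prepare(ballot) for a in acceptors]
    promises = [acc for ok, acc in replies if ok]
    if len(promises) < quorum:
        return None
    prior = [acc for acc in promises if acc is not None]
    # Adopt the command with the highest ballot if one exists, else the no-op.
    value = max(prior, key=lambda p: p[0])[1] if prior else NOOP
    acks = sum(a.accept(ballot, value) for a in acceptors)
    return value if acks >= quorum else None
```

So a slot ends up as a no-op only when the quorum that answered had accepted no command for it, and any command that was actually chosen is always rediscovered, because every quorum overlaps the one that chose it.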
Practical implementations also have an out-of-band learning protocol to learn lots of transitions in bulk.
A command isn't a command until it is fully decided; until then it is just a proposed command. Paxos is about choosing between contending commands from multiple clients. Clients must be prepared to have their commands rejected because another client's was chosen instead.
Practical implementations are all about choosing the order of client commands. Their world view is that of a write-ahead log, and they are placing the commands in that log. They retry in the next slot if their command wasn't chosen. (There are many ways to reduce the contention; Lamport mentions forwarding requests to a leader, such as is done in Multi-Paxos.)
Practical systems also have some means to know if the command is invalid before proposing it; such as knowing a set of reads and a set of writes. This is important for two reasons. First, it's an asynchronous, multi-client system and anything could have changed by the time the client's command has reached the server. Second, if two concurrent commands do not conflict then both should be able to succeed.
The system model allows commands (messages) to be lost by the network anyway. If a message is lost, the client is expected to eventually retry the request; so it is fine to drop some of them. If the commands of a client have to be executed in client order, then either the client only sends commands synchronously, or the commands have to be ordered at a higher level in the library and kept in some client-session object before being executed.
AFAIK the Zab protocol guarantees client-order, if you don't want to implement that at a higher level.

Com port queue latency metering

I have two programs (host and slave) communicating over a com port. In the simplest case, the host sends a command to the slave and waits for a response, then does it again. But this means that each side has to wait for the other for every transaction. So I use a queue so the second command can be sent before the first response comes back. This keeps things flowing faster.
But I need a way of metering the use of the queue so that there are never more than N command/response pairs en route at any time. So for example if N is 3, I will wait to send the fourth command until I get the first response back, etc. And it must keep track of which response goes with which command.
One thought I had is to tag each command with an integer modulo counter which is also returned with the response. This would ensure that the command and response are always paired correctly and I can do a modulo compare to be able to meter the commands always N ahead of the responses.
What I am wondering is: is there a better way? Isn't this a somewhat common thing to do?
(I am using Python, but that is not important.)
Using a sequence number and modulo arithmetic is in fact quite a common way to both acknowledge messages received and tell the sender when it can send more messages - see e.g. http://en.wikipedia.org/wiki/Sliding_window_protocol. Unfortunately for you, the obvious example, TCP, is unusual in that it uses a sequence number based on byte counts, not message counts, but the principle is much the same - TCP just has an extra degree of flexibility.
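Since the question mentions Python, here is a minimal sketch of the host side of such a scheme. The class name WindowedSender and its methods are made up for illustration, the actual serial I/O is left out, and it assumes the modulus is larger than the window so that the modulo difference is unambiguous.

```python
class WindowedSender:
    """Track at most `window` outstanding commands using modulo tags."""

    def __init__(self, window=3, modulus=8):
        if modulus <= window:
            raise ValueError("modulus must be larger than the window")
        self.window = window
        self.modulus = modulus
        self.next_seq = 0    # tag for the next command to send
        self.oldest = 0      # tag of the oldest unanswered command
        self.pending = {}    # tag -> command awaiting its response

    def in_flight(self) -> int:
        # Modulo difference still works after the counter wraps around.
        return (self.next_seq - self.oldest) % self.modulus

    def can_send(self) -> bool:
        return self.in_flight() < self.window

    def send(self, command) -> int:
        """Register a command and return the tag to transmit with it."""
        if not self.can_send():
            raise RuntimeError("window full: wait for a response first")
        seq = self.next_seq
        self.pending[seq] = command
        self.next_seq = (seq + 1) % self.modulus
        return seq

    def on_response(self, seq: int):
        """Match a response tag to its command and slide the window."""
        command = self.pending.pop(seq)
        while self.oldest != self.next_seq and self.oldest not in self.pending:
            self.oldest = (self.oldest + 1) % self.modulus
        return command
```

The host loop would call can_send() before writing each command to the port and on_response() for each tagged reply it reads back.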
