IO callbacks on goroutine - go

I'm a beginner at golang. Looking at all the golang tutorials, it looks like you should create goroutines for everything. Coming from something like libuv in C, where you can define callbacks for socket reads/writes on a single thread, is the right way to achieve that in golang to create nested goroutines for any I/O tasks needed?
As an example, take something like nginx where a single thread will handle multiple connections. To do something like that in golang, we would need a goroutine for every connection?

Go stands out among tools for writing networked services precisely because I/O awareness is integrated right into the runtime scheduler that powers every running Go program.
The basic idea is roughly this: a goroutine performs normal, sequential, callback-free operations on sockets (that is, plain reads and plain writes). As soon as the next I/O operation would block (yes, the relevant syscall on a Unix-like kernel returns EWOULDBLOCK), the goroutine is suspended and its socket is handed to a component of the runtime called the "netpoller", which is implemented using the platform-native socket I/O multiplexer such as epoll, kqueue or IOCP. The OS thread the goroutine was running on is handed off to another goroutine which wants to run. As soon as the netpoller signals that the I/O which caused the goroutine to suspend can proceed, the scheduler queues that goroutine for execution, and it continues to run exactly where it left off.
Because of this, the usual model employed when writing networking services in Go is to have one goroutine per socket. When you're writing a plain TCP server, you create that goroutine yourself and hand it the socket returned by the listener once it has accepted a client's connection.
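To make the one-goroutine-per-connection model concrete, here is a minimal sketch of a line-echo TCP server. The names handleConn, serve and echoOnce are made up for illustration, and echoOnce exists only so that the program acts as its own client and terminates; a real server would simply call serve on a fixed address.

```go
package main

import (
	"bufio"
	"fmt"
	"log"
	"net"
)

// handleConn serves one client with plain, sequential reads and writes.
// When a read or write would block, the runtime parks this goroutine and
// runs another one on the same OS thread.
func handleConn(conn net.Conn) {
	defer conn.Close()
	scanner := bufio.NewScanner(conn)
	for scanner.Scan() {
		// Echo each received line back to the client.
		if _, err := fmt.Fprintf(conn, "echo: %s\n", scanner.Text()); err != nil {
			return
		}
	}
}

// serve accepts connections and starts one goroutine per connection.
func serve(ln net.Listener) {
	for {
		conn, err := ln.Accept()
		if err != nil {
			return // listener was closed
		}
		go handleConn(conn)
	}
}

// echoOnce (hypothetical helper) starts the server on a free loopback port,
// sends one line as a client and returns the server's reply.
func echoOnce(msg string) (string, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0") // port 0: the OS picks a free port
	if err != nil {
		return "", err
	}
	defer ln.Close()
	go serve(ln)

	conn, err := net.Dial("tcp", ln.Addr().String())
	if err != nil {
		return "", err
	}
	defer conn.Close()
	if _, err := fmt.Fprintln(conn, msg); err != nil {
		return "", err
	}
	return bufio.NewReader(conn).ReadString('\n')
}

func main() {
	reply, err := echoOnce("hello")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Print(reply) // prints "echo: hello"
}
```

Note that handleConn contains no callbacks and no explicit non-blocking calls; the multiplexing happens inside the runtime.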
net/http.Server has this behaviour built-in as it creates a goroutine to serve each incoming client request (actually, for HTTP/1.x, two or even three goroutines are created per connection, but it's invisible to HTTP request handlers).
Now, we've just covered the basics. Of course, there might be legitimate reasons to have extra goroutines handle tasks that need to be carried out to complete a request, and that's what @Volker referred to.
More info:
"What color is your function?" — a classical essay dealing with I/O multiplexing implemented as a library vs it being implemented in the core.
"Go's work-stealing scheduler"; also see this and this and this design doc.
The State Threads library, which implements an approach quite similar to that of Go, just at a much lower level. Its documentation is quite insightful about the approach implemented in Go.
libtask is a much more recent stab at the same problem, by one of Go's creators.


boost asio concurrent async_read and async_write

Looking at the documentation it looks like the TCP socket object is not thread-safe. So I cannot issue async_read from one thread and async_write concurrently from another thread? Also I would guess it applies to boost::asio::write() as well?
Can I issue write() - synchronous, while I do async_read from another thread?
If that is not safe, then the only way is probably to get the socket's native handle and use synchronous Linux mechanisms to achieve concurrent reads and writes. I have an application where the reads and writes are actually independent.
It is thread-safe for the use-cases you listed. You can read in one thread, and write in another. And you can use the synchronous as well as asynchronous operations for that.
You will, however, run into problems if you try to do one kind of operation (e.g. reads) from more than one thread, especially if you are using the free-standing/composed operations (boost::asio::read(socket) instead of socket.read_some()). The reason is that only the primitive operations are atomic/thread-safe, and the composed operations work by calling into the primitives multiple times.

Synchronous vs Asynchronous socket reads

Most example apps I come across for receiving data are using async calls. For instance, c++ examples use boost asio services to bind message handlers to callbacks. But what about an app that only needs to listen to data from a single socket and process the messages in order? Would it be faster to have a loop that polls/recv's from the socket and calls the handler without using a callback (assume main and logging threads are separate)? Or is there no performance difference (assume messages are coming in as fast as the network card and kernel can handle them)?
There are many intricacies I don't know, such as the impact of callbacks on performance due to things like branch prediction, or whether there will be a performance penalty if the callbacks hand the processing off to a different thread. I'm curious to hear some thoughts, experiences and dialog on this subject, to save myself from attempting both implementations to discover the answer.

Is a blocking function on an asynchronous api idiomatic?

Is it more idiomatic to have an async api, with a blocking function as the synchronous api that simply calls the async api and waits for an answer before returning, rather than using a non-concurrent api and let the caller run it in their own goroutine if they want it async?
In my current case I have a worker goroutine that reads from a request channel and sends the return value down the response channel (that it got in a request struct from the request channel).
This seems to differ from the linked question since I need the return values, or to synchronize so that I can be sure the api call finishes before I do something else, to avoid race conditions.
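For reference, the setup described above could look roughly like this. It is only a sketch: the request type, the squaring "work" and the names squareAsync and square are made up for illustration. The blocking function is a thin wrapper that calls the asynchronous one and waits on the reply channel.

```go
package main

import "fmt"

// request carries the argument and the channel on which the worker replies.
type request struct {
	n     int
	reply chan int
}

// worker serves requests one at a time from the request channel.
func worker(requests <-chan request) {
	for req := range requests {
		req.reply <- req.n * req.n // hypothetical work: square the input
	}
}

// squareAsync is the asynchronous API. It returns once the worker has
// accepted the request; the result arrives later on the returned channel.
func squareAsync(requests chan<- request, n int) <-chan int {
	reply := make(chan int, 1) // buffered, so the worker never blocks on the reply
	requests <- request{n: n, reply: reply}
	return reply
}

// square is the blocking wrapper: call the async API and wait for the answer.
func square(requests chan<- request, n int) int {
	return <-squareAsync(requests, n)
}

func main() {
	requests := make(chan request)
	go worker(requests)

	pending := squareAsync(requests, 3) // the caller keeps going in the meantime
	fmt.Println(square(requests, 4))    // prints 16
	fmt.Println(<-pending)              // prints 9
	close(requests)
}
```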
For golang, I recommend the Concurrency section of Effective Go. In particular, I think everyone using golang needs to know the basics of goroutines and parallelization:
Goroutines are multiplexed onto multiple OS threads so if one should block, such as while waiting for I/O, others continue to run. Their design hides many of the complexities of thread creation and management.
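The current implementation of the Go runtime dedicates only a single core to user-level processing. An arbitrary number of goroutines can be blocked in system calls, but by default only one can be executing user-level code at any time. (This passage describes Go before version 1.5. Since Go 1.5, GOMAXPROCS defaults to the number of available CPU cores, so multiple goroutines can execute user-level code in parallel by default.)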

UDP Server Thread Sleeping

We have a server that needs 1 UDP connection for each gameplay area, and these each run on their own thread.
We are using C++.
We are using non-blocking sockets with recvfrom. The first thing checked in the "read" function is whether the recvfrom "in" buffer contains NULL after the call, and then whether the error is WSAEWOULDBLOCK.
If the error is found, the function returns and the thread is put to sleep for 1ms (but really, it's longer).
If there is data, it is processed. Some paths lead to immediate processing, but in most cases the data is put into a queue for the game area's main thread to handle.
My question: Is there a more efficient and better-performing method than using thread.sleep(1) to ensure each gameplay area's UDP server instance does not spin while there is nothing to receive, and yet is able to respond to packets faster than the inherent and random thread wake-up of the scheduler?
In this last part of the requirement, I'm referring to the fact that a thread will usually never sleep only 1ms, rather, on average more like 50ms.
The case may arise, later when the server is being sent requests at a constant rate, that the loop to check and respond to packets is never empty, and so the thread.sleep(1) will never be reached, so I suppose this is more a Best Practice type of question, but I would implement a better solution if one is available.
Thank you
Edit- added info. After adding this, perhaps this implementation isn't anything to worry about. I think worst case scenario is a set of packets would have to wait the 45-55ms for the thread to be scheduled should they miss the opportunity to be read by the socket.
I suppose that, to improve this, I could give the recvfrom call its own thread, make the socket blocking, and use a condition variable to wake the thread responsible for processing the packets. What do you think about this idea? Too much overhead?
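The idea in the last paragraph (a dedicated reader that blocks on the socket and wakes the processing side) can be sketched as follows. This is a Go analogue rather than the C++/Winsock code in question: a goroutine stands in for the reader thread and a channel stands in for the queue plus condition variable. The packet type and the names readLoop and receiveOne are made up for illustration, and receiveOne exists only so the sketch sends itself one datagram and terminates.

```go
package main

import (
	"errors"
	"fmt"
	"log"
	"net"
	"time"
)

// packet is a hypothetical message type handed to the processing side.
type packet struct {
	from net.Addr
	data []byte
}

// readLoop blocks in ReadFrom until a datagram arrives, so it neither spins
// nor sleeps, then hands the datagram over through the channel.
func readLoop(conn net.PacketConn, out chan<- packet) {
	buf := make([]byte, 65535)
	for {
		n, addr, err := conn.ReadFrom(buf)
		if err != nil {
			return // socket was closed
		}
		data := make([]byte, n)
		copy(data, buf[:n]) // copy, because buf is reused for the next datagram
		out <- packet{from: addr, data: data}
	}
}

// receiveOne (hypothetical helper) opens a UDP socket on a free loopback
// port, sends it one datagram and returns what the processing side received.
func receiveOne(msg string) (string, error) {
	conn, err := net.ListenPacket("udp", "127.0.0.1:0")
	if err != nil {
		return "", err
	}
	defer conn.Close()

	packets := make(chan packet, 64) // queue between reader and processing side
	go readLoop(conn, packets)

	client, err := net.Dial("udp", conn.LocalAddr().String())
	if err != nil {
		return "", err
	}
	defer client.Close()
	if _, err := client.Write([]byte(msg)); err != nil {
		return "", err
	}

	select {
	case p := <-packets: // wakes when the reader delivers, no polling interval
		return string(p.data), nil
	case <-time.After(2 * time.Second):
		return "", errors.New("no datagram received")
	}
}

func main() {
	got, err := receiveOne("move 3 4")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(got) // prints "move 3 4"
}
```

The point of the design is that the reader waits in the kernel, not in a sleep loop, so wake-up latency is not tied to a sleep granularity.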

How does a non-forking web server work?

Non-forking (aka single-threaded or select()-based) web servers like lighttpd or nginx are becoming more and more popular.
While there is a multitude of documents explaining forking servers (at various levels of detail), documentation for non-forking servers is sparse.
I am looking for a bird's-eye view of how a non-forking web server works. (Pseudo-)code or a state machine diagram, stripped down to the bare minimum, would be great.
I am aware of the following resources and found them helpful.
World of SELECT()
source code
internal states
However, I am interested in the principles, not implementation details.
Why is this type of server sometimes called non-blocking, when select() essentially blocks?
Processing of a request can take some time. What happens with new requests during this time when there is no specific listener thread or process? Is the request processing somehow interrupted or time sliced?
As I understand it, while a request is processed (e.g. a file read or a CGI script run) the server cannot accept new connections. Wouldn't this mean that such a server could miss a lot of new connections if a CGI script runs for, let's say, 2 seconds or so?
Basic pseudocode:
while true
    with fd needing action do
        read/write fd
        if fd was read and well formed request in buffer
            service request
    other stuff
Though select() & friends block, socket I/O is not blocking. You're only blocked until you have something fun to do.
Processing an individual request normally involves reading from a file descriptor for a file (static resource) or a process (dynamic resource) and then writing to the socket. This can be done handily without keeping much state.
So service request above typically means opening a file, adding it to the list for select, and noting that stuff read from there goes out to a certain socket. Substitute FastCGI for file when appropriate.
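To show the per-descriptor bookkeeping in the pseudocode above, here is a small simulation in Go. It is a sketch under stated assumptions, not a real server: there are no sockets and no select() call. Readiness notifications are modelled as a fixed slice of events, a "well formed request" is assumed to be one newline-terminated line, and the names event and runLoop are made up for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// event simulates one readiness notification from select(): descriptor fd
// has these bytes available to read without blocking.
type event struct {
	fd   int
	data string
}

// runLoop is the single-threaded loop. It keeps a small buffer per
// descriptor and services a request only once a complete line has arrived,
// so a slow client never stalls the others.
func runLoop(events []event) []string {
	buffers := map[int]string{} // per-connection state: bytes read so far
	var responses []string
	for _, ev := range events { // stands in for "while true: wait in select()"
		buffers[ev.fd] += ev.data // read whatever is available right now
		for {
			i := strings.IndexByte(buffers[ev.fd], '\n')
			if i < 0 {
				break // request incomplete: go back to waiting
			}
			line := buffers[ev.fd][:i]
			buffers[ev.fd] = buffers[ev.fd][i+1:]
			// "service request": here we only record a response.
			responses = append(responses, fmt.Sprintf("fd %d: served %q", ev.fd, line))
		}
	}
	return responses
}

func main() {
	// Two clients whose requests arrive interleaved and in fragments.
	events := []event{
		{fd: 4, data: "GET /a"},
		{fd: 5, data: "GET /b\n"},
		{fd: 4, data: "\n"},
	}
	for _, r := range runLoop(events) {
		fmt.Println(r)
	}
	// prints:
	// fd 5: served "GET /b"
	// fd 4: served "GET /a"
}
```

Client 5 is served first even though client 4 connected earlier, because the loop never waits for any single client to finish sending.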
Not sure about the others, but nginx runs a master process and one or more worker processes. The master opens the listening sockets and manages the workers; each worker accepts connections on those sockets and processes them in its own event loop.
select() PLUS nonblocking I/O essentially allows you to manage/respond to multiple connections as they come in a single thread (multiplexing), versus having multiple threads/processes handle one socket each. The goal is to minimize the ratio of server footprint to number of connections.
It is efficient because a single thread can keep a large number of socket connections active before it reaches saturation, since non-blocking I/O lets it service many file descriptors without waiting on any one of them.
The rationale is that it takes very little time to acknowledge bytes are available, interpret them, then decide on the appropriate bytes to put on the output stream. The actual I/O work is handled without blocking this server thread.
This type of server is always waiting for a connection, by blocking on select(). Once it gets one, it handles the connection, then revisits the select() in an infinite loop. In the simplest case, this server thread does NOT block any other time besides when it is setting up the I/O.
If there is a second connection that comes in, it will be handled the next time the server gets to select(). At this point, the first connection could still be receiving, and we can start sending to the second connection, from the very same server thread. This is the goal.
Search for "multiplexing network sockets" for additional resources.
Or try Unix Network Programming by Stevens, Fenner, Rudoff
