I'm building a high-concurrency server that needs to handle tens of thousands of active sockets. I initially used epoll to build an event-based server and it worked well at moderate scale (several thousand active sockets), but it becomes unstable once concurrency goes beyond 10,000 sockets. So I'm considering libevent (or libev), since they are mature projects and claim to be able to "handle tens of thousands of active sockets".
I'm also thinking of using UDT, because it's "reliable UDP" and I'm starting to have problems with TCP's overhead and memory usage. So a natural thought is to use libevent as my event framework and UDT as the transport protocol. I know that UDT provides its own set of epoll operations. Does that mean it won't work with the regular Linux epoll? If so, it won't work with libevent or libev either, because on Linux they are built on top of epoll.
Has anyone worked with both UDT and epoll / libevent / libev? Can UDT work with any of them?
Any help would be appreciated.
Thanks.
UDT exposes its own epoll API, which lets you drive the protocol from an epoll-style event loop.
See http://udt.sourceforge.net/udt4/doc/epoll.htm for more information.
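For reference, here is a minimal sketch of that API as described in the UDT4 documentation linked above (error handling omitted; function and flag names are the ones the doc lists). Note that UDT's epoll_wait can watch UDT sockets and regular system descriptors in the same call, which is the intended way to bridge it with an existing descriptor-based loop:

    #include <udt.h>   // from the UDT4 distribution
    #include <set>

    void poll_once(UDTSOCKET u, SYSSOCKET s) {
        int eid = UDT::epoll_create();

        int events = UDT_EPOLL_IN | UDT_EPOLL_ERR;
        UDT::epoll_add_usock(eid, u, &events);   // watch a UDT socket
        UDT::epoll_add_ssock(eid, s, &events);   // watch an OS descriptor too

        std::set<UDTSOCKET> udt_readable;
        std::set<SYSSOCKET> sys_readable;
        // Wait up to 100 ms; the sets are filled with ready sockets on return.
        UDT::epoll_wait(eid, &udt_readable, NULL, 100, &sys_readable, NULL);

        UDT::epoll_release(eid);
    }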
After some research I figured out that UDT sockets are not file descriptors, so they can't be handled with the standard Linux epoll.
We are currently upgrading ZeroMQ from version 2.2.12 to version 4.0.5 and we are finding that the performance is considerably worse since we upgraded.
We have a fairly simple DEALER/DEALER topology with either 1:1 or 1:many connections. We are running a message pump on either end of the connection using polling.
We are using a number of different bindings (ffi-rzmq, clrzmq, jzmq and zmq_cpp) to allow components written in different languages to communicate. All of our components seem to suffer from the same performance problems.
We are running under Windows 7 using loopback (127.0.0.1) TCP sockets.
Has anyone got any ideas of what could be wrong (or what additional information I'd need to provide here)?
It turns out that the performance wasn't actually slow; messages between certain components were going missing, causing the system to misbehave.
This was caused by our use of unsupported socket pairings (DEALER / PUSH and DEALER / PULL) in certain parts of our system. These worked in ZeroMQ 2 but not in 4.
The fix was to replace them with a supported topology (DEALER / DEALER in our case).
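For anyone hitting the same thing, here is a minimal sketch of a DEALER/DEALER pair over loopback using the libzmq C API (the endpoint and payload are made up for illustration, not taken from our system):

    #include <zmq.h>
    #include <assert.h>
    #include <string.h>

    int main(void) {
        void *ctx = zmq_ctx_new();

        void *server = zmq_socket(ctx, ZMQ_DEALER);
        zmq_bind(server, "tcp://127.0.0.1:5555");

        void *client = zmq_socket(ctx, ZMQ_DEALER);
        zmq_connect(client, "tcp://127.0.0.1:5555");

        zmq_send(client, "ping", 4, 0);          // DEALER-to-DEALER is supported

        char buf[16];
        int n = zmq_recv(server, buf, sizeof(buf), 0);
        assert(n == 4 && memcmp(buf, "ping", 4) == 0);

        zmq_close(client);
        zmq_close(server);
        zmq_ctx_destroy(ctx);
        return 0;
    }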
I'm having problems scaling up an application that uses the actor model and ZeroMQ. To put it simply: I'm trying to create thousands of threads that communicate via sockets, similar to Erlang-style message passing. I'm not doing it for multicore/performance reasons, but because framing it this way gives me very clean code.
From a philosophical point of view it sounds as if this is what the zmq developers would like to achieve, e.g.
http://zeromq.org/whitepapers:multithreading-magic
However, it seems as if there are some practical limitations. At 1024 inproc sockets I start getting the "ZMQError: Too many open files" error. TCP gives me the typical "Assertion failed: fds.size () <= FD_SETSIZE" crash.
Why do inproc sockets have this limit?
To get it to work I've had to group together items to share a socket. Is there a better way?
Is zmq just the wrong tool for this kind of job? i.e. it's still more a network library than an actor message passing library?
ZMQ uses file descriptors as the "resource unit" for inproc connections. The OS sets a limit on file descriptors; you should be able to modify it (a quick Google search turns up several potential avenues for Windows), though I don't know what the performance impact might be.
It looks like this is because the ZMQ library uses portable C code for opening new files, rather than native Windows calls that don't suffer from the same limitation.
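A hedged sketch of the two knobs I would check first, assuming libzmq 3.2+ (which added zmq_ctx_set and caps each context at 1023 sockets by default) and a POSIX system for the fd limit (Windows needs a different mechanism for that part):

    #include <zmq.h>
    #include <sys/resource.h>   // POSIX only; not available on Windows

    int main(void) {
        // Knob 1: raise the OS per-process file-descriptor limit.
        struct rlimit rl = {16384, 16384};
        setrlimit(RLIMIT_NOFILE, &rl);

        // Knob 2: raise libzmq's own per-context socket cap
        // before any socket is created on the context.
        void *ctx = zmq_ctx_new();
        zmq_ctx_set(ctx, ZMQ_MAX_SOCKETS, 16384);

        /* ... create the inproc sockets as before ... */

        zmq_ctx_destroy(ctx);
        return 0;
    }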
Using Go's net/http server to handle connections, is there a pattern to better handle 10,000 keep-alive connections with relatively low requests per second each?
My benchmark performance with something like wrk is 50,000 requests per second, but with real traffic (from real-time bidding exchanges) I have a hard time beating 8,000 requests per second.
I know connection multiplexing from a hardware loadbalancer is possible, but it seems like the same type of pattern can be achieved in Go.
You can distribute load across local and remote servers using an IPC protocol like JSON-RPC over, e.g., UNIX or TCP sockets.
Related: Go Inter-Process Communication
As for the performance bottleneck: it has been discussed extensively on the go-nuts mailing list. At the time of writing, the main culprits are the runtime's goroutine scheduler and its stop-the-world garbage collector.
The core team has recently made major improvements to the runtime to alleviate this problem, yet there is still room for improvement. To quote one example:
Due to tighter coupling of the run-time and network libraries, fewer context switches are required on network operations.
I'm looking for a mechanism to use to create a simple many-to-many messaging system to allow Windows applications to communicate on a single machine but across sessions and desktops.
I have the following hard requirements:
Must work across all Windows sessions on a single machine.
Must work on Windows XP and later.
No global configuration required.
No central coordinator/broker/server.
Must not require elevated privileges from the applications.
I do not require guaranteed delivery of messages.
I have looked at many, many options. This is my last-ditch request for ideas.
The following have been rejected for violating one or more of the above requirements:
ZeroMQ: In order to do many-to-many messaging a central broker is required.
Named pipes: Requires a central server to receive messages and forward them on.
Multicast sockets: Requires a properly configured network card with a valid IP address, i.e. a global configuration.
Shared Memory Queue: To create shared memory in the global namespace requires elevated privileges.
Multicast sockets very nearly work. What else can anyone suggest? I'd consider anything from pre-packaged libraries to bare-metal Windows API functionality.
(Edit 27 September) A bit more context:
By 'central coordinator/broker/server', I mean a separate process that must be running at the time an application tries to send a message. The problem I see with this is that it is impossible to guarantee that this process really will be running when it is needed. Typically a Windows service would be used, but there is no way to guarantee that a particular service will always be started before any user has logged in, or that it has not been stopped for some reason. Running it on demand introduces a delay while the service starts when the first message is sent, and raises privilege issues.
Multicast sockets nearly worked because they completely avoid the need for a central coordinator process and do not require elevated privileges from the applications sending or receiving multicast packets. But you have to have a configured IP address: you can't do multicast on the loopback interface (even though multicast with TTL=0 on a configured NIC behaves as one would expect of loopback multicast), and that is the deal-breaker.
Maybe I am completely misunderstanding the problem, especially the "no central broker", but have you considered something based on tuple spaces?
--
After the comments exchange, please consider the following as my "definitive" answer, then:
Use a file-based solution, and host the directory tree on a Ramdisk to ensure good performance.
I'd also suggest having a look at the following StackOverflow discussion (even though it's Java-based) for possible pointers on how to manage locking and transactions on the filesystem.
This one (.NET based) may be of help, too.
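To make the idea concrete, a hedged sketch of the receiving side of this scheme on Windows: each message is a file dropped into a shared directory (on the Ramdisk), and receivers wake up on directory-change notifications. The path is hypothetical and error handling is minimal:

    #include <windows.h>
    #include <cstdio>

    int main() {
        HANDLE h = FindFirstChangeNotificationA(
            "R:\\msgbus",      // hypothetical Ramdisk message directory
            FALSE,             // don't watch subdirectories
            FILE_NOTIFY_CHANGE_FILE_NAME);
        if (h == INVALID_HANDLE_VALUE) return 1;

        for (;;) {
            if (WaitForSingleObject(h, INFINITE) != WAIT_OBJECT_0) break;
            // Scan the directory for new message files here, then re-arm.
            printf("directory changed\n");
            if (!FindNextChangeNotification(h)) break;
        }
        FindCloseChangeNotification(h);
        return 0;
    }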
How about UDP broadcasting?
Couldn't you use a localhost socket?
/Tony
In the end I decided that one of the hard requirements had to go, as the problem could not be solved in any reasonable way as originally stated.
My final solution is a Windows service running a named pipe server. Any application or service can connect to an instance of the pipe and send messages. Any message received by the server is echoed to all pipe instances.
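For completeness, a minimal sketch of one server-side pipe instance (the pipe name and buffer sizes are illustrative; a real implementation spawns one instance per client and echoes every received message to all connected instances):

    #include <windows.h>
    #include <cstdio>

    int main() {
        // Each connected client gets its own instance of the same named pipe.
        HANDLE pipe = CreateNamedPipeA(
            "\\\\.\\pipe\\MessageBus",                   // hypothetical name
            PIPE_ACCESS_DUPLEX,
            PIPE_TYPE_MESSAGE | PIPE_READMODE_MESSAGE | PIPE_WAIT,
            PIPE_UNLIMITED_INSTANCES,
            4096, 4096,                                  // out/in buffer sizes
            0,                                           // default timeout
            NULL);
        if (pipe == INVALID_HANDLE_VALUE) return 1;

        // Block until a client connects, then read its messages.
        if (ConnectNamedPipe(pipe, NULL) || GetLastError() == ERROR_PIPE_CONNECTED) {
            char buf[4096];
            DWORD got = 0;
            while (ReadFile(pipe, buf, sizeof(buf), &got, NULL)) {
                // A real server would forward buf to every other pipe instance.
                printf("received %lu bytes\n", (unsigned long)got);
            }
        }
        DisconnectNamedPipe(pipe);
        CloseHandle(pipe);
        return 0;
    }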
I really liked p.marino's answer, but in the end it looked like a lot of complexity for what is really a very basic piece of functionality.
The other possibility that appealed to me, though again it fell at the complexity hurdle, was to write a kernel driver to manage the multicasting. Several mechanisms would have been possible in this case, but the overhead of writing a bug-free kernel driver was just too high.
I am going to describe the problem I have to solve, and I need some suggestions as to whether I am on the right path.
The problem is:
I need to create a Windows service application that receives requests over a socket and acts on them. The action is to execute a script (maybe in Lua or Perl). This script models the business rules of the client: querying databases, making requests to websites, and then sending a response back to the client.
There are 3 mandatory requirements:
The service will receive a lot of requests at the same time, so I'm thinking of using a worker-thread model.
The service must have high throughput; I will have many requests in the same second.
Low latency: I must respond to these requests very quickly.
Every request will generate log entries. I can't write these log entries to the physical disk while the scripts execute because of the I/O cost; I will probably put them in an in-memory queue and have other threads consume the queue and write to disk.
In the future, it is possible that two worker threads will have to exchange messages.
I have to design a protocol for this service. I was thinking of using Thrift, but I don't know the overhead involved. Maybe I will make my own protocol.
To write the Windows service, I was thinking of Erlang. Is that a good idea?
Does anyone have suggestions/hints for solving this problem? Which language would be best for writing this service?
Yes, Erlang is a good choice if you know it or are ready to learn it. With Erlang you don't need any worker threads: just implement your server in the Erlang style and you get a multithreaded solution automatically.
I'm not sure how to turn an Erlang program into a Windows service, but it's probably doable.
Writing to the same log file from many threads is suboptimal because it requires locking. It's better to have a queue of log entries (lock-free?) and a separate thread (an Erlang process?) that writes them to the file. BTW, are you sure that executing an external script in another language is much faster than writing a log record to a file?
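If you end up doing this part in C++ rather than Erlang, here is a minimal sketch of the queue-plus-writer-thread idea (a plain locked queue, not a lock-free one; all names are illustrative):

    #include <condition_variable>
    #include <fstream>
    #include <mutex>
    #include <queue>
    #include <string>
    #include <thread>

    class AsyncLogger {
    public:
        explicit AsyncLogger(const std::string& path)
            : out_(path, std::ios::app), done_(false),
              writer_([this] { run(); }) {}

        ~AsyncLogger() {
            { std::lock_guard<std::mutex> lk(m_); done_ = true; }
            cv_.notify_one();
            writer_.join();
        }

        // Called from worker threads: just enqueue, no disk I/O here.
        void log(std::string line) {
            { std::lock_guard<std::mutex> lk(m_); q_.push(std::move(line)); }
            cv_.notify_one();
        }

    private:
        void run() {  // a single thread owns the file, so no write contention
            std::unique_lock<std::mutex> lk(m_);
            while (!done_ || !q_.empty()) {
                cv_.wait(lk, [this] { return done_ || !q_.empty(); });
                while (!q_.empty()) {
                    std::string line = std::move(q_.front());
                    q_.pop();
                    lk.unlock();          // do the slow write outside the lock
                    out_ << line << '\n';
                    lk.lock();
                }
            }
        }

        std::ofstream out_;
        std::mutex m_;
        std::condition_variable cv_;
        std::queue<std::string> q_;
        bool done_;
        std::thread writer_;
    };

The point of the design is that request-handling threads only touch the in-memory queue, so disk I/O never blocks them.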
It's doubtful you'll get much better performance from your own serialization library than Thrift provides for free. Another option is Google Protocol Buffers; some claim it's faster.
Theoretically (!) it's possible that an Erlang solution won't give you the required performance. In that case, consider a compiled language, e.g. C++, with asynchronous networking, e.g. Boost.Asio. But be prepared for it to be much more complicated than the Erlang way.
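To give a taste of what that looks like, a minimal hedged sketch of an Asio accept loop (assuming Boost 1.66+ for io_context and the move-accept handler; the port number is arbitrary):

    #include <boost/asio.hpp>
    #include <iostream>

    using boost::asio::ip::tcp;

    // Re-arming accept loop: each completed accept schedules the next one.
    void accept_loop(tcp::acceptor& acceptor) {
        acceptor.async_accept(
            [&acceptor](boost::system::error_code ec, tcp::socket socket) {
                if (!ec)
                    std::cout << "client: " << socket.remote_endpoint() << "\n";
                // A real server would hand the socket to a session object here.
                accept_loop(acceptor);
            });
    }

    int main() {
        boost::asio::io_context io;   // io_service in older Boost versions
        tcp::acceptor acceptor(io, tcp::endpoint(tcp::v4(), 5555));
        accept_loop(acceptor);
        io.run();                     // the event loop; handlers run here
    }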