Is polling on a ZMQ router socket thread-safe?

I have multiple threads interacting with the same ZeroMQ router socket (bad idea, I know). I manage thread safety with locks on all sends and receives.
Do I also need to lock around polling, or is this relatively benign operation thread-safe?

I think using polling alleviates any need to share the socket across threads. You can pick up events as they come in using a poll loop on one thread and then, if you wish, distribute those events to other threads for processing, as in the sketch below. This way you don't need to share the socket at all.
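For illustration, here is a minimal sketch of that pattern, assuming pyzmq; the endpoint and the message handling are placeholders. One thread owns the ROUTER socket and polls it; workers only ever see a plain queue.
import queue
import threading
import zmq
ctx = zmq.Context.instance()
router = ctx.socket(zmq.ROUTER)
router.bind("tcp://*:5555")                   # illustrative endpoint
inbox = queue.Queue()
def io_loop():
    # The only thread that ever touches the socket.
    poller = zmq.Poller()
    poller.register(router, zmq.POLLIN)
    while True:
        events = dict(poller.poll(timeout=100))   # milliseconds
        if router in events:
            frames = router.recv_multipart()      # [identity, ..., payload]
            inbox.put(frames)                     # hand off to workers
def worker():
    while True:
        frames = inbox.get()
        # process frames here; replies should go back through the I/O thread
        # (e.g. via a second queue) so only one thread ever uses the socket
threading.Thread(target=io_loop, daemon=True).start()
threading.Thread(target=worker, daemon=True).start()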

Related

How to grab the latest message sent from each connection

I have a ZMQ_PULL/ZMQ_PUSH socket connection.
I have multiple ZMQ_PUSH connections pushing to a single ZMQ_PULL connection.
ZMQ_PUSH connection 1----->
ZMQ_PUSH connection 2-----> ZMQ_PULL
ZMQ_PUSH connection N----->
I do not need every message, I just need the latest message that was sent. I am doing some inference on the back end and am streaming the results to the ZMQ_PULL socket.
I have set the ZMQ_PULL socket to Conflate=true
"If set, a socket shall keep only one message in its inbound/outbound queue, this message being the last message received/the last message to be sent. Ignores ZMQ_RCVHWM and ZMQ_SNDHWM options."
But after testing I realize I actually need the last message from each connection, not just the last message overall. So, with 3 connections, it should grab from each connection in round-robin fashion, so that I constantly have the latest message from each connection.
Is there an option that is like Conflate, but instead of for all messages, it is for each connection?
Docs: http://api.zeromq.org/4-0:zmq-setsockopt
Is there an option that is like Conflate, but instead of for all messages, it is for each connection?
No.
The documentation you cite explains that 0MQ does not currently offer direct support for such a single-socket use case. You could certainly code it up and submit an upstream PR so that future revs of 0MQ offer such functionality.
Given that you'll need app-level support to make this work with 0MQ 4.3, the simplest approach would be to maintain N ZMQ_PULL sockets with ZMQ_CONFLATE set, as you're already aware.
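For illustration, a minimal sketch of that N-socket approach, assuming pyzmq; the endpoints and socket count are placeholders. Note that ZMQ_CONFLATE must be set before bind/connect and only works with single-part messages.
import zmq
ctx = zmq.Context.instance()
endpoints = ["tcp://*:5551", "tcp://*:5552", "tcp://*:5553"]   # one per pusher
poller = zmq.Poller()
for ep in endpoints:
    s = ctx.socket(zmq.PULL)
    s.setsockopt(zmq.CONFLATE, 1)    # keep only the newest message on this socket
    s.bind(ep)                       # the matching pusher connects to this endpoint
    poller.register(s, zmq.POLLIN)
latest = {}
while True:
    for sock, _ in poller.poll():
        latest[sock] = sock.recv()   # per-socket == per-connection "latest"
Because each pusher gets its own endpoint, "latest per socket" is exactly "latest per connection".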
An alternative approach would be to assign a dedicated thread or process to keep draining the existing muxed socket and to update a shared-memory data structure that interested clients can consult. The idea is to burn a core on keeping the queue mostly empty while doing no processing, just focusing on communications. Other cores can then examine the "most recent message" and each embark on some expensive processing, while the dedicated core continues to keep the queue drained. This essentially offers the 0MQ feature proposed above, but at a different place in the stack: up a level, within your application.
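A rough sketch of that drainer idea, again assuming pyzmq and assuming each pusher prepends its own ID as a first message frame (a plain PULL socket does not expose the sender's identity, so that part has to be app-level):
import threading
import zmq
ctx = zmq.Context.instance()
pull = ctx.socket(zmq.PULL)
pull.bind("tcp://*:5555")             # illustrative endpoint
latest = {}                           # sender id -> newest payload
lock = threading.Lock()
def drainer():
    # Burn a core keeping the queue empty; do no real processing here.
    while True:
        sender_id, payload = pull.recv_multipart()
        with lock:
            latest[sender_id] = payload      # older entries are simply overwritten
threading.Thread(target=drainer, daemon=True).start()
def take_latest(sender_id):
    # Workers call this whenever they are ready for a fresh item.
    with lock:
        return latest.pop(sender_id, None)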
To do this in a distributed way, the "queue draining service" would need to know about idle workers. That is, a worker could publish a brief "I just completed an expensive task" message, which would trigger the drainer to post a fresh work item, never using shared memory at all. This lets the drainer worry about eliding duplicate messages that arrived while no one was available to start work on them and have since been superseded by a more recent message.

RabbitMQ: routing messages to threads

I have an application written in Ruby that has multiple threads that each send requests to remote AMQP endpoints. These threads are spawned from time to time when new tasks have to be run.
If I use temporary, exclusive queues per thread for sending responses to their requests, then it becomes easy to write the code to handle incoming messages in this Ruby service. The queues are deleted as soon as the associated channel is closed so they don't stick around after their purpose is over.
The alternatives I can think of all require a listener thread listening on one or more queues that receive all incoming messages / responses into the Ruby service, and then routing these messages to waiting threads using some message identifiers. This seems more complicated, and I am unable to use RabbitMQ for all the required semantic routing.
Is the first model a viable model for AMQP communication? Is there a better pattern for handling this case?
The answer largely depends on your use case.
If you don't care about losing messages when a given queue is deleted, then the first option is fine. If you need messages to stick around in a queue until something comes along to process them, then you need a durable queue where messages can sit.
There is no requirement for a queue per thread with RabbitMQ. However, you should be using a channel per thread. Given that, you can have a channel per thread and have multiple channels consuming from the same (or different) queue without issue. As long as you keep each channel limited to a single thread, you can do whatever you need with regard to the queues you are consuming from.
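Purely for illustration, here is a rough sketch of that first model, written in Python with pika rather than Ruby (the AMQP calls map over directly to Bunny); the queue name and endpoint are made up.
import pika
def run_task(request_body):
    # pika's BlockingConnection isn't shareable across threads, so each task
    # thread opens its own connection and channel here.
    connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    channel = connection.channel()
    reply = channel.queue_declare(queue="", exclusive=True)   # server-named, exclusive
    reply_queue = reply.method.queue
    channel.basic_publish(
        exchange="",
        routing_key="remote.service",        # hypothetical request queue
        properties=pika.BasicProperties(reply_to=reply_queue),
        body=request_body,
    )
    # Block this thread until the response for *this* request arrives.
    for method, properties, body in channel.consume(reply_queue, auto_ack=True):
        connection.close()                   # exclusive queue is deleted with the connection
        return body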

JMS consumer inside a Netty handler?

I'm designing a quite complicated system and was wondering what the best way is to put a JMS consumer (ActiveMQ, VM protocol, non-persistent) inside a Netty handler.
Let me explain: I have several clients connecting to my Netty server using websockets. For every client connection I create a JMS consumer that listens for interesting messages on one or more topics. If an interesting message arrives, I need to do an extra step (additional filtering) before sending the message to the client over the websocket.
Is the following a good way to do this:
inside a SimpleChannelInboundHandler I declare a private non-static consumer
the consumer is initialized in channelActive
the consumer is destroyed in channelInactive
when a message is received by the consumer, I do the extra filtering and send it using ctx.channel().write()
In this setup I'm a bit worried that the consumer might turn into a slow consumer and slow everything down, since the websocket goes over the internet.
I came up with a more complex design to decouple the "receiving of a message by the consumer" from the "sending of the message through the websocket".
inside a SimpleChannelInboundHandler I declare a private non-static consumer
the consumer is initialized in channelActive
the consumer is destroyed in channelInactive
when a message is received by the consumer, I put it in a blocking queue
every minute I let a thread (created for every client) look in the queue and send the found messages to the client using ctx.channel().write()
At this point I'm a bit worried about the extra thread per client.
Or is there maybe a better way to accomplish this task?
This is a classic slow consumer problem, and the first step to resolving it is to determine what the appropriate action is when a slow consumer is detected. If it is acceptable for the slow consumer to miss messages, then the solution is some variation on dropping messages or unsubscribing it from the feed. For example, if it's acceptable that the client misses messages then, when one is received from JMS, check whether the channel is writable. If it isn't, drop the message. If you want to give yourself a bit more of a buffer (although OS buffers are quite large), you can track the number of write completion futures that haven't completed (i.e. the messages that haven't yet been written to the OS send buffer) and drop messages if there are too many outstanding write requests.
If the client may not miss messages, and is consistently slow, then the problem is more difficult. One option might be to divert messages to a JMS queue with a specific header value, then open a new consumer that reads messages from that queue using a JMS selector. This will put more load on the JMS server but might be appropriate for temporary slowness, and hopefully it won't interfere with your main topic feeds. Alternatively you might want to stash the messages in a different store, such as a database, so you can poll for messages when they can be sent. If you do this right, a single polling thread can cope with many clients (query for clients which have outstanding messages, then for each client load a batch of messages). However, this isn't as convenient as using JMS.
I wouldn't go with option 2 because the blocking queue is only going to solve the problem temporarily, and you can achieve the same thing by tracking how many write operations are waiting to complete.
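To make the "track outstanding writes" idea concrete, here is a rough, language-agnostic sketch (written as Python for brevity; in Netty the fast-path check would be channel.isWritable() and the decrement would happen in a write-future listener). The threshold and the write callable are hypothetical.
MAX_OUTSTANDING_WRITES = 64              # purely illustrative threshold
class ClientFeed:
    def __init__(self, start_async_write):
        self._write = start_async_write  # callable that starts a non-blocking write
        self._outstanding = 0
    def on_jms_message(self, message):
        # Called by the JMS consumer; drop instead of queueing when the client lags.
        if self._outstanding >= MAX_OUTSTANDING_WRITES:
            return                       # slow consumer: drop (or unsubscribe instead)
        self._outstanding += 1
        self._write(message, on_complete=self._completed)
    def _completed(self):
        self._outstanding -= 1           # the write reached the OS send buffer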

Ruby: How do I receive and send data parallel?

I want to create an application that sends and receives data in parallel, like a chat application. It gets input and also sends some output, but NOT only when it receives data. I want to use UDP as the protocol. I'm using Ruby 1.9.3.
Here's the code that receives data:
require 'socket'
s = UDPSocket.new
s.bind("localhost", 1234)
Socket.udp_server_loop_on([s]) do |message, sender|
  # do something with message; sender identifies the peer
end
This code should run independently of the rest of the application; it shouldn't block it.
Should I use a thread? I've never written a network program before and I'm not a professional developer, so please be patient. Perhaps my code/design is just crap, so feel free to tell me how this is done by professionals! ;)
UDP lends itself to this sort of non-blocking processing quite naturally, since you're receiving individual, atomic messages over your socket and can reply in the same fashion.
Inside that loop, just be sure to process things quickly and send response messages. If you make long blocking calls, they will hold up your loop and affect response times.
EventMachine provides a structure for writing asynchronous applications and has its own methods for handling UDP and TCP sockets.
Don't forget to look at solutions which are already implemented. For chat applications, Socket.IO is a great place to start.
You should take a look at the EventMachine gem, which handles non-blocking IO very efficiently.
Among other things, it also offers TCP and UDP server/client APIs.

Web sockets make ajax/CORS obsolete?

Will web sockets, once supported in all web browsers, make ajax obsolete?
Because if I could use web sockets to fetch data and update data in realtime, why would I need ajax? Even if I only use ajax to fetch data once when the application starts, I might still want to see whether that data has changed after a while.
And will web sockets be possible across domains, or only to the same origin?
WebSockets will not make AJAX entirely obsolete and WebSockets can do cross-domain.
AJAX
AJAX mechanisms can be used with plain web servers. At its most basic level, AJAX is just a way for a web page to make an HTTP request. WebSockets is a much lower-level protocol and requires a WebSockets server (either built into the web server, standalone, or proxied from the web server to a standalone server).
With WebSockets, the framing is handled by the protocol, but what goes in the payload is entirely up to the application. You could send HTML/XML/JSON back and forth between client and server, but you aren't forced to. AJAX is HTTP. WebSockets has an HTTP-friendly handshake, but WebSockets is not HTTP. WebSockets is a bi-directional protocol that is closer to raw sockets (intentionally so) than it is to HTTP. Early drafts of the standard only allowed UTF-8 text payloads; the final standard (RFC 6455) also supports binary frames.
So there will probably always be a place for AJAX-style requests, even in a world where all clients support WebSockets natively. WebSockets is trying to solve situations where AJAX is not capable or only marginally capable (because WebSockets is bi-directional and has much lower overhead). But WebSockets does not replace everything AJAX is used for.
Cross-Domain
Yes, WebSockets supports cross-domain connections. The initial handshake that sets up the connection communicates origin policy information. The Wikipedia page shows an example of a typical handshake: http://en.wikipedia.org/wiki/WebSockets
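For reference, a typical opening handshake (abridged from the RFC 6455 example) looks like the following; note the Origin header, which is what servers use to apply an origin policy. The client sends:
GET /chat HTTP/1.1
Host: server.example.com
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Origin: http://example.com
Sec-WebSocket-Version: 13
and the server replies:
HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=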
I'll try to break this down into questions:
Will web sockets when used in all web browsers make ajax obsolete?
Absolutely not. WebSockets are essentially raw socket connections to the server. This comes with its own security concerns. AJAX calls are simply asynchronous HTTP requests that can follow the same validation procedures as the rest of the pages.
Cause if I could use web sockets to fetch data and update data in realtime, why would I need ajax?
You would use AJAX for simpler, more manageable tasks. Not everyone wants the overhead of securing a socket connection simply to allow asynchronous requests. That can be handled simply enough.
Even if I use ajax to just fetch data once when the application started I still might want to see if this data has changed after a while.
Sure, if that data is changing. You may not have the data changing or constantly refreshing. Again, this is code overhead that you have to account for.
And will web sockets be possible in cross-domains or only to the same origin?
You can have cross-domain WebSockets, but you have to code your WS server to accept them. You have access to the Origin (and Host) headers, which you can then use to accept or deny requests. This can, however, be spoofed by something as simple as nc. In order to truly secure the connection you will need to authenticate it by other means.
Websockets have a couple of big downsides in terms of scalability that ajax avoids. Since ajax sends a request/response and then closes the connection (or closes it shortly after), someone sitting idle on the web page doesn't use server resources. Websockets are meant to stream data back to the browser, and they tie up server resources to do so. Servers have a limit on how many simultaneous connections they can keep open at one time. Not to mention that, depending on your server-side technology, each socket may tie up a thread. So websockets are more resource-intensive on both sides, per connection. You could easily exhaust all of your threads servicing clients, and then no new clients could come in, if lots of users are just sitting on the page. This is where nodejs, vertx, and netty can really help out, but even those have upper limits as well.
There is also the issue of the state of the underlying socket, and of writing the code on both sides that carries on the stateful conversation, which isn't something you have to do with the ajax style because it's stateless. Websockets require you to design a low-level protocol, which is solved for you with ajax. Things like heartbeating, closing idle connections, reconnecting on errors, etc. are vitally important now. These are things you didn't have to solve when using AJAX because it was stateless. State is very important to the stability of your app and, more importantly, to the health of your server. It's not trivial. Pre-HTTP we built a lot of stateful TCP protocols (FTP, telnet, SSH), and then HTTP happened, and no one did that stuff much anymore, because even with its limitations HTTP was surprisingly easier and more robust. Websockets bring back both the good and the bad of stateful protocols. You'll learn soon enough if you didn't get a dose of that the last time around.
If you need to stream realtime data, this extra overhead is warranted, because polling the server for streamed data is worse; but if all you are doing is user interaction -> request -> response -> update UI, then ajax is easier and will use fewer resources, because once the response is sent the conversation is over and no additional server resources are used. So I think it's a tradeoff, and the architect has to decide which tool fits their problem. AJAX has its place, and websockets have their place.
Update
So the architecture of your server is what matters when we are talking about threads. If you are using a traditional multi-threaded server (or processes), where each socket connection gets its own thread to respond to requests, then websockets matter a lot to you. For each connection we have a socket, and eventually the OS will fall over if you have too many of these; the same goes for threads (and more so for processes). Threads are heavier than sockets (in terms of resources), so we try to conserve how many threads are running simultaneously. That means creating a thread pool, which is just a fixed number of threads shared among all sockets. But once a socket is opened, the thread is used for the entire conversation. The length of those conversations governs how quickly you can repurpose those threads for new sockets coming in, and therefore how much you can scale. However, if you are streaming, this model doesn't work well for scaling. You have to break the thread-per-socket design.
HTTP's request/response model makes it very efficient at turning over threads for new sockets. If you are just going to use request/response, use HTTP; it's already built and much easier than reimplementing something like it in websockets.
Since websockets don't have to be request/response like HTTP and can stream data, if your server has a fixed number of threads in its thread pool and you have the same number of websockets tying up all of your threads with active conversations, you can't service new clients coming in! You've reached your maximum capacity. That's where protocol design is important too with websockets and threads. Your protocol might allow you to loosen the thread-per-socket-per-conversation model, so that people just sitting there don't use a thread on your server.
That's where asynchronous single-threaded servers come in. In Java we often call this NIO, for non-blocking IO. It's a different API for sockets where sending and receiving data doesn't block the thread performing the call.
Traditionally, with blocking sockets, when you call socket.read() or socket.write() they wait until the data is received or sent before returning control to your program. That means your program is stuck waiting for the socket data to come in or go out before it can do anything else. That's why we have threads: so we can do work concurrently (at the same time). Send this data to client X while I wait on data from client Y. Concurrency is the name of the game when we talk about servers.
In an NIO server we use a single thread to handle all clients and register callbacks to be notified when data arrives. For example:
socket.read(function(data) {
    // data is here! Now you can process it very quickly without waiting!
});
The socket.read() call will return immediately without reading any data, but the function we provided will be called when data comes in. This design radically changes how you build and architect your code, because if you get hung up waiting on something you can't receive any new clients. You have a single thread, so you can't really do two things at once! You have to keep that one thread moving.
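As a concrete (and deliberately tiny) illustration of this single-threaded, non-blocking style, here is an echo server using Python's asyncio; the port and the echo behaviour are just placeholders.
import asyncio
async def handle_client(reader, writer):
    # Awaiting a read yields control back to the event loop instead of
    # blocking a thread, so one thread can serve many clients.
    while True:
        data = await reader.read(1024)
        if not data:
            break
        writer.write(data)       # queued; flushed when the socket is ready
        await writer.drain()
    writer.close()
async def main():
    server = await asyncio.start_server(handle_client, "127.0.0.1", 9000)
    async with server:
        await server.serve_forever()
asyncio.run(main())
There is one coroutine per client but only one OS thread: whenever a coroutine awaits, the loop switches to whichever client has data ready.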
NIO, asynchronous IO, event-based programming, as this is all known, is a much more complicated system design, and I wouldn't suggest you try to write this if you are starting out. Even very senior programmers find it very hard to build robust systems. Since you are asynchronous you can't call APIs that block. Things like reading data from the DB or sending messages to other servers have to be performed asynchronously. Even reading/writing from the file system can slow your single thread down, lowering your scalability. Once you go asynchronous, it's all asynchronous all the time if you want to keep the single thread moving. That's where it gets challenging, because eventually you'll run into an API, like a database driver, that is not asynchronous, and you have to fall back to threads at some level. So hybrid approaches are common even in the asynchronous world.
The good news is there are solutions already built on this lower-level API that you can use: NodeJS, Vertx, Netty, Apache Mina, Play Framework, Twisted Python, Stackless Python, etc. There might be some obscure library for C++, but honestly I wouldn't bother. Server technology doesn't require the very fastest language because it's IO-bound more than CPU-bound. If you are a die-hard performance nut, use Java. It has a huge community of code to pull from and its speed is very close to (and sometimes better than) C++. If you just hate it, go with Node or Python.
Yes, yes it does. :D
The earlier answers lack imagination. I see no more reason to use AJAX if websockets are available to you.
