tornado - make connections survive server bounce - websocket

I'm storing instances of tornado.websocket.WebSocketHandler in a dictionary so when a message comes for a specific user I can route the message to the appropriate listener.
The implication is that when the server bounces we lose the listener details, and the client has to create a new WebSocket instance.
I would like to implement a means of storing the listener details in a persistent store, maybe Redis, but am unsure of the best approach.
I could pickle the WebSocketHandler instance and write it to Redis, then read and unpickle it when a message to a specific user needs to be routed to their client, but this feels a bit hacky. Is there a less hacky solution?

You can't usefully pickle the WebSocketHandler because connected sockets cannot be transferred in this way. You might be able to do something with a multiprocessing.Queue instead of simply pickling, but this will be tricky and hacky at best. Clients must be able to create new WebSocket connections in any case to recover from network outages; it's normal to simply do the same when the server restarts.
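A minimal sketch of that approach, assuming a per-process registry that is simply rebuilt as clients reconnect after a restart. `ConnectionRegistry` and its method names are hypothetical; the only thing it expects from a handler is a `write_message` method, which tornado's WebSocketHandler provides.

```python
class ConnectionRegistry:
    """Maps user ids to live connection objects held in this process only.

    Nothing is persisted: after a server restart the registry starts empty
    and fills up again as clients reconnect and re-register.
    """

    def __init__(self):
        self._by_user = {}

    def register(self, user_id, handler):
        # Call this from the handler's open() once the user is identified.
        self._by_user[user_id] = handler

    def unregister(self, user_id, handler):
        # Call this from on_close(); only remove the entry if it still
        # points at this handler, so a newer reconnect is not dropped.
        if self._by_user.get(user_id) is handler:
            del self._by_user[user_id]

    def route(self, user_id, message):
        # Returns False when the user has no live connection right now.
        handler = self._by_user.get(user_id)
        if handler is None:
            return False
        handler.write_message(message)
        return True
```

If messages must not be lost while a user is disconnected, the messages themselves (not the handlers) are what would go into a persistent store.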

Related

Detecting socket connection using ZeroMQ STREAM sockets

I am building a new application that receives data from a number of external devices and needs to make it available to a number of different components. ZeroMQ seems purpose-built for the "data bus" aspect of my architecture.
I recently became aware that zmq STREAM sockets can connect to native TCP sockets and send/receive messages. Using zmq throughout has a lot of appeal, but I have one problem that I don't know how to get around.
One of my devices needs to be set up. That is, I connect a socket to it, send it some configuration information, then sit back and wait for it to send me data. The device also has a "reset" capability (useful in some contexts), that requires re-sending the configuration information. Doing this depends upon having visibility to the setup/tear-down stage of the socket interface. I need to know when a new connection is established, so I can send the necessary configuration messages.
It seems that zmq is purposely designed to shield me from that knowledge. Is there a way to do what I want? Or should I just use regular sockets for this interface?
Well, it turns out that reading (the right version of) the fine manual can be instructive.
When a connection is made, a zero-length message will be received by the application. Similarly, when the peer disconnects (or the connection is lost), a zero-length message will be received by the application.
I guess all that remains is to disambiguate between connect and disconnect. Still looking for advice from the community, if others have dealt with this situation before.
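One possible way to disambiguate, sketched under the assumption that each STREAM message arrives as an identity frame followed by a data frame: remember which peer identities are currently connected. The first zero-length message for an identity is then the connect, and the next one is the disconnect. The function name and the set-based bookkeeping are illustrative, not part of the ZeroMQ API.

```python
def classify_stream_event(connected, identity, payload):
    """Classify one (identity, payload) pair read from a STREAM socket.

    `connected` is a set of peer identities that the caller keeps between
    calls; it is updated in place.
    """
    if payload:
        return "data"
    if identity in connected:
        # Second zero-length message for this peer: it went away.
        connected.discard(identity)
        return "disconnect"
    # First zero-length message for this peer: a new connection.
    connected.add(identity)
    return "connect"
```

On a "connect" result the application would send the configuration messages to that identity.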
Following up on your own answer, I would hesitate to rely on that zero length connect/disconnect message as your whole strategy - that seems needlessly fragile. It's not clear to me from your question which end is persistent and which end needs configuration information, but I expect that one end knows it's resetting and reconnecting, and that end needs configuration information from the peer, so it should ask for it with a message when it needs it, to which the peer responds with the requested information.
If the peer does not yet have the required configuration information before it receives some other message, it could either queue up that work or it could respond back with the need for the config, and then have the rest of the network handle that need appropriately.
You shouldn't need stream/tcp sockets to make that work; it should work with the more standard ZMQ socket types. You just need to build the robustness into your application rather than trying to get it for free from TCP/socket actions.
If I've missed your point, and what I'm suggesting won't work for some reason, you will have to give more specific information about your network topology for anyone else to understand what a suitable solution might be.

How to handle global resources in Spring State Machine?

I am thinking of using Spring State Machine for a TCP client. The protocol itself is given and based on proprietary TCP messages with message id and length field. The client sets up a TCP connection to the server, sends a message and always waits for the response before sending the next message. In each state, only certain responses are allowed. Multiple clients must run in parallel.
Now I have the following questions related to Spring State machine.
1) During the initial transition from disconnected to connected the client sets up a connection via java.net.Socket. How can I make this socket (or the DataOutputStream and BufferedReader objects obtained from the socket) available to the actions of the other transitions?
In this sense, the socket would be some kind of global resource of the state machine. The only way I have seen so far would be to put it in the message headers. But this does not look very natural.
2) Which runtime environment do I need for Spring State Machine?
Is a JVM enough or do I need Tomcat?
Is it thread-safe?
Thanks, Wolfgang
There's nothing wrong with using event headers, but those are not really global resources, as a header exists only for the duration of a single event's processing. I'd try adding the needed objects to the machine's extended state, which is then available to all actions.
You just need a JVM. By default, machine execution is synchronous, so there should not be any threading issues. The docs have notes on replacing the underlying executor with an asynchronous one (this is usually done if multiple concurrent regions are used).

How to provide both initial data and subsequent events via WAMP/Websockets

I have an application from which I need to send live updates to web clients.
I'm currently happily using websockets for that, via the WAMP protocol, as it provides both publish-subscribe and RPC methods.
Now, I find that in lots of situations, when a user starts the application or a view, I need to send an initial state to the client, and then keep sending updates. I do the first with an RPC call, and the latter via publish-subscribe.
Now, this forces me to write server-side and client-side code for both of the methods, even while I'm basically conveying the same information in both cases.
On the server side, I'm moving the appropriate code to a common method, but I still need to take care of both sending the event and providing an entry point for the RPC call:

    # RPC endpoint for getting mission info (renamed here so it does not
    # clash with the shared helper of the same name below)
    def rpc_get_mission_info(self):
        return self.get_mission_info()

    # Scheduled or manually called method to send mission info to all users
    def publish_mission_info(self):
        self.wamp.publish("UPDATE_INFO", [self.get_mission_info()])

    # Shared helper used by both entry points
    def get_mission_info(self):
        # Here we generate a JSON serializable dict with the info
        info = {}  # placeholder for the actual mission fields
        return info
As you can imagine, the client side (JS or Python) shows a similar duplication (two handler methods).
The question is: is there a cleverer way of handling this and avoiding that boilerplate code? Some approach I could follow, perhaps automatically sending the last event of each type just to clients that ask for it, or that have just subscribed? Perhaps something at the Crossbar level?
In general terms, I feel I could have a better state synchronization strategy leveraging these two channels (pub-sub and RPC). How do people do it?
My WAMP server is Crossbar, and my client library is Autobahn in Python and JS.
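One approach that might reduce the duplication, sketched here without relying on any Crossbar-specific feature: route every publication through a small wrapper that remembers the last payload per topic, and expose a single generic "snapshot" RPC that returns it. `LastValueTopics` and its method names are hypothetical, and the `publish` callable stands in for something like `self.wamp.publish`.

```python
class LastValueTopics:
    """Publishes events and keeps the most recent payload of each topic."""

    def __init__(self, publish):
        # `publish` is any callable taking (topic, args), supplied by the caller.
        self._publish = publish
        self._last = {}

    def publish(self, topic, payload):
        # Record first, so a snapshot taken right after sees this payload.
        self._last[topic] = payload
        self._publish(topic, [payload])

    def snapshot(self, topic):
        # Intended to be registered once as a generic RPC endpoint; returns
        # None if nothing has been published on the topic yet.
        return self._last.get(topic)
```

With this shape, a client calls the one snapshot procedure with a topic name for its initial state and then handles later events with the same handler it uses for that initial payload.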

WebSocket pushing database updates

Most of the articles on the web dealing with WebSockets are about in-memory Chat.
I'm interested in a less instant kind of chat that is persistent, like the comments on a blog post.
I have a cluster of two servers handling client requests.
I wonder what the best strategy would be for pushing database updates to the corresponding clients.
As I'm using Heroku to handle this cluster (of 2 web dynos), I obviously read this tutorial aiming to build a Chat Room shared between all clients.
It uses Redis to centralize incoming messages, with each server listening for new messages to propagate to web clients through websocket connections.
My use case differs in that I've got a Neo4j database, persisting into it each message written by any client.
My goal is to notify each client from a specific room that a new message/comment has just been persisted by a client.
With an architecture similar to the tutorial linked above, how could I filter only new messages to propagate to user? Is there an easy and efficient way to tell Redis:
"(WebSocket saying) When my client initiates the websocket connection, I take care to make a query for all persisted messages and send them to the client, however I want you (Redis) to feed me with all NEW messages, that I didn't send to the client, so that I will be able to provide them."
How to prevent Redis from publishing the whole conversation each time a websocket connection is made? It would lead to duplications since the database query already provided the existing contents at the moment.
This is actually a pretty common scenario, where you have three components:
A cluster of stateless web servers that maintain open connections with all clients (load balanced across the cluster, obviously)
A persistent main data storage - Neo4j in your case
A messaging/queueing backend for broadcasting messages across channels (thus across the server cluster) - Redis
Your requirement is for new clients to receive an initial feed of the recent messages, and any subsequent messages in real time. All of this is implemented in your connection handlers.
Essentially, this is what your (pseudo-)code should look like:
    class ConnectionHandler:
        redis = redis.get_connection()

        def on_init(self):
            self.send("hello, here are all the recent messages")
            recent_msgs = fetch_msgs_from_neo4j()
            self.send(recent_msgs)
            # From here on, Redis delivers only newly published messages
            self.redis.add_listener(self.on_msg)
            self.send("now listening on new messages")

        def on_msg(self, msg):
            self.send("new message: ")
            self.send(msg)
The exact implementation really depends on your environment, but this is the general flow of things.
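One gap in this flow is that a message persisted between the database query and the subscription could be missed. A possible variation is to subscribe first, buffer whatever arrives while the query runs, and then drop buffered messages that the query already returned. The sketch below shows only that merge step and assumes every message is a dict with a unique `"id"` key, which is a hypothetical shape chosen for illustration.

```python
def replay_without_duplicates(snapshot, buffered):
    """Combine the initial database feed with messages buffered meanwhile.

    `snapshot` is the list returned by the database query; `buffered` is the
    list of messages received from the subscription while the query ran.
    Messages whose id already appeared are skipped.
    """
    seen = {msg["id"] for msg in snapshot}
    combined = list(snapshot)
    for msg in buffered:
        if msg["id"] not in seen:
            seen.add(msg["id"])
            combined.append(msg)
    return combined
```

After the combined list has been sent, the handler can forward later subscription messages directly.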

Web server and ZeroMQ patterns

I am running an Apache server that receives HTTP requests and connects to a daemon script over ZeroMQ. The script implements the Multithreaded Server pattern (http://zguide.zeromq.org/page:all#header-73), it successfully receives the request and dispatches it to one of its worker threads, performs the action, responds back to the server, and the server responds back to the client. Everything is done synchronously as the client needs to receive a success or failure response to its request.
As the number of users is growing into a few thousand, I am looking into potentially improving this. The first thing I looked at is the different patterns of ZeroMQ, and whether what I am using is optimal for my scenario. I've read the guide but I find it challenging to understand all the details and differences across patterns. I was looking for example at the Load Balancing Message Broker pattern (http://zguide.zeromq.org/page:all#header-73). It seems quite a bit more complicated to implement than what I am currently using, and if I understand things correctly, its advantages are:
Actual load balancing vs the round-robin task distribution that I currently have
Asynchronous requests/replies
Is that everything? Am I missing something? Given the description of my problem, and the synchronous requirement of it, what would you say is the best pattern to use? Lastly, how would the answer change, if I want to make my setup distributed (i.e. having the Apache server load balance the requests across different machines). I was thinking of doing that by simply creating yet another layer, based on the Multithreaded Server pattern, and have that layer bridge the communication between the web server and my workers.
Some thoughts about the subject...
Keep it simple
I would try to keep things simple and stick with "plain" ZeroMQ as long as possible. To increase performance, I would simply change your backend script to send requests out from a DEALER socket and move the request-handling code into its own program. Then you could run multiple worker servers on different machines to handle more requests.
I assume this was the approach you took:
I was thinking of doing that by simply creating yet another layer, based on the Multithreaded Server pattern, and have that layer bridge the communication between the web server and my workers.
The only problem here is that there is no request retry in the backend. If a worker fails to handle a given task, that task is lost forever. However, one could write the worker servers so that they finish all the requests they have received before shutting down. With this kind of setup it is possible to update backend workers without clients noticing any outage. It will not save requests that are lost if a server crashes.
I have the feeling that in common scenarios this kind of approach would be more than enough.
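If lost requests do turn out to matter, one simple mitigation is a retry on the requesting side. The sketch below assumes a hypothetical `send` callable that performs one request/reply round trip and raises `TimeoutError` when no reply arrives in time; how that timeout is implemented on top of the ZeroMQ socket is left out.

```python
def request_with_retry(send, request, attempts=3):
    """Call send(request) up to `attempts` times, retrying on TimeoutError.

    Returns the first successful reply, or re-raises the last timeout.
    Only safe when handling the same request twice is harmless.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        try:
            return send(request)
        except TimeoutError as exc:
            last_error = exc
    raise last_error
```

Because a worker may have processed the request before the reply was lost, this only suits requests that can be repeated safely.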
Mongrel2
Mongrel2 seems to handle quite a few of the things you have already implemented. It might be worthwhile to check it out. It probably does not completely solve your problems, but it provides tested infrastructure for distributing the workload. This could be used to deliver requests to multithreaded servers running on different machines.
Broker
One way to increase the robustness of the setup is a broker. In this scenario the broker's main role would be to provide robustness by implementing a queue for the requests. As I understand it, all the requests the workers handle are basically of the same type. If requests had different types, the broker could also do lookups to find the correct server for each request.
Using a queue provides a way to ensure that every request is handled by some worker even if worker servers crash. This does not come without a price. The broker is itself a single point of failure. If it crashes or is restarted, all queued messages could be lost.
These problems can be avoided, but that requires quite a lot of work: the requests could be persisted to disk, and the servers could be clustered. The need has to be weighed against the payoff. Does one want to spend time writing a message broker or the actual system?
If a message broker seems like a good idea, the time required to implement one can be reduced by using an existing product (like RabbitMQ). The downside is that it may come with a lot of unwanted features, and adding new things is not as straightforward as with a self-made broker.
Writing your own broker could amount to reinventing the wheel. Many brokers provide similar things: security, logging, a management interface and so on. It seems likely that these will eventually be needed in a home-made solution too. But if not, a single home-made broker which does one thing and does it well can be a good choice.
Even if a broker product is chosen, I think it is a good idea to hide the broker behind a ZeroMQ proxy: dedicated code that sends messages to and receives messages from the broker. Then no other part of the system has to know anything about the broker, and it can easily be replaced.
Using a broker is somewhat heavy on developer time. You need either time to implement the broker or time to get used to some product. I would avoid this route until it is clearly needed.
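To make the queueing role of a broker concrete, here is a minimal in-memory sketch. `RequestQueue` and its method names are hypothetical, and nothing is persisted, so it shares the single-point-of-failure weakness described above. It only shows how requests held by a crashed worker can be put back for another worker.

```python
from collections import deque


class RequestQueue:
    """Tracks pending and in-flight requests so failed work can be retried."""

    def __init__(self):
        self._pending = deque()
        self._in_flight = {}  # request_id -> (worker_id, payload)

    def submit(self, request_id, payload):
        self._pending.append((request_id, payload))

    def next_for(self, worker_id):
        # Hand the oldest pending request to a worker, or None if idle.
        if not self._pending:
            return None
        request_id, payload = self._pending.popleft()
        self._in_flight[request_id] = (worker_id, payload)
        return request_id, payload

    def ack(self, request_id):
        # The worker finished the request; forget it.
        self._in_flight.pop(request_id, None)

    def worker_failed(self, worker_id):
        # Put the failed worker's unfinished requests back at the front,
        # keeping their original order. Returns how many were requeued.
        lost = [(rid, payload)
                for rid, (wid, payload) in self._in_flight.items()
                if wid == worker_id]
        for rid, payload in reversed(lost):
            del self._in_flight[rid]
            self._pending.appendleft((rid, payload))
        return len(lost)
```

How a worker failure is detected (heartbeats, timeouts) is a separate design decision that this sketch leaves open.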
Some links
Comparison between broker and brokerless
RabbitMQ
Mongrel2

Resources