Is there a standard approach for deduping parallel event streams? Before I attempt to reinvent the wheel, I want to know whether this problem has any known approaches.
My client component will be communicating with two servers. Each one is providing a near real-time event stream (~1 second). The events may occasionally be out of order. Assume I can uniquely identify the events. I need to send a single stream of events to the consuming code at the same near real-time performance.
A lot has been written about this kind of problem. Here's a foundational paper, by Leslie Lamport:
http://research.microsoft.com/en-us/um/people/lamport/pubs/pubs.html#time-clocks
The Wikipedia article on Operational Transformation theory is a perfectly good starting point for further research:
http://en.wikipedia.org/wiki/Operational_transformation
As for your problem, you'll have to choose some arbitrary weighting of the cost of delay versus the cost of dropped events. You can maintain two time-ordered priority queues into which incoming events go. You'd do a merge on the heads of the two queues with some delay (to allow for out-of-order events), and throw away events that happened "before" the timestamp of whatever event you last sent. If that's no better than what you had in mind already, well, at least you get to read that cool Lamport paper!
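A minimal sketch of that delayed merge in Java, assuming millisecond timestamps and string event IDs; the Event record, the 500 ms hold-back, and the unbounded `seen` set are illustrative choices, not anything taken from the question:

```java
import java.util.*;

/** Sketch: merge two time-ordered event queues with a small hold-back delay,
 *  dropping duplicates and anything older than the last emitted event. */
class DelayedMerge {
    static final long HOLD_BACK_MS = 500;   // tolerance for out-of-order arrival

    record Event(String id, long timestampMs, Object payload) {}

    private final PriorityQueue<Event> a =
            new PriorityQueue<>(Comparator.comparingLong(Event::timestampMs));
    private final PriorityQueue<Event> b =
            new PriorityQueue<>(Comparator.comparingLong(Event::timestampMs));
    private final Set<String> seen = new HashSet<>();   // a real impl would expire old IDs
    private long lastEmittedMs = Long.MIN_VALUE;

    void offerA(Event e) { a.add(e); }
    void offerB(Event e) { b.add(e); }

    /** Call periodically; emits events that are old enough to be safe to release. */
    List<Event> drain(long nowMs) {
        List<Event> out = new ArrayList<>();
        while (true) {
            PriorityQueue<Event> src = pickOlderHead();
            if (src == null) break;
            Event head = src.peek();
            if (nowMs - head.timestampMs() < HOLD_BACK_MS) break;  // still holding back
            src.poll();
            if (head.timestampMs() < lastEmittedMs) continue;      // arrived too late, drop
            if (!seen.add(head.id())) continue;                    // duplicate, drop
            lastEmittedMs = head.timestampMs();
            out.add(head);
        }
        return out;
    }

    private PriorityQueue<Event> pickOlderHead() {
        if (a.isEmpty()) return b.isEmpty() ? null : b;
        if (b.isEmpty()) return a;
        return a.peek().timestampMs() <= b.peek().timestampMs() ? a : b;
    }
}
```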
I think the optimization might be OS-specific. From the task as you described it, I'd think of two threads consuming the incoming data and appending it to the common stream, with access guarded by a mutex. Both Linux and Win32 have mutex primitives, but they can perform poorly if the data rate is really high. In that case I'd operate on blocks of data, which lets you take the mutex less often. There is, of course, also a main thread that consumes the data, and it too accesses the stream under the mutex.
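A minimal sketch of that block-wise idea in Java; the block size of 64 and the String payload type are illustrative assumptions:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

/** Sketch: producers accumulate a local block and take the shared lock
 *  once per block instead of once per event. */
class BlockAppender {
    private static final int BLOCK_SIZE = 64;
    private final ReentrantLock lock = new ReentrantLock();
    private final List<String> commonStream = new ArrayList<>();

    /** Run by each producer thread. */
    void producerLoop(Iterable<String> incoming) {
        List<String> block = new ArrayList<>(BLOCK_SIZE);
        for (String event : incoming) {
            block.add(event);
            if (block.size() == BLOCK_SIZE) {
                appendBlock(block);
                block.clear();
            }
        }
        if (!block.isEmpty()) appendBlock(block);   // flush the tail
    }

    private void appendBlock(List<String> block) {
        lock.lock();
        try {
            commonStream.addAll(block);
        } finally {
            lock.unlock();
        }
    }
}
```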
Using a queue and a first-in-first-out algorithm seems like the standard way of dealing with requests to servers/databases/services. But could other data structures and algorithms be used for dealing with large quantities of requests?
There is an article at http://queue.acm.org/detail.cfm?id=2991130 "Scaling Synchronization in Multicore Programs" which goes into detail about reducing locking costs when multiple cores are trying to update the same FIFO queue.
If different users or different requests have different priorities then a FIFO queue won't do, and you need something that takes account of the priority. I have often seen ordered maps used for this, such as TreeMap and std::map or std::multimap. It's not the use case these are designed for, but people know how to use them, and they are typically highly optimised, so they can be faster than most people's attempts at implementing a more specialised data structure.
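For example, a minimal sketch of a priority "queue" built on a TreeMap, as suggested above; the integer-priority scheme and class names are illustrative:

```java
import java.util.*;

/** Sketch: lower priority value is served first; ties keep FIFO order. */
class PriorityRequestQueue<R> {
    // Map from priority to the FIFO of requests waiting at that priority.
    private final TreeMap<Integer, ArrayDeque<R>> byPriority = new TreeMap<>();

    void enqueue(int priority, R request) {
        byPriority.computeIfAbsent(priority, p -> new ArrayDeque<>()).add(request);
    }

    /** Returns the next request to serve, or null if the queue is empty. */
    R dequeue() {
        Map.Entry<Integer, ArrayDeque<R>> first = byPriority.firstEntry();
        if (first == null) return null;
        R request = first.getValue().poll();
        if (first.getValue().isEmpty()) byPriority.remove(first.getKey());
        return request;
    }
}
```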
What are the techniques, in general, for writing a system that receives events from the outside world? E.g. a Bluetooth application can receive many low-level events from the HCI. It has to track state, which is very often temporary, and allow or disallow operations based on the current state.
Are there specific techniques that help deal with such complexity? Is there any practical literature describing some of them in more detail?
Well, the observer pattern is the primary, and perhaps only, pattern for dealing with such cases. There are variations on it, like message-driven collaborations, pub-sub, and topic- or tag-based collaborations, but most of them reuse the core idea of the observer pattern.
Now, coming to complexity: normally event observers and publishers are stateless in nature, i.e. they receive an event and either propagate it or consume it. Propagation is easy, but consumption may be tricky, as the meaning of consumption can differ based on the target and the event type.
Then there are complexities related to the performance of producing and consuming events. Here again, a good design tends to keep events mutually exclusive of each other; that means two clicks of a button or two file writes are not dependent on each other in any way. The same should hold for consumption: the consumption process should not depend on the sequence in which events arrive.
A good event observer design leverages stateless producers and consumers, domain-driven event objects (an event object that knows where it is supposed to go), and as much concurrency in producing and consuming as possible.
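A minimal sketch of that kind of observer setup in Java; the Event/Observer/EventBus names and the fixed thread pool are illustrative assumptions, not anything from the question:

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Sketch: stateless observers, events that carry what they need,
 *  and concurrent dispatch with no ordering guarantees. */
interface Event { String topic(); }

interface Observer { void onEvent(Event e); }

class EventBus {
    private final List<Observer> observers = new CopyOnWriteArrayList<>();
    private final ExecutorService pool = Executors.newFixedThreadPool(4);

    void subscribe(Observer o) { observers.add(o); }

    /** Dispatch concurrently; observers must not depend on arrival order. */
    void publish(Event e) {
        for (Observer o : observers) {
            pool.submit(() -> o.onEvent(e));
        }
    }
}
```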
For a data-mining algorithm I am currently developing using Akka, I was wondering if Akka implements performance optimizations of the messages that are sent.
For instance, if I have an Actor that emits a very large number of messages to the same other Actor, is it good to encapsulate a set of messages into another large message? Or does Akka have some sort of buffer itself, so that not one message but many messages are transferred over the network at once?
I am asking this question because the algorithm is supposed to be executed remotely on a cluster where transfer performance is important and I currently have no option to just do benchmarks myself.
For messages passed in Akka on the same machine, I don't think it matters much whether you use small messages or aggregate them into a single message. The additional overhead of many calls, versus having to loop while processing the aggregate, is minimal, I think.
I would prefer using small messages because it keeps the system simpler.
However, when sending messages over the network, Akka remoting adds per-message overhead for serialization and for putting each message on the wire. Therefore you might choose here to aggregate several messages into a single message.
However, this also depends on your use case. Buffering implies waiting for more messages until there are enough (or a timeout has occurred). If you cannot wait, e.g. because you need fast responses, then you still need to send each message individually.
I don't think there is a standard Akka actor available that aggregates messages. Maybe a special kind of router could be applied that does the buffering.
Or you might have a look at Akka Streams. That does support buffering of messages.
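For example, a minimal sketch using Akka Streams' groupedWithin to batch elements by count or time, assuming Akka 2.6's Java stream API; the batch size and timeout are arbitrary:

```java
import akka.actor.ActorSystem;
import akka.stream.javadsl.Sink;
import akka.stream.javadsl.Source;
import java.time.Duration;

/** Sketch: batch up to 100 elements, or whatever arrived within 50 ms,
 *  before handing them downstream (e.g. to a remote-sending stage). */
public class GroupedWithinExample {
    public static void main(String[] args) {
        ActorSystem system = ActorSystem.create("batching");

        Source.range(1, 10_000)
              .groupedWithin(100, Duration.ofMillis(50))
              .runWith(Sink.foreach(batch ->
                      System.out.println("sending batch of " + batch.size())), system);
    }
}
```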
I am using the Birman-Schiper-Stephenson protocol in a distributed system, with the current assumption that the peer set of any node doesn't change. As the protocol dictates, messages that arrive at a node out of causal order have to be put in a 'delay queue'. My problem is with the organisation of the delay queue, where we must impose some kind of order on the messages. After deciding the order, we will have to design a 'wake-up' protocol which efficiently searches the queue after the current timestamp is modified, to find out whether one of the delayed messages can be 'woken up' and accepted.
I was thinking of segregating the delayed messages into bins based on where their vector timestamps differ from this node's timestamp. But the number of bins could be very large, and maintaining them won't be efficient.
Please suggest some designs for such a queue(s).
Sorry about the delay -- didn't see your question until now. Anyhow, if you look at Isis2.codeplex.com you'll see that in Isis2, I have a causalsend implementation that employs the same vector timestamp scheme we described in the BSS paper. What I do is to keep my messages in a partial order, sorted by VT, and then when a delivery occurs I can look at the delayed queue and deliver off the front of the queue until I find something that isn't deliverable. Everything behind it will be undeliverable too.
But in fact there is a deeper insight here: you actually never want to allow the queue of delayed messages to get very long. If the queue gets longer than a few messages (say, 50 or 100) you run into the problem that the guy with the queue could be holding quite a few bytes of data and may start paging or otherwise running slowly. So it becomes a self-perpetuating cycle in which because he has a queue, he is very likely to be dropping messages and hence enqueuing more and more. Plus in any case from his point of view, the urgent thing is to recover that missed message that caused the others to be out of order.
What this adds up to is that you need a flow control scheme in which the amount of pending asynchronous stuff is kept small. But once you know the queue is small, searching every single element won't be very costly! So this deeper perspective says flow control is needed no matter what, and then because of flow control (if you have a flow control scheme that works) the queue is small, and because the queue is small, the search won't be costly!
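A minimal sketch, in Java, of the VT-sorted delay queue described above, assuming integer vector clocks indexed by process and the standard BSS deliverability test; the class and field names are illustrative and not taken from Isis2:

```java
import java.util.*;

/** One message waiting in the delay queue. */
class DelayedMessage {
    final int sender;        // index of the sending process
    final int[] vt;          // vector timestamp carried by the message
    final Object payload;
    DelayedMessage(int sender, int[] vt, Object payload) {
        this.sender = sender; this.vt = vt; this.payload = payload;
    }
}

class CausalDelayQueue {
    private final int[] local;                           // this node's vector clock
    private final List<DelayedMessage> delayed = new ArrayList<>();

    CausalDelayQueue(int nProcesses) { local = new int[nProcesses]; }

    /** BSS deliverability: next-in-sequence from the sender, with no causal gaps. */
    private boolean deliverable(DelayedMessage m) {
        if (m.vt[m.sender] != local[m.sender] + 1) return false;
        for (int k = 0; k < local.length; k++) {
            if (k != m.sender && m.vt[k] > local[k]) return false;
        }
        return true;
    }

    void receive(DelayedMessage m, java.util.function.Consumer<Object> deliver) {
        delayed.add(m);
        // Keep the queue roughly VT-ordered, as in the answer above; if the queue is
        // kept small by flow control, rescanning it is cheap anyway.
        delayed.sort(Comparator.comparingInt((DelayedMessage x) -> x.vt[x.sender]));
        boolean progress = true;
        while (progress) {
            progress = false;
            Iterator<DelayedMessage> it = delayed.iterator();
            while (it.hasNext()) {
                DelayedMessage d = it.next();
                if (deliverable(d)) {
                    local[d.sender]++;          // advance the local clock
                    deliver.accept(d.payload);  // hand off to the application
                    it.remove();
                    progress = true;            // one delivery may unblock others
                    break;
                }
            }
        }
    }
}
```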
I'm looking for a distributed message queue that will support millions of queues, with each queue handling tens of messages per second.
The messages will be small (tens of bytes), and I don't expect the queues to get very long--on the order of tens of messages per queue at maximum, but when the system is humming along, the queues should stay fairly empty.
I'm not sure how many nodes to expect in the cluster--probably depends on the specific solution, but if I had to guess, I would say ten nodes. I would prefer that queues were relatively resilient to individual node failures within the cluster, but a few lost messages here and there won't make me lose sleep.
Does such a message queue exist? Seems like most of the field is optimized toward handling hundreds of queues with high throughput. But what is SQS built on? Surely not magic.
Update:
By request, it may indeed help to shed light on my problem domain. (I'd left details out before so as not to muddy the waters.) I'm experimenting with distributed cellular automata, with an initial target of a million cells in simulation. In some CA models, it's useful to add an event model, so that a cell can send events to its neighbors. Hence, a million queues, each with one consumer and 8 or so producers.
Costs are a concern for now, as I'm funding the experiments myself. (Thus Amazon's SQS is probably out of reach.)
From your description, it looks like OMG's Data Distribution Service could be a good fit. It is related to message queueing technologies, but I would rather call it a distributed data management infrastructure. It is completely distributed and supports advanced features that give you a lot of control over how the data is distributed, by means of a rich set of Quality of Service settings.
Not knowing much about your problem, I could guess what an approach might be. DDS is about distributing the state of strongly-typed data items, as structures with typed attributes. You could create a data type describing the state of an automaton. One of its attributes could be an ID uniquely identifying the automaton in the system. If possible, that would be assigned according to a scheme such that every automaton knows what the IDs of its neighbors are (if they are present). Each automaton would publish its state as needed, resulting in a distributed data space containing the current state of all automatons. DDS supports so-called partitioning of that data space. If you took advantage of that, then each of the nodes in your system would be responsible for a well-defined subset of all automatons. Communication over the wire would only happen for those automatons neighboring a different partition. Since automatons know the IDs of their neighbors, they would be able to query the data space for the states of the automatons they are interested in.
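As one illustration of such an ID scheme (purely hypothetical, not part of DDS): if the cells form a 2D grid, the ID can encode the grid position, so each automaton can compute its neighbors' IDs directly:

```java
/** Hypothetical ID scheme for a 2D grid of cells (Moore neighbourhood), so
 *  every automaton can compute its neighbours' IDs without a lookup service.
 *  WIDTH/HEIGHT and the wrap-around (torus) choice are illustrative. */
class CellIds {
    static final int WIDTH = 1000, HEIGHT = 1000;   // one million cells

    static int id(int x, int y) { return y * WIDTH + x; }

    /** The 8 surrounding cell IDs, wrapping at the edges. */
    static int[] neighbors(int id) {
        int x = id % WIDTH, y = id / WIDTH;
        int[] out = new int[8];
        int i = 0;
        for (int dy = -1; dy <= 1; dy++) {
            for (int dx = -1; dx <= 1; dx++) {
                if (dx == 0 && dy == 0) continue;
                int nx = Math.floorMod(x + dx, WIDTH);
                int ny = Math.floorMod(y + dy, HEIGHT);
                out[i++] = id(nx, ny);
            }
        }
        return out;
    }
}
```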
It is a bit hard to explain without a whiteboard, but the end result would be a single instance (instances are a sort of very lightweight message queue) for most automatons, and two or three instances for those automatons at the border of a partition. If you had ten nodes and one million automata, then each node would have to hold the administration for approximately a hundred thousand automata. I have seen systems of that scale, and larger, built with DDS, with tens of updates per second for each instance. The nice thing is that this technology scales well with the number of nodes, so you could bring down the resource load per node by adding more nodes.
If this is a research project, then you might even be able to use a commercial product without charge. Just google on dds research license.