I am reading the documentation about multicasting, which says:
In RxJS observables are cold, or unicast by default. These operators can make an observable hot, or multicast, allowing side-effects to be shared among multiple subscribers.
I wonder whether all cold observables are unicast and all hot are multicast, or whether there can be any exceptional cases.
In the link provided by #lagoman (which I've read multiple times but still missed that point) I found an interesting explanation of the hot-multicast relationship:
² Hot observables are usually multicast, but they could be listening to a producer that only supports one listener at a time. The grounds for calling it “multicast” at that point are a little fuzzy.
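To make the distinction concrete, here is a minimal sketch in plain Python rather than RxJS. The names ColdObservable and Subject are illustrative stand-ins, not the RxJS API. It assumes a "cold" source re-runs its producer for every subscriber, while a "hot" one runs the producer once and shares the values.

```python
class ColdObservable:
    """Unicast: every subscribe() call runs the producer again."""

    def __init__(self, producer):
        self._producer = producer  # zero-argument callable returning an iterable

    def subscribe(self, observer):
        for value in self._producer():
            observer(value)


class Subject:
    """Multicast: each pushed value is delivered to every current subscriber."""

    def __init__(self):
        self._observers = []

    def subscribe(self, observer):
        self._observers.append(observer)

    def next(self, value):
        for observer in list(self._observers):
            observer(value)


side_effects = []


def producer():
    side_effects.append("producer started")  # the side effect we want to count
    return [1, 2, 3]


# Cold: two subscribers, so the producer (and its side effect) runs twice.
cold = ColdObservable(producer)
cold_a, cold_b = [], []
cold.subscribe(cold_a.append)
cold.subscribe(cold_b.append)
runs_for_cold = len(side_effects)

# Hot: both subscribers attach to the subject, and the producer runs once.
subject = Subject()
hot_a, hot_b = [], []
subject.subscribe(hot_a.append)
subject.subscribe(hot_b.append)
cold.subscribe(subject.next)
runs_for_hot = len(side_effects) - runs_for_cold
```

Both pairs of subscribers see the same values; the difference is only in how many times the producer's side effect happens.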
I use RabbitMQ with a microservice architecture. I use topic and direct exchanges for many of my use cases, and they work fine. However, I have a use case where I have to delete a record from the database. When I delete the record, several other services need to be called to maintain or delete the referenced records. I could achieve that by simply calling those services through a direct exchange, but I read that choreography is preferred over orchestration. That means I should implement the publish/subscribe pattern (a fanout exchange in RabbitMQ).
My question is: if I use the publish/subscribe pattern in a distributed system, how do I make sure that only one instance per service consumes each published message?
Your question doesn't deal so much with publish-subscribe, as it does with basic message processing. The fundamental issue is whether or not you can guarantee that an operation will be performed exactly one time. The short answer is that you probably want to use a direct exchange such that a message goes into one queue and is processed by one (of possibly many) consumers.
The long answer is that "exactly once" cannot be guaranteed, so you need to make this part of your design.
Background
It is best practice to have message processing be an idempotent operation. In fact, idempotency is a critical design assumption of almost any external interface (and I would argue it is equally-important in internal interfaces).
Additionally, you should be aware of the fact it is not possible to guarantee "exactly once" delivery. Mathematically, no such guarantee can be made. Instead, you can have one of two things (being mutually exclusive):
At most once delivery (0 <= n <= 1)
At least once delivery (1 <= n)
From the RabbitMQ documentation:
Use of acknowledgements guarantees at-least-once delivery. Without acknowledgements, message loss is possible during publish and consume operations and only at-most-once delivery is guaranteed.
Several things are happening when messages are published and consumed. Because of the asynchronous nature of message handling systems, and the AMQP protocol in particular, there is no way to guarantee exactly-once processing while still yielding the performance you would need from a messaging system (essentially, trying to ensure exactly-once would force everything through a serial process at the point of de-duplication).
Design Implications
Given the above, it is important that your design rely upon "at least once" delivery. For a delete operation, this involves re-writing the definition of that operation to be assertive rather than procedural (e.g. "Delete this" becomes "Ensure this does not exist."). The difference is that you describe the end-state rather than the process.
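As a rough illustration of the "assertive" delete, here is a small Python sketch. The in-memory dict and the function name ensure_absent are assumptions made for the example; in a real service this would be a database call.

```python
def ensure_absent(records: dict, record_id: str) -> None:
    """Idempotent delete: describe the end state ("this must not exist").

    Calling it again after the record is gone is a harmless no-op, so a
    redelivered message (at-least-once delivery) does no damage.
    """
    records.pop(record_id, None)


records = {"42": {"name": "to be deleted"}, "43": {"name": "keep me"}}

# Simulate the same "delete 42" message being delivered three times.
for _ in range(3):
    ensure_absent(records, "42")

# By contrast, the procedural form `del records["42"]` would raise KeyError
# on the second delivery.
```

After the loop only record "43" remains, no matter how many times the message was redelivered.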
I think you should have a separate queue for each service whose instances should be notified about the DB record deletion. The exchange puts a copy of the message in every queue. The instances of a service compete for access to that service's dedicated queue (only one of them gets each message).
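The layout described above can be simulated in memory. This is only a sketch of the routing idea, not RabbitMQ client code: FanoutExchange, the service names, and the round-robin choice of instance are all assumptions made for illustration.

```python
from collections import deque


class FanoutExchange:
    """Copies every published message into each bound queue."""

    def __init__(self):
        self.queues = {}

    def bind(self, queue_name):
        self.queues[queue_name] = deque()

    def publish(self, message):
        for queue in self.queues.values():
            queue.append(message)


exchange = FanoutExchange()
exchange.bind("billing")  # one queue per service (hypothetical names)
exchange.bind("search")

exchange.publish({"deleted_id": 7})
exchange.publish({"deleted_id": 8})

# Instances of the same service share that service's queue and compete for
# messages; here competition is modelled as simple round-robin.
instances = {"billing": ["billing-1", "billing-2"], "search": ["search-1"]}

handled = []
for service, queue in exchange.queues.items():
    turn = 0
    while queue:
        message = queue.popleft()  # removed from the queue: nobody else gets it
        worker = instances[service][turn % len(instances[service])]
        handled.append((worker, message["deleted_id"]))
        turn += 1
```

Every service sees every deletion, but within a service each deletion is handled by exactly one instance.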
What are the techniques, in general, for writing a system that receives events from the outside world? E.g. a Bluetooth application can receive many low-level events from HCI. It has to track state, which is very often temporary, and allow or disallow operations based on the current state.
Are there specific techniques that help deal with such complexity? Is there any practical literature that describes some of them in more detail?
Well, the observer pattern is the primary and perhaps only pattern for dealing with such cases. There are variations on it, such as message-driven collaborations, pub-sub, and topic- or tag-based collaborations, but most of them reuse the idea behind the observer pattern.
Now, coming to complexity. Normally event observers and publishers are stateless in nature, i.e. they receive an event and they either propagate it or consume it. Propagation is easy, but consumption may be tricky, as the meaning of consumption may differ based on the target and the event type.
Then there are complexities related to the performance of producing and consuming events. Here again, a good design would tend to keep events independent of each other. That means that two clicks of a button or two file writes do not depend on each other in any way. The same should hold for the consumption of events: the consumption process should not depend on the order in which events arrive.
A good event observer design leverages stateless producers and consumers, domain-driven event objects (an event object that knows where it is supposed to go), and concurrency in producing and consuming to the fullest.
I am looking at using MassTransit and need to selectively send messages to consumers at the end of unreliable and slow network links (they are in the same WAN but use a slow and expensive cellular link).
I am expecting a fanout of 1 to 200, where the sites with the lowest volume of messages and the least reliable / most expensive links need to ignore the potentially high amount of message traffic that other consumers will see.
I have looked at using the Selective consumer interface, but this seems to imply that the message is always sent to all consumers and then discarded if it doesn't match the predicate. This overhead is not acceptable.
Without using an endpoint factory and manually managing URI endpoints to do a Send(), is there a nice way to do this using subscriptions?
Simple answer: nope.
You do have a few options though. Is it just routing based upon load/processing? You could use competing consumers to do load balancing. All the endpoints read off the same queue (but they must be the same consumers on every process reading from the queue) and just pick up the next one. If you're slow, you just pick off fewer messages. (You can only use competing consumers with RabbitMQ).
For MSMQ there's a distributor that was built for load balancing. You could look at rebuilding that on top of RabbitMQ if that's your transport. It's not super complicated, but it would take some effort to do.
Other than that, I think you're likely down to writing something from scratch. It's not really pub/sub any more. So it falls outside MT's wheelhouse.
I am not clear on the idea of a Queue. It seems that this term is ambiguous or at least I am confused about it.
While it seems that the most common explanation of a queue (e.g. in wikipedia) is that it is an Abstract Data Type that adheres to the FIFO principle, in practice this term appears to have a broader meaning.
For example, we have
Priority Queues, where each item is retrieved according to a priority,
we have a stack, which is also a form of inverse queue (LIFO),
we have message queues, which seem to be just a list of items with no ordering, thereby classifying a simple list as a queue, etc.
So could someone please help me out here on why exactly a queue has so many different meanings?
A queue is inherently a data structure following the FIFO principle as its default nature.
Let us treat this queue as a queue in our natural day-to-day lives. Take an example of a queue on the railway station for purchasing tickets.
Normal queue: The person standing front-most in the queue gets the ticket, and any new person arriving stands at the end of the queue, waiting for his turn to get a ticket.
Priority queue: Suppose you are a VIP standing in the middle of that queue. The ticket vendor immediately notices you and calls you to the front of the queue to get your tickets, even though it's not your turn to purchase. Had you not been important, the queue would have kept playing its usual role, but as soon as any element is considered more important than the others, it's picked up, irrespective of its position in the queue. Otherwise, the default nature of the queue remains the same.
Stack: Let's not confuse it with the queue at all. The purpose of the stack is inherently different from that of a queue. Take an example of dishes washed and kept in your kitchen, where the last dish washed is the first one to be picked for serving. So, stack and queue have a different role to play in different situations, and should not be confused with each other.
Message queue: As is the case with priority queue, the default nature of this queue is that the message that comes first is read first, while the upcoming messages line up in the queue waiting for their turn, unless a message is considered more important than the other and is called to the front of the queue before its usual turn.
So, the default nature of any type of queue remains the same: it continues to follow the FIFO principle unless it's made to do otherwise in special circumstances.
Hope it helps
In general, a queue models a waiting area where items enter and are eventually selected and removed. However, different queues can have different scheduling policies such as First-In-First-Out (FIFO), Last-In-First-Out (LIFO), Priority, or Random. For example, queueing theory addresses all of these as queues.
However, in computer science/programming, we typically use the word "queue" to refer specifically to FIFO queues, and use the other words (stack, priority queue, etc.) for the other scheduling policies. In practice, you should assume FIFO when you hear the word queue, but don't completely rule out the possibility that the speaker might be using the word more generally.
As an aside, similar issues arise with the word "heap" which, depending on context, might refer to a specific implementation of priority queues or to priority queues in general (or, in a mostly unrelated meaning, to an area of memory used for dynamic allocation).
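The scheduling policies mentioned above can be compared side by side with the Python standard library. This is just a sketch; the items and priorities are made up for the example.

```python
from collections import deque
import heapq

arrivals = ["a", "b", "c"]  # items in the order they entered the waiting area

# FIFO: what programmers usually mean by "queue".
fifo = deque(arrivals)
fifo_order = [fifo.popleft() for _ in range(len(arrivals))]

# LIFO: a stack; the most recent arrival leaves first.
lifo = list(arrivals)
lifo_order = [lifo.pop() for _ in range(len(arrivals))]

# Priority: the lowest priority number leaves first, regardless of arrival order.
# heapq implements a binary heap, one common way to build a priority queue.
heap = []
for priority, item in [(2, "a"), (1, "b"), (3, "c")]:
    heapq.heappush(heap, (priority, item))
priority_order = [heapq.heappop(heap)[1] for _ in range(len(arrivals))]
```

The same three items leave in three different orders: a, b, c for FIFO; c, b, a for LIFO; and b, a, c for the priority policy.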
Priority Queue: It's not a FIFO queue. Take a look: http://docs.oracle.com/javase/7/docs/api/java/util/PriorityQueue.html it implements the Queue interface, but it orders elements by priority rather than by arrival.
Stack: It's not a queue. Take a look: http://docs.oracle.com/javase/7/docs/api/java/util/Stack.html it does not implement the Queue interface.
Message queue: I do not know what it is.
Queue: Queue has only one meaning: whoever comes first also gets served first.
Queue ADT: It is an interface, meaning it has a bunch of functions. The most common ones: add (adds to the end of the line) and remove (removes from the beginning of the line). http://docs.oracle.com/javase/7/docs/api/java/util/Queue.html
Is there a standard approach for deduping parallel event streams? Before I attempt to reinvent the wheel, I want to know if this problem has some known approaches.
My client component will be communicating with two servers. Each one is providing a near real-time event stream (~1 second). The events may occasionally be out of order. Assume I can uniquely identify the events. I need to send a single stream of events to the consuming code at the same near real-time performance.
A lot has been written about this kind of problem. Here's a foundational paper, by Leslie Lamport:
http://research.microsoft.com/en-us/um/people/lamport/pubs/pubs.html#time-clocks
The Wikipedia article on Operational Transformation theory is a perfectly good starting point for further research:
http://en.wikipedia.org/wiki/Operational_transformation
As for your problem, you'll have to choose some arbitrary weight to measure the cost of delay vs the cost of dropped events. You can maintain two priority queues, time-ordered, where incoming events go. You'd do a merge-and on the heads of the two queues with some delay (to allow for out-of-order events), and throw away events that happened "before" the timestamp of whatever event you last sent. If that's no better than what you had in mind already, well, at least you get to read that cool Lamport paper!
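A rough Python sketch of that idea follows. The class name, the (timestamp, event_id) event shape, and the fixed delay are assumptions made for illustration; the set of seen IDs also grows without bound here, which a real implementation would need to prune.

```python
import heapq


class DedupingMerger:
    """Merges two time-ordered priority queues and drops duplicates."""

    def __init__(self, delay):
        self.delay = delay  # how long to hold events to absorb reordering
        self.queues = {"a": [], "b": []}  # one heap per server
        self.seen = set()  # event IDs already sent downstream
        self.last_sent = float("-inf")  # timestamp of the last event sent

    def push(self, source, timestamp, event_id):
        heapq.heappush(self.queues[source], (timestamp, event_id))

    def pop_ready(self, now):
        """Returns events older than now - delay, in timestamp order."""
        cutoff = now - self.delay
        out = []
        while True:
            ready = [q for q in self.queues.values() if q and q[0][0] <= cutoff]
            if not ready:
                return out
            # Take whichever queue currently has the earliest head.
            timestamp, event_id = heapq.heappop(min(ready, key=lambda q: q[0]))
            if event_id in self.seen or timestamp < self.last_sent:
                continue  # duplicate, or too late to be placed in order
            self.seen.add(event_id)
            self.last_sent = timestamp
            out.append((timestamp, event_id))


merger = DedupingMerger(delay=1)
merger.push("a", 1, "e1")
merger.push("a", 3, "e3")
merger.push("b", 1, "e1")  # same event, reported by the second server
merger.push("b", 2, "e2")
first_batch = merger.pop_ready(now=4)

merger.push("b", 2.5, "late")  # arrives after an event at t=3 was already sent
second_batch = merger.pop_ready(now=10)
```

Here first_batch contains e1, e2 and e3 once each, in timestamp order, and the late event is discarded. A larger delay drops fewer late events at the cost of latency, which is the trade-off described above.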
I think that the optimization might be OS-specific. From the task as you described it, I think of two threads consuming incoming data and appending it to a common stream, with access guarded by mutexes. Both Linux and Win32 have mutex-like primitives, but they may perform slowly if your data rate is really high. In that case I'd operate on blocks of data, which allows the mutex to be taken less often. Of course there's also a main thread that consumes the data, and it accesses the stream with the mutex too.
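A minimal sketch of the block-wise locking idea, using Python's threading module rather than OS-level mutex calls. The function name, the source names, and the block size are assumptions for the example.

```python
import threading


def merge_in_blocks(sources, block_size):
    """Runs one thread per source; each appends to a shared list in blocks."""
    shared_stream = []
    lock = threading.Lock()
    lock_acquisitions = [0]  # counted under the lock, for illustration only

    def worker(name, events):
        block = []

        def flush():
            # Take the mutex once per block instead of once per event.
            with lock:
                shared_stream.extend(block)
                lock_acquisitions[0] += 1
            block.clear()

        for event in events:
            block.append((name, event))
            if len(block) >= block_size:
                flush()
        if block:
            flush()  # remaining partial block

    threads = [
        threading.Thread(target=worker, args=(name, events))
        for name, events in sources.items()
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return shared_stream, lock_acquisitions[0]


stream, acquisitions = merge_in_blocks(
    {"server-a": range(1000), "server-b": range(1000)}, block_size=100
)
```

With 2000 events and a block size of 100, the lock is taken 20 times instead of 2000. The order in which blocks from the two servers interleave is not deterministic.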