Preserving MessageID in Pub-sub [duplicate] - ibm-mq

The JMS 2.0 specification says:
"The JMSMessageID header field contains a value that uniquely
identifies each message sent by a provider."
...and...
"The exact scope of uniqueness is provider defined. It should at least
cover all messages for a specific installation of a provider where an
installation is some connected set of message routers."
The specification does not explicitly state that the JMSMessageID returned from the publish API call must match the one present in the message when it is consumed. The discussion in the spec about moving the JMSMessageID to the JMSCorrelationID when replying to a request implies that the two would be the same. If the message ID was changed between publication and consumption, this style of request/reply would fail.
Certainly in the unified domain model of JMS 1.1 and now 2.0, it would not make sense for the behavior of the JMSMessageID to change depending on whether the destination is a queue or a topic. Under the unified model, one would expect all destinations to act alike in this regard.
Also, if "provider" as used in the first paragraph refers to the thing that is sending messages, then a publication that fanned out to 10 identical messages, with identical JMSMessageID values, would meet the spec since uniqueness is measured at the sending side.
Unfortunately, the specification liberally switches between using the term "provider" to describe the thing sending messages versus using it to describe the vendor of the JMS transport. This is evident in the two quoted passages above. This ambiguity doesn't help matters any.
At least one implementation (IBM's MQ) takes the approach that a publication fanning out to 10 messages has created 10 unique, new messages, and therefore each of these has a unique JMSMessageID value. This is arguably consistent with the second quoted passage which requires uniqueness scoped to the provider, where "provider" appears to refer to the vendor implementation and not the thing sending messages.
It is my belief that when a published message fans out to multiple subscribers the correct behavior would be that the JMSMessageID would be preserved in each instance of the message so that replies can be correlated as expected. In other words, I believe IBM's implementation to be non-compliant. Since the specification is ambiguous on the matter, I'm looking for an authoritative source which either states outright or strongly implies the behavior as intended by the spec, one way or the other. Depending on the response, I'll either stand down, or else raise the issue with IBM as a compliance defect.

The term "provider" here is simply a reference to the specific messaging product being used, and covers both client-side and server-side components. To avoid confusion, I'll use the term "JMS product vendor" here.
The purpose of the JMS specification is to define a Java API implemented by that messaging product. It uses loose terms like "provider" because the JMS spec does not define how the product is architected and is trying to avoid suggesting how the implementation should be shared between client-side and server-side components, or even whether there is a server (or cluster of servers) at all. You'll notice the spec never (well, almost never) says "the server does this" or "the server does that".
The sentence about the "exact scope of uniqueness" is there to make it easy for the JMS product vendor to implement the code that generates JMSMessageID values. It's saying that the code that generates JMSMessageID values doesn't need to worry about ensuring that the values generated are unique across the entire universe. It's sufficient to ensure that they are unique to that particular product installation.
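As a rough illustration of what "unique to that particular product installation" could mean in practice, here is a minimal sketch of an ID generator. The class name, the `installationId` value, and the counter scheme are all assumptions made for illustration; real providers choose their own format. The only part taken from the JMS specification is that message IDs start with the `ID:` prefix.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical generator: IDs are unique within one installation only,
// not across the entire universe.
final class InstallationScopedIdGenerator {
    private final String installationId;               // assumed unique per installation
    private final AtomicLong sequence = new AtomicLong();

    InstallationScopedIdGenerator(String installationId) {
        this.installationId = installationId;
    }

    // JMS message IDs start with "ID:"; the remainder is provider-defined.
    String nextId() {
        return "ID:" + installationId + "-" + sequence.incrementAndGet();
    }
}
```

Two generators constructed with the same `installationId` would collide, which is why the scope of uniqueness has to be defined by whoever controls the installation.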
You say that "The specification does not explicitly state that the JMSMessageID returned from the publish API call must match the one present in the message when it is consumed."
I think this is stated in Section 4.4.11 "How message header values are set". This states that the JMSMessageID is set by the "JMS provider send method". The same section goes on to say that "Message header fields that are defined as being set by the 'JMS provider send method' will be available on the sending client as well as on the receiving client."
This means that after the call to send() or publish() has returned, the sending application can use the method getJMSMessageID() to find the message ID that was assigned to that message. When this message is received, the receiving application can use the same method, and get the same value.
Each message sent to a topic is delivered to every subscriber on that topic. These subscribers will receive a separate copy of the same message, with the same body, properties and headers, including JMSMessageID value.
Feel free to argue; the JMS spec is not free of ambiguities.

I think the issue here is less about when the JMSMessageID field is set on a published message, and more about what happens to that message when it is processed within the JMS provider.
As stated in T.Rob's and Nigel's posts, section 3.4.3 of the JMS 2.0 specification states:
"The JMSMessageID header field contains a value that uniquely
identifies each message sent by a provider."
and also:
"A JMSMessageID is a String value which should function as a unique
key for identifying messages in a historical repository. The exact
scope of uniqueness is provider defined. It should at least cover all
messages for a specific installation of a provider where an
installation is some connected set of message routers."
That is to say, two or more messages, even if they contain the same data, ought to have different JMSMessageID values if they constitute different messages within a repository.
The spec also states, in section 4.2.1 that,
"A topic can be thought of as a mini message broker that gathers and
distributes messages addressed to it. By relying on the topic as an
intermediary, message publishers are kept independent of subscribers
and vice versa."
This would imply that the intention of the spec is that, when a message is sent to a Topic, the Topic can do some work on the message, including creating multiple copies of the message (or, more specifically, creating multiple messages with the same data that are considered separate within the provider's repository).
Finally, section 4.2.2 states:
"A subscription will receive a copy of every message that is sent to
the topic after the subscription is created, ... Each copy of the
message is treated as a completely separate message. Work done on one
copy has no effect on any other; acknowledging one does not
acknowledge any other; one message may be delivered immediately, while
another waits for its consumer to process messages ahead of it."
Putting these passages together, the spec can be read as saying:
When a message is sent to a Topic, that Topic can create a copy of the message for each current subscription.
The copies of the message created when sending to a Topic can be considered as completely separate messages.
Because separate JMS messages are uniquely identified by their JMSMessageID field, each separate subscription message should have a different JMSMessageID.
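The practical difference between the two readings debated in this thread can be shown with a toy model. The sketch below is plain Java, not the JMS API: `ToyMessage`, `ToyTopic`, and `CorrelationDemo` are hypothetical names. It assumes each responder copies the message ID it received into the reply's correlation ID, and counts how many replies the requester can match against the ID it saw at publish time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Toy model only: none of these types are part of the JMS API.
final class ToyMessage {
    final String messageId;
    final String correlationId; // null on requests
    final String body;

    ToyMessage(String messageId, String correlationId, String body) {
        this.messageId = messageId;
        this.correlationId = correlationId;
        this.body = body;
    }
}

final class ToyTopic {
    private final AtomicLong seq = new AtomicLong();
    private final int subscriberCount;
    private final boolean preserveId;

    ToyTopic(int subscriberCount, boolean preserveId) {
        this.subscriberCount = subscriberCount;
        this.preserveId = preserveId;
    }

    String newId() {
        return "ID:" + seq.incrementAndGet();
    }

    // Returns one copy per subscription, either keeping the published ID
    // or assigning a fresh ID to every copy.
    List<ToyMessage> fanOut(ToyMessage published) {
        List<ToyMessage> copies = new ArrayList<>();
        for (int i = 0; i < subscriberCount; i++) {
            String id = preserveId ? published.messageId : newId();
            copies.add(new ToyMessage(id, null, published.body));
        }
        return copies;
    }
}

final class CorrelationDemo {
    // Counts replies whose correlation ID equals the ID seen by the publisher.
    static int matchedReplies(int subscribers, boolean preserveId) {
        ToyTopic topic = new ToyTopic(subscribers, preserveId);
        ToyMessage request = new ToyMessage(topic.newId(), null, "price?");
        int matched = 0;
        for (ToyMessage copy : topic.fanOut(request)) {
            // Responder moves the received message ID into the correlation ID.
            ToyMessage reply = new ToyMessage(topic.newId(), copy.messageId, "42");
            if (request.messageId.equals(reply.correlationId)) {
                matched++;
            }
        }
        return matched;
    }

    public static void main(String[] args) {
        System.out.println("preserved IDs: " + matchedReplies(3, true));  // 3
        System.out.println("fresh IDs:     " + matchedReplies(3, false)); // 0
    }
}
```

With per-copy IDs, the requester in this model cannot match any reply. One possible workaround, offered here as an assumption rather than a recommendation from the thread, is for the requester to set its own correlation value before publishing and for responders to echo that value instead of the message ID.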
To pick up Nigel's last sentence: the JMS specification isn't free of ambiguities. This is very true. Vendors and customers have previously worked around such issues, and work does take place in the expert group to clarify them, provide guidance, and suggest improvements to the compliance tests. Based on the understanding outlined above, and the tests within the JMS 2.0 Compliance Test Suite that IBM MQ v8 passes, the IBM MQ v8 implementation is JMS 2.0 compliant (and likewise earlier IBM MQ versions are JMS 1.1 compliant; the JMS 1.1 specification has the same ambiguity).
The request-response paradigm is a common one, though with a pub-sub distribution model the sending application potentially has to cope with multiple responses, not just the single response that is more likely with a point-to-point architecture. We acknowledge that there are messaging scenarios where the capability for a message ID to have a different 'value of uniqueness' from the one currently implemented by IBM MQ would provide value to some IBM MQ customers.
For the above reasons IBM strongly believes that its MQ JMS solution is compliant, so a PMR will not be accepted. However, we do acknowledge that there are a number of use cases where maintaining the message ID would be beneficial to you. For that reason we will make RFE 35062 an uncommitted candidate, which means it has the highest probability of being addressed, and we promise that we're actively working to provide the solution that best fits the needs as quickly as possible. But to do this we'd appreciate additional feedback on the RFE describing the actual problems our users are trying to solve here. For example, is this for audit purposes, request-reply, message flows, etc., and what is it you need replicated? The more information we have, the more likely the solution is to satisfy the need.

Related

What is the Purpose of the DestinationAddress field in the MassTransit Envelope?

When sending a message, MassTransit wraps that payload with an envelope which has a field called destinationAddress. What purpose does this field have?
I found this because I have a number of C# microservices communicating with some node and java based services - so I've been using the minimum payload defined here:
http://masstransit-project.com/MassTransit/advanced/interoperability.html
I've had no problem integrating the two services together; I was just wondering what the point is of having the destinationAddress as part of the message itself. Is it just a belt-and-braces kind of thing to make sure messages don't go on the wrong queue by mistake?
I would have thought that all of this information can be derived since it is literally just built up of a) the message bus host and b) the queue name used when actually sending the message?
Transports have a variety of ways of delivering messages. For instance, publishing a message to a topic would set the destination address to (URI of topic), but the message may be delivered to a queue (via a subscription, forwarded by the transport) with a different address. In this case, the envelope has the original destinationAddress, whereas the queue would have a different address.
There are also cases where messages may be scheduled, redelivered, faulted, etc., and having that information helps in troubleshooting production systems in cases where the original destination may not be known otherwise.
So, yeah, in the simplest case it seems superfluous; however, it comes in useful down the road when trying to figure out why something doesn't work.

Camel JMS ensuring ordering when unsidelining from dead letter channel

I am using Camel to integrate with ActiveMQ JMS. I am receiving prices for products on this queue. I am using JMSXGroupID set to the productId to ensure ordering within a productId. Now if I fail to process a message, I move it to a dead letter queue (DLQ). This could be because of a connection error on a dependent service or because of an error with the message itself.
In the former case, I would have to manually remove it from the DLQ and put it back into the JMS queue.
Now the problem is that I don't know whether any other message with that groupId has been received and processed in the meantime, so unsidelining from the DLQ may disrupt the order. On the other hand, if I don't unsideline it and no other message has been received, the productId will not get the correct price.
One solution that I have in mind is to use a fast key-value store (Redis) to store the last messageId or JMSTimestamp against a productId (message group). This is updated every time I dequeue a message. Is there any other solution for this?
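The key-value idea described above can be sketched as follows. This is a minimal in-memory stand-in, with a `ConcurrentHashMap` playing the role of Redis; `LastSeenTracker` and `acceptIfNewer` are hypothetical names. It also assumes the original send time survives the trip through the DLQ, for example carried in a message property, since putting a message back on the queue may assign a new JMSTimestamp.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// In-memory stand-in for the key-value store (e.g. Redis) proposed above.
final class LastSeenTracker {
    private final Map<String, Long> lastTimestampByProduct = new ConcurrentHashMap<>();

    // Records the timestamp and returns true if it is newer than anything
    // already processed for this product; returns false for stale or
    // equal timestamps, leaving the stored value unchanged.
    boolean acceptIfNewer(String productId, long timestamp) {
        boolean[] accepted = {false};
        lastTimestampByProduct.compute(productId, (id, last) -> {
            if (last == null || timestamp > last) {
                accepted[0] = true;
                return timestamp;
            }
            return last;
        });
        return accepted[0];
    }
}
```

A consumer would call `acceptIfNewer` before applying a price. A message replayed from the DLQ after a newer price for the same product has already been processed is then rejected instead of overwriting the newer value.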
Relying on message order in JMS is a risky business - at best.
The best thing to do is to make the receiver handle out-of-sequence messages as a special case (while still taking advantage of message order during normal operation).
You may also want to distinguish between two kinds of error: poison messages and temporary connection problems, and perhaps even use two different error queues for them. In the case of a poison message (invalid payload etc.), there is nothing you can really do about it except start a bug investigation. In such cases, you can probably send along "something else", such as a dummy message, so as not to interfere with the order.
For the connection problems, you can use another strategy: ActiveMQ Redelivery Policies. If there is network trouble, there is usually no use in trying to process the second message until the first has been handled. A Redelivery Policy ensures that (given you have a single consumer, that is). There is another question on SO where the poster actually has a solution to your problem and wants to avoid it. Read it. :)

Is there an enterprise message queue which can drop duplicate messages (first value stays)?

I am looking for a message queue with these requirements. I couldn't find one; the closest was maybe the rabbitmq-lvc plugin (but I need the first value in the line to stick and stay in front).
Would anyone know a technology to support these?
message queue is FIFO
if a duplicate message is being enqueued, the message queue itself either rejects or drops it.
For example, producers put these three messages (each with a discriminator value) into the queue in this sequence: M1(discriminator=7654), M2(discriminator=2435), M3(discriminator=7654).
Now I want the message queue to see that M3 has the same discriminator value as M1 and thus drop/reject M3. Consumers receive only: M1, M2.
Thanks
Tom
I don't know the other transports but I know that WebSphere MQ doesn't do this and I believe that the explanation why would apply broadly across the category. I'd be very surprised to find that any messaging transport actually provides this. Here are a few reasons why:
Async messages are supposed to be atomic. Different vendors make their own accommodations for message affinity (a relationship between two or more messages) but as a rule, message affinity is to be avoided. Your use case not only requires the transport to deal with message affinity, but to do so over an indeterminate interval between related messages.
Message payload is a blob. For performance reasons, WMQ doesn't touch message payloads except for things like compression or code page conversion. Anything that requires parsing the message payload is a job for WebSphere Message Broker, DataPower or WebSphere ESB. I would expect any messaging transport which claims to be performant would face similar issues because parsing payloads results in longer code paths and non-linear performance degradation. The exception is message properties but WMQ uses these for selection only and I expect that is generally the case.
Stateless operation. As a transport, the state of the application may be stored in a persistent message but the state of the transport layer should not depend on the state of the application across different units of work. Again, an ESB type of product is best suited when you want to delegate management of some of the application state to the messaging layer and especially when such management spans many units of work.
Assured delivery. WMQ was designed to never lose your persistent message. If the app explicitly sets expiry the message might go away because the sender said it was OK to do so. If the message is non-persistent it might go away, but only in an exceptional condition and, again, because the sender said it was OK to do so. The use case you describe might result in a message going away not because the sender said it was OK, or even because the recipient said it was OK but because of an interaction with some unrelated 3rd party who happened to beat you to the queue with a duplicate value. What if that first message has an invalid header or code page problem and gets rolled back? What if I as an attacker spew out garbage messages with all possible 4-digit values for discriminator?
As I said, I don't know the other messaging products, so there may be something out there which meets your requirement, and if so I'll be interested to read about it. However, in the event that nobody replies, this post may shed some light on the reasons why.
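For reference, the behavior requested in the question (FIFO, first value stays, later duplicates dropped) can be approximated in application code rather than in the transport. The sketch below is a single-process, in-memory illustration only; `FirstWinsQueue` is a hypothetical name and this is not a feature of any product named in this thread. It does not address the persistence, rollback, or denial-of-service concerns raised above.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

// Application-level sketch: a FIFO queue that drops a message when another
// message with the same discriminator is still waiting in the queue.
final class FirstWinsQueue {
    private static final class Entry {
        final int discriminator;
        final String payload;

        Entry(int discriminator, String payload) {
            this.discriminator = discriminator;
            this.payload = payload;
        }
    }

    private final Deque<Entry> fifo = new ArrayDeque<>();
    private final Set<Integer> queued = new HashSet<>();

    // Returns false (and drops the message) if the discriminator is already queued.
    synchronized boolean offer(int discriminator, String payload) {
        if (!queued.add(discriminator)) {
            return false;
        }
        fifo.addLast(new Entry(discriminator, payload));
        return true;
    }

    // Returns the oldest payload, or null if the queue is empty.
    synchronized String poll() {
        Entry head = fifo.pollFirst();
        if (head == null) {
            return null;
        }
        queued.remove(head.discriminator);
        return head.payload;
    }
}
```

Offering M1(7654), M2(2435), M3(7654) in that order leaves only M1 and M2 to be polled. Note the design choice that a discriminator is forgotten once its message is consumed, so a later message with the same value is accepted; whether that matches the intended requirement is an assumption.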

Query regarding the java message queue

I have a design query regarding queues. My scenario is as follows:
I have to use a messaging system, with single producer and multiple consumers (asynchronous). The producer pushes different types of messages into the messaging system. Depending upon the message type, that particular consumer has to consume that message. (Each consumer is running on a different server). If one consumer is down and a message comes for that consumer, it will be in the messaging system only. If I use a message queue, the message in the queue will block the next messages that can be consumed by the other consumers. Are queues suitable for handling this kind of situation? Or do we need to go for a topic?
Whether you use a queue or a topic should depend on whether there is any case where multiple consumers must process the same message. If so, a topic is required to produce that one-to-many pattern.
On the other hand, if any one message will only ever be consumed by one consumer, then you can use a queue or a topic and have each consumer specify the message type as a JMS selector. In this way, all consumers can listen on the same queue and each selects a different subset of messages. If one application is not running, its messages do not "block the next messages that can be consumed by the other consumers"; they just stack up in the queue, and the other consumers still receive their messages based on their selection criteria.
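The non-blocking behavior of selection described above can be illustrated with a toy model. This is plain Java, not the JMS API: `SelectableQueue` and its `type` field are hypothetical, standing in for a message property that a real consumer would name in its selector expression.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Toy queue that models selection on a "type" property; it does not
// implement the JMS selector syntax.
final class SelectableQueue {
    private static final class Msg {
        final String type;
        final String body;

        Msg(String type, String body) {
            this.type = type;
            this.body = body;
        }
    }

    private final List<Msg> messages = new ArrayList<>();

    void send(String type, String body) {
        messages.add(new Msg(type, body));
    }

    // Returns the oldest message body whose type matches, or null if none.
    // Non-matching messages stay in the queue, in their original order.
    String receive(String wantedType) {
        Iterator<Msg> it = messages.iterator();
        while (it.hasNext()) {
            Msg m = it.next();
            if (m.type.equals(wantedType)) {
                it.remove();
                return m.body;
            }
        }
        return null;
    }

    int depth() {
        return messages.size();
    }
}
```

If the consumer for type "A" is down, its messages remain in the queue while the consumer for type "B" still receives its own, even though an "A" message sits ahead of it.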
Please also realize that queues are lightweight constructions and you can easily have one queue per consumer. Typically, things providing a service listen on a well-known queue and each queue represents a different function of the service or a different service. Thus there may be many service input queues. Similarly, reply messages are generally uniquely addressed to the application instance that made the request and go to a unique, often dynamic, reply-to queue. Both of these implementations I have described lead to a separation of traffic across queues rather than pooling different message types into the same queue. Since JMS selectors always impart an additional processing cost, using more queues is generally more performant than selecting many types of message from the same queue.
I am responding to your question about selectors in the comment section here since I have more space and can put links in...
Section 3.8.1 of the JMS 1.1 spec states:
A JMS message selector allows a client to specify, by message header, the
messages it’s interested in. Only messages whose headers and properties
match the selector are delivered. The semantics of not delivered differ a bit
depending on the MessageConsumer being used. See Section 5.8,
“QueueReceiver,” and Section 6.11, “TopicSubscriber,” for more details.
Message selectors cannot reference message body values.
A message selector matches a message if the selector evaluates to true when
the message’s header field and property values are substituted for their
corresponding identifiers in the selector.
As noted above, selectors can be on fields that are implicit in the message, such as MsgID or CorrelationID, or they can be on fields specifically set by the message producer, such as a message property. Either way, the client must specify the value of any selectors used by the message consumer.
