When to use persistence with Java Messaging and Queuing Systems - jms

I'm performing a trade study on (Java) Messaging & Queuing systems for an upcoming re-design of a back-end framework for a major web application (on Amazon's EC2 Cloud, x-large instances). I'm currently evaluating ActiveMQ and RabbitMQ.
The plan is to have 5 different queues, with one being a dead-letter queue. The number of messages sent per day will be anywhere between 40K and 400K. As I plan for the message content to be a pointer to an XML file location on a data store, I expect the messages to be about 64 bytes. However, for evaluation purposes, I would also like to consider sending raw XML in the messages, with an average file size of 3KB.
My main questions: When/how many messages should be persisted on a daily basis? Is it reasonable to persist all messages, considering the amounts I specified above? I know that persisting will decrease performance, perhaps by a lot. But, by not persisting, a lot of RAM is being used. What would some of you recommend?
Also, I know that there is a lot of information online regarding ActiveMQ (JMS) vs RabbitMQ (AMQP). I have done a ton of research and testing. It seems like either implementation would fit my needs. Considering the information that I provided above (file sizes and # of messages), can anyone point out a reason(s) to use a particular vendor that I may have missed?
Thanks!

When/how many messages should be persisted on a daily basis? Is it
reasonable to persist all messages, considering the amounts I
specified above?
JMS persistence doesn't replace a database; it should be considered a short-lived buffer between producers and consumers of data. That said, the volume and size of messages you mention won't tax the persistence adapters on any modern JMS system (if configured properly), and the broker can buffer messages for extended durations as necessary (just use a reliable message store architecture).
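To put the quoted figures in perspective, here is a rough back-of-envelope sketch. It assumes the peak of 400K messages per day, takes 3KB as 3 * 1024 bytes, and spreads traffic evenly over the day (real traffic is usually bursty, so treat the rate as a floor, not a peak). The class and method names are made up for illustration.

```java
// Back-of-envelope sizing for the message volumes quoted in the question.
class QueueVolumeEstimate {
    static final long SECONDS_PER_DAY = 24L * 60 * 60;

    // Total payload bytes per day for a given message count and message size.
    static long bytesPerDay(long messagesPerDay, long bytesPerMessage) {
        return messagesPerDay * bytesPerMessage;
    }

    // Average messages per second, ASSUMING traffic is spread evenly over the day.
    static double averagePerSecond(long messagesPerDay) {
        return (double) messagesPerDay / SECONDS_PER_DAY;
    }

    public static void main(String[] args) {
        long peak = 400_000; // upper bound from the question
        System.out.println("64-byte pointers, bytes/day: " + bytesPerDay(peak, 64));
        System.out.println("3KB raw XML, bytes/day: " + bytesPerDay(peak, 3 * 1024));
        System.out.println("average msg/s: " + averagePerSecond(peak));
    }
}
```

That works out to roughly 25.6 MB per day for pointer messages, about 1.2 GB per day for raw XML, and fewer than 5 messages per second on average.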
I know that persisting will decrease performance, perhaps by a lot.
But, by not persisting, a lot of RAM is being used. What would some of
you recommend?
In my experience, enabling message persistence isn't a significant performance hit, and it is almost always done to guarantee message delivery. For most applications, the processes upstream (producers) or downstream (consumers) end up being the bottlenecks (especially database I/O), not the JMS persistence stores.
Also, I know that there is a lot of information online regarding
ActiveMQ (JMS) vs RabbitMQ (AMQP). I have done a ton of research and
testing. It seems like either implementation would fit my needs.
Considering the information that I provided above (file sizes and # of
messages), can anyone point out a reason(s) to use a particular vendor
that I may have missed?
I have successfully used ActiveMQ on many projects for both low- and high-volume messaging. I'd recommend using it along with a routing engine like Apache Camel to streamline integration and complex routing patterns.

A messaging system should be used as temporary storage. Applications should be designed to pull messages off the queue as soon as possible. The more messages that pile up, the lower the performance; if you consume messages promptly, you get better performance and lower memory usage. Memory is used whether or not messages are persistent, because messages are kept in memory for performance; they are backed up to disk only if they are persistent.
The decision on message persistence depends on how critical a message is and whether it needs to survive a messaging provider restart.
You may want to have a look at IBM WebSphere MQ. It can meet your requirements. It has JMS as well as proprietary APIs for developing applications.

ActiveMQ is a good choice for open source JMS; more expensive ones I can recommend are TIBCO EMS or maybe Solace.
But JMS is actually built for once-only delivery, and longer persistence is left out of the specification. You could of course use a database, but that's heavyweight and possibly expensive.
What I would recommend (Note: I work for CodeStreet) is our 'ReplayService for JMS'. It lets you store any type of JMS message (or native WebSphere MQ ones) in high-performance file-based disk storage. Each message is automatically assigned a nanosecond timestamp and a globalMsgID that you can overwrite on publication. So the XML messages could be recorded by the ReplayServer, and your actual message could contain just the globalMsgID as a reference, and maybe some properties.
Once a receiver receives the globalMsgID, it could then replay that message from the ReplayServer, if needed.
But on the other hand, 400K * 3KB XML messages should be easily doable for ActiveMQ or others. Also, you should compress your XML messages before sending.
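As a minimal sketch of the compression suggestion, the following uses only the JDK's GZIP streams to pack an XML string into bytes and unpack it again. It is not tied to any broker API; the class name is hypothetical, and the resulting byte array would be sent as the body of a bytes-style message in whatever client library you use.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper: gzip an XML payload before sending, gunzip after receiving.
class XmlPayloadCodec {
    static byte[] compress(String xml) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(xml.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The stream is closed at this point, so the gzip trailer has been written.
        return buffer.toByteArray();
    }

    static String decompress(byte[] compressed) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] chunk = new byte[4096];
            int read;
            while ((read = gzip.read(chunk)) != -1) {
                out.write(chunk, 0, read);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

Repetitive XML usually compresses well, but very small payloads can grow slightly because of the gzip header, so it is worth measuring on real messages.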

Related

Can MQ Support Multiple Separate Clients for the Same Queue While Maintaining Independent Messaging?

We have multiple application environments (development, QA, UAT, etc) that need to connect to fewer provider environments through MQ. For example, the provider only has one test (we'll call it TEST1) environment to which all of the client application environments need to interact. It is imperative that each client environment only receives MQ responses to the messages sent by that respective environment. This is a high volume scenario so correlating message IDs has been ruled out.
Right now TEST1 has a queue set up and is functional, but if one of the client app's environments wants to use it the others have to be shut off so that messaging doesn't overlap.
Does MQ support a model having multiple clients connect to a single queue while preserving the client-specific messaging? If so, where is that controlled (i.e. the channel, queue manager, etc)? If not, is the only solution to set up additional queues for each corresponding client?
Over the many years I have worked with IBM MQ, I have gone back and forth on this issue. I've come to the conclusion that sharing a queue just makes life more difficult. Queues should be handed out like candy on Halloween. If an application team says that they have 10 components to their application then the MQAdmin should give them 10 queues. To the queue manager or server or CPU or hard disk, there is no difference in resource usage.
Also, use an MQ naming standard that makes sense and is easy to apply security to. For example, for the HR (Human Resources) department:
HR.PAYROLL.SALARY
HR.PAYROLL.DEDUCTIONS
HR.PAYROLL.BENEFITS
HR.EMPLOYEE.DETAILS
HR.EMPLOYEE.REVIEWS
etc...
You could use a selector such as MQGET(where applname="myapp"), or one based on a specific user-defined property (assuming the sender populates such a property), but that is likely to perform worse than any retrieval by msgid or correlid. That said, you've not given any information to demonstrate that get-by-correlid is actually problematic.
And of course any difference between a test and production environment - whether it involves code or configuration - is going to be very risky.
You would not normally share a single destination queue between multiple different application types - multiple queues is far more standard.

ActiveMQ non-persistent delivery mode limitations?

I am using ActiveMQ with the following requirements:
Very fast consumers, as my producers are already very fast
Processing of at least 2K messages per second
No need to process/consume messages again in case of a server crash or other failure; I can trigger the whole process again.
Must run on a very ordinary server configuration (4 GiB RAM)
I have configured ActiveMQ as follows:
Using non-persistent delivery mode (vm://localhost); see http://activemq.apache.org/what-is-the-difference-between-persistent-and-non-persistent-delivery.html
Using Spring Integration to put/fetch messages in/from the queue/channel.
Using max-concurrent-consumers with 10 threads
Assume all other settings are the defaults for ActiveMQ and Spring Integration.
Problems/Questions
I am not sure how ActiveMQ stores messages in case of non-persistent delivery mode. Is it possible that my process will fail with out-of-memory errors once my queue size exceeds some limit? I am asking because it's very difficult for me to test the whole process, so I need to be aware of the limitations before I trigger it.
If non-persistent delivery mode is not sufficient for the requirements above, are there any performance tuning tips with which I can meet them using persistent delivery mode (tcp://)? I have already tested this mode, but the consumers seem very slow. I have also tried DUPS_OK_ACKNOWLEDGE to speed up my consumers in persistent delivery mode, but with no luck.
NOTE: I am using ActiveMQ version 5.14, the latest.
I am not sure how ActiveMQ stores messages in case of non-persistent delivery mode
ActiveMQ stores messages in memory at first, and it will also swap them to disk (there is a tmp_storage folder in ActiveMQ's data path).
is it possible that my process will fail with out of memory errors once my queue size exceed some limit
I have never hit an out-of-memory error in ActiveMQ, even with about one million messages.
You can also guard against it with producer flow control (http://activemq.apache.org/producer-flow-control.html).
It can make the producer block when too many messages are waiting to be consumed.
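The idea behind producer flow control can be illustrated with an in-process analogy. This is not ActiveMQ's API or configuration, just a sketch with a JDK bounded queue: once the (hypothetical) limit of two pending messages is reached, the producer has to wait until a consumer frees a slot.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// In-process analogy for producer flow control; NOT the ActiveMQ API.
class BoundedQueueDemo {
    // Tries to enqueue; returns false if the queue stayed full for the whole timeout.
    static boolean offerWithTimeout(BlockingQueue<String> queue, String message, long millis) {
        try {
            return queue.offer(message, millis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for callers
            return false;
        }
    }

    public static void main(String[] args) {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(2); // hypothetical limit
        System.out.println(offerWithTimeout(queue, "m1", 50)); // accepted
        System.out.println(offerWithTimeout(queue, "m2", 50)); // accepted
        System.out.println(offerWithTimeout(queue, "m3", 50)); // queue full: producer waits, then gives up
        queue.poll();                                          // a consumer takes one message
        System.out.println(offerWithTimeout(queue, "m3", 50)); // accepted again
    }
}
```

A broker applies the same principle across the network: a slow consumer ends up throttling the producer instead of exhausting memory.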
As for the performance of persistent delivery, I have no good suggestions either.

What is the best way to deliver real-time messages to Client that can not be requested

We need to deliver real-time messages to our clients, but their servers are behind a proxy and we cannot initiate a connection to them, so a webhook approach won't work.
What is the best way to deliver real-time messages considering that:
client that is behind a proxy
client can be off for a long period of time, and all messages must be delivered
the protocol/way must be common enough, so that even a PHP developer could easily use it
I have in mind three variants:
WebSocket - the client opens a WebSocket connection, and we send both the messages that were stored in the DB and the messages coming in real time.
RabbitMQ - all messages are stored in a durable, persistent queue. What if the partner does not read from the queue for some time?
HTTP GET - the partner pulls messages in batches. With this approach it is hard to pick an optimal polling interval.
Any suggestions would be appreciated. Thanks!
Since you seem to have to store messages when your peer is not connected, the question applies to any other solution equally: what if the peer is not connected and messages are queueing up?
RabbitMQ is great if you want loose coupling: separating the producer and the consumer sides. The broker will store messages for you if no consumer is connected. This can indeed fill up memory and/or disk space on the broker after some time - in this case RabbitMQ will shut down.
In general, RabbitMQ is a great tool for messaging-based architectures like the one you describe:
Load balancing: you can use multiple publishers and/or consumers, thus sharing load.
Flexibility: you can configure multiple exchanges/queues/bindings if your business logic needs it. You can easily change routing on the broker without reconfiguring multiple publisher/consumer applications.
Flow control: RabbitMQ also gives you some built-in methods for flow control - if a consumer is too slow to keep up with publishers, RabbitMQ will slow down publishers.
You can refactor the architecture later easily. You can set up multiple brokers and link them via shovel/federation. This is very useful if you need your app to work via multiple data centers.
You can easily spot if one side is slower than the other, since queues will start growing if your consumers can't read fast enough from a queue.
High availability and fault tolerance. RabbitMQ is very good at these (thanks to Erlang).
So I'd recommend it over the other two (which might be good for a small-scale app, but you might outgrow them quickly if requirements change and you need to scale things up).
Edit: something I missed - if it's not vital to deliver all messages, you can configure queues with a TTL (messages are discarded after a timeout) or with a length limit (this caps the number of messages in the queue; once the limit is reached, messages are discarded, by default the oldest ones at the head of the queue).
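As a sketch of what that configuration can look like, the snippet below builds the optional-arguments map for a queue declaration using RabbitMQ's `x-message-ttl` and `x-max-length` argument keys. The class name and the values are hypothetical, and the snippet deliberately stops short of opening a connection.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper that builds RabbitMQ queue arguments for TTL and length limits.
class QueueLimits {
    static Map<String, Object> arguments(int ttlMillis, int maxLength) {
        Map<String, Object> args = new HashMap<>();
        args.put("x-message-ttl", ttlMillis); // messages older than this are discarded
        args.put("x-max-length", maxLength);  // cap on the number of queued messages
        return args;
    }

    public static void main(String[] unused) {
        // Example values only: one hour TTL, at most 100,000 queued messages.
        System.out.println(arguments(60 * 60 * 1000, 100_000));
    }
}
```

With the RabbitMQ Java client, this map would be passed as the last parameter of `Channel.queueDeclare(queue, durable, exclusive, autoDelete, arguments)`. Note that arguments are fixed when a queue is first declared; a policy on the broker is the usual alternative if limits need to change later.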

Sending files over MSMQ

In a retail scenario, each store reports its daily transactions to the backend system at the end of the day. Today, a file consisting of the daily transactions and some other meta information is transferred from the stores to the backend using FTP. I’m currently investigating replacing FTP with something else. MSMQ has been suggested as an alternative transport mechanism. So my question is: do we need to write a custom Windows service that sticks the daily transactions file into a message object and sends it on its way, or is there an out-of-the-box mechanism in MSMQ to handle this?
Also, since the files we want to transfer can reach 5-6 MB for large stores, should we rule out MSMQ? If so, are there any other technologies we should investigate?
Cheers!
NServiceBus provides a nice abstraction over MSMQ for situations like this. You get the reliable messaging aspects of MSMQ, along with a very nice programming model for defining your messages.
MSMQ is limited to a 4MB message size, however, and there are two ways you could deal with this in NServiceBus:
NServiceBus has a concept called the Data Bus, which takes the large attachments in your messages and transmits them reliably using another method. This is handled by the infrastructure and as far as your message handlers are concerned, the data is just there.
You could break up the payload into smaller atomic messages and send them as normal messages. The NServiceBus infrastructure would ensure that they all arrive at their destination and are processed. I would recommend this method unless it's absolutely critical that the entire huge data dump is processed as one atomic transaction.
One other thing to note is that the fact that you do nightly dumps is probably a limitation of a previous system. With NServiceBus it may be possible to change the system so that these bits of information are sent in a more immediate fashion, which will result in much more up-to-date data all the time, which may be a big win for the business.
You can look at IBM Sterling Managed File Transfer and WebSphere MQ Managed File Transfer products.
You can consider WebSphere MQ MFT if you require both messaging and file transfer capabilities. On the other hand if your requirement is just file transfer then you can look at Sterling MFT.
Sending files over a messaging transport is not trivial. If you put the entire file into a single message you can have the atomicity you need but tuning the messaging provider for wide variance in message sizes can be challenging. If all the files are of about the same size, one per message is about the simplest solution.
On the other hand, you can split the files into multiple messages but then you've got to reassemble them, in the right order, include a protocol to detect and resend missing segments, integrity-check the file received against the file sent, etc. You also probably want to check that the files on either end did not change during the transmission.
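To make the splitting and reassembly concrete, here is a minimal sketch using only the JDK. It is not MSMQ or NServiceBus code, all names are hypothetical, and it covers only ordering, missing-segment detection, and a CRC32 integrity check; a resend protocol and transfer logging are left out.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.CRC32;

// Hypothetical sketch: split a file into ordered segments and rebuild it on the receiving side.
class FileChunker {
    // One segment: its position, the total number of segments, and its bytes.
    static final class Chunk {
        final int index;
        final int total;
        final byte[] data;

        Chunk(int index, int total, byte[] data) {
            this.index = index;
            this.total = total;
            this.data = data;
        }
    }

    static List<Chunk> split(byte[] file, int chunkSize) {
        if (chunkSize <= 0) throw new IllegalArgumentException("chunkSize must be positive");
        int total = Math.max(1, (file.length + chunkSize - 1) / chunkSize);
        List<Chunk> chunks = new ArrayList<>();
        for (int i = 0; i < total; i++) {
            int from = i * chunkSize;
            int to = Math.min(file.length, from + chunkSize);
            chunks.add(new Chunk(i, total, Arrays.copyOfRange(file, from, to)));
        }
        return chunks;
    }

    // Accepts chunks in any arrival order; throws if any segment is missing.
    static byte[] reassemble(List<Chunk> received) {
        if (received.isEmpty()) throw new IllegalStateException("no chunks received");
        byte[][] slots = new byte[received.get(0).total][];
        for (Chunk c : received) {
            slots[c.index] = c.data;
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < slots.length; i++) {
            if (slots[i] == null) throw new IllegalStateException("missing chunk " + i);
            out.write(slots[i], 0, slots[i].length);
        }
        return out.toByteArray();
    }

    // Simple integrity check; the sender would ship this value alongside the segments.
    static long checksum(byte[] bytes) {
        CRC32 crc = new CRC32();
        crc.update(bytes, 0, bytes.length);
        return crc.getValue();
    }
}
```

The receiver compares `checksum` of the rebuilt bytes with the value the sender computed. CRC32 only catches accidental corruption; a cryptographic hash would be the safer choice if tampering is a concern.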
With any of these systems you also need the system to be smart enough to manage the disposition of sending and receiving files under normal and exception conditions, log the transfers, etc.
So when considering whether to move to messaging the two best options are either to move natively to messaging and give up files altogether, or to use an enterprise managed file transfer solution that runs atop the messaging provider that you choose. None of the off-the-shelf MFT products will cost as much in the long run as developing it yourself if you wish to do it right with robust exception handling and reporting.
If the stores are on separate networks and communicating over the internet, then MSMQ is not really an option. NServiceBus provides the concept of a gateway, which allows you to transport MSMQ messages asynchronously over HTTP or HTTPS.

Factors Affected for Low Performance of middleware Messaging Softwares

I am planning to integrate messaging middleware into my web application. Right now I am testing different messaging middleware software such as RabbitMQ, JMS, HornetQ, etc.
The examples provided with this software work, but they are not giving the desired results.
So I want to know which factors affect performance and should be kept in mind.
Which areas should a developer take care of to improve the performance of messaging middleware?
I'm the project lead for HornetQ but I will try to give you a generic answer that could be applied to any message system you choose.
A common question that I see is people asking why a single producer / single consumer won't give you the expected performance.
When you send a message and ask for confirmation right away, you need to wait for:
The message transfer from client to server
The message being persisted on the disk
The server acknowledging receipt of the message by sending a callback to the client
Similarly when you are receiving a message, you ACK to the server:
The ACK is sent from client to server
The ACK is persisted
The server sends back a callback confirming that the ACK was processed
And if you need confirmation for all your message sends and message ACKs, you need to wait for these steps, because hardware is involved in persisting to disk and in sending bits over the network.
Message systems are designed to scale up with many producers and many consumers. That is, if many clients are producing and consuming at once, together they can make full use of the shared resources available at the server, which a single client that waits on every round trip cannot.
There are ways to speed up a single producer or single consumer:
One is by using transactions. That way you minimize the blocks and syncs you perform on disk while persisting at the server, as well as the round trips on the network. (This is actually the same with any database.)
Another is by using callbacks instead of blocking at the consumer. (JMS 2 is proposing a callback similar to the ConfirmationHandler on HornetQ.)
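The effect of batching sends into transactions can be sketched with a toy cost model. It assumes every blocking confirmation (network round trip plus disk sync) costs the same fixed time, which is a simplification; the class name and the 5 ms figure in the usage are hypothetical, not measurements.

```java
// Toy model: how many times a producer blocks, with and without transactional batching.
class SyncCostModel {
    // One blocking confirmation per message when every send is confirmed individually.
    static long confirmationsUnbatched(long messages) {
        return messages;
    }

    // One blocking commit per transaction when sends are grouped into batches.
    static long confirmationsBatched(long messages, int batchSize) {
        if (batchSize <= 0) throw new IllegalArgumentException("batchSize must be positive");
        return (messages + batchSize - 1) / batchSize; // round up for a partial last batch
    }

    // Total time spent waiting, ASSUMING a constant cost per blocking confirmation.
    static double waitMillis(long confirmations, double millisPerConfirmation) {
        return confirmations * millisPerConfirmation;
    }

    public static void main(String[] args) {
        long messages = 1_000;
        double assumedCostMillis = 5.0; // hypothetical round trip + disk sync
        System.out.println(waitMillis(confirmationsUnbatched(messages), assumedCostMillis));
        System.out.println(waitMillis(confirmationsBatched(messages, 100), assumedCostMillis));
    }
}
```

Under these assumptions, 1,000 individually confirmed sends block 1,000 times, while transactions of 100 messages block only 10 times. The trade-off is that a failed commit means redoing the whole batch.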
Also: most providers I know have a performance section in their docs with requirements and suggestions for that specific product. You should look at each product individually.
