Consequences of changing USERPostMessageLimit - winapi

One of our legacy applications relies heavily on PostThreadMessage() for inter-thread communication, so we increased USERPostMessageLimit in the registry (way) beyond the default of 10,000.
However, documentation on MSDN states that "This limit should be sufficiently large. If your application exceeds the limit, it should be redesigned to avoid consuming so many system resources." [1]
Can anyone enlighten me as to how exactly consuming too many system resources manifests itself? What exactly are system resources? Can I somehow monitor an application's usage of system resources? Any information would be very helpful in deciding whether it is worth the time and effort to redesign this application.

The resources it is referring to are those used by the threads for receiving and handling the messages. You can monitor the thread pool size and other resources using Task Manager (look at View->Select Columns). It may help you identify the specific resource if the consumer is resource locked: look for a resource count that tops out even while your threads are increasing.
However, if you need to increase USERPostMessageLimit, then the message producer is simply overloading the message consumer; by increasing this limit you are compounding your problem, not fixing it. Reduce USERPostMessageLimit back to the default and, if your message producer cannot post a message, try sleeping before retrying, allowing the consuming thread to clear some messages.
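The sleep-and-retry idea can be sketched in a language-neutral way. The following is only a toy model in Python, not Win32 code: a bounded queue.Queue stands in for the thread message queue and its limit, queue.Full stands in for a failed PostThreadMessage() call, and the names post_with_retry and run_demo, as well as the retry count and delay, are assumptions made for illustration.

```python
import queue
import threading
import time


def post_with_retry(q, message, retries=50, delay=0.01):
    """Try to enqueue without blocking; sleep and retry while the queue is full."""
    for _ in range(retries):
        try:
            q.put_nowait(message)
            return True
        except queue.Full:
            # Back off so the consuming thread can clear some messages.
            time.sleep(delay)
    return False


def run_demo(total=100, limit=10):
    """Post `total` messages through a queue that holds at most `limit`."""
    q = queue.Queue(maxsize=limit)  # small limit standing in for the message quota
    seen = []

    def consumer():
        while True:
            item = q.get()
            if item is None:  # stop marker
                break
            seen.append(item)
            time.sleep(0.001)  # simulated per-message work

    worker = threading.Thread(target=consumer)
    worker.start()
    posted = sum(post_with_retry(q, i) for i in range(total))
    q.put(None)  # blocking put of the stop marker
    worker.join()
    return posted, len(seen)


if __name__ == "__main__":
    print(run_demo())
```

The producer is throttled to the consumer's pace instead of letting the backlog grow without bound, which is the point of keeping the limit at its default.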

Related

One slow ActiveMQ consumer causing other consumers to be slow

I'm looking for help regarding a strange issue where a slow consumer on a queue causes all the other consumers on the same queue to start consuming messages at 30 second intervals. That is, all consumers but the slow one don't consume messages as fast as they can; instead they wait for some magical 30s barrier before consuming.
The basic flow of my application goes like this:
a number of producers place messages onto a single queue. Messages can have different JMSXGroupIDs
a number of consumers listen to messages on that single queue
as standard practice the JMSXGroupIDs get distributed across the consumers
at some point one of the consumers becomes slow and can't process messages very quickly
the slow consumer ends up filling its prefetch buffer on the broker and AMQ recognises that it is slow (default behaviour)
at that point - or some 'random' but close time later - all consumers except the slow one start to only consume messages at the same 30s intervals
if the slow consumer becomes fast again then things very quickly return to normal operation and the 30s barrier goes away
I'm at a loss for what could be causing this issue, or how to fix it, please help.
More background and findings
I've managed to reliably reproduce this issue on AMQ 5.8.0, 5.9.0 (where the issue was originally noticed) and 5.9.1, on fresh installs and existing ops-managed installs, and on different machines, some VMs and some not. All were Linux installs, with different OSs and Java versions.
It doesn't appear to be affected by anything prefetch related, that is: changing the prefetch value from 1 to 10 to 1000 didn't stop the issue from happening
[red herring?] Enabling debug logs on the amq instance shows logs relating to the periodic check for messages that can be expired. The queue doesn't have an expiry policy so I can only think that the scheduled expireMessagesPeriod time is just waking amq up in such a way that it then sends messages to the non-slow consumers.
If the 30s mode is entered then left then entered again the seconds-past-the-minute time is always the same, for example 14s and 44s past the minute. This is true across all consumers and all machines hosting those consumers. Those barrier points do change after restarts of amq.
While not strictly a solution to the problem, further investigation has uncovered the root cause of this issue.
TL;DR - It's known behaviour and won't be fixed before Apollo
More Details
Ultimately this is caused by the maxPageSize property and the fact that AMQ will only apply selection criteria to messages in memory. Generally these are message selectors (property = value), but in my case they are JMSXGroupID=>Consumer assignments.
As messages are received by the queue they get paged into memory and placed into a collection (named pagedInPendingDispatch in the source). To dispatch messages AMQ will scan through this list of messages and try to find a consumer that will accept it. That includes checking the group id, message selector and prefetch buffer space. For our use case we aren't using message selectors but we are using groups. If no consumer can take the message then it is left in the collection and will be checked again at the next tick.
In order to stop the pagedInPendingDispatch collection from eating up all the resources available there is a suggested limit to the size of this queue configured via the maxPageSize property. This property isn't actually a maximum, it's more a hint as to whether, under normal conditions, new message arrivals should be paged in memory or paged to disk.
With these two pieces of information and a slow consumer it turns out that eventually all the messages in the pagedInPendingDispatch collection end up only being consumable by the slow consumer, and hence the collection effectively gets blocked and no other messages get dispatched. This explains why the slow consumer wasn't affected by the 30s interval, it had maxPageSize messages waiting delivery already.
This doesn't explain why I was seeing the non-slow consumers receive messages every 30s though. As it turns out, paging messages into memory has two modes, normal and forced. Normal mode follows the process outlined above, where the size of the collection is compared to the maxPageSize property. In forced mode, however, messages are always paged into memory. This mode exists to allow you to browse through messages that aren't in memory. As it happens, this forced mode is also used by the expiry mechanism to allow AMQ to expire messages that aren't in memory.
So what we have now is a collection of messages in memory that are all targeted for dispatch to the same consumer, a consumer that won't accept them because it is slow or blocked. We also have a backlog of messages awaiting delivery to all consumers. Every expireMessagesPeriod milliseconds a task runs that force-pages messages into memory to check whether they should be expired. This adds those messages to the paged-in collection, which now contains maxPageSize messages for the slow consumer and N more messages destined for any consumer. Those messages get delivered.
QED.
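The mechanism described above can be reproduced with a toy model. This is a simplified Python sketch and not ActiveMQ code: the classes Consumer and ToyQueue, the function simulate and all the numbers are assumptions for illustration. Only the names maxPageSize and pagedInPendingDispatch come from the explanation above.

```python
from collections import deque


class Consumer:
    def __init__(self, name, prefetch):
        self.name = name
        self.prefetch = prefetch
        self.in_flight = 0  # dispatched but not yet acknowledged
        self.received = []

    def has_room(self):
        return self.in_flight < self.prefetch


class ToyQueue:
    def __init__(self, max_page_size, group_owner):
        self.max_page_size = max_page_size
        self.group_owner = group_owner  # group id -> Consumer
        self.store = deque()            # messages not yet paged into memory
        self.pending = []               # stand-in for pagedInPendingDispatch

    def send(self, group):
        self.store.append(group)

    def page_in(self, force=False, batch=0):
        # Normal mode: page in only while the collection is below max_page_size.
        # Forced mode: page in `batch` messages regardless of that hint.
        wanted = batch if force else max(0, self.max_page_size - len(self.pending))
        for _ in range(min(wanted, len(self.store))):
            self.pending.append(self.store.popleft())

    def dispatch(self):
        delivered, still_pending = 0, []
        for group in self.pending:
            consumer = self.group_owner[group]
            if consumer.has_room():
                consumer.in_flight += 1
                consumer.received.append(group)
                delivered += 1
            else:
                still_pending.append(group)  # left for the next tick
        self.pending = still_pending
        return delivered


def simulate():
    slow = Consumer("slow", prefetch=2)  # never acknowledges anything
    fast = Consumer("fast", prefetch=1000)
    q = ToyQueue(max_page_size=4, group_owner={"A": slow, "B": fast})
    for i in range(20):
        q.send("A" if i % 2 == 0 else "B")

    per_tick = []
    for _ in range(6):
        q.page_in()
        per_tick.append(q.dispatch())
        fast.in_flight = 0  # the fast consumer acknowledges immediately

    q.page_in(force=True, batch=4)  # what the expiry check effectively does
    forced = q.dispatch()
    return per_tick, forced


if __name__ == "__main__":
    print(simulate())
```

In this model the per-tick deliveries fall to zero once the in-memory collection holds only messages for the slow consumer, even though messages for the fast consumer are still waiting in the store. A single forced page-in releases some of them, which corresponds to the periodic bursts seen by the non-slow consumers.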
References
Ticket referring to this issue but for message selectors instead
Docs relating to the configuration properties
Somebody else with this issue but for selectors

When multi MessageConsumer connect to same queue(Websphere MQ),how to load balance message-consumer?

I am using WebSphere MQ 7, and I have two clients connected to the same QMgr and consuming messages from the same queue, using code like the following:
    // consumer is a javax.jms.MessageConsumer created elsewhere
    while (true) {
        TextMessage message = (TextMessage) consumer.receive(1000);
        if (message != null) {
            System.out.println("*********************" + message.getText());
        }
    }
I found that only one client ever retrieves messages. Is there any way to load balance message consumption across the two clients? Are there any config options on the MQ server side?
When managing queue handles, it is MUCH faster for WMQ to put them in a stack (LIFO) rather than a FIFO queue. So if the messages arrive on the queue more slowly than they are processed, it is possible that an instance will process the message and perform another GET, which WMQ pushes onto the top of the stack. The result is that only one instance will see messages in a low-volume use case.
In larger environments where there are many instances waiting on messages, it is possible that activity will round-robin amongst a portion of those instances while the other instances starve for messages. For example, with 10 GETters on the queue you may see three processing messages and 7 idle.
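The effect of keeping waiting getters on a stack can be illustrated with a small model. This is only a Python sketch of the behavior described above, not of WMQ internals: the function distribute and its parameters are hypothetical, and it assumes each getter finishes its work and issues a new GET before the next message arrives.

```python
from collections import Counter, deque


def distribute(num_getters, num_messages, lifo):
    """Count how many messages each waiting getter receives."""
    waiting = deque(range(num_getters))  # ids of getters blocked in a GET
    served = Counter()
    for _ in range(num_messages):
        # Stack order hands the message to the most recent waiter,
        # FIFO order hands it to the one that has waited longest.
        getter = waiting.pop() if lifo else waiting.popleft()
        served[getter] += 1
        # The getter finishes before the next message arrives and waits again.
        waiting.append(getter)
    return served


if __name__ == "__main__":
    print(distribute(2, 10, lifo=True))   # one getter takes everything
    print(distribute(2, 10, lifo=False))  # messages alternate between getters
```

With stack ordering and a light load, the same getter is always on top when the next message arrives, so the other instance never sees a message. With FIFO ordering the same load alternates between the two.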
Although this is considerably faster for MQ, it is confusing to customers who are not aware of how it works internally and so they open PMRs asking this exact question. IBM had to choose among several alternatives:
Adding several code paths to manage by stack for performance when fully loaded, versus manage by FIFO for apparent balancing when lightly loaded. This bloats the code, adds many new decision points to introduce errors and solves a problem that was one of perception rather than reliability or performance.
Educate the customers as to how it works. Of course, once you document it, then you can't change it. The way I found out about this was attending the "WMQ Internals" presentation at IMPACT. It's not in the Infocenter so IBM can change it, but it is available for customers.
Do nothing. Although this is the best result from the code design point of view, the behavior is counter-intuitive. Users need to understand why things do not behave as expected and will waste time trying to find the configuration that results in the desired behavior, or open a PMR.
I don't know for sure that it still works this way but I expect that it does. The way I used to test it was to put many messages on the queue at once and then see how they were distributed. If you drop about 50 messages on the queue in one unit of work, you should see a better distribution between the two instances.
How do you drop 50 messages on the queue at once? First generate them with the applications turned off or to a spare queue. If you generated them in the target queue, use the Q program to move them to the spare queue. Now start the apps and make sure the queue's IPPROCS count equals however many instances of the app you started. Using Q again, copy all of the messages to the original queue in a single unit of work. Since they all become available on the queue at once, your two app instances should both immediately be passed a message. If you used copy instead of move, you can repeat this as often as required.
Your client is not doing much, so one instance can probably handle the full load. Try implementing a more realistic workload, or, simpler yet, put a Thread.sleep in the client.

Activemq topic subscribers heap memory leak - why are messages increasing?

I have a console application which connects to ActiveMQ topics. About 10 messages per second are published on each topic. After some time I noticed that the application's memory usage keeps increasing, and when all the memory is used the application crashes.
See the dump below. Why is ActiveMQTopicSubscriber using so much heap? Also, it is not visible in the dump, but there are about 14,000 ListEntries (which means 14k messages).
http://imageshack.us/photo/my-images/404/amqmemoryproblem.png
A couple of things to possibly check for:
In your subscriber, are you positive that the messages from the topic are actually being consumed?
What is your prefetchLimit specified as?
If holding messages in memory continues to be a problem, you should consider configuring ActiveMQ to use file cursors. The use of file cursors tells ActiveMQ to spool messages to disk instead of holding them in memory.

Windows EventLog: How fast are operations with it?

I have a service application that is processing client requests over TCP and writing any events into Windows EventLog. Since this application is expected to service many clients and lots of requests from each client in a short amount of time (let's say between 1 and 50 requests per second), I'm curious to know how intensive (CPU wise and time wise) and how fast can writing into Windows EventLog be?
More specifically, how intensive are the operations of connecting to, reading from and writing to EventLog?
Don't do that. The event log is not designed for such an activity:
It has a maximum size.
When the maximum size is reached, it can overwrite events or stop logging, depending on settings (recent Windows can also archive the log and start a new one). If events are not overwritten, they can fill your partition or block other applications until the logs are manually cleared.
The event log is not a general logging facility. It should be used to report errors, situations that need attention, and even informative reports, but not every little bit of information one has to write somewhere. If you have heavy logging needs, use your own log facilities and report issues, if any, in the event log with a "pointer" to where the detailed data can be found if needed.
NOTE: if the event log really is needed, the application should at least use its own log destination, not one of the standard ones (Application or, even worse, System). This way it won't impact the operation of other applications, and it won't "hide" other applications' events by "flooding" the log with its own, which would make the others harder to spot.
Event Tracing for Windows would likely be a better repository for this level of traffic.
Event Tracing for Windows (ETW) is an efficient kernel-level tracing facility that lets you log kernel or application-defined events to a log file. You can consume the events in real time or from a log file and use them to debug an application or to determine where performance issues are occurring in the application.
Sample pseudo-code:
const
  MyApplicationProviderGUID: TGUID = '{47A0DECE-4DCF-4782-BCF4-82AECA6BAAB7}';
private
  FETWRegistrationHandle: THandle;
...
// Register the provider once, e.g. at startup
EventRegister(MyApplicationProviderGUID, nil, nil, {out}FETWRegistrationHandle);
...
EventWriteString(FETWRegistrationHandle, 0, 0, 'Hello');
EventWriteString(FETWRegistrationHandle, 0, 0, ', ');
EventWriteString(FETWRegistrationHandle, 0, 0, 'world');
EventWriteString(FETWRegistrationHandle, 0, 0, '!');
...
// Unregister with the handle returned by EventRegister, not the provider GUID
EventUnregister(FETWRegistrationHandle);
I ran a test with my two event log classes, one writing to a file (each log_event() writes to and flushes an already opened file) and one based on EventLog (a ReportEvent() call on an already registered EventSource). In my case the file log was about 10 times faster than EventLog. In a multithreaded environment I would add a critical section to protect writing to the file.
In my opinion files are better: they are easily parsed in tools such as grep. Speed is less important for me.
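The file-based approach with a critical section can be sketched as follows. This is a minimal Python illustration and not the classes used in the test above: the name FileLog, the demo function and the line format are assumptions, and threading.Lock plays the role of the critical section.

```python
import os
import tempfile
import threading


class FileLog:
    """Append-only log that writes and flushes one line per event."""

    def __init__(self, path):
        self._lock = threading.Lock()  # plays the role of the critical section
        self._file = open(path, "a", encoding="utf-8")

    def log_event(self, text):
        with self._lock:
            self._file.write(text + "\n")
            self._file.flush()

    def close(self):
        with self._lock:
            self._file.close()


def demo(threads=4, per_thread=250):
    """Log from several threads and return the lines that ended up in the file."""
    fd, path = tempfile.mkstemp(suffix=".log")
    os.close(fd)
    log = FileLog(path)

    def worker(tid):
        for i in range(per_thread):
            log.log_event(f"thread={tid} event={i}")

    workers = [threading.Thread(target=worker, args=(t,)) for t in range(threads)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    log.close()

    with open(path, encoding="utf-8") as f:
        lines = f.read().splitlines()
    os.remove(path)
    return lines


if __name__ == "__main__":
    print(len(demo()))
```

Because every write and flush happens under the lock, lines from different threads cannot interleave, and the resulting file stays easy to process with line-oriented tools such as grep.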
Maybe Microsoft Message Queuing (MSMQ) is an alternative to the Windows EventLog. It is available in all current versions of Windows, and offers high speed, loosely coupled messaging.

ActiveMQ: Slow processing consumers

Concerning ActiveMQ: I have a scenario where I have one producer which sends small (around 10KB) files to the consumers. Although the files are small, the consumers need around 10 seconds to analyze them and return the result to the producer. I've researched a lot, but I still cannot find answers to the following questions:
How do I make the broker store the files (completely) in a queue?
Should I use ObjectMessage (because the files are small) or blob messages?
Because the consumers are slow processing, should I lower their prefetchLimit or use a round-robin dispatch policy? Which one is better?
And finally, in the ActiveMQ FAQ, I read this - "If a consumer receives a message and does not acknowledge it before closing then the message will be redelivered to another consumer.". So my question here is, does ActiveMQ guarantee that only 1 consumer will process the message (and therefore there will be only 1 answer to the producer), or not? When does the consumer acknowledge a message (in the default, automatic acknowledge settings) - when receiving the message and storing it in a session, or when the onMessage handler finishes? And also, because the consumers are so slow in processing, should I change some "timeout limit" so the broker knows how much to wait before giving the work to another consumer (this is kind of related to my previous questions)?
Not sure about others, but here are some thoughts.
First: I am not sure what your exact concern is. ActiveMQ does store messages in a data store; all data need NOT reside in memory in any single place (either broker or client). So you should actually be good in that regard; earlier versions did require that all ids needed to fit in memory (not sure if that was resolved), but even that memory usage would be low enough unless you had tens of millions of in-queue messages.
As to ObjectMessage vs blob; raw byte array (blob) should be most compact representation, but since all of these get serialized for storage, it only affects memory usage on client. Pre-fetch mostly helps with access latency; but given that they are slow to process, you probably don't need any prefetching; so yes, either set it to 1 or 2 or disable altogether.
As to guarantees: the best that distributed message queues can guarantee is either at-least-once (with possible duplicates) or at-most-once (no duplicates, but messages can be lost). It is usually better to take at-least-once and make clients do the de-duping using client-provided ids. How acknowledgement is sent is defined by the JMS specification, so you can read more about JMS; this is not ActiveMQ specific.
And yes, you should set the timeout high enough that a worker can typically finish its work, including all network latencies. This can slow down the re-transmit of dropped messages (if a worker dies), but it is probably not a problem for you.
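The de-duping advice for at-least-once delivery can be sketched as an idempotent handler keyed by a client-provided id. This is a minimal Python illustration, not JMS or ActiveMQ code: the class DedupingHandler and its method names are hypothetical, and the in-memory dictionary is an assumption (a real worker would need storage that survives restarts).

```python
class DedupingHandler:
    """Process each message id at most once, even if it is delivered again."""

    def __init__(self, process):
        self._process = process
        self._results = {}  # message id -> result already produced

    def handle(self, message_id, payload):
        # A redelivered message returns the stored result instead of
        # running the expensive analysis a second time.
        if message_id not in self._results:
            self._results[message_id] = self._process(payload)
        return self._results[message_id]


if __name__ == "__main__":
    calls = []

    def analyze(payload):
        calls.append(payload)
        return payload.upper()

    handler = DedupingHandler(analyze)
    print(handler.handle("file-1", "abc"))
    print(handler.handle("file-1", "abc"))  # duplicate delivery
    print(len(calls))
```

With this in place the producer sees one logical answer per id even when the broker redelivers a message to the same worker.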
