Determining Window Message Queue Depth - winapi

We have an application that uses the window message queue to pass data from a socket to consumer HWNDs (at a rate of ~2100 Hz). This application has worked for over two years. Recently it has started exhibiting problems where WM_TIMER messages are not being fired/executed. I think this is due to the volume of data being pumped into the message queue.
My question: is there a way to determine how many pending messages are in the message queue for a given thread/HWND?

This is answered in really great detail by Raymond Chen in his post "but then we ran into problems when we started posting 10,000 messages per second".
The research team asked to meet with the user interface team to help work out their problems under load. They outlined their design and explained that it worked well at low data rates, "but then we ran into problems when we started posting 10,000 messages per second."
At that point, the heads of all the user interface people just sat there and boggled for a few seconds. "That's like saying your Toyota Camry has stability problems once you get over 500 miles per hour."

There isn't a good way to do this. One thing you could do is aggressively empty the message queue and put the messages into your own queue. But this will not solve your problem.
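For illustration, here is a rough sketch of that "drain into your own queue" idea, written in Python with ctypes purely to show the shape of the loop (PM_REMOVE, the MSG structure and the dispatch calls are standard Win32; the WM_APP-based data message and the deque are hypothetical, and the same loop translates directly to a C++ message pump):

import ctypes
from ctypes import wintypes
from collections import deque

user32 = ctypes.windll.user32
PM_REMOVE = 0x0001
WM_APP_DATA = 0x8000 + 1          # hypothetical app-defined message carrying socket data

pending = deque()                  # your own queue, whose depth you can actually inspect

def drain_message_queue():
    """Pull every waiting message off the calling thread's Win32 queue."""
    msg = wintypes.MSG()
    while user32.PeekMessageW(ctypes.byref(msg), None, 0, 0, PM_REMOVE):
        if msg.message == WM_APP_DATA:
            pending.append((msg.wParam, msg.lParam))      # keep data messages ourselves
        else:
            user32.TranslateMessage(ctypes.byref(msg))    # let WM_TIMER etc. run normally
            user32.DispatchMessageW(ctypes.byref(msg))
    return len(pending)                                   # at least the backlog is now visible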
I hate telling you this, but you should really find another way to process your socket data. I think you will find that some other mechanism scales better, performs better, and is easier to debug than using the Windows message queue for this.
Foredecker

Related

investments/transactions/get endpoint - how long to return data?

I've been testing Plaid's investments transactions endpoint (investments/transactions/get) in development.
I'm encountering issues with highly variable delays for data to be returned (following the product initialization with Link). Plaid states that it takes 1–2 minutes to return investment transaction data, but I've found that in practice, it can be up to several hours before the data is returned.
Anyone else using this endpoint and getting data returned within 1–2 minutes, or is it generally a longer wait?
If it is a longer wait, do you simply wait for the DEFAULT_UPDATE webhook before you retrieve the data?
So far, my experience with their investments/transactions/get has been problematic (missing transactions, product doesn't work as described in their docs, limited sandbox dataset, etc.) so I'm very interested in hearing from anyone with more experience with this endpoint.
Do you find this endpoint generally reliable, and the data provided to be usable, or have you had issues? I've not seen any issues with investments/holdings/get, so I'm hoping that my problems are unusual, and I just need to push through it.
I'm testing in development with my own brokerage accounts, so I know what the underlying transactions are compared to what Plaid is returning to me. My calls are set up correctly, and I can't get a helpful answer from Plaid support.
I took a look at the support issue, and it does appear that the problem you're hitting is related to a bug (or two different bugs, in this case).
However, for posterity and anyone else reading this question, I looked it up, and the general answer is that the endpoint is pretty fast in the typical case: P95 latency for calling /investments/transactions/get is currently about 1 second. Initial calls on an Item will have higher latency, as they have more data to fetch and are blocked on Plaid extracting the data for the Item for the first time -- hence the 1-2 minute guidance in the docs.
In addition, Investments updates at some major brokerages are scheduled to happen only overnight after market close, so there might be a delay of 12+ hours between making a trade and seeing that trade be returned by the API.
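For anyone wondering about the webhook-driven approach raised in the question, here is a minimal sketch of "wait for the webhook, then fetch". The Flask endpoint and the fetch_investment_transactions() helper are hypothetical; only the DEFAULT_UPDATE webhook code comes from the question itself.

from flask import Flask, request

app = Flask(__name__)

def fetch_investment_transactions(item_id):
    # hypothetical: call /investments/transactions/get for this Item here
    pass

@app.route("/plaid/webhook", methods=["POST"])
def plaid_webhook():
    payload = request.get_json(force=True)
    # Only fetch once Plaid signals that investment transaction data is ready.
    if payload.get("webhook_code") == "DEFAULT_UPDATE":
        fetch_investment_transactions(payload.get("item_id"))
    return "", 200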

Cancel last sent message ZeroMQ (python) (dealer/router and push/pull)

How would one cancel the last sent message?
I have this set up
The idea is that the client can ask for different types of large data.
The server reads the request from the client and replies with an acknowledgement.
Once the data is ready, it pushes it through the other socket.
This enables queueing tasks on the server side when multiple clients are connected.
However, if the client decides that it does not need the data anymore, it can send a cancel message to the server.
I'm using asyncio.Queue for queueing messages, so I can easily empty the queue. However, I don't know how to drop a message that is already in the push/pull pipe in order to free up the channel.
The kill switch example (Figure 19 - Parallel Pipeline with Kill Signaling) in https://zguide.zeromq.org/docs/chapter2/ is used to end the process. I just want to cancel it.
My idea was to close the socket on the server side and reopen it, but even with linger set to 0, the messages are not dropped.
EDIT: The messages are indeed dropped, but I feel the solution is wrong.
It doesn't really make any sense for ZeroMQ itself to have such a feature.
Suppose that it did have a cancel-message feature. For it to operate as expected, you would be critically dependent on the speed of the network. You might develop on a slow network, where you have the time available to decide to cancel, submit the cancellation, and have it take effect before anything has moved anywhere. But on a fast network you won't.
ZeroMQ is a bit like the post office. Once you have posted a letter, they are going to deliver it.
Other issues for a library developer would include how messages are identified, who can cancel a message, and so on. It would get very complex for the library to do this and cater for all possible use cases, so it's not unreasonable that they've left such things as an exercise for the application developers.
Chop the Responses Up
You could divide the responses up into smaller messages, send them at some likely rate (proportionate to the network throughput) and check to see if a cancellation has been received before sending each chunk.
It's a bit fiddly: you'd need to work out what rate to send the smaller messages at, so that you don't starve the network, but don't overdo it either.
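A rough sketch of that approach (the socket names and framing are made up; it assumes cancel requests arrive on a separate control socket that is only polled, never blocked on):

import zmq

CHUNK_SIZE = 64 * 1024            # bytes per chunk; tune to your network throughput

def send_chunked(data_sock, cancel_sock, data):
    poller = zmq.Poller()
    poller.register(cancel_sock, zmq.POLLIN)
    for offset in range(0, len(data), CHUNK_SIZE):
        # Non-blocking check: has a cancel request arrived?
        if dict(poller.poll(timeout=0)).get(cancel_sock):
            cancel_sock.recv()                     # consume the cancel message
            return False                           # abandon the rest of the transfer
        data_sock.send(data[offset:offset + CHUNK_SIZE])
    data_sock.send(b"")                            # empty frame marks end-of-stream
    return True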
Or, Convert to CSP
The problem lies in ZeroMQ implementing the Actor Model, where the transport buffers messages. What you need is Communicating Sequential Processes (CSP), which does not buffer messages. You can implement this quite easily on top of ZeroMQ; basically, all you need to do is have a two-way message exchange going on, like:
Peer1->Peer2: I'd like to send you a message
time passes
Peer2->Peer1: Okay send a message
Peer1->Peer2: Here is the message
time passes
Peer2->Peer1: I have received the message
end
And in doing this, the peers would block; i.e. peer 1 does nothing else until it gets peer 2's final response.
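As a minimal sketch of that exchange, assuming a plain REQ/REP pair (the socket roles and the literal control strings are made up), the four steps become two blocking round trips:

import zmq

def csp_send(req_sock, payload):
    """Blocking, CSP-style send over a zmq.REQ socket."""
    req_sock.send(b"REQUEST_TO_SEND")               # Peer1 -> Peer2: I'd like to send you a message
    assert req_sock.recv() == b"CLEAR_TO_SEND"      # Peer2 -> Peer1: okay, send the message
    req_sock.send(payload)                          # Peer1 -> Peer2: here is the message
    assert req_sock.recv() == b"RECEIVED"           # Peer2 -> Peer1: I have received the message

def csp_recv(rep_sock):
    """Blocking, CSP-style receive over a zmq.REP socket."""
    assert rep_sock.recv() == b"REQUEST_TO_SEND"
    rep_sock.send(b"CLEAR_TO_SEND")
    payload = rep_sock.recv()
    rep_sock.send(b"RECEIVED")
    return payload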
This feels clunky, but it's what you have to do to rein in an Actor Model system and control where your messages are at any point in time. It's slower because there's more to-ing and fro-ing going on between the peers (in systems like Transputers, this was all done down at the electronic level, so it wasn't an encumbrance on software).
The blocking can be a blessing, if throughput matters. Basically, if you find the sender is being blocked too much, that just means you haven't got enough receivers for the tasks they're performing. Actor Model can deceive, because buffering in the network / actor model implementation can temporarily soak up an excess of messages, adding a bit of latency that goes unnoticed.
Anyway, this way you can have a mechanism whereby the flow of messages is fully managed within the application, and not within the ZeroMQ library. If a client does send a "cancel my last request" message (using the above mechanism), it either arrives before the response has started to be sent, or after the response has already been delivered to the client. There is no intermediate state where a response is already on the way but out of the applications' control.
CSP is a model that I'd dearly like ZeroMQ to implement natively. It nearly does, in that you can control the socket high-water marks. Unfortunately, a high-water mark of 0 means "infinite", not zero.
CSP itself is a 1970s idea that saw some popularity, and indeed silicon, in the 1980s and early 1990s (Inmos, Transputers, Occam, etc.), but it has recently made something of a comeback in languages like Rust, Go, and Erlang. There's even a Microsoft-supplied library for .NET that does it too (not that they call it CSP).
The really big benefit of CSP is that it is algebraically analysable: a design can be analysed and proven to be free of deadlock without any testing. With Actor Model systems you cannot do that, and testing will not confirm a lack of problems either. Complex, circular message flows in an Actor Model system can easily lead to deadlock, and the deadlock might not occur until the network between computers becomes just a tiny bit busier. Deadlock can happen in CSP too, but if the system has accidentally been architected to deadlock, it is basically guaranteed to happen every time, so it shows up in testing quite readily (at least you know early on!).
As I alluded to earlier, CSP also doesn't deceive you into thinking there are enough compute resources in a system. If a sender has a strict schedule to keep and the recipients aren't keeping up, the sender ends up blocked trying to send instead of waiting for fresh input, so it's easy to detect that the real-time requirement has not been met. With the Actor Model, by contrast, the sender launches messages off into some buffer, and so long as the receivers keep up on average, all appears to be OK. However, you have no visibility of whether messages are building up inside (in this case) ZeroMQ's own buffers, so there is little warning of a trending problem in the overall system.

What is the ZeroMQ PUB/SUB internal behaviour?

I'm trying to get my head around the behaviour of zmq with PUB/SUB.
Q1: I can't find a real reason why, with the PUSH/PULL socket combo, I can create a queue that actually buffers messages in memory when they can't be delivered (the consumer is not available), while with PUB/SUB I can't.
Q2: Is there any technical whitepaper or document that describes in detail the internals of the sockets?
EDIT:
This example of a PUSH/PULL streamer works as expected (a worker that joins late or restarts still gets the messages queued in the feeder). A PUB/SUB forwarder does not behave in the same way.
While Q1 is hard to answer / fully address without a single line of your code ...
there is still a chance your code (though not yet published, which Stack Overflow so strongly encourages users to include in the form of an MCVE that you may already have felt, or soon might feel, some flames for not doing so) has simply forgotten to set a subscription topic-filter:
aSubSOCKET.setsockopt( zmq.SUBSCRIBE, b"" )          # -> recv "EVERYTHING" / NO-TOPIC-FILTER
aSubSOCKET.setsockopt( zmq.SUBSCRIBE, b"GOOD-NEWS" ) # -> recv only messages prefixed "GOOD-NEWS"
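For completeness, a minimal self-contained subscriber sketch (the endpoint is made up) showing where that option goes -- a SUB socket with no subscription set silently drops everything it receives:

import zmq

ctx = zmq.Context()
aSubSOCKET = ctx.socket(zmq.SUB)
aSubSOCKET.connect("tcp://localhost:5556")     # hypothetical publisher endpoint
aSubSOCKET.setsockopt(zmq.SUBSCRIBE, b"")      # no topic-filter: receive everything
while True:
    print(aSubSOCKET.recv())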
A2: Yes, there are exhaustive descriptions of all ZeroMQ API calls. Besides the API man-page collection for ØMQ/2.1.1 and other versions, there is a great freely published PDF book, "Code Connected, Vol. 1", from Pieter Hintjens himself.
Worth reading. A lot of insight into the general distributed-processing area and the ZeroMQ way.

MPI: master-slave with the master also doing work

I'm implementing a standard MPI master/slave system: there is a master that distributes work, and there are slaves who ask for chunks and process data.
However... if implemented in a naive way (rank==0 is the master, the rest are slaves), the master ends up doing no real work, yet it still occupies one core for what needs practically no computing power. So I tried to implement a separate "scheduler" thread in the master, but that involved sending MPI messages to itself, and it didn't really work...
Do you have any ideas how to solve this?
As I realized after some googling: you can send messages to yourself using tags. Tags are a kind of filter: if you do a recv for only tag==1, then you'll receive only those messages, with later messages being able to overtake earlier ones.
So, as for the solution:
tag the "scheduler to worker" and "worker to scheduler" messages with a different id
if rank==0: start a scheduler thread
afterwards, regardless of the rank, request work.
This way, the rank 0 worker won't receive its own "give me work" messages, because they will have a "to be received by the scheduler only" tag.
Edit: this doesn't really seem to be thread-safe though... (it sometimes crashes in free() even though it's written in Python...), so I'd still be interested in the real & proven solution :)
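With that thread-safety caveat in mind, here is a rough mpi4py sketch of the scheme above (the tag values and work items are made up, and depending on the MPI build you may need the library initialised with full thread support):

from mpi4py import MPI
import threading

TAG_TO_SCHEDULER = 1     # "give me work" requests go to the scheduler
TAG_TO_WORKER    = 2     # work chunks (or None as a stop sentinel) go back to workers

comm = MPI.COMM_WORLD

def scheduler(work_items):
    finished = 0
    while finished < comm.Get_size():
        status = MPI.Status()
        comm.recv(source=MPI.ANY_SOURCE, tag=TAG_TO_SCHEDULER, status=status)
        item = work_items.pop() if work_items else None        # None tells a worker to stop
        comm.send(item, dest=status.Get_source(), tag=TAG_TO_WORKER)
        if item is None:
            finished += 1

if comm.Get_rank() == 0:
    threading.Thread(target=scheduler, args=(list(range(100)),), daemon=True).start()

# Every rank, including rank 0, now acts as a worker.
while True:
    comm.send(None, dest=0, tag=TAG_TO_SCHEDULER)    # ask the scheduler for work
    item = comm.recv(source=0, tag=TAG_TO_WORKER)
    if item is None:
        break
    # ... process item ...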

What's a Reasonable Length of Time to Timeout a Ruby Thread?

I've got a need to retain the data and keep a Ruby program waiting for a response for anything up to a couple of days. I'm thinking about implementing this using threads (there may be a number of concurrent requests across a network). My question: is it reasonable to leave a thread running for up to a couple of days awaiting a response?
In general there is no problem with that. Check out the Queue class; it might facilitate the "job polling":
http://www.ruby-doc.org/stdlib-1.8.7/libdoc/thread/rdoc/Queue.html
