I'm having trouble which boils down to wishing CreateProcess were StartProcess. The trouble is that there are circumstances under which CreateProcess returns true when it created the process but the system could not start the process. For example, CreateProcess will succeed even if one of the launchee's imports cannot be resolved.
There are probably a dozen suggestions one could make depending on what exactly I hope to accomplish by having launched this process. However, I'm afraid none of those suggestions is likely to be useful because I'm not hoping to accomplish anything in particular by having launched this process.
One example suggestion might be to call WaitForSingleObject against the process handle and then GetExitCodeProcess. But I can't wait for the process to exit because it might stick around forever.
Another example suggestion might be to call WaitForInputIdle, which would work well if I hoped to communicate with the launchee by means of a window I could reasonably expect the launchee to create. But I don't hope that and I can't reasonably expect that. For all I know, the launchee is a console process and/or will never have a message queue. As well, I can't afford to wait around (with heuristic intent) to find out.
In fact, I can't assume anything about the launchee.
To get a better idea of how I'm thinking here, let's look at the flip side of the issue. If the process doesn't start, I want an error code that tells me how I might advise the user. If the imports all resolved and the main thread realizes it's about to jump into the CRT startup code (or equivalent), and the error code I get back is ERROR_SUCCESS, great! But I'm actually uninterested in the launchee and merely wish to provide a good user experience in the launcher.
Oh, and one more thing: I want this to be simple. I don't want to write a debugger. :-)
Ideas?
One example suggestion might be to call WaitForSingleObject against the process handle and then GetExitCodeProcess. But I can't wait for the process to exit because it might stick around forever.
Why don't you wait for the process handle for some reasonable time. If the timer expires before the handle is signaled, you can presume the process is up and running. If the handle is signaled first, and the exit code is good, then you can presume it ran and completed successfully.
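For what it's worth, here is a minimal sketch of that heuristic. It is written in Python rather than against the Win32 API, so subprocess.Popen.wait(timeout=...) plays roughly the role of WaitForSingleObject with a timeout, and the returned code plays the role of GetExitCodeProcess. The function name launch_and_probe and the three-second default are my own assumptions, not anything from the question.

```python
import subprocess
import sys


def launch_and_probe(argv, grace_seconds=3.0):
    """Start argv and wait up to grace_seconds for an early exit.

    Returns (proc, "exited", code) if the process ended within the grace
    period, or (proc, "running", None) if it was still alive when the
    timer expired. "running" is only a presumption that all is well.
    """
    proc = subprocess.Popen(argv)
    try:
        code = proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        # Timer expired before the process ended: presume it is up.
        return proc, "running", None
    # The process ended first: the caller must judge the exit code.
    return proc, "exited", code
```

A launcher would treat "exited" with a failure code as "did not start properly" and advise the user accordingly. The weakness is the one already conceded: a launchee that fails just after the grace period is still reported as running.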
In case you haven't seen it, the CreateProcess vs started problem was mentioned in Raymond Chen's blog.
Honestly, if you're not willing to accept heuristics (like, "it hasn't ended with a failure code after three seconds, therefore we assume all is well") then you're going to have to write a 'debugger', by which I mean inspect the internals of the launched process.
This question has gone so long without an answer that I suspect it's safe to conclude that the answer is: "You can't."
Systemd creates a process which in turn starts many other applications/processes at startup; these need to run on my embedded device.
Is there any way to add a piece of code to all the applications so that systemd exchanges a 'heartbeat' with them and knows whether an application has hung?
Are there any examples I can refer to in order to understand this?
The answer is yes, see this superuser post:
Yes; but first fix your buggy program before fiddling with systemd.
MariusMatutiae is quite correct. You have a problem with your program.
It deadlocks. Fiddling with systemd isn't the answer. At best, it's a
distraction. Fix your program so that it isn't broken. Direct your
energies at the right thing.
That said, other people are going to come here because of the question
title, rather than the question proper. For their benefit, here's the
answer to the title, ignoring the question proper:
Yes, systemd can monitor dæmons and automatically restart them if they
stop talking. Not just any old dæmons, though. As mvp notes, there's
no way to know that a dæmon has hung (in this universe, where the
halting problem is undecidable, at least). Neither systemd nor any
other computer program will ever be capable of deducing from scratch
that some random program thrown at them has deadlocked, or gone into
an infinite loop, or whatever. The best that you'll get here is
detecting that a dæmon hasn't performed a regular "heartbeat"
operation within a required timespan.
Dæmons that take advantage of systemd's watchdog capabilities,
therefore, have to be written to speak a systemd-specific protocol,
the sd_notify protocol. This complicates the dæmon code a tad. It's
complicated further because dæmons should, if written properly, check
whether they've been invoked with the watchdog function enabled, as
well.
A dæmon that speaks this protocol to make use of systemd's watchdog
capability …
… must check for the WATCHDOG_USEC environment variable;
… must call sd_notify() continually and frequently, throughout its lifetime, with the WATCHDOG=1 option set, at an interval of about WATCHDOG_USEC/2 ("USEC" stands for microseconds);
… must have Type=notify set in its unit file;
… should have NotifyAccess=main (or =all) set in its unit file;
… must have WatchdogSec=seconds set in its unit file.
… must link with libsystemd-daemon.so.
If you want to know the details of coding this, after reading the manual, make sure that you go to the right StackExchange. This is SuperUser. StackOverflow is over there.
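To give a rough idea of what the dæmon side of the checklist above looks like, here is a sketch in Python. It does not call the C function sd_notify(); it assumes the notification is a plain datagram sent to the Unix socket that systemd names in the NOTIFY_SOCKET environment variable, and it reimplements just that. The names watchdog_interval_seconds, run_service and do_work_step are hypothetical, chosen for illustration.

```python
import os
import socket
import time


def sd_notify(message):
    """Send one datagram to the socket named by NOTIFY_SOCKET.

    Returns False when the variable is unset (no notify-aware service
    manager is listening), True after the datagram has been sent.
    """
    path = os.environ.get("NOTIFY_SOCKET")
    if not path:
        return False
    if path.startswith("@"):
        # A leading "@" denotes a Linux abstract-namespace socket.
        path = "\0" + path[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(message.encode("ascii"), path)
    return True


def watchdog_interval_seconds():
    """Return WATCHDOG_USEC / 2 in seconds, or None if the watchdog is off."""
    usec = os.environ.get("WATCHDOG_USEC")
    if not usec:
        return None
    return int(usec) / 1_000_000 / 2


def run_service(do_work_step):
    """Hypothetical main loop; do_work_step() returns False to stop."""
    interval = watchdog_interval_seconds()
    sd_notify("READY=1")
    last_ping = time.monotonic()
    while do_work_step():
        # The ping is sent only after a step completes, so a step that
        # hangs silences the heartbeat and lets the watchdog fire.
        if interval is not None and time.monotonic() - last_ping >= interval:
            sd_notify("WATCHDOG=1")
            last_ping = time.monotonic()
```

The design point is that the heartbeat comes from the same loop that does the real work. A separate thread that pings unconditionally would keep the watchdog happy even while the main loop is deadlocked.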
Further reading
Lennart Poettering. 2011-04-12. Watchdogs. Freedesktop.org.
When I hear about the halting problem, it sounds like non-termination is something to avoid and that the halting problem makes it impossible to know if the program/algorithm is good.
But when I think about it, aren't terminating programs the exception and not the rule? I can think of one class of applications where it's expected to terminate in a finite amount of time: compilers. Everything else, from the web browser I'm using, to the desktop environment, to the text editor, to the shell, to the server hosting SO, to the OS itself, isn't supposed to terminate on its own. Heck, even the package manager is supposed to ask the user for confirmation. They're all intended to keep running indefinitely unless a user or sysadmin says otherwise.
My point is: is it really so bad that you can't prove that something will terminate? If anything, proving that something will exit in a finite amount of time would be more of a bug than the opposite.
I see your logic but while these programs you mention operate in an infinite loop until terminated you can still terminate them at any time using the exit feature. The problem with non-deterministic termination is that you have no idea when the program will release control of the operation it's performing so that it can be terminated.
Consider this. You write a program that completes a cycle and then begins its loop again. Each cycle is similar to the program terminating, but rather than closing the program you ask it to start over. If you put a call to a function containing an infinite loop in that program, the program stays stuck in that function, effectively blocking all other functionality until that loop has completed. Hint: never. This is perceived by the user as the program freezing.
Termination of a program is not the point. It's only an easy to explain case of termination of a computation. Here's a practical example:
When you visit a web page, you may start running some Javascript. Depending on how the code is embedded in the page, you may have to wait for this script to terminate before the web page is fully displayed. If the script doesn't terminate within a certain time limit, you'll get a message like this:
(Chrome dialog pictured)
You're supposed to decide somehow whether the script is making progress and will finish if given a little more time, or if it's stuck in an infinite loop. You probably don't know the answer, so you guess. You wait until you're tired of waiting and then give up and kill it, not knowing if it was just 1 more second from completion when you hit the button.
Chrome doesn't tell you that the script is hopelessly stuck and will never terminate because detecting hopelessly stuck scripts would require solving the halting problem.
And it's not just page loads either. Javascript (in the web client context) is event-driven. A function is called when something external happens (e.g. you click on a form submit button) and that event is not processed until the function returns (terminates). A non-terminating script is a big problem.
The documentation for TerminateProcess says, in part:
This function stops execution of all threads within the process and requests
cancellation of all pending I/O.
...
TerminateProcess is asynchronous; it initiates termination and returns
immediately. If you need to be sure the process has terminated, call the
WaitForSingleObject function with a handle to the process.
This leaves some ambiguity about what happens if you use TerminateProcess to commit process suicide, like this:
TerminateProcess(GetCurrentProcess(), exit_code)
Logically that should be sufficient, but the documentation says that execution may continue afterwards, which is dangerous if you are calling TerminateProcess due to a bug that leaves your process in an indeterminate state.
The closest thing I have found to confirmation that waiting-on-suicide is not needed is the source code to _invoke_watson in:
C:\Program Files (x86)\Windows Kits\10\Source\10.0.10240.0\ucrt\misc\invalid_parameter.cpp
The last thing this function does is to call TerminateProcess on itself - with no wait.
It would be good to get certainty and I wanted to ask this here so that the answer can serve as a supplement to the documentation.
I got this answer from somebody offline who preferred not to post it here themselves:
My take on the user-mode documentation is that TerminateProcess is in general something that one process does to another and the documentation is written for this generality, and in this sense is correct and reasonable.
Give TerminateProcess a handle to some other process and you of course want that the call comes back to you for your continued operation.
What the documentation makes a point of is that the call's success does not mean that the other process has terminated, just that its termination has been successfully got under way.
If you rely on more than this, then you're surely doing something wrong. Investigating what exactly is reliable, or even thinking about it, has therefore never seemed very useful to me. As much as I've bothered, I've taken it all to mean that by the time TerminateProcess returns to you the other process and all its threads are already marked as terminated - literally, they have the Terminated bit set in the ETHREAD - but continue executing APCs, typically for such things as completing or cancelling I/O. Though this continued execution is greatly constrained, it may take some indefinite time. So, if you need to know that all execution is done, you watch for the process to get signalled.
Of course, to call TerminateProcess to terminate another process is a brutal thing to do, which the documentation also makes a point of saying.
Same Process
Calling TerminateProcess to terminate your own process is no less brutal. With one exception, it's very much a last resort, and even in that context, a really desperate one. Anyone who does it must have many bigger problems than what the call might return to, else surely they would call ExitProcess which explicitly does not return (but might leave the process in a deadlock).
Self-terminating via TerminateProcess introduces the case that one of the threads that's being terminated is the current thread. The post-termination APC processing is not activated for it: in the Windows 7 kernel, see PspTerminateThreadByPointer, which again makes the special case that user-mode termination of the current thread goes to the non-returning PspExitThread instead of queueing an APC for the final exit.
So, my reading is that TerminateProcess with -1 as the handle does not return, but that if you were in the kernel for the very last instructions that execute for the calling thread, you could see that other threads in your process may still execute to process APCs but yours has its APC queue drained and is about to be switched away from, never to come back. Indeed, if at this time your thread somehow has an APC to process, the kernel bug-checks!
By the way, the exception to TerminateProcess being brutal is that it's essentially the last thing that NTDLL does even for the relatively gentle self-termination via ExitProcess. It surely is well-known, yet it always surprises me, that to handle RtlExitUserProcess, NTDLL calls NtTerminateProcess twice. First, there's user-mode cleanup and a call to NtTerminateProcess giving NULL as the handle. If this succeeds, there's more user-mode cleanup, ending with a call to NtTerminateProcess giving -1 as the handle.
Well, that last call is just what you get from TerminateProcess with -1 as the handle. When you self-terminate via TerminateProcess you're just skipping past the clean-up, in both user mode and kernel mode, that you'd have got from ExitProcess. Anyway, when the TerminateProcess is, in effect, done by NTDLL as the end of ExitProcess, it's plainly expected not to return.
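As a loose, cross-platform analogy only (this is not TerminateProcess and says nothing about the Windows kernel), the difference between "exit with user-mode cleanup" and "exit that skips it and never returns to the caller" can be seen with Python's sys.exit and os._exit. The child script and the run_child helper below are illustrative assumptions of mine.

```python
import subprocess
import sys

# Child program: registers a cleanup handler, then exits one of two ways.
CHILD = """
import atexit, os, sys
atexit.register(lambda: print("cleanup ran", flush=True))
print("before exit", flush=True)
if sys.argv[1] == "abrupt":
    os._exit(3)   # leaves immediately; atexit handlers are skipped
else:
    sys.exit(3)   # raises SystemExit; atexit handlers run
print("unreachable", flush=True)
"""


def run_child(mode):
    """Run the child in the given mode; return (exit code, output lines)."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD, mode],
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stdout.splitlines()
```

In both modes the line after the exit call is never reached, and the exit code is the same. Only the gentle path runs the registered cleanup, which mirrors the point that self-termination skips the clean-up you would have got from the orderly exit.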
Two methods Thread#run and Thread#wakeup are different somehow, but it is not clear to me. Can someone provide a pair of code with minimal difference (i.e., difference being only the use of run in one, wakeup in another) that show different results, and possibly explanation for it?
Edit As Cary points out, it is indeed a duplicate of this question, and there is a good answer there, but now I am not sure what it means to have a thread awaken but not running. How is that different from a thread being in sleep situation?
To answer your second question: a sleeping thread cannot be scheduled, whereas a thread that is awake can be scheduled to run, even if it isn't currently running.
As for the first part, my understanding of wakeup vs run is that run calls wakeup internally and then invokes the scheduler.
Hope that helps.
A simple search for DoEvents brings up lots of results that lead, basically, to:
DoEvents is evil. Don't use it. Use threading instead.
The reasons generally cited are:
Re-entrancy issues
Poor performance
Usability issues (e.g. drag/drop over a disabled window)
But some notable Win32 functions such as TrackPopupMenu and DoDragDrop perform their own message processing to keep the UI responsive, just like DoEvents does.
And yet, none of these seem to come across these issues (performance, re-entrancy, etc.).
How do they do it? How do they avoid the problems cited with DoEvents? (Or do they?)
DoEvents() is dangerous. But I bet you do lots of dangerous things every day. Just yesterday I set off a few explosive devices (future readers: note the original post date relative to a certain American holiday). With care, we can sometimes account for the dangers. Of course, that means knowing and understanding what the dangers are:
Re-entry issues. There are actually two dangers here:
Part of the problem here has to do with the call stack. If you call .DoEvents() in a loop that itself handles messages that use DoEvents(), and so on, you're getting a pretty deep call stack. It's easy to over-use DoEvents() and accidentally fill up your call stack, resulting in a StackOverflow exception. If you're only using .DoEvents() in one or two places, you're probably okay. If it's the first tool you reach for whenever you have a long-running process, you can easily find yourself in trouble here. Even one use in the wrong place can make it possible for a user to force a stackoverflow exception (sometimes just by holding down the enter key), and that can be a security issue.
It is sometimes possible to find your same method on the call stack twice. If you didn't build the method with this in mind (hint: you probably didn't) then bad things can happen. If everything passed in to the method is a value type, and there is no dependence on things outside of the method, you might be fine. But otherwise, you need to think carefully about what happens if your entire method were to run again before control is returned to you at the point where .DoEvents() is called. What parameters or resources outside of your method might be modified that you did not expect? Does your method change any objects, where both instances on the stack might be acting on the same object?
Performance Issues. DoEvents() can give the illusion of multi-threading, but it's not real multithreading. This has at least three real dangers:
When you call DoEvents(), you are giving control on your existing thread back to the message pump. The message pump might in turn give control to something else, and that something else might take a while. The result is that your original operation could take much longer to finish than if it were in a thread by itself that never yields control, definitely longer than it needs.
Duplication of work. Since it's possible to find yourself running the same method twice, and we already know this method is expensive/long-running (or you wouldn't need DoEvents() in the first place), even if you accounted for all the external dependencies mentioned above so there are no adverse side effects, you may still end up duplicating a lot of work.
The other issue is the extreme version of the first: a potential to deadlock. If something else in your program depends on your process finishing, and will block until it does, and that thing is called by the message pump from DoEvents(), your app will get stuck and become unresponsive. This may sound far-fetched, but in practice it's surprisingly easy to do accidentally, and the crashes are very hard to find and debug later. This is at the root of some of the hung app situations you may have experienced on your own computer.
Usability Issues. These are side-effects that result from not properly accounting for the other dangers. There's nothing new here, as long as you looked in other places appropriately.
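The re-entry danger in the list above is easier to see in a toy model. The following Python sketch is not WinForms; MessagePump, post and on_click are hypothetical names, and do_events simply drains a queue the way a nested message pump would. Each queued click is dispatched from inside the handler of the previous one, so the same handler piles up on the call stack.

```python
from collections import deque


class MessagePump:
    """Toy single-threaded message queue with a DoEvents-style drain."""

    def __init__(self):
        self.pending = deque()
        self.depth = 0       # copies of on_click currently on the stack
        self.max_depth = 0   # deepest nesting observed

    def post(self, handler):
        self.pending.append(handler)

    def do_events(self):
        # Dispatch everything that is queued, from wherever we are called.
        while self.pending:
            self.pending.popleft()()

    def on_click(self):
        self.depth += 1
        self.max_depth = max(self.max_depth, self.depth)
        self.do_events()     # a queued click re-enters on_click right here
        self.depth -= 1


def max_reentry_depth(clicks):
    """Queue `clicks` clicks, pump once, and report the deepest nesting."""
    pump = MessagePump()
    for _ in range(clicks):
        pump.post(pump.on_click)
    pump.do_events()
    return pump.max_depth
```

Three queued clicks give a nesting depth of three: nothing in the handler returns until every later click has been handled inside it. Hold a key down long enough in a real application and this is the path to the stack overflow mentioned above.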
If you can be sure you accounted for all these things, then go ahead. But really, if DoEvents() is the first place you look to solve UI responsiveness/updating issues, you're probably not accounting for all of those issues correctly. If it's not the first place you look, there are enough other options that I would question how you made it to considering DoEvents() at all. Today, DoEvents() exists mainly for compatibility with older code that came into being before other credible options were available, and as a crutch for newer programmers who haven't yet gained enough experience for exposure to the other options.
The reality is that most of the time, at least in the .Net world, a BackgroundWorker component is nearly as easy, at least once you've done it once or twice, and it will do the job in a safe way. More recently, the async/await pattern or the use of a Task can be much more effective and safe, without needing to delve into full-blown multi-threaded code on your own.
Back in 16-bit Windows days, when every task shared a single thread, the only way to keep a program responsive within a tight loop was DoEvents. It is this non-modal usage that is discouraged in favor of threads. Here's a typical example:
' Process image
For y = 1 To height
    For x = 1 To width
        ProcessPixel x, y
    Next x
    DoEvents ' <-- DON'T DO THIS -- just put the whole loop in another thread
Next y
For modal things (like tracking a popup), it is likely to still be OK.
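Here is roughly what "put the whole loop in another thread" can look like, sketched in Python rather than VB. The worker runs the nested loop untouched and reports progress through a queue; the calling thread only reads that queue. process_pixel is a made-up stand-in for ProcessPixel, and run_with_worker is a hypothetical name.

```python
import queue
import threading


def process_pixel(x, y):
    """Hypothetical per-pixel work; stands in for ProcessPixel."""
    return (x * y) % 256


def process_image(width, height, progress):
    """Worker body: the whole nested loop, reporting each finished row."""
    total = 0
    for y in range(1, height + 1):
        for x in range(1, width + 1):
            total += process_pixel(x, y)
        progress.put(("row", y))
    progress.put(("done", total))


def run_with_worker(width, height):
    """Start the worker and consume progress until it reports completion."""
    progress = queue.Queue()
    worker = threading.Thread(
        target=process_image, args=(width, height, progress)
    )
    worker.start()
    rows_seen = 0
    while True:
        # A real UI loop would dispatch input messages here; this
        # stand-in just blocks on the progress queue.
        kind, value = progress.get()
        if kind == "done":
            worker.join()
            return rows_seen, value
        rows_seen += 1
```

The loop body never yields to a message pump, so none of the re-entrancy questions arise. The only shared state is the queue, which is safe to use from both threads.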
I may be wrong, but it seems to me that DoDragDrop and TrackPopupMenu are rather special cases, in that they take over the UI, so don't have the reentrancy problem (which I think is the main reason people describe DoEvents as "Evil").
Personally I don't think it's helpful to dismiss a feature as "Evil" - rather explain the pitfalls so that people can decide for themselves. In the case of DoEvents there are rare cases where it's still reasonable to use it, for example while a modal progress dialog is displayed, where the user can't interact with the rest of the UI so there is no re-entrancy issue.
Of course, if by "Evil" you mean "something you shouldn't use without fully understanding the pitfalls", then I agree that DoEvents is evil.