How do I code a watchdog timer to restart a Windows service?

I'm very interested in the answer to another question regarding watchdog timers for Windows services (see here). That answer stated:
I have also used an internal watchdog system running in another thread. That thread looks at the main thread for activity like log output or a toggling event. If the activity is not seen then the service is considered hung and I shutdown the service.
In this case you can configure windows to auto-restart a stopped service and that might clear the problem (as long as it's not an internal logic bug).
Also, the services I work with write text logs. In addition, for services that are about to "sleep for a bit", I log the time of the next wake-up. I use MTAIL to watch a log for output.
Could anyone give some sample code showing how to use an internal watchdog running in another thread? I currently have a task to develop a Windows service that can restart itself if it fails, hangs, etc.
I really appreciate your help.

I'm not a big fan of running a watchdog as a thread in the process you're watching. That means if the whole process hangs for some reason, the watchdog won't work.
Watchdogs are an idea lifted from the hardware world, and they had it right. Use an external circuit that is as simple as possible (so it can be provably correct). Typical watchdogs simply ran a timer and, if the process hadn't done something before the timer expired (like access a memory location the watchdog was watching), the whole thing was reset. When the watchdog was "kicked", it would restart the timer.
The act of the process kicking the watchdog protected that process from summary termination.
My advice would be to write a very simple stand-alone program that just monitors an event (such as a file's last-modified time being updated). If that event doesn't occur within the required time, kill the process being watched (and let Windows restart it).
Then have your watched program periodically rewrite that file.
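As a rough illustration of the file-based approach, here is a minimal Python sketch. The heartbeat path, the timeout, the polling interval and the function names are all assumptions made for the example, not anything prescribed above. The watched service calls kick() after each unit of work; the stand-alone watchdog calls watch() and force-kills the process once the file goes stale, leaving the restart to the Windows service recovery settings.

```python
import subprocess
import time
from pathlib import Path
from typing import Optional


def kick(heartbeat: Path) -> None:
    """Called by the watched service: refresh the file's modification time."""
    heartbeat.touch()


def is_hung(heartbeat: Path, timeout: float, now: Optional[float] = None) -> bool:
    """Return True if the heartbeat file is missing or older than `timeout` seconds."""
    if now is None:
        now = time.time()
    try:
        age = now - heartbeat.stat().st_mtime
    except FileNotFoundError:
        # Assumption: a missing file counts as "hung". A real watchdog may
        # want a grace period while the service is still starting up.
        return True
    return age > timeout


def watch(heartbeat: Path, pid: int, timeout: float = 30.0, poll: float = 5.0) -> None:
    """Stand-alone watchdog loop: poll the file, kill the process when it goes stale."""
    while not is_hung(heartbeat, timeout):
        time.sleep(poll)
    # taskkill is Windows-only; /F forces termination of the given PID.
    subprocess.run(["taskkill", "/F", "/PID", str(pid)], check=False)
```

The watchdog deliberately does nothing except compare a timestamp and kill a PID, which keeps it simple enough to trust.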

Other approaches you might want to consider, besides regularly modifying the last-write time of a file, would be to create a proper performance counter or even a WMI object. We do the latter in our build infrastructure. The 'trick' is to find a meaningful work unit in the service being monitored and pulse your 'heartbeat' each time a unit is finished.
The advantage of WMI or performance counters over the file approach is that you then become visible to a whole bunch of professional MIS / management tools. This can add a lot of value.

You can configure the service to restart itself on failure from its properties:
Services -> right-click your service -> Properties -> Recovery tab -> First failure: Restart the Service -> Second failure: Restart the Service -> Subsequent failures: Restart the Service
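The same recovery settings can also be applied from a script with the built-in sc failure command. The sketch below only builds the argument list; the service name, the 60-second restart delay and the one-day reset period are placeholder assumptions. Note that sc expects the value as a separate token after "reset=" and "actions=", which is why they are separate list items.

```python
import subprocess
from typing import List


def recovery_command(service_name: str,
                     restart_delay_ms: int = 60_000,
                     reset_after_s: int = 86_400) -> List[str]:
    """Build an `sc failure` command that restarts the service on the
    first, second and subsequent failures."""
    # Three restart actions, each followed by its delay in milliseconds.
    actions = "/".join(["restart", str(restart_delay_ms)] * 3)
    return ["sc", "failure", service_name,
            "reset=", str(reset_after_s),
            "actions=", actions]


def apply_recovery(service_name: str) -> None:
    """Run the command (Windows only, needs an elevated prompt)."""
    subprocess.run(recovery_command(service_name), check=True)
```

For a hypothetical service called MyService, recovery_command("MyService") yields the equivalent of: sc failure MyService reset= 86400 actions= restart/60000/restart/60000/restart/60000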

Related

Does Windows stop services gracefully or kill them when an error (memory overflow, etc.) occurs?

First of all, sorry for my poor English. I don't really know how to formulate the question, but I can explain my intentions, which may help you understand me better.
I'm developing a tool that notifies you when a Windows service goes down.
The exact logic that I follow is:
When a service goes down gracefully, it logs an event that you can see in the Windows Event Viewer. I've created a scheduled task that is triggered when the service is stopped, according to the Windows event log (thanks to an XML filter).
This task triggers a PowerShell script that sends a request to a Telegram bot, which notifies me when the service dies.
This process works perfectly when I manually stop the service (from services.msc or PowerShell's Stop-Service). The objective is to track the service in real time, and in this case it works correctly.
The problem comes here: I cannot force the service to crash in order to see whether it logs information in the Windows Event Viewer.
My questions are:
If an error occurs, will Windows shut the service down gracefully (like when using Stop-Service) or will it kill the process without registering any log info (like when using taskkill /f)?
Any other suggestions? Is there another way to track a Windows service in real time and trigger a script without a polling loop that runs at a fixed interval?
Hope y'all understand me :)
If a service crashes, you should still see an error message in the event log under Windows Logs > System. The Source will be "Service Control Manager" and the Event ID should be one of 7031, 7032, or 7034.
So you can add a filter for these and have your PowerShell script run on these kinds of events as well.
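For illustration, an event log XPath filter matching those IDs might look like the string built below. This is only a sketch: the helper names are made up for the example, and the query is handed to the built-in wevtutil tool simply to check by hand what the filter matches before putting it into the scheduled task's XML filter.

```python
from typing import Iterable, List


def crash_event_query(event_ids: Iterable[int] = (7031, 7032, 7034)) -> str:
    """Build an XPath filter for Service Control Manager crash events."""
    id_clause = " or ".join(f"EventID={i}" for i in event_ids)
    return ("*[System[Provider[@Name='Service Control Manager'] "
            f"and ({id_clause})]]")


def wevtutil_command(count: int = 5) -> List[str]:
    """Command that lists the newest matching events from the System log.

    Windows only; pass the result to subprocess.run to execute it.
    """
    return ["wevtutil", "qe", "System",
            f"/q:{crash_event_query()}",
            f"/c:{count}", "/rd:true", "/f:text"]
```

With the default IDs, crash_event_query() returns *[System[Provider[@Name='Service Control Manager'] and (EventID=7031 or EventID=7032 or EventID=7034)]]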

Why not launch external crash dump handler at the time the application crashes?

I am in the process of designing a crash handler solution for one of our applications that creates a crash dump file using the MiniDumpWriteDump() function. While reading up on the topic I have seen the recommendations to invoke MiniDumpWriteDump() from an external process to maximize the chance that the dump file contains the correct information. The common solution seems to be to run a watchdog process in parallel to the application process. When the application crashes it somehow contacts the watchdog process, providing it with the information that is required to create the crash dump. Then the application goes to sleep until it is terminated by the watchdog process.
I can imagine such a watchdog process being run continually as a background service. This has many implications, starting with "who creates the service?", but also "which user does the service run as?", and "how does the application contact the service?" etc. It seems a pretty heavy-weight solution which I don't feel is appropriate for the scope of my task.
A simpler approach is suggested by this SO answer: launch a guard process on application startup that is tightly coupled to the application process. This is pretty good, but it still leaves me with the tasks of 1) keeping track, somewhere in the application, of how to contact the guard process in case of a crash; and 2) making sure the guard process is terminated if the application process shuts down normally.
The simplest solution of all would be to launch the crash dump handler process at the time the crash occurs, passing all the information that is required to create the crash dump as arguments to the process. This information consists of
The process ID of the application process that crashed
The thread ID of the thread that crashed
The address of the EXCEPTION_POINTERS structure that describes the exception that caused the crash
This "fire and forget" approach is compelling because it does not require any state retention, nor any complicated over-time process management. In fact, the approach seems so overwhelmingly simple that I cannot help but feel that I am overlooking something.
What are the arguments against such an approach?
The main argument against the "fire and forget" approach, as I called it, is that it is not safe to launch a new process at a time when the application is already in a state where it is about to crash.
Because of that I went for the "guard process" approach. It brings a number of challenges with it, for which Hans Passant has outlined a solution.
I also added a bit of code in this answer that should help with deep-copying the all-important EXCEPTION_POINTERS data structure.
Using WER, as proposed in the comments, also looks like a good alternative to writing your own guard process. I must admit I have not investigated this any further, though.

In Windows 7, how to send a Ctrl-C or Ctrl-Break to a separate process

Our group has long running processes which run daily. The processes are typically started at 9pm on any given day and run until 7pm the next day. Thus they typically run 22hrs/day. They are started by scheduled tasks on servers under a particular generic user ID, and they start and run regardless of whether or not that user ID is logged on. Thus, they are windowless console executables.
The tasks orchestrate computations running on a large server farm. Generally these controlling tasks run uninterrupted for the full 22hrs/day. However, we often have a need to stop and restart these processes. Because they control a multitude of tasks running on our server farm, it is important that they be shut down cleanly, so that they can stop and shut down all the server farm processes. Which brings me to our problem.
The controlling process has been programmed to respond to ctrl-C and ctrl-break signals. This works fine when the process is manually started in a console where we have access to the console and can "type" ctrl-c or ctrl-break in the console window. However, as mentioned, the processes typically run as windowless scheduled tasks. Hence we cannot "type" anything into a non-existent console window. Because they are console processes that execute without a logon process, they also must be able to execute in a completely windowless environment. So, how do we set up the process to listen for a shut-down signal?
While the process does indeed listen for a ctrl-C and ctrl-break signal, I can see no way to send that signal to a process. This seems to be a fundamental problem in Windows, or am I wrong? I am aware of SendSignal.exe, but so far have been unable to get it to work. It fails as follows:
>SendSignal 26320
Sending signal to process 26320...
CreateRemoteThread failed with 0x00000005.
StartRemoteThread failed with 0x00000005.
0x00000005 == Access is denied.
Trying "taskkill" without -F results in:
>taskkill /PID 24840
ERROR: The process with PID 24840 could not be terminated.
Reason: This process can only be terminated forcefully (with /F option).
All other "kill" functions kill the process immediately rather than sending a signal.
One possible solution would be a file-watch based solution: create a watch for some modification of a specific file. But this is a hack and we would prefer to do it with appropriate signaling. Has anyone solved this issue? It seems to be so very basic a functionality, and it is certainly trivial to do it in a Unix environment. Surely Microsoft has provided SOME mechanism to allow clean shut down of a windowless executable?
I am aware of the thread below, whose question is virtually identical (save for the specification of why the answer is necessary, i.e. why one needs to be able to do this for a windowless, console-less process), but there is no answer there except for "use SendSignal", which, as I said, does not work for us:
Can I send a ctrl-C (SIGINT) to an application on Windows?
There are other similar questions, but no answers as yet.
Any help appreciated.
[Upgrading @Anon's comment to an answer for visibility]
windows-kill worked perfectly and managed to resolve access denial issues faced with SendSignal. A privileged user would have to run it as well of course.
windows-kill also supports both ctrl-c and ctrl-break signals.

Windows Kernel Driver Boot\winlogon complete callback

Can I get an event callback to my kernel driver when the boot process has completed, or when a user logs in?
The simple answer is no.
The long answer is yes, but why?
I'll answer the second part, because it's easier. You can easily register to receive a notification when any process is launched. A short examination of Windows Internals will tell you that from Vista and up, the process userinit.exe is the first process to be executed in any given user session.
To the first part, this very much changes depending on your definition of boot process. Is it when a GUI is loaded? Is it when the computer can receive network requests? Does it matter which network requests (TCP/IP, SMB, RPC)?
The answer to each of these is very different.
When win32K has finished loading
When the TCP/IP stack drivers finish loading
When specific services (RPC, Server service) are done loading
What is the problem you're trying to solve?

What process API do I need to hook to track services?

I need to track to a log when a service or application in Windows is started, stopped, and whether it exits successfully or with an error code.
I understand that many services do not log their own start and stop times, or if they exit correctly, so it seems the way to go would have to be inserting a hook into the API that will catch when services/applications request a process space and relinquish it.
My question is what function do I need to hook in order to accomplish this, and is it even possible? I need it to work on Windows XP and 7, both 64-bit.
I think your best bet is to use a device driver. See PsSetCreateProcessNotifyRoutine.
Windows Vista has NotifyServiceStatusChange(), but only for single services. On earlier versions, it's not possible other than polling for changes or watching the event log.
If you're looking for a user-space solution, EnumProcesses() will return a current list. But it won't signal you when something changes; you'd have to poll it continually and act on the differences.
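The "poll and act on the differences" idea can be sketched in Python as follows. Instead of calling EnumProcesses() directly, this example assumes the PID list is taken from the output of tasklist /FO CSV /NH, and the function names are made up for illustration. Note that a snapshot diff only tells you that a process appeared or disappeared; it cannot tell you the exit code.

```python
import csv
import io
from typing import Set, Tuple


def parse_tasklist_csv(text: str) -> Set[int]:
    """Extract PIDs from `tasklist /FO CSV /NH` output (PID is the 2nd column)."""
    pids = set()
    for row in csv.reader(io.StringIO(text)):
        if len(row) >= 2 and row[1].isdigit():
            pids.add(int(row[1]))
    return pids


def diff_snapshots(previous: Set[int], current: Set[int]) -> Tuple[Set[int], Set[int]]:
    """Return (started, exited) PIDs between two snapshots."""
    started = current - previous
    exited = previous - current
    return started, exited
```

A polling loop would take a snapshot, sleep, take another one, and log whatever diff_snapshots() reports. Processes that start and exit between two polls are missed, which is the main weakness compared with a driver-level notification.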
If you're watching for a specific application or set of applications, consider assigning them to Job Objects, which are all about allowing you to place limits on processes and manage them externally. I think you could even associate Explorer with a job object, then all tasks launched by the user would be associated with your job object automatically. Something to look into, perhaps.

Resources