I want the node to get rebooted if a critical service is stopped.
In foo.service, I can put "ExecStopPost=shutdown -r", however that
would cause reboot even if foo gets restarted.
foo is a critical service; the node cannot function without it.
What I want is: if someone runs "systemctl stop foo", reboot the node.
An alternative is to return an error if someone runs "systemctl stop foo",
by marking foo as a critical service.
Is there any way to accomplish either of the above in systemd?
I've got a service that needs to be restarted, but all attempts to kill it fail.
I have tried everything I've found online and nothing has worked.
The core issue seems to be that Services is holding onto the process and not allowing it to be killed:
ERROR: The process with PID 11204 (child process of PID 572) could not be terminated.
Reason: There is no running instance of the task.
This happens when I try to force-kill the task using taskkill /f /pid 11204 /t.
PID 572 is Services, so I cannot kill it without crashing Windows.
Interactive Services Detection also activates, but it just leads to a blank screen I can't exit (since the process is dead), and turning it off still doesn't let me kill the process.
I've found similar issues, but none involve the program being a child of Services, where the parent can't be killed.
Is a system restart the ONLY option here? This is a production server and so restarting has to be done only at scheduled downtime, so looking for other options.
Services should be controlled via the service APIs or the SC command-line tool. Try the SC stop command.
On a call to ControlService[Ex] with SERVICE_CONTROL_STOP, whether made explicitly from your software or from the SC tool, the service's Handler[Ex] should receive SERVICE_CONTROL_STOP. At this point the service should:
Stop all the threads it started and free the resources it allocated
If that takes long, call SetServiceStatus with SERVICE_STOP_PENDING before doing so
Call SetServiceStatus with SERVICE_STOPPED to inform the system that it is no longer running
Return from Handler[Ex]
If the service was the only service in its process, StartServiceCtrlDispatcher is likely to return shortly afterwards, and at that point the service process should exit. If there are other services in the process, StartServiceCtrlDispatcher will not return and the process should not exit, but the service being stopped is considered stopped anyway.
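The stop sequence can be sketched as a small, platform-independent simulation. The numeric constants are the Win32 values from winsvc.h; FakeScm, Service, and their methods are hypothetical stand-ins for the SCM, SetServiceStatus, and Handler[Ex], not a real API binding.

```python
import threading

# Win32 constant values (winsvc.h).
SERVICE_CONTROL_STOP = 0x1
SERVICE_STOPPED = 0x1
SERVICE_STOP_PENDING = 0x3
SERVICE_RUNNING = 0x4


class FakeScm:
    """Hypothetical stand-in for the SCM: records every reported status."""

    def __init__(self):
        self.reported = []

    def set_service_status(self, state):
        # Plays the role of SetServiceStatus in this sketch.
        self.reported.append(state)


class Service:
    """Hypothetical service that owns a single worker thread."""

    def __init__(self, scm):
        self.scm = scm
        self.stop_event = threading.Event()
        self.worker = threading.Thread(target=self.stop_event.wait, daemon=True)

    def start(self):
        self.worker.start()
        self.scm.set_service_status(SERVICE_RUNNING)

    def handler(self, control):
        """Plays the role of Handler[Ex]."""
        if control != SERVICE_CONTROL_STOP:
            return
        # Cleanup may take a while, so report STOP_PENDING first.
        self.scm.set_service_status(SERVICE_STOP_PENDING)
        # Stop the threads this service started and release its resources.
        self.stop_event.set()
        self.worker.join()
        # Tell the SCM the service is no longer running, then return.
        self.scm.set_service_status(SERVICE_STOPPED)


def demo():
    scm = FakeScm()
    svc = Service(scm)
    svc.start()
    svc.handler(SERVICE_CONTROL_STOP)
    return scm.reported, svc.worker.is_alive()
```

A service that never reaches the final set_service_status call is the kind that ends up stuck in a stopping state and cannot be stopped cleanly through the SCM.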
In a Windows service, my Service Control Handler receives a SERVICE_CONTROL_STOP command. I would like to determine the reason for this command; specifically, I need to know whether the STOP was requested because a depended-upon service ("master") is stopping or because of any other reason. The reason is, if my service stopped because the user requested a stop or because Windows is shutting down or any other similar reason, I don't need to do anything, but if my service is stopping because master is stopping, I need to make sure I restart my service when master restarts.
Unfortunately, I don't really see any source of this information - RegisterServiceCtrlHandlerEx will let me provide a handler which can get some details behind the control event, but there doesn't seem to be a notification that I can use. But maybe there's some other way, e.g. getting the info through the Session Manager or something.
In a Windows service, my Service Control Handler receives a SERVICE_CONTROL_STOP command. I would like to determine the reason for this command
Sorry, but the SCM does not provide that information to services.
specifically, I need to know whether the STOP was requested because a depended-upon service ("master") is stopping or because of any other reason.
There is no way for your service to determine that.
The reason is, if my service stopped because the user requested a stop or because Windows is shutting down or any other similar reason, I don't need to do anything
Detecting Windows shutting down is easy - your service can request to receive SERVICE_CONTROL_PRESHUTDOWN and SERVICE_CONTROL_SHUTDOWN events. For any other stop reason, it will only receive SERVICE_CONTROL_STOP with no explanation as to why.
if my service is stopping because master is stopping, I need to make sure I restart my service when master restarts.
There are two possible ways to handle that:
run a separate process that monitors the status of "master", either by regularly polling QueryServiceStatus() or by using NotifyServiceStatusChange(), and have it start your service when it detects "master" stop and restart.
if "master" logs events in the system log via an ETW provider, you can use ChangeServiceConfig2(SERVICE_CONFIG_TRIGGER_INFO) to register a trigger action that starts your service when a particular event is logged.
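The polling variant of the first option might look roughly like the sketch below. It contains only the decision logic: query_state and start_dependent are hypothetical callables that a real monitor would implement on top of QueryServiceStatus() and StartService(), and max_polls exists only so the example terminates.

```python
import time

# Win32 service state values (winsvc.h).
SERVICE_STOPPED = 0x1
SERVICE_START_PENDING = 0x2
SERVICE_STOP_PENDING = 0x3
SERVICE_RUNNING = 0x4


def watch_master(query_state, start_dependent, max_polls, interval=0.0):
    """Poll the master's state and start the dependent service each time
    the master is seen to leave RUNNING and later come back.

    Returns the number of times start_dependent was called.
    """
    seen_down = False
    restarts = 0
    for _ in range(max_polls):
        state = query_state()
        if state != SERVICE_RUNNING:
            # Master is stopping, stopped, or still starting.
            seen_down = True
        elif seen_down:
            # Master is running again after having been down.
            start_dependent()
            restarts += 1
            seen_down = False
        time.sleep(interval)
    return restarts


def demo():
    # Simulated history: running, stop pending, stopped, start pending, running.
    states = iter([SERVICE_RUNNING, SERVICE_RUNNING, SERVICE_STOP_PENDING,
                   SERVICE_STOPPED, SERVICE_START_PENDING,
                   SERVICE_RUNNING, SERVICE_RUNNING])
    started = []
    count = watch_master(lambda: next(states),
                         lambda: started.append("my-service"),
                         max_polls=7)
    return count, started
```

Polling can miss a stop and restart that both happen between two polls, which is one reason to prefer NotifyServiceStatusChange() when that matters.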
Unfortunately, I don't really see any source of this information - RegisterServiceCtrlHandlerEx will let me provide a handler which can get some details behind the control event, but there doesn't seem to be a notification that I can use.
Correct, because there isn't one.
I tried to kill a process from PowerShell with Stop-Service -Name (service name).
Sometimes the process exits properly, but sometimes, even though the service gets stopped, the background Java process for this app doesn't stop. Is there a way to stop the Java process from PowerShell if it hasn't exited? The problem is that we have to find the right Java process and kill only that one, as we have other Java processes running as well.
Stop-Service doesn't kill a process. It makes a request to the service control manager (SCM) to ask it to stop a service with a particular name. The SCM will then call into the process hosting the service and ask it to stop.
It's possible that the service won't shut down correctly when asked, and the SCM will time out the call to stop the service. This tends to lead to the service showing as stopped in the SCM while its process is still running in the background, which is what you are seeing.
If you want to explicitly kill the process hosting the service then you'll need to find a way to map the service name to a process id. This question may help you.
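One way to do that mapping, as a rough sketch, is to read the PID line that `sc queryex <name>` prints. The helper names and the sample output below are illustrative assumptions, and the parsing assumes the `PID : <number>` line layout that sc.exe prints.

```python
import re
import subprocess

# Illustrative sample of `sc queryex` output; the service name is made up.
SAMPLE = """
SERVICE_NAME: MyJavaApp
        TYPE               : 10  WIN32_OWN_PROCESS
        STATE              : 4  RUNNING
        WIN32_EXIT_CODE    : 0  (0x0)
        PID                : 4321
        FLAGS              :
"""


def parse_pid(sc_output):
    """Return the PID found in `sc queryex` output, or None if absent or 0."""
    match = re.search(r"^\s*PID\s*:\s*(\d+)\s*$", sc_output, re.MULTILINE)
    if match is None:
        return None
    pid = int(match.group(1))
    # The SCM reports PID 0 when no process is hosting the service.
    return pid or None


def service_pid(service_name):
    """Run `sc queryex` for the service (Windows only) and return its PID."""
    result = subprocess.run(["sc", "queryex", service_name],
                            capture_output=True, text=True, check=False)
    return parse_pid(result.stdout)
```

Because the SCM reports PID 0 once it considers the service stopped, the PID would need to be captured before calling Stop-Service; if that process is still alive after the stop times out, it can then be ended with taskkill /PID <pid> /F.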
The definition given in the man page for systemd.unit is a bit sparse:
https://www.freedesktop.org/software/systemd/man/systemd.unit.html
"If a unit foo.service contains a setting Before=bar.service and both units are being started, bar.service's start-up is delayed until foo.service is started up."
I couldn't find any conclusive explanation of what 'started up' means. Is this just the call from systemd to the service to start up? Or does systemd wait for the service to enter a specific state, after which it is considered to be up? Can I read details on how this works anywhere?
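Before a service becomes active, it is in the "activating" state. systemd waits until the service has fully entered the "active" state, and only then starts the units ordered after it. When a service counts as active depends on its Type= setting: for example, Type=simple is considered started as soon as the main process has been forked, while Type=notify waits for the service to send READY=1.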
I configure the recovery for Windows services to restart with a one minute delay after failures. But I have never gotten it to actually restart the service (even with the most blatant errors).
I do get a message in the EventViewer:
The description for Event ID ( 1 ) in Source ( MyApp.exe ) cannot be found. The local computer may not have the necessary registry information or message DLL files to display messages from a remote computer. You may be able to use the /AUXSOURCE= flag to retrieve this description; see Help and Support for details. The following information is part of the event: Access violation at address 00429874 in module 'MyApp.exe'. Write of address 00456704.
Is there something else I have to do? Is there something in my code (I use Delphi) which needs to be set to enable this?
Service Recovery is intended to handle the case where a service crashes - so if you go to taskmgr and right click "end process" on your service process, the recovery logic should kick in. I don't believe that the service recovery logic kicks in if your service exits gracefully (even if it exits with an error).
Also the eventvwr message indicates that your application called the ReportEvent API specifying event ID 1. But you haven't registered your event messages with the event viewer so it can't convert event ID 1 into a meaningful text string.
Service Recovery only works for an unexpected exit, such as an exit(-1) call.
None of the usual ways of stopping the service will trigger recovery.
If you want to stop the service and still want recovery to work, call exit(-1); you will see an error message such as "service stopped with unexpected error", and the service will then restart according to its recovery settings.
The Service Control Manager will attempt to restart your service if you've set it up to be restarted by the SCM. This is detailed here in the documentation for the SERVICE_FAILURE_ACTIONS structure.
"A service is considered failed when it terminates without reporting a status of SERVICE_STOPPED to the service controller."
This can be fine-tuned by setting the fFailureActionsOnNonCrashFailures member of the SERVICE_FAILURE_ACTIONS_FLAG structure. You can set it from the Services applet by checking the "Enable actions for stops with errors" checkbox on the Recovery tab.
If this member is TRUE and the service has configured failure actions, the failure actions are queued if the service process terminates without reporting a status of SERVICE_STOPPED or if it enters the SERVICE_STOPPED state but the dwWin32ExitCode member of the SERVICE_STATUS structure is not ERROR_SUCCESS (0).
If this member is FALSE and the service has configured failure actions, the failure actions are queued only if the service terminates without reporting a status of SERVICE_STOPPED.
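The two quoted rules reduce to a small decision function. The sketch below only restates that documented logic under the assumption that failure actions are configured; the function and parameter names are made up for illustration and are not part of any Windows API.

```python
ERROR_SUCCESS = 0


def failure_actions_queued(reported_stopped, win32_exit_code,
                           on_non_crash_failures):
    """Decide whether the SCM queues the configured failure actions.

    reported_stopped:       the service reported SERVICE_STOPPED before exiting
    win32_exit_code:        dwWin32ExitCode from the final SERVICE_STATUS
    on_non_crash_failures:  value of fFailureActionsOnNonCrashFailures
    """
    if not reported_stopped:
        # Terminating without reporting SERVICE_STOPPED always counts.
        return True
    # A clean stop report counts only if the flag is set and the
    # reported exit code is not ERROR_SUCCESS.
    return on_non_crash_failures and win32_exit_code != ERROR_SUCCESS
```

For example, failure_actions_queued(True, 1, False) is False: with the flag cleared, a service that reports SERVICE_STOPPED is not restarted even when it reports an error code.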
So, depending on how you have structured your service, how you have configured your failure actions AND what you do when you have your 'fatal error' it may be enough to call ExitProcess() or exit() and return a non zero value. However, it's probably safest to ensure that your service exits without the code that's dealing with the SCM telling the SCM that your service has reached the SERVICE_STOPPED state. This ensures that your failure actions ALWAYS happen...
If you 'kill' the service from Task Manager, forget about the recovery logic. Behind the scenes, Task Manager 'kills' the process by stopping the service, and as you can guess, that is not a service failure. This forced me to really kill it with Visual Studio: in Task Manager, right-click the service process and select Debug.
In Visual Studio, select Debug -> Terminate All.
Now you have simulated a service failure, and in this case the recovery logic works fine.