How to check if a process started in the background is still running? - go

It looks like if you create a subprocess via exec.Cmd and Start() it, the Cmd.Process field is populated right away; however, the Cmd.ProcessState field remains nil until the process exits.
// ProcessState contains information about an exited process,
// available after a call to Wait or Run.
ProcessState *os.ProcessState
So it looks like I can't actually check the status of a process I Start()ed while it's still running?
It makes no sense to me that ProcessState is set only when the process exits. There's a ProcessState.Exited() method, which will always return true in this case.
So I tried to go this route instead: the cmd.Process.Pid field exists right after I call cmd.Start(); however, it looks like os.Process doesn't expose any mechanism to check whether the process is running.
os.FindProcess says:
On Unix systems, FindProcess always succeeds and returns a Process for the given pid, regardless of whether the process exists.
which isn't useful, and it seems like there's no way to go from os.Process to an os.ProcessState unless you .Wait(), which defeats the whole purpose (I want to know if the process is running or not before it has exited).

I think you have two reasonable options here:
Spin off a goroutine that waits for the process to exit. When the wait is done, you know the process exited. (Positive: pretty easy to code correctly; negative: you dedicate an OS thread to waiting.)
Use syscall.Wait4() on the published Pid. A Wait4 with syscall.WNOHANG set returns immediately, filling in the status.
It might be nice if there were an exported os or cmd function that did the Wait4 for you and filled in the ProcessState. You could supply WNOHANG or not, as you see fit. But there isn't.
The point of ProcessState.Exited() is to distinguish between all the various possibilities, including:
process exited normally (with a status byte)
process died due to receiving an unhandled signal
See the stringer for ProcessState. Note that there are more possibilities than these two ... only there seems to be no way to get the others into a ProcessState. The only calls to syscall.Wait seem to be:
syscall/exec_unix.go: after a failed exec, to collect zombies before returning an error; and
os/exec_unix.go: after a call to p.blockUntilWaitable().
If it were not for the blockUntilWaitable, the exec_unix.go implementation variant for wait() could call syscall.Wait4 with syscall.WNOHANG, but blockUntilWaitable itself ensures that this is pointless (and the goal of this particular wait is to wait for exit anyway).

Related

LTTng/Perf: Difference between events used for exiting (sched_process_exit) and freeing (sched_process_free) a process

Currently, I'm getting into the topic of kernel tracing with LTTng and Perf. I'm especially interested to trace the different states a process is in.
I stumbled over the events sched_process_free and sched_process_exit. I'm wondering if my current understanding is correct:
When a process exits, sched_process_exit is written to the trace. However, the process descriptor might still be in memory, which leaves a zombie. When all the memory connected to the process is freed, sched_process_free is emitted. This would mean that if I really want to be sure the process is fully "terminated" and removed from memory, I have to listen for sched_process_free instead of sched_process_exit in the trace. Is this correct?
I found some time to edit my answer to make it clearer. If anything is still unclear, please tell me and we can discuss it. Let's dive into how a task ends:
There are two system calls, exit_group() and exit(), and both end up in do_exit(), which does the following:
set PF_EXITING, which marks the task as exiting
remove the task descriptor from the timer with del_timer_sync()
call exit_mm(), exit_sem(), __exit_fs() and others to release the structures of that task
call perf_event_exit_task(tsk);
decrease the ref count
set exit_code to the value passed to _exit()/exit_group(), or to an error code
call exit_notify(), which does the following:
    update relationships with parent and children
    check exit_signal, send SIGCHLD
    if the task is not traced or the return value is -1, set exit_state to EXIT_DEAD and call release_task() to recycle the remaining memory and decrease the ref count
    if the task is traced, set exit_state to EXIT_ZOMBIE
set the task flag to PF_DEAD
call schedule()
We need the zombie state because the parent may still need data kept in the task descriptor, so we cannot delete everything right away. The parent task needs to use something like wait() to check whether the child is dead. After wait(), the zombie is fully released by release_task(), which does the following:
decrease the owner's task count
if the task is traced, delete it from the ptrace_children list
call __exit_signal() to delete all pending signals and release the signal_struct descriptor, and exit_itimers() to delete all timers
call __exit_sighand() to delete the signal handlers
call __unhash_process(), which does the following:
    nr_threads--
    call detach_pid() to delete the task descriptor from PIDTYPE_PID and PIDTYPE_TGID
    call REMOVE_LINKS to delete the task from the list
call sched_exit() to adjust the parent's time slice
call put_task_struct() to decrease the counter, and release the memory and task descriptor
call delayed_put_task_struct()
So sched_process_exit is emitted in do_exit(), but at that point we cannot tell whether the process has been released yet: release_task(), which eventually triggers sched_process_free, may or may not have been called. That is why both perf event points are needed.

Python3 How to gracefully shutdown a multiprocess application

I am trying to fix a Python 3 application in which multiple processes and threads are created and controlled by various queues and pipes. I am trying to implement a controlled exit when someone breaks the program with ctrl-c. However, no matter what I do, it always hangs right at the end.
I've tried using the KeyboardInterrupt exception and signal handlers.
The code below is part of the multi-process code.
import signal
import sys
from multiprocessing import Process, Pipe, JoinableQueue as Queue, Event

class TaskExecutor(Process):
    def __init__(....)
        {inits}

    def signal_handler(self, sig, frame):
        print('TaskExecutor closing')
        self._in_p.close()
        sys.exit(1)

    def run(self):
        signal.signal(signal.SIGINT, self.signal_handler)
        signal.signal(signal.SIGTERM, self.signal_handler)
        while True:
            # Get the Task Group name from the Task queue.
            try:
                ExecCmd = self._in_p.recv()  # type: TaskExecCmd
            except Exception:
                self._in_p.close()
                return
            if ExecCmd.Kill:
                self._log.info('{:30} : Kill Command received'.format(self.name))
                self._in_p.close()
                return
            else:
                {other code executing here}
I do get the print above saying that it's closing,
but I'm still getting a lot of different exceptions, which I try to catch without success.
I am looking for documentation on how, and in which order, to shut down the multiprocesses and the main process.
I know it's a very general question; however, it's a very large application, so if there are any questions or things I could test, I can narrow it down.
Regards
So after investigating this issue further, I found that in a situation where I had a pipe thread, a Queue thread and 4 multiprocesses running, a number of these processes could end up hanging when terminating the application with ctrl-c. The pipe and queue had already been shut down.
In the multiprocessing documentation there is a warning on Process.terminate():
Warning If this method is used when the associated process is using a
pipe or queue then the pipe or queue is liable to become corrupted and
may become unusable by other process. Similarly, if the process has
acquired a lock or semaphore etc. then terminating it is liable to
cause other processes to deadlock.
And I think this is what's happening.
I also found that even though I have a shutdown mechanism in my multi-process class, the threads still running would of course be considered alive (according to is_alive()) even though I know that the run() method had returned, i.e. something internal was hanging.
Now for the solution. By design my multiprocesses were not daemons, because I wanted to control their shutdown. However, I changed them to daemons so they would always be killed regardless. I first made any kill signal raise a ProgramKilled exception throughout my entire program.
def signal_handler(signum, frame):
    raise ProgramKilled('Task Executor killed')
I then changed my shut down mechanism in my multi process class to
while True:
    # Get the Task Group name from the Task queue.
    try:
        # Reading from pipe
        ExecCmd = self._in_p.recv()  # type: TaskExecCmd
    # If fatal error just close it all
    except BrokenPipeError:
        break
    # This can occur; close the pipe and break the loop
    except EOFError:
        self._in_p.close()
        break
    # Exception for when a kill signal is detected
    # Set the multiprocess as killed (just waiting for the kill command from main)
    except ProgramKilled:
        self._log.info('{:30} : Died'.format(self.name))
        self._KilledStatus = True
        continue
    # Kill command from main received
    # Shut down all we can. Ignore exceptions
    if ExecCmd.Kill:
        self._log.info('{:30} : Kill Command received'.format(self.name))
        try:
            self._in_p.close()
            self._out_p.join()
        except Exception:
            pass
        self._log.info('{:30} : Kill Command executed'.format(self.name))
        break
    elif not self._KilledStatus:
        {Execute code}
# When out of the loop set killed event
KilledEvent.set()
And in my main thread I have added the following cleanup process.
import psutil

# Loop through all my resources
for ThreadInterfaces in ResourceThreadDict.values():
    # Test each process in each resource
    for ThreadIf in ThreadInterfaces:
        # Wait for its event to be set
        ThreadIf['KillEvent'].wait()
        # When the event has been received, see if it's hanging.
        # We know at this point everything has been closed and all data has been purged correctly, so if it's still alive, terminate it.
        if ThreadIf['Thread'].is_alive():
            try:
                psutil.Process(ThreadIf['Thread'].pid).terminate()
            except (psutil.NoSuchProcess, AttributeError):
                pass
After a lot of testing I know it's really hard to control the termination of an app with multiple processes, because you simply do not know in which order your processes receive the signal.
I've tried in some way to save most of my data when it's killed. Some would ask what I need that data for when manually terminating the app. But in this case the app runs a lot of external scripts and other applications, any of which can lock the application; then you need to kill it manually but still retain the information about what has already been executed.
So this is my solution to my current problem with my current knowledge.
Any input or more in-depth knowledge on what's happening is welcome.
Please note that this app runs on both Linux and Windows.
Regards
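The idea of turning a kill signal into a ProgramKilled exception can be tried in isolation with a minimal single-process sketch. The helper name run_until_killed and its work/cleanup parameters are made up for this illustration and are not part of the application above; the point is only that the exception unwinds the loop and the finally block still runs the cleanup.

```python
import signal


class ProgramKilled(Exception):
    """Raised from the signal handler to unwind the main loop."""


def _raise_program_killed(signum, frame):
    raise ProgramKilled(f"received signal {signum}")


def run_until_killed(work, cleanup):
    """Call work() repeatedly until SIGINT/SIGTERM arrives, then cleanup().

    Hypothetical helper: returns True once a signal has stopped the loop.
    """
    # signal.signal() may only be called from the main thread.
    old_int = signal.signal(signal.SIGINT, _raise_program_killed)
    old_term = signal.signal(signal.SIGTERM, _raise_program_killed)
    try:
        while True:
            work()
    except ProgramKilled:
        return True
    finally:
        # Runs whether the loop ended by signal or by another exception.
        cleanup()
        signal.signal(signal.SIGINT, old_int)
        signal.signal(signal.SIGTERM, old_term)
```

In a multi-process program each worker would install the handler at the top of its own run(), since handlers set in the parent are not a substitute for per-process shutdown logic.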

Trying to implement `signal.CTRL_C_EVENT` in Python3.6

I'm reading about signals and am attempting to implement signal.CTRL_C_EVENT
From what I understand, if the user presses Ctrl+C while the program is running, a signal will be sent to kill the program. Can I specify the program as a parameter?
My attempt to test out the usage:
import sys
import signal
import time
import os
os.kill('python.exe', signal.CTRL_C_EVENT)
while(1):
    print ("Wait...")
    time.sleep(10)
However, it seems I need a PID number, and 'python.exe' doesn't work. I looked under Processes and couldn't find a PID number. I did see a PID column under Services, but there were so many services that I couldn't find a Python one.
So how do I find the PID number?
Also, does signal.CTRL_C_EVENT always have to be used within os.kill?
Can it be used for other purposes?
Thank you.
Windows doesn't implement Unix signals, so Python fakes os.kill. Unfortunately its implementation is confusing. It should have been split up into os.kill and os.killpg, but we're stuck with an implementation that mixes the two. To send Ctrl+C or Ctrl+Break, you need to use os.kill as if it were really os.killpg.
When its signal argument is either CTRL_C_EVENT (0) or CTRL_BREAK_EVENT (1), os.kill calls WinAPI GenerateConsoleCtrlEvent. This instructs the console (i.e. the conhost.exe instance that's hosting the console window of the current process) to send the event to a given process group ID (PGID). Group ID 0 is special cased to broadcast the event to all processes attached to the console. Otherwise, a process group ID is the ID of the lead process in a process group. Every process is either created as the leader of a new group or inherits the group of its parent. A new group can be created via the CreateProcess creation flag CREATE_NEW_PROCESS_GROUP.
If either calling GenerateConsoleCtrlEvent fails (e.g. the current process isn't attached to a console) or the signal argument isn't one of the above-mentioned control events, then os.kill instead attempts to open a handle for the given process ID (PID) with terminate access and call WinAPI TerminateProcess. This function is like sending a SIGKILL signal in Unix, but with a variable exit code. Note the confusion in that it operates on an individual process (i.e. kill), not a process group (i.e. killpg).
Windows doesn't provide a function to get the group ID of a process, so generally the only way to get a valid PGID is to create the process yourself. You can pass the CREATE_NEW_PROCESS_GROUP flag to subprocess.Popen via its creationflags parameter. Then you can send Ctrl+Break to the child process and all of its children that are in the same group, but only if it's a console process that's attached to the same console as your current process, i.e. it won't work if you also use any of these flags: CREATE_NEW_CONSOLE, CREATE_NO_WINDOW, or DETACHED_PROCESS. Also, Ctrl+C is disabled in such a process, unless the child manually enables it via WinAPI SetConsoleCtrlHandler.
Only use os.kill(os.getpid(), signal.CTRL_C_EVENT) when you know for certain that your current process was started as the lead process of a group. Otherwise the behavior is undefined, and in practice it works like sending to process group ID 0.
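A minimal sketch of the create-it-yourself approach, assuming the child is a console program sharing your console. The function names spawn_in_new_group and interrupt are hypothetical, and the non-Windows branch is only a stand-in added so the sketch runs on other platforms; it is not part of the behaviour described above.

```python
import signal
import subprocess
import sys


def spawn_in_new_group(args):
    """Start a child as the leader of its own process group."""
    if sys.platform == "win32":
        # The child's PID then doubles as the group ID os.kill() expects.
        return subprocess.Popen(
            args, creationflags=subprocess.CREATE_NEW_PROCESS_GROUP
        )
    # POSIX stand-in (an assumption for portability of this sketch).
    return subprocess.Popen(args, start_new_session=True)


def interrupt(proc, timeout=5.0):
    """Ask the child to stop; kill it if it has not exited in time."""
    if sys.platform == "win32":
        # Ctrl+C is disabled in a new group, so send Ctrl+Break instead.
        proc.send_signal(signal.CTRL_BREAK_EVENT)
    else:
        proc.send_signal(signal.SIGINT)
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()
        return proc.wait()
```

For example, child = spawn_in_new_group([sys.executable, "-c", "import time; time.sleep(60)"]) followed by interrupt(child) returns the child's exit status without touching the parent's own console group.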
You can get the PID via os.getpid():
os.kill(os.getpid(), signal.CTRL_C_EVENT)

Communicating between Ruby processes, loops

I have a Ruby application which must run 24/7 to process information for a web API, both of which are operating on Google Compute Engine on a Debian instance; the API is served by Sinatra. When I run this script in a loop, it uses up the 1-core vCPU. Using a message queuing system like RabbitMQ to pass messages from the API to the backend script seems to me to skip a learning opportunity for communicating between Ruby scripts natively.
How do I keep a script dormant, i.e. awaiting instruction but not consuming memory or 99% CPU? I'm assuming it's not going to be in an infinite loop, but I'm stumped on this.
How would it be best to communicate this message from one script to another? I read about Kernel#select and forking of subprocesses, but I haven't encountered any definitive or comprehensible solution.
Forking may indeed be a good solution for you, and you only need to understand three system calls to make good use of it: fork(), waitpid() and exec(). I'm not a Ruby guy, so hopefully my C-like explanation will make enough sense for you to fill in the blanks.
The way fork() works is by the operating system making a byte-for-byte copy of the calling process' virtual memory space as it was when fork() was called and carving out new memory to place the copy into. This creates a new process with its parent's exact state, except that the child process' fork() call returns 0, while the parent's returns the PID of the new child process. This allows the child process to know that it is a child, and the parent process to know who its children are.
While fork() copies its caller's process image, the exec() system call replaces its caller's process image with a brand new one, as specified by its arguments.
The waitpid() system call is used by the parent process to wait for a return value from a specific child process (one whose process ID was returned to the parent by the fork() call), and then properly log the process' completion with the OS. Even if you don't need your child process' return value, you should call waitpid() on it anyway so you don't end up accumulating "zombie processes."
Again, I'm not a Ruby guy, so hopefully my C-like pseudocode makes sense. Consider the following server:
while(1) { # an infinite loop
    # Wait for and accept connections from your web API.
    pid = fork(); # fork() returns a process ID number
    # If fork() returns a negative number, something went wrong.
    if(pid < 0) {
        exit(1);
    }
    # If fork() returns 0, this is the child process.
    else if(pid == 0) {
        # Remember that because fork() copies your program's state,
        # you can use variables you assigned before the fork to
        # send to the new process as arguments.
        exec("./processingscript.rb", "processingscript.rb", arg1, arg2, arg3, ...);
    }
    # If fork() returns a number greater than 0 (the PID of the forked
    # child process), this is the parent process.
    else if(pid > 0) {
        childreturnvalue = waitpid(pid); # parent process hangs here until
                                         # the process with the ID number
                                         # pid returns.
    }
}
Written this way, your CPU-intensive script only runs when a connection is received from the web API. It does its processing and then terminates, waiting to be called again. You can also specify "no hang" options for waitpid() so that you can fork multiple instances of your processing script concurrently without having your server hang every time it needs to wait for an instance of that script to complete.
Hope this helps! Perhaps somebody who knows Ruby can edit this to be a bit more idiomatic to the language.
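Until someone supplies a Ruby version, here is the same fork/exec/waitpid pattern as a runnable sketch in Python rather than Ruby, including the "no hang" option. The function name run_child and the poll_interval parameter are invented for this illustration, and it only works on POSIX systems where os.fork() exists.

```python
import os
import sys
import time


def run_child(argv, poll_interval=0.05):
    """Fork, exec argv in the child, and poll for its exit in the parent.

    Returns the child's exit code, or the negated signal number if a
    signal killed it. Hypothetical helper; POSIX only.
    """
    pid = os.fork()
    if pid == 0:
        # Child: replace this process image with the requested program.
        try:
            os.execvp(argv[0], argv)
        finally:
            # Reached only if exec failed; never run parent code here.
            os._exit(127)
    # Parent: with WNOHANG, waitpid() returns (0, 0) while the child runs.
    while True:
        done_pid, status = os.waitpid(pid, os.WNOHANG)
        if done_pid == pid:
            break
        # The parent could serve other requests here instead of sleeping.
        time.sleep(poll_interval)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -os.WTERMSIG(status)
```

Calling run_child([sys.executable, "-c", "import sys; sys.exit(3)"]) runs the one-liner in a fresh process and hands back its exit code; the successful waitpid() call is also what prevents a zombie from being left behind.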

CreateProcess returns non 0 but GetExitCodeProcess() returns 128

I am creating an application that will start another process using CreateProcess(). In the parent process I use GetExitCodeProcess() to check whether the process is active or not.
Here CreateProcess() is successful (it returned a nonzero value) but GetExitCodeProcess() returns 128 (There are no child processes to wait for). I am not seeing any trace of the child process having started (usually there is some debug output). It happens intermittently.
Any idea what really happened to the child process? Where can we get more information (in the system/application event logs)?
Please guide me.
Thanks,
Naga
Thanks for your comments.
I have found the following MSDN articles that give the same symptoms and a resolution for the problem.
Cmd.exe, Perl.exe, or other console-mode applications may fail to initialize properly and terminate prematurely when launched by a service using the CreateProcess() or CreateProcessAsUser() APIs. The calling process has no way of knowing that the launched console-mode application has terminated prematurely.
In some instances, calling GetExitCode() against the failed process indicates the following exit code:
128L ERROR_WAIT_NO_CHILDREN - There are no child processes to wait for.
http://support.microsoft.com/kb/156484
http://support.microsoft.com/kb/142676/EN-US
http://support.microsoft.com/kb/175687/EN-US
Thanks,
Naga
