Communicating between Ruby processes, loops - ruby

I have a Ruby application which must run 24/7 to process information for a web API. Both run on a Debian instance on Google Compute Engine, and the API is served by Sinatra. When I run this script in a loop, it uses up the 1-core vCPU. Using a message-queuing system like RabbitMQ to pass messages from the API to the backend script seems to me to skip a learning opportunity: communicating between Ruby scripts natively.
How do I keep a script dormant, i.e. awaiting instruction but not consuming 99% of the CPU? I'm assuming the answer is not an infinite loop, but I'm stumped on this.
And how would it be best to communicate this message from one script to another? I've read about Kernel#select and forking subprocesses, but I haven't encountered any definitive or comprehensible solution.

Forking may indeed be a good solution for you, and you only need to understand three system calls to make good use of it: fork(), waitpid() and exec(). I'm not a Ruby guy, so hopefully my C-like explanation will make enough sense for you to fill in the blanks.
The way fork() works is that the operating system makes a byte-for-byte copy of the calling process's virtual memory space as it was when fork() was called, and carves out new memory to hold the copy. This creates a new process with its parent's exact state, except that the child's fork() call returns 0 while the parent's returns the PID of the new child process. This allows the child process to know that it is a child, and the parent process to know who its children are.
While fork() copies its caller's process image, the exec() system call replaces its caller's process image with a brand new one, as specified by its arguments.
The waitpid() system call is used by the parent process to wait for a return value from a specific child process (one whose process ID was returned to the parent by the fork() call), and then properly log the process' completion with the OS. Even if you don't need your child process' return value, you should call waitpid() on it anyway so you don't end up accumulating "zombie processes."
Again, I'm not a Ruby guy, so hopefully my C-like pseudocode makes sense. Consider the following server:
while(1) {  # an infinite loop
    # Wait for and accept connections from your web API.

    pid = fork();  # fork() returns a process ID number

    # If fork() returns a negative number, something went wrong.
    if(pid < 0) {
        exit(1);
    }
    # If fork() returns 0, this is the child process.
    else if(pid == 0) {
        # Remember that because fork() copies your program's state,
        # you can use variables you assigned before the fork to
        # send to the new process as arguments.
        exec("./processingscript.rb", "processingscript.rb", arg1, arg2, arg3, ...);
    }
    # If fork() returns a number greater than 0 (the PID of the forked
    # child process), this is the parent process.
    else if(pid > 0) {
        childreturnvalue = waitpid(pid);  # parent process hangs here until
                                          # the process with ID number
                                          # pid returns.
    }
}
Written this way, your CPU-intensive script only runs when a connection is received from the web API. It does its processing, then terminates and waits to be called again. You can also pass "no hang" options to waitpid() so that you can fork multiple instances of your processing script concurrently without having your server hang every time it needs to wait for an instance of that script to complete.
Hope this helps! Perhaps somebody who knows Ruby can edit this to be a bit more idiomatic to the language.
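Since the answer invites a more idiomatic Ruby version, here is a minimal sketch of the same fork/exec/waitpid pattern using Ruby's Process module. The `ruby -e` worker command is a placeholder standing in for processingscript.rb:

```ruby
# Minimal Ruby sketch of the fork/exec/waitpid server body above.
# The "ruby -e" worker is a placeholder for processingscript.rb.
pid = Process.fork
if pid.nil?
  # Child: fork returned nil here, so replace this process image with
  # the worker, passing any arguments computed before the fork.
  exec("ruby", "-e", "exit 7")
else
  # Parent: reap the child so it doesn't linger as a zombie.
  Process.waitpid(pid)
  puts "worker exited with status #{$?.exitstatus}"
end
```

Wrapped in a `loop do ... end` that blocks on an incoming connection, this gives the dormant-until-called behaviour the question asks about.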

Related

How to check if a process started in the background is still running?

It looks like if you create a subprocess via exec.Cmd and Start() it, the Cmd.Process field is populated right away, however Cmd.ProcessState field remains nil until the process exits.
// ProcessState contains information about an exited process,
// available after a call to Wait or Run.
ProcessState *os.ProcessState
So it looks like I can't actually check the status of a process I Start()ed while it's still running?
It makes no sense to me that ProcessState is only set when the process exits: the ProcessState.Exited() method will always return true in this case.
So I tried to go this route instead: cmd.Process.Pid field exists right after I cmd.Start(), however it looks like os.Process doesn't expose any mechanisms to check if the process is running.
os.FindProcess says:
On Unix systems, FindProcess always succeeds and returns a Process for the given pid, regardless of whether the process exists.
which isn't useful, and it seems there's no way to get from an os.Process to an os.ProcessState unless you .Wait(), which defeats the whole purpose (I want to know whether the process is running before it has exited).
I think you have two reasonable options here:
Spin off a goroutine that waits for the process to exit. When the wait is done, you know the process exited. (Positive: pretty easy to code correctly; negative: you dedicate an OS thread to waiting.)
Use syscall.Wait4() on the published Pid. A Wait4 with syscall.WNOHANG set returns immediately, filling in the status.
It might be nice if there were an exported os or cmd function that did the Wait4 for you and filled in the ProcessState. You could supply WNOHANG or not, as you see fit. But there isn't.
The point of ProcessState.Exited() is to distinguish between all the various possibilities, including:
process exited normally (with a status byte)
process died due to receiving an unhandled signal
See the stringer for ProcessState. Note that there are more possibilities than these two ... only there seems to be no way to get the others into a ProcessState. The only calls to syscall.Wait seem to be:
syscall/exec_unix.go: after a failed exec, to collect zombies before returning an error; and
os/exec_unix.go: after a call to p.blockUntilWaitable().
If it were not for the blockUntilWaitable, the exec_unix.go implementation variant for wait() could call syscall.Wait4 with syscall.WNOHANG, but blockUntilWaitable itself ensures that this is pointless (and the goal of this particular wait is to wait for exit anyway).

MPI: Is there a way to receive variables only when it has changed?

I'm working on a project where I need to implement some sort of termination detection via a variable which changes only in the root process of an MPI program.
I am struggling to understand the concepts of blocking and non-blocking instructions.
In short, only the root process can determine if the task has been completed or not. This is done by implementing a simple Boolean integer variable called "running". This has to be broadcasted to all processes in order for them to know when to exit their while-loops.
All processes run in their own while-loop. At the start, the root process sets the "running" variable to true if necessary.
The root process can then determine if the "running" variable should be set to zero and should broadcast it to all other processes.
Currently, I am using a broadcast to share this variable. Thus, whenever the loop reaches its end (or "running" gets set to zero) it broadcasts the value to all processes. Thus, each process has a broadcast inside of their function to receive the value.
Either I am misunderstanding the concept of blocking or my program is not efficient.
MPI_Bcast is blocking: if the root keeps broadcasting a variable that essentially stays the same (TRUE) for the majority of the running time, each process has to wait for the root to complete its work and then block on the broadcast before it can continue with its own work.
The problem is that since this variable changes only once in the root process, many unnecessary blocking broadcasts happen while running. I only want the variable to be broadcast once it has changed to zero, so that I can tell the other processes to terminate part of their code without waiting for the root to broadcast on every iteration.
if (myRank != 0) {
    while (running) {
        doThisFunction(myRank);
        MPI_Bcast(... running ...);  // Wait for root to broadcast?
    }
    /* Start doing something else */
} else {
    while (running || ...) {
        /* Do stuff */
        if (...) {
            running = 0;  // Somewhere in an if statement
            MPI_Bcast(... running ...);  // Now terminate the while
        }
        MPI_Bcast(... running ...);  // Unnecessary broadcast?
    }
}
I was thinking that I could use MPI_Iprobe to check if there's a message to be received, and then remove the MPI_Bcast from the root's while-loop. If there is one, the process will initiate an MPI_Bcast; if not, it will continue as normal.
TL;DR:
My program terminates some code in the processes when "running" equals zero. Currently it broadcasts this on every while iteration, which I think causes unnecessary blocking. I only want to send/receive the variable when it changes in the root.
Thanks for the help!
edit: "running" is a global variable.
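One way to get the "only notify on change" behaviour while staying with collectives is MPI-3's non-blocking broadcast: each worker posts a single MPI_Ibcast up front and polls it with MPI_Test inside its work loop, while the root posts its matching MPI_Ibcast only once it decides to terminate. A hedged sketch, where doThisFunction and task_finished stand in for the asker's work and termination condition:

```c
#include <mpi.h>

/* Sketch (requires MPI-3): the broadcast is posted once, not on every
 * iteration, so workers never block on it while the root is busy. */
void run(int myRank)
{
    int running = 1;
    MPI_Request req;

    if (myRank != 0) {
        /* Post the one-and-only broadcast of the termination flag. */
        MPI_Ibcast(&running, 1, MPI_INT, 0, MPI_COMM_WORLD, &req);
        int done = 0;
        while (!done) {
            doThisFunction(myRank);                    /* real work */
            MPI_Test(&req, &done, MPI_STATUS_IGNORE);  /* non-blocking poll */
        }
        /* running is now 0; start doing something else */
    } else {
        while (running) {
            /* Do stuff */
            if (task_finished())        /* placeholder condition */
                running = 0;
        }
        MPI_Ibcast(&running, 1, MPI_INT, 0, MPI_COMM_WORLD, &req);
        MPI_Wait(&req, MPI_STATUS_IGNORE);
    }
}
```

Unlike MPI_Iprobe, which only sees point-to-point messages, this keeps the collective semantics while making the wait non-blocking.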

Trying to implement `signal.CTRL_C_EVENT` in Python 3.6

I'm reading about signals and am attempting to implement signal.CTRL_C_EVENT.
From what I'm understanding, if the user presses Ctrl + C while the program is running, a signal will be sent to kill a program. Can I specify the program as a parameter?
My attempt to test out the usage:
import sys
import signal
import time
import os

os.kill('python.exe', signal.CTRL_C_EVENT)
while(1):
    print("Wait...")
    time.sleep(10)
However, it seems I need a PID number, and 'python.exe' doesn't work. I looked under Processes and can't seem to find a PID number. I did see a PID column under Services, but there were so many services that I couldn't find a Python one.
So how do I find the PID number?
Also, does signal.CTRL_C_EVENT always have to be used within os.kill?
Can It be used for other purposes?
Thank you.
Windows doesn't implement Unix signals, so Python fakes os.kill. Unfortunately its implementation is confusing. It should have been split up into os.kill and os.killpg, but we're stuck with an implementation that mixes the two. To send Ctrl+C or Ctrl+Break, you need to use os.kill as if it were really os.killpg.
When its signal argument is either CTRL_C_EVENT (0) or CTRL_BREAK_EVENT (1), os.kill calls WinAPI GenerateConsoleCtrlEvent. This instructs the console (i.e. the conhost.exe instance that's hosting the console window of the current process) to send the event to a given process group ID (PGID). Group ID 0 is special cased to broadcast the event to all processes attached to the console. Otherwise
a process group ID is the ID of the lead process in a process group. Every process is either created as the leader of a new group or inherits the group of its parent. A new group can be created via the CreateProcess creation flag CREATE_NEW_PROCESS_GROUP.
If either calling GenerateConsoleCtrlEvent fails (e.g. the current process isn't attached to a console) or the signal argument isn't one of the above-mentioned control events, then os.kill instead attempts to open a handle for the given process ID (PID) with terminate access and call WinAPI TerminateProcess. This function is like sending a SIGKILL signal in Unix, but with a variable exit code. Note the confusion in that it operates on an individual process (i.e. kill), not a process group (i.e. killpg).
Windows doesn't provide a function to get the group ID of a process, so generally the only way to get a valid PGID is to create the process yourself. You can pass the CREATE_NEW_PROCESS_GROUP flag to subprocess.Popen via its creationflags parameter. Then you can send Ctrl+Break to the child process and all of its children that are in the same group, but only if it's a console process attached to the same console as your current process, i.e. it won't work if you use any of these flags: CREATE_NEW_CONSOLE, CREATE_NO_WINDOW, or DETACHED_PROCESS. Also, Ctrl+C is disabled in such a process unless the child manually enables it via WinAPI SetConsoleCtrlHandler.
Only use os.kill(os.getpid(), signal.CTRL_C_EVENT) when you know for certain that your current process was started as the lead process of a group. Otherwise the behavior is undefined, and in practice it works like sending to process group ID 0.
You can get the pid via os.getpid():
os.kill(os.getpid(), signal.CTRL_C_EVENT)
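Putting the answer's advice together, here is a Windows-only sketch; the sleeping child command is a placeholder for the real program:

```python
import os
import signal
import subprocess
import sys
import time

# Sketch, Windows-only: start the child as the leader of a new process
# group, then send Ctrl+Break to that group via os.kill. Ctrl+C is
# disabled in such a child, so CTRL_BREAK_EVENT is used instead.
if sys.platform == "win32":
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(30)"],
        creationflags=subprocess.CREATE_NEW_PROCESS_GROUP,
    )
    time.sleep(1)  # give the child a moment to attach to the console
    # child.pid doubles as the group ID thanks to CREATE_NEW_PROCESS_GROUP.
    os.kill(child.pid, signal.CTRL_BREAK_EVENT)
    child.wait()
```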

How can I get the PID of a new process before it executes?

So that I can do some injecting and interposing using the inject_and_interpose code, I need a way to get the PID of a newly-launched process (a typical closed-source user application) before it actually executes.
To be clear, I need to do better than just "notice it quickly"--I can't be polling, or receiving some asynchronous notification that means that the process has already been executing for a few milliseconds by the time I take action.
I need to have a chance to do my injecting and interposing before a single statement executes.
I'm open to writing a background process that gets synchronously notified when a process by a particular name comes into existence. I'm also open to writing a launcher application that in turn fires up the target application.
Any solution needs to support 64-bit code, at a minimum, under 10.5 (Leopard) through 10.8 (Mountain Lion).
In case this proves to be painfully simple, I'll go ahead and admit that I'm new to OS X :) Thanks!
I know how to do this on Linux, so maybe it would be the same(-ish) on OSX.
You first call fork() to duplicate your process. The return value of fork() indicates whether you are the parent or child. The parent gets the pid of the child process, and the child gets zero.
So then, the child calls exec() to actually begin executing the new executable. Using a pipe created before the call to fork, the child could wait on the parent to do whatever it needs before execing the new executable.
pid_t pid = fork();
if (pid == -1) {
    perror("fork");
    exit(1);
}
if (pid > 0) {
    // I am the parent, and pid is the PID of the child process.
    // TODO: If desired, somehow notify the child to proceed with exec
}
else {
    // I am the child.
    // TODO: If desired, wait on notification from the parent to continue
    execl("path/to/executable", "executable", "arg1", NULL);
    // Should never get here.
    fprintf(stderr, "ERROR: execl failed!\n");
}

what happens at the lower levels after a fork system call?

I know what fork() does at the higher level. What I'd like to know is this:
As soon as there is a fork call, a trap instruction follows and control jumps to execute the fork "handler". Now, how does this handler, which creates the child process by duplicating the parent (creating another address space and process control block), return two values, one to each process?
At what point of execution does fork return two values?
In short, can anybody please explain the step-by-step events that take place at the lower level after a fork call?
It's not so hard, really: the kernel half of the fork() syscall can tell the difference between the two processes via the Process Control Block you mentioned, but it doesn't even need to do that. The pseudocode looks like this:
int fork()
{
    int orig_pid = getpid();
    int new_pid = kernel_do_fork();  // Now there are two processes

    // Remember, orig_pid is the same in both procs
    if (orig_pid == getpid()) {
        return new_pid;
    }

    // Must be the child
    return 0;
}
Edit:
The naive version does just as you describe: it creates a new process context, copies all of the associated thread contexts, copies all of the pages and file mappings, and puts the new process on the "ready to run" list.
I think the part you're confused about is that when these processes resume (i.e. when the parent returns from kernel_do_fork and the child is scheduled for the first time), each starts in the middle of the function (i.e. executing that first 'if'). It's an exact copy: both processes execute the second half of the function.
The value returned to each process is different: the parent/original thread gets the PID of the child process, and the child process gets 0.
The Linux kernel achieves this on x86 by changing the value in the eax register as it copies the current thread in the parent process.
