Automatically re-spawn a Ruby script from within Node.js when it fails

I have a Node.js app which when started spawns a Ruby script to connect to a streaming data service and captures the output via STDOUT which is then served to the client via websocket.
Every now and again the Ruby script will fail (normally due to a disconnect from the far end), and while the Node script will carry on running, it's obviously not aware that the spawned Ruby script has died.
Is there any way I can automate recovery of the spawned Ruby script from within Node or Ruby where I don't have to restart the entire Node instance (thus not booting the clients off) and the script will re-spawn attached to the correct instance of Node?
The script is spawned using the following:
var cp = require('child_process');
var tail = cp.spawn('/var/www/html/mapper/test/feed1-db.rb');
tail.stdout.on('data', function(chunk) {
  // <more stuff here where data is split and emitted from the socket>
});

I've finally had more time to look into this and have decided that it's probably a very bad idea to be automatically re-spawning failed scripts! (More on that later)
I have found that I can catch the exit of the child process by using the following:
tail.on('exit', function (code) {
  console.log('child process exited with code ' + code);
});
This gives me the exit code of the child script.
I also found out that I can catch anything the child writes to stderr using:
tail.stderr.on('data', (data) => {
  console.error(`child stderr:\n${data}`);
});
Both of these output their errors to the console, meaning you can still back-trace any issues. I've also expanded the error detection code to send a failure notice to connected clients over the web socket.
Now on to why I decided that auto re-spawning the script was a bad idea...
Up to now most of my underlying issues were caused upstream, where I may get some invalid data which would choke my script (I know I should handle that elsewhere but I'm kinda new to this!), or were fat-fingered problems caused by me!
Without lots of work, if the script died due to some invalid data from upstream, it would simply try to reconnect and consume the same bad data over and over again, until the script got blocked for continuously connecting to and disconnecting from the messaging server.
If it was something caused by a fat-fingered moment, like a bad variable name in code which isn't often called, then I'd have the same problem as above, but it could end up bringing down the local server running this script rather than the messaging server. Neither of those outcomes is a good way to go!
Unless you are catching very specific exit codes or failures which you know are not 'damaging' then I wouldn't go down this route. The two code blocks above at least allow me to catch the exit/error and notify someone about it so they can intervene and see what triggered it. It also means my on-line users are aware of a background failure where they might see data that appears to be valid, but is actually not updating.
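To make the "very specific exit codes" idea concrete, here is a minimal sketch of the decision logic only. It is written in Python purely for illustration; the same check would live inside the Node 'exit' handler. The exit code 75, the limit of 3 restarts and both function names are assumptions of mine, not something the Ruby script or Node defines.

```python
# Hypothetical policy values: the post does not say which exit codes are safe.
RETRYABLE_EXIT_CODES = {75}   # assumed code meaning "far end disconnected"
MAX_RESTARTS = 3              # assumed cap so a bad feed cannot loop forever


def should_respawn(exit_code, restarts_so_far,
                   retryable=RETRYABLE_EXIT_CODES, max_restarts=MAX_RESTARTS):
    """Return True only for known-harmless exits, and only a bounded number of times."""
    if exit_code not in retryable:
        return False          # unknown failure: notify a human instead
    return restarts_so_far < max_restarts


def backoff_seconds(restarts_so_far, base=1.0, cap=60.0):
    """Exponential delay before the next attempt, capped at `cap` seconds."""
    return min(cap, base * (2 ** restarts_so_far))
```

Anything that is not on the allow-list, or that exceeds the cap, falls through to the notify-and-wait behaviour described above.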
Hopefully this insight helps someone else.

Related

Managing the lifetime of a process I don't control

I'm using Chromium Embedded Framework 3 (via CEFGlue) to host a browser in a third-party process via a plugin. CEF spins up various external processes (e.g. the renderer process) and manages the lifetime of these.
When the third-party process exits cleanly, CefRuntime.Shutdown is called and all the processes exit cleanly. When the third-party process exits badly (for example it crashes) I'm left with CEF executables still running and this (sometimes) causes problems with the host application meaning it doesn't start again.
I'd like a way to ensure that whatever manner the host application exits CefRuntime.Shutdown is called and the user doesn't end up with spurious processes running.
I've been pointed in the direction of job objects (see here) but this seems like it might be difficult to ship in a real solution as on some versions of Windows it requires administrative rights.
I could also set CEF to run in single process mode, but the documentation specifies that this is really for "debugging" only, so I'm assuming shipping this in production code is bad for some reason (see here).
What other options do I have?
Following on from the comments, I've tried passing the PID of the host process through to the client (I can do this by overriding OnBeforeChildProcessLaunch). I've then created a simple watchdog with the following code:
ThreadPool.QueueUserWorkItem(_ => {
    var process = Process.GetProcessById(pid);
    while (!process.WaitForExit(5000)) {
        Console.WriteLine("Waiting for external process to die...");
    }
    Process.GetCurrentProcess().Kill();
});
I can verify in the debugger that this code executes and that the PID I'm passing into it is correct. However, if I terminate the host process I find that the thread simply dies in a way that I can't control, and the lines following the while loop are never executed (even if I replace the Kill call with a Console.WriteLine, I never see any more messages printed from this thread).
For posterity, the solution suggested by @IInspectable worked, but in order to make it work I had to switch the implementation of the external process to use the non-multi-threaded message loop.
settings.MultiThreadedMessageLoop = false;
CefRuntime.Initialize(mainArgs, settings, cefWebApp, IntPtr.Zero);
// Check the parent on every idle tick, then pump the CEF message loop.
Application.Idle += (sender, e) => {
    if (parentProcess.HasExited) Process.GetCurrentProcess().Kill();
    CefRuntime.DoMessageLoopWork();
};
Application.Run();

Connecting to an Adobe InDesign console

I have a single instance of InDesign Server running on a Windows 2007 VPS, which runs a SOAP service on port 8081. This runs as a Windows Service and runs both dev and live JSX scripts, depending on the path of the script (we have a dev folder and a live folder).
I am having trouble running a new script, so I would like to get access to the console of the running service, but I am struggling to find a reference to how to do this in the Adobe PDF docs. I know the script itself is being found, since there are errors in the Windows Event Viewer for a specific code line, but I think it is having trouble locating JSXBIN resources. The error message just lists the variable in question, rather than the explicit path.
I have modified the script to output path information to stdout, but this doesn't get into the Event Log. So, can I get a window on the console of the running service? I don't want to stop the current service as that is in use for live.
Some ideas I've got from the docs:
InDesignServer -console
InDesignServer -LogToApplicationEventLog
However, I think this executable starts up a new instance, which isn't what I want (either it would choose a new port number, or try 8081 and fail to start since the port is in use; I've not tried either for obvious reasons). The flags respectively display stdout in the DOS window and redirect stdout to the Event Log.
In short, I don't think this is possible. I was hesitant to start a new instance on our live server in case it upset anything, but in fact it is quite safe; just ensure that the port you specify is different to your usual one.
InDesignServer -noconsole -port 10001
The -noconsole flag connects stdout and stderr to the current DOS window, whereas -console opens a new one, so it's the former you want.
Aside: it may be worth avoiding LogToApplicationEventLog, since the process can get disconnected from the console, which makes it fiddly to kill in a graceful manner.

How to gracefully stop a server process which is listening on a pipe on Windows

I have a named pipe server similar to the MSDN sample at http://msdn.microsoft.com/en-us/library/windows/desktop/aa365588(v=vs.85).aspx and would like to allow clients to send an "exit" message which causes the server to gracefully stop.
So in the "InstanceThread()", if a special message is received, I would like to make the server stop.
I tried to stop the call to ConnectNamedPipe() in the main thread from the separate thread for "InstanceThread()" by closing the pipe handle, but this does not work.
I already tried various things, among others closing the overall pipe, exiting directly from the InstanceThread, ... but none of them causes the call to ConnectNamedPipe() to stop.
I played with SetNamedPipeHandleState(), but it complicates the implementation hugely, and using overlapped I/O seems like overkill for this simple requirement.
So is there an easier way to get ConnectNamedPipe() to return when the server process should be stopped and not wait endlessly for client connections?
If you don't need to support Windows XP, you could try using CancelSynchronousIo.
If the process is exiting, you don't need to do anything; the thread will be terminated when Windows tears down the process.
Alternatively, you could make the call to ConnectNamedPipe exit simply by connecting to the named pipe yourself.

Command line tool error design

I'm currently working on a command line tool, and since this is my first time designing a tool like this I have a few design questions, most notably how to handle a non-lethal error.
The tool that I'm working on starts a main server on a configurable port and after that an optional web server on a non-configurable port. If we then run the tool again (using a different port for the main server), we obviously get a binding error when we try to start the optional web server.
Since this is a non-lethal error (running the web server is optional), my initial thought from a UI standpoint would be to print out a clear error and carry on with the program. However, I've been told that from a scripting standpoint, printing out the error and then exiting is better practice.
So which is better?
You might also want to consider that people might want to write scripts which expect the invocation to succeed even if the webserver is already running.
If you define a default behavior of 'fail if webserver already running', then such scripts will have to parse your error message, or read/understand your return value and figure out that the invocation failed for this particular reason (i.e. webserver already running).
Give them a way out of this and introduce a flag (argument) where they can decide which behavior they want. In the absence of the flag, do the safer thing maybe (i.e. error out if webserver is running).
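A minimal sketch of that flag, assuming a Python tool built on argparse. The program name "mytool", the flag name --ignore-webserver-failure and the exit_status helper are all made up for illustration; only the idea (safe default, explicit opt-out) comes from the answer above.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="mytool")  # placeholder program name
    parser.add_argument("--port", type=int, required=True,
                        help="port for the main server")
    # Hypothetical flag: opt in to continuing when the optional web server cannot bind.
    parser.add_argument("--ignore-webserver-failure", action="store_true",
                        help="warn and continue if the optional web server cannot start")
    return parser


def exit_status(webserver_started, ignore_failure):
    """Return 0 on success, 1 if the web server failed and the caller did not opt out."""
    if webserver_started or ignore_failure:
        return 0
    return 1
```

A script that does not care about the web server passes the flag and can rely on a zero exit status; everyone else gets the stricter default without having to parse the error message.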

async execution of tasks for a web application

A web application I am developing needs to perform tasks that are too long to be executed during the http request/response cycle. Typically, the user will perform the request, the server will take this request and, among other things, run some scripts to generate data (for example, render images with povray).
Of course, these tasks can take a long time, so the server should not hang waiting for the scripts to complete before sending the response to the client. I therefore need to execute the scripts asynchronously, give the client a "the resource is here, but not ready" response, and probably tell it an AJAX endpoint to poll, so it can retrieve and display the resource when ready.
Now, my question is not related to the design (although I would very much enjoy any hints in this regard as well). My question is: does a system to solve this issue already exist, so I do not reinvent the square wheel? If I had to, I would use a process queue manager to submit the task and put up an HTTP endpoint to report the status, something like "pending", "aborted", "completed", to the AJAX client, but if something similar already exists specifically for this task, I would much prefer to use it.
I am working in python+django.
Edit: Please note that the main issue here is not how the server and the client must negotiate and exchange information about the status of the task.
The issue is how the server handles the submission and enqueue of very long tasks. In other words, I need a better system than having my server submit scripts on LSF. Not that it would not work, but I think it's a bit too much...
Edit 2: I added a bounty to see if I can get some other answer. I checked pyprocessing, but I cannot perform submission of a job and reconnect to the queue at a later stage.
You should avoid re-inventing the wheel here.
Check out gearman. It has libraries in a lot of languages (including python) and is fairly popular. Not sure if anyone has any out of the box ways to easily connect up django to gearman and ajax calls, but it shouldn't be too complicated to do that part yourself.
The basic idea is that you run the gearman job server (or multiple job servers), have your web request queue up a job (like 'resize_photo') with some arguments (like '{photo_id: 1234}'). You queue this as a background task. You get a handle back. Your ajax request is then going to poll on that handle value until it's marked as complete.
Then you have a worker (or probably many) that is a separate python process which connects to this job server, registers itself for 'resize_photo' jobs, does the work and then marks it as complete.
I also found this blog post that does a pretty good job summarizing its usage.
You can try two approaches:
Call the web server at a fixed interval, passing a job id; the server returns some information about the current execution of that task.
Implement a long-running page that sends data at a fixed interval; for the client, that HTTP request will "always" be "loading", and it needs to collect the new information every time a new piece of data is received.
For the second option, you can learn more by reading about Comet. Using ASP.NET, you can do something similar by implementing the System.Web.IHttpAsyncHandler interface.
I don't know of a system that does it, but it would be fairly easy to implement one's own system:
create a database table with jobid, jobparameters, jobresult
jobresult is a string that will hold a pickle of the result
jobparameters is a pickled list of input arguments
when the server starts working on a job, it creates a new row in the table, and spawns a new process to handle that, passing that process the jobid
the task handler process updates the jobresult in the table when it has finished
a webpage (xmlrpc or whatever you are using) contains a method 'getResult(jobid)' that will check the table for a jobresult
if it finds a result, it returns the result, and deletes the row from the table
otherwise it returns an empty list, or None, or your preferred return value to signal that the job is not finished yet
There are a few edge-cases to take care of so an existing framework would clearly be better as you say.
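The table described above can be sketched with nothing but the standard library. This is only an illustration under my own assumptions: SQLite as the database, an in-memory path by default, and the function names open_jobs, submit, finish and get_result are all invented here. Spawning the handler process is left out.

```python
import pickle
import sqlite3


def open_jobs(path=":memory:"):
    """Open the database and create the jobs table if it is missing."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        "jobid INTEGER PRIMARY KEY, jobparameters BLOB NOT NULL, jobresult BLOB)"
    )
    return conn


def submit(conn, *args):
    """Create a row for a new job and return its jobid."""
    cur = conn.execute(
        "INSERT INTO jobs (jobparameters) VALUES (?)", (pickle.dumps(list(args)),)
    )
    conn.commit()
    return cur.lastrowid


def finish(conn, jobid, result):
    """Called by the task handler process when the work is done."""
    conn.execute(
        "UPDATE jobs SET jobresult = ? WHERE jobid = ?", (pickle.dumps(result), jobid)
    )
    conn.commit()


def get_result(conn, jobid):
    """Return the result and delete the row, or None if the job is not finished."""
    row = conn.execute(
        "SELECT jobresult FROM jobs WHERE jobid = ?", (jobid,)
    ).fetchone()
    if row is None or row[0] is None:
        return None
    conn.execute("DELETE FROM jobs WHERE jobid = ?", (jobid,))
    conn.commit()
    return pickle.loads(row[0])
```

Two of the edge cases show up even in this sketch: a job whose real result is None cannot be told apart from an unfinished one, and pickled data must only ever come from code you trust.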
First, you need a separate "worker" service, which is started on its own at power-up and communicates with the HTTP request handlers via some local IPC such as a UNIX socket (fast) or a database (simple).
While handling a request, the CGI asks the worker for state or other data and replies to the client.
You can signal that a resource is being "worked on" by replying with a 202 HTTP code: the Client side will have to retry later to get the completed resource. Depending on the case, you might have to issue a "request id" in order to match a request with a response.
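As a rough sketch of the 202 idea, independent of any web framework: the dictionary of finished resources and the function name poll_response are assumptions for illustration, and a real handler would wrap the returned pair in whatever response object the framework uses.

```python
from http import HTTPStatus


def poll_response(finished, request_id):
    """Return (status code, body) for a poll on request_id.

    `finished` is a hypothetical mapping of request id -> completed resource;
    an id that is missing from it is treated as still being worked on.
    """
    if request_id in finished:
        return HTTPStatus.OK.value, finished[request_id]
    # 202 Accepted: the request was taken, but the resource is not ready yet.
    return HTTPStatus.ACCEPTED.value, {"request_id": request_id, "status": "pending"}
```

The client keeps retrying with the same request id until it stops getting 202.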
Alternatively, you could have a look at existing COMET libraries which might fill your needs more "out of the box". I am not sure if there are any that match your current Django design though.
Probably not a great answer for the python/django solution you are working with, but we use Microsoft Message Queue for things just like this. It basically runs like this:
Website updates a database row somewhere with a "Processing" status
Website sends a message to the MSMQ (this is a non blocking call so it returns control back to the website right away)
Windows service (could be any program really) is "watching" the MSMQ and gets the message
Windows service updates the database row with a "Finished" status.
That's the gist of it anyways. It's been quite reliable for us and really straight forward to scale and manage.
-al
Another good option for python and django is Celery.
And if you think that Celery is too heavy for your needs then you might want to look at simple distributed taskqueue.

Resources