Under what circumstances would a fork call throw Errno::EPIPE: Broken pipe? I'm using Resque on an AWS instance and, bizarrely, while Resque runs fine on a staging server, it constantly throws these broken pipe errors when a worker tries to fork a child process on a production server, even though both servers use the same AMI. I've put in enough logging statements to identify that the error is thrown when Resque attempts to fork:
@child = fork do
  unregister_signal_handlers if term_child
  perform(job, &block)
  exit! unless run_at_exit_hooks
end
So the error comes from that top line (https://github.com/resque/resque/blob/master/lib/resque/worker.rb#L909). I'm not clear, though, why a fork call would throw an Errno::EPIPE. The server has plenty of memory, CPU, and disk space to spare.
Related
On Heroku, I use delayed_job to run asynchronous tasks. All is well until I do a git push heroku master and then the Heroku environment kills any worker threads that are in-process.
The issue here is that those jobs never get re-queued since the delayed_job table in my db shows them as still locked and running, even though the workers that used to be servicing them are long dead.
How do I prevent this situation from occurring? I'd like Heroku to wait for all in-progress delayed jobs to complete or error out before shutting down, or at least to terminate them in a way that lets a new worker pick them up once the server comes back up after my update has been applied.
Looks like you can configure DJ to handle SIGTERM and mark the in-progress jobs as failed (so they'll be restarted again):
You can make workers raise an exception on TERM signals by adding this setting to an initializer:
Delayed::Worker.raise_signal_exceptions = :term
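In context, a minimal initializer sketch might look like this (the filename is just an illustration):
# config/initializers/delayed_job.rb
# Raise an exception when a worker receives SIGTERM, so the job it was
# running is marked as failed and can be picked up again by a new worker.
Delayed::Worker.raise_signal_exceptions = :term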
More info in this answer:
https://stackoverflow.com/a/16811844/1715829
I have a Node.js app which when started spawns a Ruby script to connect to a streaming data service and captures the output via STDOUT which is then served to the client via websocket.
Every now and again the Ruby script will fail (normally due to a disconnect from the far end), and while the Node script carries on running, it's obviously not aware that the spawned Ruby script has died.
Is there any way I can automate recovery of the spawned Ruby script from within Node or Ruby, so that I don't have to restart the entire Node instance (and thus boot the clients off), and the script re-spawns attached to the correct instance of Node?
The script is spawned using the following:
var cp = require('child_process');
var tail = cp.spawn('/var/www/html/mapper/test/feed1-db.rb');
tail.stdout.on('data', function(chunk) {
  // <more stuff here where data is split and emitted from the socket>
});
I've finally had more time to look into this and have decided that it's probably a very bad idea to be automatically re-spawning failed scripts! (More on that later)
I have found that I can catch both errors and exits of the child process by using the following:
tail.on('exit', function (code) {
console.log('child process exited with code ' + code);
});
Which will give me the exit code of the child script.
I also found out that I can catch any other errors using:
tail.stderr.on('data', (data) => {
console.error(`child stderr:\n${data}`);
});
Both of these output the error to the console, meaning you can still back-trace any issues. I've also expanded the error-detection code to output a failure notice to connected clients over the websocket.
Now on to why I decided that to auto re-spawn the script was a bad idea...
Up to now, most of my underlying issues were caused upstream, where I might get some invalid data that would choke my script (I know I should handle that elsewhere, but I'm kinda new to this!), or by fat-fingered problems caused by me!
Without a lot of extra work, if the script died due to some invalid data from upstream, it would simply try to reconnect and consume the same bad data over and over again, until it got blocked for continuously connecting to and disconnecting from the messaging server.
If it was caused by a fat-fingered moment, like a bad variable name in a rarely called code path, I'd have the same problem as above, but it could end up bringing down the local server running this script rather than the messaging server. Either way, neither of those outcomes is a good way to go!
Unless you are catching very specific exit codes or failures that you know are not 'damaging', I wouldn't go down this route. The two code blocks above at least allow me to catch the exit/error and notify someone about it so they can intervene and see what triggered it. It also means my online users are aware of a background failure, where they might otherwise see data that appears valid but is actually not updating.
Hopefully this insight helps someone else.
I'm invoking "knife ec2 server create" to create many EC2 instances, with a delay of 10 seconds between each. It works well for a few instances (approx. 10). However, if I create more instances (on the order of 30), I start getting the following argument error:
.INFO: SIGHUP received, reconfiguring
ERROR: ArgumentError: You must pass :on, :tail, or :head to :on
The error seems to happen at random phases: sometimes while waiting for the EC2 instance, sometimes later when executing my recipe.
Is there a limit on the number of knife processes or Chef API calls I should have running at the same time?
I suspect this has nothing to do with Chef (although the error you're getting is being swallowed by Chef). I think the EC2 API is rate limiting you. You may need to add a splay or delay between calls or perform them in smaller batches.
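As a rough illustration (not from the original answer), here is how you might batch the knife calls from a small Ruby wrapper; the node names, batch size, and sleep intervals are arbitrary assumptions:
# Hypothetical driver script: launch instances in small batches, pausing
# between calls so the EC2 API is less likely to rate-limit the requests.
names = (1..30).map { |i| "node#{i}" }

names.each_slice(5) do |batch|
  batch.each do |name|
    # Remaining knife flags (image, flavor, run list, etc.) are elided here.
    system("knife", "ec2", "server", "create", "-N", name)
    sleep 10   # delay between individual calls
  end
  sleep 60     # longer pause between batches
end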
If you are continuing to experience this error, I would recommend opening a ticket at https://tickets.opscode.com
My app runs on Heroku with unicorn and uses sucker_punch to send a small quantity of emails in the background without slowing the web UI. This has been working pretty well for a few weeks.
I changed the unicorn config to the Heroku recommended config. The recommended config includes an option for the number of unicorn processes, and I upped the number of processes from 2 to 3.
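For reference, the option in question in a Heroku-style unicorn config looks roughly like this (an illustrative sketch, not my exact file):
# config/unicorn.rb (excerpt)
# Number of unicorn worker processes; Heroku's guide reads this from an
# environment variable with a fallback default.
worker_processes Integer(ENV["WEB_CONCURRENCY"] || 3)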
Apparently that was too much. The sucker_punch jobs stopped running. I have log messages that indicate when they are queued and I have messages that indicate when they start processing. The log shows them being queued but the processing never starts.
My theory is that I exceeded memory by going from 2 to 3 unicorns.
I did not find a message anywhere indicating a problem.
Q1: Should I expect to find a failure message somewhere? Something like "attempting to start sucker_punch -- oops, not enough memory"?
Q2: Any suggestions on how I can be notified of a failure like this in the future?
Thanks.
If you are indeed exceeding dyno memory, you should find R14 or R15 errors in your logs. See https://devcenter.heroku.com/articles/error-codes#r14-memory-quota-exceeded
A more likely problem, though, given that you haven't found these errors, is that something within the perform method of your sucker punch worker is throwing an exception. I've found sucker punch tasks to be a pain to debug because it appears the lib swallows all exceptions silently. Try instantiating your task and calling perform on it from a rails console to make sure that it behaves as you expect.
For example, you should be able to do this without causing an exception:
task = YourTask.new
task.perform :something, 55
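If the console test does reveal an exception, one common workaround is to rescue and log inside perform so failures stop being swallowed silently. A sketch, assuming your job class looks something like this (the class name, arguments, and logger are illustrative):
# Hypothetical sucker_punch job class.
class YourTask
  include SuckerPunch::Job

  def perform(action, id)
    # ... the real work goes here ...
  rescue => e
    Rails.logger.error("YourTask failed: #{e.class}: #{e.message}")
    raise   # re-raise so the failure still surfaces when tested from the console
  end
end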
I have a small HTTP server script I've written using eventmachine which needs to call external scripts/commands and does so via backticks (``). When serving up requests which don't run backticked code, everything is fine; however, as soon as my EM code executes any backticked external script, it stops serving requests and stops executing in general.
I noticed eventmachine seems to be sensitive to sub-processes and/or threads, and appears to have the popen method for this purpose, but EM's source warns that this method doesn't work under Windows. Many of the machines running this script are running Windows, so I can't use popen.
Am I out of luck here? Is there a safe way to run an external command from an eventmachine script under Windows? Is there any way I could fire off some commands to be run externally without blocking EM's execution?
edit: the culprit that seems to be screwing up EM the most is my usage of the Windows start command, as in start java myclass. The reason I'm using start is that I want those external scripts to start running and keep running after the EM request is served.
The Ruby documentation states that the backtick operator "Returns the standard output of running cmd in a subshell".
So if your command, i.e. start java myclass, is continuing to run, then Ruby is waiting for it to finish so it can pass its output back to your program.
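A minimal illustration of that blocking behaviour (plain Ruby, nothing EventMachine-specific, and the sleep duration is arbitrary):
# Backticks block the calling thread until the subshell exits, because the
# command's full standard output has to be collected and returned.
started = Time.now
output  = `ruby -e "sleep 2; puts 'done'"`
puts "command took #{Time.now - started} seconds"   # roughly 2 seconds
puts output                                          # => "done"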
Try win32-open3 (and if it needs to be cross-platform rather than Windows-only, also have a look at POpen4).
EventMachine has a thread pool. You can EM.defer your backticks like this:
EM.defer { `start java myclass` }
By default the thread pool has 20 threads, and you can change its size by assigning a value to EM.threadpool_size.
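For instance (the value 50 here is arbitrary):
EM.threadpool_size = 50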
It is important to note that EM.defer can be passed an operation, which is executed in a deferred thread; a callback, which is executed in the reactor thread; and an error callback, which is run in the reactor thread when the operation raises an exception.
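A sketch of that three-argument form, sticking with the start java myclass example from the question (the callback and errback bodies are just placeholders):
# The operation runs in a pool thread; the callback and errback are invoked
# back on the reactor thread.
operation = proc { `start java myclass` }
callback  = proc { |output| puts "command output: #{output}" }
errback   = proc { |error|  puts "command failed: #{error.message}" }

EM.defer(operation, callback, errback)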
If you use Java, you might consider JRuby, which has real thread support, and you could probably reuse your Java code from within JRuby.