I'd like to know how to communicate between processes on a Heroku worker dyno.
We want a Resque worker to read off a queue and send the data to another process running on the same dyno. The "other process" is an off-the-shelf piece of software that usually uses TCP sockets (port xyz) to listen for commands. It is set up to run as a background process before the Resque worker starts.
However, when we try to connect locally to that TCP socket, we get nowhere.
Our Rake task for setting up the queue does this:
task "resque:setup" do
  # First launch our listener process in the background
  `./some_process_that_listens_on_port_12345 &`
  # Now get our queue worker ready, set up Redis backing store
  port = 12345
  ENV['QUEUE'] = '*'
  ENV['PORT'] = port.to_s
  Resque.redis = ENV['REDISTOGO_URL']
  # Start working from the queue
  WorkerClass.enqueue
end
And that works -- our listener process runs, and Resque tries to process queued tasks. However, the Resque jobs fail because they can't connect to localhost:12345 (specifically, Errno::ECONNREFUSED).
Possibly, Heroku is blocking TCP socket communication on the same dyno. Is there a way around this?
I tried to take the "code" out of the situation and just ran this on the command line (after the server process claimed that it was properly bound to 12345):
nc localhost 12345 -w 1 </dev/null
But this does not connect either.
We are currently investigating changing the client/server code to use UNIXSocket on both sides as opposed to TCPSocket, but as it's an off-the-shelf piece of software, we'd rather avoid our own fork if possible.
Use a message queue Heroku add-on, like IronMQ for example.
Have you tried a FIFO (named pipe)?
http://www.gnu.org/software/libc/manual/html_node/FIFO-Special-Files.html#FIFO-Special-Files
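A minimal sketch of the idea in Ruby (the FIFO path and the command string are made up for the example; File.mkfifo needs Ruby >= 2.3 and a Unix system):

```ruby
require "tmpdir"

path = File.join(Dir.mktmpdir, "commands.fifo")
File.mkfifo(path)   # create the named pipe

# "Listener" thread standing in for the off-the-shelf process, reading commands.
reader = Thread.new { File.read(path) }

# "Worker" side: write a command to the pipe instead of opening a TCP socket.
File.write(path, "process job 42\n")

puts reader.value   # prints "process job 42"
```

Both ends block until the other side has opened the pipe, so the ordering above works without any extra synchronization.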
Reading your question, you've already answered it yourself: you cannot connect to localhost:12345.
This is a strange way of setting up your processes, as you're running two processes within one Heroku dyno, which removes a lot of the benefits of Heroku, i.e. independent process scaling, isolation, and clean dependency declaration.
I would strongly recommend running this as two separate processes that interact via a third-party backing service.
Heroku only lets you listen on a single given port ($PORT) per dyno, I think.
I see two solutions here:
Use Redis as communication middleware: the worker would write to Redis, and the listener process, instead of listening on a port, would query Redis for new jobs.
Get another Heroku dyno (or better, a completely different application), launch the listening process there (on $PORT), and have the two applications communicate with each other.
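The first option might look roughly like this. It's only a sketch: the list name is invented, and the lpush/brpop method names assume the redis gem's client interface.

```ruby
# Sketch of option 1: Redis as the middleware. The `redis` argument is any
# client responding to lpush/brpop (the redis gem's client is assumed here).

# Worker side: instead of connecting to a TCP port, push the command onto a list.
def send_command(redis, payload)
  redis.lpush("listener:commands", payload)
end

# Listener side: instead of binding a port, block until a command arrives.
# Returns nil if nothing shows up within the timeout.
def next_command(redis, timeout: 5)
  item = redis.brpop("listener:commands", timeout: timeout)
  item && item.last
end
```

Since both processes only talk to Redis, they no longer need to share a dyno at all.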
@makdad, is the "3rd party software" written in Ruby? If so, I would run it with a monkey patch that fakes out TCPSocket, or whatever class it is using to access the TCP socket. Put the monkey patch in a file of its own, which will only be required by the Ruby process running the 3rd-party software. The monkey patch could even read data directly from the queue, and make TCPSocket behave as if that data had been received.
Yes, it's not very elegant, and I'm sure there may be a better way to do it, but when you're trying to get a job done (not spend days doing research), sometimes you just have to bite the bullet and do something which is ugly, but works. Whatever solution you choose, make sure to document it for those who work on the project later.
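For what it's worth, a rough sketch of that idea. Everything here is hypothetical (the inbox queue, the fake class, the handful of methods); a real 3rd-party client would need whichever socket methods it actually calls.

```ruby
require "socket"

# Hypothetical inbox holding data the real TCP socket would have received.
FAKE_INBOX = Queue.new
FAKE_INBOX << "status=ok\n"

# A stand-in implementing just the methods the vendored code happens to call.
class FakeTCPSocket
  def initialize(_host, _port); end   # accept and ignore the connect arguments

  def gets
    FAKE_INBOX.pop
  end

  def write(data)
    data.bytesize   # pretend the write went out on the wire
  end

  def close; end
end

# Swap the constant so the 3rd-party code picks up the fake unchanged.
Object.send(:remove_const, :TCPSocket)
TCPSocket = FakeTCPSocket

sock = TCPSocket.new("localhost", 12345)
puts sock.gets   # prints "status=ok"
```

In the real setup, the queue worker would push into FAKE_INBOX instead of the hard-coded seed value above.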
Related
I have a project with lots of Celery tasks, and one of the tasks must only run one at a time (it is a request to a 3rd-party API which disallows multiple concurrent connections).
I can achieve this by starting a separate celery process with a separate queue and a concurrency of 1.
Regular celery process:
celery -A sourcery worker -Q default -c 4
A separate single-worker process:
celery -A sourcery worker -Q separate_queue -c 1
But I am on Heroku, and I will be billed doubly for spinning up two processes instead of one. So, is there a way to achieve it with a single Celery process?
There is currently no way to do this.
I had a similar issue where I had a 3rd party API which only allowed one concurrent connection. I ended up running two separate dynos on Heroku.
Somebody did request this feature, but it has not been implemented: https://github.com/celery/celery/issues/1599
Edit:
There is something called celery multi that could be worth looking into:
Celery - run different workers on one server
I'm not sure it's Heroku compatible though, let me know if you try!
I am looking for a good way to poll a lot of servers for their status over TCP. I am currently using synchronous code and the Minecraft Query Protocol, but whenever a server is offline, the rest of the queue gets held up.
Another problem I am experiencing with my current code is that some servers tend to block the server I use for polling in their firewall, so their servers appear offline on my server list.
I am using a Ruby rake task with an infinite loop in which every Minecraft server in my MongoDB database gets checked and updated roughly every 10 minutes (I try to hit this interval by letting the loop sleep (600 / s.count.to_i).ceil seconds).
Is there any way I can do this task efficiently (and prevent servers from blacklisting my IP in their firewall), preferably with Async code in Ruby?
You need to avoid blocking while you check - either with non-blocking sockets or with multithreading. The best thing to do is spawn several threads at once to check several servers simultaneously - that way your main thread won't get held up.
This question contains a lot of information about multithreading in Ruby - you should be able to spawn multiple concurrent threads at once, or at least use non-blocking sockets.
Another point, given by @Lie Ryan: you can use IO.select to poll an array of servers all at once. It will return the array of "online" servers when it's done - this can be more elegant than spawning multiple threads.
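To make the thread-per-host approach concrete, here's one possible sketch. The method names and the 1-second timeout are just placeholders; only a TCP connect is attempted, not the full Minecraft query exchange.

```ruby
require "socket"

# True if a TCP connection to host:port succeeds within `timeout` seconds.
def reachable?(host, port, timeout = 1)
  Socket.tcp(host, port, connect_timeout: timeout) {}
  true
rescue SystemCallError, SocketError
  false
end

# Check every server in its own thread so one dead host can't hold up the rest.
def poll_all(servers, timeout = 1)
  servers.map { |host, port|
    Thread.new { [host, port, reachable?(host, port, timeout)] }
  }.map(&:value)
end
```

With a timeout of 1 second, a sweep over the whole list takes about as long as the slowest single host instead of the sum of all of them.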
In deploying via Nginx/Unicorn, one issue for me is how to reach the running app instances. I need this to update the app's in-memory cache.
With Nginx/Thin, I run multiple instances on various ports and address each app instance by its port, like:
#!/bin/bash
curl :2000/update_cache/page_id
curl :2001/update_cache/page_id
etc.
Ugly approach, but it works, because I can update the cache of a single page (out of many thousands) on all running app instances.
I wonder how I can do the same with Unicorn, but not by port. Does Unicorn provide a list of running instances or somewhat a way to interact with them?
The issue with an in-memory cache is that when it is updated on one instance, it is normally not updated on the others, so I get content discrepancies: some users see the updated content while others still get the old content.
How do I update the cache for all Unicorn instances?
Well, can you get the list of worker PIDs?
If so, you can manage them by sending signals.
Ruby plays well with Unix signals, you just need to catch them and perform needed internal operations.
A simple proof of concept (note that SIGINFO does not exist on Linux, so use something like SIGUSR2 instead -- but check Unicorn's SIGNALS documentation first, since Unicorn reserves several signals, such as USR1, USR2 and QUIT, for its own process management):
Signal.trap 'USR2' do
  puts "Updating %s" % Process.pid
  # clear cache ...
end
Now if you have the workers' PIDs, you simply do:
#!/bin/bash
for pid in $pids; do
  kill -s USR2 $pid
done
You can use a different signal instead of USR2 if it conflicts with anything.
For the list of signals, see Ruby's Signal.list.
To get the list of workers PIDs, see Unicorn's after_fork config.
This is not possible.
The unicorn master process opens the listening port, and the workers are constantly competing to accept requests on that port. There's no way to select individual workers, other than sending them Unix signals.
This is one of many reasons why caching shouldn't be done like this :)
I am trying to create a Ruby daemon process which clients will be able to connect to.
I need to ensure that the remote Ruby process always remains up and available for connection, so I need to detect network outages or unreachable errors.
I was thinking of having a heartbeat mechanism at the application level between clients and the server, and a timeout in the client if the connection fails.
I was told Ruby's select method (IO.select) could be of help here as well, but I'm not sure how.
Can anyone share any good links/resources or impart some general wisdom to create reliable and fast daemon processes in Ruby?
I think a lot of people would use eventmachine for this type of application. At its core, it uses epoll (which is similar to select) to decide which socket to deal with next. There are lots of gems that build on eventmachine to allow you to run different types of servers. One example is em-websocket.
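If you'd rather stay with plain Ruby sockets, a client-side heartbeat built on IO.select could look something like this. The PING/PONG wire format and the 2-second timeout are invented for the sketch; a socket pair stands in for a real client/server connection.

```ruby
require "socket"

# Client-side heartbeat: send PING, wait up to `timeout` seconds for PONG.
# Returns false on a timeout, a wrong reply, or a dead connection.
def alive?(socket, timeout = 2)
  socket.write("PING\n")
  ready, = IO.select([socket], nil, nil, timeout)
  !!(ready && socket.gets == "PONG\n")
rescue SystemCallError, IOError
  false
end

# Demo: a responder thread plays the server's part of the heartbeat.
client, server = UNIXSocket.pair
responder = Thread.new { server.gets && server.write("PONG\n") }
puts alive?(client)   # prints "true"
responder.join
```

The client would call something like alive? on a timer and reconnect whenever it returns false, which covers both network outages and a crashed server process.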
I've got an Express web app running as my main app on Heroku Cedar. I need to run a worker job periodically. I know I can specify a worker: process type in my Procfile, but that seems to be for a forever-running kind of job. Perhaps there is a way to have node.js's event mechanism cause the worker to idle, and use cron to poke it alive periodically?
To keep your process alive, you can try using an external service which will "ping" your application; you can use the free New Relic add-on on Heroku for that.
I am currently experimenting, and it seems that even with this the application is still put into idle mode, but it restarts on the next "ping", so it is still up most of the time.
I don't know node.js, but I run my worker with Ruby + EventMachine inside a Ruby on Rails application and it works fine; you just need something that does the work in the background alongside your web requests.