In deploying via Nginx/Unicorn, an issue for me is how to get in touch with running instances. I need this to update the in-memory cache of the app.
With Nginx/Thin, I run multiple instances on various ports and calling app instance by port, like:
#!/bin/bash
curl :2000/update_cache/page_id
curl :2001/update_cache/page_id
etc.
Ugly approach but it works, cause i can update the cache of a single page(from many thousands) on all running app instances.
I wonder how I can do the same with Unicorn, but not by port. Does Unicorn provide a list of running instances or somewhat a way to interact with them?
The issue with in-memory cache is that when updating it on some instance, it is normally not updated on other ones, so I get content discrepancy because some users see updated content and others yet stays with old content.
How do I update the cache for all Unicorn instances?
well, can you get the list of workers PIDs?
If so, you can manage them by sending signals.
Ruby plays well with Unix signals, you just need to catch them and perform needed internal operations.
A simple proof of concept:
Signal.trap 'INFO' do
puts "Updating %s" % Process.pid
# clear cache ...
end
Now if you have the workers PIDs, you simply do:
#!/bin/bash
for pid in $pids; do
kill -s INFO $pid
done
You can use any signal instead of INFO.
For list of signals see the Ruby Signal.list
To get the list of workers PIDs, see Unicorn's after_fork config.
This is not possible.
The unicorn master process opens the listening port, and the workers are constantly competing to accept requests on that port. There's no way to select individual workers, other than sending them Unix signals.
This is one of many reasons why caching shouldn't be done like this :)
Related
I'm using Puma as a web server, and Sidekiq as my queue runner.
For multiple things (Database connections, Redis connections, other external services) I'm using the ConnectionPool gem to manage safe access to connections.
Now, depending on whether I'm running in the context of Sidekiq or of Puma, I need those pools to be different sizes (as large as the number of Sidekiq Threads or Puma threads respectively, and they are different)
What is the best way to know, in your initializers, how big to make your connection pools based on execution context?
Thanks!
You use Sidekiq.server? which returns nil when not running inside the Sidekiq process itself.
I don't know about your specific case (puma/sidekiq), but in general you can find this information in the $PROGRAM_NAME variable. Also similar are $0 and __FILE__.
I am looking for a good way to poll a lot of servers for their status through TCP. I am currently using synchronous code and the Minecraft Query Protocol, but whenever a server is offline the rest of the queue gets hold up.
Another problem I am experiencing with my current code is that some servers tend to block my server I use for polling in their firewall, and thus their servers appear offline on my serverlist.
I am using a Ruby rake task with an infinite loop in which every Minecraft server in my MongoDB database gets checked and updated every +- 10 minutes (I try to set this interval by letting the loop sleep (600/ s.count.to_i).ceil seconds.
Is there any way I can do this task efficiently (and prevent servers from blacklisting my IP in their firewall), preferably with Async code in Ruby?
You need to use non-blocking sockets to check - multithreading. The best thing to do is spawn several threads at once to check several servers at once - that way your main thread won't get held up.
This question contains a lot of information about multithreading in Ruby - you should be able to spawn multiple concurrent threads at once, or at least use non-blocking sockets.
Another point given by #Lie Ryan, you can use IO.Select to poll a array of servers, all at once. It will return an array of "online" servers when it's done - this could be more elegant than spawning multiple threads.
I'm not familiar with the internals of Sidekiq and am wondering if it's okay to launch several Sidekiq instances with the same configuration (processing the same queues).
Is it possible that 2 or more Sidekiq instances will process the same message from a queue?
UPDATE:
I need to know if there is a possible conflict, when running Sidekiq on more than 1 machine
Yes, sidekiq can absolutely run many processes against the same queue. Redis will just give the message to a random process.
Nope, I've ran Sidekiqs in different machines with no issues.
Each of the Sidekiqs read from the same redis server, and redis is very robust in multi-threaded, and distributed scenarios.
In addition, if you look at the web interface for Sidekiq, it will show all the workers across all machines because all the workers are logged in the same redis server.
So no, no issues.
I'd like to know how to communicate between processes on a Heroku worker dyno.
We want a Resque worker to read off a queue and send the data to another process running on the same dyno. The "other process" is an off-the-shelf piece of software that usually uses TCP sockets (port xyz) to listen for commands. It is set up to run as a background process before the Resque worker starts.
However, when we try to connect locally to that TCP socket, we get nowhere.
Our Rake task for setting up the queue does this:
task "resque:setup" do
# First launch our listener process in the background
`./some_process_that_listens_on_port_12345 &`
# Now get our queue worker ready, set up Redis backing store
port = 12345
ENV['QUEUE'] = '*'
ENV['PORT'] = port.to_s
Resque.redis = ENV['REDISTOGO_URL']
# Start working from the queue
WorkerClass.enqueue
end
And that works -- our listener process runs, and Resque tries to process queued tasks. However, the Resque jobs fail because they can't connect to localhost:12345 (specifically, Errno::ECONNREFUSED).
Possibly, Heroku is blocking TCP socket communication on the same dyno. Is there a way around this?
I tried to take the "code" out of the situation and just executed on the command line (after the server process claims that it is properly bound to 12345):
nc localhost 12345 -w 1 </dev/null
But this does not connect either.
We are currently investigating changing the client/server code to use UNIXSocket on both sides as opposed to TCPSocket, but as it's an off-the-shelf piece of software, we'd rather avoid our own fork if possible.
Use message queue Heroku add-ons ...,
like IronMQ for exsample
Have you tried Fifo?
http://www.gnu.org/software/libc/manual/html_node/FIFO-Special-Files.html#FIFO-Special-Files
Reading your question, you've answered your own question, you cannot connect to localhost 12345.
This way of setting up your processes is a strange one as your running two processes within one Heroku dyno which removes a lot of the benefits of Heroku, i.e independant process scaling, isolation and clean depenedency declaration and isolation.
I would strongly recommend running this as two seperate processes that interact via a third party backing service.
Heroku only lets you listen in a given port ($PORT) per dyno, I think.
I see two solutions here:
Use Redis as a communication middleware, so the worker would write on Redis again and the listener process, instead of listening in a port would be querying redis for new jobs.
Get another heroku dyno (or better, a complete different application) and launch there the listening process (on $PORT) and communicate both applications
#makdad, is the "3rd party software" written in Ruby? If so, I would run it with a monkey patch which fakes out TCPSocket or whatever class it is using to access the TCP socket. Put the monkey patch in a file of its own, which will only be required by the Ruby process which is running the 3rd party software. The monkey patch could even read data directly from the queue, and make TCPSocket behave as if that data had been received.
Yes, it's not very elegant, and I'm sure there may be a better way to do it, but when are you trying to get a job done (not spend days doing research), sometimes you just have to bite the bullet and do something which is ugly, but works. Whatever solution you choose, make sure to document it for those who work on the project later.
My server process is basically an API that responds to REST requests.
Some of these requests are for starting long running tasks.
Is it a bad idea to do something like this?
get "/crawl_the_web" do
Thread.new do
Crawler.new # this will take many many days to complete
end
end
get "/status" do
"going well" # this can be run while there are active Crawler threads
end
The server won't be handling more than 1000 requests a day.
Not the best idea....
Use a background job runner to run jobs.
POST /crawl_the_web should simply add a job to the job queue. The background job runner will periodically check for new jobs on the queue and execute them in order.
You can use, for example, delayed_job for this, setting up a single separate process to poll for and run the jobs. If you are on Heroku, you can use the delayed_job feature to run the jobs in a separate background worker/dyno.
If you do this, how are you planning to stop/restart your sinatra app? When you finally deploy your app, your application is probably going to be served by unicorn, passenger/mod_rails, etc. Unicorn will manage the lifecycle of its child processes and it would have no knowledge of these long-running threads that you might have launched and that's a problem.
As someone suggested above, use delayed_job, resque or any other queue-based system to run background jobs. You get persistence of the jobs, you get horizontal scalability (just launch more workers on more nodes), etc.
Starting threads during request processing is a bad idea.
Besides that you cannot control your worker threads (start/stop them in a controlled way), you'll quickly get into troubles if you start a thread inside request processing. Think about what happens - the request ends and the process gets prepared to serve the next request, while your worker thread still runs and accesses process-global resources like the database connection, open files, same class variables and global variables and so on. Sooner or later, your worker thread (or any library used from it) will affect the main thread somehow and break other requests and it will be almost impossible to debug.
You're really better off using separate worker processes. delayed_job for example is a really small dependency and easy to use.