Python Tornado: What is the correct way to launch background tasks so they can be gracefully closed on shutdown? - python-asyncio

I am trying to support graceful shutdown for a Tornado application. I would like to stop accepting new requests and wait for the existing requests to complete before closing the connection pools and shutting down. However, I don't see a way to monitor which background tasks are still running, so I can't tell when they have completed. What is the right way to do this with Tornado?
The other StackOverflow questions I found did not appear to address this issue.
The app code looks something like this:
import signal

import tornado.ioloop
import tornado.web
import uvloop
from tornado.httpserver import HTTPServer
from tornado.ioloop import IOLoop
from tornado.web import RequestHandler


class JobHandler(RequestHandler):
    async def post(self):
        ...
        IOLoop.instance().add_callback(self.backgroundtask, parameter)
        ...


server = None


async def shutdown():
    # stop listening for new requests
    server.stop()
    # wait for all open tasks to complete ???
    # close connection pools


def exit_handler(sig, frame):
    tornado.ioloop.IOLoop.instance().add_callback_from_signal(shutdown)


if __name__ == "__main__":
    uvloop.install()
    signal.signal(signal.SIGTERM, exit_handler)
    signal.signal(signal.SIGINT, exit_handler)

    application = tornado.web.Application([
        (r"/request_work", JobHandler),
    ])
    server = HTTPServer(application)
    server.listen(8888)
    tornado.ioloop.IOLoop.instance().start()
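
One way to fill in the "wait for all open tasks to complete" gap is sketched below. This is not an official Tornado recipe, just one approach: keep every spawned task in a module-level set and have shutdown() await whatever is left in it. Names such as background_tasks, spawn and run_job are illustrative, not part of Tornado's API.

import asyncio
import signal

import tornado.ioloop
import tornado.web

# Set of in-flight background tasks; shutdown() awaits whatever is left here.
background_tasks = set()


def spawn(coro):
    # Schedule the coroutine on the running loop and remember the task until
    # it finishes, so shutdown() can wait for it.
    task = asyncio.ensure_future(coro)
    background_tasks.add(task)
    task.add_done_callback(background_tasks.discard)
    return task


async def run_job(parameter):
    # Placeholder for the real background work.
    await asyncio.sleep(10)


class JobHandler(tornado.web.RequestHandler):
    async def post(self):
        spawn(run_job(self.get_argument("parameter", "")))
        self.write("accepted")


async def shutdown(server):
    server.stop()  # stop accepting new connections
    if background_tasks:
        # Wait for in-flight background work; return_exceptions keeps one
        # failed task from aborting the rest of the shutdown.
        await asyncio.gather(*background_tasks, return_exceptions=True)
    # close connection pools here, then stop the loop
    tornado.ioloop.IOLoop.current().stop()


def main():
    application = tornado.web.Application([
        (r"/request_work", JobHandler),
    ])
    server = application.listen(8888)
    loop = tornado.ioloop.IOLoop.current()

    def exit_handler(sig, frame):
        loop.add_callback_from_signal(shutdown, server)

    signal.signal(signal.SIGTERM, exit_handler)
    signal.signal(signal.SIGINT, exit_handler)
    loop.start()


if __name__ == "__main__":
    main()

The add_done_callback(background_tasks.discard) keeps the set from growing as jobs finish, and awaiting the set in shutdown() is what the commented-out step in the question would become under this approach.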

Related

Jupyterhub custom spawner start long delays

My custom spawner connects via SSH to a Slurm submit node on the user's behalf and submits a Slurm job.
All of that takes a long time, around 10 seconds even if the job can start straight away. That much is expected, but I want the user to be redirected to a progress page immediately.
Instead there is a 10-second hang between the user pressing the "start" button and the progress page appearing. It looks like JupyterHub waits for the start method to complete before redirecting.
The start method does the following:
await the asyncssh connection
await the Slurm job to be submitted
await the job status to become "Running".
So there seem to be plenty of opportunities for JupyterHub to do other things while the start method is running.
It looks like the issue was related to my spawner using an options_form. The options form causes spawning to go through a POST request, and in JupyterHub 1.1 spawns started via POST don't go to the pending page.
This behavior is fixed in the current master branch:
https://github.com/jupyterhub/jupyterhub/commit/3908c6d041987e69db7150dcf2041916053b863d

Rails - Concurrency issue with puma workers

I have a Puma server configured with two workers, each with 16 threads, and config.threadsafe! disabled to allow Puma to use threads.
Now I have some code that I suspect is not thread-safe, even though it uses a Mutex stored in a constant. I want this code to be executed by only one Puma thread at a time to avoid concurrency issues, and I use the Mutex for that.
My questions are:
Does a Mutex provide thread safety across Puma threads in multiple workers? As I understand it, each worker is a separate process, so the Mutex will not work across workers.
If the Mutex doesn't work across workers, what could be the solution to make this particular code thread-safe?
Code example
class MyService
  # ...
  MUTEX = Mutex.new
  # ...

  def initialize
    # ...
  end

  def doTask
    MUTEX.synchronize do
      # ...
    end
  end
end
The MUTEX approach didn't work for me, so I needed to find another approach. Please see the solution below.
The problem is that different Puma threads make requests to the external remote API at the same time, and sometimes the remote API takes a while to respond.
I wanted to restrict the total number of API requests in progress at once, but it was not working because of the issue above.
To resolve this:
I created a DB table where I add a new entry marked in-progress when a request is sent to the external API.
Once the API responds, I update that entry to processed.
Before making any new request to the external API, I check how many entries are still in-progress.
This way, I am able to restrict the total number of in-progress requests from my system to the external API.
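
For illustration only, here is a minimal sketch of that in-progress/processed bookkeeping. The original setup is Rails; Python with SQLite is used here just to show the idea, and the table and column names (api_requests, status) are made up. Doing the count and the insert inside one write transaction keeps two workers from both passing the check at the same time.

import sqlite3

MAX_IN_FLIGHT = 5  # assumed limit on concurrent external API calls

conn = sqlite3.connect("requests.db", isolation_level=None)
conn.execute("""CREATE TABLE IF NOT EXISTS api_requests (
                    id INTEGER PRIMARY KEY AUTOINCREMENT,
                    status TEXT NOT NULL)""")

def try_reserve_slot():
    # BEGIN IMMEDIATE takes a write lock, so the count and the insert happen
    # atomically even with several worker processes sharing the database.
    conn.execute("BEGIN IMMEDIATE")
    try:
        (in_flight,) = conn.execute(
            "SELECT COUNT(*) FROM api_requests WHERE status = 'in-progress'"
        ).fetchone()
        if in_flight >= MAX_IN_FLIGHT:
            conn.execute("ROLLBACK")
            return None
        cur = conn.execute(
            "INSERT INTO api_requests (status) VALUES ('in-progress')")
        conn.execute("COMMIT")
        return cur.lastrowid
    except Exception:
        conn.execute("ROLLBACK")
        raise

def mark_processed(request_id):
    conn.execute(
        "UPDATE api_requests SET status = 'processed' WHERE id = ?",
        (request_id,))

request_id = try_reserve_slot()
if request_id is not None:
    # call the external API here, then:
    mark_processed(request_id)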

How long can a Worker Role process set status to "busy" before getting killed?

I have a worker role process that I want to stop processing new requests when it's too busy (e.g. CPU load > 80%, a long disk queue, or some other metric).
If I set the role status to "Busy", will it get killed by the Fabric Controller after being busy for too long? If so, how long does it take until the Fabric Controller kills the process?
I assume the process is still capable of receiving/sending signals to the Fabric agent.
Thanks!
You can leave an instance in the Busy status forever. The only time Azure will take recovery action is if the process exits. See http://blogs.msdn.com/b/kwill/archive/2013/02/28/heartbeats-recovery-and-the-load-balancer.aspx for some additional information.
Also, what is your worker role doing? Setting the instance status to Busy will only take it out of the load balancer rotation so that new incoming TCP connections will not get routed to that instance. But if your worker role is a typical one that does background jobs (i.e. sits in a loop picking messages up from a queue, or listens on an InternalEndpoint for requests coming from a front-end web role), then setting it to Busy will have no effect. In this scenario you would add logic to your code to stop doing work, but what that looks like will depend on the type of work your role is doing.
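
As a language-agnostic illustration of that last point (Python and psutil here are purely illustrative, not part of the Azure SDK), the worker's own loop can simply stop pulling new work while the machine is busy:

import time

import psutil  # illustrative; any CPU/disk metric source would do

def worker_loop(pull_message, process_message):
    while True:
        # "Too busy" check chosen to mirror the question's 80% CPU example.
        if psutil.cpu_percent(interval=1) > 80:
            time.sleep(5)  # back off instead of exiting the process
            continue
        msg = pull_message()  # e.g. fetch the next message from the queue
        if msg is not None:
            process_message(msg)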

Running an EventMachine Worker on Heroku + Sinatra + Twitter Streaming API?

I'm trying to get my head around the asynchronous pattern involved in running EventMachine on Heroku with Sinatra. In a nutshell, what I'm trying to achieve is this: using em-http, create an HTTP request to the Twitter streaming API; on the stream callback, parse and push the tweet to clients using WebSockets. So far, so good. The problem arises when the same application also needs to serve web pages. In my config.ru I have, among other Bundler stuff,
require 'app'
run TwitterApp
Then in my app file, the EM block:
EM.run {
  class TwitterApp < Sinatra::Base
    get '/' do
      haml :index
    end
  end

  http = EventMachine::HttpRequest.new(url, options).get :head => {'Authorization' => [USERNAME, PASSWORD]}
  http.stream do |chunk|
    # parse tweet, push using websockets
  end
}
Now, what seems to be happening is that run TwitterApp never gets reached because EventMachine uses the Reactor pattern and never returns.
Alternately, if I try to do a
App.run!
within the EM.run block, everything runs fine locally when started with ruby app.rb, but with rackup it seems to run the server twice (once with Thin and once with WEBrick), and on Heroku it crashes with
Error R11 (Bad bind) -> Process bound to port other than $PORT
Stopping process with SIGKILL
Am I missing something very trivial here?
Thanks very much!
For this, you could just run async_sinatra -- https://github.com/raggi/async_sinatra -- and use its asynchronous handlers rather than rolling your own.
I run the reactor in its own thread so that it doesn't block the main process:
if not EM.reactor_running?
  Thread.new {
    EM.run {
      logger.info "Starting EventMachine Reactor"
      EM.error_handler { |e|
        logger.error "Error raised during event loop: #{e.message}"
        logger.error e.backtrace unless e.backtrace.nil?
      }
    }
  }
else
  logger.info "Reactor already started"
end
I then run things via
EM.next_tick { do_background_stuff }
I am still waiting to find out whether you need a worker dyno to use this pattern.
I am not so familiar with EventMachine, but as far as I understand it, WebSockets are not yet supported on Heroku. Projects like travis-ci get around that by using a service like Pusher to serve their WebSockets.
The R11 (bad bind) error on Heroku means that you have to make sure your web process binds to the port it gets from Heroku (ENV["PORT"]). This makes sure the HTTP routing works, I guess.
I hope this is helpful in some way.
You could separate your app into multiple server instances. App 1 serves web pages and App 2 runs the EventMachine server (both are connected to the same DB). You can use Pusher to glue it all together with WebSockets.
Could you paste a gist of the full Sinatra app?

Is it a bad idea to create worker threads in a server process?

My server process is basically an API that responds to REST requests.
Some of these requests are for starting long running tasks.
Is it a bad idea to do something like this?
get "/crawl_the_web" do
Thread.new do
Crawler.new # this will take many many days to complete
end
end
get "/status" do
"going well" # this can be run while there are active Crawler threads
end
The server won't be handling more than 1000 requests a day.
Not the best idea....
Use a background job runner to run jobs.
POST /crawl_the_web should simply add a job to the job queue. The background job runner will periodically check for new jobs on the queue and execute them in order.
You can use, for example, delayed_job for this, setting up a single separate process to poll for and run the jobs. If you are on Heroku, you can use the delayed_job feature to run the jobs in a separate background worker/dyno.
If you do this, how are you planning to stop/restart your Sinatra app? When you finally deploy it, your application is probably going to be served by Unicorn, Passenger/mod_rails, etc. Unicorn will manage the lifecycle of its child processes, and it will have no knowledge of these long-running threads that you might have launched, and that's a problem.
As someone suggested above, use delayed_job, resque or any other queue-based system to run background jobs. You get persistence of the jobs, you get horizontal scalability (just launch more workers on more nodes), etc.
Starting threads during request processing is a bad idea.
Besides the fact that you cannot control your worker threads (start or stop them in a controlled way), you'll quickly get into trouble if you start a thread inside request processing. Think about what happens: the request ends and the process gets ready to serve the next request, while your worker thread is still running and accessing process-global resources like the database connection, open files, class variables, global variables, and so on. Sooner or later, your worker thread (or any library used from it) will affect the main thread somehow and break other requests, and it will be almost impossible to debug.
You're really better off using separate worker processes. delayed_job for example is a really small dependency and easy to use.
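
The suggestions above are Ruby-specific (delayed_job, resque), but the enqueue-then-worker pattern itself is language-independent. Here is a minimal sketch in Python using a plain Redis list as the queue; the queue name and payload format are made up, and a real deployment would use a proper job library rather than hand-rolling this.

import json

import redis

r = redis.Redis()

# --- web process: the request handler only enqueues and returns ---
def handle_crawl_the_web(url):
    r.rpush("jobs", json.dumps({"type": "crawl_the_web", "url": url}))
    return "job queued"

# --- worker process: started separately, survives web restarts/deploys ---
def worker_loop():
    while True:
        _, raw = r.blpop("jobs")  # blocks until a job is available
        job = json.loads(raw)
        if job["type"] == "crawl_the_web":
            crawl(job["url"])     # the long-running work happens here

def crawl(url):
    ...  # placeholder for the many-days crawl

Because the queue lives outside the web process, restarting or redeploying the app does not lose the jobs, and you can scale by starting more worker processes.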
