I understand the Heroku API's support for scaling, but how does one control which instance is killed when workers are scaled down? Let's say that my worker (1 of 2) determines that its work is done and therefore wants to scale itself down. If the worker simply exits, presumably Heroku detects that and starts a new worker in its place. If the worker instead uses the API to scale down, it may end up killing the other copy, which is still busy, rather than itself. So how does a worker tell Heroku that it is voluntarily scaling down and that a replacement should not be started?
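For reference, the kind of self-scaling call being discussed would look roughly like the sketch below. This is a minimal Go example against the Heroku Platform API's formation endpoint; the app name "my-app" and the HEROKU_API_KEY environment variable are placeholders, and, as the question notes, Heroku rather than the caller decides which dyno gets stopped:

    package main

    import (
        "bytes"
        "fmt"
        "log"
        "net/http"
        "os"
    )

    // scaleWorkersTo asks Heroku to set the number of worker dynos.
    // Heroku, not the caller, picks which dyno to stop on a scale-down,
    // which is exactly the race described in the question.
    func scaleWorkersTo(quantity int) error {
        url := "https://api.heroku.com/apps/my-app/formation/worker"
        body := bytes.NewBufferString(fmt.Sprintf(`{"quantity": %d}`, quantity))
        req, err := http.NewRequest("PATCH", url, body)
        if err != nil {
            return err
        }
        req.Header.Set("Content-Type", "application/json")
        req.Header.Set("Accept", "application/vnd.heroku+json; version=3")
        req.Header.Set("Authorization", "Bearer "+os.Getenv("HEROKU_API_KEY"))
        resp, err := http.DefaultClient.Do(req)
        if err != nil {
            return err
        }
        defer resp.Body.Close()
        fmt.Println("scale request status:", resp.Status)
        return nil
    }

    func main() {
        // A worker that thinks its work is done could call this, but it
        // cannot guarantee that it is the dyno that gets killed.
        if err := scaleWorkersTo(1); err != nil {
            log.Fatal(err)
        }
    }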
Looks like there is an add-on for that:
https://devcenter.heroku.com/articles/adept-scale
See http://hirefire.io/ to automatically scale your workers on Heroku.
I'm testing an app with a worker and a web dyno on Heroku's free tier, and I'd like to keep the worker alive to execute background tasks while letting the web dyno idle. By default they both go idle after 30 minutes, even if I have things queued on the worker.
I understand there are ways to keep the web dyno alive (and, with it, the worker as well), and there are ways to keep the web dyno alive while scaling down the worker. However, I need the worker alive and the web dyno idle.
I tried running a recurring job on the worker which would
Restart the dyno.
Scale the dyno down and then back up.
Both approaches worked (as in they restarted and scaled the dyno correctly), but the worker dyno would still idle after 30 minutes, as if it were tied to the web dyno. Edit: yep, that's pretty much the case, as explained here: https://devcenter.heroku.com/articles/free-dyno-hours#dyno-sleeping
I could do this from the outside, but it seems I'd have to constantly check the state, since a new restart doesn't seem to give me another 30 minutes of headway. I'd also have to expose the API key, which I'd like to avoid.
If I've understood you correctly, you're trying to stop the web dyno and leave the worker dyno alive.
You can do that from the Resources tab of your app's dashboard: in the 'web' section, press the pencil icon, toggle the dyno off, and press 'Confirm'.
As a workaround, I currently remove the web dyno and explicitly enable it when I need it (the same thing can be done from the CLI with heroku ps:scale web=0). As explained here:
Worker-only Free dynos do not sleep since they do not respond to web
requests.
My workaround was to just create two apps that deploy automatically from the same repository. Then, all you would need to do is enable the worker dyno for one and the web dyno for the other.
I am new to Go and I am using goroutines in my app on Heroku. They are long-running (up to 7 minutes) and cannot be interrupted.
I saw that the autoscaler sometimes kills the Heroku dyno which is running the routine. I need a way of running this routine independently from the dynos so I know that it will not get shut down. I have read articles and still don't understand how to run a goroutine in a background worker. It is hard for me to believe I am the only one experiencing this.
My goroutines use my Redis database.
Could someone please point me to an example of how to set up a background worker on Heroku for Go, and how to send my goroutine to that worker?
Thank you very much
I need a way of running this routine independently from the dynos so I
know that it will not get shut down.
If you don't want to run your worker code on a dyno, then you'll need to use a provider other than Heroku, such as Amazon AWS, DigitalOcean, Linode, etc.
Having said that, you should design your workers, especially mission-critical ones, to be able to recover from a shutdown, either by continuing where they left off or by starting over. Heroku's dyno manager restarts dynos at least once a day, and I wouldn't be surprised if other cloud providers also restart their virtual instances once in a while, probably not daily, but still. And even if you deploy your workers on a physical machine that you control and never turn off, you cannot prevent things like hardware failures or power outages.
If your workers need to run a task to completion, you need to make them aware of possible shutdowns and have them handle such scenarios gracefully. Do not ever rely on a machine, physical or virtual, to keep running while your worker is doing its job.
For example, if you're on Heroku, use a worker dyno and have your worker listen for the SIGTERM signal. After your worker receives such a signal...
The application processes have 30 seconds to shut down cleanly
(ideally, they will do so more quickly than that). During this time
they should stop accepting new requests or jobs and attempt to finish
their current requests, or put jobs back on the queue for other worker
processes to handle. If any processes remain after that time period,
the dyno manager will terminate them forcefully with SIGKILL.
... continue reading here.
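To make this concrete, here is a minimal sketch of a Go worker that handles SIGTERM along those lines (processNextJob is a made-up placeholder; a real worker might pop jobs from Redis and checkpoint its progress):

    package main

    import (
        "context"
        "log"
        "os/signal"
        "syscall"
        "time"
    )

    func main() {
        // ctx is cancelled when the dyno receives SIGTERM from Heroku.
        ctx, stop := signal.NotifyContext(context.Background(), syscall.SIGTERM)
        defer stop()

        for {
            select {
            case <-ctx.Done():
                // Roughly 30 seconds from here until SIGKILL: re-queue or
                // checkpoint any in-flight work, then exit cleanly.
                log.Println("SIGTERM received, shutting down")
                return
            default:
                processNextJob()
            }
        }
    }

    // processNextJob stands in for real work; a real worker would pull a
    // job from a queue and record progress so it can resume after a restart.
    func processNextJob() {
        time.Sleep(time.Second)
    }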
But keep in mind, as I mentioned earlier: if there is an outage and Heroku goes down, which happens from time to time, your worker won't even get those 30 seconds to clean up.
Very similar question to Is it feasible to run multiple processes on a Heroku dyno?, or Running Heroku background tasks with only 1 web dyno and 0 worker dynos, except that I'm asking about a Ruby on Rails app.
Context:
I understand that separating worker and web dynos is encouraged... but I'm still testing and don't want to incur the expense. Especially because, with my app, web requests pretty much all happen either in the morning or in the evening, and during the whole middle of the day (and also the middle of the night) literally nothing is happening.
I'd like the web dyno to run two types of background processing during that downtime:
A recurring, long-running task every day (mailings)
An asynchronous, long-running task that is triggered when a user performs a certain action (it's a mailer)
I've done quite a bit of reading on this, but this is my first time doing anything asynchronous, so I wanted to ask the community a couple of questions just to ensure what I'm trying to do is feasible.
Questions
How do I do activity #1 for free?
To put it bluntly... considering my context above, using Heroku's Scheduler add-on runs a one-off dyno, which I'll be charged for: I currently use New Relic to constantly ping my web dyno so that it never sleeps, which means my one always-on web dyno already consumes my free dyno hours. Is there another way of doing this with the one web dyno, which won't be processing any requests in the middle of the night? Alternatively, is there a way to tell New Relic to stop pinging at certain times, which would let the web dyno sleep and allow me to spin up a one-off dyno while still staying within my free dyno hours?
For activity #2, I'm thinking of using Delayed Job, but how do I tell Delayed Job to wait until the end of user 1's session, then run the mailer for user 1, pause again if user 2 sends a request, pick up user 1's mailer where it left off once user 2 is finished, then do user 2's mailer, and so forth? I think the root of the challenge is that, from what I've read, Delayed Job needs to be started with a script, but I'm not going to be at my computer starting a script all the time. How do I make the start (and the queueing illustrated above) happen automatically?
I'd love even just directional pointers on what methods to use, what to consider, etc.
I'm going to check out a nifty gem, sucker_punch (https://github.com/brandonhilkert/sucker_punch), to do this. According to the author, it was written specifically for hobby websites on Heroku's single dyno that have no need to spin up another dyno. It basically creates another thread.
FYI also, there is a companion gem that allows sucker_punch to run recurring tasks, fist_of_fury: https://github.com/facto/fist_of_fury
Some image resize jobs failed to exit when our Heroku background worker was restarted.
The job is stuck on the Busy page of the UI. It looks like it is occupying one of the busy threads and was started over an hour ago.
But upon inspecting the job args and checking the DB, it looks like the images were actually processed, so maybe it's just Redis, or the web UI is showing stale data.
The TID and JID are osuuiyruo and 8e25ebc62ae7d7023a9b5650.
Is there any way to remove these "stuck" jobs? I tried quieting the workers and stopping them, and then scaling the Heroku workers to 0 and bringing them back up, but the jobs stay stuck in the busy list.
Please state your Sidekiq version in the future.
If you are on 3.x, this will be fixed in 3.1.4. https://github.com/mperham/sidekiq/issues/1764
I'm hoping the community can clarify something for me, and that others can benefit.
My understanding is that Gunicorn worker processes are essentially virtual replicas of Heroku web dynos. In other words, Gunicorn's worker processes should not be confused with Heroku's worker processes (e.g. Django Celery tasks).
This is because Gunicorn worker processes are focused on handling web requests (essentially scaling up the throughput of the Heroku web dyno), while Heroku worker dynos specialize in long-running background tasks such as remote API calls.
I have a simple Django app that makes decent use of remote APIs, and I want to optimize the resource balance. I am also querying a PostgreSQL database on most requests.
I know that this is very much an oversimplification, but am I thinking about things correctly?
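For concreteness, the split described above usually shows up as two separate process types in the Procfile. A hedged sketch, with made-up module and app names:

    web: gunicorn mysite.wsgi --workers 3
    worker: celery -A mysite worker --loglevel=info

Here the web line is one Heroku web dyno running three Gunicorn workers to serve HTTP requests concurrently, while the worker line is a separate Heroku worker dyno for long-running background tasks.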
Some relevant info:
https://devcenter.heroku.com/articles/process-model
https://devcenter.heroku.com/articles/background-jobs-queueing
https://devcenter.heroku.com/articles/django#running-a-worker
http://gunicorn.org/configure.html#workers
http://v3.mike.tig.as/blog/2012/02/13/deploying-django-on-heroku/
https://docs.djangoproject.com/en/dev/howto/deployment/wsgi/gunicorn/
Other Quasi-Related Helpful SO Questions for those researching this topic:
Troubleshooting Site Slowness on a Nginx + Gunicorn + Django Stack
Performance degradation for Django with Gunicorn deployed into Heroku
Configuring gunicorn for Django on Heroku
To provide an answer and save people from having to search through the comments: a dyno is like an entire computer. Using the Procfile, you give each of your dynos one command to run, and it cranks away on that command, re-running it periodically to refresh it and re-running it when it crashes. As you can imagine, it's rather wasteful to dedicate an entire computer to a single-threaded web server, and that's where Gunicorn comes in.
The Gunicorn master process does nothing but act as a proxy server, spawning a given number of copies of your application (workers) and distributing HTTP requests amongst them. This takes advantage of the fact that each dyno actually has multiple cores. As someone mentioned, the number of workers you should choose depends on how much memory your app takes to run.
Contrary to what Bob Spryn said in the last comment, there are other ways of exploiting this opportunity for parallelism, namely running separate servers on the same dyno. The easiest way is to make a separate sub-Procfile and run the all-Python Foreman equivalent, Honcho, from your main Procfile, following these directions (see the sketch below). Essentially, in this case your single dyno command is a program that manages multiple single commands. It's kind of like being granted one wish from a genie, and making that wish be for four more wishes.
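A hedged sketch of what that can look like; the sub-Procfile name and the commands are made up for illustration, and the exact flag placement may vary by Honcho version:

Procfile:

    web: honcho -f ProcfileHoncho start

ProcfileHoncho:

    web: gunicorn mysite.wsgi
    worker: python worker.py

Honcho multiplexes the children's output and forwards shutdown signals to them, so Heroku sees a single 'web' command while two processes actually run.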
The advantage is that you get to use your dynos' full capacity. The disadvantage is that you lose the ability to scale individual parts of your app independently when they share a dyno. When you scale the dyno, it scales everything you've multiplexed onto it, which may not be desired. You will probably have to use diagnostics to decide when a service deserves its own dedicated dyno.