How to remove a stuck Sidekiq job on Heroku?

Some image resize jobs failed to exit when our Heroku background worker was restarted.
The jobs are stuck on the Busy page of the web UI. They look like they are occupying some of the busy threads and were started over an hour ago.
But upon inspecting the job args and checking the DB, it looks like the images were actually processed, so maybe it's just Redis, or the web UI is showing stale data.
The TID is osuuiyruo and the JID is 8e25ebc62ae7d7023a9b5650.
Is there any way to remove these "stuck" jobs? I tried quieting the workers and stopping them, and then scaling the Heroku workers to 0 and bringing them back up, but they stay in that stuck busy list.

Please state your Sidekiq version in the future.
If you are on 3.x, this will be fixed in 3.1.4. https://github.com/mperham/sidekiq/issues/1764
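If you want to confirm what the Busy page is reporting before touching anything, Sidekiq's API can list the in-progress work it has recorded in Redis. This is a minimal read-only sketch, assuming Sidekiq 3.x or later and using the JID from the question purely as an example:

    require 'sidekiq/api'
    require 'json'

    # Each entry is (process identity, thread id, work hash) as recorded in Redis.
    Sidekiq::Workers.new.each do |process_id, thread_id, work|
      payload = work['payload']
      payload = JSON.parse(payload) if payload.is_a?(String)  # newer Sidekiq versions pre-parse this
      if payload['jid'] == '8e25ebc62ae7d7023a9b5650'
        puts "Stale entry on #{process_id} / thread #{thread_id}, started #{Time.at(work['run_at'])}"
      end
    end

You shouldn't normally need to delete these entries by hand: the Busy page is built from per-process heartbeat data, so once the owning process is really gone (and you are on a version with the fix above), the stale entries should clear on their own.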

Related

Flaky jobs on Heroku using Sidekiq/Redis

We have been having issues with flaky jobs recently on Heroku using Sidekiq and Redis. We are using sidekiq-cron, and when testing locally all our jobs execute fine, on schedule and without issue. However, we have issues with jobs occasionally not running overnight, and when I check the cron tab in our Sidekiq web viewer it says "no scheduled jobs were found". If I re-push our master branch (empty commit) and re-deploy, the schedule tab comes back in the web view and everything starts running again.
In our Heroku metrics I have noticed that, overnight, our Sidekiq worker has been running at about 180% of its max memory capacity. Could this be causing the intermittent job issues we are seeing where things "just don't run"? I'm at a loss for what to do here, since I haven't seen any errors other than customers intermittently saying that they aren't getting notification emails. I have just scaled our Sidekiq worker dyno from 1 to 2 and am hoping that fixes things, as we are running a fairly large number of jobs overnight.
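One thing that may be worth checking here: sidekiq-cron keeps its schedule in Redis, so if those keys are ever lost (for example under memory pressure), or the schedule is only registered once at boot, the cron tab will show "no scheduled jobs were found" until a redeploy reloads it. A common pattern is to load the schedule from a YAML file every time a Sidekiq server process starts; this is only a sketch, assuming a config/schedule.yml and the sidekiq-cron gem:

    # config/initializers/sidekiq.rb (path assumed)
    require 'sidekiq'
    require 'sidekiq/cron/job'
    require 'yaml'

    schedule_file = 'config/schedule.yml'

    # Only server (worker) processes should register cron jobs, not web dynos.
    if File.exist?(schedule_file) && Sidekiq.server?
      Sidekiq::Cron::Job.load_from_hash YAML.load_file(schedule_file)
    end

Also, running at ~180% of the dyno's memory quota means the worker is swapping (Heroku logs R14 errors for this), which by itself can make scheduled jobs fire late or appear not to run, so the memory issue is worth addressing regardless.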

Performing goroutines in the background

I am new to Go and I am using goroutines in my app on Heroku. They are long-running (up to 7 minutes) and cannot be interrupted.
I saw that the autoscaler sometimes kills the Heroku dyno that is running the goroutine. I need a way of running this routine independently from the dynos so I know that it will not get shut down. I have read articles and still don't understand how to run a goroutine on a background worker. It is hard for me to believe I am the only one experiencing this.
My goroutines use my Redis database.
Could someone please point me to an example of how to set up a background worker on Heroku for Go, and how to send my goroutine to that worker?
Thank you very much.
I need a way of running this routine independently from the dynos so I know that it will not get shut down.
If you don't want to run your worker code on a dyno, then you'll need to use a provider other than Heroku, such as Amazon AWS, DigitalOcean, Linode, etc.
Having said that, you should design your workers, especially the mission-critical ones, to be able to recover from a shutdown, either by continuing where they left off or by starting over. Heroku's dyno manager restarts the dynos at least once a day, and I wouldn't be surprised if other cloud providers also restart their virtual instances once in a while, probably not once a day, but still... And even if you decide to deploy your workers on a physical machine that you control and never turn off, you cannot prevent things like hardware failure or a power outage from happening.
If your workers need to perform some task until it's done, you need to make them aware of possible shutdowns and have them handle such scenarios gracefully. Never rely on a machine, physical or virtual, to keep running while your worker is doing its job.
For example, if you're on Heroku, use a worker dyno and make your worker listen for the SIGTERM signal. After your worker receives such a signal...
The application processes have 30 seconds to shut down cleanly (ideally, they will do so more quickly than that). During this time they should stop accepting new requests or jobs and attempt to finish their current requests, or put jobs back on the queue for other worker processes to handle. If any processes remain after that time period, the dyno manager will terminate them forcefully with SIGKILL.
You can continue reading in Heroku's documentation on dyno shutdown behavior.
But keep in mind, as I mentioned earlier, that if there is an outage and Heroku goes down, which is something that happens from time to time, your worker won't even have those 30 seconds to clean up.
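Sidekiq and delayed_job already handle TERM for Ruby workers, but the pattern itself is small. A rough sketch of the idea, shown in Ruby (in Go the equivalent uses os/signal); fetch_job, process, and requeue are placeholders for whatever queue client you use:

    # Trap TERM and finish or re-queue the current item before Heroku's
    # 30-second grace period runs out (after that it sends SIGKILL).
    shutting_down = false
    Signal.trap('TERM') { shutting_down = true }

    until shutting_down
      job = fetch_job            # e.g. a blocking pop with a short timeout, so the flag is checked regularly
      next if job.nil?
      begin
        process(job)
      rescue StandardError
        requeue(job)             # put it back so another worker can pick it up
      end
    end
    # Exit promptly here; anything still running when SIGKILL arrives is lost.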

Heroku: Prevent worker process from restarting?

I have a Heroku worker set up to do a long-running job that iterates over long periods. However, whenever I update and deploy other files in the repo, this worker restarts, which is annoying. Is there any way to avoid this?
No. This behaviour is part of Heroku's Automatic Dyno Restarting.
You can't work around this. Instead, you need to build all parts of your app to be able to function properly despite the fact that all dynos will restart at least once every 24 hours or so, whether or not you deploy updates in your repo.
Most significantly, you need to build support for Graceful Shutdown into all your processes (e.g. web process and worker processes).
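For a worker that iterates over a long collection, functioning properly despite restarts usually comes down to persisting a cursor so the next run resumes where the last one stopped. A rough sketch, assuming progress is kept in Redis; the worker:cursor key, items, and process are made-up names:

    require 'redis'

    redis = Redis.new(url: ENV['REDIS_URL'])

    # Resume from the last saved position (0 on the very first run).
    cursor = redis.get('worker:cursor').to_i

    items.drop(cursor).each_with_index do |item, offset|
      process(item)
      # Save progress after each item so a dyno restart loses at most one step.
      redis.set('worker:cursor', cursor + offset + 1)
    end

    redis.del('worker:cursor')   # finished; start from the beginning next time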

Heroku delayed_job workers killed during deployment

On Heroku, I use delayed_job to run asynchronous tasks. All is well until I do a git push heroku master, and then the Heroku environment kills any worker threads that are mid-job.
The issue here is that those jobs never get re-queued, since the delayed_job table in my DB shows them as still locked and running, even though the workers that had been servicing them are long dead.
How do I prevent this situation from occurring? I'd like Heroku to wait for all in-progress delayed jobs to complete or error out before shutting down, or at least to terminate them and let a new worker pick them up once the dynos come back up after the deploy.
It looks like you can configure delayed_job to handle SIGTERM and mark the in-progress jobs as failed (so they'll be retried):
Use this setting to throw an exception on TERM signals by adding this in your initializer:
    Delayed::Worker.raise_signal_exceptions = :term
More info in this answer:
https://stackoverflow.com/a/16811844/1715829
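For context, that setting lives in a delayed_job initializer next to the rest of the worker configuration; roughly like this (the file path and the max_attempts value are just examples):

    # config/initializers/delayed_job.rb (path assumed)
    # Raise an exception when the dyno sends TERM, so the in-progress job is
    # unlocked and marked failed, letting the next worker retry it.
    Delayed::Worker.raise_signal_exceptions = :term

    # Optional: how many times a failed (including interrupted) job is retried.
    Delayed::Worker.max_attempts = 5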

How to monitor coffee-resque workers with resque-web

I created some workers with coffee-resque and tried to view them using the Ruby version of resque-web, but I only saw workers intermittently flash in and out.
I noticed that coffee-resque untracks workers while they are paused. Is that the intended behavior? It means resque-web only listed the intermittently flashing workers, and they always had a status of "waiting" when they did appear, even though that was when they were processing.
Am I doing it wrong, or is there a suggested way of monitoring the worker queues?
Also, is there a way to clean up the inactive, orphaned worker keys in Redis if the worker process failed and didn't do a graceful untrack on exit?
I recently provided a pull request that fixed this issue. It has been accepted into coffee-resque and a new version was released.
https://github.com/technoweenie/coffee-resque/issues/17
This fix was released as 0.1.6.
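On the orphaned-keys part of the question: resque-web reads the worker list straight from Redis, so keys left behind by a crashed process have to be pruned manually. From a Ruby console with the resque gem loaded, something like this works, with the caveat that it unregisters every worker Redis knows about:

    require 'resque'

    # Drop every registered worker key from Redis. Workers register themselves
    # at startup, so stop your worker processes first, run this, then start
    # them back up so the live ones re-register cleanly.
    Resque.workers.each(&:unregister_worker)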
