Flaky jobs on Heroku using Sidekiq/Redis

We have been having issues with flaky jobs recently on Heroku using Sidekiq and Redis. We use sidekiq-cron, and when testing locally all of our jobs execute on schedule without issue. On Heroku, however, jobs occasionally fail to run overnight, and when I check the cron tab in the Sidekiq web UI it says "no scheduled jobs were found". If I re-push our master branch (empty commit) and redeploy, the schedule tab comes back in the web UI and everything starts running again.

In our Heroku metrics I have noticed that, overnight, our Sidekiq worker has been running at about 180% of its memory quota. Could this be causing the intermittent job issues we are seeing, where things just don't run? I'm at a loss for what to do here, since I haven't seen any errors other than customers intermittently reporting that they aren't getting notification emails. I have just scaled our Sidekiq worker dynos from 1 to 2 and am hoping that fixes things, as we run a fairly large number of jobs overnight.
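One thing worth checking (a sketch, not a confirmed diagnosis for this setup): sidekiq-cron keeps its schedule in Redis, so if the Redis instance is under memory pressure and uses an eviction policy such as allkeys-lru, the schedule keys can be evicted, which would match the empty cron tab until a redeploy re-registers them. Re-registering the schedule in a Sidekiq server initializer means any restart restores it. A minimal sketch, assuming a Rails app with a config/schedule.yml and the sidekiq-cron gem (file names here are illustrative):

```ruby
# config/initializers/sidekiq.rb
# Sketch: re-register the cron schedule on every boot, so a dyno restart
# (or Redis key eviction) doesn't leave the schedule empty.
require "sidekiq"
require "sidekiq-cron"

Sidekiq.configure_server do |config|
  schedule_file = Rails.root.join("config", "schedule.yml")
  if File.exist?(schedule_file)
    schedule = YAML.load_file(schedule_file)
    Sidekiq::Cron::Job.load_from_hash(schedule)
  end
end
```

`Sidekiq::Cron::Job.load_from_hash` replaces the stored schedule with the one from the YAML file, so the Redis copy is rebuilt on each server boot rather than only at initial setup.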

Related

Heroku: Prevent worker process from restarting?

I have a Heroku worker set up to do a long-running job that iterates over long periods. However, whenever I update and deploy other files in the repo, this worker restarts, which is annoying. Is there any way to avoid this?
No. This behaviour is part of Heroku's Automatic Dyno Restarting.
You can't work around this. Instead, you need to build all parts of your app to be able to function properly despite the fact that all dynos will restart at least once every 24 hours or so, whether or not you deploy updates in your repo.
Most significantly, you need to build support for Graceful Shutdown into all your processes (e.g. web process and worker processes).
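For a plain Ruby worker process, graceful shutdown boils down to trapping the SIGTERM that Heroku sends before every dyno restart and letting the work loop finish its current unit. A minimal generic sketch (plain Ruby, not tied to any particular worker library; Heroku gives a process roughly 30 seconds after SIGTERM before sending SIGKILL):

```ruby
# Sketch of a worker loop that exits cleanly on SIGTERM.
shutting_down = false
Signal.trap("TERM") { shutting_down = true }

worker = Thread.new do
  until shutting_down
    # Do one small unit of work per iteration so the loop can stop between
    # units instead of being killed mid-job.
    sleep 0.05
  end
  # Checkpoint or re-enqueue unfinished work here before the process exits.
end

# Simulate the platform's restart signal (on Heroku the platform sends this):
Process.kill("TERM", Process.pid)
worker.join
```

Sidekiq itself already does this for its own workers (TERM triggers a graceful shutdown); the pattern above is for custom long-running processes you run alongside it.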

how to remove sidekiq stuck job on heroku?

Some image resize jobs failed to exit when our heroku background worker was restarted.
The job is stuck on the Busy page of the UI. It looks like it is occupying one of the busy threads, and it was started over an hour ago.
But upon inspecting the job args and checking the DB, it looks like the images were actually processed, so maybe it's just Redis, or the web UI contains wrong data.
Given the TID and JID: osuuiyruo 8e25ebc62ae7d7023a9b5650
Is there any way to remove these "stuck" jobs? I tried quieting the workers and stopping them, and then scaling Heroku workers to 0 and bringing them back up, but they stay in that stuck busy queue.
Please state your Sidekiq version in the future.
If you are on 3.x, this will be fixed in 3.1.4. https://github.com/mperham/sidekiq/issues/1764

Upgrade process on Heroku?

If I update an application running on Heroku using git push and this application is running on multiple dynos - how is the upgrade process run by Heroku?
All dynos at the same time?
One after another?
...?
In other words: Will there be a down-time of my "cluster", or will there be a small time-frame where different versions of my app are running in parallel, or ...?
Well, I can't speak to Heroku's internal state, but here is what I have experienced:
Code push completes.
Code is compiled (the slug is built).
After that, all dynos get the latest code and are restarted (a restart takes up to 30 seconds or so, and during this time all requests are queued).
So there should be no downtime: during the restart all requests are queued, and I don't think multiple versions of your code will be running after the deployment.
Everyone says there's 'no downtime' when updating a Heroku app, but for your app this may not be true.
I've recently worked on a reasonably sized Rails app that takes at least 25 seconds to start, and often fails to start inside the 30 seconds that Heroku allows before returning errors to your clients.
During this entire time, your users are waiting for something to happen. 30 seconds is a long time, and they may not be patient enough to wait.
Someone once told me that if you have more than one dyno, they are restarted individually to give you zero downtime. This is not true - Heroku stops all dynos and then starts all dynos.
At no time will there be two versions of your app running on Heroku.

Is there any down time when committing to a clojure app running on Heroku?

Is there any potential downtime when I do a commit to a clojure/Java app running on Heroku?
I am guessing not - but can't find out for sure.
Thanks.
When you push to Heroku, you invoke the slug compiler, which does all the heavy lifting needed to turn your application into a self-contained archive. That can take a little while, as you see whenever you run git push. However, during this time, your application is running normally.
When your slug finishes compiling, Heroku then pushes it out to the dyno grid. This causes existing web dynos to stop and causes new ones to start. Your application will be unresponsive between the time that the old dynos stop and the new ones begin serving requests -- probably only a few seconds. During this interval, Heroku's routing layer will queue incoming requests.
TL;DR: users might notice a pause (but not an error!) as your application is updated. You can simulate this at any time by running heroku restart.

Rufus scheduler tasks on heroku running more often than scheduled

I have a Rails app running on heroku with Rufus Scheduler added on.
A 'once a day' task in the scheduler is running more often than once a day.
My guess would be something to do with the heroku app running on different dynos during the day, but I'm at a loss on how to confirm/fix the problem.
Has anyone else seen this/know of a solution?
Edit: I couldn't resolve the problem with the gem and have moved my app over to the Heroku Scheduler add-on, which does not experience this problem.
The Heroku Scheduler isn't guaranteed; it's only a simple scheduling system designed to fill a gap. It has nothing to do with your application moving between dynos, as it's a separate management system that spins up one-off processes.
If timeliness is essential to you, take a look at clockwork, which will let you configure all sorts of stuff, but also give you a bit more reliability (at the expense of having a clock process running).
If this won't do, simply rework your job so that it doesn't matter how often it runs (i.e. make it idempotent).
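The clockwork approach mentioned above can be sketched roughly as follows: a small clock file that a dedicated process runs, which only enqueues work rather than doing it inline. Assuming the clockwork gem and a hypothetical worker class (the job name and Procfile entry here are illustrative):

```ruby
# clock.rb -- run as a dedicated process, e.g. in the Procfile:
#   clock: bundle exec clockwork clock.rb
require "clockwork"

module Clockwork
  # Fire once a day at a fixed time. Keep the block fast: enqueue the real
  # work into a background queue instead of running it in the clock process.
  every(1.day, "daily_report.job", at: "03:00") do
    # DailyReportWorker.perform_async  # hypothetical Sidekiq worker
  end
end
```

Because only one clock process runs, you avoid the duplicate-firing problem that in-dyno schedulers like rufus-scheduler hit when the app runs on multiple dynos.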
