sidekiq - avoid clearing delayed_until jobs - ruby

I have Sidekiq running on Heroku that does a lot of syncing with users' emails, etc.
Every so often, we were getting the following error:
Error R14 (Memory quota exceeded)
To counter this, I created a rake task that is executed by the heroku scheduler.
The rake task restarts all the dynos and flushes all the Sidekiq jobs from Redis with this code:
Sidekiq.redis { |r| r.flushall }
I have a new requirement whereby users want to schedule certain jobs to run in the future like this:
DeliverEmail.delay_until(email.send_time).perform_async(email.id)
Am I right in saying that the FLUSHALL in the above code sample would flush any scheduled jobs that have been created?
If that is the case, is there anything I can do to avoid this?

When you send a Redis FLUSHALL command, it truncates the entire Redis datastore. This is a dangerous thing to do and probably not what you want.
It sounds like what you want is to clear some types of enqueued work while preserving others. You will need to flush each queue you are using, most likely just the default queue unless you've set up others:
Sidekiq::Queue.new('default').clear
This will remove the queue within redis, but preserve your scheduled jobs, statistics, and other data within redis.
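For illustration, here's a minimal sketch (assuming a Rails console or rake-task context and the standard Sidekiq API) of clearing only the default queue and then confirming the scheduled set is untouched:

require 'sidekiq/api'

# Remove only the jobs waiting in the default queue.
Sidekiq::Queue.new('default').clear

# Scheduled jobs live in a separate sorted set in Redis, so anything
# created with delay_until / perform_at is still there.
puts Sidekiq::ScheduledSet.new.size

If you also need to purge retries, Sidekiq::RetrySet.new.clear works the same way without touching the scheduled set.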

Related

Laravel cron/queue/workers setup on multiple servers

I've got multiple servers sharing a database; on each of them a cron job fires every 5 minutes, checking whether a text message log entry exists. If it doesn't, it creates the log entry and sends out a text message. I thought there would never be a situation where text messages are sent multiple times, as one server should always be first.
Well - I was wrong and that scenario did happen:
A - check if log exists - it doesn't
B - check if log exists - it doesn't
A - create log
B - create log
A - send message
B - send message
I've changed this behaviour to introduce a queue, which should mitigate the issue. While the crons will still fire, multiple jobs will be queued, and workers should pick up the jobs at different times, thus preventing the message from being sent twice. Though it might as well end up being:
A - pick up job 1
B - pick up job 2
A - check if log exists - it doesn't
B - check if log exists - it doesn't
Etc., or A and B might even pick up the same job at exactly the same time.
The solution would be, I guess, to run one worker server. But then I have the situation where jobs from multiple servers are queued many times, and I can't check whether they're already enqueued, as we end up with the first scenario again.
I'm at a loss on how to proceed here: while a multiple-server, single-worker-server setup will work, I don't want to end up with instances of the same job (coming from different servers) sitting in the queue multiple times.
Maybe the solution to go for is to have one cron/queue/worker server, but I don't have experience with Laravel in a multi-server environment to set it up.
The other problematic thing for me is: how do I test this? I can't, I guess, test it locally unless there's a way to spin up VM instances that are synchronized with each other.
The easy answer:
The code that checks the database for the existing log entry could use a database transaction with an isolation level high enough to make sure that everyone else trying to do the same thing at the same time is blocked and has to wait for the job to finish/commit.
A really naive solution (assuming MySQL) would be LOCK TABLES entries WRITE; followed by the logic, then UNLOCK TABLES when you're done.
This also means that no one else can access the table while your job is doing the check. I hope the check is really quick, because you'll be blocking all access to the table for a short period every five minutes.
WRITE lock:
The session that holds the lock can read and write the table.
Only the session that holds the lock can access the table. No other session can access it until the lock is released.
Lock requests for the table by other sessions block while the WRITE lock is held.
Source: https://dev.mysql.com/doc/refman/5.7/en/lock-tables.html
That was a really boring answer, so I'll move on to the answer you're probably more interested in...
The server architecture answer:
Your wish to have only one job per time interval in your queue means that you should have only one machine dispatching the jobs. This is most easily done with one dedicated machine that only dispatches jobs from scheduled commands. (Laravel 5.5 introduced the ability to dispatch jobs directly from the scheduler; see Scheduling Queued Jobs.)
You can then have several worker machines processing the queue, and only one of them will pick up the job and execute it. Two worker machines will never execute the same job at the same time if everything works as usual*.
I would split the web machines from the worker machines so that they can scale independently. I prefer having my web machines dedicated to web traffic; they do not process jobs, which makes sure that a large backlog of queued jobs won't affect my HTTP response times.
So, I recommend the following machine types in your setup:
The scheduler - one single machine that runs the schedule and dispatches jobs.
Worker machines that handles your queue.
Web machines that handles visitors' traffic.
All machines will have identical source code for your Laravel application, and an identical configuration. The only thing that is unique per machine type is...
The scheduler has php artisan schedule:run in the crontab.
The workers have supervisor (or something similar) that runs php artisan queue:work.
The web servers have nginx + php-fpm and handles incoming web requests.
This setup will make sure that you get only one job per 5-minute interval, since there is only one machine pushing it. It will also make sure that the CPU load generated by the workers doesn't affect the web requests.
One issue with my answer is obvious: that single scheduler machine is a single point of failure. If it dies, these scheduled jobs will no longer be dispatched to the queue. That touches areas like server monitoring and health checks, which are out of scope for your question and also highly dependent on your hosting provider.
Regarding that little asterisk: I can make up weird scenarios where a job is executed on several machines. They involve jobs that sleep for longer than the timeout, in an environment without support for terminating the job. That will cause the first worker to keep executing the job (since it cannot be terminated), while a second worker considers the job timed out and retries it.
Since Laravel 5.6 you can ensure your scheduled tasks only run on a single server by using the onOneServer method, e.g.
$schedule->command('loggingTask')
->everyFiveMinutes()
->onOneServer();
This requires an APC or Redis cache to be set up because it seems to use a mutual exclusion lock, probably RedisLock if Redis is set up.
Using a queue you shouldn't really have such a problem because popping a task off a queue should be an atomic operation.

Sidekiq - view completed jobs

Is it possible to somehow view Sidekiq's completed job list, for example, to find all PurchaseWorkers with params (1)? Yesterday in my app a delayed method that was supposed to run didn't, and the associated entity (let's say a 'purchase') got stuck in limbo in the "processing" state. I am trying to understand the reason: was the job not enqueued at all, or was it enqueued but exited unexpectedly for some reason? There were no errors in the Sidekiq log.
Thanks.
This is old, but I wanted to see the same thing since I'm not sure whether jobs I scheduled ran or not!
It turns out Sidekiq doesn't have anything built in to see jobs that completed, and it still doesn't seem to.
If a job errored and never completed, it should be in the 'dead' set. But checking that something actually ran seems to be beyond Sidekiq by default.
The FAQ suggests installing third-party plugins to track and log information: https://github.com/mperham/sidekiq/wiki/FAQ#how-can-i-tell-when-a-job-has-finished One of them provides a callback for follow-up work (maybe adding a record of completed jobs somewhere else?).
You can also set up Sidekiq to log somewhere other than STDOUT (the default) so you can output log information about your jobs. In this case, log that a job completed, or catch errors if for some reason it never lands in the retry or dead set when there is a problem. See https://github.com/mperham/sidekiq/wiki/Logging
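As a rough illustration of that logging approach, here is a sketch of a custom server middleware that writes a line whenever a job finishes; the class name and log format are my own, not something Sidekiq ships with:

# Server middleware wraps every job execution on the worker side.
class CompletedJobLogger
  def call(worker, job, queue)
    yield # run the job itself
    Sidekiq.logger.info("completed #{job['class']} jid=#{job['jid']} args=#{job['args'].inspect}")
  end
end

Sidekiq.configure_server do |config|
  config.server_middleware do |chain|
    chain.add CompletedJobLogger
  end
end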
To see jobs still in the queue, you can use the Rails console and look at the queue by name: https://www.rubydoc.info/gems/sidekiq/Sidekiq/Queue
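For example, to look for the job from the question (assuming the worker class is PurchaseWorker and it was enqueued with the argument 1), something like this in a Rails console scans the queue:

require 'sidekiq/api'

Sidekiq::Queue.new('default').each do |job|
  puts job.jid if job.klass == 'PurchaseWorker' && job.args == [1]
end

Note that this only finds jobs still waiting in the queue; once a job has been processed successfully its data is deleted from Redis, which is exactly why completed jobs can't be listed this way.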
One option is the built-in stats provided by Sidekiq: https://github.com/mperham/sidekiq/wiki/Monitoring#using-the-built-in-dashboard
The best option is to use the Web UI provided here: https://github.com/mperham/sidekiq/wiki/Monitoring#web-ui

Heroku delayed_job workers killed during deployment

On Heroku, I use delayed_job to run asynchronous tasks. All is well until I do a git push heroku master, and then the Heroku environment kills any worker threads that are mid-job.
The issue here is that those jobs never get re-queued since the delayed_job table in my db shows them as still locked and running, even though the workers that used to be servicing them are long dead.
How do I prevent this situation from occurring? I'd like Heroku to wait for all in-progress delayed jobs to complete or error out before shutting down, or at least to terminate them and allow a new worker to be assigned to them once the dynos come back up after my update is applied.
Looks like you can configure DJ to handle SIGTERM and mark the in-progress jobs as failed (so they'll be restarted again):
Use this setting to throw an exception on TERM signals by adding this in your initializer:
Delayed::Worker.raise_signal_exceptions = :term
More info in this answer:
https://stackoverflow.com/a/16811844/1715829
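For reference, a minimal sketch of that initializer (the file path is just a conventional choice, not a requirement):

# config/initializers/delayed_job.rb
# When Heroku sends SIGTERM during a deploy or restart, raise inside the
# running job so Delayed::Job marks it as failed and unlocks it for retry.
Delayed::Worker.raise_signal_exceptions = :term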

how to remove sidekiq stuck job on heroku?

Some image resize jobs failed to exit when our Heroku background worker was restarted.
The job is stuck on the Busy page of the UI. It looks like it is occupying one of the busy threads and was started over an hour ago.
But upon inspecting the job args and checking the DB, it looks like the images were actually processed, so maybe it's just Redis, or the web UI contains wrong data.
Given the TID and JID: osuuiyruo 8e25ebc62ae7d7023a9b5650
Is there any way to remove these "stuck" jobs? I tried quieting the workers and stopping them, then scaling the Heroku workers to 0 and bringing them back up, but they stay in that stuck busy list.
Please state your Sidekiq version in the future.
If you are on 3.x, this will be fixed in 3.1.4. https://github.com/mperham/sidekiq/issues/1764
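If you want to inspect the raw data behind the Busy page in the meantime, here is a rough sketch for a Rails console using the public Sidekiq::Workers API (the entries are heartbeat/work data stored in Redis rather than a live view of your dynos, which is why they can show jobs that already finished):

require 'sidekiq/api'

Sidekiq::Workers.new.each do |process_id, thread_id, work|
  puts "#{process_id} #{thread_id} #{work['payload']}"
end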

Reliable persisted sidekiq task

I am working on a ruby application that creates todos and meetings.
There will be reminders that are sent out with respect to each meeting or todo as you would imagine.
We are already using Sidekiq, and it would be nice to use it to create the scheduled jobs that run in x days/hours, etc.
My concern is that we will lose the jobs if Redis restarts.
Am I right in assuming that if Redis restarts we lose the jobs, and if so, is there anything that can be done about it?
If not sidekiq, what else could I use?
There are several ways of doing that; go through http://redis.io/topics/persistence. Snapshotting is a technique for taking point-in-time snapshots of the dataset on disk.
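If you do want Redis to keep Sidekiq's data across restarts, the relevant redis.conf directives look roughly like this (illustrative values, not a tuned recommendation):

# redis.conf
save 900 1            # RDB: snapshot to disk if at least 1 key changed in 15 min
save 60 10000         # ...or at least 10000 keys changed in 1 min
appendonly yes        # AOF: log every write and replay it on restart
appendfsync everysec  # fsync the AOF roughly once per second

With RDB snapshots alone you can lose up to one snapshot interval of jobs; with AOF enabled the window shrinks to about a second of writes.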
