Currently I have a Sidekiq process running on a single dyno on Heroku that goes through every user in a table and syncs their mail with Gmail:
Something along the lines of:
User.all.each do |user|
  # sync the user's email
end
The process runs every 10 minutes, but as you would expect, the more users we have, the longer the sync takes, and our users want to see their mail pretty quickly.
We want to scale out and increase the dynos.
Can anyone suggest a way that I can split the job over 2 dynos?
Should I have a separate query on each dyno that splits the users into 2 or is there a better way?
You could make a scheduler job that runs every 10 minutes, and enqueues all users:
User.find_each do |user|
  # put a job for this user in Redis or something
end
Then have your workers constantly fetching new jobs. Each job fetches/syncs the email for a single user, not "all" of them. Use find_each so that you're not trying to load all users into memory (see http://guides.rubyonrails.org/active_record_querying.html#retrieving-multiple-objects-in-batches).
Re-architecting like this makes processing the sync/fetching easier to scale, as you can just add new worker dynos to increase throughput.
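For example, a minimal sketch with Sidekiq, where MailSyncJob and user.sync_mail are hypothetical names standing in for the existing sync code:

# Per-user job: syncs a single user's mailbox.
class MailSyncJob
  include Sidekiq::Worker

  def perform(user_id)
    user = User.find(user_id)
    user.sync_mail # whatever the existing per-user Gmail sync does
  end
end

# Scheduler job, run every 10 minutes: fans out one job per user.
class MailSyncScheduler
  include Sidekiq::Worker

  def perform
    User.find_each do |user|
      MailSyncJob.perform_async(user.id)
    end
  end
end

Sidekiq then distributes the per-user jobs across however many worker dynos are running, so adding a dyno increases throughput without splitting the query yourself.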
I am using Laravel 5.1, and I have a task that takes around 2 minutes to process; specifically, the task generates a report...
Now, it is obvious that I can't make the user wait for 2 minutes on the same page where I took the user's input; instead I should process this task in the background and notify the user later about its completion...
To achieve this, Laravel provides Queues, which run tasks in the background (if I understood correctly). Now, in a multi-user environment, i.e. if more than one user requests report generation (say there are 4 users), does the name "Queues" mean that the tasks will be performed one after the other (i.e. when 4 users request report generation one after another, the 4th user's report will only be generated once the 3rd user's report is done)?
If Queues complete their tasks one after the other, is there any way for a task to be processed in the background immediately when a user requests it, with the user notified later once their task is completed?
A queue-based architecture is a little more complicated than that. Laravel's Queue gives you an interface to different messaging implementations such as RabbitMQ and Beanstalkd.
At any point in your code you can send a message to the queue; in this context the message is called a job. Your queue will then hold multiple jobs, ready to be taken out in FIFO order.
As for your question: workers listen to the queue, take a job, and execute it. It's up to you how many workers you want. If you have one worker, your tasks will be executed one after another; the more workers you run, the more jobs are processed in parallel.
Worker processes are started with Laravel's command-line interface, Artisan. Each process is one worker, and you can start and supervise multiple workers with Supervisor.
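For example, a minimal Supervisor program definition that keeps four workers running (the path and option values below are assumptions, and the exact artisan flags vary by Laravel version):

; /etc/supervisor/conf.d/laravel-worker.conf
[program:laravel-worker]
process_name=%(program_name)s_%(process_num)02d
; four worker processes, so up to four reports can be generated in parallel
command=php /var/www/app/artisan queue:work --daemon --sleep=3 --tries=3
numprocs=4
autostart=true
autorestart=true

With numprocs=4, four reports can be generated at the same time; a fifth request would wait for a free worker.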
Since you know for sure that you are going to send the notification to the user after around 2 minutes, I suggest using a cron job that checks every 2 minutes whether there are any reports to process and, if there are, sends the notification to the user. That check is a single simple query, so you don't need to worry about performance that much.
In Sidekiq you can specify the name of the queue you want to add a job to, so what's the benefit of separating jobs into multiple queues? Why not just put all jobs in the default queue?
In my current project we send a lot of jobs to Sidekiq. One type of job we submit about 200,000 jobs a day. For another type (sending email to users) we send maybe 100. If they were all in the same queue then there is a very good chance that "please confirm your account" email will be in the 200,001st spot and won't get run for a long time (hours?). By having multiple queues I can ensure that those 100 get sent out promptly.
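For example, routing the two job types to separate queues (the queue names here are just illustrative):

# The ~200,000-a-day jobs go to a bulk queue.
class ImportJob
  include Sidekiq::Worker
  sidekiq_options queue: 'imports'

  def perform(record_id)
    # ... heavy import work ...
  end
end

# The ~100-a-day account emails go to their own queue.
class ConfirmationEmailJob
  include Sidekiq::Worker
  sidekiq_options queue: 'mailers'

  def perform(user_id)
    # ... send the confirmation email ...
  end
end

A worker process can then weight or dedicate queues, e.g. sidekiq -q mailers,5 -q imports,1, so the small mail queue is checked far more often than the busy import queue.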
I'm building a Heroku app that relies on scheduled jobs. We were previously using Heroku Scheduler but clock processes seem more flexible and robust. So now we're using a clock process to enqueue background jobs at specific times/intervals.
Heroku's docs mention that clock dynos, as with all dynos, are restarted at least once per day, which incurs the risk of a clock process skipping a scheduled job: "Since dynos are restarted at least once a day some logic will need to exist on startup of the clock process to ensure that a job interval wasn’t skipped during the dyno restart." (See https://devcenter.heroku.com/articles/scheduled-jobs-custom-clock-processes)
What are some recommended ways to ensure that scheduled jobs aren't skipped, and to re-enqueue any jobs that were missed?
One possible way is to create a database record whenever a job is run/enqueued, and to check for the presence of expected records at regular intervals within the clock job. The biggest downside to this is that if there's a systemic problem with the clock dyno that causes it to be down for a significant period of time, then I can't do the polling every X hours to ensure that scheduled jobs were successfully run, since that polling happens within the clock dyno.
How have you dealt with the issue of clock dyno resiliency?
Thanks!
You will need to store data about jobs somewhere. On Heroku, you have no information or guarantee that your code is running exactly once and at all times (because of dyno cycling).
You could use a project like https://github.com/amitree/delayed_job_recurring (though it is not widely used).
Or, depending on your needs, you could create a scheduler process that schedules jobs for the next 24 hours and runs every 4 hours, so you can be reasonably sure your jobs get scheduled, and hope that the Heroku scheduler runs at least once every 24 hours.
And have at least two workers processing the jobs.
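A rough sketch of the startup check described in the Heroku docs quote above, using the clockwork gem plus a hypothetical JobRun record and MailSyncJob worker (none of these names come from the original posts):

# clock.rb -- run as the clock dyno (e.g. clock: bundle exec clockwork clock.rb in the Procfile)
require 'clockwork'
require './config/environment' # assumes a boot file that loads ActiveRecord and the job classes

module Clockwork
  INTERVAL = 10 * 60 # seconds between runs

  # Hypothetical helper: enqueue the job and record that we did.
  def self.run_mail_sync
    MailSyncJob.perform_async
    JobRun.create!(name: 'mail_sync', ran_at: Time.now.utc)
  end

  # On boot, catch up if a dyno restart made us miss an interval.
  last_run = JobRun.where(name: 'mail_sync').maximum(:ran_at)
  run_mail_sync if last_run.nil? || last_run < Time.now.utc - INTERVAL

  every(INTERVAL, 'mail_sync') { run_mail_sync }
end

The JobRun rows double as the audit trail the question mentions, so a separate worker or an external monitor can also poll them to catch the case where the clock dyno itself is down.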
Though it requires human involvement, we have our scheduled jobs check in with Honeybadger via an after_perform hook in Rails:
# frozen_string_literal: true

class ScheduledJob < ApplicationJob
  after_perform do |job|
    check_in(job)
  end

  private

  def check_in(job)
    token = Rails.application.config_for(:check_ins)[job.class.name.underscore]
    Honeybadger.check_in(token) if token.present?
  end
end
This way, when we happen to have poorly timed restarts from deploys, we at least know that should-be-scheduled work didn't actually happen.
Would be interested to know if someone has a more fully-baked, simple solution!
In a Sinatra app I need to run a background job on a daily basis (I will probably use Sidekiq for this) for each User of the app.
I'd like to distribute the jobs evenly during the day according to the number of users. So, for instance, if there are 12 users the job has to be executed once every two hours, and if there are 240 users the job has to be executed every 6 minutes.
I understand there are some gems that allow you to schedule background jobs (Rufus scheduler, Whenever ...), however I'm not sure they allow you to change the interval at which a job is executed according to dynamic values such as the number of objects in a collection.
Any idea how I can achieve that?
Using whenever, you could get started like this:
In your whenever schedule file (config/schedule.rb), compute the interval from the user count:
every (1440 / User.count).minutes do
  # your background task, e.g. runner "SomeTask.run", rake "some:task", or command "..."
end
Then, after a user is added successfully, don't forget to update the crontab via whenever, which is what actually applies the new interval (for example from a model callback, as sketched below):
system 'bundle exec whenever --update-crontab store'
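A sketch of wiring that update into the model, assuming ActiveRecord-style callbacks are available in the Sinatra app:

# user.rb -- refresh the crontab whenever the user count changes,
# so the interval computed in config/schedule.rb stays current.
class User < ActiveRecord::Base
  after_create  :refresh_schedule
  after_destroy :refresh_schedule

  private

  def refresh_schedule
    # 'store' is the crontab identifier used in the command above
    system 'bundle exec whenever --update-crontab store'
  end
end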
Each task I have works in short bursts, then sleeps for about an hour, then works again, and so on until the job is done. Some jobs may take about 10 hours to complete and there is nothing I can do about it.
What bothers me is that while a job is sleeping its Resque worker is still busy, so if I have 4 workers and 5 jobs, the last job has to wait 10 hours until it can be processed, which is grossly suboptimal since it could run while any other worker is sleeping. Is there any way to make a Resque worker process another job while its current job is sleeping?
Currently I have a worker similar to this:
class ImportSongs
def self.perform(api_token, songs)
api = API.new api_token
songs.each_with_index do |song, i|
# make current worker proceed with another job while it's sleeping
sleep 60*60 if i != 0 && i % 100 == 0
api.import_song song
end
end
end
It looks like the problem you're trying to solve is API rate limiting with batch processing of the import process.
You should have one job that runs as soon as it's enqueued to enumerate all the songs to be imported. You can then break those down into groups of 100 (or whatever size you have to limit it to) and schedule a deferred job using resque-scheduler at one-hour intervals.
However, if you have a hard API rate limit and you execute several of these distributed imports concurrently, you may not be able to control how much API traffic is going at once. If you have that strict of a rate limit, you may want to build a specialized process as a single point of control to enforce the rate limiting with its own work queue.
With resque-scheduler, you'll be able to repeat discrete jobs at scheduled or delayed times as an alternative to a single, long running job that loops with sleep statements.
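For example, a rough sketch of that fan-out using resque-scheduler's Resque.enqueue_in (ScheduleSongImports, ImportSongBatch, and the batch size are illustrative names, not from the original code):

# Runs once, immediately: slices the songs into batches and schedules
# each batch an hour apart instead of sleeping inside the worker.
class ScheduleSongImports
  @queue = :imports

  def self.perform(api_token, songs)
    songs.each_slice(100).with_index do |batch, i|
      Resque.enqueue_in(i * 60 * 60, ImportSongBatch, api_token, batch)
    end
  end
end

# Imports a single batch; when it finishes, the worker is immediately
# free to pick up any other job in the queue.
class ImportSongBatch
  @queue = :imports

  def self.perform(api_token, songs)
    api = API.new(api_token)
    songs.each { |song| api.import_song(song) }
  end
end

Because no job ever sleeps, all of your workers stay available for whatever is next in the queue.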