How to see celery tasks in redis queue when there is no worker? - django-celery

I have a container creating celery tasks, and a container running a worker.
I have removed the worker container, so I expected that tasks would accumulate in the redis list of tasks.
But I can't see any tasks in redis.
This is with Django. I need to isolate the worker and queue, hence the settings:
A typical queue name is 'test-dear', that is, SHORT_HOSTNAME='test-dear'
CELERY_DATABASE_NUMBER = 0
CELERY_BROKER_URL = f"redis://{REDIS_HOST}:6379/{CELERY_DATABASE_NUMBER}"
CELERY_RESULT_BACKEND = f"redis://{REDIS_HOST}:6379/{CELERY_DATABASE_NUMBER}"
CELERY_BROKER_TRANSPORT_OPTIONS = {'global_keyprefix': SHORT_HOSTNAME }
CELERY_TASK_DEFAULT_QUEUE = SHORT_HOSTNAME
CELERY_TASK_ACKS_LATE = True
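(These settings are presumably picked up by the usual django-celery app module, roughly along the lines of the sketch below; the project module name is a placeholder.)
# celery.py, standard django-celery wiring (sketch; "myproject" is a placeholder)
import os
from celery import Celery
os.environ.setdefault("DJANGO_SETTINGS_MODULE", "myproject.settings")
app = Celery("myproject")
# loads CELERY_BROKER_URL, CELERY_TASK_DEFAULT_QUEUE, etc. from settings.py
app.config_from_object("django.conf:settings", namespace="CELERY")
app.autodiscover_tasks()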
After starting everything and stopping the worker, I add tasks.
For example, on the producer container after python manage.py shell:
>>> from cached_dear import tasks
>>> t1 = tasks.purge_deleted_masterdata_fast.delay()
<AsyncResult: 9c9a564a-d270-444c-bc71-ff710a42049e>
t1.get() does not return.
Then in redis:
127.0.0.1:6379> llen test-dear
(integer) 0
I was not expecting 0 entries.
What am I doing wrong or not understanding?

I did this from the redis container
redis-cli monitor | grep test-dear
and sent a task.
The list is actually named test-deartest-dear, and
llen test-deartest-dear
shows the number of tasks which have not yet been sent to a worker.
The Redis key for the queue is f"{global_keyprefix}{queue_name}", i.e. the global_keyprefix is prepended to the queue name.
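A quick programmatic way to confirm this, as a rough sketch (it assumes the redis-py package and the settings above, i.e. database 0 and SHORT_HOSTNAME = 'test-dear'; the host name below is a placeholder for REDIS_HOST):
import json
import redis

r = redis.Redis(host="localhost", port=6379, db=0)  # substitute your REDIS_HOST

# key = global_keyprefix + default queue name = "test-dear" + "test-dear"
queue_key = "test-dear" + "test-dear"
print(r.llen(queue_key))  # messages still waiting for a worker

# with the default JSON serializer each entry is a kombu message envelope;
# the task name and id live in its headers
for raw in r.lrange(queue_key, 0, -1):
    message = json.loads(raw)
    headers = message.get("headers", {})
    print(headers.get("id"), headers.get("task"))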

Related

Get running Sidekiq job from worker (Ruby)

workers = Sidekiq::Workers.new
workers.each do |_process_id, _thread_id, work|
  p work['payload']['jid']
end
How can I get the job from the worker?
I have the jid, but the job isn't in the queue because it is still running...

SGE: All queues dropped because of overload or full

I'm going to run a million batch jobs with SGE.
Approximately 10,000 jobs execute fine, but after about an hour of execution the run slows down and eventually the jobs stop running.
Checking for error messages does not turn up any errors; I can only see the message below:
"All queues dropped because of overload or full"
How do I configure this setup so that the jobs run normally?
There is one master server and four clients, and files are shared using NFS.
Every system runs on Docker and Docker Swarm.
Output of qstat when job execution had slowed down:
$qstat -j
queue instance "peteris.q#sge00" dropped because it is full
queue instance "peteris.q#sge02" dropped because it is full
queue instance "peteris.q#sge03" dropped because it is full
queue instance "peteris.q#sge01" dropped because it is full
All queues dropped because of overload or full
Detailed messages:
$qstat -j 1595799
=============================================================
job_number: 1595799
exec_file: job_scripts/1595799
submission_time: Sun May 27 08:08:10 2018
owner: root
uid: 0
group: root
gid: 0
sge_o_home: /root
sge_o_path: /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
sge_o_workdir: /data/23andMe
sge_o_host: sge
account: sge
cwd: /data/23andMe
mail_list: root@sge
notify: FALSE
job_name: python3
jobshare: 0
env_list:
job_args: lineage.py,makeShell/1009_user3130_user3600.list
script_file: python3
usage 1: cpu=00:00:02, mem=0.59503 GBs, io=0.03963, vmem=493.180M, maxvmem=493.180M
scheduling info: queue instance "peteris.q#sge00" dropped because it is full
queue instance "peteris.q#sge02" dropped because it is full
queue instance "peteris.q#sge03" dropped because it is full
queue instance "peteris.q#sge01" dropped because it is full
All queues dropped because of overload or full
SGE config:
algorithm default
schedule_interval 0:0:10
maxujobs 0
queue_sort_method load
job_load_adjustments np_load_avg=100.0
load_adjustment_decay_time 0:7:30
load_formula np_load_avg
schedd_job_info true
flush_submit_sec 2
flush_finish_sec 2
params none
reprioritize_interval 0:0:0
halftime 168
usage_weight_list cpu=1.000000,mem=0.000000,io=0.000000
compensation_factor 5.000000
weight_user 0.250000
weight_project 0.250000
weight_department 0.250000
weight_job 0.250000
weight_tickets_functional 0
weight_tickets_share 0
share_override_tickets TRUE
share_functional_shares TRUE
max_functional_jobs_to_schedule 200
report_pjob_tickets TRUE
max_pending_tasks_per_job 50
halflife_decay_list none
policy_hierarchy OFS
weight_ticket 0.500000
weight_waiting_time 0.278000
weight_deadline 3600000.000000
weight_urgency 0.500000
weight_priority 0.000000
max_reservation 0
default_duration INFINITY
SGE queue config:
qname peteris.q
hostlist @allhosts
seq_no 0
load_thresholds NONE
suspend_thresholds NONE
nsuspend 1
suspend_interval 00:00:05
priority 0
min_cpu_interval 00:00:05
processors UNDEFINED
qtype BATCH INTERACTIVE
ckpt_list NONE
pe_list make
rerun FALSE
slots 20
tmpdir /tmp
shell /bin/bash
prolog NONE
epilog NONE
shell_start_mode posix_compliant
starter_method NONE
suspend_method NONE
resume_method NONE
terminate_method NONE
notify 00:00:01
owner_list NONE
user_lists NONE
xuser_lists NONE
subordinate_list NONE
complex_values NONE
projects NONE
xprojects NONE
calendar NONE
initial_state default
s_rt INFINITY
h_rt INFINITY
s_cpu INFINITY
h_cpu INFINITY
s_fsize INFINITY
h_fsize INFINITY
s_data INFINITY
h_data INFINITY
s_stack INFINITY
h_stack INFINITY
s_core INFINITY
h_core INFINITY
s_rss INFINITY
h_rss INFINITY
s_vmem INFINITY
h_vmem INFINITY
It seems like you have hit a practical limit on the number of active jobs that the queue can handle at any given time. I cannot confirm where the maximum is defined by SGE, but it seems likely to be:
max_jobs
The number of active (not finished) jobs simultaneously
allowed in Sun Grid Engine is controlled by this parameter.
A value greater than 0 defines the limit. The default value
0 means "unlimited". If the max_jobs limit is exceeded by a
job submission then the submission command exits with exit
status 25 and an appropriate error message.
Changing max_jobs will take immediate effect.
This value is a global configuration parameter only. It cannot
be overwritten by the execution host local configuration.
From: http://gridscheduler.sourceforge.net/htmlman/htmlman5/sge_conf.html?pathrev=V62u5_TAG
If this is correct then the value is unlimited; however, SGE will likely not perform well trying to manage ~1 million active jobs, which is probably the issue you are having. I would recommend you use job arrays, as that is exactly what they are for, i.e. managing and running many near-identical tasks.
There are many resources online for job arrays in SGE, such as this one:
http://wiki.gridengine.info/wiki/index.php/Simple-Job-Array-Howto
http://talby.rcs.manchester.ac.uk/~ri/_linux_and_hpc_lib/sge_array.html
https://wiki.duke.edu/display/SCSC/SGE+Array+Jobs
I am happy to assist further if you edit your question with specific requirements for each task. For example, does each of the ~1 million tasks require one or more parameters as input?
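For illustration, here is a rough sketch of the job-array idea in Python, assuming each of your current jobs runs lineage.py on one .list file. You would submit it once, e.g. qsub -t 1-1000000 run_task.sh, with run_task.sh calling the script below; the manifest file name and script names are placeholders, not taken from your setup.
# lineage_array.py (sketch): each array task picks its own input by index
import os
import sys

task_id = int(os.environ["SGE_TASK_ID"])  # 1-based index set by Grid Engine

# sys.argv[1] is a placeholder manifest file: one .list path per line
with open(sys.argv[1]) as f:
    list_files = [line.strip() for line in f if line.strip()]

my_list_file = list_files[task_id - 1]
# ... process my_list_file exactly as lineage.py does today ...
print(f"task {task_id} -> {my_list_file}")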

How do you monitor Sidekiq processes?

I'm working on a production app that has multiple Rails servers behind an nginx load balancer. We are monitoring Sidekiq processes with monit, and it works just fine: when a Sidekiq process dies, monit starts it right back up.
However, we recently encountered a situation where one of these processes was running and visible to monit, but for some reason not visible to Sidekiq. That resulted in many failed jobs, and it took us some time to notice that we were missing one process in the Sidekiq Web UI, since monit was telling us everything was fine and all processes were running. A simple restart fixed the problem.
And that brings me to my question: how do you monitor your Sidekiq processes? I know I can use something like Rollbar to notify me when jobs fail, but I'd like to know if there is a way to monitor the process count and preferably send mail when one dies. Any suggestions?
Something that would ping Sidekiq's stats and verify the response.
My super simple solution to a similar problem looks like this:
# sidekiq_check.rb
namespace :sidekiq_check do
  task rerun: :environment do
    if Sidekiq::ProcessSet.new.size == 0
      exec 'bundle exec sidekiq -d -L log/sidekiq.log -C config/sidekiq.yml -e production'
    end
  end
end
and then using cron/whenever
# schedule.rb
every 5.minutes do
  rake 'sidekiq_check:rerun'
end
We ran into this problem where our Sidekiq processes had stopped working off jobs overnight and we had no idea. It took us about 30 minutes to integrate http://deadmanssnitch.com by following these instructions.
It's not the prettiest or cheapest option but it gets the job done (integrates nicely with Pagerduty) and has saved our butt twice in the last few months.
One of our complaints with the service is that the shortest grace interval we can set is 15 minutes, which is too long for us. So we're evaluating similar services like Healthchecks, etc.
My approach is the following:
create a background job that does something
call the job regularly
check that the thing is being done!
So, using a cron script (or something like whenever) every 5 minutes, I run:
CheckinJob.perform_later
It's now up to Sidekiq (or delayed_job, or whatever Active Job backend you're using) to actually run the job.
The job just has to do something which you can check.
I used to get the job to update a record in my Status table (essentially a list of key/value records). Then I'd have a /status page which returns a 500 status code if the record hasn't been updated in the last 6 minutes.
(obviously your timing may vary)
Then I use a monitoring service to monitor the status page! (something like StatusCake)
Nowadays I have a simpler approach: I just get the background job to check in with a cron monitoring service like
IsItWorking
Dead Mans Snitch
Health Checks
The monitoring service expects your task to check in every X minutes. If your task doesn't check in, the monitoring service will let you know.
Integration is dead simple for all the services. For Is It Working it would be:
IsItWorkingInfo::Checkin.ping(key:"CHECKIN_IDENTIFIER")
Full disclosure: I wrote IsItWorking!
I use the god gem to monitor my Sidekiq processes. The god gem makes sure that your process is always running and can also report the process status on various channels.
ROOT = File.dirname(File.dirname(__FILE__))
God.pid_file_directory = File.join(ROOT, "tmp/pids")
God.watch do |w|
  w.env = {'RAILS_ENV' => ENV['RAILS_ENV'] || 'development'}
  w.name = 'sidekiq'
  w.start = "bundle exec sidekiq -d -L log/sidekiq.log -C config/sidekiq.yml -e #{ENV['RAILS_ENV']}"
  w.log = "#{ROOT}/log/sidekiq_god.log"
  w.behavior(:clean_pid_file)
  w.dir = ROOT
  w.keepalive
  w.restart_if do |restart|
    restart.condition(:memory_usage) do |c|
      c.interval = 120.seconds
      c.above = 100.megabytes
      c.times = [3, 5] # 3 out of 5 intervals
    end
    restart.condition(:cpu_usage) do |c|
      c.interval = 120.seconds
      c.above = 80.percent
      c.times = 5
    end
  end
  w.lifecycle do |on|
    on.condition(:flapping) do |c|
      c.to_state = [:start, :restart]
      c.times = 5
      c.within = 5.minute
      c.transition = :unmonitored
      c.retry_in = 10.minutes
      c.retry_times = 5
      c.retry_within = 1.hours
    end
  end
end

Multiple Sidekiq queues for a Sinatra application

We have a Ruby Sinatra application. We use Sidekiq and Redis for queue processing.
We have already implemented Sidekiq to queue up jobs that insert into the database, and it has worked fine so far.
Now I want to add another job which will read bulk data from the database and export it to a CSV file.
I don't want both of these jobs to be in the same queue; instead, is it possible to create a different queue for these jobs in the same application?
Please give some solution.
You probably need advanced queue options. Read about them here: https://github.com/mperham/sidekiq/wiki/Advanced-Options
Create the csv queue from the command line (it can be done in a config file as well):
sidekiq -q csv -q default
Then in your worker:
class CSVWorker
  include Sidekiq::Worker
  sidekiq_options :queue => :csv
  # perform method
end
Take a look at the Sidekiq wiki: https://github.com/mperham/sidekiq/wiki/Advanced-Options
By default everything goes into the 'default' queue, but you can specify a queue in your worker:
sidekiq_options :queue => :file_queue
and to tell Sidekiq to process your queue, you have to either declare it in the configuration file:
:queues:
  - file_queue
  - default
or pass it as an argument to the sidekiq process: sidekiq -q file_queue

Sidekiq multiple workers?

I have a question regarding Sidekiq. I come from the Resque paradigm, and in the current application I launch one worker per queue, so in the terminal I would do:
rake resque:work QUEUE='first'
rake resque:work QUEUE='second'
rake resque:work QUEUE='third'
Then, if I want more workers, for example for the third queue, I just create more workers:
rake resque:work QUEUE='third'
My question is...
With Sidekiq, how would you start with multiple workers? I know you can do this:
sidekiq -q first -q second -q third
But that would just start one process that fetches from all of those queues. So how would I go about starting three workers and telling each worker to focus on just one particular queue? Also, how would I do that on Heroku?
You could use a config file in config/sidekiq.yml:
# Sample configuration file for Sidekiq.
# Options here can still be overridden by cmd line args.
# sidekiq -C config.yml
---
:verbose: true
:pidfile: ./tmp/pids/sidekiq.pid
:concurrency: 15
:timeout: 5
:queues:
  - [first, 20]
  - [second, 20]
  - [third, 1]
staging:
  :verbose: false
  :concurrency: 25
production:
  :verbose: false
  :concurrency: 50
  :timeout: 60
That way you can configure exactly what you want, and to answer your question precisely, the concurrency value is what you are looking for: it defines the number of concurrent workers executed.
More info here : https://github.com/mperham/sidekiq/wiki/Advanced-Options
So how would I go about starting three workers and telling each worker to focus on just one particular queue?
You can define at the worker level which queue a job should be placed in via sidekiq_options.
For example, to place your worker in a queue called "first", just define it with:
class MyWorker
  include Sidekiq::Worker
  sidekiq_options :queue => :first
  ...
end
