Celerybeat shuts down immediately after start - django-celery

I have a django app that is using celeryd and celerybeat. Both are set up to run as daemons.
The celerybeat tasks won't get executed because celerybeat does not start correctly. According to the logs it shuts down immediately:
[2012-05-04 13:02:49,055: WARNING/MainProcess] celerybeat v2.5.1 is starting.
[2012-05-04 13:02:49,122: INFO/MainProcess] process shutting down
[2012-05-04 13:02:49,122: DEBUG/MainProcess] running all "atexit" finalizers with priority >= 0
[2012-05-04 13:02:49,134: DEBUG/MainProcess] running the remaining "atexit" finalizers
I'm starting it with /etc/init.d/celerybeat start
This is the /etc/default/celerybeat config:
# Where the Django project is.
CELERYBEAT_CHDIR="/var/www/path_to_app/cms/"
# Python interpreter from environment.
ENV_PYTHON="$CELERYBEAT_CHDIR/bin/python"
# Name of the project's settings module.
export DJANGO_SETTINGS_MODULE="cms.settings"
# Path to celerybeat
CELERYBEAT="$ENV_PYTHON $CELERYBEAT_CHDIR/cms/manage.py celerybeat"
# Extra arguments to celerybeat
CELERYBEAT_LOG_LEVEL="DEBUG"
CELERYBEAT_USER="www-data"
CELERYBEAT_GROUP="www-data"
The task schedule is set in settings.py:
CELERYBEAT_SCHEDULE = {
    # Executes every morning at 7:00 A.M.
    "every-morning": {
        "task": "cms.tasks.get_recent_posts_for_all_pages",
        "schedule": crontab(hour=7, minute=0),
    },
}
When I run celerybeat from the shell with ./manage.py celerybeat it seems to run fine.
There is also a celerybeat section in the celeryd config but I assume that one is ignored.
Regards
Simon

Maybe you're missing a broker like RabbitMQ:
https://web.archive.org/web/20180703074815/http://celery.readthedocs.io/en/latest/getting-started/brokers/rabbitmq.html
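If the broker is the issue, it needs to be running and reachable from the settings celerybeat uses. A minimal sketch for django-celery of that vintage (the URL and credentials are placeholders for your own RabbitMQ instance):
# settings.py -- broker configuration for django-celery (placeholder values)
import djcelery
djcelery.setup_loader()
# Both celeryd and celerybeat must be able to reach this broker.
BROKER_URL = "amqp://guest:guest@localhost:5672//"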

Related

Hypercorn runs with duplicated process

I am not sure whether this is really a Hypercorn issue, but I could not imagine what else it could be. I have searched the net but have not found any topic close to this, so please bear with me.
I am running a server with Hypercorn on Ubuntu 20.04, with Python 3.8.10.
The problem is that it runs with duplicated processes in the background.
root 2278497 0.8 0.1 41872 33568 pts/7 S 10:03 0:00 /usr/bin/python3 /usr/local/bin/hypercorn -c config.toml main:app --reload
root 2278499 0.0 0.0 17304 11332 pts/7 S 10:03 0:00 /usr/bin/python3 -c from multiprocessing.resource_tracker import main;main(4)
root 2278500 0.7 0.1 41648 34148 pts/7 S 10:03 0:00 /usr/bin/python3 -c from multiprocessing.spawn import spawn_main; spawn_main(tracker_fd=5, pipe_handle=7) --multiprocessing-fork
The main process is 2278497, but there are duplicated processes 2278499 and 2278500. I do not know why these are started.
This causes unwanted effects because the same tasks get executed twice.
How can I avoid that?
EDIT:
A minimal example:
# test_main.py
from fastapi import FastAPI
app = FastAPI()
@app.get("/")
async def root():
    return {"message": "Hello World"}
print("main module loaded.")
I then type:
sudo hypercorn test_main:app
and the stdout is:
main module loaded.
main module loaded.
[2022-11-02 15:08:45 +0100] [2364437] [INFO] Running on http://127.0.0.1:8000 (CTRL + C to quit)
I get the impression you're using Hypercorn in a way it wasn't designed for: it isn't meant for you to run your own code in the same process.
That said, what you're seeing in your MWE is a master process and a worker process. You can distinguish between these by checking whether the current process is a "daemon" process (i.e. Unix-style background task) or not via:
import multiprocessing
print(multiprocessing.current_process().daemon)
This will output False for the master process and True for all worker processes; e.g. you would see five Trues when running hypercorn -w5 test_main:app.
I think I'd suggest not using this hack in production and using another system (e.g. systemd or supervisord) to make sure that any background tasks are kept running. This would give you more control over them. You could still have the code in the same file, just behind the normal if __name__ == '__main__': guard.
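For instance, a dedicated systemd unit could keep that background code running outside Hypercorn entirely (a rough sketch; the unit name and paths are made up):
# /etc/systemd/system/background-tasks.service (hypothetical)
[Unit]
Description=Background tasks for the FastAPI app

[Service]
# Runs the module as a script, so only the __main__ branch executes.
ExecStart=/usr/bin/python3 /srv/app/test_main.py
Restart=always

[Install]
WantedBy=multi-user.target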
Update with more complete example:
from fastapi import FastAPI
from multiprocessing import current_process
app = FastAPI()
# see https://asgi.readthedocs.io/en/latest/specs/lifespan.html
@app.on_event('startup')
async def on_startup():
    print("asgi lifecycle startup event")
@app.on_event('shutdown')
async def on_shutdown():
    print("asgi lifecycle shutdown event")
@app.get("/")
async def root():
    return {"message": "Hello World"}
def main():
    print("running as main module")
# see https://docs.python.org/3/library/__main__.html
if __name__ == "__main__":
    import sys
    sys.exit(main())
# warning: these will also execute if this module is imported
if not current_process().daemon:
    print("main module loaded into master process")
else:
    print("main module loaded into worker process")
can be run as:
$ hypercorn -w2 test_main:app
main module loaded into master process
main module loaded into worker process
asgi lifecycle startup event
[2022-11-04 11:39:31 +0000] [24243] [INFO] Running on http://127.0.0.1:8000 (CTRL + C to quit)
main module loaded into worker process
asgi lifecycle startup event
[2022-11-04 11:39:31 +0000] [24244] [INFO] Running on http://127.0.0.1:8000 (CTRL + C to quit)
^C
asgi lifecycle shutdown event
asgi lifecycle shutdown event
$ python -m test_main
running as main module
Note that the first line mentions "master process". This is Hypercorn's supervisor process, which is responsible for looking after worker processes (e.g. clean shutdown / restarting). I also show that this code can recognise when it's being run as the main module, and could do different things there. This works because Hypercorn imports this module into each process (whether it's a master or a worker).

Spring Scheduler not working in google cloud run with cpu throttling off

Hello all, I have a Spring Scheduler job which has to run on Google Cloud Run at a scheduled time interval.
It works perfectly fine with a local docker-compose deployment; it gets triggered without any issue.
Although it works fine locally, in the Cloud Run service (with CPU throttling off, which keeps the CPU allocated 100% of the time) it does not work after the first run.
I will paste the Dockerfile below for reference, but I am pretty sure it is fine.
FROM maven:3-jdk-11-slim AS build-env
# Set the working directory to /app
WORKDIR /app
COPY pom.xml ./
COPY src ./src
COPY css-common ./css-common
RUN echo $(ls -1 css-common/src/main/resources)
# Build and create the common jar
RUN cd css-common && mvn clean install
# Build the job
RUN mvn package -DskipTests
# It's important to use OpenJDK 8u191 or above that has container support enabled.
# https://docs.docker.com/develop/develop-images/multistage-build/#use-multi-stage-builds
FROM openjdk:11-jre-slim
# Copy the jar to the production image from the builder stage.
COPY --from=build-env /app/target/css-notification-job-*.jar /app.jar
# Run the web service on container startup
CMD ["java","-Djava.security.egd=file:/dev/./urandom","-jar","/app.jar"]
And below is the terraform script used for the deployment
resource "google_cloud_run_service" "job-staging" {
name = var.cloud_run_job_name
project = var.project
location = var.region
template {
spec {
containers {
image = "${var.docker_registry}/${var.project}/${var.cloud_run_job_name}:${var.docker_tag_notification_job}"
env {
name = "DB_HOST"
value = var.host
}
env {
name = "DB_PORT"
value = 3306
}
}
}
metadata {
annotations = {
"autoscaling.knative.dev/maxScale" = "4"
"run.googleapis.com/vpc-access-egress" = "all-traffic"
"run.googleapis.com/cpu-throttling" = false
}
}
}
timeouts {
update = "3m"
}
}
Something I noticed in the logs:
2022-01-04T00:19:39.178057Z2022-01-04 00:19:39.177 INFO 1 --- [ionShutdownHook] j.LocalContainerEntityManagerFactoryBean : Closing JPA EntityManagerFactory for persistence unit 'default'
2022-01-04T00:19:39.182017Z2022-01-04 00:19:39.181 INFO 1 --- [ionShutdownHook] com.zaxxer.hikari.HikariDataSource : HikariPool-1 - Shutdown initiated...
2022-01-04T00:19:39.194117Z2022-01-04 00:19:39.193 INFO 1 --- [ionShutdownHook] com.zaxxer.hikari.HikariDataSource : HikariPool-1 - Shutdown completed.
It is shutting down the entity manager. I provided -Xmx1024m heap memory to make sure it has enough memory.
Although the Google documentation says this should work, for some reason the scheduler is not getting triggered. Any help would be much appreciated.
TL;DR: Using Spring Scheduler on Cloud Run is a bad idea. Prefer Cloud Scheduler instead
First, you have to understand the lifecycle of a Cloud Run instance: CPU is allocated to the process ONLY while a request is being processed.
The immediate effect is that a background process, like a scheduler, can't work, because no CPU is allocated outside of request processing.
Unless you set CPU throttling to off. You did that? Great, but there are other caveats!
An instance is created when a request comes in, and lives for up to 15 minutes without processing any request. Then the instance is offloaded and you scale to zero.
Here again, the scheduler can't work if the instance is shut down. The solution is to set min instances to 1 AND CPU throttling to off, to keep one instance up 100% of the time and let the scheduler do its job.
The final issue with Cloud Run is scalability. You set 4 in your Terraform, which means you can have up to 4 instances in parallel, and therefore 4 schedulers running in parallel, one on each instance. Is that really what you want? If not, you can set max instances to 1 to limit the number of parallel instances to 1.
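In the Terraform above, that roughly corresponds to pinning the revision to a single, always-allocated instance via its annotations (a sketch, not a drop-in diff):
metadata {
  annotations = {
    "autoscaling.knative.dev/minScale"  = "1"
    "autoscaling.knative.dev/maxScale"  = "1"
    "run.googleapis.com/cpu-throttling" = false
  }
}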
In the end, you have one instance, up full time, that can't scale up or down. Because it can't scale, I don't recommend performing the processing on that instance; instead, call an API running on another Cloud Run service, which will be able to scale up and down according to the scheduler's requirements.
That way you have only one scheduler, and it performs API calls to other Cloud Run services to do the actual work. That's exactly the purpose of Cloud Scheduler.
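For reference, a Cloud Scheduler job that hits a Cloud Run endpoint on the same kind of cron expression could look roughly like this (job name, URL, and service account are placeholders):
gcloud scheduler jobs create http spring-job-trigger \
    --schedule="0 7 * * *" \
    --uri="https://my-service-abc123-uc.a.run.app/tasks/run" \
    --http-method=POST \
    --oidc-service-account-email="scheduler-invoker@my-project.iam.gserviceaccount.com"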

Unresponsive socket after x time (puma - ruby)

I'm experiencing an unresponsive socket with my Puma setup after a random amount of time. Up to this point I don't have a clue what's causing the issue. I was hoping somebody here could help me with some answers or point me in the right direction. I have the following setup:
I'm using the official Docker ruby-2.2.3-slim image together with the latest Puma release, 2.15.3, and I've also installed Nginx as a reverse proxy. But I'm already sure Nginx isn't the problem here, because I've tried to verify whether the socket was working using this script, and the socket wasn't working: I got a timeout there as well, so I could rule out Nginx.
This is a testing environment, so the server isn't experiencing any extreme load. I've also checked memory consumption; it still has several GBs free, so that couldn't be the issue either.
What triggered me to look at the puma socket was the error message I got in my Nginx error logging:
upstream timed out (110: Connection timed out) while reading response header from upstream
Also, I couldn't find anything in the Puma logs indicating what is going wrong. Here is my Puma config:
threads 0, 16
app_dir = ENV.fetch('APP_HOME')
environment ENV['RAILS_ENV']
daemonize
bind "unix://#{app_dir}/sockets/puma.sock"
stdout_redirect "#{app_dir}/log/puma.stdout.log", "#{app_dir}/log/puma.stderr.log", true
pidfile "#{app_dir}/pids/puma.pid"
state_path "#{app_dir}/pids/puma.state"
activate_control_app
on_worker_boot do
  require 'active_record'
  ActiveRecord::Base.connection.disconnect! rescue ActiveRecord::ConnectionNotEstablished
  ActiveRecord::Base.establish_connection(YAML.load_file("#{app_dir}/config/database.yml")[ENV['RAILS_ENV']])
end
And this is the output in my Puma state file:
---
pid: 43
config: !ruby/object:Puma::Configuration
  cli_options:
  conf:
  options:
    :min_threads: 0
    :max_threads: 16
    :quiet: false
    :debug: false
    :binds:
    - unix:///APP/sockets/puma.sock
    :workers: 1
    :daemon: true
    :mode: :http
    :before_fork: []
    :worker_timeout: 60
    :worker_boot_timeout: 60
    :worker_shutdown_timeout: 30
    :environment: staging
    :redirect_stdout: "/APP/log/puma.stdout.log"
    :redirect_stderr: "/APP/log/puma.stderr.log"
    :redirect_append: true
    :pidfile: "/APP/pids/puma.pid"
    :state: "/APP/pids/puma.state"
    :control_url: unix:///tmp/puma-status-1449260516541-37
    :config_file: config/puma.rb
    :control_url_temp: "/tmp/puma-status-1449260516541-37"
    :control_auth_token: cda8879717be7a645ea323d931b88d4b
    :tag: APP
The application itself is a Rails app on the latest version, 4.2.5; it's deployed on GCE (Google Container Engine).
If somebody could give me some pointers on how to debug this further, it would be very much appreciated, because right now I don't see any output anywhere that could help me.
EDIT
I replaced the Unix socket with a TCP connection to Puma, with the same result: it still hangs after some time.
I'd start with:
How many requests get processed successfully per instance of puma?
Make sure you log the beginning and end of each request with the id of the thread executing it. What do you see?
Not knowing more about your application, I'd say it's likely the threads get stuck doing some long/blocking calls without timeouts or spinning on some computation until the whole thread pool gets depleted.
We'll see.
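A rough way to get that per-request, per-thread logging in a Rails app is a small Rack middleware, for example (a sketch; the class name and file locations are arbitrary):
# app/middleware/request_thread_logger.rb (hypothetical location)
class RequestThreadLogger
  def initialize(app)
    @app = app
  end

  def call(env)
    tid = Thread.current.object_id
    Rails.logger.info "[thread #{tid}] start #{env['REQUEST_METHOD']} #{env['PATH_INFO']}"
    @app.call(env)
  ensure
    Rails.logger.info "[thread #{tid}] done #{env['PATH_INFO']}"
  end
end
# registered in config/application.rb with:
# config.middleware.use RequestThreadLogger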
I finally found out why my application was behaving the way it was.
After trying a TCP connection and switching to Unicorn, I started looking into other possible sources.
That's when I thought maybe my connection to Google Cloud SQL could be the problem. Reading the Cloud SQL FAQ, they mention that you have to tweak your Compute Engine instances to ensure they keep your DB connection open. So I performed the steps they recommend, and that solved the problem for me. I've added them here just in case:
# Display the current tcp_keepalive_time value.
$ cat /proc/sys/net/ipv4/tcp_keepalive_time
# Set tcp_keepalive_time to 60 seconds and make it permanent across reboots.
$ echo 'net.ipv4.tcp_keepalive_time = 60' | sudo tee -a /etc/sysctl.conf
# Apply the change.
$ sudo /sbin/sysctl --load=/etc/sysctl.conf
# Display the tcp_keepalive_time value to verify the change was applied.
$ cat /proc/sys/net/ipv4/tcp_keepalive_time

systemd: How to use ExecStopPre in service files

Before my daemon is stopped I need to call another program.
My first try was to use ExecStopPre, similar to ExecStartPre, but according to https://bugs.freedesktop.org/show_bug.cgi?id=73177 this is not supported and I should use "multiple ExecStop".
Does anyone have an example of this? How should I kill the daemon from ExecStop?
You put in multiple ExecStop lines, e.g. (from a Node.js service):
[Service]
ExecStartPre=/usr/local/bin/npm run build
ExecStartPre=-/bin/rm local.sock
ExecStart=/usr/local/bin/npm --parseable start
ExecStop=/usr/local/bin/npm --parseable stop
ExecStop=-/bin/rm local.sock
RestartSec=300
Restart=always
StandardOutput=syslog
StandardError=syslog
SyslogIdentifier=nodejs
User=nobody
Group=nobody
Environment=NODE_ENV=dev
Environment=PORT=3000
WorkingDirectory=/var/www/nodejs/quaff
UMask=007
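If the daemon doesn't exit from its own stop command, you can also kill it explicitly as one of the ExecStop steps, for example (a sketch; the pre-stop program path is made up, and systemd will still send its configured KillSignal to anything left running afterwards):
[Service]
# Run the extra program first, then stop the main process.
ExecStop=/usr/local/bin/my-pre-stop-hook
ExecStop=/bin/kill -TERM $MAINPID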

heroku apps not running forever script.js

I have a Heroku app created on my local system and pushed to Heroku. Then I ran
heroku run node_modules/forever/bin/forever start server.js
and I got this response:
warn: --minUptime not set. Defaulting to: 1000ms
warn: --spinSleepTime not set. Your script will exit if it does not stay up for at least 1000ms
info: Forever processing file: server.js
After that, if I run
heroku run node_modules/forever/bin/forever list
I got:
Running `node_modules/forever/bin/forever list` attached to terminal... up, run.5132
info: No forever processes running
and the Heroku logs have:
Starting process with command `node_modules/forever/bin/forever start server.js` by harshitladdha93@gmail.com
2014-07-05T17:24:54.833343+00:00 heroku[run.5098]: State changed from starting to up
2014-07-05T17:24:58.695683+00:00 heroku[run.5098]: State changed from up to complete
2014-07-05T17:24:58.689043+00:00 heroku[run.5098]: Process exited with status 0
and my server.js has:
var async = require('async');
var shell = require('shelljs');
async.parallel([
  async.apply(shell.exec, './collect1.sh'),
  async.apply(shell.exec, './collect2.sh'),
  async.apply(shell.exec, './collect3.sh'),
  async.apply(shell.exec, './collect4.sh'),
  async.apply(shell.exec, './collect5.sh'),
  async.apply(shell.exec, './mi2.sh'),
],
function (err, results) {
  console.log(results);
});
These shell scripts are long-running files with large delays, but the logs say the state went from up to complete, and I don't understand why; on my local system it spawns additional processes and sleep states and runs fine.
So either Heroku doesn't allow this, or I am making some mistake here.
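For comparison, the conventional way to keep a long-running Node process alive on Heroku is to declare it as a Procfile process type instead of daemonizing it from a one-off dyno, roughly:
# Procfile
worker: node server.js
and then scale it with heroku ps:scale worker=1.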
