Heroku restarting with SIGTERM status 143

I have a scraper running on Heroku. It has been running for a while (about two months); on some days it does great and reaches its maximum of 1,000, and on other days it just magically restarts.
Does anyone know what the reason for such a restart could be? The scraper shows no errors; the only thing I can find is the message below in the Heroku logs:
Feb 05 03:02:55 scraper heroku/web.1: Cycling
Feb 05 03:02:55 scraper heroku/web.1: State changed from up to starting
Feb 05 03:02:57 scraper heroku/web.1: Stopping all processes with SIGTERM
Feb 05 03:02:57 scraper heroku/web.1: Process exited with status 143
Feb 05 03:03:16 scraper heroku/web.1: Starting process with command `npm start`

The Cycling line in the log is the interesting one.
Heroku automatically restarts every dyno roughly every 24 hours; this process is called "cycling", and it is what you're seeing here. The SIGTERM and exit status 143 (128 + 15, i.e. terminated by SIGTERM) are just the normal shutdown that accompanies that restart, not an error in your scraper.
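If you want to confirm that a given restart was the daily cycle rather than a crash, the dyno uptimes and the restart events are visible from the Heroku CLI. A minimal check, assuming the CLI is installed and my-scraper-app is a placeholder for your app name:
heroku ps --app my-scraper-app          # shows each dyno's state and how long it has been up
heroku logs --tail --app my-scraper-app # watch for "Cycling" / "State changed from up to starting"
A dyno that has been up for close to 24 hours and then logs Cycling is behaving as documented.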

Related

can't start minio server in ubuntu with systemctl start minio

I configured a MinIO server instance on Ubuntu 18.04 following the guide at https://www.digitalocean.com/community/tutorials/how-to-set-up-an-object-storage-server-using-minio-on-ubuntu-18-04.
After the installation, the server failed to start with the command "sudo systemctl start minio". The error says:
root@iZbp1icuzly3aac0dmjz9aZ:~# sudo systemctl status minio
● minio.service - MinIO
Loaded: loaded (/etc/systemd/system/minio.service; enabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Thu 2021-12-23 17:11:56 CST; 4s ago
Docs: https://docs.min.io
Process: 9085 ExecStart=/usr/local/bin/minio server $MINIO_OPTS $MINIO_VOLUMES (code=exited, status=1/FAILURE)
Process: 9084 ExecStartPre=/bin/bash -c if [ -z "${MINIO_VOLUMES}" ]; then echo "Variable MINIO_VOLUMES not set in /etc/default/minio"; exit 1; fi (code=exited, status=0/SUCCESS)
Main PID: 9085 (code=exited, status=1/FAILURE)
Dec 23 17:11:56 iZbp1icuzly3aac0dmjz9aZ systemd[1]: minio.service: Main process exited, code=exited, status=1/FAILURE
Dec 23 17:11:56 iZbp1icuzly3aac0dmjz9aZ systemd[1]: minio.service: Failed with result 'exit-code'.
Dec 23 17:11:56 iZbp1icuzly3aac0dmjz9aZ systemd[1]: minio.service: Service hold-off time over, scheduling restart.
Dec 23 17:11:56 iZbp1icuzly3aac0dmjz9aZ systemd[1]: minio.service: Scheduled restart job, restart counter is at 5.
Dec 23 17:11:56 iZbp1icuzly3aac0dmjz9aZ systemd[1]: Stopped MinIO.
Dec 23 17:11:56 iZbp1icuzly3aac0dmjz9aZ systemd[1]: minio.service: Start request repeated too quickly.
Dec 23 17:11:56 iZbp1icuzly3aac0dmjz9aZ systemd[1]: minio.service: Failed with result 'exit-code'.
Dec 23 17:11:56 iZbp1icuzly3aac0dmjz9aZ systemd[1]: Failed to start MinIO.
It looks like the reason is "Variable MINIO_VOLUMES not set in /etc/default/minio".
However, I double-checked the file /etc/default/minio:
MINIO_ACCESS_KEY="minioadmin"
MINIO_VOLUMES="/usr/local/share/minio/"
MINIO_OPTS="-C /etc/minio --address localhost:9001"
MINIO_SECRET_KEY="minioadmin"
I have set the value of MINIO_VOLUMES.
I tried to start it manually with minio server --address :9001 /usr/local/share/minio/, and that works.
Now I don't know what goes wrong when starting the MinIO server via systemctl start minio.
I'd recommend sticking to the official documentation wherever possible. It's intended for distributed deployments but the only real change is that your MINIO_VOLUMES will be for a single node/drive.
I would recommend trying a combination of things here:
Review minio.service and ensure the user/group it runs as actually exists
Review the file permissions on the path set in MINIO_VOLUMES
Now for the why:
My guess without seeing further logs (journalctl -u minio would have been helpful here) is that this is a combination of two things:
the minio.service user/group doesn't have rwx permissions on the /usr/local/share/minio path,
you are missing an environment variable we recently introduced to prevent users from pointing at their root drive (this was intended as a safety measure, but somewhat complicates these kinds of smaller setups).
Take a look at the User/Group lines in the minio.service file - I'm assuming that is the unit you are using, based on the instructions in the DO guide.
If you run ls -al /usr/local/share/minio, I would venture it is owned by root for both user and group, with limited write access, if any.
Hope this helps - for further troubleshooting, having at least 10-20 lines from journalctl is invaluable, as it would show the actual error and not just the final exit message.
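As a concrete sketch of those two checks, assuming the unit's User= and Group= are minio-user as in the DigitalOcean guide:
# Create the service account if it doesn't exist and give it ownership of the data path
sudo groupadd -r minio-user
sudo useradd -M -r -g minio-user minio-user
sudo chown -R minio-user:minio-user /usr/local/share/minio
ls -al /usr/local/share/minio
# Restart and read the actual error rather than just the exit status
sudo systemctl restart minio
sudo journalctl -u minio -n 20 --no-pager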

Compute Engine start node server on instance startup

I am trying to run a Discord bot as a Node application on a free Compute Engine instance. I am struggling to make a script that actually starts the Node app.
I created this script and added it as the startup-script metadata from a file:
cd code/movo-tron-2000 && npm start &
I checked that the script runs with sudo google_metadata_script_runner --script-type startup --debug, but when I restart the instance, the app doesn't start. Running sudo journalctl -u google-startup-scripts.service prints the following logs:
Apr 20 12:19:08 bot-vm systemd[1]: Starting Google Compute Engine Startup Scripts...
Apr 20 12:19:09 bot-vm startup-script[691]: INFO Starting startup scripts.
Apr 20 12:19:09 bot-vm startup-script[691]: INFO Found startup-script in metadata.
Apr 20 12:19:09 bot-vm startup-script[691]: INFO startup-script: /startup-od52epug/tmpjy_z4vue: line 1: cd: code/mo
Apr 20 12:19:09 bot-vm startup-script[691]: INFO startup-script: Return code 0.
Apr 20 12:19:09 bot-vm startup-script[691]: INFO Finished running startup scripts.
Apr 20 12:19:09 bot-vm systemd[1]: Started Google Compute Engine Startup Scripts.
I see that the script gets executed, but the app also gets terminated. The app listens for requests, so it needs to keep running. I assume the startup script runs in the same process as the Google Compute Engine startup-script runner and therefore gets terminated so the VM boot can continue. What should I change in my startup script so that my app starts properly and is not terminated by the instance?
Edit: I set up the following systemd service and script at their corresponding locations
Service:
[Unit]
Description=Start bot
[Service]
ExecStart=/home/me_adi_hf/code/movo-tron-2000/start.sh
[Install]
WantedBy=default.target
Script:
#!/bin/sh
date > /root/bot_report.txt
du -sh /home/ >> /root/bot_report.txt
But when running sudo systemctl start bot.service and then checking its status with sudo systemctl status bot.service, I get this output, indicating an Exec format error:
bot-start.service - Start bot
Loaded: loaded (/etc/systemd/system/bot-start.service; enabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Tue 2020-04-21 09:43:01 UTC; 9s ago
Process: 19303 ExecStart=/home/me_adi_hf/code/movo-tron-2000/start.sh (code=exited, status=203/EXEC)
Main PID: 19303 (code=exited, status=203/EXEC)
Apr 21 09:43:01 bot-vm systemd[1]: Started Start bot.
Apr 21 09:43:01 bot-vm systemd[19303]: bot-start.service: Failed at step EXEC spawning /home/me_adi_hf/code/movo-tron-2000/start.sh: Exec format error
Apr 21 09:43:01 bot-vm systemd[1]: bot-start.service: Main process exited, code=exited, status=203/EXEC
Apr 21 09:43:01 bot-vm systemd[1]: bot-start.service: Unit entered failed state.
Apr 21 09:43:01 bot-vm systemd[1]: bot-start.service: Failed with result 'exit-code'.
I am not sure what causes the error, since the service file syntax looks correct.
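For what it's worth, status=203/EXEC means systemd could not execute the file at all, before anything inside the script runs. A hedged checklist, using the paths from the question:
ls -l /home/me_adi_hf/code/movo-tron-2000/start.sh     # the script must be executable by the unit's user
chmod +x /home/me_adi_hf/code/movo-tron-2000/start.sh
head -n 1 /home/me_adi_hf/code/movo-tron-2000/start.sh | cat -A   # the shebang must be the first bytes, with no CRLF line endings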

Hadoop container failed even though 100 percent completed

I have set up a small cluster with Hadoop 2.7, HBase 0.98 and Nutch 2.3.1. I have written a custom job that first combines docs of the same domain; after that, each URL of a domain is obtained from a cache (i.e., a list), the corresponding key is used to fetch the object via datastore.get(url_key), and after updating the score it is written back via context.write.
The job should complete once all docs are processed, but what I have observed is that each attempt fails due to a timeout even though its progress shows 100 percent complete. Here is the log:
attempt_1549963404554_0110_r_000001_1 100.00 FAILED reduce > reduce node2:8042 logs Thu Feb 21 20:50:43 +0500 2019 Fri Feb 22 02:11:44 +0500 2019 5hrs, 21mins, 0sec AttemptID:attempt_1549963404554_0110_r_000001_1 Timed out after 1800 secs Container killed by the ApplicationMaster. Container killed on request. Exit code is 143 Container exited with a non-zero exit code 143
attempt_1549963404554_0110_r_000001_3 100.00 FAILED reduce > reduce node1:8042 logs Fri Feb 22 04:39:08 +0500 2019 Fri Feb 22 07:25:44 +0500 2019 2hrs, 46mins, 35sec AttemptID:attempt_1549963404554_0110_r_000001_3 Timed out after 1800 secs Container killed by the ApplicationMaster. Container killed on request. Exit code is 143 Container exited with a non-zero exit code 143
attempt_1549963404554_0110_r_000002_0 100.00 FAILED reduce > reduce node3:8042 logs Thu Feb 21 12:38:45 +0500 2019 Thu Feb 21 22:50:13 +0500 2019 10hrs, 11mins, 28sec AttemptID:attempt_1549963404554_0110_r_000002_0 Timed out after 1800 secs Container killed by the ApplicationMaster. Container killed on request. Exit code is 143 Container exited with a non-zero exit code 143
Why is that? When an attempt is 100.00 percent complete, it should be marked as successful. Unfortunately, there is no error information other than the timeout in my case. How can I debug this problem?
My reducer is more or less the one posted in another question:
Apache Nutch 2.3.1 map-reduce timeout occurred while updating the score
I have observed that, in the three attempts shown, the execution times vary by a large margin. Please take another look at the job you are executing.
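For context, the "Timed out after 1800 secs" message corresponds to Hadoop's mapreduce.task.timeout property: a task is killed if it neither reads input, writes output, nor reports status for that long. A reducer doing long per-key work should call context.progress() periodically, or the limit can be raised. A hedged sketch of the configuration change in mapred-site.xml (value in milliseconds):
<!-- raise the no-progress timeout from 1800 s to 3600 s -->
<property>
  <name>mapreduce.task.timeout</name>
  <value>3600000</value>
</property>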

What is wrong with my systemd service

I have written a Golang REST API based on labstack/echo and Vue.js, have a working version compiled, and everything runs nicely when I start it. So far so good.
However, when trying to integrate it with systemd to start the process at boot, I am stuck. I have a service file:
[Unit]
Description=Server Software Manager
After=network.target
[Service]
Type=simple
ExecStart=/var/gameserver/steam/sman
KillMode=process
User=steam
Group=steam
Restart=on-failure
SuccessExitStatus=2
[Install]
WantedBy=multi-user.target
Alias=sman.service
But every time I want to start the service I get the following error:
Feb 25 14:17:49 <SERVERNAME> systemd[1]: Stopped Server Software Manager.
Feb 25 14:17:49 <SERVERNAME> systemd[1]: Started Server Software Manager.
Feb 25 14:17:49 <SERVERNAME> systemd[1]: sman.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
Feb 25 14:17:49 <SERVERNAME> systemd[1]: sman.service: Unit entered failed state.
Feb 25 14:17:49 <SERVERNAME> systemd[1]: sman.service: Failed with result 'exit-code'.
Feb 25 14:17:50 <SERVERNAME> systemd[1]: sman.service: Service hold-off time over, scheduling restart.
Feb 25 14:17:50 <SERVERNAME> systemd[1]: Stopped Server Software Manager.
Feb 25 14:17:50 <SERVERNAME> systemd[1]: sman.service: Start request repeated too quickly.
Feb 25 14:17:50 <SERVERNAME> systemd[1]: Failed to start Server Software Manager.
Feb 25 14:19:59 <SERVERNAME> systemd[1]: Started Server Software Manager.
According to Google, that error occurs when the service exits with an error code, but when I run the service manually as the steam user it does not do that.
My assumption is that something is wrong with the unit file, but I don't know what, and systemd-analyze has not complained either.
I am completely lost and thankful for any leads you might have to help debug this.
The output of journalctl -xfe -u sman:
Feb 26 14:18:23 <SERVERNAME> systemd[1]: Started Server Software Manager.
-- Subject: Unit sman.service has finished start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit sman.service has finished starting up.
--
-- The start-up result is done.
Notes:
OS: Ubuntu 16.04 LTS
I finally managed to find out what was wrong: systemd needs all environment variables mentioned explicitly. However, I also had a lot of trouble getting systemd to reload the service. This is what I ended up doing: stop the service, move the .service file to a location systemd does not read, reload systemd with systemctl daemon-reload, and wait until systemctl status $SERVICE claims there is no such service (systemd may still re-read its cache if status claims anything else). After that, put the .service file back in the appropriate location and, voilà, it finally worked.
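As a sketch of the first part of that fix: any environment the binary would otherwise inherit from an interactive shell has to be declared in the unit, for example via Environment= lines or an EnvironmentFile=. The values below are placeholders, not the variables sman actually needs:
[Service]
# systemd does not pass along a login shell's environment; declare it explicitly
Environment="HOME=/var/gameserver/steam"
EnvironmentFile=-/etc/default/sman
After editing the unit, run systemctl daemon-reload and then systemctl restart sman.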

Kubernetes Installation with Vagrant & CoreOS behind proxy

I am behind a proxy server and following the "Kubernetes Installation with Vagrant & CoreOS" steps listed here: https://coreos.com/kubernetes/docs/latest/kubernetes-on-vagrant.html
After finalizing the install, when I run
$ kubectl get nodes
I get this error:
Unable to connect to the server: Service Unavailable
e1, c1, and w1 are up and I can vagrant ssh into each of them.
When I check w1, I see that the docker service is not running, with the error listed below:
----------------------------------------------------------------------------
-- Unit docker.service has failed.
--
-- The result is dependency.
Aug 19 04:09:25 w1 systemd[1]: docker.service: Job docker.service/start failed with result 'dependency'.
Aug 19 04:09:25 w1 systemd[1]: flanneld.service: Unit entered failed state.
Aug 19 04:09:25 w1 systemd[1]: flanneld.service: Failed with result 'exit-code'.
Aug 19 04:09:30 w1 systemd[1]: flanneld.service: Service hold-off time over, scheduling restart.
Aug 19 04:09:30 w1 systemd[1]: Stopped Network fabric for containers.
-- Subject: Unit flanneld.service has finished shutting down
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit flanneld.service has finished shutting down.
Aug 19 04:09:30 w1 systemd[1]: Starting Network fabric for containers...
-- Subject: Unit flanneld.service has begun start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit flanneld.service has begun starting up.
Aug 19 04:09:30 w1 rkt[6888]: image: using image from file /usr/lib/rkt/stage1-images/stage1-fly.aci
Aug 19 04:09:31 w1 rkt[6888]: image: searching for app image quay.io/coreos/flannel
Aug 19 04:09:31 w1 rkt[6888]: run: discovery failed
Aug 19 04:09:31 w1 systemd[1]: flanneld.service: Main process exited, code=exited, status=1/FAILURE
Aug 19 04:09:31 w1 systemd[1]: Failed to start Network fabric for containers.
-- Subject: Unit flanneld.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit flanneld.service has failed.
--
-- The result is failed.
Aug 19 04:09:31 w1 systemd[1]: flanneld.service: Unit entered failed state.
Aug 19 04:09:31 w1 systemd[1]: flanneld.service: Failed with result 'exit-code'.
----------------------------------------------------------------------------
I am guessing that the problem is that I am behind the proxy. Before running the install steps I issue these commands:
$export "HTTP_PROXY=http://http-proxy.xxxxxx.com:8080"
$export "HTTPS_PROXY=http://http-proxy.xxxxxx.com:8080"
$export "http_proxy=http://http-proxy.xxxxxx.com:8080"
$export "https_proxy=http://http-proxy.xxxxxx.com:8080"
Do you know if this is enough for an installation behind a proxy, or do I need to add the proxy settings somewhere else?
Thank you in advance,
turgos
The variables you're exporting are valid only in your current shell session; they are not available to your flanneld systemd unit.
Create the following drop-in inside the systemd unit directory and then reload the daemon with systemctl daemon-reload; that should fix your issue with flannel:
/etc/systemd/system/flanneld.service.d/proxy.conf:
[Service]
Environment="HTTP_PROXY=http://http-proxy.xxx:8080"
Environment="...
A similar example is available in the CoreOS documentation: Customizing Docker
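Since the log also shows docker.service failing (as a dependency of flanneld), the Docker daemon will likewise need the proxy settings to pull images. A hedged sketch of the equivalent drop-in, reusing the proxy address from the question:
# /etc/systemd/system/docker.service.d/http-proxy.conf
[Service]
Environment="HTTP_PROXY=http://http-proxy.xxxxxx.com:8080"
Environment="HTTPS_PROXY=http://http-proxy.xxxxxx.com:8080"
Environment="NO_PROXY=localhost,127.0.0.1"
Then run sudo systemctl daemon-reload and sudo systemctl restart docker.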
