Kubernetes Pod terminates with Exit Code 143 - spring-boot

I am running a containerized Spring Boot application in Kubernetes, but the application automatically exits and restarts with exit code 143 and the error message "Error".
I am not sure how to identify the reason for this error.
My first idea was that Kubernetes stopped the container due to excessive resource usage, as described here, but I can't find the corresponding kubelet logs.
Is there any way to identify the cause/origin of the SIGTERM? Maybe from spring-boot itself, or from the JVM?

Exit Code 143
It denotes that the process was terminated by an external signal.
The number 143 is the sum of two numbers: 128 + x, where x is the number of the signal sent to the process that caused it to terminate.
In this case, x equals 15, which is the number of the SIGTERM signal, meaning the process was asked to shut down gracefully (SIGKILL, by contrast, kills the process forcibly and shows up as exit code 137).
Hope this helps.
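If you want to confirm this on your own pod, the last termination state recorded by Kubernetes shows the exit code and reason. A minimal sketch (the pod name is a placeholder):
kubectl describe pod <pod-name> | grep -A5 "Last State"
kubectl get pod <pod-name> -o jsonpath='{.status.containerStatuses[0].lastState.terminated}'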

I've just run into this exact same problem. I was able to track down the origin of the Exit Code 143 by looking at the logs on the Kubernetes nodes (note: the logs on the node, not the pod). (I use Lens as an easy way to get a node shell, but there are other ways.)
Then, if you search /var/log/messages for "terminated", you'll see something like this:
Feb 2 11:52:27 np-26992252-3 kubelet[23125]: I0202 11:52:27.541751 23125 kubelet.go:2214] "SyncLoop (probe)" probe="liveness" status="unhealthy" pod="default/app-compute-deployment-56ccffd87f-8s78v"
Feb 2 11:52:27 np-26992252-3 kubelet[23125]: I0202 11:52:27.541920 23125 kubelet.go:2214] "SyncLoop (probe)" probe="readiness" status="" pod="default/app-compute-deployment-56ccffd87f-8s78v"
Feb 2 11:52:27 np-26992252-3 kubelet[23125]: I0202 11:52:27.543274 23125 kuberuntime_manager.go:707] "Message for Container of pod" containerName="app" containerStatusID={Type:containerd ID:c3426d6b07fe3bd60bcbe675bab73b6b4b3619ef4639e1c23bca82692633765e} pod="default/app-compute-deployment-56ccffd87f-8s78v" containerMessage="Container app failed liveness probe, will be restarted"
Feb 2 11:52:27 np-26992252-3 kubelet[23125]: I0202 11:52:27.543374 23125 kuberuntime_container.go:723] "Killing container with a grace period" pod="default/app-compute-deployment-56ccffd87f-8s78v" podUID=89fdc1a2-3a3b-4d57-8a4d-ab115e52dc85 containerName="app" containerID="containerd://c3426d6b07fe3bd60bcbe675bab73b6b4b3619ef4639e1c23bca82692633765e" gracePeriod=30
Feb 2 11:52:27 np-26992252-3 containerd[22741]: time="2023-02-02T11:52:27.543834687Z" level=info msg="StopContainer for \"c3426d6b07fe3bd60bcbe675bab73b6b4b3619ef4639e1c23bca82692633765e\" with timeout 30 (s)"
Feb 2 11:52:27 np-26992252-3 containerd[22741]: time="2023-02-02T11:52:27.544593294Z" level=info msg="Stop container \"c3426d6b07fe3bd60bcbe675bab73b6b4b3619ef4639e1c23bca82692633765e\" with signal terminated"
The bit to look out for is containerMessage="Container app failed liveness probe, will be restarted"
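If the liveness probe is what is killing the app (for example a slow-starting Spring Boot service that misses its first checks), relaxing the probe in the Deployment is a common mitigation. A minimal sketch, assuming the app exposes the Spring Boot actuator liveness endpoint on port 8080 (adjust path, port, and timings to your setup):
livenessProbe:
  httpGet:
    path: /actuator/health/liveness
    port: 8080
  initialDelaySeconds: 60
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 3
With these values the kubelet waits 60 seconds before the first check and only restarts the container after three consecutive failures.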

Related

What is causing the systemd error: "Timer unit lacks value setting" (no other error messages?)

I have the following timer unit:
[Unit]
Description=Timer for Hive management command: process_payments
[Timer]
Unit=hive-manage#process_payments.service
OnCalendar=*:0/20
[Install]
WantedBy=hive.target
When I check the timer status using systemctl status hive-manage#process-payments.timer, I see the following error in the logs:
● hive-manage#process-payments.timer - Timer for Hive management command: process-payments
Loaded: error (Reason: Invalid argument)
Active: inactive (dead)
Mar 02 21:28:39 boldidea systemd[1]: hive-manage#process-payments.timer: Timer unit lacks value setting. Refusing.
Mar 02 21:39:06 boldidea systemd[1]: hive-manage#process-payments.timer: Timer unit lacks value setting. Refusing.
Mar 02 21:39:27 boldidea systemd[1]: hive-manage#process-payments.timer: Timer unit lacks value setting. Refusing.
After some searching, it seems most people get an accompanying message that gives more detail on the error; however, I am not getting any context other than "Timer unit lacks value setting".
This error is not very helpful -- I'm unaware of any setting named "value".
It turns out I had an older unit called process-payments, and it was later renamed to process_payments (underscore instead of hyphen). I was referencing the old name in my systemctl status command.
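A quick way to spot a stale or renamed unit like this is to ask systemd what it actually has loaded; a sketch (the pattern simply matches the unit names from the question):
systemctl daemon-reload
systemctl list-timers --all
systemctl list-unit-files 'hive-manage*'
The old hyphenated unit will still show up in that list if it is installed anywhere.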

Why do I get errors running Redis in a Laravel app?

I need to run a Laravel 5 app on my local Kubuntu 18 machine, and I need to run a Redis server for this app.
I installed Redis, and in /etc/redis/redis.conf I uncommented the line:
requirepass foobared
In .env I modified the Redis config:
REDIS_HOST=http://127.0.0.1:8000 # I run app with command : php artisan serve
REDIS_PASSWORD=foobared
REDIS_PORT=6379 # default port
I restarted Redis and checked its status:
$ sudo service redis status
[sudo] password for serge:
● redis-server.service - Advanced key-value store
Loaded: loaded (/lib/systemd/system/redis-server.service; enabled; vendor preset: enabled)
Active: active (running) since Fri 2021-02-26 13:52:09 EET; 6min ago
Docs: http://redis.io/documentation,
man:redis-server(1)
Process: 1545 ExecStop=/bin/kill -s TERM $MAINPID (code=exited, status=0/SUCCESS)
Process: 1548 ExecStart=/usr/bin/redis-server /etc/redis/redis.conf (code=exited, status=0/SUCCESS)
Main PID: 1574 (redis-server)
Tasks: 4 (limit: 4915)
CGroup: /system.slice/redis-server.service
└─1574 /usr/bin/redis-server 127.0.0.1:6379
Feb 26 13:52:08 AtHome systemd[1]: Starting Advanced key-value store...
Feb 26 13:52:09 AtHome systemd[1]: redis-server.service: Can't open PID file /var/run/redis/redis-server.pid (yet?) after start: No such file or directory
Feb 26 13:52:09 AtHome systemd[1]: Started Advanced key-value store.
I see the "Can't open PID file" message above.
But I do have this file:
root@AtHome:/run/redis# ls -la
total 4
drwxr-sr-x 2 redis redis 60 Feb 26 14:32 .
drwxr-xr-x 38 root root 1160 Feb 26 14:32 ..
-rw-rw---- 1 redis redis 6 Feb 26 14:32 redis-server.pid
root@AtHome:/run/redis# cat redis-server.pid
22676
And I got an error on the next command:
$ laravel-echo-server start
Error: The config file could not be found.
Is this error related to the PID file message above?
How can it be fixed?
In composer.json:
"laravel/framework": "5.5.*",
"predis/predis": "^1.1",
MODIFIED #1:
In /etc/redis/redis.conf I found:
pidfile /var/run/redis/redis-server.pid
When I installed Ubuntu I put /var on a separate partition, so I have this in /etc/fstab:
UUID=e531d8c5-530c-4533-a949-9fd5a62e0821 / ext4 errors=remount-ro 0 1
# /boot was on /dev/sdb1 during installation
UUID=23cc34a1-2be9-43b1-9c79-8e53af7bc799 /boot ext4 defaults 0 2
# /var was on /dev/sdb5 during installation
UUID=57c14b70-da85-4c5b-be6f-45174147d987 /var ext4 defaults 0 2
That is why /var/run/redis/redis-server.pid appears as /run/redis/redis-server.pid in my console output.
I do not know whether that could be the key to this problem. How can it be solved?
MODIFIED #2:
Yes, Redis started, and the status command shows:
Active: active (running)
I see the error message:
Feb 26 13:52:09 AtHome systemd[1]: redis-server.service: Can't open PID file /var/run/redis/redis-server.pid
I do not know how critical it is.
I need to run laravel-echo-server in the root of my app for Redis to work, but I got an error:
$ laravel-echo-server start
Error: The config file could not be found.
I suppose that 2) depends on 1), but I am not sure; I am new to Redis and have only general knowledge of laravel-echo-server...
MODIFIED #3:
Searching for how to test Redis, I found some tests to run, but it looks like not everything works properly:
app_root$ redis-cli
127.0.0.1:6379> set greetings "Hello World!"
(error) NOAUTH Authentication required.
127.0.0.1:6379> get greetings
(error) NOAUTH Authentication required.
127.0.0.1:6379> exit
app_root$ sudo systemctl restart redis
[sudo] password for serge:
app_root$ redis-cli
127.0.0.1:6379> get greetings
(error) NOAUTH Authentication required.
127.0.0.1:6379>
But I am still not sure whether this is the reason I got the error running laravel-echo-server.
Thanks!
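A note on the NOAUTH errors in the last transcript: once requirepass is enabled in redis.conf, every redis-cli session has to authenticate before running commands (Laravel should do this for the app via REDIS_PASSWORD). A minimal check, using the password from the question:
redis-cli -a foobared ping    # should answer PONG
redis-cli
127.0.0.1:6379> AUTH foobared
OK
127.0.0.1:6379> set greetings "Hello World!"
OK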

Postgres Backup Restoration Issue

My objective is simple: just take a backup and restore it on another machine, which has no relation to the running cluster.
My steps:
1. Run pg_basebackup remotely onto the new machine.
2. rm -fr ../../main/
3. mv backup/main/ ../../main/
4. Start the postgres service.
No errors occurred during the backup.
But I am getting this error:
2018-12-13 10:05:12.437 IST [834] LOG:  database system was shut down in recovery at 2018-12-12 23:01:58 IST
2018-12-13 10:05:12.437 IST [834] LOG:  invalid primary checkpoint record
2018-12-13 10:05:12.437 IST [834] LOG:  invalid secondary checkpoint record
2018-12-13 10:05:12.437 IST [834] PANIC:  could not locate a valid checkpoint record
2018-12-13 10:05:12.556 IST [833] LOG:  startup process (PID 834) was terminated by signal 6: Aborted
2018-12-13 10:05:12.556 IST [833] LOG:  aborting startup due to startup process failure
2018-12-13 10:05:12.557 IST [833] LOG:  database system is shut down
Based on the answer to a very similar question (How to mount a pg_basebackup on a standalone server to retrieve accidentally deleted data), and on the fact that that answer helped me get this working glitch-free, the steps are:
do the basebackup, or copy/untar a previously made one, to the right location /var/lib/postgresql/9.5/main
remove the file backup_label
run /usr/lib/postgresql/9.5/bin/pg_resetxlog -f /var/lib/postgresql/9.5/main
start postgres service
(replying to this old question because it is the first one I found when looking for the solution to the same problem).
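A rough shell sketch of those steps (paths are the 9.5 defaults from the answer above; host and user are placeholders, and on PostgreSQL 10+ the tool is pg_resetwal instead of pg_resetxlog):
sudo systemctl stop postgresql
sudo -u postgres pg_basebackup -h <primary-host> -U <replication-user> -D /var/lib/postgresql/9.5/main -X stream -P
sudo rm /var/lib/postgresql/9.5/main/backup_label
sudo -u postgres /usr/lib/postgresql/9.5/bin/pg_resetxlog -f /var/lib/postgresql/9.5/main
sudo systemctl start postgresql
Note that pg_resetxlog -f can discard recent transactions, so only run it on a copy you can afford to re-take.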

Chronos insufficient resources warning

I'm trying to run Chronos on Mesos, but all my jobs are stuck in a queueing state.
systemctl status chronos -l shows:
Mar 20 20:21:08 core-mq3 chronos[17940]: [2017-03-20 20:21:08,985] WARN Insufficient resources remaining for task 'ct:1490040556081:0:JobName:', will append to queue. (Needed: [cpus: 0.5 mem: 256.0 disk: 256.0], Found: [cpus: 1.8 mem: 11034.0 disk: 60398.8,cpus: 2.0 mem: 6542.0 disk: 60399.0]) (org.apache.mesos.chronos.scheduler.mesos.MesosJobFramework:155)
So, it is refusing the offers even though all the resources are more than required.
This was a red herring. There was a constraint that the agent did not fulfill, which is why it couldn't run the task.
Running curl -X GET "<chronos>/scheduler/jobs/search?name=<job>" gave me all the details of the job, which I used to verify that the constraint was not being fulfilled.
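For illustration, the job JSON returned by that endpoint includes a constraints field you can compare against the attributes your Mesos agents advertise (host, port, and attribute names below are placeholders):
curl "http://<chronos-host>:<port>/scheduler/jobs/search?name=JobName" | python -m json.tool
# look for something like: "constraints": [["rack", "EQUALS", "rack-1"]]
curl "http://<mesos-master>:5050/slaves" | python -m json.tool | grep -A5 '"attributes"'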

HA - Pacemaker - Is there a way to clean automatically failed actions after X sec/min/hour?

I'm using Pacemaker + Corosync on CentOS 7.
When one of my resources fails/stops, I get a failed-action message:
Master/Slave Set: myoptClone01 [myopt_data01]
Masters: [ pcmk01-cr ]
Slaves: [ pcmk02-cr ]
myopt_fs01 (ocf::heartbeat:Filesystem): Started pcmk01-cr
myopt_VIP01 (ocf::heartbeat:IPaddr2): Started pcmk01-cr
ServicesResource (ocf::heartbeat:RADviewServices): Started pcmk01-cr
Failed Actions:
* ServicesResource_monitor_120000 on pcmk02-cr 'unknown error' (1): call=141, status=complete, exitreason='none',
last-rc-change='Mon Jan 30 10:19:36 2017', queued=0ms, exec=142ms
Is there a way to clean automatically the failed actions after X sec/min/hour?
Look into the 'failure-timeout' resource option. This will automatically clean up the failed action if no further failures for the particular resource have occurred within the failure-timeout window.
I believe the failure-timeout is evaluated during the cluster-recheck-interval, which means that even if you have failure-timeout configured to 1 minute, it may still take up to 15 minutes and 59 seconds to clear the failed action with Pacemaker's default 15-minute cluster-recheck-interval.
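A sketch of how that can be set with pcs (resource name taken from the status output above; the values are only examples):
pcs resource update ServicesResource meta failure-timeout=60s
# optionally make the cluster re-evaluate more often than the 15-minute default
pcs property set cluster-recheck-interval=5min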
More information:
http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/s-failure-migration.html
http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/s-resource-options.html
