systemd : two services are running together - systemd

I use a simple unit file:
[Unit]
Description = description here
After = multi-user.target
[Service]
Type=simple
ExecStart = /usr/lib/name_deamon/CP_linux/CP_linux
Restart = on-failure
TimeoutStopSec = infinity
[Install]
WantedBy=custom.target
Then, when I run
systemctl --user status name.service
I get two identical processes running in parallel:
● name.service - description here
Loaded: loaded (/home/ubuntu/.config/systemd/user/name.service; enabled; vendor preset: enabled)
Active: active (running) since Tue 2021-11-02 11:03:47 CET; 13min ago
Main PID: 1625 (CP_linux_test.e)
Tasks: 2 (limit: 4384)
Memory: 14.2M
CPU: 122ms
CGroup: /user.slice/user-1000.slice/user@1000.service/app.slice/name.service
├─1625 /usr/lib/name_deamon/CP_linux_test/CP_linux_test.exe
└─1627 /usr/lib/name_deamon/CP_linux_test/CP_linux_test.exe
Since I have only one ExecStart, I don't understand why I get two processes running in parallel.

The main process (PID 1625) is most probably forking another process (PID 1627).
To check that the parent process of 1627 is 1625, run: ps -o ppid= -p 1627
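The same check can also be done without ps by reading /proc. This is only a minimal sketch, assuming a Linux host with /proc mounted; the helper name ppid_of is made up for illustration.

```shell
#!/bin/sh
# Print the parent PID of the process given as $1 (Linux only, needs /proc).
ppid_of() {
    awk '/^PPid:/ { print $2 }' "/proc/$1/status"
}

# Demo on the current shell; for the case above you would call: ppid_of 1627
ppid_of "$$"
```

If ppid_of 1627 prints 1625, the second process is a child forked by the main one, not a second instance started by systemd.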

Related

Redhat Codeready Container Failed to start (crc start error): .crcbundle not found

I am receiving the following error when executing 'crc start -p .\pull-secret.txt' command:
/home/admin/.crc/cache/crc_libvirt_4.9.0.crcbundle not found, please provide the path to a valid bundle using the -b option
Debug output of `crc setup --log-level debug`:
INFO Checking if libvirt daemon is running
DEBU Checking if libvirtd service is running
DEBU Running 'systemctl status virtqemud.socket'
DEBU Command failed: exit status 3
DEBU stdout: * virtqemud.socket - Libvirt qemu local socket
Loaded: loaded (/usr/lib/systemd/system/virtqemud.socket; disabled; vendor preset: disabled)
Active: inactive (dead)
Listen: /run/libvirt/virtqemud-sock (Stream)
DEBU stderr:
DEBU virtqemud.socket is neither running nor listening
DEBU Running 'systemctl status libvirtd.socket'
DEBU libvirtd.socket is running
INFO Checking if systemd-networkd is running
DEBU Checking if systemd-networkd.service is running
DEBU Running 'systemctl status systemd-networkd.service'
DEBU Command failed: exit status 4
DEBU stdout:
DEBU stderr: Unit systemd-networkd.service could not be found.
DEBU systemd-networkd.service is not running
INFO Checking crc daemon systemd service
DEBU Checking crc daemon systemd service
DEBU Checking if crc-daemon.service is running
DEBU Running 'systemctl --user status crc-daemon.service'
DEBU Command failed: exit status 3
DEBU stdout: * crc-daemon.service - CodeReady Containers daemon
Loaded: loaded (/home/admin/.config/systemd/user/crc-daemon.service; static; vendor preset: enabled)
Active: inactive (dead)
DEBU stderr:
DEBU crc-daemon.service is neither running nor listening
DEBU Checking if crc-daemon.service has the expected content
INFO Checking if systemd-networkd is running
DEBU Checking if systemd-networkd.service is running
DEBU Running 'systemctl status systemd-networkd.service'
DEBU Command failed: exit status 4
DEBU stdout:
DEBU stderr: Unit systemd-networkd.service could not be found.
DEBU systemd-networkd.service is not running
I then ran the following commands:
sudo yum install qemu qemu-kvm libvirt-clients libvirt-daemon-system virtinst bridge-utils
Output:
>> Error: Unable to find a match: qemu libvirt-clients libvirt-daemon-system virtinst bridge-utils
>> [admin@localhost ~]$ systemctl status libvirtd.service
- Test systemctl
>> systemctl status libvirtd.service
crc setup output:
[admin@localhost ~]$ crc setup
INFO Checking if running as non-root
INFO Checking if running inside WSL2
.......... Details removed ..........
INFO Checking if CRC bundle is extracted in '$HOME/.crc'
Your system is correctly setup for using CodeReady Containers, you can now run 'crc start -b $bundlename' to start the OpenShift cluster
I cannot seem to find .crcbundle file despite setup completing successfully.
Nothing found under:
#This seems to be an issue as I cannot find '.crcbundle'
[admin@localhost ~]$ tree --noreport .crc
.crc
├── bin
│   ├── crc -> /home/admin/bin/crc
│   ├── crc-admin-helper-linux
│   └── crc-driver-libvirt
├── crc-http.sock
├── crc.json
└── crc.log
OS info:
REDHAT_BUGZILLA_PRODUCT="Red Hat Enterprise Linux 8"
REDHAT_BUGZILLA_PRODUCT_VERSION=8.5
REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="8.5"
Thanks in advance.
To resolve the problem, run the following commands as root:
yum install qemu-kvm libvirt libvirt-daemon-kvm
systemctl start libvirtd
systemctl enable libvirtd
systemctl start virtnetworkd
systemctl enable virtnetworkd
systemctl start virtstoraged
systemctl enable virtstoraged
Then, in a non-root user session, run:
crc setup
crc start -p pull-secret.txt

Bash script in a systemd service: exit status 2 when launched at start

Hello. In preparation for using a Raspberry Pi 4 (running Ubuntu Server), I am trying to have a bash script that is started on boot and relaunched if it is killed. I have included the steps below along with the content of the unit file. Any clue about the error code, or why it is not working, would be greatly appreciated.
Any idea what an exit code with a status of 2 means?
Thank you.
ubuntu@ubuntu:/etc/systemd/system$ cat prysmbeacon_altona.service
[Unit]
Description=PrysmBeacon--Altona
Wants=network.target
After=network.target
[Service]
Type=simple
DynamicUser=yes
ExecStart=/home/ubuntu/Desktop/prysm/prysm.sh beacon-chain --altona --datadir=/home/ubuntu/.eth2
WorkingDirectory=/home/ubuntu/Desktop/prysm
Restart=always
RestartSec=3
[Install]
WantedBy=multi-user.target
ubuntu@ubuntu:/etc/systemd/system$ systemctl daemon-reload
==== AUTHENTICATING FOR org.freedesktop.systemd1.reload-daemon ===
Authentication is required to reload the systemd state.
Authenticating as: Ubuntu (ubuntu)
Password:
==== AUTHENTICATION COMPLETE ===
ubuntu@ubuntu:/etc/systemd/system$ systemctl start prysmbeacon_altona
==== AUTHENTICATING FOR org.freedesktop.systemd1.manage-units ===
Authentication is required to start 'prysmbeacon_altona.service'.
Authenticating as: Ubuntu (ubuntu)
Password:
==== AUTHENTICATION COMPLETE ===
ubuntu@ubuntu:/etc/systemd/system$ systemctl status prysmbeacon_altona.service
● prysmbeacon_altona.service - PrysmBeacon--Altona
Loaded: loaded (/etc/systemd/system/prysmbeacon_altona.service; enabled; vendor preset: enabled)
Active: activating (auto-restart) (Result: exit-code) since Thu 2020-07-23 15:51:48 CEST; 111ms ago
Process: 3407 ExecStart=/home/ubuntu/Desktop/prysm/prysm.sh beacon-chain --altona --datadir=/home/ubuntu/.eth2 (code=exited, status=2)
Main PID: 3407 (code=exited, status=2)
ubuntu@ubuntu:/etc/systemd/system$

Autostart `slurmd` service on computes after reboot

I am calling scontrol reboot <nodename> to reboot compute nodes in my SLURM cluster.
The reboot usually times out (seen from SLURM) and the node is set to state "DOWN".
(RESUME_TIMEOUT is set to 300).
This presumably happens because the slurmd service does not autostart itself after boot.
By default, the service is "disabled":
[root@c1 ~]# systemctl status slurmd
● slurmd.service - Slurm node daemon
Loaded: loaded (/usr/lib/systemd/system/slurmd.service; disabled; vendor preset: disabled)
Active: inactive (dead)
Enabling it with systemctl enable slurmd does not survive the next reboot: afterwards the service is "disabled" again.
I assume this is because the change does not happen in the image which is used for booting.
How can I enable the slurmd service on the computes so that it starts on boot and scontrol reboot works?
This is probably not the recommended way, but I set up a mini cluster at work and fixed it with a cron job:
@reboot /usr/bin/scontrol update nodename=[put hostname here] state=resume
I got a reply from Antanas Budriūnas via the OpenHPC mailing list which solved the issue.
(execute on master node)
# chroot /<path>/<to>/<cnode>/<image>
# systemctl enable slurmd
# exit

Can't kill processes (originating in a docker container)

I run a docker cluster with a few thousand containers and a few times per day randomly I have a process that gets "stuck" blocking a container from stopping. Below is an example container with its corresponding process and all things I have tried to kill the container / process.
The container:
# docker ps | grep 950677e2317f
950677e2317f 7e553d1d9f6f "/bin/sh -c /minecraf" 2 days ago Up 2 days 0.0.0.0:22661->22661/tcp, 0.0.0.0:22661->22661/udp, 0.0.0.0:37681->37681/tcp, 0.0.0.0:37681->37681/udp gloomy_jennings
Try to stop container using docker daemon (it tries forever without result):
# time docker stop --time=1 950677e2317f
^C
real 0m13.508s
user 0m0.036s
sys 0m0.008s
Daemon log while trying to stop:
# journalctl -fu docker.service
-- Logs begin at Fri 2015-12-11 15:40:55 CET. --
Dec 31 23:30:33 m3561.contabo.host docker[9988]: time="2015-12-31T23:30:33.164731953+01:00" level=info msg="POST /v1.21/containers/950677e2317f/stop?t=1"
Dec 31 23:30:34 m3561.contabo.host docker[9988]: time="2015-12-31T23:30:34.165531990+01:00" level=info msg="Container 950677e2317fcd2403ef5b5ffad37204e880136e91f76b0a8682e04a93e80942 failed to exit within 1 seconds of SIGTERM - using the force"
Dec 31 23:30:44 m3561.contabo.host docker[9988]: time="2015-12-31T23:30:44.165954266+01:00" level=info msg="Container 950677e2317f failed to exit within 10 seconds of kill - trying direct SIGKILL"
Looking into the processes running on the machine reveals the zombie process (pid 11991 on host machine):
# ps aux | grep [1]1991
root 11991 84.3 0.0 5836 132 ? R Dec30 1300:19 bash -c (echo stop > /tmp/minecraft &)
# top -b | grep [1]1991
11991 root 20 0 5836 132 20 R 89.5 0.0 1300:29 bash
And it is indeed a process running within our container (check container id):
# cat /proc/11991/mountinfo
...
/var/lib/docker/containers/950677e2317fcd2403ef5b5ffad37204e880136e91f76b0a8682e04a93e80942/resolv.conf /etc/resolv.conf rw,relatime - ext4 /dev/sda2 rw,errors=remount-ro,data=ordered
Trying to kill the process yields nothing:
# kill -9 11991
# ps aux | grep [1]1991
root 11991 84.3 0.0 5836 132 ? R Dec30 1303:58 bash -c (echo stop > /tmp/minecraft &)
Some overview data:
# docker version
Client:
Version: 1.9.1
API version: 1.21
Go version: go1.4.2
Git commit: a34a1d5
Built: Fri Nov 20 13:20:08 UTC 2015
OS/Arch: linux/amd64
Server:
Version: 1.9.1
API version: 1.21
Go version: go1.4.2
Git commit: a34a1d5
Built: Fri Nov 20 13:20:08 UTC 2015
OS/Arch: linux/amd64
# docker info
Containers: 189
Images: 322
Server Version: 1.9.1
Storage Driver: aufs
Root Dir: /var/lib/docker/aufs
Backing Filesystem: extfs
Dirs: 700
Dirperm1 Supported: true
Execution Driver: native-0.2
Logging Driver: json-file
Kernel Version: 4.2.0-19-generic
Operating System: Ubuntu 15.10
CPUs: 24
Total Memory: 125.8 GiB
Name: m3561.contabo.host
ID: ZM2Q:RA6Q:E4NM:5Q2Q:R7E4:BFPQ:EEVK:7MEO:YRH6:SVS6:RIHA:3I2K
# uname -a
Linux m3561.contabo.host 4.2.0-19-generic #23-Ubuntu SMP Wed Nov 11 11:39:30 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
Even after stopping the Docker daemon, the process stays alive. The only way to get rid of it is to restart the host machine. As this happens fairly frequently (every node needs a restart every 3-7 days), it has a serious impact on the uptime of the overall cluster.
Any ideas on what to do here?
Okay, I think I found the root cause of this. The folks over at Docker helped me out, check out this thread on GitHub.
It turns out this is most likely a bug in the 4.x Linux kernel (the host above runs 4.2.0-19). I'll be rolling back to an older version until it is fixed.
UPDATE: I've been running 3.* only in my cluster for several days now without any issues. This was most certainly a kernel bug.
I had a similar problem, and switching to the overlay2 storage driver made it go away. Note that changing the storage driver will lose all Docker state (images and containers). It seems that the aufs storage driver has some problems that might be a source of lock-ups.

NFS Vagrant on Fedora 22

I'm trying to run Vagrant using libvirt as my provider. Using rsync is unbearable since I'm working with a huge shared directory, but vagrant does succeed when the nfs setting is commented out and the standard rsync config is set.
config.vm.synced_folder ".", "/vagrant", mount_options: ['dmode=777','fmode=777']
Vagrant hangs forever on this step here after running vagrant up
==> default: Mounting NFS shared folders...
In my Vagrantfile I have this uncommented and the rsync config commented out, which turns NFS on.
config.vm.synced_folder ".", "/vagrant", type: "nfs"
While Vagrant is running, it echoes this to the terminal:
Redirecting to /bin/systemctl status nfs-server.service
● nfs-server.service - NFS server and services
Loaded: loaded (/usr/lib/systemd/system/nfs-server.service; disabled; vendor preset: disabled)
Active: inactive (dead)
Redirecting to /bin/systemctl start nfs-server.service
Job for nfs-server.service failed. See "systemctl status nfs-server.service" and "journalctl -xe" for details.
Results of systemctl status nfs-server.service
dillon@localhost ~ $ systemctl status nfs-server.service
● nfs-server.service - NFS server and services
Loaded: loaded (/usr/lib/systemd/system/nfs-server.service; disabled; vendor preset: disabled)
Active: failed (Result: exit-code) since Fri 2015-05-29 22:24:47 PDT; 22s ago
Process: 3044 ExecStart=/usr/sbin/rpc.nfsd $RPCNFSDARGS (code=exited, status=1/FAILURE)
Process: 3040 ExecStartPre=/usr/sbin/exportfs -r (code=exited, status=0/SUCCESS)
Main PID: 3044 (code=exited, status=1/FAILURE)
May 29 22:24:47 localhost.sulfur systemd[1]: Starting NFS server and services...
May 29 22:24:47 localhost.sulfur rpc.nfsd[3044]: rpc.nfsd: writing fd to kernel failed: errno 111 (Connection refused)
May 29 22:24:47 localhost.sulfur rpc.nfsd[3044]: rpc.nfsd: unable to set any sockets for nfsd
May 29 22:24:47 localhost.sulfur systemd[1]: nfs-server.service: main process exited, code=exited, status=1/FAILURE
May 29 22:24:47 localhost.sulfur systemd[1]: Failed to start NFS server and services.
May 29 22:24:47 localhost.sulfur systemd[1]: Unit nfs-server.service entered failed state.
May 29 22:24:47 localhost.sulfur systemd[1]: nfs-server.service failed.
The journalctl -xe log has a ton of stuff in it, so I won't post all of it here, but some lines are shown in bold red:
May 29 22:24:47 localhost.sulfur rpc.mountd[3024]: Could not bind socket: (98) Address already in use
May 29 22:24:47 localhost.sulfur rpc.mountd[3024]: Could not bind socket: (98) Address already in use
May 29 22:24:47 localhost.sulfur rpc.statd[3028]: failed to create RPC listeners, exiting
May 29 22:24:47 localhost.sulfur systemd[1]: Failed to start NFS status monitor for NFSv2/3 locking..
Before running vagrant up, I checked with netstat -tulpn whether any process was bound to port 98 and did not see anything. While Vagrant was hanging, I ran netstat -tulpn again and still saw nothing bound to port 98 (checked as both the current user and root).
UPDATE: Haven't gotten any responses.
I wasn't able to figure out the issue I'm having. I tried using lxc instead, but it gets stuck on booting. I'd also prefer not to use VirtualBox, and the issue seems to lie with NFS rather than the hypervisor. I'm going to try the rsync-auto feature Vagrant provides, but I'd prefer to get NFS working.
Looks like when using libvirt the user is given control over nfs and rpcbind, and Vagrant doesn't even try to touch those things like I had assumed it did. Running these solved my issue:
service rpcbind start
service nfs stop
service nfs start
The systemd unit dependencies of nfs-server.service contain rpcbind.target but not rpcbind.service.
One simple solution is to create a file /etc/systemd/system/nfs-server.service containing:
.include /usr/lib/systemd/system/nfs-server.service
[Unit]
Requires=rpcbind.service
After=rpcbind.service
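Newer systemd releases deprecate the .include directive in favour of drop-in directories, so the same dependency can also be added as a drop-in file instead of a full replacement unit. The following is a sketch under assumptions: the file name rpcbind.conf is arbitrary (any *.conf name should work), and the DROPIN_DIR variable is introduced here only so the snippet can be tried in a scratch directory first.

```shell
#!/bin/sh
# Scratch location by default; for real use (as root) set
# DROPIN_DIR=/etc/systemd/system/nfs-server.service.d
DROPIN_DIR="${DROPIN_DIR:-./nfs-server.service.d}"

mkdir -p "$DROPIN_DIR"
# The drop-in only adds the two dependencies; the packaged unit stays untouched.
cat > "$DROPIN_DIR/rpcbind.conf" <<'EOF'
[Unit]
Requires=rpcbind.service
After=rpcbind.service
EOF
```

After writing the file into the real directory, systemctl daemon-reload followed by systemctl restart nfs-server should pick up the change.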
On CentOS 7, all I needed to do was install the missing rpcbind, like this:
yum -y install rpcbind
systemctl enable rpcbind
systemctl start rpcbind
systemctl restart nfs-server
Took me over an hour to find out and try this though :)
Michel
I've had issues with NFS mounts using both the libvirt and the VirtualBox provider on Fedora 22. After a lot of gnashing of teeth, I managed to figure out that it was a firewall issue. Fedora seems to ship with a firewalld service by default. Stopping that service - sudo systemctl stop firewalld - did the trick for me.
Of course, ideally you would configure this firewall rather than disable it entirely, but I don't know how to do that.
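One possible way to configure the firewall instead of stopping it is to allow the NFS-related services that firewalld already defines (nfs, rpc-bind, mountd). This is a hedged sketch, not a verified fix for this setup: the function name open_nfs_in_firewalld and the FIREWALL_CMD override are made up for illustration, and it assumes the Vagrant guest reaches the host through firewalld's default zone. If the libvirt bridge sits in another zone, a --zone=<zone> option would be needed on each call.

```shell
#!/bin/sh
# FIREWALL_CMD can be set to "echo" for a dry run that only prints the calls.
FIREWALL_CMD="${FIREWALL_CMD:-firewall-cmd}"

# Permanently allow NFS, rpcbind and mountd, then reload the rules.
open_nfs_in_firewalld() {
    for svc in nfs rpc-bind mountd; do
        "$FIREWALL_CMD" --permanent --add-service="$svc" || return 1
    done
    "$FIREWALL_CMD" --reload
}

# Dry run: print what would be executed. As root, call
# open_nfs_in_firewalld without the override to apply the rules.
FIREWALL_CMD=echo open_nfs_in_firewalld
```

The dry run makes it easy to review the exact firewall-cmd arguments before changing anything on the host.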
