'systemd' kills normal user subprocesses - systemd

I have written a 'systemd' service unit to run a bash script that I had been running, without any problems, from '/etc/rc.local' for many years.
The 'systemd' service works perfectly except for the following problem:
My bash script runs, every 5 minutes, the following lines:
nice --adjustment=$CommonNicePriority su root --command="HOME=$COMMON_DIR $0 $CyclesOption" # System common tasks.
for User in $Users # Run logged in users' tasks.
do
nice --adjustment=$UsersNicePriority su $User --command="$0 $CyclesOption"
done
As you can see, it spawns one process running as 'root' (first line) and then, in the loop, one process for every logged-in normal user, running as that user.
Each of these processes can spawn further (task) processes. The processes running as 'root' work all right, but those running for normal users are killed by 'systemd'. These are lines from '/var/log/syslog':
Feb 2 10:35:00 Linux-1 systemd[1]: Started User Manager for UID 0.
Feb 2 10:35:00 Linux-1 systemd[1]: Started Session c4 of user manolo.
Feb 2 10:35:00 Linux-1 systemd[1]: session-c4.scope: Killing process 31163 (CommonCron) with signal SIGTERM.
Feb 2 10:35:00 Linux-1 systemd[1]: session-c4.scope: Killing process 31164 (CommonCron) with signal SIGTERM.
Feb 2 10:35:00 Linux-1 systemd[1]: session-c4.scope: Killing process 31165 (CommonCron) with signal SIGTERM.
Feb 2 10:35:00 Linux-1 systemd[1]: Stopping Session c4 of user manolo.
Feb 2 10:35:00 Linux-1 systemd[1]: Stopped Session c4 of user manolo.
Feb 2 10:35:13 Linux-1 systemd[1]: Stopping User Manager for UID 0...
Here is my 'systemd' service:
[Unit]
Description=CommonStartUpShutdown
Requires=local-fs.target
Wants=network.target
[Service]
Type=forking
ExecStart=/etc/after_boot.local
RemainAfterExit=yes
TimeoutSec=infinity
KillMode=none
ExecStop=/etc/before_halt.local
[Install]
WantedBy=local-fs.target
# How I think it works:
# The service starts when target 'local-fs.target' is reached and, preferably, when target
# 'network.target' is also reached. The latter is reached even if the router is powered off (tested).
# The start sequence runs the script 'ExecStart=/etc/after_boot.local', which is expected
# to spawn child processes and exit: 'Type=forking'. Child processes: 'CommonSystemStartUp's
# children: 'CommonDaemon', 'CommonCron'... This script must exit with 0, otherwise systemd
# will kill all child processes and won't run the 'ExecStop' script. Service start can take as long
# as it needs: 'TimeoutSec=infinity'.
# The service is kept alive, after running 'ExecStart=...', by 'RemainAfterExit=yes'.
# When the service is stopped, at system shutdown, the script 'ExecStop=/etc/before_halt.local'
# runs for as long as necessary, 'TimeoutSec=infinity', before target
# 'local-fs.target' is lost: 'Requires=local-fs.target'.
# The child processes of 'ExecStart=/etc/after_boot.local' ('CommonDaemon', 'CommonCron'...) won't be
# killed when 'ExecStop=/etc/before_halt.local' runs: 'KillMode=none'.
I have tried moving 'KillMode=none' to the [Unit] block: 'systemd' complains.
Also tried 'WantedBy=multi-user.target.wants' in the [Install] block: doesn't make any difference.
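A quick way to see which unit a process actually belongs to is to read its /proc/<pid>/cgroup file. If the normal users' CommonCron children turn out to live in a session-*.scope (as the session-c4.scope lines in the syslog excerpt suggest) rather than in the service's own cgroup, they are tracked with that login session instead of with the service. The helper below is a hypothetical diagnostic sketch (the name unit_of_cgroup is made up); it only parses the file and assumes either the unified cgroup v2 layout ("0::/path") or systemd's v1 "name=systemd" hierarchy.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the systemd unit (last cgroup path component) that a
# process belongs to, given its cgroup file, normally /proc/<pid>/cgroup.
# Assumes cgroup v2 ("0::/path") or the v1 "N:name=systemd:/path" hierarchy.
unit_of_cgroup() {
  local line path
  while IFS= read -r line || [ -n "$line" ]; do
    case $line in
      0::*|*:name=systemd:*)
        # Strip the "hierarchy-id:controllers:" prefix, keeping only the path.
        path=${line#*:*:}
        # The unit name is the last path component, e.g. session-c4.scope.
        printf '%s\n' "${path##*/}"
        return 0
        ;;
    esac
  done < "$1"
  return 1
}
```

For example, run unit_of_cgroup "/proc/$pid/cgroup" for a CommonCron PID while it is still alive: a session-*.scope result points at the su login session, whereas a *.service result means the process stayed in the service's cgroup.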

Related

Systemd run script after wakeup: "Can't open display"

I have a script that does things with screen brightness. It works fine, and now I want to make it run after waking up from suspend.
So I tried using systemd: I have a file /etc/systemd/system/myscript.service with the following content:
[Unit]
Description=Run myscript after wakeup
After=suspend.target hibernate.target hybrid-sleep.target suspend-then-hibernate.target
[Service]
Type=oneshot
ExecStart=/usr/local/bin/myscript
User=me
#Environment=DISPLAY=:0
[Install]
WantedBy=suspend.target hibernate.target hybrid-sleep.target suspend-then-hibernate.target
Note: User is set because myscript needs the HOME variable.
After I run sudo systemctl enable myscript and suspend/wake up, myscript fails, and journalctl -u myscript.service outputs the following message:
Jan 25 13:42:53 mymachine myscript[24489]: Can't open display
Jan 25 13:42:53 mymachine systemd[1]: myscript.service: Succeeded.
Jan 25 13:42:53 mymachine systemd[1]: Finished Run myscript after wakeup.
If I uncomment the line #Environment=DISPLAY=:0 in myscript.service the error is "Can't open display :0"
Any help would be great :^)
This worked on my Arch system. I tested a script in that location with xbacklight going up and down by 75% a few times after a resume from hibernate or suspend (systemctl hibernate / suspend).
The only explanation I can think of is that DISPLAY=:0 is not in the environment (verify with env) of the user you are running the script as.
I was having a similar problem. Fixed it by adding the following to my systemd service:
Environment="DISPLAY=<DISP>"
Environment="XAUTHORITY=/path/to/xauthority"
Replace <DISP> with the value of your $DISPLAY variable; this is usually :0.
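To find out which DISPLAY and XAUTHORITY a running desktop session really uses (the "verify with env" idea, but from outside that session), one can read the NUL-separated /proc/<pid>/environ of a process belonging to the session. A minimal sketch; the function name proc_env_var is hypothetical, and reading another process's environ requires being the same user or root.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of variable $2 from environ file $1,
# normally /proc/<pid>/environ, whose NAME=value entries are separated by NUL bytes.
proc_env_var() {
  # Turn NULs into newlines, then print the value of the first exact "NAME=" entry.
  tr '\0' '\n' < "$1" |
    awk -v name="$2" 'index($0, name "=") == 1 { print substr($0, length(name) + 2); exit }'
}
```

For example, proc_env_var "/proc/$(pgrep -u me -n -x xfce4-session)/environ" XAUTHORITY, where xfce4-session stands in for whatever process starts your desktop session (an assumption; adjust it to your desktop). The printed values are what would go into the two Environment= lines.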

systemctl service systemd-notify not working with non-root user

I have a simple example of a service unit and bash script on Red Hat Enterprise Linux 7 using Type=notify that I am trying to get working.
When the service unit is configured to start the script as root, things work as expected. When I add User=testuser, it fails: the script initially starts (as seen in the process list), but systemd never receives the notify message indicating that the service is ready, so it hangs and eventually times out.
[Unit]
Description=My Test
[Service]
Type=notify
User=testuser
ExecStart=/home/iatf/test.sh
[Install]
WantedBy=multi-user.target
test.sh (owned by testuser, with execute permission):
#!/bin/bash
systemd-notify --status="Starting..."
sleep 5
systemd-notify --ready --status="Started"
while [ 1 ] ; do
systemd-notify --status="Processing..."
sleep 3
systemd-notify --status="Waiting..."
sleep 3
done
When run as root, systemctl status test displays the correct status and the status messages sent from my test.sh bash script. With User=testuser, the service hangs and then times out, and journalctl -xe reports:
Jul 15 13:37:25 tstcs03.ingdev systemd[1]: Cannot find unit for notify message of PID 7193.
Jul 15 13:37:28 tstcs03.ingdev systemd[1]: Cannot find unit for notify message of PID 7290.
Jul 15 13:37:31 tstcs03.ingdev systemd[1]: Cannot find unit for notify message of PID 7388.
Jul 15 13:37:34 tstcs03.ingdev systemd[1]: Cannot find unit for notify message of PID 7480.
I am not sure what those PIDs are, as they do not appear in the ps -ef list.
This appears to be a known limitation of the notify service type.
From a pull request to the systemd man pages:
Due to current limitations of the Linux kernel and the systemd, this
command requires CAP_SYS_ADMIN privileges to work
reliably. I.e. it's useful only in shell scripts running as a root
user.
I've attempted some hacky workarounds with sudo and friends, but they don't work with systemd, generally failing with:
No status data could be sent: $NOTIFY_SOCKET was not set
This refers to the socket that systemd-notify tries to send data to. It is defined in the service environment, but I could not reliably expose it to a sudo environment.
You could also try using a Python workaround described here
python -c "import systemd.daemon, time; systemd.daemon.notify('READY=1'); time.sleep(5)"
It's basically just a sleep, which is not reliable, and the whole point of using notify is reliable services.
In my case, I just refactored to run the main service as root, with the actual service running as a child of it under the desired user.
sudo -u USERACCOUNT_LOGGED notify-send "hello"

How to set the program as daemon after $DISPLAY is set?

I want to lock my screen with the screensaver every 50 minutes (3000 seconds).
cat /home/rest.sh
while true;do
sleep 3000
xscreensaver-command --lock 1>/dev/null
done
Running sh /home/rest.sh & works.
Now i want to set it as a daemon.
sudo vim /etc/systemd/system/screensave.service
[Unit]
Description=screensave
[Service]
User=root
ExecStart=/bin/bash /home/rest.sh
StandardError=journal
[Install]
WantedBy=multi-user.target
Then I enable it as a daemon:
systemctl enable screensave.service
However, the service does not work as a daemon:
sudo journalctl -u screensave
Jan 24 12:16:50 user systemd[1]: Started screensave.
Jan 24 12:17:22 user bash[621]: xscreensaver-command: warning: $DISPLAY is not set: defaulting to ":0.0".
Jan 24 12:17:22 user bash[621]: No protocol specified
Jan 24 12:17:22 user bash[621]: xscreensaver-command: can't open display :0.0
How can I run it as a daemon after $DISPLAY is set?
This is a very common FAQ. A system daemon cannot easily connect to the X session of any individual user. On a multi-user system, how do you tell which user's session to connect to, anyway? On a single-user system, what should the daemon do if no session is running (as it often isn't at the time the daemon starts up)?
Trying to run a system daemon as any particular user won't work, and giving individual users access to a system daemon is a recipe for security problems. It can be done, but the solution is complex, and probably not something you want to attempt on your own. (Briefly, have the daemon listen to commands on a socket; create a user-space program which knows how to talk to the socket, and build some sort of authorization and authentication so the daemon knows whom it's talking to and can verify that this user is allowed to connect to this display.)
The drop-dead simple solution is to run this from your desktop environment's startup scripts instead. Most desktops have something like "session start-up items" or "autorun on login" hooks.
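On desktops that implement the freedesktop.org autostart convention (GNOME, KDE Plasma, Xfce and others), one such hook is a .desktop file in ~/.config/autostart: the session launches it after login, when DISPLAY is already set. A minimal sketch that writes such an entry for the /home/rest.sh script from the question; the helper name install_autostart and the entry name are illustrative choices.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create an XDG autostart entry that runs a script at login.
# $1: entry name (used for the file name and Name=), $2: script to run.
install_autostart() {
  local name=$1 script=$2
  # XDG_CONFIG_HOME defaults to ~/.config when it is unset.
  local dir="${XDG_CONFIG_HOME:-$HOME/.config}/autostart"
  mkdir -p "$dir" || return 1
  # Type, Name and Exec are the keys a simple application entry relies on.
  cat > "$dir/$name.desktop" <<EOF
[Desktop Entry]
Type=Application
Name=$name
Exec=/bin/sh $script
EOF
}
```

Usage: install_autostart rest /home/rest.sh, then log out and back in. As written, the script path must not contain spaces, because Exec= splits its value into arguments.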
I'm not running Linux and can't check right now, but the classic steps to daemonize a process are: fork, call setsid so that the child becomes a new session leader, fork again, change the current working directory to /, and close (or redirect) stdin, stdout and stderr.
Something like the following can be added at the beginning of the script. Before relying on it, first check with ps -Cbash -o sid,pgid,pid,ppid,comm,args that the exec command really creates a new session leader process.
# Re-launch only if the current process is not yet a session leader (avoids an infinite loop).
# Arithmetic comparison, because ps pads the SID with spaces.
if (( $(ps -p "$$" -o sid=) != $$ )); then
( cd / ; exec setsid /bin/bash /home/rest.sh & ) </dev/null >/dev/null 2>&1 &
exit
fi

How to build a multi instance systemd service configuration?

I am trying to configure systemd to run multiple instances of the same service, but it seems that I am doing something wrong, and the documentation is not very clear.
I created the file /lib/systemd/system/confluence@.service with this content:
[Unit]
Description=Confluence %i
After=postgresql.service nginx.service
[Service]
Type=forking
ExecStart=/opt/atlassian/confluence-%i/bin/start-confluence.sh
ExecStartPre=/opt/atlassian/confluence-%i/bin/setenv.sh prestart
ExecStop=/opt/atlassian/confluence-%i/bin/stop-confluence.sh
TimeoutStopSec=5min
PIDFile=/opt/atlassian/confluence-%i/work/catalina.pid
[Install]
WantedBy=multi-user.target
So far, so good: systemctl enable confluence.test reported success (and yes, /opt/atlassian/confluence-test/ "happens" to contain what it needs).
Still, when I try to start the service using systemctl start confluence I get:
root@atlas:/lib/systemd/system# systemctl start confluence@test.service
Job for confluence@test.service failed. See "systemctl status confluence@test.service" and "journalctl -xe" for details.
root@atlas:/lib/systemd/system# systemctl status confluence@test.service
● confluence@test.service - Confluence test
Loaded: loaded (/lib/systemd/system/confluence@.service; enabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Fri 2015-10-09 13:25:28 BST; 7s ago
Process: 16352 ExecStartPre=/opt/atlassian/confluence-%i/bin/setenv.sh prestart (code=exited, status=203/EXEC)
Oct 09 13:25:28 atlas systemd[1]: Starting Confluence test...
Oct 09 13:25:28 atlas systemd[1]: confluence@test.service: control process exited, code=exited status=203
Oct 09 13:25:28 atlas systemd[1]: Failed to start Confluence test.
Oct 09 13:25:28 atlas systemd[1]: Unit confluence@test.service entered failed state.
Oct 09 13:25:28 atlas systemd[1]: confluence@test.service failed.
Somehow it seems that systemd does not expand the "%i" which is supposed to be the instance name.
I have been bouncing all around the Web looking for the pieces I need to create my own set of systemd unit, template and target files. This question was in the pile of search results. In this case, rather than finding help, I think I can give help.
I don't have Confluence, and won't install it to test, so I am mostly guessing for the first part.
I believe the problem is that the %i specifier is escaped and could be creating bad paths in the commands. If that is the whole problem, changing to the unescaped version, %I, would be the simplest solution. Your file /lib/systemd/system/confluence@.service would then become:
[Unit]
Description=Confluence %I
After=postgresql.service nginx.service
[Service]
Type=forking
ExecStart=/opt/atlassian/confluence-%I/bin/start-confluence.sh
ExecStartPre=/opt/atlassian/confluence-%I/bin/setenv.sh prestart
ExecStop=/opt/atlassian/confluence-%I/bin/stop-confluence.sh
TimeoutStopSec=5min
PIDFile=/opt/atlassian/confluence-%I/work/catalina.pid
[Install]
WantedBy=multi-user.target
If, on the other hand, the problem has a different root cause, such as specifiers not being expanded in the commands, you can still achieve the same result with a wrapper script. systemd is supposed to eliminate the need for them, but when older software runs under newer process managers, practice often doesn't match the theory.
Create a shell script, perhaps in the directory that is the base for everything else anyway.
For example, create /opt/atlassian/conflencectl with this:
#!/usr/bin/env bash
# Check that there is a command to execute
if [ -z "$1" ]; then
    exit 1
fi
# Check that there is an instance name to use
if [ -z "$2" ]; then
    exit 1
fi
# Make the directory name stub for this instance
InstanceDirName="confluence-$2"
# Execute the proper command, based on the command from the unit file
case "$1" in
    Start)
        /opt/atlassian/"$InstanceDirName"/bin/start-confluence.sh
        ;;
    StartPre)
        /opt/atlassian/"$InstanceDirName"/bin/setenv.sh prestart
        ;;
    Stop)
        /opt/atlassian/"$InstanceDirName"/bin/stop-confluence.sh
        ;;
    *)
        # Unknown command
        exit 1
        ;;
esac
Depending on what is, or is not, needed for the operations, you could add additional safety, sanity, or dependency checks, such as testing for the existence of the directory or needed config files.
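A sketch of such checks, written as a function that could be called near the top of the wrapper (for instance check_instance "$2" || exit 1, right after the argument checks). It assumes the three script names from the question and uses a hypothetical BASE variable, defaulting to /opt/atlassian, so it can be tried elsewhere. Incidentally, status=203/EXEC in the question's output is systemd's code for a failed execve() of the command, so a setenv.sh that is missing or not executable is one possible cause worth ruling out.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check for the conflencectl wrapper: $1 is the instance name.
# BASE is an assumed override for the install root, defaulting to /opt/atlassian.
check_instance() {
  local instance=$1 base=${BASE:-/opt/atlassian} dir script
  # Reject empty names and names containing "/" (they would point outside the base directory).
  case $instance in
    ''|*/*) echo "bad instance name: '$instance'" >&2; return 1 ;;
  esac
  dir="$base/confluence-$instance"
  if [ ! -d "$dir" ]; then
    echo "missing instance directory: $dir" >&2
    return 1
  fi
  # The scripts the wrapper calls must exist and be executable.
  for script in start-confluence.sh stop-confluence.sh setenv.sh; do
    if [ ! -x "$dir/bin/$script" ]; then
      echo "missing or not executable: $dir/bin/$script" >&2
      return 1
    fi
  done
}
```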
Your systemd unit file then becomes:
[Unit]
Description=Confluence %I
After=postgresql.service nginx.service
[Service]
Type=forking
ExecStart=/opt/atlassian/conflencectl "Start" "%i"
ExecStartPre=/opt/atlassian/conflencectl "StartPre" "%i"
ExecStop=/opt/atlassian/conflencectl "Stop" "%i"
TimeoutStopSec=5min
PIDFile=/opt/atlassian/confluence-%I/work/catalina.pid
[Install]
WantedBy=multi-user.target
Even with this version I'm not too sure what will happen with the PIDFile declaration, nor how to compensate, if needed, using the wrapper script. The documentation I've scanned seems to imply that systemd is pretty good at knowing the PID anyway, and it may not be needed. Only testing will prove the final effects, however.
You can't have a separate start/stop script for each instance.
I would suggest ExecStart=/opt/atlassian/confluence/bin/start-confluence.sh
This will start a new instance every time it is called.
The same goes for ExecStartPre and ExecStop.

mmonit golang restarting slow and status does not exist

I created a monit check that should restart my golang site if it crashes:
$ cd /etc/monit/conf.d
$ vim checkSite
It starts the program with nohup and saves its PID to a file:
check process site with pidfile /root/go/path/to/goSite/run.pid
start program = "/bin/bash -c 'cd /root/go/path/to/goSitePath; nohup ./goSite > /dev/null 2>&1 & echo $! > run.pid'" with timeout 5 seconds
stop program = "/bin/kill -9 `cat /root/go/path/to/goSitePath/run.pid`"
It starts ok.
Process 'site'
status Running
monitoring status Monitored
pid 29723
parent pid 1
uptime 2m
children 0
memory kilobytes 8592
memory kilobytes total 8592
memory percent 0.4%
memory percent total 0.4%
cpu percent 0.0%
cpu percent total 0.0%
data collected Thu, 05 Mar 2015 07:20:32
Then, to test how it restarts after a crash, I manually killed the golang site.
Here I have two issues:
The site is restarted rather slowly: it takes about a minute, although I set with timeout 5 seconds in the configuration.
The site's status in monit becomes Does not exist, even after the site has actually restarted. I guess this happens because the site's PID changes after it is killed and restarted, but I don't know how to overcome this.
status after restart:
Process 'site'
status Does not exist
monitoring status Monitored
data collected Thu, 05 Mar 2015 08:04:44
How can I reduce the restart time, and how can I fix the site's monit status?
monit log:
[Mar 5 08:04:44] error : 'site' process is not running
[Mar 5 08:04:44] info : 'site' trying to restart
[Mar 5 08:04:44] info : 'site' start: /bin/bash
[Mar 5 08:06:44] info : 'site' process is running with pid 31479
Update
My golang site is rather simple:
package main
import (
"fmt"
"github.com/go-martini/martini"
)
func main() {
m := martini.Classic()
m.Get("/", func() {
fmt.Println("main page")
})
m.Run()
}
Update 2
I tried to make monit restart my golang site faster by also removing the PID file: I ran kill 29723 && rm run.pid and timed how long it took for the site to become accessible again. It took 85 seconds, so removing the PID file did not help monit restart the site any faster.
monit doesn't have any subscription mechanism to immediately discover that a process has died.
In daemon mode, as documented, monit works by periodically polling the status of all configured rules. Its poll cycle is configured when the daemon starts and defaults to 2 minutes in some Linux distributions, which means that in this case monit can take up to 2 minutes to take any action.
Check this setting in your monitrc; it is configured with the set daemon directive. For example, if you want to check the status every 5 seconds, set:
set daemon 5
On every cycle it updates its status and executes actions as needed. So if it detects that the process doesn't exist, it reports Does not exist until the next poll cycle, even if it has already decided to restart it.
The timeout in the start program directive has nothing to do with this poll cycle; it is the time monit gives the service to start. If the service doesn't start within that time, monit reports it.
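What a check process ... with pidfile rule looks at on each cycle can be approximated as: read the PID from the file and test whether that process exists. The shell sketch below (pid_alive is a hypothetical name, and this is an approximation, not monit's own code) performs that test with kill -0. Because such a test only runs when the poller wakes up, a crash is noticed at most one poll interval later, which is why lowering the set daemon interval shortens the restart delay.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a pidfile liveness test (an approximation, not monit's code).
# Returns 0 if the PID stored in file $1 belongs to a running process.
pid_alive() {
  local pidfile=$1 pid
  # A missing or unreadable pidfile counts as "not running".
  [ -r "$pidfile" ] || return 1
  # $(...) drops the trailing newline written by "echo $! > run.pid".
  pid=$(cat "$pidfile")
  # Accept only a plain positive integer.
  case $pid in
    ''|0*|*[!0-9]*) return 1 ;;
  esac
  # Signal 0 only checks that the process exists and may be signalled;
  # for another user's process kill fails with EPERM, so run this as root.
  kill -0 "$pid" 2>/dev/null
}
```

For example, pid_alive /root/go/path/to/goSite/run.pid || echo "site is down" performs the same kind of check monit repeats on every poll cycle.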
If monit doesn't meet your requirements, you can also try supervisord, which is always aware of the state of the programs it runs.
