Ensuring that a systemd unit starts *before* any networking

I have some code which I need to ensure runs until completion before any networking units start, as amongst other things that code generates dhcpcd.conf and wpa_supplicant.conf.
This ought to be straightforward but all my efforts so far have been in vain...
My current unit looks like this:
[Unit]
Description=Config generation from DB
Before=networking.service
[Service]
Type=oneshot
ExecStart=/home/mark/bin/db2config.py
[Install]
RequiredBy=network.target
I have tried several variations on this theme (including adding dhcpcd.service to the Before= list, for example) but none have had the desired effect.
My understanding of Before= is that any of the listed services which are going to be started will not start until after this unit. But that understanding is clearly wrong!
This feels like something that would already have come up, but if so I've not found it amongst the far more common questions about making sure networking has started before some other unit does.

The answer is fairly simple, but it requires removing the assumption that OS-supplied units necessarily do what you think they do.
Firstly, my (now working) unit:
[Unit]
Description=Config generation from DB
Before=network-pre.target
Wants=network-pre.target
[Service]
Type=oneshot
ExecStart=/home/mark/bin/db2config.py
[Install]
RequiredBy=network.target
But the all-important change is to make dhcpcd depend on network-pre.target, which out of the box on many/most distros (e.g. Debian, Red Hat) it does not:
sudo systemctl edit dhcpcd.service
.. and add:
[Unit]
After=network-pre.target
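To verify that the drop-in took effect and the ordering is what you expect, something like the following should show network-pre.target among dhcpcd's After= dependencies and place the config-generation unit earlier in the boot chain (one quick check, not the only way; output varies by distro):
systemctl show -p After dhcpcd.service
systemd-analyze critical-chain dhcpcd.service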
Thanks to the systemd-devel mailing list for helping me with this.

Is forking in a daemon still necessary when using systemd?

I created a script which is supposed to run as a daemon, controlled by systemd. I came across ancient questions like What is the reason for performing a double fork when creating a daemon? and ancient documentation, which suggests that daemons should fork to detach from a terminal.
But in 2020, using systemd, all of this seems obsolete to me. As far as I understand (with support from https://jdebp.eu/FGA/unix-daemon-design-mistakes-to-avoid.html), there is no need to detach from any terminal, no need to avoid zombie processes etc. The whole forking-and-exiting only makes sense to me if I want to start the daemon manually from a terminal and not with systemd.
Am I right or is there still any benefit from forking inside a daemon and exiting the parent?
You are correct. The forking is now 100% handled by systemd, so there is really no need to do anything in that arena. It even saves the PID, which you can access in ExecStop=... as $MAINPID:
ExecStop=/bin/kill "$MAINPID"
If your daemon has a forking capability, you can use it with the forking type:
[Service]
Type=forking
But if you don't have any forking mechanism in your daemon, don't implement one. It's useless.
Note that from the command line, you can always use & to start it in the background. That's explicit, and people can clearly understand how that works.
Another point: many people used a PID file to save that identifier and use it to kill the process on stop. That PID file was also useful to prevent the administrator from starting a second instance of the same service. Again, systemd takes care of that: you can have at most one instance of any service.
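For contrast, a minimal sketch of the non-forking case (the unit name, binary path and flag are placeholders for your own daemon): systemd keeps the process in the foreground, tracks its PID itself, and the default stop action already sends SIGTERM to the main process, so no PID file or double fork is needed.
[Unit]
Description=Example non-forking daemon
[Service]
Type=simple
# Placeholder binary; run it in the foreground and let systemd do the supervision.
ExecStart=/usr/local/bin/mydaemon --foreground
[Install]
WantedBy=multi-user.target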

One time task with Kubernetes

We are implementing a utility that will apply the DDLs to the database. The utility is built using Spring Boot (Java) and has a main program that will run just once on startup. Can someone share what kind of K8s recipe file I should use? Here are my considerations: the Pod is expected to be short-lived, and after the program executes I want the Pod to be killed.
Kubernetes Jobs are what you want for that.
Here is a great example.
Once you start running jobs you'll also want to think of an automated way of cleaning up the old jobs. There are custom controllers written to clean up jobs, so you could look at those, but first-class support for job clean-up is being built in. I believe it is still in an alpha state, but you can already use it, of course.
It works by simply adding a TTL to your job manifests. Here is more info on the job clean-up mechanism with TTL.
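A minimal sketch of such a Job manifest with the TTL-based clean-up (the Job name, image and TTL value are placeholders; make sure your cluster version actually supports ttlSecondsAfterFinished):
apiVersion: batch/v1
kind: Job
metadata:
  name: ddl-migration
spec:
  ttlSecondsAfterFinished: 300   # delete the Job and its Pod 5 minutes after it finishes
  backoffLimit: 2                # retry the one-shot run at most twice on failure
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: ddl-runner
        image: registry.example.com/ddl-runner:latest   # placeholder image
Apply it with kubectl apply -f job.yaml and check progress with kubectl get jobs.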

How to deal with stale data when doing service discovery with etcd on CoreOS?

I am currently tinkering with CoreOS and creating a cluster based upon it. So far, the experience with CoreOS on a single host is quite smooth. But things get a little hazy when it comes to service discovery. Somehow I don't get the overall idea, hence I am asking here now for help.
What I want to do is to have two Docker containers running where the first relies on the second. If we are talking pure Docker, I can solve this using linked containers. So far, so good.
But this approach does not work across machine boundaries, because Docker can not link containers across multiple hosts. So I am wondering how to do this.
What I've understood so far is that CoreOS's idea of how to deal with this is to use its etcd service, which is basically a distributed key-value store that is accessible on each host locally via port 4001, so you do not have to deal (as a consumer of etcd) with any networking details: just access localhost:4001 and you're fine.
So, in my head, I now have the idea that when a Docker container which provides a service spins up, it registers itself (i.e. its IP address and its port) in the local etcd, and etcd takes care of distributing the information across the network. This way, e.g., you get key-value pairs such as:
RedisService => 192.168.3.132:49236
Now, when another Docker container needs to access the RedisService, it gets the IP address and port from its own local etcd, at least once the information has been distributed across the network. So far, so good.
But now I have a question that I can not answer, and that puzzles me already for a few days: What happens when a service goes down? Who cleans up the data inside of etcd? If it is not cleaned up, all the clients try to access a service that is no longer there.
The only (reliable) solution I can think of at the moment is making use of etcd's TTL feature for data, but this involves a trade-off: either you have quite high network traffic, as you need to send a heartbeat every few seconds, or you have to live with stale data. Neither is fine.
The other, well, "solution" I can think of is to make a service deregister itself when it goes down, but this only works for planned shutdowns, not for crashes, power outages, …
So, how do you solve this?
There are a few different ways to solve this: the sidekick method, using ExecStopPost and removing on failure. I'm assuming a trio of CoreOS, etcd and systemd, but these concepts could apply elsewhere too.
The Sidekick Method
This involves running a separate process next to your main application that heartbeats to etcd. On the simple side, this is just a shell loop that runs forever. You can use systemd's BindsTo= to ensure that when your main unit stops, this service registration unit stops too. In the ExecStop you can explicitly delete the key you're setting. We're also setting a TTL of 60 seconds to handle any ungraceful stoppage.
[Unit]
Description=Announce nginx1.service
# Binds this unit and nginx1 together. When nginx1 is stopped, this unit will be stopped too.
BindsTo=nginx1.service
[Service]
ExecStart=/bin/sh -c "while true; do etcdctl set /services/website/nginx1 '{ \"host\": \"10.10.10.2\", \"port\": 8080, \"version\": \"52c7248a14\" }' --ttl 60;sleep 45;done"
ExecStop=/usr/bin/etcdctl delete /services/website/nginx1
[Install]
WantedBy=local.target
On the complex side, this could be a container that starts up and hits a /health endpoint that your app provides to run a health check before sending data to etcd.
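A rough sketch of that health-checking variant, assuming the app exposes a /health endpoint on 10.10.10.2:8080 (the endpoint, key and TTL are placeholders; same etcdctl v2 syntax as above):
#!/bin/sh
# Re-register the service only while /health answers; otherwise let the TTL expire the key.
while true; do
  if curl -fsS http://10.10.10.2:8080/health > /dev/null; then
    etcdctl set /services/website/nginx1 '{ "host": "10.10.10.2", "port": 8080 }' --ttl 60
  fi
  sleep 45
done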
ExecStopPost
If you don't want to run something beside your main app, you can have etcdctl commands within your main unit to run on start and stop. Be aware, this won't catch all failures, as you mentioned.
[Unit]
Description=MyWebApp
After=docker.service
Requires=docker.service
After=etcd.service
Requires=etcd.service
[Service]
ExecStart=/usr/bin/docker run --rm --name myapp1 -p 8084:80 username/myapp command
ExecStop=/usr/bin/etcdctl set /services/myapp/%H:8084 '{ "host": "%H", "port": 8084, "version": "52c7248a14" }'
ExecStopPost=/usr/bin/etcdctl rm /services/myapp/%H:8084
[Install]
WantedBy=local.target
%H is a systemd variable that substitutes in the hostname for the machine. If you're interested in more variable usage, check out the CoreOS Getting Started with systemd guide.
Removing on Failure
On the client side, you could remove any instance that you have failed to connect to more than X times. If you get a 500 or a timeout from /services/myapp/instance1, you could keep increasing the failure count and then try to connect to other hosts in the /services/myapp/ directory:
etcdctl set /services/myapp/instance1 '{ "host": "%H", "port": 8084, "version": "52c7248a14", "failures": 1 }'
When you hit your desired threshold, remove the key with etcdctl.
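For example (a sketch reusing the etcdctl v2 syntax from above; the failure-threshold logic itself lives in your client):
etcdctl rm /services/myapp/instance1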
Regarding the network traffic that heartbeating would cause: in most cases you should be sending this traffic over a local private network that your provider runs, so it should be free and very fast. etcd is constantly heartbeating with its peers anyway, so this is just a small increase in traffic.
Hop into #coreos on Freenode if you have any other questions!

How to run an EventMachine application on your own production server?

I have just written my first EventMachine application. In development, to start the server, all I do is:
ruby myapp.rb
Which runs my application until I kill it with control+C. In production, this doesn't seem like the right way to do it.
How would I go about running this on my production server?
Check out daemons: http://daemons.rubyforge.org/ - a simple gem written for precisely this use case.
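Usage is essentially a small control script wrapping your app (a sketch based on the gem's documented Daemons.run interface; the file names come from the question):
# control.rb: run with ruby control.rb start|stop|restart|status
require 'daemons'
Daemons.run('myapp.rb')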
At PostRank we always used God to start/restart our production EventMachine APIs.
I prefer to have a completely external process handling my daemons rather than using something like the daemons library, but that's a personal preference.
You have many solutions out there; here are those I know of. All of them will restart your application more or less quickly when it crashes, and some offer a management interface, whether it is a CLI or a web interface:
supervisord (http://supervisord.org/): the one I prefer so far (see the example sketch below)
daemontools (http://cr.yp.to/daemontools.html): works well but can be annoying to configure
god, as mentioned (http://god.rubyforge.org/): never used it, mostly because of its horrible and cryptic config file syntax
And the last one is whatever comes with your Linux distribution: init can run an application and restart it when it dies. You have nearly no control over it, but it can do the job.
You can type "man inittab" to learn more.
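For example, a minimal supervisord program entry for the app from the question could look roughly like this (the paths are placeholders):
; /etc/supervisor/conf.d/myapp.conf
[program:myapp]
command=ruby /var/www/myapp/myapp.rb
directory=/var/www/myapp
autostart=true
autorestart=true
redirect_stderr=true
stdout_logfile=/var/log/myapp.log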

Is there a good process monitoring and control framework in either Ruby or Perl?

I came across God, which seems good, but I am wondering if anyone knows of other process monitoring and control frameworks that I can compare God with.
God has the following features:
Config file is written in Ruby
Easily write your own custom conditions in Ruby
Supports both poll and event based conditions
Different poll conditions can have different intervals
Integrated notification system (write your own too!)
Easily control non-daemonizing scripts
The last one is what I am having difficulty with.
Have a look at Ubic (CPAN page here but do read installation details on the github project page).
Ubic isn't a monitoring framework per se but an LSB-compliant, extensible service manager.
It's written in Perl and configured in Perl as well. A simple example would be:
# /etc/ubic/services/test
use Ubic::Service::SimpleDaemon;
return Ubic::Service::SimpleDaemon->new({ bin => "sleep 1000" });
To start the above: ubic start test. To check whether it's running or not: ubic status test. To stop the service (surprisingly!): ubic stop test.
Ubic keeps an eye on all its services, so when the test service stops after 1000 seconds, Ubic will automatically restart it.
Some more links:
Mailing list
Ubic - how to implement your first service (blog post)
Ubic - code reuse and customizations (blog post)
I am a big fan of Monit. It's written in C, but does everything you want.
I particularly liked that I was able to compile a thin version that worked beautifully on an ARM based system with only 64 MB of RAM.
You might want to read God vs Monit on SO to get a comparison.
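For reference, a minimal monitrc stanza looks roughly like this (the paths, pidfile and memory limit are placeholders; Monit expects the process to write a pidfile or to be started through a wrapper it controls):
check process myapp with pidfile /var/run/myapp.pid
  start program = "/etc/init.d/myapp start"
  stop program = "/etc/init.d/myapp stop"
  if totalmem > 64 MB for 3 cycles then restart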
Bluepill is a great process monitoring/administration framework.
It's written in Ruby, but it can monitor anything; I use it to monitor Unicorn processes.
It even runs on 1.9.2.
Doesn't leak memory.
Has support for daemonizing processes that don't daemonize themselves.
All around easy, even with RVM!
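A rough sketch of a Bluepill config for a process that doesn't daemonize itself (the names, paths and memory check are placeholders, following Bluepill's documented DSL):
# myapp.pill (load with: bluepill load myapp.pill)
Bluepill.application("myapp") do |app|
  app.process("worker") do |process|
    process.start_command = "ruby /var/www/myapp/worker.rb"
    process.daemonize = true    # let Bluepill daemonize the non-daemonizing script
    process.pid_file = "/var/run/myapp/worker.pid"
    process.checks :mem_usage, :every => 10.seconds, :below => 100.megabytes, :times => 3
  end
end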
