Problem between a "log2ram-daily.timer" and its "override.conf" file - systemd

I am running into a problem between a "log2ram-daily.timer" and its "override.conf" drop-in, and I am not sure whether it is a syntax incompatibility or something else. This is what I am doing:
Raspbian on a Pi 3B+
The "log2ram-daily.timer" contains :
[Unit]
Description=Daily Log2Ram writing activities
[Timer]
OnCalendar=*-*-* 23:55:00
Persistent=true
[Install]
WantedBy=timers.target
Its "override.conf" file contains :
[Timer]
OnCalendar=Mon *-*-* 23:50:00
But the result is not what I want...
When I "systemctl status log2ram-daily.timer", the trigger indicates
Trigger: Fri 2021-07-30 23:55:00 CEST; 19h left
Not the time set in the override.conf...
In my troubleshooting, I modified the file "log2ram-daily.timer" by adding "Mon":
[Unit]
Description=Daily Log2Ram writing activities
[Timer]
OnCalendar=Mon *-*-* 23:55:00
Persistent=true
[Install]
WantedBy=timers.target
And...
systemctl status log2ram-daily.timer
Trigger: Mon 2021-08-02 23:50:00 CEST; 3 days left
Now the override.conf is picked up! But why? Is it a syntax incompatibility?

I found the solution: in the "override.conf", add an empty OnCalendar= line first. OnCalendar is a list-valued setting, so the value in a drop-in is added to the one in the original unit rather than replacing it; the empty assignment clears the list before the new value is set.
[Timer]
OnCalendar=
OnCalendar=Mon *-*-* 23:50:00
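As a quick way to confirm which drop-ins are applied and what the merged schedule actually is (generic systemd commands, nothing log2ram-specific):
systemctl cat log2ram-daily.timer          # prints the unit file plus every drop-in applied to it
systemctl daemon-reload                    # needed after editing unit or drop-in files by hand
systemctl list-timers log2ram-daily.timer  # shows the next trigger that is actually scheduled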

Related

collectd - exec plugin: Unable to parse command

I'm trying to return a value from a simple script. However, I'm getting the following error.
Feb 26 09:26:37 localhost systemd[1]: Starting Collectd statistics daemon...
Feb 26 09:26:37 localhost collectd[834]: plugin_load: plugin "exec" successfully loaded.
Feb 26 09:26:37 localhost collectd[834]: Systemd detected, trying to signal readyness.
Feb 26 09:26:37 localhost systemd[1]: Started Collectd statistics daemon.
Feb 26 09:26:37 localhost collectd[834]: Initialization complete, entering read-loop.
Feb 26 09:26:37 localhost collectd[834]: exec plugin: Unable to parse command, ignoring line: "73"
Feb 26 09:26:47 localhost collectd[834]: exec plugin: Unable to parse command, ignoring line: "74"
Feb 26 09:26:57 localhost collectd[834]: exec plugin: Unable to parse command, ignoring line: "73"
Feb 26 09:27:07 localhost collectd[834]: exec plugin: Unable to parse command, ignoring line: "73"
My config is
LoadPlugin exec
<Plugin exec>
Exec "cwagent" "/opt/aws/amazon-cloudwatch-agent/bin/supervisor.sh"
</Plugin>
and my script is
#!/bin/bash
VALUE=$(/bin/systemctl status | wc -l)
echo "$VALUE"
I realise that this is probably a silly mistake I'm making. I have spent a bit of time playing around and googling to try to understand the problem. But I'm afraid I've made little progress. Grateful for any advice :¬)
A couple of things. First, your script is forked off by collectd with the expectation that it keeps running and producing consumable output, so you need to use a while loop, as laid out here: https://collectd.org/wiki/index.php/Plugin:Exec
Second, your output format is wrong. I found this part of the documentation poorly written, because it isn't completely clear how the data source type and the metric name are built out of the string. Taking the example from the page above:
echo "PUTVAL \"$HOSTNAME/exec-magic/gauge-magic_level\" interval=$INTERVAL N:$VALUE"
Then:
exec-magic is the plugin name
magic_level is the metric name
gauge is the data source type from collectd types
N: is the abbreviation for "now" as defined in the exec plugin
So putting this together, you'd have something similar to:
#!/bin/bash
# Hostname and interval are provided by collectd in the environment; fall back to sane defaults.
HOSTNAME="${COLLECTD_HOSTNAME:-localhost}"
INTERVAL="${COLLECTD_INTERVAL:-60}"
# Keep running and emit one PUTVAL line per interval, as the exec plugin expects.
while sleep "$INTERVAL"; do
    VALUE=$(/bin/systemctl status | wc -l)
    echo "PUTVAL \"${HOSTNAME}/cwagent/counter-line_count\" interval=$INTERVAL N:$VALUE"
done
In this case you are using the simple counter type and returning a single value: the number of lines counted by your command.
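To sanity-check the script outside of collectd, you can run it by hand with the environment variables it would normally receive (hypothetical values shown):
COLLECTD_HOSTNAME=myhost COLLECTD_INTERVAL=10 /opt/aws/amazon-cloudwatch-agent/bin/supervisor.sh
# Expected output, one line every 10 seconds, e.g.:
# PUTVAL "myhost/cwagent/counter-line_count" interval=10 N:73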

Difference between **journalctl -u test.service** and **journalctl CONTAINER_NAME=test**

I have a systemd service file which runs a docker container with the journald log driver.
ExecStart=/usr/bin/docker run \
--name ${CONTAINER_NAME} \
-p ${PORT}:8080 \
--add-host ${DNS} \
-v /etc/localtime:/etc/localtime:ro \
--log-driver=journald \
--log-opt tag="docker.{{.Name}}" \
${RESPOSITORY_NAME}/${CONTAINER_NAME}
ExecStop=-/usr/bin/docker stop ${CONTAINER_NAME}
When I check the logs via journalctl I see two different _TRANSPORT.
With journalctl -u test.service I see _TRANSPORT=stdout, and with journalctl CONTAINER_NAME=test I see _TRANSPORT=journal.
What is the difference?
The difference here is in how the logs get to systemd-journald before they are logged.
As of right now, the supported transports (at least according to the _TRANSPORT field in systemd-journald) are: audit, driver, syslog, journal, stdout and kernel (see systemd.journal-fields(7)).
In your case, everything logged to stdout by commands executed by the ExecStart= and ExecStop= directives is logged under the _TRANSPORT=stdout transport.
However, Docker is internally capable of using the journald logging driver which, among other things, introduces several custom journal fields - one of them being CONTAINER_ID=. It's just a different method of delivering data to systemd-journald - instead of relying on systemd to catch and send everything from stdout to systemd-journald, Docker internally sends everything straight to systemd-journald by itself.
This can be achieved by using the sd-journal API (as described in sd-journal(3)). Docker uses the go-systemd Go bindings for the sd-journal C library.
Simple example:
hello.c
#include <stdio.h>
#include <systemd/sd-journal.h>
int main(void)
{
printf("Hello from stdout\n");
sd_journal_print(LOG_INFO, "Hello from journald");
return 0;
}
# gcc -o /var/tmp/hello hello.c -lsystemd
# cat > /etc/systemd/system/hello.service << EOF
[Service]
ExecStart=/var/tmp/hello
EOF
# systemctl daemon-reload
# systemctl start hello.service
Now if I check the journal, I'll see both messages:
# journalctl -u hello.service
-- Logs begin at Mon 2019-09-30 22:08:02 CEST, end at Fri 2020-03-27 17:11:29 CET. --
Mar 27 17:08:28 localhost systemd[1]: Started hello.service.
Mar 27 17:08:28 localhost hello[921852]: Hello from journald
Mar 27 17:08:28 localhost hello[921852]: Hello from stdout
Mar 27 17:08:28 localhost systemd[1]: hello.service: Succeeded.
But each of them arrived using a different transport:
# journalctl -u hello.service _TRANSPORT=stdout
-- Logs begin at Mon 2019-09-30 22:08:02 CEST, end at Fri 2020-03-27 17:12:29 CET. --
Mar 27 17:08:28 localhost hello[921852]: Hello from stdout
# journalctl -u hello.service _TRANSPORT=journal
-- Logs begin at Mon 2019-09-30 22:08:02 CEST, end at Fri 2020-03-27 17:12:29 CET. --
Mar 27 17:08:28 localhost systemd[1]: Started hello.service.
Mar 27 17:08:28 localhost hello[921852]: Hello from journald
Mar 27 17:08:28 localhost systemd[1]: hello.service: Succeeded.
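To see which transport an entry arrived through directly, rather than inferring it from the filtered views, you can dump the stored fields; a quick check along these lines works:
# journalctl -u hello.service -o verbose | grep _TRANSPORT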

systemd timer: several instances, different times?

I'm using btrfs-scrub@.timer, which is a template timer unit (the instance argument represents the btrfs volume to scrub).
I need to scrub several volumes regularly, but I'd like the scrubs not to happen simultaneously (e.g. a scrub every day, each day a different volume).
According to the documentation, you can only have one argument per timer.
Is there a simple way to do that with systemd timers?
You could override the timer specifications for each unit using drop-in files, for example:
# /etc/systemd/system/btrfs-scrub@-.timer.d/OnCalendar.conf
[Timer]
OnCalendar=
OnCalendar=*-*-01 00:00:00
 
# /etc/systemd/system/btrfs-scrub@usr.timer.d/OnCalendar.conf
[Timer]
OnCalendar=
OnCalendar=*-*-10 00:00:00
 
# /etc/systemd/system/btrfs-scrub@var-lib.timer.d/OnCalendar.conf
[Timer]
OnCalendar=
OnCalendar=*-*-20 00:00:00
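After creating the drop-ins and running a daemon-reload, you can confirm that each instance picked up its own schedule (assuming the template is named btrfs-scrub@.timer and the instances are enabled):
sudo systemctl daemon-reload
systemctl list-timers 'btrfs-scrub@*'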

How can I show progress for a long-running Ansible task?

I have some Ansible tasks that perform unfortunately long operations, such as running a synchronization operation against an S3 folder. It's not always clear whether they're progressing or just stuck (or the ssh connection has died), so it would be nice to have some sort of progress output displayed. If the command's stdout/stderr were displayed directly I'd see it, but Ansible captures the output.
Piping output back is a difficult problem for Ansible to solve in its current form. But are there any Ansible tricks I can use to provide some sort of indication that things are still moving?
Current ticket is https://github.com/ansible/ansible/issues/4870
I came across this problem today on OSX, where I was running a docker shell command which took a long time to build and there was no output whilst it built. It was very frustrating to not understand whether the command had hung or was just progressing slowly.
I decided to pipe the output (and error) of the shell command to a port, which could then be listened to via netcat in a separate terminal.
myplaybook.yml
- name: run some long-running task and pipe to a port
  shell: myLongRunningApp > /dev/tcp/localhost/4000 2>&1
And in a separate terminal window:
$ nc -lk 4000
Output from my
long
running
app will appear here
Note that I pipe the error output to the same port; I could just as easily pipe it to a different port.
Also, I ended up setting a variable called nc_port which will allow for changing the port in case that port is in use. The ansible task then looks like:
shell: myLongRunningApp > /dev/tcp/localhost/{{nc_port}} 2>&1
Note that the command myLongRunningApp is being executed on localhost (i.e. that's the host set in the inventory) which is why I listen to localhost with nc.
Ansible has since implemented the following:
---
# Requires ansible 1.8+
- name: 'YUM - async task'
  yum:
    name: docker-io
    state: installed
  async: 1000
  poll: 0
  register: yum_sleeper

- name: 'YUM - check on async task'
  async_status:
    jid: "{{ yum_sleeper.ansible_job_id }}"
  register: job_result
  until: job_result.finished
  retries: 30
For further information, see the official documentation on the topic (make sure you're selecting your version of Ansible).
There are a couple of things you can do, but as you have rightly pointed out, Ansible in its current form doesn't really offer a good solution.
Official-ish solutions:
One idea is to mark the task as async and poll it. Obviously this is only suitable if it is capable of running in such a manner without causing failure elsewhere in your playbook. The async docs are here and here's an example lifted from them:
- hosts: all
  remote_user: root
  tasks:
    - name: simulate long running op (15 sec), wait for up to 45 sec, poll every 5 sec
      command: /bin/sleep 15
      async: 45
      poll: 5
This can at least give you a 'ping' to know that the task isn't hanging.
The only other officially endorsed method would be Ansible Tower, which has progress bars for tasks but isn't free.
Hacky-ish solutions:
Beyond the above, you're pretty much going to have to roll your own. Your specific example of syncing an S3 bucket could be monitored fairly easily with a script that periodically calls the AWS CLI and counts the number of items in the bucket, but that's hardly a good, generic solution.
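For example, something as crude as the following, run in a separate terminal while the task executes, would show whether the object count is still growing (the bucket name is a placeholder):
watch -n 30 'aws s3 ls s3://my-bucket --recursive | wc -l'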
The only thing I could imagine being somewhat effective would be watching the incoming ssh session from one of your nodes.
To do that you could configure the ansible user on that machine to connect via screen and actively watch the session. Alternatively, you could use the log_output option in the sudoers entry for that user, which would let you tail the resulting log file; details of log_output can be found in the sudoers man page.
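As a rough sketch of the sudoers approach (the option names come from the sudoers man page; the user name and log location are assumptions you would adapt):
# /etc/sudoers.d/ansible-iolog, edited with visudo -f
Defaults:ansible log_output
Defaults:ansible iolog_dir=/var/log/sudo-io
Recorded sessions can then be replayed with sudoreplay, or the session output files under /var/log/sudo-io tailed while the task runs.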
If you're on Linux you may use systemd-run to create a transient unit and inspect the output with journalctl, like:
sudo systemd-run --unit foo \
  bash -c 'for i in {0..10}; do
    echo "$((i * 10))%"; sleep 1;
  done;
  echo "Complete"'
And in another session
sudo journalctl -xf --unit foo
It would output something like:
Apr 07 02:10:34 localhost.localdomain systemd[1]: Started /bin/bash -c for i in {0..10}; do echo "$((i * 10))%"; sleep 1; done; echo "Complete".
-- Subject: Unit foo.service has finished start-up
-- Defined-By: systemd
-- Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit foo.service has finished starting up.
--
-- The start-up result is done.
Apr 07 02:10:34 localhost.localdomain bash[10083]: 0%
Apr 07 02:10:35 localhost.localdomain bash[10083]: 10%
Apr 07 02:10:36 localhost.localdomain bash[10083]: 20%
Apr 07 02:10:37 localhost.localdomain bash[10083]: 30%
Apr 07 02:10:38 localhost.localdomain bash[10083]: 40%
Apr 07 02:10:39 localhost.localdomain bash[10083]: 50%
Apr 07 02:10:40 localhost.localdomain bash[10083]: 60%
Apr 07 02:10:41 localhost.localdomain bash[10083]: 70%
Apr 07 02:10:42 localhost.localdomain bash[10083]: 80%
Apr 07 02:10:43 localhost.localdomain bash[10083]: 90%
Apr 07 02:10:44 localhost.localdomain bash[10083]: 100%
Apr 07 02:10:45 localhost.localdomain bash[10083]: Complete

Parsing entry name from a log

Writing bash parsing scripts is my own personal nightmare, so here I am.
The server log format is below:
197 INFO Thu Mar 27 10:10:32 2014
seq_1_1..JobControl (DSWaitForJob): Waiting for job job_1_1_1 to finish
198 INFO Thu Mar 27 10:10:36 2014
seq_1_1..JobControl (DSWaitForJob): Job job_1_1_1 has finished, status = 3 (Aborted)
199 WARNING Thu Mar 27 10:10:36 2014
seq_1_1..JobControl (#job_1_1_1): Job job_1_1_1 did not finish OK, status = 'Aborted'
From here I need to parse the job name out of the line that follows this format:
Job job_name has finished, status = 3 (Aborted)
So from the output above I should get: job_1_1_1
What would such a script look like, given that I get this server log as the output of some command?
Thanks xx
Using grep -P:
grep -oP '\w+(?= has finished, status = 3)' file
job_1_1_1
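If your grep doesn't support -P (PCRE look-aheads), a sed equivalent would be something along these lines:
sed -n 's/.*Job \(.*\) has finished, status = 3.*/\1/p' file
job_1_1_1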
