I have a systemd service file which runs a Docker container with the journald log driver.
ExecStart=/usr/bin/docker run \
--name ${CONTAINER_NAME} \
-p ${PORT}:8080 \
--add-host ${DNS} \
-v /etc/localtime:/etc/localtime:ro \
--log-driver=journald \
--log-opt tag="docker.{{.Name}}" \
${REPOSITORY_NAME}/${CONTAINER_NAME}
ExecStop=-/usr/bin/docker stop ${CONTAINER_NAME}
When I check the logs via journalctl I see two different _TRANSPORT values.
With journalctl -u test.service I see _TRANSPORT=stdout, and with journalctl CONTAINER_NAME=test I see _TRANSPORT=journal.
What is the difference?
The difference here is in how the logs get to systemd-journald before they are logged.
As of right now, the supported transports (at least according to the _TRANSPORT field in systemd-journald) are: audit, driver, syslog, journal, stdout and kernel (see systemd.journal-fields(7)).
In your case, everything logged to stdout by commands executed by the ExecStart= and ExecStop= directives is logged under the _TRANSPORT=stdout transport.
However, Docker is internally capable of using the journald logging driver which, among other things, introduces several custom journal fields - one of them being CONTAINER_ID=. It's just a different method of delivering data to systemd-journald - instead of relying on systemd to catch and send everything from stdout to systemd-journald, Docker internally sends everything straight to systemd-journald by itself.
This can be achieved by using the sd-journal API (as described in sd-journal(3)). Docker uses the go-systemd Go bindings for the sd-journal C library.
Simple example:
hello.c
#include <stdio.h>
#include <systemd/sd-journal.h>

int main(void)
{
    printf("Hello from stdout\n");
    sd_journal_print(LOG_INFO, "Hello from journald");
    return 0;
}
# gcc -o /var/tmp/hello hello.c -lsystemd
# cat > /etc/systemd/system/hello.service << EOF
[Service]
ExecStart=/var/tmp/hello
EOF
# systemctl daemon-reload
# systemctl start hello.service
Now if I check the journal, I'll see both messages:
# journalctl -u hello.service
-- Logs begin at Mon 2019-09-30 22:08:02 CEST, end at Fri 2020-03-27 17:11:29 CET. --
Mar 27 17:08:28 localhost systemd[1]: Started hello.service.
Mar 27 17:08:28 localhost hello[921852]: Hello from journald
Mar 27 17:08:28 localhost hello[921852]: Hello from stdout
Mar 27 17:08:28 localhost systemd[1]: hello.service: Succeeded.
But each of them arrived using a different transport:
# journalctl -u hello.service _TRANSPORT=stdout
-- Logs begin at Mon 2019-09-30 22:08:02 CEST, end at Fri 2020-03-27 17:12:29 CET. --
Mar 27 17:08:28 localhost hello[921852]: Hello from stdout
# journalctl -u hello.service _TRANSPORT=journal
-- Logs begin at Mon 2019-09-30 22:08:02 CEST, end at Fri 2020-03-27 17:12:29 CET. --
Mar 27 17:08:28 localhost systemd[1]: Started hello.service.
Mar 27 17:08:28 localhost hello[921852]: Hello from journald
Mar 27 17:08:28 localhost systemd[1]: hello.service: Succeeded.
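Back in the Docker case, you can inspect the journald driver's custom fields the same way, by asking journalctl for the verbose output of one such record (using the container name from the question):
# journalctl CONTAINER_NAME=test -o verbose -n 1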
Related
Using the following redis.conf
▶ cat redis.conf
bind 0.0.0.0
spinning up a redis container
▶ docker run -d --name redis-test -p 11111:6379 -v /Users/redis.conf:/redis.conf redis redis-server /redis.conf
59eb1612e8c3e2403e18ce889ce1438f6c6a23a7c70bed30b46ff765b7fe7038
logs seem healthy
▶ docker logs -f 59eb1612e8c3e2403e18ce889ce1438f6c6a23a7c70bed30b46ff765b7fe7038
1:C 18 Mar 2021 17:57:13.954 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1:C 18 Mar 2021 17:57:13.954 # Redis version=6.2.1, bits=64, commit=00000000, modified=0, pid=1, just started
1:C 18 Mar 2021 17:57:13.954 # Configuration loaded
1:M 18 Mar 2021 17:57:13.955 * monotonic clock: POSIX clock_gettime
1:M 18 Mar 2021 17:57:13.955 * Running mode=standalone, port=6379.
1:M 18 Mar 2021 17:57:13.955 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
1:M 18 Mar 2021 17:57:13.956 # Server initialized
1:M 18 Mar 2021 17:57:13.956 * Ready to accept connections
container seems up
▶ docker ps | grep -i redis
59eb1612e8c3 redis "docker-entrypoint.s…" 3 minutes ago Up 3 minutes 0.0.0.0:11111->6379/tcp redis-test
If all the above are more or less good indications, why am I unable to connect to the container?
▶ redis-cli -h localhost -p 11111
Could not connect to Redis at localhost:11111: Connection refused
not connected>
▶ redis-cli -h 127.0.0.1 -p 11111
Could not connect to Redis at 127.0.0.1:11111: Connection refused
not connected>
I'm working on macOS Catalina.
Find the IP address of the container called redis-test by running this command (I'm on Linux, but I think it should be the same on macOS):
docker inspect redis-test | grep -i ipaddress
The result should be something like this:
"IPAddress": "172.21.0.2"
Now try:
redis-cli -h 172.21.0.2 -p 11111
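Alternatively, docker inspect can extract the address directly with a Go template, so you don't need the grep (same container name):
docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' redis-test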
I'm trying to return a value from a simple script. However, I'm getting the following error.
Feb 26 09:26:37 localhost systemd[1]: Starting Collectd statistics daemon...
Feb 26 09:26:37 localhost collectd[834]: plugin_load: plugin "exec" successfully loaded.
Feb 26 09:26:37 localhost collectd[834]: Systemd detected, trying to signal readyness.
Feb 26 09:26:37 localhost systemd[1]: Started Collectd statistics daemon.
Feb 26 09:26:37 localhost collectd[834]: Initialization complete, entering read-loop.
Feb 26 09:26:37 localhost collectd[834]: exec plugin: Unable to parse command, ignoring line: "73"
Feb 26 09:26:47 localhost collectd[834]: exec plugin: Unable to parse command, ignoring line: "74"
Feb 26 09:26:57 localhost collectd[834]: exec plugin: Unable to parse command, ignoring line: "73"
Feb 26 09:27:07 localhost collectd[834]: exec plugin: Unable to parse command, ignoring line: "73"
My config is
LoadPlugin exec
<Plugin exec>
Exec "cwagent" "/opt/aws/amazon-cloudwatch-agent/bin/supervisor.sh"
</Plugin>
and my script is
#!/bin/bash
VALUE=$(/bin/systemctl status | wc -l)
echo "$VALUE"
I realise that this is probably a silly mistake I'm making. I have spent a bit of time playing around and googling to try to understand the problem. But I'm afraid I've made little progress. Grateful for any advice :¬)
A couple of things. First, your plugin is forked off by collectd with the expectation that it keeps running and producing consumable output, so you need to wrap your logic in a while loop, as laid out here: https://collectd.org/wiki/index.php/Plugin:Exec
Second, your output format is wrong. I found this bit of the documentation badly written, because it isn't completely clear how the gauge name and the metric name are built out of the string. Taking the example from the page above:
echo "PUTVAL \"$HOSTNAME/exec-magic/gauge-magic_level\" interval=$INTERVAL N:$VALUE"
Then:
exec-magic is the plugin name
magic_level is the metric name
gauge is the data source type from collectd types
N: is the abbreviation for "now" as defined in the exec plugin
So, putting this together, you'd have something similar to:
#!/bin/bash
HOSTNAME="${COLLECTD_HOSTNAME:-localhost}"
INTERVAL="${COLLECTD_INTERVAL:-60}"
while sleep "$INTERVAL"; do
  VALUE=$(/bin/systemctl status | wc -l)
  echo "PUTVAL \"${HOSTNAME}/cwagent/counter-line_count\" interval=$INTERVAL N:$VALUE"
done
In this case you are using the simple counter type and returning a single value equivalent to the number of lines you counted in your command.
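With the fallback values from the script, each iteration hands collectd a line like this (the sample value is hypothetical):
PUTVAL "localhost/cwagent/counter-line_count" interval=60 N:73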
I'm creating my first background service and I want to communicate with it through a socket.
I have the following script /tmp/myservice.sh:
#! /usr/bin/env bash
while read received_cmd
do
echo "Received command ${received_cmd}"
done
And the following socket /etc/systemd/user/myservice.socket
[Unit]
Description=Socket to communicate with myservice
[Socket]
ListenSequentialPacket=/tmp/myservice.socket
And the following service:
[Unit]
Description=A simple service example
[Service]
ExecStart=/bin/bash /tmp/myservice.sh
StandardError=journal
StandardInput=socket
StandardOutput=socket
Type=simple
The idea is to understand how to communicate with a background service, here using a Unix file socket. The script works well when launched from the shell and reading stdin, and I thought that by setting StandardInput=socket it would read from the socket the same way.
Nevertheless, when I run nc -U /tmp/myservice.socket the command returns right away and I have the following output:
$ journalctl --user -u myservice
-- Logs begin at Sat 2020-10-24 17:26:25 BST, end at Thu 2020-10-29 14:00:53 GMT. --
Oct 29 08:40:16 shiny systemd[1689]: Started A simple service example.
Oct 29 08:40:16 shiny bash[21941]: /tmp/myservice.sh: line 3: read: read error: 0: Invalid argument
Oct 29 08:40:16 shiny systemd[1689]: myservice.service: Succeeded.
Oct 29 08:40:16 shiny systemd[1689]: Started A simple service example.
Oct 29 08:40:16 shiny bash[21942]: /tmp/myservice.sh: line 3: read: read error: 0: Invalid argument
Oct 29 08:40:16 shiny systemd[1689]: myservice.service: Succeeded.
Oct 29 08:40:16 shiny systemd[1689]: Started A simple service example.
Oct 29 08:40:16 shiny bash[21943]: /tmp/myservice.sh: line 3: read: read error: 0: Invalid argument
Oct 29 08:40:16 shiny systemd[1689]: myservice.service: Succeeded.
Oct 29 08:40:16 shiny systemd[1689]: Started A simple service example.
Oct 29 08:40:16 shiny bash[21944]: /tmp/myservice.sh: line 3: read: read error: 0: Invalid argument
Oct 29 08:40:16 shiny systemd[1689]: myservice.service: Succeeded.
Oct 29 08:40:16 shiny systemd[1689]: Started A simple service example.
Oct 29 08:40:16 shiny bash[21945]: /tmp/myservice.sh: line 3: read: read error: 0: Invalid argument
Oct 29 08:40:16 shiny systemd[1689]: myservice.service: Succeeded.
Oct 29 08:40:16 shiny systemd[1689]: myservice.service: Start request repeated too quickly.
Oct 29 08:40:16 shiny systemd[1689]: myservice.service: Failed with result 'start-limit-hit'.
Oct 29 08:40:16 shiny systemd[1689]: Failed to start A simple service example.
Did I misunderstand how sockets work? Why does read fail to read from the socket? Should I use another mechanism to communicate with my background service (as I said, it's my first background service, so I may be doing unconventional things here)?
The only thing I have seen working with a shell script is ListenStream= rather than ListenSequentialPacket=. (Obviously, this means you lose packet boundaries, but shell read is usually oriented to reading lines ending in \n from streams, so it is not usually a problem.)
But the most important thing that is missing is the extra Accept= line:
[Socket]
ListenStream=...
Accept=true
As I understand it, without this the service will be passed a socket on which it must first call accept(), to get the actual connection socket (hence the read error). The service must then also handle all further connections itself.
By using Accept=true, a new service instance will be started for each new connection, and each instance will be passed an immediately usable socket. Note, however, that this means the service must now be templated, i.e. called myservice@.service rather than myservice.service.
(For datagram sockets, Accept= must be left at the default false.) See man systemd.socket.
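For illustration, a minimal working pair of units following this pattern might look like this (a sketch based on the files from the question; note the @ in the service name):

# /etc/systemd/user/myservice.socket
[Unit]
Description=Socket to communicate with myservice

[Socket]
ListenStream=/tmp/myservice.socket
Accept=true

# /etc/systemd/user/myservice@.service
[Unit]
Description=A simple service example

[Service]
ExecStart=/bin/bash /tmp/myservice.sh
StandardInput=socket
StandardOutput=socket
StandardError=journal

After systemctl --user daemon-reload and systemctl --user start myservice.socket, each nc -U /tmp/myservice.socket connection should spawn a fresh instance of the templated service.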
I'm having trouble with systemd journal cursors.
If I SeekTail(), I get a value for the cursor and can keep calling Next() and it behaves exactly as expected.
However, if I SeekCursor() and then call Next(), it jumps back to the head and starts reading over again. Why would it do that? I can verify that it did locate the cursor correctly, but it's as though SeekCursor() only worked for that specific item and that's all. This is not what I would expect from reading the man pages and other documentation.
I'm using go-systemd from the CoreOS project, which is a simple wrapper around the systemd C API.
But the Go wrapper is not the issue; the C library is. I can see that journalctl does the same thing on Ubuntu.
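In sd-journal terms, the pattern I expect to work is roughly this (a minimal C sketch, error checking elided):

#include <stdio.h>
#include <stdlib.h>
#include <systemd/sd-journal.h>

int main(void)
{
    sd_journal *j;
    char *cursor = NULL;

    sd_journal_open(&j, SD_JOURNAL_LOCAL_ONLY);

    /* Tail: seek to the end, step back onto the last entry, save a cursor. */
    sd_journal_seek_tail(j);
    sd_journal_previous(j);
    sd_journal_get_cursor(j, &cursor);

    /* Resume: sd_journal_seek_cursor() only positions the read pointer;
     * the first sd_journal_next() should land on the cursor entry itself,
     * which sd_journal_test_cursor() can verify. */
    sd_journal_seek_cursor(j, cursor);
    sd_journal_next(j);
    if (sd_journal_test_cursor(j, cursor) > 0)
        printf("positioned on the saved cursor\n");

    free(cursor);
    sd_journal_close(j);
    return 0;
}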
For example: append to the journal, show the tail output, get the full entry detail as JSON, then jump to the cursor and show the tail again:
matthewh@xen:~$ echo "Cursor example" | systemd-cat
matthewh@xen:~$ journalctl -f
-- Logs begin at Mon 2017-07-03 08:56:12 NZST. --
May 31 17:50:31 xen code.desktop[6771]: [main 17:50:31] update#setState idle
May 31 17:55:01 xen CRON[4468]: pam_unix(cron:session): session opened for user root by (uid=0)
May 31 17:55:01 xen CRON[4469]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
May 31 17:55:01 xen CRON[4468]: pam_unix(cron:session): session closed for user root
May 31 17:55:03 xen smokeping[2839]: RRDs::update ERROR: /var/lib/smokeping/Local/LocalMachine.rrd: illegal attempt to update using time 1527746103 when last update time is 4073847643 (minimum one second step)
May 31 17:55:22 xen cat[4479]: Hello
May 31 17:59:28 xen cat[4539]: Cursor example
May 31 18:00:03 xen smokeping[2839]: RRDs::update ERROR: /var/lib/smokeping/Local/LocalMachine.rrd: illegal attempt to update using time 1527746403 when last update time is 4073847643 (minimum one second step)
May 31 18:00:06 xen cat[4547]: Cursor example
May 31 18:01:09 xen cat[4597]: Cursor example
^C
matthewh@xen:~$ journalctl -f -o json-pretty -n1
{
"__CURSOR" : "s=b7f2a0f19c9946abab26788729a244c5;i=52a5;b=1ba1d5cabb5840adb02eedc4aba5b4d6;m=2d96b77f94;t=56d7a319ee462;x=8afac4ada39ae1fb",
"__REALTIME_TIMESTAMP" : "1527746469487714",
"__MONOTONIC_TIMESTAMP" : "195802136468",
"_BOOT_ID" : "1ba1d5cabb5840adb02eedc4aba5b4d6",
"_UID" : "1000",
"_GID" : "1000",
"_CAP_EFFECTIVE" : "0",
"_MACHINE_ID" : "f899a862e4aa4775b8995564d8da565d",
"_HOSTNAME" : "xen",
"_TRANSPORT" : "stdout",
"PRIORITY" : "6",
"_COMM" : "cat",
"MESSAGE" : "Cursor example",
"_STREAM_ID" : "d1fbcc3ff027401e9dc95b5648f9322e",
"_PID" : "4597"
}
^C
matthewh@xen:~$ journalctl -f --cursor="s=b7f2a0f19c9946abab26788729a244c5;i=52a5;b=1ba1d5cabb5840adb02eedc4aba5b4d6;m=2d96b77f94;t=56d7a319ee462;x=8afac4ada39ae1fb"
-- Logs begin at Mon 2017-07-03 08:56:12 NZST. --
May 31 18:01:09 xen cat[4597]: Cursor example
-- Reboot --
Feb 04 13:03:03 xen systemd-journald[420]: Runtime journal (/run/log/journal/) is 8.0M, max 241.0M, 233.0M free.
Feb 04 13:03:03 xen kernel: Initializing cgroup subsys cpuset
Feb 04 13:03:03 xen kernel: Initializing cgroup subsys cpu
Feb 04 13:03:03 xen kernel: Initializing cgroup subsys cpuacct
Feb 04 13:03:03 xen kernel: Linux version 4.4.0-116-generic (buildd@lgw01-amd64-021) (gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.9) ) #140-Ubuntu SMP Mon Feb 12 21:23:04 UTC 2018 (Ubuntu 4.4.0-116.140-generic 4.4.98)
Feb 04 13:03:03 xen kernel: Command line: BOOT_IMAGE=/boot/vmlinuz-4.4.0-116-generic root=UUID=f95a581f-2afb-4428-bade-c913f1c51741 ro quiet splash vt.handoff=7
Feb 04 13:03:03 xen kernel: KERNEL supported cpus:
Feb 04 13:03:03 xen kernel: Intel GenuineIntel
Feb 04 13:03:03 xen kernel: AMD AuthenticAMD
^C
Note the "-- Reboot --" text and the fact that it jumped back in time. But prior to that, it located my entry via systemd-cat, so the cursor was found.
What am I doing wrong? Is it a bug, or an oversight on my part?
Oddly enough, I have a CoreOS server I was able to test this on, and there it behaves as expected. The version of journalctl is the same on both. All the configuration is untouched, stock standard.
I have some Ansible tasks that perform unfortunately long operations, things like running a synchronization operation against an S3 folder. It's not always clear whether they're progressing or just stuck (or the ssh connection has died), so it would be nice to have some sort of progress output displayed. If the command's stdout/stderr were displayed directly, I'd see that, but Ansible captures the output.
Piping output back is a difficult problem for Ansible to solve in its current form. But are there any Ansible tricks I can use to provide some sort of indication that things are still moving?
Current ticket is https://github.com/ansible/ansible/issues/4870
I came across this problem today on OSX, where I was running a docker shell command which took a long time to build and produced no output whilst it built. It was very frustrating not to know whether the command had hung or was just progressing slowly.
I decided to pipe the output (and error) of the shell command to a port, which could then be listened to via netcat in a separate terminal.
myplaybook.yml
- name: run some long-running task and pipe to a port
  shell: myLongRunningApp > /dev/tcp/localhost/4000 2>&1
And in a separate terminal window:
$ nc -lk 4000
Output from my
long
running
app will appear here
Note that I pipe the error output to the same port; I could as easily pipe to a different port.
Also, I ended up setting a variable called nc_port, which allows changing the port in case that port is in use. The Ansible task then looks like:
shell: myLongRunningApp > /dev/tcp/localhost/{{nc_port}} 2>&1
Note that the command myLongRunningApp is being executed on localhost (i.e. that's the host set in the inventory) which is why I listen to localhost with nc.
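One caveat: the /dev/tcp/... redirection is a bash feature, not POSIX sh, so if the remote account's default shell is something else you may need to point the shell module at bash explicitly (a sketch of the same task):

- name: run some long-running task and pipe to a port
  shell: myLongRunningApp > /dev/tcp/localhost/{{ nc_port }} 2>&1
  args:
    executable: /bin/bash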
Ansible has since implemented the following:
---
# Requires ansible 1.8+

- name: 'YUM - async task'
  yum:
    name: docker-io
    state: installed
  async: 1000
  poll: 0
  register: yum_sleeper

- name: 'YUM - check on async task'
  async_status:
    jid: "{{ yum_sleeper.ansible_job_id }}"
  register: job_result
  until: job_result.finished
  retries: 30
For further information, see the official documentation on the topic (make sure you're selecting your version of Ansible).
There's a couple of things you can do, but as you have rightly pointed out, Ansible in its current form doesn't really offer a good solution.
Official-ish solutions:
One idea is to mark the task as async and poll it. Obviously this is only suitable if it is capable of running in such a manner without causing failure elsewhere in your playbook. The async docs are here and here's an example lifted from them:
- hosts: all
  remote_user: root
  tasks:
    - name: simulate long running op (15 sec), wait for up to 45 sec, poll every 5 sec
      command: /bin/sleep 15
      async: 45
      poll: 5
This can at least give you a 'ping' to know that the task isn't hanging.
The only other officially endorsed method would be Ansible Tower, which has progress bars for tasks but isn't free.
Hacky-ish solutions:
Beyond the above, you're pretty much going to have to roll your own. Your specific example of syncing an S3 bucket could be monitored fairly easily with a script that periodically calls the AWS CLI and counts the number of items in the bucket, but that's hardly a good, generic solution.
The only thing I could imagine being somewhat effective would be watching the incoming ssh session from one of your nodes.
To do that, you could configure the ansible user on that machine to connect via screen and actively watch the session. Alternatively, you could use the log_output option in the sudoers entry for that user, allowing you to tail the log file. Details of log_output can be found in the sudoers man page.
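For the sudoers route, the relevant entries would look something like this (a sketch; the user name and I/O log directory here are assumptions, and sessions recorded this way can be replayed with sudoreplay):

Defaults:ansible log_output
Defaults:ansible iolog_dir=/var/log/sudo-io/%{user}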
If you're on Linux you may use systemd-run to create a transient unit and inspect the output with journalctl, like:
sudo systemd-run --unit foo \
bash -c 'for i in {0..10}; do
echo "$((i * 10))%"; sleep 1;
done;
echo "Complete"'
And in another session
sudo journalctl -xf --unit foo
It would output something like:
Apr 07 02:10:34 localhost.localdomain systemd[1]: Started /bin/bash -c for i in {0..10}; do echo "$((i * 10))%"; sleep 1; done; echo "Complete".
-- Subject: Unit foo.service has finished start-up
-- Defined-By: systemd
-- Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit foo.service has finished starting up.
--
-- The start-up result is done.
Apr 07 02:10:34 localhost.localdomain bash[10083]: 0%
Apr 07 02:10:35 localhost.localdomain bash[10083]: 10%
Apr 07 02:10:36 localhost.localdomain bash[10083]: 20%
Apr 07 02:10:37 localhost.localdomain bash[10083]: 30%
Apr 07 02:10:38 localhost.localdomain bash[10083]: 40%
Apr 07 02:10:39 localhost.localdomain bash[10083]: 50%
Apr 07 02:10:40 localhost.localdomain bash[10083]: 60%
Apr 07 02:10:41 localhost.localdomain bash[10083]: 70%
Apr 07 02:10:42 localhost.localdomain bash[10083]: 80%
Apr 07 02:10:43 localhost.localdomain bash[10083]: 90%
Apr 07 02:10:44 localhost.localdomain bash[10083]: 100%
Apr 07 02:10:45 localhost.localdomain bash[10083]: Complete