How do I read / understand Ansible logs on a target host (written by syslog)?

When you execute Ansible against a host, the modules it runs write entries to syslog on that host, something like this:
Dec 1 15:00:22 run-tools python: ansible-<stdin> Invoked with partial=False links=None copy_links=None perms=None owner=False rsync_path=None dest_port=22 _local_rsync_path=rsync group=False existing_only=False archive=True _substitute_controller=False verify_host=False dirs=False private_key=None dest= compress=True rsync_timeout=0 rsync_opts=None set_remote_user=True recursive=None src=/etc/ansible/repo/external/golive/ checksum=False times=None mode=push ssh_args=None delete=False
Dec 1 15:00:22 run-tools python: ansible-<stdin> Invoked with partial=False links=None copy_links=None perms=None owner=False rsync_path=None dest_port=22 _local_rsync_path=rsync group=False existing_only=False archive=True _substitute_controller=False verify_host=False dirs=False private_key=None dest= compress=True rsync_timeout=0 rsync_opts=None set_remote_user=True recursive=None src=/etc/ansible/repo/external/golive/ checksum=False times=None mode=push ssh_args=None delete=False
Dec 1 15:00:22 run-tools python: ansible-<stdin> Invoked with partial=False links=None copy_links=None perms=None owner=False rsync_path=None dest_port=22 _local_rsync_path=rsync group=False existing_only=False archive=True _substitute_controller=False verify_host=False dirs=False private_key=None dest= compress=True rsync_timeout=0 rsync_opts=None set_remote_user=True recursive=None src=/etc/ansible/repo/external/golive/ checksum=False times=None mode=push ssh_args=None delete=False
Dec 1 15:00:56 run-tools python: ansible-<stdin> Invoked with filter=* fact_path=/etc/ansible/facts.d
Dec 1 15:09:56 run-tools python: ansible-<stdin> Invoked with checksum_algorithm=sha1 mime=False get_checksum=True path=/usr/local/bin/check_open_files_generic.sh checksum_algo=sha1 follow=False get_md5=False
Dec 1 15:09:56 run-tools python: ansible-<stdin> Invoked with directory_mode=None force=False remote_src=None path=/usr/local/bin/check_open_files_generic.sh owner=root follow=False group=root state=None content=NOT_LOGGING_PARAMETER serole=None diff_peek=None setype=None dest=/usr/local/bin/check_open_files_generic.sh selevel=None original_basename=check_open_files_generic.sh regexp=None validate=None src=check_open_files_generic.sh seuser=None recurse=False delimiter=None mode=0755 backup=None
Dec 1 15:20:03 run-tools python: ansible-<stdin> Invoked with warn=True executable=None _uses_shell=False _raw_params=visudo -c removes=None creates=None chdir=None
Is there any documentation or explanation of these logs that would help me understand how to read them? Specifically, I would like to be able to see what exactly Ansible did, which files it touched, etc. Is it possible to find that information there? Or to reconfigure Ansible so that it writes this kind of information there?
Is it possible to configure these logs at all? How?

I am not aware of documentation that explains the contents of syslog messages specifically. However, you can look at some of the logging code in AnsibleModule.log() to see what's going on. Basically, it's reporting module names and the parameters they were called with.
For configuring logs, there are some good suggestions in response to this related question. The summary is that you can get more information, including your request about what Ansible did, by specifying a log path and running with the verbose -v flag. For more fine-grained control, you can attack the problem from two different angles:
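For example, a minimal sketch of that setup (the log file path and playbook name are placeholders):
export ANSIBLE_LOG_PATH=/var/log/ansible.log   # or set log_path under [defaults] in ansible.cfg
ansible-playbook -vv site.yml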
From the playbook side, you can use the debug module or tailor your handling of changed/failed results to suit your needs. Both of those changes can add useful context to your log output.
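For example, a minimal sketch of the debug approach (the script name is hypothetical):
- name: run a long operation
  command: /usr/local/bin/sync-repo.sh   # hypothetical command
  register: sync_result

- name: add context to the log output
  debug:
    var: sync_result.stdout_lines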
Outside of playbooks, you can use Ansible callback plugins to control logging. Here is an example of a callback plugin which intercepts logs and outputs something more human readable.

Related

Systemctl bash variable

I'm trying to get this variable to work, but I always get an error about systemd-escape. Although I escape the special characters, the variable is not working:
status="systemctl status syslog-ng | grep Active: | sed 's/.*: //' | sed 's/since.*//g'"
The result I'm getting is here:
Invalid unit name "|" was escaped as "\x7c" (maybe you should use systemd-escape?)
Invalid unit name "|" was escaped as "\x7c" (maybe you should use systemd-escape?)
Invalid unit name "'s/.*:" was escaped as "\x27s-.*:" (maybe you should use systemd-escape?)
Invalid unit name "|" was escaped as "\x7c" (maybe you should use systemd-escape?)
Invalid unit name "'s/since.*//g'" was escaped as "\x27s-since.*--g\x27" (maybe you should use systemd-escape?)
Unit \x7c.service could not be found.
Unit grep.service could not be found.
Unit Active:.service could not be found.
Unit \x7c.service could not be found.
Unit sed.service could not be found.
Unit \x27.mount could not be found.
Unit \x7c.service could not be found.
Unit sed.service could not be found.
● syslog-ng.service - System Logger Daemon
Loaded: loaded (/usr/lib/systemd/system/syslog-ng.service; enabled; vendor preset: enabled)
Active: active (running) since Mon 2021-10-11 14:01:23 CEST; 26min ago
Docs: man:syslog-ng(8)
Main PID: 3020944 (syslog-ng)
Tasks: 3 (limit: 101081)
Memory: 8.9M
CGroup: /system.slice/syslog-ng.service
└─3020944 /usr/sbin/syslog-ng -F -p /var/run/syslogd.pid
Oct 11 14:01:23 syslog-ng systemd[1]: Starting System Logger Daemon...
Oct 11 14:01:23 syslog-ng syslog-ng[3020944]: [2021-10-11T14:01:23.798405] Plugin module not found in 'module-path'; module-path='/usr/lib64/syslog-ng'>
Oct 11 14:01:23 syslog-ng syslog-ng[3020944]: [2021-10-11T14:01:23.801271] Plugin module not found in 'module-path'; module-path='/usr/lib64/syslog-ng'>
Oct 11 14:01:23 syslog-ng syslog-ng[3020944]: [2021-10-11T14:01:23.801828] Plugin module not found in 'module-path'; module-path='/usr/lib64/syslog-ng'>
Oct 11 14:01:23 syslog-ng syslog-ng[3020944]: [2021-10-11T14:01:23.808340] WARNING: With use-dns(no), dns-cache() will be forced to 'no' too!;
Oct 11 14:01:23 syslog-ng systemd[1]: Started System Logger Daemon.
I just want to have the status of my syslog-ng in a variable.
We suggest following the systemd service unit documentation, as stated in its Command lines section:
This syntax is inspired by shell syntax, but only the meta-characters and expansions described in the following paragraphs are understood, and the expansion of variables is different. Specifically, redirection using "<", "<<", ">", and ">>", pipes using "|", running programs in the background using "&", and other elements of shell syntax are not supported.
You cannot assume bash or sh syntax/rules when writing a systemd service unit.
Environment variables are better defined statically.
If you need to store values in dynamic variables, we suggest creating a script, service-unit-xyx.bash, that executes all the required assignments and eventually runs the target service.
Using a single script service-unit-xyx.bash, you can easily log and debug every line.
Execute the service-unit-xyx.bash script with its full path in the ExecStart= command.
Using an execution script also simplifies service unit development, moving the logic from the systemd unit file into a bash/sh script.
It is also possible to use the ExecStartPre= and ExecStartPost= commands to control the service runtime environment.
Read and follow sample service units from the internet.
All commands and scripts should be specified with absolute paths.
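A minimal sketch of the wrapper approach (the daemon path is a placeholder):
#!/bin/bash
# service-unit-xyx.bash: do the dynamic assignments here, not in the unit file
status="$(systemctl status syslog-ng | grep 'Active:' | sed 's/.*: //' | sed 's/since.*//g')"
echo "syslog-ng is: ${status}"
# finally replace this shell with the real service process (placeholder command)
exec /usr/sbin/my-real-daemon
The unit file then only needs:
ExecStart=/usr/local/bin/service-unit-xyx.bash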

How do I check the successful retry with a separate command in Ansible?

Ansible 2.9.6
There is a standard way to use retries in Ansible:
- name: run my command
  register: result
  retries: 5
  delay: 60
  until: result.rc == 0
  shell:
    cmd: >
      mycommand.sh
until is a passive check here.
How can I do the check with a separate command? Something like "retry command A several times until command B returns 0".
Of course, I could put both commands inside a single shell execution, "commandA ; commandB", and get the exit status of the second one as result.rc. But is there an Ansible way to do this?
The Ansible way would point towards retries over a block or include that contains a command task for each script. However, that's not supported. You can use a modified version of the workaround described here, though it starts to get complicated, as the sketch below shows. Therefore, you may prefer to take Zeitounator's suggestion.
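A minimal sketch of that workaround (the file and command names are hypothetical): a tasks file retry_a_until_b.yml that re-includes itself until command B succeeds or the attempts run out.
# retry_a_until_b.yml
- name: count this attempt
  set_fact:
    attempt: "{{ (attempt | default(0) | int) + 1 }}"

- name: run command A
  shell: commandA.sh   # hypothetical command

- name: check with command B
  command: commandB.sh   # hypothetical command
  register: b_result
  failed_when: false

- name: give up after 5 attempts
  fail:
    msg: commandB never returned 0
  when: b_result.rc != 0 and attempt | int >= 5

- name: otherwise try again
  include_tasks: retry_a_until_b.yml
  when: b_result.rc != 0 and attempt | int < 5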

User-defined (or omitted) username when using the logger(1) Linux tool

I am trying to write some custom log entries. The problem is that if I use the logger command, the username running the command is also logged. I would like to omit that info so I can manually fill in anything I want. I have read the manual but could not find anything like that. I also tried implementing it in a script (Java) but did not quite succeed.
Example. Now I am seeing this:
Mar 2 10:31:28 $HOSTNAME $USERNAME: Hello world!
What I would like to see is this:
Mar 2 10:31:28 suhosin[666]: ALERT - canary mismatch on efree() - heap overflow detected (attacker '000.000.000.000', file 'xyz')
Use the -t option to set the tag.
$ logger -t 'nobody' 'hello'
Produces log:
Feb 28 10:25:37 myhostname nobody: hello
Relevant man page section:
-t, --tag tag
Mark every line to be logged with the specified tag. The default tag is the name of the user logged in on the terminal (or a user name based on effective user ID).
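Depending on your logger version (util-linux), you can also combine -t with --id to mimic the name[pid] form from the question; the tag and id values here are illustrative:
$ logger -t 'suhosin' --id=666 'ALERT - canary mismatch on efree() - heap overflow detected'
Produces log:
Mar 2 10:31:28 myhostname suhosin[666]: ALERT - canary mismatch on efree() - heap overflow detected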

command output not captured by shell script when invoked by snmp pass

The problem
SNMPD is correctly delegating SNMP polling requests to another program but the response from that program is not valid. A manual run of the program with the same arguments is responding correctly.
The detail
I've installed the correct LSI raid drivers on a server and want to configure SNMP. As per the instructions, I've added the following to /etc/snmp/snmpd.conf to redirect SNMP polling requests with a given OID prefix to a program:
pass .1.3.6.1.4.1.3582 /usr/sbin/lsi_mrdsnmpmain
It doesn't work correctly for SNMP polling requests:
snmpget -v1 -c public localhost .1.3.6.1.4.1.3582.5.1.4.2.1.2.1.32.1
I get the following response:
Error in packet
Reason: (noSuchName) There is no such variable name in this MIB.
Failed object: SNMPv2-SMI::enterprises.3582.5.1.4.2.1.2.1.32.1
What I've tried
SNMPD passes two arguments, -g and <oid>, and expects a three-line response: <oid>, <data-type> and <data-value>.
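For reference, a minimal pass handler following this calling convention might look like the sketch below (the type and value returned are illustrative):
#!/bin/bash
# answer snmpd "pass" GET requests with the three-line protocol
case "$1" in
  -g)
    echo "$2"       # line 1: the OID that was requested
    echo "integer"  # line 2: the data type
    echo "42"       # line 3: the value
    ;;
  *)
    exit 0          # other request types: no output means "no such variable"
    ;;
esac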
If I manually run the following:
/usr/sbin/lsi_mrdsnmpmain -g .1.3.6.1.4.1.3582.5.1.4.2.1.2.1.32.0
I get the correct three-line response:
.1.3.6.1.4.1.3582.5.1.4.2.1.2.1.32.0
integer
30
This means that the pass command is working and that the /usr/sbin/lsi_mrdsnmpmain program behaves correctly in this example.
I tried replacing /usr/sbin/lsi_mrdsnmpmain with a bash script. The bash script delegates the call and logs the supplied arguments and output from the delegated call:
#!/bin/bash
# log the arguments snmpd passes in ("$@" preserves them exactly)
echo "In: '$@" > /var/log/snmp-pass-test
# delegate to the real handler and capture its full output
RETURN=$(/usr/sbin/lsi_mrdsnmpmain "$@")
echo "$RETURN"
echo "Out: '$RETURN'" >> /var/log/snmp-pass-test
I then modified the pass command to redirect to the bash script. If I run the bash script manually, /usr/sbin/snmp-pass-test -g .1.3.6.1.4.1.3582.5.1.4.2.1.2.1.32.0, I get the correct three-line response, as I did when running /usr/sbin/lsi_mrdsnmpmain manually, and the following is logged:
In: '-g .1.3.6.1.4.1.3582.5.1.4.2.1.2.1.32.0
Out: '.1.3.6.1.4.1.3582.5.1.4.2.1.2.1.32.0
integer
30'
When I rerun the snmpget test, I get the same Error in packet... error, and the bash script's logging shows that the captured output of the delegated call is empty:
In: '-g .1.3.6.1.4.1.3582.5.1.4.2.1.2.1.32.0
Out: ''
If I modify the bash script to only echo an empty line I also get the same Error in packet... message.
I've also tried ensuring that the environment variables that are present when I manually call /usr/sbin/lsi_mrdsnmpmain are the same for the bash script but I get the same empty output.
Finally, my questions
Why would the bash script behave differently in these two scenarios?
Is it likely that the problem with the bash script is the same one originally noticed (the manually run program gives different output than when run by SNMPD)?
Updates
eewanco's suggestions
What user is running the program in each scenario?
I added echo "$(whoami)" > /var/log/snmp-pass-test to the bash script and root was added to the logs
Maybe try executing it in cron
I added the following to root's crontab, and the correct three-line response was logged:
* * * * * /usr/sbin/lsi_mrdsnmpmain -g .1.3.6.1.4.1.3582.5.1.4.2.1.2.1.32.1 >> /var/log/snmp-test-cron 2>&1
Grisha Levit's suggestion
Try logging the stderr
There aren't any errors logged
Checking /var/log/messages
When I run it via SNMPD, I get MegaRAID SNMP AGENT: Error in getting Shared Memory(lsi_mrdsnmpmain) logged. When I run it directly, I don't. I've done a bit of googling and I may need lm_sensors installed; I'll try this.
I installed lm_sensors & compat-libstdc++-33.i686 (the latter because it said it was a pre-requisite from the instructions and I was missing it), uninstalled and reinstalled the LSI drivers and am experiencing the same issue.
SELinux
I accidentally stumbled upon a page about extending snmpd with scripts, and it says to check that the script has the right SELinux context. I ran grep AVC /var/log/audit/audit.log | grep snmp before and after running snmpget, and the following entry is added as a direct result of running snmpget:
type=AVC msg=audit(1485967641.075:271): avc: denied { unix_read unix_write } for pid=5552 comm="lsi_mrdsnmpmain" key=558265 scontext=system_u:system_r:snmpd_t:s0 tcontext=system_u:system_r:initrc_t:s0 tclass=shm
I'm now assuming that SELinux is causing the call to fail; I'll dig further... see the answer below for the solution.
strace (eewanco's suggestion)
Try using strace with and without snmp and see if you can catch a system call failure or some additional hints
For completeness, I wanted to see if strace would have hinted that SELinux was denying. I had to remove the policy packages using semodule -r <policy-package-name> to reintroduce the problem, then ran the following:
strace snmpget -v1 -c public localhost .1.3.6.1.4.1.3582.5.1.4.2.1.2.1.32.1 >> strace.log 2>&1
The end of strace.log is as follows and unless I'm missing something, it doesn't seem to provide any hints:
...
sendmsg(3, {msg_name(16)={sa_family=AF_INET, sin_port=htons(161), sin_addr=inet_addr("127.0.0.1")}, msg_iov(1)= [{"0;\2\1\0\4\20public\240$\2\4I\264-m\2"..., 61}], msg_controllen=32, {cmsg_len=28, cmsg_level=SOL_IP, cmsg_type=, ...}, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL) = 61
select(4, [3], NULL, NULL, {0, 999997}) = 1 (in [3], left {0, 998475})
brk(0xab9000) = 0xab9000
recvmsg(3, {msg_name(16)={sa_family=AF_INET, sin_port=htons(161), sin_addr=inet_addr("127.0.0.1")}, msg_iov(1)= [{"0;\2\1\0\4\20public\242$\2\4I\264-m\2"..., 65536}], msg_controllen=0, msg_flags=0}, MSG_DONTWAIT) = 61
write(2, "Error in packet\nReason: (noSuchN"..., 81Error in packet
Reason: (noSuchName) There is no such variable name in this MIB.
) = 81
write(2, "Failed object: ", 15Failed object: ) = 15
write(2, "SNMPv2-SMI::enterprises.3582.5.1"..., 48SNMPv2- SMI::enterprises.3582.5.1.4.2.1.2.1.32.1
) = 48
write(2, "\n", 1
) = 1
brk(0xaa9000) = 0xaa9000
close(3) = 0
exit_group(2) = ?
+++ exited with 2 +++
It was SELinux that was denying snmpd a delegated call to /usr/sbin/lsi_mrdsnmpmain (and probably beyond).
To identify it, I ran grep AVC /var/log/audit/audit.log and for each entry, I ran the following:
echo "<grepped-output>" | audit2allow -a -M <filename>
This creates a SELinux policy package that should allow the delegated call through. The package is then loaded using the following:
semodule -i <filename>.pp
I had to do this 5 times, as there were different causes of denial (unix_read unix_write, associate, read write). I'll look to combine the modules into one.
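For reference, a sketch of generating one combined module instead (the module name is a placeholder):
grep AVC /var/log/audit/audit.log | grep snmp | audit2allow -M snmpd_lsi_pass   # module name is illustrative
semodule -i snmpd_lsi_pass.pp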
Now when I run snmpget I get the correct delegated output:
SNMPv2-SMI::enterprises.3582.5.1.4.2.1.2.1.32.1 = INTEGER: 34

How can I show progress for a long-running Ansible task?

I have some Ansible tasks that perform unfortunately long operations, such as running a synchronization operation with an S3 folder. It's not always clear whether they're progressing or just stuck (or whether the ssh connection has died), so it would be nice to have some sort of progress output displayed. If the command's stdout/stderr were displayed directly, I'd see that, but Ansible captures the output.
Piping output back is a difficult problem for Ansible to solve in its current form. But are there any Ansible tricks I can use to provide some sort of indication that things are still moving?
Current ticket is https://github.com/ansible/ansible/issues/4870
I came across this problem today on OSX, where I was running a docker shell command which took a long time to build, with no output while it built. It was very frustrating not to know whether the command had hung or was just progressing slowly.
I decided to pipe the output (and error) of the shell command to a port, which could then be listened to via netcat in a separate terminal.
myplaybook.yml
- name: run some long-running task and pipe to a port
  shell: myLongRunningApp > /dev/tcp/localhost/4000 2>&1
And in a separate terminal window:
$ nc -lk 4000
Output from my
long
running
app will appear here
Note that I pipe the error output to the same port; I could as easily pipe to a different port.
Also, I ended up setting a variable called nc_port, which allows changing the port in case that port is in use. The Ansible task then looks like:
shell: myLongRunningApp > /dev/tcp/localhost/{{nc_port}} 2>&1
Note that the command myLongRunningApp is being executed on localhost (i.e. that's the host set in the inventory), which is why I listen on localhost with nc.
Ansible has since implemented the following:
---
# Requires ansible 1.8+
- name: 'YUM - async task'
  yum:
    name: docker-io
    state: installed
  async: 1000
  poll: 0
  register: yum_sleeper

- name: 'YUM - check on async task'
  async_status:
    jid: "{{ yum_sleeper.ansible_job_id }}"
  register: job_result
  until: job_result.finished
  retries: 30
For further information, see the official documentation on the topic (make sure you're selecting your version of Ansible).
There are a couple of things you can do, but as you have rightly pointed out, Ansible in its current form doesn't really offer a good solution.
Official-ish solutions:
One idea is to mark the task as async and poll it. Obviously this is only suitable if it is capable of running in such a manner without causing failure elsewhere in your playbook. The async docs are here and here's an example lifted from them:
- hosts: all
  remote_user: root
  tasks:
    - name: simulate long running op (15 sec), wait for up to 45 sec, poll every 5 sec
      command: /bin/sleep 15
      async: 45
      poll: 5
This can at least give you a 'ping' to know that the task isn't hanging.
The only other officially endorsed method would be Ansible Tower, which has progress bars for tasks but isn't free.
Hacky-ish solutions:
Beyond the above, you're pretty much going to have to roll your own. Your specific example of syncing an S3 bucket could be monitored fairly easily with a script periodically calling the AWS CLI and counting the number of items in a bucket, but that's hardly a good, generic solution.
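For example, a rough sketch of such a watcher (the bucket name is a placeholder):
# poll the object count every 30 seconds in a separate terminal
watch -n 30 'aws s3 ls s3://my-bucket --recursive | wc -l'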
The only thing I could imagine being somewhat effective would be watching the incoming ssh session from one of your nodes.
To do that you could configure the ansible user on that machine to connect via screen and actively watch the session. Alternatively, you could use the log_output option in the sudoers entry for that user, allowing you to tail the log file. Details of log_output can be found on the sudoers man page.
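A minimal sketch of the sudoers approach (the user name is an assumption):
# in /etc/sudoers (edit via visudo): record I/O for commands run by the ansible user
Defaults:ansible log_output
# session logs typically land under /var/log/sudo-io and can be replayed with sudoreplay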
If you're on Linux you may use systemd-run to create a transient unit and inspect the output with journalctl, like:
sudo systemd-run --unit foo \
  bash -c 'for i in {0..10}; do
    echo "$((i * 10))%"; sleep 1;
  done;
  echo "Complete"'
And in another session
sudo journalctl -xf --unit foo
It would output something like:
Apr 07 02:10:34 localhost.localdomain systemd[1]: Started /bin/bash -c for i in {0..10}; do echo "$((i * 10))%"; sleep 1; done; echo "Complete".
-- Subject: Unit foo.service has finished start-up
-- Defined-By: systemd
-- Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit foo.service has finished starting up.
--
-- The start-up result is done.
Apr 07 02:10:34 localhost.localdomain bash[10083]: 0%
Apr 07 02:10:35 localhost.localdomain bash[10083]: 10%
Apr 07 02:10:36 localhost.localdomain bash[10083]: 20%
Apr 07 02:10:37 localhost.localdomain bash[10083]: 30%
Apr 07 02:10:38 localhost.localdomain bash[10083]: 40%
Apr 07 02:10:39 localhost.localdomain bash[10083]: 50%
Apr 07 02:10:40 localhost.localdomain bash[10083]: 60%
Apr 07 02:10:41 localhost.localdomain bash[10083]: 70%
Apr 07 02:10:42 localhost.localdomain bash[10083]: 80%
Apr 07 02:10:43 localhost.localdomain bash[10083]: 90%
Apr 07 02:10:44 localhost.localdomain bash[10083]: 100%
Apr 07 02:10:45 localhost.localdomain bash[10083]: Complete
