Pattern to detect Ansible failure - ansible

I have a fairly large Ansible playbook with a lot of templates, and it generates tons of logs (hundreds of thousands of lines in my log file).
Whenever a task fails, I can spot it with failed=
My problem is finding where the error actually occurred. As of today, all I'm doing is scrolling through the log and praying for my eyes to find the error, but with that many lines it takes a long time and is very frustrating.
Is there any pattern I should look for to find where the error is?
Thanks in advance for your input.

By default, Ansible stops after the first failed task...
https://docs.ansible.com/ansible/latest/user_guide/playbooks_error_handling.html
Ansible normally has defaults that make sure to check the return codes
of commands and modules and it fails fast – forcing an error to be
dealt with unless you decide otherwise.
If your playbook handles a lot of targets and you want to stop everything at the first failure on any target, you can use the any_errors_fatal: true play option.
https://docs.ansible.com/ansible/latest/user_guide/playbooks_error_handling.html#aborting-the-play
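As for a pattern to search for: a rough sketch, assuming the log was produced by Ansible's default output callback, where a failed task prints a line containing "fatal: [host]" (or "failed: [host]" for a loop item) and each task starts with a "TASK [name]" header. The function name find_failures and the sample lines are made up for illustration. Other callbacks format output differently, so adjust the patterns to what your log actually contains.

```python
import re

# Assumed markers from the default callback; search() is used instead of
# match() so a timestamp prefix added by log_path does not get in the way.
TASK_RE = re.compile(r"(?:TASK|RUNNING HANDLER) \[(?P<name>.*?)\]")
FAIL_RE = re.compile(r"\b(?:fatal|failed): \[(?P<host>[^\]]+)\]")


def find_failures(lines):
    """Yield (line_number, task_name, host, text) for each failure line."""
    current_task = None
    for number, line in enumerate(lines, start=1):
        task = TASK_RE.search(line)
        if task:
            current_task = task.group("name")
            continue
        failure = FAIL_RE.search(line)
        if failure:
            yield number, current_task, failure.group("host"), line.rstrip("\n")


if __name__ == "__main__":
    # Made-up sample lines, only to show the shape of the result.
    sample = [
        "TASK [render template] ****",
        "ok: [web1]",
        "TASK [restart service] ****",
        'fatal: [web2]: FAILED! => {"changed": false, "msg": "boom"}',
    ]
    for number, task, host, text in find_failures(sample):
        print(f"line {number}: task '{task}' failed on {host}")
```

To scan a real log, pass an open file object instead of the sample list, for example find_failures(open("ansible.log")). The reported line number tells you where to jump in the file.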

Related

systemd-udev rule applied multiple times (twice in my case)

I have a udev rule with the following content:
DRIVERS=="adt7310", RUN+="/bin/ln -s /sys//devices/platform/soc/fff00000.spi/spi_master/spi0/spi0.0/temp1_input /dev/temperature_adt"
The problem is that this rule is applied twice, and an annoying line appears in the log:
localhost systemd-udevd[1104]: Process '/bin/ln -s /sys//devices/platform/soc/fff00000.spi/spi_master/spi0/spi0.0/temp1_input /dev/temperature_adt' failed with exit code 1.
I have seen a lot of similar issues on the internet, and many of them are still unresolved. But most of them were about PCs and quite complicated rules.
Here it is an embedded system. The link is created and nothing goes wrong, but I simply don't know what to tell the QA people...
Thanks
Please be very careful when writing rules, and especially when you are adopting someone else's rules. Always run
udevadm info -a /sys/...device
and read very carefully the information.
In my case the solution is
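DRIVER=="adt7310" instead of DRIVERS=="adt7310"
DRIVER matches only the driver of the event device itself, while DRIVERS also searches up the chain of parent devices, so a DRIVERS rule can fire again for a child device.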
My apologies

Monitor A File For Additions And Get Last Added Line

I'm having trouble monitoring a file for changes. I need to be able to know when a file changes, and when it does, I need the new line that was added. I intend to parse each line and find ones that match certain criteria, and act on information in those lines. I know the expected number of matching lines ahead of time, but I do not know how many lines in total will be added to the file, or where the matching lines will be.
I've tried 2 packages so far, to no avail.
fsnotify/fsnotify
As far as I can tell, fsnotify can only tell me when a file is modified, not what the details of the modification were. Since I need to know what exactly was added to the file, this is no good for me.
(As a side-question, can this be run in a loop? The example that I tried exited after just one modification. I need to monitor for multiple modifications.)
hpcloud/tail
This package tries to mimic the Unix tail command, but it seems to have its own issues. The output that I get includes timestamps and other data - I just want the added line, nothing else. Also, it seems to think a file has been modified multiple times, even when it's just one edit. Further, the deal breaker here is that it does not output the last line if the line was not followed by a newline character.
Delegating to tail
I came across this answer, which suggests to delegate this work to the tail command itself, but I need this to work cross-platform (specifically, macOS, Linux and Windows). I don't believe that an equivalent command exists on Windows.
How do I go about tackling this?
@user2515526,
Usually the changed content (a diff) is out of scope for a file watcher, because the change could be to an image, for example, and the watcher would need to keep track of several MB of diff in memory. And what if there are thousands of files?
However, as bad as it sounds, this may be exactly what you need to implement (it depends on your app, of course, and could be fine for text files): keep a map of diffs (one diff per file) since the last modification. I can't say I like it, but it sounds like fsnotify has no support for the changes/diffs that you need.
Also, regarding your question about running in a loop, maybe you can get some hints here: https://github.com/kataras/iris/blob/8370d76910cdd8de043753ed81ae080eae8dc798/utils/file.go
It's a framework that lets you build a server that watches for TypeScript file changes, so it sounds similar to your case.
Cheers,
-D
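For an append-only text file, the per-file state can be as small as a byte offset instead of a full diff. Here is a minimal sketch of that idea in Python rather than Go, using plain polling instead of a watcher package. The names read_new_lines and follow are made up for illustration, and it assumes UTF-8 text that is only ever appended to (a truncated file is simply re-read from the start).

```python
import os
import time


def read_new_lines(path, offset):
    """Return (lines, new_offset) for data appended to path since offset."""
    if os.path.getsize(path) < offset:
        offset = 0  # file shrank: it was truncated or replaced, start over
    with open(path, "rb") as handle:  # binary mode so offsets are byte counts
        handle.seek(offset)
        data = handle.read()
    # A final line without a trailing newline is returned too. If more text
    # is later appended to that same line, it shows up as a separate item.
    lines = data.decode("utf-8", errors="replace").splitlines()
    return lines, offset + len(data)


def follow(path, handle_line, interval=1.0):
    """Poll path forever, passing each newly appended line to handle_line."""
    offset = os.path.getsize(path)  # start at the current end of the file
    while True:
        lines, offset = read_new_lines(path, offset)
        for line in lines:
            handle_line(line)
        time.sleep(interval)
```

Calling follow("app.log", print) would print each added line and nothing else. The same read_new_lines step could also be triggered from a watcher's "modified" event instead of a timer, which also answers the loop question: keep the offset between events and call it again each time.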

Windows 7 task scheduler keeps returning operational code 2

I set up a scheduled task to run under my account. Every time it runs, even if it is successful, it returns an operational code of (2). I looked up this error code at the link below, and it says that the specified file cannot be found.
http://www.hiteksoftware.com/knowledge/articles/049.htm
Even if I do something very simple, I get back operational code of (2). For example:
run program: cmd.exe
start in path: c:\windows\system32
I start the task and I see the process running in my task manager, so I kill the task. I then check in the history of scheduled task and it shows up as (2).
Something more realistic of what I am doing:
<?php
/* file in c:\php\test.php */
echo "hello";
?>
run program: php.exe
start in path: c:\php
arguments: -f test.php
Everything works in the command line, but Windows schedule task keeps returning operational code (2). I should be seeing an operational code of (0), which means successful, correct?
You may not have put a path in the "Start In (Optional)" box of the Edit Action dialog box.
Even though you had a path on the program that was being executed, Windows 7 still wants you to tell it where to run the program.
TL;DR: Don't worry about it. This just means the task finished, but tells you nothing about whether it was successful or how it failed. Look at the "Last Run Result" for that information.
The question and the top answer confuse the "return code", which shows up in Task Scheduler as the "Last Run Result", with the "OpCode"/"Operational Code" that shows up in the history of a task.
If I create a simple Python program that does nothing more than sys.exit(7), and run it via task scheduler, I get a Last Run Result of 0x7, and an opcode of 2. If I have it do nothing, or sys.exit(0), I get a Last Run Result of "The operation completed successfully (0x0)" and still an opcode of 2. In other words, the return code from the executed program determines the Last Run Result. The OpCode appears to be a constant 2. This also establishes that the opcode 2 is not related to the return code 2 that likely means the file's not found. We know the file was found as it executed, and returned different Last Run Results depending on the code contained.
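The test program from that experiment is tiny. As a sketch, the following runs the same kind of child process and prints the return code the parent sees, with Python's subprocess module standing in for Task Scheduler. The helper name exit_code_of is made up, and this only demonstrates the return code side; it does not reproduce the opcode.

```python
import subprocess
import sys


def exit_code_of(code):
    """Run a child Python that exits with `code`; return what the parent sees."""
    child = subprocess.run(
        [sys.executable, "-c", f"import sys; sys.exit({code})"]
    )
    return child.returncode


if __name__ == "__main__":
    for code in (0, 7):
        print(f"sys.exit({code}) -> return code {exit_code_of(code)}")
```

This is the value that ends up in "Last Run Result" when the same program is started by a scheduled task.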
Further, a Windows forum post points out that this history view is really coming out of the event log. Sure enough, I can find the same events in the event log (always with a value of 2). This means the definition of the OpCode is going to be the same as the definition used for events, and is less of a task scheduler concept than a Windows event concept.
What is an opcode for an event? I've struggled to get a clear answer, but as best I can tell, it appears it's ultimately controlled by the program writing to the event log. There's documentation around for defining opcodes in your program. In this case, the thing writing to the event log would be Task Scheduler itself or something else in Windows.
A final observation: If I go to the event viewer and look for Log: Microsoft-Windows-TaskScheduler/Operational, Source: Microsoft-Windows-TaskScheduler and Event ID: 102,201, add the column for Operational Code, and sort, I see it is always a 2. And events 100 and 200 are always a 1. This applies not just to my manual experiments, but also includes every other random program that's using scheduled tasks, e.g. Dropbox and Google updaters that are working as far as I know.
Put all this together and I would strongly bet that the events generated while starting up a scheduled task are hardcoded by Windows to use an opcode of 1 when writing to the event log, and the events generated while finishing a task (successful or not - which goes in the Last Run Result) are hardcoded by Windows to use an opcode of 2 when writing to the event log. This opcode appears to be a red herring that doesn't affect anything we need to worry about beyond curiosity.
I was striking out until I just deleted & re-created the scheduled task...now it works. Don't know why but there it is.
Okay I know I am late to the party here, but I think a lot of the problem stems from confusing the Operational Code with a Return Code. I'm not an expert in Windows programming or internals (I make a living using a Windows system to program, but my programming isn't for Windows systems).
If I understand correctly:
The Operational Code is set by whatever routine is being run, to whatever value the programmer decided on.
The Return Code is indicative of success or failure.
Consider the following (edited) example from the history of one of my scheduled tasks:
Event 201, Task Category "Action completed" shows an Operational Code of (2).
Down below under the General tab, is the message:
Task Scheduler successfully completed task "\My_task" , instance "{xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx}" , action "C:.....\blahblah.exe" with return code 0.
There's the indication of success. A different return code would indicate a failure. The Operational Code of (2) merely indicates that the routine was finished (in this case) when reported. I don't believe there are any set values to be interpreted for the Operational Code.
I've been having a similar issue and found that, in addition to what was suggested in both the accepted answer and its comments, I had to do one other thing. I had to re-create the task and set its "configure for" to Windows Server 2003, Windows XP, or Windows 2000. I don't understand why, since it's not for any of those OSes, but after I did so my task actually worked.
If this runs and works, yet you still get an error code, try entering exit 0 at the end of your script.
It took me a lot of googling to find that so hopefully this is helpful to someone.
@ojchase is right.
Opcodes are attached to events by the event provider. An opcode defines a numeric value that identifies the activity or a point within an activity that the application was performing when it raised the event.
Opcode 1 means that, when producing the event, the application was in the start of an activity.
Opcode 2 means that, when producing the event, the application was at the end of an activity.
So opcodes have little to do with success or failure.
Sources:
https://learn.microsoft.com/en-us/dotnet/api/system.diagnostics.eventing.reader.standardeventopcode?view=net-5.0
https://learn.microsoft.com/en-us/dotnet/api/system.diagnostics.eventing.reader.eventopcode?view=net-5.0
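To make the distinction concrete, here is a small sketch that reads a history entry the way described above: the opcode only names the phase of the activity, and the return code alone decides success. The three opcode names are the ones the StandardEventOpcode enum uses for values 0 to 2. The function name and the "zero means success" rule are assumptions for illustration.

```python
# Opcode values 0-2 as named in the StandardEventOpcode enum linked above.
STANDARD_OPCODES = {0: "Info", 1: "Start", 2: "Stop"}


def describe_history_entry(opcode, return_code):
    """Describe a task history entry; the outcome depends only on return_code."""
    phase = STANDARD_OPCODES.get(opcode, "provider-defined")
    # Assumption: a return code of 0 means success, anything else a failure.
    outcome = "success" if return_code == 0 else "failure"
    return f"opcode {opcode} ({phase}), return code {return_code} ({outcome})"


if __name__ == "__main__":
    print(describe_history_entry(2, 0))
    print(describe_history_entry(2, 7))
```

Both entries carry opcode 2 because both describe the end of an activity; only the return code differs.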

The bizarre case of the file that both is and isn’t there

In .Net 3.5, I have the following code.
If File.Exists(sFilePath & IndexFileName & ".NX") Then
Kill(sFilePath & IndexFileName & ".NX")
End If
At runtime, on one client's machine, I get the following exception, over and over, when this code executes
Source: Microsoft.VisualBasic
TargetSite: Microsoft.VisualBasic.FileSystem.Kill
Message: No files found matching 'I:\RPG\HGIAPVXD.NX'.
StackTrace:
at Microsoft.VisualBasic.FileSystem.Kill(String PathName)
(More trace that identifies the exact line of code.)
There are two people on different machines running this code, but only one of them is getting the exception. The exception does not happen every time, but it is happening regularly. (Multiple times every hour.) The code is not in a loop, nor does it run continuously, more like once every couple of minutes or so.
On the surface, this looks like a race condition, but given how infrequently this code is run and how often the error is happening I think there must be something else going on.
I would appreciate any suggestions on how I can track down what is really going on here. A solution to keep the error from happening would be even better.
I guess the first question to ask is "IS the file really there or not?" and if so, does it have any special attributes (is it Read-only, Hidden, or System, or is it a Directory)?
Note that Microsoft.VisualBasic.FileSystem.Kill specifically looks for, and silently skips, any file marked "System" or "Hidden". For pretty much any other problem you would have gotten a different exception.
As James pointed out, the Kill function checks whether the file in question is a system or hidden file. You are better off using System.IO.File.Delete() instead:
Try
System.IO.File.Delete(sFilePath & IndexFileName & ".NX")
Catch ex As System.Exception
...
End Try
Using File.Exists is not necessary because File.Delete() checks this by itself.
Is there any chance that the I: drive is a network drive? It could be some network issue, or else maybe a race condition after all.
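The same "just delete and handle the outcome" idea, sketched in Python rather than VB.NET as an analogue only. The helper name delete_if_present is made up. Unlike File.Delete, Python's os.remove raises when the file is missing, so the sketch catches that one case. Either way, there is no gap between a separate existence check and the delete in which the file could disappear.

```python
import os


def delete_if_present(path):
    """Delete path; return True if a file was removed, False if already gone."""
    try:
        os.remove(path)
    except FileNotFoundError:
        # Someone else removed it first, or it never existed: nothing to do.
        return False
    return True
```

Other errors, such as a permission problem or a file locked by another process, still propagate, which is usually what you want when tracking down an intermittent failure.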

Meaning of diagnostic output (like make[N]) from a makefile

When running something like "make install", there is a lot of information displayed in the terminal window. Some lines start with make[1], make[2], make[3] or make[4]. What do these mean? Does this mean there is some kind of error?
When make is invoked recursively, each make distinguishes itself in output messages with a count. The number in brackets is the recursion depth: messages from the top-level make begin with plain "make", and messages beginning "make[3]" come from a make running three levels of recursion below it. It is not indicative of an error of any kind, but is intended to enable you to keep track of what is happening. In particular, you can tell in which directory make is being run to help debug the build if any errors do occur.
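If you want to filter a long build log by level, a sketch like the following splits such lines into the recursion level and the message. It assumes the usual "make[N]: message" layout (level 0 for plain "make: message"); the function name parse_make_line is made up for illustration.

```python
import re

# Matches "make: ..." and "make[N]: ...", also for names such as "gmake".
MAKE_RE = re.compile(r"^\S*make(?:\[(?P<level>\d+)\])?: (?P<message>.*)$")


def parse_make_line(line):
    """Return (level, message) for a make diagnostic line, else None."""
    match = MAKE_RE.match(line.rstrip("\n"))
    if match is None:
        return None  # ordinary command output, not a message from make
    level = int(match.group("level") or 0)  # no bracket means top level
    return level, match.group("message")


if __name__ == "__main__":
    print(parse_make_line("make[2]: Entering directory '/src/lib'"))
    print(parse_make_line("gcc -c foo.c"))
```

Tracking the "Entering directory" and "Leaving directory" messages per level tells you which directory an error further down the log belongs to.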
