Makefile: store warning count into variable without using temp file - bash

I would like to improve an existing Makefile, so it prints out the number of warnings and/or errors that were encountered during the build process.
My basic idea is that there must be a way to pipe the output to grep and have the number of occurrences of a certain string (e.g. "Warning:") in either the stderr or stdout stream stored into a variable that can then simply be echoed at the end of the make command.
Requirements / Challenges:
Current console output and exit code must remain exactly the same
That also means not even changing control characters: devs using the Makefile must not notice any difference from what the output was prior to my change (except for a nice, additional warning count output at the end of the make process). Any approaches with tee I tried so far were not successful, as the color coding of stderr messages in the console is lost, turning them all black & white.
Must be system-independent
The project is currently being built by Win/OSX/Linux devs and thus needs to work with standard tools available out-of-the-box in most *nix / Cygwin shells. Introducing another dependency such as unbuffer is not an option.
It must be stable and free of side-effects (see also 5.)
If the make process is interrupted (e.g. by the user pressing CTRL+C, or for any other reason), there should be no side-effects (such as an orphaned log file of the output left behind on disk)
(Bonus) It should be efficient
The amount of output may reach 1 MB and more, so just piping to a file and grepping it will be a small performance hit, and there will also be additional disk I/O (thus unnecessarily slowing down the build). I'm simply wondering if this can't be done without a temp file, as I understand pipes as a sort of "stream" that just needs to be analysed as it flows through.
(Bonus) Make it locale-independent w/o changing the current output
Depending on the current locale, the string to grep and count is localized differently, e.g. "Warning:" (en_US.utf8) or "Warnung:" (de_DE.utf8). Surely I could switch the locale to en_US in the Makefile, but that would change the console output for users (hence breaking requirement 1.), so I'd like to know if there's any (efficient) approach you can think of for this.
At the end of the day, I could make do with a solid solution that just fulfills requirements 1. and 2.
If 3. to 5. cannot be done, then I'd have to convince the project maintainers to accept some changes to .gitignore, a build process that takes slightly more time and resources, and/or make output fixed to English only, but I assume they will agree that it would be worth it.
Current solution
The best I have so far is:
script -eqc "make" make.log && WARNING_COUNT=$(grep -i --count "Warning:" make.log) && rm make.log || rm make.log
That fulfills requirements 1 and 2, and almost no. 3: still, if the machine has a power outage while running the command, make.log will remain as an unwanted artifact. Also, the repetition of rm make.log looks ugly.
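For example, wrapping this in a small script with a trap would at least get rid of the duplicated rm and clean up on CTRL+C (a rough, untested sketch; it would still leave a file behind on a hard power loss):
LOG=$(mktemp make.XXXXXX) || exit 1
trap 'rm -f "$LOG"' EXIT            # remove the log on any exit
trap 'exit 130' INT TERM            # make CTRL+C go through the EXIT trap
script -eqc "make" "$LOG"
STATUS=$?
WARNING_COUNT=$(grep -i --count "Warning:" "$LOG")
echo "Warnings: $WARNING_COUNT"
exit $STATUS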
So I'm open to alternative approaches and improvements from anybody. Thanks in advance.

Related

Whether to redirect stderr to stdout OR redirect both to the same file?

Which is better?
cmd >>file 2>&1
cmd 1>>file 2>>file
Is there even a difference?
I know two reasons to choose the first one: it also works with > instead of >>, and it is more popular, so someone who knows shell scripts would expect it right away.
But I still feel the second one is more readable, and it works without having to know the [n]>&[n] syntax, which IMHO is kinda confusing.
What is the difference?
Let's examine what each of these commands means. I will assume that the POSIX shell specification applies since the question doesn't ask about anything more specific.
The first command is cmd >>file 2>&1. This runs cmd after setting up the specified redirections.
The redirection >>file opens the named file with O_APPEND. As explained in the specification of open, this creates a new Open File Description, which notably contains the current file offset, and arranges for File Descriptor 1 to refer to that description. The meaning of O_APPEND is "the file offset shall be set to the end of the file prior to each write".
The redirection 2>&1 says that file descriptor 2 "shall be made to be a copy" of file descriptor 1. That specification is a little vague, but I think the only sensible interpretation (and what shells actually do) is it means to call dup2(1, 2), which "shall cause the file descriptor [2] to refer to the same open file description as the file descriptor [1]". Crucially, we get another file descriptor, but continue to use the same file description, meaning they both have the same file offset.
The second command is cmd 1>>file 2>>file. Based on the specifications cited above, this creates two separate file descriptions for file, each with their own offset.
Now, if the only thing that cmd does to file descriptors 1 and 2 is to call write, then these two situations are equivalent, because every call to write will atomically update the offset to point to the end of the file before performing the write, and therefore the existence of two separate offsets in the second command will not have any observable effect.
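A quick way to convince yourself of that (a throwaway test, not part of the specification argument):
f=$(mktemp) || exit 1
{ echo "to stdout"; echo "to stderr" >&2; } >>"$f" 2>&1
{ echo "to stdout"; echo "to stderr" >&2; } 1>>"$f" 2>>"$f"
cat "$f"      # all four lines appear intact; every write appended to the end
rm -f "$f"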
However, if cmd performs some other operation, for example lseek, then the two cases are not equivalent because that will reveal that the first command has one shared offset while the second command has two independent offsets.
Additionally, the above assumes the POSIX-specified semantics of O_APPEND. But real computer systems do not always implement that; for example, NFS does not have atomic append. Without atomic append, the second command may behave differently (most likely corrupting the output) even when only write is performed.
Which is better?
As the two commands do not mean the same thing, which is better presumably depends on which meaning is closer to what you intend. I speculate that, in almost all cases, the intent is to append to file both the standard output and standard error from cmd, which is presumed to only write to these descriptors. That is precisely the meaning of the first command (cmd >>file 2>&1), and hence is the better choice.
While the second command does use fewer shell features, and hence might be easier to understand for some people, it would probably seem odd to those who do have greater familiarity with redirection syntax, and might even behave differently than intended in some circumstances. Therefore I would advise against it, and if I found it in some code I was maintaining, would be inclined to change it to the first form.
Of course, if you truly want separate file descriptions, and hence separate file offsets, then the second command makes sense, so long as you put a comment nearby explaining the rationale for the unusual construction.

Monitor A File For Additions And Get Last Added Line

I'm having trouble monitoring a file for changes. I need to be able to know when a file changes, and when it does, I need the new line that was added. I intend to parse each line and find ones that match certain criteria, and act on information in those lines. I know the expected number of matching lines ahead of time, but I do not know how many lines in total will be added to the file, or where the matching lines will be.
I've tried 2 packages so far, to no avail.
fsnotify/fsnotify
As far as I can tell, fsnotify can only tell me when a file is modified, not what the details of the modification were. Since I need to know what exactly was added to the file, this is no good for me.
(As a side-question, can this be run in a loop? The example that I tried exited after just one modification. I need to monitor for multiple modifications.)
hpcloud/tail
This package tries to mimic the Unix tail command, but it seems to have its own issues. The output that I get includes timestamps and other data - I just want the added line, nothing else. Also, it seems to think a file has been modified multiple times, even when it's just one edit. Further, the deal breaker here is that it does not output the last line if the line was not followed by a newline character.
Delegating to tail
I came across this answer, which suggests to delegate this work to the tail command itself, but I need this to work cross-platform (specifically, macOS, Linux and Windows). I don't believe that an equivalent command exists on Windows.
How do I go about tackling this?
@user2515526,
Usually the changed diff is out of scope for file watchers' functionality because, you know, you could change an image, and a watcher would need to keep track of several MB of diff in memory; and what if we have thousands of files?
However, as bad as it sounds, this may be exactly the way you want to implement it (sure, it depends on your app, etc. - it could be fine for text files), i.e. keeping a map of diffs (one diff per file) since the last modification. I can't say I like it, but it sounds like fsnotify has no support for the changes/diffs that you need.
Also, regarding your question about running in a loop, maybe you can get some hints here: https://github.com/kataras/iris/blob/8370d76910cdd8de043753ed81ae080eae8dc798/utils/file.go
It's a framework for building a server that watches for TypeScript file changes, so it sounds similar to your case/question.
Cheers,
-D

Can GNU makefile rules have processes as requirements, and if so, how?

At some step of my software build automation, which I am attempting to implement using GNU make Makefiles, I run into the case where a target's prerequisites are not only source files; as a sort of different type of requirement, I would like the target to depend on another piece of software being started and hence existing as an operating system process.
Such a program could be a background process, but also a foreground process such as a web browser running an HTML5 application, which might play a role in the build process by, for instance, interacting with files it is fed through the build process.
I would hence like to write a rule somewhat like this:
.PHONY: firefoxprocess
Html5DataResultFile: HTML5DataSourceFile firefoxprocess
    cp HTML5DataSourceFile folder/checked/by/html5app/
    waitforHtml5DataResultFile
firefoxprocess:
    /usr/bin/firefox file://url/to/html5app &
As you can see, I have taken up the idea that .PHONY targets are somewhat non-file targets and hence would allow requiring that a process be started.
Yet I am unsure whether that is right. The documentation of GNU make is excellent and quite large, and I am unsure whether I have understood it completely. To the best of my knowledge, the documentation does not really cover the use of processes in rules, which motivates the question here.
My feeling has been that pidfiles are somewhat of a link between processes and files, but they come with several problems (e.g. race conditions, uniqueness, etc.).
Sometimes a Makefile dependency tree includes elements that aren't naturally or necessarily time-dependent files. There are two answers:
create a file to represent the step, or
just do the work "in line" as part of the step.
The second option is usually easiest. For instance, if a target file is to be created in a directory that might not exist yet, you don't want to make the directory name itself a dependency, because that would cause the file to be out of date whenever the directory changed. Instead, I do:
d/foo:
    @test -d d || mkdir -p d
    ...
In your case, you could do something similar; you just need a way to test for a running instance of firefox, and to be able to start it. Something like this might do:
Html5DataResultFile: HTML5DataSourceFile
    pgrep firefox || { /usr/bin/firefox & sleep 5; }
    cp HTML5DataSourceFile folder/checked/by/html5app/
    waitforHtml5DataResultFile
The sleep call just lets FF initialize, because it might not be ready to do anything the instant it returns.
The problem with option #1 in your case is that it's undependable and a little circular. Firefox won't reliably remove the pidfile if the process dies. If it does successfully remove the file when it exits, and re-creates it when it restarts, you have a new problem: the timestamp on the file spuriously defines any dependencies as out of date, when in fact the restarted process hasn't invalidated them.

How to display command output in a whiptail textbox

The whiptail command has an option --textbox that has the following description:
--textbox <file> <height> <width>
The first option requires a file as input; I would like to use the output of a command in its place. It seems like this should be possible in sh or bash. For the sake of the question, let's say I'd like to view the output of ls -l in a whiptail textbox.
Note that process substitution does not appear to work in whiptail (e.g. whiptail --textbox <(ls -l) 40 80 does not work).
This question is a re-asking of this other stackoverflow question, which technically was answered.
For the record, the question says that
whiptail --textbox <(ls -l) 40 80
"doesn't work". It's most certainly worth stating clearly that the nature of the failure is that whiptail displays an empty textbox. (That's mentioned in a comment to an answer to the original question, linked in this question, but that's a pretty obscure place to look for a problem report.)
In 2014, this workaround was available (and was the original contents of this answer):
whiptail --textbox /dev/stdin 40 80 <<<"$(ls -l)"
That will still work in 2022, if ls -l produces enough output (at least 64k on a standard Linux/Bash install).
Another possible workaround is to use a msgbox instead of a textbox:
whiptail --msgbox "$(ls -l)" 40 80
However, that will fail if the output from the command is too large to use as a command-line argument, which might be the case at 128k.
So if you can guess reasonably accurately how big the output will be, one of those solutions will work. Up to around 100k, you can use the msgbox solution; beyond that, you can use the textbox with a here-string.
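Something along these lines, for example (a sketch; the 100k threshold is the rough figure mentioned above, not a hard limit, and ls -lR /usr is just a stand-in for your command):
out=$(ls -lR /usr)                               # whatever command you want to display
if [ "${#out}" -lt 100000 ]; then
    whiptail --msgbox "$out" 40 80               # small enough to pass as an argument
else
    whiptail --textbox /dev/stdin 40 80 <<< "$out"   # big enough that the here-string becomes a temp file
fi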
But that's far from ideal, since it's really hard to reliably guess the size of the output of a command, even within a factor of two.
What will always work is to save the output of the command to a temporary file, then provide the file to whiptail, and then delete the file. (In fact, you can delete the file immediately, since Posix systems don't delete files until there are no open file handles.) But no matter how hard you try, you will occasionally end up with a file which should have been deleted. On Linux, your best bet is to create temporary files in the /tmp directory, which is an in-memory filesystem which is emptied automatically on reboot.
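A minimal sketch of that temp-file approach (the template name is just an example; the trap is best-effort cleanup, not a guarantee):
tmpfile=$(mktemp /tmp/whiptail.XXXXXX) || exit 1
trap 'rm -f "$tmpfile"' EXIT          # best-effort cleanup when the script exits
ls -l > "$tmpfile"
whiptail --textbox "$tmpfile" 40 80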
Why does this happen?
I came up with the solution above, eight years prior to this edit, on the assumption that OP was correct in their guess that the problem had to do with not being able to seek() a process substituted fd. Indeed, it's true that you can't seek() /dev/fd/63. It was also true at the time that bash implemented here-strings and here-docs by creating a temporary file to hold the expanded text, and then redirecting standard input (or whatever fd was specified) to that temporary file. So the suggested workaround did work; it ensured that /dev/stdin was just a regular file.
But in 2022, the same question was asked, this time because the suggested workaround failed. As it turns out, the reason is that Bash v5.1, which was released in late 2020, attempts to optimise small here-strings and here-docs:
c. Here documents and here strings now use pipes for the expanded document if it's smaller than the pipe buffer size, reverting to temporary files if it's larger.
(from the Bash CHANGES file; changes between bash 5.1alpha and bash 5.0, in section 3, New features in Bash.)
So with the new Bash version, unless the here-string is bigger than a pipe buffer (on Linux, 16 pages), it will no longer be a regular file.
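You can check this for yourself (a quick sanity test assuming bash >= 5.1 on Linux with GNU coreutils; exact behaviour may differ elsewhere):
$ stat -Lc %F /dev/stdin <<< "small here-string"
fifo
$ stat -Lc %F /dev/stdin <<< "$(seq 100000)"
regular file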
One slightly confusing aspect of this issue is that whiptail does not, in fact, try to call lseek() on the textbox file. So the initial guess about the nature of the problem was not exact. That's not all that surprising, since using lseek() on a FIFO to position the stream at SEEK_END produces an explicit error, and it's reasonable to expect software to actually report error returns. But whiptail does not attempt to get the filesize by seeking to the end. Instead, it fstat()s the file and gets the file size from the st_size field in the returned struct stat. It then allocates exactly enough memory to hold the contents of the file, and reads the indicated number of bytes.
Of course, fstat cannot report the size of a FIFO, since that's not known until the FIFO is completely drained. But unlike lseek, that's not considered an error. fstat is documented as not filling in the st_size field on FIFOs, sockets, character devices, and other stream types. As it happens, on Linux the st_size field is filled in as 0 for such file descriptors, but Posix actually allows it to be unset. In any case, there is no error indication, and it's essentially impossible to distinguish between a stream which doesn't have a known size and a stream which is known to have size 0, like an empty file. Thus, whiptail treats a FIFO as though it were an empty file, reading 0 bytes and presenting an empty textbox.
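On Linux you can observe exactly that with a process substitution (a quick check with GNU stat; the fd number behind <(...) will vary):
$ stat -Lc 'type=%F, size=%s' <(ls -l)
type=fifo, size=0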
What about dialog?
One alternative to whiptail is Dialog, currently maintained by Thomas Dickey. You can often directly substitute dialog for whiptail, and it has some additional widgets which can be useful. However, it does not provide a simple solution in this case.
Unlike whiptail, dialog's textbox attempts to avoid reading the entire file into memory before drawing the widget. As a result, it does depend on lseek() in order to read out of order, and thus cannot work on special files at all. Attempting to use a FIFO with dialog produces an error message, rather than drawing an empty textbox; that makes diagnosis easier but doesn't really solve the underlying problem. Dialog does have a variety of widgets which can read from stdin, but as far as I know none of them allow scrolling, so they're only useful if the command output fits in a single window. But it's possible that I've missed something.
Drawing out a moral: just read until you reach the end
(Skip this section if you're only interested in using command-line utilities, not writing them.)
The tragic part of this complicated tale is that it was all completely unnecessary. Whiptail is going to read the entire file in any case; trying to get the size of the file before reading was either laziness or a misguided attempt to optimise. Had the code just read until an end of file indication, possibly allocating more memory as needed, all these problems would have been avoided. And not just these problems. As it happens, Posix does not guarantee that the st_size field is accurate even for apparently normal files. Stat is only required to report accurate sizes for symlinks (the length of the link itself) and shared memory objects. The Linux documentation indicates that st_size will be returned as 0 for certain automatically-generated files:
For example, the value 0 is returned for many files under the /proc directory, while various files under /sys report a size of 4096 bytes, even though the file content is smaller. For such files, one should simply try to read as many bytes as possible (and append '\0' to the returned buffer if it is to be interpreted as a string). (from man 7 inode).
lseek will also fail on many autogenerated files, as well as sockets, FIFOs and character devices. So the more common idiom for this particular optimization is also unreliable, as well as leading to TOCTOU-like race conditions when the file is truncated or appended to while it is being read.

command line wisdom for 2 panel file manager user

I want to upgrade my file management productivity by replacing a 2-panel file manager with the command line (bash or Cygwin). Can the command line give the same speed? Please advise on a guru way of how to do, e.g., a copy of some file in directory A to directory B. Is it heavy use of pushd/popd? Or creation of links to the most often used directories? What are the best practices and a day-to-day routine to manage files like a command line master?
Can the command line give the same speed?
My experience is that command-line copying is significantly faster (especially in the Windows environment). Of course the basic laws of physics still apply: a file that is 1000 times bigger than a file that copies in 1 second will still take 1000 seconds to copy.
...(how to) copy some file in directory A to directory B.
Because I often have 5-10 projects that use similar directory structures, I set up variables for each subdir using a naming convention:
project=NewMatch
NM_scripts=${project}/scripts
NM_data=${project}/data
NM_logs=${project}/logs
NM_cfg=${project}/cfg
proj2=AlternateMatch
altM_scripts=${proj2}/scripts
altM_data=${proj2}/data
altM_logs=${proj2}/logs
altM_cfg=${proj2}/cfg
You can make this sort of thing as spartan or baroque as needed to match your theory of living/programming.
Then you can easily copy the cfg files from one project to another:
cp -p $NM_cfg/*.cfg ${altM_cfg}
Is it heavy use of pushd/popd?
Some people seem to really like that. You can try it and see what you think.
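A tiny illustration (the directory names are just examples):
pushd ~/projects/NewMatch/cfg          # jump there; the old directory is remembered on a stack
cp -p *.cfg ~/projects/AlternateMatch/cfg
popd                                   # and jump straight back to where you started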
Or creation of links to the most often used directories?
Links to dirs are, in my experience, used more for software development, where source code expects a certain set of dir names and your installation has different names; making links to supply the expected dir paths is helpful there. For production data, it's just one more thing that can get messed up or blow up. That's not always true; maybe you'll have a really good reason to have links, but I wouldn't start out that way just because it is possible to do.
What are the best practices and a day-to-day routine to manage files like a command line master?
(Per the above, use a standardized directory structure for all projects. Have scripts save any small files to a directory your dept keeps in the /tmp dir, e.g. /tmp/MyDeptsTmpFile (named to fit your local conventions).)
It depends. If you're talking about data and logfiles, dated filenames can save you a lot of time. I recommend date formats like YYYYMMDD (or YYYYMMDD_HHMMSS if you need the extra resolution).
Dated logfiles are very handy: when a current process seems to be taking a long time, you can look at the log file from a week, a month, or six months ago (up to however much space you can afford) and quantify exactly how long this process took. Logfiles should also capture all STDERR messages, so you never have to re-run a bombed program just to see what the error message was.
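For example (a sketch; the script name and log path are placeholders for your own jobs and conventions):
logfile=logs/nightly_build_$(date +%Y%m%d_%H%M%S).log
./run_nightly_build.sh > "$logfile" 2>&1     # dated log, STDOUT and STDERR both captured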
This is Linux/Unix you're using, right? Read the man page for the cp cmd installed on your machine. I recommend using an alias like alias CP='/bin/cp -pi' so you always copy a file with the same permissions and with the original file's time stamp. Then it is easy to use /bin/ls -ltr to see a sorted list of files, with the most recent files showing up at the bottom of the list (no need to scroll back to the top when you sort by time, reversed). Also, the '-i' option will warn you that you are about to overwrite a file, and this has saved me more than a couple of times.
I hope this helps.
P.S. As you appear to be a new user, if you get an answer that helps you, please remember to mark it as accepted and/or give it a + (or -) as a useful answer.
