GNU make implicit clean rule - makefile

I know there are a lot of implicit rules in make, but is there one for cleaning generated files?
I tried searching in the GNU make manual, googling, and here too, but did not find any.
My googling always takes me to generic implicit-rule hits (the clean keyword usually drops out of many hits), possibly due to my search keywords (see the title of this question).
However, since it is a trivial rule, I believe there is a high chance it has been implemented, but I just can't find it.
Thank you for your time

Related

How does the compiler guess the correct name when one mistypes it?

At times, gcc yields the following error message:
error: 'class X' has no member named 'Y'; did you mean 'Z'?
I have seen gcc correctly guess Z when Y contains some simple typo, e.g. wrong lower/upper case, but also when there are some missing/extra character(s) in the name.
I was curious to know
how does gcc correctly guess Z starting from Y?
if it applies a fixed set of rules, what kind of mangling is it able to handle and what falls beyond its grasp?
I would welcome answers relating other compilers too, if they perform something ostensibly different or interesting.
Well, after a quick search it seems that GCC has internal code to handle spell checking, which includes an implementation of Levenshtein distance.
see
https://gcc.gnu.org/ml/gcc-patches/2015-09/msg01090.html
and
https://gcc.gnu.org/ml/gcc-patches/2015-11/msg00046.html
I don't know how GCC specifically does it, but there are algorithms that can calculate how different two strings are: are they totally unrelated, or apart by only a minor difference?
For example: Jaro-Winkler distance, Levenshtein distance, and maybe others.
So, when seeing an unresolved name, a compiler can scan through the known/suitable names, pick the one or few most similar, and suggest them as alternatives.
GCC would likely already have a list of in-scope symbols at the ready when it gets to a potentially incorrect symbol. All it has to do then is run the incorrect symbol through a good old-fashioned spellchecker algorithm, with the in-scope symbols as the dictionary.
https://gcc.gnu.org/ml/gcc-patches/2015-09/msg01090.html
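As a rough illustration of the approach described in these answers (not GCC's actual implementation), here is a minimal Python sketch that ranks in-scope symbols by Levenshtein distance; the symbol list and distance cutoff are invented for the example.

def levenshtein(a, b):
    # Classic dynamic-programming edit distance between two strings.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution (or match)
        prev = curr
    return prev[-1]

def suggest(mistyped, in_scope, max_distance=2):
    # Return the closest in-scope name, or None if nothing is close enough.
    best = min(in_scope, key=lambda name: levenshtein(mistyped, name))
    return best if levenshtein(mistyped, best) <= max_distance else None

# Hypothetical example: the members of 'class X' serve as the dictionary.
members = ["size", "resize", "capacity", "reserve"]
print(suggest("capasity", members))  # -> capacity

A real compiler will typically refine this with things like case-insensitive comparison or a cutoff that depends on the identifier length, but the core "pick the nearest dictionary word" idea is the same.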

Can it be a good idea to partition available cores among GNU Make processes?

Has anybody seen a situation where running n GNU Make processes, each with
$ make -j <number of cores / n>
is faster than running them all with
$ make -j < number of cores>
and letting GNU Make take care of it? I have never seen such an effect in my practice, and it seems to me that doing the former is pointless. That is to say, maybe it is possible to cook up specific non-practical examples where such an effect would happen, but in a real-world situation, no.
My question is: has anybody seen such an effect in their practice?
ADDED: Of course it is assumed that the different processes are independent of each other and can logically run simultaneously.
ADDED: the question is NOT "is it usually going to be faster to partition manually?" I know the answer to that: it is NO. The question is, CAN it happen in practice? Has anybody seen it?
I am especially interested in the answer of MadScientist :)
This question is on topic on Stack Overflow because the above lines live in programs, namely wrapper scripts for GNU Make. (Yes, I know this is probably not going to fly, but at least I tried :) )
Manual partitioning as you have described is likely to have strong negative effects:
with no synchronization between the separate make processes, they are likely to try to rebuild the same files at the same time, leading not only to wasted work but also to failed builds caused by inputs being overwritten with garbage.
even if you manage to partition the entire dependency chains without any overlap, you will have idle CPUs after the first partition finishes.
The only techniques I have seen that can improve makefile speed are:
Place "more expensive" targets (i.e. binary with the most linking) earlier in the dependency lists, to prevent idle CPUs at the end.
Include a generated makefile with cached dependencies rather than recalculating them every run even if they haven't changed.
Use a system to avoid rebuilding files when source files change timestamp but not contents. I am only beginning to study what is possible here.

Difference between rules in rule-based expert systems and conditions in normal algorithmic programming

In rule-based expert systems the knowledge base contains a large number of rules of the form "if (template) then (action)". The inference engine chooses the rules that match the input facts; that is, the rules whose condition section matches the input data are shortlisted and one of them is selected.
Now, it is possible to use a normal program with similar conditional statements to reach a result in some comparable way.
I am trying to find a "sound and clear description" of the difference between the two, and of why we cannot achieve with normal algorithmic programming what expert system rules can do.
Is it just that an algorithm needs complete and very well-known inputs, while expert systems can accept incomplete information in any order?
Thanks.
Rules are more accurately described as having the form "when (template) then (action)". The semantics of "when" are quite different from those of "if". For example, the most direct translation of a distinct set of rules to a procedural programming language would look something like this:
if <rule-1 conditions>
then <rule-1 actions>
if <rule-2 conditions>
then <rule-2 actions>
.
.
.
if <rule-n conditions>
then <rule-n actions>
Since the actions of a rule can affect the conditions of another rule, every time any rule action is applied, all of the rule conditions need to be rechecked. This can be quite inefficient.
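To make the recheck problem concrete, here is a minimal Python sketch (the rules and facts are invented for illustration) of that naive translation: a forward-chaining loop that re-evaluates every rule's condition against the whole set of facts on every sweep.

# Each rule is a (condition, action) pair over a shared set of facts.
rules = [
    (lambda f: "has_feathers" in f,                  lambda f: f.add("is_bird")),
    (lambda f: "is_bird" in f and "flies" not in f,  lambda f: f.add("maybe_penguin")),
    (lambda f: "is_bird" in f,                       lambda f: f.add("lays_eggs")),
]

facts = {"has_feathers"}

changed = True
while changed:                       # keep sweeping until a full pass fires nothing new
    changed = False
    for condition, action in rules:  # every condition is rechecked on every sweep,
        if condition(facts):         # even when the facts it depends on never changed
            before = len(facts)
            action(facts)
            if len(facts) != before:
                changed = True

print(facts)  # {'has_feathers', 'is_bird', 'maybe_penguin', 'lays_eggs'}

An inference engine avoids this brute-force sweep with techniques such as Rete-style matching, which remembers which rule conditions can possibly be affected by each change to the facts.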
The benefit provided by rule-based systems is to allow you to express rules as discrete units of logic while efficiently handling the matching process for you. Typically this involves detecting and sharing common conditions between rules so they don't need to be checked multiple times, as well as data-driven approaches where the system predetermines which rules will be affected by changes to specific data, so that rule conditions do not need to be rechecked when unrelated data is changed.
This benefit is similar to the one provided by garbage collection in languages such as Java. There's nothing that automatic memory management provides in Java that can't be achieved by writing your own memory management routines in C. But since that's tedious and error prone, there's a clear benefit to using automatic memory management.
There is nothing that a "rule-based expert system" can do that a "normal algorithmic program" can't do, because a rule-based expert system is a normal algorithmic program. Not only is it possible to write a normal algorithmic program that matches the way an expert system inference engine works; that is exactly what the people who wrote the inference engine did.
Perhaps the "difference" that you are seeing is that in one case the rules are "hard-coded" in the programming language, whereas in the other case the rules are treated as data to be processed by the program. The same logic is present in both cases, it's just that in one the "program" is specific to one task while the other shuffles the complexity out of the "program" and into the "data".
To expand on what Gary said, in hard-coded if-then systems the firing order for the rules is more constrained than in most expert systems. In an expert system, the rules can fire based on criteria other than some coded order. For example, some measure of relevance may be used to choose which rule to fire, e.g. firing the rule whose success or failure will rule in or out the most hypotheses.
Similarly, in many systems, the "knowledge engineer" can state the rules in any order. Although the potential order of firing may need to be considered, the order in which the rules are declared may not be important.
In some types of systems the rules are only loosely coupled. That is, a rule's contribution may not be all or nothing. If a rule fires, it may contribute evidence; if it fails to fire (or is absent), it contributes no evidence, yet the hypothesis may still succeed if some other suite of rules pushes it over a certainty threshold.
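A minimal Python sketch of that loosely coupled style (the rules, weights, and threshold are invented): each rule that fires contributes some evidence toward a hypothesis, and the hypothesis is accepted once the accumulated score crosses a threshold, regardless of which particular rules contributed.

# Hypothetical evidence rules: (condition over the observations, weight contributed).
evidence_rules = [
    (lambda obs: obs.get("fever"),    0.4),
    (lambda obs: obs.get("cough"),    0.3),
    (lambda obs: obs.get("exposure"), 0.5),
]

def assess(observations, threshold=0.6):
    # Accumulate evidence from whichever rules fire; no single rule is required.
    score = sum(weight for condition, weight in evidence_rules if condition(observations))
    return score >= threshold

print(assess({"fever": True, "cough": True}))     # 0.7 -> True
print(assess({"cough": True}))                    # 0.3 -> False
print(assess({"exposure": True, "fever": True}))  # 0.9 -> True

Adding or removing a rule only shifts the scores; none of the other rules has to change.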
This allows experts to contribute rules in a more natural manner. The expert can think of a few rules and they can be tested. The expert can add a few more rules, perhaps even months later, etc., all the while improving the system's accuracy without the need to re-write any of the earlier rules or re-arrange any code.
The ways in which the above are accomplished are myriad, but the production rules described by Gary are one of the most common, easily understood, and effective means and are used by many expert systems.

Colossal memory usage/stack problems with ANTLR lexer/parser

I'm porting over a grammar from flex/bison, and mostly seem to have everything up and running (in particular, my token stream seems fine, and my parser grammar is compiling and running), but seem to be running into problems of runaway stack/memory usage even with very small/moderate sized inputs to my grammar. What is the preferred construct for chaining together an unbounded sequence of the same nonterminal? In my Bison grammar I had production rules of the form:
statements: statement | statement statements
words: | word words
In ANTLR, if I maintain the same rule setup, this seems to perform admirably on small inputs (on the order of 4 kB), but leads to stack overflow on larger inputs (on the order of 100 kB). In both cases the automatically produced parse tree is also rather ungainly.
I experimented with changing these production rules to have an explicitly additive (rather than recursive) form:
statements: statement+
words: word*
However, this seems to have led to an absolutely horrific blowup in memory usage (upwards of 1 GB) on even very small inputs, and the parser has not yet managed to return a parse tree after 20 minutes of letting it run.
Any pointers would be appreciated.
Your rewritten statements are the optimal ANTLR 4 form of the two rules you described (highest performing and minimum memory usage). Here is some general feedback regarding the issues you describe.
I developed some very advanced diagnostic code for numerous potential performance problems. Much of this code is included in TestPerformance, but it is geared towards expert users and requires a rather deep understanding of ANTLR 4's new ALL(*) algorithm to interpret the results.
Terence and I are interested in turning the above into a tool that users can make use of. I may be able to help (run and interpret the test) if you provide a complete grammar and example inputs, so that I can use that grammar/input pair when evaluating the usability of a future tool that automates the analysis.
Make sure you are using the two-stage parsing strategy from the book. In many cases, this will vastly improve the parsing performance for correct inputs (incorrect inputs would not be faster).
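For reference, the two-stage strategy looks roughly like the following with the ANTLR 4 Python runtime; the generated classes MyGrammarLexer/MyGrammarParser and the entry rule startRule are placeholders for your own grammar. Stage one parses with fast SLL prediction and a bail-out error strategy; only if that fails does stage two reparse with full LL prediction and normal error reporting.

from antlr4 import CommonTokenStream, InputStream
from antlr4.atn.PredictionMode import PredictionMode
from antlr4.error.ErrorStrategy import BailErrorStrategy
from antlr4.error.Errors import ParseCancellationException

from MyGrammarLexer import MyGrammarLexer    # placeholder generated lexer
from MyGrammarParser import MyGrammarParser  # placeholder generated parser

def parse(text):
    tokens = CommonTokenStream(MyGrammarLexer(InputStream(text)))
    parser = MyGrammarParser(tokens)

    # Stage 1: fast SLL prediction, bail out on the first syntax error.
    parser._interp.predictionMode = PredictionMode.SLL
    parser._errHandler = BailErrorStrategy()
    try:
        return parser.startRule()            # placeholder entry rule
    except ParseCancellationException:
        # Stage 2: rewind and reparse with full LL prediction and default error handling.
        tokens.seek(0)
        parser = MyGrammarParser(tokens)
        parser._interp.predictionMode = PredictionMode.LL
        return parser.startRule()

For correct inputs the SLL stage almost always succeeds, so the more expensive LL machinery is only paid for when it is actually needed.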
We don't like to use more memory than necessary, but you should be aware that we are working under a very different definition of "excessive" - e.g. we run our testing applications with -Xmx4g to -Xmx12g, depending on the test.
Okay, so I've gotten it working in the following manner. My YACC grammar had the following constructions:
lines: line | line lines;
words: | word words;
However, this did not make the recursive parsing happy, so I rewrote it as:
lines: line+;
words: word*;
This is in line with #280Z28's feedback (and my original guess). The rewrite hung the parser, which is why I posted the question in the first place, but the debugging procedure outlined in my comments on #280Z28's answer showed that in fact it was only the lines parsing that was causing the problem (words was fine). On a whim, I tried the following rewrite:
lines : stmt (EOL stmt)+ EOL*;
(where line had originally been defined as:
line : stmt (EOL | EOF);
)
This seems to be working quite well, even for large inputs. However, it is entirely unclear to me WHY this is the Right Thing To Do(tm), or why it makes a difference compared to the revision that prompted this question. Any feedback on this matter would still be appreciated.

Rewriting methods within a project

As a growing dev team we are beginning to encounter the problem of rewriting functions that behave in similar/identical ways.
We are all guilty of failing to write documentation, as time is a limiting factor. However, it has been suggested that we gather all current functions (duplicates and all) and use that list, along with applied keywords and each method's summary, to identify existing methods before we rewrite them.
Now, before I go and write a solution, I just wanted to make sure there isn't a perfectly good solution out there. I've already done the obvious and searched a little, but googling
Visual Studio + return function list and other variations surprisingly doesn't return much.
Any suggestions would be much appreciated.
One option would be to mark a suspect function with the Obsolete attribute and count the warnings that are thrown. Repeat for the redundant function. Using this you can find out which method is called more and save yourself the effort of updating it in more locations. This of course assumes that the functions have different signatures and that a simple find-and-replace operation didn't solve your problem.
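The Obsolete attribute is a C#/Visual Studio mechanism, but the same deprecate-and-count idea can be sketched in other languages too; for instance, a rough Python analogue (function names invented for the example) wraps the suspect function so that every call emits a DeprecationWarning that can be counted:

import functools
import warnings

def deprecated(replacement):
    # Mark a suspect function so every call emits a DeprecationWarning.
    def wrap(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            warnings.warn(
                f"{func.__name__} is deprecated; use {replacement} instead",
                DeprecationWarning,
                stacklevel=2,
            )
            return func(*args, **kwargs)
        return wrapper
    return wrap

# Hypothetical duplicate pair: keep one, flag the other, then count the warnings.
def normalize_path(p):
    return p.replace("\\", "/")

@deprecated("normalize_path")
def fix_path(p):
    return normalize_path(p)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    fix_path("a\\b")
    fix_path("c\\d")
print(len(caught), "calls hit the deprecated function")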
As with any large undertaking, you probably shouldn't try to do it all at once. As suspect functions are found, deal with them one at a time and gradually refactor the excess code out of your system. That way you aren't spending too much time up front, but are making continual progress.

Resources