When to rename variables in lambda calculus? - lambda-calculus

I understand why renaming variables to avoid capture is important but, in the following example, I don't understand why it doesn't occur.
(λf.λx.f(fx))(λf.λx.fx)
apparently reduces to
λx.(λf.λx.fx)((λf.λx.fx)x)
But shouldn't x be renamed in either (λf.λx.f(fx)) or (λf.λx.fx)? Don't they refer to different x's?

Capture avoidance is to avoid capturing free variables. "Capturing" bound variables doesn't hurt that much: In
λx.(λf.λx.fx)((λf.λx.fx)x)
the two uses of x are indeed different variables, but this is already encoded in the term: in general, an inner abstraction over a variable "overwrites" (shadows) the binding of any outer abstraction over the same variable. This follows from the way evaluation of lambda terms works: inside the body of the inner abstraction, the outer abstraction has no effect, so the variables bound by the inner abstraction are effectively different variables from the ones bound only by the outer abstraction.
You can try this out: if you apply λx.(λf.λx.fx)((λf.λx.fx)x) to some term N, then by the definition of beta reduction this term reduces to (λf.λx.fx)((λf.λx.fx)x)[N/x], i.e. the term obtained by substituting N for every free (!) occurrence of x in (λf.λx.fx)((λf.λx.fx)x) (substitution only operates on free variables, by definition). The only free occurrence of x in that term is the very last one; the other two x's, in the two subterms (λf.λx.fx), are bound by their respective λx's. So the only x that will be substituted by N is the last one, hence (λx.(λf.λx.fx)((λf.λx.fx)x))N reduces to (λf.λx.fx)((λf.λx.fx)N); the x's bound in the subterms (λf.λx.fx) remain unchanged.
So the x's bound by the inner abstraction and the x at the end of the term are indeed different variables belonging to different abstractions. Therefore it is unproblematic not to rename them during the application.
That being said, it can sometimes be useful to do such renamings for easier readability. The resulting term will be alpha-congruent to the one obtained by directly substituting without renaming.
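To see concretely that substitution only touches free occurrences, here is a minimal Python sketch of substitution and a single beta step on lambda terms. The term classes (`Var`, `Lam`, `App`) and the name generator `fresh` are hypothetical helpers for illustration. A binder is renamed only when it would capture a free variable of the substituted term, which is the case capture avoidance exists for.

```python
from dataclasses import dataclass
from itertools import count
from typing import Union


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Lam:
    param: str
    body: "Term"


@dataclass(frozen=True)
class App:
    fn: "Term"
    arg: "Term"


Term = Union[Var, Lam, App]
_counter = count()


def free_vars(t: Term) -> set:
    if isinstance(t, Var):
        return {t.name}
    if isinstance(t, Lam):
        return free_vars(t.body) - {t.param}
    return free_vars(t.fn) | free_vars(t.arg)


def fresh(base: str, avoid: set) -> str:
    """Hypothetical helper: invent a name like x0, x1, ... that is not in `avoid`."""
    while True:
        name = f"{base}{next(_counter)}"
        if name not in avoid:
            return name


def subst(t: Term, x: str, n: Term) -> Term:
    """t[n/x]: replace the free occurrences of x in t by n."""
    if isinstance(t, Var):
        return n if t.name == x else t
    if isinstance(t, App):
        return App(subst(t.fn, x, n), subst(t.arg, x, n))
    if t.param == x:
        return t  # x is rebound here, so it has no free occurrences below
    if t.param in free_vars(n) and x in free_vars(t.body):
        # The binder would capture a free variable of n: rename it first.
        new = fresh(t.param, free_vars(n) | free_vars(t.body))
        t = Lam(new, subst(t.body, t.param, Var(new)))
    return Lam(t.param, subst(t.body, x, n))


def beta(t: App) -> Term:
    """One beta step at the root: (λp.b) a  ->  b[a/p]."""
    assert isinstance(t.fn, Lam)
    return subst(t.fn.body, t.fn.param, t.arg)


def show(t: Term) -> str:
    if isinstance(t, Var):
        return t.name
    if isinstance(t, Lam):
        return f"λ{t.param}.{show(t.body)}"
    fn = f"({show(t.fn)})" if isinstance(t.fn, Lam) else show(t.fn)
    arg = show(t.arg) if isinstance(t.arg, Var) else f"({show(t.arg)})"
    return fn + arg


one = Lam("f", Lam("x", App(Var("f"), Var("x"))))                 # λf.λx.fx
two = Lam("f", Lam("x", App(Var("f"), App(Var("f"), Var("x")))))  # λf.λx.f(fx)

step = beta(App(two, one))
print(show(step))                        # λx.(λf.λx.fx)((λf.λx.fx)x)
print(show(beta(App(step, Var("N")))))  # (λf.λx.fx)((λf.λx.fx)N)

# Capture case: (λy.λx.y) x must not become λx.x, so the binder is renamed.
captured = beta(App(Lam("y", Lam("x", Var("y"))), Var("x")))
print(show(captured))                    # λx0.x
```

No renaming happens in the first two reductions, because the only term being substituted in is closed or has no free variable named like an inner binder; the renaming only kicks in for the last example, where a free x would otherwise be captured.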

Related

Why should a rule be standardized in Backward Chaining before looking for substitutions?

I understood most of the Backward Chaining algorithm (for first-order logic), but not what Standardize-Variables(rule) is for. Below is the pseudo-code of the algorithm:
function FOL-BC-Ask(KB, query) returns a generator of substitutions
    return FOL-BC-Or(KB, query, {})
function FOL-BC-Or(KB, goal, θ) returns a substitution
    for each rule in Fetch-Rules-For-Goal(KB, goal) do
        (lhs ⇒ rhs) ← Standardize-Variables(rule)
        for each θ' in FOL-BC-And(KB, lhs, Unify(rhs, goal, θ)) do
            yield θ'
function FOL-BC-And(KB, goals, θ) returns a substitution
    if θ = failure then return
    else if Length(goals) = 0 then yield θ
    else
        first, rest ← First(goals), Rest(goals)
        for each θ' in FOL-BC-Or(KB, Subst(θ, first), θ) do
            for each θ'' in FOL-BC-And(KB, rest, θ') do
                yield θ''
I'm studying from the book Artificial Intelligence: A Modern Approach, and the code comes from there. The book simply says
FOL-BC-Or works by fetching all clauses that might unify with the goal, standardizing the variables in the clause to be brand-new variables, and then ...
I do understand this, but I do not understand why it needs to be done, or what would happen without it.
I hope someone can explain this. Thank you.
The reason for standardizing variables apart is rather mundane: scope. A variable is "local" to its clause, so when it appears in multiple clauses, it really should be treated as a different variable in each clause. Standardizing apart makes sure this is made clear by using different names in each clause.
Let me explain in more detail. In a normalized first-order logic theory, each clause is implicitly universally quantified. If I have a theory with two clauses
happy(X)
happy(X) or not friends(X,Y),
it means the same as
for all X: happy(X)
for all X : for all Y : happy(X) or not friends(X,Y)
You can think of "for all X" as a sort of "declaration" of X (in the programming sense of "declaration"), so each of these variables is, so to speak, "local" to the clause, in the same sense that a local variable in programming is local to its scope. It is pure coincidence that X is used in both clauses, and in fact we can rename them at will within each clause and obtain perfectly equivalent theories such as
for all U: happy(U)
for all V : for all W : happy(V) or not friends(V,W)
or even
for all X: happy(X)
for all Y : for all X : happy(Y) or not friends(Y,X)
Standardizing apart comes into play because if we try to unify these two clauses, there will be two variables with the same name X even though they do not necessarily refer to the same entities. If we try to unify the last pair of clauses above (the version using X and Y) without standardizing apart first, unifying happy(X) with happy(Y) binds X and Y together, and applying that substitution to the second clause gives
happy(X) or not friends(X,X)
which implies that both arguments of "friends" are the same even though that would not be implied if we simply renamed the variables. Unifying the same perfectly equivalent two clauses using U, V, W names results in
happy(U) or not friends(U, W)
where now the two arguments of "friends" are not required to be the same.
The fact that we obtained different results from unifying perfectly equivalent theories shows us that something must be incorrect. And indeed what is incorrect here is unifying two clauses that use a variable with the same name (X) even though they are not really the same variable and could be equivalently renamed to something else.
David Eisenstat's comment is correct: failing to standardize apart is incorrect because it does not always produce the most general unifier. It may produce a unifier with spurious constraints, such as the equality we saw above, which prevents it from being as general as it should be.
Standardizing apart solves this problem by renaming the variables to "brand-new ones", meaning variables that do not appear anywhere else and therefore do not risk colliding in this way and introducing a false equality based on purely arbitrary name choices.
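To connect this back to the FOL-BC-Ask pseudocode in the question, here is a minimal Python sketch of it. It assumes a toy term representation (tuples for compound terms, strings starting with an uppercase letter for variables), omits the occurs check, and simplifies Fetch-Rules-For-Goal to "every rule"; the `standardize` flag is a hypothetical switch for turning Standardize-Variables off. The query grandparent(Y, c) happens to reuse the name Y from the rule, so without renaming the rule's Y and the query's Y are conflated and the answer Y = a is lost.

```python
from itertools import count


def is_var(t):
    # Assumed convention: variables are strings starting with an uppercase letter.
    return isinstance(t, str) and t[:1].isupper()


def walk(t, theta):
    while is_var(t) and t in theta:
        t = theta[t]
    return t


def unify(a, b, theta):
    """Return theta extended so that a and b unify, or None (= failure)."""
    if theta is None:
        return None
    a, b = walk(a, theta), walk(b, theta)
    if a == b:
        return theta
    if is_var(a):
        return {**theta, a: b}  # no occurs check, for brevity
    if is_var(b):
        return {**theta, b: a}
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
        for x, y in zip(a, b):
            theta = unify(x, y, theta)
        return theta
    return None


def subst(theta, t):
    t = walk(t, theta)
    if isinstance(t, tuple):
        return tuple(subst(theta, x) for x in t)
    return t


_fresh = count(1)


def standardize_variables(rule):
    """Rename every variable of the rule to a brand-new name (X -> X_1, ...)."""
    n = next(_fresh)

    def rename(t):
        if is_var(t):
            return f"{t}_{n}"
        if isinstance(t, tuple):
            return tuple(rename(x) for x in t)
        return t

    lhs, rhs = rule
    return [rename(g) for g in lhs], rename(rhs)


def fol_bc_ask(kb, query, standardize=True):
    return fol_bc_or(kb, query, {}, standardize)


def fol_bc_or(kb, goal, theta, standardize):
    for rule in kb:  # simplification: every rule is "fetched" for every goal
        lhs, rhs = standardize_variables(rule) if standardize else rule
        yield from fol_bc_and(kb, lhs, unify(rhs, goal, theta), standardize)


def fol_bc_and(kb, goals, theta, standardize):
    if theta is None:
        return
    if not goals:
        yield theta
        return
    first, rest = goals[0], goals[1:]
    for theta1 in fol_bc_or(kb, subst(theta, first), theta, standardize):
        yield from fol_bc_and(kb, rest, theta1, standardize)


KB = [
    ([], ("parent", "a", "b")),
    ([], ("parent", "b", "c")),
    ([("parent", "X", "Y"), ("parent", "Y", "Z")], ("grandparent", "X", "Z")),
]
query = ("grandparent", "Y", "c")  # the query happens to reuse the name Y

print([subst(t, "Y") for t in fol_bc_ask(KB, query)])                     # ['a']
print([subst(t, "Y") for t in fol_bc_ask(KB, query, standardize=False)])  # []
```

Without renaming, unifying grandparent(X, Z) with grandparent(Y, c) binds X to Y, so the first subgoal parent(X, Y) turns into parent(Y, Y): exactly the kind of spurious equality described above, and no fact matches it.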

Free Variable in Prolog

Can anyone explain the concept of free variables in Prolog? Is it similar to anonymous variables, or is there a difference? An example would also be great.
tl;dr:
free is a notion to distinguish universally bound (free in clause notation) from existentially bound variables in setof/3, bagof/3, etc. - some people use free to mean "currently not instantiated" and some use it to denote an output argument that's meant to be instantiated by the predicate but that's not how the standard uses it.
long version:
I will quote the Prolog standard on the definition:
7.1.1.4 Free variables set of a term
The free variables set, FVt of a term T with respect to a term v is a set of variables defined as the set difference of the variable set (7.1.1.1) of T and BV where BV is a set of variables defined as the union of the variable set of v and the existential variables set (7.1.1.3) of T.
where they explicitly note:
The concept of a free variables set is required when defining bagof/3 (8.10.2) and setof/3 (8.10.3).
Perhaps as background: in logic, a free variable is one that is not bound by a quantifier (e.g. x is bound and y is free in ∀x p(x,y)). A (pure) Prolog clause head(X) :- goal1(X), goal2(X). can be read as the logical formula ∀X (goal1(X) ∧ goal2(X) → head(X)). In practice, as long as we use fresh variables whenever we try to unify a goal with a clause, we can just disregard the universal quantifiers. So for our purposes we can treat X in the clause above as free.
This is all well and good until meta-predicates come in: say we are interested in the set of first elements in a list of tuples:
?- setof(X, member(X-Y, [1-2, 2-2, 1-3]), Xs).
Y = 2,
Xs = [1, 2] ;
Y = 3,
Xs = [1].
But we get two solutions: the ones where Y=2 and those where Y=3. What I'd actually want to say is: there exists some Y such that X-Y is a member of the list. The Prolog notation for this pattern is to write Var^Term:
?- setof(X, Y^member(X-Y, [1-2, 2-2, 1-3]), Xs).
Xs = [1, 2].
In the first example, both X and Y are free, in the second example X is free and Y is bound.
If we write this as a formula we get setof(X, ∃Y member(X-Y, [1-2, 2-2, 1-3]), Xs), which is not a first-order formula anymore (there is an equivalent first-order one, but this is where the name meta-predicate comes in). Now the problem is that the Var^Term notation is purely syntactic: internally there is only one type of variable. But when we describe the behaviour of setof and friends we need to distinguish between free and existentially bound variables. So unless you are using meta-predicates, all of your variables can be considered free (1).
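The difference between the two queries can be imitated in Python: with Y free, setof/3 produces one solution per distinct binding of Y; with Y^ it collects over all Y at once. This is only an illustration of the semantics on this example (not how setof/3 is implemented), and it relies on the elements being integers, so that Python's sort agrees with Prolog's standard order of terms here.

```python
def setof_y_free(pairs):
    """Imitates setof(X, member(X-Y, Pairs), Xs) with Y free:
    one solution (Y, Xs) per distinct binding of Y."""
    groups = {}
    for x, y in pairs:
        groups.setdefault(y, set()).add(x)
    return [(y, sorted(xs)) for y, xs in sorted(groups.items())]


def setof_y_existential(pairs):
    """Imitates setof(X, Y^member(X-Y, Pairs), Xs): Y is existentially bound,
    so there is a single solution. Returns None where setof/3 would fail."""
    xs = sorted({x for x, _ in pairs})
    return xs or None


pairs = [(1, 2), (2, 2), (1, 3)]   # stands for [1-2, 2-2, 1-3]
print(setof_y_free(pairs))         # [(2, [1, 2]), (3, [1])]
print(setof_y_existential(pairs))  # [1, 2]
```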
The Learning Prolog link provided by Reema Q Khan is a bit fuzzy in its use of free. Just looking at the syntax, X is free in X=5, X is 2 + 3. But when we run this query, as soon as we get to the second goal, X has been instantiated to 5, so we are actually running the query 5 is 2 + 3 (2). What is meant in that context is that we expect is/2 to unify its first argument (often called the "output" argument) with the value of the expression. To make sure this always succeeds we would pass a variable here (even though it's perfectly fine not to do so). The text tries to describe this expectation as "free variable" (3).
(1) ok, formally, anything that looks like Var^Term considers Var existentially bound but without meta-predicates this doesn't matter.
(2) I believe there is a clash in notation that some texts use "X is bound to 5" here, which might increase the confusion.
(3) What they should say is that they expect the argument not to be instantiated yet, but even that does not capture the semantics correctly: Paulo Moura already gave the initial ground example 5 is 3 + 2.
Maybe this can help. (Since I have prepared it, I might as well post it! It is still hard to read and needs simplification.)
In fact, you need to distinguish whether you talk about the syntax of the program or whether you talk about the runtime state of the program.
The word "variable" takes on slightly different meanings in both cases. In common usage, one does not make a distinction, and the understanding this fluent usage provides is good enough. But for beginners, this may be a hurdle.
In logic, the word "variable" has the meaning of "a symbol selected from the set of variable symbols", and it stands for the possibly infinite set of terms it may take on while fulfilling any constraints given by the logical formulae it participates in. This is not the "variable" used in reasoning about actual programs.
Free Variable:
"is" is a built-in arithmetic evaluator in Prolog. "X is E" requires X to be a free variable and E to be an arithmetic expression that can be evaluated. E can contain variables, but these variables have to be bound to numbers, e.g., "X=5, Y is 2*X" is a correct Prolog goal.
More Explanation:
http://kti.ms.mff.cuni.cz/~bartak/prolog.old/learning/LearningProlog11.html
Anonymous Variable:
The name of every anonymous variable is _ .
More Explanation:
https://dobrev.com/help/tut/The_anonymous_variable.html#:~:text=The%20anonymous%20variable%20is%20an,of%20_denotes%20a%20distinct%20variable%20.

How to recognize variables that don't affect the output of a program?

Sometimes the value of a variable accessed within the control flow of a program cannot possibly have any effect on its output. For example:
global var_1
global var_2
start program hello(var_3, var_4)
    if (var_2 < 0) then
        save-log-to-disk (var_1, var_3, var_4)
    end-if
    return ("Hello " + var_3 + ", my name is " + var_1)
end program
Here only var_1 and var_3 have any influence on the output, while var_2 and var_4 are only used for side effects.
Do variables such as var_1 and var_3 have a name in dataflow-theory/compiler-theory?
Which static dataflow analysis techniques can be used to discover them?
References to academic literature on the subject would be particularly appreciated.
The problem that you stated is undecidable in general, even for the following very narrow special case:
Given a single routine P(x), where x is a parameter of type integer, is the output of P(x) independent of the value of x, i.e., does P(0) = P(1) = P(2) = ...?
We can reduce the following, still undecidable, version of the halting problem to the question above: given a Turing machine M(), does M() never stop on the empty input?
I assume that we use a (Turing-complete) language in which we can build a "Turing machine simulator":
Given the program M(), construct this routine:
P(x):
    if x == 0:
        return 0
    Run M() for x steps
    if M() has terminated then:
        return 1
    else:
        return 0
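Now:
P(0) = P(1) = P(2) = ...
=>
M() does not terminate,
because
M() does terminate
=> P(x) = 1 for a sufficiently large x
=> P(x) != P(0) = 0.
Conversely, if M() never terminates, then P(x) = 0 for every x, so P is constant. Hence P is constant exactly when M() never terminates, and any procedure deciding the former would decide the latter.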
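The construction of P can be sketched in Python, assuming (hypothetically) that the machine M() is modeled as a generator function in which each `yield` is one step and returning means termination. Evaluating P on finitely many inputs, as below, can never establish that P is constant; that is exactly the undecidable part.

```python
def make_p(machine):
    """Build the routine P from a machine M.

    Assumption: M is modeled as a Python generator function in which every
    `yield` is one step, and returning means that M has terminated.
    """
    def p(x):
        if x == 0:
            return 0
        run = machine()
        for _ in range(x):      # run M() for (at most) x steps
            try:
                next(run)
            except StopIteration:
                return 1        # M() terminated within x steps
        return 0                # M() has not terminated (yet)
    return p


def halts_after_three_steps():
    for _ in range(3):
        yield


def never_halts():
    while True:
        yield


p_halting = make_p(halts_after_three_steps)
p_looping = make_p(never_halts)
print([p_halting(x) for x in range(6)])  # [0, 0, 0, 0, 1, 1]
print([p_looping(x) for x in range(6)])  # [0, 0, 0, 0, 0, 0]
```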
So, it is very difficult for a compiler to decide whether a variable actually does not influence the return value of a routine; in your example, the "side effect routine" might manipulate one of its values (or even loop infinitely, which would most definitely change the return value of the routine ;-)
Of course overapproximations are still possible. For example, one might conclude that a variable does not influence the return value if it does not appear in the routine body at all. You can also see some classical compiler analyses (like Expression Simplification, Constant propagation) having the side effect of eliminating appearances of such redundant variables.
Pachelbel has discussed the fact that you cannot do this perfectly. OK, I'm an engineer, I'm willing to accept some dirt in my answer.
The classic way to answer your question is to do dataflow tracing from program outputs back to program inputs. A dataflow is the connection from a program assignment (or side effect) that sets a variable's value to a place in the application that consumes that value.
If there is (transitive) dataflow from a program output that you care about (in your example, the printed text stream) to an input you supplied (var2), then that input "affects" the output. A variable that does not flow from the input to your desired output is useless from your point of view.
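A minimal Python sketch of this backward tracing, under simplifying assumptions: every variable is assigned once, each definition is summarized by the set of variables it reads (with the guard of the `if` folded in as a control dependence), and the log call is assumed not to write anything that the returned string reads, which, as the previous answer points out, a real analysis cannot take for granted. The dependency table for hello() is hand-written for illustration.

```python
from collections import deque


def affecting_inputs(defs, outputs):
    """Trace dataflow backwards from `outputs`.

    defs maps each assigned variable to the set of variables its value is
    computed from; anything that is never assigned is treated as an input.
    Returns the inputs with a (transitive) dataflow to some output.
    """
    seen, work = set(), deque(outputs)
    while work:
        v = work.popleft()
        if v not in seen:
            seen.add(v)
            work.extend(defs.get(v, ()))
    return {v for v in seen if v not in defs}


# Hypothetical, hand-written dataflow summary of hello() from the question:
defs = {
    "greeting": {"var_3"},
    "intro": {"var_1"},
    "result": {"greeting", "intro"},                     # the returned string
    "log_call": {"var_1", "var_3", "var_4", "var_2"},    # side effect, guarded by var_2
}
print(sorted(affecting_inputs(defs, ["result"])))  # ['var_1', 'var_3']
```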
If you focus your attention only on the computations involved in the dataflows, and display them, you get what is generally called a "program slice". There are (very few) commercial tools that can show this to you.
Grammatech has a good reputation here for C and C++.
There are standard compiler algorithms for constructing such dataflow graphs; see any competent compiler book.
They all suffer from some limitation due to Turing's impossibility proofs as pointed out by Pachelbel. When you implement such a dataflow algorithm, there will be places that it cannot know the right answer; simply pick one.
If your algorithm chooses to answer "there is no dataflow" in certain places where it is not sure, then it may miss a valid dataflow and incorrectly report that a variable does not affect the answer. (This is called a "false negative".) This occasional error may be acceptable if the algorithm has some other nice properties, e.g., it runs really fast on millions of lines of code. (The trivial algorithm simply says "no dataflow" in all places, and it is really fast :)
If your algorithm chooses to answer "yes there is a dataflow", then it may claim that some variable affects the answer when it does not. (This is called a "false positive").
You get to decide which is more important; many people prefer false positives when looking for a problem, because then you have to at least look at possibilities detected by the tool. A false negative means it didn't report something you might care about. YMMV.
Here's a starting reference: http://en.wikipedia.org/wiki/Data-flow_analysis
Any of the books on that page will be pretty good. I have Muchnick's book and like it a lot. See also this page: (http://en.wikipedia.org/wiki/Program_slicing)
You will discover that implementing this is a pretty big effort for any real language. You are probably better off finding a tool framework that already does most or all of this for you.
I use the following algorithm: a variable is used if it is a parameter or it occurs anywhere in an expression, excluding as the LHS of an assignment. First, count the number of uses of all variables. Delete unused variables and assignments to unused variables. Repeat until no variables are deleted.
This algorithm only implements a subset of the OP's requirement, and it is horribly inefficient because it requires multiple passes. A garbage-collection approach may be faster but is harder to write: my algorithm only requires a list of variables with usage counts. Each pass is linear in the size of the program. The algorithm effectively does a limited kind of dataflow analysis by eliminating the tail of a flow that ends in an assignment.
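A minimal Python sketch of the usage-counting algorithm described above, assuming a program is just a list of assignments, each summarized as (target, variables read by the right-hand side), and that deleting an assignment together with its side effects is allowed, as it is in the answerer's language. One property of this counting version: an assignment that reads its own target, such as a = a + 1, counts as a use and is never deleted, which is one reason a mark-style (garbage-collection-like) traversal can do better.

```python
from collections import Counter


def eliminate_unused(params, assignments, returned):
    """Iteratively delete assignments to variables that are never used.

    params:      parameter names (always considered used)
    assignments: list of (target, variables read by the right-hand side)
    returned:    variables read by the result expression
    """
    stmts = list(assignments)
    while True:
        # One linear pass: count every read of every variable.
        uses = Counter(returned)
        for _, rhs in stmts:
            uses.update(rhs)
        kept = [(t, rhs) for t, rhs in stmts if t in params or uses[t] > 0]
        if len(kept) == len(stmts):  # nothing deleted: fixed point reached
            return kept
        stmts = kept


# a = x; b = a; c = b; return x  -- c, then b, then a become unused.
program = [("a", ["x"]), ("b", ["a"]), ("c", ["b"])]
print(eliminate_unused({"x"}, program, ["x"]))  # []
```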
For my language, the elimination of side effects in the RHS of an assignment to an unused variable is mandated by the language specification; this may not be suitable for other languages. Effectiveness is improved by running it before inlining, to reduce the cost of inlining unused function applications, and then running it again afterwards, which eliminates parameters of inlined functions.
Just as an example of the utility of the language specification, the library constructs a thread pool and assigns a pointer to it to a global variable. If the thread pool is not used, the assignment is deleted, and hence the construction of the thread pool elided.
IMHO compiler optimisations are almost invariably heuristics whose performance matters more than their effectiveness at achieving a theoretical goal (like removing unused variables). Simple reductions are useful not only because they're fast and easy to write, but because a programmer who understands the basics of how the compiler operates can leverage this knowledge to help the compiler. The most well-known example of this is probably the refactoring of recursive functions to place the recursion in tail position: a pointless exercise unless the programmer knows the compiler can do tail-recursion optimisation.

most readable way in XPath to write "is value X a member of sequence S"?

XPath 2.0 has some new functions and syntax, relative to 1.0, that work with sequences. Some of these don't really add to what the language could already do in 1.0 (with node sets), but they make it easier to express the desired logic in ways that are more readable. This increases the chances of the programmer getting the code correct, and keeping it that way. For example,
empty(s) is equivalent to not(s), but its intent is much clearer when you want to test whether a sequence is empty.
Correction: the effective boolean value of a sequence is in general more complicated than that. E.g. empty((0)) != not((0)). This applies to exists(s) vs. s in a boolean context as well. However, there are domains of s where empty(s) is equivalent to not(s), so the two could be used interchangeably within those domains. But this goes to show that the use of empty() can make a non-trivial difference in making code easier to understand.
Similarly, exists(s) is equivalent to boolean(s) that already existed in XPath 1.0 (or just s in a boolean context), but again is much clearer about the intent.
Quantified expressions; e.g. "some $x in expression satisfies test($x)" would be equivalent to boolean(expression[test(.)]) (although the new syntax is more flexible, in that you don't need to worry about losing the context item because you have the variable to refer to it by).
Similarly, "every $x in expression satisfies test($x)" would be equivalent to not(expression[not(test(.))]) but is more readable.
These functions and syntax were evidently added at no small cost, solely to serve the goal of writing XPath that is easier to map to how humans think. This implies, as experienced developers know, that understandable code is significantly superior to code that is difficult to understand.
Given all that ... what would be a clear and readable way to write an XPath test expression that asks
Does value X occur in sequence S?
Some ways to do it: (Note: I used X and S notation here to indicate the value and the sequence, but I don't mean to imply that these subexpressions are element name tests, nor that they are simple expressions. They could be complicated.)
1. X = S: This would be one of the most unreadable, since it requires the reader to
   - think about which of X and S are sequences vs. single values
   - understand general comparisons, which are not obvious from the syntax
   However, one advantage of this form is that it allows us to put the topic (X) before the comment ("is a member of S"), which, I think, helps readability.
   See also CMS's good point about readability, when the syntax or names make the "cardinality" of X and S obvious.
2. index-of(S, X): This one is clear about what's intended as a value and what as a sequence (if you remember the order of arguments to index-of()). But it expresses more than we need to: it asks for the index, when all we really want to know is whether X occurs in S. This is somewhat misleading to the reader. An experienced developer will figure out what's intended, with some effort and with understanding of the context. But the more we rely on context to understand the intent of each line, the more understanding the code becomes a circular (spiral) and potentially Sisyphean task! Also, since index-of() is designed to return a list of all the indexes of occurrences of X, it could be more expensive than necessary: a smart processor, in order to evaluate X = S, wouldn't necessarily have to find all the contents of S, nor enumerate them in order; but for index-of(S, X), the correct order would have to be determined, and all contents of S must be compared to X. One other drawback of using index-of() is that it's limited to using eq for comparison; you can't, for example, use it to ask whether a node is identical to any node in a given sequence.
   Correction: This form, used as a conditional test, can result in a runtime error: effective boolean value is not defined for a sequence of two or more items starting with a numeric value. (But at least we won't get wrong boolean values, since index-of() can't return a zero.) If S can have multiple instances of X, this is another good reason to prefer form 3 or 6.
3. exists(index-of(S, X)): makes the intent clearer, and would help the processor eliminate the performance penalty if the processor is smart enough.
4. some $m in S satisfies $m eq X: This one is very clear, and matches our intent exactly. It seems long-winded compared to 1, and that in itself can reduce readability. But maybe that's an acceptable price for clarity. Keep in mind that X and S could potentially be complex expressions themselves; they're not necessarily just variable references. An advantage is that since the eq operator is explicit, you can replace it with is or any other comparison operator.
5. S[. eq X]: clearer than 1, but shares the semantic drawbacks of 2: it computes all members of S that are equal to X. Actually, this could return a false negative (incorrect effective boolean value) if X is falsy. E.g. (0, 1)[. eq 0] returns 0, which is falsy, even though 0 occurs in (0, 1).
6. exists(S[. eq X]): Clearer than 1, 2, 3, and 5. Not as clear as 4, but shorter. Avoids the drawbacks of 5 (or at least most of them, depending on the processor's smarts).
I'm kind of leaning toward the last one, at this point: exists(S[. eq X])
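To make the effective-boolean-value pitfalls of forms 2 and 5 concrete, here is a rough Python model of XPath 2.0's effective boolean value rule. Sequences are modeled as lists, `eq` is approximated by Python's `==`, and only nodes, booleans, strings and numbers are handled (other atomic types and the exact error semantics are simplified). The helper names mirror the XPath functions but are only illustrations.

```python
import math


class Node:
    """Stand-in for an XML node (compared by identity, not by value)."""


def ebv(seq):
    """Effective boolean value of an XPath 2.0 sequence, modeled as a list."""
    if not seq:
        return False
    first = seq[0]
    if isinstance(first, Node):
        return True
    if len(seq) == 1:
        if isinstance(first, bool):  # check bool before int: bool subclasses int
            return first
        if isinstance(first, str):
            return first != ""
        if isinstance(first, (int, float)):
            return not (first == 0 or math.isnan(first))
    raise TypeError("FORG0006: effective boolean value is not defined")


def exists(seq):            # fn:exists
    return len(seq) > 0


def general_eq(xs, seq):    # X = S: true if some pair of items is equal
    return any(x == s for x in xs for s in seq)


def index_of(seq, x):       # fn:index-of, positions are 1-based
    return [i for i, s in enumerate(seq, start=1) if s == x]


def filter_eq(seq, x):      # S[. eq X]
    return [s for s in seq if s == x]


S = [0, 1, 0]
print(general_eq([0], S))          # True   (form 1)
print(exists(index_of(S, 0)))      # True   (form 3)
print(any(m == 0 for m in S))      # True   (form 4)
print(ebv(filter_eq([0, 1], 0)))   # False  (form 5: a false negative)
print(exists(filter_eq(S, 0)))     # True   (form 6)
try:
    ebv(index_of(S, 0))             # form 2: the sequence (1, 3) has two numbers
except TypeError as err:
    print(err)                      # FORG0006: effective boolean value is not defined
```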
What about you... As a developer coming to a complex, unfamiliar XSLT or XQuery or other program that uses XPath 2.0, and wanting to figure out what that program is doing, which would you find easiest to read?
Apologies for the long question. Thanks for reading this far.
Edit: I changed = to eq wherever possible in the above discussion, to make it easier to see where a "value comparison" (as opposed to a general comparison) was intended.
For what it's worth, if names or context make clear that X is a singleton, I'm happy to use your first form, X = S, for example when I want to check an attribute value against a set of possible values:
<xsl:when test="@type = ('A', 'A+', 'A-', 'B+')" />
or
<xsl:when test="@type = $magic-types"/>
If I think there is a risk of confusion, then I like your sixth formulation. The less frequently I have to remember the rules for calculating an effective boolean value, the less frequently I make a mistake with them.
I prefer this one:
count(distinct-values($seq)) eq count(distinct-values(($x, $seq)))
When $x is itself a sequence, this expression implements the (value-based) subset-of relation between two sets of values that are represented as sequences. This implementation of subset-of has just linear time complexity, vs. many other ways of expressing it that have O(N^2) time complexity.
To summarize, the question whether a single value belongs to a set of values is a special case of the question whether one set of values is a subset of another. If we have a good implementation of the latter, we can simply use it for answering the former.
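The counting idea translates directly into Python, where a hash-based set plays the role of distinct-values(). This is only a sketch: Python's equality differs from XPath's value comparison (collations, numeric type promotion), and whether a particular XPath processor achieves linear time depends on how it implements distinct-values().

```python
def is_subset(xs, seq):
    """count(distinct-values($seq)) eq count(distinct-values(($x, $seq)))"""
    return len(set(seq)) == len(set(xs) | set(seq))


def is_member(x, seq):
    """Membership as the special case of a one-item subset."""
    return is_subset([x], seq)


print(is_member("bob", ["alice", "bob"]))    # True
print(is_member("carol", ["alice", "bob"]))  # False
print(is_subset([1, 2], [3, 2, 1, 2]))       # True
```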
The functx library has a nice implementation of this function, so you can use
functx:is-node-in-sequence($X, $Y)
(this particular function can be found at http://www.xqueryfunctions.com/xq/functx_is-node-in-sequence.html)
The whole functx library is available for both XQuery (http://www.xqueryfunctions.com/) and XSLT (http://www.xsltfunctions.com/)
MarkLogic ships the functx library with their core product; other vendors may also.
Another possibility, when you want to know whether node X occurs in sequence S, is
exists((X) intersect S)
I think that's pretty readable, and concise. But it only works when X and the values in S are nodes; if you try to ask
exists(('bob') intersect ('alice', 'bob'))
you'll get a runtime error.
In the program I'm working on now, I need to compare strings, so this isn't an option.
As Dimitri notes, the occurrence of a node in a sequence is a question of identity, not of value comparison.

Asserting and retracting to emulate global variables

I'm doing this to emulate global variables:
update_queue(NewItem) :-
    global_queue(Q),
    retractall(global_queue(Q)),
    append(Q, [NewItem], NewQ),
    assert(global_queue(NewQ)).
Is there another way? (Besides passing the variables as arguments, that is). Not necessarily more efficient, I'm just curious.
In SWI-Prolog, there is also nb_setval/2 and b_setval/2 (and corresponding "_getval/2"). Use time/1 to see if that is more efficient. Also a comment on the queue representation: If you represent the initial queue as a pair of variables Q-Q, you can append an element in constant time with:
insert_q0_q(E, Q-[E|Rest], Q-Rest).
that is, you append an element E to the queue by further instantiating the tail (i.e., the second element of the pair), and the new tail is again a free variable. I leave removing an element from the front (also in constant time) as an exercise; hint: when the first element of the pair is a variable, the queue in this representation is empty. Generally, global variables complicate debugging considerably, since you then cannot test the predicates in isolation. As an alternative to passing the queue as arguments (which you already mentioned), consider using DCG notation to thread it through implicitly. This often makes the code more readable, especially if only a small subset of predicates needs to access the "global" arguments.
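Python has no logic variables, so this sketch emulates the Q-Q difference-list queue with a hypothetical single-assignment `LogicVar` class. Appending only instantiates the current tail variable, so it takes constant time, unlike append/3 in the question, which copies the whole list. The `remove` function follows the hint above (an unbound front means the queue is empty), so skip it if you want to do the exercise in Prolog yourself.

```python
class LogicVar:
    """A single-assignment variable: unbound until bind() is called once."""

    def __init__(self):
        self.bound = False
        self.value = None

    def bind(self, value):
        assert not self.bound, "a logic variable can only be instantiated once"
        self.bound, self.value = True, value


def empty_queue():
    v = LogicVar()
    return (v, v)  # Q-Q: front and tail are the same unbound variable


def insert(queue, e):
    """Like insert_q0_q(E, Q-[E|Rest], Q-Rest): instantiate the tail to [E|Rest]."""
    front, tail = queue
    rest = LogicVar()
    tail.bind((e, rest))  # a cons cell (E, Rest); constant time, nothing copied
    return (front, rest)


def remove(queue):
    """Take the front element in constant time; None if the queue is empty."""
    front, tail = queue
    if not front.bound:  # the front is still an unbound variable: empty queue
        return None
    e, rest = front.value
    return e, (rest, tail)


q = empty_queue()
for item in ("a", "b", "c"):
    q = insert(q, item)
out = []
while (taken := remove(q)) is not None:
    e, q = taken
    out.append(e)
print(out)  # ['a', 'b', 'c']
```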
