Normal order reduction and normal form in lambda calculus

In lambda calculus, if a term has a normal form, the normal order reduction strategy will always produce it.
How can the above proposition be proved rigorously?

The result you mention is a corollary of the so-called standardization theorem, which states that for any reduction sequence M -> N there is another, "standard" one between the same terms M and N, in which redexes are executed in leftmost-outermost order. The proof is not trivial, and there are several different approaches in the literature; I add a short bibliography below.
The recent proof by Kashima [5] (see also [1]) has the advantage of not using the notion of residual and of being based on purely inductive techniques. It is also well suited to formalization [2], but unless you are already confident with the subject, it is not particularly instructive.
The general idea behind standardization is the following.
Suppose we have two redexes R and S, where S is in leftmost-outermost position with respect to R, and consider the following reduction:
M -R-> P -S-> N
Then, you can fire S first instead, but in this way you may duplicate (or erase) the redex R. These redexes, which are essentially what remains of R after firing S, are called residuals and are usually written R/S (read: residuals of R after S).
So, the basic lemma is that
R S = S (R/S)
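For example (a standard illustration, not taken from the references below): take M = (\x. x x) ((\y. y) z), let S be the whole term and let R be the inner redex (\y. y) z. Firing R and then S gives (\x. x x) z -> z z. Firing S first duplicates R, producing ((\y. y) z) ((\y. y) z); the two copies of (\y. y) z are the residuals R/S, and firing both of them again yields z z.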
In order to use it for standardization, we need to generalize R to an arbitrary sequence ρ (that we may assume to be standard, with no redex in leftmost outermost position w.r.t. S). It is still true that
(*) ρS = S (ρ/S)
but what is not so evident is how to standardize ρ/S. To this aim,
let us observe that ρ was performed before firing S = C[(\x.M) N], which
essentially splits the term into three disjoint regions: the context C, M, and N.
This induces a repartition of ρ in three consecutive sequences:
ρ1 inside M
ρ2 inside N
ρ3 inside C
(remember that no redex was in leftmost outermost position w.r.t. S).
The only part that can be duplicated (or erased) is ρ2, and its residuals
ρ2-0 ... ρ2-k are easily ordered according to the positions of the copies
of N created by the firing of S. So
S ρ1 ρ2-0 ... ρ2-k ρ3
is the standard version of (*).
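As a side illustration (not part of the argument above, and only a sketch under my own naming), here is a minimal Haskell implementation of the leftmost-outermost (normal order) strategy that the theorem is about; substitution is deliberately naive and assumes bound variable names are distinct, so it does not handle capture:

data Term = Var String | Lam String Term | App Term Term
  deriving Show

-- Naive substitution of s for x; fine as long as bound names are unique.
subst :: String -> Term -> Term -> Term
subst x s (Var y)   = if x == y then s else Var y
subst x s (Lam y b) = if x == y then Lam y b else Lam y (subst x s b)
subst x s (App f a) = App (subst x s f) (subst x s a)

-- Fire the leftmost-outermost redex, if there is one.
step :: Term -> Maybe Term
step (App (Lam x b) a) = Just (subst x a b)            -- outermost redex first
step (App f a)         = case step f of
                           Just f' -> Just (App f' a)  -- then inside the function part
                           Nothing -> App f <$> step a -- only then inside the argument
step (Lam x b)         = Lam x <$> step b
step (Var _)           = Nothing

-- Iterate; terminates exactly when a normal form is reached.
normalize :: Term -> Term
normalize t = maybe t normalize (step t)

For instance, normalize (App (Lam "x" (Var "x")) (Var "y")) returns Var "y".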
Basic bibliography.
[1] A. Asperti, J.-J. Lévy. The cost of usage in the lambda-calculus. LICS 2013.
[2] F. Guidi. Standardization and Confluence in Pure Lambda-Calculus Formalized for the Matita Theorem Prover. Journal of Formalized Reasoning 5(1):1–25, 2012.
[3] H. P. Barendregt. The Lambda Calculus. North-Holland, 1984.
[4] G. Gonthier, J.-J. Lévy, P.-A. Melliès. An abstract standardisation theorem. LICS '92.
[5] R. Kashima. A proof of the standardization theorem in lambda-calculus. Technical Report C-145, Tokyo Institute of Technology, 2000.
[6] J. W. Klop. Combinatory Reduction Systems. PhD thesis, CWI, Amsterdam, 1980.
[7] G. Mitschke. The standardisation theorem for the lambda-calculus. Z. Math. Logik Grundlag. Math. 25:29–31, 1979.
[8] M. Takahashi. Parallel reductions in lambda-calculus. Information and Computation 118:120–127, 1995.
[9] H. Xi. Upper bounds for standardizations and an application. Journal of Symbolic Logic 64:291–303, 1999.


XOR and logical conjunction

I'm struggling to understand the meaning of the following expression:
aᵢ ⊕ bᵢ = xᵢ ∧ yᵢ
I know the symbol ⊕ is actually an exclusive OR, and the ∧ is an and symbol.
But I cannot grasp the overall meaning. What does that mean in simple words?
The context is what is stated here.
Can someone help me?
Thanks a lot
The paper you reference uses the notation a ⊕ b = x · y, but · and ∧ mean the same in this context: the logical AND operation on single-bit variables.
This equality describes the requirement of the CHSH game. The game involves two players, Alice and Bob, who cannot communicate with one another. They are each given a single random bit (Alice gets X and Bob gets Y). Alice and Bob then output a single bit they choose independently based on their input bits (A from Alice and B from Bob) with the goal of satisfying the formula X · Y = A ⊕ B.
This game illustrates that quantum entanglement enables strategies that are dramatically better than any purely classical strategy. The best classical strategy is for Alice and Bob to always output 0, regardless of the input; this strategy wins the game 75% of the time. But there is a quantum strategy that allows them to win about 85% of the time if they share an entangled pair of qubits before the start of the game.
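As a quick sanity check (my own sketch, not from the paper), the 75% figure for the trivial classical strategy can be verified by enumerating the four equally likely input pairs, and the quantum bound is cos^2(pi/8):

import Data.Bits (xor, (.&.))

-- One round of the CHSH game: the players win when A XOR B equals X AND Y.
wins :: Int -> Int -> Int -> Int -> Bool
wins x y a b = (a `xor` b) == (x .&. y)

-- Win rate of the classical "both always answer 0" strategy,
-- averaged over the four possible input pairs (x, y).
classicalWinRate :: Double
classicalWinRate =
  fromIntegral (length [ () | x <- [0, 1], y <- [0, 1], wins x y 0 0 ]) / 4  -- 0.75

-- The optimal quantum strategy wins with probability cos^2(pi/8), about 0.854.
quantumWinRate :: Double
quantumWinRate = cos (pi / 8) ^ 2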
You can read more on the CHSH game here.

Turning quadratic-time program into a linear-time one with help of monoid?

While I was reading a paper dedicated to the Yoneda lemma and its relationship with profunctor optics, I encountered the following statement:
...Cayley’s
Theorem for monoids (which is the trick that enables the use of an accumulating parameter, which
can often turn a quadratic-time program into a linear-time one)...
The part I am interested in is the trick ... quadratic-time... into a linear-time one. How does it work?
P.S. I am familiar with monoids and common math notation for them, so feel free to use it, if necessary or stick to Haskell.
Following the original paper by H. Bird, the leading example for that claim is list reversal for singly-linked lists, which can be defined as
reverse([a : x]) = append(reverse(x), [a])
In a direct implementation, appending a to the reverse of the tail x requires n-1 lookup operations to find the end, on top of the operations needed for reverse(x), so the total effort is (n-1)+...+2+1 = n*(n-1)/2.
The linear implementation uses the asymmetric complexity of the append operation: append(x,y) has a cost proportional to the length of x, while the length of y plays no role. Partially applied, append is an endomorphism on the space of lists, append(x) y = append(x,y).
Now represent the reversed list as the result of a composition of these endomorphisms
reverse([a1,a2,...,an])=append(an) ... append(a2) append(a1) []
from which the list reconstruction is a linear-cost operation. The previously quadratic "main" cost is "hidden" in the management of the stack of operations. However, this stack is in the end not really needed, as the reconstruction of the resulting list can start with the extraction of the first element. This requires an "accumulating parameter"; in the same wild pseudo-code:
reverse(x) = reverse_recursion(x,[])
where
reverse_recursion([a : x], y) = reverse_recursion(x, [a : y])
with
reverse_recursion([], y) = y
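Since the question allows Haskell, here is my own sketch of both versions; reverseAsEndo is the Cayley representation (each element becomes the endomorphism that prepends it, and list concatenation becomes function composition), and reverseAcc is the same computation with the composition unrolled into an accumulating parameter:

-- Quadratic: each (++) walks the already-built prefix again,
-- giving (n-1) + (n-2) + ... + 1 steps overall.
reverseNaive :: [a] -> [a]
reverseNaive []      = []
reverseNaive (a : x) = reverseNaive x ++ [a]

-- Cayley's trick: represent the reversed list by an endomorphism on lists.
-- Composing endomorphisms costs O(1), unlike appending lists.
reverseAsEndo :: [a] -> ([a] -> [a])
reverseAsEndo []      = id
reverseAsEndo (a : x) = reverseAsEndo x . (a :)

-- Applying the composed endomorphism to [] rebuilds the reversed list in O(n).
reverseLinear :: [a] -> [a]
reverseLinear xs = reverseAsEndo xs []

-- The same computation with an explicit accumulating parameter.
reverseAcc :: [a] -> [a]
reverseAcc xs = go xs []
  where
    go []      acc = acc
    go (a : x) acc = go x (a : acc)

Here reverseAsEndo builds the composition append(an) . ... . append(a1) from above, so reverseLinear [1,2,3] and reverseAcc [1,2,3] both return [3,2,1] in linear time.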

Prove that all P problems except {} and {a,b}* are complete

It is easy to see that {} and {a,b}* are not P-complete, because other problems in P can't be reduced to them: {} accepts nothing and {a,b}* rejects nothing, so no reduction function can map yes-instances and no-instances correctly.
But I'm stuck on proving that every other problem in P is P-complete.
You have to be careful when talking about P-completeness because this means different things to different people based on what type of reductions you're allowing. I'm going to assume that you're talking about using polynomial-time reductions. In that case, choose any language L ∈ P other than ∅ or {a, b}*. Now pick any language M in P that you like. Here's a silly reduction from M to L:
Given an input string w, decide whether w ∈ M in polynomial time (this is possible because M ∈ P).
If w ∈ M, output any fixed string in L that you'd like (at least one exists because L is nonempty).
Otherwise, w ∉ M, so output any fixed string not in L that you'd like (at least one exists because L isn't {a, b}*).
This reduction takes polynomial time because each step takes polynomial time, so it's a polynomial-time reduction from an arbitrary P language to L. Therefore, L is P-complete with respect to polynomial-time reductions.
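In code the reduction is almost trivial; here is a small Haskell sketch where all names are illustrative (decideM stands for the assumed polynomial-time decider for M, and yesL/noL are the two fixed witness strings):

-- A polynomial-time reduction from M to L, given a polynomial-time decider
-- for M, one fixed string in L, and one fixed string outside L.
reduceToL :: (String -> Bool)  -- decider for M, assumed to run in polynomial time
          -> String            -- some string in L (exists: L is nonempty)
          -> String            -- some string not in L (exists: L is not {a,b}*)
          -> String            -- input instance w of M
          -> String            -- output instance of L
reduceToL decideM yesL noL w
  | decideM w = yesL
  | otherwise = noL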
Generally speaking, when you talk about notions of completeness, you have to make sure that your reductions are given fewer computational resources than the class of solvers that you're using, or you can do weird things like what's described here that make reductions essentially useless.

Example channelling constraints ECLiPSe

Can someone provide a simple example of channelling constraints?
Channelling constraints are used to combine viewpoints of a constraint problem. Handbook of Constraint Programming gives a good explanation of how it works and why it can be useful:
The search variables can be the variables of one of the viewpoints, say X1 (this is discussed further below). As
search proceeds, propagating the constraints C1 removes values from the domains of the
variables in X1. The channelling constraints may then allow values to be removed from
the domains of the variables in X2. Propagating these value deletions using the constraints
of the second model, C2, may remove further values from these variables, and again these
removals can be translated back into the first viewpoint by the channelling constraints. The
net result can be that more values are removed within viewpoint V1 than by the constraints
C1 alone, leading to reduced search.
I do not understand how this is implemented. What are these constraints exactly, and what do they look like in a real problem? A simple example would be very helpful.
As stated in Dual Viewpoint Heuristics for Binary Constraint Satisfaction Problems (P.A. Geelen):
Channelling constraints of two different models allows for the expression of a relationship between two sets of variables, one of each model.
This implies assignments in one of the viewpoints can be translated into assignments in the other and vice versa, as well as, when search initiates,
excluded values from one model can be excluded from the other as well.
Let me throw in an example I implemented a while ago while writing a Sudoku solver.
Classic viewpoint
Here we interpret the problem in the same way a human would: using the
integers between 1 and 9 and a definition that all rows, columns and blocks must contain every integer exactly once.
We can easily state this in ECLiPSe using something like:
% Domain
dim(Sudoku,[N,N]),
Sudoku[1..N,1..N] :: 1..N
% For X = rows, cols, blocks
alldifferent(X)
And this is already sufficient to solve the Sudoku puzzle.
Binary boolean viewpoint
One could however choose to represent the integers by binary boolean arrays (as shown in the answer by jschimpf). In case it's not clear what this does, consider the small example below (this is built-in functionality!):
?- ic_global:bool_channeling(Digit, [0,0,0,1,0], 1).
Digit = 4
Yes (0.00s cpu)
?- ic_global:bool_channeling(Digit, [A,B,C,D], 1), C = 1.
Digit = 3
A = 0
B = 0
C = 1
D = 0
Yes (0.00s cpu)
If we use this model to represent a Sudoku, every number will be replaced by its binary boolean array and corresponding constraints can be written. Since they are trivial, I will not include all the code for the constraints in this answer, but note that a single sum constraint is already enough to solve a Sudoku puzzle in its binary boolean representation.
Channelling
Having these two viewpoints with corresponding constrained models now gives the opportunity to channel between them and see if any improvements were made.
Since both models are still just an NxN board, no difference in dimension of representation exists and channelling becomes really easy.
Let me first give you one last example of what a block filled with integers 1..9 would look like in both of our models:
% Classic viewpoint
1 2 3
4 5 6
7 8 9
% Binary Boolean Viewpoint
[](1,0,0,0,0,0,0,0,0) [](0,1,0,0,0,0,0,0,0) [](0,0,1,0,0,0,0,0,0)
[](0,0,0,1,0,0,0,0,0) [](0,0,0,0,1,0,0,0,0) [](0,0,0,0,0,1,0,0,0)
[](0,0,0,0,0,0,1,0,0) [](0,0,0,0,0,0,0,1,0) [](0,0,0,0,0,0,0,0,1)
We now clearly see the link between the models and simply write the code to channel our decision variables. Using Sudoku and BinBools as our boards, the code would look something like:
( multifor([Row,Col],1,N), param(Sudoku,BinBools,N)
do
Value is Sudoku[Row,Col],
ValueBools is BinBools[Row,Col,1..N],
ic_global:bool_channeling(Value,ValueBools,1)
).
At this point, we have a channelled model where, during search, if values are pruned in one of the models, its impact will also occur in the other model. This can then of course lead to further overall constraint propagation.
Reasoning
To explain the usefulness of the binary boolean model for the Sudoku puzzle, we must first differentiate between some of the alldifferent/1 implementations provided by ECLiPSe: the ic library's version only performs forward checking, while the ic_global library's version maintains bounds consistency.
What this means in practice can be shown as follows:
?- [A, B, C] :: [0..1], ic:alldifferent([A, B, C]).
A = A{[0, 1]}
B = B{[0, 1]}
C = C{[0, 1]}
There are 3 delayed goals.
Yes (0.00s cpu)
?- [A, B, C] :: [0..1], ic_global:alldifferent([A, B, C]).
No (0.00s cpu)
Since no assignment has occurred yet, the forward checking version (ic library) does not detect the invalidity of the query, whereas the bounds consistent version notices it immediately. This behaviour can lead to considerable differences in constraint propagation while searching and backtracking through highly constrained models.
On top of these two libraries there is the ic_global_gac library intended for global constraints for which generalized arc consistency (also called hyper arc consistency or domain consistency) is maintained. This alldifferent/1 constraint provides even more pruning opportunities than the bounds consistent one, but preserving full domain consistency has its cost and using this library in highly constrained models generally leads to a loss in running performance.
Because of this, I found it interesting for the Sudoku puzzle to try and work with the bounds consistent (ic_global) implementation of alldifferent to maximise performance and subsequently try to approach domain consistency myself by channelling the binary boolean model.
Experiment results
Below are the backtrack results for the 'platinumblonde' Sudoku puzzle (referenced as being the hardest, most chaotic Sudoku puzzle to solve in The Chaos Within Sudoku, M. Ercsey-Ravasz and Z. Toroczkai) using respectively forward checking, bounds consistency, domain consistency, the standalone binary boolean model and finally, the channelled model:
     (ic)   (ic_global)   (ic_global_gac)   (bin_bools)   (channelled)
BT   6582   43            29                143           30
As we can see, our channelled model (using bounds consistency (ic_global)) still needs one backtrack more than the domain consistent implementation, but it definitely performs better than the standalone bounds consistent version.
Now let us take a look at the running times (averaged over multiple executions to avoid outliers), excluding the forward checking implementation as it has proven to be uncompetitive for solving Sudoku puzzles:
           (ic_global)   (ic_global_gac)   (bin_bools)   (channelled)
Time (ms)  180           510               100           220
Looking at these results, I think we can successfully conclude the experiment (these results were confirmed by 20+ other Sudoku puzzle instances):
Channelling the binary boolean viewpoint to the bounds consistent standalone implementation produces a slightly less strong constraint propagation behaviour than that of the domain consistent standalone implementation, but with running times ranging from just as long to notably faster.
EDIT: attempt to clarify
Imagine some domain variable representing a cell on a Sudoku board has a remaining domain of 4..9. Using bounds consistency, it is guaranteed that for both value 4 and 9 other domain values can be found which satisfy all constraints and thus provides consistency. However, no consistency is explicitly guaranteed for other values in the domain (this is what 'domain consistency' is).
Using a binary boolean model, we define the following two sum constraints:
The sum of every binary boolean array is always equal to 1
The sum of every N'th element of every array in every row/col/block is always equal to 1
The extra constraint strength is enforced by the second constraint which, apart from constraining rows, columns and blocks, also implicitly says: "every cell can only contain every digit once". This behaviour is not actively expressed when using just the bounds consistent alldifferent/1 constraint!
Conclusion
It is clear that channelling can be very useful to improve a standalone constrained model; however, if the new model's constraint strength is weaker than that of the current model, obviously no improvements will be made. Also note that a more constrained model doesn't necessarily mean an overall better performance! Adding more constraints will in fact decrease the number of backtracks required to solve a problem, but it might also increase the running time of your program if more constraint checks have to occur.
Channeling constraints are used when, in a model, aspects of a problem are represented in more than one way. They are then necessary to synchronize these multiple representations, even though they do not themselves model an aspect of the problem.
Typically, when modelling a problem with constraints, you have several ways of choosing your variables. For example, in a scheduling problem, you could choose to have
an integer variable for each job (indicating which machine does the job)
an integer variable for each machine (indicating which job it performs)
a matrix of Booleans (indicating which job runs on which machine)
or something more exotic
In a simple enough problem, you choose the representation that makes it easiest to formulate the constraints of the problem. However, in real life problems with many heterogeneous constraints it is often impossible to find such a single best representation: some constraints are best represented with one type of variable, others with another.
In such cases, you can use multiple sets of variables, and formulate each individual problem constraint over the most convenient variable set. Of course, you then end up with multiple independent subproblems, and solving these in isolation will not give you a solution for the whole problem. But by adding channeling constraints, the variable sets can be synchronized, and the subproblems thus re-connected. The result is then a valid model for the whole problem.
As hinted in the quote from the handbook, in such a formulation it is sufficient to perform search on only one of the variable sets ("viewpoints"), because the values of the others are implied by the channeling constraints.
Some common examples for channeling between two representations are:
Integer variable and Array of Booleans:
Consider an integer variable T indicating the time slot 1..N when an event takes place, and an array of Booleans Bs[N] such that Bs[T] = 1 iff an event takes place in time slot T. In ECLiPSe:
T #:: 1..N,
dim(Bs, [N]), Bs #:: 0..1,
Channeling between the two representations can then be set up with
( for(I,1,N), param(T,Bs) do Bs[I] #= (T#=I) )
which will propagate information both ways between T and Bs. Another way of implementing this channeling is the special purpose bool_channeling/3 constraint.
Start/End integer variables and Array of Booleans (timetable):
We have integer variables S,E indicating the start and end time of an activity. On the other side an array of Booleans Bs[N] such that Bs[T] = 1 iff the activity takes place at time T. In ECLiPSe:
[S,E] #:: 1..N,
dim(Bs, [N]), Bs #:: 0..1,
Channeling can be achieved via
( for(I,1,N), param(S,E,Bs) do Bs[I] #= (S#=<I and I#=<E) ).
Dual representation Job/Machine integer variables:
Here, Js[J] = M means that job J is executed on machine M, while the dual formulation Ms[M] = J means that machine M executes job J:
dim(Js, [NJobs]), Js #:: 0..NMach,
dim(Ms, [NMach]), Ms #:: 1..NJobs,
And channeling is achieved via
( multifor([J,M],1,[NJobs,NMach]), param(Js,Ms) do
(Js[J] #= M) #= (Ms[M] #= J)
).
Set variable and Array of Booleans:
If you use a solver (such as library(ic_sets)) that can directly handle set-variables, these can be reflected into an array of booleans indicating membership of elements in the set. The library provides a dedicated constraint membership_booleans/2 for this purpose.
Here is a simple example; it works in SWI-Prolog, but should also work in ECLiPSe Prolog (in the latter you have to use (::)/2 instead of (in)/2):
Constraint C1:
?- Y in 0..100.
Y in 0..100.
Constraint C2:
?- X in 0..100.
X in 0..100.
Channelling Constraint:
?- 2*X #= 3*Y+5.
2*X#=3*Y+5.
All together:
?- Y in 0..100, X in 0..100, 2*X #= 3*Y+5.
Y in 1..65,
2*X#=3*Y+5,
X in 4..100.
So the channelling constraint works in both directions: it reduces the domain of Y (constraint C1) as well as the domain of X (constraint C2).
Some systems use iterative methods, with the result that this channelling can take quite some time; here is an example which needs around 1 minute to compute in SWI-Prolog:
?- time(([U,V] ins 0..1_000_000_000, 36_641*U-24 #= 394_479_375*V)).
% 9,883,559 inferences, 53.616 CPU in 53.721 seconds
(100% CPU, 184341 Lips)
U in 346688814..741168189,
36641*U#=394479375*V+24,
V in 32202..68843.
On the other hand ECLiPSe Prolog does it in a blink:
[eclipse]: U::0..1000000000, V::0..1000000000,
36641*U-24 #= 394479375*V.
U = U{346688814 .. 741168189}
V = V{32202 .. 68843}
Delayed goals:
-394479375 * V{32202 .. 68843} +
36641 * U{346688814 .. 741168189} #= 24
Yes (0.11s cpu)
Bye

Herbrand universe and Least herbrand Model

I read the question asked in Herbrand universe, Herbrand Base and Herbrand Model of binary tree (prolog) and the answers given, but I have a slightly different question, more like a confirmation, and hopefully my confusion will be clarified.
Let P be a program such that we have the following facts and rule:
q(a, g(b)).
q(b, g(b)).
q(X, g(X)) :- q(X, g(g(g(X)))).
From the above program, the Herbrand Universe
Up = {a, b, g(a), g(b), q(a, g(a)), q(a, g(b)), q(b, g(a)), q(b, g(b)), g(g(a)), g(g(b)), etc.}
Herbrand base:
Bp = {q(s, t) | s, t ∈ Up}
Now coming to my question (forgive me for my ignorance): I included q(a, g(a)) as an element in my Herbrand Universe, but the facts only state q(a, g(b)). Does that mean that q(a, g(a)) is not supposed to be there?
Also, since Herbrand models are subsets of the Herbrand base, how do I determine the least Herbrand model by induction?
Note: I have done a lot of research on this, and some parts are quite clear to me, but I still have this doubt, which is why I want to seek the community's opinion. Thank you.
From having the fact q(a,g(b)) you cannot conclude whether or not q(a,g(a)) is in the model. You will have to generate the model first.
For determining the model, start with the facts {q(a,g(b)), q(b,g(b))} and now try to apply your rules to extend it. In your case, however, there is no way to match the right-hand side of the rule q(X,g(X)) :- q(X,g(g(g(X)))) to the above facts. Therefore, you are done.
Now imagine the rule
q(a,g(Y)) :- q(b,Y).
This rule could be used to extend our set. In fact, the instance
q(a,g(g(b))) :- q(b,g(b)).
is used: If q(b,g(b)) is present, conclude q(a,g(g(b))). Note that we are using here the rule right-to-left. So we obtain
{q(a,g(b)), q(b,g(b)), q(a,g(g(b)))}
thereby reaching a fixpoint.
Now take as another example you suggested the rule
q(X, g(g(g(X)))) :- q(X, g(X)).
Which permits (I will no longer show the instantiated rule) to generate in one step:
{q(a,g(b)), q(b,g(b)), q(a,g(g(g(b)))), q(b, g(g(g(b))))}
But this is not the end, since, again, the rule can be applied to produce even more! In fact, you have now an infinite model!
{q(a, g^(n+1)(b)), q(b, g^(n+1)(b))}
This right-to-left reading is often very helpful when you are trying to understand recursive rules in Prolog. The top-down reading (left-to-right) is often quite difficult, in particular, since you have to take into account backtracking and general unification.
Concerning your question:
"Also since the Herbrand models are subset of the Herbrand base, how do i determine the least Herbrand model by induction?"
If you have a set P of Horn clauses, a definite program, then you can define
a program operator:
T_P(M) := { H S | S is a ground substitution, (H :- B) in P and all atoms of B S are in M }
The least model is:
inf(P) := intersect { M | M |= P }
Please note that not all models of a definite program are fixpoints of the
program operator. For example, the full Herbrand base is always a model of
the program P, which shows that definite programs are always consistent, but
it is not necessarily a fixpoint.
On the other hand each fixpoint of the program operator is a model of the
definite program. Namely if you have T_P(M) = M, then one can conclude
M |= P. So that after some further mathematical reasoning(*) one finds that
the least fixpoint is also the least model:
lfp(T_P) = inf(P)
But we need some further considerations before we can say that the least
model can be determined by a kind of computation. Namely, one easily observes that the
program operator is continuous, i.e. it preserves unions of chains, since
Horn clauses do not have universal quantifiers in their bodies:
union_i T_P(M_i) = T_P(union_i M_i)
So that again, after some further mathematical reasoning(*), one finds that we can
compute the least fixpoint via iteration, which can be used for simple
induction. Every element of the least model has a simple derivation of finite
depth:
union_i T_P^i({}) = lfp(T_P)
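For the program P from the question this iteration stabilizes immediately, consistent with the first answer above:
T_P^1({}) = {q(a, g(b)), q(b, g(b))}
T_P^2({}) = T_P^1({})
since the rule body q(X, g(g(g(X)))) has no instance in that set, so the least Herbrand model is exactly {q(a, g(b)), q(b, g(b))}.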
Bye
(*)
Most likely you find further hints on the exact mathematical reasoning
needed in this book, but unfortunately I can't recall which sections
are relevant:
Foundations of Logic Programming, John Wylie Lloyd, 1984
http://www.amazon.de/Foundations-Programming-Computation-Artificial-Intelligence/dp/3642968287
