Standard term order (ISO/IEC 13211-1 7.2 Term order) is defined over all terms — including variables. While there are good uses for this — think of the implementation of setof/3, this makes many otherwise clean and logical uses of the built-ins in 8.4 Term comparison a declarative nightmare with imps (short form for imperative constructs) all around. 8.4 Term comparison features:
8.4 Term comparison
8.4.1 (@=<)/2, (==)/2, (\==)/2, (@<)/2, (@>)/2, (@>=)/2.
8.4.2 compare/3.
8.4.3 sort/2.
8.4.4 keysort/2.
To give an example, consider:
?- X @< a.
true.
This succeeds, because
7.2 Term order
An ordering term_precedes (3.181) defines whether or
not a term X term-precedes a term Y.
If X and Y are identical terms then X term_precedes Y
and Y term_precedes X are both false.
If X and Y have different types: X term_precedes Y iff the
type of X precedes the type of Y in the following order:
variable precedes floating point precedes integer
precedes atom precedes compound.
NOTE — Built-in predicates which test the ordering of terms
are defined in 8.4.
...
And thus all variables are smaller than a. But once X is instantiated:
?- X @< a, X = a.
X = a.
the result becomes invalid.
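The effect can be reproduced with two tiny predicates that differ only in goal order. The names before_binding/0 and after_binding/0 are made up for this illustration:

```prolog
% Comparison first, binding second: succeeds, because the unbound X
% precedes every atom in the standard order.
before_binding :-
    X @< a,
    X = a.

% Binding first, comparison second: fails, because a @< a is false.
after_binding :-
    X = a,
    X @< a.
```

?- before_binding. succeeds while ?- after_binding. fails, although both bodies are the same conjunction read declaratively.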
So that is the problem. To overcome this, one might either use constraints, or stick to core behavior only and therefore produce an instantiation_error.
7.12.2 Error classification
Errors are classified according to the form of Error_term:
a) There shall be an Instantiation Error when an
argument or one of its components is a variable, and an
instantiated argument or component is required. It has
the form instantiation_error.
In this manner we know for sure that a result is well defined as long as no instantiation error occurs.
For (\==)/2, there is already either dif/2 which uses constraints or dif_si/2 (formerly iso_dif/2) which produces a clean instantiation error.
dif_si(X, Y) :-
X \== Y,
( X \= Y -> true
; throw(error(instantiation_error,dif_si/2))
).
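To make the three possible outcomes of dif_si/2 visible, here is a small sketch. The definition is repeated so that the block loads on its own; the wrapper dif_si_outcome/3 and the outcome atom inst_error are assumptions introduced only for this example:

```prolog
dif_si(X, Y) :-
    X \== Y,
    (   X \= Y -> true
    ;   throw(error(instantiation_error,dif_si/2))
    ).

% dif_si_outcome(X, Y, Outcome): Outcome is true, false or inst_error.
dif_si_outcome(X, Y, Outcome) :-
    catch(( dif_si(X, Y) -> Outcome = true ; Outcome = false ),
          error(instantiation_error, _),
          Outcome = inst_error).
```

For instance, dif_si_outcome(f(_,a), f(b,c), O) gives O = true (a and c differ whatever the variable becomes), dif_si_outcome(f(X), f(X), O) gives O = false, and dif_si_outcome(_, a, O) gives O = inst_error.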
So what my question is about: How to define (and name) the corresponding safe term comparison predicates in ISO Prolog? Ideally, without any explicit term traversal. Maybe to clarify: Above dif_si/2 does not use any explicit term traversal. Both (\==)/2 and (\=)/2 traverse the term internally, but the overheads for this are extremely low compared to explicit traversal with (=..)/2 or functor/3, arg/3.
iso_dif/2 is much simpler to implement than a comparison:
The built-in \= operator is available.
You know exactly what arguments to provide to \=.
Definition
Based on your comments, safe comparison means that the order won't change if variables in both subterms are instantiated. If we name the comparison lt, we have for example:
lt(a(X), b(Y)) : always holds for any X and Y, because a @< b
lt(a(X), a(Y)) : we don't know for sure: instantiation_error
lt(a(X), a(X)) : always fails, because X @< X fails
As said in the comments, you want to throw an error if, when doing a side-by-side traversing of both terms, the first (potentially) discriminating pair of terms contains:
two non-identical variables (lt(X,Y))
a variable and a non-variable (lt(X,a), or lt(10,Y))
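The examples above can be written down as an executable checklist, which is handy for comparing candidate implementations. This is only a sketch: spec_case/3, outcome/2 and failing_cases/2 are hypothetical helper names, and the checker assumes that errors are thrown in the ISO form error(E, _).

```prolog
% spec_case(X, Y, Expected): expected outcome of lt(X, Y).
spec_case(a(_), b(_), true).                         % a @< b decides
spec_case(a(_), a(_), error(instantiation_error)).   % two distinct variables
spec_case(a(X), a(X), false).                        % identical terms
spec_case(_,    a,    error(instantiation_error)).   % variable vs. non-variable
spec_case(10,   _,    error(instantiation_error)).   % non-variable vs. variable

% outcome(Goal, Outcome): run Goal once and classify the result.
outcome(Goal, Outcome) :-
    catch(( call(Goal) -> Outcome = true ; Outcome = false ),
          error(E, _),
          Outcome = error(E)).

% failing_cases(Pred, Failures): the cases where the binary predicate
% Pred deviates from the expectations listed above.
failing_cases(Pred, Failures) :-
    findall(case(X, Y, Expected, Got),
            (   spec_case(X, Y, Expected),
                outcome(call(Pred, X, Y), Got),
                Got \== Expected
            ),
            Failures).
```

A candidate lt/2 that agrees with these examples yields Failures = [] for ?- failing_cases(lt, Failures). Plain (@<)/2 does not: it reports at least the two cases with a variable on one side.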
But first, let's review the possible approaches that you don't want to use:
Define an explicit term-traversal comparison function. I know you'd prefer not to, for performance reasons, but this is still the most straightforward approach. I'd recommend doing it anyway, so that you have a reference implementation to compare against other approaches.
Use constraints to have a delayed comparison: I don't know how to do it using ISO Prolog, but with e.g. ECLiPSe, I would suspend the actual comparison over the set of uninstantiated variables (using term_variables/2) until there are no more variables. Previously, I also suggested using the coroutine/0 predicate, but I overlooked the fact that it does not influence the @< operator (only <).
This approach does not address exactly the same issue as you describe, but it is very close. One advantage is that it does not throw an exception if the eventual values given to variables satisfy the comparison, whereas lt throws one when it doesn't know in advance.
Explicit term traversal (reference implementation)
Here is an implementation of the explicit term traversal approach for lt, the safe version of @<.
Please review it to check if this is what you expect. I might have missed some cases. I am not sure if this conforms to ISO Prolog, but that can be fixed too, if you want.
lt(X,Y) :- X == Y,!,
fail.
lt(X,Y) :- (var(X);var(Y)),!,
throw(error(instantiation_error,lt/2)).
lt(X,Y) :- atomic(X),atomic(Y),!,
X @< Y.
lt([XH|XT],[YH|YT]) :- !,
(XH == YH ->
lt(XT,YT)
; lt(XH,YH)).
lt(X,Y) :-
functor(X,_,XA),
functor(Y,_,YA),
(XA == YA ->
X =.. XL,
Y =.. YL,
lt(XL,YL)
; XA < YA).
(Edit: taking into account Tudor Berariu's remarks: (i) missing var/var error case, (ii) order by arity first; moreover, fixing (i) allows me to remove subsumes_term for lists. Thanks.)
Implicit term traversal (not working)
Here is my attempt to achieve the same effect without destructuring terms.
every([],_).
every([X|L],X) :-
every(L,X).
lt(X,Y) :-
copy_term(X,X2),
copy_term(Y,Y2),
term_variables(X2,VX),
term_variables(Y2,VY),
every(VX,1),
every(VY,0),
(X @< Y ->
(X2 @< Y2 ->
true
; throw(error(instantiation_error,lt/2)))
; (X2 @< Y2 ->
throw(error(instantiation_error,lt/2))
; false)).
Rationale
Suppose that X @< Y succeeds.
We want to check that the relation does not depend on some uninstantiated variables.
So, I produce respective copies X2 and Y2 of X and Y, where all variables are instantiated:
In X2, variables are unified with 1.
In Y2, variables are unified with 0.
So, if the relation X2 @< Y2 still holds, we know that we don't rely on the standard term ordering between variables. Otherwise, we throw an exception, because it means that a 1 @< 0 comparison, which previously did not occur, made the relation fail.
Shortcomings
(based on OP's comments)
lt(X+a,X+b) should succeed but produces an error.
At first sight, one may think that unifying variables that occur in both terms with the same value, say val, may fix the situation. However, there might be other occurrences of X in the compared terms where this leads to an erroneous judgment.
lt(X,3) should produce an error but succeeds.
In order to fix that case, one should unify X with something that is greater than 3. In the general case, X should take a value that is greater than any other possible term [1]. Practical limitations aside, the @< relation has no maximum: compound terms are greater than non-compound ones, and by definition, compound terms can be made arbitrarily great.
So, that approach is not conclusive and I don't think it can be corrected easily.
1: Note that for any given term, however, we could find the locally maximal and minimal terms, which would be sufficient for the purpose of the question.
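The claim that the standard order has no maximum can be illustrated with a short sketch. The predicate name greater_term/2 and the functor name z are arbitrary choices for this example; the construction only works while the arity stays below the system's max_arity limit.

```prolog
% greater_term(+T, -G): G is a compound term that follows T in the
% standard order. Compound terms are ordered by arity first, so one
% extra argument is enough; every non-compound term precedes G anyway.
greater_term(T, G) :-
    (   compound(T)
    ->  functor(T, _, N)
    ;   N = 0
    ),
    N1 is N + 1,
    functor(G, z, N1).
```

For example, ?- greater_term(foo(a,b), G), G @> foo(a,b). succeeds with G = z(_,_,_), and the step can be repeated on G itself.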
Third try! Developed and tested with GNU Prolog 1.4.4.
Exhibit 'A': "as simple as it gets"
lt(X,Y) :-
X \== Y,
( X \= Y
-> alpha_omega(Alpha,Omega),
term_variables(X+Y,Vars), % A
\+ \+ (label_vars(Vars,Alpha,Omega), X @< Y),
( \+ (label_vars(Vars,Alpha,Omega), X @> Y)
-> true
; throw(error(instantiation_error,lt/2))
)
; throw(error(instantiation_error,lt/2))
).
Exhibit 'B': "no need to label all vars"
lt(X,Y) :-
X \== Y,
( X \= Y
-> alpha_omega(Alpha,Omega),
term_variables(X,Xvars), % B
term_variables(Y,Yvars), % B
vars_vars_needed(Xvars,Yvars,Vars), % B
\+ \+ (label_vars(Vars,Alpha,Omega), X @< Y),
( \+ (label_vars(Vars,Alpha,Omega), X @> Y)
-> true
; throw(error(instantiation_error,lt/2))
)
; throw(error(instantiation_error,lt/2))
).
vars_vars_needed([], [], []).
vars_vars_needed([A|_], [], [A]).
vars_vars_needed([], [B|_], [B]).
vars_vars_needed([A|As],[B|Bs],[A|ABs]) :-
( A \== B
-> ABs = [B]
; vars_vars_needed(As,Bs,ABs)
).
Some shared code:
alpha_omega(Alpha,Omega) :-
Alpha is -(10.0^1000), % HACK!
functor(Omega,z,255). % HACK!
label_vars([],_,_).
label_vars([Alpha|Vs],Alpha,Omega) :- label_vars(Vs,Alpha,Omega).
label_vars([Omega|Vs],Alpha,Omega) :- label_vars(Vs,Alpha,Omega).
This is not a completely original answer, as it builds on @coredump's answer.
There is one type of query that lt/2 (the reference implementation doing explicit term traversal) fails to answer correctly:
| ?- lt(b(b), a(a,a)).
no
| ?- @<(b(b), a(a,a)).
yes
The reason is that the standard order of terms considers the arity before comparing functor names.
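A minimal check of the three levels at which compound terms are compared (arity, then name, then arguments from left to right). The predicate name arity_first_demo/0 is made up for this illustration:

```prolog
% Each goal succeeds under the standard order of terms (7.2).
arity_first_demo :-
    b(b)    @< a(a,a),    % arity 1 precedes arity 2, although b @> a
    a(z)    @< b(a),      % same arity: the functor name decides
    f(a, z) @< f(b, a).   % same arity and name: leftmost differing argument decides
```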
Second, lt/2 does not always throw an instantiation_error when it comes to comparing variables:
| ?- lt(a(X), a(Y)).
no
I write here another candidate for a reference explicit implementation:
lt(X,Y):- var(X), nonvar(Y), !, throw(error(instantiation_error)).
lt(X,Y):- nonvar(X), var(Y), !, throw(error(instantiation_error)).
lt(X,Y):-
var(X),
var(Y),
( X \== Y -> throw(error(instatiation_error)) ; !, false).
lt(X,Y):-
functor(X, XFunc, XArity),
functor(Y, YFunc, YArity),
(
XArity < YArity, !
;
(
XArity == YArity, !,
(
XFunc @< YFunc, !
;
XFunc == YFunc,
X =.. [_|XArgs],
Y =.. [_|YArgs],
lt_args(XArgs, YArgs)
)
)
).
lt_args([X1|OtherX], [Y1|OtherY]):-
(
lt(X1, Y1), !
;
X1 == Y1,
lt_args(OtherX, OtherY)
).
The predicate lt_args(Xs, Ys) is true when there is a pair of corresponding arguments Xi, Yi such that lt(Xi, Yi) and Xj == Yj for all the previous pairs Xj, Yj (for example lt_args([a,X,a(X),b|_], [a,X,a(X),c|_]) is true).
Some example queries:
| ?- lt(a(X,Y,c(c),_Z1), a(X,Y,b(b,b),_Z2)).
yes
| ?- lt(a(X,_Y1,c(c),_Z1), a(X,_Y2,b(b,b),_Z2)).
uncaught exception: error(instatiation_error)
What the heck! I'll give it a shot, too!
lt(X,Y) :-
X \== Y,
( X \= Y
-> term_variables(X,Xvars),
term_variables(Y,Yvars),
list_vars_excluded(Xvars,Yvars,XonlyVars),
list_vars_excluded(Yvars,Xvars,YonlyVars),
_ = s(T_alpha),
functor(T_omega,zzzzzzzz,255), % HACK!
copy_term(t(X,Y,XonlyVars,YonlyVars),t(X1,Y1,X1onlyVars,Y1onlyVars)),
copy_term(t(X,Y,XonlyVars,YonlyVars),t(X2,Y2,X2onlyVars,Y2onlyVars)),
maplist(=(T_alpha),X1onlyVars), maplist(=(T_omega),Y1onlyVars),
maplist(=(T_omega),X2onlyVars), maplist(=(T_alpha),Y2onlyVars),
% do T_alpha and T_omega have an impact on the order?
( compare(Cmp,X1,Y1),
compare(Cmp,X2,Y2)
-> Cmp = (<) % no: demand that X @< Y holds
; throw(error(instantiation_error,lt/2))
)
; throw(error(instantiation_error,lt/2))
).
Some more auxiliary stuff:
listHasMember_identicalTo([X|Xs],Y) :-
( X == Y
-> true
; listHasMember_identicalTo(Xs,Y)
).
list_vars_excluded([],_,[]).
list_vars_excluded([X|Xs],Vs,Zs) :-
( listHasMember_identicalTo(Vs,X)
-> Zs = Zs0
; Zs = [X|Zs0]
),
list_vars_excluded(Xs,Vs,Zs0).
Let's have some tests (with GNU Prolog 1.4.4):
?- lt(a(X,Y,c(c),Z1), a(X,Y,b(b,b),Z2)).
yes
?- lt(a(X,Y,b(b,b),Z1), a(X,Y,c(c),Z2)).
no
?- lt(a(X,Y1,c(c),Z1), a(X,Y2,b(b,b),Z2)).
uncaught exception: error(instantiation_error,lt/2)
?- lt(a(X,Y1,b(b,b),Z1), a(X,Y2,c(c),Z2)).
uncaught exception: error(instantiation_error,lt/2)
?- lt(b(b), a(a,a)).
yes
?- lt(a(X), a(Y)).
uncaught exception: error(instantiation_error,lt/2)
?- lt(X, 3).
uncaught exception: error(instantiation_error,lt/2)
?- lt(X+a, X+b).
yes
?- lt(X+a, Y+b).
uncaught exception: error(instantiation_error,lt/2)
?- lt(a(X), b(Y)).
yes
?- lt(a(X), a(Y)).
uncaught exception: error(instantiation_error,lt/2)
?- lt(a(X), a(X)).
no
Edit 2015-05-06
Changed the implementation of lt/2 to use T_alpha and T_omega, not two fresh variables.
lt(X,Y) makes two copies of X (X1 and X2) and two copies of Y (Y1 and Y2).
Shared variables of X and Y are also shared by X1 and Y1, and by X2 and Y2.
T_alpha comes before all other terms (in X1, X2, Y1, Y2) w.r.t. the standard order.
T_omega comes after all other terms in the standard order.
In the copied terms, the variables that are in X but not in Y (and vice versa) are unified with T_alpha / T_omega.
If this has an impact on term ordering, we cannot yet decide the ordering.
If it does not, we're done.
Now, the counterexample given by @false works:
?- lt(X+1,1+2).
uncaught exception: error(instantiation_error,lt/2)
?- X=2, lt(X+1,1+2).
no
Here is a sketch of what I believe might be a working approach. Consider the goal lt(X, Y) and term_variables(X, XVars), term_variables(Y, YVars).
The purpose of the definition is to determine whether or not a further instantiation might change the term order (7.2). So we might want to find out the responsible variables directly. Since term_variables/2 traverses a term in the very same way that is of relevance to term order, the following holds:
If there is an instantiation that changes the term order, then the variables that have to be instantiated to witness that change are in the list prefixes XCs, YCs of XVars and YVars respectively, and either
XCs, YCs, XVars, and YVars are identical, or
XCs and YCs are identical up to the last element, or
XCs and YCs are identical up to the end where one list has a further element, and the other list is identical to its corresponding variable list XVars or YVars.
As an interesting special case, if the first elements in XVars and YVars differ, then those are the only variables to be tested for relevance. So this includes the case where there is no common variable, but it is even more general than that.
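To see why the variable lists are useful here, recall that term_variables/2 collects variables depth-first and left to right, in order of first occurrence. A small sketch with made-up terms (vars_demo/2 is a hypothetical name):

```prolog
% X and Y agree everywhere except in the variables B and D.
vars_demo(XVars, YVars) :-
    X = f(A, g(B, A), C),
    Y = f(A, g(D, A), C),
    term_variables(X, XVars),   % [A, B, C]
    term_variables(Y, YVars).   % [A, D, C]
```

The two lists share the prefix [A] and first differ at B and D, which are exactly the variables whose instantiation can decide the order of X and Y in this example.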
Next! This should do better than my previous attempt:
lt(X,Y) :-
X \== Y,
( X \= Y
-> term_variables(X,Xvars),
term_variables(Y,Yvars),
T_alpha is -(10.0^1000), % HACK!
functor(T_omega,z,255), % HACK!
copy_term(t(X,Y,Xvars,Yvars),t(X1,Y1,X1vars,Y1vars)),
copy_term(t(X,Y,Xvars,Yvars),t(X2,Y2,X2vars,Y2vars)),
copy_term(t(X,Y,Xvars,Yvars),t(X3,Y3,X3vars,Y3vars)),
copy_term(t(X,Y,Xvars,Yvars),t(X4,Y4,X4vars,Y4vars)),
maplist(=(T_alpha),X1vars), maplist(maybe_unify(T_omega),Y1vars),
maplist(=(T_omega),X2vars), maplist(maybe_unify(T_alpha),Y2vars),
maplist(=(T_omega),Y3vars), maplist(maybe_unify(T_alpha),X3vars),
maplist(=(T_alpha),Y4vars), maplist(maybe_unify(T_omega),X4vars),
% do T_alpha and T_omega have an impact on the order?
( compare(Cmp,X1,Y1),
compare(Cmp,X2,Y2),
compare(Cmp,X3,Y3),
compare(Cmp,X4,Y4)
-> Cmp = (<) % no: demand that X @< Y holds
; throw(error(instantiation_error,lt/2))
)
; throw(error(instantiation_error,lt/2))
).
The auxiliary maybe_unify/2 deals with variables occurring in both X and Y:
maybe_unify(K,X) :-
( var(X)
-> X = K
; true
).
Checking with GNU-Prolog 1.4.4:
?- lt(a(X,Y,c(c),Z1), a(X,Y,b(b,b),Z2)).
yes
?- lt(a(X,Y,b(b,b),Z1), a(X,Y,c(c),Z2)).
no
?- lt(a(X,Y1,c(c),Z1), a(X,Y2,b(b,b),Z2)).
uncaught exception: error(instantiation_error,lt/2)
?- lt(a(X,Y1,b(b,b),Z1), a(X,Y2,c(c),Z2)).
uncaught exception: error(instantiation_error,lt/2)
?- lt(b(b), a(a,a)).
yes
?- lt(a(X), a(Y)).
uncaught exception: error(instantiation_error,lt/2)
?- lt(X, 3).
uncaught exception: error(instantiation_error,lt/2)
?- lt(X+a, X+b).
yes
?- lt(X+a, Y+b).
uncaught exception: error(instantiation_error,lt/2)
?- lt(a(X), b(Y)).
yes
?- lt(a(X), a(Y)).
uncaught exception: error(instantiation_error,lt/2)
?- lt(a(X), a(X)).
no
?- lt(X+1,1+2).
uncaught exception: error(instantiation_error,lt/2)
?- lt(X+X+2,X+1+3). % NEW
uncaught exception: error(instantiation_error,lt/2)
In this answer we present the predicate safe_term_less_than/2, a monotonic analogue to the ISO Prolog built-in predicate (@<)/2 (§8.4.1, "term less than"). Its main properties are:
Explicit traversal of recursive terms.
Based on Prolog coroutining facilities, in particular when/2.
The comparison may progress gradually:
"freeze" whenever instantiation is not sufficient
"wake up" whenever the instantiation of the most significant terms changes
The current frontline of the comparison is represented as an explicit (LIFO) stack.
The current state is directly passed around the residual goals.
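For readers unfamiliar with when/2, the freeze and wake-up behaviour can be seen in a much simpler setting. The following is only a sketch and not part of the answer's code: delayed_less/2 is a hypothetical name, when/2 is not ISO Prolog (it is assumed here as provided by systems such as SICStus Prolog or SWI-Prolog), and the predicate looks at the outermost terms only, so it is meaningful for atomic arguments.

```prolog
% delayed_less(X, Y): postpone X @< Y until both arguments are bound.
% Until then the goal stays suspended as a residual goal.
delayed_less(X, Y) :-
    when((nonvar(X), nonvar(Y)), X @< Y).
```

With this definition, ?- delayed_less(X, 3), X = 2. succeeds and ?- delayed_less(X, 3), X = 4. fails, regardless of whether the binding comes before or after the comparison goal.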
The following code has been developed and tested on SICStus Prolog version 4.3.2:
safe_term_less_than(L, R) :- % exported predicate
i_less_than_([L-R]).
The above definition of safe_term_less_than/2 is based on the following auxiliary predicates:
i_less_than_([L-R|LRs]) :-
Cond = (?=(L,R) ; nonvar(L),nonvar(R)),
when(Cond, i_lt_step_(L,R,LRs)).
i_lt_step_(L, R, LRs) :-
( L == R
-> i_less_than_(LRs)
; term_itype(L, L_type),
term_itype(R, R_type),
compare(Ord, L_type, R_type),
ord_lt_step_(Ord, L, R, LRs)
).
term_itype(V, T) :-
( var(V) -> throw(error(instantiation_error,_))
; float(V) -> T = t1_float(V)
; integer(V) -> T = t2_integer(V)
; callable(V) -> T = t3_callable(A,F), functor(V, F, A)
; throw(error(system_error,_))
).
ord_lt_step_(<, _, _, _).
ord_lt_step_(=, L, R, LRs) :-
( compound(L)
-> L =.. [_|Ls],
R =.. [_|Rs],
phrase(args_args_paired(Ls,Rs), LRs0, LRs),
i_less_than_(LRs0)
; i_less_than_(LRs)
).
args_args_paired([], []) --> [].
args_args_paired([L|Ls], [R|Rs]) --> [L-R], args_args_paired(Ls, Rs).
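The role of term_itype/2 is easier to see in isolation: it maps the outermost layer of a term to a key whose standard order mirrors the type order of 7.2 (and, for callable terms, arity before name). The sketch below repeats term_itype/2 so that it loads on its own; itype_order/3 is a hypothetical helper added only for this illustration.

```prolog
term_itype(V, T) :-
    (   var(V)      -> throw(error(instantiation_error,_))
    ;   float(V)    -> T = t1_float(V)
    ;   integer(V)  -> T = t2_integer(V)
    ;   callable(V) -> T = t3_callable(A,F), functor(V, F, A)
    ;   throw(error(system_error,_))
    ).

% itype_order(+L, +R, -Ord): order of the outermost type keys only;
% arguments of compound terms are not inspected.
itype_order(L, R, Ord) :-
    term_itype(L, LT),
    term_itype(R, RT),
    compare(Ord, LT, RT).
```

For example, itype_order(1.0, 0, O) and itype_order(b(b), a(a,a), O) both give O = (<), while itype_order(f(a), f(b), O) gives O = (=), which is the case where the comparison has to descend into the arguments.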
Sample queries:
| ?- safe_term_less_than(X, 3).
prolog:trig_nondif(X,3,_A,_B),
prolog:trig_or([_B,X],_A,_A),
prolog:when(_A,(?=(X,3);nonvar(X),nonvar(3)),user:i_lt_step_(X,3,[])) ?
yes
| ?- safe_term_less_than(X, 3), X = 4.
no
| ?- safe_term_less_than(X, 3), X = 2.
X = 2 ? ;
no
| ?- safe_term_less_than(X, a).
prolog:trig_nondif(X,a,_A,_B),
prolog:trig_or([_B,X],_A,_A),
prolog:when(_A,(?=(X,a);nonvar(X),nonvar(a)),user:i_lt_step_(X,a,[])) ? ;
no
| ?- safe_term_less_than(X, a), X = a.
no
| ?- safe_term_less_than(X+2, Y+1), X = Y.
no
In comparison to previous answers, we observe:
The "text volume" of residual goals appears kind of "bloated".
The query ?- safe_term_less_than(X+2, Y+1), X = Y. fails—just like it should!
This answer follows up on my previous one which presented safe_term_less_than/2.
What's next? A safe variant of compare/3—unimaginatively called scompare/3:
scompare(Ord, L, R) :-
i_scompare_ord([L-R], Ord).
i_scompare_ord([], =).
i_scompare_ord([L-R|Ps], X) :-
when((?=(L,R);nonvar(L),nonvar(R)), i_one_step_scompare_ord(L,R,Ps,X)).
i_one_step_scompare_ord(L, R, LRs, Ord) :-
( L == R
-> i_scompare_ord(LRs, Ord)
; term_itype(L, L_type),
term_itype(R, R_type),
compare(Rel, L_type, R_type),
( Rel \== (=)
-> Ord = Rel
; compound(L)
-> L =.. [_|Ls],
R =.. [_|Rs],
phrase(args_args_paired(Ls,Rs), LRs0, LRs),
i_scompare_ord(LRs0, Ord)
; i_scompare_ord(LRs , Ord)
)
).
The predicates term_itype/2 and args_args_paired//2 are the same as defined previously.