Prolog: Combining DCG grammars with other restrictions

I'm very impressed by Prolog's DCG and how quickly I can produce all the possible structures that fit a particular grammar.
But I'd like to combine this search with other constraints. For example, define a complex grammar and ask Prolog to generate all sentences with no more than 10 words, or all sentences that don't use the same word twice.
Is it possible to add extra constraints like this to a DCG grammar? Or do I basically have to translate the DCG back into normal Prolog clauses and start modifying them?

If you only want to see all sentences that are generated, it is very convenient to use the following:
?- length(Xs, N), phrase(mynonterminal, Xs).
Of course, this enumerates all sentences in order of increasing length, and it keeps looking for longer ones forever. But it is very useful, and it saves you the time of thinking of a concrete limit. If you want to restrict it further, add the goal between(0,10,N) in front.
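Here is a minimal sketch of both queries. It uses a small hypothetical grammar: sentence//0, noun//0 and the words cat, dog and sees are made up for illustration. short_sentence/1 is likewise a hypothetical name for the bounded version.

```prolog
% Hypothetical toy grammar, only for illustration.
sentence --> noun, [sees], noun.
noun --> [cat].
noun --> [dog].

% Unbounded: enumerates sentences by increasing length, but after the
% last answer it keeps trying longer and longer lists forever:
%   ?- length(Xs, N), phrase(sentence, Xs).

% Bounded to at most 10 words: terminates after all answers are found.
short_sentence(Xs) :-
    between(0, 10, N),
    length(Xs, N),
    phrase(sentence, Xs).
```

With the bound, ?- short_sentence(Xs). yields the four three-word sentences and then fails instead of looping.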
If you want to say within a grammar that a certain non-terminal should have a certain length, it is best to say this explicitly:
seq([]) --> [].
seq([E|Es]) --> [E], seq(Es).
a --> {length(Es,10)}, seq(Es), {phrase(mynonterminal,Es)}.
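The same idea can be parameterized over both the length and the nonterminal. This is a sketch. of_length//2 is a hypothetical name, and the toy grammar is only there for the usage example.

```prolog
% seq//1 as above: describes a list element by element.
seq([]) --> [].
seq([E|Es]) --> [E], seq(Es).

% of_length(N, NT): exactly N tokens that also form a phrase of NT.
% N should be bound; NT is any DCG body (hypothetical helper).
of_length(N, NT) -->
    { length(Es, N) },
    seq(Es),
    { phrase(NT, Es) }.

% Hypothetical toy grammar for the usage example.
sentence --> noun, [sees], noun.
noun --> [cat].
noun --> [dog].
```

For example, ?- phrase(of_length(3, sentence), Ws). enumerates the four three-word sentences, while ?- phrase(of_length(2, sentence), Ws). simply fails.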
If you are still not happy, then you want to express the intersection of two non-terminals. This is tantamount to intersecting two context-free languages. Questions about such intersections (for example, whether the intersection is empty) are undecidable in the general case. But much earlier, you will run into problems with termination. So be aware of that in what follows:
:- op( 950, xfx, &).
(NT1 & NT2) -->
call(Xs0^Xs^(phrase(NT1,Xs0,Xs),phrase(NT2,Xs0,Xs))).
The following is only needed if you do not use library(lambda):
^(V0, Goal, V0, V) :-
call(Goal,V).
^(V, Goal, V) :-
call(Goal).
So this permits you now to express the intersection of two non-terminals. But please, be aware that termination is very brittle here. In particular, the termination of the first non-terminal does not necessarily limit the second.
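If you would rather not use ^/3 and ^/4 at all, a lambda-free variant can be written as a plain predicate. This is a sketch, not the definition above. It relies on the fact that a DCG body NT1 & NT2 is called with two extra arguments, i.e. as '&'(NT1, NT2, S0, S). The nonterminals ab//0 and two//0 are hypothetical examples that also show how brittle termination is.

```prolog
:- op(950, xfx, &).

% NT1 & NT2: both nonterminals must describe exactly the same
% slice S0\S of the input (lambda-free sketch).
'&'(NT1, NT2, S0, S) :-
    phrase(NT1, S0, S),
    phrase(NT2, S0, S).

% Hypothetical example nonterminals.
ab --> [].            % any sequence of a's and b's
ab --> [a], ab.
ab --> [b], ab.

two --> [_], [_].     % exactly two tokens
```

?- phrase((two & ab), Xs). yields [a,a], [a,b], [b,a], [b,b] and then fails. The swapped query ?- phrase((ab & two), Xs). yields only [a,a] and then loops. The reason is that ab//0 keeps generating ever longer lists of a's, and two//0 cannot stop it: the first non-terminal is the one that drives (and must bound) the search.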

Well, you can always use {} and write any kind of Prolog goal in between, for example:
foo(X)-->
{ valid(X) },
[a].
foo(X)-->
[b].
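So you could add some sort of word counter. Of course, if each token is a word, you could simply write something like: between(0, 10, N), length(L, N), phrase(start, L). Writing length(L,N), N<11, start(L,[]) instead would not terminate: once all short sentences have been found, length/2 keeps proposing ever longer lists, which N<11 then rejects.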
On the other hand, depending on the complexity of the constraints, it may be better to encode them in a separate part, similar to the parser -> semantic checker split in compilers.
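As a sketch of that parser-then-checker split, here is the "no repeated word" restriction from the question, checked after parsing. The grammar and the names no_repeated_words/1 and valid_sentence/1 are hypothetical. The check needs the words to be bound, so it runs after phrase/2.

```prolog
% Hypothetical toy grammar, standing in for a real one.
sentence --> noun, [sees], noun.
noun --> [cat].
noun --> [dog].

% Separate "semantic" check: true if no word occurs twice.
% sort/2 removes duplicates, so the lengths agree iff nothing repeats.
no_repeated_words(Ws) :-
    sort(Ws, Unique),
    length(Ws, N),
    length(Unique, N).

% Generate (or parse) with a length bound first, then check.
valid_sentence(Ws) :-
    between(0, 10, N),
    length(Ws, N),
    phrase(sentence, Ws),
    no_repeated_words(Ws).
```

Here ?- valid_sentence(Ws). yields [cat,sees,dog] and [dog,sees,cat] only.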

Related

functor vs predicate - definition for students

The question of the difference between a functor and a predicate in Prolog is asked often.
I am trying to develop an informal definition that is suitable for new students.
A functor is the name of a predicate. The word functor is used when
discussing syntax, such as arity, affix type, and relative priority
over other functors. The word predicate is used when discussing
logical and procedural meaning.
This looks "good enough" to me.
Question: Is it good enough, or is it fundamentally flawed?
To be clear, I am aiming to develop a useful intuition, not write legalistic text for an ISO standard!
The definition in https://www.swi-prolog.org/pldoc/man?section=glossary is:
"functor: Combination of name and arity of a compound term. The term foo(a,b,c) is said to be a term belonging to the functor foo/3." This does not help a lot, and certainly doesn't explain the difference from a predicate, which is defined: "Collection of clauses with the same functor (name/arity). If a goal is proved, the system looks for a predicate with the same functor, then uses indexing to select candidate clauses and then tries these clauses one-by-one. See also backtracking.".
One of the things that often confuses students is that foo(a) could be a term, a goal, or a clause head, depending on the context.
One way to think about term versus predicate/goal is to treat call/1 as if it is implemented by an "infinite" number of clauses that look like this:
call(foo(X)) :- foo(X).
call(foo(X,Y)) :- foo(X,Y).
call(bar(X)) :- bar(X).
etc.
This is why you can pass around a term (which is just data) but treat it as a "goal". So, in Prolog there's no need for a special "closure", "thunk" or "predicate" data type: everything can be treated as just data and can be executed by use of the call/1 predicate.
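A small sketch of the same term used once as data and once as a goal. describe_goal/3 is a hypothetical helper. functor/3 inspects the term's name and arity, and call/1 runs it.

```prolog
% describe_goal(Goal, Functor, Solutions) (hypothetical helper):
% inspects Goal as a term, then executes it as a goal.
describe_goal(Goal, Name/Arity, Solutions) :-
    functor(Goal, Name, Arity),            % Goal as data
    findall(Goal, call(Goal), Solutions).  % Goal as a goal
```

?- describe_goal(append(X, Y, [1,2]), F, Sols). gives F = append/3 and Sols = [append([],[1,2],[1,2]), append([1],[2],[1,2]), append([1,2],[],[1,2])].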
(There are also variations on "call", such as call/2, which can be defined as:
call(foo, X) :- foo(X).
call(foo(X), Y) :- foo(X, Y).
etc.)
This can be used to implement "meta-predicates", such as maplist/2, which takes a list and applies a predicate to each element:
?- maplist(writeln, [one,two,three]).
one
two
three
where a naïve implementation of maplist/2 is (the actual implementation is a bit more complicated, for efficiency):
maplist(_Goal, []).
maplist(Goal, [X|Xs]) :-
call(Goal, X),
maplist(Goal, Xs).
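The same pattern works with call/3, which appends two extra arguments to a partial goal. This is a naive sketch of the three-argument variant. It is named my_maplist/3 (hypothetical) so that it does not clash with the library's maplist/3.

```prolog
% Naive sketch; my_maplist/3 is a hypothetical name.
% call(Goal, X, Y) adds X and Y as the last two arguments of Goal.
my_maplist(_Goal, [], []).
my_maplist(Goal, [X|Xs], [Y|Ys]) :-
    call(Goal, X, Y),
    my_maplist(Goal, Xs, Ys).
```

In SWI-Prolog, where plus/3 is a built-in, ?- my_maplist(plus(1), [1,2,3], Ys). gives Ys = [2,3,4]. Here the term plus(1) is passed around as data and only becomes the goal plus(1, X, Y) inside call/3.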
The answer by Peter Ludemann is already very good. I want to address the following from your question:
To be clear, I am aiming to develop a useful intuition, not write legalistic text for an ISO standard!
If you want to develop intuition, don't bother writing definitions. Definitions end up either written in legalese or useless as definitions. This is why we sometimes explain things by describing how the machine will behave: that is supposedly well-defined, while any statement written in natural language is by definition ambiguous. It is interpreted by a human brain, and you have no idea what is in this brain when it interprets it. As a defense, you end up using legalese to write definitions in natural language.
You can give examples, which will leave some impression and probably develop intuition.
"The Prolog compound term a(b, c) can be described by the functor a/2. Here, a is the term name, and 2 is its arity".
"The functor foo/3 describes any term with a name foo and three arguments."
"Atomic terms by definition have arity 0: for example atoms or numbers. The atom a belongs to the functor a/0."
"You can define two predicates with the same name, as long as they have a different number of arguments."
There is also the possibility of confusion because some system predicates for introspection take either a functor (as a predicate indicator) or a head of the predicate they work on. For example, abolish/1 takes a predicate indicator such as foo/2, while retractall/1 takes a head such as foo(_, _).
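A small sketch contrasting the two argument forms, using a hypothetical dynamic predicate counter/1. retractall/1 receives a head (an ordinary term), whereas abolish/1 receives the predicate indicator Name/Arity. Note that SWI-Prolog also allows abolish/1 on static predicates unless the iso flag is set; the ISO standard only permits it on dynamic ones.

```prolog
% counter/1 is a hypothetical dynamic predicate.
:- dynamic counter/1.
counter(5).

% retractall/1 takes a head: any clause whose head unifies is removed.
reset_counter :-
    retractall(counter(_)),
    assertz(counter(0)).

% abolish/1 takes a predicate indicator: the whole predicate is removed.
forget_counter :-
    abolish(counter/1).
```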

Predicate for removing certain terms from compound term in Prolog

I would like to write a Prolog predicate that takes in a compound term as its argument and outputs this compound term with some of the nested terms removed. For example, let's say that I have a compound term:
outer_term(level_one(level_two_a(X), level_two_b(Y)), level_one(level_two_b(Z))).
And I would like to write a predicate extract_terms/2 that takes this term and returns it without occurrences of level_two_a/1.
extract_terms(Term, ExtractedTerm) :-
*** Prolog Magic ***.
Is there a built-in (or semi-built-in) way to do this in Prolog? If not, how would I go about doing this? One way that occurs to me would be to use the =../2 operator to convert the Term into a list and then somehow use a built-in predicate like subtract/3 to get rid of the terms that I want removed. The trouble I have is making this work with lists that have nested terms as their items.
I would appreciate any ideas, thank you.
First, a general guideline:
Everything that can be expressed by pattern matching should be expressed by pattern matching.
You have asked a similar question previously, although it was a bit simpler. Still, let us consider the simpler case first: You said a possible instance of a verb phrase would be:
VP = vp(vp(verb(making), adj(quick), np2(noun(improvements))))
and you want to extract the verb. Well, the simplest approach is to use pattern matching, or more generally, unification, like this:
?- VP = vp(vp(verb(making), adj(quick), np2(noun(improvements)))),
VP = vp(vp(Verb, _, _)).
This yields:
Verb = verb(making).
Thus, we have successfully "extracted" verb(making) from such a phrase by virtue of unification.
Now to the slightly more complex task you are considering in this question: At this point, you may wonder whether you have chosen a good representation of your data. Frequent use or even the very necessity of (=..)/2 typically indicates a problem with your representation, since it may mean that you have lost track or control of the possible shapes of your data.
In this concrete case, you state as an example:
outer_term(level_one(level_two_a(X), level_two_b(Y)), level_one(level_two_b(Z))).
and you want to remove occurrences of level_two_a. You can now of course begin to mess with (=..)/2, which requires a conversion of such terms to lists, then some reasoning on these lists, and a second conversion from lists back to such structures. That's not how we want to work with our data. In addition to other drawbacks, it would preclude more general usage patterns that we expect from relations.
Instead, let us fix the data representation so that we can cleanly distinguish the different cases. For example, instead of "hardcoding" the very parameter we need to distinguish the cases inside of a functor, let us make the distinction explicit: We want to be able to distinguish, by pattern matching, level 1 from level 2.
So, the following representation suggests itself:
outer_term([level(1, [level(2, X),
level(2, Y)]),
level(1, [level(2, Z)])]).
This may need some additional attributes, such as a and b, and I leave extending this representation to represent such attributes as an easy exercise. The general idea should be clear though: We have thus achieved a uniform representation about which we can easily reason symbolically.
It is now easy to describe the relation between a (potentially nested) list of such levels and the levels without the "level 2" elements:
without_level_2(Ls0, Ls) :-
phrase(no_level_2(Ls0), Ls).
no_level_2([]) --> [].
no_level_2([L|Ls]) -->
no_level_2_(L),
no_level_2(Ls).
no_level_2_(level(2,_)) --> [].
no_level_2_(level(L,Ls0)) --> [level(L,Ls)],
{ dif(L, 2),
without_level_2(Ls0, Ls) }.
See dcg for more information about this formalism.
Sample query:
?- outer_term(Ts0),
without_level_2(Ts0, Ts).
Yielding:
Ts = [level(1, []), level(1, [])] .
Note that to truly benefit from this representation, you need to obtain it in the first place by using or generating terms of such shapes. Once you have this ensured, you can conveniently stick to pattern matching to distinguish the cases. Among the main benefits of this approach we find convenience, performance and generality. For example, we can use the DCG shown above not only to extract but also to generate terms of this form:
?- length(Ls0, _), without_level_2(Ls0, Ls).
Ls0 = Ls, Ls = [] ;
Ls0 = [level(2, _56)],
Ls = [] ;
Ls0 = Ls, Ls = [level(_130, [])],
dif(_130, 2) ;
Ls0 = [level(_150, [level(2, _164)])],
Ls = [level(_150, [])],
dif(_150, 2) .
This is truly a relation, usable in all directions. For this reason, I have avoided imperative names like "extract", "remove" etc. in the predicate name, because these always suggest a particular direction of use, not doing justice to the generality of the predicate.

Prolog: generate queries out of DCG

I currently have a small Prolog database containing a few people and some predicates for relations. For example:
female(anna).
female(susan).
male(john).
male(timmy).
siblings(anna, susan).
siblings(anna, john).
siblings(susan, john).
sibling(X, Y) :- siblings(X, Y) ; siblings(Y, X).
%X is brother of Y
brother(X, Y) :- male(X), sibling(X, Y).
and I have a DCG which can determine valid questions like
"who is the brother of john", which also works well.
question --> ip, verb, article, noun, pronoun, name.
Now I want my program to make a call to my family-database out of noun and name like this:
noun(X, name).
Which in the example then should be
brother(X, anna).
and then return the answer as a natural-language answer like:
"the brother of anna is john"
Defining the grammar for the answer sentence is no problem either. The only thing I don't know is how to make the call from my DCG to my database and get the right values filled into it. I have looked around for quite some time now (perhaps I don't know the right search terms) and couldn't find anything related to this.
I hope you guys have some good ideas! :)
Thank you.
Invoking Prolog predicates from DCGs
Regular way: Use {}//1
Use the nonterminal {}//1 to call arbitrary Prolog goals from within DCGs.
For example:
verb --> [V], { verb(V) }.
This defines a nonterminal verb//0. This DCG describes a list consisting of the single element V such that verb(V) holds, where verb/1 is a normal Prolog predicate.
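A runnable version with a tiny hypothetical lexicon. The facts verb/1 and the nonterminal verb//0 (which is the predicate verb/2) coexist because they have different arities.

```prolog
% Ordinary Prolog facts (hypothetical lexicon).
verb(is).
verb(sees).

% verb//0: a one-element list [V] such that verb(V) holds.
verb --> [V], { verb(V) }.
```

?- phrase(verb, [sees]). succeeds, and ?- phrase(verb, [X]). enumerates X = is and X = sees.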
In a sense even more regular: Use DCGs throughout!
Note that there is a second way to do this, which is in a sense even easier to understand: You can simply turn everything into DCG nonterminals!
For example, you could say:
female(anna) --> [].
female(susan) --> [].
male(john) --> [].
male(timmy) --> [].
You could then simply use these nonterminals directly. You could define a term_expansion/2 rule that does such a transformation automatically.
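One possible shape of such a term_expansion/2 rule, as a sketch for SWI-Prolog. Each fact of a listed predicate is kept, and a DCG twin is added alongside it. dcg_fact/1 is a hypothetical configuration predicate. The added clause female(anna, S, S) means the same as female(anna) --> []. Be aware that term_expansion/2 in user is global and also affects files loaded afterwards.

```prolog
% Hypothetical: which one-argument fact predicates get a DCG twin.
dcg_fact(female).
dcg_fact(male).

% For a fact such as female(anna), keep it and also add
% female(anna, S, S), i.e. the translation of  female(anna) --> [].
term_expansion(Fact, [Fact, DcgFact]) :-
    compound(Fact),
    Fact =.. [Name, Arg],
    dcg_fact(Name),
    DcgFact =.. [Name, Arg, S, S].

:- discontiguous female/1, female/3, male/1, male/3.

female(anna).
female(susan).
male(john).
male(timmy).
```

After loading, both female(anna) and ?- phrase(female(X), []). work, the latter enumerating X = anna and X = susan.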
In your specific case, using {}//1 is likely preferable, because you already have existing Prolog facts. But there are definitely cases where using DCGs throughout is preferable.
EDIT: From your comment, I see your question is a bit more involved.
The question is rather about:
Constructing Prolog goals from sentences
This is extremely straightforward: Essentially, you only need to describe the relation between the Prolog goals you want and the corresponding sentences.
We do this by introducing a new argument to the DCG, and that argument will denote the Prolog goal that needs to be executed to answer the sentence. In your example, you want to relate the sentence "Who is the brother of susan?" to a call of the Prolog predicate brother(X, susan). You already have a nonterminal sentence//0 that describes such sentences. You only need to make explicit the goal that such sentences correspond to. For example:
sentence_goal(noun(X, name)) --> ip, v, a, noun, p, name.
This is only used to illustrate the principle; I'm not claiming that this is already the full solution. The point is simply to show that you can reason about Prolog goals in exactly the same way as about all other terms.
You can then invoke the actual goals in two phases:
first, relate the given sentence to the goal, using this new nonterminal sentence_goal//1;
second, simply call the goal, using call/1 or by invoking it directly.
For example:
?- phrase(sentence_goal(Goal), Sentence), Goal.
In your case, all that remains is relating such sentences to the Prolog goals you want to invoke, such as brother_of/2 etc.
None of this needs any side-effects (write/1)! Instead, concentrate on describing the relations between sentences and goals, and let the Prolog toplevel do the printing for you.
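To make this concrete, here is a minimal sketch for the family database from the question. The nonterminals question_goal//2, relation//3 and answer//1, as well as sister/2, are hypothetical additions. The point is only that the grammar builds the goal as an ordinary term by pattern matching, the goal is then called, and a second DCG describes the answer sentence.

```prolog
% Family database from the question.
female(anna).
female(susan).
male(john).
male(timmy).

siblings(anna, susan).
siblings(anna, john).
siblings(susan, john).

sibling(X, Y) :- siblings(X, Y) ; siblings(Y, X).

brother(X, Y) :- male(X), sibling(X, Y).    % X is a brother of Y
sister(X, Y)  :- female(X), sibling(X, Y).  % assumed, by analogy

% Relates a question to the goal that answers it; X is the "who".
question_goal(X, Goal) -->
    [who, is, the], relation(X, Name, Goal), [of, Name].

relation(X, Y, brother(X, Y)) --> [brother].
relation(X, Y, sister(X, Y))  --> [sister].

% Relates a solved goal to the answer sentence.
answer(brother(X, Y)) --> [the, brother, of, Y, is, X].
answer(sister(X, Y))  --> [the, sister, of, Y, is, X].
```

For example:
?- phrase(question_goal(_, G), [who,is,the,brother,of,anna]), G, phrase(answer(G), A).
gives G = brother(john, anna) and A = [the, brother, of, anna, is, john], with no write/1 involved.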

Prolog - Return result instead of printing in algorithm

I know there is technically no 'return' in Prolog but I did not know how to formulate the question otherwise.
I found some sample code of an algorithm for finding routes between metro stations. It works well; however, it only prints the result, which makes it hard to extend or to use with findall/3, for example.
% direct routes
findRoute(X,Y,Lines,Output) :-
line(Line,Stations),
\+ member(Line,Lines),
member(X,Stations),
member(Y,Stations),
append(Output,[[X,Line,Y]],NewOutput),
print(NewOutput).
% needs intermediate stop
findRoute(X,Y,Lines,Output) :-
line(Line,Stations),
\+ member(Line,Lines),
member(X,Stations),
member(Intermediate,Stations),
X\=Intermediate,Intermediate\=Y,
append(Output,[[X,Line,Intermediate]],NewOutput),
findRoute(Intermediate,Y,[Line|Lines],NewOutput).
line/2 is a predicate relating an atom (the name of the line) to a list of its stations.
For example: line(s1, [first_stop, second_stop, third_stop])
So what I am trying to do is get rid of that print at line 11 and add an extra variable to my rule to store the result for later use. However, I failed miserably: no matter what I try, it either enters an infinite loop or returns false.
Now:
?- findRoute(first_stop, third_stop, [], []).
% prints [[first_stop,s1,third_stop]]
Want:
?- findRoute(first_stop, third_stop, [], R).
% [[first_stop,s1,third_stop]] is stored in R
Like you, I also see this pattern frequently among Prolog beginners, especially if they are using bad books and other material:
solve :-
.... some goals ...
compute(A),
write(A).
Almost every line in the above is problematic, for the following reasons:
"solve" is imperative. This does not make sense in a declarative language like Prolog, because you can use predicates in several directions.
"compute" is also imperative.
write/1 is a side-effect, and its output is only available on the system terminal. This gives us no easy way to actually test the predicate.
Such patterns should always simply look similar to:
solution(S) :-
condition1(...),
condition2(...),
condition_n(S).
where condition1 etc. are simply pure goals that describe what it means that S is a solution.
When querying
?- solution(S).
then bindings for S will automatically be printed on the toplevel. Let the toplevel do the printing for you!
In your case, there is a straightforward fix: Simply make NewOutput one of the arguments, and remove the final side-effect:
route(X, Y, Lines, Output, NewOutput) :-
line(Line, Stations),
\+ member(Line, Lines),
member(X, Stations),
member(Y, Stations),
append(Output, [[X,Line,Y]], NewOutput).
Note also that I have changed the name to just route/5, because the predicate makes sense also if the arguments are all already instantiated, which is useful for testing etc.
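The snippet above covers only the direct case. Here is a sketch of the same fix applied to both clauses of the original code. The only change to the second clause is that the extra argument Route is passed through to the recursive call, so the finished route comes back out instead of being printed.

```prolog
:- use_module(library(lists)).

% Example data from the question.
line(s1, [first_stop, second_stop, third_stop]).

% route(X, Y, Lines, Output, Route): Route is Output extended by the
% legs from X to Y, using only lines that are not in Lines.
% direct route
route(X, Y, Lines, Output, Route) :-
    line(Line, Stations),
    \+ member(Line, Lines),
    member(X, Stations),
    member(Y, Stations),
    append(Output, [[X,Line,Y]], Route).
% needs intermediate stop: Route is passed through unchanged
route(X, Y, Lines, Output, Route) :-
    line(Line, Stations),
    \+ member(Line, Lines),
    member(X, Stations),
    member(Intermediate, Stations),
    X \= Intermediate, Intermediate \= Y,
    append(Output, [[X,Line,Intermediate]], NewOutput),
    route(Intermediate, Y, [Line|Lines], NewOutput, Route).
```

Now ?- findall(R, route(first_stop, third_stop, [], [], R), Rs). collects Rs = [[[first_stop,s1,third_stop]]].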
Moreover, when describing lists, you will often benefit a lot from using DCG notation.
The code will look similar to this:
route(S, S, _) --> []. % case 1: already there
route(S0, S, Lines) --> % case 2: needs intermediate stop
{ line_stations(Line, Stations0),
maplist(dif(Line), Lines),
select(S0, Stations0, Stations),
member(S1, Stations) },
[link(S0,Line,S1)],
route(S1, S, [Line|Lines]).
Conveniently, you can use this to describe the concatenation of lists without needing append/3 so much. I have also made a few other changes to enhance purity and readability, and I leave figuring out the exact differences as an easy exercise.
You call this using the DCG interface predicate phrase/2, using:
?- phrase(route(X,Y,[]), Rs).
where Rs is the found route. Note also that I am using terms of the form link/3 to denote the links of the route. It is good practice to use dedicated terms when the arity is known. Lists, for example, are good if you do not know beforehand how many elements you need to represent.
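To try the DCG version, you need some line_stations/2 facts. The two lines below are hypothetical example data that share third_stop, so that one route needs a change of line. Collecting all routes with findall/3 is then straightforward.

```prolog
:- use_module(library(lists)).   % select/3, member/2
:- use_module(library(apply)).   % maplist/2

% Hypothetical example data: two lines sharing third_stop.
line_stations(s1, [first_stop, second_stop, third_stop]).
line_stations(s2, [third_stop, fourth_stop]).

route(S, S, _) --> [].                  % case 1: already there
route(S0, S, Lines) -->                 % case 2: take one more leg
    { line_stations(Line, Stations0),
      maplist(dif(Line), Lines),        % each line is used at most once
      select(S0, Stations0, Stations),
      member(S1, Stations) },
    [link(S0, Line, S1)],
    route(S1, S, [Line|Lines]).
```

?- findall(Rs, phrase(route(first_stop, fourth_stop, []), Rs), Routes). gives Routes = [[link(first_stop,s1,third_stop), link(third_stop,s2,fourth_stop)]].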

Making "deterministic success" of Prolog goals explicit

The matter of deterministic success of some Prolog goal has turned up time and again in at least the following questions:
Reification of term equality/inequality
Intersection and union of 2 lists
Remove duplicates in list (Prolog)
Prolog: How can I implement the sum of squares of two largest numbers out of three?
Ordering lists with constraint logic programming
Different methods were used (e.g., provoking certain resource errors, or looking closely at the exact answers given by the Prolog toplevel), but they all appear somewhat ad hoc to me.
I'm looking for a generic, portable, and ISO-conformant way to find out if the execution of some Prolog goal (which succeeded) left some choice-point(s) behind. Some meta predicate, maybe?
Could you please point me in the right direction? Thank you in advance!
Good news everyone: setup_call_cleanup/3 (currently a draft proposal for ISO) lets you do that in a quite portable and beautiful way.
See the example:
setup_call_cleanup(true, (X=1;X=2), Det=yes)
succeeds with Det == yes when there are no more choice points left.
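Building on this, here is a minimal sketch of a reusable helper. call_det/2 is a hypothetical name. The sketch assumes, as in SWI-Prolog, that the cleanup handler runs as soon as the goal exits without leaving a choice point, and that its bindings are kept. As discussed further below, the ISO draft does not guarantee this.

```prolog
% call_det(Goal, Det) (hypothetical helper): Det = true if this
% solution of Goal left no choice point behind, Det = false otherwise.
% Assumes SWI-Prolog's eager cleanup on deterministic exit.
call_det(Goal, Det) :-
    setup_call_cleanup(true, Goal, Det0 = true),
    (   Det0 == true
    ->  Det = true
    ;   Det = false
    ).
```

?- call_det((X = 1 ; X = 2), Det). then yields X = 1, Det = false and, on backtracking, X = 2, Det = true.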
EDIT: Let me illustrate the awesomeness of this construct, or rather of the very closely related predicate call_cleanup/2, with a simple example:
In the excellent CLP(B) documentation of SICStus Prolog, we find in the description of labeling/1 a very strong guarantee:
Enumerates all solutions by backtracking, but creates choicepoints only if necessary.
This is really a strong guarantee, and at first it may be hard to believe that it always holds. Luckily for us, it is extremely easy to formulate and generate systematic test cases in Prolog to verify such properties, in essence using the Prolog system to test itself.
We start with systematically describing what a Boolean expression looks like in CLP(B):
:- use_module(library(clpb)).
:- use_module(library(lists)).
sat(_) --> [].
sat(a) --> [].
sat(~_) --> [].
sat(X+Y) --> [_], sat(X), sat(Y).
sat(X#Y) --> [_], sat(X), sat(Y).
There are in fact many more cases, but let us restrict ourselves to the above subset of CLP(B) expressions for now.
Why am I using a DCG for this? Because it lets me conveniently describe (a subset of) all Boolean expressions of specific depth, and thus fairly enumerate them all. For example:
?- length(Ls, _), phrase(sat(Sat), Ls).
Ls = [] ;
Ls = [],
Sat = a ;
Ls = [],
Sat = ~_G475 ;
Ls = [_G475],
Sat = _G478+_G479 .
Thus, I am using the DCG only to denote how many available "tokens" have already been consumed when generating expressions, limiting the total depth of the resulting expressions.
Next, we need a small auxiliary predicate labeling_nondet/1, which acts exactly as labeling/1, but is only true if a choice-point still remains. This is where call_cleanup/2 comes in:
labeling_nondet(Vs) :-
dif(Det, true),
call_cleanup(labeling(Vs), Det=true).
Our test case (and by this, we actually mean an infinite sequence of small test cases, which we can very conveniently describe with Prolog) now aims to verify the above property, i.e.:
If there is a choice-point, then there is a further solution.
In other words:
The set of solutions of labeling_nondet/1 is a proper subset of that of labeling/1.
Let us thus describe what a counterexample of the above property looks like:
counterexample(Sat) :-
length(Ls, _),
phrase(sat(Sat), Ls),
term_variables(Sat, Vs),
sat(Sat),
setof(Vs, labeling_nondet(Vs), Sols),
setof(Vs, labeling(Vs), Sols).
And now we use this executable specification in order to find such a counterexample. If the solver works as documented, then we will never find a counterexample. But in this case, we immediately get:
| ?- counterexample(Sat).
Sat = a+ ~_A,
sat(_A=:=_B*a) ? ;
So in fact the property does not hold. Broken down to the essence, although no more solutions remain in the following query, Det is not unified with true:
| ?- sat(a + ~X), call_cleanup(labeling([X]), Det=true).
X = 0 ? ;
no
In SWI-Prolog, the superfluous choice-point is obvious:
?- sat(a + ~X), labeling([X]).
X = 0 ;
false.
I am not giving this example to criticize the behaviour of either SICStus Prolog or SWI: Nobody really cares whether or not a superfluous choice-point is left in labeling/1, least of all in an artificial example that involves universally quantified variables (which is atypical for tasks in which one uses labeling/1).
I am giving this example to show how nicely and conveniently guarantees that are documented and intended can be tested with such powerful inspection predicates...
... assuming that implementors are interested in standardizing their efforts, so that these predicates actually work the same way across different implementations! The attentive reader will have noticed that the search for counterexamples produces quite different results when used in SWI-Prolog.
In an unexpected turn of events, the above test case has found a discrepancy in the call_cleanup/2 implementations of SWI-Prolog and SICStus. In SWI-Prolog (7.3.11):
?- dif(Det, true), call_cleanup(true, Det=true).
dif(Det, true).
?- call_cleanup(true, Det=true), dif(Det, true).
false.
whereas both queries fail in SICStus Prolog (4.3.2).
This is quite typical: once you are interested in testing a specific property, you find many obstacles in the way of testing the actual property.
In the ISO draft proposal, we see:
Failure of [the cleanup goal] is ignored.
In the SICStus documentation of call_cleanup/2, we see:
Cleanup succeeds determinately after performing some side-effect; otherwise, unexpected behavior may result.
And in the SWI variant, we see:
Success or failure of Cleanup is ignored
Thus, for portability, we should actually write labeling_nondet/1 as:
labeling_nondet(Vs) :-
call_cleanup(labeling(Vs), Det=true),
dif(Det, true).
There is no guarantee in setup_call_cleanup/3 that it detects determinism, i.e. the absence of remaining choice points when a goal succeeds. Section 7.8.11.1 (Description) of the draft proposal only says:
c) The cleanup handler is called exactly once; no later than
upon failure of G. Earlier moments are:
If G is true or false, C is called at an implementation
dependent moment after the last solution and after the last
observable effect of G.
So there is currently no requirement that:
setup_call_cleanup(true, true, Det=true)
binds Det to true in the first place. This is also reflected in the examples in section 7.8.11.4 of the draft proposal, which include a test case that says:
setup_call_cleanup(true, true, X = 2).
Either: Succeeds, unifying X = 2.
Or: Succeeds.
So it is valid for an implementation either to detect determinism or not to detect it.
