In Prolog DCGs, how to remove over general solutions? - prolog

I have a text file containing a sequence. For example:
GGGGGGGGAACCCCCCCCCCTTGGGGGGGGGGGGGGGGAACCCCCCCCCCTTGGGGGGGG
I have wrote the following DCG to find the sequence between AA and TT.
:- use_module(library(pio)).
:- use_module(library(dcg/basics)).
:- portray_text(true).
process(Xs) :- phrase_from_file(find(Xs), 'string.txt').
anyseq([]) -->[].
anyseq([E|Es]) --> [E], anyseq(Es).
begin --> "AA".
end -->"TT".
find(Seq) -->
anyseq(_),begin,anyseq(Seq),end, anyseq(_).
I query and I get:
?- process(Xs).
Xs = "CCCCCCCCCC" ;
Xs = "CCCCCCCCCCTTGGGGGGGGGGGGG...CCCCC" ;
Xs = "CCCCCCCCCC" ;
false.
But I dont want it to find the second solution or ones like it. Only the solutions between one pair of AA and TTs not all combinations. I have a feeling I could use string_without and string in library dcg basiscs but I dont understand how to use them.

your anyseq//1 is identical to string//1 from library(dcg/basics), and shares the same 'problem'.
To keep in control, I would introduce a 'between separators' state:
elem(E) --> begin, string(E), end, !.
begin --> "AA".
end -->"TT".
find(Seq) -->
anyseq(_),elem(Seq).
anyseq([]) -->[].
anyseq([E|Es]) --> [E], anyseq(Es).
process(Xs) :-
phrase(find(Xs), `GGGGGGGGAACCCCCCCCCCTTGGGGGGGGGGGGGGGGAACCCCC+++CCCCCTTGGGGGGGG`,_).
now I get
?- process(X).
X = "CCCCCCCCCC" ;
X = "CCCCC+++CCCCC" ;
false.
note the anonymous var as last argument of phrase/3: it's needed to suit the change in 'control flow' induced by the more strict pattern used: elem//1 is not followed by anyseq//1, because any two sequences 'sharing' anyseq//1 would be problematic.
In the end, you should change your grammar to collect elem//1 with a right recursive grammar....

First, let me suggest that you most probably misrepresent the problem, at least if this is about mRNA-sequences. There, bases occur in triplets, or codons and the start is methionine or formlymethionine, but the end are three different triplets. So most probably you want to use such a representation.
The sequence in between might be defined using all_seq//2, if_/3, (=)/3:
mRNAseq(Cs) -->
[methionine],
all_seq(\C^maplist(dif(C),[amber,ochre,opal]), Cs),
( [amber] | [ochre] | [opal]).
or:
mRNAseq(Cs) -->
[methionine],
all_seq(list_without([amber,ochre,opal]), Cs),
( [amber] | [ochre] | [opal]).
list_without(Xs, E) :-
maplist(dif(E), Xs).
But back to your literal statement, and your question about declarative names. anyseq and seq mean essentially the same.
% :- set_prolog_flag(double_quotes, codes). % pick this
:- set_prolog_flag(double_quotes, chars). % or pick that
... --> [] | [_], ... .
seq([]) -->
[].
seq([E|Es]) -->
[E],
seq(Es).
mRNAcontent(Cs) -->
...,
"AA",
seq(Cs),
"TT",
{no_TT(Cs)}, % restriction
... .
no_TT([]).
no_TT([E|Es0]) :-
if([E] = "T",
( Es0 = [F|Es], dif([F],"T") ),
Es0 = Es),
no_TT(Es).
The meaning of no_TT/1 is: There is no sequence "TT" in the list, nor a "T" at then end. So no_TT("T") fails as well, for it might collide with the subsequent "TT"!
So why is it a good idea to use pure, monotonic definitions? You will most probably be tempted to add restrictions. In a pure monotonic form, restrictions are harmless. But in the procedural version suggested in another answer, you will get simply different results that are no restrictions at all.

Related

Delete 2nd and 3rd occurrence of an element

I want to make a program that given a list L in which element X appears 3 times, it returns the NL list including it only one time.
For example, this question
?- erase([1,2,3,1,6,1,7],1,NL).
should return
NL = [1,2,3,6,7] or NL = [2,3,1,6,7] or NL = [2,3,6,1,7]
P.S.
Suppose that the given list doesn't include any element 2,4 or more times.
So, this is my code, but it returns false when I make a question. Any suggestion to correct it would be appreciated.
erase([],_,[]).
erase(L,X,NL):-
append(A,[X,B,X,C,X,D],L),
append(A,[X,B,C,D],NL).
So you say, that the following query should succeed, but fails
?- erase([1,2,3,1,6,1,7],1,NL).
false.
even the following generalization fails:
?- erase([1,2,3,1,6,1,7],E,NL).
false.
Let me reformulate this for easier access:
?- L = [1,2,3,1,6,1,7], erase(L,E,NL).
false.
So we now have to generalize that list even further. I could try this element by element, but I rather prefer first:
?- L = [_,_,_,_,_,_,_], erase(L,E,NL).
L = [_A,E,_B,E,_C,E,_D], NL = [_A,E,_B,_C,_D]
; false.
This is the only answer. It tells us that E has to occur exactly at the 2nd, 3rd and 5th position. Let's try if that is true:
?- erase([0,1,0,1,0,1,0],1,NL).
NL = [0,1,0,0,0]
; false.
So your solution works — sometimes. It seems that you rather want:
erase(L, X, NL) :-
phrase(
( seq(Any1), [X], seq(Any2), [X], seq(Any3), [X], seq(Any4) ), L),
phrase(
( seq(Any1), seq(Any2), seq(Any3), [X], seq(Any4) ), NL).
seq([]) --> [].
seq([E|Es]) --> [E], seq(Es).
append/2 helps a lot when processing multiple lists:
erase(L,E,R) :-
append([A,[E],B,[E],C,[E],D],L),
select([E],[X,Y,Z],[[],[]]),
append([A, X, B, Y, C, Z, D],R).

All substrings with same begin and end

I have to solve a homework but I have a very limited knowledge of Prolog. The task is the following:
Write a Prolog program which can list all of those substrings of a string, whose length is at least two character and the first and last character is the same.
For example:
?- sameend("teletubbies", R).
R = "telet";
R = "ele";
R = "eletubbie";
R = "etubbie";
R = "bb";
false.
My approach of this problem is that I should iterate over the string with head/tail and find the index of the next letter which is the same as the current (it satisfies the minimum 2-length requirement) and cut the substring with sub_string predicate.
This depends a bit on what you exactly mean by a string. Traditionally in Prolog, a string is a list of characters. To ensure that you really get those, use the directive below. See this answer for more.
:- set_prolog_flag(double_quotes, chars).
sameend(Xs, Ys) :-
phrase( ( ..., [C], seq(Zs), [C], ... ), Xs),
phrase( ( [C], seq(Zs), [C] ), Ys).
... --> [] | [_], ... .
seq([]) -->
[].
seq([E|Es]) -->
[E],
seq(Es).
if your Prolog has append/2 and last/2 in library(lists), it's easy as
sameend(S,[F|T]) :-
append([_,[F|T],_],S),last(T,F).

Using list in Prolog DCG

I am trying to convert a Prolog predicate into DCG code. Even if I am familiar with grammar langage I have some troubles to understand how DCG works with lists and how I am supposed to use it.
Actually, this is my predicate :
cleanList([], []).
cleanList([H|L], [H|LL]) :-
number(H),
cleanList(L, LL),
!.
cleanList([_|L], LL) :-
cleanList(L, LL).
It is a simple predicate which removes non-numeric elements.
I would like to have the same behaviour writes in DCG.
I tried something like that (which does not work obviously) :
cleanList([]) --> [].
cleanList([H]) --> {number(H)}.
cleanList([H|T]) --> [H|T], {number(H)}, cleanList(T).
Is it possible to explain me what is wrong or what is missing ?
Thank you !
The purpose of DCG notation is exactly to hide, or better, make implicit, the tokens list. So, your code should look like
cleanList([]) --> [].
cleanList([H|T]) --> [H], {number(H)}, cleanList(T).
cleanList(L) --> [H], {\+number(H)}, cleanList(L).
that can be made more efficient:
cleanList([]) --> [].
cleanList([H|T]) --> [H], {number(H)}, !, cleanList(T).
cleanList(L) --> [_], cleanList(L).
A style note: Prologgers do prefers to avoid camels :)
clean_list([]) --> [].
etc...
Also, I would prefer more compact code:
clean_list([]) --> [].
clean_list(R) --> [H], {number(H) -> R = [H|T] ; R = T}, clean_list(T).

DCG version for maplist/3

The following meta-predicate is often useful. Note that it cannot be called maplist//2, because its expansion would collide with maplist/4.
maplistDCG(_P_2, []) -->
[].
maplistDCG(P_2, [A|As]) -->
{call(P_2, A, B)},
[B],
maplistDCG(P_2, As).
There are several issues here. Certainly the name. But also the terminal [B]: should it be explicitly disconnected from the connecting goal?
Without above definition, one has to write either one of the following - both having serious termination issues.
maplistDCG1(P_2, As) -->
{maplist(P_2, As, Bs)},
seq(Bs).
maplistDCG2(P_2, As) -->
seq(Bs),
{maplist(P_2, As, Bs)}.
seq([]) -->
[].
seq([E|Es]) -->
[E],
seq(Es).
Does {call(P_2,A,B)}, [B] have advantages over [B], {call(P_2,A,B)}?
(And, if so, shouldn't maplist/3 get something like that, too?)
Let's put corresponding dcg and non-dcg variants side-by-side1:
dcg [B],{call(P_2,A,B)} and non-dcg Bs0 = [B|Bs], call(P_2,A,B)
maplistDCG(_,[]) --> []. % maplist(_,[],[]).
maplistDCG(P_2,[A|As]) --> % maplist(P_2,[A|As],Bs0) :-
[B], % Bs0 = [B|Bs],
{call(P_2,A,B)}, % call(P_2,A,B),
maplistDCG(P_2,As). % maplist(P_2,As,Bs).
dcg {call(P_2,A,B)},[B] and non-dcg call(P_2,A,B), Bs0 = [B|Bs]
maplistDCG(_,[]) --> []. % maplist(_,[],[]).
maplistDCG(P_2,[A|As]) --> % maplist(P_2,[A|As],Bs0) :-
{call(P_2,A,B)}, % call(P_2,A,B),
[B], % Bs0 = [B|Bs],
maplistDCG(P_2,As). % maplist(P_2,As,Bs).
Above, we highlighted the goal ordering in use now:
by maplist/3, as defined in this answer on SO and in the Prolog prologue
by maplistDCG//2, as defined in this question by the OP
If we consider that ...
... termination properties need to be taken into account and ...
... dcg and non-dcg variants should better behave the same2 ...
... we find that the variable should not be explicitly disconnected from the connecting goal. The natural DCG analogue of maplist/3 is maplistDCG//2 defined as follows:
maplistDCG(_,[]) -->
[].
maplistDCG(P_2,[A|As]) -->
[B],
{call(P_2,A,B)},
maplistDCG(P_2,As).
Footnote 1: To emphasize commonalities, we adapted variable names, code layout, and made some unifications explicit.
Footnote 2: ... unless we have really good reasons for their divergent behaviour ...

How to write optional in SWI-Prolog

I need to implement some rules by Prolog
ex:
S ---> A,[b],{c}.
Where:
[b] could happen once or none like 0 or 1 time
{c} could happen 0,1,2,...times
How can i write it?
Edit:
I used this:
:- op(700,xfx,--->).
s ---> [vp].
s ---> [vp,conj,vp].
s ---> [vp,conj,np].
vp ---> [feal_amr],
([mfoal_beh];[]),
([mfoal_beh];[]),
([bdl];[]),
[sefa_optional],
([hal];[]),
([shbh_gomla];[]),
([mfoal_motlk];[]).
It gives me an error "Full stop in clause-body? Cannot redefine ,/2"
in the comma in this line "vp ---> [feal_amr], ..."
Edit
I use "--->" because i have this
parse_topdown(Category,String,Reststring,[Category|Subtrees]) :-
Category ---> RHS,
matches(RHS,String,Reststring,Subtrees).
And "-->" gives an error with the operator ":-"?!!
this is my code for an Arabic Parser Code
I'm sorry for inconvenience but i am not an expert in Prolog
from your description
s --> [a], ([b] ; []), c_1.
c_1 --> [c], c_1 ; [].
some test pattern:
?- phrase(s, [a,b,c,c,c]).
true
?- phrase(s, [a]).
true
edit
about your code: you should use -->. Why you declare ---> (and not define it) ? That way you should write your own analyzer, you're not using DCG.
Note that [vp,conj,vp] it's a list of terminals,
Not sure about feal_amr,mfoal_beh, etc etc, but vp it's surely a nonterminal (it's rewritten).
Then I think you should write
s --> vp.
s --> vp,conj,vp.
s --> vp,conj,np.
vp -->
[feal_amr],
([mfoal_beh];[]),
([mfoal_beh];[]),
([bdl];[]),
[sefa_optional],
([hal];[]),
([shbh_gomla];[]),
([mfoal_motlk];[]).
% I hypotesize it's a comma.
conj --> [','].
edit as noted in comments, you are not using DCG, but your own interpreter. I tested it with a minimal example
:- op(700,xfx,--->).
s ---> [name,verb,names].
names ---> [name, conj, names].
names ---> [name].
names ---> [].
lex(anne, name).
lex(bob, name).
lex(charlie, name).
lex(call, verb).
lex(and, conj).
parse_topdown(Category,[Word|Reststring],Reststring,[Category,Word]) :-
lex(Word,Category).
parse_topdown(Category,String,Reststring,[Category|Subtrees]) :-
Category ---> RHS,
matches(RHS,String,Reststring,Subtrees).
matches([],String,String,[]).
matches([Category|Categories],String,RestString,[Subtree|Subtrees]) :-
parse_topdown(Category,String,String1,Subtree),
matches(Categories,String1,RestString,Subtrees).
and this program accepts 0,1, or more names:
?- parse_topdown(s,[anne,call,bob,and,charlie],R,P).
R = [],
P = [s, [name, anne], [verb, call], [names, [name, bob], [conj, and], [names|...]]] ;
R = [charlie],
P = [s, [name, anne], [verb, call], [names, [name, bob], [conj, and], [names]]] ;
R = [and, charlie],
P = [s, [name, anne], [verb, call], [names, [name, bob]]] ;
R = [bob, and, charlie],
P = [s, [name, anne], [verb, call], [names]] ;
false.
Note I leave R free, to examine partial matches. Coming back to your original question, you can see how the nonterminal names accepts 0,1,or many (separed by and) values.
Note that such interpreter will be very slow on any substantial input. I'd advise you to rewrite your grammar using DCG.
First, you probably want to lower-case S and A; Prolog uses initial caps for variable names.
One way to allow 'b' to occur once or not at all is to write something like this:
s --> a, b_optional, {c}.
b_optional --> [b].
b_optional --> [].
There is also syntax for writing the two rules for b_optional as a single rule, if you prefer; consult the chapter on definite clause grammars in your favorite Prolog text.
I don't know what you mean by c happening 0, 1, 2, ... times, so I don't think I can help you there.

Resources