Using list in Prolog DCG - prolog

I am trying to convert a Prolog predicate into DCG code. Even if I am familiar with grammar langage I have some troubles to understand how DCG works with lists and how I am supposed to use it.
Actually, this is my predicate :
cleanList([], []).
cleanList([H|L], [H|LL]) :-
number(H),
cleanList(L, LL),
!.
cleanList([_|L], LL) :-
cleanList(L, LL).
It is a simple predicate which removes non-numeric elements.
I would like to have the same behaviour writes in DCG.
I tried something like that (which does not work obviously) :
cleanList([]) --> [].
cleanList([H]) --> {number(H)}.
cleanList([H|T]) --> [H|T], {number(H)}, cleanList(T).
Is it possible to explain me what is wrong or what is missing ?
Thank you !

The purpose of DCG notation is exactly to hide, or better, make implicit, the tokens list. So, your code should look like
cleanList([]) --> [].
cleanList([H|T]) --> [H], {number(H)}, cleanList(T).
cleanList(L) --> [H], {\+number(H)}, cleanList(L).
that can be made more efficient:
cleanList([]) --> [].
cleanList([H|T]) --> [H], {number(H)}, !, cleanList(T).
cleanList(L) --> [_], cleanList(L).
A style note: Prologgers do prefers to avoid camels :)
clean_list([]) --> [].
etc...
Also, I would prefer more compact code:
clean_list([]) --> [].
clean_list(R) --> [H], {number(H) -> R = [H|T] ; R = T}, clean_list(T).

Related

I can't get my Prolog DCG working with atom concat

I can't get this Prolog DCG code working:
String1=" ",string_codes(String1,Codes),phrase(spaces(Output),Codes).
spaces(XXs) -->
[X], {X=32}, spaces(Xs),
{char_code(Ch,X), atom_concat(Ch,Xs,XXs)}, !. %%Space
spaces([]) --> [].
I feel like an improved solution would probably be something like this:
spaces(Spaces) --> " ", spaces(S0), { atom_concat(' ', S0, Spaces) }.
spaces('') --> [].
There's no real need to ask what the char is for code 32, you know it's a space. Also, [X], {X=32} from your answer is better as [32], which is still better as " ".
I solved this by changing [] in the base case to ''.
spaces(XXs) -->
[X], {X=32}, spaces(Xs),
{char_code(Ch,X), atom_concat(Ch,Xs,XXs)}, !. %% Space
spaces('') --> [].
String1 = " ",
Codes = [32, 32, 32],
Output = ' '.
If you are doing DCG and using SWI-Prolog there is a library of often used predicates and DCG clauses in dcgbasics. This can be used in code with
:- use_module(library(dcg/basics)).
To list the code for the predicates use listing/1, e.g.
?- listing(dcg_basics:_).
The library has a DCG clause blanks//0 that does what you want, e.g.
?- listing(dcg_basics:blanks).
blanks(A, B) :-
blank(A, C),
!,
D=C,
blanks(D, B).
blanks(A, A).
true.
?- listing(dcg_basics:blank).
blank([C|A], B) :-
nonvar(C),
code_type(C, space),
B=A.
true.
which as DCG is
blank -->
[C],
{
nonvar(C),
code_type(C,space)
}.
blanks -->
blank, !, blanks.
blanks --> [].
NB
The library version uses character codes and not characters.
?- string_codes("",Codes),phrase(blanks,Codes,Rest).
Codes = Rest, Rest = [].
?- string_codes(" ",Codes),phrase(blanks,Codes,Rest).
Codes = [32],
Rest = [].
?- string_codes(" ",Codes),phrase(blanks,Codes,Rest).
Codes = [32, 32],
Rest = [].
?- string_codes(" ",Codes),phrase(blanks,Codes,Rest).
Codes = [32, 32, 32],
Rest = [].

All substrings with same begin and end

I have to solve a homework but I have a very limited knowledge of Prolog. The task is the following:
Write a Prolog program which can list all of those substrings of a string, whose length is at least two character and the first and last character is the same.
For example:
?- sameend("teletubbies", R).
R = "telet";
R = "ele";
R = "eletubbie";
R = "etubbie";
R = "bb";
false.
My approach of this problem is that I should iterate over the string with head/tail and find the index of the next letter which is the same as the current (it satisfies the minimum 2-length requirement) and cut the substring with sub_string predicate.
This depends a bit on what you exactly mean by a string. Traditionally in Prolog, a string is a list of characters. To ensure that you really get those, use the directive below. See this answer for more.
:- set_prolog_flag(double_quotes, chars).
sameend(Xs, Ys) :-
phrase( ( ..., [C], seq(Zs), [C], ... ), Xs),
phrase( ( [C], seq(Zs), [C] ), Ys).
... --> [] | [_], ... .
seq([]) -->
[].
seq([E|Es]) -->
[E],
seq(Es).
if your Prolog has append/2 and last/2 in library(lists), it's easy as
sameend(S,[F|T]) :-
append([_,[F|T],_],S),last(T,F).

DCG version for maplist/3

The following meta-predicate is often useful. Note that it cannot be called maplist//2, because its expansion would collide with maplist/4.
maplistDCG(_P_2, []) -->
[].
maplistDCG(P_2, [A|As]) -->
{call(P_2, A, B)},
[B],
maplistDCG(P_2, As).
There are several issues here. Certainly the name. But also the terminal [B]: should it be explicitly disconnected from the connecting goal?
Without above definition, one has to write either one of the following - both having serious termination issues.
maplistDCG1(P_2, As) -->
{maplist(P_2, As, Bs)},
seq(Bs).
maplistDCG2(P_2, As) -->
seq(Bs),
{maplist(P_2, As, Bs)}.
seq([]) -->
[].
seq([E|Es]) -->
[E],
seq(Es).
Does {call(P_2,A,B)}, [B] have advantages over [B], {call(P_2,A,B)}?
(And, if so, shouldn't maplist/3 get something like that, too?)
Let's put corresponding dcg and non-dcg variants side-by-side1:
dcg [B],{call(P_2,A,B)} and non-dcg Bs0 = [B|Bs], call(P_2,A,B)
maplistDCG(_,[]) --> []. % maplist(_,[],[]).
maplistDCG(P_2,[A|As]) --> % maplist(P_2,[A|As],Bs0) :-
[B], % Bs0 = [B|Bs],
{call(P_2,A,B)}, % call(P_2,A,B),
maplistDCG(P_2,As). % maplist(P_2,As,Bs).
dcg {call(P_2,A,B)},[B] and non-dcg call(P_2,A,B), Bs0 = [B|Bs]
maplistDCG(_,[]) --> []. % maplist(_,[],[]).
maplistDCG(P_2,[A|As]) --> % maplist(P_2,[A|As],Bs0) :-
{call(P_2,A,B)}, % call(P_2,A,B),
[B], % Bs0 = [B|Bs],
maplistDCG(P_2,As). % maplist(P_2,As,Bs).
Above, we highlighted the goal ordering in use now:
by maplist/3, as defined in this answer on SO and in the Prolog prologue
by maplistDCG//2, as defined in this question by the OP
If we consider that ...
... termination properties need to be taken into account and ...
... dcg and non-dcg variants should better behave the same2 ...
... we find that the variable should not be explicitly disconnected from the connecting goal. The natural DCG analogue of maplist/3 is maplistDCG//2 defined as follows:
maplistDCG(_,[]) -->
[].
maplistDCG(P_2,[A|As]) -->
[B],
{call(P_2,A,B)},
maplistDCG(P_2,As).
Footnote 1: To emphasize commonalities, we adapted variable names, code layout, and made some unifications explicit.
Footnote 2: ... unless we have really good reasons for their divergent behaviour ...

In Prolog DCGs, how to remove over general solutions?

I have a text file containing a sequence. For example:
GGGGGGGGAACCCCCCCCCCTTGGGGGGGGGGGGGGGGAACCCCCCCCCCTTGGGGGGGG
I have wrote the following DCG to find the sequence between AA and TT.
:- use_module(library(pio)).
:- use_module(library(dcg/basics)).
:- portray_text(true).
process(Xs) :- phrase_from_file(find(Xs), 'string.txt').
anyseq([]) -->[].
anyseq([E|Es]) --> [E], anyseq(Es).
begin --> "AA".
end -->"TT".
find(Seq) -->
anyseq(_),begin,anyseq(Seq),end, anyseq(_).
I query and I get:
?- process(Xs).
Xs = "CCCCCCCCCC" ;
Xs = "CCCCCCCCCCTTGGGGGGGGGGGGG...CCCCC" ;
Xs = "CCCCCCCCCC" ;
false.
But I dont want it to find the second solution or ones like it. Only the solutions between one pair of AA and TTs not all combinations. I have a feeling I could use string_without and string in library dcg basiscs but I dont understand how to use them.
your anyseq//1 is identical to string//1 from library(dcg/basics), and shares the same 'problem'.
To keep in control, I would introduce a 'between separators' state:
elem(E) --> begin, string(E), end, !.
begin --> "AA".
end -->"TT".
find(Seq) -->
anyseq(_),elem(Seq).
anyseq([]) -->[].
anyseq([E|Es]) --> [E], anyseq(Es).
process(Xs) :-
phrase(find(Xs), `GGGGGGGGAACCCCCCCCCCTTGGGGGGGGGGGGGGGGAACCCCC+++CCCCCTTGGGGGGGG`,_).
now I get
?- process(X).
X = "CCCCCCCCCC" ;
X = "CCCCC+++CCCCC" ;
false.
note the anonymous var as last argument of phrase/3: it's needed to suit the change in 'control flow' induced by the more strict pattern used: elem//1 is not followed by anyseq//1, because any two sequences 'sharing' anyseq//1 would be problematic.
In the end, you should change your grammar to collect elem//1 with a right recursive grammar....
First, let me suggest that you most probably misrepresent the problem, at least if this is about mRNA-sequences. There, bases occur in triplets, or codons and the start is methionine or formlymethionine, but the end are three different triplets. So most probably you want to use such a representation.
The sequence in between might be defined using all_seq//2, if_/3, (=)/3:
mRNAseq(Cs) -->
[methionine],
all_seq(\C^maplist(dif(C),[amber,ochre,opal]), Cs),
( [amber] | [ochre] | [opal]).
or:
mRNAseq(Cs) -->
[methionine],
all_seq(list_without([amber,ochre,opal]), Cs),
( [amber] | [ochre] | [opal]).
list_without(Xs, E) :-
maplist(dif(E), Xs).
But back to your literal statement, and your question about declarative names. anyseq and seq mean essentially the same.
% :- set_prolog_flag(double_quotes, codes). % pick this
:- set_prolog_flag(double_quotes, chars). % or pick that
... --> [] | [_], ... .
seq([]) -->
[].
seq([E|Es]) -->
[E],
seq(Es).
mRNAcontent(Cs) -->
...,
"AA",
seq(Cs),
"TT",
{no_TT(Cs)}, % restriction
... .
no_TT([]).
no_TT([E|Es0]) :-
if([E] = "T",
( Es0 = [F|Es], dif([F],"T") ),
Es0 = Es),
no_TT(Es).
The meaning of no_TT/1 is: There is no sequence "TT" in the list, nor a "T" at then end. So no_TT("T") fails as well, for it might collide with the subsequent "TT"!
So why is it a good idea to use pure, monotonic definitions? You will most probably be tempted to add restrictions. In a pure monotonic form, restrictions are harmless. But in the procedural version suggested in another answer, you will get simply different results that are no restrictions at all.

Searching Prolog structures

I'm interested in formulae made up from lots of conjunctions (part of a larger problem). I want to write a program that takes something like this:
:- get_params(conj(conj(a,b),c),X)
and returns a list of all the parameters of the conjunctions i.e. X=[a,b,c]. At the moment I can do
:- get_params(conj(a,b),X) to get X=[a,b]
using simple Prolog pattern matching but how would you go about doing things such as
:- get_params(conj(conj(a,b),c),X) to get X=[a,b,c]
It seems really simple but I've been struggling all day!
Since you are describing a list, consider using DCG notation:
params(conj(A,B)) --> !, params(A), params(B).
params(X) --> [X].
Example:
?- phrase(params(conj(conj(a,b),c)), Ps).
Ps = [a, b, c].
Assuming that all conj functors are binary:
get_params(X, Y, L) :-
get_params(X, L1),
get_params(Y, L2),
append(L1, L2, L).
get_params(conj(X, Y), L) :-
get_params(X, Y, L), !.
get_params(A, [A]).

Resources