I can't get my Prolog DCG working with atom_concat

I can't get this Prolog DCG code working:
String1="   ",string_codes(String1,Codes),phrase(spaces(Output),Codes).
spaces(XXs) -->
[X], {X=32}, spaces(Xs),
{char_code(Ch,X), atom_concat(Ch,Xs,XXs)}, !. %%Space
spaces([]) --> [].

I feel like an improved solution would probably be something like this:
spaces(Spaces) --> " ", spaces(S0), { atom_concat(' ', S0, Spaces) }.
spaces('') --> [].
There's no real need to ask what the char is for code 32, you know it's a space. Also, [X], {X=32} from your answer is better as [32], which is still better as " ".

I solved this by changing [] in the base case to ''.
spaces(XXs) -->
[X], {X=32}, spaces(Xs),
{char_code(Ch,X), atom_concat(Ch,Xs,XXs)}, !. %% Space
spaces('') --> [].
String1 = "   ",
Codes = [32, 32, 32],
Output = '   '.
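As a variation on the fix above, the matched codes could be collected in a list and converted to an atom once at the end, which avoids calling atom_concat/3 at every step. This is only a sketch: the names space_codes//1 and spaces_atom//1 are made up for illustration and do not appear in the original code.

```prolog
% space_codes(-Codes): match zero or more spaces (code 32), longest match first.
space_codes([32|Cs]) --> [32], space_codes(Cs).
space_codes([])      --> [].

% spaces_atom(-Atom): the matched run of spaces as a single atom.
spaces_atom(Atom) --> space_codes(Cs), { atom_codes(Atom, Cs) }.
```

A query such as `?- phrase(spaces_atom(A), [32,32,32]).` should bind A to an atom made of three spaces, and the empty input should give `''`.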

If you are writing DCGs in SWI-Prolog, library(dcg/basics) provides many commonly used predicates and DCG rules. Load it with
:- use_module(library(dcg/basics)).
To list the code for the predicates use listing/1, e.g.
?- listing(dcg_basics:_).
The library has a DCG clause blanks//0 that does what you want, e.g.
?- listing(dcg_basics:blanks).
blanks(A, B) :-
blank(A, C),
!,
D=C,
blanks(D, B).
blanks(A, A).
true.
?- listing(dcg_basics:blank).
blank([C|A], B) :-
nonvar(C),
code_type(C, space),
B=A.
true.
which as DCG is
blank -->
[C],
{
nonvar(C),
code_type(C,space)
}.
blanks -->
blank, !, blanks.
blanks --> [].
NB
The library version uses character codes and not characters.
?- string_codes("",Codes),phrase(blanks,Codes,Rest).
Codes = Rest, Rest = [].
?- string_codes(" ",Codes),phrase(blanks,Codes,Rest).
Codes = [32],
Rest = [].
?- string_codes("  ",Codes),phrase(blanks,Codes,Rest).
Codes = [32, 32],
Rest = [].
?- string_codes("   ",Codes),phrase(blanks,Codes,Rest).
Codes = [32, 32, 32],
Rest = [].
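Note that blanks//0 only skips white space; unlike spaces//1 in the question, it does not return what it matched. A small sketch of using it to strip leading white space from a code list follows. The helper name skip_leading_blanks/2 is an assumption made for this example, not part of the library.

```prolog
:- use_module(library(dcg/basics)).   % blanks//0 (SWI-Prolog)

% skip_leading_blanks(+Codes, -Rest): Rest is Codes without its leading white space.
skip_leading_blanks(Codes, Rest) :-
    phrase(blanks, Codes, Rest).
```

For example, `?- atom_codes('  ab', Cs), skip_leading_blanks(Cs, R), atom_codes(A, R).` should give `A = ab`.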

Related

All substrings with same begin and end

I have to solve a homework but I have a very limited knowledge of Prolog. The task is the following:
Write a Prolog program which can list all of those substrings of a string, whose length is at least two character and the first and last character is the same.
For example:
?- sameend("teletubbies", R).
R = "telet";
R = "ele";
R = "eletubbie";
R = "etubbie";
R = "bb";
false.
My approach of this problem is that I should iterate over the string with head/tail and find the index of the next letter which is the same as the current (it satisfies the minimum 2-length requirement) and cut the substring with sub_string predicate.
This depends a bit on what you exactly mean by a string. Traditionally in Prolog, a string is a list of characters. To ensure that you really get those, use the directive below. See this answer for more.
:- set_prolog_flag(double_quotes, chars).
sameend(Xs, Ys) :-
phrase( ( ..., [C], seq(Zs), [C], ... ), Xs),
phrase( ( [C], seq(Zs), [C] ), Ys).
... --> [] | [_], ... .
seq([]) -->
[].
seq([E|Es]) -->
[E],
seq(Es).
If your Prolog has append/2 and last/2 in library(lists), it's as easy as
sameend(S,[F|T]) :-
append([_,[F|T],_],S),last(T,F).
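If the input is an atom rather than a list, the approach sketched in the question (cutting out substrings and comparing their ends) can be written with the ISO built-in sub_atom/5. This is a sketch under that assumption; sameend_atom/2 is a hypothetical name.

```prolog
% sameend_atom(+Atom, -Sub): Sub is a sub-atom of Atom that is at least two
% characters long and whose first and last characters are the same.
sameend_atom(Atom, Sub) :-
    sub_atom(Atom, _, Len, _, Sub),   % enumerate every sub-atom
    Len >= 2,
    sub_atom(Sub, 0, 1, _, C),        % first character
    sub_atom(Sub, _, 1, 0, C).        % last character must be the same
```

With `?- sameend_atom(teletubbies, R).` this should enumerate telet, ele, eletubbie, etubbie and bb on backtracking.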

In Prolog DCGs, how to remove over general solutions?

I have a text file containing a sequence. For example:
GGGGGGGGAACCCCCCCCCCTTGGGGGGGGGGGGGGGGAACCCCCCCCCCTTGGGGGGGG
I have written the following DCG to find the sequence between AA and TT.
:- use_module(library(pio)).
:- use_module(library(dcg/basics)).
:- portray_text(true).
process(Xs) :- phrase_from_file(find(Xs), 'string.txt').
anyseq([]) -->[].
anyseq([E|Es]) --> [E], anyseq(Es).
begin --> "AA".
end -->"TT".
find(Seq) -->
anyseq(_),begin,anyseq(Seq),end, anyseq(_).
I query and I get:
?- process(Xs).
Xs = "CCCCCCCCCC" ;
Xs = "CCCCCCCCCCTTGGGGGGGGGGGGG...CCCCC" ;
Xs = "CCCCCCCCCC" ;
false.
But I don't want it to find the second solution or ones like it: only the sequences between a single pair of AA and TT, not all combinations. I have a feeling I could use string_without//2 and string//1 from library(dcg/basics), but I don't understand how to use them.
Your anyseq//1 is identical to string//1 from library(dcg/basics), and shares the same 'problem'.
To keep in control, I would introduce a 'between separators' state:
elem(E) --> begin, string(E), end, !.
begin --> "AA".
end -->"TT".
find(Seq) -->
anyseq(_),elem(Seq).
anyseq([]) -->[].
anyseq([E|Es]) --> [E], anyseq(Es).
process(Xs) :-
phrase(find(Xs), `GGGGGGGGAACCCCCCCCCCTTGGGGGGGGGGGGGGGGAACCCCC+++CCCCCTTGGGGGGGG`,_).
Now I get
?- process(X).
X = "CCCCCCCCCC" ;
X = "CCCCC+++CCCCC" ;
false.
Note the anonymous variable as the last argument of phrase/3: it is needed because of the change in 'control flow' induced by the stricter pattern. elem//1 is not followed by anyseq//1, because any two sequences 'sharing' anyseq//1 would be problematic.
In the end, you should change your grammar to collect elem//1 with a right-recursive grammar.
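One way such a right-recursive collection might look is sketched below. It assumes SWI-Prolog (string//1 from library(dcg/basics), and string literals in DCG bodies being treated as code lists); elems//1 is a name invented for this example.

```prolog
:- use_module(library(dcg/basics)).   % string//1

begin --> "AA".
end   --> "TT".

% elem(-E): one AA...TT block, committed to the shortest match.
elem(E) --> begin, string(E), end, !.

% elems(-Es): every AA...TT block in the input, collected left to right.
elems([E|Es]) --> elem(E), !, elems(Es).
elems(Es)     --> [_], !, elems(Es).     % skip a code that starts no block
elems([])     --> [].
```

A query like `?- atom_codes('GGAACCTTGGAAC+CTTGG', Cs), phrase(elems(Es), Cs).` should yield two code lists, corresponding to CC and C+C. Because of the cuts, this version commits to the first match and is not a pure relation.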
First, let me suggest that you are most probably misrepresenting the problem, at least if this is about mRNA sequences. There, bases occur in triplets, or codons; the start is methionine or formylmethionine, but the end is one of three different triplets. So most probably you want to use such a representation.
The sequence in between might be defined using all_seq//2, if_/3, (=)/3:
mRNAseq(Cs) -->
[methionine],
all_seq(\C^maplist(dif(C),[amber,ochre,opal]), Cs),
( [amber] | [ochre] | [opal]).
or:
mRNAseq(Cs) -->
[methionine],
all_seq(list_without([amber,ochre,opal]), Cs),
( [amber] | [ochre] | [opal]).
list_without(Xs, E) :-
maplist(dif(E), Xs).
But back to your literal statement, and your question about declarative names. anyseq and seq mean essentially the same.
% :- set_prolog_flag(double_quotes, codes). % pick this
:- set_prolog_flag(double_quotes, chars). % or pick that
... --> [] | [_], ... .
seq([]) -->
[].
seq([E|Es]) -->
[E],
seq(Es).
mRNAcontent(Cs) -->
...,
"AA",
seq(Cs),
"TT",
{no_TT(Cs)}, % restriction
... .
no_TT([]).
no_TT([E|Es0]) :-
if_([E] = "T",
( Es0 = [F|Es], dif([F],"T") ),
Es0 = Es),
no_TT(Es).
The meaning of no_TT/1 is: there is no sequence "TT" in the list, nor a "T" at the end. So no_TT("T") fails as well, for it might collide with the subsequent "TT"!
So why is it a good idea to use pure, monotonic definitions? You will most probably be tempted to add restrictions. In a pure, monotonic form, restrictions are harmless. But in the procedural version suggested in another answer, you will simply get different results that are not restrictions at all.

About a Prolog tokenizer

One of my assignments asks us to build a Prolog tokenizer. So far I have written a predicate that can change spaces and tabs into newlines, but I don't know how to integrate it into the main program.
The replace part looks like this:
replace(_, _, [], []).
replace(O, R, [O|T], [R|T2]):- replace(O, R, T, T2).
replace(O, R, [H|T], [H|T2]) :- H \= O, replace(O, R, T, T2).
And the main part has a predicate called removewhite(List1, List2).
So how can I make removewhite call replace?
You are a bit off track for a tokenizer: removewhite/2 isn't going to buy you any useful functionality. Instead, consider a DCG (provided your Prolog offers this functionality):
tokenize(String, Tokens) :- phrase(tokenize(Tokens), String).
tokenize([]) --> [].
tokenize(Tokens) --> skip_spaces, tokenize(Tokens).
tokenize([Number|Tokens]) --> number(Number), tokenize(Tokens).
skip_spaces --> code_types(white, [_|_]).
number(N) --> code_types(digit, [C|Cs]), {number_codes(N,[C|Cs])}.
code_types(Type, [C|Cs]) --> [C], {code_type(C,Type)}, !, code_types(Type, Cs).
code_types(_, []) --> [].
Despite its simplicity, this is a fairly efficient scanner that is easy to extend.
In SWI-Prolog, which has (non-ISO-compliant) extensions for efficient handling of strings, this can be called from the top level like:
?- tokenize(`123 4 567 `, L).
L = [123, 4, 567]
or
?- atom_codes('123  4 567 ',Cs), tokenize(Cs, L).
Cs = [49, 50, 51, 32, 32, 52, 32, 53, 54|...],
L = [123, 4, 567]
Btw, in SWI-Prolog, number//1 is predefined (with much more functionality, of course) in library(dcg/basics).
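To illustrate the claim that the scanner is easy to extend, here is a sketch that adds a second token type for words. The rule word//1 and the extra tokenize//1 clause are additions made for this example; csymf is the SWI-Prolog code_type/2 class for a letter or underscore. The rest repeats the scanner above so the block can be loaded on its own.

```prolog
tokenize(String, Tokens) :- phrase(tokenize(Tokens), String).

tokenize([]) --> [].
tokenize(Tokens) --> skip_spaces, tokenize(Tokens).
tokenize([Number|Tokens]) --> number(Number), tokenize(Tokens).
tokenize([Word|Tokens]) --> word(Word), tokenize(Tokens).        % added clause

skip_spaces --> code_types(white, [_|_]).
number(N) --> code_types(digit, [C|Cs]), {number_codes(N,[C|Cs])}.
word(W) --> code_types(csymf, [C|Cs]), {atom_codes(W,[C|Cs])}.   % added rule

% code_types(+Type, -Codes): longest run of codes of the given type.
code_types(Type, [C|Cs]) --> [C], {code_type(C,Type)}, !, code_types(Type, Cs).
code_types(_, []) --> [].
```

With this, `?- atom_codes('12 abc 7', Cs), tokenize(Cs, L).` should give `L = [12, abc, 7]`. Do not load library(dcg/basics) alongside this sketch, since its number//1 would clash with the local one.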
Anyway, about your question
how can I let removewhite execute replace?
I feel you're really barking up the wrong tree: removing a space - that actually is a separator - will mess up your input...
You can write a more powerful predicate
replace_all(_, _, [], []).
replace_all(L, R, [X|T], [R|T2]):-
member(X, L),
replace_all(L, R, T, T2).
replace_all(L, R, [X|T], [X|T2]) :-
\+ member(X, L),
replace_all(L, R, T, T2).
Then, you will have
removewhite(List1, List2) :-
replace_all([' ', '\t'], '\n', List1, List2).

Replacing white spaces in prolog

Is it possible in prolog to replace all white spaces of a string with some given character?
Example-
If I have a variable How are you today? and I want How_are_you_today?
For atoms
There are many ways in which this can be done. I find the following particularly simple, using atomic_list_concat/3:
?- atomic_list_concat(Words, ' ', 'How are you today?'), atomic_list_concat(Words, '_', Result).
Words = ['How', are, you, 'today?'],
Result = 'How_are_you_today?'.
For SWI strings
The above can also be done with SWI strings. Unfortunately, there is no string_list_concat/3 which would have made the conversion trivial. split_string/4 is very versatile, but it only does half of the job:
?- split_string("How are you today?", " ", "", Words).
Words = ["How", "are", "you", "today?"].
We can either define string_list_concat/3 ourselves (a first attempt at defining this is shown below) or we need a slightly different approach, e.g. repeated string_concat/3.
string_list_concat(Strings, Separator, String):-
var(String), !,
maplist(atom_string, [Separator0|Atoms], [Separator|Strings]),
atomic_list_concat(Atoms, Separator0, Atom),
atom_string(Atom, String).
string_list_concat(Strings, Separator, String):-
maplist(atom_string, [Separator0,Atom], [Separator,String]),
atomic_list_concat(Atoms, Separator0, Atom),
maplist(atom_string, Atoms, Strings).
And then:
?- string_list_concat(Words, " ", "How are you today?"), string_list_concat(Words, "_", Result).
Words = ["How", "are", "you", "today?"],
Result = "How_are_you_today?".
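A shorter route, sketched here under the assumption that SWI-Prolog 7 or later is used, is to let atomic_list_concat/3 join the pieces returned by split_string/4 and convert the resulting atom back to a string. The predicate name spaces_to_underscores/2 is made up for this example.

```prolog
% spaces_to_underscores(+Text, -String): String is Text with every space
% replaced by an underscore. Text may be an atom or an SWI-Prolog string.
spaces_to_underscores(Text, String) :-
    split_string(Text, " ", "", Parts),      % split at each space
    atomic_list_concat(Parts, '_', Atom),    % join the parts with '_'
    atom_string(Atom, String).
```

For instance, `?- spaces_to_underscores("How are you today?", S).` should give `S = "How_are_you_today?"`.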
It all depends on what you mean by a string. SWI has several of them: some are generally available in any common Prolog and conform to the ISO standard, and some are specific to SWI and do not conform. So, let's start with those that are generally available:
### Strings as lists of character codes (integers representing code points)
This representation is often the default; prior to SWI 7 it was the default in SWI, too. The biggest downside is that a list of arbitrary integers can be confused with text.
:- set_prolog_flag(double_quotes, codes).
codes_replaced(Xs, Ys) :-
maplist(space_repl, Xs, Ys).
space_repl(0' ,0'_).
space_repl(C, C) :- dif(C,0' ).
?- codes_replaced("Spaces  !", R).
R = [83,112,97,99,101,115,95,95,33]
; false.
### Strings as lists of characters (atoms of length 1)
This representation is a bit cleaner since it does not confuse integers with characters; see this reply for how to get more compact answers.
:- set_prolog_flag(double_quotes, chars).
chars_replaced(Xs, Ys) :-
maplist(space_replc, Xs, Ys).
space_replc(' ','_').
space_replc(C, C) :- dif(C,' ').
?- chars_replaced("Spaces  !", R).
R = ['S',p,a,c,e,s,'_','_',!]
; false.
### Strings as atoms
@WouterBeek already showed you how this can be done with an SWI-specific built-in. I will reuse the definition above:
atom_replaced(A, R) :-
atom_chars(A, Chs),
chars_replaced(Chs, Rs),
atom_chars(R, Rs).
?- atom_replaced('Spaces  !',R).
R = 'Spaces__!'
; false.
So far everything applies to ISO Prolog.
### Strings as an SWI-specific, non-conforming data type
This version does not work in any other system. I mention it for completeness.
SWI-Prolog DCGs allow an easy definition, using a 'push back' or lookahead argument:
?- phrase(rep_string(` `, `_`), `How are you`, R),atom_codes(A,R).
R = [72, 111, 119, 95, 97, 114, 101, 95, 121|...],
A = 'How_are_you'
The definition is
rep_string(Sought, Replace), Replace --> Sought, rep_string(Sought, Replace).
rep_string(Sought, Replace), [C] --> [C], rep_string(Sought, Replace).
rep_string(_, _) --> [].
Edit: to avoid multiple 'solutions', one possibility is
rep_string(Sought, Replace), Replace --> Sought, !, rep_string(Sought, Replace).
rep_string(Sought, Replace), [C] --> [C], !, rep_string(Sought, Replace).
rep_string(_, _) --> [].

Delete vowels in a list

Write a program that deletes vowels (String, NoVowelsString) that deletes all vowels from a given string.
So far I've got the condition vowel(X):- member(X,[a,e,i,o,u]). Then I thought of the one that deletes all the elements from the other list:
delete2([],L1,L1).
delete2([H|T],L1,L3) :-
delete2(H,L1,R2),
delete2(T,R2,L3).
So having these two I thought that I could put a condition to those elements being deleted that they have to be a member of [a,e,i,o,u]. Though I still haven't got anywhere.
The following is based on the reification of term equality/inequality.
First, we define list_memberd_t/3, which behaves just like memberd_truth/3 but has a different argument order:
list_memberd_t([] ,_,false).
list_memberd_t([Y|Ys],X,Truth) :-
if_(X=Y, Truth=true, list_memberd_t(Ys,X,Truth)).
list_memberd_truth(Xs,X,Truth) :- list_memberd_t(Xs,X,Truth).
For the sake of brevity, let's define memberd_t/3 based on list_memberd_t/3:
memberd_t(X,Xs,Truth) :- list_memberd_t(Xs,X,Truth).
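The clauses above call if_/3 together with the reified equality (=)/3, neither of which is defined in this answer; they are usually taken from library(reif). As a rough sketch, assuming the commonly published definitions (check the library itself before relying on this), they look roughly like the following. dif/2 is assumed to be available, as it is in SWI-Prolog.

```prolog
% if_(If_1, Then_0, Else_0): call If_1 with one extra truth-value argument
% and run Then_0 or Else_0 depending on whether it is true or false.
if_(If_1, Then_0, Else_0) :-
    call(If_1, T),
    (   T == true  -> call(Then_0)
    ;   T == false -> call(Else_0)
    ;   nonvar(T)  -> throw(error(type_error(boolean, T), _))
    ;   throw(error(instantiation_error, _))
    ).

% =(X, Y, T): T is true if X and Y are equal, false if they are different.
=(X, Y, T) :-
    (   X == Y -> T = true
    ;   X \= Y -> T = false
    ;   T = true,  X = Y          % undecided: either make them equal ...
    ;   T = false, dif(X, Y)      % ... or constrain them to differ
    ).
```

With these loaded, `?- if_(a = a, R = yes, R = no).` should give `R = yes`, and the same query with `a = b` should give `R = no`.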
As a parallel to library(apply), let's define tinclude/3:
:- meta_predicate tinclude(2,?,?).
tinclude(P_2,Xs,Zs) :-
list_tinclude_list(Xs,P_2,Zs).
list_tinclude_list([], _P_2,[]).
list_tinclude_list([E|Es],P_2,Fs0) :-
if_(call(P_2,E), Fs0 = [E|Fs], Fs0 = Fs),
list_tinclude_list(Es,P_2,Fs).
tfilter/3 is another name for tinclude/3:
tfilter(P_2,As,Bs) :-
tinclude(P_2,As,Bs).
Next, we define the meta-predicate texclude/3, the opposite of tinclude/3:
:- meta_predicate texclude(2,?,?).
texclude(P_2,Xs,Zs) :-
list_texclude_list(Xs,P_2,Zs).
list_texclude_list([],_,[]).
list_texclude_list([E|Es],P_2,Fs0) :-
if_(call(P_2,E), Fs0 = Fs, Fs0 = [E|Fs]),
list_texclude_list(Es,P_2,Fs).
Now let's use them together!
?- texclude(list_memberd_truth([a,e,i,o,u]),
[d,e,l,e,t,e,' ',v,o,w,e,l,s,' ',i,n,' ',a,' ',l,i,s,t], Filtered).
Filtered = [d, l, t, ' ',v, w, l,s,' ', n,' ', ' ',l, s,t].
Edit
As an alternative to using above texclude/3, let's use tinclude/3 with an auxiliary predicate not/3 to flip the truth value:
:- meta_predicate not(2,?,?).
not(P_2,X,Truth) :-
call(P_2,X,Truth0),
truth_flipped(Truth0,Truth).
truth_flipped(true,false).
truth_flipped(false,true).
Sample query:
?- tinclude(not(list_memberd_truth([a,e,i,o,u])),
[d,e,l,e,t,e,' ',v,o,w,e,l,s,' ',i,n,' ',a,' ',l,i,s,t], Filtered).
Filtered = [d, l, t, ' ',v, w, l,s,' ', n,' ', ' ',l, s,t].
Here is a solution using a DCG. Note how the 'output' is obtained (no argument passing, only difference lists):
novowels --> ("a";"e";"i";"o";"u"), !, novowels.
% or ..
% novowels --> [C], {memberchk(C, "aeiou")}, !, novowels.
novowels, [C] --> [C], !, novowels.
novowels --> [].
I must confess I don't like the second cut, but it seems to be required.
test:
?- phrase(novowels, "abcdefghilmnopq", L),format('~s',[L]).
bcdfghlmnpq
L = [98, 99, 100, 102, 103, 104, 108, 109, 110|...].
Edit: about the second cut, it seems to be required by the 'left hand' (pushback) notation. If I code it with an argument, without the cut, I get a correct parse:
novowels(Cs) --> ("a";"e";"i";"o";"u"), !, novowels(Cs).
% novowels(Cs) --> [C], {memberchk(C, "aeiou")}, !, novowels(Cs).
novowels([C|Cs]) --> [C], novowels(Cs).
novowels([]) --> [].
test:
?- phrase(novowels(L), "abcdefghilmnopq"),format('~s',[L]).
bcdfghlmnpq
L = [98, 99, 100, 102, 103, 104, 108, 109, 110|...] ;
false.
I wonder if this is a bug of the DCG translator, or (more probably) my fault...
Here is the code
deleteV([H|T],R):-member(H,[a,e,i,o,u]),deleteV(T,R),!.
deleteV([H|T],[H|R]):-deleteV(T,R),!.
deleteV([],[]).
What does it do?
First it asks: is the head a vowel?
Yes -> we ignore it.
No -> we keep it.
When it reaches the empty list, it starts constructing the result list, and as the recursion unwinds it puts the consonants in front.
This code was tested in SWI-Prolog.
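The same filtering can also be expressed with exclude/3 from library(apply), reusing the vowel/1 test from the question. This is a sketch assuming SWI-Prolog; delete_vowels/2 and delete_vowels_atom/2 are names chosen for this example. Like deleteV/2, it is meant for fully instantiated input.

```prolog
:- use_module(library(apply)).   % exclude/3

vowel(X) :- memberchk(X, [a,e,i,o,u]).

% delete_vowels(+Chars, -NoVowels): NoVowels is Chars without the vowels.
delete_vowels(Chars, NoVowels) :-
    exclude(vowel, Chars, NoVowels).

% delete_vowels_atom(+Atom, -Result): the same, for atoms.
delete_vowels_atom(Atom, Result) :-
    atom_chars(Atom, Chars),
    delete_vowels(Chars, NoVowels),
    atom_chars(Result, NoVowels).
```

For example, `?- delete_vowels_atom('delete vowels', R).` should give `R = 'dlt vwls'`.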
