All substrings with same begin and end - prolog

I have to solve a homework assignment, but I have very limited knowledge of Prolog. The task is the following:
Write a Prolog program which can list all those substrings of a string that are at least two characters long and whose first and last characters are the same.
For example:
?- sameend("teletubbies", R).
R = "telet";
R = "ele";
R = "eletubbie";
R = "etubbie";
R = "bb";
false.
My approach to this problem is to iterate over the string with head/tail, find the index of the next letter that is the same as the current one (which satisfies the minimum length of two), and cut out the substring with the sub_string predicate.
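The approach described above can be sketched with SWI-Prolog strings and sub_string/5. This is a minimal illustration under assumptions, not one of the answers below: sameend_str/2 is a hypothetical name, and instead of searching for the next matching letter explicitly, it lets sub_string/5 enumerate all substrings and keeps those of length two or more whose first and last characters agree.

```prolog
% sameend_str(+Str, -Sub): Sub is a substring of the SWI-Prolog string Str
% that is at least two characters long and whose first and last characters
% are equal. Hypothetical helper name; relies on SWI-Prolog's sub_string/5.
sameend_str(Str, Sub) :-
    sub_string(Str, _Before, Len, _After, Sub),  % enumerate every substring
    Len >= 2,                                    % at least two characters
    sub_string(Sub, 0, 1, _, C),                 % C is the first character
    sub_string(Sub, _, 1, 0, C).                 % the last character must be C
```

Because sub_string/5 enumerates substrings by start position first, the answers for "teletubbies" should come out in the same order as in the example above.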

This depends a bit on what exactly you mean by a string. Traditionally in Prolog, a string is a list of characters. To ensure that you really get those, use the directive below. See this answer for more.
:- set_prolog_flag(double_quotes, chars).
sameend(Xs, Ys) :-
   phrase( ( ..., [C], seq(Zs), [C], ... ), Xs),
   phrase( ( [C], seq(Zs), [C] ), Ys).
... --> [] | [_], ... .
seq([]) -->
   [].
seq([E|Es]) -->
   [E],
   seq(Es).

If your Prolog has append/2 and last/2 in library(lists), it's as easy as:
sameend(S,[F|T]) :-
   append([_,[F|T],_],S),
   last(T,F).


Delete 2nd and 3rd occurrence of an element

I want to write a program that, given a list L in which an element X appears three times, returns a list NL that contains X only once.
For example, the query
?- erase([1,2,3,1,6,1,7],1,NL).
should return
NL = [1,2,3,6,7] or NL = [2,3,1,6,7] or NL = [2,3,6,1,7]
P.S.
Assume that the given list does not contain any element exactly two times, or four or more times.
This is my code, but it returns false when I run a query. Any suggestion to correct it would be appreciated.
erase([],_,[]).
erase(L,X,NL):-
   append(A,[X,B,X,C,X,D],L),
   append(A,[X,B,C,D],NL).
So you say that the following query should succeed, but it fails:
?- erase([1,2,3,1,6,1,7],1,NL).
false.
Even the following generalization fails:
?- erase([1,2,3,1,6,1,7],E,NL).
false.
Let me reformulate this for easier access:
?- L = [1,2,3,1,6,1,7], erase(L,E,NL).
false.
So we now have to generalize that list even further. I could try this element by element, but I prefer to start with:
?- L = [_,_,_,_,_,_,_], erase(L,E,NL).
L = [_A,E,_B,E,_C,E,_D], NL = [_A,E,_B,_C,_D]
; false.
This is the only answer. It tells us that E has to occur exactly at the 2nd, 4th and 6th positions. Let's check whether that is true:
?- erase([0,1,0,1,0,1,0],1,NL).
NL = [0,1,0,0,0]
; false.
So your solution works, but only sometimes. It seems that you rather want:
erase(L, X, NL) :-
   phrase(
      ( seq(Any1), [X], seq(Any2), [X], seq(Any3), [X], seq(Any4) ), L),
   phrase(
      ( seq(Any1), seq(Any2), seq(Any3), [X], seq(Any4) ), NL).
seq([]) --> [].
seq([E|Es]) --> [E], seq(Es).
append/2 helps a lot when processing multiple lists:
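erase(L,E,R) :-
   append([A,[E],B,[E],C,[E],D],L),   % split L around the three occurrences of E
   select([E],[X,Y,Z],[[],[]]),       % exactly one of X, Y, Z is [E], the others are []
   append([A, X, B, Y, C, Z, D],R).   % rebuild the list, keeping one E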

In Prolog DCGs, how to remove over general solutions?

I have a text file containing a sequence. For example:
GGGGGGGGAACCCCCCCCCCTTGGGGGGGGGGGGGGGGAACCCCCCCCCCTTGGGGGGGG
I have written the following DCG to find the sequence between AA and TT.
:- use_module(library(pio)).
:- use_module(library(dcg/basics)).
:- portray_text(true).
process(Xs) :- phrase_from_file(find(Xs), 'string.txt').
anyseq([]) -->[].
anyseq([E|Es]) --> [E], anyseq(Es).
begin --> "AA".
end -->"TT".
find(Seq) -->
   anyseq(_),begin,anyseq(Seq),end, anyseq(_).
When I run the query, I get:
?- process(Xs).
Xs = "CCCCCCCCCC" ;
Xs = "CCCCCCCCCCTTGGGGGGGGGGGGG...CCCCC" ;
Xs = "CCCCCCCCCC" ;
false.
But I don't want it to find the second solution or ones like it, only the solutions between a single pair of AA and TT, not all combinations. I have a feeling I could use string_without and string from library(dcg/basics), but I don't understand how to use them.
Your anyseq//1 is identical to string//1 from library(dcg/basics), and shares the same 'problem'.
To keep control, I would introduce a 'between separators' state:
:- use_module(library(dcg/basics)).  % for string//1
elem(E) --> begin, string(E), end, !.
begin --> "AA".
end -->"TT".
find(Seq) -->
   anyseq(_),elem(Seq).
anyseq([]) -->[].
anyseq([E|Es]) --> [E], anyseq(Es).
process(Xs) :-
   phrase(find(Xs), `GGGGGGGGAACCCCCCCCCCTTGGGGGGGGGGGGGGGGAACCCCC+++CCCCCTTGGGGGGGG`,_).
Now I get:
?- process(X).
X = "CCCCCCCCCC" ;
X = "CCCCC+++CCCCC" ;
false.
Note the anonymous variable as the last argument of phrase/3: it is needed to accommodate the change in 'control flow' induced by the stricter pattern: elem//1 is not followed by anyseq//1, because any two sequences 'sharing' anyseq//1 would be problematic.
In the end, you should change your grammar to collect elem//1 with a right-recursive grammar.
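The question asks how string_without//2 from library(dcg/basics) could help. A minimal sketch, assuming the part between AA and TT never contains a T at all (a stronger restriction than just forbidding TT, so a sequence such as AACTCTT would not be found); find_between//1 and all_between/2 are hypothetical names:

```prolog
:- use_module(library(dcg/basics)).   % provides string//1 and string_without//2

% find_between(-Seq): skip any prefix, match AA, take the longest run of
% codes that contains no T, then require TT. The rest of the input is left
% to phrase/3.
find_between(Seq) -->
    string(_),
    `AA`,
    string_without(`T`, Seq),
    `TT`.

% all_between(+Codes, -Atoms): collect every enclosed sequence as an atom.
all_between(Codes, Atoms) :-
    findall(A,
            ( phrase(find_between(Seq), Codes, _Rest),
              atom_codes(A, Seq) ),
            Atoms).
```

Since string_without//2 is greedy and stops at the first T, each AA yields at most one answer, which avoids the "spanning" solutions from the question.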
First, let me suggest that you are most probably misrepresenting the problem, at least if this is about mRNA sequences. There, bases occur in triplets, or codons; the start codon encodes methionine or formylmethionine, but there are three different stop codons. So most probably you want to use such a representation.
The sequence in between might be defined using all_seq//2, if_/3, (=)/3:
mRNAseq(Cs) -->
   [methionine],
   all_seq(\C^maplist(dif(C),[amber,ochre,opal]), Cs),
   ( [amber] | [ochre] | [opal]).
or:
mRNAseq(Cs) -->
   [methionine],
   all_seq(list_without([amber,ochre,opal]), Cs),
   ( [amber] | [ochre] | [opal]).
list_without(Xs, E) :-
   maplist(dif(E), Xs).
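all_seq//2 is used above but not defined in this answer. One plausible definition, assumed here to mirror the allseq//2 shown in a later answer on this page, together with the second variant of mRNAseq//1 (amino acid names such as leucine are just illustrative atoms):

```prolog
% all_seq(P_1, Es): Es is a sequence of elements that all satisfy P_1.
% Hypothetical definition; the answer above does not define all_seq//2.
all_seq(_P_1, []) --> [].
all_seq(P_1, [E|Es]) -->
    [E],
    { call(P_1, E) },
    all_seq(P_1, Es).

% list_without(Xs, E): E is different from every element of Xs.
list_without(Xs, E) :-
    maplist(dif(E), Xs).

% Start codon, then any codons that are not stop codons, then a stop codon.
mRNAseq(Cs) -->
    [methionine],
    all_seq(list_without([amber,ochre,opal]), Cs),
    ( [amber] | [ochre] | [opal] ).
```

For example, phrase(mRNAseq(Cs), [methionine,leucine,serine,opal]) should give Cs = [leucine,serine], while a stop codon in the middle of the content makes the whole phrase fail.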
But back to your literal statement, and your question about declarative names. anyseq and seq mean essentially the same.
% :- set_prolog_flag(double_quotes, codes). % pick this
:- set_prolog_flag(double_quotes, chars). % or pick that
... --> [] | [_], ... .
seq([]) -->
   [].
seq([E|Es]) -->
   [E],
   seq(Es).
mRNAcontent(Cs) -->
   ...,
   "AA",
   seq(Cs),
   "TT",
   {no_TT(Cs)}, % restriction
   ... .
no_TT([]).
no_TT([E|Es0]) :-
   if_([E] = "T",              % if_/3 and (=)/3 as in library(reif)
       ( Es0 = [F|Es], dif([F],"T") ),
       Es0 = Es),
   no_TT(Es).
The meaning of no_TT/1 is: there is no sequence "TT" in the list, nor a "T" at the end. So no_TT("T") fails as well, for it might collide with the subsequent "TT"!
So why is it a good idea to use pure, monotonic definitions? You will most probably be tempted to add restrictions. In a pure, monotonic form, restrictions are harmless. But with the procedural version suggested in another answer, you will simply get different results, which are not restrictions at all.
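For Prolog systems that lack the conditional used in no_TT/1, the same restriction can be sketched with dif/2 alone, which keeps it pure. no_tt_dif/1 is a hypothetical name, and the elements are compared against the atom 'T', as they would be with double_quotes set to chars:

```prolog
% no_tt_dif(Cs): Cs contains no two adjacent 'T' elements and does not end in 'T'.
% Hypothetical alternative to no_TT/1 that uses only dif/2.
no_tt_dif([]).
no_tt_dif([E|Es]) :-        % a non-T element: keep checking the rest
    dif(E, 'T'),
    no_tt_dif(Es).
no_tt_dif(['T',F|Es]) :-    % a T must be followed by a non-T element
    dif(F, 'T'),
    no_tt_dif(Es).
```

The second and third clauses are mutually exclusive for ground lists, so the relation stays the same; it may just leave more choice points than a version based on if_/3.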

Replacing white spaces in prolog

Is it possible in prolog to replace all white spaces of a string with some given character?
For example, if I have a variable holding 'How are you today?', I want 'How_are_you_today?'.
For atoms
There are many ways in which this can be done. I find the following particularly simple, using atomic_list_concat/3:
?- atomic_list_concat(Words, ' ', 'How are you today?'), atomic_list_concat(Words, '_', Result).
Words = ['How', are, you, 'today?'],
Result = 'How_are_you_today?'.
For SWI strings
The above can also be done with SWI strings. Unfortunately, there is no string_list_concat/3, which would have made the conversion trivial. split_string/4 is very versatile, but it only does half of the job:
?- split_string("How are you today?", " ", "", Words).
Words = ["How", "are", "you", "today?"].
We can either define string_list_concat/3 ourselves (a first attempt at defining it is shown below), or use a slightly different approach, e.g. repeated string_concat/3.
string_list_concat(Strings, Separator, String):-
   var(String), !,
   maplist(atom_string, [Separator0|Atoms], [Separator|Strings]),
   atomic_list_concat(Atoms, Separator0, Atom),
   atom_string(Atom, String).
string_list_concat(Strings, Separator, String):-
   maplist(atom_string, [Separator0,Atom], [Separator,String]),
   atomic_list_concat(Atoms, Separator0, Atom),
   maplist(atom_string, Atoms, Strings).
And then:
?- string_list_concat(Words, " ", "How are you today?"), string_list_concat(Words, "_", Result).
Words = ["How", "are", "you", "today?"],
Result = "How_are_you_today?".
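The other approach mentioned above, repeated string_concat/3, can be sketched as follows. join_strings/3, append_with_sep/4 and replace_spaces/2 are hypothetical helper names; foldl/4 threads the partial result through the list of words produced by split_string/4:

```prolog
% join_strings(+Strings, +Sep, -Joined): concatenate Strings with Sep between them.
join_strings([], _Sep, "").
join_strings([S|Ss], Sep, Joined) :-
    foldl(append_with_sep(Sep), Ss, S, Joined).

% append_with_sep(+Sep, +Next, +Acc0, -Acc): Acc is Acc0, then Sep, then Next.
append_with_sep(Sep, Next, Acc0, Acc) :-
    string_concat(Acc0, Sep, Tmp),
    string_concat(Tmp, Next, Acc).

% replace_spaces(+In, -Out): replace every space in the string In by an underscore.
replace_spaces(In, Out) :-
    split_string(In, " ", "", Words),
    join_strings(Words, "_", Out).
```

Unlike string_list_concat/3, this only works in one direction (from the word list to the joined string).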
It all depends on what you mean by a string. SWI has several kinds of them: some are generally available in any common Prolog and conform to the ISO standard, and some are specific to SWI and not conforming. So, let's start with those that are generally available:
Strings as lists of character codes (integers representing code points)
This representation is often the default; prior to SWI 7 it was the default in SWI, too. The biggest downside is that a list of arbitrary integers can be confused with text.
:- set_prolog_flag(double_quotes, codes).
codes_replaced(Xs, Ys) :-
   maplist(space_repl, Xs, Ys).
space_repl(0' ,0'_).
space_repl(C, C) :- dif(C,0' ).
?- codes_replaced("Spaces  !", R).
R = [83,112,97,99,101,115,95,95,33]
; false.
Strings as lists of characters (atoms of length 1)
This representation is a bit cleaner since it does not confuse integers with characters; see this reply for how to get more compact answers.
:- set_prolog_flag(double_quotes, chars).
chars_replaced(Xs, Ys) :-
   maplist(space_replc, Xs, Ys).
space_replc(' ','_').
space_replc(C, C) :- dif(C,' ').
?- chars_replaced("Spaces  !", R).
R = ['S',p,a,c,e,s,'_','_',!]
; false.
Strings as atoms
@WouterBeek already showed you how this can be done with an SWI-specific built-in. Here I reuse the definition above:
atom_replaced(A, R) :-
   atom_chars(A, Chs),
   chars_replaced(Chs, Rs),
   atom_chars(R, Rs).
?- atom_replaced('Spaces  !',R).
R = 'Spaces__!'
; false.
So far, everything applies to ISO Prolog.
Strings as an SWI-specific, non-conforming data type
This version does not work in any other system. I mention it for completeness.
SWI-Prolog DCGs allow an easy definition using a 'pushback' (lookahead) argument:
?- phrase(rep_string(` `, `_`), `How are you`, R),atom_codes(A,R).
R = [72, 111, 119, 95, 97, 114, 101, 95, 121|...],
A = 'How_are_you'
The definition is
rep_string(Sought, Replace), Replace --> Sought, rep_string(Sought, Replace).
rep_string(Sought, Replace), [C] --> [C], rep_string(Sought, Replace).
rep_string(_, _) --> [].
Edit: to avoid multiple 'solutions', one possibility is:
rep_string(Sought, Replace), Replace --> Sought, !, rep_string(Sought, Replace).
rep_string(Sought, Replace), [C] --> [C], !, rep_string(Sought, Replace).
rep_string(_, _) --> [].

dividing a list up to a point in prolog

my_list([this,is,a,dog,'.',are,tigers,wild,animals,?,the,boy,eats,mango,'.']).
Suppose this is a list in Prolog which I want to divide into three parts, up to the three full stops, and store them in variables.
How can I do that?
counthowmany(_, [], 0) :- !.
counthowmany(X, [X|Q], N) :- !, counthowmany(X, Q, N1), N is N1+1.
counthowmany(X, [_|Q], N) :- counthowmany(X, Q, N).
number_of_sentence(N) :- my_list(L), counthowmany('.',L,N).
I have already counted the number of full stops in the list (my_list). Now I want to split the list up to the first full stop and store that part in a variable, then up to the second full stop and store it in another variable, and so on.
UPDATE: the code was slightly simplified after @CapelliC's comment.
This is one of many ways to do it (another, better way is to use a DCG, a definite clause grammar):
You don't really need counthowmany.
split([], []).
split(List, [Part | OtherParts]) :-
   append(Part, ['.' | Rest], List),
   split(Rest, OtherParts).
Let's try it:
?- my_list(List), split(List, Parts).
List = [this, is, a, dog, '.', tigers, are, wild, animals|...],
Parts = [[this, is, a, dog], [tigers, are, wild, animals], [the, boy, eats, mango]]
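The question's list actually ends its second sentence with ?, which split/2 does not treat as a terminator, so that sentence would be merged with the next one. A possible generalization, sketched with a hypothetical split_on/3 that takes the list of terminators as an argument (like split/2, it fails if the list does not end with a terminator):

```prolog
% split_on(+Ends, +List, -Parts): split List into the parts that end in one of
% the terminators in Ends; the terminators themselves are dropped.
% Hypothetical generalization of split/2.
split_on(_Ends, [], []).
split_on(Ends, List, [Part|Parts]) :-
    append(Part, [End|Rest], List),               % try each possible end position
    memberchk(End, Ends),                         % ... that holds a terminator
    \+ ( member(X, Part), memberchk(X, Ends) ),   % and Part has no earlier terminator
    split_on(Ends, Rest, Parts).
```

The negated member check keeps backtracking from producing longer parts that swallow a terminator, so there is only one solution; the price is that, unlike the DCG answer below, the predicate is no longer pure.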
Your problem statement did not specify what a sequence without a dot should correspond to. I assume that this would be an invalid sentence, hence failure.
:- use_module(library(lambda)).
list_splitted(Xs, Xss) :-
   phrase(sentences(Xss), Xs).
sentences([]) --> [].
sentences([Xs|Xss]) -->
   sentence(Xs),
   sentences(Xss).
sentence(Xs) -->
   % {Xs = [_|_]}, % add this if empty sentences should not be allowed
   allseq(dif('.'),Xs),
   ['.'].
% sentence(Xs) -->
%    allseq(\X^maplist(dif(X),['.',?]), Xs),
%    (['.']|[?]).
allseq(_P_1, []) --> [].
allseq( P_1, [C|Cs]) -->
   [C],
   {call(P_1,C)},
   allseq(P_1, Cs).
In this answer we define split_/2 based on splitlistIf/3 and list_memberd_t/3:
split_(Xs, Yss) :-
   splitlistIf(list_memberd_t(['?','.','!']), Xs, Yss).
Sample queries:
?- _Xs = [this,is,a,dog,'.', are,tigers,wild,animals,?, the,boy,eats,mango,'.'],
split_(_Xs, Yss).
Yss = [ [this,is,a,dog] ,[are,tigers,wild,animals] ,[the,boy,eats,mango] ].
?- split_([a,'.',b,'.'], Yss).
Yss = [[a],[b]]. % succeeds deterministically

Check if string is substring in Prolog

Is there a way to check if a string is a substring of another string in Prolog? I tried converting the string to a list of chars and then checking whether the first set is a subset of the second, but that doesn't seem to be restrictive enough. This is my current code:
isSubstring(X,Y):-
   stringToLower(X,XLower),
   stringToLower(Y,YLower),
   isSubset(XLower,YLower).
isSubset([],_).
isSubset([H|T],Y):-
   member(H,Y),
   select(H,Y,Z),
   isSubset(T,Z).
stringToLower([],[]).
stringToLower([Char1|Rest1],[Char2|Rest2]):-
   char_type(Char2,to_lower(Char1)),
   stringToLower(Rest1,Rest2).
If I test this with
isSubstring("test","tesZting").
it returns yes, but should return no.
It is not clear what you mean by a string. But since you say you are converting it to a list, you could mean atoms. ISO Prolog offers atom_concat/3 and sub_atom/5 for this purpose.
?- atom_concat(X,Y,'abc').
X = '', Y = abc
; X = a, Y = bc
; X = ab, Y = c
; X = abc, Y = ''.
?- sub_atom('abcbcbe',Before,Length,After,'bcb').
Before = 1, Length = 3, After = 3
; Before = 3, Length = 3, After = 1.
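Applied to the original question, which compares the strings case-insensitively, sub_atom/5 gives a short check for atoms. A sketch with a hypothetical is_substring/2; downcase_atom/2 is SWI-Prolog's and not part of ISO, while sub_atom/5 is ISO:

```prolog
% is_substring(+Sub, +Text): Sub occurs contiguously in Text, ignoring case.
% Hypothetical helper; assumes both arguments are atoms.
is_substring(Sub, Text) :-
    downcase_atom(Sub, SubLower),
    downcase_atom(Text, TextLower),
    % succeed once, even if Sub occurs several times
    once(sub_atom(TextLower, _Before, _Length, _After, SubLower)).
```

With this, is_substring(test, tesZting) fails as the question expects, because the characters of Sub must be adjacent in Text.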
Otherwise, use DCGs! Here's how:
seq([]) --> [].
seq([E|Es]) --> [E], seq(Es).
... --> [] | [_], ... .
subseq([]) --> [].
subseq(Es) --> [_], subseq(Es).
subseq([E|Es]) --> [E], subseq(Es).
seq_substring(S, Sub) :-
   phrase((...,seq(Sub),...),S).
seq_subseq(S, Sub) :-
   phrase(subseq(Sub),S).
Acknowledgements
The first appearance of the above definition of ... is on p. 205, Note 1, of
David B. Searls, Investigating the Linguistics of DNA with Definite Clause Grammars. NACLP 1989, Volume 1.
Traditionally, Prolog strings are lists where each element is the integer value representing the code point of the character in question. The string "abc" is then exactly equivalent to the list [97,98,99] (assuming your Prolog implementation uses Unicode or ASCII; otherwise the values might differ). That leads to this (probably suboptimal from a Big-O perspective) solution, which basically says that X is a substring of S if
S has a suffix T, and
X is a prefix of T
Here's the code:
substring(X,S) :-
   append(_,T,S),
   append(X,_,T),
   X \= [].
We restrict X to being something other than the empty list (aka the nil string ""), since one could conceptually find an awful lot of zero-length substrings in any string: a string of length n has 2+(n-1) = n+1 nil substrings, one between each pair of adjacent characters, one preceding the first character and one following the last character.
The problem is with your isSubset/2.
There are two distinct situations that you've tried to capture in one predicate. Either you're looking for the first position to try to match your substring, or you've already found that point and are checking whether the strings 'line up'.
isSubset([], _).
isSubset(Substring, String) :-
   findStart(Substring, String, RestString),
   line_up(Substring, RestString).
findStart([], String, String).
findStart([H|T], [H|T1], [H|T1]).
findStart(Substring, [_|T], RestString) :-
   findStart(Substring, T, RestString).
line_up([], _).
line_up([H|T], [H|T1]) :-
   line_up(T, T1).
You can combine these into one predicate, as follows:
isSublist([], L, L).
isSublist([H|T], [H|T1], [H|T1]) :-
   isSublist(T, T1, T1).
isSublist(L, [_|T], Rest) :-
   isSublist(L, T, Rest).
Using DCGs, you can do the following (SWI):
% anything substring anything
substr(String) --> ([_|_];[]), String, ([_|_];[]).
% is X a substring of Y ?
substring(X,Y) :- phrase(substr(X),Y).
