Check if string is substring in Prolog

Is there a way to check if a string is a substring of another string in Prolog? I tried converting the string to a list of chars and then checking whether the first list is a subset of the second, but that doesn't seem to be restrictive enough. This is my current code:
isSubstring(X,Y):-
stringToLower(X,XLower),
stringToLower(Y,YLower),
isSubset(XLower,YLower).
isSubset([],_).
isSubset([H|T],Y):-
member(H,Y),
select(H,Y,Z),
isSubset(T,Z).
stringToLower([],[]).
stringToLower([Char1|Rest1],[Char2|Rest2]):-
char_type(Char2,to_lower(Char1)),
stringToLower(Rest1,Rest2).
If I test this with
isSubstring("test","tesZting").
it returns yes, but should return no.

It is not clear what you mean by a string. But since you say you are converting it to a list, you could mean atoms. ISO Prolog offers atom_concat/3 and sub_atom/5 for this purpose.
?- atom_concat(X,Y,'abc').
X = '', Y = abc
; X = a, Y = bc
; X = ab, Y = c
; X = abc, Y = ''.
?- sub_atom('abcbcbe',Before,Length,After,'bcb').
Before = 1, Length = 3, After = 3
; Before = 3, Length = 3, After = 1.
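Applied to the example from the question, a case-insensitive check could be sketched with sub_atom/5 as follows. This is only a sketch: is_substring_ci/2 is a name made up here, and downcase_atom/2 is an SWI-Prolog built-in rather than ISO, so other systems would need their own case conversion.

```prolog
% is_substring_ci(+Sub, +Text): hypothetical helper, not part of any library.
% Succeeds once if Sub occurs contiguously in Text, ignoring case.
% Assumption: downcase_atom/2 is available (it is SWI-Prolog specific).
is_substring_ci(Sub, Text) :-
    downcase_atom(Sub, SubLower),
    downcase_atom(Text, TextLower),
    once(sub_atom(TextLower, _, _, _, SubLower)).
```

With this definition, ?- is_substring_ci(test, tesZting). fails, while ?- is_substring_ci(test, 'my TEST string'). succeeds.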
Otherwise, use DCGs! Here's how
seq([]) --> [].
seq([E|Es]) --> [E], seq(Es).
... --> [] | [_], ... .
subseq([]) --> [].
subseq(Es) --> [_], subseq(Es).
subseq([E|Es]) --> [E], subseq(Es).
seq_substring(S, Sub) :-
phrase((...,seq(Sub),...),S).
seq_subseq(S, Sub) :-
phrase(subseq(Sub),S).
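To see how the two relations differ on the example from the question, the following sketch repeats the definitions so that it loads on its own and wraps them for atoms. The wrappers atom_substring/2 and atom_subseq/2 are names invented for this illustration; they use atom_chars/2 so that the result does not depend on how the system reads double-quoted text.

```prolog
% Definitions repeated from above so that this block loads on its own.
seq([]) --> [].
seq([E|Es]) --> [E], seq(Es).

... --> [] | [_], ... .

subseq([]) --> [].
subseq(Es) --> [_], subseq(Es).
subseq([E|Es]) --> [E], subseq(Es).

seq_substring(S, Sub) :- phrase((..., seq(Sub), ...), S).
seq_subseq(S, Sub) :- phrase(subseq(Sub), S).

% Hypothetical wrappers: convert atoms to character lists, then test once.
atom_substring(Atom, SubAtom) :-
    atom_chars(Atom, S),
    atom_chars(SubAtom, Sub),
    once(seq_substring(S, Sub)).

atom_subseq(Atom, SubAtom) :-
    atom_chars(Atom, S),
    atom_chars(SubAtom, Sub),
    once(seq_subseq(S, Sub)).
```

Here ?- atom_substring(tesZting, test). fails because the Z interrupts the match, whereas ?- atom_subseq(tesZting, test). succeeds because a subsequence may skip elements. The second relation is the one the subset-style code in the question approximates.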
Acknowledgements
The first appearance of the above definition of ... is on p. 205, Note 1 of
David B. Searls, Investigating the Linguistics of DNA with Definite Clause Grammars. NACLP 1989, Volume 1.

Traditionally, Prolog strings are lists, where each element of the list is the integer value representing the codepoint of the character in question. The string "abc" is exactly equivalent to the list [97,98,99] (assuming your Prolog implementation is using Unicode or ASCII, otherwise the values might differ). That leads to this (probably suboptimal from a Big-O perspective) solution, which basically says that X is a substring of S if
S has a suffix T, and
X is a prefix of T.
Here's the code:
substring(X,S) :-
append(_,T,S) ,
append(X,_,T) ,
X \= []
.
We restrict X to being something other than the empty list (aka the nil string ""), since one could conceptually find an awful lot of zero-length substrings in any string: a string of length n has 2+(n-1) nil substrings, one between each character in the string, one preceding the first character and one following the last character.
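The same suffix/prefix idea can also report where the match starts. The sketch below is a variant, not the answer's own code: substring_at/3 and atom_substring_at/3 are hypothetical names, and the atom wrapper uses atom_codes/2 so that it does not depend on how double-quoted text is read.

```prolog
:- use_module(library(lists)).

% substring_at(?X, +S, ?Before): hypothetical variant of substring/2.
% X is a non-empty contiguous sublist of S, preceded by Before elements.
substring_at(X, S, Before) :-
    X = [_|_],               % exclude the empty substring up front
    append(Prefix, T, S),    % T is a suffix of S, Prefix is what precedes it
    append(X, _, T),         % X is a prefix of that suffix
    length(Prefix, Before).

% Hypothetical wrapper for atoms, converting both to code lists first.
atom_substring_at(Sub, Atom, Before) :-
    atom_codes(Sub, X),
    atom_codes(Atom, S),
    substring_at(X, S, Before).
```

For instance, ?- atom_substring_at(bcb, abcbcbe, B). yields B = 1 and, on backtracking, B = 3.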

The problem is with your isSubset/2.
There are two distinct situations that you've tried to capture in one predicate. Either you're looking for the first position to try to match your substring, or you've already found that point and are checking whether the strings 'line up'.
isSubset([], _).
isSubset(Substring, String) :-
findStart(Substring, String, RestString),
line_up(Substring, RestString).
findStart([], String, String).
findStart([H|T], [H|T1], [H|T1]).
findStart(Substring, [_|T], RestString) :-
findStart(Substring, T, RestString).
line_up([], _).
line_up([H|T], [H|T1]) :-
line_up(T, T1).
You can combine these into one predicate, as follows:
isSublist([], L, L).
isSublist([H|T], [H|T1], [H|T1]) :-
isSublist(T, T1, T1).
isSublist(L, [_|T], Rest) :-
isSublist(L, T, Rest).

Using DCGs, you can do the following (SWI-Prolog):
% anything substring anything
substr(String) --> ([_|_];[]), String, ([_|_];[]).
% is X a substring of Y ?
substring(X,Y) :- phrase(substr(X),Y).

Related

Pattern matching with lists and strings with Prolog

A Prolog newbie here.
I found the following code online:
string_to_list_of_characters(String, Characters) :-
name(String, Xs),
maplist( number_to_character,
Xs, Characters ).
number_to_character(Number, Character) :-
name(Character, [Number]).
I want to use it to do some pattern matching.
This is what I have tried so far:
wordH1(H1) :-
word(H1),
string_length(H1,6),
string_to_list_of_characters(H1, X) = a,_,_,_,_,_.
I want to get all strings which are of length 6 and that start with an a.
You seem to be using some very old learning resource. Instead of writing this string_to_list_of_characters predicate yourself you can just use the builtin atom_chars:
?- atom_chars(apple, Chars).
Chars = [a, p, p, l, e].
?- atom_chars(amazon, Chars).
Chars = [a, m, a, z, o, n].
For pattern matching you can write lists similarly to how you tried to do it, but you need square brackets around the elements. You also don't pattern match on something like a "function application expression" as you would in other programming languages. Rather you apply a predicate and then write a separate unification. So it's not something like atom_chars(A, B) = Something but rather:
?- atom_chars(apple, Chars), Chars = [a,_,_,_,_,_].
false.
?- atom_chars(amazon, Chars), Chars = [a,_,_,_,_,_].
Chars = [a, m, a, z, o, n].
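Put together, the predicate from the question could look roughly like this. The word/1 facts are made up for the example, since the original word/1 is not shown, and they are assumed to be atoms.

```prolog
% Hypothetical word/1 facts, for illustration only.
word(apple).
word(amazon).
word(banana).
word(animal).

% wordH1(?H1): H1 is a word of exactly six characters starting with a.
% The six-element list pattern fixes the length, so no separate
% length check is needed.
wordH1(H1) :-
    word(H1),
    atom_chars(H1, [a,_,_,_,_,_]).
```

With these facts, ?- wordH1(W). gives W = amazon and then W = animal.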

All substrings with same begin and end

I have to solve a homework but I have a very limited knowledge of Prolog. The task is the following:
Write a Prolog program which can list all of those substrings of a string, whose length is at least two character and the first and last character is the same.
For example:
?- sameend("teletubbies", R).
R = "telet";
R = "ele";
R = "eletubbie";
R = "etubbie";
R = "bb";
false.
My approach of this problem is that I should iterate over the string with head/tail and find the index of the next letter which is the same as the current (it satisfies the minimum 2-length requirement) and cut the substring with sub_string predicate.
This depends a bit on what you exactly mean by a string. Traditionally in Prolog, a string is a list of characters. To ensure that you really get those, use the directive below. See this answer for more.
:- set_prolog_flag(double_quotes, chars).
sameend(Xs, Ys) :-
phrase( ( ..., [C], seq(Zs), [C], ... ), Xs),
phrase( ( [C], seq(Zs), [C] ), Ys).
... --> [] | [_], ... .
seq([]) -->
[].
seq([E|Es]) -->
[E],
seq(Es).
If your Prolog has append/2 and last/2 in library(lists), it's as easy as:
sameend(S,[F|T]) :-
append([_,[F|T],_],S),last(T,F).
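A sketch of how the append/2 version behaves on the example from the question. The wrapper sameend_atom/2 is a name invented here; it converts atoms to character lists and back, so the result does not depend on the double_quotes flag.

```prolog
:- use_module(library(lists)).

% Repeated from above so that this block loads on its own.
sameend(S, [F|T]) :-
    append([_, [F|T], _], S),
    last(T, F).

% sameend_atom(+Atom, -Sub): hypothetical wrapper around sameend/2.
sameend_atom(Atom, Sub) :-
    atom_chars(Atom, Cs),
    sameend(Cs, SubCs),
    atom_chars(Sub, SubCs).
```

Collecting all answers with ?- findall(R, sameend_atom(teletubbies, R), Rs). gives Rs = [telet, ele, eletubbie, etubbie, bb], the same five substrings as in the task description.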

convert string to list in prolog

I am a Prolog newbie and am stuck at parsing a string to a list.
I have a string of the form
1..2...3..4
I wish to convert it into a list which looks like
[1, _, _, 2, _, _, _, 3, _, _, 4]
How can I achieve this functionality?
Another solution is to use DCG's. The code is straightforward:
digit(N) -->
[ D ], { member(D, "0123456789"), number_codes(N, [D]) }.
dot(_) --> ".".
token(T) --> digit(T).
token(T) --> dot(T).
tokens([T|Ts]) --> token(T), tokens(Ts).
tokens([]) --> "".
parse_codes(In, Out):-
phrase(tokens(Out), In, "").
parse_atom(In, Out):-
atom_codes(In, Codes),
parse_codes(Codes, Out).
Testing on SWI-Prolog with "string" (which is actually just a list of codes):
?- parse_codes("1..24.4", Out).
Out = [1, _G992, _G995, 2, 4, _G1070, 4] .
And with an atom (which is just converted to codes before using the same predicate):
?- parse_atom('1..22.4', Out).
Out = [1, _G971, _G974, 2, 2, _G1049, 4] .
SWI-Prolog prints anonymous variables (_) in a bit fancier notation but otherwise it should be the same result you need.
Yet another way: take advantage of the fact that the ASCII codes for 0..9 are known and fixed, so no type conversions or checks are needed, just subtractions.
% case 1: char is in decimal range 0-9, ie ascii 48,49,50,51,52,53,54,55,56,57
% so eg. char 48 returns integer 0
onechar(Char, Out) :-
between(48, 57, Char),
Out is Char -48.
% case 2: case 1 failed, dot '.' is ascii 46, use anonymous variable
onechar(46, _).
% execution
go(InString, OutList) :-
maplist(onechar, InString, OutList).
Execution:
?- go("1..2...3..4", X).
X = [1, _G5638, _G5641, 2, _G5650, _G5653, _G5656, 3, _G5665, _G5668, 4]
Edit: forgot to say that this works because strings are represented as a list of ascii numbers, so string "0123456789" is represented internally as [48,49,50,51,52,53,54,55,56,57].
onechar does the calc for 1 of those list items, then maplist calls the same predicate on all list items.
Edit 2: the 2nd rule was originally:
% case 2: case 1 failed, output is an anon variable
onechar(_, _).
This is too generous - presumably if the input contains anything other than a digit 0-9 or a dot, then the predicate should fail.
A predicate that describes the relationship between a character in your string and an element of the list could be:
char_to_el(DigitChar, Digit) :- % a character between '0' and '9'
DigitChar >= 0'0, DigitChar =< 0'9,
number_codes(Digit, [DigitChar]).
char_to_el(0'., _). % the element is the '.' character
The first clause checks whether the character is indeed a digit and converts it to an integer. You could also simply subtract 0'0 from the integer value of the character, so instead of using number_codes/2 you could write Digit is DigitChar - 0'0.
You should be able to use maplist/3 then, according to the gnu prolog manual:
| ?- maplist(char_to_el, "1..2...3..4", L).
L = [1,_,_,2,_,_,_,3,_,_,4]
yes
but it didn't work on my system (old gnu prolog version maybe?), so instead:
str_to_list([], []).
str_to_list([C|Cs], [E|Es]) :-
char_to_el(C, E),
str_to_list(Cs, Es).
| ?- str_to_list("1..2...3..4", L).
L = [1,_,_,2,_,_,_,3,_,_,4]
yes
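On systems where double-quoted text is not a list of codes, the same idea can be sketched on atoms and characters instead. The names char_to_item/2 and atom_to_list/2 are hypothetical, and the sketch assumes SWI-Prolog's char_type/2 with the digit(Weight) type.

```prolog
:- use_module(library(apply)).

% char_to_item(+Char, -Item): hypothetical helper.
% A dot maps to a fresh variable, a decimal digit to its integer value.
% Assumption: char_type/2 with digit(Weight) as provided by SWI-Prolog.
char_to_item('.', _).
char_to_item(Char, Digit) :-
    char_type(Char, digit(Digit)).

% atom_to_list(+Atom, -List): apply char_to_item/2 to every character.
atom_to_list(Atom, List) :-
    atom_chars(Atom, Chars),
    maplist(char_to_item, Chars, List).
```

A query such as ?- atom_to_list('1..2...3..4', L). binds L to a list of the shape [1,_,_,2,_,_,_,3,_,_,4], and any character that is neither a digit nor a dot makes the call fail.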

How to count how many times a character appears in a string list in Prolog?

I want to check if a character exists in a string. So Atom is the string and Ch the character. name is a predicate that converts the string in a list of numbers according to the ASCII code.
find_element is a predicate that is supposed to be true only if element X is part of a list. C is a counter that tells us where exactly element X was found.
This is the result I am getting:
?- exists(prolog,g).
[103][112,114,111,108,111,103]
false.
103 is the ASCII code of the letter "g" and the list [112,114,111,108,111,103] is the list that represents the string "prolog". The query exists(prolog,g) should have provided a true response.
However the find_element predicate is working correctly. I don't understand why this is happening because when I type for example
?- find_element(5,[3,4,5,6,5,2],X).
I am getting X = 3 ; X = 5 ; false.
which is absolutely fine because it tells me that 5 is the 3rd and the 5th element of the list.
So the problem is that find_element is working when I type something like ?- find_element(5,[3,4,5,6,5,2],X) but it is not when I try to call the predicate exists (which calls find_element).
This is the code:
find_element(X,[X|T],1).
find_element(X,[H|T],C):- find_element(X,T,TEMPC), C is TEMPC +1.
exists(Atom,Ch):- name(Atom,[X|T]), name(Ch,Z), write(Z), write([X|T]), find_element(Z,[X|T],Count).
Thanks in advance
I've cleaned up your code a bit and fixed a bug:
find_element(X,[X|_], 1).
find_element(X,[_|T], C) :-
find_element(X,T,TEMPC),
C is TEMPC +1.
exists(Atom, Ch):-
name(Atom, L),
name(Ch, [Z]),
find_element(Z, L, _Count).
Note the use of name(Ch, [Z]) to extract the single character code. Now
?- exists(pippo,o).
true
It's worth noting that
?- find_element(3, [1,2,3,4,1,2,3,4],P).
P = 3 ;
P = 7 ;
false.
?- nth1(P, [1,2,3,4,1,2,3,4], 3).
P = 3 ;
P = 7 ;
false.
Your find_element/3 behaves like nth1/3, with arguments 1 and 3 swapped.
Of course, there are simpler and more general ways to perform such a test, using ISO builtins
like sub_atom/5 (a really powerful primitive for atom inspection):
?- sub_atom(pippo, _,_,_, o).
true ;
or memberchk/2, after the conversion to character lists that you already know (but using ISO builtin atom_codes/2)
exists(Atom, Ch):-
atom_codes(Atom, L),
atom_codes(Ch, [Z]),
memberchk(Z, L).
To count occurrences of a sub_atom, library(aggregate) can be used
occurences(Atom, Ch, N) :-
aggregate_all(count, sub_atom(Atom, _,_,_, Ch), N).
?- occurences(pippo, p, X).
X = 3.

Prolog - unusual cons syntax for lists

I have come across an unfamiliar bit of Prolog syntax in Lee Naish's paper Higher-order logic programming in Prolog. Here is the first code sample from the paper:
% insertion sort (simple version)
isort([], []).
isort(A.As, Bs) :-
isort(As, Bs1),
insert(A, Bs1, Bs).
% insert number into sorted list
insert(N, [], [N]).
insert(N, H.L, N.H.L) :-
N =< H.
insert(N, H.LO, H.L) :-
N > H,
insert(N, LO, L).
My confusion is with A.As in isort(A.As, Bs) :-. From the context, it appears to be an alternate cons syntax for lists, the equivalent of isort([A|As], Bs) :-.
As well N.H.L appears to be a more convenient way to say [N|[H|L]].
But SWI Prolog won't accept this unusual syntax (unless I'm doing something wrong).
Does anyone recognize it? is my hypothesis correct? Which Prolog interpreter accepts that as valid syntax?
The dot operator was used for lists in the very first Prolog system of 1972, written in Algol-W, sometimes called Prolog 0. It is inspired by similar notation in LISP systems. The following example is from the paper The birth of Prolog by Alain Colmerauer and Philippe Roussel – the very creators of Prolog.
+ELEMENT(*X, *X.*Y).
+ELEMENT(*X, *Y.*Z) -ELEMENT(*X, *Z).
At that time, [] used to be NIL.
The next Prolog version, written in Fortran by Battani & Meloni, used cases to distinguish atoms and variables. Then DECsystem 10 Prolog introduced the square bracket notation replacing nil and X.Xs with [] and [X,..Xs] which in later versions of DECsystem 10 received [X|Xs] as an alternative. In ISO Prolog, there is only [X|Xs], .(X,Xs), and as canonical syntax '.'(X,Xs).
Please note that the dot has many different rôles in ISO Prolog. It serves already as
end token when followed by a % or a layout character like SPACE, NEWLINE, TAB.
decimal point in a floating point number, like 3.14159
graphic token char forming graphic tokens as =..
So if you are now declaring . as an infix operator, you have to be very careful. Both with what you write and what Prolog systems will read. A single additional space can change the meaning of a term. Consider two lists of numbers in both notations:
[1,2.3,4]. [5].
1 .2.3.4.[]. 5.[].
Please note that you have to add a space after 1. In this context, an additional white space in front of a number may change the meaning of your terms. Like so:
[1|2.3]. [4]. 5. [].
1 .2.3. 4.[]. 5. [].
Here is another example which might be even more convincing:
[1,-2].
1.(-2).[].
Negative numbers require round brackets within dot-lists.
Today, there is only YAP and XSB left that still offer infix . by default – and they do it differently. And XSB does not even recognize above dot syntax: you need round brackets around some of the nonnegative numbers.
You wrote that N.H.L appears to be a more convenient way to say [N|[H|L]]. There is a simple rule-of-thumb to simplify such expressions in ISO Prolog: Whenever you see within a list the tokens | and [ immediately after each other, you can replace them by , (and remove the corresponding ] on the right side). So you can now write: [N,H|L] which does not look that bad.
You can use that rule also in the other direction. If we have a list [1,2,3,4,5] we can use | as a "razor blade" like so: [1,2,3|[4,5]].
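As a sketch of that rule applied to the code from the question, here is the insertion sort in standard list notation. It assumes that A.As stands for [A|As] and that the last goal of isort/2 is meant to call insert/3, since no isort/3 is defined.

```prolog
% Insertion sort in standard list notation (simple version).
isort([], []).
isort([A|As], Bs) :-
    isort(As, Bs1),
    insert(A, Bs1, Bs).

% insert(+N, +Sorted0, -Sorted): insert number N into a sorted list.
insert(N, [], [N]).
insert(N, [H|L], [N,H|L]) :-     % [N,H|L] is the same term as [N|[H|L]]
    N =< H.
insert(N, [H|L0], [H|L]) :-
    N > H,
    insert(N, L0, L).
```

For example, ?- isort([3,1,2], Xs). gives Xs = [1,2,3].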
Another remark, since you are reading Naish's paper: In the meantime, it is well understood that only call/N is needed! And ISO Prolog supports call/1, call/2 up to call/8.
Yes, you are right, the dot is the list cons infix operator. It's actually required by the ISO Prolog standard, but usually hidden. I found (and used) that syntax some time ago:
:- module(eog, []).
:- op(103, xfy, (.)).
% where $ARGS appears as argument, replace the call ($ARGS) with a VAR
% the callee goes before caller, binding the VAR (added as last ARG)
funcs(X, (V, Y)) :-
nonvar(X),
X =.. W.As,
% identify meta arguments
( predicate_property(X, meta_predicate M)
% explicitly exclude to handle test(dcg)
% I'd like to handle this case in general way...
, M \= phrase(2, ?, ?)
-> M =.. W.Ms
; true
),
seek_call(As, Ms, Bs, V),
Y =.. W.Bs.
% look for first $ usage
seek_call([], [], _Bs, _V) :-
!, fail.
seek_call(A.As, M.Ms, A.Bs, V) :-
M #>= 0, M #=< 9, % skip meta arguments
!, seek_call(As, Ms, Bs, V).
seek_call(A.As, _, B.As, V) :-
nonvar(A),
A = $(F),
F =.. Fp.FAs,
( current_arithmetic_function(F) % inline arith
-> V = (PH is F)
; append(FAs, [PH], FBs),
V =.. Fp.FBs
),
!, B = PH.
seek_call(A.As, _.Ms, B.As, V) :-
nonvar(A),
A =.. F.FAs,
seek_call(FAs, Ms, FBs, V),
!, B =.. F.FBs.
seek_call(A.As, _.Ms, A.Bs, V) :-
!, seek_call(As, Ms, Bs, V).
:- multifile user:goal_expansion/2.
user:goal_expansion(X, Y) :-
( X = (_ , _) ; X = (_ ; _) ; X = (_ -> _) )
-> !, fail % leave control flow unchanged (useless after the meta... handling?)
; funcs(X, Y).
/* end eog.pl */
I was advised against it. Effectively, the [A|B] syntax is an evolution of the . operator, introduced for readability.
OT: what's that code?
The code above is my attempt to sweeten Prolog with functions. Namely, it introduces on request, by means of $, the temporary variables required (for instance) by arithmetic expressions:
fact(N, F) :-
N > 1 -> F is N * $fact($(N - 1)) ; F is 1.
Each $ introduces a variable. After expansion, we have a more traditional fact/2:
?- listing(fact).
plunit_eog:fact(A, C) :-
( A>1
-> B is A+ -1,
fact(B, D),
C is A*D
; C is 1
).
Where we have many expressions, that could be useful...
This syntax comes from NU-Prolog. See here. It's probably just the normal list functor '.'/2 redefined as an infix operator, without the need for a trailing empty list:
?- L= .(a,.(b,[])).
L = [a,b]
Yes (0.00s cpu)
?- op(500, xfy, '.').
Yes (0.00s cpu)
?- L = a.b.[].
L = [a,b]
Yes (0.00s cpu)

Resources