How to define a string concatenation operator in Prolog?

I am manipulating strings of characters in Prolog, and I would like to avoid using too many temporary variables in my rules.
I would like to transform something like this:
process_str(Str, Next) :-
    is_valid_pattern0(Pattern0),
    concat(Pattern0, Tail0, Str),
    concat("->", Tail1, Tail0),
    is_valid_pattern1(Pattern1),
    concat(Pattern1, Tail2, Tail1),
    concat("|", Next, Tail2).
using a concatenation operator definition, into something like:
process_str(Pattern0.."->"..Pattern1.."|"..Next, Next) :-
    is_valid_pattern0(Pattern0),
    is_valid_pattern1(Pattern1).
which I believe would be far more readable, at the expense of a few more operations depending on how the operator is defined.
I found that the documentation talks about defining operators, but as far as I understand, one can only define predicate operators, not functional operators that can "return a value" (such as the + operator).
Please tell me either why I am wrong or how to define such a concatenation operator.

Here's a solution using DCG and term_expansion that works in SWI-Prolog. First, the basics:
:- set_prolog_flag(double_quotes, chars).
This ensures that "foo" will be interpreted as a list of three characters, not some non-standard atomic "string" object.
Then, let's assume that valid pattern0 and pattern1 matches are simply lists of letters. It's up to you to fill in the details. Some basic DCGs:
letters -->
    [].
letters -->
    [Letter],
    { char_type(Letter, alpha) },
    letters.
pattern0 -->
    letters.
pattern1 -->
    letters.
For example:
?- phrase(pattern0, Pattern0).
Pattern0 = [] ;
Pattern0 = ['A'] ;
Pattern0 = ['A', 'A'] ;
Pattern0 = ['A', 'A', 'A'] ;
Pattern0 = ['A', 'A', 'A', 'A'] ;
Pattern0 = ['A', 'A', 'A', 'A', 'A'] .
?- phrase(pattern0, "helloworld").
true.
Also, the handy DCG describing simply a list:
list([]) -->
    [].
list([X | Xs]) -->
    [X],
    list(Xs).
This doesn't seem to do much, but it will come in handy in a moment:
?- phrase(list([a, b, c]), List).
List = [a, b, c].
?- phrase(list(List), [a, b, c]).
List = [a, b, c] ;
false.
Now, you would like to define a composite pattern like Pattern0.."->"..Pattern1.."|"..Next. I suggest writing this a bit differently, namely as a list of sub-patterns: [pattern0, "->", pattern1, "|", Next]. Such a list may contain three kinds of elements:
DCG rule names
literal lists of characters
variables that may become bound to lists of characters
We can then write a DCG matching some composite patterns:
composite([]) -->
    [].
composite([Head | Tail]) -->
    { atom(Head) },
    % Assume that this atom is the name of another DCG rule, and execute it.
    call(Head),
    composite(Tail).
composite([Head | Tail]) -->
    list(Head),
    composite(Tail).
This expresses that a composite pattern just describes a sequence of whatever its sub-patterns describe. It only has two clauses dealing with sub-patterns: One for DCG rule names (represented by atoms) and one for character lists. The case of variables is handled automatically by the character list clause!
We can use this definition to match a character list like "foo->bar|baz" against a composite pattern:
?- phrase(composite([pattern0, "->", pattern1, "|", Next]), "foo->bar|baz").
Next = [b, a, z] ;
false.
Almost done! We can pack this up in a definition encapsulating the pattern:
process_str(Sequence, Next) :-
    phrase(composite([pattern0, "->", pattern1, "|", Next]), Sequence).
This works like this:
?- process_str("foo->bar|baz", Next).
Next = [b, a, z] ;
false.
I think this is already pretty good. But if you really want a kind of pattern matching syntax, term_expansion will help. Its use is (deceptively) simple: Define a clause for term_expansion(SomeTermPattern, SomeOtherTerm), and every clause definition matching SomeTermPattern will be treated as if the programmer had written SomeOtherTerm instead. So:
term_expansion(
    % Replace every definition of this form:
    patterned_process_str(Pattern, Next),
    % by a replacement like this (parenthesized, since :- binds looser than an argument):
    (patterned_process_str(Sequence, Next) :-
        phrase(composite(Pattern), Sequence))
).
patterned_process_str([pattern0, "->", pattern1, "|", Next], Next).
We can look at Prolog's internal representation of the source code for patterned_process_str to make sure that it is as expected:
?- listing(patterned_process_str).
patterned_process_str(B, A) :-
phrase(composite([pattern0, [-, >], pattern1, ['|'], A]), B).
Variable names are lost, but otherwise our definition for patterned_process_str was expanded to the form we wanted, namely the same form that we wrote for process_str above. This definition works exactly like process_str above (since it is equivalent):
?- patterned_process_str("foo->bar|baz", Next).
Next = [b, a, z] ;
false.
Exercise: Provide an operator definition for the .. operator. Write a predicate pattern_list that converts between "dotted patterns" with .. and "list patterns"; for example, pattern_list(A..B..C, [A, B, C]) should succeed. Then extend the above term_expansion rule so that you can write patterned_process_str using the "dotted pattern" syntax directly.
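One possible solution to the exercise, as a sketch assuming SWI-Prolog. The priority 450 and the xfy associativity for .. are arbitrary choices made here: xfy makes A..B..C read as '..'(A, '..'(B, C)), so the first sub-pattern is always the left argument. pattern_list/2 below only converts in the dotted-to-list direction, which is all the term_expansion rule needs. The DCGs from above are repeated so that the block loads on its own.

```prolog
:- set_prolog_flag(double_quotes, chars).

% Hypothetical choice: priority 450, right-associative, so that
% A..B..C is '..'(A, '..'(B, C)).
:- op(450, xfy, ..).

% pattern_list(+Dotted, -List): turns a dotted pattern into a list pattern.
% Only the Dotted -> List direction is handled in this sketch.
pattern_list(Dotted, List) :-
    nonvar(Dotted),              % an unbound last element must not unify with A..B
    Dotted = (Left..Right),
    !,
    List = [Left|Rest],
    pattern_list(Right, Rest).
pattern_list(Last, [Last]).

% DCGs from the answer above, repeated to keep this block self-contained.
letters --> [].
letters --> [Letter], { char_type(Letter, alpha) }, letters.
pattern0 --> letters.
pattern1 --> letters.

list([]) --> [].
list([X|Xs]) --> [X], list(Xs).

composite([]) --> [].
composite([Head|Tail]) --> { atom(Head) }, call(Head), composite(Tail).
composite([Head|Tail]) --> list(Head), composite(Tail).

% The extended rule: the clause head may now use the dotted syntax.
term_expansion(patterned_process_str(Dotted, Next),
               (patterned_process_str(Sequence, Next) :-
                    phrase(composite(Pattern), Sequence))) :-
    pattern_list(Dotted, Pattern).

patterned_process_str(pattern0.."->"..pattern1.."|"..Next, Next).
```

With this loaded, calling patterned_process_str on the character list of "foo->bar|baz" should again give Next = [b, a, z], and listing(patterned_process_str) should show the same list-based clause as before.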

You could define in your ~/.swiplrc the following pair of operators:
:- op(699,xfx,:=). % just below =
:- op(698,yfx,++). % just below :=
Out := Left ++ Right :-
    flatten_expr_to_string(Left, LStrings),
    flatten_expr_to_string(Right, RStrings),
    atomics_to_string([LStrings, RStrings], Out).
flatten_expr_to_string(A ++ B, String) :-
    !,                       % a nested ++ must not fall through to term_string/2 on backtracking
    String := A ++ B.
flatten_expr_to_string(Term, String) :-
    (   maplist(integer, Term)
    ->  string_codes(String, Term)
    ;   term_string(Term, String)
    ).
and then
?- X:=`foo`++bar++help.
X = "foobarhelp" .
?- X:=`foo`++123+bar++help.
X = "foo123+barhelp" .
Note that there is an ambiguity about lists of integers: a backtick string is just such a list, so any list of integers is treated as text. Hope you can live with that...
If you use the XPCE editor, open the menu [Edit \ Prolog preferences], add the snippet there, then recompile (Ctrl+b). This also works on Windows, where ~/.swiplrc has a different, platform-appropriate name.
Once you have defined an appropriate mini-language for your string expressions, you can explore term rewriting to pass expressions to your predicates without introducing new variables. Beware that this is rather difficult to debug; you can study lifter in this repo for some hints.

Related

Pattern matching with lists and strings with Prolog

A Prolog newbie here.
I found the following code online:
string_to_list_of_characters(String, Characters) :-
    name(String, Xs),
    maplist( number_to_character,
             Xs, Characters ).
number_to_character(Number, Character) :-
    name(Character, [Number]).
I want to use it to do some pattern matching.
This is what I have tried so far:
wordH1(H1) :-
    word(H1),
    string_length(H1,6),
    string_to_list_of_characters(H1, X) = a,_,_,_,_,_.
I want to get all strings which are of length 6 and that start with an a.
You seem to be using some very old learning resource. Instead of writing this string_to_list_of_characters predicate yourself you can just use the builtin atom_chars:
?- atom_chars(apple, Chars).
Chars = [a, p, p, l, e].
?- atom_chars(amazon, Chars).
Chars = [a, m, a, z, o, n].
For pattern matching you can write lists similarly to how you tried to do it, but you need square brackets around the elements. You also don't pattern match on something like a "function application expression" as you would in other programming languages. Rather you apply a predicate and then write a separate unification. So it's not something like atom_chars(A, B) = Something but rather:
?- atom_chars(apple, Chars), Chars = [a,_,_,_,_,_].
false.
?- atom_chars(amazon, Chars), Chars = [a,_,_,_,_,_].
Chars = [a, m, a, z, o, n].
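Putting this together for the original goal gives the following minimal sketch. The word/1 facts are hypothetical sample data standing in for your own dictionary, and they assume words are stored as atoms. The six-element list pattern checks both the length and the first letter, so no separate string_length/2 call is needed.

```prolog
% Hypothetical sample dictionary; replace with your own word/1 facts.
word(apple).
word(amazon).
word(anchor).
word(banana).
word(abc).

% wordH1(?Word): Word is a known word with exactly six characters,
% the first of which is a.
wordH1(Word) :-
    word(Word),
    atom_chars(Word, [a, _, _, _, _, _]).
```

With these facts, ?- wordH1(W). enumerates amazon and then anchor; apple is too short and banana starts with b.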

Prolog member not found

I have defined the following code in Prolog:
is_member(X, [X|_]).
is_member(X, [_|T]) :-
    is_member(X, T).
I am confused by these two outputs :
out1:
is_member('a', ['b', 'c', 'd', 'a']).
>> True.
out2:
Chars = ['b', 'c', 'd', 'a'].
is_member('a', Chars).
>> Chars = [a|_2356]
Can someone help me out here? I thought the output should be true. I am trying to understand the logic here, but obviously I am lost.
Thank you for any help or advice in advance.
Here's how Prolog queries basically work.
First of all, a complete query ends with a period (.). When you execute:
Chars = [a, b, c, d].
This is a complete query since it ends in a period. When you execute a query, Prolog attempts to make it succeed via some binding of the given variables. If it is able to do so, it will simply display the variable bindings that result in success. In this particular case, the solution is trivial: Chars is bound to [a, b, c, d].
Suppose you enter the above and then you follow this with:
is_member(a, Chars).
Since the previous query completed (it ended in a period), Prolog sees this Chars as a new variable. It is no longer bound to [a, b, c, d] because the previous query ended. Prolog looks at this query and determines what binding(s) for Chars will cause it to succeed. The result is:
Chars = [a|_2356]
Prolog is telling you that a valid solution is obtained by binding Chars to the list [a|_2356] which is any list that has a as its first element. What you didn't show is that Prolog prompts for additional solutions. If you press ;, it shows you more solutions to the is_member(a, Chars). query:
3 ?- is_member(a, Chars).
Chars = [a|_5034] ;
Chars = [_5032, a|_5040] ;
Chars = [_5032, _5038, a|_5046] ;
...
In other words, is_member(a, Chars) has an infinite number of solutions. They are lists that have a as the first element, a as the second, etc.
In Prolog, if you want to establish a series of conditions that must all be true in sequence, you use a comma, not a period, to separate each condition, then end the whole thing in a period:
4 ?- Chars = [a,b,c,d], is_member(a, Chars).
Chars = [a, b, c, d] ;
false.
This query says you want to bind Chars to [a, b, c, d] and determine if a is a member of Chars. Prolog is then saying that it succeeded with one solution, Chars = [a,b,c,d]. Entering ; seeks more solutions, which comes back false since there are no additional solutions.
Let's also try an example with x, which is not in the list:
5 ?- Chars = [a,b,c,d], is_member(x, Chars).
false.
In this case, Prolog could not find a solution, so it simply fails (showing false).
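If you want the same list to be available to several queries, one option is to store it in the program itself instead of binding it in an earlier query. A minimal sketch, where chars/1 is a hypothetical fact name:

```prolog
is_member(X, [X|_]).
is_member(X, [_|T]) :-
    is_member(X, T).

% Hypothetical fact: the list lives in the program, not in a finished query.
chars([b, c, d, a]).
```

Querying ?- chars(Chars), is_member(a, Chars). then succeeds, because Chars is bound within the same query in which is_member/2 is called.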

Rename Variables in SWI Prolog

How can I rename variables in SWI Prolog?
I tried to use numbervars predicate like this.
numbervars(Xs,1,_,[functor_name(name)]).
So if Xs is a list of 4 variables, it looks like this when I call writeln(Xs):
[name(1),name(2),name(3),name(4)]
How can I use that functor, or any other way, to remove the parentheses and make it look like:
[name1,name2,name3,name4]
Thanks in advance.
You can write your own subset of numbervars/4 (or completely rewrite your own, if you wish). Here's a subset that performs the specific task you're describing:
build_atoms([Var|VarList], N, Prefix) :-
    atom_number(Atom, N),
    atom_concat(Prefix, Atom, Var),
    N1 is N + 1,
    build_atoms(VarList, N1, Prefix).
build_atoms([], _, _).
This accepts a list of variables and binds them, in order, to the atom given in Prefix concatenated with consecutive integers, starting with N.
For example:
?- X = [A,B,C], build_atoms(X, 1, foo).
X = [foo1, foo2, foo3],
A = foo1,
B = foo2,
C = foo3.
?-
This can easily be expanded to include any other functionality in numbervars that you desire.
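The same idea can be written more compactly with foldl/4 from SWI-Prolog's library(apply), threading the counter through the list. name_vars/3 and name_var/4 are hypothetical names for this sketch; format/3 with an atom(A) sink builds each atom:

```prolog
:- use_module(library(apply)).   % foldl/4

% name_vars(+Vars, +Start, +Prefix): binds each element of Vars to the atom
% Prefix followed by a counter, counting up from Start.
name_vars(Vars, Start, Prefix) :-
    foldl(name_var(Prefix), Vars, Start, _).

% name_var(+Prefix, -Var, +N0, -N): binds Var to Prefix+N0 and yields N0+1.
name_var(Prefix, Var, N0, N) :-
    format(atom(Var), '~w~w', [Prefix, N0]),
    N is N0 + 1.
```

For example, ?- X = [A,B,C], name_vars(X, 1, foo). should bind X to [foo1, foo2, foo3], just like build_atoms/3.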

Check if string is substring in Prolog

Is there a way to check if a string is a substring of another string in Prolog? I tried converting the string to a list of chars and then checking whether the first set is a subset of the second, but that doesn't seem to be restrictive enough. This is my current code:
isSubstring(X,Y):-
    stringToLower(X,XLower),
    stringToLower(Y,YLower),
    isSubset(XLower,YLower).
isSubset([],_).
isSubset([H|T],Y):-
    member(H,Y),
    select(H,Y,Z),
    isSubset(T,Z).
stringToLower([],[]).
stringToLower([Char1|Rest1],[Char2|Rest2]):-
    char_type(Char2,to_lower(Char1)),
    stringToLower(Rest1,Rest2).
If I test this with
isSubstring("test","tesZting").
it returns yes, but should return no.
It is not clear what you mean by a string. But since you say you are converting it to a list, you could mean atoms. ISO Prolog offers atom_concat/3 and sub_atom/5 for this purpose.
?- atom_concat(X,Y,'abc').
X = '', Y = abc
; X = a, Y = bc
; X = ab, Y = c
; X = abc, Y = ''.
?- sub_atom('abcbcbe',Before,Length,After,'bcb').
Before = 1, Length = 3, After = 3
; Before = 3, Length = 3, After = 1.
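If you are using SWI-Prolog, where double-quoted text is a string object by default, sub_string/5 plays the same role for strings as sub_atom/5 does for atoms. A minimal sketch of the case-insensitive check from the question, assuming SWI-Prolog's string_lower/2 and sub_string/5 (is_substring/2 is a hypothetical name):

```prolog
% is_substring(+Sub, +Text): Sub occurs as a contiguous piece of Text,
% ignoring case. Both arguments may be any SWI-Prolog text (string, atom, ...).
is_substring(Sub, Text) :-
    string_lower(Sub, SubLower),
    string_lower(Text, TextLower),
    once(sub_string(TextLower, _, _, _, SubLower)).
```

once/1 stops after the first occurrence, since only the yes/no answer matters here; is_substring("test", "tesZting") fails, as the question expects.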
Otherwise, use DCGs! Here's how:
seq([]) --> [].
seq([E|Es]) --> [E], seq(Es).
... --> [] | [_], ... .
subseq([]) --> [].
subseq(Es) --> [_], subseq(Es).
subseq([E|Es]) --> [E], subseq(Es).
seq_substring(S, Sub) :-
    phrase((...,seq(Sub),...),S).
seq_subseq(S, Sub) :-
    phrase(subseq(Sub),S).
Acknowledgements
The first appearance of above definition of ... is on p. 205, Note 1 of
David B. Searls, Investigating the Linguistics of DNA with Definite Clause Grammars. NACLP 1989, Volume 1.
In traditional Prolog (or with the double_quotes flag set to codes), strings are lists where each element is the integer value representing the codepoint of the character in question. The string "abc" is exactly equivalent to the list [97,98,99] (assuming your Prolog implementation uses Unicode or ASCII; otherwise the values might differ). In SWI-Prolog 7 and later, double-quoted text is a separate string type by default, so there you would first convert it, for example with string_codes/2. That leads to this (probably suboptimal from a Big-O perspective) solution, which says that X is a substring of S if
S has a suffix T, and
X is a prefix of T.
Here's the code:
substring(X, S) :-
    append(_, T, S),
    append(X, _, T),
    X \= [].
We restrict X to something other than the empty list (aka the nil string ""), since one could conceptually find an awful lot of zero-length substrings in any string: a string of length n has n+1 nil substrings, one between each pair of adjacent characters, one preceding the first character and one following the last character.
The problem is with your isSubset/2.
There are two distinct situations that you've tried to capture in one predicate. Either you're looking for the first position to try to match your substring, or you've already found that point and are checking whether the strings 'line up'.
isSubset([], _).
isSubset(Substring, String) :-
    findStart(Substring, String, RestString),
    line_up(Substring, RestString).
findStart([], String, String).
findStart([H|_], [H|T1], [H|T1]).
findStart(Substring, [_|T], RestString) :-
    findStart(Substring, T, RestString).
line_up([], _).
line_up([H|T], [H|T1]) :-
    line_up(T, T1).
You can combine these into one predicate, as follows:
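% Call as isSublist(Substring, String, _): the third argument gets bound to the
% suffix where the match starts, so once matching has begun no element can be skipped.
isSublist([], L, L).
isSublist([H|T], [H|T1], [H|T1]) :-
    isSublist(T, T1, T1).
isSublist(L, [_|T], Rest) :-
    isSublist(L, T, Rest).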
Using DCG's you can do the following: (SWI)
% anything substring anything
substr(String) --> ([_|_];[]), String, ([_|_];[]).
% is X a substring of Y ?
substring(X,Y) :- phrase(substr(X),Y).

Matching tuples in Prolog

Why does Prolog match (X, Xs) with a tuple containing more elements? An example:
test2((X, Xs)) :- write(X), nl, test2(Xs).
test2((X)) :- write(X), nl.
test :-
    read(W),
    test2(W).
?- test.
|: a, b(c), d(e(f)), g.
a
b(c)
d(e(f))
g
yes
Actually this is what I want to achieve but it seems suspicious. Is there any other way to treat a conjunction of terms as a list in Prolog?
The ,/2 operator is declared xfy with priority 1000, so tuple terms built with it are right-associative (such a term is typically referred to as a sequence). Your input a, b(c), d(e(f)), g is therefore the term (a, (b(c), (d(e(f)), g))). This is evidenced by the fact that your predicate test2/1 printed what is shown in your question: on the first invocation of the first clause of test2/1, X matched a and Xs matched (b(c), (d(e(f)), g)); on the second invocation X matched b(c) and Xs matched (d(e(f)), g), and so on.
If you really wanted to deal with a list of terms interpreted as a conjunction, you could have used the following:
test2([X|Xs]) :- write(X), nl, test2(Xs).
test2([]).
...on input [a, b(c), d(e(f)), g]. A list is interpreted a little differently from a tuple constructed with ,/2: list notation is syntactic sugar for nested terms built with the list constructor ('.'/2 in traditional Prolog, '[|]'/2 in SWI-Prolog 7 and later), in much the same way as sequences or tuple terms are built with ,/2. This way, you get the benefits of the built-in support for lists, provided you can let list terms be interpreted as conjunctions in your code. Another alternative is to declare and use your own (perhaps infix) operator for conjunction, such as &/2, which you could declare as:
:- op(500, yfx, &). % conjunction constructor
You could then construct your conjunct as a & b(c) & d(e(f)) & g and deal with it appropriately from there, knowing exactly what you mean by &/2 - conjunction.
See the manual page for op/3 in SWI-PROLOG for more details. op/3 is part of the ISO standard, so whatever PROLOG implementation you're using should have it too, at least if it's worth its salt :-)
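For example, the question's test2/1 could be rewritten over &/2 as follows; a sketch, where print_conj/1 is a hypothetical name. Note that with yfx, a & b(c) & d(e(f)) & g nests to the left, as ((a & b(c)) & d(e(f))) & g, unlike ,/2, so the recursion descends into both arguments to print the conjuncts in their written order:

```prolog
:- op(500, yfx, &).   % conjunction constructor, as declared above

% print_conj(+Conj): writes each conjunct of an &/2 term on its own line.
print_conj(Conj) :-
    nonvar(Conj),            % an unbound conjunct must not unify with A & B
    Conj = (A & B),
    !,
    print_conj(A),
    print_conj(B).
print_conj(Item) :-
    write(Item), nl.
```

Calling print_conj(a & b(c) & d(e(f)) & g) prints the same four lines as the original test2/1 did for the ,/2 version.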
EDIT: To convert a tuple term constructed using ,/2 to a list, you could use something like the following:
conjunct_to_list((A,B), L) :-
    !,
    conjunct_to_list(A, L0),
    conjunct_to_list(B, L1),
    append(L0, L1, L).
conjunct_to_list(A, [A]).
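The append/3 calls make conjunct_to_list/2 quadratic on long left-nested conjunctions, and an unbound conjunct (as in (a, X)) unifies with (A,B) and makes it recurse forever. A sketch of the same conversion with an accumulator (a difference list) and a nonvar/1 guard; conj_list/2 and conj_list/3 are hypothetical names:

```prolog
% conj_list(+Conj, -List): flattens a ','/2 conjunction into a list.
conj_list(Conj, List) :-
    conj_list(Conj, List, []).

% conj_list(+Conj, -List0, +List): List0 is the conjuncts followed by List.
conj_list(Conj, List0, List) :-
    nonvar(Conj),            % do not let an unbound conjunct unify with (A,B)
    Conj = (A, B),
    !,
    conj_list(A, List0, List1),
    conj_list(B, List1, List).
conj_list(Goal, [Goal|List], List).
```

Both right-nested input such as (a, b(c), d(e(f)), g) and mixed nesting such as ((a, b), (c, d)) come out as flat lists in left-to-right order.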
Hmm... a, b(c), d(e(f)), g means a and (b(c) and (d(e(f)) and g)), just as the list [1,2,3] is just [1 | [2 | [3 | []]]]. I.e. if you turn that conjunction into a list you'll get the same test2([X|Xs]) :- ..., but the difference is that a conjunction carries information about how the two goals are combined (there may be a disjunction (X; Xs) as well). And you can construct other hierarchies of conjunctions, e.g. (a, b(c)), (d(e(f)), g).
You are working with simple recursive types. In other languages lists are also recursive types, but they often pretend to be arrays (big tuples with nice indexing).
Probably you should use:
test2((X, Y)):- test2(X), nl, test2(Y).
test2((X; Y)). % TODO: handle disjunction
test2(X) :- write(X), nl.

Resources