Concatenating a list of strings in Prolog

I'm writing a Lisp to C translator and I have a problem with handling strings. This is the code that transforms a unary Lisp function to its C equivalent:
define(F) --> fun_unary(F), !.
fun_unary(F) --> "(define (", label(Fun), spaces, label(Arg1), ")", spaces, expr(Body), ")",
{swritef(F, "data *%t(data *%t) { return(%t); }", [Fun, Arg1, Body])}, !.
funs([F]) --> define(F), !.
funs([F|Fs]) --> define(F), spaces, funs(Fs), !.
Now I want to read any number of functions and return them as a single string. The above funs is the best I could come up with, but it works like this:
?- funs(F, "(define (carzero l) (= (car l) 0)) (define (zero n) (= 0 n))", []).
F = ["data *carzero(data *l) { return(eq(car(l), make_atom_int(0))); }", "data *zero(data *n) { return(eq(make_atom_int(0), n)); }"].
While I want something like this:
F = "data *carzero(data *l) { return(eq(car(l), make_atom_int(0))); }\n\ndata *zero(data *n) { return(eq(make_atom_int(0), n)); }".
so that I can nicely swritef it into a complete program, between #includes and main(). An alternative solution is to modify the highest-level translator to handle the list. It currently looks like this:
program(P) --> define(F), {swritef(P, "#include \"lisp2c.h\" \n\n%t \nint main() { return 0; }", [F])}, !.
How would I do any of these two? I'm using SWI Prolog.

Setting aside for now the purpose for which it's needed, let's write a Prolog predicate that concatenates a list of strings into one string, placing a double newline between each consecutive pair of strings (but not at the end of the output string, judging by the example that Jerry posted).
SWI-Prolog Manual: Normally I'd post "deep" links to the documentation, but the SWI-Prolog site uses a style of URL that triggers cross-site scripting (XSS) warnings with many browser/plugin combinations. So instead of linking, I'll refer to the appropriate section.
Section 4.22 Representing text in strings says (in part), "String objects by default have no lexical representation and thus can only be created using the predicates below or through the foreign language interface." This can be a little confusing, as SWI-Prolog writes strings as double-quoted text, but reads double-quoted text (by default) as lists of character codes. (That describes versions before 7; since SWI-Prolog 7, double-quoted text is read as a string by default.)
Here's code for a predicate that concatenates the strings in a list, inserting another string Separator in between consecutive string pairs:
strSepCat([ ],_,Empty) :-
string_to_list(Empty,[ ]).
strSepCat([H|T],Separator,StrCat) :-
strSepCat(T,Separator,H,StrCat).
strSepCat([ ],_,StrCat,StrCat).
strSepCat([H|T],Sep,Str,Cat) :-
string_concat(Sep,H,SepH),
string_concat(Str,SepH,StrSepH),
strSepCat(T,Sep,StrSepH,Cat).
Note that we've defined two predicates, strSepCat/3 and strSepCat/4. The former is defined in terms of the latter, a typical design pattern in Prolog that introduces an extra argument as an accumulator that binds to an output when recursion is complete. Such a technique is often helpful in getting a tail recursive definition.
To use the predicate strSepCat/3, we'd generally need to construct the separator string with (the escape sequence for) two newlines:
?- funs(Fs,Lisp,[ ]), string_to_list(Sep,"\n\n"), strSepCat(Fs,Sep,CProg).
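For the question's other option, changing the top-level translator, here is a minimal sketch assuming SWI-Prolog 7 or later (where double-quoted text is a string). The built-in atomic_list_concat/3 can do the joining that strSepCat/3 does, and format/3 can fill in the skeleton from program//1. The name c_program/2 is hypothetical, and the translated functions are passed in directly rather than produced by funs//1:

```prolog
% c_program(+Functions, -Program)
% Hypothetical helper. Functions is a list of strings (e.g. from funs//1).
% atomic_list_concat/3 puts the separator only between consecutive
% elements, so there is no trailing blank line after the last function.
c_program(Functions, Program) :-
    atomic_list_concat(Functions, '\n\n', Body),      % Body is an atom
    format(string(Program),
           "#include \"lisp2c.h\"~n~n~w~nint main() { return 0; }",
           [Body]).
```

With funs//1 from the question loaded, something like funs(Fs, Lisp, []), c_program(Fs, P), write(P) should print the whole C file.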

What about using DCG notation for appending the strings?
concat([]) --> [].
concat([List|Lists]) --> List, "\n\n", concat(Lists).
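Note that concat//1 emits "\n\n" after every element, including the last one. Below is a sketch of a variant that puts the separator only between elements. The names concat_sep//1 and concat_rest//1 are hypothetical, and each element is assumed to be a code list. As far as I know, SWI-Prolog 7 still translates a double-quoted literal in a DCG body to codes, so "\n\n" works there:

```prolog
% concat_sep(+CodeLists)//: emit the code lists with "\n\n" between
% consecutive elements, but not after the last one.
% A variable in a DCG body (L) is called via phrase/3, so a bound
% code list is matched/emitted as a terminal sequence.
concat_sep([]) --> [].
concat_sep([L|Ls]) --> L, concat_rest(Ls).

% concat_rest(+CodeLists)//: every remaining element is preceded by "\n\n".
concat_rest([]) --> [].
concat_rest([L|Ls]) --> "\n\n", L, concat_rest(Ls).
```

It is called as phrase(concat_sep(CodeLists), Codes), followed by string_codes(String, Codes) if a string is wanted.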

When double-quoted strings are lists of character codes (traditional Prolog, SWI-Prolog before version 7, or SWI-Prolog started with --traditional), you can use append/3 in a custom predicate that also inserts the newlines:
concat_program([], "").
concat_program([L|Ls], Str) :-
concat_program(Ls, Str0),
append("\n\n", Str0, Str1),
append(L, Str1, Str).
Usage:
funs(Fs, Lisp, []),
concat_program(Fs, P),
format("#include ...~n~s", [P]).
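Under SWI-Prolog 7's default flags, "\n\n" is a string rather than a code list, so the append/3 call above does not match it. Here is a sketch of the same predicate for that setting (concat_program_codes/2 is a hypothetical name). It writes the literal with back quotes, which SWI-Prolog 7 reads as a code list by default. Like the original, it keeps a trailing "\n\n" after the last element:

```prolog
% concat_program_codes(+CodeLists, -Codes)
% Same shape as concat_program/2, but the separator is a back-quoted
% code list, so it works without the --traditional flag.
concat_program_codes([], []).
concat_program_codes([L|Ls], Codes) :-
    concat_program_codes(Ls, Codes0),
    append(`\n\n`, Codes0, Codes1),   % separator in front of the rest
    append(L, Codes1, Codes).         % this element in front of that
```

The result can be turned into a string with string_codes/2 or printed directly with format/2's ~s directive.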

A simpler (and more generic) solution than the accepted answer is to write a generic reduce and pass the existing string_concat/3 to it as a parameter:
reduce3(_, [], Default, Default).
reduce3(_, [A], _, A).
reduce3(P3, [A,B|T], _, D):-
call(P3, A, B, C),
reduce3(P3, [C|T], _, D).
?- reduce3(string_concat, ["123", "456", "789"], "", R).
R = "123456789"
?- reduce3(string_concat, ["123"], "", R).
R = "123"
?- reduce3(string_concat, [], "", R).
R = ""
strings_concat(Strings, String):-
reduce3(string_concat, Strings, "", String).
SWISH notebook: https://swish.swi-prolog.org/p/reduce.swinb
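Because reduce3/4 takes the combining predicate as a parameter, the double-newline separator from the question can come from a small combining predicate instead of plain string_concat/3. Here is a sketch, where concat_with_blank_line/3 and strings_join_blank_lines/2 are hypothetical names and reduce3/4 is repeated so the block loads on its own:

```prolog
% reduce3(:P3, +List, +Default, -Result): left fold of P3 over List;
% Default is returned only for the empty list.
reduce3(_, [], Default, Default).
reduce3(_, [A], _, A).
reduce3(P3, [A,B|T], _, D) :-
    call(P3, A, B, C),
    reduce3(P3, [C|T], _, D).

% concat_with_blank_line(+A, +B, -C): join two strings with "\n\n".
concat_with_blank_line(A, B, C) :-
    string_concat(A, "\n\n", A1),
    string_concat(A1, B, C).

strings_join_blank_lines(Strings, Joined) :-
    reduce3(concat_with_blank_line, Strings, "", Joined).
```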

Related

What exactly is the vertical slash function in PROLOG? Is it an operator?

I was studying the PROLOG programming language, testing some examples and reading documentations. I started then to do heavy research about lists in PROLOG. The idea is: Head and Tail. I then learned that lists can be expressed in PROLOG like this:
[Head | Tail]
The syntax is pretty simple, square brackets with a head and a tail, separated by a vertical slash |. I then asked myself what is the meaning (the semantics) of the vertical slash | in PROLOG. As I said, I had done research concerning lists and the vertical slash as well, but I was not able to find anything helpful about it.
So this is why I am a little bit confused. I suppose it is indeed a special character, but why does it necessarily have to be a vertical slash? Is it an operator? Is it used for system or language (meta) applications? What is its specific function in the language?
Yes, | is a right-associative infix operator of precedence 1105, right-associative meaning that an expression like
a|b|c|d
binds as
'|'( a , '|'( b , '|'( c , d ) ) )
rather than the left-associative binding
'|'( '|'( '|'( a , b ) , c ) , d )
It is part of Prolog's syntactic sugar for list notation. In Prolog, any non-empty list has a single item that is denoted as its head, and the remainder of the list, itself another list (which may be empty), denoted as the tail. (A rather nice recursive definition, eh?)
So one can easily partition a list into its head and tail using |. So
[Head|Tail] = [a,b,c,d]
results in
Head = a
Tail = [b,c,d]
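That head/tail split is exactly what recursive list predicates pattern-match on. Here is a minimal sketch that follows the recursive definition directly (list_length/2 is a hypothetical name, chosen to avoid clashing with the built-in length/2):

```prolog
% list_length(+List, -N): the empty list has length 0; a non-empty
% list [_|Tail] has one more element than its Tail.
list_length([], 0).
list_length([_|Tail], N) :-
    list_length(Tail, N0),
    N is N0 + 1.
```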
From my answer here,
Prolog's list notation is syntactic sugar on top of very simple prolog terms. Prolog lists are denoted thus:
The empty list is represented by the atom []. Why? Because that looks like the mathematical notation for an empty list. They could have used an atom like nil to denote the empty list but they didn't.
A non-empty list is represented by the term ./2, where the first (leftmost) argument is the head of the list and the second (rightmost) argument is the tail of the list, which is, recursively, itself a list. (SWI-Prolog 7 and later use the name '[|]' instead of '.', as the next answer shows.)
Some examples:
An empty list: [] is represented as the atom it is:
[]
A list of one element, [a] is internally stored as
.(a,[])
A list of two elements [a,b] is internally stored as
.(a,.(b,[]))
A list of three elements, [a,b,c] is internally stored as
.(a,.(b,.(c,[])))
Examination of the head of the list is likewise syntactic sugar over the same ./2 notation:
[X|Xs] is identical to .(X,Xs)
[A,B|Xs] is identical to .(A,.(B,Xs))
[A,B] is (see above) identical to .(A,.(B,[]))
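The same point can be shown in code: building a list cell explicitly with =.. from the functor name the system itself reports gives exactly the term that the bracket notation does. The name is looked up with functor/3 rather than hard-coded, because SWI-Prolog 7 and later use '[|]' rather than '.'. The name cons/3 is hypothetical:

```prolog
% cons(+Head, +Tail, -List): build one list cell explicitly.
cons(Head, Tail, List) :-
    functor([_], Name, 2),          % '.' in traditional systems, '[|]' in SWI-Prolog 7+
    List =.. [Name, Head, Tail].
```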
There seems to be a bit of confusion between the vertical bar | used in list pattern matching and the |/2 operator.
I am not familiar with other Prolog systems, so this might be SWI-Prolog specific. The help for '|' states the following:
help('|').
:Goal1 | :Goal2
Equivalent to ;/2. Retained for compatibility only. New code should use ;/2.
So, the | used in list notation is not this operator.
?- X = '[|]'(1, []).
X = [1].
?- X = '|'(1, []).
X = (1| []).
?- [1] = '|'(1, []).
false.
?- [1] = '[|]'(1, []).
true.
As seen above, using just | only creates a compound term, not a list.
The following uses univ (=..) to make this clearer:
?- X = '[|]'(a, '[|]'(b, [])).
X = [a, b].
?- [a, b, c] =.. X.
X = ['[|]', a, [b, c]].
?- deep_univ([a, b, c, d], X).
X = ['[|]', a, ['[|]', b, ['[|]', c, ['[|]', d, []]]]].
I have used deep_univ/2 from here
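The linked deep_univ/2 is not reproduced here. Below is a minimal sketch of what such a predicate could look like; it is a hypothetical reconstruction that may differ from the linked one in details such as how variables are handled. It applies =.. recursively to every compound subterm and leaves atomic terms (including []) and variables as they are:

```prolog
% deep_univ(+Term, -List): like Term =.. List, but applied recursively
% to every compound argument.
deep_univ(Term, List) :-
    compound(Term), !,
    Term =.. [Name|Args],
    maplist(deep_univ, Args, DeepArgs),
    List = [Name|DeepArgs].
deep_univ(Term, Term).        % atoms, numbers, [] and variables
```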

How to create anonymous variables from a string?

I have defined a predicate find_word/2 that when given a list of letters (with some letters possibly ungrounded), produces possible words that match the pattern given in the list. This is something like a hangman solver.
word('entity', n, 11).
word('physical entity', n, 1).
word('abstraction', n, 0).
% ... and 200,000 more entries ...
% Example: find_word([_,o,u,n,t,r,y], X) -> X = country
find_word(LetterList, Word) :-
word(Word, _, _),
atom_chars(Word, LetterList).
The code above works as intended. The challenge is that I receive hangman problems from outside the Prolog system as a string (e.g. app_e), where the underscores in the string represent the missing letters to be found by the prolog program above. i.e. I need to convert the app_e string into a list that can be fed into find_word/2.
On my first attempt, I used atom_chars/2:
?- atom_chars(app_e, L), find_word(L, Word).
Unfortunately, this does not work as hoped because atom_chars(app_e, L) -> L = [a, p, p, '_', e]. i.e. the '_' isn't a wildcard.
In summary, given a string app_e, how do I transform it into a list that can be fed into find_word/2 to achieve the same effect as find_word([a,p,p,_,e], Word)?
I think atom_chars/2 is working as intended here, you just need a little cleanup step to finish turning your input into the desired form, which I think you can do quite straightforwardly like so:
charvar('_', _).
charvar(C, C) :- C \= '_'.
Usage looks like this:
?- maplist(charvar, [a,p,p,'_',e], X).
X = [a, p, p, _3398, e] .
Don't worry about the fact that this variable is not rendered as an underscore; your own probably wouldn't be either:
?- X=[_].
X = [_3450].
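Putting the two steps together, here is a sketch that goes straight from the incoming atom or string to a pattern list for find_word/2. The name pattern_letters/2 is hypothetical, and charvar/2 is rewritten with if-then-else so that no choice point is left behind; its behaviour is otherwise the same as the two-clause version above:

```prolog
% charvar(+Char, -Letter): '_' becomes a fresh variable (the wildcard),
% any other character is kept as it is.
charvar(C, L) :-
    (   C == '_'
    ->  true          % leave L unbound
    ;   L = C
    ).

% pattern_letters(+Pattern, -Letters): e.g. app_e -> [a,p,p,_,e].
pattern_letters(Pattern, Letters) :-
    atom_chars(Pattern, Chars),
    maplist(charvar, Chars, Letters).
```

The query would then be ?- pattern_letters(app_e, L), find_word(L, Word).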

How to shorten a predicate that prints depending on a condition

So I have something like this:
main :-
% Check whether B,C,D is equal to A
... ,
% Relevant code:
(
(B==A -> write('B is the same as A.'));
(C==A -> write('C is the same as A.'));
(D==A -> write('D is the same as A.'))
).
Is there any way that this could be shortened but still print the relevant letter? There could be hundreds of letters to test, so the current method is not very nice.
Just a quick note in case you weren't aware of this difference: when you call A == B, you're testing whether the value bound to the variable A is equivalent to the value bound to the variable B. But when you use write/1 to output 'B is the same as A.', you are just outputting the atomic literal represented by that string of letters. There is no relationship between the character 'A' as part of an atom and the value bound to a variable which is represented by A (no ') in your source code.
So I'm not 100% clear on your intended result, but here are two different solutions that demonstrate the use of the format family of predicates for outputting values and literals:
If you just want to compare the values of two variables, you can use a predicate to perform the comparison and print out the desired result, which can then be used on all members of a list (forall/2 is appropriate here because we are only concerned with output):
report_on_equality(A, B) :-
A == B,
format('~w is the same as ~w.~n', [A, B]).
report_on_equality(A, B) :-
A \== B,
format('~w is not the same as ~w.~n', [A, B]).
example_1 :-
Vals = [1,4,6,1,7],
forall( member(V, Vals),
report_on_equality(V, 1)
).
But there is no reason to output the value of the variables twice in this case, since if they are equivalent, they will of course be the same value. So maybe you actually want to print out uppercase characters that have been previously associated with values. This, of course, requires that you have first made some pairing between uppercase characters and some other values. I have chosen to use a simple list of pairs for this purpose:
report_on_labeled_equality(LabelA-ValA, LabelB-ValB) :-
ValA == ValB,
format('~w is the same as ~w.~n', [LabelA, LabelB]).
report_on_labeled_equality(LabelA-ValA, LabelB-ValB) :-
ValA \== ValB,
format('~w is not the same as ~w.~n', [LabelA, LabelB]).
example_2 :-
Vals = ['B'-1, 'C'-3, 'D'-1, 'E'-4],
forall( member(V, Vals),
report_on_labeled_equality(V, 'A'-1)
).
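If the goal is to mimic the original if-then-else chain, which reports only the first letter whose value matches, a member/2 search inside an if-then-else does that for any number of labeled values. This is a sketch with a hypothetical name. Unlike the original chain, it succeeds silently when nothing matches:

```prolog
% first_match_report(+TargetLabel-TargetVal, +LabeledVals)
% Print a message for the first Label-Val pair whose Val == TargetVal,
% then stop; print nothing if no pair matches.
first_match_report(TargetLabel-TargetVal, LabeledVals) :-
    (   member(Label-Val, LabeledVals),
        Val == TargetVal
    ->  format('~w is the same as ~w.~n', [Label, TargetLabel])
    ;   true
    ).
```

For example, first_match_report('A'-1, ['B'-2, 'C'-1, 'D'-1]) reports only C.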

A DCG that matches the rest of the input

This is the predicate that does what it should, namely, collect whatever is left on input when part of a DCG:
rest([H|T], [H|T], []).
rest([], [], []).
but I am struggling to define this as a DCG... Or is it at all doable?
This of course is not the same (although it does the same when used in the same manner):
rest([H|T]) --> [H], !, rest(T).
rest([]) --> [].
The reason I think I need this is that the rest//1 is part of a set of DCG rules that I need to parse the input. I could do phrase(foo(T), Input, Rest), but then I would have to call another phrase(bar(T1), Rest).
Say I know that all I have left on input is a string of digits that I want as an integer:
phrase(stuff_n(Stuff, N), `some other stuff, 1324`).
stuff_n(Stuff, N) -->
stuff(Stuff),
rest(Rest),
{ number_codes(N, Rest),
integer(N)
}.
Answering my own silly question:
@CapelliC gave a solution that works (+1). It does something I don't understand :-(, but the real issue was that I did not understand the problem I was trying to solve. The real problem was:
Problem
You have as input a code list that you need to parse. The result should be a term. You know quite close to the beginning of this list of codes what the rest looks like. In other words, it begins with a "keyword" that defines the contents. In some cases, after some point in the input, the rest of the contents do not need to be parsed: instead, they are collected in the resulting term as a code list.
Solution
One possible solution is to break up the parsing in two calls to phrase/3 (because there is no reason not to?):
Read the keyword (first call to phrase/3) and make it an atom;
Look up in a table what the rest is supposed to look like;
Parse only what needs to be parsed (second call to phrase/3).
Code
So, using an approach from (O'Keefe 1990) and taking advantage of library(dcg/basics) available in SWI-Prolog, with a file rest.pl:
:- use_module(library(dcg/basics)).
codes_term(Codes, Term) :-
phrase(dcg_basics:nonblanks(Word), Codes, Codes_rest),
atom_codes(Keyword, Word),
kw(Keyword, Content, Rest, Term),
phrase(items(Content), Codes_rest, Rest).
kw(foo, [space, integer(N), space, integer(M)], [], foo(N, M)).
kw(bar, [], Text, bar(Text)).
kw(baz, [space, integer(N), space], Rest, baz(N, Rest)).
items([I|Is]) -->
item(I),
items(Is).
items([]) --> [].
item(space) --> " ".
item(integer(N)) --> dcg_basics:integer(N).
It is important that here, the "rest" does not need to be handled by a DCG rule at all.
Example use
This solution is nice because it is deterministic, and very easy to expand: just add clauses to the kw/4 table and item//1 rules. (Note the use of the --traditional flag when starting SWI-Prolog, for double-quote delimited code lists)
$ swipl --traditional --quiet
?- [rest].
true.
?- codes_term("foo 22 7", T).
T = foo(22, 7).
?- codes_term("bar 22 7", T).
T = bar([32, 50, 50, 32, 55]).
?- codes_term("baz 22 7", T).
T = baz(22, [55]).
An alternative (that doesn't leave a choice point behind) is to use the call//1 built-in non-terminal with a lambda expression. Using Logtalk's lambda expression syntax to illustrate:
rest(Rest) --> call({Rest}/[Rest,_]>>true).
This solution is a bit nasty, however, as it uses a variable with a dual role in the lambda expression (which triggers a warning with the Logtalk compiler). A usage example:
:- object(rest).
:- public(test/2).
test(Input, Rest) :-
phrase(input(Rest), Input).
input(Rest) --> [a,b,c], rest(Rest).
rest(Rest) --> call({Rest}/[Rest,_]>>true).
% rest([C|Cs]) --> [C|Cs]. % Carlo's solution
:- end_object.
Assuming the above object is saved in a dcg_rest.lgt source file:
$ swilgt
...
?- {dcg_rest}.
* Variable A have dual role in lambda expression: {A}/[A,B]>>true
* in file /Users/pmoura/Desktop/dcg_rest.lgt between lines 13-14
* while compiling object rest
% [ /Users/pmoura/Desktop/dcg_rest.lgt loaded ]
% 1 compilation warning
true.
?- rest::test([a,b,c,d,e], Rest).
Rest = [d, e].
You should be able to get the same results using other lambda expressions implementation such as Ulrich's lambda library.
It could be:
rest([C|Cs]) --> [C|Cs] .
At least in SWI-Prolog, it seems to work (I used library(dcg/basics) to get the number):
line(I,R) --> integer(I), rest(R).
?- phrase(line(N,R), `6546 okok`).
N = 6546,
R = [32, 111, 107, 111, 107]
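The question's own first definition can also be used directly as a non-terminal, because a non-terminal is just a predicate with two extra list arguments. Unlike rest([C|Cs]) --> [C|Cs], it also succeeds when nothing is left on input. Here is a sketch, with rest_all//1 as a hypothetical name; library(dcg/basics) also provides remainder//1 for the same purpose:

```prolog
:- use_module(library(dcg/basics)).   % integer//1

% rest_all(-Rest)//: Rest is everything left on input (possibly []),
% and nothing remains afterwards (S0 = Rest, S = []).
rest_all(Rest, Rest, []).

line(I, R) --> integer(I), rest_all(R).
```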

how to split a sentence in swi-prolog

I am trying my hand at SWI-Prolog on Windows XP. I am trying to understand how to split a sentence in Prolog into separate atoms.
Ex : Say I have a sentence like this :
"this is a string"
Is there any way to get individual words to get stored in a variable?
like :
X = this
Y = is
....
and so forth.
Can anyone please explain how this works?
Thanks.
I would use atomic_list_concat/3. See
http://www.swi-prolog.org/pldoc/man?predicate=atomic_list_concat%2F3
Normally it is meant to insert a separator but because of Prolog's bidirectionality of unification, it can also be used to split a string given the separator:
atomic_list_concat(L,' ', 'This is a string').
L = ['This',is,a,string]
Of course once the split is done you can play with the elements of the list L.
I like the answer of 'pat fats', but you have to convert your string to an atom first:
..., atom_codes(Atom, String), atomic_list_concat(L, ' ', Atom), ...
If you need to work directly with strings, I have this code in my 'arsenal':
:- use_module(library(dcg/basics)).   % provides string//1
%% split input on Sep
%
% minimal implementation
%
splitter(Sep, [Chunk|R]) -->
string(Chunk),
( Sep -> !, splitter(Sep, R)
; [], {R = []}
).
Being a DCG, it must be called like this:
?- phrase(splitter(" ", L), "this is a string"), maplist(atom_codes, As, L).
L = [[116, 104, 105, 115], [105, 115], [97], [115, 116, 114, 105, 110|...]],
As = [this, is, a, string] .
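In SWI-Prolog 7 and later, where "this is a string" is read as a string by default, the built-in split_string/4 can do the splitting without a DCG. Here is a sketch, where sentence_words/2 is a hypothetical name. Empty parts produced by repeated spaces are dropped before converting each part to an atom:

```prolog
% sentence_words(+Sentence, -Words): split on spaces into a list of atoms.
sentence_words(Sentence, Words) :-
    split_string(Sentence, " ", "", Parts0),   % separators " ", no padding removal
    exclude(==(""), Parts0, Parts),            % drop empty strings from "  "
    maplist(atom_string, Words, Parts).
```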
edit: more explanation
I forgot to explain how that works: DCGs are well explained by @larsman in this other answer. I quote him:
-->, which actually adds two hidden arguments to it. The first of these is a list to be parsed by the grammar rule; the second is "what's left" after the parse. c(F,X,[]) calls c on the list X to obtain a result F, expecting [] to be left, i.e. the parser should consume the entire list X.
Here I have 2 arguments: the first is the separator, the second the list being built. The non-terminal string//1 comes from SWI-Prolog's library(http/dcg_basics) (now library(dcg/basics)). It's a very handy building block that matches literally anything on backtracking. Here it's 'eating' each char before the separator or the end of the string. Having done that, we can recurse...
?-split("this is a string"," ", Out).
Out=["this","is","a"," string"]
