how to split a sentence in swi-prolog


I am trying out SWI-Prolog on Windows XP and want to understand how to split a sentence in Prolog into separate atoms.
Example: say I have a sentence like this:
"this is a string"
Is there any way to get the individual words stored in variables?
Like:
X = this
Y = is
....
and so forth.
Can anyone please explain how this works?
Thanks.

I would use atomic_list_concat/3. See
http://www.swi-prolog.org/pldoc/man?predicate=atomic_list_concat%2F3
Normally it is meant to join atoms with a separator, but when the separator is given and non-empty it also works in the other direction and splits an atom:
atomic_list_concat(L,' ', 'This is a string').
L = ['This',is,a,string]
Of course once the split is done you can play with the elements of the list L.
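A minimal sketch of how the call could be wrapped up, assuming the sentence is already an atom. The name split_words/2 is made up for this illustration and is not a built-in:

```prolog
% split_words(+Sentence, -Words): split an atom on single spaces.
% split_words/2 is a hypothetical helper around atomic_list_concat/3.
split_words(Sentence, Words) :-
    atomic_list_concat(Words, ' ', Sentence).

% Pattern-match on the resulting list to bind individual words:
% ?- split_words('this is a string', [X, Y|_]).
% X = this,
% Y = is.
```

Matching on the list head is usually the easiest way to get at the first few words without naming one variable per word.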

I like the answer by 'pat fats', but you have to convert your string to an atom first:
..., atom_codes(Atom, String), atomic_list_concat(L, ' ', Atom), ...
If you need to work directly with strings, I have this code in my 'arsenal':
:- use_module(library(dcg/basics)).   % provides string//1
%% split input on Sep
%
% minimal implementation
%
splitter(Sep, [Chunk|R]) -->
    string(Chunk),
    (   Sep -> !, splitter(Sep, R)
    ;   [], {R = []}
    ).
Being a DCG, it must be called through phrase/2, like this:
?- phrase(splitter(" ", L), "this is a string"), maplist(atom_codes, As, L).
L = [[116, 104, 105, 115], [105, 115], [97], [115, 116, 114, 105, 110|...]],
As = [this, is, a, string] .
Edit: more explanation
I forgot to explain how that works. DCGs are well explained by @larsman in this other answer. I quote him:
-->, which actually adds two hidden arguments to it. The first of these is a list to be parsed by the grammar rule; the second is "what's left" after the parse. c(F,X,[]) calls c on the list X to obtain a result F, expecting [] to be left, i.e. the parser should consume the entire list X.
Here I have 2 arguments: the first is the separator, the second is the list being built. The string//1 nonterminal comes from SWI-Prolog's library(http/dcg_basics) (library(dcg/basics) in newer releases). It's a very handy building block that matches literally anything on backtracking. Here it's 'eating' each char before the separator or the end of the string. Having done that, we can recurse...
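To make the "two hidden arguments" in the quote concrete, here is a small sketch. The grammar greeting//0 and the hand-written greeting_plain/2 are made-up names, and the translation shown is only roughly what a Prolog system generates:

```prolog
% A DCG rule ...
greeting --> [hello], [world].

% ... behaves roughly like this ordinary predicate, where S0 is the list
% to be parsed and S is "what's left" after the parse.
greeting_plain(S0, S) :-
    S0 = [hello|S1],
    S1 = [world|S].

% phrase/3 exposes the remainder; phrase/2 insists that it is [].
% ?- phrase(greeting, [hello, world, again], Rest).
% Rest = [again].
```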

?-split("this is a string"," ", Out).
Out=["this","is","a"," string"]
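The split/3 in the query above is presumably user-defined. On SWI-Prolog 7 or later, the built-in split_string/4 does a similar job. The sketch below assumes that version; sentence_atoms/2 and string_atom/2 are names invented here:

```prolog
% sentence_atoms(+Sentence, -Atoms): split text on spaces, then turn each
% piece into an atom. Both predicate names are hypothetical helpers.
sentence_atoms(Sentence, Atoms) :-
    split_string(Sentence, " ", "", Parts),   % no padding characters removed
    maplist(string_atom, Parts, Atoms).

% atom_string/2 with its arguments swapped, so it fits maplist/3.
string_atom(String, Atom) :-
    atom_string(Atom, String).

% ?- sentence_atoms("this is a string", As).
% As = [this, is, a, string].
```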

Related

Prolog: Convert constant with parentheses to string

I have a question about Prolog. When I have a constant, e.g. fib(7), and I want to convert it to a string with atom_codes(fib(7), A), I get an error saying that a non-numeric atom is expected for fib(7).
Obviously the parentheses are the problem. What can I do?

You have a couple of misunderstandings. fib(7) is not a "constant". It's a "term". It is also not an "atom". atom_codes (as its name implies) converts an atom to a list of character codes.
I'll give you some ideas on how to handle this problem, which I would have indicated in the comments, but it's much too long of a hint to do so. :)
You could write a predicate and use the =../2 to break the term down. =../2 unifies a term as its first argument with a list as its second where the functor of the term is the first element of the list, and the remaining elements are the arguments in the term.
If you know you are dealing with simple terms, then your predicate could look like this:
term_codes(Term, Codes) :-
    Term =.. [Functor | Arguments],
    atom_codes(Functor, Codes),
    (   Arguments = []
    ->  true % Term is a simple atom
    ;   % Arguments is a list of atoms or more complex terms
        % For a simple argument list, you can use atomic_list_concat
    ).
See Prolog list to comma separated string for an example of using atomic_list_concat and, from there, you can use atom_codes for a list.
This is just my first thought on this problem. For now I'll leave the filling in of the details in the above if Arguments is not empty. If you are going to assume always a single, atomic argument, the predicate is very simple. However, if you can have an arbitrary number of arguments for your Term, then you'll need to process it as a list and concatenate the results of atom_codes for each argument and include a code for comma (,) in between each sequence of atom codes. The predicate becomes even more complex if your Term can be compound (e.g., foo(1, bar(2, 3))). I'm not sure which it is since it hasn't been specified in the question.
Using your fib(7) example, here's the concept:
fib(7) =.. [fib, 7]
atom_codes(fib, [102, 105, 98])
atom_codes(7, [55])
atom_codes('(', [40])
atom_codes(')', [41])
% result would be: [102, 105, 98, 40, 55, 41]
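One way the idea might be completed, as a sketch only. It assumes every argument is atomic (no nested compound terms such as foo(1, bar(2, 3))) and uses append/2 from library(lists) to glue the pieces together:

```prolog
:- use_module(library(lists)).   % append/2

% term_codes(+Term, -Codes): codes for an atom or for a compound term
% whose arguments are all atomic. Nested compound arguments are not handled.
term_codes(Term, Codes) :-
    Term =.. [Functor | Arguments],
    atom_codes(Functor, FunctorCodes),
    (   Arguments = []
    ->  Codes = FunctorCodes
    ;   atomic_list_concat(Arguments, ',', ArgsAtom),
        atom_codes(ArgsAtom, ArgsCodes),
        % 40 and 41 are the character codes of '(' and ')'
        append([FunctorCodes, [40], ArgsCodes, [41]], Codes)
    ).

% ?- term_codes(fib(7), Codes), atom_codes(Atom, Codes).
% Codes = [102, 105, 98, 40, 55, 41],
% Atom = 'fib(7)'.
```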

Printing first letter of an atom Prolog

Print the first letter of two atoms in a list. I can't even get the first letter of one of the two atoms in the list to print.
grab_letter([],[]).
grab_letter([A],[B]) :- A = [First|_], B = [Second|_].
?- grab_letter([apple,pie]).
true ?
How do I get it to print "a"?

The ISO Prolog standard specifies a sub_atom/5 built-in predicate that can decompose an atom into sub-atoms. The first argument is the atom, the second argument is the number of characters before the sub-atom, the third argument is the length of the sub-atom, the fourth argument is the number of characters after the sub-atom, and the fifth argument is the sub-atom. For example:
| ?- sub_atom(apple, 0, 1, _, First).
First = a
yes
| ?- sub_atom(pie, 0, 1, _, First).
First = p
yes
You can call this predicate from your code that processes the list containing the atoms. Can you give it a try and edit your question with the updated code?
Alternatively, you could use the equally standard atom_chars/2 predicate, as suggested in a comment, but that is not a good idea: it creates a temporary list (which will eventually be garbage-collected) just to access the first character.
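A possible shape for the list-processing code, as a sketch. The names first_letters/2 and print_first_letters/1 are invented for this illustration:

```prolog
% first_letters(+Atoms, -Letters): the first character of each atom,
% extracted with sub_atom/5 (Before = 0, Length = 1).
first_letters([], []).
first_letters([Atom|Atoms], [First|Firsts]) :-
    sub_atom(Atom, 0, 1, _, First),
    first_letters(Atoms, Firsts).

% print_first_letters(+Atoms): write each first letter on its own line.
print_first_letters(Atoms) :-
    first_letters(Atoms, Letters),
    forall(member(Letter, Letters), (write(Letter), nl)).

% ?- first_letters([apple, pie], Ls).
% Ls = [a, p].
```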

A DCG that matches the rest of the input

This is the predicate that does what it should, namely, collect whatever is left on input when part of a DCG:
rest([H|T], [H|T], []).
rest([], [], []).
but I am struggling to define this as a DCG... Or is it at all doable?
This of course is not the same (although it does the same when used in the same manner):
rest([H|T]) --> [H], !, rest(T).
rest([]) --> [].
The reason I think I need this is that the rest//1 is part of a set of DCG rules that I need to parse the input. I could do phrase(foo(T), Input, Rest), but then I would have to call another phrase(bar(T1), Rest).
Say I know that all I have left on input is a string of digits that I want as an integer:
phrase(stuff_n(Stuff, N), `some other stuff, 1324`).
stuff_n(Stuff, N) -->
    stuff(Stuff),
    rest(Rest),
    {   number_codes(N, Rest),
        integer(N)
    }.

Answering my own silly question:
@CapelliC gave a solution that works (+1). It does something I don't understand :-(, but the real issue was that I did not understand the problem I was trying to solve. The real problem was:
Problem
You have as input a code list that you need to parse. The result should be a term. You know quite close to the beginning of this list of codes what the rest looks like. In other words, it begins with a "keyword" that defines the contents. In some cases, after some point in the input, the rest of the contents do not need to be parsed: instead, they are collected in the resulting term as a code list.
Solution
One possible solution is to break up the parsing into two calls to phrase/3 (because there is no reason not to?):
1. Read the keyword (first call to phrase/3) and make it an atom;
2. Look up in a table what the rest is supposed to look like;
3. Parse only what needs to be parsed (second call to phrase/3).
Code
So, using an approach from (O'Keefe 1990) and taking advantage of library(dcg/basics) available in SWI-Prolog, with a file rest.pl:
:- use_module(library(dcg/basics)).
codes_term(Codes, Term) :-
    phrase(dcg_basics:nonblanks(Word), Codes, Codes_rest),
    atom_codes(Keyword, Word),
    kw(Keyword, Content, Rest, Term),
    phrase(items(Content), Codes_rest, Rest).
kw(foo, [space, integer(N), space, integer(M)], [], foo(N, M)).
kw(bar, [], Text, bar(Text)).
kw(baz, [space, integer(N), space], Rest, baz(N, Rest)).
items([I|Is]) -->
    item(I),
    items(Is).
items([]) --> [].
item(space) --> " ".
item(integer(N)) --> dcg_basics:integer(N).
It is important that here, the "rest" does not need to be handled by a DCG rule at all.
Example use
This solution is nice because it is deterministic, and very easy to expand: just add clauses to the kw/4 table and item//1 rules. (Note the use of the --traditional flag when starting SWI-Prolog, for double-quote delimited code lists)
$ swipl --traditional --quiet
?- [rest].
true.
?- codes_term("foo 22 7", T).
T = foo(22, 7).
?- codes_term("bar 22 7", T).
T = bar([32, 50, 50, 32, 55]).
?- codes_term("baz 22 7", T).
T = baz(22, [55]).

An alternative (that doesn't leave a choice point behind) is to use the call//1 built-in non-terminal with a lambda expression. Using Logtalk's lambda expression syntax to illustrate:
rest(Rest) --> call({Rest}/[Rest,_]>>true).
This solution is a bit nasty, however, as it uses a variable with a dual role in the lambda expression (which triggers a warning with the Logtalk compiler). A usage example:
:- object(rest).
    :- public(test/2).
    test(Input, Rest) :-
        phrase(input(Rest), Input).
    input(Rest) --> [a,b,c], rest(Rest).
    rest(Rest) --> call({Rest}/[Rest,_]>>true).
    % rest([C|Cs]) --> [C|Cs]. % Carlo's solution
:- end_object.
Assuming the above object is saved in a dcg_rest.lgt source file:
$ swilgt
...
?- {dcg_rest}.
* Variable A have dual role in lambda expression: {A}/[A,B]>>true
* in file /Users/pmoura/Desktop/dcg_rest.lgt between lines 13-14
* while compiling object rest
% [ /Users/pmoura/Desktop/dcg_rest.lgt loaded ]
% 1 compilation warning
true.
?- rest::test([a,b,c,d,e], Rest).
Rest = [d, e].
You should be able to get the same results using another lambda expression implementation, such as Ulrich's lambda library.

It could be:
rest([C|Cs]) --> [C|Cs] .
At least in SWI-Prolog, it seems to run (I used library(dcg/basics) to get the number):
line(I,R) --> integer(I), rest(R).
?- phrase(line(N,R), `6546 okok`).
N = 6546,
R = [32, 111, 107, 111, 107]
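For comparison, a sketch of rest//1 written as an ordinary predicate with the two DCG arguments made explicit, much like the clauses in the question. It assumes the system lets a plain predicate be called as a nonterminal, which works in SWI-Prolog but is not guaranteed to be portable. Unlike the [C|Cs] version, it also accepts an empty remainder:

```prolog
:- use_module(library(dcg/basics)).   % integer//1

% rest(-Rest, +Input, -Remaining): unify Rest with all remaining input
% and leave nothing behind. Written directly instead of with -->.
rest(Rest, Rest, []).

line(I, R) --> integer(I), rest(R).

% ?- atom_codes('6546 okok', Cs), phrase(line(N, R), Cs).
% N = 6546,
% R = [32, 111, 107, 111, 107].
```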

Query regarding Concatenation

I know that we can concatenate atoms using atom_concat(Para1, Para2, Final). Is there any predicate available in Prolog that performs the reverse operation, i.e. takes an atom as input and gives back two atoms, one being the last character of the atom and the other the remaining part? E.g.:
?- rev_atom_concat(likes,Para1,Para2).
Para1 = like, Para2 = s
I am not sure whether this is really possible.

You may use sub_atom for this. sub_atom extracts part of an atom. The syntax is:
sub_atom(+Atom, ?Before, ?Len, ?After, ?Sub)
Atom is the initial atom; Sub the sub-atom. Extraction works this way:
<************************ Atom ************************>
<***** Prefix *****><***** Sub *****><**** Suffix *****>
<-- before chars --><-- len chars --><-- after chars -->
For example, to extract the last character:
?- sub_atom(likes, _, 1, 0, S).
S = s.
For example, to extract all the characters but the last one:
?- sub_atom(likes, 0, _, 1, S).
S = like.
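Putting the two queries together gives something like the rev_atom_concat/3 from the question. This is only a sketch; the predicate is not a built-in, and it fails for the empty atom because there is no last character to extract:

```prolog
% rev_atom_concat(+Atom, -Init, -Last): Last is the final character of
% Atom and Init is everything before it.
rev_atom_concat(Atom, Init, Last) :-
    sub_atom(Atom, Before, 1, 0, Last),   % last character; Before = length - 1
    sub_atom(Atom, 0, Before, 1, Init).   % everything up to the last character

% ?- rev_atom_concat(likes, Para1, Para2).
% Para1 = like,
% Para2 = s.
```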

Concatting a list of strings in Prolog

I'm writing a Lisp to C translator and I have a problem with handling strings. This is the code that transforms a unary Lisp function to its C equivalent:
define(F) --> fun_unary(F), !.
fun_unary(F) --> "(define (", label(Fun), spaces, label(Arg1), ")", spaces, expr(Body), ")",
{swritef(F, "data *%t(data *%t) { return(%t); }", [Fun, Arg1, Body])}, !.
funs([F]) --> define(F), !.
funs([F|Fs]) --> define(F), spaces, funs(Fs), !.
Now I want to read any number of functions and return them as a single string. The above funs is the best I could come up with, but it works like this:
?- funs(F, "(define (carzero l) (= (car l) 0)) (define (zero n) (= 0 n))", []).
F = ["data *carzero(data *l) { return(eq(car(l), make_atom_int(0))); }", "data *zero(data *n) { return(eq(make_atom_int(0), n)); }"].
While I want something like this:
F = "data *carzero(data *l) { return(eq(car(l), make_atom_int(0))); }\n\ndata *zero(data *n) { return(eq(make_atom_int(0), n)); }".
so that I can nicely swritef it into a complete program, between the #includes and main(). An alternative solution is to modify the highest-level translator to handle the list. It currently looks like this:
program(P) --> define(F), {swritef(P, "#include \"lisp2c.h\" \n\n%t \nint main() { return 0; }", [F])}, !.
How would I do any of these two? I'm using SWI Prolog.

Setting aside for now the purpose for which it's needed, let's write a Prolog predicate that concatenates a list of strings into one string, placing a double newline between each consecutive pair of strings (but not at the end of the output string, judging by the example that Jerry posted).
SWI-Prolog Manual: Normally I'd post "deep" links to the documentation, but the SWI-Prolog site uses a style of URL that triggers cross-site scripting (XSS) warnings with many browser/plugin combinations. So instead of linking I'll refer to the appropriate section.
Section 4.22 Representing text in strings says (in part), "String objects by default have no lexical representation and thus can only be created using the predicates below or through the foreign language interface." This can be a little confusing, as SWI-Prolog writes strings as double-quoted text, but reads double-quoted text (by default) as lists of character codes.
Here's code for a predicate that concatenates the strings in a list, inserting another string Separator in between consecutive string pairs:
strSepCat([ ],_,Empty) :-
    string_to_list(Empty,[ ]).
strSepCat([H|T],Separator,StrCat) :-
    strSepCat(T,Separator,H,StrCat).
strSepCat([ ],_,StrCat,StrCat).
strSepCat([H|T],Sep,Str,Cat) :-
    string_concat(Sep,H,SepH),
    string_concat(Str,SepH,StrSepH),
    strSepCat(T,Sep,StrSepH,Cat).
Note that we've defined two predicates, strSepCat/3 and strSepCat/4. The former is defined in terms of the latter, a typical design pattern in Prolog that introduces an extra argument as an accumulator that binds to an output when recursion is complete. Such a technique is often helpful in getting a tail recursive definition.
To use the predicate strSepCat/3, we'd generally need to construct the separator string with (the escape sequence for) two newlines:
?- funs(Fs,Lisp,[ ]), string_to_list(Sep,"\n\n"), strSepCat(Fs,Sep,CProg).

What about using DCG notation for appending the strings?
concat([]) --> [].
concat([List|Lists]) --> List, "\n\n", concat(Lists).

Since double-quoted strings in traditional Prolog (including SWI-Prolog before version 7, or when started with --traditional) are really lists of character codes, you can use append/3 in a custom predicate that also inserts the newlines:
concat_program([], "").
concat_program([L|Ls], Str) :-
    concat_program(Ls, Str0),
    append("\n\n", Str0, Str1),
    append(L, Str1, Str).
Usage:
funs(Fs, Lisp, []),
concat_program(Fs, P),
write("#include ...\n"),
writef(P).

A simpler (and more generic) solution than the accepted answer is to use reduce with the existing string_concat as a parameter:
reduce3(_, [], Default, Default).
reduce3(_, [A], _, A).
reduce3(P3, [A,B|T], _, D):-
    call(P3, A, B, C),
    reduce3(P3, [C|T], _, D).
?- reduce3(string_concat, ["123", "456", "789"], "", R).
R = "123456789"
?- reduce3(string_concat, ["123"], "", R).
R = "123"
?- reduce3(string_concat, [], "", R).
R = ""
strings_concat(Strings, String):-
    reduce3(string_concat, Strings, "", String).
SWISH notebook: https://swish.swi-prolog.org/p/reduce.swinb
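If a separator is needed, as in the original question, another option on SWI-Prolog 7 or later is atomic_list_concat/3, which accepts strings as well as atoms. A sketch under that assumption; join_strings/3 is a made-up helper name:

```prolog
% join_strings(+Strings, +Separator, -Joined): Joined is a string made of
% the elements of Strings with Separator between consecutive elements.
join_strings(Strings, Separator, Joined) :-
    atomic_list_concat(Strings, Separator, Atom),   % the result is an atom
    atom_string(Atom, Joined).                      % convert it to a string

% ?- join_strings(["123", "456", "789"], "", R).
% R = "123456789".
```

With "\n\n" as the separator this puts the double newline only between elements, not after the last one.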
