Replacing white spaces in Prolog

Is it possible in prolog to replace all white spaces of a string with some given character?
Example-
If I have a variable How are you today? and I want How_are_you_today?

For atoms
There are many ways in which this can be done. I find the following particularly simple, using atomic_list_concat/3:
?- atomic_list_concat(Words, ' ', 'How are you today?'), atomic_list_concat(Words, '_', Result).
Words = ['How', are, you, 'today?'],
Result = 'How_are_you_today?'.
For SWI strings
The above can also be done with SWI strings. Unfortunately, there is no string_list_concat/3 which would have made the conversion trivial. split_string/4 is very versatile, but it only does half of the job:
?- split_string("How are you today?", " ", "", Words).
Words = ["How", "are", "you", "today?"].
We can either define string_list_concat/3 ourselves (a first attempt at defining this is shown below) or we need a slightly different approach, e.g. repeated string_concat/3.
string_list_concat(Strings, Separator, String):-
    var(String), !,
    maplist(atom_string, [Separator0|Atoms], [Separator|Strings]),
    atomic_list_concat(Atoms, Separator0, Atom),
    atom_string(Atom, String).
string_list_concat(Strings, Separator, String):-
    maplist(atom_string, [Separator0,Atom], [Separator,String]),
    atomic_list_concat(Atoms, Separator0, Atom),
    maplist(atom_string, Atoms, Strings).
And then:
?- string_list_concat(Words, " ", "How are you today?"), string_list_concat(Words, "_", Result).
Words = ["How", "are", "you", "today?"],
Result = "How_are_you_today?".
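The other route mentioned above, repeated string_concat/3, might look like the following minimal sketch. It assumes SWI-Prolog 7 or later with the default double_quotes flag; string_spaces_replaced/2 and join_with/4 are names made up for this illustration.

```prolog
% Sketch only: SWI-Prolog 7+ strings; predicate names are hypothetical.
% Split on single spaces, then glue the parts back together with "_".
string_spaces_replaced(String, Replaced) :-
    split_string(String, " ", "", [First|Rest]),
    foldl(join_with("_"), Rest, First, Replaced).

% join_with(+Sep, +Part, +Acc0, -Acc): Acc is Acc0, Sep and Part concatenated.
join_with(Sep, Part, Acc0, Acc) :-
    string_concat(Acc0, Sep, Acc1),
    string_concat(Acc1, Part, Acc).

% ?- string_spaces_replaced("How are you today?", R).
% R = "How_are_you_today?".
```

Because split_string/4 yields an empty string between two adjacent spaces, every single space ends up as one underscore.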

It all depends on what you mean by a string. SWI has several representations for them: some are generally available in any common Prolog and conform to the ISO standard, and some are specific to SWI and do not conform. So, let's start with those that are generally available:
### Strings as lists of character codes (integers representing code points)
This representation is often the default; prior to SWI 7 it was the default in SWI, too. The biggest downside is that a list of arbitrary integers can be confused with text.
:- set_prolog_flag(double_quotes, codes).
codes_replaced(Xs, Ys) :-
maplist(space_repl, Xs, Ys).
space_repl(0' ,0'_).
space_repl(C, C) :- dif(C,0' ).
?- codes_replaced("Spaces  !", R).
R = [83,112,97,99,101,115,95,95,33]
; false.
### Strings as lists of characters (atoms of length 1)
This representation is a bit cleaner since it does not confuse integers with characters; see this reply for how to get more compact answers.
:- set_prolog_flag(double_quotes, chars).
chars_replaced(Xs, Ys) :-
maplist(space_replc, Xs, Ys).
space_replc(' ','_').
space_replc(C, C) :- dif(C,' ').
?- chars_replaced("Spaces  !", R).
R = ['S',p,a,c,e,s,'_','_',!]
; false.
### Strings as atoms
@WouterBeek already showed you how this can be done with an SWI-specific built-in. I will reuse the definition above:
atom_replaced(A, R) :-
atom_chars(A, Chs),
chars_replaced(Chs, Rs),
atom_chars(R, Rs).
?- atom_replaced('Spaces  !',R).
R = 'Spaces__!'
; false.
So far, everything applies to ISO Prolog.
### Strings as an SWI-specific, non-conforming data type
This version does not work in any other system. I mention it for completeness.
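For illustration, a minimal sketch of that SWI-only variant could convert the string to characters and back, reusing chars_replaced/2 from above (repeated here so the block loads on its own). string_replaced/2 is a name invented for this example; string_chars/2 is an SWI-Prolog 7 built-in.

```prolog
% Sketch only: requires SWI-Prolog 7+ (string_chars/2, dif/2).
% string_replaced/2 is a hypothetical name, not part of the answer above.
string_replaced(S, R) :-
    string_chars(S, Chs),
    chars_replaced(Chs, Rs),
    string_chars(R, Rs).

% Repeated from the characters section so this block is self-contained.
chars_replaced(Xs, Ys) :-
    maplist(space_replc, Xs, Ys).

space_replc(' ', '_').
space_replc(C, C) :- dif(C, ' ').

% ?- string_replaced("Spaces  !", R).
% first answer: R = "Spaces__!"
```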

SWI-Prolog DCGs allow an easy definition, using a 'push back' or lookahead argument:
?- phrase(rep_string(` `, `_`), `How are you`, R),atom_codes(A,R).
R = [72, 111, 119, 95, 97, 114, 101, 95, 121|...],
A = 'How_are_you'
The definition is
rep_string(Sought, Replace), Replace --> Sought, rep_string(Sought, Replace).
rep_string(Sought, Replace), [C] --> [C], rep_string(Sought, Replace).
rep_string(_, _) --> [].
edit To avoid multiple 'solutions', a possibility is
rep_string(Sought, Replace), Replace --> Sought, !, rep_string(Sought, Replace).
rep_string(Sought, Replace), [C] --> [C], !, rep_string(Sought, Replace).
rep_string(_, _) --> [].

Related

I can't get my Prolog DCG working with atom concat

I can't get this Prolog DCG code working:
?- String1="   ", string_codes(String1,Codes), phrase(spaces(Output),Codes).
spaces(XXs) -->
[X], {X=32}, spaces(Xs),
{char_code(Ch,X), atom_concat(Ch,Xs,XXs)}, !. %%Space
spaces([]) --> [].
I feel like an improved solution would probably be something like this:
spaces(Spaces) --> " ", spaces(S0), { atom_concat(' ', S0, Spaces) }.
spaces('') --> [].
There's no real need to ask what the char is for code 32, you know it's a space. Also, [X], {X=32} from your answer is better as [32], which is still better as " ".
I solved this by changing [] in the base case to ''.
spaces(XXs) -->
[X], {X=32}, spaces(Xs),
{char_code(Ch,X), atom_concat(Ch,Xs,XXs)}, !. %% Space
spaces('') --> [].
String1 = "   ",
Codes = [32, 32, 32],
Output = '   '.
If you are writing DCGs in SWI-Prolog, there is a library of often-used predicates and DCG clauses, library(dcg/basics). It can be loaded with
:- use_module(library(dcg/basics)).
To list the code for the predicates use listing/1, e.g.
?- listing(dcg_basics:_).
The library has a DCG clause blanks//0 that does what you want, e.g.
?- listing(dcg_basics:blanks).
blanks(A, B) :-
blank(A, C),
!,
D=C,
blanks(D, B).
blanks(A, A).
true.
?- listing(dcg_basics:blank).
blank([C|A], B) :-
nonvar(C),
code_type(C, space),
B=A.
true.
which as DCG is
blank -->
[C],
{
nonvar(C),
code_type(C,space)
}.
blanks -->
blank, !, blanks.
blanks --> [].
NB
The library version uses character codes and not characters.
?- string_codes("",Codes),phrase(blanks,Codes,Rest).
Codes = Rest, Rest = [].
?- string_codes(" ",Codes),phrase(blanks,Codes,Rest).
Codes = [32],
Rest = [].
?- string_codes("  ",Codes),phrase(blanks,Codes,Rest).
Codes = [32, 32],
Rest = [].
?- string_codes("   ",Codes),phrase(blanks,Codes,Rest).
Codes = [32, 32, 32],
Rest = [].
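As a hedged illustration of blanks//0 inside a larger grammar, the sketch below reads white-space-separated integers. integers//1 is a hypothetical helper written for this example; blanks//0 and integer//1 come from library(dcg/basics).

```prolog
:- use_module(library(dcg/basics)).

% integers(-Ns)//: hypothetical helper, not part of the library.
% Skips any blanks, reads an integer, and repeats; the last clause
% accepts trailing blanks at the end of the input.
integers([N|Ns]) --> blanks, integer(N), !, integers(Ns).
integers([])     --> blanks.

% ?- atom_codes(' 12  7 ', Cs), phrase(integers(Ns), Cs).
% binds Ns = [12, 7].
```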

All substrings with same begin and end

I have to solve a homework but I have a very limited knowledge of Prolog. The task is the following:
Write a Prolog program which can list all of those substrings of a string whose length is at least two characters and whose first and last characters are the same.
For example:
?- sameend("teletubbies", R).
R = "telet";
R = "ele";
R = "eletubbie";
R = "etubbie";
R = "bb";
false.
My approach of this problem is that I should iterate over the string with head/tail and find the index of the next letter which is the same as the current (it satisfies the minimum 2-length requirement) and cut the substring with sub_string predicate.
This depends a bit on what you exactly mean by a string. Traditionally in Prolog, a string is a list of characters. To ensure that you really get those, use the directive below. See this answer for more.
:- set_prolog_flag(double_quotes, chars).
sameend(Xs, Ys) :-
phrase( ( ..., [C], seq(Zs), [C], ... ), Xs),
phrase( ( [C], seq(Zs), [C] ), Ys).
... --> [] | [_], ... .
seq([]) -->
[].
seq([E|Es]) -->
[E],
seq(Es).
If your Prolog has append/2 and last/2 in library(lists), it's as easy as
sameend(S,[F|T]) :-
append([_,[F|T],_],S),last(T,F).

About a Prolog tokenizer

One of my assignments asks us to build a Prolog tokenizer. Right now I have written a predicate that can change spaces and tabs into newlines. But I don't know how to integrate it into the main program.
The replace part looks like this:
replace(_, _, [], []).
replace(O, R, [O|T], [R|T2]):- replace(O, R, T, T2).
replace(O, R, [H|T], [H|T2]) :- H \= O, replace(O, R, T, T2).
And the main part has a predicate called removewhite(List1, List2).
So how can I let removewhite execute replace?
You are a bit 'off trail' toward a tokenizer: removewhite/2 isn't going to buy you any useful functionality. Instead, consider a DCG (of course if your Prolog offers this functionality):
tokenize(String, Tokens) :- phrase(tokenize(Tokens), String).
tokenize([]) --> [].
tokenize(Tokens) --> skip_spaces, tokenize(Tokens).
tokenize([Number|Tokens]) --> number(Number), tokenize(Tokens).
skip_spaces --> code_types(white, [_|_]).
number(N) --> code_types(digit, [C|Cs]), {number_codes(N,[C|Cs])}.
code_types(Type, [C|Cs]) --> [C], {code_type(C,Type)}, !, code_types(Type, Cs).
code_types(_, []) --> [].
Despite its simplicity, this is a fairly efficient scanner, and it is easily extensible.
In SWI-Prolog, which has (non-ISO-compliant) extensions for efficient handling of strings, this can be called from the top level like:
?- tokenize(`123 4 567 `, L).
L = [123, 4, 567]
or
?- atom_codes('123  4 567 ',Cs), tokenize(Cs, L).
Cs = [49, 50, 51, 32, 32, 52, 32, 53, 54|...],
L = [123, 4, 567]
Btw, in SWI-Prolog, number//1 is predefined (with much more functionality, of course) in library(dcg/basics).
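To illustrate the "easily extensible" remark, here is a sketch that adds one more token class, alphabetic words, to the scanner above. word//1 and the extra tokenize//1 clause are additions for this example, not part of the original answer; the rest is repeated so the block loads on its own.

```prolog
tokenize(String, Tokens) :- phrase(tokenize(Tokens), String).

tokenize([]) --> [].
tokenize(Tokens) --> skip_spaces, tokenize(Tokens).
tokenize([Number|Tokens]) --> number(Number), tokenize(Tokens).
tokenize([Word|Tokens]) --> word(Word), tokenize(Tokens).      % added clause

skip_spaces --> code_types(white, [_|_]).
number(N) --> code_types(digit, [C|Cs]), { number_codes(N, [C|Cs]) }.
% Added (hypothetical) token class: a run of alphabetic codes as an atom.
word(W) --> code_types(alpha, [C|Cs]), { atom_codes(W, [C|Cs]) }.

% Longest run of codes of the given type (possibly empty).
code_types(Type, [C|Cs]) --> [C], { code_type(C, Type) }, !, code_types(Type, Cs).
code_types(_, []) --> [].

% ?- atom_codes('foo 12 bar', Cs), tokenize(Cs, L).
% first answer binds L = [foo, 12, bar].
```

The number//1 clause is tried before word//1, so digit runs still become numbers.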
Anyway, about your question
how can I let removewhite execute replace?
I feel you're really 'barking up the wrong tree': removing a space, which actually is a separator, will mess up your input...
You can write a more "powerful" predicate
replace_all(_, _, [], []).
replace_all(L, R, [X|T], [R|T2]):-
    member(X, L),
    replace_all(L, R, T, T2).
replace_all(L, R, [X|T], [X|T2]) :-
    \+ member(X, L),
    replace_all(L, R, T, T2).
Then, you will have
removewhite(List1, List2) :-
    replace_all([' ', '\t'], '\n', List1, List2).
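As a possible variation, the membership test can be done once per element with an if-then-else. This is only a sketch for ground input lists; replace_each/4 is a hypothetical name used here so it does not clash with replace_all/4.

```prolog
% replace_each(+From, +To, +List0, -List): hypothetical variant of
% replace_all/4 that tests membership once per element.
replace_each(_, _, [], []).
replace_each(From, To, [X|Xs], [Y|Ys]) :-
    (   memberchk(X, From) -> Y = To ; Y = X ),
    replace_each(From, To, Xs, Ys).

% Same interface as the removewhite/2 above, built on the variant.
removewhite(List1, List2) :-
    replace_each([' ', '\t'], '\n', List1, List2).

% ?- removewhite([a, ' ', b, '\t', c], L).
% binds L = [a, '\n', b, '\n', c].
```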

Delete vowels in a list

Write a program that deletes vowels (String, NoVowelsString) that deletes all vowels from a given string.
So far I've got the condition vowel(X):- member(X,[a,e,i,o,u]). Then I thought of the one that deletes all the elements from the other list:
delete2([],L1,L1).
delete2([H|T],L1,L3) :-
delete2(H,L1,R2),
delete2(T,R2,L3).
So having these two I thought that I could put a condition to those elements being deleted that they have to be a member of [a,e,i,o,u]. Though I still haven't got anywhere.
The following is based on the reification of term equality/inequality.
First, we define list_memberd_t/3, which behaves just like memberd_truth/3 but has a different argument order:
list_memberd_t([] ,_,false).
list_memberd_t([Y|Ys],X,Truth) :-
if_(X=Y, Truth=true, list_memberd_t(Ys,X,Truth)).
list_memberd_truth(Xs,X,Truth) :- list_memberd_t(Xs,X,Truth).
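Note that if_/3 and the reified equality (=)/3 are not built-ins; the answer presumably takes them from library(reif). For readers without that library, the following is a minimal sketch of the usual definitions, stated here as an assumption rather than as the answer's own code, together with list_memberd_t/3 so the block loads by itself.

```prolog
% Assumed definitions in the style of library(reif); sketch only.
% if_(If_1, Then_0, Else_0): call If_1 with an extra truth argument
% and branch on its value.
if_(If_1, Then_0, Else_0) :-
    call(If_1, T),
    (   T == true  -> call(Then_0)
    ;   T == false -> call(Else_0)
    ;   nonvar(T)  -> throw(error(type_error(boolean, T), _))
    ;   throw(error(instantiation_error, _))
    ).

% =(X, Y, T): T is true if X and Y are equal, false if they differ;
% for terms that are not yet decided, both cases are enumerated.
=(X, Y, T) :-
    (   X == Y -> T = true
    ;   X \= Y -> T = false
    ;   T = true,  X = Y
    ;   T = false, dif(X, Y)
    ).

list_memberd_t([], _, false).
list_memberd_t([Y|Ys], X, Truth) :-
    if_(X = Y, Truth = true, list_memberd_t(Ys, X, Truth)).

% ?- list_memberd_t([a,e,i,o,u], e, T).   % T = true
% ?- list_memberd_t([a,e,i,o,u], z, T).   % T = false
```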
For the sake of brevity, let's define memberd_t/3 based on list_memberd_t/3:
memberd_t(X,Xs,Truth) :- list_memberd_t(Xs,X,Truth).
As a parallel to library(apply), let's define tinclude/3:
:- meta_predicate tinclude(2,?,?).
tinclude(P_2,Xs,Zs) :-
list_tinclude_list(Xs,P_2,Zs).
list_tinclude_list([], _P_2,[]).
list_tinclude_list([E|Es],P_2,Fs0) :-
if_(call(P_2,E), Fs0 = [E|Fs], Fs0 = Fs),
list_tinclude_list(Es,P_2,Fs).
tfilter/3 is another name for tinclude/3:
tfilter(P_2,As,Bs) :-
tinclude(P_2,As,Bs).
Next, we define the meta-predicate texclude/3, the opposite of tinclude/3:
:- meta_predicate texclude(2,?,?).
texclude(P_2,Xs,Zs) :-
list_texclude_list(Xs,P_2,Zs).
list_texclude_list([],_,[]).
list_texclude_list([E|Es],P_2,Fs0) :-
if_(call(P_2,E), Fs0 = Fs, Fs0 = [E|Fs]),
list_texclude_list(Es,P_2,Fs).
Now let's use them together!
?- texclude(list_memberd_truth([a,e,i,o,u]),
[d,e,l,e,t,e,' ',v,o,w,e,l,s,' ',i,n,' ',a,' ',l,i,s,t], Filtered).
Filtered = [d, l, t, ' ',v, w, l,s,' ', n,' ', ' ',l, s,t].
Edit
As an alternative to using above texclude/3, let's use tinclude/3 with an auxiliary predicate not/3 to flip the truth value:
:- meta_predicate not(2,?,?).
not(P_2,X,Truth) :-
call(P_2,X,Truth0),
truth_flipped(Truth0,Truth).
truth_flipped(true,false).
truth_flipped(false,true).
Sample query:
?- tinclude(not(list_memberd_truth([a,e,i,o,u])),
[d,e,l,e,t,e,' ',v,o,w,e,l,s,' ',i,n,' ',a,' ',l,i,s,t], Filtered).
Filtered = [d, l, t, ' ',v, w, l,s,' ', n,' ', ' ',l, s,t].
Here is a solution using a DCG. Note how the 'output' is obtained (no argument passing, only difference lists):
novowels --> ("a";"e";"i";"o";"u"), !, novowels.
% or ..
% novowels --> [C], {memberchk(C, "aeiou")}, !, novowels.
novowels, [C] --> [C], !, novowels.
novowels --> [].
I must confess I don't like the second cut, but it seems to be required.
test:
?- phrase(novowels, "abcdefghilmnopq", L),format('~s',[L]).
bcdfghlmnpq
L = [98, 99, 100, 102, 103, 104, 108, 109, 110|...].
edit About the second cut, it seems required by 'left hand' notation: if I code with argument, without cut, I get a correct parsing:
novowels(Cs) --> ("a";"e";"i";"o";"u"), !, novowels(Cs).
% novowels(Cs) --> [C], {memberchk(C, "aeiou")}, !, novowels(Cs).
novowels([C|Cs]) --> [C], novowels(Cs).
novowels([]) --> [].
test:
?- phrase(novowels(L), "abcdefghilmnopq"),format('~s',[L]).
bcdfghlmnpq
L = [98, 99, 100, 102, 103, 104, 108, 109, 110|...] ;
false.
I wonder if this is a bug of the DCG translator, or (more probably) my fault...
Here is the code
deleteV([H|T],R):-member(H,[a,e,i,o,u]),deleteV(T,R),!.
deleteV([H|T],[H|R]):-deleteV(T,R),!.
deleteV([],[]).
What does it do?
First it asks itself: is the head a vowel?
Yes -> we ignore it.
No -> we need it.
When it reaches the empty list, it starts constructing the result list, and as the recursion unwinds it adds the consonants to the front.
This code was tested in SWI-Prolog.
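For comparison, in SWI-Prolog the same filtering can be sketched with exclude/3 from library(apply), reusing the vowel/1 test from the question (with memberchk/2 so it succeeds at most once). delete_vowels/2 is a name chosen for this example and, like deleteV/2, it is meant to be called with a ground input list.

```prolog
:- use_module(library(apply)).   % exclude/3

% vowel(+C): C is one of the five vowel characters.
vowel(C) :- memberchk(C, [a,e,i,o,u]).

% delete_vowels(+Chars, -NoVowels): keep only the non-vowel elements.
% Hypothetical name; a sketch, not the answer's own code.
delete_vowels(Chars, NoVowels) :-
    exclude(vowel, Chars, NoVowels).

% ?- atom_chars(delete, Cs), delete_vowels(Cs, Ks).
% binds Ks = [d, l, t].
```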

Using a prolog DCG to find & replace - code review

I came up w/ the following code to replace all occurrences of Find w/ Replace in Request & put the answer in Result. This is using a DCG, so they are all lists of character codes. The predicate that client code would use is substitute.
findReplace(_, _, [], []) -->
[]. % The end.
findReplace(Find, Replace, Result, ResultRest) -->
Find, % Found Find.
{ append(Replace, Intermediate, Result) }, % Put in Replace in Find's place.
!, % Make sure we don't backtrack & interpret Find as the next case.
findReplace(Find, Replace, Intermediate, ResultRest).
findReplace(Find, Replace, [ C | Intermediate ], ResultRest) -->
[ C ], % Any other character.
findReplace(Find, Replace, Intermediate, ResultRest).
substitute(Find, Replace, Request, Result):-
phrase(findReplace(Find, Replace, Result, []), Request).
This works in SWI-Prolog. Does anyone have any comments on how I could improve it? I'm learning how to use DCG's & difference lists. E.g., I put in the cut so that, after finding Find, prolog doesn't ever backtrack & interpret that as an ordinary character in the [ C ] case. Is this needed, or is there a more declarative way of doing so?
Another question - is there a predicate already available to do the same thing that substitute does, maybe on atoms?
Thanks in advance.
Consider using semicontext notation to replace subsequences in DCGs:
eos([], []).
replace(_, _) --> call(eos), !.
replace(Find, Replace), Replace -->
Find,
!,
replace(Find, Replace).
replace(Find, Replace), [C] -->
[C],
replace(Find, Replace).
substitute(Find, Replace, Request, Result):-
phrase(replace(Find, Replace), Request, Result).
Example:
?- substitute("a", "b", "atesta", R), atom_codes(A, R).
R = [98, 116, 101, 115, 116, 98],
A = btestb.
Also, underscores_are_much_more_readable thanMixedCaseNamesAsYouSee.
About the second question, i.e. working with atoms, I wrote this utility using atomic_list_concat/3:
%% replace_word(+Old, +New, +Orig, -Replaced)
%% is det.
%
% string replacement
% doesn't fail if not found
%
replace_word(Old, New, Orig, Replaced) :-
    atomic_list_concat(Split, Old, Orig),
    atomic_list_concat(Split, New, Replaced).
Example:
?- replace_word(a, b, atesta, X).
X = btestb.
