I am trying to break a word into syllables in Prolog according to two different rules:
rule 1: vowel-consonant-vowel (break the word after the second vowel)
rule 2: vowel-consonant-consonant-vowel (break the word between the two consonants)
For example, calculator = cal-cula-tor.
I already have the following code in Prolog; however, it only analyzes the first three or four letters of the word.
I need it to process the entire word.
vowel(a).
vowel(e).
vowel(i).
vowel(o).
vowel(u).
consonant(L):- not(vowel(L)).
syllable(W, S, RW):-
atom_chars(W, [V1, C, V2|Tail]),
vowel(V1),
consonant(C),
vowel(V2),
!,
atomic_list_concat([V1, C, V2], S),
atomic_list_concat(Tail, RW).
syllable(W, S, RW):-
atom_chars(W, [V1, C, C2, V2|Tail]),
vowel(V1),
consonant(C),
consonant(C2),
vowel(V2),
!,
atomic_list_concat([V1, C, C2, V2], S),
atomic_list_concat(Tail, RW).
syllable(W, W, _).
break(W, B):-
syllable(W, B, ''), !.
break(W, B):-
syllable(W, S, RW),
break(RW, B2),
atomic_list_concat([S, '-', B2], B).
First, a setting that makes it much more convenient to specify lists of characters, and which I recommend you use in your code if you process text a lot:
:- set_prolog_flag(double_quotes, chars).
Second, the data, represented in such a way that the definitions can be used in all directions:
vowel(a). vowel(e). vowel(i). vowel(o). vowel(u).
consonant(C) :- maplist(dif(C), [a,e,i,o,u]).
For example:
?- consonant(C).
dif(C, u),
dif(C, o),
dif(C, i),
dif(C, e),
dif(C, a).
whereas the version you posted incorrectly says that there is no consonant:
?- consonant(C).
false.
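A small sketch to make the contrast concrete. consonant_naf/1, pure_ok/0 and naf_ok/0 are hypothetical names introduced only for this comparison; the only thing that differs between the two checks is whether the letter is already known when consonant is called:

```prolog
% Both consonant definitions side by side.
% consonant_naf/1 is a hypothetical name for the question's version.
vowel(a). vowel(e). vowel(i). vowel(o). vowel(u).

% Pure version: C is constrained to differ from every vowel.
consonant(C) :- maplist(dif(C), [a,e,i,o,u]).

% Question's version: negation as failure, only sound when C is bound.
consonant_naf(C) :- \+ vowel(C).

% With the pure version, the order of the goals does not matter ...
pure_ok :- consonant(C), C = b.

% ... but with negation as failure it does: for an unbound C, vowel(C)
% succeeds (C = a), so \+ vowel(C) fails before C = b is ever tried.
naf_ok :- consonant_naf(C), C = b.
```

With the dif/2-based definition, pure_ok succeeds just like the query C = b, consonant(C) would; naf_ok fails even though b is a consonant.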
The rules you outline are readily described in Prolog:
% rule 1: vowel-consonant-vowel (break after second vowel)
rule([V1,C,V2|Rest], Bs0, Bs, Rest) :-
vowel(V1), consonant(C), vowel(V2),
reverse([V2,C,V1|Bs0], Bs).
% rule 2: vowel-consonant-consonant-vowel (break between the consonants)
rule([V1,C1,C2,V2|Rest], Bs0, Bs, [C2,V2|Rest]) :-
vowel(V1), consonant(C1), consonant(C2), vowel(V2),
reverse([C1,V1|Bs0], Bs).
% alternative: no break at this position
rule([L|Ls], Bs0, Bs, Rest) :-
rule(Ls, [L|Bs0], Bs, Rest).
Exercise: Why am I writing [V2,C,V1|Bs0] instead of [V1,C,V2|Bs0] in the call of reverse/2?
Now it only remains to describe the list of resulting syllables. This is easy with DCG notation:
word_breaks([]) --> [].
word_breaks([L|Ls]) --> [Bs],
{ rule([L|Ls], [], Bs, Rest) },
word_breaks(Rest).
word_breaks([L|Ls]) --> [[L|Ls]].
Now the key point: since this program is completely pure and does not prematurely commit to a single solution, we can use it to show that there are also other admissible hyphenations:
?- phrase(word_breaks("calculator"), Hs).
Hs = [[c, a, l], [c, u, l, a], [t, o, r]] ;
Hs = [[c, a, l], [c, u, l, a, t, o], [r]] ;
Hs = [[c, a, l], [c, u, l, a, t, o, r]] ;
Hs = [[c, a, l, c, u, l, a], [t, o, r]] ;
Hs = [[c, a, l, c, u, l, a, t, o], [r]] ;
Hs = [[c, a, l, c, u, l, a, t, o, r]].
In Prolog, it is good practice to retain the generality of your code so that you can readily observe alternative solutions. See logical-purity.
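If you want atoms such as 'cal-cula-tor' back instead of lists of characters, a thin wrapper around word_breaks//1 is enough. The following is a sketch; hyphenate/2 is a hypothetical name, and the rest of the program is repeated (without the double_quotes flag, since the wrapper starts from an atom) so that the block loads on its own in SWI-Prolog:

```prolog
vowel(a). vowel(e). vowel(i). vowel(o). vowel(u).
consonant(C) :- maplist(dif(C), [a,e,i,o,u]).

% rule/4 and word_breaks//1 as in the answer above.
rule([V1,C,V2|Rest], Bs0, Bs, Rest) :-
    vowel(V1), consonant(C), vowel(V2),
    reverse([V2,C,V1|Bs0], Bs).
rule([V1,C1,C2,V2|Rest], Bs0, Bs, [C2,V2|Rest]) :-
    vowel(V1), consonant(C1), consonant(C2), vowel(V2),
    reverse([C1,V1|Bs0], Bs).
rule([L|Ls], Bs0, Bs, Rest) :-
    rule(Ls, [L|Bs0], Bs, Rest).

word_breaks([]) --> [].
word_breaks([L|Ls]) --> [Bs],
    { rule([L|Ls], [], Bs, Rest) },
    word_breaks(Rest).
word_breaks([L|Ls]) --> [[L|Ls]].

% hyphenate(+Word, -Hyphenated): hypothetical wrapper that turns each
% admissible list of syllables into one atom joined with '-'.
hyphenate(Word, Hyphenated) :-
    atom_chars(Word, Chars),
    phrase(word_breaks(Chars), Syllables),
    maplist(atom_chars, Parts, Syllables),   % [c,a,l] -> cal, ...
    atomic_list_concat(Parts, '-', Hyphenated).
```

On backtracking, ?- hyphenate(calculator, H). enumerates the same six alternatives as above, starting with H = 'cal-cula-tor'.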
I guess it's time for a DCG pushback solution. The pushback is used in the second rule of break//1. It reflects that we look at four characters but consume only two:
vowel(a). vowel(e). vowel(i). vowel(o). vowel(u).
consonant(C) :- \+ vowel(C).
break([V1,C,V2]) -->
[V1,C,V2],
{vowel(V1), consonant(C), vowel(V2)}.
break([V1,C1]), [C2,V2] -->
[V1,C1,C2,V2],
{vowel(V1), consonant(C1), consonant(C2), vowel(V2)}.
syllables([L|R]) --> break(L), !, syllables(R).
syllables([[C|L]|R]) --> [C], syllables([L|R]).
syllables([[]]) --> [].
So the overall solution doesn't need extra predicates such as append/3 or reverse/2. We have also placed a cut to prune the search, which can be done because the second rule of syllables//1 is a catch-all for any character.
Here are some example runs:
Jekejeke Prolog 2, Laufzeitbibliothek 1.1.6
(c) 1985-2016, XLOG Technologies GmbH, Schweiz
?- set_prolog_flag(double_quotes, chars).
Ja
?- phrase(syllables(R), "calculator").
R = [[c,a,l],[c,u,l,a],[t,o,r]] ;
Nein
?- phrase(syllables(R), "kitchensink").
R = [[k,i,t,c,h,e,n],[s,i,n,k]] ;
Nein
P.S.: In some older draft standards this DCG technique was called "right-hand-context", and instead of the verb "push back", the verb "prefixing" was used. In a newer draft standard this is called "semicontext", and instead of the verb "push back", the verb "restoring" is used.
https://www.complang.tuwien.ac.at/ulrich/iso-prolog/dcgs/dcgsdraft-2015-11-10.pdf
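To see what the pushback does in isolation, you can call break//1 through phrase/3 and inspect the remaining input. A sketch for SWI-Prolog; word_syllables/2 is a hypothetical helper so that the program can be run on atoms regardless of the double_quotes flag:

```prolog
vowel(a). vowel(e). vowel(i). vowel(o). vowel(u).
consonant(C) :- \+ vowel(C).

break([V1,C,V2]) -->
    [V1,C,V2],
    {vowel(V1), consonant(C), vowel(V2)}.
% Pushback: read four characters, but put C2 and V2 back into the input.
break([V1,C1]), [C2,V2] -->
    [V1,C1,C2,V2],
    {vowel(V1), consonant(C1), consonant(C2), vowel(V2)}.

syllables([L|R]) --> break(L), !, syllables(R).
syllables([[C|L]|R]) --> [C], syllables([L|R]).
syllables([[]]) --> [].

% Hypothetical helper: run syllables//1 on the characters of an atom.
word_syllables(Word, Syllables) :-
    atom_chars(Word, Chars),
    phrase(syllables(Syllables), Chars).
```

For example, ?- phrase(break(S), [a,l,c,u], Rest). gives S = [a,l] and Rest = [c,u]: four characters were read, but c and u are restored for the next syllable.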
I think you could write it more simply. Here is my implementation:
syllable( Input, Final_Word):-
atom_chars( Input, Char_list),
(split(Char_list, Word)-> atom_chars( Final_Word, Word);
Final_Word=Input).
split([],[]).
split([X,Y,Z|T],[X,Y,Z,'-'|T1]):-
vowel(X),\+vowel(Y),vowel(Z),
atom_chars( Input, T),
syllable(Input,T2),
atom_chars( T2, T1).
split([X,Y,Z,W|T],[X,Y,'-',Z|T1]):-
vowel(X),\+vowel(Y),\+vowel(Z),vowel(W),
atom_chars( Input, [W|T]),
syllable(Input,T2),
atom_chars( T2, T1).
split([X|T],[X|T1]):- \+vowel(X),split(T,T1).
split/2 splits the word by inserting '-' wherever your rules allow it, and returns the resulting list of characters to syllable/2, where atom_chars/2 turns the list back into a word. If the word cannot be split, the output is the input.
Example:
?- syllable(calculator,L).
L = 'calcu-lato-r'.
I don't understand why you wrote 'calculator = cal-cula-tor', since it doesn't follow the rules you stated: "cal" is not vowel-consonant-vowel but consonant-vowel-consonant, and the same goes for the rest of the word...
I am trying to append a list and a word together, and if the user types a specific word I want to add a certain letter to the list.
For example, I want to make the words entered in a list change based on the pronoun.
?- append([t,a,l,k], she, X).
X = [t, a, l, k, s].
So if the user enters [t, a, l, k] and she, Prolog should add 's' to the end of the list.
The code I have so far only appends the two entered values; it does not depend on which word the user enters.
append( [], X, X).
append( [A | B], C, [A | D]) :- append( B, C, D).
result:
?- append([t,a,l,k], she, X).
X = [t, a, l, k|she].
How can I make it so that if they type she, Prolog adds 's' to the list instead of 'she'?
Thank you.
You have to decompose the atom she into individual characters first and append only its first character.
It is also best to name the predicate my_append/3, because append/3 already exists.
my_append( [], W, [F]) :- atom_chars(W,[F|_]).
my_append( [A | B], W, [A | D]) :- my_append(B, W, D).
:- begin_tests(shemanator).
test("append 'she'", true(X == [t, a, l, k, s])) :-
my_append([t,a,l,k], she, X).
test("append 'she' to an empty list", true(X == [s])) :-
my_append([], she, X).
test("append 's'", true(X == [t, a, l, k, s])) :-
my_append([t,a,l,k], s, X).
:- end_tests(shemanator).
And so
?- run_tests.
% PL-Unit: shemanator ... done
% All 3 tests passed
true.
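Alternatively, if the suffix should really depend on which pronoun is entered (as in the question), a lookup table keeps that decision explicit. A minimal sketch; pronoun_suffix/2 and conjugate/3 are hypothetical names, and the table covers only a few English pronouns:

```prolog
% Hypothetical table: the letters each pronoun adds to the verb.
pronoun_suffix(he,   [s]).
pronoun_suffix(she,  [s]).
pronoun_suffix(it,   [s]).
pronoun_suffix(i,    []).
pronoun_suffix(you,  []).
pronoun_suffix(we,   []).
pronoun_suffix(they, []).

% conjugate(+Verb, +Pronoun, -Result): Verb is a list of characters;
% the suffix for Pronoun is appended with the library append/3.
conjugate(Verb, Pronoun, Result) :-
    pronoun_suffix(Pronoun, Suffix),
    append(Verb, Suffix, Result).
```

?- conjugate([t,a,l,k], she, X). gives X = [t,a,l,k,s], while conjugate([t,a,l,k], they, X) leaves the verb unchanged. Pronouns missing from the table simply make conjugate/3 fail.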
Given the letters [a, b, c], generate the list containing all the words of length N formed out of these letters.
For example:
?- generate(2, L).
should output:
L = [aa, ab, ac, ba, bb, bc, ca, cb, cc].
At first, this seemed like a pretty simple problem, but I've discovered that none of my implementations work.
This is the second implementation, the one that kind of works.
letter(X) :- member(X, [a, b, c]).
generateWord(0, []) :- !.
generateWord(N, [H|T]) :-
letter(H),
NextN is N - 1,
generateWord(NextN, T).
generateAtomicWord(N, Word) :-
generateWord(N, WList),
atomic_list_concat(WList, Word).
maxSolutions(N, R) :- R is 3 ** N.   % 3 letters, N positions: 3^N words
generate(N, CurrentList, ResultList) :-
maxSolutions(N, R),
length(CurrentList, L),
L =:= R,
append(CurrentList, [], ResultList), !.
generate(N, CurrentList, ResultList) :-
generateAtomicWord(N, NewWord),
\+ member(NewWord, CurrentList),
append(CurrentList, [NewWord], NewList),
generate(N, NewList, ResultList).
generate(N, ResultList) :-
generate(N, [], ResultList).
It kind of works because when given N = 3 the program outputs:
L = [aaa, aab, aac, aba, abb, abc, aca, acb, acc|...]
My first implementation is different, but I can't make it work in any case.
letter(X) :- member(X, [a, b, c]).
generateWord(0, []) :- !.
generateWord(N, [H|T]) :-
letter(H),
NextN is N - 1,
generateWord(NextN, T), !.
generateAtomicWord(N, Word) :-
generateWord(N, WList),
atomic_list_concat(WList, Word).
maxSolutions(N, R) :- R is 3 ** N.   % 3 letters, N positions: 3^N words
generate(N, [H]) :- generateAtomicWord(N, H).
generate(N, [H|T]) :-
generate(N, T),
length(T, TailLen),
maxSolutions(N, M),
(TailLen =:= M -> !;
generateAtomicWord(N, H),
\+ member(H, T)).
This one just outputs:
L = [aa]
and when asked for more solutions, it loops.
The problem must be solved without using predicates such as:
findall, findnsol, bagof, setof, etc...
that find all the solutions.
I've added the tag backtracking because it does resemble a backtracking problem, but I've no idea what a standard implementation might look like in Prolog.
It kind of works because when given N = 3 the program outputs:
L = [aaa, aab, aac, aba, abb, abc, aca, acb, acc|...]
That is not an error; the Prolog interpreter simply displays long lists in an abbreviated way. If you press w when it shows the answer, it will print the full list. For more information see this answer.
That being said, you are making it too hard. You can first write a predicate that, on backtracking, unifies a variable with every possible word (as a list of letters):
letter(X) :- member(X, [a, b, c]).
word(0, []).
word(N, [C|W]) :-
N > 0,
N1 is N-1,
letter(C),
word(N1, W).
Now we can collect all possibilities with findall/3, and use, for example, maplist/3 with atomic_list_concat/2 to convert each list to a single atom:
words(N, L) :-
findall(W, word(N, W), Ws),
maplist(atomic_list_concat, Ws, L).
For example:
?- words(0, L).
L = [''].
?- words(1, L).
L = [a, b, c].
?- words(2, L).
L = [aa, ab, ac, ba, bb, bc, ca, cb, cc].
?- words(3, L).
L = [aaa, aab, aac, aba, abb, abc, aca, acb, acc|...].
We can generate a list of lists ourselves by updating a "difference" list until all possible words are generated:
wordlist(N, L) :-
wordlist(N, [], L, []).
wordlist(0, R, [W|T], T) :-
reverse(R, W),
!.
wordlist(N, C, L, T) :-
N > 0,
N1 is N-1,
wordfold([a,b,c], N1, C, L, T).
wordfold([], _, _, L, L).
wordfold([C|CS], N1, CT, L, T) :-
wordlist(N1, [C|CT], L, L2),
wordfold(CS, N1, CT, L2, T).
For example:
?- wordlist(0, L).
L = [[]].
?- wordlist(1, L).
L = [[a], [b], [c]].
?- wordlist(2, L).
L = [[a, a], [a, b], [a, c], [b, a], [b, b], [b, c], [c, a], [c|...], [...|...]].
You then still need to perform atomic_list_concat on it. I leave that as an exercise.
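One way to do that exercise, which also gives the generate/2 asked for in the question without findall/3 and friends, is to map atomic_list_concat/2 over the result of wordlist/2. A sketch, with wordlist/2 repeated so the block is self-contained:

```prolog
% wordlist/2 from the answer above, unchanged.
wordlist(N, L) :-
    wordlist(N, [], L, []).

wordlist(0, R, [W|T], T) :-
    reverse(R, W),
    !.
wordlist(N, C, L, T) :-
    N > 0,
    N1 is N-1,
    wordfold([a,b,c], N1, C, L, T).

wordfold([], _, _, L, L).
wordfold([C|CS], N1, CT, L, T) :-
    wordlist(N1, [C|CT], L, L2),
    wordfold(CS, N1, CT, L2, T).

% generate/2 as specified in the question: each list of letters
% is turned into an atom, e.g. [a,b] -> ab.
generate(N, Words) :-
    wordlist(N, Lists),
    maplist(atomic_list_concat, Lists, Words).
```

?- generate(2, L). gives L = [aa, ab, ac, ba, bb, bc, ca, cb, cc].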
How can I write a program in Prolog that breaks a word into syllables using two rules: a syllable is either vowel-consonant-vowel or vowel-consonant-consonant-vowel? For example, abandon = aba-ndon.
This program will basically apply the rules you mention, but I don't think it will make a good tool for word processing.
vowel(a).
vowel(e).
vowel(i).
vowel(o).
vowel(u).
consonant(L):- not(vowel(L)).
syllable(W, S, RW):-
atom_chars(W, [V1, C, V2|Tail]),
vowel(V1), consonant(C), vowel(V2),
!,
atomic_list_concat([V1, C, V2], S),
atomic_list_concat(Tail, RW).
syllable(W, S, RW):-
atom_chars(W, [V1, C, C2, V2|Tail]),
vowel(V1), consonant(C), consonant(C2), vowel(V2),
!,
atomic_list_concat([V1, C, C2, V2], S),
atomic_list_concat(Tail, RW).
syllable(W, W, _).
break(W, B):- syllable(W, B, ''), !.
break(W, B):- syllable(W, S, RW), break(RW, B2), atomic_list_concat([S, '-', B2], B).
The program defines what a vowel is and what is not a vowel, what a syllable is according to your rules, and how to break a word. You can test it with the predicate break/2:
?- break(abaebbi, B).
B = 'aba-ebbi'
What makes me doubt your rules, besides my poor English, is that testing with any word of my answer always returns the entire word :)
?-break('syllable', B).
B = syllable
There's a publicly available list of words split into "syllables" (I'm not sure exactly what the criteria are) here. Each line is a word, so you could read the words in one at a time, split them into syllables, and store them in a dynamic predicate. Suppose the file is called mhyph.txt, as it is at the link above:
go :-
%% I don't know why 65533 is the code for the separator, but it is.
string_codes(Sep, [65533]),
setup_call_cleanup(
open('path/to/mhyph.txt', read, St),   % replace with the actual path to mhyph.txt
read_sylls(St, Sep),
close(St)
).
:- dynamic word_sylls/2.
read_sylls(Stream, Sep) :-
read_line_to_string(Stream, S),
(S == end_of_file -> true
;
split_string(S, Sep, Sep, Parts),
atomics_to_string(Parts, Word),
asserta(word_sylls(Word, Parts)),
read_sylls(Stream, Sep)
).
If you load this into your SWI Prolog interpreter, you can then do something like this:
?- go.
true.
?- word_sylls(A,B).
A = "Zurich",
B = ["Zu", "rich"] ;
A = "Zollner",
B = ["Zoll", "ner"] ;
A = "zymurgy",
B = ["zy", "mur", "gy"] ;
A = "zymosis",
B = ["zy", "mo", "sis"] ;
A = "zymoplastic",
B = ["zy", "mo", "plas", "tic"] ;
A = "zymolytic",
B = ["zy", "mo", "lyt", "ic"] ;
A = "zymologic",
B = ["zy", "mo", "log", "ic"]
?- word_sylls("abandon", Sylls).
Sylls = ["a", "ban", "don"].
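The per-line step of read_sylls/2 can be tried without the file. line_sylls/3 below is a hypothetical helper that makes the same split_string/4 and atomics_to_string/2 calls on a single line. As for the separator: 65533 is U+FFFD, the Unicode replacement character, which decoders typically substitute for bytes they cannot decode. That it stands in for a separator byte in mhyph.txt that is not valid in the encoding used to read the file is an assumption, not something checked here.

```prolog
% line_sylls(+Line, -Word, -Parts): hypothetical helper doing the
% per-line work of read_sylls/2 for one line of mhyph.txt.
line_sylls(Line, Word, Parts) :-
    string_codes(Sep, [65533]),          % U+FFFD, the separator seen above
    split_string(Line, Sep, Sep, Parts), % "a<sep>ban<sep>don" -> ["a","ban","don"]
    atomics_to_string(Parts, Word).      % glue the parts back together
```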
I implemented a predicate that checks whether a list is a sublist of another list, for example:
sublist([1,2,4], [1,2,3,4,5,1,2,4,6]).
true
sublist([1,2,4], [1,2,3,4,5,1,2,6]).
false
Here is my solution:
my_equals([], _).
my_equals([H1|T1], [H1|T2]) :- my_equals(T1, T2).
sublist([], _).
sublist(L1, [H2|T2]) :- my_equals(L1, [H2|T2]); sublist(L1, T2).
Could you give me another solution? Maybe there exists some predefined predicate like my_equals?
You can unify a sublist using append/3, like this:
sublist(SubList, List):-
append(_, Tail, List),
append(SubList, _, Tail).
The first call to append/3 splits List into two parts (i.e. it dismisses some "leading" items from List).
The second call to append/3 checks whether SubList is a prefix of Tail.
As #false suggests, it is better, at least for ground terms, to exchange the goals:
sublist(SubList, List):-
append(SubList, _, Tail),
append(_, Tail, List).
There's also a DCG approach to the problem:
substr(Sub) --> seq(_), seq(Sub), seq(_).
seq([]) --> [].
seq([Next|Rest]) --> [Next], seq(Rest).
Which you would call with:
phrase(substr([1,2,4]), [1,2,3,4,5,1,2,4,6]).
You can define:
sublist(Sub, List) :-
phrase(substr(Sub), List).
So you could call it as sublist([1,2,4], [1,2,3,4,5,1,2,4,6]).
Per #mat's suggestion:
substr(Sub) --> ..., seq(Sub), ... .
... --> [] | [_], ... .
Yes, you can have a predicate named '...'. :)
Per suggestions from #repeat and #false, I changed the name from subseq (subsequence) to substr (substring) since the meaning of "subsequence" embraces non-contiguous sequences.
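For reference, here is the whole DCG version with the '...'//0 nonterminal in one piece. It is a sketch for SWI-Prolog, which accepts | as an alternative inside DCG bodies:

```prolog
% '...' describes an arbitrary (possibly empty) sequence of elements.
... --> [] | [_], ... .

% seq(Xs) describes exactly the elements of Xs, in order.
seq([]) --> [].
seq([Next|Rest]) --> [Next], seq(Rest).

% substr(Sub) describes any list that contains Sub contiguously.
substr(Sub) --> ..., seq(Sub), ... .

sublist(Sub, List) :-
    phrase(substr(Sub), List).
```

With this, sublist([1,2,4], [1,2,3,4,5,1,2,4,6]) succeeds and sublist([1,2,4], [1,2,3,4,5,1,2,6]) fails, as in the question.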
This is an alternative to lurker's solution which is slightly faster, assuming S is much shorter than L, so that the DCG translation time of phrase/3 is negligible:
sublist(S, L) :-
phrase((..., S), L, _).
If S = [X1,...,Xn], phrase/3 translates it into the match I = [X1,...,Xn|O] before execution, thus delegating the work of my_equals/2 completely to Prolog unification. Here is an example run:
?- phrase((..., [a,b]), [a,c,a,b,a,c,a,b,a,c], X).
X = [a, c, a, b, a, c] ;
X = [a, c] ;
false.
Bye
P.S.: This also works for patterns S other than plain lists of terminals.
Maybe there exists some predefined predicate
If your Prolog has append/2 from library(lists):
sublist(S, L) :- append([_,S,_], L).
Another fairly compact definition, available in every (I guess) Prolog out there:
sublist(S, L) :- append(S, _, L).
sublist(S, [_|L]) :- sublist(S, L).
The solution in the original question is valid; just note, as has been said, that my_equals can be replaced by append/3, and the sublist loop by another append/3 that provides slices of the original list.
However, Prolog is (or was) about artificial intelligence. Any person can immediately answer "no" to this example:
sublist([1,1,1,2], [1,1,1,1,1,1,1,1,1,1] ).
because a person, by simply looking at the list, infers some of its characteristics, such as the fact that there is no "2" in it.
The proposals above, however, are really inefficient in this case. For example, in DNA analysis, where long sequences of only four elements are studied, this kind of algorithm is not applicable.
Some easy changes can be made with the aim of looking first for the strongest condition. For example:
/* common( X, Y, C, QX, QY ) => X=C+QX, Y=C+QY */
common( [H|S2], [H|L2], [H|C2], DS, DL ) :- !,
common( S2, L2, C2, DS, DL ).
common( S, L, [], S, L ).
sublist( S, L ) :-
sublist( [], S, L ).
sublist( P, Q, L ) :- /* S=P+Q */
writeln( Q ),
length( P, N ),
length( PD, N ), /* PD is P with all unbound */
append( PD, T, L ), /* L=PD+T */
common( Q, T, C, Q2, _DL ), /* S=P+C+Q2; L=PD+C+_DL */
analysis( L, P, PD, C, Q2 ).
analysis( _L, P, P, _C, [] ) :- !. /* found sublist */
analysis( [_|L2], P, _PD, C, [] ) :- !,
sublist( P, C, L2 ).
analysis( [_|L2], P, _PD, C, Q2 ) :-
append( P, C, P2 ),
sublist( P2, Q2, L2 ).
Let us try it:
?- sublist([1,1,1,2], [1,1,1,1,1,1,1,1,1,1]).
[1,1,1,2]
[2]
[2]
[2]
[2]
[2]
[2]
[2]
[2]
false.
See how analysis/5 has decided that it is better to look for the "2".
Obviously, this is a strongly simplified solution; in a real situation a better analysis can be done, and the patterns to look for must be more flexible (the proposal is restricted to patterns at the tail of the original pattern S).
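A much cruder way to look first for the strongest condition (a different technique from the analysis/5 code above, and only a sketch) is to reject the query before searching if some element of the pattern does not occur in the list at all. sublist_checked/2 is a hypothetical name, and both arguments are assumed to be ground:

```prolog
% sublist_checked(+Sub, +List): hypothetical variant that first tests a
% cheap necessary condition (every element of Sub occurs somewhere in
% List) and only then searches for a contiguous match with append/3.
% Assumes Sub and List are ground.
sublist_checked(Sub, List) :-
    forall(member(X, Sub), memberchk(X, List)),
    append(_, Tail, List),
    append(Sub, _, Tail).
```

This rejects the [1,1,1,2] example without searching, but it does not help when every element of the pattern occurs somewhere in the list, e.g. [1,2,1,2] against a long list of 1s containing a single 2.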
I need to read a word from the user and then split it into syllables based on one of two rules: vowel-consonant-vowel, or vowel-consonant-consonant-vowel.
It looks like the predicate name/2 does not work: the output should be a word, not a list.
Could you please help?
vowel(a).
vowel(e).
vowel(i).
vowel(o).
vowel(u).
vowel(y).
consonant(L) :- not(vowel(L)).
ssplit(A,B) :- atom_chars(A,K),ssplit(K,B,-1). %convert atom to list
test(A,B) :- append(A,[],F), name(N,F).
ssplit([],[],0) :- append(L,[],F), name(N,F), writeln(N).
ssplit([H1|T1],[H1|T2],-1) :- ssplit(T1,T2,0).
ssplit([H1|T1],[H1|T2],0) :- consonant(H1), ssplit(T1,T2,0). %split to syllables
ssplit([H1|T1],[H1|T2],0) :- vowel(H1), ssplit(T1,T2,1).
ssplit([H1|T1],[H1|T2],1) :- vowel(H1), ssplit(T1,T2,1). %split to syllables
ssplit([H1|[]],[H1|T2],1) :- consonant(H1), ssplit([],T2,0).
ssplit([H1,H2|[]],[H1,H2|T2],1) :- consonant(H1), vowel(H2), ssplit([],T2,1).
ssplit([H1,H2|T1],['-',H1,H2|T2],1) :- consonant(H1), vowel(H2), ssplit(T1,T2,1).
ssplit([H1,H2|T1],T2,1) :- consonant(H1), consonant(H2), ssplit([H1,H2|T1],T2,2).
ssplit([H1,H2|[]],[H1,H2|T2],2) :- ssplit([],T2,0). %split to syllables
ssplit([H1,H2,H3|[]],[H1,H2,H3|T2],2) :- vowel(H3), ssplit([],T2,1).
ssplit([H1,H2,H3|T1],[H1,'-',H2,H3|T2],2) :- vowel(H3), ssplit(T1,T2,1).
ssplit([H1,H2,H3|T1],[H1,H2,H3|T2],2) :- consonant(H3), ssplit(T1,T2,0).
/*
ssplit(analog,L).
ssplit(ruler,L).
ssplit(prolog,L).
*/
DCGs are more practical for handling input:
split_name(N, L) :-
atom_codes(N, Cs),
phrase(split_v(L), Cs, []).
split_v([]) --> [].
split_v([S|Syllables]) -->
vowel(X),
consonant(Y),
vowel(Z),
{atom_codes(S, [X,Y,Z])},
split_v(Syllables).
split_v([S|Syllables]) -->
vowel(V1),
consonant(C1),
consonant(C2),
vowel(V2),
{atom_codes(S, [V1,C1,C2,V2])},
split_v(Syllables).
% catch all unhandled
split_v([S|Syllables]) -->
[C], {atom_codes(S, [C])},
split_v(Syllables).
vowel(C) --> [C], {vowel(C)}.
consonant(C) --> [C], {\+vowel(C)}.
vowel(C) :- atom_codes(aeiou, Vs), memberchk(C, Vs).   % C is a character code; works regardless of the double_quotes flag
test:
?- split_name(stackoverflow,L).
L = [s, t, acko, v, e, r, f, l, o|...] ;
L = [s, t, a, c, k, ove, r, f, l|...] ;
L = [s, t, a, c, k, o, v, e, r|...] ;
false.