SWI-Prolog tokenize_atom/2 replacement? - prolog

I need to break an atom into tokens. For example,
tokenize_string('Hello, World!', L).
should unify L = ['Hello',',','World','!'], exactly as tokenize_atom/2 does. But when I use tokenize_atom/2 with non-Latin letters, it fails. Is there a universal replacement, or how can I write one? Thanks in advance.

Well, you could write your own lexer. For example I can show you a lexer from my arithmetic expressions parser.
:- use_module(library(dcg/basics)). % formerly library(http/dcg_basics)
%
% lexer
%
lex([H | T]) -->
lexem_t(H), !,
lex(T).
lex([]) -->
[].
lexem_t(L) --> trashes, lexem(L), trashes.
trashes --> trash, !, trashes.
trashes --> [].
trash --> comment_marker(End), !, string(_), End.
trash --> white.
comment_marker("*)") --> "(*".
comment_marker("*/") --> "/*".
hex_start --> "0X".
hex_start --> "0x".
lexem(open) --> "(".
lexem(close) --> ")".
lexem(+) --> "+".
lexem(-) --> "-".
lexem(*) --> "*".
lexem(/) --> "/".
lexem(^) --> "^".
lexem(',') --> ",".
lexem(!) --> "!".
lexem(N) --> hex_start, !, xinteger(N). % this handles hex numbers
lexem(N) --> number(N). % this handles integers/floats
lexem(var(A)) --> identifier_c(L), {string_to_atom(L, A)}.
identifier_c([H | T]) --> alpha(H), !, many_alnum(T).
alpha(H) --> [H], {code_type(H, alpha)}.
alnum(H) --> [H], {code_type(H, alnum)}.
many_alnum([H | T]) --> alnum(H), !, many_alnum(T).
many_alnum([]) --> [].
How it works:
?- phrase(lex(L), "abc 123 привет 123.4e5 !+- 0xabc,,,"), write(L).
[var(abc), 123, var(привет), 1.234e+007, !, +, -, 2748, (,), (,), (,)]
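For the behaviour asked for in the question (words and single punctuation characters as atoms), a much smaller DCG may be enough. The following is a minimal sketch, not a drop-in replacement for tokenize_atom/2: tokenize_string/2 and the helper nonterminals are hypothetical names, numbers are returned as atoms rather than converted, and it assumes that code_type/2 classifies non-Latin letters as alnum on your platform.

```prolog
% tokenize_string(+Atom, -Tokens): hypothetical helper, not a library predicate.
tokenize_string(Atom, Tokens) :-
    atom_codes(Atom, Codes),
    phrase(tokens(Tokens), Codes).

% Skip leading white space, then read tokens until the input is exhausted.
tokens(Ts) --> skip_spaces, tokens_(Ts).

tokens_([T|Ts]) --> token(T), !, skip_spaces, tokens_(Ts).
tokens_([]) --> [].

skip_spaces --> [C], { code_type(C, space) }, !, skip_spaces.
skip_spaces --> [].

% A word is a maximal, non-empty run of alphanumeric codes.
token(T) --> word_codes(Cs), { Cs = [_|_], atom_codes(T, Cs) }.
% Any other non-space code becomes a one-character token.
token(T) --> [C], { \+ code_type(C, space), atom_codes(T, [C]) }.

word_codes([C|Cs]) --> [C], { code_type(C, alnum) }, !, word_codes(Cs).
word_codes([]) --> [].
```

With these definitions, the query tokenize_string('Hello, World!', L) gives L = ['Hello', ',', 'World', '!'].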

Related

Termination of Prolog query using DCGs

Given the program
foo([]) --> [].
foo([Start|Rest]) --> alphanum(Start), foo(Rest).
alphanum(Ch) --> [Ch], { char_type(Ch, alnum) }.
How can I make the query length(I, 2), phrase(foo(C), I), false. terminate?
I am using SWI-Prolog version 8.4.3 for x86_64-linux
The non-termination seems to be originating from the last dcg rule. With the following program (not what I want), the query terminates.
foo([]) --> [].
foo([Start|Rest]) --> alphanum(Start), foo(Rest).
alphanum(Ch) --> [Ch].
I don't mind any other formulation of the program that achieves the same results.
It will terminate, but there are a lot of Unicode characters to loop through.
You probably want this instead (note that it uses codes, which are usually preferable to chars):
foo([]) --> [].
foo([Start|Rest]) --> alnum(Start), foo(Rest).
alnum(Digit) --> [Digit], { between(0'0, 0'9, Digit) }.
alnum(Lower) --> [Lower], { between(0'a, 0'z, Lower) }.
alnum(Upper) --> [Upper], { between(0'A, 0'Z, Upper) }.
Result in swi-prolog:
?- length(I, 2), phrase(foo(C), I), writeln(I), false.
...
[90,88]
[90,89]
[90,90]
false.
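To see why the restricted version finishes quickly, it can help to count the solutions instead of printing them. The sketch below repeats the grammar from this answer and adds count_solutions/2, a hypothetical helper built on aggregate_all/3. With 10 digits, 26 lower-case and 26 upper-case letters there are 62 choices per position.

```prolog
foo([]) --> [].
foo([Start|Rest]) --> alnum(Start), foo(Rest).

alnum(Digit) --> [Digit], { between(0'0, 0'9, Digit) }.
alnum(Lower) --> [Lower], { between(0'a, 0'z, Lower) }.
alnum(Upper) --> [Upper], { between(0'A, 0'Z, Upper) }.

% count_solutions(+Len, -Count): hypothetical helper that counts all
% code lists of length Len accepted by foo//1.
count_solutions(Len, Count) :-
    length(I, Len),
    aggregate_all(count, phrase(foo(_), I), Count).
```

For lists of length 2, count_solutions(2, N) yields N = 3844, that is 62 * 62. The char_type/2 version has to enumerate every alphanumeric Unicode character per position, which is why it appears not to terminate.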

Translation to DCG Semicontext not working - follow on

As a follow up to this question which poses the problem
Return count of items in a list but if two identical items are next to each other then don't increment the count.
This code is the closest I came to solving this with DCG and semicontext.
lookahead(C),[C] -->
[C].
% empty list
% No lookahead needed because last item in list.
count_dcg(N,N) --> [].
% single item in list
% No lookahead needed because only one in list.
count_dcg(N0,N) -->
[_],
\+ [_],
{ N is N0 + 1 }.
% Lookahead needed because two items in list and
% only want to remove first item.
count_dcg(N0,N) -->
[C1],
lookahead(C2),
{ C1 == C2 },
count_dcg(N0,N).
% Lookahead needed because two items in list and
% only want to remove first item.
count_dcg(N0,N) -->
[C1],
lookahead(C2),
{
C1 \== C2,
N1 is N0 + 1
},
count_dcg(N1,N).
count(L,N) :-
DCG = count_dcg(0,N),
phrase(DCG,L).
What is the correct way to solve the problem using DCG with semicontext on the clause head?
Would like to know if the variation with the semicontext on the clause head is possible or not. If possible then working example code is desired, if not possible then an explanation is desired.
I think this is using semi context notation correctly. I am counting using 0,s(0),...
% Constraint Logic Programming
:- use_module(library(dif)). % Sound inequality
:- use_module(library(clpfd)). % Finite domain constraints
list([]) --> [].
list([L|Ls]) --> [L], list(Ls).
state(S), [state(S)] --> [state(S)].
state(S, s(S)), [state(s(S))] --> [state(S)].
keep_state(S,I),[state(S)] --> [state(S)],[I].
end_state(S) -->[state(S)],[].
lookahead(C),[S,C] -->
[S,C].
count_dcg(S,S) -->
state(S), %might not need this
end_state(S).
/* Can be used get the length of a list
count_dcg(S,S2) -->
state(S,S1),
keep_state(S1,_),
count_dcg(S1,S2),
{}.
*/
%last item.
count_dcg(S,S1) -->
state(S,S1),
keep_state(S1,_C),
list(R),
{R = [state(_)]}.
%Two the same dont increase state
count_dcg(S,S1) -->
state(S), %might not need this
keep_state(S,C1),
lookahead(C1),
count_dcg(S,S1).
%Two different increase state
count_dcg(S,S2) -->
state(S,S1),
keep_state(S1,C1),
lookahead(C2),
{
dif(C1,C2)
},
count_dcg(S1,S2).
count(L,S) :-
phrase(count_dcg(0,S),[state(0)|L]).
This does not work as well as I hoped for cases like:
65 ?- count([a,b,X,c],L).
X = b,
L = s(s(s(0))) ;
;
X = c,
L = s(s(s(0))) .
You can convert the Peano numbers to integers with:
natsx_int(0, 0).
natsx_int(s(N), I1) :-
I1 #> 0,
I2 #= I1 - 1,
natsx_int(N, I2).
or you can change the state predicates:
state(S), [state(S)] --> [state(S)].
state(S, S2), [state(S2)] --> [state(S)],{S2#=S+1}.
How about:
:-use_module(library(clpfd)).
list([]) --> [].
list([L|Ls]) --> [L], list(Ls).
lookahead(C),[C] -->
[C].
count_dcg(N,N) --> [].
count_dcg(N0,N) --> %last item.
[_],
list(R),
{R = [], N #=N0+1}.
count_dcg(N0,N) -->
[C1],
lookahead(C1),
count_dcg(N0,N).
count_dcg(N0,N) -->
[C1],
lookahead(C2),
{
dif(C1,C2),
N1 #= N0 + 1
},
count_dcg(N1,N).
count(L,N) :-
phrase(count_dcg(0,N),L).
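The lookahead//1 rule used in these answers relies on semicontext (pushback): the rule consumes one element and then puts it back in front of the remaining input. A minimal sketch of that behaviour in isolation, where peek_demo/3 is a hypothetical helper name:

```prolog
% Consume one element C, then push it back onto the remaining input.
lookahead(C), [C] --> [C].

% peek_demo(+List, -First, -Rest): hypothetical helper that shows what is
% left of List after lookahead//1 has run.
peek_demo(List, First, Rest) :-
    phrase(lookahead(First), List, Rest).
```

Here peek_demo([a,b], X, Rest) gives X = a and Rest = [a,b], so nothing is consumed, while peek_demo([], X, Rest) fails because there is no element to look at.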

Using list in Prolog DCG

I am trying to convert a Prolog predicate into DCG code. Even though I am familiar with grammar languages, I have some trouble understanding how DCGs work with lists and how I am supposed to use them.
This is my predicate:
cleanList([], []).
cleanList([H|L], [H|LL]) :-
number(H),
cleanList(L, LL),
!.
cleanList([_|L], LL) :-
cleanList(L, LL).
It is a simple predicate that removes non-numeric elements.
I would like to have the same behaviour written as a DCG.
I tried something like this (which obviously does not work):
cleanList([]) --> [].
cleanList([H]) --> {number(H)}.
cleanList([H|T]) --> [H|T], {number(H)}, cleanList(T).
Could you explain what is wrong or what is missing?
Thank you!
The purpose of DCG notation is exactly to hide, or better, make implicit, the tokens list. So, your code should look like
cleanList([]) --> [].
cleanList([H|T]) --> [H], {number(H)}, cleanList(T).
cleanList(L) --> [H], {\+number(H)}, cleanList(L).
that can be made more efficient:
cleanList([]) --> [].
cleanList([H|T]) --> [H], {number(H)}, !, cleanList(T).
cleanList(L) --> [_], cleanList(L).
A style note: Prologgers prefer to avoid camelCase :)
clean_list([]) --> [].
etc...
Also, I would prefer more compact code:
clean_list([]) --> [].
clean_list(R) --> [H], {number(H) -> R = [H|T] ; R = T}, clean_list(T).
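Because the DCG carries the input list implicitly, the grammar is called through phrase/2 rather than with two list arguments. A short usage sketch of the compact version, where clean/2 is a hypothetical wrapper name:

```prolog
clean_list([]) --> [].
clean_list(R) --> [H], { number(H) -> R = [H|T] ; R = T }, clean_list(T).

% clean(+List, -Numbers): hypothetical wrapper that keeps only the numbers.
clean(List, Numbers) :-
    phrase(clean_list(Numbers), List).
```

For example, clean([1,a,2,b,3.5], L) gives L = [1,2,3.5].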

DCG doubling a count

I am playing around with DCGs and I have this code. This displays x number of 0s and x numbers of As.
y --> test(Count), as(Count).
test(0) --> [].
test(succ(0)) --> [0].
test(succ(succ(Count))) --> [0], test(Count), [0].
as(0) --> [].
as(succ(Count)) --> [a],as(Count).
My question is: how do I pass a functor to make the number of As double the number of 0s? Here's what I tried, but it doesn't work.
y --> test(Count), as(add(Count,Count,R)).
If I only want to add one, this is what I did, and it works fine.
y --> test(Count), as(succ(Count)).
y --> test(Count), as(Count), as(Count).
or
y --> test(Count), {add(Count,Count,DCount)}, as(DCount).
Or you can double the succ for test
y --> test(Count), as(Count).
test(0) --> [].
test(succ(succ(Count))) --> [0], test(Count).
as(0) --> [].
as(succ(Count)) --> [a], as(Count).
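The second variant calls add/3, which is not defined anywhere in the question. Below is a minimal sketch of what it could look like for the 0 / succ(_) representation; this add/3 is an assumption, and test//1 is simplified here so that each 0 contributes exactly one succ.

```prolog
% add(?M, ?N, ?Sum) over 0 / succ(_) terms (assumed definition).
add(0, N, N).
add(succ(M), N, succ(K)) :- add(M, N, K).

% The number of a's is twice the number of 0s.
y --> test(Count), { add(Count, Count, DCount) }, as(DCount).

% Simplified counter: one succ per 0.
test(0) --> [].
test(succ(Count)) --> [0], test(Count).

as(0) --> [].
as(succ(Count)) --> [a], as(Count).
```

With these clauses, phrase(y, [0,0,a,a,a,a]) succeeds and phrase(y, [0,a]) fails. The call to add/3 sits inside {} because it is an ordinary goal, not a nonterminal; passing add(Count,Count,R) as an argument only builds a term and never evaluates it.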

Prolog calculator simply returns true

I'm writing a calculator in Prolog that reads natural language questions and returns a number answer for a class assignment, and I'm nearly complete. However, when I input a sentence the program simply returns 'Yes' and then quits. As far as I can tell it doesn't even read in the sentence. This is my first time writing in Prolog, so I have no clue what is wrong. Any help would be greatly appreciated.
My code:
:- consult('aux.p').
accumulator(0).
start :-
write('Cranky Calculator'), nl,
write('-----------------'), nl,
cvt.
cvt :-
write('What do ya want?'), nl,
read_sentence(Question),
butlast(Question, Questio),
Questio \== [quit], !,
(
phrase(sentence(Value), Questio, []),
write(Value);
write_string('Stop it with your gibberish!')
), nl,
cvt.
cvt.
reset(V) :-
retract(accumulator(_)),
assert(accumulator(V)).
accumulate('plus', N, Value) :-
{Temp is accumulator(_)},
{Value is Temp + N},
reset(Value).
accumulate('minus', N, Value) :-
{Temp is accumulator(_)},
{Value is Temp - N},
reset(Value).
accumulate('divided', N, Value) :-
{Temp is accumulator(_)},
{Value is Temp / N},
reset(Value).
accumulate('times', N, Value) :-
{Temp is accumulator(_)},
{Value is Temp * N},
reset(Value).
accumulate(N1, 'plus', N2, Value) :-
{Value is N1 + N2},
reset(Value).
accumulate(N1, 'minus', N2, Value) :-
{Value is N1 - N2},
reset(Value).
accumulate(N1, 'divided', N2, Value) :-
{Value is N1 / N2},
reset(Value).
accumulate(N1, 'times', N2, Value) :-
{Value is N1 * N2},
reset(Value).
%------------------base productions---------------------
% sentence can be to an entirely new question or simply be an addition
% to the previous one
sentence(V) --> base(V1), {V is V1}.
sentence(V) --> additional(V1), {V is V1}.
sentence --> [].
base(Value) -->
pro, be, number(N1), oper(OP), number(N2), qmark,
{
accumulate(N1, OP, N2, V), {Value is V}
}.
additional(Value) -->
oper(OP), number(N), qmark,
{
accumulate(OP, N, V), {Value is V}
}.
pro --> [what].
pro --> [how], [much].
be --> [is].
number(N) --> five_digit(N1), {N is N1}.
five_digit(N) --> ten_thousands(V1), four_digit(V2), {N is 1000 * V1 + V2}.
four_digit(N) --> thousands(V1), three_digit(V2), {N is 1000 * V1 + V2}.
three_digit(N) --> hundreds(V1), two_digit(V2), {N is 100 * V1 + V2}.
two_digit(N) --> tens(V1), one_digit(V2), {N is V1 + V2}.
two_digit(N) --> teens(V), {N is V}.
one_digit(N) --> digit(V), {N is V}.
one_digit(0) --> [].
ten_thousands(T) --> tens(V), thousand, {T is V}.
ten_thousands(T) --> tens(V), {T is V}.
ten_thousands(T) --> teens(V), thousand, {T is V}.
ten_thousands(0) --> [].
thousands(T) --> digit(V), thousand, {T is V}.
thousands(0) --> [].
hundreds(T) --> digit(V), hundred, {T is V}.
hundreds(0) --> [].
thousand --> [thousand].
hundred --> [hundred].
digit(1) --> [one].
digit(2) --> [two].
digit(3) --> [three].
digit(4) --> [four].
digit(5) --> [five].
digit(6) --> [six].
digit(7) --> [seven].
digit(8) --> [eight].
digit(9) --> [nine].
tens(20) --> [twenty].
tens(30) --> [thirty].
tens(40) --> [forty].
tens(50) --> [fifty].
tens(60) --> [sixty].
tens(70) --> [seventy].
tens(80) --> [eighty].
tens(90) --> [ninety].
teens(10) --> [ten].
teens(11) --> [eleven].
teens(12) --> [twelve].
teens(13) --> [thirteen].
teens(14) --> [fourteen].
teens(15) --> [fifteen].
teens(16) --> [sixteen].
teens(17) --> [seventeen].
teens(18) --> [eighteen].
teens(19) --> [nineteen].
oper(plus) --> [plus].
oper(plus) --> [and].
oper(minus) --> [minus].
oper(divided) --> ['divided by'].
oper(times) --> [times].
qmark --> ['?'].
The output I get looks like:
|: what is twelve plus two?
Yes
I took your code as a spec for a calculator that also gives the
result as text. The idea here is to combine DCG and CLP(FD).
CLP(FD) is constraint solving for finite domains. Finite domains
should be enough for your calculator. To enable CLP(FD) you have
first to load the appropriate library. In Jekejeke Minlog this
is done as follows:
:- ensure_loaded(library('clpfd.px')).
The code has first a section that can not only recognize numbers
but also generate text for numbers. This is mainly the part where
DCGs are combined with CLP(FD):
number(N) --> {N #= 1000 * V1 + 100 * V2 + V3}, thousands(V1),
hundreds(V2), two_digit_opt(V3).
thousands(N) --> two_digit(N), thousand.
thousands(0) --> [].
thousand --> [thousand].
hundreds(N) --> digit(N), hundred.
hundreds(0) --> [].
hundred --> [hundred].
two_digit_opt(N) --> two_digit(N).
two_digit_opt(0) --> [].
two_digit(N) --> {N #= V1*10 + V2}, tens(V1), digit_opt(V2).
two_digit(N) --> {N #= V+10}, teens(V).
two_digit(N) --> digit(N).
digit_opt(N) --> digit(N).
digit_opt(0) --> [].
digit(1) --> [one].
digit(2) --> [two].
digit(3) --> [three].
digit(4) --> [four].
digit(5) --> [five].
digit(6) --> [six].
digit(7) --> [seven].
digit(8) --> [eight].
digit(9) --> [nine].
tens(2) --> [twenty].
tens(3) --> [thirty].
tens(4) --> [forty].
tens(5) --> [fifty].
tens(6) --> [sixty].
tens(7) --> [seventy].
tens(8) --> [eighty].
tens(9) --> [ninety].
teens(0) --> [ten].
teens(1) --> [eleven].
teens(2) --> [twelve].
teens(3) --> [thirteen].
teens(4) --> [fourteen].
teens(5) --> [fifteen].
teens(6) --> [sixteen].
teens(7) --> [seventeen].
teens(8) --> [eighteen].
teens(9) --> [nineteen].
Here is a demonstration that the bidirectionality works:
?- phrase(number(X),[fifty,five]).
X = 55 ;
No
?- phrase(number(55),X).
X = [fifty,five] ;
No
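The same idea can be tried in other systems that ship a CLP(FD) library. The following is a reduced sketch for SWI-Prolog, assuming library(clpfd) is available; it only covers 20..99, and the explicit domains on T and D are an addition that is not part of the code above.

```prolog
:- use_module(library(clpfd)).

% Post the arithmetic relation first, so the grammar can be used both to
% parse words into a number and to generate words from a number.
two_digit(N) -->
    { N #= 10*T + D, T in 2..9, D in 0..9 },
    tens(T), digit_opt(D).

digit_opt(D) --> digit(D).
digit_opt(0) --> [].

digit(1) --> [one].    digit(2) --> [two].    digit(3) --> [three].
digit(4) --> [four].   digit(5) --> [five].   digit(6) --> [six].
digit(7) --> [seven].  digit(8) --> [eight].  digit(9) --> [nine].

tens(2) --> [twenty].  tens(3) --> [thirty].  tens(4) --> [forty].
tens(5) --> [fifty].   tens(6) --> [sixty].   tens(7) --> [seventy].
tens(8) --> [eighty].  tens(9) --> [ninety].
```

In this sketch phrase(two_digit(X), [fifty,five]) binds X to 55, and phrase(two_digit(55), Ws) binds Ws to [fifty,five].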
Adding the calculator was straightforward. I didn't use assert/retract;
I simply pass the state as an argument in an infinite loop. I don't know how healthy
this is for your Prolog system, especially since the loop now touches
the constraint store in between. At least in Jekejeke Minlog as of version 0.7.2
the constraint store is not yet completely recycled, so
one cannot run the loop indefinitely.
But to show how all the pieces can be put together, the loop solution
is fine. The code reads as follows:
loop(S) :-
write('> '),
flush_output,
read(L),
phrase(cmd(C),L),
do(C,S,T),
phrase(number(T),M),
write(M), nl,
!, loop(T).
loop(S) :-
write('?'), nl,
loop(S).
do(set(N),_,N).
do(add(N),S,T) :- T is S+N.
do(sub(N),S,T) :- T is S-N.
cmd(set(N)) --> factor(N).
cmd(add(N)) --> [plus], factor(N).
cmd(sub(N)) --> [minus], factor(N).
factor(M) --> number(N), more(N, M).
more(N, M) --> [times], number(H), {J is N*H}, more(J,M).
more(N, M) --> [divided, by], number(H), {J is N//H}, more(J,M).
more(N, N) --> [].
And here is an example execution:
?- loop(0).
> [eleven,times,eleven].
[one,hundred,twenty,one]
> [minus,sixty,six].
[fifty,five]
Here is a short how-to for the Jekejeke CLP(FD):
Jekejeke Minlog Desktop Installation
https://www.youtube.com/watch?v=6ZipaIrxSFQ
Jekejeke Minlog Android Installation
https://www.youtube.com/watch?v=Y2P7cEuOIws

Resources