Space-efficient reading of chars in canonical form - prolog

When using chars (lists of characters, that is, lists of atoms of length one) to represent text, we have the following options for writing them within terms:
"First," the double quoted list notation (6.3.7) is the most efficient one, requiring at least n+2 characters.
But it can only be read back if the Prolog flag double_quotes is set to chars.
['N',e,x,t,','] comes the list notation with at least 2n+1 characters. While it is nice and relatively compact, it implies that operators are also used when writing other data, since it is enabled with ignore_ops(false); this necessitates that the same operators be present when reading, which makes it quite brittle.
'.'('L','.'(a,'.'(s,'.'(t,'.'(',',[]))))) the canonical notation, which uses functional form also for lists, requiring at least 7n+2 characters. That is a lot, but for interoperability (and that includes interoperability with the same system) it is best, since it depends neither on the double_quotes flag nor on the various operator declarations.
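To make the counts concrete, here are the three renderings of the chars "abc" (n = 3):

```prolog
% double quoted:  "abc"                      5 characters  (n+2)
% list notation:  [a,b,c]                    7 characters  (2n+1)
% canonical:      '.'(a,'.'(b,'.'(c,[])))   23 characters  (7n+2)
```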
Writing chars in canonical notation can be done in constant space. But for reading, the situation is a bit more tricky. After all, a sequence starting with '.'(a, may also refer to a term '.'(a,Further,b). So a naive reading will have to wait (and use space) until the entire list of chars is read in. On the other hand, it seems to be a safe bet that '.'(a, will be a list constructor '.'(a,Further). In other words,
How to read a term in canonical notation with constant auxiliary space for the reading of chars within?
In case it helps, just consider the terms described by sampleterm/1 below. So consider the reading of all such terms written in canonical form. And, if you like, formulate it as a DCG.
sampleterm([]).
sampleterm(a).
sampleterm(b).
sampleterm('.'(E,Es)) :-   % the real list constructor
   sampleterm(E),
   sampleterm(Es).
sampleterm('.'(E,F,G)) :-  % no list constructor
   sampleterm(E),
   sampleterm(F),
   sampleterm(G).
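For orientation, note that sampleterm/1 admits both the real list and the '.'/3 variant; only the former denotes chars:

```prolog
% Both queries succeed with the definition above:
?- sampleterm('.'(a,'.'(b,[]))).   % the list [a,b] in canonical form
?- sampleterm('.'(a,b,[])).        % a '.'/3 term, not a list
```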
If such space efficient reading is possible, then systems that support a compact internal representation of chars like Scryer and Trealla could even go a tiny step further.
Ah, lest I forget what I have tried: read/1, indeed, but it is currently not ideal.

The following straightforward code is based on Prolog streams.
It focuses on reading sampleterms that are real lists from repositionable streams.
For (1) non-repositionable streams and (2) handling '.'/3 we fall back to read/1.
The main predicate is read_sampleterm/1:
read_sampleterm(Term) :-
   current_input(S),
   (  stream_property(S, reposition(true)),
      stream_property(S, position(P)),
      (  aux_read_sampleterm_1(Term0),
         get_char('.')            % this is sloppy!
      -> true
      ;  set_stream_position(S, P),
         fail
      )
   -> Term = Term0
   ;  read(Term)                  % fallback
   ).
Note that the above code is sloppy: at the end of the read, we need
to ensure that end of file or a character that does not combine with '.' follows.
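As a hedged sketch of the missing check (end_of_read/0 is a hypothetical helper, not part of the code above): after the terminating '.' has been read, only end of file or a layout character may follow.

```prolog
% Hypothetical stricter end test, to be called right after
% get_char('.') succeeded: peek at the next character and accept
% only end of file or a layout character.
end_of_read :-
    peek_char(Ch),
    (   Ch == end_of_file -> true
    ;   member(Ch, [' ', '\t', '\n'])   % layout characters
    ).
```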
The actual reading is done by the following auxiliary predicates:
aux_read_sampleterm_1(Term) :-
   get_char(Ch),
   aux_read_sampleterm_2(Ch, Term, 0).  % use indexing

aux_read_sampleterm_2('\'', [X|Xs], N0) :-
   get_char('.'),
   get_char('\''),
   get_char('('),
   aux_read_sampleterm_1(X),
   get_char(','),
   N1 is N0 + 1,
   get_char(Ch),
   aux_read_sampleterm_2(Ch, Xs, N1).
aux_read_sampleterm_2('[', [], N) :-
   get_char(']'),
   eat_rparens(N).
aux_read_sampleterm_2(a, a, N) :-
   eat_rparens(N).
aux_read_sampleterm_2(b, b, N) :-
   eat_rparens(N).

eat_rparens(N) :-
   (  N > 0
   -> get_char(')'),
      N0 is N - 1,
      eat_rparens(N0)
   ;  true
   ).
To show some simple use cases we read from files:
read_sampleterm_from_file(File, Term) :-
   open(File, read, S, [type(text)]),
   current_input(S0),
   set_input(S),
   read_sampleterm(Term0),
   set_input(S0),
   close(S),
   Term = Term0.
Sample queries using GNU Prolog 1.5.0:
First, sample1.txt:
'.'(a,'.'(b,[])).
We get:
| ?- read_sampleterm_from_file('sample1.txt',T).
T = [a,b]
yes
Next, sample2.txt:
'.'(a,'.'(b,a)).
We get:
| ?- read_sampleterm_from_file('sample2.txt',T).
T = [a,b|a]
yes
sample3.txt is next:
'.'('.'(a,'.'(b,'.'(a,[]))),[]).
We get:
| ?- read_sampleterm_from_file('sample3.txt',T).
T = [[a,b,a]]
(1 ms) yes
Note that the above tests were run without the "fallback option".

The order of the rules is important because of the cuts and the rule for regular compound terms.
shift -->                                                   % 0
   state(
      s([close|Ts], [term(_,T),dot(0,Cs0)|Ss], CsVs),
      s(Ts, [term(chars,Cs)|Ss], CsVs)
   ),
   { append(Cs0, T, Cs) }.
shift -->                                                   % 1
   state(
      s([close|Ts], [term(_,T),dot(N0,Cs0)|Ss], CsVs),
      s(Ts, [dot(N,Cs)|Ss], CsVs)
   ),
   { succ(N, N0), append(Cs0, T, Cs) }.
shift -->                                                   % 2
   state(
      s([close|Ts], [dot(0,Cs)|Ss], CsVs),
      s(Ts, [term(chars,Cs)|Ss], CsVs)
   ).
shift -->                                                   % 3
   state(
      s([close|Ts], [dot(N0,Cs)|Ss], CsVs),
      s(Ts, [dot(N,Cs)|Ss], CsVs)
   ),
   { succ(N, N0) }.
shift -->                                                   % 4
   state(
      % s([comma|Ts], [term(atom,A),compound('.',As,As))|Ss], CsVs),
      s([comma|Ts], [term(atom,A),compound('.',As,_)|Ss], CsVs),
      s(Ts, [dot(0,[A])|Ss], CsVs)
   ),
   % { acyclic_term(As), atom_length(A, 1) },
   { var(As), atom_length(A, 1) },
   reduce, !.
shift -->                                                   % 5
   state(
      s([comma|Ts], [term(_,T),dot(0,[A])|Ss], CsVs),
      s(Ts, [compound('.',[A,T|As],As)|Ss], CsVs)
   ).
shift -->                                                   % 6
   state(
      s([comma|Ts], [term(_,T),dot(N0,Cs0)|Ss], CsVs),
      s(Ts, [compound('.',[A,T|As],As),dot(N,Cs)|Ss], CsVs)
   ),
   { succ(N, N0), append(Cs, [A], Cs0) }.
shift -->                                                   % 7
   state(
      s([comma|Ts], [dot(0,[A|Cs])|Ss], CsVs),
      s(Ts, [compound('.',[A,Cs|As],As)|Ss], CsVs)
   ).
shift -->                                                   % 8
   state(
      s([comma|Ts], [dot(N0,Cs0)|Ss], CsVs),
      s(Ts, [compound('.',[A,Cs1|As],As),dot(N,Cs)|Ss], CsVs)
   ),
   { succ(N, N0),
     length(Cs, N0),
     append(Cs, [A|Cs1], Cs0)
   }.

reduce -->
   state(
      s(Ts, [dot(0,[A]),dot(N0,Cs0)|Ss], CsVs),
      s(Ts, [dot(N,Cs)|Ss], CsVs)
   ),
   { succ(N0, N), append(Cs0, [A], Cs) }, !.
reduce --> [].

succ(X, S) :-
   can_be(not_less_than_zero, X),
   can_be(not_less_than_zero, S),
   (  nonvar(X) -> S is X+1
   ;  X is S-1, X >= 0
   ).
This is the part added to the parser; it has been simplified a bit.
The parser is optimistic: it assumes a list of characters is to be parsed whenever it sees '.'( followed by a character.
For an input like '.'(a,'.'(a,b,b),b), the parser will first build "aa" (a chars) and then undo it step by step to build the term. The first undo builds '.'(a,b,b), but the parser is still optimistic, since [a|'.'(a,b,b)] could be built, so "a" remains a chars. Then an undo is done again to build '.'(a,'.'(a,b,b),b).
There is also the case where the input is '.'(a,'.'(b,[]),c): the parser builds "ab" and then needs to undo to build '.'(a,"b",c). The chars "ab" is split, with "b" remaining a chars.
A more complicated input is '.'(a,'.'(b,'.'(c,[]),d)): the parser builds "abc", then undoes by splitting it into "a", '.'(b and "c". Now the parser builds '.'(b,"c",d), with "c" being a chars and "a" also being a chars. Finally, it finishes parsing to [a|'.'(b,"c",d)], with [a| and "c" being chars.
It all starts with % 4: if the compound term with atom '.' does not yet have any argument and the atom is a character, the parser assumes that a list of characters will be parsed and stores the character efficiently (dot(0,[A])). Then reduce//0 merges the dots; in a procedural language the stack could be inspected directly instead of using reduce as done here.
A finalized chars is built by % 0 or % 2.
The splitting is done with % 6, % 7 and % 8.
The shift % 5 handles '.'(a,T,: it turns the chars "a" into that compound.
The shift % 3 confirms how much of a chars is actually a chars: in dot(N,Cs), the first N elements of Cs may not end up in a chars, but the tail will.
The shift % 1 inserts the tails and confirms by decreasing N0.

Lexer (lexer.pl):
:- module(lexer, []).
:- use_module(library(debug)).
:- use_module(library(dcgs)).
:- use_module(library(lists), [append/2, append/3, member/2]).
/* - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Tokens.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - */
term(Ts) --> tokens(Ts).
% read_term(Ts) --> term(Ts0), end, !, { append(Ts0, [end], Ts) }.
read_term_([end]) --> end, !.
read_term_([T|Ts]) --> token(T), read_term_(Ts).
% Greedy.
tokens([T|Ts]) --> token(T), tokens(Ts).
tokens([]) --> [].
token(name(Cs)) --> name(Cs), !.
token(variable(Cs)) --> variable(Cs), !.
token(float_number(Cs)) --> float_number(Cs), !. % 3
token(integer(Cs)) --> integer(Cs), !. % 2
token(double_quoted_list(Cs)) --> double_quoted_list(Cs), !.
token(open) --> open, !.
token(open_ct) --> open_ct, !.
token(close) --> close_, !.
token(open_list) --> open_list, !.
token(close_list) --> close_list, !.
token(open_curly) --> open_curly, !.
token(close_curly) --> close_curly, !.
token(ht_sep) --> ht_sep, !.
token(comma) --> comma, !.
name(Cs) --> (layout_text_sequence '|' []), !, name_token(Cs).
variable(Cs) --> (layout_text_sequence '|' []), !, variable_token(Cs).
integer(Cs) --> (layout_text_sequence '|' []), !, integer_token(Cs).
float_number(Cs) --> (layout_text_sequence '|' []), !, float_number_token(Cs).
double_quoted_list(Cs) -->
(layout_text_sequence '|' []), !, double_quoted_list_token(Cs).
open --> layout_text_sequence, open_token.
open_ct --> open_token.
close_ --> (layout_text_sequence '|' []), !, close_token.
open_list --> (layout_text_sequence '|' []), !, open_list_token.
close_list --> (layout_text_sequence '|' []), !, close_list_token.
open_curly --> (layout_text_sequence '|' []), !, open_curly_token.
close_curly --> (layout_text_sequence '|' []), !, close_curly_token.
ht_sep --> (layout_text_sequence '|' []), !, head_tail_separator_token.
comma --> (layout_text_sequence '|' []), !, comma_token.
end --> (layout_text_sequence '|' []), !, end_token.
/* - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Layout text.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - */
layout_text_sequence --> layout_text, layout_texts.
% Greedy.
layout_texts --> layout_text, layout_texts.
layout_texts --> [].
layout_text --> layout_char(_), !.
layout_text --> comment, !.
comment --> single_line_comment, !.
comment --> bracketed_comment, !.
single_line_comment --> end_line_comment_char(_), comment_text, new_line_char(_).
% Greedy. The order is important.
% single_line_comment -->
% end_line_comment_char(_), comment_text, new_line_char(_), !.
% single_line_comment -->
% end_line_comment_char(_), comment_text, [_], !, { false }.
% single_line_comment --> end_line_comment_char(_), comment_text.
bracketed_comment --> comment_open, comment_text, comment_close.
comment_open --> comment_1_char, comment_2_char.
comment_close --> comment_2_char, comment_1_char.
comment_text --> chars(_).
comment_1_char --> "/".
comment_2_char --> "*".
/* - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Names.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - */
name_token(Cs) --> letter_digit_token(Cs), !.
name_token(Cs) --> graphic_token(Cs), !.
name_token(Cs) --> quoted_token(Cs), !.
name_token(Cs) --> semicolon_token(Cs), !.
name_token(Cs) --> cut_token(Cs), !.
letter_digit_token([C|Cs]) --> small_letter_char(C), alphanumeric_chars(Cs).
graphic_token(_) --> ".", layout_char(_), !, { false }.
graphic_token(_) --> ".", end_line_comment_char(_), !, { false }.
graphic_token([C|Cs]) --> graphic_token_char(C), graphic_token_chars(Cs).
% Greedy.
graphic_token_chars([C|Cs]) --> graphic_token_char(C), graphic_token_chars(Cs).
graphic_token_chars([]) --> [].
graphic_token_char(C) --> graphic_char(C), !.
graphic_token_char(C) --> backslash_char(C), !.
quoted_token(Cs) -->
   single_quote_char(_),
   single_quoted_items(Cs),
   single_quote_char(_).
% Greedy.
single_quoted_items(Cs) -->
   single_quoted_item(Cs0),
   single_quoted_items(Cs1),
   { append(Cs0, Cs1, Cs) }.
single_quoted_items([]) --> [].
single_quoted_item([C]) --> single_quoted_character(C), !.
single_quoted_item([]) --> continuation_escape_sequence, !.
continuation_escape_sequence --> backslash_char(_), new_line_char(_).
semicolon_token([C]) --> semicolon_char(C).
cut_token([C]) --> cut_char(C).
/* - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Quoted characters.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - */
single_quoted_character(C) --> non_quote_char(C), !.
single_quoted_character(C) --> single_quote_char(C), single_quote_char(C), !.
single_quoted_character(C) --> double_quote_char(C), !.
single_quoted_character(C) --> back_quote_char(C), !.
double_quoted_character(C) --> non_quote_char(C), !.
double_quoted_character(C) --> single_quote_char(C), !.
double_quoted_character(C) --> double_quote_char(C), double_quote_char(C), !.
double_quoted_character(C) --> back_quote_char(C), !.
back_quoted_character(C) --> non_quote_char(C), !.
back_quoted_character(C) --> single_quote_char(C), !.
back_quoted_character(C) --> double_quote_char(C), !.
back_quoted_character(C) --> back_quote_char(C), back_quote_char(C), !.
non_quote_char(C) --> graphic_char(C), !.
non_quote_char(C) --> alphanumeric_char(C), !.
non_quote_char(C) --> solo_char(C), !.
non_quote_char(C) --> space_char(C), !.
non_quote_char(C) --> meta_escape_sequence(C), !.
non_quote_char(C) --> control_escape_sequence(C), !.
non_quote_char(C) --> octal_escape_sequence(C), !.
non_quote_char(C) --> hexadecimal_escape_sequence(C), !.
meta_escape_sequence(C) -->
   backslash_char(_),
   meta_char(C0),
   { member(C0-C, [('\\')-('\\'), ''''-'''', '"'-'"', '`'-'`']), ! }.
control_escape_sequence(C) -->
   backslash_char(_),
   symbolic_control_char(C0),
   { member(C0-C,
        [ 'a'-'\a',
          'b'-'\b',
          'r'-'\r',
          'f'-'\f',
          't'-'\t',
          'n'-'\n',
          'v'-'\v'
        ]), !
   }.
symbolic_control_char(C) --> symbolic_alert_char(C), !.
symbolic_control_char(C) --> symbolic_backspace_char(C), !.
symbolic_control_char(C) --> symbolic_carriage_return_char(C), !.
symbolic_control_char(C) --> symbolic_form_feed_char(C), !.
symbolic_control_char(C) --> symbolic_horizontal_tab_char(C), !.
symbolic_control_char(C) --> symbolic_new_line_char(C), !.
symbolic_control_char(C) --> symbolic_vertical_tab_char(C), !.
symbolic_alert_char('a') --> "a".
symbolic_backspace_char('b') --> "b".
symbolic_carriage_return_char('r') --> "r".
symbolic_form_feed_char('f') --> "f".
symbolic_horizontal_tab_char('t') --> "t".
symbolic_new_line_char('n') --> "n".
symbolic_vertical_tab_char('v') --> "v".
octal_escape_sequence(C) -->
   backslash_char(_),
   octal_digit_char(C0),
   octal_digit_chars(Cs),
   backslash_char(_),
   { number_chars(N, ['0', 'o', C0|Cs]),
     char_code(C, N)
   }.
hexadecimal_escape_sequence(C) -->
   backslash_char(_),
   symbolic_hexadecimal_char(C0),
   hexadecimal_digit_char(C1),
   hexadecimal_digit_chars(Cs),
   backslash_char(_),
   { number_chars(N, ['0', C0, C1|Cs]),
     char_code(C, N)
   }.
symbolic_hexadecimal_char('x') --> "x".
/* - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Variables.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - */
variable_token(Cs) --> named_variable(Cs), !. % 1
variable_token(Cs) --> anonymous_variable(Cs), !. % 0
anonymous_variable([C]) --> variable_indicator_char(C).
named_variable([C0, C1|Cs]) -->
variable_indicator_char(C0), !,
alphanumeric_char(C1),
alphanumeric_chars(Cs).
named_variable([C|Cs]) -->
capital_letter_char(C), !,
alphanumeric_chars(Cs).
variable_indicator_char(C) --> underscore_char(C).
/* - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Integer numbers.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - */
integer_token(Cs) --> character_code_constant(Cs), !.
integer_token(Cs) --> binary_constant(Cs), !.
integer_token(Cs) --> octal_constant(Cs), !.
integer_token(Cs) --> hexadecimal_constant(Cs), !.
integer_token(Cs) --> integer_constant(Cs), !.
integer_constant([C|Cs]) --> decimal_digit_char(C), decimal_digit_chars(Cs).
character_code_constant(['0', C0, C]) -->
"0", single_quote_char(C0), single_quoted_character(C).
binary_constant(Cs) -->
binary_constant_indicator(Cs0),
binary_digit_char(C),
binary_digit_chars(Cs1),
{ append(Cs0, [C|Cs1], Cs) }.
binary_constant_indicator("0b") --> "0b".
octal_constant(Cs) -->
octal_constant_indicator(Cs0),
octal_digit_char(C),
octal_digit_chars(Cs1),
{ append(Cs0, [C|Cs1], Cs) }.
octal_constant_indicator("0o") --> "0o".
hexadecimal_constant(Cs) -->
hexadecimal_constant_indicator(Cs0),
hexadecimal_digit_char(C),
hexadecimal_digit_chars(Cs1),
{ append(Cs0, [C|Cs1], Cs) }.
hexadecimal_constant_indicator("0x") --> "0x".
/* - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Floating point numbers.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - */
float_number_token(Cs) -->
integer_constant(Cs0),
fraction(Cs1),
exponent(Cs2),
{ append([Cs0, Cs1, Cs2], Cs) }.
fraction([C0, C1|Cs]) -->
decimal_point_char(C0),
decimal_digit_char(C1),
decimal_digit_chars(Cs).
% Greedy.
exponent([C|Cs]) --> exponent_char(C), sign(Cs0), integer_constant(Cs1), !,
{ append(Cs0, Cs1, Cs) }.
exponent([]) --> [].
% Greedy.
sign([C]) --> negative_sign_char(C), !.
sign([C]) --> positive_sign_char(C), !.
sign([]) --> [].
positive_sign_char('+') --> "+".
negative_sign_char('-') --> "-".
decimal_point_char('.') --> ".".
exponent_char(C) --> [C], { member(C, "eE"), ! }.
/* - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Double quoted lists.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - */
double_quoted_string(Cs) -->
   double_quote_char(_),
   double_quoted_items(Cs0),
   double_quote_char(_),
   { append(["""", Cs0, """"], Cs) }.
% Greedy.
double_quoted_items(Cs) -->
double_quoted_item(Cs0), double_quoted_items(Cs1),
{ append(Cs0, Cs1, Cs) }.
double_quoted_items([]) --> [].
double_quoted_item([C]) --> double_quoted_character(C), !.
double_quoted_item([]) --> continuation_escape_sequence, !.
/* - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Double quoted lists.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - */
double_quoted_list_token(Cs) -->
double_quote_char(C),
double_quoted_items(Cs),
double_quote_char(C).
/* - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Back quoted strings.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - */
back_quoted_string -->
   back_quote_char(_),
   back_quoted_items,
   back_quote_char(_).
% Greedy.
back_quoted_items --> back_quoted_item, back_quoted_items.
back_quoted_items --> [].
back_quoted_item --> back_quoted_character, !.
back_quoted_item --> continuation_escape_sequence, !.
/* - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Other tokens.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - */
open_token --> open_char(_).
close_token --> close_char(_).
open_list_token --> open_list_char(_).
close_list_token --> close_list_char(_).
open_curly_token --> open_curly_char(_).
close_curly_token --> close_curly_char(_).
head_tail_separator_token --> head_tail_separator_char(_).
comma_token --> comma_char(_).
% The order is important.
% Greedy. TODO: Find better.
end_token, [C] --> end_char, layout_char(C), !.
end_token, "%" --> end_char, end_line_comment_char(_), !.
end_token --> end_char, [_], !, { false }.
end_token --> end_char.
end_char --> ".".
/* - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Processor character set.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - */
% Not greedy.
chars([]) --> [].
chars([C|Cs]) --> char(C), chars(Cs).
char(C) --> graphic_char(C), !.
char(C) --> alphanumeric_char(C), !.
char(C) --> solo_char(C), !.
char(C) --> layout_char(C), !.
char(C) --> meta_char(C), !.
char(C) --> [C], { write('Accepting: \''), write(C), write(''''), nl }.
/* - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Graphic characters.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - */
graphic_char(C) --> [C], { member(C, "#$&*+-./:<=>?@^~"), ! }.
/* - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Alphanumeric characters.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - */
% Greedy.
alphanumeric_chars([C|Cs]) -->
alphanumeric_char(C),
alphanumeric_chars(Cs).
alphanumeric_chars([]) --> [].
alphanumeric_char(C) --> alpha_char(C), !.
alphanumeric_char(C) --> decimal_digit_char(C), !.
alpha_char(C) --> underscore_char(C), !.
alpha_char(C) --> letter_char(C), !.
letter_char(C) --> capital_letter_char(C), !.
letter_char(C) --> small_letter_char(C), !.
% Greedy.
decimal_digit_chars([C|Cs]) --> decimal_digit_char(C), decimal_digit_chars(Cs).
decimal_digit_chars([]) --> [].
% Greedy.
binary_digit_chars([C|Cs]) --> binary_digit_char(C), binary_digit_chars(Cs).
binary_digit_chars([]) --> [].
% Greedy.
octal_digit_chars([C|Cs]) --> octal_digit_char(C), octal_digit_chars(Cs).
octal_digit_chars([]) --> [].
% Greedy.
hexadecimal_digit_chars([C|Cs]) -->
hexadecimal_digit_char(C), hexadecimal_digit_chars(Cs).
hexadecimal_digit_chars([]) --> [].
small_letter_char(C) --> [C], { member(C, "abcdefghijklmnopqrstuvwxyz"), ! }.
capital_letter_char(C) --> [C], { member(C, "ABCDEFGHIJKLMNOPQRSTUVWXYZ"), ! }.
decimal_digit_char(C) --> [C], { member(C, "0123456789"), ! }.
binary_digit_char(C) --> [C], { member(C, "01"), ! }.
octal_digit_char(C) --> [C], { member(C, "01234567"), ! }.
hexadecimal_digit_char(C) --> [C], { member(C, "0123456789AaBbCcDdEeFf"), ! }.
underscore_char('_') --> "_".
/* - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Solo characters.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - */
solo_char(C) --> cut_char(C), !.
solo_char(C) --> open_char(C), !.
solo_char(C) --> close_char(C), !.
solo_char(C) --> comma_char(C), !.
solo_char(C) --> semicolon_char(C), !.
solo_char(C) --> open_list_char(C), !.
solo_char(C) --> close_list_char(C), !.
solo_char(C) --> open_curly_char(C), !.
solo_char(C) --> close_curly_char(C), !.
solo_char(C) --> head_tail_separator_char(C), !.
solo_char(C) --> end_line_comment_char(C), !.
cut_char('!') --> "!".
open_char('(') --> "(".
close_char(')') --> ")".
comma_char((',')) --> ",".
semicolon_char(';') --> ";".
open_list_char('[') --> "[".
close_list_char(']') --> "]".
open_curly_char('{') --> "{".
close_curly_char('}') --> "}".
head_tail_separator_char('|') --> "|".
end_line_comment_char('%') --> "%".
/* - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Layout characters.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - */
layout_char(C) --> space_char(C), !.
layout_char(C) --> horizontal_tab_char(C), !.
layout_char(C) --> new_line_char(C), !.
space_char(' ') --> " ".
horizontal_tab_char('\t') --> "\t".
new_line_char('\n') --> "\n".
/* - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Meta characters.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - */
meta_char(C) --> backslash_char(C), !.
meta_char(C) --> single_quote_char(C), !.
meta_char(C) --> double_quote_char(C), !.
meta_char(C) --> back_quote_char(C), !.
backslash_char('\\') --> "\\".
single_quote_char('''') --> "'".
double_quote_char('"') --> """".
back_quote_char('`') --> "`".
Parser (parser.pl):
:- module(parser, []).
:- use_module(library(debug)).
:- use_module(library(dcgs)).
:- use_module(library(error)).
:- use_module(library(format)).
:- use_module(library(lists)).
state(S), [S] --> [S].
state(S0, S), [S] --> [S0].

print --> state(s(Ts, Ss, CsVs)), { format("~q ~q ~q\n", [Ts, Ss, CsVs]) }.
print(Cs) -->
   state(s(Ts, Ss, CsVs)),
   { format("~s ~q ~q ~q\n", [Cs, Ts, Ss, CsVs]) }.

lookahead(T) --> state(s([T|_], _, _)).

shift --> state(s([name(Cs)|Ts], Ss, CsVs), s(Ts, [term(atom,A)|Ss], CsVs)),
   { atom_chars(A, Cs) }.
shift -->
   state(
      s([open_list,close_list|Ts], Ss, CsVs),
      s(Ts, [term(atom,[])|Ss], CsVs)
   ).
shift -->
   state(
      s([open_curly,close_curly|Ts], Ss, CsVs),
      s(Ts, [term(atom,{})|Ss], CsVs)
   ).
shift -->
   state(s([variable(Cs)|Ts], Ss, CsVs0), s(Ts, [term(variable,V)|Ss], CsVs)),
   { variable(Cs, CsVs0, V, CsVs) }.
shift -->
   state(
      s([integer(Cs)|Ts], [term(atom,-)|Ss], CsVs),
      s(Ts, [term(integer,N)|Ss], CsVs)
   ), !,
   { number_chars(N, [-|Cs]) }.
shift -->
   state(
      s([float_number(Cs)|Ts], [term(atom,-)|Ss], CsVs),
      s(Ts, [term(float,N)|Ss], CsVs)
   ), !,
   { number_chars(N, [-|Cs]) }.
shift -->
   state(
      s([integer(Cs)|Ts], Ss, CsVs),
      s(Ts, [term(integer,N)|Ss], CsVs)
   ),
   { number_chars(N, Cs) }.
shift -->
   state(
      s([float_number(Cs)|Ts], Ss, CsVs),
      s(Ts, [term(float,N)|Ss], CsVs)
   ),
   { number_chars(N, Cs) }.
shift -->
   state(
      s([open_ct|Ts], [term(atom,A)|Ss], CsVs),
      s(Ts, [compound(A,As,As)|Ss], CsVs)
   ), !.
shift --> state(s([open|Ts], Ss, CsVs), s(Ts, [open|Ss], CsVs)).
shift --> state(s([open_ct|Ts], Ss, CsVs), s(Ts, [open_ct|Ss], CsVs)).
% /*
shift -->
   state(
      s([close|Ts], [term(_,T),dot(0,Cs0)|Ss], CsVs),
      s(Ts, [term(chars,Cs)|Ss], CsVs)
   ),
   { append(Cs0, T, Cs) }.
shift -->
   state(
      s([close|Ts], [term(_,T),dot(N0,Cs0)|Ss], CsVs),
      s(Ts, [dot(N,Cs)|Ss], CsVs)
   ),
   { succ(N, N0), append(Cs0, T, Cs) }.
shift -->
   state(
      s([close|Ts], [dot(0,Cs)|Ss], CsVs),
      s(Ts, [term(chars,Cs)|Ss], CsVs)
   ).
shift -->
   state(
      s([close|Ts], [dot(N0,Cs)|Ss], CsVs),
      s(Ts, [dot(N,Cs)|Ss], CsVs)
   ),
   { succ(N, N0) }.
% */
shift -->
   state(
      s([close|Ts], [term(_,T0),compound(A,As,[T0])|Ss], CsVs),
      s(Ts, [term(compound,T)|Ss], CsVs)
   ),
   { T =.. [A|As] }.
shift -->
   state(
      s([close|Ts], [term(_,T),open|Ss], CsVs),
      s(Ts, [term(compound,T)|Ss], CsVs)
   ).
shift -->
   state(
      s([close|Ts], [term(_,T),open_ct|Ss], CsVs),
      s(Ts, [term(compound,T)|Ss], CsVs)
   ).
% /*
shift -->
   state(
      % s([comma|Ts], [term(atom,A),compound('.',As,As))|Ss], CsVs),
      s([comma|Ts], [term(atom,A),compound('.',As,_)|Ss], CsVs),
      s(Ts, [dot(0,[A])|Ss], CsVs)
   ),
   % { acyclic_term(As), atom_length(A, 1) },
   { var(As), atom_length(A, 1) },
   reduce, !.
shift -->
   state(
      s([comma|Ts], [term(_,T),dot(0,[A])|Ss], CsVs),
      s(Ts, [compound('.',[A,T|As],As)|Ss], CsVs)
   ).
shift -->
   state(
      s([comma|Ts], [term(_,T),dot(N0,Cs0)|Ss], CsVs),
      s(Ts, [compound('.',[A,T|As],As),dot(N,Cs)|Ss], CsVs)
   ),
   { succ(N, N0), append(Cs, [A], Cs0) }.
shift -->
   state(
      s([comma|Ts], [dot(0,[A|Cs])|Ss], CsVs),
      s(Ts, [compound('.',[A,Cs|As],As)|Ss], CsVs)
   ).
shift -->
   state(
      s([comma|Ts], [dot(N0,Cs0)|Ss], CsVs),
      s(Ts, [compound('.',[A,Cs1|As],As),dot(N,Cs)|Ss], CsVs)
   ),
   { succ(N, N0),
     length(Cs, N0),
     append(Cs, [A|Cs1], Cs0)
   }.
% */
shift -->
   state(
      s([comma|Ts], [term(_,T),compound(A,As0,[T|As])|Ss], CsVs),
      s(Ts, [compound(A,As0,As)|Ss], CsVs)
   ).

reduce -->
   state(
      s(Ts, [dot(0,[A]),dot(N0,Cs0)|Ss], CsVs),
      s(Ts, [dot(N,Cs)|Ss], CsVs)
   ),
   { succ(N0, N), append(Cs0, [A], Cs) }, !.
reduce --> [].

read_term_ -->
   % print,
   lookahead(end), !,
   accept.
read_term_ -->
   shift, !,
   read_term_.

accept --> state(s([end], [term(_,_)], _)), !.
accept -->
   print("End"), { false }.
   % { throw(error(reduction(incomplete), read_term//0)) }.

variable("_", CsVs, _, CsVs) :- !.
variable(Cs, CsVs, V, CsVs) :-
   member(Cs-V, CsVs), !.
variable(Cs, CsVs, V, [Cs-V|CsVs]).

succ(X, S) :-
   can_be(not_less_than_zero, X),
   can_be(not_less_than_zero, S),
   (  nonvar(X) -> S is X+1
   ;  X is S-1, X >= 0
   ).

sml(S, M, Xs0, Xs) :-
   '$skip_max_list'(S, M, Xs0, Xs).
The initialization file init.pl:
:- use_module(library(debug)).
:- use_module(library(charsio)).
:- use_module(library(dcgs)).
:- use_module(library(dif)).
:- use_module(library(lists)).
:- use_module(library(iso_ext)).
:- use_module(library(pairs)).
:- use_module(library(pio)).
:- use_module(library(format)).
:- use_module(lexer).
:- use_module(parser).
parse(Cs, Ts, Ss0, S) :-
   phrase(lexer:read_term_(Ts), Cs),
   Ss0 = [s(Ts, [], [])|_],
   phrase(parser:read_term_, Ss0, [S]).

sample(a) --> [a].
sample([]) --> [[]].
sample('.'(E)) --> ['.'], sample(E).
sample('.'(E,Es)) --> ['.'], sample(E), sample(Es).
sample('.'(E,F,G)) --> ['.'], sample(E), sample(F), sample(G).
sample('.'(E,F,G,H)) --> ['.'], sample(E), sample(F), sample(G), sample(H).

generate(T, N) :-
   length(Cs, N),
   phrase(sample(T), Cs).

test :-
   once(generate('.'(a,'.'(a,'.'(a,'.'(a,[]))),a), _)),
   % N = 72849,
   call_nth(user:generate(T, _), N),
   (  N mod 2^10 =:= 0, writeq(N), nl, false
   ;  true
   ),
   % T = '.'('.'(a,[]),[]),
   % T = '.'('.'(aa,'.'(b,'.'(c,[]))),[]),
   write_term_to_chars(T, [quoted(true),ignore_ops(true)], Cs0),
   append(Cs0, ".", Cs),
   (  parse(Cs, _, _, _) ->
      parse(Cs, Ts, Ss, S),
      S = s(_, [term(_,T0)], _),
      T0 \== T,
      format("N = ~q,\nCs = ~q,\nTs = ~q,\nSs = ~q,\nS = ~q.\n\n", [N,Cs,Ts,Ss,S])
   ;  format("N = ~q,\nCs = ~q,\nT = ~q.\n\n", [N,Cs,T])
   ),
   halt.
test :-
   halt.

dif(V, S0, V) :-
   dif(V, S0).

ns_ --> "(".
ns_ --> ")".
ns_ --> "'.'".
ns_ --> "a".

ns --> [].
ns --> ns_, ns.

nonsense(Cs) :-
   length(Cs, _),
   [C0|Cs0] = Cs,
   foldl(dif, Cs0, C0, _),
   phrase(ns, Cs).

nonsense :-
   call_nth(nonsense(Cs0), N),
   (  N mod 2^10 =:= 0, writeq(N), nl, false
   ;  true
   ),
   append(Cs0, ".", Cs),
   (  parse(Cs, Ts, Ss, S),
      \+ catch(read_from_chars(Cs, _), error(syntax_error(_),_), false),
      format("N = ~q,\nCs = ~q,\nTs = ~q,\nSs = ~q,\nS = ~q.\n\n", [N,Cs,Ts,Ss,S])
   ;  catch(read_from_chars(Cs, T), error(syntax_error(_),_), false),
      \+ parse(Cs, Ts, Ss, S),
      \+ member(N, [161,749,822,3819]),
      format("N = ~q,\nCs = ~q,\nT = ~q.\n\n", [N,Cs,T])
   ),
   halt.
nonsense :-
   halt.
Run using Scryer Prolog with scryer-prolog init -g test. It can also be queried interactively, e.g. ?- parse("'.'(a,[]).", Ts, Ss, S). By uncommenting print//0 in read_term_//0 in parser.pl (or by using state(S0, S), [S] --> [S0,S].), the transitions can be visualized.
In the step-by-step output, dot/2 and term(chars,_) use a compact internal representation.
Some information is lost in this presentation (it cannot be seen here, but it is present), for example when parsing '.'('.'(a,[]),[]).:
...
[close,comma,open_list,close_list,close,end] [term(atom,[]),dot(0,"a"),compound('.',A,A)] []
[comma,open_list,close_list,close,end] [term(chars,"a"),compound('.',A,A)] []
[open_list,close_list,close,end] [compound('.',["a"|A],A)] [] % "a" still compactly stored.
...
Some grammar rules of shift//0 can be commented out to fall back to the unoptimized parser.
The space complexity is O(log N), where N is the size of the list of characters (not the number of tokens nor the size of the input).
In the worst case ([], '.'(a,a([])), '.'(a,a('.'(a,a([])))), etc.), the space complexity is O(N log N), where N is the number of tokens.
In the best case ([], '.'(a,[]), '.'(a,'.'(a,[])), etc.), the space complexity is O(log N), where N is the number of tokens.

Related

Extraction of the Proof from Fitting's leanTap Prolog Prover

Here is the SWI-Prolog code of Fitting's leanTap revisited:
:- use_module(library(lists)).
:- use_module(library(statistics)).
% :- use_module(library(dom)).
% operator definitions (TPTP syntax)
:- op( 500, fy, ~). % negation
:- op(1000, xfy, &). % conjunction
:- op(1100, xfy, '|'). % disjunction
:- op(1110, xfy, =>). % conditional
:- op(1120, xfy, <=>). % biconditional
/*
Next, a classification of formula types,
& instances.
*/
type(X & Y, conj, X, Y).
type(~(X & Y), disj, ~ X, ~ Y).
type(X | Y, disj, X, Y).
type(~(X | Y), conj, ~ X, ~ Y).
type(X => Y, disj, ~ X, Y).
type(~(X => Y), conj, X, ~ Y).
type(X <=> Y, disj, X & Y, ~ X & ~ Y).
type(~(X <=> Y), disj, X & ~ Y, ~ X & Y).
type(~ (~ (X)), doub, X, _).
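As a quick illustration, the classification can be queried directly; the bindings follow from the type/4 facts above:

```prolog
?- type(p & q, Kind, X, Y).
%  Kind = conj, X = p, Y = q.

?- type(~(p => q), Kind, X, Y).
%  Kind = conj, X = p, Y = ~q.
```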
/*
Now the heart of the matter.
thm(Lambda, Gamma) :-
the sequent Lambda --> Gamma is provable.
*/
thm(Lambda, [Doubleneg | Gamma]) :-
   type(Doubleneg, doub, X, _), !,
   thm(Lambda, [X | Gamma]).
thm(Lambda, [Beta | Gamma]) :-
   type(Beta, disj, Beta1, Beta2), !,
   thm(Lambda, [Beta1, Beta2 | Gamma]).
thm(Lambda, [Alpha | Gamma]) :-
   type(Alpha, conj, Alpha1, Alpha2), !,
   thm(Lambda, [Alpha1 | Gamma]), !,
   thm(Lambda, [Alpha2 | Gamma]).
thm([L1|Lambda], [L2|_]) :-
   (  L1 = L2, !
   ;  thm(Lambda, [L2])
   ).
thm(Lambda, [~ L | Gamma]) :-
   thm([L | Lambda], Gamma), !.
thm(Lambda, [L | Gamma]) :-
   thm([~ L | Lambda], Gamma), !.
/*
Finally, the driver.
*/
prove(X) :-
   time(thm([], [X])).
This code, according to Fitting, implements a sequent calculus. I have tried to change this code minimally to get a Prolog print of each proof, with input prove(X, Proof), following the structure of Jens Otten's prover (online here
and here):
and here):
% -----------------------------------------------------------------
% leanseq.pl - A sequent calculus prover implemented in Prolog
% -----------------------------------------------------------------
:- use_module(library(lists)).
% operator definitions (TPTP syntax)
:- op( 500, fy, ~). % negation
:- op(1000, xfy, &). % conjunction
:- op(1100, xfy, '|'). % disjunction
:- op(1110, xfy, =>). % implication
% -----------------------------------------------------------------
provable(F, P) :- time(prove([] > [F], P)).
% -----------------------------------------------------------------
% axiom
prove(G > D, ax(G > D, A)) :- member(A,G), member(B,D), A == B, !.
% conjunction
prove(G > D, land(G > D, P) ) :- select1( (A & B) ,G,G1), !,
prove([A , B | G1] > D, P).
prove(G > D, rand(G > D, P1,P2)) :- select1( (A & B) ,D,D1), !,
prove(G > [A|D1], P1), prove(G > [B|D1], P2).
% disjunction
prove(G > D, lor(G > D, P1,P2)) :- select1((A | B),G,G1), !,
prove([A|G1] > D, P1), prove([B|G1] > D, P2).
prove(G > D, ror(G > D, P)) :- select1( (A | B),D,D1), !,
prove(G > [A,B|D1], P ).
% implication
prove(G > D, limpl(G > D, P1,P2)) :- select1((A => B),G,G1), !,
prove(G1 > [A|D], P1), prove([B|G1] > D, P2).
prove(G > D, rimpl(G > D, P)) :- select1((A => B),D,D1), !,
prove([A|G] > [B|D1], P).
% negation
prove(G > D, lneg(G > D, P)) :- select1( ~A,G,G1), !,
prove(G1 > [A|D], P).
prove(G > D, rneg(G > D, P)) :- select1(~A ,D,D1), !,
prove([A|G] > D1, P).
% -----------------------------------------------------------------
select1(X,L,L1) :- append(L2,[X|L3],L), append(L2,L3,L1).
% -----------------------------------------------------------------
For example :
provable((p => p), Proof).
% 22 inferences, 0.000 CPU in 0.000 seconds (95% CPU, 1132503 Lips)
Proof = rimpl([]>[(p=>p)], ax([p]>[p], p))
But all my attempts to derive, from Fitting's prover (which is complete), a prover that produces a proof term like Proof above have failed. Any help that could put me on the right track would be appreciated.
The Fitting code has some unfortunate placement of cuts, generating spurious choice points, and an unnecessary recursion that redoes all the pattern matching instead of directly using member/2. If you implement it closer to the original Wang/McCarthy prover from the LISP 1.5 Manual at page 44 ff., you get a little bit more speed:
/* Fitting */
?- time((between(1,100,_), test, fail; true)).
% 3,358,200 inferences, 0.297 CPU in 0.295 seconds (101% CPU, 11311832 Lips)
true.
/* Wang McCarthy */
?- time((between(1,100,_), test2, fail; true)).
% 2,802,900 inferences, 0.203 CPU in 0.209 seconds (97% CPU, 13798892 Lips)
true.
To arrive at Wang McCarthy replace this here from Fitting:
/* Fitting */
thm([L1|Lambda], [L2|_]) :-
(
L1 = L2, !
;
thm(Lambda, [L2])
).
thm(Lambda, [~ L | Gamma]) :-
thm([L | Lambda], Gamma), !.
thm(Lambda, [L | Gamma]) :-
thm([~ L | Lambda], Gamma), !.
By this here:
/* Wang McCarthy */
thm2(Lambda, [L|_]) :- member(L, Lambda), !.
thm2(Lambda, [~ L | Gamma]) :- !,
thm2([L | Lambda], Gamma).
thm2(Lambda, [L | Gamma]) :-
thm2([~ L | Lambda], Gamma).
As a test case I was running a collection of
Principia Mathematica tautologies.
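As a quick sanity check (my own query, not from the original post), and assuming Fitting's conj/disj/doub clauses are likewise renamed to thm2/2, the excluded middle goes through the new end-phase:

```prolog
% Assumes the operator declarations and the alpha/beta/double-negation
% clauses of thm/2 shown earlier, renamed to thm2/2.
?- thm2([], [(p | ~p)]).
true.
```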
The following solution works smoothly and is very fast, with labels for the sequent rules of the calculus that Fitting calls dirseq:
:- use_module(library(lists)).
:- use_module(library(statistics)).
% :- use_module(library(dom)).
% operator definitions (TPTP syntax)
:- op( 500, fy, ~). % negation
:- op(1000, xfy, &). % conjunction
:- op(1100, xfy, '|'). % disjunction
:- op(1110, xfy, =>). % conditional
:- op(1120, xfy, <=>). % biconditional
/*
Next, a classification of formula types,
& instances.
*/
type((X & Y), conj, X, Y).
type(~((X | Y)), conj, ~ X, ~ Y).
type(~((X => Y)), conj, X, ~ Y).
type((X <=> Y), conj, (~ X | Y), (X | ~ Y)).
type(~((X <=> Y)), conj, (X | Y), (~ X | ~ Y)).
type(~ (~ (X)), doub, X, _).
type((X => Y), disj, ~ X, Y).
type(~((X & Y)), disj, ~ X, ~ Y).
type((X | Y), disj, X, Y).
/*
Now the heart of the matter.
thm(Lambda, Gamma) :-
the sequent Lambda --> Gamma is provable.
*/
thm(Lambda > [Alpha | Gamma], R) :-
type(Alpha, conj, Alpha1, Alpha2), !,
thm(Lambda > [Alpha1 | Gamma],P), !,
thm(Lambda > [Alpha2 | Gamma],Q),
R = alpha(Lambda > [Alpha | Gamma],(P & Q)).
thm(Lambda > [Beta | Gamma], R) :-
type(Beta, disj, Beta1, Beta2), !,
thm(Lambda > [Beta1, Beta2 | Gamma],P),
R = beta(Lambda > [Beta | Gamma], P).
thm(Lambda > [Doubleneg | Gamma], R) :-
type(Doubleneg, doub, X, _), !,
thm(Lambda > [X | Gamma], P),
R = dn(Lambda > [Doubleneg | Gamma], P).
thm(Lambda > [L|Gamma], R) :-
member(L, Lambda), !,
R = ax(Lambda > [L|Gamma], ax).
thm(Lambda > [~ L | Gamma], R) :- !,
thm([L | Lambda] > Gamma, P),
R = duality(Lambda > [~ L | Gamma], P).
thm(Lambda > [L | Gamma], R) :-
thm([~ L | Lambda] > Gamma, P),
R = duality(Lambda > [L | Gamma], P).
/*
Finally, the driver.
*/
provable(X, R) :-
time(thm([] > [X], R)).
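For illustration (my own run of the code above, output shown up to operator formatting and with the timing line of time/1 omitted), the identity example from the question now yields a labelled proof term:

```prolog
?- provable((p => p), R).
R = beta([]>[(p=>p)], duality([]>[~p,p], ax([p]>[p], ax))).
```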
Many thanks for the help that I have received!
Interestingly, you can easily add look-ahead (forward
checking, unit propagation) to Melvin Fitting's prover.
Just take this end-phase:
/* Fitting */
thm([L1|Lambda], [L2|_]) :-
(
L1 = L2, !
;
thm(Lambda, [L2])
).
thm(Lambda, [~ L | Gamma]) :-
thm([L | Lambda], Gamma), !.
thm(Lambda, [L | Gamma]) :-
thm([~ L | Lambda], Gamma), !.
And replace it by this end-phase:
/* Fitting + fCube Simplification */
thm(_, [1 | _]) :- !.
thm(_, [0 | Gamma]) :- !,
thm2(_, Gamma).
thm2(_, [L| Gamma]) :-
opposite2(L, R),
reduce(Gamma, R, Gamma2),
thm2(_, Gamma2).
As can be seen above, the list Lambda
is not used anymore. The predicate reduce/3 is
supposed to partially evaluate the list Gamma under
the assumption that R is true. Here are some timings
for the test case SYN007+1.014.p:
/* Fitting */
% 12,779,502 inferences, 0.813 CPU in 0.826 seconds (98% CPU, 15728618 Lips)
/* Fitting + fCube Simplification */
% 1,203,958 inferences, 0.109 CPU in 0.098 seconds (112% CPU, 11007616 Lips)
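The predicates reduce/3 and opposite2/2 are not shown in the post; here is a minimal sketch of what they might look like (my own guess, handling only &, | and ~, not the original fCube code). It substitutes the assumed-true literal by 1, its complement by 0, and folds the Boolean constants:

```prolog
% Hypothetical sketch, assuming the operator declarations above.
opposite2(~ L, L) :- !.
opposite2(L, ~ L).

% reduce(+Gamma, +R, -Gamma2): simplify every formula in Gamma
% under the assumption that literal R is true.
reduce([], _, []).
reduce([F|Fs], R, [G|Gs]) :- simp(F, R, G), reduce(Fs, R, Gs).

simp(F, R, 1) :- F == R, !.
simp(F, R, 0) :- opposite2(R, NR), F == NR, !.
simp((A & B), R, G) :- !, simp(A, R, A1), simp(B, R, B1), conj(A1, B1, G).
simp((A | B), R, G) :- !, simp(A, R, A1), simp(B, R, B1), disj(A1, B1, G).
simp(~ A, R, G) :- !, simp(A, R, A1), neg(A1, G).
simp(F, _, F).

conj(0, _, 0) :- !.
conj(_, 0, 0) :- !.
conj(1, B, B) :- !.
conj(A, 1, A) :- !.
conj(A, B, (A & B)).
disj(1, _, 1) :- !.
disj(_, 1, 1) :- !.
disj(0, B, B) :- !.
disj(A, 0, A) :- !.
disj(A, B, (A | B)).
neg(1, 0) :- !.
neg(0, 1) :- !.
neg(A, ~ A).
```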

run length encoding using DCGs

problem from: https://web.archive.org/web/20200718175929/http://www.pathwayslms.com/swipltuts/dcg/
Use a DCG to convert a sparse sequence like
[0,0,0,0,0,0,7,4,3,0,0,0,0,0,0,0,8,9,14,0,0,0,0....]
to
[zero(6), 7,4,3, zero(7), 8,9,14, ...]
I feel like I understand the material on this page up to here, but I don't really know how to start this one. Any pointers would be appreciated.
Try doing something like this:
code([X|Xs]) --> item(X), code(Xs).
code([]) --> [].
item(X) --> [0], zeros(1, X).
item(X) --> [X], { X \= 0 }.
zeros(M, N) --> [0], { M1 is M + 1 }, zeros(M1, N).
zeros(N, zero(N)) --> \+ [0].
Example:
?- phrase(code(C), [0,0,0,0,0,0,7,4,3,0,0,0,0,0,0,0,8,9,14,0,0,0,0]).
C = [zero(6), 7, 4, 3, zero(7), 8, 9, 14, zero(4)] ;
false.
Alternate style:
% For eos
:- use_module(library(dcg/basics)).
list_rle(Lst, LstRle) :-
must_be(list, Lst),
phrase(rle, Lst, LstRle),
% The first is correct
!.
% Check for list of zeros
rle, [zero(N)] --> [0], zeros(1, N), rle.
% Accept anything otherwise
rle, [C] --> [C], rle.
% Being greedy - check for end last
% eos means no more input - better than [] which can leave a remainder
rle --> eos.
zeros(N, Tot) --> [0], { N1 is N + 1 }, zeros(N1, Tot).
% Being greedy - check for end last
zeros(N, N) --> [].
Result in swi-prolog:
?- time(list_rle([0,0,0,0,0,0,7,4,3,0,0,0,0,0,0,0,8,9,14,0,0,0,0], L)).
% 39 inferences, 0.000 CPU in 0.000 seconds (92% CPU, 545073 Lips)
L = [zero(6),7,4,3,zero(7),8,9,14,zero(4)].
Another minor variant which can also be used as a generator:
rle --> eos.
rle, [zero(N)] --> [0], zeros(1, N), rle.
rle, [C] --> [C], { dif(C, 0) }, rle.
zeros(N, Tot) --> [0], !, { N1 is N + 1 }, zeros(N1, Tot).
zeros(N, N) --> [].
Result:
?- length(L, _), list_rle(L, LR).
L = LR, LR = [] ;
L = [0],
LR = [zero(1)] ;
L = LR, LR = [_A],
dif(_A,0) ;
L = [0,0],
LR = [zero(2)] ;
L = [_A,0],
LR = [_A,zero(1)],
dif(_A,0) ;
L = LR, LR = [_A,_B],
dif(_A,0),
dif(_B,0) ;
L = [0,0,0],
LR = [zero(3)] ;
L = [_A,0,0],
LR = [_A,zero(2)],
dif(_A,0) ;

Translation to DCG Semicontext not working - follow on

As a follow-up to this question, which poses the problem:
Return count of items in a list but if two identical items are next to each other then don't increment the count.
This code is the closest I came to solving this with DCG and semicontext.
lookahead(C),[C] -->
[C].
% empty list
% No lookahead needed because last item in list.
count_dcg(N,N) --> [].
% single item in list
% No lookahead needed because only one in list.
count_dcg(N0,N) -->
[_],
\+ [_],
{ N is N0 + 1 }.
% Lookahead needed because two items in list and
% only want to remove first item.
count_dcg(N0,N) -->
[C1],
lookahead(C2),
{ C1 == C2 },
count_dcg(N0,N).
% Lookahead needed because two items in list and
% only want to remove first item.
count_dcg(N0,N) -->
[C1],
lookahead(C2),
{
C1 \== C2,
N1 is N0 + 1
},
count_dcg(N1,N).
count(L,N) :-
DCG = count_dcg(0,N),
phrase(DCG,L).
What is the correct way to solve the problem using DCG with semicontext on the clause head?
Would like to know if the variation with the semicontext on the clause head is possible or not. If possible then working example code is desired, if not possible then an explanation is desired.
I think this is using semicontext notation correctly. I am counting using 0, s(0), ...
% Constraint Logic Programming
:- use_module(library(dif)). % Sound inequality
:- use_module(library(clpfd)). % Finite domain constraints
list([]) --> [].
list([L|Ls]) --> [L], list(Ls).
state(S), [state(S)] --> [state(S)].
state(S, s(S)), [state(s(S))] --> [state(S)].
keep_state(S,I),[state(S)] --> [state(S)],[I].
end_state(S) -->[state(S)],[].
lookahead(C),[S,C] -->
[S,C].
count_dcg(S,S) -->
state(S), %might not need this
end_state(S).
/* Can be used to get the length of a list
count_dcg(S,S2) -->
state(S,S1),
keep_state(S1,_),
count_dcg(S1,S2),
{}.
*/
%last item.
count_dcg(S,S1) -->
state(S,S1),
keep_state(S1,_C),
list(R),
{R = [state(_)]}.
%Two the same dont increase state
count_dcg(S,S1) -->
state(S), %might not need this
keep_state(S,C1),
lookahead(C1),
count_dcg(S,S1).
%Two different increase state
count_dcg(S,S2) -->
state(S,S1),
keep_state(S1,C1),
lookahead(C2),
{
dif(C1,C2)
},
count_dcg(S1,S2).
count(L,S) :-
phrase(count_dcg(0,S),[state(0)|L]).
This does not work as well as I hoped for cases like:
65 ?- count([a,b,X,c],L).
X = b,
L = s(s(s(0))) ;
;
X = c,
L = s(s(s(0))) .
You can convert Peano numerals with:
natsx_int(0, 0).
natsx_int(s(N), I1) :-
I1 #> 0,
I2 #= I1 - 1,
natsx_int(N, I2).
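For example (my own queries, assuming library(clpfd) is loaded), the conversion works in both directions:

```prolog
?- natsx_int(s(s(s(0))), N).
N = 3.

?- natsx_int(P, 2).
P = s(s(0)).
```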
or you can change the state predicates:
state(S), [state(S)] --> [state(S)].
state(S, S2), [state(S2)] --> [state(S)],{S2#=S+1}.
How about:
:-use_module(library(clpfd)).
list([]) --> [].
list([L|Ls]) --> [L], list(Ls).
lookahead(C),[C] -->
[C].
count_dcg(N,N) --> [].
count_dcg(N0,N) --> %last item.
[_],
list(R),
{R = [], N #=N0+1}.
count_dcg(N0,N) -->
[C1],
lookahead(C1),
count_dcg(N0,N).
count_dcg(N0,N) -->
[C1],
lookahead(C2),
{
dif(C1,C2),
N1 #= N0 + 1
},
count_dcg(N1,N).
count(L,N) :-
phrase(count_dcg(0,N),L).
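A quick check of this version (my own query); the runs of identical neighbours collapse, so five items count as three:

```prolog
?- count([a,a,b,b,c], N).
N = 3.
```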

Why can I not "find all solutions" for a deterministic predicate?

I am writing a program to find all of the squares that a Knight can move to in a game of chess.
For example: validKnightMove(X1/Y1, X2/Y2). where each argument is a co-ordinate pair.
What I've done:
Written the predicate.
Made it deterministic using cuts.
Unit-tested it with PL-Unit.
The predicate works, but I cannot query it in a desirable way from the Prolog shell.
What I'd like to do:
I would like to make a query that will find all of the valid squares I can move to from a given location. For example, ?- validKnightMove(4/4, X/Y), then the shell would search for X and Y values which satisfy the predicate.
However, when I make the query, it simply returns false., despite having valid solutions.
Here is some output from the shell to demonstrate the issue:
1 ?- validKnightMove(4/4, 6/3).
true.
2 ?- validKnightMove(4/4, X/Y).
false.
Here is my code:
This code is admittedly verbose, but should be easy to read.
validKnightMove(X1/Y1, X2/Y2) :- % Right 1, Down 2
onBoard(X2/Y2),
X2 =:= X1 + 1,
Y2 =:= Y1 + 2,
!.
validKnightMove(X1/Y1, X2/Y2) :- % Right 2, Down 1
onBoard(X2/Y2),
X2 =:= X1 + 2,
Y2 =:= Y1 + 1,
!.
validKnightMove(X1/Y1, X2/Y2) :- % Left 2, Down 1
onBoard(X2/Y2),
X2 =:= X1 - 2,
Y2 =:= Y1 + 1,
!.
validKnightMove(X1/Y1, X2/Y2) :- % Left 1, Down 2
onBoard(X2/Y2),
X2 =:= X1 - 1,
Y2 =:= Y1 + 2,
!.
validKnightMove(X1/Y1, X2/Y2) :- % Right 1, Up 2
onBoard(X2/Y2),
X2 =:= X1 + 1,
Y2 =:= Y1 - 2,
!.
validKnightMove(X1/Y1, X2/Y2) :- % Right 2, Up 1
onBoard(X2/Y2),
X2 =:= X1 + 2,
Y2 =:= Y1 - 1,
!.
validKnightMove(X1/Y1, X2/Y2) :- % Left 2, Up 1
onBoard(X2/Y2),
X2 =:= X1 - 2,
Y2 =:= Y1 - 1,
!.
validKnightMove(X1/Y1, X2/Y2) :- % Left 1, Up 2
onBoard(X2/Y2),
X2 =:= X1 - 1,
Y2 =:= Y1 - 2,
!.
onBoard(X/Y) :-
between(1, 8, X),
between(1, 8, Y),
!.
My question is: Why can't Prolog find all the solutions for a deterministic predicate?
Note: My prolog version is SWI-Prolog (Multi-threaded, 32 bits, Version 6.2.2)
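What trips this up (my reading of the code above, not part of the original post) is that =:=/2 only evaluates ground arithmetic, and the cut in onBoard/1 commits to the first solution X = 1, Y = 1 whenever X2/Y2 are unbound, so every clause then fails. A cut-free sketch that both tests and generates, assuming the source square is ground:

```prolog
:- use_module(library(lists)).

% Hedged sketch, not the original code: enumerate the eight offsets,
% compute the target square, then check it is on the board.
knightMove(X1/Y1, X2/Y2) :-
    member(DX-DY, [1-2, 2-1, -1-2, -2-1, 1-(-2), 2-(-1), -1-(-2), -2-(-1)]),
    X2 is X1 + DX,
    Y2 is Y1 + DY,
    between(1, 8, X2),
    between(1, 8, Y2).
```

A query such as ?- findall(X/Y, knightMove(4/4, X/Y), Moves). then enumerates all eight target squares.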

Prolog calculator simply returns true

I'm writing a calculator in Prolog that reads natural language questions and returns a number answer for a class assignment, and I'm nearly complete. However, when I input a sentence the program simply returns 'Yes' and then quits. As far as I can tell it doesn't even read in the sentence. This is my first time writing in Prolog, so I have no clue what is wrong. Any help would be greatly appreciated.
My code:
:- consult('aux.p').
accumulator(0).
start :-
write('Cranky Calculator'), nl,
write('-----------------'), nl,
cvt.
cvt :-
write('What do ya want?'), nl,
read_sentence(Question),
butlast(Question, Questio),
Questio \== [quit], !,
(
phrase(sentence(Value), Questio, []),
write(Value);
write_string('Stop it with your gibberish!')
), nl,
cvt.
cvt.
reset(V) :-
retract(accumulator(_)),
assert(accumulator(V)).
accumulate('plus', N, Value) :-
{Temp is accumulator(_)},
{Value is Temp + N},
reset(Value).
accumulate('minus', N, Value) :-
{Temp is accumulator(_)},
{Value is Temp - N},
reset(Value).
accumulate('divided', N, Value) :-
{Temp is accumulator(_)},
{Value is Temp / N},
reset(Value).
accumulate('times', N, Value) :-
{Temp is accumulator(_)},
{Value is Temp * N},
reset(Value).
accumulate(N1, 'plus', N2, Value) :-
{Value is N1 + N2},
reset(Value).
accumulate(N1, 'minus', N2, Value) :-
{Value is N1 - N2},
reset(Value).
accumulate(N1, 'divided', N2, Value) :-
{Value is N1 / N2},
reset(Value).
accumulate(N1, 'times', N2, Value) :-
{Value is N1 * N2},
reset(Value).
%------------------base productions---------------------
% sentence can be to an entirely new question or simply be an addition
% to the previous one
sentence(V) --> base(V1), {V is V1}.
sentence(V) --> additional(V1), {V is V1}.
sentence --> [].
base(Value) -->
pro, be, number(N1), oper(OP), number(N2), qmark,
{
accumulate(N1, OP, N2, V), {Value is V}
}.
additional(Value) -->
oper(OP), number(N), qmark,
{
accumulate(OP, N, V), {Value is V}
}.
pro --> [what].
pro --> [how], [much].
be --> [is].
number(N) --> five_digit(N1), {N is N1}.
five_digit(N) --> ten_thousands(V1), four_digit(V2), {N is 1000 * V1 + V2}.
four_digit(N) --> thousands(V1), three_digit(V2), {N is 1000 * V1 + V2}.
three_digit(N) --> hundreds(V1), two_digit(V2), {N is 100 * V1 + V2}.
two_digit(N) --> tens(V1), one_digit(V2), {N is V1 + V2}.
two_digit(N) --> teens(V), {N is V}.
one_digit(N) --> digit(V), {N is V}.
one_digit(0) --> [].
ten_thousands(T) --> tens(V), thousand, {T is V}.
ten_thousands(T) --> tens(V), {T is V}.
ten_thousands(T) --> teens(V), thousand, {T is V}.
ten_thousands(0) --> [].
thousands(T) --> digit(V), thousand, {T is V}.
thousands(0) --> [].
hundreds(T) --> digit(V), hundred, {T is V}.
hundreds(0) --> [].
thousand --> [thousand].
hundred --> [hundred].
digit(1) --> [one].
digit(2) --> [two].
digit(3) --> [three].
digit(4) --> [four].
digit(5) --> [five].
digit(6) --> [six].
digit(7) --> [seven].
digit(8) --> [eight].
digit(9) --> [nine].
tens(20) --> [twenty].
tens(30) --> [thirty].
tens(40) --> [fourty].
tens(50) --> [fifty].
tens(60) --> [sixty].
tens(70) --> [seventy].
tens(80) --> [eighty].
tens(90) --> [ninety].
teens(10) --> [ten].
teens(11) --> [eleven].
teens(12) --> [twelve].
teens(13) --> [thirteen].
teens(14) --> [fourteen].
teens(15) --> [fifteen].
teens(16) --> [sixteen].
teens(17) --> [seventeen].
teens(18) --> [eighteen].
teens(19) --> [nineteen].
oper(plus) --> [plus].
oper(plus) --> [and].
oper(minus) --> [minus].
oper(divided) --> ['divided by'].
oper(times) --> [times].
qmark --> ['?'].
The output I get looks like:
|: what is twelve plus two?
Yes
I took your code as a spec for a calculator that also gives the
result as text. The idea here is to combine DCG and CLP(FD).
CLP(FD) is constraint solving for finite domains. Finite domains
should be enough for your calculator. To enable CLP(FD) you have
first to load the appropriate library. In Jekejeke Minlog this
is done as follows:
:- ensure_loaded(library('clpfd.px')).
The code has first a section that can not only recognize numbers
but also generate text for numbers. This is mainly the part where
DCGs are combined with CLP(FD):
number(N) --> {N #= 1000 * V1 + 100 * V2 + V3}, thousands(V1),
hundreds(V2), two_digit_opt(V3).
thousands(N) --> two_digit(N), thousand.
thousands(0) --> [].
thousand --> [thousand].
hundreds(N) --> digit(N), hundred.
hundreds(0) --> [].
hundred --> [hundred].
two_digit_opt(N) --> two_digit(N).
two_digit_opt(0) --> [].
two_digit(N) --> {N #= V1*10 + V2}, tens(V1), digit_opt(V2).
two_digit(N) --> {N #= V+10}, teens(V).
two_digit(N) --> digit(N).
digit_opt(N) --> digit(N).
digit_opt(0) --> [].
digit(1) --> [one].
digit(2) --> [two].
digit(3) --> [three].
digit(4) --> [four].
digit(5) --> [five].
digit(6) --> [six].
digit(7) --> [seven].
digit(8) --> [eight].
digit(9) --> [nine].
tens(2) --> [twenty].
tens(3) --> [thirty].
tens(4) --> [fourty].
tens(5) --> [fifty].
tens(6) --> [sixty].
tens(7) --> [seventy].
tens(8) --> [eighty].
tens(9) --> [ninety].
teens(0) --> [ten].
teens(1) --> [eleven].
teens(2) --> [twelve].
teens(3) --> [thirteen].
teens(4) --> [fourteen].
teens(5) --> [fifteen].
teens(6) --> [sixteen].
teens(7) --> [seventeen].
teens(8) --> [eighteen].
teens(9) --> [nineteen].
Here is a proof that the bidirectionality works:
?- phrase(number(X),[fifty,five]).
X = 55 ;
No
?- phrase(number(55),X).
X = [fifty,five] ;
No
Adding the calculator was straightforward. I didn't use assert/retract;
I simply use an argument in an infinite loop. I don't know how healthy
this is for your Prolog system, especially since we now touch the
constraint store in between. At least in Jekejeke Minlog as of version 0.7.2
the constraint store is not yet completely recycled, so one
cannot run the loop indefinitely.
But to show how all the pieces can be put together, the loop solution
is fine. The code reads as follows:
loop(S) :-
write('> '),
flush_output,
read(L),
phrase(cmd(C),L),
do(C,S,T),
phrase(number(T),M),
write(M), nl,
!, loop(T).
loop(S) :-
write('?'), nl,
loop(S).
do(set(N),_,N).
do(add(N),S,T) :- T is S+N.
do(sub(N),S,T) :- T is S-N.
cmd(set(N)) --> factor(N).
cmd(add(N)) --> [plus], factor(N).
cmd(sub(N)) --> [minus], factor(N).
factor(M) --> number(N), more(N, M).
more(N, M) --> [times], number(H), {J is N*H}, more(J,M).
more(N, M) --> [divided, by], number(H), {J is N//H}, more(J,M).
more(N, N) --> [].
And here is an example execution:
?- loop(0).
> [eleven,times,eleven].
[one,hundred,twenty,one]
> [minus,sixty,six].
[fifty,five]
Here is a little how-to for the Jekejeke CLP(FD):
Jekejeke Minlog Desktop Installation
https://www.youtube.com/watch?v=6ZipaIrxSFQ
Jekejeke Minlog Android Installation
https://www.youtube.com/watch?v=Y2P7cEuOIws

Resources