When using chars (lists of characters, thus atoms of length one) to represent text, we have the following options for writing them within terms:
"First," the double quoted list notation (6.3.7) is the most efficient one, requiring at least n+2 characters.
But it can only be read back if the Prolog flag double_quotes is set to chars.
['N',e,x,t,','] comes the list notation with at least 2n+1 characters. While it is nice and relatively compact, it implies that operators are also used when writing other data, since it is enabled with ignore_ops(false); this requires that the same operator definitions are present when reading, which makes it quite brittle.
'.'('L','.'(a,'.'(s,'.'(t,'.'(',',[]))))) the canonical notation, which uses functional form also for lists, requires at least 7n+2 characters. That is a lot, but for interoperability (and that includes interoperability with the same system) it is best, since it depends neither on the double_quotes flag nor on the various operator declarations.
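A small sketch to see these forms produced from the same term (the goal names are standard; the exact output depends on the system and its double_quotes flag):

?- Cs = ['N',e,x,t],
   write_term(Cs, [quoted(true)]), nl,                  % ['N',e,x,t] or "Next", depending on the system
   write_term(Cs, [quoted(true),ignore_ops(true)]), nl. % '.'('N','.'(e,'.'(x,'.'(t,[])))) on systems where ignore_ops(true) also covers lists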
Writing chars in canonical notation can be done in constant space. But for reading, the situation is a bit more tricky. After all, a sequence starting with '.'(a, may also refer to a term '.'(a,Further,b). So a naive reading will have to wait (and use space) until the entire list of chars is read in. On the other hand, it seems to be a safe bet that '.'(a, will be a list constructor '.'(a,Further). In other words,
How to read a term in canonical notation with constant auxiliary space for the reading of chars within?
In case it helps, just consider terms as described by sampleterm/1. So consider the reading of all such terms written in canonical form. And, if you like, formulate it as a DCG.
sampleterm([]).
sampleterm(a).
sampleterm(b).
sampleterm('.'(E,Es)) :- % the real list constructor
sampleterm(E),
sampleterm(Es).
sampleterm('.'(E,F,G)) :- % no list constructor
sampleterm(E),
sampleterm(F),
sampleterm(G).
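For comparison, here is a naive DCG reader for such terms written in canonical notation (a sketch of mine, assuming the double_quotes flag is set to chars; the name sampleterm_canonical is just illustrative). It shows the problem: the term is only delivered once the whole input has been traversed.

sampleterm_canonical([]) --> "[]".
sampleterm_canonical(a)  --> "a".
sampleterm_canonical(b)  --> "b".
sampleterm_canonical('.'(E,Es)) -->           % real list constructor
    "'.'(", sampleterm_canonical(E), ",",
            sampleterm_canonical(Es), ")".
sampleterm_canonical('.'(E,F,G)) -->          % no list constructor
    "'.'(", sampleterm_canonical(E), ",",
            sampleterm_canonical(F), ",",
            sampleterm_canonical(G), ")".

% ?- phrase(sampleterm_canonical(T), "'.'(a,'.'(b,[]))").
%    T = [a,b].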
If such space efficient reading is possible, then systems that support a compact internal representation of chars like Scryer and Trealla could even go a tiny step further.
Ah, lest I forget what I have tried: read/1 indeed, but currently it is not ideal.
The following straightforward code is based on Prolog streams.
It focuses on reading "real list" sampleterms from repositionable streams.
For (1) non-repositionable streams and (2) handling '.'/3 we fall back to read/1.
The main predicate is read_sampleterm/1:
read_sampleterm(Term) :-
current_input(S),
( stream_property(S,reposition(true)),
stream_property(S,position(P)),
( aux_read_sampleterm_1(Term0),
get_char('.') % this is sloppy!
-> true
; set_stream_position(S,P),
fail
)
-> Term = Term0
; read(Term) % fallback
).
Note that the above code is sloppy: at the end of the read, we need
to ensure that EOF or a character that does not combine with '.' follows.
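One way to tighten this, sketched here as an assumption of mine rather than as part of the code above (the name read_end is just illustrative), is to peek at the character following the end '.' and accept only layout, '%', or end of file:

read_end :-
    get_char('.'),
    peek_char(C),                       % do not consume the following character
    (  C == end_of_file -> true
    ;  member(C, [' ', '\t', '\n', '%'])  % member/2 as provided e.g. by GNU Prolog
    ).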
The actual reading is done by the following auxiliary predicates:
aux_read_sampleterm_1(Term) :-
get_char(Ch),
aux_read_sampleterm_2(Ch,Term,0). % use indexing
aux_read_sampleterm_2('\'',[X|Xs],N0) :-
get_char('.'),
get_char('\''),
get_char('('),
aux_read_sampleterm_1(X),
get_char(','),
N1 is N0 + 1,
get_char(Ch),
aux_read_sampleterm_2(Ch,Xs,N1).
aux_read_sampleterm_2('[',[],N) :-
get_char(']'),
eat_rparens(N).
aux_read_sampleterm_2(a,a,N) :-
eat_rparens(N).
aux_read_sampleterm_2(b,b,N) :-
eat_rparens(N).
eat_rparens(N) :-
( N > 0
-> get_char(')'),
N0 is N - 1,
eat_rparens(N0)
; true
).
To show some simple use cases we read from files:
read_sampleterm_from_file(File,Term) :-
open(File,read,S,[type(text)]),
current_input(S0),
set_input(S),
read_sampleterm(Term0),
set_input(S0),
close(S),
Term = Term0.
Sample queries using GNU Prolog 1.5.0:
First, sample1.txt:
'.'(a,'.'(b,[])).
We get:
| ?- read_sampleterm_from_file('sample1.txt',T).
T = [a,b]
yes
Next, sample2.txt:
'.'(a,'.'(b,a)).
We get:
| ?- read_sampleterm_from_file('sample2.txt',T).
T = [a,b|a]
yes
sample3.txt is next:
'.'('.'(a,'.'(b,'.'(a,[]))),[]).
We get:
| ?- read_sampleterm_from_file('sample3.txt',T).
T = [[a,b,a]]
(1 ms) yes
Note that the above tests were run without the "fallback option".
The order of the following rules is important because of the cuts and the rules for regular compound terms.
shift --> % 0
state(
s([close|Ts], [term(_,T),dot(0,Cs0)|Ss], CsVs),
s(Ts, [term(chars, Cs)|Ss], CsVs)
),
{ append(Cs0, T, Cs) }.
shift --> % 1
state(
s([close|Ts], [term(_,T),dot(N0,Cs0)|Ss], CsVs),
s(Ts, [dot(N,Cs)|Ss], CsVs)
),
{ succ(N, N0), append(Cs0, T, Cs) }.
shift --> % 2
state(
s([close|Ts], [dot(0,Cs)|Ss], CsVs),
s(Ts, [term(chars,Cs)|Ss], CsVs)
).
shift --> % 3
state(
s([close|Ts], [dot(N0,Cs)|Ss], CsVs),
s(Ts, [dot(N,Cs)|Ss], CsVs)
),
{ succ(N, N0) }.
shift --> % 4
state(
% s([comma|Ts], [term(atom,A),compound('.',As,As))|Ss], CsVs),
s([comma|Ts], [term(atom,A),compound('.',As,_)|Ss], CsVs),
s(Ts, [dot(0,[A])|Ss], CsVs)
),
% { acyclic_term(As), atom_length(A, 1) },
{ var(As), atom_length(A, 1) },
reduce, !.
shift --> % 5
state(
s([comma|Ts], [term(_,T),dot(0,[A])|Ss], CsVs),
s(Ts, [compound('.',[A,T|As],As)|Ss], CsVs)
).
shift --> % 6
state(
s([comma|Ts], [term(_,T),dot(N0,Cs0)|Ss], CsVs),
s(Ts, [compound('.',[A,T|As],As),dot(N,Cs)|Ss], CsVs)
),
{ succ(N, N0), append(Cs, [A], Cs0) }.
shift --> % 7
state(
s([comma|Ts], [dot(0,[A|Cs])|Ss], CsVs),
s(Ts, [compound('.',[A,Cs|As],As)|Ss], CsVs)
).
shift --> % 8
state(
s([comma|Ts], [dot(N0,Cs0)|Ss], CsVs),
s(Ts, [compound('.',[A,Cs1|As],As),dot(N,Cs)|Ss], CsVs)
),
{ succ(N, N0),
length(Cs, N0),
append(Cs, [A|Cs1], Cs0)
}.
reduce -->
state(
s(Ts, [dot(0,[A]),dot(N0,Cs0)|Ss], CsVs),
s(Ts, [dot(N,Cs)|Ss], CsVs)
),
{ succ(N0, N), append(Cs0, [A], Cs) }, !.
reduce --> [].
succ(X, S) :-
can_be(not_less_than_zero,X),
can_be(not_less_than_zero,S),
( nonvar(X) -> S is X+1 ; X is S-1, X >= 0 ).
This is the part added to the parser. It has been simplified a bit.
This parser is optimistic and assumes a list of characters is to be parsed when it sees '.'( followed by a character.
For an input like '.'(a,'.'(a,b,b),b), the parser first builds "aa" (chars) and then undoes it step by step to build the term. The first undo builds '.'(a,b,b), but the parser stays optimistic, since [a|'.'(a,b,b)] could still be built, so "a" remains a chars. Then an undo is done once more to build '.'(a,'.'(a,b,b),b).
There is also the case where the input is '.'(a,'.'(b,[]),c): here the parser builds "ab" and then needs to undo to build '.'(a,"b",c). The chars "ab" is split, with "b" remaining a chars.
A more complicated input is '.'(a,'.'(b,'.'(c,[]),d)): the parser builds "abc", then undoes by splitting it into "a", '.'(b and "c". Now the parser builds '.'(b,"c",d), with "c" being a chars and "a" also still being a chars. Finally, it finishes the parse with [a|'.'(b,"c",d)], where the [a| part and "c" are chars.
It starts with % 4: if the compound term with atom '.' does not have any arguments yet and the atom is a character, the parser assumes a list of characters is being parsed and stores the character compactly (dot(0,[A])). Then reduce//0 merges the dots; in a procedural language the stack could be inspected further instead of using reduce as done here.
A finalized chars is built after % 0 or % 2.
The splitting is done with % 6, % 7 and % 8.
The shift % 5 handles the case '.'(a,T, by turning the chars "a" back into that compound.
The shift % 3 confirms how much of a chars is actually a chars: in dot(N,Cs), the first N elements may not be a chars, but the tail is.
The shift % 1 inserts the tails and confirms by decreasing N0.
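To observe the splitting described above, one can load the files below and pose, for example (parse/4 is defined in the initialization file further down):

?- parse("'.'(a,'.'(b,[]),c).", Ts, Ss, S).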
Lexer (lexer.pl):
:- module(lexer, []).
:- use_module(library(debug)).
:- use_module(library(dcgs)).
:- use_module(library(lists), [append/2, append/3, member/2]).
/* - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Tokens.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - */
term(Ts) --> tokens(Ts).
% read_term(Ts) --> term(Ts0), end, !, { append(Ts0, [end], Ts) }.
read_term_([end]) --> end, !.
read_term_([T|Ts]) --> token(T), read_term_(Ts).
% Greedy.
tokens([T|Ts]) --> token(T), tokens(Ts).
tokens([]) --> [].
token(name(Cs)) --> name(Cs), !.
token(variable(Cs)) --> variable(Cs), !.
token(float_number(Cs)) --> float_number(Cs), !. % 3
token(integer(Cs)) --> integer(Cs), !. % 2
token(double_quoted_list(Cs)) --> double_quoted_list(Cs), !.
token(open) --> open, !.
token(open_ct) --> open_ct, !.
token(close) --> close_, !.
token(open_list) --> open_list, !.
token(close_list) --> close_list, !.
token(open_curly) --> open_curly, !.
token(close_curly) --> close_curly, !.
token(ht_sep) --> ht_sep, !.
token(comma) --> comma, !.
name(Cs) --> (layout_text_sequence '|' []), !, name_token(Cs).
variable(Cs) --> (layout_text_sequence '|' []), !, variable_token(Cs).
integer(Cs) --> (layout_text_sequence '|' []), !, integer_token(Cs).
float_number(Cs) --> (layout_text_sequence '|' []), !, float_number_token(Cs).
double_quoted_list(Cs) -->
(layout_text_sequence '|' []), !, double_quoted_list_token(Cs).
open --> layout_text_sequence, open_token.
open_ct --> open_token.
close_ --> (layout_text_sequence '|' []), !, close_token.
open_list --> (layout_text_sequence '|' []), !, open_list_token.
close_list --> (layout_text_sequence '|' []), !, close_list_token.
open_curly --> (layout_text_sequence '|' []), !, open_curly_token.
close_curly --> (layout_text_sequence '|' []), !, close_curly_token.
ht_sep --> (layout_text_sequence '|' []), !, head_tail_separator_token.
comma --> (layout_text_sequence '|' []), !, comma_token.
end --> (layout_text_sequence '|' []), !, end_token.
/* - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Layout text.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - */
layout_text_sequence --> layout_text, layout_texts.
% Greedy.
layout_texts --> layout_text, layout_texts.
layout_texts --> [].
layout_text --> layout_char(_), !.
layout_text --> comment, !.
comment --> single_line_comment, !.
comment --> bracketed_comment, !.
single_line_comment --> end_line_comment_char(_), comment_text, new_line_char(_).
% Greedy. The order is important.
% single_line_comment -->
% end_line_comment_char(_), comment_text, new_line_char(_), !.
% single_line_comment -->
% end_line_comment_char(_), comment_text, [_], !, { false }.
% single_line_comment --> end_line_comment_char(_), comment_text.
bracketed_comment --> comment_open, comment_text, comment_close.
comment_open --> comment_1_char, comment_2_char.
comment_close --> comment_2_char, comment_1_char.
comment_text --> chars(_).
comment_1_char --> "/".
comment_2_char --> "*".
/* - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Names.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - */
name_token(Cs) --> letter_digit_token(Cs), !.
name_token(Cs) --> graphic_token(Cs), !.
name_token(Cs) --> quoted_token(Cs), !.
name_token(Cs) --> semicolon_token(Cs), !.
name_token(Cs) --> cut_token(Cs), !.
letter_digit_token([C|Cs]) --> small_letter_char(C), alphanumeric_chars(Cs).
graphic_token(_) --> ".", layout_char(_), !, { false }.
graphic_token(_) --> ".", end_line_comment_char(_), !, { false }.
graphic_token([C|Cs]) --> graphic_token_char(C), graphic_token_chars(Cs).
% Greedy.
graphic_token_chars([C|Cs]) --> graphic_token_char(C), graphic_token_chars(Cs).
graphic_token_chars([]) --> [].
graphic_token_char(C) --> graphic_char(C), !.
graphic_token_char(C) --> backslash_char(C), !.
quoted_token(Cs) -->
single_quote_char(_),
single_quoted_items(Cs),
single_quote_char(_).
% Greedy.
single_quoted_items(Cs) -->
single_quoted_item(Cs0),
single_quoted_items(Cs1),
{ append(Cs0, Cs1, Cs) }.
single_quoted_items([]) --> [].
single_quoted_item([C]) --> single_quoted_character(C), !.
single_quoted_item([]) --> continuation_escape_sequence, !.
continuation_escape_sequence --> backslash_char(_), new_line_char(_).
semicolon_token([C]) --> semicolon_char(C).
cut_token([C]) --> cut_char(C).
/* - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Quoted characters.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - */
single_quoted_character(C) --> non_quote_char(C), !.
single_quoted_character(C) --> single_quote_char(C), single_quote_char(C), !.
single_quoted_character(C) --> double_quote_char(C), !.
single_quoted_character(C) --> back_quote_char(C), !.
double_quoted_character(C) --> non_quote_char(C), !.
double_quoted_character(C) --> single_quote_char(C), !.
double_quoted_character(C) --> double_quote_char(C), double_quote_char(C), !.
double_quoted_character(C) --> back_quote_char(C), !.
back_quoted_character(C) --> non_quote_char(C), !.
back_quoted_character(C) --> single_quote_char(C), !.
back_quoted_character(C) --> double_quote_char(C), !.
back_quoted_character(C) --> back_quote_char(C), back_quote_char(C), !.
non_quote_char(C) --> graphic_char(C), !.
non_quote_char(C) --> alphanumeric_char(C), !.
non_quote_char(C) --> solo_char(C), !.
non_quote_char(C) --> space_char(C), !.
non_quote_char(C) --> meta_escape_sequence(C), !.
non_quote_char(C) --> control_escape_sequence(C), !.
non_quote_char(C) --> octal_escape_sequence(C), !.
non_quote_char(C) --> hexadecimal_escape_sequence(C), !.
meta_escape_sequence(C) --> backslash_char(_), meta_char(C0),
{ member(C0-C, [('\\')-('\\'), ''''-'''', '"'-'"', '`'-'`']), ! }.
control_escape_sequence(C) --> backslash_char(_), symbolic_control_char(C0),
{ member(
C0-C,
[
'a'-'\a',
'b'-'\b',
'r'-'\r',
'f'-'\f',
't'-'\t',
'n'-'\n',
'v'-'\v'
]
), !
}.
symbolic_control_char(C) --> symbolic_alert_char(C), !.
symbolic_control_char(C) --> symbolic_backspace_char(C), !.
symbolic_control_char(C) --> symbolic_carriage_return_char(C), !.
symbolic_control_char(C) --> symbolic_form_feed_char(C), !.
symbolic_control_char(C) --> symbolic_horizontal_tab_char(C), !.
symbolic_control_char(C) --> symbolic_new_line_char(C), !.
symbolic_control_char(C) --> symbolic_vertical_tab_char(C), !.
symbolic_alert_char('a') --> "a".
symbolic_backspace_char('b') --> "b".
symbolic_carriage_return_char('r') --> "r".
symbolic_form_feed_char('f') --> "f".
symbolic_horizontal_tab_char('t') --> "t".
symbolic_new_line_char('n') --> "n".
symbolic_vertical_tab_char('v') --> "v".
octal_escape_sequence(C) -->
backslash_char(_),
octal_digit_char(C0),
octal_digit_chars(Cs),
backslash_char(_),
{ number_chars(N, ['0', 'o', C0|Cs]),
char_code(C, N)
}.
hexadecimal_escape_sequence(C) -->
backslash_char(_),
symbolic_hexadecimal_char(C0),
hexadecimal_digit_char(C1),
hexadecimal_digit_chars(Cs),
backslash_char(_),
{ number_chars(N, ['0', C0, C1|Cs]),
char_code(C, N)
}.
symbolic_hexadecimal_char('x') --> "x".
/* - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Variables.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - */
variable_token(Cs) --> named_variable(Cs), !. % 1
variable_token(Cs) --> anonymous_variable(Cs), !. % 0
anonymous_variable([C]) --> variable_indicator_char(C).
named_variable([C0, C1|Cs]) -->
variable_indicator_char(C0), !,
alphanumeric_char(C1),
alphanumeric_chars(Cs).
named_variable([C|Cs]) -->
capital_letter_char(C), !,
alphanumeric_chars(Cs).
variable_indicator_char(C) --> underscore_char(C).
/* - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Integer numbers.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - */
integer_token(Cs) --> character_code_constant(Cs), !.
integer_token(Cs) --> binary_constant(Cs), !.
integer_token(Cs) --> octal_constant(Cs), !.
integer_token(Cs) --> hexadecimal_constant(Cs), !.
integer_token(Cs) --> integer_constant(Cs), !.
integer_constant([C|Cs]) --> decimal_digit_char(C), decimal_digit_chars(Cs).
character_code_constant(['0', C0, C]) -->
"0", single_quote_char(C0), single_quoted_character(C).
binary_constant(Cs) -->
binary_constant_indicator(Cs0),
binary_digit_char(C),
binary_digit_chars(Cs1),
{ append(Cs0, [C|Cs1], Cs) }.
binary_constant_indicator("0b") --> "0b".
octal_constant(Cs) -->
octal_constant_indicator(Cs0),
octal_digit_char(C),
octal_digit_chars(Cs1),
{ append(Cs0, [C|Cs1], Cs) }.
octal_constant_indicator("0o") --> "0o".
hexadecimal_constant(Cs) -->
hexadecimal_constant_indicator(Cs0),
hexadecimal_digit_char(C),
hexadecimal_digit_chars(Cs1),
{ append(Cs0, [C|Cs1], Cs) }.
hexadecimal_constant_indicator("0x") --> "0x".
/* - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Floating point numbers.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - */
float_number_token(Cs) -->
integer_constant(Cs0),
fraction(Cs1),
exponent(Cs2),
{ append([Cs0, Cs1, Cs2], Cs) }.
fraction([C0, C1|Cs]) -->
decimal_point_char(C0),
decimal_digit_char(C1),
decimal_digit_chars(Cs).
% Greedy.
exponent([C|Cs]) --> exponent_char(C), sign(Cs0), integer_constant(Cs1), !,
{ append(Cs0, Cs1, Cs) }.
exponent([]) --> [].
% Greedy.
sign([C]) --> negative_sign_char(C), !.
sign([C]) --> positive_sign_char(C), !.
sign([]) --> [].
positive_sign_char('+') --> "+".
negative_sign_char('-') --> "-".
decimal_point_char('.') --> ".".
exponent_char(C) --> [C], { member(C, "eE"), ! }.
/* - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Double quoted strings.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - */
double_quoted_string(Cs) -->
    double_quote_char(_),
    double_quoted_items(Cs0),
    double_quote_char(_),
    { append(["""", Cs0, """"], Cs) }.
% Greedy.
double_quoted_items(Cs) -->
double_quoted_item(Cs0), double_quoted_items(Cs1),
{ append(Cs0, Cs1, Cs) }.
double_quoted_items([]) --> [].
double_quoted_item([C]) --> double_quoted_character(C), !.
double_quoted_item([]) --> continuation_escape_sequence, !.
/* - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Double quoted lists.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - */
double_quoted_list_token(Cs) -->
double_quote_char(C),
double_quoted_items(Cs),
double_quote_char(C).
/* - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Back quoted strings.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - */
back_quoted_string -->
    back_quote_char(_),
    back_quoted_items,
    back_quote_char(_).
% Greedy.
back_quoted_items --> back_quoted_item, back_quoted_items.
back_quoted_items --> [].
back_quoted_item --> back_quoted_character(_), !.
back_quoted_item --> continuation_escape_sequence, !.
/* - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Other tokens.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - */
open_token --> open_char(_).
close_token --> close_char(_).
open_list_token --> open_list_char(_).
close_list_token --> close_list_char(_).
open_curly_token --> open_curly_char(_).
close_curly_token --> close_curly_char(_).
head_tail_separator_token --> head_tail_separator_char(_).
comma_token --> comma_char(_).
% The order is important.
% Greedy. TODO: Find better.
end_token, [C] --> end_char, layout_char(C), !.
end_token, "%" --> end_char, end_line_comment_char(_), !.
end_token --> end_char, [_], !, { false }.
end_token --> end_char.
end_char --> ".".
/* - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Processor character set.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - */
% Not greedy.
chars([]) --> [].
chars([C|Cs]) --> char(C), chars(Cs).
char(C) --> graphic_char(C), !.
char(C) --> alphanumeric_char(C), !.
char(C) --> solo_char(C), !.
char(C) --> layout_char(C), !.
char(C) --> meta_char(C), !.
char(C) --> [C], { write('Accepting: \''), write(C), write(''''), nl }.
/* - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Graphic characters.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - */
graphic_char(C) --> [C], { member(C, "#$&*+-./:<=>?@^~"), ! }.
/* - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Alphanumeric characters.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - */
% Greedy.
alphanumeric_chars([C|Cs]) -->
alphanumeric_char(C),
alphanumeric_chars(Cs).
alphanumeric_chars([]) --> [].
alphanumeric_char(C) --> alpha_char(C), !.
alphanumeric_char(C) --> decimal_digit_char(C), !.
alpha_char(C) --> underscore_char(C), !.
alpha_char(C) --> letter_char(C), !.
letter_char(C) --> capital_letter_char(C), !.
letter_char(C) --> small_letter_char(C), !.
% Greedy.
decimal_digit_chars([C|Cs]) --> decimal_digit_char(C), decimal_digit_chars(Cs).
decimal_digit_chars([]) --> [].
% Greedy.
binary_digit_chars([C|Cs]) --> binary_digit_char(C), binary_digit_chars(Cs).
binary_digit_chars([]) --> [].
% Greedy.
octal_digit_chars([C|Cs]) --> octal_digit_char(C), octal_digit_chars(Cs).
octal_digit_chars([]) --> [].
% Greedy.
hexadecimal_digit_chars([C|Cs]) -->
hexadecimal_digit_char(C), hexadecimal_digit_chars(Cs).
hexadecimal_digit_chars([]) --> [].
small_letter_char(C) --> [C], { member(C, "abcdefghijklmnopqrstuvwxyz"), ! }.
capital_letter_char(C) --> [C], { member(C, "ABCDEFGHIJKLMNOPQRSTUVWXYZ"), ! }.
decimal_digit_char(C) --> [C], { member(C, "0123456789"), ! }.
binary_digit_char(C) --> [C], { member(C, "01"), ! }.
octal_digit_char(C) --> [C], { member(C, "01234567"), ! }.
hexadecimal_digit_char(C) --> [C], { member(C, "0123456789AaBbCcDdEeFf"), ! }.
underscore_char('_') --> "_".
/* - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Solo characters.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - */
solo_char(C) --> cut_char(C), !.
solo_char(C) --> open_char(C), !.
solo_char(C) --> close_char(C), !.
solo_char(C) --> comma_char(C), !.
solo_char(C) --> semicolon_char(C), !.
solo_char(C) --> open_list_char(C), !.
solo_char(C) --> close_list_char(C), !.
solo_char(C) --> open_curly_char(C), !.
solo_char(C) --> close_curly_char(C), !.
solo_char(C) --> head_tail_separator_char(C), !.
solo_char(C) --> end_line_comment_char(C), !.
cut_char('!') --> "!".
open_char('(') --> "(".
close_char(')') --> ")".
comma_char((',')) --> ",".
semicolon_char(';') --> ";".
open_list_char('[') --> "[".
close_list_char(']') --> "]".
open_curly_char('{') --> "{".
close_curly_char('}') --> "}".
head_tail_separator_char('|') --> "|".
end_line_comment_char('%') --> "%".
/* - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Layout characters.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - */
layout_char(C) --> space_char(C), !.
layout_char(C) --> horizontal_tab_char(C), !.
layout_char(C) --> new_line_char(C), !.
space_char(' ') --> " ".
horizontal_tab_char('\t') --> "\t".
new_line_char('\n') --> "\n".
/* - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Meta characters.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - */
meta_char(C) --> backslash_char(C), !.
meta_char(C) --> single_quote_char(C), !.
meta_char(C) --> double_quote_char(C), !.
meta_char(C) --> back_quote_char(C), !.
backslash_char('\\') --> "\\".
single_quote_char('''') --> "'".
double_quote_char('"') --> """".
back_quote_char('`') --> "`".
Parser (parser.pl):
:- module(parser, []).
:- use_module(library(debug)).
:- use_module(library(dcgs)).
:- use_module(library(error)).
:- use_module(library(format)).
:- use_module(library(lists)).
state(S), [S] --> [S].
state(S0, S), [S] --> [S0].
print --> state(s(Ts, Ss, CsVs)), { format("~q ~q ~q\n", [Ts, Ss, CsVs]) }.
print(Cs) -->
state(s(Ts, Ss, CsVs)),
{ format("~s ~q ~q ~q\n", [Cs, Ts, Ss, CsVs]) }.
lookahead(T) --> state(s([T|_], _, _)).
shift --> state(s([name(Cs)|Ts], Ss, CsVs), s(Ts, [term(atom,A)|Ss], CsVs)),
{ atom_chars(A, Cs) }.
shift -->
state(
s([open_list,close_list|Ts], Ss, CsVs),
s(Ts, [term(atom,[])|Ss], CsVs)
).
shift -->
state(
s([open_curly,close_curly|Ts], Ss, CsVs),
s(Ts, [term(atom,{})|Ss], CsVs)
).
shift -->
state(s([variable(Cs)|Ts], Ss, CsVs0), s(Ts, [term(variable,V)|Ss], CsVs)),
{ variable(Cs, CsVs0, V, CsVs) }.
shift -->
state(
s([integer(Cs)|Ts], [term(atom,-)|Ss], CsVs),
s(Ts, [term(integer,N)|Ss], CsVs)
), !,
{ number_chars(N, [-|Cs]) }.
shift -->
state(
s([float_number(Cs)|Ts], [term(atom,-)|Ss], CsVs),
s(Ts, [term(float,N)|Ss], CsVs)
), !,
{ number_chars(N, [-|Cs]) }.
shift -->
state(
s([integer(Cs)|Ts], Ss, CsVs),
s(Ts, [term(integer,N)|Ss], CsVs)
),
{ number_chars(N, Cs) }.
shift -->
state(
s([float_number(Cs)|Ts], Ss, CsVs),
s(Ts, [term(float,N)|Ss], CsVs)
),
{ number_chars(N, Cs) }.
shift -->
state(
s([open_ct|Ts], [term(atom,A)|Ss], CsVs),
s(Ts, [compound(A,As,As)|Ss], CsVs)
), !.
shift --> state(s([open|Ts], Ss, CsVs), s(Ts, [open|Ss], CsVs)).
shift --> state(s([open_ct|Ts], Ss, CsVs), s(Ts, [open_ct|Ss], CsVs)).
% /*
shift -->
state(
s([close|Ts], [term(_,T),dot(0,Cs0)|Ss], CsVs),
s(Ts, [term(chars, Cs)|Ss], CsVs)
),
{ append(Cs0, T, Cs) }.
shift -->
state(
s([close|Ts], [term(_,T),dot(N0,Cs0)|Ss], CsVs),
s(Ts, [dot(N,Cs)|Ss], CsVs)
),
{ succ(N, N0), append(Cs0, T, Cs) }.
shift -->
state(
s([close|Ts], [dot(0,Cs)|Ss], CsVs),
s(Ts, [term(chars,Cs)|Ss], CsVs)
).
shift -->
state(
s([close|Ts], [dot(N0,Cs)|Ss], CsVs),
s(Ts, [dot(N,Cs)|Ss], CsVs)
),
{ succ(N, N0) }.
% */
shift -->
state(
s([close|Ts], [term(_,T0),compound(A,As,[T0])|Ss], CsVs),
s(Ts, [term(compound,T)|Ss], CsVs)
),
{ T =.. [A|As] }.
shift -->
state(
s([close|Ts], [term(_,T),open|Ss], CsVs),
s(Ts, [term(compound,T)|Ss], CsVs)
).
shift -->
state(
s([close|Ts], [term(_,T),open_ct|Ss], CsVs),
s(Ts, [term(compound,T)|Ss], CsVs)
).
% /*
shift -->
state(
% s([comma|Ts], [term(atom,A),compound('.',As,As))|Ss], CsVs),
s([comma|Ts], [term(atom,A),compound('.',As,_)|Ss], CsVs),
s(Ts, [dot(0,[A])|Ss], CsVs)
),
% { acyclic_term(As), atom_length(A, 1) },
{ var(As), atom_length(A, 1) },
reduce, !.
shift -->
state(
s([comma|Ts], [term(_,T),dot(0,[A])|Ss], CsVs),
s(Ts, [compound('.',[A,T|As],As)|Ss], CsVs)
).
shift -->
state(
s([comma|Ts], [term(_,T),dot(N0,Cs0)|Ss], CsVs),
s(Ts, [compound('.',[A,T|As],As),dot(N,Cs)|Ss], CsVs)
),
{ succ(N, N0), append(Cs, [A], Cs0) }.
shift -->
state(
s([comma|Ts], [dot(0,[A|Cs])|Ss], CsVs),
s(Ts, [compound('.',[A,Cs|As],As)|Ss], CsVs)
).
shift -->
state(
s([comma|Ts], [dot(N0,Cs0)|Ss], CsVs),
s(Ts, [compound('.',[A,Cs1|As],As),dot(N,Cs)|Ss], CsVs)
),
{ succ(N, N0),
length(Cs, N0),
append(Cs, [A|Cs1], Cs0)
}.
% */
shift -->
state(
s([comma|Ts], [term(_,T),compound(A,As0,[T|As])|Ss], CsVs),
s(Ts, [compound(A,As0,As)|Ss], CsVs)
).
reduce -->
state(
s(Ts, [dot(0,[A]),dot(N0,Cs0)|Ss], CsVs),
s(Ts, [dot(N,Cs)|Ss], CsVs)
),
{ succ(N0, N), append(Cs0, [A], Cs) }, !.
reduce --> [].
read_term_ -->
% print,
lookahead(end), !,
accept.
read_term_ -->
shift, !,
read_term_.
accept --> state(s([end], [term(_,_)], _)), !.
accept -->
print("End"), { false }.
% { throw(error(reduction(imcomplete), read_term//0)) }.
variable("_", CsVs, _, CsVs) :- !.
variable(Cs, CsVs, V, CsVs) :-
member(Cs-V, CsVs), !.
variable(Cs, CsVs, V, [Cs-V|CsVs]).
succ(X, S) :-
can_be(not_less_than_zero,X),
can_be(not_less_than_zero,S),
( nonvar(X) -> S is X+1 ; X is S-1, X >= 0 ).
sml(S, M, Xs0, Xs) :-
'$skip_max_list'(S, M, Xs0, Xs).
The initialization file init.pl:
:- use_module(library(debug)).
:- use_module(library(charsio)).
:- use_module(library(dcgs)).
:- use_module(library(dif)).
:- use_module(library(lists)).
:- use_module(library(iso_ext)).
:- use_module(library(pairs)).
:- use_module(library(pio)).
:- use_module(library(format)).
:- use_module(lexer).
:- use_module(parser).
parse(Cs, Ts, Ss0, S) :-
phrase(lexer:read_term_(Ts), Cs),
Ss0 = [s(Ts, [], [])|_],
phrase(parser:read_term_, Ss0, [S]).
sample(a) --> [a].
sample([]) --> [[]].
sample('.'(E)) --> ['.'], sample(E).
sample('.'(E,Es)) --> ['.'], sample(E), sample(Es).
sample('.'(E,F,G)) --> ['.'], sample(E), sample(F), sample(G).
sample('.'(E,F,G,H)) --> ['.'], sample(E), sample(F), sample(G), sample(H).
generate(T, N) :-
length(Cs, N),
phrase(sample(T), Cs).
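% For instance (a check of mine, not part of the original file),
% the term '.'('.'(a,[]),[]) is generated from five sample tokens:
%
% ?- generate('.'('.'(a,[]),[]), N).
%    N = 5.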
test :-
once(generate('.'(a,'.'(a,'.'(a,'.'(a,[]))),a), _)),
% N = 72849,
call_nth(user:generate(T, _), N),
( N mod 2^10 =:= 0, writeq(N), nl, false ; true ),
% T = '.'('.'(a,[]),[]),
% T = '.'('.'(aa,'.'(b,'.'(c,[]))),[]),
write_term_to_chars(T, [quoted(true),ignore_ops(true)], Cs0),
append(Cs0, ".", Cs),
( parse(Cs, _, _, _) ->
parse(Cs, Ts, Ss, S),
S = s(_,[term(_,T0)],_),
T0 \== T,
format("N = ~q,\nCs = ~q,\nTs = ~q,\nSs = ~q,\nS = ~q.\n\n", [N,Cs,Ts,Ss,S])
; format("N = ~q,\nCs = ~q,\nT = ~q.\n\n", [N,Cs,T])
),
halt.
test :-
halt.
dif(V, S0, V) :-
dif(V, S0).
ns_ --> "(".
ns_ --> ")".
ns_ --> "'.'".
ns_ --> "a".
ns --> [].
ns --> ns_, ns.
nonsense(Cs) :-
length(Cs, _),
[C0|Cs0] = Cs,
foldl(dif, Cs0, C0, _),
phrase(ns, Cs).
nonsense :-
call_nth(nonsense(Cs0), N),
( N mod 2^10 =:= 0, writeq(N), nl, false ; true ),
append(Cs0, ".", Cs),
( parse(Cs, Ts, Ss, S),
\+ catch(read_from_chars(Cs, _), error(syntax_error(_),_), false),
format("N = ~q,\nCs = ~q,\nTs = ~q,\nSs = ~q,\nS = ~q.\n\n", [N,Cs,Ts,Ss,S])
; catch(read_from_chars(Cs, T), error(syntax_error(_),_), false),
\+ parse(Cs, Ts, Ss, S),
\+ member(N, [161,749,822,3819]),
format("N = ~q,\nCs = ~q,\nT = ~q.\n\n", [N,Cs,T])
),
halt.
nonsense :-
halt.
This is run using Scryer Prolog with scryer-prolog init -g test. It can also be queried directly, e.g. ?- parse("'.'(a,[]).", Ts, Ss, S). By uncommenting print//0 in read_term_//0 in parser.pl (or by using state(S0, S), [S] --> [S0,S].), the transitions can be visualized.
In the step-by-step trace, dot/2 and term(chars,_) use a compact internal representation.
Some information is lost in this implementation (it cannot be seen here, but it is present), for example when parsing '.'('.'(a,[]),[]).:
...
[close,comma,open_list,close_list,close,end] [term(atom,[]),dot(0,"a"),compound('.',A,A)] []
[comma,open_list,close_list,close,end] [term(chars,"a"),compound('.',A,A)] []
[open_list,close_list,close,end] [compound('.',["a"|A],A)] [] % "a" still compactly stored.
...
Some grammar rules of shift//0 can be commented out to fall back to the unoptimized parser.
The space complexity is O(ln N) where N is the size of the list of characters (not the number of tokens nor the size of the input).
In the worst case ([], '.'(a,a([])), '.'(a,a('.'(a,a([])))), etc), the space complexity is O(N ln N) where N is the number of tokens.
In the best case ([], '.'(a,[]), '.'(a,'.'(a,[])), etc), the space complexity is O(ln N) where N is the number of tokens.
I'm trying to write a simple graph coloring algorithm in Prolog, but I'm having a bit of a hard time understanding the language. I know what I want to do: I want to go to a vertex, find all the other vertices connected to it, check my vertex's color, and depending on that, color the other vertices with different colors. I'm just having a hard time translating this to Prolog. If it were a C dialect or Java, it would be a piece of cake for me, but this is giving me fits.
This is what I have so far:
main:- graph_coloring.
%color_list([blue, red, green, yellow, white], colors).
%vertex_list([a, b, c, d], vertices).
%edge_list([(a,b),(b,c),(c,d)], edges).
%Our graph
color(blue).
color(red).
color(green).
color(black).
color(white).
%graph([a-b, b-c, b-d, c-d]).
vertex(a).
vertex(b).
vertex(c).
vertex(d).
%Subject to changing, so are asserted into listener at runtime.
init_dynamic_facts:-
assertz(vertex_color(a, none)),
assertz(vertex_color(b, none)),
assertz(vertex_color(c, none)),
assertz(vertex_color(d, none)),
assertz(location(a)).
edge(a,b).
edge(b,c).
edge(b,d).
edge(c,d).
is_connect(A,B):-
edge(A,B).
is_connect(A,B):-
edge(B,A).
connections(Vertex):-
edge(Vertex,X).
connections(Vertex):-
edge(X,Vertex).
move(Vertex):-
retract(location(_)),
asserta(location(Vertex)).
paint_vertex(Vertex, Color):-
retract(vertex_color(Vertex,_)),
asserta(vertex_color(Vertex, Color)).
find_vertex_color(Vertex):-
vertex_color(Vertex, X).
graph_coloring:-
location(Current_vertex),
vertex_color(Current_vertex, Curr_color),
( Curr_color =:= none ->
connections(Current_vertex, Others),
vertex_color(Others, Other_colors),
paint_vertex(Current_vertex,
How can I complete this algorithm?
(edited: more code under graph_coloring)
I would like to mention that this problem is a typical constraint satisfaction problem and can be efficiently solved using the CSP module of SWI-Prolog. Here is the full algorithm:
:- use_module(library(clpfd)).
color(red).
color(blue).
color(green).
vertex(a).
vertex(b).
vertex(c).
vertex(d).
vertex(e).
edge(a,b).
edge(a,c).
edge(b,c).
edge(b,d).
edge(c,d).
colorGraph(ColorList) :-
findall((X, Y), edge(X, Y), Edges),
findall(X, vertex(X), Vertexes),
findall(hasColor(X, _), member(X, Vertexes), ColorList),
createConstraint(Edges,ColorList),
colorNodes(ColorList).
createConstraint([],_).
createConstraint([(V1,V2)|RL],ColorList):-
member(hasColor(V1,C1),ColorList),
member(hasColor(V2,C2),ColorList),
dif(C1,C2),
createConstraint(RL,ColorList).
colorNodes([]).
colorNodes([hasColor(_,C)|Nodes]) :-
color(C),
colorNodes(Nodes).
color/1 indicates the colors available, vertex/1 indicates the vertexes in the graph, and edge/2 defines the edges between the vertexes. Moreover, colorGraph(?List) determines the color of the vertexes, where List is a list of hasColor(Vertex, Color) terms, Vertex being the vertex colored with Color.
Let's detail each part of the algorithm above to understand what happens.
:- use_module(library(clpfd)).
It indicates to SWI-Prolog to load the module containing the predicates for constraint satisfaction problems.
colorGraph(ColorList) :-
findall((X, Y), edge(X, Y), Edges),
findall(X, vertex(X), Vertexes),
findall(hasColor(X, _), member(X, Vertexes), ColorList),
createConstraint(Edges,ColorList),
colorNodes(ColorList).
The predicate colorGraph/1 is the entry point of the algorithm. It converts the edge and vertex clauses into lists, constrains ColorList to contain one hasColor/2 entry per vertex, and finally creates the constraints on the colors and assigns the colors (these are two separate operations).
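With the facts above, the three findall/3 calls produce the following intermediate data (my illustration of the intermediate results):

Edges     = [(a,b),(a,c),(b,c),(b,d),(c,d)],
Vertexes  = [a,b,c,d,e],
ColorList = [hasColor(a,_),hasColor(b,_),hasColor(c,_),hasColor(d,_),hasColor(e,_)]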
createConstraint([],_).
createConstraint([(V1,V2)|RL],ColorList):-
member(hasColor(V1,C1),ColorList),
member(hasColor(V2,C2),ColorList),
dif(C1,C2),
createConstraint(RL,ColorList).
The predicate createConstraint/2 simply states that two linked vertexes must have different colors. It is worth mentioning that dif/2 is a constraint predicate.
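A minimal illustration of dif/2 on its own (not specific to this program): the constraint is set up first and only rules out bindings later, when the two variables would otherwise become equal:

?- dif(C1, C2), C1 = red, C2 = red.
false.

?- dif(C1, C2), C1 = red, C2 = blue.
C1 = red,
C2 = blue.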
colorNodes([]).
colorNodes([hasColor(_,C)|Nodes]) :-
color(C),
colorNodes(Nodes).
The predicate colorNodes/1 assigns the right color to the vertexes. Prolog will take care of assigning the right colors based on the constraints defined above.
Finally, the results can be found by calling the predicate colorGraph/1, such as:
?- colorGraph(L).
L = [hasColor(a, red), hasColor(b, blue), hasColor(c, green), hasColor(d, red), hasColor(e, red)] ;
L = [hasColor(a, red), hasColor(b, blue), hasColor(c, green), hasColor(d, red), hasColor(e, blue)] ;
L = [hasColor(a, red), hasColor(b, blue), hasColor(c, green), hasColor(d, red), hasColor(e, green)] ;
L = [hasColor(a, red), hasColor(b, green), hasColor(c, blue), hasColor(d, red), hasColor(e, red)] ;
L = [hasColor(a, red), hasColor(b, green), hasColor(c, blue), hasColor(d, red), hasColor(e, blue)] ;
L = [hasColor(a, red), hasColor(b, green), hasColor(c, blue), hasColor(d, red), hasColor(e, green)] ;
L = [hasColor(a, blue), hasColor(b, red), hasColor(c, green), hasColor(d, blue), hasColor(e, red)] ;
L = [hasColor(a, blue), hasColor(b, red), hasColor(c, green), hasColor(d, blue), hasColor(e, blue)] ;
L = [hasColor(a, blue), hasColor(b, red), hasColor(c, green), hasColor(d, blue), hasColor(e, green)] ;
L = [hasColor(a, blue), hasColor(b, green), hasColor(c, red), hasColor(d, blue), hasColor(e, red)] ;
L = [hasColor(a, blue), hasColor(b, green), hasColor(c, red), hasColor(d, blue), hasColor(e, blue)] ;
L = [hasColor(a, blue), hasColor(b, green), hasColor(c, red), hasColor(d, blue), hasColor(e, green)] ;
L = [hasColor(a, green), hasColor(b, red), hasColor(c, blue), hasColor(d, green), hasColor(e, red)] ;
L = [hasColor(a, green), hasColor(b, red), hasColor(c, blue), hasColor(d, green), hasColor(e, blue)] ;
L = [hasColor(a, green), hasColor(b, red), hasColor(c, blue), hasColor(d, green), hasColor(e, green)] ;
L = [hasColor(a, green), hasColor(b, blue), hasColor(c, red), hasColor(d, green), hasColor(e, red)] ;
L = [hasColor(a, green), hasColor(b, blue), hasColor(c, red), hasColor(d, green), hasColor(e, blue)] ;
L = [hasColor(a, green), hasColor(b, blue), hasColor(c, red), hasColor(d, green), hasColor(e, green)] ;
I think you are trying to think in a way that is not natural for Prolog programs; that is, you are trying not to use recursion :) What I've come up with is the following, which however may not be entirely correct (it's late, and I don't have a good reputation when trying to think at times like this... :) )
Let's assume that you have the graph described by the following facts:
edge(a,b).
edge(b,c).
edge(b,d).
edge(c,d).
and that the available colors are
color(blue).
color(red).
color(green).
(by the four color theorem, four colors suffice to color any planar graph, and this particular graph can be colored with 3, so let's just use 3 here). Let's also assume that you want the answer to be given as a [Vertex-Color] list, where the list will contain a color for every vertex of your graph. I believe the following is a correct solution:
coloring([V-C]) :-
color(C),
\+ edge(V,_).
coloring([V-C,V1-C1|Coloring]) :-
color(C),
edge(V,V1),
V \== V1,
coloring([V1-C1|Coloring]),
C1 \== C.
The first clause says that if there is no edge from V to any other vertex, just try all possible colors. The second clause says that vertex V will get color C, and vertex V1 will get color C1 if there is an edge from V to V1, where V != V1 and C != C1. (I also assumed that your graph is connected, i.e. there are no vertices which are not connected to other vertices).
And since we only want solutions where all the vertices have colors, we will only keep lists of length |V|, where V is the set of vertices you have. You can implement this restriction in various ways; I prefer to use "findall/3":
colors(X) :-
coloring(X),
findall(V,edge(V,_),List),
length(List,Len),
length(X,Len).
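As a side note (a suggestion of mine, not part of the original answer, with vertices/1 being a name I made up): findall(V,edge(V,_),List) counts duplicate source vertices and misses vertices that only occur as edge targets; it happens to yield 4 for this particular graph. A more robust check could collect the distinct vertices from both ends of the edges:

vertices(Vs) :-
    setof(V, U^( edge(V,U) ; edge(U,V) ), Vs).

colors(X) :-
    coloring(X),
    vertices(Vs),
    length(Vs, Len),
    length(X, Len).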
Now, by consulting this program and asking |?- colors(X). you will get all the possible color assignments for the vertices of your graph.
If anyone finds a problem (and I am almost sure there is one in the above solution), please do let us know :)
Spyros