What's the preferred way to ignore the rest of the input? I found one somewhat verbose way:
ignore_rest --> [].
ignore_rest --> [_|_].
And it works:
?- phrase(ignore_rest, "foo schmoo").
true ;
But when I try to collapse these two rules into:
ignore_rest2 --> _.
Then it doesn't:
?- phrase(ignore_rest2, "foo schmoo").
ERROR: phrase/3: Arguments are not sufficiently instantiated
What you want is to state that there is a sequence of arbitrarily many characters. The easiest way to describe this is:
... -->
[].
... -->
[_],
... .
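For example, here is a sketch that assumes SWI-Prolog 7 default flags: back-quoted text is a list of codes, and a double-quoted literal inside a DCG body is translated to codes. starts_with_foo//0 is a hypothetical nonterminal that matches "foo" and ignores whatever follows:

```prolog
% ...//0 describes an arbitrary (possibly empty) sequence of elements.
... --> [].
... --> [_], ... .

% Hypothetical example: the input starts with "foo"; the rest is ignored.
% Inside a DCG body, "foo" is translated to the codes of f, o, o.
starts_with_foo --> "foo", ... .
```

With this, ?- phrase(starts_with_foo, `foo schmoo`). succeeds, and ?- phrase(starts_with_foo, `bar foo`). fails.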
Using [_|_] as a non-terminal, as you did, is an SWI-Prolog-specific extension and is highly problematic. In fact, in the past there were several different extensions to, and interpretations of, [_|_]. Most notably, Quintus Prolog permitted a user-defined '.'/4 to be called when [_|_] was used as a non-terminal. Note that [_|[]] was still considered a terminal! This was really an implementation error, but it was nevertheless exploited. For such an example, see:
David B. Searls, Investigating the Linguistics of DNA with Definite Clause Grammars. NACLP 1989.
Why not simply use phrase/3 instead of phrase/2? For example, assuming that you have a prefix//0 non-terminal that consumes only part of the input:
?- phrase(prefix, Input, _).
The third argument of phrase/3 returns the non-consumed terminals, which you can simply ignore.
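A minimal sketch with a hypothetical prefix//0 that only describes a leading "foo" (SWI-Prolog 7 default flags assumed):

```prolog
% Hypothetical prefix//0: it describes only the leading "foo" and
% says nothing about what follows.
prefix --> "foo".
```

Then ?- phrase(prefix, `foo schmoo`, Rest). binds Rest to the codes of " schmoo"; writing _ instead of Rest simply discards them.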
I'm writing a code generator that converts definite clause grammars to other grammar notations. To do this, I need to expand a grammar rule:
:- initialization(main).
main :-
-->(example,A),writeln(A).
% this should print ([a],example1), but this is a runtime error
example --> [a],example1.
example1 --> [b].
But -->(example, A) doesn't expand the rule, even though -->/2 appears to be defined here. Is there another way to access the definitions of DCG grammar rules?
This is a guess at what you are expecting and why you are having a problem. It just bugs me, because I know you are smart and should be able to connect the dots from the comments. (The comments were deleted when this was posted, but the OP did see them.)
This is very specific to SWI-Prolog.
When Prolog code is loaded, it automatically goes through term expansion, as noted in expand.pl.
Any clause with --> gets expanded according to the rules of dcg_translate_rule/2. So when you use listing/1 on the code after it is loaded, the clauses with --> have already been expanded. So AFAIK you cannot see ([a],example1), which is the code before loading and term expansion, but only example([a|A], B) :- example1(A, B), which is the code after loading and term expansion.
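To see what the translator produces for a single rule without loading it, you can call dcg_translate_rule/2 directly. A minimal sketch for SWI-Prolog; show_translation/1 is a hypothetical helper name:

```prolog
% Translate one grammar rule term by hand and print the resulting clause.
% dcg_translate_rule/2 is SWI-Prolog's DCG translator.
show_translation(Rule) :-
    dcg_translate_rule(Rule, Clause),
    portray_clause(Clause).
```

?- show_translation((example --> [a], example1)). should print a clause equivalent to example([a|A], B) :- example1(A, B), although variable names and layout may differ between versions.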
The only way to get the code as you want would be to turn off the term expansion during loading, but then the code that should have been expanded will not and the code will not run.
You could also try to find the source of the loaded code, but I don't think that is what you want to do either.
Based on your statement "I'm writing a code generator that converts definite clause grammars to other grammar notations", perhaps you need to replace the code for dcg_translate_rule/2, or somehow intercept the code on loading, before term expansion.
HTH
As for the error related to -->(example,A),writeln(A): that is because it is not a valid DCG clause.
As you wrote in the comments, if you want to convert DCGs into CHRs, you need to apply the conversion before the default expansion of DCGs into clauses. For example, assuming your code is saved in a grammars.pl file:
?- assertz(term_expansion((H --> B), '--->'(H,B))).
true.
?- assertz(goal_expansion((H --> B), '--->'(H,B))).
true.
?- [grammars].
[a],example1
true.
I'm trying to write a DCG for a command interface. The idea is to read a string of input, split it on spaces, and hand the resulting list of tokens to a DCG to parse it into a command and arguments. The result of parsing should be a list of terms which I can use with =.. to construct a goal to call. However, I've become really confused by the string type situation in SWI-Prolog (ver. 7.2.3). SWI-Prolog includes a library of basic DCG functionality, including a goal integer//1 which is supposed to parse an integer. It fails due to a type error, but the bigger problem is that I can't figure out how to make a DCG work nicely in SWI-Prolog with "lists of tokens".
Here's what I'm trying to do:
:- use_module(library(dcg/basics)).
% integer//1 is from the dcg/basics lib
amount(X) --> integer(X), { X > 0 }.
cmd([show,all]) --> ["show"],["all"].
cmd([show,false]) --> ["show"].
cmd([skip,X]) --> ["skip"], amount(X).
% now in the interpreter:
?- phrase(cmd(L), ["show","all"]).
L = [show, all].
% what is the problem with this next query?
?- phrase(cmd(L), ["skip", "50"]).
ERROR: code_type/2: Type error: `character' expected, found `"50"' (a string)
I have read Section 5.2 of the SWI manual, but it didn't quite answer my questions:
What type is expected by integer//1 in the dcg/basics library? The error message says "character", but I can't find any useful reference as to what exactly this means and how to provide it with "proper" input.
How do I pass a list of strings (tokens) to phrase/2 such that I can use integer//1 to parse a token as an integer?
If there's no way to use the integer//1 primitive to parse a string of digits into an integer, how should I accomplish this?
I did quite a bit of experimenting with different values for the double_quotes flag in SWI-Prolog, plus different input formats, such as a list of atoms or a single string as the input, i.e. "skip 50" rather than ["skip", "50"], and so on, but I feel like there are assumptions about how DCGs work that I don't understand.
I have studied these three pages as well, which have lots of examples but none quite address my issues (some links omitted since I don't have enough reputation to post all of them):
The tutorial "Using Definite Clause Grammars in SWI-Prolog" by Anne Ogborn
A tutorial from Amzi! Prolog about writing command interfaces as DCGs.
Section 7.3 of J. R. Fisher's Prolog tutorial
A third, broader question is how to generate an error message if an integer is expected but cannot be parsed as one, something like this:
% the user types:
> skip 50x
I didn't understand that number.
One approach is to set the variable X in the DCG above to some kind of error value and then check for that later (like in the hypothetical skip/1 goal that is supposed to get called by the command), but perhaps there's a better/more idiomatic way? Most of my experience in writing parsers comes from using Haskell's Parsec and Attoparsec libraries, which are fairly declarative but work somewhat differently, especially as regards error handling.
Standard Prolog doesn't have a string type. The traditional representation of a double-quoted character sequence is a list of codes (integers). For efficiency reasons, SWI-Prolog version 7 and later introduced strings as a new atomic data type:
?- atomic("a string").
true.
and back-quoted literals now play the role previously held by double-quoted strings:
?- X=`123`.
X = [49, 50, 51].
Needless to say, this caused some confusion, also given the weakly typed nature of Prolog...
Anyway, a DCG still works on (difference) lists of character codes; the translator has simply been extended to accept strings as terminals. Your code could be:
cmd([show,all]) --> whites,"show",whites,"all",blanks_to_nl.
cmd([show,false]) --> whites,"show",blanks_to_nl.
cmd([skip,X]) --> whites,"skip",whites,amount(X),blanks_to_nl.
and can be called like
?- phrase(cmd(C), ` skip 2300 `).
C = [skip, 2300].
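If you would rather keep the question's token-list design (a list of strings such as ["skip", "50"]), one possible sketch, offered as an alternative to the approach above, converts the token string itself instead of calling integer//1, which expects a list of codes. token_amount//1 is a hypothetical name, and catch/3 is there in case number_string/2 raises a syntax error on a non-numeric token rather than failing:

```prolog
% Tokens are SWI-Prolog 7 strings, e.g. ["skip", "50"].
cmd([show,all])   --> ["show"], ["all"].
cmd([show,false]) --> ["show"].
cmd([skip,X])     --> ["skip"], token_amount(X).

% Hypothetical helper: exactly one token that denotes a positive integer.
token_amount(X) -->
    [S],
    { string(S),
      catch(number_string(X, S), _, fail),  % treat bad syntax as failure
      integer(X),
      X > 0 }.
```

With this, ?- phrase(cmd(C), ["skip", "50"]). gives C = [skip, 50], while the tokens ["skip", "50x"] are rejected.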
edit
how to generate an error message if an integer is expected
I would try:
...
cmd([skip,X]) --> whites,"skip",whites,amount(X).
% integer//1 is from the dcg/basics lib
amount(X) --> integer(X), { X > 0 }, blanks_to_nl, !.
amount(unknown) --> string(S), eos, {print_message(error, invalid_int_arg(S))}.
:- multifile prolog:message//1.
prolog:message(invalid_int_arg(_)) --> ['I didn\'t understand that number.'].
test:
?- phrase(cmd(C), ` skip 2300x `).
ERROR: I didn't understand that number.
C = [skip, unknown] ;
false.
Our textbook gave us this example of a structurer for a math equation in Prolog:
math(Result) --> number(Number1), operator(Operator), number(Number2), { Result = [Number1, Operator, Number2] }.
operator('+') --> ['+'].
number('number') --> ['NUMBER'].
I'm quite new to Prolog, however, and I have no idea how to use this example to get the output. I'm under the impression it restructures the input using Result and outputs it for use.
The only input I've tried that doesn't cause an error is math('number', '+', 'number'), but it always outputs false and I don't know why. Furthermore, shouldn't it restructure the input and give me the result in Result as well?
What should I be inputting here?
This example is a DCG. You should use the phrase/2 interface predicate to access DCGs.
To find out what the DCG describes, start with the most general query, relating the nonterminal math(R) to a list Ls that is described by the first argument:
?- phrase(math(R), Ls).
From the answer you get (very easy exercise!), you will notice that R is probably not what you meant it to be. Hint: Look up (=..)/2.
Notice in particular that you need not be "inputting" anything here: A DCG describes a list. The list can be specified, but need not be given: A variable will do too! Think in terms of relations between arbitrary terms.
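For example, here is the textbook grammar, copied unchanged, called through phrase/2 with a concrete token list. Note that the list is the second argument of phrase/2, not an argument of math//1:

```prolog
% Grammar copied from the question.
math(Result) -->
    number(Number1), operator(Operator), number(Number2),
    { Result = [Number1, Operator, Number2] }.

operator('+') --> ['+'].
number('number') --> ['NUMBER'].
```

?- phrase(math(R), ['NUMBER', '+', 'NUMBER']). yields R = [number, +, number].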
Is it possible to use the prolog format predicate to print to file?
I have a table of data that I print to stdout using the format predicate, i.e.
print_table :-
print_table_header,
forall(range(1.0,10.0,0.1,N), print_row(N,L)).
%% print_row(L) :- take a list of the form, [a,b,c,d,e] and
%% print it to screen as a single row of tab separated float values (1DP)
print_row(N,L) :-
build_row(N,L),
format('~t~1f~10+ ~t~1f~10+ ~t~1f~10+ ~t~1f~10+ ~t~1f~10+ ~n', L).
print_table_header :-
format('~t~w~10+ ~t~w~10+ ~t~w~10+ ~t~w~10+ ~t~w~10+ ~n', ['N','N2','N3','N4','N5']).
It would be nice to somehow reuse the code to print the same thing to a file.
In addition to the other good answer (+1!), I would like to present a purer solution to such tasks.
The key idea is to make format/2 accessible within DCGs, and then to use a DCG to describe the output.
This is very easy using the codes(...) output argument of format/3, which several Prolog implementations provide. All you need are the following short auxiliary definitions:
format_(Data, Args) --> call(format_dlist(Data, Args)).
format_dlist(Data, Args, Cs0, Cs) :- format(codes(Cs0,Cs), Data, Args).
The nonterminal call//1 calls its argument with two additional arguments that let you access the implicit DCG arguments, and this is used to describe additional codes via format/3.
Now, we can simply use the nonterminal format_//2 within DCGs.
For example, to describe a simple table:
table -->
row([a,b,c]),
row([d,e,f]).
row(Ls) --> format_("~t~w~10+~t~w~10+~t~w~10+~n", Ls).
Usage example and result:
?- phrase(table, Cs), format("~s", [Cs]).
         a         b         c
         d         e         f
Cs = [32, 32, 32, 32, 32, 32, 32, 32, 32|...].
Note that one last remaining format/2 is used to actually write the output to the screen.
However, everything else is free of side-effects and declaratively describes a table.
An important advantage of this method is that you can easily write test cases to see whether your tables are (still) correctly formatted. It is easy to reason about Prolog lists of codes (described with a DCG), but quite hard to reason about things that only appear on the terminal.
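To write such a table to a file, only the final side-effecting step changes. A sketch that repeats the definitions above so that it is self-contained; table_to_file/1 is a hypothetical name, and setup_call_cleanup/3 (which makes sure the stream is closed) is SWI-Prolog rather than ISO:

```prolog
format_(Data, Args) --> call(format_dlist(Data, Args)).

format_dlist(Data, Args, Cs0, Cs) :- format(codes(Cs0, Cs), Data, Args).

table -->
    row([a,b,c]),
    row([d,e,f]).

row(Ls) --> format_("~t~w~10+~t~w~10+~t~w~10+~n", Ls).

% Hypothetical helper: the only side effect is writing the codes to File.
table_to_file(File) :-
    phrase(table, Cs),
    setup_call_cleanup(open(File, write, Stream),
                       format(Stream, "~s", [Cs]),
                       close(Stream)).
```

The same description also gives a simple test case, for instance checking that phrase(row([a,b,c]), Cs) yields nine spaces before each letter.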
You can!
Consider the following extract of the SICStus Prolog documentation for format/[2,3]:
11.3.85 format/[2,3]
Synopsis
format(+Control, +Arguments)
format(+Stream, +Control, +Arguments)
Interprets the Arguments according to the Control string and prints the result on Stream.
The predicates format/[2,3] are widely supported across Prolog implementations.
However, as of right now, these predicates are not part of ISO Prolog.
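For example, format/3 writes to a file stream opened with open/3. A minimal sketch using the header from the question; write_header/1 is a hypothetical name, and setup_call_cleanup/3, used here to guarantee that the stream is closed, is not part of ISO Prolog:

```prolog
% Hypothetical helper: write the question's table header to File.
% The only change from the format/2 version is the extra Stream argument.
write_header(File) :-
    setup_call_cleanup(
        open(File, write, Stream),
        format(Stream, '~t~w~10+ ~t~w~10+ ~t~w~10+ ~t~w~10+ ~t~w~10+ ~n',
               ['N','N2','N3','N4','N5']),
        close(Stream)).
```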
I would write the output 'routines' with an additional parameter, a Stream, and then pass user_output while testing or printing to the screen. See the ISO predicates open/3, close/1, etc. for stream handling.
Note that I/O is among the least 'declarative' areas of the language, because, for efficiency, an approach based on side effects is required...
SWI-Prolog has a built-in with_output_to/2 that would allow you to reuse your existing code without adding a parameter. But since you tagged your question iso-prolog, you should really add the Stream parameter...
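Both suggestions can be sketched as follows, using a three-column version of the question's row format. print_row_everywhere/2 and row_to_string/2 are hypothetical names; with_output_to/2 and setup_call_cleanup/3 are SWI-Prolog built-ins rather than ISO:

```prolog
% (1) Portable: pass the output stream explicitly.
print_row(Stream, Row) :-
    format(Stream, '~t~1f~10+~t~1f~10+~t~1f~10+~n', Row).

% Hypothetical helper: the same routine writes to the terminal and a file.
print_row_everywhere(File, Row) :-
    print_row(user_output, Row),
    setup_call_cleanup(open(File, write, Stream),
                       print_row(Stream, Row),
                       close(Stream)).

% (2) SWI-Prolog only: keep an existing format/2-based routine unchanged
% and capture what it writes to the current output.
print_row(Row) :-
    format('~t~1f~10+~t~1f~10+~t~1f~10+~n', Row).

row_to_string(Row, String) :-
    with_output_to(string(String), print_row(Row)).
```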
I always seem to struggle to write DCGs to parse input files, but it seems like it should be simple. Are there any tips or tricks for thinking about this problem?
For a concrete example, let's say I want to parse a FASTA file (https://en.wikipedia.org/wiki/FASTA_format). I want to read each description and each sequence on backtracking.
:- use_module(library(pio)).
:- use_module(library(dcg/basics)).
:- portray_text(true).
:- set_prolog_flag(double_quotes, codes).
:- set_prolog_flag(back_quotes,string).
fasta_file([]) -->[].
fasta_file([Section|Sections]) -->
fasta_section(Section),
fasta_file(Sections).
fasta_section(Section) -->
fasta_description(Description),
fasta_seq(Sequence),
{Section =.. [section,Description,Sequence]}.
fasta_description(Description) -->
">",
string(Description),
{no_gt(Description),
no_nl(Description)}.
fasta_seq([]) --> [].
fasta_seq(Seq) -->
nt([S]),
fasta_seq(Ss),
{S="X"->Seq =Ss;Seq=[S|Ss]}.
nt("A") --> "A".
nt("C") --> "C".
nt("G") --> "G".
nt("T") --> "T".
nt("X") --> "\n".
no_gt([]).
no_gt([E|Es]):-
dif([E],">"),
no_gt(Es).
no_nl([]).
no_nl([E|Es]):-
dif([E],"\n"),
no_nl(Es).
Now this is clearly wrong. The behaviour I would like is
?-phrase(fasta_section(S),">frog\nACGGGGTACG\n>duck\nACGTTAG").
S = section("frog","ACGGGGTACG");
S = section("duck","ACGTTAG");
false.
But if I do phrase(fasta_file(Sections),">frog\nACGGGGTACG\n>duck\nACGTTAG"), Sections is unified with a list of section/2 terms, which is what I want, but my current code seems quite hacky, for example in how I have handled the newline character.
For sure, there are some 'small' typing problems:
nt("A") -->"A",
nt("C") -->"C",
nt("G") -->"G",
nt("T") -->"T".
should be
nt("A") -->"A".
nt("C") -->"C".
nt("G") -->"G".
nt("T") -->"T".
Anyway, I also had my share of problems debugging DCGs: I wrote a parser to load a MySQL dump (plain SQL, really) into Prolog, and it was a pain whenever something unexpected turned up, like escaped strings or weird UTF-8 (?) encodings.
I would suggest using phrase/3 to see whether there is an unparsable tail. It could also help to place some debug output after known, well-behaved sequences.
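A small sketch of both tips, assuming SWI-Prolog and library(dcg/basics): phrase/3 exposes the unparsed tail, and a hypothetical show_rest//1 nonterminal prints the remaining input at a chosen point without consuming anything. It assumes the input is a complete code list, so it is not meant for phrase_from_file/2:

```prolog
:- use_module(library(dcg/basics)).

% Hypothetical debugging nonterminal: print Label and the remaining
% input, then continue with the input unchanged.
show_rest(Label, Rest, Rest) :-
    format("~w: ~s~n", [Label, Rest]).

% Hypothetical example grammar: two integers separated by a comma.
pair(A, B) -->
    integer(A),
    show_rest(after_first),
    ",",
    integer(B).
```

?- phrase(pair(A, B), `12,34;junk`, Rest). prints after_first: ,34;junk, binds A = 12 and B = 34, and leaves the codes of ;junk in Rest, whereas phrase/2 on the same input simply fails.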
Of course, I assume you already tried to use the SWI-Prolog debugger.
Also, beware of
...
dif([E],">"),
...
Did you set the appropriate double_quotes flag? In DCG bodies the translation machinery takes care of matching, but by default in SWI-Prolog a list of codes doesn't match a double-quoted string...
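A minimal illustration, assuming SWI-Prolog 7 default flags: outside a DCG body, ">" is then a string, so dif([E], ">") holds for every code E and filters nothing. One flag-independent alternative, not taken from the question, is to compare against a character code written in 0'c notation:

```prolog
% Under default flags ">" is a string object, not the code list [62],
% so a test like dif([E], ">") never rejects anything.
% Comparing codes directly with 0'> works regardless of double_quotes.
no_gt([]).
no_gt([E|Es]) :-
    dif(E, 0'>),
    no_gt(Es).
```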
edit
This probably won't settle your question about a general strategy, but anyway, this is how I would handle the problem:
fasta_file([]) -->[].
fasta_file([Section|Sections]) -->
fasta_section(Section),
fasta_file(Sections).
fasta_section(section(Description,Sequence)) -->
fasta_description(Description),
fasta_seq(SequenceCs), {atom_codes(Sequence, SequenceCs)}, !.
fasta_description(Description) -->
">", string(DescriptionCs), "\n", {atom_codes(Description, DescriptionCs)}.
fasta_seq([S|Seq]) --> nt(S), fasta_seq(Seq).
fasta_seq([]) --> "\n" ; []. % optional \n at EOF
nt(0'A) --> "A".
nt(0'C) --> "C".
nt(0'G) --> "G".
nt(0'T) --> "T".
now
?- phrase(fasta_file(S), `>frog\nACGGGGTACG\n>duck\nACGTTAG`).
S = [section(frog, 'ACGGGGTACG'), section(duck, 'ACGTTAG')] ;
false.
Note: the order of the fasta_seq//1 clauses is important, since it implements eager parsing, mainly for efficiency. As I said, I had to parse SQL dumps, and several MB of input was common.
edit
?- phrase((string(_),fasta_section(S)), `>frog\nACGGGGTACG\n>duck\nACGTTAG`,_).
S = section(frog, 'ACGGGGTACG') ;
S = section(duck, 'ACGTTAG') ;
false.
fasta_section//1 is meant to match one specific section. To get all sections on backtracking, we must provide a choice point; in this case, string//1 from library(dcg/basics) does the job by skipping an arbitrary prefix.
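Since the question already loads library(pio), the same grammar can also be run directly on a file with phrase_from_file/2, which presents the file content to the grammar as a lazy list of codes. A sketch that repeats the grammar above; fasta_sections/2 and fasta_section_in_file/2 are hypothetical names, and remainder//1 from library(dcg/basics) accepts whatever follows the matched section:

```prolog
:- use_module(library(pio)).
:- use_module(library(dcg/basics)).

fasta_file([]) --> [].
fasta_file([Section|Sections]) -->
    fasta_section(Section),
    fasta_file(Sections).

fasta_section(section(Description,Sequence)) -->
    fasta_description(Description),
    fasta_seq(SequenceCs), {atom_codes(Sequence, SequenceCs)}, !.

fasta_description(Description) -->
    ">", string(DescriptionCs), "\n", {atom_codes(Description, DescriptionCs)}.

fasta_seq([S|Seq]) --> nt(S), fasta_seq(Seq).
fasta_seq([]) --> "\n" ; [].   % optional \n at EOF

nt(0'A) --> "A".
nt(0'C) --> "C".
nt(0'G) --> "G".
nt(0'T) --> "T".

% Hypothetical helper: all sections of File as one list.
fasta_sections(File, Sections) :-
    phrase_from_file(fasta_file(Sections), File).

% Hypothetical helper: one section per solution, on backtracking.
fasta_section_in_file(File, Section) :-
    phrase_from_file((string(_), fasta_section(Section), remainder(_)), File).
```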