Adding a parsing constraint to a DCG - prolog

Graphic tokens can serve as Prolog operators that don't require single quotes.
A translation of ISO/IEC 13211-1:1995, 6.4.2 "Syntax.Tokens.Names" is:
graphic_token --> kleene_plus(graphic_token_char).
graphic_token_char --> member("#$&*+-./:<=>?@^~\\").
% some auxiliary code
kleene_plus(NT) --> NT, kleene_star(NT).
kleene_star(NT) --> "" | kleene_plus(NT).
member(Xs) --> [X], { member(X,Xs) }.
Subsection 6.4.1 "Syntax.Tokens.Layout Text" adds the following constraint:
A graphic token shall not begin with the character sequence comment open (i.e., "/*").
Enforcing that restriction in the DCG is no big deal...
graphic_token --> graphic_token_char. % 1 char
graphic_token --> % 2+ chars
   [C1,C2],
   { phrase((graphic_token_char,graphic_token_char), [C1,C2]) },
   { dif([C1,C2], "/*") },
   kleene_star(graphic_token_char).
... but quite ugly!
How do I make it pretty again (and keep it bidirectional)?

I'm not sure this is prettier, but maybe something like this:
graphic_token --> kleene_plus_member("#$&*+-.:<=>?@^~\\",0'/).
graphic_token --> "/", kleene_star_member("#$&+-./:<=>?@^~\\", 0'*).
kleene_plus_member(Xs, Code) --> member(Xs), kleene_star(member([Code|Xs])).
kleene_star_member(Xs, Code) --> "" | member(Xs), kleene_star(member([Code|Xs])).
The first clause of graphic_token parses a graphic token that does not begin with /. The second clause parses one that does begin with / and makes sure that the next character, if there is one, is not *.
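To check that the two-clause version behaves as intended, here is a self-contained sketch that combines it with the auxiliary code from the question. It assumes that double_quotes is set to codes, so that "..." denotes a list of character codes, and it uses @ in the character sets. The queries in the comments are only illustrative.

```prolog
:- use_module(library(lists)).              % member/2
:- set_prolog_flag(double_quotes, codes).   % assumption: "..." is a code list

% Tokens that do not start with '/'.
graphic_token --> kleene_plus_member("#$&*+-.:<=>?@^~\\", 0'/).
% Tokens that start with '/', where the second char (if any) is not '*'.
graphic_token --> "/", kleene_star_member("#$&+-./:<=>?@^~\\", 0'*).

kleene_plus_member(Xs, Code) --> member(Xs), kleene_star(member([Code|Xs])).
kleene_star_member(Xs, Code) --> "" | member(Xs), kleene_star(member([Code|Xs])).

% Auxiliary code from the question.
kleene_plus(NT) --> NT, kleene_star(NT).
kleene_star(NT) --> "" | kleene_plus(NT).
member(Xs) --> [X], { member(X, Xs) }.

% ?- phrase(graphic_token, "/").     % succeeds
% ?- phrase(graphic_token, "*/").    % succeeds
% ?- phrase(graphic_token, "//*").   % succeeds
% ?- phrase(graphic_token, "/*").    % fails
```

Only the first two characters are constrained: once the token is known not to start with /*, the rest is matched against the full character set.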

Related

Prolog how to eliminate left recursion

I have written a DCG (with adjective phrases and prepositional phrases) in Prolog. When I try to run it by entering ip([every,boy,loved,some,girl]), it reports that it is out of local stack. I realised there is something wrong with nbar. Can someone help me out? Many thanks.
%tree
treeP(Term):-
    % Print the tree assuming indentation 0
    treeP(0,Term),
    % Tidy up with linefeed
    nl.
treeP(_N,Tree):-
    % Tree is just a variable
    var(Tree),!,
    write(Tree).
treeP(N,[Tree|Trees]):-
    proper_list([Tree|Trees]),!,
    write('['),
    N1 is N+1,
    treePNEL(N1,[Tree|Trees]),
    write(']').
treeP(N,Tree):-
    % Nonatomic case
    Tree=..[Functor,Argument|Arguments],
    !,
    % Write the functor and opening parenthesis
    write(Functor),write('('),
    % Set N1 to new indentation for arguments
    atom_length(Functor,M), N1 is N+M+1,
    % Pretty-print the arguments
    treePNEL(N1,[Argument|Arguments]),
    % Write right parenthesis
    write(')').
treeP(_N,Tree):-
    % Noncompound case
    write(Tree).
treePNEL(N,[Tree1,Tree2|Trees]):-
    treeP(N,Tree1),
    % Go to correct position for further printing
    nl, tab(N),
    treePNEL(N,[Tree2|Trees]).
treePNEL(N,[Tree]):-
    treeP(N,Tree).
ip(Sentence):-
    setof(IP,
          ip(IP,Sentence,[]),
          IP),
    treeP(IP).
ip(SSem) --> np(NPSem), ibar(IbarSem),
    {var_replace(NPSem,NPSem1),
     beta(NPSem1#IbarSem,SSem)}.
ibar(VPSem) --> i(MvdVbL),vp(VPSem,MvdVbL).
i([]) --> [].
i([]) --> [Aux],{isAux(Aux)}.
i([Verb]) --> [InflVerb],{pastInfl(Verb,InflVerb),isVerb(Verb)}.
pastInfl(see,saw).
pastInfl(love,loved).
vp(VbarSem,MvdVbL) --> vbar(VbarSem,MvdVbL).
vbar(VbarSem,MvdVbL) --> v(VSem,MvdVbL), np(NPSem),
    {var_replace(VSem,VSem1),
     beta(VSem1#NPSem,VbarSem)}.
v(lbd(s, lbd(x,s#lbd(y,Fla))),[]) --> [Verb],
    {isVerb(Verb),Fla=..[Verb,x,y]}.
v(lbd(s,lbd(x,s#lbd(y,Fla))),[MvdVb])--> [],
    {Fla=..[MvdVb,x,y]}.
np(NbarSem) --> nbar(NbarSem).
nbar(NbarSem) --> adj(AdjSem),nbar(NbarSem1),
    {var_replace(AdjSem,AdjSem1),beta(AdjSem1#NbarSem1,NbarSem)}.
nbar(NbarSem) --> det(DetSem),nbar(NbarSem1),
    {var_replace(DetSem,DetSem1),beta(DetSem1#NbarSem1,NbarSem)}.
nbar(NSem) --> n(NSem).
nbar(NbarSem) --> nbar(NbarSem1),pp(PPSem),
    {var_replace(PPSem,PPSem1),beta(PPSem1#NbarSem1,NbarSem)}.
nbar(NSem) --> n(NSem).
pp(PPSem) --> pbar(PPSem).
pbar(NbarSem) --> po(PPSem),np(NbarSem1),
    {var_replace(PPSem,PPSem1),beta(PPSem1#NbarSem1,NbarSem)}.
isVerb(love).
n(lbd(x,boy(x))) --> [boy].
n(lbd(x,girl(x))) --> [girl].
det(lbd(q,lbd(p,exists(x,(q#x & p#x))))) --> [some].
det(lbd(q,lbd(p,forall(x,(q#x -> p#x))))) --> [every].
nbar(NbarSem) --> adj(AdjSem),nbar(NbarSem1)
nbar(NbarSem) --> det(DetSem),nbar(NbarSem1)
nbar(NbarSem) --> nbar(NbarSem1),pp(PPSem)
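The last of these rules is left-recursive: nbar calls itself before consuming any input, so a top-down parse keeps recursing until the local stack is exhausted. Tabling is implemented in recent versions of SWI-Prolog. If you declare the left-recursive predicates (or non-terminals) as tabled, you can keep their definitions unchanged. For details, consult: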
http://www.swi-prolog.org/pldoc/man?section=tabling
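If tabling is not available, the left recursion can also be removed by hand. The following is a minimal sketch on a stripped-down grammar, not your full grammar: the lambda semantics are replaced by plain terms, and nbar_rest//2, mod/2, pp/2 and the word with are names made up for the illustration.

```prolog
% Left-recursive shape (loops in a top-down parser):
%   nbar --> nbar, pp.
%   nbar --> n.
%
% Equivalent shape without left recursion: parse the noun first,
% then collect any number of trailing PPs in an accumulator.
nbar(Sem) --> n(N), nbar_rest(N, Sem).

nbar_rest(Acc, Sem) --> pp(PP), nbar_rest(mod(Acc, PP), Sem).
nbar_rest(Sem, Sem) --> [].

% Hypothetical toy lexicon and PP rule, for illustration only.
n(boy)  --> [boy].
n(girl) --> [girl].
pp(pp(with, N)) --> [with], nbar(N).

% ?- phrase(nbar(T), [boy, with, girl]).
% T = mod(boy, pp(with, girl)).
```

Every recursive call now happens after at least one word has been consumed, so the parse terminates on any finite input.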

Parsing grammar using DCG [SICStus]

I have a large syntax to parse (in order to build a syntax analyzer). The problem is that there is redundancy somewhere in my code, but I don't know where it is.
Part of the grammar:
Grammar_types
Type :: = Basic_Type
| "PP" "(" Type ")"
| Type "*" Type
| "struct" "(" (Ident ":" Type)+"," ")"
| "(" Type ")" .
Basic_Type :: = "ZZ"| "BOOL" | "STRING" | Ident .
I first tried to analyze this grammar without a DCG, for example to parse Id :: = Id ((Id) * ",") *
Example_1
"id","id_0(id1,id2,..)"
Code_1
Entete_ (ID, Id, Ids) - atom_concat(XY,')', ID),
atom_concat(XX,Ids, XY),check_ids(Ids),
atom_concat(Id,'(',XX),check_id(Id) ,!.
...
But after some searching, I found that DCGs are one of the most effective ways to write parsers, so I came up with the code below:
Code_2
type(Type) --> "struct(", idents_types(Type),")"
| "PP(",ident(Type),")"
| "(",type(Type),")"
| type(Type),"*",type(Type)
| basic_type(Type)
| "error1_type".
...
Example_Syntax:
"ZZ" ; "PP(ZZ*STRING)" ; "struct(x:personne,struct(y:PP(PP))" ; "ZZ*ZZ" ...
Test
| ?- phrase(type(L),"struct(aa:struct())").
! Resource error: insufficient memory
% source_info
I think the problem is here (idents_types):
| ?- phrase(idents_types(L),"struct(aa:STRING)").
! Resource error: insufficient memory
Expected result
| ?- type('ZZ*struct(p1:STRING,p2:PP(STRING),p3:(BOOL*PP(STRING)),p4:PP(personne*BOOL))').
p1-STRING
STRING
p2-PP(STRING)
STRING
p3-(BOOL*PP(STRING))
STRING
BOOL
p4-PP(personne*BOOL)
BOOL
personne
ZZ
yes
So my question is: why am I receiving this error, and how can I fix it?
You have a left recursion on type//1.
type(Type) --> ... | type(Type),"*",type(Type) | ...
You can look into this question for further information.
Top-down parsers, from which DCGs borrow, need a way to look ahead one symbol so that the analysis is driven in the right direction.
The usual solution to this problem, as indicated in the linked question, is to introduce a helper nonterminal that either left-associates the recursive applications of the culprit rule or is epsilon (which terminates the recursion).
An epsilon rule is written like
rule --> [].
The transformation can require a fair bit of thinking... In this answer I suggest a bottom-up alternative implementation, which could be worthwhile if the grammar cannot be transformed, for practical or theoretical reasons (LR grammars are more general than LL).
You may want to try this simple-minded transformation, but it certainly leaves several details to be resolved.
type([Type,T]) --> "struct(", idents_types(Type),")", type_1(T)
| "PP(",ident(Type),")", type_1(T)
| "(",type(Type),")", type_1(T)
| basic_type(Type), type_1(T)
| "error1_type", type_1(T).
type_1([Type1,Type2,T]) --> type(Type1),"*",type(Type2), type_1(T).
type_1([]) --> [].
edit
I fixed several problems, in both your code and mine. Now it parses the example...
type([Type,T]) --> "struct(", idents_types(Type), ")", type_1(T)
| "PP(", type(Type), ")", type_1(T)
| "(", type(Type), ")", type_1(T)
| basic_type(Type), type_1(T)
| "error1_type", type_1(T).
% my mistake here...
type_1([]) --> [].
type_1([Type,T]) --> "*",type(Type), type_1(T).
% the output Type was unbound on ZZ,etc
basic_type('ZZ') --> "ZZ".
basic_type('BOOL') --> "BOOL".
basic_type('STRING') --> "STRING".
basic_type(Type) --> ident(Type).
% here would be better to factorize ident(Ident),":",type(Type)
idents_types([Ident,Type|Ids]) --> ident(Ident),":",type(Type),",",
idents_types(Ids).
idents_types([Ident,Type]) --> ident(Ident),":",type(Type).
idents_types([]) --> [].
% ident//1 forgot to 'eat' a character
ident(Id) --> [C], { between(0'a,0'z,C), C\=0'_ },
    ident_1(Cs),
    { atom_codes(Id,[C|Cs]), last(Cs,L), L\=0'_ }.
ident_1([C|Cs]) --> [C], { between(0'a,0'z,C) ; between(0'0,0'9,C) ; C=0'_ },
    ident_1(Cs).
ident_1([]) --> [].
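For comparison, here is a smaller sketch of the same transformation that builds a left-associative term for Type "*" Type by passing an accumulator through the helper nonterminal. It covers only part of the grammar (no struct, no identifiers), and atom_type//1, type_rest//2 and prod/2 are hypothetical names that do not appear in the code above. It assumes that double_quotes is set to codes.

```prolog
:- set_prolog_flag(double_quotes, codes).   % assumption: "..." is a code list

% Type ::= AtomType { "*" AtomType }, read as left-associative.
type(T) --> atom_type(T0), type_rest(T0, T).

% type_rest(+Acc, -T): Acc is the type parsed so far.
type_rest(Acc, T) --> "*", atom_type(T1), type_rest(prod(Acc, T1), T).
type_rest(T, T)   --> [].                   % epsilon rule ends the recursion

atom_type(zz)     --> "ZZ".
atom_type(bool)   --> "BOOL".
atom_type(string) --> "STRING".
atom_type(pp(T))  --> "PP(", type(T), ")".
atom_type(T)      --> "(", type(T), ")".

% ?- phrase(type(T), "ZZ*BOOL*STRING").
% T = prod(prod(zz, bool), string).
% ?- phrase(type(T), "PP(ZZ*STRING)").
% T = pp(prod(zz, string)).
```

Because every alternative of atom_type//1 starts with a terminal, no rule can call itself before consuming input.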

Parsing an integer in a string?

I was going through this example on DCG
integer(I) -->
    digit(D0),
    digits(D),
    { number_codes(I, [D0|D])
    }.
digits([D|T]) -->
    digit(D), !,
    digits(T).
digits([]) -->
    [].
digit(D) -->
    [D],
    { code_type(D, digit)
    }.
But this example parses an integer only if it is at the beginning of the string (because digit(D0) fails if D0 is not a digit code).
How do I go about parsing an integer anywhere in the string, e.g. "abc123def"?
You might add something like this:
non_digits--> [D], {not(code_type(D, digit))}, !, non_digits.
non_digits-->[].
and then add a call to non_digits to skip non digits, e.g.:
integer_skip(I) -->
    non_digits,
    digit(D0),
    digits(D),
    {
        number_codes(I, [D0|D])
    },
    non_digits.
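Put together, the pieces can be tried as follows. This is only a sketch: it replaces code_type/2 with an explicit range test so that it does not depend on a particular Prolog system, and it assumes that double_quotes is set to codes.

```prolog
:- set_prolog_flag(double_quotes, codes).   % assumption: "..." is a code list

integer_skip(I) -->
    non_digits,
    digit(D0),
    digits(D),
    { number_codes(I, [D0|D]) },
    non_digits.

% Skip as many non-digit codes as possible.
non_digits --> [C], { \+ is_digit_code(C) }, !, non_digits.
non_digits --> [].

digits([D|T]) --> digit(D), !, digits(T).
digits([])    --> [].

digit(D) --> [D], { is_digit_code(D) }.

% Stand-in for code_type(C, digit), restricted to ASCII digits.
is_digit_code(C) :- C >= 0'0, C =< 0'9.

% ?- phrase(integer_skip(I), "abc123def").
% I = 123.
```

Note that this accepts exactly one run of digits: an input such as "abc123def456" is rejected, because the trailing non_digits stops at the second number.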

DCG parser in Prolog for logic statements

Given the following grammar:
BoolExpr ::= BoolConj { "or" BoolConj }.
BoolConj ::= BoolLit { "and" BoolLit }.
BoolLit ::= [ "not" ] BoolPosLit.
BoolPosLit ::= "true"| "false"| "(" BoolExpr ")".
I want to write a DCG parser for the grammar above. The parser only has to accept (succeed) or reject (fail) on a given input.
Here is what I have:
boolepxr([or(O)|BoolExpr]) -->
['or'],
boolconj(O),
boolexpr(BoolExpr).
boolconj([and(A)|BoolConj]) -->
['and'],
boollit(A),
boolconj(BoolConj).
boollit([not(N)|BoolLit]) -->
['not'],
boolps(N).
boolps([true(B)|BoolPs]) -->
['true'],
boolexpr(B),
boolps(BoolPS).
boolps([false(B)|BoolPs]) -->
['false'],
boolexpr(B),
boolps(BoolPS).
However, when I run this program, I didn't get the appropriate output.
Since the DCG only has to recognize the input, I have simplified it:
boolexpr --> boolconj, [or], boolexpr.
boolexpr --> boolconj.
boolconj --> boollit, [and], boolconj.
boolconj --> boollit.
boollit --> [not], boolps.
boollit --> boolps.
boolps --> [true].
boolps --> [false].
boolps --> ['('], boolexpr, [')'].
yields
?- phrase(boolexpr, [true,or,false,and,not,true]).
true ;
false.
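The { ... } repetitions of the EBNF can also be translated one-to-one, with one helper nonterminal per repetition. Here is a sketch of that variant, where or_rest//0, and_rest//0 and boolposlit//0 are names I made up, followed by a few more test inputs:

```prolog
% BoolExpr ::= BoolConj { "or" BoolConj }.
boolexpr --> boolconj, or_rest.
or_rest  --> [or], boolconj, or_rest.
or_rest  --> [].

% BoolConj ::= BoolLit { "and" BoolLit }.
boolconj --> boollit, and_rest.
and_rest --> [and], boollit, and_rest.
and_rest --> [].

% BoolLit ::= [ "not" ] BoolPosLit.
boollit --> [not], boolposlit.
boollit --> boolposlit.

% BoolPosLit ::= "true" | "false" | "(" BoolExpr ")".
boolposlit --> [true].
boolposlit --> [false].
boolposlit --> ['('], boolexpr, [')'].

% ?- phrase(boolexpr, ['(', true, or, false, ')', and, not, true]).  % succeeds
% ?- phrase(boolexpr, [true, or]).                                   % fails
% ?- phrase(boolexpr, [not, not, true]).                             % fails
```

Both formulations accept the same token lists; this one just keeps the shape of each EBNF rule visible.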

String tokenization in prolog

I have the following context free grammar in a text file 'grammar.txt'
S ::= a S b
S ::= []
I can open this file and read each line in Prolog.
Now I want to tokenize each line and generate a list such as
L=[['S','::=','a','S','b'],['S','::=','#']] ('#' represents empty)
How can I do this?
Write the specification as a DCG. Here is the basic idea (untested); you'll need to refine it.
parse_grammar([Rule|Rules]) -->
    parse_rule(Rule),
    parse_grammar(Rules).
parse_grammar([]) --> [].
parse_rule([NT, '::=' | Body]) -->
    parse_symbol(NT),
    skip_space,
    "::=",
    skip_space,
    parse_symbols(Body),
    skip_space, !. % the cut is required if you use findall/3 (see below)
parse_symbols([S|Rest]) -->
    parse_symbol(S),
    skip_space,
    parse_symbols(Rest).
parse_symbols([]) --> [].
parse_symbol(S) -->
    [C], {code_type(C, alpha), atom_codes(S, [C])}.
skip_space -->
    [C], {code_type(C, space)}, skip_space.
skip_space --> [].
This parses the whole file, using this top-level call:
...,
read_file_to_codes('grammar.txt', Codes, []),
phrase(parse_grammar(Grammar), Codes, []).
You say you read the file one line at a time; in that case use
...
findall(R, (get_line(L), phrase(parse_rule(R), L, [])), Grammar).
HTH
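If you only need the tokens of each line and not a grammar-aware parse, a plain whitespace tokenizer may be enough. This is a different and simpler technique than the DCG above, sketched under the assumptions that tokens are separated by blanks and that double_quotes is set to codes. tokens//1, token//1 and the other names are made up for the example.

```prolog
:- set_prolog_flag(double_quotes, codes).   % assumption: "..." is a code list

% tokens(-Atoms): split a line of codes on whitespace.
tokens([T|Ts]) --> blanks, token(T), !, tokens(Ts).
tokens([])     --> blanks.

% Codes up to 32 (space) are treated as layout.
blanks --> [C], { C =< 32 }, !, blanks.
blanks --> [].

% A token is a maximal run of non-blank codes, returned as an atom.
token(A) --> nonblank(C), nonblanks(Cs), { atom_codes(A, [C|Cs]) }.

nonblank(C) --> [C], { C > 32 }.

nonblanks([C|Cs]) --> nonblank(C), !, nonblanks(Cs).
nonblanks([])     --> [].

% ?- phrase(tokens(Ts), "S ::= a S b").
% Ts = ['S', ::=, a, 'S', b].
```

Mapping the empty right-hand side to '#' is not handled here; that would be a post-processing step on the token list.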

Resources