DCG parser in Prolog for logic statements

Given the following grammar:
BoolExpr ::= BoolConj { "or" BoolConj }.
BoolConj ::= BoolLit { "and" BoolLit }.
BoolLit ::= [ "not" ] BoolPosLit.
BoolPosLit ::= "true" | "false" | "(" BoolExpr ")".
I want to write a DCG parser for the grammar above. The parser only has to accept (succeed) or reject (fail) on a given input.
Here is what I have:
boolepxr([or(O)|BoolExpr]) -->
['or'],
boolconj(O),
boolexpr(BoolExpr).
boolconj([and(A)|BoolConj]) -->
['and'],
boollit(A),
boolconj(BoolConj).
boollit([not(N)|BoolLit]) -->
['not'],
boolps(N).
boolps([true(B)|BoolPs]) -->
['true'],
boolexpr(B),
boolps(BoolPS).
boolps([false(B)|BoolPs]) -->
['false'],
boolexpr(B),
boolps(BoolPS).
However, when I run this program, I don't get the expected output.

Since the DCG only has to recognize the input, I have simplified it:
boolexpr --> boolconj, [or], boolexpr.
boolexpr --> boolconj.
boolconj --> boollit, [and], boolconj.
boolconj --> boollit.
boollit --> [not], boolps.
boollit --> boolps.
boolps --> [true].
boolps --> [false].
boolps --> ['('], boolexpr, [')'].
This yields:
?- phrase(boolexpr, [true,or,false,and,not,true]).
true ;
false.
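The braces in the EBNF denote zero-or-more repetition, and that can also be encoded with one helper nonterminal per repetition instead of re-parsing the left operand in a second clause. The following is a minimal sketch of that variant; the names boolexpr_rest//0 and boolconj_rest//0 are made up for this illustration and do not appear in the code above.

```prolog
% Recognizer for the grammar above.
% Each EBNF repetition { ... } becomes a *_rest nonterminal (hypothetical names).

boolexpr --> boolconj, boolexpr_rest.

boolexpr_rest --> [or], boolconj, boolexpr_rest.   % one more "or" operand
boolexpr_rest --> [].                              % or stop

boolconj --> boollit, boolconj_rest.

boolconj_rest --> [and], boollit, boolconj_rest.   % one more "and" operand
boolconj_rest --> [].                              % or stop

boollit --> [not], boolps.                         % optional "not"
boollit --> boolps.

boolps --> [true].
boolps --> [false].
boolps --> ['('], boolexpr, [')'].
```

With these rules, phrase(boolexpr, ['(',true,or,false,')',and,not,false]) should succeed and phrase(boolexpr, [true,or]) should fail. Note that the input is a list of atoms, so parentheses have to be quoted.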

Related

How to deal with [a,b|c] format lists?

I created a simple parser for a subset of the Clojure language. For some reason it returns a list in the format [a,b,c|d] instead of [a,b,c,d]. Moreover, member(X, List) does not work properly with such a list, i.e.
member(X, [a,b,c|d]).
X = a ;
X = b ;
X = c.
The question is how I should improve my code to fix this problem and get a list in the usual format. Or is there a way to transform [a,b,c|d] into [a,b,c,d]?
You can call e.g.
main.
[(concat x) (lambda x (inc (inc x)))]
And get:
expr([expr([expr(at(id([c,o,n,c,a,t])))|expr(at(id([x])))])|expr([expr(at(id([l,a,m,b,d,a]))),expr(at(id([x])))|expr([expr(at(id([i,n,c])))|expr([expr(at(id([i,n,c])))|expr(at(id([x])))])])])])
Code:
mydelimiter --> delimiter.
mydelimiter --> delimiter, mydelimiter.
delimiter --> [','].
delimiter --> ['\n'].
delimiter --> ['\t'].
delimiter --> ['\s'].
specsymbol('+') --> ['+'].
specsymbol('-') --> ['-'].
specsymbol('>') --> ['>'].
specsymbol('<') --> ['<'].
specsymbol('=') --> ['='].
specsymbol('*') --> ['*'].
specsymbol('_') --> ['_'].
snum(0) --> ['0'].
snum(1) --> ['1'].
snum(2) --> ['2'].
snum(3) --> ['3'].
snum(4) --> ['4'].
snum(5) --> ['5'].
snum(6) --> ['6'].
snum(7) --> ['7'].
snum(8) --> ['8'].
snum(9) --> ['9'].
numb([A]) --> snum(A).
numb([A|B]) --> snum(A), numb(B).
mynumber(num(X)) --> numb(X).
mystring(str([])) --> quotesymbol, quotesymbol.
mystring(str(S)) --> quotesymbol, anychars(S), quotesymbol.
quotesymbol --> ['\"'].
anychar(A) --> [A], {A \== '\"'}.
anychars([A]) --> anychar(A).
anychars([A|B]) --> anychar(A), anychars(B).
identifier(id(I)) --> id_start_spec(I); id_start_letter(I).
letter(L) --> [L], {is_alpha(L)}.
id_start_letter([L]) --> letter(L).
id_start_letter([L|I]) --> letter(L), ids_l(I).
ids_l([I]) --> letter(I); snum(I); specsymbol(I).
ids_l([I|Is]) --> (letter(I); snum(I); specsymbol(I)), ids_l(Is).
id_start_spec([S]) --> specsymbol(S).
id_start_spec([S|I]) --> specsymbol(S), ids_s(I).
ids_s([I]) --> snum(I); specsymbol(I).
ids_s([I|Is]) --> (snum(I); specsymbol(I)), ids_s(Is).
keyword(kw([C|K])) --> mycolonsymbol(C), id_start_letter(K).
mycolonsymbol(':') --> [':'].
myatom(at(A)) --> mynumber(A); mystring(A); identifier(A); keyword(A).
expression(expr(S)) --> myatom(S).
expression(expr(S)) --> r_br_expression(S).
expression(expr(S)) --> s_br_expression(S).
expression(expr(S)) --> f_br_expression(S).
r_br_expression(S) --> r_openbracketsymbol, expressions(S), r_closedbracketsymbol.
expressions(S) --> expression(S).
expressions([S|SS]) --> expression(S), mydelimiter, expressions(SS).
r_openbracketsymbol --> ['('].
r_closedbracketsymbol --> [')'].
s_br_expression(S) --> s_openbracketsymbol, expressions(S), s_closedbracketsymbol.
s_openbracketsymbol --> ['['].
s_closedbracketsymbol --> [']'].
f_br_expression(S) --> f_openbracketsymbol, expressions(S), f_closedbracketsymbol.
f_openbracketsymbol --> ['{'].
f_closedbracketsymbol --> ['}'].
main :-
read_string(user_input, "\n", "", _, StrIn),
atom_chars(StrIn, L),
phrase(expression(T), L),
writeln(T),
!.
First of all, use a more readable syntax by putting the following directive at the top of your code:
:- set_prolog_flag(double_quotes, chars).
Now you can write the more compact "}" instead of ['}'], and the standard " " instead of ['\s']. Likewise, you can write specsymbol(+) --> "+".
Second, define letter//1 and anychar//1 like so:
letter(L) --> [L], {char_type(L,alpha)}.
anychar(A) --> [A], {dif(A, '\"')}.
With this you can start to debug, best by using smaller test cases.
Also note that Prolog has a top level (prolog-toplevel), so there is no need to define main at all. Instead, you can type your query directly, or even better, consider a simpler case first:
?- phrase(expression(T),"(inc x)").
T = expr([expr(at(id("inc")))|expr(at(id("x")))]).
?- phrase(expression(T),"(x)").
T = expr(expr(at(id("x")))).
So in the first case there is an odd instance of a partial list and in the second there is no list at all. Instead, both should be (well formed) lists. The culprit is expressions//1 which should rather read:
expressions([S]) --> expression(S).
expressions([S|Ss]) -->
{Ss = [_|_]}, % redundant goal for termination
expression(S), mydelimiter, expressions(Ss).
Note that also
?- phrase(expression(T),"(inc,,,x)").
succeeds, and I am not sure that this is intended.
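To see the effect of the corrected base case in isolation, here is a small sketch with the same shape as expressions//1. The nonterminals item//1, delimiters//0, items//1 and bracketed//1 are simplified stand-ins invented for this example (an item is just one lowercase letter, a delimiter is a space); they are not the original grammar.

```prolog
:- set_prolog_flag(double_quotes, chars).

% Stand-in for expression//1: a single lowercase letter.
item(C) --> [C], { char_type(C, lower) }.

% Stand-in for mydelimiter//0: one or more spaces.
delimiters --> " ".
delimiters --> " ", delimiters.

% Same structure as the corrected expressions//1:
% the base case wraps the last element in a list, so the tail is always [].
items([X]) --> item(X).
items([X|Xs]) --> item(X), delimiters, items(Xs).

bracketed(Xs) --> "(", items(Xs), ")".
```

With this, phrase(bracketed(Xs), ['(',a,' ',b,' ',c,')']) should give Xs = [a,b,c], and a single element as in ['(',x,')'] should give Xs = [x], so both cases are well-formed lists.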

Adding a parsing constraint to a DCG

Graphic tokens can serve as Prolog operators that don't require single quotes.
A translation of ISO/IEC 13211-1:1995, 6.4.2 "Syntax.Tokens.Names" is:
graphic_token --> kleene_plus(graphic_token_char).
graphic_token_char --> member("#$&*+-./:<=>?@^~\\").
% some auxiliary code
kleene_plus(NT) --> NT, kleene_star(NT).
kleene_star(NT) --> "" | kleene_plus(NT).
member(Xs) --> [X], { member(X,Xs) }.
Subsection 6.4.1 "Syntax.Tokens.Layout Text" adds the following constraint:
A graphic token shall not begin with the character sequence comment open (i.e., "/*").
Enforcing that restriction in the DCG is no big deal...
graphic_token --> graphic_token_char. % 1 char
graphic_token --> % 2+ chars
[C1,C2],
{ phrase((graphic_token_char,graphic_token_char), [C1,C2]) },
{ dif([C1,C2], "/*") },
kleene_star(graphic_token_char).
... but quite ugly!
How do I make it pretty again (and keep it bidirectional)?
I'm not sure this is prettier, but maybe something like this:
graphic_token --> kleene_plus_member("#$&*+-.:<=>?@^~\\",0'/).
graphic_token --> "/", kleene_star_member("#$&+-./:<=>?@^~\\", 0'*).
kleene_plus_member(Xs, Code) --> member(Xs), kleene_star(member([Code|Xs])).
kleene_star_member(Xs, Code) --> "" | member(Xs), kleene_star(member([Code|Xs])).
The first clause of graphic_token parses a graphic token that does not begin with /; the second clause parses one that does begin with /, provided the next character is not *.
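The same case split on the first character can be written out as a self-contained sketch that uses character lists throughout. This is only one possible reading of the idea above: member_of//1 and rest_after_slash//0 are hypothetical helper names, and the character set assumes the ISO graphic characters # $ & * + - . / : < = > ? @ ^ ~ plus the backslash.

```prolog
:- set_prolog_flag(double_quotes, chars).

% A token starting with "/" must not continue with "*".
graphic_token --> "/", rest_after_slash.
% Any other first character may be followed by arbitrary graphic characters.
graphic_token --> member_of("#$&*+-.:<=>?@^~\\"), kleene_star(graphic_token_char).

rest_after_slash --> [].
rest_after_slash --> member_of("#$&+-./:<=>?@^~\\"), kleene_star(graphic_token_char).

graphic_token_char --> member_of("#$&*+-./:<=>?@^~\\").

% Zero or more repetitions of the nonterminal NT.
kleene_star(_) --> [].
kleene_star(NT) --> NT, kleene_star(NT).

% One character taken from the list Xs.
member_of(Xs) --> [X], { member(X, Xs) }.
```

Under these assumptions, phrase(graphic_token, ['/','*']) should fail, while ['/'], ['*','/'] and ['/','/','*'] should all be accepted.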

convert EBNF to BNF and use it as DCG format on Prolog

As part of my project I am supposed to convert EBNF to BNF and use DCG to program BNF in SWI-Prolog.
EBNF is as follows:
program -> int main ( ) { declarations statements }
declarations -> { declaration }
declaration -> type identifier [ [digit] ] ;
type -> int | bool | float | char
statements -> { statement }
statement -> ; | block | assignment | if_statement | while_statement
block -> { statements }
assignment -> identifier [ [digit] ] = expression ;
if_statement -> if ( expression ) statement
while_statement -> while ( expression ) statement
expression -> conjunction { || conjunction }
conjunction -> equality { && equality }
equality -> relation [ equ_op relation ]
equ_op -> == | !=
relation -> addition [ rel_op addition ]
rel_op -> < | <= | > | >=
addition -> term { add_op term }
add_op -> + | -
term -> factor { mul_op factor }
mul_op -> * | / | %
factor -> [ unary_op ] primary
unary_op -> - | !
primary -> identifier [ [digit] ] | literal | ( expression ) | type ( expression )
literal -> digit | boolean
identifier -> A | ... | Z
boolean -> true | false
digit -> 0 | ... | 9
My program should take the source file as input and print a message saying whether the program is syntactically correct or not.
I have no experience with Prolog, and the videos, tutorials, and blog posts I have found have not helped (probably because of my lack of experience). Can anybody help me get started?
I solved this question. It was kind of easy:
program --> ["int"], ["main"], ["("], [")"], ["{"], declarations,
statements, ["}"].
declarations --> declaration.
declarations --> declaration, declarations.
declarations --> [].
declaration --> type, identifier, [";"].
declaration --> type, identifier, ["["], digit, ["]"], [";"].
type --> ["int"].
type --> ["bool"].
type --> ["float"].
type --> ["char"].
statements --> statement.
statements --> statement, statements.
statements --> [].
statement --> [";"].
statement --> block.
statement --> assignment.
statement --> if_statement.
statement --> while_statement.
block --> ["{"], statements, ["}"].
assignment --> identifier, ["="], expression, [";"].
assignment --> identifier, ["["], digit, ["]"], ["="], expression, [";"].
if_statement --> ["if"], ["("], expression, [")"], statement.
while_statement --> ["while"], ["("], expression, [")"], statement.
expression --> conjunction, conjunctions.
conjunctions --> ["||"], conjunction.
conjunctions --> ["||"], conjunction, conjunctions.
conjunctions --> [].
conjunction --> equality, equalities.
equalities --> ["&&"], equality.
equalities --> ["&&"], equality, equalities.
equalities --> [].
equality --> relation.
equality --> relation, equ_op, relation.
equ_op --> ["=="].
equ_op --> ["!="].
relation --> addition.
relation --> addition, rel_op, addition.
rel_op --> ["<"].
rel_op --> ["<="].
rel_op --> [">"].
rel_op --> [">="].
addition --> term, terms.
terms --> add_op, term.
terms --> add_op, term, terms.
terms --> [].
add_op --> ["+"].
add_op --> ["-"].
term --> factor, factors.
factors --> mul_op, factor.
factors --> mul_op, factor, factors.
factors --> [].
mul_op --> ["*"].
mul_op --> ["/"].
mul_op --> ["%"].
factor --> primary.
factor --> unary_op, primary.
unary_op --> ["-"].
unary_op --> ["!"].
primary --> identifier.
primary --> identifier, ["["], digit, ["]"].
primary --> literal.
primary --> ["("], expression, [")"].
primary --> type, ["("], expression, [")"].
literal --> digit.
literal --> boolean.
identifier --> ["A"].
identifier --> ["B"].
identifier --> ["C"].
identifier --> ["D"].
identifier --> ["E"].
identifier --> ["F"].
identifier --> ["G"].
identifier --> ["H"].
identifier --> ["I"].
identifier --> ["J"].
identifier --> ["K"].
identifier --> ["L"].
identifier --> ["M"].
identifier --> ["N"].
identifier --> ["O"].
identifier --> ["P"].
identifier --> ["Q"].
identifier --> ["R"].
identifier --> ["S"].
identifier --> ["T"].
identifier --> ["U"].
identifier --> ["V"].
identifier --> ["W"].
identifier --> ["X"].
identifier --> ["Y"].
identifier --> ["Z"].
boolean --> ["true"].
boolean --> ["false"].
digit --> ["0"].
digit --> ["1"].
digit --> ["2"].
digit --> ["3"].
digit --> ["4"].
digit --> ["5"].
digit --> ["6"].
digit --> ["7"].
digit --> ["8"].
digit --> ["9"].
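The grammar above expects a list of string tokens such as ["int", "main", "("], so the source text has to be split into tokens before calling phrase/2. The sketch below shows the two EBNF-to-BNF patterns ({ x } as a recursive helper with an empty alternative, [ x ] as two alternatives) on a tiny fragment, together with a naive whitespace tokenizer. It assumes SWI-Prolog 7 or later with default flags, where "+" is a string object and split_string/4 is available; the predicates syntax_ok/1 and check/1 and the simplified term rule are inventions for this example.

```prolog
% EBNF: addition -> term { add_op term }
addition --> term, addition_rest.
addition_rest --> [].
addition_rest --> add_op, term, addition_rest.

% EBNF (simplified for illustration): term -> [ unary_op ] digit
term --> digit.
term --> unary_op, digit.

add_op --> ["+"].
add_op --> ["-"].
unary_op --> ["-"].

digit --> [D], { memberchk(D, ["0","1","2","3","4","5","6","7","8","9"]) }.

% syntax_ok(+Source): Source is a string whose tokens are separated by white space.
syntax_ok(Source) :-
    split_string(Source, " \n\t", " \n\t", Parts),
    exclude(==(""), Parts, Tokens),      % drop empty strings, e.g. for empty input
    phrase(addition, Tokens).

% check(+Source): print a verdict instead of just succeeding or failing.
check(Source) :-
    (   syntax_ok(Source)
    ->  writeln("The program is syntactically correct.")
    ;   writeln("The program contains a syntax error.")
    ).
```

For example, syntax_ok("1 + - 2") should succeed and syntax_ok("1 +") should fail. A real source file would need a proper tokenizer, because tokens such as "(" and ")" are not always separated by white space.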

Semantic Representation for English sentences in Prolog

I am trying to get a semantic representation for English sentences by using a DCG in Prolog.
This is the outcome I should be aiming for:
?- s(Sem,[john,sleeps],[]).
Sem=sleeps(john).
This is my current status:
?- s(Sem, [john,sleeps],[]).
Sem = (sleeps,john).
My problem is the following: I have no idea how to get the verb outside of the parentheses.
I am using a simple DCG:
s((VP,NP)) --> np(NP), vp(VP).
s((VP,NP,OBJ)) --> np(NP), vp(VP,OBJ).
np(NP) --> pn(NP).
np(np(det(D),N)) --> det(D),n(N).
np(np(det(D),a(A),N)) --> det(D),a(A),n(N).
n(n(N)) --> cn(N).
vp(V)--> iv(V).
vp(V,OBJ) --> tv(V), np(OBJ).
iv(sleeps) --> [sleeps].
tv(likes) --> [likes].
tv(hates) --> [hates].
det(the) --> [the].
det(a) --> [a].
a(red) --> [red].
a(green) --> [green].
cn(book) --> [book].
cn(table) --> [table].
pn(john) --> [john].
pn(mary) --> [mary].
I appreciate any help, since I am still a beginner in programming, especially in Prolog.
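One possible way to obtain sleeps(john), sketched here as a suggestion rather than a definitive answer: the term (sleeps,john) is built with the comma operator, whereas a compound term whose functor is only known at parse time can be constructed with the built-in =../2 ("univ"). The fragment below keeps only the proper-noun rules of the grammar above to stay short.

```prolog
% Build Verb(Subject) or Verb(Subject, Object) from the parsed parts.
s(Sem) --> np(NP), vp(V),      { Sem =.. [V, NP] }.
s(Sem) --> np(NP), vp(V, Obj), { Sem =.. [V, NP, Obj] }.

np(NP) --> pn(NP).

vp(V)      --> iv(V).
vp(V, Obj) --> tv(V), np(Obj).

iv(sleeps) --> [sleeps].
tv(likes)  --> [likes].
tv(hates)  --> [hates].

pn(john) --> [john].
pn(mary) --> [mary].
```

With these rules, s(Sem, [john,sleeps], []) should give Sem = sleeps(john), and s(Sem, [john,likes,mary], []) should give Sem = likes(john,mary).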

String tokenization in prolog

I have the following context free grammar in a text file 'grammar.txt'
S ::= a S b
S ::= []
I can open this file and read each line in Prolog.
Now I want to tokenize each line and generate a list such as
L=[['S','::=','a','S','b'],['S','::=','#']] ('#' represents empty)
How can I do this?
Write the specification as a DCG. Here is a basic (untested) version; you'll need to refine it.
parse_grammar([Rule|Rules]) -->
parse_rule(Rule),
parse_grammar(Rules).
parse_grammar([]) --> [].
parse_rule([NT, '::=' | Body]) -->
parse_symbol(NT),
skip_space,
"::=",
skip_space,
parse_symbols(Body),
skip_space, !. % the cut is required if you use findall/3 (see below)
parse_symbols([S|Rest]) -->
parse_symbol(S),
skip_space,
parse_symbols(Rest).
parse_symbols([]) --> [].
parse_symbol(S) -->
[C], {code_type(C, alpha), atom_codes(S, [C])}.
skip_space -->
[C], {code_type(C, space)}, skip_space.
skip_space --> [].
This parses the whole file, using this top-level goal:
...,
read_file_to_codes('grammar.txt', Codes),
phrase(parse_grammar(Grammar), Codes, []).
You say you read the file one line at a time; in that case use
...
findall(R, (get_line(L), phrase(parse_rule(R), L, [])), Grammar).
HTH
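As one possible refinement of the untested outline above, the following sketch tokenizes on white space, treats each line as one rule, and maps the token [] to '#' as requested in the question. It works on character codes (for example from read_file_to_codes/3) and assumes SWI-Prolog; all predicate names here (tokenize/2, lines//1, line//1, tokens//1, token//1, token_codes//1, blanks_in_line//0, codes_token/2) are chosen for this example only.

```prolog
:- set_prolog_flag(double_quotes, codes).

% tokenize(+Codes, -Rules): one token list per non-empty line.
tokenize(Codes, Rules) :-
    phrase(lines(Ls), Codes),
    exclude(==([]), Ls, Rules).          % drop empty lines

% A line, then either a newline followed by more lines, or the end of input.
lines([L|Ls]) --> line(L), ( "\n" -> lines(Ls) ; { Ls = [] } ).

line(Ts) --> blanks_in_line, tokens(Ts).

tokens([T|Ts]) --> token(T), !, blanks_in_line, tokens(Ts).
tokens([]) --> [].

% A token is a non-empty run of non-space codes.
token(T) --> token_codes(Cs), { Cs = [_|_], codes_token(Cs, T) }.

token_codes([C|Cs]) --> [C], { \+ code_type(C, space) }, !, token_codes(Cs).
token_codes([]) --> [].

% White space inside a line, i.e. everything but the newline itself.
blanks_in_line --> [C], { C \== 0'\n, code_type(C, space) }, !, blanks_in_line.
blanks_in_line --> [].

codes_token("[]", '#') :- !.             % empty right-hand side
codes_token(Cs, T) :- atom_codes(T, Cs).
```

For the two-line grammar from the question, atom_codes('S ::= a S b\nS ::= []\n', Cs), tokenize(Cs, L) should give L = [['S','::=',a,'S',b],['S','::=','#']].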
