I am trying to get a semantic representation for English sentences by using a DCG in Prolog.
This is the outcome I should be aiming for:
?- s(Sem,[john,sleeps],[]).
Sem=sleeps(john).
This is my current status:
?- s(Sem, [john,sleeps],[]).
Sem = (sleeps,john).
My problem is the following: I have no idea how to get the verb outside of the parentheses, so that it becomes the functor of the term.
I am using a simple DCG:
s((VP,NP)) --> np(NP), vp(VP).
s((VP,NP,OBJ)) --> np(NP), vp(VP,OBJ).
np(NP) --> pn(NP).
np(np(det(D),N)) --> det(D),n(N).
np(np(det(D),a(A),N)) --> det(D),a(A),n(N).
n(n(N)) --> cn(N).
vp(V)--> iv(V).
vp(V,OBJ) --> tv(V), np(OBJ).
iv(sleeps) --> [sleeps].
tv(likes) --> [likes].
tv(hates) --> [hates].
det(the) --> [the].
det(a) --> [a].
a(red) --> [red].
a(green) --> [green].
cn(book) --> [book].
cn(table) --> [table].
pn(john) --> [john].
pn(mary) --> [mary].
I appreciate any help since I am still a beginner in programming, especially in Prolog.
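One possible way to get the verb out as the functor is to build the term with the standard =../2 ("univ") inside a {}/1 goal. This is only a sketch, assuming the verb semantics is always an atom; just the two s//1 rules change, and the grammar is trimmed to proper nouns to keep it short.

```prolog
% Sketch: build sleeps(john) / likes(john,mary) with =../2 instead of (VP,NP).
s(Sem) --> np(NP), vp(V),      { Sem =.. [V, NP] }.
s(Sem) --> np(NP), vp(V, Obj), { Sem =.. [V, NP, Obj] }.

np(NP) --> pn(NP).

vp(V)      --> iv(V).
vp(V, Obj) --> tv(V), np(Obj).

iv(sleeps) --> [sleeps].
tv(likes)  --> [likes].
tv(hates)  --> [hates].
pn(john)   --> [john].
pn(mary)   --> [mary].
```

With these rules, ?- s(Sem, [john,sleeps], []). should answer Sem = sleeps(john), and [john,likes,mary] should give likes(john,mary).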
Using this input:
parse(Parse, [what,did,thomas,eat], [])
I want to produce this output:
sbarq(
    whnp(
        wp(what)
    ),
    sq(
        vbd(did),
        np(nnp(thomas)),
        vp(
            vb(eat),
            whnp(
                wp(what)
            ) % this is not in the input, I need to copy the entire whnp here
        )
    )
)
with this code:
parse(Tree) --> sbarq(Tree).
% rules
sbarq(sbarq(WHNP, SQ)) --> whnp(WHNP), sq(SQ).
whnp(whnp(WP)) --> wp(WP).
sq(sq(VBD, NP, VP)) --> vbd(VBD), np(NP), vp(VP).
np(np(NNP)) --> nnp(NNP).
vp(vp(VB)) --> vb(VB).
% lexicon
wp(wp(what)) --> [what].
vbd(vbd(did)) --> [did].
nnp(nnp(thomas)) --> [thomas].
vb(vb(eat)) --> [eat].
How can I change my code to copy the whnp into the vp?
You can pass WHNP down to the desired location as an extra argument, as shown below:
% rules
sbarq(sbarq(WHNP, SQ)) --> whnp(WHNP), sq(WHNP, SQ).
whnp(whnp(WP)) --> wp(WP).
sq(WHNP, sq(VBD, NP, VP)) --> vbd(VBD), np(NP), vp(WHNP, VP).
np(np(NNP)) --> nnp(NNP).
vp(WHNP, vp(VB, WHNP)) --> vb(VB).
% lexicon
wp(wp(what)) --> [what].
vbd(vbd(did)) --> [did].
nnp(nnp(thomas)) --> [thomas].
vb(vb(eat)) --> [eat].
Example:
?- phrase(sbarq(T), [what,did,thomas,eat]).
T = sbarq(whnp(wp(what)), sq(vbd(did), np(nnp(thomas)), vp(vb(eat), whnp(wp(what))))).
I created a simple parser for a subset of the Clojure language. For some reason it returns a list in the format [a,b,c|d], not in the format [a,b,c,d]. Moreover, member(X, List) doesn't work properly with such a list, i.e.
member(X, [a,b,c|d]).
X = a ;
X = b ;
X = c.
The question is: how should I change my code to fix this problem and get a list in the usual format? Or is there a way to transform [a,b,c|d] into [a,b,c,d]?
You can call e.g.
main.
[(concat x) (lambda x (inc (inc x)))]
And get:
expr([expr([expr(at(id([c,o,n,c,a,t])))|expr(at(id([x])))])|expr([expr(at(id([l,a,m,b,d,a]))),expr(at(id([x])))|expr([expr(at(id([i,n,c])))|expr([expr(at(id([i,n,c])))|expr(at(id([x])))])])])])
Code:
mydelimiter --> delimiter.
mydelimiter --> delimiter, mydelimiter.
delimiter --> [','].
delimiter --> ['\n'].
delimiter --> ['\t'].
delimiter --> ['\s'].
specsymbol('+') --> ['+'].
specsymbol('-') --> ['-'].
specsymbol('>') --> ['>'].
specsymbol('<') --> ['<'].
specsymbol('=') --> ['='].
specsymbol('*') --> ['*'].
specsymbol('_') --> ['_'].
snum(0) --> ['0'].
snum(1) --> ['1'].
snum(2) --> ['2'].
snum(3) --> ['3'].
snum(4) --> ['4'].
snum(5) --> ['5'].
snum(6) --> ['6'].
snum(7) --> ['7'].
snum(8) --> ['8'].
snum(9) --> ['9'].
numb([A]) --> snum(A).
numb([A|B]) --> snum(A), numb(B).
mynumber(num(X)) --> numb(X).
mystring(str([])) --> quotesymbol, quotesymbol.
mystring(str(S)) --> quotesymbol, anychars(S), quotesymbol.
quotesymbol --> ['\"'].
anychar(A) --> [A], {A \== '\"'}.
anychars([A]) --> anychar(A).
anychars([A|B]) --> anychar(A), anychars(B).
identifier(id(I)) --> id_start_spec(I); id_start_letter(I).
letter(L) --> [L], {is_alpha(L)}.
id_start_letter([L]) --> letter(L).
id_start_letter([L|I]) --> letter(L), ids_l(I).
ids_l([I]) --> letter(I); snum(I); specsymbol(I).
ids_l([I|Is]) --> (letter(I); snum(I); specsymbol(I)), ids_l(Is).
id_start_spec([S]) --> specsymbol(S).
id_start_spec([S|I]) --> specsymbol(S), ids_s(I).
ids_s([I]) --> snum(I); specsymbol(I).
ids_s([I|Is]) --> (snum(I); specsymbol(I)), ids_s(Is).
keyword(kw([C|K])) --> mycolonsymbol(C), id_start_letter(K).
mycolonsymbol(':') --> [':'].
myatom(at(A)) --> mynumber(A); mystring(A); identifier(A); keyword(A).
expression(expr(S)) --> myatom(S).
expression(expr(S)) --> r_br_expression(S).
expression(expr(S)) --> s_br_expression(S).
expression(expr(S)) --> f_br_expression(S).
r_br_expression(S) --> r_openbracketsymbol, expressions(S), r_closedbracketsymbol.
expressions(S) --> expression(S).
expressions([S|SS]) --> expression(S), mydelimiter, expressions(SS).
r_openbracketsymbol --> ['('].
r_closedbracketsymbol --> [')'].
s_br_expression(S) --> s_openbracketsymbol, expressions(S), s_closedbracketsymbol.
s_openbracketsymbol --> ['['].
s_closedbracketsymbol --> [']'].
f_br_expression(S) --> f_openbracketsymbol, expressions(S), f_closedbracketsymbol.
f_openbracketsymbol --> ['{'].
f_closedbracketsymbol --> ['}'].
main :-
read_string(user_input, "\n", "", _, StrIn),
atom_chars(StrIn, L),
phrase(expression(T), L),
writeln(T),
!.
First of all, use a more readable syntax by placing the following directive at the top of your code:
:- set_prolog_flag(double_quotes, chars).
Now you can write the more compact "}" instead of ['}'], and the standard " " instead of ['\s']. Likewise, you can write specsymbol(+) --> "+".
Second, define letter//1 and anychar//1 like so:
letter(L) --> [L], {char_type(L,alpha)}.
anychar(A) --> [A], {dif(A, '\"')}.
With this you can start to debug, best by using smaller test cases.
Also note that Prolog has a top level, so there is no need to define main at all. Instead, you can type your query directly, or even better, consider a simpler case first:
?- phrase(expression(T),"(inc x)").
T = expr([expr(at(id("inc")))|expr(at(id("x")))]).
?- phrase(expression(T),"(x)").
T = expr(expr(at(id("x")))).
So in the first case there is an odd instance of a partial list, and in the second there is no list at all. Instead, both should be well-formed lists. The culprit is expressions//1, which should rather read:
expressions([S]) --> expression(S).
expressions([S|Ss]) -->
{Ss = [_|_]}, % redundant goal for termination
expression(S), mydelimiter, expressions(Ss).
Note that also
?- phrase(expression(T),"(inc,,,x)").
succeeds, and I am not sure that this is intended.
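To see the effect of the corrected base case in isolation, here is a minimal sketch on a toy grammar. The names item//1, sep//0 and items//1 are made up for this example; item//1 merely stands in for expression//1 and sep//0 for mydelimiter//0.

```prolog
:- set_prolog_flag(double_quotes, chars).

% Hypothetical stand-in for expression//1: one lowercase letter.
item(item(C)) --> [C], { char_type(C, lower) }.

% One or more spaces between items.
sep --> " ".
sep --> " ", sep.

% The base case yields the one-element list [I], not the bare element I,
% so every result is a proper list ending in [].
items([I])    --> item(I).
items([I|Is]) --> item(I), sep, items(Is).
```

For example, atom_chars('a b c', Cs), phrase(items(T), Cs) should give T = [item(a),item(b),item(c)], a list that member/2 enumerates completely.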
I have written a DCG (with adjective phrases and prepositional phrases) in Prolog. When I try to run it by entering ip([every,boy,loved,some,girl]), it reports "out of local stack". I realised there is something wrong with nbar. Can someone help me out? Many thanks.
%tree
treeP(Term):-
% Print the tree assuming indentation 0
treeP(0,Term),
% Tidy up with linefeed
nl.
treeP(_N,Tree):-
% Tree is just a variable
var(Tree),!,
write(Tree).
treeP(N,[Tree|Trees]):-
proper_list([Tree|Trees]),!,
write('['),
N1 is N+1,
treePNEL(N1,[Tree|Trees]),
write(']').
treeP(N,Tree):-
% Nonatomic case
Tree=..[Functor,Argument|Arguments],
!,
% Write the functor and opening parenthesis
write(Functor),write('('),
% Set N1 to new indentation for arguments
atom_length(Functor,M), N1 is N+M+1,
% Pretty-print the arguments
treePNEL(N1,[Argument|Arguments]),
% Write right parenthesis
write(')').
treeP(_N,Tree):-
% Noncompound case
write(Tree).
treePNEL(N,[Tree1,Tree2|Trees]):-
treeP(N,Tree1),
% Go to correct position for further printing
nl, tab(N),
treePNEL(N,[Tree2|Trees]).
treePNEL(N,[Tree]):-
treeP(N,Tree).
ip(Sentence):-
setof(IP,
ip(IP,Sentence,[]),
IP),
treeP(IP).
ip(SSem) --> np(NPSem), ibar(IbarSem),
{var_replace(NPSem,NPSem1),
beta(NPSem1#IbarSem,SSem)}.
ibar(VPSem) --> i(MvdVbL),vp(VPSem,MvdVbL).
i([]) --> [].
i([]) --> [Aux],{isAux(Aux)}.
i([Verb]) --> [InflVerb],{pastInfl(Verb,InflVerb),isVerb(Verb)}.
pastInfl(see,saw).
pastInfl(love,loved).
vp(VbarSem,MvdVbL) --> vbar(VbarSem,MvdVbL).
vbar(VbarSem,MvdVbL) --> v(VSem,MvdVbL), np(NPSem),
{var_replace(VSem,VSem1),
beta(VSem1#NPSem,VbarSem)}.
v(lbd(s, lbd(x,s#lbd(y,Fla))),[]) --> [Verb],
{isVerb(Verb),Fla=..[Verb,x,y]}.
v(lbd(s,lbd(x,s#lbd(y,Fla))),[MvdVb])--> [],
{Fla=..[MvdVb,x,y]}.
np(NbarSem) --> nbar(NbarSem).
nbar(NbarSem) --> adj(AdjSem),nbar(NbarSem1),
{var_replace(AdjSem,AdjSem1),beta(AdjSem1#NbarSem1,NbarSem)}.
nbar(NbarSem) --> det(DetSem),nbar(NbarSem1),
{var_replace(DetSem,DetSem1),beta(DetSem1#NbarSem1,NbarSem)}.
nbar(NSem) --> n(NSem).
nbar(NbarSem) --> nbar(NbarSem1),pp(PPSem),
{var_replace(PPSem,PPSem1),beta(PPSem1#NbarSem1,NbarSem)}.
nbar(NSem) --> n(NSem).
pp(PPSem) --> pbar(PPSem).
pbar(NbarSem) --> po(PPSem),np(NbarSem1),
{var_replace(PPSem,PPSem1),beta(PPSem1#NbarSem1,NbarSem)}.
isVerb(love).
n(lbd(x,boy(x))) --> [boy].
n(lbd(x,girl(x))) --> [girl].
det(lbd(q,lbd(p,exists(x,(q#x & p#x))))) --> [some].
det(lbd(q,lbd(p,forall(x,(q#x -> p#x))))) --> [every].
nbar(NbarSem) --> adj(AdjSem),nbar(NbarSem1)
nbar(NbarSem) --> det(DetSem),nbar(NbarSem1)
nbar(NbarSem) --> nbar(NbarSem1),pp(PPSem)
Tabling is implemented in recent versions of SWI-Prolog. By declaring the left-recursive predicates (or non-terminals) as tabled, you can keep their definitions unchanged. For details, consult:
http://www.swi-prolog.org/pldoc/man?section=tabling
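A minimal sketch of what this could look like, assuming an SWI-Prolog version with tabling support. The grammar is a cut-down illustration, not the one from the question: the tree-building arguments, p//1 and the lexicon are made up. Since a non-terminal nbar//1 is compiled to the predicate nbar/3, the table declaration is written for nbar/3 here.

```prolog
% nbar//1 becomes nbar/3 after DCG translation; table that predicate.
:- table nbar/3.

nbar(n(N))         --> n(N).
nbar(nbar(NB, PP)) --> nbar(NB), pp(PP).   % left-recursive rule kept as is

pp(pp(P, NB)) --> p(P), nbar(NB).

n(boy)  --> [boy].
n(park) --> [park].
p(in)   --> [in].
```

Without the table directive, the second nbar//1 rule loops on backtracking or when the first rule does not apply; with it, ?- phrase(nbar(T), [boy,in,park]). should terminate with T = nbar(n(boy), pp(in, n(park))).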
Given the following grammar:
BoolExpr ::= BoolConj { "or" BoolConj }.
BoolConj ::= BoolLit { "and" BoolLit }.
BoolLit ::= [ "not" ] BoolPosLit.
BoolPosLit ::= "true"| "false"| "(" BoolExpr ")".
I want to write a DCG parser for the grammar above. The parser only has to accept (succeed) or reject (fail) on a given input.
Here is what I have:
boolepxr([or(O)|BoolExpr]) -->
['or'],
boolconj(O),
boolexpr(BoolExpr).
boolconj([and(A)|BoolConj]) -->
['and'],
boollit(A),
boolconj(BoolConj).
boollit([not(N)|BoolLit]) -->
['not'],
boolps(N).
boolps([true(B)|BoolPs]) -->
['true'],
boolexpr(B),
boolps(BoolPS).
boolps([false(B)|BoolPs]) -->
['false'],
boolexpr(B),
boolps(BoolPS).
However, when I run this program, I don't get the expected output.
Since the DCG only has to recognize the input, I have simplified it:
boolexpr --> boolconj, [or], boolexpr.
boolexpr --> boolconj.
boolconj --> boollit, [and], boolconj.
boolconj --> boollit.
boollit --> [not], boolps.
boollit --> boolps.
boolps --> [true].
boolps --> [false].
boolps --> ['('], boolexpr, [')'].
This yields:
?- phrase(boolexpr, [true,or,false,and,not,true]).
true ;
false.
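If a parse tree is wanted later, as the original attempt suggests, the same recognizer can carry one extra argument. This is a sketch only; the term shapes or/2, and/2 and not/1 are a choice made for this example and are not prescribed by the grammar.

```prolog
% Same rules as the recognizer, each with an argument for the syntax tree.
boolexpr(or(A, B))  --> boolconj(A), [or], boolexpr(B).
boolexpr(A)         --> boolconj(A).

boolconj(and(A, B)) --> boollit(A), [and], boolconj(B).
boolconj(A)         --> boollit(A).

boollit(not(A)) --> [not], boolps(A).
boollit(A)      --> boolps(A).

boolps(true)  --> [true].
boolps(false) --> [false].
boolps(E)     --> ['('], boolexpr(E), [')'].
```

Here ?- phrase(boolexpr(T), [true,or,false,and,not,true]). should give T = or(true, and(false, not(true))), which shows that "and" binds tighter than "or".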
I have the following context free grammar in a text file 'grammar.txt'
S ::= a S b
S ::= []
I'm opening this file and I am able to read each line in Prolog.
Now I want to tokenize each line and generate a list such as
L=[['S','::=','a','S','b'],['S','::=','#']] ('#' represents empty)
How can I do this?
Write the specification as a DCG. Here is the basic idea (untested); you'll need to refine it.
parse_grammar([Rule|Rules]) -->
parse_rule(Rule),
parse_grammar(Rules).
parse_grammar([]) --> [].
parse_rule([NT, '::=' | Body]) -->
parse_symbol(NT),
skip_space,
"::=",
skip_space,
parse_symbols(Body),
skip_space, !. % the cut is required if you use findall/3 (see below)
parse_symbols([S|Rest]) -->
parse_symbol(S),
skip_space,
parse_symbols(Rest).
parse_symbols([]) --> [].
parse_symbol(S) -->
[C], {code_type(C, alpha), atom_codes(S, [C])}.
skip_space -->
[C], {code_type(C, space)}, skip_space.
skip_space --> [].
This parses the whole file, using this top-level call:
...,
read_file_to_codes('grammar.txt', Codes),
phrase(parse_grammar(Grammar), Codes, []).
You say you read the file one line at a time; in that case, use
...
findall(R, (get_line(L), phrase(parse_rule(R), L, [])), Grammar).
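For the line-at-a-time case, a simpler alternative is to split each line on whitespace. The following is a sketch under some assumptions: it uses blanks//0 and nonblanks//1 from SWI-Prolog's library(dcg/basics), the names tokens//1 and line_tokens/2 are made up for this example, and the token [] is mapped to '#' as in the desired output.

```prolog
:- use_module(library(dcg/basics)).   % blanks//0, nonblanks//1

% tokens(-Tokens)//0: whitespace-separated tokens of one line, as atoms.
tokens([T|Ts]) -->
    blanks, nonblanks(Cs), { Cs = [_|_] }, !,
    { atom_codes(A, Cs),
      ( A == '[]' -> T = '#' ; T = A )   % '#' stands for the empty body
    },
    tokens(Ts).
tokens([]) --> blanks.

% line_tokens(+Line, -Tokens): Line is an atom or string holding one line.
line_tokens(Line, Tokens) :-
    atom_codes(Line, Codes),
    phrase(tokens(Tokens), Codes).
```

With this, line_tokens('S ::= a S b', L) should give L = ['S','::=',a,'S',b] and line_tokens('S ::= []', L) should give L = ['S','::=','#']; collecting the results for all lines gives the list of lists asked for.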
HTH