Difference between Context-sensitive grammar and Context-free grammar [duplicate] - algorithm

Possible Duplicate:
Context-sensitive grammar and Context-free grammar
In my textbook, here is the explanation of these two terms:
Context Sensitive Grammar:
grammar can have productions of the form w1 → w2, where w1 = lAr and
w2 = lwr, where A is a nonterminal symbol, l and r are strings of zero
or more terminal or nonterminal symbols, and w is a nonempty string of
terminal or nonterminal symbols. It can also have the production S → λ
as long as S does not appear on the right-hand side of any other
production.
Context Free Grammar:
grammar can have productions only of the form w1 → w2, where w1 is a
single symbol that is not a terminal symbol. A type 3 grammar can have
productions only of the form w1 → w2 with w1 = A and either w2 = aB or
w2 = a, where A and B are nonterminal symbols and a is a terminal
symbol, or with w1 = S and w2 = λ.
In my textbook, the author says: CSG is a special case of CFG. But I don't get this point, because in a CSG the productions have the form lAr → lwr, and l and r can be strings of zero or more terminals or nonterminals. So when l and r are both empty strings (length 0), we can write lAr as just A, and the production becomes the CFG form A → w. That means every CFG is a CSG, not the other way around.
Have I misunderstood something? Please correct me.
Thanks :)

The textbook is in error. As you say, a CFG is a special case of a CSG: every context-free production A → w is already in the context-sensitive form lAr → lwr with l and r empty.
CSGs can express strictly more languages than CFGs can; for example, {aⁿbⁿcⁿ : n ≥ 1} is context-sensitive but not context-free.

Related

Can we make pda for a^n b^n+1?

I am a little confused about whether we can make a PDA for the a^n b^(n+1) language. To my knowledge, a PDA can be made only for a CFL. Is a^n b^(n+1) a CFL or not? Please help.
A context-free grammar for a^n b^(n+1) (this one generates the strings with n ≥ 1) would be:
S -> aS'bb
S' -> aS'b | ε
Since the CFG exists, a PDA is also possible for this language.
Given language: a^n b^(n+1)
It is a context-free language, because a context-free grammar exists:
S -> aS'bb
S' -> aS'b | ε
So we can construct a PDA (Push Down Automaton) for it.
The PDA for a^n b^(n+1) is:
δ(q0, a, Z0) = (q0, aZ0)
δ(q0, a, a) = (q0, aa)
δ(q0, b, a) = (q1, ε) (pop)
δ(q1, b, a) = (q1, ε) (pop)
δ(q1, b, Z0) = (q2, Z0)
Here, δ represents a transition, and ε on the right-hand side means the top stack symbol is replaced by the empty string, i.e. popped.
q0 is the initial state.
q2 is the final state.
Whenever a string in the above language is passed to the PDA, it reaches the final state q2. Strings not in the language do not reach q2 and are therefore rejected.
Examples of strings in the language: a^3 b^4, a^59 b^60, etc.
PDA Image:
[1]: https://i.stack.imgur.com/RPdBP.jpg
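The transition table above can be checked with a small simulator. This is a hypothetical Python sketch (the function name `accepts` and the string encoding of states and stack symbols are my own, not from the answer); since the PDA is deterministic, a single pass suffices:

```python
def accepts(s):
    """True iff s is of the form a^n b^(n+1) with n >= 1."""
    state, stack = "q0", ["Z0"]
    for ch in s:
        top = stack[-1]
        if state == "q0" and ch == "a":
            stack.append("a")                 # d(q0, a, _): push an 'a'
        elif state == "q0" and ch == "b" and top == "a":
            state = "q1"
            stack.pop()                       # d(q0, b, a) = (q1, pop)
        elif state == "q1" and ch == "b" and top == "a":
            stack.pop()                       # d(q1, b, a) = (q1, pop)
        elif state == "q1" and ch == "b" and top == "Z0":
            state = "q2"                      # the extra (n+1)-th 'b'
        else:
            return False                      # no transition: reject
    return state == "q2"                      # accept only in the final state
```

For example, `accepts("aabbb")` is true, while `accepts("aabb")` and `accepts("b")` are false.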

Is it possible to represent a context-free grammar with first-order logic?

Briefly, I have an EBNF grammar and hence a parse tree, but I do not know whether there is a procedure to translate it into first-order logic.
For example:
DR ::= E and P
P ::= B | (and P)* | (or P)*
B ::= L | P (and L P)
L ::= a
Yes, there is. The general pattern for translating a production of the form
A ::= B C ... D
is to paraphrase it declaratively as saying:
A sequence of terminals s is an A (or: A generates the sequence s, if you prefer that formulation) if:
s is the concatenation of s_1, s_2, ... s_n, and
s_1 is a B / B generates the sequence s_1, and
s_2 is a C / C generates the sequence s_2, and
...
s_n is a D / D generates the sequence s_n.
Assuming we write these in the obvious way using a generates predicate, and that we can write concatenation using a || operator, your first rule becomes (if I am right to guess that E and P are non-terminals and "and" is a terminal symbol) something like
generates(DR, s) ← generates(E, s1)
                   ∧ generates(and, s2)
                   ∧ generates(P, s3)
                   ∧ s = s1 || s2 || s3
To establish the consequent (i.e. prove that s is an A), prove the antecedents. As long as the grammar does actually generate some sentences, and as long as you have some premises defining the "generates" relation for terminal symbols, the proof will be straightforward.
Prolog definite-clause grammars are a beautiful instantiation of this pattern. It takes some of us a while to understand and appreciate the use of difference lists in DCGs, but they handle the partitioning of s into subsequences and the association of the subsequences with the different parts of the right hand side much more elegantly than the simple translation into logic given above.
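The declarative reading above can also be prototyped directly. Below is a hypothetical Python sketch of a naive recognizer that tries every split of s, following the "s is the concatenation of s_1 ... s_n" pattern; the toy expansions for E and P are my own assumptions, added only to make the sketch runnable:

```python
# Naive CFG recognizer following the "generates" pattern: a nonterminal
# generates s iff s splits into pieces generated by its right-hand side.
RULES = {
    "DR": [["E", "and", "P"]],   # DR ::= E and P  (E, P taken as nonterminals)
    "E":  [["e"]],               # toy expansion, my assumption
    "P":  [["p"]],               # toy expansion, my assumption
}

def generates(symbol, s):
    """s is a tuple of terminal tokens; True iff symbol derives s."""
    if symbol not in RULES:                  # terminal: must match literally
        return s == (symbol,)
    return any(derives_seq(rhs, s) for rhs in RULES[symbol])

def derives_seq(rhs, s):
    # "s is the concatenation s_1 ... s_n, and each s_i is generated by the
    #  corresponding symbol of the right-hand side"
    if not rhs:
        return s == ()
    head, rest = rhs[0], rhs[1:]
    return any(generates(head, s[:i]) and derives_seq(rest, s[i:])
               for i in range(len(s) + 1))
```

Trying every split point is exactly the brute-force version of what DCG difference lists do far more efficiently.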

Name for ":-" infix operator

":-" serves as an infix operator in the Prolog logic programming language that, in the following context, roughly means:
H :- B1, B2, ... BN
H is provable if bodies B1 through BN are all provable.
Somewhat remarkably, in all my time studying Prolog, I've neglected to assign a name to this symbol. Does anybody know what the agreed upon name for :- is?
The :- sign represents an implication arrow. If you write your example with logical symbols it reads:
H ← B1 ∧ B2 ∧ ... ∧ BN
So you can also say: "H is implied by B1 and B2 and ... and BN" or "The body of the rule implies its head."
It is also correct to call the operator itself "implication arrow" or just "implication".
I'm not sure how agreed upon it is, but here's a reference to naming it "neck":
http://www.cse.unsw.edu.au/~billw/prologdict.html#neck

Removing left recursion in DCG - Prolog

I've got a small problem with left recursion in this grammar. I'm trying to write it in Prolog, but I don't know how to remove left recursion.
<expression> -> <simple_expression>
<simple_expression> -> <simple_expression> <binary_operator> <simple_expression>
<simple_expression> -> <function>
<function> -> <function> <atom>
<function> -> <atom>
<atom> -> <number> | <variable>
<binary_operator> -> + | - | * | /
expression(Expr) --> simple_expression(SExpr), { Expr = SExpr }.
simple_expression(SExpr) --> simple_expression(SExpr1), binary_operator(Op), simple_expression(SExpr2), { SExpr =.. [Op, SExpr1, SExpr2] }.
simple_expression(SExpr) --> function(Func), { SExpr = Func }.
function(Func) --> function(Func2), atom(At), { Func = [Func2, atom(At)] }.
function(Func) --> atom(At), { Func = At }.
I've written something like that, but it won't work at all. How can I change it to get this program working?
The problem with your program is indeed left recursion; it should be removed, otherwise you'll get stuck in an infinite loop.
To remove immediate left recursion you replace each rule of the form
A -> A a1 | A a2 | ... | b1 | b2 | ...
with:
A  -> b1 A' | b2 A' | ...
A' -> ε | a1 A' | a2 A' | ...
so function would be
function -> atom, functionR.
functionR -> [].
functionR -> atom, functionR.
wiki page
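The same transformation can be sketched outside Prolog. Here is a hypothetical Python rendering (the function names and token conventions are mine) in which the `functionR` tail becomes a loop, rebuilding the left-associative tree that the left-recursive rule `function -> function atom` describes:

```python
# function -> atom, functionR   with   functionR -> [] | atom, functionR
# becomes: parse one atom, then loop over the remaining atoms.
def parse_function(tokens, i=0):
    node, i = parse_atom(tokens, i)
    while i < len(tokens) and is_atom(tokens[i]):   # functionR: zero or more
        right, i = parse_atom(tokens, i)
        node = [node, right]                        # left-associative nesting
    return node, i

def is_atom(tok):
    # atoms are numbers or variable names in the original grammar
    return isinstance(tok, (int, str))

def parse_atom(tokens, i):
    return tokens[i], i + 1
```

For example, `parse_function(['f', 'x', 'y'])` yields `([['f', 'x'], 'y'], 3)`: the same left-nested shape the left-recursive rule denotes, but computed without left recursion.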
The problem only arises because you are using backward chaining. In forward chaining it is possible to deal with left-recursive grammar rules directly, provided the grammar rules of the form
NT ==> NT'
don't form a cycle. You can also use auxiliary computations, i.e. {}/1, if you place them after the non-terminals of the body and if the non-terminals in the head don't have parameters going exclusively into the auxiliary computations (i.e. the bottom-up condition).
Here is an example left recursive grammar that works perfectly this way in forward chaining:
:- use_module(library(minimal/chart)).
:- use_module(library(experiment/ref)).
:- static 'D'/3.
expr(C) ==> expr(A), [+], term(B), {C is A+B}.
expr(C) ==> expr(A), [-], term(B), {C is A-B}.
expr(A) ==> term(A).
term(C) ==> term(A), [*], factor(B), {C is A*B}.
term(C) ==> term(A), [/], factor(B), {C is A/B}.
term(A) ==> factor(A).
factor(A) ==> [A], {integer(A)}.
Here is a link to the source code of the chart parser; from there, the source code of the forward chainer can also be found. The following shows an example session:
?- use_module(library(minimal/hypo)).
?- chart([1,+,2,*,3], N) => chart(expr(X), N).
X = 7
During parsing, the chart parser fills a chart in a bottom-up fashion. For each non-terminal p/n in the above productions there will be facts p/n+2, the two extra arguments being the word positions. Here is the resulting chart for the above example:
:- thread_local factor/3.
factor(3, 4, 5).
factor(2, 2, 3).
factor(1, 0, 1).
:- thread_local term/3.
term(3, 4, 5).
term(2, 2, 3).
term(6, 2, 5).
term(1, 0, 1).
:- thread_local expr/3.
expr(3, 4, 5).
expr(2, 2, 3).
expr(6, 2, 5).
expr(1, 0, 1).
expr(3, 0, 3).
expr(7, 0, 5).
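The bottom-up filling of such a chart can be imitated in plain Python. This is a hypothetical fixpoint sketch covering only the '+' and '*' rules of the grammar (the tuple representation and variable names are mine, not the library's):

```python
# Facts are tuples (nonterminal, value, from, to), like expr/3 above plus the
# two word positions. We saturate the chart until nothing new appears.
tokens = [1, '+', 2, '*', 3]

chart = {('factor', t, i, i + 1)             # factor(A) ==> [A], {integer(A)}
         for i, t in enumerate(tokens) if isinstance(t, int)}

changed = True
while changed:
    new = set()
    for (n1, a, i, h) in chart:
        if n1 == 'factor':
            new.add(('term', a, i, h))       # term(A) ==> factor(A)
        if n1 == 'term':
            new.add(('expr', a, i, h))       # expr(A) ==> term(A)
        for (n2, b, h2, j) in chart:
            if h2 != h + 1:                  # operator must sit between spans
                continue
            if n1 == 'expr' and tokens[h] == '+' and n2 == 'term':
                new.add(('expr', a + b, i, j))   # expr ==> expr, [+], term
            if n1 == 'term' and tokens[h] == '*' and n2 == 'factor':
                new.add(('term', a * b, i, j))   # term ==> term, [*], factor
    changed = not new <= chart
    chart |= new
```

After saturation the chart contains `('expr', 7, 0, 5)`, matching the `expr(7, 0, 5)` fact above, and left recursion causes no loop because the chart only grows.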
The answer from @thanosQR is fairly good, but it applies to a more general context than DCG and requires a change in the parse tree. Effectively, the 'outcome' of parsing has been removed, which is not good.
If you are interested just in parsing expressions, I posted here something useful.
Answer set programming (ASP) provides another route to implement grammars. ASP can be implemented with non-deterministic forward chaining, and this is what our library(minimal/asp) provides. The results of ASP are then the different models of the given rules. We use ASP models here to represent a Cocke-Younger-Kasami chart. We begin the chart with the given words we want to parse, represented by word/3 facts. Compared to DCG, we no longer pass around lists but instead word positions. The Prolog text calc2.p shows such an implementation of an ASP-based parser. All rules are now (<=)/2 rules, which means they are forward-chaining rules, and all heads are now choose/1 heads, which means they make an ASP model choice. We explain how expr is realized; term is realized similarly. Since we do not have an automatic translation, we did the translation manually. We provide the words from right to left and only trigger at the beginning of each attributed grammar rule:
choose([expr(C, I, O)]) <= posted(expr(A, I, H)), word('+', H, J), term(B, J, O), C is A+B.
choose([expr(C, I, O)]) <= posted(expr(A, I, H)), word('-', H, J), term(B, J, O), C is A-B.
choose([expr(B, I, O)]) <= posted(word('-', I, H)), term(A, H, O), B is -A.
choose([expr(A, I, O)]) <= posted(term(A, I, O)).
As can be seen, no extra predicate expr_rest was needed, and the translation from grammar to rules was one-to-one. The same holds for term. Executing such a grammar requires that the words are first posted from right to left; the result can then be read off from the corresponding non-terminal:
?- post(word(78,7,8)), post(word('+',6,7)), post(word(56,5,6)), post(word('*',4,5)),
post(word(34,3,4)), post(word('+',2,3)), post(word(12,1,2)), post(word('-',0,1)),
expr(X,0,8).
X = 1970
We have also made a Prolog text show.p which allows visualizing the ASP model as a parsing chart, using the common triangular matrix representation for the chart of the above arithmetic expression.
Peter Schüller (2018) - Answer Set Programming in Linguistics
https://peterschueller.com//pub/2018/2018-schueller-asp-linguistics.pdf
User Manual - Module "asp"
http://www.jekejeke.ch/idatab/doclet/prod/en/docs/15_min/10_docu/02_reference/07_theory/01_minimal/06_asp.html

Parsing expressions with an undefined number of arguments

I'm trying to parse a string in a self-made language into a sort of tree, e.g.:
# a * b1 b2 -> c * d1 d2 -> e # f1 f2 * g
should result in:
# a
* b1 b2
-> c
* d1 d2
-> e
# f1 f2
* g
#, * and -> are symbols. a, b1, etc. are texts.
At the moment I know only the RPN method of evaluating expressions, and my current solution is as follows. If I allow only a single text token after each symbol, I can easily convert the expression into RPN notation (b = b1 b2; d = d1 d2; f = f1 f2) and parse it from there:
a b c -> * d e -> * # f g * #
However, merging text tokens and whatever else comes seems to be problematic. My idea was to create marker tokens (M), so RPN looks like:
a M b2 b1 M c -> * M d2 d1 M e -> * # f2 f1 M g * #
which is also parseable and seems to solve the problem.
That said:
Does anyone have experience with something like this, and can you say whether it is a viable solution going forward?
Are there better methods for parsing expressions with undefined arity of operators?
Can you point me at some good resources?
Note: yes, I know this example very much resembles Lisp prefix notation, and maybe the way to go would be to add some brackets, but I don't have any experience here. However, the source text must not contain any artificial brackets, and I'm also not sure what to do about potential infix mixins like # a * b -> [if value1 = value2] c -> d.
Thanks for any help.
EDIT: It seems that what I'm looking for are sources on postfix notation with a variable number of arguments.
I couldn't fully understand your question, but it seems what you want is a grammar definition and a parser generator. I suggest you take a look at ANTLR; with it, defining a grammar for either your original syntax or the RPN should be pretty straightforward.
Edit: (After exercising self-criticism, and making some effort to understand the question details.) Actually, the language grammar is unclear from your example. However, it seems to me that the advantage of the prefix/postfix notations (i.e. that you need neither parentheses nor a precedence-aware parser) stems from the fact that you know the number of arguments every time you encounter an operator, and therefore you know exactly how many elements to read (for prefix notation) or to pop from the stack (for postfix notation). OTOH, I believe that having operators which can take a variable number of arguments makes prefix/postfix notations not simply difficult to parse but outright ambiguous. Take the following expression for example:
# a * b c d
Which of the following three is the canonical form?
(a, *(b, c, d))
(a, *(b, c), d)
(a, *(b), c, d)
Without knowing more about the operators, it is impossible to tell. Of course you could define some sort of greediness for the operators, e.g. * is greedier than #, so it gobbles up all the arguments. But this would defeat the purpose of a prefix notation, because you simply wouldn't be able to write down the second variant of the above three; not without additional syntactic elements.
Now that I think of it, it is probably not by sheer chance that none of the programming languages I know support operators with a variable number of arguments, only functions/procedures.
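For what it's worth, the marker idea from the question does remove exactly that ambiguity, because the group boundary becomes explicit. Here is a hypothetical Python sketch using my own simplified convention, which differs slightly from the question's token stream: a marker 'M' opens a group, and an operator token closes it, so arity never needs to be known in advance:

```python
MARK = ('MARK',)                 # unique sentinel pushed for each 'M'

def parse_postfix(tokens, operators):
    """Build trees (op, [args...]) from postfix tokens with explicit groups."""
    stack = []
    for tok in tokens:
        if tok == 'M':
            stack.append(MARK)
        elif tok in operators:
            args = []
            while stack[-1] is not MARK:     # pop everything in this group
                args.append(stack.pop())
            stack.pop()                      # discard the marker itself
            stack.append((tok, args[::-1]))  # restore left-to-right order
        else:
            stack.append(tok)                # plain text token
    return stack
```

For example, `parse_postfix(['M', 'a', 'M', 'b', 'c', 'd', '*', '#'], {'*', '#'})` yields `[('#', ['a', ('*', ['b', 'c', 'd'])])]`, and the second reading `(a, *(b, c), d)` is written simply by moving tokens relative to the inner group: `['M', 'a', 'M', 'b', 'c', '*', 'd', '#']`.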
