Context-free grammars versus context-sensitive grammars?

Can someone explain to me why grammars of this kind (context-free grammars and context-sensitive grammars) accept a string?
What I know is
A context-free grammar is a formal grammar in which every production (rewrite) rule is of the form V → w,
where V is a single nonterminal symbol and w is a string of terminals and/or nonterminals; w may be empty.
A context-sensitive grammar is a formal grammar in which the left-hand and right-hand sides of any production (rewrite) rule may be surrounded by a context of terminal and nonterminal symbols.
But how can I explain why these grammars accept a string?

An important detail here is that grammars do not accept strings; they generate strings. Grammars are descriptions of languages that provide a means for generating all possible strings contained in the language. In order to tell if a particular string is contained in the language, you would use a recognizer, some sort of automaton that processes a given string and says "yes" or "no."
A context-free grammar (CFG) is a grammar where (as you noted) each production has the form A → w, where A is a nonterminal and w is a string of terminals and nonterminals. Informally, a CFG is a grammar where any nonterminal can be expanded out to any of its productions at any point. The language of a grammar is the set of strings of terminals that can be derived from the start symbol.
A context-sensitive grammar (CSG) is a grammar where each production has the form wAx → wyx, where w and x are strings of terminals and nonterminals and y is a nonempty string of terminals and nonterminals. In other words, the productions give rules saying "if you see A in a given context, you may replace A by the string y." It's unfortunate that these grammars are called "context-sensitive grammars," because it means that "context-free" and "context-sensitive" are not opposites, and it means that there are certain classes of grammars that arguably take a lot of contextual information into account but aren't formally considered context-sensitive.
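For concreteness, the classic textbook example of a context-sensitive language is { a^n b^n c^n : n >= 1 }, which is not context-free. It is generated by the following noncontracting grammar (noncontracting grammars generate exactly the context-sensitive languages):

S → aSBc | abc
cB → Bc
bB → bb

A sample derivation: S ⇒ aSBc ⇒ aabcBc ⇒ aabBcc ⇒ aabbcc.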
To determine whether a string is contained in a CFG or a CSG, there are many approaches. First, you could build a recognizer for the given grammar. For CFGs, the pushdown automaton (PDA) is a type of automaton that accepts precisely the context-free languages, and there is a simple construction for turning any CFG into a PDA. For the context-sensitive grammars, the automaton you would use is called a linear bounded automaton (LBA).
However, the above approaches, if treated naively, are not very efficient. To determine whether a string is in the language of a CFG, there are far more efficient algorithms. For example, many grammars can have LL(k) or LR(k) parsers built for them, which let you decide in linear time whether a string is in the language. All grammars can be parsed using the Earley parser, which in O(n^3) time can determine whether a string of length n is in the language (interestingly, it can parse any unambiguous CFG in O(n^2) time, and with lookaheads can parse any LR(k) grammar in O(n) time!). If you were purely interested in the question "is string x contained in the language generated by grammar G?", one of these approaches would be excellent. If you wanted to know how the string x was generated (by finding a parse tree), you can adapt these approaches to provide that information as well. However, parsing CSGs is, in general, PSPACE-complete, so there are no known parsing algorithms for them that run in worst-case polynomial time. There are some algorithms that in practice tend to run quickly, though. The authors of Parsing Techniques: A Practical Guide (see below) have put together a fantastic page containing all sorts of parsing algorithms, including one that parses context-sensitive languages.
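To make the membership question concrete, here is a minimal sketch in Python (my own illustration; the CYK algorithm is not mentioned above, but it is the classic O(n^3) membership test and much shorter to write out than an Earley parser). It requires the grammar to be in Chomsky normal form, where every production is either A → BC or A → a:

def cyk(grammar, start, s):
    # grammar: dict mapping a nonterminal to its productions, each either
    # a one-character terminal string like "a" or a pair ("A", "B")
    n = len(s)
    if n == 0:
        return False  # CNF handles the empty string separately
    # table[i][l] = nonterminals deriving the substring at i of length l+1
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, ch in enumerate(s):
        for lhs, prods in grammar.items():
            if ch in prods:
                table[i][0].add(lhs)
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):
                for lhs, prods in grammar.items():
                    for p in prods:
                        if (isinstance(p, tuple)
                                and p[0] in table[i][split - 1]
                                and p[1] in table[i + split][length - split - 1]):
                            table[i][length - 1].add(lhs)
    return start in table[0][n - 1]

# CNF grammar for { a^n b^n : n >= 1 }
grammar = {"S": [("A", "B"), ("A", "C")],
           "C": [("S", "B")],
           "A": ["a"],
           "B": ["b"]}
print(cyk(grammar, "S", "aabb"))  # True
print(cyk(grammar, "S", "abab"))  # False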
If you're interested in learning more about parsing, consider checking out the excellent book "Parsing Techniques: A Practical Guide, Second Edition" by Grune and Jacobs, which discusses all sorts of parsing algorithms for determining whether a string is contained in a grammar and, if so, how it is generated by the parsing algorithm.

As was said before, a grammar doesn't accept a string; it is simply a means of generating the specific words of the language you are analyzing. In formal language theory the grammar plays the generative role, while an automaton does what you're describing: the recognition of specific strings.
In particular, you need a linear bounded automaton to recognize the Type 1 languages (the context-sensitive languages in the Chomsky hierarchy).
A grammar for a specific language only lets you specify the properties shared by all the strings that make up that context-sensitive language.
I hope that my explanation was clear.

One easy way to show that a grammar generates a string is to exhibit a derivation of that string using the grammar's production rules.
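For example, with the grammar S → aSb | ab, the derivation

S ⇒ aSb ⇒ aaSbb ⇒ aaabbb

exhibits the sequence of rule applications showing that aaabbb is generated.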

Related

Program to resolve FCG membership

Well I need help, I'm working with languages and free context grammars, and I need to know if there is an algorithm or program that helps to resolve membership issue, this means that giving an string "w" and a FCG G, decide if the string it's on the language or if is not.
I'm looking for a library or a program that can do this for later convert the string into an automata.
First of all, I've only seen such grammars called context-free grammars, not free context grammars. Also, automata is the plural of automaton, and your last statement about converting a string into an automaton makes no sense: there is a correspondence between context-free grammars and pushdown automata, but not between strings and automata.
Given a context-free grammar, the simplest algorithm (though not the most efficient one) for deciding whether a string is in the language of the grammar is to apply every possible production in the grammar to every sentential form that can be derived from the start symbol, generating every possible string of length less than or equal to the string in question. If the string is not among them, it is not a member of the language of the grammar.
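To make that brute-force procedure concrete, here is a minimal Python sketch (my own illustration, not a library; it assumes an epsilon-free grammar whose nonterminals are single uppercase letters, so sentential forms never shrink and anything longer than w can be pruned):

from collections import deque

def naive_member(productions, start, w):
    # productions: list of (nonterminal, replacement) pairs, e.g. ("S", "aSb")
    seen = {start}
    queue = deque([start])
    while queue:
        form = queue.popleft()
        if form == w:
            return True
        for lhs, rhs in productions:
            i = form.find(lhs)
            while i != -1:
                new = form[:i] + rhs + form[i + 1:]
                # epsilon-free rules never shrink a form, so anything
                # longer than w can never derive it
                if len(new) <= len(w) and new not in seen:
                    seen.add(new)
                    queue.append(new)
                i = form.find(lhs, i + 1)
    return False

# S -> aSb | ab generates { a^n b^n : n >= 1 }
print(naive_member([("S", "aSb"), ("S", "ab")], "S", "aaabbb"))  # True
print(naive_member([("S", "aSb"), ("S", "ab")], "S", "aabbb"))   # False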

How to define the grammar for TeX/LaTeX and Makefile?

Both are technologies expressed via languages full of macros, but in more technical terms, what kind of grammar do they have, and how can their properties be described?
I'm not interested in a graphical representation; by properties I mean a descriptive phrase about the subject, so please don't just give a BNF/EBNF-oriented response full of arcs and graphs.
I assume that both are context-free grammars, but that is a big family of grammars; is there a way to describe these two more precisely?
Thanks.
TeX can change the meaning of characters at run time (for example, \catcode`\@=11 turns @ into a letter), so it's not context-free.

Is my language Context-Free?
I believe that every useful language ends up being Turing-complete, reflexive, etc.
Fortunately that is not the end of the story.
Most parser generation tools (yacc, ANTLR, etc.) handle at most context-free grammars (CFGs).
So we divide the language processing problem into 3 steps:
1. Build an over-generating CFG; this is the "syntactical" part that constitutes a solid base to which we add the other components.
2. Add "semantic" constraints (extra syntactic and semantic checks).
3. Add the main semantics (static semantics, pragmatics, attributive semantics, etc.).
Writing a context-free grammar is a very standard way of speaking about all languages!
It is a very clear and didactic notation for languages!! (Even if it sometimes does not tell the whole truth.)
When we say that a language "is not context-free, is Turing-complete, ...", you can translate it as "expect lots of extra semantic work" :)
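A tiny illustration of the over-generating approach (my own toy example in Python, with a regular expression standing in for the over-generating grammar): a^n b^n c^n is famously not context-free, so we accept the superset a*b*c* "syntactically" and enforce the counting as a "semantic" constraint.

import re

def accept(s):
    m = re.fullmatch(r"(a*)(b*)(c*)", s)   # the over-generating syntax
    if not m:
        return False                        # syntactic rejection
    a_run, b_run, c_run = m.groups()
    return len(a_run) == len(b_run) == len(c_run)  # the semantic constraint

print(accept("aabbcc"))  # True
print(accept("aabbc"))   # False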
How can I speak about it?
Many choices are available. I like to do a subset of the following:
Write a clear, semantics-oriented CFG.
For each symbol (terminal or nonterminal), add/define a set of semantic attributes.
For each production rule, add syntactic/semantic constraint predicates.
For each production rule, add a set of equations defining the values of the attributes.
For each production rule, add an English explanation, examples, etc.
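For instance, here is a toy fragment of my own (not tied to any particular attribute-grammar formalism) showing all of those pieces for a single rule:

Decl → Type Id ";"
  attributes:  Type.val (synthesized), Id.name (lexical), Id.type
  constraint:  Id.name is not already in the symbol table
  equations:   Id.type := Type.val
  English:     "a declaration binds Id to the type denoted by Type"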

Mildly Context sensitive Grammars

Can anyone clearly explain what exactly mildly context-sensitive grammars are?
Can these grammars be used to model natural languages?
Moreover, do grammars like indexed grammar, head grammar, and tree grammar belong to the mildly context-sensitive grammars?
The term ‘mildly context-sensitive grammars’ was introduced by Joshi (1985). The intention was to characterize formal grammars that are adequate for the description of natural language. They should be more powerful than context-free grammar (which Huybregts [1984] and Shieber [1985] had shown to be inadequate for natural language) but less powerful than general context-sensitive grammars (which, among other drawbacks, cannot be parsed in polynomial time).
Joshi’s characterization of mildly context-sensitive grammars was biased toward his work on tree-adjoining grammar (TAG). However, together with his students Vijay Shanker and David Weir, Joshi soon discovered that TAGs are equivalent, in terms of the generated languages, to the independently introduced head grammar, linear indexed grammar, and combinatory categorial grammar. This showed that the notion of mildly context-sensitivity is a very general one and not tied to a specific formalism.
Today, the term mildly context-sensitive grammar formalism is used to refer to several grammar formalisms that have some or all of the characteristic properties put forth by Joshi. Many of them are being researched and applied in descriptive and, most prominently, computational linguistics.
References
Riny Huybregts. The Weak Inadequacy of Context-Free Phrase Structure Grammars. In Ger de Haan, Mieke Trommelen, and Wim Zonneveld, editors, Van periferie naar kern, pages 81–99. Foris, Dordrecht, The Netherlands, 1984.
Aravind K. Joshi. Tree Adjoining Grammars: How Much Context-Sensitivity Is Required to Provide Reasonable Structural Descriptions?. In David R. Dowty, Lauri Karttunen, and Arnold M. Zwicky, editors, Natural Language Parsing, pages 206–250. Cambridge University Press, 1985.
Stuart M. Shieber. Evidence Against the Context-Freeness of Natural Language. Linguistics and Philosophy, 8(3):333–343, 1985.

Testing membership in context-free language

I'm working on a slot-machine mini-game application. The rules for what constitutes a winning prize are rather complex (n of a kind, n of any kind, specific sequences), and to make matters even more complicated, this code should work for a slot-machine with (n >= 3) reels.
So, after some thought, I believe defining a context-free language is the most efficient and extensible way to go. This way I could define the grammar in an XML file.
So my question is: given a string of symbols S, how do I go about testing if S is in a given context-free language? Would I simply exhaust rules until I'm out of valid rules/symbols, or is there a known algorithm that could help?
Also, a language like this seems non-regular, am I correct? I've never been good at proofs, so I've avoided trying.
Any comments on my approach would be appreciated as well.
Thanks.
"...given a string of symbols S, how do I go about testing if S is in
a given Context-Free Language?"
If a string w is in L(G); the process of finding a sequence of production rules of G by which w is derived is call parsing. So, you have to create a parse tree to search for some derivation. To do this you perform an exhaustive Breadth-First-Search. There is a serious issue that arises: The searching process may never terminate. To prevent endless searches you have to transform the grammer into what is known as normal form.
"Also, a language like this seems non-regular, am I correct?"
Not necessarily. Every regular language is context-free (because it can be described by a CFG), but not every context-free language is regular.
General context-free grammars are hard to parse efficiently.
However, there are methods to parse grammars in subsets of the context-free grammars.
For example, SLR and LL grammars are often used by compilers to parse programming languages, which are themselves context-free languages. To use these methods, your grammar must be in one of these "families" (remember: there are infinitely many grammars for each context-free language).
Some practical tools you might want to use, which are generally used for compilers, are JavaCC for Java and Bison for C/C++.
(Bison generates LALR(1) parsers, a close relative of SLR, and JavaCC generates LL(k) parsers.)
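To make "parsing a subset of the context-free grammars" concrete, here is a tiny hand-written recursive-descent (LL-style) recognizer in Python for the toy grammar S → aSb | ab (my own sketch; tools like JavaCC generate this kind of code from a grammar file, and the grammar as written would need left-factoring for a table-driven LL(1) parser, but hand-written code can simply peek ahead):

def parse_S(s, i=0):
    # returns the index just past one S, or raises SyntaxError
    if i < len(s) and s[i] == "a":
        i += 1
    else:
        raise SyntaxError("expected 'a' at position %d" % i)
    if i < len(s) and s[i] == "a":    # lookahead: another 'a' means S -> aSb
        i = parse_S(s, i)
    if i < len(s) and s[i] == "b":
        return i + 1
    raise SyntaxError("expected 'b' at position %d" % i)

def recognize(s):
    try:
        return parse_S(s) == len(s)
    except SyntaxError:
        return False

print(recognize("aaabbb"))  # True
print(recognize("aabbb"))   # False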
P.S.
For a specific slot machine with n slots and k symbols, the language is definitely regular, since there are at most k^n "words" in it, and every finite language is regular. Things obviously get complicated if you are looking for a grammar covering all slot machines.
Your best bet is to actually code this with a proper programming language. A CFG is overkill, because it can be extremely hard to encode some of your, as you say, "rather complex" rules. For example, grammars are poorly suited to talking about the number of things.
For example, how would you encode "the number of cherries is > the number of any other object" in such a language? How would the person you're giving the program to do so? CFGs cannot easily express such concepts, and regular expressions cannot sanely do so by any stretch.
The answer is that grammars are not right for this task, unless the slot machine is trying to make English sentences.
You also have to consider what happens when TWO or more "prize sequences" match! Assuming you want to give out the highest prize, you need an ordered list of recognizers. This is not to say you can't code your recognizers with (for example) regular expressions in addition to arbitrary functions. I'm just saying that general CFG parsing is overkill, because what CFGs get you over regular languages (i.e., regular expressions) is the ability to consider parse trees of arbitrary depth (like parentheses nested to level N or more), which is probably not what you care about.
This is not to say that you don't, for example, want to allow regular expressions. You can make that job easy by using a parser generator to recognize regexes involving cherries, bananas, and pears (see http://en.wikipedia.org/wiki/Comparison_of_parser_generators), though you might want to simply roll your own recursive descent parser (assuming, again, that you don't care about CFGs, especially if your tokens are of bounded length).
For example, here is how I might implement it, sketched this time in runnable Python (ideally you'd use a statically typechecked language with good list manipulation, which I can't think of off the top of my head):

rules = []

class Rule:
    def __init__(self, name, predicate):
        self.name = name
        self.predicate = predicate    # (symbols, counts) -> bool
        rules.append(self)            # adds them in order

##########################
Rule("All the same",
     lambda symbols, counts: len(counts) == 1)
Rule("No two-in-a-row",
     lambda symbols, counts:
         not any(a == b for a, b in zip(symbols, symbols[1:])))
Rule("More cherries than anything else",
     lambda symbols, counts:
         all(counts.get("cherry", 0) > c
             for token, c in counts.items() if token != "cherry"))
for token in ["cherry", "banana"]:    # ... and the rest of the symbols
    Rule("At least 50% " + token,
         # the default argument pins the loop variable inside the closure
         lambda symbols, counts, token=token:
             counts.get(token, 0) >= len(symbols) / 2)
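A quick sketch of how the ordered rule list would then be consulted (again my own illustration; Counter is from Python's standard library):

from collections import Counter

def best_prize(symbols):
    counts = Counter(symbols)
    for rule in rules:                 # first match = highest prize
        if rule.predicate(symbols, counts):
            return rule.name
    return None                        # no prize this spin

print(best_prize(["cherry", "cherry", "cherry"]))  # All the same
print(best_prize(["cherry", "banana", "cherry"]))  # No two-in-a-row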

What does S-attributed and L-attributed grammar mean?

I'm reading a compiler book and got kind of confused when it says "an S-attributed grammar is also an L-attributed grammar". I couldn't understand why. Can someone make it clear (an example would be great)? Thanks.
L-attributed grammar
L-attributed grammars are a special type of attribute grammar. They allow the attributes to be evaluated in one left-to-right traversal of the abstract syntax tree. As a result, attribute evaluation in L-attributed grammars can be incorporated conveniently into top-down parsing. Many programming languages are L-attributed. Special types of compilers, the narrow compilers, are based on some form of L-attributed grammar. These are comparable with S-attributed grammars, which are used for code synthesis.
S-attributed grammar
S-Attributed Grammars are a class of attribute grammars characterized by having no inherited attributes. Inherited attributes, which must be passed down from parent nodes to children nodes of the abstract syntax tree during the semantic analysis of the parsing process, are a problem for bottom-up parsing because in bottom-up parsing, the parent nodes of the abstract syntax tree are created after creation of all of their children. Attribute evaluation in S-attributed grammars can be incorporated conveniently in both top-down parsing and bottom-up parsing. Yacc is based on the S-attributed approach.
Any S-attributed grammar is also an L-attributed grammar.
In L-attributed grammars, attribute evaluation can be performed in a single left-to-right traversal. Since attributes in S-attributed grammars are never inherited, nothing prevents you from doing just that. As such, you can say an S-attributed grammar conforms to that defining characteristic of an L-attributed grammar. A small sketch of purely synthesized evaluation follows.
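To make the distinction concrete, here is a minimal Python sketch (my own toy example, not from any particular compiler book) of purely synthesized evaluation: every attribute is computed from the children alone, so one bottom-up pass suffices, which is exactly why bottom-up (yacc-style) parsers can evaluate S-attributed definitions on the fly.

class Node:
    def __init__(self, kind, children=(), value=None):
        self.kind = kind          # "num", "add", or "mul"
        self.children = children
        self.value = value        # lexical value for "num" leaves

def synthesize_val(node):
    # the synthesized attribute "val" depends only on the children's "val"
    if node.kind == "num":
        return node.value
    left, right = (synthesize_val(c) for c in node.children)
    return left + right if node.kind == "add" else left * right

# (2 + 3) * 4: attribute values flow strictly upward from the leaves
tree = Node("mul", (Node("add", (Node("num", value=2),
                                 Node("num", value=3))),
                    Node("num", value=4)))
print(synthesize_val(tree))  # 20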
Simply put, an S-attributed grammar is strictly synthesized: attribute values flow only upward through the parse tree,
whereas an L-attributed grammar can have both synthesized and inherited attributes, subject to rules such as inherited information always being passed from left to right.
I hope this helps you out.
