Mildly Context-Sensitive Grammars - formal-languages

Can anyone clearly explain what exactly mildly context-sensitive grammars are?
Can these grammars be used to model natural languages?
Moreover, do grammars like indexed grammar, head grammar and tree grammar belong to the mildly context-sensitive grammars?

The term ‘mildly context-sensitive grammars’ was introduced by Joshi (1985). The intention was to characterize formal grammars that are adequate for the description of natural language. They should be more powerful than context-free grammars (which Huybregts [1984] and Shieber [1985] had shown to be inadequate for natural language) but less powerful than general context-sensitive grammars (which, among other drawbacks, cannot be parsed in polynomial time).
Joshi’s characterization of mildly context-sensitive grammars was biased toward his work on tree-adjoining grammar (TAG). However, together with his students Vijay Shanker and David Weir, Joshi soon discovered that TAGs are equivalent, in terms of the generated languages, to the independently introduced head grammar, linear indexed grammar, and combinatory categorial grammar. This showed that the notion of mild context-sensitivity is a very general one and not tied to a specific formalism.
Today, the term mildly context-sensitive grammar formalism is used to refer to several grammar formalisms that have some or all of the characteristic properties put forth by Joshi. Many of them are being researched and applied in descriptive and, most prominently, computational linguistics.
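To get a concrete feel for "more powerful than context-free but still efficiently parsable", a textbook example is the language {aⁿbⁿcⁿ}, which no context-free grammar generates but which mildly context-sensitive formalisms such as TAG can, and which is trivially recognizable in polynomial (here linear) time. The following Python check is my own illustration, not something from Joshi's paper:

def is_anbncn(s):
    """Membership test for {a^n b^n c^n : n >= 0}, done with a single comparison."""
    n = len(s) // 3
    return len(s) == 3 * n and s == "a" * n + "b" * n + "c" * n

print(is_anbncn("aabbcc"), is_anbncn("aabcc"))   # True False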
References
Riny Huybregts. The Weak Inadequacy of Context-Free Phrase Structure Grammars. In Ger de Haan, Mieke Trommelen, and Wim Zonneveld, editors, Van periferie naar kern, pages 81–99. Foris, Dordrecht, The Netherlands, 1984.
Aravind K. Joshi. Tree Adjoining Grammars: How Much Context-Sensitivity Is Required to Provide Reasonable Structural Descriptions?. In David R. Dowty, Lauri Karttunen, and Arnold M. Zwicky, editors, Natural Language Parsing, pages 206–250. Cambridge University Press, 1985.
Stuart M. Shieber. Evidence Against the Context-Freeness of Natural Language. Linguistics and Philosophy, 8(3):333–343, 1985.

Related

How to translate a mathematically described algorithm to a programming language?

Suppose an algorithm is described in a paper of your choice with a bunch of symbols and very specialized notation.
How do I learn to read such algorithm descriptions and turn them into a computer program?
The following picture describes an algorithm to calculate the incremental local outlier factor:
What is the best approach to translate that to a programming language?
Your answer can also help me by pointing to articles that describe the symbols being used, the notation in general and tutorials on how to read and understand such papers.
You seem to be asking for what is generally described as part of a course in first-order predicate calculus. A general introduction (including notation) can be found here or in your library.
Generally, notation like ∀x ∈ S means "for all x that are in the set S" and would often be implemented in a programming language using a for loop or similar looping structure, depending on the particular language.
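For instance, a universally quantified condition such as ∀x ∈ S : P(x) usually becomes a loop with an early exit. A minimal Python sketch (my own, with a made-up predicate P):

def for_all(S, P):
    """Return True iff the predicate P holds for every element x of the set S."""
    for x in S:
        if not P(x):
            return False
    return True

print(for_all({1, 2, 3}, lambda x: x > 0))   # True: every element is positive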
Introductory books on algorithms will also likely be useful in answering the questions you have. An example would be Sedgewick's book Algorithms in C if your target computer language is C.
I think the question can only be answered in a very broad sense. The pseudocode notation in algorithmic papers does not follow a consistent standard, and sometimes there is no pseudocode at all for the algorithms in question. My general advice would be to study the problem being covered as much as possible and to get into the mathematics a bit.
I have written a translator that is able to convert some simple mathematical formulas into various programming languages. This is the input in mathematical notation:
distance_formula(x1,y1,x2,y2) = sqrt((x1-x2)^2+(y1-y2)^2)
and this is the output in Python:
def distance_formula(x1,y1,x2,y2):
    return math.sqrt(((x1-x2)**2)+((y1-y2)**2))
Similarly, there are several pseudocode translators that have been designed to convert descriptions of algorithms into equivalent programs.

How to define the grammar for TeX/LaTeX and Makefile?

Both are technologies that are expressed via languages full of macros, but in more technical terms, what kind of grammars are they, and how can one describe their properties?
I'm not interested in a graphical representation; by properties I mean a descriptive phrase about this subject, so please don't just go for a BNF/EBNF-oriented response full of arcs and graphs.
I assume that both are context-free grammars, but this is a big family of grammars. Is there a way to describe these two in a more precise way?
Thanks.
TeX can change the meaning of characters (their category codes) at run time, so it's not context-free.
Is my language Context-Free?
I believe that every useful language ends up being Turing-complete, reflexive, etc.
Fortunately that is not the end of the story.
Most parser generation tools (yacc, ANTLR, etc.) handle at most context-free grammars (CFGs).
So we divide the language processing problem into 3 steps:
Build an over-generating CFG; this is the "syntactical" part that constitutes a solid base to which we add the other components,
Add "semantic" constraints (extra syntactic and semantic restrictions),
Add the main semantics (static semantics, pragmatics, attribute semantics, etc.)
Writing a context-free grammar is a very standard way of speaking about languages!
It is a very clear and didactic notation for languages!! (even if it sometimes does not tell the whole truth).
When we say that a language "is not context-free, is Turing-complete, ...", you can translate that to "count on lots of extra semantic work" :)
How can I speak about it?
Many choices are available. I like to do a subset of the following:
Write a clear, semantics-oriented CFG
for each symbol (terminal or nonterminal), add/define a set of semantic attributes
for each production rule: add syntactic/semantic constraint predicates
for each production rule: add a set of equations to define the values of the attributes
for each production rule: add an English explanation, examples, etc.
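For example, here is a minimal Python sketch (my own illustration, not part of the original answer) of what one production with its attributes, constraint predicate, and attribute equations might look like; all names are made up:

# Production:     Expr0 -> Expr1 '+' Term
# Attributes:     every node carries a 'type' and a 'value'
# Constraint:     both operands must be numeric
# Attribute eqs:  Expr0.type = 'num';  Expr0.value = Expr1.value + Term.value

def reduce_plus(expr1, term):
    if expr1["type"] != "num" or term["type"] != "num":   # constraint predicate
        raise TypeError("'+' expects numeric operands")
    # attribute equations computing the synthesized attributes of Expr0
    return {"type": "num", "value": expr1["value"] + term["value"]}

print(reduce_plus({"type": "num", "value": 2}, {"type": "num", "value": 3}))
# {'type': 'num', 'value': 5}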

Context-free grammars versus context-sensitive grammars?

Can someone explain to me why grammars [context-free grammar and context-sensitive grammar] of this kind accept a string?
What I know is
A context-free grammar is a formal grammar in which every production (rewrite) rule is of the form V → w,
where V is a single nonterminal symbol and w is a string of terminals and/or nonterminals; w can be empty.
A context-sensitive grammar is a formal grammar in which the left-hand side and right-hand side of any production (rewrite) rule may be surrounded by a context of terminal and nonterminal symbols.
But how can I explain why these grammars accept a string?
An important detail here is that grammars do not accept strings; they generate strings. Grammars are descriptions of languages that provide a means for generating all possible strings contained in the language. In order to tell if a particular string is contained in the language, you would use a recognizer, some sort of automaton that processes a given string and says "yes" or "no."
A context-free grammar (CFG) is a grammar where (as you noted) each production has the form A → w, where A is a nonterminal and w is a string of terminals and nonterminals. Informally, a CFG is a grammar where any nonterminal can be expanded out to any of its productions at any point. The language of a grammar is the set of strings of terminals that can be derived from the start symbol.
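To make the generate-versus-recognize distinction concrete, here is a small Python sketch (my own illustration, not part of the original answer) that derives strings from a toy CFG by repeatedly expanding nonterminals:

import random

# Toy CFG:  S -> 'a' S 'b' | ε   (its language is { a^n b^n : n >= 0 })
GRAMMAR = {"S": [["a", "S", "b"], []]}

def generate(symbol="S"):
    """Derive a terminal string by randomly expanding each nonterminal."""
    if symbol not in GRAMMAR:               # terminal symbols stand for themselves
        return symbol
    production = random.choice(GRAMMAR[symbol])
    return "".join(generate(s) for s in production)

print(generate())   # e.g. 'aabb' -- some string of the language {a^n b^n}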
A context-sensitive grammar (CSG) is a grammar where each production has the form wAx → wyx, where w and x are strings of terminals and nonterminals and y is a nonempty string of terminals and nonterminals. In other words, the productions give rules saying "if you see A in a given context, you may replace A by the string y." It's unfortunate that these grammars are called "context-sensitive grammars" because it means that "context-free" and "context-sensitive" are not opposites, and it means that there are certain classes of grammars that arguably take a lot of contextual information into account but aren't formally considered to be context-sensitive.
To determine whether a string is contained in a CFG or a CSG, there are many approaches. First, you could build a recognizer for the given grammar. For CFGs, the pushdown automaton (PDA) is a type of automaton that accepts precisely the context-free languages, and there is a simple construction for turning any CFG into a PDA. For the context-sensitive grammars, the automaton you would use is called a linear bounded automaton (LBA).
However, the approaches above, if treated naively, are not very efficient. To determine whether a string is contained in the language of a CFG, there are far more efficient algorithms. For example, many grammars can have LL(k) or LR(k) parsers built for them, which allow you to decide in linear time whether a string is contained in the grammar. All grammars can be parsed using the Earley parser, which in O(n³) can determine whether a string of length n is contained in the grammar (interestingly, it can parse any unambiguous CFG in O(n²), and with lookaheads can parse any LR(k) grammar in O(n) time!). If you were purely interested in the question "is string x contained in the language generated by grammar G?", then one of these approaches would be excellent. If you wanted to know how the string x was generated (by finding a parse tree), you can adapt these approaches to also provide this information. However, parsing CSGs is, in general, PSPACE-complete, so there are no known parsing algorithms for them that run in worst-case polynomial time. There are some algorithms that in practice tend to run quickly, though. The authors of Parsing Techniques: A Practical Guide (see below) have put together a fantastic page containing all sorts of parsing algorithms, including one that parses context-sensitive languages.
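As a concrete (if less efficient than LL/LR parsing) illustration of CFG membership testing, here is a minimal CYK-style recognizer in Python for a toy grammar in Chomsky normal form; this is my own sketch, not code from the sources mentioned here, and it runs in the O(n³) time quoted above:

# Toy grammar in Chomsky normal form for { a^n b^n : n >= 1 }:
# S -> A B | A X,  X -> S B,  A -> 'a',  B -> 'b'
UNARY  = {"a": {"A"}, "b": {"B"}}
BINARY = {("A", "B"): {"S"}, ("A", "X"): {"S"}, ("S", "B"): {"X"}}

def cyk(word, start="S"):
    """Return True iff the grammar derives word (classic CYK table filling)."""
    n = len(word)
    if n == 0:
        return False
    # table[i][j] holds the nonterminals that derive word[i..j]
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, ch in enumerate(word):
        table[i][i] = set(UNARY.get(ch, set()))
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            for k in range(i, j):                      # try every split point
                for b in table[i][k]:
                    for c in table[k + 1][j]:
                        table[i][j] |= BINARY.get((b, c), set())
    return start in table[0][n - 1]

print(cyk("aabb"), cyk("aab"))   # True False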
If you're interested in learning more about parsing, consider checking out the excellent book "Parsing Techniques: A Practical Guide, Second Edition" by Grune and Jacobs, which discusses all sorts of parsing algorithms for determining whether a string is contained in a grammar and, if so, how it is generated by the parsing algorithm.
As was said before, a grammar doesn't accept a string; it is simply a way to generate the specific words of the language you are analyzing. In formal language theory the grammar plays the generative role, while the automaton does what you are describing, namely the recognition of specific strings.
In particular, you need a linear bounded automaton in order to recognize Type 1 languages (the context-sensitive languages in the Chomsky hierarchy).
A grammar for a specific language only lets you specify the properties of all the strings that belong to the set of strings of that language.
I hope that my explanation was clear.
One easy way to show that a grammar generates a string is to exhibit a derivation of that string using the production rules.

How does Prolog technically work? What's under the hood?

I want to learn more about the internals of Prolog and understand how this works.
I know how to use it. But not how it works internally. What are the names of the algorithms and concepts used in Prolog?
Probably it builds some kind of tree structure or directed object graph, and then upon queries it traverses that graph with a sophisticated algorithm. A depth-first search, maybe. There might be some source code around, but it would be great to read about it from a high-level perspective first.
I'm really new to AI and understanding Prolog seems to be a great way to start, imho. My idea is to try to rebuild something similar while skipping the parser part completely. I need to know in which directions to focus my research efforts.
What are the names of the algorithms and concepts used in Prolog?
Logic programming
Depth-first, backtracking search
Unification
See Sterling & Shapiro, The Art of Prolog (MIT Press) for the theory behind Prolog.
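To give a flavor of what unification means in code, here is my own minimal Python sketch (not taken from The Art of Prolog): uppercase strings play the role of Prolog's logic variables, and the occurs check is omitted for brevity.

def unify(x, y, subst):
    """Return a substitution (dict) that makes x and y equal, or None on failure."""
    if subst is None:
        return None
    if x == y:
        return subst
    if isinstance(x, str) and x.isupper():                 # x is a logic variable
        return unify(subst[x], y, subst) if x in subst else {**subst, x: y}
    if isinstance(y, str) and y.isupper():                 # y is a logic variable
        return unify(y, x, subst)
    if isinstance(x, tuple) and isinstance(y, tuple) and len(x) == len(y):
        for xi, yi in zip(x, y):                           # unify compound terms argument by argument
            subst = unify(xi, yi, subst)
        return subst
    return None

print(unify(("parent", "X", "bob"), ("parent", "alice", "bob"), {}))
# {'X': 'alice'}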
Probably it builds some kind of tree structure or directed object graph, and then upon queries it traverses that graph with a sophisticated algorithm. A depth-first search, maybe.
It doesn't build the graph explicitly; that wouldn't even be possible with infinite search spaces. Check out the first chapters of Russell & Norvig for the concept of state-space search. Yes, it does depth-first search with backtracking, but no, that isn't very sophisticated. It's just very convenient, and programming alternative search strategies isn't terribly hard in Prolog.
understanding Prolog seems to be a great way to start, imho.
Depends on what you want to do, but knowing Prolog certainly doesn't hurt. It's a very different way of looking at programming. Knowing Prolog helped me understand functional programming very quickly.
My idea is to try to rebuild something similar while skipping the parser part completely
You mean skipping the Prolog syntax? If you happen to be familiar with Scheme or Lisp, then check out section 4.4 of Abelson & Sussman where they explain how to implement a logic programming variant of Scheme, in Scheme.
AI is a wide field, Prolog only touches symbolic AI. As for Prolog, the inner workings are too complex to explain here, but googling will give you plenty of resources. E.g. http://www.amzi.com/articles/prolog_under_the_hood.htm .
Check also Wikipedia articles to learn about the other areas of AI.
You might also want to read about the Warren Abstract Machine
Typically, Prolog code is translated to WAM instructions, which are then executed more efficiently.
I would add:
Programming Languages: An Interpreter-Based Approach by Samuel N. Kamin. The book is out of print, but you may find it in a university library. It contains a Prolog implementation in Pascal.
Tim Budd's "The Kamin Interpreters in C++" (in PostScript)
The book by Sterling and Shapiro, mentioned by larsmans, actually contains an execution model of Prolog. It's quite nice and explains clearly "how Prolog works". And it's an excellent book!
There are also other sources you could try. Most notably, some Lisp books build pedagogically-oriented Prolog interpreters:
On Lisp by Paul Graham (in Common Lisp, using -- and perhaps abusing -- macros)
Paradigms of Artificial Intelligence Programming by Peter Norvig (in Common Lisp)
Structure and Interpretation of Computer Programs by Abelson and Sussman (in Scheme).
Of these, the last one is the clearest (in my humble opinion). However, you'd need to learn some Lisp (either Common Lisp or Scheme) to understand those.
The ISO core standard for Prolog also contains an execution model. The execution model is of interest since it gives a good model of control constructs such as cut !/0, if-then-else (->)/2, catch/3 and throw/1. It also explains how to conformantly deal with naked variables.
The presentation in the ISO core standard is not that bad. Each control construct is described in the form of a prose use case with a reference to an abstract Prolog machine consisting of a stack, etc. Then there are pictures that show the stack before and after execution of the control construct.
The cheapest source is ANSI:
http://webstore.ansi.org/RecordDetail.aspx?sku=INCITS%2FISO%2FIEC+13211-1-1995+%28R2007%29
In addition to the many good answers already posted, I add some historical facts on Prolog.
Wikipedia on Prolog: Prolog was created around 1972 by Alain Colmerauer with Philippe Roussel, based on Robert Kowalski's procedural interpretation of Horn clauses.
Alain was a French computer scientist and professor at Aix-Marseille University from 1970 to 1995. Retired in 2006, he remained active until he died in 2017. He was named Chevalier de la Legion d’Honneur by the French government in 1986.
The inner workings of Prolog are best explained by its inventor in the article "Prolog in 10 figures", published in Communications of the ACM, vol. 28, no. 12, December 1985.
Prolog uses a subset of first-order predicate logic, called Horn logic. The algorithm used to derive answers is called SLD resolution.
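To make "SLD resolution" a bit more tangible, here is my own minimal Python sketch of depth-first resolution over Horn clauses (not the actual machinery of a real Prolog system, which compiles to the WAM mentioned above). Uppercase strings stand for logic variables, and the renaming of rule variables ("standardizing apart") is omitted, which is fine for this tiny example.

def unify(x, y, s):
    # same idea as the unification sketch earlier in this thread,
    # repeated here so the snippet runs on its own
    if s is None or x == y:
        return s
    if isinstance(x, str) and x.isupper():
        return unify(s[x], y, s) if x in s else {**s, x: y}
    if isinstance(y, str) and y.isupper():
        return unify(y, x, s)
    if isinstance(x, tuple) and isinstance(y, tuple) and len(x) == len(y):
        for xi, yi in zip(x, y):
            s = unify(xi, yi, s)
        return s
    return None

# Horn clauses: (head, body) reads "head holds if every goal in body holds".
RULES = [
    (("parent", "alice", "bob"), []),
    (("parent", "bob", "carol"), []),
    (("grandparent", "X", "Z"), [("parent", "X", "Y"), ("parent", "Y", "Z")]),
]

def solve(goals, s):
    """Depth-first SLD resolution: yield every substitution that proves all goals."""
    if not goals:
        yield s
        return
    first, rest = goals[0], goals[1:]
    for head, body in RULES:            # try clauses in order; backtrack on failure
        s2 = unify(first, head, s)
        if s2 is not None:
            yield from solve(body + rest, s2)

print(next(solve([("grandparent", "X", "carol")], {})))
# {'Z': 'carol', 'X': 'alice', 'Y': 'bob'}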

A question about logic and the Curry-Howard correspondence

Could you please explain to me the basic connection between the fundamentals of logic programming and the phenomenon of syntactic similarity between type systems and conventional logic?
The Curry-Howard correspondence is not about logic programming, but functional programming. The fundamental mechanic of Prolog is justified in proof theory by John Robinson's resolution technique, which shows how it is possible to check whether logical formulae expressed as Horn clauses are satisfiable, that is, whether you can find terms to substitute for their logic variables that make them true.
Thus logic programming is about specifying programs as logical formulae, and the calculation of the program is some form of proof inference, in Prolog resolution, as I have said. By contrast, the Curry-Howard correspondence shows how proofs in a special formulation of logic, called natural deduction, correspond to programs in the lambda calculus, with the type of the program corresponding to the formula that the proof proves; computation in the lambda calculus corresponds to an important phenomenon in proof theory called normalisation, which transforms proofs into new, more direct proofs. So logic programming and functional programming correspond to different levels in these logics: logic programs match formulae of a logic, whilst functional programs match proofs of formulae.
There's another difference: the logics used are generally different. Logic programming generally uses simpler logics — as I said, Prolog is founded on Horn clauses, which are a highly restricted class of formulae where implications may not be nested, and there are no disjunctions, although Prolog recovers the full strength of classical logic using the cut rule. By contrast, functional programming languages such as Haskell make heavy use of programs whose types have nested implications, and are decorated by all kinds of forms of polymorphism. They are also based on intuitionistic logic, a class of logics that forbids use of the principle of the excluded middle, which Robinson's computational mechanism is based on.
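As a tiny, hedged illustration of the proofs-as-programs reading (my own sketch in Python type-hint notation, not part of the original answer): a total function of type A → B can be read as a proof that A implies B, so function application is modus ponens and composition is the hypothetical syllogism. Both are intuitionistically valid, so no excluded middle is needed.

from typing import Callable, TypeVar

A = TypeVar("A")
B = TypeVar("B")
C = TypeVar("C")

def modus_ponens(f: Callable[[A], B], a: A) -> B:
    """From a proof of A -> B and a proof of A, obtain a proof of B."""
    return f(a)

def syllogism(f: Callable[[A], B], g: Callable[[B], C]) -> Callable[[A], C]:
    """From proofs of A -> B and B -> C, build a proof of A -> C."""
    return lambda a: g(f(a))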
Some other points:
It is possible to base logic programming on more sophisticated logics than Horn clauses; for example, Lambda-prolog is based on intuitionistic logic, with a different computation mechanism than resolution.
Dale Miller has called the proof-theoretic paradigm behind logic programming the proof search as programming metaphor, to contrast with the proofs as programs metaphor that is another term used for the Curry-Howard correspondence.
Logic programming is fundamentally about goal directed searching for proofs. The structural relationship between typed languages and logic generally involves functional languages, although sometimes imperative and other languages - but not logic programming languages directly. This relationship relates proofs to programs.
So, logic programming proof search can be used to find proofs that are then interpreted as functional programs. This seems to be the most direct relationship between the two (as you asked for).
Building whole programs this way isn't practical, but it can be useful for filling in tedious details in programs, and there are some important examples of this in practice. A basic example of this is structural subtyping, which corresponds to filling in a few proof steps via a simple entailment proof. A much more sophisticated example is the type class system of Haskell, which involves a particular kind of goal-directed search; in the extreme this involves a Turing-complete form of logic programming at compile time.
