Chomsky Normal Form Conversion Algorithm - complexity-theory

Why do we add a new start variable S0 with the rule S0 -> S when we convert a grammar to Chomsky normal form? What goes wrong if we do not do that?
At first I thought it was because of epsilon rules. But we do not remove an epsilon rule from the start variable. So what is the benefit of adding S0 -> S?
Thanks

Depending on whether the empty string is in the language, you might have the rule $S \to \epsilon$ (or $S_0 \to \epsilon$). If $S$ could also appear on the right-hand side of rules, this rule could delete an arbitrary number of occurrences of $S$. Because we do not want the start symbol to appear on any right-hand side, we introduce a new one.
This way we get exactly one more symbol per application of a rule A -> BC.
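To make this concrete, here is a minimal Python sketch. The rule representation and the helper names `start_on_rhs` and `add_fresh_start` are assumptions made for illustration, not part of any library. It checks whether the start symbol occurs on a right-hand side and adds a fresh start symbol with a single unit rule, which is the usual first step of the conversion.

```python
# Rules are (lhs, rhs) pairs; rhs is a tuple of symbols, () stands for epsilon.
# This representation is an assumption chosen for the sketch.

def start_on_rhs(rules, start):
    """Return True if the start symbol occurs on some right-hand side."""
    return any(start in rhs for _, rhs in rules)

def add_fresh_start(rules, start):
    """Prepend a rule fresh -> start, where fresh is a symbol not used so far."""
    used = {lhs for lhs, _ in rules} | {sym for _, rhs in rules for sym in rhs}
    fresh = start + "0"
    while fresh in used:
        fresh += "0"
    return [(fresh, (start,))] + list(rules), fresh

# Example grammar: S -> a S b | epsilon
rules = [("S", ("a", "S", "b")), ("S", ())]
new_rules, new_start = add_fresh_start(rules, "S")

print(start_on_rhs(rules, "S"))            # S occurs in "a S b"
print(start_on_rhs(new_rules, new_start))  # the fresh symbol is on no right-hand side
```

After this step an epsilon rule on the new start symbol can only ever be used as the first and only derivation step, so it cannot erase symbols in the middle of a derivation.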

I think I have some explanation. If a grammar is like this:
S --> S1
S1 --> S
S1 --> a
Then, at the step of removing unit rules, since no specific order is prescribed, we might remove S --> S1 first and we would have:
S1 --> S1
S1 --> a
and the start variable is entirely removed.

Related

What is the meaning of a double slash `//` after the predicate name in Prolog, appearing in the context of DCGs?

Some "predicate indicators" (this is the ISO Standard terminology for a syntactic name/arity expression denoting a predicate or functor; both of these terms are equivalent) are not content with a single slash, but actually take two. This always occurs in the context of DCGs. Examples:
syntax_error//1: "Throw the syntax error Error at the current location of the input. This predicate is designed to be called from the handler of phrase_from_file/3."
js_expression(+Expression)//: "Emit a single JSON argument."
According to the recent drafts WDTR 13211-3 (3.19) this is called a non-terminal indicator. Similar to a predicate indicator (3.131) it is used to denote one particular non-terminal.
Note that most implementations translate a non-terminal nt//n to a predicate nt/n+2. You cannot rely on the precise way of translation, though. And thus the outcome of calling a non-terminal directly by calling the corresponding predicate, that is, with the same name and two extra arguments is not defined. In particular the second additional argument has to be handled with care. A direct use might violate steadfastness, in particular when using dcg-semicontext.
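The arity arithmetic in the last paragraph can be written down as a small Python sketch. The function name `nonterminal_to_predicate` is hypothetical, and the sketch only reflects the common nt//n to nt/n+2 convention described above, which, as noted, a program should not rely on.

```python
def nonterminal_to_predicate(indicator):
    """Map a non-terminal indicator 'name//N' to the predicate indicator
    'name/(N+2)' that most implementations use after DCG translation.
    This mirrors a common convention only; it is not guaranteed by any system."""
    name, arity = indicator.split("//")
    return f"{name}/{int(arity) + 2}"

print(nonterminal_to_predicate("syntax_error//1"))  # syntax_error/3
print(nonterminal_to_predicate("nt//0"))            # nt/2
```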
What is the meaning of a double slash // after the predicate name in Prolog, appearing in the context of DCGs?
It is used by the term rewrite system of Prolog (SWI-Prolog src), but for a person it lets you know that the predicate is a DCG and has two hidden arguments added to the end of the predicate.
For example here is a very simple DCG that has 1 visible argument.
simple_dcg(X) -->
    { X is 1 + 2 }.
When the listing is seen
?- listing(simple_dcg).
simple_dcg(X, A, B) :-
    X is 1+2,
    B=A.

true.
the two extra hidden arguments (A, B) appear.
If you have been following my EDCG questions on SWI-Prolog forum then you know it can get much more complicated.

Recognize phrases ending by "?" from a given text in Prolog

I'm going to write a program in Prolog to analyze a text and recognize the questions within it.
Given a text, the program has to recognize all sentences ending with a question mark and save them in a list. Then every element of that list (that is, each phrase ending with "?") will be analyzed and simplified to make sure it starts with a WH-question word.
Here is an example:
"What is climate change?
The planet's climate has constantly been changing over geological time. [...]
What is the "greenhouse effect"?
The greenhouse effect refers to the way the Earth's atmosphere traps some of the energy from the Sun. [...].
The question is: how will these balance out? "
The list should contain: ["What is climate change?","What is the greenhouse effect ?", " how will these balance out?"]
Using split_string/4 I obtain this list
L = ["What is climate change", "The planet's (...). What is the greenhouse effect" , "The greenhouse (...). The question is: how will these balance out?"]
I don't know how to analyze and further split each element of the list in order to obtain the first list shown above.
Can you help me, please? Thanks :)
I suggest feeding a DCG with the output of tokenize_atom/2:
?- tokenize_atom('What is climate change?', L).
L = ['What', is, climate, change, ?].
Then you can capture all the content between literals 'What' and ?.
To accomplish the capture, library(dcg/basics) has string//1 that could help.
Example:
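:- use_module(library(dcg/basics)).

% wh_capture(+Text, -Questions): tokenize Text, then run the grammar below.
wh_capture(P, Cs) :-
    tokenize_atom(P, Tks),
    phrase(wh_capture(Cs), Tks).

% wh_capture//1: collect every token run from 'What' up to the next ?.
wh_capture([]) --> [].
wh_capture([C|Cs]) -->
    ['What'], string(Content), [?], {C=['What'|Content]},
    wh_capture(Cs).
% Skip everything up to the next full stop (the atom is quoted defensively).
wh_capture(Cs) --> string(_), ['.'], wh_capture(Cs).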
Usage:
?- wh_capture('What about you? Phrase to skip. What now?',L).
L = [['What', about, you], ['What', now]]
string//1 has a peculiar behaviour: on backtracking it extends the match. I would usually place a cut after the end-of-sequence delimiter, like this:
wh_capture([C|Cs]) -->
    ['What'], string(Content), [?], {C=['What'|Content]},
    !, wh_capture(Cs).
Your approach is naive for any language (and this is a very deep subject), so don't try to reinvent the wheel (at least until you know what to reinvent). Search for a) parsing and then b) [Prolog] natural language processing.
Basically, before any further analysis, you need to tokenize first (so that you don't run into a million problems later).
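The "split into sentences first, then filter" idea can be sketched in a few lines of Python rather than Prolog. The function name `extract_questions` is hypothetical, and the regular expression is a deliberately crude sentence splitter: abbreviations, ellipses such as "[...]" and quoted punctuation will confuse it, which is exactly why a proper tokenizer is recommended above.

```python
import re

def extract_questions(text):
    """Return the sentences of text that end with a question mark.

    A sentence is approximated as a run of characters that contains no
    '.', '?' or '!' followed by exactly one such terminator.
    """
    sentences = re.findall(r"[^.?!]+[.?!]", text)
    return [s.strip() for s in sentences if s.strip().endswith("?")]

print(extract_questions("What about you? Phrase to skip. What now?"))
# ['What about you?', 'What now?']
```

Trimming a lead-in such as "The question is:" would be a separate post-processing step on each extracted sentence.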

How to build Prolog grammar parse tree consisting of two sentences joined by a conjunction

I have the following Prolog code to recognise a sentence. Notice that it also builds a parse tree for the grammar.
sentence(plural,s(Np,Vp)) -->
    noun_phrase(plural,Np),
    verb_phrase(plural,Vp).
sentence(singular,s(Np,Vp)) -->
    noun_phrase(singular,Np),
    verb_phrase(singular,Vp).
I need a predicate that can recognise a compound sentence (one that consists of two sentences joined by a conjunction). I came up with the following code, but execution fails. Of course, my Prolog code also contains definitions for noun_phrase, verb_phrase and so on.
compound_sentence(comp_s(s1,Conj,s2)) -->
    sentence(_,s1(Np,Vp)),
    conjuction(_,Conj),
    sentence(_,s2(Np,Vp)).
For example, this query fails:
?- phrase(compound_sentence(_),
          [the,reboot,is,a,success,and,the,user,does,a,save]).
How do you go about detecting compound sentences?
The query
phrase(compound_sentence(_), ...)
fails because (a) the two subgoals sentence(_, s1(Np,Vp)) and sentence(_, s2(Np,Vp)) cannot match the parse tree that sentence//2 builds, sentence(_, s(Np,Vp)), and (b) the two sentences cannot have the same Np and Vp. Try something like this:
compound_sentence(comp_s(S1,Conj,S2)) -->
    sentence(_, S1),
    conjuction(_,Conj),
    sentence(_, S2).
where S1 = s(Np1, Vp1) corresponding to the first sentence, and S2 = s(Np2, Vp2) for the second.

Chomsky Hierarchy Type 2: Nonterminal symbols on the left-hand side

Is it allowed to have two nonterminal symbols on the left-hand side of a rule in a Type-2 grammar?
I have to define a Type-2 grammar for the language L2. It would be easy if a rule like CB->BC were allowed, but I'm not sure whether this would violate any rules. In Type 1 it would be easy.
Thank you!
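No. According to the Chomsky hierarchy, a Type-2 grammar is characterized by rules of the form $A \rightarrow \alpha$, where $A$ is a single variable and $\alpha \in (V \cup T)^{\ast}$. A rule such as CB->BC has two variables on its left-hand side, so it is not a Type-2 rule.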
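The restriction on left-hand sides is easy to check mechanically. The following Python sketch assumes a rule representation of (lhs, rhs) tuples of symbols and a hypothetical helper `is_context_free`; both are illustrative choices, not a standard API.

```python
def is_context_free(rules, variables):
    """Return True if every rule has exactly one variable on its left-hand side.

    rules is a list of (lhs, rhs) pairs, each side a tuple of symbols.
    The right-hand side is unrestricted: any string over variables and terminals.
    """
    return all(len(lhs) == 1 and lhs[0] in variables for lhs, _ in rules)

variables = {"S", "B", "C"}
cf_rules = [(("S",), ("a", "S", "B", "C")), (("S",), ())]
swap_rule = (("C", "B"), ("B", "C"))  # the rule CB -> BC from the question

print(is_context_free(cf_rules, variables))                # True
print(is_context_free(cf_rules + [swap_rule], variables))  # False
```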

controlling order of variables in an expression

In Mathematica, how do you change the order of importance of variables? For example, if I enter b+c+a+d, I get a+b+c+d, but I want b and d to precede the other variables, so that I get b+d+a+c.
Note: I'd like to use it where + is non-commutative.
First you need to define an ordering function like:
In[1]:= CPOrdering[a]=3;
CPOrdering[b]=1;
CPOrdering[d]=2;
CPOrdering[c]=4;
For more complicated examples, though, you should probably be smarter about it than this, i.e. use pattern matching.
Then you can sort expressions using
In[5]:= CirclePlus[a,b,c,d]
SortBy[%,CPOrdering]
Out[5]= a\[CirclePlus]b\[CirclePlus]c\[CirclePlus]d
Out[6]= b\[CirclePlus]d\[CirclePlus]a\[CirclePlus]c
This can then be automated using something like
CPOrdering[a_, b_] := CPOrdering[a] < CPOrdering[b]
CirclePlus[a__] /; (!OrderedQ[{a}, CPOrdering]) := CirclePlus @@ SortBy[{a}, CPOrdering]
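The same "assign each symbol a rank, then sort by rank" idea, shown as a language-neutral Python sketch. The names `priority` and `reorder` are made up for illustration, and the ranks copy the CPOrdering values above.

```python
# Rank of each symbol, copied from the CPOrdering definitions above.
priority = {"b": 1, "d": 2, "a": 3, "c": 4}

def reorder(symbols):
    """Sort symbols by their assigned rank instead of alphabetically."""
    return sorted(symbols, key=priority.__getitem__)

print(reorder(["a", "b", "c", "d"]))  # ['b', 'd', 'a', 'c']
```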
The underlying reason b+c+a+d becomes a+b+c+d in Mathematica is that Plus has the attribute Orderless. In general, a symbol f with the attribute Orderless means that in an expression f[e1, e2, e3] the elements ei are sorted into canonical order; Mathematica's canonical order is the one used by OrderedQ and Ordering.
Orderless is even accounted for during pattern matching:
In[47]:= a+b+c+d /. a+c -> e
Out[47]= b+d+e
It's highly, highly recommended that you do NOT remove the Orderless attribute from Plus because the consequences could be dire for lots of functionality in Mathematica.
As other posters have noted, your best bet is to simply define your own function that is NOT Orderless, and will therefore preserve argument order. You might also find HoldForm useful in very limited circumstances.
Also note that nothing stops you from typesetting symbols in an expression in whatever order you want in a notebook, as long as you don't evaluate-in-place, etc.
So, don't use "+", because Plus[] IS commutative.
Define your own myPlus[x_,y_]:= .... whatever.
If you have an idea of what your new Plus[] should do, post it and we may try to help you with the definition.
HTH!
PS> You may change the definition of Plus[] ... but :)
