Context-sensitive grammar - formal-languages

I'm looking for a context-sensitive grammar that describes the following language:
L = { ww | w ∈ {a,b}*, |w| ≥ 1 }
I've got problems with the fact that no rules such as X -> ε are allowed and therefore I can't place any nonterminal indicating the "middle" of the word.
Is there any trick to the problem?
If you happen to know the answer, please help.

Sure, this is actually easy. In a context-sensitive grammar, you can have strings on the LHS; that's the context. So let's say you end up with a string like this:
abababWababab
Alright, so you don't want a rule like
W := -empty-
Excellent. How about these rules?
aWa := aa
aWb := ab
bWa := ba
bWb := bb
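Of course, this implies that you should avoid introducing W unless you're sure you're going to have a non-empty string. Note, however, that rules such as aWa := aa shrink the string by one symbol, so strictly speaking they are not context-sensitive (noncontracting) rules either. A common workaround is to fold the marker into a neighbouring symbol instead of deleting it.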
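Below is one way to avoid deleting a marker at all, sketched in Python. The construction and every nonterminal name in it are my own hypothetical choices, not taken from the answer above. The idea runs in four stages:

1. Generate w together with uppercase copies of its letters, marking the first and last copies.
2. Let the copies move right past the left-half letters.
3. "Settle" the copies starting from the right end.
4. When the first copy settles, send a "go" signal that turns the settled copies into terminals.

Every rule satisfies |lhs| ≤ |rhs|. Noncontracting grammars generate the same languages as context-sensitive grammars (apart from ε), so no erasing rule is needed. Because no rule shrinks a sentential form, an exhaustive search that discards forms longer than a bound finds every word up to that length.

```python
from itertools import product

# Hypothetical noncontracting grammar for L = { ww | w in {a,b}+ }.
# One character per symbol: terminals a, b; everything else is a nonterminal.
#   S, R  generate the left half w and uppercase copies of its letters
#   P/Q   copy of the FIRST letter (a/b);   C/D  copy of the LAST letter (a/b)
#   A/B   copies of the letters in between
#   x/y   a copy that has reached its final position (a/b)
#   g/h   "go" signal that turns settled copies into terminals (a/b)
RULES = [
    ("S", "aa"), ("S", "bb"),            # |w| = 1
    ("S", "aPR"), ("S", "bQR"),          # first letter of w plus its marked copy
    ("R", "aAR"), ("R", "bBR"),          # middle letters plus their copies
    ("R", "aC"), ("R", "bD"),            # last letter plus its end-marked copy
]
# Copies move right past left-half terminals (length-preserving swaps).
RULES += [(X + t, t + X) for X in "PQAB" for t in "ab"]
# The end-marked copy settles first; any other copy settles once its right
# neighbour has settled, so the settled region is always a suffix.
RULES += [("C", "x"), ("D", "y")]
RULES += [(X + z, c + z) for X, c in (("A", "x"), ("B", "y")) for z in "xy"]
# When the first copy settles, every copy has settled: start emitting terminals.
RULES += [(X + z, g + z) for X, g in (("P", "g"), ("Q", "h")) for z in "xy"]
RULES += [(g + z, t + n) for g, t in (("g", "a"), ("h", "b"))
          for z, n in (("x", "g"), ("y", "h"))]
RULES += [("g", "a"), ("h", "b")]


def language(max_len):
    """All terminal strings of length <= max_len derivable from S."""
    seen, frontier, words = {"S"}, ["S"], set()
    while frontier:
        form = frontier.pop()
        if set(form) <= {"a", "b"}:
            words.add(form)
            continue
        for lhs, rhs in RULES:
            start = form.find(lhs)
            while start != -1:
                new = form[:start] + rhs + form[start + len(lhs):]
                if len(new) <= max_len and new not in seen:
                    seen.add(new)
                    frontier.append(new)
                start = form.find(lhs, start + 1)
    return words


def expected(max_len):
    """{ ww : 1 <= |w| and |ww| <= max_len } by direct enumeration."""
    return {"".join(w) * 2 for n in range(1, max_len // 2 + 1)
            for w in product("ab", repeat=n)}
```

language(6) == expected(6) is a brute-force sanity check of this construction for short words, not a proof that it is correct for all lengths.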

Related

Is it possible to represent a context-free grammar with first-order logic?

Briefly, I have an EBNF grammar and therefore a parse tree, but I do not know whether there is a procedure to translate it into first-order logic.
For example:
DR ::= E and P
P ::= B | (and P)* | (or P)*
B ::= L | P (and L P)
L ::= a
Yes, there is. The general pattern for translating a production of the form
A ::= B C ... D
is to paraphrase it declaratively as saying
A sequence of terminals s is an A (or: A generates the sequence s, if you prefer that formulation) if:
s is the concatenation of s_1, s_2, ... s_n, and
s_1 is a B / B generates the sequence s_1, and
s_2 is a C / C generates the sequence s_2, and
...
s_n is a D / D generates the sequence s_n.
Assuming we write these in the obvious way using a generates predicate, and that we can write concatenation using a || operator, your first rule becomes (if I am right to guess that E and P are non-terminals and "and" is a terminal symbol) something like
∀s, s1, s2, s3. ( generates(E,s1)
    ∧ generates(and,s2)
    ∧ generates(P,s3)
    ∧ s = s1 || s2 || s3 ) ⊃ generates(DR,s)
To establish the consequent (i.e. prove that s is an A), prove the antecedents. As long as the grammar does actually generate some sentences, and as long as you have some premises defining the "generates" relation for terminal symbols, the proof will be straightforward.
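This declarative reading can be turned directly into a small checker. The following is a minimal Python sketch that assumes a hypothetical toy grammar shaped like the question's first rule; E and P are simplified, and "and", "or", "x", "y", "a" are treated as terminals. generates(symbol, s) holds if some alternative B C ... D of the symbol splits s into s1 || s2 || ... || sn, with each part generated by the corresponding symbol. A terminal generates exactly the one-token sequence consisting of itself.

```python
from functools import lru_cache

# Hypothetical toy grammar; alternatives are tuples so they can be cached.
GRAMMAR = {
    "DR": [("E", "and", "P")],
    "E": [("x",), ("y",)],
    "P": [("a",), ("a", "or", "P")],
}


@lru_cache(maxsize=None)
def generates(symbol, s):
    """True iff `symbol` generates the token tuple s."""
    if symbol not in GRAMMAR:            # premise for terminals: t generates (t,)
        return s == (symbol,)
    # s is a `symbol` if some alternative splits s as s1 || s2 || ... || sn
    return any(splits(alt, s) for alt in GRAMMAR[symbol])


@lru_cache(maxsize=None)
def splits(symbols, s):
    """True iff s = s1 || ... || sn with generates(symbols[i], s_i) for each i."""
    if not symbols:
        return s == ()
    head, rest = symbols[0], symbols[1:]
    # no rule in this toy grammar derives the empty sequence,
    # so every remaining part needs at least one token
    for i in range(1, len(s) - len(rest) + 1):
        if generates(head, s[:i]) and splits(rest, s[i:]):
            return True
    return False
```

For example, generates("DR", ("x", "and", "a", "or", "a")) is True, while generates("DR", ("x", "a")) is False. Trying every split point is exactly the "s is the concatenation of s_1, ..., s_n" step, done by brute force.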
Prolog definite-clause grammars are a beautiful instantiation of this pattern. It takes some of us a while to understand and appreciate the use of difference lists in DCGs, but they handle the partitioning of s into subsequences and the association of the subsequences with the different parts of the right hand side much more elegantly than the simple translation into logic given above.
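The difference-list idea can be imitated outside Prolog. The Python sketch below is an analogy only: each nonterminal maps an input S0 to every possible remainder S, and the remainder is threaded through B, C, ..., D just as a DCG threads S0 → S1 → ... → S. With this approach no explicit split points are needed. The toy grammar is the same hypothetical one as in the previous sketch, restated here so the block is self-contained.

```python
# Hypothetical toy grammar; alternatives are tuples of symbols.
GRAMMAR = {
    "DR": [("E", "and", "P")],
    "E": [("x",), ("y",)],
    "P": [("a",), ("a", "or", "P")],
}


def parse(symbol, s0):
    """Yield every remainder s such that `symbol` derives the prefix of s0 before s."""
    if symbol not in GRAMMAR:             # terminal: consume exactly one token
        if s0[:1] == (symbol,):
            yield s0[1:]
        return
    for alt in GRAMMAR[symbol]:
        yield from sequence(alt, s0)


def sequence(symbols, s0):
    """Thread the remainder through each symbol in turn, like S0, S1, ..., S."""
    if not symbols:
        yield s0
        return
    for s1 in parse(symbols[0], s0):
        yield from sequence(symbols[1:], s1)


def generates(symbol, tokens):
    """The whole input is a `symbol` if some parse leaves an empty remainder."""
    return any(rest == () for rest in parse(symbol, tuple(tokens)))
```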

A language that can be recognised by a TM but cannot be decided by a TM?

Is there a language which can be recognised by a TM but cannot be decided by a TM? What would be an example of such a language?
Would the answer be:
TM = {<M,w> | M is a TM that accepts input string w}
Could I be wrong?
What is the difference between decidability and recognisability?
In short, any string that is recognized by a TM is called TM-recognizable, whereas any string that is accepted by a TM is called TM-decidable.
For your first question - is there a language that is recognizable by a TM but not decidable by a TM? - the answer is "yes," and the language you've given, which is the universal language, is an example of such a language.
For your second question - what's the difference between decidability and recognizability? - the answer you've given is on the right track, but as written it is incorrect. Remember that decidability and recognizability are properties of languages, not strings. There's no such thing as a "decidable string" or a "recognizable string."
A language L is decidable if there's a TM M with the following properties: for every string w ∈ L, M accepts w, and for every string w ∉ L, M rejects w (so M halts on every input). In other words, if you don't know whether w is in L or not, you can run M on w, wait for it to give you an answer, and discover the answer.
A language L is recognizable if there's a TM M with the following properties: for every string w ∈ L, M accepts w, and for every string w ∉ L, M does not accept w (that is, either M loops on w, or M rejects w). In other words, if you are sure that w ∈ L and want to confirm this, you can run M on w, watch it accept w, and be certain that your answer was right, but if you didn't know in advance whether w is in L, you might not be able to use M to find out the answer, since M might loop on w.
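To make the asymmetry concrete, here is a toy Python model. It is an assumption for illustration, not a real Turing machine: a "machine" is a generator function that yields once per simulated step and finally returns True (accept) or False (reject), while a looping machine simply never returns. Running a decider with any sufficiently large step budget always produces an answer. Running a machine that is only a recognizer can exhaust the budget, and a timeout tells you nothing about whether the input is in the language.

```python
from itertools import count


def even_length(w):
    """A decider: halts on every input."""
    for _ in w:
        yield
    return len(w) % 2 == 0


def starts_with_a(w):
    """Only a recognizer: accepts words starting with 'a', loops on all others."""
    while not w.startswith("a"):
        yield
    return True


def run(machine, w, max_steps=None):
    """Simulate `machine` on w.

    Returns True/False if it halts within max_steps steps (unbounded if None),
    or None if the budget runs out, which says nothing about membership.
    """
    steps = machine(w)
    for _ in (count() if max_steps is None else range(max_steps)):
        try:
            next(steps)
        except StopIteration as halt:
            return halt.value
    return None
```

run(starts_with_a, "abc") returns True. Calling run(starts_with_a, "b") without a budget never returns, which is exactly the situation where M loops on a string w ∉ L.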

Using Evaluate with a Pure Function and SetDelayed

I want to evaluate f below by passing a list to some function:
f = {z[1] z[2], z[2]^2};
a = % /. {z[1]-> #1,z[2]-> #2};
F[Z_] := Evaluate[a] & @@ Z;
So now if I try F[{1,2}] I get {2, 4} as expected. But looking closer ?F returns the definition
F[Z_] := (Evaluate[a] &) @@ Z
which depends on the value of a, so if we set a=3 and then evaluate F[{1,2}], we get 3. I know that adding the last & makes the Evaluate[a] hold, but what is an elegant work around? Essentially I need to force the evaluation of Evaluate[a], mainly to improve efficiency, as a is in fact quite complicated.
Can someone please help out, and take into consideration that f has to contain an Array[z,2] given by some unknown calculation. So writing
F[Z_] := {Z[[1]]Z[[2]],Z[[2]]^2}
would not be enough, I need this to be generated automatically from our f.
Many thanks for any contribution.
Please consider asking your future questions at the dedicated StackExchange site for Mathematica.
Your questions will be much less likely to become tumbleweeds and may be viewed by many experts.
You can inject the value of a into the body of both Function and SetDelayed using With:
With[{body = a},
F[Z_] := body & @@ Z
]
Check the definition:
Definition[F]
F[Z$_] := ({#1 #2, #2^2} &) @@ Z$
You'll notice Z has become Z$ due to automatic renaming within nested scoping constructs but the behavior is the same.
In the comments you said:
And again it bothers me that if the values of z[i] were changed, then this workaround would fail.
While this should not be a problem after F[Z_] is defined as above, if you wish to protect the replacement done for a you could use Formal Symbols instead of z. These are entered with e.g. Esc$zEsc for Formal z. Formal Symbols have the attribute Protected and exist specifically to avoid such conflicts as this.
This looks much better in a Notebook than it does here:
f = {\[FormalZ][1] \[FormalZ][2], \[FormalZ][2]^2};
a = f /. {\[FormalZ][1] -> #1, \[FormalZ][2] -> #2};
Another approach is to do the replacements inside a Hold expression, and protect the rules themselves from evaluation by using Unevaluated:
ClearAll[f, z, a, F, Z]
z[2] = "Fail!";
f = Hold[{z[1] z[2], z[2]^2}];
a = f /. Unevaluated[{z[1] -> #1, z[2] -> #2}] // ReleaseHold;
With[{body = a},
F[Z_] := body & @@ Z
]
Definition[F]
F[Z$_] := ({#1 #2, #2^2} &) @@ Z$

Noncommutative Multiplication and Negative coefficients at the Beginning of an Expression in Mathematica

With the help of some very gracious stackoverflow contributors in this post, I have the following new definition for NonCommutativeMultiply (**) in Mathematica:
Unprotect[NonCommutativeMultiply];
ClearAll[NonCommutativeMultiply]
NonCommutativeMultiply[] := 1
NonCommutativeMultiply[___, 0, ___] := 0
NonCommutativeMultiply[a___, 1, b___] := a ** b
NonCommutativeMultiply[a___, i_Integer, b___] := i*a ** b
NonCommutativeMultiply[a_] := a
c___ ** Subscript[a_, i_] ** Subscript[b_, j_] ** d___ /; i > j :=
c ** Subscript[b, j] ** Subscript[a, i] ** d
SetAttributes[NonCommutativeMultiply, {OneIdentity, Flat}]
Protect[NonCommutativeMultiply];
This multiplication works well; however, it does not deal with negative values at the beginning of an expression, i.e.,
a**b**c + (-q)**c**a
should simplify to
a**b**c - q**c**a
and it will not.
In my multiplication, the variable q (and any integer scalar) is commutative; I am still trying to write a SetCommutative function, without success. I am not in desperate need of SetCommutative, it would just be nice.
It would also be helpful if I were able to pull all of the q's to the beginning of each expression, i.e.:
a**b**c + a**b**q**c**a
should simplify to:
a**b**c + q**a**b**c**a
and similarly, combining these two issues:
a**b**c + a**c**(-q)**b
should simplify to:
a**b**c - q**a**c**b
At the current time, I would like to figure out how to deal with these negative variables at the beginning of an expression and how to pull the q's and (-q)'s to the front as above. I have tried to deal with the two issues mentioned here using ReplaceRepeated (//.), but so far I have had no success.
All ideas are welcome, thanks...
The key to doing this is to realize that Mathematica represents a-b as a+((-1)*b), as you can see from
In[1]:= FullForm[a-b]
Out[1]= Plus[a,Times[-1,b]]
For the first part of your question, all you have to do is add this rule:
NonCommutativeMultiply[Times[-1, a_], b__] := - a ** b
or you can even catch the sign from any position:
NonCommutativeMultiply[a___, Times[-1, b_], c___] := - a ** b ** c
Update -- part 2. The general problem with getting scalars to the front is that the pattern _Integer in your current rule will only spot things that are manifestly integers. It won't even spot that q is an integer in a construction like Assuming[{Element[q, Integers]}, a**q**b].
To achieve this, you need to examine assumptions, a process that is probably too expensive to be put in the global transformation table. Instead I would write a transformation function that I could apply manually (and maybe remove the current rule from the global table). Something like this might work:
NCMScalarReduce[e_] := e //. {
NonCommutativeMultiply[a___, i_ /; Simplify@Element[i, Reals], b___]
:> i a ** b
}
The rule used above uses Simplify to explicitly query assumptions, which you can set globally by assigning to $Assumptions or locally by using Assuming:
Assuming[{q \[Element] Reals},
NCMScalarReduce[c ** (-q) ** c]]
returns -q c**c.
HTH
Just a quick answer that repeats some of the comments from the previous question.
You can remove a couple of the definitions and solve all of the parts of this question using a rule that acts on Times[i, c], where i is commutative and c has the default value Sequence[]:
Unprotect[NonCommutativeMultiply];
ClearAll[NonCommutativeMultiply]
NonCommutativeMultiply[] := 1
NonCommutativeMultiply[a___, (i:(_Integer|q))(c_:Sequence[]), b___] := i a**Switch[c, 1, Unevaluated[Sequence[]], _, c]**b
NonCommutativeMultiply[a_] := a
c___**Subscript[a_, i_]**Subscript[b_, j_] ** d___ /; i > j := c**Subscript[b, j]**Subscript[a, i]**d
SetAttributes[NonCommutativeMultiply, {OneIdentity, Flat}]
Protect[NonCommutativeMultiply];
This then works as expected
In[]:= a**b**q**(-c)**3**(2 a)**q
Out[]= -6 q^2 a**b**c**a
Note that you can generalize (_Integer|q) to work on more general commutative objects.
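Outside Mathematica, the bookkeeping that the Times[i, c] rule performs can be sketched in Python. This is an illustration under assumptions, not a port of the rule: a product is a list of factors, where each factor is an int, a symbol name, or a pair (k, f) meaning k*f (for example (-1, "c") for -c). Integers and the symbols in a hypothetical COMMUTING set are collected in front, and the noncommuting factors keep their relative order.

```python
COMMUTING = {"q"}  # assumption: q (like any integer) commutes with everything


def normalize(factors):
    """Pull integers and commuting symbols to the front of a noncommutative product.

    Returns (integer coefficient, {commuting symbol: power}, tuple of operators).
    """
    coeff, powers, ops = 1, {}, []
    for f in factors:
        while isinstance(f, tuple):      # unwrap scaled factors such as (2, "a")
            k, f = f
            coeff *= k
        if isinstance(f, int):
            coeff *= f
        elif f in COMMUTING:
            powers[f] = powers.get(f, 0) + 1
        else:
            ops.append(f)                # noncommuting: keep relative order
    return coeff, powers, tuple(ops)
```

For a**b**q**(-c)**3**(2 a)**q, the call normalize(["a", "b", "q", (-1, "c"), 3, (2, "a"), "q"]) gives (-6, {"q": 2}, ("a", "b", "c", "a")). This corresponds to -6 q^2 a**b**c**a above. A factor 0 drives the coefficient to 0, matching the NonCommutativeMultiply[___, 0, ___] := 0 rule.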

Parsing expressions with an undefined number of arguments

I'm trying to parse a string in a self-made language into a sort of tree, e.g.:
# a * b1 b2 -> c * d1 d2 -> e # f1 f2 * g
should result in:
# a
* b1 b2
-> c
* d1 d2
-> e
# f1 f2
* g
#, * and -> are symbols. a, b1, etc. are texts.
At the moment the only method I know for evaluating expressions is RPN, so my current solution is as follows. If I allow only a single text token after each symbol, I can easily convert the expression into RPN first (b = b1 b2; d = d1 d2; f = f1 f2) and parse it from there:
a b c -> * d e -> * # f g * #
However, merging text tokens and whatever else comes seems to be problematic. My idea was to create marker tokens (M), so RPN looks like:
a M b2 b1 M c -> * M d2 d1 M e -> * # f2 f1 M g * #
which is also parseable and seems to solve the problem.
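For comparison, here is a Python sketch of one common marker convention, similar in spirit to PostScript's mark operator. It is an assumed convention and not necessarily the exact one behind the RPN above. M pushes a marker; when an operator arrives, everything above the nearest marker becomes its argument list, and the marker is discarded. The operator set {#, *, ->} is taken from the example.

```python
OPS = {"#", "*", "->"}
MARK = object()  # sentinel that cannot collide with any text token


def parse_postfix(text):
    """Build a tree from postfix tokens where M marks where an operator's args begin."""
    stack = []
    for tok in text.split():
        if tok == "M":
            stack.append(MARK)
        elif tok in OPS:
            args = []
            while stack and stack[-1] is not MARK:
                args.append(stack.pop())
            if not stack:
                raise ValueError(f"operator {tok!r} has no matching M")
            stack.pop()                               # discard the marker
            stack.append((tok, *reversed(args)))      # restore left-to-right order
        else:
            stack.append(tok)
    if len(stack) != 1 or stack[0] is MARK:
        raise ValueError("input does not reduce to a single tree")
    return stack[0]
```

Here "M a M b c d * #" yields ('#', 'a', ('*', 'b', 'c', 'd')), and "M a M b c * d #" yields ('#', 'a', ('*', 'b', 'c'), 'd'). The marker is what fixes each operator's arity.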
That said:
Does anyone have experience with something like that and can say it is or it is not a viable solution for the future?
Are there better methods for parsing expressions with undefined arity of operators?
Can you point me at some good resources?
Note. Yes, I know this example very much resembles Lisp prefix notation and maybe the way to go would be to add some brackets, but I don't have any experience here. However, the source text must not contain any artificial brackets and also I'm not sure what to do about potential infix mixins like # a * b -> [if value1 = value2] c -> d.
Thanks for any help.
EDIT: It seems that what I'm looking for are sources on postfix notation with a variable number of arguments.
I couldn't fully understand your question, but it seems what you want is a grammar definition and a parser generator. I suggest you take a look at ANTLR, it should be pretty straightforward with it to define a grammar for either your original syntax or the RPN.
Edit: (After exercising self-criticism, and making some effort to understand the question details.) Actually, the language grammar is unclear from your example. However, it seems to me that the advantages of the prefix/postfix notations (i.e. that you need neither parentheses nor a precedence-aware parser) stem from the fact that you know the number of arguments every time you encounter an operator; therefore you know exactly how many elements to read (for prefix notation) or to pop from the stack (for postfix notation). OTOH, I believe that having operators with a variable number of arguments makes prefix/postfix notations not merely difficult to parse but outright ambiguous. Take the following expression for example:
# a * b c d
Which of the following three is the canonical form?
(a, *(b, c, d))
(a, *(b, c), d)
(a, *(b), c, d)
Without knowing more about the operators, it is impossible to tell. Of course you could define some sort of greediness for the operators, e.g. * is greedier than #, so it gobbles up all the arguments. But this would defeat the purpose of a prefix notation, because you simply wouldn't be able to write down the second variant of the above three, not without additional syntactic elements.
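The ambiguity can be checked mechanically. The following Python sketch assumes that every operator in {#, *, ->} takes one or more arguments and that any other token is a text. It then enumerates every complete prefix parse of the input.

```python
OPS = {"#", "*", "->"}  # assumption: each operator takes one or more arguments


def parse_expr(tokens, i):
    """Yield (tree, next_index) for every way an expression can start at tokens[i]."""
    if i >= len(tokens):
        return
    tok = tokens[i]
    if tok not in OPS:
        yield tok, i + 1
        return
    for args, j in parse_args(tokens, i + 1):
        yield (tok, *args), j


def parse_args(tokens, i):
    """Yield (args, next_index) for every non-empty run of expressions from i."""
    for first, j in parse_expr(tokens, i):
        yield [first], j
        for rest, k in parse_args(tokens, j):
            yield [first] + rest, k


def parses(text):
    """All trees that consume the whole input."""
    tokens = text.split()
    return [tree for tree, j in parse_expr(tokens, 0) if j == len(tokens)]
```

parses("# a * b c d") returns exactly the three readings listed above, so variable arity alone cannot pick one of them.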
Now that I think of it, it is probably not by sheer chance that none of the programming languages I know support operators with a variable number of arguments, only functions/procedures.

Resources