How can I transform an Extended Backus-Naur Form grammar to its normal representation? - algorithm

I have a grammar that contains expressions between braces '{}', which represent zero or more occurrences of that expression, and expressions between square brackets '[]', which represent zero or one occurrences of that expression. I believe this kind of grammar is called an Extended Backus-Naur Form (EBNF) grammar.
I would like to transform the grammar to its normal form (with no braces or square brackets).
Is there an existing algorithm to do that?
I know that I can substitute something like A --> B [C D] E with A --> B E and A --> B C D E, but I would like to know whether there are existing algorithms that I could implement to transform those expressions.

The most straightforward way to do that is to replace every EBNF construct with a new rule. Here are the equivalences you can use:
Option
A ::= B [C D] E ;
A ::= B X E ;
X ::= C D | ɛ ;
Where ɛ represents the empty string.
Repetition
A ::= B {C D} E ;
Zero or more times:
A ::= B X E ;
X ::= C D X | ɛ ;
One or more times:
A ::= B X E ;
X ::= C D | C D X ;
Grouping
A ::= B (C D) E ;
A ::= B X E ;
X ::= C D ;
Apply these transformations recursively and you'll end up with vanilla BNF.
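Since these equivalences are purely mechanical, they are easy to script. Below is a minimal Python sketch (my own illustration, not an established library): a grammar is a dict mapping a non-terminal to a list of alternatives, each alternative a list of symbols, with an optional part written as ('opt', [...]) and a zero-or-more repetition as ('rep', [...]); an empty alternative stands for ɛ, and fresh() is an invented helper that makes new non-terminal names. One-or-more and grouping follow the same pattern.

from itertools import count

_ids = count()

def fresh():
    return "X%d" % next(_ids)                 # invent a new non-terminal name

def to_bnf(grammar):
    # Rewrite ('opt', ...) and ('rep', ...) groups into plain BNF rules.
    bnf = {}
    def expand(alt):
        out = []
        for sym in alt:
            if isinstance(sym, tuple):
                kind, body = sym
                x = fresh()
                if kind == 'opt':                     # [C D]  =>  X ::= C D | ɛ
                    bnf[x] = [expand(body), []]
                elif kind == 'rep':                   # {C D}  =>  X ::= C D X | ɛ
                    bnf[x] = [expand(body) + [x], []]
                out.append(x)
            else:
                out.append(sym)
        return out
    for lhs, alts in grammar.items():
        bnf[lhs] = [expand(a) for a in alts]
    return bnf

# A ::= B [C D] E   and   A2 ::= B {C D} E
print(to_bnf({'A':  [['B', ('opt', ['C', 'D']), 'E']],
              'A2': [['B', ('rep', ['C', 'D']), 'E']]}))
# {'X0': [['C', 'D'], []], 'A': [['B', 'X0', 'E']],
#  'X1': [['C', 'D', 'X1'], []], 'A2': [['B', 'X1', 'E']]}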

Related

Cleaner way to represent languages accepted by DFAs?

I am given two DFAs over the alphabet {a, b}. * denotes a final state and -> denotes the initial state.
1) ->A with a goes to A. ->A with b goes to *B. *B with a goes to *B. *B with b goes to ->A.
The regular expression for this is clearly:
E = a* b(a* + (a* ba* ba*)*)
And the language that it accepts is L1 = {w over {a,b} | w is a b preceded by any number of a's and followed by any number of a's, or w is a b preceded by any number of a's and followed by any number of bb pairs with any number of a's in the middle (of the bb), at the end, or at the beginning}.
2) ->*A with b goes to ->*A. ->*A with a goes to *B. *B with b goes to ->A. *B with a goes to C. C with a goes to C. C with b goes to C.
Note: A is both final and initial state. B is final state.
Now the regular expression that I get for this is:
E = b* ((ab) * + a(b b* a)*)
Finally the language that this DFA accepts is:
L2 = {w over {a, b} | w is n b's followed by either k ab's, or an a followed by m b b^r a's, where n, k, m, r >= 0}
Now the question is: is there a cleaner way to represent the languages L1 and L2? These descriptions do seem ugly. Thanks in advance.
E = a* b(a* + (a* ba* ba*)*)
= a*ba* + a*b(a* ba* ba*)*
= a*ba* + a*b(a*ba*ba*)*a*
= a*b(a*ba*ba*)*a*
= a*b(a*ba*b)*a*
This is the language of all strings of a and b containing an odd number of b's. This might be most compactly denoted symbolically as {w in {a,b}* | #b(w) ≡ 1 (mod 2)}.
For the second one: the only way to get to state B is to see an a in A, and the only way to get to C from outside C is to see an a in B. C is a dead state, and the only way to get to it is to see aa starting in A. That is: if you ever see two a's in a row, the string is not in the language; the language is the set of all strings over a and b not containing the substring aa. This might be most compactly denoted symbolically as {(a+b)*aa(a+b)*}^c, where ^c means "complement".
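As a sanity check (my own sketch, not part of the original answer), both characterizations can be verified by brute force: simulate each DFA exactly as described in the question and compare it against the claimed predicate on every short string.

from itertools import product

# transitions copied from the question; DFA 1: ->A, *B; DFA 2: ->*A, *B, C is dead
DFA1 = {('A', 'a'): 'A', ('A', 'b'): 'B',
        ('B', 'a'): 'B', ('B', 'b'): 'A'}
DFA2 = {('A', 'a'): 'B', ('A', 'b'): 'A',
        ('B', 'a'): 'C', ('B', 'b'): 'A',
        ('C', 'a'): 'C', ('C', 'b'): 'C'}

def accepts(delta, finals, w, start='A'):
    state = start
    for ch in w:
        state = delta[(state, ch)]
    return state in finals

for n in range(8):
    for t in product('ab', repeat=n):
        w = ''.join(t)
        assert accepts(DFA1, {'B'}, w) == (w.count('b') % 2 == 1)  # odd number of b's
        assert accepts(DFA2, {'A', 'B'}, w) == ('aa' not in w)     # no "aa" substring
print("both claims hold for all strings up to length 7")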

Switch from dyadic to monadic interpretation in a J sentence

I am trying to understand composition in J, after struggling to mix and match different phrases. I would like help switching between monadic and dyadic phrases in the same sentence.
I just made a simple dice roller in J, which will serve as an example:
d=.1+[:?[#]
4 d 6
2 3 1 1
8 d 12
10 2 11 11 5 11 1 10
This is a chain: "d is one plus the (capped) roll of x occurrences of y"
But what if I wanted to use >: to increment (and skip the cap [: ), such that it "switched" to monadic interpretation after the first fork?
It would read: "d is the incremented roll of x occurrences of y".
Something like this doesn't work, even though it looks to me to have about the right structure:
d=.>:&?[#]
d
>:&? ([ # ])
(If this approach is against the grain for J and I should stick to capped forks, that is also useful information.)
Let's look at a dyadic fork a (c d f h g) b, where c, d, f, g, and h are verbs and a and b are arguments. It is evaluated as: (a c b) d ((a f b) h (a g b)). The arguments are applied dyadically to the verbs in the odd positions (the tines c, f, and g), and those results are fed dyadically, right to left, into the even tines d and h. Also, a fork can be either of the form (v v v) or (n v v), where v stands for a verb and n stands for a noun. In the case of (n v v) you just get the value of n as the left argument of the middle tine.
If you look at your original definition of d=.1+[:?[#] you might notice it simplifies to a dyadic fork with five tines (1 + [: ? #) where the [ # ] can be replaced by # as it is a dyadic fork (see definition above).
The [: (Cap) verb returns no value to the left argument of ?, which means that ? acts monadically on the result of a # b, and this becomes the right argument to +, which has a left argument of 1.
So, on to the question of how to get rid of the [: and use >: instead of 1 + ...
You can also write ([: f g) as f@:g to get rid of the Cap, which means that ([: ? #) becomes ?@:#, and now since you want to feed this result into >: you can do that by either:
d1=.>:@:?@:#
d2=. [: >: ?@:#
4 d1 6
6 6 1 5
4 d2 6
2 3 4 5
8 d1 12
7 6 6 4 6 9 8 7
8 d2 12
2 10 10 9 8 12 4 3
Hope this helps; it is a good fundamental question about how forks are evaluated. Whether you use the ([: f g) or the f@:g form of composition is a matter of preference.
To summarize the main simple patterns of verb mixing in J:
(f @: g) y = f (g y) NB. (1) monadic "at"
x (f @: g) y = f (x g y) NB. (2) dyadic "at"
x (f &: g) y = (g x) f (g y) NB. (3) "appose"
(f g h) y = (f y) g (h y) NB. (4) monadic fork
x (f g h) y = (x f y) g (x h y) NB. (5) dyadic fork
(f g) y = y f (g y) NB. (6) monadic hook
x (f g) y = x f (g y) NB. (7) dyadic hook
A nice review of those is here (compositions) and here (trains).
Usually there are many possible forms for a verb. To complicate matters more, you can mix many primitives in different ways to achieve the same result.
Experience, style, performance and other such factors influence the way you'll combine the above to form your verb.
In this particular case, I would use @bob's d1 because I find it clearer to read: increase the roll of x copies of y:
>: @ ? @ $
For the same reason, I am replacing # with $. When I see # in this context, I automatically read "number of elements of", but maybe that's just me.

Is it possible to represent a context-free grammar with first-order logic?

Briefly, I have an EBNF grammar and so a parse tree, but I do not know if there is a procedure to translate it into first-order logic.
For example:
DR ::= E and P
P ::= B | (and P)* | (or P)*
B ::= L | P (and L P)
L ::= a
Yes, there is. The general pattern for translating a production of the form
A ::= B C ... D
is to paraphrase it declaratively as saying
A sequence of terminals s is an A (or: A generates the sequence s, if you prefer that formulation) if:
s is the concatenation of s_1, s_2, ... s_n, and
s_1 is a B / B generates the sequence s_1, and
s_2 is a C / C generates the sequence s_2, and
...
s_n is a D / D generates the sequence s_n.
Assuming we write these in the obvious way using a generates predicate, and that we can write concatenation using a || operator, your first rule becomes (if I am right to guess that E and P are non-terminals and "and" is a terminal symbol) something like
generates(E,s1)
∧ generates(and,s2)
∧ generates(P,s3)
∧ s = s1 || s2 || s3
⊃ generates(DR,s)
To establish the consequent (i.e. prove that s is an A), prove the antecedents. As long as the grammar does actually generate some sentences, and as long as you have some premises defining the "generates" relation for terminal symbols, the proof will be straightforward.
Prolog definite-clause grammars are a beautiful instantiation of this pattern. It takes some of us a while to understand and appreciate the use of difference lists in DCGs, but they handle the partitioning of s into subsequences and the association of the subsequences with the different parts of the right hand side much more elegantly than the simple translation into logic given above.
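To make the pattern concrete outside of Prolog, here is a naive Python sketch (my own illustration; the tiny grammar at the end is invented for the example, not taken from the question). It checks the "generates" relation by trying every way of splitting s into consecutive pieces, one piece per right-hand-side symbol, exactly as in the concatenation clause above.

def generates(grammar, sym, s):
    if sym not in grammar:                  # terminal symbols generate themselves
        return s == sym
    return any(matches(grammar, rhs, s) for rhs in grammar[sym])

def matches(grammar, rhs, s):
    if not rhs:
        return s == ''
    head, rest = rhs[0], rhs[1:]
    # try every split s = s1 || s2, with s1 generated by the first symbol
    return any(generates(grammar, head, s[:i]) and matches(grammar, rest, s[i:])
               for i in range(len(s) + 1))

# invented example in the spirit of the question: DR ::= E "and" P, E ::= e, P ::= p
g = {'DR': [['E', ' and ', 'P']], 'E': [['e']], 'P': [['p']]}
print(generates(g, 'DR', 'e and p'))        # True

This is exponential and does not terminate on left-recursive rules; DCGs (and real parsers) handle the splitting far more efficiently, but the logical content is the same.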

Systematically extract noun arguments from J expression

What is the systematic approach to extracting nouns as arguments from an expression in J? To be clear, an expression containing two literals should become a dyadic expression with the left and right arguments used instead of the literals.
I'm trying to learn tacit style so I prefer not to use named variables if it is avoidable.
A specific example is a simple die roll simulator I made:
>:?10#6 NB. Roll ten six sided dice.
2 2 6 5 3 6 4 5 4 3
>:?10#6
2 1 2 4 3 1 3 1 5 4
I would like to systematically extract the arguments 10 and 6 to the outside of the expression so it can roll any number of any sized dice:
d =. <new expression here>
10 d 6 NB. Roll ten six sided dice.
1 6 4 6 6 1 5 2 3 4
3 d 100 NB. Roll three one hundred sided dice.
7 27 74
Feel free to illustrate using my example, but I'm looking to be able to follow the procedure for arbitrary expressions.
Edit: I just found out that a quoted version using x and y can be automatically converted to tacit form using e.g. 13 : '>:?x#y'. If someone can show me how to find the definition of 13 : I might be able to answer my own question.
If your goal is to learn tacit style, it's better that you simply learn it from the ground up rather than try to memorize an explicit algorithm—J4C and Learning J are good resources—because the general case of converting an expression from explicit to tacit is intractable.
Even ignoring the fact that there have been no provisions for tacit conjunctions since J4, in the explicit definition of a verb you can (1) use control words, (2) use and modify global variables, (3) put expressions containing x and/or y as the operands of an adverb or conjunction, and (4) reference itself. Solving (1), (3), or (4) is very hard in the general case and (2) is just flat out impossible.*
If your J sentence is one of a small class of expressions, there is an easy way to apply the fork rules to make it tacit, and this is more or less what is implemented in 13 :. Recall that
(F G H) y is (F y) G (H y), and x (F G H) y is (x F y) G (x H y) (Monad/Dyad Fork)
([: G H) y is G (H y), and x ([: G H) y is G (x H y) (Monad/Dyad Capped Fork)
x [ y is x, x ] y is y, and both of [ y and ] y are y (Left/Right)
Notice how forks use their center verbs as the 'outermost' verb: Fork gives a dyadic application of G, while Capped Fork gives a monadic one. This corresponds exactly to the two modes of application of a verb in J, monadic and dyadic. So a quick-and-dirty algorithm for making a "dyadic" expression tacit might look like the following, for verbs F, G, H and nouns N:
Replace x with (x [ y) and y with (x ] y). (Left/Right)
Replace any other noun N with (x N"_ y)
If you see the pattern (x F y) G (x H y), replace it with x (F G H) y. (Fork)
If you see the pattern G (x H y), replace it with x ([: G H) y. (Capped Fork)
Repeat 1 through 4 until you attain the form x F y, at which point you win.
If no more simplifications can be performed and you have not yet won, you lose.
A similar algorithm can be derived for "monadic expressions", expressions only dependent on y. Here's a sample derivation.
<. (y - x | y) % x NB. start
<. ((x ] y) - (x [ y) | (x ] y)) % (x [ y) NB. 1
<. ((x ] y) - (x ([ | ]) y)) % (x [ y) NB. 3
<. (x (] - ([ | ])) y) % (x [ y) NB. 3
<. x ((] - ([ | ])) % [) y NB. 3
x ([: <. ((] - ([ | ])) % [)) y NB. 4 and we win
This neglects some obvious simplifications, but attains the goal. You can mix in various other rules to simplify, like the long train rule (if Train is a train of odd length, then (F G (Train)) and (F G Train) are equivalent), or the observation that x ([ F ]) y and x F y are equivalent. After learning the rules, it shouldn't be hard to modify the algorithm to get the result [: <. [ %~ ] - |, which is what 13 : '<. (y - x | y) % x' gives.
The fail condition is attained whenever an expression containing x and/or y is an operand to an adverb or conjunction. It is sometimes possible to recover a tacit form with some deep refactoring, and knowledge of the verb and gerundial forms of ^: and }, but I am doubtful that this can be done programmatically.
This is what makes (1), (3), and (4) hard instead of impossible. Given knowledge of how $: works, a tacit programmer can find a tacit form for, say, the Ackermann function without too much trouble, and a clever one can even refactor that for efficiency. If you could find an algorithm doing that, you'd obviate programmers, period.
ack1 =: (1 + ])`(([ - 1:) $: 1:)`(([ - 1:) $: [ $: ] - 1:)@.(, i. 0:)
ack2 =: $: ^: (<:@[`]`1:) ^: (0 < [) >:
3 (ack1, ack2) 3
61 61
TimeSpace =: 6!:2, 7!:2@] NB. iterations TimeSpace code
10 TimeSpace '3 ack1 8'
2.01708 853504
10 TimeSpace '3 ack2 8'
0.937484 10368
* This is kind of a lie. You can refactor the entire program involving such a verb through some advanced voodoo magic, cf. Pepe Quintana's talk at the 2012 J Conference. It isn't pretty.
13 : is documented in the vocabulary or NuVoc under : (Explicit).
The basic idea is that the value you want to be x becomes [ and the value you want to be y becomes ]. But as soon as the rightmost token changes from a noun (value) to a verb like [ or ], the entire statement becomes a train, and you may need to use the verb [: or the conjunctions @ or @: to restore the composition behavior you had before.
You can also replace the values with the actual names x and y, and then wrap the whole thing in (dyad : ' ... '). That is:
>:?10#6 NB. Roll ten six sided dice.
can become:
10 (dyad : '>: ? x # y') 6 NB. dyad is predefined. It's just 4.
If you only need the y argument, you can use monad, which is predefined as 3. The name verb is also 3. I tend to use verb : when I provide both a monadic and a dyadic version, and monad when I only need the monadic meaning.
If your verb is a one-liner like this, you can sometimes convert it automatically to tacit form by replacing the 3 or 4 with 13.
I have some notes on factoring verbs in j that can help you with the step-by-step transformations.
Addendum: pseudocode for converting a statement to a tacit dyad
This only covers a single statement (one line of code) and may not work if the constant values you're trying to extract are being passed to a conjunction or adverb.
Also, the statement must not make any reference to other variables.
Append [ x=. xVal [ y =. yVal to the statement.
Substitute appropriate values for xVal and yVal.
Rewrite the original expression in terms of the new x and y.
Rewrite statement [ x=. xVal [ y=. yVal as:
newVerb =: (4 : 0)
statement ] y NB. we'll fill in x later.
)
(xVal) newVerb yVal
Now you have an explicit definition in terms of x and y. The reason for putting it on multiple lines instead of using x (4 : 'expr') y is that if expr still contains a string literal, you will have to fiddle with escaping the single quotes.
Converting the first noun
Since you only had a pipeline before, the rightmost expression inside statement must be a noun. Convert it to a fork using the following rules:
y → (])
x → ]x ([)
_, __, _9 ... 9 → (_:), (__:), (_9:) ... (9:)
n → n"_ (for any other arbitrary noun)
This keeps the overall meaning the same because the verb you've just created is invoked immediately and applied to the ] y.
Anyway, this new tacit verb in parentheses becomes the core of the train you will build. From here on out, you work by consuming the rightmost expression in the statement, and moving it inside the parentheses.
Fork normal form
From here on out, we will assume the tacit verb we're creating is always a fork.
This new tacit verb isn't actually a fork, but we will pretend it is, because any single-token verb can be rewritten as a fork using the rule:
v → ([: ] v).
There is no reason to actually do this transformation, it's just so I can simplify the rule below and always call it a fork.
We will not use hooks because any hook can be rewritten as a fork with the rule:
(u v) → (] u [: v ])
The rules below should produce trains in this form automatically.
Converting the remaining tokens
Now we can use the following rules to convert the rest of the original pipeline, moving one item at a time into the fork.
For all of these rules, the (]x)? isn't J syntax. It means the ]x may or may not be there. You can't put the ]x in before you've transformed a usage of x without changing the meaning of the code; once you've transformed an instance of x, the ]x is required.
Following the J convention, u and v represent arbitrary verbs, and n is an arbitrary noun. Note that these include verbs modified by adverbs or conjunctions (see below).
tokens y u (]x)? (fork) ] y → tokens (]x)? (] u fork) ] y
tokens x u (]x)? (fork) ] y → tokens ]x ([ u fork) ] y
tokens n u (]x)? (fork) ] y → tokens (]x)? (n u fork) ] y
tokens u v (]x)? (fork) ] y → tokens u (]x)? ([: v fork) ] y
There are no rules for adverbs or conjunctions, because you should just treat those as part of the verbs. For example +:^:3 should be treated as a single verb. Similarly, anything in parentheses should be left alone as a single phrase.
Anyway, keep applying these rules until you run out of tokens.
Cleanup
You should end up with:
newVerb =: (4 : 0)
] x (fork) ] y
)
(xVal) newVerb yVal
This can be rewritten as:
(xVal) (fork) yVal
And you are done.

String pattern match: can a suffix array solve this, or is there a better solution?

I have strings that are randomly generated from a set of special characters (B, C, D, F, X, Z); for example, the following string list was generated:
B D Z Z Z C D C Z
B D C
B Z Z Z D X
D B Z F
Z B D C C Z
B D C F Z
..........
I also have a pattern list, which is used to match a generated string, return the best pattern, and extract some substrings from the string.
Patterns:
B D C [D must appear before the C >> DC]
B C F
B D C F
B X [if the string has an X, it must be matched.]
.......
For example:
B D Z Z Z C D C Z has B and DC, so it can be matched by B D C
D B Z C F has B and C and F, so it can be matched by B C F
D B Z D F has B and F, so it can be matched by B F
.......
Now, I am thinking about using a suffix array:
1. First, convert the string to a suffix array object.
2. Loop over each pattern and find which ones the suffix array matches.
3. Compare all matched patterns and pick the best one.
var suffix_array = Convert a string to suffix array.
var list = new List();
for (int i = 0; i < patterns.Length; i++) {
    if (suffix_array.match(patterns[i]))
        list.Add(patterns[i]);
}
var max = list[0];
for (int i = 1; i < list.Length; i++) {
    if (list[i] > max)
        max = list[i];
}
Write(max);
I just think this method is too complex: it needs to build a tree for each pattern and use it to match against the suffix array. Does anyone have a better idea?
==================== Update
I have a better solution now: I create a new class that has an array-typed property for each of B, C, D, X, ...; each property stores the positions at which that character appears in the string.
Now, if B does not appear in the string, we can immediately end the processing.
We can also get all the C and D positions and then check whether they can appear sequentially (DC, DCC, CCC, ...).
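In case it helps, here is a rough Python sketch of the position-index idea described in the update (my own reading; the names are invented): index the positions of every character once, end processing immediately when B is missing, and then check whether a pattern's characters occur at increasing positions.

from collections import defaultdict

def index_positions(text):
    pos = defaultdict(list)
    for i, ch in enumerate(text.replace(' ', '')):
        pos[ch].append(i)
    return pos

def matches(pos, pattern):
    if not pos['B']:                        # no B: end this processing immediately
        return False
    last = -1
    for ch in pattern.split():
        nxt = next((i for i in pos[ch] if i > last), None)
        if nxt is None:                     # character missing, or out of order
            return False
        last = nxt
    return True

pos = index_positions('B D Z Z Z C D C Z')
print(matches(pos, 'B D C'))                # True: B, then a D, then a later C
print(matches(pos, 'B C F'))                # False: there is no F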
I'm not sure what programming language you are using; have you checked its capabilities with regular expressions? If you are not familiar with these, you should be; hit Google.
var suffix_array = Convert a string to suffix array.
var best = (worst value - presumably zero - pattern);
for (int i = 0; i < pattern list array length; i++) {
    if (suffix_array.match(pattern[i])) {
        if (pattern[i] > best) {
            best = pattern[i];
        }
        (add pattern[i] to list here if you still want a list of all matches)
    }
}
write best;
Roughly, anyway, if I understand what you're looking for. That's a slight improvement, though I'm sure there may be a better solution.
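For what it's worth, here is a concrete sketch of the regular-expression suggestion above (my own illustration; it assumes a pattern such as "B D C" simply requires those characters to occur in that relative order with anything in between, and that the "best" pattern is the longest one that matches).

import re

def best_pattern(text, patterns):
    s = text.replace(' ', '')
    best = None
    for p in patterns:
        regex = '.*'.join(p.split())        # "B D C"  ->  "B.*D.*C"
        if re.search(regex, s) and (best is None or len(p.split()) > len(best.split())):
            best = p
    return best

print(best_pattern('B D Z Z Z C D C Z', ['B D C', 'B C F', 'B D C F', 'B X']))
# -> B D C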
