Syntax Directed Definition for a grammar to print the string being parsed

---> Consider the grammar below:
S->SaS|bB
B->AcB| ε
A->dAd| ε
For the grammar given above, write the syntax directed definition that
prints the string that is being parsed and construct an annotated parse tree for the string ‘bddcab’.
Solution:
Rewriting the grammar above with one production per line, we have:
S->S1aS2
S->bB
B->AcB1
B-> ε
A->dA1d
A-> ε
(The 1 and 2 after a non-terminal are subscripts; they distinguish different instances of the same non-terminal within one production.)
The grammar above, along with the semantic rules:
Production      Semantic rules
S->S1aS2        S.val = S1.val + a.lexval + S2.val    { print S.val }
S->bB           S.val = b.lexval + B.val              { print S.val }
B->AcB1         B.val = A.val + c.lexval + B1.val
B-> ε
A->dA1d         A.val = d.lexval + A1.val + d.lexval
A-> ε
Note: the '+' operator here denotes string concatenation.
Is this solution alright? I have a feeling that it might not be accurate.
Here's the annotated parse tree: [figure: annotated parse tree for 'bddcab']

I think that those print actions in the S rules will backfire because S can occur multiple times.
S can generate SaS. But each of those S's can also generate SaS.
Basically, if you're building the printed representation as a semantic property, you can only do the print outside of the grammar once it is fully evaluated, ensuring that it happens only once.
This could be shown by introducing a pseudo start symbol X. S is reduced to X just once, and so the print happens just once, pulling the final val from the top-level S.
X -> S { print S.val } // print the top-level S's val, just once.
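To make the double-printing problem concrete, here is a minimal Haskell sketch under some assumptions: the parse tree for 'bddcab' is built by hand rather than by a parser, the constructor names (SaS, SbB, BAcB, ...) are hypothetical, and the ε-productions are given val = "" (the empty string), a rule the table above leaves implicit. printsPerS collects what the original rules would print (one line per S node, children first, as the reductions happen), while printTop prints once, as the X -> S rule does.

```haskell
-- Hypothetical parse-tree types: one constructor per production.
data S = SaS S S   -- S -> S1 a S2
       | SbB B     -- S -> b B
data B = BAcB A B  -- B -> A c B1
       | BEps      -- B -> ε
data A = AdAd A    -- A -> d A1 d
       | AEps      -- A -> ε

-- Synthesized attribute val; (++) plays the role of the '+' concatenation.
valS :: S -> String
valS (SaS s1 s2) = valS s1 ++ "a" ++ valS s2
valS (SbB b)     = "b" ++ valB b

valB :: B -> String
valB (BAcB a b1) = valA a ++ "c" ++ valB b1
valB BEps        = ""  -- assumed rule: B.val = ""

valA :: A -> String
valA (AdAd a1) = "d" ++ valA a1 ++ "d"
valA AEps      = ""    -- assumed rule: A.val = ""

-- Parse tree for 'bddcab': S1 derives "bddc", then 'a', then S2 derives "b".
bddcab :: S
bddcab = SaS (SbB (BAcB (AdAd AEps) BEps)) (SbB BEps)

-- What { print S.val } on both S rules would output:
-- one line per S node, in bottom-up (reduction) order.
printsPerS :: S -> [String]
printsPerS s@(SaS s1 s2) = printsPerS s1 ++ printsPerS s2 ++ [valS s]
printsPerS s@(SbB _)     = [valS s]

-- X -> S { print S.val }: print only the top-level value, exactly once.
printTop :: S -> IO ()
printTop s = putStrLn (valS s)
```

Here printsPerS bddcab evaluates to ["bddc", "b", "bddcab"], three printed lines instead of one, whereas printTop bddcab prints bddcab alone.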
The other approach would be to have truly syntax-directed printing, whereby the side effect of printing happens as the parsing reductions take place, e.g. with a Yacc-like action embedded among the right-hand-side symbols:
S -> S1 a { print a.lexeme } S2 { /* other semantic rules go here */ }
In every rule that recognizes a terminal, print the terminal as soon as it is recognized. So here, we know that the reduction of S1 causes all of its terminals to be printed (by similar rules all over the grammar). Then we recognize an a and print it, and then S2 is recognized and reduced, causing all of its terminals to be printed. You may recognize that this is closely analogous to an inorder traversal of a tree.
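The inorder analogy can be sketched in Haskell as a walk over an already-built parse tree that emits each terminal at the point where its embedded action would fire. This is a simplification under stated assumptions: a Yacc parser runs such actions during parsing rather than over a finished tree, and the constructor and function names here are hypothetical. The emit function is a parameter, so the same walk can print with putChar or collect the characters instead.

```haskell
-- Hypothetical parse-tree types: one constructor per production.
data S = SaS S S | SbB B     -- S -> S1 a S2 | b B
data B = BAcB A B | BEps     -- B -> A c B1  | ε
data A = AdAd A | AEps       -- A -> d A1 d  | ε

-- Each terminal is emitted as soon as it is recognized, so children are
-- visited left to right with the terminals in between: an inorder walk.
walkS :: Monad m => (Char -> m ()) -> S -> m ()
walkS emit (SaS s1 s2) = walkS emit s1 >> emit 'a' >> walkS emit s2
walkS emit (SbB b)     = emit 'b' >> walkB emit b

walkB :: Monad m => (Char -> m ()) -> B -> m ()
walkB emit (BAcB a b1) = walkA emit a >> emit 'c' >> walkB emit b1
walkB _    BEps        = return ()

walkA :: Monad m => (Char -> m ()) -> A -> m ()
walkA emit (AdAd a1) = emit 'd' >> walkA emit a1 >> emit 'd'
walkA _    AEps      = return ()

-- Parse tree for 'bddcab'.
bddcab :: S
bddcab = SaS (SbB (BAcB (AdAd AEps) BEps)) (SbB BEps)
```

Running walkS putChar bddcab prints bddcab. Every terminal is printed exactly once, so no separate top-level print is needed.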

Related

Is there a performance difference between head$filter and head$dropWhile with Haskell Strings?

I'm working on lists of "People" objects in Haskell, and I was wondering if there was any difference in performance between head$dropWhile and head$filter to find the first person with a given name. The two options and a snip of the datatype would be:
data Person = Person { name :: String
                     , otherStuff :: StuffTypesAboutPerson }
findPerson :: String -> [Person] -> Person
findPerson n = head . dropWhile (\p -> name p /= n)   -- option 1
findPerson n = head . filter (\p -> name p == n)      -- option 2
My thought was, filter would have to compare the full length of n to the full length of every name until it finds the first one. I would think dropWhile would only need to compare the strings until the first non-matching Char. However, I know there is a ton of magic in Haskell, especially GHC. I would prefer to use the filter version, because I think it's more straight-forward to read. However, I was wondering if there actually is any performance difference? Even if it's negligible, I'm also interested from a curiosity standpoint at this point.
Edit: I know I also need to protect from errors with Maybe, etc, but I left that out to simplify the code example.
There are several approaches to the problem:
findPerson n = head . dropWhile (\p -> name p /= n)
findPerson n = head . filter (\p -> name p == n)
findPerson n = fromJust . find (\p -> name p == n)   -- find from Data.List, fromJust from Data.Maybe
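A self-contained, runnable version of the three approaches, plus the Maybe-returning variant the question alludes to, might look like the sketch below. The age field is a hypothetical stand-in for the question's otherStuff, and the function names are illustrative. Note the use of (.) rather than ($): in head $ dropWhile f, head would be applied to the partially applied function dropWhile f, which does not type-check.

```haskell
import Data.List (find)
import Data.Maybe (fromJust)

-- Stand-in for the question's record; `age` replaces the elided otherStuff field.
data Person = Person { name :: String, age :: Int }
  deriving (Show, Eq)

-- The three versions from the answer, written point-free so they type-check.
viaDropWhile, viaFilter, viaFind :: String -> [Person] -> Person
viaDropWhile n = head . dropWhile (\p -> name p /= n)
viaFilter    n = head . filter (\p -> name p == n)
viaFind      n = fromJust . find (\p -> name p == n)

-- Total variant: Nothing instead of a runtime error when nobody matches.
findPerson :: String -> [Person] -> Maybe Person
findPerson n = find (\p -> name p == n)
```

All three partial versions throw an exception on a missing name; findPerson returns Nothing instead, which is usually the better interface.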
The question also points out two facts:
when x,y are equal strings, == needs to compare all the characters
when x,y are different strings, /= only needs to compare until the first different character
This is correct, but does not consider the other cases
when x,y are equal strings, /= needs to compare all the characters
when x,y are different strings, == only needs to compare until the first different character
So, between == and /= there is no performance winner. At most, one of them performs one extra not compared with the other.
Also, all the three implementations of findPerson mentioned above, essentially perform the same steps. Given xs :: [Person], they will all scan xs until a matching name is found, and no more. On all the persons before the match, the name will be compared against n, and this comparison will stop at the first different character (no matter what comparison we use above). The matching person will have their name compared completely with n (again, in all cases).
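Both claims can be checked using laziness. In the sketch below (Person, people and the function names are hypothetical), everything after "bob" in the list is undefined, and so is everything after the first character of the second string; if any version did extra work, it would crash with an exception.

```haskell
import Data.List (find)
import Data.Maybe (fromJust)

data Person = Person { name :: String, age :: Int }

-- Everything after "bob" is undefined: forcing it would throw.
people :: [Person]
people = Person "alice" 30 : Person "bob" 40 : undefined

byDropWhile, byFilter, byFind :: String -> [Person] -> Person
byDropWhile n = head . dropWhile (\p -> name p /= n)
byFilter    n = head . filter (\p -> name p == n)
byFind      n = fromJust . find (\p -> name p == n)

-- Both (==) and (/=) on strings stop at the first differing character,
-- so the undefined tail of the right-hand string is never examined.
stopsEarlyEq, stopsEarlyNeq :: Bool
stopsEarlyEq  = "alice" == ('x' : undefined)   -- False
stopsEarlyNeq = "alice" /= ('x' : undefined)   -- True
```

All three lookups of "bob" succeed without touching the undefined tail, which is consistent with them scanning exactly up to the first match.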
Hence, the approaches are expected to run in the same time. There might be a very small difference between them, but it could be so small that it would be hard to detect. You can try to experiment with criterion and see what happens, if you wish.

Multi-character substitution cipher algorithm

My problem is the following. I have a list of substitutions, including one substitution for each letter of the alphabet, but also some substitutions for groups of more than one letter. For example, in my cipher p becomes b, l becomes w, e becomes i, but le becomes by, and ple becomes memi.
So, while I can think of a few simple/naïve ways of implementing this cipher, it's not very efficient, and I was wondering what the most efficient way to do it would be. The answer doesn't have to be in any particular language, a general structured English algorithm would be fine, but if it must be in some language I'd prefer C++ or Java or similar.
EDIT: I don't need this cipher to be decipherable, an algorithm that mapped all single letters to the letter 'w' but mapped the string 'had' to the string 'jon' instead should be ok, too (then the string "Mary had a little lamb." would become "Wwww jon w wwwwww wwww.").
I'd like the algorithm to be fully general.
One possible approach is to use a deterministic automaton. The closest commonly used example to your problem is the Aho–Corasick string matching algorithm. The difference is that instead of reporting matches, you emit cipher text on some transitions: in general, each transition either emits cipher text or emits nothing.
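For the fully general case, one possible sketch (not Aho–Corasick itself, which also maintains failure links) is to store all substitution keys in a trie and, at each position, replace the longest key that matches there, copying the character unchanged when nothing matches. Greedy longest-match is an assumption about the intended semantics; on the question's five rules it behaves like the automaton below. The names Trie, buildTrie, longestMatch and encode are illustrative; Data.Map comes from the containers package that ships with GHC.

```haskell
import qualified Data.Map.Strict as Map

-- A trie node: the replacement if a key ends here, plus children by next character.
data Trie = Trie { output :: Maybe String, children :: Map.Map Char Trie }

emptyTrie :: Trie
emptyTrie = Trie Nothing Map.empty

insertKey :: (String, String) -> Trie -> Trie
insertKey ([], out) t = t { output = Just out }
insertKey (c : cs, out) t = t { children = Map.insert c child' (children t) }
  where child' = insertKey (cs, out) (Map.findWithDefault emptyTrie c (children t))

-- Empty keys are dropped: they would match everywhere and never consume input.
buildTrie :: [(String, String)] -> Trie
buildTrie rules = foldr insertKey emptyTrie (filter (not . null . fst) rules)

-- Longest key that is a prefix of the input: (replacement, remaining input).
longestMatch :: Trie -> String -> Maybe (String, String)
longestMatch = go Nothing
  where
    go best node input =
      let best' = maybe best (\out -> Just (out, input)) (output node)
      in case input of
           (c : rest) | Just next <- Map.lookup c (children node) -> go best' next rest
           _ -> best'

-- Replace the longest match at each position; copy unmatched characters as-is.
encode :: Trie -> String -> String
encode _ [] = []
encode trie input@(c : rest) = case longestMatch trie input of
  Just (out, remaining) -> out ++ encode trie remaining
  Nothing               -> c : encode trie rest
```

With the rules p -> b, l -> w, e -> i, le -> by, ple -> memi, encoding "apple" gives "abmemi". With every lowercase letter mapped to "w", every uppercase letter to "W", and "had" to "jon", "Mary had a little lamb." becomes "Wwww jon w wwwwww wwww.", as in the question's edit.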
In your example
p -> b
l -> w
e -> i
le -> by
ple -> memi
The automaton (in Erlang like pseudocode)
start(p) -> p(next());
start(l) -> l(next());
start(e) -> e(next());
...
p(l) -> pl(next());
p(X) -> emit(b), start(X).
l(e) -> emit(by), start(next());
l(X) -> emit(w), start(X).
e(X) -> emit(i), start(X).
pl(e) -> emit(memi), start(next());
pl(X) -> emit(b), l(X).
If you are not familiar with Erlang: start(), p() and so on are functions, one for each state. Each line with -> is one transition, and the actions follow the ->. emit() is a function that emits cipher text, and next() is a function that returns the next character. X is a variable matching any other character.
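The same automaton can be transliterated into Haskell with one function per state, where pattern matching on the head of the remaining input plays the role of next(). Two details are assumptions the pseudocode does not spell out: characters without a rule are copied unchanged, and reaching the end of the input in a pending state emits that state's single-letter substitutions. The function names (start, seenP, seenL, seenE, seenPL) are illustrative.

```haskell
-- One function per automaton state; the argument is the input not yet read.
start :: String -> String
start ('p' : rest) = seenP rest
start ('l' : rest) = seenL rest
start ('e' : rest) = seenE rest
start (c : rest)   = c : start rest   -- no rule for c: copy it unchanged (assumption)
start []           = []

-- State p: "p" has been read; "pl" may still become "ple".
seenP :: String -> String
seenP ('l' : rest) = seenPL rest
seenP rest         = "b" ++ start rest

-- State l: "l" has been read; "le" becomes "by".
seenL :: String -> String
seenL ('e' : rest) = "by" ++ start rest
seenL rest         = "w" ++ start rest

-- State e: no longer key starts with "e".
seenE :: String -> String
seenE rest = "i" ++ start rest

-- State pl: "ple" becomes "memi"; otherwise emit "b" for the p and continue in state l.
seenPL :: String -> String
seenPL ('e' : rest) = "memi" ++ start rest
seenPL rest         = "b" ++ seenL rest
```

For example, start "apple" gives "abmemi" and start "help" gives "hiwb".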

adding a number to a list within a function OCaml

Here is what I have, and the error I am getting is:
Error: This function has type 'a * 'a list -> 'a list
It is applied to too many arguments; maybe you forgot a `;'.
Why is that the case? I plan on passing two lists to the deleteDuplicates function, a sorted list and an empty list, and expect the duplicates to be removed in the list r, which will be returned once the original list reaches [].
I don't know how useful this might be, but here is some code that does what you want, written in a fairly standard OCaml style. Spend some time making sure you understand how and why it works. Maybe you should start with something simpler (e.g. how would you sum the elements of a list of integers?). Actually, you should probably start with an OCaml tutorial, reading carefully and making sure you understand the code examples.
let deleteDuplicates u =
(*
u : the sorted list
v : the result so far
last : the last element we read from u
*)
let rec aux u v last =
match u with
[] -> v
| x::xs when x = last -> aux xs v last
| x::xs -> aux xs (x::v) x  (* new value: keep it and make it the new last *)
in
(* the first element is a special case *)
match u with
[] -> []
| x::xs -> List.rev (aux xs [x] x)
This is not a direct answer to your question.
The standard way of defining an "n-ary" function is
let myfunc_caml_way arg0 arg1 = ...
rather than
let myfunc_java_way(arg0, arg1) = ...
Then you can call your function in this way:
myfunc_caml_way "10" 123
rather than
myfunc_java_way("10", 123)
See examples here:
https://github.com/ocaml/ocaml/blob/trunk/stdlib/complex.ml
By switching from myfunc_java_way to myfunc_caml_way, you will benefit from what's called "currying" (see the question "What is 'Currying'?").
However, note that you sometimes need to wrap an inner call in parentheses
myfunc_caml_way (otherfunc_caml_way "foo" "bar") 123
in order to tell the compiler not to interpret your code as
((myfunc_caml_way otherfunc_caml_way "foo") "bar" 123)
You seem to be thinking that OCaml uses tuples (a, b) to indicate arguments of function calls. This isn't the case. Whenever some expressions stand next to each other, that's a function call. The first expression is the function, and the rest of the expressions are the arguments to the function.
So, these two lines:
append(first,r)
deleteDuplicates(remaining, r)
Represent a single function call with three arguments. The function is append. The first argument is (first, r). The second argument is deleteDuplicates. The third argument is (remaining, r).
Since append has just one argument (a tuple), you're passing it too many arguments. This is what the compiler is telling you.
You also seem to be thinking that append(first, r) will change the value of r. This is not the case. Variables in OCaml are immutable. You can't do anything that will change the value of r.
Update
I think you have too many questions for SO to help you effectively at this point. You might try reading some OCaml tutorials. It will be much faster than asking a question here for every error you see :-)
Nonetheless, here's what "match failure" means. It means that somewhere you have a match that you're applying to an expression, but none of the patterns of the match matches the expression. Your deleteDuplicates code clearly has a pattern coverage error; i.e., it has a pattern that doesn't cover all cases. Your first match only works for empty lists or for lists of 2 or more elements. It doesn't work for lists of 1 element.

Evaluation functions and expressions in Boolean expressions

I know how to evaluate an expression after converting it into Polish notation. However, I would like to know how to evaluate something like this:
If a < b Then a + b Else a - b
If the condition a < b is true, a + b is computed; otherwise, a - b is computed.
The grammar is not an issue here, since I only need the algorithm. I am able to evaluate boolean and algebraic expressions, but how can I go about solving the above problem?
Do you need to assign a+b or a-b to something?
You can do this:
int c = a < b ? a+b : a-b;
Or
int sign = a < b ? 1 : -1;
int c = a + (sign * b);
For comparison, the same expression as a Lisp S-expression:
(if (< a b)  ; if-part
    (+ a b)  ; then-part
    (- a b)) ; else-part
If you only want to evaluate this simple if statement, tokenize it and evaluate it. But if you want to evaluate more complicated things, like nested if-then-else, if with expressions, multiple else branches, variable assignments, types, and so on, you need a real parser, such as an LR parser. You can use e.g. Lex & Yacc to write a good parser for your own language; they support fairly complicated grammars. If you want to know how an LR parser works, read up on them and see how they use their parse table to read tokens and parse them; e.g. take a look at the Wikipedia page on LR parsers (the table-driven algorithm is more than a simple stack and is hard to describe here).
If your problem really is just parsing an if statement, you can borrow a trick from parsing techniques: add an empty marker after a < b that stands for an action, and another empty marker after else that also stands for an action. Once the condition has been parsed, run one of the two actions depending on whether it is true or false. By the way, if you want to parse expressions inside the if statement, you need a conditional stack, i.e. something like an SLR table.
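If the expression is parsed into a tree instead of a flat token stream, if-then-else needs no special stack handling: evaluating an If node evaluates the condition first and then only the chosen branch, which is the "run one of the two actions" idea above. A minimal Haskell sketch, assuming integer variables looked up in an association list; the Expr and Cond constructors are hypothetical, not from any library.

```haskell
-- Hypothetical expression tree for the question's language.
data Expr = Num Int
          | Var String
          | Add Expr Expr
          | Sub Expr Expr
          | If Cond Expr Expr   -- If condition thenBranch elseBranch
data Cond = Less Expr Expr

type Env = [(String, Int)]

-- Nothing signals an unbound variable.
eval :: Env -> Expr -> Maybe Int
eval _   (Num n)    = Just n
eval env (Var x)    = lookup x env
eval env (Add l r)  = (+) <$> eval env l <*> eval env r
eval env (Sub l r)  = (-) <$> eval env l <*> eval env r
eval env (If c t e) = do
  b <- evalCond env c
  if b then eval env t else eval env e   -- only the chosen branch is evaluated

evalCond :: Env -> Cond -> Maybe Bool
evalCond env (Less l r) = (<) <$> eval env l <*> eval env r

-- If a < b Then a + b Else a - b
example :: Expr
example = If (Less (Var "a") (Var "b")) (Add (Var "a") (Var "b")) (Sub (Var "a") (Var "b"))
```

With a = 2 and b = 5, eval gives Just 7; with a = 5 and b = 2, it gives Just 3. An else branch that mentions an unbound variable does no harm as long as the condition is true.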
Basically, you need to build in support for a ternary operator. That is, where you currently pop an operator and then wait for 2 sequential values before resolving it, you need to wait for 3 if the current operator is IF, and 2 for the other operators.
To handle the if statement, you can treat it like C++'s ternary operator. Which formats you want your grammar to support is up to you.
a < b ? a + b : a - b
You should be able to evaluate boolean operators on your stack the way you currently evaluate arithmetic operations, so a < b should be pushed as
< a b
The if can be represented by its own symbol on the stack, we can stick with '?'.
? < a b
and the 2 possible branches to evaluate need to be separated by another operator; we might as well use ':'
? < a b : + a b - a b
So now when you pop '?', you see it is the operator that needs 3 values, so put it aside as you normally would, and continue to evaluate the stack until you have 3 values. The ':' operator should be a binary operator, that simply pushes both of its values back onto the stack.
Once you have 3 values on the stack, you evaluate ? as:
If the first value is 1, push the 2nd value, throw away the third.
If the first value is 0, throw away the 2nd and push the 3rd.
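The stack scheme above can be sketched in Haskell by scanning the prefix tokens from right to left (foldr does this naturally), pushing operands and letting each operator pop its arguments. As described, ':' simply leaves its two operands on the stack and '?' pops three values, with booleans represented as 1 and 0. The token set (?, :, +, -, <), the space-separated input, and the variable environment are assumptions for illustration.

```haskell
import Text.Read (readMaybe)

type Env = [(String, Int)]

-- One step of the right-to-left scan; the head of the list is the top of the stack.
step :: Env -> String -> Maybe [Int] -> Maybe [Int]
step env tok acc = acc >>= apply
  where
    apply stack = case (tok, stack) of
      ("?", c : t : e : rest) -> Just ((if c /= 0 then t else e) : rest)
      (":", _ : _ : _)        -> Just stack  -- ':' leaves both branch values in place
      ("+", x : y : rest)     -> Just (x + y : rest)
      ("-", x : y : rest)     -> Just (x - y : rest)
      ("<", x : y : rest)     -> Just (fromEnum (x < y) : rest)  -- 1 for true, 0 for false
      _ | Just v <- lookup tok env -> Just (v : stack)
        | Just n <- readMaybe tok  -> Just (n : stack)
        | otherwise                -> Nothing  -- unknown token or too few operands

-- Evaluate a space-separated prefix expression; exactly one value must remain.
evalPrefix :: Env -> String -> Maybe Int
evalPrefix env src = case foldr (step env) (Just []) (words src) of
  Just [v] -> Just v
  _        -> Nothing
```

With a = 2 and b = 5, evalPrefix on "? < a b : + a b - a b" yields Just 7 (a + b); with a = 5 and b = 2 it yields Just 3 (a - b).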

Haskell's algebraic data types

I'm trying to fully understand all of Haskell's concepts.
In what ways are algebraic data types similar to generic types, e.g., in C# and Java? And how are they different? What's so algebraic about them anyway?
I'm familiar with universal algebra and its rings and fields, but I only have a vague idea of how Haskell's types work.
Haskell's algebraic data types are so named because they correspond to initial algebras in category theory, which gives us some laws, some operations and some symbols to manipulate. We may even use algebraic notation to describe regular data structures, where:
+ represents sum types (disjoint unions, e.g. Either).
• represents product types (e.g. structs or tuples)
X for the singleton type (e.g. data X a = X a)
1 for the unit type ()
and μ for the least fixed point (e.g. recursive types), usually implicit.
with some additional notation:
X² for X•X
In fact, you might say (following Brent Yorgey) that a Haskell data type is regular if it can be expressed in terms of 1, X, +, •, and a least fixed point.
With this notation, we can concisely describe many regular data structures:
Units: data () = (), i.e. 1
Options: data Maybe a = Nothing | Just a, i.e. 1 + X
Lists: data [a] = [] | a : [a], i.e. L = 1 + X•L
Binary trees: data BTree a = Empty | Node a (BTree a) (BTree a), i.e. B = 1 + X•B²
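One way to read these equations in Haskell is as isomorphisms into the standard sum and product types: + becomes Either, • becomes a pair, and 1 becomes (). A small sketch for the Options and Lists lines (the function names are illustrative); each pair of functions are mutual inverses.

```haskell
-- Options: Maybe a is isomorphic to 1 + X, with Either as + and () as 1.
maybeToSum :: Maybe a -> Either () a
maybeToSum Nothing  = Left ()
maybeToSum (Just x) = Right x

sumToMaybe :: Either () a -> Maybe a
sumToMaybe (Left ()) = Nothing
sumToMaybe (Right x) = Just x

-- Lists: L = 1 + X•L, with a pair as •; this is one step of unfolding the fixed point.
unrollList :: [a] -> Either () (a, [a])
unrollList []       = Left ()
unrollList (x : xs) = Right (x, xs)

rollList :: Either () (a, [a]) -> [a]
rollList (Left ())       = []
rollList (Right (x, xs)) = x : xs
```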
Other operations hold (taken from Brent Yorgey's paper, listed in the references):
Expansion: unfolding the fixed point can be helpful for thinking about lists: L = 1 + X + X² + X³ + ... (that is, lists are either empty, or they have one element, or two elements, or three, or ...).
Composition, ◦: given types F and G, the composition F ◦ G is a type which builds “F-structures made out of G-structures” (e.g. R = X • (L ◦ R), where L is the list type, describes rose trees).
Differentiation: the derivative of a data type D (written D′) is the type of D-structures with a single “hole”, that is, a distinguished location not containing any data. Remarkably, these derivatives satisfy the same rules as differentiation in calculus:
1′ = 0
X′ = 1
(F + G)′ = F′ + G′
(F • G)′ = F • G′ + F′ • G
(F ◦ G)′ = (F′ ◦ G) • G′
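As a small check of these rules, take lists, L = 1 + X•L. The sum and product rules give L′ = X•L′ + 1•L, and unfolding that equation gives L′ = L•(1 + X + X² + ...) = L•L: a list with one hole is described by the elements before the hole and the elements after it. A Haskell sketch of that one-hole context (ListHole, plug and holes are illustrative names, not library functions):

```haskell
-- A list with one hole: elements before the hole (nearest first) and elements after it.
data ListHole a = ListHole [a] [a]
  deriving (Show, Eq)

-- Fill the hole with a value, recovering an ordinary list.
plug :: a -> ListHole a -> [a]
plug x (ListHole before after) = reverse before ++ [x] ++ after

-- Every way of punching one hole in a list, paired with the element removed.
holes :: [a] -> [(a, ListHole a)]
holes = go []
  where
    go _ [] = []
    go before (x : after) = (x, ListHole before after) : go (x : before) after
```

For [1, 2, 3], the middle hole is (2, ListHole [1] [3]), and plugging each removed element back into its hole returns the original list.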
References:
Brent A. Yorgey. Species and Functors and Types, Oh My! Haskell ’10, September 30, 2010, Baltimore, Maryland, USA.
Conor McBride. Clowns to the Left of Me, Jokers to the Right (Dissecting Data Structures). POPL 2008.
"Algebraic Data Types" in Haskell support full parametric polymorphism, which is the more technically correct name for generics. As a simple example, take the list data type:
data List a = Cons a (List a) | Nil
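is roughly equivalent (as far as possible, and ignoring non-strict evaluation, etc.) to this C#:
abstract class List<T> {
    private List() {}  // only the nested classes below can derive from List<T>
    public sealed class Cons : List<T> {
        public T head;
        public List<T> tail;
    }
    public sealed class Nil : List<T> {}
}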
Of course, Haskell's type system allows more ... interesting uses of type parameters, but this is just a simple example. With regard to the "Algebraic Type" name, I've honestly never been entirely sure of the exact reason for it, but have assumed that it's due to the mathematical underpinnings of the type system. I believe the reason boils down to the theoretical definition of an ADT being the "product of a set of constructors", but it's been a couple of years since I escaped university, so I can no longer remember the specifics.
[Edit: Thanks to Chris Conway for pointing out my foolish error: ADTs are of course sum types, with the constructors providing the product/tuple of fields.]
In universal algebra, an algebra consists of some sets of elements (think of each set as the set of values of a type) and some operations, which map elements to elements.
For example, suppose you have a type of "list elements" and a type of "lists". As operations you have the "empty list", which is a 0-argument function returning a "list", and a "cons" function, which takes two arguments, a "list element" and a "list", and produces a "list".
At this point there are many algebras that fit the description, as two undesirable things may happen:
There could be elements in the "list" set which cannot be built from the "empty list" and the "cons" operation, so-called "junk". These could be lists starting from some element that fell from the sky, or loops without a beginning, or infinite lists.
The results of "cons" applied to different arguments could be equal, e.g. consing an element to a non-empty list could be equal to the empty list. This is sometimes called "confusion".
An algebra which has neither of these undesirable properties is called initial, and this is the intended meaning of the abstract data type.
The name initial derives from the property that there is exactly one homomorphism from the initial algebra to any given algebra. Essentially, you can evaluate the value of a list by applying the operations in the other algebra, and the result is well-defined.
It gets more complicated for polymorphic types ...
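For lists, that unique homomorphism out of the initial algebra is what Haskell calls foldr: it replaces [] by the target algebra's "empty" value and every (:) by its "cons" operation. A sketch, with the usual caveat that the reasoning applies to finite, fully defined lists; ListAlg, hom and the sample algebras are illustrative names.

```haskell
-- An algebra for the list signature: a carrier r with an "empty" value
-- and a "cons" operation.
data ListAlg e r = ListAlg { nilOp :: r, consOp :: e -> r -> r }

-- The homomorphism from the initial algebra ([e], [], (:)) into any ListAlg:
-- replace [] with nilOp and each (:) with consOp.
hom :: ListAlg e r -> [e] -> r
hom alg = foldr (consOp alg) (nilOp alg)

sumAlg :: ListAlg Int Int
sumAlg = ListAlg 0 (+)

lengthAlg :: ListAlg e Int
lengthAlg = ListAlg 0 (\_ n -> n + 1)

-- The initial algebra itself: the homomorphism into it is the identity.
listAlg :: ListAlg e [e]
listAlg = ListAlg [] (:)
```

Evaluating the same list in different algebras gives, for example, hom sumAlg [1, 2, 3] = 6 and hom lengthAlg "abcd" = 4.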
A simple reason why they are called algebraic: there are both sum (logical disjunction) and product (logical conjunction) types. A sum type is a discriminated union, e.g.:
data Bool = False | True
A product type is a type with multiple parameters:
data Pair a b = Pair a b
In OCaml, "product" is made more explicit:
type ('a, 'b) pair = Pair of 'a * 'b
Haskell's datatypes are called "algebraic" because of their connection to categorical initial algebras. But that way lies madness.
@olliej: ADTs are actually "sum" types. Tuples are products.
@Timbo:
You are basically right about it being sort of like an abstract Tree class with three derived classes (Empty, Leaf, and Node). But you would also need to guarantee that someone using your Tree class can never add new derived classes, since the strategy for using the Tree data type is to write code that switches at runtime on the type of each element in the tree, and adding new derived types would break existing code. You can imagine this getting nasty in C# or C++, but in Haskell, ML, and OCaml this is central to the language design and syntax, so the coding style supports it much more conveniently, via pattern matching.
ADT (sum types) are also sort of like tagged unions or variant types in C or C++.
Old question, but no one has mentioned nullability, which is an important aspect of algebraic data types, perhaps the most important one. Since each value must be one of the declared alternatives (there is no implicit null), exhaustive case-based pattern matching is possible.
For me, the concept of Haskell's algebraic data types always looked like polymorphism in OO-languages like C#.
Look at the example from http://en.wikipedia.org/wiki/Algebraic_data_types:
data Tree = Empty
| Leaf Int
| Node Tree Tree
This could be implemented in C# as a TreeNode base class, with a derived Leaf class and a derived TreeNodeWithChildren class, and if you want even a derived EmptyNode class.
(OK I know, nobody would ever do that, but at least you could do it.)
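In Haskell, the closed set of cases discussed in the earlier answers is what makes exhaustive pattern matching possible. A short sketch over the Wikipedia Tree type (sumLeaves and depth are illustrative names). When compiled with -Wall, GHC's -Wincomplete-patterns warning flags any such function that forgets one of the three constructors.

```haskell
-- The Wikipedia example type: exactly these three cases, and no others can be added.
data Tree = Empty
          | Leaf Int
          | Node Tree Tree

-- Each function covers every constructor.
sumLeaves :: Tree -> Int
sumLeaves Empty      = 0
sumLeaves (Leaf n)   = n
sumLeaves (Node l r) = sumLeaves l + sumLeaves r

depth :: Tree -> Int
depth Empty      = 0
depth (Leaf _)   = 1
depth (Node l r) = 1 + max (depth l) (depth r)
```

For Node (Leaf 1) (Node Empty (Leaf 2)), sumLeaves gives 3 and depth gives 3.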

Resources