Shunting Yard Algorithm in case "a b +"

We use the Shunting-Yard algorithm to evaluate expressions. We can validate an expression simply by applying the algorithm: it fails if there are missing operands, mismatched parentheses, and similar errors. The Shunting-Yard algorithm, however, supports a larger syntax than just human-readable infix. For example,
1 + 2
+ 1 2
1 2 +
are all acceptable ways to provide '1+2' as input to the Shunting-Yard algorithm. '+ 1 2' and '1 2 +' are not valid infix, but the standard Shunting-Yard algorithm can handle them. The algorithm does not really care about the order; it applies operators in order of precedence, grabbing the 'nearest' operands.
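A minimal sketch of why this happens, assuming only left-associative binary operators and whitespace-separated tokens (the names PRECEDENCE and to_postfix are illustrative, not from any library). Because the conversion never checks where an operand sits relative to an operator, all three spellings produce the same postfix output:

```python
# Hypothetical minimal shunting yard: binary, left-associative operators only.
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}


def to_postfix(tokens):
    """Convert tokens to postfix without any syntax validation."""
    output, stack = [], []
    for tok in tokens:
        if tok in PRECEDENCE:
            # Pop operators of greater or equal precedence (left-associative).
            while stack and PRECEDENCE[stack[-1]] >= PRECEDENCE[tok]:
                output.append(stack.pop())
            stack.append(tok)
        else:
            # Anything that is not an operator is treated as an operand.
            output.append(tok)
    while stack:
        output.append(stack.pop())
    return output


# Each of the three inputs prints the same postfix form: 1 2 +
for text in ("1 + 2", "+ 1 2", "1 2 +"):
    print(text, "->", " ".join(to_postfix(text.split())))
```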
We would like to restrict our input to valid human-readable infix. I am looking for a way either to modify the Shunting-Yard algorithm so that it fails on invalid infix, or to validate the infix before running Shunting-Yard.
Is anyone aware of any published techniques to do this? We must support basic operators, custom operators, brackets, and functions (with multiple arguments). I haven't seen anything online that works with more than the basic operators.
Thanks

The solution to my problem was to enhance the algorithm posted on Wikipedia with the state machine recommended by Rici. I am posting the pseudocode here because it may be of use to others.
Support two states, ExpectOperand and ExpectOperator.
Set the state to ExpectOperand.
While there are tokens to read:
    If the token is a constant (number):
        Error if the state is not ExpectOperand.
        Push the token to the output queue.
        Set the state to ExpectOperator.
    If the token is a variable:
        Error if the state is not ExpectOperand.
        Push the token to the output queue.
        Set the state to ExpectOperator.
    If the token is an argument separator (a comma):
        Error if the state is not ExpectOperator.
        Until the top of the operator stack is a left parenthesis (don't pop the left parenthesis):
            Pop the top token of the operator stack and push it to the output queue.
        If no left parenthesis is encountered then error. Either the separator was misplaced or the parentheses were mismatched.
        Set the state to ExpectOperand.
    If the token is a unary operator:
        Error if the state is not ExpectOperand.
        Push the token to the operator stack.
        Set the state to ExpectOperand.
    If the token is a binary operator:
        Error if the state is not ExpectOperator.
        While there is an operator token at the top of the operator stack and either the current token is left-associative and of lower than or equal precedence to the operator on the stack, or the current token is right-associative and of lower precedence than the operator on the stack:
            Pop the operator from the operator stack and push it onto the output queue.
        Push the current operator onto the operator stack.
        Set the state to ExpectOperand.
    If the token is a function:
        Error if the state is not ExpectOperand.
        Push the token onto the operator stack.
        Set the state to ExpectOperand.
    If the token is an open parenthesis:
        Error if the state is not ExpectOperand.
        Push the token onto the operator stack.
        Set the state to ExpectOperand.
    If the token is a close parenthesis:
        Error if the state is not ExpectOperator.
        Until the token at the top of the operator stack is a left parenthesis:
            Pop the token off the operator stack and push it onto the output queue.
        If the stack runs out without a left parenthesis being found then error. There are mismatched parentheses.
        Pop the left parenthesis off the operator stack and discard it.
        If the token at the top of the operator stack is a function then pop it and push it onto the output queue.
        Set the state to ExpectOperator.
At this point you have processed all the input tokens.
Error if the state is not ExpectOperator. The input ended where an operand was expected.
While there are tokens on the operator stack:
    Pop the next token from the operator stack.
    If it is a parenthesis then error. There are mismatched parentheses.
    Otherwise push it onto the output queue.
You can easily differentiate between unary and binary operators (I'm specifically speaking about the negative prefix and the subtraction operator) by looking at the previous token. If there is no previous token, or the previous token is an open parenthesis, an argument separator, or an operator, then you have encountered a unary prefix operator; otherwise you have encountered the binary operator. Equivalently, a minus sign seen in the ExpectOperand state is unary.
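The two-state check can be sketched in Python roughly as follows. This is a sketch under stated assumptions, not the original poster's code: the operator table, the function names in FUNCTIONS, the "u-" marker for unary minus and the pre-split token list are all made up for illustration, and anything that is not an operator, function, comma or parenthesis is treated as a constant or variable. It also rejects input that ends while an operand is still expected.

```python
# Sketch of the two-state check. The operator table, the function names and
# the "u-" marker for unary minus are assumptions made for this example.
PREC = {"+": 1, "-": 1, "*": 2, "/": 2, "^": 3, "u-": 4}
RIGHT_ASSOC = {"^", "u-"}
FUNCTIONS = {"min", "max"}


def to_postfix(tokens):
    """Convert infix tokens to postfix, raising SyntaxError on invalid infix."""
    output, stack = [], []
    expect_operand = True  # True: ExpectOperand, False: ExpectOperator
    for tok in tokens:
        if tok in FUNCTIONS or tok == "(":
            if not expect_operand:
                raise SyntaxError(f"unexpected {tok!r}")
            stack.append(tok)
        elif tok == "-" and expect_operand:
            stack.append("u-")  # the state alone tells us this minus is unary
        elif tok in PREC and tok != "u-":
            if expect_operand:
                raise SyntaxError(f"operator {tok!r} is missing its left operand")
            # Pop tighter-binding operators (and equal ones if left-associative).
            while stack and stack[-1] in PREC and (
                PREC[stack[-1]] > PREC[tok]
                or (PREC[stack[-1]] == PREC[tok] and tok not in RIGHT_ASSOC)
            ):
                output.append(stack.pop())
            stack.append(tok)
            expect_operand = True
        elif tok in (",", ")"):
            if expect_operand:
                raise SyntaxError(f"unexpected {tok!r}")
            while stack and stack[-1] != "(":
                output.append(stack.pop())
            if not stack:
                raise SyntaxError("misplaced separator or mismatched parentheses")
            if tok == ",":
                expect_operand = True
            else:
                stack.pop()  # discard the "("
                if stack and stack[-1] in FUNCTIONS:
                    output.append(stack.pop())
        else:  # constant or variable
            if not expect_operand:
                raise SyntaxError(f"operand {tok!r} is missing an operator")
            output.append(tok)
            expect_operand = False
    if expect_operand:
        raise SyntaxError("expression ends where an operand was expected")
    while stack:
        if stack[-1] == "(":
            raise SyntaxError("mismatched parentheses")
        output.append(stack.pop())
    return output


print(to_postfix("min ( 1 , 2 + 3 )".split()))  # ['1', '2', '3', '+', 'min']
```

With this check in place, "1 2 +" and "+ 1 2" both raise SyntaxError instead of being silently accepted.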

A nice discussion of shunting yard algorithms is at http://www.engr.mun.ca/~theo/Misc/exp_parsing.htm
The algorithm presented there uses the key idea of the operator stack but adds some grammar so that it knows what to expect next. It has two main functions: E(), which expects an expression, and P(), which expects a prefix operator, a variable, a number, brackets, or a function. Prefix operators always bind tighter than binary operators, so you want to deal with any of them first.
If we say P stands for some prefix sequence and B is a binary operator then any expression will be of the form
P B P B P
i.e. you are always expecting either a prefix sequence or a binary operator. Formally the grammar is
E -> P (B P)*
and P will be
P -> -P | variable | constant | etc.
This translates to pseudocode as
E() {
    P()
    while next token is a binary op:
        read next op
        push onto stack and do the shunting yard logic
        P()
    if any tokens remain report error
    pop remaining operators off the stack
}
P() {
    if next token is constant or variable:
        add to output
    else if next token is unary minus:
        push uminus onto operator stack
        P()
}
You can expand this to handle other unary operators, functions, brackets, suffix operators.
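As a rough Python sketch of the E()/P() structure, here extended with brackets: the class name Parser, the operator table and the "u-" marker for unary minus are assumptions made for this example. Because parse_e is reused for bracketed sub-expressions, the "tokens remain" check and the final flush of the operator stack are moved into a hypothetical wrapper, to_postfix.

```python
# Sketch of E -> P (B P)* with P -> "-" P | "(" E ")" | operand.
# The class name, operator table and "u-" marker are assumptions.
PREC = {"+": 1, "-": 1, "*": 2, "/": 2, "u-": 3}


class Parser:
    def __init__(self, tokens):
        self.tokens = list(tokens)
        self.pos = 0
        self.output = []
        self.stack = []

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def push_operator(self, op):
        # Shunting yard step: pop operators that bind at least as tightly.
        while (self.stack and self.stack[-1] != "("
               and PREC[self.stack[-1]] >= PREC[op]):
            self.output.append(self.stack.pop())
        self.stack.append(op)

    def parse_e(self):
        self.parse_p()
        while self.peek() in ("+", "-", "*", "/"):
            self.push_operator(self.advance())
            self.parse_p()

    def parse_p(self):
        tok = self.advance()
        if tok == "-":
            self.stack.append("u-")  # prefix operator: push without popping
            self.parse_p()
        elif tok == "(":
            self.stack.append("(")
            self.parse_e()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            while self.stack[-1] != "(":
                self.output.append(self.stack.pop())
            self.stack.pop()  # discard the "("
        elif tok is None or tok in PREC or tok == ")":
            raise SyntaxError(f"expected an operand, got {tok!r}")
        else:
            self.output.append(tok)  # constant or variable


def to_postfix(tokens):
    parser = Parser(tokens)
    parser.parse_e()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    while parser.stack:
        parser.output.append(parser.stack.pop())
    return parser.output


print(to_postfix("- ( 1 + 2 ) * 3".split()))  # ['1', '2', '+', 'u-', '3', '*']
```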

Related

What to do when converting infix to postfix expression using stack?

The problem I am facing is what to do when two operators of the same priority meet.
Example:
If ^ is on the top of the stack and another ^ comes in, what should I do?
Should I push it onto the stack, pop one ^, or pop both ^ off the stack?
If both operators have the same precedence and the operation is associative (as + and * are), it doesn't matter in which order you execute the calculation as long as no brackets are involved. You can push the new operator onto the stack and do both calculations later, or pop the existing one and do its calculation now. For operators such as - and ^ the order does matter.
What to do in this case depends on the operator or its specific precedence level, and is referred to as the operator's associativity: https://en.wikipedia.org/wiki/Operator_associativity
Usually + and - have the same precedence and left associativity, for example, meaning a+b-c+d = ((a+b)-c)+d.
The assignment operators usually have right-associativity, meaning a=b+=c=d is the same as a=(b+=(c=d))
I haven't done a detailed survey, but I think that exponentiation operators usually have right associativity, because left associativity is redundant with multiplication, i.e., (a^b)^c = a^(b*c)
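In shunting yard terms, associativity decides whether the operator on top of the stack is popped before an incoming operator of equal precedence is pushed. A small sketch, assuming the operator table below (the names OPS and should_pop are illustrative):

```python
# Assumed operator table: token -> (precedence, associativity).
OPS = {
    "+": (1, "left"), "-": (1, "left"),
    "*": (2, "left"), "/": (2, "left"),
    "^": (3, "right"),
}


def should_pop(top, incoming):
    """True if `top` (on the stack) must be output before `incoming` is pushed."""
    top_prec, _ = OPS[top]
    in_prec, in_assoc = OPS[incoming]
    if in_assoc == "left":
        return top_prec >= in_prec  # equal precedence: pop, so a-b-c = (a-b)-c
    return top_prec > in_prec       # equal precedence: keep, so a^b^c = a^(b^c)


print(should_pop("-", "+"))  # True
print(should_pop("^", "^"))  # False

# Python's ** is right-associative, which makes the difference visible:
print((2 ** 3) ** 2, 2 ** (3 * 2), 2 ** (3 ** 2))  # 64 64 512
```

So for ^ on top of the stack and another ^ arriving, a right-associative ^ is simply pushed; nothing is popped.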

Parse expression with functions

This is my situation: the input is a string that contains a normal mathematical operation like 5+3*4. Functions are also possible, e.g. min(5,A*2). This string is already tokenized, and now I want to parse it using stacks (so no AST). I first used the Shunting Yard Algorithm, but here my main problem arises:
Suppose you have this (tokenized) string: min(1,2,3,+) which is obviously invalid syntax. However, SYA turns this into the output stack 1 2 3 + min(, and hopefully you see the problem coming. When parsing from left to right, it sees the + first, calculating 2+3=5, and then calculating min(1,5), which results in 1. Thus, my algorithm says this expression is completely fine, while it should throw a syntax error (or something similar).
What is the best way to prevent things like this? Add a special delimiter (such as the comma), use a different algorithm, or what?
In order to prevent this issue, you might have to keep track of the stack depth. The way I would do this (and I'm not sure it is the "best" way) is with another stack.
The new stack follows these rules:
When an open parenthesis, (, or function is parsed, push a 0.
    Do this in case of nested functions.
When a closing parenthesis, ), is parsed, pop the last item off and add it to the new last value on the stack.
    The number that just got popped off is how many values were returned by the function. You probably want this to always be 1.
When a comma or similar delimiter is parsed, pop from the stack, add that number to the new last element, then push a 0.
    Reset so that we can begin verifying the next argument of a function.
    The value that just got popped off is how many values were returned by the statement. You probably want this to always be 1.
When a number is pushed to the output, increment the top element of this stack.
    This is how many values are available in the output. Numbers increase the number of values. Binary operators need to have at least 2.
When a binary operator is pushed to the output, decrement the top element.
    A binary operator takes 2 values and outputs 1, thus reducing the overall number of values left on the output by 1.
    In general, an n-ary operator that takes n values and returns m values should add (m-n) to the top element.
If this value ever becomes negative, throw an error!
This will find that the last argument in your example, which just contains a +, will decrement the top of the stack to -1, automatically throwing an error.
But then you might notice that a final argument in your example of, say, 3+ would return a zero, which is not negative. In this case, you would throw an error in one of the steps where "you probably want this to always be 1."
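A simplified sketch of this counting idea in Python. It is a variant rather than the exact bookkeeping above: a binary operator is rejected unless at least two values are available, each argument's count is checked and reset at the comma, and a closed group or call contributes exactly one value to the enclosing level. The token shapes (a function name fused with its parenthesis, as in "min(") and the names PREC and to_postfix_checked are assumptions for this example.

```python
# Sketch: shunting yard plus a stack of value counts, one entry per open
# parenthesis level. Token shapes such as "min(" are assumptions.
PREC = {"+": 1, "-": 1, "*": 2, "/": 2}


def to_postfix_checked(tokens):
    """Return postfix output; raise SyntaxError when value counts do not add up."""
    output, ops, counts = [], [], [0]  # counts[0] is the top-level expression

    def emit_operator(op):
        if counts[-1] < 2:
            raise SyntaxError(f"operator {op!r} needs two values")
        counts[-1] -= 1  # two values in, one value out
        output.append(op)

    def close_level(what):
        # Flush operators down to the nearest "(" or function token.
        while ops and not ops[-1].endswith("("):
            emit_operator(ops.pop())
        if counts[-1] != 1:
            raise SyntaxError(f"{what} must produce exactly one value")

    for tok in tokens:
        if tok.endswith("("):  # "(" or a function token such as "min("
            ops.append(tok)
            counts.append(0)
        elif tok == ",":
            close_level("argument")
            if not ops:
                raise SyntaxError("separator outside of a function call")
            counts[-1] = 0  # start counting the next argument
        elif tok == ")":
            close_level("group")
            if not ops:
                raise SyntaxError("mismatched parentheses")
            opener = ops.pop()
            counts.pop()
            counts[-1] += 1  # the group or call yields one value
            if opener != "(":
                output.append(opener)  # function token goes to the output
        elif tok in PREC:
            while ops and ops[-1] in PREC and PREC[ops[-1]] >= PREC[tok]:
                emit_operator(ops.pop())
            ops.append(tok)
        else:  # number or variable
            output.append(tok)
            counts[-1] += 1
    close_level("expression")
    if ops:
        raise SyntaxError("mismatched parentheses")
    return output


print(to_postfix_checked("min( 5 , 2 * 3 )".split()))
```

Both min(1,2,3,+) and min(1,3+) are rejected by this sketch. Because it only counts values, it would still accept an input such as 1 2 +, so it complements rather than replaces a check on token order.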

How to build a binary expression tree from a prefix notation?

Something like this: ( * (+ 1 2 3) 5)
Operators like * and + can have more than two operands.
To make prefix notation work with an unbounded number of operands you have to define some additional rules for open/close brackets (and that's not what prefix notation generally does).
A simple parser will take the operation and the first operand, then add the other operands one by one. On each step just create a new operation node: its left operand is the previous (current) result and its right operand is the newly fetched operand.
Continue up to the end of input or a close bracket. Do not remove the close bracket from the input; it should be dealt with in the open/close parsing part, not in the operation parsing.
Taking an operand is straightforward:
"(" -> go deeper and parse the subexpression up to ")".
A different operation -> go deeper and parse the subexpression.
The same operation can simply be ignored, but it's up to you.
A constant (or a variable, if you have them) -> make an operand subexpression.
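A minimal sketch of this left fold for the fully bracketed form, assuming every operation is wrapped in parentheses as in the example. The tuple layout (op, left, right) and the helper names tokenize and parse are illustrative choices, and malformed input is not handled (it would surface as an IndexError).

```python
# Sketch: fold an n-ary prefix form such as (* (+ 1 2 3) 5) into binary nodes.
# The tuple layout (op, left, right) and the tokenizer are assumptions.
def tokenize(text):
    return text.replace("(", " ( ").replace(")", " ) ").split()


def parse(tokens, pos=0):
    """Parse one operand starting at tokens[pos]; return (tree, next_pos)."""
    tok = tokens[pos]
    if tok != "(":
        return tok, pos + 1  # constant or variable leaf
    op = tokens[pos + 1]
    node, pos = parse(tokens, pos + 2)  # first operand
    while tokens[pos] != ")":
        right, pos = parse(tokens, pos)
        node = (op, node, right)  # previous result becomes the left child
    return node, pos + 1  # consume the matching ")"


tree, _ = parse(tokenize("( * (+ 1 2 3) 5)"))
print(tree)  # ('*', ('+', ('+', '1', '2'), '3'), '5')
```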

How to convert infix expression to prefix expression without using stack, array, programming language or implementation

Can someone tell me an algorithm, or the steps to take, for converting an infix expression to a prefix expression without using a stack, an array, or any programming language or implementation? Just simple human algorithms for non-CS students.
If anyone has a better algorithm or steps please specify, and also try to solve it for me please... :)
(5+15/3)^2-(8*3/3*4/5*32/5+42)*(3*3/3*5/4)
The "simple human algorithm" uses a stack. Consider the Shunting Yard, for example. You can do that with paper and pencil. The "output queue" is simply the solution that you output. The "stack" is just a holding place. So when it says, "push onto the stack", imagine putting that value on the top of a stack of other values. When it says, "pop from the stack," imagine removing the thing that was on top.
When doing it with pencil and paper, dedicate a couple of lines at the bottom of the page for your output queue. Create a column on the right side of the page as your stack. Wherever it says, "write it to the output queue", write that value as the next value on your answer line.
The first time it says, "push onto the stack", write that value in the stack column, at the bottom. If you have to push something else, write it above that value. When it says "pop from the stack," erase the top value from your stack column, freeing up a space.
That really is the simplest reliable way to do things by hand.
I'll use the first bit of your example for a demonstration. Let's say you want to convert (5+15/3)^2 to postfix. Using the instructions in the Shunting Yard article:
Your output queue is empty and so is your stack. The first token is (. The instructions say to push it onto the stack. So we have:
output queue:
stack: (
The next token is 5. It goes to the output queue:
output queue: 5
stack: (
Next is +. Since the top of the stack is an open parenthesis rather than an operator, we just push it:
output queue: 5
stack: ( +
Next is 15. It goes to the output queue
output queue: 5 15
stack: ( +
Next is /. It's an operator and there's an operator on the stack, but / has higher precedence than +. So according to the rules, we push / onto the stack:
output queue: 5 15
stack: ( + /
Next is 3. It goes to the output queue:
output queue: 5 15 3
stack: ( + /
Next is ). The rules say to start popping operators from the stack until we get to the open parenthesis, which is then popped and discarded. If we empty the stack without finding an open paren, then we have mismatched parentheses. Anyway, popping the stack and adding to the output queue:
output queue: 5 15 3 / +
stack: <empty>
Next token is ^. There are no operators on the stack, so we push it.
output queue: 5 15 3 / +
stack: ^
Finally, we have 2. It goes to the output queue:
output queue: 5 15 3 / + 2
stack: ^
And we're at the end of the string, so we pop all the operators and put them on the output queue:
output queue: 5 15 3 / + 2 ^
And that's the postfix representation of (5 + 15/3)^2.
The only tricky part is getting the operator precedence right. Basically, exponentiation is highest. Multiplication and division come next, at equal precedence, then addition and subtraction at equal precedence. If those are the only operators, it's easy to remember. Otherwise you'll probably want a table of operator precedence handy so you can refer to it when you're working the algorithm. And unary minus (e.g. 5 + -1) requires a special case. But really, that's all there is to it.
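For checking a hand-worked answer, the same steps can be sketched in Python. The precedence table, the treatment of ^ as right-associative, and the simple tokenizer (digits-only numbers, no unary minus) are assumptions of this sketch. With trace=True it prints the output queue and the stack after each token, in the same spirit as the two columns on paper.

```python
# Sketch of the pencil-and-paper procedure; the operator table is an assumption.
PREC = {"^": 3, "*": 2, "/": 2, "+": 1, "-": 1}
RIGHT_ASSOC = {"^"}


def tokenize(text):
    """Split digits-only numbers, operators and parentheses (no unary minus)."""
    tokens, number = [], ""
    for ch in text:
        if ch.isdigit():
            number += ch
            continue
        if number:
            tokens.append(number)
            number = ""
        if not ch.isspace():
            tokens.append(ch)
    if number:
        tokens.append(number)
    return tokens


def to_postfix(text, trace=False):
    output, stack = [], []
    for tok in tokenize(text):
        if tok == "(":
            stack.append(tok)
        elif tok == ")":
            while stack and stack[-1] != "(":
                output.append(stack.pop())
            if not stack:
                raise ValueError("mismatched parentheses")
            stack.pop()  # discard the "("
        elif tok in PREC:
            while stack and stack[-1] != "(" and (
                PREC[stack[-1]] > PREC[tok]
                or (PREC[stack[-1]] == PREC[tok] and tok not in RIGHT_ASSOC)
            ):
                output.append(stack.pop())
            stack.append(tok)
        else:
            output.append(tok)  # operand
        if trace:
            print(f"{tok:>2}  output: {' '.join(output):<16} stack: {' '.join(stack)}")
    while stack:
        if stack[-1] == "(":
            raise ValueError("mismatched parentheses")
        output.append(stack.pop())
    return " ".join(output)


print(to_postfix("(5+15/3)^2", trace=True))  # last line printed: 5 15 3 / + 2 ^
```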

Why do we need prefix, postfix notation

I know how each of them can be converted to the others, but I never really understood what their applications are. The usual infix notation is quite readable, but where does it fail, such that prefix and postfix notation had to be invented?
Infix notation is easy for humans to read, whereas pre-/postfix notation is easier for a machine to parse. The big advantage of pre-/postfix notation is that questions such as operator precedence never arise.
For example, consider the infix expression 1 # 2 $ 3. Now, we don't know what those operators mean, so there are two possible corresponding postfix expressions: 1 2 # 3 $ and 1 2 3 $ #. Without knowing the rules governing the use of these operators, the infix expression is essentially worthless.
Or, to put it in more general terms: it is possible to restore the original (parse) tree from a pre-/postfix expression without any additional knowledge, but the same isn't true for infix expressions.
Postfix notation, also known as RPN, is very easy to process left-to-right. An operand is pushed onto a stack; an operator pops its operand(s) from the stack and pushes the result. Little or no parsing is necessary. It's used by Forth and by some calculators (HP calculators are noted for using RPN).
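A short sketch of that left-to-right evaluation, assuming whitespace-separated tokens, numeric operands, and the small operator set below (the names OPERATORS and eval_rpn are illustrative):

```python
import operator

# Assumed operator set; each token maps to a two-argument function.
OPERATORS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
    "^": operator.pow,
}


def eval_rpn(tokens):
    """Evaluate postfix tokens with a single value stack."""
    stack = []
    for tok in tokens:
        if tok in OPERATORS:
            if len(stack) < 2:
                raise ValueError(f"operator {tok!r} is missing an operand")
            right = stack.pop()  # the most recent value is the right operand
            left = stack.pop()
            stack.append(OPERATORS[tok](left, right))
        else:
            stack.append(float(tok))
    if len(stack) != 1:
        raise ValueError("malformed expression")
    return stack[0]


print(eval_rpn("5 15 3 / + 2 ^".split()))  # 100.0
```

No precedence table or parentheses are needed: the order of the tokens alone determines the result.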
Prefix notation is nearly as easy to process; it's used in Lisp.
At least for the case of prefix notation: the advantage of using a prefix operator is that, syntactically, it reads as if the operator were a function call.
Another aspect of prefix/postfix vs. infix is that the arity of the operator (how many arguments it is applied to) no longer has to be limited to exactly 2. It can be more, or sometimes less (0 or 1 when defaults are implied naturally, like zero for addition/subtraction, one for multiplication/division).
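To illustrate variable arity, here is a small sketch that evaluates prefix expressions written as nested Python lists, with the operator first. The list representation and the name evaluate are assumptions of this example; the defaults follow the ones mentioned above (zero for addition and subtraction, one for multiplication).

```python
import math


def evaluate(expr):
    """Evaluate a nested-list prefix expression such as ["+", 1, 2, 3]."""
    if not isinstance(expr, list):
        return expr  # a plain number
    op, *args = expr
    values = [evaluate(arg) for arg in args]
    if op == "+":
        return sum(values)        # no arguments: 0
    if op == "*":
        return math.prod(values)  # no arguments: 1
    if op == "-":
        if len(values) <= 1:
            return 0 - sum(values)  # implied zero on the left: ["-", 5] is -5
        return values[0] - sum(values[1:])
    raise ValueError(f"unknown operator {op!r}")


print(evaluate(["*", ["+", 1, 2, 3], 5]))  # 30
print(evaluate(["+"]), evaluate(["*"]), evaluate(["-", 5]))  # 0 1 -5
```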
