Parse expression with functions - algorithm

This is my situation: the input is a string that contains a normal mathematical operation like 5+3*4. Functions are also possible, e.g. min(5,A*2). This string is already tokenized, and now I want to parse it using stacks (so no AST). I first used the Shunting Yard Algorithm (SYA), but here my main problem arises:
Suppose you have this (tokenized) string: min(1,2,3,+) which is obviously invalid syntax. However, SYA turns this into the output stack 1 2 3 + min(, and hopefully you see the problem coming. When parsing from left to right, it sees the + first, calculating 2+3=5, and then calculating min(1,5), which results in 1. Thus, my algorithm says this expression is completely fine, while it should throw a syntax error (or something similar).
What is the best way to prevent things like this? Add a special delimiter (such as the comma), use a different algorithm, or what?

In order to prevent this issue, you might have to keep track of the stack depth. The way I would do this (and I'm not sure it is the "best" way) is with another stack.
The new stack follows these rules:
When an opening parenthesis, (, or a function is parsed, push a 0.
This handles nested functions.
When a closing parenthesis, ), is parsed, pop the last item off and add it to the new top value on the stack.
The number that just got popped off is how many values were returned by the function. You probably want this to always be 1.
When a comma or similar delimiter is parsed, pop from the stack, add that number to the new last element, then push a 0.
Reset so that we can begin verifying the next argument of a function
The value that just got popped off is how many values were returned by the statement. You probably want this to always be 1.
When a number is pushed to the output, increment the top element of this stack.
This is how many values are available in the output. Numbers increase the number of values. Binary operators need to have at least 2.
When a binary operator is pushed to the output, decrement the top element
A binary operator takes 2 values and outputs 1, thus reducing the overall number of values left on the output by 1.
In general, an n-ary operator that takes n values and returns m values should add (m-n) to the top element.
If this value ever becomes negative, throw an error!
This will find that the last argument in your example, which just contains a +, will decrement the top of the stack to -1, automatically throwing an error.
But then you might notice that a final argument in your example of, say, 3+ would return a zero, which is not negative. In this case, you would throw an error in one of the steps where "you probably want this to always be 1."
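The rules above can be sketched in Python as a minimal shunting yard with the extra counting stack. This is only an illustration under stated assumptions: the token format (function tokens such as "min(" that include their opening parenthesis), the names to_rpn, counts and nargs, and the error messages are all hypothetical, and only binary operators are supported.

```python
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}

def to_rpn(tokens):
    """Convert tokens to RPN, raising SyntaxError when the value counts are off."""
    output, ops = [], []
    counts = [0]   # counts[-1]: values available at the current nesting level
    nargs = []     # assumed helper: argument counter per open "(" or function

    def emit(op):
        output.append(op)
        counts[-1] -= 1                  # binary operator: 2 values in, 1 out
        if counts[-1] < 0:
            raise SyntaxError(f"operator {op!r} is missing operands")

    def unwind():
        # Move pending operators to the output, back to the last opener.
        while ops and not ops[-1].endswith("("):
            emit(ops.pop())

    def finish_argument():
        unwind()
        produced = counts.pop()
        if produced != 1:                # "you probably want this to always be 1"
            raise SyntaxError(f"expected 1 value, got {produced}")
        counts[-1] += produced

    for tok in tokens:
        if tok.endswith("("):            # "(" or a function token such as "min("
            ops.append(tok)
            counts.append(0)
            nargs.append(1)
        elif tok in (",", ")"):
            if not nargs:
                raise SyntaxError(f"unexpected {tok!r}")
            finish_argument()
            if tok == ",":
                counts.append(0)         # start counting the next argument
                nargs[-1] += 1
            else:
                opener, n = ops.pop(), nargs.pop()
                if opener != "(":
                    output.append(opener)
                elif n != 1:
                    raise SyntaxError("comma inside plain parentheses")
                counts[-1] += 1 - n      # n values in, 1 value out
        elif tok in PRECEDENCE:
            while ops and ops[-1] in PRECEDENCE and PRECEDENCE[ops[-1]] >= PRECEDENCE[tok]:
                emit(ops.pop())
            ops.append(tok)
        else:                            # number or variable
            output.append(tok)
            counts[-1] += 1

    unwind()
    if ops:
        raise SyntaxError("unbalanced parentheses")
    if counts != [1]:
        raise SyntaxError("expression does not produce exactly one value")
    return output
```

With these assumptions, to_rpn(["min(", "1", ",", "2", ",", "3", ",", "+", ")"]) raises a SyntaxError when the + is moved to the output, and an argument such as 3+ is rejected when the closing parenthesis finds 0 values instead of 1. The sketch only enforces the counting rules, so other malformed inputs may still slip through.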

Related

What is the difference between length of a list and number of a list in AppleScript

Is there any difference between the 2?
I've been using number of list for quite a long time now, but I noticed that length is also reserved in AppleScript, and that it seems to have the same function as number.
But it's highlighted purple instead of blue.
Are they exactly the same, or are they different? And which one would you suggest using?
Although both expressions have the same result, there is a difference.
number of — which is a synonym for count — evaluates the number of items when it's called.
length is a property of the class list which implies that the class maintains the value constantly and there is no further evaluation when it's called.
I'd prefer the latter.

Visualizing and Solving recursive questions without a computer

Say I have something like this (predict the output):
void abc(char *s) {
    if (s[0] == '\0')
        return;
    abc(s + 1);
    abc(s + 1);
    printf("%c ", s[0]);
}
It's not tough to solve, but I take too much time doing it, and I have to redo such questions 2-3 times because I lose track of the recursion and the values of variables (especially when there are 2-3 such recursive statements).
Is there any good method to use when one has to solve such questions?
The basic technique is to first start with a small input. Then try with one larger. Then try with one larger than that. For recursive functions, a pattern should emerge that lets you predict what the next one will look like given you know what the previous one looked like.
So, let's start with an empty string. Easy, nothing is printed.
input: ""
output:
Next is a string of length one. Almost as easy, the two recursive calls each do nothing (empty string case), and then the string's character is printed.
input: "z"
output: z
Next is a string of length two. Each of the recursive calls ends up printing the second character (string of length one case), and then the first character is printed.
input: "yz"
output: zzy
So, let's try to predict what will happen for the string of length three case. What will happen is that the substring that excludes the first character gets worked on twice, then the first character is printed. That substring is the string of length two case. So:
input: "xyz"
output: zzyzzyx
So, it should be clear now how to derive the next output sequence given the current output sequence.
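The pattern can be written down as a small Python sketch. The function name abc_output is hypothetical, and like the outputs listed above it leaves out the space that the original printf prints after each character.

```python
def abc_output(s):
    """Return what abc(s) would print, without the separating spaces."""
    if not s:
        return ""
    rest = abc_output(s[1:])   # output for the string without its first character
    return rest + rest + s[0]  # that substring is processed twice, then s[0] is printed

for text in ["", "z", "yz", "xyz"]:
    print(repr(text), "->", abc_output(text))
```

Each extra character doubles the previous output and appends one character, so a string of length n prints 2^n - 1 characters.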
The easiest examples for analyzing recursion are the Fibonacci and factorial functions.
They will help you analyze recursive functions in a better manner. Whenever you lose track of a recursive function, just recall these examples.
Take a stack of index cards of an appropriate size. Start tracing the initial call to the recursive function. When you get another call start a new index card and either put it in front of the first card or behind it (as appropriate). Sooner or later you will (unless you are tracing an infinite recursion) trace the execution of a call which does not make a recursive call, in which case copy the return value back to the card you came from.
It's probably a good idea to include 'go to card X' and 'came from card Y' on your cards.
In complicated situations you might find it useful to create more than one stack of cards to trace your function calls, oh why the heck, why not call them call stacks.
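As a rough Python analogue of the index cards, the sketch below records one indented line per call, so the nesting depth plays the role of the card's position in the stack. The function trace_abc and its log format are assumptions made for illustration, applied to the abc function from the question.

```python
def trace_abc(s, depth=0, log=None):
    """Record each call to abc and each print, indented by call depth."""
    log = [] if log is None else log
    log.append("  " * depth + f"call abc({s!r})")
    if s:
        trace_abc(s[1:], depth + 1, log)   # first recursive call
        trace_abc(s[1:], depth + 1, log)   # second recursive call
        log.append("  " * depth + f"print {s[0]!r}")
    return log

print("\n".join(trace_abc("yz")))
```

Reading the log top to bottom corresponds to flipping through the cards in the order they were created.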

Why doesn't the following function work for recursive appending to a list in swi prolog?

I have a list L and I need to split each element into a separate list and again append them together. This is the code I made for the same.
split([],[]).
split([H|T],Ls):-split(T,Ls),splist(H,[]).
make(Val,[H1|List],[H1|Res]):- make(Val,List,Res). make(Val, List,[Val|List]).
splist(H,L2):- make(Sum,[],L1),append(L1,L2,NewL).
When I use this code, each element of L is passed recursively from split() to splist() and made into a list L1 with a single element by make(). I need append to keep concatenating L1 and L2, but it does not do so.
For example, I have L=[1,2,3]. Now I need the following process to be done.
H=1, L1=[1] and L2=[1]. Next H=2, L1=[2] and L2=[1,2]. Next H=3, L1=[3] and L2=[1,2,3].
I need the output as mentioned above, but this is what my code does.
H=1, L1=[1], and L2= [1]. Next H=2, L1=[2] and L2=[2]. Next H=3, L1=[3] and L2=[3].
I can't make any sense out of your code. make definition is incomplete. As is, it does nothing and then fails.
Your split is equivalent to split(X,[]):- reverse(X,R), maplist(spl([]),R). with spl(B,A):-splist(A,B)., i.e. it tries splist(H,[]) for each element H of the input list X, backwards, to see whether it fails or not - that's its only outcome, as the arguments are fixed - H and [].
Naming your predicates split and splist is a very bad idea - we humans are wired to distinguish words from their start, and the only different letter in these names is hidden way far near the end. IOW the two names are very similar, and it is very easy to misread and mistype them.
Lastly, for splist(H,L2):- make(Sum,[],L1),append(L1,L2,NewL)., since make can only fail, so will splist. But even if make were to produce something in L1 out of thin air - Sum starts out uninstantiated, mind you - what does it say about L2? That it can be appended to the list L1? Any list can be appended to any other, so saying that is saying nothing.
?? :)

Prefix to Infix Conversion Algorithm with figure

After some Google searching I found it!
Prefix to Infix
This algorithm is a non-tail recursive method.
The reversed input string is completely pushed into a stack.
prefixToInfix(stack)
1) IF stack is not empty
   a. temp --> pop the stack
   b. IF temp is an operator
      i. Write an opening parenthesis to output
      ii. prefixToInfix(stack)
      iii. Write temp to output
      iv. prefixToInfix(stack)
      v. Write a closing parenthesis to output
   c. ELSE IF temp is a space --> prefixToInfix(stack)
   d. ELSE
      i. Write temp to output
      ii. IF stack.top NOT EQUAL to space --> prefixToInfix(stack)
when the Stack top is
F(ABC)
and we enter the algorithm, "A" is written to the output as it was currently the value of
temp=A (say)
Now how do I get '-' in the output column? According to the algorithm, the next temp value will be "B", which is popped from the stack after the last recursive call.
How does the diagram show the output "((A-" ...?
Where am I making an incorrect assumption?
Could someone take the trouble to explain it?
I don't quite understand your question.
If your stack is ABC, F(ABC) pops the A, goes into branch d.i. and writes an A to output, goes on into d.ii. and performs F(BC), which will, in the end, write both the B and C to output.
If you want your output to look like it does on the diagram, you'll need your stack to be * - A B C (note the spaces between every element!).
Edit:
(As an aside: all this is easier stepped through than described, so I suggest you write the algorithm as a program and start it in your choice of debugger.)
OK, so you have stored the first * in temp (a), written a ( (b.i.), and called the algorithm with the remaining stack (b.ii.). This throws away a blank, then you store a - in the next branch's temp, write a (, and call the algorithm with the remaining stack. At some point, you end up in d.ii., you have just written an A to output, giving you
((A
and the remaining stack is
_B_C
with a space on top and another space between B and C.
So now d.ii. finds the space and does nothing more: this control branch is done, and we go back to where we came from, which was b.ii. in your - control branch. You write the - to output at b.iii., call the algorithm with the remaining stack (_B_C) at b.iv., and there you go, writing the B, a ), the * and C and the last ).
Just remember where you came from, so you know where to jump back after your current recursion is done.
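To step through it yourself, here is a minimal Python transcription of the pseudocode. The names prefix_to_infix and convert, the use of a list as the stack, and the operator set "+-*/" are assumptions for this sketch; the branch labels in the comments refer to the pseudocode in the question.

```python
def prefix_to_infix(stack, out):
    """Direct transcription of the pseudocode; stack.pop() removes the top."""
    if stack:                                  # 1)
        temp = stack.pop()                     # a.
        if temp in "+-*/":                     # b. temp is an operator
            out.append("(")                    # b.i.
            prefix_to_infix(stack, out)        # b.ii.
            out.append(temp)                   # b.iii.
            prefix_to_infix(stack, out)        # b.iv.
            out.append(")")                    # b.v.
        elif temp == " ":                      # c. skip the space
            prefix_to_infix(stack, out)
        else:                                  # d. operand
            out.append(temp)                   # d.i.
            if stack and stack[-1] != " ":     # d.ii.
                prefix_to_infix(stack, out)

def convert(prefix):
    # Push the reversed input so that the first character ends up on top.
    stack = list(reversed(prefix))
    out = []
    prefix_to_infix(stack, out)
    return "".join(out)

print(convert("* - A B C"))   # ((A-B)*C)
print(convert("*-ABC"))       # ((ABC-)*) : without spaces the operands run together
```

Setting a breakpoint on the first line of prefix_to_infix shows the same returns to b.ii. and b.iv. that are described above.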

How to get rid of unnecessary parentheses in mathematical expression

Hi I was wondering if there is any known way to get rid of unnecessary parentheses in mathematical formula. The reason I am asking this question is that I have to minimize such formula length
if((-if(([V].[6432])=0;0;(([V].[6432])-([V].[6445]))*(((([V].[6443]))/1000*([V].[6448])
+(([V].[6443]))*([V].[6449])+([V].[6450]))*(1-([V].[6446])))))=0;([V].[6428])*
((((([V].[6443]))/1000*([V].[6445])*([V].[6448])+(([V].[6443]))*([V].[6445])*
([V].[6449])+([V].[6445])*([V].[6450])))*(1-([V].[6446])));
it is basically part of sql select statement. It cannot surpass 255 characters and I cannot modify the code that produces this formula (basically a black box ;) )
As you see many parentheses are useless. Not mentioning the fact that:
((a) * (b)) + (c) = a * b + c
So I want to keep the order of operations Parenthesis, Multiply/Divide, Add/Subtract.
I'm working in VB, but a solution in any language will be fine.
Edit
I found a question about the opposite problem (adding parentheses to an expression).
I really thought that this could be accomplished without heavy parsing. But it seems that some parser that goes through the expression and saves it in an expression tree is unavoidable.
If you are interested in removing the unnecessary parentheses from your expression, the generic solution is to parse the text and build the associated expression tree.
Then, from this tree, you can regenerate the text without unnecessary parentheses by applying some rules:
if the node is a "+", no parentheses are required
if the node is a "*", parentheses are required around the left (right) child only if that child is a "+"
the same applies for "/", except that its right child also needs parentheses when it is a "*" or "/", because division is not associative; "-" needs the same care for a right child that is a "+" or "-"
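A minimal Python sketch of these rules, assuming the tree has already been built and is represented as nested tuples (op, left, right) with plain strings as leaves. The representation and the names PREC and render are hypothetical, and only the four binary operators are covered.

```python
PREC = {"+": 1, "-": 1, "*": 2, "/": 2}

def render(node):
    """Print an expression tree with only the parentheses that are needed."""
    if isinstance(node, str):          # leaf: a number or a column reference
        return node
    op, left, right = node

    def child(sub, is_right):
        text = render(sub)
        if isinstance(sub, str):
            return text
        sub_prec = PREC[sub[0]]
        # Parenthesize a weaker child, or an equally strong right child
        # of "-" or "/", since those two operators are not associative.
        if sub_prec < PREC[op] or (is_right and sub_prec == PREC[op] and op in "-/"):
            return f"({text})"
        return text

    return f"{child(left, False)}{op}{child(right, True)}"

# ((a) * (b)) + (c) from the question, already parsed into a tree
print(render(("+", ("*", "a", "b"), "c")))   # a*b+c
print(render(("*", ("+", "a", "b"), "c")))   # (a+b)*c
```

Building the tree from the original formula is the larger part of the work and is not shown here.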
But if your problem is just to deal with these 255 characters, you can probably just use intermediate variables to store intermediate results
T1 = (([V].[6432])-([V].[6445]))*(((([V].[6443]))/1000*([V].[6448])+(([V].[6443]))*([V].[6449])+([V].[6450]))*(1-([V].[6446])))
T2 = etc...
You could strip the simplest cases:
([V].[6432]) and (([V].[6443]))
Becomes
v.[6432]
You shouldn't need the [] around the table name or its alias.
You could shorten it further if you can alias the columns:
select v.[6432] as a, v.[6443] as b, ....
Or even put all the tables being queried into a single subquery - then you wouldn't need the table prefix:
if((-if(a=0;0;(a-b)*((c/1000*d
+c*e+f)*(1-g))))=0;h*
(((c/1000*b*d+c*b*
e+b*f))*(1-g));
select [V].[6432] as a, [V].[6445] as b, [V].[6443] as c, [V].[6448] as d,
[V].[6449] as e, [V].[6450] as f,[V].[6446] as g, [V].[6428] as h ...
Obviously this is all a bit pseudo-code, but it should help you simplify the full statement.
I know this thread is really old, but it is still searchable from Google.
I'm writing a TI-83 plus calculator program that addresses similar issues. In my case, I'm trying to actually solve the equation for a specific variable in number, but it may still relate to your problem, although I'm using an array, so it might be easier for me to pick out specific values...
It's not quite done, but it does get rid of the vast majority of parentheses with (I think), a somewhat elegant solution.
What I do is scan through the equation/function/whatever, keeping track of each opening parenthesis "(" until I find a closing parenthesis ")", at which point I can be assured that I won't run into any more deeply nested parentheses.
y=((3x + (2))) would show the (2) first, and then the (3x + (2)), and then the ((3x + (2))).
What it does then is check the values immediately before and after each pair of parentheses. In the case above, it would return + and ). Each of these is assigned a number value. Between the two of them, the higher is used. If no operators are found (*,/,+,^, or -) I default to a value of 0.
Next I scan through the inside of the parentheses. I use a similar numbering system, although in this case I use the lowest value found, not the highest. I default to a value of 5 if nothing is found, as would be in the case above.
The idea is that you can assign a number to the importance of the parentheses by subtracting the two values. If you have something like a ^ on the outside of the parentheses
(2+3)^5
those parentheses are potentially very important, and would be given a high value, (in my program I use 5 for ^).
It is possible however that the inside operators would render the parentheses very unimportant,
(2)^5
where nothing is found. In that case the inside would be assigned a value of 5. By subtracting the two values, you can then determine whether or not a set of parentheses is necessary simply by checking whether the resulting number is greater than 0. In the case of (2+3)^5, a ^ would give a value of 5, and a + would give a value of 1. The resulting number would be 4, which would indicate that the parentheses are in fact needed.
In the case of (2)^5 you would have an inner value of 5 and an outer value of 5, resulting
in a final value of 0, showing that the parentheses are unimportant, and can be removed.
The downside to this is that, (at least on the TI-83) scanning through the equation so many times is ridiculously slow. But if speed isn't an issue...
Don't know if that will help at all, I might be completely off topic. Hope you got everything up and working.
I'm pretty sure that in order to determine which parentheses are unnecessary, you have to evaluate the expressions within them. Because you can nest parentheses, this is the sort of recursive problem that a regular expression can only address in a shallow manner, and most likely with incorrect results. If you're already evaluating the expression, maybe you'd like to simplify the formula if possible. This also gets kind of tricky, and some approaches use techniques that are also seen in machine learning, such as you might see in the following paper: http://portal.acm.org/citation.cfm?id=1005298
If your variable names don't change significantly from one query to the next, you could try a series of replace() commands, e.g.
X=replace([QryString],"(([V].[6443]))","[V].[6443]")
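The same idea can be generalized with a regular expression instead of one replace() per column. The Python sketch below only strips parentheses that wrap a single [V].[number] reference; the pattern, the name strip_simple_parens, and the assumption that every reference has exactly this form are mine, and a parenthesis directly after a letter or digit is left alone so that function calls such as if(...) keep theirs.

```python
import re

# "(" not preceded by a word character, then one [V].[digits] reference, then ")"
REF_IN_PARENS = re.compile(r"(?<!\w)\((\[V\]\.\[\d+\])\)")

def strip_simple_parens(formula):
    """Repeatedly remove parentheses that enclose only a single column reference."""
    while True:
        shorter = REF_IN_PARENS.sub(r"\1", formula)
        if shorter == formula:
            return formula
        formula = shorter

print(strip_simple_parens("(([V].[6443]))/1000*([V].[6448])"))
# [V].[6443]/1000*[V].[6448]
```

This handles only the shallow cases; parentheses around larger sub-expressions still need a real parser.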
Also, why can't it surpass 255 characters? If you are storing this as a string field in an Access table, then you could try putting half the expression in 1 field and the second half in another.
You could also try parsing your expression using ANTLR, yacc or similar to create a parse tree. These trees usually optimize parentheses away. Then you would just have to recreate the expression from the tree (without the unnecessary parentheses, obviously).
It might take you more than a few hours to get this working though. But expression parsing is usually the first example on generic parsing, so you might be able to take a sample and modify it to your needs.

Resources