Context
I started teaching myself lambda calculus last night and I am trying to determine if what I understand so far is correct.
Understanding
SKK is equivalent to the Identity combinator, I.
Where L stands for lambda:
S = LxLyLz((xz)(yz))
K = LxLy(x)
K essentially takes the next 2 (lambda) terms and gives back the first of those. S seems a little more complicated in the untyped lambda calculus.
My Interpretation
SK(any-lambda-term) is also equivalent to I.
I.e. the application of the application of S to K to Any-lambda-term is equivalent to the Identity combinator:
((S K)(Any)) = I = S K K = ((S K)(K))
I am using the "left-association" convention in my notation above, if that helps (I tried to make that explicit in the 4th term above with parentheses, and everything I have read so far seems to use this convention).
Reasoning
S K = LyLz((K z)(y z))
The next lambda term will be substituted for y; let that term be Y.
S K Y = Lz((K z)(Y z))
(Y z) is the application of Y to z, also a lambda term.
(K z) is the constant function that returns z; given any further input term, here (Y z), it discards it and gives back z.
Is my interpretation correct? If not, can you provide an explanation? I would greatly appreciate it, particularly if a sort of order of operations can be explained; I regularly find myself confused about when to evaluate. Perhaps that will be refined with practice.
Your intuition is correct, but an intuition proves nothing (alas...)
So, how can we prove your statement? Simply by showing that SKK and SKS have the same behaviour. "Behaviour" is an informal notion, which is formally captured by "semantics": if SKK and SKS are equal, then they should always reduce to the same term, according to the semantics of the SKI calculus.
Now, there is a deeper question: what is the SKI calculus? Actually, there is more than one way to answer that. What you implicitly do in your question is express SKI in terms of λ-terms and rely on the semantics of the λ-calculus. This is absolutely correct. Another way would have been to define the SKI semantics directly. For instance, if you look at the Wikipedia page, you can see that the semantics is not defined with lambda terms (and the fact that it corresponds to lambda terms is a nice and expected side effect). In the rest of this answer, I'll take the same approach as you do and convert SKI terms into λ-terms. A good exercise for you is to redo the proof using the proper SKI semantics.
So, let's formalize your question: is it the case that, for any SKI term t, SKKt = SKSt? Well... let's see.
SKKt is encoded as (λx.λy.λz.(xz)(yz))(λx.λy.x)(λx.λy.x)t in the λ-calculus. We now just have to reduce it to a normal form (I detail every step, each time reducing the leftmost redex, even though it is not the fastest strategy):
(λx.λy.λz.(xz)(yz))(λx.λy.x)(λx.λy.x)t
= (λy.λz.((λx.λy.x)z)(yz))(λx.λy.x)t
= (λz.((λx.λy.x)z)((λx.λy.x)z))t
= ((λx.λy.x)t)((λx.λy.x)t)
= (λy.t)((λx.λy.x)t)
= t
So, the encoding of SKKt in the λ calculus reduces to t (as a sidenote, we just proved that SKK is equivalent to I here). To conclude our proof, we have to reduce SKSt and see whether it also reduces to t.
SKSt is encoded as (λx.λy.λz.(xz)(yz))(λx.λy.x)(λx.λy.λz.(xz)(yz))t. Let's reduce it (in less detail this time):
(λx.λy.λz.(xz)(yz))(λx.λy.x)(λx.λy.λz.(xz)(yz))t
= ((λx.λy.x) t)((λx.λy.λz.(xz)(yz)) t)
= (λy.t)((λx.λy.λz.(xz)(yz)) t)
= t
Hurrah! It also reduces to t, so indeed SKS and SKK are equivalent. It seems that the third combinator is not important: as soon as you have SK?, it is equivalent to I. As an exercise, you can easily prove it (same strategy; if it is the case, then for any terms t and s, SKts = s). As mentioned above, another good exercise is to redo the proof without using the λ semantics, but the proper SKI semantics.
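For reference, the generic claim is a two-step reduction if you work directly in the SKI semantics (where Sxyz reduces to xz(yz) and Kxy reduces to x): for any terms t and s,
S K t s → (K s)(t s) → s
which is exactly the shape of the two λ-calculus computations above.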
Finally, my answer should raise a new question for you: we have two semantics, one that encodes SKI terms into λ-terms, and one that does not. The question you may have is: are the two semantics equivalent? What does it even mean for two semantics to be equivalent? If you are only starting to teach yourself the λ-calculus, it may be a bit early to try to answer those questions right now, but you can keep them in a corner of your head for when you get more familiar with formal languages.
I have a relation f defined as f: A -> B × C. I would like to write a first-order formula to constrain this relation to be a bijective function from A to B × C.
To be more precise, I would like the first-order counterpart of the following formulae (actually the conjunction of the three):
∀a : A, ∃! bc : B × C, f(a) = bc       -- f is a function
∀a1,a2 : A, f(a1) = f(a2) → a1 = a2    -- f is injective
∀bc : B × C, ∃a : A, f(a) = bc         -- f is surjective
As you can see, the above formulae are in higher-order logic, as I quantified over the relations. What is the first-order logic equivalent of these formulae, if that is at all possible?
PS:
This is a more general (math) question rather than one specific to any theorem prover, but to get help from these communities -- as I think there is a mature understanding of mathematics here -- I put the theorem-provers tag on this question.
(Update: Someone's unhappy with my answer, and SO gets me fired up in general, so I say what I want here, and will probably delete it later, I suppose.)
I understand that SO is not a place for debates and soapboxes. On the other hand, the OP, qartal, whom I assume is the unhappy one, wants to apply the answer from math.stackexchange.com, where ZFC sets dominate, to a question here which is tagged, at this moment, with isabelle and logic.
First, notation is important, and sloppy notation can result in a question that's ambiguous to the point of being meaningless.
Second, having a B.S. in math, I have full appreciation for the logic of ZFC sets, so I have full appreciation for math.stackexchange.com.
I make the argument here that the answer given on math.stackexchange.com, linked to below, is wrong in the context of Isabelle/HOL. (First hmmm, me making claims under ill-defined circumstances can be annoying to people.)
If I'm wrong, and someone teaches me something, the situation here will be redeemed.
The answerer says this:
First of all in logic B x C is just another set.
There's not just one logic. My immediate reaction when I see the symbol × is to think of a type, not a set. Consider this, which kind of looks like your f: A -> B × C:
definition foo :: "nat => int × real" where "foo x = (int x, real x)"
I guess I should be fluent in going back and forth between sets and types, and in reading minds, but I did learn something by entering this term:
term "B × C" (* shows it's of type "('a × 'b) set" *)
Feeling paranoid, I did this to see if I had fallen into a major gotcha:
term "f : A -> B × C"
It gives a syntax error. Here I am, getting all pedantic, and our discussion is ill-defined because the notation is ill-defined.
The crux: the formula in the other answer is not first-order in this context
(Another hmmm, after writing what I say below, I'm full circle. Saying things about stuff when the context of the stuff is ill-defined.)
Context is everything. The context of the other site is generally ZFC sets. Here, it's HOL. That answerer says to assume these for his formula, which I give below:
Ax is true iff x∈A
Bx is true iff x∈B×C
Rxy is true iff f(x)=y
Syntax. No one has defined it here, but the tag here is isabelle, so I take it to mean that I can substitute the left-hand side of the iff for the right-hand side.
Also, the expression x ∈ A is what would be in the formula in a typical set theory textbook, not Rxy. Therefore, for the answerer's formula to have meaning, I can rightfully insert f(x) = y into it.
This then is why I did a lot of hedging in my first answer. The variable f cannot be in the formula. If it's in the formula, then it's a free variable which is implicitly quantified. Here's the formula in Isar syntax:
term "∀x. (Ax --> (∃y. By ∧ Rxy ∧ (∀z. (Bz ∧ Rxz) --> y = z)))"
Here it is with the substitutions:
∀x. (x∈A --> (∃y. y∈B×C ∧ f(x)=y ∧ (∀z. (z∈B×C ∧ f(x)=z) --> y = z)))
In HOL, f(x) is written f x, and so f is implicitly, universally quantified. If that is the case, then the formula is not first-order.
Really, I should dig deep to recall what I was taught, that f(x)=y means:
(x,f(x)) = (x,y), which means we have to have (x,y) ∈ A × (B×C)
which finally gets me:
∀x. (x∈A -->
(∃y. y∈B×C ∧ (x,y)∈A×(B×C) ∧ (∀z. (z∈B×C ∧ (x,z)∈A×(B×C)) --> y = z)))
Finally, I guess it turns out that in the context of math.stackexchange.com, it's 100% on.
Am I the only one who feels compulsive about questioning what this means in the context of Isabelle/HOL? I don't accept that everything here is defined well enough to show that it's first order.
Really, qartal, your notation should be specific to a particular logic.
First answer
With Isabelle, I answer the question based on my interpretation of your
f: A -> B x C, which I take as a ZFC set, in particular a subset of the
Cartesian product A x (B x C)
You're sort of mixing notation from the two logics, that of ZFC
sets and that of HOL. Consequently, I might be off on what I think you're
asking.
You don't define your relation, so I keep things simple.
I define a simple ZFC function, and prove the first
part of your first condition, that f is a function. The second part would be
proving uniqueness. It can be seen that f satisfies that, so once a
formula for uniqueness is stated correctly, auto might easily prove it.
Please notice that the
theorem is a first-order formula. The characters ! and ? are ASCII
equivalents for \<forall> and \<exists>.
(Clarifications must abound when
working with HOL. It's first-order logic if the variables are atomic. In this
case, the variables are of numeric type. The basic concept is there. That
I'm wrong in some detail is highly likely.)
definition "A = {1,2}"
definition "B = A"
definition "C = A"
definition "f = {(1,(1,1)), (2,(1,1))}"
theorem
"!a. a \<in> A --> (? z. z \<in> (B × C) & (a,z) \<in> f)"
by(auto simp add: A_def B_def C_def f_def)
(To completely give you an example of what you asked for, I would have to redefine my function so it's bijective. Little examples can take a ton of work.)
That's the basic idea, and the rest of proving that f is a function will
follow that basic pattern.
If there's a problem, it's that your f is a ZFC set function/relation, and
the logical infrastructure of Isabelle/HOL is set up for functions as a type.
Functions as ordered pairs, ZFC style, can be formalized in Isabelle/HOL, but
it hasn't been done in a reasonably complete way.
Generalizing it all is where the work would be. For a particular relation, as
I defined above, I can limit myself to first-order formulas, if I ignore that
the foundation, Isabelle/HOL, is, of course, higher-order logic.
So I have the following working code in Prolog that produces the factorial of a given value of A:
factorial(0,1).
factorial(A,B) :- A>0, C is A-1, factorial(C,D), B is A*D.
I am looking for an explanation as to how this code works, i.e., what exactly happens when you ask the query factorial(4, Answer)?
Firstly,
factorial(0, 1).
I know the above is the "base case" of the recursive definition. What I am not sure of why/how it is the base case. My guess is that factorial(0, 1) inserts some structure containing (0, 1) as a member of "factorial". If so, what does the structure look like? I know if we say something like "rainy(seattle).", this means that Seattle is rainy. But "factorial(0, 1)"... 0, 1 is factorial? I realize it means factorial of 0 is 1, but how is this being used in the long run? (Writing this is helping me understand more as I go along, but I would like some feedback to make sure my thinking is correct.)
factorial(A,B) :- A>0, C is A-1, factorial(C,D), B is A*D.
Now, what exactly does the above code mean? How should I read it?
I am reading it as: factorial of (A, B) is true if A>0, C is A-1, factorial(C, D), B is A*D. That does not sound quite right to me... Is it?
"A > 0". So if A is equal to 0, what happens? It must not return at this point, or else the base case would never be used. So my guess is that A > 0 returns false, but the other functions are executed one last time. Did recursion stop because it reached the base case, or because A was not greater than 0? Or a combination of both? At what point is the base case used?
I guess that boils down to the question: What is the purpose of having both a base case and A > 0?
Sorry for the badly formed questions, thank you.
EDIT: In fact, I removed "A > 0" from the procedure and the code still works. So I guess my questions were not stupid at least. (And that code was taken from a tutorial.)
It is counterproductive to think of Prolog facts and rules in terms of data structures. When you write factorial(0, 1). you assert a fact to the Prolog interpreter that is assumed to be universally true. With this fact alone Prolog can answer questions of three types:
What is the factorial of 0? (i.e. factorial(0, X); the answer is X=1)
A factorial of what number is 1? (i.e. factorial(X,1); the answer is X=0)
Is it true that a factorial of 0 is 1? (i.e. factorial(0,1); the answer is "Yes")
As far as the rest of your Prolog program is concerned, only the first question is important. That is the question that the second clause of your factorial/2 rule will be asking at the end of evaluating a factorial.
The second rule uses the comma operator, which is Prolog's way of saying "and". Your interpretation can be rewritten in terms of variables A and B like this:
B is a factorial of A when A>0, and C is set to A-1, and D is set to the factorial of C, and B is set to A times D
This rule covers all As above zero. The reference to factorial(C,D) will use the same rule again and again, until C reaches zero. That is when this rule stops being applicable, so Prolog grabs the "base case" rule and uses 1 as its output. At this point, the chain of factorial(C, D) evaluations starts unwinding, until it goes all the way back to the initial invocation of the rule. This is when Prolog computes the final answer, and factorial/2 returns "Yes" and produces the desired output value.
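Concretely, here is roughly how the chain looks for factorial(4, B) (a hand-written sketch of the calls, not actual interpreter output):
factorial(4, B)  :  4 > 0, factorial(3, D3), B  is 4*D3
factorial(3, D3) :  3 > 0, factorial(2, D2), D3 is 3*D2
factorial(2, D2) :  2 > 0, factorial(1, D1), D2 is 2*D1
factorial(1, D1) :  1 > 0, factorial(0, D0), D1 is 1*D0
factorial(0, D0) :  matches the fact factorial(0, 1), so D0 = 1
% unwinding: D1 is 1*1 = 1, D2 is 2*1 = 2, D3 is 3*2 = 6, B is 4*6 = 24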
In response to your edit: removing the A>0 is safe only if you stop at the first result. In general, you can ask Prolog to find more results. That is where factorial/2 with A>0 removed fails spectacularly: on backtracking it starts going down the invocation chain of the second clause with negative numbers, a chain of calls that ends in numeric overflow or stack overflow, whichever comes first.
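You can see this at the top level (a sketch; the exact prompt and output vary between Prolog systems):
?- factorial(4, F).
F = 24 ;    % typing ; asks for another solution
% with A > 0 removed, the second clause now retries with 0, -1, -2, ...
% and the query never comes back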
If you come from a procedural language background, the following C++ code might help. It mirrors pretty accurately the way the Prolog code executes (at least for the common case that A is given and B is uninstantiated):
bool fac(int a, int &b)
{
    int c, d;
    return
        (a == 0 && (b = 1, true))      // factorial(0, 1).
        ||
        (a > 0
         && (c = a - 1, true)          // C is A-1,
         && fac(c, d)                  // factorial(C, D),
         && (b = a * d, true));        // B is A*D.
}
The Prolog comma operates like the sequential &&, and multiple clauses like a sequential ||.
My mental model for how Prolog works is a tree traversal.
The facts and predicates in a Prolog database form a forest of trees. When you ask the Prolog engine to evaluate a predicate:
?- factorial(6,N).
the Prolog engine looks for the tree rooted at the specified functor and arity (factorial/2 in this case). The Prolog engine then performs a depth-first traversal of that tree, trying to find a solution using unification and pattern matching. Facts are evaluated as they are; for rules, the right-hand side of the :- operator is evaluated, walking further into the tree, guided by the various logical operators.
Evaluation stops with the first successful evaluation of a leaf node in the tree, with the Prolog engine remembering its state in the tree traversal. On backtracking, the tree traversal continues from where it left off. Execution is finally complete when the tree traversal is completed and there are no more paths to follow.
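A toy database makes the traversal visible (hypothetical facts; the interaction is what a typical top level shows):
likes(mary, wine).
likes(mary, food).

?- likes(mary, X).
X = wine ;    % first leaf reached by the depth-first walk
X = food.     % backtracking resumes the traversal where it left off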
That's why Prolog is a declarative language rather than an imperative one: you describe what constitutes truth (or falsity) and let the Prolog engine figure out how to get there.
I am currently trying to learn some basic prolog. As I learn I want to stay away from if else statements to really understand the language. I am having trouble doing this though. I have a simple function that looks like this:
if a > b then 1
else if
a == b then c
else
-1;;
This is just very simple logic that I want to convert into prolog.
So here where I get very confused. I want to first check if a > b and if so output 1. Would I simply just do:
sample(A,B,C,O):-
A > B, 1,
A < B, -1,
0.
This is what I came up with, O being the output, but I do not understand how to make the 1 the output. Any thoughts to help me better understand this?
After going at it some more I came up with this but it does not seem to be correct:
Greaterthan(A,B,1.0).
Lessthan(A,B,-1.0).
Equal(A,B,C).
Sample(A,B,C,What):-
Greaterthan(A,B,1.0),
Lessthan(A,B,-1.0),
Equal(A,B,C).
Am I headed down the correct track?
If you really want to try to understand the language, I recommend using CapelliC's first suggestion:
sample(A, B, _, 1) :- A > B.
sample(A, B, C, C) :- A == B.
sample(A, B, _, -1) :- A < B.
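With these three clauses loaded, queries behave the way the original if/else intended (typical top-level output):
?- sample(3, 2, 99, O).
O = 1.

?- sample(2, 2, 99, O).
O = 99.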
I disagree with CapelliC that you should use the if/then/else syntax, because that way (in my experience) it's easy to fall into the trap of translating familiar constructs one by one and end up doing procedural programming in Prolog, without fully grokking the language itself.
TL;DR: Don't.
You are trying to translate constructs you know from other programming languages to Prolog. With the assumption that learning Prolog means essentially mapping one construct after the other into Prolog. After all, if all constructs have been mapped, you will be able to encode any program into Prolog.
However, by doing that you are missing the essence of Prolog altogether.
Prolog consists of a pure, monotonic core and some procedural adornments. If you want to understand what distinguishes Prolog so much from other programming languages you really should study its core first. And that means, you should ignore those other parts. You have only so much attention span, and if you waste your time with going through all of these non-monotonic, even procedural constructs, chances are that you will miss its essence.
So, why is a general if-then-else (as it has been proposed by several answers) such a problematic construct? There are several reasons:
In the general case, it breaks monotonicity. In pure monotonic Prolog programs, adding a new fact will increase the set of true statements you can derive from it. So everything that was true before adding the fact, will be true thereafter. It is this property which permits one to reason very effectively over programs. However, note that monotonicity means that you cannot model every situation you might want to model. Think of a predicate childless/1 that should succeed if a person does not have a child. And let's assume that childless(john). is true. Now, if you add a new fact about john being the parent of some child, it will no longer hold that childless(john) is true. So there are situations that inherently demand some non-monotonic constructs. But there are many situations that can be modeled in the monotonic part. Stick to those first.
if-then-else easily leads to hard-to-read nesting. Just look at your if-then-else-program and try to answer "When will the result be -1"? The answer is: "If neither a > b is true nor a == b is true". Lengthy, isn't it? So the people who will maintain, revise and debug your program will have to "pay".
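Coming back to the childless/1 example from the first point: the usual way to write it uses negation as failure, which is exactly such a non-monotonic construct (a sketch; parent/2 is a hypothetical predicate):
% childless(P) succeeds if no parent(P, _) fact is currently known.
childless(P) :- \+ parent(P, _).

parent(mary, tom).

% Today, ?- childless(john). succeeds. Add the fact parent(john, bob).
% and the very same query fails: the set of derivable truths did not
% grow when we added a fact.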
From your example it is not clear what kinds of arguments you are considering. Should you be happy with integers, consider using library(clpfd), as it is available in SICStus, SWI and YAP:
sample(A,B,_,1) :- A #> B.
sample(A,B,C,C) :- A #= B.
sample(A,B,_,-1) :- A #< B.
This definition is now so general, you might even ask
When will -1 be returned?
?- sample(A,B,C,-1).
A = B, C = -1, B in inf..sup
; A#=<B+ -1.
So there are two possibilities.
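Ordinary ground queries still work as expected (modulo top-level details):
?- sample(3, 2, _, O).
O = 1.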
Here are some addenda to CapelliC's helpful answer:
When starting out, it is sometimes easy to mistakenly conceive of Prolog predicates functionally. They are either not functions at all, or they are n-ary functions which only ever yield true or false as outputs. However, I often find it helpful to forget about functions and just think of predicates relationally. When we define a predicate p/n, we're describing a relation between n elements, and we've named the relation p.
In your case, it sounds like we're defining conditions on an ordered triplet, <A, B, C>, where the value of C depends upon the relation between A and B. There are three relevant relationships between A and B (here, since we are dealing with a simple case, these three are exhaustive for the kind of relationship in question), and we can simply describe what value C should have in the three cases.
sample(A, B, 1.0) :-
A > B.
sample(A, B, -1.0) :-
A < B.
sample(A, B, some_value) :-
A =:= B.
Notice that I have used the arithmetical operator =:=/2. This is more specific than ==/2, and it lets us compare mathematical expressions for numerical equality. ==/2 checks for equivalence of terms: a == a, 2 == 2 and 5+7 == 5+7 are all true, because equivalent terms stand on the left and right of the operator. But 5+7 == 7+5, 5+7 == 12 and A == a are all false, since it is the terms themselves that are being compared: in the first case the arguments are swapped, in the second we're comparing +(5,7) with an integer, and in the third we're comparing a free variable with an atom. The following, however, are true: 2 =:= 2, 5 + 7 =:= 12, 2 + 2 =:= 4 + 0. This lets us call the predicate with evaluable mathematical expressions, rather than just integers or floats. We can then pose queries such as
?- sample(2^3, 2+2+2, X).
X = 1.0
?- sample(2*3, 2+2+2, X).
X = some_value.
CapelliC points out that when we write multiple clauses for a predicate, we are expressing a disjunction. He is also careful to note that this particular example works as a plain disjunction only because the alternatives are by nature mutually exclusive. He shows how to get the same exclusivity entailed by the structure of your first "if ... then ... else if ... else ..." by intervening in the resolution procedure with cuts. In fact, if you consult the SWI-Prolog docs for the conditional ->/2, you'll see the semantics of ->/2 explained in terms of cuts, !, and disjunctions, ;.
I come down midway between CapelliC and SQB in prescribing use of the control predicates. I think you are wise to stick with defining such things with separate clauses while you are still learning the basics of the syntax. However, ->/2 is just another predicate with some syntax sugar, so you oughtn't be afraid of it. Once you start thinking relationally instead of functionally or imperatively, you might find that ->/2 is a very nice tool for giving concise expression to patterns of relation. I would format my clause using the control predicates thus:
sample(A, B, Out) :-
( A > B -> Out = 1.0
; A =:= B -> Out = some_value
; Out = -1.0
).
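For instance, with the clause above loaded (typical top-level output):
?- sample(2, 3, Out).
Out = -1.0.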
Your code has both syntactic and semantic issues.
Predicate names start with a lower case letter, and the comma represents a conjunction. That is, you could read your clause as
sample(A,B,C,What) if
greaterthan(A,B,1.0) and lessthan(A,B,-1.0) and equal(A,B,C).
Then note that the What argument is useless, since it never gets a value; it's called a singleton.
A possible way of writing the disjunction (i.e. OR):
sample(A,B,_,1) :- A > B.
sample(A,B,C,C) :- A == B.
sample(A,B,_,-1) :- A < B.
Note the test A < B guarding the assignment of the value -1. That's necessary because Prolog will try all clauses if required. The basic construct that forces Prolog to avoid some computation we know should not be done is the cut:
sample(A,B,_,1) :- A > B, !.
sample(A,B,C,C) :- A == B, !.
sample(A,B,_,-1).
Anyway, I think you should use the if/then/else syntax, even while learning.
sample(A,B,C,W) :- A > B -> W = 1 ; A == B -> W = C ; W = -1.
I would like to know if there is a straightforward algorithm for rearranging simple symbolic algebraic expressions. Ideally I would like to be able to rewrite any such expression with one variable alone on the left hand side. For example, given the input:
m = (x + y) / 2
... I would like to be able to ask about x in terms of m and y, or y in terms of x and m, and get these:
x = 2*m - y
y = 2*m - x
Of course we've all done this algorithm on paper for years. But I was wondering if there was a name for it. It seems simple enough but if somebody has already cataloged the various "gotchas" it would make life easier.
For my purposes I won't need it to handle quadratics.
(And yes, CAS systems do this, and yes I know I could just use them as a library. I would like to avoid such a dependency in my application. I really would just like to know if there are named algorithms for approaching this problem.)
What you want is an equation-solving algorithm, but I bet this is a huge topic. In the general case, there may be:
systems of equations;
non-linear equations, so additional algorithms such as equation factorization are needed;
knowledge of how to invert functions, for example: sin(x) + 10 = z is solved for x by inverting sin(), which gives arcsin() (not all functions are invertible!);
finally, some equations may be hard to solve even for a CAS, such as sin(x) + x = y, solved for x.
The hard answer: your best bet is to take the source code of some CAS. For example, you can look at the Maxima CAS source code, which is written in Lisp, and find the code responsible for equation solving.
The easy answer: if all you need is to solve an equation that is linear and composed only of the basic operators + - * /, then you already know the answer: use the good old paper method. Think of the rules we use on paper, and rewrite those rules as a symbolic algorithm that manipulates the equation. Good luck!
It seems like what you're interested in doing is maintaining a system of linear equations and then, at any time, being able to solve for one variable in terms of all the others. If you encode the relationships as a matrix, it seems like you could then reduce the matrix to some nice form (for example, reduced row echelon form) to get the "simplest" dependencies amongst the variables (for some nice definition of "simplest.") Once you have the data like this, you should be able to read off all the dependencies by just looking at some row that has a nonzero entry for the variable in question, then normalizing it so that the variable has coefficient one.
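For example, your m = (x + y) / 2 becomes the homogeneous equation x + y - 2m = 0, i.e. the row (1 1 -2 | 0) over the variable order (x, y, m). Normalizing that row on x reads off x = 2*m - y, and normalizing it on y reads off y = 2*m - x.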
A note - in general, you won't always get a unique solution for each variable. For example, given the trivial equations
x = y
x = z
Then solving for z could yield either "z = x" or "z = y," depending on how much simplification you want. Or alternatively, in a setup like
x = 2y + 3w
x = 9z
Returning a value for x could hand back either expression, or their sum over two, or a whole bunch of other things that are all technically true but not necessarily useful. I'm not sure how you'd handle this, but depending on the form of your equations you can probably find a way to handle it.
Convert your expression into a tree data structure in Polish (prefix) notation. Your tree is made up of nodes; each node has an operation, a left and a right. Each of the left and right can be a symbol (e.g. "x") or another node. For example:
(x + (a + b))
Would become:
(+ x (+ a b))
Or in JSON:
["+", "x", ["+", "a", "b"]]
Your original expression m = (x + y) / 2 would look like this:
m = ["/", ["+", "x", "y"], "2"]
One of your desired expressions (solving for x) looks like this:
x = ["-", ["*", "m", "2"], "y"]
Can you see that the tree of expressions has been turned inside out and each operator has been inverted? The "-" is the inverse of the "+" and now wraps the "*", which is the inverse of the "/". The:
["+", "x", "y"]
Becomes:
["-", (something), "y"]
Where (something) is, recursively, an inversion of the outer expression. To explain the process: a) recursively descend the expression tree until you find the node containing the symbol you want to solve for, b) make a new node containing the inverse of that node's operation, c) replace the symbol you want to solve for with the inverted outer expression, doing this recursively as you work back outwards.
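If you want to play with the idea, here is a minimal Prolog sketch of exactly this inversion (the predicate names solve/3 and occurs/2 are mine, and it assumes the unknown occurs exactly once and only +, -, *, / appear):
% solve(X, Lhs = Rhs, Solution): rearrange the equation so that the
% unknown X ends up alone on the left-hand side.
solve(X, X = R, X = R) :- !.
solve(X, A + B = R, S) :- occurs(X, A), !, solve(X, A = R - B, S).
solve(X, A + B = R, S) :- occurs(X, B), !, solve(X, B = R - A, S).
solve(X, A - B = R, S) :- occurs(X, A), !, solve(X, A = R + B, S).
solve(X, A - B = R, S) :- occurs(X, B), !, solve(X, B = A - R, S).
solve(X, A * B = R, S) :- occurs(X, A), !, solve(X, A = R / B, S).
solve(X, A * B = R, S) :- occurs(X, B), !, solve(X, B = R / A, S).
solve(X, A / B = R, S) :- occurs(X, A), !, solve(X, A = R * B, S).
solve(X, A / B = R, S) :- occurs(X, B), !, solve(X, B = A / R, S).
solve(X, L = R, S) :- occurs(X, R), solve(X, R = L, S).   % unknown on the right: flip sides

% occurs(X, Term): the symbol X appears somewhere inside Term.
occurs(X, X) :- !.
occurs(X, T) :-
    compound(T),
    functor(T, _, N),
    between(1, N, I),
    arg(I, T, Arg),
    occurs(X, Arg), !.

With your example, a top level (SWI-Prolog flavour) gives:
?- solve(x, m = (x + y) / 2, S).
S = (x = m*2-y).

?- solve(y, m = (x + y) / 2, S).
S = (y = m*2-x).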
There are various simple ways in which the initial equation can be modified, and performing the correct modifications in the correct order will lead to the correct solution. So how about seeing this as a search or even a pathfinding problem?