Is there a way to solve a simple linear equation like
-x+3 = x+5
using binary search, or any other numerical method?
BACKGROUND:
My question arises because I want to solve equations like "2x+5-(3x+2)=x+5". The possible operators are *, -, + and brackets.
I first thought of converting both sides of the equation to infix notation, and then performing some kind of binary search.
What do you think of this approach? I'm supposed to solve this in less than 40 minutes in an interview.
It is not hard to write a simple parser that rewrites $-x+3 -(x+5) = 0$, or any similar expression, algebraically as $a*x + b = 0$ with accumulated constants $a$ and $b$. The exact solution is then $x = -b/a$ (provided $a \neq 0$).
If you really want a numerical approach, observe that each side describes its own linear function graph, i.e., $y_l = -x_l+3$ on the left and $y_r = x_r + 5$ on the right. Finding a solution to the equation is therefore the same as finding an intersection point of the two lines. Start with any value $x=x_l=x_r$ and evaluate both sides to get the corresponding $y$-values $y_l$ and $y_r$. If their difference is $0$, you have found a solution (either the unique intersection point by luck, or the two lines coincide, as in $2x = 2x$). Otherwise check another position, e.g., $x+1$. If the new difference $y_l - y_r$ is the same as before, the lines are parallel (for example $2x = 2x + 7$) and there is no solution. Otherwise the difference has moved farther from or nearer to $0$ (from the positive or the negative side). Now you have all you need to test further points $x$ numerically and approximate the $x$-value for which $y_l - y_r$ is $0$: for example, first look for one $x$ that gives a positive difference and another $x$ that gives a negative difference, then run binary search between them. (Of course, you could instead compute the solution algebraically again, since evaluating the lines at two positions gives you all the information needed to compute the intersection point exactly.)
Thus, the numerical approach is quite absurd here, but it motivates this algorithmic way of thinking.
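As a rough illustration of the two ideas above, here is a minimal Python sketch. It assumes the two sides of the equation are already available as a Python function returning left side minus right side; the names `diff`, `solve_linear` and `bisect_root` are made up for this example and do not come from the question.

```python
def solve_linear(diff):
    """Recover a and b of diff(x) = a*x + b from two evaluations, then solve."""
    b = diff(0.0)          # constant term
    a = diff(1.0) - b      # slope, from one unit step
    if a == 0:
        return None        # parallel or identical lines: no unique solution
    return -b / a


def bisect_root(diff, steps=100):
    """Approximate the root of a linear diff by bracketing and bisection."""
    if diff(1.0) == diff(0.0):
        raise ValueError("lines are parallel or identical")
    lo, hi = -1.0, 1.0
    # Widen the interval until the difference changes sign across it.
    while diff(lo) * diff(hi) > 0:
        lo, hi = 2 * lo, 2 * hi
    for _ in range(steps):
        mid = (lo + hi) / 2
        if diff(lo) * diff(mid) <= 0:
            hi = mid       # sign change (or the root itself) is in the left half
        else:
            lo = mid
    return (lo + hi) / 2


# -x+3 = x+5, written as left side minus right side
diff = lambda x: (-x + 3) - (x + 5)
exact = solve_linear(diff)     # -1.0
approx = bisect_root(diff)     # close to -1.0
```

The two-point version needs only two evaluations, which is why the bisection variant is mostly of didactic interest for linear equations.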
Do you really need to solve it with a numerical approach? I'm pretty sure you can, but it's not so hard to parse the expression and solve it analytically. If it is indeed a linear equation, it's just a matter of finding the coefficient of x and the free term once the equation is reduced. In the 26 minutes since this question was posted, I wrote a simple parser by hand that does that:
import re

TOKENS = {
    # The lookahead stops FREE from matching the "2" of an x-term such as "2x".
    'FREE': r'[0-9]+(?![0-9x])',
    'XTERM': r'[0-9]*x',
    'ADD': r'\+',
    'SUB': r'-',
    'POW': r'\^',
    'MUL': r'\*',
    'EQL': r'=',
    'LPAREN': r'\(',
    'RPAREN': r'\)',
    'EOF': r'$'
}

class Token:
    def __init__(self, name, raw, position):
        self.name = name
        self.image = raw.strip()
        self.raw = raw
        self.position = position

class Expr:
    # A linear expression: x is the coefficient of x, c is the free term.
    def __init__(self, x, c):
        self.x = x
        self.c = c
    def add(self, e):
        return Expr(self.x + e.x, self.c + e.c)
    def sub(self, e):
        return Expr(self.x - e.x, self.c - e.c)
    def mul(self, e):
        # Only valid while the result stays linear; an x*x term is dropped.
        return Expr(self.x * e.c + e.x * self.c, self.c * e.c)
    def neg(self):
        return Expr(-self.x, -self.c)

class Scanner:
    def __init__(self, expr):
        self.expr = expr
        self.position = 0
    def match(self, name):
        match = re.match(r'^\s*' + TOKENS[name], self.expr[self.position:])
        return Token(name, match.group(), self.position) if match else None
    def peek(self, *allowed):
        for match in map(self.match, allowed):
            if match: return match
    def next(self, *allowed):
        # Consume the next token, which must be one of the allowed kinds.
        token = self.peek(*allowed)
        if token is None:
            raise SyntaxError('expected one of %s at position %d'
                              % (', '.join(allowed), self.position))
        self.position += len(token.raw)
        return token
    def maybe(self, *allowed):
        if self.peek(*allowed):
            return self.next(*allowed)
    def following(self, value, *allowed):
        self.next(*allowed)
        return value
    def expect(self, **actions):
        token = self.next(*actions.keys())
        return actions[token.name](token)

def evaluate(expr):
    tokens = Scanner(expr)
    def Binary(higher, **ops):
        e = higher()
        while tokens.peek(*ops):
            e = ops[tokens.next(*ops).name](e, higher())
        return e
    def Equation():
        # Reduce "left = right" to "left - right = 0".
        left = Add()
        tokens.next('EQL')
        right = Add()
        return left.sub(right)
    def Add(): return Binary(Mul, ADD=Expr.add, SUB=Expr.sub)
    def Mul(): return Binary(Neg, MUL=Expr.mul)
    def Neg():
        return Neg().neg() if tokens.maybe('SUB') else Primary()
    def Primary():
        return tokens.expect(
            FREE = lambda x: Expr(0, float(x.image)),
            XTERM = lambda x: Expr(float(x.image[:-1] or 1), 0),
            LPAREN = lambda x: tokens.following(Add(), 'RPAREN'))
    expr = tokens.following(Equation(), 'EOF')
    # a*x + c = 0  =>  x = -c / a
    return -expr.c / float(expr.x)

print(evaluate('2+2 = x'))           # 4.0
print(evaluate('-x+3 = x+5'))        # -1.0
print(evaluate('2x+5-(3x+2)=x+5'))   # -1.0
First, your question is related to solving an expression with a binary tree. One method is to build a binary tree in which the operator that is applied last (the one with the lowest priority) is the root, higher-priority operators sit further down, and the operands are the leaf nodes. You can read more about this method for solving equations.
Correct sequences of parentheses can be defined recursively:
The empty string "" is a correct sequence.
If "X" and "Y" are correct sequences, then "XY" (the concatenation of
X and Y) is a correct sequence.
If "X" is a correct sequence, then "(X)" is a correct sequence.
Each correct parentheses sequence can be derived using the above
rules.
Given two strings s1 and s2. Each character in these strings is a parenthesis, but the strings themselves are not necessarily correct sequences of parentheses.
You would like to interleave the two sequences so that they will form a correct parentheses sequence. Note that sometimes two different ways of interleaving the two sequences will produce the same final sequence of characters. Even if that happens, we count each of the ways separately.
Compute and return the number of different ways to produce a correct parentheses sequence, modulo 10^9 + 7.
Example s1 = (() and s2 = ())
[Figure: correct sequences of parentheses, with s1 in red and s2 in blue]
I don't understand the recursive algorithm: what do X and Y mean? And what does modulo 10^9 + 7 mean?
First, I tried generating all permutations of s1 and s2 and then counting the balanced ones. But that approach is wrong, isn't it?
class InterleavingParenthesis:
    def countWays(self, s1, s2):
        sequences = list(self.__exchange(list(s1 + s2)))
        corrects = 0
        for sequence in sequences:
            if self.__isCorrect(sequence):
                corrects += 1
        return corrects  # the count was computed but never returned

    def __isCorrect(self, sequence):
        s = Stack()
        balanced = True
        i = 0
        while i < len(sequence) and balanced:
            if '(' == sequence[i]:
                s.stack(sequence[i])
            elif s.isEmpty():
                balanced = False
            else: s.remove()
            i += 1
        if s.isEmpty() and balanced: return True
        else: return False

    def __exchange(self, s):
        # yields every permutation of the list s
        if len(s) <= 0: yield s
        else:
            for i in range(len(s)):
                for p in self.__exchange(s[:i] + s[i + 1:]):
                    yield [s[i]] + p

class Stack:
    def __init__(self):
        self.items = []
    def stack(self, data):
        self.items.append(data)
    def remove(self):
        self.items.pop()
    def isEmpty(self):
        return self.items == []
Here's an example that shows how this recursive property works:
Start with:
X = "()()(())"
Through property 2, we break this into further X and Y:
X = "()" ; Y = "()(())"
For X, we can look at the insides with property 3.
X = ""
Because of property 1, we know this is valid.
For Y, we use property 2 again:
X = "()"
Y = "(())"
Using the same recursion as before (property 2, then property 1) we know that X is valid. Note that in code, you usually have to go through the same process, I'm just saving time for humans. For Y, you use property 3:
X = "()"
And again.. :
X = ""
And with property 1, you know this is valid.
Because all sub-parts of "()()(())" are valid, "()()(())" is valid. That's an example of recursion: You keep breaking things down into smaller problems until they are solvable. In code, you would have the function call itself with regards to a smaller part of it, in your case, X and Y.
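The walk-through above can be turned into code almost literally. The following is only a sketch of a checker that applies the three rules directly; the name `is_correct` is made up here, and the cache is an addition so that repeated substrings are not re-examined.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def is_correct(s):
    """Check a parentheses string by applying the three rules directly."""
    if s == "":
        return True                                   # rule 1: empty string
    if s[0] == "(" and s[-1] == ")" and is_correct(s[1:-1]):
        return True                                   # rule 3: "(X)"
    # rule 2: split into two non-empty correct parts X and Y
    return any(is_correct(s[:i]) and is_correct(s[i:])
               for i in range(1, len(s)))


print(is_correct("()()(())"))   # True
print(is_correct("(()"))        # False
```

In practice a single left-to-right pass with a counter is simpler and faster; this version is only meant to mirror the recursive definition.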
As for the question you were given, there is a bit that doesn't make sense to me. I don't get how there is any room for doubt in any string of parentheses, like in the image you linked. In "((()())())" for example, there is no way these two parentheses do not match up: "((()())())". Therefore my answer would be that there is only one permutation for every valid string of parentheses, but this obviously is wrong somehow.
Could you or anyone else expand on this?
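One possible reading of the task, sketched below under that assumption: what is counted is the number of ways to merge s1 and s2 while keeping the order of characters within each string, not the number of ways to pair up parentheses. After taking i characters of s1 and j characters of s2, the number of currently open parentheses depends only on i and j, so a table over (i, j) is enough. The name `count_ways` and the table layout are choices made for this example; "modulo 10^9 + 7" just means reporting the remainder of the count after division by 1000000007.

```python
MOD = 10**9 + 7


def prefix_balance(s):
    """balance[k] = number of '(' minus number of ')' in s[:k]."""
    balance = [0]
    for ch in s:
        balance.append(balance[-1] + (1 if ch == "(" else -1))
    return balance


def count_ways(s1, s2):
    b1, b2 = prefix_balance(s1), prefix_balance(s2)
    n, m = len(s1), len(s2)
    if b1[n] + b2[m] != 0:
        return 0                      # unequal numbers of '(' and ')'
    # ways[i][j]: interleavings of s1[:i] and s2[:j] whose balance never dips below 0
    ways = [[0] * (m + 1) for _ in range(n + 1)]
    ways[0][0] = 1
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            if b1[i] + b2[j] < 0:
                continue              # more ')' than '(' so far: dead end
            total = 0
            if i:
                total += ways[i - 1][j]   # last character came from s1
            if j:
                total += ways[i][j - 1]   # last character came from s2
            ways[i][j] = total % MOD
    return ways[n][m]


print(count_ways("(()", "())"))   # 19
```

Two interleavings that produce the same final string are still counted separately here, as the problem statement requires.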
I have a two-part homework problem: implement Karp-Rabin and run it on a test file, and then the second part:
For the hash values modulo q, explain why it is a bad idea to use q as a power of 2. Can you construct a terrible example e.g. for q=64 and n=15?
This is my implementation of the algorithm:
def karp_rabin(text, pattern):
    # setup
    alphabet = 'ACGT'
    d = len(alphabet)
    n = len(pattern)
    d_n = d**n
    q = 2**32-1
    m = {char:i for i,char in enumerate(alphabet)}
    positions = []

    def kr_hash(s):
        return sum(d**(n-i-1) * m[s[i]] for i in range(n))

    def update_hash():
        return d*text_hash + m[text[i+n-1]] - d_n * m[text[i-1]]

    pattern_hash = kr_hash(pattern)
    for i in range(0, len(text) - n + 1):
        text_hash = update_hash() if i else kr_hash(text[i:n])
        if pattern_hash % q == text_hash % q and pattern == text[i:i+n]:
            positions.append(i)

    return ' '.join(map(str, positions))
...The second part of the question is referring to this part of the code/algo:
pattern_hash = kr_hash(pattern)
for i in range(0, len(text) - n + 1):
    text_hash = update_hash() if i else kr_hash(text[i:n])
    # the modulo q used to check if the hashes are congruent
    if pattern_hash % q == text_hash % q and pattern == text[i:i+n]:
        positions.append(i)
I don't understand why it would be a bad idea to use a power of 2 for q. I've tried running the algorithm on the test file provided (which is the genome of E. coli) and there's no discernible difference.
I tried looking at the formula for how the hash is derived (I'm not good at math) trying to find some common factors that would be really bad for powers of two but found nothing. I feel like if q is a power of 2 it should cause a lot of clashes for the hashes so you'd need to compare strings a lot more but I didn't find anything along those lines either.
I'd really appreciate help on this since I'm stumped. If someone wants to point out what I can do better in the first part (code efficiency, readability, correctness etc.) I'd also be thrilled to hear your input on that.
There is a problem if q divides some power of d, because then only a few characters contribute to the hash. For example in your code d=4, if you take q=64 only the last three characters determine the hash (d**3 = 64).
I don't really see a problem if q is a power of 2 but gcd(d,q) = 1.
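To make the q=64 case concrete, here is a small sketch using the same hash formula as the question's `kr_hash`. The two 15-character strings are made up for the illustration; they differ in their first twelve characters and share only the last three.

```python
def kr_hash(s, alphabet="ACGT"):
    """Polynomial hash with base d = len(alphabet), as in the question."""
    d = len(alphabet)
    m = {ch: i for i, ch in enumerate(alphabet)}
    n = len(s)
    return sum(d ** (n - i - 1) * m[s[i]] for i in range(n))


a = "ACGT" * 3 + "GAT"   # 15 characters
b = "T" * 12 + "GAT"     # 15 characters, same last three as a

full_hashes_differ = kr_hash(a) != kr_hash(b)
same_mod_64 = kr_hash(a) % 64 == kr_hash(b) % 64
print(full_hashes_differ, same_mod_64)   # True True
```

Every character except the last three is multiplied by a power of 4 that is at least 4**3 = 64, so it vanishes modulo 64.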
Your implementation looks a bit strange because instead of
if pattern_hash % q == text_hash % q and pattern == text[i:i+n]:
you could also use
if pattern_hash == text_hash and pattern == text[i:i+n]:
which would be better because you get fewer collisions.
The Thue–Morse sequence has, among its properties, that its polynomial hash quickly becomes zero when the hash modulus is a power of 2, whatever the polynomial base (d). So if you search for a short Thue–Morse sequence in a longer one, you will get a great many hash collisions.
For example, your code, slightly adapted:
def karp_rabin(text, pattern):
    # setup
    alphabet = '01'
    d = 15
    n = len(pattern)
    d_n = d**n
    q = 32
    m = {char:i for i,char in enumerate(alphabet)}
    positions = []

    def kr_hash(s):
        return sum(d**(n-i-1) * m[s[i]] for i in range(n))

    def update_hash():
        return d*text_hash + m[text[i+n-1]] - d_n * m[text[i-1]]

    pattern_hash = kr_hash(pattern)
    for i in range(0, len(text) - n + 1):
        text_hash = update_hash() if i else kr_hash(text[i:n])
        if pattern_hash % q == text_hash % q : #and pattern == text[i:i+n]:
            positions.append(i)

    return ' '.join(map(str, positions))

print(karp_rabin('0110100110010110100101100110100110010110011010010110100110010110', '0110100110010110'))
outputs a lot of positions, although only three of them are proper matches.
Note that I have dropped the and pattern == text[i:i+n] check. Obviously, if you restore it, the result will be correct, but it is equally obvious that the algorithm will do much more work checking this additional condition than it would for other values of q. In fact, with so many collisions the whole idea of the algorithm stops working: you could almost as effectively write a simple algorithm that checks every position for a match.
Also note that your implementation is quite strange. The whole idea of polynomial hashing is to take the modulo operation each time you compute the hash. Otherwise your pattern_hash and text_hash are very big numbers. In other languages this might mean arithmetic overflow, but in Python this will invoke big integer arithmetic, which is slow and once again loses the whole idea of the algorithm.
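A minimal sketch of what taking the modulo at every step could look like. The function name `karp_rabin_mod`, the default modulus 10**9 + 7 and the list return value are choices made for this example, not part of the original code.

```python
def karp_rabin_mod(text, pattern, alphabet="ACGT", q=10**9 + 7):
    """Karp-Rabin search that keeps every hash value reduced modulo q."""
    d = len(alphabet)
    m = {ch: i for i, ch in enumerate(alphabet)}
    n = len(pattern)
    if n == 0 or n > len(text):
        return []
    high = pow(d, n - 1, q)            # weight of the window's first character
    p_hash = t_hash = 0
    for i in range(n):                 # Horner's rule, reduced at each step
        p_hash = (p_hash * d + m[pattern[i]]) % q
        t_hash = (t_hash * d + m[text[i]]) % q
    positions = []
    for i in range(len(text) - n + 1):
        if i:
            # Drop text[i-1], shift by one digit, append text[i+n-1].
            t_hash = ((t_hash - m[text[i - 1]] * high) * d
                      + m[text[i + n - 1]]) % q
        if t_hash == p_hash and text[i:i + n] == pattern:
            positions.append(i)
    return positions


print(karp_rabin_mod("ACGTACGT", "CGT"))   # [1, 5]
```

All intermediate values stay below q * d, so the arithmetic never needs big integers regardless of the pattern length.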
I have the following function:
F(0) = 0
F(1) = 1
F(2) = 2
F(2*n) = F(n) + F(n+1) + n , n > 1
F(2*n+1) = F(n-1) + F(n) + 1, n >= 1
I am given a number n < 10^25 and I have to show that there exists a value a such that F(a)=n. Because of how the function is defined, there might exist an n such that F(a)=F(b)=n where a < b, and in this situation I must return b and not a.
What I have so far is:
We can split this function into two strict monotone series, one for F(2*n) and one for F(2*n+1) and can find the specified value in logarithmic time, so the finding is more or less done.
I've also found that F(2*n) >= F(2*n+1) for any n, so I first search for it in F(2*n) and if I don't find it there, I search in F(2*n+1)
The problem is calculating the function value. Even with some crazy memoization up to 10^7 and then falling back to recursion, it still couldn't calculate values above 10^12 in a reasonable time.
I think I have the algorithm for actually finding what I need all figured out, but I can't calculate F(n) fast enough.
Simply use memoisation all the way up to the target value, e.g. in Python:
class Memoize:
    def __init__(self, fn):
        self.fn = fn
        self.memo = {}
    def __call__(self, *args):
        if args not in self.memo:
            self.memo[args] = self.fn(*args)
        return self.memo[args]

@Memoize
def R(n):
    if n<=1: return n  # F(0) = 0 and F(1) = 1, as in the question
    if n==2: return 2
    n,rem = divmod(n,2)
    if rem:
        return R(n)+R(n-1)+1
    return R(n)+R(n+1)+n
This computes the answer for 10**25 instantly.
The reason this works is because the nature of the recursion means that for a binary number abcdef it will only need to at most use the values:
abcdef
abcde-1,abcde,abcde+1
abcd-2,abcd-1,abcd,abcd+1,abcd+2
abc-2,abc-1,abc,abc+1,abc+2
ab-2,ab-1,ab,ab+1,ab+2
a-2,a-1,a,a+1,a+2
At each step you can move up or down 1, but you also divide the number by 2 so the most you can move away from the original number is limited.
Therefore the memoised code will only use at most 5*log_2(n) evaluations.
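The bound can be checked empirically. The sketch below uses the base cases from the question and the standard library's `functools.lru_cache` in place of a hand-written memoizer; the name `F` follows the question, everything else is an assumption of this example.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def F(n):
    if n <= 2:
        return n                       # F(0)=0, F(1)=1, F(2)=2
    half, rem = divmod(n, 2)
    if rem:
        return F(half - 1) + F(half) + 1      # F(2n+1) = F(n-1) + F(n) + 1
    return F(half) + F(half + 1) + half       # F(2n) = F(n) + F(n+1) + n


target = 10**25
value = F(target)
distinct_calls = F.cache_info().currsize       # distinct arguments evaluated
bound = 5 * target.bit_length()                # roughly 5 * log_2(n)
print(distinct_calls, bound)
```

The number of cached arguments stays within the bound, which is why the call returns immediately even for 25-digit inputs.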
Given a certain object that respond_to? :+, I would like to know what the identity element is for that operation on that object. For example, if a is a Fixnum then it should give 0 for operation :+, because a + 0 == a for any Fixnum. Of course I already know the identity element for :+ and :* when talking about Fixnums, but is there any standard pattern/idiom to obtain those dynamically for all Numeric types and operations?
More specifically, I have written some code (see below) to calculate the shortest path between v1 and v2 (vertices in a graph) where the cost/distance/weight of each edge in the graph is given in a user-specified type. In the current implementation the cost/weight of the edges could be a Fixnum, a Float or anything that implements Comparable and can add 0 to itself and return self.
But I was wondering what is the best pattern:
requiring that the type used must support a + 0 == a
requiring that the type provide some kind of addition identity element discovery, e.g. a.class::ADDITION_IDENTITY_ELEMENT
??
My Dijkstra algorithm implementation
def s_path(v1,v2)
  dist = Hash.new { nil}
  pred = {}
  dist[v1] = 0 # distance from v1 to v1 is zero
  #pq = nodes
  pq = [v1]
  while u = pq.shift
    for edge in from(u)
      u,v,cost = *edge
      new_dist = cost + dist[u]
      if dist[v].nil? or new_dist < dist[v]
        dist[v] = new_dist
        pred[v] = u
        pq << v
      end
    end
  end
  path = [v2]
  path << pred[path.last] while pred[path.last]
  path.reverse
end
I think the a.class::ADDITION_IDENTITY_ELEMENT is pretty good except I would call it a.class::Zero.
Another option would be to do (a-a).
Personally I wouldn't try to make things so abstract and I would just require that every distance be a Numeric (e.g. Float or Integer). Then you can just keep using 0.
I am working on a Python script to test out genetic programming.
As an exercise I have made a simple script that tries to guess
a string without the whole population part.
My code is:
# acts as a gene
# it has three operations:
# Mutation : One character is changed
# Replication: a sequencepart is duplicated
# Extinction : A sequencepart is lost
# Crossover : the sequence is crossed with another Sequence
import random
class StringGene:
    def __init__(self, s):
        self.sequence = s
        self.allowedChars = "ABCDEFGHIJKLMOPQRSTUVWXYZ/{}[]*()+-"
    def __str__(self):
        return self.sequence
    def Mutation(self):
        x = random.randint(0, len(self.sequence)-1)
        r = random.randint(0, len(self.allowedChars)-1)
        d = self.sequence
        self.sequence = d[:x-1]+ self.allowedChars[r] + d[x:]
    def Replication(self):
        x1 = random.randint(0, len(self.sequence)-1)
        x2 = random.randint(0, len(self.sequence)-1)
        self.sequence =self.sequence[:x1]+ self.sequence[x1:x2] + self.sequence[x2:]
        self.sequence = self.sequence[:32]
    def Extinction(self):
        x1 = random.randint(0, len(self.sequence)-1)
        x2 = random.randint(0, len(self.sequence)-1)
        self.sequence = self.sequence[:x1] + self.sequence[x2:]
    def CrossOver(self, s):
        x1 = random.randint(0, len(self.sequence)-1)
        x2 = random.randint(0, len(s)-1)
        self.sequence = self.sequence[:x1+1]+ s[x2:]
        #x1 = random.randint(0, len(self.sequence)-1)
        #self.sequence = s[:x2 ] + self.sequence[x1+1:]

if __name__== "__main__":
    import itertools
    def hamdist(str1, str2):
        if (len(str2)>len(str1)):
            str1, str2 = str2, str1
        str2 = str2.ljust(len(str1))
        return sum(itertools.imap(str.__ne__, str1, str2))
    g = StringGene("Hi there, Hello World !")
    g.Mutation()
    print "gm: " + str(g)
    g.Replication()
    print "gr: " + str(g)
    g.Extinction()
    print "ge: " + str(g)
    h = StringGene("Hello there, partner")
    print "h: " + str(h)
    g.CrossOver(str(h))
    print "gc: " + str(g)
    change = 0
    oldres = 100
    solutionstring = "Hello Daniel. Nice to meet you."
    best = StringGene("")
    res = 100
    print solutionstring
    while (res > 0):
        g.Mutation()
        g.Replication()
        g.Extinction()
        res = hamdist(str(g), solutionstring)
        if res<oldres:
            print "'"+ str(g) + "'"
            print "'"+ str(best) + "'"
            best = g
            oldres = res
        else :
            g = best
        change = change + 1
    print "Solution:" + str(g)+ " " + str(hamdist(solutionstring, str(g))) + str (change)
I have a crude Hamming distance as a measure of how far the current string
differs from the solution string. However, I want to be able to vary the
length of the guess, so I introduced replication and deletion of parts
of the string.
Now, however, the string grows without bound and the solution string is never
found. Can you point out where I went wrong?
Can you suggest improvements?
cheers
Your StringGene objects are mutable, which means that when you do an operation like best = g, you are making both g and best reference the same object. Since after that first step you only have a single object, every mutation gets applied permanently, whether or not it's successful, and all comparisons between g and best are comparisons between the same object.
You either need to implement a copy operator, or make instances immutable, and have each mutation operator return a modified version of the 'gene'.
Also, if the first mutation fails to improve the string, you set g to best, which is an empty string, throwing away your starting string entirely.
Finally, the canonical test string is "Methinks it is like a weasel".
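One way to avoid the aliasing problem, sketched under a few assumptions: the gene is a plain (immutable) Python string, every operator returns a new string, and only fixed-length mutation is used. The names `mutate`, `hamdist`, `hill_climb` and `ALPHABET` are made up for this example.

```python
import random

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ "


def mutate(sequence, rng=random):
    """Return a NEW string with one random character replaced."""
    if not sequence:
        return sequence
    x = rng.randrange(len(sequence))
    return sequence[:x] + rng.choice(ALPHABET) + sequence[x + 1:]


def hamdist(a, b):
    """Number of differing positions, padding the shorter string with spaces."""
    width = max(len(a), len(b))
    return sum(c1 != c2 for c1, c2 in zip(a.ljust(width), b.ljust(width)))


def hill_climb(start, target, rng=random, max_steps=100000):
    best, best_score = start, hamdist(start, target)
    for _ in range(max_steps):
        if best_score == 0:
            break
        candidate = mutate(best, rng)      # best itself is never modified
        score = hamdist(candidate, target)
        if score < best_score:             # keep only improvements
            best, best_score = candidate, score
    return best


target = "METHINKS IT IS LIKE A WEASEL"
result = hill_climb("A" * len(target), target, rng=random.Random(0))
```

Because a failed mutation only ever touches `candidate`, discarding it leaves `best` exactly as it was.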
The simplest thing might be to limit how long the guessed string is allowed to be. Don't allow guesses above a certain length.
I had a look at your code and I'm not good enough in Python to find any bugs, but it might be that you're simply referencing or indexing the array incorrectly, resulting in always adding new characters to the guess-string, so your string is always increasing in length... I don't know if that's the bug, but things like that have happened to me before, so double-check your array indices. ;)
I think your fitness function is too simple. I would play with using two variables, one the size distance and the other your "hamdist". The larger the size difference is, the more it affects the total fitness. So add the two together with some percentage constant.
I'm also not very familiar with python, but it looks to me that this is not what you're doing.
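A sketch of such a two-part fitness, where lower is better. The function name `fitness` and the weight of 2.0 are arbitrary choices for this example and would need tuning.

```python
def fitness(candidate, target, length_weight=2.0):
    """Mismatches on the overlapping part plus a weighted length penalty."""
    overlap = min(len(candidate), len(target))
    mismatches = sum(candidate[i] != target[i] for i in range(overlap))
    length_gap = abs(len(candidate) - len(target))
    return mismatches + length_weight * length_gap


print(fitness("JELLO", "HELLO"))         # 1.0
print(fitness("HELLO", "HELLO WORLD"))   # 12.0
```

With a weight above 1, a string of the wrong length scores worse than a string of the right length with the same number of wrong characters.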
First of all, what you are doing is a genetic algorithm, not genetic programming (which is a related, but a different concept).
I don't know Python, but it looks like you have a major problem in your extinction function. As far as I can tell, if x1 > x2 it causes the string to increase in size instead of decreasing (the part between x1 and x2 is effectively doubled). What would happen in the replication function when x1 > x2, I can't tell without knowing Python.
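A possible repair, sketched as free functions on plain strings: sort the two cut points so that extinction can only shrink the string and replication can only grow it. The function names, the `rng` parameter and the choice of cut points are assumptions of this example; the 32-character cap is taken from the question's code.

```python
import random


def extinction(sequence, rng=random):
    """Remove the slice between two ordered cut points."""
    x1, x2 = sorted(rng.randrange(len(sequence) + 1) for _ in range(2))
    return sequence[:x1] + sequence[x2:]


def replication(sequence, max_len=32, rng=random):
    """Duplicate the slice between two ordered cut points, capped at max_len."""
    x1, x2 = sorted(rng.randrange(len(sequence) + 1) for _ in range(2))
    return (sequence[:x2] + sequence[x1:x2] + sequence[x2:])[:max_len]


rng = random.Random(1)
print(extinction("HELLO THERE", rng))
print(replication("HELLO THERE", rng=rng))
```

With x1 <= x2 guaranteed, `sequence[x1:x2]` is never silently empty because of swapped indices, and the two operators can no longer trade roles.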
Also keep in mind that maintaining a population is key to solving problems effectively with genetic algorithms. Crossovers are the essential part of the algorithm, and they make little or no sense if they are not made between population members (also, the more varied the population is, the better, most of the time). The code you presented depends on mutations of a single specimen to achieve your expected result, and is thus highly unlikely to produce anything useful faster than a simple brute-force method.