Given a string consisting of digits, add + or - signs so that the expression evaluates to 0. Return the expression.
For example,
123 => 1 + 2 - 3 = 0
173956 => 17 + 39 - 56 = 0
I have no clue how to solve this problem other than by brute force.
Are there any suggestions?
This is a search problem: the search must be performed in the solution space.
Suppose we start from the string '123'. At this point we can add a + or - sign after '1', giving '1 + 23' or '1 - 23'. Each variant can be split further by adding a sign after the next character. As a result, all possible sign placements form a tree-like structure: the solution space. Your algorithm must search for a solution in this structure. I think A* could be used for this.
Anders K drew a nice ASCII graph of the solution space; you just need to search it for a solution. A simple breadth-first search or depth-first search can do the job, but I think it will be slow if the solution space is large.
Also, I think it is possible to find a more optimal, problem-specific solution that exploits properties of the solution space, for example its tree-like structure.
You can solve it in many ways, for example using a recursive approach, which becomes obvious if you structure it as a tree (see the sketch right after the tree below).
e.g. 123
Since there can be two different signs after each digit (+ | -), the tree looks like this (to also cover multi-digit groups such as 17, you would add a third "no sign" branch, omitted here):
        1
       / \
      +   -
     /     \
    2       2
   / \     / \
  +   -   +   -
  |   |   |   |
  3   3   3   3
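For illustration, here is a minimal recursive sketch of that enumeration in Python (the function name and the handling of multi-digit groups are my own additions, not part of the original answer):

def find_zero_expression(digits):
    # Depth-first search over all ways to split the digits into groups and
    # join the groups with + or -. Returns the first expression whose value
    # is 0, or None if no such expression exists.
    def dfs(rest, expr, value):
        if not rest:
            return expr if value == 0 else None
        for i in range(1, len(rest) + 1):
            group, tail = rest[:i], rest[i:]
            num = int(group)
            if expr == "":
                # The first group carries no sign in front of it.
                result = dfs(tail, group, num)
            else:
                result = (dfs(tail, expr + " + " + group, value + num)
                          or dfs(tail, expr + " - " + group, value - num))
            if result is not None:
                return result
        return None
    return dfs(digits, "", 0)

print(find_zero_expression("123"))     # prints a valid expression, e.g. 1 + 2 - 3
print(find_zero_expression("173956"))  # prints a valid expression, e.g. 17 + 39 - 56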
My problem is to match text with math expressions inside.
I saw this topic - What is the best way to index documents which contain mathematical expression in elastic search?
But in my context I can't just create an additional field with the math expression. I have to match text such as math tasks for students, and these tasks can contain multiple math expressions.
I think MathML is the most suitable format here, because I can split the MathML tags into words and match them as ordinary words.
I'm interested in getting the closest matches to the math expressions. What is the proper way to achieve this kind of matching?
Examples:
Solve the equation (2x + 7) ^ 2 = (2x - 1) ^ 2 .
Find all values of the parameter a, for each of which the equation: | x - a ^ 2 + a + 2 | + | x - a ^ 2 + 3a - 1 | = 2a - 3 has roots, but none of them belongs to the interval (4; 19)
P.S. A graphical representation of the equation was attached as an image (not reproduced here).
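To illustrate the "split MathML tags into words" idea mentioned above, here is a rough sketch in Python; the sample fragment and the choice of xml.etree are my own assumptions, not part of any existing setup:

import xml.etree.ElementTree as ET

def mathml_tokens(mathml):
    # Flatten a MathML fragment into plain word-like tokens (tag names plus
    # text content) that a full-text index could treat as ordinary terms.
    root = ET.fromstring(mathml)
    tokens = []
    for elem in root.iter():
        # Strip a namespace prefix such as '{http://www.w3.org/1998/Math/MathML}mi'.
        tokens.append(elem.tag.split('}')[-1])
        if elem.text and elem.text.strip():
            tokens.append(elem.text.strip())
    return tokens

# (2x + 7)^2 written as a small MathML fragment, just for illustration.
sample = "<math><msup><mrow><mn>2</mn><mi>x</mi><mo>+</mo><mn>7</mn></mrow><mn>2</mn></msup></math>"
print(mathml_tokens(sample))
# ['math', 'msup', 'mrow', 'mn', '2', 'mi', 'x', 'mo', '+', 'mn', '7', 'mn', '2']

Whether token streams like this give close enough matches would still have to be verified against the actual search engine configuration.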
Suppose we have a string of binary values in which some portions may correspond to specific letters, for example:
A = 0
B = 10
C = 001
D = 010
E = 001
For example, given the string "001010", there are 6 different possibilities:
AABB
ADB
CAB
CD
EAB
ED
I have to compute the exact number of such combinations.
I'm trying to approach the problem from a dynamic programming point of view, but I have difficulty formulating the subproblems and composing the corresponding table.
I would appreciate any pointers toward the correct algorithm formulation.
Thanks in advance.
You can use a simple recursive procedure: try to match every pattern to the beginning of the string; if there is a match, repeat recursively with the remainder of the string. When the string is empty, you have found a decoding.
Patterns = ["0", "10", "001", "010", "001"]
Letters = "ABCDE"

def Decode(In, Out):
    # Base case: the whole input has been consumed, so Out is one complete decoding.
    if len(In) == 0:
        print(Out)
    else:
        # Try every pattern against the beginning of the remaining input.
        for i in range(len(Patterns)):
            if In[:len(Patterns[i])] == Patterns[i]:
                Decode(In[len(Patterns[i]):], Out + Letters[i])

Decode("001010", "")
AABB
ADB
CAB
CD
EAB
ED
You can formulate a DP whereby f(i) = sum( f(i - j) * count(matches_j) ) over all matches of length j ending at index i. Depending on the input, you might also speed this up by building a custom trie over the dictionary so that you only check relevant matches (e.g., A followed by B followed by D). To take your example:
f(0) = 1
f(1) = 1 * f(0) = 1
f(2) = 2
f(3) = 1 * f(2) + 1 * f(1) + 1 * f(0) = 4
f(4) = 0
f(5) = 1 * f(4) + 1 * f(3) + 1 * f(2) = 6
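A compact sketch of that DP in Python (note the index shift: here f[i] counts decodings of the first i characters, so f[i+1] corresponds to f(i) above; names are mine):

def count_decodings(s, patterns):
    # f[i] = number of ways to decode the first i characters of s.
    # For every pattern that ends exactly at position i, add f[i - len(pattern)].
    f = [0] * (len(s) + 1)
    f[0] = 1                      # the empty prefix has exactly one decoding
    for i in range(1, len(s) + 1):
        for p in patterns:
            if i >= len(p) and s[i - len(p):i] == p:
                f[i] += f[i - len(p)]
    return f[len(s)]

print(count_decodings("001010", ["0", "10", "001", "010", "001"]))  # 6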
When solving DP problems, it often helps to think about a recursive solution first and then to think about converting it into a DP solution.
A nice recursive insight here is that if you have a nonempty string of digits, any way of decoding it will start with some single character. You could therefore count the number of ways to decode the string by trying each character, seeing if it matches at the beginning and, if so, counting up how many ways there are to decode the rest of the string.
The reason this turns into a nice DP problem is that when you pull off a single character you're left with a shorter string of digits that's always a suffix of the original string. So imagine that you made a table storing, for each suffix of the original string, how many ways there were to decode that string. If you fill that matrix in from the right to the left using the above insight, you'd ultimately end up getting the final answer by reading off the entry corresponding to the entire string.
See if you can find a way to turn this into a concrete algorithm and to then go and code it up. Good luck!
In working through the dynamic programming algorithm for computing the minimum edit distance between two strings, I am having a hard time grasping one thing. To me it seems that, given two strings s and t, inserting a character into s would be the same as deleting a character from t. Why, then, do we need to consider these operations separately when computing the edit distance? I always have a hard time computing the indices in the recurrence relation because I can't intuitively understand this part.
I've read through Skiena and some other sources, but none of them explains this part well. This SO link explains the insert and delete operations better than elsewhere in terms of understanding which string is being inserted into or deleted from, but I still can't figure out why they aren't one and the same.
Edit: Ok, I didn't do a very good job of detailing the source of my confusion.
The way Skiena explains computing the minimum edit distance m(i,j) of the first i characters of a string s and the first j characters of a string t based on already having computed solutions to the subproblems is as follows. m(i,j) will be the minimum of the following 3 possibilities:
opt[MATCH] = m[i-1][j-1].cost + match(s[i],t[j]);
opt[INSERT] = m[i][j-1].cost + indel(t[j]);
opt[DELETE] = m[i-1][j].cost + indel(s[i]);
The way I understand it, the 3 operations are all operations on the string s. An INSERT means you have to insert a character at the end of string s to get the minimum edit distance. A DELETE means you have to delete the character at the end of string s to get the minimum edit distance.
Given s = "SU" and t = "SATU" INSERT and DELETE would be as follows:
Insert:
SU_
SATU
Delete:
SU
SATU_
My confusion is that an INSERT into s seems to be the same as a DELETION from t. I'm probably confused about something basic, but it's not intuitive to me yet.
Edit 2: I think this link kind of clarifies my confusion but I'd love an explanation given my specific questions above.
They aren't the same thing any more than < and > are the same thing. There is, of course, a sort of duality, and you are correct to point it out: a < b if and only if b > a, so if you have a good algorithm to test whether b > a, it makes sense to use it when you need to test whether a < b.
It is much easier to directly test if s can be obtained from t by deletion rather than to directly test if t can be obtained from s by insertion. It would be silly to randomly insert letters to s and see if you get t. I can't imagine that any implementation of edit-distance actually does that. Still, it doesn't mean that you can't distinguish between insertion and deletion.
More abstractly, there is a relation R on any set of strings, defined by
s R t <=> t can be obtained from s by insertion
Deletion is the inverse relation: closely related, but not the same.
The problem of edit distance can be restated as the problem of converting the source string into the target string with a minimum number of operations (including insertion, deletion and replacement of a single character).
Thus, in the process of converting the source string into the target string, if inserting a character from the target string, deleting a character from the source string, or replacing a character in the source string with a character from the target string all yield the same (minimum) edit distance, then, well, all of these operations can be said to be equivalent. In other words, it does not matter how you arrive at the target string as long as you have made the minimum number of edits.
This is realized by looking at how the cost matrix is calculated. Consider a simpler problem where source = AT (represented vertically) and target = TA (represented horizontally). The matrix is then constructed as follows (each min() lists the candidate costs coming from the west, northwest and north cells, in that order):

      |  ε  |        T         |        A         |
    ε |  0  |        1         |        2         |
    A |  1  | min(2, 1, 2) = 1 | min(2, 1, 3) = 1 |
    T |  2  | min(3, 1, 2) = 1 | min(2, 2, 2) = 2 |
The idea of filling this matrix is:
If we moved east, we insert the current target string character.
If we moved south, we delete the current source string character.
If we moved southeast, we replace the current source character with current target character.
If all or any two of these impart the same cost in terms of editing, then they can be said to be equivalent and you can break the ties arbitrarily.
One of the first places where this comes up is when we compute c(2, 2) in the cost matrix (the first row, c(0, 0) through c(0, 2) -- the minimum costs of converting an empty string to "", "T", "TA" respectively -- and the first column, c(0, 0) through c(2, 0) -- the costs of converting "", "A", "AT" respectively to an empty string -- are clear).
The value of c(2, 2) can be realized by:
inserting the current character in target, 'A' (we move east from c(2, 1)) -- cost is 1 + 1 = 2, or
replacing the current character 'T' in source with the current character 'A' in target (we move southeast from c(1, 1)) -- cost is 1 + 1 = 2, or
deleting the current character in source, 'T' (we move south from c(1, 2)) -- cost is 1 + 1 = 2.
Since all values are the same, which one are you going to choose?
If you choose to move from west, your alignment could be:
A T -
- T A
(one deletion, one 0-cost replacement, one insertion)
If you choose to move from north, your alignment could be:
- A T
T A -
(one insertion, one 0-cost replacement, one deletion)
If you choose to move from northwest, your alignment could be:
A T
T A
(Two 1-cost replacements).
All these alignments are equivalent in terms of the resulting edit distance (under the given cost function).
Edit distance is only interested in the minimum number of operations required to transform one sequence into another; it is not interested in the uniqueness of the transformation. In practice, there are often multiple ways to transform one string into another, that all have the minimum number of operations.
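To make the matrix-filling rules above concrete, here is a short sketch of the standard computation with unit costs (the function name is my own):

def edit_distance(source, target):
    # cost[i][j] = minimum cost of converting the first i characters of source
    # into the first j characters of target (the Wagner-Fischer matrix).
    m, n = len(source), len(target)
    cost = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        cost[i][0] = i                                  # i deletions from source
    for j in range(n + 1):
        cost[0][j] = j                                  # j insertions of target characters
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            replace = cost[i - 1][j - 1] + (source[i - 1] != target[j - 1])  # move southeast
            insert = cost[i][j - 1] + 1                                      # move east
            delete = cost[i - 1][j] + 1                                      # move south
            cost[i][j] = min(replace, insert, delete)
    return cost[m][n]

print(edit_distance("AT", "TA"))    # 2, matching the matrix above
print(edit_distance("SU", "SATU"))  # 2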
I have a string that needs to be compressed by a dictionary compression algorithm. If a substring is found in the dictionary, it is encoded with cost 2. If no match is found, the cost will be the size of the substring. Given a fixed dictionary and a string, how can I choose the best substrings from the dictionary so as to obtain the minimum total cost?
For example, consider the string ABBBBBCD and the following dictionary:
entry 1 - ABBB
entry 2 - BBCD
entry 3 - BBBBB
entry 4 - ABBBB
entry 5 - CD
The best solution is to chose ABBB and BBCD, resulting in cost 2 + 2 = 4.
If I choose A, BBBBB, C and D, the cost would be 1 + 2 + 1 + 1 = 5, which is worse than the first.
Likewise, if I choose ABBBB, B, CD, the cost will be 2 + 1 + 2 = 5.
After these explanations, my question is: is there a known algorithm that solves this problem? Or is there some known algorithm that could be modified so that I can solve the problem without resorting to brute force?
Please ask me if something is not clear.
You can formulate and solve it as a shortest path problem.
Create a graph with a vertex for every position in the string (including one past the last character). Now add a directed edge from i to j (i < j) with weight 2 whenever the substring between positions i and j is a dictionary entry, and an edge from i to i + 1 with weight 1 for encoding a single raw character (an unmatched substring of length k then costs k, as required).
Now find the shortest path from the first position to the position past the last character. (See: http://www.geeksforgeeks.org/shortest-path-for-directed-acyclic-graphs/ )
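Because every edge goes from a smaller index to a larger one, the graph is a DAG, and the shortest path can also be computed with a simple left-to-right DP. Here is a sketch (the function name is mine; the cost model is taken from the question):

def min_compression_cost(s, dictionary):
    # cost[i] = minimum encoding cost of the first i characters of s.
    # A raw character costs 1; any dictionary match costs 2.
    n = len(s)
    INF = float("inf")
    cost = [INF] * (n + 1)
    cost[0] = 0
    for i in range(n):
        if cost[i] == INF:
            continue
        # Edge i -> i + 1: encode s[i] as a single raw character.
        cost[i + 1] = min(cost[i + 1], cost[i] + 1)
        # Edges for dictionary entries starting at position i (cost 2 each).
        for entry in dictionary:
            if s.startswith(entry, i):
                cost[i + len(entry)] = min(cost[i + len(entry)], cost[i] + 2)
    return cost[n]

print(min_compression_cost("ABBBBBCD", ["ABBB", "BBCD", "BBBBB", "ABBBB", "CD"]))  # 4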
Context: I am building on FoundationDB, and I am thinking about which key to use first.
Let's say we have this set of elements:
{AP,AQ,AR,BP,BQ,BR}
and we want to build a tree from it. One way is to group by the first character first, and then by the second, obtaining
             root
        +------+------+
        +             +
        A             B
   +----+----+   +----+----+
   |    |    |   |    |    |
   +    +    +   +    +    +
   P    Q    R   P    Q    R
One other possible way is to group first by the second character, and then by the first, obtaining:
              root
      +---------+---------+
      +         +         +
      P         Q         R
   +--+--+   +--+--+   +--+--+
   +     +   +     +   +     +
   A     B   A     B   A     B
Assuming the probability distribution of the strings is uniform, which one leads to the fastest search time? In general, is it best to have a high number of branches at the top levels of the tree or at the bottom ones?
The first solution leads to choosing one out of 2 options and then one out of 3, while the second first makes a choice of one out of three and then one out of two. Theoretically both should be approximately the same.
EDIT (as per your comment): in case you have two layers where the number of choices is significantly different, say 30 and 1,000,000, I advise you to put the 30 options on the higher level and the 1,000,000 ones on the lower level. I believe caching will speed up the lower level more in such cases.