This question is purely out of curiosity. I am off school for the summer and was going to implement an algorithm to solve this just for fun. That led to the question above: how hard is this problem?
The problem: you are given a list of positive integers, a set of mathematical operators, and the equal sign (=). Can you create a valid mathematical expression using the integers (in the same order) and the operators (any number of times)?
An example should clarify any questions:
given: {2, 3, 5, 25} , {+, -, *, /} , {=}
output: YES
The expression (only one, I think) is (2 + 3) * 5 = 25. You only need to output YES/NO.
I believe the problem is in NP. I say this because it is a decision problem (YES/NO answer) and I can find a non-deterministic poly time algorithm that decides it.
a. Non-deterministically select a sequence of operators to place between the integers.
b. Verify that your answer is a valid mathematical expression (this can be done in polynomial time).
In this case, the big question is this: is the problem in P (i.e., is there a deterministic poly-time algorithm that decides it)? Or is the problem NP-complete (i.e., can a known NP-complete problem be reduced to it, or equivalently, is every NP language poly-time reducible to this problem)? Or neither (i.e., the problem is in NP but is neither in P nor NP-complete)?
Note: this framing assumes P is not equal to NP. Also, although I am new to Stack Overflow, I am familiar with the homework tag. This is indeed just curiosity, not homework :)
A straightforward reduction is from the Partition problem (which is NP-complete): given a set S of N integers, the input to the "Valid Math" problem would be the elements of S, N-2 '+' operators, and an '=' sign.
There seems to be some sort of confusion about how to check for NP-completeness. An NP-complete problem is at least as hard, in a particular sense, as any other problem in NP. Suppose we were comparing to 3SAT, as some posters are trying to do.
Now, reducing the given problem to 3SAT proves nothing. It is true that if 3SAT could be solved efficiently (meaning P = NP), then the given problem could be solved efficiently too. However, if the given problem can be solved efficiently, it might correspond only to easy special cases of 3SAT.
We would have to reduce 3SAT to the given problem. That means we would have to devise a rule for transforming arbitrary 3SAT instances into instances of the given problem, such that a solution of the given problem tells us how to solve the 3SAT instance. Then 3SAT could not be harder than the given problem; since 3SAT is as hard as problems in NP get, the given problem must be as well.
The reduction from the Partition problem works. That problem works like this: given a multiset S of integers, can we divide this into two disjoint subsets that between them include each member of S, such that the sums of the disjoint subsets are equal?
To do this, we construct a sequence that begins with 0, then contains each element of S, and ends with another 0, and we use {+, -} as the operator set. Each element of S must then be either added or subtracted so that the total is 0, which means the sum of the added elements equals the sum of the subtracted elements.
Therefore, this problem is at least as hard as the Partition problem, since we can solve any example Partition instance if we can solve the given problem, and it is therefore NP-complete (it is in NP, as argued in the question).
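For concreteness, here is a minimal sketch in Python of that encoding, plus an (exponential-time) brute-force checker for the {+, -} case; the function names are mine, not part of the original problem:

from itertools import product

def partition_to_vme(S):
    # Encode a Partition instance S as a VME instance: the numbers
    # 0, e1, ..., en, 0 with operators {+, -} and '=' at the end.
    return [0] + list(S) + [0]

def vme_solvable_plus_minus(nums):
    # Brute-force (exponential!) check: can we place '+'/'-' between
    # nums[0..n-2] so that the running total equals nums[-1]?
    *left, target = nums
    for signs in product([1, -1], repeat=len(left) - 1):
        total = left[0]
        for sign, x in zip(signs, left[1:]):
            total += sign * x
        if total == target:
            return True
    return False

# Example: {3, 1, 2} can be split into {3} and {1, 2}; indeed 0 + 3 - 1 - 2 = 0.
print(vme_solvable_plus_minus(partition_to_vme([3, 1, 2])))  # True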
OK, first, you specify a "set" of integers, but a set is by definition unordered, so you mean a "list" of integers.
Also, I am going to make an assumption here, which may be wrong: the = sign always appears exactly once, between the second-to-last and the last integer in your list. If you allow the equals sign in the middle, it becomes more complicated.
Here is an actual proof that "Valid Mathematical Expression" (VME) is NP-complete, via a reduction from Subset Sum. Note that Wikipedia's definition of Subset Sum requires the subset to be non-empty. In fact, the more general version of Subset Sum that allows empty subsets is also NP-complete if the desired sum is part of the input; I won't give that proof unless requested. Given the Subset Sum instance {i_1, i_2, ..., i_n} along with desired sum s, create the following instance of VME:
{0, i_1, 0, i_2, 0, ..., i_n, s}, {+, *}, {=}
IF the instance of Subset Sum is solvable, then there is some subset of the integers that adds up to s. If the integer i_k is part of that subset, join it to the zero immediately to its left with '+'; if i_k is not part of the subset, join it to that zero with '*' (wiping it out). Between each such pair and the next, insert an addition sign.
Taking the Wikipedia example
{−7, −3, −2, 5, 8}
where { −3, −2, 5} sums to 0, we would encode it as
{0, -7, 0, -3, 0, -2, 0, 5, 0, 8, 0}
and the resulting expression would be
0*-7 + 0 + -3 + 0 + -2 + 0 + 5 + 0*8 = 0
Now we also need to show that any solution to this instance of VME yields a solution to the instance of Subset Sum. This is easier than you might think. Looking at a resulting expression, we can group the numbers into those that are multiplied by a 0 (including as part of a chained multiplication) and those that are not. Any number multiplied by a zero contributes nothing to the final sum; any number not multiplied by a zero must be added into the final sum.
So we have shown that this instance of VME is solvable IF AND ONLY IF the corresponding instance of Subset Sum is solvable, and the reduction is complete.
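For completeness, a small sketch of the encoding step (the helper name is mine, under the assumption above that '=' sits between the last two numbers):

def subset_sum_to_vme(integers, s):
    # Encode a Subset Sum instance as a VME instance: a zero before
    # each integer, the target s at the end, and operators {+, *}.
    nums = []
    for i in integers:
        nums += [0, i]
    nums.append(s)
    return nums, {'+', '*'}, {'='}

# Wikipedia example: {-7, -3, -2, 5, 8} with target 0 becomes
# [0, -7, 0, -3, 0, -2, 0, 5, 0, 8, 0], {'+', '*'}, {'='}
print(subset_sum_to_vme([-7, -3, -2, 5, 8], 0))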
EDIT: The Partition reduction (with the comment) works as well, and is better because it allows you to put the equals sign anywhere. Neat!
I don't have time for a full answer right now, but you can describe a reduction from this problem to the Knapsack problem.
Using dynamic programming, you can achieve a pseudo-polynomial-time solution (a sketch follows below). Note that this does not conflict with the fact that the problem is indeed NP-complete.
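Here is a minimal sketch of such a DP, assuming the operator set is restricted to {+, -} applied left to right (no parentheses) and '=' comes last. With integer inputs, the set of reachable values is bounded by the sum of the absolute values, which is what makes it pseudo-polynomial:

def vme_decision_plus_minus(nums):
    # Track the set of values reachable after each prefix; its size is
    # bounded by the sum of |nums[i]|, so this runs in pseudo-polynomial time.
    *left, target = nums
    reachable = {left[0]}
    for x in left[1:]:
        reachable = {v + x for v in reachable} | {v - x for v in reachable}
    return target in reachable

print(vme_decision_plus_minus([2, 3, 5, 25]))    # False: no +/- expression works
print(vme_decision_plus_minus([0, 3, 1, 2, 0]))  # True: 0 + 3 - 1 - 2 = 0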
There are two properties that need to be satisfied for a problem to be NP-complete.
A decision problem C is NP-complete if:
C is in NP, and
Every problem in NP is reducible to C in polynomial time.
We have established 1. Condition 2 follows from the fact that every problem in NP is reducible to 3SAT, and 3SAT is reducible to the current problem.
Therefore it is NP-complete.
(edit) Answer to the comment below:
I will prove that SAT is reducible to the current problem, and since 3SAT is reducible to SAT, the result follows.
Input formula is the conjunction of the following expressions:
(x1 V x2 V x3 V ... xn V y1)
(x1 V x2 V x3 V ... xn V y2)
(x1 V x2 V x3 V ... xn V y3)
.
.
.
(x1 V x2 V x3 V ... xn V y64)
where each yi is a boolean whose value depends on the order of the operators applied between all the xi's.
i.e., yi can take a total of 4x4x4x4x1 values (assuming that +, -, x, / are the only operators and that = is always the last operator; this can be changed if the operator set is modified to include other operators).
If none of the expressions is true, then the complete expression will evaluate to FALSE, and there is no way to check this unless we substitute all possible values, i.e., x1 through xn as the n numbers and y1 through y64 as the various ways in which the operators can be applied (this takes care of the order).
This conversion is in POLY-time, and the given boolean formula is satisfiable iff the mathematical expression is valid, etc.
Anyone notice a flaw?
This isn't really an answer to your complexity question, but your problem sounds a bit like the Countdown problem. A quick search turned up this paper: http://www.cs.nott.ac.uk/~gmh/countdown.pdf
I don't have time to work out a proof at the moment, but a hunch tells me that it may not be in P. You can define a grammar for arithmetic, and then this question amounts to asking whether there is a valid parse tree that uses all of these terminals. I believe that problem is in NP but outside of P.
Let's assume that we have only integer numbers whose values are in the range 1 to N. Next we split them into K-element multisets. How would you find a collection containing the smallest possible number of those multisets whose union includes all numbers from 1 to N? In case of ambiguity, the answer can be any collection that matches the criteria (first found).
For instance, we have N = 9, K = 3
(1,2,3)(4,5,6)(7,8,8)(8,7,6)(1,9,2)(4,4,3)
The smallest number of multisets that together contain all the numbers from 1 to 9 is 4, and the answer can be either (1,2,3)(4,5,6)(7,8,8)(1,9,2) or (1,2,3)(4,5,6)(8,7,6)(1,9,2).
Any idea for an efficient algorithm to find such a collection?
PS
After writing an answer I found yet another 4-element collection: (4,5,6)(1,9,2)(4,4,3)(7,8,8) or (4,5,6)(1,9,2)(4,4,3)(8,7,6). But as I said, an algorithm finding any minimum set would be fine.
Your question is a restricted version of the classic Set Covering problem, but it is still easy to show that it is NP-hard.
Any approximation technique for this problem would be reasonable here. In particular, the greedy heuristic of choosing the next subset that covers the most uncovered items is especially easy to implement (see the sketch below).
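A minimal sketch of that greedy heuristic (the names are mine; for general Set Cover it is within a logarithmic factor of optimal):

def greedy_cover(multisets, N):
    # Repeatedly pick the multiset covering the most still-uncovered
    # numbers in 1..N; return indices of the chosen multisets,
    # or None if the numbers cannot all be covered.
    uncovered = set(range(1, N + 1))
    chosen = []
    while uncovered:
        best = max(range(len(multisets)),
                   key=lambda i: len(uncovered & set(multisets[i])))
        gained = uncovered & set(multisets[best])
        if not gained:
            return None
        chosen.append(best)
        uncovered -= gained
    return chosen

sets = [(1,2,3), (4,5,6), (7,8,8), (8,7,6), (1,9,2), (4,4,3)]
print(greedy_cover(sets, 9))  # [0, 1, 2, 4]: four multisets, as in the question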
This problem, as @Ami Tavroy said, is NP-hard by reduction from 3-dimensional matching (here).
To do the reduction, note the restricted decision variant of 3-dimensional matching that reduces to an exact cover (here):
...given a set T and an integer k, decide whether there exists a 3-dimensional matching M ⊆ T with |M| ≥ k. ... The problem is NP-complete even in the special case that k = |X| = |Y| = |Z|. In this case, a 3-dimensional (dominating) matching is not only a set packing but also an exact cover: the set M covers each element of X, Y, and Z exactly once.
This variant could be solved in P if you could solve the other question in P: you can produce all the triples in O(N^3) time and then do set cover, checking whether K = N / 3 or not. Thus, by reduction, the original question is also NP-hard.
This was on my last comp stat qual. I gave an answer I thought was pretty good. We just get our score on the exam, not whether we got specific questions right. I'm hoping the community can give guidance on this one; I am not interested in the answer so much as in what is being tested and where I can go to read more about it and get some practice before the next exam.
At first glance it looks like a time-complexity question, but when it starts talking about mapping functions and pre-sorting data, I am not sure how to handle it.
So how would you answer?
Here it is:
Given a set of items X = {x1, x2, ..., xn} drawn from some domain Z, your task is to find if a query item q in Z occurs in the set. For simplicity you may assume each item occurs exactly once in X and that it takes O(l) amount of time to compare any two items in Z.
(a) Write pseudo-code for an algorithm which checks if q in X. What is the worst case time complexity of your algorithm?
(b) If l is very large (e.g. if each element of X is a long video) then one needs efficient algorithms to check if q \in X. Suppose you are given access to k functions h_i: Z -> {1, 2, ..., m} which uniformly map an element of Z to a number between 1 and m, and let k << l and m > n.
Write pseudo-code for an algorithm which uses the functions h_1, ..., h_k to check if q \in X. Note that you are allowed to preprocess the data. What is the worst case time complexity of your algorithm?
Be explicit about the inputs, outputs, and assumptions in your pseudocode.
The first part is a simple linear scan. The time complexity is O(n * l); the worst case is comparing against all elements. Note that it cannot be sub-linear in n, since there is no information that the data is sorted.
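A sketch of that scan; the compare callback is my own naming and stands in for the O(l) equality test:

def linear_scan(X, q, compare):
    # Check whether q occurs in X by comparing against every item.
    # Each compare(a, b) costs O(l), so the worst case is O(n * l).
    for x in X:
        if compare(x, q):
            return True
    return False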
The second part (b) is actually a variation of a Bloom filter, which is a probabilistic way to represent a set. With Bloom filters you might get false positives (saying something is in the set when it is not), but never false negatives (saying something is not in the set when it is).
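A minimal sketch of the idea, assuming the h_i behave as stated (uniform maps into {1, ..., m}). Preprocessing costs O(n * k) hash evaluations; each query then costs O(k), independent of l:

def preprocess(X, hashes, m):
    # Build a bit array of size m; for each item, set the bits chosen
    # by all k hash functions.
    bits = [False] * (m + 1)  # 1-indexed to match h_i: Z -> {1, ..., m}
    for x in X:
        for h in hashes:
            bits[h(x)] = True
    return bits

def query(q, hashes, bits):
    # q can be in X only if every one of its k bits is set; any unset
    # bit proves q is not in X (no false negatives).
    return all(bits[h(q)] for h in hashes)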
Let's say I have two sets: (n_1, n_2, ...) and (m_1, m_2, ...) and a matching function match(n, m) that returns a value from 0 to 1. I want to find the mapping between the two sets such that the following constraints are met:
Each element must have at most 1 matched element in the opposite set.
Unmatched elements will be paired with a dummy element at cost 1
The sum of the match function when applied to all elements is maximal
I am having trouble expressing this formally, but if you lined up the two sets parallel to each other in their original order and drew a line between matched elements, none of the lines would cross. E.g., [n_1<->m_2, n_2<->m_3] is a valid mapping, but [n_1<->m_2, n_2<->m_1] is not.
(I believe the first three are standard weighted bipartite matching constraints, but I specified them in case I misunderstood weighted bipartite matching)
This is relatively straightforward to do with an exhaustive search in exponential time (with respect to the size of the sets), but I'm hoping a polynomial-time solution exists (ideally O((|n|*|m|)^3) or better).
I have searched a fair amount on the "assignment problem"/"weighted bipartite matching" and have seen variations with different constraints, but I didn't find one that matched, or one that I could reduce mine to, given this added ordering constraint. Do you have any ideas on how I might solve this? Or perhaps a rough proof that it is not solvable in polynomial time (for my purposes, a reduction from an NP-complete problem would also work)?
This problem has been studied under the name "maximum-weight non-crossing matching". There's a simple quadratic-time dynamic program. Let M(a, b) be the value of an optimal matching on n1, …, na and m1, …, mb. We have the recurrence
M(0, b) = -b
M(a, 0) = -a
M(a, b) = max {M(a - 1, b - 1) + match(a, b), M(a - 1, b) - 1, M(a, b - 1) - 1}.
By tracing back the argmaxes, you can recover an optimal solution from its value.
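A sketch of that DP in Python (match is assumed given; indices are 0-based internally, and only the optimal value is returned):

def max_noncrossing_matching(n_items, m_items, match):
    # M[a][b] = best value using the first a of n_items and the first b
    # of m_items; unmatched elements cost 1 each. O(|n|*|m|) time and space.
    A, B = len(n_items), len(m_items)
    M = [[0.0] * (B + 1) for _ in range(A + 1)]
    for b in range(B + 1):
        M[0][b] = -b
    for a in range(A + 1):
        M[a][0] = -a
    for a in range(1, A + 1):
        for b in range(1, B + 1):
            M[a][b] = max(M[a - 1][b - 1] + match(n_items[a - 1], m_items[b - 1]),
                          M[a - 1][b] - 1,   # leave n_a unmatched
                          M[a][b - 1] - 1)   # leave m_b unmatched
    return M[A][B]  # trace the argmaxes through M to recover the matching itself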
If match has significantly fewer than a quadratic number of entries greater than -1, there is an algorithm that runs in time linearithmic in the number of useful entries (Khoo and Cong, A Fast Multilayer General Area Router for MCM Designs).
Suppose we have a set of doubles s, something like this:
1.11, 1.60, 5.30, 4.10, 4.05, 4.90, 4.89
We now want to find the smallest positive integer scale factor x such that every element of s, when multiplied by x, is within one tenth of a whole number.
Sorry if this isn't very clear—please ask for clarification if needed.
Please limit answers to C-style languages or algorithmic pseudo-code.
Thanks!
You're looking for something called simultaneous Diophantine approximation. The usual statement is that you're given real numbers a_1, ..., a_n and a positive real epsilon and you want to find integers P_1, ..., P_n and Q so that |Q*a_j - P_j| < epsilon, hopefully with Q as small as possible.
This is a very well-studied problem with known algorithms. However, you should know that it is NP-hard to find the best approximation with Q < q where q is another part of the specification. To the best of my understanding, this is not relevant to your problem because you have a fixed epsilon and want the smallest Q, not the other way around.
One algorithm for the problem is the Lenstra–Lenstra–Lovász (LLL) lattice reduction algorithm. I wonder if I can find any good references for you. These class notes mention the problem and the algorithm, but probably aren't of direct help. Wikipedia has a fairly detailed page on the algorithm, including a fairly large list of implementations.
To answer Vlad's modified question (if you want exact whole numbers after multiplication), the answer is known. If your numbers are rationals a1/b1, a2/b2, ..., aN/bN, with fractions reduced (ai and bi relatively prime), then the number you need to multiply by is the least common multiple of b1, ..., bN.
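A short sketch using Python's fractions module, which keeps each fraction reduced automatically (math.lcm requires Python 3.9+):

from fractions import Fraction
from math import lcm

def exact_scale(rationals):
    # Smallest positive integer x such that x * r is a whole number
    # for every rational r: the LCM of the reduced denominators.
    return lcm(*(Fraction(r).denominator for r in rationals))

print(exact_scale(["1/2", "3/4", "5/6"]))  # 12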
This is not a full answer, but some suggestions:
Note: I'm using "s" for the scale factor, and "x" for the doubles.
First of all, ask yourself whether brute force works. E.g., try s = 1, then s = 2, then s = 3, and so forth.
We have a list of numbers x[i], and a tolerance t = 1/10. We want to find the smallest positive integer s, such that for each x[i], there is an integer q[i] such that |s * x[i] - q[i]| < t.
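A brute-force sketch of exactly that search (s_max is an arbitrary cutoff of mine, and the usual floating-point rounding caveats apply):

def smallest_scale(xs, t=0.1, s_max=10**6):
    # Return the smallest positive integer s such that s*x is within t
    # of an integer for every x in xs, or None if no s <= s_max works.
    for s in range(1, s_max + 1):
        if all(abs(s * x - round(s * x)) < t for x in xs):
            return s
    return None

print(smallest_scale([1.11, 1.60, 5.30, 4.10, 4.05, 4.90, 4.89]))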
First, note that if we can produce an ordered list of workable s values for each x[i], it's simple enough to merge these lists to find the smallest s that works for all of them. Second, note that the answer depends only on the fractional part of x[i].
Rearranging the test above, we have |x - q/s| < t/s. That is, we want to find a "good" rational approximation of x, in the sense that the approximation should be better than t/s. Mathematicians have studied a variant of this in which the criterion for "good" is that it has to be better than anything with a smaller s, and the best way to find those is through truncations of the continued fraction expansion.
Unfortunately, this isn't quite what you need, since once you get under your tolerance you don't necessarily need approximations that keep getting better; the same tolerance will keep working. The next obvious thing is to use this to skip to the first s that would work and brute-force from there. Unfortunately, for any single number the largest that the first s can be is 5, so that doesn't buy you all that much. However, this method will find you an s that works, just not the smallest one. Can we use this s to find a smaller one, if it exists? I don't know, but it will set an upper limit for brute-forcing.
Also, if you need the tolerance for each x to be < t, then this means the tolerance for the product of all x must be < t^n. This might let you skip forward a great deal, and set a reasonable lower limit for brute-forcing.
I recently had this problem on a test: given a set m of points (all on the x-axis) and a set n of lines with endpoints [l, r] (again on the x-axis), find the minimum subset of n such that all points are covered by some line. Prove that your solution always finds the minimum subset.
The algorithm I wrote for it was something to the effect of:
(say lines are stored as arrays with the left endpoint in position 0 and the right in position 1)
algorithm coverPoints(set[] m, set[][] n):
    chosenLines = []
    while m is not empty:
        minX = min(m)
        bestLine = n[0]
        for i=1 to length of n:
            if n[i][0] <= minX and n[i][1] > bestLine[1] then
                bestLine = n[i]
        add bestLine to chosenLines
        for i=0 to length of m:
            if m[i] <= bestLine[1] then delete m[i] from m
    return chosenLines
I'm just not sure if this always finds the minimal solution. It's a simple greedy algorithm, so my gut tells me it won't, but a friend of mine who is much better at this than I am says that for this problem a greedy algorithm like this always finds the minimal solution. For proving mine always finds the minimal solution, I did a very hand-wavy proof by contradiction in which I made an assumption that probably isn't true at all. I forget exactly what I did.
If this isn't a minimal solution, is there a way to do it in less than something like O(n!) time?
Thanks
Your greedy algorithm IS correct.
We can prove this by showing that ANY other covering can only be improved by replacing it with the cover produced by your algorithm.
Let C be a valid covering for a given input (not necessarily an optimal one), and let S be the covering produced by your algorithm. Now let's inspect the points p1, p2, ..., pk that are the min points you deal with at each iteration step. The covering C must cover them all as well. Observe that no segment in C covers two of these points; otherwise, your algorithm would have chosen that segment! Therefore, |C| >= k. And what is the cost (segment count) of your algorithm? |S| = k.
That completes the proof.
Two notes:
1) Implementation: Initializing bestLine with n[0] is incorrect, since the loop may be unable to improve it, and n[0] does not necessarily cover minX.
2) Actually, this problem is a simplified version of the Set Cover problem. While the original is NP-complete, this variation turns out to be polynomial. A corrected sketch follows.
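Here is a sketch in Python that incorporates note 1 by considering only the lines that actually cover minX; this is my rendering of the corrected greedy, not code from the original post:

def cover_points(points, lines):
    # Greedy interval point cover: repeatedly take the leftmost
    # uncovered point and, among the lines covering it, choose the
    # one reaching farthest right. Lines are (left, right) pairs.
    remaining = sorted(points)
    chosen = []
    while remaining:
        min_x = remaining[0]
        candidates = [ln for ln in lines if ln[0] <= min_x <= ln[1]]
        if not candidates:
            raise ValueError("no line covers point %r" % min_x)
        best = max(candidates, key=lambda ln: ln[1])
        chosen.append(best)
        remaining = [p for p in remaining if p > best[1]]
    return chosen

print(cover_points([1, 3, 5, 8], [(0, 4), (2, 6), (5, 9)]))
# [(0, 4), (5, 9)]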
Hint: first try proving your algorithm works for sets of size 0, 1, 2... and see if you can generalise this to create a proof by induction.