Dividing set of integers - algorithm

I've got a problem with a solution for algorithmic problem as described below.
We have a set (e.g. array) of integers. Our task is to divide them into the groups (they don't have to have same amount of elements) that sum is equal to each other. I case that primal set cannot be divide we have to give answer "impossible to divide".
For example:
Set A is given [-7 3 3 1 2 5 14]. The answer is [-7 14], [3 3 1], [2 5].
It seems that it's easy to say when for sure it would be impossible. When sum of primal set isn't divisible by 3: sum(A) % 3 != 0.
Do you have any idea how to solve that problem?

This is the 3-partition problem variant of the partition problem, the difference being that the classic partition problem splits the set into two sets (not three) whose sums are equal to each other. This problem is NP-complete, so you're almost certainly not going to find a polynomial time solution for it; the 2-partition problem has a pseudopolynomial time solution, but the 3-partition problem does not.
See this answer for an outline of how to adapt the 2-partition algorithm to a 3-partition algorithm. See also this paper for a parallel solution.

Related

Difficulty in thinking a divide and conquer approach

I am self-learning algorithms. As we know Divide and Conquer is one of the algorithm design paradigms. I have studied mergeSort, QuickSort, Karatsuba Multiplication, counting inversions of an array as examples of this particular design pattern. Although it sounds very simple, divides the problems into subproblems, solves each subproblem recursively, and merges the result of each of them, I found it very difficult to develop an idea of how to apply that logic to a new problem. To my understanding, all those above-mentioned canonical examples come up with a very clever trick to solve the problem. For example, I am trying to solve the following problem:
Given a sequence of n numbers such that the difference between two consecutive numbers is constant, find the missing term in logarithmic time.
Example: [5, 7, 9, 11, 15]
Answer: 13
First, I came up with the idea that it can be solved using the divide and conquer approach as the naive approach will take O(n) time. From my understanding of divide and conquer, this is how I approached:
The original problem can be divided into two independent subproblems. I can search for the missing term in the two subproblems recursively. So, I first divide the problem.
leftArray = [5,7,9]
rightArray = [11, 15]
Now it says, I need to solve the subproblems recursively until it becomes trivial to solve. In this case, the subproblem becomes of size 1. If there is only one element, there are 0 missing elements. Now to combine the result. But I am not sure how to do it or how it will solve my original problem.
Definitely, I am missing something crucial here. My question is how to approach when solving this type of divide and conquer problem. Should I come up with a trick like a mergeSort or QuickSort? The more I see the solution to this kind of problem, it feels I am memorizing the approach to solve, not understanding and each problem solves it differently. Any help or suggestion regarding the mindset when solving divide and conquer would be greatly appreciated. I have been trying for a long time to develop my algorithmic skill but I improved very little. Thanks in advance.
You have the right approach. The only missing part is an O(1) way to decide which side you are discarding.
First, note that the numbers in your problem must be ordered, otherwise you can't do better than O(n). There also needs to be at least three numbers, otherwise you wouldn't figure out the "step".
With this understanding in place, you can determine the "step" in O(1) time by examining the initial three terms, and see what's the difference between the consecutive ones. Two outcomes are possible:
Both differences are the same, and
One difference is twice as big as the other.
Case 2 hands you a solution by luck, so we will consider only the first case from now on. With the step in hand, you can determine if the range has a gap in it by subtracting the endpoints, and comparing the result to the number of gaps times the step. If you arrive at the same result, the range does not have a missing term, and can be discarded. When both halves can be discarded, the gap is between them.
As #Sergey Kalinichenko points out, this assumes the incoming set is ordered
However, if you're certain the input is ordered (which is likely in this case) observe the nth position's value to be start + jumpsize * index; this allows you to bisect to find where it shifts
Example: [5, 7, 9, 11, 15]
Answer: 13
start = 5
jumpsize = 2
check midpoint: 5 * 2 * 2 -> 9
this is valid, so the shift must be after the midpoint
recurse
You can find the jumpsize by checking the first 3 values
a, b, c = (language-dependent retrieval)
gap1 = b - a
gap2 = c - b
if gap1 != gap2:
if (value at 4th index) - c == gap1:
missing value is b + gap1 # 2nd gap doesn't match
else:
missing value is a + gap2 # 1st gap doesn't match
bisect remaining values

Why dynamic programming for 0/1 Knapsack?

I looked at many resources and also this question, but am still confused why we need Dynamic Programming to solve 0/1 knapsack?
The question is: I have N items, each item with value Vi, and each item has weight Wi. We have a bag of total weight W. How to select items to get best total of values over limitation of weight.
I am confused with the dynamic programming approach: why not just calculate the fraction of (value / weight) for each item and select the item with best fraction which has less weight than remaining weight in bag?
For your fraction-based approach you can easily find a counterexample.
Consider
W=[3, 3, 5]
V=[4, 4, 7]
Wmax=6
Your approach gives optimal value Vopt=7 (we're taking the last item since 7/5 > 4/3), but taking the first two items gives us Vopt=8.
As other answers pointed out, there are edge cases with your approach.
To explain the recursive solution a bit better, and perhaps to understand it better I suggest you approach it with this reasoning:
For each "subsack"
If we have no fitting element there is no best element
If we only have one fitting element, the best choice is that element
If we have more than one fitting element, we take each element and calculate the best fit for its "subsack". The best choice is the highest valued element/subsack combination.
This algorithm works because it spans all the possible combinations of fitting elements and finds the one with the highest value.
A direct solution, instead, is not possible as the problem is NP-hard.
Just look at this counterexample:
Weight 7, W/V pairs (3/10),(4/12),(5/21)
Greedy algorithm fails when there is unit ratio case. for example consider the following example:
n= 1 2, P= 4 18, W= 2 18, P/W= 2 1
Knapsack capacity=18
According to greedy algorithm it will consider the first item since it's P/W ratio is greater and hence the total profit will be 4 (since it cannot insert the second item after first as the capacity reduces to 16 after inserting the first item).
But the actual answer is 18.
Hence there are multiple corner cases where greedy fails to give optimal solution, that's why we use Dynamic programming in 0/1 knapsack problem.

Pseudo-polynomial time algorithm for modified coin change

I thought of the following problem while thinking of coin change, but I don't know an efficient (pseudo-polynomial time) algorithm to solve it. I'd like to know if there is any pseudo-polynomial time solution or maybe some classic literature on it that I'm missing.
There is a well-known pseudo-polynomial time dynamic programming solution to the coin change problem, which asks the following:
You have coins, the of which has a value of . How many subsets of coins exist, such that the sum of the coin values is ?
The dynamic programming solution runs in , and is a classic. But I'm interested in the following slight generalization:
What if the coin no longer has just a value of , but instead can assume a set of values ? Now, the question is: How many subsets of coins exist, such there exists an assignment of coins to values such that the sum of the values of these coins is ?
(The classic coin change problem is actually an instance of this problem with for all , so I do not expect a polynomial time solution.)
For example, given and , and the following coins:
Then, there are subsets:
Take coin only, assign this coin to have value .
Take coin only, assign this coin to have value .
Take coins and , assign them to have values and , respectively or and , respectively—these are considered to be the same way.
Take coins , and , and assign them to have values , and , respectively.
The issue I'm having is precisely the third case; it would be easy to modify the classic dynamic programming solution for this problem, but it will count these two subsets as separate because different values are assigned, even though the coins taken are the same.
So far, I have only been able to find the straightforward exponential time algorithm for this problem: consider each of the subsets, and run the standard dynamic programming coin change algorithm (which is ) to check whether this subset of coins is valid, giving an time algorithm. That's not what I'm looking for here; I'm trying to find a solution without the factor.
Can you provide me with a pseudo-polynomial time algorithm to solve this question, or can it be proven that none exists unless, say, P = NP? That is, this problem is NP-complete (or similar).
Maybe this is hard, but the way you asked the question it doesn't seem so.
Take coins 1 and 3, assign them to have values 7 and 3, respectively or 6 and 4, respectively—these are considered to be the same way.
Let us encode the two equivalent solutions in this way:
((1, 3), (7, 3))
((1, 3), (6, 4))
Now, when we memoize a solution, can't we just ask if the first element of the pair, (1, 3), has already been solved? Then we would suppress computing the (6, 4) solution.

Number of ways to represent a number as a sum of K numbers in subset S

Let the set S be {1 , 2 , 4 , 5 , 10}
Now i want to find the number of ways to represent x as sum of K numbers of the set S. (a number can be included any number of times)
if x = 10 and k = 3
Then the ans should be 2 => (5,4,1) , (4,4,2)
The order of the numbers doesn't matter ie.(4,4,2) and (4,2,4) count as one.
I did some research and found that the set can be represented as a polynomial x^1+x^2+x^4+x^5+x^10 and after raising the polynomial to the power K the coefficients of the product polynomial gives the ans.
But the ans includes (4,4,2) and (4,2,4) as unique terms which i don't want
Is there any way to make (4,4,2) and (4,2,4) count as same term ?
This is a NP-complete, a variant of the sum-subset problem as described here.
So frankly, I don't think you can solve it via a non-exponential (iterate though all combinations) solution, without any restrictions on the problem input (such as maximum number range, etc.).
Without any restrictions on the problem domain, I suggest iterating through all your possible k-set instances (as described in the Pseudo-polynomial time dynamic programming solution) and see which are a solution.
Checking whether 2 solutions are identical is nothing compared to the complexity of the overall algo. So, a hash of the solution set-elements will work just fine:
E.g. hash-order-insensitive(4,4,2)==hash-order-insensitive(4,2,4) => check the whole set, otherwise the solutions are distinct.
PS: you can also describe step-by-step your current solution.

Tips for Project Euler Problem #78

This is the problem in question: Problem #78
This is driving me crazy. I've been working on this for a few hours now and I've been able to reduce the complexity of finding the number of ways to stack n coins to O(n/2), but even with those improvements and starting from an n for which p(n) is close to one-million, I still can't reach the answer in under a minute. Not at all, actually.
Are there any hints that could help me with this?
Keep in mind that I don't want a full solution and there shouldn't be any functional solutions posted here, so as not to spoil the problem for other people. This is why I haven't included any code either.
Wikipedia can help you here. I assume that the solution you already have is a recursion such as the one in the section "intermediate function". This can be used to find the solution to the Euler problem, but isn't fast.
A much better way is to use the recursion based on the pentagonal number theorem in the next section. The proof of this theorem isn't straight forward, so I don't think the authors of the problem expect that you come up with the theorem by yourself. Rather it is one of the problems, where they expect some literature search.
This problem is really asking to find the first term in the sequence of integer partitions that’s divisible by 1,000,000.
A partition of an integer, n, is one way of describing how many ways the sum of positive integers, ≤ n, can be added together to equal n, regardless of order. The function p(n) is used to denote the number of partitions for n. Below we show our 5 “coins” as addends to evaluate 7 partitions, that is p(5)=7.
5 = 5
= 4+1
= 3+2
= 3+1+1
= 2+2+1
= 2+1+1+1
= 1+1+1+1+1
We use a generating function to create the series until we find the required n.
The generating function requires at most 500 so-called generalized pentagonal numbers, given by n(3n – 1)/2 with 0, ± 1, ± 2, ± 3…, the first few of which are 0, 1, 2, 5, 7, 12, 15, 22, 26, 35, … (Sloane’s A001318).
We have the following generating function which uses our pentagonal numbers as exponents:
1 - q - q^2 + q^5 + q^7 - q^12 - q^15 + q^22 + q^26 + ...
my blog at blog.dreamshire.com has a perl program that solves this in under 10 sec.
Have you done problems 31 or 76 yet? They form a nice set that is an generalization of the same base problem each time. Doing the easier questions may give you insight into a solution for 78.
here some hints:
Divisibility by one million is not the same thing as just being larger than one million. 1 million = 1,000,000 = 10^6 = 2^6 * 5^6.
So the question is to find a lowest n so that the factors of p(n) contain six 2's and six 5's.

Resources