Dynamic programming function in O(nk) time - algorithm

Given two integer arrays A of size n and B of size k, and knowing that all items
in the array B are unique, I want to find an algorithm that finds indices j' < j'', such
that all elements of B belong to A[j' : j''] and the value |j'' - j'| is minimized, or
returns zero if there are no such indices at all. I also note that A can contain duplicates.
To provide more clarity, consider the array A = {1, 2, 9, 6, 7, 8, 1, 0, 0, 6} and B = {1, 8, 6}. You can see that B ⊆ A[1 : 6] and B ⊆ A[4 : 7], but at the same time 7 − 4 < 6 − 1,
thus the algorithm should output j' = 4 and j'' = 7.
I want to find an algorithm that runs in O(nk) time.
My work so far: I was thinking that for each j' ∈ [n], I can compute the minimum j'' ≥ j' so that B ⊆ A[j' : j'']. Assuming B = {b_1, ..., b_k}, let Next[j'][i] denote the smallest index t ≥ j' such that a_t = b_i, i.e., the index of the first element at or after a_{j'} (inclusive) which equals b_i.
In particular, if no such t exists, simply let Next[j'][i] = ∞. If I am able to show that the minimum j'' is
j'' = max_{i ∈ [k]} Next[j'][i],
then I think I will be able to design a dynamic programming algorithm to compute Next in O(nk) time. Any help on this dynamic programming problem would be much appreciated!
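In code, my idea would look roughly like the sketch below (0-based indices; the names are mine, and this is only a sketch of the intended DP, filling Next from right to left):

import math

def build_next(A, B):
    # Next[j][i] = smallest t >= j with A[t] == B[i], or infinity if none.
    n, k = len(A), len(B)
    pos = {b: i for i, b in enumerate(B)}          # elements of B are unique, so this is well-defined
    Next = [[math.inf] * k for _ in range(n + 1)]  # row n is the all-infinity base case
    for j in range(n - 1, -1, -1):
        Next[j] = Next[j + 1][:]                   # inherit the answers from the right
        if A[j] in pos:
            Next[j][pos[A[j]]] = j
    return Next

def shortest_window_via_next(A, B):
    Next = build_next(A, B)
    best = None
    for j1 in range(len(A)):
        j2 = max(Next[j1])                         # candidate j'' = max_i Next[j'][i]
        if j2 != math.inf and (best is None or j2 - j1 < best[1] - best[0]):
            best = (j1, j2)
    return best

print(shortest_window_via_next([1, 2, 9, 6, 7, 8, 1, 0, 0, 6], [1, 8, 6]))  # (3, 6), i.e. (4, 7) 1-based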

Just run a sliding window that maintains the invariant of including all elements of B. That's O(n) with a hashmap.
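For illustration, a minimal sketch of that sliding window (0-based indices; the code and names are mine, not a reference implementation):

from collections import defaultdict

def shortest_window_sliding(A, B):
    # Smallest window A[lo..hi] (inclusive) that contains every element of B.
    need = set(B)
    count = defaultdict(int)   # multiplicity of each needed value inside the window
    covered = 0                # how many distinct values of B are currently in the window
    best = None
    lo = 0
    for hi, a in enumerate(A):
        if a in need:
            count[a] += 1
            if count[a] == 1:
                covered += 1
        while covered == len(need):
            if best is None or hi - lo < best[1] - best[0]:
                best = (lo, hi)
            # shrink from the left for as long as the window still covers B
            if A[lo] in need:
                count[A[lo]] -= 1
                if count[A[lo]] == 0:
                    covered -= 1
            lo += 1
    return best

print(shortest_window_sliding([1, 2, 9, 6, 7, 8, 1, 0, 0, 6], [1, 8, 6]))  # (3, 6), i.e. (4, 7) 1-based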


Use FFT to find all possible fixed-size subset sums

I need to solve the following problem: given an integer sequence x of size N, and a subset size k, find all the possible subset sums. A subset sum is the sum of elements in the subset.
If elements in x are allowed to appear many times (up to k of course) in a subset (sub-multiset), this problem has a pseudo polynomial time solution via FFT. Here is an example:
x = [0, 1, 2, 3, 6]
k = 4
xFrequency = [1, 1, 1, 1, 0, 0, 1] # On the support of [0, 1, 2, 3, 4, 5, 6]
sumFrequency = selfConvolve(xFrequency, times = 4) # A fast approach is to simply raise the power of the Fourier series.
sumFrequency > 0 # Gives a boolean vector indicating all possible size-k subset sums.
But what can be done if an element cannot show up multiple times in a subset?
I came up with the following method but am unsure of its correctness. The idea is to first find the frequencies of sums that are produced by adding at least 2 identical elements:
y = [0, 2, 4, 6, 12] # = [0, 1, 2, 3, 6] + [0, 1, 2, 3, 6]
yFrequency = [0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1]
sumFrequencyWithRedundancy = convolve(yFrequency, xFrequency, xFrequency)
My reasoning is that since y represents all possible sums of 2 identical elements, then every sum in y + x + x is guaranteed to have been produced by adding at least 2 identical elements. Finally
sumFrequencyNoRedundancy = sumFrequency - sumFrequencyWithRedundancy
sumFrequencyNoRedundancy > 0
Any mistake or any other established method for solving the problem?
Thanks!
Edits:
After some tests, it does not work. There turn out to be many more combinations that should be excluded from sumFrequency besides sumFrequencyWithRedundancy, and the combinatorial analysis seems to escalate rapidly with k, eventually making it less efficient than brute-force summation.
My motivation was to find all possible sample sums given sampling without replacement and the sample size. Then I came across the idea of solving the standard subset sum problem via FFT --- with no constraint on the subset size and no need to recover the qualifying subsets themselves. The reference materials can be easily found online; the method is basically a divide-and-conquer approach:
Divide the superset into 2 sets, left and right.
Compute all possible subset sums in the left and right sets. The sums are represented by 2 boolean vectors.
Convolve the 2 boolean vectors.
Find if the target sum is indicated in the final boolean vector.
You can see why the algorithm works for the standard subset sum problem.
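In code, that recipe looks roughly like the following sketch (my own code, assuming non-negative integers; numpy's convolve stands in here for an FFT-based convolution):

import numpy as np

def achievable_subset_sums(x):
    # 0/1 vector v with v[s] == 1 iff some subset of x sums to s.
    if len(x) == 1:
        v = np.zeros(x[0] + 1)
        v[0] = v[x[0]] = 1                        # either skip the single element or take it
        return v
    left = achievable_subset_sums(x[: len(x) // 2])
    right = achievable_subset_sums(x[len(x) // 2:])
    return (np.convolve(left, right) > 0).astype(float)

print(np.nonzero(achievable_subset_sums([0, 1, 2, 3, 6]))[0])  # all achievable subset sums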
If anyone can let me know some work on how to find all possible size-k subset sums, I would really appreciate it!
Given k and the n-element array x, it suffices to evaluate the degree-k coefficient in z of the polynomial
∏_{i=1}^{n} (1 + y^{x[i]} z).
This coefficient is a polynomial in y where the exponents with nonzero coefficients indicate the sums that can be formed using exactly k distinct terms.
One strategy is to split x with reasonably balanced sums, evaluate each half mod z^(k+1), and then multiply using the school algorithm for the outer multiplications and FFT (or whatever) for the inner. This should end up costing roughly O(k^2 S log^2 S).
The idea for evaluating elementary symmetric polynomials efficiently is due to Ben-Or.
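For illustration, here is a direct (non-FFT) way to read off that coefficient, as a small dynamic program over (number of elements used, sum). This is only a sketch for non-negative integers; the split-and-multiply strategy above is the asymptotically faster route.

def k_subset_sums(x, k):
    # dp[j][s] is True iff some j distinct elements of x sum to s;
    # dp[k] is exactly the coefficient of z^k, viewed as a 0/1 polynomial in y.
    total = sum(x)
    dp = [[False] * (total + 1) for _ in range(k + 1)]
    dp[0][0] = True
    for xi in x:
        for j in range(k, 0, -1):                # descending j: each element used at most once
            for s in range(xi, total + 1):
                if dp[j - 1][s - xi]:
                    dp[j][s] = True
    return [s for s in range(total + 1) if dp[k][s]]

print(k_subset_sums([0, 1, 2, 3, 6], 4))  # [6, 9, 10, 11, 12]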

Maximum Sum for Subarray with fixed cutoff

I have a list of integers, and I need to find a way to get the maximum sum of a subset of them, adding elements to the total until the sum is equal to (or greater than) a fixed cutoff. I know this seems similar to the knapsack, but I was unsure whether it was equivalent.
Sorting the array and greedily adding the largest remaining element until the sum reaches the cutoff does not work. Observe the following list:
list = [6, 5, 4, 4, 4, 3, 2, 2, 1]
cutoff = 15
For this list, doing it the naive way results in a sum of 15, which is very sub-optimal. As far as I can see, the maximum you could arrive at using this list is 20, by adding 4 + 4 + 4 + 2 + 6. If this is just a different version of knapsack, I can just implement a knapsack solution, as I probably have small enough lists to get away with this, but I'd prefer to do something more efficient.
First of all, in any such sum you cannot produce a worse result by adding the largest element last. So there is no harm in assuming, as a first step, that the elements are sorted from smallest to largest.
And now you use a dynamic programming approach similar to the usual subset sum.
def best_cutoff_sum(cutoff, elements):
    elements = sorted(elements)
    # Map each reachable sum to the path (nested list of elements) that produced it.
    sums = {0: None}
    for e in elements:
        next_sums = {}
        for v, path in sums.items():
            next_sums[v] = path
            if v < cutoff:               # only sums still below the cutoff may keep growing
                next_sums[v + e] = [e, path]
        sums = next_sums
    best = max(sums.keys())
    return (best, sums[best])

print(best_cutoff_sum(15, [6, 5, 4, 4, 4, 3, 2, 2, 1]))
With a little work you can turn the path from the nested list it currently is into whatever format you want.
If your list of non-negative elements has n elements, your cutoff is c and your maximum value is v, then this algorithm will take O(n * (c + v)) time.

Segments with most points algorithm analysis

We define x_1, x_2, ..., x_n to be a sequence of points (numbers) and [s_i, t_i], for 1 ≤ i ≤ n, to be a set of n segments. Point x_j is inside segment i if s_i ≤ x_j ≤ t_i. I want to find the segment with the most points.
Now to solve this, I am thinking we can sort x and the intervals based on s. Keep a separate array, T, such that T[i] = the number of points inside segment i. Initialize all the values in this array to 0. Then, for each x, check all the intervals that fit the constraint and increment T[i] accordingly.
This in the worst case scenario can take O(n^2). But I feel like I have a lot of redundancy here. How do I make this more efficient?
Just to clarify: your problem is one-dimensional, the points in X (x_1 to x_n) are numbers, and the segments are intervals.
You can easily solve this by sorting X and using the resulting indices. You can efficiently calculate the number of points within a segment [s, t] by finding the two corresponding indices i and j. Find (using binary search or whatever is most efficient) i such that x_i < s <= x_(i+1), and j such that x_j <= t < x_(j+1). Note the inequalities (in case s or t might be in X). The number of points within [s, t] is equal to j - i.
If it is possible that s < x_1 or t > x_n, simply append a point to both ends of X (a minimum and a maximum).
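A minimal sketch of that approach (my own code and names, using Python's bisect; it returns the index of the best segment and its point count):

import bisect

def segment_with_most_points(X, segments):
    xs = sorted(X)
    best_count, best_index = -1, -1
    for idx, (s, t) in enumerate(segments):
        # number of points x with s <= x <= t
        count = bisect.bisect_right(xs, t) - bisect.bisect_left(xs, s)
        if count > best_count:
            best_count, best_index = count, idx
    return best_index, best_count

print(segment_with_most_points([1, 4, 7, 10], [(0, 5), (3, 11), (8, 9)]))  # (1, 3)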
This has complexity O(n log n), limited by the sorting algorithm. If you can use something like counting sort that uses the values as indices into an array (or keys into a multiset), then you can improve on that by doing some more work.
Let S be the set of points containing every s and every t for all the segments [s, t]. The idea is to build an indexing array for X (kind of like for a counting sort).
First, build the array A such that A[x in X] = 1 and A[x not in X] = 0. Then, go through it again to build the array A_less such that A_less[i] equals the sum of all A[j] with j < i.
For example, if A = [1, 0, 0, 1, 0, 1, 0], then A_less = [0, 1, 1, 1, 2, 2, 3]. You can build this array using a simple counter.
You can now refer directly to this array to get the number of points whose values are less than another value. In the previous example, there are clearly three points in X, with values 0, 3, and 5. By referring to A_less, you can know that there are A_less[4] = 2 points with values strictly less than 4.
Similarly, build A_less_equal such that A_less_equal[i] equals the sum of all A[j] with j <= i. Using the same example, A_less_equal = [1, 1, 1, 2, 2, 3, 3].
Now, for any segment [s, t], you can get the number of points it contains by computing A_less_equal[t] - A_less[s]. All of that has complexity O(n).
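A small sketch of building those arrays (my own helper names; it assumes the point values are small non-negative integers usable as indices):

def build_prefix_counts(X, size):
    A = [0] * size                 # A[v] = number of points with value exactly v
    for x in X:
        A[x] += 1
    A_less = [0] * size            # A_less[i] = number of points with value < i
    A_less_equal = [0] * size      # A_less_equal[i] = number of points with value <= i
    running = 0
    for i in range(size):
        A_less[i] = running
        running += A[i]
        A_less_equal[i] = running
    return A_less, A_less_equal

A_less, A_less_equal = build_prefix_counts([0, 3, 5], 7)
print(A_less)         # [0, 1, 1, 1, 2, 2, 3]
print(A_less_equal)   # [1, 1, 1, 2, 2, 3, 3]
# points inside [s, t] = A_less_equal[t] - A_less[s]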
If your points are not integers (are at least, not easily usable as indices), then you can still use the same idea, replacing the arrays with sorted sets, the keys of which are every value in X or S (you need to add the values in S to be able to look them up at the end).

How to partition an array of integers in a way that minimizes the maximum of the sum of each partition?

The inputs are an array A of non-negative integers and another integer K.
We should partition A into K blocks of consecutive elements (by "partition" I mean that every element of A belongs to some block and 2 different blocks don't contain any element in common).
We define the sum of a block as sum of the elements of the block.
The goal is to find a partition into K blocks such that the maximum of the block sums (let's call it "MaxSumBlock") is minimized.
We need to output the MaxSumBlock (we don't need to find an actual partition)
Here is an example:
Input:
A = {2, 1, 5, 1, 2, 2, 2}
K = 3
Expected output:
MaxSumBlock: 6
(with partition: {2, 1}, {5, 1}, {2, 2, 2})
In the expected output, the sums of each block are 3, 6 and 6. The max is 6.
Here is a non-optimal partition:
partition: {2, 1}, {5}, {1, 2, 2, 2}
The sums of the blocks in that case are 3, 5 and 7. The max is hence 7. It is not a correct answer.
What algorithm solves this problem?
EDIT: K and the size of A are no bigger than 100'000. Each element of A is no bigger than 10'000.
Use binary search.
Let the candidate max sum range from 0 to sum(A), and let mid be the middle of the current range. Check in O(n) time whether mid can be achieved by partitioning into K blocks. If yes, continue with the lower half of the range; if not, with the upper half.
This will give you the result in O(n log(sum(A))).
PS: if you have any problem with writing the code, I can help but I'd suggest you try it first yourself.
EDIT:
as requested, I'll explain how to find if mid can be achieved by partitioning into k sets in O(n) time.
Iterate through the elements, keeping a running sum as long as it stays less than or equal to mid. As soon as adding an element would push it above mid, start a new set with that element. If you end up with k or fewer sets, mid is achievable; otherwise it is not.
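Here is a rough sketch of the whole thing (my own code; note that the lower end of the search range starts at max(A), since no block can sum to less than the largest single element):

def min_max_block_sum(A, K):
    def feasible(limit):
        # Greedy check: can A be cut into at most K consecutive blocks, each summing to <= limit?
        blocks, current = 1, 0
        for a in A:
            if current + a > limit:
                blocks += 1
                current = a
            else:
                current += a
        return blocks <= K

    lo, hi = max(A), sum(A)
    while lo < hi:
        mid = (lo + hi) // 2
        if feasible(mid):
            hi = mid        # mid works, try smaller
        else:
            lo = mid + 1    # mid is too small, go higher
    return lo

print(min_max_block_sum([2, 1, 5, 1, 2, 2, 2], 3))  # 6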

From a given number, determine three close numbers whose product is the original number

I have a number n, and I want to find three numbers whose product is n but are as close to each other as possible. That is, if n = 12 then I'd like to get 3, 2, 2 as a result, as opposed to 6, 1, 2.
Another way to think of it is that if n is the volume of a cuboid then I want to find the lengths of the sides so as to make the cuboid as much like a cube as possible (that is, the lengths as similar as possible). These numbers must be integers.
I know there is unlikely to be a perfect solution to this, and I'm happy to use something which gives a good answer most of the time, but I just can't think where to go with coming up with this algorithm. Any ideas?
Here's my first algorithm sketch, granted that n is relatively small:
Compute the prime factors of n.
Pick out the three largest and assign them to f1, f2, f3. If there are fewer than three factors, assign 1.
Loop over remaining factors in decreasing order, multiply them into the currently smallest partition.
Edit
Let's take n=60.
Its prime factors are 5 3 2 2.
Set f1=5, f2=3 and f3=2.
The remaining 2 is multiplied to f3, because it is the smallest.
We end up with 5 * 4 * 3 = 60.
Edit
This algorithm will not find the optimum; notice btilly's comment:
Consider 17550 = 2 * 3 * 3 * 3 * 5 * 5 * 13. Your algorithm would give 15, 30, 39 when the best is 25, 26, 27.
Edit
Ok, here's my second algorithm sketch with a slightly better heuristic:
Set the list L to the prime factors of n.
Set r to the cube root of n.
Create the set of three factors F, initially set to 1.
Iterate over the prime factors in descending order:
Try to multiply the current factor L[i] with each of the factors in descending order.
If the result is less than r, perform the multiplication and move on to the next
prime factor.
If not, try the next F. If out of Fs, multiply with the smallest one.
This will work for the case of 17550:
n=17550
L=13,5,5,3,3,3,2
r=25.98
F = { 1, 1, 1 }
Iteration 1:
F[0] * 13 is less than r, set F to {13,1,1}.
Iteration 2:
F[0] * 5 = 65 is greater than r.
F[1] * 5 = 5 is less than r, set F to {13,5,1}.
Iteration 3:
F[0] * 5 = 65 is greater than r.
F[1] * 5 = 25 is less than r, set F to {13,25,1}.
Iteration 4:
F[0] * 3 = 39 is greater than r.
F[1] * 3 = 75 is greater than r.
F[2] * 3 = 3 is less than r, set F to {13,25,3}.
Iteration 5:
F[0] * 3 = 39 is greater than r.
F[1] * 3 = 75 is greater than r.
F[2] * 3 = 9 is less than r, set F to {13,25,9}.
Iteration 6:
F[0] * 3 = 39 is greater than r.
F[1] * 3 = 75 is greater than r.
F[2] * 3 = 27 is greater than r, but it is the smallest F we can get. Set F to {13,25,27}.
Iteration 7:
F[0] * 2 = 26 is greater than r, but it is the smallest F we can get. Set F to {26,25,27}.
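Here is a rough Python sketch of this second heuristic (my own code and helper names, including a naive prime_factors; like the hand trace above, it is a heuristic and not guaranteed to be optimal in general):

def prime_factors(n):
    factors, d = [], 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors

def three_close_factors(n):
    L = sorted(prime_factors(n), reverse=True)    # prime factors in descending order
    r = n ** (1.0 / 3.0)                          # cube root of n
    F = [1, 1, 1]
    for p in L:
        # try the current slots in descending order of their value
        for i in sorted(range(3), key=lambda i: F[i], reverse=True):
            if F[i] * p < r:
                F[i] *= p
                break
        else:
            # no slot stays below r: multiply into the slot giving the smallest product
            i = min(range(3), key=lambda i: F[i] * p)
            F[i] *= p
    return sorted(F)

print(three_close_factors(60))     # [3, 4, 5]
print(three_close_factors(17550))  # [25, 26, 27]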
Here's a purely math based approach, that returns the optimal solution and does not involve any kind of sorting. Hell, it doesn't even need the prime factors.
Background:
1) Recall that for a monic cubic polynomial
p(x) = x^3 + a x^2 + b x + c
the sum and product of the roots are given by Vieta's formulas:
x_1 + x_2 + x_3 = -a,  x_1 x_2 + x_1 x_3 + x_2 x_3 = b,  x_1 x_2 x_3 = -c,
where x_i are the roots.
2) Recall another elementary result from optimization theory: if two variables have a constant product, x y = P, then their sum x + y is minimized when x̃ = ỹ = √P,
i.e., given two variables such that their product is a constant, the sum is minimum when the two variables are equal to each other. The tilde variables denote the optimal values.
A corollary of this would be that if the sum of two variables whose product is constant is a minimum, then the two variables are equal to each other.
Reformulate the original problem:
Your question above can now be reformulated as a polynomial root-finding exercise. We'll construct a polynomial that satisfies your conditions, and the roots of that polynomial will be your answer. If you need k numbers that are optimal, you'll have a polynomial of degree k. In this case, we can talk in terms of a cubic equation
p(x) = x^3 + a x^2 + b x + c = 0.
We know that:
c is the negative of the input number (assume positive)
a is an integer and negative (since factors are all positive)
b is an integer (which is the sum of the roots taken two at a time) and is positive.
Roots of p must be real (and positive, but that has already been addressed).
To solve the problem, we simply need to maximize a subject to the above set of conditions. The only part not explicitly known right now, is condition 4, which we can easily enforce using the discriminant of the polynomial.
For a cubic polynomial p, the discriminant is
∆ = 18 a b c − 4 a^3 c + a^2 b^2 − 4 b^3 − 27 c^2,
and p has real and distinct roots if ∆ > 0 and real and coincident (either two or all three) roots if ∆ = 0. So, constraint 4 now reads ∆ >= 0. This is now simple and easy to program.
Solution in Mathematica
Here's a solution in Mathematica that implements this.
And here's a test on some of the numbers used in other answers/comments.
The column on the left is the list and the corresponding row in the column on the right gives the optimal solution.
NOTE:
I just noticed that OP never mentioned that the 3 numbers needed to be integers although everyone (including myself until now) assumed that they were (probably because of his first example). Re-reading the question, and going by the cube example, it doesn't seem like OP was fixated on integers.
This is an important point which will decide which class of algorithms to pursue and needs to be defined. If they need not be integers, there are several polynomial based solutions that can be provided, one of which is mine (after relaxing the integer constraint). If they should be integers, then perhaps an approach using branch-n-bound/branch-n-cut/cutting plane might be more appropriate.
The following was written assuming the OP meant the three numbers to be integers.
The way I've implemented it right now, it can give a non-integer solution in certain cases.
The reason this gives non-integer solutions for x is because I had only maximized a, when actually, b also needs to be minimum (not only that, but also because I haven't placed a constraint on the x_i being integers. It is possible to use the integer root theorem, which would involve finding the prime factors, but makes things more complicated.)
Mathematica code in text
Clear[poly, disc, f]
poly = x^3 + a x^2 + b x + c;
disc = Discriminant[poly, x];
f[n_Integer] :=
Module[{p, \[CapitalDelta] = disc /. c -> -n},
p = poly /.
Maximize[{a, \[CapitalDelta] >= 0,
b > 0 && a < 0 && {a, b} \[Element] Integers}, {a, b}][[
2]] /. c -> -n;
Solve[p == 0]
]
There may be a clever way to find the tightest triplet, as Anders Lindahl is pursuing, but I will focus on a more basic approach.
If I generate all triplets, then I can filter them afterward however I want, so I will start there. The best way I know to generate these uses recursion:
f[n_, 1] := {{n}}
f[n_, k_] := Join @@
  Table[
    {q, ##} & @@@ Select[f[n/q, k - 1], #[[1]] >= q &],
    {q, #[[2 ;; ⌈ Length@#/k ⌉ ]] & @ Divisors @ n}
  ]
This function f takes two integer arguments, the number to factor n, and the number of factors to produce k.
The section #[[2 ;; ⌈ Length@#/k ⌉ ]] & @ Divisors @ n uses Divisors to produce a list of all divisors of n (including 1), and then takes from these the elements from the second (to drop the 1) up to the Ceiling of the number of divisors divided by k.
For example, for {n = 240, k = 3} the output is {2, 3, 4, 5, 6, 8}
The Table command iterates over this list while accumulating results, assigning each element to q.
The body of the Table is Select[f[n/q, k - 1], #[[1]] >= q &]. This calls f recursively, and then selects from the result all lists that begin with a number >= q.
{q, ##} & @@@ (also in the body) then "prepends" q to each of these selected lists.
Finally, Join @@ merges the lists of these selected lists that are produced by each loop of Table.
The result is all of the integer factors of n into k parts, in lexicographical order. Example:
In[]:= f[240, 3]
Out[]= {{2, 2, 60}, {2, 3, 40}, {2, 4, 30}, {2, 5, 24}, {2, 6, 20},
{2, 8, 15}, {2, 10, 12}, {3, 4, 20}, {3, 5, 16}, {3, 8, 10},
{4, 4, 15}, {4, 5, 12}, {4, 6, 10}, {5, 6, 8}}
With the output of the function/algorithm given above, one can then test triplets for quality however desired.
Notice that because of the ordering the last triplet in the output is the one with the greatest minimum factor. This will usually be the most "cubic" of the results, but occasionally it is not.
If the true optimum must be found, it makes sense to test starting from the right side of the list, abandoning the search if a better result is not found quickly, as the quality of the results decrease as you move left.
Obviously this method relies upon a fast Divisors function, but I presume that this is either a standard library function, or you can find a good implementation here on StackOverflow. With that in place, this should be quite fast. The code above finds all triplets for n from 1 to 10,000 in 1.26 seconds on my machine.
Instead of reinventing the wheel, one should recognize this as a variation of a well known NP-complete problem.
Compute the prime factors of n.
Compute the logarithms of these factors
The problem translates as partitioning these logs into three sums that are as close as possible.
This problem is a variation of the Bin Packing problem, known as Multiprocessor Scheduling.
Given the fact that the Multiprocessor scheduling problem is NP-complete, it's no wonder that it's hard to find an algorithm that does not search the whole problem space and finds the optimum solution.
But I guess there are already several algorithms that deal with either Bin-Packing or Multiprocessor-Scheduling and find near-optimum solutions in efficient manner.
Another related problem (generalization) is the Job shop scheduling. See the wikipedia description with many links to known algorithms.
What Wikipedia describes as the often-used LPT algorithm (Longest Processing Time) is exactly what Anders Lindahl came up with first.
EDIT
Here's a shorter explanation using more efficient code; KSetPartitions simplifies things considerably, as did some suggestions from Mr.W. The overall logic remains the same.
Assuming there are at least 3 prime factors of n:
Find the list of triplet KSetPartitions for the prime factors of n.
Multiply each of the elements (prime factors) within each subset to produce all possible combinations for three divisors of n (when multiplied they yield n). You can think of the divisors as the length, width and height of an orthogonal parallelepiped.
The parallelepiped closest to a cube will have the shortest space diagonal. Sum the squares of the three divisors for each case and pick the smallest.
Here's the code in Mathematica:
Needs["Combinatorica`"]
g[n_] := Module[{factors = Join @@ ConstantArray @@@ FactorInteger[n]},
  Sort[Union[Sort /@ Apply[Times, Union[Sort /@
       KSetPartitions[factors, 3]], {2}]]
    /. {a_Integer, b_Integer, c_Integer} :>
     {Total[Power[{a, b, c}, 2]], {a, b, c}}][[1, 2]]]
It can handle fairly large numbers, but slows down considerably as the number of factors of n grows. The examples below show timings for 240, 2400, ...24000000.
This could be sped up in principle by taking into account cases where a prime factor appears more than once in a divisor. I don't have the know-how to do it yet.
In[28]:= g[240]
Out[28]= {5, 6, 8}
In[27]:= t = Table[Timing[g[24*10^n]][[1]], {n, 6}]
Out[27]= {0.001868, 0.012734, 0.102968, 1.02469, 10.4816, 105.444}
