Assume I have two arrays, both of them are sorted, for example:
A: [1, 4, 5, 8, 10, 24]
B: [3, 6, 9, 29, 50, 65]
And then I merge these two array into one array and keep original relative order of both two array
C: [1, 4, 3, 5, 6, 9, 8, 29, 10, 24, 50, 65]
Is there any way to split C into two sorted array in O(n) time?
note: not necessarily into the original A and B
Greedily assign your integers to list 1 if they can go there. If they can't, assign them to list 2.
Here's some Ruby code to play around with this idea. It randomly splits the integers from 0 to n-1 into two sorted lists, then randomly merges them, then applies the greedy approach.
def f(n)
split1 = []
split2 = []
0.upto(n-1) do |i|
if rand < 0.5
split1.append(i)
else
split2.append(i)
end
end
puts "input 1: #{split1.to_s}"
puts "input 2: #{split2.to_s}"
merged = []
split1.reverse!
split2.reverse!
while split1.length > 0 && split2.length > 0
if rand < 0.5
merged.append(split1.pop)
else
merged.append(split2.pop)
end
end
merged += split1.reverse
merged += split2.reverse
puts "merged: #{merged.to_s}"
merged.reverse!
greedy1 = [merged.pop]
greedy2 = []
while merged.length > 0
if merged[-1] >= greedy1[-1]
greedy1.append(merged.pop)
else
greedy2.append(merged.pop)
end
end
puts "greedy1: #{greedy1.to_s}"
puts "greedy2: #{greedy2.to_s}"
end
Here's sample output:
> f(20)
input 1: [2, 3, 4, 5, 8, 9, 10, 18, 19]
input 2: [0, 1, 6, 7, 11, 12, 13, 14, 15, 16, 17]
merged: [2, 0, 1, 6, 3, 4, 5, 8, 9, 7, 10, 11, 18, 12, 13, 19, 14, 15, 16, 17]
greedy1: [2, 6, 8, 9, 10, 11, 18, 19]
greedy2: [0, 1, 3, 4, 5, 7, 12, 13, 14, 15, 16, 17]
> f(20)
input 1: [1, 3, 5, 6, 8, 9, 10, 11, 13, 15]
input 2: [0, 2, 4, 7, 12, 14, 16, 17, 18, 19]
merged: [0, 2, 4, 7, 12, 14, 16, 1, 3, 5, 6, 8, 17, 9, 18, 10, 19, 11, 13, 15]
greedy1: [0, 2, 4, 7, 12, 14, 16, 17, 18, 19]
greedy2: [1, 3, 5, 6, 8, 9, 10, 11, 13, 15]
> f(20)
input 1: [0, 1, 2, 6, 7, 9, 11, 14, 15, 18]
input 2: [3, 4, 5, 8, 10, 12, 13, 16, 17, 19]
merged: [3, 4, 5, 8, 10, 12, 0, 13, 16, 17, 1, 19, 2, 6, 7, 9, 11, 14, 15, 18]
greedy1: [3, 4, 5, 8, 10, 12, 13, 16, 17, 19]
greedy2: [0, 1, 2, 6, 7, 9, 11, 14, 15, 18]
Let's take your example.
[1, 4, 3, 5, 6, 9, 8, 29, 10, 24, 50, 65]
In time O(n) you can work out the minimum of the tail.
[1, 3, 3, 5, 6, 8, 8, 10, 10, 24, 50, 65]
And now the one stream is all cases where it is the minimum, and the other is the cases where it isn't.
[1, 3, 5, 6, 8, 10, 24, 50, 65]
[ 4, 9, 29, ]
This is all doable in time O(n).
We can go further and now split into 3 streams based on which values in the first stream could have gone in the last without changing it being increasing.
[ 3, 5, 6, 8, 10, 24, ]
[1, 5, 6, 8, 50, 65]
[ 4, 9, 29, ]
And now we can start enumerating the 2^6 = 64 different ways of splitting the original stream back into 2 increasing streams.
I am trying to solve this problem but I can't manage to figure out how.
Let's suppose I have a list of positive and negative numbers whose sum is guaranteed to be 0.
[-10, 1, 2, 20, 5, -100, -80, 10, 15, 15, 60, 100, -20, -18]
I want to obtain a list with the largest number of sub-sets, using all the elements of the initial list only once. And each subset must have the sum 0.
So in the case of this simple input:
[-5, -4, 5, 2, 3, -1]
The best results that can be obtained are:
1. [[-5, 5], [[-4, -1, 2, 3]]] #2 subsets
2. [[-5, 2, 3], [-4, -1, 5]] #2 subsets
These, for example, would be totally wrong answers:
1. [[-5, -4, -1, 2, 3, 5]] #1 subsets that is the initial list, NO
2. [[-5,5]] #1 subset, and not all elements are used, NO
Even if it's NP-Complete, how can I manage to solve it even with a brute-force approach? I just need a solution for small list of numbers.
def get_subsets(lst):
N = len(lst)
cand = []
dp = [0 for x in range(1<<N)] # maximum number of subsets using elements represented by bitset
back = [0 for x in range(1<<N)]
# Section 1
for i in range(1,1<<N):
cur = 0
for j in range(N):
if i&(1<<j):
cur += lst[j]
if not cur:
cand.append(i) # if subset sums to 0, it's viable
dp[0] = 1
# Section 2
for i in range(1<<N):
while cand and cand[0] <= i:
cand.pop(0)
if not dp[i]:
continue
for j in cand:
if i&j: # if subsets intersect, it cannot be added
continue
if dp[i]+1 > dp[i|j]:
back[i|j] = j
dp[i|j] = dp[i]+1
# Section 3
ind = dp.index(max(dp))
res = []
while back[ind]:
cur = []
for i in range(N):
if back[ind]&(1<<i):
cur.append(lst[i])
res.append(cur)
ind = ind^back[ind]
return res
print (get_subsets([-5, -4, 5, 2, 3, -1]))
Basically, this solution collects all subsets of the original list that can sum to zero, then attempts to merge as many of them together as possible without colliding. It runs in worst-case O(2^{2N}) time, where N is the length of the list, but it should hit an average case of around O(2^N), since there typically shouldn't be too many subsets summing to 0.
EDIT: I added sections to facilitate explanation of the algorithm
Section 1: I iterate through all possible 2^N-1 nonempty subsets of the original list, and check which of these subsets sum to 0; any viable zero-sum subsets are added to the list cand (represented as an integer in the range [1,2^N-1] with bits set at the indices making up the subset).
Section 2: dp is a dynamic programming table storing the maximum number of subsets summing to 0 that can be formed using the subset represented by the integer i at dp[i]. Initially, all entries of dp are set to 0 except dp[0] = 1, since the empty set has a sum of 0. Then I iterate through each subset from 0 to 2^N-1, and I run through the list of candidate subsets and attempt to merge the two subsets.
Section 3: This is just backtracking to find the answer: while filling in dp, I also kept an array back that stores the most recent subset added to achieve the subset i at back[i]. So I find the subset that maximizes the number of sub-subsets summing to 0 with ind = dp.index(max(dp)), and then I backtrack from there, shrinking the subset by removing the most recently added subset until I finally arrive back to the empty set.
This problem is NP-complete, since it is a combination of two NP-complete problems:
finding a single subset whose sum is 0 is known as the subset sum problem
when you find all the subsets whose sum is 0, you have to solve an exact cover problem with a special condition: you want to maximize the number of subsets.
The following steps will provide a solution:
use dynamic programming to find the subsets whose sum is 0 ('https://en.wikipedia.org/wiki/Subset_sum_problem#Pseudo-polynomial_time_dynamic_programming_solution)
to maximize the number of subsets, one would use D. Knuth's Algorithm X to find the exact cover.
A few remarks:
First, we know that there is an exact cover because the list of numbers has a sum of 0.
Second, we can use only the subsets that are not supersets of any other subset. Because, if A is a superset of X (both sum to 0), A can't be in the cover that has the largest number of subsets. Let A, B, C, ... be the cover with the maximum number of subsets, then we can replace A by X and A\X (it is trivial to see that the sum of A\X elements is 0) and we get the cover X, A\X, B, C, ... that is better.
Third, when we use Algorithm X, all paths in the search tree will lead to a success. Let A, B, C, ... be a path composed of non overlapping subsets having each a sum of 0. Then the complent has also a sum of 0 (which may be a superset of another subset, and then we'll use 2.).
As you see, nothing new here, and I will use only well known techniques/algorithms.
Find the subsets having a sum of 0.
The algorithm is well known. Here's a Python implementation based on Wikipedia explanations
class Q:
def __init__(self, values):
self.len = len(values)
self.min = sum(e for e in values if e <= 0)
self.max = sum(e for e in values if e >= 0)
self._arr = [False] * self.len * (self.max - self.min + 1)
def __getitem__(self, item):
index, v = item
return self._arr[v * self.len + index]
def __setitem__(self, item, value):
index, v = item
self._arr[v * self.len + index] = value
class SubsetSum:
def __init__(self, values):
self._values = values
self._q = Q(values)
def prepare(self):
for s in range(self._q.min, self._q.max + 1):
self._q[0, s] = (self._values[0] == s)
for i in range(self._q.len):
self._q[i, 0] = True
for i in range(1, self._q.len):
v = self._values[i]
for s in range(self._q.min, self._q.max + 1):
self._q[i, s] = (v == s) or self._q[i - 1, s] or self._q[
i - 1, s - v]
def subsets(self, target=0):
yield from self._subsets(self._q.len - 1, target, [])
def _subsets(self, i, target, p):
assert i >= 0
v = self._values[i]
c = self._q[i - 1, target]
b = self._q[i - 1, target - v]
if i == 0:
if target == 0:
if p:
yield p
elif self._q[0, target]:
yield p + [i]
else:
if self._q.min <= target - v <= self._q.max and self._q[
i - 1, target - v]:
yield from self._subsets(i - 1, target - v, p + [i])
if self._q[i - 1, target]:
yield from self._subsets(i - 1, target, p)
Here's how it works:
arr = [-10, 1, 2, 20, 5, -100, -80, 10, 15, 15, 60, 100, -20, -18]
arr = sorted(arr)
s = SubsetSum(arr)
s.prepare()
subsets0 = list(s.subsets())
print(subsets0)
Output:
[[13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0], [13, 12, 11, 10, 9, 7, 6, 5, 3, 2, 1, 0], [13, 12, 11, 10, 9, 4, 2, 1, 0], [13, 12, 11, 10, 8, 7, 4, 2, 1, 0], [13, 12, 11, 10, 8, 6, 5, 4, 3, 1, 0], [13, 12, 11, 10, 7, 2, 1, 0], [13, 12, 11, 10, 6, 5, 3, 1, 0], [13, 12, 11, 9, 8, 7, 4, 2, 1, 0], [13, 12, 11, 9, 8, 6, 5, 4, 3, 1, 0], [13, 12, 11, 9, 7, 2, 1, 0], [13, 12, 11, 9, 6, 5, 3, 1, 0], [13, 12, 11, 8, 7, 6, 5, 3, 1, 0], [13, 12, 11, 8, 4, 1, 0], [13, 12, 11, 1, 0], [13, 12, 10, 9, 8, 7, 6, 5, 4, 3, 1, 0], [13, 12, 10, 9, 8, 2, 1, 0], [13, 12, 10, 9, 7, 6, 5, 3, 1, 0], [13, 12, 10, 9, 4, 1, 0], [13, 12, 10, 8, 7, 4, 1, 0], [13, 12, 10, 7, 1, 0], [13, 12, 9, 8, 7, 4, 1, 0], [13, 12, 9, 7, 1, 0], [13, 11, 10, 8, 6, 5, 4, 3, 2, 0], [13, 11, 10, 6, 5, 3, 2, 0], [13, 11, 9, 8, 6, 5, 4, 3, 2, 0], [13, 11, 9, 6, 5, 3, 2, 0], [13, 11, 8, 7, 6, 5, 3, 2, 0], [13, 11, 8, 4, 2, 0], [13, 11, 7, 6, 5, 4, 3, 2, 1], [13, 11, 7, 6, 5, 4, 3, 0], [13, 11, 2, 0], [13, 10, 9, 8, 7, 6, 5, 4, 3, 2, 0], [13, 10, 9, 7, 6, 5, 3, 2, 0], [13, 10, 9, 4, 2, 0], [13, 10, 8, 7, 4, 2, 0], [13, 10, 8, 6, 5, 4, 3, 2, 1], [13, 10, 8, 6, 5, 4, 3, 0], [13, 10, 7, 2, 0], [13, 10, 6, 5, 3, 2, 1], [13, 10, 6, 5, 3, 0], [13, 9, 8, 7, 4, 2, 0], [13, 9, 8, 6, 5, 4, 3, 2, 1], [13, 9, 8, 6, 5, 4, 3, 0], [13, 9, 7, 2, 0], [13, 9, 6, 5, 3, 2, 1], [13, 9, 6, 5, 3, 0], [13, 8, 7, 6, 5, 3, 2, 1], [13, 8, 7, 6, 5, 3, 0], [13, 8, 4, 2, 1], [13, 8, 4, 0], [13, 7, 6, 5, 4, 3, 1], [13, 2, 1], [13, 0], [12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1], [12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 0], [12, 11, 10, 9, 8, 2, 0], [12, 11, 10, 9, 7, 6, 5, 3, 2, 1], [12, 11, 10, 9, 7, 6, 5, 3, 0], [12, 11, 10, 9, 4, 2, 1], [12, 11, 10, 9, 4, 0], [12, 11, 10, 8, 7, 4, 2, 1], [12, 11, 10, 8, 7, 4, 0], [12, 11, 10, 8, 6, 5, 4, 3, 1], [12, 11, 10, 7, 2, 1], [12, 11, 10, 7, 0], [12, 11, 10, 6, 5, 3, 1], [12, 11, 9, 8, 7, 4, 2, 1], [12, 11, 9, 8, 7, 4, 0], [12, 11, 9, 8, 6, 5, 4, 3, 1], [12, 11, 9, 7, 2, 1], [12, 11, 9, 7, 0], [12, 11, 9, 6, 5, 3, 1], [12, 11, 8, 7, 6, 5, 3, 1], [12, 11, 8, 4, 1], [12, 11, 1], [12, 10, 9, 8, 7, 6, 5, 4, 3, 1], [12, 10, 9, 8, 2, 1], [12, 10, 9, 8, 0], [12, 10, 9, 7, 6, 5, 3, 1], [12, 10, 9, 4, 1], [12, 10, 8, 7, 4, 1], [12, 10, 7, 1], [12, 9, 8, 7, 4, 1], [12, 9, 7, 1], [11, 10, 8, 6, 5, 4, 3, 2], [11, 10, 6, 5, 3, 2], [11, 9, 8, 6, 5, 4, 3, 2], [11, 9, 6, 5, 3, 2], [11, 8, 7, 6, 5, 3, 2], [11, 8, 4, 2], [11, 7, 6, 5, 4, 3], [11, 2], [10, 9, 8, 7, 6, 5, 4, 3, 2], [10, 9, 7, 6, 5, 3, 2], [10, 9, 4, 2], [10, 8, 7, 4, 2], [10, 8, 6, 5, 4, 3], [10, 7, 2], [10, 6, 5, 3], [9, 8, 7, 4, 2], [9, 8, 6, 5, 4, 3], [9, 7, 2], [9, 6, 5, 3], [8, 7, 6, 5, 3], [8, 4]]
Reduce the number of subsets
We have 105 subsets that sum to 0, but we can remove the subsets that are superset of other subsets. We need a function to find if a list of elements contains all elements in another list. In Python:
import collections
def contains(l1, l2):
"""
Does l1 contain all elements of l2?
"""
c = collections.Counter(l1)
for e in l2:
c[e] -= 1
return all(n >= 0 for n in c.values())
Now, we can remove the subsets that are supersets of another subset.
def remove_supersets(subsets):
subsets = sorted(subsets, key=len)
new_subsets = []
for i, s1 in enumerate(subsets):
for s2 in subsets[:i]: # smaller subsets
if contains(s1, s2):
break
else: # not a superset
new_subsets.append(s1)
return new_subsets
In our situation:
subsets0 = remove_supersets(subsets0)
print(len(subsets0))
Output:
[[13, 0], [11, 2], [8, 4], [13, 2, 1], [12, 11, 1], [10, 7, 2], [9, 7, 2], [12, 10, 7, 1], [12, 9, 7, 1], [10, 9, 4, 2], [10, 6, 5, 3], [9, 6, 5, 3], [12, 11, 10, 7, 0], [12, 11, 9, 7, 0], [12, 10, 9, 8, 0], [12, 10, 9, 4, 1], [8, 7, 6, 5, 3], [12, 11, 10, 9, 4, 0], [12, 10, 9, 8, 2, 1], [11, 7, 6, 5, 4, 3], [13, 7, 6, 5, 4, 3, 1]]
[[0, 2, 10, 6, 4], [0, 2, 10, 8, 1], [0, 2, 11, 5, 4], [0, 2, 11, 7, 1], [0, 16, 9, 4], [0, 16, 15, 1], [0, 18, 19], [3, 2, 12, 11], [3, 2, 13, 10], [3, 17, 16], [3, 19, 14], [20, 14, 1]]
We managed to reduce the number of subsets to 21, that is a good improvement since we need to explore all possibilities to find an exact cover.
Algorithm X
I do not use the dancing links here (I think that technique is we'll designed for low level languages like C, but you can implement them in Python if you want). We just need to keep track of the remaing subsets:
class Matrix:
def __init__(self, subsets, ignore_indices=set()):
self._subsets = subsets
self._ignore_indices = ignore_indices
def subset_values(self, i):
assert i not in self._ignore_indices
return self._subsets[i]
def value_subsets_indices(self, j):
return [i for i, s in self._subsets_generator() if j in s]
def _subsets_generator(self):
return ((i, s) for i, s in enumerate(self._subsets) if
i not in self._ignore_indices)
def rarest_value(self):
c = collections.Counter(
j for _, s in self._subsets_generator() for j in s)
return c.most_common()[-1][0]
def take_subset(self, i):
s = self._subsets[i]
to_ignore = {i2 for i2, s2 in self._subsets_generator() if
set(s2) & set(s)}
return Matrix(self._subsets,
self._ignore_indices | to_ignore)
def __bool__(self):
return bool(list(self._subsets_generator()))
And finally the cover function:
def cover(m, t=[]):
if m: # m is not empty
j = m.rarest_value()
for i in m.value_subsets_indices(j):
m2 = m.take_subset(i)
yield from cover(m2, t + [i])
else:
yield t
Finally, we have:
m = Matrix(subsets0)
ts = list(cover(m))
t = max(ts, key=len)
print([[arr[j] for j in subsets0[i]] for i in t])
Output:
[[100, -100], [10, -10], [15, 2, 1, -18], [15, 5, -20], [60, 20, -80]]
Below essentially the same idea as Michael Huang, with 30 more lines...
A solution with cliques
We can prebuild all subsets whose sum is 0.
Build subsets of 1 elem;
then of size 2 by reusing the previous ones
and keep those whose sum is zero along the way
Now say such subset is a node of a graph.
Then a node is in relation with another one iff their associated subset has no number in common.
We thus want to build the maximum clique of the graph:
In a clique, all nodes are in relation idem their subsets are disjoints
The maximum clique gives us the maximal number of subsets
function forall (v, reduce) {
const nexts = v.map((el, i) => ({ v: [el], i, s: el })).reverse()
while (nexts.length) {
const next = nexts.pop()
for (let i = next.i + 1; i < v.length; ++i) {
const { s, skip } = reduce(next, v[i])
if (!skip) {
nexts.push({ v: next.v.concat(v[i]), s: s, i })
}
}
}
}
function buildSubsets (numbers) {
const sums = []
forall(numbers, (next, el) => {
const s = next.s + el
if (s === 0) {
sums.push({ s, v: next.v.concat(el) })
return { s, skip: true }
}
return { s }
})
return sums
}
const bin2decs = bin => {
const v = []
const s = bin.toString(2)
for (let i = 0; i < s.length; ++i) {
if (intersects(dec2bin(i), bin)) {
v.push(i)
}
}
return v
}
const dec2bin = dec => Math.pow(2, dec)
const decs2bin = decs => decs.reduce((bin, dec) => union(dec2bin(dec), bin), 0)
// Set methods on int
const isIn = (a, b) => (a & b) === a
const intersects = (a, b) => a & b
const union = (a, b) => a | b
// if a subset contains another one, discard it
// e.g [1,2,4] should be discarded if [1,2] is present
const cleanSubsets = bins => bins.filter(big => bins.every(b => big === b || !isIn(b, big)))
function bestClique (decs) {
const cliques = []
forall(decs, (next, el) => {
if (intersects(next.s, el)) { return { skip: true } }
const s = union(next.s, el)
cliques.push({ s, v: next.v.concat(el) })
return { s }
})
return cliques.sort((a, b) => b.v.length - a.v.length)[0]
}
// in case we have duplicated numbers in the list,
// they are still uniq thanks to their id: i (idem position in the list)
const buildNumbers = v => v.map((n, i) => {
const u = new Number(n)
u.i = i
return u
})
function run (v) {
const numbers = buildNumbers(v)
const subs = buildSubsets(numbers)
const bins = subs.map(s => decs2bin(s.v.map(n => n.i)))
const clique = bestClique(cleanSubsets(bins))
const indexedSubs = clique.v.map(bin2decs)
const subsets = indexedSubs.map(sub => sub.map(i => numbers[i].valueOf()))
console.log('subsets', JSON.stringify(subsets))
}
run([1, -1, 2, -2])
run([-10, 1, 2, 20, 5, -100, -80, 10, 15, 15, 60, 100, -20, -18, 10, -10])
run([-5, -4, 5, 2, 3, -1])