Greedy algorithm to find potential weighted activities in order?

For example, let's say there's a list of activities {a, b, c, d, e, f, g}
a & b are worth 9 points each.
c is worth 8 points.
d & e are worth 6 points each.
f is worth 5 points.
g is worth 4 points.
The list of activities is already sorted by points descending.
I want to find the highest points combination of three activities (let's call this combination X) that fulfills certain requirements (such that F(X) = true).
F(X) accepts only a combination of three activities, and cannot be modified.
How can I generate X without first having to calculate all possible combinations?
How can I iterate through all the possible combinations in decreasing total points?
I want to be able to find the highest-point combination and test it. If it fails, generate the second-highest-point combination, and so on.
The example list has only a few items. However, the actual list can get so large that it would be impractical to generate all combinations.
How should I do this?

The following idea only solves an easier version of the problem, but maybe it can be extended to a more powerful solution.
Let [a(1)..a(N)] be the input array.
What I suggest here is a way to enumerate the top N^(1/3) triplets, out of the C(N,3) ~ N^3 ones in the complete enumeration. I know this is modest, but it guarantees O(N) space and O(N log N) time.
1. s = a(1) + a(2) + a(N^(1/3))
2. T = all triplets over the prefix [a(1), a(2), .., a(N^(1/3))] (there are C(N^(1/3), 3) = O(N) of them, so this takes O(N) time and space)
3. Sort T in descending order of triplet sum (sorting O(N) triplets takes O(N log N) time)
4. Iterate over T and return every triplet r while sum(r) >= s
Explanation:
In step (1) we compute s, an upper bound on the score of any triplet that uses an item outside the prefix [a(1)..a(N^(1/3))]. In other words, every triplet with score > s consists only of prefix items, so T already contains it. Therefore we generate all triplets over the prefix, sort them, and return only the ones we are sure about (i.e. the ones with score >= s). How many such triplets will be returned? That depends on the array values, but we can guarantee at least N^(1/3) - 2 triplets, since every triplet a(1) + a(2) + a(i) for 2 < i <= N^(1/3) has a sum >= s. In practice the number of "good" triplets may be much higher, but again, this depends on the distribution of the values.
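As an illustration, here is a minimal Python sketch of this idea (the function name and the exact choice of prefix size are mine; it assumes the scores are already sorted in descending order and that there are at least three of them):

from itertools import combinations

def top_triplets(a):
    # a: scores sorted descending, len(a) >= 3. Yields prefix triplets whose
    # sums are guaranteed to beat any triplet using an item outside the prefix.
    m = min(len(a), max(3, round(len(a) ** (1 / 3))))  # prefix size, about N^(1/3)
    s = a[0] + a[1] + a[m - 1]  # bound on any triplet touching items beyond the prefix
    scored = sorted(((a[i] + a[j] + a[k], (i, j, k))
                     for i, j, k in combinations(range(m), 3)),
                    reverse=True)  # O(N) triplets, O(N log N) sort
    for total, idxs in scored:
        if total < s:
            break
        yield idxs, total

scores = [9, 9, 8, 6, 6, 5, 4]
for idxs, total in top_triplets(scores):
    print(idxs, total)  # (0, 1, 2) 26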

Here is a way to generate the combinations of three in order:

for (int i = 0; i < Set.size(); i++) {
    for (int j = i + 1; j < Set.size(); j++) {
        for (int k = j + 1; k < Set.size(); k++) {
            if (F(Set[i], Set[j], Set[k])) {
                // your code to find the max
            }
        }
    }
}

I do not think there is a great solution for your requirement. Here is the best (and worst) solution I can think of: generate all the combinations, put them in a vector, and sort it. The complexity is O(n^3 log(n^3)). The only small improvement I can think of comes from never using the same value in the same position more than once, since that would only generate a combination that has already been generated.
Implementation in C++:
#include <stdio.h>
#include <algorithm>
#include <vector>
using namespace std;

struct Combin {
    int a, b, c, sum;
};

bool comp(Combin a, Combin b) {
    return a.sum > b.sum;
}

int main() {
    vector<int> act;
    int n;
    while (scanf("%d", &n) == 1 && n) {
        act.push_back(n);
    }
    vector<Combin> combs;
    for (int i = 0; i < act.size(); i++) {
        if (i > 0 && act[i-1] == act[i]) continue;
        for (int j = i + 1; j < act.size(); j++) {
            if (j > i+1 && act[j-1] == act[j]) continue;
            for (int k = j + 1; k < act.size(); k++) {
                if (k > j+1 && act[k-1] == act[k]) continue;
                Combin comb;
                comb.a = i; comb.b = j; comb.c = k;
                comb.sum = act[i] + act[j] + act[k];
                combs.push_back(comb);
            }
        }
    }
    sort(combs.begin(), combs.end(), comp);
    for (int i = 0; i < combs.size(); i++) {
        printf("(%d,%d,%d)=%d\n", act[combs[i].a], act[combs[i].b], act[combs[i].c], combs[i].sum);
    }
    return 0;
}

I remembered that Python's combinations generator keeps the left-to-right order of its input, varying the rightmost positions fastest, so combinations built from the leftmost terms come out first.
If you use Python's method of generating combinations, which its docs say is the following:
def combinations(iterable, r):
    # combinations('ABCD', 2) --> AB AC AD BC BD CD
    # combinations(range(4), 3) --> 012 013 023 123
    pool = tuple(iterable)
    n = len(pool)
    if r > n:
        return
    indices = list(range(r))  # a mutable list (a bare range can't be modified in Python 3)
    yield tuple(pool[i] for i in indices)
    while True:
        for i in reversed(range(r)):
            if indices[i] != i + n - r:
                break
        else:
            return
        indices[i] += 1
        for j in range(i + 1, r):
            indices[j] = indices[j - 1] + 1
        yield tuple(pool[i] for i in indices)
Then you can just give the function your items in decreasing weight order to generate combinations in decreasing weight order:
for x in combinations('abcdef', 3):
    print(x)
('a', 'b', 'c')
('a', 'b', 'd')
('a', 'b', 'e')
('a', 'b', 'f')
('a', 'c', 'd')
('a', 'c', 'e')
('a', 'c', 'f')
('a', 'd', 'e')
('a', 'd', 'f')
('a', 'e', 'f')
('b', 'c', 'd')
('b', 'c', 'e')
('b', 'c', 'f')
('b', 'd', 'e')
('b', 'd', 'f')
('b', 'e', 'f')
('c', 'd', 'e')
('c', 'd', 'f')
('c', 'e', 'f')
('d', 'e', 'f')
Note: As Essam points out in the comments, combinations(data, 3) is equivalent to comb3(data) where:
def comb3(data):
    lendata = len(data)
    for i in range(lendata):
        for j in range(i + 1, lendata):
            for k in range(j + 1, lendata):
                yield (data[i], data[j], data[k])

My other answer of just using generated combinations in a certain order doesn't give a fully sorted answer. In fact there will be 6 out of 35 occasions where the total for the next triple goes up rather than down.
If we use the combinations but put them in a heap of fixed maximum size then we can trade the maximum heap size for accuracy of sort like so:
from itertools import combinations
from heapq import heappush, heappop, heappushpop

BESTOFLAST = 10  # max heap size

item2points = dict(a=9, b=9, c=8, d=6, e=6, f=5, g=4)

def partially_ordered_triples(item2points, BESTOFLAST=BESTOFLAST):
    ordereditems = sorted(item2points.keys(),
                          key=lambda item: item2points[item],
                          reverse=True)
    #print(ordereditems)  # ['a', 'b', 'c', 'e', 'd', 'f', 'g']
    triples = combinations(ordereditems, 3)
    heap = []  # empty heap
    # Preload heap
    for i in range(BESTOFLAST):
        triple = next(triples)
        total = sum(item2points[item] for item in triple)
        heappush(heap, (-total, triple))  # minheap, so -total
    # load/remove from heap in partially sorted order
    for triple in triples:
        total = sum(item2points[item] for item in triple)
        thistotal, thistriple = heappushpop(heap, (-total, triple))
        yield thistriple, -thistotal
    # drain rest of heap
    while heap:
        thistotal, thistriple = heappop(heap)
        yield thistriple, -thistotal

if __name__ == '__main__':
    for heapsize in range(BESTOFLAST + 1):
        print('Using a heap of size: %i and skipping:' % heapsize)
        length = skipped = 0
        previoustotal = sum(item2points.values())  # starting high value
        for triple, newtotal in partially_ordered_triples(item2points, heapsize):
            if newtotal > previoustotal: skipped += 1
            length += 1
            previoustotal = newtotal
        print("  of %i triples, %i were skipped to keep the total count decreasing" % (length, skipped))
If the size of the heap is large enough then there will be no deviations from the required order. If too small then the number of deviations increases:
The output:
Using a heap of size: 0 and skipping:
of 35 triples, 6 were skipped to keep the total count decreasing
Using a heap of size: 1 and skipping:
of 35 triples, 4 were skipped to keep the total count decreasing
Using a heap of size: 2 and skipping:
of 35 triples, 4 were skipped to keep the total count decreasing
Using a heap of size: 3 and skipping:
of 35 triples, 3 were skipped to keep the total count decreasing
Using a heap of size: 4 and skipping:
of 35 triples, 2 were skipped to keep the total count decreasing
Using a heap of size: 5 and skipping:
of 35 triples, 2 were skipped to keep the total count decreasing
Using a heap of size: 6 and skipping:
of 35 triples, 1 were skipped to keep the total count decreasing
Using a heap of size: 7 and skipping:
of 35 triples, 1 were skipped to keep the total count decreasing
Using a heap of size: 8 and skipping:
of 35 triples, 1 were skipped to keep the total count decreasing
Using a heap of size: 9 and skipping:
of 35 triples, 0 were skipped to keep the total count decreasing
Using a heap of size: 10 and skipping:
of 35 triples, 0 were skipped to keep the total count decreasing

Related

How to generate all set combinations in a random order

First off, I'm not even sure the terminology is the right one, as I haven't found anything similar (especially since I don't even know what keywords to use).
The problem:
There is a population of people, and I want to assign them into groups. I have a set of rules to give each assignation a score. I want to find the best one (or at least a very good one).
For example, with a population of four {A,B,C,D} and assigning to two groups of two, the possible assignations are:
{A,B},{C,D}
{A,C},{B,D}
{A,D},{B,C}
And for example, {B,A},{C,D} and {C,D},{A,B} are both the same as the first one (I don't care about the order inside the groups and the order of the groups themselves).
The number of people, the amount of groups and how many people fit in each group are all inputs.
My idea was to list each possible assignation, calculate their score and keep track of the best one. That is, to brute force it. As the population can be big, I was thinking of going through them in a random order and returning the best one found when time runs out (probably when the user gets bored or thinks it is a good enough find). The population can vary from very small (the four listed) to really big (maybe 200+), so just trying random ones without caring about repeats breaks down with the small ones, where a brute force is possible (plus I wouldn't know when to stop if I used plain random permutations).
The population is big enough that listing all the assignations to be able to shuffle them doesn't fit into memory. So I need either a method to find all the possible assignations in a random order, or a method to, given an index, generate the corresponding assignation, and use an index array and shuffle that (the second would be better because I can then easily distribute the tasks into multiple servers).
A simple recursive algorithm for generating these pairings is to pair the first element with each of the remaining elements, and for each of those couplings, recursively generate all the pairings of the remaining elements. For groups, generate all the groups made up of the first element and all the combinations of the remaining elements, then recurse for the remainders.
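As an illustration, here is a small Python sketch of that recursion (names are mine; it works on indices, so duplicate items are handled):

from itertools import combinations

def groupings(items, group_size):
    # Yield every way to split items into groups of group_size,
    # ignoring group order and order within groups.
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    # The first remaining element always leads its group, which avoids
    # generating the same set of groups in a different order.
    for combo in combinations(range(len(rest)), group_size - 1):
        group = (first,) + tuple(rest[i] for i in combo)
        remaining = [rest[i] for i in range(len(rest)) if i not in combo]
        for tail in groupings(remaining, group_size):
            yield [group] + tail

for g in groupings(['A', 'B', 'C', 'D'], 2):
    print(g)  # [('A','B'),('C','D')], [('A','C'),('B','D')], [('A','D'),('B','C')]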
You can compute how many possible sets of groups there are like this:
public static int numGroupingCombinations(int n, int groupSize)
{
    if (n % groupSize != 0)
        return 0; // n must be a multiple of groupSize
    int count = 1;
    while (n > groupSize)
    {
        count *= nCr(n - 1, groupSize - 1);
        n -= groupSize;
    }
    return count;
}

public static int nCr(int n, int r)
{
    int ret = 1;
    for (int k = 0; k < r; k++) {
        ret = ret * (n - k) / (k + 1);
    }
    return ret;
}
So I need either a method to find all the possible assignations in a random order, or a method to, given an index, generate the corresponding assignation, and use an index array and shuffle that (the second would be better because I can then easily distribute the tasks into multiple servers).
To generate a grouping from an index, choose a combination of items to group with the first element by taking the modulo of the index with the number of possible combinations, and generating the combination from the result using this algorithm. Then divide the index by that same number and recursively generate the rest of the set.
public static void generateGrouping(String[] elements, int groupSize, int start, int index)
{
    if (elements.length % groupSize != 0)
        return;
    int remainingSize = elements.length - start;
    if (remainingSize == 0)
    {
        // output the elements:
        for (int i = 0; i < elements.length; i += groupSize)
        {
            System.out.print("[");
            for (int j = 0; j < groupSize; j++)
                System.out.print(((j == 0) ? "" : ",") + elements[i + j]);
            System.out.print("]");
        }
        System.out.println("");
        return;
    }
    int combinations = nCr(remainingSize - 1, groupSize - 1);
    // decide which combination of remaining elements to pair the first element with:
    int[] combination = getKthCombination(remainingSize - 1, groupSize - 1, index % combinations);
    // swap elements into place
    for (int i = 0; i < groupSize - 1; i++)
    {
        String temp = elements[start + 1 + i];
        elements[start + 1 + i] = elements[start + 1 + combination[i]];
        elements[start + 1 + combination[i]] = temp;
    }
    generateGrouping(elements, groupSize, start + groupSize, index / combinations);
    // swap them back:
    for (int i = groupSize - 2; i >= 0; i--)
    {
        String temp = elements[start + 1 + i];
        elements[start + 1 + i] = elements[start + 1 + combination[i]];
        elements[start + 1 + combination[i]] = temp;
    }
}

public static void getKthCombination(int n, int r, int k, int[] c, int start, int offset)
{
    if (r == 0)
        return;
    if (r == n)
    {
        for (int i = 0; i < r; i++)
            c[start + i] = i + offset;
        return;
    }
    int count = nCr(n - 1, r - 1);
    if (k < count)
    {
        c[start] = offset;
        getKthCombination(n - 1, r - 1, k, c, start + 1, offset + 1);
        return;
    }
    getKthCombination(n - 1, r, k - count, c, start, offset + 1);
}

public static int[] getKthCombination(int n, int r, int k)
{
    int[] c = new int[r];
    getKthCombination(n, r, k, c, 0, 0);
    return c;
}
Demo
The start parameter is just how far along the list you are, so pass zero when calling the function at the top level. The function could easily be rewritten to be iterative. You could also pass an array of indices instead of an array of objects that you want to group, if swapping the objects is a large overhead.
What you call "assignations" are partitions with a fixed number of equally sized parts. Well, mostly. You didn't specify what should happen if (# of groups) * (size of each group) is less than or greater than your population size.
Generating every possible partition in a non-specific order is not too difficult, but it is only good for small populations or for filtering and finding any partition that matches some independent criteria. If you need to optimize or minimize something, you'll end up looking at the whole set of partitions, which may not be feasible.
Based on the description of your actual problem, you want to read up on local search and optimization algorithms, of which the aforementioned simulated annealing is one such technique.
With all that said, here is a simple recursive Python function that generates fixed-length partitions with equal-sized parts in no particular order. It is a specialization of my answer to a similar partition problem, and that answer is itself a specialization of this answer. It should be fairly straightforward to translate into JavaScript (with ES6 generators).
def special_partitions(population, num_groups, group_size):
    """Yields all partitions with a fixed number of equally sized parts.

    Each yielded partition is a list of length `num_groups`,
    and each part a tuple of length `group_size`.
    """
    assert len(population) == num_groups * group_size
    groups = []  # a list of lists, currently empty

    def assign(i):
        if i >= len(population):
            yield list(map(tuple, groups))
        else:
            # try to assign to an existing group, if possible
            for group in groups:
                if len(group) < group_size:
                    group.append(population[i])
                    yield from assign(i + 1)
                    group.pop()
            # assign to an entirely new group, if possible
            if len(groups) < num_groups:
                groups.append([population[i]])
                yield from assign(i + 1)
                groups.pop()

    yield from assign(0)

for partition in special_partitions('ABCD', 2, 2):
    print(partition)
print()
for partition in special_partitions('ABCDEF', 2, 3):
    print(partition)
When executed, this prints:
[('A', 'B'), ('C', 'D')]
[('A', 'C'), ('B', 'D')]
[('A', 'D'), ('B', 'C')]
[('A', 'B', 'C'), ('D', 'E', 'F')]
[('A', 'B', 'D'), ('C', 'E', 'F')]
[('A', 'B', 'E'), ('C', 'D', 'F')]
[('A', 'B', 'F'), ('C', 'D', 'E')]
[('A', 'C', 'D'), ('B', 'E', 'F')]
[('A', 'C', 'E'), ('B', 'D', 'F')]
[('A', 'C', 'F'), ('B', 'D', 'E')]
[('A', 'D', 'E'), ('B', 'C', 'F')]
[('A', 'D', 'F'), ('B', 'C', 'E')]
[('A', 'E', 'F'), ('B', 'C', 'D')]
Let's say we have a total of N elements that we want to organize in G groups of E (with G*E = N). Neither the order of the groups nor the order of the elements within groups matter. The end goal is to produce every solution in a random order, knowing that we cannot store every solution at once.
First, let's think about how to produce one solution. Since order doesn't matter, we can normalize any solution by sorting the elements within groups as well as the groups themselves, by their first element.
For instance, if we consider the population {A, B, C, D}, with N = 4, G = 2, E = 2, then the solution {B,D}, {C,A} can be normalized as {A,C}, {B,D}. The elements are sorted within each group (A before C), and the groups are sorted (A before B).
When the solutions are normalized, the first element of the first group is always the first element of the population. The second element is one of the N-1 remaining ones, the third element is one of the N-2 remaining ones, and so on, except that these elements must remain sorted. So there are (N-1)!/((N-E)!*(E-1)!) possibilities for the first group.
Similarly, the first element of each subsequent group is fixed: it is the first of the elements remaining after the previous groups have been created. Thus, the number of possibilities for the (n+1)th group (n from 0 to G-1) is (N-nE-1)!/((N-(n+1)E)!*(E-1)!) = ((G-n)E-1)!/(((G-n-1)E)!*(E-1)!).
This gives us one possible way of indexing a solution. The index is not a single integer, but rather an array of G integers, the nth integer (n still from 0 to G-1) lying in the range 1 to (N-nE-1)!/((N-nE-E)!*(E-1)!) and representing group n (the "(n+1)th group") of the solution. This is easy to produce randomly and to check for duplicates.
The last thing we need is a way to produce a group from its corresponding integer. We need to choose E-1 elements from the N-nE-1 remaining ones. At this point, you can imagine listing every combination and choosing the right one. Of course, this can be done without generating every combination: see this question.
For curiosity, the total number of solutions is (GE)!/(G!*(E!)^G).
In your example, it is (2*2)!/(2!*(2!)^2) = 3.
For N = 200 and E = 2, there are 6.7e186 solutions.
For N = 200 and E = 5, there are 6.6e243 solutions (the maximum I found for 200 elements).
Additionally, for N = 200 and E > 13, the number of possibilities for the first group is greater than 2^64 (so it cannot be stored in a 64-bit integer), which is problematic for representing an index. But as long as you don't need groups with more than 13 elements, you can use arrays of 64-bit integers as indices.
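To make the decoding step concrete, here is a rough Python sketch of the scheme described above (helper names are mine; it uses 0-based indices and math.comb, which assumes Python 3.8+):

from math import comb
import random

def kth_combination(n, r, k):
    # The k-th (0-based, lexicographic) r-subset of range(n).
    result, offset = [], 0
    while r > 0:
        count = comb(n - 1, r - 1)  # subsets containing the current smallest element
        if k < count:
            result.append(offset)
            n, r, offset = n - 1, r - 1, offset + 1
        else:
            k, n, offset = k - count, n - 1, offset + 1
    return result

def decode(population, group_size, index_array):
    # Turn an array of per-group integers into a normalized solution.
    remaining = sorted(population)
    groups = []
    for idx in index_array:
        first = remaining.pop(0)  # the group's first element is forced
        combo = kth_combination(len(remaining), group_size - 1, idx)
        groups.append((first,) + tuple(remaining[i] for i in combo))
        for i in reversed(combo):
            remaining.pop(i)
    return groups

# Draw one random solution for {A,B,C,D}, G = 2 groups of E = 2:
pop, E, G = ['A', 'B', 'C', 'D'], 2, 2
idx = [random.randrange(comb(len(pop) - 1 - n * E, E - 1)) for n in range(G)]
print(decode(pop, E, idx))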
Perhaps a simulated annealing approach might work. You can start with a non-optimal initial solution and iterate using heuristics to improve.
Your scoring criteria may help you choose the initial solution, e.g. make the best scoring first group you can, then with what's left make the best scoring second group, and so on.
Good choices of "neighboring states" may be implied by your scoring criteria, but at the very least, you could consider two states neighboring if they differ by a single swap.
So the iteration portion of the algorithm would be to try a bunch of swaps, sampled randomly, and choose the one that improves the global score according to the annealing schedule.
I'm hoping you can find a better choice of the adjacent states! That is, I'm hoping you can find better rules for iteratively improving based on your scoring criteria.
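To make that loop concrete, here is a minimal Python sketch; the score function, swap move, temperature, and cooling rate are all placeholders to be adapted to your rules:

import math
import random

def anneal(groups, score, steps=10000, temp=1.0, cooling=0.999):
    # groups: list of lists, an initial assignment. score: higher is better.
    best = current = score(groups)
    best_groups = [list(g) for g in groups]
    for _ in range(steps):
        # Neighboring state: swap one member between two random groups.
        g1, g2 = random.sample(range(len(groups)), 2)
        i, j = random.randrange(len(groups[g1])), random.randrange(len(groups[g2]))
        groups[g1][i], groups[g2][j] = groups[g2][j], groups[g1][i]
        new = score(groups)
        # Always accept improvements; accept worsenings with shrinking probability.
        if new >= current or random.random() < math.exp((new - current) / temp):
            current = new
            if new > best:
                best, best_groups = new, [list(g) for g in groups]
        else:
            groups[g1][i], groups[g2][j] = groups[g2][j], groups[g1][i]  # undo
        temp *= cooling
    return best, best_groups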
If you have a sufficiently large population that you can't fit all assignations in memory and are unlikely to ever test all possible assignations, then the simplest method will just be to choose test assignations randomly. For example:
repeat
    randomly shuffle population
    put 1st n/2 members of the shuffled pop into assig1 and 2nd n/2 into assig2
    score assignation and record it if best so far
until bored
If you have a large population it is unlikely that there will be much loss of efficiency due to duplicating a test as it is unlikely that you would chance on the same assignation again.
Depending on your scoring rules it may be more efficient to choose the next assignation to be tested by, for example swapping a pair of members between the best assignation so far found, but you haven't provided enough information to determine if that is the case.
Here is an approach targeting your optimization-problem (and ignoring your permutation-based approach).
I formulate the problem as a mixed-integer program and use specialized solvers to calculate good solutions.
As your problem is not fully formulated, it might need some modifications. But the general message is: this approach will be hard to beat!
Code
import numpy as np
from cvxpy import *

""" Parameters """
N_POPULATION = 50
GROUPSIZES = [3, 6, 12, 12, 17]
assert sum(GROUPSIZES) == N_POPULATION
N_GROUPS = len(GROUPSIZES)
OBJ_FACTORS = [0.4, 0.1, 0.15, 0.35]  # age is the most important

""" Create fake data """
age_vector = np.clip(np.random.normal(loc=35.0, scale=10.0, size=N_POPULATION).astype(int), 0, np.inf)
height_vector = np.clip(np.random.normal(loc=180.0, scale=15.0, size=N_POPULATION).astype(int), 0, np.inf)
weight_vector = np.clip(np.random.normal(loc=85, scale=20, size=N_POPULATION).astype(int), 0, np.inf)
skill_vector = np.random.randint(0, 100, N_POPULATION)

""" Calculate a-priori stats """
age_mean, height_mean, weight_mean, skill_mean = np.mean(age_vector), np.mean(height_vector), \
                                                 np.mean(weight_vector), np.mean(skill_vector)

""" Build optimization-model """
# Variables
X = Bool(N_POPULATION, N_GROUPS)  # 1 if part of group
D = Variable(4, N_GROUPS)         # aux-var for deviation-norm

# Constraints
constraints = []
# (1) each person is exactly in one group
for p in range(N_POPULATION):
    constraints.append(sum_entries(X[p, :]) == 1)
# (2) each group has exactly n (a-priori known) members
for g_ind, g_size in enumerate(GROUPSIZES):
    constraints.append(sum_entries(X[:, g_ind]) == g_size)

# Objective: minimize deviation from global-statistics within each group
# (ugly code; could be improved a lot!)
group_deviations = [[], [], [], []]  # age, height, weight, skill
for g_ind, g_size in enumerate(GROUPSIZES):
    group_deviations[0].append((sum_entries(mul_elemwise(age_vector, X[:, g_ind])) / g_size) - age_mean)
    group_deviations[1].append((sum_entries(mul_elemwise(height_vector, X[:, g_ind])) / g_size) - height_mean)
    group_deviations[2].append((sum_entries(mul_elemwise(weight_vector, X[:, g_ind])) / g_size) - weight_mean)
    group_deviations[3].append((sum_entries(mul_elemwise(skill_vector, X[:, g_ind])) / g_size) - skill_mean)

for i in range(4):
    for g in range(N_GROUPS):
        constraints.append(D[i, g] >= abs(group_deviations[i][g]))

obj_parts = [sum_entries(OBJ_FACTORS[i] * D[i, :]) for i in range(4)]
objective = Minimize(sum(obj_parts))

""" Build optimization-problem & solve """
problem = Problem(objective, constraints)
problem.solve(solver=GUROBI, verbose=True, TimeLimit=120)  # might need to use non-commercial solver here
print('Min-objective: ', problem.value)

""" Evaluate solution """
filled_groups = [[] for g in range(N_GROUPS)]
for g_ind, g_size in enumerate(GROUPSIZES):
    for p in range(N_POPULATION):
        if np.isclose(X[p, g_ind].value, 1.0):
            filled_groups[g_ind].append(p)
for g_ind, g_size in enumerate(GROUPSIZES):
    print('Group: ', g_ind, ' of size: ', g_size)
    print(' ' + str(filled_groups[g_ind]))

group_stats = []
for g in range(N_GROUPS):
    age_mean_in_group = age_vector[filled_groups[g]].mean()
    height_mean_in_group = height_vector[filled_groups[g]].mean()
    weight_mean_in_group = weight_vector[filled_groups[g]].mean()
    skill_mean_in_group = skill_vector[filled_groups[g]].mean()
    group_stats.append((age_mean_in_group, height_mean_in_group, weight_mean_in_group, skill_mean_in_group))

print('group-assignment solution means: ')
for g in range(N_GROUPS):
    print(np.round(group_stats[g], 1))

""" Compare with input """
input_data = np.vstack((age_vector, height_vector, weight_vector, skill_vector))
print('input-means')
print(age_mean, height_mean, weight_mean, skill_mean)
print('input-data')
print(input_data)
Output (time-limit of 2 minutes; commercial solver)
Time limit reached
Best objective 9.612058823514e-01, best bound 4.784117647059e-01, gap 50.2280%
('Min-objective: ', 0.961205882351435)
('Group: ', 0, ' of size: ', 3)
[16, 20, 27]
('Group: ', 1, ' of size: ', 6)
[26, 32, 34, 45, 47, 49]
('Group: ', 2, ' of size: ', 12)
[0, 6, 10, 12, 15, 21, 24, 30, 38, 42, 43, 48]
('Group: ', 3, ' of size: ', 12)
[2, 3, 13, 17, 19, 22, 23, 25, 31, 36, 37, 40]
('Group: ', 4, ' of size: ', 17)
[1, 4, 5, 7, 8, 9, 11, 14, 18, 28, 29, 33, 35, 39, 41, 44, 46]
group-assignment solution means:
[ 33.3 179.3 83.7 49. ]
[ 33.8 178.2 84.3 49.2]
[ 33.9 178.7 83.8 49.1]
[ 33.8 179.1 84.1 49.2]
[ 34. 179.6 84.7 49. ]
input-means
(33.859999999999999, 179.06, 84.239999999999995, 49.100000000000001)
input-data
[[ 22. 35. 28. 32. 41. 26. 25. 37. 32. 26. 36. 36.
27. 34. 38. 38. 38. 47. 35. 35. 34. 30. 38. 34.
31. 21. 25. 28. 22. 40. 30. 18. 32. 46. 38. 38.
49. 20. 53. 32. 49. 44. 44. 42. 29. 39. 21. 36.
29. 33.]
[ 161. 158. 177. 195. 197. 206. 169. 182. 182. 198. 165. 185.
171. 175. 176. 176. 172. 196. 186. 172. 184. 198. 172. 162.
171. 175. 178. 182. 163. 176. 192. 182. 187. 161. 158. 191.
182. 164. 178. 174. 197. 156. 176. 196. 170. 197. 192. 171.
191. 178.]
[ 85. 103. 99. 93. 71. 109. 63. 87. 60. 94. 48. 122.
56. 84. 69. 162. 104. 71. 92. 97. 101. 66. 58. 69.
88. 69. 80. 46. 74. 61. 25. 74. 59. 69. 112. 82.
104. 62. 98. 84. 129. 71. 98. 107. 111. 117. 81. 74.
110. 64.]
[ 81. 67. 49. 74. 65. 93. 25. 7. 99. 34. 37. 1.
25. 1. 96. 36. 39. 41. 33. 28. 17. 95. 11. 80.
27. 78. 97. 91. 77. 88. 29. 54. 16. 67. 26. 13.
31. 57. 84. 3. 87. 7. 99. 35. 12. 44. 71. 43.
16. 69.]]
Solution remarks
This solution looks quite nice (regarding mean-deviation) and it only took 2 minutes (we decided on the time-limit a-priori)
We also got bounds: 0.961 is our solution; we know it can't be lower than 0.478
Reproducibility
The code uses numpy and cvxpy
A commercial solver was used
You might need to use a non-commercial MIP-solver (supporting time-limit for early abortion; take current best-solution)
The valid open-source MIP-solvers supported in cvxpy are: cbc (no chance of setting time-limits for now) and glpk (check the docs for time-limit support)
Model decisions
The code uses L1-norm penalization, which results in an MIP-problem
Depending on your problem, it might be wise to use L2-norm penalization (one big deviation hurts more than many smaller ones), which will result in a harder problem (MIQP / MISOCP)

All ways to partition a string

I'm trying to find an efficient algorithm to get all ways to partition a string
eg for a given string 'abcd' =>
'a' 'bcd'
'a' 'b' 'cd'
'a' 'b' 'c' 'd'
'ab' 'cd'
'ab' 'c' 'd'
'abc' 'd'
'a' 'bc' 'd'
Any language would be appreciated.
Thanks in advance!
Problem analysis
Between each pair of adjacent characters, you can decide whether to cut. For a string of size n, there are n-1 positions where you can either cut or not, i.e. two possibilities each. Therefore a string of size n can be partitioned in 2^(n-1) ways.
The output consists of 2^(n-1) partitions, each having n characters plus separators. So we can describe the output size as f(n) = 2^(n-1) * n + s(n), where s(n) ≥ 0 accounts for the partition separators and line separators.
So due to the output size alone, an algorithm solving this problem must have exponential runtime or worse: Ω(2^n).
(0 ≤ c * 2^n = ½ * 2^n = 2^(n-1) ≤ 2^(n-1) * n ≤ f(n) for all n ≥ k, with positive constants c = ½, k = 1)
Solution
I chose to represent a partition as an integer. Each bit in cutpoints determines whether to cut between characters i and i+1. To iterate through all possible partitions, we just need to go through all integers between 0 and 2^(n-1) - 1.
Example: For a string of length 4, we go through all integers between 0 and 2^3 - 1, i.e. between 0 and 7, or in binary: between 000 and 111.
# (python 2 or 3)
def all_partitions(string):
    for cutpoints in range(1 << (len(string) - 1)):
        result = []
        lastcut = 0
        for i in range(len(string) - 1):
            if (1 << i) & cutpoints != 0:
                result.append(string[lastcut:(i + 1)])
                lastcut = i + 1
        result.append(string[lastcut:])
        yield result

for partition in all_partitions("abcd"):
    print(partition)
Memory usage:
I think my solution uses O(n) memory with Python 3. Only one partition is generated at a time, it's printed and not referenced anymore. This changes of course, if you keep all results, e.g. by storing them in a list.
In Python 2 replace range with xrange, otherwise all possible cutpoints will be stored in a list, therefore needing an exponential amount of memory.
JavaScript solution
// ES6 generator
function* all_partitions(string) {
    for (var cutpoints = 0; cutpoints < (1 << (string.length - 1)); cutpoints++) {
        var result = [];
        var lastcut = 0;
        for (var i = 0; i < string.length - 1; i++) {
            if (((1 << i) & cutpoints) !== 0) {
                result.push(string.slice(lastcut, i + 1));
                lastcut = i + 1;
            }
        }
        result.push(string.slice(lastcut));
        yield result;
    }
}

for (var partition of all_partitions("abcd")) {
    console.log(partition);
}
Tested with NodeJS v4.4.3 (disclaimer: I have not used NodeJS before).
GeeksforGeeks has provided a well-explained solution to this problem:
For string abcd there will be 2^(n-1) i.e. 8 partitions.
(a)(b)(c)(d)
(a)(b)(cd)
(a)(bc)(d)
(a)(bcd)
(ab)(c)(d)
(ab)(cd)
(abc)(d)
(abcd)
The crux of the solution lies in the recursion used to print all the partitions.
We maintain two parameters: the index of the next character to be processed and the output string so far. Starting from the index of the next character to be processed, we append each substring formed from the unprocessed string to the output string and recurse on the remaining string until we process the whole string.
// Java program to find all combinations of non-
// overlapping substrings formed from a given
// string
class GFG
{
    // find all combinations of non-overlapping
    // substrings formed by input string str
    static void findCombinations(String str, int index, String out)
    {
        if (index == str.length())
            System.out.println(out);
        for (int i = index; i < str.length(); i++)
            // append substring formed by str[index, i] to the output string
            findCombinations(str, i + 1, out + "(" + str.substring(index, i + 1) + ")");
    }

    // driver program
    public static void main(String[] args)
    {
        // input string
        String str = "abcd";
        findCombinations(str, 0, "");
    }
}
Time Complexity is O(2^n)
Here's the link to the article: http://www.geeksforgeeks.org/print-ways-break-string-bracket-form/
I just wanted to post a simple recursive solution to this problem for anyone stumbling on this question. Probably not the best way, but this was way simpler for me to understand and implement. If I am wrong, please correct me.
def party(s: str, P: list, res: list) -> None:
    """Recursively generates all partitions of a given string"""
    res.append(P + [s])
    for i in range(1, len(s)):
        party(s[i:], P + [s[:i]], res)

res = []
party("abcd", [], res)
print(res)
"""
[['abcd'], ['a', 'bcd'], ['a', 'b', 'cd'], ['a', 'b', 'c', 'd'],
 ['a', 'bc', 'd'], ['ab', 'cd'], ['ab', 'c', 'd'], ['abc', 'd']]
"""
It works as follows:
Given a string or a substring of it, we can split after each of its characters, creating two halves.
Say: "abc" can be partitioned into ["a","bc"] and ["ab","c"].
We save the first part in an intermediate partition P and
recursively call party on the other half.
Because both halves together form a complete partition, we save it to res.
Example:
initially: s = "abc" is a valid partition, save it to res.
recursive call: s = "bc", P = ["a"], so P + [s] = ["a","bc"] is also valid, save it to res.
Proceed with splitting "bc".
P = ["a","b"], s = "c", so P + [s] is also valid. And so on...
recursive call 3: s = "c", P = ["ab"], so P + [s] = ["ab","c"] is also valid, save it to res.
Working:
tests = ["abc", "abcd", "a"]
for t in tests:
    res = []
    party(t, [], res)
    print(f'{t} -> {res} \n')
"""Output
abc -> [['abc'], ['a', 'bc'], ['a', 'b', 'c'], ['ab', 'c']]
abcd -> [['abcd'], ['a', 'bcd'], ['a', 'b', 'cd'], ['a', 'b', 'c', 'd'],
['a', 'bc', 'd'], ['ab', 'cd'], ['ab', 'c', 'd'], ['abc', 'd']]
a -> [['a']]
"""
This is a solution which minimizes developer time by taking advantage of a built-in iterator. It should be reasonably quick for problem sizes for which the answer itself is not infeasibly large.
There is a one-to-one correspondence between partitions of a string and subsets of potential cutpoints. If the length of the string is n then there are n-1 places where you could cut the string. A straightforward way would be to iterate through such subsets, and for each such subset, slice the string in that way. Here is a Python approach which uses the standard module itertools:
import itertools

def multiSlice(s, cutpoints):
    k = len(cutpoints)
    if k == 0:
        return [s]
    else:
        multislices = [s[:cutpoints[0]]]
        multislices.extend(s[cutpoints[i]:cutpoints[i+1]] for i in range(k-1))
        multislices.append(s[cutpoints[k-1]:])
        return multislices

def allPartitions(s):
    n = len(s)
    cuts = list(range(1, n))
    for k in range(n):
        for cutpoints in itertools.combinations(cuts, k):
            yield multiSlice(s, cutpoints)
For example:
>>> parts = allPartitions('World')
>>> for p in parts: print(p)
['World']
['W', 'orld']
['Wo', 'rld']
['Wor', 'ld']
['Worl', 'd']
['W', 'o', 'rld']
['W', 'or', 'ld']
['W', 'orl', 'd']
['Wo', 'r', 'ld']
['Wo', 'rl', 'd']
['Wor', 'l', 'd']
['W', 'o', 'r', 'ld']
['W', 'o', 'rl', 'd']
['W', 'or', 'l', 'd']
['Wo', 'r', 'l', 'd']
['W', 'o', 'r', 'l', 'd']
Note that this approach generates ['World'] as a partition of 'World'. This corresponds to slicing with an empty set of cut points. I regard that as a feature rather than a bug, since the standard mathematical definition of a partition allows a partition of a set into one piece. If this is undesirable for your purposes, the fix is easy enough: just iterate over the nonempty subsets of the cut points. In terms of the above code, this fix amounts to adding two characters to allPartitions: replace
for k in range(n):
by
for k in range(1,n):
Something along the lines of the following (untested and likely buggy VB.NET sample)
Function FindAllGroups(s As String) As List(Of List(Of String))
    Dim ret As New List(Of List(Of String))
    If s.Length <= 1 Then
        Dim l As New List(Of String)
        l.Add(s) 'a single character can only form one group
        ret.Add(l)
        Return ret
    End If
    'find all the groups for the rest of the string after the first character
    Dim tmp = FindAllGroups(s.Substring(1))
    For Each l2 In tmp
        Dim l = l2.ToList() 'copy it
        'insert the first character by itself before this combination for the rest of the string
        l.Insert(0, s.Substring(0, 1))
        ret.Add(l)
    Next
    For Each l2 In tmp
        Dim l = l2.ToList() 'copy it
        'prepend the first character to the first element in the list
        l(0) = s.Substring(0, 1) & l(0)
        ret.Add(l)
    Next
    Return ret
End Function
This basically works by saying that we can take 'abcd' and split it into
'a', 1st option for 'bcd' split
'a', 2nd option for 'bcd' split
...
+
1st option for 'bcd' split with the first element prepended with 'a'
2nd option for 'bcd' split with the first element prepended with 'a'
...
then to calculate 'bcd', we just repeat the process as above, only with
'b', 1st option for 'cd' split
'b', 2nd option for 'cd' split
...
+
1st option for 'cd' split with the first element prepended with 'b'
2nd option for 'cd' split with the first element prepended with 'b'
...
etc. repeated recursively.
However, this code isn't particularly efficient at runtime. One thing that you could do to speed it up significantly would be to add a Dictionary(Of String, List(Of List(Of String))) outside the function in which you cache the results; if the item exists in there, you return it from there, and if not, you calculate it and add it. Lists also might not be the most efficient, and the ToList function might not be the quickest way of cloning. However, I've simplified it to make it easier to understand and also to save me time working it out!
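To sketch that caching idea concretely, here is a hypothetical Python analogue of the same recursion, memoized on the suffix string (the function name is mine):

from functools import lru_cache

@lru_cache(maxsize=None)
def find_all_groups(s):
    # All partitions of s, cached per suffix; returned as a tuple of tuples.
    if len(s) <= 1:
        return ((s,),)
    ret = []
    for rest in find_all_groups(s[1:]):
        ret.append((s[0],) + rest)                # first char as its own piece
        ret.append((s[0] + rest[0],) + rest[1:])  # first char merged into the next piece
    return tuple(ret)

for p in find_all_groups("abcd"):
    print(p)

Returning tuples keeps the cached values immutable, so callers cannot corrupt the cache.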
This is a fairly standard depth first search (backtracking) problem.
#include <string>
#include <vector>
using namespace std;

void dfs(int startIndex, const string& s, vector<string>& tmp,
         vector<vector<string>>& res) {
    if (startIndex == s.size()) {
        res.push_back(tmp);
        return;
    }
    for (int i = 1; startIndex + i <= s.size(); ++i) {
        tmp.push_back(s.substr(startIndex, i));
        dfs(startIndex + i, s, tmp, res);
        tmp.pop_back();
    }
}

int main()
{
    vector<vector<string>> res;
    vector<string> tmp;
    string s = "abcd";
    dfs(0, s, tmp, res);
}
For its execution and result please refer to here.
#include <bits/stdc++.h>
using namespace std;

vector<string> ans;
string s;

void solve(int previouscut, int len)
{
    if (previouscut == s.length()) // base case
    {
        for (auto str : ans)
            cout << str << " ";
        cout << "\n";
        return;
    }
    if (previouscut + len > s.length()) // boundary case
        return;
    // cut
    ans.push_back(s.substr(previouscut, len));
    solve(previouscut + len, 1);
    ans.pop_back(); // backtrack
    // no cut
    solve(previouscut, len + 1);
}

int main()
{
    cin >> s;
    solve(0, 1);
    return 0;
}
https://www.geeksforgeeks.org/substring-in-cpp/#

Bubble sort pseudo code what does n-1 mean?

I have a question about a specific line in the bubble sort pseudo code.
This pseudocode is taken from wikipedia:
procedure bubbleSort( A : list of sortable items )
    n = length(A)
    repeat
        swapped = false
        for i = 1 to n-1 inclusive do //THIS IS THE LINE I DON'T UNDERSTAND
            /* if this pair is out of order */
            if A[i-1] > A[i] then
                /* swap them and remember something changed */
                swap( A[i-1], A[i] )
                swapped = true
            end if
        end for
    until not swapped
end procedure
I do not understand the for loop's condition (1 to n-1). I clearly have to run through all elements from the second element at index 1 to the last element for the algorithm to work.
But when I read the term n-1 I see it as the last element minus 1, which will skip the last element. So I guess my question is, what does n-1 really mean in this context?
If n is the count of elements, the highest index is n-1.
This line iterates from index 1 to the highest index n-1.
The first element has an index of 0. This code does not start there because of what it does inside the loop: pay attention to the i-1 part.
To give you an example of what that pseudocode does:

A = {'C', 'E', 'B', 'D', 'A'}
n = 5

inner loop: i => 1, 2, 3, 4

i = 1
    if (A[0] > A[1]) => false
i = 2
    if (A[1] > A[2]) => true
    swap(A[1], A[2]) => A = {'C', 'B', 'E', 'D', 'A'}
    swapped = true
i = 3
    if (A[2] > A[3]) => false
i = 4
    if (A[3] > A[4]) => true
    swap(A[3], A[4]) => A = {'C', 'B', 'E', 'A', 'D'}
    swapped = true
In a sense this code does not run through the elements but rather through the comparisons of adjacent elements.
n-1 does not mean the second-to-last element. It means the last element.
Here's why: Usually in programming, lists are zero-indexed, meaning the numbering starts at zero and goes to n-1 where n is the length of the list. The loop starts at i = 1 which is actually the second element (since later you have to compare A[i] to A[i-1]—that's the first element).
Since most programming languages start with index 0, you'll only want to compare from array index 0 to array index n-1 for an array of size n. If you continue to n, you'll be comparing outside of the array in the line:
if A[i-1] > A[i]
Hope this helps.
That is written in pseudo-code, so we don't know for sure how that "language" implements array indexing, but it seems that it is 0-indexed. Which means that if length(A) = n = 5 the elements are numbered from 0 through 4 (i.e. A[0] is how you access the first element A[4] is how you access the last one).
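For concreteness, here is a direct Python translation of that pseudocode, assuming 0-based indexing (the function name is mine):

def bubble_sort(a):
    n = len(a)
    swapped = True
    while swapped:
        swapped = False
        for i in range(1, n):        # "for i = 1 to n-1 inclusive"
            if a[i - 1] > a[i]:      # compare each adjacent pair
                a[i - 1], a[i] = a[i], a[i - 1]
                swapped = True
    return a

print(bubble_sort(['C', 'E', 'B', 'D', 'A']))  # ['A', 'B', 'C', 'D', 'E']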
The sorting occurs up to n-1 because the last element will automatically be sorted during the last iteration (i.e. the nth iteration) in the case of bubble sort.

How do you find the largest gap in a vector in O(n) time?

You are given the locations of various cars in the same lane on a highway as doubles to a vector, in no particular order. How can you find the largest gap between neighboring cars in O(n) time?
It seems like a simple solution would be to sort then check, but of course this isn't linear.
Divide the vector into n+1 equally sized buckets. For each such bucket, store only the maximum and the minimum value; all other values can be discarded. Because of the pigeonhole principle, at least one of those buckets is empty, so the largest gap is at least one bucket wide and therefore never falls between two values inside the same bucket; the non-minimum/non-maximum values thus have no influence on the result.
Then, go over the buckets, calculate the distance to the next and the previous non-empty bucket, and take the maximum; this is the final result.
An example with n=5 and values 5, 2, 20, 17, 3. The minimum is 2, the maximum is 20 => the bucket size is (20-2)/5 = 3.6, rounded up to 4 here.

Bucket range:  [2,6)  [6,10)  [10,14)  [14,18)  [18,20]
Min/Max:       2,5    -       -        17,17    20,20

Differences: 2-5, 5-17, 17-20.
The maximum gap is the one from 5 to 17.
My Python implementation of ipc's solution:
def maximum_gap(l):
    n = len(l)
    if n < 2:
        return 0
    (x_min, x_max) = (min(l), max(l))
    if x_min == x_max:
        return 0
    buckets = [None] * (n + 1)
    bucket_size = float(x_max - x_min) / n
    for x in l:
        k = int((x - x_min) / bucket_size)
        if buckets[k] is None:
            buckets[k] = (x, x)
        else:
            buckets[k] = (min(x, buckets[k][0]), max(x, buckets[k][1]))
    result = 0
    for i in range(n):
        if buckets[i + 1] is None:
            buckets[i + 1] = buckets[i]
        else:
            result = max(result, buckets[i + 1][0] - buckets[i][1])
    return result

assert maximum_gap([]) == 0
assert maximum_gap([42]) == 0
assert maximum_gap([1, 1, 1, 1]) == 0
assert maximum_gap([1, 2, 3, 4, 6, 8]) == 2
assert maximum_gap([5, 2, 20, 17, 3]) == 12
I use a tuple for bucket's elements, None if empty. In the last part, I eliminate preemptively any remaining empty bucket by assigning it to the previous one (this works, since the first one is guaranteed to be non-empty).
Note the special case when all elements are equal.

How to generate a list of subsets with restrictions?

I am trying to figure out an efficient algorithm to take a list of items and generate all unique subsets that result from splitting the list into exactly 2 sublists. I'm sure there is a general purpose way to do this, but I'm interested in a specific case. My list will be sorted, and there can be duplicate items.
Some examples:
Input
{1,2,3}
Output
{{1},{2,3}}
{{2},{1,3}}
{{3},{1,2}}
Input
{1,2,3,4}
Output
{{1},{2,3,4}}
{{2},{1,3,4}}
{{3},{1,2,4}}
{{4},{1,2,3}}
{{1,2},{3,4}}
{{1,3},{2,4}}
{{1,4},{2,3}}
Input
{1,2,2,3}
Output
{{1},{2,2,3}}
{{2},{1,2,3}}
{{3},{1,2,2}}
{{1,2},{2,3}}
{{1,3},{2,2}}
I can do this on paper, but I'm struggling to figure out a simple way to do it programmatically. I'm only looking for a quick pseudocode description of how to do this, not any specific code examples.
Any help is appreciated. Thanks.
If you were generating all subsets you would end up generating 2^n subsets for a list of length n. A common way to do this is to iterate through all the numbers i from 0 to 2^n - 1 and use the bits that are set in i to determine which items are in the ith subset. This works because any item either is or is not present in any particular subset, so by iterating through all the combinations of n bits you iterate through the 2^n subsets.
For example, to generate the subsets of (1, 2, 3) you would iterate through the numbers 0 to 7:
0 = 000b → ()
1 = 001b → (1)
2 = 010b → (2)
3 = 011b → (1, 2)
4 = 100b → (3)
5 = 101b → (1, 3)
6 = 110b → (2, 3)
7 = 111b → (1, 2, 3)
In your problem you can generate each subset and its complement to get your pair of mutually exclusive subsets. Each pair would be repeated when you do this, so you only need to iterate up to 2^(n-1) - 1 and then stop.
1 = 001b → (1) + (2, 3)
2 = 010b → (2) + (1, 3)
3 = 011b → (1, 2) + (3)
To deal with duplicate items you could generate subsets of list indices instead of subsets of list items. Like with the list (1, 2, 2, 3) generate subsets of the list (0, 1, 2, 3) instead and then use those numbers as indices into the (1, 2, 2, 3) list. Add a level of indirection, basically.
Here's some Python code putting this all together.
#!/usr/bin/env python

def split_subsets(items):
    subsets = set()
    for n in xrange(1, 2 ** len(items) / 2):
        # Use ith index if ith bit of n is set.
        l_indices = [i for i in xrange(0, len(items)) if n & (1 << i) != 0]
        # Use the indices NOT present in l_indices.
        r_indices = [i for i in xrange(0, len(items)) if i not in l_indices]
        # Get the items corresponding to the indices above.
        l = tuple(items[i] for i in l_indices)
        r = tuple(items[i] for i in r_indices)
        # Swap l and r if they are reversed.
        if (len(l), l) > (len(r), r):
            l, r = r, l
        subsets.add((l, r))
    # Sort the subset pairs so the left items are in ascending order.
    return sorted(subsets, key = lambda (l, r): (len(l), l))

for l, r in split_subsets([1, 2, 2, 3]):
    print l, r
Output:
(1,) (2, 2, 3)
(2,) (1, 2, 3)
(3,) (1, 2, 2)
(1, 2) (2, 3)
(1, 3) (2, 2)
The following C++ function does exactly what you need, but the order differs from the one in examples:
#include <iostream>
#include <map>
#include <vector>

// input contains all input numbers with duplicates allowed
void generate(std::vector<int> input) {
    typedef std::map<int,int> Map;
    std::map<int,int> mp;
    for (size_t i = 0; i < input.size(); ++i) {
        mp[input[i]]++;
    }
    std::vector<int> numbers;
    std::vector<int> mult;
    for (Map::iterator it = mp.begin(); it != mp.end(); ++it) {
        numbers.push_back(it->first);
        mult.push_back(it->second);
    }
    std::vector<int> cur(mult.size());
    for (;;) {
        size_t i = 0;
        while (i < cur.size() && cur[i] == mult[i]) cur[i++] = 0;
        if (i == cur.size()) break;
        cur[i]++;
        std::vector<int> list1, list2;
        for (size_t i = 0; i < cur.size(); ++i) {
            list1.insert(list1.end(), cur[i], numbers[i]);
            list2.insert(list2.end(), mult[i] - cur[i], numbers[i]);
        }
        if (list1.size() == 0 || list2.size() == 0) continue;
        if (list1 > list2) continue;
        std::cout << "{{";
        for (size_t i = 0; i < list1.size(); ++i) {
            if (i > 0) std::cout << ",";
            std::cout << list1[i];
        }
        std::cout << "},{";
        for (size_t i = 0; i < list2.size(); ++i) {
            if (i > 0) std::cout << ",";
            std::cout << list2[i];
        }
        std::cout << "}}\n";
    }
}
A bit of Erlang code; the problem is that it generates duplicates when you have duplicate elements, so the result list still needs to be filtered...

do([E,F]) -> [{[E], [F]}];
do([H|T]) -> lists:flatten([{[H], T}] ++
                 [[{[H|L1],L2}, {L1,[H|L2]}] || {L1,L2} <- do(T)]).

filtered(L) ->
    lists:usort([case length(L1) < length(L2) of true -> {L1,L2};
                                                 false -> {L2,L1} end
                 || {L1,L2} <- do(L)]).
In pseudocode this means that:
for a two long list {E,F} the result is {{E},{F}}
for longer lists take the first element H and the rest of the list T and return
{{H},{T}} (the first element as a single element list, and the remaining list)
also run the algorithm recursively for T, and for each {L1,L2} element in the resulting list return {{H,L1},{L2}} and {{L1},{H,L2}}
My suggestion is...
First, count how many of each value you have, possibly in a hashtable. Then calculate the total number of combinations to consider - the product of the counts.
Iterate through that number of combinations.
At each combination, copy your loop count (as x), then start an inner loop through your hashtable items.
For each hashtable item, use (x modulo count) as your number of instances of the hashtable key in the first list. Divide x by the count before repeating the inner loop.
If you are worried that the number of combinations might overflow your integer type, the issue is avoidable. Use an array with each item (one for every hashmap key) starting from zero, and 'count' through the combinations treating each array item as a digit (so the whole array represents the combination number), but with each 'digit' having a different base (the corresponding count). That is, to 'increment' the array, first increment item 0. If it overflows (becomes equal to its count), set it to zero and increment the next array item. Repeat the overflow checks up the array; if the overflow propagates past the end of the array, you have finished.
I think sergdev is using a very similar approach to this second one, but using std::map rather than a hashtable (std::unordered_map should work). A hashtable should be faster for large numbers of items, but won't give you the values in any particular order. The ordering for each loop through the keys in a hashtable should be consistent, though, unless you add/remove keys.
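As a minimal sketch of this counting scheme (names are mine; math.prod assumes Python 3.8+), treating the combination number as a mixed-radix counter:

from collections import Counter
from math import prod

def split_multiset(items):
    counts = Counter(items)
    keys = sorted(counts)
    radices = [counts[k] + 1 for k in keys]   # 0..count instances per value
    for x in range(1, prod(radices) - 1):     # skip the empty and full splits
        left, right, v = [], [], x
        for k, radix in zip(keys, radices):
            take = v % radix                  # instances of k in the first list
            v //= radix
            left += [k] * take
            right += [k] * (counts[k] - take)
        if (len(left), left) <= (len(right), right):  # emit each pair once
            yield left, right

for l, r in split_multiset([1, 2, 2, 3]):
    print(l, r)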
