Given two integers n and r, I want to generate all possible combinations with the following rules:
There are n distinct numbers to choose from, 1, 2, ..., n;
Each combination should have r elements;
A combination may contain the same element more than once; for instance, (1,2,2) is valid;
Order matters, i.e. (1,2,3) and (1,3,2) are considered distinct;
However, two combinations are considered equivalent if one is a cyclic permutation of the other; for instance, (1,2,3) and (2,3,1) are considered duplicates.
Examples:
n=3, r=3
11 distinct combinations
(1,1,1), (1,1,2), (1,1,3), (1,2,2), (1,2,3), (1,3,2), (1,3,3), (2,2,2), (2,2,3), (2,3,3) and (3,3,3)
n=2, r=4
6 distinct combinations
(1,1,1,1), (1,1,1,2), (1,1,2,2), (1,2,1,2), (1,2,2,2), (2,2,2,2)
What is the algorithm for this? And how can it be implemented in C++?
Thank you in advance for any advice.
Here is a naive solution in Python:
Generate all combinations from the Cartesian product of {1, 2, ..., n} with itself r times;
Only keep one representative combination for each equivalence class; drop all other combinations that are equivalent to this representative.
This means we need some way to compare combinations; for instance, keep only the lexicographically smallest combination of each equivalence class.
from itertools import product
def is_representative(comb):
    return all(comb[i:] + comb[:i] >= comb
               for i in range(1, len(comb)))

def cartesian_product_up_to_cyclic_permutations(n, r):
    return filter(is_representative,
                  product(range(n), repeat=r))

print(list(cartesian_product_up_to_cyclic_permutations(3, 3)))
# [(0, 0, 0), (0, 0, 1), (0, 0, 2), (0, 1, 1), (0, 1, 2), (0, 2, 1), (0, 2, 2), (1, 1, 1), (1, 1, 2), (1, 2, 2), (2, 2, 2)]
print(list(cartesian_product_up_to_cyclic_permutations(2, 4)))
# [(0, 0, 0, 0), (0, 0, 0, 1), (0, 0, 1, 1), (0, 1, 0, 1), (0, 1, 1, 1), (1, 1, 1, 1)]
You mentioned that you wanted to implement the algorithm in C++. The product function in the Python code behaves just like a big for-loop that generates all the combinations in the Cartesian product. See this related question for implementing the Cartesian product in C++: Is it possible to execute n number of nested "loops(any)" where n is given?
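If it helps as a bridge to C++, below is a rough sketch of the same algorithm written in Python with an explicit index vector (an "odometer") instead of itertools.product; the increment loop maps almost line for line onto a while-loop in C++. The function name is only illustrative.

def cyclic_representatives(n, r):
    indices = [0] * r            # the "odometer": each digit runs over 0..n-1
    results = []
    while True:
        comb = tuple(indices)
        # keep comb only if it is the smallest among all of its rotations
        if all(comb[i:] + comb[:i] >= comb for i in range(1, r)):
            results.append(comb)
        # increment the odometer, rightmost digit first
        pos = r - 1
        while pos >= 0 and indices[pos] == n - 1:
            indices[pos] = 0
            pos -= 1
        if pos < 0:
            return results
        indices[pos] += 1

cyclic_representatives(3, 3) yields the same eleven 0-based representatives as printed above.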
So I'm trying to sort data in this format...
[((0, 4), 3), ((4, 0), 3), ((1, 6), 1), ((3, 2), 3), ((0, 5), 1)...
Ascending by key and then descending by value. I'm able to achieve this via...
test = test.sortBy(lambda x: (x[0], -x[1]))
which would give me based on shortened version above...
[((0, 4), 3), ((0, 5), 1), ((1, 6), 1), ((3, 2), 3), ((4, 0), 3)...
The problem I'm having is that after the sorting I no longer want the value but do need to retain the sort after grouping the data. So...
test = test.map(lambda x: (x[0][0],x[0][1]))
Gives me...
[(0, 4), (0, 5), (1, 6), (3, 2), (4, 0)...
Which is still in the order I need it but I need the elements to be grouped up by key. I then use this command...
test = test.groupByKey().map(lambda x: (x[0], list(x[1])))
But in the process I lose the sorting. Is there any way to retain it?
I managed to retain the order by changing the format of the tuple...
test = test.map(lambda x: (x[0][0], (x[0][1], x[1])))
test = test.groupByKey().map(lambda x: (x[0], sorted(list(x[1]), key=lambda x: (x[0], -x[1]))))
[(0, [(4, 3), (5, 1)] ...
which leaves me with the value (the 2nd element in the tuple) that I want to get rid of, but I took care of that too...
test = test.map(lambda x: (x[0], [e[0] for e in x[1]]))
Feels a bit hacky but not sure how else it could be done.
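Putting those steps together, the whole pipeline might be written as one chain (a sketch, assuming test is an RDD of ((key, secondary), value) pairs as in the question):

# group by the outer key, then sort each group's (secondary, value) pairs
# ascending by secondary and descending by value, and finally drop the values
test = (test
        .map(lambda x: (x[0][0], (x[0][1], x[1])))
        .groupByKey()
        .map(lambda kv: (kv[0],
                         [p[0] for p in sorted(kv[1], key=lambda p: (p[0], -p[1]))])))

If the keys themselves also need to come out in order, a sortByKey() can be appended to the chain.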
So if I had the numbers [1,2,2,3] and I wanted k=2 partitions, I'd have [1][2,2,3], [1,2][2,3], [2,2][1,3], [2][1,2,3], [3][1,2,2], etc.
See an answer in Python at Code Review.
user3569's solution at Code Review produces five 2-tuples for the test case below, instead of exclusively 3-tuples. However, removing the frozenset() call for the returned tuples leads to the code returning exclusively 3-tuples. The revised code is as follows:
from itertools import chain, combinations
def subsets(arr):
    """ Note this only returns non empty subsets of arr"""
    return chain(*[combinations(arr, i + 1) for i, a in enumerate(arr)])

def k_subset(arr, k):
    s_arr = sorted(arr)
    return set([i for i in combinations(subsets(arr), k)
                if sorted(chain(*i)) == s_arr])

s = k_subset([2,2,2,2,3,3,5], 3)
for ss in sorted(s):
    print(len(ss), " - ", ss)
As user3569 says "it runs pretty slow, but is fairly concise".
(EDIT: see below for Knuth's solution)
The output is:
3 - ((2,), (2,), (2, 2, 3, 3, 5))
3 - ((2,), (2, 2), (2, 3, 3, 5))
3 - ((2,), (2, 2, 2), (3, 3, 5))
3 - ((2,), (2, 2, 3), (2, 3, 5))
3 - ((2,), (2, 2, 5), (2, 3, 3))
3 - ((2,), (2, 3), (2, 2, 3, 5))
3 - ((2,), (2, 3, 3), (2, 2, 5))
3 - ((2,), (2, 3, 5), (2, 2, 3))
3 - ((2,), (2, 5), (2, 2, 3, 3))
3 - ((2,), (3,), (2, 2, 2, 3, 5))
3 - ((2,), (3, 3), (2, 2, 2, 5))
3 - ((2,), (3, 5), (2, 2, 2, 3))
3 - ((2,), (5,), (2, 2, 2, 3, 3))
3 - ((2, 2), (2, 2), (3, 3, 5))
3 - ((2, 2), (2, 3), (2, 3, 5))
3 - ((2, 2), (2, 5), (2, 3, 3))
3 - ((2, 2), (3, 3), (2, 2, 5))
3 - ((2, 2), (3, 5), (2, 2, 3))
3 - ((2, 3), (2, 2), (2, 3, 5))
3 - ((2, 3), (2, 3), (2, 2, 5))
3 - ((2, 3), (2, 5), (2, 2, 3))
3 - ((2, 3), (3, 5), (2, 2, 2))
3 - ((2, 5), (2, 2), (2, 3, 3))
3 - ((2, 5), (2, 3), (2, 2, 3))
3 - ((2, 5), (3, 3), (2, 2, 2))
3 - ((3,), (2, 2), (2, 2, 3, 5))
3 - ((3,), (2, 2, 2), (2, 3, 5))
3 - ((3,), (2, 2, 3), (2, 2, 5))
3 - ((3,), (2, 2, 5), (2, 2, 3))
3 - ((3,), (2, 3), (2, 2, 2, 5))
3 - ((3,), (2, 3, 5), (2, 2, 2))
3 - ((3,), (2, 5), (2, 2, 2, 3))
3 - ((3,), (3,), (2, 2, 2, 2, 5))
3 - ((3,), (3, 5), (2, 2, 2, 2))
3 - ((3,), (5,), (2, 2, 2, 2, 3))
3 - ((5,), (2, 2), (2, 2, 3, 3))
3 - ((5,), (2, 2, 2), (2, 3, 3))
3 - ((5,), (2, 2, 3), (2, 2, 3))
3 - ((5,), (2, 3), (2, 2, 2, 3))
3 - ((5,), (2, 3, 3), (2, 2, 2))
3 - ((5,), (3, 3), (2, 2, 2, 2))
Knuth's solution, as implemented by Adeel Zafar Soomro on the same Code Review page, can be called as follows if no duplicates are desired:
s = algorithm_u([2,2,2,2,3,3,5],3)
ss = set(tuple(sorted(tuple(tuple(y) for y in x) for x in s)))
I haven't timed it, but Knuth's solution is visibly faster, even for this test case.
However, it returns 63 tuples rather than the 41 returned by user3569's solution. I haven't yet gone through the output closely enough to establish which output is correct.
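One way to settle it would be to bring every partition from both outputs into a canonical form and compare the results as sets, along these lines (a sketch; output_a and output_b stand for the two result collections, and each partition is assumed to be an iterable of parts):

def canon(partition):
    # canonical form: sort the elements within each part, then sort the parts
    return tuple(sorted(tuple(sorted(part)) for part in partition))

# canon_a = set(canon(p) for p in output_a)
# canon_b = set(canon(p) for p in output_b)
# compare with canon_a == canon_b, canon_a - canon_b, canon_b - canon_a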
Here's a version in Haskell:
import Data.List (nub, sort, permutations)
parts 0 = []
parts n = nub $ map sort $ [n] : [x:xs | x <- [1..n`div`2], xs <- parts(n - x)]

partition [] ys result = sort $ map sort result
partition (x:xs) ys result =
    partition xs (drop x ys) (result ++ [take x ys])

partitions xs k =
    let variations = filter (\x -> length x == k) $ parts (length xs)
    in nub $ concat $ map (\x -> mapVariation x (nub $ permutations xs)) variations
    where mapVariation variation = map (\x -> partition variation x [])
OUTPUT:
*Main> partitions [1,2,2,3] 2
[[[1],[2,2,3]],[[1,2,3],[2]],[[1,2,2],[3]],[[1,2],[2,3]],[[1,3],[2,2]]]
Python solution:
pip install PartitionSets
Then:
import partitionsets.partition
filter(lambda x: len(x) == k, partitionsets.partition.Partition(arr))
The PartitionSets implementation seems to be pretty fast; however, it's a pity you can't pass the number of partitions as an argument, so you need to filter your k-set partitions out of all the set partitions.
You may also want to look at a similar topic on ResearchGate.
I encountered and solved this problem as part of a larger algorithm, but my solution seems inelegant and I would appreciate any insights.
I have a list of pairs which can be viewed as points on a Cartesian plane. I need to generate three lists: the sorted x values, the sorted y values, and a list which maps an index in the sorted x values with the index in the sorted y values corresponding to the y value with which it was originally paired.
A concrete example might help explain. Given the following list of points:
((3, 7), (15, 4), (7, 11), (5, 0), (4, 7), (9, 12))
The sorted list of x values would be (3, 4, 5, 7, 9, 15), and the sorted list of y values would be (0, 4, 7, 7, 11, 12).
Assuming a zero-based indexing scheme, the list that maps each index in the x list to the index of its paired y value in the y list would be (2, 3, 0, 4, 5, 1).
For example the value 7 appears as index 3 in the x list. The value in the mapping list at index 3 is 4, and the value at index 4 in the y list is 11, corresponding to the original pairing (7, 11).
What is the simplest way of generating this mapping list?
Here's a simple O(n log n) method (a code sketch follows the steps):
Sort the pairs by their x value: ((3, 7), (4, 7), (5, 0), (7, 11), (9, 12), (15, 4))
Produce a list of pairs in which the first component is the y value from the same position in the previous list and the second increases from 0: ((7, 0), (7, 1), (0, 2), (11, 3), (12, 4), (4, 5))
Sort this list by its first component (y value): ((0, 2), (4, 5), (7, 0), (7, 1), (11, 3), (12, 4))
Iterate through this list. For the ith such pair (y, k), set yFor[k] = i. yFor[] is your list (well, array) mapping indices in the sorted x list to indices in the sorted y list.
Create the sorted x list simply by removing the 2nd element from the list produced in step 1.
Create the sorted y list by doing the same with the list produced in step 3.
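A minimal Python sketch of these steps, using the question's points (the names xs, ys and y_for are only illustrative):

points = [(3, 7), (15, 4), (7, 11), (5, 0), (4, 7), (9, 12)]

by_x = sorted(points)                                # step 1
tagged = [(y, i) for i, (x, y) in enumerate(by_x)]   # step 2
by_y = sorted(tagged)                                # step 3

y_for = [0] * len(points)                            # step 4
for i, (y, k) in enumerate(by_y):
    y_for[k] = i

xs = [x for (x, y) in by_x]                          # step 5
ys = [y for (y, k) in by_y]                          # step 6
# xs == [3, 4, 5, 7, 9, 15]; ys == [0, 4, 7, 7, 11, 12]; y_for == [2, 3, 0, 4, 5, 1]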
I propose the following.
Generate the unsorted x and y lists.
xs = [3, 15, 7, 5, 4, 9 ]
ys = [7, 4, 11, 0, 7, 12]
Transform each element into a tuple - the first of the pair being the coordinate, the second being the original index.
xs = [(3, 0), (15, 1), ( 7, 2), (5, 3), (4, 4), ( 9, 5)]
ys = [(7, 0), ( 4, 1), (11, 2), (0, 3), (7, 4), (12, 5)]
Sort both lists.
xs = [(3, 0), (4, 4), (5, 3), (7, 2), ( 9, 5), (15, 1)]
ys = [(0, 3), (4, 1), (7, 0), (7, 4), (11, 2), (12, 5)]
Create an array, y_positions. The nth element of the array contains the index, in the sorted ys list, of the y element that was originally at index n.
Create an empty index_list.
For each element of xs, get the original_index, the second member of the tuple.
Use y_positions to retrieve the current index of the y element with the given original_index. Add the current index to index_list.
Finally, remove the index values from xs and ys.
Here's a sample Python implementation.
points = ((3, 7), (15, 4), (7, 11), (5, 0), (4, 7), (9, 12))
#generate unsorted lists
xs, ys = zip(*points)
#pair each element with its index
xs = zip(xs, range(len(xs)))
ys = zip(ys, range(len(ys)))
#sort
xs.sort()
ys.sort()
#generate the y positions list.
y_positions = [None] * len(ys)
for i in range(len(ys)):
    original_index = ys[i][1]
    y_positions[original_index] = i
#generate `index_list`
index_list = []
for x, original_index in xs:
    index_list.append(y_positions[original_index])
#remove tuples from x and y lists
xs = zip(*xs)[0]
ys = zip(*ys)[0]
print "xs:", xs
print "ys:", ys
print "index list:", index_list
Output:
xs: (3, 4, 5, 7, 9, 15)
ys: (0, 4, 7, 7, 11, 12)
index list: [2, 3, 0, 4, 5, 1]
Generation of y_positions and index_list is O(n) time, so the complexity of the algorithm as a whole is dominated by the sorting step.
Thank you for the answers. For what it's worth, the solution I had was pretty similar to those outlined, but as j_random_hacker pointed out, there's no need for a map. It just struck me that this little problem seems more complicated than it appears at first glance and I was wondering if I was missing something obvious. I've rehashed my solution into Python for comparison.
points = ((3, 7), (15, 4), (7, 11), (5, 0), (4, 7), (9, 12))
N = len(points)
# Separate the points into their x and y components, tag the values with
# their index into the points list.
# Sort both resulting (value, tag) lists and then unzip them into lists of
# sorted x and y values and the tag information.
xs, s = zip(*sorted(zip([x for (x, y) in points], range(N))))
ys, r = zip(*sorted(zip([y for (x, y) in points], range(N))))
# Generate the mapping list.
t = N * [0]
for i in range(N):
    t[r[i]] = i
index_list = [t[j] for j in s]
print "xs:", xs
print "ys:", ys
print "index_list:", index_list
Output:
xs: (3, 4, 5, 7, 9, 15)
ys: (0, 4, 7, 7, 11, 12)
index_list: [2, 3, 0, 4, 5, 1]
I've just understood what j_random_hacker meant by removing a level of indirection by sorting the points in x initially. That allows things to be tidied up nicely. Thanks.
points = ((3, 7), (15, 4), (7, 11), (5, 0), (4, 7), (9, 12))
N = len(points)
ordered_by_x = sorted(points)
ordered_by_y = sorted(zip([y for (x, y) in ordered_by_x], range(N)))
index_list = N * [0]
for i, (y, k) in enumerate(ordered_by_y):
    index_list[k] = i
xs = [x for (x, y) in ordered_by_x]
ys = [y for (y, k) in ordered_by_y]
print "xs:", xs
print "ys:", ys
print "index_list:", index_list
Say, we have an N-dimensional grid and some point X in it with coordinates (x1, x2, ..., xN).
For simplicity we can assume that the grid is unbounded.
Let there be a radius R and a sphere of this radius with center X, that is, the set of all points in the grid whose Manhattan distance from X is equal to R.
I suspect that there will be 2*N*R such points.
My question is: how do I enumerate them in an efficient and simple way? By "enumerate" I mean an algorithm which, given N, X and R, will produce the list of points that form this sphere (where a point is the list of its coordinates).
UPDATE: Initially I called the metric I used "Hamming distance" by mistake. My apologies to all who answered the question. Thanks to Steve Jessop for pointing this out.
Consider the minimal axis-aligned hypercube that bounds the hypersphere and write a procedure to enumerate the grid points inside the hypercube.
Then you only need a simple filter function that allows you to discard the points that are on the cube but not in the hypersphere.
This is a simple and efficient solution for small dimensions. For instance, for 2D, 20% of the points enumerated for the bounding square are discarded; for 6D, almost 90% of the hypercube points are discarded.
For higher dimensions, you will have to use a more complex approach: loop over every dimension (you may need a recursive function if the number of dimensions is variable). For every loop you will have to adjust the minimal and maximal values depending on the values of the grid components already fixed. Well, try doing it for 2D, enumerating the points of a circle, and once you understand it, generalizing the procedure to higher dimensions will be pretty simple.
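For what it's worth, here is a minimal Python sketch of the bounding-hypercube-plus-filter idea (the function names are only illustrative). The distance test is passed in as a parameter, so the same code covers the Manhattan distance addressed in the update below.

from itertools import product

def sphere_points(center, radius, dist):
    # enumerate the grid points of the axis-aligned bounding hypercube
    # around `center` and keep those at distance exactly `radius`
    ranges = [range(c - radius, c + radius + 1) for c in center]
    return [p for p in product(*ranges) if dist(p, center) == radius]

def manhattan(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))

# e.g. sphere_points((0, 0), 2, manhattan) yields the 8 points of the 2-D "diamond" of radius 2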
update: errh, wait a minute, you want to use the Manhattan distance. Calling the cross polytope "sphere" may be correct but I found it quite confusing! In any case you can use the same approach.
If you only want to enumerate the points on the hyper-surface of the cross polytope, well, the solution is also very similar, you have to loop over every dimension with appropriate limits. For instance:
for (i = 0; i <= n; i++)
    for (j = 0; j + i <= n; j++)
        ...
            for (l = 0; l + ... + j + i <= n; l++) {
                m = n - l - ... - j - i;
                printf(pat, i, j, ..., l, m);
            }
For every point generated that way, you will then have to consider all the variations resulting from negating any of the components, to cover all the faces, and then displace them by the vector X.
Update: a Perl implementation for the case where X = 0:
#!/usr/bin/perl
use strict;
use warnings;
sub enumerate {
    my ($d, $r) = @_;
    if ($d == 1) {
        return ($r ? ([-$r], [$r]) : [0])
    }
    else {
        my @r;
        for my $i (0..$r) {
            for my $s (enumerate($d - 1, $r - $i)) {
                for my $j ($i ? (-$i, $i) : 0) {
                    push @r, [@$s, $j]
                }
            }
        }
        return @r;
    }
}

@ARGV == 2 or die "Usage:\n  $0 dimension radius\n\n";
my ($d, $r) = @ARGV;
my @r = enumerate($d, $r);
print "[", join(',', @$_), "]\n" for @r;
Input: radius R, dimension D
Generate all integer partitions of R with cardinality ≤ D
For each partition, permute it without repetition
For each permutation, twiddle all the signs
For example, code in Python:
from itertools import *
# we have to write this function ourselves because python doesn't have it...
def partitions(n, maxSize):
    if n == 0:
        yield []
    else:
        for p in partitions(n-1, maxSize):
            if len(p) < maxSize:
                yield [1] + p
            if p and (len(p) < 2 or p[1] > p[0]):
                yield [p[0] + 1] + p[1:]

# MAIN CODE
def points(R, D):
    for part in partitions(R, D):             # e.g. 4->[3,1]
        part = part + [0]*(D - len(part))     # e.g. [3,1]->[3,1,0] (padding)
        for perm in set(permutations(part)):  # e.g. [1,3,0], [1,0,3], ...
            for point in product(*[           # e.g. [1,3,0], [-1,3,0], [1,-3,0], [-...
                ([-x, x] if x != 0 else [0]) for x in perm
            ]):
                yield point
Demo for radius=4, dimension=3:
>>> result = list( points(4,3) )
>>> result
[(-1, -2, -1), (-1, -2, 1), (-1, 2, -1), (-1, 2, 1), (1, -2, -1), (1, -2, 1), (1, 2, -1), (1, 2, 1), (-2, -1, -1), (-2, -1, 1), (-2, 1, -1), (-2, 1, 1), (2, -1, -1), (2, -1, 1), (2, 1, -1), (2, 1, 1), (-1, -1, -2), (-1, -1, 2), (-1, 1, -2), (-1, 1, 2), (1, -1, -2), (1, -1, 2), (1, 1, -2), (1, 1, 2), (0, -2, -2), (0, -2, 2), (0, 2, -2), (0, 2, 2), (-2, 0, -2), (-2, 0, 2), (2, 0, -2), (2, 0, 2), (-2, -2, 0), (-2, 2, 0), (2, -2, 0), (2, 2, 0), (-1, 0, -3), (-1, 0, 3), (1, 0, -3), (1, 0, 3), (-3, -1, 0), (-3, 1, 0), (3, -1, 0), (3, 1, 0), (0, -1, -3), (0, -1, 3), (0, 1, -3), (0, 1, 3), (-1, -3, 0), (-1, 3, 0), (1, -3, 0), (1, 3, 0), (-3, 0, -1), (-3, 0, 1), (3, 0, -1), (3, 0, 1), (0, -3, -1), (0, -3, 1), (0, 3, -1), (0, 3, 1), (0, -4, 0), (0, 4, 0), (0, 0, -4), (0, 0, 4), (-4, 0, 0), (4, 0, 0)]
>>> len(result)
66
(Above I used set(permutations(...)) to get permutations without repetition, which is not efficient in general, but it might not matter here due to the nature of the points. And if efficiency mattered, you could write your own recursive function in your language of choice.)
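For instance, a recursive generator of distinct permutations could look like the sketch below; it could stand in for set(permutations(part)) above.

def distinct_permutations(items):
    # yield each permutation of a multiset exactly once, by picking each
    # distinct value at most once per position
    items = list(items)
    if not items:
        yield ()
        return
    seen = set()
    for i, x in enumerate(items):
        if x in seen:
            continue
        seen.add(x)
        for tail in distinct_permutations(items[:i] + items[i+1:]):
            yield (x,) + tail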
This method is efficient because it does not scale with the hypervolume, but only with the hypersurface, which is what you're trying to enumerate (this might not matter much except for very large radii: e.g. it will save you roughly a factor of 100 in speed if your radius is 100).
You can work your way recursively from the center, counting zero distance once and working on symmetries. This Python implementation works on the lower-dimension "stem" vector and realizes one 1-dimensional slice at a time. One might also do the reverse, but it would imply iterating on the partial hyperspheres. While mathematically the same, the efficiency of both approaches is heavily language-dependent.
If you know beforehand the cardinality of the target space, I would recommend writing an iterative implementation.
The following enumerates the points on a R=16 hyper-LEGO block in six dimensions in about 200 ms on my laptop. Of course, performance rapidly decreases with more dimensions or larger spheres.
def lapp(lst, el):
    lst2 = list(lst)
    lst2.append(el)
    return lst2

def hypersphere(n, r, stem = [ ]):
    mystem = lapp(stem, 0)
    if 1 == n:
        ret = [ mystem ]
        for d in range(1, r+1):
            ret.append(lapp(stem, d))
            ret.append(lapp(stem, -d))
    else:
        ret = hypersphere(n-1, r, mystem)
        for d in range(1, r+1):
            mystem[-1] = d
            ret.extend(hypersphere(n-1, r-d, mystem))
            mystem[-1] = -d
            ret.extend(hypersphere(n-1, r-d, mystem))
    return ret
(This implementation assumes the hypersphere is centered at the origin. It is easier to translate all points afterwards than to carry along the coordinates of the center.)
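A quick sanity check and the translation step might look like this (a sketch, using the hypersphere function defined above and a made-up center):

ball = hypersphere(2, 2)      # the 13 grid points with |x| + |y| <= 2
print(len(ball))              # 13

center = [10, -3]             # hypothetical center X
translated = [[c + d for c, d in zip(center, p)] for p in ball]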