Help me find a good algorithm?
I have a bag full of n balls. (Let's say 28 balls for an example.)
The balls in this bag each have 1 color. There are <= 4 different colors of balls in the bag. (Let's say red, green, blue, and purple are possibilities.)
I have three buckets, each with a number of how many balls they need to end up with. These numbers total n. (For example, let's say bucket A needs to end up with 7 balls, bucket B needs to end up with 11 balls, bucket C needs to end up with 10 balls.)
The buckets also may or may not have color restrictions - colors that they will not accept. (Bucket A doesn't accept purple balls or green balls. Bucket B doesn't accept red balls. Bucket C doesn't accept purple balls or blue balls.)
I need to distribute the balls efficiently and randomly (equal probability of all possibilities).
I can't just randomly put balls in buckets that have space to accept them, because that could bring me to a situation where the only bucket that has space left in it does not accept the only color that is left in the bag.
It is given that there is always at least 1 possibility for distributing the balls. (I will not have a bag of only red balls and some bucket with number > 0 doesn't accept red balls.)
All of the balls are considered distinct, even if they are the same color. (One of the possibilities where bucket C gets red ball 1 and not red ball 2 is different from the possibility where everything is the same except bucket C gets red ball 2 and not red ball 1.)
Edit to add my idea:
I don't know if this has equal probability of all possibilities, as I would like. I haven't figured out the efficiency; it doesn't seem too bad.
And this contains an assertion that I'm not sure if it's always true.
Please comment on any of these things if you know.
Choose a ball from the bag at random. (Call it "this ball".)
If this ball fits and is allowed in a number of buckets > 0:
    Choose one of those buckets at random and put this ball in that bucket.
else (this ball is not allowed in any bucket that it fits in):
    Make a list of colors that can go in buckets that are not full.
    Make a list of balls of those colors that are in full buckets that this ball is allowed in.
    If that 2nd list is length 0 (there are no balls of colors from the 1st list in the bucket that allows the color of this ball):
        ASSERT: (Please show me an example situation where this might not be the case.)
        There is a 3rd bucket that is not involved in the previously used buckets in this algorithm.
        (One bucket is full and is the only one that allows this ball.
        A second bucket is the only one not full and doesn't allow this ball or any ball in the first bucket.
        The 3rd bucket is full, must allow some color that is in the first bucket, and must have some ball that is allowed in the second bucket.)
        Choose, at random, a ball from the 3rd bucket's balls of colors that fit in the 2nd bucket, and move that ball to the 2nd bucket.
        Choose, at random, a ball from the 1st bucket's balls of colors that fit in the 3rd bucket, and move that ball to the 3rd bucket.
        Put "this ball" (finally) in the 1st bucket.
    else:
        Choose a ball randomly from that list, and move it to a random bucket that is not full.
        Put "this ball" in a bucket that allows it.
Next ball.
Here's an O(n^3)-time algorithm. (The 3 comes from the number of buckets.)
We start by sketching a brute-force enumeration algorithm, then extract an efficient counting algorithm, then show how to sample.
We enumerate with an algorithm that has two nested loops. The outer loop iterates through the balls. The color of each ball does not matter; only that it can be placed in certain buckets but not others. At the beginning of each outer iteration, we have a list of partial solutions (assignments of the balls considered so far to buckets). The inner loop is over partial solutions; we add several partial solutions to a new list by extending the assignment in all valid ways. (The initial list has one element, the empty assignment.)
To count solutions more efficiently, we apply a technique called dynamic programming or run-length encoding depending on how you look at it. If two partial solutions have the same counts in each bucket (O(n^3) possibilities over the life of the algorithm), then all valid extensions of one are valid extensions of the other and vice versa. We can annotate the list elements with a count and discard all but one representative of each "equivalence class" of partial solutions.
Finally, to get a random sample, instead of choosing the representative arbitrarily, when we are combining two list entries, we sample the representative from each side proportionally to that side's count.
Working Python code (O(n^4) for simplicity; there are data structural improvements possible).
#!/usr/bin/env python3
import collections
import random


def make_key(buckets, bucket_sizes):
    return tuple(bucket_sizes[bucket] for bucket in buckets)


def sample(balls, final_bucket_sizes):
    buckets = list(final_bucket_sizes)
    partials = {(0,) * len(buckets): (1, [])}
    for ball in balls:
        next_partials = {}
        for count, partial in partials.values():
            for bucket in ball:
                next_partial = partial + [bucket]
                key = make_key(buckets, collections.Counter(next_partial))
                if key in next_partials:
                    existing_count, existing_partial = next_partials[key]
                    total_count = existing_count + count
                    next_partials[key] = (total_count,
                                          existing_partial
                                          if random.randrange(total_count) < existing_count
                                          else next_partial)
                else:
                    next_partials[key] = (count, next_partial)
        partials = next_partials
    return partials[make_key(buckets, final_bucket_sizes)][1]


def test():
    red = {'A', 'C'}
    green = {'B', 'C'}
    blue = {'A', 'B'}
    purple = {'B'}
    balls = [red] * 8 + [green] * 8 + [blue] * 8 + [purple] * 4
    final_bucket_sizes = {'A': 7, 'B': 11, 'C': 10}
    return sample(balls, final_bucket_sizes)


if __name__ == '__main__':
    print(test())
I am not really sure what trade-off you want between a random, a correct, and an efficient distribution.
If you want a completely random distribution, just pick a ball and put it randomly in a bucket it can go in. That would be pretty efficient, but you may easily make a bucket overflow.
If you want to be sure to be correct and random, you could generate all the correct distributions and pick one of them randomly, but that could be very inefficient, since the basic brute-force algorithm for generating every distribution has a complexity of roughly NumberOfBuckets^NumberOfBalls.
A better algorithm for generating all the correct cases is to build every case that satisfies your two rules (a bucket B1 can only hold N1 balls, and a bucket only accepts certain colors) color by color. For instance:
//let a distribution D be a tuple N1,...,Nx of the current number of balls each bucket can accept
void DistributeColor(Distribution D, Color C) {
    DistributeBucket(D, B1, C);
}

void DistributeBucket(Distribution D, Bucket B, Color C) {
    if (B.canAccept(C)) {
        for (int i = 0; i <= min(D[B], C.N); i++) {
            //we put i balls of the color C in the bucket B
            C.N -= i;
            D[B] -= i;
            if (C.N == 0) {
                //we have no more balls of this color
                if (isLastColor(C)) {
                    //this was the last color, so it is a valid solution
                    save(D);
                } else {
                    //this was not the last color, try the next color
                    DistributeColor(D, nextColor(C));
                }
            } else {
                //we still have balls
                if (isNotLastBucket(B)) {
                    //this was not the last bucket, let's try to fill the next one
                    DistributeBucket(D, nextBucket(B), C);
                } else {
                    //this was the last bucket, so this distribution is not a solution; do nothing
                }
            }
            //reset the balls for the next try
            C.N += i;
            D[B] += i;
        }
    } else {
        //it feels like déjà vu
        if (isNotLastBucket(B)) {
            //this was not the last bucket, let's try to fill the next one
            DistributeBucket(D, nextBucket(B), C);
        } else {
            //this was the last bucket, so this distribution is not a solution
        }
    }
}
(This code is pseudo C++ and is not meant to be runnable. Note that the bucket-skipping case must only run when the bucket rejects the color; otherwise the i = 0 iteration of the loop already covers it and solutions would be counted twice.)
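Since the pseudocode above is not runnable, here is a rough Python equivalent of the same color-by-color search. The function name, the counts/caps/allowed data layout, and yielding count maps instead of calling save() are my own choices:

```python
def all_distributions(counts, caps, allowed):
    """Enumerate every assignment of {color: count} balls to buckets,
    where caps maps bucket -> capacity and allowed maps
    bucket -> set of accepted colors. Yields {(bucket, color): n}."""
    colors = list(counts)
    buckets = list(caps)

    def next_color(ci, caps_left, partial):
        if ci == len(colors):
            yield dict(partial)     # every ball of every color is placed
            return
        yield from per_bucket(ci, 0, counts[colors[ci]], caps_left, partial)

    def per_bucket(ci, bi, remaining, caps_left, partial):
        if remaining == 0:          # this color is fully placed, move on
            yield from next_color(ci + 1, caps_left, partial)
            return
        if bi == len(buckets):      # balls of this color left over: dead end
            return
        b = buckets[bi]
        top = min(remaining, caps_left[b]) if colors[ci] in allowed[b] else 0
        for i in range(top + 1):    # put i balls of this color in bucket b
            caps2 = dict(caps_left)
            caps2[b] -= i
            yield from per_bucket(ci, bi + 1, remaining - i, caps2,
                                  partial + [((b, colors[ci]), i)])

    yield from next_color(0, dict(caps), [])
```

Having enumerated the count assignments, the distinct balls can then be shuffled within each color, as the question requires.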
1 First you choose 7 of the 28: you have C(28,7) = 1,184,040 possibilities.
2 Second, you choose 11 of the remaining 21: you have C(21,11) = 352,716 possibilities.
3 The remaining 10 elements go in bucket C.
At each step, if your choice doesn't fit the rules, you stop and start over.
Altogether that makes 417,629,852,640 possibilities (without restrictions).
It is not very efficient, but for a single choice it doesn't matter much. If the restrictions are not too restrictive, you do not lose too much time.
If there are very few solutions, you must restrict the combinations (only good colors).
In some cases at least, this problem can be solved quite quickly by
first using the constraints to reduce the problem to a more manageable
size, then searching the solution space.
First, note that we can ignore the distinctness of the balls for the
main part of the algorithm. Having found a solution only considering
color, it’s trivial to randomly assign distinct ball numbers per color
by shuffling within each color.
To restate the problem and clarify the notion of equal probability, here
is a naive algorithm that is simple and correct but potentially very
inefficient:
Sort the balls into some random order with uniform probability. Each
of the n! permutations is equally likely. This can be done with
well-known shuffling algorithms.
Assign the balls to buckets in sequence according to capacity. In
other words, using the example buckets, first 7 to A, next 11 to B,
last 10 to C.
Check if the color constraints have been met. If they have not been
met, go back to the beginning; else stop.
This samples from the space of all permutations with equal probability
and filters out the ones that don’t meet the constraints, so uniform
probability is satisfied. However, given even moderately severe
constraints, it might loop many millions of times before finding a
solution. On the other hand, if the problem is not very constrained, it
will find a solution quickly.
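For reference, the naive algorithm is only a few lines of Python. This is a sketch; the function name and the convention that the dict's insertion order gives the assignment order are my own:

```python
import random

def naive_sample(balls, caps, allowed):
    """balls: list of colors; caps: dict bucket -> size (insertion order
    gives the assignment order); allowed: bucket -> accepted colors.
    Shuffle, slice into buckets, and retry until the constraints hold."""
    order = list(balls)
    while True:
        random.shuffle(order)           # uniform over all n! permutations
        pos, result = 0, {}
        for bucket, size in caps.items():
            result[bucket] = order[pos:pos + size]
            pos += size
        if all(color in allowed[bucket]
               for bucket, chunk in result.items() for color in chunk):
            return result               # constraints met; uniform sample
```

As argued above, this is uniform by construction but may loop a very long time on constrained instances.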
We can exploit both of these facts by first examining the constraints
and the number of balls of each color. For example, consider the
following:
A: 7 balls; allowed colors (red, blue)
B: 11 balls; allowed colors (green, blue, purple)
C: 10 balls; allowed colors (red, green)
Balls: 6 red, 6 green, 10 blue, 6 purple
In a trial run with these parameters, the naive algorithm failed to find
a valid solution within 20 million iterations. But now let us reduce the
problem.
Note that all 6 purple balls must go in B, because it’s the only bucket
that can accept them. So the problem reduces to:
Preassigned: 6 purple in B
A: 7 balls; allowed colors (red, blue)
B: 5 balls; allowed colors (green, blue)
C: 10 balls; allowed colors (red, green)
Balls: 6 red, 6 green, 10 blue
C needs 10 balls, and can only take red and green. There are 6 of each.
The possible counts are 4+6, 5+5, 6+4. So we must put at least 4 red and
4 green in C.
Preassigned: 6 purple in B, 4 red in C, 4 green in C
A: 7 balls; allowed colors (red, blue)
B: 5 balls; allowed colors (green, blue)
C: 2 balls; allowed colors (red, green)
Balls: 2 red, 2 green, 10 blue
We have to put 10 blue balls somewhere. C won’t take any. B can take 5
at most; the other 5 must go in A. A can take 7 at most; the other 3
must go in B. Thus, A must take at least 5 blue, and B must take at
least 3 blue.
Preassigned: 6 purple in B, 4 red in C, 4 green in C, 5 blue in A, 3 blue in B
A: 2 balls; allowed colors (red, blue)
B: 2 balls; allowed colors (green, blue)
C: 2 balls; allowed colors (red, green)
Balls: 2 red, 2 green, 2 blue
At this point, the problem is trivial: checking random solutions to the
reduced problem will find a valid solution within a few tries.
For the fully-reduced problem, 80 out of 720 permutations are valid, so
a valid solution will be found with probability 1/9. For the original
problem, out of 28! permutations there are 7! * 11! * 10! * 80 valid
solutions, and the probability of finding one is less than one in five
billion.
Turning the human reasoning used above into a reducing algorithm is more
difficult, and I will only consider it briefly. Generalizing from the
specific cases above:
Are there any balls that will only go into one bucket?
Is there a bucket that must take some minimum number of balls of one
or more colors?
Is there a color that can only go into certain buckets?
If these don’t reduce a specific problem sufficiently, examination of
the problem may yield other reductions that can then be coded.
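The first of those reductions is easy to mechanize. A sketch in Python; the function name and data layout are mine:

```python
def preassign_single_bucket_colors(counts, caps, allowed):
    """If a color is accepted by exactly one bucket, all balls of that
    color must go there. Mutates counts and caps in place and returns
    the forced assignments as (count, color, bucket) triples."""
    forced = []
    for color in list(counts):
        takers = [b for b in caps if color in allowed[b]]
        if len(takers) == 1:
            bucket = takers[0]
            forced.append((counts[color], color, bucket))
            caps[bucket] -= counts.pop(color)  # shrink the bucket's quota
    return forced
```

On the example above, this immediately pins the 6 purple balls to B and reduces B's capacity from 11 to 5.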
Finally: will this always work well? It’s hard to be sure, but I suspect
it will, in most cases, because the constraints are what cause the naive
algorithm to fail. If we can use the constraints to reduce the problem
to one where the constraints don’t matter much, the naive algorithm
should find a solution without too much trouble; the number of valid
solutions should be a reasonably large fraction of all the
possibilities.
Afterthought: the same reduction technique would improve the performance
of the other answers here, too, assuming they’re correct.
Related
Given the following polygon, which is divided into sub-polygons as depicted below [left], I would like to create n number of contiguous, equally sized groups of sub-polygons [right, where n=6]. There is no regular pattern to the sub-polygons, though they are guaranteed to be contiguous and without holes.
This is not splitting a polygon into equal shapes, it is grouping its sub-polygons into equal, contiguous groups. The initial polygon may not have a number of sub-polygons divisible by n, and in these cases non-equally sized groups are ok. The only data I have is n, the number of groups to create, and the coordinates of the sub-polygons and their outer shell (generated through a clipping library).
My current algorithm is as follows:
list sub_polygons[]  # list of polygon objects
for i in range(n - 1):
    # start a new grouping
    pick random sub_polygon from list as a starting point
    remove this sub_polygon from the list
    add this sub_polygon to the current group
    while (number of shapes in group < number needed to be added):
        add a sub_polygon that the group borders to the group
        remove this sub_polygon from the sub-polygons list
add all remaining sub-shapes to the final group
This runs into problems with contiguity, however. The below illustrates the problem - if the red polygon is added to the blue group, it cuts off the green polygon such that it cannot be added to anything else to create a contiguous group.
It's simple to add a check for this when adding a sub-polygon to a group, such as
if removing sub-polygon from list will create non-contiguous union
    pass;
but this runs into edge conditions where every possible shape that can be added creates a non-contiguous union of the available sub-polygons. In the below, my current algorithm is trying to add a sub-polygon to the red group, and with the check for contiguity is unable to add any:
Is there a better algorithm for grouping the sub-polygons?
I think it's too complicated to be solved in a single pass. Whatever criterion is used for selecting the next polygon, the process may get stuck somewhere in the middle. So you need an algorithm that can go back and change previous decisions in such cases. The classic algorithm that does so is backtracking.
But before starting, let's change the representation of the problem. These polygons form a graph like this:
This is the pseudocode of the algorithm:
function [selected, stop] = BackTrack(G, G2, selected, lastGroupLen, groupSize)
    if (length(selected) == length(G.Nodes))
        stop = true;
        return;
    end
    stop = false;
    if (lastGroupLen == groupSize)
        // start a new group
        lastGroupLen = 0;
    end
    // check continuity of remaining part of graph
    if (discomp(G2) > length(selected))
        return;
    end
    if (lastGroupLen == 0)
        available = G.Nodes - selected;
    else
        available = [];
        // find all nodes connected to current group
        for each node in last lastGroupLen selected nodes
            available = union(available, neighbors(G, node));
        end
        available = available - selected;
    end
    if (length(available) == 0)
        return;
    end
    lastSelected = selected;
    for each node in available
        [selected, stop] = BackTrack(G, removeEdgesTo(G2, node),
            union(lastSelected, node), lastGroupLen + 1, groupSize);
        if (stop)
            break;
        end
    end
end
where:
selected: an ordered set of nodes that can be divided to n consecutive groups
stop: becomes true when the solution was found
G: the initial graph
G2: what remains of the graph after removing all edges to last selected node
lastGroupLen: number of nodes selected for last group
groupSize: maximum allowable size of each group
discomp(): returns number of discontinuous components of the graph
removeEdgesTo(): removes all edges connected to a node
That should be called like:
[ selected, stop ] = BackTrack( G, G, [], 0, groupSize);
I hope that is clear enough. It goes like this:
Just keep in mind that the performance of this algorithm can be severely affected by the order of the nodes. One way to speed it up is to order the polygons by their centroids:
But there is another solution, if, like me, you are not satisfied with this outcome. You can order the available set of nodes by their degrees in G2, so that in each step, nodes that have less chance of making the graph disconnected are visited first:
And as a more complicated problem, I tested the map of Iran, which has 262 counties. I set groupSize to 20:
I think you can just follow the procedure:
Take some contiguous group of sub-polygons lying on the perimeter of the current polygon (if the number of polygons on the perimeter is less than the target size of the group, just take all of them and take whatever more you need from the next perimeter, and repeat until you reach your target group size).
Remove this group and consider the new polygon that consists of the remaining sub-polygons.
Repeat until remaining polygon is empty.
Implementation is up to you but this method should ensure that all formed groups are contiguous and that the remaining polygon formed at step 2 is contiguous.
EDIT:
Never mind, user58697 raises a good point, a counterexample to the algorithm above would be a polygon in the shape of an 8, where one sub-polygon bridges two other polygons.
Question: Given M points on a line, separated by 1 unit each, find the number of ways N circles of different radii can be drawn so that they don't intersect, overlap, or lie one inside another, provided that the centers of the circles are those M points.
Example 1: N=3, M=6, r1=1, r2=1, r3=1. Answer: 24 ways.
Example 2: N=2, M=5, r1=1, r2=2. Answer: 6 ways.
Example 3: N=1, M=10, r=50. Answer: 10 ways.
I found this question online and have not been able to solve it. So far I have only worked out that any circle can take spaces from n−r to n−2r. But among other issues, how can I adjust for edge cases in which a circle with radius 3 takes the (n−4)th point? Then the last point will be left untouched, but I cannot place any circle with a radius greater than 1. I am not able to see a generalized mathematical solution to this.
If the centers of the circles could be placed at non-integer coordinates, then there would be either no solution (the length being too short) or infinitely many (enough length, and infinitely many translations). So, since you have to compute a result, I will assume that the coordinates of the M points are integers.
If there is a single circle, then the solution is the number of points the circle can be legally placed.
If there are at least two circles, then you need to calculate the sum of the diameters; if that happens to be larger than the total length of the line, then you have no solution. If that is not the case, then subtract the sum of the diameters from the total length, getting Complementer. You also have N! permutations for the order of the circles. And you will have Complementer - 1 possible locations where you can distribute the gaps between the circles. The lengths of the gaps are G1, ..., Gn-1.
We know that G1 + ... + Gn-1 = Complementer
The number of possible distributions of G1, ..., Gn-1 is D. The formula therefore would be:
N! * D
The remaining question is: how can we compute D?
Solution:
function distr(depth, maxDepth, amount)
    if (depth = maxDepth) then
        return 1 //we need to put the remaining elements in the last slot
    end if
    sum = 1 //if we put the whole amount here, that is the trivial case
    for i = amount - 1 to 0 do
        sum = sum + distr(depth + 1, maxDepth, amount - i)
    end for
    return sum
end distr
You need to call distr with depth = 1, maxDepth = N-1, amount = Complementer.
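If I'm reading the recursion right (with the running sum accumulated across loop iterations), distr just counts the ways to write `amount` as an ordered sum of non-negative slot sizes, so a memoized Python version can be checked against the stars-and-bars closed form. The closed-form comparison is my own observation:

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def distr(depth, max_depth, amount):
    """Ways to spread `amount` gap units over slots depth..max_depth
    (the last slot simply takes whatever is left)."""
    if depth == max_depth:
        return 1
    total = 1  # trivial case: put the whole amount in this slot
    for i in range(amount - 1, -1, -1):
        # leave i units here; amount - i go to the later slots
        total += distr(depth + 1, max_depth, amount - i)
    return total
```

With s = max_depth - depth + 1 slots, distr(depth, max_depth, amount) appears to equal comb(amount + s - 1, s - 1), so the recursion could be replaced by that single binomial coefficient.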
I've implemented a SAH kd-tree based upon the paper On building fast kd-Trees for Ray Tracing, and on doing that in O(N log N) by Wald and Havran. Note I haven't done their splicing and merging suggestion right at the end to speed up the building of the tree, just the SAH part.
I'm testing the algorithm with an axis-aligned cube where each face is split into two triangles, so I have N = 12 triangles in total. The bottom-left vertex (i.e. the one nearest the axes) is actually at the origin.
Face Triangle
----------------
Front: 0, 1
Left: 6, 7
Right: 2, 3
Top: 4, 5
Bottom: 10, 11
Back: 8, 9
Assuming node traversal cost Ct = 1.0 and intersection cost Ci = 10.0. We first find the cost of not splitting which is Cns = N * Ci = 12 * 10.0 = 120.0.
Now we go through each axis in turn and do an incremental sweep through the candidate split planes to see if the cost of splitting is cheaper. The first split plane is p = <x,0>. We have Nl = 0, Np = 2 and Nr = 10 (that is, the number of triangles on the left, in the plane, and to the right of the plane). The two triangles in the plane are number 6 and 7 (the left face). All others are to the right.
The SAH(p,V,Nl,Nr,Np) function is now executed. This takes the split plane, the voxel V to be split, and the numbers of left/right/plane triangles. It computes the probability of hitting the left (flat) voxel as Pl = SA(Vl)/SA(V) = 50/150 = 1/3, the right probability as Pr = SA(Vr)/SA(V) = 150/150 = 1. Now it runs the cost function twice; first with the planar triangles on the left, then with the planar triangles on the right to get Cl and Cr respectively.
The cost function C(Pl,Pr,Nl,Nr) returns bias * (Ct + Ci(Pl * Nl + Pr * Nr))
Cl: cost with planar triangles on the left (Nl = 2, Nr = 10)
bias = 1 we aren't biasing as neither left nor right voxel is empty.
Cl = 1 * (1 + 10(1/3 * 2 + 1 * 10)) = 107.666
Cr: cost with planar triangles on the right (Nl = 0, Nr = 12)
bias = 0.8 empty cell bias comes into play.
Cr = 0.8 * (1 + 10(1/3 * 0 + 1 * 12)) = 96.8
The algorithm determines that Cr = 96.8 is better than splitting the two triangles off into a flat cell Cl = 107.666 and also better than not splitting the voxel at all Cns = 120.0. No other candidate splits are found to be cheaper. We therefore split into an empty left child, and a right child containing all the triangles. When we recurse into the right child to continue the tree building, we perform exactly the same steps as above. It's only because of a max depth termination criterion that this doesn't cause a stack overflow. The resultant tree is very deep.
The paper claims to have thought about this sort of thing:
During clipping, special care has to be taken to correctly handle
special cases like “flat” (i.e., zero-volume) cells, or cases where
numerical inaccuracies may occur (e.g., for cells that are very thin
compared to the size of the triangle). For example, we must make sure
not to “clip away” triangles lying in a flat cell. Note that such
cases are not rare exceptions, but are in fact encouraged by the SAH,
as they often produce minimal expected cost.
This local approximation can easily get stuck in a local minimum: As
the local greedy SAH overestimates CV (p), it might stop subdivision
even if the correct cost would have indicated further subdivision. In
particular, the local approximation can lead to premature termination
for voxels that require splitting off flat cells on the sides: many
scenes (in particular, architectural ones) contain geometry in the
form of axis-aligned boxes (a light fixture, a table leg or table top,
. . . ), in which case the sides have to be “shaved off” until the
empty interior is exposed. For wrongly chosen parameters, or when
using cost functions different from the ones we use (in particular,
ones in which a constant cost is added to the leaf estimate), the
recursion can terminated prematurely. Though this pre-mature exit
could also be avoided in a hardcoded way—e.g., only performing the
automatic termination test for non-flat cells—we propose to follow our
formulas, in which case no premature exit will happen.
Am I doing anything wrong? How can I correct this?
Probably a bit late by now :-/, but just in case: I think the thing that's "wrong" here is in the concept of "empty cell" bias: what you want to "encourage" (by giving it a less-than-one bias) is cutting away empty space, but that implies that the cell being cut away actually does have a volume. Apparently, in your implementation you're only checking if one side has zero triangles, and are thus applying the bias even if no space is cut away at all.
Fix: apply the bias only if one side has count==0 and non-zero width.
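In code, that fix might look like the following sketch (the parameter names are mine; 0.8 is the empty-cell bias value from the question):

```python
def split_bias(n_left, n_right, vol_left, vol_right, empty_bias=0.8):
    """Reward an empty child only if it actually cuts away volume;
    a flat (zero-volume) empty cell gets no bonus."""
    if (n_left == 0 and vol_left > 0) or (n_right == 0 and vol_right > 0):
        return empty_bias
    return 1.0
```

With this check, the all-triangles-right split at <x,0> loses its bonus (Cr becomes 121.0 > Cns = 120.0), so at that plane the cheapest option is splitting the two planar triangles into the flat left cell (Cl ≈ 107.7), which is the "shaving off" behaviour the paper describes.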
Let's say I have a collection of blocks. 12 are red, 8 are blue, 5 are yellow and 1 is green. I need to create an algorithm that outputs these objects into a single array with no red blocks next to each other, no blue blocks next to each other, etc. The output should look something like this:
red, blue, red, blue, red, blue, yellow, blue, green, red, yellow, etc.
In my programming experiences so far, I've come to places where I had to write an algorithm to do this more than once. The last time I did it was about two years ago, working for a startup. I implemented such an algorithm in Python, but the source code is no longer available. I do remember it took me at least 100 lines to create.
Does this algorithm have a name? I don't want to have to implement it again.
I do not know of a name for this problem. Below is the algorithm I came up with to solve it.
You need to keep track of # of each block remaining.
repeat:
    output 1 block from the largest color set.
    output 1 block from the second largest color set.
the output:
r b r b r b r b r y r b r y r b r y r b r y r b y g
note: before running this algorithm, you need to check whether the largest color set's size is greater than 1 + the sum of the other colors' sizes. If it is, there is no solution.
note: picking from the second largest set is not required; the second pick in the loop can come from any of the non-largest color sets.
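For what it's worth, that loop maps naturally onto a max-heap. Since the original source is unavailable, this Python sketch is my own reconstruction of the idea:

```python
import heapq

def interleave(counts):
    """counts: {color: n}. Repeatedly emit a block from the largest
    remaining set, falling back to the next-largest set whenever the
    largest color was the one just used."""
    heap = [(-n, color) for color, n in counts.items() if n > 0]
    heapq.heapify(heap)
    out = []
    while heap:
        n, color = heapq.heappop(heap)      # largest remaining set
        if out and out[-1] == color:
            if not heap:
                raise ValueError("no valid arrangement")
            n2, color2 = heapq.heappop(heap)  # fall back to next set
            out.append(color2)
            if n2 + 1 < 0:
                heapq.heappush(heap, (n2 + 1, color2))
            heapq.heappush(heap, (n, color))
        else:
            out.append(color)
            if n + 1 < 0:
                heapq.heappush(heap, (n + 1, color))
    return out
```

On the example counts this emits 26 blocks with no two equal neighbors, and it raises an error on infeasible inputs such as 4 reds and 1 blue.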
Just off the top of my head: create a queue that contains all the blocks you want to insert, in decreasing quantity (i.e. using the example above, the queue would contain 12 reds, then 8 blues, then 5 yellows, then 1 green). Insert an element from the queue into every even index of the array and then every odd index (i.e. insert red blocks at indices
0,2,4,6,8,10,12,14,16,18,20,22, then insert blues at 24,1,3,5,7,9,11,13, then insert yellows at 15,17,19,21,23 and insert the green at 25)
Note that for some combinations of blocks this task is impossible; before running the algorithm you have to check that the color with the highest count has no more blocks than all the other blocks combined, plus one.
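A Python sketch of that placement (the name fill_even_odd and the dict-of-counts input are my own choices):

```python
def fill_even_odd(counts):
    """Place colors in decreasing quantity, filling all even indices
    first and then all odd indices."""
    total = sum(counts.values())
    slots = list(range(0, total, 2)) + list(range(1, total, 2))
    out = [None] * total
    i = 0
    for color, n in sorted(counts.items(), key=lambda kv: -kv[1]):
        for _ in range(n):          # each color takes the next n slots
            out[slots[i]] = color
            i += 1
    return out
```

Because consecutive slots in the even (or odd) run are two array positions apart, a single color never lands on adjacent indices as long as the feasibility check holds.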
First you need to check whether such an array exists.
E.g. if you have 4 reds and only 1 blue, then it doesn't exist.
So if the size of the largest collection is greater than the sum of all the other collections plus 1, then there is no valid solution.
Then you just lay out all the items of your largest collection, say red, as a list.
Between (and around) the items of the list are spots where you can insert other elements,
e.g. _ red _ red _ red _ red _ red _ red ...
Now you can insert other items collection by collection to those spots. The order of the collections doesn't matter.
e.g. blue red blue red blue red blue red yellow red yellow red _ red _ red ..
You need to consume those spots always from left to right (or always from right to left).
Whenever you run out of spots, you start again from left (or right) to insert items to the new spots.
e.g. green blue _ red _ blue _ red _ blue _ red _ blue _ red _ yellow _ red ...
Original Question
If you are given N maximally distant colors (and some associated distance metric), can you come up with a way to sort those colors into some order such that the first M are also reasonably close to being a maximally distinct set?
In other words, given a bunch of distinct colors, come up with an ordering so I can use as many colors as I need starting at the beginning and be reasonably assured that they are all distinct and that nearby colors are also very distinct (e.g., bluish red isn't next to reddish blue).
Randomizing is OK but certainly not optimal.
Clarification: Given some large and visually distinct set of colors (say 256, or 1024), I want to sort them such that when I use the first, say, 16 of them that I get a relatively visually distinct subset of colors. This is equivalent, roughly, to saying I want to sort this list of 1024 so that the closer individual colors are visually, the farther apart they are on the list.
This also sounds to me like some kind of resistance graph, where you try to map out the path of least resistance. If you invert the requirement to the path of maximum resistance, it could perhaps be used to produce a set that produces maximum difference from the start, and towards the end comes back to values closer to the others.
For instance, here's one way to perhaps do what you want.
Calculate the distance (ref your other post) from each color to all other colors
Sum the distances for each color, this gives you an indication for how far away this color is from all other colors in total
Order the list by distance, going down
This would, it seems, produce a list that starts with the color that is farthest away from all other colors, and then go down, colors towards the end of the list would be closer to other colors in general.
Edit: Reading your reply to my first post, about the spatial subdivision: that would not exactly fit the above description, since colors close to other colors would fall to the bottom of the list. But say you have a cluster of colors somewhere; at least one of the colors from that cluster would be located near the start of the list, namely the one that was generally farthest away from all the other colors in total. If that makes sense.
This problem is called color quantization, and has many well known algorithms: http://en.wikipedia.org/wiki/Color_quantization I know people who implemented the octree approach to good effect.
It seems perception is important to you; in that case you might want to consider working with a perceptual color space such as YUV, YCbCr, or Lab. Every time I've used those, they have given me much better results than sRGB alone.
Converting to and from sRGB can be a pain but in your case it could actually make the algorithm simpler and as a bonus it will mostly work for color blinds too!
N maximally distant colors can be considered a set of well-distributed points in a 3-dimensional (color) space. If you can generate them from a Halton sequence, then any prefix (the first M colors) also consists of well-distributed points.
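A sketch of that idea in Python, using the standard radical-inverse construction. Mapping the three coordinates straight to 8-bit RGB is my simplification; it is not perceptually uniform:

```python
def halton(i, base):
    """i-th element (1-indexed) of the base-b Halton sequence in [0, 1)."""
    f, r = 1.0, 0.0
    while i > 0:
        f /= base
        r += f * (i % base)
        i //= base
    return r

def halton_colors(n):
    """First n colors of a 3-D Halton sequence (bases 2, 3, 5), scaled
    to 8-bit RGB. Any prefix of the returned list is well spread."""
    return [tuple(int(255 * halton(i, b)) for b in (2, 3, 5))
            for i in range(1, n + 1)]
```

Because low-discrepancy sequences fill space incrementally, taking the first M entries gives a well-distributed subset without any sorting step.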
If I'm understanding the question correctly, you wish to obtain the subset of M colours with the highest mean distance between colours, given some distance function d.
Put another way, considering the initial set of N colours as a large, undirected graph in which all colours are connected, you want to find the longest path that visits any M nodes.
Solving NP-complete graph problems is way beyond me I'm afraid, but you could try running a simple physical simulation:
Generate M random points in colour space
Calculate the distance between each point
Calculate repulsion vectors for each point that will move it away from all other points (using 1 / (distance ^ 2) as the magnitude of the vector)
Sum the repulsion vectors for each point
Update the position of each point according to the summed repulsion vectors
Constrain any out of bound coordinates (such as luminosity going negative or above one)
Repeat from step 2 until the points stabilise
For each point, select the nearest colour from the original set of N
It's far from efficient, but for small M it may be efficient enough, and it will give near optimal results.
If your colour distance function is simple, there may be a more deterministic way of generating the optimal subset.
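A toy version of that simulation in plain Python (the step size, iteration count, and hard clamping to the unit cube are arbitrary choices of mine, and the final snap back to the original N colours is omitted):

```python
def repel(points, steps=200, k=0.001):
    """Push M points in the unit colour cube apart with 1/d^2
    repulsion until they (roughly) stabilise. Mutates points."""
    for _ in range(steps):
        forces = []
        for p in points:
            f = [0.0, 0.0, 0.0]
            for q in points:
                if p is q:
                    continue
                d2 = sum((a - b) ** 2 for a, b in zip(p, q)) or 1e-12
                for i in range(3):
                    # magnitude 1/d^2 along the unit vector (p - q)/d
                    f[i] += (p[i] - q[i]) / d2 ** 1.5
            forces.append(f)
        for p, f in zip(points, forces):
            for i in range(3):
                # step along the summed force, clamped to the cube
                p[i] = min(1.0, max(0.0, p[i] + k * f[i]))
    return points
```

Three collinear points, for example, end up spread across the cube: the outer two are pushed to opposite faces while the middle one stays balanced.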
Start with two lists. CandidateColors, which initially contains your distinct colors and SortedColors, which is initially empty.
Pick any color and remove it from CandidateColors and put it into SortedColors. This is the first color and will be the most common, so it's a good place to pick a color that jives well with your application.
For each color in CandidateColors calculate its total distance. The total distance is the sum of the distance from the CandidateColor to each of the colors in SortedColors.
Remove the color with the largest total distance from CandidateColors and add it to the end of SortedColors.
If CandidateColors is not empty, go back to step 3.
This greedy algorithm should give you good results.
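A direct transcription of those steps in Python (the dist callback is whatever color metric you prefer; the names are mine):

```python
def sort_maximally_distinct(colors, dist):
    """Greedy ordering: start from an arbitrary seed, then repeatedly
    append the candidate whose total distance to the colors already
    chosen is largest."""
    candidates = list(colors)
    ordered = [candidates.pop(0)]  # step 2: pick any seed color
    while candidates:
        best = max(candidates,
                   key=lambda c: sum(dist(c, s) for s in ordered))
        candidates.remove(best)
        ordered.append(best)
    return ordered
```

With scalar "colors" and absolute difference as the metric, for instance, [0, 5, 9] is reordered so the far extreme comes right after the seed.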
You could just sort the candidate colors by their minimum distance to any of the index colors, taking the most distant first.
Using Euclidean color distance:
public double colordistance(Color color0, Color color1) {
    int c0 = color0.getRGB();
    int c1 = color1.getRGB();
    return distance(((c0>>16)&0xFF), ((c0>>8)&0xFF), (c0&0xFF),
                    ((c1>>16)&0xFF), ((c1>>8)&0xFF), (c1&0xFF));
}

public double distance(int r1, int g1, int b1, int r2, int g2, int b2) {
    int dr = (r1 - r2);
    int dg = (g1 - g2);
    int db = (b1 - b2);
    return Math.sqrt(dr * dr + dg * dg + db * db);
}
Though you can replace it with anything you want. It just needs a color distance routine.
public void colordistancesort(Color[] candidateColors, Color[] indexColors) {
    double current;
    double distance[] = new double[candidateColors.length];
    //for each candidate, find its minimum distance to any index color
    for (int j = 0; j < candidateColors.length; j++) {
        distance[j] = -1;
        for (int k = 0; k < indexColors.length; k++) {
            current = colordistance(indexColors[k], candidateColors[j]);
            if ((distance[j] == -1) || (current < distance[j])) {
                distance[j] = current;
            }
        }
    }
    //selection-style sort, most distant candidates first
    for (int j = 0; j < candidateColors.length; j++) {
        for (int k = j + 1; k < candidateColors.length; k++) {
            if (distance[j] < distance[k]) {
                double d = distance[k];
                distance[k] = distance[j];
                distance[j] = d;
                Color m = candidateColors[k];
                candidateColors[k] = candidateColors[j];
                candidateColors[j] = m;
            }
        }
    }
}
Do you mean that from a set of N colors, you need to pick M colors, where M < N, such that M is the best representation of the N colors in the M space?
As a better example, reduce a true-color (24 bit color space) to a 8-bit mapped color space (GIF?).
There are quantization algorithms for this, like the Adaptive Spatial Subdivision algorithm used by ImageMagic.
These algorithms usually don't just pick existing colors from the source space but creates new colors in the target space that most closely resemble the source colors. As a simplified example, if you have 3 colors in the original image where two are red (with different intensity or bluish tints etc.) and the third is blue, and need to reduce to two colors, the target image could have a red color that is some kind of average of the original two red + the blue color from the original image.
If you need something else then I didn't understand your question :)
You can split them into RGB hex format so that you can compare the R with the R of a different color, and likewise the G and the B.
Same format as HTML
XX XX XX
RR GG BB
00 00 00 = black
ff ff ff = white
ff 00 00 = red
00 ff 00 = green
00 00 ff = blue
So the only thing you would need to decide is how close you want the colors to be, and what an acceptable difference is for the segments to be considered different.