Neighbor searching algorithm

I have an STL file that contains the coordinates (x,y,z) of the 3 points (p0, p1, p2) of each triangle. These triangles represent a 3D surface f(x,y,z). The STL file might have over a thousand triangles to represent a complex geometry.
For my application, I need to know the neighboring triangles for each triangle entry from the STL file. That means for each triangle I have to pick 3 pairs of points, pair1=(p0,p1), pair2=(p0,p2), pair3=(p1,p2), and compare them with the pairs of points of the other triangles in the set.
What is the best and most efficient algorithm to achieve this? Can I use a hash tree or hashmap?

Change the mesh representation to a point table and a triangle-face table. STL demands that all triangles are joined at their vertices (no triangle's edge is cut by another's vertex), which means neighboring triangles always share one complete edge.
double pnt[points][3];
int tri[triangles][3];
The pnt table should be the list of all distinct points (index-sort it to improve speed for high point counts). The tri table should contain the 3 indexes of the points used by each triangle; sort them (ascending or descending) to improve match speed.
Now if any triangle tri[i] shares an edge with tri[j], then those two are neighboring triangles.
if ((tri[i][0]==tri[j][0] && tri[i][1]==tri[j][1])
 || (tri[i][0]==tri[j][1] && tri[i][1]==tri[j][2])) // then triangles i,j are neighbors
Add all combinations ...
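For completeness, a small sketch spelling out all the edge combinations, assuming (as suggested above) that each triangle's three point indices are stored sorted ascending, so that every edge can be written as an ordered index pair:

// Returns true if triangles a and b share an edge.
// Each triangle is given as 3 point indices sorted ascending, so its edges are the
// ordered pairs (a[0],a[1]), (a[0],a[2]), (a[1],a[2]).
bool shareEdge(const int a[3], const int b[3])
{
    for (int i = 0; i < 3; ++i)
        for (int j = i + 1; j < 3; ++j)          // edge (a[i], a[j]) of the first triangle
            for (int k = 0; k < 3; ++k)
                for (int l = k + 1; l < 3; ++l)  // edge (b[k], b[l]) of the second triangle
                    if (a[i] == b[k] && a[j] == b[l]) return true;
    return false;
}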
If you need just neighboring points, then find all triangles containing that point; all the other points used in those triangles are its neighbors.
To load the STL into such a structure, do this:
clear pnt[],tri[] lists/tables
process each triangle of STL
for each point of triangle
look whether it is already in pnt[]: if yes, use its index for the new triangle; if not, add the new point to pnt and use its index. When all 3 points are done, add the new triangle to tri.
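Here is a minimal sketch of that loading loop in C++. It uses a std::map keyed on the raw vertex coordinates as the "is this point already in pnt[]?" lookup (the index sort described next is the faster alternative), and std::vector instead of the fixed-size tables; addTriangle is just an illustrative name:

#include <algorithm>
#include <array>
#include <map>
#include <vector>

std::vector<std::array<double,3>> pnt;             // all distinct points
std::vector<std::array<int,3>>    tri;             // 3 point indices per triangle
std::map<std::array<double,3>, int> lookup;        // coordinates -> index in pnt

// Call once per STL triangle with its three vertices.
void addTriangle(const std::array<double,3> p[3])
{
    std::array<int,3> t;
    for (int i = 0; i < 3; ++i)
    {
        auto it = lookup.find(p[i]);
        if (it == lookup.end())                    // new point: append it and remember its index
        {
            it = lookup.emplace(p[i], (int)pnt.size()).first;
            pnt.push_back(p[i]);
        }
        t[i] = it->second;
    }
    std::sort(t.begin(), t.end());                 // sorted indices speed up the edge matching later
    tri.push_back(t);
}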
Improving pnt[] performance
Add an index sort for pnt[], sorted by any coordinate (for example x), to improve the performance of checking whether a point is already present in pnt.
So while adding (xi,yi,zi) into pnt[], use binary search on the index sort to find the first point whose x satisfies pnt[i0][0] >= xi, then scan forward only until x crosses xi (i.e. stop at the first i1 with xi < pnt[i1][0]); this way you do not need to check all points.
If this is too slow (usually when the number of points is bigger than about 40000), you can improve performance further by segmented index sorting (dividing the index sort into segment pages of finite size, for example 8192 points).
Improving tri[] performance
You can also sort the tri[] by tri[i][0] so you can use binary search similarly to pnt[].

I would suggest going with a hashmap where the values are sets (tree-based) of references to Triangles, the keys are those pairs of Points (let's call these pairs simply Sides), and the hashing function takes into account the property that the hash of Side (a,b) should equal the hash of (b,a).
Some kind of algorithm:
Read 3 Points and create from them 3 Sides and a Triangle.
Add all of that to the hashmap: map[side[i]].insert(triangle)
Repeat 1-2 until you read all the data
Now you have a map filled with data. About the complexity of filling: insertions into a hashmap are constant time on average (it also depends on the hash function), and insertion into a set is logarithmic, so the complete complexity of filling the data is O(n*log m), where n is the number of Sides and m is the average number of Triangles with the same Side.
Normally each set would contain around 4 Triangles: 1 + 3 side-neighbours, so log m is relatively small (compared to n) and can be treated as a constant. These observations lead to the following conclusion: the best-case complexity for filling is O(n) (no collisions, no rehashing, etc.) and the worst case is O(n*log n) (average-case insertion of n Sides into the map, but logarithmic insertion into one set, meaning all Triangles share the same Side).
Now to get all side-neighbours for some Triangle:
Get all 3 sets for each Side of that Triangle (e.g. set[i] = map[triangle.sides[i]]).
Take the union of those 3 sets (and exclude the triangle itself to get only its side-neighbours).
Done.
About the complexity of getting side-neighbours: it depends linearly on the size of the sets, which is small compared to n in the normal case.
Note: to get point-neighbours rather than side-neighbours (calling any 2 Triangles with a common Point, not Side, neighbours), simply key the map by Points instead of Sides. The above assumptions about time complexities hold except for the constants.
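A minimal sketch of this in C++, with Points represented by indices (for example after deduplicating vertices as in the previous answer). Packing each Side into one 64-bit key makes the hash automatically symmetric in (a,b); std::set stands in for the tree-based set, and fill/sideNeighbours are illustrative names (sideNeighbours collects the union described above):

#include <array>
#include <cstdint>
#include <set>
#include <unordered_map>
#include <utility>
#include <vector>

// A Side is an unordered pair of point indices; storing it as (min,max) packed
// into a 64-bit key gives the same hash for (a,b) and (b,a).
static std::uint64_t sideKey(int a, int b)
{
    if (a > b) std::swap(a, b);
    return (std::uint64_t(a) << 32) | std::uint32_t(b);
}

std::unordered_map<std::uint64_t, std::set<int>> sides;   // Side -> set of triangles using it

// tris[t] holds the 3 point indices of triangle t.
void fill(const std::vector<std::array<int,3>>& tris)
{
    for (int t = 0; t < (int)tris.size(); ++t)
    {
        const auto& v = tris[t];
        sides[sideKey(v[0], v[1])].insert(t);
        sides[sideKey(v[0], v[2])].insert(t);
        sides[sideKey(v[1], v[2])].insert(t);
    }
}

// Side-neighbours of triangle t: every other triangle appearing in its three Side sets.
std::vector<int> sideNeighbours(const std::vector<std::array<int,3>>& tris, int t)
{
    std::vector<int> out;
    const auto& v = tris[t];
    const int pairs[3][2] = { {v[0],v[1]}, {v[0],v[2]}, {v[1],v[2]} };
    for (const auto& p : pairs)
        for (int other : sides[sideKey(p[0], p[1])])
            if (other != t) out.push_back(other);
    return out;
}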

Related

How to index nearby 3D points on the fly?

In physics simulations (for example n-body systems) it is sometimes necessary to keep track of which particles (points in 3D space) are close enough to interact (within some cutoff distance d) in some kind of index. However, particles can move around, so it is necessary to update the index, ideally on the fly without recomputing it entirely. Also, for efficiency in calculating interactions it is necessary to keep the list of interacting particles in the form of tiles: a tile is a fixed size array (eg 32x32) where the rows and columns are particles, and almost every row-particle is close enough to interact with almost every column particle (and the array keeps track of which ones actually do interact).
What algorithms may be used to do this?
Here is a more detailed description of the problem:
Initial construction: Given a list of points in 3D space (on the order of a few thousand to a few million, stored as an array of floats), produce a list of tiles of a fixed size (NxN), where each tile has two lists of points (N row points and N column points), and an NxN boolean array which describes whether the interaction between each row and column particle should be calculated, and for which:
a. every pair of points p1,p2 for which distance(p1,p2) < d is found in at least one tile and marked as being calculated (no missing interactions), and
b. if any pair of points is in more than one tile, it is only marked as being calculated in the boolean array in at most one tile (no duplicates),
and also the number of tiles is relatively small if possible (but this is less important than being able to update the tiles efficiently)
Update step: If the positions of the points change slightly (by much less than d), update the list of tiles in the fastest way possible so that they still meet the same conditions a and b (this step is repeated many times)
It is okay to keep any necessary data structures that help with this, for example the bounding boxes of each tile, or a spatial index like a quadtree. It is probably too slow to calculate all particle pairwise distances for every update step (and in any case we only care about particles which are close, so we can skip most possible pairs of distances just by sorting along a single dimension for example). Also it is probably too slow to keep a full (quadtree or similar) index of all particle positions. On the other hand, it is perfectly fine to construct the tiles on a regular grid of some kind. The density of particles per unit volume in 3D space is roughly constant, so the tiles can probably be built from (essentially) fixed size bounding boxes.
To give an example of the typical scale/properties of this kind of problem, suppose there is 1 million particles, which are arranged as a random packing of spheres of diameter 1 unit into a cube of size roughly 100x100x100. Suppose the cutoff distance is 5 units, so typically each particle would be interacting with (2*5)**3 or ~1000 other particles or so. The tile size is 32x32. There are roughly 1e+9 interacting pairs of particles, so the minimum possible number of tiles is ~1e+6. Now assume each time the positions change, the particles move a distance around 0.0001 unit in a random direction, but always in a way such that they are at least 1 unit away from any other particle and the typical density of particles per unit volume stays the same. There would typically be many millions of position update steps like that. The number of newly created pairs of interactions per step due to the movement is (back of the envelope) (10**2 * 6 * 0.0001 / 10**3) * 1e+9 = 60000, so one update step can be handled in principle by marking 60000 pairs as non-interacting in their original tiles, and adding at most 60000 new tiles (mostly empty - one per pair of newly interacting particles). This would rapidly get to a point where most tiles are empty, so it is definitely necessary to combine/merge tiles somehow pretty often - but how to do it without a full rebuild of the tile list?
P.S. It is probably useful to describe how this differs from the typical spatial index (e.g. octrees) scenario: a. we only care about grouping close-by points together into tiles, not looking up which points are in an arbitrary bounding box or which points are closest to a query point - a bit closer to clustering than querying, b. the density of points in space is pretty constant, and c. the index has to be updated very often, but most moves are tiny.
Not sure my reasoning is sound, but here's an idea:
Divide your space into a grid of 3D cubes.
The cubes have a side length of d. Then do the following:
Assign all points to all cubes in which they're contained; this is fast since you can derive a point's cubes from just its coordinates
Now check the following:
Mark all pairs of points in the top-left quarter of your cube as colliding; they're less than d apart. Further, every "quarter cube" in space is only the top-left quarter of exactly one cube, so you won't check the same pair twice.
Check for collisions of type (p, q), where p is a point in the top-left quarter and q is a point not in the top-left quarter. In this way, you will again check the collision between any two points at most once, because every pair of quarters is checked exactly once.
Since every pair of points is either in the same quarter or in neighbouring quarters, they'll be checked by the first or the second step. Further, since points are approximately evenly distributed, your runtime is much less than n^2 (n = number of points); in aggregate, it's k^2 (k = number of points per quarter, which appears to be approximately constant).
In an update step, you only need to check:
if a point crossed a boundary of a box, which should be fast since you can look at one coordinate at a time, and the boxes' boundaries are simple multiples of d/2
check for collisions of the points as above
To create the tiles, divide the space into a second grid of (non-overlapping) cubes whose width is chosen such that the average number of interaction centers (midpoints between two particles that almost interact) falling into a given cube is less than the width of your tiles (i.e. 32). Since each particle is expected to interact with 300-500 particles, this width will be much smaller than d.
Then, while checking for interactions in steps 1 and 2, assign particle interactions to these new cubes according to the coordinates of the center of their interaction. Assign one tile per cube, and mark the interacting particles assigned to that cube in the tile.
A further optimization might be to consider the distance to a point's closest neighbour within a cube, derive from that the minimum number of update steps needed before that point's collision status can change, and then ignore that point for that many steps.
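As a rough sketch of the cell idea in C++ (the plain variant where each point is tested against its own cube and the 26 surrounding cubes, rather than the quarter-cube bookkeeping described above); Pt, cellKey and closePairs are illustrative names, and cell indices are assumed to fit in 21 bits each:

#include <cmath>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

struct Pt { float x, y, z; };

// Pack the integer cell coordinates into one hashable key.
static std::uint64_t cellKey(int cx, int cy, int cz)
{
    const std::uint64_t B = 1 << 20;               // offset so that negative cell indices work too
    return ((cx + B) << 42) | ((cy + B) << 21) | (cz + B);
}

// Returns index pairs (i,j), i<j, with distance < d. Each point is binned into a cube of
// side d, so any interacting pair must lie in the same or an adjacent cube.
std::vector<std::pair<int,int>> closePairs(const std::vector<Pt>& p, float d)
{
    std::unordered_map<std::uint64_t, std::vector<int>> cells;
    for (int i = 0; i < (int)p.size(); ++i)
        cells[cellKey((int)std::floor(p[i].x / d), (int)std::floor(p[i].y / d), (int)std::floor(p[i].z / d))].push_back(i);

    std::vector<std::pair<int,int>> out;
    for (int i = 0; i < (int)p.size(); ++i)
    {
        const int cx = (int)std::floor(p[i].x / d), cy = (int)std::floor(p[i].y / d), cz = (int)std::floor(p[i].z / d);
        for (int dx = -1; dx <= 1; ++dx)
            for (int dy = -1; dy <= 1; ++dy)
                for (int dz = -1; dz <= 1; ++dz)
                {
                    auto it = cells.find(cellKey(cx + dx, cy + dy, cz + dz));
                    if (it == cells.end()) continue;
                    for (int j : it->second)
                        if (j > i)                 // j > i avoids reporting the same pair twice
                        {
                            const float ex = p[i].x - p[j].x, ey = p[i].y - p[j].y, ez = p[i].z - p[j].z;
                            if (ex*ex + ey*ey + ez*ez < d*d) out.emplace_back(i, j);
                        }
                }
    }
    return out;
}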
I suggest the following algorithm. E.g. we have a 1x1x1 cube and the cutoff distance is 0.001.
Let's choose three base anchor points: (0,0,0), (0,1,0), (1,0,0).
Associate an array of size 1000 (1 / 0.001) with each anchor point.
Add three numbers to each regular point; in these fields we store the distance between the given point and each anchor point.
At the same time this distance is used as an index into the array of that anchor point, e.g. 0.4324 means index 432.
Each slot of each of the three arrays stores a set of points.
Recalculate the distance between a regular point and each anchor point every time the point is updated.
Move the point between the sets in the arrays during the update.
These structures give you an easy way to find all nearby points: it is the intersection of three sets, where the sets are chosen by the distances between the point and the anchor points.
In short, it is the intersection of three spherical shells. You may need to apply additional filtering to the result if you want to trim the corners of this intersection.
Consider using the Barnes-Hut algorithm or something similar. A simulation in 2D would use a quadtree data structure to store particles, and a 3D simulation would use an octree.
The benefit of using a tree structure is that it stores the particles in a way that nearby particles can be found quickly by traversing the tree, and far-away particles are in traversal paths that can be ignored.
Wikipedia has a good description of the algorithm:
The Barnes–Hut tree
In a three-dimensional n-body simulation, the Barnes–Hut algorithm recursively divides the n bodies into groups by storing them in an octree (or a quad-tree in a 2D simulation). Each node in this tree represents a region of the three-dimensional space. The topmost node represents the whole space, and its eight children represent the eight octants of the space. The space is recursively subdivided into octants until each subdivision contains 0 or 1 bodies (some regions do not have bodies in all of their octants). There are two types of nodes in the octree: internal and external nodes. An external node has no children and is either empty or represents a single body. Each internal node represents the group of bodies beneath it, and stores the center of mass and the total mass of all its children bodies.

Given coordinates of points, find all pairs of points that exist within a certain distance of each other?

Two points form a pair if the distance D between them satisfies 0 <= D <= 1000. Given the 2D coordinates (floating point numbers) of 0 <= N <= 1000 stars, determine how many pairs there are.
I've seen this question a couple of times before but I forgot the implementation. I believe this had something to do with divide and conquer, where you split the plane by half and recurse on the two sides of the plane, but I'm very unsure of how that would work out.
No need for any code, just a general walkthrough of the solution for this type of problem would suffice.
What you might be thinking of is a quad tree, the 2D case of a k-d tree. In a quad tree, you start with a bounding rectangle that encompasses all points. You insert all points into this base level.
From there, you divide the quad into either halves or quarters. You insert each point into the half or quarter into which it falls. You can further subdivide each half or quarter into smaller halves or quarters, inserting each point into the smaller areas they fall into.
To find all points within a distance of a given point, you simply find all quads in your tree that have any point within the given distance. Then you can test only points in those quads against your initial point.
This keeps you from doing the typical n^2 comparison of all points against one another.
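A compact sketch of such a quadtree in C++ (leaf buckets of CAP points, recursive subdivision into quarters, and a circle query that skips quads whose box lies entirely outside the query circle). countPairs and the enclosing square [lo,hi]x[lo,hi] are illustrative choices, and more than CAP coincident points are not handled:

#include <algorithm>
#include <memory>
#include <vector>

struct P { double x, y; };

// A tiny point quadtree: each node covers an axis-aligned square and either stores
// up to CAP points (leaf) or has four children.
struct Quad {
    static const int CAP = 8;
    double x0, y0, x1, y1;               // bounds of this node
    std::vector<P> pts;                  // points stored here while a leaf
    std::unique_ptr<Quad> child[4];

    Quad(double X0, double Y0, double X1, double Y1) : x0(X0), y0(Y0), x1(X1), y1(Y1) {}

    void insert(const P& p) {
        if (!child[0]) {
            if ((int)pts.size() < CAP) { pts.push_back(p); return; }
            double mx = (x0 + x1) / 2, my = (y0 + y1) / 2;   // split into quarters
            child[0].reset(new Quad(x0, y0, mx, my));
            child[1].reset(new Quad(mx, y0, x1, my));
            child[2].reset(new Quad(x0, my, mx, y1));
            child[3].reset(new Quad(mx, my, x1, y1));
            for (const P& q : pts) childFor(q)->insert(q);
            pts.clear();
        }
        childFor(p)->insert(p);
    }
    Quad* childFor(const P& p) {
        double mx = (x0 + x1) / 2, my = (y0 + y1) / 2;
        return child[(p.x >= mx) + 2 * (p.y >= my)].get();
    }
    // Count stored points within distance D of q, skipping subtrees whose box is too far away.
    long count(const P& q, double D) const {
        double dx = std::max({x0 - q.x, 0.0, q.x - x1});
        double dy = std::max({y0 - q.y, 0.0, q.y - y1});
        if (dx * dx + dy * dy > D * D) return 0;             // box entirely outside the circle
        long c = 0;
        for (const P& p : pts)
            if ((p.x - q.x) * (p.x - q.x) + (p.y - q.y) * (p.y - q.y) <= D * D) ++c;
        if (child[0])
            for (int i = 0; i < 4; ++i) c += child[i]->count(q, D);
        return c;
    }
};

// Count pairs at distance <= D: query each point, subtract the self-hit, halve the double-counting.
long countPairs(const std::vector<P>& pts, double D, double lo, double hi) {
    Quad root(lo, lo, hi, hi);           // [lo,hi]x[lo,hi] must enclose all points
    for (const P& p : pts) root.insert(p);
    long total = 0;
    for (const P& p : pts) total += root.count(p, D) - 1;    // -1: the point finds itself
    return total / 2;                                        // each pair counted from both ends
}

For N up to 1000 a plain double loop over all pairs is also perfectly fine; the tree only starts to pay off for much larger inputs.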

Most efficient way to select point with the most surrounding points

N.B: there's a major edit at the bottom of the question - check it out
Question
Say I have a set of points.
I want to find the point with the most points surrounding it, within a given radius (i.e. a circle) or within a given range in each coordinate (i.e. a square) of the point, for 2 dimensions. I'll refer to this as the densest point function.
For the diagrams in this question, I'll represent the surrounding region as a circle. In the original figure, the middle point's surrounding region is shown in green; this middle point has the most surrounding points of all the points within the radius and would be returned by the densest point function.
What I've tried
A viable way to solve this problem would be to use a range searching solution; this answer explains further and quotes a worst-case query time of O(log n + k). Using this, I could get the number of points surrounding each point and choose the point with the largest surrounding point count.
However, if the points were extremely densely packed (on the order of a million points), then each of those million points would need to have a range search performed for it. The worst-case time of O(log n + k), where k is the number of points returned in the range, holds (roughly) for the following point tree types:
kd-trees of two dimensions (which are actually slightly worse, at O(sqrt(n) + k)),
2d-range trees,
Quadtrees, which have a worse worst-case time (up to O(n) for badly distributed points).
So, for a group of points that are all within the radius of one another, a query returns k close to n points, giving a cost of roughly O(log n + k) per point; over a million points this yields over a trillion operations!
Any ideas on a more efficient, precise way of achieving this, so that I could find the point with the most surrounding points for a group of points in a reasonable time (preferably O(n log n) or less)?
EDIT
Turns out that the method above is correct! I just need help implementing it.
(Semi-)Solution
If I use a 2d-range tree:
A range reporting query costs O(log^d n + k), for k reported points,
For a range tree with fractional cascading (also known as a layered range tree) the complexity is O(log^(d-1) n + k),
For 2 dimensions, that is O(log n + k),
Furthermore, if I perform a range counting query (i.e., I do not report each point), then it costs O(log n).
I'd perform this on every point - yielding the O(n log n) complexity I desired!
Problem
However, I cannot figure out how to write the code for a counting query for a 2d layered range tree.
I've found a great resource (from page 113 onwards) about range trees, including 2d-range tree pseudocode. But I can't figure out how to introduce fractional cascading, nor how to correctly implement the counting query so that it is of O(log n) complexity.
I've also found two range tree implementations here and here in Java, and one in C++ here, although I'm not sure this uses fractional cascading as it states above the countInRange method that
It returns the number of such points in worst case
* O(log(n)^d) time. It can also return the points that are in the rectangle in worst case
* O(log(n)^d + k) time where k is the number of points that lie in the rectangle.
which suggests to me it does not apply fractional cascading.
Refined question
To answer the question above, therefore, all I need to know is whether there are any libraries with 2d-range trees with fractional cascading that have a range counting query of O(log n) complexity, so I don't go reinventing any wheels, or can you help me write/modify the resources above to perform a query of that complexity?
Also not complaining if you can provide me with any other way to achieve an O(log n) range counting query of 2d points!
I suggest using a plane sweep algorithm. This allows one-dimensional range queries instead of 2-d queries, which is more efficient, simpler, and in the case of a square neighborhood does not require fractional cascading:
Sort the points by Y-coordinate into array S.
Advance 3 pointers through array S: one, C, for the currently inspected (center) point; another, A (a little ahead of C), for the first point more than distance R above C; and the last one, B (a little behind C), for the first point not more than distance R below C, so that the points between B and A are exactly those within distance R of C along Y.
Insert the points passed by A into an order statistic tree (ordered by coordinate X) and remove the points passed by B from this tree. Use this tree to find the points at distance R to the left/right of C, and use the difference of these points' positions (ranks) in the tree to get the number of points in the square area around C (a concrete sketch is given below, after the remaining steps).
Use results of previous step to select "most surrounded" point.
This algorithm could be optimized if you rotate points (or just exchange X-Y coordinates) so that width of the occupied area is not larger than its height. Also you could cut points into vertical slices (with R-sized overlap) and process slices separately - if there are too many elements in the tree so that it does not fit in CPU cache (which is unlikely for only 1 million points). This algorithm (optimized or not) has time complexity O(n log n).
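A sketch of the square-neighbourhood version in C++, using a Fenwick tree over rank-compressed X coordinates in place of the order statistic tree (it plays the same role: counting, among the points currently inside the Y window, those whose X lies within R of the center); Pt, Fenwick and mostSurrounded are illustrative names:

#include <algorithm>
#include <vector>

struct Pt { double x, y; };

struct Fenwick {                       // counts per compressed X rank
    std::vector<int> t;
    Fenwick(int n) : t(n + 1, 0) {}
    void add(int i, int v) { for (++i; i < (int)t.size(); i += i & -i) t[i] += v; }
    int  sum(int i) const { int s = 0; for (++i; i > 0; i -= i & -i) s += t[i]; return s; }  // prefix [0..i]
    int  range(int l, int r) const { return r < l ? 0 : sum(r) - (l ? sum(l - 1) : 0); }
};

// Returns the index of the point with the most points in the 2R x 2R square around it.
int mostSurrounded(std::vector<Pt> p, double R)
{
    const int n = (int)p.size();
    std::vector<int> byY(n);
    for (int i = 0; i < n; ++i) byY[i] = i;
    std::sort(byY.begin(), byY.end(), [&](int a, int b) { return p[a].y < p[b].y; });

    std::vector<double> xs(n);                                  // rank-compress the X coordinates
    for (int i = 0; i < n; ++i) xs[i] = p[i].x;
    std::sort(xs.begin(), xs.end());
    auto rank = [&](double x) { return (int)(std::lower_bound(xs.begin(), xs.end(), x) - xs.begin()); };

    Fenwick bit(n);
    int best = -1, bestCount = -1;
    int a = 0, b = 0;                                           // window of points with |y - yc| <= R
    for (int ci = 0; ci < n; ++ci)
    {
        int c = byY[ci];
        while (a < n && p[byY[a]].y <= p[c].y + R) bit.add(rank(p[byY[a]].x), +1), ++a;   // enters window
        while (b < n && p[byY[b]].y <  p[c].y - R) bit.add(rank(p[byY[b]].x), -1), ++b;   // leaves window
        int lo = (int)(std::lower_bound(xs.begin(), xs.end(), p[c].x - R) - xs.begin());
        int hi = (int)(std::upper_bound(xs.begin(), xs.end(), p[c].x + R) - xs.begin()) - 1;
        int cnt = bit.range(lo, hi) - 1;                        // -1: do not count the centre itself
        if (cnt > bestCount) { bestCount = cnt; best = c; }
    }
    return best;
}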
For a circular neighborhood (if R is not too large and the points are evenly distributed) you could approximate the circle with several rectangles.
In this case step 2 of the algorithm should use more pointers to allow insertion/removal to/from several trees. And in step 3 you should do a linear search near the points at the proper distance (<= R) to distinguish points inside the circle from points outside it.
Another way to deal with a circular neighborhood is to approximate the circle with rectangles of equal height (but here the circle should be split into more pieces). This results in a much simpler algorithm (where sorted arrays are used instead of order statistic trees):
Cut the area occupied by the points into horizontal slices, sort the slices by Y, then sort the points inside each slice by X.
For each point in each slice, assume it to be a "center" point and do step 3.
For each nearby slice use binary search to find points with Euclidean distance close to R, then use linear search to tell "inside" points from "outside" ones. Stop linear search where the slice is completely inside the circle, and count remaining points by difference of positions in the array.
Use results of previous step to select "most surrounded" point.
This algorithm allows optimizations mentioned earlier as well as fractional cascading.
I would start by creating something like a k-d tree (https://en.wikipedia.org/wiki/K-d_tree), where you have a tree with points at the leaves and each node holds information about its descendants. At each node I would keep a count of the number of descendants, and a bounding box enclosing those descendants.
Now for each point I would recursively search the tree. At each node I visit, either all of the bounding box is within R of the current point, all of the bounding box is more than R away from the current point, or some of it is inside R and some outside R. In the first case I can use the count of the number of descendants of the current node to increase the count of points within R of the current point and return up one level of the recursion. In the second case I can simply return up one level of the recursion without incrementing anything. It is only in the intermediate case that I need to continue recursing down the tree.
So I can work out for each point the number of neighbours within R without checking every other point, and pick the point with the highest count.
If the points are spread out evenly then I think you will end up constructing a k-d tree where the lower levels are close to a regular grid, and I think if the grid is of size A x A then in the worst case R is large enough so that its boundary is a circle that intersects O(A) low level cells, so I think that if you have O(n) points you could expect this to cost about O(n * sqrt(n)).
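A sketch of that recursion in C++ (a kd-tree-style build where every node stores its bounding box and descendant count; the leaf size of 16, the node layout, and the names build/countWithin are arbitrary choices, and node ownership/cleanup is omitted for brevity):

#include <algorithm>
#include <cmath>
#include <vector>

struct P2 { double x, y; };

// Every node knows its bounding box and how many points are below it, so whole
// subtrees can be counted or skipped at once.
struct Node {
    double x0, y0, x1, y1;   // bounding box of all points in this subtree
    int count;
    int lo, hi;              // range [lo,hi) into the point array (used directly by leaves)
    Node *left = nullptr, *right = nullptr;
};

Node* build(std::vector<P2>& pts, int lo, int hi, int axis)
{
    Node* n = new Node;
    n->lo = lo; n->hi = hi; n->count = hi - lo;
    n->x0 = n->y0 = 1e300; n->x1 = n->y1 = -1e300;
    for (int i = lo; i < hi; ++i) {
        n->x0 = std::min(n->x0, pts[i].x); n->x1 = std::max(n->x1, pts[i].x);
        n->y0 = std::min(n->y0, pts[i].y); n->y1 = std::max(n->y1, pts[i].y);
    }
    if (hi - lo <= 16) return n;                                  // small leaf
    int mid = (lo + hi) / 2;
    std::nth_element(pts.begin() + lo, pts.begin() + mid, pts.begin() + hi,
        [axis](const P2& a, const P2& b) { return axis ? a.y < b.y : a.x < b.x; });
    n->left  = build(pts, lo, mid, !axis);
    n->right = build(pts, mid, hi, !axis);
    return n;
}

// Number of points within distance R of q.
int countWithin(const Node* n, const std::vector<P2>& pts, const P2& q, double R)
{
    double dx = std::max({n->x0 - q.x, 0.0, q.x - n->x1});        // distance from q to the box
    double dy = std::max({n->y0 - q.y, 0.0, q.y - n->y1});
    if (dx * dx + dy * dy > R * R) return 0;                      // box entirely outside the circle
    double fx = std::max(std::abs(q.x - n->x0), std::abs(q.x - n->x1));
    double fy = std::max(std::abs(q.y - n->y0), std::abs(q.y - n->y1));
    if (fx * fx + fy * fy <= R * R) return n->count;              // box entirely inside: take the count
    if (!n->left) {                                               // leaf: test individual points
        int c = 0;
        for (int i = n->lo; i < n->hi; ++i)
            if ((pts[i].x - q.x) * (pts[i].x - q.x) + (pts[i].y - q.y) * (pts[i].y - q.y) <= R * R) ++c;
        return c;
    }
    return countWithin(n->left, pts, q, R) + countWithin(n->right, pts, q, R);
}

Build once with build(points, 0, (int)points.size(), 0), then run countWithin once per point and keep the index with the largest count.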
You can speed up whatever algorithm you use by preprocessing your data in O(n) time to estimate the number of neighbouring points.
For a circle of radius R, create a grid whose cells have dimension R in both the x- and y-directions. For each point, determine to which cell it belongs. For a given cell c this test is easy:
c.x<=p.x && p.x<=c.x+R && c.y<=p.y && p.y<=c.y+R
(You may want to think deeply about whether a closed or half-open interval is correct.)
If you have relatively dense/homogeneous coverage, then you can use an array to store the values. If coverage is sparse/heterogeneous, you may wish to use a hashmap.
Now, consider a point on the grid and the extremal locations it can take within its cell (at the corners, along the edges, or in the interior):
Points at the corners of the cell can only be neighbours with points in four cells. Points along an edge can be neighbours with points in six cells. Points not on an edge are neighbours with points in 7-9 cells. Since it's rare for a point to fall exactly on a corner or edge, we assume that any point in the focal cell is neighbours with the points in all 8 surrounding cells.
So, if a point p is in a cell (x,y), N[p] identifies the number of neighbours of p within radius R, and Np[y][x] denotes the number of points in cell (x,y), then N[p] is given by:
N[p] = Np[y][x]+
Np[y][x-1]+
Np[y-1][x-1]+
Np[y-1][x]+
Np[y-1][x+1]+
Np[y][x+1]+
Np[y+1][x+1]+
Np[y+1][x]+
Np[y+1][x-1]
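A sketch of that estimate in C++, using a hashmap of cells (the sparse-coverage option mentioned earlier); estimateNeighbourCounts is an illustrative name, and the values it returns are the upper-bound estimates consumed by the heap step described next:

#include <cmath>
#include <cstdint>
#include <unordered_map>
#include <vector>

struct Pt { double x, y; };

// Estimate, for every point, how many neighbours it has within radius R by binning
// the points into RxR cells and summing the 3x3 block of cells around each point.
// The result is an upper bound on the true neighbour count.
std::vector<long> estimateNeighbourCounts(const std::vector<Pt>& p, double R)
{
    auto key = [](std::int64_t cx, std::int64_t cy) { return (cx << 32) ^ (cy & 0xffffffff); };
    std::unordered_map<std::int64_t, long> Np;                 // cell -> number of points in it
    for (const Pt& q : p)
        ++Np[key((std::int64_t)std::floor(q.x / R), (std::int64_t)std::floor(q.y / R))];

    std::vector<long> N(p.size(), 0);
    for (int i = 0; i < (int)p.size(); ++i)
    {
        const std::int64_t cx = (std::int64_t)std::floor(p[i].x / R);
        const std::int64_t cy = (std::int64_t)std::floor(p[i].y / R);
        for (int dx = -1; dx <= 1; ++dx)                       // the focal cell and its 8 surrounding cells
            for (int dy = -1; dy <= 1; ++dy)
            {
                auto it = Np.find(key(cx + dx, cy + dy));
                if (it != Np.end()) N[i] += it->second;
            }
    }
    return N;
}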
Once we have the number of neighbours estimated for each point, we can heapify that data structure into a maxheap in O(n) time (with, e.g. make_heap). The structure is now a priority-queue and we can pull points off in O(log n) time per query ordered by their estimated number of neighbours.
Pull the first point off the heap and use an O(log n + k) circle search (or some cleverer algorithm) to determine the actual number of neighbours that point has. Make a note of this point in a variable best_found and update its N[p] value.
Peek at the top of the heap. If the estimated number of neighbours is less than N[best_found] then we are done. Otherwise, repeat the above operation.
To improve the estimates you could use a finer grid, along with some clever sliding window techniques to reduce the amount of processing required (see, for instance, this answer for rectangular cases - for circular windows you should probably use a collection of FIFO queues). To increase robustness you can randomize the origin of the grid.
Considering again the example you posed: it's clear that this heuristic has the potential to save considerable time: with such a grid, only a single expensive check would need to be performed in order to prove that the middle point has the most neighbours. Again, a higher-resolution grid will improve the estimates and decrease the number of expensive checks which need to be made.
You could, and should, use a similar bounding technique in conjunction with mcdowella's answer; however, that answer does not provide a good place to start looking, so it is possible to spend a lot of time exploring low-value points.

How to calculate total volume of multiple overlapping cuboids

I have a list of Cuboids, defined by their coordinates of their lower-left-back and upper-right-front corners, with edges parallel to the axis. Coordinates are double values. These cuboids are densely packed, will overlap with one or more others, or even fully contain others.
I need to calculate the total volume encompassed by all the given cuboids. Areas which overlap (even multiple times) should be counted exactly once.
For example, the volumes:
((0,0,0) (3,3,3))
((0,1,0) (2,2,4))
((1,0,1) (2,5,2))
((6,6,6) (8,8,8))
The total volume is 27 + 2 + 2 + 8 = 39: the first cuboid contributes 27, the second adds 2 (its slice with z between 3 and 4), the third adds 2 (its slice with y between 3 and 5), and the last, disjoint cuboid adds 8.
Is there an easy way to do this (in O(n^3) time or better)?
How about maintaining a collection of non-intersecting cuboids as each one is processed?
This collection would start empty.
The first cuboid would be added to the collection – it would be the only element, therefore guaranteed not to intersect anything else.
The second and subsequent cuboids would be checked against the elements in the collection. For each new cuboid N, for each element E already in the collection:
If N is totally contained by E, discard N and resume processing at the next new cuboid.
If N totally contains E, remove E from the collection and continue testing N against the other elements in the collection.
If N intersects E, split N into up to five smaller cuboids (six in general, but at least one of the six pieces is always empty here, because the containment cases have already been handled) representing the volume of N that does not intersect E, and continue testing these smaller cuboids against the other elements in the collection.
If we get to the end of the tests against the non-intersecting elements with one or more cuboids generated from N (representing the volume contributed by N that wasn't in any of the previous cuboids) then add them all to the collection and process the next cuboid.
Once all the cuboids have been processed, the total volume will be the sum of the volumes in the collection of non-intersecting cuboids that has been built up.
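A sketch of the splitting step in C++ (meant to be called only when N and E intersect and neither contains the other, as in the procedure above); subtract peels off at most two slabs per axis, and in that situation at most five of them are non-empty:

#include <vector>

struct Box { double lo[3], hi[3]; };      // lower-left-back and upper-right-front corners

// The overlap test the procedure uses before deciding how to handle N against E.
bool intersects(const Box& a, const Box& b) {
    for (int d = 0; d < 3; ++d)
        if (a.hi[d] <= b.lo[d] || b.hi[d] <= a.lo[d]) return false;
    return true;
}

// Split n into axis-aligned pieces covering the part of n that lies outside e.
// At most two slabs per axis are produced; the remaining core (the part of n inside e) is dropped.
std::vector<Box> subtract(Box n, const Box& e)
{
    std::vector<Box> pieces;
    for (int d = 0; d < 3; ++d)
    {
        if (n.lo[d] < e.lo[d]) {                 // slab of n below e along axis d
            Box b = n; b.hi[d] = e.lo[d];
            pieces.push_back(b);
            n.lo[d] = e.lo[d];
        }
        if (n.hi[d] > e.hi[d]) {                 // slab of n above e along axis d
            Box b = n; b.lo[d] = e.hi[d];
            pieces.push_back(b);
            n.hi[d] = e.hi[d];
        }
    }
    return pieces;                               // what is left of n lies inside e and is discarded
}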
This can be solved efficiently using a plane-sweep algorithm, that is a straightforward extension of the line-sweep algorithm suggested here for finding the total area of overlapping rectangles.
For each cuboid, add its left and right x-coordinates to an event queue and sort the queue. Now sweep a yz-plane (one with a constant x value) through the cuboids and record the volume between any two successive events in the event queue. We do this by maintaining the list of rectangles that intersect the plane at any stage.
As we sweep the plane we encounter two types of events:
(1) We see the beginning of a new cuboid that starts intersecting the sweeping plane. In this case a new rectangle intersects the plane, and we update the area of the rectangles intersecting the sweeping plane.
(2) The end of an existing cuboid that was intersecting with the plane. In this case we have to remove the corresponding rectangle from the list of rectangles that are currently intersecting the plane and update the new area of the resulting rectangles.
The volume of the cuboids between any two successive events qi and qi+1 is equal to the horizontal distance between the two events times the area of the rectangles intersecting the sweep plane at qi.
By using the O(n log n) algorithm for computing the area of rectangles as a subroutine, we can obtain an O(n^2 log n) algorithm for computing the total volume of the cuboids. But there may be a better way of maintaining the rectangles (since we only add or delete a rectangle at any stage) that is more efficient.
I recently had the same problem and found the following approach easy to implement and working for n dimensions.
First build a grid and then check for each cell in the grid whether it overlaps with a cuboid or not. The volume of overlapping cuboids is the sum of the volumes for those cells which are included in one or more cuboids.
Describe your cuboids with their min/max value for each dimension.
For each dimension store min/max values of each cuboid in an array. Sort this array and remove duplicates.
Now you have the grid points of a non-equidistant grid. Each cell of the grid is either completely inside or completely outside any given cuboid.
Iterate over the grid cells and count the volume for those cells which overlap with one or more cuboids.
You can get all grid cells by using the Cartesian Product.
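A sketch of this grid approach in C++ for the 3D case; unionVolume tests the centre of every grid cell against every cuboid, which is easy to write but, as the follow-up below points out, becomes slow quickly (O(n^3) cells, each scanned against n cuboids):

#include <algorithm>
#include <vector>

struct Cuboid { double lo[3], hi[3]; };

// Total volume of the union, via a non-equidistant grid built from all min/max coordinates.
double unionVolume(const std::vector<Cuboid>& cs)
{
    std::vector<double> g[3];                              // grid planes per dimension
    for (int d = 0; d < 3; ++d) {
        for (const Cuboid& c : cs) { g[d].push_back(c.lo[d]); g[d].push_back(c.hi[d]); }
        std::sort(g[d].begin(), g[d].end());
        g[d].erase(std::unique(g[d].begin(), g[d].end()), g[d].end());
    }
    double total = 0;
    for (size_t i = 0; i + 1 < g[0].size(); ++i)
    for (size_t j = 0; j + 1 < g[1].size(); ++j)
    for (size_t k = 0; k + 1 < g[2].size(); ++k)
    {
        double x = (g[0][i] + g[0][i + 1]) / 2;            // centre of the cell: the whole cell is
        double y = (g[1][j] + g[1][j + 1]) / 2;            // either inside or outside every cuboid
        double z = (g[2][k] + g[2][k + 1]) / 2;
        for (const Cuboid& c : cs)
            if (c.lo[0] <= x && x <= c.hi[0] && c.lo[1] <= y && y <= c.hi[1] && c.lo[2] <= z && z <= c.hi[2])
            {
                total += (g[0][i + 1] - g[0][i]) * (g[1][j + 1] - g[1][j]) * (g[2][k + 1] - g[2][k]);
                break;                                     // count each cell once
            }
    }
    return total;
}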
I tried the cellular approach suggested by #ccssmnn; it worked but was way too slow. The problem is that the size of the array used for "For each dimension store min/max values of each cuboid in an array." is O(n), so the number of cells (hence, the execution time) is n^d, e.g., n^3 for three dimensions.
Next, I tried a nested sweep-line algorithm, as suggested by #krjampani; much faster but still too slow. I believe the complexity is n^2*log^3(n).
So now, I'm wondering if there's any recourse. I've read several postings that mention the use of interval trees or augmented interval trees; might this approach have better complexity, e.g., n*log^3(n)?
Also, I'm trying to get my head around what would be the augmenting value in this case? In the case of point or range queries, I can see sorting the cuboids by their (xlo,ylo,zlo) and using max(xhi,yhi,zhi) for each subtree as the augmenting value, but can't figure out how to extend this to keep track of the union of the cuboids and its volume.

Finding the farthest point in one set from another set

My goal is a more efficient implementation of the algorithm posed in this question.
Consider two sets of points in N-space (3-space for the example case of RGB colorspace; a solution for 1-space or 2-space differs only in the distance calculation). How do you find the point in the first set that is the farthest from its nearest neighbor in the second set?
In a 1-space example, given the sets A:{2,4,6,8} and B:{1,3,5}, the answer would be
8, as 8 is 3 units away from 5 (its nearest neighbor in B) while all other members of A are just 1 unit away from their nearest neighbor in B. edit: 1-space is overly simplified, as sorting is related to distance in a way that it is not in higher dimensions.
The solution in the source question involves a brute force comparison of every point in one set (all R,G,B where 512>=R+G+B>=256 and R%4=0 and G%4=0 and B%4=0) to every point in the other set (colorTable). Ignore, for the sake of this question, that the first set is elaborated programmatically instead of iterated over as a stored list like the second set.
First you need to find every element's nearest neighbor in the other set.
To do this efficiently you need a nearest neighbor algorithm. Personally I would implement a kd-tree just because I've done it in the past in my algorithm class and it was fairly straightforward. Another viable alternative is an R-tree.
Do this once for each element in the smaller set. (Take one element from the smaller set and run the nearest-neighbor search against the tree built over the larger set.)
From this you should be able to get a list of nearest neighbors for each element.
While finding the pairs of nearest neighbors, keep them in a sorted data structure which has a fast addition method and a fast getMax method, such as a heap, sorted by Euclidean distance.
Then, once you're done simply ask the heap for the max.
The run time for this breaks down as follows:
N = size of smaller set
M = size of the larger set
N * O(log M + 1) for all the kd-tree nearest neighbor checks.
N * O(1) for calculating the Euclidean distance before adding it to the heap.
N * O(log N) for adding the pairs into the heap.
O(1) to get the final answer :D
So in the end the whole algorithm is O(N*log M).
If you don't care about the order of each pair you can save a bit of time and space by only keeping the max found so far.
*Disclaimer: This all assumes you won't be using an enormously high number of dimensions and that your elements follow a mostly random distribution.
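The overall max-of-min structure, sketched in C++ with a brute-force inner loop standing in where the kd-tree (or R-tree) nearest-neighbour query would go, and keeping only the max found so far; Color, dist2 and farthestFromNearest are illustrative names:

#include <algorithm>
#include <vector>

struct Color { double c[3]; };                     // a point in 3-space (e.g. RGB)

// Squared Euclidean distance (comparing squared distances gives the same ordering).
static double dist2(const Color& a, const Color& b)
{
    double s = 0;
    for (int i = 0; i < 3; ++i) s += (a.c[i] - b.c[i]) * (a.c[i] - b.c[i]);
    return s;
}

// Index of the point in A that is farthest from its nearest neighbour in B.
int farthestFromNearest(const std::vector<Color>& A, const std::vector<Color>& B)
{
    int best = -1;
    double bestMin = -1;
    for (int i = 0; i < (int)A.size(); ++i)
    {
        double nearest = 1e300;
        for (const Color& b : B)                   // this scan is what a kd-tree query would replace
            nearest = std::min(nearest, dist2(A[i], b));
        if (nearest > bestMin) { bestMin = nearest; best = i; }   // keep only the max found so far
    }
    return best;
}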
The most obvious approach seems to me to be to build a tree structure on one set to allow you to search it relatively quickly. A kd-tree or similar would probably be appropriate for that.
Having done that, you walk over all the points in the other set and use the tree to find their nearest neighbour in the first set, keeping track of the maximum as you go.
It's n log(n) to build the tree, and log(n) for one search, so the whole thing should run in n log(n).
To make things more efficient, consider using a Pigeonhole algorithm - group the points in your reference set (your colorTable) by their location in n-space. This allows you to efficiently find the nearest neighbour without having to iterate all the points.
For example, if you were working in 2-space, divide your plane into a 5 x 5 grid, giving 25 squares, with 25 groups of points.
In 3 space, divide your cube into a 5 x 5 x 5 grid, giving 125 cubes, each with a set of points.
Then, to test point n, find the square/cube/group that contains n and test distance to those points. You only need to test points from neighbouring groups if point n is closer to the edge than to the nearest neighbour in the group.
For each point in set B, find the distance to its nearest neighbor in set A.
To find the distance to each nearest neighbor, you can use a kd-tree as long as the number of dimensions is reasonable, there aren't too many points, and you will be doing many queries - otherwise it will be too expensive to build the tree to be worthwhile.
Maybe I'm misunderstanding the question, but wouldn't it be easiest to just reverse the sign on all the coordinates in one data set (i.e. multiply one set of coordinates by -1), then find the first nearest neighbour (which would be the farthest neighbour)? You can use your favourite knn algorithm with k=1.
EDIT: I meant nlog(n) where n is the sum of the sizes of both sets.
In the 1-space case you could do something like this (pseudocode):
Use a structure like this
Struct Item {
int value
int setid
}
(1) Max Distance = 0
(2) Read all the sets into Item structures
(3) Create an Array of pointers to all the Items
(4) Sort the array of pointers by Item->value field of the structure
(5) Walk the array from beginning to end; for each Item from the first set, find the nearest Item with a different setid to its left and to its right (not just the adjacent entry), and take the smaller of those two distances as that Item's nearest-neighbor distance
(6) If that nearest-neighbor distance is greater than Max Distance, set Max Distance to this distance
Return the max distance.
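A sketch of those steps in C++ (setid 0 for the first set A, 1 for the second set B); leftB and rightB record, for each entry of the sorted array, the distance to the nearest set-B value on that side, since looking only at the immediately adjacent entry can miss the true nearest neighbor (in the example above, the nearest B-neighbor of 8 is 5, which is not adjacent to it):

#include <algorithm>
#include <vector>

struct Item { int value; int setid; };            // setid: 0 for set A, 1 for set B

// Sort everything once, then for each set-A item take the nearest set-B value to its
// left or right; return the largest of those nearest-neighbor distances.
int maxDistanceToNearest(std::vector<Item> items)
{
    std::sort(items.begin(), items.end(),
              [](const Item& a, const Item& b) { return a.value < b.value; });

    const int n = (int)items.size();
    const int INF = 1 << 30;
    std::vector<int> leftB(n, INF), rightB(n, INF);   // distance to nearest set-B value on each side

    int last = -1;                                    // index of last set-B item seen
    for (int i = 0; i < n; ++i) {
        if (last >= 0) leftB[i] = items[i].value - items[last].value;
        if (items[i].setid == 1) last = i;
    }
    last = -1;
    for (int i = n - 1; i >= 0; --i) {
        if (last >= 0) rightB[i] = items[last].value - items[i].value;
        if (items[i].setid == 1) last = i;
    }

    int maxDist = 0;
    for (int i = 0; i < n; ++i)
        if (items[i].setid == 0)                      // only set-A items are candidates
            maxDist = std::max(maxDist, std::min(leftB[i], rightB[i]));
    return maxDist;
}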
