I want to find an algorithm that solves the following problem.
Suppose we have a point cloud with N points of dimension m. We want to divide the point cloud into sub-clouds, where every sub-cloud has size greater than or equal to k, and we want to minimize the following:
each sub-cloud's size is as close as possible to k;
the distances between points within each sub-cloud.
Any direction toward a solution would be great, and an implementation in Python would be appreciated.
Have you thought about using the k-means clustering algorithm?
I know it's not a perfect solution, as you still need to enforce the minimum-size condition, but it's a good direction.
To solve the issue I would:
Choose the number of clusters to be something around N / (the size of a sub-cloud, what you called k). I think that has the best chance of success.
For each sub-cloud returned by the algorithm that is smaller than the wanted size, add its points to the closest sub-cloud that has been created.
Hope it helped you somehow!
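A minimal sketch of that idea in pure Python, using a plain Lloyd's-style k-means followed by the merge step. The function names and the nearest-centroid merge heuristic are my own choices for illustration, not a standard API, and the merge is a heuristic, not an optimal assignment:

```python
import math
import random

def kmeans(points, n_clusters, iters=50, seed=0):
    """Plain Lloyd's algorithm; returns the non-empty clusters as lists of points."""
    rng = random.Random(seed)
    centers = rng.sample(points, n_clusters)
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # Move each center to its cluster's mean; keep old center if cluster is empty.
        centers = [tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centers[i]
                   for i, cl in enumerate(clusters)]
    return [cl for cl in clusters if cl]

def centroid(cluster):
    return tuple(sum(c) / len(cluster) for c in zip(*cluster))

def cluster_min_size(points, k, seed=0):
    """Cluster with about N/k centers, then merge each undersized cluster
    into the cluster whose centroid is nearest."""
    clusters = kmeans(points, max(1, len(points) // k), seed=seed)
    clusters.sort(key=len)
    while len(clusters) > 1 and len(clusters[0]) < k:
        small = clusters.pop(0)
        target = min(range(len(clusters)),
                     key=lambda j: math.dist(centroid(small), centroid(clusters[j])))
        clusters[target].extend(small)
        clusters.sort(key=len)
    return clusters
```

After the merge loop, every remaining cluster has at least k points (or everything has collapsed into a single cluster), which is exactly the post-processing step described above.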
For example, given the array a[] = {1, 1, 10}, we need to find x such that |x-1| + |x-1| + |x-10| is minimal. Here, x is 1.
Could it be solved with a greedy approach, like taking the average, or something else?
Note: taking the average doesn't work. Why not?
I can only come up with an O(n log n) solution (sorting plus binary search); is there any other approach, like DP?
Thanks in advance!
This is a very simple question (don't feel offended: at least, the answer is simple once you know it):
If you have an odd number of items, the one in the middle will be your optimum.
If you have an even number, any number between the central two will be equally optimal.
So no numerical calculation is needed, only sorting :)
Explanation: moving your resulting number changes each abs(x - x_i) by the same amount, but with different signs: some increasing and some decreasing.
From the point in the center (in the odd case), moving in either direction makes one more term increase than decrease; between the two central points (in the even case), there are equally many terms of each sign, so the sum stays flat.
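Since only sorting is needed, the whole solution fits in a few lines. A sketch (the function name is my own; for even n any value between the two central elements is equally good, so returning the lower median is just one valid choice):

```python
def min_abs_sum_point(a):
    """Return a value x minimizing sum(|x - a_i|): the (lower) median."""
    s = sorted(a)
    # For odd n this is the middle element; for even n, anything in
    # [s[n//2 - 1], s[n//2]] is equally optimal, and this picks the lower end.
    return s[(len(s) - 1) // 2]
```

For the example above, min_abs_sum_point([1, 1, 10]) returns 1, matching the claimed optimum.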
I am currently assigned to create a C++ program to find the closest pair of points in an (x, y) coordinate system. However, I am having a lot of trouble understanding one thing.
Every tutorial/guide I have read about the closest pair problem tells me to sort the set of points by their Y coordinates, but I don't see what the point of this is. Can someone explain why we sort by Y coordinates and what the use of it is? I understand that we sort the points by X in order to get L and X*, but I just don't understand why we have to sort the points by Y coordinates as well.
You don't have to, but then your running time is not improved over O(n^2). The whole point is to compute as little as possible: examine as few points as possible and ignore those you know cannot be part of the answer. Sorting by Y is what makes that possible in the merge step.
Here's a pretty good explanation I just googled: http://www.cs.mcgill.ca/~cs251/ClosestPair/ClosestPairDQ.html
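To make the role of the y-ordering concrete, here is a sketch of the divide-and-conquer algorithm. For brevity it re-sorts the strip by y at every level, giving O(n log^2 n) rather than the fully optimized O(n log n), but the key point survives: once the strip is in y order, each point only needs comparing against a constant number of neighbours:

```python
import math

def closest_pair(pts):
    """Return (distance, (p, q)) for the closest pair among 2+ points."""
    pts = sorted(pts)  # sort by x once

    def rec(p):
        n = len(p)
        if n <= 3:  # brute-force the small cases
            return min((math.dist(a, b), (a, b))
                       for i, a in enumerate(p) for b in p[i + 1:])
        mid = n // 2
        midx = p[mid][0]
        best = min(rec(p[:mid]), rec(p[mid:]))
        d = best[0]
        # The strip: points within d of the dividing line, sorted by y.
        strip = sorted((q for q in p if abs(q[0] - midx) < d),
                       key=lambda q: q[1])
        # In y order, a closer pair must sit within a d x 2d box, which can
        # hold only a constant number of points -- so 7 neighbours suffice.
        for i, a in enumerate(strip):
            for b in strip[i + 1:i + 8]:
                if b[1] - a[1] >= d:
                    break
                best = min(best, (math.dist(a, b), (a, b)))
                d = best[0]
        return best

    return rec(pts)
```

Without the y-sort, every point in the strip would have to be checked against every other, and the strip can contain all n points, pushing the merge step back to quadratic time.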
I was browsing Stack Overflow for an algorithm to find the closest point to point A from a list of 2D points. I know the list must be sorted to get an optimal time, but I want it faster than the O(N**2) brute-force approach.
I found an answer that seems appealing here: Given list of 2d points, find the point closest to all other points, but I don't fully understand the answer. I am wondering about the part where he begins explaining about the horizontal/vertical cost and from that point onward. Could someone provide me an example of what to do in the case of these (random) points?
point A: (20, 40.6)
List of Points[(-20,200),(12,47), (4,0), (-82,92), (40,15), (112, 97), (-203, 84)]
If you can provide an alternative method to the linked post that would also be fine. I know it has something to do with sorting the list, and probably taking out the extremes, but again I'm not sure what method(s) to use.
EDIT: I understand now that it is not the Euclidean distance I am most interested in. Would a divide-and-conquer algorithm be the best bet here? I don't fully understand it yet, but it sounds like it solves what I want in O(N*log(N)). Would this approach be optimal, and if so, would someone mind breaking it down to the basics? I have been unable to follow it as most other sites describe it.
What you are trying to do is not possible if there is no structure in the list of points and they can really be arbitrary. Assume you have an algorithm that runs faster than linear time; then there is at least one point B in your list that the algorithm never reads. If I change B to another value, the algorithm will run the same way and return the same result. Now, if the algorithm does not return a point of the list that is the same as A, I can set B = A; the correct solution to the problem would now be B (you can't get any closer than actually being the same point), and the algorithm would necessarily return a wrong result.
What the question you are referring to is trying to do is find a point A out of a list L such that the sum of all distances between A and every point in L is minimal. The algorithm described in the answer runs in time O(n*log(n)), where n is the number of points. Note that n*log(n) grows faster than n, so it is actually slower than looking at every element.
Also, "distance" in that question does not refer to the Euclidean distance. Where normally you would define the distance between a point (x_1, y_1) and a second point (x_2, y_2) to be sqrt((x_2-x_1)^2 + (y_2-y_1)^2), the question uses the "taxicab distance" |x_2-x_1| + |y_2-y_1|, where | | denotes the absolute value.
Re: edit
If you just want to find the one point of the list closest to a fixed point A, then you can search for it linearly. See the following Python code:
def distance(x, y):
    # Manhattan distance
    x1, x2 = x
    y1, y2 = y
    return abs(x1 - y1) + abs(x2 - y2)

def findClosest(a, listOfPoints):
    minDist = float("inf")
    minIndex = None
    for index, b in enumerate(listOfPoints):
        d = distance(a, b)
        if d < minDist:
            minDist = d
            minIndex = index
    return minDist, minIndex

a = (20, 40.6)
listOfPoints = [(-20, 200), (12, 47), (4, 0), (-82, 92), (40, 15), (112, 97), (-203, 84)]
minDist, minIndex = findClosest(a, listOfPoints)
print("minDist:", minDist)
print("minIndex:", minIndex)
print("closest point:", listOfPoints[minIndex])
The challenge of the referenced question is that you don't want to minimize the distance to a fixed point, but that you want to find a point A out of the list L whose average distance to all other points in L is minimal.
I have been thinking about a variation of the closest pair problem in which the only available information is the set of distances already calculated (we are not allowed to sort points according to their x-coordinates).
Consider 4 points (A, B, C, D), and the following distances:
dist(A,B) = 0.5
dist(A,C) = 5
dist(C,D) = 2
In this example, I don't need to evaluate dist(B,C) or dist(A,D), because it is guaranteed that these distances are greater than the current known minimum distance.
Is it possible to use this kind of information to reduce the cost from O(n²) to something like O(n log n)?
Is it possible to reduce the cost to something close to O(n log n) if I accept an approximate solution? In this case, I am thinking about some technique based on reinforcement learning that only converges to the exact solution as the number of reinforcements goes to infinity, but provides a good approximation for small n.
Processing time (measured in big-O notation) is not the only issue: storing a very large number of previously calculated distances can also be a problem.
Imagine this problem for a set with 10⁸ points.
What kind of solution should I look for? Was this kind of problem solved before?
This is not a classroom problem or something related. I have been just thinking about this problem.
I suggest using ideas that are derived from quickly solving k-nearest-neighbor searches.
The M-Tree data structure (see http://en.wikipedia.org/wiki/M-tree and http://www.vldb.org/conf/1997/P426.PDF) is designed to reduce the number of distance comparisons that need to be performed to find "nearest neighbors".
Personally, I could not find an implementation of an M-Tree online that I was satisfied with (see my closed thread Looking for a mature M-Tree implementation) so I rolled my own.
My implementation is here: https://github.com/jon1van/MTreeMapRepo
Basically, this is a binary tree in which each leaf node contains a HashMap of Keys that are "close" in some metric space you define.
I suggest using my code (or the idea behind it) to implement a solution in which you:
Search each leaf node's HashMap and find the closest pair of Keys within that small subset.
Return the closest pair of Keys when considering only the "winner" of each HashMap.
This style of solution would be a "divide and conquer" approach that returns an approximate solution.
You should know this code has an adjustable parameter that governs the maximum number of Keys that can be placed in an individual HashMap. Reducing this parameter will increase the speed of your search, but it will also increase the probability that the correct solution won't be found, because one Key may land in HashMap A while the second Key lands in HashMap B.
Also, each HashMap is associated with a "radius". Depending on how accurate you want your result, you may be able to search only the HashMap with the largest hashMap.size()/radius (this HashMap contains the highest density of points, so it is a good search candidate).
Good Luck
If you only have sample distances, not original point locations in a plane you can operate on, then I suspect you are bounded at O(E).
Specifically, it would seem from your description that any valid solution needs to inspect every edge in order to rule out its having something interesting to say; meanwhile, inspecting every edge and taking the smallest already solves the problem.
Planar versions get below O(V^2) by using planar distances to deduce limitations on sets of edges, which lets us avoid looking at most of the edge weights.
Use the same idea as in space partitioning. Recursively split the given set of points by choosing two points and dividing the set into two parts: points closer to the first point and points closer to the second. That is the same as splitting the points by the line halfway between the two chosen points (their perpendicular bisector).
This produces a (binary) space partitioning, on which standard nearest-neighbour search algorithms can be used.
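As an illustration only (the function names and the leaf-bucket heuristic are my own), here is a sketch of that recursive two-pivot split, used to find an approximate closest pair by comparing points only within each leaf, in the spirit of the divide-and-conquer ideas above:

```python
import math
import random

def build_partition(points, leaf_size=8, rng=None):
    """Recursively split points by which of two randomly chosen pivots is nearer."""
    rng = rng or random.Random(0)
    if len(points) <= leaf_size:
        return [points]  # leaf: a small bucket of mutually close points
    p, q = rng.sample(points, 2)
    left = [x for x in points if math.dist(x, p) <= math.dist(x, q)]
    right = [x for x in points if math.dist(x, p) > math.dist(x, q)]
    if not left or not right:  # degenerate split (e.g. duplicate points); stop
        return [points]
    return (build_partition(left, leaf_size, rng)
            + build_partition(right, leaf_size, rng))

def approx_closest_pair(points, leaf_size=8):
    """Approximate: only pairs that land in the same leaf are ever compared."""
    best = (float("inf"), None)
    for leaf in build_partition(points, leaf_size):
        for i, a in enumerate(leaf):
            for b in leaf[i + 1:]:
                best = min(best, (math.dist(a, b), (a, b)))
    return best
```

The result is exact whenever the true closest pair ends up in the same leaf; a larger leaf_size raises that probability at the cost of more comparisons per leaf.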
I need to generate n random points in general position in the plane, i.e. no three points may lie on the same line. The points should have integer coordinates and lie inside a fixed m x m square. What would be the best algorithm to solve such a problem?
Update: square is aligned with the axes.
Since they're integers within a square, treat them as points in a bitmap. When you add a point after the first, use Bresenham's algorithm to paint all pixels on each of the lines going through the new point and one of the old ones. When you need to add a new point, get a random location and check if it's clear; otherwise, try again. Since each pair of pixels gives a new line, and thus excludes up to m-2 other pixels, as the number of points grows you will have several random choices rejected before you find a good one. The advantage of the approach I'm suggesting is that you only pay the cost of going through all lines when you have a good choice, while rejecting a bad one is a very quick test.
(if you want to use a different definition of line, just replace Bresenham's with the appropriate algorithm)
Can't see any way around checking each point as you add it, either by (a) running through all of the possible lines it could be on, or (b) eliminating conflicting points as you go along to reduce the possible locations for the next point. Of the two, (b) seems like it could give you better performance.
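A sketch of the bitmap idea, with one substitution I should flag: instead of rasterizing with Bresenham, it steps along the exact lattice line through each pair, using the fact that the integer points on a line through two lattice points sit at multiples of the primitive direction vector. This blocks exactly the grid points that would create a collinear triple. The function name and the retry cap are my own choices:

```python
import math
import random

def general_position_points(n, m, seed=0):
    """Pick n integer points in an m x m grid with no three collinear."""
    rng = random.Random(seed)
    blocked = set()   # grid points lying on a line through two chosen points
    chosen = []
    attempts = 0
    while len(chosen) < n:
        attempts += 1
        if attempts > 100000:
            raise RuntimeError("grid too crowded; increase m or lower n")
        p = (rng.randrange(m), rng.randrange(m))
        if p in blocked or p in chosen:
            continue  # rejecting a bad candidate is a cheap set lookup
        # Accepting p: block every lattice point on each line through p
        # and a previously chosen point, out to the grid boundary.
        for q in chosen:
            dx, dy = p[0] - q[0], p[1] - q[1]
            g = math.gcd(abs(dx), abs(dy))
            sx, sy = dx // g, dy // g  # primitive direction of the line
            for s in (1, -1):          # walk both directions from p
                x, y = p
                while 0 <= x < m and 0 <= y < m:
                    blocked.add((x, y))
                    x, y = x + s * sx, y + s * sy
        chosen.append(p)
    return chosen
```

As in the description above, the expensive line-marking work is only paid when a candidate is accepted; rejections cost a single membership test.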
Similar to @LaC's answer. If memory is not a problem, you could do it like this:
Add all points on the plane to a list (L).
Shuffle the list.
For each point (P) in the list:
    For each point (Q) previously picked:
        Remove every point from L that is collinear with P and Q.
    Add P to the picked list.
You could continue the outer loop until you have enough points, or until you run out of them.
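A direct, unoptimized sketch of those steps (the function name is my own, and collinearity is tested with a cross product). This is the naive implementation whose complexity a later answer criticizes, but it shows the algorithm plainly:

```python
import itertools
import random

def pick_general_position(n, m, seed=0):
    """Shuffle all m*m grid points; greedily keep a point and discard
    everything collinear with it and any previously picked point."""
    rng = random.Random(seed)
    candidates = list(itertools.product(range(m), repeat=2))
    rng.shuffle(candidates)
    removed = set()
    picked = []
    for p in candidates:
        if p in removed:
            continue
        # Cross out every remaining candidate collinear with p and a picked q.
        for q in picked:
            for r in candidates:
                if r in removed or r == p or r == q:
                    continue
                if (q[0] - p[0]) * (r[1] - p[1]) == (q[1] - p[1]) * (r[0] - p[0]):
                    removed.add(r)
        picked.append(p)
        if len(picked) == n:
            break
    return picked
```

Because every point collinear with an existing pair is crossed out before it can be reached, any point that survives to be picked keeps the set in general position by construction.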
This might just work (though it might be a little constrained in how random it is). Find the largest circle you can draw within the square (this seems very doable). Pick any n points on the circle; since a line meets a circle in at most two points, no three will ever be collinear :-).
This should be an easy enough task in code. Say the circle is centered at the origin (so something of the form x^2 + y^2 = r^2). With r fixed and x randomly generated, you can solve for the y coordinates. This gives you two points on the circle for every x, mirrored across the x-axis. Hope this helps.
Edit: Oh, integer points, I just noticed that. That's a pity. I'm going to keep this solution up though, since I like the idea.
Both @LaC's and @MizardX's solutions are very interesting, but you can combine them to get an even better one.
The problem with @LaC's solution is that random choices get rejected. The more points you have already generated, the harder it gets to generate new ones. If there is only one available position left, you have only a slight chance of choosing it at random (1/(m*m)).
In @MizardX's solution you never get rejected choices. However, if you directly implement the "remove every point from L which is collinear with P and Q" step, you get worse complexity (O(n^5)).
Instead, it would be better to use a bitmap to find which points from L are to be removed. The bitmap would hold, for each point, either a value indicating that it is free to use along with its location in the L list, or a value indicating that it is already crossed out. This way you get a worst-case complexity of O(n^4), which is probably optimal.
EDIT:
I've just found that question: Generate Non-Degenerate Point Set in 2D - C++
It's very similar to this one. It would be good to use the solution from that answer (Generate Non-Degenerate Point Set in 2D - C++). Modifying it a bit to use radix or bucket sort, and initially adding all n^2 possible points to the set P and shuffling it, one can also get a worst-case complexity of O(n^4) with much simpler code. Moreover, if space is a problem and @LaC's solution is not feasible due to its space requirements, this algorithm will fit in without modifications and offer a decent complexity.
Here is a paper that may solve your problem: "Point-sets in general position with many similar copies of a pattern" by Bernardo M. Abrego and Silvia Fernandez-Merchant.
You don't specify which plane, but you could just generate three random numbers and assign them to x, y, and z.
If "the plane" is arbitrary, then set z = 0 every time, or something similar.
Check x and y to see whether they lie within your m boundary.
Then compare each new (x, y) pair against the pairs already chosen to see if it lies on the same line as two of them; if it does, regenerate the random values.