Fast way to calculate k nearest points to point P - algorithm

I can't decide on the fastest way to pick the k nearest points to some point P from a set of n points. My guesses are below:
Compute all n distances, sort them and pick the k smallest values;
Compute the distances point by point and keep updating a k-sized set of current best candidates;
Any other approaches are welcome.

Getting the median (or any kth smallest element) is an O(n) operation, so the whole problem can be solved in O(n), which is optimal since every point has to be examined at least once: compute all the distances, find the kth smallest with a linear-time selection algorithm, and partition the whole set by that threshold.
One can also process the points in chunks of size K >> k.
The maximum of the first k distances works as a preliminary threshold: any point further away than that does not need to be considered. Points closer than the threshold are instead placed into a candidate array, and once the array size gets close to K, the linear-time kth-element algorithm re-partitions it, keeping the k closest and tightening the threshold.
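For illustration, here is a rough Python sketch of that chunked-threshold idea (assuming 2-D points as (x, y) tuples; a plain sort stands in for the linear-time kth-element re-partition, and the names are mine):

import math

def k_nearest_chunked(points, p, k, chunk_size=None):
    # Sketch only: 'chunk_size' plays the role of K >> k.
    if chunk_size is None:
        chunk_size = 16 * k
    dist = lambda q: math.hypot(q[0] - p[0], q[1] - p[1])
    # Seed the candidates with the first k points; their maximum distance
    # is the preliminary threshold.
    cand = [(dist(q), q) for q in points[:k]]
    threshold = max(d for d, _ in cand)
    for q in points[k:]:
        d = dist(q)
        if d >= threshold:
            continue                       # farther than the threshold: ignore
        cand.append((d, q))
        if len(cand) >= chunk_size:
            cand.sort(key=lambda t: t[0])  # stand-in for an O(len) selection step
            cand = cand[:k]
            threshold = cand[-1][0]        # tighten the threshold
    cand.sort(key=lambda t: t[0])
    return [q for _, q in cand[:k]]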

Finding the smallest k elements is O(n), for any value of k; it's O(n + k log k), and so at worst O(n log k), if you also need those k elements sorted. This is the partition-based selection algorithm.
You're best off reading the algorithm on Wikipedia. It's essentially quicksort, but you only need to recurse on one side, because the other side is guaranteed to be completely out or completely in. There are more expensive tricks (such as median-of-medians pivot selection) that guarantee O(n) in the worst case instead of only on average.
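For illustration, a small quickselect sketch (not in place; random pivot with three-way partitioning, my own names):

import random

def select_smallest(items, k, key=lambda x: x):
    # Return k items with the smallest keys (unordered), expected O(n).
    # Sort the returned list afterwards if you need the k results in order.
    if k <= 0:
        return []
    if k >= len(items):
        return list(items)
    pivot = key(random.choice(items))
    less    = [it for it in items if key(it) < pivot]
    equal   = [it for it in items if key(it) == pivot]
    greater = [it for it in items if key(it) > pivot]
    if k <= len(less):
        return select_smallest(less, k, key)
    if k <= len(less) + len(equal):
        return less + equal[:k - len(less)]
    return less + equal + select_smallest(greater, k - len(less) - len(equal), key)

# For the original question, e.g.:
# nearest = select_smallest(points, k, key=lambda q: (q[0]-P[0])**2 + (q[1]-P[1])**2)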

Related

What is the tightest lower bound for 2-D peak finding?

I would like to know the lower bound on the time complexity of algorithms that find just one peak in a 2-D array, and more generally in an N-D array. The best I have achieved for 2-D is O(n).
For an N x M array, consider the following algorithm:
We take the middle row, and in O(M) find its maximum element. If neither of its column-wise neighbors (the elements directly above and below it) is larger, it is a peak. Otherwise, if the element above it is larger, the top half of the rows must contain a peak; similarly for the bottom half if the element below is larger.
Thus, we can divide the search space in half with an O(M) step, leading to a total runtime of O(M log N). I do believe this is optimal, but have not proven it so.
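A sketch of that in Python (assuming a non-empty list-of-lists grid, and defining a peak as an element not smaller than its four neighbors):

def find_peak_2d(grid):
    # Repeatedly halve the set of candidate rows: O(M log N) for an N x M grid.
    top, bottom = 0, len(grid) - 1
    while True:
        mid = (top + bottom) // 2
        row = grid[mid]
        j = max(range(len(row)), key=row.__getitem__)   # max of the middle row, O(M)
        above = grid[mid - 1][j] if mid > 0 else float('-inf')
        below = grid[mid + 1][j] if mid + 1 < len(grid) else float('-inf')
        if row[j] >= above and row[j] >= below:
            return mid, j            # row maximum, so left/right neighbors can't be larger
        if above > row[j]:
            bottom = mid - 1         # the top half must contain a peak
        else:
            top = mid + 1            # the bottom half must contain a peak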

Find Pair with Difference less than K with O(n) complexity on average

I have an unsorted array of n positive numbers and a parameter k. I need to find out if there is a pair of numbers in the array whose difference is less than k, in O(n) average time and O(n) space.
I believe it requires the use of a universal hash table but I'm not sure how, any ideas?
This answer works even for unbounded integers and floats (making some assumptions about the quality of the hashmap you'll be using; Java's implementation should work, for instance):
Keep a hashmap<int, float> all_divided_values. For each key y, if all_divided_values[y] exists, it contains a value v from the array such that floor(v/k) = y.
For each value v in the original array A: if floor(v/k) is already among all_divided_values's keys, output (v, all_divided_values[floor(v/k)]) (they are less than k apart). Otherwise, store v in all_divided_values[floor(v/k)].
Once all_divided_values is filled, go through A again. For each v, test whether all_divided_values[floor(v/k) - 1] exists, and if so, output the pair (v, all_divided_values[floor(v/k) - 1]) if and only if abs(v - all_divided_values[floor(v/k) - 1]) < k.
Inserting into a hashmap is usually O(1) on average (with Java's HashMap, for instance), so the total time is O(n). But note that technically this could fail, for instance if your language's hashmap implementation does not handle collisions well.
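The same idea in Python, as a sketch (relies on the average O(1) behavior of Python's dict; assumes positive numbers and k > 0):

def has_close_pair(a, k):
    # Bucket each value v by floor(v / k); returns a pair with |x - y| < k, or None.
    buckets = {}
    for v in a:
        b = int(v // k)
        if b in buckets:
            return (v, buckets[b])         # same bucket, so the difference is < k
        buckets[b] = v
    # Each bucket now holds exactly one value, so checking neighboring
    # buckets is enough to catch the remaining close pairs.
    for v in a:
        b = int(v // k)
        for nb in (b - 1, b + 1):
            if nb in buckets and abs(v - buckets[nb]) < k:
                return (v, buckets[nb])
    return None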
Simple solution:
1- Sort the array
2- Calculate the difference between consecutive elements
a) If the difference is smaller than k return that pair
b) If no consecutive difference is smaller than k, then the array has no pair of numbers whose difference is smaller than k.
Sorting is O(n log n), but if you only have integers of limited size, you can use counting sort, which is O(n).
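In code, with a comparison sort (a small Python sketch):

def has_close_pair_sorted(a, k):
    # After sorting, a close pair (if any exists) must appear as consecutive elements.
    a = sorted(a)                      # O(n log n); counting sort would give O(n)
    for x, y in zip(a, a[1:]):
        if y - x < k:
            return (x, y)
    return None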
You can also look at it this way. The problem can be modeled as follows: convert each (integer) element A[i] into the interval (A[i]-k, A[i]+k); now you want to check whether any two of these intervals overlap.
The interval intersection problem without any sortedness is not solvable in O(n) in the worst case: you need to sort the intervals, and then in O(n) you can check whether they intersect.
The same goes for your problem. Sort it, then scan.

Grouping set of points to nearest pairs

I need an algorithm for the following problem:
I'm given a set of 2D points P = { (x_1, y_1), (x_2, y_2), ..., (x_n, y_n) } on a plane. I need to group them in pairs in the following manner:
Find two closest points (x_a, y_a) and (x_b, y_b) in P.
Add the pair <(x_a, y_a), (x_b, y_b)> to the set of results R.
Remove <(x_a, y_a), (x_b, y_b)> from P.
If P is not empty, go to step one.
Return set of pairs R.
This naive algorithm is O(n^3); using a faster algorithm for nearest-neighbor search it can be improved to O(n^2 log n). Can it be made any better?
And what if the points are not in Euclidean space?
An example (resulting groups are circled by red loops):
Put all of the points into an R-tree (http://en.wikipedia.org/wiki/R-tree), which takes O(n log(n)) time, then for each point calculate the distance to its nearest neighbor. Put the points and these initial distances into a priority queue. Initialize an empty set of removed points and an empty set of pairs, then run the following pseudocode:
while priority_queue is not empty:
    (distance, point) = priority_queue.get()
    if point in removed_set:
        continue
    neighbor = rtree.find_nearest_neighbor(point)
    if distance < distance_between(point, neighbor):
        # The previous neighbor was removed; re-queue with the updated distance.
        priority_queue.add((distance_between(point, neighbor), point))
    else:
        # This is the closest remaining pair.
        found_pairs.add((point, neighbor))
        removed_set.add(point)
        removed_set.add(neighbor)
        rtree.remove(point)
        rtree.remove(neighbor)
The slowest part of this is the nearest neighbor searches. An R-tree does not guarantee that those nearest neighbor searches will be O(log(n)). But they tend to be. Furthermore you are not guaranteed that you will do O(1) neighbor searches per point. But typically you will. So average performance should be O(n log(n)). (I might be missing a log factor.)
This problem calls for a dynamic Voronoi diagram I guess.
When the Voronoi diagram of a point set is known, the nearest neighbor pair can be found in linear time.
Then deleting these two points can be done in linear or sublinear time (I didn't find precise info on that).
So globally you can expect an O(N²) solution.
If your distances are arbitrary and you can't embed your points into Euclidean space (and/or the dimension of the space would be really high), then there's basically no way around at least a quadratic-time algorithm, because you don't know which pair is closest until you have checked all the pairs. It is easy to get very close to that bound: sort all pairs according to distance, maintain a boolean lookup table of points that have already been taken, and walk through the sorted pairs in order, adding a pair of points to your "nearest neighbors" whenever neither point of the pair is in the lookup table, and then marking both points as taken. Complexity O(n^2 log n), with O(n^2) extra space.
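A sketch of that greedy matching in Python (dist is any distance function; the names are mine):

from itertools import combinations

def greedy_nearest_pairs(points, dist):
    # Sort all O(n^2) pairs by distance, then greedily take a pair whenever
    # neither endpoint has been matched yet: O(n^2 log n) time, O(n^2) space.
    order = sorted(combinations(range(len(points)), 2),
                   key=lambda ij: dist(points[ij[0]], points[ij[1]]))
    taken = [False] * len(points)
    result = []
    for i, j in order:
        if not taken[i] and not taken[j]:
            result.append((points[i], points[j]))
            taken[i] = taken[j] = True
    return result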
You can find the closest pair with the classic divide and conquer algorithm that runs in O(n log n) time; repeating it n times gives O(n^2 log n), which is not better than what you already have.
Nevertheless, you can exploit the recursive structure of the divide and conquer algorithm. If the pair of points you removed was on the right side of the partition, nothing changes on the left side, so you only have to redo the O(log n) merge steps bottom up. The first new merge step merges 2 elements, the second 4, then 8, 16, ..., n/4, n/2, n, so the total work across these merge steps is O(n), and you get the next closest pair in just O(n) time. Repeating this n/2 times, removing the previously found pair each time, gives a total O(n^2) runtime with O(n log n) extra space to keep track of the recursive steps, which is a little better.
But you can do even better: there is a randomized data structure that lets you do updates on your point set with expected O(log n) query and update time. I'm not very familiar with that particular data structure, but you can find it in this paper. That would make your algorithm O(n log n) expected time. I'm not sure whether there is a deterministic version with similar runtimes, but those tend to be much more cumbersome.

Algorithm to solve variation of k-partition

This brain teaser came up in my algorithm design class homework:
Given a list of n distinct positive integers, partition the list into two sublists of size n/2 each such that the difference between the sums of the sublists is maximized.
Assume that n is even and determine the time complexity.
At first glance, the solution seems to be
sort the list via mergesort
select the element at position n/2
add all elements greater than it to the high array
add all elements smaller than it to the low array
This would have a time complexity of O(n log n + n)
Are there any better algorithm choices for this problem?
Since you can calculate the median in O(n) time, you can also solve this problem in O(n) time: calculate the median and, using it as a threshold, create the high array and the low array.
See http://en.wikipedia.org/wiki/Median_search for calculating the median in O(n) time.
Try
http://en.wikipedia.org/wiki/Selection_algorithm#Linear_general_selection_algorithm_-_Median_of_Medians_algorithm
What you're effectively doing is finding the median. The trick is that once you've found it, you never needed to sort the first n/2 or the last n/2 elements at all.
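A short sketch of the median-threshold idea (assumes distinct values and even n, as in the problem; sorted() merely stands in for an O(n) selection such as median of medians):

def max_difference_split(a):
    # Split into a low half and a high half of size n/2 each; the difference
    # of their sums is then maximized.
    n = len(a)
    median = sorted(a)[n // 2 - 1]       # stand-in for an O(n) median-find
    low  = [x for x in a if x <= median]
    high = [x for x in a if x > median]
    return low, high

# e.g. max_difference_split([7, 1, 4, 9, 2, 8]) -> ([1, 4, 2], [7, 9, 8])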

In-place sorting algorithm for the k smallest integers in an array on n distinct integers

Is there an in-place algorithm to arrange the k smallest integers in an array of n distinct integers with 1<=k<=n?
I believe counting sort can be modified for this, but I can't seem to figure out how.
Any help will be appreciated.
How about selection sort? It runs in place in O(n^2), and you can simply stop after you've found the k smallest elements, which takes only O(nk).
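For instance, a quick Python sketch:

def partial_selection_sort(a, k):
    # In-place selection sort stopped after k passes: a[:k] ends up holding the
    # k smallest elements in sorted order. O(n*k) comparisons, O(1) extra space.
    n = len(a)
    for i in range(min(k, n)):
        m = min(range(i, n), key=a.__getitem__)   # index of the smallest remaining element
        a[i], a[m] = a[m], a[i]
    return a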
Do you want to partition the array so that the k smallest elements are the first k elements (not necessarily in sorted order)? If so, what you are looking for is the generalized median-find (selection) algorithm, which runs in O(n); just google for "median find algorithm".
If you can live with a randomized algorithm that finishes in linear time with high probability, then all you have to do is pick your pivot randomly, which greatly simplifies the implementation.
You could use randomized selection to select the kth smallest integer in O(n) time, then partition on that element, and then use quicksort on the k smallest elements. This uses O(1) additional memory and runs in total time O(n + k log k).
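A hedged sketch of that approach in Python (random pivots with Hoare-style partitioning done in place; the final prefix sort uses Python's built-in sort, which strictly speaking is not O(1) extra space):

import random

def k_smallest_in_place(a, k):
    # Move the k smallest elements into a[:k] via randomized selection,
    # then sort just that prefix: expected O(n + k log k) time overall.
    lo, hi = 0, len(a) - 1
    while lo < hi:
        p = a[random.randint(lo, hi)]
        i, j = lo, hi
        while i <= j:                     # partition around the pivot value p
            while a[i] < p: i += 1
            while a[j] > p: j -= 1
            if i <= j:
                a[i], a[j] = a[j], a[i]
                i, j = i + 1, j - 1
        if k - 1 <= j:
            hi = j                        # the kth smallest lies in the left part
        elif k - 1 >= i:
            lo = i                        # the kth smallest lies in the right part
        else:
            break                         # it lies in the middle block equal to p
    a[:k] = sorted(a[:k])                 # only the k smallest need sorting
    return a[:k]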
You're looking for a selection algorithm. BFPRT will give you guaranteed worst-case O(n) performance, but it's pretty complex.
