I would like to know the lower bound on time complexity for algorithms that need to find just one peak in a 2D array, and more generally in an N-D array. The best I have achieved for 2-D is O(n).
For an N x M array, consider the following algorithm:
We take the middle row and, in O(M), find its maximum element. If neither of its (column-wise) neighbors is larger, it is a peak. Otherwise, if the element above it is larger, the top half of the rows must contain a peak; similarly, of course, for the bottom half.
Thus, we can halve the search space with an O(M) step, leading to a total runtime of O(M log N). I do believe this is optimal, but I have not proven it.
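Here is a minimal sketch of that recursion in Python; the iterative form and the names are my own, and a non-empty list-of-lists grid is assumed:

def find_peak(grid):
    # Peak = element not smaller than its up/down/left/right neighbours.
    top, bottom = 0, len(grid) - 1
    while True:
        mid = (top + bottom) // 2
        row = grid[mid]
        j = max(range(len(row)), key=row.__getitem__)   # column of the row maximum, O(M)
        if mid > top and grid[mid - 1][j] > row[j]:
            bottom = mid - 1      # a peak must lie in the upper half of the rows
        elif mid < bottom and grid[mid + 1][j] > row[j]:
            top = mid + 1         # a peak must lie in the lower half of the rows
        else:
            return mid, j         # neither column-wise neighbour is larger: a peak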
I cannot decide on the fastest way to pick the k nearest points to some point P from an n-point set. My guesses are below:
Compute all n distances, sort them, and pick the k smallest values;
Compute the distances point by point and update a k-sized stack of the closest candidates;
Any other approaches are welcome.
Getting the median is an O(n) operation, so the whole problem has a minimum complexity of O(n): compute all the distances, find the k-th smallest one in linear time, and partition the whole set by this threshold.
One can also work in chunks of K >> k.
The maximum of the first k distances works as a preliminary threshold: any point farther than that does not need to be considered. Points closer than the threshold are instead placed into an array, and once the array size gets close to K, the linear-time k-th element algorithm is used to re-partition the array (and tighten the threshold), as in the sketch below.
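A rough sketch of that chunked filtering in Python; `select`, `chunk_size` and the other names are my own, `select(pairs, k)` is assumed to be a linear-time routine returning the k pairs with the smallest distance (e.g. quickselect), and at least k input points are assumed:

def k_nearest_chunked(points, p, k, select, chunk_size=4096):
    px, py = p
    def dist(q):                                           # squared distance is enough for ranking
        return (q[0] - px) ** 2 + (q[1] - py) ** 2
    buffer = [(dist(q), q) for q in points[:k]]            # the first k points seed the buffer
    threshold = max(d for d, _ in buffer)                  # preliminary threshold
    for q in points[k:]:
        d = dist(q)
        if d < threshold:                                  # points farther than the threshold are skipped
            buffer.append((d, q))
            if len(buffer) >= chunk_size:                  # buffer has grown to ~K >> k entries
                buffer = select(buffer, k)                 # linear-time re-partition down to k entries
                threshold = max(d for d, _ in buffer)      # the threshold tightens
    return [q for _, q in select(buffer, k)]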
Finding the smallest k elements is O(n), for any value of k. It's O(n log k) if you need those k elements sorted. This is the partition-based selection algorithm (quickselect).
You're best off reading the algorithm on Wikipedia. It's quicksort, but you only need to 'recurse' on one side, because the other side is guaranteed to be completely out or completely in. There are more expensive tricks (such as median of medians) that make these bounds worst-case guarantees instead of merely averages.
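Here is a minimal quickselect-style sketch of that one-sided recursion in Python, applied to the k-nearest-points question above (names are mine; equal distances are grouped so the recursion always shrinks):

import random

def k_nearest(points, p, k):
    px, py = p
    def d2(q):                                   # squared distance is enough for comparisons
        return (q[0] - px) ** 2 + (q[1] - py) ** 2

    def smallest(items, k):
        if k <= 0:
            return []
        if k >= len(items):
            return items
        pivot = d2(random.choice(items))         # random pivot distance
        closer = [q for q in items if d2(q) < pivot]
        equal = [q for q in items if d2(q) == pivot]
        if k <= len(closer):
            return smallest(closer, k)           # recurse on one side only
        if k <= len(closer) + len(equal):
            return closer + equal[:k - len(closer)]
        farther = [q for q in items if d2(q) > pivot]
        return closer + equal + smallest(farther, k - len(closer) - len(equal))

    return smallest(list(points), k)             # k nearest, in no particular order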
I have a set of points in the plane that I want to sort based on when they encounter an arbitrary sweepline. An alternative definition is that I want to be able to sort them based on any linear combination of the x- and y-coordinates. I want to do the sorting in linear time, but am allowed to perform precomputation on the set of points in quadratic time (but preferably O(n log(n))). Is this possible? I would love a link to a paper that discusses this problem, but I could not find it myself.
For example, if I have the points (2,2), (3,0), and (0,3) I want to be able to sort them on the value of 3x+2y, and get [(0,3), (3,0), (2,2)].
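For reference, the naive per-query O(n log n) sort for this example is a one-liner in Python:

points = [(2, 2), (3, 0), (0, 3)]
print(sorted(points, key=lambda p: 3 * p[0] + 2 * p[1]))
# [(0, 3), (3, 0), (2, 2)]  -- keys 6, 9, 10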
Edit: In the comments below the question, a helpful commenter has shown me that the naive algorithm of enumerating all possible sweeplines gives an O(n^2 log(n)) preprocessing algorithm (thanks again!). Is an O(n log(n)) preprocessing algorithm possible?
First note that enumerating all of the sweeplines takes O(n^2 log(n)), but then you have to store a sorted order of the points for each of the n^2 sweeplines. Doing that naively will take O(n^3 log(n)) time and O(n^3) space.
I think I can get average performance down to O(n) with O(n^2 log*(n)) time and O(n^2) space spent on preprocessing. (Here log* is the iterated logarithm and for all intents and purposes it is a constant.) But this is only average performance, not worst case.
The first thing to note is that there are n choose 2 = n*(n-1)/2 pairs of points. As the sweepline rotates through 360 degrees, each pair swaps order twice, giving at most O(n^2) different orderings and O(n^2) pair crossings between them. Also note that after a pair crosses, it does not cross again for 180 degrees; over any range of less than 180 degrees, a given pair either crosses once or not at all.
Now the idea is that we'll store a random O(n) of those possible orderings, together with the sweeplines they correspond to. Between any stored sweepline and the next, O(n^2 / n) = O(n) pairs of points cross. Therefore each of the two bracketing stored orderings is, on average, only O(1) positions away from the order we want, and every inversion between the first stored order and the order we want is also an inversion between the first and second stored orders. We'll use this to find our final sort in O(n).
Let me fill in the details backwards.
We have our O(n) sweeplines precalculated. In O(log(n)) time we find the two stored sweeplines nearest to the query one. Let's assume we have the following data structures:
pos1: Lookup from point to its position in sweepline 1.
points1: Lookup from position to the point there in sweepline 1.
points2: Lookup from position to the point there in sweepline 2.
We will now try to sort in time O(n).
We initialize the following data structures:
upcoming: Priority queue of points that could be next.
is_seen: Bitmap from position to whether we've added the point to upcoming.
answer: A vector/array/whatever your language calls it that will hold the answer at the end.
max_j: The index of the farthest position in sweepline 2 whose point we have added to upcoming. Starts at -1.
And now we do the following.
import heapq

for i in range(n):
    while is_seen[i] == 0:
        # Find another possible point, in sweepline-2 order.
        max_j += 1
        point = points2[max_j]
        # Priority = where the query sweepline encounters the point
        # (key(point) stands for its coordinate along the query direction).
        heapq.heappush(upcoming, (key(point), point))
        is_seen[pos1[point]] = 1
    # upcoming has points1[i] and every point that can come before it.
    answer.append(heapq.heappop(upcoming)[1])
Waving my hands vigorously, every point is put into upcoming once, and taken out once. On average, upcoming has O(1) points in it, so all operations average out to O(1). Since there are n points, the total time is O(n).
OK, how do we set up our sweeplines? Since we only care about average performance, we cheat: we randomly choose O(n) pairs of points, and each pair defines a sweepline (the critical direction at which the two points swap order). We sort those sweeplines in O(n log(n)).
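A sketch of that preprocessing step in Python (names are mine): a pair of points swaps order exactly when the sweepline direction is perpendicular to their difference vector, so each random pair yields one critical angle, reduced mod pi because opposite directions give reversed orders:

import math, random

def random_sweeplines(points, m):
    n = len(points)
    sweeplines = []
    for _ in range(m):
        a, b = random.sample(range(n), 2)          # a random pair of distinct points
        (ax, ay), (bx, by) = points[a], points[b]
        # Direction perpendicular to (b - a): the angle at which a and b swap order.
        angle = math.atan2(bx - ax, -(by - ay)) % math.pi
        sweeplines.append((angle, (a, b)))
    sweeplines.sort()                              # O(m log m) = O(n log n) for m = O(n)
    return sweeplines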
Now we have to produce the sorted point order for each of those O(n) sweeplines. How do we do that?
Well, we can sort the points along a fixed number of them by any method we want. Let's pick 4 evenly spaced sweeplines and do that. (We actually only need to do the calculation twice: we pick 2 pairs of points, take the sweepline where the first pair crosses and the sweepline where the second pair crosses, and the other 2 sweeplines are at 180 degrees from the first 2 and are therefore just the reversed orders.) After that, we can use the algorithm above to sort a sweepline lying between 2 already-sorted ones, and proceed by bisection to smaller and smaller intervals.
Now, of course, the sweeplines will not be as close together as they were above. But note that if we expect the points to agree to within an average of O(f(n)) places between neighboring sweeplines, then the heap will have O(f(n)) elements in it, operations on it will take O(log(f(n))) time, and so we get the intermediate sweepline in O(n log(f(n))). How long is the whole calculation?
Well, we have kind of a tree of calculations to do. Let's divide the sweeplines by what level of the bisection they sit on, then group the levels. From the top, the groups will be:
1 .. n/log(n)
n/log(n) .. n/log(log(n))
n/log(log(n)) .. n/log(log(log(n)))
...and so on.
In each group we have O(n / log^(k)(n)) sweeplines to calculate, where log^(k) denotes the logarithm iterated k times, and each sweepline takes O(n log^(k)(n)) time to calculate. Therefore each group takes O(n^2). The number of groups is the iterated logarithm, log*(n), so the total preprocessing time is O(n^2 log*(n)).
The question is as the title says.
I am trying to figure out if there is a way of finding a peak element in a 2D array in O(n) time, where n is the length of each side of the array, i.e. there are n^2 elements in total.
By definition, a "peak" in a 2D array is an element that is >= all of its neighbours (that is, the elements in the up, down, left and right slots).
I read the course notes at:
http://courses.csail.mit.edu/6.006/spring11/lectures/lec02.pdf
and understood how to do it in O(n log n), but I don't quite grasp how to do it in O(n).
Could anybody come up with, or explain, how this problem can be solved in O(n)?
Edit: n is the length of each side of the array, i.e. there are n^2 elements in total.
The second algorithm given in the linked PDF is O(n). A "window" is defined to collectively be the boundary (i.e. all four outer edges), middle column and middle row of the current sub-square. Here's a summary of the algorithm:
Find maximum value in current window
Return it if it's a peak
Otherwise, find the larger neighbor of this maximum and recurse in the corresponding quadrant.
As described in the slides, the time complexity satisfies T(n) = T(n/2) + cn (the T(n/2) term is due to the edge length being halved on each recursive step; the cn term is the time required to find the maximum in the current window). The recursion telescopes to cn(1 + 1/2 + 1/4 + ...) <= 2cn, hence the complexity is O(n).
The correctness of this algorithm is based on several observations that are listed on one of the slides:
If you enter a quadrant, it contains a peak of the overall array
This is basically a generalization of the same 1D argument. You only enter a quadrant when it contains some element greater than all elements on the border. So, either that element will be a peak, or you can keep "climbing up" until you find a peak somewhere in the quadrant (a sketch of this greedy climb follows these observations).
Maximum element of window never decreases as we descend in recursion
The next window in the recursion always contains the maximum element of the current window, so this is true.
Peak in visited quadrant is also peak in overall array
This follows from the definition of peak, since it only depends on immediate neighbors.
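Not the O(n) algorithm itself, but here is a small Python sketch of the greedy climb used in the first observation: from any cell, repeatedly step to a strictly larger neighbour until none exists (worst case it visits O(n^2) cells). Starting it from an element larger than everything on a quadrant's border can never leave that quadrant, which is why the quadrant must contain a peak of the overall array (names are mine):

def climb_to_peak(grid, r, c):
    rows, cols = len(grid), len(grid[0])
    while True:
        best_r, best_c = r, c
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] > grid[best_r][best_c]:
                best_r, best_c = nr, nc            # strictly larger neighbour: keep climbing
        if (best_r, best_c) == (r, c):
            return r, c                            # no larger neighbour: (r, c) is a peak
        r, c = best_r, best_c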
Disclaimer: there are many questions about this, but I didn't find any with the requirement of constant memory.
Hamming numbers are the numbers 2^i*3^j*5^k, where i, j, k are natural numbers.
Is it possible to generate the Nth Hamming number in O(N) time and O(1) (constant) memory? By generate I mean exactly a generator, i.e. you can only output the result and not read the previously generated numbers (in that case the memory would not be constant). But you can save some constant number of them.
The best constant-memory algorithm I see is no better than O(N log N), for example one based on a priority queue. But is there a mathematical proof that it is impossible to construct an algorithm with O(N) time?
The first thing to consider here is the direct slice enumeration algorithm, which can be seen e.g. in this SO answer: enumerate the triples (k,j,i) in the vicinity of a given logarithm value (base 2) of a sequence member, so that target - delta < k*log2_5 + j*log2_3 + i < target + delta, progressively calculating the cumulative logarithm while picking j and k so that i is directly known.
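A rough sketch of that slice enumeration in Python (the names and the delta handling are mine; the actual sequence member for a triple is 2**i * 3**j * 5**k, and the slice still needs sorting by log value, as noted in the edit below):

import math

LOG2_3 = math.log2(3)
LOG2_5 = math.log2(5)

def slice_triples(target, delta):
    # Triples (i, j, k) with target - delta < k*LOG2_5 + j*LOG2_3 + i < target + delta,
    # returned sorted by that log2 value.
    out = []
    k = 0
    while k * LOG2_5 < target + delta:
        j = 0
        while k * LOG2_5 + j * LOG2_3 < target + delta:
            rest = target - k * LOG2_5 - j * LOG2_3
            # i is known directly from the remaining "budget" in log space.
            lo = max(0, math.ceil(rest - delta))
            hi = math.floor(rest + delta)
            for i in range(lo, hi + 1):
                val = k * LOG2_5 + j * LOG2_3 + i
                if target - delta < val < target + delta:
                    out.append((val, (i, j, k)))
            j += 1
        k += 1
    out.sort()
    return out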
It is thus an N^(2/3)-time algo producing N^(2/3)-wide slices of the sequence at a time (with k*log2_5 + j*log2_3 + i close to the target value, so these triples form the crust of the tetrahedron filled with the Hamming sequence triples [1]), meaning O(1) time per produced number, thus producing N sequence members in O(N) amortized time and O(N^(2/3)) space. That's no improvement over the baseline Dijkstra's algorithm [2], which has the same complexities, even non-amortized and with better constant factors.
To make it O(1)-space, the crust width will need to be narrowed as we progress along the sequence. But the narrower the crust, the more misses there will be when enumerating its triples -- and this is pretty much the proof you asked for. A constant slice size means O(N^(2/3)) work per O(1)-sized slice, for an overall O(N^(5/3)) amortized time, O(1) space algorithm.
These are the two end points of this spectrum: from N^1 time, N^(2/3) space to N^0 space, N^(5/3) time, amortized.
[1] Here's the image from Wikipedia, with logarithmic vertical scale:
This is essentially a tetrahedron of Hamming sequence triples (i,j,k) stretched in space as (i*log2, j*log3, k*log5), seen from the side. The image is a bit askew, if it's to be a true 3D picture.
edit: [2] It seems I forgot that the slices have to be sorted, as they are produced out of order by the j,k enumerations. This changes the best complexity for producing the sequence's N numbers in order via the slice algorithm to O(N^(2/3) log N) time and O(N^(2/3)) space, and makes Dijkstra's algorithm the winner there. It doesn't change the upper bound of O(N^(5/3)) time for the O(1)-sized slices, though.
For my algorithm design class homework, I was given this brain teaser:
Given a list of n distinct positive integers, partition the list into two sublists of size n/2 such that the difference between the sums of the sublists is maximized.
Assume that n is even and determine the time complexity.
At first glance, the solution seems to be:
sort the list via mergesort
select the element at position n/2
add all elements greater than it to the high array
add all elements lower than it to the low array
This would have a time complexity of O((n log n) + n).
Are there any better algorithm choices for this problem?
Since you can calculate the median in O(n) time, you can also solve this problem in O(n) time: calculate the median and, using it as a threshold, create the high array and the low array.
See http://en.wikipedia.org/wiki/Median_search on calculating the median in O(n) time.
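A short sketch of that split in Python; `kth_smallest(nums, k)` is assumed to be a linear-time selection routine (quickselect / median of medians, as in the links) returning the k-th smallest element with k 0-indexed, and nums is assumed to hold an even number of distinct values:

def max_difference_split(nums, kth_smallest):
    n = len(nums)
    threshold = kth_smallest(nums, n // 2 - 1)      # largest element of the low half
    low = [x for x in nums if x <= threshold]
    high = [x for x in nums if x > threshold]
    return low, high, sum(high) - sum(low)          # maximal difference of sublist sums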
Try
http://en.wikipedia.org/wiki/Selection_algorithm#Linear_general_selection_algorithm_-_Median_of_Medians_algorithm
What you're effectively doing is finding the median. The trick is that once you've found it, you don't need to sort the first n/2 and the last n/2 at all.