I'm trying to solve this problem from my textbook:
When a frisbee is thrown into a bucket, it gets stuck where the inner diameter of the bucket is smaller than the outer diameter of the frisbee. Analyse where in the bucket the frisbees get stuck, given the dimensions of the bucket and the radii of the frisbees. The bucket is always empty when a new frisbee is thrown, so they don't pile up in the bucket.
The bucket is symmetric along the y-axis. The picture shows what the cross section of the bucket can look like. The walls of the bucket are line segments that always connect at the origin.
Input: (x_1,y_1),...,(x_m,y_m),r_1,...,r_n, all numbers are positive, m≤n and y_1 < y_2 < ...<y_m.
(x_i,y_i) are the coordinates of the wall of the bucket, from the bottom to the top. r_i is the radius of frisbee i.
Output: h_1,...,h_n, where h_1 ≤ h_2 ≤ ... ≤ h_n
These are the different heights (y-coordinates) where the frisbees get stuck.
The algorithm should be as efficient as possible.
Thanks in advance!
A lot of great algorithms have complexities that are dominated by an initial sort, so that really shouldn't set off any alarm bells for you.
Since the problem statement indicates that there are more frisbees than bucket segments, though, the complexity of O(n log n + m) that you achieve isn't quite optimal.
Note that a frisbee can only get stuck on the segment above a point that has a smaller x than all the points above it. Start by making a list of these points in y order, which you can do easily in O(m) time.
Because every point in the list has a smaller x than all the points after it, the list is monotonically increasing in x. For each frisbee, therefore, you can do a binary search in the list to find the last point with x < r. This takes O(log m) per frisbee for a total time of O(n log m + m), which is strictly smaller than O(n log n + m).
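In Python, a minimal sketch of this approach might look as follows (the code, the names, the (0, 0) sentinel for the bucket bottom, and the linear interpolation that turns the found segment into a height are my own additions, not from the original post):

```python
from bisect import bisect_left

def stuck_heights(wall, radii):
    """wall: [(x_1, y_1), ..., (x_m, y_m)] from bottom to top; radii: frisbee radii.
    Keep only the 'bottleneck' points whose x is smaller than every x above them;
    that list is increasing in x, so each frisbee needs a single binary search."""
    # Scan top-down, recording each bottleneck and the wall point directly above it.
    bottlenecks = []
    narrowest_above = float('inf')
    for i in range(len(wall) - 1, -1, -1):
        x, y = wall[i]
        if x < narrowest_above:
            above = wall[i + 1] if i + 1 < len(wall) else (float('inf'), y)
            bottlenecks.append(((x, y), above))
            narrowest_above = x
    bottlenecks.append(((0.0, 0.0), wall[0]))  # the walls meet at the origin
    bottlenecks.reverse()                      # now increasing in both x and y

    xs = [p[0][0] for p in bottlenecks]
    heights = []
    for r in radii:
        i = bisect_left(xs, r) - 1             # last bottleneck with x < r
        (x_lo, y_lo), (x_hi, y_hi) = bottlenecks[i]
        if x_hi == float('inf'):               # wider than the rim: rests on top
            heights.append(y_lo)
            continue
        # The wall segment just above this bottleneck widens back to at least r;
        # interpolate the height at which the inner radius equals r.
        t = (r - x_lo) / (x_hi - x_lo)
        heights.append(y_lo + t * (y_hi - y_lo))
    return heights                             # in frisbee-input order
```

Note that this returns the heights in frisbee-input order; producing the sorted output h_1 ≤ ... ≤ h_n is the separate sorting question discussed below.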
The worst case is when every frisbee lands on a different segment, and that's possible if m = n and if, for all j and i such that 0 < j < i, x_j < x_i. From there, the problem amounts to sorting the list of frisbees. So the complexity of the solution is indeed dominated by the sorting.
However, the complexity of sorting isn't necessarily O(n log n). If the maximum size of a frisbee has a limit l, and if l < n, then using radix sort will lower the complexity to O(n log l) (where log l is the number of digits of l or "key length").
Suppose that you are given a sequence of n elements to sort. The input sequence consists of n/k subsequences, each containing k elements. The elements in a given subsequence are all smaller than the elements in the succeeding subsequence and larger than the elements in the preceding subsequence.
So is there an O(n log k) method to rearrange a disordered array into an array as described above? Thanks!
A different formulation of the question
The question can be thought of like this. You have n balls of different sizes. You want to organize these into n/k buckets such that each bucket contains exactly k balls. Furthermore, these buckets are placed in a line in which the leftmost bucket contains the k smallest balls. The second bucket from the left contains the next k balls, the ones that would be the smallest if we were to remove the leftmost bucket. The rightmost bucket contains the k largest balls.
But within each bucket you have no order. If you want the largest ball you know which bucket you must begin searching in, but you still need to search around in it.
I will be using the term bucket instead of subsequence, since subsequence makes me think about ordering, which is not important here; what matters is which bucket an element belongs to, so bucket is easier for me.
A problem with the proposed complexity of the imagined solution
You are stating that k is the length (or size) of each bucket. It therefore naturally can be between 1 and n.
You then ask whether an O(n log k) solution exists that can organize the elements in this manner. There is a problem with this proposed complexity that is easy to see when we consider the two extremes, k = n and k = 1.
k=n. Meaning we only have one large bucket. This is trivial since no action is needed. But your proposed complexity was O(n log k) = O(n log n) when k = n.
Let us consider k=1 too because it has a similar, but inverse, issue.
k=1. Each bucket contains 1 ball, and we need n buckets. This is the same as asking us to fully sort the whole sequence, which for comparison sorting is at best O(n log n). But your proposed complexity was O(n log k) = O(n log 1) = O(n * 0) = O(0). Remember that log 1 = 0. It seems that your proposed complexity does not fit the problem at all.
We can pause here and say: no, you cannot do what you wish in O(n log k), because it does not make sense that the problem would become harder when you decrease the number of buckets. More importantly, it cannot become easier as you increase the number of buckets.
If I were tasked to do this sorting manually, I would say it is trivial to sort into one bucket. Two is easy. Three would be harder than two. If you have n buckets, then that is as hard as it can get!
Answer for an altered complexity
It is however interesting to consider what would happen if we were to fix your proposed complexity so that we instead ask the following. Is there a way to sort into these buckets in O(n log b) where b is the number of buckets (b = n / k)?
The extreme cases here seem to make sense.
b=1. One bucket. No sorting needed. O(n log b) = O(n log 1) = O(0). (technically this should still maybe be O(1))
b=n. n buckets. Full sort needed. O(n log b) = O(n log n).
So a solution seems possible. But this is outside the scope of the question now. I suspect, however, that selection algorithms such as quickselect are the way to go.
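For what it's worth, here is a minimal sketch of that idea (my own code, not part of the question or answer; np.partition stands in for a linear-time selection step, and all names are made up). It splits the array at the bucket boundary closest to the middle and recurses on both halves, for expected O(n log b) time with b = n/k buckets:

```python
import numpy as np

def bucketize(a, lo, hi, k):
    """Rearrange a[lo:hi] (length a multiple of k) into consecutive size-k buckets:
    every element of a bucket is <= every element of the next bucket, while the
    inside of each bucket stays unsorted.  Splitting at the boundary nearest the
    middle costs expected O(n) per recursion level over O(log b) levels."""
    if hi - lo <= k:
        return
    mid = lo + ((hi - lo) // k // 2) * k           # bucket boundary near the middle
    a[lo:hi] = np.partition(a[lo:hi], mid - lo)    # smaller part left, larger right
    bucketize(a, lo, mid, k)
    bucketize(a, mid, hi, k)

data = np.random.permutation(100)
bucketize(data, 0, len(data), 10)
# Every length-10 block now holds exactly the right 10 values, internally unsorted.
assert all(sorted(data[i:i + 10]) == sorted(data)[i:i + 10] for i in range(0, 100, 10))
```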
Suppose I have N points in the interval [0,1], and I have already divided this unit interval into n sub-intervals, say [0,x1), [x1,x2), ..., [xn-2,xn-1), [xn-1,1]. Then I need to determine which sub-interval each of these N points belongs to. What is the best algorithm for completing this job? The sub-intervals are not evenly spaced, but they are known. N is on the order of 1 million; n is on the order of 1,000.
If the points are not sorted, sort them by coordinate.
Then merge the point list with the interval list (like the merge step in MergeSort).
The complexity is O(N log N + N + n) (or O(N + n) if both lists are already sorted).
Compare this with Mukul Varshney's approach, with complexity O(N log n), and choose the better variant for your case.
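A rough sketch of that merge in Python (the function and variable names are mine; index 0 means the first sub-interval [0, x1)):

```python
def assign_by_merge(points, bounds):
    """points: the N coordinates; bounds: the interior boundaries x1 < ... < x(n-1).
    Sort the points once, then sweep both sorted lists together (the merge step).
    Returns (sorted points, sub-interval index of each)."""
    pts = sorted(points)              # O(N log N); skip if already sorted
    idx = 0                           # current position in bounds
    assignment = []
    for p in pts:                     # a single O(N + n) sweep
        while idx < len(bounds) and p >= bounds[idx]:
            idx += 1                  # passed another boundary
        assignment.append(idx)        # p lies in the idx-th sub-interval
    return pts, assignment
```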
Assuming the lower bounds of the intervals, i.e. 0, x1, x2, x3, ..., are in order, store them in an array, then use binary search to locate the sub-interval that a given point belongs to.
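In Python terms, a sketch of this answer could look like this (bisect_right matches the half-open [lower, upper) sub-intervals in the question):

```python
from bisect import bisect_right

def assign_by_bisect(points, bounds):
    """bounds: the interior boundaries x1 < x2 < ... < x(n-1), in increasing order.
    bisect_right(bounds, p) counts how many boundaries are <= p, which is exactly
    the index of the half-open sub-interval containing p.  O(N log n) overall."""
    return [bisect_right(bounds, p) for p in points]
```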
This is an interesting question I have found on the web. Given an array containing n numbers (with no information about them), we should pre-process the array in linear time so that we can return the k smallest elements in O(k) time, when we are given a number 1 <= k <= n
I have been discussing this problem with some friends but no one could find a solution; any help would be appreciated!
For the pre-processing step, we will use the partition-based selection several times on the same data set.
Find the n/2-th number with that algorithm; now the data set is partitioned into two halves, lower and upper. On the lower half, again find the midpoint. On its lower partition, do the same thing, and so on. Overall this is O(n) + O(n/2) + O(n/4) + ... = O(n).
Now, when you have to return the k smallest elements, search for the largest partition boundary x < k. Everything below it can be returned, and from the next partition you have to return k - x numbers. Since the next partition's size is O(k), running another selection algorithm there for the (k - x)-th number will return the rest.
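A minimal sketch of this preprocessing and query in Python (my own code; np.partition plays the role of the linear-time selection, and the names are made up):

```python
import numpy as np
from bisect import bisect_right

def preprocess(values):
    """Partition at n/2, then at n/4 inside the lower half, and so on.
    Expected cost O(n) + O(n/2) + O(n/4) + ... = O(n).  Afterwards, for every
    returned boundary b, a[:b] holds the b smallest values (in some order)."""
    a = np.array(values)
    boundaries = []
    upper = len(a)
    while upper > 1:
        mid = upper // 2
        a[:upper] = np.partition(a[:upper], mid)   # selection inside the prefix only
        boundaries.append(mid)
        upper = mid
    return a, boundaries[::-1]                     # boundaries in ascending order

def k_smallest(a, boundaries, k):
    """Return the k smallest elements in expected O(k) time."""
    i = bisect_right(boundaries, k)                # boundaries[i-1] <= k
    x = boundaries[i - 1]                          # a[:x] is already settled
    nxt = boundaries[i] if i < len(boundaries) else len(a)
    if x < k < nxt:                                # block size nxt - x is O(k)
        a[x:nxt] = np.partition(a[x:nxt], k - x)
    return a[:k]

a, bounds = preprocess(np.random.permutation(1000))
print(sorted(k_smallest(a, bounds, 7)))            # the 7 smallest values
```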
We can find the median of a list and partition around it in linear time.
Then we can use the following algorithm: maintain a buffer of size 2k.
Every time the buffer gets full, we find the median and partition around it, keeping only the lowest k elements.
This requires n/k find-median-and-partition steps, each of which takes O(k) time with a traditional quickselect, so this approach requires only O(n) time.
Additionally, if you need the output sorted, that adds an extra O(k log k) time. In total, this approach requires only O(n + k log k) time and O(k) space.
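Roughly, in Python (a sketch only; heapq.nsmallest stands in for the O(k) median-and-partition step and actually costs O(k log k) per shrink, so as written this runs in O(n log k) rather than the O(n) claimed above):

```python
import heapq

def k_smallest_buffered(values, k):
    """Keep a buffer of at most 2k candidates; whenever it fills up, shrink it
    back to the k smallest seen so far.  There are about n/k shrinks in total."""
    buf = []
    for v in values:
        buf.append(v)
        if len(buf) >= 2 * k:
            buf = heapq.nsmallest(k, buf)    # the 'partition around the median' step
    return sorted(heapq.nsmallest(k, buf))   # final shrink, plus the optional sort
```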
The background
According to Wikipedia and other sources I've found, building a binary heap of n elements by starting with an empty binary heap and inserting the n elements into it is O(n log n), since binary heap insertion is O(log n) and you're doing it n times. Let's call this the insertion algorithm.
It also presents an alternate approach in which you sink/trickle down/percolate down/cascade down/heapify down/bubble down the first/top half of the elements, starting with the middle element and ending with the first element, and that this is O(n), a much better complexity. The proof of this complexity rests on the insight that the sink complexity for each element depends on its height in the binary heap: if it's near the bottom, it will be small, maybe zero; if it's near the top, it can be large, maybe log n. The point is that the complexity isn't log n for every element sunk in this process, so the overall complexity is much less than O(n log n), and is in fact O(n). Let's call this the sink algorithm.
The question
Why isn't the complexity for the insertion algorithm the same as that of the sink algorithm, for the same reasons?
Consider the actual work done for the first few elements in the insertion algorithm. The cost of the first insertion isn't log n, it's zero, because the binary heap is empty! The cost of the second insertion is at worst one swap, and the cost of the fourth is at worst two swaps, and so on. The actual complexity of inserting an element depends on the current depth of the binary heap, so the complexity for most insertions is less than O(log n). The insertion cost doesn't even technically reach O(log n) until after all n elements have been inserted [it's O(log (n - 1)) for the last element]!
These savings sound just like the savings gotten by the sink algorithm, so why aren't they counted the same for both algorithms?
Actually, when n = 2^x - 1 (the lowest level is full), n/2 elements may require log(n) swaps in the insertion algorithm (to become leaf nodes). So you'll need (n/2)(log(n)) swaps for the leaves alone, which already makes it O(n log n).
In the other algorithm, only one element needs log(n) swaps, 2 need log(n)-1 swaps, 4 need log(n)-2 swaps, etc. Wikipedia shows a proof that it results in a series convergent to a constant in place of a logarithm.
The intuition is that the sink algorithm moves only a few things (those in the small layers at the top of the heap/tree) distance log(n), while the insertion algorithm moves many things (those in the big layers at the bottom of the heap) distance log(n).
The intuition for why the sink algorithm can get away with this is that the insertion algorithm is also meeting an additional (nice) requirement: if we stop the insertion at any point, the partially formed heap has to be (and is) a valid heap. For the sink algorithm, all we get is a weird malformed bottom portion of a heap. Sort of like a pine tree with the top cut off.
Also, summations and blah blah. It's best to think asymptotically about what happens when inserting, say, the last half of the elements of an arbitrarily large set of size n.
While it's true that log(n-1) is less than log(n), it's not smaller by enough to make a difference.
Mathematically: The worst-case cost of inserting the i-th element is ceil(log i). Therefore the worst-case cost of inserting elements 1 through n is sum(i = 1..n, ceil(log i)) >= sum(i = 1..n, log i) = log 1 + log 2 + ... + log n = log(1 × 2 × ... × n) = log n! = Θ(n log n).
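To make the difference concrete, here is a small experiment (my own code) that counts swaps for both strategies on a decreasing sequence, which is the worst case for repeated insertion into a min-heap:

```python
def build_by_insertion(values):
    """Insert elements one at a time into a min-heap, sifting each new element up;
    return the number of swaps performed."""
    heap, swaps = [], 0
    for v in values:
        heap.append(v)
        i = len(heap) - 1
        while i > 0 and heap[(i - 1) // 2] > heap[i]:
            heap[i], heap[(i - 1) // 2] = heap[(i - 1) // 2], heap[i]
            i = (i - 1) // 2
            swaps += 1
    return swaps

def build_by_sinking(values):
    """Bottom-up heapify: sift down from the middle element to the first;
    return the number of swaps performed."""
    heap, swaps = list(values), 0
    n = len(heap)
    for i in range(n // 2 - 1, -1, -1):
        j = i
        while True:
            smallest = j
            for c in (2 * j + 1, 2 * j + 2):          # children of j
                if c < n and heap[c] < heap[smallest]:
                    smallest = c
            if smallest == j:
                break
            heap[j], heap[smallest] = heap[smallest], heap[j]
            swaps += 1
            j = smallest
    return swaps

n = 2 ** 15 - 1                    # a full lowest level, as in the answer above
data = list(range(n, 0, -1))       # decreasing input: worst case for insertion
print("sift-up swaps:  ", build_by_insertion(data))   # grows like n log n
print("sift-down swaps:", build_by_sinking(data))     # stays on the order of n
```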
Ran into the same problem yesterday. I tried coming up with some form of proof to satisfy myself. Does this make any sense?
If you start inserting from the bottom, the leaves will have constant-time insertion: you just copy them into the array.
The worst case running time for a level above the leaves is:
k * (n/2^h) * h
where h is the height (leaves being 0, the top being log(n)) and k is a constant (just for good measure). So n/2^h is the number of nodes per level, and h is the MAXIMUM number of 'sinking' operations per insert.
There are log(n) levels,
Hence, the total running time will be
Sum for h from 1 to log(n): n * k * (h/2^h)
which is k * n * SUM h=[1,log(n)]: (h/2^h)
The sum is a simple arithmetico-geometric progression, which comes out to (at most) 2.
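For reference, the value 2 comes from the standard telescoping argument, bounding the finite sum by the infinite series:

$$
S = \sum_{h \ge 1} \frac{h}{2^h}, \qquad
S = 2S - S = \sum_{h \ge 1} \frac{h}{2^{h-1}} - \sum_{h \ge 1} \frac{h}{2^h}
  = 1 + \sum_{h \ge 1} \frac{(h+1) - h}{2^h} = 1 + \sum_{h \ge 1} \frac{1}{2^h} = 2.
$$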
So you get a running time of k * n * 2, which is O(n).
The running time per level isn't strictly what I said it was, but it is strictly less than that. Any pitfalls?
This is not a homework problem. It's an interview question. I am not able to come up with a good solution for this problem.
Problem:
Given an n*n grid (bottom left (0,0), top right (n,n)) and n rectangles with sides parallel to the coordinate axes. The bottom-left and top-right coordinates of the n rectangles are provided in the form (x1,y1)(x1',y1') .... (xn,yn)(xn',yn'). There are M queries, each asking for the number of rectangles that cover a rectangle with coordinates (a,b)(c,d). How do I solve this efficiently? Is there a way to precompute for all coordinate positions so that I can return the answer in O(1)?
Constraints:
1 <= n <= 1000
It is straightforward to create, in O(n^4) space and O(n^5) time, a data structure that provides O(1) lookups. If M exceeds O(n^2) it might be worthwhile to do so. It also is straightforward to create, in O(n^2) space and O(n^3) time, a data structure that provides lookups in O(n) time. If M is O(n^2), that may be a better tradeoff; ie, take O(n^3) precomputation time and O(n^3) time for O(n^2) lookups at O(n) each.
For the precomputation, make an n by n array of lists. Let L_pq denote the list for cell p,q of the n by n grid. Each list contains up to n rectangles, with lists all ordered by the same relation (ie if Ri < Rj in one list, Ri < Rj in every list that pair is in). The set of lists takes time O(n^3) to compute, taken either as "for each C of n^2 cells, for each R of n rectangles, if C in R add R to L_C" or as "for each R of n rectangles, for each cell C in R, add R to L_C".
Given a query (a,b,c,d), in time O(n) count the size of the intersection of lists L_ab and L_cd. For O(1) lookups, first do the precomputation mentioned above, and then for each a,b, for each c > a and d > b, do the O(n) query mentioned above and save the result in P[a,b,c,d], where P is an appropriately large array of integers.
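A minimal sketch of that precomputation and the O(n)-per-query lookup (my own code; it uses grid points 0..n as the "cells", and the rectangle index as the common ordering of all lists):

```python
def build_cell_lists(n, rects):
    """rects: list of (x1, y1, x2, y2) giving bottom-left and top-right corners.
    cell_lists[x][y] ends up holding, in increasing index order, every rectangle
    that contains the grid point (x, y).  O(n^3) time, O(n^2) lists."""
    cell_lists = [[[] for _ in range(n + 1)] for _ in range(n + 1)]
    for idx, (x1, y1, x2, y2) in enumerate(rects):
        for x in range(x1, x2 + 1):
            for y in range(y1, y2 + 1):
                cell_lists[x][y].append(idx)      # appended in index order
    return cell_lists

def count_covering(cell_lists, a, b, c, d):
    """A rectangle covers the query (a,b)-(c,d) exactly when it contains both of
    the query's corners, so intersect the two index-sorted lists in O(n)."""
    A, B = cell_lists[a][b], cell_lists[c][d]
    i = j = count = 0
    while i < len(A) and j < len(B):
        if A[i] == B[j]:
            count += 1
            i += 1
            j += 1
        elif A[i] < B[j]:
            i += 1
        else:
            j += 1
    return count
```

Running count_covering for every pair of corners and storing the results in a four-dimensional table gives the O(n^4)-space, O(1)-lookup variant described above.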
It is likely that an O(n^3) or perhaps O(n^2 · log n) precomputation method exists using either segment trees, range trees, or interval trees that can do queries in O(log n) time.