Minimum interval containing all the objects - algorithm

Given a set of objects, each of which is placed at several locations on the natural number line, find the smallest interval [a, b] that contains at least one location of every object.
Example: Consider 3 objects A, B, C
A is placed at 1, 5, 7
B is placed at 2, 4, 6
C is placed at 4, 8, 9
The smallest interval that encompasses all the three objects is [4, 5].
I can only think of an O(S^2) solution, where S is the length of the minimal interval containing all the object locations, i.e., [1, 9].
Is there a better way to do this?
PS: Note that multiple objects can be placed at the same location.

Sort all the data points in ascending order (nlogn time).
Traverse these data points from the left.
Keep track of the following:
1. For each type of object, maintain an entry of the coordinate of last object found (maybe through a hashmap for fast operation).
2. Minimum interval length found till now.
3. The coordinate of the earliest element in the list. This is to keep track of the start of current interval.
Whenever you encounter an object,
1. Update its entry in the maintained list.
2. Check whether the coordinate of earliest element has been updated. If so, then calculate the new interval length and update the minimum interval length if the new one is smaller.
You will first need to ascertain that you have encountered all types of objects to calculate the first valid minimum interval length. You can do that by a counter.
If the number of different types of elements is bounded and small, then the order of complexity is O(nlogn) where n is the total number of data points.
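A minimal sketch of the steps above, assuming the input is a dict mapping each object name to its list of locations (the function name and input shape are my own, not from the question). It recomputes the earliest last-seen coordinate with min() on every step, which is only cheap when the number of object types is small, as noted above.

```python
def smallest_interval_last_seen(placements):
    """Return (a, b) for the smallest interval holding every object, or None."""
    # Flatten to (coordinate, object) pairs and sort by coordinate.
    points = sorted((pos, name) for name, locs in placements.items() for pos in locs)
    last_seen = {}   # object name -> coordinate of its most recent occurrence
    best = None
    for pos, name in points:
        last_seen[name] = pos
        # Only once every object type has been seen is the interval valid.
        if len(last_seen) == len(placements):
            start = min(last_seen.values())
            if best is None or pos - start < best[1] - best[0]:
                best = (start, pos)
    return best


example = {"A": [1, 5, 7], "B": [2, 4, 6], "C": [4, 8, 9]}
print(smallest_interval_last_seen(example))
```

For the example in the question this prints (4, 5).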

You can do this in O(N) using 2 indices while going through the list. (Let's call them left and right.)
You start with both of them at position 1, and then increment right until [left,right] has all the elements. You know that this is the minimum interval starting at left that has all the elements. Now increment left. Now increment right again until you have all the elements. (NOTE: many times you don't even have to increment.) Take the minimum over all the complete intervals and you have your answer.
This works because if you know [left,right] is the minimum interval starting at left, the interval starting at left+1 will have its right >= the last right.
This is O(N) because you add the elements of a location once, and delete them once.
You'll need to use a hash to count the unique elements.
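Here is a rough sketch of the two-index idea. One assumption to flag: instead of walking every position on the number line, it walks the sorted list of (location, object) pairs, so the scan is linear in the number of placements after an initial sort. The function name and the dict input are hypothetical.

```python
from collections import Counter


def smallest_interval_two_indices(placements):
    """Return (a, b) for the smallest interval holding every object, or None."""
    points = sorted((pos, name) for name, locs in placements.items() for pos in locs)
    counts = Counter()   # how many times each object occurs inside the window
    covered = 0          # number of distinct objects inside the window
    best = None
    left = 0
    for pos, name in points:             # this loop advances the right index
        if counts[name] == 0:
            covered += 1
        counts[name] += 1
        # While the window is complete, record it and advance the left index.
        while covered == len(placements):
            left_pos, left_name = points[left]
            if best is None or pos - left_pos < best[1] - best[0]:
                best = (left_pos, pos)
            counts[left_name] -= 1
            if counts[left_name] == 0:
                covered -= 1
            left += 1
    return best


example = {"A": [1, 5, 7], "B": [2, 4, 6], "C": [4, 8, 9]}
print(smallest_interval_two_indices(example))
```

Each pair enters the window once and leaves it at most once, which is where the linear bound for the scan comes from.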

Related

Create a binary tree in O(n)

I have a sequence of n numbers and I want to create a data structure to answer the following question:
sequence = [5, 7, 4, 24, 8, 3, 12, 34]
If I ask for min(2,5), the answer is 3, because a2=7, a3=4, a4=24, a5=8. So min(i,j) returns the position of the minimum number between positions i and j.
I thought that a good data structure for this would be a complete binary tree with the sequence numbers stored at the leaves. But how can I build this structure in O(n)?
All you need is a Segment Tree with range minimum query. Building time is O(n), because the tree has no more than 2 * n nodes and each internal node is computed in constant time from its two children.
If you need to find not only the minimum value but also its position, then each vertex has to store not only the minimum but also where it was reached. How to update such a structure seems clear: when you recalculate the minimum in the parent, you look at which child it came from and take the corresponding position of the minimum from that child. For leaves, the positions are equal to the positions of the leaves themselves.
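To make that concrete, here is a small sketch of an array-based (bottom-up) segment tree that stores (value, position) pairs, so the minimum carries its position with it. The function names are my own, positions are 1-based to match the question, and ties go to the smaller position because tuples compare element by element.

```python
def build_min_tree(values):
    """Build the tree in O(n): leaves at indices n..2n-1, parents above them."""
    n = len(values)
    tree = [None] * (2 * n)
    for i, v in enumerate(values):
        tree[n + i] = (v, i + 1)          # leaf: (value, 1-based position)
    for node in range(n - 1, 0, -1):      # each parent takes the smaller child
        tree[node] = min(tree[2 * node], tree[2 * node + 1])
    return tree


def min_position(tree, i, j):
    """Position of the minimum among a_i..a_j (1-based, inclusive), O(log n)."""
    n = len(tree) // 2
    lo, hi = i - 1 + n, j + n             # half-open leaf range [lo, hi)
    best = (float("inf"), -1)
    while lo < hi:
        if lo & 1:                        # lo is a right child: take it, step right
            best = min(best, tree[lo])
            lo += 1
        if hi & 1:                        # hi is a right boundary: step left, take it
            hi -= 1
            best = min(best, tree[hi])
        lo //= 2
        hi //= 2
    return best[1]


tree = build_min_tree([5, 7, 4, 24, 8, 3, 12, 34])
print(min_position(tree, 2, 5))
```

With the sequence from the question, min(2,5) comes out as position 3, matching the expected answer.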

Quick sorting algorithm states using middle element as pivot

I need help understanding exactly how the quick sort algorithm works. I've been watching teaching videos and still fail to really grasp it completely.
I have an unsorted list: 1, 2, 9, 5, 6, 4, 7, 8, 3
And I have to quick sort it using 6 as the pivot.
I need to see the state of the list after each partition procedure.
My main problem is understanding what the order of the elements is before and after the pivot. So in this case, if we made 6 the pivot, I know the numbers 1 - 5 will be before 6 and 7 - 9 will go after it. But what will the order of the numbers 1 - 5 and of 7 - 9 be after the first partition, given my list above?
Here is the partition algorithm that I want to use (bear in mind I'm using the middle element as my initial pivot):
1. Determine the pivot, and swap the pivot with the first element of the list.
2. Suppose that the index smallIndex points to the last element smaller than the pivot. The index smallIndex is initialized to the first element of the list.
3. For the remaining elements in the list (starting at the second element):
   If the current element is smaller than the pivot:
   a. Increment smallIndex.
   b. Swap the current element with the array element pointed to by smallIndex.
4. Swap the first element, that is the pivot, with the array element pointed to by smallIndex.
It would be amazing if anyone could show the list after each single little change that occurs to the list in the algorithm.
It doesn't matter.
All that matters - all that the partitioning process asserts - is that, after it has been run, there are no values on the left-hand side of the center point that emerges that are greater than the pivot and that there are no values on the right-hand side that are less than the pivot value.
The internal order of the two partitions is then handled in the subsequent recursive calls for each half.
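If it helps to see the states anyway, the partition procedure from the question can be sketched like this. It is a direct transcription under the assumption that the "middle element" is the one at index (lo + hi) // 2; the names partition and quick_sort are my own. It prints the list after each partition call.

```python
def partition(items, lo, hi):
    """Partition items[lo..hi] around its middle element; return the pivot's final index."""
    mid = (lo + hi) // 2
    items[lo], items[mid] = items[mid], items[lo]    # move the pivot to the front
    pivot = items[lo]
    small_index = lo                                 # last element known to be < pivot
    for i in range(lo + 1, hi + 1):
        if items[i] < pivot:
            small_index += 1
            items[small_index], items[i] = items[i], items[small_index]
    items[lo], items[small_index] = items[small_index], items[lo]  # pivot into place
    return small_index


def quick_sort(items, lo=0, hi=None):
    if hi is None:
        hi = len(items) - 1
    if lo < hi:
        p = partition(items, lo, hi)
        print(items)                                 # state after this partition
        quick_sort(items, lo, p - 1)
        quick_sort(items, p + 1, hi)


data = [1, 2, 9, 5, 6, 4, 7, 8, 3]
quick_sort(data)
```

For the list in the question, the first partition (pivot 6) leaves 3, 2, 5, 1, 4, 6, 7, 8, 9. The order inside each half is just a side effect of the swaps, and the recursive calls sort it out.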

Positioning an ordered sequence of intervals for maximum alignment with another sequence of fixed intervals

I have two sequences of intervals.
The first is fixed and non-overlapping, so something like:
[1..10], [12..15], [23..56], [72..89], ...
The second is not fixed, so it's just an ordered list of interval lengths:
[7, 2, 5, 26, ...]
The task at hand is to:
Place every interval from the second list at a given starting point, so that the second list becomes a list of fixed, non-overlapping intervals much like the first, while preserving its order
Find the alignment that minimizes the amount of integers that are in some interval from one of the lists but not in any interval from the other list
Very simple example:
[25..26], [58..68], [74..76], [78..86]
[10, 12]
The optimal solution is to place the interval of length 10 at [58..68] and the interval of length 12 at [74..86] which results in only the numbers 25, 26, and 77 being in one list but not the other.
The only thing I've come up with that seems mildly helpful is this: if I lay down the intervals in order, I know how many 'penalties' the intervals I've already placed have incurred, so I have an upper bound for the score, which means I have an admissible heuristic and I can do A* search instead of looking at the entire tree. However, the total range of numbers spans from 0 to about 34M, so I'd like something better.
Any help would be hot!
OK, here's a half-thought-out answer. It should work in polynomial time, but I haven't bothered checking what the index is. It may well be possible to get a better index than the answer as outlined here. The details are left as an exercise to the reader :-) I hope it's not too unclear.
I'll define the score of a solution as the number of integers which appear in both lists of intervals. Let f(i,m) be the highest score it's possible to get using just the first i interval lengths, subject to the condition that none of your intervals goes above m. The function f, for fixed i, is essentially a (non-strictly) increasing function from the integers to a bounded subset of the integers. Therefore:
all values of f(i,m), for m > 0, are equal, with finitely many exceptions;
all values of f(i,m), for m < 0, are equal, with finitely many exceptions.
This means it's possible to represent all values of f(i,m) using a finite data structure (still considering a fixed value of i).
Now let F(i) be the value of this data structure representing all values of f(i,m). I claim that, given F(i), it is possible to calculate F(i+1). To do this, we only need to answer the following question for all x: If I place the new interval at x, how good is the best solution I can get? But we know what this is - it's just f(i,x) + the score we've got from this interval.
So if n is the number of intervals in the second list, the score of the best solution can be read off F(n): it is the value that f(n,m) takes for all sufficiently large m.
To actually find the solution, you could work backwards from this.
You know what's the best score you can get. Say it's s_0. Then put the last interval as far left as possible, subject to the condition that it allows you to score s_0. That is, find the smallest m such that f(n,m) = s_0; and place the interval such that it only just stays inside the bound at m.
Then, let s_1 be the score you need to get from all the other intervals in order to get a total of s_0. Place the next-to-last interval as far left as possible, subject to the condition that you can still score s_1. That is, find the smallest m such that f(n-1,m) = s_1, and place the interval such that it only just stays inside the bound at m.
And so on...
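For small coordinate ranges, the recurrence for f(i,m) can be sketched with a dense table instead of the compact representation described above. This is only an illustration under assumptions of my own: an interval of length L placed at x covers the integers x..x+L inclusive (which is what the question's example suggests), consecutive placed intervals may not share an integer, and the function names are hypothetical. It runs in time proportional to (number of lengths) times (coordinate range), so it would not scale to a range of 34M as is.

```python
def best_overlap(fixed, lengths):
    """Max number of integers lying in both lists of intervals (the 'score')."""
    pad = sum(L + 1 for L in lengths)            # room to place everything anywhere
    lo = min(a for a, _ in fixed) - pad
    hi = max(b for _, b in fixed) + pad
    size = hi - lo + 1
    in_fixed = [0] * size
    for a, b in fixed:
        for v in range(a, b + 1):
            in_fixed[v - lo] = 1
    prefix = [0] * (size + 1)                    # prefix[k] = fixed integers in first k cells
    for k in range(size):
        prefix[k + 1] = prefix[k] + in_fixed[k]
    neg = float("-inf")
    f = [0] * (size + 1)                         # f[k]: best score using only the first k cells
    for L in lengths:
        g = [neg] * (size + 1)
        for k in range(1, size + 1):
            g[k] = g[k - 1]                      # leave cell k-1 unused
            start = k - (L + 1)                  # or end the new interval at cell k-1
            if start >= 0 and f[start] != neg:
                g[k] = max(g[k], f[start] + prefix[k] - prefix[start])
        f = g
    return f[size]


def mismatch(fixed, lengths):
    """Integers in exactly one of the two lists, for the best alignment."""
    fixed_total = sum(b - a + 1 for a, b in fixed)
    placed_total = sum(L + 1 for L in lengths)
    return fixed_total + placed_total - 2 * best_overlap(fixed, lengths)


fixed = [(25, 26), (58, 68), (74, 76), (78, 86)]
print(mismatch(fixed, [10, 12]))
```

On the example from the question this reports 3 mismatched integers, which agrees with the numbers 25, 26 and 77 listed there.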

Sort a given array based on parent array using only swap function

It is a coding interview question. We are given an array say random_arr and we need to sort it using only the swap function.
Also, the number of swaps for each element in random_arr is limited. For this you are given an array parent_arr, containing the number of swaps allowed for each element of random_arr.
Constraints:
You should use swap function.
Every element may repeat minimum 5 times and maximum 26 times.
You cannot make elements of given array to 0.
You should not write helper functions.
Now I will explain how parent_arr is declared. If parent_arr is like:
parent_arr[] = {a,b,c,d,...,z} then
a can be swapped at most one time.
b can be swapped at most two times.
if parent_arr[] = {c,b,a,....,z} then
c can be swapped at most one time.
b can be swapped at most two times.
a can be swapped at most three times
My solution:
For each element in random_arr[], store how many elements would be below it if the array were sorted. Now select the element with the minimum swap count from parent_arr[] and check whether it exists in random_arr[]. If it does and it occurs more than once, then there is more than one location where it can be placed. Choose the position (or, more precisely, the element at that position) with the maximum swap count and swap with it. Then decrease the swap count for that element, re-sort parent_arr[], and repeat the process.
But it is quite inefficient and its correctness can't be proved. Any ideas?
First, let's simplify your algorithm; then let's informally prove its correctness.
Modified algorithm
Observe that once you computed the number of elements below each number in the sorted sequence, you have enough information to determine for each group of equal elements x their places in the sorted array. For example, if c is repeated 7 times and has 21 elements ahead of it, then cs will occupy the range [21..27] (all indexes are zero-based; the range is inclusive of its ends).
Go through the parent_arr in the order of increasing number of swaps. For each element x, find the beginning of its target range rb; also note the end of its target range re. Now go through the elements of random_arr outside of the [rb..re] range. If you see x, swap it into the range. After swapping, increment rb. If you see that random_arr[rb] is equal to x, continue incrementing: these xs are already in the right spot, you wouldn't need to swap them.
Informal proof of correctness
Now let's prove the correctness of the above. Observe that once an element is swapped into its place, it is never moved again. When you reach an element x in the parent_arr, all elements with a lower number of swaps are already processed. By construction of the algorithm this means that these elements are already in place. Suppose that x has k allowed swaps. When you swap it into its place, you move another element out.
This replaced element cannot be x, because the algorithm skips xs when looking for a destination in the target range [rb..re]. Moreover, the replaced element cannot be one of elements below x in the parent_arr, because all elements below x are in their places already, and therefore cannot move. This means that the swap count of the replaced element is necessarily k+1 or more. Since by the time that we finish processing x we have exhausted at most k swaps on any element (which is easy to prove by induction), any element that we swap out to make room for x will have at least one remaining swap that would allow us to swap it in place when we get to it in the order dictated by the parent_arr.
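A rough sketch of the modified algorithm follows. The function name is hypothetical, and the sketch assumes parent_arr lists every distinct value of random_arr in order of increasing swap allowance. It only performs the swaps and counts them; it does not check the per-element swap budgets, which the argument above is meant to guarantee.

```python
from collections import Counter


def sort_by_parent_order(random_arr, parent_arr):
    """Sort a copy of random_arr with swaps only, handling values in parent_arr order."""
    arr = list(random_arr)
    counts = Counter(arr)
    # Start index of each value's target range in the sorted array.
    range_start = {}
    position = 0
    for value in sorted(counts):
        range_start[value] = position
        position += counts[value]
    swaps = 0
    for x in parent_arr:
        if x not in counts:
            continue
        rb = range_start[x]
        re = rb + counts[x] - 1
        for i in range(len(arr)):
            if rb <= i <= re or arr[i] != x:
                continue                    # only copies of x outside [rb..re] move
            while arr[rb] == x:             # skip copies that are already in place
                rb += 1
            arr[i], arr[rb] = arr[rb], arr[i]
            swaps += 1
            rb += 1
    return arr, swaps


print(sort_by_parent_order(list("cabacb"), list("abc")))
```

The while loop cannot run past re: if a copy of x still sits outside its range, some slot at or after rb inside the range must hold a different value.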

Given a set of intervals, find the minimum number of points that need to be placed, so that every interval has a point in it

Suppose you are given a set of intervals, where interval i has starting time s_i and finishing time f_i. Find the minimum number of points that need to be placed so that every interval contains a point.
I'm trying to find an algorithm that would solve this. I get stuck when an interval that overlaps two others (i.e., it starts halfway through one interval and ends halfway through another) also has an interval contained in it.
Thanks
Remove any intervals that completely contain a smaller interval. You can do this because, if the smaller interval is satisfied, then the larger interval must also be satisfied.
Sort intervals by s_i.
Starting from the first interval: place a point at f_i. This will satisfy the first interval, and any intervals that overlap it.
Continue in sorted order to the next interval that does not yet contain a point, and place a point at f_i.
Repeat.
This question needs an answer with code. Here is a python implementation of the algorithm that user612112 mentions, which is a little better than the one in the accepted answer:
Initialize an empty list of output points
Sort the ranges by end point, and process them in end point order
For each range, if the last output point is less than the start of the range, then add the range's end point to the output set
Note that you don't need any preprocessing to remove redundant ranges, and you don't need the sort to distinguish between multiple ranges with the same end point.
    # given some inclusive ranges
    ranges = [(1, 5), (2, 4), (4, 6), (3, 7), (5, 9), (6, 6)]
    # sort by the end points
    ranges.sort(key=lambda p: p[1])
    # generate required points
    out = []
    last = None
    for r in ranges:
        # a new point is needed only if the last one falls before this range starts
        if last is None or last < r[0]:
            last = r[1]
            out.append(last)
    # print answer
    print(out)
Sort the intervals in order of nondecreasing upper bound.
Initialize a variable most_recent_placed to -inf (something less than all interval lower bounds).
Scan the intervals in sorted order. For a given interval [a, b], if most_recent_placed < a, then put a point at b and set most_recent_placed to b.
The proof that this solution A is optimal is to establish inductively that for any valid solution B and any point x, the number of points placed by B with coordinates less than x is at least as large as the number of points placed by A left of x.
First sort the intervals in increasing order of starting point.
Put a point at the smallest f_i.
If the next interval, which has finishing time f_(i+1), contains this point, then the previous point already covers it; otherwise put a new point at f_(i+1).
Iterate the procedure.

Resources