How can you determine whether there exists a set of values that satisfies certain given criteria? Each criterion is an interval together with the minimum value within that interval.
For instance, given the criteria:
Interval : Minimum value in that interval
{2, 2} : 5
{1, 4} : 1
{4, 4} : 4
One set of values that can satisfy it is
{1, 5, 1, 4}
On the other hand, given the criteria:
Interval : Minimum value in that interval
{2, 3} : 1
{1, 4} : 2
{4, 4} : 4
There exists no such set of values that satisfies them.
I want to determine whether there exists a set of values that satisfy the given criteria (i.e., I only want to find an algorithm that returns true if there exists a set of values that do satisfy the criteria and false if there does not).
I know how to do this using an O(N^2) brute-force, but I want to achieve an O(N lgN) solution if possible.
My first attempt at solving this involved merging overlapping intervals and then checking the merged intervals for the lowest value. However, I quickly realized that doing so does not necessarily guarantee a correct answer.
My second attempt involved segment trees: I tried assigning each interval's value to its positions, and if an assignment would overwrite part of an earlier interval, I concluded that no valid set existed. However, this was also quickly abandoned, as you can still arrive at a valid set of values even when some portions are overwritten.
My third attempt involved interval trees, trying to find the intersection points between two intervals and checking whether a valid set of values could be created. This seemed promising but was an O(N^2) algorithm so was also abandoned.
Can anyone provide some insight?
The idea of assigning values is correct. You just need to sort the intervals by their minimum value (in increasing order). That is, the solution looks like this:
Build a segment tree (with -infinity values in all nodes).
Process all intervals in sorted order. For each interval, just assign its value to the whole interval (no matter what was there before).
Run a minimum query for each interval to check that the result equals its required minimum.
The only non-trivial statement is: if this algorithm did not find a solution, there is no solution. I will not post a formal proof, but here are the key observations:
We must assign a new value to the entire interval when we process them in sorted order.
We do not assign a new value anywhere else (that is, we cannot accidentally destroy the value for another interval).
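The steps above can be sketched in Python as follows. This is a minimal sketch, not the answerer's own code: the names `AssignMinTree` and `criteria_satisfiable` are made up for illustration, positions are assumed to be 1-based integers in the range 1..n, and each criterion is assumed to be a `(lo, hi, minimum)` tuple.

```python
import math


class AssignMinTree:
    """Segment tree with range assignment and range-minimum queries."""

    def __init__(self, size):
        self.n = size
        self.mn = [-math.inf] * (4 * size)   # all nodes start at -infinity
        self.lazy = [None] * (4 * size)      # pending assignment per node

    def _push(self, node):
        # Move a pending assignment down to both children.
        if self.lazy[node] is not None:
            for child in (2 * node, 2 * node + 1):
                self.mn[child] = self.lazy[node]
                self.lazy[child] = self.lazy[node]
            self.lazy[node] = None

    def assign(self, lo, hi, value, node=1, left=0, right=None):
        if right is None:
            right = self.n - 1
        if hi < left or right < lo:
            return
        if lo <= left and right <= hi:
            self.mn[node] = value
            self.lazy[node] = value
            return
        self._push(node)
        mid = (left + right) // 2
        self.assign(lo, hi, value, 2 * node, left, mid)
        self.assign(lo, hi, value, 2 * node + 1, mid + 1, right)
        self.mn[node] = min(self.mn[2 * node], self.mn[2 * node + 1])

    def query(self, lo, hi, node=1, left=0, right=None):
        if right is None:
            right = self.n - 1
        if hi < left or right < lo:
            return math.inf
        if lo <= left and right <= hi:
            return self.mn[node]
        self._push(node)
        mid = (left + right) // 2
        return min(self.query(lo, hi, 2 * node, left, mid),
                   self.query(lo, hi, 2 * node + 1, mid + 1, right))


def criteria_satisfiable(n, criteria):
    """criteria: iterable of (lo, hi, minimum) with 1-based inclusive bounds."""
    tree = AssignMinTree(n)
    # Assign in increasing order of minimum, overwriting earlier values.
    for lo, hi, value in sorted(criteria, key=lambda c: c[2]):
        tree.assign(lo - 1, hi - 1, value)
    # Every interval must still have exactly its required minimum.
    return all(tree.query(lo - 1, hi - 1) == value for lo, hi, value in criteria)


print(criteria_satisfiable(4, [(2, 2, 5), (1, 4, 1), (4, 4, 4)]))
print(criteria_satisfiable(4, [(2, 3, 1), (1, 4, 2), (4, 4, 4)]))
```

If the positions are large or sparse, you would presumably compress the interval endpoints to consecutive indices first; that step is left out here.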
Related
I have several sets of pairs like:
a: ([1, 2], [4,5])
b: ([1, 3])
c: ([4, 7], [1, 8])
d: ([9, 7], [1, 5])
...
Where no two pairs are identical, and the elements of no pair are identical. Each set can contain many pairs. There is a smallish number of elements (around 200).
From each set I take one pair. Now, I want to take pairs in such a way that the total number of distinct elements is as small as possible.
The problem is too large to try every combination. Is there any algorithm or heuristic that might help me find the optimal solution (or a close guess)?
The problem has a definite NP-complete feel about it. So here are two greedy approaches that may produce reasonable approximate answers. To figure out which is better, you should implement both and compare.
The first is bottom up. Give each set a value of 2 if it has a pair selected from it, and (n+1)/n if it has n pairs partially selected from it. At each round, give each element a value for being selected which is the sum of the amount by which adding it increases the value of all of the sets. In the round select the element with the highest value, then update the value of all of the sets, update the value of all remaining elements, and continue.
This will pick elements that look like they are making progress towards covering all sets.
The second is top down. Start with all elements selected, and give each set a value of 1/n where n is the number of its pairs that are still fully selected. Elements that are required for all remaining pairs in a given set are put into the final set. Of the remaining elements, find the one whose removal increases the total value the least, and remove it.
The idea is that we start with too big a cover and repeatedly remove the element that seems least important for covering all the sets. What we are left with is hopefully minimal.
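Here is a rough Python sketch of the top-down heuristic, under one reading of the description: an element is treated as "required" when removing it would leave some set with no fully selected pair. The function name `top_down_cover` and the input layout (a list of lists of 2-tuples) are assumptions for illustration, and ties are broken by taking the smallest element.

```python
def top_down_cover(sets):
    """sets: list of lists of (a, b) pairs. Returns the set of kept elements."""
    selected = {e for pairs in sets for pair in pairs for e in pair}

    def total_value(chosen):
        # Sum of 1/n over all sets, where n counts pairs fully inside `chosen`.
        # Returns None if some set has no usable pair left.
        total = 0.0
        for pairs in sets:
            n = sum(1 for a, b in pairs if a in chosen and b in chosen)
            if n == 0:
                return None
            total += 1.0 / n
        return total

    current = total_value(selected)
    while True:
        best = None
        for e in sorted(selected):
            value = total_value(selected - {e})
            if value is None:
                continue  # e is required by some set, keep it
            increase = value - current
            if best is None or increase < best[0]:
                best = (increase, e, value)
        if best is None:
            return selected  # every remaining element is required
        _, element, current = best
        selected.remove(element)


example = [
    [(1, 2), (4, 5)],
    [(1, 3)],
    [(4, 7), (1, 8)],
    [(9, 7), (1, 5)],
]
print(sorted(top_down_cover(example)))
```

Each round re-evaluates every remaining element, so this is only practical as written for modest input sizes; incremental bookkeeping of the per-set counts would be the obvious refinement.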
Let's say you have a finite and arbitrary set of sets, and each inner set can contain integers from 1 to 4, not repeating. So a set could be {{1}, {1,4}, {1,4}, {1,2,3,4,4}, {2,3,4}}. And suppose you have a requirement: a collection of numbers that must be supplied by the inner sets, where each inner set can contribute only one number to the requirement.
That was probably confusing, so let me give an example: Say the requirement is {1,2,3,4} and say the set is {{1,2,3,4}, {3,4}, {1,2}, {1,2}}. Then it meets the requirement, since you could take 3 from the first inner set, 4 from the second, 1 from the third, and 2 from the last. However, if the set is {{1,2,3,4}, {1,2}, {1,2}, {1,2}} then that does not meet the requirement, since you could get a 3 or 4 from the first inner set, but not get the other from any of the other inner sets.
Note that for the requirements, duplicates are fine: so a requirement of {1,1,3} is allowed.
So my question is: Given a requirement and a set, how would you write an algorithm to determine if the set satisfies the condition?
Thanks for reading this!
Take the cross product of the inner sets, and see if it contains the requirement. (Where by cross-product of sets A and B, I mean all sets that can be derived by taking one element from A and one element from B; if exactly one element is a set, add the other element to that set; if both are sets, take their union.)
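Try maximum matching in an unweighted bipartite graph. Put the numbers of the requirement on one side and the inner sets on the other, with an edge wherever an inner set contains the number; the requirement is met exactly when a matching covers every number in the requirement.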
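A minimal sketch of the matching idea in Python, using the standard augmenting-path method (often called Kuhn's algorithm). The function name `meets_requirement` is an assumption for illustration; the requirement is passed as a list so that duplicates such as [1, 1, 3] work.

```python
def meets_requirement(requirement, inner_sets):
    """True if every requirement item can be taken from a distinct inner set."""
    # match[j] is the index of the requirement item assigned to inner set j.
    match = [None] * len(inner_sets)

    def try_assign(i, visited):
        # Try to give requirement item i some inner set, re-routing
        # previously assigned items along an augmenting path if needed.
        for j, inner in enumerate(inner_sets):
            if requirement[i] in inner and j not in visited:
                visited.add(j)
                if match[j] is None or try_assign(match[j], visited):
                    match[j] = i
                    return True
        return False

    return all(try_assign(i, set()) for i in range(len(requirement)))


print(meets_requirement([1, 2, 3, 4], [{1, 2, 3, 4}, {3, 4}, {1, 2}, {1, 2}]))
print(meets_requirement([1, 2, 3, 4], [{1, 2, 3, 4}, {1, 2}, {1, 2}, {1, 2}]))
```

The first call corresponds to the example that meets the requirement and the second to the one that does not.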
I have a set of frequency values and I'd like to find the most likely subsets of values meeting the following conditions:
values in each subset should be harmonically related (approximately multiples of a given value)
the number of subsets should be as small as possible
every subset should have as few missing harmonics as possible below its highest value
E.g. [1,2,3,4,10,20,30] should return [1,2,3,4] and [10,20,30] (a set with all the values is not optimal because, even if they are harmonically related, there are many missing values)
The brute-force method would be to compute all the possible subsets of values in the set and compute some cost value for each, but that would take far too long.
Is there any efficient algorithm to perform this task (or something similar)?
I would reduce the problem to minimum set cover, which, although NP-hard, often is efficiently solvable in practice via integer programming. I'm assuming that it would be reasonable to decompose [1, 2, 3, 4, 8, 12, 16] as [1, 2, 3, 4] and [4, 8, 12, 16], with 4 repeating.
To solve set cover (well, to use stock integer-program solvers, anyway), we need to enumerate all of the maximal allowed subsets. If the fundamental (i.e., the given value) must belong to the set, then, for each frequency, we can enumerate its multiples in order until too many in a row are missing. If not, we try all pairs of frequencies, assume that their fundamental is their approximate greatest common divisor, and extend the subset downward and upward until too many frequencies are missing.
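As a rough illustration of the simpler case (the fundamental belongs to the subset), the sketch below enumerates one candidate subset per frequency and then picks a smallest cover. Everything here is an assumption for illustration: the names `harmonic_candidates` and `smallest_cover`, the `max_missing` and `tol` parameters, and the exhaustive search, which stands in for the integer-programming solver mentioned above and is only usable on tiny inputs.

```python
from itertools import combinations


def harmonic_candidates(freqs, max_missing=1, tol=0.02):
    """For each frequency taken as fundamental, collect its multiples in order
    until more than `max_missing` consecutive multiples are absent."""
    freqs = sorted(freqs)
    top = freqs[-1]
    candidates = []
    for f0 in freqs:
        members, missing, k = [], 0, 1
        while k * f0 <= top * (1 + tol) and missing <= max_missing:
            target = k * f0
            # "Approximately a multiple": within a relative tolerance.
            hits = [f for f in freqs if abs(f - target) <= tol * target]
            if hits:
                members.extend(hits)
                missing = 0
            else:
                missing += 1
            k += 1
        candidates.append(tuple(members))
    return candidates


def smallest_cover(freqs, candidates):
    """Exhaustive minimum set cover; exponential, for small examples only."""
    universe = set(freqs)
    for size in range(1, len(candidates) + 1):
        for combo in combinations(candidates, size):
            if set().union(*combo) == universe:
                return [list(c) for c in combo]
    return None


freqs = [1, 2, 3, 4, 10, 20, 30]
print(smallest_cover(freqs, harmonic_candidates(freqs)))
```

For real data you would presumably hand the candidate subsets to an integer-programming solver instead of enumerating combinations.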
I have two sequences of intervals.
The first is fixed and non-overlapping, so something like:
[1..10], [12..15], [23..56], [72..89], ...
The second is not fixed, so it's just an ordered list of interval lengths:
[7, 2, 5, 26, ...]
The task at hand is to:
Place every interval from the second list at a given starting point, so that the second list becomes a list of fixed, non-overlapping intervals much like the first, while preserving its order
Find the alignment that minimizes the number of integers that are in some interval from one of the lists but not in any interval from the other list
Very simple example:
[25..26], [58..68], [74..76], [78..86]
[10, 12]
The optimal solution is to place the interval of length 10 at [58..68] and the interval of length 12 at [74..86] which results in only the numbers 25, 26, and 77 being in one list but not the other.
The only thing I've come up with that seems mildly helpful is that if I lay down the intervals in order, I know how many 'penalties' the intervals I've already placed have incurred, so I have an upper bound for the score, which means I have an admissible heuristic and I can do A* search instead of looking at the entire tree. However, the total range of numbers spans from 0 to about 34M, so I'd like something better.
Any help would be hot!
OK, here's a half-thought-out answer. It should work in polynomial time, but I haven't bothered checking what the index is. It may well be possible to get a better index than the answer as outlined here. The details are left as an exercise to the reader :-) I hope it's not too unclear.
I'll define the score of a solution as the number of integers which appear in both lists of intervals. Let f(i,m) be the highest score it's possible to get using just the first i interval lengths, subject to the condition that none of your intervals goes above m. The function f, for fixed i, is essentially a (non-strictly) increasing function from the integers to a bounded subset of the integers. Therefore:
all values of f(i,m), for m > 0, are equal, with finitely many exceptions;
all values of f(i,m), for m < 0, are equal, with finitely many exceptions.
This means it's possible to represent all values of f(i,m) using a finite data structure (still considering a fixed value of i).
Now let F(i) be the value of this data structure representing all values of f(i,m). I claim that, given F(i), it is possible to calculate F(i+1). To do this, we only need to answer the following question for all x: If I place the new interval at x, how good is the best solution I can get? But we know what this is - it's just f(i,x) + the score we've got from this interval.
So if n is the number of intervals in the second list, the score of the best solution can be read off F(n): it is the value of f(n,m) for all sufficiently large m.
To actually find the solution, you could work backwards from this.
You know the best score you can get. Say it's s_0. Then put the last interval as far left as possible, subject to the condition that it allows you to score s_0. That is, find the smallest m such that f(n,m) = s_0; and place the interval such that it only just stays inside the bound at m.
Then, let s_1 be the score you need to get from all the other intervals in order to get a total of s_0. Place the next-to-last interval as far left as possible, subject to the condition that you can still score s_1. That is, find the smallest m such that f(n-1,m) = s_1; and place the interval such that it only just stays inside the bound at m.
And so on...
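To make the recurrence concrete, here is a small Python sketch that evaluates f(i,m) on a dense grid of integers. It is only an illustration of the recurrence, not the compact piecewise-constant representation described above, so it costs time proportional to (number of lengths) x (range size) and would be heavy for a 34M range. The function name `best_overlap_score` and the `lo`/`hi` bounds are assumptions, and, following the example in the question, an interval of length L is taken to cover L + 1 integers (length 10 is [58..68]).

```python
def best_overlap_score(fixed, lengths, lo, hi):
    """Highest number of integers shared by both lists when the intervals in
    `lengths` are placed in order, without overlap, inside [lo, hi]."""
    size = hi - lo + 1
    inside = [0] * size
    for a, b in fixed:
        for x in range(max(a, lo), min(b, hi) + 1):
            inside[x - lo] = 1
    # prefix[k] = how many of the integers lo .. lo+k-1 lie in the fixed list.
    prefix = [0] * (size + 1)
    for k in range(size):
        prefix[k + 1] = prefix[k] + inside[k]

    neg = float("-inf")
    # f[k] = best score using the intervals so far within lo .. lo+k-1.
    f = [0] * (size + 1)
    for length in lengths:
        width = length + 1
        g = [neg] * (size + 1)
        for k in range(1, size + 1):
            g[k] = g[k - 1]                       # leave integer lo+k-1 unused
            if k >= width and f[k - width] != neg:
                gained = prefix[k] - prefix[k - width]
                g[k] = max(g[k], f[k - width] + gained)
        f = g
    return f[size]


fixed = [(25, 26), (58, 68), (74, 76), (78, 86)]
lengths = [10, 12]
score = best_overlap_score(fixed, lengths, 25, 86)
in_fixed = sum(b - a + 1 for a, b in fixed)
in_placed = sum(length + 1 for length in lengths)
print(score, in_fixed + in_placed - 2 * score)
```

The second printed number converts the score back into the question's penalty: integers in exactly one list equal the two totals minus twice the shared count.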
Given a set of objects, each of which is placed at several locations on a Natural number line: Find the smallest interval [a, b] containing all the objects.
Example: Consider 3 objects A, B, C
A is placed at 1, 5, 7
B is placed at 2, 4, 6
C is placed at 4, 8, 9
The smallest interval that encompasses all the three objects is [4, 5].
I can only think of an O(S^2) solution, where S is the length of the minimal interval containing all the object locations, i.e., [1, 9].
Is there a better way to do this?
PS : Note that multiple objects can be placed at the same location.
Sort all the data points in ascending order (O(n log n) time).
Traverse these data points from the left.
Keep track of the following:
1. For each type of object, maintain an entry of the coordinate of last object found (maybe through a hashmap for fast operation).
2. Minimum interval length found till now.
3. The coordinate of the earliest element in the list. This is to keep track of the start of current interval.
Whenever you encounter an object,
1. Update its entry in the maintained list.
2. Check whether the coordinate of earliest element has been updated. If so, then calculate the new interval length and update the minimum interval length if the new one is smaller.
You will first need to ascertain that you have encountered all types of objects to calculate the first valid minimum interval length. You can do that by a counter.
If the number of different types of elements is bounded and small, then the order of complexity is O(n log n), where n is the total number of data points.
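The procedure above might look like this in Python. It is a sketch under assumptions: the input is a dict from object name to its list of positions, the function name `smallest_interval_last_seen` is made up, and the earliest last-seen coordinate is recomputed with `min` at every step, which is why the number of object types should be small.

```python
def smallest_interval_last_seen(placements):
    """placements: dict mapping object name -> list of positions.
    Returns (a, b) for the shortest interval containing every object."""
    points = sorted((pos, name)
                    for name, positions in placements.items()
                    for pos in positions)
    last_seen = {}   # most recent coordinate per object type
    best = None
    for pos, name in points:
        last_seen[name] = pos
        # Only valid once every object type has been encountered.
        if len(last_seen) == len(placements):
            start = min(last_seen.values())
            if best is None or pos - start < best[1] - best[0]:
                best = (start, pos)
    return best


print(smallest_interval_last_seen({
    "A": [1, 5, 7],
    "B": [2, 4, 6],
    "C": [4, 8, 9],
}))
```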
You can do this in O(N) using 2 indices while going through the list. (Let's call them left and right.)
You start with both of them in position 1, and then increment right until [left,right] has all the elements. You know that this is the minimum interval starting at left that has all the elements. Now increment left. Now increment right again until you have all the elements. (NOTE: many times you don't even have to increment.) Take the minimum over all the complete intervals and you have your answer.
This works because if you know [left,right] is the minimum interval starting at left, the interval starting at left+1 will have its right >= the last right.
This is O(N) because you add the elements of a location once, and delete them once.
You'll need to use a hash to count the unique elements.
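A possible Python rendering of the two-index idea, with the same assumed input layout (a dict from object name to positions) and a made-up function name. Note that this sketch sorts the points first, so it is O(N log N) overall; the scan itself touches each point once on entry and once on exit.

```python
from collections import Counter


def smallest_interval_two_pointers(placements):
    """placements: dict mapping object name -> list of positions.
    Returns (a, b) for the shortest interval containing every object."""
    points = sorted((pos, name)
                    for name, positions in placements.items()
                    for pos in positions)
    counts = Counter()   # how many points of each object are in the window
    distinct = 0         # number of objects currently present in the window
    best = None
    left = 0
    for pos, name in points:            # this loop advances `right`
        counts[name] += 1
        if counts[name] == 1:
            distinct += 1
        # Shrink from the left while the window still holds every object.
        while distinct == len(placements):
            start = points[left][0]
            if best is None or pos - start < best[1] - best[0]:
                best = (start, pos)
            left_name = points[left][1]
            counts[left_name] -= 1
            if counts[left_name] == 0:
                distinct -= 1
            left += 1
    return best


print(smallest_interval_two_pointers({
    "A": [1, 5, 7],
    "B": [2, 4, 6],
    "C": [4, 8, 9],
}))
```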