(1 point) Suppose you have a set of N people who all do not know each other, and you have to process
a sequence of queries of the following kinds:
Type 1: MakeFriends(x,y) - x and y become friends with each other
Type 2: AreFriends(x,y) - output 1 if x and y are friends, and output 0 otherwise
In particular, for every pair of people, we need to maintain a state indicating whether they are friends
or not, and update this state based on queries of type 1, and report the state for queries of type 2.
Since you have learned about the disjoint sets data structures in your algorithms course, you decide to
store the set of N people as N singleton sets to begin with. For each query of type 1, you perform a
union operation, and for each query of type 2, you perform a find operation.
What can you say about this algorithm?
A. It may not always produce an accurate answer.
B. It works with O(lg∗n), O(lg∗n) amortized response time to both queries.
C. It works with O(n), O(n) worst-case response time to both queries.
D. The correctness of the approach depends on how DSU is implemented.
If we get the following sequence of queries: MakeFriends(1,2), MakeFriends(2,3), AreFriends(1,3), the answer to the last query with our approach (irrespective of how DSU is implemented)
will be 1, while the correct answer is 0.
Is the above explanation correct
Let's imagine two sets S1 and S2, where S1 contains elements 1, 2, 3, and S2 contains elements 4, 5, 6. These two sets don't have any elements in common, so they're disjoint.
Each set can be viewed as a tree that has a single root. Let's say the root of S1 is 1 and the root of S2 is 4.
Now if UNION(2, 5) is called, it will essentially merge set S1 with set S2. In other words, all the elements will now be part of a single set, and the root can be either 1 or 4 depending on the implementation. If FIND(1,6) is called afterwards, it will return true, as 1 and 6 now belongs to the same set, even though there weren't any calls like UNION(1,6) that explicitly asked to join 1 and 6.
If we get the following sequence of queries: MakeFriends(1,2), MakeFriends(2,3), AreFriends(1,3), the answer to the last query with our approach (irrespective of how DSU is implemented) will be 1, while the correct answer is 0.
Coming back to this statement, MakeFriends(1,2) will create a set that includes the elements 1, 2. Calling MakeFriends(2, 3) will include 3 in the same set. As a result, AreFriends(1,3) will return true, even though there weren't any explicit calls that joined 1 and 3, similar to the example mentioned above. So this statement is correct.
In the multiple choice questions, A. It may not always produce an accurate answer is correct.
Related
I have a vector of pairs (datatype=double), where each pair is (a,b) and a less than b.For a number x, I want to find out number of pair in vector, where a<=x<=b.
Consider the vector size about 10^6.
My Approach
Sort the vector pair and perform a lower_bound operation for x over "a" in pair then iterate from start till my lower bound value and check for values of "b" which satisfies condition of x<=b.
Time Complexity
N(LogN) where N is vector size.
Issue
I have to perform this over large queries where this approach becomes inefficient.So is there any better solution to decrease the time complexity.
Sorry for my poor English and question formatting.
In addition to the previous answer, here's a suggestion how to prepare the ranges to optimize the subsequent lookup. The idea boils down to precomputing the result for all significantly different input values, but being smart about when values don't differ significantly.
To illustrate what I mean, let's consider this sequence of ranges:
1, 3
1, 8
2, 4
2, 6
The prepared output structure then looks like this:
1, 2 -> 2
2, 3 -> 4
3, 4 -> 3
4, 6 -> 2
6, 8 -> 1
For any number in the range 1, 2, there are two matching ranges in the initial sequence. For any number in the range 2, 3, there are four matches, etc. Note that there are five ranges here now, because some of the input ranges partially overlapped. Since for every range here the end value is also the start value of the next range, the end value can be optimized out. The result then looks like a simple map:
1 -> 2
2 -> 4
3 -> 3
4 -> 2
6 -> 1
8 -> 0
Note here that the last range didn't have one following, so the explicit zero becomes necessary. For the values before the first, that is implied. In order to find the result for a value, just find the key that is less than or equal to that value. This is a simple O(log n) lookup.
Firstly, if you just did a simple scan over the pairs, you would have O(n) complexity! The O(n log n) comes from sorting and for a one-off operation this is just overhead. This might even be the best way to do it, if you don't reuse the results and even if you just perform a few queries, it might still be better than sorting. Make sure you allow yourself to switch out the algorithm.
Anyhow, let's consider that you need to make many queries. Then, one relatively obvious step to improve things is to not iterate step-by-step after sorting. Instead, you can do a binary search for the lower bound. Simply partition the sequence into halves. The lower bound can be found in either half, which you can determine by looking at the middle element between the partitions. Recurse until you found the first element that can not possibly contain the value you search, because its start value is already greater.
Concerning the other direction, things are not that easy. Just because you sorted the ranges by the start value doesn't imply that the end values are sorted, too. Also, ranges that match and ranges that don't can be mixed in the sequence, so here you will have to perform a linear scan.
Lastly, some notes:
You could parallelize this algorithm using multithreading.
Depending on your number of searches M in your outer loop, you could also switch the outer loop with the inner one. That means that for every pair of the input vector, you check each of the M search values whether they fall within the range. This might be better, in particular when the M searches fit into the CPU cache.
This is a very typical style problem in for segment trees, binary indexed trees, interval trees.
There are two operations that you have to carry out on an array arr.
You have two operations on an array arr:
1. Range update: Add(a, b): for(int i = a; i <= b; ++i) arr[i]++
2. Point query : Query(x): return arr[x]
Alternately, you could formulate your problem slightly cleverly.
1. Point Update: Add(a, b): arr[a]++; arr[b+1]--;
2. Range Query: Query(x): return sum(arr[0], arr[1] ..... arr[x]);
In each of the cases above, you have one O(n) operation and one O(1) operation.
For the second case, the query is essentially a prefix sum calculation. Binary Indexed Trees are especially efficient at this task.
Tutorial for Binary Indexed Trees
IMPORTANT IDEA: ARRAY COMPRESSION
You did mention that the vector size is about 10^6, so there is a chance that you may not be able to create an array that big. If you are able to create a set that consists of all the as and bs and xs beforehand, then you can translate them into numbers from 1 to size of set.
SUPER CLEVER IDEA: MO's ALGORITHM
This is only allowed if you are allowed to solve the problem offline. What that means is that you can take all the query points x as input, solve them in any order as you like and store the solution, and then print the solution in the correct order.
Please mention if this is your situation, and only then will I elaborate further on this. But Binary Indexed Trees are going to be more efficient than Mo's algorithm.
EDIT:
Because your interval values are of type double, you must convert them to integers before you use my solution. Let me give an example,
Intervals = (1.1 to 1.9), (1.4 to 2.1)
Query Points = 1.5, 2.0
Here all the points that are of interest are not all the possible doubles, but just the above numbers = {1.1, 1.4, 1.5, 1.9, 2.0, 2.1}
If we map them into positive integers:
1.1 --> 1
1.4 --> 2
1.5 --> 3
1.9 --> 4
2.0 --> 5
2.1 --> 6
Then you could use segment trees/binary indexed trees.
For each pair a,b you can decompose so that a=+1 and b=-1 for the number of ranges valid for a particular value. Then in becomes a simple O(log n) lookup to see how many ranges encompass the search value.
I have a variable like follow:
var L_s = collection.mutable.Set[collection.mutable.Set[Int]]()
and I want to find the union of sets of different size .
Please Note : All the sets in L_s will be of same size that is k-1 when we want to find union of size k.
As of now I am doing the following :
for(i <- L_s){
for (j <- L_s){
if((i.union(j)).size == k)
{
res.+=(i.union(j))
}
}
}
This operation is taking a lot of time if L_s has many sets. I wanted to know what is the most efficient way to do this operation.
Since union is commutative (a union b == b union a) you're doing twice as many operations as needed, and you're repeating the union operation again when the target size is found, and every Set gets union with itself. These inefficiencies can be eliminated.
L_s.toVector.combinations(2).map(x => x(0) union x(1)).filter(_.size == k).toSet
There are basically two opportunities to speed up your algorithm:
Speed up a single comparison of two sets
Reduce the total number of required comparisons
With comparison I mean the check whether the union of both sets has size k.
The first one is the simpler part and I mentioned it in my previous answer. You don't have to actually compute the union and check its size. It is sufficient to know the size of the intersection (not the actual intersection, only its size). Run through the elements of the first set and count how many occur in the second (very efficient lookup). If the intersection has size k - 2 you found a pair of sets the conforms to your requirement. You should break the loop whenever you have found two elements not in the second set because you will never reach intersection size k - 2 from there.
This works because two sets with the required property have all elements in common except for one in each set. This means set 1 has exactly one element not in set 2 and vice versa.
You can also exploit this property to limit the number of comparisons. The idea is the following. Your sets contain integers that can be ordered. If a pair of sets qualifies and you consider only the lowest two integers of each set, let's call it the prefix, the prefixes have at least one integer in common. Anything else would be a contradiction to your requirement.
You can now index all sets by their lowest and second lowest integer in advance. I.e., you build two maps of type Map[Int, Set[Set[Int]], let's call them setsByLowestMember and setsBySecondLowestMember. When you now loop through all sets for testing, instead of testing a set against every other set, you only test against those sets having the lowest or second lowest member equal to the lowest or second lowest of the current set.
Example: The current (ordered) set under consideration is { 3, 7, 9, ...}. You check against all of setsByLowestMember(3), setsByLowestMember(7), setsBySecondLowestMember(3) and setsBySecondLowestMember(7).
Depending on your data this may significantly speed up your algorithm. However, if your sets tend to have a very large intersection in general, it might not help that much. If it helps, it can be further improved (there still some checks done twice with the above approach).
I recently saw a logic/math problem called 4 Fours where you need to use 4 fours and a range of operators to create equations that equal to all the integers 0 to N.
How would you go about writing an elegant algorithm to come up with say the first 100...
I started by creating base calculations like 4-4, 4+4, 4x4, 4/4, 4!, Sqrt 4 and made these values integers.
However, I realized that this was going to be a brute force method testing the combinations to see if they equal, 0 then 1, then 2, then 3 etc...
I then thought of finding all possible combinations of the above values, checking that the result was less than 100 and filling an array and then sorting it...again inefficient because it may find 1000s of numbers over 100
Any help on how to approach a problem like this would be helpful...not actual code...but how to think through this problem
Thanks!!
This is an interesting problem. There are a couple of different things going on here. One issue is how to describe the sequence of operations and operands that go into an arithmetic expression. Using parentheses to establish order of operations is quite messy, so instead I suggest thinking of an expression as a stack of operations and operands, like - 4 4 for 4-4, + 4 * 4 4 for (4*4)+4, * 4 + 4 4 for (4+4)*4, etc. It's like Reverse Polish Notation on an HP calculator. Then you don't have to worry about parentheses, having the data structure for expressions will help below when we build up larger and larger expressions.
Now we turn to the algorithm for building expressions. Dynamic programming doesn't work in this situation, in my opinion, because (for example) to construct some numbers in the range from 0 to 100 you might have to go outside of that range temporarily.
A better way to conceptualize the problem, I think, is as breadth first search (BFS) on a graph. Technically, the graph would be infinite (all positive integers, or all integers, or all rational numbers, depending on how elaborate you want to get) but at any time you'd only have a finite portion of the graph. A sparse graph data structure would be appropriate.
Each node (number) on the graph would have a weight associated with it, the minimum number of 4's needed to reach that node, and also the expression which achieves that result. Initially, you would start with just the node (4), with the number 1 associated with it (it takes one 4 to make 4) and the simple expression "4". You can also throw in (44) with weight 2, (444) with weight 3, and (4444) with weight 4.
To build up larger expressions, apply all the different operations you have to those initial node. For example, unary negation, factorial, square root; binary operations like * 4 at the bottom of your stack for multiply by 4, + 4, - 4, / 4, ^ 4 for exponentiation, and also + 44, etc. The weight of an operation is the number of 4s required for that operation; unary operations would have weight 0, + 4 would have weight 1, * 44 would have weight 2, etc. You would add the weight of the operation to the weight of the node on which it operates to get a new weight, so for example + 4 acting on node (44) with weight 2 and expression "44" would result in a new node (48) with weight 3 and expression "+ 4 44". If the result for 48 has better weight than the existing result for 48, substitute that new node for (48).
You will have to use some sense when applying functions. factorial(4444) would be a very large number; it would be wise to set a domain for your factorial function which would prevent the result from getting too big or going out of bounds. The same with functions like / 4; if you don't want to deal with fractions, say that non-multiples of 4 are outside of the domain of / 4 and don't apply the operator in that case.
The resulting algorithm is very much like Dijkstra's algorithm for calculating distance in a graph, though not exactly the same.
I think that the brute force solution here is the only way to go.
The reasoning behind this is that each number has a different way to get to it, and getting to a certain x might have nothing to do with getting to x+1.
Having said that, you might be able to make the brute force solution a bit quicker by using obvious moves where possible.
For instance, if I got to 20 using "4" three times (4*4+4), it is obvious to get to 16, 24 and 80. Holding an array of 100 bits and marking the numbers reached
Similar to subset sum problem, it can be solved using Dynamic Programming (DP) by following the recursive formulas:
D(0,0) = true
D(x,0) = false x!=0
D(x,i) = D(x-4,i-1) OR D(x+4,i-1) OR D(x*4,i-1) OR D(x/4,i-1)
By computing the above using DP technique, it is easy to find out which numbers can be produced using these 4's, and by walking back the solution, you can find out how each number was built.
The advantage of this method (when implemented with DP) is you do not recalculate multiple values more than once. I am not sure it will actually be effective for 4 4's, but I believe theoretically it could be a significant improvement for a less restricted generalization of this problem.
This answer is just an extension of Amit's.
Essentially, your operations are:
Apply a unary operator to an existing expression to get a new expression (this does not use any additional 4s)
Apply a binary operator to two existing expressions to get a new expression (the new expression has number of 4s equal to the sum of the two input expressions)
For each n from 1..4, calculate Expressions(n) - a List of (Expression, Value) pairs as follows:
(For a fixed n, only store 1 expression in the list that evaluates to any given value)
Initialise the list with the concatenation of n 4s (i.e. 4, 44, 444, 4444)
For i from 1 to n-1, and each permitted binary operator op, add an expression (and value) e1 op e2 where e1 is in Expressions(i) and e2 is in Expressions(n-i)
Repeatedly apply unary operators to the expressions/values calculated so far in steps 1-3. When to stop (applying 3 recursively) is a little vague, certainly stop if an iteration produces no new values. Potentially limit the magnitude of the values you allow, or the size of the expressions.
Example unary operators are !, Sqrt, -, etc. Example binary operators are +-*/^ etc. You can easily extend this approach to operators with more arguments if permitted.
You could do something a bit cleverer in terms of step 3 never ending for any given n. The simple way (described above) does not start calculating Expressions(i) until Expressions(j) is complete for all j < i. This requires that we know when to stop. The alternative is to build Expressions of a certain maximum length for each n, then if you need to (because you haven't found certain values), extend the maximum length in an outer loop.
I'm looking for a sorting algorithm that honors a min and max range for each element1. The problem domain is a recommendations engine that combines a set of business rules (the restrictions) with a recommendation score (the value). If we have a recommendation we want to promote (e.g. a special product or deal) or an announcement we want to appear near the top of the list (e.g. "This is super important, remember to verify your email address to participate in an upcoming promotion!") or near the bottom of the list (e.g. "If you liked these recommendations, click here for more..."), they will be curated with certain position restriction in place. For example, this should always be the top position, these should be in the top 10, or middle 5 etc. This curation step is done ahead of time and remains fixed for a given time period and for business reasons must remain very flexible.
Please don't question the business purpose, UI or input validation. I'm just trying to implement the algorithm in the constraints I've been given. Please treat this as an academic question. I will endeavor to provide a rigorous problem statement, and feedback on all other aspects of the problem is very welcome.
So if we were sorting chars, our data would have a structure of
struct {
char value;
Integer minPosition;
Integer maxPosition;
}
Where minPosition and maxPosition may be null (unrestricted). If this were called on an algorithm where all positions restrictions were null, or all minPositions were 0 or less and all maxPositions were equal to or greater than the size of the list, then the output would just be chars in ascending order.
This algorithm would only reorder two elements if the minPosition and maxPosition of both elements would not be violated by their new positions. An insertion-based algorithm which promotes items to the top of the list and reorders the rest has obvious problems in that every later element would have to be revalidated after each iteration; in my head, that rules out such algorithms for having O(n3) complexity, but I won't rule out such algorithms without considering evidence to the contrary, if presented.
In the output list, certain elements will be out of order with regard to their value, if and only if the set of position constraints dictates it. These outputs are still valid.
A valid list is any list where all elements are in a position that does not conflict with their constraints.
An optimal list is a list which cannot be reordered to more closely match the natural order without violating one or more position constraint. An invalid list is never optimal. I don't have a strict definition I can spell out for 'more closely matching' between one ordering or another. However, I think it's fairly easy to let intuition guide you, or choose something similar to a distance metric.
Multiple optimal orderings may exist if multiple inputs have the same value. You could make an argument that the above paragraph is therefore incorrect, because either one can be reordered to the other without violating constraints and therefore neither can be optimal. However, any rigorous distance function would treat these lists as identical, with the same distance from the natural order and therefore reordering the identical elements is allowed (because it's a no-op).
I would call such outputs the correct, sorted order which respects the position constraints, but several commentators pointed out that we're not really returning a sorted list, so let's stick with 'optimal'.
For example, the following are a input lists (in the form of <char>(<minPosition>:<maxPosition>), where Z(1:1) indicates a Z that must be at the front of the list and M(-:-) indicates an M that may be in any position in the final list and the natural order (sorted by value only) is A...M...Z) and their optimal orders.
Input order
A(1:1) D(-:-) C(-:-) E(-:-) B(-:-)
Optimal order
A B C D E
This is a trivial example to show that the natural order prevails in a list with no constraints.
Input order
E(1:1) D(2:2) C(3:3) B(4:4) A(5:5)
Optimal order
E D C B A
This example is to show that a fully constrained list is output in the same order it is given. The input is already a valid and optimal list. The algorithm should still run in O(n log n) time for such inputs. (Our initial solution is able to short-circuit any fully constrained list to run in linear time; I added the example both to drive home the definitions of optimal and valid and because some swap-based algorithms I considered handled this as the worse case.)
Input order
E(1:1) C(-:-) B(1:5) A(4:4) D(2:3)
Optimal Order
E B D A C
E is constrained to 1:1, so it is first in the list even though it has the lowest value. A is similarly constrained to 4:4, so it is also out of natural order. B has essentially identical constraints to C and may appear anywhere in the final list, but B will be before C because of value. D may be in positions 2 or 3, so it appears after B because of natural ordering but before C because of its constraints.
Note that the final order is correct despite being wildly different from the natural order (which is still A,B,C,D,E). As explained in the previous paragraph, nothing in this list can be reordered without violating the constraints of one or more items.
Input order
B(-:-) C(2:2) A(-:-) A(-:-)
Optimal order
A(-:-) C(2:2) A(-:-) B(-:-)
C remains unmoved because it already in its only valid position. B is reordered to the end because its value is less than both A's. In reality, there will be additional fields that differentiate the two A's, but from the standpoint of the algorithm, they are identical and preserving OR reversing their input ordering is an optimal solution.
Input order
A(1:1) B(1:1) C(3:4) D(3:4) E(3:4)
Undefined output
This input is invalid for two reasons: 1) A and B are both constrained to position 1 and 2) C, D, and E are constrained to a range than can only hold 2 elements. In other words, the ranges 1:1 and 3:4 are over-constrained. However, the consistency and legality of the constraints are enforced by UI validation, so it's officially not the algorithms problem if they are incorrect, and the algorithm can return a best-effort ordering OR the original ordering in that case. Passing an input like this to the algorithm may be considered undefined behavior; anything can happen. So, for the rest of the question...
All input lists will have elements that are initially in valid positions.
The sorting algorithm itself can assume the constraints are valid and an optimal order exists.2
We've currently settled on a customized selection sort (with runtime complexity of O(n2)) and reasonably proved that it works for all inputs whose position restrictions are valid and consistent (e.g. not overbooked for a given position or range of positions).
Is there a sorting algorithm that is guaranteed to return the optimal final order and run in better than O(n2) time complexity?3
I feel that a library standard sorting algorithm could be modified to handle these constrains by providing a custom comparator that accepts the candidate destination position for each element. This would be equivalent to the current position of each element, so maybe modifying the value holding class to include the current position of the element and do the extra accounting in the comparison (.equals()) and swap methods would be sufficient.
However, the more I think about it, an algorithm that runs in O(n log n) time could not work correctly with these restrictions. Intuitively, such algorithms are based on running n comparisons log n times. The log n is achieved by leveraging a divide and conquer mechanism, which only compares certain candidates for certain positions.
In other words, input lists with valid position constraints (i.e. counterexamples) exist for any O(n log n) sorting algorithm where a candidate element would be compared with an element (or range in the case of Quicksort and variants) with/to which it could not be swapped, and therefore would never move to the correct final position. If that's too vague, I can come up with a counter example for mergesort and quicksort.
In contrast, an O(n2) sorting algorithm makes exhaustive comparisons and can always move an element to its correct final position.
To ask an actual question: Is my intuition correct when I reason that an O(n log n) sort is not guaranteed to find a valid order? If so, can you provide more concrete proof? If not, why not? Is there other existing research on this class of problem?
1: I've not been able to find a set of search terms that points me in the direction of any concrete classification of such sorting algorithm or constraints; that's why I'm asking some basic questions about the complexity. If there is a term for this type of problem, please post it up.
2: Validation is a separate problem, worthy of its own investigation and algorithm. I'm pretty sure that the existence of a valid order can be proven in linear time:
Allocate array of tuples of length equal to your list. Each tuple is an integer counter k and a double value v for the relative assignment weight.
Walk the list, adding the fractional value of each elements position constraint to the corresponding range and incrementing its counter by 1 (e.g. range 2:5 on a list of 10 adds 0.4 to each of 2,3,4, and 5 on our tuple list, incrementing the counter of each as well)
Walk the tuple list and
If no entry has value v greater than the sum of the series from 1 to k of 1/k, a valid order exists.
If there is such a tuple, the position it is in is over-constrained; throw an exception, log an error, use the doubles array to correct the problem elements etc.
Edit: This validation algorithm itself is actually O(n2). Worst case, every element has the constraints 1:n, you end up walking your list of n tuples n times. This is still irrelevant to the scope of the question, because in the real problem domain, the constraints are enforced once and don't change.
Determining that a given list is in valid order is even easier. Just check each elements current position against its constraints.
3: This is admittedly a little bit premature optimization. Our initial use for this is for fairly small lists, but we're eyeing expansion to longer lists, so if we can optimize now we'd get small performance gains now and large performance gains later. And besides, my curiosity is piqued and if there is research out there on this topic, I would like to see it and (hopefully) learn from it.
On the existence of a solution: You can view this as a bipartite digraph with one set of vertices (U) being the k values, and the other set (V) the k ranks (1 to k), and an arc from each vertex in U to its valid ranks in V. Then the existence of a solution is equivalent to the maximum matching being a bijection. One way to check for this is to add a source vertex with an arc to each vertex in U, and a sink vertex with an arc from each vertex in V. Assign each edge a capacity of 1, then find the max flow. If it's k then there's a solution, otherwise not.
http://en.wikipedia.org/wiki/Maximum_flow_problem
--edit-- O(k^3) solution: First sort to find the sorted rank of each vertex (1-k). Next, consider your values and ranks as 2 sets of k vertices, U and V, with weighted edges from each vertex in U to all of its legal ranks in V. The weight to assign each edge is the distance from the vertices rank in sorted order. E.g., if U is 10 to 20, then the natural rank of 10 is 1. An edge from value 10 to rank 1 would have a weight of zero, to rank 3 would have a weight of 2. Next, assume all missing edges exist and assign them infinite weight. Lastly, find the "MINIMUM WEIGHT PERFECT MATCHING" in O(k^3).
http://www-math.mit.edu/~goemans/18433S09/matching-notes.pdf
This does not take advantage of the fact that the legal ranks for each element in U are contiguous, which may help get the running time down to O(k^2).
Here is what a coworker and I have come up with. I think it's an O(n2) solution that returns a valid, optimal order if one exists, and a closest-possible effort if the initial ranges were over-constrained. I just tweaked a few things about the implementation and we're still writing tests, so there's a chance it doesn't work as advertised. This over-constrained condition is detected fairly easily when it occurs.
To start, things are simplified if you normalize your inputs to have all non-null constraints. In linear time, that is:
for each item in input
if an item doesn't have a minimum position, set it to 1
if an item doesn't have a maximum position, set it to the length of your list
The next goal is to construct a list of ranges, each containing all of the candidate elements that have that range and ordered by the remaining capacity of the range, ascending so ranges with the fewest remaining spots are on first, then by start position of the range, then by end position of the range. This can be done by creating a set of such ranges, then sorting them in O(n log n) time with a simple comparator.
For the rest of this answer, a range will be a simple object like so
class Range<T> implements Collection<T> {
int startPosition;
int endPosition;
Collection<T> items;
public int remainingCapacity() {
return endPosition - startPosition + 1 - items.size();
}
// implement Collection<T> methods, passing through to the items collection
public void add(T item) {
// Validity checking here exposes some simple cases of over-constraining
// We'll catch these cases with the tricky stuff later anyways, so don't choke
items.add(item);
}
}
If an element A has range 1:5, construct a range(1,5) object and add A to its elements. This range has remaining capacity of 5 - 1 + 1 - 1 (max - min + 1 - size) = 4. If an element B has range 1:5, add it to your existing range, which now has capacity 3.
Then it's a relatively simple matter of picking the best element that fits each position 1 => k in turn. Iterate your ranges in their sorted order, keeping track of the best eligible element, with the twist that you stop looking if you've reached a range that has a remaining size that can't fit into its remaining positions. This is equivalent to the simple calculation range.max - current position + 1 > range.size (which can probably be simplified, but I think it's most understandable in this form). Remove each element from its range as it is selected. Remove each range from your list as it is emptied (optional; iterating an empty range will yield no candidates. That's a poor explanation, so lets do one of our examples from the question. Note that C(-:-) has been updated to the sanitized C(1:5) as described in above.
Input order
E(1:1) C(1:5) B(1:5) A(4:4) D(2:3)
Built ranges (min:max) <remaining capacity> [elements]
(1:1)0[E] (4:4)0[A] (2:3)1[D] (1:5)3[C,B]
Find best for 1
Consider (1:1), best element from its list is E
Consider further ranges?
range.max - current position + 1 > range.size ?
range.max = 1; current position = 1; range.size = 1;
1 - 1 + 1 > 1 = false; do not consider subsequent ranges
Remove E from range, add to output list
Find best for 2; current range list is:
(4:4)0[A] (2:3)1[D] (1:5)3[C,B]
Consider (4:4); skip it because it is not eligible for position 2
Consider (2:3); best element is D
Consider further ranges?
3 - 2 + 1 > 1 = true; check next range
Consider (2:5); best element is B
End of range list; remove B from range, add to output list
An added simplifying factor is that the capacities do not need to be updated or the ranges reordered. An item is only removed if the rest of the higher-sorted ranges would not be disturbed by doing so. The remaining capacity is never checked after the initial sort.
Find best for 3; output is now E, B; current range list is:
(4:4)0[A] (2:3)1[D] (1:5)3[C]
Consider (4:4); skip it because it is not eligible for position 3
Consider (2:3); best element is D
Consider further ranges?
same as previous check, but current position is now 3
3 - 3 + 1 > 1 = false; don't check next range
Remove D from range, add to output list
Find best for 4; output is now E, B, D; current range list is:
(4:4)0[A] (1:5)3[C]
Consider (4:4); best element is A
Consider further ranges?
4 - 4 + 1 > 1 = false; don't check next range
Remove A from range, add to output list
Output is now E, B, D, A and there is one element left to be checked, so it gets appended to the end. This is the output list we desired to have.
This build process is the longest part. At its core, it's a straightforward n2 selection sorting algorithm. The range constraints only work to shorten the inner loop and there is no loopback or recursion; but the worst case (I think) is still sumi = 0 n(n - i), which is n2/2 - n/2.
The detection step comes into play by not excluding a candidate range if the current position is beyond the end of that ranges max position. You have to track the range your best candidate came from in order to remove it, so when you do the removal, just check if the position you're extracting the candidate for is greater than that ranges endPosition.
I have several other counter-examples that foiled my earlier algorithms, including a nice example that shows several over-constraint detections on the same input list and also how the final output is closest to the optimal as the constraints will allow. In the mean time, please post any optimizations you can see and especially any counter examples where this algorithm makes an objectively incorrect choice (i.e. arrives at an invalid or suboptimal output when one exists).
I'm not going to accept this answer, because I specifically asked if it could be done in better than O(n2). I haven't wrapped my head around the constraints satisfaction approach in #DaveGalvin's answer yet and I've never done a maximum flow problem, but I thought this might be helpful for others to look at.
Also, I discovered the best way to come up with valid test data is to start with a valid list and randomize it: for 0 -> i, create a random value and constraints such that min < i < max. (Again, posting it because it took me longer than it should have to come up with and others might find it helpful.)
Not likely*. I assume you mean average run time of O(n log n) in-place, non-stable, off-line. Most Sorting algorithms that improve on bubble sort average run time of O(n^2) like tim sort rely on the assumption that comparing 2 elements in a sub set will produce the same result in the super set. A slower variant of Quicksort would be a good approach for your range constraints. The worst case won't change but the average case will likely decrease and the algorithm will have the extra constraint of a valid sort existing.
Is ... O(n log n) sort is not guaranteed to find a valid order?
All popular sort algorithms I am aware of are guaranteed to find an order so long as there constraints are met. Formal analysis (concrete proof) is on each sort algorithems wikepedia page.
Is there other existing research on this class of problem?
Yes; there are many journals like IJCSEA with sorting research.
*but that depends on your average data set.
How can you determine whether there exists a set of values that satisfy certain given criteria. The criteria are in the form of intervals and the minimum value within that interval.
For instance, given the criteria:
Interval : Minimum value in that interval
{2, 2} : 5
{1, 4} : 1
{4, 4} : 4
One set of values that can satisfy it is
{1, 5, 1, 4}
On the other hand, given the critera:
Interval : Minimum value in that interval
{2, 3} : 1
{1, 4} : 2
{4, 4} : 4
There exists no such set of values that satisfies them.
I want to determine whether there exists a set of values that satisfy the given criteria (i.e., I only want to find an algorithm that returns true if there exists a set of values that do satisfy the criteria and false if there does not).
I know how to do this using an O(N^2) brute-force, but I want to achieve an O(N lgN) solution if possible.
My first attempt at solving this involved merging overlapping intervals and then checking the merged intervals for the lowest value, however I quickly realized that doing so does not necessarily guarantee a correct answer.
My second attempt involved segment trees, namely trying to assign values to each values and if you were trying to overwrite an interval then no such interval existed. However, this was also quickly abandoned as you can still achieve at a valid set of values even when some portions are overwritten.
My third attempt involved interval trees, trying to find the intersection points between two intervals and checking whether a valid set of values could be created. This seemed promising but was an O(N^2) algorithm so was also abandoned.
Can anyone provide some insight?
The idea to assign values is correct. You just need to sort intervals by their minumum value(in increasing order). That is, the solution looks like this:
Build a segment tree(with -infinity values in all nodes).
Process all intervals in sorted order. For each interval, just assign its value(no matter what there was before).
Run queries for all intervals to check that everything is correct.
The only non-trivial statement is: if this algorithm did not find a solution, there is no solution. I will not post a formal proof, but here are key observation:
We must assign a new value to the entire interval when we process them in sorted order.
We do not assign a new value anywhere else(that is, we cannot destroy the value for another interval accidently).