Neo4J - Finding the widest path on very large graphs

Neo4J - Finding the widest path on very large graphs - algorithm

I have created a very large directional weighted graph, and I'm trying to find the widest path between two points.
each edge has a count property
Here is a small portion of the graph:
I have found this example and modified the query, so the path collecting would be directional like so:
MATCH p = (v1:Vertex {name:'ENTRY'})-[:TRAVELED*]->(v2:Vertex {name:'EXIT'})
WITH p, EXTRACT(c IN RELATIONSHIPS(p) | c.count) AS counts
UNWIND(counts) AS b
WITH p, MIN(b) AS count
ORDER BY count DESC
RETURN NODES(p) AS `Widest Path`, count
LIMIT 1
This query seems to require an enormous amount of memory, and fails even on partial data.
Update: for classification, the query is running until running out of memory.
I've found this link, that combines the use of spark and neo4j. Unfortunately Mazerunner for Neo4j, does not support "widest path" algorithm out of the box. What would be the right approach to run the "widest path" query on a very large graph?

The reason your algorithm is taking a long time to run is because (a) you have a big graph, (b) your memory parameters probably need tweaking (see comments) and (c) you're enumerating every possible path between ENTRY and EXIT. Depending on what your graph is structured like, this could be a huge number of paths.
Note that if you're looking for the broadest path, then broadest is the largest/smallest weight on an edge. This means that you're probably computing and re-computing many paths you can ignore.
Wikipedia has good information on this algorithm you should consider. In particular:
It is possible to find maximum-capacity paths and minimax paths with a
single source and single destination very efficiently even in models
of computation that allow only comparisons of the input graph's edge
weights and not arithmetic on them.[12][18] The algorithm maintains a
set S of edges that are known to contain the bottleneck edge of the
optimal path; initially, S is just the set of all m edges of the
graph. At each iteration of the algorithm, it splits S into an ordered
sequence of subsets S1, S2, ... of approximately equal size; the
number of subsets in this partition is chosen in such a way that all
of the split points between subsets can be found by repeated
median-finding in time O(m). The algorithm then reweights each edge of
the graph by the index of the subset containing the edge, and uses the
modified Dijkstra algorithm on the reweighted graph; based on the
results of this computation, it can determine in linear time which of
the subsets contains the bottleneck edge weight. It then replaces S by
the subset Si that it has determined to contain the bottleneck weight,
and starts the next iteration with this new set S. The number of
subsets into which S can be split increases exponentially with each
step, so the number of iterations is proportional to the iterated
logarithm function, O(logn), and the total time is O(m logn).[18] In
a model of computation where each edge weight is a machine integer,
the use of repeated bisection in this algorithm can be replaced by a
list-splitting technique of Han & Thorup (2002), allowing S to be
split into O(√m) smaller sets Si in a single step and leading to a
linear overall time bound.
You should consider implementing this approach with cypher rather than your current "enumerate all paths" approach, as the "enumerate all paths" approach has you re-checking the same edge counts for as many paths as there are that involve that particular edge.
There's not ready-made software that will just do this for you, I'd recommend taking that paragraph (and checking its citations for further information) and then implementing that. I think performance wise you can do much better than your current query.

Some thoughts.
Your query (and the original example query) can be simplified. This may or may not be sufficient to prevent your memory issue.
For each matched path, there is no need to: (a) create a collection of counts, (b) UNWIND it into rows, and then (c) perform a MIN aggregation. The same result could be obtained by using the REDUCE function instead:
MATCH p = (v1:Vertex {name:'ENTRY'})-[:TRAVELED*]->(v2:Vertex {name:'EXIT'})
WITH p, REDUCE(m = 2147483647, c IN RELATIONSHIPS(p) | CASE WHEN c.count < m THEN c.count ELSE m END) AS count
ORDER BY count DESC
RETURN NODES(p) AS `Widest Path`, count
LIMIT 1;
(I assume that the count property value is an int. 2147483647 is the max int value.)
You should create an index (or, perhaps more appropriately, a uniqueness constraint) on the name property of the Vertex label. For example:
CREATE INDEX ON :Vertex(name)
EDITED
This enhanced version of the above query might solve your memory problem:
MERGE (t:Temp) SET t.count = 0, t.widest_path = NULL
WITH t
OPTIONAL MATCH p = (v1:Vertex {name:'ENTRY'})-[:TRAVELED*]->(v2:Vertex {name:'EXIT'})
WITH t, p, REDUCE(m = 2147483647, c IN RELATIONSHIPS(p) | CASE WHEN c.count < m THEN c.count ELSE m END) AS count
WHERE count > t.count
SET t.count = count, t.widest_path = NODES(p)
WITH COLLECT(DISTINCT t)[0] AS t
WITH t, t.count AS count, t.widest_path AS `Widest Path`
DELETE t
RETURN `Widest Path`, count;
It creates (and ultimately deletes) a temporary :Temp node to keep track of the currently "winning" count and (the corresponding path nodes). (You must make sure that the label Temp is not otherwise used.)
The WITH clause starting with COLLECT(DISTINCT t) uses aggregation of distinct :Temp nodes (of which there is only 1) to ensure that Cypher only keeps a single reference to the :Temp node, no matter how many paths satisfy the WHERE clause. Also, that WITH clause does NOT include p, so that Cypher does not accumulate paths that we do not care about. It is this clause that might be the most important in helping to avoid your memory issues.
I have not tried this out.

Related

Number of walks from source to sink with exactly h hops

Given an un-directed graph, a starting vertex and ending vertex. Find the number of walks (so a vertex can be visited more than once) from the source to the sink that involve exactly h hops. For example, if the graph is a triangle, the number of such paths with h hops is given by the h-th Jakobstahl number. This can be extended to a fully connected k-node graph, producing the recurrence (and closed form solution) here.
When the graph is an n-sided polygon, the accepted answer here expresses the number of walks as a sum of binomial terms.
I assume there might be an efficient algorithm for finding this number for any given graph? We can assume the graph is provided in adjacency matrix or adjacency list or any other convenient notation.

A solution to this would be to use a modified BFS with two alternating queues and a per-node counter for paths to this node of a certain length:
paths(start, end, n):
q = set(start)
q_next = set()
path_ct = map()
path_ct_next = map()
path_ct[start] = 1
for i in [0, n): # counting loop
for node in q: # queue loop
for a in adjacent(node): # neighbor-loop
path_ct_next[a] += path_ct[node]
q_next.add(a)
q = q_next
q_next = set()
path_ct = path_ct_next
path_ct_next = map()
return path_ct_next[end]
The basic assumption here is that map() produces a dictionary that returns zero, if the entry doesn't yet exist. Otherwise it returns the previously set value. The counting-loop simply takes care of doing exactly as many iterations as hops as required. The queue loop iterates over all nodes that can be reached using exactly i hops. In the neighbor-loop finally all nodes that can be reached in i + 1 hops are found. In this loop the adjacent nodes will be stored into a queue for the next iteration of counting-loop. The number of possible paths to reach such a node is the sum of the number of paths to reach it's predecessors. Once this is done for each node of the current iteration of counting-loop, the queues and tables are swapped/replaced by empty instances and the algorithm can start over.

If you take the adjacency matrix of a graph and raise it to the nth power, the resulting matrix counts the number of paths from each node to each other that uses exactly n edges. That would provide one way of computing the number you want - plus many others you aren’t all that interested in. :-)
Assuming the number of paths is “small” (say, something that fits into a 64-bit integer), you could use exponentiation by squaring to compute the matrix in O(log n) multiplies for a total cost of O(|V|ω log n), where ω is the exponent of the fastest matrix multiplication algorithm. However, if the quantity you’re looking for doesn’t fit into a machine word, then the cost of this approach will depend on how big the answer is as the multiplies will take variable amounts of time. For most graphs and small n this won’t be an issue, but if n is large and there are other parts of the graph that are densely connected this will slow down a bit.
Hope this helps!

You can make an algorithm that keep searching all possible paths , but with a variable that will contain your number of hops
For each possible path , each hop will decrement that variable and when arriving to zero your algorithm goes to trying another path , and if ever a path arrives to the target before making variable reachs zero , this path will be added to the list of your desired paths

Why is the time complexity O(lgN) for an operation in weighted union find algorithm?

So everywhere I see weighted union find algorithm, they use this approach:
Maintain an array pointing to the parent node of a particular node
Maintain an array denoting the size of the tree a node is in
For union (p,q) merge the smaller tree with larger
Time complexity here is O(lgN)
Now an optimzation on this is to flatten the trees, ie, whenever I am calculating the root of a particular node, set all nodes in that path to point to that root.
Time complexity of this is O(lg*N)
This I can understand, but what I don't get is why don't they start off with an array/hashset wherein nodes point to the root (instead of the immediate parent node)? That would bring the time complexity down to O(1).

I am going to assume that the time complexity you are asking for is the time to check whether 2 nodes belong to the same set.
The key is in how sets are joined, specifically you take the root of one set (the smaller one) and have it point to the root of the other set. Let the two sets have p and q as roots respectively, and |p| will represents the size of the set p if p is a root, while in general it will be the number of items whos set path goes through p (which is 1 + all its children).
We can without loss of generality assume that |p| <= |q| (otherwise we just exchange their names). We then have that |p u q| = |p|+|q| >= 2|p|. This shows us that each subtree in data-structure can at most be half as big as its parent, so given N items it can at most have the depth 1+lg N = O(lg(N)).
If the two choosen items are the furthest possible from the root, it will take O(N) operations to find the root for each of their sets, since you only need O(1) operations to move up one layer in the set, and then O(1) operations to compare those roots.
This cost is also applied to each union operation itself, since you need to figure out which two roots you need to merge. The reason we dont just have all nodes point directly to the root, is several-fold. First we would need to change all the nodes in the set every time we perform a union, secondly we only have edges pointing from the nodes toward the root and not the other way, so we would have to look through all nodes to find the ones we would need to change. The next reason is that we have good optimizations that can help in this kind of step and still work. Finally you could do such a step at the end if you really need to, but it would cost O(N lg(N)) time to perform it, which is compariable to how long it would take to run the entire algorithm by itself without the running short-cut optimization.

You are correct that the solution you suggest will bring down the time complexity of a Find operation to O(1). However, doing so will make a Union operation slower.
Imagine that you use an array/hashtable to remember the representative (or the root, as you call it) of each node. When you perform a union operation between two nodes x and y, you would either need to update all the nodes with the same representative as x to have y's representative, or vice versa. This way, union runs in O(min{|Sx|, |Sy|}), where Sxis the set of nodes with the same representative as x. These sets can be significantly larger than log n.
The weighted union algorithm on the other hand has O(log n) for both Find and
So it's a trade-off. If you expect to do many Find operations, but few Union operations, you should use the solution you suggest. If you expect to do many of each, you can use the weighted union algorithm to avoid excessively slow operations.

Pathfinding task - how can I find next vertex on the shortest path from A to B faster that O ( n )?

I have a quite tricky task to solve:
You are given a N * M board (1 <= N, M <= 256). You can move from each field to it's neighbouring field (moving diagonally is not allowed). At the beginning, there are two types of fields: active and blocked. You can pass through active field, but you can't go on the blocked one. You have Q queries (1 <= Q <= 200). There are two types of queries:
1) find the next field (neighbouring to A) that lies on the shortest path from field A to B
2) change field A from active to blocked or conversly.
The first type query can be easily solved with simple BFS in O(N * M) time. We can represent active and blocked fields as 0 or 1, so the second query could be done in constant time.
The total time of that algorithm would be O(Q (number of queries) * N * M).
So what's the problem? I have a 1/60 second to solve all the queries. If we consider 1 second as 10^8 calculations, we are left with about 1,5 * 10^6 calculations. One BFS may take up to N * M * 4 time, which is about 2,5 * 10^5. So if Q is 200, the needed calculations may be up to 5 * 10^7, which is way too slow.
As far as I know, there is no better pathfinding algorithms than BFS in this case (well, I could go for an A*, but I'm not sure if it's much quicker than BFS, it's still worst-case O(|E|) - according to Wikipedia ). So there's not much to optimize in this area. However, I could change my graph in some way to reduce the amount of edges that the algorithm would have to process (I don't need to know the full shortest path, only the next move I should make, so the rest of the shortest path can be very simplified). I was thinking about some preprocessing - grouping vertices in a groups and making a graph of graphs, but I'm not sure how to handle the blocked fields in that way.
How can I optimize it better? Or is it even possible?
EDIT: The actual problem: I have some units on the board. I want to start moving them to the selected destination. Units can't share the same field, so one can block others' paths or open a new, better paths for them. There can be a lot of units, that's why I need a better optimization.

If I understand the problem correctly, you want to find the shortest path on a grid from A to B, with the added ability that your path-finder can remove walls for an additional movement cost?
You can treat this as a directed graph problem, where you can move into any wall-node for a cost of 2, and into any normal node for a cost of 1. Then just use any directed-graph pathfinding algorithm such as Dijkstra's or A* (the usual heuristic, manhatten distance, will still work)

Are there sorting algorithms that respect final position restrictions and run in O(n log n) time?

I'm looking for a sorting algorithm that honors a min and max range for each element1. The problem domain is a recommendations engine that combines a set of business rules (the restrictions) with a recommendation score (the value). If we have a recommendation we want to promote (e.g. a special product or deal) or an announcement we want to appear near the top of the list (e.g. "This is super important, remember to verify your email address to participate in an upcoming promotion!") or near the bottom of the list (e.g. "If you liked these recommendations, click here for more..."), they will be curated with certain position restriction in place. For example, this should always be the top position, these should be in the top 10, or middle 5 etc. This curation step is done ahead of time and remains fixed for a given time period and for business reasons must remain very flexible.
Please don't question the business purpose, UI or input validation. I'm just trying to implement the algorithm in the constraints I've been given. Please treat this as an academic question. I will endeavor to provide a rigorous problem statement, and feedback on all other aspects of the problem is very welcome.
So if we were sorting chars, our data would have a structure of
struct {
char value;
Integer minPosition;
Integer maxPosition;
}
Where minPosition and maxPosition may be null (unrestricted). If this were called on an algorithm where all positions restrictions were null, or all minPositions were 0 or less and all maxPositions were equal to or greater than the size of the list, then the output would just be chars in ascending order.
This algorithm would only reorder two elements if the minPosition and maxPosition of both elements would not be violated by their new positions. An insertion-based algorithm which promotes items to the top of the list and reorders the rest has obvious problems in that every later element would have to be revalidated after each iteration; in my head, that rules out such algorithms for having O(n3) complexity, but I won't rule out such algorithms without considering evidence to the contrary, if presented.
In the output list, certain elements will be out of order with regard to their value, if and only if the set of position constraints dictates it. These outputs are still valid.
A valid list is any list where all elements are in a position that does not conflict with their constraints.
An optimal list is a list which cannot be reordered to more closely match the natural order without violating one or more position constraint. An invalid list is never optimal. I don't have a strict definition I can spell out for 'more closely matching' between one ordering or another. However, I think it's fairly easy to let intuition guide you, or choose something similar to a distance metric.
Multiple optimal orderings may exist if multiple inputs have the same value. You could make an argument that the above paragraph is therefore incorrect, because either one can be reordered to the other without violating constraints and therefore neither can be optimal. However, any rigorous distance function would treat these lists as identical, with the same distance from the natural order and therefore reordering the identical elements is allowed (because it's a no-op).
I would call such outputs the correct, sorted order which respects the position constraints, but several commentators pointed out that we're not really returning a sorted list, so let's stick with 'optimal'.
For example, the following are a input lists (in the form of <char>(<minPosition>:<maxPosition>), where Z(1:1) indicates a Z that must be at the front of the list and M(-:-) indicates an M that may be in any position in the final list and the natural order (sorted by value only) is A...M...Z) and their optimal orders.
Input order
A(1:1) D(-:-) C(-:-) E(-:-) B(-:-)
Optimal order
A B C D E
This is a trivial example to show that the natural order prevails in a list with no constraints.
Input order
E(1:1) D(2:2) C(3:3) B(4:4) A(5:5)
Optimal order
E D C B A
This example is to show that a fully constrained list is output in the same order it is given. The input is already a valid and optimal list. The algorithm should still run in O(n log n) time for such inputs. (Our initial solution is able to short-circuit any fully constrained list to run in linear time; I added the example both to drive home the definitions of optimal and valid and because some swap-based algorithms I considered handled this as the worse case.)
Input order
E(1:1) C(-:-) B(1:5) A(4:4) D(2:3)
Optimal Order
E B D A C
E is constrained to 1:1, so it is first in the list even though it has the lowest value. A is similarly constrained to 4:4, so it is also out of natural order. B has essentially identical constraints to C and may appear anywhere in the final list, but B will be before C because of value. D may be in positions 2 or 3, so it appears after B because of natural ordering but before C because of its constraints.
Note that the final order is correct despite being wildly different from the natural order (which is still A,B,C,D,E). As explained in the previous paragraph, nothing in this list can be reordered without violating the constraints of one or more items.
Input order
B(-:-) C(2:2) A(-:-) A(-:-)
Optimal order
A(-:-) C(2:2) A(-:-) B(-:-)
C remains unmoved because it already in its only valid position. B is reordered to the end because its value is less than both A's. In reality, there will be additional fields that differentiate the two A's, but from the standpoint of the algorithm, they are identical and preserving OR reversing their input ordering is an optimal solution.
Input order
A(1:1) B(1:1) C(3:4) D(3:4) E(3:4)
Undefined output
This input is invalid for two reasons: 1) A and B are both constrained to position 1 and 2) C, D, and E are constrained to a range than can only hold 2 elements. In other words, the ranges 1:1 and 3:4 are over-constrained. However, the consistency and legality of the constraints are enforced by UI validation, so it's officially not the algorithms problem if they are incorrect, and the algorithm can return a best-effort ordering OR the original ordering in that case. Passing an input like this to the algorithm may be considered undefined behavior; anything can happen. So, for the rest of the question...
All input lists will have elements that are initially in valid positions.
The sorting algorithm itself can assume the constraints are valid and an optimal order exists.2
We've currently settled on a customized selection sort (with runtime complexity of O(n2)) and reasonably proved that it works for all inputs whose position restrictions are valid and consistent (e.g. not overbooked for a given position or range of positions).
Is there a sorting algorithm that is guaranteed to return the optimal final order and run in better than O(n2) time complexity?3
I feel that a library standard sorting algorithm could be modified to handle these constrains by providing a custom comparator that accepts the candidate destination position for each element. This would be equivalent to the current position of each element, so maybe modifying the value holding class to include the current position of the element and do the extra accounting in the comparison (.equals()) and swap methods would be sufficient.
However, the more I think about it, an algorithm that runs in O(n log n) time could not work correctly with these restrictions. Intuitively, such algorithms are based on running n comparisons log n times. The log n is achieved by leveraging a divide and conquer mechanism, which only compares certain candidates for certain positions.
In other words, input lists with valid position constraints (i.e. counterexamples) exist for any O(n log n) sorting algorithm where a candidate element would be compared with an element (or range in the case of Quicksort and variants) with/to which it could not be swapped, and therefore would never move to the correct final position. If that's too vague, I can come up with a counter example for mergesort and quicksort.
In contrast, an O(n2) sorting algorithm makes exhaustive comparisons and can always move an element to its correct final position.
To ask an actual question: Is my intuition correct when I reason that an O(n log n) sort is not guaranteed to find a valid order? If so, can you provide more concrete proof? If not, why not? Is there other existing research on this class of problem?
1: I've not been able to find a set of search terms that points me in the direction of any concrete classification of such sorting algorithm or constraints; that's why I'm asking some basic questions about the complexity. If there is a term for this type of problem, please post it up.
2: Validation is a separate problem, worthy of its own investigation and algorithm. I'm pretty sure that the existence of a valid order can be proven in linear time:
Allocate array of tuples of length equal to your list. Each tuple is an integer counter k and a double value v for the relative assignment weight.
Walk the list, adding the fractional value of each elements position constraint to the corresponding range and incrementing its counter by 1 (e.g. range 2:5 on a list of 10 adds 0.4 to each of 2,3,4, and 5 on our tuple list, incrementing the counter of each as well)
Walk the tuple list and
If no entry has value v greater than the sum of the series from 1 to k of 1/k, a valid order exists.
If there is such a tuple, the position it is in is over-constrained; throw an exception, log an error, use the doubles array to correct the problem elements etc.
Edit: This validation algorithm itself is actually O(n2). Worst case, every element has the constraints 1:n, you end up walking your list of n tuples n times. This is still irrelevant to the scope of the question, because in the real problem domain, the constraints are enforced once and don't change.
Determining that a given list is in valid order is even easier. Just check each elements current position against its constraints.
3: This is admittedly a little bit premature optimization. Our initial use for this is for fairly small lists, but we're eyeing expansion to longer lists, so if we can optimize now we'd get small performance gains now and large performance gains later. And besides, my curiosity is piqued and if there is research out there on this topic, I would like to see it and (hopefully) learn from it.

On the existence of a solution: You can view this as a bipartite digraph with one set of vertices (U) being the k values, and the other set (V) the k ranks (1 to k), and an arc from each vertex in U to its valid ranks in V. Then the existence of a solution is equivalent to the maximum matching being a bijection. One way to check for this is to add a source vertex with an arc to each vertex in U, and a sink vertex with an arc from each vertex in V. Assign each edge a capacity of 1, then find the max flow. If it's k then there's a solution, otherwise not.
http://en.wikipedia.org/wiki/Maximum_flow_problem
--edit-- O(k^3) solution: First sort to find the sorted rank of each vertex (1-k). Next, consider your values and ranks as 2 sets of k vertices, U and V, with weighted edges from each vertex in U to all of its legal ranks in V. The weight to assign each edge is the distance from the vertices rank in sorted order. E.g., if U is 10 to 20, then the natural rank of 10 is 1. An edge from value 10 to rank 1 would have a weight of zero, to rank 3 would have a weight of 2. Next, assume all missing edges exist and assign them infinite weight. Lastly, find the "MINIMUM WEIGHT PERFECT MATCHING" in O(k^3).
http://www-math.mit.edu/~goemans/18433S09/matching-notes.pdf
This does not take advantage of the fact that the legal ranks for each element in U are contiguous, which may help get the running time down to O(k^2).

Here is what a coworker and I have come up with. I think it's an O(n2) solution that returns a valid, optimal order if one exists, and a closest-possible effort if the initial ranges were over-constrained. I just tweaked a few things about the implementation and we're still writing tests, so there's a chance it doesn't work as advertised. This over-constrained condition is detected fairly easily when it occurs.
To start, things are simplified if you normalize your inputs to have all non-null constraints. In linear time, that is:
for each item in input
if an item doesn't have a minimum position, set it to 1
if an item doesn't have a maximum position, set it to the length of your list
The next goal is to construct a list of ranges, each containing all of the candidate elements that have that range and ordered by the remaining capacity of the range, ascending so ranges with the fewest remaining spots are on first, then by start position of the range, then by end position of the range. This can be done by creating a set of such ranges, then sorting them in O(n log n) time with a simple comparator.
For the rest of this answer, a range will be a simple object like so
class Range<T> implements Collection<T> {
int startPosition;
int endPosition;
Collection<T> items;
public int remainingCapacity() {
return endPosition - startPosition + 1 - items.size();
}
// implement Collection<T> methods, passing through to the items collection
public void add(T item) {
// Validity checking here exposes some simple cases of over-constraining
// We'll catch these cases with the tricky stuff later anyways, so don't choke
items.add(item);
}
}
If an element A has range 1:5, construct a range(1,5) object and add A to its elements. This range has remaining capacity of 5 - 1 + 1 - 1 (max - min + 1 - size) = 4. If an element B has range 1:5, add it to your existing range, which now has capacity 3.
Then it's a relatively simple matter of picking the best element that fits each position 1 => k in turn. Iterate your ranges in their sorted order, keeping track of the best eligible element, with the twist that you stop looking if you've reached a range that has a remaining size that can't fit into its remaining positions. This is equivalent to the simple calculation range.max - current position + 1 > range.size (which can probably be simplified, but I think it's most understandable in this form). Remove each element from its range as it is selected. Remove each range from your list as it is emptied (optional; iterating an empty range will yield no candidates. That's a poor explanation, so lets do one of our examples from the question. Note that C(-:-) has been updated to the sanitized C(1:5) as described in above.
Input order
E(1:1) C(1:5) B(1:5) A(4:4) D(2:3)
Built ranges (min:max) <remaining capacity> [elements]
(1:1)0[E] (4:4)0[A] (2:3)1[D] (1:5)3[C,B]
Find best for 1
Consider (1:1), best element from its list is E
Consider further ranges?
range.max - current position + 1 > range.size ?
range.max = 1; current position = 1; range.size = 1;
1 - 1 + 1 > 1 = false; do not consider subsequent ranges
Remove E from range, add to output list
Find best for 2; current range list is:
(4:4)0[A] (2:3)1[D] (1:5)3[C,B]
Consider (4:4); skip it because it is not eligible for position 2
Consider (2:3); best element is D
Consider further ranges?
3 - 2 + 1 > 1 = true; check next range
Consider (2:5); best element is B
End of range list; remove B from range, add to output list
An added simplifying factor is that the capacities do not need to be updated or the ranges reordered. An item is only removed if the rest of the higher-sorted ranges would not be disturbed by doing so. The remaining capacity is never checked after the initial sort.
Find best for 3; output is now E, B; current range list is:
(4:4)0[A] (2:3)1[D] (1:5)3[C]
Consider (4:4); skip it because it is not eligible for position 3
Consider (2:3); best element is D
Consider further ranges?
same as previous check, but current position is now 3
3 - 3 + 1 > 1 = false; don't check next range
Remove D from range, add to output list
Find best for 4; output is now E, B, D; current range list is:
(4:4)0[A] (1:5)3[C]
Consider (4:4); best element is A
Consider further ranges?
4 - 4 + 1 > 1 = false; don't check next range
Remove A from range, add to output list
Output is now E, B, D, A and there is one element left to be checked, so it gets appended to the end. This is the output list we desired to have.
This build process is the longest part. At its core, it's a straightforward n2 selection sorting algorithm. The range constraints only work to shorten the inner loop and there is no loopback or recursion; but the worst case (I think) is still sumi = 0 n(n - i), which is n2/2 - n/2.
The detection step comes into play by not excluding a candidate range if the current position is beyond the end of that ranges max position. You have to track the range your best candidate came from in order to remove it, so when you do the removal, just check if the position you're extracting the candidate for is greater than that ranges endPosition.
I have several other counter-examples that foiled my earlier algorithms, including a nice example that shows several over-constraint detections on the same input list and also how the final output is closest to the optimal as the constraints will allow. In the mean time, please post any optimizations you can see and especially any counter examples where this algorithm makes an objectively incorrect choice (i.e. arrives at an invalid or suboptimal output when one exists).
I'm not going to accept this answer, because I specifically asked if it could be done in better than O(n2). I haven't wrapped my head around the constraints satisfaction approach in #DaveGalvin's answer yet and I've never done a maximum flow problem, but I thought this might be helpful for others to look at.
Also, I discovered the best way to come up with valid test data is to start with a valid list and randomize it: for 0 -> i, create a random value and constraints such that min < i < max. (Again, posting it because it took me longer than it should have to come up with and others might find it helpful.)

Not likely*. I assume you mean average run time of O(n log n) in-place, non-stable, off-line. Most Sorting algorithms that improve on bubble sort average run time of O(n^2) like tim sort rely on the assumption that comparing 2 elements in a sub set will produce the same result in the super set. A slower variant of Quicksort would be a good approach for your range constraints. The worst case won't change but the average case will likely decrease and the algorithm will have the extra constraint of a valid sort existing.
Is ... O(n log n) sort is not guaranteed to find a valid order?
All popular sort algorithms I am aware of are guaranteed to find an order so long as there constraints are met. Formal analysis (concrete proof) is on each sort algorithems wikepedia page.
Is there other existing research on this class of problem?
Yes; there are many journals like IJCSEA with sorting research.
*but that depends on your average data set.

Combinatorial best match

Say I have a Group data structure which contains a list of Element objects, such that each group has a unique set of elements.:
public class Group
{
public List<Element> Elements;
}
and say I have a list of populations who require certain elements, in such a way that each population has a unique set of required elements:
public class Population
{
public List<Element> RequiredElements;
}
I have an unlimited quantity of each defined Group, i.e. they are not consumed by populations.
Say I am looking at a particular Population. I want to find the best possible match of groups such that there is minimum excess elements, and no unmatched elements.
For example: I have a population which needs wood, steel, grain, and coal. The only groups available are {wood, herbs}, {steel, coal, oil}, {grain, steel}, and {herbs, meat}.
The last group - {herbs, meat} isn't required at all by my population so it isn't used. All others are needed, but herbs and oil are not required so it is wasted. Furthermore, steel exists twice in the minimum set, so one lot of steel is also wasted. The best match in this example has a wastage of 3.
So for a few hundred Population objects, I need to find the minimum wastage best match and compute how many elements are wasted.
How do I even begin to solve this? Once I have found a match, counting the wastage is trivial. Finding the match in the first place is hard. I could enumerate all possibilities but with a few thousand populations and many hundreds of groups, it's quite a task. Especially considering this whole thing sits inside each iteration of a simulated annealing algorithm.
I'm wondering whether I can formulate the whole thing as a mixed-integer program and call a solver like GLPK at each iteration.
I hope I have explained the problem correctly. I can clarify anything that's unclear.
Here's my binary program, for those of you interested...
x is the decision vector, an element of {0,1}, which says that the population in question does/doesn't receive from group i. There is an entry for each group.
b is the column vector, an element of {0,1}, which says which resources the population in question does/doesn't need. There is an entry for each resource.
A is a matrix, an element of {0,1}, which says what resources are in what groups.
The program is:
Minimise: ((Ax - b)' * 1-vector) + (x' * 1-vector);
Subject to: Ax >= b;
The constraint just says that all required resources must be satisfied. The objective is to minimise all excess and the total number of groups used. (i.e. 0 excess with 1 group used is better than 0 excess with 5 groups used).

You can formulate an integer program for each population P as follows. Use a binary variable xj to denote whether group j is chosen or not. Let A be a binary matrix, such that Aij is 1 if and only if item i is present in group j. Then the integer program is:
min Ei,j (xjAij)
s.t. Ej xjAij >= 1 for all i in P.
xj = 0, 1 for all j.
Note that you can obtain the minimum wastage by subtracting |P| from the optimal solution of the above IP.

Do you mean the Maximum matching problem?
You need to build a bipartite graph, where one of the sides is your populations and the other is groups, and edge exists between group A and population B if it have it in its set.
To find maximum edge matching you can easily use Kuhn algorithm, which is greatly described here on TopCoder.
But, if you want to find mimimum edge dominating set (the set of minimum edges that is covering all the vertexes), the problem becomes NP-hard and can't be solved in polynomial time.

Take a look at the weighted set cover problem, I think this is exactly what you described above. A basic description of the (unweighted) problem can be found here.
Finding the minimal waste as you defined above is equivalent to finding a set cover such that the sum of the cardinalities of the covering sets is minimal. Hence, the weight of each set (=a group of elements) has to be defined equal to its cardinality.
Since even the unweighted the set cover problem is NP-complete, it is not likely that an efficient algorithm for your problem instances exist. Maybe a good greedy approximation algorithm will be sufficient or your purpose? Googling weighted set cover provides several promising results, e.g. this script.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio