Need some help understanding this problem about maximizing graph connectivity - algorithm

I was wondering if someone could help me understand this problem. I prepared a small diagram because it is much easier to explain it visually.
[Diagram: http://img179.imageshack.us/img179/4315/pon.jpg]
Problem I am trying to solve:
1. Constructing the dependency graph
Given the connectivity of the graph and a metric that determines how strongly one node depends on another, order the dependencies. For instance, I could put in a few rules saying that
node 3 depends on node 4
node 2 depends on node 3
node 3 depends on node 5
But because the final rule is not "valuable" (again based on the same metric), I will not add the rule to my system.
2. Execute the request order
Once I have built the dependency graph, execute the list in an order that maximizes the final connectivity. I am not sure if this is really a problem, but I have a feeling that there might exist more than one valid order, in which case it is required to choose the best one.
First and foremost, I am wondering if I constructed the problem correctly and if I should be aware of any corner cases. Secondly, is there a closely related algorithm that I can look at? Currently, I am thinking of something like Feedback Arc Set or the Secretary Problem, but I am a little confused at the moment. Any suggestions?
PS: I am a little confused about the problem myself, so please don't flame me for that. If any clarifications are needed, I will try to update the question.

It looks like you are trying to determine an ordering on the requests you send to nodes, with dependencies (a "partial ordering", in search-engine terms) between nodes.
If you google "partial order dependency graph", the top results should give you enough information to figure out a good solution.
In general, you want to sort the nodes in such a way that nodes come after their dependencies; AKA topological sort.
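For what it's worth, Python's standard library ships a ready-made topological sort (graphlib, Python 3.9+). A minimal sketch using the rules from the question, where the graph maps each node to the set of nodes it depends on:
from graphlib import TopologicalSorter

graph = {3: {4, 5}, 2: {3}}  # node 3 depends on 4 and 5; node 2 depends on 3
print(list(TopologicalSorter(graph).static_order()))  # e.g. [4, 5, 3, 2]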

I'm a bit confused by your ordering constraints vs. the graphs you picture: nothing matches up. That said, it sounds like you have soft ordering constraints (A should come before B, but doesn't have to) with costs for violating each constraint. Finding an optimal schedule for that is NP-hard, but I bet you could get a pretty good schedule using a DFS biased towards large-weight edges, then deleting all the back edges.
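A rough Python sketch of that heuristic (my interpretation of the idea above, not a canonical algorithm): run a DFS that explores the heaviest constraints first, take the reverse postorder, then keep only the edges the resulting order satisfies:
def weighted_dfs_order(adj, weight):
    """adj: node -> list of successors ("u should come before v");
    weight: (u, v) -> cost of violating that soft constraint."""
    order, seen = [], set()

    def visit(u):
        seen.add(u)
        # follow the most expensive constraints first
        for v in sorted(adj.get(u, []), key=lambda v: -weight[(u, v)]):
            if v not in seen:
                visit(v)
        order.append(u)  # postorder

    for u in list(adj):
        if u not in seen:
            visit(u)
    order.reverse()  # reverse postorder gives a topological-ish order
    pos = {u: i for i, u in enumerate(order)}
    # drop the "back edges": constraints this order violates
    kept = [(u, v) for u in adj for v in adj[u] if pos[u] < pos[v]]
    return order, kept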

If you know in advance the dependencies of each node, you can easily build layers.
It's amusing, but I faced the very same problem when organizing... the compilation of the different modules of my application :)
The idea is simple:
def buildLayers(nodes):
    layers = []
    n = nodes[:]  # work on a copy of the list
    while n:
        layer = _buildRec(layers, n)
        if not layer:
            raise RuntimeError('Cyclic dependency')
        for node in layer:
            n.remove(node)
        layers.append(layer)
    return layers

def _buildRec(layers, nodes):
    """Build the next layer by selecting nodes whose dependencies
    already appear in `layers`."""
    placed = [node for layer in layers for node in layer]  # flatten the layers
    return [n for n in nodes if all(d in placed for d in n.dependencies)]
Then you can pop the layers one at a time, and each time you'll be able to send the request to each of the nodes of this layer in parallel.
If you keep a set of the already placed nodes, and the dependencies are also represented as sets, the membership check is more efficient. Other implementations would use event propagation to avoid all those nested loops...
Notice that in the worst case this is O(n³), but I only had some thirty components and they're not THAT related :p

Related

What's the difference between effect and control edges of V8's TurboFan?

I've read many blog posts, articles, presentations and videos, and even inspected V8's source code, both the bytecode generator, the sea-of-nodes graph generator and the optimization phases, and still couldn't find an answer.
V8's optimizing compiler, TurboFan, uses an IR of the "sea of nodes" type. All of the academic articles I found about it say that it's basically a CFG combined with a data-flow graph, and as such has two types of edges to connect nodes: data edges and control edges. Basically, if you take only the data edges you get a data-flow graph, while if you take only the control edges you get a control-flow graph.
However, TurboFan has one more edge type: "effect edges" (and effect phis). I suppose that this is what this slide means when it says that this is not a "sea" of nodes but a "soup" of nodes, because I couldn't find this term anywhere else. From what I understood, effect edges help the compiler keep the order of statements/expressions that, if reordered, would have a visible side effect. The example everyone uses is o.f = o.f + 1: the load has to come before the store (or we'll read the new value), and the addition has to come before the store, too (otherwise we'll store the old value and uselessly increment the result).
But I cannot understand: isn't this the goal of control edges? From searching through the code I can see that almost every node has an effect edge and a control edge. Their use isn't obvious, though. From what I understand, in a sea of nodes you use control edges to constrain evaluation order. So why do we need both effect and control edges? I'm probably missing something fundamental, but I can't find it.
TL;DR: What's the use of effect edges and EffectPhi nodes, and how are they different from control edges?
Many thanks.
The idea of a sea-of-nodes compiler is that IR nodes have maximum freedom to move around. That's why in a snippet like o.f = o.f + 1, the sequence load-add-store is not considered "control flow". Only conditions and branches are. So if we slightly extend the example:
if (condition) {
o.f = o.f + 1;
} else {
// something else
}
then, as you describe in the question, effect dependencies ensure that load-add-store are scheduled in this order, and control dependencies ensure that all three operations are only performed if condition is true (technically, if the true-branch of the if-condition is taken). Note that this is important even for the load; for instance, o might be an invalid object if condition is false, and attempting to load its f property might cause a segfault in that case.

Multiple depot vehicle scheduling

I have been playing around with algorithms and ILP for the single depot vehicle scheduling problem (SDVSP) and now want to extend my knowledge towards the multiple depot vehicle scheduling problem (MDVSP), as I would like to use this knowledge in a project of mine.
As for the question: I've found and implemented several algorithms for the MDVSP. However, one question I am very curious about is how to go about determining the number of needed depots (and their locations, to an extent). Sadly, I haven't been able to find any resources that do not assume/require that the depots are fixed in advance. Thus my question would be: how would I be able to approach an MDVSP in which I can determine the number and locations of the depots?
(Edit) To clarify:
Assume we are given a set of trips T1, T2, ..., Tn as usual in an SDVSP or MDVSP. Multiple trips can be driven in succession before returning to a depot. Leaving and returning to depots usually only happens at the start and end of a day. But as an extension to the normal problems, we can now determine the number and locations of our depots, as opposed to having fixed depots.
The objective is to find a solution in which all trips are driven at minimal cost. The cost consists of the amount of deadhead (the distance the car has to travel between trips, and from and to the depots), a fixed cost K per car, and a fixed cost C per depot.
I hope this clears up the question somewhat.
The standard approach involves adding |V| binary variables to the ILP, one for each node, where x_i = 1 if v_i is a depot and 0 otherwise.
However, the way the question is currently articulated, all x_i values will come out to be zero, since there is no "advantage" to making a node a depot and the total cost = (other cost factors) + sum_i x_i * FIXED_COST_PER_DEPOT.
Perhaps the question needs to be updated with some other constraint about the range of the car. For example, a car can only go so and so miles before returning to a depot.
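To make the binary-variable idea concrete, here is a minimal facility-location-style sketch in PuLP (all data, names and costs are made up for illustration; it models only depot opening costs plus per-trip deadhead, not the full MDVSP). The assignment constraint is what gives x_i an "advantage" to be 1:
import pulp

sites = ["s0", "s1", "s2"]            # candidate depot locations
trips = ["t0", "t1", "t2", "t3"]
C = 100.0                             # fixed cost per opened depot
# dist[s][t]: deadhead cost of serving trip t from site s (made-up numbers)
dist = {s: {t: 10.0 * (i + j + 1) for j, t in enumerate(trips)}
        for i, s in enumerate(sites)}

prob = pulp.LpProblem("depot_selection", pulp.LpMinimize)
x = pulp.LpVariable.dicts("open", sites, cat="Binary")             # x[s] = 1 iff s is a depot
y = pulp.LpVariable.dicts("serve", (sites, trips), cat="Binary")   # y[s][t] = trip t served from s

prob += (pulp.lpSum(C * x[s] for s in sites)
         + pulp.lpSum(dist[s][t] * y[s][t] for s in sites for t in trips))
for t in trips:
    prob += pulp.lpSum(y[s][t] for s in sites) == 1   # every trip is served
for s in sites:
    for t in trips:
        prob += y[s][t] <= x[s]                       # only open depots serve trips

prob.solve()
print([s for s in sites if x[s].value() == 1])        # the chosen depots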

n! combinations, how to find best one without killing computer?

I'll get straight to it. I'm working on a web or phone app that is responsible for scheduling. I want students to input courses they took, and I give them possible combinations of courses they should take that fit their requirements.
However, let's say there are 150 courses that fit their requirements and they're looking for 3 courses. That would be 150C3 combinations, right?
Would it be feasible to run something like this in browser or a mobile device?
First of all you need a smarter algorithm which can prune the search tree (a sketch follows below). Also, if you are doing this for the same set of courses over and over again, doing the computation on the server would be better, and precomputing a suitable data structure can reduce the execution time of the queries. For example, you can create a tree where each sub-tree under a node contains nodes that are 'compatible'.
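As a sketch of what "pruning the search tree" can look like (compatible is a stand-in for whatever the real feasibility check is, e.g. no timetable clash; the names are illustrative):
def combos(courses, k, compatible, chosen=()):
    """Yield k-course combinations, abandoning a branch as soon as the
    partial selection is already infeasible."""
    if len(chosen) == k:
        yield chosen
        return
    for i, c in enumerate(courses):
        if all(compatible(c, prev) for prev in chosen):  # prune early
            yield from combos(courses[i + 1:], k, compatible, chosen + (c,))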
Sounds to me like you're viewing this completely wrong. At most institutions there are 1) curriculum requirements for graduation, and 2) prerequisites for many requirements and electives. This isn't a pure combinatorial problem, it's a dependency tree. For instance, if Course 201, Course 301, and Course 401 are all required for the student's major, higher numbers have the lower numbered ones as prereqs, and the student is a Junior, you should be strongly recommending that Course 201 be taken ASAP.
Yay, mathematics I think I can handle!
If there are 150 courses, and you have to choose 3, then the number of possibilities is (150*149*148)/(3*2*1) = 551,300 (correction per jerry), which is certainly better than 150 factorial, which has a whole lot more zeros ;)
Now, you really don't want to build an array that size, and you don't have to! All web languages can pick a random element from an array, so you just request 3 random unique entries from it.
While the number of potential course combinations is very large, based on your post I see no reason to even attempt to calculate them. The task of randomly selecting k items from an n-sized list is delightfully trivial, even for old, slow devices!
Is there any particular reason you'd need to calculate all the potential course combinations, instead of just grab-bagging one random selection as a suggestion? If not, problem solved!
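For example, in Python the grab-bag selection is a one-liner (the course list here is just a placeholder):
import random

courses = ["course_%d" % i for i in range(150)]  # placeholder list of 150 courses
print(random.sample(courses, 3))                 # 3 distinct courses, chosen uniformly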
Option 1 (time/space costly): let the user on the mobile phone browse the list of all 150C3 possible choices, page by page, with the processing done server-side.
Option 2 (simple): instead of the huge decision tree, provide a 150-item bag; whenever the student chooses an item from the bag, remove it from the bag.
Option 3 (complex): expand your decision tree (possible choices) using a dependency tree (a parent course requires its child courses), the list of courses the student has already taken, and his track/level.
As far as I know, most educational systems use the third option, which requires having a profile for the student.

Implementing D* Lite for Path-Planning - How detect Edge Cost Change?

I am currently trying to implement the D* Lite algorithm for path-planning (see also here) to get a grasp of it. I found two implementations on the web, both for C/C++, but somehow couldn't entirely follow the ideas, since they seem to differ more than expected from the pseudo code in the papers. I especially use the following two papers:
Koenig, S.; Likhachev, M. - D* Lite, 2002
Koenig & Likhachev - Fast Replanning for Navigation in Unknown Terrain, IEEE Transactions on Robotics, Vol. 21, No. 3, June 2005
I tried implementing the optimized version of D* Lite from the first paper (p. 5, Fig. 4), and for "debugging" I use the example as shown and explained in the second paper (p. 6, Fig. 6 and Fig. 7). All work is done in MATLAB (easier for exchanging code with others).
So far I got the code running to find the initial shortest path by running computeShortestPath() once. But now I am stuck at lines {36''} and {37''} of the pseudo-code:
{36”} Scan graph for changed edge costs;
{37”} if any edge costs changed
How do I detect those changes? I don't seem to have a grasp on how this detection is supposed to work. In my implementation so far, I mainly use three matrices:
one matrix of the same size as the grid map containing all rhs-values; one matrix of the same size containing all g-values; and one matrix with a variable row count for the priority queue, with the first two columns being the priority keys and the third and fourth columns being the x- and y-coordinates.
Comparing my results, I get the same result for the first run of computeShortestPath() in Step 5 as shown in the second paper, p. 6, Fig. 6. Moving the robot one step isn't a problem either. But I really have no clue how the next step, scanning for changed edge costs, should be implemented.
Thanks for any hint, advice and/or help!!!
The following was pointed out to me by someone else:
In real-world code, you almost never have to "scan the graph for changes." Your graph only changes when you change it in the code, so you already know exactly when and where it can change!
One common way of implementing this is to have an OnGraphChanged callback in your Graph class, which can be set up to call the OnGraphChanged method in your PathFinder class. Then anywhere the graph changes in your Graph class, make sure the OnGraphChanged callback is called.
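A minimal sketch of that callback idea (in Python for brevity; the class and method names are illustrative, not taken from the D* Lite papers):
class Graph:
    def __init__(self):
        self.edge_cost = {}     # (u, v) -> cost
        self.on_changed = None  # callback registered by the planner

    def set_cost(self, u, v, cost):
        old = self.edge_cost.get((u, v))
        self.edge_cost[(u, v)] = cost
        if self.on_changed is not None and old != cost:
            self.on_changed(u, v, old, cost)  # notify the planner

class PathFinder:
    def __init__(self, graph):
        self.changed_edges = []
        graph.on_changed = self.on_graph_changed

    def on_graph_changed(self, u, v, old, new):
        # stands in for lines {36''}/{37''}: record the change so the
        # affected vertices can be updated instead of rescanning the graph
        self.changed_edges.append((u, v, old, new))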
I personally implemented it by using a "true map" and a "known map", and after every move letting the robot check/scan all possible next successors and comparing their costs on the true map and the known map.

Time Aware Social Graph DS/Queries

Classic social networks can be represented as a graph/matrix.
With a graph/matrix one can easily compute
shortest path between 2 participants
reachability from A -> B
general statistics (reciprocity, avg connectivity, etc)
etc
Is there an ideal data structure (or a modification to graph/matrix) that enables easy computation of the above while being time aware?
For example,
Input
t = 0...100
A <-> B (while t = 0...10)
B <-> C (while t = 5...100)
C <-> A (while t = 50...100)
Sample Queries
Is A associated with B at any time? (yes)
Is A associated with B while B is associated with C? (yes. #t = 5...10)
Is C ever reachable from A? (yes, at t = 5)
What you're looking for is an explicitly persistent data structure. There's a fair body of literature on this, but it's not that well known. Chris Okasaki wrote a pretty substantial book on the topic. Have a look at my answer to this question.
Given a full implementation of something like Driscoll et al.'s node-splitting structure, there are a few different ways to set up your queries. If you want to know about things true in a particular time range, you would only examine nodes containing data about that time range. If you wanted to know at what time range something was true, you would start searching and progressively tighten your bounds as you explore each new node. Just remember that your results might not always be contiguous - consider two people who start dating, break up, and get back together.
I would guess that there's probably at least one publication's worth of unexplored territory in how to do interesting queries over persistent graphs, if not much more.
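To make the question's sample queries concrete, here is a much simpler (non-persistent) sketch: an adjacency list whose edges carry the time interval during which they exist, plus a Dijkstra-style sweep that computes the earliest time each node becomes reachable, assuming you may wait at a node. The data encodes the example from the question:
import heapq

edges = {  # u -> [(v, start, end)]: u <-> v exists while start <= t <= end
    'A': [('B', 0, 10), ('C', 50, 100)],
    'B': [('A', 0, 10), ('C', 5, 100)],
    'C': [('B', 5, 100), ('A', 50, 100)],
}

def earliest_reach(src, t0=0):
    """Earliest time each node becomes reachable from src, starting at t0."""
    best = {src: t0}
    heap = [(t0, src)]
    while heap:
        t, u = heapq.heappop(heap)
        if t > best.get(u, float('inf')):
            continue  # stale queue entry
        for v, start, end in edges.get(u, []):
            if t <= end:                 # the edge still exists (or will exist)
                arrive = max(t, start)   # wait at u until the edge appears
                if arrive < best.get(v, float('inf')):
                    best[v] = arrive
                    heapq.heappush(heap, (arrive, v))
    return best

print(earliest_reach('A'))  # {'A': 0, 'B': 0, 'C': 5} -> C is reachable at t = 5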
