How do programs like Apache Airflow/Luigi determine the shortest path? - etl

I am looking for a simple explanation in layman's terms, or a reference to read, on how programs like Apache Airflow or Luigi determine the shortest path to complete a certain task and make it possible to parallelize it. And how, if at all, does that relate to graph theory?

Related

Get directed acyclic graph showing OpenMP tasks

Is there an automated way to get the task graph from a given OpenMP code with depend clauses? The task graph should show the tasks as vertices and data dependencies as directed edges.
Simple answer: Probably not, but check out OpenMP profiling tools
More complicated answer...
You can potentially write your own tool to achieve some of this using the OMPT interfaces which allow you to log events of interest. However, mapping a task dependence from the runtime representation (a pointer value) back to something meaningful is likely to be "fun".
You could check out Score-P and TAU; they may have support (as may Intel VTune).

top-down community detection in a network

I'm trying to find a way to detect network communities in a top-down way. Most of the algorithms available (e.g. in the igraph package) work bottom-up: they start by assuming all nodes are singleton communities, and then combine them into larger communities. I want to go the other way around, similar to how decision trees are built: start with the whole network, then find a split that improves some "measure of information", etc.
Does anyone know of such an algorithm or such a measure? I can't find one in the literature, but maybe I am missing something.
Also, what bothers me with some measures of modularity is that if you think of the whole network as one module, then all edges are within the module and no out-of-module edges exist, so this seems like a perfect partition into modules. Is there a measure that overcomes this limitation?
I think Newman's algorithm meets your requirements.
It works by computing "network modularity" and then splitting the network into two groups. After that it recursively applies the same principle to the newly formed groups until no further increase in modularity is possible.
It should also be implemented in igraph, at least in the R version.
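As a rough illustration of the score being maximized, here is a minimal Python sketch of modularity for an undirected, unweighted graph. It only evaluates a given partition; it is not Newman's splitting procedure and not igraph's implementation. The function name and the toy graph are made up for the example. Because modularity subtracts the fraction of edges expected by chance, treating the whole network as one module scores zero rather than looking like a perfect partition.

```python
from collections import defaultdict

def modularity(edges, community_of):
    """Modularity Q of a partition of an undirected, unweighted graph.

    edges: list of (u, v) pairs, each undirected edge listed once.
    community_of: dict mapping node -> community label.
    """
    m = len(edges)
    internal = defaultdict(int)    # edges with both ends inside the community
    degree_sum = defaultdict(int)  # summed degree of the community's nodes
    for u, v in edges:
        degree_sum[community_of[u]] += 1
        degree_sum[community_of[v]] += 1
        if community_of[u] == community_of[v]:
            internal[community_of[u]] += 1
    # Observed fraction of internal edges minus the fraction expected by chance.
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)

# Hypothetical toy graph: two triangles joined by a single edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
one_module = {n: "all" for n in range(6)}
two_modules = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}

print(modularity(edges, one_module))   # whole network as one module
print(modularity(edges, two_modules))  # split along the bridging edge
```

On this toy graph the single-module partition scores 0, while the two-triangle split scores 5/14, so the split is the one a modularity-driven top-down method would prefer.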

Depth First Search using Map Reduce

I have successfully implemented a shortest path algorithm (breadth-first search) in Hadoop MapReduce. However, I have a question:
Is it possible to do a depth-first search graph traversal using Hadoop MapReduce?
Any links?
The nature of depth-first search makes it inappropriate for MapReduce jobs, because you follow one strict path to the end before forking into another one. As a result, you can't properly use the scalability provided by Hadoop. I'm not aware of a well-working implementation, and I'm pretty sure you won't find one that uses the MapReduce paradigm in a good way.
If you try to implement graph algorithms in Hadoop on your own, you might want to have a look at some useful frameworks like Apache Giraph, xrime or Pegasus. xrime also contains a shortest path implementation which might be interesting for you.
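To make the contrast concrete, here is a minimal single-process Python sketch of why breadth-first search fits the map/reduce shape: every node in the current frontier can be expanded independently (the map step), and the results are merged per node (the reduce step). This is plain Python, not Hadoop code, and the function name and graph layout are assumptions for illustration. Depth-first search has no comparable step, since each move depends on where the previous one ended.

```python
from collections import defaultdict

def bfs_levels(graph, source):
    """Level-synchronous BFS written as repeated map and reduce steps.

    graph: dict mapping node -> list of neighbours.
    Returns a dict mapping each reachable node to its hop distance.
    """
    dist = {source: 0}
    frontier = [source]
    while frontier:
        # Map: each frontier node emits (neighbour, candidate distance).
        # An emission depends on one node only, so this work can be split up.
        emitted = [(nbr, dist[node] + 1)
                   for node in frontier
                   for nbr in graph.get(node, [])]
        # Reduce: group by neighbour and keep the smallest candidate.
        best = defaultdict(lambda: float("inf"))
        for nbr, d in emitted:
            best[nbr] = min(best[nbr], d)
        # Nodes not seen before form the next frontier.
        frontier = [n for n in best if n not in dist]
        for n in frontier:
            dist[n] = best[n]
    return dist

graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
print(bfs_levels(graph, "a"))
```

Each pass of the loop corresponds to one MapReduce job in a real deployment, so the number of jobs grows with the graph's depth rather than with its size.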

Pathfinding with limited knowledge and no distance heuristic

I'm having trouble writing the pathfinding routine for the AI in a simple Elite-esque game I'm writing.
The world in this game is a handful of "systems" connected by "wormholes", and a ship can jump from the system it's in to any system it's linked to once per turn. The AI is limited to only knowing things that it should know; it doesn't know what links come from a system it hasn't been to (though it can work it out from the systems it has seen, since links are two-way). Other parts of the AI decide which system the ship needs to get to based on what goods it has in its inventory and how much it remembers things being worth on systems it has passed through.
The problem is, I don't know how to approach the problem of finding a path to the target system. I can't use A*; there's no way to determine the 'distance' to another system without pathing to it. I also need this algorithm to be efficient, since it'll need to run about 100 times every time the player takes his turn.
Does anyone know of a suitable algorithm?
I ended up implementing a bidirectional, greedy version of breadth-first search, which suits the purpose well enough. To put it simply, I just had the program look through each node the starting node connects to, then each node those nodes connect to, then each node those connect to... until the destination node was found.
Normally one would build a list of appropriate paths and pick the shortest one, but I tried a different method; I had the program run two searches in parallel, one from the starting point, and one from the end point. When the 'from' search found the last node of the 'to' search, the path was considered found.
It then optimizes the path by checking if each node on the path connects to a node further up in the path, and deleting each node in between them.
Whether or not this funky algorithm is actually any better than a straight BFS remains to be seen.
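For readers who want something concrete, the approach described above might look roughly like the following Python sketch. It is not the original poster's code: the function names, the `links` dictionary layout, and the layer-by-layer alternation between the two searches are all assumptions. It expects links to be recorded in both directions, as in the question.

```python
def bidirectional_bfs(links, start, goal):
    """Return some path from start to goal, or None if none is known.

    links: dict mapping system -> set of linked systems (two-way links).
    """
    if start == goal:
        return [start]
    # Parent maps double as the visited sets of each search.
    parents_fwd, parents_bwd = {start: None}, {goal: None}
    frontier_fwd, frontier_bwd = [start], [goal]

    def expand(frontier, parents, other_parents):
        """Advance one BFS layer; report the node where the searches touch."""
        next_frontier = []
        for node in frontier:
            for nbr in links.get(node, ()):
                if nbr in parents:
                    continue
                parents[nbr] = node
                if nbr in other_parents:
                    return nbr, next_frontier
                next_frontier.append(nbr)
        return None, next_frontier

    while frontier_fwd and frontier_bwd:
        meet, frontier_fwd = expand(frontier_fwd, parents_fwd, parents_bwd)
        if meet is None:
            meet, frontier_bwd = expand(frontier_bwd, parents_bwd, parents_fwd)
        if meet is not None:
            path, node = [], meet
            while node is not None:        # walk back to start
                path.append(node)
                node = parents_fwd[node]
            path.reverse()
            node = parents_bwd[meet]
            while node is not None:        # walk on to goal
                path.append(node)
                node = parents_bwd[node]
            return path
    return None

def shortcut(path, links):
    """Skip ahead whenever a node links directly to a later node on the path."""
    result, i = [path[0]], 0
    while i < len(path) - 1:
        # Furthest later node that path[i] links to directly.
        j = max(k for k in range(i + 1, len(path)) if path[k] in links[path[i]])
        result.append(path[j])
        i = j
    return result

# Hypothetical map of five systems.
links = {"A": {"B", "C"}, "B": {"A", "D"}, "C": {"A", "D"},
         "D": {"B", "C", "E"}, "E": {"D"}}
route = shortcut(bidirectional_bfs(links, "A", "E"), links)
print(route)
```

Because the search stops at the first meeting point, the raw path is not guaranteed to be the shortest one, which is why the shortcut pass is applied afterwards.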
When it comes to unknown environments, I usually use an evolutionary algorithm approach. It doesn't guarantee that you'll find the best solution in the small timeframe you have, but it is one way to approach such a problem.
Have a look at Partially Observable Markov Decision Problems (POMDP). You should be able to express your problem with this model.
Then you can use an algorithm that solves these problems to try to find a good solution. Note that solving POMDPs is usually very expensive and you probably have to resort to approximate methods.
The easiest way to cheat this would be to go through, or at least attempt to access, as many systems as possible, then implement the distance heuristic as the sum of all the systems you've been to.
Alternatively, and way cooler:
I've implemented something similar using ACO (ant colony optimization), and it worked pretty well combined with PSO (particle swarm optimization). However, the additional constraints your system imposes mean that you'll have to spend a few sessions (at least one) figuring out the environment layout, and if it's dynamic... well... tough.
The good thing is that this algorithm completely bypasses the need for heuristic generation which is what you need since you are flying blind. Be advised though, that this is a bad idea if your search space (number of runs) is small. (100 may be acceptable, but 10 or 5 ... not so much).
This scales up quite nicely when dealing with large numbers of nodes (systems) and it bypasses the heuristic distance computational need for every node-to-node relationship, thereby making it more efficient.
Good luck.

Parallel A* search in C# - different paths

I'm working on a bidirectional A* algorithm. I'm searching from end to start and from start to end. When the first thread encounters a node from the other thread (from its open or closed list), it stops and traces a path back.
But I have a problem when the threads take different paths and don't meet where they should.
Example: http://i.imgur.com/ittIAlI.png
This was a common problem that discouraged research of bidirectional search until Kaindl & Kainz proved it unnecessary in 1997. Section 2.3 of PNBA*: A Parallel Bidirectional Heuristic Search Algorithm provides additional historical background, as well as a (parallel) bidirectional algorithm that overcomes this issue.
You may wish to read Yet Another Bidirectional Algorithm for Shortest Paths first as the (serial) NBA* algorithm it describes is referenced by the first paper extensively.
I have just successfully adapted my open source Hexgrid PathFinding utility found here to use a serial version of PNBA*. (Really about half-way between NBA* and PNBA*) This will be uploaded within a day or two.
Making the Shortest Path Even Quicker gives an overview of developing the Bing Maps path finder using bidirectional A* with pre-processing. A detailed description of the preprocessing work, and of the use of the relevant algorithm, is available in Reach for A* and Better Landmarks Within Reach.
