Algorithms - Graph Depth-First Search

I'm learning about graphs and DFS, and I'm trying to do something similar to how ANT resolves dependencies. I'm confused about something, and all the articles I've read seem to assume everyone already knows it.
I'm thinking of using a Map whose key is a file and whose value is the Set of files that the key depends on.
The DFS algorithm says I have to change the color of a node once it has been visited. Doesn't that mean the fileNode used as a key and the one stored in the Set<> must be the same object reference?
So I'm thinking that each time a Node is created (including neighbor nodes), I would also add it to one more Collection (maybe another Map), and whenever a new Node is about to be added to the graph (as a key), I would search that Collection and reuse the existing reference. Am I wasting too much space? How is this usually done? Is there a better way?

During my studies, the DFS algorithm was implemented like this:
Push the starting node onto a stack (a structure where you can only retrieve and remove the top element).
Pop the top element and mark it as seen, either by coloring it or by setting an attribute, let's call it isSeen, to true.
Then look at all the neighbors of that node, and push the ones that are not seen yet onto the stack.
Once you have looked at all the neighbors, pop the next element off the stack and do the same as for the first; an element that has been marked as seen in the meantime is simply skipped. Continue until the stack is empty.
As a result, every node that can be reached from the starting node will be marked as seen.
Hope this helped.
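A minimal Python sketch of an iterative, stack-based DFS, under a few assumptions that are not from the answer: the graph is a dict keyed by plain file names (strings), and a separate `seen` set plays the role of the color/isSeen flag. Because equal strings hash and compare equal, the key `"util.xml"` and the `"util.xml"` inside another file's dependency set are treated as the same node, so no extra lookup collection of shared Node references is needed. The file names are hypothetical; the type hints need Python 3.9+.

```python
def reachable(graph: dict[str, set[str]], start: str) -> set[str]:
    """Return every node reachable from start, using an explicit stack (iterative DFS)."""
    seen: set[str] = set()       # plays the role of the "color" / isSeen flag
    stack = [start]              # push the starting node
    while stack:
        node = stack.pop()       # take the top element
        if node in seen:
            continue             # a node can be pushed twice before it is visited
        seen.add(node)
        for neighbor in graph.get(node, set()):
            if neighbor not in seen:
                stack.append(neighbor)
    return seen


# Hypothetical build files: key = file, value = the files it depends on.
deps = {
    "app.xml": {"core.xml", "util.xml"},
    "core.xml": {"util.xml"},
    "util.xml": set(),
    "docs.xml": {"app.xml"},
}
print(sorted(reachable(deps, "app.xml")))  # ['app.xml', 'core.xml', 'util.xml']
```

If you do want Node objects, the same idea applies: keep one dict from file name to Node and always look nodes up through it, so each file maps to exactly one object.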

Related

Find min cost from node A to node B and also keep the path info

I have a problem: find the minimum cost from the lowest-numbered node (1) to the highest-numbered node (7).
The cost is the weight of each edge between nodes; I labeled them.
This problem made me think of Dijkstra's algorithm, which has a time complexity of O((v + e) log v).
Is there a better approach to solving this efficiently?
The other requirement is to keep the path information. Any thoughts on how to keep the path?

As others pointed out, the complexity is essentially as you say, and in general you cannot do fundamentally better. As @nico-schertler commented, searching from both sides in parallel (or taking turns) and stopping as soon as the two searches meet is faster than searching from one side only, but it has the same complexity. That works in your case (fixed costs on bidirectional edges), but it need not work in the general case (e.g., when the cost depends on the path already taken), where Dijkstra is still applicable.
As for keeping the path: in many cases the search only makes sense if you get the path itself as the answer. There are two main approaches to getting the path as a result.
One is to store the path taken so far to a given node along with that node in the lists (white/grey in the classical implementation). Each time you add a new node, you extend its predecessor's path by one step. When you find the target node, you can return its path directly (along with the cost sum). This approach, of course, uses a lot of memory.
The other is to store only the origin node along with each newly found node, so that each node points to the node it was reached from. Think of it as putting up signposts in each node showing the way back. With Dijkstra, update a node's signpost whenever a cheaper route to it is found, so that it always points along the cheapest known path. When you find the target node, follow the signposts from the target back to the start, building the path in reverse order, and then return it.
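A minimal Python sketch of the signpost approach on top of Dijkstra's algorithm, using the standard `heapq` module as the priority queue. The edge list is hypothetical (the original labeled graph is not available); costs are assumed non-negative and edges bidirectional. `prev` holds the signposts, and the path is rebuilt by walking them back from the target and reversing.

```python
import heapq
from collections import defaultdict


def cheapest_path(graph, source, target):
    """Dijkstra with predecessor "signposts"; returns (cost, path), or (inf, []) if unreachable.

    graph maps each node to a list of (neighbor, cost) pairs; costs must be non-negative.
    """
    dist = {source: 0}
    prev = {}                               # node -> node it is reached from on the cheapest known path
    heap = [(0, source)]
    done = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in done:
            continue                        # stale entry: node already finalized
        done.add(node)
        if node == target:
            break                           # target's cost is final once it is popped
        for neighbor, cost in graph.get(node, []):
            new_d = d + cost
            if new_d < dist.get(neighbor, float("inf")):
                dist[neighbor] = new_d
                prev[neighbor] = node       # move the signpost to the cheaper route
                heapq.heappush(heap, (new_d, neighbor))
    if target not in dist:
        return float("inf"), []
    path = [target]
    while path[-1] != source:               # follow the signposts back to the start
        path.append(prev[path[-1]])
    path.reverse()
    return dist[target], path


# Hypothetical undirected graph: (node, node, edge cost).
edges = [(1, 2, 4), (1, 3, 1), (3, 2, 2), (2, 4, 5), (3, 4, 8),
         (4, 7, 3), (2, 5, 10), (5, 7, 1), (3, 6, 7), (6, 7, 6)]
graph = defaultdict(list)
for a, b, cost in edges:
    graph[a].append((b, cost))
    graph[b].append((a, cost))
print(cheapest_path(graph, 1, 7))  # (11, [1, 3, 2, 4, 7])
```

The extra memory here is one predecessor entry per reached node, compared with a full copy of the path per node in the first approach.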

What's the best pathfinding algorithm in complexity?

I need to implement a pathfinding algorithm in one of my programs. The goal is only to know whether a path exists, so the path itself isn't important.
I've already done some research and I'm not sure which algorithm to pick. This post says that DFS or BFS would be more suitable for this kind of program, but I'd like confirmation for my exact situation. I'd also be interested in the complexity, but I can probably work that out myself, so it's fine if you don't include it.
Here's the graph I'm using: say I have an x × y grid with zones the path can and cannot go through.
I want to know whether there is a path that starts at the top of the grid and ends at the bottom. Here's an example with the path in red:
I believe DFS has the best complexity, but I'm not sure exactly how to implement it given the different points where the path can start. I don't know whether it's better to launch a DFS from each possible starting point, or to add an extra layer of zones the path can take so that a single search is enough.
Thank you for your help!

There are a number of different approaches that you can take here. Assuming that the grids you're working with are of roughly the size that you're showing above, and assuming you aren't, say, processing millions of grids at once, chances are that both breadth-first search and depth-first search would work equally well. The advantage of breadth-first search is that it will find the shortest path from anywhere in the top to anywhere in the bottom; the disadvantage is that it typically requires more memory than depth-first search. But again, if you're working with grids on the order of, say, hundreds or thousands of cells each, chances are that this memory overhead isn't going to be too much of a problem. I'd say to pick whichever algorithm you feel most comfortable working with and go with it.
As for how to implement a search from "anywhere in the top" to "anywhere in the bottom," you can achieve this in a few different ways.
If you're using a depth-first search, you can run one depth-first search from each of the cells in the top row and search for a path down to the bottom row. DFS requires you to maintain some information about which cells have and have not been visited. If you recycle this same information across all the calls to DFS, you'll ensure that no two calls do any duplicated work, and so the resulting solution should be very efficient, running in time O(mn) for an m × n grid.
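A minimal Python sketch of the DFS variant, assuming a hypothetical grid encoding (a list of equal-length strings, `'.'` for a walkable cell and `'#'` for a blocked one) and moves to the four orthogonal neighbors only. The `visited` matrix is created once and shared by every call, which is what keeps the total work at O(mn). An explicit stack is used instead of recursion because Python's default recursion limit is easily exceeded on large grids.

```python
def path_exists_dfs(grid: list[str]) -> bool:
    """True if open cells ('.') connect the top row to the bottom row, moving up/down/left/right."""
    rows, cols = len(grid), len(grid[0])
    visited = [[False] * cols for _ in range(rows)]   # shared by every DFS call below

    def dfs(start_r: int, start_c: int) -> bool:
        stack = [(start_r, start_c)]
        while stack:
            r, c = stack.pop()
            if visited[r][c]:
                continue                   # already explored by this or an earlier call
            visited[r][c] = True
            if r == rows - 1:
                return True                # reached the bottom row
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == "." and not visited[nr][nc]:
                    stack.append((nr, nc))
        return False

    # One DFS per open top-row cell; cells visited by earlier calls are never re-explored.
    return any(grid[0][c] == "." and not visited[0][c] and dfs(0, c) for c in range(cols))


grid = [
    "#.##",
    "#..#",
    "##.#",
    "##.#",
]
print(path_exists_dfs(grid))  # True
```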
If you're using a breadth-first search, the modification is pretty straightforward: instead of just enqueuing a single start point in the queue at the beginning of the search, enqueue every cell in the top row at the beginning of the search. The BFS will then naturally explore all possible paths starting anywhere in the top row.
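A minimal Python sketch of that multi-source BFS, using the same hypothetical grid encoding as an assumption (list of strings, `'.'` walkable, `'#'` blocked, four-neighbor moves). The only change from a single-source BFS is the loop that enqueues every open top-row cell before the search starts.

```python
from collections import deque


def path_exists_bfs(grid: list[str]) -> bool:
    """Multi-source BFS: every open top-row cell is enqueued before the search starts."""
    rows, cols = len(grid), len(grid[0])
    visited = [[False] * cols for _ in range(rows)]
    queue = deque()
    for c in range(cols):
        if grid[0][c] == ".":
            visited[0][c] = True
            queue.append((0, c))
    while queue:
        r, c = queue.popleft()
        if r == rows - 1:
            return True                    # some top-row cell reaches the bottom row
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == "." and not visited[nr][nc]:
                visited[nr][nc] = True     # mark when enqueued so no cell enters the queue twice
                queue.append((nr, nc))
    return False
```

Because BFS processes cells in order of distance from the top row, the first bottom-row cell dequeued is also the end of a shortest top-to-bottom path.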
Both of these ideas can be thought of in a different way. Imagine your grid is a graph where each cell is a node and edges correspond to pairs of adjacent cells. You can then add in a new node that sits above the top row of the grid and is connected to each of the nodes in the top row. You then add in a new node that sits just below the bottom row and is connected to each of the nodes in the bottom row. Now, if there's a path from the new top node to the new bottom node, it means that there's a path from some node in the top row to some node in the bottom row, so doing a single search in this graph will be sufficient to check if a path exists. (Fun fact: the two above modifications to DFS and BFS can each be thought of as implicitly doing a search in this new graph.)
There's another option you might want to consider that's fairly easy to implement and imperceptibly less efficient than DFS or BFS, and that's to use a disjoint-set forest data structure to determine what's connected. This data structure supports two kinds of queries:
Given two cells, mark that there's a way to get from the first cell to the second. ("Union")
Given two cells, determine whether there's a path between them, which can be a direct path or could be formed by chaining together multiple other paths. ("Find")
You could implement your connectivity query by building a disjoint-set forest, unioning together all pairs of adjacent cells, and then unioning together all nodes in the top row and unioning all nodes in the bottom row. Doing a "find" query to see if any one of the top nodes is connected to any of the bottom nodes will then solve your problem. This will take time O(mn α(mn)) for a function α(mn) that grows so slowly that it's essentially three or four, so it's effectively as efficient as BFS or DFS.
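A minimal Python sketch of the disjoint-set approach, again assuming the hypothetical grid encoding (list of strings, `'.'` walkable, `'#'` blocked). The forest uses union by rank and path halving, the combination that gives the near-constant α bound mentioned above. All top-row cells are unioned into one set and all bottom-row cells into another, so a single `find` comparison answers the question.

```python
class DisjointSet:
    """Disjoint-set forest with union by rank and path halving."""

    def __init__(self, n: int) -> None:
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x: int) -> int:
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]   # path halving
            x = self.parent[x]
        return x

    def union(self, a: int, b: int) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.rank[ra] < self.rank[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra                                # attach shorter tree under taller
        if self.rank[ra] == self.rank[rb]:
            self.rank[ra] += 1


def path_exists_union_find(grid: list[str]) -> bool:
    rows, cols = len(grid), len(grid[0])

    def index(r: int, c: int) -> int:
        return r * cols + c

    ds = DisjointSet(rows * cols)
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != ".":
                continue
            # Union with the right and lower neighbors; this covers every adjacent pair once.
            if c + 1 < cols and grid[r][c + 1] == ".":
                ds.union(index(r, c), index(r, c + 1))
            if r + 1 < rows and grid[r + 1][c] == ".":
                ds.union(index(r, c), index(r + 1, c))
    top = [index(0, c) for c in range(cols) if grid[0][c] == "."]
    bottom = [index(rows - 1, c) for c in range(cols) if grid[rows - 1][c] == "."]
    if not top or not bottom:
        return False
    for cell in top[1:]:
        ds.union(top[0], cell)        # all top-row cells count as one "start"
    for cell in bottom[1:]:
        ds.union(bottom[0], cell)     # all bottom-row cells count as one "goal"
    return ds.find(top[0]) == ds.find(bottom[0])   # a single "find" answers the question
```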

Graph search algorithm with fewest accessed nodes

I need an algorithm to find ANY path from point A to point B in a graph.
The problem is that finding out which nodes can follow a specific one requires a quite lengthy MATLAB simulation, so I want to access as few nodes as possible.
I know some heuristics about the graph, i.e., every node has coordinates and follow-up nodes are always "near" the previous one, but two close nodes are not always connected.
I am not searching for an optimal path, or even a short one. I just need any connection.
My first try was a simple greedy algorithm that always picks the follow-up node closest to the final node, but this very often ended in dead ends. That wouldn't be a problem, but I have no idea how to move out of a dead end efficiently; currently I simply move through all nodes inside the dead end until I find a better way.
Here is a drawing of an example where I already know the solution:
There are many nodes, so calculating the edges for every node in this small dead end at the top takes about 1 h 20 min. (You can assume every pixel in the picture is a node.)
In short: how do I find a good way around the obstacle without looking at every node inside a whole area?
Sorry if this is a silly question, but I'm an engineer and have never had formal education in programming aside from making an LED blink.
Thanks in advance!

A-Star algorithm (reconstruct path)

I've managed to implement A* and I get the correct path.
The problem is how to reconstruct the path from start to end: every node (from end to start) has a parent link, but the first one doesn't, so my character doesn't know where to go first.
What I'm doing now is returning the closed list and walking it from index 0 until I reach the end. This usually works, but I know there must be a better way.
Also, what is the best way to check neighboring nodes?
I've made each node create a rectangle, and I check whether the rectangles intersect; that's how I know two nodes are connected. I also use this technique with the player to know when a node has been reached.
Thanks!!

You have your target node (you can simply cache it once it is found).
Go up the tree (using the parent field) until you find a node without a parent; that node is your root (the start). The path you traverse by going up the links is the shortest path (given an admissible heuristic) in reverse order.
I once addressed a similar question, regarding BFS instead of A*.
EDIT: A simple way to get the path back in its original order is to push the nodes onto a stack while you go from target to source, and when you reach the source, start popping elements off the stack.
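A minimal Python sketch of that parent-link walk with a stack. The `Node` class with a `name` and a `parent` field is a hypothetical stand-in for whatever node type the A* implementation uses; the start node is assumed to be the only node whose `parent` is `None`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    name: str
    parent: Optional["Node"] = None   # set by the search; the start node has no parent


def reconstruct_path(target: Node) -> list[Node]:
    """Push nodes from target back to the start onto a stack, then pop them in start-to-target order."""
    stack = []
    node: Optional[Node] = target
    while node is not None:
        stack.append(node)
        node = node.parent
    path = []
    while stack:
        path.append(stack.pop())
    return path


start = Node("start")
a = Node("a", parent=start)
b = Node("b", parent=a)
goal = Node("goal", parent=b)
print([n.name for n in reconstruct_path(goal)])  # ['start', 'a', 'b', 'goal']
```

The first element of the returned list is the start node, and the second is the first step the character should take.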

How do I find all paths through a set of given nodes in a DAG?

I have a list of items (blue nodes below) that are categorized by the users of my application. The categories can themselves be grouped and categorized.
The resulting structure can be represented as a Directed Acyclic Graph (DAG) where the items are sinks at the bottom of the graph's topology and the top categories are sources. Note that while some of the categories might be well defined, many will be user-defined and might be very messy.
Example: [figure; source: theuprightape.net]
On that structure, I want to perform the following operations:
find all items (sinks) below a particular node (all items in Europe)
find all paths (if any) that pass through all of a set of n nodes (all items sent via SMTP from example.com)
find all nodes that lie below all of a set of nodes (intersection: goyish brown foods)
The first seems quite straightforward: start at the node, follow all possible paths to the bottom, and collect the items there. However, is there a faster approach? Remembering the nodes I have already passed through probably helps avoid unnecessary repetition, but are there more optimizations?
How do I go about the second one? It seems the first step would be to determine the height of each node in the set, so as to determine which one(s) to start at, and then find all paths below that which include the rest of the set. But is this the best (or even a good) approach?
The graph traversal algorithms listed on Wikipedia all seem to be concerned with either finding a particular node or finding the shortest or otherwise most effective route between two nodes. I think neither is what I want, or did I just fail to see how they apply to my problem? Where else should I read?

It seems to me that it's essentially the same operation for all three questions. You're always asking "Find all X below node(s) Y, where X is of type Z". All you need is a generic mechanism to "locate all nodes below a node" (applied to each node in the set and intersected, this solves Q3), and then you can filter the results for "nodetype = sink" (solves Q1). For Q2, you have the starting point (your node set) and the ending point (any sink below the starting point), so your solution set is all paths from the specified starting node to the sinks. So I would suggest that what you have is basically tree-like, and basic tree-traversal algorithms (with a visited set, since in a DAG a node can have more than one parent) would be the way to go.
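A minimal Python sketch of that generic "all nodes below a node" mechanism and the two operations built on it. The DAG is assumed to be a dict from each category to a list of its children, and the category and item names are hypothetical. The visited set matters because a node with several parents (here `"wine"`) would otherwise be expanded once per parent.

```python
def below(dag: dict[str, list[str]], start: str) -> set[str]:
    """All nodes reachable from start (excluding start itself), via iterative DFS."""
    seen: set[str] = set()
    stack = list(dag.get(start, []))
    while stack:
        node = stack.pop()
        if node in seen:
            continue                     # reached again through another parent: skip
        seen.add(node)
        stack.extend(dag.get(node, []))
    return seen


def sinks_below(dag: dict[str, list[str]], start: str) -> set[str]:
    """Q1: the items (sinks, i.e. nodes without children) below start."""
    return {n for n in below(dag, start) if not dag.get(n)}


def below_all(dag: dict[str, list[str]], nodes: list[str]) -> set[str]:
    """Q3: nodes that lie below every node in `nodes` (intersection of the 'below' sets)."""
    if not nodes:
        return set()
    return set.intersection(*(below(dag, n) for n in nodes))


# Hypothetical categories and items; "wine" has two parents, so this is not a tree.
dag = {
    "Europe": ["France", "Germany"],
    "France": ["cheese", "wine"],
    "Germany": ["wine", "pretzel"],
    "Brown": ["pretzel", "chocolate"],
}
print(sorted(sinks_below(dag, "Europe")))   # ['cheese', 'pretzel', 'wine']
print(below_all(dag, ["Europe", "Brown"]))  # {'pretzel'}
```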

Even though your graph is acyclic, the operations you cite remind me of similar aspects of control-flow graph analysis. There is a rich set of algorithms based on dominance that may be applicable. For example, your third operation reminds me of computing dominance frontiers; I believe that algorithm would work directly if you temporarily introduce "entry" and "exit" nodes: the entry node connects to the given set of nodes, and the sinks connect to the exit node.
Also see Robert Tarjan's basic algorithms.