Application of modified BFS/DFS on a friendship network - algorithm

I have a very interesting problem regarding graphs and friendship networks. It is as follows:
A teacher wants to make sure that his students aren't cheating by ensuring that no pair of people who know each other gets the same homework. He believes he can do this with only two different versions of the homework. Design an algorithm (pseudocode) to test whether this is possible for a given graph of students and their connections. The algorithm should be based on DFS or BFS.
I approached the problem using BFS, modified to compare each node against its already visited neighbors:
Modified BFS(v)
    Mark v as visited
    Enqueue v
    While queue is not empty
        Dequeue v
        Assign either homework to v
        For all unvisited neighbors a of v
            Give them the opposite homework
            Mark them as visited
            Add them to queue
        For all visited neighbors b of v
            Check if their homework is the same as v, terminate if it is
This algorithm seems to work for small handmade graphs where I can write out every step. Is it bound to work for all such graphs? If the algorithm is correct, is the pseudocode also acceptable? I am a little uncertain about the order of queueing/dequeuing and when it happens, and about whether I can make the comparison on the last row without a temporary variable to keep track of the homework for the "current" node.
I would appreciate any help/input, and if you have ideas for another algorithm (for example using DFS), I would appreciate those as well.

You made a mistake: the "Assign either homework to v" step goes before the loop.
Then every vertex in the queue will already have homework assigned, and there is never any choice as to which homework to assign to neighbors. This obviously works for all connected graphs. It's a good algorithm.
You need to handle unconnected graphs as well, with a loop over all vertices that runs BFS(v) when v is unvisited.
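A minimal C++ sketch of the corrected algorithm, assuming students are numbered 0..n-1 and adj[v] (a hypothetical adjacency-list representation) lists everyone v knows. The homework is assigned before each BFS loop starts, and the outer loop covers graphs that are not connected. In graph terms this checks whether the graph is bipartite; a recursive DFS that alternates the two versions would work the same way.

```cpp
#include <queue>
#include <vector>

// Returns true if two homework versions suffice, i.e. no two acquaintances
// would get the same version (the graph is bipartite).
bool twoVersionsSuffice(const std::vector<std::vector<int>>& adj) {
    const int n = static_cast<int>(adj.size());
    std::vector<int> homework(n, -1);  // -1 = unvisited, otherwise version 0 or 1
    for (int s = 0; s < n; ++s) {      // outer loop handles unconnected graphs
        if (homework[s] != -1) continue;
        homework[s] = 0;               // assign a version BEFORE the BFS loop
        std::queue<int> q;
        q.push(s);
        while (!q.empty()) {
            const int v = q.front();
            q.pop();
            for (int a : adj[v]) {
                if (homework[a] == -1) {
                    homework[a] = 1 - homework[v];  // opposite version
                    q.push(a);
                } else if (homework[a] == homework[v]) {
                    return false;      // two acquaintances would share a version
                }
            }
        }
    }
    return true;
}
```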

Related

Solving logic game Lights out with A* algorithm

I'm having some problems solving the logic puzzle Lights Out using the A* algorithm. For now, I'm using an implementation of A* where I treat the entire matrix of lights as a node (1 means a light is on, 0 means it is off), together with the coordinates of the cell that will be toggled in the current node. After I select the node from the open list with the lowest f score, I toggle it, get its 8 adjacent neighbors, append them to the open list, and repeat until I find a node whose lights sum to 0 (all the lights are off).
To calculate the f score of each node, I simply compute the sum of all the lights in its local matrix, so each time I select the node whose matrix has the fewest lights on.
I know the algorithm will not be very performant, even compared to the "Chasing the Lights" method, but I don't understand how to tell the algorithm which node to pick next, i.e. which f scoring function to use, because using the sum of the lights in the matrix makes the algorithm loop through the same 3 or 4 nodes every time.
I would also like some suggestions on how to represent a node. I can't see how to take an algorithm normally used for path optimization inside a matrix, where you have a goal node, and use it in a situation like this one, where the entire matrix is the node and the goal is not to reach a particular node but just to check that its sum is 0.
I implemented everything in Lua.
Thank you.
EDIT 5/27/19
Since I'm new to Lua, I suspect that my failure to find the solution comes from mistakes in my code, my limited ability to write Lua, and my understanding of the algorithm.
I wasn't good at explaining the problem I was having, so I tried to take the best from the comments I received, and I am now posting the modified code so that anyone who wants to help can understand it better (code >> words, haha).
Note: I wrote the algorithm based on this article A* algorithm
lua source code
Not sure what you're looking for.
First of all, a simple observation: you want to toggle each field on your game board at most once (toggling a field twice does nothing).
You could take a very bad approach and create 2^n game states (nodes in your graph), where n is the number of fields on your game board. Why is it terrible? It would take at least O(2^n) time and space just to create the graph. In O(2^n) time you can instead check all possible sets of moves (since you want to toggle each field at most once) and immediately give the result, without running A* on top of that.
A better idea(?): don't physically create the whole graph. You don't want to visit a node twice, so store the already visited nodes somewhere (probably as a set of bitmasks, so you can quickly check whether a node is already in the set). Then, when you're at some node, check all its neighbours: the game states after toggling one field. If they've been visited, ignore them; otherwise add them to your priority queue. Then take the first node from the priority queue, remove it, and 'go' to the game state it represents.
As I said, you can represent a game state as a bitmask of size n: 0 in the i-th position if the i-th field hasn't been toggled, 1 otherwise.
Will that be better than the naive approach? It depends on your heuristic function. I have no idea which one is best; you have to try some and check the results. In the worst case your A* will check every node, making the complexity worse than simple brute force. If you get lucky, it may speed things up significantly.
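To make "check all possible moves" concrete, here is a brute-force sketch in C++. It rests on assumptions that are not in the answer: the standard Lights Out rule (pressing a field flips it and its up/down/left/right neighbours; adjust pressMask if your variant differs), a board of at most 25 fields stored as a bitmask with bit r*cols+c for field (r, c), and the hypothetical helper names pressMask and bruteForceLightsOut. Each of the 2^n press sets is tried once, so the cost is O(2^n * n).

```cpp
#include <cstdint>
#include <vector>

// Bitmask of the lights flipped by pressing field (r, c): the field itself and
// its orthogonal neighbours (assumed plus-shaped rule).
std::uint32_t pressMask(int rows, int cols, int r, int c) {
    const int dr[] = {0, -1, 1, 0, 0};
    const int dc[] = {0, 0, 0, -1, 1};
    std::uint32_t m = 0;
    for (int k = 0; k < 5; ++k) {
        const int nr = r + dr[k], nc = c + dc[k];
        if (nr >= 0 && nr < rows && nc >= 0 && nc < cols)
            m |= std::uint32_t{1} << (nr * cols + nc);
    }
    return m;
}

// Tries every subset of presses (each field pressed at most once).
// Returns the subset with the fewest presses that turns all lights off, or -1.
// Assumes rows * cols <= 25.
std::int64_t bruteForceLightsOut(int rows, int cols, std::uint32_t board) {
    const int n = rows * cols;
    std::vector<std::uint32_t> press(n);
    for (int i = 0; i < n; ++i) press[i] = pressMask(rows, cols, i / cols, i % cols);

    std::int64_t best = -1;
    int bestCount = n + 1;
    for (std::uint32_t subset = 0; subset < (std::uint32_t{1} << n); ++subset) {
        std::uint32_t b = board;
        int count = 0;
        for (int i = 0; i < n; ++i)
            if (subset >> i & 1u) { b ^= press[i]; ++count; }
        if (b == 0 && count < bestCount) { best = subset; bestCount = count; }
    }
    return best;
}
```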
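One way to try the answer's suggestion, sketched in C++ under stated assumptions: the state is the bitmask of fields pressed so far, the visited set is an std::unordered_set of those bitmasks, and each successor presses one more field. The heuristic is my own choice, not the answer's: h = ceil(lightsOn / 5). Under the standard plus-shaped rule one press flips at most 5 lights, so h never overestimates and this is genuine A* returning the minimum number of presses. The board layout (bit r*cols+c) and the name aStarLightsOut are hypothetical.

```cpp
#include <bitset>
#include <cstdint>
#include <functional>
#include <queue>
#include <tuple>
#include <unordered_set>
#include <vector>

// A* where a state is the set of fields pressed so far (bit i set = field i pressed).
// Returns the minimum number of presses, or -1 if the board cannot be cleared.
// Assumes plus-shaped presses and rows * cols <= 25.
int aStarLightsOut(int rows, int cols, std::uint32_t board) {
    const int n = rows * cols;
    std::vector<std::uint32_t> press(n, 0);  // lights flipped by pressing field i
    for (int i = 0; i < n; ++i) {
        const int r = i / cols, c = i % cols;
        press[i] |= std::uint32_t{1} << i;
        if (r > 0) press[i] |= std::uint32_t{1} << (i - cols);
        if (r + 1 < rows) press[i] |= std::uint32_t{1} << (i + cols);
        if (c > 0) press[i] |= std::uint32_t{1} << (i - 1);
        if (c + 1 < cols) press[i] |= std::uint32_t{1} << (i + 1);
    }
    // ceil(lightsOn / 5): admissible because one press flips at most 5 lights.
    auto h = [](std::uint32_t b) {
        return static_cast<int>((std::bitset<32>(b).count() + 4) / 5);
    };

    using Entry = std::tuple<int, int, std::uint32_t, std::uint32_t>;  // f, g, pressed, board
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> open;
    std::unordered_set<std::uint32_t> closed;  // pressed-sets already expanded
    open.emplace(h(board), 0, std::uint32_t{0}, board);
    while (!open.empty()) {
        const auto [f, g, pressed, b] = open.top();
        open.pop();
        (void)f;
        if (b == 0) return g;                          // all lights are off
        if (!closed.insert(pressed).second) continue;  // already expanded
        for (int i = 0; i < n; ++i) {
            if (pressed >> i & 1u) continue;           // each field pressed at most once
            const std::uint32_t next = pressed | (std::uint32_t{1} << i);
            if (closed.count(next)) continue;
            const std::uint32_t nb = b ^ press[i];
            open.emplace(g + 1 + h(nb), g + 1, next, nb);
        }
    }
    return -1;
}
```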

Calculating time complexity of DFS algorithm

I have been tasked with an assignment where I have to check if a group of people have a "close friendship". This is defined as a group of people where all persons in the group are friends with all other persons in the group. So far I have this as my algorithm:
1) Initialize vertices as not visited
2) Do a DFS traversal of the graph starting from any arbitrary vertex v, marking visited vertices as visited
3) If the DFS traversal visits all vertices, return true
4) If it does not, return false.
Now I have to calculate the time complexity. However, I have a hard time with time complexity in general, and I am not entirely sure how to do this. The way I see it, I go through all the vertices in my set, which would be... O(v)? Is this correct? And if it is, what do I do from here?
Since in DFS you visit every vertex only once, but traverse every edge to see whether it leads to a new vertex or to an already seen one, an accurate measure of the complexity of DFS (with adjacency lists) is O(#vertices + #edges).
But O(#vertices) is generally an acceptable answer to the DFS complexity question too, because when you see that an edge does not lead to a new vertex, you don't explore it any further.
So when asked, you can give either answer and explain the reasoning, because neither of them is wrong with supporting explanation.
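For reference, the question's steps 1-4 as an iterative C++ DFS, assuming an adjacency-list graph adj with vertices 0..n-1 (a hypothetical representation). The comments mark where the two terms of the bound come from: each vertex is expanded once, and each adjacency entry is scanned once in total.

```cpp
#include <stack>
#include <vector>

// Steps 1-4 from the question: DFS from one vertex, then check that it reached them all.
bool allReachable(const std::vector<std::vector<int>>& adj) {
    const int n = static_cast<int>(adj.size());
    if (n == 0) return true;
    std::vector<bool> visited(n, false);   // step 1: nothing visited yet
    std::stack<int> todo;
    todo.push(0);                          // step 2: start from an arbitrary vertex (0)
    int reached = 0;
    while (!todo.empty()) {
        const int v = todo.top();
        todo.pop();
        if (visited[v]) continue;          // already reached through another edge
        visited[v] = true;                 // each vertex is expanded once: O(#vertices)
        ++reached;
        for (int w : adj[v])               // each adjacency entry is scanned once: O(#edges)
            if (!visited[w]) todo.push(w);
    }
    return reached == n;                   // steps 3 and 4
}
```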
But this may not be the answer to the actual question you are trying to solve. You are trying to find a closely connected group.
In graph terminology, a closely connected group of friends would be one where each friend node shares an edge with every other friend node. (Re-read your question - it actually literally says that.)
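Checking that definition directly is straightforward. A C++ sketch, assuming friendships are undirected and stored as (smaller id, larger id) pairs in an std::set (a hypothetical representation): a group of k people is a "close friendship" exactly when all k(k-1)/2 pairs are present. This only tests a given group; finding the largest such group in a whole graph is the maximum-clique problem, which is NP-hard.

```cpp
#include <algorithm>
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

// Returns true if every pair of people in `group` are friends (the group is a clique).
// Checks k*(k-1)/2 pairs for a group of k people.
bool isCloseFriendship(const std::vector<int>& group,
                       const std::set<std::pair<int, int>>& friends) {
    for (std::size_t i = 0; i < group.size(); ++i)
        for (std::size_t j = i + 1; j < group.size(); ++j) {
            const int a = std::min(group[i], group[j]);  // pairs are stored (min, max)
            const int b = std::max(group[i], group[j]);
            if (!friends.count({a, b})) return false;
        }
    return true;
}
```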
In the image below, the majority of the graph is connected, and the other nodes are reachable from one node using a depth-first traversal. But the close cohorts are the groups of nodes with the same color.

how to decide whether two persons are connected

Here is the problem:
Assuming two people are registered on a social networking website, how do you decide whether they are connected?
My analysis (after reading more): the question is actually looking for the shortest path from A to B in a graph. I think both BFS and Dijkstra's algorithm work here, and their time complexity is the same (O(V+E)) because the graph is unweighted, so we can't take advantage of the priority queue; a simple queue solves the problem. But neither of them solves the problem of finding the path between them.
Bidirectional search should be a better solution here.
To find a path between the two, you should begin with a breadth first search. First find all neighbors of A, then find all neighbors of all neighbors of A, etc. Once B is hit, not only do you have a path from A to B, but you also have a shortest such path.
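A C++ sketch of that breadth-first search, assuming an undirected adjacency list adj (hypothetical). Recording each vertex's parent when it is first discovered is what turns "B was hit" into an actual shortest path, which also addresses the concern that BFS alone only tells you whether the two are connected.

```cpp
#include <algorithm>
#include <queue>
#include <vector>

// BFS from a; returns a shortest path a -> ... -> b as a list of vertices,
// or an empty vector if b is unreachable.
std::vector<int> shortestPath(const std::vector<std::vector<int>>& adj, int a, int b) {
    std::vector<int> parent(adj.size(), -1);   // who discovered each vertex
    std::vector<bool> seen(adj.size(), false);
    std::queue<int> q;
    seen[a] = true;
    q.push(a);
    while (!q.empty()) {
        const int v = q.front();
        q.pop();
        if (v == b) {                          // BFS order guarantees this path is shortest
            std::vector<int> path;
            for (int x = b; x != -1; x = parent[x]) path.push_back(x);
            std::reverse(path.begin(), path.end());
            return path;
        }
        for (int w : adj[v])
            if (!seen[w]) { seen[w] = true; parent[w] = v; q.push(w); }
    }
    return {};
}
```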
Dijkstra's algorithm rocks, and you may be able to speed this up by working from both ends, i.e. finding the neighbors of A and the neighbors of B and comparing them.
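A sketch of working from both ends in C++, assuming an unweighted, undirected adjacency list (hypothetical). It is plain bidirectional BFS rather than Dijkstra, which behaves the same on unweighted graphs: it always expands the smaller frontier by one full layer and stops after the first layer in which the two searches meet. It returns only the distance; recovering the path would need parent arrays on both sides.

```cpp
#include <queue>
#include <utility>
#include <vector>

// Returns the number of edges on a shortest a-b path, or -1 if they are not connected.
int bidirectionalDistance(const std::vector<std::vector<int>>& adj, int a, int b) {
    if (a == b) return 0;
    const int n = static_cast<int>(adj.size());
    std::vector<int> distA(n, -1), distB(n, -1);
    std::vector<int> frontA{a}, frontB{b};
    distA[a] = 0;
    distB[b] = 0;
    while (!frontA.empty() && !frontB.empty()) {
        // Expand the smaller frontier to keep the explored region small.
        const bool expandA = frontA.size() <= frontB.size();
        auto& front = expandA ? frontA : frontB;
        auto& dist = expandA ? distA : distB;
        auto& other = expandA ? distB : distA;
        std::vector<int> next;
        int best = -1;
        for (int v : front)
            for (int w : adj[v]) {
                if (other[w] != -1) {          // the two searches meet at w
                    const int d = dist[v] + 1 + other[w];
                    if (best == -1 || d < best) best = d;
                }
                if (dist[w] == -1) { dist[w] = dist[v] + 1; next.push_back(w); }
            }
        if (best != -1) return best;           // finish the whole layer before returning
        front = std::move(next);
    }
    return -1;
}
```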
If you do a depth first search, then you're following one path at a time. This will be much much slower.
If you use DFS to find whether two people are connected on a social network, it will take too long!
You already know both people, so you should use bidirectional search. But simple bidirectional search won't be enough for a graph as big as a social networking site; you will have to use some heuristics. The Wikipedia page has some links about that.
You may also be able to use A* search. From wikipedia : "A* uses a best-first search and finds the least-cost path from a given initial node to one goal node (out of one or more possible goals)."
Edit: I suggest A* because "The additional complexity of performing a bidirectional search means that the A* search algorithm is often a better choice if we have a reasonable heuristic." So, if you can't form a reasonable heuristic, then use Bidirectional search. (Forming a good heuristic is never easy ;).)
One way is to use Union Find: add all links with union(from, to), and if find(A) == find(B), then A and B are connected. This avoids the recursive search, but it effectively computes the connectivity of all pairs and doesn't give you the path that connects A and B.
I think the real criterion is: there are at least N paths between A and B shorter than K, or A and B are connected directly. I would go with K = 3 and N near 5, i.e. they have 5 common friends.
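A compact C++ union-find sketch (names are illustrative). unite plays the role of the answer's union(from, to), since union is a C++ keyword, and find uses path halving plus union by size, which keeps operations at amortized almost-constant time. As the answer says, it answers "are A and B connected?" but keeps no path information.

```cpp
#include <numeric>
#include <utility>
#include <vector>

// Disjoint-set union with path halving and union by size.
struct UnionFind {
    std::vector<int> parent, size;
    explicit UnionFind(int n) : parent(n), size(n, 1) {
        std::iota(parent.begin(), parent.end(), 0);  // everyone starts in their own set
    }
    int find(int x) {
        while (parent[x] != x) {
            parent[x] = parent[parent[x]];           // path halving
            x = parent[x];
        }
        return x;
    }
    void unite(int a, int b) {                       // the answer's union(from, to)
        a = find(a);
        b = find(b);
        if (a == b) return;
        if (size[a] < size[b]) std::swap(a, b);      // attach the smaller tree
        parent[b] = a;
        size[a] += size[b];
    }
    bool connected(int a, int b) { return find(a) == find(b); }
};
```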
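With K = 3 that criterion reduces to "direct friends, or at least N common friends", since each common friend x gives a distinct path A-x-B of length 2. A C++ sketch, assuming each person's friend list is a sorted vector of ids (a hypothetical representation) and N defaults to 5 as suggested.

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// True if person b is a direct friend of A, or A and b share at least minCommon friends.
// Both friend lists must be sorted.
bool closeEnough(const std::vector<int>& friendsA, const std::vector<int>& friendsB,
                 int b, int minCommon = 5) {
    if (std::binary_search(friendsA.begin(), friendsA.end(), b)) return true;  // direct link
    std::vector<int> common;                          // each entry is a path A-x-B
    std::set_intersection(friendsA.begin(), friendsA.end(),
                          friendsB.begin(), friendsB.end(),
                          std::back_inserter(common));
    return static_cast<int>(common.size()) >= minCommon;
}
```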
Note: answer edited.
Any method might end up being very slow. If you need to do this repeatedly, it's best to find the connected components of the graph, after which the task becomes a trivial O(1) operation: if two people are in the same component, they are connected.
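A C++ sketch of that precomputation, labelling components with BFS (one possible method; the answer goes on to describe others). It assumes an undirected adjacency list adj (hypothetical). After the O(V + E) labelling, a query is the O(1) comparison comp[a] == comp[b]. This static version must be rerun when the graph changes; a union-find structure handles added edges incrementally.

```cpp
#include <queue>
#include <vector>

// Labels every vertex with the id of its connected component.
std::vector<int> componentLabels(const std::vector<std::vector<int>>& adj) {
    const int n = static_cast<int>(adj.size());
    std::vector<int> comp(n, -1);  // -1 = not labelled yet
    int next = 0;                  // id for the next component found
    for (int s = 0; s < n; ++s) {
        if (comp[s] != -1) continue;
        comp[s] = next;
        std::queue<int> q;
        q.push(s);
        while (!q.empty()) {
            const int v = q.front();
            q.pop();
            for (int w : adj[v])
                if (comp[w] == -1) { comp[w] = next; q.push(w); }
        }
        ++next;
    }
    return comp;
}
```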
Note that finding connected components for the first time might be slow, but keeping them updated as new edges/nodes are added to the graph is fast.
There are several methods for finding connected components.
One method is to construct the Laplacian of the graph and look at its eigenvalues / eigenvectors. The number of zero eigenvalues gives the number of connected components, and the non-zero elements of the corresponding eigenvectors give the nodes belonging to the respective components.
Another way is along the following lines:
Create a transformation table of nodes. Element n of the array contains the index of the node that node n transforms to.
Loop through all edges (i,j) in the graph (denoting a connection between i and j):
Compute recursively which nodes i and j transform to, based on the current table; call the results k and l. Update entry k to transform to l, and update entries i and j to point to l as well.
Loop through the table again, and update each entry to point directly to the node it recursively transforms to.
Now nodes in the same connected component will have the same entry in the transformation table. So to check if two nodes are connected, just check if they transform to the same value.
Every time a new node or edge is added to the graph, the transformation table needs to be updated, but this update will be much faster than the original calculation of the table.

Should I iterate over a directed graph using Iterative deepening depth-first search (IDDFS)?

Example: I have 20 people as objects, and every person knows 0-n others. The direction of a link matters: person A might know B while B doesn't know A. It's a directed graph.
Edit: For simplicity, my node objects (in this case Person objects) can store arbitrary information. I know this is not the best design, but for now it is fine.
So in the worst case everyone is connected with everyone else, everyone knows everyone else.
This isn't a real use case, but I want to write a test for it to learn and play around. In a production environment the number of objects would be limited to about 20, but the ways those objects can be connected to each other are unlimited.
This illustrates the problem in a simplified way:
Given a specific person as starting point, I want to walk through the whole graph and examine every possible path exactly once without getting stuck in an infinite loop.
Let's imagine person A knows B, who knows C, and who knows A. The output might be:
A knows B knows C knows A (ok but we don't want to end in an infinite loop so we stop here)
A knows C knows A
A knows T knows R knows V
This would be stupid and must be eliminated:
A knows B knows C knows A knows C knows A knows T knows R knows V ...
I do have a couple of crazy ideas how to tackle this problem. But...
Question) Must I do that with an Iterative deepening depth-first search (IDDFS)?
Jon kindly pointed me to DFS on Wikipedia.
I'm stuck on this part of the article:
a depth-first search starting at A, assuming that the left edges in the shown graph are chosen before right edges, and assuming the search remembers previously-visited nodes and will not repeat them (since this is a small graph), will visit the nodes in the following order: A, B, D, F, E, C, G. The edges traversed in this search form a Trémaux tree, a structure with important applications in graph theory.
Specifically, this note:
"(since this is a small graph)"
OK so what if this is a huge graph?
Edit: I should mention that the author's title and question have changed so much that some of the information in this answer may not be 100% relevant.
As Jon has already mentioned, this is indeed, a graph. A directed graph in fact.
I suggest you look at Adjacency matrices, they will provide you with direct insight as to how you can reach a solution.
I imagine your original lazy solution was probably something akin to an adjacency list, which is fine, but isn't as easy to implement and may also be harder to traverse. There are two main differences between the two.
Adjacency lists store only the edges that actually exist (O(V + E) space, although each entry carries some overhead), so they are nicer in larger, sparser networks because no computation is spent on unconnected nodes; adjacency matrices are a little more friendly, but store data for every possible edge (O(V^2) space), whether it exists or not.
The primary concern I found with adjacency lists was not their theoretical space: in C++ I was storing each connected node as a pointer in a vector inside each node, which got way out of hand as soon as the network grew, and was very unfriendly to visualize, as well as to manage when adding and deleting nodes.
Adjacency matrices, by comparison, have a single reference for all nodes (they can be stored in a single vector of nodes) and can be modified easily.
If your question is truly about traversal, then with the graph implemented as an adjacency matrix (a vector of vectors), traversal is simple. See the pseudocode below:
To read, for each neuron, all the neurons its axon is connected to (i.e. the neuron's outputs):
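for (size_t i = 0; i < n; ++i) { // adjacency matrix is n * n, edges[from][to]
    Neuron& neuron = nodes[i];
    for (size_t j = 0; j < n; ++j) { // j, not i, drives the inner loop
        Axon_connection& connection = edges[i][j]; // row i: edges leaving neuron i
        if (connection.exists()) {
            ... // neuron i outputs to nodes[j]
        }
    }
}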
To read, for each neuron, all the neurons its dendrites are connected to (i.e. the neuron's inputs):
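for (size_t i = 0; i < n; ++i) { // adjacency matrix is n * n, edges[from][to]
    Neuron& neuron = nodes[i];
    for (size_t j = 0; j < n; ++j) { // j, not i, drives the inner loop
        Dendrite& dendrite = edges[j][i]; // column i: edges entering neuron i
        if (dendrite.exists()) {
            ... // nodes[j] feeds into neuron i
        }
    }
}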
Note this second method may not be cache friendly for big networks, depending on your implementation.
The exists method simply checks that the adjacency-matrix bit is set to true; you can then store other data, such as strengths, in these edges.
My friend, you have posted many very similar questions over the last day or two. I suggest you take a little bit of time out and read an introductory textbook on graph theory, or find some lectures on the subject.
Then you will at least know how to recognize and classify the standard problems. All you are going to get on SO are links back to such resources - it's not worth anyone's time writing out a fresh exposition. When you have a specific question, or are stuck understanding a particular issue, then ask and we will be happy to help, but you need to meet us half-way.
To answer your question, you can perform depth-first search and breadth-first search on an arbitrary graph as you have previously done on a tree; you just need to keep track of which nodes you have visited. Look out for this in any code/pseudocode you encounter. You don't have to keep track of visited nodes on a tree (as in your other questions), because a tree is a special instance of a graph (a connected acyclic graph) which cannot be "wildly interconnected".
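To illustrate, a recursive C++ DFS over a directed graph stored as out-adjacency lists (a hypothetical representation). Compared with DFS on a tree, the only addition is the visited check, which is what stops a cycle such as A -> B -> C -> A from looping forever, however large the graph is. On very large or very deep graphs, an explicit stack avoids running out of call-stack space.

```cpp
#include <vector>

// DFS over a directed graph; `order` receives the vertices in visiting order.
void dfs(const std::vector<std::vector<int>>& out, int v,
         std::vector<bool>& visited, std::vector<int>& order) {
    visited[v] = true;                 // remember v so cycles cannot revisit it
    order.push_back(v);
    for (int w : out[v])
        if (!visited[w]) dfs(out, w, visited, order);
}
```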
In answer to your original question, it is definitely theoretically possible to solve. However if you are after the shortest path then this looks suspiciously like the travelling salesman problem which is NP-hard.
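If the goal is the question's "examine every possible path exactly once", a backtracking DFS does it, sketched here in C++ with out-adjacency lists (hypothetical). Unlike ordinary DFS, a node counts as visited only while it is on the current path, so it can reappear in other paths, and every path without a repeated person is recorded exactly once. The number of such paths can grow factorially with the number of people, for example when everyone knows everyone.

```cpp
#include <vector>

// Records every path starting at the initial call's vertex in which no vertex repeats.
void allSimplePaths(const std::vector<std::vector<int>>& out, int v,
                    std::vector<bool>& onPath, std::vector<int>& path,
                    std::vector<std::vector<int>>& result) {
    onPath[v] = true;
    path.push_back(v);
    result.push_back(path);            // the current path is one of the answers
    for (int w : out[v])
        if (!onPath[w]) allSimplePaths(out, w, onPath, path, result);
    path.pop_back();                   // backtrack so v can appear in other paths
    onPath[v] = false;
}
```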
In any case, there are many different graph traversal algorithms (DFS, IDDFS, BFS, etc) which could be of use.
Your data structure is indeed a graph.
I hate to provide such a bare answer, but the question is so basic that Graph Traversal on Wikipedia is more than adequate. The two basic approaches are explained and there is also pseudocode.
One way (and not, necessarily, the best way) to do this is to modify the graph.
For example, say that the graph initially encodes A-->B-->C. If the edge A-->C does not exist, add the edge A-->C.
You can do this for each node in your graph to explicitly state which nodes know each other.
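Repeating this until no new edge appears computes the graph's transitive closure. A C++ sketch using Warshall's algorithm (a standard technique the answer does not name), assuming an n x n boolean adjacency matrix where reach[i][j] means i knows j. It runs in O(n^3), which is trivial for about 20 people.

```cpp
#include <cstddef>
#include <vector>

// After this runs, reach[i][j] is true exactly when j can be reached from i
// by following "knows" edges, i.e. every implied A --> C edge has been added.
std::vector<std::vector<bool>> transitiveClosure(std::vector<std::vector<bool>> reach) {
    const std::size_t n = reach.size();
    for (std::size_t k = 0; k < n; ++k)          // allow k as an intermediate person
        for (std::size_t i = 0; i < n; ++i)
            if (reach[i][k])
                for (std::size_t j = 0; j < n; ++j)
                    if (reach[k][j]) reach[i][j] = true;
    return reach;
}
```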

Using A* to solve Travelling Salesman

I've been tasked to write an implementation of the A* algorithm (heuristics provided) that will solve the travelling salesman problem. I understand the algorithm, it's simple enough, but I just can't see the code that implements it. I mean, I get it. Priority queue for the nodes, sorted by distance + heuristic(node), add the closest node on to the path. The question is, like, what happens if the closest node can't be reached from the previous closest node? How does one actually take a "graph" as a function argument? I just can't see how the algorithm actually functions, as code.
I read the Wikipedia page before posting the question. Repeatedly. It doesn't really answer the question: searching the graph is way, way different from solving the TSP. For example, you could construct a graph where the shortest node at any given time always results in a backtrack, since two paths of the same length aren't equal, whereas if you're just trying to go from A to B then two paths of the same length are equal.
You could construct a graph in which some nodes are never reached by always going to the closest node first.
I don't really see how A* applies to the TSP. I mean, finding a route from A to B, sure, I get that. But the TSP? I don't see the connection.
I found a solution here
Use minimum spanning tree as a heuristic.
Set
Initial State: Agent in the start city and has not visited any other city
Goal State: Agent has visited all the cities and reached the start city again
Successor Function: Generates all cities that have not yet been visited
Edge-cost: the distance between the cities represented by the nodes; use this cost to calculate g(n).
h(n): distance to the nearest unvisited city from the current city + estimated distance to travel all the unvisited cities (MST heuristic used here) + nearest distance from an unvisited city to the start city. Note that this is an admissible heuristic function.
You may consider maintaining a list of visited cities and a list of unvisited cities to facilitate computations.
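A C++ sketch of that h(n), assuming a complete, symmetric distance matrix dist and a list unvisited that excludes the current and start cities (hypothetical names). The MST over the unvisited cities is computed with Prim's algorithm in O(k^2) for k unvisited cities; when nothing is left unvisited, only the trip home remains.

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <vector>

// h(n) = (nearest unvisited city from the current city)
//      + (MST weight over the unvisited cities, via Prim's algorithm)
//      + (nearest distance from an unvisited city back to the start city).
double mstHeuristic(const std::vector<std::vector<double>>& dist,
                    int current, int start, const std::vector<int>& unvisited) {
    if (unvisited.empty()) return dist[current][start];  // only the trip home remains
    const double INF = std::numeric_limits<double>::infinity();
    double toNearest = INF, fromNearest = INF;
    for (int c : unvisited) {
        toNearest = std::min(toNearest, dist[current][c]);
        fromNearest = std::min(fromNearest, dist[c][start]);
    }
    // Prim's algorithm restricted to the unvisited cities.
    const std::size_t k = unvisited.size();
    std::vector<double> best(k, INF);  // cheapest edge linking city i to the tree so far
    std::vector<bool> inTree(k, false);
    best[0] = 0.0;
    double mst = 0.0;
    for (std::size_t step = 0; step < k; ++step) {
        std::size_t u = k;
        for (std::size_t i = 0; i < k; ++i)
            if (!inTree[i] && (u == k || best[i] < best[u])) u = i;
        inTree[u] = true;
        mst += best[u];
        for (std::size_t i = 0; i < k; ++i)
            if (!inTree[i]) best[i] = std::min(best[i], dist[unvisited[u]][unvisited[i]]);
    }
    return toNearest + mst + fromNearest;
}
```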
The confusion here is that the graph on which you are trying to solve the TSP is not the graph you are performing an A* search on.
See related: Sudoku solving algorithm C++
To solve this problem you need to:
Define your:
TSP states
TSP initial state
TSP goal state(s)
TSP state successor function
TSP state heuristic
Apply a generic A* solver to this TSP state graph
A quick example I can think up:
TSP states: list of nodes (cities) currently in the TSP cycle
TSP initial state: the list containing a single node, the travelling salesman's home town
TSP goal state(s): a state is a goal if it contains every node in the graph of cities
TSP successor function: can add any node (city) that isn't in the current cycle to the end of the list of nodes in the cycle to get a new state
The cost of the transition is equal to the cost of the edge you're adding to the cycle
TSP state heuristic: you decide
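Those definitions translate fairly directly into code. A C++ sketch under these assumptions: a complete distance matrix dist with non-negative weights, city 0 as the home town, few enough cities (roughly 16 or fewer) that 2^n * n states fit in memory, and a state of (bitmask of cities in the cycle, last city) instead of the full list, since the order of earlier cities no longer affects future costs. For the heuristic I picked "(edges still needed) * (shortest edge in the graph)", which never overestimates; the MST heuristic from the previous answer is never smaller, so it is a tighter option.

```cpp
#include <algorithm>
#include <bitset>
#include <functional>
#include <limits>
#include <queue>
#include <tuple>
#include <vector>

// A* over TSP states (mask of cities already in the cycle, last city added).
// Returns the length of the shortest tour that starts and ends at city 0.
double tspAStar(const std::vector<std::vector<double>>& dist) {
    const int n = static_cast<int>(dist.size());
    if (n <= 1) return 0.0;
    const unsigned full = (1u << n) - 1;
    double minEdge = std::numeric_limits<double>::infinity();
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
            if (i != j) minEdge = std::min(minEdge, dist[i][j]);
    // Edges still needed = cities not yet in the cycle + the final edge home.
    auto h = [&](unsigned mask) {
        const int inCycle = static_cast<int>(std::bitset<32>(mask).count());
        return (n - inCycle + 1) * minEdge;
    };

    using Entry = std::tuple<double, double, unsigned, int, bool>;  // f, g, mask, last, tour closed?
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> open;
    std::vector<std::vector<bool>> expanded(full + 1, std::vector<bool>(n, false));
    open.emplace(h(1u), 0.0, 1u, 0, false);        // initial state: just the home town
    while (!open.empty()) {
        const auto [f, g, mask, last, closedTour] = open.top();
        open.pop();
        (void)f;
        if (closedTour) return g;                  // goal: all cities visited and back home
        if (expanded[mask][last]) continue;
        expanded[mask][last] = true;
        if (mask == full) {                        // every city is in; add the edge home
            const double total = g + dist[last][0];
            open.emplace(total, total, mask, last, true);
            continue;
        }
        for (int c = 1; c < n; ++c) {              // successor: append a city not yet in the cycle
            if (mask >> c & 1u) continue;
            const double ng = g + dist[last][c];   // transition cost = edge added to the cycle
            const unsigned next = mask | (1u << c);
            open.emplace(ng + h(next), ng, next, c, false);
        }
    }
    return -1.0;  // not reached for a complete graph
}
```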
If it's just a problem of understanding the algorithm and how it works you might want to consider drawing a graph on paper, assigning weights to it and drawing it out. Also you can probably find some animations that show Dijkstra's shortest path, Wikipedia has a good one. The only difference between Dijkstra and A* is the addition of the heuristic, and you stop the search as soon as you reach the target node. As far as using it to solve the TSP, good luck with that!
Think about this a little more abstractly. Forget about A* for a moment; it's just Dijkstra's with a heuristic anyway. Before, you wanted to get from A to B. What was your goal? To get to B. The goal was to get to B with the least cost. At any given point, what was your current "state"? Probably just your location on the graph.
Now, you want to start at A, then go to both B and C. What is your goal now? To pass over both B and C, maintaining least cost. You can generalize this with more nodes: D, E, F, ... or just N nodes. Now, at any given point, what is your current "state"? This is critical: it ISN'T just your location in the graph--it's also which of B or C or whatever nodes you have visited so far in the search.
Implement your original algorithm so that it calls some function asking whether it has reached "the goal state" after each move. Before, the function would have just said "yes, you're at state B, therefore you are at the goal". But now, let that function return "yes, you're at the goal state" only if the search's path has passed over each of the points of interest. It'll know whether the search has passed over all points of interest because that's included in the current state.
After you get that, improve the search with some heuristic, and A* it up.
To answer one of your questions...
To pass a graph as a function argument, you have several options. You could pass a pointer to an array containing all the nodes. You could pass just the one starting node and work from there, if it's a fully connected graph. And finally, you could write a graph class with whatever data structures you need inside it, and pass a reference to an instance of that class.
As for your other question about closest nodes, isn't part of A* search that it will backtrack as needed? Or you could implement your own sort of backtracking to handle that kind of situation.
The question is, like, what happens if the closest node can't be reached from the previous closest node?
This step isn't necessary. As in, you aren't computing a path from the previous closest to the current closest, you are trying to get to your goal node, and the current closest is the only thing that matters (e.g. the algorithm doesn't care that last iteration you were 100km away, because this iteration you are only 96km away).
As a broad introduction, A* doesn't directly construct a path: it explores until it definitely knows that the path is contained within the region it has explored, and then constructs the path based on the information recorded during the exploration.
(I'm going to use the code in the Wikipedia article as a reference implementation to aid my explanation.)
You have two sets of nodes: closedset and openset.
closedset holds nodes that have been fully evaluated, that is, you know exactly how far they are from start and all their neighbours are in one of the two sets. Thus there is no more computation you can do with them, so we can (sort of) ignore them. (Basically these are completely contained within the border.)
openset holds "border" nodes, you know how far these are from start, but you haven't touched their neighbours yet, so they are on the edge of your search so far.
(Implicitly, there is a third set: completely untouched nodes. But you don't really touch them until they are in openset so they don't matter.)
At a given iteration, if you've got nodes to explore (that is, nodes in openset), you need to work out which one to explore. This is the job of the heuristic: it gives you a hint about which point on the border will be best to explore next by telling you which node it thinks will have the shortest path to goal.
The previous closest node is irrelevant, it just expanded the border a bit, adding new nodes to openset. These new nodes are now candidates for the closest node in this iteration.
At first, openset only contains start, but then you iterate and at each step the border is expanded a little (in the most promising direction), until you eventually reach goal.
When A* is actually doing the exploration, it doesn't worry about which nodes came from where. It doesn't need to, because it knows their distance from start and the heuristic function and that's all it needs.
However, to reconstruct the path later you need some record of it; this is what camefrom is. For a given node, camefrom links it to the previous node on the shortest known path from start, so you can reconstruct the shortest path by following the links backwards from goal.
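A C++ sketch of the loop just described, with hypothetical names: the priority queue plays the role of openset, closed is closedset, and cameFrom is camefrom. It assumes non-negative edge costs and a consistent heuristic h (one that never overestimates and never drops by more than an edge's cost across that edge), so a node's distance is final once it is closed. Passing a heuristic that always returns 0 turns it into Dijkstra's algorithm.

```cpp
#include <algorithm>
#include <functional>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

struct Edge { int to; double cost; };

// Returns the node sequence start..goal, or an empty vector if goal is unreachable.
std::vector<int> aStar(const std::vector<std::vector<Edge>>& graph, int start, int goal,
                       const std::function<double(int)>& h) {
    const int n = static_cast<int>(graph.size());
    const double INF = std::numeric_limits<double>::infinity();
    std::vector<double> g(n, INF);       // best known distance from start
    std::vector<int> cameFrom(n, -1);    // predecessor on that best path
    std::vector<bool> closed(n, false);  // closedset: fully evaluated nodes
    using Item = std::pair<double, int>; // openset entries: (g + h, node)
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> open;
    g[start] = 0.0;
    open.emplace(h(start), start);
    while (!open.empty()) {
        const int v = open.top().second;
        open.pop();
        if (closed[v]) continue;         // stale entry for an already-closed node
        if (v == goal) {                 // rebuild the path by following cameFrom
            std::vector<int> path;
            for (int x = goal; x != -1; x = cameFrom[x]) path.push_back(x);
            std::reverse(path.begin(), path.end());
            return path;
        }
        closed[v] = true;
        for (const Edge& e : graph[v]) {
            const double tentative = g[v] + e.cost;
            if (!closed[e.to] && tentative < g[e.to]) {
                g[e.to] = tentative;
                cameFrom[e.to] = v;
                open.emplace(tentative + h(e.to), e.to);
            }
        }
    }
    return {};
}
```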
How does one actually take a "graph" as a function argument?
By passing one of the representations of a graph.
I don't really see how A* applies to the TSP. I mean, finding a route from A to B, sure, I get that. But the TSP? I don't see the connection.
You need a different heuristic and a different end condition: the goal is no longer a single node, but the state of having everything connected, and your heuristic is some estimate of the length of the shortest path connecting the remaining nodes.
