Find connected-blocks with certain value in a grid - algorithm

I'm having trouble finding an algorithm for my problem.
I have a grid of 8x8 blocks, and each block has a value ranging from 0 to 9. I want to find collections of connected blocks that match a total value of, for example, 15. My first approach was to start off at the border, and that worked fine. But when starting in the middle of the grid, my algorithm gets lost.
Would anyone know a simple algorithm to use or can you point me in the right direction?
Thanks!

As far as I know, no simple algorithm exists for this. As for pointing you in the right direction, an 8x8 grid is really just a special case of a graph, so I'd start with graph traversal algorithms. I find that in cases like this, it sometimes helps to think how you would solve the problem for a smaller grid (say, 3x3 or 4x4) and then see if your algorithm scales up to "full size."
EDIT :
My proposed algorithm is a modified depth-first traversal. To use it, you'll have to convert your grid into a graph. The graph should be undirected, since connected blocks are connected equally in both directions.
Each graph node represents a single block, containing the block's value and a visited variable. Edge weights represent their edges' resistance to being followed. Set them by summing the values of the nodes they connect. Depending on the sum you're looking for, you may be able to optimize this by removing edges that are guaranteed to fail. For example, if you're looking for 15, you can delete all edges with weight of 16 or greater.
The rest of the algorithm will be performed as many times as there are blocks, with each block serving as the starting block once. Traverse the graph by following the lowest-weighted edge from the current node, unless that takes you to a visited node. Push each visited node onto a stack and set its visited variable to true. Keep a running sum for every path followed.
Whenever the desired sum is reached, save the current path as one of your answers. Do not stop traversal, because the current node could be connected to a zero.
Whenever the total exceeds the desired sum, backtrack by setting visited to false and popping the current node off the stack.
Whenever all edges for a given node have been explored, backtrack.
After every possible path from a given starting node is analyzed, every answer that includes that node has been found. So, remove all edges touching the starting node and choose a new starting node.
I haven't fully analyzed the efficiency/running time of this algorithm yet, but... it's not good. (Consider the number of paths to be searched in a graph containing all zeroes.) That said, it's far better than pure brute force.
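A minimal Python sketch of the backtracking traversal described above, under some assumptions: the grid is a list of lists of non-negative integers, blocks are connected horizontally and vertically, and the name find_paths_with_sum is made up for illustration. It skips the edge-weight ordering and the edge-removal step, and instead removes duplicates by storing each answer as a set of cells. Like the description, it only finds collections that form a single path, not branching shapes such as a T.

```python
from typing import List, Set, Tuple

Cell = Tuple[int, int]


def find_paths_with_sum(grid: List[List[int]], target: int) -> Set[frozenset]:
    """Return the distinct cell sets that form a simple path summing to target."""
    rows, cols = len(grid), len(grid[0])
    found: Set[frozenset] = set()

    def dfs(r: int, c: int, total: int, path: List[Cell], visited: Set[Cell]) -> None:
        total += grid[r][c]
        if total > target:
            return  # overshoot: values are non-negative, so backtrack
        path.append((r, c))
        visited.add((r, c))
        if total == target:
            # Save the answer but keep going: a neighbour may hold a zero.
            found.add(frozenset(path))
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in visited:
                dfs(nr, nc, total, path, visited)
        # All edges explored: backtrack.
        path.pop()
        visited.remove((r, c))

    # Every block serves as the starting block once.
    for r in range(rows):
        for c in range(cols):
            dfs(r, c, 0, [], set())
    return found


if __name__ == "__main__":
    for cells in find_paths_with_sum([[1, 2], [3, 4]], 3):
        print(sorted(cells))
```

On a full 8x8 grid with many zeroes this can still take a very long time, for the reason given above.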

Related

A star pathfinding. Why do you need to re-evaluate an adjacent node that's already in the open list if it has a lower g cost to the current node?

There is one thing about the A* pathfinding algorithm that I do not understand. In the pseudocode, if the current node's (the node being analysed) g cost is less than the adjacent node's g cost, then you recalculate the adjacent node's g, h and f costs and reassign its parent node.
Why do you do this?
Why do you need to re-evaluate the adjacent node's costs and parent if its g cost is greater than the current node's g cost? In what instance would you need to do this?
Edit: I am watching this video:
https://www.youtube.com/watch?v=C0qCR18gXdU
At 8:19 he says: When you come across blocks (nodes) that have already been analysed, the question is should we change the properties of the block?
First a tip. You can actually add the time you want as a bookmark to get a video that starts right where you want. In this case https://www.youtube.com/watch?v=C0qCR18gXdU#t=08m19s is the bookmarked time link.
Now the quick answer to your question. We fill in a node the first time we find a path to it. But the first path we found to it might not be the cheapest one. We want the cheapest one, and if we find a cheaper one second, we want it.
Here is a visual metaphor. Imagine a running path with a fence next to it. The spot we want is on the other side of the fence. Actually draw this out, it will help.
The first path that our algorithm finds to it is run down the path, jump over the fence. The second path that we find is run part way down the path, go through the gate, then get to that spot. We don't want to throw away the idea of using the gate just because we already figured out that we could get there by jumping the fence!
Now in your picture put costs of moving from one spot to another that are reasonable for moving along a running path, an open field, through a gate, and jumping a fence. Run the algorithm by hand and you'll see that you figure out first that you can jump the fence, and then later that you really wanted to use the gate.
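To make the fence-and-gate picture concrete, here is a small Python sketch. The graph, the node names, the costs and the function a_star are all invented for illustration, and the heuristic is assumed to be consistent (the demo uses a zero heuristic, which trivially is). The key line is the comparison new_g < g[...]: it is what replaces the "jump the fence" parent with the "use the gate" parent when the cheaper route turns up second.

```python
import heapq
from typing import Callable, Dict, List, Optional, Tuple

Graph = Dict[str, List[Tuple[str, float]]]


def a_star(graph: Graph, start: str, goal: str,
           h: Callable[[str], float]) -> Optional[Tuple[float, List[str]]]:
    """Return (cost, path) of a cheapest route, assuming h is consistent."""
    g: Dict[str, float] = {start: 0.0}
    parent: Dict[str, str] = {}
    open_heap: List[Tuple[float, str]] = [(h(start), start)]
    closed = set()
    while open_heap:
        _, node = heapq.heappop(open_heap)
        if node in closed:
            continue  # stale heap entry left over from an earlier, dearer path
        if node == goal:
            cost = g[node]
            path = [node]
            while node in parent:
                node = parent[node]
                path.append(node)
            return cost, path[::-1]
        closed.add(node)
        for nxt, step in graph.get(node, []):
            new_g = g[node] + step
            # Re-evaluate: a node already on the open list gets a new
            # g cost and a new parent if this route to it is cheaper.
            if new_g < g.get(nxt, float("inf")):
                g[nxt] = new_g
                parent[nxt] = node
                heapq.heappush(open_heap, (new_g + h(nxt), nxt))
    return None


if __name__ == "__main__":
    # Hypothetical costs: jumping the fence is expensive, the gate is cheap.
    field: Graph = {
        "start": [("path_end", 1), ("gate", 3)],
        "path_end": [("spot", 10)],  # jump over the fence
        "gate": [("spot", 2)],       # walk through the gate
    }
    print(a_star(field, "start", "spot", lambda n: 0.0))
```

Here "spot" is first reached through the fence with g = 11 and is later re-parented to "gate" with g = 5, which is the situation the video is talking about.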
This guy is totally wrong because he says to change the parent node. However, your successors are based on your parent node, and if you change the parent node then you can't have a valid path, because the path is formed simply by moving from parent to child.
Instead of changing the parent, use the Pathmax function. It says that if a parent node A has a child node B whose cost, heuristic(B) + accumulated cost(B), is less than the cost of A, then set the cost of the child equal to the cost of the parent.
PathMax ensures monotonicity:
Monotonicity: every child node has a cost greater than or equal to the cost of its parent node.
A* has a property: if your cost is monotonically increasing, then the first (sub)path that A* finds is always part of the final path. More precisely: under monotonicity, each node is reached first through the best path.
Do you see why?
Suppose you have a graph: (A,B), (B,C), (A,E), (E,D), where every tuple means the two nodes are connected. Suppose the cost is monotonically increasing and your algorithm chooses (A,B), (B,C). At this point you know your algorithm has chosen the best path so far, and every other path that can reach this node must have a higher cost. But if the cost is not monotonically increasing, then it can be the case that the cost of (A,E) is greater than your current cost while the cost of (E,D) is zero, so you have a better path there.
This algorithm relies on its heuristic function. If it is underestimated, it gets corrected by the accumulated cost, but if it is overestimated, then it can explore extra nodes, and I leave it to you to work out why this happens.
Why do you need to re-evaluate an adjacent node that's already in the open list if it has a lower g cost to the current node?
Don't do this because it's just extra work.
Corollary: if you later come to the same node from a node p with the same cost, then simply remove that node from the queue. Do not extend it.

How to search through different parts of a graph?

Recently I met a coding problem that asks you to find out how many different "closed" sub-graphs there are in a given graph. After you have found that out, you need to search each sub-graph and find how many elements there are in each sub-graph. So now to define a sub-graph. Let's say we're given
.#####.
#.....#
#.E##.#
#.#.#.#
#.#####
#E..#E#
.#####.
Think of it like a maze where dots represent moving space, while the hashtags are walls, and you can move horizontally or vertically. So let's say you are at one point in the graph. All the points you can reach by moving horizontally or vertically are part of that particular "closed" sub-graph. So for the given example we have 3 "closed" sub-graphs
1#####1
#11111#
#11##1#
#1#2#1#
#1#####
#111#3#
1#####3
Also there are 2 elements in the first sub-graph, no elements in the second and one in the 3rd.
I guess it really doesn't matter what searching method you use, so I used BFS starting from the first entered dot in the line. Once I've reached all possible points starting from that particular point, I have found one sub-graph and I have counted how many elements there are in the sub-graph. But the problem now is how to find the starting point of the next sub-graph. I cannot think of another way than iterating through the graph until you find a non-visited point and then applying the BFS repeatedly until you have visited all the points. But this method proves to be too time-consuming, so is there any way I can efficiently find the sub-graphs? For example, is there a way to put at least one point from each sub-graph in a queue while you're entering the line?
Instead of iterating through the entire graph to find non-visited points, you could try iterating through just the walls adjacent to your known sub-graphs and look for non-visited points adjacent to those walls. You would be able to compile the list of walls during the BFS.
You can first iterate over the graph and find all possible starting points, hold these elements in a set data structure, and when you do a BFS, remove each point you find from the set of entry points.
Now, at each iteration - all you have to do is choose a random entry point from the set (which is guaranteed to be unvisited yet), and do a new BFS.
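For comparison, here is a Python sketch of the plain scan-and-BFS approach from the question, assuming the maze is given as a list of equal-length strings with '#' for walls and 'E' for elements (the function name count_regions is made up). Each cell is marked as seen at most once and the outer scan touches each cell once, so the total work is proportional to the number of cells; this suggests that a slow run may come from something other than the restart step, such as resetting the visited marks between searches.

```python
from collections import deque
from typing import List


def count_regions(lines: List[str]) -> List[int]:
    """Return the number of 'E' cells in each region, in scan order."""
    rows, cols = len(lines), len(lines[0])
    seen = [[False] * cols for _ in range(rows)]
    counts: List[int] = []
    for r in range(rows):
        for c in range(cols):
            if lines[r][c] == "#" or seen[r][c]:
                continue
            # First unvisited open cell of a new region: flood it with BFS.
            seen[r][c] = True
            queue = deque([(r, c)])
            elements = 0
            while queue:
                cr, cc = queue.popleft()
                if lines[cr][cc] == "E":
                    elements += 1
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = cr + dr, cc + dc
                    if (0 <= nr < rows and 0 <= nc < cols
                            and lines[nr][nc] != "#" and not seen[nr][nc]):
                        seen[nr][nc] = True  # mark on enqueue, never twice
                        queue.append((nr, nc))
            counts.append(elements)
    return counts


if __name__ == "__main__":
    maze = ["E.#E",
            "E.#.",
            "##.#"]
    print(count_regions(maze))  # one entry per region
```

The length of the returned list is the number of sub-graphs, and the seen array is shared by all searches rather than rebuilt for each one.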

Marking special nodes that traverses a certain length in a graph to fill up every node

So I am working on a problem where you have one graph, in which every edge has a certain weight. The algorithm is supposed to select certain node(s) in the graph, and the selected nodes must be able to traverse/span all other nodes within a certain total weight. The output should be the minimum number of selected nodes and the positions of the selected nodes (it doesn't matter which positions it outputs as long as it is the minimum number of selected nodes).
I have thought of a couple simple solutions but neither of them seem too good so far.
Brute force by trying out every node. The algorithm starts with one selected node and then tries every combination of selected nodes, looping through every other non-selected node to check whether they are all within range. If not, it adds one extra selected node and repeats the same process.
The algorithm makes a list of the subgraphs that can be traversed with 1 selected node at every node position. Then it attempts to puzzle-fit the subgraphs so that they re-create the original graph, and if it succeeds, that would be the solution.
As an example, here's a picture of a grid as the graph.
The weight of every edge in this grid is 1, and each selected node can travel through a total weight of 2.
I'm not too familiar with graph problems so if there is a similar question out there or if anyone can provide any help with the solution that would be great!
First, you can get rid of the edge weights by simply making an edge between two nodes if they are within the distance limit. Then you need to find a subset of the nodes such that each node is either selected or neighbor of a selected node. This is known as the Dominating Set problem. Unfortunately, it is NP-hard, so typically it is solved heuristically or using Integer Linear Programming. It might be possible to take advantage of certain properties of the input, but it is hard to tell without knowing more about what they look like.
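As a rough Python sketch of that reduction: the first function collects, for every node, the set of nodes within the distance limit (which replaces the weighted edges), and the second searches for a smallest set of nodes whose coverage sets include everything. The names reachable_within and min_dominating_set are hypothetical, the graph is assumed to be an adjacency dict with non-negative weights and mutually comparable node labels, and the search is plain exponential brute force, so it is only usable on small inputs; it is not the heuristic or ILP approach mentioned above.

```python
import heapq
from itertools import combinations
from typing import Dict, Hashable, List, Set, Tuple

Graph = Dict[Hashable, List[Tuple[Hashable, float]]]


def reachable_within(graph: Graph, source: Hashable, limit: float) -> Set[Hashable]:
    """Nodes whose shortest distance from source is at most limit."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale entry
        for nxt, w in graph[node]:
            nd = d + w
            if nd <= limit and nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                heapq.heappush(heap, (nd, nxt))
    return set(dist)


def min_dominating_set(graph: Graph, limit: float) -> List[Hashable]:
    """Smallest list of nodes that together cover every node (brute force)."""
    nodes = list(graph)
    cover = {n: reachable_within(graph, n, limit) for n in nodes}
    everything = set(nodes)
    for k in range(1, len(nodes) + 1):
        for chosen in combinations(nodes, k):
            if set().union(*(cover[n] for n in chosen)) == everything:
                return list(chosen)
    return []


if __name__ == "__main__":
    # A path 0-1-2-3-4 with unit weights; the middle node covers all at limit 2.
    path = {i: [(j, 1) for j in (i - 1, i + 1) if 0 <= j <= 4] for i in range(5)}
    print(min_dominating_set(path, 2))
```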

Why does Dijkstra's algorithm need to keep track of the number of steps?

I can understand keeping track of the accumulated distance, the distance per path, and keeping track of the name (or position) of the vertex, but why keep track of the number of steps unless you are wanting to track how efficiently it reached its destination?
The step is totally unnecessary for finding the path, and it seems rather arbitrary anyway. For instance, if you have multiple vertices where the accumulated distance is the same, and the smallest number, there is no reason to care which one you start from, but whichever one it is gets labelled with the next step in line.
I see many pieces of code around, and they generally follow this principle of keeping track of the steps. It seems very strange, especially when many of them are pathfinding on a 2D matrix where the cost of movement is either 1 or infinite. In that case, it seems to me that not only is the number of steps per vertex superfluous, but the only information necessary to be bothered with is the distance and the label of the vertex. If you have a distance, you know you have visited the vertex, and since all distances are the same, the first time you reach a vertex should always be its lowest distance. No evaluating whether it is lower or greater is necessary, only that it exists.
Anyway, I'm just curious why something so simple should have superfluous information gathered. Is there some reason for it I'm just not grasping?
EDIT--
To add a little clarity, and since it wasn't formatting properly in the comment, the step is normally shown in the table people tell you to use.
____________________
|name|step|distance|
--------------------
|temporary Labels |
--------------------
The step is added when a position is the next shortest point to the origin.
Okay, I have seen that video now and it’s actually the first time I have ever seen such a table being used. It does not make much sense to me. It completely mixes “labels” with “distances”; a permanent label is the order in which nodes were marked, while temporary labels are the current non-fixed distances. Neither of these are necessary at all.
Instead what you usually have for a node is the following: The distance (from the start node), the parent (or previous) node, and a mark to mark a node as completed or not (in an implementation you usually have a priority queue for all unmarked nodes instead).
You then keep looking at the unmarked node with the smallest total distance, mark it and update the distance of all the unmarked neighbors. And whenever you update to a shorter distance you also update the parent node.
In no way, though, do you need to have the order in which you marked the nodes as completed, or to have all the previous incomplete distances. To me, in that video, it seems as if it’s just a way to make it easier to check a student’s work, as without identical distances you always have a single order in which you would look at the vertices.
That being said, the normal Dijkstra algorithm does not include this stuff, and it’s not necessary. See the pseudocode on Wikipedia for implementation details on what you actually store (as said, you usually have only the distance and parent for each node, and a priority queue for the unmarked nodes).
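A minimal Python sketch of that bookkeeping, assuming the graph is an adjacency dict of (neighbour, weight) pairs with non-negative weights; the function name and the sample graph are invented for illustration. It stores only a distance and a parent per node, plus a priority queue and a set of completed nodes, and no step counter appears anywhere.

```python
import heapq
from typing import Dict, Hashable, List, Optional, Tuple

Graph = Dict[Hashable, List[Tuple[Hashable, float]]]


def dijkstra(graph: Graph, start: Hashable):
    """Return (dist, parent) for every node reachable from start."""
    dist: Dict[Hashable, float] = {start: 0}
    parent: Dict[Hashable, Optional[Hashable]] = {start: None}
    done = set()                 # nodes whose distance is final
    heap = [(0, start)]          # priority queue of (distance, node)
    while heap:
        d, node = heapq.heappop(heap)
        if node in done:
            continue             # outdated queue entry
        done.add(node)           # mark the closest unmarked node
        for nxt, w in graph.get(node, []):
            nd = d + w
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd       # shorter distance found...
                parent[nxt] = node   # ...so update the parent too
                heapq.heappush(heap, (nd, nxt))
    return dist, parent


if __name__ == "__main__":
    g = {"A": [("B", 4), ("C", 1)], "C": [("B", 2)], "B": [("D", 5)]}
    dist, parent = dijkstra(g, "A")
    print(dist, parent)
```

The path to any node can be read back by following parent links until None is reached.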
It seems very strange, especially when many of them are pathfinding on a 2D matrix where the cost of movement is either 1 or infinite.
What you are describing here is a very special case. The Dijkstra algorithm is actually used for many graph problems where distances are not equal, and with more connections than just 4 simple neighbors in every direction.

An algorithm to check if a vertex is reachable

Is there an algorithm that can check, in a directed graph, if a vertex, let's say V2, is reachable from a vertex V1, without traversing all the vertices?
You might find a route to that node without traversing all the edges, and if so you can give a yes answer as soon as you do. Nothing short of traversing all the edges can confirm that the node isn't reachable (unless there's some other constraint you haven't stated that could be used to eliminate the possibility earlier).
Edit: I should add that it depends on how often you need to do queries versus how large (and dense) your graph is. If you need to do a huge number of queries on a relatively small graph, it may make sense to pre-process the data in the graph to produce a matrix with a bit at the intersection of any V1 and V2 to indicate whether there's a connection from V1 to V2. This doesn't avoid traversing the graph, but it can avoid traversing the graph at the time of the query. I.e., it's basically a greedy algorithm that assumes you're going to eventually use enough of the combinations that it's easiest to just traverse them all and store the result. Depending on the size of the graph, the pre-processing step may be slow, but once it's done executing a query becomes quite fast (constant time, and usually a pretty small constant at that).
Depth first search or breadth first search. Stop when you find one. But there's no way to tell there's none without going through every one, no. You can sometimes improve the performance with heuristics, if you have additional information about the graph. For example, if the graph represents a coordinate space like a real map, and most of the time you know that there's going to be a mostly direct path, then you can attempt to have the depth-first search look along lines that "aim towards the target". However, imagine the case where the start and end points are right next to each other, but with no vector in between, and to find a path, you have to go way out of the way. You have to check every case in order to be exhaustive.
I doubt it has a name, but a breadth-first search might go like this:
Add V1 to a queue of nodes to be visited
While there are nodes in the queue:
    Remove the next node from the queue
    If the node is V2, return true
    Mark the node as visited
    For every node at the end of an outgoing edge which is not yet visited:
        Add this node to the queue
    End for
End while
Return false
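The same breadth-first search as runnable Python, as a sketch under the assumption that the directed graph is stored as a dict mapping each vertex to a list of its successors (the function name is_reachable is made up). It marks vertices as visited when they are added to the queue rather than when they are removed, so no vertex is queued twice.

```python
from collections import deque
from typing import Dict, Hashable, Iterable


def is_reachable(graph: Dict[Hashable, Iterable[Hashable]],
                 v1: Hashable, v2: Hashable) -> bool:
    """Return True if v2 can be reached from v1 by following directed edges."""
    visited = {v1}
    queue = deque([v1])
    while queue:
        node = queue.popleft()
        if node == v2:
            return True          # stop as soon as the target is found
        for nxt in graph.get(node, ()):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(nxt)
    return False                 # everything reachable from v1 was examined


if __name__ == "__main__":
    g = {1: [2], 2: [3], 4: [1]}
    print(is_reachable(g, 1, 3), is_reachable(g, 3, 1))
```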
Create an adjacency matrix when the graph is created. At the same time you do this, create matrices consisting of the powers of the adjacency matrix up to the number of nodes in the graph. To find if there is a path from node u to node v, check the matrices (starting from M^1 and going to M^n) and examine the value at (u, v) in each matrix. If, for any of the matrices checked, that value is greater than zero, you can stop the check because there is indeed a connection. (This gives you even more information as well: the power tells you the number of steps between nodes, and the value tells you how many paths there are between nodes for that step number.)
(Note that if you know the number of steps in the longest path in your graph, for whatever reason, you only need to create a number of matrices up to that power. As well, if you want to save memory, you could just store the base adjacency matrix and create the others as you go along, but for large matrices that may take a fair amount of time if you aren't using an efficient method of doing the multiplications, whether from a library or written on your own.)
It would probably be easiest to just do a depth- or breadth-first search, though, as others have suggested, not only because they're comparatively easy to implement but also because you can generate the path between nodes as you go along. (Technically you'd be generating multiple paths and discarding loops/dead-end ones along the way, but whatever.)
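A small Python sketch of the matrix-power check, using the memory-saving variant in which the powers are computed on the fly rather than stored. It assumes the graph is given as a square adjacency matrix of 0/1 integers with vertices numbered from 0; the function name is hypothetical and the multiplication is the naive cubic one, so this is for illustration rather than for large graphs.

```python
from typing import List, Optional


def steps_to_reach(adj: List[List[int]], u: int, v: int) -> Optional[int]:
    """Return the smallest k in 1..n with (adj^k)[u][v] > 0, or None."""
    n = len(adj)
    power = [row[:] for row in adj]          # power holds adj^k, starting at k = 1
    for k in range(1, n + 1):
        if power[u][v] > 0:
            return k                         # a walk of exactly k edges exists
        # Multiply by adj once more to get adj^(k+1).
        power = [[sum(power[i][m] * adj[m][j] for m in range(n))
                  for j in range(n)]
                 for i in range(n)]
    return None                              # no walk of length 1..n: unreachable


if __name__ == "__main__":
    # Edges 0 -> 1 -> 2
    adj = [[0, 1, 0],
           [0, 0, 1],
           [0, 0, 0]]
    print(steps_to_reach(adj, 0, 2), steps_to_reach(adj, 2, 0))
```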
In principle, you can't determine that a path exists without traversing some part of the graph, because the failure case (a path does not exist) cannot be determined without traversing the entire graph.
You MAY be able to improve your performance by searching backwards (search from destination to starting point), or by alternating between forward and backward search steps.
Any good AI textbook will talk at length about search techniques. Elaine Rich's book was good in this area. Amazon is your FRIEND.
You mentioned here that the graph represents a road network. If the graph is planar, you could use Thorup's Algorithm, which creates an O(n log n) space data structure that takes O(n log n) time to build and answers queries in O(1) time.
Another approach to this problem would allow you to ignore all of the vertices. If you were to only look at the edges, you can produce a transitive closure array that will show you each vertex that is reachable from any other vertex.
Start with your list of edges:
Va -> Vc
Va -> Vd
....
Create an array with start location as the rows and end location as the columns. Fill the arrays with 0. For each edge in the list of edges, place a one in the start,end coordinate of the edge.
Now you iterate a few times until either V1,V2 is 1 or there are no changes.
For each row:
    NextRowN = RowN
    For each column that is true for RowN
        Use boolean OR to OR in the results of that row of that number with the current NextRowN.
    Set RowN to NextRowN
If you run this algorithm until the end, you will quickly have a complete list of all reachable vertices without looking at any of them. The runtime is proportional to the number of edges. This would work well with a reasonable implementation and a reasonable number of edges.
A slightly more complex version of this algorithm would be to only calculate the vertices reachable by V1. To do this, you would focus your scope on the ones that are currently reachable at any given time. You can also limit adding rows to only one time, since the other rows are never changing.
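The row-OR iteration can be sketched in Python as follows, assuming vertices are numbered 0 to n-1 and edges are (start, end) pairs. Each row of the array is held in a single Python integer used as a bit set, which is an implementation choice made here and not part of the description above; the function names are made up.

```python
from typing import List, Tuple


def transitive_closure(n: int, edges: List[Tuple[int, int]]) -> List[int]:
    """Return one bit-set row per vertex; bit b of row a means a reaches b."""
    rows = [0] * n
    for a, b in edges:
        rows[a] |= 1 << b            # place a one at (start, end)
    changed = True
    while changed:                   # iterate until there are no changes
        changed = False
        for i in range(n):
            new_row = rows[i]
            for j in range(n):
                if rows[i] >> j & 1:         # column j is true for row i
                    new_row |= rows[j]       # OR in everything j can reach
            if new_row != rows[i]:
                rows[i] = new_row
                changed = True
    return rows


def reaches(rows: List[int], a: int, b: int) -> bool:
    """Look up whether vertex b is reachable from vertex a."""
    return bool(rows[a] >> b & 1)


if __name__ == "__main__":
    closure = transitive_closure(4, [(0, 1), (1, 2), (2, 3)])
    print(reaches(closure, 0, 3), reaches(closure, 3, 0))
```

Once the closure is built, each query is a single bit test, which matches the precomputed-matrix idea mentioned in an earlier answer.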
In order to be sure, you either have to find a path, or traverse all vertices that are reachable from V1 once.
I would recommend an implementation of depth first or breadth first search that stops when it encounters a vertex that it has already seen. The vertex will be processed on the first occurrence only. You need to make sure that the search starts at V1 and stops when it runs out of vertices or encounters V2.

Resources