Choosing a greedy algorithm to find the lowest-cost path

I have a pyramid of numbers. Each number represents the number of points associated with it. I need to use a greedy algorithm to find the path with the lowest cost to get from the top of the pyramid to the bottom. I've read about uninformed and informed search algorithms, but I still don't know what to choose. What do you think is best suited for this type of problem: greedy best-first search, A* search, or something else? It's such a simple issue, but I'm not familiar enough with all these algorithms to know what the best option is. And as I said, it has to be a greedy algorithm.

If I am understanding you correctly, in your pyramid you always have the option of descending to the left or to the right, and the cost of getting to the bottom is the sum of all the nodes you pass through.
In this case, simply work your way up from the bottom. Start at the 2nd row from the bottom. For each node in the row, look at its left and right children in the row below. Add the cost of the cheaper child node to the node you are on. Move up a row and repeat, until you are at the root/peak. Each node will now contain the cost of the cheapest path from there to the bottom. Just greedily descend by choosing the child node with the cheaper cost.
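A minimal sketch of that bottom-up pass plus the final greedy descent, assuming the pyramid is given as a list of rows (the names here are mine, not from the question):

    def cheapest_path(pyramid):
        # pyramid: list of rows, e.g. [[1], [2, 3], [4, 5, 6]]
        # cost[i][j] ends up holding the cheapest cost from (i, j) to the bottom.
        cost = [row[:] for row in pyramid]          # work on a copy
        for i in range(len(pyramid) - 2, -1, -1):   # 2nd row from the bottom, upwards
            for j in range(len(pyramid[i])):
                cost[i][j] += min(cost[i + 1][j], cost[i + 1][j + 1])

        # Greedy descent: at each step take the child with the cheaper cost.
        path, j = [pyramid[0][0]], 0
        for i in range(1, len(pyramid)):
            j = j if cost[i][j] <= cost[i][j + 1] else j + 1
            path.append(pyramid[i][j])
        return cost[0][0], path

After the bottom-up pass, cost[0][0] is the cheapest total, and the descent is genuinely greedy: the precomputation is what makes the local choice safe.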

If you aren't forced to use a greedy algorithm, don't: greedy isn't correct here.
For this kind of problem you would naturally use a technique called "dynamic programming".
You initialize all squares of your pyramid (in a backup copy) with infinity, except the starting point, which gets its own value.
Then you process the pyramid from top to bottom, row by row.
From the first row (which contains only the top node) you try every move you can make, updating the nodes in the second row by giving them the value of the top plus their own value. Then you move to the second row and update the nodes in the third row the same way.
It is possible that you've already found a better route to a node (coming from the node one place to the left), so you only update if the newly created route is cheaper. (That's why you initialized with infinity: at the beginning you don't know whether any route exists at all.) After you finish processing a level of the pyramid this way, you know you have the best possible routes to the nodes in the level just below.
Even if it sounds a bit complicated, it's quite easy to implement; I hope it won't give you any trouble.
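For what it's worth, here is a rough sketch of that top-down version, with the infinity initialization described above (variable names are my own):

    import math

    def best_costs_top_down(pyramid):
        # best[i][j] = cheapest known cost of any route from the apex to (i, j).
        best = [[math.inf] * len(row) for row in pyramid]
        best[0][0] = pyramid[0][0]                  # the apex costs its own value
        for i in range(len(pyramid) - 1):
            for j, cost_here in enumerate(best[i]):
                if cost_here == math.inf:           # no route reaches this node
                    continue
                for dj in (0, 1):                   # left and right child
                    candidate = cost_here + pyramid[i + 1][j + dj]
                    if candidate < best[i + 1][j + dj]:
                        best[i + 1][j + dj] = candidate   # found a faster route
        return min(best[-1])                        # cheapest cost to the bottom row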

What you want is Dijkstra's algorithm; it is simpler than A* search, but I guess a DFS would do the job too. I'm not sure what you really want.

Related

Solving the logic game Lights Out with the A* algorithm

I have some problems solving the logic puzzle called Lights Out using the A* algorithm. For now, I'm using an implementation of the A* algorithm where I consider the entire matrix of lights as a node in the algorithm (where 1 represents a light that is on and 0 one that is off), together with the coordinates of the current node that will be toggled. After I select the node from the open list with the lowest f score, I toggle it, get its 8 adjacent neighbors, append them to the open list, and repeat until I find a node where the sum of all the lights equals 0 (all the lights are off).
For calculating the f score of each node, I simply compute the sum of all the lights in its local matrix, thus each time selecting the node whose matrix has the lowest number of lights on.
I know that the algorithm will not be very performant, even compared to the "Chasing the Lights" method, but I do not understand how to tell the algorithm which node to pick next, i.e. which f-scoring function to use, because using the sum of the lights in the matrix ends up with the algorithm looping through the same 3 or 4 nodes every time.
Also, I would like some suggestions on how to represent a node for the algorithm, since I can't see how to use an algorithm generally meant for path optimization inside a matrix, where you have a goal node, in a situation like this, where the entire matrix is the node and the goal is not reaching a particular node but just checking that its sum is 0.
The language I implemented all of this in is Lua.
Thank you.
EDIT 5/27/19
Since I'm new to Lua, I'm blaming my mistakes, my ability to write code in it, and my understanding of the algorithm for the fact that I'm not able to find the solution.
I wasn't good at explaining the problem I was having, so I tried to take the best from the comments I received, and now I will post the modified code so that, if you guys want to help, you will understand better (code >> words haha).
Note: I wrote the algorithm based on this article A* algorithm
lua source code
Not sure what you're looking for.
First of all, a simple observation: you want to toggle each field on your game board at most once (toggling a field twice doesn't do anything).
You can go with a very bad approach and create 2^n game states (nodes in your graph), where n is the number of fields on your game board. Why is it terrible? It would take you at least O(2^n) time and space just to create the graph. In O(2^n) time you could instead check all possible move sets (since you want to toggle each field at most once) and immediately report the result, rather than running an additional A*.
Better idea(?): let's not physically create the whole graph. You don't want to visit a node twice, so you should store the already-visited nodes somewhere (probably as a set of bitmasks, where you can quickly check whether a node is already in the set). Then, when you're at some node, you check all your neighbours - the game states reachable by toggling one field. If they're visited, ignore them; otherwise add them to your priority queue. Then take the first node from the priority queue, remove it, and 'go' to the game state it represents.
As I said, you can represent a game state as a bitmask of size n: 0 in the i-th position if the i-th field hasn't been toggled, 1 otherwise.
Will that be better than the naive approach? It depends on your heuristic function. I have no idea which one is best; you have to try some and check the results. In the worst case your A* will check every node, making the complexity worse than simple brute force. If you get lucky, it may speed things up significantly.
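To make the bitmask idea concrete, here's a rough sketch in Python (your code is in Lua, so treat this as pseudocode). It represents the lights themselves as an n*n-bit mask, a close cousin of the toggle-mask above, and generates neighbours by simulating one press. Note the classic rules toggle the pressed light plus its four orthogonal neighbours; adjust the offsets if your variant really toggles all eight.

    import heapq

    def neighbours(state, n):
        # state is an n*n-bit mask of which lights are ON. Pressing (r, c)
        # flips that light and its four orthogonal neighbours.
        for r in range(n):
            for c in range(n):
                mask = 1 << (r * n + c)
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = r + dr, c + dc
                    if 0 <= nr < n and 0 <= nc < n:
                        mask |= 1 << (nr * n + nc)
                yield state ^ mask

    def h(state):
        # Lights still on, mirroring the question's f score. NOT admissible:
        # one press can clear up to five lights, so ceil(lights / 5) is safer.
        return bin(state).count("1")

    def solve(start, n):
        # A* over bitmask states; the visited set stops the 3-4 node loop.
        open_heap = [(h(start), 0, start)]
        visited = set()
        while open_heap:
            _, g, state = heapq.heappop(open_heap)
            if state == 0:
                return g                  # number of presses used
            if state in visited:
                continue
            visited.add(state)
            for nxt in neighbours(state, n):
                if nxt not in visited:
                    heapq.heappush(open_heap, (g + 1 + h(nxt), g + 1, nxt))
        return None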

What's the best pathfinding algorithm in terms of complexity?

I need to implement a pathfinding algorithm in one of my programs. The goal is to know whether a path exists or not. As a consequence, knowing the path itself isn't important.
I already did some research and I am not sure which one to pick. This post suggests that a DFS or a BFS would be more suitable for this kind of program, but I'd rather have confirmation given the exact situation. I would also be interested in knowing the complexity of the program itself, but I guess I can find that out; it's fine if it's not shared.
Here's the graph I am using: let's say I have a x*y grid with zones the path can and cannot take.
I want to know if there is an existing path that starts from the top of the graph and ends on the bottom of the graph. Here's an example with the path in red:
I believe DFS is the best in terms of complexity, but I am not sure exactly how to implement it given the different start points the path can take. I am not sure whether it's better to launch the DFS from each of the different points the path can start at, or to add an extra layer of traversable zones so that a single run does the job.
Thank you for your help!
There are a number of different approaches that you can take here. Assuming that the grids you're working with are of roughly the size that you're showing above, and assuming you aren't, say, processing millions of grids at once, chances are that both breadth-first search and depth-first search would work equally well. The advantage of breadth-first search is that it will find the shortest path from anywhere in the top to anywhere in the bottom; the disadvantage is that it typically requires more memory than depth-first search. But again, if you're working with grids on the order of, say, hundreds or thousands of cells each, chances are that this memory overhead isn't going to be too much of a problem. I'd say to pick whichever algorithm you feel most comfortable working with and go with it.
As for how to implement a search from "anywhere in the top" to "anywhere in the bottom," you can achieve this in a few different ways.
If you're using a depth-first search, you can run one depth-first search from each of the cells in the top row and search for a path down to the bottom row. DFS requires you to maintain some information about which cells have and have not been visited. If you recycle this same information across all the calls to DFS, you'll ensure that no two calls do any duplicated work, and so the resulting solution should be very efficient, running in time O(mn) for an m × n grid.
If you're using a breadth-first search, the modification is pretty straightforward: instead of just enqueuing a single start point in the queue at the beginning of the search, enqueue every cell in the top row at the beginning of the search. The BFS will then naturally explore all possible paths starting anywhere in the top row.
Both of these ideas can be thought of in a different way. Imagine your grid is a graph where each cell is a node and edges correspond to pairs of adjacent cells. You can then add in a new node that sits above the top row of the grid and is connected to each of the nodes in the top row. You then add in a new node that sits just below the bottom row and is connected to each of the nodes in the bottom row. Now, if there's a path from the new top node to the new bottom node, it means that there's a path from some node in the top row to some node in the bottom row, so doing a single search in this graph will be sufficient to check if a path exists. (Fun fact: the two above modifications to DFS and BFS can each be thought of as implicitly doing a search in this new graph.)
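Here's a minimal sketch of the multi-source BFS variant, assuming the grid is a 2D list of booleans marking passable cells (the representation is an assumption on my part):

    from collections import deque

    def path_exists(grid):
        # Seed the queue with every passable cell in the top row, then BFS.
        rows, cols = len(grid), len(grid[0])
        queue = deque((0, c) for c in range(cols) if grid[0][c])
        seen = set(queue)
        while queue:
            r, c = queue.popleft()
            if r == rows - 1:
                return True                 # reached the bottom row
            for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if (0 <= nr < rows and 0 <= nc < cols
                        and grid[nr][nc] and (nr, nc) not in seen):
                    seen.add((nr, nc))
                    queue.append((nr, nc))
        return False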
There's another option you might want to consider that's fairly easy to implement and imperceptibly less efficient than DFS or BFS, and that's to use a disjoint-set forest data structure to determine what's connected. This data structure supports two kinds of queries:
Given two cells, mark that there's a way to get from the first cell to the second. ("Union")
Given two cells, determine whether there's a path between them, which can be a direct path or could be formed by chaining together multiple other paths. ("Find")
You could implement your connectivity query by building a disjoint-set forest, unioning together all pairs of adjacent cells, and then unioning together all nodes in the top row and unioning all nodes in the bottom row. Doing a "find" query to see if any one of the top nodes is connected to any of the bottom nodes will then solve your problem. This will take time O(mn α(mn)) for a function α(mn) that grows so slowly that it's essentially three or four, so it's effectively as efficient as BFS or DFS.
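And a sketch of the disjoint-set approach, folding in the virtual top/bottom nodes from above. (This simplified version uses path halving but skips union-by-rank, which is fine in practice for grids this size.)

    class DisjointSet:
        def __init__(self, n):
            self.parent = list(range(n))

        def find(self, x):
            while self.parent[x] != x:      # path halving
                self.parent[x] = self.parent[self.parent[x]]
                x = self.parent[x]
            return x

        def union(self, a, b):
            self.parent[self.find(a)] = self.find(b)

    def path_exists(grid):
        rows, cols = len(grid), len(grid[0])
        idx = lambda r, c: r * cols + c
        TOP, BOTTOM = rows * cols, rows * cols + 1   # the two virtual nodes
        ds = DisjointSet(rows * cols + 2)
        for r in range(rows):
            for c in range(cols):
                if not grid[r][c]:
                    continue
                if r == 0:
                    ds.union(TOP, idx(r, c))         # top row joins the top node
                if r == rows - 1:
                    ds.union(idx(r, c), BOTTOM)      # bottom row joins the bottom node
                for nr, nc in ((r + 1, c), (r, c + 1)):  # down and right neighbours
                    if nr < rows and nc < cols and grid[nr][nc]:
                        ds.union(idx(r, c), idx(nr, nc))
        return ds.find(TOP) == ds.find(BOTTOM)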

A* pathfinding. Why do you need to re-evaluate an adjacent node that's already in the open list if it has a lower g-cost through the current node?

There is one thing about the A* pathfinding algorithm that I do not understand. In the pseudocode: if the current node's (the node being analysed) g-cost is less than the adjacent node's g-cost, then recalculate the adjacent node's g, h and f costs and reassign the parent node.
Why do you do this?
Why do you need to re-evaluate the adjacent node's costs and parent if its g-cost is greater than the current node's g-cost? In what instance would you need to do this?
Edit: I am watching this video
https://www.youtube.com/watch?v=C0qCR18gXdU
At 8:19 he says: When you come across blocks (nodes) that have already been analysed, the question is: should we change the properties of the block?
First a tip. You can actually add the time you want as a bookmark to get a video that starts right where you want. In this case https://www.youtube.com/watch?v=C0qCR18gXdU#t=08m19s is the bookmarked time link.
Now the quick answer to your question. We fill in a node the first time we find a path to it. But the first path we found to it might not be the cheapest one. We want the cheapest one, and if we find a cheaper one second, we want it.
Here is a visual metaphor. Imagine a running path with a fence next to it. The spot we want is on the other side of the fence. Actually draw this out, it will help.
The first path that our algorithm finds to it is run down the path, jump over the fence. The second path that we find is run part way down the path, go through the gate, then get to that spot. We don't want to throw away the idea of using the gate just because we already figured out that we could get there by jumping the fence!
Now in your picture put costs of moving from one spot to another that are reasonable for moving along a running path, an open field, through a gate, and jumping a fence. Run the algorithm by hand and you'll see that you figure out first that you can jump the fence, and then later that you really wanted to use the gate.
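In code, the update described above is the relaxation step inside A*'s main loop. A minimal sketch (the callbacks for neighbours, costs and heuristic are assumptions, not anything from the video):

    import heapq

    def a_star(start, goal, neighbours, cost, h):
        # neighbours(n) yields adjacent nodes, cost(a, b) is the edge weight,
        # h(n) estimates the remaining distance to the goal.
        g = {start: 0}
        parent = {start: None}
        open_list = [(h(start), start)]
        closed = set()
        while open_list:
            _, current = heapq.heappop(open_list)
            if current == goal:             # rebuild the path via parents
                path = []
                while current is not None:
                    path.append(current)
                    current = parent[current]
                return path[::-1]
            if current in closed:
                continue
            closed.add(current)
            for nb in neighbours(current):
                tentative_g = g[current] + cost(current, nb)
                if tentative_g < g.get(nb, float("inf")):
                    # A cheaper route to nb: the "gate", not the "fence".
                    # Rewrite its g-cost and parent even if nb is already
                    # in the open list with the old, worse values.
                    g[nb] = tentative_g
                    parent[nb] = current
                    heapq.heappush(open_list, (tentative_g + h(nb), nb))
        return None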
This guy is totally wrong, because he says to change the parent node; however, your successors are based on your parent node, and if you change the parent node then you can't have a valid path, because the path is formed simply by moving from parent to child.
Instead of changing the parent, use the pathmax function. It says that if a parent node A has a child node whose cost g(child) + h(child) is less than the cost of A, then set the cost of the child equal to the cost of the parent.
Pathmax ensures monotonicity:
Monotonicity: every child node has a cost greater than or equal to the cost of its parent node.
A* has a property: if the cost is monotonically increasing, then the first (sub)path that A* finds to a node is always part of the final path. More precisely: under monotonicity, each node is first reached through its best path.
Do you see why?
Suppose you have a graph: (A,B), (B,C), (A,E), (E,D), where each tuple means the two nodes are connected. Suppose the cost is monotonically increasing and your algorithm has chosen (A,B), (B,C); at this point you know your algorithm has chosen the best path so far, and every other path that can reach this node must have a higher cost. But if the cost is not monotonically increasing, it can be the case that (A,E) costs more than your current cost while (E,D) costs nothing, so there is a better path over there.
This algorithm relies on its heuristic function: if the heuristic underestimates, it gets corrected by the accumulated cost, but if it overestimates then A* can explore extra nodes - I leave it to you to work out why that happens.
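A tiny sketch of the pathmax adjustment as I understand it (the function name is mine):

    def pathmax_f(f_parent, g_child, h_child):
        # Pathmax: never let a child's f-value drop below its parent's.
        # If g(child) + h(child) < f(parent), the heuristic is inconsistent
        # at this edge, so propagate the parent's f-value downward instead.
        return max(f_parent, g_child + h_child)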
Why do you need to re-evaluate an adjacent node that's already in the open list if it has a lower g cost to the current node?
Don't do this; it's just extra work.
Corollary: if you later reach the same node from some node p with the same cost, simply remove that node from the queue. Do not expand it.

Why does Dijkstra's algorithm need to keep track of the number of steps?

I can understand keeping track of the accumulated distance, the distance per path, and the name (or position) of the vertex, but why keep track of the number of steps unless you want to measure how efficiently it reached its destination?
The step is totally unnecessary for finding the path, and it seems rather arbitrary anyway. For instance, if you have multiple vertices where the accumulated distance is the same smallest number, there is no reason to care which one you start from, but whichever one it is gets labelled with the next step in line.
I see many pieces of code around, and they generally follow this principle of keeping track of the steps. It seems very strange, especially when many of them are pathfinding on a 2D matrix where the cost of movement is either 1 or infinite. In that case, it seems to me that not only is the number of steps per vertex superfluous, but the only information necessary to be bothered with is the distance and the label of the vertex. If you have a distance, you know you have visited the vertex, and since all distances are the same, the first time you reach a vertex should always be its lowest distance. No evaluating whether it is lower or greater is necessary, only that it exists.
Anyway, I'm just curious why something so simple should have superfluous information gathered. Is there some reason for it I'm just not grasping?
EDIT--
To add a little clarity, and since it wasn't formatting properly in the comment, the step is normally shown in the table people tell you to use.
____________________
|name|step|distance|
--------------------
|temporary Labels |
--------------------
The step is added when a position is the next shortest point to the origin.
Okay, I have seen that video now, and it's actually the first time I have ever seen such a table being used. It does not make much sense to me. It completely mixes "labels" with "distances": a permanent label is the order in which nodes were marked, while temporary labels are the current non-fixed distances. Neither of these is necessary at all.
Instead what you usually have for a node is the following: The distance (from the start node), the parent (or previous) node, and a mark to mark a node as completed or not (in an implementation you usually have a priority queue for all unmarked nodes instead).
You then keep looking at the unmarked node with the smallest total distance, mark it and update the distance of all the unmarked neighbors. And whenever you update to a shorter distance you also update the parent node.
In no way, though, do you need the order in which you marked the nodes as completed, or all the previous non-final distances. To me, in that video, it seems as if it's just a way to make it easier to check a student's work, since without identical distances there is always a single order in which you would look at the vertices.
That being said, the normal Dijkstra algorithm does not include this stuff, and it’s not necessary. See the pseudocode on Wikipedia for implementation details on what you actually store (as said, you usually have only the distance and parent for each node, and a priority queue for the unmarked nodes).
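For reference, a minimal sketch of that standard version (the graph shape, a dict of adjacency lists, is an assumption):

    import heapq

    def dijkstra(graph, start):
        # graph: dict mapping node -> list of (neighbour, edge_weight) pairs.
        # Exactly the bookkeeping described above: a distance and a parent per
        # node, plus a priority queue of unmarked nodes. No step counter.
        dist = {start: 0}
        parent = {start: None}
        done = set()
        queue = [(0, start)]
        while queue:
            d, node = heapq.heappop(queue)
            if node in done:
                continue                    # stale queue entry
            done.add(node)                  # "mark" the node: distance is final
            for nb, w in graph.get(node, ()):
                if nb not in done and d + w < dist.get(nb, float("inf")):
                    dist[nb] = d + w        # shorter route found
                    parent[nb] = node       # so update the parent too
                    heapq.heappush(queue, (d + w, nb))
        return dist, parent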
It seems very strange, especially when many of them are pathfinding on a 2D matrix where the cost of movement is either 1 or infinite.
What you are describing here is a very special case. Dijkstra's algorithm is actually used for many graph problems where distances are not all equal and where there are more connections than just 4 simple neighbors in every direction.

Find connected-blocks with certain value in a grid

I'm having trouble finding an algorithm for my problem.
I have a grid of 8x8 blocks; each block has a value ranging from 0 to 9. I want to find collections of connected blocks that match a total value of, for example, 15. My first approach was to start off at the border, and that worked fine, but when starting in the middle of the grid my algorithm gets lost.
Would anyone know a simple algorithm to use or can you point me in the right direction?
Thanks!
As far as I know, no simple algorithm exists for this. As for pointing you in the right direction, an 8x8 grid is really just a special case of a graph, so I'd start with graph traversal algorithms. I find that in cases like this, it sometimes helps to think how you would solve the problem for a smaller grid (say, 3x3 or 4x4) and then see if your algorithm scales up to "full size."
EDIT:
My proposed algorithm is a modified depth-first traversal. To use it, you'll have to convert your grid into a graph. The graph should be undirected, since connected blocks are connected equally in both directions.
Each graph node represents a single block, containing the block's value and a visited variable. Edge weights represent their edges' resistance to being followed. Set them by summing the values of the nodes they connect. Depending on the sum you're looking for, you may be able to optimize this by removing edges that are guaranteed to fail. For example, if you're looking for 15, you can delete all edges with weight of 16 or greater.
The rest of the algorithm will be performed as many times as there are blocks, with each block serving as the starting block once. Traverse the graph by following the lowest-weighted edge from the current node, unless that takes you to a visited node. Push each visited node onto a stack and set its visited variable to true. Keep a running sum for every path followed.
Whenever the desired sum is reached, save the current path as one of your answers. Do not stop traversal, because the current node could be connected to a zero.
Whenever the total exceeds the desired sum, backtrack by setting visited to false and popping the current node off the stack.
Whenever all edges for a given node have been explored, backtrack.
After every possible path from a given starting node is analyzed, every answer that includes that node has been found. So, remove all edges touching the starting node and choose a new starting node.
I haven't fully analyzed the efficiency/running time of this algorithm yet, but... it's not good. (Consider the number of paths to be searched in a graph containing all zeroes.) That said, it's far better than pure brute force.
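Here is a rough sketch of that traversal in Python, under the stated assumptions (values 0-9, so the running total can be pruned once it exceeds the target). Note that it enumerates connected paths, which is what the traversal above describes; a branching shape with no simple path through it would need a subset-based search instead:

    def find_blocks(grid, target):
        # Depth-first backtracking over connected paths summing to target.
        rows, cols = len(grid), len(grid[0])
        found = set()                        # frozensets of cells, deduplicated

        def dfs(r, c, total, visited):
            total += grid[r][c]
            if total > target:               # prune: values are non-negative
                return
            visited.add((r, c))
            if total == target:
                found.add(frozenset(visited))
                # Don't stop here: a neighbouring 0 could extend this answer.
            for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if 0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in visited:
                    dfs(nr, nc, total, visited)
            visited.remove((r, c))           # backtrack

        for r in range(rows):                # every block serves as a start once
            for c in range(cols):
                dfs(r, c, 0, set())
        return found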
