Greedy search in Prolog using heuristic values

I have a graph and a heuristic table: a list of connections between nodes with edge costs, plus a heuristic value for each node. They're represented in Prolog as follows.
Graph:
s(a,b,2).
s(a,c,1).
s(b,e,4).
s(b,g,2).
s(c,d,1).
s(c,x,3).
s(x,g,1).
Heuristic table:
h(a,9).
h(b,3).
h(c,2).
h(d,8).
h(e,4).
h(g,0).
h(x,2).
My question: how do I perform a greedy search, using the heuristic values (e.g. h(a,9)) to find the next node at each iteration?
I know how to use DFS to get the shortest possible path and store that path in a list, but I don't know how to take the heuristic value at each node into account for a greedy search. I do know that greedy search expands the node with the lowest h value to find its next neighbour.
DFS:
depthfirstsearch(GoalN, Path, [GoalN|Path]) :-
    goal(GoalN).
depthfirstsearch(Node, Path, Solution) :-
    s(Node, NextNode, _),
    \+ member(NextNode, Path),
    depthfirstsearch(NextNode, [Node|Path], Solution).
I've searched SO and the web (finding notes on how greedy search works), but nothing explains how it actually works in Prolog code. There are explanations for Java, C++, etc., but I'm not using those languages.
Can anyone point me in the right direction? I read somewhere that findall could be used. But how do I combine the heuristic value with the node? For example, node b is reached from a with an edge cost of 2; do I substitute that cost of 2 with the heuristic value of 3? Please explain in more detail, or direct me to a resource that would be beneficial right now.
Could I maybe create a predicate to help find the next node at each iteration (using the heuristic value, of course)?
Bear in mind I'm a beginner at Prolog, trying out ways to do this but struggling to piece it all together.
Update: This link is where I found most of my info on searches.

I think you need to know what a heuristic value is and how it can be used in your search algorithm.
In my answer:
n is the node we want to reach
s is the source node
h() is the heuristic function; h(n) is an admissible value, which I prefer to think of as an estimated cost to reach n from source s. We call a heuristic admissible only if it does not over-estimate the cost.
w(a,b) is the actual cost to go from a to b
g() is a function giving the actual cost to reach node n from s, by summing up w(a,b) for each pair of adjacent nodes a and b on the path from s to n.
Now to answer your question, AFAIK you can use this heuristic in 2 ways:
As you have said, you can lazily ignore the w(a,b) values and use the h(b) values to sort the successor nodes (where b is any successor node) -- this is called the greedy best-first search algorithm.
Another way would be to sort the successor nodes based on the value h(b) + g(b) (where b is any successor node) -- this is called the A* search algorithm.
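To make the difference concrete with the asker's data: suppose the search has reached x via a→c→x. Greedy best-first scores x as h(x) = 2, while A* scores it as g(x) + h(x) = (1 + 3) + 2 = 6, because g(x) = w(a,c) + w(c,x) = 1 + 3.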
Recommended Reading:
Stuart J. Russell and Peter Norvig, Artificial Intelligence: A Modern Approach, Third Edition.
Ivan Bratko, Prolog Programming for Artificial Intelligence, Pearson Education, August 2011.
P.S. findall is the right thing to use in Prolog to implement these 2 searches.
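To show how findall fits in, here is a minimal sketch of greedy best-first search over the s/3 and h/2 facts above. It assumes a goal(g) fact (the question never names its goal node, so that part is my assumption). At each step, findall/3 collects every unvisited successor paired with its heuristic value, keysort/2 orders the pairs by ascending h, and only the best candidate is expanded:

goal(g).   % assumed goal node

% greedysearch(+Node, +PathSoFar, -Solution)
greedysearch(GoalN, Path, [GoalN|Path]) :-
    goal(GoalN).
greedysearch(Node, Path, Solution) :-
    findall(H-Next,
            ( s(Node, Next, _),
              \+ member(Next, [Node|Path]),
              h(Next, H) ),
            Pairs),
    keysort(Pairs, [_-Best|_]),    % lowest heuristic value first
    greedysearch(Best, [Node|Path], Solution).

% ?- greedysearch(a, [], Solution).
% Solution = [g, x, c, a].

Note that this commits to the single best successor at each node and discards its siblings, which is exactly the greedy behaviour: if the chosen branch dead-ends, the search fails rather than reconsidering a sibling.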

Related

Is Dijkstra's algorithm deterministic?

I think that Dijkstra's algorithm is determined in its result, so that if you choose the same starting vertex, you will get the same result (the same distances to every other vertex). But I don't think that it is deterministic (that every operation has a single defined following operation), because that would mean it wouldn't have to search for the shortest distances in the first place.
Am I correct? If I'm wrong, could you please explain why it is deterministic, and maybe give an example?
I'm not sure there is a universal definition of determinism, but Wikipedia defines it as...
... an algorithm which, given a particular input, will always produce the same output, with the underlying machine always passing through the same sequence of states.
So this requires both determinism of the output and determinism of the execution. The output of Dijkstra's algorithm is deterministic no matter how you look at it, because it's the length of the shortest path, and there is only one such length.
The execution of Dijkstra's algorithm in the abstract sense is not deterministic, because the final step is:
Otherwise, select the unvisited node that is marked with the smallest tentative distance, set it as the new "current node", and go back to step 3.
If there are multiple nodes with the same smallest tentative distance, the algorithm is free to select one arbitrarily. This doesn't affect the output, but it does affect the order of operations within the algorithm.
A particular implementation of Dijkstra's algorithm, however, probably is deterministic, because the nodes will be stored in a deterministic data structure like a min heap. This will result in the same node being selected each time the program is run. Although things like hashtable salts may also affect determinism even here.
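As a toy illustration of that last point in Prolog, suppose the frontier is kept as a list of Distance-Node pairs; msort/2 falls back to comparing node names when distances tie, so the same candidate wins on every run:

% select_current(+Frontier, -Pick): the unvisited node with the smallest
% tentative distance; ties are broken deterministically by node name.
select_current(Frontier, Pick) :-
    msort(Frontier, [Pick|_]).

% ?- select_current([4-b, 4-a, 7-c], Pick).
% Pick = 4-a.   % a and b tie at distance 4; a always wins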
Allow me to expand on Thomas's answer.
If you look at an implementation of Dijkstra, such as this example: http://graphonline.ru/en/?graph=NnnNwZKjpjeyFnwx you'll see a graph in which the paths 0→1→5, 0→2→5, 0→3→5 and 0→4→5 all have the same length. "The shortest path" is therefore not necessarily unique, as this example shows.
Using the wording on Wikipedia, at some point the algorithm instructs us to:
select the unvisited node that is marked with the smallest tentative distance.
The problem here is the word the, suggesting that it is somehow unique. It may not be. For an implementation to actually pick one node from many equal candidates requires further specification of the algorithm regarding how to select it. But any such selected candidate having the required property will determine a path of the shortest length. So the algorithm doesn't commit. The modern approach to wording this algorithm would be to say:
select any unvisited node that is marked with the smallest tentative distance.
From a mathematical graph theory algorithm standpoint, that algorithm would technically proceed with all such candidates simultaneously in a sort of multiverse. All answers it may arrive at are equally valid. And when proving the algorithm works, we would prove it for all such candidates in all the multiverses and show that all choices arrive at a path of the same distance, and that the distance is the shortest distance possible.
Then, if you want to use the algorithm to just compute one such answer because you want to either A) find one such path, or B) determine the distance of such a path, then it is left up to you to select one specific branch of the multiverse to explore. All such selections made according to the algorithm as defined will yield a path whose length is the shortest length possible. You can define any additional non-conflicting criteria you wish to make such a selection.
The reason the implementation I linked is deterministic and always gives the same answer (for the same start and end nodes, obviously) is because the nodes themselves are ordered in the computer. This additional information about the ordering of the nodes is not considered for graph theory. The nodes are often labelled, but not necessarily ordered. In the implementation, the computer relies on the fact that the nodes appear in an ordered array of nodes in memory and the implementation uses this ordering to resolve ties. Possibly by selecting the node with the lowest index in the array, a.k.a. the "first" candidate.
If an implementation resolved ties by randomly (not pseudo-randomly!) selecting a winner from equal candidates, then it wouldn't be deterministic either.
Dijkstra's algorithm as described on Wikipedia just defines an algorithm to find the shortest paths (note the plural paths) between nodes. Any such path that it finds (if it exists) it is guaranteed to be of the shortest distance possible. You're still left with the task of deciding between equivalent candidates though at step 6 in the algorithm.
As the tag says, the usual term is "deterministic". And the algorithm is indeed deterministic. For any given input, the steps executed are always identical.
Compare it to a simpler algorithm like adding two multi-digit numbers. The result is always the same for two given inputs, the steps are also the same, but you still need to add the numbers to get the outcome.
By deterministic I take it you mean it will give the same answer to the same query for the same data every time, and give only one answer; if so, then it is deterministic. If it were not deterministic, think of the problems it would cause for those who use it. I write in Prolog all day, so I know non-deterministic answers when I see them.
Here I just introduce a simple mistake in Prolog so that the answer is not deterministic, and then a simple fix that makes it deterministic.
Non-deterministic
spacing_rec(0, []).
spacing_rec(Length0, [' '|T]) :-
    succ(Length, Length0),
    spacing_rec(Length, T).

?- spacing_rec(0, List).
List = [] ;
false.
Deterministic
spacing_rec(0, []) :- !.
spacing_rec(Length0, [' '|T]) :-
    succ(Length, Length0),
    spacing_rec(Length, T).

?- spacing_rec(0, List).
List = [].
I will try to keep this short and simple; there are many great explanations of this here and online as well, if some good research is done of course.
Dijkstra's algorithm is a greedy algorithm; its main goal is to find the shortest path between two nodes of a weighted graph.
Wikipedia does a great job of explaining what deterministic and non-deterministic algorithms are and how you can 'determine' into which category an algorithm falls:
From Wikipedia Source:
Deterministic Algorithm:
In computer science, a deterministic algorithm is an algorithm which, given a particular input, will always produce the same output, with the underlying machine always passing through the same sequence of states. Deterministic algorithms are by far the most studied and familiar kind of algorithm, as well as one of the most practical, since they can be run on real machines efficiently.
Formally, a deterministic algorithm computes a mathematical function; a function has a unique value for any input in its domain, and the algorithm is a process that produces this particular value as output.
Nondeterministic Algorithm
In computer science, a nondeterministic algorithm is an algorithm that, even for the same input, can exhibit different behaviors on different runs, as opposed to a deterministic algorithm. There are several ways an algorithm may behave differently from run to run. A concurrent algorithm can perform differently on different runs due to a race condition.
So going back to the goal of Dijkstra's algorithm: it is like saying I want to get from location X to location Z, but to do that I have the option of going through shorter nodes that will get me to my destination a lot quicker and more efficiently than other, longer routes or nodes...
Thinking through cases where Dijkstra's algorithm could be deterministic would be a good idea as well.

optimal search algorithm without admissible heuristic

Please forgive me if I'm not using the correct terms or have overlooked an existing solution. I'm not experienced in search algorithms or the theory behind them. I just would like to solve a problem.
I've previously used what I was told was the A* algorithm to solve a different problem. But reading up on it, I've realized that what I learned is not quite what Wikipedia tells me.
What I learned was:
Start at your origin node
Open a new solution for each path you can take
Recursively create a new subsolution for each path you can take from there
When you arrive at the same place with multiple solutions, drop those that took longer than the fastest
Now if I understand wikipedia correctly, this is what I was supposed to do:
Start at your origin node
Open a new solution for each path you can take
Order the solutions by "cost of path taken" + "estimated cost to target"
Take cheapest solution and create subsolutions for each possible path
Order those solutions in with the others, then rinse and repeat
I can see how this would help with not calculating quite as many solutions, but my problem is that I see no possibility to create an "optimistic" estimate.
I'm not searching for a path on a geographical map. I'm trying to find the best sequence of actions. There's a minimum sequence of - say - ABCDEFGH. You cannot do F before E, but repeating previous actions in a particular ordering might make later actions more efficient.
Do I need a different search algorithm? Or do I do what I originally learned and just live with the fact that doing more work is the price for not having a good heuristic function?
I believe my teacher recognized this problem. And what I learned was simply A* with a heuristic function of h(n) = 0.
I'm not searching for a path on a geographical map. I'm trying to find
the best sequence of actions. There's a minimum sequence of - say -
ABCDEFGH. You cannot do F before E but repeating previous actions in
particular ordering might make later actions more efficient.
It is not clear to me whether you can repeat one action, i.e., a solution is ABCDEFGH, but would ABBBBCDEFGH be possible?
If not, then you might be able to use the A* algorithm, implemented like this:
1. At some stage (say the first, "empty" stage), you have one of several actions available.
2. The cost of going from Empty City to A City is the cost of action A.
3. The cost of going from Empty City to B City is the cost of action B.
When you've reached B, the cost of doing C is constant (if it is not, then you can't use A* as is) and you insert the cost of going from B City to C City as the cost of C.
So you can handle the case in which an action has different costs, provided that this difference is completely described by the previous state. For example, if you can only do C after doing A or B, and the cost of C is 5 after A and 8 after B, you enter the "distance" between A and C as 5, and between B and C as 8.
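A minimal sketch of this encoding in Prolog terms, assuming each action's cost depends only on the immediately preceding action (the edge/3 facts and their costs are hypothetical):

% edge(PrevAction, NextAction, Cost): cost of doing NextAction right
% after PrevAction, treating every action as a "city".
edge(start, a, 3).   % doing A first costs 3 (made-up figure)
edge(start, b, 4).   % doing B first costs 4 (made-up figure)
edge(a, c, 5).       % C after A costs 5, as in the example above
edge(b, c, 8).       % C after B costs 8

Any standard A* or Dijkstra implementation can then search over edge/3 unchanged.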
If the cost of, say, D depends on the two previous states, you can still use a more complicated A* implementation where you define the virtual "cities" BC, AB and AC, and the distance from BC to D is "the cost of D having done B and C", and so on. The cost of reaching BC from A is "the cost of B given A, and the cost of C given A and B". So if these costs depend on the previous states, things get even more complicated.
In the end, the complexity of this revised A* will grow until it becomes your algorithm, where every state depends potentially on the sequence of all preceding states. The more this is true, the more your algorithm is convenient; the more every state is a cost unto itself, the more A* is convenient.
And of course the possibility of closed loops (visiting the same state/action twice, making this a cyclic graph) blows A* straight out of the water.

Is the greedy best-first search algorithm different from the best-first search algorithm?

Is the greedy best-first search algorithm different from the best-first search algorithm?
The wiki page has a separate paragraph about Greedy BFS but it's a little unclear.
My understanding is that greedy BFS is just BFS where the "best node from OPEN" in Wikipedia's algorithm is determined by a heuristic function one calculates for each node. So implementing this:
OPEN = [initial state]
CLOSED = []
while OPEN is not empty do
    1. Remove the best node from OPEN, call it n, add it to CLOSED.
    2. If n is the goal state, backtrace path to n (through recorded parents) and return path.
    3. Create n's successors.
    4. For each successor do:
        a. If it is not in CLOSED: evaluate it, add it to OPEN, and record its parent.
        b. Otherwise: change recorded parent if this new path is better than previous one.
done
with "best node from OPEN" being a heuristic function estimating how close the node is to the goal, is actually Greedy BFS. Am I right?
EDIT: Comment on Anonymouse's answer:
So essentially a greedy BFS doesn't need an "OPEN list" and should base its decisions only on the current node? Is this algorithm GBFS:
1. Set START as CURRENT node
2. Add CURRENT to Path [and optionally, to CLOSED?]
3. If CURRENT is GOAL, exit
4. Evaluate CURRENT's successors
5. Set BEST successor as CURRENT and go to 2.
"Best first" could allow revising the decision, whereas, in a greedy algorithm, the decisions should be final, and not revised.
For example, A*-search is a best-first-search, but it is not greedy.
However, note that these terms are not always used with the same definitions. "Greedy" usually means that the decision is never revised, possibly accepting suboptimal solutions in exchange for improvements in running time. However, I bet you will find situations where "greedy" is used for the combination of "best first + depth first", as in "try to expand the best next step until we hit a dead end, then return to the previous step and continue with the next best there" (which I would call a "prioritized depth first").
Also, it depends on which level of abstraction you are talking about. A* is not greedy in "building a path". It's fine with keeping a large set of open paths around. It is however greedy in "expanding the search space" towards the true shortest path.
BFS (best-first search, not breadth-first) is an instance of tree search and graph search algorithms in which a node is selected for expansion based on the evaluation function f(n) = g(n) + h(n), where g(n) is the length of the path from the root to n and h(n) is an estimate of the length of the path from n to the goal node. In a BFS algorithm, the node with the lowest evaluation (i.e. the lowest f(n)) is selected for expansion.
Greedy BFS uses the following evaluation function f(n) = h(n), which is just the heuristic function h(n), which estimates the closeness of n to the goal. Hence, greedy BFS tries to expand the node that is thought to be closest to the goal, without taking into account previously gathered knowledge (i.e. g(n)).
To summarize, the main difference between these (similar) search methods is the evaluation function.
As a side note, the A* algorithm is a best-first search algorithm in which the heuristic function h is admissible (i.e. h is always an underestimate of the perfect heuristic function h*, for all n). A* is not a greedy BFS algorithm because its evaluation function is f(n) = g(n) + h(n).
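In Prolog terms, the whole difference fits in the evaluation predicate, as this sketch shows (assuming h/2 facts like the ones in the first question, with G the path cost accumulated so far):

% Greedy best-first: f(n) = h(n); the cost so far is ignored.
greedy_f(_G, Node, F) :-
    h(Node, F).

% A*: f(n) = g(n) + h(n).
astar_f(G, Node, F) :-
    h(Node, H),
    F is G + H.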
Ten years after I asked this question, I got back to it and finally understood what that article in Wikipedia was saying.
Greedy BFS is greedy in expanding a potentially better successor of the current node. The difference between the two algorithms is in the loop that handles the evaluation of successors. Best-first search always exhausts the current node's successors by evaluating them and continues with the best one from them:
4. For each successor do:
    a. If it is not in CLOSED: evaluate it, add it to OPEN, and record its parent.
    b. Otherwise: change recorded parent if this new path is better than previous one.
Greedy BFS doesn't expand all successors of a node if it finds one that has a better heuristic than the current node. Instead it greedily expands this potentially better node, leaving some of the current node's successors unexpanded. This means the current node shouldn't be removed from the OPEN list unless all its successors have been evaluated. This is the pseudo-code:
OPEN = [initial state]
CLOSED = []
while OPEN is not empty do
    1. Remove the best node from OPEN, call it n, add it to CLOSED.
    2. If n is the goal state, backtrace path to n (through recorded parents) and return path.
    3. For each successor do:
        a. If it is not in CLOSED:
            i. Evaluate it, add it to OPEN, and record its parent.
            ii. If it has a better heuristic than n: remove n from CLOSED, add n to OPEN
                (after the current successor) and break from loop 3.
        b. Otherwise: change recorded parent if this new path is better than previous one.
done
I have removed the step "3. Create n's successors." from the best-first search pseudo-code above because some of n's successors may not be evaluated, so there is no use creating them all up front. Instead, each successor should be created and immediately evaluated inside "3. For each successor do:".
As far as I understand, "best-first search" is only a collective name for a particular family of search techniques in which you use a heuristic evaluation function h(n).
So, A* and greedy best-first search are algorithms that fall into this category.

What's the difference between backtracking and depth first search?

What's the difference between backtracking and depth first search?
Backtracking is a more general purpose algorithm.
Depth-First search is a specific form of backtracking related to searching tree structures. From Wikipedia:
One starts at the root (selecting some node as the root in the graph case) and explores as far as possible along each branch before backtracking.
It uses backtracking as part of its means of working with a tree, but is limited to a tree structure.
Backtracking, though, can be used on any type of structure where portions of the domain can be eliminated - whether or not it is a logical tree. The Wiki example uses a chessboard and a specific problem - you can look at a specific move, and eliminate it, then backtrack to the next possible move, eliminate it, etc.
I think this answer to another related question offers more insights.
For me, the difference between backtracking and DFS is that backtracking handles an implicit tree and DFS deals with an explicit one. This seems trivial, but it means a lot. When the search space of a problem is visited by backtracking, the implicit tree gets traversed and pruned in the middle of it. Yet for DFS, the tree/graph it deals with is explicitly constructed and unacceptable cases have already been thrown, i.e. pruned, away before any search is done.
So, backtracking is DFS for implicit tree, while DFS is backtracking without pruning.
IMHO, most of the answers are largely imprecise and/or lack any reference to verify them. So let me share a very clear explanation with a reference.
First, DFS is a general graph traversal (and search) algorithm, so it can be applied to any graph (or even a forest). A tree is a special kind of graph, so DFS works for trees as well. In essence, let's stop saying it works only for trees, or the like.
Based on [1], backtracking is a special kind of DFS used mainly to save space (memory). The distinction I'm about to mention may seem confusing, since in graph algorithms of this kind we are so used to having adjacency-list representations and using an iterative pattern for visiting all immediate neighbors of a node (for a tree these are the immediate children) that we often ignore that a bad implementation of get_all_immediate_neighbors may cause a difference in the memory use of the underlying algorithm.
Further, if a graph node has branching factor b and diameter h (for a tree this is the tree height), then if we store all immediate neighbors at each step of visiting a node, the memory requirement is O(bh). However, if we take only a single (immediate) neighbor at a time and expand it, the memory complexity reduces to O(h). The former kind of implementation is termed DFS, while the latter kind is called backtracking.
Now you see, if you're working with high-level languages, most likely you're actually using backtracking in the guise of DFS. Moreover, keeping track of visited nodes for a very large problem set can be really memory intensive, calling for a careful design of get_all_immediate_neighbors (or algorithms that can handle revisiting a node without getting into infinite loops).
[1] Stuart Russell and Peter Norvig, Artificial Intelligence: A Modern Approach, 3rd Ed
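To make that memory contrast concrete, here is a sketch of both styles in Prolog over the s/3 edge facts from the first question; dfs_all/3 materialises every neighbour of a node with findall/3 before choosing, while dfs_bt/3 takes one neighbour at a time through backtracking:

% DFS flavour: the full neighbour list of every node on the current
% path is held in memory, giving the O(bh) profile described above.
dfs_all(Goal, Goal, _).
dfs_all(Node, Goal, Seen) :-
    findall(N, s(Node, N, _), Neighbours),
    member(Next, Neighbours),
    \+ member(Next, Seen),
    dfs_all(Next, Goal, [Node|Seen]).

% Backtracking flavour: one neighbour at a time through s/3 choice
% points, giving the O(h) profile.
dfs_bt(Goal, Goal, _).
dfs_bt(Node, Goal, Seen) :-
    s(Node, Next, _),
    \+ member(Next, Seen),
    dfs_bt(Next, Goal, [Node|Seen]).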
According to Donald Knuth, it's the same.
Here is the link on his paper about Dancing Links algorithm, which is used to solve such "non-tree" problems as N-queens and Sudoku solver.
Backtracking, also called depth-first search
Backtracking is usually implemented as DFS plus search pruning. You traverse the search space tree depth-first, constructing partial solutions along the way. Brute-force DFS can construct all search outcomes, even the ones that do not make sense practically, and constructing all solutions can be very inefficient (n! or 2^n of them). So in reality, as you do DFS, you also need to prune partial solutions which do not make sense in the context of the actual task, and focus on the partial solutions which can lead to valid optimal solutions. This is the actual backtracking technique: you discard partial solutions as early as possible, take a step back and try to find a local optimum again.
Nothing stops you from traversing the search space tree using BFS and executing a backtracking strategy along the way, but it doesn't make sense in practice, because you would need to store the search state layer by layer in the queue, and tree width grows exponentially with the height, so we would waste a lot of space very quickly. That's why trees are usually traversed using DFS. In this case the search state is stored in the stack (the call stack or an explicit structure), and it can't exceed the tree height.
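As a classic illustration of DFS plus pruning (my example, not the answer's): N-queens in Prolog, where a partial placement is abandoned as soon as the newest queen attacks an earlier one, instead of only checking complete boards:

% queens(+N, -Qs): Qs places one queen per row, giving its column.
queens(N, Qs) :-
    numlist(1, N, Columns),
    place(Columns, [], Qs).

place([], Qs, Qs).
place(Unplaced, Placed, Qs) :-
    select(Q, Unplaced, Rest),
    safe(Q, 1, Placed),          % prune: discard partial solutions early
    place(Rest, [Q|Placed], Qs).

% safe(+Q, +RowDist, +Placed): Q attacks no previously placed queen.
safe(_, _, []).
safe(Q, D, [P|Ps]) :-
    Q =\= P + D,
    Q =\= P - D,
    D1 is D + 1,
    safe(Q, D1, Ps).

% ?- queens(4, Qs).
% Qs = [3, 1, 4, 2] ;
% Qs = [2, 4, 1, 3].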
Usually, a depth-first-search is a way of iterating through an actual graph/tree structure looking for a value, whereas backtracking is iterating through a problem space looking for a solution. Backtracking is a more general algorithm that doesn't necessarily even relate to trees.
I would say, DFS is the special form of backtracking; backtracking is the general form of DFS.
If we extend DFS to general problems, we can call it backtracking.
If we use backtracking to tree/graph related problems, we can call it DFS.
They carry the same idea in algorithmic aspect.
DFS describes the way in which you want to explore or traverse a graph. It focuses on the concept of going as deep as possible given the choice.
Backtracking, while usually implemented via DFS, focuses more on the concept of pruning unpromising search subspaces as early as possible.
IMO, at any specific node during backtracking, you try to branch depth-first into each of its children, but before you branch into any child node, you need to "wipe out" the previous child's state (this step is equivalent to walking back to the parent node). In other words, the siblings' states should not impact each other. By contrast, in a normal DFS algorithm you don't usually have this constraint: you don't need to wipe out (backtrack) the previous sibling's state in order to construct the next sibling node.
Depth-first is an algorithm for traversing or searching a tree. See here. Backtracking is a much broader term that is used wherever a solution candidate is formed and later discarded by backtracking to a former state. Depth-first search uses backtracking to search a branch first (a solution candidate) and, if not successful, to search the other branch(es).
Depth-first search (DFS) and backtracking are different searching and traversing algorithms. Backtracking is broader and is used on both graph and tree problems, while DFS is limited to trees. That being said, it does not mean DFS can't be used on graphs. It is used on graphs as well, but it only produces a spanning tree, a tree without loops (multiple edges starting and ending at the same vertex). That is why it is limited to trees.
Now coming back to backtracking: DFS uses the backtracking algorithm in tree data structures, so in trees, DFS and backtracking are similar.
Thus we can say they are the same in tree data structures, whereas in graph data structures they are not the same.
Idea - start from any point and check if it's the desired endpoint. If yes, we found a solution; otherwise go to all next possible positions, and if we can't go further, return to the previous position and look for other alternatives, marking that the current path won't lead us to the solution.
Now backtracking and DFS are 2 different names given to the same idea applied to 2 different abstract data types.
If the idea is applied to a matrix data structure, we call it backtracking.
If the same idea is applied to a tree or graph, then we call it DFS.
The catch here is that a matrix can be converted to a graph and a graph can be transformed into a matrix. So we actually apply the same idea: if on a graph, we call it DFS, and if on a matrix, we call it backtracking.
The idea in both of the algorithms is the same.
The way I look at DFS vs. Backtracking is that backtracking is much more powerful. DFS helps me answer whether a node exists in a tree, while backtracking can help me answer all paths between 2 nodes.
Note the difference: DFS visits a node and marks it as visited since we are primarily searching, so seeing things once is sufficient. Backtracking visits a node multiple times as it's a course correction, hence the name backtracking.
Most backtracking problems involve:
def dfs(node, visited):
    visited.add(node)
    for child in node.children:
        dfs(child, visited)
    # this is the key difference that enables course correction
    # and makes your DFS a backtracking recursion:
    visited.remove(node)
In a depth-first search, you start at the root of the tree and then explore as far along each branch as possible, then you backtrack to each subsequent parent node and traverse its children.
Backtracking is a generalised term for starting at the end of a goal, and incrementally moving backwards, gradually building a solution.
Backtracking is just depth first search with specific termination conditions.
Consider walking through a maze where for each step you make a decision; that decision is a call onto the call stack (which conducts your depth-first search). If you reach the end, you can return your path. However, if you reach a dead end, you want to return out of a certain decision, in essence returning out of a function on your call stack.
So when I think of backtracking I care about
State
Decisions
Base Cases (Termination Conditions)
I explain it in my video on backtracking here.
An analysis of backtracking code is below. In this backtracking code I want all of the combinations that will result in a certain sum or target. Therefore, I have 3 decisions which make calls to my call stack: at each decision I can either pick a number as part of my path to reach the target, pick it and pick it again, or skip it. Then, if I reach a termination condition, my backtracking step is just to return. Returning is the backtracking step because it gets out of that call on the call stack.
from typing import List

class Solution:
    """
    Approach: Backtracking
    State
        - candidates
        - index
        - target
    Decisions
        - pick one       --> call func changing state: index + 1, target - candidates[index], path + [candidates[index]]
        - pick one again --> call func changing state: index, target - candidates[index], path + [candidates[index]]
        - skip one       --> call func changing state: index + 1, target, path
    Base Cases (Termination Conditions)
        - if target == 0 and path not in ret: append path to ret
        - if target < 0: return  # backtrack
    """
    def combinationSum(self, candidates: List[int], target: int) -> List[List[int]]:
        """
        #desc find all unique combos summing to target
        #args
        #arg1 candidates, list of ints
        #arg2 target, an int
        #ret ret, list of lists
        """
        if not candidates or min(candidates) > target:
            return []
        ret = []
        self.dfs(candidates, 0, target, [], ret)
        return ret

    def dfs(self, nums, index, target, path, ret):
        if target == 0 and path not in ret:
            ret.append(path)
            return  # backtracking
        elif target < 0 or index >= len(nums):
            return  # backtracking
        # the three decisions, each a new call on the call stack:
        self.dfs(nums, index + 1, target - nums[index], path + [nums[index]], ret)  # pick one
        self.dfs(nums, index, target - nums[index], path + [nums[index]], ret)      # pick one again
        self.dfs(nums, index + 1, target, path, ret)                                # skip one

# e.g. Solution().combinationSum([2, 3, 6, 7], 7) returns [[2, 2, 3], [7]]
In my opinion, the difference is the pruning of the tree. Backtracking can stop (finish) searching a certain branch in the middle by checking the given conditions (if a condition is not met). However, in DFS you have to reach the leaf nodes of the branch to figure out whether the condition is met, so you cannot stop searching a branch until you reach its leaf nodes.
The difference is: backtracking is a concept of how an algorithm works, while DFS (depth-first search) is an actual algorithm based on backtracking. DFS essentially is backtracking (it searches a tree using backtracking), but not every algorithm based on backtracking is DFS.
To add a comparison: backtracking is a concept like divide and conquer, and QuickSort would be an algorithm based on the concept of divide and conquer.

Hill climbing and single-pair shortest path algorithms

I have a bit of a strange question. Can anyone tell me where to find information about, or give me a little bit of an introduction to, shortest path algorithms that use a hill climbing approach? I understand the basics of both, but I can't put the two together. Wikipedia has an interesting part about solving the Travelling Salesman Problem with hill climbing, but doesn't provide a more in-depth explanation of how to go about it exactly.
For example, hill climbing can be applied to the travelling salesman problem. It is easy to find a solution that visits all the cities but will be very poor compared to the optimal solution. The algorithm starts with such a solution and makes small improvements to it, such as switching the order in which two cities are visited. Eventually, a much better route is obtained.
As far as I understand it, you should pick any path and then iterate through it and make optimisations along the way. For instance going back and picking a different link from the starting node and checking whether that gives a shorter path.
I am sorry - I did not make myself very clear. I understand how to apply the idea to Travelling Salesperson. I would like to use it on a shortest distance algorithm.
You could just randomly exchange two cities.
You first path is: A B C D E F A with length 200
Now you change it by swapping C and D: A B D C E F A with length 350 - Worse!
Next step: A B C D F E A: length 150 - You improved your solution. ;-)
Hill climbing algorithms are really easy to implement but have several problems with local maxima! [A better approach based on the same idea is simulated annealing.]
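Here is a minimal Prolog sketch of that procedure, assuming a hypothetical dist/3 fact base giving the distance between two cities; it repeatedly takes the first route-improving swap of two cities until none exists:

% route_length(+Route, -Length): total length of a route.
route_length([_], 0).
route_length([A, B|T], L) :-
    dist(A, B, D),
    route_length([B|T], L0),
    L is L0 + D.

% swap_two(+Route, -Swapped): Swapped is Route with two cities
% exchanged; backtracking enumerates every such pair.
swap_two(Route, Swapped) :-
    append(Front, [X|Mid], Route),
    append(Mid2, [Y|Back], Mid),
    append(Front, [Y|Mid2], Tmp),
    append(Tmp, [X|Back], Swapped).

% hill_climb(+Route, -Best): follow improving swaps to a local optimum.
hill_climb(Route, Best) :-
    route_length(Route, L),
    swap_two(Route, Next),
    route_length(Next, L1),
    L1 < L,
    !,
    hill_climb(Next, Best).
hill_climb(Route, Route).   % no improving swap left: local optimum

As the answer warns, this only reaches a local optimum; restarting from several random routes (or simulated annealing) is the usual remedy.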
Hill climbing is a very simple kind of evolutionary optimization; a much more sophisticated algorithm class is genetic algorithms.
Another good metaheuristic for solving the TSP is ant colony optimization.
Examples would be genetic algorithms or expectation maximization in data clustering. By iterating in single steps, the algorithm tries to reach a better solution with every step. The problem is that it only finds a local maximum/minimum; it is never assured that it finds the global one.
A solution for the travelling salesman problem as a genetic algorithm for which we need:
Representation of the solution as order of visited cities, e.g. [New York, Chicago, Denver, Salt Lake City, San Francisco]
Fitness function as the travelled distance
Selection of the best results is done by selecting items randomly depending on their fitness, the higher the fitness, the higher the probability that the solution is chosen to survive
Mutation would be switching two cities in a list, like [A,B,C,D] becomes [A,C,B,D]
Crossing of two possible solutions [B,A,C,D] and [A,B,D,C] results in [B,A,D,C], i.e. cutting both lists in the middle and using the beginning of one parent and the end of the other parent to form the child
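Those two genetic operators are small enough to sketch in Prolog (my illustration; the predicate names are made up):

% mutate(+Route, -Mutated): switch two adjacent cities,
% e.g. [a,b,c,d] becomes [a,c,b,d] on backtracking.
mutate(Route, Mutated) :-
    append(Front, [X, Y|Back], Route),
    append(Front, [Y, X|Back], Mutated).

% cross(+Parent1, +Parent2, -Child): first half of one parent,
% second half of the other, e.g. [b,a,c,d] + [a,b,d,c] -> [b,a,d,c].
cross(P1, P2, Child) :-
    length(P1, N),
    Half is N // 2,
    length(Front, Half),
    append(Front, _, P1),
    length(Skip, Half),
    append(Skip, Back, P2),
    append(Front, Back, Child).

Note that this naive crossover can duplicate cities in the child, so a real TSP implementation repairs the child or uses an order-preserving crossover.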
The algorithm then:
initialization of the starting set of solutions
calculation of the fitness of every solution
until the desired maximum fitness is reached or no changes happen any more:
    selection of the best solutions
    crossing and mutation
    fitness calculation of every solution
It is possible that each execution of the algorithm produces a different result, so it should be executed more than once.
I'm not sure why you would want to use a hill-climbing algorithm, since Dijkstra's algorithm has polynomial complexity O(|E| + |V| log |V|) using Fibonacci queues:
http://en.wikipedia.org/wiki/Dijkstra's_algorithm
If you're looking for a heuristic approach to the single-path problem, then you can use A*:
http://en.wikipedia.org/wiki/A*_search_algorithm
but an improvement in efficiency is dependent on having an admissible heuristic estimate of the distance to the goal.
To hill climb the TSP you should have a starting route. Of course, picking a "smart" route wouldn't hurt.
From that starting route you make one change and compare the result. If it's better, you keep the new one; if it's worse, you keep the old one. Repeat this until you reach a point from which you can't climb any more, and that becomes your best result.
Obviously, with TSP, you will more than likely hit a local optimum. But it is possible to get decent results.
