In programming terms, what is a backtracking solution? - algorithm

I have a couple of questions as to what a backtracking solution really means.
Say you have n options from a current state. Does a backtracking solution basically mean that you try out all of those options, do the same for the sub-problems (where you might have n-1 options), and so on, and then return the best solution from the (n-1)th frame up to the nth frame?
See the problem below: given a rope with length n, how do you cut the rope into m parts with lengths n[0], n[1], ..., n[m-1], so as to get the maximal product n[0]*n[1]*...*n[m-1]? So a rope of length 8 would be cut into 2*3*3 to get a product of 18.
public class RopeCuttingMaxProduct
{
    public static void main(String[] args)
    {
        System.out.println(fun(8));
    }

    static int fun(int n)
    {
        if (n == 1)
            return 1;
        int totalret = 0;
        for (int i = 1; i < n; i++)
        {
            /* At every frame, two options: 1. to cut, 2. to not cut
             * 1. If cutting, multiple ways to cut: remember i + (n-i) == n
             * 2. If not cutting, just return n
             */
            /* Explore all possible solutions, from all possible paths
             * and the subpaths that these lead to,
             * and their subpaths */
            int reti = max(fun(n - i) * i, n);
            if (reti > totalret)
                totalret = reti;
        }
        return totalret;
    }

    static int max(int a, int b)
    {
        return a > b ? a : b;
    }
}
So, are all backtracking solutions exponential in time complexity?
This sounds so much like recursion that I cannot imagine something like this being achieved by anything other than recursion. Can you give me an example of a backtracking solution achieved without recursion?
How is it different from brute force? Is the brute-force solution for this problem to try out all possible combinations of the ways to add up to n? I find the above backtracking solution to be doing pretty much the same.

If you consider the path your algorithm takes to be the traversal of a decision tree, then the solution, if it exists, is some leaf node of the tree, or if there are multiple solutions, then there are multiple leaf nodes that represent them.
Backtracking simply means that the algorithm detects at some point that the solution is not to be found in the branch of the tree it is currently in and then moves back up one or more nodes to continue with other branches.
This does not mean that all nodes need to be visited. For example, if the algorithm detects that the current branch is not worth pursuing before it actually reaches a leaf, then it would avoid visiting all the remaining nodes in that branch.
Consequently, not all backtracking algorithms are brute-force.
How much unnecessary work such an algorithm can avoid is very specific to the problem; it cannot be answered in general. The general answer is that backtracking is not necessarily based on exhaustive searches/trials.
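To make the "without recursion" part concrete, here is a minimal Java sketch of iterative backtracking (my own illustration, not tied to the rope problem above): it replaces the call stack with an explicit stack of frames and prunes branches that cannot lead to a solution.
import java.util.ArrayDeque;
import java.util.Deque;

public class IterativeBacktracking
{
    // Does any subset of nums (all positive) sum exactly to target?
    // Each stack frame is {next index to consider, partial sum so far}.
    static boolean subsetSum(int[] nums, int target)
    {
        Deque<int[]> stack = new ArrayDeque<>();
        stack.push(new int[]{0, 0});
        while (!stack.isEmpty())
        {
            int[] frame = stack.pop();
            int i = frame[0], sum = frame[1];
            if (sum == target)
                return true; // reached a solution leaf
            if (i == nums.length || sum > target)
                continue; // backtrack: this branch is exhausted or pruned
            stack.push(new int[]{i + 1, sum});           // option 1: skip nums[i]
            stack.push(new int[]{i + 1, sum + nums[i]}); // option 2: take nums[i]
        }
        return false; // every unpruned branch was explored
    }

    public static void main(String[] args)
    {
        System.out.println(subsetSum(new int[]{3, 5, 7}, 12)); // true: 5 + 7
    }
}
The sum > target test is the backtracking step: it abandons a branch before reaching a leaf, which is exactly what separates this from pure brute force.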

Related

Is the visited array really needed in Dijkstra's Algorithm using Priority Queue?

vector<int> dijkstra(vector<vector<int>> &vec, int vertices, int edges, int source)
{
    vector<pair<int,int>> adj[vertices];
    for (int i = 0; i < edges; i++)
    {
        int u = vec[i][0];
        int v = vec[i][1];
        int w = vec[i][2];
        adj[u].push_back(make_pair(v, w));
        adj[v].push_back(make_pair(u, w));
    }
    vector<int> distance(vertices, INT_MAX);
    distance[source] = 0;
    priority_queue<pair<int,int>, vector<pair<int,int>>, greater<pair<int,int>>> pq;
    pq.push(make_pair(0, source));
    while (!pq.empty())
    {
        int dis = pq.top().first;
        int node = pq.top().second;
        pq.pop();
        for (auto adjacentNode : adj[node])
        {
            int currNode = adjacentNode.first;
            int dfn = adjacentNode.second;
            if (dfn + dis < distance[currNode])
            {
                distance[currNode] = dfn + dis;
                pq.push(make_pair(distance[currNode], currNode));
            }
        }
    }
    return distance;
}
I was writing code for Dijkstra's algorithm using a priority queue but forgot to initialize the visited array to keep track of visited vertices. I submitted the code and all the test cases passed. Is the visited array really needed, or am I missing something?
Since your algorithm keeps track of the shortest distance found so far to each node, the if (dfn+dis < distance[currNode]) check prevents the algorithm from revisiting nodes it has already settled: any loop it made to revisit a node would only add to the distance already recorded for that node, so the condition is false (assuming positive weights).
So indeed, you don't need the visited array in this variant of Dijkstra's algorithm. See also how it is not there in the pseudo code that Wikipedia offers.
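For reference, a common refinement of this variant is to also skip stale queue entries when they are popped. A minimal Java sketch (the method and parameter names are my own), assuming non-negative weights and adjacency lists of {neighbor, weight} pairs:
import java.util.*;

class LazyDijkstra
{
    // Entries in the queue may become stale; the d > dist[u] check
    // discards them, so no visited array or decrease-key is needed.
    static int[] shortestPaths(List<int[]>[] adj, int source)
    {
        int[] dist = new int[adj.length];
        Arrays.fill(dist, Integer.MAX_VALUE);
        dist[source] = 0;
        PriorityQueue<int[]> pq = new PriorityQueue<>((a, b) -> Integer.compare(a[0], b[0]));
        pq.add(new int[]{0, source});
        while (!pq.isEmpty())
        {
            int[] top = pq.poll();
            int d = top[0], u = top[1];
            if (d > dist[u])
                continue; // stale entry: u was already settled with a shorter distance
            for (int[] edge : adj[u])
            {
                int v = edge[0], w = edge[1];
                if (d + w < dist[v])
                {
                    dist[v] = d + w;
                    pq.add(new int[]{dist[v], v});
                }
            }
        }
        return dist;
    }
}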
This isn't really Dijkstra's algorithm, as Dijkstra's involves each node being added and removed from the priority queue at most once. This version will reinsert nodes in the queue when you have negative weight edges. It will also go into an infinite loop if you have negative weight cycles.
Note that the Wikipedia version uses a decrease-key operation; it does not insert into the priority queue in the if statement.
I don't know which version you're referring to that uses a visited array, but it's likely that visited array achieved the same purpose.
This is closer to the Bellman-Ford algorithm, which can also be implemented with a queue (and it's usually faster in practice if you do it that way than if you do it as shown in most sources by iterating the edges |V| times). A priority queue achieves the same results, but it will be slower than a simple FIFO queue. It was probably fast enough for the online judge you submitted this to.
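For illustration, a minimal Java sketch of that queue-based Bellman-Ford (often called SPFA); the names are my own, and it assumes the same adjacency-list shape as above:
import java.util.*;

class QueueBellmanFord
{
    // Relax edges out of a FIFO queue of vertices whose distance changed.
    // Unlike Dijkstra, this tolerates negative edge weights (though a
    // negative cycle would still require an extra relaxation counter).
    static int[] shortestPaths(List<int[]>[] adj, int source)
    {
        int n = adj.length;
        int[] dist = new int[n];
        Arrays.fill(dist, Integer.MAX_VALUE);
        boolean[] inQueue = new boolean[n];
        Deque<Integer> queue = new ArrayDeque<>();
        dist[source] = 0;
        queue.add(source);
        inQueue[source] = true;
        while (!queue.isEmpty())
        {
            int u = queue.poll();
            inQueue[u] = false;
            for (int[] edge : adj[u])
            {
                int v = edge[0], w = edge[1];
                if (dist[u] + w < dist[v])
                {
                    dist[v] = dist[u] + w;
                    if (!inQueue[v]) // avoid duplicate queue entries
                    {
                        queue.add(v);
                        inQueue[v] = true;
                    }
                }
            }
        }
        return dist;
    }
}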
Bottom line: what you have there isn't Dijkstra's algorithm. The visited array is likely necessary to make it Dijkstra's. You should edit your post to show that version as well so we can say for sure.

Greedy Algorithm for solving Horn formulas

This is an assignment question that I've been trying to understand for a couple of days so that I can ultimately solve it. So far, I have had no success, so any guidance or help in understanding or solving the problem is appreciated.
You are given a set of m constraints over n Boolean variables
{x1, x2, ..., xn}.
The constraints are of two types:
equality constraints: xi = xj, for some i != j
inequality constraints: xi != xj, for some i != j
Design an efficient greedy algorithm that given the
set of equality and inequality constraints determines if it is
possible or not to satisfy all the constraints simultaneously.
If it
is possible to satisfy all the constraints, your algorithm should
output an assignment to the variables that satisfies all the
constraints.
Choose a representation for the input to this problem
and state the problem formally using the notation Input: ..., Output:
....
Describe your greedy algorithm in plain English. In what
sense is your algorithm "greedy"?
Describe your greedy algorithm
in pseudocode.
Briefly justify the correctness of your algorithm.
State and justify the running time of your algorithm. The more efficient the algorithm, the better.
What I've figured out so far is that this problem is related to the Boolean satisfiability (SAT) problem. I've tried setting all the variables to false first and then, by counterexamples, proving that this cannot satisfy all the constraints at once.
I am getting confused between constraint satisfaction problems (CSP) and Horn SAT. I read certain articles on these to get a solution and this led me to confusion. My logic was to create a tree and apply DFS to check if constraints are satisfied, whereas Horn SAT solutions are leading me to mathematical proofs.
Any help is appreciated as this is my learning stage and I cannot master it all at once. :)
(informal) Classification:
So firstly, it's not the Boolean SAT problem, because that's NP-complete. Your teacher has implied that this isn't NP-complete by asking for an efficient (i.e. at most polynomial-time) way to always solve the problem.
Modelling (thinking about) the problem:
One way to think of this problem is as a graph, where inequalities represent one type of edge, while equalities represent another.
Thinking of this problem graphically helped me realise that it's a bit like a graph-colouring problem: we could set all nodes to ? (unset), then choose any node to set to true, then do a breadth-first search from that node to set all connecting nodes (setting them to either true or false), checking for any contradiction. If we complete this for a connected component of the graph, without finding contradictions, then we can ignore all nodes in that part and randomly set the value of another node, etc. If we do this until no connected components are left, and we still have no contradictions, then we've set the graph in a way that represents a legitimate solution.
Solution:
Because there are exactly n elements, we can make an associated "bucket" array for the equalities and another for the inequalities (each "bucket" could contain an array of what it equates to, but we could get even more efficient than this if we wanted [the complexity would remain the same]).
Your array of arrays for equalities could be imagined like this:
equalities[0] = [1]
equalities[1] = [0, 2]
equalities[2] = [1]
equalities[3] = [4]
equalities[4] = [3]
which would represent that:
0 == 1
1 == 2
3 == 4
Note that this is an irregular matrix and requires 2*m space. We do the same thing for an inequality matrix. Setting up both of these arrays (of arrays) takes O(m + n) space and time.
Now, if there exists a solution, {x0, x1, x2, x3}, then {!x0, !x1, !x2, !x3} is also a solution. Proof:
(xi == xj) iff (!xi == !xj)
So it won't affect our solution if we set one of the elements arbitrarily. Let's set xi to true, and set the others to ? [numerically we'll be dealing with three values: 0 (false), 1 (true), and 2 (unset)].
We'll call this array solution (even though it's not finished yet).
Now we can use recursion to consider all the consequences of setting our value:
(The below code is pseudo-code, as the questioner didn't specify a language. I've made it somewhat C++-style, but just to keep it generic and to use the pretty formatting colours.)
bool Set (int i, bool val) // i is the index
{
    if (solution[i] != '?')
        return (solution[i] == val);
    solution[i] = val;
    for (int j = 0; j < equalities[i].size(); j += 1)
    {
        bool success = Set(equalities[i][j], val);
        if (!success)
            return false; // Contradiction found
    }
    for (int j = 0; j < inequalities[i].size(); j += 1)
    {
        bool success = Set(inequalities[i][j], !val);
        if (!success)
            return false; // Contradiction found
    }
    return true; // No contradiction found
}
void Solve ()
{
    for (int i = 0; i < solution.size(); i += 1)
        solution[i] = '?';
    for (int i = 0; i < solution.size(); i += 1)
    {
        if (solution[i] != '?')
            continue; // value has already been set/checked
        bool success = Set(i, true);
        if (!success)
        {
            print "No solution";
            return;
        }
    }
    print "At least one solution exists. Here is a solution:";
    print solution;
}
Because of the first if condition in the Set function, the body of the function (beyond the if statement) can only be executed n times, once for each node value, and Set can only call itself when it passes that if statement. Each time Set passes into the body, the work it does is proportional to the number of edges associated with the corresponding node. The Solve function can call Set at most n times. Hence the total number of calls is O(m+n), which also bounds the amount of work done during the solving process.
A trick here is to recognise that the Solve function will need to call the Set function C times, where C is the number of connected components of the graph. Note that the connected components are independent of one another, so the same rule applies to each: we can legitimately choose a value for one of its elements and then consider the consequences.
The fastest solution would still need to read all of the constraints, O(m), and would need to output a solution when one is possible, O(n); therefore it's not possible to get a solution with better time complexity than O(m+n). The above is a greedy algorithm with O(m+n) time and space complexity.
It's probably possible to get better space complexity (while maintaining the O(m+n) time complexity), maybe even O(1), but I'm not sure.
As for Horn formulas, I'm embarrassed to admit that I know nothing about them, but this answer directly responds to everything that was asked of you in the assignment.
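In case it helps, here is a compilable Java rendering of the pseudocode above (my own translation; the field names are mine, and it assumes the two bucket lists eq and neq have already been built from the input):
import java.util.*;

class ConstraintSolver
{
    // solution[i] is 0 (false), 1 (true) or 2 (unset);
    // eq.get(i) / neq.get(i) list the indices constrained against i.
    int[] solution;
    List<List<Integer>> eq, neq;

    boolean set(int i, int val)
    {
        if (solution[i] != 2)
            return solution[i] == val; // contradiction iff the values differ
        solution[i] = val;
        for (int j : eq.get(i))
            if (!set(j, val))
                return false;
        for (int j : neq.get(i))
            if (!set(j, 1 - val))
                return false;
        return true;
    }

    boolean solve(int n)
    {
        solution = new int[n];
        Arrays.fill(solution, 2);
        for (int i = 0; i < n; i++)
            if (solution[i] == 2 && !set(i, 1)) // greedily pick true per component
                return false; // contradiction: no assignment exists
        return true; // solution[] now holds a satisfying assignment
    }
}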
Let's take an example: 110, with the constraints x1 = x2 and x2 != x3.
Remember, since we are only given the constraints, the algorithm could just as well generate 001 as output, as it satisfies the constraints too.
One way to solve it would be (see the sketch after this list):
Have two lists, one for each constraint type; each list holds pairs of (i, j) indices.
Sort the lists based on the i index.
Now, for each pair in the equality list, check that there is no constraint in the inequality list that conflicts with it.
If there is, you can exit right away.
Otherwise, check whether there are more pairs in the equality list that share one of the pair's indices.
You can then assign one or zero to that, and eventually you will be able to generate the complete output.
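One concrete way to realise this pair-checking idea (a union-find sketch of my own in Java, not from the answer above) is union-find with a parity bit; an inconsistent set of constraints, such as an odd cycle of inequalities, shows up as a failed union:
class ParityUnionFind
{
    int[] parent, parity; // parity[x]: 0 if x has the same value as its parent, 1 if negated

    ParityUnionFind(int n)
    {
        parent = new int[n];
        parity = new int[n];
        for (int i = 0; i < n; i++)
            parent[i] = i;
    }

    int find(int x)
    {
        if (parent[x] == x)
            return x;
        int p = parent[x];
        int root = find(p);     // makes parity[p] relative to the root
        parity[x] ^= parity[p]; // now parity[x] is relative to the root too
        parent[x] = root;
        return root;
    }

    // rel = 0 for an equality constraint, 1 for an inequality; returns
    // false if the constraint contradicts what is already recorded.
    boolean union(int a, int b, int rel)
    {
        int ra = find(a), rb = find(b);
        if (ra == rb)
            return (parity[a] ^ parity[b]) == rel;
        parent[ra] = rb;
        parity[ra] = parity[a] ^ parity[b] ^ rel;
        return true;
    }
}
Feed every constraint through union; any false return means the constraints are unsatisfiable, and otherwise assigning true to each root and negating according to parity yields a satisfying assignment.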

Fast Algorithm to Solve Unique Paths With Backtracking

A robot located at the top left corner of an XxX grid is trying to reach the bottom right corner. The robot can move either up, down, left, or right, but cannot visit the same spot twice. How many possible unique paths are there to the bottom right corner?
What is a fast algorithmic solution to this? I've spent a huge amount of time trying to figure out a fast algorithm for this, but I'm still stuck.
This is basically the unique paths Leetcode problem, except with backtracking.
Unique paths, without backtracking, can be solved with dynamic programming such as:
class Solution {
public:
    int uniquePaths(int m, int n) {
        vector<int> cur(n, 1);
        for (int i = 1; i < m; i++) {
            for (int j = 1; j < n; j++) {
                cur[j] += cur[j - 1];
            }
        }
        return cur[n - 1];
    }
};
What would be a fast algorithmic solution, using dynamic programming, to unique paths, except with backtracking? Something that could quickly find the result 1,568,758,030,464,750,013,214,100 for a 10X10 grid.
Reddit, Wikipedia, and YouTube have resources illustrating the complexity of this problem, but they don't have any answers.
The problem cannot be solved using dynamic programming, because there is no recurrence relation that breaks the problem into sub-problems. Dynamic programming assumes that the state to be computed depends only on the sub-states in the recurrence. That is not true in this case, because there can be cycles, i.e. going up and down.
The general case of this problem, counting the number of simple paths in a directed cyclic graph, is considered to be #P-complete.
This can also be seen as enumerating self-avoiding walks in two dimensions. As per Wikipedia,
Finding the number of such paths is conjectured to be an NP-hard problem[citation needed].
However, if we consider moves in only the positive directions, i.e. right and down, it has a closed-form solution: C(m+n, m). Basically, the total number of moves is always fixed at m + n, where m and n are the Cartesian distances to the end point of the diagonal, and we simply have to choose which m of them are the right moves (or which n are the down moves). The dynamic programming solution is essentially the same.
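For the restricted right/down case, the binomial can be computed directly; a small Java sketch (the class and method names are mine; BigInteger keeps larger grids from overflowing):
import java.math.BigInteger;

class MonotonePaths
{
    // C(m+n, m): after step k the running product equals C(n+k, k),
    // so the division is always exact.
    static BigInteger count(int m, int n)
    {
        BigInteger result = BigInteger.ONE;
        for (int k = 1; k <= m; k++)
            result = result.multiply(BigInteger.valueOf(n + k))
                           .divide(BigInteger.valueOf(k));
        return result;
    }
}
For example, count(3, 3) yields 20, which matches uniquePaths(4, 4) from the code above.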

Using Breadth First Search and inorder traversal to analyze the validity of a really large binary search tree

I was thinking about the different techniques to check the validity of a binary search tree. Naturally, the invariant that needs to be maintained is that the left subtree must contain only values less than or equal to the current node, which in turn should be less than or equal to everything in the right subtree. There are a couple of different ways to tackle this problem. The first is to check the value constraints on each subtree, and can be outlined like this (in Java, for integer nodes):
public static boolean isBST(TreeNode node, int lower, int higher)
{
    if (node == null) return true;
    else if (node.data < lower || node.data > higher) return false;
    return isBST(node.left, lower, node.data) && isBST(node.right, node.data, higher);
}
There is also another way to accomplish this using an inorder traversal, where you keep track of the previous element and make sure the progression is non-decreasing. Both these methods explore the left subtrees first, though, so in the event we have an inconsistency in the middle of the root's right subtree, what is the recommended path? I know that a BFS variant could be used, but would it be possible to use multiple techniques at the same time, and is that recommended? For example, we could do a BFS, an inorder, and a reverse-inorder traversal and return the moment a failure is detected. This might only be desirable for really large trees, in order to reduce the average runtime at the cost of a bit more space and multiple threads accessing the same data structure. Of course, if we're using a simple iterative solution for the inorder traversal (NOT a Morris traversal that modifies the tree), we will be using O(lg N) space.
I would expect this to depend on your precise situation; in particular, on the probability that your tree will fail to be a valid BST, and the expected depth at which the failure will occur.
For example, if the tree is likely to be valid, then it would be wasteful to use three techniques at once, as the overall runtime for a valid tree will be roughly tripled.
What about iterative deepening depth-first search?
It is generally (asymptotically) as fast as breadth-first search (and also finds any early failure), but uses as little memory as depth-first search.
It would typically look something like this:
boolean isBST(TreeNode node, int lower, int higher, int depth)
{
    if (depth == 0)
        return true;
    ...
    isBST(..., depth-1)
    ...
}
Caller:
boolean failed = false;
int treeHeight = height(root);
for (int depth = 2; depth <= treeHeight && !failed; depth++)
    failed = !isBST(root, -INFINITY, INFINITY, depth);
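Filled in, the whole thing might look like this (my completion of the sketch above, assuming int keys, the same TreeNode shape as the question, and a height helper; long bounds stand in for the infinities):
static boolean isBST(TreeNode node, long lower, long higher, int depth)
{
    if (node == null || depth == 0)
        return true; // no violation found within the inspected depth
    if (node.data < lower || node.data > higher)
        return false;
    return isBST(node.left, lower, node.data, depth - 1)
        && isBST(node.right, node.data, higher, depth - 1);
}

static boolean isBSTIterativeDeepening(TreeNode root)
{
    for (int depth = 2; depth <= height(root); depth++)
        if (!isBST(root, Long.MIN_VALUE, Long.MAX_VALUE, depth))
            return false; // a shallow failure is found without visiting deep nodes
    return true;
}
The upper levels get re-checked on every iteration, but as with any iterative deepening search, that repetition only adds a constant factor for reasonably balanced trees.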

What would be the time complexity of counting the number of all structurally different binary trees?

Using the method presented here: http://cslibrary.stanford.edu/110/BinaryTrees.html#java
12. countTrees() Solution (Java)
/**
 For the key values 1...numKeys, how many structurally unique
 binary search trees are possible that store those keys?
 Strategy: consider that each value could be the root.
 Recursively find the size of the left and right subtrees.
*/
public static int countTrees(int numKeys) {
    if (numKeys <= 1) {
        return(1);
    }
    else {
        // there will be one value at the root, with whatever remains
        // on the left and right each forming their own subtrees.
        // Iterate through all the values that could be the root...
        int sum = 0;
        int left, right, root;
        for (root = 1; root <= numKeys; root++) {
            left = countTrees(root - 1);
            right = countTrees(numKeys - root);
            // number of possible trees with this root == left*right
            sum += left * right;
        }
        return(sum);
    }
}
I have a sense that it might be n(n-1)(n-2)...1, i.e. n!
If using a memoizer, is the complexity O(n)?
The number of structurally unique binary trees with n nodes is the nth Catalan number. Catalan numbers are calculated as
C_n = (2n choose n) / (n + 1) = (2n)! / ((n + 1)! * n!),
which can be computed in O(n) arithmetic operations.
http://mathworld.wolfram.com/BinaryTree.html
http://en.wikipedia.org/wiki/Catalan_number#Applications_in_combinatorics
It's easy enough to count the number of calls to countTrees this algorithm uses for a given node count. After a few trial runs, it looks to me like it requires 5*3^(n-2) calls for n >= 2, which grows much more slowly than n!. The proof of this assertion is left as an exercise for the reader. :-)
A memoized version requires O(n) calls, as you suggested.
Incidentally, the number of binary trees with n nodes equals the n-th Catalan number.
The obvious approaches to calculating C_n all seem to be linear in n, so a memoized implementation of countTrees is probably the best one can do.
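For comparison, a memoized variant of countTrees (a sketch of my own; long is used to delay overflow, since the counts grow quickly):
static long countTreesMemo(int numKeys, long[] memo) // memo has length numKeys + 1
{
    if (numKeys <= 1)
        return 1;
    if (memo[numKeys] != 0)
        return memo[numKeys]; // already computed
    long sum = 0;
    for (int root = 1; root <= numKeys; root++)
        sum += countTreesMemo(root - 1, memo) * countTreesMemo(numKeys - root, memo);
    return memo[numKeys] = sum;
}
Only O(n) of the calls fall through to real work; the remaining calls are table hits, which is the super-linear lookup count mentioned in the next answer.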
I'm not sure how many hits to the look-up table the memoized version is going to make (it is definitely super-linear and carries the overhead of function calls), but since the mathematical proof shows the result is the nth Catalan number, one can quickly cook up a linear-time tabular method:
int C = 1;
for (int i = 1; i <= n; i++)
{
    // C_i = 2*(2i - 1) / (i + 1) * C_{i-1}
    C = 2 * (2 * (i - 1) + 1) * C / ((i - 1) + 2);
}
return C;
Note the difference between Memoization and Tabulation here

Resources