General algorithm for partial backtracking search


Backtracking search is a well-known problem-solving technique that recurses through all possible combinations of variable assignments in search of a valid solution. The general algorithm can be abstracted into a concise higher-order function: https://en.wikipedia.org/wiki/Backtracking
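A minimal Python sketch of such a higher-order function, in the spirit of the Wikipedia pseudocode (which uses `reject`, `accept`, `first` and `next`). It assumes that a candidate is a list extended one choice at a time. The names `backtrack`, `candidates` and the `queens` usage example are illustrative and do not come from the question.

```python
from typing import Callable, Iterable, List, Optional, TypeVar

T = TypeVar("T")


def backtrack(
    partial: List[T],
    candidates: Callable[[List[T]], Iterable[T]],
    reject: Callable[[List[T]], bool],
    accept: Callable[[List[T]], bool],
) -> Optional[List[T]]:
    """Depth-first search over extensions of `partial`; return the first accepted one."""
    if reject(partial):
        return None                  # prune: no extension of this partial candidate can succeed
    if accept(partial):
        return list(partial)         # complete, valid solution
    for choice in candidates(partial):
        partial.append(choice)       # tentative (don't-know) choice
        result = backtrack(partial, candidates, reject, accept)
        if result is not None:
            return result
        partial.pop()                # undo it and try the next alternative
    return None


def queens(n: int) -> Optional[List[int]]:
    """Usage example: partial[r] is the column of the queen in row r."""
    def reject(p: List[int]) -> bool:
        r = len(p) - 1               # only the newest queen can introduce a conflict
        return any(p[i] == p[r] or abs(p[i] - p[r]) == r - i for i in range(r))

    return backtrack([], lambda p: range(n) if len(p) < n else [], reject,
                     lambda p: len(p) == n)
```

Every choice made here is treated as don't-know: any of them may be undone. That is exactly why this shape alone cannot express don't-care choices.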
Some problems require partial backtracking. That is, they have a mixture of don't-know non-determinism (a choice that matters: if you get it wrong, you have to backtrack) and don't-care non-determinism (a choice that doesn't matter for the correctness of the solution, only perhaps for how long it takes to find it, so you never have to backtrack over it).
Consider for example the Boolean satisfiability problem, which can be solved with the DPLL algorithm. If you try to represent that with the general backtracking algorithm, the result will not only recurse through all 2^N variable assignments (which is sadly necessary in the general case), but also through all N! orders of trying the variables (completely unnecessary and hopelessly inefficient).
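One way to read the DPLL example: the choice of *which* variable to branch on is don't-care (made once per node and never revisited), while the *value* given to it is don't-know (both values may be tried). The following is a minimal Python sketch under that reading. It leaves out DPLL's unit propagation and pure-literal rules, and `solve`, `simplify` and `first_literal_var` are illustrative names. Because the variable order is never backtracked over, the search tree has at most 2^N leaves instead of 2^N · N!.

```python
from typing import List

Clause = List[int]  # DIMACS-style literals: 3 means x3, -3 means "not x3"


def simplify(clauses: List[Clause], lit: int) -> List[Clause]:
    """Make `lit` true: drop satisfied clauses and delete the falsified literal elsewhere."""
    return [[l for l in c if l != -lit] for c in clauses if lit not in c]


def solve(clauses, choose_var, assignment=None):
    """Return a satisfying (partial) assignment as {var: bool}, or None if unsatisfiable."""
    assignment = assignment or {}
    if not clauses:
        return assignment                 # every clause is satisfied
    if any(not c for c in clauses):
        return None                       # an empty clause is a conflict: backtrack
    var = choose_var(clauses)             # don't-care choice: committed, never revisited
    for value in (True, False):           # don't-know choice: both values may be tried
        lit = var if value else -var
        result = solve(simplify(clauses, lit), choose_var, {**assignment, var: value})
        if result is not None:
            return result
    return None


def first_literal_var(clauses):
    """A hypothetical selection heuristic; any other rule would be equally correct."""
    return abs(clauses[0][0])
```

Swapping `first_literal_var` for a smarter heuristic changes only the running time, never the answer. This is the "function parameter for a don't-care choice" the question asks about.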
Is there a general algorithm for partial backtracking? A concise higher-order function that takes function parameters for both don't-know and don't-care choices?

If I understand you correctly, you’re asking about symmetry-breaking in tree search. In the specific example you gave, all permutations of the list of variable assignments are equivalent.
Symmetries are going to be domain-specific. So is the more-general technique of pruning the search tree, by short-circuiting and backtracking eagerly. There are a few symmetry-breaking techniques I’ve used that generalize.
One is to search the problem space in a canonical order. If the branch that sets variable 10 only tries variables 11, 12 and up, not variables 9, 8 or 7, it won’t search any permutation of the same solution. It will only test solutions that are unique up to permutation. (In the specific case of SAT-solving, this might rule out an optimal search order—although you could re-order the variables arbitrarily.)
Another is to make a test that only one distinct solution of any equivalence class will pass, ideally one that can be checked near the top of the search tree. The classic example of this is, in the 8-queens problem, checking whether the queen on the row you look at first is on the left or the right side of the chessboard. Any solution where she’s on the right is a mirror-image of one other solution where she’s on the left, so you can cut the search space in half. (You can actually do better than this with that problem.) If you only need to test for satisfiability, you can get by with a filter that merely guarantees that, if any solution exists, at least one solution will pass.
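The mirror-image cut for the queens problem can be sketched in Python as follows. It assumes an even board size, so that no column lies on the mirror axis. `count_queens` and `count_with_mirror_symmetry` are illustrative names.

```python
def count_queens(n, first_row_cols):
    """Count n-queens solutions whose row-0 queen is in one of `first_row_cols`."""
    def place(row, cols, diag1, diag2):
        if row == n:
            return 1
        total = 0
        choices = first_row_cols if row == 0 else range(n)
        for c in choices:
            if c in cols or (row - c) in diag1 or (row + c) in diag2:
                continue                      # attacked: prune this branch
            total += place(row + 1, cols | {c}, diag1 | {row - c}, diag2 | {row + c})
        return total
    return place(0, frozenset(), frozenset(), frozenset())


def count_with_mirror_symmetry(n):
    """Search only the left half of row 0, then double (reflection maps column c to n-1-c)."""
    assert n % 2 == 0, "sketch assumes an even n, so no queen sits on the mirror axis"
    return 2 * count_queens(n, range(n // 2))
```

Both functions count the same 92 solutions for the 8x8 board, but the second one explores only half of the top-level branches.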
If you have enough memory, you might also store the set of branches that have already been searched, and check whether a branch you are about to search is equivalent to one already in the set. This is more practical for a search space with a huge number of symmetries than for one with a huge number of solutions that are unique up to symmetry.
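A small Python sketch of the "set of already-searched branches" idea. It assumes that branches differing only in the order in which Boolean variables were assigned are equivalent, and it uses the frozenset of (variable, value) pairs as the canonical key. `explore` is an illustrative name; it counts expanded nodes so that the saving is visible.

```python
def explore(n_vars, dedupe):
    """Expand partial assignments of n_vars Booleans, assigning variables in any order.

    With dedupe=True, each partial assignment is keyed by the frozenset of its
    (variable, value) pairs, so branches that differ only in assignment order
    are recognized as equivalent and expanded once. Returns the number of expansions.
    """
    seen = set()
    expanded = 0

    def visit(assignment):
        nonlocal expanded
        if dedupe:
            key = frozenset(assignment.items())
            if key in seen:
                return                     # an equivalent branch was already searched
            seen.add(key)
        expanded += 1
        for var in range(n_vars):
            if var not in assignment:
                for value in (False, True):
                    assignment[var] = value
                    visit(assignment)
                    del assignment[var]    # undo before trying the next branch

    visit({})
    return expanded
```

With deduplication, each of the 3^N distinct partial assignments is expanded once. Without it, the count grows roughly like 2^N · N! (79 versus 27 for N = 3).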

Related

Is Dijkstra's algorithm deterministic?

I think that Dijkstra's algorithm is determined, in the sense that if you choose the same starting vertex, you will get the same result (the same distances to every other vertex). But I don't think that it is deterministic (that each step uniquely determines the next step), because that would mean that it wouldn't have to search for the shortest distances in the first place.
Am I correct? If I'm wrong, could you please explain why it is deterministic, and maybe give an example?

I'm not sure there is a universal definition of determinism, but Wikipedia defines it as...
... an algorithm which, given a particular input, will always produce the same output, with the underlying machine always passing through the same sequence of states.
So this requires both determinism of the output and determinism of the execution. The output of Dijkstra's algorithm is deterministic no matter how you look at it, because it's the length of the shortest path, and there is only one such length.
The execution of Dijkstra's algorithm in the abstract sense is not deterministic, because the final step is:
Otherwise, select the unvisited node that is marked with the smallest tentative distance, set it as the new "current node", and go back to step 3.
If there are multiple nodes with the same smallest tentative distance, the algorithm is free to select one arbitrarily. This doesn't affect the output, but it does affect the order of operations within the algorithm.
A particular implementation of Dijkstra's algorithm, however, probably is deterministic, because the nodes will be stored in a deterministic data structure like a min heap. This will result in the same node being selected each time the program is run. Even here, though, things like hash-table salts (randomized hashing of node keys) may affect determinism.
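A minimal Python sketch of such a min-heap implementation, using the standard `heapq` module. It assumes integer node ids, so that `(distance, node)` tuples on the heap break distance ties by the smaller id. The graph mirrors the 0→{1,2,3,4}→5 example discussed in the next answer, with assumed unit weights. `dijkstra` and `path_to` are illustrative names.

```python
import heapq


def dijkstra(graph, source):
    """graph: dict node -> list of (neighbor, weight) with weight >= 0; node ids are ints.

    Heap entries are (distance, node) tuples, so equal distances are popped in
    increasing node-id order: ties are resolved by the data, not by chance.
    """
    dist = {source: 0}
    prev = {}
    heap = [(0, source)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue                      # stale heap entry
        done.add(u)
        for v, w in graph.get(u, []):
            nd = d + w
            if v not in dist or nd < dist[v]:
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    return dist, prev


def path_to(prev, target):
    path = [target]
    while path[-1] in prev:
        path.append(prev[path[-1]])
    return path[::-1]


# Four equal-length routes from 0 to 5 (unit weights are an assumption).
GRAPH = {0: [(1, 1), (2, 1), (3, 1), (4, 1)],
         1: [(5, 1)], 2: [(5, 1)], 3: [(5, 1)], 4: [(5, 1)]}
```

Every run reports the distance 2 and the same path, 0→1→5, because node 1 is always popped before nodes 2, 3 and 4.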

Allow me to expand on Thomas's answer.
If you look at an implementation of Dijkstra, such as this example (http://graphonline.ru/en/?graph=NnnNwZKjpjeyFnwx), you'll see a graph in which several routes between the same two nodes have equal length.
In the example graph, 0→1→5, 0→2→5, 0→3→5 and 0→4→5 all have the same length. So "the shortest path" is not necessarily unique, as this example shows.
Using the wording on Wikipedia, at some point the algorithm instructs us to:
select the unvisited node that is marked with the smallest tentative distance.
The problem here is the word "the", suggesting that the node is somehow unique. It may not be. For an implementation to actually pick one node from many equal candidates requires further specification of the algorithm regarding how to select it. But any such selected candidate having the required property will determine a path of the shortest length. So the algorithm doesn't commit. The modern approach to wording this algorithm would be to say:
select any unvisited node that is marked with the smallest tentative distance.
From a mathematical graph theory algorithm standpoint, that algorithm would technically proceed with all such candidates simultaneously in a sort of multiverse. All answers it may arrive at are equally valid. And when proving the algorithm works, we would prove it for all such candidates in all the multiverses and show that all choices arrive at a path of the same distance, and that the distance is the shortest distance possible.
Then, if you want to use the algorithm to just compute one such answer because you want to either A) find one such path, or B) determine the distance of such a path, then it is left up to you to select one specific branch of the multiverse to explore. All such selections made according to the algorithm as defined will yield a path whose length is the shortest length possible. You can define any additional non-conflicting criteria you wish to make such a selection.
The reason the implementation I linked is deterministic, and always gives the same answer (for the same start and end nodes, obviously), is that the nodes themselves are ordered in the computer. This extra ordering information is not part of the graph-theoretic model: the nodes are often labelled, but not necessarily ordered. The implementation relies on the fact that the nodes appear in an ordered array in memory, and it uses this ordering to resolve ties, possibly by selecting the node with the lowest index in the array, a.k.a. the "first" candidate.
If an implementation resolved ties by selecting a winner randomly (not pseudo-randomly!) from equal candidates, then it wouldn't be deterministic either.
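To make "select any unvisited node" concrete, here is a Python sketch of the array-based O(V²) version, where the tie-breaking rule is a parameter. `lowest_index` imitates the "first candidate" rule. `random_tie` uses `random.SystemRandom`, which draws on the operating system's entropy source rather than a seeded generator. All names and the edge list (the equal-length example above, with assumed unit weights) are illustrative.

```python
import random


def dijkstra_with_tiebreak(n, edges, source, pick):
    """Array-based Dijkstra on nodes 0..n-1; `pick` chooses among tied candidates.

    edges: list of directed (u, v, w) with w >= 0.
    Returns (distances, order in which nodes were selected).
    """
    INF = float("inf")
    dist = [INF] * n
    dist[source] = 0
    visited = [False] * n
    adj = [[] for _ in range(n)]
    for u, v, w in edges:
        adj[u].append((v, w))
    order = []
    for _ in range(n):
        best = min((dist[i] for i in range(n) if not visited[i]), default=INF)
        if best == INF:
            break                                  # remaining nodes are unreachable
        tied = [i for i in range(n) if not visited[i] and dist[i] == best]
        u = pick(tied)                             # "select ANY node with the smallest distance"
        visited[u] = True
        order.append(u)
        for v, w in adj[u]:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
    return dist, order


lowest_index = lambda tied: tied[0]                     # deterministic: first in array order
random_tie = lambda tied: random.SystemRandom().choice(tied)   # OS-entropy-based choice

EDGES = [(0, 1, 1), (0, 2, 1), (0, 3, 1), (0, 4, 1),
         (1, 5, 1), (2, 5, 1), (3, 5, 1), (4, 5, 1)]
```

With `lowest_index`, the selection order is always 0, 1, 2, 3, 4, 5. With `random_tie`, the order of nodes 1 to 4 can differ from run to run, but the distances never do.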
Dijkstra's algorithm as described on Wikipedia just defines an algorithm to find the shortest paths (note the plural paths) between nodes. Any such path that it finds (if it exists) it is guaranteed to be of the shortest distance possible. You're still left with the task of deciding between equivalent candidates though at step 6 in the algorithm.

As the tag says, the usual term is "deterministic". And the algorithm is indeed deterministic. For any given input, the steps executed are always identical.
Compare it to a simpler algorithm like adding two multi-digit numbers. The result is always the same for two given inputs, the steps are also the same, but you still need to add the numbers to get the outcome.

If by deterministic you mean that it will give the same answer to the same query for the same data every time, and give only one answer, then it is deterministic. If it were not deterministic, think of the problems it would cause for those who use it. I write in Prolog all day, so I know non-deterministic answers when I see them.
Here I just introduced a simple mistake in Prolog and the answer was not deterministic, and with a simple fix it is deterministic.
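Non-deterministic
% spacing/2 is not shown in the original; this wrapper is an assumed reconstruction.
spacing(Length, Atom) :-
    spacing_rec(Length, Chars),
    atom_chars(Atom, Chars).
spacing_rec(0, []).
spacing_rec(Length0, [' '|T]) :-
    succ(Length, Length0),      % fails for Length0 = 0, but only after this clause is tried
    spacing_rec(Length, T).
?- spacing(0, Atom).
Atom = '' ;
false.
Deterministic
% Same spacing/2 wrapper as above.
spacing_rec(0, []) :- !.        % cut: commit to this clause, leaving no choice point
spacing_rec(Length0, [' '|T]) :-
    succ(Length, Length0),
    spacing_rec(Length, T).
?- spacing(0, Atom).
Atom = ''.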

I will try to keep this short and simple; there are many great explanations of this here and online, if you do some research.
Dijkstra's algorithm is a greedy algorithm. Its main goal is to find the shortest path between two nodes of a weighted graph.
Wikipedia does a great job of explaining what deterministic and non-deterministic algorithms are, and how you can 'determine' which category an algorithm falls into:
From Wikipedia Source:
Deterministic Algorithm:
In computer science, a deterministic algorithm is an algorithm which, given a particular input, will always produce the same output, with the underlying machine always passing through the same sequence of states. Deterministic algorithms are by far the most studied and familiar kind of algorithm, as well as one of the most practical, since they can be run on real machines efficiently.
Formally, a deterministic algorithm computes a mathematical function; a function has a unique value for any input in its domain, and the algorithm is a process that produces this particular value as output.
Nondeterministic Algorithm
In computer science, a nondeterministic algorithm is an algorithm that, even for the same input, can exhibit different behaviors on different runs, as opposed to a deterministic algorithm. There are several ways an algorithm may behave differently from run to run. A concurrent algorithm can perform differently on different runs due to a race condition.
So, going back to the goal of Dijkstra's algorithm: it is like saying I want to get from location X to location Z, and I have the option of going through nodes on shorter routes that will get me to my destination more quickly and efficiently than other, longer routes...
Thinking through the cases in which Dijkstra's algorithm could be deterministic would be a good idea as well.

Find mutually compatible options from list of list of options

For the purposes of this question, let us call a list of mutually incompatible options an "OptionS". I have a list of such OptionS, where each Option, apart from disqualifying all other Options in its own OptionS list, also disqualifies some Options from the other OptionS lists. These rules are symmetrical, so if A forbids B, B forbids A.
I want to pick exactly one Option from each list, such that no Options disqualify each other. There are too many Options (and OptionS) and too few disqualifications in each step to brute force a backtracking solution.
It reminds me a bit of Sudoku, but it is not an exact analog. From certain external factors, I have a rough likelihood for the different Options, or at least an ordering.
Is there a known better solution to this problem? Is it in NP?
Currently, I plan to just take random "paths" through the solution space, weighted by likelihood. A sort of simulated annealing.
EDIT - Clarification
I have a number, let's say between 5 and 500, of vectors.
Each vector contains a number, between 10 and 10000, of elements
Each element rules out a number of elements in the other vectors
This relation is symmetric
I want to pick exactly one element from each vector in a way that no elements disqualify each other
If there is no way to choose one from each vector, I want to at least choose as many as possible. The nature of the data is such that there will always be at least one (and at most a few) solution (or almost-solution - with just a few misses).
I cannot share the real data, but an example would be that the elements are integers between 1 and 10e9 and that only elements whose pairwise sum has more than P prime factors are allowed. Some numbers are more likely than others to "fit" other numbers, since larger numbers tend to have more factors, which makes some choices more likely, just like in the real data.
Pick P and the sizes and number of vectors as needed to make it suitably challenging :).
My naive solution:
I order the elements by how many other elements they rule out and try those who rule out few first (because that gives you a larger chance to be able to pick one from each).
Then I order the vectors by how many elements the "best" element rules out. Vectors that rule out many other elements are first. So the most constrained vector is tried first, even though the least constrained elements of that vector are tried first.
I then search depth first.
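A Python sketch of the naive approach described above. It assumes that each element is encoded as a hypothetical `(vector index, element index)` pair, that conflicts are stored in a symmetric dictionary, and that every vector is non-empty. `pick_compatible` and `symmetric` are illustrative names.

```python
from typing import Dict, List, Optional, Set, Tuple

Elem = Tuple[int, int]  # (vector index, element index): an assumed encoding


def symmetric(pairs) -> Dict[Elem, Set[Elem]]:
    """Build a symmetric conflict table from (a, b) pairs."""
    table: Dict[Elem, Set[Elem]] = {}
    for a, b in pairs:
        table.setdefault(a, set()).add(b)
        table.setdefault(b, set()).add(a)
    return table


def pick_compatible(sizes: List[int], conflicts: Dict[Elem, Set[Elem]]) -> Optional[Dict[int, Elem]]:
    """One element per vector with no pairwise conflicts, via ordered depth-first search.

    Within a vector, elements that rule out the fewest others are tried first;
    vectors whose best element rules out the most come first (most constrained first).
    """
    degree = lambda e: len(conflicts.get(e, ()))
    elems = {i: sorted(((i, j) for j in range(n)), key=degree) for i, n in enumerate(sizes)}
    vector_order = sorted(elems, key=lambda i: degree(elems[i][0]), reverse=True)
    chosen: Dict[int, Elem] = {}

    def dfs(k: int) -> bool:
        if k == len(vector_order):
            return True
        i = vector_order[k]
        for e in elems[i]:
            if all(e not in conflicts.get(c, ()) for c in chosen.values()):
                chosen[i] = e
                if dfs(k + 1):
                    return True
                del chosen[i]              # backtrack
        return False

    return dict(chosen) if dfs(0) else None
```

The weakness described in the next paragraph is visible in `dfs`: the first vector's choice is only revisited after every deeper combination has failed.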
The problem with this approach is that if the first choice is wrong, then the depth first search will never have time to reach the next choice.
A better way, which I try to explain in a comment below, would be to score each partial choice (node in the search tree) of elements according to how many you have chosen and how many elements are left. Then I could look deeper in the highest scoring node at each step, so the first choice is less rigid.
A similar way, which I might try first because it is slightly easier, is to do simulated annealing and take random paths, weighted by how many possibilities they keep, down the tree.

Depending on what constraints are allowed, I think you can reduce SAT to this.
Take a SAT expression e.g. (A|B|C)(~A|C|~D)...
Replace ~A by a and make a vector out of each term giving you {A,B,C} {a,C,d}...
You now have the problem of choosing one element from each vector, subject to the constraint that you cannot choose both versions of a variable - the constraints say that A is incompatible with a, B is incompatible with b, and so on.
If you can solve this instance of your problem, you can solve SAT: set to true the variables chosen in your problem as A, B, C, ..., set to false the variables chosen as a, b, c, ... (the constraints guarantee that you never choose both A and a), and make an arbitrary choice for any variable not chosen. Therefore your problem is at least as hard as SAT. (Unless you never encounter these sorts of constraints, in which case I have not proved that your problem is this hard.)
Conversely, given an instance of your problem, associate a variable with each element and write the constraints as Boolean clauses, typically with only 2 variables each (e.g. (~A|~B) when A and B are incompatible). This looks like 2-SAT, except that you also need a clause of the form (A|B|C|D|...) for each vector, saying that you must choose at least one element from it. So the exact-solution version of your problem, at least, might code up quite nicely as input for a SAT solver. That means it is in NP, and since we have already shown it is NP-hard, it is NP-complete.
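The reduction can be sketched in Python. Here literals are written as signed integers (DIMACS-style, so 1 is A and -1 is a) instead of A/a, and a brute-force search over one-element-per-vector selections stands in for "a solver for your problem". The names are illustrative, and the brute force is exponential: it is meant only to show that the mapping back to a SAT assignment is correct.

```python
from itertools import product


def cnf_to_vectors(clauses):
    """Each clause becomes a vector of literals; literal l is incompatible with -l."""
    vectors = [list(c) for c in clauses]
    incompatible = lambda a, b: a == -b
    return vectors, incompatible


def selection_to_assignment(selection, variables):
    """Chosen literals become true; variables never chosen get an arbitrary value (False)."""
    assignment = {v: False for v in variables}
    for lit in selection:
        assignment[abs(lit)] = lit > 0
    return assignment


def solve_by_brute_force(clauses):
    """Solve SAT by solving the 'one compatible element per vector' problem."""
    vectors, incompatible = cnf_to_vectors(clauses)
    variables = {abs(l) for c in clauses for l in c}
    for selection in product(*vectors):
        if not any(incompatible(a, b) for a in selection for b in selection):
            return selection_to_assignment(selection, variables)
    return None
```

For `(A|B|C)(~A|C|~D)`, written as `[[1, 2, 3], [-1, 3, -4]]`, the first compatible selection is (A, C), which yields a satisfying assignment.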
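The encoding in the other direction, sketched in Python. Each element `(i, j)` gets a fresh variable number, each vector gets an "at least one" clause, and each conflicting pair gets a two-literal "not both" clause. The output is written in the DIMACS CNF text format (a `p cnf <vars> <clauses>` header, then one clause per line terminated by `0`) that SAT solvers commonly accept. `to_cnf` and `to_dimacs` are illustrative names. If the solver picks several elements from one vector, any one of them can be kept, since a subset of a conflict-free selection is still conflict-free.

```python
from itertools import count


def to_cnf(sizes, conflict_pairs):
    """Clauses (lists of signed ints) for 'at least one element per vector, no conflicting pair'."""
    ids = count(1)
    var = {(i, j): next(ids) for i, n in enumerate(sizes) for j in range(n)}
    clauses = [[var[(i, j)] for j in range(n)] for i, n in enumerate(sizes)]  # (A|B|C|...)
    for a, b in conflict_pairs:
        clauses.append([-var[a], -var[b]])                                   # (~A|~B)
    return var, clauses


def to_dimacs(num_vars, clauses):
    lines = [f"p cnf {num_vars} {len(clauses)}"]
    lines += [" ".join(map(str, c)) + " 0" for c in clauses]
    return "\n".join(lines)
```

For two vectors of sizes 2 and 3 with one conflict between `(0, 0)` and `(1, 2)`, this produces the clauses `[1, 2]`, `[3, 4, 5]` and `[-1, -5]`.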

My first recommendation would be to find an off-the-shelf constraint solver and try that (request a maximum-weight solution with the log-likelihoods as weights), but if you're determined to implement a solver from scratch, then I would suggest that you start with something like WalkSAT. To summarize the link in the language of your question: at all times, keep a list of option choices (one from each option list, not necessarily compatible) and a list of conflicts (i.e., a set of pairs of indexes into the list of option lists). Repeatedly choose a conflict at random and resolve it by choosing differently for one half of the conflict or the other (most of the time) so as to decrease the number of conflicts afterward as much as possible or (some of the time) randomly, perhaps according to the likelihoods. Good data structures will be essential in making this run fast.
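A minimal WalkSAT-flavoured sketch in Python of the loop described above. It assumes the same hypothetical `(vector index, element index)` encoding with a symmetric conflict dictionary. For clarity, the conflict list is recomputed from scratch at each step; as noted above, a fast version would maintain it incrementally. The likelihood weighting is left out. `conflicts_of` and `local_search` are illustrative names.

```python
import random


def conflicts_of(choice, conflicts):
    """All pairs (a, b) of vector indexes whose chosen elements conflict."""
    n = len(choice)
    return [(a, b) for a in range(n) for b in range(a + 1, n)
            if (b, choice[b]) in conflicts.get((a, choice[a]), ())]


def local_search(sizes, conflicts, max_steps=10_000, noise=0.2, seed=0):
    """Return (fewest conflicts seen, corresponding choice of element index per vector)."""
    rng = random.Random(seed)
    choice = [rng.randrange(n) for n in sizes]      # one pick per vector, maybe conflicting
    best = (len(conflicts_of(choice, conflicts)), list(choice))
    for _ in range(max_steps):
        bad = conflicts_of(choice, conflicts)
        if len(bad) < best[0]:
            best = (len(bad), list(choice))
        if not bad:
            break
        vec = rng.choice(rng.choice(bad))          # one side of a random conflict
        if rng.random() < noise:
            choice[vec] = rng.randrange(sizes[vec])    # random-walk step
        else:
            def cost(j):
                trial = choice[:vec] + [j] + choice[vec + 1:]
                return len(conflicts_of(trial, conflicts))
            choice[vec] = min(range(sizes[vec]), key=cost)  # greedy repair
    return best
```

Because the function returns the best configuration seen, it also covers the "choose as many as possible" fallback: a small remaining conflict count is an almost-solution.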

How can you compute a shortest addition chain for an arbitrary n <= 600 within one second?

How can you compute a shortest addition chain (sac) for an arbitrary n <= 600 within one second?
Notes
This is the programming competition on codility for this month.
Addition chains are numerically very important, since they are the most economical way to compute x^n (by consecutive multiplications).
Knuth's Art of Computer Programming, Volume 2, Seminumerical Algorithms has a nice introduction to addition chains and some interesting properties, but I didn't find anything that enabled me to fulfill the strict performance requirements.
What I've tried (spoiler alert)
Firstly, I constructed a (highly branching) tree (with the start 1-> 2 -> ( 3 -> ..., 4 -> ...)) such that for each node n, the path from the root to n is a sac for n. But for values >400, the runtime is about the same as for making a coffee.
Then I used that program to find some useful properties for reducing the search space. With that, I'm able to build all solutions up to 600 while making a coffee. But for n, I need to compute all solutions up to n. Unfortunately, codility measures the class initialization's runtime, too...
Since the problem is probably NP-hard, I ended up hard-coding a lookup table. But since codility asked to construct the sac, I don't know if they had a lookup table in mind, so I feel dirty and like a cheater. Hence this question.
Update
If you think a hard-coded, full lookup table is the way to go, can you give an argument why you think a full computation/partly computed solutions/heuristics won't work?

I have just got my Golden Certificate for this problem. I will not provide a full solution because the problem is still available on the site. I will instead give you some hints:
You might consider doing a depth-first search.
There exists a minimal star-chain for each n < 12509.
You need to know how to prune your search space.
You need a good lower bound for the length of the chain you are looking for.
Remember that you need just one solution, not all.
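One way these hints might fit together, as a Python sketch and not the answerer's actual solution: an iterative-deepening depth-first search over star chains. The search starts at the lower bound ceil(log2 n), prunes any branch that could not reach n even by doubling at every remaining step, and stops at the first chain found. `star_chain` is an illustrative name. A Python version is not guaranteed to meet the one-second limit for every n ≤ 600.

```python
def star_chain(n: int) -> list:
    """Shortest star (Brauer) chain for n via iterative-deepening DFS.

    Each new element is last + a for some element a already in the chain.
    Per the hint above, a minimal star chain exists for every n < 12509.
    """
    if n == 1:
        return [1]

    def dfs(chain, limit):
        last = chain[-1]
        if last == n:
            return True                     # one solution is enough
        steps_left = limit - (len(chain) - 1)
        if steps_left == 0 or last << steps_left < n:
            return False                    # even doubling every remaining step cannot reach n
        for a in reversed(chain):           # try the largest summands first
            nxt = last + a
            if nxt <= n:
                chain.append(nxt)
                if dfs(chain, limit):
                    return True
                chain.pop()
        return False

    limit = (n - 1).bit_length()            # ceil(log2 n): lower bound on the chain length
    while True:
        chain = [1]
        if dfs(chain, limit):
            return chain
        limit += 1
```

For example, `star_chain(15)` returns a chain with 5 additions, such as 1, 2, 3, 6, 12, 15.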
Good luck.

Addition chains are numerically very important, since they are the most economical way to compute x^n (by consecutive multiplications).
This is not true. They are not always the most economical way to compute x^n. Graham et al. proved that:
If each step in addition chain is assigned a cost equal to the product of the numbers at that step, "binary" addition chains are shown to minimize the cost.
Situation changes dramatically when we compute x^n (mod m), which is a common case, for example in cryptography.
Now, to answer your question. Apart from hard-coding a table with answers, you could try a Brauer chain.
A Brauer chain (aka star-chain) is an addition chain in which each new element is the sum of the previous element and some element already in the chain (possibly the previous element itself). For every n < 12509, some Brauer chain is a shortest addition chain. Quoting Daniel J. Bernstein:
Brauer's algorithm is often called "the left-to-right 2^k-ary method", or simply "2^k-ary method". It is extremely popular. It is easy to implement; constructing the chain for n is a simple matter of inspecting the bits of n. It does not require much storage.
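A Python sketch of the left-to-right 2^k-ary method as described in that quote. It precomputes 1 .. 2^k − 1, then reads n in base 2^k from the most significant digit: k doublings per digit, followed by adding the digit. The result is an addition chain for n, but not necessarily a shortest one. `kary_chain` is an illustrative name.

```python
def kary_chain(n: int, k: int = 2) -> list:
    """Addition chain for n by the left-to-right 2^k-ary method (not necessarily shortest)."""
    if n < 2 ** k:
        return list(range(1, n + 1))          # tiny n: just add 1 repeatedly
    chain = list(range(1, 2 ** k))            # precompute 1, 2, ..., 2^k - 1
    mask = 2 ** k - 1
    digits = []
    m = n
    while m:
        digits.append(m & mask)               # base-2^k digits, least significant first
        m >>= k
    digits.reverse()                          # most significant digit first
    acc = digits[0]                           # nonzero, already in the precomputed part
    for d in digits[1:]:
        for _ in range(k):
            acc *= 2                          # k doublings: acc <- acc * 2^k
            chain.append(acc)
        if d:
            acc += d                          # add the precomputed digit
            chain.append(acc)
    return sorted(set(chain))                 # drop repeats; summands precede their sums
```

For n = 600 and k = 2 (base-4 digits 2, 1, 1, 2, 0), this gives 1, 2, 3, 4, 8, 9, 18, 36, 37, 74, 148, 150, 300, 600. That is 13 additions, whereas a shortest chain is shorter.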
BTW. Does anybody know a decent C/C++ implementation of Brauer's chain computation? I'm working partially on a comparison of exponentiation times using binary and Brauer's chains for both cases: x^n and x^n (mod m).

Using finite automata as keys to a container

I have a problem where I really need to be able to use finite automata as the keys to an associative container. Each key should actually represent an equivalence class of automata, so that when I search, I will find an equivalent automaton (if such a key exists), even if that automaton isn't structurally identical.
An obvious last-resort approach is of course to use linear search with an equivalence test for each key checked. I'm hoping it's possible to do a lot better than this.
I've been thinking in terms of trying to impose an arbitrary but consistent ordering, and deriving an ordered comparison algorithm. First principles involve the sets of strings that the automata represent. Evaluate the set of possible first tokens for each automaton, and apply an ordering based on those two sets. If necessary, continue to the sets of possible second tokens, third tokens etc. The obvious problem with doing this naively is that there's an infinite number of token-sets to check before you can prove equivalence.
I've been considering a few vague ideas: minimising the input automata first and using some kind of closure algorithm, or converting back to a regular grammar, and some ideas involving spanning trees. I've come to the conclusion that I need to abandon the set-of-tokens lexical ordering, but the most significant conclusion I've reached so far is that this isn't trivial, and I'm probably better off reading up on someone else's solution.
I've downloaded a paper from CiteSeerX - Total Ordering on Subgroups and Cosets - but my abstract algebra isn't even good enough to know if this is relevant yet.
It also occurred to me that there might be some way to derive a hash from an automaton, but I haven't given this much thought yet.
Can anyone suggest a good paper to read? - or at least let me know if the one I've downloaded is a red herring or not?

I believe that you can obtain a canonical form from minimized automata. For any two equivalent automata, their minimized forms are isomorphic (I believe this follows from the Myhill-Nerode theorem). This isomorphism respects edge labels and, of course, node classes (start, accepting, non-accepting). This makes the problem easier than unlabeled graph isomorphism.
I think that if you build a spanning tree of the minimized automaton starting from the start state and ordering output edges by their labels, then you'll get a canonical form for the automaton which can then be hashed.
Edit: Non-tree edges should be taken into account too, but they can also be ordered canonically by their labels.

Here is a thesis from 1992 in which canonical minimized automata are produced: Minimization of Nondeterministic Finite Automata
Once you have the canonical form, you can easily hash it. For example, perform a depth-first enumeration of the states and transitions, numbering states in the order of their first appearance, and hash a string obtained by encoding each transition as a tuple
<from_state, symbol, to_state, is_accepting_final_state>
This should solve the problem.
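A Python sketch of that canonical encoding. It assumes the input is a deterministic automaton that has already been minimized (for example with Hopcroft's or Moore's algorithm, not shown), given as a start state, a dict `(state, symbol) -> state`, and a set of accepting states, with sortable symbols. States are renumbered in order of first appearance during a depth-first walk that follows outgoing edges sorted by symbol. Each transition is emitted as (from, symbol, to, to_is_accepting), and the start state's accepting flag is stored separately. `canonical_form` and `automaton_hash` are illustrative names.

```python
import hashlib


def canonical_form(start, delta, accepting):
    """Canonical, hashable encoding of a minimized DFA (reachable part only)."""
    out_edges = {}
    for (src, sym), dst in delta.items():
        out_edges.setdefault(src, []).append((sym, dst))
    number = {start: 0}
    triples = []

    def visit(state):
        for sym, dst in sorted(out_edges.get(state, []), key=lambda e: e[0]):
            is_new = dst not in number
            if is_new:
                number[dst] = len(number)          # numbered by first appearance
            triples.append((number[state], sym, number[dst], dst in accepting))
            if is_new:
                visit(dst)

    visit(start)
    return (start in accepting, tuple(triples))


def automaton_hash(start, delta, accepting):
    """Stable digest of the canonical form, e.g. for an external key-value store."""
    return hashlib.sha256(repr(canonical_form(start, delta, accepting)).encode()).hexdigest()
```

Because the canonical form is a tuple, it can be used directly as a `dict` key. Looking up an automaton that is isomorphic to a stored key, but uses different state names, then finds the stored entry.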

When a problem seems insurmountable, the solution is often to publicly announce how difficult you think the problem is. Then, you will immediately realise that the problem is trivial and that you've just made yourself look like an idiot - and that's basically where I am now ;-)
As suggested in the question, to lexically order the two automata, I need to consider two things. The two sets of possible first tokens, and the two sets of possible everything-else tails. The tails can be represented as finite automata, and can be derived from the original automata.
So the comparison algorithm is recursive - compare the head, if different you have your result, if the same then recursively compare the tail.
The problem is the infinite sequence needed to prove equivalence for regular grammars in general. If, during a comparison, a pair of automata recur, equivalent to a pair that you checked previously, you have proven equivalence and you can stop checking. It is in the nature of finite automata that this must happen in a finite number of steps.
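For deterministic automata, the "tail" after a token is the same automaton started from the successor state. So the recursive head/tail comparison with a stop-on-recurring-pair rule can be sketched in Python as a walk over pairs of states. This is only an equivalence test, not the lexical ordering the question is after. Missing transitions are assumed to go to a non-accepting sink, represented as `None`. `equivalent` is an illustrative name.

```python
from collections import deque


def equivalent(dfa1, dfa2, alphabet):
    """Explore pairs of states reached on the same input; never expand a pair twice.

    Each dfa is (start, delta, accepting) with delta: dict (state, symbol) -> state.
    """
    (s1, d1, f1), (s2, d2, f2) = dfa1, dfa2
    seen = {(s1, s2)}
    queue = deque([(s1, s2)])
    while queue:
        p, q = queue.popleft()
        if (p in f1) != (q in f2):
            return False                      # one accepts the string leading here, one doesn't
        for a in sorted(alphabet):
            nxt = (d1.get((p, a)), d2.get((q, a)))   # None stands for the sink state
            if nxt not in seen:               # a recurring pair adds nothing new: cut here
                seen.add(nxt)
                queue.append(nxt)
    return True
```

There are only finitely many state pairs, so the walk terminates. This is the finite-steps argument above in executable form.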
The problem is that I still have a problem in the same form. To spot my termination criteria, I need to compare my pair of current automata with all the past automata pairs that occurred during the comparison so far. That's what has been giving me a headache.
It also turns out that that paper is relevant, but probably only takes me this far. Regular languages form a monoid (strictly speaking not a group, since there are no inverses) under concatenation, and the left coset is related to the head:tail things I've been considering.
The reason I'm an idiot is because I've been imposing a far too strict termination condition, and I should have known it, because it's not that unusual an issue WRT automata algorithms.
I don't need to stop at the first recurrence of an automata pair. I can continue until I find a more easily detected recurrence - one that has some structural equivalence as well as logical equivalence. So long as my derive-a-tail-automaton algorithm is sane (and especially if I minimise and do other cleanups at each step) I will not generate an infinite sequence of equivalent-but-different-looking automata pairs during the comparison. The only sources of variation in structure are the original two automata and the tail automaton algorithm, both of which are finite.
The point is that it doesn't matter that much if I compare too many lexical terms - I will still get the correct result, and while I will terminate a little later, I will still terminate in finite time.
This should mean that I can use an unreliable recurrence detection (allowing some false negatives) using a hash or ordered comparison that is sensitive to the structure of the automata. That's a simpler problem than the structure-insensitive comparison, and I think it's the key that I need.
Of course there's still the issue of performance. A linear search using a standard equivalence algorithm might be a faster approach, based on the issues involved here. Certainly I would expect this comparison to be a less efficient equivalence test than existing algorithms, as it is doing more work - lexical ordering of the non-equivalent cases. The real issue is the overall efficiency of a key-based search, and that is likely to need some headache-inducing analysis. I'm hoping that the fact that non-equivalent automata will tend to compare quickly (detecting a difference in the first few steps, like traditional string comparisons) will make this a practical approach.
Also, if I reach a point where I suspect equivalence, I could use a standard equivalence algorithm to check. If that check fails, I just continue comparing for the ordering where I left off, without needing to check for the tail language recurring - I know that I will find a difference in a finite number of steps.

If all you can do is == or !=, then I think you have to check every set member before adding another one. This is slow. (Edit: I guess you already know this, given the title of your question, even though you go on about comparison functions to directly compare two finite automata.)
I tried to do that with phylogenetic trees, and it quickly runs into performance problems. If you want to build large sets without duplicates, you need a way to transform to a canonical form. Then you can check a hash, or insert into a binary tree with the string representation as a key.
Another researcher who did come up with a way to transform a tree to a canonical rep used Patricia trees to store unique trees for duplicate-checking.

Why does backtracking make an algorithm non-deterministic?

So I've had at least two professors mention that backtracking makes an algorithm non-deterministic without giving too much explanation into why that is. I think I understand how this happens, but I have trouble putting it into words. Could somebody give me a concise explanation of the reason for this?

It's not so much the case that backtracking makes an algorithm non-deterministic.
Rather, you usually need backtracking to process a non-deterministic algorithm, since (by the definition of non-deterministic) you don't know which path to take at a particular time in your processing, but instead you must try several.

I'll just quote wikipedia:
A nondeterministic programming language is a language which can specify, at certain points in the program (called "choice points"), various alternatives for program flow. Unlike an if-then statement, the method of choice between these alternatives is not directly specified by the programmer; the program must decide at runtime between the alternatives, via some general method applied to all choice points. A programmer specifies a limited number of alternatives, but the program must later choose between them. ("Choose" is, in fact, a typical name for the nondeterministic operator.) A hierarchy of choice points may be formed, with higher-level choices leading to branches that contain lower-level choices within them.
One method of choice is embodied in backtracking systems, in which some alternatives may "fail", causing the program to backtrack and try other alternatives. If all alternatives fail at a particular choice point, then an entire branch fails, and the program will backtrack further, to an older choice point. One complication is that, because any choice is tentative and may be remade, the system must be able to restore old program states by undoing side-effects caused by partially executing a branch that eventually failed.
Out of the Nondeterministic Programming article.

Consider an algorithm for coloring a map of the world, where no color can be used on adjacent countries. The algorithm arbitrarily starts at a country and colors it an arbitrary color. It moves along, coloring countries and changing the color at each step, until, "uh oh", two adjacent countries have the same color. Now we have to backtrack and make a new color choice. We aren't making a choice the way a nondeterministic algorithm would; that's not possible on our deterministic computers. Instead, we are simulating the nondeterministic algorithm with backtracking. A nondeterministic algorithm would have made the right choice for every country.
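The map-coloring walk above, as a Python sketch. It assumes the map is given as a symmetric adjacency dictionary and that countries and colors are tried in a fixed order. The four-country map and the names `color_map` and `MAP` are made up for illustration.

```python
def color_map(neighbors, colors):
    """Backtracking coloring: try colors in order, undo the last choice on conflict.

    neighbors: dict country -> set of adjacent countries (symmetric).
    """
    countries = sorted(neighbors)
    assignment = {}

    def place(k):
        if k == len(countries):
            return True
        country = countries[k]
        for color in colors:
            if all(assignment.get(nb) != color for nb in neighbors[country]):
                assignment[country] = color
                if place(k + 1):
                    return True
                del assignment[country]       # "uh oh": back up and try the next color
        return False

    return dict(assignment) if place(0) else None


# Hypothetical map: A, B and C touch each other; D touches B and C.
MAP = {"A": {"B", "C"}, "B": {"A", "C", "D"}, "C": {"A", "B", "D"}, "D": {"B", "C"}}
```

With three colors, a valid coloring is found. With two colors, the search backtracks through every option and returns `None`, because A, B and C form a triangle.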

The running time of backtracking on a deterministic computer is exponential in the worst case: with n choices of b options each it is O(b^n), and for problems that choose among orderings it can be O(n!).
Where a non-deterministic computer could instantly guess correctly at each step, a deterministic computer has to try all possible combinations of choices.
Since it is impossible to build a non-deterministic computer, what your professor probably meant is the following:
An NP-complete problem (NP being the class of all problems that a non-deterministic computer can solve efficiently by always guessing correctly) cannot be solved efficiently on real computers; in the worst case they have to fall back on something like backtracking through exponentially many combinations.
The above statement holds if the complexity classes P (all problems that a deterministic computer can solve efficiently) and NP are not the same. This is the famous P vs. NP problem. The Clay Mathematics Institute has offered a $1 million prize for its solution, but the problem has resisted proof for many years. However, most researchers believe that P is not equal to NP.
A simple way to sum it up would be: most interesting problems that a non-deterministic computer could solve efficiently by always guessing correctly are so hard that a deterministic computer would probably have to try all possible combinations of choices, i.e. use backtracking.

Thought experiment:
1) Hidden from view, there is some distribution of electric charges; you feel a force from them, and you measure the potential field they create. Tell me exactly the positions of all the charges.
2) Take some charges and arrange them. Tell me exactly the potential field they create.
Only the second question has a unique answer. This is the non-uniqueness of the inverse problem: different charge arrangements can produce the same measured field. This situation is analogous to some of the non-deterministic algorithms you are considering. Consider also limits in mathematics that do not exist because they give different answers depending on the direction from which you approach a discontinuity.

I wrote a maze runner that uses backtracking (of course), which I'll use as an example.
You walk through the maze. When you reach a junction, you flip a coin to decide which route to follow. If you chose a dead end, trace back to the junction and take another route. If you tried them all, return to the previous junction.
This algorithm is non-deterministic, not because of the backtracking, but because of the coin flipping.
Now change the algorithm: when you reach a junction, always try the leftmost route you haven't tried yet first. If that leads to a dead end, return to the junction and again try the leftmost route you haven't tried yet.
This algorithm is deterministic. There's no chance involved, it's predictable: you'll always follow the same route in the same maze.
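The two maze runners can be compared in a Python sketch. It assumes a tree-shaped maze given as a dictionary from each junction to its routes, listed left to right. `solve_maze` records every junction it visits in `trace`, so that the exploration order can be compared. The route-ordering function is the only difference between the two variants: `leftmost_first` is deterministic, and `coin_flip` shuffles the routes with `random.sample`. All names and the example maze are illustrative.

```python
import random


def solve_maze(maze, junction, exit_, choose_order, trace):
    """Depth-first walk with backtracking; returns the junctions from here to exit_, or None."""
    trace.append(junction)
    if junction == exit_:
        return [junction]
    for nxt in choose_order(maze.get(junction, [])):
        path = solve_maze(maze, nxt, exit_, choose_order, trace)
        if path is not None:
            return [junction] + path
    return None                        # dead end: back up to the previous junction


leftmost_first = lambda routes: list(routes)                      # always the same order
coin_flip = lambda routes: random.sample(routes, len(routes))    # random order each time

# Hypothetical maze: from S, the left route A splits into two dead ends; B leads to exit E.
MAZE = {"S": ["A", "B"], "A": ["A1", "A2"], "B": ["E"]}
```

With `leftmost_first`, the trace is always S, A, A1, A2, B, E. With `coin_flip`, the trace can change from run to run, even though the path finally reported (S, B, E) is the same here, because the maze is a tree.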

If you allow backtracking, you allow infinite looping in your program, which makes it non-deterministic, since the actual path taken may always include one more loop.

Non-Deterministic Turing Machines (NDTMs) can take multiple branches in a single step. Deterministic Turing Machines (DTMs), on the other hand, follow a trial-and-error process.
You can think of DTMs as regular computers. Quantum computers are sometimes likened to NDTMs, but they are not known to solve NP-complete problems efficiently; their known speedups, such as Shor's factoring algorithm (relevant to breaking some cryptography), apply to specific problems. For an actual NDTM, however, backtracking would collapse into a linear process, since it follows every branch at once.

I like the maze analogy. Let's think of the maze, for simplicity, as a binary tree in which there is only one path out.
Now you want to try a depth first search to find the correct way out of the maze.
A non-deterministic computer would, at every branching point, duplicate (clone) itself and run the further calculations in parallel. It is as if the person in the maze cloned himself at each branching point (as in the movie The Prestige) and sent one copy into the left sub-branch of the tree and the other copy into the right sub-branch.
The computers/persons who end up at a dead end die (terminate without an answer).
Only one computer will survive (terminate with an answer), the one who gets out of the maze.
The difference between backtracking and non-determinism is the following.
In the case of backtracking there is only one computer alive at any given moment, he does the traditional maze solving trick, simply marking his path with a chalk and when he gets to a dead end he just simply backtracks to a branching point whose sub branches he did not yet explore completely, just like in a depth first search.
IN CONTRAST :
A non-deterministic computer can clone itself at every branching point and look for the way out by running parallel searches in the sub-branches.
So the backtracking algorithm simulates/emulates the cloning ability of the non-deterministic computer on a sequential/non-parallel/deterministic computer.
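The contrast can be sketched in Python on a small binary-tree maze. `clones_search` advances every "clone" one level per round. On a sequential machine, this simulation is itself a breadth-first search; the idealized non-deterministic machine is what gets those rounds for free. `backtracking_search` is the single chalk-marking walker. Both names and the tree are illustrative assumptions.

```python
def clones_search(tree, root, is_exit):
    """Simulate the cloning machine: all clones step one level per round.

    tree: dict node -> list of children. Returns (path of the surviving clone, rounds).
    """
    frontier = [[root]]                     # each entry is the path walked by one clone
    rounds = 0
    while frontier:
        for path in frontier:
            if is_exit(path[-1]):
                return path, rounds
        # clones at dead ends have no children, so they simply disappear
        frontier = [path + [child] for path in frontier for child in tree.get(path[-1], [])]
        rounds += 1
    return None, rounds


def backtracking_search(tree, node, is_exit, steps):
    """One walker, depth-first with backtracking; `steps` records every node visited."""
    steps.append(node)
    if is_exit(node):
        return [node]
    for child in tree.get(node, []):
        path = backtracking_search(tree, child, is_exit, steps)
        if path is not None:
            return [node] + path
    return None


TREE = {"r": ["a", "b"], "a": ["aa", "ab"], "b": ["ba", "bb"]}   # the exit is "bb"
```

Both searches find the same way out, r → b → bb. The clones need only 2 rounds (the depth of the tree), while the single walker visits 7 nodes before it gets out.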
