How to decide if a randomised algorithm is OK to use?

From what I understand, a randomised algorithm can give a wrong answer. For example, using the contraction algorithm to solve the graph min-cut problem, you need to run the algorithm n^2*ln(n) times so that the probability of failing to get the correct answer is at most 1/n. No matter how small the probability of failure is, the answer could be incorrect, so when is it acceptable to allow an incorrect answer?

To begin with, I think you need to differentiate between different classes of randomized algorithms:
A Monte Carlo algorithm is an algorithm which is random w.r.t. correctness. The randomized min-cut algorithm, from your question, is an example of such an algorithm.
A Las Vegas algorithm is an algorithm which is random w.r.t. running time. Randomized quicksort, for example, is such an algorithm; see the sketch below.
You seem to mean Monte Carlo algorithms in your question.
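As an illustration of the Las Vegas flavour, here is a minimal randomized quicksort in Python (my own sketch, not part of the original answer). The output is always correctly sorted; only the running time is random, with expected O(n log n) and worst case O(n^2).

import random

def randomized_quicksort(a):
    # Las Vegas: the answer is always correct; only the time varies.
    if len(a) <= 1:
        return a
    pivot = random.choice(a)  # the only source of randomness
    return (randomized_quicksort([x for x in a if x < pivot])
            + [x for x in a if x == pivot]
            + randomized_quicksort([x for x in a if x > pivot]))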
The question of whether a Monte Carlo algorithm is suitable for you probably can't be answered objectively, because it rests on something like the economic theory of utility. Given two algorithms, A and B, each invocation of A or B takes some time t and gives you a result whose correctness is c. The utility U(t, c) is a random variable, and only you can determine whether the distribution of U_A(T, C) is better or worse than U_B(T, C). Some examples, where algorithm A runs twice as fast as B but errs with probability 1e-6:
If these are preference recommendations on a website, then it might be worth it for you to have your website twice as responsive as that of a competitor, at the risk that, rarely, a client gets wrong recommendations.
If these are control systems for a nuclear reactor (to borrow from TemplateTypedef's comment), then a slight chance of failure might not be worth the time saving (e.g., you would probably be better off investing in a processor twice as fast running the slower algorithm).
The two examples above show that each of the two choices might be correct in different settings. In fact, utility theory rarely identifies sets of choices that are clearly wrong. In the introduction to the book Randomized Algorithms by Motwani and Raghavan, however, the authors do give such an example of the fallacy of avoiding Monte Carlo algorithms. The probability of a CPU malfunctioning due to cosmic radiation is some α (whose value I forget). Thus avoiding a Monte Carlo algorithm whose probability of error is much lower than α is probably simply irrational.

You'll always need to analyze the properties of the algorithm and decide if the risk of a non-optimal answer is bearable in your application. (If the answer is Boolean, then "non-optimal" is the same as "wrong.")
There are many kinds of programming problems where some answer that's close to optimal and obtained in reasonable time is much better than the optimal answer provided too late or not at all.
The Traveling Salesman problem is an example. If you are Walmart and need to plan delivery routes each night for given sets of cities, getting a route that's close to optimal is much better than no route or a naively chosen one or the best possible route obtained 2 days from now.
There are many kinds of guarantees provided by randomized algorithms. They often have the form error <= F(cost), where error and cost can be almost anything. The cost may be expressed in run time or how many repeat runs are spent looking for better answers. Space may also figure in cost. The error may be probability of a wrong 1/0 answer, a distance metric from an optimal result, a discrete count of erroneous components, etc., etc.
Sometimes you just have to live with a maybe-wrong answer because there's no useful alternative. Primality testing on big numbers is in this category. Though there are polynomial time deterministic tests, they are still much slower than a probabilistic test that produces the correct answer for all practical purposes.
For example, if you have a Boolean randomized function where False results are always correct, but True results are wrong up to 50% of the time, then you are in pretty good shape. (The Miller-Rabin primality test is actually better than this.)
Suppose you can afford to run the algorithm 40 times. If any of the runs says False, you know the answer is False. If they're all True, then the probability that the real answer is False is roughly 2^-40, or about 1 in a trillion.
Even in safety-critical applications, this may be a fine result. The chance of being hit by lightning in a lifetime is about 1/10,000. We all live with that and don't give it a second thought.
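To make the 40-run scheme concrete, here is a minimal Python sketch of the Miller-Rabin test with repeated rounds (my own illustration; the function name and default round count are arbitrary). Each extra round multiplies the one-sided error bound by at most 1/4.

import random

def is_probable_prime(n, rounds=40):
    # "False" is always correct; "True" errs with probability <= 4**-rounds.
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13):
        if n % p == 0:
            return n == p
    d, s = n - 1, 0
    while d % 2 == 0:           # write n - 1 = d * 2**s with d odd
        d //= 2
        s += 1
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False        # a witnesses that n is composite: certain
    return True                 # probably prime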

Related

About Anytime Algorithm

I recently learnt about anytime algorithms but couldn't find any good explanation of them.
Can anyone explain what an anytime algorithm is and how it works?
Traditionally, an algorithm is some process that, when followed, eventually will stop and return a result (think about something like binary search, mergesort, Dijkstra's algorithm, etc.)
An anytime algorithm is an algorithm that, rather than producing a final answer, continuously searches for better and better answers to a particular problem. The "anytime" aspect means that at any point in time, you can ask the algorithm for its current best guess.
For example, suppose that you have some mathematical function and you want to find the minimum value that the function obtains. There are many numerical algorithms that you can use to do this - gradient descent, Newton's method, etc. - that under most circumstances never truly reach the ultimate answer. Instead, they converge closer and closer to the true value. These algorithms can be made into anytime algorithms. You can just run them indefinitely, and at any point in time, you can ask the algorithm what its best guess is so far.
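As a toy sketch of this (my own example, not the answer's), gradient descent can be wrapped as a generator so the caller can stop at any time and take the current best guess:

import itertools

def gradient_descent(grad, x0, lr=0.1):
    # Anytime-style minimiser: yields the current guess after every step.
    x = x0
    while True:
        x = x - lr * grad(x)
        yield x

# Minimise f(x) = (x - 3)^2, whose gradient is 2*(x - 3).
steps = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
for x in itertools.islice(steps, 50):   # stop whenever we like
    pass
print(x)   # close to the true minimiser, 3.0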
Note that there is no one single algorithm called the "anytime algorithm." It's a class of algorithms, just in the same way that there's no one "randomized algorithm" or no one "approximation algorithm."
Hope this helps!
An anytime algorithm is a computational procedure that computes a solution to some problem and that also needs to have three technical properties.
(1) It needs to be an algorithm, meaning it is guaranteed to terminate.
(2) It needs to be stoppable at any time, and at that time it needs to provide an answer to the problem (think of this as an approximation to the ideal solution).
(3) As more time passes, the result you get from stopping the algorithm gets uniformly and continuously better (i.e. it never comes up with a worse solution, which can happen for some optimization procedures that might oscillate or occasionally restart from scratch).

How to give a mathematical proof or support my answer through reasoning in the general case?

You are managing a software project that involves building a computer-assisted instrument for medical surgery. The exact placement of the surgical knife is dependent on a number of different parameters, usually at least 25, sometimes more. Your programmer has developed two algorithms for positioning the cutting tool, and is seeking your advice about which algorithm to use:
Algorithm A has an average-case run time of n and a worst-case run time of n^4, where n is the number of input parameters. Algorithm B has an average-case run time of n(log n)^3 and a worst-case run time of n^2. Which algorithm would you favour for inclusion in the software? Justify your answer.
I think I should choose Algorithm A because in medicine we should focus on saving as many lives as possible, so the better average case seems preferable; alternatively, for the worst case we could apply some optimization, or choose Algorithm B instead.
But I am confused about how to prove mathematically that I am correct.
For all interested readers: I have marked the first answer as accepted, but to get the full picture do read the comments and the second answer as well.
This would be a hard real-time system - if a deadline is missed then a patient might die. As such, Algorithm B is the preferred algorithm because you can design the hardware around the n^2 worst case, whereas with Algorithm A the hardware must account for the n^4 worst case (the average case doesn't matter because the system is ostensibly intended to save every patient's life, not 80% of the patients' lives).
This assumes that the hardware budget is not a factor. If the hardware budget is a factor (e.g. the machines are intended to be sold to facilities that are short on funds) then Algorithm A is preferred because the hardware can be designed for the n average case - this will result in patient deaths in the n^4 case, but presumably more patients will be saved by use of the machines. However, this is getting outside the scope of computer science and more into the scope of ethics - my preferred answer would be to select Algorithm B and pay for decent hardware.
I think for starters you need to determine the time available during an average surgical procedure. Obviously, if both the O(n) and the O(n(log n)^3) average cases are significantly slower than the procedure allows, then neither algorithm will do. If the former suffices but the latter would result in lost lives or an unacceptable user experience, then you can rule the latter out. Assuming both running times meet the average demand, the next thing to examine is the worst case. The O(n) algorithm has a stiff worst-case penalty of O(n^4), while the O(n(log n)^3) algorithm runs in only O(n^2) at its worst. For every surgical procedure in your data set, you should examine how long the patient can survive before receiving the surgery, under each algorithm's worst case. If many lives would potentially be lost to an O(n^4) running time, you may rule that algorithm out.
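To put rough numbers on the trade-off, here is a hypothetical Python comparison (the question gives no constant factors or log base, so this only compares the growth terms themselves):

import math

for n in (25, 50, 100):
    print(f"n={n}: A avg={n}, A worst={n**4}; "
          f"B avg={n * math.log(n)**3:.0f}, B worst={n**2}")
# At n=25, A's worst case (390625) dwarfs B's (625), which is the crux of
# the hard real-time argument above. Note also that for such small n the
# (log n)^3 factor makes B's average-case term exceed its own worst-case
# term, a reminder that constant factors and small n can mislead here.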

Difference between a stochastic and a heuristic algorithm

Extending streetparade's question, I would like to ask what the difference is, if any, between a stochastic and a heuristic algorithm.
Would it be right to say that a stochastic algorithm is actually one type of heuristic?
TTBOMK, "stochastic algorithm" is not a standard term. "Randomized algorithm" is, however, and it's probably what is meant here.
Randomized: Uses randomness somehow. There are two flavours: Monte Carlo algorithms always finish in bounded time, but don't guarantee an optimal solution, while Las Vegas algorithms aren't necessarily guaranteed to finish in any finite time, but promise to find the optimal solution. (Usually they are also required to have a finite expected running time.) Examples of common Monte Carlo algorithms: MCMC, simulated annealing, and Miller-Rabin primality testing. Quicksort with randomized pivot choice is a Las Vegas algorithm that always finishes in finite time. An algorithm that does not use any randomness is deterministic.
Heuristic: Not guaranteed to find the correct answer. An algorithm that is not heuristic is exact.
Many heuristics are sensitive to "incidental" properties of the input that don't affect the true solution, such as the order items are considered in the First-Fit heuristic for the Bin Packing problem. In this case they can be thought of as Monte Carlo randomized algorithms: you can randomly permute the inputs and rerun them, always keeping the best answer you find. OTOH, other heuristics don't have this property -- e.g. the First-Fit-Decreasing heuristic is deterministic, since it always first sorts the items in decreasing size order.
If the set of possible outputs of a particular randomized algorithm is finite and contains the true answer, then running it long enough is "practically guaranteed" to eventually find it (in the sense that the probability of not finding it can be made arbitrarily small, but never 0). Note that it's not automatically the case that some permutation of the inputs to a heuristic will result in getting the exact answer -- in the case of First-Fit, it turns out that this is true, but this was only proven in 2009.
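As a small illustration of that permute-and-rerun idea (my own sketch; the restart count is arbitrary):

import random

def first_fit(items, capacity):
    # Number of bins used by First-Fit, in the given item order.
    bins = []                       # remaining free space per open bin
    for item in items:
        for i, free in enumerate(bins):
            if item <= free:
                bins[i] = free - item
                break
        else:
            bins.append(capacity - item)
    return len(bins)

def randomized_first_fit(items, capacity, restarts=1000):
    # Monte Carlo wrapper: shuffle the input, rerun, keep the best answer.
    items = list(items)
    best = first_fit(items, capacity)
    for _ in range(restarts):
        random.shuffle(items)
        best = min(best, first_fit(items, capacity))
    return best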
Sometimes stronger statements about convergence of randomized algorithms can be made: these are usually along the lines of "For any given small threshold d, after t steps we will be within d of the optimal solution with probability f(t, d)", with f(t, d) an increasing function of t and d.
Both approaches are usually used to speed up generate-and-test solutions to NP-complete problems.
Stochastic algorithms use randomness
They try combinations not in order but randomly, sampling from the whole range of possibilities in the hope of hitting the solution sooner. Implementation is easy and fast, and a single iteration is also fast (often constant time).
Heuristic algorithms
They pick combinations not randomly but based on some knowledge of the process, the input dataset, or the intended usage. They thereby significantly reduce the number of combinations to only those that are likely to be solutions, and typically try all of those until a solution is found.
Implementation complexity depends on the problem, and a single iteration is usually much slower than in the stochastic approach, so a heuristic pays off only if it reduces the number of possibilities enough for an actual speedup to be visible: even though the algorithmic complexity with a heuristic is usually much lower, the per-iteration constant can sometimes be big enough to slow things down in practice (in runtime terms).
Both approaches can also be combined, as the sketch below illustrates.
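For instance (my own sketch, with hypothetical neighbors, score, and random_state callables), random restarts supply the stochastic part while a greedy, score-guided walk supplies the heuristic part:

def random_restart_hill_climb(neighbors, score, random_state, restarts=100):
    # Stochastic part: restart from random states; heuristic part: always
    # move to the best-scoring neighbour until no neighbour improves.
    best = None
    for _ in range(restarts):
        state = random_state()
        while True:
            nxt = max(neighbors(state), key=score, default=None)
            if nxt is None or score(nxt) <= score(state):
                break
            state = nxt
        if best is None or score(state) > score(best):
            best = state
    return best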

How to compute exact complexity of an algorithm?

Without resorting to asymptotic notation, is tedious step counting the only way to get the time complexity of an algorithm? And without a step count for each line of code, can we arrive at a big-O representation of any program?
Details: I am trying to find the complexity of several numerical-analysis algorithms in order to decide which is best suited for solving a particular problem.
E.g., choosing between the Regula Falsi and Newton-Raphson methods for solving equations, the intention is to evaluate the exact complexity of each method and then decide (plugging in the value of 'n' or whatever arguments there are) which method is less complex.
The only way --- not the "easy" or hard way but the only reasonable way --- to find the exact complexity of a complicated algorithm is to profile it. A modern implementation of an algorithm has a complex interaction with numerical libraries and with the CPU and its floating point unit. For instance in-cache memory access is much faster than out-of-cache memory access, and on top of that there may be more than one level of cache. Counting steps is really much more suitable to the asymptotic complexity that you say isn't enough for your purpose.
But, if you did want to count steps automatically, there are also ways to do that. You can add a counter increment command (like "bloof++;" in C) to every line of code, and then display the value at the end.
You should also know about the more refined time complexity expression, f(n)*(1+o(1)), that is also useful for analytical calculations. For instance n^2+2*n+7 simplifies to n^2*(1+o(1)). If the constant factor is what bothers you about usual asymptotic notation O(f(n)), this refinement is a way to keep track of it and still throw out negligible terms.
The 'easy way' is to simulate it. Try your algorithms with lots of values of n and lots of different data, plot the results, then match the curve on the graph to an equation.
Your results may not be strictly correct, and they're only as valid as your ability to generate good test data, but for most cases this will work.
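A minimal version of that experiment (my own sketch): time the algorithm at a few sizes and read off the log-log slope, which approximates the exponent k in O(n^k).

import math
import time

def empirical_order(f, sizes):
    # Slope of log(time) vs log(n) between consecutive sizes ~ exponent k.
    times = []
    for n in sizes:
        start = time.perf_counter()
        f(n)
        times.append(time.perf_counter() - start)
    for (n1, t1), (n2, t2) in zip(zip(sizes, times),
                                  zip(sizes[1:], times[1:])):
        print(f"n={n1}->{n2}: slope ~ {math.log(t2/t1) / math.log(n2/n1):.2f}")

# A quadratic-time toy workload should report slopes near 2.
empirical_order(lambda n: sum(i * j for i in range(n) for j in range(n)),
                [200, 400, 800, 1600])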
E.g., choosing between the Regula Falsi and Newton-Raphson methods for solving equations, the intention is to evaluate the exact complexity of each method and then decide (plugging in the value of 'n' or whatever arguments there are) which method is less complex.
I don't think it's possible to answer this question in general for nonlinear solvers. You could count the exact number of computations per iteration, but you're never going to know in general how many iterations each solver will take to converge. There are other complications, like needing the Jacobian for Newton's method, which could make computing the complexity even more difficult.
To sum up, the most efficient nonlinear solver is always dependent on the problem you're solving. If the variety of problems you're solving is very limited, doing a bunch of experiments with different solvers and measuring the number of iterations and CPU time will probably give you more useful information.

What is the difference between a heuristic and an algorithm?

What is the difference between a heuristic and an algorithm?
An algorithm is the description of an automated solution to a problem. What the algorithm does is precisely defined. The solution could or could not be the best possible one but you know from the start what kind of result you will get. You implement the algorithm using some programming language to get (a part of) a program.
Now, some problems are hard and you may not be able to get an acceptable solution in an acceptable time. In such cases you often can get a not too bad solution much faster, by applying some arbitrary choices (educated guesses): that's a heuristic.
A heuristic is still a kind of algorithm, but one that will not explore all possible states of the problem, or one that will begin by exploring the most likely ones.
Typical examples are from games. When writing a chess game program you could imagine trying every possible move at some depth level and applying some evaluation function to the board. A heuristic would exclude full branches that begin with obviously bad moves.
In some cases you're not searching for the best solution, but for any solution fitting some constraint. A good heuristic would help to find a solution in a short time, but may also fail to find any if the only solutions are in the states it chose not to try.
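One way to make the pruning idea concrete (my own generic sketch, with hypothetical successors and heuristic callables) is a beam search: at each depth, whole branches that look bad to a cheap evaluation function are never explored.

def beam_search(state, depth, successors, heuristic, beam=5):
    # Keep only the `beam` best-looking states at each level, as judged by
    # the heuristic; everything else is pruned without being examined.
    frontier = [state]
    for _ in range(depth):
        children = [c for s in frontier for c in successors(s)]
        if not children:
            break
        frontier = sorted(children, key=heuristic, reverse=True)[:beam]
    return max(frontier, key=heuristic)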
An algorithm is typically deterministic and proven to yield an optimal result.
A heuristic has no proof of correctness, often involves random elements, and may not yield optimal results.
Many problems for which no efficient algorithm to find an optimal solution is known have heuristic approaches that yield near-optimal results very quickly.
There are some overlaps: "genetic algorithms" is an accepted term, but strictly speaking, those are heuristics, not algorithms.
Heuristic, in a nutshell, is an "educated guess". Wikipedia explains it nicely. In the end, a generally accepted method is taken as the optimal solution to the specified problem.
Heuristic is an adjective for experience-based techniques that help in problem solving, learning and discovery. A heuristic method is used to rapidly come to a solution that is hoped to be close to the best possible answer, or 'optimal solution'. Heuristics are "rules of thumb", educated guesses, intuitive judgments or simply common sense. A heuristic is a general way of solving a problem. Heuristics as a noun is another name for heuristic methods.
In more precise terms, heuristics stand for strategies using readily accessible, though loosely applicable, information to control problem solving in human beings and machines.
An algorithm, by contrast, is a method containing a finite set of instructions used to solve a problem. The method has been proven mathematically or scientifically to work for the problem. There are formal methods and proofs.
A heuristic algorithm is an algorithm that is able to produce an acceptable solution to a problem in many practical scenarios, in the fashion of a general heuristic, but for which there is no formal proof of its correctness.
An algorithm is a self-contained step-by-step set of operations to be performed, typically interpreted as a finite sequence of (computer or human) instructions to determine a solution to a problem such as: is there a path from A to B, or what is the shortest path between A and B? In the latter case, you could also be satisfied with a 'reasonably close' alternative solution.
There are certain categories of algorithms, of which the heuristic algorithm is one. Depending on the (proven) properties of the algorithm in this case, it falls into one of these three categories (note 1):
Exact: the solution is proven to be an optimal (or exact) solution to the input problem
Approximation: the deviation of the solution value is proven to be never further away from the optimal value than some pre-defined bound (for example, never more than 50% larger than the optimal value)
Heuristic: the algorithm has not been proven to be optimal, nor within a pre-defined bound of the optimal solution
Notice that an approximation algorithm is also a heuristic, but with the stronger property that there is a proven bound to the solution (value) it outputs.
For some problems, no one has ever found an 'efficient' algorithm to compute the optimal solutions (note 2). One of those problems is the well-known Traveling Salesman Problem. Christofides' algorithm for the Traveling Salesman Problem, for example, used to be called a heuristic, as it was not proven that it was within 50% of the optimal solution. Since that has been proven, however, Christofides' algorithm is more accurately referred to as an approximation algorithm.
Due to restrictions on what computers can do, it is not always possible to efficiently find the best solution possible. If there is enough structure in a problem, there may be an efficient way to traverse the solution space, even though the solution space is huge (as in the shortest-path problem).
Heuristics are typically applied to improve the running time of algorithms, by adding 'expert information' or 'educated guesses' to guide the search direction. In practice, a heuristic may also be a sub-routine for an optimal algorithm, to determine where to look first.
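A classic instance of a heuristic steering an exact algorithm is A* search; here is a minimal sketch (my own illustration, with hypothetical neighbors and h callables). With a consistent (monotone) heuristic h, the first path returned is still optimal, so the heuristic only decides where to look first.

import heapq
import itertools

def a_star(start, goal, neighbors, h):
    tie = itertools.count()        # tie-breaker so the heap never compares nodes
    frontier = [(h(start), next(tie), 0, start, [start])]
    best_g = {}
    while frontier:
        _, _, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return path, g         # first pop of the goal is optimal
        if best_g.get(node, float("inf")) <= g:
            continue               # already expanded via a cheaper path
        best_g[node] = g
        for nxt, cost in neighbors(node):
            heapq.heappush(frontier, (g + cost + h(nxt), next(tie),
                                      g + cost, nxt, path + [nxt]))
    return None, float("inf")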
(note 1): Additionally, algorithms are characterised by whether they include random or non-deterministic elements. An algorithm that always executes the same way and produces the same answer is called deterministic.
(note 2): This is called the P vs NP problem, and problems that are classified as NP-complete and NP-hard are unlikely to have an 'efficient' algorithm. Note: as @Kriss mentioned in the comments, there are even 'worse' types of problems, which may need exponential time or space to compute.
There are several answers that answer part of the question. I deemed them less complete and not accurate enough, and decided not to edit the accepted answer made by @Kriss.
Actually I don't think that there is a lot in common between them. Some algorithms use heuristics in their logic (often to make fewer calculations or to get results faster). Usually heuristics are used in so-called greedy algorithms.
A heuristic is some "knowledge" that we assume is good to use in order to make the best choice in our algorithm (when a choice must be made). For example... a heuristic in chess could be: always take the opponent's queen if you can, since you know it is the strongest piece. Heuristics do not guarantee that they will lead you to the correct answer, but (if the assumption is correct) they often get an answer close to the best in much shorter time.
An algorithm is a clearly defined set of instructions for solving a problem; heuristics involve using an approach of learning and discovery to reach a solution.
So, if you know how to solve a problem, use an algorithm. If you need to develop a solution, then it's heuristics.
Heuristics are algorithms, so in that sense there is no difference; however, heuristics take a 'guess' approach to problem solving, yielding a 'good enough' answer rather than finding the 'best possible' solution.
A good example is where you have a very hard (read: NP-complete) problem you want a solution for but don't have the time to arrive at it, so you have to settle for a good-enough solution from a heuristic algorithm, such as finding a solution to the travelling salesman problem using a genetic algorithm.
An algorithm is a sequence of operations that, given an input, computes something (a function) and outputs a result.
An algorithm may yield exact or approximate values.
It may also compute a random value that is, with high probability, close to the exact value (see the sketch below).
A heuristic algorithm uses some insight into the input values and computes a value that is not exact (but may be close to optimal).
In some special cases, a heuristic can find the exact solution.
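As a tiny sketch of the "random value close to the exact value" idea (my own example, not the answer's), a Monte Carlo estimate of pi:

import random

def monte_carlo_pi(samples=1_000_000):
    # The fraction of random points in the unit square that land inside
    # the quarter circle converges to pi/4 with high probability.
    inside = sum(random.random() ** 2 + random.random() ** 2 <= 1.0
                 for _ in range(samples))
    return 4 * inside / samples

print(monte_carlo_pi())   # close to 3.14159 with high probability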
A heuristic is usually an optimization or a strategy that provides a good-enough answer, but not always, and rarely the best answer. For example, if you were to solve the traveling salesman problem with brute force, discarding a partial solution once its cost exceeds that of the current best solution is a heuristic (sketched below): sometimes it helps, other times it doesn't, and it definitely doesn't improve the theoretical (big-O) run time of the algorithm.
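Here is a minimal sketch of that pruning idea (my own illustration; dist is assumed to be a square matrix of pairwise distances):

def tsp_branch_and_bound(dist):
    # Brute force over tours, but abandon a partial tour as soon as its
    # cost already matches or exceeds the best complete tour seen so far.
    n = len(dist)
    best = float("inf")

    def extend(path, cost, remaining):
        nonlocal best
        if cost >= best:
            return                  # prune: cannot beat the current best
        if not remaining:
            best = min(best, cost + dist[path[-1]][0])   # close the tour
            return
        for city in remaining:
            extend(path + [city], cost + dist[path[-1]][city],
                   remaining - {city})

    extend([0], 0, set(range(1, n)))
    return best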
I think a heuristic is more of a constraint used in learning-based models in artificial intelligence, since the future solution states are difficult to predict.
But then my doubt after reading the above answers is:
"How can heuristics be successfully applied using stochastic optimization techniques? Or can they function as full-fledged algorithms when used with stochastic optimization?"
http://en.wikipedia.org/wiki/Stochastic_optimization
One of the best explanations I have read comes from the great book Code Complete, which I now quote:
A heuristic is a technique that helps you look for an answer. Its results are subject to chance because a heuristic tells you only how to look, not what to find. It doesn't tell you how to get directly from point A to point B; it might not even know where point A and point B are. In effect, a heuristic is an algorithm in a clown suit. It's less predictable, it's more fun, and it comes without a 30-day, money-back guarantee.
Here is an algorithm for driving to someone's house: Take Highway 167 south to Puyallup. Take the South Hill Mall exit and drive 4.5 miles up the hill. Turn right at the light by the grocery store, and then take the first left. Turn into the driveway of the large tan house on the left, at 714 North Cedar.
Here's a heuristic for getting to someone's house: Find the last letter we mailed you. Drive to the town in the return address. When you get to town, ask someone where our house is. Everyone knows us—someone will be glad to help you. If you can't find anyone, call us from a public phone, and we'll come get you.
The difference between an algorithm and a heuristic is subtle, and the two terms overlap somewhat. For the purposes of this book, the main difference between the two is the level of indirection from the solution. An algorithm gives you the instructions directly. A heuristic tells you how to discover the instructions for yourself, or at least where to look for them.
Heuristics find a solution suboptimally, without any guarantee as to the quality of the solution found; it therefore only makes sense to develop heuristics that run in polynomial time. These methods are suited to real-world problems, or to problems so awkward from a computational point of view that there is not even an algorithm capable of finding an approximate solution in polynomial time.
