Detecting infinite loop in brainfuck program - algorithm

I have written a simple brainfuck interpreter in the MATLAB scripting language. It is fed random bf programs to execute (as part of a genetic algorithm project). The problem I face is that the program turns out to have an infinite loop in a sizeable number of cases, and hence the GA gets stuck at that point.
So, I need a mechanism to detect infinite loops and avoid executing that code in bf.
One obvious (trivial) case is when I have
[]
I can detect this and refuse to run that program.
For the non-trivial cases, I figured out that the basic idea is: to determine how one iteration of the loop changes the current cell. If the change is negative, we're eventually going to reach 0, so it's a finite loop. Otherwise, if the change is non-negative, it's an infinite loop.
Implementing this is easy for the case of a single loop, but with nested loops it becomes very complicated. For example, (in what follows (1) refers to contents of cell 1, etc. )
++++ Put 4 in 1st cell (1)
>+++ Put 3 in (2)
<[ While( (1) is non zero)
-- Decrease (1) by 2
>[ While( (2) is non zero)
- Decrement (2)
<+ Increment (1)
>]
(2) would be 0 at this point
+++ Increase (2) by 3 making (2) = 3
<] (1) was decreased by 2 and then increased by 3, so net effect is increment
and hence the code runs on and on. A naive check of the number of +'s and -'s done on cell 1, however, would say the number of -'s is more, so would not detect the infinite loop.
Can anyone think of a good algorithm to detect infinite loops, given arbitrary nesting of arbitrary number of loops in bf?
EDIT: I do know that the halting problem is unsolvable in general, but I was not sure whether there did not exist special case exceptions. Like, maybe Matlab might function as a Super Turing machine able to determine the halting of the bf program. I might be horribly wrong, but if so, I would like to know exactly how and why.
SECOND EDIT: I have written what I purport to be infinite loop detector. It probably misses some edge cases (or less probably, somehow escapes Mr. Turing's clutches), but seems to work for me as of now.
In pseudocode form, here it goes:
subroutine bfexec(bfprogram)
begin
    Looping through the bfprogram,
        If(current character is '[')
            Find the corresponding ']'
            Store the code between the two brackets in, say, 'subprog'
            Save the value of the current cell in oldval
            Call bfexec recursively with subprog
            Save the value of the current cell in newval
            If(newval >= oldval)
                Raise an 'infinite loop' error and exit
            EndIf
        Else
            /* Do other characters' processing */
        EndIf
    EndLoop
end

Alan Turing would like to have a word with you.
http://en.wikipedia.org/wiki/Halting_problem

When I used linear genetic programming, I just used an upper bound for the number of instructions a single program was allowed to do in its lifetime. I think that this is sensible in two ways: I cannot really solve the halting problem anyway, and programs that take too long to compute are not worthy of getting more time anyway.

Let's say you did write a program that could detect whether this program would run in an infinite loop. Let's say for the sake of simplicity that this program was written in brainfuck to analyze brainfuck programs (though this is not a precondition of the following proof, because any language can emulate brainfuck and brainfuck can emulate any language).
Now let's say you extend the checker program to make a new program. This new program exits immediately when its input loops indefinitely, and loops forever when its input exits at some point.
If you input this new program into itself, what will the results be?
If this program loops forever when run, then by its own definition it should exit immediately when run with itself as input. And vice versa. The checker program cannot possibly exist, because its very existence implies a contradiction.
As has been mentioned before, you are essentially restating the famous halting problem:
http://en.wikipedia.org/wiki/Halting_problem
Ed. I want to make clear that the above disproof is not my own, but is essentially the famous disproof Alan Turing gave back in 1936.

State in bf is a single array of chars.
If I were you, I'd take a hash of the bf interpreter state on every "]" (or once in rand(1, 100) "]"s*) and assert that the set of hashes is unique.
The second (or more) time I see a certain hash, I save the whole state aside.
The third (or more) time I see a certain hash, I compare the whole state to the saved one(s) and if there's a match, I quit.
On every input command (',') I reset my saved states and list of hashes.
An optimization is to only hash the part of state that was touched.
I haven't solved the halting problem - I'm detecting infinite loops while running the program.
*The rand is to make the check independent of loop period
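A minimal Python sketch of this idea follows. It is not the answerer's code: it assumes wrapping 8-bit cells, a small circular tape, balanced brackets, and a program with no input, and it stores the full state in a set (Python hashes the tuple and compares on collision) instead of managing hashes by hand. The name `detect_repeat` and its parameters are hypothetical.

```python
def detect_repeat(program, tape_len=30, max_steps=1_000_000):
    """Return 'halts', 'loops', or 'unknown' for a brainfuck program.

    Assumptions of this sketch: 8-bit wrapping cells, a circular tape of
    tape_len cells, balanced brackets, and no input ('.' and ',' are ignored).
    """
    # Pre-compute matching bracket positions.
    stack, jumps = [], {}
    for i, ch in enumerate(program):
        if ch == "[":
            stack.append(i)
        elif ch == "]":
            j = stack.pop()
            jumps[i], jumps[j] = j, i

    tape = [0] * tape_len
    ptr = ip = 0
    seen = set()  # full states observed at a ']' that jumps back
    for _ in range(max_steps):
        if ip >= len(program):
            return "halts"
        ch = program[ip]
        if ch == "+":
            tape[ptr] = (tape[ptr] + 1) % 256
        elif ch == "-":
            tape[ptr] = (tape[ptr] - 1) % 256
        elif ch == ">":
            ptr = (ptr + 1) % tape_len
        elif ch == "<":
            ptr = (ptr - 1) % tape_len
        elif ch == "[" and tape[ptr] == 0:
            ip = jumps[ip]
        elif ch == "]" and tape[ptr] != 0:
            state = (ip, ptr, tuple(tape))
            if state in seen:
                return "loops"  # identical state seen before: it will repeat forever
            seen.add(state)
            ip = jumps[ip]
        ip += 1
    return "unknown"  # budget exhausted without halting or repeating
```

For example, `detect_repeat("+[]")` reports `'loops'` on the second visit to the `]`, while `detect_repeat("+++[-]")` reports `'halts'`. A program whose state keeps changing (for instance with unbounded cells) is never caught this way, which is why the sketch still carries a step budget.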

Infinite loops cannot be detected in general, but you can detect if the program is taking too much time.
Implement a timeout by incrementing a counter every time you run a command (e.g. <, >, +, -). When the counter reaches some large number, which you set by observation, you can say that the program takes a very long time to execute. For your purpose, treating "very long" as infinite is a good-enough approximation.
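As a rough illustration of the counter approach, here is a small Python interpreter with a step budget. It is only a sketch: `run_bf`, `build_jump_table`, and the parameters `max_steps`, `tape_len` and `cell_modulus` are names made up for this example, I/O commands are ignored, and the choice between unbounded and wrapping cells is an assumption you would match to your own interpreter.

```python
def build_jump_table(program):
    """Map each bracket index to its partner; raise ValueError if unbalanced."""
    stack, jumps = [], {}
    for i, ch in enumerate(program):
        if ch == "[":
            stack.append(i)
        elif ch == "]":
            if not stack:
                raise ValueError("unmatched ']'")
            j = stack.pop()
            jumps[i], jumps[j] = j, i
    if stack:
        raise ValueError("unmatched '['")
    return jumps


def run_bf(program, max_steps=100_000, tape_len=1000, cell_modulus=None):
    """Run `program` and return (halted, steps, tape).

    halted is False when the step budget ran out before the program ended.
    cell_modulus=None means unbounded integer cells (an assumption of this
    sketch); pass 256 for wrapping 8-bit cells. '.' and ',' are ignored.
    """
    jumps = build_jump_table(program)
    tape = [0] * tape_len
    ptr = ip = steps = 0
    while ip < len(program):
        if steps >= max_steps:
            return False, steps, tape  # give up: treat as "runs too long"
        ch = program[ip]
        if ch == "+":
            tape[ptr] += 1
        elif ch == "-":
            tape[ptr] -= 1
        elif ch == ">":
            ptr = (ptr + 1) % tape_len
        elif ch == "<":
            ptr = (ptr - 1) % tape_len
        elif ch == "[" and tape[ptr] == 0:
            ip = jumps[ip]  # skip the loop body
        elif ch == "]" and tape[ptr] != 0:
            ip = jumps[ip]  # go back to the matching '['
        if cell_modulus is not None:
            tape[ptr] %= cell_modulus
        ip += 1
        steps += 1
    return True, steps, tape


# The nested-loop example from the question, with comments stripped.
halted, steps, _ = run_bf("++++>+++<[-->[-<+>]+++<]", max_steps=50_000)
print(halted, steps)
```

With unbounded cells the question's example exhausts the budget, so a GA can simply assign it a poor fitness. With `cell_modulus=256` the same program does stop, because cell 1 eventually wraps around to 0, so the cell model matters when deciding what counts as "infinite".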

As already mentioned, this is the halting problem.
But in your case there might be a solution: the halting problem is stated for Turing machines, which have unlimited memory.
If you know that you have an upper limit on memory (e.g. you know you don't use more than 10 memory cells), you can execute your program and stop it after a bounded number of steps. The idea is that the space used by a computation bounds its running time (as you can't write more than one cell per step). Once you have executed as many steps as there are distinct memory configurations, some configuration must have repeated, so you can break. E.g. if you have 3 cells with 256 possible values each, you can have at most 256^3 different tape contents, and so you can stop after executing that many steps. But be careful, there are implicit cells, like the instruction pointer and the registers, which also count as part of the state. You can stop even sooner if you save every state configuration: as soon as you detect one that you have already seen, you have an infinite loop. This approach is definitely much better in running time, but it needs much more space (here it might be suitable to hash the configurations).
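To make the counting concrete: each of the cells independently holds one of 256 values, so 3 cells give 256^3 tape contents, and the data pointer and instruction pointer multiply that further. The helper below is a hypothetical illustration of that arithmetic, not part of any interpreter.

```python
def configuration_bound(cells, values_per_cell=256, program_length=1):
    """Upper bound on distinct machine states.

    Counts tape contents (values_per_cell ** cells), times the possible
    data-pointer positions (cells), times the possible instruction-pointer
    positions (program_length).
    """
    return values_per_cell ** cells * cells * program_length


print(256 ** 3)                      # tape contents alone for 3 cells
print(configuration_bound(3))        # including the data pointer
print(len(str(configuration_bound(1000))), "decimal digits for 1000 cells")
```

A deterministic program that runs for more steps than this bound must have revisited a state and therefore loops forever. The bound grows so fast that it is only usable as a step limit for very small tapes.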

This is not the halting problem; however, it is still not reasonable to try to detect halting even in a machine as limited as a 1000-cell BF machine.
Consider this program:
+[->[>]+<[-<]+]
This program will not repeat a state until it has filled up all of memory, which for just 1000 cells will take about 10^300 years.

If I remember correctly, the halting problem proof was only true for some extreme case that involved self reference. However it's still trivial to show a practical example of why you can't make an infinite loop detector.
Consider Fermat's Last Theorem. It's easy to create a program that iterates through every number (or in this case 3 numbers), and detects if it's a counterexample to the theorem. If so it halts, otherwise it continues.
So if you have an infinite loop detector, it should be able to prove this theorem, and many many others (perhaps all others, if they can be reduced to searching for counterexamples.)
In general, any program that involves iterating through numbers and only stopping under some condition, would require a general theorem prover to prove if that condition can ever be met. And that's the simplest case of looping there is.
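A sketch of such a counterexample search, in Python, might look like this. The function name `fermat_search` and its `limit` parameter are invented for the example; the limit exists only so the sketch can be run and stopped.

```python
from itertools import count


def fermat_search(limit=None):
    """Look for a, b, c >= 1 and n >= 3 with a**n + b**n == c**n.

    Candidates are enumerated by increasing a + b + c + n so that every
    tuple is eventually reached. With limit=None the search is unbounded;
    otherwise it gives up once the sum exceeds `limit` and returns None.
    """
    totals = count(6) if limit is None else range(6, limit + 1)
    for total in totals:
        for n in range(3, total - 2):
            for a in range(1, total - n - 1):
                for b in range(a, total - n - a):
                    c = total - n - a - b  # always >= 1 by construction
                    if a ** n + b ** n == c ** n:
                        return (a, b, c, n)  # counterexample found: halt
    return None


print(fermat_search(limit=40))
```

Deciding whether `fermat_search()` with no limit ever returns is exactly the question of whether the theorem is false. Since the theorem has been proved, the unbounded call never returns, but a loop detector would have had to establish that on its own.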

Off the top of my head (and I could be wrong), I would think it would be a little bit difficult to detect whether or not a program has an infinite loop without actually executing the program itself.
As the conditional execution of portions of the program depends on the execution state of the program, it will be difficult to know the particular state of the program without actually executing the program.
If you don't require that a program with an infinite loop be executed, you could try having an "instructions executed" counter, and only execute a finite number of instructions. This way, if a program does have an infinite loop, the interpreter can terminate the program which is stuck in an infinite loop.

Related

Prove that we can decide whether a Turing machine takes at least 100 steps on some input

We know that the problem “Does this Turing machine take at least this finite number of steps on that input?” is decidable, because it will always answer yes or no, where it will say yes if the machine reaches the given number of steps and no if it halts before that.
Now here is my doubt: if it halts before reaching that many steps, i.e. the input either (1) got accepted or (2) got rejected, or maybe (3) it doesn't halt but rather goes into an infinite loop, then, when we are in case (3), how can we be sure that it will always be in that loop?
What I mean to say is that if it doesn't run forever but comes out of the loop at some point of time then it might cross the asked number of steps and the decision can be made now which was earlier not possible. If so, then how can we conclude that it's decidable when we know that being stuck in a loop we won’t be able to say anything about the outcome?
(I already more or less answered your question when I edited it.)
The thing is, the decision system (a Turing machine, an algorithm or any other equivalent formalism) that takes as inputs a Turing machine M, a number N and a value X, and returns yes or no, has total control over how it executes M on X. It simulates it step by step. So it can run one step of M(X), increment an instruction counter, compare it to N and, as soon as the given number of steps is reached, it stops and returns yes. At that point, there is no need that the simulated machine M be in a final state, and actually the full computation M(X) could very well diverge. We don’t care, because we only run the first N steps.
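The decision procedure described above can be sketched in Python as follows. Here the simulated machine is modelled, purely as an assumption for illustration, by an iterator that yields once per step and stops when the machine halts; `takes_at_least`, `forever` and `halts_after` are hypothetical names.

```python
def takes_at_least(steps, n):
    """Decide whether a computation performs at least n steps.

    `steps` is any iterator that yields once per simulated step and is
    exhausted when the simulated machine halts.
    """
    if n <= 0:
        return True
    executed = 0
    for _ in steps:
        executed += 1
        if executed >= n:
            return True  # budget reached; what happens later is irrelevant
    return False  # the machine halted before n steps


def forever():
    """A machine that never halts."""
    while True:
        yield


def halts_after(k):
    """A machine that halts after exactly k steps."""
    for _ in range(k):
        yield


print(takes_at_least(forever(), 100))       # diverging machine, still decided
print(takes_at_least(halts_after(5), 100))  # halts early
```

The decider always terminates after at most n simulated steps, even when the simulated machine would run forever, which is why the problem is decidable.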
Most likely the conditional structures were not debugged or developed enough, so that multiple conditions often conflicted with each other. The error reporting was not very definitive, so semi-abstract notions such as "decidable" and "undecidable" were used.
As a rough example, years ago I wrote a "64-bit ROM memory" simulator in VBS. I tried to manage the memory cells, to which I/O read/write locations were attributed, using many formulas and conditions for the conversions from decimal to binary and for all the operations, indexing, etc.
I also ran into bugs because the conditions were not perfect. Why? Because the program had some unresolved, somewhat arbitrary results that could have ended up in:
print.debug "decidable"
On Error Resume h
h:
print.debug "undecidable"
This was an example with a clear scope and a debatable result.
To return to your question: "so how do we conclude that it's decidable??"
Wikipedia:
The Turing machine was invented in 1936 by Alan Turing, who called it an "a-machine" (automatic machine). With this model, Turing was able to answer two questions in the negative:
Does a machine exist that can determine whether any arbitrary machine on its tape is "circular" (e.g., freezes, or fails to continue its computational task)?
Does a machine exist that can determine whether any arbitrary machine on its tape ever prints a given symbol?
Thus by providing a mathematical description of a very simple device capable of arbitrary computations, he was able to prove properties of computation in general, and in particular the uncomputability of the Entscheidungsproblem ('decision problem').

Looping in vhdl

I am writing code for the RSA algorithm. I need to use a loop for it to work, but the loop doesn't have a definite bound, so it is not synthesizable. Are there any other methods for looping? Please help.
The kind of loop is irrelevant. You cannot synthesise a variable amount of hardware. However, for a loop to be synthesisable, it must have a definite upper bound - a maximum number of iterations must be clear to the synthesiser. It is allowed to exit a loop early.
I would recommend you stick to for loops for synthesis. This will make your code more portable.
You don't need to use a loop, but perhaps you feel it's most convenient?
If you are using the loop to define how much hardware gets built, you need to include all the possible hardware (so give the loop a high bound) and then use some logic to take the output you require from the right place in the hardware, to emulate the case of the loop "exiting early".
Alternatively, if you are emulating a software loop in a state machine, then you can keep track of iterations, or a flag for "complete", in a state variable, and use that to move onto the next state when you have performed enough computation.

How to recognize variables that don't affect the output of a program?

Sometimes the value of a variable accessed within the control-flow of a program cannot possibly have any effect on its output. For example:
global var_1
global var_2
start program hello(var_3, var_4)
if (var_2 < 0) then
save-log-to-disk (var_1, var_3, var_4)
end-if
return ("Hello " + var_3 + ", my name is " + var_1)
end program
Here only var_1 and var_3 have any influence on the output, while var_2 and var_4 are only used for side effects.
Do variables such as var_1 and var_3 have a name in dataflow-theory/compiler-theory?
Which static dataflow analysis techniques can be used to discover them?
References to academic literature on the subject would be particularly appreciated.
The problem that you stated is undecidable in general,
even for the following very narrow special case:
Given a single routine P(x), where x is a parameter of type integer. Is the output of P(x) independent of the value of x, i.e., does
P(0) = P(1) = P(2) = ...?
We can reduce the following still undecidable version of the halting problem to the question above: Given a Turing machine M(), does the program never stop on the empty input?
I assume that we use a (Turing-complete) language in which we can build a "Turing machine simulator":
Given the program M(), construct this routine:
P(x):
if x == 0:
return 0
Run M() for x steps
if M() has terminated then:
return 1
else:
return 0
Now:
P(0) = P(1) = P(2) = ...
=>
M() does not terminate.
M() does terminate
=> P(x) = 1 for a sufficiently large x
=> P(x) != P(0) = 0
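The construction of P from M can be sketched in Python. As an assumption for illustration, M is modelled by a zero-argument function returning an iterator that yields once per step and is exhausted when M halts; `make_P`, `never_halts` and `halts_after_five` are hypothetical names.

```python
def make_P(machine):
    """Build the routine P(x) from the reduction for a given machine M.

    `machine` is a zero-argument callable returning a fresh iterator that
    yields once per step of M and stops when M halts.
    """
    def P(x):
        if x == 0:
            return 0
        run = machine()
        for _ in range(x):  # run M for at most x steps
            try:
                next(run)
            except StopIteration:
                return 1    # M terminated within the budget
        return 0            # M still running after x steps
    return P


def never_halts():
    while True:
        yield


def halts_after_five():
    for _ in range(5):
        yield


P_loop = make_P(never_halts)
P_halt = make_P(halts_after_five)
print([P_loop(x) for x in range(10)])  # constant: output independent of x
print([P_halt(x) for x in range(10)])  # changes once x is large enough
```

Checking a few values of x, as the prints do, proves nothing in general: deciding that P is constant for every x is the same as deciding that M never halts.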
So, it is very difficult for a compiler to decide whether a variable actually does not influence the return value of a routine; in your example, the "side effect routine" might manipulate one of its values (or even loop infinitely, which would most definitely change the return value of the routine ;-)
Of course overapproximations are still possible. For example, one might conclude that a variable does not influence the return value if it does not appear in the routine body at all. You can also see some classical compiler analyses (like Expression Simplification, Constant propagation) having the side effect of eliminating appearances of such redundant variables.
Pachelbel has discussed the fact that you cannot do this perfectly. OK, I'm an engineer, I'm willing to accept some dirt in my answer.
The classic way to answer your question is to do dataflow tracing from program outputs back to program inputs. A dataflow is the connection of a program assignment (or side effect) to a variable value, to a place in the application that consumes that value.
If there is a (transitive) dataflow from a program output that you care about (in your example, the printed text stream) to an input you supplied (var_2), then that input "affects" the output. A variable that does not flow from the input to your desired output is useless from your point of view.
If you focus your attention only on the computations involved in the dataflows, and display them, you get what is generally called a "program slice". There are (very few) commercial tools that can show this to you.
Grammatech has a good reputation here for C and C++.
There are standard compiler algorithms for constructing such dataflow graphs; see any competent compiler book.
They all suffer from some limitation due to Turing's impossibility proofs as pointed out by Pachelbel. When you implement such a dataflow algorithm, there will be places that it cannot know the right answer; simply pick one.
If your algorithm chooses to answer "there is no dataflow" in certain places where it is not sure, then it may miss a valid dataflow and might incorrectly report that a variable does not affect the answer. (This is called a "false negative".) This occasional error may be satisfactory if the algorithm has some other nice properties, e.g., it runs really fast on millions of lines of code. (The trivial algorithm simply says "no dataflow" in all places, and it is really fast :)
If your algorithm chooses to answer "yes there is a dataflow", then it may claim that some variable affects the answer when it does not. (This is called a "false positive").
You get to decide which is more important; many people prefer false positives when looking for a problem, because then you have to at least look at possibilities detected by the tool. A false negative means it didn't report something you might care about. YMMV.
Here's a starting reference: http://en.wikipedia.org/wiki/Data-flow_analysis
Any of the books on that page will be pretty good. I have Muchnick's book and like it a lot. See also this page: http://en.wikipedia.org/wiki/Program_slicing
You will discover that implementing this is a pretty big effort for any real language. You are probably better off finding a tool framework that does most or all of this for you already.
I use the following algorithm: a variable is used if it is a parameter or it occurs anywhere in an expression, excluding as the LHS of an assignment. First, count the number of uses of all variables. Delete unused variables and assignments to unused variables. Repeat until no variables are deleted.
This algorithm only implements a subset of the OP's requirement, and it is horribly inefficient because it requires multiple passes. A garbage collection may be faster but is harder to write: my algorithm only requires a list of variables with usage counts. Each pass is linear in the size of the program. The algorithm effectively does a limited kind of dataflow analysis by eliminating the tail of a flow ending in an assignment.
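A toy Python version of this repeated elimination is sketched below. The program representation is an assumption made for the example: each assignment is a pair of a target name and the list of variables its right-hand side reads, and `always_used` stands for parameters and variables read by return or output expressions. The sketch tracks a set of used names rather than explicit counts, which is enough to decide whether a count is zero.

```python
def eliminate_unused(assignments, always_used):
    """Repeatedly delete assignments whose target is never read.

    assignments: list of (target, [variables read by the right-hand side]).
    always_used: names that count as used regardless (parameters, outputs).
    Returns the assignments that survive once no more can be deleted.
    """
    current = list(assignments)
    while True:
        # One pass: collect every variable that is read somewhere.
        used = set(always_used)
        for _, reads in current:
            used.update(reads)
        kept = [(target, reads) for target, reads in current if target in used]
        if len(kept) == len(current):
            return kept  # fixed point: nothing was deleted in this pass
        current = kept


program = [
    ("a", ["x"]),
    ("b", ["a"]),
    ("c", ["b"]),   # c is never read, so this goes first
    ("d", ["x"]),
]
print(eliminate_unused(program, always_used={"x", "d"}))
```

Deleting `c` makes `b` unused, which in turn makes `a` unused, so the chain disappears over three passes and only the assignment to `d` remains. That cascade is why the algorithm has to repeat until nothing changes.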
For my language, the elimination of side effects in the RHS of an assignment to an unused variable is mandated by the language specification; it may not be suitable for other languages. Effectiveness is improved by running the pass before inlining, to reduce the cost of inlining unused function applications, and then running it again afterwards, which eliminates parameters of inlined functions.
Just as an example of the utility of the language specification, the library constructs a thread pool and assigns a pointer to it to a global variable. If the thread pool is not used, the assignment is deleted, and hence the construction of the thread pool elided.
IMHO compiler optimisations are almost invariably heuristics whose performance matters more than their effectiveness at achieving a theoretical goal (like removing unused variables). Simple reductions are useful not only because they're fast and easy to write, but because a programmer who understands the basics of the compiler's operation can leverage this knowledge to help the compiler. The most well-known example of this is probably the refactoring of recursive functions to place the recursion in tail position: a pointless exercise unless the programmer knows the compiler can do tail-recursion optimisation.

Infinite Loop : Determining and breaking out of Infinite loop

How would you determine that a loop is an infinite loop and break out of it?
Does anyone have an algorithm, or can anyone assist me with this one?
Thanks
There is no general algorithm that can determine whether a program is in an infinite loop for every Turing-complete language; this is basically the halting problem.
The idea of proving it is simple:
Assume you had such an algorithm A.
Build a program B that invokes A on itself [on B].
if A answers "the program will halt" - do an infinite loop
else [A answers B doesn't halt] - halt immediately
Now, assume you invoke A on B - the answer will definitely be wrong, thus A doesn't exist.
Note: the above is NOT a formal proof, just a sketch of it.
As written by others, it cannot be determined.
However, if you want to have some checking, you can use the WatchDog design pattern.
This is a separate thread that checks if a task still is active. Your own thread should give a signal regularly to say it is alive. Make sure this signal is not set inside your (infinite) loop.
If there was no signal, the program is inside an infinite loop or has stopped and the watchdog can act on it.
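A minimal watchdog along these lines could look like the following Python sketch. The class name `Watchdog` and its methods `beat`, `start` and `stop` are invented for the example, and the polling interval of a quarter of the timeout is an arbitrary choice.

```python
import threading
import time


class Watchdog:
    """Sets `tripped` when the watched task has been silent for too long."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.tripped = threading.Event()
        self._last_beat = time.monotonic()
        self._lock = threading.Lock()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._watch, daemon=True)

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()

    def beat(self):
        """Called regularly by the watched task to say it is still alive."""
        with self._lock:
            self._last_beat = time.monotonic()

    def _watch(self):
        # Poll a few times per timeout period until stopped or tripped.
        while not self._stop.wait(self.timeout / 4):
            with self._lock:
                silent_for = time.monotonic() - self._last_beat
            if silent_for > self.timeout:
                self.tripped.set()
                return


wd = Watchdog(timeout=0.2)
wd.start()
# The watched task never calls wd.beat(), as if it were stuck.
print(wd.tripped.wait(3.0))
```

What the watchdog does once tripped (log, kill a worker process, abandon the result) is up to the application. Note that a Python thread cannot be forcibly killed from another thread, so in practice the stuck work would need to run in a separate process or check a flag itself.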

Print 1 followed by googolplex number of zeros

Assuming we are not concerned about the running time of the program (which is practically infinite for human mortals) and using a limited amount of memory (2^64 bytes), we want to print out in base 10 the exact value of 10^(googolplex), one digit at a time on screen (mostly zeros).
Describe an algorithm (which can be coded on current day computers), or write a program to do this.
Since we cannot practically check the output, we will rely on collective opinion on the correctness of the program.
NOTE : I do not know the solution, or whether a solution exists or not. The problem is my own invention. To those readers who are quick to mark this offtopic... kindly reconsider. This is difficult and bit theoretical but definitely CS.
This is impossible. There are more states (10^(10^100)) in the program than there are electrons in the universe (~10^80). Therefore, in our universe, there can be no such realization of a machine capable of executing the task.
First of all, we note that 10^(10^100) is equivalent to ((((10^10)^10)^...)^10), 100 times.
Or 10↑↑↑↑↑↑↑↑↑↑10.
This gives rise to the following solution:
print 1
for i in A(10, 100)
print 0
in bash:
printf 1
while true; do
printf 0
done
... close enough.
Here's an algorithm that solves this:
print 1
for i = 1 to 10^(10^100)
print 0
One can trivially prove correctness using Hoare logic:
There are no pre-conditions
The post condition is that a one followed by 10^(10^100) zeros are printed
The cycle's invariant is that the number of zeros printed so far is equal to i
EDIT: A machine to solve the problem needs the ability to distinguish between one googolplex of distinct states: each state is the result of printing one more zero than the previous. The amount of memory needed to do this is the same needed to store the number one googolplex. If there isn't that much memory available, this problem cannot be solved.
This does not mean it isn't a computable problem: it can be solved by a Turing machine because a Turing machine has a limitless amount of memory.
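The memory argument can be checked with a few lines of Python arithmetic. This is only a back-of-the-envelope sketch: it compares the number of bits a counter needs in order to reach one googolplex with the 2^64 bytes allowed in the question.

```python
import math

# A counter that can distinguish 10^(10^100) values needs log2 of that many
# bits, i.e. 10^100 * log2(10).
bits_needed = 10 ** 100 * math.log2(10)

# The question allows 2^64 bytes of memory.
bits_available = 8 * 2 ** 64

print(f"bits needed:    about {bits_needed:.2e}")
print(f"bits available: about {bits_available:.2e}")
print("enough memory:", bits_available >= bits_needed)
```

The required counter is roughly 80 orders of magnitude larger than the available memory, which is the gap the answer above is pointing at.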
There definitely is a solution to this problem in theory, assuming of course you have a machine that is capable of producing that sort of output. I'm pretty sure that a googolplex is larger than the number of atoms in the universe, at least according to what the physicists tell us, so I don't think that any physically realizable model of computation could print it out. However, mathematically speaking, you could define a Turing machine capable of printing out the value by just giving it a googolplex-ish number of states and having each write a zero and then move to the next lower state.
Consider the following:
The console window to which you are printing the output will have a maximum buffer size.
When this buffer size is exceeded, anything printed earlier is discarded, and the user will not be able to scroll back to see it.
The maximum buffer size will be minuscule compared to a googolplex.
Therefore, if you want to mimic the user experience of your program running to completion, find the maximum buffer size of the console you will print to and print that many zeroes.
Hurray laziness!
