Looping in VHDL

I am writing code for the RSA algorithm. I need to use a loop for it to work, but the loop doesn't have a definite bound, so it is not synthesisable. Are there any other methods for looping? Please help.

The kind of loop is irrelevant: you cannot synthesise a variable amount of hardware. For a loop to be synthesisable, it must have a definite upper bound - a maximum number of iterations that is clear to the synthesiser. Exiting a loop early is allowed.
I would recommend you stick to for loops for synthesis. This will make your code more portable.

You don't need to use a loop, but perhaps you feel it's most convenient?
If you are using the loop to define how much hardware gets built, you need to include all the possible hardware (so give the loop a high bound) and then use some logic to take the output you require from the right place in the hardware, emulating the case of the loop "exiting early".
Alternatively, if you are emulating a software loop in a state machine, then you can keep track of iterations, or a flag for "complete", in a state variable, and use that to move onto the next state when you have performed enough computation.
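As a language-neutral illustration of the bounded-loop idea for RSA, here is the square-and-multiply pattern in C++; the loop bound is fixed at the key width, so once the exponent bits are exhausted the remaining iterations simply do nothing useful, rather than the loop running a data-dependent number of times. In hardware you would write the same shape in VHDL. KEY_BITS, the function name, and the 64-bit types are illustrative assumptions, not the asker's design:

#include <cstdint>

// Illustrative sketch, not the asker's design: modular exponentiation with
// a loop bound fixed at the key width. Assumes mod < 2^32 so that the
// 64-bit products below cannot overflow.
constexpr int KEY_BITS = 32;  // assumed key width

uint64_t mod_exp(uint64_t base, uint64_t exp, uint64_t mod) {
    uint64_t result = 1 % mod;
    base %= mod;
    for (int i = 0; i < KEY_BITS; ++i) {  // definite upper bound: synthesisable shape
        if (exp & 1)
            result = (result * base) % mod;
        base = (base * base) % mod;
        exp >>= 1;
    }
    return result;
}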

Related

How to recognize variables that don't affect the output of a program?

Sometimes the value of a variable accessed within the control flow of a program cannot possibly have any effect on its output. For example:
global var_1
global var_2

start program hello(var_3, var_4)
    if (var_2 < 0) then
        save-log-to-disk(var_1, var_3, var_4)
    end-if
    return ("Hello " + var_3 + ", my name is " + var_1)
end program
Here only var_1 and var_3 have any influence on the output, while var_2 and var_4 are only used for side effects.
Do variables such as var_1 and var_3 have a name in dataflow-theory/compiler-theory?
Which static dataflow analysis techniques can be used to discover them?
References to academic literature on the subject would be particularly appreciated.
The problem you stated is undecidable in general, even for the following very narrow special case: given a single routine P(x), where x is a parameter of type integer, is the output of P(x) independent of the value of x, i.e., does P(0) = P(1) = P(2) = ...?
We can reduce the following (still undecidable) version of the halting problem to the question above: given a Turing machine M(), does M() never stop on the empty input?
I assume that we use a (Turing-complete) language in which we can build a "Turing machine simulator":
Given the program M(), construct this routine:
P(x):
    if x == 0:
        return 0
    run M() for x steps
    if M() has terminated then:
        return 1
    else:
        return 0
Now:
P(0) = P(1) = P(2) = ...  =>  M() does not terminate,
and conversely:
M() does terminate  =>  P(x) = 1 for a sufficiently large x  =>  P(x) != P(0) = 0.
So, it is very difficult for a compiler to decide whether a variable actually does not influence the return value of a routine; in your example, the "side effect routine" might manipulate one of its values (or even loop infinitely, which would most definitely change the return value of the routine ;-)
Of course overapproximations are still possible. For example, one might conclude that a variable does not influence the return value if it does not appear in the routine body at all. You can also see some classical compiler analyses (like Expression Simplification, Constant propagation) having the side effect of eliminating appearances of such redundant variables.
Pachelbel has discussed the fact that you cannot do this perfectly. OK, I'm an engineer, I'm willing to accept some dirt in my answer.
The classic way to answer your question is to do dataflow tracing from program outputs back to program inputs. A dataflow connects a program assignment (or side effect) that produces a variable's value to a place in the application that consumes that value.
If there is (transitive) dataflow from a program output that you care about (in your example, the printed text stream) back to an input you supplied, then that input "affects" the output. A variable whose value does not flow to your desired output is useless from your point of view.
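A tiny sketch of that transitive check in C++, treating the program as a def-use graph and asking which values can reach a given output (the graph representation and the names below are illustrative assumptions, not a particular compiler's IR):

#include <map>
#include <queue>
#include <set>
#include <string>
#include <vector>

// Reversed def-use edges: for each consumer, the values it is computed from.
using Graph = std::map<std::string, std::vector<std::string>>;

// Returns every node whose value can (transitively) flow into 'output',
// by breadth-first search backwards over the reversed edges.
std::set<std::string> reaches(const Graph& reversed, const std::string& output) {
    std::set<std::string> seen{output};
    std::queue<std::string> work;
    work.push(output);
    while (!work.empty()) {
        std::string v = work.front(); work.pop();
        auto it = reversed.find(v);
        if (it == reversed.end()) continue;
        for (const std::string& u : it->second)
            if (seen.insert(u).second) work.push(u);
    }
    return seen;
}

// For the question's example: the returned string is built from var_1 and
// var_3, while var_2 only guards a side effect and var_4 only feeds it.
int main() {
    Graph reversed = {
        {"output", {"var_1", "var_3"}},
        {"log",    {"var_1", "var_3", "var_4", "var_2"}},  // side-effect sink
    };
    auto affecting = reaches(reversed, "output");  // {output, var_1, var_3}
    return affecting.count("var_2") ? 1 : 0;       // var_2 does not reach it
}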
If you focus your attention only on the computations involved in the dataflows, and display them, you get what is generally called a "program slice". There are (very few) commercial tools that can show this to you.
Grammatech has a good reputation here for C and C++.
There are standard compiler algorithms for constructing such dataflow graphs; see any competent compiler book.
They all suffer from some limitation due to Turing's impossibility proofs, as pointed out by Pachelbel. When you implement such a dataflow algorithm, there will be places where it cannot know the right answer; it simply has to pick one.
If your algorithm chooses to answer "there is no dataflow" in certain places where it is not sure, then it may miss a valid dataflow and incorrectly report that a variable does not affect the answer. (This is called a "false negative".) This occasional error may be satisfactory if
the algorithm has some other nice properties, e.g., it runs really fast on millions of lines of code. (The trivial algorithm simply says "no dataflow" in all places, and it is really fast :)
If your algorithm chooses to answer "yes there is a dataflow", then it may claim that some variable affects the answer when it does not. (This is called a "false positive").
You get to decide which is more important; many people prefer false positives when looking for a problem, because then you have to at least look at possibilities detected by the tool. A false negative means it didn't report something you might care about. YMMV.
Here's a starting reference: http://en.wikipedia.org/wiki/Data-flow_analysis
Any of the books on that page will be pretty good. I have Muchnick's book and like it a lot. See also this page: http://en.wikipedia.org/wiki/Program_slicing
You will discover that implementing this is a pretty big effort for any real language. You are probably better off finding a tool framework that does most or all of this for you already.
I use the following algorithm: a variable is used if it is a parameter or it occurs anywhere in an expression, excluding as the LHS of an assignment. First, count the number of uses of all variables. Delete unused variables and assignments to unused variables. Repeat until no variables are deleted.
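A sketch of that pass in C++ over a toy three-address-style IR (the Stmt shape and the names are illustrative assumptions, not the answerer's compiler; note that dropping an assignment here also drops any side effects of its RHS, which is what this answerer's language spec mandates, as discussed below):

#include <map>
#include <string>
#include <vector>

// Toy three-address-style IR: each statement assigns to 'lhs' using 'uses'.
struct Stmt {
    std::string lhs;
    std::vector<std::string> uses;
};

// The pass described above: count uses of every variable (parameters count
// as used), delete assignments to unused variables, repeat to a fixed point.
void eliminate_unused(std::vector<Stmt>& prog,
                      const std::vector<std::string>& params) {
    bool changed = true;
    while (changed) {
        changed = false;
        std::map<std::string, int> use_count;
        for (const std::string& p : params) ++use_count[p];
        for (const Stmt& s : prog)
            for (const std::string& u : s.uses) ++use_count[u];

        std::vector<Stmt> kept;
        for (const Stmt& s : prog) {
            if (use_count[s.lhs] == 0)
                changed = true;       // assignment to an unused variable: drop it
            else
                kept.push_back(s);    // lhs is still referenced somewhere
        }
        prog.swap(kept);
    }
}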
This algorithm only implements a subset of the OP's requirement, and it is horribly inefficient because it requires multiple passes. A garbage-collection-style approach may be faster but is harder to write: my algorithm only requires a list of variables with usage counts. Each pass is linear in the size of the program. The algorithm effectively does a limited kind of dataflow analysis by eliminating the tail of a flow ending in an assignment.
For my language the elimination of side effects in the RHS of an assignment to an unused variable is mandated by the language specification; it may not be suitable for other languages. Effectiveness is improved by running it before inlining (to reduce the cost of inlining unused function applications) and again afterwards, which eliminates parameters of inlined functions.
Just as an example of the utility of the language specification, the library constructs a thread pool and assigns a pointer to it to a global variable. If the thread pool is not used, the assignment is deleted, and hence the construction of the thread pool elided.
IMHO compiler optimisations are almost invariably heuristics whose performance matters more than their effectiveness at achieving a theoretical goal (like removing unused variables). Simple reductions are useful not only because they're fast and easy to write, but because a programmer who understands the basics of the compiler's operation can leverage this knowledge to help the compiler. The best-known example of this is probably the refactoring of recursive functions to place the recursion in tail position: a pointless exercise unless the programmer knows the compiler can do tail-recursion optimisation.

Parallelizing an algorithm with many exit points?

I'm faced with parallelizing an algorithm which in its serial implementation examines the six faces of a cube of array locations within a much larger three-dimensional array. (That is, select an array element, and then define a cube or cuboid around that element 'n' elements distant in x, y, and z, bounded by the bounds of the array.)
Each work unit looks something like this (Fortran pseudocode; the serial algorithm is in Fortran):
do n1 = nlo, nhi
   do o1 = olo, ohi
      if (somecondition(n1, o1)) then
         retval = .TRUE.
         RETURN
      endif
   end do
end do
Or C pseudocode:
for (n1 = nlo; n1 <= nhi; n1++) {
    for (o1 = olo; o1 <= ohi; o1++) {
        if (somecondition(n1, o1) != 0) {
            return true;
        }
    }
}
There are six work units like this in the total algorithm, where the 'lo' and 'hi' values generally range between 10 and 300.
What I think would be best is to schedule six or more threads of execution, round-robin if there aren't that many CPU cores, ideally with the loops executing in parallel, with the same goal as the serial algorithm: once somecondition() becomes true, execution among all the threads must immediately stop and a value of true be set in a shared location.
What techniques exist in a Windows compiler to facilitate parallelizing tasks like this? Obviously, I need a master thread which waits on a semaphore or the completion of the worker threads, so there is a need for nesting and signaling, but my experience with OpenMP is introductory at this point.
Are there message passing mechanisms in OpenMP?
EDIT: If the highest difference between "nlo" and "nhi" or "olo" and "ohi" is eight to ten, that would imply no more than 64 to 100 iterations for this nested loop, and no more than 384 to 600 iterations for the six work units together. Based on that, is it worth parallelizing at all?
Would it be better to parallelize the loop over the array elements and leave this algorithm serial, with multiple threads running the algorithm on different array elements? I'm suggesting this based on your comment: "The time consumption comes from the fact that every element in the array must be tested like this. The arrays commonly have between four million and twenty million elements." The design of a parallelization over the array elements is also flexible in the number of threads. Unless there is a reason that the array elements have to be checked in some order?
It seems that the portion you are showing us doesn't take that long to execute, so making it take less wall-clock time by parallelizing it might not be easy... there is always some overhead to multiple threads, and if there is not much time to gain, parallel code might not be faster.
One possibility is to use OpenMP to parallelize over the 6 loops -- declare logical :: array(6), allow each loop to run to completion, and then retval = any(array). Then you can check this value and return outside the parallelized loop. Add a schedule(dynamic) to the parallel do statement if you do this. Or, have a separate !$omp parallel and then put !$omp do schedule(dynamic) ... !$omp end do nowait around each of the 6 loops.
Or, you can follow the good advice of M.S.B. and parallelize the outermost loop over the whole array. The problem here is that you cannot have a RETURN inside a parallel loop, so label the second outermost loop (the largest one within the parallel part) and EXIT that loop, something like:
retval = .FALSE.
!$omp parallel do default(private) shared(BIGARRAY,retval) schedule(dynamic,1)
do k = 1, NN
   if (.not. retval) then
      outer2: do j = 1, NN
         do i = 1, NN
            ! --- your loop #1
            do n1 = nlo, nhi
               do o1 = olo, ohi
                  if (somecondition(BIGARRAY(i,j,k),n1,o1)) then
                     retval = .TRUE.
                     exit outer2
                  endif
               end do
            end do
            ! --- your loops #2 ... #6 go here
         end do
      end do outer2
   end if
end do
!$omp end parallel do
[edit: the if statement is there presuming that you need to find out if there is at least one element like that in the big array. If you need to figure the condition for every element, you can similarly either add a dummy loop exit or goto, skipping the rest of the processing for that element. Again, use schedule(dynamic) or schedule(guided).]
As a separate point, you might also want to check whether it may be a good idea to go through the innermost loop in some larger step (depending on float size), compute a vector of logicals on each iteration, and then aggregate the results, e.g. something like if(count(somecondition(x(o1:o1+step,n1,k)))>0); in this case the compiler may be able to vectorize somecondition.
I believe you can do what you want with the task construct introduced in OpenMP 3; Intel Fortran supports tasking in OpenMP. I don't use tasks often so I won't offer you any wonky pseudocode.
You already mentioned the obvious way to stop all threads as soon as any thread finds the ending condition: have each check some shared variable which gives the status of the ending condition, thereby determining whether to break out of the loops. Obviously this is an overhead, so if you decide to take this approach I would suggest a few things:
Use atomics to check the ending condition; this avoids expensive memory flushing, as just the variable in question is flushed. Move to OpenMP 3.1; it supports some new atomic operations.
Check infrequently, maybe like once per outer iteration. You should only be parallelizing large cases to overcome the overhead of multithreading.
This one is optional, but you can try adding compiler hints, e.g. if you expect a certain condition to be false most of the time, the compiler will optimize the code accordingly.
Another (somewhat dirty) approach is to use shared variables for the loop ranges for each thread, maybe use a shared array where index n is for thread n. When one thread finds the ending condition, it changes the loop ranges of all the other threads so that they stop. You'll need the appropriate memory synchronization. Basically the overhead has now moved from checking a dummy variable to synchronizing/checking loop conditions. Again probably not so good to do this frequently, so maybe use shared outer loop variables and private inner loop variables.
On another note, this reminds me of the classic polling versus interrupt problem. Unfortunately I don't think OpenMP supports interrupts where you can send some kind of kill signal to each thread.
There are hacking work-arounds like using a child process for just this parallel work and invoking the operating system scheduler to emulate interrupts, however this is rather tricky to get correct and would make your code extremely unportable.
Update in response to comment:
Try something like this:
char shared_var = 0;
#pragma omp parallel
{
    int n1, o1;
    char private_var = 0;
    // you should have some method for setting loop ranges for each thread
    for (n1 = nlo; n1 <= nhi; n1++) {
        for (o1 = olo; o1 <= ohi; o1++) {
            if (somecondition(n1, o1) != 0) {
                #pragma omp atomic write
                shared_var = 1; // done marker; this also triggers the break below
                break;          // could instead use goto to break out of both loops in one go
            }
        }
        #pragma omp atomic read
        private_var = shared_var;
        if (private_var != 0) break;
    }
}
A suitable parallel approach might be to let each worker examine a part of the overall problem, exactly as in the serial case, and use a local (non-shared) variable for the result (retval). Finally, do a reduction over all workers, combining these local variables into a shared overall result.
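A minimal C++ sketch of that reduction approach, assuming a hypothetical check_element(i) standing in for the six-face test on one array element (the function name and the flattened iteration space are assumptions, not the OP's code):

#include <omp.h>

// Hypothetical stand-in for the six-face test on one array element.
bool check_element(long i);

bool any_element_matches(long n) {
    bool found = false;
    // Each thread keeps a private 'found'; OpenMP ORs them together at the end.
    #pragma omp parallel for reduction(||:found) schedule(dynamic, 1024)
    for (long i = 0; i < n; ++i) {
        if (!found && check_element(i))  // cheap early-skip within one thread;
            found = true;                // not a hard cancel across threads
    }
    return found;
}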

Performance difference between iterating once and iterating twice?

Consider something like...
for (int i = 0; i < test.size(); ++i) {
    test[i].foo();
    test[i].bar();
}
Now consider..
for (int i = 0; i < test.size(); ++i) {
    test[i].foo();
}
for (int i = 0; i < test.size(); ++i) {
    test[i].bar();
}
Is there a large difference in time spent between these two? I.e. what is the cost of the actual iteration? It seems like the only real operations you are repeating are an increment and a comparison (though I suppose this would become significant for a very large n). Am I missing something?
First, as noted above, unless your compiler can optimize the size() method so it's only called once, or reduce it to a single read (no function call overhead), it will hurt.
There is a second effect you may want to be concerned with, though. If your container size is large enough, then the first case will perform faster. This is because, when it gets to test[i].bar(), test[i] will already be cached. The second case, with split loops, will thrash the cache, since test[i] will need to be reloaded from main memory in the second loop.
Worse, if your container (std::vector, I'm guessing) has so many items that it won't all fit in memory, and some of it has to live in swap on your disk, then the difference will be huge as you have to load things in from disk twice.
However, there is one final thing that you have to consider: all this only makes a difference if there is no order dependency between the function calls (really, between different objects in the container). Because, if you work it out, the first case does:
test[0].foo();
test[0].bar();
test[1].foo();
test[1].bar();
test[2].foo();
test[2].bar();
// ...
test[test.size()-1].foo();
test[test.size()-1].bar();
while the second does:
test[0].foo();
test[1].foo();
test[2].foo();
// ...
test[test.size()-1].foo();
test[0].bar();
test[1].bar();
test[2].bar();
// ...
test[test.size()-1].bar();
So if your bar() assumes that all foo()'s have run, you will break it if you change the second case to the first. Likewise, if bar() assumes that foo() has not been run on later objects, then moving from the second case to the first will break your code.
So be careful and document what you do.
There are many aspects to such a comparison.
First, the complexity of both options is O(n), so the difference isn't very big anyway. You needn't care about it if you are writing a big, complex program with a large n and "heavy" operations foo() and bar(). You only need to care about it for very small, simple programs (the kind written for embedded devices, for example).
Second, it will depend on the programming language and compiler. I'm fairly sure that, for instance, most C++ compilers will optimize your second option to produce the same code as the first one.
Third, if the compiler hasn't optimized your code, the performance difference will heavily depend on the target processor. Consider the loop in terms of assembly instructions; it will look something like this (pseudo-assembly):
LABEL L1:
    do this        ;; some commands
    call that
    IF condition
        goto L1
    ;; some more instructions, ELSE part
That is, every loop iteration ends in an IF statement. But modern processors don't like IFs: they rearrange instructions to execute them ahead of time or to avoid idling, and with an IF (in fact, a conditional goto or jump), the processor does not know whether it may rearrange operations or not.
There's also a mechanism called branch predictor. From material of Wikipedia:
branch predictor is a digital circuit that tries to guess which way a branch (e.g. an if-then-else structure) will go before this is known for sure.
This "soften" effect of IF's, through if the predictor's guess is wrong, no optimization will be performed.
So, you can see that there's a big amount of conditions for both your options: target language and compiler, target machine, it's processor and branch predictor. This all makes very complex system, and you cannot foresee what exact result you will get. I believe, that if you don't deal with embedded systems or something like that, the best solution is just to use the form which your are more comfortable with.
For your examples you have the additional concern of how expensive .size() is, since in most languages it's evaluated each time i increments.
How expensive is it? Well, that depends; it's certainly all relative. If .foo() and .bar() are expensive, the cost of the actual iteration is probably minuscule in comparison. If they're pretty lightweight, then it'll be a larger percentage of your execution time. If you want to know about a particular case, test it; that's the only way to be sure about your specific scenario.
Personally, I'd go with the single iteration to be on the cheap side for sure (unless you need the .foo() calls to happen before the .bar() calls).
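If you do want to measure it, here is a minimal C++ timing sketch (the Widget type and the deliberately trivial foo()/bar() bodies are assumptions for illustration):

#include <chrono>
#include <cstddef>
#include <cstdio>
#include <vector>

struct Widget {
    long a = 0, b = 0;
    void foo() { ++a; }  // deliberately cheap bodies: the loop overhead
    void bar() { ++b; }  // itself is what we are trying to measure
};

int main() {
    std::vector<Widget> test(1000000);
    using clock = std::chrono::steady_clock;

    auto t0 = clock::now();
    for (std::size_t i = 0; i < test.size(); ++i) { test[i].foo(); test[i].bar(); }
    auto t1 = clock::now();
    for (std::size_t i = 0; i < test.size(); ++i) { test[i].foo(); }
    for (std::size_t i = 0; i < test.size(); ++i) { test[i].bar(); }
    auto t2 = clock::now();

    // Keep the results observable so the optimizer cannot delete the loops.
    long sink = 0;
    for (const Widget& w : test) sink += w.a + w.b;

    auto ms = [](clock::duration d) {
        return (long long)std::chrono::duration_cast<std::chrono::milliseconds>(d).count();
    };
    std::printf("fused: %lld ms, split: %lld ms (checksum %ld)\n",
                ms(t1 - t0), ms(t2 - t1), sink);
    return 0;
}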
I assume .size() will be constant; otherwise, the first code example might not give the same result as the second one.
Most compilers would probably store .size() in a variable before the loop starts, so the .size() cost will be cut down.
Therefore the time spent inside the two for loops will be the same, but the loop overhead will be incurred twice.
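If you are unsure whether your compiler does this, the hoisting is easy to write by hand; a fragment in the shape of the OP's loop:

const std::size_t n = test.size();  // evaluated once, before the loop
for (std::size_t i = 0; i < n; ++i) {
    test[i].foo();
    test[i].bar();
}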
Performance tag, right.
As long as you are concentrating on the "cost" of this or that minor code segment, you are oblivious to the bigger picture (isolation); and your intention is to justify something that, at a higher level (outside your isolated context), is simply bad practice and breaks guidelines. The question is too low-level and therefore too isolated. A system or program which is a set of integrated components will perform much better than a collection of isolated components.
The fact that this or that isolated component (the work inside the loop) is fast or faster is irrelevant when the loop itself is repeated unnecessarily and therefore takes twice the time.
Given that you have one family car (CPU), why on Earth would you:
sit at home and send your wife out to do her shopping
wait until she returns
take the car, go out and do your shopping
leaving her to wait until you return
If it needs to be stated: by making one trip and shopping together, you would (a) spend almost half of your hard-earned resources and (b) have those resources available to have fun together when you get home.
It has nothing to do with the price of petrol at 9:00 on a Saturday, or the time it takes to grind coffee at the café, or the cost of each iteration.
Yes, there is a large difference in the time and the resources used. But the cost is not merely in the overhead per iteration; it is in the overall cost of the one organised trip vs the two serial trips.
Performance is about architecture: never doing anything twice (that you can do once). That is the higher level of organisation, the integration of the parts that make up the whole. It is not about counting pennies at the bowser or cycles per iteration; those are lower orders of organisation, which merely adjust a collection of fragmented parts (not a systemic whole).
Maseratis cannot get through traffic jams any faster than station wagons.

Why does Pascal forbid modification of the counter inside the for block?

Is it because Pascal was designed to be so, or are there any tradeoffs?
Or what are the pros and cons of forbidding or not forbidding modification of the counter inside a for-block? IMHO, there is little use in modifying the counter inside a for-block.
EDIT:
Could you provide one example where we need to modify the counter inside the for-block?
It is hard to choose between wallyk's answer and cartoonfox's answer, since both answers are so nice. Cartoonfox analyses the problem from the language aspect, while wallyk analyses it from the historical, real-world aspect. Anyway, thanks for all of your answers; I'd like to give my special thanks to wallyk.
In programming language theory (and in computability theory) WHILE and FOR loops have different theoretical properties:
a WHILE loop may never terminate (the expression could just be TRUE)
the finite number of times a FOR loop is to execute is supposed to be known before it starts executing. You're supposed to know that FOR loops always terminate.
The FOR loop present in C doesn't technically count as a FOR loop because you don't necessarily know how many times the loop will iterate before executing it. (i.e. you can hack the loop counter to run forever)
The class of problems you can solve with WHILE loops is strictly more powerful than those you could have solved with the strict FOR loop found in Pascal.
Pascal is designed this way so that students have two different loop constructs with different computational properties. (If you implemented FOR the C-way, the FOR loop would just be an alternative syntax for while...)
In strictly theoretical terms, you shouldn't ever need to modify the counter within a for loop. If you could get away with it, you'd just have an alternative syntax for a WHILE loop.
You can find out more about "while loop computability" and "for loop computability" in these CS lecture notes: http://www-compsci.swan.ac.uk/~csjvt/JVTTeaching/TPL.html
Another such property, by the way, is that the loop variable is undefined after the for loop. This also makes optimization easier.
Pascal was first implemented for the CDC Cyber—a 1960s and 1970s mainframe—which like many CPUs today, had excellent sequential instruction execution performance, but also a significant performance penalty for branches. This and other characteristics of the Cyber architecture probably heavily influenced Pascal's design of for loops.
The Short Answer is that allowing assignment of a loop variable would require extra guard code and messed up optimization for loop variables which could ordinarily be handled well in 18-bit index registers. In those days, software performance was highly valued due to the expense of the hardware and inability to speed it up any other way.
Long Answer
The Control Data Corporation 6600 family, which includes the Cyber, is a RISC architecture using 60-bit central memory words referenced by 18-bit addresses. Some models had an (expensive, therefore uncommon) option, the Compare-Move Unit (CMU), for directly addressing 6-bit character fields, but otherwise there was no support for "bytes" of any sort. Since the CMU could not be counted on in general, most Cyber code was generated for its absence. Ten characters per word was the usual data format until support for lowercase characters gave way to a tentative 12-bit character representation.
Instructions are 15 or 30 bits long, except for the CMU instructions, which are effectively 60 bits long. So up to four 15-bit instructions could be packed into each word, or two 30-bit instructions, or a pair of 15-bit instructions and one 30-bit instruction. 30-bit instructions cannot span words. Since branch destinations may only reference words, jump targets are word-aligned.
The architecture has no stack. In fact, the procedure call instruction RJ is intrinsically non-re-entrant. RJ modifies the first word of the called procedure by writing a jump to the next instruction after where the RJ instruction is. Called procedures return to the caller by jumping to their beginning, which is reserved for return linkage. Procedures begin at the second word. To implement recursion, most compilers made use of a helper function.
The register file has eight instances each of three kinds of register, A0..A7 for address manipulation, B0..B7 for indexing, and X0..X7 for general arithmetic. A and B registers are 18 bits; X registers are 60 bits. Setting A1 through A5 has the side effect of loading the corresponding X1 through X5 register with the contents of the loaded address. Setting A6 or A7 writes the corresponding X6 or X7 contents to the address loaded into the A register. A0 and X0 are not connected. The B registers can be used in virtually every instruction as a value to add or subtract from any other A, B, or X register. Hence they are great for small counters.
For efficient code, a B register is used for loop variables since direct comparison instructions can be used on them (B2 < 100, etc.); comparisons with X registers are limited to relations to zero, so comparing an X register to 100, say, requires subtracting 100 and testing the result for less than zero, etc. If an assignment to the loop variable were allowed, a 60-bit value would have to be range-checked before assignment to the B register. This is a real hassle. Herr Wirth probably figured that both the hassle and the inefficiency wasn't worth the utility--the programmer can always use a while or repeat...until loop in that situation.
Additional weirdness
Several unique-to-Pascal language features relate directly to aspects of the Cyber:
the pack keyword: either a single "character" consumes a 60-bit word, or it is packed ten characters per word.
the (unusual) alfa type: packed array [1..10] of char
intrinsic procedures pack() and unpack() to deal with packed characters. These perform no transformation on modern architectures, only type conversion.
the weirdness of text files vs. file of char
no explicit newline character. Record management was explicitly invoked with writeln
While set of char was very useful on CDCs, it was unsupported on many subsequent 8 bit machines due to its excess memory use (32-byte variables/constants for 8-bit ASCII). In contrast, a single Cyber word could manage the native 62-character set by omitting newline and something else.
full expression evaluation (versus shortcuts). These were implemented not by jumping and setting one or zero (as most code generators do today), but by using CPU instructions implementing Boolean arithmetic.
Pascal was originally designed as a teaching language to encourage block-structured programming. Kernighan (the K of K&R) wrote an (understandably biased) essay on Pascal's limitations, Why Pascal is Not My Favorite Programming Language.
The prohibition on modifying what Pascal calls the control variable of a for loop, combined with the lack of a break statement means that it is possible to know how many times the loop body is executed without studying its contents.
The lack of a break statement, together with not being able to use the control variable after the loop terminates, is more of a restriction than not being able to modify the control variable inside the loop, as it prevents some string and array processing algorithms from being written in the "obvious" way.
These and other difference between Pascal and C reflect the different philosophies with which they were first designed: Pascal to enforce a concept of "correct" design, C to permit more or less anything, no matter how dangerous.
(Note: Delphi does have a Break statement however, as well as Continue, and Exit which is like return in C.)
Clearly we never need to be able to modify the control variable in a for loop, because we can always rewrite using a while loop. An example in C where such behaviour is used can be found in K&R section 7.3, where a simple version of printf() is introduced. The code that handles '%' sequences within a format string fmt is:
for (p = fmt; *p; p++) {
    if (*p != '%') {
        putchar(*p);
        continue;
    }
    switch (*++p) {
    case 'd':
        /* handle integers */
        break;
    case 'f':
        /* handle floats */
        break;
    case 's':
        /* handle strings */
        break;
    default:
        putchar(*p);
        break;
    }
}
Although this uses a pointer as the loop variable, it could equally have been written with an integer index into the string:
for (i = 0; i < strlen(fmt); i++) {
    if (fmt[i] != '%') {
        putchar(fmt[i]);
        continue;
    }
    switch (fmt[++i]) {
    case 'd':
        /* handle integers */
        break;
    case 'f':
        /* handle floats */
        break;
    case 's':
        /* handle strings */
        break;
    default:
        putchar(fmt[i]);
        break;
    }
}
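For comparison, here is a sketch of the same logic with the index manipulation moved out of the for header, which is roughly the shape Pascal forces on you (this rendering, including the guard against a trailing '%', is mine, not K&R's):

i = 0;
while (fmt[i] != '\0') {
    if (fmt[i] != '%') {
        putchar(fmt[i]);
        ++i;
        continue;
    }
    ++i;                  /* step past '%': legal, because i is an ordinary */
    if (fmt[i] == '\0')   /* variable here, not a for-loop index            */
        break;            /* guard against a trailing '%'                   */
    switch (fmt[i]) {
    case 'd':
        /* handle integers */
        break;
    case 'f':
        /* handle floats */
        break;
    case 's':
        /* handle strings */
        break;
    default:
        putchar(fmt[i]);
        break;
    }
    ++i;
}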
It can make some optimizations (loop unrolling for instance) easier: no need for complicated static analysis to determine if the loop behavior is predictable or not.
From the Wikipedia article on the for loop:
"In some languages (not C or C++) the loop variable is immutable within the scope of the loop body, with any attempt to modify its value being regarded as a semantic error. Such modifications are sometimes a consequence of a programmer error, which can be very difficult to identify once made. However only overt changes are likely to be detected by the compiler. Situations where the address of the loop variable is passed as an argument to a subroutine make it very difficult to check, because the routine's behaviour is in general unknowable to the compiler."
So this seems to be to help you not burn your hand later on.
Disclaimer: It has been decades since I last did PASCAL, so my syntax may not be exactly correct.
You have to remember that PASCAL is Niklaus Wirth's child, and Wirth cared very strongly about reliability and understandability when he designed PASCAL (and all of its successors).
Consider the following code fragment:
FOR I := 1 TO 42 (* THE UNIVERSAL ANSWER *) DO FOO(I);
Without looking at procedure FOO, answer these questions: Does this loop ever end? How do you know? How many times is procedure FOO called in the loop? How do you know?
PASCAL forbids modifying the index variable in the loop body so that it is POSSIBLE to know the answers to those questions, and know that the answers won't change when and if procedure FOO changes.
It's probably safe to conclude that Pascal was designed to prevent modification of a for loop index inside the loop. It's worth noting that Pascal is by no means the only language which prevents programmers doing this, Fortran is another example.
There are two compelling reasons for designing a language that way:
Programs, specifically the for loops in them, are easier to understand and therefore easier to write and to modify and to verify.
Loops are easier to optimise if the compiler knows that the trip count through a loop is established before entry to the loop and invariant thereafter.
For many algorithms this is the required behaviour; updating all the elements in an array, for example. If memory serves, Pascal also provides while-do and repeat-until loops. Most algorithms, I guess, which are implemented in C-style languages with modifications to the loop index variable, or with breaks out of the loop, could just as easily be implemented with these alternative loop forms.
I've scratched my head and failed to find a compelling reason for allowing the modification of a loop index variable inside the loop, but then I've always regarded doing so as bad design, and the selection of the right loop construct as an element of good design.

Detecting infinite loop in brainfuck program

I have written a simple brainfuck interpreter in MATLAB script language. It is fed random bf programs to execute (as part of a genetic algorithm project). The problem I face is, the program turns out to have an infinite loop in a sizeable number of cases, and hence the GA gets stuck at the point.
So, I need a mechanism to detect infinite loops and avoid executing that code in bf.
One obvious (trivial) case is when I have
[]
I can detect this and refuse to run that program.
For the non-trivial cases, I figured out that the basic idea is to determine how one iteration of the loop changes the current cell. If the change is negative, we're eventually going to reach 0, so it's a finite loop. Otherwise, if the change is non-negative, it's an infinite loop.
Implementing this is easy for the case of a single loop, but with nested loops it becomes very complicated. For example (in what follows, (1) refers to the contents of cell 1, etc.):
++++      Put 4 in 1st cell (1)
>+++      Put 3 in (2)
<[        While( (1) is non zero)
  --        Decrease (1) by 2
  >[        While( (2) is non zero)
    -         Decrement (2)
    <+        Increment (1)
  >]
            (2) would be 0 at this point
  +++       Increase (2) by 3 making (2) = 3
<]        (1) was decreased by 2 and then increased by 3, so net effect is increment
and hence the code runs on and on. A naive check of the number of +'s and -'s done on cell 1, however, would say the number of -'s is more, so would not detect the infinite loop.
Can anyone think of a good algorithm to detect infinite loops, given arbitrary nesting of arbitrary number of loops in bf?
EDIT: I do know that the halting problem is unsolvable in general, but I was not sure whether there did not exist special case exceptions. Like, maybe Matlab might function as a Super Turing machine able to determine the halting of the bf program. I might be horribly wrong, but if so, I would like to know exactly how and why.
SECOND EDIT: I have written what I purport to be infinite loop detector. It probably misses some edge cases (or less probably, somehow escapes Mr. Turing's clutches), but seems to work for me as of now.
In pseudocode form, here it goes:
subroutine bfexec(bfprogram)
begin
    Loop through bfprogram:
        If (current character is '[')
            Find the corresponding ']'
            Store the code between the two brackets in, say, 'subprog'
            Save the value of the current cell in oldval
            Call bfexec recursively with subprog
            Save the value of the current cell in newval
            If (newval >= oldval)
                Raise an 'infinite loop' error and exit
            EndIf
        Else
            /* Do other characters' processing */
        EndIf
    EndLoop
end
Alan Turing would like to have a word with you.
http://en.wikipedia.org/wiki/Halting_problem
When I used linear genetic programming, I just used an upper bound for the number of instructions a single program was allowed to do in its lifetime. I think that this is sensible in two ways: I cannot really solve the halting problem anyway, and programs that take too long to compute are not worthy of getting more time anyway.
Let's say you did write a program that could detect whether this program would run in an infinite loop. Let's say for the sake of simplicity that this program was written in brainfuck to analyze brainfuck programs (though this is not a precondition of the following proof, because any language can emulate brainfuck and brainfuck can emulate any language).
Now let's say you extend the checker program to make a new program. This new program exits immediately when its input loops indefinitely, and loops forever when its input exits at some point.
If you input this new program into itself, what will the results be?
If this program loops forever when run, then by its own definition it should exit immediately when run with itself as input. And vice versa. The checker program cannot possibly exist, because its very existence implies a contradiction.
As has been mentioned before, you are essentially restating the famous halting problem:
http://en.wikipedia.org/wiki/Halting_problem
Edit: I want to make clear that the above disproof is not my own, but is essentially the famous disproof Alan Turing gave back in 1936.
State in bf is a single array of chars.
If I were you, I'd take a hash of the bf interpreter state on every "]" (or once in rand(1, 100) "]"s*) and assert that the set of hashes is unique.
The second (or more) time I see a certain hash, I save the whole state aside.
The third (or more) time I see a certain hash, I compare the whole state to the saved one(s) and if there's a match, I quit.
On every input command (',') I reset my saved states and list of hashes.
An optimization is to only hash the part of state that was touched.
I haven't solved the halting problem - I'm detecting infinite loops while running the program.
*The rand is to make the check independent of loop period
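A compact C++ sketch of that check, simplified to store full serialized states rather than the two-tier hash-then-compare scheme described above (the State shape and all names are assumptions, not the asker's MATLAB code):

#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Assumed interpreter state: tape, data pointer, instruction pointer.
struct State {
    std::vector<unsigned char> tape;
    std::size_t dp;  // data pointer
    std::size_t ip;  // instruction pointer
};

// Serialize the state so that identical configurations compare equal.
// (A production version would, as suggested above, track only the part
// of the tape that has been touched.)
std::string key_of(const State& s) {
    std::string k(s.tape.begin(), s.tape.end());
    k += '|';
    k += std::to_string(s.dp);
    k += '|';
    k += std::to_string(s.ip);
    return k;
}

// Call this at every ']' (and clear 'seen' on every ',' input command).
// A deterministic machine that revisits an exact configuration with no
// intervening input must be in an infinite loop.
bool seen_before(const State& s, std::unordered_set<std::string>& seen) {
    return !seen.insert(key_of(s)).second;
}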
An infinite loop cannot be detected in general, but you can detect whether the program is taking too much time.
Implement a timeout by incrementing a counter every time you run a command (e.g. <, >, +, -). When the counter reaches some large number, which you set by observation, you can say that the program takes a very long time to execute. For your purpose, "very long" and infinite is a good-enough approximation.
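A minimal C++ sketch of such a step-limited bf interpreter (the step budget, the 30000-cell tape, and the assumption of well-formed brackets are all illustrative choices):

#include <cstdio>
#include <string>
#include <vector>

// Runs a bf program under a hard step budget; returns false if the budget
// is exhausted, which we treat as "probably an infinite loop". Assumes the
// program's brackets are balanced and the data pointer stays in range.
bool run_bf(const std::string& prog, long max_steps = 1000000) {
    std::vector<unsigned char> tape(30000, 0);
    std::size_t dp = 0;
    for (std::size_t ip = 0; ip < prog.size(); ++ip) {
        if (--max_steps < 0) return false;  // budget exhausted: give up
        switch (prog[ip]) {
        case '>': ++dp; break;
        case '<': --dp; break;
        case '+': ++tape[dp]; break;
        case '-': --tape[dp]; break;
        case '.': std::putchar(tape[dp]); break;
        case ',': tape[dp] = (unsigned char)std::getchar(); break;
        case '[':  // on zero, jump forward past the matching ']'
            if (tape[dp] == 0)
                for (int depth = 1; depth != 0; ) {
                    char c = prog[++ip];
                    if (c == '[') ++depth;
                    else if (c == ']') --depth;
                }
            break;
        case ']':  // on non-zero, jump back to the matching '['
            if (tape[dp] != 0)
                for (int depth = 1; depth != 0; ) {
                    char c = prog[--ip];
                    if (c == ']') ++depth;
                    else if (c == '[') --depth;
                }
            break;
        }
    }
    return true;  // halted within budget
}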
As already mentioned this is the Halting Problem.
But in your case there might be a solution: the halting problem is about the Turing machine, which has unlimited memory.
In case you know that you have an upper limit on memory (e.g. you know you don't use more than 10 memory cells), you can execute your program and stop it after a bounded number of steps. The idea is that the computation space bounds the computation time (as you can't write more than one cell in one step). After you have executed as many steps as there are distinct memory configurations, you must have revisited one, so you can break. E.g. if you have 3 cells with 256 states each, there are at most 256^3 different states, and so you can stop after executing that many steps. But be careful: there are implicit cells, like the instruction pointer and the registers. You can cut it even shorter if you save every state configuration: as soon as you detect one you have already seen, you have an infinite loop. This approach is definitely much better in run time, but needs much more space (here it might be suitable to hash the configurations).
This is not the halting problem; however, it is still not reasonable to try to detect halting even in such a limited machine as a 1000-cell BF machine.
Consider this program:
+[->[>]+<[-<]+]
This program will not repeat a state until it has filled up the entire memory, which for just 1000 cells will take about 10^300 years.
If I remember correctly, the halting problem proof was only strictly about an extreme case involving self-reference. However, it's still easy to show a practical example of why you can't make an infinite loop detector.
Consider Fermat's Last Theorem. It's easy to create a program that iterates through every combination of three numbers and checks whether it is a counterexample to the theorem. If it finds one, it halts; otherwise it continues.
So if you had an infinite loop detector, it would be able to prove this theorem, and many, many others (perhaps all others, if they can be reduced to searching for counterexamples).
In general, any program that iterates through numbers and stops only under some condition would require a general theorem prover to decide whether that condition can ever be met. And that's the simplest kind of loop there is.
Off the top of my head (and I could be wrong), I would think it would be a little bit difficult to detect whether or not a program has an infinite loop without actually executing the program itself.
Since the conditional execution of portions of the program depends on its execution state, it is difficult to know that state without actually running the program.
If you don't require that a program with an infinite loop be executed, you could try having an "instructions executed" counter, and only execute a finite number of instructions. This way, if a program does have an infinite loop, the interpreter can terminate the program which is stuck in an infinite loop.
