Usually when a program crashes due to stack overflow, it means there was a recursive call without a proper exit condition. But are there other ways to get a stack overflow?
If you allocate on the stack, yes, it can happen, depending on the language. For example, with the nonstandard alloca function (commonly available to C programs, though it is not part of any C standard), the man page specifically says:
The allocation made may exceed the bounds of the stack, or even go further into other objects in memory, and alloca() cannot determine such an error.
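For instance, here is a minimal sketch (assuming a platform that provides <alloca.h> and a typical default stack of a few megabytes) where a single oversized allocation blows the stack with no recursion anywhere:

    #include <alloca.h>
    #include <stdio.h>

    int main(void) {
        /* Request far more than a typical 1-8 MiB stack in one shot.
         * alloca() cannot report failure, so the first touch of the
         * memory is what will likely fault. */
        char *p = alloca(64 * 1024 * 1024);  /* 64 MiB on the stack */
        p[0] = 1;                            /* probable crash here */
        printf("%d\n", p[0]);                /* normally never reached */
        return 0;
    }

Large local arrays (and C99 variable-length arrays) can overflow the stack in exactly the same way.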
I have a question about whether there is a performance hit when we write recursive functions for register-based VMs like the DVM. I'm aware that recursion isn't recommended in runtimes with limited recursion depth, such as Python's.
Being register-based does not help with recursive functions; they still have the same problem: conceptually, every call creates a new stack frame. If that is implemented literally, then a recursive call is inherently a little slower than looping and, perhaps more importantly, uses up a finite resource, so the recursion depth is limited. A register-based code representation does not have the concept of an operand stack, but that concept is mostly disjoint from the concept of a call stack, which is still necessary just to have general subroutines. Subroutines can be implemented without a call stack if recursion is banned: in that case they need not be re-entrant, so the local variables and the variable that holds the return address can be statically allocated, as in the sketch below.
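To make that last point concrete, here is a sketch in C (illustrative only; a VM would do the equivalent in its own calling convention) of a subroutine whose locals are statically allocated, which is exactly why it must never be re-entered:

    #include <stdio.h>

    /* Because this function is never re-entered, its "locals" need no
     * per-call stack frame and can live at fixed addresses. The price
     * is that it is not re-entrant and not thread-safe. */
    static double average(const double *xs, int n) {
        static double sum;  /* statically allocated locals */
        static int i;
        sum = 0.0;
        for (i = 0; i < n; ++i)
            sum += xs[i];
        return n > 0 ? sum / n : 0.0;
    }

    int main(void) {
        double xs[] = { 1.0, 2.0, 3.0 };
        printf("%f\n", average(xs, 3));
        return 0;
    }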
Going through a trampoline works around the stack growth: the function quickly returns to a special caller, which then calls the continuation. That way recursion doesn't grow the stack at all, since the old frame is deallocated before the new one is created, but it adds even more run-time overhead. Tail call elimination, which rewrites the call into a jump, achieves a similar effect by reusing the same frame and with less overhead, but it requires explicit support from the VM.
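Here is what a trampoline looks like in C, as a rough sketch (the names are made up for illustration; a VM would represent the "next call" as its own kind of continuation object):

    #include <stdio.h>

    /* Instead of calling itself, fact_step() returns a description of
     * the next call, and a driver loop makes that call. The stack never
     * grows: each frame is gone before the next one is created. */
    typedef struct {
        int done;            /* 1 when result is valid */
        long long result;    /* final answer */
        long long n, acc;    /* arguments for the next bounce */
    } thunk;

    static thunk fact_step(long long n, long long acc) {
        if (n <= 1)
            return (thunk){ .done = 1, .result = acc };
        return (thunk){ .done = 0, .n = n - 1, .acc = n * acc };
    }

    static long long trampoline(long long n) {
        thunk t = fact_step(n, 1);
        while (!t.done)              /* the driver loop replaces recursion */
            t = fact_step(t.n, t.acc);
        return t.result;
    }

    int main(void) {
        printf("%lld\n", trampoline(20));  /* 20! still fits in 64 bits */
        return 0;
    }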
Both of those techniques apply equally to stack-based and register-based representations of the code. Incidentally, that distinction is primarily a difference in the format in which the code is stored, and need not reflect a difference in the way the code is actually executed: a JIT compiler can turn either form into whatever the machine requires.
I have some Fortran code that calls RESHAPE to reorder a matrix so that the dimension I am about to loop over becomes the first, and therefore fastest-varying, dimension (Fortran is column-major).
This has nothing to do with C/Fortran interoperability.
Now, the matrix is rather large, and when I call the RESHAPE function I get a segfault that I am very confident is a stack overflow. I know this because I can compile my code in ifort with -heap-arrays and the problem disappears.
I do not want to modify the stack size. This code needs to be portable to any computer without the user having to concern himself with the stack size.
Is there some way I can get this call to RESHAPE to use the heap rather than the stack for its internal memory use?
Worst case, I will have to 'roll my own' RESHAPE function for this instance, but I wish there were a better way.
The Fortran standard does not speak about a stack or a heap at all; that is an implementation detail. In which part of memory something is placed, and whether there are any limits, is implementation-defined.
Therefore it is impossible to control stack or heap behaviour from the Fortran code itself. The compiler must be instructed by other means if you want to specify this, and compiler options are used for that. Intel Fortran uses the stack by default and has the -heap-arrays n option (n is the size limit in kB); gfortran is slightly different and has the opposite -fstack-arrays option (included in -Ofast, but it can be disabled).
This is valid for all kinds of temporaries and automatic arrays.
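As a concrete illustration, the invocations might look like this (the file name is hypothetical, and exact option spellings should be checked against your compiler's documentation):

    ifort -heap-arrays 10 reshape_test.f90              # temporaries larger than 10 kB go on the heap
    gfortran -Ofast -fno-stack-arrays reshape_test.f90  # -Ofast implies -fstack-arrays; turn it back off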
I am mainly thinking about Windows.
AFAIK on such platforms there are many stacks: each program, or maybe even each thread, has its own stack, and each of those threads can push bytes onto it. AFAIK every such push should be checked at run time for stack overflow, so there seems to be a cost attached to each and every push (something like array bounds checking). How exactly is this checking implemented?
On old machines, as I remember, there was no checking; some 0xfff simply became 0x000, so there was no checking cost. But today, on the Windows platform, it seems to me that every stack is probably bounds-checked, though I do not know how it is implemented.
I'm not aware of any fully-compiled language on Windows or Linux platforms that does call stack bounds checking by default. Thus, overflowing the available stack space leads to a segmentation fault as described in (for instance) the questions Segmentation fault due to recursion and What is the difference between a segmentation fault and a stack overflow?.
The benefit of not doing bounds checking, as observed in the question, is that the code runs more quickly. If one wanted bounds checks for some particular reason, one could insert them for that specific case, as in the sketch below.
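Here is what such a hand-inserted check could look like in C (the 1 MiB budget is an assumption about a default Windows stack, not a value queried from the OS, and comparing addresses of distinct objects is technically not portable):

    #include <stdio.h>
    #include <stddef.h>

    static char *stack_base;  /* address of a local in main() */

    static long deep(long n) {
        char marker;
        /* The stack grows downward on x86, so base minus current
         * approximates the bytes of stack in use. */
        ptrdiff_t used = stack_base - &marker;
        if (used > 900 * 1024) {  /* leave headroom below 1 MiB */
            fprintf(stderr, "stopping recursion at n=%ld\n", n);
            return n;
        }
        return n == 0 ? 0 : deep(n - 1);
    }

    int main(void) {
        char base;
        stack_base = &base;
        printf("stopped at %ld\n", deep(10000000L));
        return 0;
    }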
I'm getting "Program stack overflow RESET" message while running my program. So I set added a counter to see how many times I'm recursively calling the main function in my program. Turns out that it is around 30,000 times and the data I'm stacking are lists of length around 10 elements, which I think are not so many. My question is whether this amount of recursive call and memory usage are common or not, or is it more likely that I'm doing something wrong? I checked the resource manager of vista and found the memory only grew for like 1MB for lisp.exe process. And how do I adjust the stack overflow limit of CLisp?
http://clisp.cons.org/impnotes.html#faq-stack
Note that if you use tail calls and compile your function(s), there will be no limit at all.
1 MB seems to be the default stack size on Windows. I do not know whether it is possible to change it without relinking the program, but in any case I would recommend either converting the program to tail-recursive form and using the CLisp byte compiler, which will optimize the tail calls away, or just converting it to iterative form. While many Common Lisp compilers do implement tail-call optimization, the standard does not require it, so unbounded recursion should not be used.
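The transformation itself is language-independent; sketched in C (summing the first n integers is just a stand-in example), it looks like this:

    #include <stdio.h>

    /* Plain recursion: each call pushes a new frame. */
    static long sum_rec(long n) {
        return n == 0 ? 0 : n + sum_rec(n - 1);
    }

    /* Tail-recursive form: the recursive call is the very last action,
     * so a compiler that eliminates tail calls can reuse the frame. */
    static long sum_tail(long n, long acc) {
        return n == 0 ? acc : sum_tail(n - 1, acc + n);
    }

    /* The loop that an eliminated tail call effectively becomes. */
    static long sum_iter(long n) {
        long acc = 0;
        for (; n != 0; --n)
            acc += n;
        return acc;
    }

    int main(void) {
        printf("%ld %ld %ld\n", sum_rec(1000), sum_tail(1000, 0), sum_iter(1000));
        return 0;
    }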
I'm a beginner in assembly language and have noticed that the x86 code emitted by compilers usually keeps the frame pointer around even in release/optimized mode when it could use the EBP register for something else.
I understand why the frame pointer might make code easier to debug, and might be necessary if alloca() is called within a function. However, x86 has very few registers and using two of them to hold the location of the stack frame when one would suffice just doesn't make sense to me. Why is omitting the frame pointer considered a bad idea even in optimized/release builds?
The frame pointer is a reference pointer that allows a debugger to know where a local variable or an argument is, using a single constant offset. Although ESP's value changes over the course of execution, EBP remains the same, making it possible to reach the same variable at the same offset (for example, the first parameter will always be at EBP+8, while ESP offsets can change significantly since you'll be pushing and popping things).
Why don't compilers throw the frame pointer away? Because with a frame pointer, the debugger can figure out where local variables and arguments are using the symbol table, since they are guaranteed to be at a constant offset from EBP. Otherwise there isn't an easy way to figure out where a local variable is at any given point in the code.
As Greg mentioned, it also helps stack unwinding for a debugger, since EBP provides a reverse linked list of stack frames, letting the debugger figure out the size of each function's stack frame (local variables plus arguments).
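That reverse linked list is easy to see from C, as a sketch (this assumes GCC or Clang with frame pointers enabled, e.g. -O0 or -fno-omit-frame-pointer, and is deliberately non-portable):

    #include <stdio.h>

    /* With frame pointers, each frame starts with the caller's saved
     * EBP/RBP, followed by the return address pushed by `call`. */
    struct frame {
        struct frame *caller;  /* saved frame pointer of the caller */
        void *ret;             /* return address */
    };

    static void backtrace_sketch(void) {
        /* GCC builtin: the current frame address (the value of EBP/RBP). */
        struct frame *fp = __builtin_frame_address(0);
        for (int depth = 0; fp && depth < 4; ++depth, fp = fp->caller)
            printf("frame %d: return address %p\n", depth, fp->ret);
    }

    static void inner(void) { backtrace_sketch(); }
    static void outer(void) { inner(); }

    int main(void) { outer(); return 0; }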
Most compilers provide an option to omit frame pointers, although it makes debugging really hard. That option should never be used globally, even in release code: you don't know when you'll need to debug a user's crash.
Just adding my two cents to already good answers.
It's part of a good language architecture to have a chain of stack frames. The BP points to the current frame, where subroutine-local variables are stored. (Locals are at negative offsets, and arguments are at positive offsets.)
The idea that it is preventing a perfectly good register from being used in optimization raises the question: when and where is optimization actually worthwhile?
Optimization is only worthwhile in tight loops that 1) do not call functions, 2) account for a significant fraction of the program counter's time, and 3) live in code the compiler will actually ever see (i.e. non-library functions). This is usually a very small fraction of the overall code, especially in large systems.
Other code can be twisted and squeezed to get rid of cycles, and it simply won't matter, because the program counter is practically never there.
I know you didn't ask this, but in my experience, 99% of performance problems have nothing at all to do with compiler optimization. They have everything to do with over-design.
It depends on the compiler, certainly. I've seen optimized code emitted by x86 compilers that freely uses the EBP register as a general purpose register. (I don't recall which compiler I noticed that with, though.)
Compilers may also choose to maintain the EBP register to assist with stack unwinding during exception handling, but again this depends on the precise compiler implementation.
However, x86 has very few registers
This is true only in the sense that opcodes can only address 8 registers. The processor itself actually has many more registers than that, and uses register renaming, pipelining, speculative execution, and other processor buzzwords to get around that limit. Wikipedia has a good introductory paragraph on what an x86 processor can do to overcome the register limit: http://en.wikipedia.org/wiki/X86#Current_implementations.
Using stack frames has gotten incredibly cheap on any hardware even remotely modern. If stack frames are cheap, then saving a couple of registers isn't as important. I'm sure fast stack frames vs. more registers was an engineering trade-off, and fast stack frames won.
How much are you saving going pure register? Is it worth it?