Would there be functions if there were no stack? - compilation

I know that a stack data structure is used to store, among many other things, the local variables of a function that is being run.
I also understand how a stack can be used to elegantly manage recursion.
Suppose there were a machine that did not provide a stack area in memory. I don't think there would be programming languages for that machine that support recursion. I am also wondering whether programming languages for such a machine would support functions at all, even without recursion.
Please, could someone shed some light on this for me?

A bit of theoretical framework is needed to understand that recursion is indeed not tied to functions at all; rather, it is tied to expressiveness.
I won't dig into that, leaving Google to fill any gaps.
Yes, we can have functions without a stack.
We don't even need the call/ret machinery for functions; we can just have the compiler inline every function call.
So there is no need for a stack at all.
This considers only functions in the programming sense, not in the mathematical sense.
A better name would be routines.
Anyway, that is simply a proof of concept that functions, understood as reusable code, don't need a stack.
However, not all functions, in the mathematical sense, can be implemented this way.
This is analogous to say: "We can have dogs on the bed but not all dogs can be on the bed".
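To make the inlining point above concrete, here is a minimal C sketch (the names are mine, purely for illustration): the helper is a function in the reusable-code sense, yet a compiler that expands it at every call site needs no call/ret and no stack at run time.

    /* Illustrative only: "square" is reusable code, but once the compiler
       inlines it at each call site there is no call, no return address,
       and nothing that has to live on a stack. */
    static inline int square(int x) { return x * x; }

    int sum_of_squares(int a, int b) {
        /* After inlining this is straight-line arithmetic: a*a + b*b;
           the locals can live entirely in registers. */
        return square(a) + square(b);
    }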
You are on the right track in citing recursion; however, when it comes to recursion we need to be a lot more formal, as there are various forms of recursion.
For example, inlining every function call may cause the compiler to loop forever if the function being inlined is not constrained somehow.
Without digging into the theory, in order to be sure that our compiler will never loop, we can only allow primitive (bounded) recursion.
What you probably mean by "recursion" is general recursion, which cannot be achieved by inlining. We can show that general recursion requires an unbounded amount of memory, and that is the demarcation between primitive and general recursion, not the presence of a stack.
So we can have functions without a stack, even recursive functions (for some forms of recursion).
If your question was more practical then just consider MIPS.
There are no stack instructions and no dedicated stack-pointer register in the MIPS ISA; everything related to the stack is just convention.
The compiler could use any memory area and treat it like a stack.
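As a hedged illustration of "the stack is just convention" (names made up for this sketch): any memory area plus an index is enough to act as a stack, whether or not the ISA has dedicated stack instructions.

    /* A software stack in an ordinary static array.  Nothing here relies
       on hardware stack instructions or a dedicated stack-pointer
       register; it grows downward only because we chose that convention. */
    static int soft_stack[1024];
    static int soft_sp = 1024;

    static void soft_push(int v) { soft_stack[--soft_sp] = v; }
    static int  soft_pop(void)   { return soft_stack[soft_sp++]; }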

Related

Performance of recursive function in register based compiler

I have a question about whether there will be a performance hit when we write recursive functions for register-based VMs like the DVM. I'm aware that recursion isn't recommended in environments with limited recursion depth, such as Python.
Being register-based does not help for recursive functions, they still have the same problem: conceptually, every call creates a new stack frame. If that is implemented literally, then a recursive call is inherently a little slower than looping, and perhaps more importantly, uses up a finite resource so the recursion depth is limited. A register-based code representation does not have the concept of an operand stack, but that concept is mostly disjoint from the concept of a call stack, which is still necessary just to have general subroutines. Subroutines can be implemented without a call stack if recursion is banned, in which case they need not be re-entrant so the local variables and the variable that holds the return address can be statically allocated.
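To illustrate that last point, here is a minimal C sketch (my own names, and only an approximation: C itself still makes the call through its normal stack, and a genuinely stack-free implementation would also statically allocate the return-address slot): with recursion banned, the subroutine's parameter and result can simply be static.

    #include <stdio.h>

    /* Statically allocated "parameter" and "result": since the routine is
       never re-entered, one copy of each suffices and no frame is pushed. */
    static int add_one_arg;
    static int add_one_result;

    static void add_one(void) {
        add_one_result = add_one_arg + 1;
    }

    int main(void) {
        add_one_arg = 41;
        add_one();
        printf("%d\n", add_one_result);   /* prints 42 */
        return 0;
    }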
Going through a trampoline works around the stack growth by quickly returning to a special caller that then calls the continuation; that way recursion doesn't grow the stack at all, since the old frame gets deallocated before a new one is created, but it adds even more run-time overhead. Tail call elimination, which rewrites the call into a jump, achieves a similar effect by reusing the same frame, with less associated overhead; this requires explicit support from the VM.
Both of those techniques apply equally to stack based and register based representations of the code, which incidentally is primarily a difference in the format in which the code is stored, and need not reflect a difference in the way the code is actually executed: a JIT compiler can turn both of them into whatever form the machine requires.
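Here is a rough C sketch of the trampoline idea (the names and structure are mine, not taken from any particular VM): the recursive step returns a description of the next step instead of calling itself, and a driver loop keeps the stack flat.

    #include <stdio.h>

    /* State for one "recursive step" of sum(n) = n + sum(n - 1). */
    typedef struct {
        int  done;
        long acc;   /* accumulated result */
        long n;     /* remaining work     */
    } step_state;

    /* Each step returns before the next begins, so no frames pile up. */
    static step_state step(step_state s) {
        if (s.n == 0) { s.done = 1; return s; }
        s.acc += s.n;
        s.n   -= 1;
        return s;
    }

    static long trampoline_sum(long n) {
        step_state s = { 0, 0, n };
        while (!s.done)
            s = step(s);
        return s.acc;
    }

    int main(void) {
        printf("%ld\n", trampoline_sum(1000000));  /* deep "recursion", flat stack */
        return 0;
    }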

Algorithms that can only be written in assembly

Any algorithm you can implement in a HLL you can implement in assembly. On the other hand, there are many algorithms you can implement in assembly which you cannot implement in a HLL. - Randall Hyde
I found this statement in the foreword to a book on assembly. The book is here: https://courses.engr.illinois.edu/ece390/books/artofasm/fwd/fwd.html#109
Does anyone know an example of this type of algorithm?
It's plain wrong.
You can implement any algorithm (in the CS sense of the word) in any Turing-complete programming language.
On the other hand, if he had said something like: "Some algorithms can be implemented very efficiently, and with ease, in assembly, much more so than is possible in most high-level programming languages", then his statement would have made sense...
Interesting text though....
There is a sense in which it is trivially false: in the worst case, you could write an emulator in the HLL and then run the algorithm in there. But that's cheating a bit because now the HLL does not directly implement the algorithm.
A concrete example of what many HLL's can't do (or maybe they can in practice, but it is not guaranteed that they can do it) is directly implementing an XOR linked list. In many languages you just cannot XOR pointers, and/or it wouldn't make sense even if you could (consider garbage collection). Of course you can refer to every node by an integer ID and XOR those, but that's a workaround, not a direct implementation.
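For reference, a minimal C sketch of the idea (illustrative names; even in C, XORing pointer values via uintptr_t is an implementation-level trick rather than something the abstract language guarantees):

    #include <stdint.h>

    /* Each node stores prev XOR next in one link field, halving the
       per-node pointer overhead of a doubly linked list. */
    typedef struct node {
        int       value;
        uintptr_t link;   /* (uintptr_t)prev ^ (uintptr_t)next */
    } node;

    /* Traverse one step: knowing where we came from, recover where to go. */
    static node *xor_next(node *prev, node *cur) {
        return (node *)(cur->link ^ (uintptr_t)prev);
    }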
HLL's often have trouble implementing unstructured control flow, though many (particularly older) languages offer a goto. That means you may have to jump through hoops to implement a state machine (using a switch in a loop or whatever), instead of letting the state be implied by the program counter.
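As a hedged sketch of that workaround (states invented for the example): the state that assembly could leave implicit in the program counter becomes an explicit variable driving a switch inside a loop.

    /* A tiny explicit state machine: in assembly the "state" could simply
       be which label the program counter has reached; in a HLL it must be
       a variable tested on every iteration. */
    enum state { START, RUNNING, DONE };

    static int run_machine(void) {
        enum state s = START;
        int steps = 0;
        for (;;) {
            switch (s) {
            case START:   s = RUNNING; steps++; break;
            case RUNNING: s = DONE;    steps++; break;
            case DONE:    return steps;
            }
        }
    }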
There are also many algorithms and data structures that rely on operations that don't exist in typical HLL's, for example popcnt or lzcnt, which can again be emulated, but then so can everything.
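For example, a portable C emulation of popcount might look like this (a sketch; many compilers expose intrinsics or recognise such loops and emit the single instruction anyway):

    #include <stdint.h>

    /* Kernighan's trick: clear the lowest set bit until none remain.
       One popcnt instruction on many CPUs becomes a short loop here. */
    static unsigned popcount32(uint32_t x) {
        unsigned n = 0;
        while (x) {
            x &= x - 1;
            n++;
        }
        return n;
    }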
In case you have strict limitations in terms of memory and/or execution time, you might be forced to use assembly language.
High level languages typically require a run-time library which might be too big to fit into your program memory.
Think of a time-critical driver routine. An interrupt service routine for example. If there are only a few nanoseconds available for the routine, assembly language might be the only viable option.
How about this? You need to write some assembly code in order to access system registers and tables. But once the setup is done, no CPU instructions are executed (everything is done by the complex CPU exception-handling mechanisms) and yet the thing is Turing-complete and can "run" programs.

Iteration vs Recursion efficiency

I got a basic idea of how recursion works - but I've always programmed iteratively.
When we look at the keywords CPU/stack/calls and space, how is recursion different from iterations?
It needs more memory because many stack frames(?) are created, each of which (most likely) stores a value. It therefore takes up much more space than an iterative solution to the same problem, generally speaking. There are some cases where recursion would be better, such as programming the Towers of Hanoi and the like.
Am I all wrong? I've got an exam soon and I have to prepare a lot of subjects. Recursion is not my strong suit, so I would appreciate some help on this matter :)
This really depends on the nature of the language and compiler/interpreter.
Some functional languages implement tail recursion, for example, to recognize specific cases where the stack frame can be destroyed/freed prior to recursing into the next call. In those special cases among the languages/compilers/interpreters that support it, you can actually have infinite recursion without overflowing the stack.
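A small C illustration of that special case (hedged: whether the compiler actually eliminates the tail call depends on the implementation and options): the tail-recursive form can be compiled into a loop that reuses one frame, which is exactly what the iterative version spells out.

    /* Tail-recursive: nothing remains to be done after the recursive call,
       so a compiler that eliminates tail calls can reuse the same frame. */
    static unsigned long sum_tail(unsigned long n, unsigned long acc) {
        if (n == 0) return acc;
        return sum_tail(n - 1, acc + n);
    }

    /* The equivalent loop, with the frame reuse made explicit. */
    static unsigned long sum_loop(unsigned long n) {
        unsigned long acc = 0;
        for (; n > 0; n--)
            acc += n;
        return acc;
    }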
If you're working with languages that use the hardware stack and don't implement tail recursion, then typically you have to push arguments to the stack prior to branching into a function and pop them off along with return values, so there's somewhat of an implicit data structure there under the hood (if we can call it that). There's all kinds of additional things that can happen here as well, like register shadowing to optimize it.
The hardware stack is usually very efficient, typically just incrementing and decrementing a stack pointer register to push and pop, but it does involve a bit more state and instructions than branching with a loop counter or a condition. Perhaps more importantly, it tends to involve more distant branching to jump into another function's code as opposed to looping within the same body of code which could involve more instruction cache and page misses.
In these types of languages/compilers/interpreters that use the hardware stack and will always overflow it with enough recursion, the loopy routes often provide a performance advantage (but can be more tedious to code).
As a twist, you also have aggressive optimizers sometimes which do all kinds of magic with your code in the process of translating it to machine instructions and linking it like inlining functions and unrolling loops, and when you take all these factors into account, it's often better to just code things a bit more naturally and then measure with a profiler if it could use some tweaking. Of course you shouldn't use recursion in cases that can overflow, but I generally wouldn't worry about the overhead most of the time.

Inlining Algorithm

Does anyone know of any papers discussing inlining algorithms? And, closely related, the relationship of the parent-child graph to the call graph.
Background: I have a compiler written in Ocaml which aggressively inlines functions; primarily as a result of this and some other optimisations, it generates faster code for my programming language than most others in many circumstances (including even C).
Problem #1: The algorithm has trouble with recursion. For this my rule is to only inline children into parents, to prevent infinite recursion, but this precludes sibling functions being inlined into each other even once.
Problem #2: I do not know of a simple way to optimise inlining operations. My algorithm is imperative with mutable representation of function bodies because it does not seem even remotely possible to make an efficient functional inlining algorithm. If the call graph is a tree, it is clear that a bottom up inlining is optimal.
Technical information: Inlining consists of a number of inlining steps. The problem is the ordering of the steps.
Each step works as follows:
We make a copy of the function to be inlined and beta-reduce it by replacing both type parameters and value parameters with the arguments. We then replace each return statement with an assignment to a new variable followed by a jump to the end of the function body. The original call to the function is then replaced by this body.
However we're not finished. We must also clone all the children of the function, beta-reducing them as well, and reparent the clones to the calling function.
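A hedged before/after sketch of that step, written in C for concreteness (the names are mine, and the real compiler works on its own intermediate representation, but the shape of the transformation is the one just described):

    /* Before: an ordinary call site. */
    int callee(int y) { if (y < 0) return 0; return y + 1; }
    int caller(int x) { return 2 * callee(x); }

    /* After one inlining step: the callee's body is copied in, its
       parameter is replaced by the argument, and each `return e` becomes
       an assignment to a fresh variable plus a jump to the end of the
       copied body. */
    int caller_inlined(int x) {
        int tmp;
        if (x < 0) { tmp = 0; goto end; }
        tmp = x + 1;
    end:
        return 2 * tmp;
    }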
The cloning operation makes it extremely hard to inline recursive functions. The usual trick of keeping a list of what is already in progress and just checking whether we're already processing this call does not work in its naive form, because the recursive call has now moved into the beta-reduced code being stuffed into the calling function, and the recursion target may have changed to a cloned child. However that child, in calling the parent, is still calling the original parent, which calls its child, and now the unrolling of the recursion will not stop. As mentioned, I broke this regress by only allowing a recursive call to be inlined if its target is a child, preventing sibling recursions from being inlined.
The cost of inlining is further complicated by the need to garbage collect unused functions. Since inlining is potentially exponential, this is essential. If all the calls to a function are inlined, we should get rid of the function if it has not been inlined into yet, otherwise we'll waste time inlining into a function which is no longer used. Actually keeping track of who calls what is extremely difficult, because when inlining we're not working with an actual function representation, but an "unravelled" one: for example, the list of instructions is being processed sequentially and a new list built up, and at any one point in time there may not be a coherent instruction list.
In his ML compiler Steven Weeks chose to use a number of small optimisations applied repeatedly, since this made the optimisations easy to write and easy to control, but unfortunately this misses a lot of optimisation opportunities compared to a recursive algorithm.
Problem #3: when is it safe to inline a function call?
To explain this problem generically: in a lazy functional language, arguments are wrapped in closures and then we can inline an application; this is the standard model for Haskell. However, it also explains why Haskell is so slow. The closures are not required if the argument is known; then the parameter can be replaced directly with its argument wherever it occurs (this is normal-order beta-reduction).
However, if the argument evaluation is known to terminate, eager evaluation can be used instead: the parameter is assigned the value of the expression once, and then reused. A hybrid of these two techniques is to use a closure but cache the result inside the closure object. Still, GHC hasn't succeeded in producing very efficient code: it is clearly very difficult, especially with separate compilation.
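A small C sketch of that hybrid (illustrative names; real Haskell thunks also deal with sharing, exceptions and concurrency): the argument is wrapped in a closure-like thunk, but the result is cached after the first evaluation.

    /* A minimal memoizing thunk: evaluate at most once, then reuse it. */
    typedef struct {
        int  (*compute)(void *env);   /* how to evaluate the argument */
        void *env;                    /* captured environment */
        int   forced;                 /* has it been evaluated yet? */
        int   value;                  /* cached result */
    } thunk;

    static int force(thunk *t) {
        if (!t->forced) {
            t->value  = t->compute(t->env);
            t->forced = 1;
        }
        return t->value;
    }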
In Felix, I took the opposite approach. Instead of demanding correctness and gradually improving efficiency by proving that optimisations preserve semantics, I mandate that the optimisation defines the semantics. This guarantees correct operation of the optimiser at the expense of uncertainty about how certain code will behave. The idea is to provide ways for the programmer to force the optimiser to conform to the intended semantics if the default optimisation strategy is too aggressive.
For example, the default parameter-passing mode allows the compiler to choose whether to wrap the argument in a closure, replace the parameter with the argument, or assign the argument to the parameter. If the programmer wants to force a closure, they can just pass in a closure. If the programmer wants to force eager evaluation, they mark the parameter var.
The complexity here is much greater than in a functional programming language: Felix is a procedural language with variables and pointers. It also has Haskell-style typeclasses. This makes the inlining routine extremely complex; for example, typeclass instances replace abstract functions whenever possible (due to type specialisation when calling a polymorphic function, it may be possible to find an instance whilst inlining, so now we have a new function we can inline).
Just to be clear I have to add some more notes.
Inlining and several other optimisations, such as user-defined term reductions, typeclass instantiations, linear data-flow checks for variable elimination, and tail-recursion optimisation, are all done at once on a given function.
The ordering problem isn't the order to apply different optimisations, the problem is to order the functions.
I use a brain-dead algorithm to detect recursion: I build up a list of everything used directly by each function, compute its transitive closure, and then check if the function is in the result. Note the usage set is built up many times during optimisation, and this is a serious bottleneck.
Whether a function is recursive or not can change unfortunately. A recursive function might become non-recursive after tail rec optimisation. But there is a much harder case: instantiating a typeclass "virtual" function can make what appeared to be non-recursive recursive.
As to sibling calls, the problem is that given f and g, where f calls g and g calls f, I actually want to inline f into g, and g into f .. once. My rule for stopping the infinite regress is, when they're mutually recursive, to only allow inlining of f into g if f is a child of g, which excludes inlining siblings.
Basically I want to "flatten out" all code "as much as possible".
I realize you probably already know all this, but it seems important to still write it in full, at least for further reference.
In the functional community, there is some literature, mostly from the GHC people. Note that they consider inlining as a transformation in the source language, while you seem to work at a lower level. Working in the source language -- or an intermediate language of reasonably similar semantics -- is, I believe, a big help for simplicity and correctness.
GHC Wiki : Inlining (contains a bibliography)
Secrets of the Glasgow Haskell inliner
For the question of ordering compiler passes, this is quite arcane. Still in a Haskell setting, there is the Compilation by Transformation in a Non-strict Functional Language PhD thesis, which discusses the ordering of different compiler passes (and also inlining).
There is also the quite recent paper on Compilation by Equality Saturation, which proposes a novel approach to ordering optimisation passes. I'm not sure it has yet demonstrated applicability at a large scale, but it's certainly an interesting direction to explore.
As for the recursion case, you could use Tarjan's algorithm on your call graph to detect circular dependency clusters and exclude them from inlining. It won't affect sibling calls.
http://en.wikipedia.org/wiki/Tarjan%27s_strongly_connected_components_algorithm

Cons of first class continuations

What are some of the criticisms leveled against exposing continuations as first class objects?
I feel that it is good to have first-class continuations. They allow complete control over the execution flow of instructions. Advanced programmers can develop intuitive solutions to certain kinds of problems. For instance, continuations are used to manage state on web servers. A language implementation can provide useful abstractions on top of continuations, for example green threads.
Despite all these, are there strong arguments against first class continuations?
The reality is that many of the useful situations where you could use continuations are already covered by specialized language constructs: throw/catch, return, C#/Python yield. Thus, language implementers don't really have all that much incentive to provide them in a generalized form usable for roll-your-own solutions.
In some languages, generalized continuations are quite hard to implement efficiently. Stack-based languages (i.e. most languages) basically have to copy the whole stack every time you create a continuation.
Those languages can implement certain continuation-like features, those that don't break the basic stack-based model, a lot more efficiently than the general case, but implementing generalized continuations is quite a bit harder and not worth it.
Functional languages are more likely to implement continuations for a couple of reasons:
They are frequently implemented in continuation passing style, which means the "call stack" is probably a linked list allocated on the heap. This makes it trivial to pass a pointer to the stack as a continuation, since you don't need to overwrite the stack context when you pop the current frame and push a new one. (I've never implemented CPS but that's my understanding of it.)
They favor immutable data bindings, which make your old continuation a lot more useful because you will not have altered the contents of variables that the stack pointed to when you created it.
For these reasons, continuations are likely to remain mostly just in the domain of functional languages.
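To make the first of those reasons concrete, here is a tiny C-flavoured sketch (my own names, and a deliberate simplification of what a real CPS runtime stores): when frames are heap-allocated and linked, capturing the continuation is just keeping a pointer.

    /* Heap-allocated, linked frames instead of a contiguous hardware
       stack.  Old frames are never overwritten, so a saved pointer to one
       stays valid. */
    typedef struct frame {
        struct frame *parent;                             /* caller's frame */
        void (*resume)(struct frame *self, int result);   /* code to continue with */
        /* ... saved locals would go here ... */
    } frame;

    /* Capturing a continuation is copying the current frame pointer;
       invoking it later means calling resume with a value. */
    typedef frame *continuation;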
First up, there is more than just call/cc when it comes to continuations. I suggest starting with Marc Feeley's paper: A Better API for First-Class Continuations.
Next up, I suggest reading about the control operators shift and reset, which are a different way of representing continuations.
A significant objection is implementation cost. If the runtime uses a stack, then first-class continuations require a stack copy at some point. The copy cost can be controlled (see Representing Control in the Presence of First-Class Continuations for a good strategy), but it also means that mutable variables cannot be allocated on the stack. This isn't an issue for functional or mostly-functional (e.g., Scheme) languages, but this adds significant overhead for OO languages.
Most programmers don't understand them. If you have code that uses them, it's harder to find replacement programmers who will be able to work with it.
Continuations are hard to implement on some platforms. For example, JRuby doesn't support continuations.
First-class continuations undermine the ability to reason about code, especially in languages that allow continuations to be imperatively assigned to variables, because the insides of closures can be brought alive again in hairy ways.
Cf. Kent Pitman's complaint about continuations, about the tricky way that unwind-protect interacts with call/cc
Call/cc is the 'goto' of advanced functional programming (a la the example here).
In Ruby 1.8 the implementation was extremely slow; it is better in 1.9, and of course most Schemes have had them built in and performing well from the outset.
