Make-array in SBCL - compilation

How does make-array work in SBCL? Are there some equivalents of new and delete operators in C++, or is it something else, perhaps assembler level?
I peeked into the source, but didn't understand anything.

When using SBCL compiled from source and an environment like Emacs/Slime, it is possible to navigate the code quite easily using M-. (meta-point). Basically, the make-array symbol is bound to multiple things: deftransform definitions and a defun. The deftransforms are used mostly for optimization, so it is better to follow the function first.
The make-array function delegates to an internal %make-array one, which is quite complex: it checks the parameters and, based on them, dispatches to different specialized implementations of arrays: a bit-vector is implemented differently than a string, for example.
If you follow the case for simple-array, you find a function which calls allocate-vector-with-widetag, which in turn calls allocate-vector.
Now, allocate-vector is bound to several objects: multiple defoptimizer forms, a function, and a define-vop form.
The function is only:
(defun allocate-vector (type length words)
  (allocate-vector type length words))
Even if it looks like a recursive call, it isn't.
The define-vop form is a way to define how to compile a call to allocate-vector. Inside that function, and anywhere else there is a call to allocate-vector, the compiler knows how to emit the assembly that implements the built-in operation. The function itself is defined so that there is an entry point with the same name, and a function object that wraps over that code.
define-vop relies on a Domain Specific Language in SBCL that abstracts over assembly. If you follow the definition, you can find different vops (virtual operations) for allocate-vector, like allocate-vector-on-heap and allocate-vector-on-stack.
Allocation on the heap translates into a call to calc-size-in-bytes, a call to allocation and put-header, which most likely allocate the memory and tag it (I followed the definition to src/compiler/x86-64/alloc.lisp).
How memory is allocated (and garbage collected) is another problem.
allocation emits assembly code using %alloc-tramp, which in turn executes the following:
(invoke-asm-routine 'call (if to-r11 'alloc-tramp-r11 'alloc-tramp) node)
There are apparently predefined assembly routines called alloc-tramp-r11 and alloc-tramp. A comment says:
;;; Most allocation is done by inline code with sometimes help
;;; from the C alloc() function by way of the alloc-tramp
;;; assembly routine.
There is a base of C code for the runtime; see for example src/runtime/alloc.c.
The -tramp suffix stands for trampoline.
Also have a look at src/runtime/x86-assem.S.
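To get a concrete picture, here is a rough C sketch of what the heap-allocation fast path conceptually does. The names, sizes and tag values below are illustrative only, not SBCL's actual code:

#include <stdint.h>

#define N_WORD_BYTES 8            /* word size on x86-64 */
#define OTHER_POINTER_LOWTAG 0xf  /* illustrative tag bits for boxed objects */

extern char *alloc_pointer;       /* free pointer of the allocation region */

uintptr_t allocate_vector(uintptr_t widetag, uintptr_t length, uintptr_t words)
{
    /* calc-size-in-bytes: header word + length word + data words,
       rounded up to a double-word (16-byte) boundary. */
    uintptr_t nbytes = (((2 + words) * N_WORD_BYTES) + 15) & ~(uintptr_t)15;

    /* allocation: bump the free pointer; the real code checks against the
       region end and falls back to the C alloc() via alloc-tramp when full. */
    uintptr_t *obj = (uintptr_t *)alloc_pointer;
    alloc_pointer += nbytes;

    /* put-header: store the widetag and the (fixnum-encoded) length. */
    obj[0] = widetag;
    obj[1] = length << 1;

    /* Return the address with the lowtag OR'ed in. */
    return (uintptr_t)obj | OTHER_POINTER_LOWTAG;
}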

What's the difference between inline and block compilation in SBCL?

Several weeks ago, SBCL 2.0.2 was released and brought back the block compilation feature. I have read this article to understand what it is.
I have a question: what's the difference between (declaim (inline some-function)) and block compilation? Is block compilation done automatically by the compiler?
Thanks.
Inline compilation is a specific optimization technique. A function being called is directly integrated into the calling function - usually using its source code - and then compiled.
This means that a function might end up being inlined not just in one function, but in multiple functions.
Advantage: the overhead of calling a function disappears.
Disadvantage: the code size increases, and the calling function(s) need to be recompiled when the inlined function changes and we want this change to become visible. Macros have the same problem.
Block compilation means that a bunch of code gets compiled together with different semantic constraints and that this enables the compiler to do a bunch of new optimizations.
Common Lisp has support in the standard for block compilation of single files. It allows the file compiler to assume that a file is such a block of code.
Example from the Common Lisp standard:
3.2.2.3 Semantic Constraints
A call within a file to a named function that is defined in the same file refers to that function, unless that function has been declared notinline. The consequences are unspecified if functions are redefined individually at run time or multiply defined in the same file.
This allows the code to call a global function and not use the symbol's function cell for the call. Thus this disables late binding for global function calls - in this file and for functions in this file.
It's not said how this can be achieved, but the compiler might just allocate the code somewhere and the calls just jump there.
So this part of block compilation is defined in the standard and some compilers are doing that.
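As an illustration in C (an analogy only, not how SBCL implements it): a late-bound call goes through a mutable function pointer, much like a call through a symbol's function cell, while a block-compiled call can jump directly to code the compiler already knows about.

#include <stdio.h>

/* Late binding: the call goes through a mutable "function cell".
   Reassigning the pointer (redefining the function) changes all
   future calls made through it. */
static int add1_v1(int x) { return x + 1; }
static int (*add1_cell)(int) = add1_v1;

static int caller_late(int x) { return add1_cell(x); }

/* Block compilation analogue: the compiler knows the callee is defined
   in the same block, so it calls it directly; reassigning add1_cell
   later has no effect on this call. */
static int caller_direct(int x) { return add1_v1(x); }

int main(void)
{
    printf("%d %d\n", caller_late(1), caller_direct(1));
    return 0;
}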
Block compilation for multiple files
If the file compiler can use block compilation for one file, then what about multiple files? A few compilers can also tell the file compiler that several files make up a block for compilation. CMUCL does that. SBCL was derived and simplified from CMUCL and has lacked this feature until now. I think Lucid Common Lisp (which is no longer actively sold) supported something like that, too.
Might be useful to add this to SBCL, too.

Documentation for architecture-specific Golang function

I have a function that I would like to provide an assembly implementation for
on amd64 architecture. For the sake of discussion let's just suppose it's an
Add function, but it's actually more complicated than this. I have the
assembly version working but my question concerns getting the godoc to display
correctly. I have a feeling this is currently impossible, but I wanted to seek
advice.
Some more details:
The assembly implementation of this function contains only a few
instructions. In particular, the mere cost of calling the function is a
significant part of the entire cost.
It makes use of special instructions (BMI2) and therefore can only be used
following a CPUID capability check.
The implementation is structured like this gist. At a high level:
In the generic (non-amd64 case) the function is defined by delegating to
addGeneric.
In the amd64 case the function is actually a variable, initially set to
addGeneric but replaced by addAsm in the init function if a cpuid
check passes.
This approach works. However the godoc output is crappy because in the
amd64 case the function is actually a variable. Note godoc appears to be
picking up the same build tags as the machine it's running on. I'm not sure
what godoc.org would do.
Alternatives considered:
The Add function delegates to addImpl. Then we pull a similar trick
to replace addImpl in the amd64 case. The problem with this is that (in
my experiments) Go doesn't seem to be able to inline the call, and the
assembly is now wrapped in two function calls. Since the assembly is so
small already, this has a noticeable impact on performance.
In the amd64 case we define a plain function Add that has the useAsm
check inside it, and calls one of addGeneric and addAsm depending on the
result. This would have an even worse impact on performance.
So I guess the questions are:
Is there a better way to structure the code to achieve the performance I
want, and have it appear properly in documentation?
If there is no alternative, is there some other way to "trick" godoc?
See math.Sqrt for an example of how to do this.
Write a stub function with the documentation
Write a generic implementation as an unexported function.
For each architecture, write a function in assembler that jumps to the unexported generic implementation or implements the function directly.
To handle the cpuid check, set a package variable in init() and conditionally jump based on that variable in the assembly implementation.

when to free a closure's memory in a lisp interpreter

I'm writing a simple lisp interpreter from scratch. I have a global environment that top-level variables are bound in during evaluation of all the forms in a file. When all the forms in the file have been evaluated, the top-level env and all of the key-value data structs inside of it are freed.
When the evaluator encounters a lambda form, it creates a PROC object that contains 3 things: a list of arguments to be bound in a local frame when the procedure is applied, the body of the function, and a pointer to the environment it was created in. For example:
(lambda (x) x)
would produce something internally like:
PROC- args: x,
      body: x,
      env:  pointer to top level env
When the PROC is applied, a new environment is created for the frame and the local bindings are staged there to allow the body to be evaluated with the appropriate bindings. This frame environment contains a pointer to its enclosing (closure) environment to allow variable lookup inside of THAT. In this case, that would be the global environment. After the PROC body is evaluated, I can free all the cells associated with it, including its frame environment, and exit with no memory leaks.
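For concreteness, here is a rough C sketch of the kind of structures being described; the names and fields are hypothetical, not taken from the actual interpreter:

/* Hypothetical interpreter structures. Binding and Obj are assumed
   to be defined elsewhere. */
typedef struct Env {
    struct Binding *bindings;   /* key/value pairs bound in this frame */
    struct Env     *enclosing;  /* environment this frame was created in */
} Env;

typedef struct Proc {
    struct Obj *params;  /* argument names to bind when applied */
    struct Obj *body;    /* body form(s) to evaluate */
    Env        *env;     /* environment the lambda was created in */
} Proc;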
My problem is with higher order functions. Consider this:
(define conser
  (lambda (x)
    (lambda (y) (cons x y))))
A function that takes one argument and produces another function that will cons that argument to something you pass into it. So,
(define aconser (conser '(1)))
Would yield a function that cons'es '(1) to whatever is passed into it. ex:
(aconser '(2)) ; ((1) 2)
My problem here is that aconser must retain a pointer to the environment it was created in, namely that of conser when it was produced via the invocation (conser '(1)). When the PROC aconser is applied, its frame must point to the frame of conser that existed when aconser was defined, so I can't free the frame of conser after applying it. I don't know the best way to both free the memory associated with a lambda frame when it is applied and also support this kind of persistent higher-order function.
I can think of some solutions:
some type of ARC
copying the enclosing environment into the frame of the evaluated PROC when it is produced
This seems to be what is being implied here. So, instead of saving a pointer in the PROC object to its closure, I would... copy the closure environment and store a pointer to that directly in the cell? Would this not just be kicking the can one level deeper and result in the same problem?
recursively substituting the labels at read time inside of the body of the higher order function
I am worried I might be missing something very simple here, and I am also curious how this is supported in other implementations of Lisp and in other languages with closures in general. I have not had much luck searching for answers because the question is very specific, perhaps even to this implementation (which I am admittedly just pulling out of my hat as a learning project), and much of what I am able to find simply explains the particulars of closures from the perspective of the language being implemented, not from the perspective of the language the interpreter is written in.
Here is a link to the relevant line in my source, if it is helpful, and I am happy to elaborate if this question is not detailed enough to describe the problem thoroughly. Thanks!
The way this is handled usually in naive interpreters is to use a garbage-collector (GC) and allocate your activation frames in the GC'd heap. So you never explicitly free those frames, you let the GC free them when applicable.
In more sophisticated implementations, you can use a slightly different approach:
when a closure is created, don't store a pointer to the current environment. Instead, copy the values of those variables which are used by the closure (these are called the free variables of the lambda),
and change the closure's body to use those copies rather than look those variables up in the environment. This is called closure conversion.
Now you can treat your environment as a normal stack, and free activation frames as soon as you exit a scope.
You still need a GC to decide when closures can be freed.
this in turn requires an "assignment conversion": copying the value of variables implies a change of semantics if those variables get modified. To recover the original semantics, you need to find those variables which are both "copied into a closure" and "modified", and turn them into "reference cells" (e.g. a cons cell where you keep the value in the car), so that the copy no longer duplicates the value, but just copies a reference to the one place where the value is kept. [Side note: such an implementation obviously implies that avoiding setq and using a more functional style may end up being more efficient.]
The more sophisticated implementation also has the advantage that it can provide a safe for space semantics: a closure will only hold on to data to which it actually refers, contrary to the naive approach where closures end up referring to the whole surrounding environment and hence can prevent the GC from collecting data that is not actually referenced but just happened to be in the environment at the time it was captured by the closure.
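A minimal C sketch of this flat-closure representation (names are illustrative, and Value and make_cons are assumed helpers):

typedef struct Value Value;
typedef struct Closure Closure;

struct Closure {
    /* Code pointer: receives the closure itself plus the arguments. */
    Value *(*code)(Closure *self, Value **args);
    int     n_free;
    Value  *free_vars[];  /* copied free-variable values (or ref cells) */
};

extern Value *make_cons(Value *car, Value *cdr);

/* Body of (lambda (y) (cons x y)) after closure conversion: x is no
   longer looked up in an environment chain; it is read directly out
   of the closure record. */
static Value *conser_body(Closure *self, Value **args)
{
    Value *x = self->free_vars[0];  /* captured copy of x */
    Value *y = args[0];
    return make_cons(x, y);
}

Because conser_body reaches only into its own closure record, the activation frame of conser can be popped as soon as conser returns.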

How is Ruby's throw-catch implemented?

In ruby you can throw :label so long as you've wrapped everything in a catch(:label) do block.
I want to add this to a custom lispy language but I'm not sure how it's implemented under the hood. Any pointers?
This is an example of a non-local exit. If you are using your host language's (in this case C) call stack for function calls in your target language (so e.g. a function call in your lisp equates to a function call in C), then the easiest way is to use your host language's form of non-local exit. In C that means setjmp/longjmp.
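For illustration, here is a toy sketch of the setjmp/longjmp approach in C. It keeps a single catch frame; a real interpreter would maintain a stack of frames keyed by tag:

#include <setjmp.h>
#include <stdio.h>

/* One frame per dynamically enclosing catch block. */
static jmp_buf catch_frame;
static int thrown_value;

static void do_throw(int value)
{
    thrown_value = value;
    longjmp(catch_frame, 1);  /* unwind straight to the matching catch */
}

int main(void)
{
    if (setjmp(catch_frame) == 0) {
        /* the body of the catch block */
        do_throw(42);
        printf("never reached\n");
    } else {
        printf("caught: %d\n", thrown_value);  /* prints: caught: 42 */
    }
    return 0;
}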
If, however, you are maintaining your target language's call stack separately, then you have many options for how to do this. One really simple way would be to have each lexical-scope exit yield two values: the actual value returned, and an exception state, if any. Then you can check for the exception at runtime and propagate this value up. This has the downside of incurring extra cost on function calls when no condition is signaled, but may be sufficient for a toy language.
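A sketch of this second approach in C, with illustrative names; every evaluator function returns the ordinary result together with a thrown flag that callers must check and propagate:

typedef struct {
    int value;   /* the ordinary result */
    int thrown;  /* nonzero if a throw is propagating upward */
    int tag;     /* which catch tag was thrown, if any */
} EvalResult;

static EvalResult eval_throw(int tag, int value)
{
    EvalResult r = { value, 1, tag };
    return r;
}

/* Every evaluation step checks the flag and propagates it upward
   until a catch with a matching tag intercepts it. */
static EvalResult eval_sequence(void)
{
    EvalResult r = eval_throw(7, 42);  /* say a subexpression throws */
    if (r.thrown)
        return r;  /* propagate instead of continuing evaluation */
    return r;
}

static EvalResult eval_catch(int tag)
{
    EvalResult r = eval_sequence();
    if (r.thrown && r.tag == tag)
        r.thrown = 0;  /* intercepted: resume normal returns */
    return r;
}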
The book "Lisp In Small Pieces" covers about a half-dozen ways of handling this, if you're interested.

Compiling Fortran external symbols

When compiling Fortran code into object files, how does the compiler determine the symbol names?
When I use the intrinsic function "getarg", the compiler converts it into a symbol called "_getarg@12".
I looked in the external libraries and found that the symbol name inside is called "_getarg@16". What is the significance of the "@[number]" at the end of "getarg"?
_name@length is highly Windows-specific name mangling applied to the names of routines that obey the stdcall calling convention (__stdcall, by the name of the keyword used in C), a variant of the Pascal calling convention. This is the calling convention used by all Win32 API functions, and if you look at the export tables of DLLs like KERNEL32.DLL and USER32.DLL you'd see that all symbols are named like this.
The _...@length decoration gives the number of bytes occupied by the routine's arguments. This is necessary since in the stdcall calling convention it is the callee who cleans the arguments off the stack, and not the caller as is the case with the C calling convention. When the compiler generates a call to func with two 4-byte arguments, it puts a reference to _func@8 in the object code. If the real func happens to have a different number or size of arguments, its decorated name would be something different, e.g. _func@12, and hence a link error would occur. This is very useful with dynamic libraries (DLLs). Imagine that a DLL was replaced with another version where func takes one additional argument. If it weren't for the name mangling (the technical term for prepending _ and appending @length to the symbol name), the program would still call into func with the wrong arguments, and then func would increment the stack pointer with more bytes than the size of the passed argument list, thus breaking the caller. With name mangling in place, the loader would not launch the executable at all, since it would not be able to resolve the reference to _func@8.
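For example, with a 32-bit Windows C compiler (func here is a made-up name):

/* Compiled as 32-bit Windows code, this declaration produces the
   decorated symbol _func@8: two 4-byte arguments, cleaned up by
   the callee. */
int __stdcall func(int a, int b);

/* The C calling convention would instead produce the undecorated
   _func, with the caller cleaning up the arguments. */
int __cdecl func2(int a, int b);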
In your case it looks like the external library is not really intended to be used with this compiler, or you are missing some pragma or compiler option. The getarg intrinsic takes two arguments: one integer and one assumed-size character array (string). Some compilers pass the character array size as an additional argument. With 32-bit code this would result in 2 pointers and 1 integer being passed, totalling 12 bytes of arguments, hence the _getarg@12. The _getarg@16 could be, for example, a 64-bit routine with strings being passed by some kind of descriptor.
As IanH reminded me in his comment, another reason for this naming discrepancy could be that you are calling getarg with fewer arguments than expected. Fortran has the peculiar feature of "prototypeless" routine calls: Fortran compilers can generate calls to routines without actually knowing their signatures, unlike in C/C++ where an explicit signature has to be supplied in the form of a function prototype. This is possible since in Fortran all arguments are passed by reference and pointers are always the same size, no matter the actual type they point to. In this particular case the stdcall name mangling plays the role of a very crude argument-checking mechanism. If it weren't for the mangling (e.g. on Linux with GNU Fortran, where such decorations are not employed, or if the default calling convention were cdecl), one could call a routine with a different number of arguments than expected and the linker would happily link the object code into an executable that would then most likely crash at run time.
This is totally implementation dependent. You did not say which compiler you use. The (nonstandard) intrinsic can exist in multiple versions for different integer or character kinds. There can also be multiple versions of the runtime libraries for different computer architectures (e.g. 32-bit and 64-bit).
