How to prevent Closure Compiler from renaming any property or method of a specific object? - extern

I am working with a huge third-party library (Babylon JS) that will be served from its own CDN and cannot be included in my Closure Compiler run.
The library exposes a single object, and everything else is defined as a part of it.
It has no externs file available so I started to write one but it is growing quickly. It would be easier to just tell Closure Compiler to not mangle any properties I am setting, including the ones I am setting on objects created by constructors on the object.
The Closure Compiler has no feature that would allow you to say "don't rename any property on this object" except to disable property renaming entirely. The general idea is that it would be very easy for an "unrenamable object" to leak into a loosely typed value ('unknown', Object, etc) and disable renaming for the entire program. And that would make maintaining the expected optimizations for larger projects difficult. However, that is certainly something I would like the team to revisit at some point.


Common lisp best practices for splitting code between files

I'm moderately new to Common Lisp, but have extensive experience with other 'separate compilation' languages (think C/C++/FORTRAN and such).
I know how to do an ASDF system definition. I know how to separate stuff in packages. I'm using SBCL, by the way.
The question is this: what's the best practice for splitting code (large packages) between .lisp files? I mean, in C there are include files, while lisp lives with the current image state. So with multiple files I need to handle dependencies or serial order in the system definition. But without something like forward declarations it's painful.
A simple example of what I want to do: I have, for example, two defstructs that are part of the same bigger data structure (like struct1 is a parent of some set of struct2). Some functions work on one, some work on the other, and some use both.
So I would have: a packages.lisp, a fun1.lisp (with the first defstruct and related functions), a fun2.lisp (with the other defstruct and functions) and a funmix.lisp (with functions that use both). In an ideal world everything is sealed and compiling these in this order would be fine. As most of you know, in practice this almost never happens.
If I need to use struct2 functions from the struct1 ones I would need to either reorder or add a dependency. But then if there's some kind of back call (that can't be done with a closure) I would have struct1.lisp depending on struct2.lisp and vice versa, which is obviously not valid. So what? I could break the loop by putting the defstructs in a separate file (say, structs.lisp), but what if either struct's functions need to access the common functions in the third file? I would like to avoid style notes.
What's the common way to solve this, i.e. keeping loosely related code in the same file while still being able to interface with the other files? Is the correct solution to seal everything in a compilation unit (a single file)? Or to use a package for every file, with exports?
Lisp dependencies are simple, because in many cases, a Lisp implementation doesn't need to process the definition of something in order to compile its use.
Some exceptions to the rule are:
Macros: macros must be loaded in order to be expanded. There is a compile-time dependency between a file which uses a macro and the file which defines it.
Packages: a package foo must be defined in order to use symbols like foo:bar or foo::priv. If foo is defined by a defpackage form in some foo.lisp file, then that file has to be loaded (either in source or compiled form).
Constants: constants defined with defconstant should be seen before their use. Similar remarks apply to inline functions, compiler macros.
Any custom things in a "domain specific language" which enforces definition before use. E.g. if Whizbang Inference Engine needs rules to be defined when uses of the rules are compiled, you have to arrange for that.
For certain diagnostics to be suppressed, like calls to undefined functions, the defining and using files must be treated as a single compilation unit. (See below.)
All the above remarks also have implications for incremental recompilation.
When there is dependency like the above between files so that one is a prerequisite of the other, when the prerequisite is touched, the dependent one must be recompiled.
How to split code into files is going to be influenced by all the usual things: cohesion, coupling and what have you. One Common-Lisp-specific reason to keep certain things together in one file is inlining: a call to a function which is in the same file as the caller may be inlined. If your program supports any in-service upgrade, the granularity of code loading is individual files. If some functions foo and bar should be independently redefinable, don't put them in the same file.
Now about compilation units. Suppose you have a file foo.lisp which defines a function called foo and bar.lisp which calls (foo). If you just compile bar.lisp, you will likely get a warning that an undefined function foo has been called. You could compile foo.lisp first and then load it, and then compile bar.lisp. But that will not work if there is a circular reference between the two: say foo.lisp also calls (bar) which bar.lisp defines.
In Common Lisp, you can defer such warnings to the end of a compilation unit, and what defines a compilation unit isn't a single file, but a dynamic scope established by a macro called with-compilation-unit. Simply put, we can do this:
(with-compilation-unit ()
  (compile-file "foo.lisp")  ;; contains (defun foo () (bar))
  (compile-file "bar.lisp")) ;; contains (defun bar () (foo))
If a compile-file isn't surrounded by with-compilation-unit then there is a compilation unit spanning that file. Otherwise, the outermost nesting of the with-compilation-unit macro determines the scope of what is in the compilation unit.
Warnings about undefined functions (and such) are deferred to the end of the compilation unit. So by putting foo.lisp and bar.lisp compilation into one unit, we suppress the warnings about either foo or bar not being defined and we can compile the two in any order.
Build systems use with-compilation-unit under the hood, as appropriate.
The compilation unit isn't about dependencies but diagnostics. Above, we don't have a compile time dependency. If we touch foo.lisp, bar.lisp doesn't have to be recompiled or vice versa.
By and large, Lisp codebases don't have a lot of hard dependencies among the files. Incremental compilation often means that just the affected files that were changed have to be recompiled. The C or C++ problem that everything has to be rebuilt because a core header file was touched is essentially nonexistent.
"but what if"
No matter how you first organize your code, if you change it significantly you are going to have to refactor. IMO there is no ideal way of grouping dependencies in advance.
As a rule of thumb it is generally safe to define generic functions first, then types, then actual methods, for example. For non-generic functions, you can cut circular dependencies by adding forward declarations:
(declaim (ftype function ...))
Having too much circular dependency is a bit of a code smell.
"Is the correct solution to seal everything in a compilation unit"
Yes, if you group the definitions in the same compilation unit (the same file), the file compiler will be able to silence the style notes until it reaches the end of file: at this point it knows if there are still missing references or if all the cross-references are resolved.
"But then if there's some kind of back call (that can't be done with a closure)"
If you have a specific example in mind please share, but typically you can define struct1 and its functions in a way that can be self-contained; maybe it can accept a map that binds event names to callbacks:
(make-struct-1 :callbacks (list :on-empty one-is-empty
                                :on-full one-is-full))
Similarly, struct2 can accept callbacks too (Dependency Injection) and the main struct ties them using closures (?).
Alternatively, you can design your data structures so that they signal conditions, and then in the caller code you intercept them to bind things together.

Is there a way to somehow group several symbols from different files together so that if one of them is referenced, all are linked?

A bit of context first.
I'm currently working on a modular (embedded) microOS with drivers, ports, partitions and other such objects represented as structures, with distinct "operations" structures containing pointers to what you would call their methods. Nothing fancy there.
I have macros and linker script bits so that all objects of a given type (say, all drivers), though their definitions are scattered across source files, are laid out as an array somewhere in flash, but in a way that lets the linker (I work with GNU GCC/LD) garbage-collect those that aren't explicitly referenced in the code.
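As an illustration of the kind of layout described above, here is a minimal sketch for a hosted GNU toolchain. It is not the macros or linker script from my project: it assumes GNU ld's automatic __start_<name> / __stop_<name> symbols, which exist only for sections whose name is a valid C identifier, and every name in it (struct driver, REGISTER_DRIVER, driver_table, the uart and spi drivers) is hypothetical. Note also that the "used" attribute forces the compiler to emit every entry, so this sketch shows the scattered-definitions-as-array part only, not the selective garbage collection.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical driver descriptor; names are illustrative only. */
struct driver {
    const char *name;
    int (*probe)(void);
};

/* Put a pointer to the descriptor into the "driver_table" section.
 * Storing pointers rather than structs keeps the table stride at
 * sizeof(void *), whatever alignment the compiler picks for the structs. */
#define REGISTER_DRIVER(drv)                                   \
    static const struct driver *const drv##_entry              \
        __attribute__((used, section("driver_table"))) = &(drv)

static int uart_probe(void) { return 1; }
static int spi_probe(void)  { return 2; }

static const struct driver uart_driver = { "uart", uart_probe };
static const struct driver spi_driver  = { "spi",  spi_probe  };

REGISTER_DRIVER(uart_driver);
REGISTER_DRIVER(spi_driver);

/* Assumption: GNU ld provides these bounds for the section named above. */
extern const struct driver *const __start_driver_table[];
extern const struct driver *const __stop_driver_table[];

/* Number of registered drivers, computed from the section bounds. */
size_t driver_count(void)
{
    return (size_t)(__stop_driver_table - __start_driver_table);
}

/* Call every registered probe and return the sum of the results. */
int probe_all(void)
{
    int sum = 0;
    for (const struct driver *const *p = __start_driver_table;
         p < __stop_driver_table; ++p) {
        assert(*p != NULL);
        sum += (*p)->probe();
    }
    return sum;
}
```

With the two registrations above, driver_count() returns 2 and probe_all() returns 3. In a real project the REGISTER_DRIVER lines would live in separate source files and the linker would concatenate the entries.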
However, after a few years of refining the system and increasing its flexibility, I have come to a point where it is too flash-greedy for small to medium microcontrollers. (I work only with 32-bit architectures, nothing too small.) It was to be expected, you might say, but I'm trying to go further and do better, and currently I doubt LD will let me do it.
What I would like is for methods/functions which aren't used by the code to be garbage-collected too. This isn't currently the case, since they are all referred to by the pointer structures of the objects owning them. I would like to avoid using macro configuration switches and having the application developer go through several layers of code to determine which functions are currently used and which ones can be safely disabled. I would really, really like the linker's garbage collection to automate the whole process for me.
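To make the problem concrete, here is a minimal sketch with hypothetical names (struct port_ops, demo_port_ops, demo_probe, demo_read); it is not code from my project. The operations table takes the address of every method, so as soon as the table itself is live, each method is reachable from it, and the linker's garbage collection keeps it even if no code ever calls it through the table.

```c
#include <assert.h>
#include <stddef.h>

struct port;

/* Hypothetical "operations" structure: one pointer per method. */
struct port_ops {
    int (*probe)(struct port *port);
    size_t (*read)(struct port *port, unsigned char *buf, size_t len);
};

struct port {
    const struct port_ops *ops;
    int present;
};

static int demo_probe(struct port *port)
{
    port->present = 1;
    return 0;
}

/* Never called in this sketch, yet still referenced by the table below. */
static size_t demo_read(struct port *port, unsigned char *buf, size_t len)
{
    (void)port;
    for (size_t i = 0; i < len; ++i)
        buf[i] = 0;
    return len;
}

/* Taking both addresses here makes both functions reachable from the
 * table, so section garbage collection keeps demo_read whenever the
 * table is kept. */
const struct port_ops demo_port_ops = {
    .probe = demo_probe,
    .read  = demo_read,
};

/* Wrapping function: the only entry point the application uses. */
int port_probe(struct port *port)
{
    assert(port != NULL && port->ops != NULL);
    return port->ops->probe(port);
}
```

An application that only ever calls port_probe() on a port using demo_port_ops still pays for demo_read in flash, which is exactly what I am trying to avoid.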
First I thought I could split each methods structure between sections, so that, say, all the ports' "probe" methods end up in a section called .struct_probe, and that the wrapping port_probe() function could reference a zero-length object inside that function, so that all ports' probe references get linked if and only if the port_probe() function is called somewhere.
But I was wrong: for the linker, the "input sections", which are its unit of garbage collection (since at this point there is no more alignment information inside them, so it cannot take advantage of a symbol, and its associated object, being removed by reordering the contents of the containing section and shrinking it), aren't identified solely by a section name, but by a section name and a source file. So if I implement what I intended, none of my methods will get linked into the final executable, and I will be toast.
That's where I'm at currently, and frankly I'm quite at a loss. I wondered if maybe someone here would have a better idea for either having each method "backward reference" the wrapping function or some other object which would in turn be referenced by the function and take all methods along, or as the title says, somehow group those methods / sections (without gathering all the code in a single file, please) so that referencing one means linking them all.
My gratitude for all eternity is on the line, here. ;)
Since I have spent some time reading up on and experimenting with the following lead, even though without success, I would like to present here what I found.
There's a feature of ELF called "group sections", which are used to define section groups, such that if one member section of the group is live (hence linked), all are.
I hoped this was the answer to my question. TL;DR: It wasn't, because group sections are meant to group sections inside a module. Actually, the only type of groups currently defined is COMDAT groups, which are by definition exclusive from groups with the same name defined in other modules.
Documentation on that feature and its implementation(s) is scarce to say the least. Currently, the standard's definition of group sections can be found here.
GCC doesn't provide a construct to manipulate section groups (or any kind of section flags/properties for that matter). The GNU assembler's documentation specifies how to assign a section to a group here.
I've found no mention in any GNU document of LD's handling of group sections. It is mentioned here and here, though.
As a bonus, I've found a way to specify sections properties (including grouping) in C code with GCC. This is a dirty hack, so it may be it won't work anymore by the time you read this.
Apparently, when you write
int bar __attribute__((section("<name>")));
GCC takes what's between the quotes and blindly pastes it as is into the assembly output:
.section <name>,"aw"
(The actual flags can differ if the name matches one of a few predefined patterns.)
From there, it's all a matter of code injection. If you write
int bar __attribute__((section("<name>,\"awG\",%progbits,<group> //")));
you get
.section <name>,"awG",%progbits,<group> //"aw"
and the job is done. If you wonder why simply characterizing the section in a separate inline assembly statement isn't enough, if you do that you get an empty grouped section and a stuffed solitary section with the same name, which won't have any effect at link time. So.
This isn't entirely satisfying, but for lack of a better way, that's what I went for:
It seems the only way you have to effectively merge sections from several compilation units from the linker's point of view is to first link the resulting objects together in one big object, then link the final program using that big object instead of the small ones. Sections that had the same name in the small objects will be merged in the big one.
This is a bit dirty, though, and has also some drawbacks, such as perhaps merging some sections you wouldn't want to be, for garbage collecting purposes, and hiding which file each section comes from (though the information remains in the debug sections) if, say, you wanted to split the main sections (.text, .data, .bss...) in the final ELF so as to be able to see what file contributes what amount to flash and RAM usage, for instance.

Will Go compilers ignore unused functions

If there is a function from an external package that is not used at all in my project, will the compiler remove the function from the generated machine code?
This question could be asked about compilers for any language. But I think the behaviour may vary from language to language, so I am interested in knowing what Go compilers do.
I would appreciate any help on understanding this.
The language spec does not mention this anywhere, and from a correctness point of view this is irrelevant.
But know that the current version does remove certain constructs that the compiler can prove are not used and whose removal will not change the runtime behaviour of the app.
Quoting from The Go Blog: Smaller Go 1.7 binaries:
The second change is method pruning. Until 1.6, all methods on all used types were kept, even if some of the methods were never called. This is because they might be called through an interface, or called dynamically using the reflect package. Now the compiler discards any unexported methods that do not match an interface. Similarly the linker can discard other exported methods, those that are only accessible through reflection, if the corresponding reflection features are not used anywhere in the program. That change shrinks binaries by 5–20%.
Methods are a "harder" case than functions because methods can be listed and called with reflection (unlike functions), but the Go tools do what they can to remove unused methods too.
You can see examples and proof of removed / unlinked code in this answer:
How to remove unused code at compile time?

How to avoid implicit include statements in Rhapsody code generation

I'm creating code for interfaces specified in IBM Rational Rhapsody. Rhapsody implicitly generates include statements for other data types used in my interfaces. But I would like to have more control over the include statements, so I specify them explicitly as text elements in the source artifacts of the component. Therefore I would like to prevent Rhapsody from generating the include statements itself. Is this possible?
If this can be done, it is most likely with Properties. In the feature box click on properties and filter by 'include' to see some likely candidates. Not all of the properties have descriptions of what exactly they do, so good luck.
I spent some time looking through the properties as well and could not find any that do what you want. It seems likely you cannot do this with the basic version of Rhapsody. IBM does license an add-on to customize the code generation, called Rules Composer (I think); this would almost certainly allow you to customize the includes, but at quite a cost.
There are two other possible approaches. Depending on how you are customizing the include statements you may be able to write a simple shell script, perhaps using sed, and then just run that script to update your code every time Rhapsody generates it.
The other approach would be to use the Rhapsody API to create a plugin/tool that iterates through all the interfaces and changes the source artifacts accordingly. I have not tried this method myself but I know my coworkers have used the API to do similar things.
Finally I found the properties that let Rhapsody produce the required output: GenerateImplicitDependencies for several elements and GenerateDeclarationDependency for Type elements. Disabling these will avoid the generation of implicit include statements.

Is there any downside to redundant qualifiers? Any benefit?

For example, referencing something as System.Data.Datagrid as opposed to just Datagrid. Please provide examples and explanation. Thanks.
The benefit is that you don't need to add an import for everything you use, especially if it's the only thing you use from a particular namespace. It also prevents collisions.
The downside, of course, is that the code balloons out in size and gets harder to read the more you use specific qualifiers.
Personally I tend to use imports for most things unless I know for sure I will only be using something from a particular namespace once or twice, so it won't impact the readability of my code.
You're being very explicit about the type you're referencing, and that is a benefit. In the very same process, though, you're giving up code clarity, which is clearly a downside in my case, as I want code to be readable and understandable. I go for the short version unless I have a conflict between namespaces which can only be solved by explicitly qualifying the classes, or unless I make an alias for it with the keyword using:
using Datagrid = System.Data.Datagrid;
Actually the full path is global::System.Data.DataGrid. The point of using a more qualified path is to avoid having to use additional using statements, especially if the introduction of another using will cause problems with type resolution. More fully qualified identifiers exist so that you can be explicit when you need to be explicit, but if the class's namespace is clear, then the DataGrid version is clearer to many.
I generally use the shortest form available in order to keep the code as clean and readable as possible. That's what using directives are for, after all, and tooltips in the VS editor give you instant detail on the provenance of a type.
I also tend to use a namespace tag for RCWs in a COM interop layer, to call out those variables explicitly in the code (they may need special attention on lifecycle and collection), eg
using _Interop = Some.Interop.Namespace;
In terms of performance there is no upside/downside. Everything is resolved at compile time and the generated MSIL is identical whether you use fully-qualified names or not.
The reason why its use is prevalent in the .NET world is because of auto-generated code, such as designer markup. In that case it would be better to fully-qualify names like class names because of possible conflicts with other classes you may have in your code.
If you have a tool like ReSharper, it will actually tell you what fully-qualified references you have are unnecessary (e.g. by graying them out) so you can lop them off. If you frequently cut-paste code across your various code bases, it would be a must to fully qualify them. (then again, why would you want to do cut-paste all the time; it's a bad form of code reuse!)
I don't think there is really a downside, just readability vs. actual time spent coding. In general, if you don't have namespaces with ambiguous objects, I don't think it's really needed. Another thing to consider is level of use. If you have one method that uses reflection and you are all right with typing System.Reflection 10 times, then it's not a big deal, but if you plan on using a namespace a lot then I would recommend an include.
Depending on your situation, extra qualifiers will generate a warning (if this is what you mean by redundant). If you then treat warnings as errors, that's a pretty serious downside.
I've run into this with GCC for example.
struct A {
int A::b; // warning!
