If there is a function from an external package that is not used at all in my project, will the compiler remove the function from the generated machine code?
This question could be asked of any language's compiler in general, but I think the behaviour may vary from language to language, so I am interested in knowing what the Go compilers do.
I would appreciate any help on understanding this.
The language spec does not mention this anywhere, and from a correctness point of view this is irrelevant.
But know that the current version does remove certain constructs that the compiler can prove are unused and whose removal will not change the runtime behaviour of the app.
Quoting from The Go Blog: Smaller Go 1.7 binaries:
The second change is method pruning. Until 1.6, all methods on all used types were kept, even if some of the methods were never called. This is because they might be called through an interface, or called dynamically using the reflect package. Now the compiler discards any unexported methods that do not match an interface. Similarly the linker can discard other exported methods, those that are only accessible through reflection, if the corresponding reflection features are not used anywhere in the program. That change shrinks binaries by 5–20%.
Methods are a "harder" case than functions because methods can be listed and called via reflection (unlike functions), but the Go tools do what they can to remove unused methods too.
You can see examples and proof of removed / unlinked code in this answer:
How to remove unused code at compile time?
Also see other relevant questions:
Splitting client/server code
Call all functions with special prefix or suffix in Golang
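For a quick experiment of your own, here is a minimal sketch (the function name is made up) that you can build and then inspect with go tool nm — if grep finds no symbol, the linker pruned the function:

// main.go -- build with: go build -o app .
// then check: go tool nm app | grep unusedHelper
package main

import "fmt"

// unusedHelper is never called anywhere, so the linker is free to drop it.
func unusedHelper() string {
    return "never linked in"
}

func main() {
    fmt.Println("only main and what it reaches survives")
}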
I'm moderately new to Common Lisp, but have extensive experience with other 'separate compilation' languages (think C/C++/FORTRAN and such).
I know how to do an ASDF system definition. I know how to separate stuff in packages. I'm using SBCL, by the way.
The question is this: what's the best practice for splitting code (large packages) between .lisp files? I mean, in C there are include files, while Lisp lives with the current image state, so with multiple files I need to handle dependencies or serial order in the system definition. But without something like forward declarations it's painful.
A simple example of what I want to do: I have two defstructs that are part of the same bigger data structure (say, struct1 is a parent of some set of struct2). Some functions work on one, some on the other, and some use both.
So I would have: a packages.lisp, a fun1.lisp (with the first defstruct and related functions), a fun2.lisp (with the other defstruct and functions) and a funmix.lisp (with functions that use both). In an ideal world everything is sealed and compiling these in this order would be fine. As most of you know, in practice this almost never happens.
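For reference, the system definition would be something along these lines (a rough sketch):

(asdf:defsystem "mysystem"
  :serial t  ; compile and load the files in the listed order
  :components ((:file "packages")
               (:file "fun1")
               (:file "fun2")
               (:file "funmix")))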
If I need to use struct2 functions from the struct1 ones, I would need to either reorder or add a dependency. But then if there's some kind of back call (that can't be done with a closure), I would have struct1.lisp depending on struct2.lisp and vice versa, which is obviously not valid. So what? I could break the loop by putting the defstructs in a separate file (say, structs.lisp), but what if either struct's functions need to access the common functions in the third file? I would like to avoid style notes.
What's the common way to solve this, i.e. to keep loosely related code in the same file while still being able to interface with the other files? Is the correct solution to seal everything in a single compilation unit (a single file)? Or to use a package for every file, with exports?
Lisp dependencies are simple, because in many cases, a Lisp implementation doesn't need to process the definition of something in order to compile its use.
Some exceptions to the rule are:
Macros: macros must be loaded in order to be expanded. There is a compile-time dependency between a file which uses macros and the file which defines them.
Packages: a package foo must be defined in order to use symbols like foo:bar or foo::priv. If foo is defined by a defpackage form in some foo.lisp file, then that file has to be loaded (either in source or compiled form).
Constants: constants defined with defconstant should be seen before their use. Similar remarks apply to inline functions and compiler macros.
Any custom things in a "domain specific language" which enforce definition before use. E.g. if the Whizbang Inference Engine needs rules to be defined when uses of the rules are compiled, you have to arrange for that.
For certain diagnostics to be suppressed, like calls to undefined functions, the defining and using files must be treated as a single compilation unit. (See below.)
All the above remarks also have implications for incremental recompilation.
When there is a dependency like the above between files, so that one is a prerequisite of the other, the dependent one must be recompiled whenever the prerequisite is touched.
How to split code into files is going to be influenced by all the usual things: cohesion, coupling and what have you. A Common-Lisp-specific reason to keep certain things together in one file is inlining: a call to a function which is in the same file as the caller may be inlined. If your program supports any in-service upgrade, the granularity of code loading is individual files; if some functions foo and bar should be independently redefinable, don't put them in the same file.
Now about compilation units. Suppose you have a file foo.lisp which defines a function called foo and bar.lisp which calls (foo). If you just compile bar.lisp, you will likely get a warning that an undefined function foo has been called. You could compile foo.lisp first and then load it, and then compile bar.lisp. But that will not work if there is a circular reference between the two: say foo.lisp also calls (bar) which bar.lisp defines.
In Common Lisp, you can defer such warnings to the end of a compilation unit, and what defines a compilation unit isn't a single file, but a dynamic scope established by a macro called with-compilation-unit. Simply put, if we do this:
(with-compilation-unit ()
  (compile-file "foo.lisp")  ;; contains (defun foo () (bar))
  (compile-file "bar.lisp")) ;; contains (defun bar () (foo))
If a compile-file isn't surrounded by with-compilation-unit then there is a compilation unit spanning that file. Otherwise, the outermost nesting of the with-compilation-unit macro determines the scope of what is in the compilation unit.
Warnings about undefined functions (and such) are deferred to the end of the compilation unit. So by putting foo.lisp and bar.lisp compilation into one unit, we suppress the warnings about either foo or bar not being defined and we can compile the two in any order.
Build systems use with-compilation-unit under the hood, as appropriate.
The compilation unit isn't about dependencies but diagnostics. Above, we don't have a compile time dependency. If we touch foo.lisp, bar.lisp doesn't have to be recompiled or vice versa.
By and large, Lisp codebases don't have a lot of hard dependencies among the files. Incremental compilation often means that just the affected files that were changed have to be recompiled. The C or C++ problem that everything has to be rebuilt because a core header file was touched is essentially nonexistent.
"but what if ..."
No matter how you first organize your code, if you change it significantly you are going to have to refactor. IMO there is no ideal way of grouping dependencies in advance.
As a rule of thumb it is generally safe to define generic functions first, then types, then actual methods, for example. For non-generic functions, you can cut circular dependencies by adding forward declarations:
(declaim (ftype function ...))
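For instance (file and function names invented here), a forward declaration lets fun1.lisp call into fun2.lisp without reordering the files:

;; in fun1.lisp, before any caller:
(declaim (ftype function struct2-helper)) ; actually defined later, in fun2.lisp

(defun struct1-process (s1)
  ;; no style note: the compiler has already seen the declaration
  (struct2-helper s1))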
Having too much circular dependency is a bit of a code smell.
"Is the correct solution to seal everything in a compilation unit?"
Yes, if you group the definitions in the same compilation unit (the same file), the file compiler will be able to silence the style notes until it reaches the end of the file: at that point it knows whether there are still missing references or whether all the cross-references are resolved.
"But then if there's some kind of back call (that can't be done with a closure)"
If you have a specific example in mind, please share, but typically you can define struct1 and its functions in a way that is self-contained; maybe it can accept a plist that binds event names to callbacks:

(make-struct-1 :callbacks (list :on-empty #'one-is-empty
                                :on-full  #'one-is-full))
Similarly, struct2 can accept callbacks too (dependency injection), and the main struct ties them together, perhaps using closures.
Alternatively, you can design your data structures so that they signal conditions, and then in the caller code you intercept them to bind things together.
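A rough sketch of that condition-based approach (all names invented, assuming struct1-items is a defstruct accessor):

(define-condition struct1-empty (condition)
  ((struct :initarg :struct :reader empty-struct)))

(defun struct1-pop (s1)
  (when (null (struct1-items s1))
    (signal 'struct1-empty :struct s1))
  (pop (struct1-items s1)))

;; The caller ties struct1 and struct2 together without a file dependency:
(handler-bind ((struct1-empty
                (lambda (c) (refill-from-struct2 (empty-struct c)))))
  (struct1-pop *s1*))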
In general one can implement typical type_traits using template techniques.
However, I cannot imagine how std::is_standard_layout could be implemented in these terms: http://en.cppreference.com/w/cpp/types/is_standard_layout
When I checked the GCC standard library, I found that it is implemented in terms of __is_standard_layout(T), which I could not find defined anywhere else. Is this a compiler magic function?
Would it be possible to implement std::is_standard_layout explicitly?
For example, one of the conditions is that all non-static data members are declared in the same class of the hierarchy.
That seems impossible to determine with template techniques alone.
No, std::is_standard_layout is not something you can implement without compiler intrinsics. As you've correctly pointed out, it needs more information than the C++ type system can express.
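For illustration, this is roughly how a library can forward the trait to the intrinsic; GCC and Clang both provide __is_standard_layout, and the surrounding names here are made up:

#include <type_traits>

template <typename T>
struct my_is_standard_layout
    : std::integral_constant<bool, __is_standard_layout(T)> {};

struct Plain { int x; };            // standard-layout
struct Virt { virtual ~Virt() {} }; // not standard-layout: it has a vtable pointer

static_assert(my_is_standard_layout<Plain>::value, "");
static_assert(!my_is_standard_layout<Virt>::value, "");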
I am starting to work on a project and for one of the tasks I need to analyze the source code in order to gather information about the classes and their methods. More specifically, for each method I need to know which internal attributes and external objects (references) it uses throughout the entire method body.
I discussed it with my supervisors and they think that bytecode manipulation libraries are the way to go. I already looked at BCEL, ASM and Javassist, but I'm not sure which one I need to use. Do they all provide access to the method body where I can see all the instructions and get the information I need?
Any advice would be appreciated. Thank you!
If you really “need to analyze the source code”, then libraries which allow you to inspect the bytecode are not the way to go.
Otherwise, you really need to define your task precisely. Either you are about to analyze classes, regardless of whether you look at their source code or byte code, or you want to analyze source code and consider doing it by compiling first and then analyzing the compiled result. In the latter case, you have to compare the effort of both steps with alternative solutions, e.g. direct source code analysis.
Parsing byte code is rather easy, easier than analyzing source code, which is the reason why bytecode is produced prior to the execution of Java programs. To answer your concrete question: yes, all three libraries offer you a way to analyze the instructions and associated information. Which one best fits your needs is a question beyond the scope of Stack Overflow.
Whether analyzing the byte code helps depends on your exact requirements. When it comes to field and method access, you can get most of them precisely using that approach; only inlined compile-time constants lack their origins. When it comes to type use, you have to consider that not every source code artifact has an existing counterpart in the byte code, e.g. widening casts produce no actual code, and local variables usually don't have a declared type (debugging information aside), but only an implied type which depends on how they are actually used. Local variables also carry no information about generics, unless debugging information has been included.
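As a taste of what this looks like with ASM (one of the three libraries mentioned; the scanned class name is a placeholder), here is a sketch of a visitor that prints the fields each method touches:

import org.objectweb.asm.ClassReader;
import org.objectweb.asm.ClassVisitor;
import org.objectweb.asm.MethodVisitor;
import org.objectweb.asm.Opcodes;

public class FieldAccessScanner {
    public static void main(String[] args) throws Exception {
        // reads the class file for com.example.SomeClass from the classpath
        ClassReader reader = new ClassReader("com.example.SomeClass");
        reader.accept(new ClassVisitor(Opcodes.ASM9) {
            @Override
            public MethodVisitor visitMethod(int access, String name, String desc,
                                             String sig, String[] exceptions) {
                System.out.println("method " + name + desc);
                return new MethodVisitor(Opcodes.ASM9) {
                    @Override
                    public void visitFieldInsn(int opcode, String owner,
                                               String field, String fieldDesc) {
                        // GETFIELD/PUTFIELD hit instance fields; GETSTATIC/PUTSTATIC hit class fields
                        System.out.println("  accesses " + owner + "." + field);
                    }
                };
            }
        }, 0);
    }
}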
This is my attempt to start a collection of GCC special features which one does not usually encounter. This comes after #jlebedev mentioned the "Effective C++" option for g++ in another question:
-Weffc++
This option warns about C++ code which breaks some of the programming guidelines given in the books "Effective C++" and "More Effective C++" by Scott Meyers. For example, a warning will be given if a class which uses dynamically allocated memory does not define a copy constructor and an assignment operator. Note that the standard library header files do not follow these guidelines, so you may wish to use this option as an occasional test for possible problems in your own code rather than compiling with it all the time.
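For instance, a made-up class like this one triggers it when compiled with g++ -Weffc++:

// -Weffc++: 'class Buffer' has pointer data members, but neither a
// copy constructor nor an assignment operator is defined
class Buffer {
    char* data;
public:
    Buffer() : data(new char[16]) {}
    ~Buffer() { delete[] data; }
};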
What other cool features are there?
From time to time I go through the current GCC/G++ command line parameter documentation and update my compiler script to be even more paranoid about any kind of coding error. Here it is if you are interested.
Unfortunately I didn't document them, so I forgot most, but -pedantic, -Wall, -Wextra, -Weffc++, -Wshadow, -Wnon-virtual-dtor, -Wold-style-cast, -Woverloaded-virtual, and a few others are always useful, warning me of potentially dangerous situations. I like this aspect of customizability; it forces me to write clean, correct code. It has served me well.
However they are not without headaches, especially -Weffc++. Just a few examples:
It requires me to provide a custom copy constructor and assignment operator if there are pointer members in my class, which are useless since I use garbage collection. So I need to declare empty private versions of them.
My NonInstantiable class (which prevents instantiation of any subclass) had to implement a dummy private friend class so G++ didn't whine about "only private constructors and no friends"
My Final<T> class (which prevents subclassing of T if T derived from it virtually) had to wrap T in a private wrapper class to declare it as friend, since the standard flat out forbids befriending a template parameter.
G++ recognizes functions that never return a value and throw an exception instead, and whines about them not being declared with the noreturn attribute. Hiding behind always-true instructions didn't work; G++ was too clever and recognized them. It took me a while to come up with declaring a variable volatile and comparing it against its value to be able to throw that exception unmolested.
Floating point comparison warnings. Oh god. I have to work around them by writing x <= y and x >= y instead of x == y where it is acceptable.
Shadowing virtuals. Okay, this is clearly useful to prevent stupid shadowing/overloading problems in subclasses but still annoying.
No previous declaration for functions. Kinda lost its importance as soon as I started copypasting the function declaration right above it.
It might sound a bit masochistic, but as a whole, these are very cool features that increased my understanding of C++ and of programming in general.
What other cool features G++ has? Well, it's free, open, it's one of the most widely used and modern compilers, consistently outperforms its competitors, can eat almost anything people throw at it, available on virtually every platform, customizable to hell, continuously improved, has a wide community - what's not to like?
A function that is declared to return a value (for example an int) will return a garbage (indeterminate) value if a code path is followed that ends the function without a 'return value' statement. Not paying attention to this can result in exceptions and out-of-range memory writes or reads. GCC's -Wreturn-type (enabled by -Wall) warns about exactly this.
For example, if a function is used to obtain an index into an array and the faulty code path is taken (the one that doesn't end with a 'return value' statement), then a garbage value will be returned which might be too big as an index into the array, resulting in all sorts of headaches as you wrongly mess up the stack or heap.
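A tiny invented example of the trap; g++ -Wall flags the bad path with "control reaches end of non-void function":

int indexFor(int key) {
    if (key > 0)
        return key % 10;
    // no return on this path: the caller receives an indeterminate value
}

int main() {
    int values[10] = {};
    return values[indexFor(-1)]; // may index far outside the array
}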
For example, referencing something as System.Data.Datagrid as opposed to just Datagrid. Please provide examples and explanation. Thanks.
The benefit is that you don't need to add an import for everything you use, which is especially handy if it's the only thing you use from a particular namespace; it also prevents collisions.
The downside, of course, is that the code balloons out in size and gets harder to read the more you use specific qualifiers.
Personally I tend to use imports for most things unless I know for sure I will only be using something from a particular namespace once or twice, so it won't impact the readability of my code.
You're being very explicit about the type you're referencing, and that is a benefit. Although, in the very same process you're giving up code clarity, which clearly is a downside in my case, as I want code to be readable and understandable. I go for the short version unless I have a conflict between different namespaces which can only be solved with explicit references to classes, or unless I make an alias for it with the using keyword:
using Datagrid = System.Data.Datagrid;
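A concrete illustration of such a clash (assuming a project that references both System.Threading and the Windows Forms assembly) is the two Timer classes:

using System.Threading;
using WinFormsTimer = System.Windows.Forms.Timer; // alias resolves the ambiguity

class Example
{
    System.Threading.Timer backgroundTimer; // fully qualified
    WinFormsTimer uiTimer;                  // via the alias
}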
Actually the full path is global::System.Data.DataGrid. The point of using a more qualified path is to avoid having to use additional using statements, especially if the introduction of another using will cause problems with type resolution. More fully qualified identifiers exist so that you can be explicit when you need to be explicit, but if the class's namespace is clear, then the DataGrid version is clearer to many.
I generally use the shortest form available in order to keep the code as clean and readable as possible. That's what using directives are for, after all, and tooltips in the VS editor give you instant detail on the provenance of a type.
I also tend to use a namespace tag for RCWs in a COM interop layer, to call out those variables explicitly in the code (they may need special attention on lifecycle and collection), e.g.
using _Interop = Some.Interop.Namespace;
In terms of performance there is no upside/downside. Everything is resolved at compile time and the generated MSIL is identical whether you use fully-qualified names or not.
Its use is prevalent in the .NET world because of auto-generated code, such as designer markup. In that case it is better to fully qualify names like class names because of possible conflicts with other classes you may have in your code.
If you have a tool like ReSharper, it will actually tell you which of your fully-qualified references are unnecessary (e.g. by graying them out) so you can lop them off. If you frequently cut-paste code across your various code bases, fully qualifying names would be a must. (Then again, why would you want to cut-paste all the time? It's a bad form of code reuse!)
I don't think there is really a downside, just readability vs. actual time spent coding. In general, if you don't have namespaces with ambiguous objects, I don't think it's really needed. Another thing to consider is the level of use: if you have one method that uses reflection and you are alright with typing System.Reflection 10 times, then it's not a big deal, but if you plan on using a namespace a lot then I would recommend a using directive.
Depending on your situation, extra qualifiers will generate a warning (if this is what you mean by redundant). If you then treat warnings as errors, that's a pretty serious downside.
I've run into this with GCC for example.
struct A {
    int A::b; // warning!
};