Attempting to reduce executable size of Go program [duplicate] - go

This question already has answers here:
Reason for huge size of compiled executable of Go
(3 answers)
Closed 3 years ago.
EDIT / CLARIFICATION:
It seems that I have failed to explain myself here. I am not criticizing Go, its runtime, or the fact that the executables are large. I was also not trying to say that Go is bad while C is good.
I was merely pointing out that the compiled executable seems to always be at least around 1 MB (presumably this is the runtime overhead), and that importing a package seems to pull the entire package in, regardless of how much of it is used.
My actual question was basically whether those two points are the default behavior or the only behavior. I gave some examples of C programs that are code-wise equivalent to the Go programs, but for which I carefully picked compiler and linker flags so that they do not link against any external C runtime (I verified this with Dependency Walker just to be sure). The purpose of the C examples was to show how small the actual code is, and also to show a case where you do need something and you import only what you need.
I actually think that this behavior (put everything inside just in case) is a good setting to have as a default, but I thought that there may be some compiler or linker flag to change this. Now, I do not think that it would be sensible to say you don't want the runtime or parts of it. However, I think that selectively including parts of a package is not such a strange thing to have. Let me explain:
Let's say we are writing in C/C++ and we include a huge header file with tons of functions, but we only use a small portion of them. In this scenario it is possible to end up with an executable that does not contain any unused code from that header file. (Real-world example: a math library with support for 2D/3D/4D vectors and matrices, quaternions, etc., all of which come in two versions, one for 32-bit floats and one for 64-bit floats, plus conversions from one to the other.)
This is the sort of thing I was looking for. I fully understand that doing this may cause issues in some cases, but still. It's not like Go does not have other things that may cause serious issues: they have the "unsafe" package, which is there if you need it, but it is a "use at your own risk" kind of package.
ORIGINAL QUESTION:
After working with Go (golang) for some time I decided to look into the executable it produces. I saw that my project was clocking in at more than 4.5 MB for the executable alone, while another project that is similar in complexity and scope, but written in C/C++ (compiled with MSVC), was less than 100 KB.
So I decided to try some things out. I've written really stupid and dead simple programs both in C and Go to compare the output.
For C I am using MSVC, compiling a 64-bit executable in release mode and NOT linking with the C runtime (as far as I understand, Go executables only link with it when using cgo).
First run: A simple endless loop, that's it. No prints, no interaction with the OS, nothing.
C:
#include "windows.h"
int main()
{
while (true);
}
void mainCRTStartup()
{
main();
}
GO:
package main

func main() {
    for {
    }
}
Results:
C:  3 KB executable. Depends on nothing.
GO: 1,057 KB executable. Depends on 29 procedures from KERNEL32.DLL.
There is a huge difference there, but I thought that it might be unfair. So for the next test I decided to remove the loop and just write a program that immediately exits with code 13:
C:
#include "windows.h"
int main()
{
return 13;
}
void mainCRTStartup()
{
ExitProcess(main());
}
GO:
package main

import "os"

func main() {
    os.Exit(13)
}
Results:
C:  4 KB executable. Depends on 1 procedure from KERNEL32.DLL.
GO: 1,281 KB executable. Depends on 31 procedures from KERNEL32.DLL.
It seems that the Go executables are "bloated". I understand that, unlike C, Go puts a considerable amount of its runtime code into the executable, which is understandable, but that alone does not seem to explain the sizes.
Also, it seems that Go works at package granularity. What I mean is that it will not cram packages you do not use into the executable, but if you import a package you get ALL of it, even if you only need a small subset, or even if you don't call anything from it at all. For example, just importing "fmt" without even calling anything in it expands the previous executable from 1,281 KB to 1,777 KB.
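For reference, Go refuses to compile a plain unused import, so "importing fmt without calling anything" presumably means a blank import; this is a reconstruction of what that test file would look like, not the original code:

package main

import (
    _ "fmt" // blank import: the package is linked in, but nothing in it is called
    "os"
)

func main() {
    os.Exit(13)
}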
Am I missing something, like some flags to the Go compiler that tell it to be less bloated (I know there are many flags I can set, and that I can pass flags to the native compiler and linker, but I have not found any for this specifically), or is it just something no one cares about anymore in 2019, since what are a few megabytes, really?

Here are some things that the Go program includes that the C program does not include:
Container types, such as hash maps and arrays, and their associated functions
Memory allocator, with optimizations for multithreaded programs
Concurrent garbage collector
Types and functions for threading, such as mutexes, condition variables, channels, and threads
Debugging tools like stack trace dumping and the SIGQUIT handler
Reflection code
(If you are curious exactly what is included, you can look at the symbols in your binary with debugging tools. On macOS and Linux you can use nm to dump the symbols in your program.)
The thing is—most Go programs use all of these features! It’s hard to imagine a Go program that doesn't use the garbage collector. So the creators of Go have not created a special way to remove this code from programs—since nobody needs this feature. After all, do you really care how big "Hello, world!" is? No, you don’t.
From the FAQ Why is my trivial program such a large binary?
The linker in the gc toolchain creates statically-linked binaries by default. All Go binaries therefore include the Go runtime, along with the run-time type information necessary to support dynamic type checks, reflection, and even panic-time stack traces.
Also keep in mind that if you are compiling on Windows with MSVC, you may be using a DLL runtime, such as MSVCR120.DLL... which is about 1 MB.
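As an aside: you cannot strip out the runtime itself, but the standard go build command does accept linker flags that make the binary noticeably smaller by omitting debug information (exact savings vary by program):

go build -ldflags="-s -w"

Here -s omits the symbol table and debug information and -w omits the DWARF data; neither flag removes the runtime or the type information needed for reflection and stack traces.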

Related

If both Mac OS and Windows use the x86 instruction set, why do we have to recompile for each platform?

If both Mac OS and Windows, running on Intel processors, use the x86 instruction set, why can't a program written using only C++11 (no OS-specific libraries, frameworks or APIs) run on both without having to recompile for that platform?
Ultimately the program gets compiled to machine code, so if the instruction set is the same, what's the difference? What's really going on?
EDIT: I'm really just talking about a simple "Hello world" program compiled with something like gcc. Not apps!
EDIT: For example:
#include <iostream>
using namespace std;

int main()
{
    cout << "Hello World!";
    return 0;
}
EDIT: An even simpler program:
int main() {
    int j = 2;
    j = j + 3;
}
Because a "program" nowadays consists of more than just a blob of binary code. Their file formats are not cross-compatible (PE/COFF vs. ELF vs. Mach-O). It's kind of silly when you think about it, yes, but that's the reality. It wouldn't have to be this way if you could start history over again.
Edit:
You may also want to see my longer answer on SoftwareEngineering.StackExchange (and others').
Even "Hello, world" needs to generate output. That will either be OS calls, BIOS calls at a somewhat lower level, or, as was common in DOS days for performance reasons, direct output to video via I/O calls or memory mapped video. Any of those methods will be highly specific to the operating system and other design issues. In the example you've listed, iostream hides those details and will be different for each target system.
One reason is provided by #Mehrdad in their answer: even if the assembly code is the same on all platforms, the way it's "wrapped" into an executable file may differ. Back in the day, there were COM files in MS-DOS. You could load such a file into memory and just start executing it from the very beginning.
Eventually we got read-only memory pages, .bss, non-executable read-write memory pages (non-executable for safety reasons), embedded resources (like icons on Windows), and other things the OS needs to know about before running the code in order to properly configure the isolated environment for the newly created process. Of course, there are also shared libraries (which have to be loaded by the OS), and any program which does anything meaningful has to output some result via OS calls, i.e. it has to know how to perform system calls.
So it turns out that in modern multi-process OSes, executable files have to contain a lot of metainformation in addition to the code. That's why we have file formats. They are different on different platforms mainly for historical reasons. Think of it as PNG vs. JPEG: both are compressed rasterized image formats, but they're incompatible, use different compression algorithms and different storage formats.
no OS-specific libraries, frameworks or APIs
That's not true. As we live in the world of multi-process OSes, no process has any kind of direct access to the hardware, be it the network card or the display. In general, it can only access the CPU and memory (and only in a very limited way).
E.g. when you run your program in a terminal, its output has to get to the terminal emulator so it can be displayed in a window, which you can drag across the screen, all transparently to your "Hello World". So the OS gets involved anyway.
Even your "hello world" application has to:
Load the dynamic C++ runtime, which will initialize the cout object before your main starts. Who else would initialize cout and call destructors when main ends?
When you try to print something, your C++ runtime eventually has to make a call to the OS. Nowadays this is typically abstracted away in the C standard library (libc), which has to be loaded dynamically even before the C++ runtime.
That C standard library invokes some x86 instructions which make the system call that "prints" the string on the screen. Note that different OSes, and even different CPUs within the x86 family, have different mechanisms and conventions for system calls. Some use interrupts, some use the specifically designed sysenter/syscall instructions (hello from Intel and AMD), some pass arguments in known memory locations, some pass them via registers. Again, that's why this code is abstracted away by the OS's standard library: it typically provides a simple C interface that performs the necessary assembly-level magic.
All in all, answering your question: because your program has to interact with the OS, and different OSes use completely different mechanisms for that.
If your program has no side effects (like your second example), it is still saved in the platform's "general" executable format. And as those formats differ between platforms, we have to recompile. It's just not worth inventing a common, compatible format only for simple programs with no side effects, as such programs are useless.
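To tie this back to the Go topic of this page: the same point is easy to see with a natively compiled Go program. One source file, but each target platform needs its own build, and the toolchain emits a different container format per target (PE on Windows, ELF on Linux, Mach-O on macOS). A small sketch, not taken from any of the answers above:

package main

import (
    "fmt"
    "runtime"
)

func main() {
    // runtime.GOOS and runtime.GOARCH are fixed at compile time, so a
    // Windows build prints "windows" and a Linux build prints "linux",
    // and the two output files are a PE and an ELF executable respectively.
    fmt.Println("built for", runtime.GOOS, "on", runtime.GOARCH)
}

Cross-compiling is just a matter of setting the GOOS/GOARCH environment variables before running go build; the machine code largely overlaps between the two builds, but the executable wrapping and the OS-facing parts of the runtime do not.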

Is it recommended to keep a program sources (as opposed to lib sources) in a single file?

I am taking my first steps into Go, and I am obviously reasoning from what I'm used to in other languages rather than from an understanding of Go's specifics and style.
I've decided to rewrite a Ruby background job I have that takes ages to execute. It iterates over a huge table in my database and processes the data individually for each row, so it's a good candidate for parallelization.
Coming from a Ruby on Rails task that uses an ORM, I pictured this as a fairly simple two-file program: one file containing a struct type and its methods to represent and work with a row, and a main file to run the database query and loop over the rows (maybe a third file to abstract the database access logic if it gets too heavy in the main file). This file separation, as I intended it, was meant for codebase clarity rather than having any relevance to the final binary.
I've read and seen several things on the topic, including questions and answers here, and it always tends to come down to writing code as libraries, installing them, and then using them from a single-file (package main) program.
I've read that one may pass multiple files to go build/run, but it complains if there are several package names (so basically, everything should be in main), and it doesn't seem that common.
So, my questions are :
did I get it right that having code mostly as a library, with a single-file program importing it, is the way to go?
if so, how do you deal with having to build the libraries repeatedly? Do you build/install on each change in the library codebase before executing (which is way less convenient than what go run promises to be), or is there something common I don't know of to execute a library-dependent program quickly while working on that library's code?
No.
Go and the go tool work on packages only (only go run works on files, but that is a different story): you should not think about files when organizing Go code, but about packages. A package may be split into several files, but that is used for keeping test code separate, for limiting file size, or for grouping types, methods, functions, etc.
Your questions:
did I get it right that having code mostly as a library, with a single-file program importing it, is the way to go?
No. Sometimes this has advantages, sometimes not. Sometimes a split may be one lib plus one short main; in other cases, just one large main might be better. Again: it is all about packages and never about files. There is nothing wrong with a single 12-file main package if this is a real standalone program. But maybe extracting some stuff into one or a few other packages might result in more readable code. It all depends.
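For illustration only (the file and type names here are invented, not taken from the question), a single main package split across two files could look like this:

row.go:
package main

import "strings"

// Row stands in for one database row; its methods hold the per-row logic.
type Row struct {
    ID   int
    Data string
}

// Process is a placeholder for the real per-row work.
func (r Row) Process() string {
    return strings.ToUpper(r.Data)
}

main.go:
package main

import "fmt"

func main() {
    rows := []Row{{ID: 1, Data: "a"}, {ID: 2, Data: "b"}}
    for _, r := range rows {
        fmt.Println(r.ID, r.Process())
    }
}

go build in that directory compiles both files together as a single package; there is no separate library to install.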
if so, how do you deal with having to build the libraries repeatedly? Do you build/install on each change in the library codebase before executing (which is way less convenient than what go run promises to be), or is there something common I don't know of to execute a library-dependent program quickly while working on that library's code?
The go tool tracks the dependencies and recompiles whatever is necessary. Say you have a package main in main.go which imports a package foo. If you execute go run main.go, it will recompile package foo transparently if needed. So for quick hacks there is no need for the two-step go install foo; go run main.go. Once you extract code into three packages foo, bar, and waz, it might be a bit faster to install foo, bar, and waz first.
No. Look at the Go commands and Go standard packages for exemplars of good programming style.
Go Source Code

Writing a Ruby extension in Go (golang)

Are there some tutorials or practical lessons on how to write an extension for Ruby in Go?
Go 1.5 added support for building shared libraries that are callable from C (and thus from Ruby via FFI). This makes the process easier than in pre-1.5 releases (when it was necessary to write the C glue layer), and the Go runtime is now usable, making this actually useful in real life (goroutines and memory allocations were not possible before, as they require the Go runtime, which was not useable if Go was not the main entry point).
goFuncs.go:
package main

import "C"

//export GoAdd
func GoAdd(a, b C.int) C.int {
    return a + b
}

func main() {} // Required but ignored
Note that the //export GoAdd comment is required for each exported function; the symbol after export is how the function will be exported.
goFromRuby.rb:
require 'ffi'

module GoFuncs
  extend FFI::Library
  ffi_lib './goFuncs.so'
  attach_function :GoAdd, [:int, :int], :int
end

puts GoFuncs.GoAdd(41, 1)
The library is built with:
go build -buildmode=c-shared -o goFuncs.so goFuncs.go
Running the Ruby script produces:
42
Normally I'd try to give you a straight answer but the comments so far show there might not be one. So, hopefully this answer with a generic solution and some other possibilities will be acceptable.
One generic solution: compile the high-level-language program into a library callable from C, then wrap that for Ruby. One has to be extremely careful about the integration at this point. This trick was a nice kludge for integrating many languages in the past, usually for legacy reasons. Thing is, I'm not a Go developer and I don't know whether you can compile Go into something callable from C. Moving on.
Create two standalone programs: Ruby and Go program. In the programs, use a very efficient way of passing data back and forth. The extension will simply establish a connection to the Go program, send the data, wait for the result, and pass the result back into Ruby. The communication channel might be OS IPC, sockets, etc. Whatever each supports. The data format can be extremely simple if there's no security issues and you're using predefined message formats. That further boosts speed. Some of my older programs used XDR for binary format. These days, people seem to use things like JSON, Protocol Buffers and ZeroMQ style wire protocols.
Variation of second suggestion: use ZeroMQ! Or something similar. ZeroMQ is fast, robust and has bindings for both languages. It manages the whole above paragraph for you. Drawbacks are that it's less flexible wrt performance tuning and has extra stuff you don't need.
The tricky part of using two processes and passing data between them is a speed penalty. The overhead might not justify leaving Ruby. However, Go has great native performance and concurrency features that might justify coding part of an application in it versus a scripting language like Ruby. (Probably one of your justifications for your question.) So, try each of these strategies. If you get a working program that's also faster, use it. Otherwise, stick with Ruby.
Maybe a less appealing option: use something other than Go that has similar advantages, allows calls from C, and can be integrated. Although it's not very popular, Ada is a possibility. It has long been strong in native code, (restricted) concurrency, reliability, low-level support, cross-language development and IDE support (GNAT). Also, Julia is a new language for high-performance technical and parallel programming that can be compiled into a library callable from C. It has a JIT too. Maybe changing the problem statement from Ruby+Go to Ruby+(more suitable language) would solve the problem?
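To make the two-process suggestion above concrete, here is a minimal Go sketch of the worker side, assuming a line-based request/response protocol over a local TCP socket (the port number and the toy "add two integers" job are made up for illustration):

package main

import (
    "bufio"
    "fmt"
    "net"
    "strconv"
    "strings"
)

// Toy worker process: reads lines of the form "a b" and replies with a+b.
func main() {
    ln, err := net.Listen("tcp", "127.0.0.1:9000")
    if err != nil {
        panic(err)
    }
    for {
        conn, err := ln.Accept()
        if err != nil {
            continue
        }
        go handle(conn) // one goroutine per client connection
    }
}

func handle(conn net.Conn) {
    defer conn.Close()
    sc := bufio.NewScanner(conn)
    for sc.Scan() {
        fields := strings.Fields(sc.Text())
        if len(fields) != 2 {
            fmt.Fprintln(conn, "error")
            continue
        }
        a, _ := strconv.Atoi(fields[0])
        b, _ := strconv.Atoi(fields[1])
        fmt.Fprintln(conn, a+b)
    }
}

The Ruby side would open a TCPSocket to 127.0.0.1:9000, write a line like "41 1", and read back "42"; swapping in ZeroMQ, JSON, or protocol buffers changes the framing, not the overall shape.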
As of Go 1.5, there's a new build mode that tells the Go compiler to output a shared library and a C header file:
-buildmode c-shared
(This is explained in more detail in this helpful tutorial: http://blog.ralch.com/tutorial/golang-sharing-libraries/)
With the new build mode, you no longer have to write a C glue layer yourself (as previously suggested in earlier responses). Once you have the shared-library and the header file, you can proceed to use FFI to call the Go-created shared library (example here: https://www.amberbit.com/blog/2014/6/12/calling-c-cpp-from-ruby/)

How do different apps written in different languages interact?

Of late I'd been hearing that applications written in different languages can call each other's functions/subroutines. Now, till recently I felt that was very natural, since all languages (yes, all, that's what I thought then, silly me!) are compiled into machine code, and that should be the same for all of them. Only some time back did I realise that even languages compiled into "higher" machine code (IL, bytecode, etc.) can interact with each other, or rather the applications can. I tried to find the answer many times, but failed; no answer satisfied me: either it assumed I knew a lot about compilers, or it was something I totally didn't agree with, and other stuff... Please explain in an easy-to-understand way how this works. Especially how languages compiled into "pure" machine code can have different things called "calling conventions" is what is making me clutch my hair.
This is actually a very broad topic. Languages compiled to machine code can often call each other's routines, though usually not without some effort; e.g., C++ code can call C routines when they are properly declared:
// declare the C function foo so it can be called by C++ code
extern "C" {
void foo(int, char *);
}
This is about as simple as it gets, because C++ was explicitly designed for compatibility with C (it includes support for calling C++ routines from C as well).
Calling conventions indeed complicate the picture in that C routines compiled by one compiler might not be callable from C compiled by another compiler, unless they share a common calling convention. For example, one compiler might compile
foo(i, j);
to (pseudo-assembly)
PUSH the value of i on the stack
PUSH the value of j on the stack
JUMP into foo
while another might push the values of i and j in reverse order, or place them in registers. If foo was compiled by a compiler following another convention, it might try to fetch its arguments off the stack in the wrong order, leading to unpredictable behavior (consider yourself lucky if it crashes immediately).
Some compilers support various calling conventions for this purpose. The Wikipedia article introduces calling conventions; for more details, consult your compiler's documentation.
Finally, mixing bytecode-compiled or interpreted languages and lower-level ones in the same address space is still more complicated. High-level language implementations commonly come with their own set of conventions to extend them with lower-level (C or C++) code. E.g., Java has JNI and JNA.
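Since this thread started from Go, here is a small concrete example of one language calling into another: a cgo sketch in which Go calls a C routine, with cgo generating the glue that bridges the two calling conventions (the greet function is defined inline in the preamble purely for the demo):

package main

/*
#include <stdio.h>
#include <stdlib.h>

static void greet(const char *name) {
    printf("hello from C, %s\n", name);
}
*/
import "C"

import "unsafe"

func main() {
    name := C.CString("Go")            // allocated with C's malloc
    defer C.free(unsafe.Pointer(name)) // so it must be freed on the C side
    C.greet(name)
}

The same idea in the other direction is what the -buildmode=c-shared example further up uses to make Go functions callable from C and, via FFI, from Ruby.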

C Runtime objects, dll boundaries

What is the best way to design a C API for DLLs which deals with the problem of passing "objects" that are C-runtime dependent (a FILE*, a pointer returned by malloc, etc.)? For example, if two DLLs are linked with different versions of the runtime, my understanding is that you cannot safely pass a FILE* from one DLL to the other.
Is the only solution to use Windows-specific APIs (which are guaranteed to work across DLLs)? The C API already exists and is mature, but it was designed mostly from a Unix point of view (and it still has to work on Unix, of course).
You asked for a C, not a C++ solution.
The usual method(s) for doing this kind of thing in C are:
Design the module's API to simply not require CRT objects. Get stuff passed across in raw C types, i.e. get the consumer to load the file and simply pass you the pointer. Or get the consumer to pass a fully qualified file name that is opened, read, and closed internally.
An approach used by other C modules (the MS cabinet SDK and parts of the OpenSSL library come to mind, IIRC): get the consuming application to pass pointers to functions into the initialization function. So any API you would pass a FILE* to would, at some point during initialization, have taken a pointer to a struct with function pointers matching the signatures of fread, fopen, etc. When dealing with external FILE*s, the DLL always uses the passed-in functions rather than the CRT functions.
With some simple tricks like this you can make your C DLL's interface entirely independent of the host's CRT, and in fact independent of the host being written in C or C++ at all.
Neither existing answer is correct. Consider the following on Windows: you have two DLLs, each statically linked with a different version of the C/C++ standard library.
In this case, you should not pass pointers to structures created by the C/C++ standard library in one DLL to the other. The reason is that these structures may be different between the two C/C++ standard library implementations.
The other thing you should not do is allocate a pointer with new or malloc in one DLL and free it in the other. The heap manager may be implemented differently as well.
Note, you can use the pointers between the DLLs - they just point to memory. It is the free that is the issue.
Now, you may find that this works, but if it does, you are just lucky. This is likely to cause you problems in the future.
One potential solution to your problem is dynamically linking to the CRT. For example, you could dynamically link to MSVCRT.DLL. That way your DLLs will always use the same CRT.
Note, I suggest that it is not a best practice to pass CRT data structures between DLLs. You might want to see if you can factor things better.
Note, I am not a Linux/Unix expert - but you will have the same issues on those OSes as well.
The problem with the different runtimes isn't solvable, because the FILE* struct belongs to one particular runtime on a Windows system.
But if you write a small wrapper interface you're done, and it does not really hurt:
class IFile {
public:
    virtual size_t Write(const void *buffer, size_t size, size_t count) = 0;
    virtual size_t Read(void *buffer, size_t size, size_t count) = 0;
    virtual void Destroy() = 0; // frees the object inside the DLL that created it
};

extern "C" IFile * __stdcall IFileFactory(const char *filename, const char *mode);
This is safe to pass across DLL boundaries everywhere and does not really hurt.
P.S.: Be careful if you start throwing exceptions across DLL boundaries. This will work quite well if you fulfill some design criteria on Windows, but will fail on some other systems.
If the C API exists and is mature, bypassing the CRT internally by using pure Win32 API stuff gets you half the way. The other half is making sure the DLL's user uses the corresponding Win32 API functions. This will make your API less portable, in both use and documentation. Also, even if you go this way with memory allocation, where both the CRT functions and the Win32 ones deal with void*, you're still in trouble with the file stuff - Win32 API uses handles, and knows nothing about the FILE structure.
I'm not quite sure what the limitations of FILE* are, but I assume the problem is the same as with CRT allocations across modules. MSVCRT uses Win32 internally to handle the file operations, and the underlying file handle can be used from every module within the same process. What might not work is closing a file that was opened by another module, which involves freeing the FILE structure on a possibly different CRT.
What I would do, if changing the API is still an option, is export cleanup functions for any possible "object" created within the DLL. These cleanup functions will handle the disposal of the given object in the way that corresponds to the way it was created within that DLL. This will also make the DLL absolutely portable in terms of usage. The only worry you'll have then is making sure the DLL's user does indeed use your cleanup functions rather than the regular CRT ones. This can be done using several tricks, which deserve another question...
