Bundling a static library as a single archive - go

I am working on a go library which I intend to distribute as a binary artifact. Now I am aware that there are different -buildmode options and archive should pretty much do that for me and go 1.7+'s //go:binary-only-package makes the trick official.
However, when I build my library the resulting archive (*.a) contains my library only, not any dependencies. I am in fact having a dependency which itself is binary only and comes as a statically linked archive itself (it's a C library which I am integrating with).
Using proper native code archives I am actually able to assemble such a fat archive using ar or libtool trickery - but these tools do not really work with my go artifacts.
Is there a way I can distribute a single *.a file or do I have to resort to packaging multiple archives into, say, a zip file which resembles the directory structure in $GOPATH and tell the my client's developers to just unzip that into their $GOPATH and be done with it?

Related

What does it mean when a package is in the go/pkg/mod/cache dir but it has no source code extracted?

I'm trying to understand how the source code for third-party dependencies is or is not compiled into my Go binary. I'm building in a Docker container, so I can see precisely what's fetched for my build without interference from other builds.
After my go build completes I see source code files for several dependencies under go/pkg/mod/$module#$version directories. The Module cache documentation tells me that these directories contain "extracted contents of a module .zip file. This serves as a module root directory for a downloaded module." My best guess is that the presence of extracted source code for these dependencies indicates that "yes, these dependencies are definitely compiled into your binary."
I also see many more dependencies pulled into go/pkg/mod/cache/download/$module directories. The Module cache documentation tells me that this directory contains "files downloaded from module proxies and files derived from version control systems," which I don't fully understand. As far as I can see, these files do not include any extracted source code, though there are several .zip files that I assume contain the source. For the most part these files seem to be .mod files that just contain text representing some sort of dependency graph.
My question is: if a third-party dependency has module files under go/pkg/mod/cache/download but no source code under go/pkg/mod/$module#$version, does that mean that dependency's code was NOT compiled into my Go binary?
I don't understand why the Go build pulls in all these module files but only has extracted source code for some of the third-party modules. Perhaps Go preemptively parses and pulls module information for the full transitive set of modules referenced from the modules my first-party code imports, but perhaps many of those modules don't end up being needed for my binary's compile + build process and therefore don't get extracted. If that's not true and the answer to my question is no, then I don't understand how or why my binary can link in those dependencies without go build fetching their source code.
As mentioned in "Compile and install packages and dependencies"
Compiled packages are cached automatically.
GOPATH and Module includes:
When using modules, GOPATH is no longer used for resolving imports.
However, it is still used to store downloaded source code (in GOPATH/pkg/mod) and compiled commands (in GOPATH/bin).
So if you see sources in pkg/mod which are not in pkg/mod/cache, try a go mod tidy
add missing and remove unused modules
From there, you should have the same modules between sources (pkg/mod) and compiled modules (pkg/mod/cache)
Based on the OP's comment
I need to know exactly what's included in the binary for compliance reasons.
I would recommend a completely different approach: dumping the list of symbols contained in the binary and then correlating it with whatever information is needed.
The command
go tool nm -type /path/to/the/executable/image/file
would dump the symbols — names of the functions — whose code was taken from both the standard library packages, 3rd-party and/or vendored packages and internal packages, compiled and linked into the binary, and print to its standard output stream a sequence of lines
address type name
which you can then process programmatically.
Another approach you might employ is to use go list with various flags to query the program's source code about the packages and/or modules which will be used when building: whatever that command outputs describing the full dependency graph of the source code is whatever go build will use when building — provided the source code is not changed between these calls.
Yet another possibility is to build the program using go build -x, save the debug trace it produces on its standard error stream and parsing it for exact module names the command reported as used during building.

How to inspect Go package files (.a files)?

How to inspect the content of Go package/object files (.a, .o)
The only thing I found is showing the full disassembly with go tool objdump.
But how to show the file structure, like imported and exported symbols, code, data and other sections, other metadata etc.?
Compiled Go packages are stored in Unix ar archive files, so you can extract the files within them using ar or go tool pack x pkg.a.
Package files contain compiled code (_go_.o), used by the linker to build binaries, and export data (__.PKGDEF), used by the compiler when building dependent packages. If the package contained assembly files or cgo code, there may be additional .o files, too.
You can use golang.org/x/tools/go/gcexportdata to read export data into a format that can be understood by go/types.
It sounds like you've already found go tool objdump for disassembly. You might also find go tool nm useful for listing symbols.

Does 'make'-ing something from source make it self-contained?

Forgive me before I start, as I'm not a C / C++ etc programmer, a mere PHP one :) but I've been working on projects that use some others sourced from online open source repos, such as svn and git. For some of these projects, I need to install libraries and then run "./configure", "make" and then "make all" (as an example) and I do this on a "build" virtual machine to get the binaries that I need to use within my project.
The ultimate goal of some of my projects is to then take these "compiled" (if that's the correct term) binaries and place them onto a virtual machine which I would then re-distribute (according to licenses etc).
My question is this : when I build these binaries on my build machine, with all the pre-requisities that I need in order to build them in the first place ("build-essential" and "cmake" and "gcc" etc etc) - once the binaries are on my build machine (in /usr/lib for example) are they self-contained to the point that I can merely copy those /usr/lib binary files that the build created and place them in the same folder on the virtual machines that I would distribute, without the build servers having all the build components installed on them?
With all the dependencies that I would need to build the source in the place, would that finally built binary contain them all in itself, or would I have to include them on the distributed servers as well?
Would that work? Is the question a little too general and perhaps it would all depend on what I'm building?
Update from original posting after a couple of responses
I will be distributing the VMs myself, inasmuch as I will build them and then install my projects upon them. Therefore, I know the OS and environment completely. I just don't want to "bloat" them with unnecessary software that's been installed that I don't actually need because the compiled executables I will place on the distributed VMs in for example /usr/local/bin ...
That depends on how you link your program to libraries it depends on. In most cases, the default is to link dynamically, which means that you need to distribute your executable along its deps. You can check out what libraries are required to run the file using ldd command.
Theoretically, you can link everything statically, which means that library code would be compiled into executable. Thus, executable would really be self-contained, but linking statically is not always possible. This depends on actual libraries you are using and probably require playing with ./configure args when building them.
Finally, there are some liraries that always linked dynamically, such as libc. The good thing is that machine you are distributing to would surely have this library. The bad thing is that versions of these libraries may differ, and you might face ABI mismatch.
In short, if your project not huge and there is possibility to link everything statically, go this way. If not, read about AppImage and Docker.
The distribution of built libraries and headers (binary distribution) is a possible way and should work. (I do it in my projects always.)
It is not necessary that all of the libraries you built are installed into /usr/lib. To keep your target machine clean you can install it in other folder to, e.g.
/usr/local/MYLIB/lib/libmylib.so
/usr/local/MYLIB/include/mylib.h
/usr/local/MYOTHERLIB/lib/libmyotherlib.so
/usr/local/MYOTHERLIB/include/libmyotherlib.so
Advantages:
Easy installation, easy remove
All files within one subfolder, no files are missing, no mix with other libs
Disadvantage:
The loader must know the extra search path

Exclude specific symbols from dSYM

I'm building an iOS project that includes a sub-project whose symbols I would like exclude from the product's .dSYM DWARF file.
The situation is that the sub-project (a static library) contains valuable proprietary code that I would not want an attacker to be able to symbolicate, even if they had the dSYM files used for resymbolicate crash reports for the whole app. The subproject covers a very specific domain and is well tested independently, so I'm not worried about being unable to resymbolicate stack traces in that code. However, I do need to be able to resymbolicate crash reports for the rest of the app, so I need a dSYM (as distributing symbols with the app is not an option).
I've already managed to make sure that all of the relevant symbols are stripped from the binary, and setting GCC_GENERATE_DEBUGGING_SYMBOLS=NO removed a lot from the dSYM, but I'm still seeing class-private C++ method names inside the dSYM file. For reference, I'm using clang.
How could I produce a dSYM for my app without compromising the symbols of this sub-project?
With a bog-standard Xcode workflow, this might be difficult. You could probably do something with a shell script phase which moves the static library to a different filename ("hides" it) and then runs dsymutil on your main app binary to create a dSYM. Because dsymutil can't find the static library, it won't be able to include any debug information for those functions. Alternatively, you can create a no-debug-info version of the static library although this will take a little bit more scripting. A static library is really a zip file of object (.o) files -- you need to create a directory, extract the .o files (ar x mylib.a), strip the .o files, then create a new static library (ar q mylib-nodebuginfo.a *.o I think) and put that in place before running dsymutil.
I know no on way to selectively remove debug information from a dSYM once it has been created, though. It's possible to do but I don't think anyone has written a tool like that.

How should open source libraries be used on Windows?

There are many open-source libraries that can be compiled with Visual Studio. I'm porting a program from Linux to Windows, but it depends on a number of libraries. I don't know what the best practices regarding libraries are on Windows.
On Linux, these libraries are typically part of the distribution. To use sqlite on Debian, for example, you need only to install libsqlite3-dev and the include files and libraries (both static and dynamic) are automatically installed and available to your program.
If you need a different version than your distribution supplies, you can compile it in your home directory, install it to ~/include and ~/lib, and set the appropriate environment variables so that your compiler includes those directories in its search path.
What is the best way to use libraries that are distributed as source on Windows? If I link dynamically rather than statically, is there an easy way to copy required DLLs into the output directory to ease redistribution (assuming license requirements are met)?
Option 1 - Projects that have binary distributions for windows / do not build in DevStudio.
E.g. OpenSSL.
Projects like OpenSSL are best downloaded to their own folder and built using their own scripts. OpenSSL typically installs itself to C:\OpenSSL on windows builds, so one done, you can add C:\OpenSSL\include and C:\OpenSSL\lib to your project environment to access the OpenSSL headers and Libs. The actual dll files you will need to copy from C:\OpenSSL\bin into your projects staging folder (normally your SolutionDir\Debug or Release).
Once youve gone through the hassle of building OpenSSL once, you don't want to do it again. Or, if you've downloaded the binary distribution, its best left alone. Just document to others which binary distribution you used so they can set up their Visual Studio build environment appropriately.
Option 2 - Small libraries that are easy to create Visual Studio Projects for (or already have). Lua and sqllite fall into this category.
For projects that are small enough, it is not inconvenient to simply add them to your solution in a sub folder. This way you can get their outputs built directly to the solutions output folder, and you do not have to bundle pre-build binary files in your solution making it far easier to share the project with others.
Option 3 - As an alternative you could create your own standardized folder for the products of open source projects. Create C:\oss\include, c:\oss\lib, c:\oss\bin etc, add these paths to DevStudios lib and include paths, add c:\oss\bin to the systems PATH variable, as you build each OSS project, copy the appropriate files to these locations.
Again, while convenient, this setup makes it diffucult to replicate the build environment on a 2nd PC, so you might want to keep the entire C:\oss tree in source control as well.
On windows the easiest way is to build your own DLLs and include them in the program directory.
Yes it uses a bit more space, but HD are large these days and avoids a lot of headaches of incompatible versions (DLL hell). Windows also suffers a few more wrinkles with versions of libs built with different compilers so shipping your own builds is safest

Resources