Why is the Go compiled binary relatively huge? [duplicate] - compilation

This question already has answers here:
How to reduce Go compiled file size?
(11 answers)
Closed 9 years ago.
I recently installed go and was trying out hello world example.
package main

import "fmt"

func main() {
    fmt.Printf("hello, world\n")
}
$ go build hello.go
produces a hello binary file which is 1.2 MB in size. That's comparatively huge for just a hello world program. Is there any particular reason why the file size is so large? Is it because of importing "fmt"?

This is covered in the Go FAQ:
Why is my trivial program such a large binary?
The linkers in the gc tool chain (5l, 6l, and 8l) do static linking.
All Go binaries therefore include the Go run-time, along with the
run-time type information necessary to support dynamic type checks,
reflection, and even panic-time stack traces.
A simple C "hello, world" program compiled and linked statically using
gcc on Linux is around 750 kB, including an implementation of printf.
An equivalent Go program using fmt.Printf is around 1.2 MB, but that
includes more powerful run-time support.

Yes, package "fmt" is one of the reasons. It also in turn imports other packages. But even without using "fmt", there's the whole runtime statically linked into a Go binary. And Go's runtime is not a simple one - it includes for instance a scheduler/goroutine vs OS thread manager, split stack allocator, the garbage collector and garbage collector friendly memory allocator which is also C threads friendly, signal handlers and stack trace generator, ...

Why does the golang.org/x/sys package encourage the use of the syscall package it's meant to replace?

I have read some Go code making use of syscall for low-level interaction with the underlying OS (e.g. Linux or Windows).
I wanted to make use of the same package for native Windows development, but its documentation says it's deprecated in favor of golang.org/x/sys:
$ go doc syscall
package syscall // import "syscall"
Package syscall contains an interface to the low-level operating system
primitives.
...
Deprecated: this package is locked down. Callers should use the
corresponding package in the golang.org/x/sys repository instead. That is
also where updates required by new systems or versions should be applied.
See https://golang.org/s/go1.4-syscall for more information.
Now, reading the documentation for golang.org/x/sys and inspecting its code, I see that it relies heavily on, and encourages the use of, the syscall package:
https://github.com/golang/sys/blob/master/windows/svc/example/beep.go
package main

import (
    "syscall"
)

var (
    beepFunc = syscall.MustLoadDLL("user32.dll").MustFindProc("MessageBeep")
)

func beep() {
    beepFunc.Call(0xffffffff)
}
and
https://godoc.org/golang.org/x/sys/windows#example-LoadLibrary
...
r, _, _ := syscall.Syscall(uintptr(proc), 0, 0, 0, 0)
...
Why does golang.org/x/sys rely on and encourage the use of the package it's meant to replace?
Disclaimer: I'm pretty new to Go specifically (though not to low-level OS programming). Still, the path here seems clear.
Go, as an ecosystem—not just the language itself, but all the various libraries as well—tries[1] to be portable. But direct system calls are pretty much not portable at all. So there is some tension here automatically.
In order to do anything useful, the Go runtime needs various services from the operating system, such as creating OS-level threads, sending and receiving signals, opening files and network connections, and so on. Many of these operations can be, and have been, abstracted away from how it is done on operating systems A, B, and C to generic concepts supported by most or all OSes. These abstractions build on the actual mechanisms in the various OSes.
They may even do this in layers internally. A look at the Go source for the os package, for instance, shows file.go, file_plan9.go, file_posix.go, file_unix.go, and file_windows.go source files. The top of file_posix.go shows a +build directive:
// Copyright 2009 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.
// +build aix darwin dragonfly freebsd js,wasm linux nacl netbsd openbsd solaris windows
Clearly this code itself is not completely portable, but the routines it implements for os, which are wrapped by the os.File abstraction, suffice for all POSIX-conformant systems. That reduces the amount of code that has to go in the Unix/Linux-specific file_unix.go file, for instance.
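Any package can use the same technique. As a hedged sketch with hypothetical package and file names, a per-OS implementation is selected by build constraints, so callers see a single portable function:

// tmpdir_unix.go - compiled only on the systems listed below; a matching
// tmpdir_windows.go (not shown) would provide the Windows version.
//go:build linux || darwin || freebsd
// +build linux darwin freebsd

package hostinfo

import "os"

// TempDir reports the platform's temporary directory, falling back to
// /tmp when the usual environment variable is unset.
func TempDir() string {
    if d := os.Getenv("TMPDIR"); d != "" {
        return d
    }
    return "/tmp"
}

The // +build line is the older spelling shown in file_posix.go above; Go 1.17 and later also accept the //go:build form.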
To the extent that OS-level operations can be wrapped into more-abstract, more-portable operations, then, the various built-in Go packages do this. You don't need to know whether there's a different system call for opening a device-file vs a text-file vs a binary-file, for instance, or a long pathname vs a short one: you just call os.Create or os.Open and it does any work necessary behind the scenes.
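A minimal sketch of using that portable surface; it opens the running program's own binary so the example has a path to work with on any OS:

package main

import (
    "fmt"
    "os"
)

func main() {
    // os.Open hides whichever open/openat/CreateFile machinery
    // the platform actually needs.
    f, err := os.Open(os.Args[0])
    if err != nil {
        fmt.Println("open failed:", err)
        return
    }
    defer f.Close()
    fmt.Println("opened", f.Name())
}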
This whole idea just doesn't fly with system calls. A Linux system call to create a new UID namespace has no Windows equivalent.[2] A Windows WaitForMultipleObjects system call has no real equivalent on Linux. The low level details of a stat/lstat call differ from one system to another, and so on.
In early versions of Go, there was some attempt to paper over this with the syscall package. But the link you quoted—https://golang.org/s/go1.4-syscall—describes this attempt as, if not failed, at least overstretched. The last word in the "problems" section is "issues".
The proposal at this same link says that the syscall package is to be frozen (or mostly frozen) as of Go 1.4: don't put new features into it. But the features that are in it are sufficient to implement the new, experimental golang.org/x/sys/* packages, or at least some of them. There's no harm in the experimental package borrowing the existing, formally-deprecated syscall package if that does what the experimental new package needs.
Things in golang.org/x/ are experimental: feel free to use them, but be aware that there are no compatibility promises across version updates, unlike things in the standard packages. So, to answer the last line of your question:
Why does golang/x/sys rely [on] and encourage the use of the package it's meant to replace?
It relies on syscall because that's fine. It doesn't "encourage the use of" syscall at all though. It just uses it when that's sufficient. Should that become insufficient, for whatever reason, it will stop relying on it.
Answering a question you didn't ask (but I did): suppose you want Unix-specific stat information about a file, such as its inode number. You have a choice:
info, err := os.Stat(path) // or os.Lstat(path), etc
if err != nil { ... handle error ... }
raw, ok := info.Sys().(*syscall.Stat_t)
if !ok { ... do whatever is appropriate ... }
inodeNumber := raw.Ino
or:
var info unix.Stat_t
err := unix.Stat(path, &info) // or unix.Lstat, etc
if err != nil { ... handle error ... }
inodeNumber := info.Ino
The advantage to the first block of code is that you get all the other (portable) information about the file—its mode and size and time-stamps, for instance. You maybe do, maybe don't get the inode number; the !ok case tells you whether you did. The primary disadvantage here is that it takes more code to do this.
The advantage to the second block of code is that it says just what you mean. You either get all the information from the stat call, or none of it. The disadvantages are obvious:
it only works on Unix-ish systems, and
it uses an experimental package, whose behavior might change.
So it's up to you which of these matters more to you.
[1] Either this is a metaphor, or I've just anthropomorphized this. There's an old rule: Don't anthropomorphize computers, they hate that!
[2] A Linux UID namespace maps from UIDs inside a container to UIDs outside the container. That is, inside the container, a file might be owned by UID 1234. If the file is in a file system that is also mounted outside the container, that file can be owned by a different owner, perhaps 5678. Changing the ownership on either "side" of the container makes the change in that side's namespace; the change shows up on the other side as the result of mapping, or reverse-mapping, the ID through the namespace mapping.
(This same trick also works for NFS UID mappings, for instance. The Docker container example above is just one use, but probably the most notable one these days.)

How does Go make system calls?

As far as I know, in CPython, open() and read() - the API for reading a file - are implemented in C code. The C code probably calls some C library which knows how to make a system call.
What about a language such as Go? Isn't Go itself now written in Go? Does Go call C libraries behind the scenes?
The short answer is "it depends".
Go compiles for multiple combinations of H/W and OS, and they all have different approaches to how syscalls are to be made when working with them.
For instance, Solaris does not provide a stable supported set of syscalls, so calls go through the system's libc — just as required by the vendor.
Windows does support a rather stable set of syscalls but it is defined as a C API provided by a set of standard DLLs.
The functions exposed by those DLLs are mostly shims which use a single "make a syscall by number" function, but these numbers are not documented and are different between the kernel flavours and releases (perhaps, intentionally).
Linux does provide a stable and documented set of numbered syscalls, and hence Go just calls the kernel directly there.
Now keep in mind that for Go, "calling the kernel directly" means following the so-called ABI of the H/W and OS combo. For instance, on modern Linux on amd64, making a syscall requires filling a set of CPU registers with certain values, making some other arrangements, and then issuing the SYSCALL CPU instruction.
On Windows, you have to use its native calling convention (which is stdcall, not cdecl).
Yes, Go is now written in Go. But you don't need C to make syscalls.
An important thing to call out is that syscalls aren't "written in C." You can make syscalls from C on Unix because of the wrappers declared in <unistd.h>. In particular, how Linux defines this header is a little convoluted, but you can see from this file the general idea. Syscalls are defined with a name and a number. When you call read, for example, what really happens behind the scenes is that the parameters are set up in the proper registers/memory (Linux expects the syscall number in eax on 32-bit x86, rax on x86-64), followed by the syscall instruction (or int 0x80 on older 32-bit x86), which traps into the kernel. The OS has already set up the proper handler that receives this trap and goes about doing whatever is needed for that syscall. So, you don't need something written in C (or a standard library, for that matter) to make syscalls. You just need to understand the calling ABI and know the syscall numbers.
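For instance, here is a minimal sketch (Unix-like system assumed) of issuing a system call straight from Go, with no C involved; fd 1 is standard output:

package main

import "syscall"

func main() {
    // syscall.Write sets up the registers and issues the raw
    // write(2) system call (or its platform equivalent).
    msg := []byte("hello from a raw syscall\n")
    syscall.Write(1, msg)
}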
However, as @retgits points out, golang's approach is to piggyback off the fact that libc already has all of the logic for handling syscalls. mksyscall.go is a CLI script that parses these libc files to extract the necessary information.
You can actually trace the life of a syscall if you compile a go script like:
package main

import (
    "syscall"
)

func main() {
    var buf []byte
    syscall.Read(9, buf)
}
Run objdump -D on the resulting binary. The Go runtime is rather large, so your best bet is to find the main function, see where it calls syscall.Read, and then search for the offsets from there: syscall.Read calls syscall.syscall, syscall.syscall calls runtime.libcCall (which switches from the Go ABI to the C ABI so that arguments are located where the OS expects them - you can see this in the runtime, for darwin for example), runtime.libcCall calls runtime.asmcgocall, etc.
For extra fun, run that binary with gdb and continue stepping in until you hit the syscall.
The sys package takes care of the syscalls to the underlying OS. Depending on the OS you're using, different packages are used to generate the appropriate calls. Here is a link to the README for Go running on Unix systems: https://github.com/golang/sys/blob/master/unix/README.md. The parts on mksyscall.go, on the hand-written Go files which implement system calls that need special handling, and on the type files should walk you through how it works.
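As a small hedged sketch (Unix-like system assumed), the wrappers that this machinery produces look like ordinary Go functions to the caller:

package main

import (
    "fmt"

    "golang.org/x/sys/unix"
)

func main() {
    // unix.Getpid is one of the generated wrappers; it ends up invoking
    // the getpid syscall (or libc wrapper) for the current platform.
    fmt.Println("running as pid", unix.Getpid())
}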
The Go compiler (which translates the Go code to target CPU code) is written in Go, but that is different from the run-time support code, which is what you are talking about. The standard library is mainly written in Go and probably knows how to directly make system calls with no C code involved. However, there may be a bit of C support code, depending on the target platform.

Julia 1.1 with JLD HDF5 package and memory release in Windows

I'm using Julia 1.1 with JLD and HDF5 to save a file onto the disk, and I ran into a couple of questions about memory usage.
Issue 1:
First, I defined a 4 GB matrix A.
A = zeros(ComplexF64,(243,243,4000));
When I type the following command and watch the Windows Task Manager:
A=nothing
it takes several minutes for Julia to release that memory back to me. Most of the time (in the Task Manager), Julia just doesn't release the memory at all, even though the command below instantly reports that A occupies 0 bytes.
varinfo()
name             size        summary
–––––––––––––––– ––––––––––– –––––––
A                0 bytes     Nothing
Base                         Module
Core                         Module
InteractiveUtils 162.930 KiB Module
Main                         Module
ans              0 bytes     Nothing
Issue 2:
Further, I tried to use JLD and HDF5 to save the array onto the disk. This time, the Task Manager told me that, when using the save("test.jld", "A", A) command, an extra 4 GB of memory was used.
using JLD,HDF5
A = zeros(ComplexF64,(243,243,4000));
save("test.jld", "A", A)
Further, after I typed
A=nothing
Julia won't release the 8 GB of memory back to me.
Finding 3:
An interesting thing I found was that, if I retype the command
A = zeros(ComplexF64,(243,243,4000));
the Task Manager tells me that the cached memory was released and the total memory usage was again only 4 GB.
Question 1:
What's going on with memory management in Julia? Is it just Windows misreporting, or is it something in Julia? How can I check Julia's memory usage instantly?
Question 2:
How can I tell Julia to release the memory immediately?
Question 3:
Is there a way to tell the JLD package not to use that extra 4 GB of memory?
(Better yet, could someone tell me how to create A directly on the disk without even creating it in memory? I know there's memory-mapped I/O in the JLD package. I have tried it, but it seemed to require me to create the matrix A in memory and save it onto the disk first, before I could recall the memory-mapped A again.)
This is a long question, so thanks ahead!
Julia uses a garbage collector to deallocate memory. A garbage collector usually does not run after every line of code, but only when needed.
Try to force garbage collection by running the command:
GC.gc()
This releases memory space for unreferenced Julia objects. In this way you can check whether the memory actually has been released.
Side note: JLD used to be somewhat not-always-working (I do not know the current status). Hence your first consideration for non-cross-platform object persistence should always be the serialize function from the built-in Serialization package - check the documentation at https://docs.julialang.org/en/v1/stdlib/Serialization/index.html#Serialization.serialize

Understanding go tool compile and link commands

I understand that there are three steps involved in converting a high level language code to machine language or executable code namely - compiling, assembling, and linking.
As per the Go docs, go tool compile does the following:
It then writes a single object file named for the basename of the first source file with a .o suffix
So the final object file must contain the machine code (after compiling and assembling are run) for each file. If I pass -S to go tool compile on the Go file, it shows the assembly language Go generates.
After this, when I run go tool link on the object file, it must link all required object files (if there are multiple) and then generate the final machine code (based on GOOS and GOARCH). It generates a file a.out.
I have a few basic questions here:
How do I know which variables will be allocated on the stack and which on the heap, and when? Does it matter if I generate an executable for one machine and run it on another with a different architecture?
My test program
package main

func f(a *int, b *int) *int {
    var c = (*a + *b);
    var d = c;
    return &d;
}

func main() {
    var a = 2;
    var b = 6;
    f(&a,&b);
}
Result of go tool compile -m -l test.go
test.go:6: moved to heap: d
test.go:7: &d escapes to heap
test.go:3: f a does not escape
test.go:3: f b does not escape
test.go:14: main &a does not escape
test.go:14: main &b does not escape
In which step is the memory getting allocated?
It depends: some during linking, some during compiling, most during runtime (and some during loading).
How do I know which variables and when are they going to be allocated memory in stack and heap?
Ad hoc, you do not know at all. The compiler decides this. If the compiler can prove that a variable does not escape, it might keep it on the stack. Google for "golang escape analysis". There is a flag, -m, which makes the compiler output its decisions if you are interested.
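As an illustration (a sketch; exact diagnostics vary between compiler versions), a variant of the f function above that returns the sum by value instead of by pointer gives the compiler no reason to move anything to the heap, so go tool compile -m -l should report no "moved to heap" line for it:

package main

// g returns the sum by value, so the local never needs to outlive
// the call and the compiler can keep it on the stack.
func g(a *int, b *int) int {
    var c = *a + *b
    return c
}

func main() {
    var a = 2
    var b = 6
    _ = g(&a, &b)
}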
Does it matter if I generate executable for one machine and run on another with different architecture?
No, but simply because this does not work at all: Executables are tied to architecture and won't run on a different one.
It seems you are mixing up compilation/linking and memory allocation. The latter is quite different from the former two. (Technically, your linked program may contain statically allocated memory, and during loading it might get even more, but this is highly technical, architecture specific, and really nothing to be concerned about.)

CUDA-like workflow for OpenCL

The typical example workflow for OpenCL programming seems to be focused on source code within strings, passed to the JIT compiler, then finally enqueued (with a specific kernel name); and the compilation results can be cached - but that's left for you the programmer to take care of.
In CUDA, the code is compiled in a non-JIT way to object files (alongside host-side code, but forget about that for a second), and then one just refers to device-side functions in the context of an enqueue or arguments etc.
Now, I'd like to have the second kind of workflow, but with OpenCL sources. That is, suppose I have some C host-side code my_app.c, and some OpenCL kernel code in a separate file, my_kernel.cl (which for the purpose of discussion is self-contained). I would like to be able to run a magic command on my_kernel.cl, get a my_kernel.whatever, link or faux-link that together with my_app.o, and get a binary. Now, in my_app.c I want to be able to somehow to refer to the kernel, even if it's not an extern symbol, as compiled OpenCL program (or program + kernel name) - and not get compilation errors.
Is this supported somehow? With nVIDIA's ICD or with one of the other ICDs? If not, is at least some of this supported, say, the magic kernel compiler + generation of an extra header or source stub to use in compiling my_app.c?
Look into SYCL; it offers single-source C++ OpenCL. However, it is not yet available on every platform.
https://www.khronos.org/sycl
There is already an ongoing effort that enables a CUDA-like workflow in TensorFlow; it uses SYCL 1.2 and is actively upstreamed.
Similarly to CUDA, SYCL's approach needs the following steps:
device registration via a device factory (the device is called SYCL) - done here: https://github.com/lukeiwanski/tensorflow/tree/master/tensorflow/core/common_runtime/sycl
operation registration for the above device. In order to create / port an operation you can either:
re-use Eigen's code, since the Tensor module has a SYCL back-end (look here: https://github.com/lukeiwanski/tensorflow/blob/opencl/adjustcontrastv2/tensorflow/core/kernels/adjust_contrast_op.cc#L416 - we just partially specialize the operation for the SYCL device and call the already implemented functor https://github.com/lukeiwanski/tensorflow/blob/opencl/adjustcontrastv2/tensorflow/core/kernels/adjust_contrast_op.h#L91);
or write SYCL code - this has been done for FillPhiloxRandom - see https://github.com/lukeiwanski/tensorflow/blob/master/tensorflow/core/kernels/random_op.cc#L685
A SYCL kernel uses modern C++.
You can also use OpenCL interoperability, thanks to which you can write pure OpenCL C kernel code! - I think this bit is most relevant to you.
The workflow is a bit different, as you do not have to do an explicit instantiation of the functor templates as CUDA does (https://github.com/lukeiwanski/tensorflow/blob/master/tensorflow/core/kernels/adjust_contrast_op_gpu.cu.cc) or add any .cu.cc file (in fact you do not have to add any new files, which avoids a mess with the build system).
There is also this issue: https://github.com/lukeiwanski/tensorflow/issues/89.
TL;DR: CUDA can create "persistent" pointers, while OpenCL needs to go through Buffers and Accessors.
Codeplay's SYCL compiler (ComputeCpp) at the moment requires OpenCL 1.2 with the SPIR extension - supported platforms are Intel CPUs, Intel GPUs (Beignet work in progress), and AMD GPUs (although with older drivers) - additional platforms are coming!
Setup instructions can be found here: https://www.codeplay.com/portal/03-30-17-setting-up-tensorflow-with-opencl-using-sycl
Our effort can be tracked in my fork of TensorFlow: https://github.com/lukeiwanski/tensorflow (branch dev/eigen_mehdi)
The Eigen fork used is: https://bitbucket.org/mehdi_goli/opencl (branch default)
We are getting there! Contributions are welcome! :)
