How to demonstrate memory visibility problems in Go?

I'm giving a presentation about the Go Memory Model. The memory model says that without a happens-before relationship between a write in one goroutine, and a read in another goroutine, there is no guarantee that the reader will observe the change.
To have a bigger impact on the audience, instead of just telling them that bad things can happen if you don't synchronize, I'd like to show them.
When I run the below code on my machine (2017 MacBook Pro with 3.5GHz dual-core Intel Core i7), it exits successfully.
Is there anything I can do to demonstrate the memory visibility issues?
For example are there any specific changes to the following values I could make to demonstrate the issue:
use different compiler settings
use an older version of Go
run on a different operating system
run on different hardware (such as ARM or a machine with multiple NUMA nodes).
For example, in Java the -server and -client flags affect the optimizations the JVM performs and can lead to visibility issues surfacing.
I'm aware that the answer may be no, and that the spec may have been written to give future maintainers more flexibility with optimization. I'm aware I can make the code never exit by setting GOMAXPROCS=1 but that doesn't demonstrate visibility issues.
package main

var a string
var done bool

func setup() {
	a = "hello, world"
	done = true
}

func main() {
	go setup()
	for !done {
	}
	print(a)
}
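For reference, a minimal sketch of the synchronized variant of the same program, using a channel to establish the happens-before edge the memory model requires (a channel close happens before the corresponding receive completes):

package main

var a string
var done = make(chan struct{})

func setup() {
	a = "hello, world"
	close(done) // this close happens before the receive in main
}

func main() {
	go setup()
	<-done // happens-before edge: the write to a is visible from here on
	print(a)
}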

Related

How to understand the relation between uintptr and struct?

I have seen code like the following:
func str2bytes(s string) []byte {
	x := (*[2]uintptr)(unsafe.Pointer(&s))
	h := [3]uintptr{x[0], x[1], x[1]} // data pointer, len, and len reused as cap
	return *(*[]byte)(unsafe.Pointer(&h))
}
This function converts a string to []byte without copying the underlying data.
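As an aside, on Go 1.20 and later the same zero-copy conversion can be spelled with the dedicated helpers instead of hand-building the header (still unsafe: the resulting slice must never be written to, since string data is immutable):

func str2bytesModern(s string) []byte {
	// unsafe.StringData returns a pointer to the string's bytes;
	// unsafe.Slice builds a []byte of the same length over them.
	return unsafe.Slice(unsafe.StringData(s), len(s))
}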
I tried to convert Num to ReverseNum:
type Num struct {
	name  int8
	value int8
}

type ReverseNum struct {
	value int8
	name  int8
}

func main() {
	n := Num{100, 10}
	z := (*[2]uintptr)(unsafe.Pointer(&n))
	h := [2]uintptr{z[1], z[0]}
	fmt.Println(*(*ReverseNum)(unsafe.Pointer(&h))) // prints {0 0}
}
This code doesn't produce the result I want. Can anybody tell me why?
That's too complicated. A simpler approach:
package main

import (
	"fmt"
	"unsafe"
)

type Num struct {
	name  int8
	value int8
}

type ReverseNum struct {
	value int8
	name  int8
}

func main() {
	n := Num{name: 42, value: 12}
	p := (*ReverseNum)(unsafe.Pointer(&n))
	fmt.Println(p.value, p.name)
}
outputs "42 12".
But the real question is why on Earth would you want to go for such trickery instead of copying two freaking bytes which is done instantly on any sensible CPU Go programs run on?
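For completeness, the boring copy might look like this (a sketch, reusing the Num and ReverseNum types from above; no unsafe involved):

func toReverse(n Num) ReverseNum {
	// A plain field-by-field copy of two bytes; trivially cheap.
	return ReverseNum{value: n.value, name: n.name}
}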
Another problem with your approach is that, IIUC, nothing in the Go language specification guarantees that two types with seemingly identical fields must have identical memory layouts. I believe they do on most implementations, but I do not think they are required to.
Also consider that seemingly innocuous things, such as adding an extra field (even of type struct{}!) to your data type, may do interesting things to the memory layout of variables of that type, so it may be outright dangerous to assume you may reinterpret the memory of Go variables however you want.
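If you want to see what the compiler actually does to a layout, unsafe.Sizeof and unsafe.Offsetof report it directly. A small sketch (the printed numbers are implementation-dependent, which is exactly the point):

package main

import (
	"fmt"
	"unsafe"
)

type Plain struct {
	name  int8
	value int8
}

type Padded struct {
	name  int8
	value int8
	_     struct{} // a zero-size trailing field may still affect the size
}

func main() {
	var p Plain
	var q Padded
	fmt.Println(unsafe.Sizeof(p), unsafe.Offsetof(p.value))
	fmt.Println(unsafe.Sizeof(q), unsafe.Offsetof(q.value))
}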
... I just want to learn about the principle behind the package unsafe.
It's an escape hatch.
All strongly-typed but compiled languages have a basic problem: the actual machines on which the compiled programs will run do not have the same typing system as the compiler.1 That is, the machine itself probably has a linear address space where bytes are assembled into machine words that are grouped into pages, and so on. The operating system may also provide access at, say, page granularity: if you need more memory, the OS will give you one page—4096 bytes, or 8192 bytes, or 65536 bytes, or whatever the page size is—of additional memory at a time.
There are many ways to attack this problem. For instance, one can write code directly in machine (or assembly) language, using the hardware's instruction set, to talk to the OS to achieve OS-level things. This code can then talk to the compiled program, acting as the go-between. If the compiled program needs to allocate a 40-byte data structure, this machine-level code can figure out how to do that within the strictures of the OS's page-size allocations.
But writing machine code is difficult and time-consuming. That's precisely why we have high-level languages and compilers in the first place. What if we had a way to, within the high-level language, violate the normal rules imposed by the language? By violating specific requirements in specific ways, carefully coordinating those ways with all other code that also violates those requirements, we can, in code we keep away from the usual application programming, write much of our memory-management, process-management, and so on in our high-level language.
In other words, we can use unsafe (or something similar in other languages) to deliberately break the type-safety provided by our high level language. When we do this—when we break the rules—we must know what all the rules are, and that our specific violations here will function correctly when combined with all the normal code that does obey the normal rules and when combined with all the special, unsafe code that breaks the rules.
This often requires help from the compiler itself. If you inspect the runtime source distributed with Go, you will find routines with annotations like go:noescape, go:noinline, go:nosplit, and go:nowritebarrier. You need to know when and why these are required if you are going to make much use of some of the escape-hatch programming.
A few of the simpler uses, such as tricks to gain access to string or slice headers, are ... well, they are still unsafe, but they are unsafe in more-predictable ways and do not require this kind of close coordination with the compiler itself.
To understand how, when, and why they work, you need to understand how the compiler and runtime allocate and work with strings and slices, and in some cases, how memory is laid out on the hardware, and some of the rules about Go's garbage collector. In particular, the GC code is aware of unsafe.Pointer but not of uintptr. Much of this is pretty tricky: see, e.g., https://utcc.utoronto.ca/~cks/space/blog/programming/GoUintptrVsUnsafePointer and the link to https://github.com/golang/go/issues/19135, in which writing nil to a Go pointer value caused Go's garbage collector to complain, because the write caused the GC to inspect the previously stored value, which was invalid.
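As an illustration of that uintptr pitfall, here is a sketch of the pattern the unsafe package documents: the conversion to uintptr and back must happen within a single expression, so the garbage collector never sees a bare integer as the only reference to the object:

package main

import (
	"fmt"
	"unsafe"
)

type pair struct {
	a, b int32
}

func main() {
	v := pair{1, 2}
	p := unsafe.Pointer(&v)

	// Valid: the uintptr arithmetic and the conversion back to
	// unsafe.Pointer happen in one expression.
	bp := (*int32)(unsafe.Pointer(uintptr(p) + unsafe.Offsetof(v.b)))
	fmt.Println(*bp) // 2

	// Invalid (do not do this): storing the uintptr in a variable first.
	//   u := uintptr(p)
	//   bp = (*int32)(unsafe.Pointer(u + unsafe.Offsetof(v.b)))
	// Between those two lines the GC sees only an integer, not a pointer.
}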
1See this Wikipedia article on the Intel 432 for a notable attempt at designing hardware to run compiled high level languages. There have been others in the past as well, often with the same fate, though some IBM projects have been more successful.

Why does the golang.org/x/sys package encourage the use of the syscall package it's meant to replace?

I have read some Go code making use of syscall for low-level interaction with the underlying OS (e.g. Linux or Windows).
I wanted to use the same package for native Windows development, but its documentation says it's deprecated in favor of golang.org/x/sys:
$ go doc syscall
package syscall // import "syscall"
Package syscall contains an interface to the low-level operating system
primitives.
...
Deprecated: this package is locked down. Callers should use the
corresponding package in the golang.org/x/sys repository instead. That is
also where updates required by new systems or versions should be applied.
See https://golang.org/s/go1.4-syscall for more information.
Now, reading the documentation for golang.org/x/sys and inspecting its code, I see that it relies heavily on, and encourages the use of, the syscall package:
https://github.com/golang/sys/blob/master/windows/svc/example/beep.go
package main

import (
	"syscall"
)

var (
	beepFunc = syscall.MustLoadDLL("user32.dll").MustFindProc("MessageBeep")
)

func beep() {
	beepFunc.Call(0xffffffff)
}
and
https://godoc.org/golang.org/x/sys/windows#example-LoadLibrary
...
r, _, _ := syscall.Syscall(uintptr(proc), 0, 0, 0, 0)
...
Why does golang.org/x/sys rely on and encourage the use of the package it's meant to replace?
Disclaimer: I'm pretty new to Go specifically (though not to low-level OS programming). Still, the path here seems clear.
Go, as an ecosystem—not just the language itself, but all the various libraries as well—tries1 to be portable. But direct system calls are pretty much not portable at all. So there is some tension here automatically.
In order to do anything useful, the Go runtime needs various services from the operating system, such as creating OS-level threads, sending and receiving signals, opening files and network connections, and so on. Many of these operations can be, and have been, abstracted away from how it is done on operating systems A, B, and C to generic concepts supported by most or all OSes. These abstractions build on the actual mechanisms in the various OSes.
They may even do this in layers internally. A look at the Go source for the os package, for instance, shows file.go, file_plan9.go, file_posix.go, file_unix.go, and file_windows.go source files. The top of file_posix.go shows a +build directive:
// Copyright 2009 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.
// +build aix darwin dragonfly freebsd js,wasm linux nacl netbsd openbsd solaris windows
Clearly this code itself is not completely portable, but the routines it implements for os, which are wrapped by the os.File abstraction, suffice for all POSIX-conformant systems. That reduces the amount of code that has to go into the Unix/Linux-specific file_unix.go file, for instance.
To the extent that OS-level operations can be wrapped into more-abstract, more-portable operations, then, the various built-in Go packages do this. You don't need to know whether there's a different system call for opening a device-file vs a text-file vs a binary-file, for instance, or a long pathname vs a short one: you just call os.Create or os.Open and it does any work necessary behind the scenes.
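For example, the same call works unchanged on every supported OS, with the platform-specific system calls chosen behind the scenes. A trivial sketch:

package main

import (
	"fmt"
	"os"
)

func main() {
	// os.Open maps to open(2) on Unix-like systems and to CreateFile on
	// Windows; the caller never sees the difference.
	f, err := os.Open("example.txt")
	if err != nil {
		fmt.Println(err)
		return
	}
	defer f.Close()
	fmt.Println("opened", f.Name())
}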
This whole idea just doesn't fly with system calls. A Linux system call to create a new UID namespace has no Windows equivalent.2 A Windows WaitForMultipleObjects system call has no real equivalent on Linux. The low level details of a stat/lstat call differ from one system to another, and so on.
In early versions of Go, there was some attempt to paper over this with the syscall package. But the link you quoted—https://golang.org/s/go1.4-syscall—describes this attempt as, if not failed, at least overstretched. The last word in the "problems" section is "issues".
The proposal at this same link says that the syscall package is to be frozen (or mostly frozen) as of Go 1.4: don't put new features into it. But the features that are in it are sufficient to implement the new, experimental golang.org/x/sys/* packages, or at least some of them. There's no harm in the experimental package borrowing the existing, formally-deprecated syscall package if that does what the experimental new package needs.
Things in golang.org/x/ are experimental: feel free to use them, but be aware that there are no compatibility promises across version updates, unlike things in the standard packages. So, to answer the last line of your question:
Why does golang.org/x/sys rely on and encourage the use of the package it's meant to replace?
It relies on syscall because that's fine. It doesn't "encourage the use of" syscall at all though. It just uses it when that's sufficient. Should that become insufficient, for whatever reason, it will stop relying on it.
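For instance, the beep example above could equally be written against golang.org/x/sys/windows itself, with no direct syscall import. A sketch (Windows-only, same DLL and function as before):

package main

import "golang.org/x/sys/windows"

// NewLazySystemDLL defers loading user32.dll until the first call.
var beepFunc = windows.NewLazySystemDLL("user32.dll").NewProc("MessageBeep")

func main() {
	// 0xffffffff asks for a simple beep, as in the original example.
	beepFunc.Call(0xffffffff)
}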
Answering a question you didn't ask (but I did): suppose you want Unix-specific stat information about a file, such as its inode number. You have a choice:
info, err := os.Stat(path) // or os.Lstat(path), etc
if err != nil { ... handle error ... }
raw, ok := info.Sys().(*syscall.Stat_t)
if !ok { ... do whatever is appropriate ... }
inodeNumber := raw.Ino
or:
var info unix.Stat_t
err := unix.Stat(path, &info) // or unix.Lstat, etc
if err != nil { ... handle error ... }
inodeNumber := info.Ino
The advantage to the first block of code is that you get all the other (portable) information about the file—its mode and size and time-stamps, for instance. You maybe do, maybe don't get the inode number; the !ok case tells you whether you did. The primary disadvantage here is that it takes more code to do this.
The advantage to the second block of code is that it says just what you mean. You either get all the information from the stat call, or none of it. The disadvantages are obvious:
it only works on Unix-ish systems, and
it uses an experimental package, whose behavior might change.
So it's up to you which of these matters more to you.
1Either this is a metaphor, or I've just anthropomorphized this. There's an old rule: Don't anthropomorphize computers, they hate that!
2A Linux UID namespace maps from UIDs inside a container to UIDs outside the container. That is, inside the container, a file might be owned by UID 1234. If the file is in a file system that is also mounted outside the container, that file can be owned by a different owner, perhaps 5678. Changing the ownership on either "side" of the container makes the change in that side's namespace; the change shows up on the other side as the result of mapping, or reverse-mapping, the ID through the namespace mapping.
(This same trick also works for NFS UID mappings, for instance. The Docker container example above is just one use, but probably the most notable one these days.)

Actual memory used by web server. Memory not released to OS

I push a bunch of requests through the web server, and according to htop / Activity Monitor on my Mac, VIRT is 530G and RES is 247MB.
The memory doesn't seem to be released to the OS. As a test, I tried adding the following to force memory to be returned to the OS, to no avail:
func freeMem() {
	tick := time.Tick(time.Second * 10)
	for range tick {
		debug.FreeOSMemory()
	}
}
and at the top of main, calling go freeMem(), but this seems to have no effect.
So I tried checking that the garbage collector is working properly, visualising it with Dave Cheney's gcvis: https://github.com/davecheney/gcvis
gcvis shows things are working fine, but htop and Activity Monitor still report very high memory usage.
Do I have anything to worry about? One thing I did notice in gcvis, is that whilst gc.heapinuse goes down to acceptable levels, scvg.released and scvg.sys seem to remain high.
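One way to see what the runtime itself thinks it holds, rather than what htop reports, is runtime.ReadMemStats. A minimal sketch:

package main

import (
	"fmt"
	"runtime"
	"time"
)

func main() {
	for {
		var m runtime.MemStats
		runtime.ReadMemStats(&m)
		// HeapInuse: bytes in in-use heap spans; HeapReleased: bytes
		// returned to the OS; HeapSys: bytes obtained from the OS.
		fmt.Printf("inuse=%d released=%d sys=%d\n",
			m.HeapInuse, m.HeapReleased, m.HeapSys)
		time.Sleep(10 * time.Second)
	}
}

Note that even after the runtime releases pages (typically via madvise), the OS may reclaim them lazily, so RES in htop can stay high until there is actual memory pressure.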

Stimulate code-inlining

Unlike languages like C++, where you can explicitly mark a function inline, in Go the compiler automatically detects functions that are candidates for inlining (C++ compilers do this too, but Go offers no explicit keyword on top of it). There's also a debug option to see inlining decisions as they happen, yet very little is documented online about the exact logic the Go compiler(s) use.
Let's say I need to rerun some big loop over a set of data every n-th period:
func Encrypt(password []byte) ([]byte, error) {
	return bcrypt.GenerateFromPassword(password, 13)
}

for id, data := range someDataSet {
	newPassword, _ := Encrypt([]byte("generatedSomething"))
	data["password"] = newPassword
	someSaveCall(id, data)
}
Aiming, for example, for Encrypt to be inlined properly, what logic do I need to take into consideration for the compiler?
I know from C++ that passing by reference increases the likelihood of automatic inlining without the explicit inline keyword, but it's not easy to understand what exactly the Go compiler does to decide whether to inline or not. Scripting languages like PHP, for example, suffer immensely if you run a loop calling a tiny addSomething($a, $b): benchmarked over a billion iterations, its cost versus an inline $a + $b is almost ridiculous.
Until you have performance problems, you shouldn't care. Inlined or not, it will do the same.
If performance does matter and it makes a noticeable and significant difference, then don't rely on current (or past) inlining conditions; "inline" it yourself (do not put it in a separate function).
The rules can be found in the $GOROOT/src/cmd/compile/internal/inline/inl.go file. You may control its aggressiveness with the 'l' debug flag.
// The inlining facility makes 2 passes: first caninl determines which
// functions are suitable for inlining, and for those that are it
// saves a copy of the body. Then InlineCalls walks each function body to
// expand calls to inlinable functions.
//
// The Debug.l flag controls the aggressiveness. Note that main() swaps level 0 and 1,
// making 1 the default and -l disable. Additional levels (beyond -l) may be buggy and
// are not supported.
// 0: disabled
// 1: 80-nodes leaf functions, oneliners, panic, lazy typechecking (default)
// 2: (unassigned)
// 3: (unassigned)
// 4: allow non-leaf functions
//
// At some point this may get another default and become switch-offable with -N.
//
// The -d typcheckinl flag enables early typechecking of all imported bodies,
// which is useful to flush out bugs.
//
// The Debug.m flag enables diagnostic output. a single -m is useful for verifying
// which calls get inlined or not, more is for debugging, and may go away at any point.
Also check out Dave Cheney's blog post Five things that make Go fast (2014-06-07), which discusses inlining (it's a long post; the relevant part is about in the middle, search for the word "inline").
There's also an interesting discussion about inlining improvements (maybe Go 1.9?): cmd/compile: improve inlining cost model #17566
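To see the decisions on your own code rather than guessing, you can ask the compiler to report them with the -m flag. A quick sketch (the exact message wording varies between Go versions):

package main

// add is a tiny leaf function, a typical inlining candidate.
func add(a, b int) int { return a + b }

func main() {
	println(add(1, 2))
}

// Build with inlining diagnostics enabled:
//   go build -gcflags=-m main.go
// Typical output includes "can inline add" and, at the call site,
// "inlining call to add".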
Better still, don’t guess, measure!
You should trust the compiler and avoid trying to guess its inner workings as it will change from one version to the next.
There are far too many tricks the compiler, the CPU or the cache can play to be able to predict performance from source code.
What if inlining makes your code bigger to the point that it no longer fits in the instruction cache, making it much slower than the non-inlined version? Cache locality can have a much bigger impact on performance than branching.

Desktop Duplication and C++ AMP incompatible?

I want to capture and compress the screen on the GPU. C++ AMP and DXGI Desktop Duplication each work individually, but don't seem to work together.
Example:
This project works great, but adding minimal C++ AMP code near the top of DesktopDuplication.cpp makes it fail:
#include <amp.h>
//void f() { Concurrency::direct3d::create_accelerator_view( nullptr ); }
//void f() { Concurrency::accelerator default_acc; }
void f() { Concurrency::accelerator::get_all(); }
Even though f() is never called, m_Factory->CreateSwapChainForHwnd(...) returns E_ACCESSDENIED. (The commented versions of f() produce the same result.)
In my own project, IDXGIOutput1::DuplicateOutput() returns DXGI_ERROR_UNSUPPORTED when I attempt to use C++ AMP.
What's going on?
Update: In the NVIDIA Control Panel, changing the "Preferred graphics processor" to "Integrated graphics" works. (But, using the NVIDIA card is much preferred.)
MSDN does not state this as a mandatory requirement; however, it still suggests that you don't use the API across a multithreaded environment:
An application can use IDXGIOutputDuplication on a separate thread to receive the desktop images and to feed them into their specific image-processing pipeline.
That is, it is suggested that you have a single capture loop tied to a specific thread, and beyond that you are free to leverage multithreading of sorts to speed up processing.
