Golang proxy io.Writer behaves differently when used in a log.Logger

I am trying to implement a proxy that satisfies io.Writer, so I can plug it into a logger. The idea is that it will print the output as normal but also keep a copy of the data to be read later.
The ProxyIO struct in the following code should do this, and indeed it does, as long as I call its Write() method directly. However, when I plug it into a log.Logger instance, the output is unexpected.
(This is stripped-down code; the original implementation I want to use has a map and a circular pointer instead of the [][]byte buf used here. I also removed all the locking.)
package main
import (
	"fmt"
	"io"
	"io/ioutil"
	"log"
)
type ProxyIO struct {
	out io.Writer // the io we are proxying
	buf [][]byte
}
func newProxyIO(out io.Writer) *ProxyIO {
	return &ProxyIO{
		out: out,
		buf: [][]byte{},
	}
}
func (r *ProxyIO) Write(s []byte) (int, error) {
	r.out.Write(s)
	r.buf = append(r.buf, s)
	return len(s), nil
}
func main() {
	p := newProxyIO(ioutil.Discard)
	p.Write([]byte("test1\n"))
	p.Write([]byte("test2\n"))
	p.Write([]byte("test3\n"))
	l := log.New(p, "", 0)
	l.Print("test4")
	l.Print("test5")
	l.Print("test6")
	for i, e := range p.buf {
		fmt.Printf("%d: %s", i, e)
	}
}
(Here is the code on the playground https://play.golang.org/p/UoOq4Nd-rmI)
I would expect the following output from this code:
0: test1
1: test2
2: test3
3: test4
4: test5
5: test6
However, it will always print this:
0: test1
1: test2
2: test3
3: test6
4: test6
5: test6
The behaviour with my map implementation is the same. I also tried using a doubly linked list from container/list as storage; it's always the same. So I must be missing something substantial here.
Why am I seeing the last log output three times in the buffer instead of the last three lines of log output?

If you look at the source code for Logger.Print, you'll see it calls l.Output. Output formats the message into a byte buffer that the Logger reuses from call to call (l.buf in the source at the time) and then calls Write with that buffer.
If you read this answer, you'll see that even though everything is passed by value, "when you pass a slice to a function, a copy will be made from this header, including the pointer, which will point to the same backing array."
So when you do:
l.Print("test4")
l.Print("test5")
l.Print("test6")
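Logger is effectively reusing the same slice: every call overwrites the same backing array, and you're appending a reference to that one array three times. By the time you print, it holds the most recent message, so "test6" shows up three times. This is also why the io.Writer documentation says that implementations must not retain p.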
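The aliasing can be reproduced without a logger at all. The sketch below (aliasDemo is a hypothetical name) roughly mimics a writer being handed one reused buffer: it overwrites buf in place each time and keeps either the slice it was given or a copy of it.

```go
package main

import "fmt"

// aliasDemo roughly mimics a logger that reuses one byte buffer for every
// message: buf is overwritten in place, and we keep what we were handed.
func aliasDemo() (retained, copied []string) {
	var buf []byte
	var kept, safe [][]byte
	for _, msg := range []string{"test4", "test5", "test6"} {
		buf = append(buf[:0], msg...)                    // reuse the same backing array
		kept = append(kept, buf)                         // keeps a header to the shared array
		safe = append(safe, append([]byte(nil), buf...)) // keeps an independent copy
	}
	for i := range kept {
		retained = append(retained, string(kept[i]))
		copied = append(copied, string(safe[i]))
	}
	return retained, copied
}

func main() {
	retained, copied := aliasDemo()
	fmt.Println(retained) // [test6 test6 test6]
	fmt.Println(copied)   // [test4 test5 test6]
}
```

All three retained headers point at the same array, so they all read back the last message, exactly like p.buf in the question.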
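To fix this, store a copy of the []byte instead of the slice itself:
func (r *ProxyIO) Write(s []byte) (int, error) {
	// The caller (here, log.Logger) may reuse s after Write returns,
	// so keep an independent copy rather than s itself.
	c := make([]byte, len(s))
	copy(c, s)
	r.buf = append(r.buf, c)
	// Forward to the proxied writer and report its result.
	return r.out.Write(s)
}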
Updated playground: https://play.golang.org/p/DIWC1Xa6w0R
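If you only need the stored data as text, another option is to store strings: converting a []byte to a string copies the bytes, so nothing aliases the Logger's buffer. A sketch, assuming Go 1.16+ for io.Discard; lineRecorder and record are hypothetical names, not part of the original code:

```go
package main

import (
	"fmt"
	"io"
	"log"
)

// lineRecorder is a hypothetical variant of ProxyIO that keeps each write
// as a string. The []byte-to-string conversion copies the bytes, so the
// recorder never retains the caller's slice.
type lineRecorder struct {
	out   io.Writer
	lines []string
}

func (r *lineRecorder) Write(p []byte) (int, error) {
	r.lines = append(r.lines, string(p)) // copy before the caller reuses p
	return r.out.Write(p)                // report the proxied writer's result
}

// record logs three messages through a lineRecorder and returns what it kept.
func record() []string {
	r := &lineRecorder{out: io.Discard}
	l := log.New(r, "", 0)
	l.Print("test4")
	l.Print("test5")
	l.Print("test6")
	return r.lines
}

func main() {
	for i, line := range record() {
		fmt.Printf("%d: %s", i, line) // each line already ends in "\n"
	}
}
```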

Related

Byte slice converted with unsafe from string changes its address

I have this function to convert a string to a byte slice without copying:
func StringToByteUnsafe(s string) []byte {
	strh := (*reflect.StringHeader)(unsafe.Pointer(&s))
	var sh reflect.SliceHeader
	sh.Data = strh.Data
	sh.Len = strh.Len
	sh.Cap = strh.Len
	return *(*[]byte)(unsafe.Pointer(&sh))
}
That works fine, but a very specific setup produces very strange behavior:
The setup is here: https://github.com/leviska/go-unsafe-gc/blob/main/pkg/pkg_test.go
What happens:
Create a byte slice
Convert it into a temporary (rvalue) string, then use unsafe to convert that back into a byte slice
Then, copy this slice (by reference)
Then, do something with the second slice inside a goroutine
Print the pointers before and after
And I get this output on my Linux Mint laptop with Go 1.16:
go test ./pkg -v -count=1
=== RUN TestSomething
0xc000046720 123 0xc000046720 123
0xc000076f20 123 0xc000046721 z
--- PASS: TestSomething (0.84s)
PASS
ok github.com/leviska/go-unsafe-gc/pkg 0.847s
So, the first slice magically changes its address, while the second doesn't.
If we remove the goroutine and call runtime.GC() instead (and maybe play with the code a little), we can get both pointers to change (to the same value).
If we change the unsafe cast to a plain []byte() conversion, everything works without the addresses changing. Also, if we change it to the unsafe cast from https://stackoverflow.com/a/66218124/5516391, everything works as well:
func StringToByteUnsafe(str string) []byte { // this works fine
	var buf = *(*[]byte)(unsafe.Pointer(&str))
	(*reflect.SliceHeader)(unsafe.Pointer(&buf)).Cap = len(str)
	return buf
}
I ran it with GOGC=off and got the same result. I ran it with -race and got no errors.
If you run this as a main package with a main function, it seems to work correctly, and likewise if you remove the Convert function. My guess is that the compiler optimizes things in these cases.
So, I have several questions about this:
What the hell is happening? It looks like some weird undefined behavior.
Why and how does the Go runtime magically change the address of the variable?
Why, in the non-concurrent case, can it change both addresses, while in the concurrent case it can't?
What's the difference between this unsafe cast and the cast from the Stack Overflow answer? Why does that one work?
Or is this just a compiler bug?
Here is a copy of the full code from GitHub; put it in some package and run it as a test:
import (
	"fmt"
	"reflect"
	"sync"
	"testing"
	"unsafe"
)
func StringToByteUnsafe(s string) []byte {
	strh := (*reflect.StringHeader)(unsafe.Pointer(&s))
	var sh reflect.SliceHeader
	sh.Data = strh.Data
	sh.Len = strh.Len
	sh.Cap = strh.Len
	return *(*[]byte)(unsafe.Pointer(&sh))
}
func Convert(s []byte) []byte {
	return StringToByteUnsafe(string(s))
}
type T struct {
	S []byte
}
func Copy(s []byte) T {
	return T{S: s}
}
func Mid(a []byte, b []byte) []byte {
	fmt.Printf("%p %s %p %s\n", a, a, b, b)
	wg := sync.WaitGroup{}
	wg.Add(1)
	go func() {
		b = b[1:2]
		wg.Done()
	}()
	wg.Wait()
	fmt.Printf("%p %s %p %s\n", a, a, b, b)
	return b
}
func TestSomething(t *testing.T) {
	str := "123"
	a := Convert([]byte(str))
	b := Copy(a)
	Mid(a, b.S)
}
Answer from the GitHub issue https://github.com/golang/go/issues/47247:
The backing store of a is allocated on stack, because it does not escape. And goroutine stacks can move dynamically. b, on the other hand, escapes to heap, because it is passed to another goroutine. In general, we don't assume the address of an object don't change. This works as intended.
And my version is incorrect because it uses reflect.SliceHeader as plain struct. You can run go vet on it, and go vet will warn you.
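Since Go 1.20, the unsafe package provides StringData and Slice, which build the slice from a real pointer instead of a hand-filled reflect.SliceHeader whose Data field is a plain uintptr. As far as we understand it, keeping the pointer typed means the compiler can still track where the data flows, which is what the SliceHeader version hid. A minimal sketch, assuming Go 1.20 or later; stringToBytes is a hypothetical name:

```go
package main

import (
	"fmt"
	"unsafe"
)

// stringToBytes returns a []byte that shares memory with s (requires Go 1.20+).
// The result must be treated as read-only: Go strings are immutable, so the
// bytes returned by unsafe.StringData must not be modified.
func stringToBytes(s string) []byte {
	return unsafe.Slice(unsafe.StringData(s), len(s))
}

func main() {
	b := stringToBytes("123")
	fmt.Printf("%s len=%d cap=%d\n", b, len(b), cap(b)) // 123 len=3 cap=3
}
```

If the bytes need to be modified or kept beyond the string's intended use, a plain []byte(s) conversion (which copies) remains the safe choice.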

Using reflect to update value by reference when argument is not a pointer in go

I've had difficulty learning the basics of reflect, pointers and interfaces in Go, so here's another entry-level question I can't seem to figure out.
This code does what I want it to do - I'm using reflect to add another record to a slice that's typed as an interface.
package main
import (
	"log"
	"reflect"
)
type Person struct {
	Name string
}
func Add(slice interface{}) {
	s := reflect.ValueOf(slice).Elem()
	// in my actual code, p is declared via the use of reflect.New([Type])
	p := Person{Name: "Sam"}
	s.Set(reflect.Append(s, reflect.ValueOf(p)))
}
func main() {
	p := []Person{}
	Add(&p)
	log.Println(p)
}
If I change the Add and main functions to this, things don't work the way I want them to:
func Add(slice interface{}) {
	s := reflect.ValueOf(&slice).Elem()
	p := Person{Name: "Sam"}
	s.Set(reflect.Append(reflect.ValueOf(slice), reflect.ValueOf(p)))
	log.Println(s)
}
func main() {
	p := []Person{}
	Add(p)
	log.Println(p)
}
That is, the log.Println(p) at the end doesn't show a slice with the record Sam in it like the way I had hoped. So my question is whether it's possible for me to have Add() receive a slice that is not a pointer, and for me to still write some code in Add() that will produce the outcome shown in my first scenario?
A lot of my recent questions dance around this kind of subject, so it's still taking me a while to figure out how to use the reflect package effectively.
No, it's not possible to append to a slice in a function without passing in a pointer to the slice. This isn't related to reflection, but to how variables are passed into functions. Here's the same code, modified to not use reflection:
package main
import (
	"log"
)
type Person struct {
	Name string
}
func AddWithPtr(slicep interface{}) {
	sp := slicep.(*[]Person)
	// This modifies p1 itself, since *sp IS p1
	*sp = append(*sp, Person{"Sam"})
}
func Add(slice interface{}) {
	// s is now a copy of p2
	s := slice.([]Person)
	sp := &s
	// This modifies a copy of p2 (i.e. s), not p2 itself
	*sp = append(*sp, Person{"Sam"})
}
func main() {
	p1 := []Person{}
	// This passes a reference to p1
	AddWithPtr(&p1)
	log.Println("Add with pointer: ", p1)
	p2 := []Person{}
	// This passes a copy of p2
	Add(p2)
	log.Println("Add without pointer:", p2)
}
(Above, a "copy" of the slice means a copy of the slice header, not a copy of the underlying data.)
When you pass in a slice, the function effectively gets a new slice that refers to the same data as the original. Appending to the slice in the function increases the length of the new slice, but doesn't change the length of the original slice that was passed in. That's why the original slice remains unchanged.
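The difference between changing elements and appending can be seen without reflect. In this small sketch (the function names are hypothetical), an element change made through the copied header is visible to the caller, an append is not, and returning the new slice, as the built-in append does, is the usual alternative to passing a pointer:

```go
package main

import "fmt"

type Person struct {
	Name string
}

// renameFirst writes through its copy of the slice header; the caller sees
// the change because both headers share one backing array.
func renameFirst(s []Person) {
	s[0].Name = "Renamed"
}

// appendSam only changes the length of its local header, so the caller's
// slice keeps its old length.
func appendSam(s []Person) {
	s = append(s, Person{Name: "Sam"})
	fmt.Println("inside appendSam:", len(s))
}

// appendSamReturn follows the built-in append convention: return the new
// header and let the caller assign it.
func appendSamReturn(s []Person) []Person {
	return append(s, Person{Name: "Sam"})
}

func main() {
	people := []Person{{Name: "Ann"}}
	renameFirst(people)
	appendSam(people)
	fmt.Println(people) // [{Renamed}]
	people = appendSamReturn(people)
	fmt.Println(people) // [{Renamed} {Sam}]
}
```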

Golang bytes.Buffer - Pass-by-value issue

The Go code below (go1.10.2) gives unexpected output:
package main
import (
	"bytes"
	"fmt"
)
func main() {
	var b bytes.Buffer
	//Commenting the below line will fix the problem
	b.WriteString("aas-")
	fmt.Printf("Before Calling - \"%s\"\n", b.String())
	b = makeMeMad(b)
	fmt.Printf("FinalValue - \"%s\"\n", b.String())
}
func makeMeMad(b bytes.Buffer) bytes.Buffer {
	b.WriteString("xcxxcx asdasdas dasdsd asdasdasdasd")
	fmt.Printf("Write More - \"%s\"\n", b.String())
	/*
		//This will fix the problem
		var newBuffer bytes.Buffer
		newBuffer.WriteString(b.String())
		return newBuffer
	*/
	return b
}
Output
Before Calling - "aas-"
Write More - "aas-xcxxcx asdasdas dasdsd asdasdasdasd"
FinalValue - "aas- "
I was expecting "aas-xcxxcx asdasdas dasdsd asdasdasdasd" in the last line of output. Could anyone please explain?
Under the hood, bytes.Buffer (in Go 1.10, the version used here) contains, among other unexported fields, a bootstrap array and a buf slice. While the buffer content is small, buf points into the internal bootstrap array to avoid an allocation. When you pass a bytes.Buffer as a value, the function receives a copy of the whole struct. A slice is a reference type, so the copy's buf still points at the original buffer's bootstrap array. When you write through the copy, you actually write into the original's bootstrap array, while the copy's own bootstrap array stays unchanged ("aas-" in this case). You then return the copy, and printing it works. But when you assign it back to the variable holding the original, the copy's bootstrap array ("aas-" followed by zero bytes) is copied over the original's array first, and buf, still pointing at that array with the longer length, now shows "aas-" plus zero bytes.
The bootstrap array is [64]byte, so if you use a string literal longer than 64 bytes in your snippet, the buffer allocates a separate buf slice and things work as expected.
Also, here is a simplified example trying to show why all this looks so counterintuitive.
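As a rough model of the mechanism (not the real bytes.Buffer, whose fields are unexported and whose layout differs between Go versions), here is a hypothetical smallBuf type with the same shape: a slice that starts out pointing into the struct's own array. modify plays the role of makeMeMad.

```go
package main

import "fmt"

// smallBuf is a hypothetical, simplified model of the Go 1.10 bytes.Buffer
// layout: buf initially points into the struct's own bootstrap array.
type smallBuf struct {
	buf       []byte
	bootstrap [8]byte
}

// write appends s, using the bootstrap array as storage while it fits.
func (b *smallBuf) write(s string) {
	if b.buf == nil {
		b.buf = b.bootstrap[:0]
	}
	b.buf = append(b.buf, s...)
}

// modify mimics makeMeMad: it receives a copy of the struct by value.
func modify(b smallBuf) smallBuf {
	b.write("cd") // lands in the CALLER's bootstrap array, via the copied slice header
	return b
}

func demo() (inside, after string) {
	var a smallBuf
	a.write("ab")
	c := modify(a)
	inside = string(c.buf) // "abcd": the copy sees all the data
	a = c                  // overwrites a.bootstrap with the copy's stale "ab\x00\x00..."
	after = string(a.buf)  // a.buf still points at a.bootstrap, now "ab\x00\x00"
	return inside, after
}

func main() {
	inside, after := demo()
	fmt.Printf("%q %q\n", inside, after) // "abcd" "ab\x00\x00"
}
```

The trailing zero bytes play the same role as the blank tail in "aas- " from the question.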
It is mentioned in the Go FAQ:
If an interface value contains a pointer *T, a method call can obtain a value by dereferencing the pointer, but if an interface value contains a value T, there is no useful way for a method call to obtain a pointer.
Even in cases where the compiler could take the address of a value to pass to the method, if the method modifies the value the changes will be lost in the caller. As an example, if the Write method of bytes.Buffer used a value receiver rather than a pointer, this code:
var buf bytes.Buffer
io.Copy(buf, os.Stdin)
would copy standard input into a copy of buf, not into buf itself.
The problem is that makeMeMad receives a copy of the buffer rather than its address, which is why it doesn't update the original buffer in main.
Pass the address of the buffer instead, so the string is appended to the existing buffer value:
package main
import (
	"bytes"
	"fmt"
)
func main() {
	var b bytes.Buffer
	b.WriteString("aas-")
	fmt.Printf("Before Calling - \"%s\"\n", b.String())
	makeMeMad(&b) // pass a pointer so makeMeMad writes to this buffer
	fmt.Printf("FinalValue - \"%s\"\n", b.String())
}
func makeMeMad(b *bytes.Buffer) {
	b.WriteString("xcxxcx asdasdas dasdsd asdasdasdasd")
	fmt.Printf("Write More - \"%s\"\n", b.String())
}
Or you can assign the returned buffer value to a new variable and you will get the updated buffer value.
package main
import (
	"bytes"
	"fmt"
)
func main() {
	var b bytes.Buffer
	b.WriteString("aas-")
	fmt.Printf("Before Calling - \"%s\"\n", b.String())
	ab := makeMeMad(b) // keep the returned buffer in a new variable; don't assign it back to b
	fmt.Printf("FinalValue - \"%s\"\n", ab.String())
}
func makeMeMad(b bytes.Buffer) bytes.Buffer {
	b.WriteString("xcxxcx asdasdas dasdsd asdasdasdasd")
	fmt.Printf("Write More - \"%s\"\n", b.String())
	return b
}
Or you can create a global buffer, so that every function that writes to it changes the same value:
package main
import (
	"bytes"
	"fmt"
)
var b bytes.Buffer // package-level buffer shared by all functions
func main() {
	b.WriteString("aas-")
	fmt.Printf("Before Calling - \"%s\"\n", b.String())
	makeMeMad()
	fmt.Printf("FinalValue - \"%s\"\n", b.String())
}
func makeMeMad() {
	// writes directly to the package-level b; no copy is made
	b.WriteString("xcxxcx asdasdas dasdsd asdasdasdasd")
	fmt.Printf("Write More - \"%s\"\n", b.String())
}

Golang interface benefits

I have read a lot about interfaces and I think I understand how they work. I read about the interface{} type and use it to take a function argument. That is clear. My question (and what I don't understand) is what the benefit of using it is. It is possible I didn't get it entirely, but for example I have this:
package main
import (
	"fmt"
)
func PrintAll(vals []interface{}) {
	for _, val := range vals {
		fmt.Println(val)
	}
}
func main() {
	names := []string{"stanley", "david", "oscar"}
	vals := make([]interface{}, len(names))
	for i, v := range names {
		vals[i] = v
	}
	PrintAll(vals)
}
Why is it better than this:
package main
import (
	"fmt"
)
func PrintAll(vals []string) {
	for _, val := range vals {
		fmt.Println(val)
	}
}
func main() {
	names := []string{"stanley", "david", "oscar"}
	PrintAll(names)
}
If you always want to print string values, then the first version using []interface{} is not better at all; it's worse, as you lose some compile-time checking: it won't warn you if you pass a slice that contains values other than strings.
If you want to print values other than strings, then the second with []string wouldn't even compile.
For example the first also handles this:
PrintAll([]interface{}{"one", 2, 3.3})
While the 2nd would give you a compile-time error:
cannot use []interface {} literal (type []interface {}) as type []string in argument to PrintAll
The 2nd gives you a compile-time guarantee that only a slice of type []string is passed; attempting to pass anything else results in a compile-time error.
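What []interface{} does buy you is one function that handles mixed element types, with the checking moved from compile time to run time. A sketch using a type switch (describe is a hypothetical name, not from the question):

```go
package main

import "fmt"

// describe accepts elements of any type and inspects each one's dynamic
// type at run time; the compiler can no longer check what is passed in.
func describe(vals []interface{}) []string {
	out := make([]string, 0, len(vals))
	for _, v := range vals {
		switch x := v.(type) {
		case string:
			out = append(out, "string "+x)
		case int:
			out = append(out, fmt.Sprintf("int %d", x))
		case float64:
			out = append(out, fmt.Sprintf("float64 %g", x))
		default:
			out = append(out, fmt.Sprintf("other %v", x))
		}
	}
	return out
}

func main() {
	for _, s := range describe([]interface{}{"one", 2, 3.3}) {
		fmt.Println(s)
	}
}
```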
Also see related question: Why are interfaces needed in Golang?

io.MultiWriter vs. golang's pass-by-value

I'd like to create a situation where everything sent to a particular log.Logger is also appended to a particular variable's array of strings.
The variable's type implements the io.Writer interface so it should be easy to add that via io.MultiWriter to log.New(), but I seem to have run into an intractable problem: the io.Writer interface is fixed and it's impossible for the variable to reference itself given golang's pass-by-value.
Maybe it will make more sense with an example:
package main
import "fmt"
import "io"
import "log"
import "os"
import "strings"
var Log *log.Logger
type Job_Result struct {
	Job_ID int64
	// other stuff
	Log_Lines []string
}
// satisfies io.Writer interface
func (jr Job_Result) Write(p []byte) (n int, err error) {
	s := strings.TrimRight(string(p), "\n ")
	jr.Log_Lines = append(jr.Log_Lines, s)
	return len(s), nil
}
func (jr Job_Result) Dump() {
	fmt.Println("\nHere is a dump of the job result log lines:")
	for n, s := range jr.Log_Lines {
		fmt.Printf("\tline %d: %s\n", n, s)
	}
}
func main() {
	// make a Job_Result
	var jr Job_Result
	jr.Job_ID = 123
	jr.Log_Lines = make([]string, 0)
	// create an io.MultiWriter that points to both stdout
	// and that Job_Result var
	var writers io.Writer
	writers = io.MultiWriter(os.Stdout, jr)
	Log = log.New(writers,
		"",
		log.Ldate|log.Ltime|log.Lshortfile)
	// send some stuff to the log
	Log.Println("program starting")
	Log.Println("something happened")
	Log.Printf("last thing that happened, should be %drd line\n", 3)
	jr.Dump()
}
This is the output, which is not surprising:
2016/07/28 07:20:07 testjob.go:43: program starting
2016/07/28 07:20:07 testjob.go:44: something happened
2016/07/28 07:20:07 testjob.go:45: last thing that happened, should be 3rd line
Here is a dump of the job result log lines:
I understand the problem: Write() is getting a copy of the Job_Result variable, so it dutifully appends, and then the copy vanishes because it's local. I should pass it a pointer to the Job_Result... but I'm not the one calling Write(); it's done by the Logger, and I can't change that.
I thought this was a simple solution for capturing log output into an array (and there is other subscribe/unsubscribe stuff I didn't show), but it all comes down to this problematic io.Writer interface.
Pilot error? Bad design? Something I'm not grokking? Thanks for any advice.
Redefine the Write method so that it has a pointer receiver:
// satisfies io.Writer interface
func (jr *Job_Result) Write(p []byte) (n int, err error) {
	s := strings.TrimRight(string(p), "\n ")
	jr.Log_Lines = append(jr.Log_Lines, s)
	// report all of p as written; returning len(s) would be a short write
	return len(p), nil
}
Then initialize jr as a pointer:
jr := new(Job_Result) // makes a pointer.
The rest stays as is. This way, *Job_Result implements io.Writer and doesn't lose state.
The Go tutorial already says that when a method modifies its receiver, you should probably use a pointer receiver, or the changes may be lost. Working with a pointer instead of the actual value has little downside when you want to make sure there is exactly one object. (And yes, technically it isn't an object.)

Resources