Golang profiling - top10 shows only one line with 100% - go

I am trying to profile my Go library to find out why it is so much slower than the equivalent code in C++.
I have a simple benchmark:
func BenchmarkFile(b *testing.B) {
	tmpFile, err := ioutil.TempFile("", TMP_FILE_PREFIX)
	if err != nil {
		b.Fatal(err)
	}
	fw, err := NewFile(tmpFile.Name())
	if err != nil {
		b.Fatal(err)
	}
	text := []byte("testing")
	for i := 0; i < b.N; i++ {
		_, err = fw.Write(text)
	}
	fw.Close()
}
NewFile returns my custom Writer, which encodes data into our binary representation, even compresses it, and writes it to the file system.
Running go test -bench . -memprofile mem.out -cpuprofile cpu.out, I get:
PASS
BenchmarkFile-16 2000000000 0.20 ns/op
ok .../writer/iowriter 9.074s
Then I analyse it:
# go tool pprof cpu.out
Entering interactive mode (type "help" for commands)
(pprof) top10
930ms of 930ms total ( 100%)
flat flat% sum% cum cum%
930ms 100% 100% 930ms 100%
(pprof)
I even tried writing an example.go app that uses my writer and adding pprof.StartCPUProfile(f), as shown in http://blog.golang.org/profiling-go-programs, but got the same result.
What am I doing wrong, and how can I determine the bottleneck of my library?
Thank you in advance.

OK, it was easy: I forgot to pass the binary to go tool pprof, so it has to be:
# go tool pprof write cpu.out
Entering interactive mode (type "help" for commands)
(pprof) top10
7.02s of 7.38s total (95.12%)
Dropped 14 nodes (cum <= 0.04s)
Showing top 10 nodes out of 32 (cum >= 0.19s)
flat flat% sum% cum cum%
6.55s 88.75% 88.75% 6.76s 91.60% syscall.Syscall
...
When using benchmark tests, the test binary is created there as well, and using it gives the same result.

To expand on sejvolnd's answer:
pprof needs the binary that actually generated the cpu.out file as its first argument.
So you need to run the command as go tool pprof <go binary of your program> <generated profiling output file>
e.g. go tool pprof go_binary cpu.pprof

Related

what happened when use '-race' flag in go build

I am confused by the following code. What is the difference between go run and go run -race? Does -race change the program's behavior?
// test.go
package main

import "fmt"

func main() {
	c := make(chan string)
	go func() {
		for i := 0; i < 2; i++ {
			c <- "hello there"
		}
	}()
	for msg := range c {
		fmt.Println(msg)
	}
}
With go run test.go, the result is:
hello there
hello there
fatal error: all goroutines are asleep - deadlock!
goroutine 1 [chan receive]:
main.main()
/Users/donghui6/go/src/jd.com/iaas-sre/test/test.go:14 +0xf4
exit status 2
With go run -race test.go, the program hangs as follows:
hello there
hello there
So, can anyone tell me what happens when the -race flag is used?
what happen[s] when use '-race' flag in go build
Then the program is built with the so-called "race detector" enabled. Data races are a programming error; any program with a data race is invalid and its behaviour is undefined. You must never write code with data races.
A data race occurs when two or more goroutines access the same memory concurrently, at least one of them writes, and there is no proper synchronisation. Data races happen, but they are a major fault of the programmer.
The race detector detects unsynchronised reads/writes to the same memory and reports them as a failure (which they are). Note that if the race detector detects a data race, your code is buggy even if it runs "properly" without -race.
The race detector is not always on because detecting data races slows down execution drastically.

How to print gc traces by embedding gctrace env with go binary?

I have a simple program that prints hello world. To emit GC trace information, I run the main.go file directly with the following command:
$ GODEBUG=gctrace=1 go run main.go
gc 1 @0.011s 1%: 0.020+1.3+0.056 ms clock, 0.24+0.62/0.38/1.6+0.67 ms cpu, 4->4->0 MB, 5 MB goal, 12 P
Hello Playground
gc 2 @0.021s 0%: 0.004+0.55+0.006 ms clock, 0.052+0.12/0.36/0.49+0.079 ms cpu, 4->4->0 MB, 5 MB goal, 12 P
Question
Is there a way to build the binary so that it prints the GC traces when run directly, without explicitly passing GODEBUG=gctrace=1 to it?
(OR)
How can I build a binary with GODEBUG=gctrace=1 without explicitly specifying it at runtime?
$ go build -o main // (builds binary main)
$ ./main // doesn't print gc trace information
Hello Playground
package main

import (
	"fmt"
)

func main() {
	fmt.Println("Hello, playground")
}
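One possible workaround, which is not from this thread and is only a sketch, is for the program to restart itself with GODEBUG already set. This assumes the runtime only picks up gctrace from the environment at process start. The withGCTrace and ensureGCTrace helpers are hypothetical names introduced for illustration.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// withGCTrace returns a GODEBUG value that includes gctrace=1.
// An existing gctrace setting (even gctrace=0) is left untouched,
// and other settings in current are preserved.
func withGCTrace(current string) string {
	if strings.Contains(current, "gctrace=") {
		return current
	}
	if current == "" {
		return "gctrace=1"
	}
	return current + ",gctrace=1"
}

// ensureGCTrace re-runs the current executable with gctrace enabled
// and exits with the child's status. If gctrace is already set, or the
// executable cannot be located, it returns and the process continues.
func ensureGCTrace() {
	current := os.Getenv("GODEBUG")
	updated := withGCTrace(current)
	if updated == current {
		return
	}
	exe, err := os.Executable()
	if err != nil {
		return
	}
	cmd := exec.Command(exe, os.Args[1:]...)
	// When a key is duplicated in Env, the last value is the one used.
	cmd.Env = append(os.Environ(), "GODEBUG="+updated)
	cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
	code := 0
	if err := cmd.Run(); err != nil {
		code = 1
		if exitErr, ok := err.(*exec.ExitError); ok {
			code = exitErr.ExitCode()
		}
	}
	os.Exit(code)
}

func main() {
	ensureGCTrace()
	fmt.Println("Hello, playground")
}
```

A program this small may finish before any garbage collection runs, in which case no trace lines appear even with gctrace enabled.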

CLI output : killing the process in the middle but no required Output

I am currently working on a project that requires me to send a curl request. The output of the request contains a link that expires in 10 seconds and starts a download. I am trying to send the request and download the file from the link in a Go program. However, the command that prints the link takes 10 seconds to complete, and after that I can no longer access the link. So I decided to kill the process early using the timeout command, which lets me download what I need. But when I try the same thing in Go, I cannot get the output that was printed before the process was killed.
I only get this output:
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  1320    0  1043  100   277     98     26  0:00:10  0:00:10 --:--:--    12
I use the following Go code:
cmd := exec.Command("chmod", "744", "end.sh")
out, err := cmd.Output()
cmd1 := exec.Command("./end.sh")
out, err = cmd1.CombinedOutput()
if err != nil {
	log.Fatalf("cmd.Run(endpoint) failed with %s\n", err)
}
fmt.Println(string(out))
This code calls a shell script.
timeout 5 curl XXXXXXXXX
So what can I do to get the output, by modifying either the shell script or the Go code?
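One thing worth checking, assuming the script exits with a non-zero status when timeout kills curl: CombinedOutput still returns whatever the command wrote before it exited, so the output can be printed before the error is handled. In the code above, log.Fatalf runs first and the captured output is never shown. The sketch below uses a hypothetical stand-in command (sh -c ...) instead of end.sh, and the runAndKeepOutput helper is an illustrative name.

```go
package main

import (
	"fmt"
	"os/exec"
)

// runAndKeepOutput runs a command and returns whatever it wrote to
// stdout and stderr together with the error, so the caller can still
// use the output when the command exits with a non-zero status.
func runAndKeepOutput(name string, args ...string) (string, error) {
	out, err := exec.Command(name, args...).CombinedOutput()
	return string(out), err
}

func main() {
	// Stand-in for end.sh: print something, then exit with status 124,
	// the status GNU timeout uses when the command times out.
	out, err := runAndKeepOutput("sh", "-c", "echo partial output; exit 124")
	fmt.Print(out) // print first, before deciding what to do about err
	if err != nil {
		fmt.Println("command failed:", err)
	}
}
```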

Go Benchmark how does it work

I've got my Go benchmark working with my API calls, but I'm not exactly sure what the output below means:
$ go test intapi -bench=. -benchmem -cover -v -cpuprofile=cpu.out
=== RUN TestAuthenticate
--- PASS: TestAuthenticate (0.00 seconds)
PASS
BenchmarkAuthenticate 20000 105010 ns/op 3199 B/op 49 allocs/op
coverage: 0.0% of statements
ok intapi 4.349s
How does it know how many calls it should make? I do have a loop with b.N as the loop bound, but how does Go know how many iterations to run?
Also, I now have a CPU profile file. How can I view it?
From TFM:
The benchmark function must run the target code b.N times. The benchmark package will vary b.N until the benchmark function lasts long enough to be timed reliably.
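The quoted behaviour can be observed outside go test with testing.Benchmark, which runs a benchmark function and reports the b.N it settled on. This is a minimal sketch; benchRepeat and the strings.Repeat workload are illustrative assumptions, not the API calls from the question.

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

// benchRepeat is an ordinary benchmark body: it runs the target code
// b.N times and leaves the choice of b.N to the testing package.
func benchRepeat(b *testing.B) {
	for i := 0; i < b.N; i++ {
		_ = strings.Repeat("x", 64)
	}
}

func main() {
	// testing.Benchmark calls benchRepeat several times with a growing
	// b.N until a run lasts long enough (one second by default).
	res := testing.Benchmark(benchRepeat)
	fmt.Println("iterations chosen:", res.N)
	fmt.Println("time per op:", res.NsPerOp(), "ns")
}
```

The first number in a line such as BenchmarkAuthenticate 20000 105010 ns/op is this final b.N.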

does anybody have a simple pprof use on a go-executable?

I have looked at the article about profiling Go programs, and I simply do not understand it. Does someone have a simple code example where the performance of a code snippet is logged to a text file by a profile "object"?
Here are the commands I use for a simple CPU and memory profiling to get you started.
Let's say you made a benchmark function like this :
File something_test.go :
func BenchmarkProfileMe(b *testing.B) {
	// execute the significant portion of the code you want to profile b.N times
}
In a shell script:
# -test.run XXX is a trick so you don't trigger other tests: it asks for a nonexistent test literally called XXX
# you can adapt the benchtime depending on the type of code you want to profile.
go test -v -bench ProfileMe -test.run XXX -cpuprofile cpu.pprof -memprofile mem.pprof -benchtime 10s
go tool pprof --text ./something.test cpu.pprof ## To get a CPU profile per function
go tool pprof --text ./something.test cpu.pprof --lines ## To get a CPU profile per line
go tool pprof --text ./something.test mem.pprof ## To get the memory profile
It will print the hottest spots in each case to the console.

Resources