runtime._ExternalCode CPU usage is too high, up to 80% - Go

I wrote a TCP handler in Go that serves about 300 connections per second. The program ran fine when it was first released to production, but after running for about 10 days the CPU usage climbed to 100%. I used "go tool pprof" to capture a CPU profile:
File: gateway-w
Type: cpu
Time: Nov 7, 2018 at 5:38pm (CST)
Duration: 30.14s, Total samples = 30.13s ( 100%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 27.42s, 91.01% of 30.13s total
Dropped 95 nodes (cum <= 0.15s)
Showing top 10 nodes out of 28
flat flat% sum% cum cum%
24.69s 81.94% 81.94% 24.69s 81.94% runtime._ExternalCode /usr/local/go/src/runtime/proc.go
0.57s 1.89% 83.84% 0.57s 1.89% runtime.lock /usr/local/go/src/runtime/lock_futex.go
0.56s 1.86% 85.70% 0.56s 1.86% runtime.unlock /usr/local/go/src/runtime/lock_futex.go
0.26s 0.86% 86.56% 5.37s 17.82% gateway-w/connect/connect-tcp.tcpStartSession /go/src/gateway-w/connect/connect-tcp/tcp_framework.go
0.25s 0.83% 87.39% 1.67s 5.54% net.(*conn).Read /usr/local/go/src/net/net.go
0.24s 0.8% 88.18% 1.41s 4.68% net.(*netFD).Read /usr/local/go/src/net/fd_unix.go
0.23s 0.76% 88.95% 0.23s 0.76% runtime.nanotime /usr/local/go/src/runtime/sys_linux_amd64.s
0.22s 0.73% 89.68% 0.22s 0.73% internal/poll.(*fdMutex).incref /usr/local/go/src/internal/poll/fd_mutex.go
0.21s 0.7% 90.38% 0.21s 0.7% internal/poll.(*fdMutex).rwunlock /usr/local/go/src/internal/poll/fd_mutex.go
0.19s 0.63% 91.01% 0.19s 0.63% internal/poll.(*fdMutex).rwlock /usr/local/go/src/internal/poll/fd_mutex.go
My TCP handler code looks like this:
func tcpStartSession(conn net.Conn) {
    defer closeTcp(conn)

    var (
        last, n int
        err     error
        buff    []byte
    )
    last, n, err, buff = 0, 0, nil, make([]byte, MAX_PACKET_LEN)

    for {
        // set read timeout
        conn.SetReadDeadline(time.Now().Add(time.Duration(tcpTimeOutSec) * time.Second))
        n, err = conn.Read(buff[last:])
        if err != nil {
            log.Info("tcp read error maybe timeout , ", err)
            break
        }
        if n == 0 {
            log.Debug("empty packet, continue")
            continue
        }
        log.Debug("read bytes ", n)
        log.Info("get a raw package:", hex.EncodeToString(buff[:last+n]))
        last += n
        ...
        for {
            if last == 0 {
                break
            }
            ret, err := protoHandle.IsWhole(buff[:last])
            if err != nil {
                log.Warn("proto handle check iswhole error", err)
            }
            log.Debug("rest buffer len = %d\n", ret)
            if ret < 0 {
                // wait for more tcp fragment.
                break
            }
            packetLen := last - ret
            packetBuf := make([]byte, packetLen)
            copy(packetBuf, buff[:packetLen])
            last = ret
            if last > 0 {
                copy(buff, buff[packetLen:packetLen+last])
            }
            ...
        }
    }
}
I can't understand what runtime._ExternalCode means; it is a function inside the Go runtime.
My Go version is: go version go1.9.2 linux/amd64
My program runs in Docker.
My Docker version is 1.12.6.
I hope someone can help me. Thank you very much!
I tried upgrading Go to 1.10.3. After running for more than half a year there was no problem, but recently the same issue occurred even though I have not changed the program code. I suspect this line:
conn.SetReadDeadline(time.Now().Add(time.Duration(tcpTimeOutSec) * time.Second))
Need your help, thank you.

Since you have confirmed that your program was not built with CGO_ENABLED=0, the problem is probably in the cgo (C) parts of the program: pprof cannot profile inside C libraries, so that time is attributed to runtime._ExternalCode. I believe a few other things also count as "external" code, such as time.Now on some systems.
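The question's own follow-up suspicion points at the per-read SetReadDeadline call, which takes a fresh time.Now() reading on every loop iteration. As a hedged experiment only (not a confirmed fix), you could refresh the deadline less often and watch whether the runtime._ExternalCode share drops. A minimal sketch, with the timeout parameter standing in for tcpTimeOutSec from the question:

package tcphandler // hypothetical package, not the asker's gateway-w code

import (
    "net"
    "time"
)

// readLoop refreshes the read deadline at most once per second instead of
// before every Read, so far fewer time.Now()/SetReadDeadline calls are made
// per connection. Error handling and buffer parsing are placeholders.
func readLoop(conn net.Conn, timeout time.Duration) error {
    buf := make([]byte, 4096)
    var lastRefresh time.Time
    for {
        if now := time.Now(); now.Sub(lastRefresh) >= time.Second {
            if err := conn.SetReadDeadline(now.Add(timeout)); err != nil {
                return err
            }
            lastRefresh = now
        }
        n, err := conn.Read(buf)
        if err != nil {
            return err // includes read timeouts
        }
        _ = buf[:n] // ... hand the bytes to the protocol handler here ...
    }
}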

Related

Size control on logging an unknown length of parameters

The Problem:
Right now I'm logging my SQL query and the args related to that query, but what happens if the args weigh a lot, say 100MB?
The Solution:
I want to iterate over the args, and once they exceed 0.5MB I want to take the args up to that point and log only those (of course I'll still use the entire args set in the actual SQL query).
Where I'm stuck:
I find it hard to determine the on-disk size of an interface{}.
How can I print it? (Is there a nicer way to do it than %v?)
The concern is mainly the first point: how can I find the size? I would need to know the type, whether it's an array, on the stack or on the heap, etc.
If code helps, here is my code structure (everything sits in the dal package, in a util file):
package dal

import (
    "fmt"
)

const limitedLogArgsSizeB = 100000 // ~ 0.1MB

func parsedArgs(args ...interface{}) string {
    currentSize := 0
    var res string
    for i := 0; i < len(args); i++ {
        currentEleSize := getSizeOfElement(args[i])
        if !(currentSize+currentEleSize <= limitedLogArgsSizeB) {
            break
        }
        currentSize += currentEleSize
        res = fmt.Sprintf("%s, %v", res, args[i])
    }
    return "[" + res + "]"
}

func getSizeOfElement(ele interface{}) (sizeInBytes int) {
    // this is the part I don't know how to implement
}
So, as you can see, I expect to get back from parsedArgs() a string that looks like:
"[4378233, 33, true]"
For completeness, the query that goes with it:
INSERT INTO Person (id,age,is_healthy) VALUES ($0,$1,$2)
So, to demonstrate the point of all of this:
let's say the first two args together come exactly to the size-limit threshold I want to log; then parsedArgs() will return only the first two args as a string, like this:
"[4378233, 33]"
I can provide further details upon request. Thanks :)
Getting the memory size of arbitrary values (arbitrary data structures) is not impossible but "hard" in Go. For details, see How to get memory size of variable in Go?
The easiest solution could be to produce the data to be logged in memory, and you can simply truncate it before logging (e.g. if it's a string or a byte slice, simply slice it). This is however not the gentlest solution (slower and requires more memory).
Instead I would achieve what you want differently. I would try to assemble the data to be logged, but I would use a special io.Writer as the target (which may be targeted at your disk or at an in-memory buffer) which keeps track of the bytes written to it, and once a limit is reached, it could discard further data (or report an error, whatever suits you).
You can see a counting io.Writer implementation here: Size in bits of object encoded to JSON?
type CounterWr struct {
    io.Writer
    Count int
}

func (cw *CounterWr) Write(p []byte) (n int, err error) {
    n, err = cw.Writer.Write(p)
    cw.Count += n
    return
}
We can easily change it to become a functional limited-writer:
type LimitWriter struct {
    io.Writer
    Remaining int
}

func (lw *LimitWriter) Write(p []byte) (n int, err error) {
    if lw.Remaining == 0 {
        return 0, io.EOF
    }
    if lw.Remaining < len(p) {
        p = p[:lw.Remaining]
    }
    n, err = lw.Writer.Write(p)
    lw.Remaining -= n
    return
}
And you can use the fmt.FprintXXX() functions to write into a value of this LimitWriter.
An example writing to an in-memory buffer:
buf := &bytes.Buffer{}
lw := &LimitWriter{
    Writer:    buf,
    Remaining: 20,
}

args := []interface{}{1, 2, "Looooooooooooong"}
fmt.Fprint(lw, args)
fmt.Printf("%d %q", buf.Len(), buf)
This will output (try it on the Go Playground):
20 "[1 2 Looooooooooooon"
As you can see, our LimitWriter only allowed 20 bytes (LimitWriter.Remaining) to be written, and the rest was discarded.
Note that in this example I assembled the data in an in-memory buffer, but in your logging system you can write directly to your logging stream, just wrap it in LimitWriter (so you can completely omit the in-memory buffer).
Optimization tip: if you have the arguments as a slice, you may optimize the truncated rendering by using a loop, and stop printing arguments once the limit is reached.
An example doing this:
buf := &bytes.Buffer{}
lw := &LimitWriter{
    Writer:    buf,
    Remaining: 20,
}

args := []interface{}{1, 2, "Loooooooooooooooong", 3, 4, 5}

io.WriteString(lw, "[")
for i, v := range args {
    if _, err := fmt.Fprint(lw, v, " "); err != nil {
        fmt.Printf("Breaking at argument %d, err: %v\n", i, err)
        break
    }
}
io.WriteString(lw, "]")

fmt.Printf("%d %q", buf.Len(), buf)
Output (try it on the Go Playground):
Breaking at argument 3, err: EOF
20 "[1 2 Loooooooooooooo"
The good thing about this is that once we reach the limit, we don't have to produce the string representation of the remaining arguments that would be discarded anyway, saving some CPU (and memory) resources.
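For completeness, here is a sketch (not part of the original answer) of how this approach could be folded back into the question's parsedArgs signature; the LimitWriter type from above is repeated so the snippet compiles on its own, and limitedLogArgsSizeB is the constant from the question:

package dal

import (
    "bytes"
    "fmt"
    "io"
)

const limitedLogArgsSizeB = 100000 // ~0.1MB, as in the question

// LimitWriter is the limited writer from the answer, repeated here so the
// sketch is self-contained.
type LimitWriter struct {
    io.Writer
    Remaining int
}

func (lw *LimitWriter) Write(p []byte) (n int, err error) {
    if lw.Remaining == 0 {
        return 0, io.EOF
    }
    if lw.Remaining < len(p) {
        p = p[:lw.Remaining]
    }
    n, err = lw.Writer.Write(p)
    lw.Remaining -= n
    return
}

// parsedArgs renders at most limitedLogArgsSizeB bytes of the arguments and
// stops formatting as soon as the limit is reached.
func parsedArgs(args ...interface{}) string {
    buf := &bytes.Buffer{}
    lw := &LimitWriter{Writer: buf, Remaining: limitedLogArgsSizeB}
    io.WriteString(lw, "[")
    for i, v := range args {
        if i > 0 {
            io.WriteString(lw, ", ")
        }
        if _, err := fmt.Fprint(lw, v); err != nil {
            break // limit reached; remaining args are skipped
        }
    }
    io.WriteString(buf, "]") // close the bracket even if output was truncated
    return buf.String()
}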

Why does time.Now().UnixNano() return the same result after an IO operation?

I use time.Now().UnixNano() to calculate the execution time of some part of my code, but I noticed something interesting: the elapsed time is sometimes zero after an IO operation! What's wrong with it?
The code runs on Go 1.11 and uses the standard "time" library. The Redis library is "github.com/mediocregopher/radix.v2/redis" and the Redis server version is 3.2. I'm running this on Windows, with the VSCode editor.
isGatherTimeStat = false
if rand.Intn(100) < globalConfig.TimeStatProbability { // Here I set TimeStatProbability to 100
    isGatherTimeStat = true
}
if isGatherTimeStat {
    timestampNano = time.Now()
}

globalLogger.Info("time %d", time.Now().UnixNano())
resp := t.redisConn.Cmd("llen", "log_system")
globalLogger.Info("time %d", time.Now().UnixNano())

if isGatherTimeStat {
    currentTimeStat.time = time.Since(timestampNano).Nanoseconds()
    currentTimeStat.name = "redis_llen"
    globalLogger.Info("redis_llen time sub == %d", currentTimeStat.time)
    select {
    case t.chTimeStat <- currentTimeStat:
    default:
    }
}
Here are some logs:
[INFO ][2019-07-31][14:47:53] time 1564555673269444200
[INFO ][2019-07-31][14:47:53] time 1564555673269444200
[INFO ][2019-07-31][14:47:53] redis_llen time sub == 0
[INFO ][2019-07-31][14:47:58] time 1564555678267691700
[INFO ][2019-07-31][14:47:58] time 1564555678270689300
[INFO ][2019-07-31][14:47:58] redis_llen time sub == 2997600
[INFO ][2019-07-31][14:48:03] time 1564555683268195600
[INFO ][2019-07-31][14:48:03] time 1564555683268195600
[INFO ][2019-07-31][14:48:03] redis_llen time sub == 0
[INFO ][2019-07-31][14:48:08] time 1564555688267631100
[INFO ][2019-07-31][14:48:08] time 1564555688267631100
[INFO ][2019-07-31][14:48:08] redis_llen time sub == 0
There's nothing wrong with your code. On Windows, the system time is often only updated once every 10-15 ms or so, which means if you query the current time twice within this period, you get the same value.
Your operation sometimes yields t = 2997600ns = 3ms, which could explain this. Blame it on Windows.
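If you need a meaningful per-call figure despite the coarse clock, a common workaround (a sketch, not part of the original answer) is to time a batch of calls and divide:

package main

import (
    "fmt"
    "time"
)

// averageDuration times `iterations` calls of op as one block and divides,
// so a clock that only ticks every 10-15 ms still yields a usable average.
// op stands in for the Redis command from the question.
func averageDuration(op func(), iterations int) time.Duration {
    start := time.Now()
    for i := 0; i < iterations; i++ {
        op()
    }
    return time.Since(start) / time.Duration(iterations)
}

func main() {
    avg := averageDuration(func() { time.Sleep(time.Millisecond) }, 100)
    fmt.Println("average per call:", avg)
}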
Related questions:
How precise is Go's time, really?
How to determine the current Windows timer resolution?
Measuring time differences using System.currentTimeMillis()
time.Now() resolution under Windows has been improved in Go 1.16, see #8687 and CL #248699.
The timer resolution should now be around ~500 nanoseconds.
Test program:
package main

import (
    "fmt"
    "time"
)

func timediff() int64 {
    t0 := time.Now().UnixNano()
    for {
        t := time.Now().UnixNano()
        if t != t0 {
            return t - t0
        }
    }
}

func main() {
    var ds []int64
    for i := 0; i < 10; i++ {
        ds = append(ds, timediff())
    }
    fmt.Printf("%v nanoseconds\n", ds)
}
Test output:
[527400 39200 8400 528900 17000 16900 8300 506300 9700 34100] nanoseconds

When does netpoll() get called in Go?

Hi there, I'm new to Go and confused by the netpoll() function.
Here's the thing. When I start an HTTP server like this:
http.ListenAndServe("127.0.0.1:9988", nil)
as far as I understand, there should be a goroutine, a thread, or something else doing the epoll work to check for socket events. Since I'm testing on a Mac, the related runtime code is in "netpoll_kqueue.go". This function is called by sysmon(). In order to debug, I added some "println" calls to print the related information.
The println calls are in
netpoll_kqueue.go:
func netpoll(block bool) *g {
    if kq == -1 {
        return nil
    }
    var tp *timespec
    var ts timespec
    if !block {
        tp = &ts
    }
    var events [64]keventt
retry:
    n := kevent(kq, nil, 0, &events[0], int32(len(events)), tp)
    println("===============")
    if n < 0 {
        if n != -_EINTR {
            println("runtime: kevent on fd", kq, "failed with", -n)
            throw("runtime: netpoll failed")
        }
        goto retry
    }
and in sysmon() in proc.go:
    asmcgocall(*cgo_yield, nil)
}
// poll network if not polled for more than 10ms
lastpoll := int64(atomic.Load64(&sched.lastpoll))
now := nanotime()
println("+++++++++++++++++++++")
if netpollinited() && lastpoll != 0 && lastpoll+10*1000*1000 < now {
    atomic.Cas64(&sched.lastpoll, uint64(lastpoll), uint64(now))
    gp := netpoll(false) // non-blocking - returns list of goroutines
    if gp != nil {
        // Need to decrement number of idle locked M's
        // (pretending that one more is running) before i
As mentioned before, as far as I understand, netpoll() should be called frequently. However, both "======" and "++++++" are printed only once, and only when I send a request. This confuses me a lot: if netpoll() is not the function that calls the system poller, and sysmon() is not the function that triggers netpoll(), then who does that job?
Appreciate your help.

Random drop in performance

I'm kind of a newbie in Go, and something confused me recently.
I have a piece of code (a simplified version is posted below) and I was trying to measure its performance. I did this in two ways: 1) a benchmark with the testing package, 2) manually logging the time.
Running the benchmark outputs a result
30000 55603 ns/op
which is fine, BUT... when I do 30k runs of the same function, logging the time for each iteration, I get output like this:
test took 0 ns
test took 0 ns
... ~10 records all the same
test took 1000100 ns
test took 0 ns
test took 0 ns
... lots of zeroes again
test took 0 ns
test took 1000000 ns
test took 0 ns
...
Doing the math shows that the average is indeed 55603 ns/op just as the benchmark claims.
Ok, I said, I'm not that good at optimizing performance and not that into all the hardcore compiler stuff, but I guessed it might be random garbage collection? So I turned on the GC log, made sure it showed some output, then turned the GC off for good aaand... no garbage collection, but I see the same picture: some iterations take a million times longer(?).
It's 99% certain that my understanding of all this is wrong somewhere; maybe someone can point me in the right direction, or maybe someone knows for sure what the hell is going on? :)
P.S. Also, less than a nanosecond (0 ns) is somewhat surprising to me; that seems too fast, but the program does produce the result of the computation, so I don't know what to think anymore. T_T
EDIT 1: Answering Kenny Grant's question: I was using goroutines to implement a sort-of generator of values to get laziness; now I have removed them and simplified the code. The issue is much less frequent now, but it is still reproducible.
Playground link: https://play.golang.org/p/UQMgtT4Jrf
Interestingly, it does not happen on the playground, but it still happens on my machine.
EDIT 2: I'm running Go 1.9 on win7 x64
EDIT 3: Thanks to the responses, I now know that this code cannot possibly work properly on the playground. I will repost the code snippet here so that we don't lose it. :)
type PrefType string

var types []PrefType = []PrefType{
    "TYPE1", "TYPE2", "TYPE3", "TYPE4", "TYPE5", "TYPE6",
}

func GetKeys(key string) []string {
    var result []string
    for _, t := range types {
        rr := doCalculations(t)
        for _, k := range rr {
            result = append(result, key+"."+k)
        }
    }
    return result
}

func doCalculations(prefType PrefType) []string {
    return []string{string(prefType) + "something", string(prefType) + "else"}
}

func test() {
    start := time.Now()
    keysPrioritized := GetKeys("spec_key")
    for _, k := range keysPrioritized {
        _ = fmt.Sprint(k)
    }
    fmt.Printf("test took %v ns\n", time.Since(start).Nanoseconds())
}

func main() {
    for i := 0; i < 30000; i++ {
        test()
    }
}
Here is the output on my machine:
EDIT 4: I have tried the same on my laptop with Ubuntu 17.04, the output is reasonable, no zeroes and millions. Seems like a Windows-specific issue in the compiler/runtime lib. Would be great if someone can verify this on their machine (Win 7/8/10).
On Windows, for such a tiny duration, you don't have precise enough timestamps; Linux has more precise timestamps. By design, Go benchmarks run for at least one second. Go 1.9+ uses the monotonic (m) value to compute the duration.
On Windows:
timedur.go:
package main

import (
    "fmt"
    "os"
    "time"
)

type PrefType string

var types []PrefType = []PrefType{
    "TYPE1", "TYPE2", "TYPE3", "TYPE4", "TYPE5", "TYPE6",
}

func GetKeys(key string) []string {
    var result []string
    for _, t := range types {
        rr := doCalculations(t)
        for _, k := range rr {
            result = append(result, key+"."+k)
        }
    }
    return result
}

func doCalculations(prefType PrefType) []string {
    return []string{string(prefType) + "something", string(prefType) + "else"}
}

func test() {
    start := time.Now()
    keysPrioritized := GetKeys("spec_key")
    for _, k := range keysPrioritized {
        _ = fmt.Sprint(k)
    }
    end := time.Now()
    fmt.Printf("test took %v ns\n", time.Since(start).Nanoseconds())
    fmt.Println(start)
    fmt.Println(end)
    if end.Sub(start) < time.Microsecond {
        os.Exit(1)
    }
}

func main() {
    for i := 0; i < 30000; i++ {
        test()
    }
}
Output:
>go run timedur.go
test took 1026000 ns
2017-09-02 14:21:58.1488675 -0700 PDT m=+0.010003700
2017-09-02 14:21:58.1498935 -0700 PDT m=+0.011029700
test took 0 ns
2017-09-02 14:21:58.1538658 -0700 PDT m=+0.015002000
2017-09-02 14:21:58.1538658 -0700 PDT m=+0.015002000
exit status 1
>
On Linux:
Output:
$ go run timedur.go
test took 113641 ns
2017-09-02 14:52:02.917175333 +0000 UTC m=+0.001041249
2017-09-02 14:52:02.917287569 +0000 UTC m=+0.001153717
test took 23614 ns
2017-09-02 14:52:02.917600301 +0000 UTC m=+0.001466208
2017-09-02 14:52:02.917623585 +0000 UTC m=+0.001489354
test took 22814 ns
2017-09-02 14:52:02.917726364 +0000 UTC m=+0.001592236
2017-09-02 14:52:02.917748805 +0000 UTC m=+0.001614575
test took 21139 ns
2017-09-02 14:52:02.917818409 +0000 UTC m=+0.001684292
2017-09-02 14:52:02.917839184 +0000 UTC m=+0.001704954
test took 21478 ns
2017-09-02 14:52:02.917911899 +0000 UTC m=+0.001777712
2017-09-02 14:52:02.917932944 +0000 UTC m=+0.001798712
test took 31032 ns
<SNIP>
The results are comparable. They were run on the same machine, a dual-boot with Windows 10 and Ubuntu 16.04.
Best to eliminate GC, as logging it is obviously going to interfere with the timings. The time package on the playground is fake, so this won't work there. Trying it locally, I get no times of 0 ns with your code as supplied; it looks like it is working as intended.
You should of course expect some variation in times. When I try it, the results are all within the same order of magnitude (very small times of around 0.000003779 s), but there is an occasional blip even over 30 runs, sometimes up to double. Running timings at this resolution is unlikely to give you reliable results, as it depends on what else is running on the computer, on memory layout, etc. It's better to time long-running operations this way rather than very short ones like this, and to time lots of operations and average them; this is why the benchmark tool gives you an average over so many runs.
Since the timings are for operations taking very little time, and are not wildly different, I think this is normal behaviour with the code supplied. The 0 ns results are wrong, but probably the result of your previous use of goroutines; that's hard to judge without code, as the code you provided doesn't give that result.
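As a concrete illustration of the "time lots of operations and average them" advice, the same measurement can be expressed as a standard benchmark. This sketch assumes GetKeys from the question lives in the package under test and sits in a hypothetical getkeys_test.go file:

package main

import (
    "fmt"
    "testing"
)

// BenchmarkGetKeys lets the testing package choose b.N itself and report an
// averaged ns/op figure, which is not affected by coarse timer ticks.
func BenchmarkGetKeys(b *testing.B) {
    for i := 0; i < b.N; i++ {
        keys := GetKeys("spec_key")
        for _, k := range keys {
            _ = fmt.Sprint(k)
        }
    }
}

Run it with go test -bench GetKeys; the reported ns/op is the averaged figure the answers refer to.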

golang profile with pprof, how to get hit count not duration?

How do I get hit counts, like this:
(pprof) top
Total: 2525 samples
298 11.8% 11.8% 345 13.7% runtime.mapaccess1_fast64
268 10.6% 22.4% 2124 84.1% main.FindLoops
and not durations, like this:
(pprof) top
2220ms of 3080ms total (72.08%)
Dropped 72 nodes (cum <= 15.40ms)
Showing top 10 nodes out of 111 (cum >= 60ms)
flat flat% sum% cum cum%
1340ms 43.51% 43.51% 1410ms 45.78% runtime.cgocall_errno
Environment: I'm using Go 1.4 and added the code below.
defer pprof.StopCPUProfile()
f, err := os.Create("innercpu.pprof")
if err != nil {
    fmt.Println("Error: ", err)
}
pprof.StartCPUProfile(f)
You can use go tool pprof -callgrind -output callgrind.out innercpu.pprof to generate callgrind data from your collected profiling data, which you can then visualise with qcachegrind/kcachegrind. It displays call counts.
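As an aside, the snippet in the question defers pprof.StopCPUProfile() before the profile has been started; that happens to work because the defer only runs at function exit, but the conventional ordering is easier to follow. A minimal sketch (not from the answer):

package main

import (
    "fmt"
    "os"
    "runtime/pprof"
)

func main() {
    // Create the output file first, then start the profile, then defer Stop.
    f, err := os.Create("innercpu.pprof")
    if err != nil {
        fmt.Println("Error: ", err)
        return
    }
    defer f.Close()

    if err := pprof.StartCPUProfile(f); err != nil {
        fmt.Println("Error: ", err)
        return
    }
    defer pprof.StopCPUProfile()

    // ... the code you want to profile goes here ...
}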
