I am running a program in Go which sends data continuously after reading a file /proc/stat.
Using ioutil.ReadFile("/proc/stat")
After running for about 14 hrs i got err: too many files open /proc/stat
Click here for snippet of code.
I doubt that defer f.Close is ignored by Go sometimes or it is skipping it.
The snippet of code (in case play.golang.org dies sooner than stackoverflow.com):
package main
import ("fmt";"io/ioutil")
func main() {
for {
fmt.Println("Hello, playground")
fData,err := ioutil.ReadFile("/proc/stat")
if err != nil {
fmt.Println("Err is ",err)
}
fmt.Println("FileData",string(fData))
}
}
The reason probably is that somewhere in your program:
you are forgetting to close files, or
you are leaning on the garbage collector to automatically close files on object finalization, but Go's conservative garbage collector fails to do so. In this case you should check your program's memory consumption (whether it is steadily increasing while the program is running).
In either case, try to check the contents of /proc/PID/fd to see whether the number of open files is increasing while the program is running.
If you are sure you Do the f.Close(),it still has the proble,Maybe it is because your other connection,for example the connection to MYSQL,also will be cause the problem,especially,in a loop,and you forget to close the connection.
Always do :
db.connection....
**defer db.Close()**
If it is in loop
loop
db.connection....
**defer db.Close()**
end
Do not put the db.connection before the loop
Related
Another question How to read/write from/to file using Go? got into safe closing of file descriptors in a comment.
Note that these examples aren't checking the error return from
fo.Close(). From the Linux man pages close(2): Not checking the return
value of close() is a common but nevertheless serious programming
error. It is quite possible that errors on a previous write(2)
operation are first reported at the final close(). Not checking the
return value when closing the file may lead to silent loss of data.
This can especially be observed with NFS and with disk quota. – Nick
Craig-Wood Jan 25 '13 at 7:12
The solution that updated the post used a panic:
// close fo on exit and check for its returned error
defer func() {
if err := fo.Close(); err != nil {
panic(err)
}
}()
I want to hand this error as a value instead of panicking.
If we are afraid of writes not being completed close isn't enough, so updating the error is still not correct.
The correct solution if you want to not hit this is to fsync the file(s):
defer(fd.Close())
// Do stuff
return fd.Sync()
It's easier to read then returning a non-nil modified error either through defer or maintaining throughout the function.
This will be a performance hit, but will catch both close errors for writing to buffers and the physical write to disk.
Think of a large project which deals with tons of concurrent requests handled by its own goroutine. It happens that there is a bug in the code and one of these requests will cause panic due to a nil reference.
In Java, C# and many other languages, this would end up in a exception which would stop the request without any harm to other healthy requests. In go, that would crash the entire program.
AFAIK, I'd have to have recover() for every single new go routine creation. Is that the only way to prevent entire program from crashing?
UPDATE: adding recover() call for every gorouting creation seems OK. What about third-party libraries? If third party creates goroutines without recover() safe net, it seems there is NOTHING to be done.
If you go the defer-recover-all-the-things, I suggest investing some time to make sure that a clear error message is collected with enough information to promptly act on it.
Writing the panic message to stderr/stdout is not great as it will be very hard to find where the problem is. In my experience the best approach is to invest a bit of time to get your Go programs to handle errors in a reasonable way. errors.Wrap from "github.com/pkg/errors" for instance allows you to wrap all errors and get a stack-trace.
Recovering panic is often a necessary evil. Like you say, it's not ideal to crash the entire program just because one requested caused a panic. In most cases recovering panics will not back-fire, but it is possible for a program to end up in a undefined not-recoverable state that only a manual restart can fix. That being said, my suggestion in this case is to make sure your Go program exposes a way to create a core dump.
Here's how to write a core dump to stderr when SIGQUIT is sent to the Go program (eg. kill pid -QUIT)
go func() {
// Based on answers to this stackoverflow question:
// https://stackoverflow.com/questions/19094099/how-to-dump-goroutine-stacktraces
sigs := make(chan os.Signal, 1)
signal.Notify(sigs, syscall.SIGQUIT)
for {
<-sigs
fmt.Fprintln(os.Stderr, "=== received SIGQUIT ===")
fmt.Fprintln(os.Stderr, "*** goroutine dump...")
var buf []byte
var bufsize int
var stacklen int
// Create a stack buffer of 1MB and grow it to at most 100MB if
// necessary
for bufsize = 1e6; bufsize < 100e6; bufsize *= 2 {
buf = make([]byte, bufsize)
stacklen = runtime.Stack(buf, true)
if stacklen < bufsize {
break
}
}
fmt.Fprintln(os.Stderr, string(buf[:stacklen]))
fmt.Fprintln(os.Stderr, "*** end of dump")
}
}()
there is no way you can handle panic without recover function, a good practice would be using a middleware like function for your safe function, checkout this snippet
https://play.golang.org/p/d_fQWzXnlAm
This question already has answers here:
Cannot free memory once occupied by bytes.Buffer
(2 answers)
Closed 4 years ago.
I've written a simple TCP server.
The problem is that when stress-testing it, it seems that the memory usage is increasing dramatically, and not decreasing when he test is done.
When the server is started, it takes ~700KB.
During and after the stress-test, the memory usage jumps to ~7MB.
Here's my code:
package main
import (
"net"
"log"
"fmt"
"bufio"
)
func main() {
ln, err := net.Listen("tcp", ":8888")
if err != nil {
log.Fatal(err)
}
defer ln.Close()
for {
conn, err := ln.Accept()
if err != nil {
fmt.Println(err)
continue
}
go handle(conn)
}
}
func handle(conn net.Conn) {
defer conn.Close()
fmt.Println("Accepted", conn.LocalAddr())
for {
buf, err := bufio.NewReader(conn).ReadString('\n')
if err != nil {
break
}
msg := string(buf[:len(buf)-2])
fmt.Println("Received", msg)
conn.Write([]byte("OK\n"))
}
}
Any help is much appreciated.
Note: I'm using tcpkali for loading it. This is the command line:
tcpkali -em "testing\r\n" -c 100 -r 1000 -T 60 127.0.0.1:8888
EDIT: Following some comments below, I ran some tests and here are the results:
Started the server and ran tcpkali.
After the first run, RSS was at 8516.
After a second run, RSS climbed to 8572.
Server is now idle. 5 minutes later, RSS climbed to 8588.
5 more minutes later, RSS climbed to 8608, and seems stable.
After 15 minutes of break, I ran tcpkali again, and RSS climbed to 8684.
A few minutes break, another tcpkali run, RSS climbs to 8696.
A few minutes break, another tcpkali run, RSS climbs to 8704.
A few minutes break, another tcpkali run, RSS climbs to 8712.
Now, I don't know what you call this, but I call this a memory leak. Something is wrong here. No memory is freed, and RSS is keep climbing every time I run a test. Obviously, this thing cannot be deployed to production as it will eventually consume all available memory.
I also tried calling os.FreeOSMemory() but nothing happens.
My system is Go 1.9.4 on macOS 10.13.1. Is this environment related or am I missing something?
LAST UPDATE:
Following #Steffen Ullrich answer and the tests that failed on my environment, I gave it a try on Ubuntu server, and the memory is freed after a few minutes of idle time.
Seems like there's an issue with macOS.
Go does not release memory it allocated from the OS immediately. The reason is probably that allocating memory is costly (needs system calls) and the chance is high that it will be needed in the near future anyway again. But, if memory gets long enough unused it will be released eventually so that the RSS of the process decreases again.
Doing your test again with slight modifications will show this (it did at least for me):
Start you program and look at the RSS.
Run tcpkali, wait for tcpkali to end and look at the RSS again. It is much higher now since lots of memory was needed for the program to do the intended task.
Don't stop the program but run tcpkali again and wait again for it to end. When looking at the RSS you should see that it did not grow (much) further. This means that the program used the already allocated memory again and did not need to allocate new memory from the system.
Now monitor the RSS and wait. After a while (about 10 minutes on my system) you should see the RSS go down again. This is because the program has determined now that the allocated but unused memory will probably not be used any longer and returned the memory back to the OS.
Note that not all memory might be given back. According to Understanding Go Lang Memory Usage it will not return (in go 1.3) the memory used for the stacks of the go routines since it is more likely that this will be needed in the future.
For testing you might also add some debug.FreeOSMemory() (from runtime/debug) at strategic places (like when you break out of the loop in the goroutine) so that the memory gets returned earlier to the OS. But given that the lazy return of memory is for performance such explicit freeing might impact the performance.
I am trying to build a small tool that will allow me to run a program and track memory usage through Go. I am using r.exec = exec.Command(r.Command, r.CommandArgs...) to run the command, and runtime.MemStats to track memory usage (in a separate go routine):
func monitorRuntime() {
m := &runtime.MemStats{}
f, err := os.Create(fmt.Sprintf("mmem_%s.csv", getFileTimeStamp()))
if err != nil {
panic(err)
}
f.WriteString("Time;Allocated;Total Allocated; System Memory;Num Gc;Heap Allocated;Heap System;Heap Objects;Heap Released;\n")
for {
runtime.ReadMemStats(m)
f.WriteString(fmt.Sprintf("%s;%d;%d;%d;%d;%d;%d;%d;%d;\n", getTimeStamp(), m.Alloc, m.TotalAlloc, m.Sys, m.NumGC, m.HeapAlloc, m.HeapSys, m.HeapObjects, m.HeapReleased))
time.Sleep(5 * time.Second)
}
}
When I tested my code with simple program that just sits there (for about 12 hours), I noticed that Go is constantly allocating more memory:
System Memory
Heap Allocation
I did a few more tests such as running the monitorRuntime() function without any other code, or using pprof, such as:
package main
import (
"net/http"
_ "net/http/pprof"
)
func main() {
http.ListenAndServe(":8080", nil)
}
Yet I still noticed that memory allocation keeps going up just like in the graphs.
How can I accurately track memory usage of the program I want to run through Go?
I know one way, which I used in the past, is to use /proc/$PID/statm, but that file doesn't exist in every operating system (such as MacOS or Windows)
There isn't a way in standard Go to get the memory usage of a program called from exec.Command. runtime.ReadMemStats only returns memory tracked by the go runtime (which, in this case, is only the file handling and sprintf).
Your best bet would be to execute platform specific commands to get memory usage.
On Linux (RedHat) the following will show memory usage:
ps -No pid,comm,size,vsize,args:90
I have a cli application in Go (still in development) and no changes were made in source code neither on dependencies but all of a sudden it started to panic panic: sync: unlock of unlocked mutex.
The only place I'm running concurrent code is to handle when program is requested to close:
func handleProcTermination() {
c := make(chan os.Signal, 1)
signal.Notify(c, os.Interrupt)
go func() {
<-c
curses.Endwin()
os.Exit(0)
}()
defer curses.Endwin()
}
Only thing I did was to rename my $GOPATH and work space folder. Can this operation cause such error?
Have some you experienced any related problem without having any explanation? Is there a rational check list that would help to find the cause of the problem?
Ok, after some unfruitful debugging sessions, as a last resort, I simply wiped all third party code (dependencies) from the workspace:
cd $GOPATH
rm -rf pkg/ bin/ src/github.com src/golang.org # the idea is to remove all except your own source
Used go get to get all used dependencies again:
go get github.com/yadayada/yada
go get # etc
And the problem is gone! Application is starting normally and tests are passing. No startup panics anymore. It looks like this problem happens when you mv your work space folder but I'm not 100% sure yet. Hope it helps someone else.
From now on, re install dependencies will be my first step when weird panic conditions like that suddenly appear.
You're not giving much information to go on, so my answer is generic.
In theory, bugs in concurrent code can remain unnoticed for a long time and then suddenly show up. In practice, if the bug is easily repeatable (happens nearly every run) this usually indicates that something did change in the code or environment.
The solution: debug.
Knowing what has changed can be helpful to identify the bug. In this case, it appears that lock/unlock pairs or not matching up. If you are not passing locks between threads, you should be able to find a code path within the thread that has not acquired the lock, or has released it early. It may be helpful to put assertions at certain points to validate that you are holding the lock when you think you are.
Make sure you don't copy the lock somewhere.
What can happen with seemingly bulletproof code in concurrent environments is that the struct including the code gets copied elsewhere, which results in the underlying lock being different.
Consider this code snippet:
type someStruct struct {
lock sync.Mutex
}
func (s *someStruct) DoSomethingUnderLock() {
s.lock.Lock()
defer s.lock.Unlock() // This will panic
time.Sleep(200 * time.Millisecond)
}
func main() {
s1 := &someStruct{}
go func() {
time.Sleep(100 * time.Millisecond) // Wait until DoSomethingUnderLock takes the lock
s2 := &someStruct{}
*s1 = *s2
}()
s1.DoSomethingUnderLock()
}
*s1 = *s2 is the key here - it results in a different lock being used by the same receiver function and if the struct is replaced while the lock is taken, we'll get sync: unlock of unlocked mutex.
What makes it harder to debug is that someStruct might be nested in another struct (and another, and another), and if the outer struct gets replaced (as long as someStruct is not a reference there) in a similar manner, the result will be the same.
If this is the case, you can use a reference to the lock (or the whole struct) instead. Now you need to initialize it, but it's a small price that might save you some obscure bugs. See the modified code that doesn't panic here.
For those who come here and didn't solve your problem. Check If the application is compiled in one version of Linux but running in another version. At least in my case it happened.