Goroutine thread safety of Go logging struct instantiation utility method

I'm working with a new Go service, and I have a SetupLogger utility function that creates a new instance of go-kit's logging struct, log.Logger.
Is this method safe to invoke from code that's handling requests inside separate goroutines?
package utils

import (
    "fmt"
    "github.com/go-kit/kit/log"
    "io"
    "os"
    "path/filepath"
)

// If the environment-specified directory for writing log files exists, open the existing log file
// if it already exists or create a log file if no log file exists.
// If the environment-specified directory for writing log files does not exist, configure the logger
// to log to process stdout.
// Returns an instance of a go-kit logger.
func SetupLogger() log.Logger {
    var logWriter io.Writer
    var err error

    LOG_FILE_DIR := os.Getenv("CRAFT_API_LOG_FILE_DIR")
    LOG_FILE_NAME := os.Getenv("CRAFT_API_LOG_FILE_NAME")
    fullLogFilePath := filepath.Join(
        LOG_FILE_DIR,
        LOG_FILE_NAME,
    )

    if dirExists, _ := Exists(&ExistsOsCheckerStruct{}, LOG_FILE_DIR); dirExists {
        if logFileExists, _ := Exists(&ExistsOsCheckerStruct{}, fullLogFilePath); !logFileExists {
            os.Create(fullLogFilePath)
        }
        logWriter, err = os.OpenFile(fullLogFilePath, os.O_RDWR|os.O_CREATE|os.O_APPEND, 0666)
        if err != nil {
            fmt.Println("Could not open log file. ", err)
        }
    } else {
        logWriter = os.Stdout
    }

    return log.NewContext(log.NewJSONLogger(logWriter)).With(
        "timestamp", log.DefaultTimestampUTC,
        "caller", log.DefaultCaller,
    )
}

First recommendation: use the -race flag with go build and go test. It will almost always tell you if you have a race condition, although it might not in this case, since two goroutines could end up calling your os.Create() and os.OpenFile() simultaneously without the detector seeing a conflicting memory access.
So the second recommendation is to avoid, if at all possible, the "if it exists/matches/has permissions, then open/delete/whatever" pattern.
That pattern leads to the TOCTTOU (Time Of Check To Time Of Use) bug, which is often a security bug and can at the very least lead to data loss.
To avoid it, either wrap the check and the use in the same mutex, or use an atomic operation, such as an OpenFile call that creates the file or returns an error if it already exists (although, to be technical, it is locked inside the OS kernel, just like atomic CPU ops are locked on the hardware bus).
In your case I am not quite sure why you have two calls that open the file (os.Create followed by os.OpenFile), since it looks like just one would do the job, as sketched below.
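For illustration, a minimal sketch of the single-call approach, with openLogWriter as a hypothetical helper (not part of the question's code): os.O_CREATE makes OpenFile create the file atomically when it is missing, so no separate existence check or os.Create call is needed.
package utils

import (
    "io"
    "os"
)

// openLogWriter opens (or atomically creates) the log file for appending.
// If the open fails, e.g. because the directory does not exist, it falls
// back to stdout, mirroring the behavior of the original SetupLogger.
func openLogWriter(path string) io.Writer {
    f, err := os.OpenFile(path, os.O_WRONLY|os.O_CREATE|os.O_APPEND, 0666)
    if err != nil {
        return os.Stdout
    }
    return f
}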

Since setting up your Logger only involves library instantiation, creating the log file if it doesn't exist, and opening the log file, with no writing involved, there is no problem calling it from different goroutines: the shared data is not being mutated.
Side note:
Design-wise it makes sense (assuming the Logger is writing to the same file) to pass around a single instantiated Logger instance for logging, which would also prevent two goroutines from calling your setup function at the same time. A sketch of that pattern follows.
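A minimal sketch of that design, assuming an HTTP service (the route, port, and log fields are illustrative): the logger is created once in main and shared by all handlers, with go-kit's log.NewSyncWriter serializing concurrent writes to the shared destination.
package main

import (
    "net/http"
    "os"

    "github.com/go-kit/kit/log"
)

func main() {
    // Create the logger once; every handler shares this single instance
    // instead of calling a setup function per goroutine.
    logger := log.NewJSONLogger(log.NewSyncWriter(os.Stdout))

    http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
        logger.Log("msg", "handling request", "path", r.URL.Path)
    })
    http.ListenAndServe(":8080", nil)
}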

Related

How and when is the Go sdk logger flushed?

I'm trying to determine if the default/sdk logger log.PrintYYY() functions are flushed at some point in time, on exit, on panic, etc. I'm unsure if I need to find a way to flush the writer that the logger is hooked up to, especially when setting the output writer with SetOutput(...). Of course, the writer interface doesn't have a flush() method, so not really sure how that might get done.
The log package is not responsible for flushing the underlying io.Writer. It would be possible for the log package to perform a type-assertion to see if the current io.Writer has a Flush() method, and if so then call it, but there is no guarantee that if multiple io.Writers are "chained", the data will eventually be flushed to the bottommost layer.
And the primary reason why the log package doesn't flush in my opinion is performance. We use buffered writers so we don't have to reach the underlying layer each time a single byte (or byte slice) is written to it, but we can cache the recently written data, and when we reach a certain size (or a certain time), we can write the "batch" at once, efficiently.
If the log package flushed after each log statement, that would render buffered IO useless. It might not matter in the case of small apps, but if you have a high-traffic web server, issuing a flush after each log statement (of which there may be many in each request handling) would cause a serious performance drawback.
Then yes, there is the issue that if the app is terminated, the last log statements might not make it to the underlying layer. The proper solution is to do a graceful shutdown: implement signal handling, and when your app is about to terminate, properly flush and close the underlying io.Writer of the logger you use (a sketch follows the links below). For details, see:
Is it possible to capture a Ctrl+C signal and run a cleanup function, in a "defer" fashion?
Is there something like finally() in Go just opposite to what init()?
Are deferred functions called when SIGINT is received in Go?
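A minimal sketch of that shutdown pattern, assuming a bufio.Writer as the log destination (the signal set and log messages are illustrative):
package main

import (
    "bufio"
    "log"
    "os"
    "os/signal"
    "syscall"
)

func main() {
    bw := bufio.NewWriter(os.Stdout)
    log.SetOutput(bw)

    // Wait for Ctrl+C or SIGTERM, then flush the buffered log output
    // before exiting.
    sig := make(chan os.Signal, 1)
    signal.Notify(sig, os.Interrupt, syscall.SIGTERM)

    log.Println("serving...")
    <-sig

    bw.Flush()
}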
If, for simplicity only, you still need a logger that flushes after each log statement, you can achieve that easily. This is because the log.Logger type guarantees that each log message is delivered to the destination io.Writer with a single Writer.Write() call:
Each logging operation makes a single call to the Writer's Write method. A Logger can be used simultaneously from multiple goroutines; it guarantees to serialize access to the Writer.
So basically all you need to do is create a "wrapper" io.Writer whose Write() method does a flush after "forwarding" the Write() call to its underlying writer.
This is how it could look:
type myWriter struct {
    io.Writer
}

func (m *myWriter) Write(p []byte) (n int, err error) {
    n, err = m.Writer.Write(p)
    if flusher, ok := m.Writer.(interface{ Flush() }); ok {
        flusher.Flush()
    } else if syncer, ok := m.Writer.(interface{ Sync() error }); ok {
        // Preserve the original write error, if any
        if err2 := syncer.Sync(); err2 != nil && err == nil {
            err = err2
        }
    }
    return
}
This implementation checks for both the Flush() method and os.File's Sync() method, and calls them if present.
This is how it can be used so that log statements are always flushed:
f, err := os.Create("log.txt")
if err != nil {
    panic(err)
}
defer f.Close()

log.SetOutput(&myWriter{Writer: f})
log.Println("hi")
See related questions:
Go: Create io.Writer inteface for logging to mongodb database
net/http set custom logger
The logger shouldn't know how to flush data. You have to flush the output writer you specified at logger creation (if it has that capability).
See this example from the github discussion:
package main

import (
    "bufio"
    "flag"
    "log"
    "os"
    "strings"
)

func main() {
    var flush bool
    flag.BoolVar(&flush, "flush", false, "if set, will flush the buffered io before exiting")
    flag.Parse()

    br := bufio.NewWriter(os.Stdout)
    logger := log.New(br, "", log.Ldate)

    logger.Printf("%s\n", strings.Repeat("This is a test\n", 5))

    if flush {
        br.Flush()
    }

    logger.Fatalf("exiting now!")
}
You can read the entire discussion related to your issue on github.
Alternatively, you can look at third-party loggers that support flushing. Check out the zap logger, which has a logger.Sync() method.
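A minimal usage sketch (zap.NewProduction is one of several constructors; the message is illustrative):
package main

import "go.uber.org/zap"

func main() {
    logger, err := zap.NewProduction()
    if err != nil {
        panic(err)
    }
    // Sync flushes any buffered log entries before the program exits.
    defer logger.Sync()

    logger.Info("hello from zap")
}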

Does closing io.PipeWriter close the underlying file?

I am using logrus for logging and have a few custom format loggers. Each is initialized to write to a different file like:
fp, _ := os.OpenFile(path, os.O_APPEND|os.O_WRONLY|os.O_CREATE, 0755)
// error handling left out for brevity
log.Out = fp
Later in the application, I need to change the file the logger is writing to (for log rotation logic). What I want to achieve is to properly close the current file before changing the logger's output file. But the closest thing to the file handle logrus provides me is a Writer() method that returns an io.PipeWriter pointer. So would calling Close() on the PipeWriter also close the underlying file?
If not, what are my options to do this, other than keeping the file pointer stored somewhere?
For the record, twelve-factor tells us that applications should not concern themselves with log rotation. If and how logs are handled best depends on how the application is deployed. Systemd has its own logging system, for instance. Writing to files when deployed in (Docker) containers is annoying. Rotating files are annoying during development.
Now, pipes don't have an "underlying file". There's a Reader end and a Writer end, and that's it. From the docs for PipeWriter:
Close closes the writer; subsequent reads from the read half of the pipe will return no bytes and EOF.
So what happens when you close the writer depends on how Logrus handles EOF on the Reader end. Since Logger.Out is an io.Writer, Logrus cannot possibly call Close on your file.
Your best bet would be to wrap *os.File, perhaps like so:
package main

import "os"

type RotatingFile struct {
    *os.File
    rotate chan struct{}
}

func NewRotatingFile(f *os.File) RotatingFile {
    return RotatingFile{
        File:   f,
        rotate: make(chan struct{}, 1),
    }
}

func (r RotatingFile) Rotate() {
    r.rotate <- struct{}{}
}

func (r RotatingFile) doRotate() error {
    // file rotation logic here
    return nil
}

func (r RotatingFile) Write(b []byte) (int, error) {
    select {
    case <-r.rotate:
        if err := r.doRotate(); err != nil {
            return 0, err
        }
    default:
    }
    return r.File.Write(b)
}
Implementing log file rotation in a robust way is surprisingly tricky. For instance, closing the old file before creating the new one is not a good idea. What if the log directory permissions changed? What if you run out of inodes? If you can't create a new log file you may want to keep writing to the current file. Are you okay with ripping lines apart, or do you only want to rotate after a newline? Do you want to rotate empty files? How do you reliably remove old logs if someone deletes the N-1th file? Will you notice the Nth file or stop looking at the N-2nd?
The best advice I can give you is to leave log rotation to the pros. I like svlogd (part of runit) as a standalone log rotation tool.
Closing the io.PipeWriter will not affect the actual writer behind it. The chain of close execution is:
PipeWriter.Close() -> PipeWriter.CloseWithError(err error) -> pipe.CloseWrite(err error)
and it does not influence the underlying io.Writer.
To close the actual writer, you just need to close Logger.Out, which is an exported field; see the sketch below.
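Since Out is typed as io.Writer, closing it takes a type assertion; a minimal sketch, assuming logger is your *logrus.Logger:
// Out is an io.Writer, so assert io.Closer before closing
// (an *os.File satisfies io.Closer).
if c, ok := logger.Out.(io.Closer); ok {
    if err := c.Close(); err != nil {
        // handle the close error as appropriate
    }
}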

Do I need a mutex if I am returning a copy of the variable rather than a pointer?

I'm a little confused about the use of sync.Mutex in Go. I understand the basic concept (calling Lock() will prevent other goroutines from executing the code between it and Unlock()), but I'm not sure if I need it here. I've seen a fair few C++ answers for this but in each example they all seem to be modifying and accessing a variable directly.
This is my code from a package called configuration, which I will use throughout the application to get (surprisingly) configuration and settings information.
package config
import (
    "encoding/json"
    "fmt"
    "os"
    "sync"

    log "github.com/sirupsen/logrus"
)

/*
ConfigurationError is an implementation of the error interface describing errors that occurred while dealing with this
package.
*/
type ConfigurationError string

/*
Error prints the error message for this ConfigurationError. It also implements the error interface.
*/
func (ce ConfigurationError) Error() string {
    return fmt.Sprintf("Configuration error: %s", string(ce))
}

/*
configuration is a data struct that holds all of the required information for setting up the program. It is unexported
to prevent other packages creating more instances. Other packages that need settings information should call Current()
to access a copy of the unexported programConfig package variable.
*/
type configuration struct {
    sync.RWMutex
    LogLevel               log.Level `json:"logLevel,omitempty"`     //Logging
    LogLocation            string    `json:"logLocation,omitempty"`  //-
    HttpPort               int       `json:"port,omitempty"`         //Web
    LoginUri               string    `json:"loginUri"`               //-
    WebBaseUri             string    `json:"webBaseUri"`             //-
    StaticBaseUri          string    `json:"staticBaseUri"`          //-
    ApiBaseUri             string    `json:"apiBaseUri"`             //-
    StaticContentLocalPath string    `json:"staticContentLocalPath"` //-
    MaxSimultaneousReports int       `json:"maxSimultaneousReports"` //Reporting
}

var programConfig configuration

/*
Current returns a copy of the currently loaded program configuration.
*/
func Current() configuration {
    programConfig.RLock()
    defer programConfig.RUnlock()
    return programConfig
}

/*
Load attempts to load a JSON settings file containing a representation of the Configuration struct. It will then set
the value of the package-level config struct to the loaded values. Some settings changes will require a restart of the
server application.

filepath - the full path of the settings file including a leading slash or drive name (depending on the OS).
*/
func Load(filepath string) error {
    // Open the file for reading.
    settingsFile, err := os.Open(filepath)
    if err != nil {
        return ConfigurationError(err.Error())
    }
    defer settingsFile.Close()

    // Decode JSON into the package-level var.
    decoder := json.NewDecoder(settingsFile)
    newSettings := configuration{}
    err = decoder.Decode(&newSettings)
    if err != nil {
        return ConfigurationError(err.Error())
    }

    programConfig.Lock() // I'm not 100% sure this is the correct use of a mutex for this situation, so check up on that.
    programConfig = newSettings
    programConfig.Unlock()

    return nil
}
As you can see, I've used the mutex in two places.
In Current(). Do I need this here if the function is not returning a pointer but a copy of the programConfig variable? The only way the underlying package variable will be modified is through the Load() function.
In the Load() function. This can be called at any time by any goroutine, although it rarely will be.
Given that, am I using them correctly, and why do I need one when reading a copy of the data (if I do)?
When you read data that can be written at the same time, you need a mutex; otherwise you might read while it's being written and get half of the old data and half of the new data.
So your example seems to be just fine. Because you are probably reading the config very often but writing it rarely, your usage of an RWMutex makes sense: multiple goroutines can read at the same time as long as it's not being written.
What in your code looks dangerous is:
programConfig.Lock()
programConfig = newSettings
programConfig.Unlock()
Because programConfig contains the mutex, the assignment overwrites the locked mutex itself, so the Lock and Unlock effectively operate on different instances, which can lead to deadlocks. You should keep the mutex outside the instance it protects, in this case in a package-level variable, as sketched below.
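A minimal sketch of that fix, assuming the sync.RWMutex is removed from the configuration struct, and with setConfig as a hypothetical stand-in for the assignment inside Load():
var (
    configMu      sync.RWMutex // protects programConfig; lives outside the struct
    programConfig configuration
)

func Current() configuration {
    configMu.RLock()
    defer configMu.RUnlock()
    return programConfig // the copy is made while the read lock is held
}

func setConfig(c configuration) {
    configMu.Lock()
    programConfig = c // the assignment no longer overwrites a locked mutex
    configMu.Unlock()
}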

Running Multiple GTK WebKitWebViews via Goroutines

I'm using Go with the gotk3 and webkit2 libraries to try and build a web crawler that can parse JavaScript in the context of a WebKitWebView.
Thinking of performance, I'm trying to figure out what would be the best way to have it crawl concurrently (if not in parallel, with multiple processors), using all available resources.
GTK and everything with threads and goroutines are pretty new to me. Reading from the gotk3 goroutines example, it states:
Native GTK is not thread safe, and thus, gotk3's GTK bindings may not be used from other goroutines. Instead, glib.IdleAdd() must be used to add a function to run in the GTK main loop when it is in an idle state.
Go panics and shows a stack trace when I try to run a function that creates a new WebView in a goroutine. I'm not exactly sure why this happens, but I think it has something to do with this comment. An example is shown below.
Current Code
Here's my current code, which has been adapted from the webkit2 example:
package main

import (
    "fmt"

    "github.com/gotk3/gotk3/glib"
    "github.com/gotk3/gotk3/gtk"
    "github.com/sourcegraph/go-webkit2/webkit2"
    "github.com/sqs/gojs"
)

func crawlPage(url string) {
    web := webkit2.NewWebView()
    web.Connect("load-changed", func(_ *glib.Object, i int) {
        loadEvent := webkit2.LoadEvent(i)
        switch loadEvent {
        case webkit2.LoadFinished:
            fmt.Printf("Load finished for: %v\n", url)
            web.RunJavaScript("window.location.hostname", func(val *gojs.Value, err error) {
                if err != nil {
                    fmt.Println("JavaScript error.")
                } else {
                    fmt.Printf("Hostname (from JavaScript): %q\n", val)
                }
                //gtk.MainQuit()
            })
        }
    })

    glib.IdleAdd(func() bool {
        web.LoadURI(url)
        return false
    })
}

func main() {
    gtk.Init(nil)

    crawlPage("https://www.google.com")
    crawlPage("https://www.yahoo.com")
    crawlPage("https://github.com")
    crawlPage("http://deelay.me/2000/http://deelay.me/img/1000ms.gif")

    gtk.Main()
}
It seems that creating a new WebView for each URL allows them to load concurrently. Having glib.IdleAdd() running in a goroutine, as per the gotk3 example, doesn't seem to have any effect (although I'm only doing a visual benchmark):
go glib.IdleAdd(func() bool { // Works
    web.LoadURI(url)
    return false
})
However, trying to create a goroutine for each crawlPage() call ends in a panic:
go crawlPage("https://www.google.com") // Panics and shows stack trace
I can run web.RunJavaScript() in a goroutine without issue:
switch loadEvent {
case webkit2.LoadFinished:
    fmt.Printf("Load finished for: %v\n", url)
    go web.RunJavaScript("window.location.hostname", func(val *gojs.Value, err error) { // Works
        if err != nil {
            fmt.Println("JavaScript error.")
        } else {
            fmt.Printf("Hostname (from JavaScript): %q\n", val)
        }
        //gtk.MainQuit()
    })
}
Best Method?
The current methods I can think of are:
1. Spawn new WebViews to crawl each page, as shown in the current code. Track how many WebViews are open and either continually delete and create new ones, or reuse a set number created initially, so that all available resources on the machine are used. Would this be limited in terms of processor cores being used?
2. The basic idea of #1, but running the binary multiple times (instead of one gocrawler process running on the machine, have four) to utilize all cores/resources.
3. Run the GUI (gtk3) portion of the app in its own goroutine. I could then pass data to other goroutines which do their own heavy processing, such as searching through content.
What would actually be the best way to run this code concurrently, if possible, and max out performance?
Update
Methods 1 and 2 are probably out of the picture: I ran a test spawning ~100 WebViews, and they seem to load synchronously.

Is there a built-in go logger that can roll

Is there a built-in Go logger that can roll a log file when it reaches a file size limit?
Thanks.
No, there is no built-in logger that currently has this feature.
log4go, which you will find recommended when searching, is currently broken: it has issues that lead to messages getting lost (in case the main program exits while some messages are still in the channel buffer waiting to be written), and this problem is present in most if not all of its examples.
See also this question.
Apart from the syslog package, which is probably not what you really want, there is no such thing in the standard library.
Among third-party packages, log4go, for example, claims to have this feature.
package main

import (
    "log"
    "os"

    "github.com/nikandfor/tlog"
    "github.com/nikandfor/tlog/rotated"
)

func main() {
    f, err := rotated.Create("logfile_template_#.log") // # will be substituted by time of file creation
    if err != nil {
        panic(err)
    }
    defer f.Close()

    f.MaxSize = 1 << 30    // 1GiB
    f.Fallback = os.Stderr // in case of failure to write to the file, last chance to save the log message

    tlog.DefaultLogger = tlog.New(tlog.NewConsoleWriter(f, tlog.LstdFlags))

    tlog.Printf("now use it much like %v", "log.Logger")

    log.SetOutput(f) // also works for any logger or whatever needs an io.Writer
    log.Printf("also appears in the log")
}
logger https://github.com/nikandfor/tlog
rotated file https://godoc.org/github.com/nikandfor/tlog/rotated
