I want to open a PE file with a timeout in Go. To achieve this, I call pe.Open in an anonymous function and send the file pointer and error back over channels. I use a select statement with a timeout case to enforce the timeout, as shown below.
go func() {
f, e := pe.Open(filePath)
file <- f
err <- e
}()
select {
case <-fileOpenTimeout:
fmt.Printf("ERROR: Opening PE file timed out")
return
case fileError := <-err:
if fileError == nil{...}
}
This code works fine for my use case. However, this may lead to resource leakage if the file takes too long to open. How can I prevent this? Is there a better way to enforce timeout on opening the PE file?
If you have a done channel that's passed to the anonymous func, you can use it to send a signal that you've ended early.
func asd() {
fileOpenTimeout := time.After(5 * time.Second)
type fileResponse struct {
file *pe.File
err error
}
response := make(chan fileResponse)
done := make(chan struct{})
go func(done <-chan struct{}) {
f, e := pe.Open(filePath)
r := fileResponse{
file: f,
err: e,
}
select {
case response <- r:
// do nothing, response sent
case <-done:
// clean up
if f != nil {
f.Close()
}
}
}(done)
select {
case <-fileOpenTimeout:
fmt.Printf("ERROR: Opening PE file timed out")
close(done)
return
case r := <-response:
if r.err != nil { ... }
}
}
Once the done channel is closed, a receive from it always succeeds immediately with the zero value, so your anonymous func won't leak. The fileResponse struct, scoped to the function, simplifies passing multiple values back from the goroutine.
I am trying to use gzip.NewWriter to stream data in and write compressed data to a CSV file. Everything works except the footer doesn't seem to get written when I use defer gzip.Close(). I get an Unexpected end of data message when I try to open the file with 7-zip.
Note: I have seen this question (and answer) but I feel like my problem is a little different, because I am writing to a file and not returning the bytes.
As I understand it, OP in that question was returning the bytes before writing the footer. But because I am just writing to a file, I shouldn't be running into the same issue.
Here is a snippet of my code. I have removed all error-checking for the sake of brevity.
// Worker that reads invoices and compresses them into a single file.
func compressWorker(c Config, archived <-chan Invoice, done chan<- int) {
now := time.Now().Format("2006-01")
path := filepath.Join(c.OutDir, now+".csv.gz")
readColumns := false
f, _ := os.Create(path)
defer closeWriter("csv file", f)
cw := gzip.NewWriter(f)
cw.Name = now + ".csv"
// * This is where I originally had my defer Close call.
// defer closeWriter("compression stream", cw)
for i := range archived {
if !readColumns {
b := []byte(i.CsvColumns() + ",DateLastSaved\n")
cw.Write(b)
readColumns = true
}
b := []byte(i.ToCsvString() + "," + time.Now().Format("2006-01-02") + "\n")
cw.Write(b)
}
// Ordinarily I'd say we defer this earlier, but that doesn't work for some
// reason.
closeWriter("compression writer", cw)
done <- 1
}
// Print an error with a prefix string and exit.
func HandleErrorWithPrefix(e error, p string) {
if e != nil {
log.Fatalf("Error: %v; %v\n", p, e)
}
}
func closeWriter(n string, wc io.WriteCloser) {
log.Printf("closing %v\n", n)
err := wc.Close()
HandleErrorWithPrefix(err, fmt.Sprintf("error closing writer '%v'", n))
}
Interestingly, I get this from closeWriter with my original code and deferred closing:
2021/10/13 09:05:42 closing compression writer
2021/10/13 09:05:42 closing csv file
However, I get this when I close cw without deferring:
2021/10/13 09:06:01 closing compression stream
I get no errors in closeWriter though, so I'm not sure why the file wouldn't be closing. Deferred calls run last-in-first-out, right?
So it appears my problem wasn't much different: my program was ending before my writers had enough time to fully close.
I solved this, with the help of everyone commenting, by moving my done send to a deferred closure at the beginning of the worker function.
func compressWorker(c Config, archived <-chan Invoice, done chan<- int) {
// Defer here ensures it will always be run last.
defer func() {
done <- 1
}()
// ...
defer f.Close()
defer cw.Close()
// ...
}
I'm using two concurrent goroutines to copy stdin/stdout from my terminal to a net.Conn target. For some reason, I can't manage to completely stop the two goroutines without getting a panic (for trying to close a closed connection). This is my code:
func interact(c net.Conn, sessionMap map[int]net.Conn) {
quit := make(chan bool) //the channel to quit
copy := func(r io.ReadCloser, w io.WriteCloser) {
defer func() {
r.Close()
w.Close()
close(quit) //this is how i'm trying to close it
}()
_, err := io.Copy(w, r)
if err != nil {
//
}
}
go func() {
for {
select {
case <-quit:
return
default:
copy(c, os.Stdout)
}
}
}()
go func() {
for {
select {
case <-quit:
return
default:
copy(os.Stdin, c)
}
}
}()
}
This errors as panic: close of closed channel
I want to terminate the two go routines, and then normally proceed to another function. What am I doing wrong?
You can't call close on a channel more than once, there's no reason to call copy in a for loop, since it can only operate one time, and you're copying in the wrong direction, writing to stdin and reading from stdout.
Asking how to quit two goroutines is simple on its own, but that's not the only thing you need to do here. Since io.Copy is blocking, you don't need the extra synchronization to determine when the call is complete. This lets you simplify the code significantly, which will make it a lot easier to reason about.
func interact(c net.Conn) {
go func() {
// You want to close this outside the goroutine if you
// expect to send data back over a half-closed connection
defer c.Close()
// Optionally close stdout here if you need to signal the
// end of the stream in a pipeline.
defer os.Stdout.Close()
_, err := io.Copy(os.Stdout, c)
if err != nil {
//
}
}()
_, err := io.Copy(c, os.Stdin)
if err != nil {
//
}
}
Also note that you may not be able to break out of the io.Copy from stdin, so you can't expect the interact function to return. It may be a good idea to do the copy manually in the function body and check for a half-closed connection on every iteration; then you can break out sooner and ensure that you fully close the net.Conn.
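It could also be written with one line scanner per input:
func scanReader(quit <-chan struct{}, r io.Reader) <-chan string {
	line := make(chan string)
	go func() {
		defer close(line)
		scan := bufio.NewScanner(r)
		for scan.Scan() {
			select {
			case <-quit:
				return
			case line <- scan.Text():
				// Delivered. Sending inside the select keeps this
				// goroutine from blocking forever after quit.
			}
		}
	}()
	return line
}
// In the calling function (c is the net.Conn):
quit := make(chan struct{})
defer close(quit) // lets the scanner goroutines exit at their next line
stdIn := scanReader(quit, os.Stdin)
conIn := scanReader(quit, c)
for {
	select {
	case l, ok := <-stdIn:
		if !ok {
			return // stdin reached EOF
		}
		// Scanner strips the newline, so add it back when forwarding.
		if _, e := fmt.Fprintln(c, l); e != nil {
			return
		}
	case l, ok := <-conIn:
		if !ok {
			return // the connection was closed
		}
		fmt.Println(l)
	}
}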
I have a for-loop that calls a function runCommand(), which runs a remote command on a switch and prints the output. The function is called in a goroutine on each iteration, and I am using a sync.WaitGroup to synchronize the goroutines. Now I need a way to capture the output and any errors of my runCommand() function in a channel. I have read many articles and watched a lot of videos on using channels with goroutines, but this is the first time I have written a concurrent application and I can't seem to wrap my head around the idea.
Basically, my program takes in a list of hostnames from the command line then asynchronously connects to each host, runs a configuration command on it, and prints the output. It is ok for my program to continue configuring the remaining hosts if one has an error.
How would I idiomatically send the output or error(s) of each call to runCommand() to a channel then receive the output or error(s) for printing?
Here is my code:
package main
import (
"fmt"
"golang.org/x/crypto/ssh"
"os"
"time"
"sync"
)
func main() {
hosts := os.Args[1:]
clientConf := configureClient("user", "password")
var wg sync.WaitGroup
for _, host := range hosts {
wg.Add(1)
go runCommand(host, &clientConf, &wg)
}
wg.Wait()
fmt.Println("Configuration complete!")
}
// Run a remote command
func runCommand(host string, config *ssh.ClientConfig, wg *sync.WaitGroup) {
defer wg.Done()
// Connect to the client
client, err := ssh.Dial("tcp", host+":22", config)
if err != nil {
fmt.Println(err)
return
}
defer client.Close()
// Create a session
session, err := client.NewSession()
if err != nil {
fmt.Println(err)
return
}
defer session.Close()
// Get the session output
output, err := session.Output("show lldp ne")
if err != nil {
fmt.Println(err)
return
}
fmt.Print(string(output))
fmt.Printf("Connection to %s closed.\n", host)
}
// Set up client configuration
func configureClient(user, password string) ssh.ClientConfig {
var sshConf ssh.Config
sshConf.SetDefaults()
// Append supported ciphers
sshConf.Ciphers = append(sshConf.Ciphers, "aes128-cbc", "aes256-cbc", "3des-cbc", "des-cbc", "aes192-cbc")
// Create client config
clientConf := &ssh.ClientConfig{
Config: sshConf,
User: user,
Auth: []ssh.AuthMethod{ssh.Password(password)},
HostKeyCallback: ssh.InsecureIgnoreHostKey(),
Timeout: time.Second * 5,
}
return *clientConf
}
EDIT: I got rid of the WaitGroup, as suggested, and now I need to keep track of which output belongs to which host by printing the hostname before its output and printing a "Connection to <host> closed." message when the goroutine completes. For example:
$ go run main.go host1[,host2[,...]]
Connecting to <host1>
[Output]
...
[Error]
Connection to <host1> closed.
Connecting to <host2>
...
Connection to <host2> closed.
Configuration complete!
I know the above won't necessarily process host1 and host2 in order, but I need to print the correct host value for the connecting and closing messages before and after the output/error(s), respectively. I tried deferring the closing message in the runCommand() function, but the message is printed before the output/error(s). Printing the closing message in the for-loop after each goroutine call doesn't work as expected either.
Updated code:
package main
import (
"fmt"
"golang.org/x/crypto/ssh"
"os"
"time"
)
type CmdResult struct {
Host string
Output string
Err error
}
func main() {
start := time.Now()
hosts := os.Args[1:]
clientConf := configureClient("user", "password")
results := make(chan CmdResult)
for _, host := range hosts {
go runCommand(host, &clientConf, results)
}
for i := 0; i < len(hosts); i++ {
output := <- results
fmt.Println(output.Host)
if output.Output != "" {
fmt.Printf("%s\n", output.Output)
}
if output.Err != nil {
fmt.Printf("Error: %v\n", output.Err)
}
}
fmt.Printf("Configuration complete! [%s]\n", time.Since(start).String())
}
// Run a remote command
func runCommand(host string, config *ssh.ClientConfig, ch chan CmdResult) {
// This is printing before the output/error(s).
// Does the same when moved to the bottom of this function.
defer fmt.Printf("Connection to %s closed.\n", host)
// Connect to the client
client, err := ssh.Dial("tcp", host+":22", config)
if err != nil {
ch <- CmdResult{host, "", err}
return
}
defer client.Close()
// Create a session
session, err := client.NewSession()
if err != nil {
ch <- CmdResult{host, "", err}
return
}
defer session.Close()
// Get the session output
output, err := session.Output("show lldp ne")
if err != nil {
ch <- CmdResult{host, "", err}
return
}
ch <- CmdResult{host, string(output), nil}
}
// Set up client configuration
func configureClient(user, password string) ssh.ClientConfig {
var sshConf ssh.Config
sshConf.SetDefaults()
// Append supported ciphers
sshConf.Ciphers = append(sshConf.Ciphers, "aes128-cbc", "aes256-cbc", "3des-cbc", "des-cbc", "aes192-cbc")
// Create client config
clientConf := &ssh.ClientConfig{
Config: sshConf,
User: user,
Auth: []ssh.AuthMethod{ssh.Password(password)},
HostKeyCallback: ssh.InsecureIgnoreHostKey(),
Timeout: time.Second * 5,
}
return *clientConf
}
If you use an unbuffered channel, you actually don't need the sync.WaitGroup, because you can call the receive operator on the channel once for every goroutine that will send on the channel. Each receive operation will block until a send statement is ready, resulting in the same behavior as a WaitGroup.
To make this happen, change runCommand to execute a send statement exactly once before the function exits, under all conditions.
First, create a type to send over the channel:
type CommandResult struct {
Output string
Err error
}
And edit your main() {...} to execute a receive operation on the channel the same number of times as the number of goroutines that will send to the channel:
func main() {
ch := make(chan CommandResult) // initialize an unbuffered channel
// rest of your setup
for _, host := range hosts {
go runCommand(host, &clientConf, ch) // pass in the channel
}
for x := 0; x < len(hosts); x++ {
fmt.Println(<-ch) // this will block until one is ready to send
}
}
And edit your runCommand function to accept the channel, remove references to WaitGroup, and execute the send exactly once under all conditions:
func runCommand(host string, config *ssh.ClientConfig, ch chan CommandResult) {
// do stuff that generates output, err; then when ready to exit function:
ch <- CommandResult{output, err}
}
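One way to guarantee "exactly one send under all conditions" is a deferred send that reads a result variable filled in along the way. This is a sketch under assumptions: the runCommand body is a stand-in for the SSH calls, and runAll is a hypothetical helper that wraps the two loops from main.

```go
package main

import (
	"errors"
	"fmt"
)

// CommandResult mirrors the type from the answer above.
type CommandResult struct {
	Output string
	Err    error
}

// runCommand is a stand-in for the real SSH call (hypothetical logic).
func runCommand(host string, ch chan<- CommandResult) {
	var res CommandResult
	// The deferred send runs on every return path, so the receiver
	// gets exactly one value per goroutine.
	defer func() { ch <- res }()

	if host == "" {
		res.Err = errors.New("empty host name")
		return
	}
	res.Output = "output from " + host
}

// runAll starts one goroutine per host and receives once per goroutine.
func runAll(hosts []string) (ok, failed int) {
	ch := make(chan CommandResult)
	for _, h := range hosts {
		go runCommand(h, ch)
	}
	for range hosts {
		if r := <-ch; r.Err != nil {
			failed++
		} else {
			ok++
		}
	}
	return ok, failed
}

func main() {
	ok, failed := runAll([]string{"sw1", "", "sw2"})
	fmt.Println("ok:", ok, "failed:", failed)
}
```

With this shape, adding a new early return to runCommand cannot accidentally skip the send and leave the receiving loop blocked.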
EDIT: Question updated with stdout message order requirements
I'd like to get nicely formatted output that ignores the order of events
In this case, remove all print statements from runCommand. You're going to put all output into the element you pass on the channel so it can be grouped together. Edit the CommandResult type to contain the additional fields you want to organize, such as:
type CommandResult struct {
Host string
Output string
Err error
}
If you don't need to sort your results, you can just move on to printing the data received, e.g.
for x := 0; x < len(hosts); x++ {
r := <-ch
fmt.Printf("Host: %s----\nOutput: %s\n", r.Host, r.Output)
if r.Err != nil {
fmt.Printf("Error: %s\n", r.Err)
}
}
If you do need to sort your results, then in your main goroutine, add the elements received on the channel to a slice:
...
results := make([]CommandResult, 0, len(hosts))
for x := 0; x < len(hosts); x++ {
results = append(results, <-ch) // this will block until one is ready to send
}
Then you can use the sort package in the Go standard library to sort your results for printing. For example, you could sort them alphabetically by host. Or you could put the results into a map with host string as the key instead of a slice to allow you to print in the order of the original host list.
I'm trying to understand the difference in Go between creating an anonymous function which takes a parameter, versus having that function act as a closure. Here is an example of the difference.
With parameter:
func main() {
done := make(chan bool, 1)
go func(c chan bool) {
time.Sleep(50 * time.Millisecond)
c <- true
}(done)
<-done
}
As closure:
func main() {
done := make(chan bool, 1)
go func() {
time.Sleep(50 * time.Millisecond)
done <- true
}()
<-done
}
My question is, when is the first form better than the second? Would you ever use a parameter for this kind of thing? The only time I can see the first form being useful is when returning a func(x, y) from another function.
The difference between using a closure vs using a function parameter has to do with sharing the same variable vs getting a copy of the value. Consider these two examples below.
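In the closure example, all function calls read the same variable i. Before Go 1.22, where a for loop shared one variable across its iterations, that value will most likely already have reached 3 before any of the goroutines has had time to print it. (In modules that declare Go 1.22 or later, each iteration gets its own i, so the closure form prints 0, 1, and 2 in some order.)
In the parameter example, each function call is passed a copy of the value of i at the time of the call, giving the result we more likely wanted: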
Closure:
for i := 0; i < 3; i++ {
go func() {
fmt.Println(i)
}()
}
Result:
3
3
3
Parameter:
for i := 0; i < 3; i++ {
go func(v int) {
fmt.Println(v)
}(i)
}
Result:
0
1
2
Playground: http://play.golang.org/p/T5rHrIKrQv
When to use parameters
The first form is definitely preferred if you plan to change the variable later and do not want the function to observe that change.
This is the typical case when the anonymous function is inside a for loop and you intend to use the loop's variables, for example:
for i := 0; i < 10; i++ {
go func(i int) {
fmt.Println(i)
}(i)
}
Without passing the variable i, you might observe 10 printed ten times (before Go 1.22, when all iterations shared one i). With i passed as a parameter, you will observe the numbers 0 to 9 printed, in some order.
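The sharing-versus-copy distinction does not depend on loops or goroutines. Here is a small deterministic sketch (snapshot is a hypothetical name) in which a variable is changed after both functions are created:

```go
package main

import "fmt"

// snapshot builds one closure over x and one function that received a
// copy of x, changes x, and then reports what each of them sees.
func snapshot() (viaClosure, viaParam int) {
	x := 1

	// The closure refers to the variable x itself.
	readShared := func() int { return x }

	// The parameter v is a copy of x made at call time.
	readCopy := func(v int) func() int {
		return func() int { return v }
	}(x)

	x = 2 // a later change to the outer variable

	return readShared(), readCopy()
}

func main() {
	shared, copied := snapshot()
	fmt.Println(shared, copied) // prints "2 1"
}
```

The closure observes the later assignment, while the parameter keeps the value it was given.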
When not to use parameters
If you don't want to change the value of the variable, it is cheaper not to pass it and thus not create another copy of it. This is especially true for large structs. Although if you later alter the code and modify the variable, you may easily forget to check its effect on the closure and get unexpected results.
Also there might be cases when you do want to observe changes made to "outer" variables, such as:
func GetRes(name string) (Res, error) {
res, err := somepack.OpenRes(name)
if err != nil {
return nil, err
}
closeres := true
defer func() {
if closeres {
res.Close()
}
}()
// Do other stuff
if err = otherStuff(); err != nil {
return nil, err // res will be closed
}
// Everything went well, return res, but
// res must not be closed, it will be the responsibility of the caller
closeres = false
return res, nil // res will not be closed
}
In this case, GetRes() opens some resource. Before returning it, other things have to be done, which might also fail. If those fail, res must be closed and not returned. If everything goes well, res must be returned without being closed.
This is an example of passing a parameter, from the net package's Listen example:
package main
import (
"io"
"log"
"net"
)
func main() {
// Listen on TCP port 2000 on all available unicast and
// anycast IP addresses of the local system.
l, err := net.Listen("tcp", ":2000")
if err != nil {
log.Fatal(err)
}
defer l.Close()
for {
// Wait for a connection.
conn, err := l.Accept()
if err != nil {
log.Fatal(err)
}
// Handle the connection in a new goroutine.
// The loop then returns to accepting, so that
// multiple connections may be served concurrently.
go func(c net.Conn) {
// Echo all incoming data.
io.Copy(c, c)
// Shut down the connection.
c.Close()
}(conn)
}
}
I am trying to build a zip archive from a large number of small-to-medium-sized files. I want to do this concurrently, since compression is CPU-intensive and I'm running on a multi-core server. I also don't want to hold the whole archive in memory, since it might turn out to be large.
My question is: do I have to compress every file myself and then manually combine everything with the zip headers, checksums, etc.?
Any help would be greatly appreciated.
I don't think you can combine the zip headers.
What you could do is, run the zip.Writer sequentially, in a separate goroutine, and then spawn a new goroutine for each file that you want to read, and pipe those to the goroutine that is zipping them.
This should reduce the IO overhead that you get by reading the files sequentially, although it probably won't leverage multiple cores for the archiving itself.
Here's a working example. Note that, to keep things simple, it does not handle errors nicely (it just panics if something goes wrong), and it does not use the defer statement much, to demonstrate the order in which things should happen. Since defer is LIFO, it can sometimes be confusing when you stack a lot of them together.
package main
import (
"archive/zip"
"io"
"os"
"sync"
)
func ZipWriter(files chan *os.File) *sync.WaitGroup {
f, err := os.Create("out.zip")
if err != nil {
panic(err)
}
var wg sync.WaitGroup
wg.Add(1)
zw := zip.NewWriter(f)
go func() {
// Note the order (LIFO):
defer wg.Done() // 2. signal that we're done
defer f.Close() // 1. close the file
var err error
var fw io.Writer
for f := range files {
// Loop until channel is closed.
if fw, err = zw.Create(f.Name()); err != nil {
panic(err)
}
io.Copy(fw, f)
if err = f.Close(); err != nil {
panic(err)
}
}
// The zip writer must be closed *before* f.Close() is called!
if err = zw.Close(); err != nil {
panic(err)
}
}()
return &wg
}
func main() {
files := make(chan *os.File)
wait := ZipWriter(files)
// Send all files to the zip writer.
var wg sync.WaitGroup
wg.Add(len(os.Args)-1)
for i, name := range os.Args {
if i == 0 {
continue
}
// Read each file in parallel:
go func(name string) {
defer wg.Done()
f, err := os.Open(name)
if err != nil {
panic(err)
}
files <- f
}(name)
}
wg.Wait()
// Once we're done sending the files, we can close the channel.
close(files)
// This will cause ZipWriter to break out of the loop, close the file,
// and unblock the Wait call below:
wait.Wait()
}
Usage: go run example.go /path/to/*.log.
This is the order in which things should happen:
1. Open the output file for writing.
2. Create a zip.Writer with that file.
3. Kick off a goroutine listening for files on a channel.
4. Go through each file; this can be done in one goroutine per file.
5. Send each file to the goroutine created in step 3.
6. After processing each file in that goroutine, close the file to free up resources.
7. Once each file has been sent to that goroutine, close the channel.
8. Wait until the zipping has been done (which is done sequentially).
9. Once zipping is done (channel exhausted), close the zip writer.
10. Only when the zip writer is closed should the output file be closed.
11. Finally, with everything closed, call Done on the sync.WaitGroup to tell the calling function that we're good to go. (A channel could also be used here, but sync.WaitGroup seems more elegant.)
When you get the signal from the zip writer that everything is properly closed, you can exit from main and terminate nicely.
This might not answer your question, but I've been using similar code to generate zip archives on-the-fly for a web service some time ago. It performed quite well, even though the actual zipping was done in a single goroutine. Overcoming the IO bottleneck can already be an improvement.
From the look of it, you won't be able to parallelise the compression using the standard library archive/zip package because:
1. Compression is performed by the io.Writer returned by zip.Writer.Create or CreateHeader.
2. Calling Create/CreateHeader implicitly closes the writer returned by the previous call.
So passing the writers returned by Create to multiple goroutines and writing to them in parallel will not work.
If you wanted to write your own parallel zip writer, you'd probably want to structure it something like this:
1. Have multiple goroutines compress files using the compress/flate package, keeping track of the CRC32 value and length of the uncompressed data. Direct the output to temporary files, and note the compressed size of the data.
2. Once everything has been compressed, start writing the zip file, starting with the header.
3. For each compressed file, write out the file header followed by the contents of the corresponding temporary file.
4. Write out the central directory record and end record at the end of the file. All the required information should be available at this point.
For added parallelism, step 1 could be performed in parallel with the remaining steps by using a channel to indicate when compression of each file completes.
Due to the file format, you won't be able to perform parallel compression without either storing compressed data in memory or in temporary files.
With Go 1.17, parallel compression and merging of zip files are possible using the archive/zip package, thanks to the zip.Writer.Copy method added in that release.
An example is below. In the example, I create zip workers to create individual zip files and an entry provider worker which provides entries to be added to a zip file via a channel to zip workers. Actual files can be provided to the zip workers but I skipped that part.
package main
import (
"archive/zip"
"context"
"fmt"
"io"
"log"
"os"
"strings"
"golang.org/x/sync/errgroup"
)
const numOfZipWorkers = 10
type entry struct {
name string
rc io.ReadCloser
}
func main() {
log.SetFlags(log.LstdFlags | log.Lshortfile)
entCh := make(chan entry, numOfZipWorkers)
zpathCh := make(chan string, numOfZipWorkers)
group, ctx := errgroup.WithContext(context.Background())
for i := 0; i < numOfZipWorkers; i++ {
group.Go(func() error {
return zipWorker(ctx, entCh, zpathCh)
})
}
group.Go(func() error {
defer close(entCh) // Signal workers to stop.
return entryProvider(ctx, entCh)
})
err := group.Wait()
if err != nil {
log.Fatal(err)
}
f, err := os.OpenFile("output.zip", os.O_CREATE|os.O_TRUNC|os.O_WRONLY, 0644)
if err != nil {
log.Fatal(err)
}
zw := zip.NewWriter(f)
close(zpathCh)
for path := range zpathCh {
zrd, err := zip.OpenReader(path)
if err != nil {
log.Fatal(err)
}
for _, zf := range zrd.File {
err := zw.Copy(zf)
if err != nil {
log.Fatal(err)
}
}
_ = zrd.Close()
_ = os.Remove(path)
}
err = zw.Close()
if err != nil {
log.Fatal(err)
}
err = f.Close()
if err != nil {
log.Fatal(err)
}
}
func entryProvider(ctx context.Context, entCh chan<- entry) error {
for i := 0; i < 2*numOfZipWorkers; i++ {
select {
case <-ctx.Done():
return ctx.Err()
case entCh <- entry{
name: fmt.Sprintf("file_%d", i+1),
rc: io.NopCloser(strings.NewReader(fmt.Sprintf("content %d", i+1))),
}:
}
}
return nil
}
func zipWorker(ctx context.Context, entCh <-chan entry, zpathch chan<- string) error {
f, err := os.CreateTemp(".", "tmp-part-*")
if err != nil {
return err
}
zw := zip.NewWriter(f)
Loop:
for {
var (
ent entry
ok bool
)
select {
case <-ctx.Done():
err = ctx.Err()
break Loop
case ent, ok = <-entCh:
if !ok {
break Loop
}
}
hdr := &zip.FileHeader{
Name: ent.name,
Method: zip.Deflate, // zip.Store can also be used.
}
hdr.SetMode(0644)
w, e := zw.CreateHeader(hdr)
if e != nil {
_ = ent.rc.Close()
err = e
break
}
_, e = io.Copy(w, ent.rc)
_ = ent.rc.Close()
if e != nil {
err = e
break
}
}
if e := zw.Close(); e != nil && err == nil {
err = e
}
if e := f.Close(); e != nil && err == nil {
err = e
}
if err == nil {
select {
case <-ctx.Done():
err = ctx.Err()
case zpathch <- f.Name():
}
}
return err
}