Go code running in the background
I am a beginner with the Go language.
I wrote a small program that plays a sound when a key is pressed.
After go build main.go, I can hear the key sound in the current shell.
But when I run ./main in the background, or open a new shell, I no longer hear the key sound.
This is what I need help with.
package main

import (
    "fmt"
    "log"
    "os"
    "path/filepath"
    "time"

    "github.com/eiannone/keyboard"
    "github.com/faiface/beep"
    "github.com/faiface/beep/speaker"
    "github.com/faiface/beep/wav"
)

func main() {
    env := os.Getenv("HOME")
    fmt.Println(env)

    err := keyboard.Open()
    if err != nil {
        panic(err)
    }
    defer keyboard.Close()

    for {
        char, key, err := keyboard.GetKey()
        if err != nil {
            log.Fatal(err)
        } else if key == keyboard.KeyEsc {
            break
        }

        n := int(char)
        path := "./wav/*.wav"
        fpath, _ := filepath.Glob(path)
        name := fpath[n]

        f, err := os.Open(name)
        if err != nil {
            log.Fatal(err)
        }

        streamer, format, err := wav.Decode(f)
        if err != nil {
            log.Fatal(err)
        }
        defer streamer.Close()

        speaker.Init(format.SampleRate, format.SampleRate.N(time.Second/10))

        done := make(chan bool)
        speaker.Play(beep.Seq(streamer, beep.Callback(func() {
            done <- true
        })))
        <-done
    }
}
I want it to run in the background and still hear the corresponding sound when I press a key.
If the program is running in the background, it will not get any input from the keyboard.
Also, since n depends on user input, it can easily exceed the number of sound files, and you would get an index-out-of-range error at run time. You should do something like this:
if fpath, _ := filepath.Glob("./wav/*.wav"); len(fpath) > 0 {
    n := int(char) % len(fpath)
    name := fpath[n]
    ...
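For context, here is a sketch of how that guard could slot into the question's loop (illustrative only; it goes inside the existing main with the question's imports, and the decode/play code stays as it was):

for {
    char, key, err := keyboard.GetKey()
    if err != nil {
        log.Fatal(err)
    } else if key == keyboard.KeyEsc {
        break
    }

    // Guard against indexing past the end of the file list.
    fpath, _ := filepath.Glob("./wav/*.wav")
    if len(fpath) == 0 {
        continue // no sound files found, nothing to play
    }
    name := fpath[int(char)%len(fpath)]
    fmt.Println("would play:", name) // the open/decode/play code from the question goes here
}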
In Go, I would like to execute a binary from within my application and continually read what the command prints to stdout. However, the one caveat is that the binary is programmed to execute its task infinitely until it reads the enter key, and I don't have access to the binary's source code. If I execute the binary directly from a terminal, it behaves correctly. However, if I execute the binary from within my application, it somehow thinks that it reads the enter key, and closes almost immediately. Here is a code snippet demonstrating how I'm trying to execute the binary, pipe its stdout, and print it to the screen:
package main

import (
    "bufio"
    "fmt"
    "os/exec"
)

func main() {
    // The binary that I want to execute.
    cmd := exec.Command("/usr/lib/demoApp")

    // Pipe the command's output.
    stdout, err := cmd.StdoutPipe()
    if err != nil {
        fmt.Println(err)
    }
    stdoutReader := bufio.NewReader(stdout)

    // Start the command.
    err = cmd.Start()
    if err != nil {
        fmt.Println(err)
    }

    // Read and print the command's output.
    buff := make([]byte, 1024)
    var n int
    for err == nil {
        n, err = stdoutReader.Read(buff)
        if n > 0 {
            fmt.Printf(string(buff[0:n]))
        }
    }
    _ = cmd.Wait()
}
Any ideas if what I'm trying to accomplish is possible?
As #mgagnon mentioned, your problem might lie somewhere else; perhaps the external binary just bails because it is not running in a terminal. Using the following to simulate demoApp:
package main

import (
    "bufio"
    "fmt"
    "os"
    "time"
)

func main() {
    fmt.Println("Press enter to exit")

    // Every second, report fake progress.
    go func() {
        for {
            fmt.Print("Doing stuff...\n")
            time.Sleep(time.Second)
        }
    }()

    for {
        // Read a single byte and, if it is enter, exit.
        consoleReader := bufio.NewReaderSize(os.Stdin, 1)
        input, _ := consoleReader.ReadByte()

        // Enter = 10 | 13 (LF or CR).
        if input == 10 || input == 13 {
            fmt.Println("Exiting...")
            os.Exit(0)
        }
    }
}
... this works fine for me:
package main

import (
    "bufio"
    "fmt"
    "io"
    "log"
    "os/exec"
    "time"
)

func main() {
    cmd := exec.Command("demoApp.exe")

    stdout, err := cmd.StdoutPipe()
    if err != nil {
        panic(err)
    }
    stdin, err := cmd.StdinPipe()
    if err != nil {
        log.Fatal(err)
    }

    go func() {
        defer stdin.Close()
        // After 3 seconds of running, send a newline to make the program exit.
        time.Sleep(time.Second * 3)
        io.WriteString(stdin, "\n")
    }()

    cmd.Start()

    // Scan and print the command's stdout.
    scanner := bufio.NewScanner(stdout)
    for scanner.Scan() {
        fmt.Println(scanner.Text())
    }

    // Wait for the program to exit.
    cmd.Wait()
}
$ go run main.go
Press enter to exit
Doing stuff...
Doing stuff...
Doing stuff...
Exiting...
The only difference between this and your code is that I'm using stdin to send a newline after 3 seconds to terminate the cmd. I'm also using a Scanner for brevity.
Using this as my /usr/lib/demoApp:
package main

import (
    "fmt"
    "time"
)

func main() {
    for {
        fmt.Print("North East South West")
        time.Sleep(time.Second)
    }
}
This program works as expected:
package main

import (
    "os"
    "os/exec"
)

func main() {
    cmd := exec.Command("demoApp")

    stdout, err := cmd.StdoutPipe()
    if err != nil {
        panic(err)
    }

    cmd.Start()
    defer cmd.Wait()

    for {
        var b [1024]byte
        // Write only the bytes actually read, and stop on error/EOF.
        n, err := stdout.Read(b[:])
        if n > 0 {
            os.Stdout.Write(b[:n])
        }
        if err != nil {
            break
        }
    }
}
I am new to Go, so please review the code and suggest any changes required.
The problem statement is as follows:
We have a file whose contents are binary and encrypted. The only way to read the contents is with a custom utility (named decode_it). The command just accepts a filename, like below:
decode_it filename.d
Now what I have to do is live-monitor the output of the decode_it utility in Go. I have written code that works, but it is not able to process the latest tailed output: it waits for some time before reading the last chunk, until more data comes in. s.Scan() is the call that does not return the latest changes in the utility's output. I have another terminal open side by side, so I know whether a line has been appended. The Go Scan() function only returns once another chunk is appended at the end.
Please help. Suggest any changes required, and if possible an alternative approach.
The utility's output looks like this (there is a lot of it, and it keeps arriving):
1589261318 493023 8=DECODE|9=59|10=053|34=1991|35=0|49=TEST|52=20200512-05:28:38|56=TEST|57=ADMIN|
1589261368 538427 8=DECODE|9=59|10=054|34=1992|35=0|49=TEST|52=20200512-05:29:28|56=TEST|57=ADMIN|
1589261418 579765 8=DECODE|9=59|10=046|34=1993|35=0|49=TEST|52=20200512-05:30:18|56=TEST|57=ADMIN|
1589261468 627052 8=DECODE|9=59|10=047|34=1994|35=0|49=TEST|52=20200512-05:31:08|56=TEST|57=ADMIN|
1589261518 680570 8=DECODE|9=59|10=053|34=1995|35=0|49=TEST|52=20200512-05:31:58|56=TEST|57=ADMIN|
1589261568 722516 8=DECODE|9=59|10=054|34=1996|35=0|49=TEST|52=20200512-05:32:48|56=TEST|57=ADMIN|
1589261618 766070 8=DECODE|9=59|10=055|34=1997|35=0|49=TEST|52=20200512-05:33:38|56=TEST|57=ADMIN|
1589261668 807964 8=DECODE|9=59|10=056|34=1998|35=0|49=TEST|52=20200512-05:34:28|56=TEST|57=ADMIN|
1589261718 853464 8=DECODE|9=59|10=057|34=1999|35=0|49=TEST|52=20200512-05:35:18|56=TEST|57=ADMIN|
1589261768 898758 8=DECODE|9=59|10=031|34=2000|35=0|49=TEST|52=20200512-05:36:08|56=TEST|57=ADMIN|
1589261818 948236 8=DECODE|9=59|10=037|34=2001|35=0|49=TEST|52=20200512-05:36:58|56=TEST|57=ADMIN|
1589261868 995181 8=DECODE|9=59|10=038|34=2002|35=0|49=TEST|52=20200512-05:37:48|56=TEST|57=ADMIN|
1589261918 36727 8=DECODE|9=59|10=039|34=2003|35=0|49=TEST|52=20200512-05:38:38|56=TEST|57=ADMIN|
1589261968 91253 8=DECODE|9=59|10=040|34=2004|35=0|49=TEST|52=20200512-05:39:28|56=TEST|57=ADMIN|
1589262018 129336 8=DECODE|9=59|10=032|34=2005|35=0|49=TEST|52=20200512-05:40:18|56=TEST|57=ADMIN|
1589262068 173247 8=DECODE|9=59|10=033|34=2006|35=0|49=TEST|52=20200512-05:41:08|56=TEST|57=ADMIN|
1589262118 214993 8=DECODE|9=59|10=039|34=2007|35=0|49=TEST|52=20200512-05:41:58|56=TEST|57=ADMIN|
1589262168 256754 8=DECODE|9=59|10=040|34=2008|35=0|49=TEST|52=20200512-05:42:48|56=TEST|57=ADMIN|
1589262218 299908 8=DECODE|9=59|10=041|34=2009|35=0|49=TEST|52=20200512-05:43:38|56=TEST|57=ADMIN|
1589262268 345560 8=DECODE|9=59|10=033|34=2010|35=0|49=TEST|52=20200512-05:44:28|56=TEST|57=ADMIN|
1589262318 392894 8=DECODE|9=59|10=034|34=2011|35=0|49=TEST|52=20200512-05:45:18|56=TEST|57=ADMIN|
1589262368 439936 8=DECODE|9=59|10=035|34=2012|35=0|49=TEST|52=20200512-05:46:08|56=TEST|57=ADMIN|
1589262418 484959 8=DECODE|9=59|10=041|34=2013|35=0|49=TEST|52=20200512-05:46:58|56=TEST|57=ADMIN|
1589262468 531136 8=DECODE|9=59|10=042|34=2014|35=0|49=TEST|52=20200512-05:47:48|56=TEST|57=ADMIN|
1589262518 577190 8=DECODE|9=59|10=043|34=2015|35=0|49=TEST|52=20200512-05:48:38|56=TEST|57=ADMIN|
1589262568 621673 8=DECODE|9=59|10=044|34=2016|35=0|49=TEST|52=20200512-05:49:28|56=TEST|57=ADMIN|
1589262618 661569 8=DECODE|9=59|10=036|34=2017|35=0|49=TEST|52=20200512-05:50:18|56=TEST|57=ADMIN|
1589262668 704912 8=DECODE|9=59|10=037|34=2018|35=0|49=TEST|52=20200512-05:51:08|56=TEST|57=ADMIN|
1589262718 751844 8=DECODE|9=59|10=043|34=2019|35=0|49=TEST|52=20200512-05:51:58|56=TEST|57=ADMIN|
1589262768 792980 8=DECODE|9=59|10=035|34=2020|35=0|49=TEST|52=20200512-05:52:48|56=TEST|57=ADMIN|
1589262818 840365 8=DECODE|9=59|10=036|34=2021|35=0|49=TEST|52=20200512-05:53:38|56=TEST|57=ADMIN|
1589262868 879185 8=DECODE|9=59|10=037|34=2022|35=0|49=TEST|52=20200512-05:54:28|56=TEST|57=ADMIN|
1589262918 925163 8=DECODE|9=59|10=038|34=2023|35=0|49=TEST|52=20200512-05:55:18|56=TEST|57=ADMIN|
1589262968 961584 8=DECODE|9=59|10=039|34=2024|35=0|49=TEST|52=20200512-05:56:08|56=TEST|57=ADMIN|
1589263018 10120 8=DECODE|9=59|10=045|34=2025|35=0|49=TEST|52=20200512-05:56:58|56=TEST|57=ADMIN|
1589263068 53127 8=DECODE|9=59|10=046|34=2026|35=0|49=TEST|52=20200512-05:57:48|56=TEST|57=ADMIN|
1589263118 92960 8=DECODE|9=59|10=047|34=2027|35=0|49=TEST|52=20200512-05:58:38|56=TEST|57=ADMIN|
1589263168 134768 8=DECODE|9=59|10=048|34=2028|35=0|49=TEST|52=20200512-05:59:28|56=TEST|57=ADMIN|
1589263218 180362 8=DECODE|9=59|10=035|34=2029|35=0|49=TEST|52=20200512-06:00:18|56=TEST|57=ADMIN|
1589263268 220070 8=DECODE|9=59|10=027|34=2030|35=0|49=TEST|52=20200512-06:01:08|56=TEST|57=ADMIN|
1589263318 269426 8=DECODE|9=59|10=033|34=2031|35=0|49=TEST|52=20200512-06:01:58|56=TEST|57=ADMIN|
1589263368 309432 8=DECODE|9=59|10=034|34=2032|35=0|49=TEST|52=20200512-06:02:48|56=TEST|57=ADMIN|
1589263418 356561 8=DECODE|9=59|10=035|34=2033|35=0|49=TEST|52=20200512-06:03:38|56=TEST|57=ADMIN|
Code:
package main

import (
    "bufio"
    "bytes"
    "fmt"
    "io"
    "log"
    "os/exec"
)

// dropCRLR drops a terminal \r from the data.
func dropCRLR(data []byte) []byte {
    if len(data) > 0 && data[len(data)-1] == '\r' {
        return data[0 : len(data)-1]
    }
    return data
}

func newLineSplitFunc(data []byte, atEOF bool) (advance int, token []byte, err error) {
    if atEOF && len(data) == 0 {
        return 0, nil, nil
    }
    if i := bytes.IndexByte(data, '\n'); i >= 0 {
        // We have a full newline-terminated line.
        return i + 1, dropCRLR(data[0:i]), nil
    }
    // If we're at EOF, we have a final, non-terminated line. Return it.
    if atEOF {
        return len(data), dropCRLR(data), nil
    }
    // Request more data.
    // fmt.Println("Returning 0,nil,nil")
    return 0, nil, nil
}

func main() {
    cmd := exec.Command("decode_it", "filename.d", "4", "1")

    var out io.Reader
    {
        stdout, err := cmd.StdoutPipe()
        if err != nil {
            log.Fatal(err)
        }
        stderr, err := cmd.StderrPipe()
        if err != nil {
            log.Fatal(err)
        }
        out = io.MultiReader(stdout, stderr)
    }

    if err := cmd.Start(); err != nil {
        log.Fatal(err)
    }

    // Make a new channel which will be used to ensure we get all output.
    done := make(chan struct{})
    go func() {
        // defer cmd.Process.Kill()
        s := bufio.NewScanner(out)
        s.Split(newLineSplitFunc)
        for s.Scan() {
            fmt.Println("---- " + s.Text())
        }
        if s.Err() != nil {
            fmt.Printf("error: %s\n", s.Err())
        }
    }()

    // Wait for all output to be processed.
    <-done

    // Wait for the command to finish.
    if err := cmd.Wait(); err != nil {
        fmt.Println("Error: " + string(err.Error()))
    }

    // if out closes, cmd closed.
    log.Println("all done")
}
Also, Scan() takes a lot of time, and the program gets stuck in a loop that I am not able to break out of. Please help with that too.
Try something like this; I fixed some issues and made it simpler:
package main

import (
    "bufio"
    "fmt"
    "io"
    "log"
    "os/exec"
)

func main() {
    var err error

    // Change to your command.
    cmd := exec.Command("sh", "test.sh")

    var out io.Reader
    {
        var stdout, stderr io.ReadCloser
        stdout, err = cmd.StdoutPipe()
        if err != nil {
            log.Fatal(err)
        }
        stderr, err = cmd.StderrPipe()
        if err != nil {
            log.Fatal(err)
        }
        out = io.MultiReader(stdout, stderr)
    }

    if err = cmd.Start(); err != nil {
        log.Fatal(err)
    }

    scanner := bufio.NewScanner(out)
    for scanner.Scan() {
        fmt.Println("---- " + scanner.Text())
    }
    if err = scanner.Err(); err != nil {
        fmt.Printf("error: %v\n", err)
    }

    log.Println("all done")
}
The test.sh I used for testing:
#!/bin/bash
while [[ 1 = 1 ]]; do
    echo 1
    sleep 1
done
:)
I resolved the above issue using stdbuf:
cmd := exec.Command("stdbuf", "-o0", "-e0", "decode_it", FILEPATH, "4", "1")
Reference link - STDIO Buffering
When a program writes to a terminal, its stdout is line-buffered; when it writes to anything else (such as a pipe), it is fully buffered. Go's exec.Command connects the child process to a pipe, so the child ends up fully buffered, and stdbuf -o0 -e0 forces it to be unbuffered instead.
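An alternative workaround, if stdbuf is unavailable, is to run the child under a pseudo-terminal so it believes it is writing to a terminal and line-buffers on its own. A minimal sketch, assuming Linux and the third-party github.com/creack/pty package (decode_it and its arguments are taken from the question):

package main

import (
    "bufio"
    "fmt"
    "log"
    "os/exec"

    "github.com/creack/pty"
)

func main() {
    cmd := exec.Command("decode_it", "filename.d", "4", "1")

    // pty.Start attaches the command to a pseudo-terminal, so the child's
    // stdio is line-buffered just as it would be in an interactive shell.
    out, err := pty.Start(cmd)
    if err != nil {
        log.Fatal(err)
    }
    defer out.Close()

    scanner := bufio.NewScanner(out)
    for scanner.Scan() {
        fmt.Println("---- " + scanner.Text())
    }
    _ = cmd.Wait()
}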
I want to write a mime/multipart message in Python to standard output and read that message in Golang using the mime/multipart package. This is just a learning exercise.
I tried simulating this example.
output.py
#!/usr/bin/env python2.7
import sys
s = "--foo\r\nFoo: one\r\n\r\nA section\r\n" + "--foo\r\nFoo: two\r\n\r\nAnd another\r\n" + "--foo--\r\n"
print s
main.go
package main

import (
    "fmt"
    "io"
    "io/ioutil"
    "log"
    "mime/multipart"
    "os/exec"
    "sync"
)

var wg sync.WaitGroup

func main() {
    pr, pw := io.Pipe()
    defer pw.Close()

    cmd := exec.Command("python", "output.py")
    cmd.Stdout = pw

    mr := multipart.NewReader(pr, "foo")

    wg.Add(1)
    go func() {
        defer wg.Done()
        for {
            p, err := mr.NextPart()
            if err == io.EOF {
                fmt.Println("EOF")
                return
            }
            if err != nil {
                log.Fatal(err)
            }
            slurp, err := ioutil.ReadAll(p)
            if err != nil {
                log.Fatal(err)
            }
            fmt.Printf("Part : %q\n", slurp)
            return
        }
    }()

    if err := cmd.Start(); err != nil {
        log.Fatal(err)
    }
    cmd.Wait()
    wg.Wait()
}
Output of go run main.go:
fatal error: all goroutines are asleep - deadlock!
Other answers regarding this topic on StackOverflow are related to channels not being closed, but I am not even using a channel. I understand that somewhere there is an infinite loop or something similar, but I don't see it.
Try something like this (explanation below):
package main

import (
    "fmt"
    "io"
    "io/ioutil"
    "log"
    "mime/multipart"
    "os"
    "os/exec"
    "sync"

    "github.com/pkg/errors"
)

func readCommand(cmdStdout io.ReadCloser, wg *sync.WaitGroup, resc chan<- []byte, errc chan<- error) {
    defer wg.Done()
    defer close(errc)
    defer close(resc)

    mr := multipart.NewReader(cmdStdout, "foo")
    for {
        part, err := mr.NextPart()
        if err != nil {
            if err == io.EOF {
                fmt.Println("EOF")
            } else {
                errc <- errors.Wrap(err, "failed to get next part")
            }
            return
        }
        slurp, err := ioutil.ReadAll(part)
        if err != nil {
            errc <- errors.Wrap(err, "failed to read part")
            return
        }
        resc <- slurp
    }
}

func main() {
    cmd := exec.Command("python", "output.py")
    cmd.Stderr = os.Stderr

    pr, err := cmd.StdoutPipe()
    if err != nil {
        log.Fatal(err)
    }

    var wg sync.WaitGroup
    wg.Add(1)
    resc := make(chan []byte)
    errc := make(chan error)
    go readCommand(pr, &wg, resc, errc)

    if err := cmd.Start(); err != nil {
        log.Fatal(err)
    }

    for {
        select {
        case err, ok := <-errc:
            if !ok {
                errc = nil
                break
            }
            if err != nil {
                log.Fatal(errors.Wrap(err, "error from goroutine"))
            }
        case res, ok := <-resc:
            if !ok {
                resc = nil
                break
            }
            fmt.Printf("Part from goroutine: %q\n", res)
        }
        if errc == nil && resc == nil {
            break
        }
    }

    cmd.Wait()
    wg.Wait()
}
In no particular order:
Rather than using an io.Pipe() as the command's Stdout, just ask the command for its StdoutPipe(). cmd.Wait() will ensure it's closed for you.
Set cmd.Stderr to os.Stderr so that you can see errors generated by your Python program.
I noticed this program was hanging anytime the Python program wrote to standard error. Now it doesn't :)
Don't make the WaitGroup a global variable; pass a reference to it to the goroutine.
Rather than log.Fatal()ing inside the goroutine, create an error channel to communicate errors back to main().
Rather than printing results inside the goroutine, create a result channel to communicate results back to main().
Ensure channels are closed to prevent blocking/goroutine leaks.
Separate out the goroutine into a proper function to make the code easier to read and follow.
In this example, we can create the multipart.Reader() inside our goroutine, since this is the only part of our code that uses it.
Note that I am using Wrap() from the errors package to add context to the error messages. This is, of course, not relevant to your question, but is a good habit.
The for { select { ... } } part may be confusing. This is one article I found introducing the concept. Basically, select is letting us read from whichever of these two channels (resc and errc) are currently readable, and then setting each to nil when the channel is closed. When both channels are nil, the loop exits. This lets us handle "either a result or an error" as they come in.
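Distilled down, the nil-channel pattern on its own looks roughly like this (a standalone sketch, not tied to the multipart code above):

package main

import "fmt"

// drain reads from both channels until each is closed. A nil channel is
// never ready in a select, so nil-ing a closed channel removes its case.
func drain(resc chan int, errc chan error) {
    for resc != nil || errc != nil {
        select {
        case v, ok := <-resc:
            if !ok {
                resc = nil
                continue
            }
            fmt.Println("result:", v)
        case err, ok := <-errc:
            if !ok {
                errc = nil
                continue
            }
            fmt.Println("error:", err)
        }
    }
}

func main() {
    resc := make(chan int)
    errc := make(chan error)
    go func() {
        defer close(errc)
        defer close(resc)
        resc <- 1
        resc <- 2
    }()
    drain(resc, errc)
}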
Edit: As johandalabacka said on the Golang Forum, it looks like the main issue here was that Python on Windows was adding an extra \r to the output. The fix is for your Python program to omit the \r in the output string, or to use sys.stdout.write() instead of print. The output could also be cleaned up on the Go side but, aside from the parsing not working without modifying the Python side, this answer will still improve the concurrency mechanics of your program.
I am trying to build a zip archive from a large number of small to medium-sized files. I want to be able to do this concurrently, since compression is CPU-intensive and I'm running on a multi-core server. I also don't want to hold the whole archive in memory, since it might turn out to be large.
My question: do I have to compress every file separately and then manually combine everything together with the zip header, checksum, etc.?
Any help would be greatly appreciated.
I don't think you can combine the zip headers.
What you could do is run the zip.Writer sequentially in a separate goroutine, then spawn a new goroutine for each file that you want to read, and pipe those to the goroutine that is zipping them.
This should reduce the IO overhead that you get by reading the files sequentially, although it probably won't leverage multiple cores for the archiving itself.
Here's a working example. Note that, to keep things simple, it does not handle errors nicely (it just panics if something goes wrong), and it does not use the defer statement too much, to demonstrate the order in which things should happen. Since defer is LIFO, it can sometimes be confusing when you stack a lot of them together.
package main

import (
    "archive/zip"
    "io"
    "os"
    "sync"
)

func ZipWriter(files chan *os.File) *sync.WaitGroup {
    f, err := os.Create("out.zip")
    if err != nil {
        panic(err)
    }
    var wg sync.WaitGroup
    wg.Add(1)
    zw := zip.NewWriter(f)
    go func() {
        // Note the order (LIFO):
        defer wg.Done() // 2. signal that we're done
        defer f.Close() // 1. close the file
        var err error
        var fw io.Writer
        for f := range files {
            // Loop until channel is closed.
            if fw, err = zw.Create(f.Name()); err != nil {
                panic(err)
            }
            io.Copy(fw, f)
            if err = f.Close(); err != nil {
                panic(err)
            }
        }
        // The zip writer must be closed *before* f.Close() is called!
        if err = zw.Close(); err != nil {
            panic(err)
        }
    }()
    return &wg
}

func main() {
    files := make(chan *os.File)
    wait := ZipWriter(files)

    // Send all files to the zip writer.
    var wg sync.WaitGroup
    wg.Add(len(os.Args) - 1)
    for i, name := range os.Args {
        if i == 0 {
            continue
        }
        // Read each file in parallel:
        go func(name string) {
            defer wg.Done()
            f, err := os.Open(name)
            if err != nil {
                panic(err)
            }
            files <- f
        }(name)
    }

    wg.Wait()
    // Once we're done sending the files, we can close the channel.
    close(files)
    // This will cause ZipWriter to break out of the loop, close the file,
    // and unblock the next mutex:
    wait.Wait()
}
Usage: go run example.go /path/to/*.log.
This is the order in which things should happen:
1. Open the output file for writing.
2. Create a zip.Writer with that file.
3. Kick off a goroutine listening for files on a channel.
4. Go through each file; this can be done in one goroutine per file.
5. Send each file to the goroutine created in step 3.
6. After processing each file in said goroutine, close the file to free up resources.
7. Once each file has been sent to said goroutine, close the channel.
8. Wait until the zipping has been done (which happens sequentially).
9. Once zipping is done (the channel is exhausted), the zip writer should be closed.
10. Only when the zip writer is closed should the output file be closed.
11. Finally, wait on the sync.WaitGroup to tell the calling function that we're good to go. (A channel could also be used here, but sync.WaitGroup seems more elegant.)
When you get the signal from the zip writer that everything is properly closed, you can exit from main and terminate nicely.
This might not answer your question, but I used similar code to generate zip archives on the fly for a web service some time ago. It performed quite well, even though the actual zipping was done in a single goroutine. Overcoming the IO bottleneck can already be an improvement.
From the look of it, you won't be able to parallelise the compression using the standard library archive/zip package because:
Compression is performed by the io.Writer returned by zip.Writer.Create or CreateHeader.
Calling Create/CreateHeader implicitly closes the writer returned by the previous call.
So passing the writers returned by Create to multiple goroutines and writing to them in parallel will not work.
If you wanted to write your own parallel zip writer, you'd probably want to structure it something like this:
1. Have multiple goroutines compress files using the compress/flate package, and keep track of the CRC-32 value and length of the uncompressed data. The output should be directed to temporary files. Note the compressed size of the data.
2. Once everything has been compressed, start writing the zip file, beginning with the header.
3. Write out the file header followed by the contents of the corresponding temporary file for each compressed file.
4. Write out the central directory record and end record at the end of the file. All the required information should be available at this point.
For added parallelism, step 1 could be performed in parallel with the remaining steps by using a channel to indicate when compression of each file completes.
Due to the file format, you won't be able to perform parallel compression without either storing compressed data in memory or in temporary files.
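To make step 1 concrete, here is a sketch of compressing one file to a temporary file with compress/flate while recording the CRC-32 and sizes the zip headers will later need. compressToTemp and compressedEntry are illustrative names I made up, not library APIs:

package main

import (
    "compress/flate"
    "fmt"
    "hash/crc32"
    "io"
    "log"
    "os"
)

// compressedEntry records what the zip local/central headers will need.
type compressedEntry struct {
    Name             string
    CRC32            uint32
    UncompressedSize int64
    CompressedSize   int64
    TmpPath          string
}

// compressToTemp deflates one file into a temporary file, tracking the
// CRC-32 of the uncompressed bytes and both sizes along the way.
func compressToTemp(name string) (*compressedEntry, error) {
    src, err := os.Open(name)
    if err != nil {
        return nil, err
    }
    defer src.Close()

    tmp, err := os.CreateTemp("", "zippart-*")
    if err != nil {
        return nil, err
    }
    defer tmp.Close()

    fw, err := flate.NewWriter(tmp, flate.DefaultCompression)
    if err != nil {
        return nil, err
    }
    crc := crc32.NewIEEE()
    // Tee the uncompressed bytes through the CRC while deflating them.
    n, err := io.Copy(fw, io.TeeReader(src, crc))
    if err != nil {
        return nil, err
    }
    if err := fw.Close(); err != nil {
        return nil, err
    }
    // The current offset is the compressed size, since the file is new.
    compressed, err := tmp.Seek(0, io.SeekCurrent)
    if err != nil {
        return nil, err
    }
    return &compressedEntry{
        Name:             name,
        CRC32:            crc.Sum32(),
        UncompressedSize: n,
        CompressedSize:   compressed,
        TmpPath:          tmp.Name(),
    }, nil
}

func main() {
    for _, name := range os.Args[1:] {
        ent, err := compressToTemp(name)
        if err != nil {
            log.Fatal(err)
        }
        fmt.Printf("%s: crc=%08x in=%d out=%d tmp=%s\n",
            ent.Name, ent.CRC32, ent.UncompressedSize, ent.CompressedSize, ent.TmpPath)
    }
}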
With Go 1.17, parallel compression and merging of zip files are possible using the archive/zip package.
An example is below. In it, I create zip workers that build individual zip files, plus an entry-provider worker that feeds entries to the zip workers through a channel. Actual files could be provided to the zip workers, but I skipped that part.
package main

import (
    "archive/zip"
    "context"
    "fmt"
    "io"
    "log"
    "os"
    "strings"

    "golang.org/x/sync/errgroup"
)

const numOfZipWorkers = 10

type entry struct {
    name string
    rc   io.ReadCloser
}

func main() {
    log.SetFlags(log.LstdFlags | log.Lshortfile)

    entCh := make(chan entry, numOfZipWorkers)
    zpathCh := make(chan string, numOfZipWorkers)

    group, ctx := errgroup.WithContext(context.Background())
    for i := 0; i < numOfZipWorkers; i++ {
        group.Go(func() error {
            return zipWorker(ctx, entCh, zpathCh)
        })
    }
    group.Go(func() error {
        defer close(entCh) // Signal workers to stop.
        return entryProvider(ctx, entCh)
    })

    err := group.Wait()
    if err != nil {
        log.Fatal(err)
    }

    f, err := os.OpenFile("output.zip", os.O_CREATE|os.O_TRUNC|os.O_WRONLY, 0644)
    if err != nil {
        log.Fatal(err)
    }

    zw := zip.NewWriter(f)

    close(zpathCh)
    for path := range zpathCh {
        zrd, err := zip.OpenReader(path)
        if err != nil {
            log.Fatal(err)
        }
        for _, zf := range zrd.File {
            err := zw.Copy(zf)
            if err != nil {
                log.Fatal(err)
            }
        }
        _ = zrd.Close()
        _ = os.Remove(path)
    }

    err = zw.Close()
    if err != nil {
        log.Fatal(err)
    }
    err = f.Close()
    if err != nil {
        log.Fatal(err)
    }
}

func entryProvider(ctx context.Context, entCh chan<- entry) error {
    for i := 0; i < 2*numOfZipWorkers; i++ {
        select {
        case <-ctx.Done():
            return ctx.Err()
        case entCh <- entry{
            name: fmt.Sprintf("file_%d", i+1),
            rc:   io.NopCloser(strings.NewReader(fmt.Sprintf("content %d", i+1))),
        }:
        }
    }
    return nil
}

func zipWorker(ctx context.Context, entCh <-chan entry, zpathch chan<- string) error {
    f, err := os.CreateTemp(".", "tmp-part-*")
    if err != nil {
        return err
    }

    zw := zip.NewWriter(f)

Loop:
    for {
        var (
            ent entry
            ok  bool
        )
        select {
        case <-ctx.Done():
            err = ctx.Err()
            break Loop
        case ent, ok = <-entCh:
            if !ok {
                break Loop
            }
        }

        hdr := &zip.FileHeader{
            Name:   ent.name,
            Method: zip.Deflate, // zip.Store can also be used.
        }
        hdr.SetMode(0644)

        w, e := zw.CreateHeader(hdr)
        if e != nil {
            _ = ent.rc.Close()
            err = e
            break
        }
        _, e = io.Copy(w, ent.rc)
        _ = ent.rc.Close()
        if e != nil {
            err = e
            break
        }
    }

    if e := zw.Close(); e != nil && err == nil {
        err = e
    }
    if e := f.Close(); e != nil && err == nil {
        err = e
    }
    if err == nil {
        select {
        case <-ctx.Done():
            err = ctx.Err()
        case zpathch <- f.Name():
        }
    }
    return err
}
I want to efficiently calculate the checksum of a very large file (multiple GB). This Go program has two approaches: one chunks the file and calculates the checksum (quicksha), but it is not correct; the other, classical approach (slowsha) works well.
Can you help me fix quicksha?
package main

import (
    "bufio"
    "crypto/sha256"
    "encoding/hex"
    "io"
    "log"
    "os"
)

func slowsha(fname string) {
    f, err := os.Open(fname)
    if err != nil {
        log.Fatal(err)
    }
    defer f.Close()

    h := sha256.New()
    if _, err := io.Copy(h, f); err != nil {
        log.Fatal(err)
    }
    log.Printf("%s %s", hex.EncodeToString(h.Sum(nil)), os.Args[1])
}

func quicksha(fname string) {
    f, err := os.Open(fname)
    if err != nil {
        log.Fatal(err)
    }
    defer f.Close()

    buf := make([]byte, 16*1024)
    pr, pw := io.Pipe()
    go func() {
        w := bufio.NewWriter(pw)
        for {
            n, err := f.Read(buf)
            if n > 0 {
                buf = buf[:n]
                w.Write(buf)
            }
            if err == io.EOF {
                pw.Close()
                break
            }
        }
    }()

    h := sha256.New()
    io.Copy(h, pr)
    log.Printf("%s %s", hex.EncodeToString(h.Sum(nil)), os.Args[1])
}

func main() {
    fname := os.Args[2]
    choice := os.Args[1]
    for i := 0; i < 100; i++ {
        if choice == "-s" {
            slowsha(fname)
        } else if choice == "-f" {
            quicksha(fname)
        } else {
            log.Fatal("Bad choice")
        }
    }
}
Output
$ shasum -a 256 lessthan20MBTest.doc    # reference answer
d91b998a372035c2378fc40a6d0eee17b9f16d60207343f9fc3558eb77f90b71  lessthan20MBTest.doc
$ ./quicksha -f lessthan20MBTest.doc    # wrong answer
b97d5167bbe945ca90223b7503653df89ba9e7d420268da27851fca6db3fcdcf  lessthan20MBTest.doc
$ ./quicksha -s lessthan20MBTest.doc    # right answer
d91b998a372035c2378fc40a6d0eee17b9f16d60207343f9fc3558eb77f90b71  lessthan20MBTest.doc
There are several problems in your program:
First: you are already using a buffer for reading/writing, so there is no need for a bufio.Writer; you are double-buffering with it. That also happens to be the reason you don't get the result you want: you have to call w.Flush() before closing the pipe, because whatever is still in the bufio.Writer's buffer has not been written to the pipe:
if err == io.EOF {
    w.Flush()
    pw.Close()
    break
}
Second: you are shrinking your buffer. In general, Read does not have to fill the buffer; if the underlying stream is a network stream, Read may return fewer bytes than the buffer size, and that does not mean the end of the stream has been reached. For files this makes no practical difference, but in general you should do:
if n > 0 {
    w.Write(buf[:n])
}
Third: did you measure? It is unlikely that the "faster" implementation is actually faster. Counting the buffering inside io.Copy, this implementation is triple-buffering.
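Putting both fixes together, quicksha could drop the bufio.Writer entirely and write only the bytes actually read. A sketch (a drop-in replacement for the quicksha in the question, using the same imports minus bufio):

func quicksha(fname string) {
    f, err := os.Open(fname)
    if err != nil {
        log.Fatal(err)
    }
    defer f.Close()

    pr, pw := io.Pipe()
    go func() {
        buf := make([]byte, 16*1024)
        for {
            n, err := f.Read(buf)
            if n > 0 {
                pw.Write(buf[:n]) // write only what was actually read
            }
            if err != nil {
                // io.EOF ends the copy cleanly; other errors propagate to the reader.
                pw.CloseWithError(err)
                break
            }
        }
    }()

    h := sha256.New()
    io.Copy(h, pr)
    log.Printf("%s %s", hex.EncodeToString(h.Sum(nil)), fname)
}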