How to calculate the checksum of a file efficiently - Go

I want to efficiently calculate the checksum of a very large file (multiple GB). The Go program below has two approaches: quicksha chunks the file through a pipe but produces a wrong checksum, while the classical approach slowsha works correctly.
Can you help me fix quicksha?
package main

import (
    "bufio"
    "crypto/sha256"
    "encoding/hex"
    "io"
    "log"
    "os"
)

func slowsha(fname string) {
    f, err := os.Open(fname)
    if err != nil {
        log.Fatal(err)
    }
    defer f.Close()
    h := sha256.New()
    if _, err := io.Copy(h, f); err != nil {
        log.Fatal(err)
    }
    log.Printf("%s %s", hex.EncodeToString(h.Sum(nil)), fname)
}
func quicksha(fname string) {
    f, err := os.Open(fname)
    if err != nil {
        log.Fatal(err)
    }
    defer f.Close()
    buf := make([]byte, 16*1024)
    pr, pw := io.Pipe()
    go func() {
        w := bufio.NewWriter(pw)
        for {
            n, err := f.Read(buf)
            if n > 0 {
                buf = buf[:n]
                w.Write(buf)
            }
            if err == io.EOF {
                pw.Close()
                break
            }
        }
    }()
    h := sha256.New()
    io.Copy(h, pr)
    log.Printf("%s %s", hex.EncodeToString(h.Sum(nil)), fname)
}
func main() {
    fname := os.Args[2]
    choice := os.Args[1]
    for i := 0; i < 100; i++ {
        if choice == "-s" {
            slowsha(fname)
        } else if choice == "-f" {
            quicksha(fname)
        } else {
            log.Fatal("Bad choice")
        }
    }
}
Output
shasum -a 256 lessthan20MBTest.doc        # reference answer
d91b998a372035c2378fc40a6d0eee17b9f16d60207343f9fc3558eb77f90b71 lessthan20MBTest.doc
./quicksha -f lessthan20MBTest.doc        # wrong answer
b97d5167bbe945ca90223b7503653df89ba9e7d420268da27851fca6db3fcdcf lessthan20MBTest.doc
./quicksha -s lessthan20MBTest.doc        # right answer
d91b998a372035c2378fc40a6d0eee17b9f16d60207343f9fc3558eb77f90b71 lessthan20MBTest.doc

There are several problems in your program.
First: you are already using a buffer for reading, so there is no need for a bufio.Writer; that is double-buffering. It also happens to be the reason you don't get the result you want: you have to call w.Flush() before closing the pipe, because whatever is still sitting in the bufio.Writer's buffer has not yet been written to the pipe:
if err == io.EOF {
    w.Flush()
    pw.Close()
    break
}
Second: you are making your buffer shorter. The statement buf = buf[:n] permanently shrinks the slice, so every subsequent Read can fill at most n bytes. In general, Read does not have to fill the whole buffer: if the underlying stream is a network stream, Read may return fewer bytes than the buffer size without the end of the stream having been reached. For files this makes little difference in practice, but in general you should do:
if n > 0 {
    w.Write(buf[:n])
}
Third: did you measure? It is unlikely that the 'faster' implementation is actually faster. Counting the buffering inside io.Copy, you are triple-buffering with this implementation, and the pipe adds goroutine synchronization on top.
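For reference, here is a minimal corrected sketch of quicksha with both fixes applied (slice only for the write, flush before closing the pipe). It produces the right answer, but don't expect it to beat the plain io.Copy in slowsha:

func quicksha(fname string) {
    f, err := os.Open(fname)
    if err != nil {
        log.Fatal(err)
    }
    defer f.Close()
    buf := make([]byte, 16*1024)
    pr, pw := io.Pipe()
    go func() {
        w := bufio.NewWriter(pw)
        for {
            n, err := f.Read(buf)
            if n > 0 {
                w.Write(buf[:n]) // slice only for this write; buf keeps its full length
            }
            if err != nil {
                w.Flush() // push whatever is still buffered into the pipe
                if err == io.EOF {
                    pw.Close()
                } else {
                    pw.CloseWithError(err) // surface real read errors to io.Copy
                }
                break
            }
        }
    }()
    h := sha256.New()
    io.Copy(h, pr)
    log.Printf("%s %s", hex.EncodeToString(h.Sum(nil)), fname)
}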

Related

Ignore a line containing a pattern from a long text file in Go

I'm trying to implement a function that ignores any line containing a pattern from a long text file (ASCII guaranteed) in Go.
The two functions below, withoutIgnore and withIgnore, both take a filename argument and return a *bytes.Buffer, which can subsequently be written to an io.Writer.
The withIgnore function takes an additional argument, pattern, and excludes lines containing that pattern from the output. The function works, but benchmarking shows it to be 5x slower than withoutIgnore. Is there a way it could be improved?
package main

import (
    "bufio"
    "bytes"
    "io"
    "log"
    "os"
)

func withoutIgnore(f string) (*bytes.Buffer, error) {
    rfd, err := os.Open(f)
    if err != nil {
        log.Fatal(err)
    }
    defer func() {
        if err := rfd.Close(); err != nil {
            log.Fatal(err)
        }
    }()
    inputBuffer := make([]byte, 1048576)
    var bytesRead int
    var bs []byte
    opBuffer := bytes.NewBuffer(bs)
    for {
        bytesRead, err = rfd.Read(inputBuffer)
        if err == io.EOF {
            return opBuffer, nil
        }
        if err != nil {
            return nil, err
        }
        _, err = opBuffer.Write(inputBuffer[:bytesRead])
        if err != nil {
            return nil, err
        }
    }
}
func withIgnore(f, pattern string) (*bytes.Buffer, error) {
    rfd, err := os.Open(f)
    if err != nil {
        log.Fatal(err)
    }
    defer func() {
        if err := rfd.Close(); err != nil {
            log.Fatal(err)
        }
    }()
    scanner := bufio.NewScanner(rfd)
    var bs []byte
    buffer := bytes.NewBuffer(bs)
    for scanner.Scan() {
        if !bytes.Contains(scanner.Bytes(), []byte(pattern)) {
            _, err := buffer.WriteString(scanner.Text() + "\n")
            if err != nil {
                return nil, err
            }
        }
    }
    return buffer, nil
}
func main() {
    // buff, err := withoutIgnore("base64dump.log")
    buff, err := withIgnore("base64dump.log", "AUDIT")
    if err != nil {
        log.Fatal(err)
    }
    _, err = buff.WriteTo(os.Stdout)
    if err != nil {
        log.Fatal(err)
    }
}
Benchmark test
package main

import "testing"

func BenchmarkTestWithoutIgnore(b *testing.B) {
    for i := 0; i < b.N; i++ {
        _, err := withoutIgnore("base64dump.log")
        if err != nil {
            b.Fatal(err)
        }
    }
}

func BenchmarkTestWithIgnore(b *testing.B) {
    for i := 0; i < b.N; i++ {
        _, err := withIgnore("base64dump.log", "AUDIT")
        if err != nil {
            b.Fatal(err)
        }
    }
}
and the "base64dump.log" can be generated in the command line using
base64 /dev/urandom | head -c 10000000 > base64dump.log
Since ASCII is guaranteed, one can work directly at the byte level.
Still, if one checks each byte for line breaks while reading the input and then searches for the pattern again within each line, operations are applied to every byte twice.
If, on the other hand, one reads chunks of the input and performs an optimized search for the pattern in the text, not even examining each input byte, the number of operations per input byte is minimized.
For example, there is the Boyer-Moore string search algorithm, and Go's built-in bytes.Index function is also optimized. The achieved speed depends, of course, on the input data and the actual pattern; for the input specified in the question, bytes.Index turned out to be significantly more performant when measured.
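As a quick illustration of the semantics the code below relies on (a minimal sketch with made-up data): bytes.Index returns the byte offset of the first occurrence of the pattern within the slice, or -1 if the pattern does not occur.

chunk := []byte("line one\nAUDIT line two\nline three\n")
hit := bytes.Index(chunk, []byte("AUDIT"))
// hit == 9, the offset of 'A' in the second line; -1 would mean "not found"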
Procedure
- read in a chunk, where the chunk size should be significantly larger than the maximum line length; a value >= 64KB should probably be good (in the test, 1MB was used, as in the question)
- a chunk usually doesn't end at a linefeed, so search backwards from the end of the chunk for the nearest linefeed, limit processing to this slice, and remember the remaining bytes for the next pass
- the last chunk does not necessarily end in a linefeed
- with the help of the performant Go function bytes.Index, find the places where the pattern occurs in the chunk
- from each found location, search for the preceding and the following linefeed
- output the block up to the corresponding beginning of the line
- continue the search from the end of the line in which the pattern occurred
- if the search finds no further occurrence, output the rest
- read the next chunk and apply the described steps again, until the end of the file is reached
Noteworthy
A single read operation may return less data than the chunk size, so it makes sense to repeat the read until a full chunk's worth of data has been read.
Benchmark
Optimized code is often significantly more complicated, but the performance is also significantly better, as we will see in a moment.
BenchmarkTestWithoutIgnore-8 270 4137267 ns/op
BenchmarkTestWithIgnore-8 54 22403931 ns/op
BenchmarkTestFilter-8 150 7947454 ns/op
Here, the optimized code BenchmarkTestFilter-8 is only about 1.9x slower than the operation without filtering while the BenchmarkTestWithIgnore-8 method is 5.4x slower than the comparison value without filtering.
Looked at another way: the optimized code is 2.8 times faster than the unoptimized one.
Code
Of course, here is the code for your own tests:
func filterFile(f, pattern string) (*bytes.Buffer, error) {
    rfd, err := os.Open(f)
    if err != nil {
        log.Fatal(err)
    }
    defer func() {
        if err := rfd.Close(); err != nil {
            log.Fatal(err)
        }
    }()
    reader := bufio.NewReader(rfd)
    return filter(reader, []byte(pattern), 1024*1024)
}

// chunkSize must be larger than the longest line;
// a reasonable size is probably >= 64K
func filter(reader io.Reader, pattern []byte, chunkSize int) (*bytes.Buffer, error) {
    var bs []byte
    buffer := bytes.NewBuffer(bs)
    chunk := make([]byte, chunkSize)
    var remaining []byte
    for lastChunk := false; !lastChunk; {
        n, err := readChunk(reader, chunk, remaining, chunkSize)
        if err != nil {
            if err == io.EOF {
                lastChunk = true
            } else {
                return nil, err
            }
        }
        remaining = remaining[:0]
        if !lastChunk {
            // cut the chunk at its last linefeed and carry the tail over to the next pass
            for i := n - 1; i > 0; i-- {
                if chunk[i] == '\n' {
                    remaining = append(remaining, chunk[i+1:n]...)
                    n = i + 1
                    break
                }
            }
        }
        s := 0
        for s < n {
            hit := bytes.Index(chunk[s:n], pattern)
            if hit < 0 {
                break
            }
            hit += s
            // widen the hit to the enclosing line
            startOfLine := hit
            for ; startOfLine > 0; startOfLine-- {
                if chunk[startOfLine] == '\n' {
                    startOfLine++
                    break
                }
            }
            endOfLine := hit + len(pattern)
            for ; endOfLine < n; endOfLine++ {
                if chunk[endOfLine] == '\n' {
                    break
                }
            }
            endOfLine++
            // copy everything before the matching line, then skip the line itself
            _, err = buffer.Write(chunk[s:startOfLine])
            if err != nil {
                return nil, err
            }
            s = endOfLine
        }
        if s < n {
            _, err = buffer.Write(chunk[s:n])
            if err != nil {
                return nil, err
            }
        }
    }
    return buffer, nil
}

func readChunk(reader io.Reader, chunk, remaining []byte, chunkSize int) (int, error) {
    // start the chunk with the carry-over from the previous pass,
    // then keep reading until the chunk is full or the reader fails
    copy(chunk, remaining)
    r := len(remaining)
    for r < chunkSize {
        n, err := reader.Read(chunk[r:])
        r += n
        if err != nil {
            return r, err
        }
    }
    return r, nil
}
And the benchmark part might look something like this:
func BenchmarkTestFilter(b *testing.B) {
    for i := 0; i < b.N; i++ {
        _, err := filterFile("base64dump.log", "AUDIT")
        if err != nil {
            b.Fatal(err)
        }
    }
}
The filter function was split so that the actual job is done in func filter(reader io.Reader, pattern []byte, chunkSize int) (*bytes.Buffer, error).
By injecting a reader and a chunkSize, the code is already prepared for unit tests, which are missing here but definitely recommended when dealing with index arithmetic like this.
However, the main point here was to find a way to improve performance significantly.
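Because the reader and the chunk size are injectable, such a unit test can drive the index arithmetic with a tiny in-memory input and a deliberately small chunk size. A minimal sketch (the test data and expected output are my own illustration, not from the original post; it belongs in a _test.go file with "strings" and "testing" imported):

func TestFilter(t *testing.T) {
    // chunk size 16 is larger than the longest line (15 bytes), as required
    in := "keep one\ndrop AUDIT two\nkeep three\n"
    buf, err := filter(strings.NewReader(in), []byte("AUDIT"), 16)
    if err != nil {
        t.Fatal(err)
    }
    if got, want := buf.String(), "keep one\nkeep three\n"; got != want {
        t.Errorf("got %q, want %q", got, want)
    }
}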

Bufio scan function is waiting for sometime for the last buffer (tailed operation) in GO

I am new to Go, so please review the code and suggest any changes required.
The problem statement goes as follows:
We have a file whose contents are binary and encrypted. The only way to read those contents is with a custom utility (named decode_it). The command just accepts a filename, like below:
decode_it filename.d
Now what I have to do is live-monitor the output of the decode_it utility in Go. I have written code which mostly works, but somehow it is not able to process the latest tailed output: it waits for some time before reading the latest chunk, until more data comes in. s.Scan() is the call which does not return the latest additions to the utility's output. I have another terminal open side by side, so I know whether a line was appended or not. The Scan() function only returns once another chunk is appended at the end.
Please help. Suggest any changes required, and if possible suggest an alternative approach.
Output of the utility (these lines are huge in number and keep arriving):
1589261318 493023 8=DECODE|9=59|10=053|34=1991|35=0|49=TEST|52=20200512-05:28:38|56=TEST|57=ADMIN|
1589261368 538427 8=DECODE|9=59|10=054|34=1992|35=0|49=TEST|52=20200512-05:29:28|56=TEST|57=ADMIN|
1589261418 579765 8=DECODE|9=59|10=046|34=1993|35=0|49=TEST|52=20200512-05:30:18|56=TEST|57=ADMIN|
1589261468 627052 8=DECODE|9=59|10=047|34=1994|35=0|49=TEST|52=20200512-05:31:08|56=TEST|57=ADMIN|
1589261518 680570 8=DECODE|9=59|10=053|34=1995|35=0|49=TEST|52=20200512-05:31:58|56=TEST|57=ADMIN|
1589261568 722516 8=DECODE|9=59|10=054|34=1996|35=0|49=TEST|52=20200512-05:32:48|56=TEST|57=ADMIN|
1589261618 766070 8=DECODE|9=59|10=055|34=1997|35=0|49=TEST|52=20200512-05:33:38|56=TEST|57=ADMIN|
1589261668 807964 8=DECODE|9=59|10=056|34=1998|35=0|49=TEST|52=20200512-05:34:28|56=TEST|57=ADMIN|
1589261718 853464 8=DECODE|9=59|10=057|34=1999|35=0|49=TEST|52=20200512-05:35:18|56=TEST|57=ADMIN|
1589261768 898758 8=DECODE|9=59|10=031|34=2000|35=0|49=TEST|52=20200512-05:36:08|56=TEST|57=ADMIN|
1589261818 948236 8=DECODE|9=59|10=037|34=2001|35=0|49=TEST|52=20200512-05:36:58|56=TEST|57=ADMIN|
1589261868 995181 8=DECODE|9=59|10=038|34=2002|35=0|49=TEST|52=20200512-05:37:48|56=TEST|57=ADMIN|
1589261918 36727 8=DECODE|9=59|10=039|34=2003|35=0|49=TEST|52=20200512-05:38:38|56=TEST|57=ADMIN|
1589261968 91253 8=DECODE|9=59|10=040|34=2004|35=0|49=TEST|52=20200512-05:39:28|56=TEST|57=ADMIN|
1589262018 129336 8=DECODE|9=59|10=032|34=2005|35=0|49=TEST|52=20200512-05:40:18|56=TEST|57=ADMIN|
1589262068 173247 8=DECODE|9=59|10=033|34=2006|35=0|49=TEST|52=20200512-05:41:08|56=TEST|57=ADMIN|
1589262118 214993 8=DECODE|9=59|10=039|34=2007|35=0|49=TEST|52=20200512-05:41:58|56=TEST|57=ADMIN|
1589262168 256754 8=DECODE|9=59|10=040|34=2008|35=0|49=TEST|52=20200512-05:42:48|56=TEST|57=ADMIN|
1589262218 299908 8=DECODE|9=59|10=041|34=2009|35=0|49=TEST|52=20200512-05:43:38|56=TEST|57=ADMIN|
1589262268 345560 8=DECODE|9=59|10=033|34=2010|35=0|49=TEST|52=20200512-05:44:28|56=TEST|57=ADMIN|
1589262318 392894 8=DECODE|9=59|10=034|34=2011|35=0|49=TEST|52=20200512-05:45:18|56=TEST|57=ADMIN|
1589262368 439936 8=DECODE|9=59|10=035|34=2012|35=0|49=TEST|52=20200512-05:46:08|56=TEST|57=ADMIN|
1589262418 484959 8=DECODE|9=59|10=041|34=2013|35=0|49=TEST|52=20200512-05:46:58|56=TEST|57=ADMIN|
1589262468 531136 8=DECODE|9=59|10=042|34=2014|35=0|49=TEST|52=20200512-05:47:48|56=TEST|57=ADMIN|
1589262518 577190 8=DECODE|9=59|10=043|34=2015|35=0|49=TEST|52=20200512-05:48:38|56=TEST|57=ADMIN|
1589262568 621673 8=DECODE|9=59|10=044|34=2016|35=0|49=TEST|52=20200512-05:49:28|56=TEST|57=ADMIN|
1589262618 661569 8=DECODE|9=59|10=036|34=2017|35=0|49=TEST|52=20200512-05:50:18|56=TEST|57=ADMIN|
1589262668 704912 8=DECODE|9=59|10=037|34=2018|35=0|49=TEST|52=20200512-05:51:08|56=TEST|57=ADMIN|
1589262718 751844 8=DECODE|9=59|10=043|34=2019|35=0|49=TEST|52=20200512-05:51:58|56=TEST|57=ADMIN|
1589262768 792980 8=DECODE|9=59|10=035|34=2020|35=0|49=TEST|52=20200512-05:52:48|56=TEST|57=ADMIN|
1589262818 840365 8=DECODE|9=59|10=036|34=2021|35=0|49=TEST|52=20200512-05:53:38|56=TEST|57=ADMIN|
1589262868 879185 8=DECODE|9=59|10=037|34=2022|35=0|49=TEST|52=20200512-05:54:28|56=TEST|57=ADMIN|
1589262918 925163 8=DECODE|9=59|10=038|34=2023|35=0|49=TEST|52=20200512-05:55:18|56=TEST|57=ADMIN|
1589262968 961584 8=DECODE|9=59|10=039|34=2024|35=0|49=TEST|52=20200512-05:56:08|56=TEST|57=ADMIN|
1589263018 10120 8=DECODE|9=59|10=045|34=2025|35=0|49=TEST|52=20200512-05:56:58|56=TEST|57=ADMIN|
1589263068 53127 8=DECODE|9=59|10=046|34=2026|35=0|49=TEST|52=20200512-05:57:48|56=TEST|57=ADMIN|
1589263118 92960 8=DECODE|9=59|10=047|34=2027|35=0|49=TEST|52=20200512-05:58:38|56=TEST|57=ADMIN|
1589263168 134768 8=DECODE|9=59|10=048|34=2028|35=0|49=TEST|52=20200512-05:59:28|56=TEST|57=ADMIN|
1589263218 180362 8=DECODE|9=59|10=035|34=2029|35=0|49=TEST|52=20200512-06:00:18|56=TEST|57=ADMIN|
1589263268 220070 8=DECODE|9=59|10=027|34=2030|35=0|49=TEST|52=20200512-06:01:08|56=TEST|57=ADMIN|
1589263318 269426 8=DECODE|9=59|10=033|34=2031|35=0|49=TEST|52=20200512-06:01:58|56=TEST|57=ADMIN|
1589263368 309432 8=DECODE|9=59|10=034|34=2032|35=0|49=TEST|52=20200512-06:02:48|56=TEST|57=ADMIN|
1589263418 356561 8=DECODE|9=59|10=035|34=2033|35=0|49=TEST|52=20200512-06:03:38|56=TEST|57=ADMIN|
Code -
package main

import (
    "bufio"
    "bytes"
    "fmt"
    "io"
    "log"
    "os/exec"
)

// dropCRLR drops a terminal \r from the data.
func dropCRLR(data []byte) []byte {
    if len(data) > 0 && data[len(data)-1] == '\r' {
        return data[0 : len(data)-1]
    }
    return data
}

func newLineSplitFunc(data []byte, atEOF bool) (advance int, token []byte, err error) {
    if atEOF && len(data) == 0 {
        return 0, nil, nil
    }
    if i := bytes.IndexByte(data, '\n'); i >= 0 {
        // We have a full newline-terminated line.
        return i + 1, dropCRLR(data[0:i]), nil
    }
    // If we're at EOF, we have a final, non-terminated line. Return it.
    if atEOF {
        return len(data), dropCRLR(data), nil
    }
    // Request more data.
    return 0, nil, nil
}

func main() {
    cmd := exec.Command("decode_it", "filename.d", "4", "1")
    var out io.Reader
    {
        stdout, err := cmd.StdoutPipe()
        if err != nil {
            log.Fatal(err)
        }
        stderr, err := cmd.StderrPipe()
        if err != nil {
            log.Fatal(err)
        }
        out = io.MultiReader(stdout, stderr)
    }
    if err := cmd.Start(); err != nil {
        log.Fatal(err)
    }
    // Make a new channel which will be used to ensure we get all output
    done := make(chan struct{})
    go func() {
        // defer cmd.Process.Kill()
        s := bufio.NewScanner(out)
        s.Split(newLineSplitFunc)
        for s.Scan() {
            fmt.Println("---- " + s.Text())
        }
        if s.Err() != nil {
            fmt.Printf("error: %s\n", s.Err())
        }
    }()
    // Wait for all output to be processed
    <-done
    // Wait for the command to finish
    if err := cmd.Wait(); err != nil {
        fmt.Println("Error: " + err.Error())
    }
    // if out closes, cmd closed.
    log.Println("all done")
}
Also, since Scan() takes a lot of time, the program ends up waiting in a loop that I am not able to break out of. Please help with that too.
Try something like this one; I fixed some issues and made it simpler:
package main

import (
    "bufio"
    "fmt"
    "io"
    "log"
    "os/exec"
)

func main() {
    var err error
    // change to your command
    cmd := exec.Command("sh", "test.sh")
    var out io.Reader
    {
        var stdout, stderr io.ReadCloser
        stdout, err = cmd.StdoutPipe()
        if err != nil {
            log.Fatal(err)
        }
        stderr, err = cmd.StderrPipe()
        if err != nil {
            log.Fatal(err)
        }
        out = io.MultiReader(stdout, stderr)
    }
    if err = cmd.Start(); err != nil {
        log.Fatal(err)
    }
    scanner := bufio.NewScanner(out)
    for scanner.Scan() {
        fmt.Println("---- " + scanner.Text())
    }
    if err = scanner.Err(); err != nil {
        fmt.Printf("error: %v\n", err)
    }
    log.Println("all done")
}
The test.sh that I used in the test:
#!/bin/bash
while [[ 1 = 1 ]]; do
echo 1
sleep 1
done
:)
I resolved the above issue using stdbuf:
cmd := exec.Command("stdbuf", "-o0", "-e0", "decode_it", FILEPATH, "4", "1")
Reference link - STDIO Buffering
When a program writes to a terminal, stdio uses line buffering; when it writes to anything else (such as a pipe), it uses fully buffered mode. Go's exec.Command connects the child's output to a pipe, so the child ends up fully buffered, and stdbuf with -o0 -e0 forces it to run unbuffered.
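An alternative worth noting (my own suggestion, not part of the original answer): run the child under a pseudo-terminal, so its stdio believes it is talking to a terminal and line-buffers by itself. A minimal sketch using the third-party github.com/creack/pty package:

package main

import (
    "bufio"
    "fmt"
    "log"
    "os/exec"

    "github.com/creack/pty" // third-party package: go get github.com/creack/pty
)

func main() {
    cmd := exec.Command("decode_it", "filename.d", "4", "1")
    // pty.Start runs cmd with its stdout/stderr attached to a pseudo-terminal
    f, err := pty.Start(cmd)
    if err != nil {
        log.Fatal(err)
    }
    defer f.Close()

    scanner := bufio.NewScanner(f)
    for scanner.Scan() {
        fmt.Println("---- " + scanner.Text()) // lines arrive as soon as the child prints them
    }
}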

Count lines via bufio

I'm utilizing bufio to run a for loop over each line in a text file. I have no idea how to count the number of lines, though.
scanner := bufio.NewScanner(bufio.NewReader(file))
The above is what I use to scan my file.
You could do something like this:
counter := 0
for scanner.Scan() {
    line := scanner.Text()
    counter++
    // do something with your line
}
fmt.Printf("Lines read: %d", counter)
Keep it simple and fast. No need for buffering, scanner already does that. Don't do unnecessary string conversions. For example,
package main

import (
    "bufio"
    "fmt"
    "os"
)

func lineCount(filename string) (int64, error) {
    lc := int64(0)
    f, err := os.Open(filename)
    if err != nil {
        return 0, err
    }
    defer f.Close()
    s := bufio.NewScanner(f)
    for s.Scan() {
        lc++
    }
    return lc, s.Err()
}

func main() {
    filename := `testfile`
    lc, err := lineCount(filename)
    if err != nil {
        fmt.Println(err)
        return
    }
    fmt.Println(filename+" line count:", lc)
}
As I commented, the accepted answer fails on long lines. The default limit is bufio.MaxScanTokenSize, which is 64KiB, so if a line is longer than 65536 bytes it fails silently: Scan() just returns false, and you only see the bufio.ErrTooLong error if you check scanner.Err(). You've got two options.
Call scanner.Buffer() and supply a sufficient max parameter. The buffer you pass in may be small, because Scanner is smart enough to allocate new ones; the catch is that if you don't know the total size beforehand, as with a vanilla Reader, and you've got huge lines, memory consumption grows correspondingly, since Scanner holds the entire current line.
Recreate the scanner in an outer loop; this ensures that you keep advancing past over-long lines:
var scanner *bufio.Scanner
counter := 0
for scanner == nil || scanner.Err() == bufio.ErrTooLong {
    scanner = bufio.NewScanner(reader)
    for scanner.Scan() {
        counter++
    }
}
The problem with (2) is that you keep allocating and deallocating buffers instead of reusing them. So let's fuse (1) and (2):
var scanner *bufio.Scanner
buffer := make([]byte, bufio.MaxScanTokenSize)
counter := 0
for scanner == nil || scanner.Err() == bufio.ErrTooLong {
    scanner = bufio.NewScanner(reader)
    scanner.Buffer(buffer, 0)
    for scanner.Scan() {
        counter++
    }
}
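A self-contained way to convince yourself that the fused version survives over-long lines (the synthetic 200000-byte line is my own test data, not from the original answer):

package main

import (
    "bufio"
    "fmt"
    "strings"
)

func main() {
    // three lines; the middle one is far beyond the 64KiB default token limit
    input := "short\n" + strings.Repeat("x", 200000) + "\nshort again\n"
    reader := strings.NewReader(input)

    var scanner *bufio.Scanner
    buffer := make([]byte, bufio.MaxScanTokenSize)
    counter := 0
    for scanner == nil || scanner.Err() == bufio.ErrTooLong {
        scanner = bufio.NewScanner(reader)
        scanner.Buffer(buffer, 0)
        for scanner.Scan() {
            counter++
        }
    }
    // prints 3: the two short lines plus the surviving tail of the long line;
    // the 64KiB segments discarded on each restart are not counted again
    fmt.Println(counter)
}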
Here is my approach to do the task:
inputFile, err := os.Open("input.txt")
if err != nil {
    panic("Error happened while opening the file. Please check if the file exists!")
}
defer inputFile.Close()
inputReader := bufio.NewReader(inputFile)
scanner := bufio.NewScanner(inputReader)

// Count the lines.
count := 0
for scanner.Scan() {
    line := scanner.Text()
    fmt.Printf("%v\n", line)
    count++
}
if err := scanner.Err(); err != nil {
    fmt.Fprintln(os.Stderr, "reading input:", err)
}
fmt.Printf("%d\n", count)

How to stop io.CopyN

I have some code that copies from a file to a TCP socket (like an FTP server) and want to be able to abort this copy if needed.
I'm just using io.CopyN(socket, file, size) and can't see a way to signal an abort. Any ideas?
How about just closing the input file? io.CopyN will then return an error and abort.
Here is a demonstration (if not running on Linux, change /dev/zero and /dev/null to your OS equivalents!):
package main

import (
    "fmt"
    "io"
    "log"
    "os"
    "time"
)

func main() {
    in, err := os.Open("/dev/zero")
    if err != nil {
        log.Fatal(err)
    }
    out, err := os.Create("/dev/null")
    if err != nil {
        log.Fatal(err)
    }
    go func() {
        time.Sleep(time.Second)
        in.Close()
    }()
    written, err := io.CopyN(out, in, 1e12)
    fmt.Printf("%d bytes written with error %s\n", written, err)
}
When run it will print something like
9756147712 bytes written with error read /dev/zero: bad file descriptor
CopyN tries hard to copy N bytes. If you want to optionally copy fewer than N bytes, then don't use CopyN in the first place. I would probably adapt the original code to something like this (untested code):
func copyUpToN(dst io.Writer, src io.Reader, n int64, signal chan int) (written int64, err error) {
    buf := make([]byte, 32*1024)
    for written < n {
        // non-blocking check: abort as soon as the signal channel is readable (or closed)
        select {
        case <-signal:
            return 0, fmt.Errorf("Aborted") // or whatever
        default:
        }
        l := len(buf)
        if d := n - written; d < int64(l) {
            l = int(d)
        }
        nr, er := src.Read(buf[0:l])
        if nr > 0 {
            nw, ew := dst.Write(buf[0:nr])
            if nw > 0 {
                written += int64(nw)
            }
            if ew != nil {
                err = ew
                break
            }
            if nr != nw {
                err = io.ErrShortWrite
                break
            }
        }
        if er != nil {
            err = er
            break
        }
    }
    return written, err
}
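Wiring up the abort might look like this (a sketch; socket, file, and size stand in for the question's own variables): closing the channel makes the <-signal case fire on the next loop iteration.

signal := make(chan int)

// abort from elsewhere, e.g. a control goroutine or an FTP ABOR handler
go func() {
    time.Sleep(time.Second)
    close(signal) // a closed channel is always ready to receive from
}()

written, err := copyUpToN(socket, file, size, signal)
log.Printf("%d bytes written, err: %v", written, err)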

Golang, is there a better way to read a file of integers into an array?

I need to read a file of integers into an array. I have it working with this:
package main

import (
    "fmt"
    "io"
    "os"
)

func readFile(filePath string) (numbers []int) {
    fd, err := os.Open(filePath)
    if err != nil {
        panic(fmt.Sprintf("open %s: %v", filePath, err))
    }
    var line int
    for {
        _, err := fmt.Fscanf(fd, "%d\n", &line)
        if err != nil {
            fmt.Println(err)
            if err == io.EOF {
                return
            }
            panic(fmt.Sprintf("Scan Failed %s: %v", filePath, err))
        }
        numbers = append(numbers, line)
    }
}
func main() {
numbers := readFile("numbers.txt")
fmt.Println(len(numbers))
}
The file numbers.txt is just:
1
2
3
...
readFile() seems too long (maybe because of the error handling).
Is there a shorter / more Go-idiomatic way to load a file?
Using a bufio.Scanner makes things nice. I've also used an io.Reader rather than taking a filename; that's often a good technique, since it allows the code to be used with any file-like object, not just a file on disk. Here it's "reading" from a string.
package main

import (
    "bufio"
    "fmt"
    "io"
    "strconv"
    "strings"
)

// ReadInts reads whitespace-separated ints from r. If there's an error, it
// returns the ints successfully read so far as well as the error value.
func ReadInts(r io.Reader) ([]int, error) {
    scanner := bufio.NewScanner(r)
    scanner.Split(bufio.ScanWords)
    var result []int
    for scanner.Scan() {
        x, err := strconv.Atoi(scanner.Text())
        if err != nil {
            return result, err
        }
        result = append(result, x)
    }
    return result, scanner.Err()
}

func main() {
    tf := "1\n2\n3\n4\n5\n6"
    ints, err := ReadInts(strings.NewReader(tf))
    fmt.Println(ints, err)
}
I would do it like this:
package main

import (
    "fmt"
    "io/ioutil"
    "strconv"
    "strings"
)

// It would be better for such a function to return an error,
// instead of handling it on its own.
func readFile(fname string) (nums []int, err error) {
    b, err := ioutil.ReadFile(fname)
    if err != nil {
        return nil, err
    }
    lines := strings.Split(string(b), "\n")
    // Assign cap to avoid resize on every append.
    nums = make([]int, 0, len(lines))
    for _, l := range lines {
        // Empty line occurs at the end of the file when we use Split.
        if len(l) == 0 {
            continue
        }
        // Atoi better suits the job when we know exactly what we're dealing
        // with. Scanf is the more general option.
        n, err := strconv.Atoi(l)
        if err != nil {
            return nil, err
        }
        nums = append(nums, n)
    }
    return nums, nil
}

func main() {
    nums, err := readFile("numbers.txt")
    if err != nil {
        panic(err)
    }
    fmt.Println(len(nums))
}
Your solution with fmt.Fscanf is fine. There are certainly a number of other ways to do it though, depending on your situation. Mostafa's technique is one I use a lot (although I might allocate the result all at once with make. Oops! Scratch that, he did.), but for ultimate control you should learn bufio.ReadLine. See go readline -> string for some example code.
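For a taste of that level of control, here is a minimal sketch using bufio.ReadLine under the same one-integer-per-line assumption (my own illustration, not from the answer above; note ReadLine hands back over-long lines in pieces, flagged by isPrefix):

func readInts(r io.Reader) ([]int, error) {
    br := bufio.NewReader(r)
    var nums []int
    var partial []byte
    for {
        chunk, isPrefix, err := br.ReadLine()
        partial = append(partial, chunk...)
        if isPrefix {
            continue // line longer than the reader's buffer; keep accumulating
        }
        if len(partial) > 0 {
            n, convErr := strconv.Atoi(string(partial))
            if convErr != nil {
                return nums, convErr
            }
            nums = append(nums, n)
            partial = partial[:0]
        }
        if err == io.EOF {
            return nums, nil
        }
        if err != nil {
            return nums, err
        }
    }
}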
