Given an io.ReadCloser (from the response of an HTTP request, for example), what is the most efficient way, both in memory overhead and code readability, to stream the response to a file?
io.Copy is undoubtedly the most efficient in terms of code; you only need to
outFile, err := os.Create(filename)
// handle err
defer outFile.Close()
_, err = io.Copy(outFile, res.Body)
// handle err
It's also likely to be pretty efficient in terms of CPU and memory. You can peek at the implementation of io.Copy if you want; assuming that the body doesn't implement WriteTo and the file doesn't implement ReadFrom (a quick glance says that they don't), Copy will copy chunks of up to 32kB at a time. A bigger chunk would probably use a bit less CPU but more memory; the value they picked seems like a good tradeoff.
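If you ever do want to pick the chunk size yourself, io.CopyBuffer lets you supply your own buffer. A minimal sketch (the 128 KiB size is arbitrary; note that the buffer is ignored if the source implements WriterTo or the destination implements ReaderFrom, because Copy then delegates to those methods):
outFile, err := os.Create(filename)
// handle err
defer outFile.Close()

buf := make([]byte, 128*1024) // chunk size of your choosing
_, err = io.CopyBuffer(outFile, res.Body, buf)
// handle err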
Another option is File.ReadFrom:
package main

import (
	"net/http"
	"os"
)

func main() {
	r, e := http.Get("https://stackoverflow.com")
	if e != nil {
		panic(e)
	}
	defer r.Body.Close()

	f, e := os.Create("index.html")
	if e != nil {
		panic(e)
	}
	defer f.Close()

	if _, e := f.ReadFrom(r.Body); e != nil {
		panic(e)
	}
}
https://golang.org/pkg/os#File.ReadFrom
Related
I am copying a network stream to a file using io.Copy. I would like to extract the current speed, preferably in bytes per second, that the transfer is operating at.
res, err := http.Get(url)
if err != nil {
	panic(err)
}

// Open output file
out, err := os.OpenFile("output", os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0644)
if err != nil {
	panic(err)
}

// Close output file as well as body
defer out.Close()
defer func(Body io.ReadCloser) {
	err := Body.Close()
	if err != nil {
		panic(err)
	}
}(res.Body)

_, err = io.Copy(out, res.Body)
As noted in the comments - the entire transfer rate is easily computed after the fact - especially when using io.Copy. If you want to track "live" transfer rates - and poll the results over a long file transfer - then a little more work is involved.
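For that simple after-the-fact case, something like this is enough (a sketch; the variable names follow the code above):
start := time.Now()
n, err := io.Copy(out, res.Body)
// handle err
elapsed := time.Since(start)
log.Printf("%d bytes in %v (%.0f b/s)", n, elapsed, float64(n)/elapsed.Seconds())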
Below I've outlined a simple io.Reader wrapper to track the overall transfer rate. For brevity, it is not goroutine-safe, but it would be trivial to make it so; one could then poll the progress from another goroutine while the main goroutine does the reading.
You can create an io.Reader wrapper and use it to record the moment of the first read and the running count of bytes read since then. The final result may look like this:
r := NewRater(resp.Body) // io.Reader wrapper
n, err := io.Copy(out, r)
log.Print(r) // stringer method shows human readable "b/s" output
To implement this, one approach:
type rate struct {
	r          io.Reader
	count      int64 // may have large (2GB+) files - so don't use int
	start, end time.Time
}

func NewRater(r io.Reader) *rate { return &rate{r: r} }
Then we need the wrapper's Read method to track the underlying io.Reader's progress:
func (r *rate) Read(b []byte) (n int, err error) {
	if r.start.IsZero() {
		r.start = time.Now()
	}
	n, err = r.r.Read(b) // underlying io.Reader read
	r.count += int64(n)
	if err == io.EOF {
		r.end = time.Now()
	}
	return
}
The rate at any time can be polled like so, even before EOF:
func (r *rate) Rate() (n int64, d time.Duration) {
	end := r.end
	if end.IsZero() {
		end = time.Now()
	}
	return r.count, end.Sub(r.start)
}
and a simple Stringer method to show b/s:
func (r *rate) String() string {
	n, d := r.Rate()
	return fmt.Sprintf("%.0f b/s", float64(n)/d.Seconds())
}
Note: the above io.Reader wrapper has no locking in place, so all operations must happen from the same goroutine. Since the question relates to io.Copy, this is a safe assumption to make.
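If you later do want to poll from another goroutine, one option (an untested sketch; safeRate is just an illustrative name) is to guard the fields with a mutex:
// needs "io", "sync", and "time" imported
type safeRate struct {
	mu         sync.Mutex
	r          io.Reader
	count      int64
	start, end time.Time
}

func (r *safeRate) Read(b []byte) (n int, err error) {
	r.mu.Lock()
	if r.start.IsZero() {
		r.start = time.Now()
	}
	r.mu.Unlock()
	n, err = r.r.Read(b) // the underlying read happens outside the lock
	r.mu.Lock()
	r.count += int64(n)
	if err == io.EOF {
		r.end = time.Now()
	}
	r.mu.Unlock()
	return
}

func (r *safeRate) Rate() (int64, time.Duration) {
	r.mu.Lock()
	defer r.mu.Unlock()
	end := r.end
	if end.IsZero() {
		end = time.Now()
	}
	return r.count, end.Sub(r.start)
}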
I need my program to sit in the middle of a connection and transfer data correctly in both directions. I wrote this code, but it does not work properly.
package main

import (
	"fmt"
	"net"
)

func main() {
	listener, err := net.Listen("tcp", ":8120")
	if err != nil {
		fmt.Println(err)
		return
	}
	defer listener.Close()
	fmt.Println("Server is listening...")
	for {
		var conn1, conn2 net.Conn
		var err error
		conn1, err = listener.Accept()
		if err != nil {
			fmt.Println(err)
			conn1.Close()
			continue
		}
		conn2, err = net.Dial("tcp", "185.151.245.51:80")
		if err != nil {
			fmt.Println(err)
			conn2.Close()
			continue
		}
		go handleConnection(conn1, conn2)
		go handleConnection(conn2, conn1)
	}
}

func handleConnection(conn1, conn2 net.Conn) {
	defer conn1.Close()
	for {
		input := make([]byte, 1024)
		n, err := conn1.Read(input)
		if n == 0 || err != nil {
			break
		}
		conn2.Write([]byte(input))
	}
}
The problem is that the data is corrupted. Comparing the original file with the one that comes out the other side, the end of the received file is unreadable, but the beginning is fine.
I tried changing the input slice size. If the size is > 0 and < 8, everything is fine but slow; if I set the size very large, the corruption gets worse.
What am I doing wrong?
In handleConnection, you always write 1024 bytes, no matter what conn1.Read returns.
You want to write the data like this:
conn2.Write(input[:n])
You should also check your top-level for loop. Are you sure you're not accepting multiple connections and smushing them all together? I'd sprinkle in some log statements so you can see when connections are made and closed.
Another (probably inconsequential) mistake is that you treat n==0 as a termination condition. The documentation of io.Reader recommends that you ignore n==0, err==nil. Without checking the code I can't be sure, but I expect that conn.Read never returns n==0, err==nil, so it's unlikely that this is causing you trouble.
Although it doesn't affect correctness, you could also lift the definition of input out of the loop so that it's reused on each iteration; it's likely to reduce the amount of work the garbage collector has to do.
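Putting those fixes together, handleConnection can be reduced to io.Copy, which handles partial reads, termination, and buffer reuse for you. A sketch, keeping the original argument order (read from the first connection, write to the second):
// needs "io" and "log" in the imports
func handleConnection(src, dst net.Conn) {
	defer src.Close()
	// io.Copy writes exactly what it reads and stops on EOF or error,
	// reusing one internal buffer instead of allocating per iteration.
	if _, err := io.Copy(dst, src); err != nil {
		log.Println(err)
	}
}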
Golang bufio.Writer Flush() does not write small data when using a big buffer size (for example 4096 (the standard size) * 2)
package main

import (
	"bufio"
	"log"
	"os"
)

func main() {
	file, err := os.Create("test")
	defer file.Close()

	w := bufio.NewWriter(file)
	w = bufio.NewWriterSize(
		w,
		4096*2,
	)

	bytesAvailable := w.Available()
	log.Printf("Available %v\n", bytesAvailable)

	bw, _ := w.Write(
		[]byte("A"),
	)
	log.Printf("written bytes: %v\n", bw)

	bytesAvailable = w.Available()
	log.Printf("Available: %v\n", bytesAvailable)

	buf := w.Buffered()
	log.Printf("buffered: %d\n", buf)

	err = w.Flush()
	if err != nil {
		log.Fatal(err)
	}
}
When I use the standard size or I write more data it works as expected.
The problem is that the application has two layers of bufio writers:
w := bufio.NewWriter(file)
w = bufio.NewWriterSize(
	w,
	4096*2,
)
One bufio.Writer wraps the other. The application flushes the outer bufio.Writer, but there's no code that flushes the inner bufio.Writer. Change the code to use a single bufio.Writer and the program will work as expected.
w := bufio.NewWriterSize(
	file,
	4096*2,
)
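If you really do need two layers for some reason, keep a reference to each writer and flush both, outer first (a sketch):
inner := bufio.NewWriter(file)
outer := bufio.NewWriterSize(inner, 4096*2)

if _, err := outer.Write([]byte("A")); err != nil {
	log.Fatal(err)
}
if err := outer.Flush(); err != nil { // moves the buffered byte into inner
	log.Fatal(err)
}
if err := inner.Flush(); err != nil { // moves it on to the file
	log.Fatal(err)
}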
I believe you're missing the Sync() call to the file pointer, so the file is not written to the file system; Flush() will just pass the buffer to the file writer, but then you need to push the file to the disk.
See src/os/file_posix.go#L120 and pkg/bufio/#Writer.Flush
I am trying to make a program for checking file duplicates based on MD5 checksums.
I'm not really sure whether I am missing something or not, but this function, when reading the Xcode installer app (which is about 8GB), uses 16GB of RAM.
func search() {
	unique := make(map[string]string)
	files, err := ioutil.ReadDir(".")
	if err != nil {
		log.Println(err)
	}
	for _, file := range files {
		fileName := file.Name()
		fmt.Println("CHECKING:", fileName)
		fi, err := os.Stat(fileName)
		if err != nil {
			fmt.Println(err)
			continue
		}
		if fi.Mode().IsRegular() {
			data, err := ioutil.ReadFile(fileName)
			if err != nil {
				fmt.Println(err)
				continue
			}
			sum := md5.Sum(data)
			hexDigest := hex.EncodeToString(sum[:])
			if _, ok := unique[hexDigest]; ok == false {
				unique[hexDigest] = fileName
			} else {
				fmt.Println("DUPLICATE:", fileName)
			}
		}
	}
}
As far as I can tell from debugging, the issue is with the file reading.
Is there a better approach to do that?
Thanks.
There is an example in the Go documentation that covers your case.
package main

import (
	"crypto/md5"
	"fmt"
	"io"
	"log"
	"os"
)

func main() {
	f, err := os.Open("file.txt")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	h := md5.New()
	if _, err := io.Copy(h, f); err != nil {
		log.Fatal(err)
	}

	fmt.Printf("%x", h.Sum(nil))
}
For your case, just make sure to close the files in the loop and not defer them. Or put the logic into a function.
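For example, the hashing could be factored out like this (a sketch; hashFile is just an illustrative name), so the deferred Close runs as soon as each file is done rather than at the end of the whole loop:
// needs "crypto/md5", "encoding/hex", "io", and "os"
func hashFile(name string) (string, error) {
	f, err := os.Open(name)
	if err != nil {
		return "", err
	}
	defer f.Close() // runs when this helper returns, not when search() ends

	h := md5.New()
	if _, err := io.Copy(h, f); err != nil {
		return "", err
	}
	return hex.EncodeToString(h.Sum(nil)), nil
}
In the search loop, the ioutil.ReadFile/md5.Sum block then becomes a single call to hashFile(fileName).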
Sounds like the 16GB RAM is your problem, not speed per se.
Don't read the entire file into a variable with ReadFile; io.Copy from the Reader that Open gives you to the Writer that crypto/md5 provides (md5.New returns a hash.Hash, which embeds an io.Writer). That only copies a little bit at a time instead of pulling all of the file into RAM.
This is a trick useful in a lot of places in Go; packages like text/template, compress/gzip, net/http, etc. work in terms of Readers and Writers. With them, you don't usually need to create huge []bytes or strings; you can hook I/O interfaces up to each other and let them pass around pieces of content for you. In a garbage collected language, saving memory tends to save you CPU work as well.
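As a small illustration of that pattern (not specific to this question), compress/gzip can stream a file into a compressed copy without ever holding the whole file in memory:
// needs "compress/gzip", "io", "log", and "os"
src, err := os.Open("input.dat")
if err != nil {
	log.Fatal(err)
}
defer src.Close()

dst, err := os.Create("input.dat.gz")
if err != nil {
	log.Fatal(err)
}
defer dst.Close()

zw := gzip.NewWriter(dst)
if _, err := io.Copy(zw, src); err != nil { // streams chunk by chunk
	log.Fatal(err)
}
if err := zw.Close(); err != nil { // flushes remaining data and the gzip footer
	log.Fatal(err)
}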
I am trying to parse a file that annoyingly consists of many separately zipped segments. I have parsed these segments one at a time into a slice of bytes, and I want to uncompress them as I go.
Here is my current code that does the decompressing, which doesn't work. from and to are just set at the top as an example; in reality they are set by the code. data is the byte slice containing the entire file. I don't want to seek the file while it's on disk because it is located on another server, so it's only realistic for me to load the entire file into a []byte first and then parse it.
from, to := 0, 1000
b := bytes.NewReader(data[from : from+to])
z, err := zlib.NewReader(b)
CheckErr(err)
defer z.Close()
p := make([]byte, 0, 1024)
z.Read(p)
fmt.Println(string(p))
So how is it so massively difficult just to unzip a slice of bytes? Anyway...
The problem appears to be with how I am reading it out. Where it says z.Read, that doesn't seem to do anything.
How can I read the entire thing in one go into a slice of bytes?
Here's an outline for you. Note: In Go, CHECK FOR ERRORS!
package main

import (
	"bytes"
	"compress/zlib"
	"fmt"
	"io/ioutil"
)

func readSegment(data []byte, from, to int) ([]byte, error) {
	b := bytes.NewReader(data[from : from+to])
	z, err := zlib.NewReader(b)
	if err != nil {
		return nil, err
	}
	defer z.Close()
	p, err := ioutil.ReadAll(z)
	if err != nil {
		return nil, err
	}
	return p, nil
}

func main() {
	from, to := 0, 1000
	data := make([]byte, from+to)
	// ** parse input segments into data **
	p, err := readSegment(data, from, to)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(string(p))
}
Use ReadAll(r io.Reader) ([]byte, error) from the io/ioutil package.
p, err := ioutil.ReadAll(z)
fmt.Println(string(p))
Read only reads up to the length of the given slice, and your slice has length 0 (you created it with make([]byte, 0, 1024), so 1024 is only the capacity).
To read in chunks of 1024 bytes:
p := make([]byte, 1024)
for {
	numBytes, err := z.Read(p)
	// do what you want with p[:numBytes]
	if err == io.EOF {
		// you are done; numBytes might be less than len(p)
		break
	}
	if err != nil {
		// handle the error
		break
	}
}
If you are getting the data from a webserver, you might even do
import (
	"compress/zlib"
	"io/ioutil"
	"net/http"
)

...

resp, errGet := http.Get("http://example.com/somefile")
// do error handling
defer resp.Body.Close()
z, errZ := zlib.NewReader(resp.Body)
// do error handling
p, err := ioutil.ReadAll(z)
// do error handling
since resp.Body happens to be an io.Reader, like most io-related types.