I am receiving a REST request and want to calculate a hash of its body.
To do so I read the body using io.TeeReader(request.Body, &writerToHash), where I pass my own type that implements io.Writer:
func (self *WriterToHash) Write(p []byte) (n int, err error) {
    n = len(p)
    fmt.Printf("WriterToHash len=%v, buff=%v\n", n, p) //PRINT 1
    self.BodyChannel <- p
    return n, nil
}
The BodyChannel is defined: BodyChannel chan []byte
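For reference, here is a minimal sketch of the type implied by the question (only the two channel fields actually appear in the snippets; the struct itself is never shown):

type WriterToHash struct {
    BodyChannel chan []byte // body chunks fed in by Write
    OutChannel  chan []byte // final SHA-1 digest, sent by StartListen
}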
I use this type as follows:
writerToHash := sisutils.WriterToHash{
    BodyChannel: make(chan []byte, 1024),
}
writerToHash.StartListen()
reqnew, _ := http.NewRequest("PUT", url, io.TeeReader(request.Body, &writerToHash))
Listening part:
func (wth *WriterToHash) StartListen() {
    wth.OutChannel = make(chan []byte, 1000)
    go func(self *WriterToHash) {
        done := int64(0)
        h := sha1.New()
        for done < MessageSize {
            buff := <-self.BodyChannel
            done += int64(len(buff))
            DPrint(5, "AccamulateSha1 Done=: %v, buff=%v", done, buff) //PRINT 2
            actually_write, err := h.Write(buff)
            if err != nil || actually_write != len(buff) {
                log.Println("Error in sha write:" + err.Error())
                break
            }
        }
        bs := h.Sum(nil)
        self.OutChannel <- bs
    }(wth)
}
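The question doesn't show how the digest is consumed; presumably the caller sends reqnew and then receives the sum from OutChannel, roughly like this sketch (client is an assumed *http.Client, not shown in the question):

resp, err := client.Do(reqnew)
if err != nil {
    // handle the error
}
defer resp.Body.Close()
sum := <-writerToHash.OutChannel // blocks until the hashing goroutine sends the digest
fmt.Printf("sha1: %x\n", sum)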
I send messages of 1000 bytes. In debug mode the message is always split the same way: 1 byte, then 999 bytes - I can see this via PRINT 1. In this case everything works fine.
The problem arises when the message is split into more parts in the Write function. In that case PRINT 1 shows:
[first byte] : a
[next ~450 bytes] : b,c,d,...
[last ~550 bytes] : w,x,y,...
but PRINT 2 shows a different picture:
[first byte] : a
[ ~450 bytes but starting where last part starts] : w,x,y...
[last ~550 bytes] : w,x,y,...
I actually get the last part twice, but not with the same size.
From the io.Writer documentation:
Write must not modify the slice data, even temporarily. Implementations must not retain p.
You can't store or reuse the slice passed to your Write method; if you want to use that data elsewhere, you need to make a copy of it. The caller driving the TeeReader (ultimately io.Copy inside the HTTP machinery) reuses a single buffer for every read/write cycle, so by the time your hashing goroutine receives a slice from BodyChannel, the memory backing it may already have been overwritten with the next chunk - which is exactly the duplicated data PRINT 2 shows:
func (self *WriterToHash) Write(p []byte) (n int, err error) {
    b := make([]byte, len(p))
    copy(b, p)
    fmt.Printf("WriterToHash len=%d, buff=%v\n", len(p), b)
    self.BodyChannel <- b
    return len(p), nil
}
I have a function which splits data and returns a slice of subslices:
func split(buf []byte, lim int) [][]byte
Obviously I get an error if I do:
n, err = out.Write(split(buf[:n], 100))
The error:
cannot convert split(buf[:n], 100) (type [][]byte) to type []byte
How do I convert [][]byte to []byte?
Edit, based on @Wishwa Perera: https://play.golang.org/p/nApPAYRV4ZW
Since you are splitting buf into chunks, you can pass them individually to Write by looping over the result of split:
for _, chunk := range split(buf[:n], 100) {
    if _, err := out.Write(chunk); err != nil {
        panic(err)
    }
}
If out is a net.Conn as in your other question, then use net.Buffers to write the [][]byte.
b := net.Buffers(split(buf[:n], 100))
_, err := b.WriteTo(out)
if err != nil {
    panic(err)
}
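For completeness: if you genuinely need one contiguous []byte rather than writing the chunks out one by one, bytes.Join from the standard library concatenates the subslices (a nil separator means plain concatenation):

flat := bytes.Join(split(buf[:n], 100), nil) // single allocation holding all chunks
_, err := out.Write(flat)

Note that this copies all the data into a new slice, so splitting and immediately re-joining is usually wasted work; the loop or net.Buffers approaches above avoid the extra copy.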
I have the following data structure that I expect to read from a TCP socket connection: the first 4 bytes are a uint32 describing the length of the payload that follows them. I try to continuously read from the connection using the following code:
// c is a TCP connection
func StartReading(c io.Reader, ok chan bool) {
    // Reader reads first 4 bytes as payload length
    for l, err := getPayloadLength(c); err == nil; {
        // Reader reads the rest of the message
        b, err := readFixedSize(c, l)
        if err != nil {
            ok <- false
            close(ok)
            return
        }
        go process(b, make(chan bool))
    }
    ok <- true
}
func getPayloadLength(r io.Reader) (uint, error) {
    b, err := readFixedSize(r, 4)
    if err != nil {
        return 0, err
    }
    return uint(binary.BigEndian.Uint32(b)), nil
}
// Read a fixed-size byte slice from the reader.
func readFixedSize(r io.Reader, len uint) (b []byte, err error) {
    b = make([]byte, len)
    _, err = io.ReadFull(r, b)
    return
}
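For reference, the matching sender side of this framing would look something like the sketch below (an assumption - the question only shows the reading side):

func writeFrame(w io.Writer, payload []byte) error {
    var hdr [4]byte
    binary.BigEndian.PutUint32(hdr[:], uint32(len(payload)))
    if _, err := w.Write(hdr[:]); err != nil {
        return err
    }
    _, err := w.Write(payload)
    return err
}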
My expectation is that it will read the first four bytes of incoming data, parse them into l, and based on the parsed value read the subsequent l bytes. The first read from the connection yields the expected result, but in all subsequent iterations the reader seems to read 4 bytes from the end of the previous message.
By trial and error I ended up with the following code, which reads as expected, but I still could not understand why the code above does not work:
New code:
func StartReading(c io.Reader, ok chan<- bool) {
    br := bufio.NewReader(c)
    // Peek into first 4 bytes for payload length
    for lb, err := br.Peek(4); err == nil; {
        // Read length bytes into uint
        l := uint(binary.BigEndian.Uint32(lb))
        b := make([]byte, l+4)
        _, err := br.Read(b)
        if err != nil {
            ok <- false
            return
        }
        // Process from 4th byte
        go process(b[4:], make(chan bool))
    }
    ok <- true
}
I kind of understand why the second version works, but I can't wrap my head around why the first one doesn't. I'm quite new to Go, so could someone please explain what is happening there?
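The likely explanation: in a Go for statement the init clause runs exactly once, so l and err are assigned only once, and the b, err := ... inside the body declares a new, shadowed err. The loop condition therefore keeps testing the original err, and l never changes - every iteration after the first blindly reads the same fixed number of bytes, consuming each following message's 4-byte header as if it were payload. A corrected loop, as a sketch, re-reads the length prefix on every pass:

func StartReading(c io.Reader, ok chan bool) {
    for {
        l, err := getPayloadLength(c)
        if err != nil {
            break // includes io.EOF on a clean close
        }
        b, err := readFixedSize(c, l)
        if err != nil {
            ok <- false
            return
        }
        go process(b, make(chan bool))
    }
    ok <- true
}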
I'm trying to improve the performance of an app.
One part of its code uploads a file to a server in chunks.
The original version simply does this in a sequential loop. However, it's slow and during the sequence it also needs to talk to another server before uploading each chunk.
The upload of chunks could simply be placed in a goroutine. It works, but is not a good solution because if the source file is extremely large it ends up using a large amount of memory.
So, I try to limit the number of active goroutines by using a buffered channel. Here is some code that shows my attempt. I've stripped it down to show the concept and you can run it to test for yourself.
package main
import (
    "fmt"
    "io"
    "os"
)
const defaultChunkSize = 1 * 1024 * 1024
// Let's have 4 workers
var c = make(chan int, 4)
func UploadFile(f *os.File) error {
    fi, err := f.Stat()
    if err != nil {
        return fmt.Errorf("err: %s", err)
    }
    size := fi.Size()
    total := int(size/defaultChunkSize + 1)

    // Upload parts
    buf := make([]byte, defaultChunkSize)
    for partno := 1; partno <= total; partno++ {
        readChunk := func(offset int, buf []byte) (int, error) {
            fmt.Println("readChunk", partno, offset)
            n, err := f.ReadAt(buf, int64(offset))
            if err != nil {
                return n, err
            }
            return n, nil
        }

        // This will block if there are not enough worker slots available
        c <- partno

        // The actual worker.
        go func() {
            offset := (partno - 1) * defaultChunkSize
            n, err := readChunk(offset, buf)
            if err != nil && err != io.EOF {
                return
            }
            err = uploadPart(partno, buf[:n])
            if err != nil {
                fmt.Println("Uploadpart failed:", err)
            }
            <-c
        }()
    }
    return nil
}
func uploadPart(partno int, buf []byte) error {
    fmt.Printf("Uploading partno: %d, buflen=%d\n", partno, len(buf))

    // Actually upload the part. Let's test it by instead writing each
    // buffer to another file. We can then use diff to compare the
    // source and dest files.
    // Open file. Seek to (partno - 1) * defaultChunkSize, write buffer.
    f, err := os.OpenFile("/home/matthewh/Downloads/out.tar.gz", os.O_CREATE|os.O_WRONLY, 0755)
    if err != nil {
        fmt.Printf("err: %s\n", err)
    }
    defer f.Close()
    n, err := f.WriteAt(buf, int64((partno-1)*defaultChunkSize))
    if err != nil {
        fmt.Printf("err=%s\n", err)
    }
    fmt.Printf("%d bytes written\n", n)
    return nil
}
func main() {
    filename := "/home/matthewh/Downloads/largefile.tar.gz"
    fmt.Printf("Opening file: %s\n", filename)
    f, err := os.Open(filename)
    if err != nil {
        panic(err)
    }
    UploadFile(f)
}
It almost works, but there are several problems.
1) The final partno 22 occurs 3 times. The correct length is actually 612545, as the file length isn't a multiple of 1MB.
// Sample output
...
readChunk 21 20971520
readChunk 22 22020096
Uploading partno: 22, buflen=1048576
Uploading partno: 22, buflen=612545
Uploading partno: 22, buflen=1048576
Another problem: the upload could fail, and I am not familiar enough with Go to know how best to handle failure inside a goroutine.
Finally, I ordinarily want to return some data from uploadPart when it succeeds - specifically a string (an HTTP ETag header value). These ETag values need to be collected by the main function.
What is a better way to structure this code? I've not yet found a Go design pattern that correctly fulfills my needs here.
Skipping for the moment the question of how better to structure this code, I see a bug in your code which may be causing the problem you're seeing. Since the function you're running in the goroutine uses the variable partno, which changes with each iteration of the loop, your goroutine isn't necessarily seeing the value of partno at the time you invoked the goroutine. A common way of fixing this is to create a local copy of that variable inside the loop:
for partno := 1; partno <= total; partno++ {
    partno := partno
    // ...
}
Data race #1
Multiple goroutines are using the same buffer concurrently. Note that one goroutine may be filling it with a new chunk while another is still reading an old chunk from it. Instead, each goroutine should have its own buffer.
Data race #2
As Andy Schweig has pointed out, the value in partno is updated by the loop before the goroutine created in that iteration has a chance to read it. This is why the final partno 22 occurs multiple times. To fix it, you can pass partno as an argument to the anonymous function. That ensures each goroutine has its own part number.
Also, you can use a channel to pass the results from the workers. Maybe a struct type with the part number and error. That way, you will be able to observe the progress and retry failed uploads.
For an example of a good pattern, check out this example from the GOPL book.
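A sketch of that idea, with hypothetical names (the etag field matches the asker's wish to collect ETag values; total comes from the surrounding code):

type partResult struct {
    partno int
    etag   string // hypothetical: returned by a successful upload
    err    error
}

results := make(chan partResult, total)

// Each worker sends exactly one result:
//   results <- partResult{partno: partno, etag: etag, err: err}

// The caller collects them all, in any order:
etags := make([]string, total)
for i := 0; i < total; i++ {
    r := <-results
    if r.err != nil {
        continue // record or retry the failed part
    }
    etags[r.partno-1] = r.etag
}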
Suggested changes
As noted by dev.bmax, buf is moved into the goroutine; as noted by Andy Schweig, partno is a parameter to the anonymous function. I also added a WaitGroup, since UploadFile was exiting before the uploads were complete, and deferred f.Close() on the file - a good habit.
package main
import (
    "fmt"
    "io"
    "os"
    "sync"
    "time"
)
const defaultChunkSize = 1 * 1024 * 1024
// wg waits for the uploads to complete
var wg sync.WaitGroup

// Let's have 4 workers
var c = make(chan int, 4)
func UploadFile(f *os.File) error {
    // wait for all the uploads to complete before function exit
    defer wg.Wait()

    fi, err := f.Stat()
    if err != nil {
        return fmt.Errorf("err: %s", err)
    }
    size := fi.Size()
    fmt.Printf("file size: %v\n", size)
    total := int(size/defaultChunkSize + 1)

    // Upload parts
    for partno := 1; partno <= total; partno++ {
        readChunk := func(offset int, buf []byte, partno int) (int, error) {
            fmt.Println("readChunk", partno, offset)
            n, err := f.ReadAt(buf, int64(offset))
            if err != nil {
                return n, err
            }
            return n, nil
        }

        // This will block if there are not enough worker slots available
        c <- partno

        // Register the worker before starting it, so wg.Wait cannot
        // return before this upload has finished.
        wg.Add(1)

        // The actual worker.
        go func(partno int) {
            defer wg.Done()
            buf := make([]byte, defaultChunkSize)
            offset := (partno - 1) * defaultChunkSize
            n, err := readChunk(offset, buf, partno)
            if err != nil && err != io.EOF {
                return
            }
            err = uploadPart(partno, buf[:n])
            if err != nil {
                fmt.Println("Uploadpart failed:", err)
            }
            <-c
        }(partno)
    }
    return nil
}
func uploadPart(partno int, buf []byte) error {
    fmt.Printf("Uploading partno: %d, buflen=%d\n", partno, len(buf))
    // Actually do the upload. Simulate a long-running task with a sleep.
    time.Sleep(time.Second)
    return nil
}
func main() {
    filename := "/home/matthewh/Downloads/largefile.tar.gz"
    fmt.Printf("Opening file: %s\n", filename)
    f, err := os.Open(filename)
    if err != nil {
        panic(err)
    }
    defer f.Close()
    UploadFile(f)
}
I'm sure you can deal with the buf situation a little more cleverly; here I'm just letting Go's garbage collector handle it. Since you are limiting your workers to a specific number (4), you really only need 4 × defaultChunkSize buffers. Please do share if you come up with something simple and shareworthy.
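One simple option, as a sketch: pre-allocate the four buffers up front and recycle them through a buffered channel that acts as a free list (names are mine, not from the code above):

// Free list holding one buffer per worker slot.
bufPool := make(chan []byte, 4)
for i := 0; i < 4; i++ {
    bufPool <- make([]byte, defaultChunkSize)
}

// Worker side: borrow a buffer, return it when the upload is done.
buf := <-bufPool
defer func() { bufPool <- buf }()

This caps memory at 4 × defaultChunkSize without any per-part allocation.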
Have fun!
There is a Go tour. I've tried to solve https://tour.golang.org/methods/23 like this:
func (old_reader rot13Reader) Read(b []byte) (int, error) {
    const LEN int = 1024
    tmp_bytes := make([]byte, LEN)
    old_len, err := old_reader.r.Read(tmp_bytes)
    if err == nil {
        tmp_bytes = tmp_bytes[:old_len]
        rot13(tmp_bytes)
        return len(tmp_bytes), nil
    } else {
        return 0, err
    }
}
func main() {
    s := strings.NewReader("Lbh penpxrq gur pbqr!")
    r := rot13Reader{s}
    io.Copy(os.Stdout, &r)
}
The rot13 function is correct, and debug output right before the return shows the correct string. So why is there no output to the console?
The Read method for an io.Reader needs to operate on the byte slice provided to it. You're reading into a new slice, and never modifying the original.
Just use b throughout the Read method:
func (old_reader rot13Reader) Read(b []byte) (int, error) {
    n, err := old_reader.r.Read(b)
    rot13(b[:n])
    return n, err
}
You're never modifying b in your reader. The semantics of io.Reader's Read are that you put the data into b's underlying array directly.
Assuming the rot13() function also modifies in place, this will work (edit: I've tried to keep this code close to your version so you can see what's changed more easily; JimB's solution is a more idiomatic approach to this problem):
func (old_reader rot13Reader) Read(b []byte) (int, error) {
    tmp_bytes := make([]byte, len(b))
    old_len, err := old_reader.r.Read(tmp_bytes)
    tmp_bytes = tmp_bytes[:old_len]
    rot13(tmp_bytes)
    for i := range tmp_bytes {
        b[i] = tmp_bytes[i]
    }
    return old_len, err
}
Example (with stubbed rot13()): https://play.golang.org/p/vlbra-46zk
On a side note, from an idiomatic perspective, old_reader isn't a proper receiver name (nor is old_len a proper variable name). Go prefers short receiver names (like r or rdr in this case), and also prefers camelCase to underscores (underscores will actually trigger a golint warning).
Edit2: A more idiomatic version of your code. Kept the same mechanism of action, just cleaned it up a bit.
func (rdr rot13Reader) Read(b []byte) (int, error) {
    tmp := make([]byte, len(b))
    n, err := rdr.r.Read(tmp)
    tmp = tmp[:n]
    rot13(tmp)
    for i := range tmp {
        b[i] = tmp[i]
    }
    return n, err
}
From this, removing the tmp byte slice and using the destination b directly results in JimB's idiomatic solution to the problem.
Edit3: Updated to fix the issue Paul pointed out in comments.
I'm wondering if it's possible to count and print the number of bytes downloaded while the file is being downloaded.
out, err := os.Create("file.txt")
if err != nil {
    fmt.Println(fmt.Sprint(err))
    panic(err)
}
defer out.Close()

resp, err := http.Get("http://example.com/zip")
if err != nil {
    fmt.Println(fmt.Sprint(err))
    panic(err)
}
defer resp.Body.Close()

n, err := io.Copy(out, resp.Body)
if err != nil {
    fmt.Println(fmt.Sprint(err))
}
fmt.Println(n, "bytes")
If I understand you correctly, you wish to display the number of bytes read while the data is transferring, presumably to maintain some kind of progress bar. In that case, you can use Go's compositional data structures to wrap the reader or writer in a custom io.Reader or io.Writer implementation.
It simply forwards the respective Read or Write call to the underlying stream, while doing some additional work with the (int, error) values returned by them. Here is an example you can run on the Go playground.
package main

import (
    "bytes"
    "fmt"
    "io"
    "os"
    "strings"
)

// PassThru wraps an existing io.Reader.
//
// It simply forwards the Read() call, while displaying
// the results from individual calls to it.
type PassThru struct {
    io.Reader
    total int64 // Total # of bytes transferred
}

// Read 'overrides' the underlying io.Reader's Read method.
// This is the one that will be called by io.Copy(). We simply
// use it to keep track of byte counts and then forward the call.
func (pt *PassThru) Read(p []byte) (int, error) {
    n, err := pt.Reader.Read(p)
    pt.total += int64(n)

    if err == nil {
        fmt.Println("Read", n, "bytes for a total of", pt.total)
    }

    return n, err
}

func main() {
    var src io.Reader    // Source file/url/etc
    var dst bytes.Buffer // Destination file/buffer/etc

    // Create some random input data.
    src = bytes.NewBufferString(strings.Repeat("Some random input data", 1000))

    // Wrap it with our custom io.Reader.
    src = &PassThru{Reader: src}

    count, err := io.Copy(&dst, src)
    if err != nil {
        fmt.Println(err)
        os.Exit(1)
    }

    fmt.Println("Transferred", count, "bytes")
}
The output it generates is this:
Read 512 bytes for a total of 512
Read 1024 bytes for a total of 1536
Read 2048 bytes for a total of 3584
Read 4096 bytes for a total of 7680
Read 8192 bytes for a total of 15872
Read 6128 bytes for a total of 22000
Transferred 22000 bytes
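To apply the same wrapper to the original download code, wrap the response body before the io.Copy (a sketch reusing the question's out file and error handling):

resp, err := http.Get("http://example.com/zip")
if err != nil {
    panic(err)
}
defer resp.Body.Close()

src := &PassThru{Reader: resp.Body}
n, err := io.Copy(out, src)
if err != nil {
    fmt.Println(err)
}
fmt.Println(n, "bytes")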
The stdlib now provides something like jimt's PassThru: io.TeeReader. It helps simplify things a bit:
// WriteCounter counts the number of bytes written to it.
type WriteCounter struct {
    Total int64 // Total # of bytes transferred
}

// Write implements the io.Writer interface.
//
// Always completes and never returns an error.
func (wc *WriteCounter) Write(p []byte) (int, error) {
    n := len(p)
    wc.Total += int64(n)
    fmt.Printf("Read %d bytes for a total of %d\n", n, wc.Total)
    return n, nil
}
func main() {
    // ...

    // Wrap it with our custom io.Reader.
    src = io.TeeReader(src, &WriteCounter{})

    // ...
}
playground
The grab Go package implements progress updates (and many other features) for file downloads.
An example of printing progress updates while a download is in process is included in the following walkthrough: http://cavaliercoder.com/blog/downloading-large-files-in-go.html
You can basically call grab.GetAsync, which downloads in a new goroutine, and then monitor the BytesTransferred or Progress of the returned grab.Response from the calling thread.
Other answers have explained PassThru. Here is a full example with a callback function, based on Dave Jack's answer.
package main

import (
    "fmt"
    "io"
    "net/http"
    "os"
    "strconv"
)

// writeCounter counts the number of bytes written to it.
type writeCounter struct {
    total      int64 // total size
    downloaded int64 // downloaded # of bytes transferred
    onProgress func(downloaded int64, total int64)
}

// Write implements the io.Writer interface.
//
// Always completes and never returns an error.
func (wc *writeCounter) Write(p []byte) (n int, e error) {
    n = len(p)
    wc.downloaded += int64(n)
    wc.onProgress(wc.downloaded, wc.total)
    return
}

func newWriter(size int64, onProgress func(downloaded, total int64)) io.Writer {
    return &writeCounter{total: size, onProgress: onProgress}
}

func main() {
    client := http.DefaultClient
    url := "http://commondatastorage.googleapis.com/gtv-videos-bucket/sample/ForBiggerFun.mp4"
    saveTo := "/Users/tin/Desktop/ForBiggerFun.mp4"
    download(client, url, saveTo, func(downloaded, total int64) {
        fmt.Printf("Downloaded %d bytes for a total of %d\n", downloaded, total)
    })
}

func download(client *http.Client, url, filePath string, onProgress func(downloaded, total int64)) (err error) {
    // Create file writer
    file, err := os.Create(filePath)
    if err != nil {
        return
    }
    defer file.Close()

    // Determine the file size
    resp, err := client.Head(url)
    if err != nil {
        return
    }
    contentLength := resp.Header.Get("content-length")
    length, err := strconv.Atoi(contentLength)
    if err != nil {
        return
    }

    // Make request
    resp, err = client.Get(url)
    if err != nil {
        return
    }
    defer resp.Body.Close()

    // Pipe the stream through the progress counter
    body := io.TeeReader(resp.Body, newWriter(int64(length), onProgress))
    _, err = io.Copy(file, body)
    return err
}
Based on @Dave Jack's answer: I added a transfer-rate readout and receive the file data over a direct TCP connection (from nc).
// WriteCounter counts the number of bytes written to it.
type WriteCounter struct {
    Total      int64 // Total # of bytes transferred
    Last       int64
    LastUpdate time.Time
}

// Write implements the io.Writer interface.
//
// Always completes and never returns an error.
func (wc *WriteCounter) Write(p []byte) (int, error) {
    n := len(p)
    wc.Total += int64(n)
    now := time.Now()
    duration := now.Sub(wc.LastUpdate).Seconds()
    if duration > 1 {
        wc.LastUpdate = now
        rate := float64(wc.Total-wc.Last) / duration / 1024.0
        wc.Last = wc.Total
        fmt.Printf("Read %d bytes for a total of %d, Rate %.1f KB/s\n", n, wc.Total, rate)
    }
    return n, nil
}
func Server(dest string) {
    outputFile, err := os.Create(dest)
    if err != nil {
        fmt.Println(err)
    }
    defer outputFile.Close()
    fileWriter := bufio.NewWriter(outputFile)

    serverListener, err := net.Listen("tcp", "0.0.0.0:"+PORT)
    if err != nil {
        fmt.Println(err)
    }
    defer serverListener.Close()

    serverConn, err := serverListener.Accept()
    if err != nil {
        fmt.Println(err)
    }
    defer serverConn.Close()

    wc := &WriteCounter{}
    reader := io.TeeReader(serverConn, wc)
    serverConnReader := bufio.NewReaderSize(reader, 32*1024*1024)
    io.Copy(fileWriter, serverConnReader)
    fileWriter.Flush()
    outputFile.Sync()
    fmt.Println("Done: Writer")
}