I've written a little server which receives a blob of data in the form of an io.Reader, adds a header and streams the result back to the caller.
My implementation isn't particularly efficient as I'm buffering the blob's data in-memory so that I can calculate the blob's length, which needs to form part of the header.
I've seen some examples of io.Pipe() with io.TeeReader but they're more for splitting an io.Reader into two, and writing them away in parallel.
The blobs I'm dealing with are around 100KB, so not huge but if my server gets busy, memory's going to quickly become an issue...
Any ideas?
func addHeader(in io.Reader) (out io.Reader, err error) {
buf := new(bytes.Buffer)
if _, err = io.Copy(buf, in); err != nil {
return
}
header := bytes.NewReader([]byte(fmt.Sprintf("header:%d", buf.Len())))
return io.MultiReader(header, buf), nil
}
I appreciate it's not a good idea to return interfaces from functions but this code isn't destined to become an API, so I'm not too concerned with that bit.
In general, the only way to determine the length of data in an io.Reader is to read until EOF. There are ways to determine the length of the data for specific types.
func addHeader(in io.Reader) (out io.Reader, err error) {
n := 0
switch v := in.(type) {
case *bytes.Buffer:
n = v.Len()
case *bytes.Reader:
n = v.Len()
case *strings.Reader:
n = v.Len()
case io.Seeker:
cur, err := v.Seek(0, io.SeekCurrent)
if err != nil {
return nil, err
}
end, err := v.Seek(0, io.SeekEnd)
if err != nil {
return nil, err
}
_, err = v.Seek(cur, io.SeekStart)
if err != nil {
return nil, err
}
n = int(end - cur)
default:
var buf bytes.Buffer
if _, err := buf.ReadFrom(in); err != nil {
return nil, err
}
n = buf.Len()
in = &buf
}
header := strings.NewReader(fmt.Sprintf("header:%d", n))
return io.MultiReader(header, in), nil
}
This is similar to how the net/http package determines the content length of the request body.
I'm trying to implement a function to ignore a line containing a pattern from a long text file (ASCII guaranteed) in Go
The two functions below, withoutIgnore and withIgnore, both take a filename argument as input and return a *bytes.Buffer, which can subsequently be written to an io.Writer.
The withIgnore function takes an additional pattern argument and excludes every line containing the pattern. The function works, but benchmarking shows it to be about 5x slower than withoutIgnore. Is there a way it could be improved?
package main
import (
"bufio"
"bytes"
"io"
"log"
"os"
)
func withoutIgnore(f string) (*bytes.Buffer, error) {
rfd, err := os.Open(f)
if err != nil {
log.Fatal(err)
}
defer func() {
if err := rfd.Close(); err != nil {
log.Fatal(err)
}
}()
inputBuffer := make([]byte, 1048576)
var bytesRead int
var bs []byte
opBuffer := bytes.NewBuffer(bs)
for {
bytesRead, err = rfd.Read(inputBuffer)
if err == io.EOF {
return opBuffer, nil
}
if err != nil {
return nil, err
}
_, err = opBuffer.Write(inputBuffer[:bytesRead])
if err != nil {
return nil, err
}
}
return opBuffer, nil
}
func withIgnore(f, pattern string) (*bytes.Buffer, error) {
rfd, err := os.Open(f)
if err != nil {
log.Fatal(err)
}
defer func() {
if err := rfd.Close(); err != nil {
log.Fatal(err)
}
}()
scanner := bufio.NewScanner(rfd)
var bs []byte
buffer := bytes.NewBuffer(bs)
for scanner.Scan() {
if !bytes.Contains(scanner.Bytes(), []byte(pattern)) {
_, err := buffer.WriteString(scanner.Text() + "\n")
if err != nil {
return nil, err
}
}
}
return buffer, nil
}
func main() {
// buff, err := withoutIgnore("base64dump.log")
buff, err := withIgnore("base64dump.log", "AUDIT")
if err != nil {
log.Fatal(err)
}
_, err = buff.WriteTo(os.Stdout)
if err != nil {
log.Fatal(err)
}
}
Benchmark test
package main
import "testing"
func BenchmarkTestWithoutIgnore(b *testing.B) {
for i := 0; i < b.N; i++ {
_, err := withoutIgnore("base64dump.log")
if err != nil {
b.Fatal(err)
}
}
}
func BenchmarkTestWithIgnore(b *testing.B) {
for i := 0; i < b.N; i++ {
_, err := withIgnore("base64dump.log", "AUDIT")
if err != nil {
b.Fatal(err)
}
}
}
and the "base64dump.log" can be generated in the command line using
base64 /dev/urandom | head -c 10000000 > base64dump.log
Since ASCII is guaranteed, one can work directly at byte level.
Still if one checks each byte for line breaks when reading the input and then searches for the pattern again within the line, operations are applied to each byte.
If, on the other hand, one reads chunks of the input and performs an optimized search for the pattern in the text, not even examining each input byte, one minimizes the operations per input byte.
For example, there is the Boyer-Moore string search algorithm. Go's built-in bytes.Index function is also optimized. The achieved speed depends of course on the input data and the actual pattern. For the input as specified in the question, bytes.Index turned out to be significantly more performant when measured.
Procedure
read in a chunk; the chunk size should be significantly larger than the maximum line length. A value >= 64KB should probably be good; the test used 1MB, as in the question.
a chunk usually doesn't end at a linefeed, so search from the end of the chunk to the next linefeed, limit the search to this slice and remember the remaining data for the next pass
the last chunk does not necessarily end in a linefeed
with the help of the performant Go function bytes.Index you can find the places where the pattern occurs in the chunk
from the found location one searches for the preceding and the following linefeed
then the block is output up to the corresponding beginning of the line
and the search is continued from the end of the line where the pattern occurred
if the search does not find another location, the rest is output
read the next chunk and apply the described steps again until the end of the file is reached
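The search-and-skip part of the procedure can be sketched on a single in-memory chunk. This is a simplified illustration, not the answer's implementation: the helper name dropLines is hypothetical, chunk reading and carry-over of the partial last line are left out, and the line boundaries are located with bytes.LastIndexByte and bytes.IndexByte instead of hand-written loops.

```go
package main

import (
	"bytes"
	"fmt"
)

// dropLines returns a copy of data without the lines that contain pattern.
// data is assumed to start at the beginning of a line; the final line
// may or may not end with a linefeed.
func dropLines(data, pattern []byte) []byte {
	out := make([]byte, 0, len(data))
	s := 0 // start of the region that has not been copied yet
	for s < len(data) {
		hit := bytes.Index(data[s:], pattern)
		if hit < 0 {
			break // no further occurrence, the rest is kept
		}
		hit += s
		// Line start: one past the preceding linefeed, or s if there is none.
		start := s + bytes.LastIndexByte(data[s:hit], '\n') + 1
		// Line end: one past the following linefeed, or the end of data.
		end := len(data)
		if i := bytes.IndexByte(data[hit:], '\n'); i >= 0 {
			end = hit + i + 1
		}
		out = append(out, data[s:start]...) // keep everything before the line
		s = end                             // continue after the dropped line
	}
	return append(out, data[s:]...)
}

func main() {
	in := []byte("keep 1\ndrop AUDIT x\nkeep 2\nAUDIT last")
	fmt.Printf("%q\n", dropLines(in, []byte("AUDIT"))) // prints "keep 1\nkeep 2\n"
}
```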
Noteworthy
A read operation may return less data than the chunk size, so it makes sense to repeat the read operation until the chunk size data has been read.
Benchmark
Optimized code is often significantly more complicated, but the performance is also significantly better, as we will see in a moment.
BenchmarkTestWithoutIgnore-8 270 4137267 ns/op
BenchmarkTestWithIgnore-8 54 22403931 ns/op
BenchmarkTestFilter-8 150 7947454 ns/op
Here, the optimized code BenchmarkTestFilter-8 is only about 1.9x slower than the operation without filtering while the BenchmarkTestWithIgnore-8 method is 5.4x slower than the comparison value without filtering.
Looked at another way: the optimized code is 2.8 times faster than the unoptimized one.
Code
Of course, here is the code for your own tests:
func filterFile(f, pattern string) (*bytes.Buffer, error) {
rfd, err := os.Open(f)
if err != nil {
log.Fatal(err)
}
defer func() {
if err := rfd.Close(); err != nil {
log.Fatal(err)
}
}()
reader := bufio.NewReader(rfd)
return filter(reader, []byte(pattern), 1024*1024)
}
// chunkSize must be larger than the longest line
// a reasonable size is probably >= 64K
func filter(reader io.Reader, pattern []byte, chunkSize int) (*bytes.Buffer, error) {
var bs []byte
buffer := bytes.NewBuffer(bs)
chunk := make([]byte, chunkSize)
var remaining []byte
for lastChunk := false; !lastChunk; {
n, err := readChunk(reader, chunk, remaining, chunkSize)
if err != nil {
if err == io.EOF {
lastChunk = true
} else {
return nil, err
}
}
remaining = remaining[:0]
if !lastChunk {
for i := n - 1; i > 0; i-- {
if chunk[i] == '\n' {
remaining = append(remaining, chunk[i+1:n]...)
n = i + 1
break
}
}
}
s := 0
for s < n {
hit := bytes.Index(chunk[s:n], pattern)
if hit < 0 {
break
}
hit += s
startOfLine := hit
for ; startOfLine > 0; startOfLine-- {
if chunk[startOfLine] == '\n' {
startOfLine++
break
}
}
endOfLine := hit + len(pattern)
for ; endOfLine < n; endOfLine++ {
if chunk[endOfLine] == '\n' {
break
}
}
endOfLine++
_, err = buffer.Write(chunk[s:startOfLine])
if err != nil {
return nil, err
}
s = endOfLine
}
if s < n {
_, err = buffer.Write(chunk[s:n])
if err != nil {
return nil, err
}
}
}
return buffer, nil
}
func readChunk(reader io.Reader, chunk, remaining []byte, chunkSize int) (int, error) {
copy(chunk, remaining)
r := len(remaining)
for r < chunkSize {
n, err := reader.Read(chunk[r:])
r += n
if err != nil {
return r, err
}
}
return r, nil
}
And the benchmark part might look something like this:
func BenchmarkTestFilter(b *testing.B) {
for i := 0; i < b.N; i++ {
_, err := filterFile("base64dump.log", "AUDIT")
if err != nil {
b.Fatal(err)
}
}
}
The filter function was split and the actual job is done in func filter(reader io.Reader, pattern []byte, chunkSize int) (*bytes.Buffer, error).
Injecting a reader and a chunkSize makes the function easy to unit test. Such tests are missing here, but they are definitely recommended for code that does this much index arithmetic.
However, the main point here was to find a way to improve performance significantly.
I'm trying to download and decrypt HLS streams by using io.ReadFull to process the data in chunks to conserve memory:
Irrelevant parts of the code have been left out for simplicity.
func main() {
f, _ := os.Create("out.ts")
for _, v := range mediaPlaylist {
resp, _ := http.Get(v.URI)
for {
r, err := decryptHLS(key, iv, resp.Body)
if err == io.EOF {
break
} else if err != nil && err != io.ErrUnexpectedEOF {
panic(err)
}
io.Copy(f, r)
}
}
}
func decryptHLS(key []byte, iv []byte, r io.Reader) (io.Reader, error) {
block, _ := aes.NewCipher(key)
buf := make([]byte, 8192)
mode := cipher.NewCBCDecrypter(block, iv)
n, err := io.ReadFull(r, buf)
if err != nil && err != io.ErrUnexpectedEOF {
return nil, err
}
mode.CryptBlocks(buf, buf)
return bytes.NewReader(buf[:n]), err
}
At first this seems to work, as the file size is correct and there are no errors during download,
but the video is corrupted. Not completely, as the file is still recognized as a video, but image and sound are distorted.
If I change the code to use ioutil.ReadAll instead, the final video files will no longer be corrupted:
func main() {
f, _ := os.Create("out.ts")
for _, v := range mediaPlaylist {
resp, _ := http.Get(v.URI)
segment, _ := ioutil.ReadAll(resp.Body)
r := decryptHLS(key, iv, &segment)
io.Copy(f, r)
}
}
func decryptHLS(key []byte, iv []byte, s *[]byte) io.Reader {
block, _ := aes.NewCipher(key)
mode := cipher.NewCBCDecrypter(block, iv)
mode.CryptBlocks(*s, *s)
return bytes.NewReader(*s)
}
Any ideas why it works correctly when reading the entire segment into memory, and not when using io.ReadFull and processing it in chunks?
Internally, CBCDecrypter makes a copy of your iv, so every newly created decrypter starts with the initial IV rather than the one that has been mutated by the previous decryptions.
Create the decrypter once, and you should be able to keep reusing it to decrypt chunk by chunk (assuming the chunk size is a multiple of the block size expected by this crypto algorithm).
I have the following data structure that I expect to read from a TCP socket connection: the first 4 bytes are a uint32 that describes the length of the payload that follows. I try to read continuously from the connection using the following code:
// c is TCP connection
func StartReading(c io.Reader, ok chan bool) {
// Reader reads first 4 bytes as payload length
for l, err := getPayloadLength(c); err == nil; {
//Reader reads the rest of the message
b, err := readFixedSize(c, l)
if err != nil {
ok<- false
close(ok)
return
}
go process(b, make(chan bool))
}
ok<- true
}
func getPayloadLength(r io.Reader) (uint, error) {
b, err := readFixedSize(r, 4)
if err != nil {
return 0, err
}
return uint(binary.BigEndian.Uint32(b)), nil
}
// Read fixed size byte slice from reader
func readFixedSize(r io.Reader, len uint) (b []byte, err error) {
b = make([]byte, len)
_, err = io.ReadFull(r, b)
if err != nil {
return
}
return
}
My expectation is that it will read the first four bytes from the incoming data, parse them into l, and then read the following l bytes. The first read from the connection yields the expected result, but in all subsequent iterations the reader seems to read 4 bytes from the end of the previous message.
By trial and error I ended up with the following code which reads as expected, but I still could not understand why the code above does not work as I expect.
New code:
func StartReading(c io.Reader, ok chan<- bool) {
br := bufio.NewReader(c)
// Peek into first 4 bytes for payload length
for lb, err := br.Peek(4); err == nil; {
// Read length bytes into uint
l := uint(binary.BigEndian.Uint32(lb))
b := make([]byte, l + 4)
_, err := br.Read(b)
if err != nil {
ok<- false
return
}
//Process from 4th byte
go process(b[4:], make(chan bool))
}
ok<- true
}
I do kind of understand why the latter code works, but can't wrap my head around why the first code does not work as expected. I'm quite new to Go, so could someone please explain what happened there?
I’ve written a short program in Go to communicate with a sensor through a serial port:
package main
import (
"fmt"
"github.com/tarm/goserial"
"time"
)
func main() {
c := &serial.Config{Name: "/dev/ttyUSB0", Baud: 9600}
s, err := serial.OpenPort(c)
if err != nil {
fmt.Println(err)
}
_, err = s.Write([]byte("\x16\x02N0C0 G A\x03\x0d\x0a"))
if err != nil {
fmt.Println(err)
}
time.Sleep(time.Second/2)
buf := make([]byte, 40)
n, err := s.Read(buf)
if err != nil {
fmt.Println(err)
}
fmt.Println(string(buf[:n]))
s.Close()
}
It works fine, but after writing to the port I have to wait about half a second before I can start reading from it. I would like to use a while-loop instead of time.Sleep to read all incoming data. My attempt doesn’t work:
buf := make([]byte, 40)
n := 0
for {
n, _ := s.Read(buf)
if n > 0 {
break
}
}
fmt.Println(string(buf[:n]))
I guess buf gets overwritten after every loop pass. Any suggestions?
Your problem is that Read() returns as soon as it has some data; it won't wait for all of it. See the io.Reader specification for more info.
What you want to do is read until you reach some delimiter. I don't know exactly what format you are trying to use, but it looks like \x0a may be the end delimiter.
In which case you would use a bufio.Reader like this:
reader := bufio.NewReader(s)
reply, err := reader.ReadBytes('\x0a')
if err != nil {
panic(err)
}
fmt.Println(reply)
Which will read data until the first \x0a.
I guess buf gets overwritten after every loop pass. Any suggestions?
Yes, buf will get overwritten with every call to Read().
A timeout on the file handle would be the approach I would take.
s, _ := os.OpenFile("/dev/ttyS0", syscall.O_RDWR|syscall.O_NOCTTY|syscall.O_NONBLOCK, 0666)
t := syscall.Termios{
Iflag: syscall.IGNPAR,
Cflag: syscall.CS8 | syscall.CREAD | syscall.CLOCAL | syscall.B115200,
Cc: [32]uint8{syscall.VMIN: 0, syscall.VTIME: uint8(20)}, //2.0s timeout
Ispeed: syscall.B115200,
Ospeed: syscall.B115200,
}
// syscall
syscall.Syscall6(syscall.SYS_IOCTL, uintptr(s.Fd()),
uintptr(syscall.TCSETS), uintptr(unsafe.Pointer(&t)),
0, 0, 0)
// Send message
n, err := s.Write([]byte("Test message"))
// Receive reply
for {
buf := make([]byte, 128)
n, err = s.Read(buf)
if err != nil { // err will equal io.EOF
break
}
fmt.Printf("%v\n", string(buf[:n])) // print only the bytes actually read
}
Also note that if a read returns no data and no error, os.File.Read() returns io.EOF as its error.
I'm trying to compress a file from a buffered reader and pass the compressed bytes through a byte channel, but with poor results :). Here's what I've come up with so far; obviously this doesn't work...
func Compress(r io.Reader) (<-chan byte) {
c := make(chan byte)
go func(){
var wBuff bytes.Buffer
rBuff := make([]byte, 1024)
writer := zlib.NewWriter(*wBuff)
for {
n, err := r.Read(rBuff)
if err != nil && err != io.EOF { panic(err) }
if n == 0 { break }
writer.Write(rBuff) // Compress and write compressed data
// How to send written compressed bytes through channel?
// as far as I understand wBuff will eventually contain
// whole compressed data?
}
writer.Close()
close(c) // Indicate that no more data follows
}()
return c
}
Please bear with me, as I'm very new to Go
I suggest using []byte instead of byte; it is more efficient. Because of concurrent memory accesses, it may be necessary to send a copy of the buffer through the channel rather than the []byte buffer itself.
You can define a type ChanWriter chan []byte and let it implement the io.Writer interface. Then pass the ChanWriter to zlib.NewWriter.
You can create a goroutine for doing the compression and then immediately return the ChanWriter's channel from your Compress function. If there is no goroutine then there is no reason for the function to return a channel and the preferred return type is io.Reader.
To report errors from the goroutine to the caller, the return type of the Compress function should be changed into something like <-chan BytesWithError. In this case ChanWriter can be defined as type ChanWriter chan BytesWithError.
Sending bytes one by one down a channel is not going to be particularly efficient. Another approach that may be more useful would be to return an object implementing the io.Reader interface, implementing the Read() method by reading a block from the original io.Reader and compressing it before returning it.
Your writer.Write(rBuff) statement always writes len(rBuff) bytes, even when n != len(rBuff).
writer.Write(rBuff[:n])
Also, your Read loop is
for {
n, err := r.Read(rBuff)
if err != nil && err != io.EOF {
panic(err)
}
if n == 0 {
break
}
writer.Write(rBuff[:n])
// ...
}
which is equivalent to
for {
n, err := r.Read(rBuff)
if err != nil && err != io.EOF {
panic(err)
}
// !(err != nil && err != io.EOF)
// !(err != nil) || !(err != io.EOF)
// err == nil || err == io.EOF
if err == nil || err == io.EOF {
if n == 0 {
break
}
}
writer.Write(rBuff[:n])
// ...
}
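The loop exits prematurely if err == nil && n == 0. The io.Reader contract allows Read to return 0 bytes with a nil error, and that does not indicate EOF.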
Instead, write
for {
n, err := r.Read(rBuff)
if err != nil {
if err != io.EOF {
panic(err)
}
if n == 0 {
break
}
}
writer.Write(rBuff[:n])
// ...
}
Ok, I've found working solution: (Feel free to indicate where it can be improved, or maybe I'm doing something wrong?)
func Compress(r io.Reader) (<-chan byte) {
c := make(chan byte)
go func(){
var wBuff bytes.Buffer
rBuff := make([]byte, 1024)
writer := zlib.NewWriter(&wBuff)
for {
n, err := r.Read(rBuff)
if err != nil {
if err != io.EOF {
panic(err)
}
if n == 0 {
break
}
}
writer.Write(rBuff[:n])
for _, v := range wBuff.Bytes() {
c <- v
}
wBuff.Truncate(0)
}
writer.Close()
for _, v := range wBuff.Bytes() {
c <- v
}
close(c) // Indicate that no more data follows
}()
return c
}