golang read bytes from net.TCPConn with 4 bytes as message separation - go

I am working on a SIP-over-TCP mock service in Go. Incoming SIP messages are separated by the '\r\n\r\n' sequence (I do not care about SDP for now). I want to extract each message based on that delimiter and send it over to a processing goroutine. Looking through the Go standard library I see no trivial way of achieving it. There seems to be no one-stop shop in the io and bufio packages. Currently I see two options for going forward (both in bufio):
The (*Reader).ReadBytes function with '\r' set as the delimiter. Further processing is done by using the ReadByte function, comparing bytes sequentially with each byte of the delimiter and unreading them if necessary (which looks quite tedious).
Using a Scanner with a custom split function, which does not look trivial either.
I wonder whether there are any better options; the functionality seems so common that it is hard to believe it is not possible to just define a delimiter for a TCP stream and extract messages from it.

You can either choose to buffer the reads up yourself and split on the \r\n\r\n delimiter, or let a bufio.Scanner do it for you. There's nothing onerous about implementing a bufio.SplitFunc, and it's definitely simpler than the alternative. Using bufio.ScanLines as an example, you could use:
scanner.Split(func(data []byte, atEOF bool) (advance int, token []byte, err error) {
    delim := []byte{'\r', '\n', '\r', '\n'}
    if atEOF && len(data) == 0 {
        return 0, nil, nil
    }
    // A complete message is everything up to (and not including) the delimiter.
    if i := bytes.Index(data, delim); i >= 0 {
        return i + len(delim), data[0:i], nil
    }
    // At EOF, return whatever remains as the final token.
    if atEOF {
        return len(data), data, nil
    }
    // No delimiter yet: request more data.
    return 0, nil, nil
})
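For completeness, a minimal sketch of wiring this split function to accepted TCP connections might look like the following; the handleConn function, the msgs channel, and the port choice are illustrative, not taken from the question:

package main

import (
    "bufio"
    "bytes"
    "log"
    "net"
)

// handleConn scans one TCP connection and hands each SIP message
// (delimited by \r\n\r\n) to the msgs channel.
func handleConn(conn net.Conn, msgs chan<- []byte) {
    defer conn.Close()
    scanner := bufio.NewScanner(conn)
    scanner.Split(func(data []byte, atEOF bool) (advance int, token []byte, err error) {
        delim := []byte("\r\n\r\n")
        if atEOF && len(data) == 0 {
            return 0, nil, nil
        }
        if i := bytes.Index(data, delim); i >= 0 {
            return i + len(delim), data[:i], nil
        }
        if atEOF {
            return len(data), data, nil
        }
        return 0, nil, nil // request more data
    })
    for scanner.Scan() {
        // Bytes() may be overwritten by the next Scan, so copy before handing off.
        msg := append([]byte(nil), scanner.Bytes()...)
        msgs <- msg
    }
    if err := scanner.Err(); err != nil {
        log.Println("scan error:", err)
    }
}

func main() {
    ln, err := net.Listen("tcp", ":5060") // conventional SIP port, adjust as needed
    if err != nil {
        log.Fatal(err)
    }
    msgs := make(chan []byte)
    go func() { // processing goroutine
        for msg := range msgs {
            log.Printf("got message (%d bytes)", len(msg))
        }
    }()
    for {
        conn, err := ln.Accept()
        if err != nil {
            log.Fatal(err)
        }
        go handleConn(conn, msgs)
    }
}

Note that bufio.Scanner has a default maximum token size (64 KB); if a single message could exceed that, raise the limit with scanner.Buffer before scanning.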

Related

Getting EOF on 2nd prompt when using a file as Stdin (Golang)

I am trying to do functional testing of a CLI app, similar to this approach.
As the command asks for a few inputs at the command prompt, I am putting them in a file and setting it as os.Stdin.
cmd := exec.Command(path.Join(dir, binaryName), "myArg")
tmpfile := setStdin("TheMasterPassword\nSecondAnswer\n12121212\n")
cmd.Stdin = tmpfile
output, err := cmd.CombinedOutput()
The setStdin function just creates a temp file, writes the string to it, and returns the *os.File.
Now, I am expecting TheMasterPassword to be the first input, and that works. But for the second input I always get Critical Error: EOF.
The function I am using for asking and getting user input is this:
func Ask(question string, minLen int) string {
    reader := bufio.NewReader(os.Stdin)
    for {
        fmt.Printf("%s: ", question)
        response, err := reader.ReadString('\n')
        ExitIfError(err)
        if len(response) >= minLen {
            return strings.TrimSpace(response)
        } else {
            fmt.Printf("Provide at least %d character.\n", minLen)
        }
    }
}
Can you please help me to find out what's going wrong?
Thanks a lot!
Adding setStdin as requested
func setStdin(userInput string) *os.File {
    tmpfile, err := ioutil.TempFile("", "test_stdin_")
    util.ExitIfError(err)
    _, err = tmpfile.Write([]byte(userInput))
    util.ExitIfError(err)
    _, err = tmpfile.Seek(0, 0)
    util.ExitIfError(err)
    return tmpfile
}
It pretty much looks like in your app you call Ask() whenever you want a single input line.
Inside Ask() you create a bufio.Reader to read from os.Stdin. Know that bufio.Reader, as its name suggests, uses buffered reading, meaning it may read more data from its source than what is returned by its methods (Reader.ReadString() in this case). This means that if you only use it to read one (or a few) lines and then throw away the reader, you also throw away the buffered, unread data.
So the next time you call Ask() and attempt to read from os.Stdin, you will not continue from where you left off...
To fix this issue, create only a single bufio.Reader from os.Stdin (store it in a global variable, for example), and inside Ask() always use this single reader. That way buffered, unread data will not be lost between Ask() calls. Of course, this solution is not valid for calling from multiple goroutines, but neither is reading from a single os.Stdin.
For example:
var reader = bufio.NewReader(os.Stdin)

func Ask(question string, minLen int) string {
    // use the global reader here...
}
Also note that using bufio.Scanner would be easier in your case. But again, bufio.Scanner may also read more data from its source than needed, so you have to use a shared bufio.Scanner here too. Also note that Reader.ReadString() returns a string containing the delimiter (a line ending in \n in your case), which you have to trim, while Scanner.Text() (with the default line-splitting function) strips that before returning the line. That's another simplification you can take advantage of.
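For example, a minimal sketch of that Scanner variant; ExitIfError is your existing helper (a stand-in is included here only so the snippet compiles on its own):

package main

import (
    "bufio"
    "fmt"
    "io"
    "os"
    "strings"
)

// One shared Scanner for the whole process, so buffered input is not lost between Ask() calls.
var scanner = bufio.NewScanner(os.Stdin)

func Ask(question string, minLen int) string {
    for {
        fmt.Printf("%s: ", question)
        if !scanner.Scan() {
            err := scanner.Err()
            if err == nil {
                err = io.EOF // Scan reports false with a nil error at end of input
            }
            ExitIfError(err)
        }
        response := strings.TrimSpace(scanner.Text())
        if len(response) >= minLen {
            return response
        }
        fmt.Printf("Provide at least %d characters.\n", minLen)
    }
}

// ExitIfError is a stand-in for your existing helper.
func ExitIfError(err error) {
    if err != nil {
        fmt.Fprintln(os.Stderr, "Critical Error:", err)
        os.Exit(1)
    }
}

func main() {
    fmt.Println(Ask("Master password", 8))
}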

Reading from a file from bufio with a semi complex sequencing through file

So there may be questions like this already, but it's not a super easy thing to google. Basically I have a file that is a set of protobufs encoded and sequenced as they normally are per the protobuf spec.
So think of the bytes values being chunked something like this throughout the file:
[EncodeVarInt(size of protobuf struct)] [protobuf struct bytes]
So you have a few bytes, read one at a time, that are used to make a large jump of a read over the protobuf structure.
My implementation, using the os.File ReadAt method, currently looks something like this.
// getting the next value in a file context feature
func (geobuf *Geobuf_Reader) Next() bool {
    if geobuf.EndPos <= geobuf.Pos {
        return false
    } else {
        startpos := int64(geobuf.Pos)
        for int(geobuf.Get_Byte(geobuf.Pos)) > 127 {
            geobuf.Pos += 1
        }
        geobuf.Pos += 1
        sizebytes := make([]byte, geobuf.Pos-int(startpos))
        geobuf.File.ReadAt(sizebytes, startpos)
        size, _ := DecodeVarint(sizebytes)
        geobuf.Feat_Pos = [2]int{int(size), geobuf.Pos}
        geobuf.Pos = geobuf.Pos + int(size)
        return true
    }
    return false
}

// reads a geobuf feature as geojson
func (geobuf *Geobuf_Reader) Feature() *geojson.Feature {
    // getting raw bytes
    a := make([]byte, geobuf.Feat_Pos[0])
    geobuf.File.ReadAt(a, int64(geobuf.Feat_Pos[1]))
    return Read_Feature(a)
}
How can I implement something like bufio or another chunked-reading mechanism to speed up so many file ReadAt's? Most bufio implementations I've seen are built around a specific delimiter. Thanks in advance; hopefully this wasn't a horrible question.
Package bufio
import "bufio"
type SplitFunc
SplitFunc is the signature of the split function used to tokenize the
input. The arguments are an initial substring of the remaining
unprocessed data and a flag, atEOF, that reports whether the Reader
has no more data to give. The return values are the number of bytes to
advance the input and the next token to return to the user, plus an
error, if any. If the data does not yet hold a complete token, for
instance if it has no newline while scanning lines, SplitFunc can
return (0, nil, nil) to signal the Scanner to read more data into the
slice and try again with a longer slice starting at the same point in
the input.
If the returned error is non-nil, scanning stops and the error is
returned to the client.
The function is never called with an empty data slice unless atEOF is
true. If atEOF is true, however, data may be non-empty and, as always,
holds unprocessed text.
type SplitFunc func(data []byte, atEOF bool) (advance int, token []byte, err error)
Use bufio.Scanner and write a custom SplitFunc for your protobuf structs.
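A minimal sketch of such a SplitFunc, using encoding/binary's Uvarint to decode the length prefix; the file name and the 16 MB message limit are assumptions, adjust them to your data:

package main

import (
    "bufio"
    "encoding/binary"
    "errors"
    "io"
    "os"
)

// splitLengthPrefixed tokenizes a stream of [uvarint length][message bytes] records.
func splitLengthPrefixed(data []byte, atEOF bool) (advance int, token []byte, err error) {
    if atEOF && len(data) == 0 {
        return 0, nil, nil
    }
    size, n := binary.Uvarint(data)
    if n == 0 {
        // Length prefix not complete yet.
        if atEOF {
            return 0, nil, io.ErrUnexpectedEOF
        }
        return 0, nil, nil
    }
    if n < 0 {
        return 0, nil, errors.New("malformed varint length prefix")
    }
    total := n + int(size)
    if len(data) < total {
        // Message not complete yet.
        if atEOF {
            return 0, nil, io.ErrUnexpectedEOF
        }
        return 0, nil, nil
    }
    return total, data[n:total], nil
}

func main() {
    f, err := os.Open("features.geobuf") // hypothetical file name
    if err != nil {
        panic(err)
    }
    defer f.Close()

    scanner := bufio.NewScanner(f)
    scanner.Buffer(make([]byte, 0, 1024*1024), 16*1024*1024) // allow large messages
    scanner.Split(splitLengthPrefixed)
    for scanner.Scan() {
        raw := scanner.Bytes() // one protobuf-encoded feature, e.g. Read_Feature(raw)
        _ = raw
    }
    if err := scanner.Err(); err != nil {
        panic(err)
    }
}

The scanner does the buffered reading for you, so the per-feature ReadAt calls go away; scanner.Bytes() is only valid until the next Scan, so copy it if you keep it around.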

Golang buffer with concurrent readers

I want to build a buffer in Go that supports multiple concurrent readers and one writer. Whatever is written to the buffer should be read by all readers. New readers are allowed to drop in at any time, which means already written data must be able to be played back for late readers.
The buffer should satisfy the following interface:
type MyBuffer interface {
    Write(p []byte) (n int, err error)
    NextReader() io.Reader
}
Do you have any suggestions for such an implementation, preferably using built-in types?
Depending on the nature of this writer and how you use it, keeping everything in memory (to be able to re-play everything for readers joining later) is very risky and might demand a lot of memory, or cause your app to crash due to running out of memory.
Using it for a "low-traffic" logger, keeping everything in memory is probably fine, but streaming audio or video, for example, is most likely not.
If the reader implementations below read all the data that was written to the buffer, their Read() method will properly report io.EOF. Care must be taken, as some constructs (such as bufio.Scanner) may not read more data once io.EOF is encountered (but this is not a flaw of our implementation).
If you want the readers of our buffer to wait for new data to be written instead of returning io.EOF when no more data is available, you may wrap the returned readers in a "tail reader" presented here: Go: "tail -f"-like generator.
"Memory-safe" file implementation
Here is an extremely simple and elegant solution. It uses a file to write to, and also uses files to read from. The synchronization is essentially provided by the operating system. This does not risk an out-of-memory error, as the data is stored solely on disk. Depending on the nature of your writer, this may or may not be sufficient.
I will use the following interface instead, because Close() is important in the case of files.
type MyBuf interface {
    io.WriteCloser
    NewReader() (io.ReadCloser, error)
}
And the implementation is extremely simple:
type mybuf struct {
    *os.File
}

func (mb *mybuf) NewReader() (io.ReadCloser, error) {
    f, err := os.Open(mb.Name())
    if err != nil {
        return nil, err
    }
    return f, nil
}

func NewMyBuf(name string) (MyBuf, error) {
    f, err := os.Create(name)
    if err != nil {
        return nil, err
    }
    return &mybuf{File: f}, nil
}
Our mybuf type embeds *os.File, so we get the Write() and Close() methods for "free".
NewReader() simply opens the existing backing file for reading (in read-only mode) and returns it, again taking advantage of the fact that *os.File implements io.ReadCloser.
Creating a new MyBuf value is implemented in the NewMyBuf() function, which may also return an error if creating the file fails.
Notes:
Note that since mybuf embeds *os.File, it is possible with a type assertion to "reach" other exported methods of os.File even though they are not part of the MyBuf interface. I do not consider this a flaw, but if you want to disallow this, you have to change the implementation of mybuf to not embed os.File but rather have it as a named field (but then you have to add the Write() and Close() methods yourself, properly forwarding to the os.File field).
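A quick usage sketch of the file-backed buffer; the file name is illustrative, and the fmt and io/ioutil imports are assumed alongside the types above:

func main() {
    buf, err := NewMyBuf("shared.bin") // hypothetical backing file name
    if err != nil {
        panic(err)
    }
    defer buf.Close()

    buf.Write([]byte("hello "))

    // A late-joining reader still sees what was written before it was created.
    r, err := buf.NewReader()
    if err != nil {
        panic(err)
    }
    defer r.Close()

    buf.Write([]byte("world"))

    data, err := ioutil.ReadAll(r)
    if err != nil {
        panic(err)
    }
    fmt.Println(string(data)) // prints "hello world"
}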
In-memory implementation
If the file implementation is not sufficient, here comes an in-memory implementation.
Since we're now in-memory only, we will use the following interface:
type MyBuf interface {
    io.Writer
    NewReader() io.Reader
}
The idea is to store all byte slices that are ever passed to our buffer. Readers serve the stored slices when Read() is called; each reader keeps track of how many of the stored slices have been served by its Read() method. Synchronization must be dealt with; we will use a simple sync.RWMutex.
Without further ado, here is the implementation:
type mybuf struct {
    data [][]byte
    sync.RWMutex
}

func (mb *mybuf) Write(p []byte) (n int, err error) {
    if len(p) == 0 {
        return 0, nil
    }
    // Cannot retain p, so we must copy it:
    p2 := make([]byte, len(p))
    copy(p2, p)
    mb.Lock()
    mb.data = append(mb.data, p2)
    mb.Unlock()
    return len(p), nil
}

type mybufReader struct {
    mb   *mybuf // buffer we read from
    i    int    // next slice index
    data []byte // current data slice to serve
}

func (mbr *mybufReader) Read(p []byte) (n int, err error) {
    if len(p) == 0 {
        return 0, nil
    }
    // Do we have data to send?
    if len(mbr.data) == 0 {
        mb := mbr.mb
        mb.RLock()
        if mbr.i < len(mb.data) {
            mbr.data = mb.data[mbr.i]
            mbr.i++
        }
        mb.RUnlock()
    }
    if len(mbr.data) == 0 {
        return 0, io.EOF
    }
    n = copy(p, mbr.data)
    mbr.data = mbr.data[n:]
    return n, nil
}

func (mb *mybuf) NewReader() io.Reader {
    return &mybufReader{mb: mb}
}

func NewMyBuf() MyBuf {
    return &mybuf{}
}
Note that the general contract of Writer.Write() includes that an implementation must not retain the passed slice, so we have to make a copy of it before "storing" it.
Also note that the readers' Read() attempts to hold the lock for a minimal amount of time: it only locks when it needs a new data slice from the buffer, and it only uses read-locking, so if a reader still has a partial data slice, it serves that in Read() without locking or touching the buffer.
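A quick sketch of how this buffer behaves; note that a late-joining reader still receives everything written earlier (the fmt and io/ioutil imports are assumed alongside the code above):

func main() {
    buf := NewMyBuf()
    buf.Write([]byte("hello "))

    r1 := buf.NewReader() // joins early
    buf.Write([]byte("world"))
    r2 := buf.NewReader() // joins late, still gets the full history

    for _, r := range []io.Reader{r1, r2} {
        data, err := ioutil.ReadAll(r)
        if err != nil {
            panic(err)
        }
        fmt.Println(string(data)) // both print "hello world"
    }
}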
I linked to the append-only commit log because it seems very similar to your requirements. I am pretty new to distributed systems and the commit log, so I may be butchering a couple of the concepts, but the Kafka introduction clearly explains everything with nice charts.
Go is also pretty new to me, so I'm sure there's a better way to do it.
But perhaps you could model your buffer as a slice; I think there are a couple of cases:
buffer has no readers, new data is written to the buffer, buffer length grows
buffer has one/many reader(s):
reader subscribes to buffer
buffer creates and returns a channel to that client
buffer maintains a list of client channels
a write occurs -> the buffer loops through all client channels and publishes to each (pub-sub)
This addresses a pub-sub real-time consumer stream, where messages are fanned out, but it does not address the backfill; a minimal sketch of the fan-out part follows below.
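A minimal sketch of that fan-out idea (no backfill); the names and the per-subscriber channel capacity are just illustrative choices:

type PubSubBuffer struct {
    mu   sync.Mutex
    subs []chan []byte
}

// Subscribe registers a new consumer and returns its channel.
func (b *PubSubBuffer) Subscribe() <-chan []byte {
    ch := make(chan []byte, 16) // small buffer so slow readers don't stall writes immediately
    b.mu.Lock()
    b.subs = append(b.subs, ch)
    b.mu.Unlock()
    return ch
}

// Write copies p and publishes it to every subscriber channel.
func (b *PubSubBuffer) Write(p []byte) (int, error) {
    msg := append([]byte(nil), p...) // Write must not retain p
    b.mu.Lock()
    for _, ch := range b.subs {
        ch <- msg // blocks if a subscriber's channel is full
    }
    b.mu.Unlock()
    return len(p), nil
}

Late subscribers only see writes made after they subscribe, which is exactly the backfill gap the Kafka-style consumer offset solves.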
Kafka enables a backfill and their intro illustrates how it can be done :)
This offset is controlled by the consumer: normally a consumer will
advance its offset linearly as it reads records, but, in fact, since
the position is controlled by the consumer it can consume records in
any order it likes. For example a consumer can reset to an older
offset to reprocess data from the past or skip ahead to the most
recent record and start consuming from "now".
This combination of features means that Kafka consumers are very
cheap—they can come and go without much impact on the cluster or on
other consumers. For example, you can use our command line tools to
"tail" the contents of any topic without changing what is consumed by
any existing consumers.
I had to do something similar as part of an experiment, so sharing:
type MultiReaderBuffer struct {
    mu  sync.RWMutex
    buf []byte
}

func (b *MultiReaderBuffer) Write(p []byte) (n int, err error) {
    if len(p) == 0 {
        return 0, nil
    }
    b.mu.Lock()
    b.buf = append(b.buf, p...)
    b.mu.Unlock()
    return len(p), nil
}

func (b *MultiReaderBuffer) NewReader() io.Reader {
    return &mrbReader{mrb: b}
}

type mrbReader struct {
    mrb *MultiReaderBuffer
    off int
}

func (r *mrbReader) Read(p []byte) (n int, err error) {
    if len(p) == 0 {
        return 0, nil
    }
    r.mrb.mu.RLock()
    n = copy(p, r.mrb.buf[r.off:])
    r.mrb.mu.RUnlock()
    if n == 0 {
        return 0, io.EOF
    }
    r.off += n
    return n, nil
}

Golang high cpu usage on simple webserver unable to understand why?

So I have a simple net/http webserver. All it does is deliver 100 MB of random bytes, which I intend to use for network speed testing. My handler for the 100 MB endpoint is really simple (pasted below). The code works fine and I get my random byte file; the problem is that when I run this and someone downloads those 100 megabytes, the CPU for this program shoots up to 150% and stays there until the handler finishes running. Am I doing something very wrong here? What could I do to improve this handler's performance?
func downloadHandler(w http.ResponseWriter, r *http.Request) {
    str := RandStringBytes(8192) // generates 8192 bytes of randomness
    sz := 1000 * 1000 * 100      // 100 Megabytes
    iter := sz/len(str) + 1
    w.Header().Set("Content-Type", "application/octet-stream")
    w.Header().Set("Content-Length", strconv.Itoa(sz))
    for i := 0; i < iter; i++ {
        fmt.Fprintf(w, str)
    }
}
The problem is that fmt.Fprintf() expects a format string:
func Fprintf(w io.Writer, format string, a ...interface{}) (n int, err error)
And you pass it a big, 8 KB format string. The fmt package has to analyze the format string; it is not something that gets to the output as-is. Most definitely this is what is eating your CPU.
If the random string contains the special % sign, that makes your case even worse, as fmt.Fprintf() then expects further arguments which you don't "deliver", so the fmt package also has to (and will) include error messages in the output, such as:
fmt.Fprintf(os.Stdout, "aaa%bbb%d")
Output:
aaa%!b(MISSING)bb%!d(MISSING)
Use fmt.Fprint() instead which does not expect a format string:
fmt.Fprint(w, str)
Or even better, convert your random string to a byte slice once, and just keep writing that:
data := []byte(str)
for i := 0; i < iter; i++ {
    if _, err := w.Write(data); err != nil {
        // Handle error, e.g. return
    }
}
For delivering a large amount of data, you won't get a faster solution than writing a prepared byte slice in a loop (maybe slightly faster if you vary the size of the slice). If your solution is still "slow", that might be due to your RandStringBytes() function, which we don't know anything about, or your output might be compressed (gzipped) if you use other handlers or some framework (compression uses relatively high CPU). Also, if the client that receives the response is on the same computer (e.g. a browser), it, or a firewall / antivirus software, may check or analyze the response for malicious code, which may also be resource-intensive.
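Putting it together, a sketch of the whole handler with a prepared byte slice; RandStringBytes is the function from the question, the net/http and strconv imports are assumed, and the loop below writes exactly the declared Content-Length (the original sz/len(str)+1 loop tries to write slightly more than it declares):

func downloadHandler(w http.ResponseWriter, r *http.Request) {
    data := []byte(RandStringBytes(8192)) // one 8 KB chunk, reused for every write
    size := 1000 * 1000 * 100             // 100 MB

    w.Header().Set("Content-Type", "application/octet-stream")
    w.Header().Set("Content-Length", strconv.Itoa(size))

    for remaining := size; remaining > 0; remaining -= len(data) {
        chunk := data
        if remaining < len(chunk) {
            chunk = chunk[:remaining] // trim the final write so exactly size bytes go out
        }
        if _, err := w.Write(chunk); err != nil {
            return // client went away; nothing more to do
        }
    }
}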

Most efficient way to read Zlib compressed file in Golang?

I'm reading in and at the same time parsing (decoding) a file in a custom format, which is compressed with zlib. My question is how can I efficiently uncompress and then parse the uncompressed content without growing the slice? I would like to parse it whilst reading it into a reusable buffer.
This is for a speed-sensitive application and so I'd like to read it in as efficiently as possible. Normally I would just ioutil.ReadAll and then loop again through the data to parse it. This time I'd like to parse it as it's read, without having to grow the buffer into which it is read, for maximum efficiency.
Basically I'm thinking that if I can find a buffer of the perfect size then I can read into it, parse it, write over the buffer again, parse that, and so on. The issue here is that the zlib reader appears to read an arbitrary number of bytes each time Read(b) is called; it does not fill the slice. Because of this I don't know what the perfect buffer size would be. I'm concerned that it might break up some of the data that I wrote into two chunks, making it difficult to parse, because, say, a uint64 could be split across two reads and therefore not occur in the same buffer read. Or perhaps that can never happen and it's always read out in chunks of the same size as were originally written?
What is the optimal buffer size, or is there a way to calculate this?
If I have written data into the zlib writer with f.Write(b []byte), is it possible that this same data could be split into two reads when reading back the compressed data (meaning I will have to keep a history while parsing), or will it always come back in the same read?
You can wrap your zlib reader in a bufio.Reader, then implement a specialized reader on top that rebuilds your chunks of data by reading from the bufio.Reader until a full chunk is read. Be aware that bufio.Read calls Read at most once on the underlying Reader, so you need to call ReadByte in a loop. bufio will, however, take care of the unpredictable size of the data returned by the zlib reader for you.
If you do not want to implement a specialized reader, you can just go with a bufio.Reader and read as many bytes as needed with ReadByte() to fill a given data type. The optimal buffer size is at least the size of your largest data structure, up to whatever you can shove into memory.
If you read directly from the zlib reader, there is no guarantee that your data won't be split between two reads.
Another, maybe cleaner, solution is to implement a writer for your data, then use io.Copy(your_writer, zlib_reader).
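For example, a minimal sketch of the bufio approach, using io.ReadFull (which loops over Read for you) to get complete chunks out of the decompressed stream; the record layout here (a uint64 length followed by that many payload bytes) and the file name are assumptions, not the asker's actual format:

package main

import (
    "bufio"
    "compress/zlib"
    "encoding/binary"
    "fmt"
    "io"
    "os"
)

func main() {
    f, err := os.Open("data.zlib") // hypothetical input file
    if err != nil {
        panic(err)
    }
    defer f.Close()

    zr, err := zlib.NewReader(f)
    if err != nil {
        panic(err)
    }
    defer zr.Close()

    br := bufio.NewReader(zr) // smooths out the zlib reader's arbitrary read sizes

    var length uint64
    buf := make([]byte, 0, 4096) // reused between records
    for {
        if err := binary.Read(br, binary.LittleEndian, &length); err != nil {
            if err == io.EOF {
                break // clean end of stream
            }
            panic(err)
        }
        if uint64(cap(buf)) < length {
            buf = make([]byte, length)
        }
        buf = buf[:length]
        if _, err := io.ReadFull(br, buf); err != nil {
            panic(err) // io.ErrUnexpectedEOF here means a truncated record
        }
        fmt.Printf("parsed record of %d bytes\n", len(buf))
    }
}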
OK, so I figured this out in the end using my own implementation of a reader.
Basically the struct looks like this:
type reader struct {
    at  int
    n   int
    f   io.ReadCloser
    buf []byte
}
This can be attached to the zlib reader:
// Open file for reading
fi, err := os.Open(filename)
if err != nil {
    return nil, err
}
defer fi.Close()

// Attach zlib reader
r := new(reader)
r.buf = make([]byte, 2048)
r.f, err = zlib.NewReader(fi)
if err != nil {
    return nil, err
}
defer r.f.Close()
Then x number of bytes can be read straight out of the zlib reader using a function like this:
mydata := r.readx(10)
func (r *reader) readx(x int) []byte {
    for r.n < x {
        copy(r.buf, r.buf[r.at:r.at+r.n])
        r.at = 0
        m, err := r.f.Read(r.buf[r.n:])
        if err != nil {
            panic(err)
        }
        r.n += m
    }
    tmp := make([]byte, x)
    copy(tmp, r.buf[r.at:r.at+x]) // must be copied to avoid memory leak
    r.at += x
    r.n -= x
    return tmp
}
Note that I have no need to check for EOF because my parser should stop itself at the right place.
