golang - bufio read multiline until (CRLF) \r\n delimiter - go

I am trying to implement my own beanstalkd client as a way of learning go. https://github.com/kr/beanstalkd/blob/master/doc/protocol.txt
At the moment, I am using bufio to read in a line of data delimited by \n.
res, err := this.reader.ReadString('\n')
This is fine when I send a single command and read a single-line response like INSERTED %d\r\n, but I run into difficulties when I try to reserve a job, because the job body could span multiple lines and so I cannot use the \n delimiter.
Is there a way to read into the buffer until CRLF?
e.g. when I send the reserve command. My expected response is as follows:
RESERVED <id> <bytes>\r\n
<data>\r\n
But data could contain \n, so I need to read until the \r\n.
Alternatively - is there a way of reading a specific number of bytes as specified in <bytes> in example response above?
At the moment, I have (err handling removed):
func (this *Bean) receiveLine() (string, error) {
res, err := this.reader.ReadString('\n')
return res, err
}
func (this *Bean) receiveBody(numBytesToRead int) ([]byte, error) {
res, err := this.reader.ReadString('\r\n') // What to do here to read to CRLF / up to number of expected bytes?
return res, err
}
func (this *Bean) Reserve() (*Job, error) {
this.send("reserve\r\n")
res, err := this.receiveLine()
var jobId uint64
var bodylen int
_, err = fmt.Sscanf(res, "RESERVED %d %d\r\n", &jobId, &bodylen)
body, err := this.receiveBody(bodylen)
job := new(Job)
job.Id = jobId
job.Body = body
return job, nil
}

res, err := this.reader.Read('\n')
Does not make any sense to me. Did you mean ReadBytes/ReadSlice/ReadString?
You need bufio.Scanner.
Define your bufio.SplitFunc (example is a copy of bufio.ScanLines with modifications to look for '\r\n'). Modify it to match your case.
// dropCR drops a terminal \r from the data.
func dropCR(data []byte) []byte {
if len(data) > 0 && data[len(data)-1] == '\r' {
return data[0 : len(data)-1]
}
return data
}
func ScanCRLF(data []byte, atEOF bool) (advance int, token []byte, err error) {
if atEOF && len(data) == 0 {
return 0, nil, nil
}
if i := bytes.Index(data, []byte{'\r','\n'}); i >= 0 {
// We have a full newline-terminated line.
return i + 2, dropCR(data[0:i]), nil
}
// If we're at EOF, we have a final, non-terminated line. Return it.
if atEOF {
return len(data), dropCR(data), nil
}
// Request more data.
return 0, nil, nil
}
Now, wrap your io.Reader with your custom scanner.
scanner := bufio.NewScanner(this.reader)
// Set the split function for the scanning operation.
scanner.Split(ScanCRLF)
// Validate the input
for scanner.Scan() {
fmt.Printf("%s\n", scanner.Text())
}
if err := scanner.Err(); err != nil {
fmt.Printf("Invalid input: %s", err)
}
Read bufio package's source code about Scanner.
Alternatively - is there a way of reading a specific number of bytes as specified in <bytes> in example response above?
First you need to read the "RESERVED <id> <bytes>\r\n" line somehow.
And then you can use
nr_of_bytes := read_number_of_bytes_somehow(this.reader)
buf := make([]byte, nr_of_bytes)
io.ReadFull(this.reader, buf) // a plain Read may return fewer bytes than len(buf)
or LimitedReader.
But I don't like this approach.
Thanks for this - reader.Read('\n') was a typo - I corrected question. I have also attached example code of where I have got so far. As you can see, I can get the number of expected bytes of the body. Could you elaborate on why you don't like the idea of reading a specific number of bytes? This seems most logical?
I'd like to see Bean's definition, especially reader's part.
Imagine this counter is wrong somehow.
1. It's too short: do you need to find the following "\r\n" and discard everything up to that point, or not? Why do you need the counter in the first place, then?
2. It's bigger than it should be (or even worse, it's huge!).
2.1 No next message in the reader: fine, the read is shorter than expected, but that's OK.
2.2 There is a next message waiting: bah, you read part of it and there is no easy way to recover.
2.3 It's huge: you can't allocate the memory even if the message is only 1 byte.
Byte counters like this are in general designed to verify the message.
And it looks like that is the case with the beanstalkd protocol.
Use Scanner, parse message, check length with expected number ... profit
UPD
Be warned, the default bufio.Scanner can't read tokens larger than 64k; set the max length with scanner.Buffer first. And that's bad, because you can't change this option on the fly, and some data may already have been "pre"-read by the scanner.
UPD2
Thinking about my last update: take a look at how net/textproto implements dotReader as a simple state machine. You could do something similar, reading the command line first and then checking the "expected bytes" count against the payload.

Related

Streaming Stdout and Stderr over SSH, manipulate the stream and then print to local Stdout, and Stderr

I'm performing a bunch of operations over SSH on a remote machine. I stream its stdout and stderr, which are then consumed by a writer that writes to the local stdout and stderr, along with byte buffers.
Just before the writer consumes it, I want to perform a series of string manipulations on it and then write to my screen and buffer. Up to this point, it all works fine and dandy.
My issue is that now it's not a stream anymore: it hangs and then outputs the whole glob in one chunk. I want it to be real time, so I put channels in my goroutines, but with no improvement. Below are my functions; let me know if you can spot a reason why, or possibly a better way of achieving this.
// sending
func handleStdStream(filters []string, replaceFilters map[string]string, pipe io.Reader, readers chan io.Reader) {
if filters != nil {
// filters exist
// read first 8 bytes
res := readPipe(8, pipe)
// get each line from the resulting streamed output
for _, str := range strings.Split(res, "\n") {
if str != "" {
out := lineFilterAndReplace(str, filters, replaceFilters)
// instantiate an io.Reader obj from the given string
outReader := strings.NewReader(out)
readers <- outReader
}
}
} else {
// filters dont exist
if len(replaceFilters) > 0 {
res := readPipe(8, pipe)
for _, str := range strings.Split(res, "\n") {
if str != "" {
out := lineReplace(str, replaceFilters)
// instantiate an io.Reader obj from the given string
outReader := strings.NewReader(out)
readers <- outReader
}
}
} else {
readers <- pipe
}
}
}
// receiving
outReaders := make(chan io.Reader)
go handleStdStream(outFilters, replaceFilters, stdoutIn, outReaders)
go func() {
for {
pipe := <-outReaders
_, errStdout = io.Copy(outWriter, pipe)
}
// _, errStdout = io.Copy(outWriter, stdoutIn)
}()
I don't think you need channels or goroutines to accomplish this. The Writer and Reader interfaces are already streaming; you sip bytes from a Reader continuously until you hit EOF or an error and you hand off bytes to a Writer continuously until you're done or you get an error. On its own, processing a stream does not require any concurrency, so doing this sequentially in a single goroutine is quite appropriate.
You shouldn't ignore error returns. If a function or method returns an error value, you need to check it. In the case of I/O, you usually need to stop reading from a Reader when it returns an error and you usually need to stop writing to a Writer when it returns an error. In the case of a Reader you also have to check for the special "error" value io.EOF.
I think using Scanner from the bufio package is better than trying to do your own buffering/splitting. By default, Scanner splits input on newlines (Unix-style LF or DOS-style CRLF). It also gets rid of the need to check for io.EOF, provided you only interact with the Reader through the Scanner.
Consider the following version of handleStdStream:
func handleStdStream(filters []string, replaceFilters map[string]string, pipe io.Reader, w io.Writer) error {
scanner := bufio.NewScanner(pipe)
for scanner.Scan() {
str := scanner.Text()
if str == "" {
continue
}
out := ""
if len(filters) != 0 {
out = lineFilterAndReplace(str, filters, replaceFilters)
} else {
out = lineReplace(str, replaceFilters)
}
if _, err := w.Write([]byte(out)); err != nil {
return err
}
}
if err := scanner.Err(); err != nil {
return err
}
return nil
}
You would use it like this:
err := handleStdStream(filters, replaceFilters, pipe, outWriter)
if err != nil {
// do something, like printing the error to a log or stderr
}

Golang reading from serial

I'm trying to read from a serial port (a GPS device on a Raspberry Pi).
Following the instructions from http://www.modmypi.com/blog/raspberry-pi-gps-hat-and-python
I can read from shell using
stty -F /dev/ttyAMA0 raw 9600 cs8 clocal -cstopb
cat /dev/ttyAMA0
I get well formatted output
$GNGLL,5133.35213,N,00108.27278,W,160345.00,A,A*65
$GNRMC,160346.00,A,5153.35209,N,00108.27286,W,0.237,,290418,,,A*75
$GNVTG,,T,,M,0.237,N,0.439,K,A*35
$GNGGA,160346.00,5153.35209,N,00108.27286,W,1,12,0.67,81.5,M,46.9,M,,*6C
$GNGSA,A,3,29,25,31,20,26,23,21,16,05,27,,,1.11,0.67,0.89*10
$GNGSA,A,3,68,73,83,74,84,75,85,67,,,,,1.11,0.67,0.89*1D
$GPGSV,4,1,15,04,,,34,05,14,040,21,09,07,330,,16,45,298,34*40
$GPGSV,4,2,15,20,14,127,18,21,59,154,30,23,07,295,26,25,13,123,22*74
$GPGSV,4,3,15,26,76,281,40,27,15,255,20,29,40,068,19,31,34,199,33*7C
$GPGSV,4,4,15,33,29,198,,36,23,141,,49,30,172,*4C
$GLGSV,3,1,11,66,00,325,,67,13,011,20,68,09,062,16,73,12,156,21*60
$GLGSV,3,2,11,74,62,177,20,75,53,312,36,76,08,328,,83,17,046,25*69
$GLGSV,3,3,11,84,75,032,22,85,44,233,32,,,,35*62
$GNGLL,5153.35209,N,00108.27286,W,160346.00,A,A*6C
$GNRMC,160347.00,A,5153.35205,N,00108.27292,W,0.216,,290418,,,A*7E
$GNVTG,,T,,M,0.216,N,0.401,K,A*3D
$GNGGA,160347.00,5153.35205,N,00108.27292,W,1,12,0.67,81.7,M,46.9,M,,*66
$GNGSA,A,3,29,25,31,20,26,23,21,16,05,27,,,1.11,0.67,0.89*10
$GNGSA,A,3,68,73,83,74,84,75,85,67,,,,,1.11,0.67,0.89*1D
$GPGSV,4,1,15,04,,,34,05,14,040,21,09,07,330,,16,45,298,34*40
(I've put some random data in)
I'm trying to read this in Go. Currently, I have
package main
import "fmt"
import "log"
import "github.com/tarm/serial"
func main() {
config := &serial.Config{
Name: "/dev/ttyAMA0",
Baud: 9600,
ReadTimeout: 1,
Size: 8,
}
stream, err := serial.OpenPort(config)
if err != nil {
log.Fatal(err)
}
buf := make([]byte, 1024)
for {
n, err := stream.Read(buf)
if err != nil {
log.Fatal(err)
}
s := string(buf[:n])
fmt.Println(s)
}
}
But this prints malformed data. I suspect that this is due to the buffer size or the value of Size in the config struct being wrong, but I'm not sure how to get those values from the stty settings.
Looking back, I think the issue is that I'm getting a stream and I want to be able to iterate over lines of the stty, rather than chunks. This is how the stream is outputted:
$GLGSV,3
,1,09,69
,10,017,
,70,43,0
69,,71,3
2,135,27
,76,23,2
32,22*6F
$GLGSV
,3,2,09,
77,35,30
0,21,78,
11,347,,
85,31,08
1,30,86,
72,355,3
6*6C
$G
LGSV,3,3
,09,87,2
4,285,30
*59
$GN
GLL,5153
.34919,N
,00108.2
7603,W,1
92901.00
,A,A*6A
The struct you get back from serial.OpenPort() contains a pointer to an open os.File corresponding to the opened serial port connection. When you Read() from this, the library calls Read() on the underlying os.File.
The documentation for this function call is:
Read reads up to len(b) bytes from the File. It returns the number of bytes read and any error encountered. At end of file, Read returns 0, io.EOF.
This means you have to keep track of how much data was read. You also have to keep track of whether there were newlines, if this is important to you. Unfortunately, the underlying *os.File is not exported, so you'll find it difficult to use tricks like bufio.ReadLine(). It may be worth modifying the library and sending a pull request.
As Matthew Rankin noted in a comment, Port implements io.ReadWriter so you can simply use bufio to read by lines.
stream, err := serial.OpenPort(config)
if err != nil {
log.Fatal(err)
}
scanner := bufio.NewScanner(stream)
for scanner.Scan() {
fmt.Println(scanner.Text()) // Println will add back the final '\n'
}
if err := scanner.Err(); err != nil {
log.Fatal(err)
}
Change
fmt.Println(s)
to
fmt.Print(s)
and you will probably get what you want.
Or did I misunderstand the question?
Two additions to Michael Hamptom's answer which can be useful:
line endings
You might receive data that is not newline-separated text. bufio.Scanner uses ScanLines by default to split the received data into lines - but you can also write your own line splitter based on the default function's signature and set it for the scanner:
scanner := bufio.NewScanner(stream)
scanner.Split(ownLineSplitter) // set custom line splitter function
reader shutdown
You might not receive a constant stream but only some packets of bytes from time to time. If no bytes arrive at the port, the scanner will block and you can't just kill it. You'll have to close the stream to do so, effectively raising an error. To not block any outer loops and handle errors appropriately, you can wrap the scanner in a goroutine that takes a context. If the context was cancelled, ignore the error, otherwise forward the error. In principle, this can look like
var errChan = make(chan error)
var dataChan = make(chan []byte)
ctx, cancelPortScanner := context.WithCancel(context.Background())
go func(ctx context.Context) {
scanner := bufio.NewScanner(stream)
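for scanner.Scan() { // will terminate if connection is closed
// copy the token: the slice returned by Bytes may be overwritten by the next Scan
line := append([]byte(nil), scanner.Bytes()...)
dataChan <- line
}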
// if execution reaches this point, something went wrong or stream was closed
select {
case <-ctx.Done():
return // ctx was cancelled, just return without error
default:
errChan <- scanner.Err() // ctx wasn't cancelled, forward error
}
}(ctx)
// handle data from dataChan, error from errChan
To stop the scanner, you would cancel the context and close the connection:
cancelPortScanner()
stream.Close()

io.Reader and Line Break issue involving a CSV file

I have an application which deals with CSV's being delivered via RabbitMQ from many different upstream applications - typically 5000-15,000 rows per file. Most of the time it works great. However a couple of these upstream applications are old (12-15 years) and the people who wrote them are long gone.
I'm unable to read CSV files from these older applications due to the line breaks. I'm finding this a bit weird, as the line breaks seem to map to UTF-8 Carriage Returns (http://www.fileformat.info/info/unicode/char/000d/index.htm). Typically the app reads in only the headers from those older files and nothing else.
If I open one of these files in a text editor and save it with UTF-8 encoding, overwriting the existing file, then it works with no issues at all.
Things I've tried I expected to work:
-Using a Reader:
ba := make([]byte, 262144000)
if _, err := file.Read(ba); err != nil {
return nil, err
}
ba = bytes.Trim(ba, "\x00")
bb := bytes.NewBuffer(ba)
reader := csv.NewReader(bb)
records, err := reader.ReadAll()
if err != nil {
return nil, err
}
-Using the Scanner to read line by line (get a bufio.Scanner: token too long)
scanner := bufio.NewScanner(file)
var bb bytes.Buffer
for scanner.Scan() {
bb.WriteString(fmt.Sprintf("%s\n", scanner.Text()))
}
// check for errors
if err = scanner.Err(); err != nil {
return nil, err
}
reader := csv.NewReader(&bb)
records, err := reader.ReadAll()
if err != nil {
return nil, err
}
Things I tried I expected not to work (and didn't):
Writing file contents to a new file (.txt) and reading the file back in (including running dos2unix against the created txt file)
Reading file into a standard string (hoping Go's UTF-8 encoding would magically kick in which of course it doesn't)
Reading file to Rune slice, then transforming to a string via byte slice
I'm aware of the https://godoc.org/golang.org/x/text/transform package but not too sure of a viable approach - it looks like the src encoding needs to be known to transform.
Am I stupidly overlooking something? Are there any suggestions how to transform these files into UTF-8 or update the line endings without knowing the file encoding whilst keeping the application working for all the other valid CSV files being delivered? Are there any options that don't involve me going byte to byte and doing a bytes.Replace I've not considered?
I'm hoping there's something really obvious I've overlooked.
Apologies - I can't share the CSV files for obvious reasons.
For anyone who's stumbled on this and wants an answer that doesn't involve strings.Replace, here's a method that wraps an io.Reader to replace solo carriage returns. It could probably be more efficient, but works better with huge files than a strings.Replace-based solution.
https://gist.github.com/b5/78edaae9e6a4248ea06b45d089c277d6
// ReplaceSoloCarriageReturns wraps an io.Reader; on every call of Read it looks
// for instances of a lonely \r and replaces them with \r\n before returning to the end customer.
// Lots of files in the wild will come without "proper" line breaks, which irritates Go's
// standard csv package. This'll fix it by wrapping the reader passed to csv.NewReader:
// rdr := csv.NewReader(ReplaceSoloCarriageReturns(r))
//
func ReplaceSoloCarriageReturns(data io.Reader) io.Reader {
return crlfReplaceReader{
rdr: bufio.NewReader(data),
}
}
// crlfReplaceReader wraps a reader
type crlfReplaceReader struct {
rdr *bufio.Reader
}
// Read implements io.Reader for crlfReplaceReader
func (c crlfReplaceReader) Read(p []byte) (n int, err error) {
if len(p) == 0 {
return
}
for {
if n == len(p) {
return
}
p[n], err = c.rdr.ReadByte()
if err != nil {
return
}
// any time we encounter \r, check to see if \n follows
// if the next char is not \n (or we are at EOF), add it in manually
if p[n] == '\r' {
pk, perr := c.rdr.Peek(1)
if (perr == nil && pk[0] != '\n') || perr == io.EOF {
if n+1 == len(p) {
// no room left in p for an extra byte: emit \n in place of the solo \r
p[n] = '\n'
} else {
n++
p[n] = '\n'
}
}
}
n++
}
}
Have you tried to replace all line endings from \r\n or \r to \n ?

Why does conn.Read() write nothing into a []byte, but bufio.Reader.ReadString() works?

I have a connection, created like this:
conn, err = net.Dial("tcp", "127.0.0.1:20000")
I have tried reading from this connection in two ways. I think they both must work, but the first option doesn't.
Here is the first way of doing it:
var bytes []byte
for i := 0; i < 4; i++ {
conn.Read(bytes)
}
fmt.Printf("%v", bytes)
The output of this method is:
[]
And here is the same thing, done with bufio.Reader:
func readResponse(conn net.Conn) (response string, err error) {
reader := bufio.NewReader(conn)
_, err = reader.Discard(8)
if err != nil {
return
}
response, err = reader.ReadString('\n')
return
}
This function returns the response given by the server on the other end of the TCP connection.
Why does bufio.Reader.ReadString() work, but net.Conn.Read() doesn't?
The Conn.Read() method is to implement io.Reader, the general interface to read data from any source of bytes into a []byte. Quoting from the doc of Reader.Read():
Read reads up to len(p) bytes into p.
So Read() reads up to len(p) bytes but since you pass a nil slice, it won't read anything (length of a nil slice is 0). Please read the linked doc to know how Reader.Read() works.
Reader.Read() does not allocate a buffer ([]byte) where the read data will be stored, you have to create one and pass it, e.g.:
var buf = make([]byte, 100)
n, err := conn.Read(buf)
// n is the number of read bytes; don't forget to check err!
Don't forget to always check the returned error which may be io.EOF if end of data is reached. The general contract of io.Reader.Read() also allows returning some non-nil error (including io.EOF) and some read data (n > 0) at the same time. The number of read bytes will be in n, which means only the first n bytes of the buf is useful (in other words: buf[:n]).
Your other example using bufio.Reader works because you called Reader.ReadString(), which doesn't require a []byte argument. If you had used the bufio.Reader.Read() method, you would also have had to pass a non-nil slice in order to actually get some data.

How to find EOF while reading from a file

I am using the following code to read a file in Go:
spoon , err := ioutil.ReadFile(os.Args[1])
if err!=nil {
panic ("File reading error")
}
Now I check for every byte I pick for what character it is. For example:
spoon[i]==' ' //for checking space
Likewise I read the whole file (I know there maybe other ways of reading it)
but keeping this way intact, how can I know that I have reached EOF of the file and I should stop reading it further?
Please don't suggest to find the length of spoon and start a loop. I want a sure shot way of finding EOF.
Use io.EOF to test for end-of-file. For example, to count spaces in a file:
package main
import (
"fmt"
"io"
"os"
)
func main() {
if len(os.Args) <= 1 {
fmt.Println("Missing file name argument")
return
}
f, err := os.Open(os.Args[1])
if err != nil {
fmt.Println(err)
return
}
defer f.Close()
data := make([]byte, 100)
spaces := 0
for {
data = data[:cap(data)]
n, err := f.Read(data)
if err != nil {
if err == io.EOF {
break
}
fmt.Println(err)
return
}
data = data[:n]
for _, b := range data {
if b == ' ' {
spaces++
}
}
}
fmt.Println(spaces)
}
ioutil.ReadFile() reads the entire contents of the file into a byte slice. You don't need to be concerned with EOF. EOF is a construct that is needed when you read a file one chunk at a time. You need to know which chunk has reached the end of the file when you're reading one chunk at a time.
The length of the byte slice returned by ioutil.ReadFile() is all you need.
data, err := ioutil.ReadFile(os.Args[1])
if err != nil {
// handle the error; data is not usable
}
// Do we need to know the data size?
slice_size := len(data)
// Do we need to look at each byte?
for _, b := range data {
// do something with each byte
}
This is what you need to look for to find out about End Of File(EOF)
if err != nil {
if errors.Is(err, io.EOF) { // preferred way according to the Go docs
fmt.Println("Reading file finished...")
}
break
}
When you use ioutil.ReadFile(), you don't ever see io.EOF, by design, because ReadFile will read the whole file until EOF is reached. So the slice it returns is the whole file. From the doc:
ReadFile reads the file named by filename and returns the contents. A successful call returns err == nil, not err == EOF. Because ReadFile reads the whole file, it does not treat an EOF from Read as an error to be reported.
From your question, you explicitly mention that you are aware there are other ways to read the file, and some of those ways require you to test the error for io.EOF, but not ReadFile.
Then, with the slice you have, you can read the file using the for...range construct, as others have mentioned. This is a sure way to read the whole file and nothing more (again, ReadFile takes care of that). Or iterating from 0 to len(spoon) - 1 would work too, but range is more idiomatic and basically does the same.
In other words: when you reach the end of the slice, you reach the end of the file (provided ReadFile did not return an error).
A slice has no concept of end of file. The slice returned by ioutil.ReadFile has a specific length, which reflects the size of the file it was read from. A common idiom, though only one of several possible in this case, is to range over the slice, effectively "consuming" all of the bytes that originally sat in the file:
for i, b := range spoon {
// At index 'i' is byte 'b'
// At file's offset 'i', 'b' was read
// ... do something useful here
}

Resources