Scan strings containing white spaces using fmt.Scan()/fmt.Scanf()/fmt.Scanln()? - go

In Go, to read input strings that contain spaces, I have to use
s, err := bufio.NewReader(os.Stdin).ReadString('\n')
Is there any way to use fmt.Scan(), fmt.Scanf(), or fmt.Scanln() instead?

If you're building a CLI tool I highly suggest you check out cobra. It's written in pure Go (see dependencies) and is used by multiple well-known projects.
Alternatively, I wrote a quick (gross) example to demonstrate how you could gain finer control with the Reader interface by linearly reading individual bytes from stdin.
func byteByByte() [][]byte {
	reader := bufio.NewReader(os.Stdin)
	buffer, result := []byte{}, [][]byte{}
	for {
		c, err := reader.ReadByte()
		if err != nil {
			// End of input (or a read error): keep whatever is still buffered.
			if len(buffer) > 0 {
				result = append(result, buffer)
			}
			break
		}
		if c == ' ' {
			// A space ends the current word.
			result, buffer = append(result, buffer), []byte{}
			continue
		}
		buffer = append(buffer, c)
	}
	return result
}
Here we buffer bytes temporarily until a space is reached, at which point the temporary buffer is appended to the result.
This is meant as an example to show you how the reader interface can be used with more control/granularity, not as a piece of code to be used verbatim.
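To make the difference concrete, here is a minimal sketch contrasting the two approaches on in-memory input. As far as I know, the fmt scanning functions treat whitespace as a separator for string operands, so they return one word at a time, while bufio.Scanner returns a whole line. The helper names firstToken and wholeLine are made up for this illustration; in a real program you would pass os.Stdin instead of strings.NewReader(input).

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// firstToken scans input the way fmt.Scan would: scanning a string
// operand stops at the first whitespace, so only one word comes back.
func firstToken(input string) string {
	var word string
	if _, err := fmt.Sscan(input, &word); err != nil {
		return ""
	}
	return word
}

// wholeLine returns the first line of input with its spaces intact
// and without the trailing newline.
func wholeLine(input string) string {
	scanner := bufio.NewScanner(strings.NewReader(input))
	if scanner.Scan() {
		return scanner.Text()
	}
	return ""
}

func main() {
	input := "hello wide world\n"
	fmt.Printf("%q\n", firstToken(input))
	fmt.Printf("%q\n", wholeLine(input))
}
```

If you need the line from standard input, bufio.NewScanner(os.Stdin) works the same way as the in-memory reader used here.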

Related

Why is my Go app not reading from sysfs like the busybox `cat` command?

Go 1.12 on Linux 4.19.93 armv6l.
Hardware is a Raspberry Pi Zero W (BCM2835) running a Yocto Linux image.
I've got a GPIO-connected SRF04 proximity sensor handled by the srf04 Linux driver.
It works great over sysfs and the busybox shell.
# cat /sys/bus/iio/devices/iio:device0/in_distance_raw
1646
I've used Go before with IIO devices that support triggers and buffered output at high sample rates on this hardware platform. However for this application the srf04 driver doesn't implement those IIO features. Drat. I don't really feel like adding buffer / trigger support to the driver myself (at this time) since I do not have a need for a 'high' sample rate. A handful of pings per second should suffice for my purpose. I figure I'll calculate mean & std. dev. for a rolling window of data points and 'divine' the signal out of the noise.
So with that - I'd be perfectly happy to Read the bytes from the published sysfs file with Go.
Which brings me to the point of this post.
When I open the file for reading, and try to Read() any number of bytes, I always get a generic -EIO error.
func (s *Srf04) Read() (int, error) {
samp := make([]byte, 16)
f, err := os.OpenFile(s.readPath, os.O_RDONLY, os.ModeDevice)
if err != nil {
return 0, err
}
defer f.Close()
n, err := f.Read(samp)
if err != nil {
// This block is always executed.
// The error is never a timeout, and always 'input/output error' (-EIO aka -5)
log.Fatal(err)
}
...
}
This seems like strange behavior to me.
So I decided to mess with using io.ReadFull. This yielded unreliable results.
func (s *Srf04) Read() (int, error) {
samp := make([]byte, 16)
f, err := os.OpenFile(s.readPath, os.O_RDONLY, os.ModeDevice)
if err != nil {
return 0, err
}
defer f.Close()
for {
n, err := io.ReadFull(f, samp)
log.Println("ReadFull ", n, " bytes.")
if err == io.EOF {
break
}
if err != nil {
log.Println(err)
}
}
...
}
I ended up adding it to a loop, as I found behavior changes from 'one-off' reads to multiple read calls subsequent to one another. I have it exiting if it gets an EOF, and repeatedly trying to read otherwise.
The results are wildly unreliable and seemingly random. Sometimes I get the -5 error; other times I read between 2 and 5 bytes from the device. Sometimes I get bytes but no EOF. The bytes appear to be character data for numbers (each byte is an ASCII digit in [0-9]), which is what I'd expect.
Aside: I expect this is related to file polling and the Go blocking IO implementation, but I have no way to really tell.
As a temporary workaround, I decided to try os/exec, and now I get the results I'd expect to see.
func (s *Srf04)Read() (int, error) {
out, err := exec.Command("cat", s.readPath).Output()
if err != nil {
return 0, err
}
// Trim the trailing newline that cat prints; Atoi rejects it otherwise.
return strconv.Atoi(strings.TrimSpace(string(out)))
}
But Yick. os.exec. Yuck.
I'd try to run that cat incantation under strace and then look at which read(2) calls cat actually makes (including the number of bytes actually read), and then I'd try to re-create that behaviour in Go.
My own sheer guess at the problem's cause is that the driver (or the sysfs layer) is not too well prepared to deal with certain access patterns.
For a start, consider that GNU cat is not a simple-minded byte shoveler but is rather a reasonably tricky piece of software, which, among other things, considers optimal I/O block sizes for both input and output devices (if available), calls fadvise(2) etc. It's not that any of that gets actually used when you run it on your sysfs-exported file, but it may influence how the full stack (starting with the sysfs layer) performs in the case of using cat and with your code, respectively.
Hence my advice: start with strace-ing the cat and then try to re-create its usage pattern in your Go code; then try to come up with a minimal subset of that, which works; then profoundly comment your code ;-)
I'm sure I've been looking at this too long tonight, and this code is probably terrible. That said, here's the snippet of what I came up with that works just as reliably as the busybox cat, but in Go.
The Srf04 struct carries a few things, the important bits are included below:
type Srf04 struct {
readBuf []byte `json:"-"`
readFile *os.File `json:"-"`
samples *ring.Ring `json:"-"`
}
func (s *Srf04) Read() (int, error) {
/** Reliable, but really really slow.
out, err := exec.Command("cat", s.readPath).Output()
if err != nil {
log.Fatal(err)
}
val, err := strconv.Atoi(string(out[:len(out) - 2]))
if err == nil {
s.samples.Value = val
s.samples = s.samples.Next()
}
*/
// Seek should tell us the new offset (0) and no err.
bytesRead := 0
_, err := s.readFile.Seek(0, 0)
// Loop until N > 0 AND err != EOF && err != timeout.
if err == nil {
n := 0
for {
n, err = s.readFile.Read(s.readBuf)
bytesRead += n
if os.IsTimeout(err) {
// bail out.
bytesRead = 0
break
}
if err == io.EOF {
// Success!
break
}
// Any other err means 'keep trying to read.'
}
}
if bytesRead > 0 {
val, err := strconv.Atoi(string(s.readBuf[:bytesRead-1]))
if err == nil {
fmt.Println(val)
s.samples.Value = val
s.samples = s.samples.Next()
}
return val, err
}
return 0, err
}
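One fragile spot in the snippets above is cutting a fixed number of bytes off the end of the buffer before calling strconv.Atoi. A minimal sketch of a more forgiving conversion, assuming the attribute yields ASCII digits followed by optional whitespace (as the `cat` output suggests), could look like this. The name parseSample is hypothetical and not part of the original code.

```go
package main

import (
	"bytes"
	"fmt"
	"strconv"
)

// parseSample converts the raw bytes read from the sysfs attribute
// (for example "1646\n") into an integer. It trims surrounding
// whitespace instead of assuming exactly one trailing byte.
func parseSample(raw []byte) (int, error) {
	trimmed := bytes.TrimSpace(raw)
	if len(trimmed) == 0 {
		return 0, fmt.Errorf("empty sample")
	}
	return strconv.Atoi(string(trimmed))
}

func main() {
	for _, raw := range [][]byte{[]byte("1646\n"), []byte("1646\r\n"), []byte("\n")} {
		v, err := parseSample(raw)
		fmt.Println(v, err)
	}
}
```

With a helper like this, the call inside Read() would become parseSample(s.readBuf[:bytesRead]), without the manual -1.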

Multi line buffered read in go

I am trying to read a file in a buffered manner because my files are very large, and I want to apply a text replacement to the file. Suppose that for each read I search for the word 'foo' and replace it with another word, 'bar'. If I read with a 5 MB buffer, 'foo' may be split across two reads, with one read ending in 'fo' and the next starting with 'o', and then I will not find the word. Is there a way to do a buffered read up to the last newline, or to read multiple whole lines into the buffer?
I did the following, but it does not read up to the next or previous line boundary:
file, err := os.Open(filename)
if err != nil {
panic(err)
}
defer file.Close()
const bufferSize = 5 * 1024 * 1024 // 5 MB
byteSlice := make([]byte, bufferSize)
bufioreader := bufio.NewReaderSize(file, bufferSize)
for {
n, err := bufioreader.Read(byteSlice)
if n > 0 {
fmt.Println(byteSlice[:n])
} else if err == io.EOF {
break
} else {
panic(err)
}
}
Since you're using the bufio reader, you shouldn't really work on aligning the input with buffer boundaries yourself. Use one of the high-level read functions, such as bufioreader.ReadString('\n'), which reads a line (including its trailing delimiter) through the underlying buffer, so you don't have to search for line delimiters yourself.
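A minimal sketch of that line-by-line approach might look like the following. It assumes the word to replace never spans a newline and that individual lines fit comfortably in memory, since ReadString buffers a whole line. The function names replaceLines and replaceInText are made up for this example.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// replaceLines copies r to w one line at a time, replacing oldWord with
// newWord in each line.
func replaceLines(r io.Reader, w io.Writer, oldWord, newWord string) error {
	br := bufio.NewReader(r)
	for {
		// line includes the trailing '\n' when one is present.
		line, err := br.ReadString('\n')
		if len(line) > 0 {
			if _, werr := io.WriteString(w, strings.ReplaceAll(line, oldWord, newWord)); werr != nil {
				return fmt.Errorf("writing output: %w", werr)
			}
		}
		if err == io.EOF {
			return nil
		}
		if err != nil {
			return fmt.Errorf("reading input: %w", err)
		}
	}
}

// replaceInText runs replaceLines over an in-memory string, which is
// handy for trying the function without a file.
func replaceInText(text, oldWord, newWord string) (string, error) {
	var out strings.Builder
	err := replaceLines(strings.NewReader(text), &out, oldWord, newWord)
	return out.String(), err
}

func main() {
	out, err := replaceInText("foo one\ntwo foo\nlast foo", "foo", "bar")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Print(out)
}
```

For a real file you would pass the *os.File as r and a bufio.Writer wrapping the output file as w.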
You don’t need bufio reader if you have your own buffer. With your code you have a useless copy of data from the buffer in bufio to the byteslice.
Regarding the split "foo" problem, the solution is to move the last 2 characters of the buffer to the front before the next read.
More precisely, if the word to replace has length m, then copy the last m-1 bytes of the buffer to the front of the buffer, fill the remainder of the buffer, and search for the word to replace in the buffer.
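// assume we want to find word
file, err := os.Open(filename)
if err != nil {
	panic(err)
}
defer file.Close()
trailingLen := len(word) - 1
// room for the carried-over bytes plus 5 MB of new data
data := make([]byte, trailingLen+5*1024*1024)
carry := 0 // number of bytes kept from the previous read
for {
	n, err := file.Read(data[carry:])
	valid := data[:carry+n]
	// search and replace word in valid
	if err != nil {
		if err == io.EOF {
			break
		}
		panic(err)
	}
	// Move the last trailingLen bytes to the front so that a word split
	// across two reads is seen whole on the next iteration.
	if len(valid) > trailingLen {
		carry = copy(data, valid[len(valid)-trailingLen:])
	} else {
		carry = len(valid)
	}
}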
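The carry-over idea can be turned into a complete replace loop. The sketch below is one possible way to do it, not the answerer's exact code: it writes everything except a short tail that might be the start of a match, and moves that tail to the front of the buffer before the next read. The names replaceChunks and replaceInString and the chunkSize parameter are assumptions made for this illustration.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"strings"
)

// replaceChunks copies r to w, replacing every occurrence of word with
// repl. It reads at most chunkSize bytes at a time and carries up to
// len(word)-1 unmatched bytes over to the next read, so matches that
// straddle a chunk boundary are still found.
func replaceChunks(r io.Reader, w io.Writer, word, repl []byte, chunkSize int) error {
	if len(word) == 0 || chunkSize <= 0 {
		return fmt.Errorf("word must be non-empty and chunkSize positive")
	}
	write := func(p []byte) error { _, err := w.Write(p); return err }
	trailingLen := len(word) - 1
	buf := make([]byte, trailingLen+chunkSize)
	carry := 0
	for {
		n, err := r.Read(buf[carry : carry+chunkSize])
		if err != nil && err != io.EOF {
			return err
		}
		atEOF := err == io.EOF
		valid := buf[:carry+n]
		// Emit every complete match found in the valid bytes.
		for {
			i := bytes.Index(valid, word)
			if i < 0 {
				break
			}
			if err := write(valid[:i]); err != nil {
				return err
			}
			if err := write(repl); err != nil {
				return err
			}
			valid = valid[i+len(word):]
		}
		// Hold back a tail that could be the beginning of a match,
		// unless the input is exhausted.
		keep := trailingLen
		if atEOF {
			keep = 0
		} else if len(valid) < keep {
			keep = len(valid)
		}
		if err := write(valid[:len(valid)-keep]); err != nil {
			return err
		}
		carry = copy(buf, valid[len(valid)-keep:])
		if atEOF {
			return nil
		}
	}
}

// replaceInString runs replaceChunks over an in-memory string.
func replaceInString(text, word, repl string, chunkSize int) (string, error) {
	var out bytes.Buffer
	err := replaceChunks(strings.NewReader(text), &out, []byte(word), []byte(repl), chunkSize)
	return out.String(), err
}

func main() {
	// A tiny chunk size forces "foo" to be split across reads.
	out, err := replaceInString("xxfooxxfoofoo", "foo", "bar", 4)
	fmt.Println(out, err)
}
```

A deliberately small chunkSize is useful for testing the boundary handling; for large files something like the 5 MB from the question would be the natural choice.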

golang - bufio read multiline until (CRLF) \r\n delimiter

I am trying to implement my own beanstalkd client as a way of learning go. https://github.com/kr/beanstalkd/blob/master/doc/protocol.txt
At the moment, I am using bufio to read in a line of data delimited by \n.
res, err := this.reader.ReadString('\n')
This is fine when I send a single command and read a single-line response like INSERTED %d\r\n, but I run into difficulties when I try to reserve a job, because the job body could span multiple lines, so I cannot use the \n delimiter.
Is there a way to read into the buffer until CRLF?
e.g. when I send the reserve command. My expected response is as follows:
RESERVED <id> <bytes>\r\n
<data>\r\n
But data could contain \n, so I need to read until the \r\n.
Alternatively - is there a way of reading a specific number of bytes as specified in <bytes> in example response above?
At the moment, I have (err handling removed):
func (this *Bean) receiveLine() (string, error) {
res, err := this.reader.ReadString('\n')
return res, err
}
func (this *Bean) receiveBody(numBytesToRead int) ([]byte, error) {
res, err := this.reader.ReadString('\r\n') // What to do here to read to CRLF / up to number of expected bytes?
return res, err
}
func (this *Bean) Reserve() (*Job, error) {
this.send("reserve\r\n")
res, err := this.receiveLine()
var jobId uint64
var bodylen int
_, err = fmt.Sscanf(res, "RESERVED %d %d\r\n", &jobId, &bodylen)
body, err := this.receiveBody(bodylen)
job := new(Job)
job.Id = jobId
job.Body = body
return job, nil
}
res, err := this.reader.Read('\n')
Does not make any sense to me. Did you mean ReadBytes/ReadSlice/ReadString?
You need bufio.Scanner.
Define your bufio.SplitFunc (example is a copy of bufio.ScanLines with modifications to look for '\r\n'). Modify it to match your case.
// dropCR drops a terminal \r from the data.
func dropCR(data []byte) []byte {
if len(data) > 0 && data[len(data)-1] == '\r' {
return data[0 : len(data)-1]
}
return data
}
func ScanCRLF(data []byte, atEOF bool) (advance int, token []byte, err error) {
if atEOF && len(data) == 0 {
return 0, nil, nil
}
if i := bytes.Index(data, []byte{'\r', '\n'}); i >= 0 {
	// We have a full CRLF-terminated token; data[0:i] already excludes the "\r\n".
	return i + 2, data[0:i], nil
}
// If we're at EOF, we have a final, non-terminated line. Return it.
if atEOF {
return len(data), dropCR(data), nil
}
// Request more data.
return 0, nil, nil
}
Now, wrap your io.Reader with a scanner that uses the custom split function.
scanner := bufio.NewScanner(this.reader)
// Set the split function for the scanning operation.
scanner.Split(ScanCRLF)
// Validate the input
for scanner.Scan() {
fmt.Printf("%s\n", scanner.Text())
}
if err := scanner.Err(); err != nil {
fmt.Printf("Invalid input: %s", err)
}
Read bufio package's source code about Scanner.
Alternatively - is there a way of reading a specific number of bytes as specified in <bytes> in example response above?
First you need to read the "RESERVED <id> <bytes>\r\n" line somehow.
And then you can use
nr_of_bytes := read_number_of_bytes_somehow(this.reader)
buf := make([]byte, nr_of_bytes)
io.ReadFull(this.reader, buf) // a plain Read may return fewer bytes than requested
or io.LimitedReader.
But I don't like this approach.
Thanks for this - reader.Read('\n') was a typo - I corrected question. I have also attached example code of where I have got so far. As you can see, I can get the number of expected bytes of the body. Could you elaborate on why you don't like the idea of reading a specific number of bytes? This seems most logical?
I'd like to see Bean's definition, especially reader's part.
Imagine this counter is wrong somehow.
1. It's too short: do you need to find the following "\r\n" and discard everything up to that point, or not? Why do you need the counter in the first place, then?
2. It's bigger than it should be (or, even worse, it's huge!).
2.1 No next message in the reader: fine, the read is shorter than expected, but that's OK.
2.2 There is a next message waiting: bah, you read part of it and there is no easy way to recover.
2.3 It's huge: you can't allocate the memory even if the message is only 1 byte.
Byte counters like this are generally designed to verify the message.
And it looks like that is the case with the beanstalkd protocol.
Use Scanner, parse the message, check the length against the expected number ... profit
UPD
Be warned: by default bufio.Scanner can't read tokens larger than 64k, so set the maximum length with scanner.Buffer first. And that's bad, because you can't change this option on the fly, and some data may already have been "pre"-read by the scanner.
UPD2
Thinking about my last update: take a look at how net/textproto implements dotReader as a simple state machine. You could do something similar by reading the command first and then checking the "expected bytes" against the payload.
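For comparison, the "read a specific number of bytes" idea from the question can be sketched as below. This is only one possible shape: it reads the status line, then exactly <bytes> body bytes plus the trailing CRLF with io.ReadFull, and rejects the response if the CRLF is not where the counter says it should be. The names readBody, readReserved and parseReserved and the maxBodySize cap are hypothetical; the cap is an arbitrary guard against the "huge counter" case described above, not a value taken from the protocol.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"io"
	"strings"
)

// maxBodySize is a hypothetical 1 MiB cap chosen for this sketch.
const maxBodySize = 1 << 20

// readBody reads exactly n body bytes followed by the terminating "\r\n".
func readBody(r *bufio.Reader, n int) ([]byte, error) {
	if n < 0 || n > maxBodySize {
		return nil, fmt.Errorf("implausible body length %d", n)
	}
	buf := make([]byte, n+2) // body plus trailing CRLF
	if _, err := io.ReadFull(r, buf); err != nil {
		return nil, fmt.Errorf("reading body: %w", err)
	}
	if !bytes.HasSuffix(buf, []byte("\r\n")) {
		return nil, fmt.Errorf("body of %d bytes is not followed by CRLF", n)
	}
	return buf[:n], nil
}

// readReserved parses "RESERVED <id> <bytes>\r\n" and then reads the body.
func readReserved(r *bufio.Reader) (uint64, []byte, error) {
	line, err := r.ReadString('\n')
	if err != nil {
		return 0, nil, fmt.Errorf("reading status line: %w", err)
	}
	var id uint64
	var n int
	if _, err := fmt.Sscanf(strings.TrimRight(line, "\r\n"), "RESERVED %d %d", &id, &n); err != nil {
		return 0, nil, fmt.Errorf("parsing %q: %w", line, err)
	}
	body, err := readBody(r, n)
	return id, body, err
}

// parseReserved runs readReserved over an in-memory response string.
func parseReserved(response string) (uint64, string, error) {
	id, body, err := readReserved(bufio.NewReader(strings.NewReader(response)))
	return id, string(body), err
}

func main() {
	id, body, err := parseReserved("RESERVED 42 11\r\nhello\nworld\r\n")
	fmt.Println(id, body, err)
}
```

Because the same bufio.Reader is used for the status line and the body, nothing is lost to a second layer of buffering, which is the pre-read problem mentioned for bufio.Scanner.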

Read entire file of newline delimited JSON blobs to memory and unmarshal each blob with the least amount of conversions in golang?

I'm new to Go, so I don't know much about the language-specific constructs.
My use case is first to read into memory an input file containing JSON blobs that are newline delimited. From this "array" of JSON source, I'd like to unmarshal each array element to deal with it in golang. The expected structure mapping is already defined.
I typically like to read all lines at once, so ioutil.ReadFile() as mentioned in How can I read a whole file into a string variable in Golang? seems like a good choice. And json.Unmarshal appears to take byte array as the source. But if I'm using ReadFile(), I have a single array of bytes for the whole file. How might I extract slices of this byte array such that the newline bytes (as delimiters) are skipped and each slice is one of those JSON blobs? I'd assume the best technique is one that doesn't do or minimizes data type conversions. As the easy hack would be something like convert the byte array to string, split the newline delimited string to array then cast each string array element back to bytes to pass to json.Unmarshal. I'd prefer the optimized approach but not sure how to tackle the implementation algorithm details in go, could use some tips here.
Ideally, I'd like the preprocessing done beforehand, so that I'm not dealing with the content of the JSON byte array from file as I'm iterating over the slices, etc. Rather I'd like to preprocess the single byte array read from file into an array of byte array slices, with all the newline bytes removed, each slice being the segments that were delimited by newline.
Use bufio.Scanner to read a line at a time:
f, err := os.Open(fname)
if err != nil {
// handle error
}
s := bufio.NewScanner(f)
for s.Scan() {
var v ValueTypeToUnmarshalTo
if err := json.Unmarshal(s.Bytes(), &v); err != nil {
//handle error
}
// do something with v
}
if s.Err() != nil {
// handle scan error
}
or use ioutil.ReadFile to slurp up the entire file and bytes.Split to break the file into lines:
p, err := ioutil.ReadFile(fname)
if err != nil {
// handle error
}
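for _, line := range bytes.Split(p, []byte{'\n'}) {
	if len(bytes.TrimSpace(line)) == 0 {
		continue // skip blank lines, such as the empty slice after a trailing newline
	}
	var v ValueTypeToUnmarshalTo
	if err := json.Unmarshal(line, &v); err != nil {
		// handle error
	}
	// do something with v
}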
or use the json.Decoder built-in streaming feature to read multiple values from the file:
f, err := os.Open(fname)
if err != nil {
// handle error
}
d := json.NewDecoder(f)
for {
var v ValueTypeToUnmarshalTo
if err := d.Decode(&v); err == io.EOF {
break // done decoding file
} else if err != nil {
// handle error
}
// do something with v
}
The ioutil.ReadFile approach uses more memory than the other approaches (one byte for each byte in file plus one slice header for each line).
Because the decoder ignores whitespace following a JSON value, the three approaches handle \r\n line terminators.
There are no data conversions in any of these approaches other than those inherent to unmarshalling JSON bytes to Go values.
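As a self-contained illustration of the json.Decoder variant, here is a small sketch that decodes newline-delimited JSON from any io.Reader. The event type, its fields, and the helper names decodeAll and decodeString are invented for the example; substitute your own struct.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// event is a hypothetical record type standing in for the real mapping.
type event struct {
	ID   int    `json:"id"`
	Name string `json:"name"`
}

// decodeAll reads consecutive JSON values from r until EOF.
func decodeAll(r io.Reader) ([]event, error) {
	dec := json.NewDecoder(r)
	var events []event
	for {
		var e event
		err := dec.Decode(&e)
		if err == io.EOF {
			return events, nil
		}
		if err != nil {
			return events, fmt.Errorf("decoding record %d: %w", len(events)+1, err)
		}
		events = append(events, e)
	}
}

// decodeString runs decodeAll over an in-memory string.
func decodeString(s string) ([]event, error) {
	return decodeAll(strings.NewReader(s))
}

func main() {
	input := "{\"id\":1,\"name\":\"a\"}\r\n{\"id\":2,\"name\":\"b\"}\n"
	events, err := decodeString(input)
	fmt.Println(events, err)
}
```

For a file, pass the *os.File returned by os.Open to decodeAll instead of a string reader.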

How to find EOF while reading from a file

I am using the following code to read a file in Go:
spoon , err := ioutil.ReadFile(os.Args[1])
if err!=nil {
panic ("File reading error")
}
Now I check for every byte I pick for what character it is. For example:
spoon[i]==' ' //for checking space
I read the whole file this way (I know there may be other ways of reading it),
but keeping this approach intact, how can I know that I have reached the EOF of the file and should stop reading further?
Please don't suggest finding the length of spoon and looping over it. I want a sure-shot way of finding EOF.
Use io.EOF to test for end-of-file. For example, to count spaces in a file:
package main
import (
"fmt"
"io"
"os"
)
func main() {
if len(os.Args) <= 1 {
fmt.Println("Missing file name argument")
return
}
f, err := os.Open(os.Args[1])
if err != nil {
fmt.Println(err)
return
}
defer f.Close()
data := make([]byte, 100)
spaces := 0
for {
data = data[:cap(data)]
n, err := f.Read(data)
if err != nil {
if err == io.EOF {
break
}
fmt.Println(err)
return
}
data = data[:n]
for _, b := range data {
if b == ' ' {
spaces++
}
}
}
fmt.Println(spaces)
}
ioutil.ReadFile() reads the entire contents of the file into a byte slice. You don't need to be concerned with EOF. EOF is a construct that is needed when you read a file one chunk at a time. You need to know which chunk has reached the end of the file when you're reading one chunk at a time.
The length of the byte slice returned by ioutil.ReadFile() is all you need.
data, err := ioutil.ReadFile(os.Args[1])
if err != nil {
	panic(err)
}
// Do we need to know the data size?
sliceSize := len(data)
// Do we need to look at each byte?
for _, b := range data {
	// do something with each byte b
}
This is what you need to look for to find out about End Of File(EOF)
if err != nil {
if errors.Is(err, io.EOF) { // preferred way according to the Go docs
fmt.Println("Reading file finished...")
}
break
}
When you use ioutil.ReadFile(), you don't ever see io.EOF, by design, because ReadFile will read the whole file until EOF is reached. So the slice it returns is the whole file. From the doc:
ReadFile reads the file named by filename and returns the contents. A successful call returns err == nil, not err == EOF. Because ReadFile reads the whole file, it does not treat an EOF from Read as an error to be reported.
From your question, you explicitly mention that you are aware there are other ways to read the file, and some of those ways require you to test the error for io.EOF, but not ReadFile.
Then, with the slice you have, you can read the file using the for...range construct, as others have mentioned. This is a sure way to read the whole file and nothing more (again, ReadFile takes care of that). Or iterating from 0 to len(spoon) - 1 would work too, but range is more idiomatic and basically does the same.
In other words: when you reach the end of the slice, you reach the end of the file (provided ReadFile did not return an error).
A slice has no concept of end of file. The slice returned by ioutil.ReadFile has a specific length, which reflects the size of the file it was read from. A common idiom, though only one of several possible here, is to range over the slice, effectively "consuming" all of the bytes that originally sat in the file:
for i, b := range spoon {
// At index 'i' is byte 'b'
// At file's offset 'i', 'b' was read
// ... do something useful here
}
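Putting the ReadFile answers together, a minimal sketch of the space-counting task without any explicit EOF check could look like this. It assumes Go 1.16 or newer, where os.ReadFile is available in place of ioutil.ReadFile; the function names countSpaces and countSpacesInFile are made up for the example.

```go
package main

import (
	"fmt"
	"os"
)

// countSpaces ranges over the slice; the loop ends when the slice ends,
// which is exactly where the file ended.
func countSpaces(data []byte) int {
	spaces := 0
	for _, b := range data {
		if b == ' ' {
			spaces++
		}
	}
	return spaces
}

// countSpacesInFile reads the whole file into memory and counts spaces.
// os.ReadFile reports a successful read with err == nil, not io.EOF.
func countSpacesInFile(path string) (int, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return 0, fmt.Errorf("reading %s: %w", path, err)
	}
	return countSpaces(data), nil
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("Missing file name argument")
		return
	}
	n, err := countSpacesInFile(os.Args[1])
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(n)
}
```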
