I had seen several blurbs around the web loosely arguing that one should use bufio.Scanner instead of bufio.Reader.
I don't know if my test case is representative, but I decided to pit one against the other reading 1,000,000 lines from a text file:
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"time"
)

func main() {
	fileName := "testfile.txt"

	// Create 1,000,000 integers as strings
	numItems := 1000000
	startInitStringArray := time.Now()
	var input [1000000]string
	for i := 0; i < numItems; i++ {
		input[i] = strconv.Itoa(i)
	}
	elapsedInitStringArray := time.Since(startInitStringArray)
	fmt.Printf("Took %s to populate string array.\n", elapsedInitStringArray)

	// Write to a file
	fo, _ := os.Create(fileName)
	for i := 0; i < numItems; i++ {
		fo.WriteString(input[i] + "\n")
	}
	fo.Close()

	// Use reader
	openedFile, _ := os.Open(fileName)
	startReader := time.Now()
	reader := bufio.NewReader(openedFile)
	for i := 0; i < numItems; i++ {
		reader.ReadLine()
	}
	elapsedReader := time.Since(startReader)
	fmt.Printf("Took %s to read file using reader.\n", elapsedReader)
	openedFile.Close()

	// Use scanner
	openedFile, _ = os.Open(fileName)
	startScanner := time.Now()
	scanner := bufio.NewScanner(openedFile)
	for i := 0; i < numItems; i++ {
		scanner.Scan()
		scanner.Text()
	}
	elapsedScanner := time.Since(startScanner)
	fmt.Printf("Took %s to read file using scanner.\n", elapsedScanner)
	openedFile.Close()
}
A pretty average output I receive on the timings looks like this:
Took 44.1165ms to populate string array.
Took 17.0465ms to read file using reader.
Took 23.0613ms to read file using scanner.
I am curious: when is it better to use a reader vs. a scanner, and is the choice driven by performance or by functionality?
It's a flawed benchmark: the two loops are not doing the same thing.

func (b *Reader) ReadLine() (line []byte, isPrefix bool, err error)

returns a []byte, while

func (s *Scanner) Text() string

converts the scanned bytes to a string, allocating a copy on every line. To be comparable, use

func (s *Scanner) Bytes() []byte
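For instance, a minimal like-for-like sketch (reusing openedFile and numItems from the question), where both loops hand back []byte and neither pays for a per-line string copy:

scanner := bufio.NewScanner(openedFile)
for i := 0; i < numItems; i++ {
	scanner.Scan()
	scanner.Bytes() // returns the scanner's underlying []byte; no string allocation
}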
It's a flawed benchmark. It reads short strings, the integers from "0\n" to "999999\n". What real-world data set looks like that?
In the real world we read Shakespeare: http://www.gutenberg.org/ebooks/100: Plain Text UTF-8: pg100.txt.
Took 2.973307ms to read file using reader. size: 5340315 lines: 124787
Took 2.940388ms to read file using scanner. size: 5340315 lines: 124787
Currently, I'm using the following to format data from my npm script.
npm run startWin | while IFS= read -r line; do printf '%b\n' "$line"; done | less
It works, but my colleagues do not use Linux. So, I would like to implement while IFS= read -r line; do printf '%b\n' "$line"; done in Go, and use the binary in the pipe.
npm run startWin | magical-go-formater
What I tried
package main

import (
	"fmt"
	"io/ioutil"
	"os"
	"strings"
)

func main() {
	fi, _ := os.Stdin.Stat() // get the FileInfo struct
	if (fi.Mode() & os.ModeCharDevice) == 0 {
		bytes, _ := ioutil.ReadAll(os.Stdin)
		str := string(bytes)
		arr := strings.Fields(str)
		for _, v := range arr {
			fmt.Println(v)
		}
	}
}
Currently the program suppresses any output from the text stream.
You want to use bufio.Scanner for tail-type reads. IMHO the checks you're doing on os.Stdin are unnecessary, but YMMV.
See this answer for an example. ioutil.ReadAll() (now deprecated; just use io.ReadAll()) reads until an error/EOF and returns everything at once - it is not a looping input, which is why you want bufio.Scanner.Scan().
Also - %b will expand any escape sequence in the text - e.g. a literal \n in a passed line will be rendered as a newline - do you need that? Go does not have an equivalent format specifier, AFAIK.
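A minimal sketch of that kind of loop (my own code; it only handles the common literal-\n case rather than full %b escape processing):

package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

func main() {
	scanner := bufio.NewScanner(os.Stdin)
	for scanner.Scan() {
		// Expand literal "\n" sequences, the most common %b conversion.
		fmt.Println(strings.ReplaceAll(scanner.Text(), `\n`, "\n"))
	}
}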
EDIT
So I think your ReadAll()-based approach would have worked...eventually. I am guessing you were expecting the behavior you get with bufio.Scanner - the receiving process handles bytes as they are written (it's actually a polling operation - see the standard library source for Scan() for the grimy details).
But ReadAll() buffers everything it reads and doesn't return until it finally gets either an error or an EOF. I hacked up an instrumented version of ReadAll() (an exact copy of the standard library source plus a little instrumentation output). You can see that it reads as the bytes are written, but it doesn't return and yield the contents until the writing process finishes and closes its end of the pipe (its open file handle), which generates the EOF:
package main

import (
	"fmt"
	"io"
	"os"
	"time"
)

func main() {
	// os.Stdin.SetReadDeadline(time.Now().Add(2 * time.Second))
	b, err := readAll(os.Stdin)
	if err != nil {
		fmt.Println("ERROR: ", err.Error())
	}
	str := string(b)
	fmt.Println(str)
}

func readAll(r io.Reader) ([]byte, error) {
	b := make([]byte, 0, 512)
	i := 0
	for {
		if len(b) == cap(b) {
			// Add more capacity (let append pick how much).
			b = append(b, 0)[:len(b)]
		}
		n, err := r.Read(b[len(b):cap(b)])
		//fmt.Fprintf(os.Stderr, "READ %d - RECEIVED: \n%s\n", i, string(b[len(b):cap(b)]))
		fmt.Fprintf(os.Stderr, "%s READ %d - RECEIVED %d BYTES\n", time.Now(), i, n)
		i++
		b = b[:len(b)+n]
		if err != nil {
			if err == io.EOF {
				fmt.Fprintln(os.Stderr, "RECEIVED EOF")
				err = nil
			}
			return b, err
		}
	}
}
I just hacked up a cheap script to generate the input, simulating something long-running that writes only at periodic intervals, which is how I'd imagine npm behaves in your case:
#!/bin/sh
for x in 1 2 3 4 5 6 7 8 9 10
do
	cat ./main.go
	sleep 10
done
As a side note, I find reading the actual standard library code really helpful...or at least interesting in cases like this.
@Sandy Cash was helpful in suggesting bufio. I don't know why, if what @Jim said is true, but bufio worked out and ReadAll() didn't.
Thanks for the help.
The code:
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

func main() {
	scanner := bufio.NewScanner(os.Stdin)
	for scanner.Scan() {
		s := scanner.Text()
		// Split on literal "\n" sequences so embedded escapes print as real lines.
		arr := strings.Split(s, `\n`)
		for _, v := range arr {
			fmt.Println(v)
		}
	}
}
I'm trying to do something very simple in Go, but I haven't managed to find any resources.
I receive a hex dump and I want to write the raw bytes to a file, but the contents of the two files (src and dst) do not match at all. Currently the only way I have found is to manually add \x every 2 characters.
I tried to loop over my string and add \x; the resulting string looks identical, but the output is very different.
This code manually works:
binary.Write(f, binary.LittleEndian, []byte("\x00\x00\x00\x04\x0A\xFA\x64\xA7\x00\x03\x31\x30"))
But I did not manage to build that from the string "000000040afa64a700033130"...
What I currently do (mirroring what I do in Python 3):
text := "000000040afa64a700033130"
j := 0
f, _ := os.OpenFile("gotest", os.O_WRONLY|os.O_CREATE, 0600)
var s []int64
for i := 0; i < len(text); i += 2 {
	if (i + 2) <= len(text) {
		j = i + 2
	}
	value, _ := strconv.ParseInt(text[i:j], 16, 8)
	binary.Write(f, binary.LittleEndian, value)
	s = append(s, value)
}
If your hex data is in the form of a string and you want to write the raw bytes, you'll have to convert it first; the easiest way is to use hex.Decode.
import (
	"encoding/hex"
	"io/ioutil"
)

func foo() error {
	stringData := []byte("48656c6c6f20476f7068657221")
	hexData := make([]byte, hex.DecodedLen(len(stringData)))
	// Note: hex.Decode takes the destination first, then the source.
	if _, err := hex.Decode(hexData, stringData); err != nil {
		return err
	}
	return ioutil.WriteFile("filename", hexData, 0644)
}
Based on your use case you could swap over to ioutil.WriteFile. It writes the given byte slice to a file, creating the file if it doesn't exist and truncating it if it already exists.
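As a sketch of the same idea applied to the question's input, hex.DecodeString does the allocation for you:

package main

import (
	"encoding/hex"
	"io/ioutil"
	"log"
)

func main() {
	// Decode the question's hex dump into raw bytes in one call.
	data, err := hex.DecodeString("000000040afa64a700033130")
	if err != nil {
		log.Fatal(err)
	}
	// Write the raw bytes; the file now matches the original dump.
	if err := ioutil.WriteFile("gotest", data, 0644); err != nil {
		log.Fatal(err)
	}
}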
I'm writing a file uploader in Go. I would like to use the md5 of the file as the file name when I save it to disk.
What is the best way to solve this?
I save a file this way:
reader, _ := r.MultipartReader()
p, _ := reader.NextPart()
f, _ := os.Create("./filename") // here I need md5 as a file name
defer f.Close()
lmt := io.LimitReader(p, maxSize+1)
written, _ := io.Copy(f, lmt)
if written > maxSize {
	os.Remove(f.Name())
}
Here is an example using io.TeeReader to perform both the hash computation and the copy at the same time:
https://play.golang.org/p/IJJQiaeTOBh
package main

import (
	"crypto/sha256"
	"fmt"
	"io"
	"os"
	"strings"
)

func main() {
	var s io.Reader = strings.NewReader("some data")
	// maxSize := 4096
	// s = io.LimitReader(s, maxSize + 1)
	h := sha256.New()
	tr := io.TeeReader(s, h)
	io.Copy(os.Stdout, tr)
	fmt.Printf("\n%x", h.Sum(nil))
}
// Output:
//some data
//1307990e6ba5ca145eb35e99182a9bec46531bc54ddf656a602c780fa0240dee
And the comparison against sha256sum for correctness:
$ echo -n "some data" | sha256sum -
1307990e6ba5ca145eb35e99182a9bec46531bc54ddf656a602c780fa0240dee -
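Adapted to the question's upload snippet, a sketch along these lines should work (md5 in place of sha256, reusing p and maxSize from the question, and assuming crypto/md5, encoding/hex, io, and os imports; the temp-file-then-rename step is my own addition, since the name is only known once the copy finishes):

hash := md5.New()
tmp, _ := os.CreateTemp(".", "upload-*") // placeholder name until the hash is known
defer tmp.Close()
lmt := io.LimitReader(p, maxSize+1)
written, _ := io.Copy(tmp, io.TeeReader(lmt, hash)) // hash while writing to disk
if written > maxSize {
	os.Remove(tmp.Name())
} else {
	// Rename the file to the hex-encoded md5 of its content.
	os.Rename(tmp.Name(), "./"+hex.EncodeToString(hash.Sum(nil)))
}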
Instead of using io.TeeReader I have used io.MultiWriter to fill 2 buffers (the first buffer is used to calculate the md5 and the second to write to a file named after it):
lmt := io.LimitReader(p, maxSize+1)
hash := md5.New()
var buf1, buf2 bytes.Buffer
w := io.MultiWriter(&buf1, &buf2)
if _, err := io.Copy(w, lmt); err != nil {
	log.Fatal(err)
}
if _, err := io.Copy(hash, &buf1); err != nil {
	log.Fatal(err)
}
fmt.Println("md5 is: ", hex.EncodeToString(hash.Sum(nil)))
// Now we can create a file with os.OpenFile, passing the md5 name as an
// argument, and write &buf2 to that file.
I liked the solution with TeeReader here, but simplified it like this:
package main

import (
	"bytes"
	"crypto/md5"
	"encoding/hex"
	"fmt"
	"hash"
	"io"
)

// HashReader tees everything read from Reader into Hash.
type HashReader struct {
	io.Reader
	hash.Hash
}

func NewHashReader(r io.Reader, h hash.Hash) HashReader {
	return HashReader{io.TeeReader(r, h), h}
}

func NewMD5Reader(r io.Reader) HashReader {
	return NewHashReader(r, md5.New())
}

func main() {
	dataReader := bytes.NewBufferString("Hello, world!")
	hashReader := NewMD5Reader(dataReader)
	resultBytes := make([]byte, dataReader.Len())
	_, err := hashReader.Read(resultBytes)
	if err != nil {
		fmt.Println(err)
	}
	fmt.Println(hex.EncodeToString(hashReader.Sum(nil)))
}
A hex-encoded md5 string looks more familiar to me, but feel free to encode the byte slice returned by hashReader.Sum(nil) however you wish.
P.S. One more thing about the playground example: it assigns the md5 result on EOF, but not all consumers read until EOF. Since the Hash object keeps the running hash state, it is enough to call hashReader.Sum once consumption finishes and use the result.
I’m using something similar in a project and I'm a bit perplexed: why isn't anything being printed?
package main

import (
	"encoding/json"
	"fmt"
	"io"
)

func main() {
	m := make(map[string]string)
	m["foo"] = "bar"
	pr, pw := io.Pipe()
	go func() { pw.CloseWithError(json.NewEncoder(pw).Encode(&m)) }()
	fmt.Fscan(pr)
}
https://play.golang.org/p/OJT1ZRAnut
Is this a race condition of some sort? I tried removing pw.CloseWithError but it changes nothing.
fmt.Fscan takes a reader to read from plus a variadic list of pointers to objects to populate. Its result is (n int, err error), where n is the number of items successfully scanned and err is the reason why n is less than the number of data objects you fed into the variadic argument.
In this case, the list of data objects has length zero, so Fscan fills zero objects and reads no data. It dutifully reports that it scanned 0 objects, and since that is not less than the number of objects you passed in, it reports a nil error.
Try the following:
func main() {
	m := make(map[string]string)
	m["foo"] = "bar"
	pr, pw := io.Pipe()
	go func() { pw.CloseWithError(json.NewEncoder(pw).Encode(&m)) }()
	var s string
	n, err := fmt.Fscan(pr, &s)
	fmt.Println(n, err) // should be 1 <nil>
	fmt.Println(s)      // should be {"foo":"bar"}
}
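If the goal is just to see the encoder's output rather than scan whitespace-separated tokens, draining the pipe directly also works (a minimal sketch replacing the fmt.Fscan call; it assumes an extra "os" import):

// Copy everything from the pipe to stdout; no token scanning involved.
io.Copy(os.Stdout, pr)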
I'm trying to read from stdin in Go, as I'm implementing a driver for Erlang. I have the following code:
package main

import (
	"bufio"
	"fmt"
	"os"
)

func main() {
	go func() {
		stdout := bufio.NewWriter(os.Stdin)
		p := []byte{121, 100, 125, '\n'}
		stdout.Write(p)
	}()
	stdin := bufio.NewReader(os.Stdin)
	values := make([]byte, 4, 4)
	for {
		fmt.Println("b")
		if read_exact(stdin) > 0 {
			stdin.Read(values)
			fmt.Println("a")
			give_func_write(values)
		} else {
			continue
		}
	}
}

func read_exact(r *bufio.Reader) int {
	bits := make([]byte, 3, 3)
	a, _ := r.Read(bits)
	if a > 0 {
		r.Reset(r)
		return 1
	}
	return -1
}

func give_func_write(a []byte) bool {
	fmt.Println("Yahu")
	return true
}
However, it seems that give_func_write is never reached. I tried starting a goroutine that writes to standard input after 2 seconds to test this.
What am I missing here?
Also, the line r.Reset(r): is this valid in Go? What I'm trying to achieve is simply to restart reading from the beginning of the file. Is there a better way?
EDIT
After playing around, I found that the code gets stuck at a, _ := r.Read(bits) in the read_exact function.
I guess that I will need to have a protocol in which I send a \n to
make the input work and at the same time discard it when reading it
No, you don't. Stdin is line-buffered only if it's bound to a terminal. You can run your program as prog < /dev/zero or cat file | prog.
bufio.NewWriter(os.Stdin).Write(p)
You probably don't want to write to stdin. See "Writing to stdin and reading from stdout" for details.
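The usual pattern for driving another process (sketched below with my own example; see the linked answer for details) is to start it with os/exec and write to the child's StdinPipe:

package main

import (
	"io"
	"os"
	"os/exec"
)

func main() {
	// Feed bytes to the child's stdin rather than writing to our own os.Stdin.
	cmd := exec.Command("cat")
	stdin, _ := cmd.StdinPipe()
	stdout, _ := cmd.StdoutPipe()
	cmd.Start()
	stdin.Write([]byte{121, 100, 125, '\n'})
	stdin.Close() // signals EOF to the child
	io.Copy(os.Stdout, stdout)
	cmd.Wait()
}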
Well, it's not particularly clear to me what you're trying to achieve. I'm assuming you just want to read data from stdin in fixed-size chunks. Use io.ReadFull for this. Or, if you want to use buffers, you can use Reader.Peek or Scanner to ensure that a specific number of bytes is available. I've changed your program to demonstrate the usage of io.ReadFull:
package main

import (
	"fmt"
	"io"
	"time"
)

func main() {
	input, output := io.Pipe()
	go func() {
		defer output.Close()
		for _, m := range []byte("123456") {
			output.Write([]byte{m})
			time.Sleep(time.Second)
		}
	}()
	message := make([]byte, 3)
	_, err := io.ReadFull(input, message)
	for err == nil {
		fmt.Println(string(message))
		_, err = io.ReadFull(input, message)
	}
	if err != io.EOF {
		panic(err)
	}
}
You can easily split it in two programs and test it that way. Just change input to os.Stdin.
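For reference, the reading half is the same loop pointed at os.Stdin (a minimal sketch):

package main

import (
	"fmt"
	"io"
	"os"
)

func main() {
	// Read fixed 3-byte messages from stdin until EOF.
	message := make([]byte, 3)
	_, err := io.ReadFull(os.Stdin, message)
	for err == nil {
		fmt.Println(string(message))
		_, err = io.ReadFull(os.Stdin, message)
	}
	if err != io.EOF && err != io.ErrUnexpectedEOF {
		panic(err)
	}
}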