Comparing the same value in the ioutil package?

I'm puzzled by this line of code from the ioutil package. It appears to compare a value to itself, but converts it twice on one side. Any insights would be greatly appreciated!
int64(int(capacity)) == capacity
from this function
func readAll(r io.Reader, capacity int64) (b []byte, err error) {
    var buf bytes.Buffer
    // If the buffer overflows, we will get bytes.ErrTooLarge.
    // Return that as an error. Any other panic remains.
    defer func() {
        e := recover()
        if e == nil {
            return
        }
        if panicErr, ok := e.(error); ok && panicErr == bytes.ErrTooLarge {
            err = panicErr
        } else {
            panic(e)
        }
    }()
    if int64(int(capacity)) == capacity {
        buf.Grow(int(capacity))
    }
    _, err = buf.ReadFrom(r)
    return buf.Bytes(), err
}

Converting Tim Cooper's comment into an answer:
bytes.Buffer.Grow takes an int, while capacity is an int64.

func (b *Buffer) Grow(n int)
    Grow grows the buffer's capacity, if necessary, to guarantee space for
    another n bytes. After Grow(n), at least n bytes can be written to the
    buffer without another allocation.

As mentioned in the GoDoc, Grow is an optimisation that prevents further allocations.

int64(int(capacity)) == capacity

makes sure that capacity is within the range of int values, so the optimisation can be applied safely. If capacity does not fit in an int (possible on platforms where int is 32 bits wide), the inner conversion truncates the value, the comparison fails, and Grow is simply skipped; the buffer then grows on demand instead.
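To make the check concrete, here is a minimal sketch of my own (not from the answer) showing the round trip on a value that would not fit in a 32-bit int:

package main

import "fmt"

func main() {
    // Hypothetical capacity larger than a 32-bit int can represent.
    capacity := int64(1 << 40)

    // On a platform where int is 64 bits the round trip is lossless and
    // this prints true; on a 32-bit platform int(capacity) truncates,
    // the comparison fails, and readAll would skip the Grow optimisation.
    fmt.Println(int64(int(capacity)) == capacity)
}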

Related

Binary Encoding/Decoding File in Golang Gives Different Checksum

I'm working on encoding and decoding files in Go. I specifically do need the 2D array I'm using; this is just test code to demonstrate the point. I'm not entirely sure what I'm doing wrong: I'm attempting to convert the file into a list of uint32 numbers, and then take those numbers and convert them back into a file. The problem is that the resulting file looks fine, but the checksum doesn't line up. I suspect I'm doing something wrong in the conversion to uint32. I have the switch/case because I have no way of knowing for sure how many bytes I'll read at the end of a given file.
package main

import (
    "bufio"
    "encoding/binary"
    "fmt"
    "io"
    "os"
)

const (
    headerSeq = 8
    body      = 24
)

type part struct {
    Seq  int
    Data uint32
}

func main() {
    f, err := os.Open("speech.pdf")
    if err != nil {
        panic(err)
    }
    defer f.Close()

    reader := bufio.NewReader(f)
    b := make([]byte, 4)
    o := make([][]byte, 0)
    var value uint32
    for {
        n, err := reader.Read(b)
        if err != nil {
            if err != io.EOF {
                panic(err)
            }
        }
        if n == 0 {
            break
        }
        fmt.Printf("len array %d\n", len(b))
        fmt.Printf("len n %d\n", n)
        switch n {
        case 1:
            value = uint32(b[0])
        case 2:
            value = uint32(uint32(b[1]) | uint32(b[0])<<8)
        case 3:
            value = uint32(uint32(b[2]) | uint32(b[1])<<8 | uint32(b[0])<<16)
        case 4:
            value = uint32(uint32(b[3]) | uint32(b[2])<<8 | uint32(b[1])<<16 | uint32(b[0])<<24)
        }
        fmt.Println(value)
        bs := make([]byte, 4)
        binary.BigEndian.PutUint32(bs, value)
        o = append(o, bs)
    }

    fo, err := os.OpenFile("test.pdf", os.O_APPEND|os.O_WRONLY|os.O_CREATE, 0600)
    if err != nil {
        panic(err)
    }
    defer fo.Close()
    for _, ba := range o {
        _, err := fo.Write(ba)
        if err != nil {
            panic(err)
        }
    }
}
So, you want to write and read arrays of varying length in a file.

import "encoding/binary"

// You need a consistent byte order for reading and writing multi-byte data types.
// Note: binary.LittleEndian is a value, not a constant, so it is declared with var.
var order = binary.LittleEndian

var dataToWrite = []byte{ ... ... ... }
var err error

// To write a recoverable array of varying length:
var w io.Writer

// First, encode the length of the data that will be written.
err = binary.Write(w, order, int64(len(dataToWrite)))
// Check error
err = binary.Write(w, order, dataToWrite)
// Check error

// To read a variable-length array:
var r io.Reader
var dataLen int64

// First, we need to know the length of the data to be read.
err = binary.Read(r, order, &dataLen)
// Check error

// Allocate a slice to hold the expected amount of data.
dataReadIn := make([]byte, dataLen)
err = binary.Read(r, order, dataReadIn)
// Check error

This pattern works not just with byte but with any other fixed-size data type; see binary.Write for specifics about the encoding.
If the size of the encoded data is a big concern, you can save some bytes by storing the array length as a varint with binary.PutVarint and binary.ReadVarint.
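For completeness, here is a small self-contained sketch of the same pattern, with a bytes.Buffer standing in for the file (the buffer and payload names are my own, illustrative choices):

package main

import (
    "bytes"
    "encoding/binary"
    "fmt"
    "log"
)

func main() {
    // A bytes.Buffer plays the role of the file: it is both the
    // io.Writer we encode into and the io.Reader we decode from.
    var buf bytes.Buffer
    payload := []byte("hello, length-prefixed world")

    // Write the length prefix first, then the data itself.
    if err := binary.Write(&buf, binary.LittleEndian, int64(len(payload))); err != nil {
        log.Fatal(err)
    }
    if err := binary.Write(&buf, binary.LittleEndian, payload); err != nil {
        log.Fatal(err)
    }

    // Read the length back, allocate exactly that much, then read the data.
    var n int64
    if err := binary.Read(&buf, binary.LittleEndian, &n); err != nil {
        log.Fatal(err)
    }
    data := make([]byte, n)
    if err := binary.Read(&buf, binary.LittleEndian, data); err != nil {
        log.Fatal(err)
    }
    fmt.Printf("read %d bytes: %s\n", n, data)
}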

Scanner.Buffer - max value has no effect on custom Split?

To reduce the default 64k scanner buffer (for a microcomputer with low memory), I'm trying to use this buffer and a custom split function:
scanner.Buffer(make([]byte, 5120), 64)
scanner.Split(Scan64Bytes)
Here I noticed that the second buffer argument, "max", has no effect. If I instead pass e.g. 0, 1, 5120 or bufio.MaxScanTokenSize, I can't see any difference.
Only the first argument, "buf", has consequences: if its capacity is too small the scan is incomplete, and if it's too large the B/op benchmem value increases.
From the doc:
The maximum token size is the larger of max and cap(buf). If max <= cap(buf), Scan will use this buffer only and do no allocation.
I don't understand which is the correct max value. Can you maybe explain this to me, please?
Go Playground
package main

import (
    "bufio"
    "bytes"
    "fmt"
)

func Scan64Bytes(data []byte, atEOF bool) (advance int, token []byte, err error) {
    if len(data) < 64 {
        return 0, data[0:], bufio.ErrFinalToken
    }
    return 64, data[0:64], nil
}

func main() {
    // improvised source of the same size:
    cmdstd := bytes.NewReader(make([]byte, 5120))
    scanner := bufio.NewScanner(cmdstd)
    // I guess 64 is the correct max arg:
    scanner.Buffer(make([]byte, 5120), 64)
    scanner.Split(Scan64Bytes)
    for i := 0; scanner.Scan(); i++ {
        fmt.Printf("%v: %v\r\n", i, scanner.Bytes())
    }
    if err := scanner.Err(); err != nil {
        fmt.Println(err)
    }
}
max value has no effect on custom Split?
No; you see the same result even without the custom split. But reading the input in 64-byte chunks like this wouldn't be possible without the split function and ErrFinalToken:

// your reader/input
cmdstd := bytes.NewReader(make([]byte, 5120))
// your scanner buffer size
scanner.Buffer(make([]byte, 5120), 64)

The scanner's buffer should be larger. This is how I would set buf and max:

scanner.Buffer(make([]byte, 5121), 5120)
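To see when max actually does matter, here is a sketch of my own (need100 is a hypothetical split function, not from the question). Per the docs, the maximum token size is the larger of max and cap(buf), so max only comes into play once a token needs more room than the initial buffer provides:

package main

import (
    "bufio"
    "bytes"
    "fmt"
)

// need100 refuses to emit a token until 100 bytes are buffered,
// forcing the scanner to grow its buffer beyond the initial cap of 4.
func need100(data []byte, atEOF bool) (int, []byte, error) {
    if atEOF && len(data) == 0 {
        return 0, nil, nil // normal end of input
    }
    if len(data) < 100 {
        if atEOF {
            return len(data), data, bufio.ErrFinalToken
        }
        return 0, nil, nil // request more data
    }
    return 100, data[:100], nil
}

func run(max int) error {
    s := bufio.NewScanner(bytes.NewReader(make([]byte, 500)))
    s.Buffer(make([]byte, 4), max) // tiny initial buffer: cap(buf) == 4
    s.Split(need100)
    for s.Scan() {
    }
    return s.Err()
}

func main() {
    fmt.Println(run(1024)) // <nil>: buffer may grow up to max = 1024
    fmt.Println(run(64))   // token too long: max(64, cap(buf) 4) < 100
}

In the original question every token was at most 64 bytes and cap(buf) was already 5120, so max never came into play; that is why changing it made no visible difference.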

How could I read a text file and implement a tree based on those values?

I'm trying to implement the DFS (depth-first search) algorithm in Go, but my current code requires adding the nodes manually, one by one, to build the tree. I want to read a text file with this data (example):
75
95 64
17 47 82
18 35 87 10
20 04 83 47 65
And build the tree with these values: the root value will be 75, its left child 95, its right child 64, and so on.
This is my complete code:
// Package main implements the DFS algorithm
package main

import (
    "bufio"
    "flag"
    "fmt"
    "log"
    "os"
    "strconv"
    "strings"
    "sync"
)

// Node holds all the tree data
type Node struct {
    Data  interface{}
    Left  *Node
    Right *Node
}

// NewNode creates a new node for the tree
func NewNode(data interface{}) *Node {
    node := new(Node)
    node.Data = data
    node.Left = nil
    node.Right = nil
    return node
}

// FillNodes creates all the nodes based on each value in the file
func FillNodes(lines *[][]string) {
    nodes := *lines
    rootInt, _ := strconv.Atoi(nodes[0][0])
    root := NewNode(rootInt)
    // add the values here
    wg.Add(1)
    go root.DFS()
    wg.Wait()
}

// ProcessNode checks and prints the current node
func (n *Node) ProcessNode() {
    defer wg.Done()
    var hello []int
    for i := 0; i < 10000; i++ {
        hello = append(hello, i)
    }
    fmt.Printf("Node %v\n", n.Data)
}

// DFS calls itself on each node
func (n *Node) DFS() {
    defer wg.Done()
    if n == nil {
        return
    }
    wg.Add(1)
    go n.Left.DFS()
    wg.Add(1)
    go n.ProcessNode()
    wg.Add(1)
    go n.Right.DFS()
}

// CheckError handles error checking
func CheckError(err error) {
    if err != nil {
        log.Fatal(err)
    }
}

// OpenFile handles reading data from a text file
func OpenFile() [][]string {
    var lines [][]string
    ftpr := flag.String("fpath", "pyramid2.txt", "./pyramid2.txt")
    flag.Parse()
    f, err := os.Open(*ftpr)
    CheckError(err)
    defer func() {
        if err := f.Close(); err != nil {
            log.Fatal(err)
        }
    }()
    s := bufio.NewScanner(f)
    for s.Scan() {
        line := strings.Fields(s.Text())
        lines = append(lines, line)
    }
    err = s.Err()
    CheckError(err)
    return lines
}

var wg sync.WaitGroup

// main creates the tree and calls DFS
func main() {
    nodes := OpenFile()
    FillNodes(&nodes)
}
What would be a possible solution to this? Also, how could I convert all those strings to int in an easy way?
Here is a method for the creation of the tree (didn't test it); note that each strconv.Atoi error needs its own check, otherwise the first error is silently overwritten by the second:

func FillLevel(parents []*Node, level []string) (children []*Node, err error) {
    if len(parents)+1 != len(level) {
        return nil, errors.New("params size not OK")
    }
    for i, p := range parents {
        leftVal, err := strconv.Atoi(level[i])
        if err != nil {
            return nil, err
        }
        rightVal, err := strconv.Atoi(level[i+1])
        if err != nil {
            return nil, err
        }
        p.Left = NewNode(leftVal)
        p.Right = NewNode(rightVal)
        children = append(children, p.Left)
        if i == len(parents)-1 {
            children = append(children, p.Right)
        }
    }
    return children, nil
}

func FillNodes(lines *[][]string) (*Node, error) {
    nodes := *lines
    rootInt, err := strconv.Atoi(nodes[0][0])
    if err != nil {
        return nil, err
    }
    root := NewNode(rootInt)
    // add the values here
    parents := []*Node{root}
    for _, level := range nodes[1:] {
        parents, err = FillLevel(parents, level)
        if err != nil {
            return nil, err
        }
    }
    return root, nil
}

func main() {
    nodes := OpenFile()
    r, err := FillNodes(&nodes)
    if err != nil {
        log.Fatal(err)
    }
    wg.Add(1)
    r.DFS()
    wg.Wait()
}
If this is for production, my advice is to TDD it, handle all the errors correctly, and decide what your software should do about each one of them. You can also write some benchmarks and then optimize the algorithm using goroutines (if applicable).
The way you're doing it right now, you're better off without goroutines:
Imagine you have a huge tree with 1M nodes. The DFS func will recursively launch 1M goroutines, each of which carries additional memory and CPU cost without doing enough work to justify it. You need a better way of splitting the work across far fewer goroutines, say 10,000 nodes per goroutine.
I would strongly advise you to write a version without goroutines, study its complexity, and write benchmarks to validate the expected complexity. Once you have that, start looking for a strategy to introduce goroutines, and validate that it's more efficient than what you already have.
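As a starting point, the sequential version can be as small as this (a sketch reusing the Node type from the question):

// DFSSequential walks the tree in order without any goroutines.
// It is the simple baseline to benchmark before adding concurrency.
func (n *Node) DFSSequential() {
    if n == nil {
        return
    }
    n.Left.DFSSequential()
    fmt.Printf("Node %v\n", n.Data)
    n.Right.DFSSequential()
}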

Tour of Go rot13Reader buffer not being updated after Read function finishes

Here is my implementation of the exercise using strings.Map (the rot13 function is straight out of golang's docs). The issue is that the buffer does not seem to be modified after the Read function returns. Here is the code:
package main

import (
    "fmt"
    "io"
    "os"
    "strings"
)

type rot13Reader struct {
    r io.Reader
}

func (reader *rot13Reader) Read(b []byte) (int, error) {
    rot13 := func(r rune) rune {
        switch {
        case r >= 'A' && r <= 'Z':
            return 'A' + (r-'A'+13)%26
        case r >= 'a' && r <= 'z':
            return 'a' + (r-'a'+13)%26
        }
        return r
    }
    n, err := reader.r.Read(b)
    result := []byte(strings.Map(rot13, string(b)))
    b = []byte(result)
    fmt.Println(string(b))
    return n, err
}

func main() {
    s := strings.NewReader("Lbh penpxrq gur pbqr!")
    r := rot13Reader{s}
    io.Copy(os.Stdout, &r)
}
and the output:
You cracked the code!
Lbh penpxrq gur pbqr!You cracked the code!
Clearly the buffer is modified inside the Read function, but the change does not seem to survive after it returns. If I comment out the fmt.Println(string(b)) line, the output is just:
Lbh penpxrq gur pbqr!
Is there something quirky about Readers and Writers that I should know about?
In Go, all arguments are passed by value, as if by assignment to the parameter or receiver (a shallow copy).
In Go, a slice is implemented as

type slice struct {
    array unsafe.Pointer
    len   int
    cap   int
}

When a slice is passed by value, the callee gets a copy of those three fields; after you return, the caller will not see any changes you made to the copy's fields, only changes made to elements of the shared underlying array.
In your case, you overwrite b (array, len, cap), which is a copy:

b = []byte(result)

The copy is discarded when you return.
What you want to do is change the elements of b's underlying array.
For example,
package main

import (
    "io"
    "os"
    "strings"
)

func rot13(b byte) byte {
    switch {
    case b >= 'A' && b <= 'Z':
        return 'A' + (b-'A'+13)%26
    case b >= 'a' && b <= 'z':
        return 'a' + (b-'a'+13)%26
    }
    return b
}

type rot13Reader struct {
    r io.Reader
}

func (reader *rot13Reader) Read(b []byte) (int, error) {
    n, err := reader.r.Read(b)
    b = b[:n]
    for i := range b {
        b[i] = rot13(b[i])
    }
    return n, err
}

func main() {
    s := strings.NewReader("Lbh penpxrq gur pbqr!")
    r := rot13Reader{s}
    io.Copy(os.Stdout, &r)
}
Playground: https://play.golang.org/p/0LDYmzrrgty
Output:
You cracked the code!
The Go Blog: Go Slices: usage and internals
I am not too sure, so please take the below with anywhere between a few grains and several pounds of salt.
First you should add an error check as early as possible:
n, err := reader.r.Read(b)
if err == io.EOF {
    fmt.Printf("\n%s, %d bytes read", err, n)
    return n, err
}
With this added, the output is the one you would expect:
You cracked the code!
Lbh penpxrq gur pbqr!
EOF, 0 bytes read
The reason here is that a reader is supposed to return io.EOF when there is nothing left to read.
So why did you experience that strange behavior? A quick look at the source code of io.Copy reveals that the buffer passed to Read is allocated once and reused. On the final call no bytes were read, so b was not modified and still held the same values as before, which your code then mapped and printed again. I would argue that the underlying io.Reader should clear b when nothing is read, as per the principle of least surprise, though.

Why using naked return and the normal return give me different results?

I'm playing around with the Go Tour, and I wonder why using a naked return gives me the correct result while the normal return doesn't. This is the exercise where I have this problem: https://tour.golang.org/methods/12.
The objective is to create a reader that can decipher rot13; the rot13 function is already tested.
func (r rot13Reader) Read(b []byte) (n int, err error) {
    n, err = r.r.Read(b)
    for i, v := range b {
        b[i] = rot13(v)
    }
    return
}
The code above give me the correct result.
func (r rot13Reader) Read(b []byte) (int, error) {
    for i, v := range b {
        b[i] = rot13(v)
    }
    return r.r.Read(b)
}
This one doesn't change anything in the input stream.
Could anybody explain why? Thank you in advance.
It's not a problem with the returns. In the first case you read the data in before transforming it; in the second case you transform whatever junk is already in the buffer and only then read in the data, passing it through untransformed from the underlying reader.
While this is not required for correctness, I'd suggest you don't transform the whole buffer every time, but only the portion that was actually read, i.e. change your first example from for i, v := range b to for i, v := range b[:n]. That's because the Read call cannot modify the length of the slice b, only its contents.
Take a look at the documentation of io.Reader; it should give you a better idea of how this interface is expected to work.
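Concretely, the first example with that one change would look like this:

func (r rot13Reader) Read(b []byte) (n int, err error) {
    n, err = r.r.Read(b)
    for i, v := range b[:n] { // transform only the bytes actually read
        b[i] = rot13(v)
    }
    return
}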
The Read() operation mutates the input array b. In the second example the rot13() operations are overwritten by the Read() operation. Furthermore, the rot13() operation is performed before any data has been read into the array, so you're probably doing rot13() on garbage data.
If you wanted the second example to work you'd need to write something like this:
func (r rot13Reader) Read(b []byte) (int, error) {
    n, err := r.r.Read(b)
    for i, v := range b {
        b[i] = rot13(v)
    }
    return n, err
}