LimitedReader reads only once - go

I'm trying to understand Go by studying the gopl book. I'm stuck trying to implement the LimitReader function. I realized that I have two problems, so let me separate them.
First issue
The description from official doc is saying that:
A LimitedReader reads from R but limits the amount of data returned to just N bytes. Each call to Read updates N to reflect the new amount remaining. Read returns EOF when N <= 0 or when the underlying R returns EOF.
OK, so my understanding is that I can read from the io.Reader many times but will always be limited to N bytes. Running this code shows me something different:
package main

import (
	"fmt"
	"io"
	"log"
	"strings"
)

func main() {
	r := strings.NewReader("some io.Reader stream to be read\n")
	lr := io.LimitReader(r, 4)

	b := make([]byte, 7)
	n, err := lr.Read(b)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("Read %d bytes: %s\n", n, b)

	b = make([]byte, 5)
	n, _ = lr.Read(b) // error check dropped because this Read returns EOF
	fmt.Printf("Read %d bytes: %s\n", n, b)
}

// Output:
// Read 4 bytes: some
// Read 0 bytes:
// I expected the next 4 bytes instead
It seems that this type of object is able to read only once. I'm not quite sure, but maybe this line in the io.go source code could be changed to l.N = 0. The main question is: why is this code inconsistent with the doc description?
Second issue
While struggling with the first issue I tried to display the current N value. If I add fmt.Println(lr.N) to the code above, it does not compile: lr.N undefined (type io.Reader has no field or method N). I realized that I still don't understand Go's interface concept.
Here is my POV (based on the listing above). Using the io.LimitReader function I create a LimitedReader object (see source code). Because this object has a Read method with the proper signature, it satisfies the io.Reader interface. That's the reason why io.LimitReader returns io.Reader, right? OK, so everything works together.
The question is: why can't lr.N be accessed? If I understood the book correctly, an interface type only requires that a data type has some method(s). Nothing more.

LimitedReader limits the total amount of data that can be read, not the amount that can be read per Read call. That is, if you set the limit to 4, you can perform 4 reads of 1 byte, or 1 read of 4 bytes, and after that, all reads will return EOF.
For your second question: lr is an io.Reader, so you cannot access lr.N. However, you can reach the underlying concrete type with a type assertion: lr.(*io.LimitedReader).N should work.
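A minimal sketch putting both answers together (the loop and variable names are mine; io.LimitReader and *io.LimitedReader are from the standard library):

package main

import (
	"fmt"
	"io"
	"strings"
)

func main() {
	r := strings.NewReader("some io.Reader stream to be read\n")
	lr := io.LimitReader(r, 4)

	// The limit is a total: two 2-byte reads exhaust it.
	b := make([]byte, 2)
	for {
		n, err := lr.Read(b)
		if n > 0 {
			fmt.Printf("Read %d bytes: %s\n", n, b[:n])
		}
		if err != nil {
			fmt.Println("err:", err) // io.EOF once N is used up
			break
		}
	}

	// N lives on the concrete *io.LimitedReader, not on the io.Reader
	// interface, so a type assertion is needed to reach it.
	fmt.Println("remaining:", lr.(*io.LimitedReader).N)
}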

Related

How is this code generating memory aligned slices?

I'm trying to do direct I/O on Linux, so I need to create memory-aligned buffers. I copied some code that does it, but I don't understand how it works:
package main

import (
	"fmt"
	"unsafe"

	"golang.org/x/sys/unix"
)

const (
	AlignSize = 4096
	BlockSize = 4096
)

// Looks like dark magic
func Alignment(block []byte, AlignSize int) int {
	return int(uintptr(unsafe.Pointer(&block[0])) & uintptr(AlignSize-1))
}

func main() {
	path := "/path/to/file.txt"
	fd, err := unix.Open(path, unix.O_RDONLY|unix.O_DIRECT, 0666)
	if err != nil {
		panic(err)
	}
	defer unix.Close(fd)

	file := make([]byte, 4096*2)
	a := Alignment(file, AlignSize)
	offset := 0
	if a != 0 {
		offset = AlignSize - a
	}
	file = file[offset : offset+BlockSize]

	n, readErr := unix.Pread(fd, file, 0)
	if readErr != nil {
		panic(readErr)
	}
	fmt.Println(n, a, offset, offset+BlockSize, len(file))
	fmt.Println("Content is: ", string(file))
}
I understand that I'm allocating a slice twice as big as what I need and then extracting a memory-aligned block from it, but the Alignment function doesn't make sense to me.
How does the Alignment function work?
If I fmt.Println the intermediate steps of that function I get different results. Why? I guess observing it changes its memory alignment (like in quantum physics :D).
Edit:
Example with fmt.Println, where I no longer need any alignment:
package main

import (
	"fmt"

	"golang.org/x/sys/unix"
)

func main() {
	path := "/path/to/file.txt"
	fd, err := unix.Open(path, unix.O_RDONLY|unix.O_DIRECT, 0666)
	if err != nil {
		panic(err)
	}
	defer unix.Close(fd)

	file := make([]byte, 4096)
	fmt.Println("Pointer: ", &file[0])
	n, readErr := unix.Pread(fd, file, 0)
	fmt.Println("Return is: ", n)
	if readErr != nil {
		panic(readErr)
	}
	fmt.Println("Content is: ", string(file))
}
Your AlignSize is a power of 2. In binary representation it is a single 1 bit followed by zeros:
fmt.Printf("%b", AlignSize) // 1000000000000
A slice allocated by make() may have a more or less random memory address, consisting of randomly following ones and zeros in binary; or more precisely, the starting address of its backing array may.
Since you allocate twice the required size, you are guaranteed that the backing array covers an address somewhere in the middle that ends with as many zeros as AlignSize's binary representation has, and that there is BlockSize room in the array starting at that address. We want to find this address.
This is what the Alignment() function does. It gets the starting address of the backing array with &block[0]. In Go there is no pointer arithmetic, so to do something like this we have to convert the pointer to an integer (there is integer arithmetic, of course). To do that, we first convert the pointer to unsafe.Pointer: all pointers are convertible to this type, and unsafe.Pointer can be converted to uintptr (an unsigned integer large enough to store the uninterpreted bits of a pointer value), on which, being an integer, we can perform integer arithmetic.
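As a tiny illustration of that conversion chain (the variable names here are mine; block and AlignSize are from the Alignment function):

p := &block[0]                  // *byte: address of the backing array's first element
u := uintptr(unsafe.Pointer(p)) // the same address as an unsigned integer
mis := u & uintptr(AlignSize-1) // ordinary integer arithmetic applies now
_ = mis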
We use bitwise AND with the value uintptr(AlignSize-1). Since AlignSize is a power of 2 (a single 1 bit followed by zeros), AlignSize-1 is a number whose binary representation is all ones, as many as AlignSize has trailing zeros. See this example:
x := 0b1010101110101010101
fmt.Printf("AlignSize : %22b\n", AlignSize)
fmt.Printf("AlignSize-1 : %22b\n", AlignSize-1)
fmt.Printf("x : %22b\n", x)
fmt.Printf("result of & : %22b\n", x&(AlignSize-1))
Output:
AlignSize : 1000000000000
AlignSize-1 : 111111111111
x : 1010101110101010101
result of & : 110101010101
So the result of & is how far the address is past the last multiple of AlignSize. If you subtract it from AlignSize, you get the offset you must add so that the address has as many trailing zeros as AlignSize itself: that address is "aligned" to a multiple of AlignSize.
So we will use the part of the file slice starting at offset, and we only need BlockSize:
file = file[offset : offset+BlockSize]
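Putting the whole trick into one helper, a minimal sketch (the name AlignedBlock is mine; it assumes the AlignSize and BlockSize constants and the unsafe import from the question):

// AlignedBlock returns a BlockSize-long slice whose first byte lies
// at an address that is a multiple of AlignSize.
func AlignedBlock() []byte {
	buf := make([]byte, BlockSize+AlignSize) // extra room to shift within
	a := int(uintptr(unsafe.Pointer(&buf[0])) & uintptr(AlignSize-1))
	offset := 0
	if a != 0 {
		offset = AlignSize - a
	}
	return buf[offset : offset+BlockSize]
}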
Edit:
Looking at your modified code that tries to print the steps, I get output like:
Pointer: 0xc0000b6000
Unsafe pointer: 0xc0000b6000
Unsafe pointer, uintptr: 824634466304
Unpersand: 0
Cast to int: 0
Return is: 0
Content is:
Note that nothing has changed here. The fmt package simply prints pointer values using hexadecimal representation, prefixed with 0x. uintptr values are printed as integers, using decimal representation. Those values are equal:
fmt.Println(0xc0000b6000, 824634466304) // output: 824634466304 824634466304
Also note the result is 0 because in my case 0xc0000b6000 is already a multiple of 4096; in binary it is 1100000000000000000010110110000000000000.
Edit #2:
When you use fmt.Println() to debug parts of the calculation, that may change the escape analysis results and thus the allocation of the slice (from stack to heap). This also depends on the Go version used. Do not rely on your slice being allocated at an address that is already aligned to AlignSize.
See related questions for more details:
Mix print and fmt.Println and stack growing
why struct arrays comparing has different result
Addresses of slices of empty structs

Reading from a file with bufio with a semi-complex sequencing through the file

So there may be questions like this already, but it's not a super easy thing to google. Basically I have a file that's a set of protobufs encoded and sequenced as they normally are per the protobuf spec.
So think of the byte values being chunked something like this throughout the file:
[EncodeVarInt(size of protobuf struct)] [protobuf struct bytes]
So you have a few bytes, read one at a time, that are used for one large jump read of our protobuf structure.
My implementation, using the os ReadAt method on a file, currently looks something like this.
// getting the next value in a file context feature
func (geobuf *Geobuf_Reader) Next() bool {
	if geobuf.EndPos <= geobuf.Pos {
		return false
	}
	startpos := int64(geobuf.Pos)
	for int(geobuf.Get_Byte(geobuf.Pos)) > 127 {
		geobuf.Pos += 1
	}
	geobuf.Pos += 1

	sizebytes := make([]byte, geobuf.Pos-int(startpos))
	geobuf.File.ReadAt(sizebytes, startpos)
	size, _ := DecodeVarint(sizebytes)

	geobuf.Feat_Pos = [2]int{int(size), geobuf.Pos}
	geobuf.Pos = geobuf.Pos + int(size)
	return true
}

// reads a geobuf feature as geojson
func (geobuf *Geobuf_Reader) Feature() *geojson.Feature {
	// getting raw bytes
	a := make([]byte, geobuf.Feat_Pos[0])
	geobuf.File.ReadAt(a, int64(geobuf.Feat_Pos[1]))
	return Read_Feature(a)
}
How can I implement something like bufio or another chunked reading mechanism to speed up so many file ReadAts? Most bufio examples I've seen rely on a specific delimiter. Thanks in advance; hopefully this wasn't a horrible question.
Package bufio
import "bufio"
type SplitFunc
SplitFunc is the signature of the split function used to tokenize the
input. The arguments are an initial substring of the remaining
unprocessed data and a flag, atEOF, that reports whether the Reader
has no more data to give. The return values are the number of bytes to
advance the input and the next token to return to the user, plus an
error, if any. If the data does not yet hold a complete token, for
instance if it has no newline while scanning lines, SplitFunc can
return (0, nil, nil) to signal the Scanner to read more data into the
slice and try again with a longer slice starting at the same point in
the input.
If the returned error is non-nil, scanning stops and the error is
returned to the client.
The function is never called with an empty data slice unless atEOF is
true. If atEOF is true, however, data may be non-empty and, as always,
holds unprocessed text.
type SplitFunc func(data []byte, atEOF bool) (advance int, token []byte, err error)
Use bufio.Scanner and write a custom SplitFunc for your protobuf framing.
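A minimal sketch of such a SplitFunc, assuming the framing from the question (a standard uvarint length prefix before each message; the names splitVarintMsg and features.geobuf are mine):

package main

import (
	"bufio"
	"encoding/binary"
	"fmt"
	"os"
)

// splitVarintMsg tokenizes a stream of [uvarint length][message bytes] frames.
func splitVarintMsg(data []byte, atEOF bool) (advance int, token []byte, err error) {
	if atEOF && len(data) == 0 {
		return 0, nil, nil // clean end of input
	}
	size, n := binary.Uvarint(data)
	if n <= 0 {
		if atEOF {
			return 0, nil, fmt.Errorf("truncated or malformed varint prefix")
		}
		return 0, nil, nil // prefix incomplete: ask the Scanner for more data
	}
	if len(data) < n+int(size) {
		if atEOF {
			return 0, nil, fmt.Errorf("truncated message")
		}
		return 0, nil, nil // message body incomplete: ask for more data
	}
	return n + int(size), data[n : n+int(size)], nil
}

func main() {
	f, err := os.Open("features.geobuf") // hypothetical input file
	if err != nil {
		panic(err)
	}
	defer f.Close()

	scanner := bufio.NewScanner(f)
	scanner.Split(splitVarintMsg)
	for scanner.Scan() {
		msg := scanner.Bytes() // one raw protobuf message; valid only until the next Scan
		_ = msg                // e.g. feed it to Read_Feature
	}
	if err := scanner.Err(); err != nil {
		panic(err)
	}
}

The Scanner handles all the buffered, chunked reading for you, so the many small ReadAt calls disappear. If a single feature can exceed the Scanner's default 64 KB token limit, raise it with scanner.Buffer.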

Golang high cpu usage on simple webserver unable to understand why?

So I have a simple net/http webserver. All it does is deliver 100 MB of random bytes, which I intend to use for network speed testing. My handler for the 100 MB endpoint is really simple (pasted below). The code works fine and I get my random byte file; the problem is that when I run this and someone downloads these 100 megabytes, the CPU usage of this program shoots up to 150% and stays there until the handler finishes. Am I doing something very wrong here? What could I do to improve this handler's performance?
func downloadHandler(w http.ResponseWriter, r *http.Request) {
	str := RandStringBytes(8192) // generates 8192 bytes of randomness
	sz := 1000 * 1000 * 100      // 100 megabytes
	iter := sz/len(str) + 1
	w.Header().Set("Content-Type", "application/octet-stream")
	w.Header().Set("Content-Length", strconv.Itoa(sz))
	for i := 0; i < iter; i++ {
		fmt.Fprintf(w, str)
	}
}
The problem is that fmt.Fprintf() expects a format string:
func Fprintf(w io.Writer, format string, a ...interface{}) (n int, err error)
And you pass it a big, 8 KB format string. The fmt package has to analyze the format string; it is not something that gets written to the output as-is. Most definitely this is what is eating your CPU.
If the random string contains the special % sign, that makes your case even worse, as then fmt.Fprintf() may expect further arguments which you don't "deliver", so the fmt package will also include error messages in the output, such as:
fmt.Fprintf(os.Stdout, "aaa%bbb%d")
Output:
aaa%!b(MISSING)bb%!d(MISSING)
Use fmt.Fprint() instead which does not expect a format string:
fmt.Fprint(w, str)
Or even better, convert your random string to a byte slice once, and just keep writing that:
data := []byte(str)
for i := 0; i < iter; i++ {
	if _, err := w.Write(data); err != nil {
		// Handle error, e.g. return
	}
}
Delivering a large amount of data: you won't get a faster solution than writing a prepared byte slice in a loop (maybe slightly, if you vary the size of the slice). If your solution is still "slow", that might be due to your RandStringBytes() function, which we don't know anything about, or your output might be compressed (gzipped) if you use other handlers or some framework (compression uses relatively high CPU). Also, if the client that receives the response is on your computer too (e.g. a browser), it, or a firewall / antivirus software, may check and analyze the response for malicious code, which may also be resource intensive.
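A minimal sketch of the complete handler with these fixes applied (RandStringBytes is the asker's helper; the chunk trimming at the end is my addition so that exactly Content-Length bytes are written):

func downloadHandler(w http.ResponseWriter, r *http.Request) {
	data := []byte(RandStringBytes(8192)) // build the random chunk once
	const sz = 1000 * 1000 * 100          // 100 megabytes

	w.Header().Set("Content-Type", "application/octet-stream")
	w.Header().Set("Content-Length", strconv.Itoa(sz))

	for written := 0; written < sz; written += len(data) {
		chunk := data
		if remaining := sz - written; remaining < len(chunk) {
			chunk = chunk[:remaining] // last iteration: don't exceed Content-Length
		}
		if _, err := w.Write(chunk); err != nil {
			return // client went away; stop writing
		}
	}
}

Note that the original loop writes iter*8192 bytes, which slightly exceeds the advertised Content-Length; the net/http server rejects writes past the declared length.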

Under what circumstances would the two return values of crypto/rand read() ever be useful?

The typical usage of crypto/rand goes something like this:
salt := make([]byte, saltLength)
n, err := rand.Read(salt)
Which fills the byte slice I have labeled "salt" here with a sequence of random bytes.
Under what circumstances might the random number generator fail? Would it be insecure to fall back to a math/rand equivalent in the event that err is not nil?
Since the length of the byte slice is already known, n also seems useless to me. Is there any reason I wouldn't just use _, err in its place?
To be safe your code should look more like this:
package main

import (
	"crypto/rand"
	"fmt"
)

func main() {
	saltLength := 16
	salt := make([]byte, saltLength)
	n, err := rand.Read(salt[:cap(salt)])
	if err != nil {
		// handle error
	}
	salt = salt[:n]
	if len(salt) != saltLength {
		// handle error
	}
	fmt.Println(len(salt), salt)
}
Output:
16 [191 235 81 37 175 238 93 202 230 158 41 199 202 85 67 209]
n may be less than len(salt) if insufficient entropy is available. You should always check for errors.
For example, one of the many ways to obtain a sequence of random numbers is the getrandom system call on Linux or the CryptGenRandom API call on Windows.
References:
random: introduce getrandom(2) system call
CryptGenRandom function
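As an aside, a common alternative sketch folds the short-read check into the error by using io.ReadFull with the package's rand.Reader (standard library API; the variable names are mine):

salt := make([]byte, saltLength)
if _, err := io.ReadFull(rand.Reader, salt); err != nil {
	// handle error: err is non-nil whenever fewer than len(salt) bytes arrived
}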
ADDENDUM:
The crypto/rand package is a cryptographically secure pseudorandom number generator. Package math/rand is not cryptographically secure.
There are too many paths in even a simple program to test them all. Therefore, the only way to write programs with zero defects and zero bugs is to write readable, maintainable code that is provably correct. Systematic Programming by Niklaus Wirth is a good primer. It's worthwhile to spend time constructing a robust general form that can easily be adapted to each special case and is easily maintained as requirements change.
For example, for the io.Reader interface, typical usage is a looping pattern.
func Reader(rdr io.Reader) error {
	bufLen := 256
	buf := make([]byte, bufLen)
	for {
		n, err := rdr.Read(buf[:cap(buf)])
		if n == 0 {
			if err == nil {
				continue
			}
			if err == io.EOF {
				break
			}
			return err
		}
		buf = buf[:n]
		// process read buffer
		if err != nil && err != io.EOF {
			return err
		}
	}
	return nil
}
type Reader
type Reader interface {
Read(p []byte) (n int, err error)
}
Reader is the interface that wraps the basic Read method.
Read reads up to len(p) bytes into p. It returns the number of bytes
read (0 <= n <= len(p)) and any error encountered. Even if Read
returns n < len(p), it may use all of p as scratch space during the
call. If some data is available but not len(p) bytes, Read
conventionally returns what is available instead of waiting for more.
When Read encounters an error or end-of-file condition after
successfully reading n > 0 bytes, it returns the number of bytes read.
It may return the (non-nil) error from the same call or return the
error (and n == 0) from a subsequent call. An instance of this general
case is that a Reader returning a non-zero number of bytes at the end
of the input stream may return either err == EOF or err == nil. The
next Read should return 0, EOF regardless.
Callers should always process the n > 0 bytes returned before
considering the error err. Doing so correctly handles I/O errors that
happen after reading some bytes and also both of the allowed EOF
behaviors.
Implementations of Read are discouraged from returning a zero byte
count with a nil error, and callers should treat that situation as a
no-op.
We only want to allocate the buffer once, before we start the Read loop. However, we want the compiler and runtime to detect if we stray outside the valid buffer length n in the Read loop, so we write buf = buf[:n]. When we loop to the next Read, we explicitly want the full buffer again: buf[:cap(buf)].
It's never wrong to write Read(buf[:cap(buf)]). Even if you don't have a Read loop now, you may add one later, and you may forget to reset the buffer length. There may be a special case for a particular Read implementation, like an underlying ReadFull. Then you would have to read and monitor the underlying code to prove that your code is correct. Documentation is not always reliable. And you couldn't safely switch to another io.Reader implementation.
When you access the salt slice, salt[:len(salt)], you are using len(salt) not n. If they differ, you have a bug.
"implementations should follow a general principle of robustness: be
conservative in what you do, be liberal in what you accept from
others." Jon
Postel

How to read the first four bytes of a file, using Go?

I'm learning Go and am trying to read the first four bytes of a file. I want to check whether the file contains a specific file header that I'm looking for. My code does not display the bytes that I'm expecting, though. Does anybody know why the following code might not work? It does read in some bytes, but they're not bytes I recognize or expected to see. They're not random or anything, because they're the same every time I run it, so it's probably a pointer to something else, or something.
Also, I realize I'm ignoring errors but that's because I went into hack-mode while this wasn't working and removed as much of the cruft as I could, trying to get down to the issue.
package main

import (
	"fmt"
	"io"
	"os"
)

type RoflFile struct {
	identifier []byte
}

func main() {
	arguments := os.Args[1:]
	if len(arguments) != 1 {
		fmt.Println("Usage: <path-to-rofl>")
		return
	}

	inputfile := arguments[0]
	if _, err := os.Stat(inputfile); os.IsNotExist(err) {
		fmt.Printf("Error: the input file could not be found: %s", inputfile)
		return
	}

	rofl := new(RoflFile)
	rofl.identifier = make([]byte, 4)

	// open the input file so that we can pull out data
	f, _ := os.Open(inputfile)

	// read in the file identifier
	io.ReadAtLeast(f, rofl.identifier, 4)
	f.Close()

	fmt.Printf("Got: %+v", rofl)
}
When I run your code against an input file beginning with "9876", I get:
Got: &{identifier:[57 56 55 54]}
When run against an input file beginning with "1234", I get:
Got: &{identifier:[49 50 51 52]}
For me, the program works as expected. Either something is going wrong on your system, or you don't realize that you're getting the decimal value of the first four bytes in the file. Were you expecting hex? Or were you expecting to see the bytes interpreted according to some encoding (e.g., ASCII or UTF-8, seeing "9 8 7 6" instead of "57 56 55 54")?
For future reference (or if this didn't answer your question), it's helpful in these situations to include your input file, the output you get on your system, and the output you expected. "They're not bytes I recognized or expected to see" leaves a lot of possibilities on the table.
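For illustration, here are a few fmt verbs that render the same four bytes in different ways (using the "9876" input above; the "ROFL" magic comparison is a hypothetical example):

id := []byte{57, 56, 55, 54}      // the four bytes read from the "9876" file
fmt.Printf("%v\n", id)            // decimal values: [57 56 55 54]
fmt.Printf("% x\n", id)           // hex: 39 38 37 36
fmt.Printf("%s\n", id)            // interpreted as text: 9876
fmt.Println(string(id) == "ROFL") // compare against an expected magic: false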
