Assume I have the following structure:
type Hdr struct {
    Src      uint16
    Dst      uint16
    Priority byte
    Pktcnt   byte
    Opcode   byte
    Ver      byte
}
I have two functions Marshal and Unmarshal that encode Hdr to and from a binary format of:
 0                   1
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|              Src              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|              Dst              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Prio  |  Cnt  | Opcode|  Ver  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
I'd like to use Go Fuzz to make random, valid Hdr instances, Marshal them to binary, Unmarshal the binary, and make sure the output matches the original input.
The main issue I am having is that I cannot figure out how to tell Go Fuzz that fields like Priority cannot be greater than 15; otherwise they get truncated when marshalled (the field is only 4 bits wide). How do I set this constraint?
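For illustration, Marshal and Unmarshal might look roughly like this (a sketch only; the real implementations are not shown here, and big-endian byte order is an assumption):
import (
    "encoding/binary"
    "fmt"
)

// Hypothetical implementations matching the wire format above.
// Note that Marshal silently truncates the 4-bit fields, which is
// exactly the mismatch the fuzz test needs to account for.
func Marshal(h Hdr) ([]byte, error) {
    b := make([]byte, 6)
    binary.BigEndian.PutUint16(b[0:2], h.Src)
    binary.BigEndian.PutUint16(b[2:4], h.Dst)
    b[4] = h.Priority<<4 | h.Pktcnt&0x0f
    b[5] = h.Opcode<<4 | h.Ver&0x0f
    return b, nil
}

func Unmarshal(b []byte) (Hdr, error) {
    if len(b) != 6 {
        return Hdr{}, fmt.Errorf("expected 6 bytes, got %d", len(b))
    }
    return Hdr{
        Src:      binary.BigEndian.Uint16(b[0:2]),
        Dst:      binary.BigEndian.Uint16(b[2:4]),
        Priority: b[4] >> 4,
        Pktcnt:   b[4] & 0x0f,
        Opcode:   b[5] >> 4,
        Ver:      b[5] & 0x0f,
    }, nil
}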
Update
This is just a toy case. There are many times with protocols like the above where something like the opcode would trigger secondary, more complex parsing/vetting. Fuzzing could still find very useful issues within a constraint (e.g. if Prio is 0x00 and Cnt is 0x2F, the secondary parser will error because the delimiter is \).
EDIT
I'm not sure fuzzing is a good fit here. Fuzzing is designed to find unexpected inputs: multi-byte UTF-8 inputs (valid and invalid), negative values, huge values, long lengths, etc. These try to catch "edge" cases.
In your case, you know:
the Unmarshal input payload must be 6 bytes (it should error otherwise)
precisely where your internal "edges" are
so vanilla testing.T tests may be a better fit here.
Keep it simple.
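For example, a plain round-trip test over hand-picked boundary values (a sketch; the values are arbitrary):
func TestRoundTrip(t *testing.T) {
    cases := []Hdr{
        {}, // all zeros
        {Src: 0xffff, Dst: 0xffff, Priority: 15, Pktcnt: 15, Opcode: 15, Ver: 15}, // 4-bit maximums
        {Src: 1, Dst: 2, Priority: 3, Pktcnt: 4, Opcode: 5, Ver: 6},
    }
    for _, want := range cases {
        bs, err := Marshal(want)
        if err != nil {
            t.Fatalf("Marshal(%+v): %v", want, err)
        }
        got, err := Unmarshal(bs)
        if err != nil {
            t.Fatalf("Unmarshal(% x): %v", bs, err)
        }
        if got != want { // Hdr has only comparable fields
            t.Errorf("round trip: got %+v, want %+v", got, want)
        }
    }
}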
If you don't want to "waste" a Fuzz input & you know the input constraints of your code, you can try something like this:
func coerce(h *Hdr) (skip bool) {
    h.Priority &= 0x0f // ensure priority is 0-15
    h.Opcode %= 20     // ensure opcode is 0-19
    return false       // or return true to skip this input
}
and in your test the coerced value can be tested, or skipped (as jch showed):
import "github.com/google/go-cmp/cmp"
f.Fuzz(func(t *testing.T, src, dst uint16, pri, count, op, ver byte) {
h := Hdr{src, dst, pri, count, op, ver}
if coerce(&h) {
t.Skip()
return
}
bs, err := Marshal(h) // check err
h2, err := Unmarhsal(bs) // check err
if !cmp.Equal(h, h2) {
t.Errorf("Marshal/Unmarshal validation failed for: %+v", h)
}
}
In order to skip uninteresting results, call t.Skip in your fuzzing function. Something like this:
f.Fuzz(func(t *testing.T, b []byte) {
    a, err := Unmarshal(b)
    if err != nil {
        t.Skip()
        return
    }
    c, err := Marshal(a)
    if err != nil || !bytes.Equal(b, c) {
        t.Errorf("Eek!")
    }
})
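It also helps to seed the corpus with a few known-valid encodings via f.Add, so the fuzzer starts from well-formed inputs (a sketch; the seed bytes below are made up):
func FuzzRoundTrip(f *testing.F) {
    f.Add([]byte{0x00, 0x01, 0x00, 0x02, 0x34, 0x56}) // a plausible 6-byte header
    f.Fuzz(func(t *testing.T, b []byte) {
        // ... body as above
    })
}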
Related
I'm trying to do direct I/O on Linux, so I need to create memory-aligned buffers. I copied some code to do it, but I don't understand how it works:
package main

import (
    "fmt"
    "unsafe"

    "golang.org/x/sys/unix"
)

const (
    AlignSize = 4096
    BlockSize = 4096
)

// Looks like dark magic
func Alignment(block []byte, AlignSize int) int {
    return int(uintptr(unsafe.Pointer(&block[0])) & uintptr(AlignSize-1))
}

func main() {
    path := "/path/to/file.txt"
    fd, err := unix.Open(path, unix.O_RDONLY|unix.O_DIRECT, 0666)
    if err != nil {
        panic(err)
    }
    defer unix.Close(fd)

    file := make([]byte, 4096*2)
    a := Alignment(file, AlignSize)
    offset := 0
    if a != 0 {
        offset = AlignSize - a
    }
    file = file[offset : offset+BlockSize]

    n, readErr := unix.Pread(fd, file, 0)
    if readErr != nil {
        panic(readErr)
    }
    fmt.Println(a, offset, offset+BlockSize, len(file), n)
    fmt.Println("Content is: ", string(file))
}
I understand that I'm generating a slice twice as big as what I need, and then extracting a memory-aligned block from it, but the Alignment function doesn't make sense to me.
How does the Alignment function work?
If I try to fmt.Println the intermediate steps of that function I get different results. Why? I guess because observing it changes its memory alignment (like in quantum physics :D).
Edit:
Example with fmt.Println, where I don't need any further alignment:
package main

import (
    "fmt"

    "golang.org/x/sys/unix"
)

func main() {
    path := "/path/to/file.txt"
    fd, err := unix.Open(path, unix.O_RDONLY|unix.O_DIRECT, 0666)
    if err != nil {
        panic(err)
    }
    defer unix.Close(fd)

    file := make([]byte, 4096)
    fmt.Println("Pointer: ", &file[0])
    n, readErr := unix.Pread(fd, file, 0)
    fmt.Println("Return is: ", n)
    if readErr != nil {
        panic(readErr)
    }
    fmt.Println("Content is: ", string(file))
}
Your AlignSize has a value that is a power of 2. In binary representation it contains a single 1 bit followed by zeros:
fmt.Printf("%b", AlignSize) // 1000000000000
A slice allocated by make() may have a memory address that is more or less random, consisting of ones and zeros following randomly in binary (more precisely, this is the starting address of its backing array).
Since you allocate twice the required size, that guarantees the backing array covers an address space that contains an address somewhere in the middle which ends with as many zeros as AlignSize's binary representation, and which has BlockSize room in the array starting from it. We want to find this address.
This is what the Alignment() function does. It gets the starting address of the backing array with &block[0]. In Go there's no pointer arithmetic, so in order to do something like that, we have to convert the pointer to an integer (there is integer arithmetic of course). In order to do that, we have to convert the pointer to unsafe.Pointer: all pointers are convertible to this type, and unsafe.Pointer can be converted to uintptr (which is an unsigned integer large enough to store the uninterpreted bits of a pointer value), on which–being an integer–we can perform integer arithmetic.
We use bitwise AND with the value uintptr(AlignSize-1). Since AlignSize is a power of 2 (it contains a single 1 bit followed by zeros), one less than it is a number whose binary representation is all ones: as many ones as AlignSize has trailing zeros. See this example:
x := 0b1010101110101010101
fmt.Printf("AlignSize : %22b\n", AlignSize)
fmt.Printf("AlignSize-1 : %22b\n", AlignSize-1)
fmt.Printf("x : %22b\n", x)
fmt.Printf("result of & : %22b\n", x&(AlignSize-1))
Output:
AlignSize : 1000000000000
AlignSize-1 : 111111111111
x : 1010101110101010101
result of & : 110101010101
So the result of & tells how far the starting address is past the last multiple of AlignSize. If it is not 0, subtracting it from AlignSize gives the offset to the next address that has as many trailing zeros as AlignSize itself: that address is "aligned" to a multiple of AlignSize.
So we will use the part of the file slice starting at offset, and we only need BlockSize:
file = file[offset : offset+BlockSize]
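To see the whole dance in isolation, here is a small self-contained sketch of the same alignment math (no direct I/O, so it runs anywhere; the printed check should always be 0):
package main

import (
    "fmt"
    "unsafe"
)

const AlignSize = 4096

func main() {
    block := make([]byte, AlignSize*2)
    a := int(uintptr(unsafe.Pointer(&block[0])) & uintptr(AlignSize-1))
    offset := 0
    if a != 0 {
        offset = AlignSize - a
    }
    aligned := block[offset : offset+AlignSize]
    // The first element of the aligned slice now sits on an AlignSize boundary:
    fmt.Println(uintptr(unsafe.Pointer(&aligned[0])) % AlignSize) // 0
}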
Edit:
Looking at your modified code that prints the intermediate steps, I get output like:
Pointer: 0xc0000b6000
Unsafe pointer: 0xc0000b6000
Unsafe pointer, uintptr: 824634466304
Unpersand: 0
Cast to int: 0
Return is: 0
Content is:
Note nothing is changed here. Simply the fmt package prints pointer values using hexadecimal representation, prefixed by 0x. uintptr values are printed as integers, using decimal representation. Those values are equal:
fmt.Println(0xc0000b6000, 824634466304) // output: 824634466304 824634466304
Also note that the remainder is 0 because in my case 0xc0000b6000 is already a multiple of 4096; in binary it is 1100000000000000000010110110000000000000.
Edit #2:
When you use fmt.Println() to debug parts of the calculation, that may change escape analysis and thus the allocation of the slice (from stack to heap). This also depends on the Go version used. Do not rely on your slice being allocated at an address that is (already) aligned to AlignSize.
See related questions for more details:
Mix print and fmt.Println and stack growing
why struct arrays comparing has different result
Addresses of slices of empty structs
The Problem:
Right now I'm logging my SQL query and the args related to that query, but what will happen if my args weigh a lot? Say, 100MB?
The Solution:
I want to iterate over the args, and once they exceed 0.5MB take the args up to that point and log only them (of course, I'll still use the entire args set in the actual SQL query).
Where I am stuck:
I find it hard to determine the size of an interface{} value.
How can I print it? (Is there a nicer way to do it than %v?)
The concern is mainly the first part: how can I find the size? I'd need to know the type, whether it's an array, on the stack or the heap, etc.
If code helps, here is my code structure (everything sits in the dal pkg, in a util file):
package dal

import (
    "fmt"
)

const limitedLogArgsSizeB = 100000 // ~ 0.1MB

func parsedArgs(args ...interface{}) string {
    currentSize := 0
    var res string
    for i := 0; i < len(args); i++ {
        currentEleSize := getSizeOfElement(args[i])
        if currentSize+currentEleSize > limitedLogArgsSizeB {
            break
        }
        currentSize += currentEleSize
        res = fmt.Sprintf("%s, %v", res, args[i])
    }
    return "[" + res + "]"
}

func getSizeOfElement(ele interface{}) (sizeInBytes int) {
    // this is the part I don't know how to implement
    return
}
So as you can see I expect to get back from parsedArgs() a string that looks like:
"[4378233, 33, true]"
For completeness, the query that goes with it:
INSERT INTO Person (id,age,is_healthy) VALUES ($0,$1,$2)
So, to demonstrate the point of all of this:
let's say the first two args together are exactly at the size threshold I want to log; then I will only get back from parsedArgs() the first two args as a string, like this:
"[4378233, 33]"
I can provide further details upon request, Thanks :)
Getting the memory size of arbitrary values (arbitrary data structures) is not impossible but "hard" in Go. For details, see How to get memory size of variable in Go?
The easiest solution could be to produce the data to be logged in memory, and you can simply truncate it before logging (e.g. if it's a string or a byte slice, simply slice it). This is however not the gentlest solution (slower and requires more memory).
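A sketch of that naive approach (reusing limitedLogArgsSizeB from your code):
s := fmt.Sprintf("%v", args)
if len(s) > limitedLogArgsSizeB {
    s = s[:limitedLogArgsSizeB] // careful: this may split a multi-byte rune
}
log.Println(s)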
Instead I would achieve what you want differently. I would try to assemble the data to be logged, but I would use a special io.Writer as the target (which may be targeted at your disk or at an in-memory buffer) which keeps track of the bytes written to it, and once a limit is reached, it could discard further data (or report an error, whatever suits you).
You can see a counting io.Writer implementation here: Size in bits of object encoded to JSON?
type CounterWr struct {
    io.Writer
    Count int
}

func (cw *CounterWr) Write(p []byte) (n int, err error) {
    n, err = cw.Writer.Write(p)
    cw.Count += n
    return
}
We can easily change it to become a functional limited-writer:
type LimitWriter struct {
    io.Writer
    Remaining int
}

func (lw *LimitWriter) Write(p []byte) (n int, err error) {
    if lw.Remaining == 0 {
        return 0, io.EOF
    }
    if lw.Remaining < len(p) {
        p = p[:lw.Remaining]
    }
    n, err = lw.Writer.Write(p)
    lw.Remaining -= n
    return
}
And you can use the fmt.FprintXXX() functions to write into a value of this LimitWriter.
An example writing to an in-memory buffer:
buf := &bytes.Buffer{}
lw := &LimitWriter{
Writer: buf,
Remaining: 20,
}
args := []interface{}{1, 2, "Looooooooooooong"}
fmt.Fprint(lw, args)
fmt.Printf("%d %q", buf.Len(), buf)
This will output (try it on the Go Playground):
20 "[1 2 Looooooooooooon"
As you can see, our LimitWriter only allowed to write 20 bytes (LimitWriter.Remaining), and the rest were discarded.
Note that in this example I assembled the data in an in-memory buffer, but in your logging system you can write directly to your logging stream, just wrap it in LimitWriter (so you can completely omit the in-memory buffer).
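For example (a sketch; os.Stderr stands in for your real logging destination, and query and args are placeholders):
lw := &LimitWriter{Writer: os.Stderr, Remaining: 512}
fmt.Fprintln(lw, query, args) // anything beyond 512 bytes is dropped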
Optimization tip: if you have the arguments as a slice, you may optimize the truncated rendering by using a loop, and stop printing arguments once the limit is reached.
An example doing this:
buf := &bytes.Buffer{}
lw := &LimitWriter{
    Writer:    buf,
    Remaining: 20,
}

args := []interface{}{1, 2, "Loooooooooooooooong", 3, 4, 5}
io.WriteString(lw, "[")
for i, v := range args {
    if _, err := fmt.Fprint(lw, v, " "); err != nil {
        fmt.Printf("Breaking at argument %d, err: %v\n", i, err)
        break
    }
}
io.WriteString(lw, "]")
fmt.Printf("%d %q", buf.Len(), buf)
Output (try it on the Go Playground):
Breaking at argument 3, err: EOF
20 "[1 2 Loooooooooooooo"
The good thing about this is that once we reach the limit, we don't have to produce the string representation of the remaining arguments that would be discarded anyway, saving some CPU (and memory) resources.
I'm trying to understand why my code in Go doesn't work the way I thought it would. When I execute this test, it fails:
func TestConversion(t *testing.T) {
    type myType struct {
        a     uint8
        value uint64
    }
    myVar1 := myType{a: 1, value: 12345}

    var copyFrom []byte
    copyFromHeader := (*reflect.SliceHeader)(unsafe.Pointer(&copyFrom))
    copyFromHeader.Data = uintptr(unsafe.Pointer(&myVar1))
    copyFromHeader.Cap = 9
    copyFromHeader.Len = 9

    copyTo := make([]byte, len(copyFrom))
    for i := range copyFrom {
        copyTo[i] = copyFrom[i]
    }

    myVar2 := (*myType)(unsafe.Pointer(&copyFrom[0]))
    myVar3 := (*myType)(unsafe.Pointer(&copyTo[0]))
    if myVar2.value != myVar3.value {
        t.Fatalf("Expected myVar3.value to be %d, but it is %d", myVar2.value, myVar3.value)
    }
}
The output will be:
slab_test.go:67: Expected myVar3.value to be 12345, but it is 57
However, if I increase copyFromHeader.Data by 1 before copying the data, then it all works fine. Like this:
copyFromHeader.Data = uintptr(unsafe.Pointer(&myVar1)) + 1
I don't understand why it seems to shift the underlying data by one byte.
There are 7 padding bytes between a and value. You're only getting the least significant byte of 12345 (57) in value. When you move copyFrom down by one byte, the values of myVar2.value and myVar3.value are both 48 (the second byte of 12345), so your test passes. It should work if you change 9 to 16.
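You can verify the layout yourself with unsafe (output shown for a typical 64-bit platform):
package main

import (
    "fmt"
    "unsafe"
)

type myType struct {
    a     uint8
    value uint64
}

func main() {
    fmt.Println(unsafe.Sizeof(myType{}))         // 16, not 9
    fmt.Println(unsafe.Offsetof(myType{}.value)) // 8: a is followed by 7 padding bytes
}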
Is there some particular reason you're copying the struct that way?
I'm using https://github.com/coocood/freecache to cache database results, but currently I need to dump bigger chunks on every delete, which costs multiple microseconds extra compared to targeted deletion. fmt.Sprintf("%d_%d_%d") for a pattern like #SUBJECT_#ID1_#ID2 also costs multiple microseconds. Even though that doesn't sound like much, relative to the cache's current response time it is many times slower.
I was thinking of using the library's SetInt/GetInt which works with int64 keys instead of strings.
So let's say I'm storing in a #SUBJECT_#ID1_#ID2 pattern. The Subject is a table or query-segment-range in my code (e.g. everything concerning ACL or product filtering).
Let's take an example where Userright.id is #ID1, User.id is #ID2, and the Subject is ACL. I would build it as something like this:
// const CACHE_SUBJECT_ACL = 0x1
// var userrightID int64 = 0x1
// var userID int64 = 0x1
var storeKey int64 = 0x1000000101
fmt.Println("Range: ", storeKey&0xff)
fmt.Println("ID1 : ", storeKey&0xfffffff00-0xff)
fmt.Println("ID2 : ", storeKey&0x1fffffff00000000-0xfffffffff)
How can I compile the CACHE_SUBJECT_ACL/userrightID/userID into the storeKey?
I know I can set userrightID to 0x100000001, but it's a dynamic value, so I'm not sure of the best way to compose this without causing more overhead than formatting the string as a key.
The idea is that in a later state when I need to flush the cache I can call a small range of int64 calls instead of just dumping a whole partition (of maybe thousands of entries).
I was thinking of adding them to each other with bit shifting, like userID<<8, but I'm not sure if that's the safe route.
If I failed to supply enough information, please ask.
Packing numbers to an int64
If we can make sure the numbers we want to pack are not negative and they fit into the bit range we're reserving for them, then yes, this is a safe and efficient way to pack them.
An int64 has 64 bits; that's how many we can distribute among the parts we want to pack into it. Often the sign bit is not used to avoid confusion, or an unsigned version uint64 is used.
For example, if we reserve 8 bits for subject, that leaves 64-8=56 bits for the rest, 28 bits for each ID.
Encoded key bits: |63 ... 36|35 ... 8|7 ... 0|
                  |   ID2   |  ID1   |  SUB  |
Note that when encoding, it's recommended to also use a bitmask with bitwise AND to make sure the numbers we pack do not overlap each other's range (arguable, because if the components are bigger than expected, we're in trouble anyway...).
Also note that if we're also using the sign bit (bit 63), we have to apply the mask after the bitshift when decoding, as shifting right "brings in" the sign bit and not 0 (the sign bit is 1 for negative numbers).
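A quick illustration of that sign-extension behavior:
var key int64 = -1 // sign bit set, all bits 1
fmt.Println(key >> 36)               // -1: the right shift brings in 1s, not 0s
fmt.Println((key >> 36) & 0xfffffff) // 268435455: masking recovers the 28-bit field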
Since we used 28 bits for both ID1 and ID2, we can use the same mask for both IDs:
Use these short utility functions which get the job done:
const (
    maskSubj = 0xff
    maskId   = 0xfffffff
)

func encode(subj, id1, id2 int64) int64 {
    return subj&maskSubj | (id1&maskId)<<8 | (id2&maskId)<<36
}

func decode(key int64) (sub, id1, id2 int64) {
    return key & maskSubj, (key >> 8) & maskId, (key >> 36) & maskId
}
Testing it:
key := encode(0x01, 0x02, 0x04)
fmt.Printf("%016x\n", key)
fmt.Println(decode(key))
Output (try it on the Go Playground):
0000004000000201
1 2 4
Sticking to string
Originally you explored packing into an int64 because fmt.Sprintf() was slow. Note that Sprintf() takes time to parse the format string and to format the arguments according to the rules it lays out.
But in your case we don't need this. We can simply get what you originally wanted like this:
id2, id1, subj := 0x04, 0x02, 0x01
key := fmt.Sprint(id2, "_", id1, "_", subj)
fmt.Println(key)
Output:
4_2_1
This one will be significantly faster as it doesn't have to process a format string, it will just concatenate the arguments.
We can even do better: when no two adjacent operands are string values, a space is automatically inserted between them, so it's really enough to just list the numbers:
key = fmt.Sprint(id2, id1, subj)
fmt.Println(key)
Output:
4 2 1
Try these on the Go Playground.
Utilizing strconv.AppendInt()
We can improve it further by using strconv.AppendInt(). This function appends the textual representation of an integer to a byte slice. We can use base 16 for a more compact representation, and also because converting a number to base 16 is faster than to base 10:
func encode(subj, id1, id2 int64) string {
    b := make([]byte, 0, 20)
    b = strconv.AppendInt(b, id2, 16)
    b = append(b, '_')
    b = strconv.AppendInt(b, id1, 16)
    b = append(b, '_')
    b = strconv.AppendInt(b, subj, 16)
    return string(b)
}
Testing it:
id2, id1, subj := int64(0x04), int64(0x02), int64(0x01)
key := encode(subj, id1, id2)
fmt.Println(key)
Output (try it on the Go Playground):
4_2_1
I seem to have figured it out:
const CacheSubjectACL = 1
var userrightID int64 = 8
var userID int64 = 2
storeKey := CacheSubjectACL + (userrightID << 8) + (userID << 36)
fmt.Println("storeKey: ", storeKey)
fmt.Println("Range : ", storeKey&0xff)
fmt.Println("ID1 : ", storeKey&0xfffffff00>>8)
fmt.Println("ID2 : ", storeKey&0x1ffffff000000000>>36)
Gives:
storeKey: 137438955521
Range : 1
ID1 : 8
ID2 : 2
storeKey builds the packed int64. Masking, combined with a shift back the other way, fishes the original values out of the int64 again.
Because storeKey&0x1ffffff000000000>>36 runs to the end anyway, storeKey>>36 will suffice too, as there are no bits further left.
http://play.golang.org/p/RQXB-hCq_M
type Header struct {
    ByteField1 uint32    // 4 bytes
    ByteField2 [32]uint8 // 32 bytes
    ByteField3 [32]uint8 // 32 bytes
    SkipField1 []SomethingElse
}

func main() {
    var header Header
    headerBytes := make([]byte, 68) // 4 + 32 + 32 == 68
    headerBuf := bytes.NewBuffer(headerBytes)
    err := binary.Read(headerBuf, binary.LittleEndian, &header)
    if err != nil {
        fmt.Println(err)
    }
    fmt.Println(header)
}
I don't want to read from the buffer into the header struct in chunks. I want to read into the byte fields in one step but skip the non-byte fields. If you run the program at the given link (http://play.golang.org/p/RQXB-hCq_M) you will find that binary.Read throws an error: binary.Read: invalid type []main.SomethingElse
Is there a way that I can skip this field?
Update:
Based on dommage's answer, I decided to embed the fields inside the struct instead like this
http://play.golang.org/p/i0xfmnPx4A
You can cause a field to be skipped by prefixing its name with _ (underscore).
But: binary.Read() requires all fields to have a known size. If SkipField1 is of variable or unknown length then you have to leave it out of your struct.
You could then use io.Reader.Read() to manually skip over the skip field portion of your input and then call binary.Read() again.
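A sketch combining both techniques (the fixed 16-byte padding and the variable-length trailer are made-up examples):
import (
    "encoding/binary"
    "io"
)

type Header struct {
    ByteField1 uint32
    ByteField2 [32]uint8
    ByteField3 [32]uint8
    _          [16]byte // blank field: binary.Read reads and discards these bytes
}

func readHeader(r io.Reader, trailerLen int64) (Header, error) {
    var h Header
    if err := binary.Read(r, binary.LittleEndian, &h); err != nil {
        return h, err
    }
    // Manually skip a variable-length portion that binary.Read can't handle:
    if _, err := io.CopyN(io.Discard, r, trailerLen); err != nil {
        return h, err
    }
    return h, nil
}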