Bitmask multiple values in int64 - go

I'm using https://github.com/coocood/freecache to cache database results, but currently I need to dump bigger chunks on every delete, which costs multiple microseconds extra compared to targeted deletion. fmt.Sprintf("%d_%d_%d") for a pattern like #SUBJECT_#ID1_#ID2 also costs multiple microseconds. Even though that doesn't sound like much, relative to the cache's current response time it makes things many times slower.
I was thinking of using the library's SetInt/GetInt which works with int64 keys instead of strings.
So let's say I'm storing in a #SUBJECT_#ID1_#ID2 pattern. The Subject is a table or query-segment-range in my code (e.g. everything concerning ACL or Productfiltering).
Let's take an example of Userright.id is #ID1 and User.id is #ID2 and Subject ACL. I would build it as something like this:
// const CACHE_SUBJECT_ACL = 0x1
// var userrightID int64 = 0x1
// var userID int64 = 0x1
var storeKey int64 = 0x1000000101
fmt.Println("Range: ", storeKey&0xff)
fmt.Println("ID1 : ", storeKey&0xfffffff00-0xff)
fmt.Println("ID2 : ", storeKey&0x1fffffff00000000-0xfffffffff)
How can I compile the CACHE_SUBJECT_ACL/userrightID/userID into the storeKey?
I know I can call userrightID 0x100000001, but it's a dynamic value so I'm not sure what's the best way to compile this without causing more overhead than formatting the string as a key.
The idea is that in a later state when I need to flush the cache I can call a small range of int64 calls instead of just dumping a whole partition (of maybe thousands of entries).
I was thinking of adding them to each other with bit shifting, like userID<<8, but I'm not sure if that's the safe route.
If I failed to supply enough information, please ask.

Packing numbers to an int64
If we can make sure the numbers we want to pack are not negative and they fit into the bit range we're reserving for them, then yes, this is a safe and efficient way to pack them.
An int64 has 64 bits, that's how many we can assign to the parts we want to pack into it. Often the sign bit is not used to avoid confusion, or an unsigned version uint64 is used.
For example, if we reserve 8 bits for subject, that leaves 64-8=56 bits for the rest, 28 bits for each ID.
                  |     ID2     |     ID1     |SUB|
Encoded key bits: |f f f f f f f|f f f f f f f|f f|
Note that when encoding, it's recommended to also use a bitmask with bitwise AND to make sure the numbers we pack do not overlap (arguable, because if the components are bigger, we're screwed anyway...).
Also note that if we're also using the sign bit (bit 63), we have to apply masking after the bitshift when decoding, as shifting right "brings in" the sign bit and not 0 (sign bit is 1 in case of negative numbers).
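To illustrate the sign-bit remark, here is a minimal sketch. The helper names rawShift and maskedShift are made up for this example; the 28-bit mask and the shift by 36 follow the layout described above.

```go
package main

import "fmt"

const maskID = 0xfffffff // 28 bits, as in the layout above

// rawShift shifts right without masking; for a negative key the
// sign bit is copied into the vacated high bits.
func rawShift(key int64) int64 {
	return key >> 36
}

// maskedShift shifts and then masks, which discards the copied sign bits.
func maskedShift(key int64) int64 {
	return (key >> 36) & maskID
}

func main() {
	var key int64 = -1 << 36 // all bits from 36 up to and including the sign bit are set
	fmt.Println(rawShift(key))    // -1
	fmt.Println(maskedShift(key)) // 268435455
}
```

The masked version returns the 28-bit field value regardless of the sign bit.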
Since we used 28 bits for both ID1 and ID2, we can use the same mask for both IDs:
Use these short utility functions which get the job done:
const (
maskSubj = 0xff
maskId = 0xfffffff
)
func encode(subj, id1, id2 int64) int64 {
return subj&maskSubj | (id1&maskId)<<8 | (id2&maskId)<<36
}
func decode(key int64) (sub, id1, id2 int64) {
return key & maskSubj, (key >> 8) & maskId, (key >> 36) & maskId
}
Testing it:
key := encode(0x01, 0x02, 0x04)
fmt.Printf("%016x\n", key)
fmt.Println(decode(key))
Output (try it on the Go Playground):
0000004000000201
1 2 4
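If silently truncating oversized components is not acceptable, a range check can replace the masking. This is only a sketch under the same 8/28/28-bit layout; encodeChecked is a hypothetical name, not part of freecache or the code above.

```go
package main

import "fmt"

const (
	maskSubj = 0xff      // 8 bits for the subject
	maskID   = 0xfffffff // 28 bits for each ID
)

// encodeChecked packs the three components like encode, but reports an
// error instead of truncating when a component does not fit its bit range.
func encodeChecked(subj, id1, id2 int64) (int64, error) {
	if subj < 0 || subj > maskSubj || id1 < 0 || id1 > maskID || id2 < 0 || id2 > maskID {
		return 0, fmt.Errorf("component out of range: subj=%d id1=%d id2=%d", subj, id1, id2)
	}
	return subj | id1<<8 | id2<<36, nil
}

func main() {
	key, err := encodeChecked(0x01, 0x02, 0x04)
	fmt.Printf("%016x %v\n", key, err) // 0000004000000201 <nil>

	_, err = encodeChecked(0x01, 1<<28, 0x04) // id1 needs 29 bits
	fmt.Println(err != nil)                   // true
}
```

The error path only costs something when a value is actually out of range, so the common case stays a handful of comparisons and shifts.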
Sticking to string
Originally you explored packing into an int64 because fmt.Sprintf() was slow. Note that Sprintf() uses a format string, and it takes time to parse the format string and format the arguments according to the "rules" laid out in the format string.
But in your case we don't need this. We can simply get what you originally wanted like this:
id2, id1, subj := 0x04, 0x02, 0x01
key := fmt.Sprint(id2, "_", id1, "_", subj)
fmt.Println(key)
Output:
4_2_1
This one will be significantly faster as it doesn't have to process a format string, it will just concatenate the arguments.
We can do even better: if neither of two adjacent arguments is a string, a space is automatically inserted between them, so it's enough to just list the numbers:
key = fmt.Sprint(id2, id1, subj)
fmt.Println(key)
Output:
4 2 1
Utilizing strconv.AppendInt()
We can improve it further by using strconv.AppendInt(). This function appends the textual representation of an integer to a byte slice. We can use base 16 so we'll have more compact representation, and also because the algorithm to convert a number to base 16 is faster than to base 10:
func encode(subj, id1, id2 int64) string {
b := make([]byte, 0, 20)
b = strconv.AppendInt(b, id2, 16)
b = append(b, '_')
b = strconv.AppendInt(b, id1, 16)
b = append(b, '_')
b = strconv.AppendInt(b, subj, 16)
return string(b)
}
Testing it:
id2, id1, subj := int64(0x04), int64(0x02), int64(0x01)
key := encode(subj, id1, id2)
fmt.Println(key)
Output (try it on the Go Playground):
4_2_1
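Going the other way (from a string key back to its components) could look roughly like this. decodeKey is a hypothetical helper; it assumes the id2_id1_subj order and base 16 used by the string encode above.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// decodeKey splits a key of the form "id2_id1_subj" (hexadecimal parts)
// back into its three components.
func decodeKey(key string) (subj, id1, id2 int64, err error) {
	parts := strings.Split(key, "_")
	if len(parts) != 3 {
		return 0, 0, 0, fmt.Errorf("malformed key %q", key)
	}
	vals := make([]int64, 3)
	for i, p := range parts {
		if vals[i], err = strconv.ParseInt(p, 16, 64); err != nil {
			return 0, 0, 0, err
		}
	}
	// The key stores id2 first and the subject last.
	return vals[2], vals[1], vals[0], nil
}

func main() {
	fmt.Println(decodeKey("4_2_1")) // 1 2 4 <nil>
}
```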

Seemed to have figured it out:
const CacheSubjectACL = 1
var userrightID int64 = 8
var userID int64 = 2
storeKey := CacheSubjectACL + (userrightID << 8) + (userID << 36)
fmt.Println("storeKey: ", storeKey)
fmt.Println("Range : ", storeKey&0xff)
fmt.Println("ID1 : ", storeKey&0xfffffff00>>8)
fmt.Println("ID2 : ", storeKey&0x1ffffff000000000>>36)
Gives:
storeKey: 137438955521
Range : 1
ID1 : 8
ID2 : 2
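The storeKey line packs the three values into one int64 by shifting each into its own bit range and adding them. Masking and then shifting back the other way extracts the original values again.
Because the mask in storeKey&0x1ffffff000000000>>36 covers everything up to the top of the value anyway, storeKey>>36 suffices too (as long as the key is non-negative), since there are no set bits further to the left.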

Related

How is this code generating memory aligned slices?

I'm trying to do direct i/o on linux, so I need to create memory aligned buffers. I copied some code to do it, but I don't understand how it works:
package main
import (
"fmt"
"golang.org/x/sys/unix"
"unsafe"
"yottaStore/yottaStore-go/src/yfs/test/utils"
)
const (
AlignSize = 4096
BlockSize = 4096
)
// Looks like dark magic
func Alignment(block []byte, AlignSize int) int {
return int(uintptr(unsafe.Pointer(&block[0])) & uintptr(AlignSize-1))
}
func main() {
path := "/path/to/file.txt"
fd, err := unix.Open(path, unix.O_RDONLY|unix.O_DIRECT, 0666)
defer unix.Close(fd)
if err != nil {
panic(err)
}
file := make([]byte, 4096*2)
a := Alignment(file, AlignSize)
offset := 0
if a != 0 {
offset = AlignSize - a
}
file = file[offset : offset+BlockSize]
n, readErr := unix.Pread(fd, file, 0)
if readErr != nil {
panic(readErr)
}
fmt.Println(a, offset, offset+utils.BlockSize, len(file))
fmt.Println("Content is: ", string(file))
}
I understand that I'm generating a slice twice as big as what I need, and then extracting a memory aligned block from it, but the Alignment function doesn't make sense to me.
How does the Alignment function work?
If I try to fmt.Println the intermediate steps of that function I get different results, why? I guess because observing it changes its memory alignment (like in quantum physics :D)
Edit:
Example with fmt.Println, where I don't need any more alignment:
package main
import (
"fmt"
"golang.org/x/sys/unix"
"unsafe"
)
func main() {
path := "/path/to/file.txt"
fd, err := unix.Open(path, unix.O_RDONLY|unix.O_DIRECT, 0666)
defer unix.Close(fd)
if err != nil {
panic(err)
}
file := make([]byte, 4096)
fmt.Println("Pointer: ", &file[0])
n, readErr := unix.Pread(fd, file, 0)
fmt.Println("Return is: ", n)
if readErr != nil {
panic(readErr)
}
fmt.Println("Content is: ", string(file))
}
Your AlignSize is a power of 2. In binary representation it contains a single 1 bit followed only by zeros:
fmt.Printf("%b", AlignSize) // 1000000000000
A slice allocated by make() may have a memory address that is more or less random, consisting of ones and zeros following randomly in binary; or more precisely the starting address of its backing array.
Since you allocate twice the required size, that's a guarantee that the backing array will cover an address space that has an address in the middle somewhere that ends with as many zeros as the AlignSize's binary representation, and has BlockSize room in the array starting at this. We want to find this address.
This is what the Alignment() function does. It gets the starting address of the backing array with &block[0]. In Go there's no pointer arithmetic, so in order to do something like that, we have to convert the pointer to an integer (there is integer arithmetic of course). In order to do that, we have to convert the pointer to unsafe.Pointer: all pointers are convertible to this type, and unsafe.Pointer can be converted to uintptr (which is an unsigned integer large enough to store the uninterpreted bits of a pointer value), on which–being an integer–we can perform integer arithmetic.
We use bitwise AND with the value uintptr(AlignSize-1). Since AlignSize is a power of 2 (contains a single 1 bit followed by zeros), the number one less is a number whose binary representation is full of ones, as many as trailing zeros AlignSize has. See this example:
x := 0b1010101110101010101
fmt.Printf("AlignSize : %22b\n", AlignSize)
fmt.Printf("AlignSize-1 : %22b\n", AlignSize-1)
fmt.Printf("x : %22b\n", x)
fmt.Printf("result of & : %22b\n", x&(AlignSize-1))
Output:
AlignSize : 1000000000000
AlignSize-1 : 111111111111
x : 1010101110101010101
result of & : 110101010101
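So the result of & is how far the start address lies past the previous multiple of AlignSize (the address modulo AlignSize). If it is non-zero, skipping AlignSize minus that many bytes (the offset in your code) lands on an address that has as many trailing zeros as AlignSize itself: that address is "aligned" to a multiple of AlignSize.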
So we will use the part of the file slice starting at offset, and we only need BlockSize:
file = file[offset : offset+BlockSize]
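Put together, the over-allocate-and-slice steps can be wrapped in a helper. This is a sketch only: alignedBlock and isAligned are hypothetical names, and the code assumes align is a power of two, as in the question.

```go
package main

import (
	"fmt"
	"unsafe"
)

// alignedBlock returns a slice of length size whose first byte sits at an
// address that is a multiple of align. align must be a power of two.
func alignedBlock(size, align int) []byte {
	buf := make([]byte, size+align) // over-allocate so an aligned start must exist
	rem := int(uintptr(unsafe.Pointer(&buf[0])) & uintptr(align-1))
	offset := 0
	if rem != 0 {
		offset = align - rem // bytes to skip to reach the next multiple of align
	}
	return buf[offset : offset+size : offset+size]
}

// isAligned reports whether the first byte of b is at a multiple of align.
func isAligned(b []byte, align int) bool {
	return uintptr(unsafe.Pointer(&b[0]))&uintptr(align-1) == 0
}

func main() {
	block := alignedBlock(4096, 4096)
	fmt.Println(len(block), isAligned(block, 4096)) // 4096 true
}
```

The three-index slice expression caps the capacity, so a later append cannot grow the block in place into the unaligned tail of the backing array.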
Edit:
Looking at your modified code trying to print the steps: I get an output like:
Pointer: 0xc0000b6000
Unsafe pointer: 0xc0000b6000
Unsafe pointer, uintptr: 824634466304
Unpersand: 0
Cast to int: 0
Return is: 0
Content is:
Note nothing is changed here. Simply the fmt package prints pointer values using hexadecimal representation, prefixed by 0x. uintptr values are printed as integers, using decimal representation. Those values are equal:
fmt.Println(0xc0000b6000, 824634466304) // output: 824634466304 824634466304
Also note the rest is 0 because in my case 0xc0000b6000 is already a multiple of 4096, in binary it is 1100000000000000000010110110000000000000.
Edit #2:
When you use fmt.Println() to debug parts of the calculation, that may change escape analysis and may change the allocation of the slice (from stack to heap). This depends on the used Go version too. Do not rely on your slice being allocated at an address that is (already) aligned to AlignSize.
See related questions for more details:
Mix print and fmt.Println and stack growing
why struct arrays comparing has different result
Addresses of slices of empty structs

How to set constraint on input for fuzzing?

Assume I have the following structure
type Hdr struct{
Src uint16
Dst uint16
Priotity byte
Pktcnt byte
Opcode byte
Ver byte
}
I have two functions Marshal and Unmarshal that encode Hdr to and from a binary format of:
0 1
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Src |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Dst |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Prio | Cnt | Opcode| Ver |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
I'd like to use Go Fuzz to make random, valid Hdr instances, Marshal them to binary, Unmarshal the binary and make sure the output matches the original input.
The main issue I am having is that I cannot figure out how to tell Go Fuzz that fields like Priotity cannot be greater than 15 otherwise they will get truncated when they are marshalled (only 4 bits). How do I set this constraint?
Update
This is just a toy case. There are many times with protocols like the above where something like the opcode would trigger secondary more complex parsing/vetting. Fuzzing could still find very useful issues within a constraint (IE: if Prio 0x00 and Cnt 0x2F secondary parser will error because delimiter is \ ).
EDIT
I'm not sure Fuzzing is a good fit here. Fuzzing is designed to find unexpected inputs: multi-byte UTF8 inputs (valid and non-valid); negative values; huge values, long lengths etc. These will try to catch "edge" cases.
In your case here, you know the:
Unmarshal input payload must be 6 bytes (should error otherwise)
you know precisely your internal "edges"
so vanilla testing.T tests may be a better fit here.
Keep it simple.
If you don't want to "waste" a Fuzz input & you know the input constraints of your code, you can try something like this:
func coerce(h *Hdr) (skip bool) {
h.Priotity &= 0x0f // ensure priority is 0-15
h.Opcode %= 20 // ensure opcode is 0-19
return false // optionally skip this test
}
and in your test - the coerced value can be tested - or skipped (as @jch showed):
import "github.com/google/go-cmp/cmp"
f.Fuzz(func(t *testing.T, src, dst uint16, pri, count, op, ver byte) {
h := Hdr{src, dst, pri, count, op, ver}
if coerce(&h) {
t.Skip()
return
}
bs, err := Marshal(h) // check err
h2, err := Unmarshal(bs) // check err
if !cmp.Equal(h, h2) {
t.Errorf("Marshal/Unmarshal validation failed for: %+v", h)
}
})
In order to skip uninteresting results, call t.Skip in your fuzzing function. Something like this:
f.Fuzz(func(t *testing.T, b []byte) {
a, err := Unmarshal(b)
if err != nil {
t.Skip()
return
}
c, err := Marshal(a)
if err != nil || !bytes.Equal(b, c) {
t.Errorf("Eek!")
}
})
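For reference, here is a self-contained sketch of the round trip that both fuzz callbacks exercise. The question does not show Marshal or Unmarshal, so the versions below are assumptions: big-endian byte order, Prio and Opcode in the high nibbles, and a hypothetical clamp helper that masks every 4-bit field. The field spelling follows the question's struct.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// Hdr mirrors the struct from the question (field names kept as written there).
type Hdr struct {
	Src, Dst                      uint16
	Priotity, Pktcnt, Opcode, Ver byte
}

// Marshal packs h into 6 bytes. Byte order and nibble placement are assumptions.
func Marshal(h Hdr) []byte {
	b := make([]byte, 6)
	binary.BigEndian.PutUint16(b[0:2], h.Src)
	binary.BigEndian.PutUint16(b[2:4], h.Dst)
	b[4] = h.Priotity<<4 | h.Pktcnt&0x0f
	b[5] = h.Opcode<<4 | h.Ver&0x0f
	return b
}

// Unmarshal is the inverse of Marshal and rejects inputs of the wrong length.
func Unmarshal(b []byte) (Hdr, error) {
	if len(b) != 6 {
		return Hdr{}, errors.New("header must be exactly 6 bytes")
	}
	return Hdr{
		Src:      binary.BigEndian.Uint16(b[0:2]),
		Dst:      binary.BigEndian.Uint16(b[2:4]),
		Priotity: b[4] >> 4,
		Pktcnt:   b[4] & 0x0f,
		Opcode:   b[5] >> 4,
		Ver:      b[5] & 0x0f,
	}, nil
}

// clamp maps arbitrary (fuzz-generated) field values onto a valid Hdr
// by keeping only the low 4 bits of each nibble-sized field.
func clamp(h Hdr) Hdr {
	h.Priotity &= 0x0f
	h.Pktcnt &= 0x0f
	h.Opcode &= 0x0f
	h.Ver &= 0x0f
	return h
}

func main() {
	h := clamp(Hdr{Src: 1, Dst: 2, Priotity: 0xff, Pktcnt: 0x12, Opcode: 0x34, Ver: 0x56})
	got, err := Unmarshal(Marshal(h))
	fmt.Println(got == h, err) // true <nil>
}
```

Inside a fuzz callback the same three steps apply: build the Hdr from the fuzz arguments, clamp it, then compare the round-tripped value with the clamped one.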

How to represent empty byte in Go

Want to have an empty char/byte, which has zero size/length, in Go, such as byte("").
func main() {
var a byte = '' // not working
var a byte = 0 // not working
}
A more specific example is
func removeOuterParentheses(S string) string {
var stack []int
var res []byte
for i, b := range []byte(S) {
if b == '(' {
stack = append(stack, i)
} else {
if len(stack) == 1 {
res[stack[0]] = '' // set this byte to be empty
res[i] = '' // / set this byte to be empty
}
stack = stack[:len(stack)-1]
}
}
return string(res)
}
There is an equivalent question in Java
A byte is an alias to the uint8 type. Having an "empty byte" doesn't really make any sense, just as having an "empty number" doesn't make any sense (you can have the number 0, but what is an "empty" number?)
You can assign a value of zero (b := byte(0), or var b byte), which can be used to indicate that nothing is assigned yet ("zero value"). The byte value of 0 is known as a "null byte". It normally never occurs in regular text, but often occurs in binary data (e.g. images, compressed files, etc.)
This is different from []byte(""), which is a sequence of bytes. You can have a sequence of zero bytes. To give an analogy: I can have a wallet with no money in it, but I can't have a coin that is worth "empty".
If you really want to distinguish between "value of 0" and "never set" you can use either a pointer or a struct. An example with a pointer:
var b *byte
fmt.Println(b) // <nil>, since it's a pointer which has no address to point to.
one := byte(0)
b = &one // Set address to one.
fmt.Println(b, *b) // 0xc000014178 0 (first value will be different, as it's
// a memory address).
You'll need to be a little bit careful here, as *b will be a panic if you haven't assigned a value yet. Depending on how it's used it can either work quite well, or be very awkward to work with. An example where this is used in the standard library is the flag package.
Another possibility is to use a struct with separate fields for the byte itself and a flag to record whether it's been set or not. In the database/sql library there are already the Null* types (e.g. NullInt64), which you can use as a starting point.
A single byte is a number: 0 is stored as the 8-bit value 00000000.
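A minimal sketch of the struct variant, modelled loosely on the shape of the database/sql Null* types; optionalByte and newOptionalByte are hypothetical names chosen for this example.

```go
package main

import "fmt"

// optionalByte pairs a byte with a flag that records whether it was set.
// Its zero value means "not set".
type optionalByte struct {
	Value byte
	Valid bool
}

// newOptionalByte returns an optionalByte marked as set, even when v is 0.
func newOptionalByte(v byte) optionalByte {
	return optionalByte{Value: v, Valid: true}
}

func main() {
	var unset optionalByte
	zero := newOptionalByte(0)
	fmt.Println(unset.Valid, zero.Valid, zero.Value) // false true 0
}
```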
A byte slice/array can have a length of 0.
var a byte = 0
var b = [0]byte{}
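For the removeOuterParentheses example from the question, one way around the missing "empty byte" is to never write the unwanted bytes in the first place: append only the bytes that should be kept. The sketch below is one possible rewrite, not the asker's code; it tracks nesting depth with a counter instead of a stack and assumes the input is a balanced parentheses string.

```go
package main

import "fmt"

// removeOuterParentheses drops the outermost pair of every top-level
// group by appending only the bytes that sit at nesting depth > 0.
func removeOuterParentheses(s string) string {
	res := make([]byte, 0, len(s))
	depth := 0
	for i := 0; i < len(s); i++ {
		switch s[i] {
		case '(':
			if depth > 0 { // not an outermost opening parenthesis
				res = append(res, '(')
			}
			depth++
		case ')':
			depth--
			if depth > 0 { // not an outermost closing parenthesis
				res = append(res, ')')
			}
		}
	}
	return string(res)
}

func main() {
	fmt.Println(removeOuterParentheses("(()())(())")) // ()()()
}
```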

Formatting big.Rat using golang.org/x/text/message

The package golang.org/x/text/message allows us to format numbers using national formats:
const n = 1222333.444555
prEn := message.NewPrinter(language.English)
prEn.Printf("%20.6f\n", n)
// Prints:
// 1,222,333.444555
prRu := message.NewPrinter(language.Russian)
prRu.Printf("%20.6f\n", n)
// Prints:
// 1 222 333,444555
Can I use it with math/big.Rat? That is, something like (doesn't work):
rat := big.NewRat(1222333444555, 1000000)
prEn.Printf("%20.6f\n", rat.FloatString(6))
// Should print:
// 1,222,333.444555
I know that I can wrap Rat in my own type and implement fmt.Formatter, but maybe there is a built-in way already?

Are there any go libraries that provide associative array capability?

I'm looking for a Go language capability similar to the "dictionary" in Python to facilitate the conversion of some Python code.
EDIT: Maps worked quite well for this de-dupe application. I was able to condense 1.3e6 duplicated items down to 2.5e5 unique items using a map with a 16 byte string index in just a few seconds. The map-related code was simple so I've included it below. Worth noting that pre-allocation of map with 1.3e6 elements sped it up by only a few percent:
var m = make(map[string]int, 1300000) // map with initial space for 1.3e6 elements
ct, ok := m[ax_hash]
if ok {
m[ax_hash] = ct + 1
} else {
m[ax_hash] = 1
}
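Since looking up a missing key yields the zero value, the counting step above can be shortened to a single increment. A small sketch, with countOccurrences as a made-up helper name:

```go
package main

import "fmt"

// countOccurrences counts how often each key appears. A missing key reads
// as 0, so counts[k]++ covers both the first and the repeated occurrences.
func countOccurrences(keys []string) map[string]int {
	counts := make(map[string]int, len(keys))
	for _, k := range keys {
		counts[k]++
	}
	return counts
}

func main() {
	counts := countOccurrences([]string{"a", "b", "a", "c", "a"})
	fmt.Println(len(counts), counts["a"]) // 3 3
}
```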
To expand a little on answers already given:
A Go map is a typed hash map data structure. A map's type signature is of the form map[keyType]valueType where keyType and valueType are the types of the keys and values respectively.
To initialize a map, use the make function (or a map literal such as map[string]int{}):
m := make(map[string]int)
An uninitialized map is equal to nil. Reading from a nil map is allowed and yields the value type's zero value, but writing to it causes a panic at runtime.
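The behavior of a nil map in current Go can be checked with a short sketch: lookups and len work and return zero values, while an assignment panics. writeToNilMap is a hypothetical helper that recovers from that panic so the program can report it.

```go
package main

import "fmt"

// writeToNilMap attempts an assignment on a nil map and reports
// whether the runtime panicked.
func writeToNilMap() (panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	var m map[string]int
	m["x"] = 1 // assignment to entry in nil map
	return false
}

func main() {
	var m map[string]int
	v, ok := m["x"]            // reading a nil map is fine
	fmt.Println(v, ok, len(m)) // 0 false 0
	fmt.Println(writeToNilMap()) // true
}
```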
The syntax for storing values is much the same as doing so with arrays or slices:
m["Alice"] = 21
m["Bob"] = 17
Similarly, retrieving values from a map is done like so:
a := m["Alice"]
b := m["Bob"]
You can use the range keyword to iterate over a map with a for loop:
for k, v := range m {
fmt.Println(k, v)
}
This code will print the following (map iteration order is not specified, so the lines may appear in either order):
Alice 21
Bob 17
Retrieving a value for a key that is not in the map will return the value type's zero value:
c := m["Charlie"]
// c == 0
By reading multiple values from a map, you can test for a key's presence. The second value will be a boolean indicating the key's presence:
a, ok := m["Alice"]
// a == 21, ok == true
c, ok := m["Charlie"]
// c == 0, ok == false
To remove a key/value entry from a map, use the built-in delete function (the older m["Bob"] = 0, false form was removed in Go 1):
delete(m, "Bob")
b, ok := m["Bob"]
// b == 0, ok == false
You can store arbitrary types in a map by using the empty interface type interface{}:
n := make(map[string]interface{})
n["One"] = 1
n["Two"] = "Two"
The only proviso is that when retrieving those values you must perform a type assertion to use them in their original form:
a := n["One"].(int)
b := n["Two"].(string)
You can use a type switch to determine the types of the values you're pulling out, and deal with them appropriately:
for k, v := range n {
switch u := v.(type) {
case int:
fmt.Printf("Key %q is an int with the value %v.\n", k, u)
case string:
fmt.Printf("Key %q is a string with the value %q.\n", k, u)
}
}
Inside each of those case blocks, u will be of the type specified in the case statement; no explicit type assertion is necessary.
This code will print (again, in no guaranteed order):
Key "One" is an int with the value 1.
Key "Two" is a string with the value "Two".
The key can be of any type for which the equality operator is defined, such as integers, floats, strings, and pointers. Structs and arrays can also be used, provided all of their fields or elements are comparable. Interface types can be used as well, but inserting a key whose dynamic type does not support equality panics at runtime. (Slices, maps and functions cannot be used as map keys, because equality is not defined on those types.)
For example, the map o can take keys of any of the above types:
o := make(map[interface{}]int)
o[1] = 1
o["Two"] = 2
And that's maps in a nutshell.
The map type. http://golang.org/doc/effective_go.html#maps
There is some difference from Python in that the keys have to be typed, so you can't mix numeric and string keys (for some reason I forgot you can), but they're pretty easy to use.
dict := make(map[string]string)
dict["user"] = "so_user"
dict["pass"] = "l33t_pass1"
You're probably looking for a map.
