Performance improvements for my hashmap implementation

I decided I'd try to make my own hashmap (here).
For reads it's 28% slower than the standard library implementation. I'm wondering if it's possible to speed up the following code; Index() is critical for lookups:
const numOnes = uint8(20)
const ones = uint32(1<<numOnes - 1)

func (m *Map) Index(num uint64) uint32 {
    part := uint32(num) & ones
    remaining := num >> numOnes
    start := m.starts[part]
    bitsNum := m.bitNums[part]
    matchedBits := bitsNum & uint16(remaining)
    offset := BitScoreCache[bitsNum][matchedBits]
    return start + uint32(offset)
}
Note: BitScoreCache is declared as var BitScoreCache [5000][5000]uint16; it is meant to be read-only and shared between multiple map instances.
example usage:
func (pa PrimeAdvancedAnagrammar) GetAnagrams(word string) []string {
    return pa.m[pa.locator.Index(PrimeProduct(word))] // pa.m is a [][]string
}
versus standard library:
func (pba PrimeBasicAnagrammar) GetAnagrams(word string) []string {
    return pba.m[PrimeProduct(word)] // pba.m is a map[uint64][]string
}
What are the main reasons it's slower than the standard library for so few operations?

Combining the two arrays into one array of structs reduced cache misses and was the biggest performance improvement. Returning early when there are no collisions also improved performance.
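For illustration, here's a minimal sketch of what that change might look like (the partInfo type and the bitsNum == 0 encoding for "no collisions" are assumptions, not the poster's actual code):

type partInfo struct {
    start   uint32
    bitsNum uint16
}

type Map struct {
    parts []partInfo // replaces the separate starts and bitNums arrays
}

func (m *Map) Index(num uint64) uint32 {
    part := uint32(num) & ones
    remaining := num >> numOnes
    p := m.parts[part]  // both fields arrive with a single cache-line fill
    if p.bitsNum == 0 { // assumed encoding: zero means no collisions here
        return p.start
    }
    matchedBits := p.bitsNum & uint16(remaining)
    offset := BitScoreCache[p.bitsNum][matchedBits]
    return p.start + uint32(offset)
}

One lookup now touches one cache line instead of two, which is where the reduction in cache misses comes from.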

Related

How to efficiently hash (SHA-256) data in Go where only the last few bytes change

Assuming you had 80 bytes of data and only the last 4 bytes were constantly changing, how would you efficiently hash the total 80 bytes using Go? In essence, the first 76 bytes are the same, while the last 4 bytes keep changing. Ideally, you want to keep a copy of the hash digest for the first 76 bytes and just keep changing the last 4.
You can try the following examples on the Go Playground. Benchmark results are at the end.
Note: the implementations below are not safe for concurrent use; I intentionally made them like this to be simpler and faster.
Fastest when using only public API (always hashes all input)
The general concept and interface of Go's hash algorithms is the hash.Hash interface. It does not allow you to save the state of the hasher, or to return or rewind to a saved state. So using the public hash APIs of the Go standard lib, you always have to calculate the hash from the start.
What the public API does offer is reusing an already constructed hasher to calculate the hash of a new input, using the Hash.Reset() method. This is nice because no (memory) allocations are needed to calculate multiple hash values. You can also take advantage of the optional slice that may be passed to Hash.Sum(), which the current hash is appended to. This is nice because no allocations are needed to receive the hash results either.
Here's an example that takes advantage of these:
type Cached1 struct {
    hasher hash.Hash
    result [sha256.Size]byte
}

func NewCached1() *Cached1 {
    return &Cached1{hasher: sha256.New()}
}

func (c *Cached1) Sum(data []byte) []byte {
    c.hasher.Reset()
    c.hasher.Write(data)
    return c.hasher.Sum(c.result[:0])
}
Test data
We'll use the following test data:
var fixed = bytes.Repeat([]byte{1}, 76)
var variantA = []byte{1, 1, 1, 1}
var variantB = []byte{2, 2, 2, 2}
var data = append(append([]byte{}, fixed...), variantA...)
var data2 = append(append([]byte{}, fixed...), variantB...)
var c1 = NewCached1()
First let's get authentic results (to verify if our hasher works correctly):
fmt.Printf("%x\n", sha256.Sum256(data))
fmt.Printf("%x\n", sha256.Sum256(data2))
Output:
fb8e69bdfa2ad15be7cc8a346b74e773d059f96cfc92da89e631895422fe966a
10ef52823dad5d1212e8ac83b54c001bfb9a03dc0c7c3c83246fb988aa788c0c
Now let's check our Cached1 hasher:
fmt.Printf("%x\n", c1.Sum(data))
fmt.Printf("%x\n", c1.Sum(data2))
Output is the same:
fb8e69bdfa2ad15be7cc8a346b74e773d059f96cfc92da89e631895422fe966a
10ef52823dad5d1212e8ac83b54c001bfb9a03dc0c7c3c83246fb988aa788c0c
Even faster but may break (in future Go releases): hashes only the last 4 bytes
Now let's see a less flexible solution which truly calculates the hash of the fixed first 76 bytes only once.
The hasher of the crypto/sha256 package is the unexported sha256.digest type (more precisely a pointer to this type):
// digest represents the partial evaluation of a checksum.
type digest struct {
    h     [8]uint32
    x     [chunk]byte
    nx    int
    len   uint64
    is224 bool // mark if this digest is SHA-224
}
A value of the digest struct type basically holds the current state of the hasher.
What we may do is feed the hasher the fixed first 76 bytes, and then save this struct value. When we need to calculate the hash of some 80-byte data whose first 76 bytes are the same, we use the saved value as a starting point and then feed the varying last 4 bytes.
Note that it's enough to simply save this struct value, as it contains no pointers and no descriptor types like slices and maps; else we would also have to make a copy of those, but we're "lucky". So this solution would need adjustment if a future implementation of crypto/sha256 added a pointer or slice field, for example.
Since sha256.digest is unexported, we can only use reflection (the reflect package) to achieve our goals, which inherently adds some delay to the computation.
Example implementation that does this:
type Cached2 struct {
    origv   reflect.Value
    hasherv reflect.Value
    hasher  hash.Hash
    result  [sha256.Size]byte
}

func NewCached2(fixed []byte) *Cached2 {
    h := sha256.New()
    h.Write(fixed)
    c := &Cached2{origv: reflect.ValueOf(h).Elem()}
    hasherv := reflect.New(c.origv.Type())
    c.hasher = hasherv.Interface().(hash.Hash)
    c.hasherv = hasherv.Elem()
    return c
}

func (c *Cached2) Sum(data []byte) []byte {
    // Set the state of the fixed hash:
    c.hasherv.Set(c.origv)
    c.hasher.Write(data)
    return c.hasher.Sum(c.result[:0])
}
Testing it:
var c2 = NewCached2(fixed)
fmt.Printf("%x\n", c2.Sum(variantA))
fmt.Printf("%x\n", c2.Sum(variantB))
Output is again the same:
fb8e69bdfa2ad15be7cc8a346b74e773d059f96cfc92da89e631895422fe966a
10ef52823dad5d1212e8ac83b54c001bfb9a03dc0c7c3c83246fb988aa788c0c
So it works.
The "ultimate", fastest solution
Cached2 could be faster if reflection were not involved. If we want an even faster solution, we can simply make a copy of the sha256.digest type and its methods in our package, so we can use it directly without resorting to reflection.
If we do this, we will have access to the digest struct value, and we can simply make a copy of it like:
var d digest
// init d
saved := d
And restoring it is like:
d = saved
I simply "cloned" the crypto/sha256 package to my workspace, and changed / exported the digest type as Digest just for demonstration purposes. Then using this mysha256.Digest type I implemented Cached3 like this:
type Cached3 struct {
    orig   mysha256.Digest
    result [sha256.Size]byte
}

func NewCached3(fixed []byte) *Cached3 {
    var d mysha256.Digest
    d.Reset()
    d.Write(fixed)
    return &Cached3{orig: d}
}

func (c *Cached3) Sum(data []byte) []byte {
    // Make a copy of the fixed hash:
    d := c.orig
    d.Write(data)
    return d.Sum(c.result[:0])
}
Testing it:
var c3 = NewCached3(fixed)
fmt.Printf("%x\n", c3.Sum(variantA))
fmt.Printf("%x\n", c3.Sum(variantB))
Output again is the same. So this works too.
Benchmarks
We can benchmark performance with this code:
func BenchmarkCached1(b *testing.B) {
    for i := 0; i < b.N; i++ {
        c1.Sum(data)
        c1.Sum(data2)
    }
}

func BenchmarkCached2(b *testing.B) {
    for i := 0; i < b.N; i++ {
        c2.Sum(variantA)
        c2.Sum(variantB)
    }
}

func BenchmarkCached3(b *testing.B) {
    for i := 0; i < b.N; i++ {
        c3.Sum(variantA)
        c3.Sum(variantB)
    }
}
Benchmark results (go test -bench . -benchmem):
BenchmarkCached1-4   1000000   1569 ns/op   0 B/op   0 allocs/op
BenchmarkCached2-4   2000000    926 ns/op   0 B/op   0 allocs/op
BenchmarkCached3-4   2000000    872 ns/op   0 B/op   0 allocs/op
Cached2 is approximately 41% faster than Cached1, which is quite noticeable and nice. Cached3 gives only a "little" performance boost compared to Cached2, another 6%. Cached3 is 44% faster than Cached1.
Also note that none of the solutions make any allocations, which is also nice.
Conclusion
For that extra 41% or 44%, I would probably not go for the Cached2 or Cached3 solutions. Of course, it really depends on how important performance is to you. If it is important, I think the Cached2 solution presents a fine compromise between minimal added complexity and a noticeable performance gain. It does pose a risk, as future Go implementations may break it; if that is a problem, Cached3 solves this by copying the current implementation (and also improves performance a little).

Will Go garbage collect the parent struct if one child is in use?

When I create a System struct, the Builder that builds it costs a lot of memory, but the result is simple. If I return the address of the result, will the garbage collector know that it can collect the Builder's memory?
How can I test this?
I simulated the situation like this:
// Builder is used to build `System`; it costs a lot of memory.
type Builder struct {
    aux    [][]int
    system *System
}

// System is the result of `Builder.build`; it is relatively simple.
type System struct {
    avg []float32
}

func NewSystem() *System {
    builder := &Builder{system: &System{}}
    builder.build()
    return builder.system
}

func (builder *Builder) build() {
    // Mock initialization.
    maxCnt := 10000
    builder.aux = make([][]int, maxCnt)
    for i := range builder.aux {
        builder.aux[i] = make([]int, maxCnt)
        for j := range builder.aux[i] {
            builder.aux[i][j] = rand.Intn(maxCnt)
        }
    }
    builder.system.avg = make([]float32, len(builder.aux))
    for i, col := range builder.aux {
        var sum float32
        for _, row := range col {
            sum += float32(row)
        }
        builder.system.avg[i] = sum / float32(len(builder.aux))
    }
}

func TestMem(t *testing.T) {
    system := NewSystem()
    // Can the GC see that the memory held by the builder is collectible here?
    fmt.Println("do many things with system")
    fmt.Println(system.avg)
}
Yes, it will be garbage collected once there is enough memory pressure to trigger a collection (assuming it's even put on the heap; anything allocated on the stack doesn't need to be garbage collected, as the entire stack frame is deallocated when no longer in use). The garbage collector deallocates anything with no remaining references. After your processing finishes, the only thing still referenced is the []float32. If that were instead a slice of structs, and those structs had a pointer back to the parent object, that pointer would prevent the parent from being collected.
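To illustrate that last point, here is a minimal sketch (the back-pointer field is hypothetical, not part of the question's code):

// Hypothetical variant of System that would defeat collection of the Builder.
type System struct {
    avg     []float32
    builder *Builder // back-pointer: while any System is reachable, the
                     // Builder and its huge aux field stay reachable too
}

To observe the difference, you can call runtime.GC() after NewSystem() returns and compare runtime.ReadMemStats' HeapAlloc with and without such a field; with it, the memory held by aux never becomes collectible.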

Can we write a generic array/slice deduplication in Go?

Is there a way to write a generic array/slice deduplication in Go? For []int we can have something like this (from http://rosettacode.org/wiki/Remove_duplicate_elements#Go):
func uniq(list []int) []int {
    unique_set := make(map[int]bool, len(list))
    for _, x := range list {
        unique_set[x] = true
    }
    result := make([]int, len(unique_set))
    i := 0
    for x := range unique_set {
        result[i] = x
        i++
    }
    return result
}
But is there a way to extend it to support any slice, with a signature like:
func deduplicate(a []interface{}) []interface{}
I know that you can write a function with that signature, but then you can't actually use it on an []int: you need to create an []interface{}, copy everything from the []int into it, pass it to the function, get an []interface{} back, and copy its contents into a new []int.
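For example, the round trip might look like this (a sketch; deduplicate is the hypothetical function with the signature above):

in := []int{3, 1, 3, 2, 1}
// Box: copy the []int into an []interface{}.
boxed := make([]interface{}, len(in))
for i, v := range in {
    boxed[i] = v
}
deduped := deduplicate(boxed)
// Unbox: type-assert each element back into a new []int.
out := make([]int, len(deduped))
for i, v := range deduped {
    out[i] = v.(int)
}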
My question is, is there a better way to do this?
While VonC's answer probably comes closest to what you really want, the only real way to do it in native Go without gen is to define an interface:
type IDList interface {
    // ID returns the id of the element at i.
    ID(i int) int
    // GetByID returns the element with the given id.
    GetByID(id int) interface{}
    Len() int
    // Insert adds the element to the list.
    Insert(interface{})
}

// Deduplicate puts the deduplicated list in dst.
func Deduplicate(dst, list IDList) {
    intList := make([]int, list.Len())
    for i := range intList {
        intList[i] = list.ID(i)
    }
    uniques := uniq(intList)
    for _, el := range uniques {
        dst.Insert(list.GetByID(el))
    }
}
Where uniq is the function from your OP.
This is just one possible example, and there are probably much better ones, but in general mapping each element to a unique "==able" ID and either constructing a new list or culling based on the deduplication of the IDs is probably the most intuitive way.
An alternate solution is to take in an []IDer where the IDer interface is just ID() int. However, that means that user code has to create the []IDer list and copy all the elements into that list, which is a bit ugly. It's cleaner for the user to wrap the list as an ID list rather than copy, but it's a similar amount of work either way.
The only way I have seen that implemented in Go is with the clipperhouse/gen project,
gen is an attempt to bring some generics-like functionality to Go, with some inspiration from C#’s Linq and JavaScript’s underscore libraries
See this test:
// Distinct returns a new Thing1s slice whose elements are unique. See: http://clipperhouse.github.io/gen/#Distinct
func (rcv Thing1s) Distinct() (result Thing1s) {
    appended := make(map[Thing1]bool)
    for _, v := range rcv {
        if !appended[v] {
            result = append(result, v)
            appended[v] = true
        }
    }
    return result
}
But, as explained in clipperhouse.github.io/gen/:
gen generates code for your types, at development time, using the command line.
gen is not an import; the generated source becomes part of your project and takes no external dependencies.
You could do something close to this via an interface. Define an interface, say DeDupable, requiring a func, say UniqId() []byte, which you could then use to do the removal of duplicates. Your uniq func would then take a []DeDupable and work on it.
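A sketch of that idea (the DeDupable and UniqId names come from the sentence above; converting the id to a string map key is my own assumption, since a []byte can't be used as a map key):

type DeDupable interface {
    UniqId() []byte
}

// uniqDeDupable keeps the first element seen for each unique id.
func uniqDeDupable(list []DeDupable) []DeDupable {
    seen := make(map[string]bool, len(list))
    result := make([]DeDupable, 0, len(list))
    for _, v := range list {
        id := string(v.UniqId())
        if !seen[id] {
            seen[id] = true
            result = append(result, v)
        }
    }
    return result
}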

Is there an efficient way of reclaiming over-capacity slices?

I have a large number of allocated slices (a few million) which I have appended to. I'm sure a large number of them are over capacity. I want to try and reduce memory usage.
My first attempt was to iterate over all of them, allocating a new slice of len(oldSlice) and copying the values over. Unfortunately this appears to increase memory usage (up to double), and garbage collection is slow to reclaim the memory.
Is there a good general way to slim down memory usage for a large number of over-capacity slices?
Choosing the right strategy to allocate your buffers is hard without knowing the exact problem.
In general you can try to reuse your buffers:
type buffer struct{}

var buffers = make(chan *buffer, 1024)

func newBuffer() *buffer {
    select {
    case b := <-buffers:
        return b
    default:
        return &buffer{}
    }
}

func returnBuffer(b *buffer) {
    select {
    case buffers <- b:
    default:
    }
}
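Typical usage brackets each piece of work with the pool (a sketch; handle is a hypothetical stand-in for your own function):

func handle() {
    b := newBuffer()      // reuse a pooled buffer when one is available
    defer returnBuffer(b) // hand it back; silently dropped if the pool is full
    // ... use b ...
}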
The heuristic used by append may not be suitable for all applications; it's designed for use when you don't know the final length of the data you'll be storing. Instead of iterating over the slices later, I'd try to minimize the amount of extra capacity allocated as early as possible. Here's a simple example of one strategy: use a buffer only while the length is not known, and reuse that buffer:
type buffer struct {
    names []string
    // ... possibly other things
}

// Assume this is called frequently and sees lots and lots of names.
func (b *buffer) readNames(lines *bufio.Scanner) ([]string, error) {
    // Start from zero, so we can reuse the buffer's existing capacity.
    b.names = b.names[:0]
    for lines.Scan() {
        b.names = append(b.names, lines.Text())
    }
    // Figure out the error.
    err := lines.Err()
    if err == io.EOF {
        err = nil
    }
    // Allocate a minimal slice for the result.
    out := make([]string, len(b.names))
    copy(out, b.names)
    return out, err
}
Of course, you'll need to modify this if you need something that's safe for concurrent use; for that I'd recommend using a buffered channel as a leaky bucket for storing your buffers.
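As an alternative to hand-rolling that channel-based pool, the standard library's sync.Pool provides the same leaky-bucket behavior and is safe for concurrent use. A minimal sketch, assuming the buffer type above (readAllNames is a hypothetical wrapper):

var bufPool = sync.Pool{
    New: func() interface{} { return new(buffer) },
}

func readAllNames(lines *bufio.Scanner) ([]string, error) {
    b := bufPool.Get().(*buffer) // take a pooled buffer, or allocate via New
    defer bufPool.Put(b)         // return it for reuse by any goroutine
    return b.readNames(lines)
}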

How should Go library code initialize and use random number generation?

When writing a Go library that needs to use random numbers, what is the best way to initialize and consume random numbers?
I know that the standard way to do this in an application is:
import (
    "fmt"
    "math/rand"
    "time"
)

// Do the initial seeding in an init fn.
func init() {
    // Set the global seed and use the global fns.
    rand.Seed(time.Now().UTC().UnixNano())
}

func main() {
    fmt.Println(rand.Int())
    fmt.Println(rand.Intn(200))
}
So when I'm writing library code (not in the main package), should I just do the same:
package libfoo

import (
    "math/rand"
    "time"
)

func init() {
    rand.Seed(time.Now().UTC().UnixNano())
}

func AwesomeFoo() {
    r := rand.Intn(1000)
    // ... use r ...
}
The application using my library might also do its own random number seeding and use rand.Intn, so my question really is: is there any downside to having a library seed the random number generator when some app code (or another library) does so as well?
Also, is there any issue with the library using the "global" rand.Intn or rand.Int, or should a library create its own private Rand object via rand.New(src) and use that instead?
I don't have any particular reason for thinking this is unsafe, but I know enough about crypto and PRNGs to know that it is easy to get something wrong if you don't know what you're doing.
For example, here's a simple library for the Knuth (Fisher-Yates) shuffle that needs randomness: https://gist.github.com/quux00/8258425
What's best really just depends on the type of application you're writing and the type of library you want to create. If we're not sure, we can get the most flexibility by using a form of dependency injection through Go interfaces.
Consider the following naive Monte Carlo integrator that takes advantage of the rand.Source interface:
package monte

import (
    "math/rand"
)

const (
    DEFAULT_STEPS = 100000
)

type Naive struct {
    rand  *rand.Rand
    steps int
}

func NewNaive(source rand.Source) *Naive {
    return &Naive{rand.New(source), DEFAULT_STEPS}
}

func (m *Naive) SetSteps(steps int) {
    m.steps = steps
}

func (m *Naive) Integrate1D(fn func(float64) float64, a, b float64) float64 {
    var sum float64
    for i := 0; i < m.steps; i++ {
        // Sample x uniformly from [a, b).
        x := a + (b-a)*m.rand.Float64()
        sum += fn(x)
    }
    return (b - a) * sum / float64(m.steps)
}
We can then use this package to calculate the value of pi:
func main() {
    m := monte.NewNaive(rand.NewSource(200))
    pi := 4 * m.Integrate1D(func(t float64) float64 {
        return math.Sqrt(1 - t*t)
    }, 0, 1)
    fmt.Println(pi)
}
In this case, the quality of our algorithm's results depends on the type of pseudorandom number generator used, so we need to provide a way for users to swap one generator for another. Here we've defined an opaque type that takes a random number source in its constructor. By having their random number generator satisfy the rand.Source interface, application writers can swap in random number generators as needed.
However, there are many cases where this is exactly what we don't want to do. Consider a random password or key generator. In that case, what we really want is a high entropy source of truly random data, so we should just use the crypto/rand package internally and hide the details from our application writers:
package keygen

import (
    "crypto/rand"
    "encoding/base32"
)

func GenKey() (string, error) {
    b := make([]byte, 20)
    if _, err := rand.Read(b); err != nil {
        return "", err
    }
    enc := base32.NewEncoding("ABCDEFGHIJKLMNOPQRSTUVWXYZ346789")
    return enc.EncodeToString(b), nil
}
Hopefully that helps you make a decision. If the code is for your own applications, or for applications within a specific company rather than industry-wide or public use, lean towards the library design that exposes the fewest internals and creates the fewest dependencies, rather than the most general design, since that will ease maintenance and shorten implementation time.
Basically, if it feels like overkill, it probably is.
In the case of the Knuth shuffle, the requirement is simply a decent pseudo-random number generator, so you could use an internally seeded rand.Rand object that's private to your package, like so:
package shuffle

import (
    "math/rand"
    "time"
)

var r *rand.Rand

func init() {
    r = rand.New(rand.NewSource(time.Now().UTC().UnixNano()))
}

// ShuffleStrings performs a Fisher-Yates shuffle: each element is swapped
// with a uniformly chosen element at or before it.
func ShuffleStrings(arr []string) {
    for i := len(arr) - 1; i > 0; i-- {
        j := r.Intn(i + 1)
        arr[i], arr[j] = arr[j], arr[i]
    }
}
Then the application doesn't have to worry about how it works:
package main

import (
    "fmt"

    "shuffle"
)

func main() {
    arr := []string{"a", "set", "of", "words"}
    fmt.Printf("Shuffling words: %v\n", arr)
    for i := 0; i < 10; i++ {
        shuffle.ShuffleStrings(arr)
        fmt.Printf("Shuffled words: %v\n", arr)
    }
}
This prevents the application from accidentally reseeding the random number generator used by your package by calling rand.Seed.
Don't seed the global random number generator. That should be left to package main.
If you care what your seed is, you should create your own private Rand object. If you don't care, you can just use the global source.
If you care about your numbers actually being random, you should use crypto/rand instead of math/rand.
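If you want math/rand's API but an unpredictable seed, one common pattern (a sketch, not from the answers above) is to seed it from crypto/rand:

import (
    crand "crypto/rand"
    "encoding/binary"
    "math/rand"
)

// newSeededRand seeds a math/rand generator from crypto/rand.
// Note: the output stream is still math/rand, so it is NOT suitable
// for keys, tokens, or anything else security-sensitive.
func newSeededRand() (*rand.Rand, error) {
    var b [8]byte
    if _, err := crand.Read(b[:]); err != nil {
        return nil, err
    }
    seed := int64(binary.LittleEndian.Uint64(b[:]))
    return rand.New(rand.NewSource(seed)), nil
}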
