Golang maps/hashmaps, optimizing for iteration speed - algorithm

When migrating a production Node.js application to Go, I noticed that iterating over Go's native map is actually slower than in Node.
I've come up with an alternative that trades removal/insertion speed for iteration speed, by exposing a slice that can be iterated over and storing key=>index pairs in a separate map.
While this solution works and gives a significant performance increase, I was wondering whether there is a better solution that I could look into.
In my setup it's very rare for something to be removed from the maps; only additions and replacements are common. This implementation 'works' for that, but it feels more like a workaround than an actual solution.
The maps are always keyed by an integer and hold arbitrary data.
FastMap: 500000 Iterations - 0.153000ms
Native Map: 500000 Iterations - 4.988000ms
/*
Unordered hash map optimized for iteration speed.
Stores values in an array and holds key=>index mappings inside a separate hashmap
*/
type FastMapEntry[K comparable, T any] struct {
Key K
Value T
}
type FastMap[K comparable, T any] struct {
m map[K]int // Stores key => array index mappings
entries []FastMapEntry[K, T] // Array holding entries and their keys
len int // Total map size
}
func MakeFastMap[K comparable, T any]() *FastMap[K, T] {
return &FastMap[K, T]{
m: make(map[K]int),
entries: make([]FastMapEntry[K, T], 0),
}
}
func (m *FastMap[K, T]) Set(key K, value T) {
index, exists := m.m[key]
if exists {
// Replace if key already exists
m.entries[index] = FastMapEntry[K, T]{
Key: key,
Value: value,
}
} else {
// Store the key=>index pair in the map and add value to entries. Increase total len by one
m.m[key] = m.len
m.entries = append(m.entries, FastMapEntry[K, T]{
Key: key,
Value: value,
})
m.len++
}
}
func (m *FastMap[K, T]) Has(key K) bool {
_, exists := m.m[key]
return exists
}
func (m *FastMap[K, T]) Get(key K) (value T, found bool) {
index, exists := m.m[key]
if exists {
found = true
value = m.entries[index].Value
}
return
}
func (m *FastMap[K, T]) Remove(key K) bool {
index, exists := m.m[key]
if exists {
// Remove value from entries
m.entries = append(m.entries[:index], m.entries[index+1:]...)
// Remove key=>index mapping
delete(m.m, key)
m.len--
for i := index; i < m.len; i++ {
// Move all index mappings up, starting from current index
m.m[m.entries[i].Key] = i
}
}
return exists
}
func (m *FastMap[K, T]) Entries() []FastMapEntry[K, T] {
return m.entries
}
func (m *FastMap[K, T]) Len() int {
return m.len
}
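Since removals are rare and the map is described as unordered, one possible refinement is to remove by swapping the last entry into the vacated slot, which avoids shifting the slice and rewriting every later index. The following is only a sketch under that assumption; the names swapMap and newSwapMap are hypothetical and not part of the code above.

```go
package main

import "fmt"

// entry pairs a key with its value so the slice can be iterated directly.
type entry[K comparable, V any] struct {
	Key   K
	Value V
}

// swapMap is a hypothetical variant of FastMap whose Remove does not
// preserve insertion order but runs in constant time.
type swapMap[K comparable, V any] struct {
	index   map[K]int // key => position in entries
	entries []entry[K, V]
}

func newSwapMap[K comparable, V any]() *swapMap[K, V] {
	return &swapMap[K, V]{index: make(map[K]int)}
}

// Set replaces the value for an existing key or appends a new entry.
func (m *swapMap[K, V]) Set(key K, value V) {
	if i, ok := m.index[key]; ok {
		m.entries[i].Value = value
		return
	}
	m.index[key] = len(m.entries)
	m.entries = append(m.entries, entry[K, V]{Key: key, Value: value})
}

// Remove moves the last entry into the removed slot and shrinks the slice.
func (m *swapMap[K, V]) Remove(key K) bool {
	i, ok := m.index[key]
	if !ok {
		return false
	}
	last := len(m.entries) - 1
	if i != last {
		moved := m.entries[last]
		m.entries[i] = moved
		m.index[moved.Key] = i
	}
	// Zero the vacated slot so it does not keep the old value reachable.
	var zero entry[K, V]
	m.entries[last] = zero
	m.entries = m.entries[:last]
	delete(m.index, key)
	return true
}

func main() {
	m := newSwapMap[int, string]()
	m.Set(1, "a")
	m.Set(2, "b")
	m.Set(3, "c")
	m.Remove(1)
	for _, e := range m.entries {
		fmt.Println(e.Key, e.Value)
	}
}
```

The trade-off is that iteration order changes after a removal, which only matters if callers rely on insertion order.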
The test code that was run is:
// s.Variations is a native map holding ~500k records
start := time.Now()
iterations := 0
for _, variation := range s.Variations {
if variation.Id > 0 {
}
iterations++
}
log.Printf("Native Map: %d Iterations - %fms\n", iterations, float64(time.Since(start).Microseconds())/1000)
// Copy data into FastMap
fm := helpers.MakeFastMap[state.VariationId, models.ItemVariation]()
for key, variation := range s.Variations {
fm.Set(key, variation)
}
start = time.Now()
iterations = 0
for _, variation := range fm.Entries() {
if variation.Value.Id > 0 {
}
iterations++
}
log.Printf("FastMap: %d Iterations - %fms\n", iterations, float64(time.Since(start).Microseconds())/1000)

I think this comparison is a little unfair. Go's map implementation is quite different from yours: it has to cover a much wider range of use cases, the structs generated at compile time are somewhat heavy (not excessively so; they store information about the types you use in your map and so on), and the implementation approach is different. Go's map is a hash map; yours is not, or rather it is only because the actual hashing is delegated to the m map you hold internally.
Another factor behind this result shows up if you look at this loop:
for _, variation := range fm.Entries() {
if variation.Value.Id > 0 {
}
iterations++
}
Here you're iterating over a slice, which is much easier and faster to iterate than a map: a slice is a view onto an array, which holds elements of the same type next to each other in memory.
To make a fairer comparison, you would do something like this:
for _, y := range fastMap.m {
_ = fastMap.Entries()[y].Value + 1 // some simple calculation
}
If you're really after performance, a well-written hash function and a fixed-size array would be your best choice.
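The slice-versus-map point can be checked in isolation with the standard testing package. This is a minimal sketch, assuming plain int keys and values rather than the asker's record type; sumMap, sumSlice and sink are illustrative names, and the numbers it prints will vary by machine and Go version.

```go
package main

import (
	"fmt"
	"testing"
)

// sumMap walks every value of a map.
func sumMap(m map[int]int) int {
	total := 0
	for _, v := range m {
		total += v
	}
	return total
}

// sumSlice walks every element of a slice holding the same values.
func sumSlice(s []int) int {
	total := 0
	for _, v := range s {
		total += v
	}
	return total
}

// sink keeps the compiler from discarding the loop results.
var sink int

func main() {
	const n = 500_000
	m := make(map[int]int, n)
	s := make([]int, 0, n)
	for i := 0; i < n; i++ {
		m[i] = i
		s = append(s, i)
	}

	mapRes := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			sink = sumMap(m)
		}
	})
	sliceRes := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			sink = sumSlice(s)
		}
	})

	fmt.Println("map:  ", mapRes)
	fmt.Println("slice:", sliceRes)
}
```

Unlike a single timed loop, testing.Benchmark repeats each function until the timing is stable, and doing real work with each value prevents the loop body from being optimized away.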

Related

Reading from a slice of unknown length in Golang

I'm trying to replicate this algorithm for finding duplicates in an array in Golang. Here's the javascript version:
function hasDuplicateValue(array) {
let existingNumbers = [];
for(let i = 0; i < array.length; i++) {
if(existingNumbers[array[i]] === 1) {
return true;
} else {
existingNumbers[array[i]] = 1;
}
}
return false;
}
On line 2, the algorithm creates an empty array of unknown length, and then stores a 1 at the index corresponding to each number that it finds (e.g. if it finds the number 3 in the array, it will store a 1 at index 3 of existingNumbers).
I'm wondering — how do I replicate this in Golang (since we need to have slots allocated in the slice before reading it). Would I first need to find the max value in the array and then declare the existingNumbers slice to be of that same size?
Or is there a more efficient way of doing this (instead of searching through the array and finding the max value before constructing the slice).
Thanks!
Edit:
I realized that I can't do this with a slice because I can't read from an index that has not been allocated. However, as @icza suggested, it will work with a map:
func findDuplicates(list []int)(bool) {
temp := make(map[int]int)
for _, elem := range list {
if temp[elem] == 1 {
return true
} else {
temp[elem] = 1
}
}
return false
}
As the comments suggest, I would also use a map to track which values have been seen, but we can use map[int]struct{} because an empty struct does not consume any memory in Go.
I have also simplified the code a bit, as follows.
func findDuplicates(list []int) bool {
temp := make(map[int]struct{})
for _, elem := range list {
if _, ok := temp[elem]; ok {
return true
}
temp[elem] = struct{}{}
}
return false
}
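For comparison, the slice-based approach from the question can also be made to work without finding the maximum first, by growing the lookup slice on demand. This is a sketch that assumes every value is non-negative; hasDuplicate is an illustrative name, and memory use is proportional to the largest value seen, which is why the map version is usually the safer choice.

```go
package main

import "fmt"

// hasDuplicate reports whether list contains a repeated value.
// Assumption: values are non-negative, since they are used as indexes.
func hasDuplicate(list []int) bool {
	var seen []bool
	for _, v := range list {
		if v < 0 {
			panic("hasDuplicate: negative value cannot be used as an index")
		}
		if v < len(seen) && seen[v] {
			return true
		}
		if v >= len(seen) {
			// Grow the table so that index v exists; new slots are false.
			seen = append(seen, make([]bool, v-len(seen)+1)...)
		}
		seen[v] = true
	}
	return false
}

func main() {
	fmt.Println(hasDuplicate([]int{1, 5, 3, 5}))
	fmt.Println(hasDuplicate([]int{3, 1, 2}))
}
```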

Sort 2D array of structs Golang

I want to create a consistent ordering for a 2D slice of structs. I am creating the 2D slice from a map, so the order is always different.
My structs look like
// Hit contains the data for a hit.
type Hit struct {
Key string `json:"key"`
Data []Field `json:"data"`
}
// Hits stores a list of hits.
type Hits [][]Hit
I want to provide a consistent order for the contents of my Hits type.
I have tried:
func (c Hits) Len() int { return len(c) }
func (c Hits) Swap(i, j int) { c[i], c[j] = c[j], c[i] }
func (c Hits) Less(i, j int) bool { return strings.Compare(c[i][0].Key, c[j][0].Key) == -1 }
But the results still seem to come back in random order.
I was thinking of possibly hashing each item in the slice but thought there might be an easier option
The order of iteration over a Go map is unspecified. The runtime deliberately randomizes it, so even two loops over the same map can visit the keys in different orders, and code must not depend on it.
Assuming that your map is a map[string]Hit, to iterate over it in a deterministic order I would collect the keys of the map, sort them, and use the sorted keys to walk the map.
Something like this:
package main
import (
"fmt"
"sort"
)
type Hit struct {
Key string `json:"key"`
Data []Field `json:"data"`
}
type Field struct {
Value string `json:"value"`
}
func main() {
var mapOfHits = getSomeHits()
var sortedHits = sortHits(mapOfHits)
for _, h := range sortedHits {
fmt.Println(h.Key)
}
}
func getSomeHits() map[string]Hit {
return make(map[string]Hit, 0)
}
func sortHits(m map[string]Hit) []Hit {
keys := make([]string, 0, len(m))
sorted := make([]Hit, 0, len(m))
for k := range m {
keys = append(keys, k)
}
sort.Strings(keys)
for _, k := range keys {
sorted = append(sorted, m[k])
}
return sorted
}
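If the 2D slice has already been built, the same idea can be applied to it directly: order each inner slice first, then order the outer slice. The sketch below assumes a simplified Hit with only the Key field (the Data field is omitted for brevity) and uses the illustrative name sortNested; empty inner slices are placed first.

```go
package main

import (
	"fmt"
	"sort"
)

// Hit is a simplified stand-in for the struct in the question.
type Hit struct {
	Key string
}

// Hits stores groups of hits.
type Hits [][]Hit

// sortNested orders every inner slice by Key, then orders the outer slice
// by the first Key of each group. Empty groups sort before non-empty ones.
func sortNested(h Hits) {
	for _, group := range h {
		sort.Slice(group, func(i, j int) bool {
			return group[i].Key < group[j].Key
		})
	}
	sort.SliceStable(h, func(i, j int) bool {
		if len(h[i]) == 0 || len(h[j]) == 0 {
			return len(h[i]) < len(h[j])
		}
		return h[i][0].Key < h[j][0].Key
	})
}

func main() {
	h := Hits{
		{{Key: "d"}, {Key: "c"}},
		{},
		{{Key: "b"}, {Key: "a"}},
	}
	sortNested(h)
	fmt.Println(h)
}
```

Sorting only the outer slice by c[i][0].Key, as in the question, still leaves the result unstable when the inner slices themselves come out of a map in varying order, which is why the inner sort comes first here.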

Sorting a slice based on the order of the elements in another slice

I am attempting to order a slice based on the order of the elements within another slice. My sort function works when the slice I want to order holds only one element of each type; however, when I start adding more elements, the ordering breaks.
I have created an example within the Golang playground.
https://play.golang.org/p/e9sHIeV2qSf
I want to order my Variant slice by the Code field, in the same order as the codes appear in the Language struct.
Below is the sort function I am using:
sort.Slice(variants, func(i, j int) bool {
for k, language := range languages {
if language.Code == variants[i].Code {
return i >= k
}
}
return false
})
The current order it's returning is:
Sorted slice: [{Code:en-GB} {Code:en-US} {Code:en-GB} {Code:es-ES}
{Code:en-GB} {Code:en-GB} {Code:en-GB} {Code:en-GB} {Code:es-ES}]
When the order within my Language struct is:
"en-GB", "en-US", "fr-FR", "es-ES"
I think to do this, you need to build a ranking of your languages:
// The map must be initialized with make; writing to a nil map panics.
langMap := make(map[string]int, len(languages))
for i, lang := range languages {
	langMap[lang.Code] = i
}
With this, it becomes trivial to just look up the ranking of each item in variants, and return the appropriate value:
sort.Slice(variants, func(i, j int) bool {
iRank, jRank := langMap[variants[i].Code], langMap[variants[j].Code]
return iRank < jRank
})
If there's a chance you may have inputs that are not in the pre-sorted list, you can sort them last:
sort.Slice(variants, func(i, j int) bool {
	iRank, iExists := langMap[variants[i].Code]
	jRank, jExists := langMap[variants[j].Code]
	switch {
	case iExists && jExists:
		// Both exist in the pre-ordered list, so sort by rank
		return iRank < jRank
	case !iExists && !jExists:
		// Neither exists in the pre-ordered list, sort alphabetically
		return variants[i].Code < variants[j].Code
	case iExists:
		// Only i exists, so sort it before j
		return true
	default: // jExists
		// Only j exists, so sort i after j
		return false
	}
})
It is logically possible to do the same by looping through your reference list each time, as you're attempting, but it's much harder to reason about, and far less efficient.
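Put together, the ranking approach can be sketched as a small runnable program. The Language and Variant types here are assumptions based on the field names in the question (the playground code is not reproduced above), and sortByLanguage is an illustrative name. sort.SliceStable is used so that variants with the same code keep their relative order.

```go
package main

import (
	"fmt"
	"sort"
)

// Language and Variant are assumed shapes; only the Code field is used.
type Language struct{ Code string }
type Variant struct{ Code string }

// sortByLanguage orders variants by the position of their Code in
// languages. Codes missing from languages go last, in alphabetical order.
func sortByLanguage(variants []Variant, languages []Language) {
	rank := make(map[string]int, len(languages))
	for i, l := range languages {
		rank[l.Code] = i
	}
	sort.SliceStable(variants, func(i, j int) bool {
		ri, iok := rank[variants[i].Code]
		rj, jok := rank[variants[j].Code]
		switch {
		case iok && jok:
			return ri < rj
		case !iok && !jok:
			return variants[i].Code < variants[j].Code
		default:
			// Exactly one is ranked; the ranked one goes first.
			return iok
		}
	})
}

func main() {
	languages := []Language{{"en-GB"}, {"en-US"}, {"fr-FR"}, {"es-ES"}}
	variants := []Variant{{"es-ES"}, {"en-GB"}, {"en-US"}, {"en-GB"}}
	sortByLanguage(variants, languages)
	fmt.Println(variants)
}
```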

GO: Slice of unique structs effective reusable implementation

I often need to get rid of duplicates based on arbitrary equals function.
I need implementation that:
is fast and memory effective (does not create map)
is reusable and easy to use, think of slice.Sort() (github.com/bradfitz/slice)
it's not required to keep order of the original slice or preserve original slice
would be nice to minimize copying
Can this be implemented in Go? Why is this function not part of any library I am aware of?
I looked at godash (github.com/zillow/godash), for example, but its implementation uses a map and does not allow arbitrary less and equal functions.
Here is approximately how it should look.
Test:
import (
"reflect"
"testing"
)
type bla struct {
ID string
}
type blas []bla
func (slice blas) Less(i, j int) bool {
return slice[i].ID < slice[j].ID
}
func (slice blas) EqualID(i, j int) bool {
return slice[i].ID == slice[j].ID
}
func Test_Unique(t *testing.T) {
input := []bla{bla{ID: "d"}, bla{ID: "a"}, bla{ID: "b"}, bla{ID: "a"}, bla{ID: "c"}, bla{ID: "c"}}
expected := []bla{bla{ID: "a"}, bla{ID: "b"}, bla{ID: "c"}, bla{ID: "d"}}
Unique(input, blas(input).Less, blas(input).EqualID)
if !reflect.DeepEqual(expected, input) {
t.Errorf("2: Expected: %v but was %v \n", expected, input)
}
}
What I think will need to be used to implement this:
Only slices as data structure to keep it simple and for easy sorting.
Some reflection - the hard part for me! Since I am new to go.
Options
You can sort the slice and check adjacent elements: creation = O(n log n), lookup = O(log n), insertion = O(n), deletion = O(n)
You can use a tree together with the original slice: creation = O(n log n), lookup = O(log n), insertion = O(log n), deletion = O(log n), assuming the tree stays balanced, which the simple tree below does not guarantee
In the tree implementation you can store only the index in each tree node; nodes are compared using the Equal/Less functions defined on the interface.
Here is an example with a tree.
You have to add more functions to make it usable, and the code is not cache friendly, so you may want to improve it in that respect.
How to use
Make the type representing the slice implement the Setter interface
set := NewSet(slice) creates a set
Now set.T holds only the indexes of unique values
Implement more functions on Set for other set operations
Code
type Set struct {
T Tree
Slice Setter
}
func NewSet(slice Setter) *Set {
set := new(Set)
set.T = Tree{nil, 0, nil}
set.Slice = slice
for i:=0;i < slice.Len();i++ {
insert(&set.T, slice, i)
}
return set
}
type Setter interface {
Len() int
At(int) (interface{},error)
Less(int, int) bool
Equal(int, int) bool
}
// A Tree is a binary tree with integer values.
type Tree struct {
Left *Tree
Value int
Right *Tree
}
func insert(t *Tree, Setter Setter, index int) *Tree {
if t == nil {
return &Tree{nil, index, nil}
}
if Setter.Equal(t.Value, index) {
return t
}
if Setter.Less(t.Value, index) {
t.Left = insert(t.Left, Setter, index)
return t
}
t.Right = insert(t.Right, Setter, index)
return t
}
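A Bloom filter is frequently used for membership tests. There is https://github.com/willf/bloom, for example, which has earned some stars on GitHub. This particular implementation uses murmur3 for hashing and a bitset for the filter, so it can be more memory-efficient than a map. Keep in mind that a Bloom filter can report false positives, so on its own it cannot prove that an element really is a duplicate.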
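The first option listed above (sort, then compare adjacent elements) can be sketched without a map and without reflection, assuming Go 1.18+ generics are available. The signature below is an illustration rather than an existing library function; it takes less and equal as functions over values instead of indexes, and it requires that elements considered equal end up adjacent after sorting with less.

```go
package main

import (
	"fmt"
	"sort"
)

type bla struct {
	ID string
}

// Unique sorts s in place using less, then drops every element that equal
// reports as matching its predecessor. The returned slice reuses the
// backing array of s, so no extra slice or map is allocated.
func Unique[T any](s []T, less, equal func(a, b T) bool) []T {
	if len(s) < 2 {
		return s
	}
	sort.Slice(s, func(i, j int) bool { return less(s[i], s[j]) })
	out := s[:1]
	for _, v := range s[1:] {
		if !equal(out[len(out)-1], v) {
			// Writes never overtake reads, so compacting in place is safe.
			out = append(out, v)
		}
	}
	return out
}

func main() {
	input := []bla{{ID: "d"}, {ID: "a"}, {ID: "b"}, {ID: "a"}, {ID: "c"}, {ID: "c"}}
	unique := Unique(input,
		func(a, b bla) bool { return a.ID < b.ID },
		func(a, b bla) bool { return a.ID == b.ID },
	)
	fmt.Println(unique)
}
```

Because a function cannot shorten the caller's slice in place, the result has to be returned and reassigned, which differs slightly from the call shape used in the test in the question.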

How to check the uniqueness inside a for-loop?

Is there a way to check slices/maps for the presence of a value?
I would like to add a value to a slice only if it does not exist in the slice.
This works, but it seems verbose. Is there a better way to do this?
orgSlice := []int{1, 2, 3}
newSlice := []int{}
newInt := 2
newSlice = append(newSlice, newInt)
for _, v := range orgSlice {
if v != newInt {
newSlice = append(newSlice, v)
}
}
// newSlice == [2 1 3]
Your approach would take linear time for each insertion. A better way would be to use a map[int]struct{}. Alternatively, you could also use a map[int]bool or something similar, but the empty struct{} has the advantage that it doesn't occupy any additional space. Therefore map[int]struct{} is a popular choice for a set of integers.
Example:
set := make(map[int]struct{})
set[1] = struct{}{}
set[2] = struct{}{}
set[1] = struct{}{}
// ...
for key := range set {
fmt.Println(key)
}
// each value will be printed only once, in no particular order
// you can use the ,ok idiom to check for existing keys
if _, ok := set[1]; ok {
fmt.Println("element found")
} else {
fmt.Println("element not found")
}
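To add a value only when it is not already present while still keeping the values in a slice, the set can be kept alongside the slice. This is a sketch; the UniqueInts type and its methods are hypothetical names used for illustration.

```go
package main

import "fmt"

// UniqueInts keeps values in insertion order and rejects repeats.
type UniqueInts struct {
	seen  map[int]struct{}
	items []int
}

// Add appends v if it has not been added before and reports whether it
// was appended.
func (u *UniqueInts) Add(v int) bool {
	if u.seen == nil {
		u.seen = make(map[int]struct{})
	}
	if _, ok := u.seen[v]; ok {
		return false
	}
	u.seen[v] = struct{}{}
	u.items = append(u.items, v)
	return true
}

// Items returns the values in the order they were first added.
func (u *UniqueInts) Items() []int {
	return u.items
}

func main() {
	var u UniqueInts
	for _, v := range []int{1, 2, 3, 2} {
		u.Add(v)
	}
	fmt.Println(u.Items())
}
```

Each Add costs one map lookup on average, at the price of storing every value twice.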
Most efficient is likely to be iterating over the slice and appending if you don't find it.
func AppendIfMissing(slice []int, i int) []int {
for _, ele := range slice {
if ele == i {
return slice
}
}
return append(slice, i)
}
It's simple and obvious and will be fast for small lists.
Further, it will always be faster than your current map-based solution. The map-based solution iterates over the whole slice no matter what; this solution returns immediately when it finds that the new value is already present. Both solutions compare elements as they iterate. (Each map assignment statement certainly does at least one map key comparison internally.) A map would only be useful if you could maintain it across many insertions. If you rebuild it on every insertion, then all advantage is lost.
If you truly need to handle large lists efficiently, consider maintaining the lists in sorted order. (I suspect the order doesn't matter to you because your first solution appended at the beginning of the list and your latest solution appends at the end.) If you always keep the lists sorted, then you can use the sort.Search function to do efficient binary insertions.
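A sketch of that sorted-insertion idea, using sort.SearchInts (the int-specific wrapper around sort.Search) and assuming the slice is already sorted in ascending order. InsertSorted is an illustrative name.

```go
package main

import (
	"fmt"
	"sort"
)

// InsertSorted inserts v into the ascending slice s unless it is already
// present, and returns the resulting slice.
func InsertSorted(s []int, v int) []int {
	// Binary search for the first index whose element is >= v.
	i := sort.SearchInts(s, v)
	if i < len(s) && s[i] == v {
		return s // already present
	}
	s = append(s, 0)     // make room for one more element
	copy(s[i+1:], s[i:]) // shift the tail one position to the right
	s[i] = v
	return s
}

func main() {
	var s []int
	for _, v := range []int{3, 1, 2, 3} {
		s = InsertSorted(s, v)
	}
	fmt.Println(s)
}
```

The lookup is O(log n), but the shift still makes each insertion O(n) in the worst case, so this mainly pays off when lookups dominate.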
Another option:
package main
import "golang.org/x/tools/container/intsets"
func main() {
var (
a intsets.Sparse
b bool
)
b = a.Insert(9)
println(b) // true
b = a.Insert(9)
println(b) // false
}
https://pkg.go.dev/golang.org/x/tools/container/intsets
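Use this option if the number of values to add is unknown:
AppendIfMissing := func(sl []int, n ...int) []int {
	// Record every value already present in the slice
	cache := make(map[int]struct{}, len(sl))
	for _, elem := range sl {
		cache[elem] = struct{}{}
	}
	for _, elem := range n {
		if _, ok := cache[elem]; !ok {
			sl = append(sl, elem)
			// Remember the new value so repeats within n are skipped too
			cache[elem] = struct{}{}
		}
	}
	return sl
}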
Removing duplicates from a slice of structs:
func distinctObjects(objs []ObjectType) (distinctedObjs [] ObjectType){
var output []ObjectType
for i:= range objs{
if output==nil || len(output)==0{
output=append(output,objs[i])
} else {
founded:=false
for j:= range output{
if output[j].fieldname1==objs[i].fieldname1 && output[j].fieldname2==objs[i].fieldname2 &&......... {
founded=true
}
}
if !founded{
output=append(output,objs[i])
}
}
}
return output
}
where the struct is something like:
type ObjectType struct {
fieldname1 string
fieldname2 string
.........
}
Objects are considered distinct based on the fields checked here:
if output[j].fieldname1==objs[i].fieldname1 && output[j].fieldname2==objs[i].fieldname2 &&......... {
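When every field of the struct is comparable and all fields should take part in the comparison, the struct itself can serve as a map key, which avoids the nested loop. This sketch assumes a two-field ObjectType purely for illustration; the real struct's remaining fields are not shown in the answer above.

```go
package main

import "fmt"

// ObjectType is a hypothetical two-field version of the struct above.
// All fields must be comparable for the struct to be usable as a map key.
type ObjectType struct {
	fieldname1 string
	fieldname2 string
}

// distinctObjects returns the first occurrence of each distinct object,
// preserving the original order.
func distinctObjects(objs []ObjectType) []ObjectType {
	seen := make(map[ObjectType]struct{}, len(objs))
	out := make([]ObjectType, 0, len(objs))
	for _, o := range objs {
		if _, ok := seen[o]; ok {
			continue // an identical object was already kept
		}
		seen[o] = struct{}{}
		out = append(out, o)
	}
	return out
}

func main() {
	objs := []ObjectType{
		{"a", "1"},
		{"b", "2"},
		{"a", "1"},
	}
	fmt.Println(distinctObjects(objs))
}
```

If only some fields should decide equality, a smaller key struct holding just those fields could be used as the map key instead.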

Resources