Implementing a Merkle-tree data structure in Go - data-structures

I'm currently attempting to implement a merkle-tree data structure in Go. Basically, my end goal is to store a small set of structured data (10MB max) and allow this "database" to be easily synchronised with other nodes distributed over the network (see related ).
I've implemented this reasonably effectively in Node as there are no type-checks. Herein lies the problem with Go, I'd like to make use of Go's compile-time type checks, though I also want to have one library which works with any provided tree.
In short, I'd like to use structs as merkle nodes and I'd like to have one Merkle.Update() method which is embedded in all types. I'm trying to avoid writing an Update() for every struct (though I'm aware this might be the only/best way).
My idea was to use embedded types:
//library
type Merkle struct {
Initialised bool
Container interface{} //in example this references foo
Fields []reflect.Type
//... other merkle state
}
//Merkle methods... Update()... etc...
//userland
type Foo struct {
Merkle
A int
B bool
C string
D map[string]*Bazz
E []*Bar
}
type Bazz struct {
Merkle
S int
T int
U int
}
type Bar struct {
Merkle
X int
Y int
Z int
}
In this example, Foo will be the root, which will contain Bazzs and Bars. This relationship could be inferred by reflecting on the types. The problem is the usage:
foo := &Foo{
A: 42,
B: true,
C: "foo",
D: map[string]*Bazz{
"b1": &Bazz{},
"b2": &Bazz{},
},
E: []*Bar{
&Bar{},
&Bar{},
&Bar{},
},
}
merkle.Init(foo)
foo.Hash //Initial hash => abc...
foo.A = 35
foo.E = append(foo.E, &Bar{})
foo.Update()
foo.Hash //Updated hash => def...
I think we need to merkle.Init(foo) since foo.Init() would actually be foo.Merkle.Init() and would not be able to reflect on foo. The uninitialised Bars and Bazzs could be detected and initialised by the parent foo.Update(). Some reflection is acceptable as correctness is more important than performance at the moment.
Another problem is, when we Update() a node, all struct fields (child nodes) would need to be Update()d as well (rehashed) since we aren't sure what was changed. We could do foo.SetInt("A", 35) to implement an auto-update, though then we lose compile time type-checks.
Would this be considered idiomatic Go? If not, how could this be improved? Can anyone think of an alternative way to store a dataset in memory (for fast reads) with concise dataset comparison (for efficient delta transfers over the network)?
Edit: And also a meta-question: Where is the best place to ask this kind of question, StackOverflow, Reddit or go-nuts? Originally posted on reddit with no answer :(

Some goals seem like:
Hash anything -- make it easy to use by hashing lots of things out of the box
Cache hashes -- make updates just rehash what they need to
Be idiomatic -- fit in well among other Go code
I think you can attack hashing anything roughly the way that serialization tools like the built-in encoding/gob or encoding/json do, which is three-pronged: use a special method if the type implements it (for JSON that's MarshalJSON), use a type switch for basic types, and fall back to a nasty default case using reflection. Here's an API sketch that provides a helper for hash caching and lets types either implement Hash or not:
package merkle
type HashVal uint64
const MissingHash HashVal = 0
// Hasher provides a custom hash implementation for a type. Not
// everything needs to implement it, but doing so can speed
// updates.
type Hasher interface {
Hash() HashVal
}
// HashCacher is the interface for items that cache a hash value.
// Normally implemented by embedding HashCache.
type HashCacher interface {
CachedHash() *HashVal
}
// HashCache implements HashCacher; it's meant to be embedded in your
// structs to make updating hash trees more efficient.
type HashCache struct {
h HashVal
}
// CachedHash implements HashCacher.
func (h *HashCache) CachedHash() *HashVal {
return &h.h
}
// Hash returns something's hash, using a cached hash or Hash() method if
// available.
func Hash(i interface{}) HashVal {
if hashCacher, ok := i.(HashCacher); ok {
if cached := *hashCacher.CachedHash(); cached != MissingHash {
return cached
}
}
switch i := i.(type) {
case Hasher:
return i.Hash()
case uint64:
return HashVal(i * 8675309) // or, you know, use a real hash
case []byte:
// CRC the bytes, say
return 0xdeadbeef
default:
return 0xdeadbeef
// terrible slow recursive case using reflection
// like: iterate fields using reflect, then hash each
}
// instead of panic()ing here, you could live a little
// dangerously and declare that changes to unhashable
// types don't invalidate the tree
panic("unhashable type passed to Hash()")
}
// Item is a node in the Merkle tree, which must know how to find its
// parent Item (the root node should return nil) and should usually
// embed HashCache for efficient updates. To avoid using reflection,
// Items might benefit from being Hashers as well.
type Item interface {
Parent() Item
HashCacher
}
// Update updates the chain of items between i and the root, given the
// leaf node that may have been changed.
func Update(i Item) {
for i != nil {
cached := i.CachedHash()
*cached = MissingHash // invalidate
*cached = Hash(i)
i = i.Parent()
}
}

Go doesn't have inheritance in the same way other languages do.
The "parent" can't modify items in the child, you'd have to implement Update on each struct then do your business in it then have it call the parent's Update.
func (b *Bar) Update() {
b.Merkle.Update()
//do stuff related to b and b.Merkle
//stuff
}
func (f *Foo) Update() {
f.Merkle.Update()
for _, b := range f.E {
b.Update()
}
//etc
}
I think you will have to re-implement your tree in a different way.
Also please provide a testable case the next time.

Have you seen https://github.com/xsleonard/go-merkle which will allow you to create a binary merkle tree. You could append a type byte to the end of your data to identify it.

Related

How to test if a generic type can fit a more restrictive interface - golang

I am building an all-purpose data structure, that I intend to use in various contexts and with various bits of data.
I am currently attempting to make a matcher, that will look into my structure and return all nodes containing the data given. My problem being that, since I need my structure to be as generic as possible, I need my data to be of a generic type matching any, and this won't allow me to make equalities.
I have built a "descendant type" (there's probably a correct term, I'm self-taught on this) that has the more rigorous comparable constraint.
I want to see if I can convert from the more general one to the more specific one (even if I have to catch an error and return it to the user). I know that I don't specifically need to, but it makes the code understandable down the line if i do it like that.
Here's a bit of code to explain my question :
type DataStructureGeneralCase[T any] struct {
/*
my data structure, which is too long to make for a good example, so I'm using a slice instead
*/
data []T
}
type DataStructureSpecific[T comparable] DataStructureGeneralCase[T]
// this works because any contains comparable
func (ds *DataStructureSpecific[T]) GetMatching(content T) int {
/*The real function returns my custom Node type, but let's be brief*/
for idx, item := range ds.data {
if item == content {
return idx
}
}
return -1
}
func (dg *DataStructureGeneralCase[T]) TryMatching(content T) (int, error) {
if ds, ok := (*dg).(DataStructureGeneral); ok {
// Does not work because dg is not interface, which I understand
} else {
return -1, fmt.Errorf("Could not convert because of non-comparable content")
}
}
My question can be summarized to "How can I do my conversion ?".
Cast to the empty interface first:
castedDg := interface{}(dg)
if ds, ok := castedDg.(DataStructureGeneralCase[T]); ok {
They explain why they chose this approach in https://go.googlesource.com/proposal/+/refs/heads/master/design/43651-type-parameters.md#why-not-permit-type-assertions-on-values-whose-type-is-a-type-parameter

map[T]struct{} and map[T]bool in golang

What's the difference? Is map[T]bool optimized to map[T]struct{}? Which is the best practice in Go?
Perhaps the best reason to use map[T]struct{} is that you don't have to answer the question "what does it mean if the value is false"?
From "The Go Programming Language":
The struct type with no fields is called the empty struct, written
struct{}. It has size zero and carries no information but may be
useful nonetheless. Some Go programmers use it instead of bool as the
value type of a map that represents a set, to emphasize that only the
keys are significant, but the space saving is marginal and the syntax
more cumbersome, so we generally avoid it.
If you use bool testing for presence in the "set" is slightly nicer since you can just say:
if mySet["something"] {
/* .. */
}
Difference is in memory requirements. Under the bonnet empty struct is not a pointer but a special value to save memory.
An empty struct is a struct type like any other. All the properties you are used to with normal structs apply equally to the empty struct. You can declare an array of structs{}s, but they of course consume no storage.
var x [100]struct{}
fmt.Println(unsafe.Sizeof(x)) // prints 0
If empty structs hold no data, it is not possible to determine if two struct{} values are different.
Considering the above statements it means that we may use them as method receivers.
type S struct{}
func (s *S) addr() { fmt.Printf("%p\n", s) }
func main() {
var a, b S
a.addr() // 0x1beeb0
b.addr() // 0x1beeb0
}

GoLang conventions - create custom type from slice

Is it a good idea to create own type from a slice in Golang?
Example:
type Trip struct {
From string
To string
Length int
}
type Trips []Trip // <-- is this a good idea?
func (trips *Trips) TotalLength() int {
ret := 0
for _, i := range *trips {
ret += i.Length
}
return ret
}
Is it somehow a convention in Golang to create types like Trips in my example? Or it is better to use []Trip in the whole project? Any pros and cons?
There's no convention, as far as I am aware of. It's OK to create a slice type if you really need it. In fact, if you ever want to sort your data, this is pretty much the only way: create a type and define the sort.Interface methods on it.
Also, in your example there is no need to take the address of Trips since slice is already a "fat pointer" of a kind. So you can simplify your method to:
func (trips Trips) TotalLength() (tl int) {
for _, l := range trips {
tl += l.Length
}
return tl
}
If this is what your type is (a slice), it's just fine. It gives you an easy access to underlying elements (and allows for range iteration) while providing additional methods.
Of course you probably should only keep essential set of methods on this type and not bloating it with everything that would take []Trip as an argument. (For example I would suggest having DrawTripsOnTheGlobe(t Trips) rather than having it as a Trips' method.)
To calm your mind there are plenty of such slice-types in standard packages:
http://golang.org/pkg/net/#IP
http://golang.org/pkg/sort/#Float64Slice
http://golang.org/pkg/sort/#IntSlice
http://golang.org/pkg/encoding/json/#RawMessage

Creating objects in Go

I'm playing around with Go for the first time. Consider this example.
type Foo struct {
Id int
}
func createFoo(id int) Foo {
return Foo{id}
}
This is perfectly fine for small objects, but how to create factory function for big objects?
In that case it's better to return pointer to avoid copying large chunks of data.
// now Foo has a lot of fields
func createFoo(id int /* other data here */) *Foo {
x := doSomeCalc()
return &Foo{
Id: id
//X: x and other data
}
}
or
func createFoo(id int /* other data here */) *Foo {
x := doSomeCalc()
f := new(Foo)
f.Id = id
//f.X = x and other data
return f
}
What's the difference between these two? What's the canonical way of doing it?
The convention is to write NewFoo functions to create and initialize objects. Examples:
xml.NewDecoder
http.NewRequest
You can always return pointers if you like since there is no syntactic difference when accessing methods or attributes. I would even go as far and say that it is often more convenient to return pointers so that you can use pointer receiver methods directly on the returned object. Imagine a base like this:
type Foo struct{}
func (f *Foo) M1() {}
When returning the object you cannot do this, since the returned value is not addressable (example on play):
NewFoo().M1()
When returning a pointer, you can do this. (example on play)
There is no difference. Sometimes one version is the "natural one", sometimes the other. Most gophers would prefere the first variant (unless the second has some advantages).
(Nitpick: Foo{id} is bad practice. Use Foo{Id: id} instead.)

Best practice for unions in Go

Go has no unions. But unions are necessary in many places. XML makes excessive use of unions or choice types. I tried to find out, which is the preferred way to work around the missing unions. As an example I tried to write Go code for the non terminal Misc in the XML standard which can be either a comment, a processing instruction or white space.
Writing code for the three base types is quite simple. They map to character arrays and a struct.
type Comment Chars
type ProcessingInstruction struct {
Target *Chars
Data *Chars
}
type WhiteSpace Chars
But when I finished the code for the union, it got quite bloated with many redundant functions. Obviously there must be a container struct.
type Misc struct {
value interface {}
}
In order to make sure that the container holds only the three allowed types I made the value private and I had to write for each type a constructor.
func MiscComment(c *Comment) *Misc {
return &Misc{c}
}
func MiscProcessingInstruction (pi *ProcessingInstruction) *Misc {
return &Misc{pi}
}
func MiscWhiteSpace (ws *WhiteSpace) *Misc {
return &Misc{ws}
}
In order to be able to test the contents of the union it was necessary to write three predicates:
func (m Misc) IsComment () bool {
_, itis := m.value.(*Comment)
return itis
}
func (m Misc) IsProcessingInstruction () bool {
_, itis := m.value.(*ProcessingInstruction)
return itis
}
func (m Misc) IsWhiteSpace () bool {
_, itis := m.value.(*WhiteSpace)
return itis
}
And in order to get the correctly typed elements it was necessary to write three getters.
func (m Misc) Comment () *Comment {
return m.value.(*Comment)
}
func (m Misc) ProcessingInstruction () *ProcessingInstruction {
return m.value.(*ProcessingInstruction)
}
func (m Misc) WhiteSpace () *WhiteSpace {
return m.value.(*WhiteSpace)
}
After this I was able to create an array of Misc types and use it:
func main () {
miscs := []*Misc{
MiscComment((*Comment)(NewChars("comment"))),
MiscProcessingInstruction(&ProcessingInstruction{
NewChars("target"),
NewChars("data")}),
MiscWhiteSpace((*WhiteSpace)(NewChars(" \n")))}
for _, misc := range miscs {
if (misc.IsComment()) {
fmt.Println ((*Chars)(misc.Comment()))
} else if (misc.IsProcessingInstruction()) {
fmt.Println (*misc.ProcessingInstruction())
} else if (misc.IsWhiteSpace()) {
fmt.Println ((*Chars)(misc.WhiteSpace()))
} else {
panic ("invalid misc");
}
}
}
You see there is much code looking almost the same. And it will be the same for any other union. So my question is: Is there any way to simplify the way to deal with unions in Go?
Go claims to simplify programing work by removing redundancy. But I think the above example shows the exact opposite. Did I miss anything?
Here is the complete example: http://play.golang.org/p/Zv8rYX-aFr
As it seems that you're asking because you want type safety, I would firstly argue that your initial
solution is already unsafe as you have
func (m Misc) Comment () *Comment {
return m.value.(*Comment)
}
which will panic if you haven't checked IsComment before. Therefore this solution has no benefits over
a type switch as proposed by Volker.
Since you want to group your code you could write a function that determines what a Misc element is:
func IsMisc(v {}interface) bool {
switch v.(type) {
case Comment: return true
// ...
}
}
That, however, would bring you no compiler type checking either.
If you want to be able to identify something as Misc by the compiler then you should
consider creating an interface that marks something as Misc:
type Misc interface {
ImplementsMisc()
}
type Comment Chars
func (c Comment) ImplementsMisc() {}
type ProcessingInstruction
func (p ProcessingInstruction) ImplementsMisc() {}
This way you could write functions that are only handling misc. objects and get decide later
what you really want to handle (Comments, instructions, ...) in these functions.
If you want to mimic unions then the way you wrote it is the way to go as far as I know.
I think this amount of code might be reduced, e.g. I personally do not think that safeguarding type Misc against containing "illegal" stuff is really helpful: A simple type Misc interface{} would do, or?
With that you spare the constructors and all the Is{Comment,ProcessingInstruction,WhiteSpace} methods boil down to a type switch
switch m := misc.(type) {
Comment: fmt.Println(m)
...
default: panic()
}
Thats what package encoding/xml does with Token.
I am not sure to understand your issue. The 'easy' way to do it would be like the encoding/xml package with interface{}. If you do not want to use interfaces, then you can do something like you did.
However, as you stated, Go is a typed language and therefore should be use for typed needs.
If you have a structured XML, Go can be a good fit, but you need to write your schema. If you want a variadic schema (one given field can have multiple types), then you might be better off with an non-typed language.
Very useful tool for json that could easily rewritten for xml:
http://mholt.github.io/json-to-go/
You give a json input and it gives you the exact Go struct. You can have multiple types, but you need to know what field has what type. If you don't, you need to use the reflection and indeed you loose a lot of the interest of Go.
TL;DR You don't need a union, interface{} solves this better.
Unions in C are used to access special memory/hardware. They also subvert the type system. Go does not have the language primitives access special memory/hardware, it also shunned volatile and bit-fields for the same reason.
In C/C++ unions can also be used for really low level optimization / bit packing. The trade off: sacrifice the type system and increase complexity in favor of saving some bits. This of course comes with all the warnings about optimizations.
Imagine Go had a native union type. How would the code be better? Rewrite the code with this:
// pretend this struct was a union
type MiscUnion struct {
c *Comment
pi *ProcessingInstruction
ws *WhiteSpace
}
Even with a builtin union accessing the members of MiscUnion requires a runtime check of some kind. So using an interface is no worse off. Arguably the interface is superior as the runtime type checking is builtin (impossible to get wrong) and has really nice syntax for dealing with it.
One advantage of a union type is static type check to make sure only proper concrete types where put in a Misc. The Go way of solving this is "New..." functions, e.g. MiscComment, MiscProcessingInstruction, MiscWhiteSpace.
Here is a cleaned up example using interface{} and New* functions: http://play.golang.org/p/d5bC8mZAB_

Resources