Representing ASTs as structs and serializing them in Go

I want to encode/decode an AST in Go but am running up against a roadblock with the lack of sum types.
A very trivial AST to demonstrate the problem:
data BoolAST = Or BoolAST BoolAST | And BoolAST BoolAST | Lit Bool
Approach 1
type BoolStatement interface {
	ComputeBool() bool
	Serialize() []byte
}

type Or struct {
	Left  BoolStatement
	Right BoolStatement
}

type And struct {
	Left  BoolStatement
	Right BoolStatement
}

type BoolFunc struct {
	Value bool
}
Now this structure serializes very well, but what I am having trouble with is deserializing (msgpack, JSON, etc.). You would have to wrap the interface in a struct so you can attach a deserializer method to it. This also seems sloppy to me, as I don't want to have to hand-craft a serializer.
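What that wrapper could look like, as a rough sketch: it assumes a "type" discriminator field in the JSON (my convention, not something the encoders give you for free), and that Or, And, and BoolFunc implement BoolStatement as above. boolStatementJSON and decodeBoolStatement are illustrative names; it needs encoding/json and fmt.

type boolStatementJSON struct {
	Type  string          `json:"type"`
	Left  json.RawMessage `json:"left,omitempty"`
	Right json.RawMessage `json:"right,omitempty"`
	Value bool            `json:"value,omitempty"`
}

func decodeBoolStatement(data []byte) (BoolStatement, error) {
	var raw boolStatementJSON
	if err := json.Unmarshal(data, &raw); err != nil {
		return nil, err
	}
	switch raw.Type {
	case "or", "and":
		left, err := decodeBoolStatement(raw.Left)
		if err != nil {
			return nil, err
		}
		right, err := decodeBoolStatement(raw.Right)
		if err != nil {
			return nil, err
		}
		if raw.Type == "or" {
			return Or{Left: left, Right: right}, nil
		}
		return And{Left: left, Right: right}, nil
	case "lit":
		return BoolFunc{Value: raw.Value}, nil
	default:
		return nil, fmt.Errorf("unknown node type %q", raw.Type)
	}
}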
Approach 2
To solve this I changed my struct a little bit. Every node is the same struct, but only one field is populated at a time. This makes deserialization a breeze.
type BoolAST struct {
	Or  []BoolAST
	And []BoolAST
	Lit bool
}
This deserializes well because absent fields are simply left nil, but then I am stuck with those empty fields. When interpreting the AST I have to use switch statements to find which field is non-nil, and that adds a little overhead.
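For example, evaluation has to probe the fields in turn (a sketch reusing the ComputeBool name from Approach 1, and assuming exactly two children per Or/And node):

func (b BoolAST) ComputeBool() bool {
	switch {
	case b.Or != nil:
		return b.Or[0].ComputeBool() || b.Or[1].ComputeBool()
	case b.And != nil:
		return b.And[0].ComputeBool() && b.And[1].ComputeBool()
	default:
		return b.Lit
	}
}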
This also makes the json nice:
{
	"and": [
		{"lit": true},
		{"or": [
			{"lit": false},
			{"lit": true}
		]}
	]
}
But this can get crazy when the AST has many kinds of branching nodes: you might end up with 10 nil fields on the struct.
Without sum types this problem seems very hard to solve cleanly. Is this the right approach?

Related

Deriving a HashMap/BTree from a struct of values in Rust

I am trying to do the following:
Deserialise a struct of fields into another struct with a different format.
I am trying to do the transformation via a HashMap, as this seems to be the best-suited way.
I am required to do this as part of the transformation of a legacy system, this being one of the intermediary phases.
I have raised another question which caters to a subset of the same use case, but I do not think it gives enough overview, hence this question with more details and code.
(Rust: converting a struct to key-value pairs where the field is not None)
Will be merging both questions once I figure out how.
Scenario:
I am getting input via IPC through a huge proto.
I am using prost to deserialise this data into a struct.
My deserialised struct has n fields, and that number can increase.
I need to transform this deserialised struct into a key/value struct (shown below).
The incoming data will mostly have a majority of null keys, i.e. out of n fields, most likely only 1 or 2 have values at a given time.
Input struct after prost deserialisation:
Using proto3, so I am unable to define optional fields.
Ideally I would prefer a prost struct with an Option on every field, i.e. Option<String> instead of String.
struct Data {
    field1: i32,
    field2: i64,
    field3: String,
    ...
}
This needs to be transformed into a generic struct as below for further use:
struct TransformedData {
    key: String,
    value: String,
}
So,
Data {
    field1: 20,
    field2: null,
    field3: null,
    field4: null,
    ....
}
becomes
TransformedData {
    key: "field1",
    value: "20",
}
Methods I have tried:
Method 1:
Add serde to the prost struct definition and deserialise it into a map.
Loop over each item in the map to get the values which are non-null.
Use these to create the struct.
https://play.rust-lang.org/?version=stable&mode=debug&edition=2015&gist=9645b1158de31fd54976926c9665d6b4
This has its challenges:
1. Each primitive data type has its own default value that needs to be checked for.
2. Nested structs will result in an object data type which needs to be handled.
3. I need to iterate over every field of the struct to determine the non-null values.
1 and 2 can be mitigated by setting an optional field (yet to figure out how in proto3 and prost), but I am not sure how to get over 3.
Iterating over a large struct to find non-null fields does not scale and will be expensive.
Method 2:
I am using prost-reflect's dynamic reflection to deserialise and get only the specified value.
I do this by ensuring each proto message has two extra fields:
proto -> identifies the proto struct being used when serialising.
key -> identifies the field which has a value when serialising.
let fd = package::FILE_DESCRIPTOR_SET;
let pool = DescriptorPool::decode(fd).unwrap();
let proto_message: package::GenericProto = prost::Message::decode(message.as_ref()).expect("error when de-serialising protobuf message to struct");
let proto = proto_message.proto.as_str();
let key = proto_message.key.as_str();
Then, using key, I retrieve the value from what looks to be a map implemented by prost-reflect:
let message_descriptor = pool.get_message_by_name(proto).unwrap();
let dynamic_message = DynamicMessage::decode(message_descriptor, message.as_ref()).unwrap();
let data = dynamic_message.get_field_by_name(key).unwrap().as_ref().clone();
Here:
1. This fails when someone sends a struct with multiple fields filled.
2. This does not work for nested objects, as nested objects in turn need to be converted to a map.
I am looking for the least expensive way to do the above.
I understand the solution I am looking for is pretty out there, and I am taking a one-size-fits-all approach.
Any pointers appreciated.

Best patterns for dynamic checks in Go without reflection?

Are there good patterns in any of your codebases that help reduce redundant checks on typed fields/structs? I feel like Lisp handles these things well, but I am trying to understand whether there are good patterns that avoid the need for something like reflection, etc.
For example, for custom structs, I find myself needing to do a lot of:
type helloworld struct {
	field1 *string
	field2 *string
	field3 int
}

if field1 != nil {
	if len([]rune(*field1)) > MAX_ALLOWED_LEN {
		return errors.New("exceeded max allowable length")
	}
}
if field2 != nil {
	if len([]rune(*field2)) > MAX_ALLOWED_LEN {
		return errors.New("exceeded max allowable length")
	}
}
if field3 != 0 {
	// do stuff
}
....
maps are great, but their typing is often too generic, which can also cause confusion. I ask because some of these checks are redundant and proliferate through a lot of packages, so when I need to add an additional field, refactoring is time-consuming. It would be nice to have validation on certain types or fields while keeping performance high.
Three directions, none of them the most straightforward, but they can reduce your check blocks:
Method 1: Define your own annotation
Golang has annotations in structs:

type bla struct {
	t string `json:"some,omitempty"`
}

This is actually an open annotation mechanism (struct tags).
A nice example (it contains some reflection) is https://github.com/mitchellh/mapstructure
Look at line 374 of the mapstructure.go file. It is pretty reusable, and you could use it as the base for your own annotation.
To use it in your context, you would create types which you can then parse.
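For illustration, a minimal sketch of reading a home-grown tag with the standard reflect package; the check tag and its maxlen=64 value are made up for this example, and printTags assumes v holds a struct (it needs fmt and reflect):

type limited struct {
	T string `check:"maxlen=64"`
}

func printTags(v interface{}) {
	t := reflect.TypeOf(v)
	for i := 0; i < t.NumField(); i++ {
		// Tag.Get returns the value stored under the given key, e.g. "maxlen=64".
		fmt.Println(t.Field(i).Tag.Get("check"))
	}
}

A validator would parse that string once per type and then apply the resulting rules to every value it sees.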
Method 2: DRY check function
Your checks seem to be the same or similar.
A generic checker which iterates over the fields:
func check(fields ...interface{}) error {
	for _, f := range fields {
		switch v := f.(type) {
		case *string:
			if v == nil {
				return errors.New("nil")
			}
			if len([]rune(*v)) > MAX_ALLOWED_LEN {
				return errors.New("exceeded max allowable length")
			}
			// ... more cases for other field types
		}
	}
	return nil
}
Method 3: Generics.
The type switch in Method 2 might be rewritten in a generic style. Not easier, but it can be prettier.
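A minimal sketch of what that could look like with Go 1.18+ type parameters; checkLen is an illustrative name, not an existing function:

func checkLen[T ~string](v *T) error {
	if v == nil {
		return errors.New("nil")
	}
	if len([]rune(string(*v))) > MAX_ALLOWED_LEN {
		return errors.New("exceeded max allowable length")
	}
	return nil
}

Each call like checkLen(h.field1) is checked at compile time, so the runtime type switch disappears, though you still write one call per field.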

Polymorphism on structs without methods in Go

I'm working on several web server projects in Go, and there is a common problem that I keep facing. I know we can achieve something like polymorphism in Go with interfaces and methods, but I often have a scenario where I need polymorphism on some data-holder structs that (maybe) just share some common fields and have no methods at all.
For example consider a story writing platform, where each user can write short stories and novels:
type ShortStory struct {
	Name string
	ID   int
	Body string
}

type LongStory struct {
	Name     string
	ID       int
	Chapters []string
}
Now I simply want to have a data layer function, say GetStories(), which fetches all stories written by a user from database.
func GetStories(id int) []SOME_TYPE {
	...
}
There are really no methods that I want to have on my ShortStory and LongStory structs. I know I can add a dummy method and let them satisfy some Storier interface, then use that interface as the return type. But since there is no method I would want on a data-container model, adding a dummy method just so the language enables a feature seems like a poor design choice to me.
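For concreteness, that dummy-method version would look something like this (a sketch; isStory is an illustrative name):

type Storier interface {
	isStory()
}

func (ShortStory) isStory() {}
func (LongStory) isStory()  {}

func GetStories(id int) []Storier {
	// fetch the user's stories from the database and
	// append ShortStory and LongStory values to the result
	return nil
}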
I can also make the function return []interface{}, but that's against the whole idea of "typed language" I believe.
Another way is to have two separate GetShortStories() and GetLongStories() functions, each returning a slice of its own type. But at some point I would finally want to merge those two slices into one, and there I would again need a []interface{}. Yes, I can return JSON like:
{
"short_stories" : [...],
"long_stories" : [...]
}
But I want my json to be like:
[{...}, {...}, {...}]
And I wouldn't change my APIs because of a language's limits!
I'm not a pro in Go, so am I missing something here? Is there a Go-ish approach to this, or is it really bad language design on Golang's side?
If you cannot express what you want to do using the features of a language, you should first try to change the way you structure your program before blaming the language itself. There are concepts that cannot be expressed in Go but can be expressed well in other languages, and there are concepts that cannot be expressed well in other languages but can be in Go. Change the way you solve the problem to use the language effectively.
One way you can address your problem is using a different type of struct:
type Story struct {
	Name      string
	ID        int
	ShortBody string
	Chapters  []string
}
If Chapters is empty, then it is a short story.
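(A one-line check then distinguishes the two cases; IsShort is an illustrative name:)

func (s Story) IsShort() bool {
	return len(s.Chapters) == 0
}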
Another way:
type Story struct {
	Name    string
	ID      int
	Content StoryContent
}

type StoryContent interface {
	Type() string
}

type ShortStory interface {
	StoryContent
	Body() string
}

type LongStory interface {
	StoryContent
	Chapters() []string
}
etc.
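A minimal sketch of concrete types satisfying those interfaces; shortContent and longContent are illustrative names:

type shortContent struct{ body string }

func (shortContent) Type() string   { return "short" }
func (c shortContent) Body() string { return c.body }

type longContent struct{ chapters []string }

func (longContent) Type() string          { return "long" }
func (c longContent) Chapters() []string  { return c.chapters }

GetStories can then return []Story, and callers that need the body or the chapters type-assert Content to ShortStory or LongStory.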

Conversion of Go struct fields with tags

I am really stuck here with a seemingly trivial problem in Go:
I have a Golang microservice which outputs data in JSON format. Let's say I have a simple struct with json tags for this result:
type Result struct {
	Name string `json:"name"`
	Age  int    `json:"age"`
}
In the code part where the data is actually pulled out of the database, I have a quite similar struct, like this:
type ResultBackend struct {
	Name string `bson:"fullName"`
	Age  int    `bson:"age"`
}
The struct fields are identical; only the tags differ. I would like to keep things simple and return just one struct from the backend service (ResultBackend), which can then be sent as a JSON response, like this:
func process() Result {
	var result ResultBackend
	// ... do a MongoDB query here and store the results in the result variable ...
	return result
}
This certainly would not work, because we have two different structs here.
Of course one solution would be to embed both tags in one struct like so:
type Result struct {
	Name string `json:"name" bson:"fullName"`
	Age  int    `json:"age" bson:"age"`
}
and then use this struct in the main code and the "process" function.
This works, but it feels like "poisoning" the main code's Result struct with the bson tags. What, for instance, if the backend result is an XML file? I'd have to add xml tags to the struct as well. Or maybe someday tags for some very obscure database adapter.
This doesn't seem to be the cleanest approach in my eyes. I'd rather have a clean Result struct in the main code and simply do a conversion from one struct to the other.
Is there any easy way doing that, or do I really have to copy all the fields of a ResultBackend struct to a new Result struct and return that? Or I am trying to over-simplify my code here? :)
Cheers!
What I'd do is create a separate type for every "serialisation format" if you will. This approach has several advantages:
Your concerns are separated.
JSON un/marshalling doesn't interfere with BSON/XML, so you can add any kind of additional struct fields (like xml.Name for example).
You can actually create several such types for different APIs that need different parameters.
The only disadvantage I see is that it's more code, and even more code to move data between those types. Ultimately it's up to you and your application design. If you're sure that your JSON/BSON will stay the same, you can use one type. Otherwise, I'd recommend specialisation.
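The moving code itself stays trivial; a sketch, with toResult as an illustrative name:

func toResult(rb ResultBackend) Result {
	return Result{
		Name: rb.Name,
		Age:  rb.Age,
	}
}

And since Go 1.8, struct tags are ignored in struct conversions, so when the field names and types match exactly, the plain conversion Result(rb) compiles as well.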
To those who want to add bson tags to a Golang struct, here is a tool which can be helpful. I spent a whole day searching for this, and I want to save others their valuable time.
You can convert the following struct
type Result struct {
	Name string `json:"name"`
	Age  int    `json:"age"`
}
into
type Result struct {
	Name string `json:"name" bson:"fullName"`
	Age  int    `json:"age" bson:"age"`
}
With this tool, you can not only add bson tags but also:
add XML tags,
define validation and scope,
format tag values,
apply transformations,
skip unexported fields.
Go Modify Tags: https://github.com/fatih/gomodifytags
Cheers,

Any down-side always using pointers for struct field types?

Originally I figured I'd only use pointers for optional struct fields which could potentially be nil in the cases the struct was initially built for.
As my code evolved, I was writing different layers upon my models, for XML and JSON (un)marshalling. In these layers, even the fields I thought would always be required (Id, Name, etc.) actually turned out to be optional.
In the end I had put a * in front of all the fields, so int became *int, string became *string, etc.
Now I'm wondering if I would have been better off not generalising my code so much. I could have duplicated the code instead, which I find rather ugly, but perhaps that is more efficient than using pointers for all struct fields?
So my question is whether this is turning into an anti-pattern and just a bad habit, or whether this added flexibility comes at no performance cost.
Eg. can you come up with good arguments for sticking with option A:
type MyStruct struct {
	Id       int
	Name     string
	ParentId *int
	// etc... only pointers where NULL columns in the db might occur
}
over this option B:
type MyStruct struct {
	Id       *int
	Name     *string
	ParentId *int
	// etc... using pointers for all fields
}
Would the best-practice way of modelling your structs be from a purely database/column perspective? Or, for example, if you had:
func (m *MyStruct) UnmarshalXML(d *xml.Decoder, start xml.StartElement) error {
	var v struct {
		XMLName  xml.Name    `xml:"myStruct"`
		Name     string      `xml:"name"`
		Parent   string      `xml:"parent"`
		Children []*MyStruct `xml:"children,omitempty"`
	}
	err := d.DecodeElement(&v, &start)
	if err != nil {
		return err
	}
	m.Id = nil       // adding to the db from xml: there's initially no Id until after the insert
	m.Name = &v.Name // a parent might be referenced by name or alias
	m.ParentId = nil // not by parentId, since it's not created yet, but maybe by nesting elements as in the Children []*MyStruct field above
	// etc...
	return nil
}
This example could be part of a scenario where you want to add elements from XML to the database. Here ids would generally not make sense, so instead we use nesting and references by name or other aliases. An Id for a struct would not be set until we got one back after the INSERT query. Using that ID, we could then traverse down the hierarchy to the child elements, etc.
This would allow us to have just one MyStruct and use, for example, different POST HTTP request handler functions, depending on whether the call came from form input or from XML importing, where a nested hierarchy and different relations might need different handling.
In the end, I guess what I'm asking is:
Would you be better off separating struct models for db, xml and json operations (or whatever scenarios you can think of) than using struct field pointers all the way, so the model can be reused for different, yet related, purposes?
Apart from the possible performance penalty (more pointers means more things for the GC to scan), safety issues (nil pointer dereferences), inconvenience (s.a = 42 versus s.a = new(int); *s.a = 42), and memory cost (a bool is one byte, a *bool is four to eight), there is one thing that really bothers me in the all-pointer approach: it violates the Single Responsibility Principle.
Is the MyStruct you get from XML or DB same as MyStruct? What if the DB schema will change? What if the XML changes format? What if you'll also need to unmarshal it into JSON, but in a slightly different manner? And what if you need to support all that (and in multiple versions!) at the same time?
A lot of pain comes to you when you try to make one thing do many things. Is having one do-it-all type instead of N specialised types really worth it?
