Bypass SQL null value problems in Go

I want to use Go to make an API for an existing database that uses null values extensively. Go will not scan nulls to empty strings (or equivalent), and so I need to implement a workaround.
The workarounds I have discovered have left me unsatisfied. In fact I went looking for a dynamic language because of this problem, but Go has certain attractions and I would like to stick with it if possible. Here are the workarounds that did not satisfy:
1. Don't use nulls in the database. Unsuitable because the database is pre-existing and I am not at liberty to interfere with its structure. The database is more important than my app, not the other way around.
2. In SQL queries, use COALESCE, ISNULL, etc. to convert nulls to empty strings (or equivalent) before the data gets to my app. Unsuitable because there are many fields and many tables. Apart from a couple of obvious ones (primary key, surname), I don't know for sure which fields can be relied upon not to give me a null value, so I would be defensively cluttering my SQL queries everywhere.
3. Use sql.NullString, sql.NullInt64, sql.NullFloat64, etc. to convert nulls to empty strings (or equivalent) as an intermediate step before settling them into their destination type. This suffers from the same problem as above, only I am cluttering my Go code instead of my SQL queries.
4. Use a combination of pointers and []byte to scan each item into a memory location without committing it to a particular type (other than []byte), and then somehow work with the raw data. But to do something meaningful with the data you have to convert it to something more useful, and then you are back to sql.NullString or if x == nil { handle it }, and this again happens on a case-by-case basis for every field I need to work with. So, again, we are looking at cluttered, messy, error-prone code, and I'm repeating myself all the time instead of keeping my code DRY.
5. Look to the Go ORM libraries for help. I did that, but to my surprise none of them tackle this issue.
6. Make my own helper package to convert all null strings to "", null ints to 0, null floats to 0.00, null bools to false, etc., and make it part of the process of scanning in from the SQL driver, resulting in regular, normal strings, ints, floats and bools.
Unfortunately, if option 6 is the solution, I do not have the expertise. I suspect the solution would involve something like: "if the intended type of the item to be scanned into is a string, make it an sql.NullString and extract an empty string from it; but if the item to be scanned into is an int, make it a NullInt64 and get a zero from that; but if... (etc.)". A rough sketch of what I imagine is below.
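To illustrate (only a sketch, assuming Go 1.18+ generics; nullZero is a name I made up, not a type from any library):

import "fmt"

// nullZero wraps a destination pointer for use with rows.Scan; on a
// NULL column it stores the type's zero value instead of returning an error.
type nullZero[T any] struct{ dest *T }

func (n nullZero[T]) Scan(src any) error {
    if src == nil {
        var zero T
        *n.dest = zero // NULL becomes "", 0, 0.0, false, ...
        return nil
    }
    if v, ok := src.(T); ok { // drivers hand back int64, float64, bool, []byte, string
        *n.dest = v
        return nil
    }
    return fmt.Errorf("nullZero: cannot scan %T into %T", src, *n.dest)
}

// usage: rows.Scan(nullZero[string]{&surname}, nullZero[int64]{&age})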
Is there anything I have missed? Thank you.

The use of pointers for the SQL-scanning destination variables enables the data to be scanned in, worked with (subject to checking != nil), and marshalled to JSON, to be sent out from the API, without having to put hundreds of sql.NullString, sql.NullFloat64, etc. everywhere. Nulls are miraculously preserved and sent out through the marshalled JSON (see Fathername at the bottom). At the other end, the client can work with the nulls in JavaScript, which is better equipped to handle them.
func queryToJson(db *sql.DB) []byte {
    rows, err := db.Query(
        "select mothername, fathername, surname from fams"+
            " where surname = ?", "Nullfather",
    )
    if err != nil {
        log.Fatal(err)
    }
    defer rows.Close()

    type record struct {
        Mothername, Fathername, Surname *string // the key: use pointers
    }
    records := []record{}
    for rows.Next() {
        var r record
        // scan into the pointers' addresses; a NULL column leaves the pointer nil
        err := rows.Scan(&r.Mothername, &r.Fathername, &r.Surname)
        if err != nil {
            log.Fatal(err)
        }
        fmt.Println(r)
        records = append(records, r)
    }
    j, err := json.Marshal(records)
    if err != nil {
        log.Fatal(err)
    }
    return j
}
j := queryToJson(db)
fmt.Println(string(j)) // [{"Mothername":"Mary","Fathername":null,"Surname":"Nullfather"}]

Related

Upserting multiple vertices in Gremlin from Go

I've written the following Go code to upsert an array of vertices. First off, the code has no effect: it doesn't error out, it just doesn't do the upserts.
Second, is this the most efficient way to upsert a batch of vertices using Gremlin?
func (n NeptuneGremlinGraph) Put(assetID string, version string, records []les.DeltaEditRecord) error {
    g := gremlin.Traversal_().WithRemote(n.connection)
    for _, r := range records {
        promise := g.V().HasLabel("Entity").Property("asset_id", assetID).Property("version", version).Property("entity_id", r.EntityID).Fold().
            Coalesce(g.V().Unfold(),
                g.AddV("Entity").Property("asset_id", assetID).Property("version", version).Property("entity_id", r.EntityID)).Iterate()
        err := <-promise
        if err != nil {
            return err
        }
    }
    return nil
}
This is using the Apache TinkerPop Go driver, gremlingo.
Your Coalesce looks wrong. Can you please try
Coalesce(AnonT.Unfold(),
    AnonT.AddV("Entity").Property("asset_id", assetID).Property("version", version).Property("entity_id", r.EntityID)).Iterate()
This assumes AnonT was defined as
var AnonT = gremlingo.T__
In your original query, the Coalesce started with g.V().Unfold(), which will always yield results (unless the graph is empty), so the alternate part of the Coalesce will never be executed.
Using the Fold ... Coalesce pattern is a perfectly reasonable way to do a "create if not exist" type of operation. Note that in Apache TinkerPop 3.6.x a new step called MergeV (along with a corresponding MergeE) was added. This will help simplify these types of tasks.
It looks from your code sample that you may be using Amazon Neptune. If so, support for MergeV is not quite there yet in Neptune, so keep using the Coalesce idiom until Neptune adds that support.
UPDATED based on comment discussion
Also, as discussed in the comments, this line
g.V().HasLabel("Entity").Property("asset_id", assetID).Property("version", version).Property("entity_id", r.EntityID).Fold().
should use Has instead of Property
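Putting both fixes together, the corrected upsert traversal might look something like this (a sketch only, not tested against Neptune):

var AnonT = gremlingo.T__

promise := g.V().HasLabel("Entity").
    Has("asset_id", assetID). // Has to match existing vertices, not Property
    Has("version", version).
    Has("entity_id", r.EntityID).
    Fold().
    Coalesce(
        AnonT.Unfold(), // the vertex exists: return it
        AnonT.AddV("Entity"). // otherwise create it
            Property("asset_id", assetID).
            Property("version", version).
            Property("entity_id", r.EntityID)).
    Iterate()
err := <-promise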

Use a single mutex across multiple goroutines

I'm trying to reduce the number of HTTP requests my Discord bot is making.
It's reading from an API.
With the fetched data it updates an internal database and outputs the changes.
Thing is: that database is different for every server the bot is in, and that's where I'm using the goroutines. But some servers need to fetch the same data, and this is where I want to reduce the HTTP requests. Right now I'm making requests regardless of whether I've already fetched a character. I want to create some sort of data structure that can be shared between the goroutines, and search within it before making a request.
I was advised to use a mutex. I'm trying. Original question: Working with unbuffered channels in golang
I made a skeleton of the real code I've tried: https://play.golang.org/p/mt229ns1R8m
In this example master := make([][]map[string]interface{}, 0) is simulating the discord servers.
Chars and Chars2 would be the tracked chars for each individual server.
The char "Test" is mutual to both of them, so it should be fetched from the API only once.
It's outputting this:
[[map[Level:15 Name:Test] map[Level:150 Name:Test2]] [map[Level:1500 Name:Test3] map[Level:15 Name:Test]]]
------
A call would be made
A call would be made
A call would be made
A call would be made
Cache: [map[Level:150 Name:Test2] map[Level:15 Name:Test]]Cache: [map[Level:15 Name:Test] map[Level:1500 Name:Test3]]Done
I was expecting the output to be:
[[map[Level:15 Name:Test] map[Level:150 Name:Test2]] [map[Level:1500 Name:Test3] map[Level:15 Name:Test]]]
------
A call would be made
A call would be made
A call would be made
Cache: [map[Level:150 Name:Test2] map[Level:15 Name:Test] map[Level:1500 Name:Test3]]Done
But a new cache is being generated by every goroutine. How can I fix this?
Thanks.
There are too many unknowns here for me to really write a proper design, but let's make a few notes:
Try not to use interface{} at all, if at all possible. In this case, it seems that it must be possible, though I'm not sure what the actual types will be.
Try to make your data as simple as possible, but no simpler. In this case, that probably means: have one data structure for "thing that talks to a Discord server" and a separate one for "thing that talks to the local database" (is this a caching database? if so, what are the criteria for invalidating a cache entry?). But if one "character" (whatever that is—apparently a string) can have different properties per Discord server, that means that your index into your local database is not just a character, but rather a pair of values: the string value itself plus a Discord-server-identifier.
This might give you a functional interface like this:
var cacheServer *CacheServer

func InitCacheServer() error {
    cacheServer = ... // whatever it takes to initialize the cache server
}
(I've assumed lazy initialization of the cache server. If you can do up-front initialization, you can drop the next test below. Replace ValueType with the type of the result of a cached lookup of a name.)
func (ds DiscordServer) Get(name string) (ValueType, error) {
    if cacheServer == nil {
        if err := InitCacheServer(); err != nil {
            return nil, err
        }
    }
    // Do a cache lookup. Tell the cache server that if there
    // is no entry, it should return a NoEntry error and we will
    // fill the cache ourselves, so it should hold this slot as
    // "will be filled, so wait for it".
    slot, v, err := cacheServer.Lookup(name, ds.identity, IntentToFill)
    if err == NoEntry {
        // We have the slot held. Try to look up the right info
        // directly in the Discord server, then cache it.
        v, err = ds.UncachedGet(name)
        // Tell the cache server that this is the value, or that it
        // should produce this error instead of NoEntry.
        cacheServer.FillSlot(slot, v, err)
    }
    return v, err
}
You might only want to cache some error types, rather than all; that's another one of those design questions that needs an answer that I cannot provide here. There are other ways to do this that don't necessarily need a slot pointer return value, too; I've just chosen this one for this example.
Note that most of the "hard work" is now in the cache server, which definitely requires some fancy footwork. In particular you will want to lock the overall data structure for a little while, use that to find the correct slot, then hold the slot itself so that other users of the slot must wait, while releasing the overall lock so that other users of other entries need not wait. This introduces locking order constraints: be careful to avoid deadlock. One method that should work is:
type CacheServer struct {
    lock sync.Mutex
    data map[string]map[string]*Entry
    // more fields
}

type Entry struct {
    lock        sync.Mutex
    cachedValue ValueType
    cachedError error
}
(You'll need some more types, like Intent—just two enumerated integers for now—below, and probably more fields in the above; this is just a skeleton.)
func (cs *CacheServer) Lookup(name, srv string, flags Intent) (*Entry, ValueType, error) {
    cs.lock.Lock()
    defer cs.lock.Unlock()
    // first, look up the server - if it does not exist, create one
    smap := cs.data[srv]
    if smap == nil {
        smap = make(map[string]*Entry)
        cs.data[srv] = smap
    }
    entry := smap[name]
    if entry == nil {
        // no cached entry - if this is a pure lookup, just error,
        // but if not, make a locked entry
        if flags == IntentToFill {
            // make a new entry and return with it locked
            entry = &Entry{}
            smap[name] = entry
            entry.lock.Lock() // and do not unlock
        }
        return entry, nil, NoEntry
    }
    entry.lock.Lock() // wait for someone to fill it, if needed
    defer entry.lock.Unlock()
    return nil, entry.cachedValue, entry.cachedError
}
You need a routine to fill and release the entry as well, but it's pretty simple. You could, if you choose, make this a method on the Entry type rather than on the CacheServer type, as at least in this particular prototype, there is no need to use the cache server data structures directly. If you start getting fancier with cache invalidation, though, it might be nice to have access to the CacheServer object.
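For concreteness, a minimal sketch of that fill-and-release routine, matching the skeleton above (untested; adjust to your actual types):

// FillSlot stores the value (or error) in a slot that Lookup returned
// locked via IntentToFill, then releases it so waiting readers proceed.
func (cs *CacheServer) FillSlot(e *Entry, v ValueType, err error) {
    e.cachedValue = v
    e.cachedError = err
    e.lock.Unlock() // locked by Lookup; goroutines blocked in Lookup can now read
}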
Note: I've designed this so that you can do a cache lookup without an intent-to-fill, if that's useful. If not, there's no reason to bother with the Intent argument.

Should you use a zero "enum" value to indicate an invalid value

Having used C for decades I got into the habit of using the zero value of an enum as a special undefined/unknown/error value. Over the years I believe this has saved me not hours or even days but months of debugging time since it makes it obvious when a value has not been initialized. (I wouldn't do this for simple enums where there is a sensible default value and no possibility of uninitialized values.)
It seems to me that this practice is even more useful in Go as values are automatically zero-initialised for you. However, I have been told that "idiomatic" Go zero-values should be valid values. I think this "rule" was invented for structs, where it makes a lot of sense (in the absence of constructors) to have a newly created "zeroed" struct ready for use, but there are cases where there is no logical default value (for structs and enums).
If you need it here is an example:
type Base int

const (
    Invalid Base = iota
    A
    C
    T
    G
)
Note that I have searched extensively for this question on SO and was surprised that this specific topic has not been covered. I realise that my question is somewhat subjective and may be flagged but I think it is useful. I am looking for evidence that using zero values to indicate error conditions is acceptable Go practice. Any examples of this use, eg. from the standard Go library, would be appreciated.
A true enum type should only be assigned a value from a list of pre-defined constant values. Go, however, does not enforce this.
Go has const, typically declared with a derived type such as int. There is no compile-time or run-time mechanism to enforce that a value stays strictly within a pre-defined list.
So what does this mean in practice?
Is your enum value mandatory or optional? That is, when deserializing the 'enum' value, is it:
optional - then the zero value signifies the default value
mandatory - then the zero value indicates an initialization error
Depending on your common use-case, choose one of these two options.
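For what it's worth, the standard library itself sometimes reserves the zero value: time.Month deliberately starts at 1, so the zero value of a Month does not name any valid month:

// From the standard library's time package:
type Month int

const (
    January Month = 1 + iota
    February
    March
    // ... and so on through December
)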
EDIT:
Deserializing is not the only concern. One has to be careful when branching on enum values. For example:
type role int

const (
    user role = iota
    helpdesk
    admin
)

func greet(r role) {
    switch r {
    case admin:
        fmt.Println("hi admin")
    case helpdesk:
        fmt.Println("hi helpdesk")
    default:
        fmt.Println("hi user") // right?
    }
}
This works:
var r role
r = admin
greet(r) // hi admin
But what about this?
r = 12
greet(r) // 'hi user' ?!!
So be sure to pedantically validate on valid values only:
func validateRole(r role) (err error) {
    switch r {
    case user, helpdesk, admin: // all valid values
    default:
        err = fmt.Errorf("invalid `role` enum %d", r)
    }
    return
}

Any downside to always using pointers for struct field types?

Originally I figured I'd only use pointers for optional struct fields which could potentially be nil in the cases the struct was initially built for.
As my code evolved, I was writing different layers on top of my models, for XML and JSON (un)marshalling. In these layers, even the fields I thought would always be required (Id, Name, etc.) actually turned out to be optional.
In the end I had put a * in front of all the fields, so int became *int, string became *string, etc.
Now I'm wondering if I would have been better off not generalising my code so much. I could have duplicated the code instead, which I find rather ugly, but perhaps it is more efficient than using pointers for all struct fields?
So my question is whether this is turning into an anti-pattern and just a bad habit, or whether this added flexibility comes at no cost from a performance point of view.
Eg. can you come up with good arguments for sticking with option A:
type MyStruct struct {
    Id       int
    Name     string
    ParentId *int
    // etc.. only pointers where NULL columns in db might occur
}
over this option B:
type MyStruct struct {
    Id       *int
    Name     *string
    ParentId *int
    // etc... using pointers for all fields
}
Would the best practice way of modelling your structs be from a purely database/column perspective, or eg if you had:
func (m *MyStruct) UnmarshalXML(d *xml.Decoder, start xml.StartElement) error {
    var v struct {
        XMLName  xml.Name    `xml:"myStruct"`
        Name     string      `xml:"name"`
        Parent   string      `xml:"parent"`
        Children []*MyStruct `xml:"children,omitempty"`
    }
    err := d.DecodeElement(&v, &start)
    if err != nil {
        return err
    }
    m.Id = nil       // adding to db from xml: there is no Id until after the insert
    m.Name = &v.Name // a parent might be referenced by name or alias
    m.ParentId = nil // the parent is not created yet either, so reference it by nesting (see Children above) rather than by id
    // etc..
    return nil
}
This example could be part of the scenario where you want to add elements from XML to the database. Here ids would generally not make sense, so instead we use nesting and references by name or other aliases. An Id for a struct would not be set until the INSERT query returns it. Then, using that id, we could traverse down the hierarchy to the child elements, etc.
This would allow us to have just one MyStruct, and use e.g. different POST HTTP request handler functions depending on whether the call came from form input or from XML importing, where a nested hierarchy and different relations might need different handling.
In the end I guess what I'm asking is:
Would you be better off separating struct models for db, XML and JSON operations (or whatever scenario you can think of), rather than using struct field pointers all the way so you can reuse the model for different, yet related, purposes?
Apart from possible performance (more pointers = more things for the GC to scan), safety (nil pointer dereference), convenience (s.a = 42 vs s.a = new(int); *s.a = 42), and memory penalties (a bool is one byte, a *bool is four to eight), there is one thing that really bothers me in the all-pointer approach. It violates the Single Responsibility Principle.
Is the MyStruct you get from XML the same as the MyStruct you get from the DB? What if the DB schema changes? What if the XML changes format? What if you also need to unmarshal it into JSON, but in a slightly different manner? And what if you need to support all of that (and in multiple versions!) at the same time?
A lot of pain comes to you when you try to make one thing do many things. Is having one do-it-all type instead of N specialised types really worth it?
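To illustrate, the specialised-types alternative might look something like this (a sketch; the type names are invented):

import "encoding/xml"

// One type per layer instead of a single all-pointer struct;
// each layer declares its own optionality.
type MyStructRow struct { // database shape
    Id       int
    Name     string
    ParentId *int // the only genuinely NULLable column
}

type MyStructXML struct { // import shape: no ids yet, parents referenced by name
    XMLName  xml.Name       `xml:"myStruct"`
    Name     string         `xml:"name"`
    Parent   string         `xml:"parent"`
    Children []*MyStructXML `xml:"children,omitempty"`
}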

Properly distinguish between not set (nil) and blank/empty value

What's the correct way in Go to distinguish between when a value in a struct was never set and when it is just empty? For example, given the following:
type Organisation struct {
    Category string
    Code     string
    Name     string
}
I need to know (for example) if the category was never set, or was saved as blank by the user, should I be doing this:
type Organisation struct {
    Category *string
    Code     *string
    Name     *string
}
I also need to ensure I correctly persist either null or an empty string to the database.
I'm still learning Go, so it is entirely possible my question needs more info.
The zero value for a string is an empty string, and you can't distinguish between the two.
If you are using the database/sql package, and need to distinguish between NULL and empty strings, consider using the sql.NullString type. It is a simple struct that keeps track of the NULL state:
type NullString struct {
    String string
    Valid  bool // Valid is true if String is not NULL
}
You can scan into this type and use it as a query parameter, and the package will handle the NULL state for you.
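A minimal usage sketch, assuming db is an open *sql.DB (the table and column names are invented for illustration):

var category sql.NullString
err := db.QueryRow(
    "select category from organisations where code = ?", code,
).Scan(&category)
if err != nil {
    log.Fatal(err)
}
if category.Valid {
    fmt.Println("set by the user, possibly blank:", category.String)
} else {
    fmt.Println("never set: NULL in the database")
}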
Google's protocol buffers (https://code.google.com/p/goprotobuf/) use pointers to describe optional fields.
The generated objects provide GetFoo methods which take the pain away from testing for nil (a.GetFoo() returns an empty string if a.Foo is nil, otherwise it returns *a.Foo).
It introduces a nuisance when you want to write literal structs (in tests, for example), because &"something" is not valid syntax to generate a pointer to a string, so you need a helper function (see, for example, the source code of the protocol buffer library for proto.String).
// String is a helper routine that allocates a new string value
// to store v and returns a pointer to it.
func String(v string) *string {
    return &v
}
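A literal message then reads like this (assuming a hypothetical generated type Person with an optional Name field):

p := &pb.Person{Name: proto.String("Alice")}
fmt.Println(p.GetName()) // "Alice"; prints "" if p.Name is nil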
Overall, using pointers to represent optional fields is not without drawbacks, but it's certainly a viable design choice.
The standard database/sql package provides a NullString struct (its members are just String string and Valid bool). To take care of some of the repetitive work of persistence, you could look at an object-relational mapper like gorp.
I looked into whether there was some way to distinguish two kinds of empty string just out of curiosity, and couldn't find one. With []byte, reflect.DeepEqual([]byte{}, []byte(nil)) currently returns false (byte slices cannot be compared with ==), but I'm not sure the spec guarantees that will remain true. In any case, it seems like the most practical thing to do is to go with the flow and use NullString.
