Retrieval after serialization to disk using gob - go

I have been learning about databases and wanted to implement one myself, purely for learning purposes and not for production. I have a defined schema:
type Row struct {
    ID       int32
    Username string
    Email    string
}
Currently, I am able to encode structs of this type to a file in an append-only manner.
// Just to show that I encode to a file; some details are omitted.
func NewEncoder(db *DB) *gob.Encoder {
    return gob.NewEncoder(db.File)
}

func SerializeRow(r Row, encoder *gob.Encoder, db *DB) {
    err := encoder.Encode(r)
    if err != nil {
        log.Println("encode error:", err)
    }
}
Now, it's relatively easy to mimic a "select" statement by simply decoding the entire file with the gob decoder:
func DeserializeRow(decoder *gob.Decoder, db *DB) {
    var rows Row
    db.File.Seek(0, 0)
    err := decoder.Decode(&rows)
    for err == nil {
        fmt.Printf("%d %s %s\n", rows.ID, rows.Username, rows.Email)
        err = decoder.Decode(&rows)
    }
    if err != nil && err != io.EOF { // io.EOF just means the whole file was read
        log.Println("decode error:", err)
    }
}
My current issue is, I want to be able to retrieve specific rows based on ID. I know SQLite uses 4 KB paging, in the sense that serialized rows occupy a "page" (i.e. 4 KB) until the page can't hold them anymore, then another is created. How do I mimic such behaviour using gob in the simplest and most idiomatic way?
Seen: I have seen this and this

A Gob stream may contain type definitions and decoding instructions, so you can't seek a Gob stream. You can only read it from the beginning up to the point you find what you need.
A Gob stream is completely unsuitable for a database storage format in which you need to skip elements.
You could create a new encoder and serialize each record separately, in which case you could skip elements (by maintaining a file index storing which record starts at which position), but it would be terribly inefficient and redundant. As described in the linked answer, the speed and storage cost amortizes as you write more values of the same type, and always creating new encoders loses this gain.
A much better approach would be to not use encoding/gob for this, but rather define your own format. To efficiently support searches (select), you have to build some kind of index on the searchable columns / fields, else you still need to perform a full table scan.
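To make that concrete, here is a minimal sketch of one possible "own format" (my illustration, not part of the answer above): fixed-size binary records written with encoding/binary, plus an in-memory index mapping ID to file offset, so a select by ID becomes a single ReadAt. The 32-byte username and 64-byte email widths are arbitrary assumptions.
package rowstore

import (
    "bytes"
    "encoding/binary"
    "io"
    "os"
)

const (
    usernameLen = 32
    emailLen    = 64
    rowSize     = 4 + usernameLen + emailLen // int32 ID + two fixed-size string fields
)

type Row struct {
    ID       int32
    Username string
    Email    string
}

// AppendRow writes one fixed-size record at the end of the file and records
// its offset in the in-memory index.
func AppendRow(f *os.File, index map[int32]int64, r Row) error {
    off, err := f.Seek(0, io.SeekEnd)
    if err != nil {
        return err
    }
    buf := make([]byte, rowSize)
    binary.LittleEndian.PutUint32(buf[0:4], uint32(r.ID))
    copy(buf[4:4+usernameLen], r.Username)
    copy(buf[4+usernameLen:], r.Email)
    if _, err := f.Write(buf); err != nil {
        return err
    }
    index[r.ID] = off
    return nil
}

// RowByID seeks straight to the record for id using the index, with no full
// table scan and no re-decoding of earlier records.
func RowByID(f *os.File, index map[int32]int64, id int32) (Row, bool, error) {
    off, ok := index[id]
    if !ok {
        return Row{}, false, nil
    }
    buf := make([]byte, rowSize)
    if _, err := f.ReadAt(buf, off); err != nil {
        return Row{}, false, err
    }
    return Row{
        ID:       int32(binary.LittleEndian.Uint32(buf[0:4])),
        Username: string(bytes.TrimRight(buf[4:4+usernameLen], "\x00")),
        Email:    string(bytes.TrimRight(buf[4+usernameLen:], "\x00")),
    }, true, nil
}
Truncating strings to fixed widths is of course a real limitation; variable-length records work too, as long as the index stores both offset and length for each record.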

Related

2 parameters invoke lambda aws from golang

I want to send the two parameters a Lambda needs in order to work: the value I want to search for, and, as a second parameter, the field in which to find that value.
So far I've had no problem invoking other Lambdas that only need one parameter, with code like this:
func (s *resourceService) GetProject(ctx context.Context, name string) projectStruct {
    payload, err := json.Marshal(name)
    if err != nil {
        panic(err.Error())
    }
    util.Logger.Debugf("Payload [%v]", payload)
    invokeOutput, err := s.lambdaSvc.Invoke(ctx, &lambda.InvokeInput{
        FunctionName:   &s.getProject,
        InvocationType: "RequestResponse",
        Payload:        payload,
    })
    if err != nil {
        panic(err.Error())
    }
    var project projectStruct
    err = json.Unmarshal(invokeOutput.Payload, &project)
    if err != nil {
        panic(err.Error())
    }
    util.Logger.Debugf("Invocation output [%v]", invokeOutput)
    return project
}
With two parameters, however, I've had a lot of problems and tried a LOT of different approaches: adding another Payload value, building a string with the two values and marshaling it, marshaling both parameters and adding them as the payload, even appending both marshaled byte arrays. But I've been unable to send two parameters as the payload.
Do you know the right way to do this? Please help.
Lambda functions only take one Payload. In V1 of the AWS SDK, InvokeInput takes one []byte parameter expressing JSON, as you already know.
You can structure your one JSON payload to carry a list. Looking at your example, Payload could look something like
["name","name"]
You could change your signature like so:
func (s *resourceService) GetProject(ctx context.Context, names []string) projectStruct
json.Marshal can handle marshaling a slice just as well as the elements within the slice, so the remaining code doesn't need to change.
Of course the receiving function must agree about the schema of the data it's being passed. If you wish to change from a string to a list of strings, that will be a breaking change. For that reason, JSON schemas typically use named values instead of scalars.
[{ "Name": "Joan"},{"Name":"Susan"}]
You can add Age and Address without breaking the receiving function (though of course, it will ignore the new fields until you program it to use them).
Take time to get to know JSON; it's a simple and expressive encoding standard that is reliably supported everywhere. JSON is a natural choice for encoding structured data in Go because it maps well onto Go's structs, maps, and slices.
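As a hedged sketch of the named-values approach: marshal a small struct carrying both parameters into the single Payload. The searchRequest type, its field names, and the GetByField method are illustrative assumptions of mine; the client, FunctionName field, and imports are assumed to be the same ones already used in the GetProject code from the question.
// searchRequest carries both parameters in one JSON document.
// The type and its field names are illustrative, not part of any AWS API.
type searchRequest struct {
    SearchValue string `json:"searchValue"`
    SearchField string `json:"searchField"`
}

func (s *resourceService) GetByField(ctx context.Context, value, field string) ([]byte, error) {
    payload, err := json.Marshal(searchRequest{SearchValue: value, SearchField: field})
    if err != nil {
        return nil, err
    }
    out, err := s.lambdaSvc.Invoke(ctx, &lambda.InvokeInput{
        FunctionName:   &s.getProject, // stand-in for whichever function name applies
        InvocationType: "RequestResponse",
        Payload:        payload,
    })
    if err != nil {
        return nil, err
    }
    return out.Payload, nil
}
The receiving Lambda then unmarshals the same two-field document (for example into an equivalent struct) and uses searchField to decide where to look for searchValue.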

How to output json format errors in Golang rest, specifically referencing the bad fields

I have the following requirement: return errors from a REST API in the following format:
Error format
422
{
    "name-of-field": [
        "can't be blank",
        "is too silly"
    ]
}
My code looks like this:
var PostFeedback = func(w http.ResponseWriter, r *http.Request) {
    params := mux.Vars(r)
    surveyId := params["id"]
    feedback := &models.Feedback{}
    err := json.NewDecoder(r.Body).Decode(feedback)
    if err != nil {
        jsonError := fmt.Sprintf(`{
            "%s": [
                "%s"
            ]
        }`, "errors", err)
        log.Printf("invalid input format, %v", jsonError)
        resp := map[string]interface{}{"error": jsonError}
        u.Respond(w, resp)
        return
    }
Questions:
How do I get the names of the offending fields?
How do I satisfy the requirement best?
The encoding/json package doesn't provide validation for "blank" or "silly" values. It will return an error only if the data in the body is not valid JSON, or if the field types in the JSON do not, according to the package's spec, match the field types of the structure into which you're trying to decode it.
The first type of error is json.SyntaxError. If you get this, it is not always possible to satisfy your requirements, since there may be no actual fields you could use in your response; or, if there are JSON fields, they and their values may be perfectly valid JSON while the cause of the error lies elsewhere (see example).
In cases where the data holds actual JSON fields but has, for example, non-JSON values, you could use the Offset field of the SyntaxError type to find the closest preceding field in the data stream. Using strings.LastIndex you can implement a naive solution to look backwards for the field.
data := []byte(`{"foobar": i'm not json}`)
err := json.Unmarshal(data, &T{})
se, ok := err.(*json.SyntaxError)
if !ok {
    panic(err)
}

field := string(data[:se.Offset])
if i := strings.LastIndex(field, `":`); i >= 0 {
    field = field[:i]
    if j := strings.LastIndex(field, `"`); j >= 0 {
        field = field[j+1:]
    }
}
fmt.Println(field) // outputs foobar
Playground link
NOTE: As you can see, to be able to look for the field you need access to the data, but when you pass the request's body directly to json.NewDecoder, without first storing its contents somewhere, you lose access to that data once the decoder's Decode method is done. This is because the body is a stream of bytes wrapped in an io.ReadCloser that does not support "rewinding", i.e. you cannot re-read bytes the decoder has already read. To avoid this you can use ioutil.ReadAll to read the full contents of the body and then json.Unmarshal to do the decoding.
The second type of error is json.UnmarshalTypeError. If you look at the documentation of the error type and its fields, you'll see that all you need to do is type-assert the returned value and you're done. Example
Validation against "blank" and "silly" values would be done after the JSON has been successfully decoded into your structure. How you do that is up to you. For example you could use a 3rd party package that's designed for validating structs, or you can implement an in-house solution yourself, etc. I actually don't have an opinion on which one of them is the "best" so I can't help you with that.
What I can say is that the most basic approach would be to simply look at each field of the structure and check if its value is valid or not according to the requirements for that field.
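As a concrete illustration of the UnmarshalTypeError route combined with the required response shape, here is a minimal sketch; the Feedback struct is a stand-in for models.Feedback from the question, and the error messages and field names are placeholders of mine, not prescribed behaviour.
package feedback

import (
    "encoding/json"
    "errors"
    "net/http"
)

// Feedback is a stand-in for models.Feedback from the question.
type Feedback struct {
    Comment string `json:"comment"`
    Rating  int    `json:"rating"`
}

func PostFeedback(w http.ResponseWriter, r *http.Request) {
    var fb Feedback
    if err := json.NewDecoder(r.Body).Decode(&fb); err != nil {
        errs := map[string][]string{}
        var typeErr *json.UnmarshalTypeError
        if errors.As(err, &typeErr) {
            // UnmarshalTypeError.Field names the offending JSON field.
            errs[typeErr.Field] = append(errs[typeErr.Field], "is of the wrong type")
        } else {
            errs["body"] = append(errs["body"], "is not valid JSON")
        }
        w.Header().Set("Content-Type", "application/json")
        w.WriteHeader(http.StatusUnprocessableEntity) // 422
        json.NewEncoder(w).Encode(errs)
        return
    }
    // Validation of decoded values ("blank", "silly") would go here and
    // append to the same map before responding.
}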

Reading from a file from bufio with a semi complex sequencing through file

So there may be questions like this already, but it's not a super easy thing to google. Basically I have a file that's a set of protobufs, encoded and sequenced as they normally are per the protobuf spec.
So think of the byte values being chunked something like this throughout the file:
[EncodeVarint(size of protobuf struct)] [protobuf struct bytes]
So you have a few bytes, read one at a time, that tell you how large a jump to read to get our protobuf structure.
My implementation using the os ReadAt method on a file currently looks something like this.
// getting the next value in a file context feature
func (geobuf *Geobuf_Reader) Next() bool {
    if geobuf.EndPos <= geobuf.Pos {
        return false
    } else {
        startpos := int64(geobuf.Pos)
        for int(geobuf.Get_Byte(geobuf.Pos)) > 127 {
            geobuf.Pos += 1
        }
        geobuf.Pos += 1
        sizebytes := make([]byte, geobuf.Pos-int(startpos))
        geobuf.File.ReadAt(sizebytes, startpos)
        size, _ := DecodeVarint(sizebytes)
        geobuf.Feat_Pos = [2]int{int(size), geobuf.Pos}
        geobuf.Pos = geobuf.Pos + int(size)
        return true
    }
    return false
}

// reads a geobuf feature as geojson
func (geobuf *Geobuf_Reader) Feature() *geojson.Feature {
    // getting raw bytes
    a := make([]byte, geobuf.Feat_Pos[0])
    geobuf.File.ReadAt(a, int64(geobuf.Feat_Pos[1]))
    return Read_Feature(a)
}
How can I use something like bufio or another chunked-reading mechanism to speed up so many file ReadAts? Most bufio examples I've seen rely on a specific delimiter. Thanks in advance; hopefully this wasn't a horrible question.
Package bufio
import "bufio"
type SplitFunc
SplitFunc is the signature of the split function used to tokenize the
input. The arguments are an initial substring of the remaining
unprocessed data and a flag, atEOF, that reports whether the Reader
has no more data to give. The return values are the number of bytes to
advance the input and the next token to return to the user, plus an
error, if any. If the data does not yet hold a complete token, for
instance if it has no newline while scanning lines, SplitFunc can
return (0, nil, nil) to signal the Scanner to read more data into the
slice and try again with a longer slice starting at the same point in
the input.
If the returned error is non-nil, scanning stops and the error is
returned to the client.
The function is never called with an empty data slice unless atEOF is
true. If atEOF is true, however, data may be non-empty and, as always,
holds unprocessed text.
type SplitFunc func(data []byte, atEOF bool) (advance int, token []byte, err error)
Use bufio.Scanner and write a custom protobuf struct SplitFunc.
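A hedged sketch of that suggestion, assuming the layout from the question ([EncodeVarint(size)][protobuf bytes]) and using encoding/binary's Uvarint in place of the question's DecodeVarint; the file name and buffer limits are illustrative.
package main

import (
    "bufio"
    "encoding/binary"
    "fmt"
    "os"
)

// splitProtobuf tokenizes a stream of records laid out as
// [uvarint length][length bytes of protobuf message].
func splitProtobuf(data []byte, atEOF bool) (advance int, token []byte, err error) {
    if atEOF && len(data) == 0 {
        return 0, nil, nil
    }
    size, n := binary.Uvarint(data)
    if n < 0 {
        return 0, nil, fmt.Errorf("malformed varint length prefix")
    }
    if n == 0 {
        // Not enough bytes yet to decode the length prefix.
        if atEOF {
            return 0, nil, fmt.Errorf("truncated varint prefix")
        }
        return 0, nil, nil
    }
    if len(data) < n+int(size) {
        if atEOF {
            return 0, nil, fmt.Errorf("truncated record")
        }
        return 0, nil, nil // need more data
    }
    // Token is the protobuf message body, without the length prefix.
    return n + int(size), data[n : n+int(size)], nil
}

func main() {
    f, err := os.Open("features.geobuf") // file name is illustrative
    if err != nil {
        panic(err)
    }
    defer f.Close()

    sc := bufio.NewScanner(f)
    sc.Buffer(make([]byte, 0, 1024*1024), 16*1024*1024) // allow large records
    sc.Split(splitProtobuf)
    for sc.Scan() {
        raw := sc.Bytes() // one encoded protobuf message, e.g. input for Read_Feature
        _ = raw
    }
    if err := sc.Err(); err != nil {
        panic(err)
    }
}
Each sc.Bytes() slice is only valid until the next call to Scan, so copy it (or decode it immediately) before moving on.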

Is it possible to dump golang db.Query() output to a string?

I have a small Heroku app in which I print out the name and age from each row after query execution.
I want to avoid looping over rows.Next() and Scan(), and just want to show what the database returned after query execution, which may be some data or an error.
Can we directly dump the data to a string for printing?
rows, err := db.Query("SELECT name FROM users WHERE age = $1", age)
if err != nil {
    log.Fatal(err)
}
for rows.Next() {
    var name string
    if err := rows.Scan(&name); err != nil {
        log.Fatal(err)
    }
    fmt.Printf("%s is %d\n", name, age)
}
if err := rows.Err(); err != nil {
    log.Fatal(err)
}
Pretty much: No.
The Query method is going to return a pointer to a Rows struct:
func (db *DB) Query(query string, args ...interface{}) (*Rows, error)
If you print that (fmt.Printf("%#v\n", rows)) you'll see something such as:
&sql.Rows{dc:(*sql.driverConn)(0xc8201225a0), releaseConn:(func(error))(0x4802c0), rowsi:(*pq.rows)(0xc820166700), closed:false, lastcols:[]driver.Value(nil), lasterr:error(nil), closeStmt:driver.Stmt(nil)}
...probably not what you want.
Those correspond to the Rows struct from the sql package (you'll notice the fields are not exported):
type Rows struct {
    dc          *driverConn // owned; must call releaseConn when closed to release
    releaseConn func(error)
    rowsi       driver.Rows
    closed      bool
    lastcols    []driver.Value
    lasterr     error       // non-nil only if closed is true
    closeStmt   driver.Stmt // if non-nil, statement to Close on close
}
You'll see []driver.Value (an interface from the driver package), that looks like where we can expect to find some useful, maybe even human readable data. But when directly printed it doesn't appear useful, it's even empty... So you have to somehow get at the underlying information. The sql package gives us the Next method to start with:
Next prepares the next result row for reading with the Scan method.
It returns true on success, or false if there is no next
result row or an error happened while preparing it. Err
should be consulted to distinguish between the two cases.
Every call to Scan, even the first one, must be preceded by a call to Next.
Next is going to make a []driver.Value the same size as the number of columns I have, which is accessible (within the sql package) through driver.Rows (the rowsi field) and populate it with values from the query.
After calling rows.Next(), if you did the same fmt.Printf("%#v\n", rows) you should now see that []driver.Value is no longer empty, but it's still not going to be anything you can read; more likely something resembling: []driver.Value{[]uint8{0x47, 0x65...
And since the field isn't exported you can't even try and convert it to something more meaningful. But the sql package gives us a means to do something with the data, which is Scan.
The Scan method is pretty concise, with lengthy comments that I won't paste here, but the really important bit is that it ranges over the columns in the current row you get from the Next method and calls convertAssign(dest[i], sv), which you can see here:
https://golang.org/src/database/sql/convert.go
It's pretty long but actually relatively simple, it essentially switches on the type of the source and destination and converts where it can, and copies from source to destination; the function comments tell us:
convertAssign copies to dest the value in src, converting it if possible. An error is returned if the copy would result in loss of information. dest should be a pointer type.
So now you have a method (Scan) which you can call directly and which hands you back converted values. Your code sample above is fine (except maybe the call to Fatal() on a Scan error).
It's important to realize that the sql package has to work with a specific driver, which is in turn implemented for specific database software, so there is quite some work going on behind the scenes.
I think your best bet if you want to hide/generalize the whole Query() ---> Next() ---> Scan() idiom is to drop it into another function which does it behind the scenes... write a package in which you abstract away that higher level implementation, as the sql package abstracts away some of the driver-specific details, the converting and copying, populating the Rows, etc.
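To illustrate that "drop it into another function" idea, here is a hedged sketch of such a helper (DumpQuery is my own name, not a standard API): it runs a query and renders every row into one tab-separated string by scanning each column into sql.RawBytes, using only Columns, Next, Scan and Err.
package querydump

import (
    "database/sql"
    "strings"
)

// DumpQuery runs the query and returns all rows as one tab-separated string,
// with a header line naming the columns.
func DumpQuery(db *sql.DB, query string, args ...interface{}) (string, error) {
    rows, err := db.Query(query, args...)
    if err != nil {
        return "", err
    }
    defer rows.Close()

    cols, err := rows.Columns()
    if err != nil {
        return "", err
    }

    var b strings.Builder
    b.WriteString(strings.Join(cols, "\t") + "\n")

    // Scan wants one destination pointer per column; sql.RawBytes lets the
    // driver hand back the raw value without us naming a concrete Go type.
    vals := make([]sql.RawBytes, len(cols))
    ptrs := make([]interface{}, len(cols))
    for i := range vals {
        ptrs[i] = &vals[i]
    }

    for rows.Next() {
        if err := rows.Scan(ptrs...); err != nil {
            return "", err
        }
        fields := make([]string, len(vals))
        for i, v := range vals {
            if v == nil {
                fields[i] = "NULL"
            } else {
                fields[i] = string(v) // copy: RawBytes is only valid until the next Next
            }
        }
        b.WriteString(strings.Join(fields, "\t") + "\n")
    }
    return b.String(), rows.Err()
}
Usage would be something like s, err := DumpQuery(db, "SELECT name FROM users WHERE age = $1", age), after which s can be printed directly.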

Most efficient way to read Zlib compressed file in Golang?

I'm reading in and at the same time parsing (decoding) a file in a custom format, which is compressed with zlib. My question is how can I efficiently uncompress and then parse the uncompressed content without growing the slice? I would like to parse it whilst reading it into a reusable buffer.
This is for a speed-sensitive application and so I'd like to read it in as efficiently as possible. Normally I would just ioutil.ReadAll and then loop again through the data to parse it. This time I'd like to parse it as it's read, without having to grow the buffer into which it is read, for maximum efficiency.
Basically I'm thinking that if I can find a buffer of the perfect size then I can read into this, parse it, then write over the buffer again, parse that, and so on. The issue here is that the zlib reader appears to read an arbitrary number of bytes each time Read(b) is called; it does not fill the slice. Because of this I don't know what the perfect buffer size would be. I'm concerned that it might break up some of the data that I wrote into two chunks, making it difficult to parse, because a single uint64, say, could be split across two reads and therefore not occur in the same buffer read. Or perhaps that can never happen, and it's always read out in chunks of the same size as were originally written?
What is the optimal buffer size, or is there a way to calculate this?
If I have written data into the zlib writer with f.Write(b []byte) is it possible that this same data could be split into two reads when reading back the compressed data (meaning I will have to have a history during parsing), or will it always come back in the same read?
You can wrap your zlib reader in a bufio reader, then implement a specialized reader on top that will rebuild your chunks of data by reading from the bufio reader until a full chunk is read. Be aware that bufio.Read calls Read at most once on the underlying Reader, so you need to call ReadByte in a loop. bufio will however take care of the unpredictable size of data returned by the zlib reader for you.
If you do not want to implement a specialized reader, you can just go with a bufio reader and read as many bytes as needed with ReadByte() to fill a given data type. The optimal buffer size is at least the size of your largest data structure, up to whatever you can shove into memory.
If you read directly from the zlib reader, there is no guarantee that your data won't be split between two reads.
Another, maybe cleaner, solution is to implement a writer for your data, then use io.Copy(your_writer, zlib_reader).
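As a concrete sketch of the bufio suggestion, assuming (purely for illustration) that the uncompressed stream is a sequence of little-endian uint64 values: wrap the zlib reader in a bufio.Reader and read each value with io.ReadFull, a standard alternative to the ReadByte loop mentioned above, which keeps reading until the buffer is full so a value can never be split across short reads.
package main

import (
    "bufio"
    "compress/zlib"
    "encoding/binary"
    "io"
    "os"
)

func main() {
    fi, err := os.Open("data.zlib") // file name is illustrative
    if err != nil {
        panic(err)
    }
    defer fi.Close()

    zr, err := zlib.NewReader(fi)
    if err != nil {
        panic(err)
    }
    defer zr.Close()

    br := bufio.NewReader(zr)
    buf := make([]byte, 8) // reused for every uint64

    for {
        // io.ReadFull keeps reading until buf is full, regardless of how many
        // bytes the zlib reader returns per call.
        if _, err := io.ReadFull(br, buf); err == io.EOF {
            break // clean end of stream
        } else if err != nil {
            panic(err) // includes io.ErrUnexpectedEOF for a truncated value
        }
        v := binary.LittleEndian.Uint64(buf)
        _ = v // parse/consume the value here
    }
}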
OK, so I figured this out in the end using my own implementation of a reader.
Basically the struct looks like this:
type reader struct {
    at  int
    n   int
    f   io.ReadCloser
    buf []byte
}
This can be attached to the zlib reader:
// Open file for reading
fi, err := os.Open(filename)
if err != nil {
    return nil, err
}
defer fi.Close()

// Attach zlib reader
r := new(reader)
r.buf = make([]byte, 2048)
r.f, err = zlib.NewReader(fi)
if err != nil {
    return nil, err
}
defer r.f.Close()
Then x number of bytes can be read straight out of the zlib reader using a function like this:
mydata := r.readx(10)
func (r *reader) readx(x int) []byte {
    for r.n < x {
        copy(r.buf, r.buf[r.at:r.at+r.n])
        r.at = 0
        m, err := r.f.Read(r.buf[r.n:])
        if err != nil {
            panic(err)
        }
        r.n += m
    }
    tmp := make([]byte, x)
    copy(tmp, r.buf[r.at:r.at+x]) // must be copied to avoid memory leak
    r.at += x
    r.n -= x
    return tmp
}
Note that I have no need to check for EOF because my parser should stop itself at the right place.
