protobuf unmarshal unknown message - go

I have a listener which receives protobuf messages. However it doesn't know which type of message comes in when. So I tried to unmarshal into an interface{} so I can later type cast:
var data interface{}
err := proto.Unmarshal(message, data)
if err != nil {
log.Fatal("unmarshaling error: ", err)
}
log.Printf("%v\n", data)
However this code doesn't compile:
cannot use data (type interface {}) as type proto.Message in argument to proto.Unmarshal:
interface {} does not implement proto.Message (missing ProtoMessage method)
How can I unmarshal and later type cast an "unknown" protobuf message in go?

First, two words about the OP's question, as presented by them:
proto.Unmarshal can't unmarshal into an interface{}. Its signature makes this explicit: you must pass a proto.Message, which is an interface implemented by the concrete generated protobuf types.
When handling a raw protobuf []byte payload that didn't come in an Any, ideally you have at least something (a string, a number, etc.) arriving together with the byte slice that you can use to map to the concrete protobuf message.
You can then switch on that and instantiate the appropriate protobuf concrete type, and only then pass that argument to Unmarshal:
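var message proto.Message
switch atLeastSomething {
case "foo":
	message = &mypb.Foo{}
case "bar":
	message = &mypb.Bar{}
default:
	// without this branch, message would stay nil for an unknown discriminator
	log.Fatalf("unknown message type %q", atLeastSomething)
}
if err := proto.Unmarshal(data, message); err != nil {
	log.Fatal("unmarshaling error: ", err)
}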
Now, what if the byte payload is truly unknown?
As a foreword, consider that this should seldom happen in practice. The schema used to generate the protobuffer types in your language of choice represents a contract, and by accepting protobuffer payloads you are, for some definitions of it, fulfilling that contract.
Anyway, if for some reason you must deal with a completely unknown, mysterious, protobuffer payload in wire format, you can extract some information from it with the protowire package.
Be aware that the wire representation of a protobuf message is ambiguous. A big source of uncertainty is the "length-delimited" type (2) being used for strings, bytes, repeated fields and... sub-messages (reference).
You can retrieve the payload content, but you are bound to have weak semantics.
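To make the ambiguity concrete, here is a minimal sketch that uses only the standard library. It relies on the documented wire-format rule that every field starts with a varint key equal to (field_number << 3) | wire_type. The helper name decodeTag is made up for this illustration; in real code protowire.ConsumeTag does this job.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// decodeTag reads the varint key at the start of b and splits it into the
// field number, the wire type, and the number of bytes the key occupied.
// Hypothetical helper, for illustration only.
func decodeTag(b []byte) (uint64, uint64, int) {
	key, n := binary.Uvarint(b)
	if n <= 0 {
		return 0, 0, 0
	}
	return key >> 3, key & 0x7, n
}

func main() {
	// Wire bytes of field 1 holding the one-byte payload "A":
	// key 0x0a, length 0x01, payload 0x41.
	payload := []byte{0x0a, 0x01, 0x41}
	num, wireType, n := decodeTag(payload)
	fmt.Println(num, wireType, n) // 1 2 1
}
```

The wire type comes out as 2 whether field 1 was declared as a string, as bytes, or as a sub-message, which is exactly the information a schema-less parser is missing.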
The code
With that said, this is what a parser for unknown proto messages may look like. The idea is to leverage protowire.ConsumeField to read through the original byte slice.
The data model could be like this:
type Field struct {
	Tag Tag
	Val Val
}

type Tag struct {
	Num  int32
	Type protowire.Type
}

type Val struct {
	Payload interface{}
	Length  int
}
And the parser:
func parseUnknown(b []byte) []Field {
	fields := make([]Field, 0)
	for len(b) > 0 {
		// Consume one whole field (tag + value) to learn its number,
		// wire type and total length.
		n, t, fieldlen := protowire.ConsumeField(b)
		if fieldlen < 1 {
			return nil
		}
		field := Field{
			Tag: Tag{Num: int32(n), Type: t},
		}
		// Measure the tag so the value can be sliced out of the field.
		_, _, taglen := protowire.ConsumeTag(b[:fieldlen])
		if taglen < 1 {
			return nil
		}
		var (
			v    interface{}
			vlen int
		)
		switch t {
		case protowire.VarintType:
			v, vlen = protowire.ConsumeVarint(b[taglen:fieldlen])
		case protowire.Fixed64Type:
			v, vlen = protowire.ConsumeFixed64(b[taglen:fieldlen])
		case protowire.BytesType:
			v, vlen = protowire.ConsumeBytes(b[taglen:fieldlen])
			// Length-delimited data may be a sub-message: try to parse
			// it and keep the raw bytes if that fails.
			sub := parseUnknown(v.([]byte))
			if sub != nil {
				v = sub
			}
		case protowire.StartGroupType:
			v, vlen = protowire.ConsumeGroup(n, b[taglen:fieldlen])
			sub := parseUnknown(v.([]byte))
			if sub != nil {
				v = sub
			}
		case protowire.Fixed32Type:
			v, vlen = protowire.ConsumeFixed32(b[taglen:fieldlen])
		}
		// Unknown wire types and malformed values leave vlen below 1.
		if vlen < 1 {
			return nil
		}
		field.Val = Val{Payload: v, Length: vlen - taglen}
		// fmt.Printf("%#v\n", field)
		fields = append(fields, field)
		// Advance to the next field.
		b = b[fieldlen:]
	}
	return fields
}
Sample input and output
Given a proto schema like:
message Foo {
string a = 1;
string b = 2;
Bar bar = 3;
}
message Bar {
string c = 1;
}
initialized in Go as:
&test.Foo{A: "A", B: "B", Bar: &test.Bar{C: "C"}}
And by adding a fmt.Printf("%#v\n", field) statement at the end of the loop in the above code, it will output the following:
main.Field{Tag:main.Tag{Num:1, Type:2}, Val:main.Val{Payload:[]uint8{0x41}, Length:1}}
main.Field{Tag:main.Tag{Num:2, Type:2}, Val:main.Val{Payload:[]uint8{0x42}, Length:1}}
main.Field{Tag:main.Tag{Num:1, Type:2}, Val:main.Val{Payload:[]uint8{0x43}, Length:1}}
main.Field{Tag:main.Tag{Num:3, Type:2}, Val:main.Val{Payload:[]main.Field{main.Field{Tag:main.Tag{Num:1, Type:2}, Val:main.Val{Payload:[]uint8{0x43}, Length:1}}}, Length:3}}
About sub-messages
As you can see from the above, the idea for dealing with a protowire.BytesType that may or may not be a message field is to attempt to parse it recursively. If parsing succeeds, we keep the resulting message and store it in the field value; if it fails, we store the bytes as-is, which may then be a proto string or bytes. BTW, if I'm reading correctly, this seems to be what Marc Gravell does in the Protogen code.
About repeated fields
The code above doesn't deal with repeated fields explicitly, but after the parsing is done, repeated fields will have the same value for Field.Tag.Num. From that, packing the fields into a slice/array should be trivial.
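A rough sketch of that packing step follows. The Field type here is a simplified stand-in for the data model above (Type is a plain int so that the example needs no protobuf import), and groupByNum is a name made up for this illustration.

```go
package main

import "fmt"

// Field is a simplified stand-in for the parser's Field type.
type Field struct {
	Num     int32
	Type    int
	Payload interface{}
}

// groupByNum collects the payloads of fields that share a field number,
// preserving their original order within each group.
func groupByNum(fields []Field) map[int32][]interface{} {
	groups := make(map[int32][]interface{})
	for _, f := range fields {
		groups[f.Num] = append(groups[f.Num], f.Payload)
	}
	return groups
}

func main() {
	fields := []Field{
		{Num: 1, Type: 2, Payload: []byte("a")},
		{Num: 2, Type: 0, Payload: uint64(7)},
		{Num: 1, Type: 2, Payload: []byte("b")},
	}
	groups := groupByNum(fields)
	fmt.Println(len(groups[1]), len(groups[2])) // 2 1
}
```

Without the schema, a group of size one may still be a repeated field that happened to carry a single element. Packed repeated scalars are a separate case: they arrive as one length-delimited field and are not covered by this grouping.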
About maps
The code above also doesn't deal with proto maps. I suspect that maps are semantically equivalent to a repeated k/v pair, e.g.:
message Pair {
string key = 1; // or whatever key type
string val = 2; // or whatever val type
}
If my assumption is correct, then maps can be parsed with the given code as sub-messages.
About oneofs
I haven't yet tested this, but I expect that information about the union type is completely lost. The byte payload will contain only the value that was actually set.
But what about Any?
The Any proto type doesn't fit in the picture. Contrary to what it may look like, Any is not analogous to, say, map[string]interface{} for JSON objects. And the reason is simple: Any is a proto message with a very well defined structure, namely (in Go):
type Any struct {
// unexported fields
TypeUrl string // struct tags omitted
Value []byte // struct tags omitted
}
So it is more similar to the implementation of a Go interface{} in that it holds some actual data and that data's type information.
It can itself hold arbitrary proto payloads (with their type information!), but it cannot be used to decode unknown messages, because Any has exactly those two fields: a type URL and a byte payload.
To wrap up, this answer doesn't provide a full-blown production-grade solution, but it shows how to decode arbitrary payloads while preserving as much original semantics as possible. Hopefully it will point you in the right direction.

As you've seen, and as the commenters have pointed out, you can't use proto.Unmarshal with an interface{}, because the function expects an argument of type Message, an interface (MessageV1) that interface{} does not satisfy.
Protobuf messages are typed and correspond to method invocations (your "comes in"), and an implementation accepts specific protobuf types rather than generic ones:
func (s *server) M(ctx context.Context, _ *pb.Foo) (*pb.Bar, error)
The solution is to wrap your generic types as Any within a specific type, perhaps Envelope:
message Envelope {
google.protobuf.Any content = 1;
...
}
The content is then transmitted as a []byte (see Golang anypb.Any), and the implementation (anypb) includes methods to pack and unpack these.
The 'trick' with Any is that messages include a TypeURL that uniquely identifies the message type, so that the receiver knows how to, e.g., Unmarshal it.

Related

why unmarshal make the type of object changed in golang

I want to write a MockData function that accepts several types of parameter and returns the corresponding object based on its JSON data. The code is below:
func MockData(jsonPath string, v interface{}) (interface{}, error) {
	var ret interface{}
	var err error
	data, _ := ioutil.ReadFile(jsonPath)
	switch v.(type) {
	case Req:
		ret = Req{}
		fmt.Printf("\n===before Unmarshal==%T===\n", ret)
		err = json.Unmarshal(data, &ret)
		if err != nil {...}
		fmt.Printf("======after unmarshal===%T\n", ret)
	case ...
	default:
		fmt.Printf("error===not match")
	}
	return ret, err
}
However, it panics when I use it. The calling code is below:
func main() {
	reqJsonPath := "/xx/yy/req.json"
	obj, err := test.MockData(reqJsonPath, Req{})
	if err != nil {...}
	require := obj.(Req) // panic: can't convert []interface{} to Req
}
and the output of MockData is:
===before Unmarshal==Req===
======after unmarshal===[]interface{}
The type of the object changed after unmarshalling. Stranger still, if I replace:
ret = Req{}
with
ret = &Req{}
the output becomes:
===before Unmarshal==*Req===
======after unmarshal===*Req
To make the problem easier to reproduce, here is the Req type:
type Req []*Ele
type Ele struct {
ID int
Level int
}
summary:
Can I achieve expected function which produces different types of objects based on its json and type?
Why does the type of the object change after unmarshalling, and why does it not change after I add &?
Can I achieve expected function which produces different types of objects based on its json and type?
func MockData(filename string, v interface{}) (interface{}, error) {
data, _ := ioutil.ReadFile(filename)
switch t := v.(type) {
case Req:
// t at this point is a Req{}
err := json.Unmarshal(data, &t)
return t, err
}
return nil, errors.New("unknown type")
}
I don't really know why you need to pass an actual struct rather than a pointer. Check this demonstration
Why does the type of the object change after unmarshalling, and why does it not change after I add &?
When you unmarshal using &ret where ret is an interface, you are passing the address of the interface itself. Hence, json.Unmarshal() sees that the backing data is an interface rather than a pointer to a concrete type. The default data types that json.Unmarshal() stores in an interface are map[string]interface{} for objects and []interface{} for arrays.
Now if you unmarshal using ret where ret is &Req{}, json.Unmarshal() sees that the backing data is a pointer to a concrete type (here Req), hence it can unmarshal directly into that type.
Edit:
You seem to be confusing a pointer to an interface with an interface that holds a pointer. Try this code and you'll see the difference.
var x interface{} = Req{}
var y interface{} = &x     // pointer to an interface
var z interface{} = &Req{} // interface holding a pointer
fmt.Printf("%T\n", y)      // *interface {}
fmt.Printf("%T\n", z)      // *main.Req
Remember that interfaces are just normal values and they also take memory. Now if you take an address of that memory, you get the pointer to the interface rather than the pointer to the data the interface is referring to.
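The effect on json.Unmarshal can be reproduced with a small program. This is only a sketch: the Req and Ele types are copied from the question, while the decode helper and the sample JSON are made up for the demonstration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type Ele struct {
	ID    int
	Level int
}

type Req []*Ele

// decode unmarshals data through a pointer to the interface variable ret,
// as the question does, and returns whatever ret holds afterwards.
func decode(ret interface{}, data []byte) (interface{}, error) {
	err := json.Unmarshal(data, &ret)
	return ret, err
}

func main() {
	data := []byte(`[{"ID":1,"Level":2}]`)

	v1, _ := decode(Req{}, data) // interface holding a value
	fmt.Printf("%T\n", v1)       // []interface {}

	v2, _ := decode(&Req{}, data) // interface holding a pointer
	fmt.Printf("%T\n", v2)        // *main.Req
}
```

In the first call the Req value inside the interface is simply replaced by the generic []interface{}; in the second call the decoder follows the pointer stored in the interface and fills the Req it points to.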
Can I achieve expected function which produces different types of objects based on its json and type?
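Yes, but you'll have to convert it back using a type assertion at the calling end. MockData returns two values, so assign them first and then assert, i.e.
v, err := MockData("foo.json", Foo{})
myFoo := v.(Foo)
(or have multiple returns in the func, such as return ret.(Foo) and return ret.(Bar))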
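If the type assertion at the call site is a nuisance, one alternative not used in the answers here is a generic function (Go 1.18 or later). The sketch below assumes the JSON is already in memory; mockData is a made-up name, and Req and Ele are copied from the question.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type Ele struct {
	ID    int
	Level int
}

type Req []*Ele

// mockData decodes data into a fresh value of type T. Unmarshal receives a
// pointer to a concrete T, so the static type is never lost.
func mockData[T any](data []byte) (T, error) {
	var v T
	err := json.Unmarshal(data, &v)
	return v, err
}

func main() {
	req, err := mockData[Req]([]byte(`[{"ID":1,"Level":2}]`))
	if err != nil {
		panic(err)
	}
	fmt.Println(len(req), req[0].ID, req[0].Level) // 1 1 2
}
```

The caller names the target type once, in the type argument, and gets back a value of that type with no interface{} in between.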
Why does the type of the object change after unmarshalling, and why does it not change after I add &?
There are some helpful comments at the top of the Unmarshal source, namely
// To unmarshal JSON into a pointer, Unmarshal first handles the case of
// the JSON being the JSON literal null. In that case, Unmarshal sets
// the pointer to nil. Otherwise, Unmarshal unmarshals the JSON into
// the value pointed at by the pointer. If the pointer is nil, Unmarshal
// allocates a new value for it to point to.
and
// To unmarshal JSON into an interface value,
// Unmarshal stores one of these in the interface value:
//
// bool, for JSON booleans
// float64, for JSON numbers
// string, for JSON strings
// []interface{}, for JSON arrays
// map[string]interface{}, for JSON objects
// nil for JSON null
So in the first case you are unmarshalling into an interface value (ret is declared as an interface{}).
In the second case there is a pointer to a struct, so that's what you get.

Parse and validate "key1:value1; key2:value2" string to Go struct efficiently?

I have a string like "key1:value1; key2:value2" (key:value pairs joined by ;).
Now I wish to parse this string to a Go struct:
type CustomStruct struct {
KeyName1 string `name:"key1" somevalidation:"xxx"`
KeyName2 int `name:"key2" somevalidation:"yyy"`
}
In the above example, the struct tag defines the name of the key in the string and can provide some validation for its corresponding value (it can set a default value if validation fails). For instance, KeyName2 is an int, so I would like somevalidation to check whether KeyName2 is, let's say, greater than 30 and less than or equal to 100.
In another scenario, I can define another struct, CustomStruct2, for a string like key3:value3; key4:value4;
How can I achieve this efficiently and elegantly?
I'll assume that you can parse the data to a map[string]interface{}.
Use the reflect package to set the fields. Here's the basic function:
// set sets fields in struct pointed to by pv to values in data.
func set(pv interface{}, data map[string]interface{}) {
// pv is assumed to be pointer to a struct
s := reflect.ValueOf(pv).Elem()
// Loop through fields
t := s.Type()
for i := 0; i < t.NumField(); i++ {
// Set field if there's a data value for the field.
f := t.Field(i)
if d, ok := data[f.Tag.Get("name")]; ok {
s.Field(i).Set(reflect.ValueOf(d))
}
}
}
This code assumes that the values in the data map are assignable to the corresponding field in the struct and that the first argument is a pointer to a struct. The code will panic if these assumptions are not true. You can protect against this by checking types and assignability using the reflect package.
playground example
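A fuller sketch, under some assumptions, could look like this. It parses the string into a map[string]string instead of the map[string]interface{} assumed above, converts values according to the kind of the target field, and returns errors instead of panicking. The names parsePairs and fill are made up, only string and int fields are handled, and range validation is left out.

```go
package main

import (
	"fmt"
	"reflect"
	"strconv"
	"strings"
)

type CustomStruct struct {
	KeyName1 string `name:"key1"`
	KeyName2 int    `name:"key2"`
}

// parsePairs splits "k1:v1; k2:v2" into a map of trimmed keys and values.
func parsePairs(s string) map[string]string {
	out := make(map[string]string)
	for _, part := range strings.Split(s, ";") {
		k, v, ok := strings.Cut(part, ":")
		if !ok {
			continue // skip empty or malformed segments
		}
		out[strings.TrimSpace(k)] = strings.TrimSpace(v)
	}
	return out
}

// fill sets the string and int fields of the struct pointed to by pv from
// data, matching on the `name` tag. Other field kinds yield an error.
func fill(pv interface{}, data map[string]string) error {
	rv := reflect.ValueOf(pv)
	if rv.Kind() != reflect.Ptr || rv.Elem().Kind() != reflect.Struct {
		return fmt.Errorf("fill: want pointer to struct, got %T", pv)
	}
	s := rv.Elem()
	t := s.Type()
	for i := 0; i < t.NumField(); i++ {
		raw, ok := data[t.Field(i).Tag.Get("name")]
		f := s.Field(i)
		if !ok || !f.CanSet() {
			continue // no value for this field, or field is unexported
		}
		switch f.Kind() {
		case reflect.String:
			f.SetString(raw)
		case reflect.Int:
			n, err := strconv.Atoi(raw)
			if err != nil {
				return fmt.Errorf("fill: field %s: %w", t.Field(i).Name, err)
			}
			f.SetInt(int64(n))
		default:
			return fmt.Errorf("fill: field %s: unsupported kind %s", t.Field(i).Name, f.Kind())
		}
	}
	return nil
}

func main() {
	var c CustomStruct
	if err := fill(&c, parsePairs("key1:hello; key2:42")); err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", c) // {KeyName1:hello KeyName2:42}
}
```

A validation tag such as somevalidation could be read in the same loop with Tag.Get and applied after the conversion.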

Check if struct field is empty

I would like to iterate over the fields of a struct after unmarshalling a JSON object into it and check for the fields whose value was not set (i.e. are empty).
I can get the value of each field and compare that to the reflect.Zero value for the corresponding type
json.Unmarshal([]byte(str), &res)
s := reflect.ValueOf(&res).Elem()
typeOfT := s.Type()
for i := 0; i < s.NumField(); i++ {
f := s.Field(i)
v := reflect.ValueOf(f.Interface())
if (reflect.DeepEqual(v.Interface(), reflect.Zero(v.Type()).Interface())) {
....
But the problem, of course, is that this will not work well for bool or int values.
If a bool field is set to false in the JSON or an int field is set to 0, they will be equal to the zero value of their type. The aforementioned check will consider the fields to be uninitialized, even though they actually have a value set.
I know one way around this is to use pointers, but I just don't see how that would be possible in this case as I'm working with reflect.Value types, not the actual struct.
As you've mentioned, you could use pointers.
The json package can handle unmarshalling values into pointers for you. You've not included the json payload you are trying to unmarshal, or the struct you are unmarshalling into, so I've made up an example.
// json
{
"foo": true,
"number_of_foos": 14
}
// go struct
type Foo struct {
Present bool `json:"foo"`
Num int `json:"number_of_foos"`
}
Here, if either of the keys foo or number_of_foos is missing, then as you've correctly observed, the zero value (false / 0) will be used. In general, the best advice is to make use of the zero value: design your structures so that zero values such as false are useful rather than a pain.
This is not always possible, so changing the types of the fields in the Foo struct to pointers will allow you to check the 3 cases you are after:
Present and non-zero
Present and zero
Missing
Here is the same struct with pointers:
// go struct
type Foo struct {
Present *bool `json:"foo"`
Num *int `json:"number_of_foos"`
}
Now you can check for presence of the value with fooStruct.Present != nil and if that condition holds, you can assume that the value in the field is the one you wanted.
There is no need to use the reflect package.
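As a quick check of the three cases, here is a small sketch using the pointer version of Foo; the describe helper is made up for the demonstration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type Foo struct {
	Present *bool `json:"foo"`
	Num     *int  `json:"number_of_foos"`
}

// describe reports which of the two keys were present in the input.
func describe(input string) (hasFoo, hasNum bool, err error) {
	var f Foo
	if err = json.Unmarshal([]byte(input), &f); err != nil {
		return false, false, err
	}
	return f.Present != nil, f.Num != nil, nil
}

func main() {
	fmt.Println(describe(`{"foo": false, "number_of_foos": 0}`)) // true true <nil>
	fmt.Println(describe(`{"foo": true}`))                       // true false <nil>
	fmt.Println(describe(`{}`))                                  // false false <nil>
}
```

Keys that are present with a zero value yield non-nil pointers, while missing keys leave the pointers nil. An explicit JSON null also leaves the pointer nil, so this approach cannot tell null apart from a missing key.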
Another way of doing the same is by implementing json.Unmarshaler.
type MaybeInt struct {
Present bool
Value int
}
func (i *MaybeInt) UnmarshalJSON(bs []byte) error {
if e := json.Unmarshal(bs, &i.Value); e != nil {
return e
}
i.Present = true
return nil
}
You can then use MaybeInt in your top-level structure:
type Top struct {
N MaybeInt `json:"n"`
M MaybeInt `json:"m"`
}
func main() {
t := Top{}
if e := json.Unmarshal([]byte(` { "n": 4930 } `), &t); e != nil {
panic(e)
}
fmt.Println(t.N, t.M)
}
See it working on the playground
Try using the Go validator package. It offers a required tag that might do the job. Note that, as the quote below says, it treats zero values as missing, so 0 and false only pass if the fields are pointers. The official documentation for required states:
This validates that the value is not the data types default zero value. For numbers ensures value is not zero. For strings ensures value is not "". For slices, maps, pointers, interfaces, channels and functions ensures the value is not nil.
The example illustrating the same can be seen at: https://github.com/go-playground/validator/blob/v9/_examples/struct-level/main.go.
Hope this solves your requirement.

Using empty interfaces in go

I am trying to understand the code that is used at my company. I am new to go lang, and I have already gone through the tutorial on their official website. However, I am having a hard time wrapping my head around empty interfaces, i.e. interface{}. From various sources online, I figured out that the empty interface can hold any type. But, I am having a hard time figuring out the codebase, especially some of the functions. I will not be posting the entire thing here, but just the minimal functions in which it has been used. Please bear with me!
Function (I am trying to understand):
func (this *RequestHandler) CreateAppHandler(rw http.ResponseWriter, r *http.Request) *foo.ResponseError {
var data *views.Data = &views.Data{Attributes: &domain.Application{}}
var request *views.Request = &views.Request{Data: data}
if err := json.NewDecoder(r.Body).Decode(request); err != nil {
logrus.Error(err)
return foo.NewResponsePropogateError(foo.STATUS_400, err)
}
requestApp := request.Data.Attributes.(*domain.Application)
requestApp.CreatedBy = user
Setting some context, RequestHandler is a struct defined in the same package as this code. domain and views are separate packages. Application is a struct in the package domain. The following two structs are part of the package views:
type Data struct {
Id string `json:"id"`
Type string `json:"type"`
Attributes interface{} `json:"attributes"`
}
type Request struct {
Data *Data `json:"data"`
}
The following are part of the package json:
func NewDecoder(r io.Reader) *Decoder {
return &Decoder{r: r}
}
func (dec *Decoder) Decode(v interface{}) error {
if dec.err != nil {
return dec.err
}
if err := dec.tokenPrepareForDecode(); err != nil {
return err
}
if !dec.tokenValueAllowed() {
return &SyntaxError{msg: "not at beginning of value"}
}
// Read whole value into buffer.
n, err := dec.readValue()
if err != nil {
return err
}
dec.d.init(dec.buf[dec.scanp : dec.scanp+n])
dec.scanp += n
// Don't save err from unmarshal into dec.err:
// the connection is still usable since we read a complete JSON
// object from it before the error happened.
err = dec.d.unmarshal(v)
// fixup token streaming state
dec.tokenValueEnd()
return err
}
type Decoder struct {
r io.Reader
buf []byte
d decodeState
scanp int // start of unread data in buf
scan scanner
err error
tokenState int
tokenStack []int
}
Now, I understood that, in the struct Data in package views, Application is being set as a type for the empty interface. After that, a pointer to Request in the same package is created which points to the variable data.
I have the following doubts:
What exactly does the this keyword mean in Go? What is the purpose of writing this *RequestHandler?
Initialization of a structure in Go can be done while assigning it to a variable by specifying the values of all its members. However, here, for the struct Data, only the empty interface value is assigned and the values for the other two fields are not assigned?
What is the advantage of assigning the Application struct to an empty interface? Does it mean I can use the struct members using the interface variable directly?
Can someone help me figure out the meaning of this statement? json.NewDecoder(r.Body).Decode(request)?
I know this is a lot, but I am having a hard time figuring out the meaning of interfaces in Go. Please help!
this is not a keyword in Go; any variable name can be used there. It is called the receiver. A method declared that way must be called like thing.method(params), where thing is an expression of the receiver's type. Within the method, the receiver is set to the value of thing.
A struct literal does not have to contain values for all the fields (or any of them). Any fields not explicitly set will have the zero value for their types.
As you said, an empty interface can take on a value of any type. To use a value of type interface{}, you would use type assertion or a type switch to determine the type of the value, or you could use reflection to use the value without having to have code for the specific type.
What specifically about that statement do you not understand? json is the name of a package in which the function NewDecoder is declared. That function is called, and then the Decode function (which is implemented by the type of the return value of NewDecoder) is called on that return value.
You may want to take a look at Effective Go and/or The Go Programming Language Specification for more information.
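To tie these points to the handler in the question, here is a self-contained sketch. Application is a stand-in for domain.Application with made-up fields, and decodeApp is a hypothetical helper. It shows why Attributes is pre-populated with a pointer before decoding, and how the type assertion recovers the concrete type afterwards.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// Application stands in for domain.Application; its fields are made up.
type Application struct {
	Name      string `json:"name"`
	CreatedBy string `json:"created_by"`
}

type Data struct {
	ID         string      `json:"id"`
	Type       string      `json:"type"`
	Attributes interface{} `json:"attributes"`
}

type Request struct {
	Data *Data `json:"data"`
}

// decodeApp pre-populates Attributes with *Application so that the decoder
// fills that struct instead of building a map[string]interface{}.
func decodeApp(body string) (*Application, error) {
	req := &Request{Data: &Data{Attributes: &Application{}}}
	if err := json.NewDecoder(strings.NewReader(body)).Decode(req); err != nil {
		return nil, err
	}
	// The comma-ok form avoids a panic if the dynamic type is not *Application.
	app, ok := req.Data.Attributes.(*Application)
	if !ok {
		return nil, fmt.Errorf("unexpected attributes type %T", req.Data.Attributes)
	}
	return app, nil
}

func main() {
	app, err := decodeApp(`{"data":{"id":"1","type":"app","attributes":{"name":"demo"}}}`)
	if err != nil {
		panic(err)
	}
	app.CreatedBy = "alice"
	fmt.Println(app.Name, app.CreatedBy) // demo alice
}
```

Had Attributes been left nil, the decoder would have stored a map[string]interface{} there and the assertion to *Application would have failed.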

Cast boxed struct to boxed pointer - golang

I'm using Protobuf for Golang.
Protobuf generates message types where the pointer type implements proto.Message.
e.g.
func (*SomeMessage) ProtoMessage() {}
The protobuf lib has functions like Marshal(proto.Message)
Now to my actual issue.
message := SomeMessage {}
SendMessage(&message)
func SendMessage(message interface{}) {
switch msg := message.(type) {
case proto.Message:
//send across the wire or whatever
default:
//non proto message, panic or whatever
}
}
The above works fine.
However, if I don't pass the message as a pointer, then the code in SendMessage will not match, as the interface is only implemented on the SomeMessage pointer, not on the value.
What I would like to do is:
message := SomeMessage {}
SendMessage(message) //pass by value
//there are more stuff going on in my real code, but just trying to show the relevant parts
func SendMessage(message interface{}) {
//match both pointer and value as proto.Message
//and then turn the value into a pointer so that
//other funcs or protobuf can consume it
message = MagicallyTurnBoxedValueIntoBoxedStruct(message)
switch msg := message.(type) {
case proto.Message:
//send across the wire or whatever
default:
//non proto message, panic or whatever
}
}
Preferably, I'd like to be able to pass the message both as a pointer and as a value.
The reason why I want to pass by value is that this can act as a poor man's isolation when passing messages across goroutines/threads etc.
(for lack of immutability)
All of this could probably be avoided if the protobuf generator allowed values to be treated as proto.Message too.
Or if there was some nicer way to do immutable messages.
It's not super important; if it's possible, cool, if it's not, meh :-)
[EDIT]
If I have the reflect.Type of the message and the reflect.Type of the pointer type of the message,
is it somehow possible to create an instance of the pointer type pointing to the value using "reflect"?
Normally, you can't take the address of a value stored in an interface, which means you can't simply convert the interface{} to a pointer to satisfy Protobuf's requirement.
That said, you can dynamically create a new pointer, copy the value into it, and then pass the newly allocated pointer to protobuf.
Here's an example on Play
The value -> pointer conversion is:
func mkPointer(i interface{}) interface{} {
val := reflect.ValueOf(i)
if val.Kind() == reflect.Ptr {
return i
}
if val.CanAddr() {
return val.Addr().Interface()
}
nv := reflect.New(reflect.TypeOf(i))
nv.Elem().Set(val)
return nv.Interface()
}
We first see if it's a pointer, if so, just return the value.
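Then we check to see if it's addressable and return that. (A Value obtained from reflect.ValueOf(i) is never addressable, so this branch is only a safeguard.)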
Lastly, we make a new instance of the type and copy the contents to that and return it.
Since this copies the data, it may not be practical for your purposes. It will all depend on the size of the message and the expected rate of calls with a value (as that will generate more garbage).
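Here is a sketch of how the conversion could sit in front of the type switch. Message is a made-up stand-in for proto.Message so that the example needs no protobuf import, and isMessage is a hypothetical helper. The mkPointer below drops the CanAddr branch and adds a guard for a nil interface.

```go
package main

import (
	"fmt"
	"reflect"
)

// Message stands in for proto.Message; like generated protobuf types, it is
// implemented on the pointer receiver only.
type Message interface{ ProtoMessage() }

type SomeMessage struct{ Text string }

func (*SomeMessage) ProtoMessage() {}

// mkPointer returns i unchanged if it already holds a pointer; otherwise it
// allocates a new value of the same type, copies i into it, and returns a
// pointer to the copy.
func mkPointer(i interface{}) interface{} {
	val := reflect.ValueOf(i)
	if !val.IsValid() {
		return nil // i was a nil interface
	}
	if val.Kind() == reflect.Ptr {
		return i
	}
	nv := reflect.New(val.Type())
	nv.Elem().Set(val)
	return nv.Interface()
}

// isMessage reports whether v, passed by value or by pointer, can be
// treated as a Message after the conversion.
func isMessage(v interface{}) bool {
	_, ok := mkPointer(v).(Message)
	return ok
}

func main() {
	fmt.Println(isMessage(SomeMessage{Text: "hi"}))  // true
	fmt.Println(isMessage(&SomeMessage{Text: "hi"})) // true
	fmt.Println(isMessage(42))                       // false
}
```

The by-value call works because the copy made by reflect.New has the pointer type, which is the one that carries the method.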
