Can the google protobuf library be used to deserialize an unknown binary protobuf into a ruby object, then serialize it back again? - ruby

I understand that the binary protobuf format has no context (without the proto def), but all the same, I have a requirement to be able to deserialize it into a ruby object that can be enumerated and changed, and then reserialize the object back into binary protobuf format.
The google protobuf docs for ruby are really light in comparison to the other supported languages, so it isn't clear if this is possible, or how to go about it.
If the google protobuf library isn't the best choice, is there a better one (that supports all the protobuf versions)?

No, the protocol buffer wire format is not self-describing. Without a schema file there's only a limited amount of information which can be extracted.
There is protoscope, a sort of decompiler. That will show you how much information you can get from the wire format, though it is not exact.

After a bit of experimentation, I found that I couldn't get what I wanted from the Google library. However, the wire protocol is documented here, and there are only six wire types used.
In the end I created a simple class that can deserialize all the TLVs; for the LEN type it recognises strings (and allows them to be changed), recurses on embedded messages, and skips everything else.
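A minimal sketch of that approach in Ruby (plain stdlib; without the .proto, field names are unknowable, so fields are keyed by number — this is an illustration of the wire format, not the author's actual class):

```ruby
# Wire types: 0 = VARINT, 1 = I64 (8 fixed bytes), 2 = LEN, 5 = I32 (4 fixed bytes).
# Types 3 and 4 (groups) are deprecated and not handled here.

def read_varint(bytes, i)
  value = 0
  shift = 0
  loop do
    byte = bytes[i]
    i += 1
    value |= (byte & 0x7F) << shift
    shift += 7
    break if (byte & 0x80).zero?
  end
  [value, i]
end

# Returns an array of [field_number, wire_type, value] triples.
def decode_fields(data)
  bytes = data.bytes
  i = 0
  fields = []
  while i < bytes.length
    tag, i = read_varint(bytes, i)
    field_number = tag >> 3
    wire_type = tag & 0x07
    case wire_type
    when 0 then value, i = read_varint(bytes, i)       # varint
    when 1 then value = data.byteslice(i, 8); i += 8   # 64-bit fixed
    when 2                                             # length-delimited
      len, i = read_varint(bytes, i)
      value = data.byteslice(i, len)
      i += len
    when 5 then value = data.byteslice(i, 4); i += 4   # 32-bit fixed
    else raise "unsupported wire type #{wire_type}"
    end
    fields << [field_number, wire_type, value]
  end
  fields
end
```

For example, "\x08\x96\x01".b decodes to [[1, 0, 150]] (the classic varint example from the encoding docs); re-serialization is the same walk in reverse.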

Related

Can proto2 talk to proto3?

I have two applications that talk to each other via GPB messages. Both were using proto3, but we found out one will have to use proto2. If the messages are the same, can one program use proto2 to compile while the other uses proto3? Or do they need to be compiled with the same version of proto?
The wire format is very similar, so it will work to some extent. However, there are some caveats:
The required/optional distinction does not exist in proto3. You should make all the fields optional on the proto2 side to avoid errors about missing required fields.
Proto3 does not encode fields set to their zero value, so those fields will appear to be missing when decoded on the proto2 side. If you specify zero as the default value on the proto2 side, it should work out OK.
Extensions and Any type will be quite difficult to use in a way that would be compatible with both.
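For example, a proto2 definition written to interoperate with a proto3 peer might look like this (the message and field names here are hypothetical):

```proto
syntax = "proto2";

message Reading {
  // Everything optional: proto3 has no notion of "required".
  optional int32  sensor_id = 1 [default = 0];   // zero defaults match the
  optional string label     = 2 [default = ""];  // implicit zeros proto3 omits
}
```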

Decoding Protobuf encoded data using non-supported platform

I am new to Protobufs; I haven't had much exposure to them. One of the API endpoints we require data from, uses Protobuf encoded data. This generally wouldn't be an issue if I was using a 'supported' language such as JavaScript, Java, Python or even R to decode the data...
Unfortunately, I am trying to automate the process using Alteryx. Rather than this being an Alteryx specific question, I have a few questions about Protobufs themselves so I understand this situation better. I've read through the implementation of Protobufs in Java and Python, and have a basic understanding of how to use them.
To summarize (please correct me if I am wrong), a Protobuf is a method of serializing structured data, where a .proto schema is used to encode / decode data to and from raw binary. My confusion lies with the compiler. Google's documentation and examples for Python / Java show how a Protobuf compiler (library) is required in order to run the encoding and decoding process. Reading the Google website, it advises that Protobufs are 'language neutral and platform neutral', but I can't see how that is possible if you need the compiler (and the .proto file!) to do the decoding. For example, how would anyone using a language for which Google hasn't created a compiler possibly decode Protobuf encoded data? Am I missing something?
I figure I'm missing something, since it seems weird that a public API would force this constraint.
"language/platform neutral" here simply means that you can reliably get the same data back from any language/framework/platform. The serialization format is defined independently and does not rely on the nuances of any particular framework.
This might seem a low bar, but you'd be surprised how many serialization formats fail to clear it.
Because the format is specified, anyone can create a tool for some other platform. It is a little fiddly if you're not used to dealing in bits, but: totally doable. The protobuf landscape is not dependent on Google - here's a list of some of the known non-Google tools: https://github.com/protocolbuffers/protobuf/blob/master/docs/third_party.md
Also, note that technically you don't even need a .proto; you just need some mechanism for specifying which fields map to which field numbers (since protobuf doesn't include the names). Quite a few in that list can work either from a .proto, or from the field/number map being specified in some other way. The advantage of .proto is simply that it is easy to convey as the schema - and again: isn't tied to any particular language. You can write plugins for "protoc" to add your own tooling, so you don't need to write your own parser from scratch. Or you can write your own parser from scratch if you prefer.
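To make the field-number point concrete, here is the arithmetic in Ruby (the number-to-name map is invented for illustration):

```ruby
# A protobuf tag packs the field number and wire type into one varint:
#   tag = (field_number << 3) | wire_type
tag = 0x12                    # first byte of field 2, wire type LEN
field_number = tag >> 3       # => 2
wire_type = tag & 0x07        # => 2 (length-delimited)

# With an externally agreed map, no .proto is needed to name the field:
FIELD_NAMES = { 1 => :id, 2 => :email }.freeze
FIELD_NAMES[field_number]     # => :email
```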
You can't really speak of a non-supported platform in this case: it is more a matter of languages for which you can't find a protobuf implementation.
My 2 cents: if you can't find a protobuf implementation for your language, pick another language you're familiar with (and that is popular in the protobuf community) and handle the protobuf serialization/deserialization with it. Then call it via a REST API, an executable ... whatever.

Verify protobuf message definition

Is it possible to get a hash of the protobuf message definition? The hash should be on the message definition itself, not depending on any data that's in it. I'm using protobuf to transfer data across machines, and I want to make sure they are compiled against the exactly same definition of message structure.
You can get the message descriptor using google::protobuf::Message::GetDescriptor() interface. Using the Descriptor::CopyTo() method you can then convert this to DescriptorProto, which contains all the information about the protobuf message stored itself in protobuf format. This you can then serialize and hash in whatever way you want.
But I agree with whrrgarbl's comment that protobuf already has very good forwards- and backwards-compatibility. So unless you have very special reasons this hashing seems unnecessary and will only make future maintenance of your code more difficult.
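The hashing step itself is trivial once you have the serialized descriptor bytes in hand; a Ruby sketch (descriptor_bytes is a placeholder standing in for whatever Descriptor::CopyTo plus serialization produced in your runtime):

```ruby
require 'digest'

# Placeholder: in practice this would be the serialized DescriptorProto
# obtained from your protobuf runtime (e.g. CopyTo + SerializeToString in C++).
descriptor_bytes = "\x0a\x07Example".b

schema_hash = Digest::SHA256.hexdigest(descriptor_bytes)
# Exchange schema_hash in a handshake and refuse to talk on mismatch.
```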

Converting Protobuf definitions to Thrift

Are there any tools that exist to generate a Thrift interface definition from a Protobuf definition?
It appears the answer is "not yet". One problem you will face is that thrift defines a full RPC system with services and method calls, while protobuf really focuses on the datatypes and the serialization bits. Thrift's data model is a bit more restricted than protobuf's (no recursive structures, etc.), but that should not be a problem in the thrift -> protobuf direction.
Sure, you could quite easily convert all the thrift data types to protobuf definitions, while ignoring the service section entirely. You could even add something like that as a built in generator in the thrift compiler if you wanted to.
Thrift and Protobuf are not interchangeable though. Take a look at Biggest differences of Thrift vs Protocol Buffers? to see some key differences. What exactly are you trying to accomplish?
I wrote a translator to convert a subset of Thrift into Protobuf and vice-versa.
This is some Thrift code:
enum Operation {
  ADD = 1,
  SUBTRACT = 2,
  MULTIPLY = 3,
  DIVIDE = 4
}

struct Work {
  1: i32 num1,
  2: i32 num2,
  4: string comment
}
which is automatically converted into this Protobuf code:
enum Operation {
  ADD = 1;
  SUBTRACT = 2;
  MULTIPLY = 3;
  DIVIDE = 4;
}

message Work {
  int32 num1 = 1;
  int32 num2 = 2;
  string comment = 4;
}
I don't think there is. If you're up to writing code for that, you can write a language generator that produces .thrift files for you.
You can write the tool in any language (I wrote mine in C++ in protobuf-j2me [1], and adapted protobuf-csharp-port code in [2]).
You can have protoc.exe call it like:
protoc.exe --plugin=protoc-gen-thrift.exe --thrift_out=. file.proto
You need to name it protoc-gen-thrift.exe to make the --thrift_out option available.
[1] https://github.com/igorgatis/protobuf-j2me
[2] https://github.com/jskeet/protobuf-csharp-port/blob/6ac38bf84452f1f9bd3df59a276708725be2ed49/src/ProtoGen/ProtocGenCs.cs
I believe there is.
Check out protobuf-thrift, a non-destructive transformer I wrote, which can transform protobuf to thrift and vice-versa. What's more, it preserves declaration order and comments, which gives you maximum retention of the source code format.
You can also try out the web interface here.
BTW, I also agree with captncraig's answer. It's true that thrift and protobuf have many differences, such as nested types in protobuf and union types in thrift. But you can't deny that they have a lot of syntax in common, such as enum => enum, struct => message, and service => service, which are the most commonly used constructs.
So protobuf-thrift is a tool that helps you reduce repetitive work, while knowing that "they are not 100% convertible".

Human-readable representations in protobuf-net

Does protobuf-net have any APIs to dump a protobuf into human readable form? I was hoping for something like TextFormat.
At the moment, no. I'm in two minds as to whether it is worthwhile adding; in my mind, this defeats most of the benefits of protocol buffers.
However, since Jon's version is a port of the Java version, you should find that it is feature compatible, so it should exist there.
There is one for Java: the built message's toString() method returns a string representation, but you'll lose the (binary) serialization.
