Decoding Protobuf encoded data using non-supported platform - protocol-buffers

I am new to Protobufs; I haven't had much exposure to them. One of the API endpoints we require data from uses Protobuf-encoded data. This generally wouldn't be an issue if I were using a 'supported' language such as JavaScript, Java, Python or even R to decode the data...
Unfortunately, I am trying to automate the process using Alteryx. Rather than this being an Alteryx specific question, I have a few questions about Protobufs themselves so I understand this situation better. I've read through the implementation of Protobufs in Java and Python, and have a basic understanding of how to use them.
To summarize (please correct me if I am wrong), a Protobuf is a method of serializing structured data where a .proto schema is used to encode / decode data into raw binary. My confusion lies with the compiler. Google's documentation and examples for Python / Java show that a Protobuf compiler (library) is required in order to run the encoding and decoding process. The Google website says that Protobufs are 'language neutral and platform neutral', but I can't see how that is possible if you need the compiler (and .proto file!) to do the decoding. For example, how would anyone using a language for which Google has not created a compiler possibly decode Protobuf-encoded data? Am I missing something?
I figure I'm missing something, since it seems weird that a public API would force this constraint.

"language/platform neutral" here simply means that you can reliably get the same data back from any language/framework/platform. The serialization format is defined independently and does not rely on the nuances of any particular framework.
This might seem a low bar, but you'd be surprised how many serialization formats fail to clear it.
Because the format is specified, anyone can create a tool for some other platform. It is a little fiddly if you're not used to dealing in bits, but: totally doable. The protobuf landscape is not dependent on Google - here's a list of some of the known non-Google tools: https://github.com/protocolbuffers/protobuf/blob/master/docs/third_party.md
Also, note that technically you don't even need a .proto; you just need some mechanism for specifying which fields map to which field numbers (since protobuf doesn't include the names). Quite a few in that list can work either from a .proto, or from the field/number map being specified in some other way. The advantage of .proto is simply that it is easy to convey as the schema - and again: isn't tied to any particular language. You can write plugins for "protoc" to add your own tooling, so you don't need to write your own parser from scratch. Or you can write your own parser from scratch if you prefer.
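To illustrate the point about not needing a .proto, here is a minimal Python sketch of a hand-written decoder that works from a field-number map alone. The FIELD_NAMES map and the sample bytes are made up for illustration, and the helper names are hypothetical rather than part of any protobuf library. It only handles the VARINT, I64, LEN and I32 wire types, and it leaves fixed-width and length-delimited values as raw bytes, because their real type cannot be known without a schema.

```python
from typing import Dict, Iterator, Tuple, Union


def read_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode a base-128 varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift  # low 7 bits carry the payload
        if not byte & 0x80:               # high bit clear means last byte
            return result, pos
        shift += 7


def iter_fields(buf: bytes) -> Iterator[Tuple[int, int, Union[int, bytes]]]:
    """Yield (field_number, wire_type, value) for each top-level field."""
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:    # VARINT
            value, pos = read_varint(buf, pos)
        elif wire_type == 1:  # I64: 8 raw bytes
            value, pos = buf[pos:pos + 8], pos + 8
        elif wire_type == 2:  # LEN: varint length, then payload
            length, pos = read_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        elif wire_type == 5:  # I32: 4 raw bytes
            value, pos = buf[pos:pos + 4], pos + 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        if pos > len(buf):
            raise ValueError("truncated field")
        yield field_number, wire_type, value


# Hypothetical field/number map standing in for a .proto schema.
FIELD_NAMES: Dict[int, str] = {1: "id", 2: "name"}


def decode(buf: bytes, names: Dict[int, str]) -> Dict[str, Union[int, bytes]]:
    """Map field numbers to names; unknown numbers get a generic key."""
    return {names.get(n, f"field_{n}"): v for n, _, v in iter_fields(buf)}


# Made-up sample: field 1 = varint 150, field 2 = the bytes "testing".
sample = bytes([0x08, 0x96, 0x01, 0x12, 0x07]) + b"testing"
decoded = decode(sample, FIELD_NAMES)
print(decoded)
```

Repeated fields would overwrite each other in this dictionary, so a real decoder would need to collect values per field number instead.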

You can't really speak of a non-supported platform in this case: it is more about languages for which you can't find a protobuf implementation.
My 2 cents: if you can't find a protobuf implementation for your language, pick another language you're familiar with (and that is popular in the protobuf community) and handle the protobuf serialization/deserialization with it. Then call it via a REST API, an executable, or whatever else suits you.

Related

Can the google protobuf library be used to serialize an unknown binary protobuf into a ruby object, then back again?

I understand that the binary protobuf format has no context (without the proto def), but all the same, I have a requirement to be able to deserialize it into a ruby object that can be enumerated and changed, and then reserialize the object back into binary protobuf format.
The google protobuf docs for ruby are really light in comparison to the other supported languages, so it isn't clear if this is possible, or how to go about it.
If the google protobuf library isn't the best choice, is there a better one (that supports all the protobuf versions)?
No, the protocol buffer wire format is not self-describing. Without a schema file there's only a limited amount of information which can be extracted.
There is protoscope, a sort of decompiler. That will show you how much information you can get from the wire format, though it is not exact.
After a bit of experimentation, I found that I couldn't get what I wanted from the Google library. However, the wire protocol is documented here, and it uses only six wire types.
In the end I created a simple class that can deserialize all the TLVs and, for the LEN type, recognise (and permute) strings, recurse on embedded messages, and skip everything else.
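The approach described above can be sketched roughly as follows. This is written in Python rather than Ruby, it is not the poster's actual class, and all function names are hypothetical. It parses the tag-length-value records into nested lists, guesses whether each LEN payload is an embedded message, a UTF-8 string, or opaque bytes, and serializes the tree back. The guess is a heuristic: some short strings are also valid messages, and non-canonical varints in the input would not round-trip byte for byte. Groups (wire types 3 and 4) are not handled.

```python
def read_varint(buf, pos):
    """Decode a base-128 varint at pos; return (value, next_pos)."""
    result, shift = 0, 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_varint(n):
    """Encode a non-negative integer as a base-128 varint."""
    if n < 0:
        raise ValueError("negative values need a schema to encode")
    out = bytearray()
    while True:
        byte, n = n & 0x7F, n >> 7
        out.append(byte | 0x80 if n else byte)
        if not n:
            return bytes(out)


def guess_len_payload(raw):
    """Heuristic: embedded message first, then UTF-8 text, else raw bytes."""
    if raw:
        try:
            return parse(raw)
        except ValueError:
            pass
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw


def parse(buf):
    """Parse bytes into a list of [field_number, wire_type, value]."""
    fields, pos = [], 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        number, wire_type = key >> 3, key & 0x07
        if number == 0:
            raise ValueError("field number 0 is invalid")
        if wire_type == 0:                      # VARINT
            value, pos = read_varint(buf, pos)
        elif wire_type in (1, 5):               # I64 / I32, kept as raw bytes
            size = 8 if wire_type == 1 else 4
            value, pos = buf[pos:pos + size], pos + size
        elif wire_type == 2:                    # LEN
            length, pos = read_varint(buf, pos)
            raw, pos = buf[pos:pos + length], pos + length
            if pos > len(buf):
                raise ValueError("truncated LEN field")
            value = guess_len_payload(raw)
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        if pos > len(buf):
            raise ValueError("truncated field")
        fields.append([number, wire_type, value])
    return fields


def serialize(fields):
    """Turn the parsed tree back into wire-format bytes."""
    out = bytearray()
    for number, wire_type, value in fields:
        out += encode_varint((number << 3) | wire_type)
        if wire_type == 0:
            out += encode_varint(value)
        elif wire_type in (1, 5):
            out += value
        else:  # LEN: nested list, text, or opaque bytes
            if isinstance(value, list):
                payload = serialize(value)
            elif isinstance(value, str):
                payload = value.encode("utf-8")
            else:
                payload = value
            out += encode_varint(len(payload)) + payload
    return bytes(out)


# Made-up message: field 2 = "testing", field 3 = embedded {field 1 = 150}.
original = b"\x12\x07testing\x1a\x03\x08\x96\x01"
tree = parse(original)
tree[0][2] = tree[0][2].upper()   # change the string
tree[1][2][0][2] = 151            # change the nested varint
modified = serialize(tree)
```

Because the tree is plain lists, it can be enumerated and changed in place before calling serialize, which is the round trip the question asks for.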

How to customize a serialization

I'm a newbie with GraphQL and SPQR. I would like to serialize my dates in a custom format. How can I do it?
The best answer I'd offer is: don't! SPQR serializes all temporal scalars as ISO 8601 strings in UTC zone for a reason. It is the most portable format, that any client can easily parse and understand, and any conversion and display logic is better left to the client itself.
If this is for some reason impossible (e.g. backwards compatibility with a legacy client), your best bet is providing your own scalar implementations. In the future there might be a feature to avoid this, but currently you have to implement your own scalars and a TypeMapper that will map the desired Java types to those scalars. See the existing ScalarMapper for inspiration. Once you have the mapper, register it via generator.withTypeMappers.

How to verify and validate parsed Google Protobuf v2 file

First, I'll just couch this in the acknowledgement, yes, I am aware of protoc, but I've got a specific requirement to extrapolate some specialized target language artifacts based on a .proto file parser outcome.
That being established, I've already got the parser itself working. I am working on resolving imported .proto dependencies. Not a terribly difficult endeavor on the surface, in and of itself.
The next steps after that, I think, are to perform a kind of "transitive linkage", as I've learned, but I am curious what I should be aware of. Prima facie, I think I should be collating a set (most likely map) of element paths to field numbers, as well as collating the reserved as well as extensions, then perhaps verifying as I traverse the .proto dependency tree.
However, I'd like to get an idea of others' experience, guidance, feedback, along these lines.
For what I'm wanting to accomplish, I do not think this verification step needs to be that elaborate, only enough to rule out invalid .proto, etc.
Oh, last but not least, I need to handle this for the Protobuf v2 language spec.
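As a rough illustration of the kind of lightweight check described above, here is a Python sketch that validates field numbers for one message against its reserved and extension ranges. The Message class is a hypothetical stand-in for whatever the parser produces and is not taken from any library. The numeric limits (field numbers from 1 to 2^29 - 1, with 19000 to 19999 reserved for the protobuf implementation) come from the protobuf language guide; everything else is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

MAX_FIELD_NUMBER = 2**29 - 1              # 536,870,911
IMPLEMENTATION_RESERVED = (19000, 19999)  # reserved by protobuf itself

Range = Tuple[int, int]  # inclusive (low, high)


@dataclass
class Message:
    """Hypothetical parser output for one message definition."""
    path: str                                   # e.g. "pkg.Person"
    fields: Dict[str, int]                      # field name -> field number
    reserved: List[Range] = field(default_factory=list)
    extensions: List[Range] = field(default_factory=list)


def in_ranges(number: int, ranges: List[Range]) -> bool:
    return any(low <= number <= high for low, high in ranges)


def validate(msg: Message) -> List[str]:
    """Return a list of human-readable problems; empty means no issue found."""
    errors: List[str] = []
    seen: Dict[int, str] = {}
    for name, number in msg.fields.items():
        where = f"{msg.path}.{name}"
        if not 1 <= number <= MAX_FIELD_NUMBER:
            errors.append(f"{where}: field number {number} out of range")
        elif in_ranges(number, [IMPLEMENTATION_RESERVED]):
            errors.append(f"{where}: {number} is reserved for the implementation")
        if number in seen:
            errors.append(f"{where}: {number} already used by {seen[number]}")
        seen.setdefault(number, name)
        if in_ranges(number, msg.reserved):
            errors.append(f"{where}: {number} is declared reserved")
        if in_ranges(number, msg.extensions):
            errors.append(f"{where}: {number} falls in an extension range")
    return errors


# Made-up example with three deliberate mistakes.
person = Message(
    path="pkg.Person",
    fields={"id": 1, "name": 2, "email": 2, "legacy": 9, "x": 19500},
    reserved=[(9, 11)],
    extensions=[(100, 199)],
)
problems = validate(person)
for line in problems:
    print(line)
```

Running the same check over every message reached through the import graph, keyed by fully qualified path, would also catch name collisions across files.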

Go - decode/encode asn.1

Does anyone know where there is a good example of how to use the asn1 Marshal and Unmarshal funcs in Go?
I'm familiar with the concept of how DER encoding with ASN.1 works, but do not have experience dealing with it directly in code (usually I'm using another library with wraps it - openldap or whatever).
Yes, I've looked at the documentation (http://golang.org/pkg/encoding/asn1/), which seems to describe a tagging system much like what is available for JSON and XML in Go; however I have yet to find a good practical example of this anywhere for the encoding/asn1 package. (Hm, okay I see the Certificate example in asn1_test.go - anyone know of anything else?)
(Overall, I'm trying to implement a very small subset of LDAP (the server side) in Go.)
UPDATE: My question is flawed by the fact that LDAP uses BER, not DER. So encoding/asn1 isn't going to help. In any case, I ended up making this: https://github.com/bradleypeabody/godap (which uses this for BER+ASN1: https://github.com/go-asn1-ber/asn1-ber )
https://web.archive.org/web/20160816005220/https://jan.newmarch.name/go/serialisation/chapter-serialisation.html
and
https://ipfs.io/ipfs/QmfYeDhGH9bZzihBUDEQbCbTc5k5FZKURMUoUvfmc27BwL/dataserialisation/asn1.html
have quite a few examples with asn1.Marshal / asn1.Unmarshal

Human-readable representations in protobuf-net

Does protobuf-net have any APIs to dump a protobuf into human readable form? I was hoping for something like TextFormat.
At the moment, no. I'm in two minds as to whether it is worthwhile adding; in my mind, this defeats most of the benefits of protocol buffers.
However, since Jon's version is a port of the java version you should find that it is feature compatible, so it should exist there.
There is one for Java. The build.toString() method returns a string representation, but you'll lose the serialization.
