How to parse a .proto file into a JSON/Python structure? - protocol-buffers

I have a .proto file, describing a given schema and following proto2 syntax.
I would like to read this file in Python and generate a structure that I can then manipulate (it can be JSON, a Python data structure, a proto-specific structure, whatever), in which I have the field names, their types, their optional/required labels, etc. I then want to loop over those fields and print them.
Conceptually, something like this:
my_schema = parseFrom("my_proto.proto")
for field in my_schema["fields"]:
    print(field["name"])
How can I do so? I am fiddling around with the protobuf Python API, but I am a bit lost. I did manage to create a FileDescriptorProto from my .proto file and add it to a DescriptorPool, but from there I do not understand how to get a DescriptorProto.
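One way to get there (a sketch, assuming protoc is installed and on your PATH; file names are placeholders): ask protoc to dump the parsed schema as a serialized FileDescriptorSet, then load that with the protobuf Python package. Each message in it is a DescriptorProto whose field list carries the names, types, and optional/required labels.
import subprocess
from google.protobuf import descriptor_pb2

# Let protoc do the parsing and write the schema as a serialized FileDescriptorSet.
subprocess.run(
    ["protoc", "--descriptor_set_out=my_proto.desc", "--include_imports", "my_proto.proto"],
    check=True,
)

fds = descriptor_pb2.FileDescriptorSet()
with open("my_proto.desc", "rb") as f:
    fds.ParseFromString(f.read())

# Each file holds DescriptorProto messages; each field is a FieldDescriptorProto
# with name, number, type, and label (optional/required/repeated).
for file_proto in fds.file:
    for message in file_proto.message_type:
        print(message.name)
        for field in message.field:
            label = descriptor_pb2.FieldDescriptorProto.Label.Name(field.label)
            ftype = descriptor_pb2.FieldDescriptorProto.Type.Name(field.type)
            print(f"  {field.name}: {ftype} ({label})")
If a JSON view is easier to manipulate, google.protobuf.json_format.MessageToJson(fds) turns the whole descriptor set into a JSON string. And if you have already added a FileDescriptorProto to a DescriptorPool, pool.FindMessageTypeByName("your.package.Message") returns a Descriptor (the runtime counterpart of DescriptorProto) whose fields list carries the same information.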

Related

Parsing a JSON file without JSON.parse()

This is my first time using Ruby. I'm writing an application that parses data and performs some calculations based on it, the source of which is a JSON file. I'm aware I can use JSON.parse() here, but I'm trying to write my program so that it will work with other sources of data. Is there a clear-cut way of doing this? Thank you.
When your source file is JSON, use JSON.parse. Do not implement a JSON parser on your own. If the source file is a CSV, use the CSV class.
When your application should be able to read multiple different formats, add one Reader class for each data type, like JSONReader, CSVReader, etc., and then decide which reader to use based on the file extension.
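A sketch of that reader-per-format idea (shown in Python here to match the rest of this page; the Ruby version is analogous, and the class names are only illustrative):
import csv
import json
import os

class JSONReader:
    def read(self, path):
        with open(path) as f:
            return json.load(f)

class CSVReader:
    def read(self, path):
        with open(path, newline="") as f:
            return list(csv.DictReader(f))

# Dispatch on the file extension to pick a reader.
READERS = {".json": JSONReader(), ".csv": CSVReader()}

def read_data(path):
    ext = os.path.splitext(path)[1].lower()
    return READERS[ext].read(path)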

Differences between .proto, .pb and .pbtxt

As explained on its website and Wikipedia, Protocol Buffers (or Protobuf) is "used to serialize structured data". The definition of the data structure is done in a .proto file that can be compiled by protoc and turned into code (.cc/.h, .py, .java...) that can be imported into several languages to manipulate and serialize the data.
My understanding is that .pb files contain that data in binary and .pbtxt files are an equivalent that contains it in ASCII. Is that correct?
If so, why are .pbtxt files so readable? I've found some with comments (https://github.com/google/mediapipe/blob/master/mediapipe/graphs/hand_tracking/subgraphs/hand_renderer_cpu.pbtxt).
Also, are .pb/.pbtxt enough to interpret the data? Or do you need their .proto?
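Roughly, yes: a .pb file is the serialized binary wire format, and a .pbtxt file is the same message written in protobuf's human-readable text format, which allows # comments. Either way you still need the message definition (i.e. something generated from the .proto) to interpret the fields, since the binary form only stores field numbers, not names. A small sketch, where my_proto_pb2 and MyMessage are placeholder names for a module generated by protoc:
from google.protobuf import text_format
import my_proto_pb2  # hypothetical module produced by: protoc --python_out=. my_proto.proto

msg = my_proto_pb2.MyMessage(name="example", value=42)

with open("data.pb", "wb") as f:
    f.write(msg.SerializeToString())            # compact binary wire format

with open("data.pbtxt", "w") as f:
    f.write(text_format.MessageToString(msg))   # human-readable text format

# Reading either file back still requires the generated message class:
with open("data.pbtxt") as f:
    parsed = text_format.Parse(f.read(), my_proto_pb2.MyMessage())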

How do file converters work in general, like Word to PDF, XML to JSON, Word to TXT, etc.?

I've used many types of file converters, like Word to PDF, XML to JSON, Word to TXT, etc.
How do they work in the backend? Are there specific guidelines each of them follows? Are there similarities in the way they are implemented?
I tried searching, but most of the articles take me to web apps that can do the conversion; none of them gives clarity on how it's done.
All of them work by parsing the source document into a data structure, then generating a document in the other format from that data structure using recursion.
Parsing itself is a giant topic that people take courses on in computer science. But long story short, it proceeds by breaking the document into tokens, and then fitting the tokens into a parse tree using one of a standard set of methods. They have all sorts of fancy names like Recursive Descent and LALR(1). That's where most of the theory you'd want to learn is.
For example if you're writing a JSON to XML converter, you'd first need to parse that JSON. A JSON Parser shows how you could write that, from scratch, using recursive descent. Once written you just need to write a recursive function that takes each data type and does something appropriate with it to generate text in the format that you want.
Incidentally you can also write a "document converter" that converts from a document format to the same document format. Why would someone want to do that? The two most common use cases are to prettify or minify code. Despite the fact that only one format is being dealt with, the principles of how you do it are exactly the same.
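As a toy illustration of the parse-then-generate pattern, here is a minimal JSON-to-XML converter in Python; it leans on the built-in json parser rather than a hand-written recursive-descent one, and the tag-naming rules are deliberately simplified:
import json
from xml.sax.saxutils import escape

def to_xml(value, tag="item"):
    # Recursively walk the parsed data structure and emit XML text.
    if isinstance(value, dict):
        inner = "".join(to_xml(v, k) for k, v in value.items())
    elif isinstance(value, list):
        return "".join(to_xml(v, tag) for v in value) or f"<{tag}/>"
    else:
        inner = escape(str(value))
    return f"<{tag}>{inner}</{tag}>"

print(to_xml(json.loads('{"user": {"name": "Ada", "langs": ["en", "fr"]}}'), "root"))
# -> <root><user><name>Ada</name><langs>en</langs><langs>fr</langs></user></root>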

How to retrieve protobuf data randomly?

I want to store a large amount of data in protobuf format, including a timestamp parameter, and I want to retrieve the data based on the timestamp value.
Thanks.
Protobuf is a sequential-access format. There's no way to jump into the middle of a message looking for data; you have to parse through the whole thing.
Some options:
Devise a framing format that allows you to break up your datastore into many small chunks, each of which is a separate protobuf message (a rough sketch of this idea follows this list). This is a fairly large project.
Use SQLite or even an actual database.
Use a random-access-friendly format like Cap'n Proto instead. (Disclosure: I'm the author of Cap'n Proto, and also of Protobufs v2 (Google's open source release).)
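A rough sketch of the first option: length-prefixed records plus an index from timestamp to file offset, so a single record can be re-read without parsing everything. Record and its timestamp field are placeholders for whatever message type you actually store.
import struct

def append_record(f, record):
    # Write a 4-byte little-endian length prefix followed by the serialized message.
    data = record.SerializeToString()
    offset = f.tell()
    f.write(struct.pack("<I", len(data)))
    f.write(data)
    return offset

def build_index(path, record_cls):
    # Scan the file once and map timestamp -> offset of each record.
    index = {}
    with open(path, "rb") as f:
        while True:
            header = f.read(4)
            if len(header) < 4:
                break
            offset = f.tell() - 4
            (length,) = struct.unpack("<I", header)
            record = record_cls()
            record.ParseFromString(f.read(length))
            index[record.timestamp] = offset
    return index

def read_at(path, offset, record_cls):
    # Jump straight to one record using an offset from the index.
    with open(path, "rb") as f:
        f.seek(offset)
        (length,) = struct.unpack("<I", f.read(4))
        record = record_cls()
        record.ParseFromString(f.read(length))
        return record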

Can Ruby read a .dat file created in VB.NET?

How would I create a list of elements in VB.NET, save it to a .dat file, and have Ruby re-create that list (as an array) with those elements (they will be strings, booleans, and integers)?
You can do it, but you'd need to find some representation for it. The easiest is probably JSON, so you would
make the data structure in VB
write it to JSON as a file
read the JSON file using Ruby.
Here's a JSON serializer for .Net:
A .dat file is just a binary blob, is it not? If there's any particular format you use, you could easily translate it to equivalent Ruby code, as long as the knowledge is duplicated on both ends, though that leads to a violation of the DRY principle. JSON might be a good intermediate representation (as noted by @Charlie Martin) because it's a plain-text format and you can always add compression.
