How to write my own code generator for protobuf - protocol-buffers

Google protobuf is a nice IDL for RPC. But I want to know how to write my own code generator for protobuf.

The protoc compiler can output a protobuf-formatted description of the .proto file. That way most of the parsing has been done for you already, and you only need to generate the output you want.
The .proto schema for the .proto file description is here:
https://github.com/google/protobuf/blob/master/src/google/protobuf/descriptor.proto
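For example, you can ask protoc to dump the parsed descriptors (a FileDescriptorSet message) to a file and feed that file to your generator; the file names here are illustrative:
protoc --include_imports --descriptor_set_out=myschema.desc myschema.proto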
As an additional step, you can make your generator runnable via a "--mygenerator_out=." option on protoc itself:
https://developers.google.com/protocol-buffers/docs/reference/other
Here is one (albeit a bit convoluted) example of how a code generator can be written in Python:
https://github.com/nanopb/nanopb/blob/master/generator/nanopb_generator.py

A protoc plugin is a binary that reads a protobuf message of type CodeGeneratorRequest from standard input and writes a protobuf message of type CodeGeneratorResponse to standard output.
The binary must be called protoc-gen-NAME and can be used by invoking the protoc command with:
protoc --plugin=./path/to/protoc-gen-NAME --NAME_out=./test/generated ./test.proto
Note specifically that names are important. The following will not work; it invokes the built-in Java generator instead:
protoc --plugin=./path/to/protoc-gen-NAME --java_out=./test/generated ./test.proto
This will not work, because the binary does not have the correct name:
protoc --plugin=./path/to/whatever-NAME --NAME_out=./test/generated ./test.proto
In order to process the incoming CodeGeneratorRequest and generate a valid response, your binary must itself be able to parse the protobuf messages defined in plugin.proto from the protocolbuffers repository:
https://github.com/protocolbuffers/protobuf/blob/main/src/google/protobuf/compiler/plugin.proto
Historically this was difficult to do in a self-contained manner, but you can now do it end-to-end entirely in Rust with just the protobuf crate, as this trivial example demonstrates:
[dependencies]
protobuf = "3.0.2"

use protobuf::plugin::{code_generator_response, CodeGeneratorRequest, CodeGeneratorResponse};
use protobuf::Message;
use std::io;
use std::io::{BufReader, Read, Write};

fn main() {
    // Read the raw request from stdin
    let mut reader = BufReader::new(io::stdin());
    let mut incoming_request = Vec::new();
    reader.read_to_end(&mut incoming_request).unwrap();

    // Parse it as a CodeGeneratorRequest
    let req = CodeGeneratorRequest::parse_from_bytes(&incoming_request).unwrap();

    // Generate the content for each output file
    let mut response = CodeGeneratorResponse::new();
    for proto_file in req.proto_file.iter() {
        let mut output = String::new();
        output.push_str(&format!("// from file: {:?}\n", &proto_file.name));
        output.push_str(&format!("// package: {:?}\n", &proto_file.package));
        for message in proto_file.message_type.iter() {
            output.push_str(&format!("\nmessage: {:?}\n", &message.name));
            for field in message.field.iter() {
                output.push_str(&format!(
                    "- {:?} {:?} {:?}\n",
                    field.type_,
                    field.type_name,
                    field.name(),
                ));
            }
        }
        // Add it to the response
        let mut output_file = code_generator_response::File::new();
        output_file.content = Some(output);
        // {} rather than {:?} here, so the file name is not wrapped in quotes
        output_file.name = Some(format!("{}/out.txt", proto_file.name.as_ref().unwrap()));
        response.file.push(output_file);
    }

    // Serialize the response to a binary message and write it to stdout
    let out_bytes: Vec<u8> = response.write_to_bytes().unwrap();
    io::stdout().write_all(&out_bytes).unwrap();
}
Obviously this trivial example doesn't generate code, just text files, but it shows the basic process. You should also iterate over the services and deal with all the additional properties on each type (see the sketch below).
What this basically gives you is an AST matching the .proto files; the codegen side of it can be done however you like.
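For example, extending the loop above to also list services might look like this (a sketch using the same protobuf crate; field names follow descriptor.proto):
// Inside the `for proto_file in req.proto_file.iter()` loop:
for service in proto_file.service.iter() {
    output.push_str(&format!("\nservice: {:?}\n", &service.name));
    for method in service.method.iter() {
        output.push_str(&format!(
            "- rpc {:?} ({:?}) returns ({:?})\n",
            method.name,
            method.input_type,
            method.output_type,
        ));
    }
}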
Helpful hints:
Do not log to stdout in your plugin, e.g. for debugging; the only permitted output on stdout is a protobuf-format CodeGeneratorResponse message. Write any debug output to stderr instead.
The plugin does not write files; the protoc command does that. The plugin should generate content and return an array of files, each with a name, content, and optional metadata.
For more information on plugins, carefully read the plugin.proto file linked above; it has extensive details.

Related

Protobuf: to use same Message name with different packages

I am using protobuf for Java, with the following .proto files:
// service1.proto
option java_package = "package";
option java_outer_classname = "Proto1";

message M {
    ... // some definition
}
and
// service2.proto
option java_package = "package";
option java_outer_classname = "Proto2";

message M {
    ... // some different definition
}
When compiling, an error is thrown in service2.proto saying that "M" is already defined in service1.proto.
But judging from the packages and the generated code they should be package.Proto1.M and package.Proto2.M, so why is this a conflict?
The "package" is also a .proto concept (not just a language/framework concept); if you need to have both schemas involved in anything, it may be useful to add
package Proto1;
to service1.proto and
package Proto2;
to service2.proto
Alternatively, if M is actually the same in both places: move M into a single shared file, and import it from both service1.proto and service2.proto.
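For example, the first suggestion applied to the files above (a sketch; the proto package names are illustrative):
// service1.proto
package Proto1;
option java_package = "package";
option java_outer_classname = "Proto1";

message M {
    ... // some definition
}

// service2.proto
package Proto2;
option java_package = "package";
option java_outer_classname = "Proto2";

message M {
    ... // some different definition
}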

Serialize and deserialize protobufs through CLI?

I am trying to deserialize a file saved as a protobuf through the CLI (this seems like the easiest thing to do). I would prefer not to use protoc to compile the schema, import it into a programming language, and then read the result.
My use case: A TensorFlow lite tool has output some data in a protobuf format. I've found the protobuf message definition in the TensorFlow repo too. I just want to read the output quickly. Specifically, I am getting back a tflite::evaluation::EvaluationStageMetrics message from the inference_diff tool.
I assume that the tool outputs a protobuf message in binary format.
protoc can decode the message and output in text format. See this option:
--decode=MESSAGE_TYPE       Read a binary message of the given type from
                            standard input and write it in text format
                            to standard output. The message type must
                            be defined in PROTO_FILES or their imports.
While Timo Stamm's answer was instrumental, I still struggled with the paths needed to get protoc to work in a complex repo (e.g. TensorFlow).
In the end, this worked for me:
cat inference_diff.txt | \
  protoc --proto_path="/Users/ben/butter/repos/tensorflow/" \
  --decode tflite.evaluation.EvaluationStageMetrics \
  $(pwd)/evaluation_config.proto
Here I pipe the binary contents of the file containing the protobuf (inference_diff.txt in my case, generated by following this guide) and specify the fully qualified message type (obtained by combining the package tflite.evaluation; with the message name EvaluationStageMetrics), the absolute path of the project for --proto_path (the project root / TensorFlow repo), and the absolute path of the file that actually defines the message. --proto_path is only used for resolving imports, whereas the PROTO_FILE (in this case, evaluation_config.proto) is used to decode the file.
Example Output
num_runs: 50
process_metrics {
  inference_profiler_metrics {
    reference_latency {
      last_us: 455818
      max_us: 577312
      min_us: 453121
      sum_us: 72573828
      avg_us: 483825.52
      std_deviation_us: 37940
    }
    test_latency {
      last_us: 59503
      max_us: 66746
      min_us: 57828
      sum_us: 8992747
      avg_us: 59951.646666666667
      std_deviation_us: 1284
    }
    output_errors {
      max_value: 122.371696
      min_value: 83.0335922
      avg_value: 100.17548828125
      std_deviation: 8.16124535
    }
  }
}
If you just want to get the numbers in a rush and can't be bothered to fix the paths, you can do
cat inference_diff.txt | protoc --decode_raw
Example output
1: 50
2 {
  5 {
    1 {
      1: 455818
      2: 577312
      3: 453121
      4: 72573828
      5: 0x411d87c6147ae148
      6: 37940
    }
    2 {
      1: 59503
      2: 66746
      3: 57828
      4: 8992747
      5: 0x40ed45f4b17e4b18
      6: 1284
    }
    3 {
      1: 0x42f4be4f
      2: 0x42a61133
      3: 0x40590b3b33333333
      4: 0x41029476
    }
  }
}
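For the reverse direction (the "serialize" half of the question), protoc also offers an --encode option that reads a message in text format from standard input and writes the binary encoding to standard output. A sketch with illustrative file names, using the same paths as above:
cat inference_diff_edited.txt | \
  protoc --proto_path="/Users/ben/butter/repos/tensorflow/" \
  --encode tflite.evaluation.EvaluationStageMetrics \
  $(pwd)/evaluation_config.proto > inference_diff_edited.pb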

Vision API: How to get JSON-output

I'm having trouble saving the output given by the Google Vision API. I'm using Python and testing with a demo image. I get the following error:
TypeError: [mid:...] is not JSON serializable
Code that I executed:
import io
import os
import json

# Imports the Google Cloud client library
from google.cloud import vision
from google.cloud.vision import types

# Instantiates a client
vision_client = vision.ImageAnnotatorClient()

# The name of the image file to annotate
file_name = os.path.join(
    os.path.dirname(__file__),
    'demo-image.jpg')  # Your image path from current directory

# Loads the image into memory
with io.open(file_name, 'rb') as image_file:
    content = image_file.read()

image = types.Image(content=content)

# Performs label detection on the image file
response = vision_client.label_detection(image=image)
labels = response.label_annotations

print('Labels:')
for label in labels:
    print(label.description, label.score, label.mid)

with open('labels.json', 'w') as fp:
    json.dump(labels, fp)
The output appears on the screen; however, I do not know exactly how I can save it. Does anyone have any suggestions?
FYI for anyone seeing this in the future: google-cloud-vision 2.0.0 has switched to using proto-plus, which uses different serialization/deserialization code. A possible error you can get when upgrading to 2.0.0 without changing the code is:
object has no attribute 'DESCRIPTOR'
Using google-cloud-vision 2.0.0 and protobuf 3.13.0, here is an example of how to serialize and deserialize (the example covers both JSON and binary protobuf):
import io, json
from google.cloud import vision_v1
from google.cloud.vision_v1 import AnnotateImageResponse

with io.open('000048.jpg', 'rb') as image_file:
    content = image_file.read()

image = vision_v1.Image(content=content)
client = vision_v1.ImageAnnotatorClient()
response = client.document_text_detection(image=image)

# serialize / deserialize proto (binary)
serialized_proto_plus = AnnotateImageResponse.serialize(response)
response = AnnotateImageResponse.deserialize(serialized_proto_plus)
print(response.full_text_annotation.text)

# serialize / deserialize json
response_json = AnnotateImageResponse.to_json(response)
response = json.loads(response_json)
print(response['fullTextAnnotation']['text'])
Note 1: proto-plus doesn't support converting to snake_case names, which protobuf supports with preserving_proto_field_name=True. So currently there is no way around the field names being converted from response['full_text_annotation'] to response['fullTextAnnotation'].
There is a (now closed) feature request for this: googleapis/proto-plus-python#109
Note 2: The Google Vision API doesn't return an x coordinate if x=0; if x doesn't exist, the protobuf will default to x=0. In python vision 1.0.0, using MessageToJson(), these x values weren't included in the JSON, but with python vision 2.0.0 and to_json() these values are included as x:0.
Maybe you were already able to find a solution to your issue (if that is the case, I invite you to share it as an answer to your own post too), but in any case, let me share some notes that may be useful for other users with a similar issue:
As you can check using the type() function in Python, response is an object of type google.cloud.vision_v1.types.AnnotateImageResponse, while labels[i] is an object of type google.cloud.vision_v1.types.EntityAnnotation. Neither seems to have any out-of-the-box way to be transformed to JSON as you are trying to do, so I believe the easiest approach is to turn each EntityAnnotation in labels into a Python dictionary, group them all into an array, and transform that into JSON.
To do so, I have added some simple lines of code to your snippet:
[...]

label_dicts = []  # Array that will contain all the EntityAnnotation dictionaries

print('Labels:')
for label in labels:
    # Write each label (EntityAnnotation) into a dictionary
    label_dict = {'description': label.description, 'score': label.score, 'mid': label.mid}
    # Populate the array
    label_dicts.append(label_dict)

with open('labels.json', 'w') as fp:
    json.dump(label_dicts, fp)
There is a library released by Google
from google.protobuf.json_format import MessageToJson
webdetect = vision_client.web_detection(blob_source)
jsonObj = MessageToJson(webdetect)
I was able to save the output with the following function:
# Save output as JSON
def store_json(json_input):
    with open(json_file_name, 'a') as f:
        f.write(json_input + '\n')
And as @dsesto mentioned, I had to define a dictionary. In this dictionary I defined what types of information I would like to save in my output.
with open(photo_file, 'rb') as image:
    image_content = base64.b64encode(image.read())
    service_request = service.images().annotate(
        body={
            'requests': [{
                'image': {
                    'content': image_content
                },
                'features': [{
                    'type': 'LABEL_DETECTION',
                    'maxResults': 20,
                }, {
                    'type': 'TEXT_DETECTION',
                    'maxResults': 20,
                }, {
                    'type': 'WEB_DETECTION',
                    'maxResults': 20,
                }]
            }]
        })
The objects in the current Vision library lack serialization functions (although this is a good idea).
It is worth noting that they are about to release a substantially different library for Vision (it is on master of vision's repo now, although not released to PyPI yet) where this will be possible. Note that it is a backwards-incompatible upgrade, so there will be some (hopefully not too much) conversion effort.
That library returns plain protobuf objects, which can be serialized to JSON using:
from google.protobuf.json_format import MessageToJson
serialized = MessageToJson(original)
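If you also need to go from JSON back to a message, the same json_format module provides a Parse function. A minimal sketch, reusing original from the lines above:
from google.protobuf.json_format import Parse
# Parse the JSON text back into a new protobuf message of the same type
restored = Parse(serialized, original.__class__())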
You can also use something like protobuf3-to-dict
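That package exposes a protobuf_to_dict helper; a minimal sketch, assuming the protobuf3-to-dict package is installed:
from protobuf_to_dict import protobuf_to_dict
# Convert a protobuf message into a plain Python dict (then e.g. json.dumps it)
d = protobuf_to_dict(original)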

Vala error "unknown type name" using enum from camel

I am writing this code in Vala, using Camel:
using Camel;
[...]
MimeParser par = new MimeParser();
[...]
par.push_state( MimeParserState.MULTIPART, boundary );
I downloaded camel-1.2.vapi from the GitHub vala-girs repository (this link), put it in a vapi subdirectory, and compiled with:
valac --vapidir=vapi --includedir=/usr/include/evolution-data-server/camel --pkg camel-1.2 --pkg posix --target-glib=2.32 -o prog prog.vala -X -lcamel-1.2
When compiling, I get this error:
error: unknown type name "CamelMimeParserState"
const gchar* camel_mime_parser_state_to_string (CamelMimeParserState self);
Looking at the C output code, I see that the CamelMimeParserState type is used several times but never defined. It should be a simple enum, because the camel-1.2.vapi file says:
[CCode (cheader_filename = "camel/camel.h", cprefix = "CAMEL_MIME_PARSER_STATE_", has_type_id = false)]
public enum MimeParserState {
    INITIAL,
    PRE_FROM,
    FROM,
    HEADER,
    BODY,
    MULTIPART,
    MESSAGE,
    PART,
    END,
    EOF,
    PRE_FROM_END,
    FROM_END,
    HEADER_END,
    BODY_END,
    MULTIPART_END,
    MESSAGE_END
}
So why doesn't the C output code simply use an enum as the vapi file says (described by cprefix CAMEL_MIME_PARSER_STATE_)?
Is there an error in the .vapi file?
I found the solution. The vapi file is wrong because the cname field is missing. Changing the vapi file by adding cname="camel_mime_parser_state_t":
[CCode (cheader_filename = "camel/camel.h", cname = "camel_mime_parser_state_t", cprefix = "CAMEL_MIME_PARSER_STATE_", has_type_id = false)]
public enum MimeParserState {
    INITIAL,
    [...]
makes it work correctly.

error reading protobuf message as string from file

I intend to read protobuf data from a file and use it in my code. I did some reverse engineering, filled the data into my file, and dumped the string into a log file. I now read this log file, change the values of some fields, and read it into a string using file handling. However, it doesn't seem to work. Is there some problem with my code?
std::ifstream myprotobuf("/root/testfile.txt", std::ios::in | std::ios::binary);
if (myprotobuf)
{
    std::string string_in_file;
    myprotobuf.seekg(0, std::ios::end);
    contents.resize(myprotobuf.tellg());
    myprotobuf.seekg(0, std::ios::beg);
    myprotobuf.read(&string_in_file[0], contents.size());
    myprotobuf.close();
    RealMsg->ParseFromString(string_in_file);
}
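One apparent problem: the code resizes contents but reads into string_in_file, which was never resized, so the read writes into an empty buffer. A minimal corrected sketch (assuming RealMsg points to a valid generated message object):
std::ifstream myprotobuf("/root/testfile.txt", std::ios::in | std::ios::binary);
if (myprotobuf)
{
    std::string string_in_file;
    myprotobuf.seekg(0, std::ios::end);
    string_in_file.resize(myprotobuf.tellg());  // size the buffer we actually read into
    myprotobuf.seekg(0, std::ios::beg);
    myprotobuf.read(&string_in_file[0], string_in_file.size());
    myprotobuf.close();
    RealMsg->ParseFromString(string_in_file);   // parse the binary wire format
}
Also note that ParseFromString expects the binary wire format; if the file contains a human-readable text dump (e.g. DebugString output), use google::protobuf::TextFormat::ParseFromString instead.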
