Reverse engineering .proto files from pb2.py generated with protoc

Is it possible to get .proto files from a pb2.py file generated with protoc? Would the same reverse engineering be possible for gRPC?

The format of the _pb2.py file varies between protobuf-python versions, but most of them have a field called serialized_pb inside them. This contains the whole structure of the .proto file in the FileDescriptorProto format:
serialized_pb=b'\n\x0c...'
This can be passed to the protoc compiler to generate headers for other languages. However, it has to be first put inside a FileDescriptorSet to match the format correctly. This can be done using Python:
import google.protobuf.descriptor_pb2 as descriptor_pb2

fds = descriptor_pb2.FileDescriptorSet()
# add() appends an empty FileDescriptorProto to the set and returns it
fdp = fds.file.add()
fdp.ParseFromString(b'\n\x0c... serialized_pb data ....')
# human-readable text dump
with open('myproto.txt', 'w') as f:
    f.write(str(fds))
# binary descriptor set for protoc
with open('myproto.pb', 'wb') as f:
    f.write(fds.SerializeToString())
The snippet above saves a human-readable version to myproto.txt and a format that is nominally compatible with protoc to myproto.pb. The text representation looks like this:
file {
name: "XYZ.proto"
dependency: "dependencyXYZ.proto"
message_type {
name: "MyMessage"
field {
name: "myfield"
number: 1
label: LABEL_OPTIONAL
type: TYPE_INT32
}
...
For example C++ headers could now be generated using:
protoc --cpp_out=. --descriptor_set_in=myproto.pb XYZ.proto
Note that the XYZ.proto must match the name of the file in the descriptor set, which you can check in myproto.txt. However this method quickly gets difficult if the file has dependencies, as all of those dependencies have to be collected in the same descriptor set. In some cases it may be easier to just use the textual representation to rewrite the .proto file by hand.
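If you go the manual route, part of the rewriting can be scripted. The following is a minimal sketch, not a complete decompiler: it renders one message descriptor as proto2-style text and ignores nested types, enums, options and oneofs. The numeric values are those of FieldDescriptorProto.Type and FieldDescriptorProto.Label in descriptor.proto; the names render_field and render_message are made up for illustration, and the SimpleNamespace object only stands in for a real entry such as fds.file[0].message_type[0].

```python
from types import SimpleNamespace

# Numeric values of FieldDescriptorProto.Type for scalar types (descriptor.proto).
SCALAR_TYPES = {
    1: "double", 2: "float", 3: "int64", 4: "uint64", 5: "int32",
    6: "fixed64", 7: "fixed32", 8: "bool", 9: "string", 12: "bytes",
    13: "uint32", 15: "sfixed32", 16: "sfixed64", 17: "sint32", 18: "sint64",
}
# Numeric values of FieldDescriptorProto.Label.
LABELS = {1: "optional", 2: "required", 3: "repeated"}


def render_field(field):
    """Render one field descriptor as a proto2-style field line."""
    # Message and enum fields are not in SCALAR_TYPES; their type is in type_name.
    type_text = SCALAR_TYPES.get(field.type) or field.type_name.lstrip(".")
    return f"  {LABELS[field.label]} {type_text} {field.name} = {field.number};"


def render_message(message):
    """Render a message descriptor; nested types and options are ignored."""
    lines = [f"message {message.name} {{"]
    lines += [render_field(f) for f in message.field]
    lines.append("}")
    return "\n".join(lines)


# Hypothetical stand-in for fds.file[0].message_type[0] (label 1, type 5).
example = SimpleNamespace(
    name="MyMessage",
    field=[SimpleNamespace(name="myfield", number=1, label=1, type=5, type_name="")],
)
print(render_message(example))
```

For a proto3 file the explicit optional label would have to be dropped by hand, since this sketch always prints the label.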

Related

Get list of files containing string(s) or pattern(s)

Is there a Gradle pattern for retrieving the list of files in a folder or set of folders that contain a given string, set of strings, or pattern?
My project produces RPMs and is using the Nebula RPM type (great package!). There are a couple of different kinds of sets of files that need post-processing. I am trying to generate the list of files that contain the strings that are the indicators for post-processing. For example, files that contain "#doc" need to be processed by the doc generator script. Files that contain "#HOSTNAME#" and "#HOSTFQDN#" need to be processed by sed to replace the strings with the actual host name or host fqdn.
The search root in the package will be src\main\resources. With the result the build script sets up the post-install script commands - something like:
postInstall('/opt/product/bin/postprocess.sh ' + join(filesContainingDocs, " "))
postInstall('/bin/sed -i -e "s/#HOSTNAME#/$(hostname -s)/" -e "s/#HOSTFQDN#/$(hostname)/" ' + join(filesContainingHostname, " "))
I can figure out the postinstall syntax. I'm having difficulty finding the filter for any of the regular Gradle 'things' (i.e., FileTree) that operate on contents of files rather than names of files. How would I populate filesContainingDocs and filesContainingHostname - something along the lines of:
filesContainingDocs = FileTree('src/main/resources', { contents.matches('#doc') })
filesContainingHostname = FileTree('src/main/resources', { contents.matches('#(HOSTNAME|HOSTFQDN)#') })
While the post-process script could simply do the grep, the several RPMs in our product overlay each other and each RPM should only post-process the files it provides, so a general grep over the final installed folder is not workable - it would catch files provided by other RPMs. It seems to me that I ought to be able to, at build time, produce the correct static list of files from the bigger set of source files that comprise the given RPM's project.
It doesn't have to be FileTree - running a command like findstr /s /m /c:"#doc" src\main\resources\*.conf (alas, the build platform is Windows) produces the answer in stdout but I'm not sure how to get that result into an object Gradle can use to expand the result. (I also suspect there is a 'more Gradle way' to do this.)
The set of files, and the contents of those files, is generally fairly small.
I'm having difficulty finding the filter for any of the regular Gradle 'things' (i.e., FileTree) that operate on contents of files rather than names of files.
You can apply any filter you can imagine to a Gradle file tree; in the end it is just Groovy (or Kotlin) code running in the JVM. Each Gradle FileTree is nothing more than a (lazily evaluated) collection of Java File objects. To filter those File objects, you can read their content, e.g. in the same way you would read them in Java. Groovy even provides a JDK enhancement for the Java class File that includes the simple method getText() for this purpose. Now you can easily filter for files that contain a certain string:
filesContainingDocs = fileTree('src/main/resources').filter { file ->
file.text.contains('#doc')
}
Using Groovy, you can call getters like .getText() in the same way as accessing fields (.text in this case).
If a simple contains check is not enough, the Groovy JDK enhancements even provide the method matches(Pattern pattern) on CharSequence/String instances to perform a regular expression check:
filesContainingDocs = fileTree('src/main/resources').filter { file ->
// matches() must cover the whole text, so (?s) lets '.' match line breaks too
file.text.replace('\r\n','\n').matches('(?s).*some regex.*')
}
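If the list is produced by an external command instead, as the question considers with findstr, a portable stand-in can be sketched in Python. This is only an illustration of the same content filter outside Gradle: the function name files_matching and its glob parameter are made up, and files are assumed to be readable as UTF-8 text.

```python
import re
from pathlib import Path


def files_matching(root, pattern, glob="*"):
    """Return sorted paths under root whose text contains a match for pattern."""
    regex = re.compile(pattern)
    matches = []
    for path in Path(root).rglob(glob):
        if not path.is_file():
            continue
        # errors="replace" keeps the scan going on files that are not valid UTF-8
        text = path.read_text(encoding="utf-8", errors="replace")
        if regex.search(text):
            matches.append(path)
    return sorted(matches)
```

A call such as files_matching('src/main/resources', r'#(HOSTNAME|HOSTFQDN)#') would return the paths to hand to the post-install command. Unlike matches() above, search() does not need the pattern to cover the whole file.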

Is there any way to decompile a Google protobuf binary file (.pb file) to a .proto file

While reversing an APK I got a .pb file but no .proto file. Is there any way to decompile this file to a .proto file, or can I just generate Java code from this .pb file?
If (as per comments) the file you have is the compiled descriptor set, then you can use protoc to generate any language (that it usually supports) from this; simply use the --descriptor_set_in=FILES option at the command line to specify your file as input (in place of FILES), and use --java_out=OUT_DIR (or whatever) to indicate the output language and location.
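Before running protoc it can help to know which file names the descriptor set contains, since those are the names protoc expects on the command line. The following is a minimal sketch in plain Python, assuming the file really is a serialized FileDescriptorSet. It reads only field 1 (file) of the set and fields 1 (name) and 3 (dependency) of each FileDescriptorProto, as numbered in descriptor.proto. The helper names read_varint, iter_fields and list_files and the sample bytes are made up for illustration.

```python
def read_varint(data, pos):
    """Decode a base-128 varint starting at pos; return (value, next_pos)."""
    value = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def iter_fields(data):
    """Yield (field_number, wire_type, value) for each top-level field."""
    pos = 0
    while pos < len(data):
        key, pos = read_varint(data, pos)
        number, wire_type = key >> 3, key & 7
        if wire_type == 0:      # varint
            value, pos = read_varint(data, pos)
        elif wire_type == 1:    # fixed 64-bit
            value, pos = data[pos:pos + 8], pos + 8
        elif wire_type == 2:    # length-delimited (strings, nested messages)
            length, pos = read_varint(data, pos)
            value, pos = data[pos:pos + length], pos + length
        elif wire_type == 5:    # fixed 32-bit
            value, pos = data[pos:pos + 4], pos + 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        yield number, wire_type, value


def list_files(descriptor_set):
    """Return [(file_name, [dependencies])] for a serialized FileDescriptorSet."""
    result = []
    for number, wire_type, file_bytes in iter_fields(descriptor_set):
        if number != 1 or wire_type != 2:   # FileDescriptorSet.file = 1
            continue
        name, deps = "", []
        for n, wt, value in iter_fields(file_bytes):
            if n == 1 and wt == 2:          # FileDescriptorProto.name = 1
                name = value.decode("utf-8")
            elif n == 3 and wt == 2:        # FileDescriptorProto.dependency = 3
                deps.append(value.decode("utf-8"))
        result.append((name, deps))
    return result


# Hand-built sample set with one file entry (illustrative bytes, not real data).
file_bytes = b"\x0a\x09XYZ.proto" + b"\x1a\x13dependencyXYZ.proto"
sample = b"\x0a" + bytes([len(file_bytes)]) + file_bytes
print(list_files(sample))
```

For a real file, pass the bytes read from the .pb file instead of sample. Any dependency listed here that is not itself a file in the set would have to be supplied separately.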

File does not reside within any path specified using proto_path

I am testing out importing .proto file from another directory.
$GOPATH/src/A/A.proto
syntax = "proto3";
package A;
message SomeMsg {
string msg = 2;
int64 id = 3;
}
$GOPATH/src/B/B.proto
syntax = "proto3";
package B;
import "A/A.proto";
message Msg {
SomeMsg s = 1;
}
I'm doing this:
in folder A:
protoc A.proto --go_out=.
and then in folder B:
protoc B.proto --go_out=. --proto_path=$GOPATH/
But I will get this error:
B.proto: File does not reside within any path specified using --proto_path (or -I). You must specify a --proto_path which encompasses this file. Note that the proto_path must be an exact prefix of the .proto file names -- protoc is too dumb to figure out when two paths (e.g. absolute and relative) are equivalent (it's harder than you think).
The error seems clear enough to me: it says that you need to specify the exact directory that B.proto is in:
protoc B.proto --go_out=. --proto_path=$GOPATH/src/B
or if you are in folder B already,
protoc B.proto --go_out=.
protoc B.proto --go_out=. --proto_path=$GOPATH/src/B --proto_path=. worked for me.
Adding --proto_path=. may help you too.
Case 1: '..' cannot be used in the path to the .proto file if all paths are given in absolute form.
Case 2: To explain the error another way, relative and absolute paths cannot be mixed between -I and the path to the .proto file, because 'prefix' means a string prefix, not 'a path that can be reached through a relative path in the filesystem'.
=========
It seems that the path to the .proto file and the -I directory it resides in should either both be relative or both be absolute; otherwise the error occurs.
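The string-prefix behaviour described in these answers can be illustrated with a small model. This is a simplified sketch of the observed behaviour, not protoc's actual implementation: it assumes POSIX-style paths, and the function name resides_within and the example directories are made up.

```python
def resides_within(proto_file, proto_paths):
    """Return the first proto_path that is a literal string prefix of proto_file.

    No path is resolved against the filesystem, so an absolute directory never
    matches a relative file name, even if both point to the same location.
    """
    for root in proto_paths:
        if root in ("", "."):
            # The current directory covers relative names that do not climb out of it.
            if not proto_file.startswith(("/", "../")):
                return root
            continue
        if proto_file.startswith(root.rstrip("/") + "/"):
            return root
    return None


# Relative file name, absolute proto_path: no string prefix, so no match.
print(resides_within("B.proto", ["/home/me/go/"]))
# Both absolute: the directory is a prefix of the file name.
print(resides_within("/home/me/go/src/B/B.proto", ["/home/me/go/src/B"]))
```

In this model the command from the question fails because B.proto is given relative to the current directory while the proto_path is absolute.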
When referencing a proto file from a C# project, note that the path to the proto file is case sensitive. In my case, a link to the proto file in a .NET 4.6 project (VS2022 on Windows) looks like this. When I use uppercase characters in the path, the compiler gives the "File does not reside within any path specified using --proto_path" error mentioned above. However, I can use relative paths.
<!-- all path characters must be lower case -->
<Protobuf Include="..\path.to.proto\protos\myfile.proto">
<Link>Protos\myfile.proto</Link>
</Protobuf>
Also note that you must change the csproj file manually to be sure that the xml-element is called 'Protobuf' and not 'None'! Of course, only applicable for visual studio project situations.

How to convert data to a serialized tf.Example(tensorflow tfrecords) using Golang

Is there a Golang API to convert data to a serialized tf.Example, also known as TensorFlow tfrecords?
I can find a Python API, tf.python_io.TFRecordWriter(), to achieve this.
But I cannot find any examples in Golang.
I want to do this because I use a Golang client to call TensorFlow Serving, and the input feature for my neural network is a SparseTensor.
tf.Example refers to a protocol buffer, while tfrecords typically refers to a file format that stores "records" in the form of strings (which is why tf.python_io.TFRecordWriter.write() takes a string). So the two are separate things, not technically related, though one often serializes tf.Example protocol buffers into strings and writes one per record with a TFRecordWriter.
That said, it is common for models to take a serialized tf.Example protocol buffer as input in the form of a STRING tensor. If that is the case, then you'd want to construct the tf.Example protocol buffer in Go and then use proto.Marshal to serialize it for the tf.Tensor object to feed.
Unfortunately, as of January 2018, you have to generate the Go files for the tf.Example protos yourself. That can be done with something like:
# Fetch the tools
GOPATH=$(go env GOPATH)
go get github.com/golang/protobuf/proto
go get github.com/golang/protobuf/protoc-gen-go
# Create a directory to generate the files
mkdir -p /tmp/protos/out
cd /tmp/protos
# Clone the TensorFlow sources to get the source .proto files
git clone https://github.com/tensorflow/tensorflow ./src
PATH=$PATH:${GOPATH}/bin
# Generate Go source files from the .proto files.
# Assuming protoc is installed.
# See https://github.com/google/protobuf/releases
protoc \
-I ./src \
--go_out ./out \
./src/tensorflow/core/framework/*.proto ./src/tensorflow/core/example/*.proto
rm -rf ./src
This will generate a bunch of .pb.go files in /tmp/protos/out which can be used to build the tf.Example protocol buffer structures to marshal.
Hope that gives you enough information to get going.
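To make the distinction between tf.Example and the TFRecord container concrete, here is a minimal sketch of the record framing, written in Python rather than Go. It assumes the usual TFRecord layout: a little-endian uint64 length, a masked CRC32C of the length, the payload bytes, and a masked CRC32C of the payload. The payload would normally be a serialized tf.Example, but the framing does not care what the bytes are. The function names are made up for illustration.

```python
import struct


def _make_table():
    """Build the lookup table for CRC32C (Castagnoli, reflected polynomial)."""
    table = []
    for i in range(256):
        crc = i
        for _ in range(8):
            crc = (crc >> 1) ^ 0x82F63B78 if crc & 1 else crc >> 1
        table.append(crc)
    return table


_TABLE = _make_table()


def crc32c(data):
    """Compute the CRC32C checksum of a bytes object."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc = _TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def masked_crc(data):
    """Rotate the CRC right by 15 bits and add a constant, in 32-bit arithmetic."""
    crc = crc32c(data)
    rotated = ((crc >> 15) | (crc << 17)) & 0xFFFFFFFF
    return (rotated + 0xA282EAD8) & 0xFFFFFFFF


def frame_record(payload):
    """Wrap payload bytes (e.g. a serialized tf.Example) as one TFRecord entry."""
    header = struct.pack("<Q", len(payload))
    return (header + struct.pack("<I", masked_crc(header))
            + payload + struct.pack("<I", masked_crc(payload)))
```

A TFRecord file is then just the concatenation of such framed records, which is why the same framing can be reproduced in Go once the tf.Example bytes are available from proto.Marshal.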
For anyone still interested in interacting with TFRecords and tf.data.Example, there is a library created by NVIDIA, with an MIT licence, called go-tfdata: https://github.com/NVIDIA/go-tfdata
It can transform any source of records/samples into a set of tf.data.Example records, i.e. a TFRecord file.
Additionally, it natively supports TAR archives to TFRecord conversion.
From examples - converting TAR to TFRecord is as simple as:
pipeline := NewPipeline().FromTar(inFile).SampleToTFExample().ToTFRecord(outFile)
pipeline.Do()

Disable encoding checking in java gradle project

I want to migrate one of our Java projects from Ant to Gradle. This project has a lot of source code written by a few programmers. The problem is that some of the files are encoded in ANSI and some in UTF-8 (this generates compile errors). I know that I can set the encoding using compileJava.options.encoding = 'UTF-8', but this will not work (not all files are encoded in UTF-8). Is it possible to disable encoding checking (I don't want to change the encoding of all files)?
This is not an issue with Gradle but with javac. However, you can solve this issue running a one-time groovy script in your gradle build as described below.
Normally you'd only need to add following line to your build.gradle file:
compileJava.options.encoding = 'UTF-8'
However, some text editors when saving files to UTF-8 will generate a byte order mark (BOM) header at the beginning of the text files.
javac does not understand the BOM, not even when you compile with the encoding="UTF-8" option, so you're probably getting an error such as this:
> javac -encoding UTF8 Test.java
Test.java:1: error: illegal character: \65279
?class Test {
You need to strip the BOM from your source files or convert your source file to another encoding. Notepad++ for example can convert the file encoding from one to another.
For lots of source files, you can easily write a simple task in Groovy/Gradle that opens each source text file and rewrites it without the BOM prefix on the first line, if one is found.
Add this to your build.gradle and run gradle convertSource
task convertSource {
    // doLast replaces the "<<" shortcut, which was removed in Gradle 5.0
    doLast {
        // convert source files in the source set to normalized text format
        sourceSets.main.java.each { file ->
            // read first "raw" line via BufferedReader
            def r = new BufferedReader(new FileReader(file))
            String s = r.readLine()
            r.close()
            // get entire file normalized
            String text = file.text
            // get first "normalized" line
            String normalizedLine = new StringReader(text).readLine()
            if (s != normalizedLine) {
                println "rename: $file"
                File target = new File(file.getParentFile(), file.getName() + '.bak')
                if (!target.exists()) {
                    // keep the original as .bak, then write the normalized text
                    if (file.renameTo(target))
                        file.setText(text)
                    else
                        println "failed to rename or target already exists"
                }
            }
        }
    }
} // end task
The convertSource task simply enumerates all of the source files, reads the first "raw" line from each source file, then reads the normalized text and compares the first lines. If the first lines differ, it writes a new file with the normalized text and keeps a backup of the original source with a .bak suffix. You only need to run the convertSource task once, after which you can remove the backed-up original files, and the compile should work without encoding errors.
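If you prefer to strip the BOM outside the build, the same idea can be sketched in a few lines of Python working directly on bytes. This is a minimal sketch under stated assumptions: only the UTF-8 BOM is handled, files are rewritten in place without a backup, and the function names and the default *.java pattern are made up for illustration.

```python
import codecs
from pathlib import Path


def strip_utf8_bom(path):
    """Remove a leading UTF-8 BOM from path in place; return True if one was removed."""
    path = Path(path)
    data = path.read_bytes()
    if not data.startswith(codecs.BOM_UTF8):
        return False
    # Write back everything after the three BOM bytes, leaving the rest untouched.
    path.write_bytes(data[len(codecs.BOM_UTF8):])
    return True


def strip_boms(root, glob="*.java"):
    """Strip the BOM from every matching file under root; return the changed paths."""
    return [p for p in sorted(Path(root).rglob(glob)) if strip_utf8_bom(p)]
```

Because the bytes after the BOM are written back unchanged, files in other encodings are left as they are; only files that start with the UTF-8 BOM are modified.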
