File extension for serialized protobuf output - protocol-buffers

It seems odd that I can't find the answer to this, but what file extension are you supposed to use when storing serialized protobuf output in a file? Just .protobuf? The JSON equivalent of what I am talking about would be a .json file.

I just use .bin, but there's no actual standard here AFAIK. If protoc -o (which emits a .proto schema in protobuf binary format as a FileDescriptorSet) had taken a directory like all the other output options do, we could have used that as a de-facto answer, but protoc -o is unusual in that it takes a file instead. In an old post on the protobuf group, Kenton Varda (one of the original authors) suggests that the file extension should be implementation specific (meaning: you decide) rather than simply referring to the format: https://groups.google.com/forum/#!topic/protobuf/JWZx9n8CUvw

Related

Extract translatable strings from "NIBArchive" .nib files

My goal is to extract the localization keys and strings from a Base.lproj's .nib files.
While most compiled nib files use the binary plist format, I ran into a few that are in a different format, where the file starts with "NIBArchive".
An example (in macOS Monterey) is the file at:
/System/Library/CoreServices/Finder.app/Contents/Resources/Base.lproj/ClipWindow.nib
For "bplist" files, I can easily read them via CFPropertyListCreateFrom… into an NSDictionary and then find the translatable strings therein (inside the "$classes" entry they're always three consecutive dict, string and string entries, with the dict containing the keys "NS.string", "NSKey" and "NSDev", and the following strings being the key and value of a translation entry, similar to what .strings files contain).
The NIBArchive format, however, doesn't seem to be documented anywhere. Has anyone figured out how to decode the entries in a meaningful manner so that I could find the translation items in them? Or convert them into the bplist format?
Note that this kind of file is a compiled nib, and ibtool won't work because it gives the error: "Interface Builder cannot open compiled nibs".
I am working with random .nib files, for which I don't know the implementation specifics. All I want to extract are the .strings contents that were originally compiled into the Base localization file.
I had googled for this format before but found nothing. Now, with a slightly modified search, I ran into some answers.
My best hope for solving this so far is this format description, determined through reverse-engineering:
https://github.com/matsmattsson/nibsqueeze/blob/master/NibArchive.md
I can build a parser based on this, but still wonder if there are easier ways.
Another possible solution is to use NSKeyedUnarchiver to decode the data, after loading it into an NSNib object, as suggested here:
https://stackoverflow.com/a/4205296/43615
This method of decoding keyed archives of unknown types is also shown in the PlistExplorer project:
https://github.com/karstenBriksoft/PlistExplorer
It seems that https://github.com/kam800/MachObfuscator includes a NIBArchive reader, NibArchive+Loading, written in Swift.
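As a minimal sketch (not from the posts above), the two container formats mentioned in the question can be told apart by their leading bytes before choosing a decoding strategy. The function names are hypothetical, and the sketch assumes the input is a single file rather than a bundle directory; it does not decode either format.

```python
# Leading bytes of the two compiled-nib container formats discussed above.
BPLIST_MAGIC = b"bplist"
NIBARCHIVE_MAGIC = b"NIBArchive"


def sniff_nib_format(data: bytes) -> str:
    """Classify compiled nib data by its leading bytes (hypothetical helper)."""
    if data.startswith(BPLIST_MAGIC):
        return "bplist"
    if data.startswith(NIBARCHIVE_MAGIC):
        return "NIBArchive"
    return "unknown"


def sniff_nib_file(path: str) -> str:
    """Read only the first few bytes of a file and classify it."""
    with open(path, "rb") as f:
        return sniff_nib_format(f.read(16))
```

A "bplist" result could then go through the property-list route described in the question, while a "NIBArchive" result would need a parser such as one built from the nibsqueeze format description.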

Differences between .proto, .pb and .pbtxt

As explained on the project website and on Wikipedia, Protocol Buffers (or Protobuf) is "used to serialize structured data". The data structure is defined in a .proto file that can be compiled by protoc and turned into code (.cc/.h, .py, .java...) that can be imported into several languages to manipulate and serialize the data.
My understanding is that the .pb files contain that data in binary and the .pbtxt files are an equivalent that contains it in ASCII. Is that correct?
If so, why are .pbtxt files so readable? I've found some with comments (https://github.com/google/mediapipe/blob/master/mediapipe/graphs/hand_tracking/subgraphs/hand_renderer_cpu.pbtxt).
Also, are .pb/.pbtxt enough to interpret the data? Or do you need their .proto?
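To make the binary/text contrast concrete, here is a small hand-rolled sketch that encodes two fields in the Protobuf wire format using only the standard library and sets them next to a text-format rendering. The message definition, field names and helper functions are assumptions made up for illustration; real code would use the classes generated by protoc.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    if value < 0:
        raise ValueError("this sketch handles non-negative integers only")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def encode_varint_field(field_number: int, value: int) -> bytes:
    """Tag is (field_number << 3) | wire type 0, followed by the varint value."""
    return encode_varint(field_number << 3) + encode_varint(value)


def encode_string_field(field_number: int, text: str) -> bytes:
    """Tag is (field_number << 3) | wire type 2, then length, then UTF-8 bytes."""
    payload = text.encode("utf-8")
    return encode_varint((field_number << 3) | 2) + encode_varint(len(payload)) + payload


# Hypothetical schema: message Example { int32 id = 1; string name = 2; }
binary_form = encode_varint_field(1, 150) + encode_string_field(2, "testing")
text_form = 'id: 150\nname: "testing"\n'

print(binary_form.hex(" "))
print(text_form)
```

In this sketch the binary form carries only field numbers and values, while the text form spells out the field names, which is one reason the text files read so easily.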

gRPC protobuf bindings: Are changes to fileDescriptor breaking changes?

I am currently developing a gRPC service in Go with gRPC Gateway as an HTTP proxy. I am generating .pb.go bindings from my .proto files, but I noticed subtle changes to my bindings in two separate but related situations where I wouldn't expect them. Each binding file has a mysterious field var fileDescriptorX = []byte{.....} where X is actually a number. Both unexpected changes happen with this field and only this field.
My big question: Are these bindings compatible with each other or are changes to this field considered breaking changes, rendering different versions of the bindings incompatible?
First, if I add another proto file to the same folder and it comes alphabetically before the existing protos, the fileDescriptor field will be renamed when I re-generate my Go bindings. The number X at the end of the field name fileDescriptorX corresponds to the file's ordering relative to the other files in the folder. To be clear, if I have a folder with b.proto and b.pb.go, then add a.proto and re-run my compiler to create a.pb.go, b.pb.go's file descriptor will get bumped from 0 to 1, and a.pb.go will get the new fileDescriptor0.
Second, since I am using gRPC gateway, I wanted to change the paths in my HTTP options. Let's say I have an RPC in a.proto:
rpc GetFoo(GetFooRequest) returns (Foo) {
  option (google.api.http) = {
    get: "/v1alpha1/foo"
  };
}
When I change the path above to "/api/foo/v1alpha1/foo", a.pb.gw.go changes understandably, but the bytes in a.pb.go's fileDescriptor0 field change as well.
There doesn't seem to be any documentation discussing how these fields are used and if changes in them are incompatible, breaking changes with other bindings. Any help is appreciated. Thanks!
These "file descriptors" are actually binary encodings of everything in your .proto file. The format is defined by descriptor.proto in the Protobuf source code. Any change you make to your .proto files is expected to cause the file descriptors to change.
This isn't documented because this is an internal implementation detail of the generated code. You don't need to worry about what's changing in the generated code. As long as your .proto changes follow the documented backwards-compatibility rules, your protocol will be compatible.

How secure is protobuf against getting some of the data out?

Without any encryption, if the recipient has the serialized Protobuf file but does not have the generated Protobuf class (they don't have access to the .proto file that define its structure), is it possible for them to get any data in the Protobuf file from the binary?
If they have access to a part of the .proto file (for example, just one related message in the file) can they get a part of that data out from the entire file while skipping other unknown parts?
To the first question: yes, absolutely; the protoc tool can help with this (see: --decode_raw), as can https://protogen.marcgravell.com/decode - so it should not be treated as "secure" at all
To the second question: yes, absolutely - that's a key part built into the protocol that allows messages to be extensible such that they can decode the bits they understand and either ignore or just store (for round-trip or "extension" fields) the bits they don't understand
protobuf is not a security device; to someone with the right tools it is just as readable as xml or json, with the slight issue that it can be uncertain how to interpret some values; but: you can infer and guess and reverse engineer
Ok, I have found this page https://developers.google.com/protocol-buffers/docs/encoding
The encoded message discards all the field names and is just a sequence of pairs of field numbers and values. The generated class might offer some protection for reading this data safely, and it cannot interpret unknown data (naturally, because the generated class was generated from a known structure, the .proto file).
But if I were an attacker, I could consult that Encoding page and try to figure out which area in the binary corresponds to which data. For example, a varint might be easy to spot after changing some data. I could then proceed to write my own .proto file to attack this unknown data, or even a custom binary reader that selectively reads part of the binary.
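The point made in this thread can be illustrated with a schema-less reader. The following is a rough sketch in plain Python, assuming only the wire format described on the Encoding page; it is not how protoc --decode_raw is implemented, the helper names are hypothetical, and it ignores the deprecated group wire types.

```python
from typing import List, Tuple


def read_varint(data: bytes, pos: int) -> Tuple[int, int]:
    """Decode a base-128 varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        if pos >= len(data):
            raise ValueError("truncated varint")
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # no continuation bit: this was the last byte
            return result, pos
        shift += 7


def decode_raw(data: bytes) -> List[Tuple[int, int, object]]:
    """Return (field_number, wire_type, raw_value) for each top-level field."""
    fields = []
    pos = 0
    while pos < len(data):
        tag, pos = read_varint(data, pos)
        field_number, wire_type = tag >> 3, tag & 0x07
        if wire_type == 0:      # varint
            value, pos = read_varint(data, pos)
        elif wire_type == 1:    # fixed 64-bit, kept as raw bytes
            value, pos = data[pos:pos + 8], pos + 8
        elif wire_type == 2:    # length-delimited (string, bytes, sub-message)
            length, pos = read_varint(data, pos)
            value, pos = data[pos:pos + length], pos + length
        elif wire_type == 5:    # fixed 32-bit, kept as raw bytes
            value, pos = data[pos:pos + 4], pos + 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        if pos > len(data):
            raise ValueError("truncated field")
        fields.append((field_number, wire_type, value))
    return fields


# Sample bytes: field 1 as varint 150, field 2 as the string "testing".
sample = bytes.fromhex("089601120774657374696e67")
fields = decode_raw(sample)

# Knowing only that field 2 is a UTF-8 string, everything else can be skipped.
known = {number: value.decode("utf-8") for number, wire, value in fields if number == 2}
print(fields)
print(known)
```

Every field is length-prefixed or self-delimiting, so a reader that understands only part of the structure can step over the rest, which is the same property that lets generated code ignore unknown fields.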

Rules for file extensions?

Are there any rules for file extensions? For example, I wrote some code which reads and writes a byte pattern that is only understood by that specific program. I'm assuming my antivirus program won't be too happy if I give it the name "pleasetrustme.exe"... Is it generally allowed to use those extensions? And what about the lesser known ones, like ".arw"?
You can use any file extension you want (or none at all). Using standard extensions that reflect the actual type of the file just makes things more convenient. On Windows, file extensions control things like how files are displayed in Windows Explorer and what happens when you double-click on them.
"I wrote some code which reads and writes a byte pattern that is only understood by that specific program."
A file extension is only an indication of what type of data will be inside, never a guarantee that certain data formatted in a specific way will be inside the file.
For your own specific data structure it is of course always best to choose an extension that is not already in use for other file formats (or perhaps use a general extension like .dat or .bin). This also has the advantage that you can use your own icon without it being overwritten by other software using the same extension, or the other way around.
But maybe even more important when creating a custom (binary?) file format, is to provide a magic number as the first bytes of that file, maybe followed by a file header structure containing a version number etc. That way your own software can first check the header data to make sure it's the right type and version (for example: anyone could rename any file type to your extension, so your program needs to have a way to do some checks inside the file before reading the remaining data).
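A minimal sketch of that header check, assuming a made-up format: the magic bytes, version number and 8-byte header layout below are hypothetical placeholders, not an existing standard.

```python
import io
import struct

MAGIC = b"MYFMT\x00"             # hypothetical 6-byte magic number
VERSION = 1                      # hypothetical current format version
HEADER = struct.Struct("<6sH")   # magic, then a little-endian uint16 version


def write_file(stream, payload: bytes) -> None:
    """Write the header followed by the payload."""
    stream.write(HEADER.pack(MAGIC, VERSION))
    stream.write(payload)


def read_file(stream) -> bytes:
    """Validate the header before trusting the rest of the file."""
    raw = stream.read(HEADER.size)
    if len(raw) != HEADER.size:
        raise ValueError("file too short to contain a header")
    magic, version = HEADER.unpack(raw)
    if magic != MAGIC:
        raise ValueError("magic number mismatch: not a MYFMT file")
    if version > VERSION:
        raise ValueError(f"unsupported format version {version}")
    return stream.read()


buf = io.BytesIO()
write_file(buf, b"hello")
buf.seek(0)
print(read_file(buf))
```

With this in place, a renamed file of some other type fails at the magic check rather than being misread as payload, whatever its extension says.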

Resources