Restoring .proto file from descriptor string. Possible? - protocol-buffers

Is it possible to decompile a string containing Protocol Buffers descriptor back to .proto file?
Say I have a long string like
\n\file.proto\u001a\u000ccommon.proto\"\u00a3\u0001\n\nMsg1Request\u0012\u0017\n\u0006common\u0018\u0001 ... etc.
I need to restore .proto, not necessary exactly as it was but compilable.

In C++, the FileDescriptor interface has a method DebugString() which formats the descriptor contents in .proto syntax -- i.e. exactly what you want. In order to use it, you first need to write code to convert the raw FileDescriptorProto to a FileDescriptor, using the DescriptorPool interface.
Something like this should do it:
#include <google/protobuf/descriptor.h>
#include <google/protobuf/descriptor.pb.h>
#include <iostream>
int main() {
google::protobuf::FileDescriptorProto fileProto;
fileProto.ParseFromFileDescriptor(0);
google::protobuf::DescriptorPool pool;
const google::protobuf::FileDescriptor* desc =
pool.BuildFile(fileProto);
std::cout << desc->DebugString() << std::endl;
return 0;
}
You need to feed this program the raw bytes of the FileDescriptorProto, which you can get by using Java to encode your string to bytes using the ISO-8859-1 charset.
Also note that the above doesn't work if the file imports any other files -- you would have to load those imports into the DescriptorPool first.

Yes it should be possible to get some thing close get original definition. I do not know of any existing code to do it (hopefully some one else will).
Hava a look at how protocol buffers itself handles the String.
Basically
convert the string to bytes (using charset="ISO-8859-1" in java), it will then be a Protocol-Buffer message(format=FileDescriptorProto in java). The FileDescriptorProto is built as part of the Protocol-Buffers install.
Extract the data in the Protocol-Buffer message
Here is a File-Descriptor protocol displayed in the Protocol-Buffer editor

Related

Is there a way to programmatically dump Google Protocol Buffer packet?

If I save the packet data as a binary file, I can run it through protoc -decode to dump the data into a formatted textual representation.
I am wondering if there is any function available to dump the binary data as formatted text programmatically? My code is in JavaScript but C++ is fine as well.
One way would be to spawn protoc as a background process and get the results back. However, it is not an option for me to bundle protoc executable itself with my code. Regards.
One way how to do this could be to create a stream in JS and then post process with protoc decode, this would work since you could bind the stream to the protoc process or thread (the thread can read a buffer, and post process). This would work if you do not have any performance requirements. To solve the problem you could create a c++ binding to the JavaScript app. This would ensure efficiency to some extend.
You can use the google::protobuf::compiler::CommandLineInterface and google::protobuf::compiler::cpp::CppGenerator interface to achieve whatever protoc can do.
#include <google/protobuf/compiler/command_line_interface.h>
#include <google/protobuf/compiler/cpp/cpp_generator.h>
int dump() {
const char *argv[] = {"dumper", "--decode_raw"};
google::protobuf::compiler::CommandLineInterface cli;
google::protobuf::compiler::cpp::CppGenerator cpp_generator;
cli.RegisterGenerator("--cpp_out", &cpp_generator, "Generate C++ source and header.");
return cli.Run(sizeof(argv) / sizeof(char*), argv);
}
int main() {
return dump();
}
Build the above code into a bin: dumper, and run it like this:
cat your-binary-file | ./dumper

how to include text file as string at compile time without adding c++11 string literal prefix and suffix in the text file

I'm aware of many similar questions on this site. I really like the solution mention in the following link:
https://stackoverflow.com/a/25021520/884553
with some modification, you can include text file at compile time, for example:
constexpr const char* s =
#include "file.txt"
BUT to make this work you have to add string literal prefix and suffix to your original file, for example
R"(
This is the original content,
and I don't want this file to be modified. but i
don't know how to do it.
)";
My question is: is there a way to make this work but not modifying file.txt?
(I know I can use command line tools to make a copy, prepend and append to the copy, remove the copy after compile. I'm looking for a more elegant solution than this. hopefully no need of other tools)
Here's what I've tried (but not working):
#include <iostream>
int main() {
constexpr const char* s =
#include "bra.txt" // R"(
#include "file.txt" //original file without R"( and )";
#include "ket.txt" // )";
std::cout << s << "\n";
return 0;
}
/opt/gcc8/bin/g++ -std=c++1z a.cpp
In file included from a.cpp:5:
bra.txt:1:1: error: unterminated raw string
R"(
^
a.cpp: In function ‘int main()’:
a.cpp:4:27: error: expected primary-expression at end of input
constexpr const char* s =
^
a.cpp:4:27: error: expected ‘}’ at end of input
a.cpp:3:12: note: to match this ‘{’
int main() {
^
No, this cannot be done.
There is a c++2a proposal to allow inclusion of such resources at compile time called std::embed.
The motivation part of ths p1040r1 proposal:
Motivation
Every C and C++ programmer -- at some point -- attempts to #include large chunks of non-C++ data into their code. Of course, #include expects the format of the data to be source code, and thusly the program fails with spectacular lexer errors. Thusly, many different tools and practices were adapted to handle this, as far back as 1995 with the xxd tool. Many industries need such functionality, including (but hardly limited to):
Financial Development
representing coefficients and numeric constants for performance-critical algorithms;
Game Development
assets that do not change at runtime, such as icons, fixed textures and other data
Shader and scripting code;
Embedded Development
storing large chunks of binary, such as firmware, in a well-compressed format
placing data in memory on chips and systems that do not have an operating system or file system;
Application Development
compressed binary blobs representing data
non-C++ script code that is not changed at runtime; and
Server Development
configuration parameters which are known at build-time and are baked in to set limits and give compile-time information to tweak performance under certain loads
SSL/TLS Certificates hard-coded into your executable (requiring a rebuild and potential authorization before deploying new certificates).
In the pursuit of this goal, these tools have proven to have inadequacies and contribute poorly to the C++ development cycle as it continues to scale up for larger and better low-end devices and high-performance machines, bogging developers down with menial build tasks and trying to cover-up disappointing differences between platforms.
MongoDB has been kind enough to share some of their code below. Other companies have had their example code anonymized or simply not included directly out of shame for the things they need to do to support their workflows. The author thanks MongoDB for their courage and their support for std::embed.
The request for some form of #include_string or similar dates back quite a long time, with one of the oldest stack overflow questions asked-and-answered about it dating back nearly 10 years. Predating even that is a plethora of mailing list posts and forum posts asking how to get script code and other things that are not likely to change into the binary.
This paper proposes <embed> to make this process much more efficient, portable, and streamlined. Here’s an example of the ideal:
#include <embed>
int main (int, char*[]) {
constexpr std::span<const std::byte> fxaa_binary = std::embed( "fxaa.spirv" );
// assert this is a SPIRV file, compile-time
static_assert( fxaa_binary[0] == 0x03 && fxaa_binary[1] == 0x02
&& fxaa_binary[2] == 0x23 && fxaa_binary[3] == 0x07
, "given wrong SPIRV data, check rebuild or check the binaries!" )
auto context = make_vulkan_context();
// data kept around and made available for binary
// to use at runtime
auto fxaa_shader = make_shader( context, fxaa_binary );
for (;;) {
// ...
// and we’re off!
// ...
}
return 0;
}

Embedding big data file into executable binary

I am working on a C++11 application that is supposed to ship as a single executable binary file. Optionally, users can provide their own CSV data files to be used by the application. To simplify things, assume each element is in format key,value\n. I have created a structure such as:
typedef struct Data {
std::string key;
std::string value;
Data(std::string key, std::string value) : key(key), value(value) {}
} Data;
By default, the application should use data defined in a single header file. I've made a simple Python script to parse default CSV file and put it into header file like:
#ifndef MYPROJECT_DEFAULTDATA
#define MYPROJECT_DEFAULTDATA
#include "../database/DefaultData.h"
namespace defaults {
std::vector<Data> default_data = {
Data("SomeKeyA","SomeValueA"),
Data("SomeKeyB","SomeValueB"),
Data("SomeKeyC","SomeValueC"),
/* and on, and on, and on... */
Data("SomeKeyASFHOIEGEWG","SomeValueASFHOIEGEWG")
}
}
#endif //MYPROJECT_DEFAULTDATA
The only problem is, that file is big. I'm talking 116'087 (12M) lines big, and it will probably be replaced with even bigger file in the future. When I include it, my IDE is trying to parse it and update indices. It slows everything down to the point where I can hardly write anything.
I'm looking for a way to either:
prevent my IDE (CLion) from parsing it or
make a switch in cmake that would use this file only with release executables or
somehow inject data directly into executable
Since your build process already includes a pre-process, which generates C++ code from a CSV, this should be easy.
Step 1: Put most of the generated data in the .cpp file, not a header.
Step 2: Generate your code so that it doesn't use vector or string.
Here's how to do these:
struct Data
{
string_view key;
string_view value;
};
You will need an implementation of string_view or a similar type. While it was standardized in C++17, it doesn't rely on C++17 features.
As for the data structure itself, this is what gets generated in the header:
namespace defaults {
extern const std::array<Data, {{GENERATED_ARRAY_COUNT}}> default_data;
}
{{GENERATED_ARRAY_COUNT}} is the number of items in the array. That's all the generated header should expose. The generated .cpp file is a bit more complex:
static const char ptr[] =
"SomeKeyA" "SomeValueA"
"SomeKeyB" "SomeValueB"
"SomeKeyC" "SomeValueC"
...
"SomeKeyASFHOIEGEWG" "SomeValueASFHOIEGEWG"
;
namespace defaults
{
const std::array<Data, {{GENERATED_ARRAY_COUNT}}> default_data =
{
{{ptr+{{GENERATED_OFFSET}}, {{GENERATED_SIZE}}}, {ptr+{{GENERATED_OFFSET}}, {{GENERATED_SIZE}}}},
{{ptr+{{GENERATED_OFFSET}}, {{GENERATED_SIZE}}}, {ptr+{{GENERATED_OFFSET}}, {{GENERATED_SIZE}}}},
...
{{ptr+{{GENERATED_OFFSET}}, {{GENERATED_SIZE}}}, {ptr+{{GENERATED_OFFSET}}, {{GENERATED_SIZE}}}},
};
}
ptr is a string which is a concatenation of all of your individual strings. There is no need to put spaces or \0 characters or whatever between the individual strings. However, if you do need to pass these strings to APIs that take NULL-terminated strings, you'll either have to copy them into a std::string or have the generator stick \0 characters after each generated sub-string.
The point is that ptr should be a single, giant block of character data.
{{GENERATED_OFFSET}} and {{GENERATED_SIZE}} are offsets and sizes within the giant block of character data that represents a single substring.
This method will solve two of your problems. It will be much faster at load time, since it performs zero dynamic allocations. And it puts the generated strings in the .cpp file, thus making your IDE cooperate.

Reading an UTF-8 encoded file into std::u32string without intermediate buffering

Having worked quite long time with Unicode and C++ I thought this would be a simple thing to accomplish, especially with the new C++11 std::codecvt_utf8 facet. Though it turned out to be a diffcult task. What I want is to read a file encoded in UTF-8 into a u32string (converting it from UTF-8 to UTF-32 implicitly). Sure, I could load the entire content into a buffer and convert that using std::wstring_convert. But that doubles the memory footprint when loading a file. So I tried to use a std::wifstream and imbue a locale with a utf-8 facet like this:
std::wifstream stream(fileName, std::ios::binary);
stream.imbue(std::locale(stream.getloc(), new std::codecvt_utf8<char32_t, 0x10ffff, std::consume_header>));
std::u32string data;
for (char32_t c; stream >> c; )
data += c;
which looks like a straight forward implementation. It only doesn't compile. wifstream's element type is wchar_t, so you can only use wchar_t in the loop, like this:
std::u32string data;
for (wchar_t c; stream >> c; )
data += c;
(at least with clang, VC++ also accepts char32_t there, but that doesn't change anything). After fixing this several other problems remain, though:
In Visual C++ wchar_t is only 16bit (no UTF-32 then, we don't consider surrogate pairs here).
Using char32_t for the facet essentially disables conversion. The iteration over the stream returns the original UTF-8 content, both in clang and VC++.
Using wchar_t also for the facet makes it work in clang, but not in VC++, because in clang wchar_t is 32bit wide, while (as mentioned already) it is only 16bit in VC++.
So, what is the correct approach here? With the lock into wchar_t for the facet I cannot even use a different data type. I also tried defining a basic_ifstream<char32_t> but that requires additional typedefs, hence I didn't follow that path further.
Seems there is no way using a facet and imbue that in a stream, so I went with an intermediate buffer, which is a very elegant solution too, only that it doubles (more or less) the memory needed to load the content. Use a byte (file) stream in binary mode to call this:
void load(std::istream &stream)
{
static std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t> utfConverter;
std::string s((std::istreambuf_iterator<char>(stream)), std::istreambuf_iterator<char>());
_data = utfConverter.from_bytes(s);
}

My exists function says the file exists, but winapi functions say it does not

I copied code that's supposed to change desktop wallpaper. I have this constant in my program:
const char * image_name = "button_out.gif";
Later, I write the image on disk using Magick++:
image.write(image_name);
The image appears in program's working directory. If I run the program directly from explorer the working directory equals the program location.
Because the code prints the 0x80070002 - File not found error I added a exist function in the beginning:
#include <sys/stat.h>
bool exists(const char* name) {
struct stat buffer;
return (stat (name, &buffer) == 0);
}
void SetWallpaper(LPCWSTR file){
if(!exists((const char* )file)) {
wcout << "The file "<<file<<" does not exist!" << endl;
return;
... actually try to set a wallpaper ...
}
The error is not printed however and the code proceeds.
Now the question is:
Does my exist function work properly?
Where does windows look for that image?
Full code to set a Magick++ generated image as background in case I have missed something relevant in this question.
Problem 1: String Conversions
Your primary problem is that you are attempting to use LPCWSTR (a const wchar_t *) and const char * interchangeably. I see a number of issues in your source, in particular:
You start with const char * image_name.
You then cast it to a LPCWSTR to pass to SetWallpaper. This basically guarantees that SetWallpaper will fail, as desktop->SetWallpaper is not able to handle non wide-character strings.
You then cast it back to a const char * to pass to stat() via exists(). This should work in your situation (since the original string really is a char *) but isn't correct because your string parameter to SetWallpaper is supposedly a proper LPCWSTR.
You need to pick a string format (wide-character vs. what Windows terms "ANSI") and stick to that format, using consistent APIs throughout.
The easiest option is probably just to leave most of your code untouched, but modify SetWallpaper to take a const char * and convert to a wide-character string when needed (for this you can use mbstowcs). So, for example:
void SetWallpaper(const char * file){ // <- Use a const char* parameter.
...
// Convert to a wide-character string to pass to COM:
wchar_t wcfile[MAX_PATH + 1];
mbstowcs(wcfile, file, sizeof(wcfile) / sizeof(wchar_t));
// Pass the converted wide-character string:
desktop->SetWallpaper(wcfile, 0);
...
}
The other option would be to use wide-character strings throughout, i.e.:
LPCWSTR image_name = L"button_out.gif";
Modify exists() to take a LPCWSTR and use _wstat() instead.
Use wide-character versions of all other API functions.
However, I am unsure how that would interact with the ImageMagick API, which may not have wide-character support. So it's up to you. Choose whatever approach is the easiest to implement but make sure you are consistent. The general rule is do not cast between LPCWSTR and const char *; if you are ever in a situation where you need to change one to the other, you cannot cast, you must convert (via mbstowcs or wcstombs).
Problem 2: SetWallpaper default directory is not current working directory
At this point, your string usage will be consistent. Now that you have that problem ironed out, if SetWallpaper fails while exists() does not, then SetWallpaper is not looking where you think it is. As you discovered in your comment, SetWallpaper looks in the desktop by default. In this case, while I have not tested it, you may be able to work around this by passing an absolute path to SetWallpaper. For this, you can use GetFullPathName to determine the absolute file name given your relative path. Remember to be consistent with your string types, though.
Also, if stat() continues to fail, then that problem is either that your working directory is not what you think it is, or your filename is not what you think it is. To that end you will want to perform the following tests:
Print the current working directory at the point you check for the files existence, verify it is correct.
Print the filename when you check for its existence, verify it is correct.
You should be good to go once you work all the above issues out.

Resources