error reading protobuf message as string from file

I intend to read protobuf data from a file and use it in my code. To reverse-engineer the format, I filled a message with data and dumped its serialized string to a log file. I now read this log file, change the values of some fields, and read the result back into a string using file handling. However, it doesn't seem to work. Is there some problem with my code?
std::ifstream myprotobuf("/root/testfile.txt", std::ios::in | std::ios::binary);
if (myprotobuf)
{
    std::string string_in_file;
    // Size the buffer to the file length, then read the whole file into it.
    myprotobuf.seekg(0, std::ios::end);
    string_in_file.resize(myprotobuf.tellg());
    myprotobuf.seekg(0, std::ios::beg);
    myprotobuf.read(&string_in_file[0], string_in_file.size());
    myprotobuf.close();
    // ParseFromString expects the binary wire format and returns false on failure.
    RealMsg->ParseFromString(string_in_file);
}

Related

Google Cloud DLP - CSV inspection

I'm trying to inspect a CSV file and there are no findings being returned (I'm using the EMAIL_ADDRESS info type and the addresses I'm using are coming up with positive hits here: https://cloud.google.com/dlp/demo/#!/). I'm sending the CSV file into inspect_content with a byte_item as follows:
byte_item: {
type: :CSV,
data: File.open('/xxxxx/dlptest.csv', 'r').read
}
In looking at the supported file types, it looks like CSV/TSV files are inspected via Structured Parsing.
For CSV/TSV, does that mean one can't just send in the file, and needs to use the table attribute instead of byte_item as per https://cloud.google.com/dlp/docs/inspecting-structured-text?
What about XLSX files, for example? They're an unspecified file type, so I tried a configuration like this, but it still returned no findings:
byte_item: {
  type: :BYTES_TYPE_UNSPECIFIED,
  data: File.open('/xxxxx/dlptest.xlsx', 'rb').read
}
I'm able to do inspection and redaction with images and text fine, but having a bit of a problem with other file types. Any ideas/suggestions welcome! Thanks!
Edit: The contents of the CSV in question:
$ cat ~/Downloads/dlptest.csv
dylans@gmail.com,anotehu,steve@example.com
blah blah,anoteuh,
aonteuh,
$ file ~/Downloads/dlptest.csv
~/Downloads/dlptest.csv: ASCII text, with CRLF line terminators
The full request:
parent = "projects/xxxxxxxx/global"
inspect_config = {
  info_types: [{name: "EMAIL_ADDRESS"}],
  min_likelihood: :POSSIBLE,
  limits: { max_findings_per_request: 0 },
  include_quote: true
}
request = {
  parent: parent,
  inspect_config: inspect_config,
  item: {
    byte_item: {
      type: :CSV,
      data: File.open('/xxxxx/dlptest.csv', 'r').read
    }
  }
}
dlp = Google::Cloud::Dlp.dlp_service
response = dlp.inspect_content(request)

The CSV file I was testing with was one I created in Google Sheets and exported as CSV; however, locally the file was reported as "text/plain; charset=us-ascii". I then downloaded a CSV off the internet, which had a MIME type of "text/csv; charset=utf-8". That is the one that worked. So it looks like my issue was specifically due to the file having an incorrect MIME type.

xlsx is not yet supported. Coming soon. (Maybe that part of the question should be split out from the CSV debugging issue.)
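The question mentions the table attribute as an alternative to byte_item. A minimal Python sketch of building such a table item from CSV text follows. It assumes the headers/rows/values/string_value layout described on the structured-text page linked in the question, treats the first CSV row as the header row, and pads short rows to the header width; the function name csv_to_table_item is hypothetical, and whether ragged rows need padding is an assumption.

```python
import csv
import io


def csv_to_table_item(csv_text: str) -> dict:
    """Convert CSV text into a dict shaped like a DLP table content item.

    Assumption: the first CSV row holds the column names.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return {"table": {"headers": [], "rows": []}}
    header, body = rows[0], rows[1:]
    width = len(header)
    return {
        "table": {
            "headers": [{"name": name} for name in header],
            "rows": [
                # Pad or trim each row so it has one value per header.
                {"values": [{"string_value": v} for v in (row + [""] * width)[:width]]}
                for row in body
            ],
        }
    }
```

The resulting dict would take the place of the byte_item entry under item in the request.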

Serialize and deserialize protobufs through CLI?

I am trying to deserialize a file saved as a protobuf through the CLI (seems like the easiest thing to do). I would prefer not to use protoc to compile, import it into a programming language and then read the result.
My use case: A TensorFlow lite tool has output some data in a protobuf format. I've found the protobuf message definition in the TensorFlow repo too. I just want to read the output quickly. Specifically, I am getting back a tflite::evaluation::EvaluationStageMetrics message from the inference_diff tool.

I assume that the tool outputs a protobuf message in binary format.
protoc can decode the message and output in text format. See this option:
  --decode=MESSAGE_TYPE       Read a binary message of the given type from
                              standard input and write it in text format
                              to standard output.  The message type must
                              be defined in PROTO_FILES or their imports.

While Timo Stamm's answer was instrumental, I still struggled with the paths to get protoc to work in a complex repo (e.g. TensorFlow).
In the end, this worked for me:
cat inference_diff.txt | \
  protoc --proto_path="/Users/ben/butter/repos/tensorflow/" \
  --decode tflite.evaluation.EvaluationStageMetrics \
  "$(pwd)/evaluation_config.proto"
Here I pipe in the binary contents of the file containing the protobuf (inference_diff.txt in my case, generated by following this guide) and specify: the fully qualified protobuf message name (which I got by combining the package, tflite.evaluation, with the message name, EvaluationStageMetrics); the absolute path of the project as the proto_path (the project root, i.e. the TensorFlow repo); and the absolute path of the file that actually contains the message definition. proto_path is only used for resolving imports, whereas the PROTO_FILE (in this case, evaluation_config.proto) is used to decode the file.
Example Output
num_runs: 50
process_metrics {
  inference_profiler_metrics {
    reference_latency {
      last_us: 455818
      max_us: 577312
      min_us: 453121
      sum_us: 72573828
      avg_us: 483825.52
      std_deviation_us: 37940
    }
    test_latency {
      last_us: 59503
      max_us: 66746
      min_us: 57828
      sum_us: 8992747
      avg_us: 59951.646666666667
      std_deviation_us: 1284
    }
    output_errors {
      max_value: 122.371696
      min_value: 83.0335922
      avg_value: 100.17548828125
      std_deviation: 8.16124535
    }
  }
}
If you just want to get the numbers in a rush and can't be bothered to fix the paths, you can do
cat inference_diff.txt | protoc --decode_raw
Example output
1: 50
2 {
  5 {
    1 {
      1: 455818
      2: 577312
      3: 453121
      4: 72573828
      5: 0x411d87c6147ae148
      6: 37940
    }
    2 {
      1: 59503
      2: 66746
      3: 57828
      4: 8992747
      5: 0x40ed45f4b17e4b18
      6: 1284
    }
    3 {
      1: 0x42f4be4f
      2: 0x42a61133
      3: 0x40590b3b33333333
      4: 0x41029476
    }
  }
}
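In the --decode_raw output, the hexadecimal values are the raw bits of fixed-width fields: without the schema, protoc cannot tell whether a 64-bit or 32-bit field holds a double, a float, or a fixed integer. A minimal Python sketch that recovers the readable numbers, assuming (as the typed output earlier suggests) that the 64-bit values are IEEE 754 doubles and the 32-bit values are floats; the helper names are hypothetical:

```python
import struct


def fixed64_as_double(bits: int) -> float:
    """Reinterpret a 64-bit pattern (as printed by --decode_raw) as a double."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


def fixed32_as_float(bits: int) -> float:
    """Reinterpret a 32-bit pattern (as printed by --decode_raw) as a float."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


# Field 5 of the first nested message corresponds to avg_us of reference_latency.
reference_avg_us = fixed64_as_double(0x411D87C6147AE148)
# Field 1 of the third nested message corresponds to max_value of output_errors.
max_error = fixed32_as_float(0x42F4BE4F)
print(reference_avg_us, max_error)
```

Comparing the decoded values with the typed output is a quick way to check which raw field number maps to which named field.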

Vala error "unknown type name" using enum from camel

I am writing this code in Vala, using Camel
using Camel;
[...]
MimeParser par = new MimeParser();
[...]
par.push_state( MimeParserState.MULTIPART, boundary );
I downloaded camel-1.2.vapi from the vala-girs repository on GitHub, put it in a vapi subdirectory, and compiled with
valac --vapidir=vapi --includedir=/usr/include/evolution-data-server/camel --pkg camel-1.2 --pkg posix --target-glib=2.32 -o prog prog.vala -X -lcamel-1.2
Compilation fails with this error:
error: unknown type name "CamelMimeParserState"
const gchar* camel_mime_parser_state_to_string (CamelMimeParserState self);
Looking at the generated C code, I see that the CamelMimeParserState type is used several times but is never defined. It should be a simple enum, because the camel-1.2.vapi file says:
[CCode (cheader_filename = "camel/camel.h", cprefix = "CAMEL_MIME_PARSER_STATE_", has_type_id = false)]
public enum MimeParserState {
    INITIAL,
    PRE_FROM,
    FROM,
    HEADER,
    BODY,
    MULTIPART,
    MESSAGE,
    PART,
    END,
    EOF,
    PRE_FROM_END,
    FROM_END,
    HEADER_END,
    BODY_END,
    MULTIPART_END,
    MESSAGE_END
}
So why doesn't the C output code simply use an enum as the vapi file says (described by cprefix CAMEL_MIME_PARSER_STATE_)?
Is there an error in the .vapi file?

I found the solution. The vapi file is wrong because the cname attribute is missing, so valac emits a C type name that camel.h never defines. Adding cname="camel_mime_parser_state_t" to the CCode attribute fixes the error:
[CCode (cheader_filename = "camel/camel.h", cname="camel_mime_parser_state_t", cprefix = "CAMEL_MIME_PARSER_STATE_", has_type_id = false)]
public enum MimeParserState {
    INITIAL,
    [...]

How to write my own code generator of protobuf

Google protobuf is a nice IDL for RPC. But I want to know how to write my own code generator for protobuf.

The protoc compiler can output a protobuf-formatted description of the .proto file. That way most of the parsing has been done for you already, and you only need to generate the output you want.
The .proto schema for the .proto file description is here:
https://github.com/google/protobuf/blob/master/src/google/protobuf/descriptor.proto
As an additional step, you can make your generator runnable as a plugin via a "--mygenerator_out=." option on protoc itself:
https://developers.google.com/protocol-buffers/docs/reference/other
Here is one (albeit a bit convoluted) example on how a code generator can be written in Python:
https://github.com/nanopb/nanopb/blob/master/generator/nanopb_generator.py

A protoc plugin is a binary that reads a protobuf message of type CodeGeneratorRequest from standard input and writes a response of type CodeGeneratorResponse to standard output.
The binary must be called protoc-gen-NAME and can be used by invoking the protoc command with:
protoc --plugin=./path/to/protoc-gen-NAME --NAME_out=./test/generated ./test.proto
Note specifically that the names are important. This will not work, because it invokes the built-in Java generator:
protoc --plugin=./path/to/protoc-gen-NAME --java_out=./test/generated ./test.proto
This will not work, because the binary does not have the correct name:
protoc --plugin=./path/to/whatever-NAME --NAME_out=./test/generated ./test.proto
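The naming rule in the three commands above can be modelled in a few lines of Python. This is only an illustration of the rule as described here, not protoc's actual lookup code; the function names are hypothetical and the set of built-in generators is a partial, assumed list.

```python
import os
import re
from typing import Optional

# Partial, illustrative list of generators built into protoc (assumption).
BUILTIN_GENERATORS = {"cpp", "csharp", "java", "kotlin", "objc", "php", "python", "ruby"}


def plugin_name_for_flag(flag: str) -> Optional[str]:
    """Return the plugin binary name a --NAME_out flag maps to.

    Returns None when the flag is not an output flag or names a built-in generator.
    """
    match = re.match(r"^--([A-Za-z0-9_]+?)_out(?:=.*)?$", flag)
    if match is None or match.group(1) in BUILTIN_GENERATORS:
        return None
    return "protoc-gen-" + match.group(1)


def plugin_matches(plugin_path: str, flag: str) -> bool:
    """Check that the binary's file name is the one the flag would look for."""
    return os.path.basename(plugin_path) == plugin_name_for_flag(flag)


print(plugin_matches("./path/to/protoc-gen-NAME", "--NAME_out=./test/generated"))
print(plugin_matches("./path/to/protoc-gen-NAME", "--java_out=./test/generated"))
print(plugin_matches("./path/to/whatever-NAME", "--NAME_out=./test/generated"))
```

The three calls mirror the working command and the two failing ones.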
In order to process the incoming CodeGeneratorRequest and generate a valid response, your binary must itself be able to parse the protobuf message as per the protocol file plugin.proto from the protocolbuffers repository.
Historically this was difficult to do in a self-contained manner, but you can do it end-to-end entirely in Rust with just the protobuf crate, as this trivial example demonstrates:
# Cargo.toml
[dependencies]
protobuf="3.0.2"
// src/main.rs
use protobuf::plugin::{code_generator_response, CodeGeneratorRequest, CodeGeneratorResponse};
use protobuf::Message;
use std::io;
use std::io::{BufReader, Read, Write};
fn main() {
    // Read message from stdin
    let mut reader = BufReader::new(io::stdin());
    let mut incoming_request = Vec::new();
    reader.read_to_end(&mut incoming_request).unwrap();
    // Parse as a request
    let req = CodeGeneratorRequest::parse_from_bytes(&incoming_request).unwrap();
    // Generate the content for each output file
    let mut response = CodeGeneratorResponse::new();
    for proto_file in req.proto_file.iter() {
        let mut output = String::new();
        output.push_str(&format!("// from file: {:?}\n", &proto_file.name));
        output.push_str(&format!("// package: {:?}\n", &proto_file.package));
        for message in proto_file.message_type.iter() {
            output.push_str(&format!("\nmessage: {:?}\n", &message.name));
            for field in message.field.iter() {
                output.push_str(&format!(
                    "- {:?} {:?} {:?}\n",
                    field.type_,
                    field.type_name,
                    field.name(),
                ));
            }
        }
        // Add it to the response
        let mut output_file = code_generator_response::File::new();
        output_file.content = Some(output);
        output_file.name = Some(format!("{:?}/out.txt", &proto_file.name.as_ref().unwrap()));
        response.file.push(output_file);
    }
    // Serialize the response to binary message and return it
    let out_bytes: Vec<u8> = response.write_to_bytes().unwrap();
    io::stdout().write_all(&out_bytes).unwrap();
}
Obviously this trivial example doesn't generate code, just text files, but it shows the basic process. You should also iterate over the services and deal with the additional properties on each type.
What this basically gives you is an AST matching the .proto files; the codegen side of it can be done however you like.
Helpful hints:
Do not log to stdout in your plugin, eg. for debugging. The only permitted output to stdout is a protobuf format CodeGeneratorResponse message.
The plugin does not write files, the protoc command does that; it should generate content and then return an array of files, along with content and metadata.
For more information on plugins, carefully read the plugin.proto file linked above; it has extensive details.
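The stdin/stdout contract in these hints can be sketched independently of any protobuf library. The Python below is a minimal illustration: run_plugin and handle are hypothetical names, and the handler works on raw bytes rather than parsed messages. It sends diagnostics to a separate log stream (stderr in a real plugin) so that only the serialized response reaches stdout.

```python
import io
from typing import BinaryIO, Callable, TextIO


def run_plugin(
    handle: Callable[[bytes], bytes],
    stdin: BinaryIO,
    stdout: BinaryIO,
    log: TextIO,
) -> None:
    """Read one serialized request, write one serialized response."""
    request_bytes = stdin.read()  # the whole CodeGeneratorRequest, as binary
    log.write(f"received {len(request_bytes)} request bytes\n")  # never stdout
    stdout.write(handle(request_bytes))  # only the response goes to stdout
    stdout.flush()


# Demonstration with in-memory streams. In a real plugin the arguments would be
# sys.stdin.buffer, sys.stdout.buffer and sys.stderr.
fake_stdin = io.BytesIO(b"\x0a\x0atest.proto")  # field 1: file_to_generate
fake_stdout = io.BytesIO()
fake_log = io.StringIO()
# An empty byte string is a valid serialized message with no fields set.
run_plugin(lambda request: b"", fake_stdin, fake_stdout, fake_log)
```

Passing the streams in as parameters keeps the handler testable without spawning protoc.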

Reading and Writing File Contents From HTTP Post Data In Bash

Problem Statement:
I am trying to upload a file through an HTML form using an HTTP POST request and then write it to a file called configuration.xml on my local server. I can only use the stock capabilities of the server, so, as much as I'd love to, I can't use cURL, PHP, Perl, or anything that I'd have to install on the server.
What I have tried is having the HTML form open a CGI file as the form action; all this CGI file does is run the Bash script with the proper HTML formatting. I would run the Bash script directly from the HTML form, but my research led me to believe that this isn't possible without editing .htaccess or other hacky alternatives, which are not roads I want to go down. (If this can be done in a reasonable fashion, please enlighten me!)
Regardless, I am able to run the Bash script successfully. I know this because I put a "touch configuration.xml" command in the script and it creates the file every time. My script can also tell that the request is an HTTP POST, as shown by the echoed text in the browser, but I can't seem to read any data from the uploaded file. I tried echoing the data as well as redirecting the read data to a file, but nothing appeared in the browser and nothing was written to the file I specified. This may well be me not knowing Bash scripting well enough or something silly like that, but I really don't know how to proceed from here.
Code:
UploadToServer.html:
<form action="run_script.cgi" method="POST" enctype="multipart/form-data">
<input type="file" name="file" />
<input type="submit" name="submit" value="Submit">
</form>
run_script.c:
Note: I compile this to a CGI file with the following command: gcc run_script.c -o run_script.cgi
#include <stdlib.h>
#include <stdio.h>
int main() {
    system("./test.sh &");
    printf("Content-Type: text/html\r\n\r\n");
    printf(""); // print blank line for proper HTML header formatting
    printf("<html>\n");
    printf("</HTML>\n");
}
test.sh:
The non-commented code in the second if statement is from here. The commented code is from here.
#!/bin/bash
touch configuration.xml
if [[ $REQUEST_METHOD = 'POST' ]]; then
    echo "this is a post!"
    if [ "$CONTENT_LENGTH" -gt 0 ]; then
        echo "entered second if statement!"
        # read -n $CONTENT_LENGTH POST_DATA <&0
        # echo "$CONTENT_LENGTH"
        while read line
            do eval "echo ${line}"
        done
    fi
fi
I also tried the approach in the third code block of this post, but didn't get any output. I also looked through this post, but it doesn't seem to grab all the data from the file like I need to. I also tried the approach of just using a CGI file like suggested in this post (_http://blog.purplepixie.org/2013/08/cc-cgi-file-upload/), but, once again, no output. I've been looking through the Apache error log as I try new things and no errors come up.
Anybody have any ideas on what I could possibly be doing wrong? Is there a different approach worth looking into? Any suggestions are greatly appreciated!

I figured out how to do it, with some help from my friends. I ended up doing it all in a CGI program and forgoing the Bash component. While this isn't what I asked for in my original question, it gets the job done for me, which is really what the question was about.
The following is the C code I'm now using to successfully write the file on the server:
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
void print_empty_html_page();
int main() {
    char * req_method_str = getenv("REQUEST_METHOD");
    if (req_method_str != NULL) {
        if (strcmp(req_method_str, "POST") == 0) {
            // process POST arguments
            char * len_str = getenv("CONTENT_LENGTH");
            if (len_str != NULL) {
                int len = atoi(len_str);
                if (len > 0) {
                    FILE * fp = fopen("file.xml", "w");
                    char * postdata = malloc((len + 1) * sizeof(char));
                    if (fp != NULL && postdata != NULL) {
                        // Read the raw POST body from stdin and terminate what was actually read
                        size_t nread = fread(postdata, sizeof(char), len, stdin);
                        postdata[nread] = '\0';
                        fprintf(fp, "%s\n", postdata);
                    }
                    free(postdata);
                    if (fp != NULL) {
                        fclose(fp);
                    }
                }
                // Strip the multipart headers and boundary lines, keeping only the file data
                system("sed -e '/Content/d' -e '/[-][-][*][*][*][*][*]/d' -e '/^[s]*$/d' -e '/WebKitFormBoundary/d' -e '/Submit/d' < file.xml > file_trimmed.xml");
                system("rm file.xml");
            }
        }
    }
    print_empty_html_page();
    return 0;
}
void print_empty_html_page() {
    // Send the content type, letting the browser know this is HTML
    printf("Content-type: text/html\r\n\r\n");
    // Header information that prevents browser from caching
    printf(
        "<META HTTP-EQUIV=\"CACHE-CONTROL\" CONTENT=\"NO-CACHE, NO-STORE\">\r\n\r\n");
    // Top of the page
    printf("<html>\n");
    printf("<BODY>\n");
    // Finish up the page
    printf("</BODY></html>\n");
}
Note: This method writes the entire HTTP POST body to the file 'file.xml'. The system call to 'sed' removes the lines of the HTTP POST that don't belong to the actual data of the uploaded file. If you need to remove additional unwanted lines, just add another -e '/<line_with_expression_to_delete>/d' to that sed call, where line_with_expression_to_delete is the expression you want to match; all lines containing it are deleted. I couldn't get the blank lines in the newly uploaded file deleted with '/^[s]*$/d'. The likely reason is that [s] matches only the literal letter s, not whitespace; a POSIX character class such as '/^[[:space:]]*$/d' should work instead, since it also matches the carriage return that ends each line of a multipart body.
Also note: This method only works for uploading text files. It does not work for other file types, such as JPEGs or OGGs.
Hopefully this helps some other people with the same problem. Let me know if you have any questions.
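For comparison, here is a different technique from the sed filtering above: parsing the multipart/form-data body with a real MIME parser, which also copes with binary uploads. The sketch below uses Python's standard email package and so assumes a Python interpreter is available on the server, which the original setup may not allow. The function name extract_uploads is hypothetical; in a CGI program, content_type would come from the CONTENT_TYPE environment variable and body from CONTENT_LENGTH bytes of standard input.

```python
from email import message_from_bytes
from typing import Dict


def extract_uploads(content_type: str, body: bytes) -> Dict[str, bytes]:
    """Return {filename: file bytes} for every file part of a multipart body."""
    # The email parser needs the Content-Type header (it carries the boundary)
    # in front of the body, separated by a blank line.
    raw = b"Content-Type: " + content_type.encode("ascii") + b"\r\n\r\n" + body
    message = message_from_bytes(raw)
    uploads: Dict[str, bytes] = {}
    for part in message.walk():
        filename = part.get_filename()  # None for non-file fields such as "submit"
        if filename:
            uploads[filename] = part.get_payload(decode=True)
    return uploads
```

Each value can then be written to disk in binary mode, with no need to strip boundary or header lines afterwards.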
