Find a Protocol Buffer (protobuf) message in a file without knowing where it starts

I have a file that ends with a protobuf message.
The problem is that I'm not sure exactly where the protobuf data starts.
Is there some magic prefix or something else at the beginning of the protobuf content that I can use to find it?

There is no standard prefix marking the start of a protobuf message. However, most messages start with the tag of the lowest-numbered field that is set, which is usually the first field defined. For example, field 1 with varint (integer) encoding produces a first byte of 0x08.
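As a rough illustration of where 0x08 comes from: the tag is the field number shifted left by three bits, OR-ed with the wire type, and then varint-encoded. The helper names below are made up for this sketch and are not part of any protobuf library.

```python
# Wire types as defined by the protobuf encoding.
WIRETYPE_VARINT = 0
WIRETYPE_LEN = 2


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def tag_bytes(field_number: int, wire_type: int) -> bytes:
    """Return the encoded tag that precedes a field's payload."""
    return encode_varint((field_number << 3) | wire_type)


print(tag_bytes(1, WIRETYPE_VARINT).hex())  # 08
print(tag_bytes(1, WIRETYPE_LEN).hex())     # 0a
print(tag_bytes(2, WIRETYPE_VARINT).hex())  # 10
```

So a candidate start byte depends on both the number and the type of the first field that is actually present.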
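You can also brute-force it:
// try_decode() stands for whatever decode call your protobuf library provides.
int max_len = datalen < MAX_LENGTH ? datalen : MAX_LENGTH; // never read before the start of data
for (int length = max_len; length > 0; length--)
{
    if (try_decode(&data[datalen - length], length))
    {
        break; // Found the longest valid message
    }
}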
This would find the longest suffix (up to some maximum length) that can be successfully decoded. There is a risk that other random data may appear like a valid message.
You could also take advantage of the fact that protobuf encoding is usually in increasing tag number order. So if you see a lower number after a higher number, you know the message boundary is there. But in common protobuf libraries this is difficult to check, as they don't expose the internal tag number order while decoding.
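If the library does not expose tag order, one option is to walk the wire format by hand. The following is only a heuristic sketch without any schema: it checks that a candidate suffix splits exactly into well-formed fields and, optionally, that field numbers never decrease. The function names are hypothetical, and random bytes can still pass the check.

```python
def read_varint(buf: bytes, pos: int):
    """Return (value, next_pos), or None if the varint is truncated or too long."""
    result = 0
    shift = 0
    while pos < len(buf) and shift < 70:  # at most 10 bytes
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7
    return None


def looks_like_message(buf: bytes, require_ascending: bool = True) -> bool:
    """Schema-less check that buf splits exactly into well-formed fields."""
    pos = 0
    last_field = 0
    while pos < len(buf):
        tag = read_varint(buf, pos)
        if tag is None:
            return False
        key, pos = tag
        field, wire_type = key >> 3, key & 7
        if field == 0:
            return False
        if require_ascending and field < last_field:
            return False
        last_field = field
        if wire_type == 0:      # varint
            value = read_varint(buf, pos)
            if value is None:
                return False
            pos = value[1]
        elif wire_type == 1:    # fixed 64-bit
            pos += 8
        elif wire_type == 2:    # length-delimited
            length = read_varint(buf, pos)
            if length is None:
                return False
            pos = length[1] + length[0]
        elif wire_type == 5:    # fixed 32-bit
            pos += 4
        else:                   # group wire types are treated as invalid here
            return False
        if pos > len(buf):
            return False
    return len(buf) > 0


def find_message_start(data: bytes, max_length: int):
    """Return the offset of the longest suffix that passes the check, or None."""
    for length in range(min(max_length, len(data)), 0, -1):
        start = len(data) - length
        if looks_like_message(data[start:]):
            return start
    return None


# Two junk bytes followed by field 1 = 150 and field 2 = "abc".
print(find_message_start(b"\xff\xff\x08\x96\x01\x12\x03abc", 64))  # 2
```

Decoding the candidate with the real message type afterwards is still advisable, since this check knows nothing about the expected fields.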


InterlockedAdd on R32_Sint with negative number

I am writing to an RWBuffer<int> using InterlockedAdd - originally I had an RWBuffer<uint> but I needed my values to go negative sometimes.
I find that using InterlockedAdd passing a negative number doesn't update the underlying int buffer - I tested this by using abs() on the value being passed in, and it worked.
I realize that using an Add method to add a negative number might seem like "doh! what did you expect", but there isn't an InterlockedSubtract(), so...
Is this a known issue that I just haven't managed to find the docs for, or would you normally expect InterlockedAdd(-1) to subtract 1 from an RWBuffer<int>, as I did?
I'm not sure how atomics are handled with typed buffers, but they definitely work with structured buffers.
In your case, since the typed buffer is R32, it maps directly to an int structured buffer.
The syntax would be:
RWStructuredBuffer<int> OutputBuffer : register(u0);
Then interlocked operation would be like (if you want to apply it on the 2nd element for example):
uint idx = 1;
int current_value; // receives the value stored before the add
InterlockedAdd(OutputBuffer[idx], -1, current_value);
Buffer creation is slightly different, but the change is not complicated: you need to set the structured flag and the element stride, which is 4 bytes in this case.

About protobuf repeating varint decoding

I used Charles to capture a protobuf HTTP message from another iOS application. Now I want to generate the same HTTP packet, but my output is not the same.
My protobuf file:
message TTCreateConversationBody
{
repeated uint32 imUid1 = 2;
}
I'm using Objective-C:
TTCreateConversationBody *body = [TTCreateConversationBody new];
GPBUInt32Array *arr = [[GPBUInt32Array alloc] initWithCapacity:2];
[arr addValue:123123];
[arr addValue:9999999];
body.imUid1Array = arr;
Charles decodes my output as a length-delimited string.
Here is the original raw data, followed by mine:
8A-26-10-08-01-10-AE-F7-81-80-9F-03-10-D4-E4-82-F0-D2-01
8A-26-10-08-01-12-0C-F9-F6-C3-9D-FA-02-AE-F7-81-80-9F-03
What's the correct protobuf file format?
They're actually both valid... ish.
This comes down to "packed" fields; without "packed", your two integers are encoded as
[header, varint][value][header, varint][value]
[10][AE-F7-81-80-9F-03][10][D4-E4-82-F0-D2-01]
whereas with "packed", it becomes
[header, string][length][value][value]
[12][0C][F9-F6-C3-9D-FA-02][AE-F7-81-80-9F-03]
note: the actual values look very different in the two runs... I'm assuming that is accidental.
To quote from the specification:
Protocol buffer parsers must be able to parse repeated fields that were compiled as packed as if they were not packed, and vice versa. This permits adding [packed=true] to existing fields in a forward- and backward-compatible way.
So: serializers should write the layout that is defined by whether your data is "packed" or not, but decoders must be able to handle it either way. Some libraries, when writing data that is marked "packed", work out which layout will actually be shorter and decide based on that. In practice, this approximates to "use packed encoding whenever there are at least two items".
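To make the two layouts concrete, here is a small hand-rolled encoder for field 2 holding the two values from the question (123123 and 9999999). It is a sketch of the wire format only, not the output of any particular protobuf library, and the function names are invented for the example.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def encode_unpacked(field_number: int, values) -> bytes:
    """One [header, varint][value] pair per element."""
    tag = encode_varint((field_number << 3) | 0)  # wire type 0: varint
    return b"".join(tag + encode_varint(v) for v in values)


def encode_packed(field_number: int, values) -> bytes:
    """A single [header, string][length] followed by all values."""
    payload = b"".join(encode_varint(v) for v in values)
    tag = encode_varint((field_number << 3) | 2)  # wire type 2: length-delimited
    return tag + encode_varint(len(payload)) + payload


values = [123123, 9999999]
print(encode_unpacked(2, values).hex("-"))  # 10-f3-c1-07-10-ff-ac-e2-04
print(encode_packed(2, values).hex("-"))    # 12-07-f3-c1-07-ff-ac-e2-04
```

With two items both layouts come to nine bytes here; with more items the packed form saves one header byte per additional element.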

What kind of encoding does posFlag require?

How can I convert a position of the form /pathto/file.go:40:32, as returned by token.Position.String(), to the posFlag param required by ParseQueryPos, which looks like /pathto/file.go:#550?
Why?
I'm using the Oracle tool to do some static analysis. I need to run Oracle.Query which requires a param of type *QueryPos. The only way to get *QueryPos is using ParseQueryPos.
The source to tools/pos.go called by ParseQueryPos says
// parsePosFlag parses a string of the form "file:pos" or
// file:start,end" where pos, start, end match #%d and represent byte
// offsets, and returns its components.
If you really had to convert from line:column strings, you'd look at the file contents and count up bytes (including newlines) leading to that line:column. But since you're working with a token.Position, it looks like you can get what you need from token.Position.Offset.
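For the manual route, counting bytes could look roughly like this. The sketch assumes a 1-based line and a 1-based column measured in bytes (which is how token.Position documents its Column field) and "\n" line endings; the function name is made up.

```python
def byte_offset(source: bytes, line: int, column: int) -> int:
    """Convert a 1-based line and 1-based byte column to a 0-based byte offset."""
    lines = source.split(b"\n")
    if not 1 <= line <= len(lines):
        raise ValueError("line out of range")
    if not 1 <= column <= len(lines[line - 1]) + 1:
        raise ValueError("column out of range")
    # Each earlier line contributes its own bytes plus one newline byte.
    preceding = sum(len(text) + 1 for text in lines[: line - 1])
    return preceding + column - 1


src = b"package main\n\nfunc main() {}\n"
print(f"file.go:#{byte_offset(src, 3, 6)}")  # file.go:#19
```

Reading the file as bytes rather than text matters here, because multi-byte UTF-8 characters would otherwise throw the count off.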

protocol buffer uint32 field with data always in [0,255]

In a Google protocol buffer, I'm going to use a field to store values that will be integers in [0,255]. From http://code.google.com/apis/protocolbuffers/docs/proto.html#scalar, it looks like the uint32 will be the appropriate value type to use. Despite the field being able to hold up to 32-bit integers, those extra bits will not be wasted in my case due to the variable length encoding. (Correct me if I'm wrong up to here.)
My question is: how should I indicate that the reader of a serialized message can assume that the largest value in that field will be 255? Just a comment in the protocol buffer specification? Is there any other way?
In .proto there is no such specification; you must simply document it (and presumably cast it appropriately at the consuming code).
Aside: if you happen to be using the C# protobuf-net implementation, then you can do this by working outside a .proto definition (protobuf-net allows code-first):
[ProtoMember(3)] // <=== field number
public byte SomeValue {get;set;}
This is then obviously constrained to 0-255, but is encoded on the wire as you expect (like a uint32). It also does a checked conversion when deserializing, to sanity-check the values.
In .proto, the above is closest to:
optional uint32 someValue = 3;
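On the size question: a uint32 is written as a varint, so values up to 127 take one byte and values from 128 to 255 take two (plus the tag byte in both cases). A minimal sketch of that arithmetic, together with the kind of checked conversion a consumer might apply, follows; both function names are hypothetical.

```python
def varint_size(value: int) -> int:
    """Number of bytes a non-negative integer occupies as a varint."""
    size = 1
    while value >= 0x80:
        value >>= 7
        size += 1
    return size


def checked_byte(value: int) -> int:
    """Reject values outside the documented [0, 255] range."""
    if not 0 <= value <= 255:
        raise ValueError(f"expected a value in [0, 255], got {value}")
    return value


print(varint_size(127))  # 1
print(varint_size(255))  # 2
print(checked_byte(200))  # 200
```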

Generating confirmation numbers

I need a technique (and a pointer to sample code, if you have one) for generating confirmation numbers for web payments. I don't want the customer to have to write down a long sequence like a GUID, but I don't want it to be easily predictable either.
Using C#
Thanks for all the tips.
I decided on a format like this:
TdddRROOO
T = 2009 (next year will be U = 2010)
ddd = days this year
RR = two random digits
OOO = order number (I'll offset this so folks can't know the order number that day)
So the confirmation number will be something like
P23477098
You could do something with a mixture. Generate the first half of the key as a known, predictable value (e.g. 00001, 00002, 00003, etc.) and then generate the second half as a randomly generated value so it won't be predictable. Then, increment the "known, predictable" value so that you will never get a match.
Your unique code would then become: 00001-53481, 00002-43853, 00003-54511, etc.
Of course, I am sure there are libraries out there that probably do this already. (It might help if you specify what language you are using.)
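A minimal sketch of the sequential-plus-random idea is shown below. It assumes an in-memory counter purely for illustration; a real system would need to persist the serial part (for example in the orders table) so that it survives restarts.

```python
import itertools
import secrets


def code_generator(start: int = 1):
    """Yield codes of the form NNNNN-RRRRR: a serial part plus a random part."""
    for serial in itertools.count(start):
        # The serial half guarantees uniqueness; the random half resists guessing.
        yield f"{serial:05d}-{secrets.randbelow(100_000):05d}"


codes = code_generator()
print(next(codes))
print(next(codes))
```

The secrets module is used instead of random so that the second half is not derived from a predictable generator state.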
I recently did the same thing in PHP. We use the random function in this class:
https://github.com/kohana/core/blob/3.3/master/classes/Kohana/Text.php
We call random('distinct', 8) to generate a confirmation number. It generates strings like these:
4CFY24HJ
JH5AYL7J
2TVWTMJ5
As you can see, it contains no easily confused characters (1/l, 0/O, etc.), which makes it much clearer when customers have to read the code over the phone.
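The same idea can be sketched in a few lines. The alphabet below is an assumption chosen for this example (digits and capitals with look-alike characters left out); it is not copied from the Kohana source.

```python
import secrets

# Assumed alphabet: no 0/O, 1/I, 8/B or similar look-alike pairs.
ALPHABET = "2345679ACDEFHJKLMNPRSTUVWXYZ"


def confirmation_code(length: int = 8) -> str:
    """Return a random code drawn only from unambiguous characters."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


print(confirmation_code())
```

With 28 symbols and 8 positions there are 28^8 (about 3.8 × 10^11) possible codes, so collisions are unlikely but should still be checked against stored codes.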
Decide on the characters (char[] chars) that you want in your confirmation code, decide on the length of confirmation code (n), generate n random numbers (i_1, i_2, ... i_n) in the range [0..chars.Length) and return the string chars[i_1]chars[i_2]...chars[i_n].
In C#:
// Requires: using System; using System.Text;
public string ConfirmationCode(char[] chars, int length, Random rg) {
    StringBuilder codeBuilder = new StringBuilder();
    for (int i = 0; i < length; i++) {
        // Pick a random position in the allowed character set.
        int index = rg.Next(chars.Length);
        codeBuilder.Append(chars[index]);
    }
    return codeBuilder.ToString();
}
For uniqueness, prepend the current time in yyyyMMddHHmmss format (HH is the 24-hour field; hh would repeat every 12 hours).
Just generate a random number between 100000 and 999999, for example. Also a good idea is to put some letters in front that identify that it is a confirmation number, such as CONF-843682 so that people will recognize it more easily when you ask for it.
Store the number in the database, together with an ID for the order and an expiry date (say 1 year).
You could do something like get a random number of a specified length, convert to base64 and add a checksum character.
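One possible reading of this suggestion, as a sketch: take a few random bytes, encode them with the URL-safe base64 alphabet, and append one checksum character computed from the others. The weighted-sum checksum used here is an arbitrary choice for illustration, not a standard scheme, and the function names are hypothetical.

```python
import base64
import secrets

B64_ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-_"
)


def checksum_char(body: str) -> str:
    """Position-weighted sum of the character indices, modulo 64."""
    total = sum((i + 1) * B64_ALPHABET.index(c) for i, c in enumerate(body))
    return B64_ALPHABET[total % 64]


def new_code(num_bytes: int = 6) -> str:
    """Random base64 body (8 characters for 6 bytes) plus one checksum character."""
    raw = secrets.token_bytes(num_bytes)
    body = base64.urlsafe_b64encode(raw).decode("ascii").rstrip("=")
    return body + checksum_char(body)


def is_valid(code: str) -> bool:
    """Check that the last character matches the checksum of the rest."""
    if len(code) < 2 or any(c not in B64_ALPHABET for c in code):
        return False
    return checksum_char(code[:-1]) == code[-1]


code = new_code()
print(code, is_valid(code))
```

The checksum lets a form reject many mistyped codes before any database lookup, although a simple sum like this will not catch every error.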
How about something like Amazon's PayPhrase? Use a library like Faker (Ruby) or Data::Faker (Perl) to generate random phrases, or write your own utility. Then just use a simple hash function to convert the "confirmation phrase" into a number you can index.
For C#, there is a port of Ruby's Faker gem at http://github.com/slashdotdash/faker-cs
