How to decode protobuf binary response - protocol-buffers

I have created a test app that can recognize some image using Goggle Goggles. It works for me, but I receive binaryt protobuf response. I have no proto-files, just binary response. How can I get data from it? (Have sent some image with bottle of bear and got the nex response):
A
TuborgLogo9 HoaniText���;�)b���2d8e991bff16229f6"�
+TR=T=AQBd6Cl4Kd8:X=OqSEi:S=_rSozFBgfKt5d9b0
+TR=T=6rLQxKE2xdA:X=OqSEi:S=gd6Aqb28X0ltBU9V
+TR=T=uGPf9zJDWe0:X=OqSEi:S=32zTfdIOdI6kuUTa
+TR=T=RLkVoGVd92I:X=OqSEi:S=P7yOhvSAOQW6SRHN
+TR=T=J1FMvNmcyMk:X=OqSEi:S=5Z631_rd2ijo_iuf�
need to get string "Tuborg" and if possible type - "Logo"

You can decode with protoc:
protoc --decode_raw < msg.bin

UnknownFieldSet.parseFrom(msg).toString()
This will show you the top level fields. Unfortunately it can't know the exact details of field types. long/int/bool/enum etc are all encoded as Varint and all look the same. Strings, byte-arrays and sub-messages are length-delimited and are also indistinguishable.
Some useful details here: https://github.com/dcodeIO/protobuf.js/wiki/How-to-reverse-engineer-a-buffer-by-hand
If you follow the code in the UnknownFieldSet.mergeFrom() you'll see how you could try decode sub-messages and falling back to strings if that fails - but it's not going to be very reliable.
There are 2 spare values for the wiretype in the protocol - it would have been really helpful if google had used one of these to denote sub-messages. (And the other for null values perhaps.)
Here's some very crude rushed code which attempts to produce a something useful for diagnostics. It guesses at the data types and in the case of strings and sub-messages it will print both alternatives in some cases. Please don't trust any values it prints:
public static String decodeProto(byte[] data, boolean singleLine) throws IOException {
return decodeProto(ByteString.copyFrom(data), 0, singleLine);
}
public static String decodeProto(ByteString data, int depth, boolean singleLine) throws IOException {
final CodedInputStream input = CodedInputStream.newInstance(data.asReadOnlyByteBuffer());
return decodeProtoInput(input, depth, singleLine);
}
private static String decodeProtoInput(CodedInputStream input, int depth, boolean singleLine) throws IOException {
StringBuilder s = new StringBuilder("{ ");
boolean foundFields = false;
while (true) {
final int tag = input.readTag();
int type = WireFormat.getTagWireType(tag);
if (tag == 0 || type == WireFormat.WIRETYPE_END_GROUP) {
break;
}
foundFields = true;
protoNewline(depth, s, singleLine);
final int number = WireFormat.getTagFieldNumber(tag);
s.append(number).append(": ");
switch (type) {
case WireFormat.WIRETYPE_VARINT:
s.append(input.readInt64());
break;
case WireFormat.WIRETYPE_FIXED64:
s.append(Double.longBitsToDouble(input.readFixed64()));
break;
case WireFormat.WIRETYPE_LENGTH_DELIMITED:
ByteString data = input.readBytes();
try {
String submessage = decodeProto(data, depth + 1, singleLine);
if (data.size() < 30) {
boolean probablyString = true;
String str = new String(data.toByteArray(), Charsets.UTF_8);
for (char c : str.toCharArray()) {
if (c < '\n') {
probablyString = false;
break;
}
}
if (probablyString) {
s.append("\"").append(str).append("\" ");
}
}
s.append(submessage);
} catch (IOException e) {
s.append('"').append(new String(data.toByteArray())).append('"');
}
break;
case WireFormat.WIRETYPE_START_GROUP:
s.append(decodeProtoInput(input, depth + 1, singleLine));
break;
case WireFormat.WIRETYPE_FIXED32:
s.append(Float.intBitsToFloat(input.readFixed32()));
break;
default:
throw new InvalidProtocolBufferException("Invalid wire type");
}
}
if (foundFields) {
protoNewline(depth - 1, s, singleLine);
}
return s.append('}').toString();
}
private static void protoNewline(int depth, StringBuilder s, boolean noNewline) {
if (noNewline) {
s.append(" ");
return;
}
s.append('\n');
for (int i = 0; i <= depth; i++) {
s.append(INDENT);
}
}

I'm going to assume the real question is how to decode protobufs and not how to read binary from the wire using Java.
The answer to your question can be found here
Briefly, on the wire, protobufs are encoded as 3-tuples of <key,type,value>, where:
the key is the field number assigned to the field in the .proto schema
the type is one of <Varint, int32, length-delimited, start-group, end-group,int64. It contains just enough information to decode the value of the 3-tuple, namely it tells you how long the value is.

Related

Check if two mathematical expressions are equivalent

I came across a question in an interview. I tried solving it but could not come up with a solution. Question is:
[Edited]
First Part: You are given two expressions with only "+" operator, check if given two expressions are mathematically equivalent.
For eg "A+B+C" is equivalent to "A+(B+C)".
Second Part : You are given two expressions with only "+" and "-" operators, check if given two expressions are mathematically equivalent.
For eg "A+B-C" is equivalent to "A-(-B+C)".
My thought process : I was thinking in terms of building an expression tree out of the given expressions and look for some kind of similarity. But I am unable to come up with a good way of checking if two expression trees are some way same or not.
Can some one help me on this :) Thanks in advance !
As long as the operations are commutative, the solution I'd propose is distribute parenthetic operations and then sort terms by 'variable', then run an aggregator across them and you should get a string of factors and symbols. Then just check the set of factors.
Aggregate variable counts until encountering an opening brace, treating subtraction as addition of the negated variable. Handle sub-expressions recursively.
The content of sub-expressions can be directly aggregated into the counts, you just need to take the sign into account properly -- there is no need to create an actual expression tree for this task. The TreeMap used in the code is just a sorted map implementation in the JDK.
The code takes advantage of the fact that the current position is part of the Reader state, so we can easily continue parsing after the closing bracket of the recursive call without needing to hand this information back to the caller explicitly somehow.
Implementation in Java (untested):
class Expression {
// Count for each variable name
Map<String, Integer> counts = new TreeMap<>();
Expression(Srring s) throws IOException {
this(new StringReader(s));
}
Expression(Reader reader) throws IOException {
int sign = 1;
while (true) {
int token = reader.read();
switch (token) {
case -1: // Eof
case ')':
return;
case '(':
add(sign, new Expression(reader));
sign = 1;
break;
case '+':
break;
case '-':
sign = -sign;
break;
default:
add(sign, String.valueOf((char) token));
sign = 1;
break;
}
}
}
void add(int factor, String variable) {
int count = counts.containsKey(variable) ? counts.get(variable) : 0;
counts.put(count + factor, variable);
}
void add(int sign, Expression expr) {
for (Map.Entry<String,Integer> entry : expr.counts.entrySet()) {
add(sign * entry.getVaue(), entry.getKey());
}
}
void equals(Object o) {
return (o instanceof Expression)
&& ((Expression) o).counts.equals(counts);
}
// Not needed for the task, just added for illustration purposes.
String toString() {
StringBuilder sb = new StringBuilder();
for (Map.Entry<String,Integer> entry : expr.counts.entrySet()) {
if (sb.length() > 0) {
sb.append(" + ");
}
sb.append(entry.getValue()); // count
sb.append(entry.getKey()); // variable name
}
return sb.toString();
}
}
Compare with
new Expression("A+B-C").equals(new Expression("A-(-B+C)"))
P.S: Added a toString() method to illustrate the data structure better.
Should print 1A + 1B + -1C for the example.
P.P.P.P.S.: Fixes, simplification, better explanation.
You can parse the expressions from left to right and reduce them to a canonical form for comparison in a straightforward way; the only complication is that when you encounter a closing bracket, you need to know whether its associated opening bracket had a plus or minus in front of it; you can use a stack for that; e.g.:
function Dictionary() {
this.d = [];
}
Dictionary.prototype.add = function(key, value) {
if (!this.d.hasOwnProperty(key)) this.d[key] = value;
else this.d[key] += value;
}
Dictionary.prototype.compare = function(other) {
for (var key in this.d) {
if (!other.d.hasOwnProperty(key) || other.d[key] != this.d[key]) return false;
}
return this.d.length == other.d.length;
}
function canonize(expression) {
var tokens = expression.split('');
var variables = new Dictionary();
var sign_stack = [];
var total_sign = 1;
var current_sign = 1;
for (var i in tokens) {
switch(tokens[i]) {
case '(' : {
sign_stack.push(current_sign);
total_sign *= current_sign;
current_sign = 1;
break;
}
case ')' : {
total_sign *= sign_stack.pop();
break;
}
case '+' : {
current_sign = 1;
break;
}
case '-' : {
current_sign = -1;
break;
}
case ' ' : {
break;
}
default : {
variables.add(tokens[i], current_sign * total_sign);
}
}
}
return variables;
}
var a = canonize("A + B + (A - (A + C - B) - B) - C");
var b = canonize("-C - (-A - (B + (-C)))");
document.write(a.compare(b));

Protobuf 3.0 Any Type pack/unpack

I would like to know how to transform a Protobuf Any Type to the original Protobuf message type and vice versa. In Java from Message to Any is easy:
Any.Builder anyBuilder = Any.newBuilder().mergeFrom(protoMess.build());
But how can I parse that Any back to the originial message (e.g. to the type of "protoMess")? I could probably parse everything on a stream just to read it back in, but that's not what I want. I want to have some transformation like this:
ProtoMess.MessData.Builder protoMessBuilder = (ProtoMess.MessData.Builder) transformToMessageBuilder(anyBuilder)
How can I achieve that? Is it already implemented for Java? The Protobuf Language Guide says there were pack and unpack methods, but there are none in Java.
Thank you in Advance :)
The answer might be a bit late but maybe this still helps someone.
In the current version of Protocol Buffers 3 pack and unpack are available in Java.
In your example packing can be done like:
Any anyMessage = Any.pack(protoMess.build()));
And unpacking like:
ProtoMess protoMess = anyMessage.unpack(ProtoMess.class);
Here is also a full example for handling Protocol Buffers messages with nested Any messages:
ProtocolBuffers Files
A simple Protocol Buffers file with a nested Any message could look like:
syntax = "proto3";
import "google/protobuf/any.proto";
message ParentMessage {
string text = 1;
google.protobuf.Any childMessage = 2;
}
A possible nested message could then be:
syntax = "proto3";
message ChildMessage {
string text = 1;
}
Packing
To build the full message the following function can be used:
public ParentMessage createMessage() {
// Create child message
ChildMessage.Builder childMessageBuilder = ChildMessage.newBuilder();
childMessageBuilder.setText("Child Text");
// Create parent message
ParentMessage.Builder parentMessageBuilder = ParentMessage.newBuilder();
parentMessageBuilder.setText("Parent Text");
parentMessageBuilder.setChildMessage(Any.pack(childMessageBuilder.build()));
// Return message
return parentMessageBuilder.build();
}
Unpacking
To read the child message from the parent message the following function can be used:
public ChildMessage readChildMessage(ParentMessage parentMessage) {
try {
return parentMessage.getChildMessage().unpack(ChildMessage.class);
} catch (InvalidProtocolBufferException e) {
e.printStackTrace();
return null;
}
}
EDIT:
If your packed messages can have different Types, you can read out the typeUrl and use reflection to unpack the message. Assuming you have the child messages ChildMessage1 and ChildMessage2 you can do the following:
#SuppressWarnings("unchecked")
public Message readChildMessage(ParentMessage parentMessage) {
try {
Any childMessage = parentMessage.getChildMessage();
String clazzName = childMessage.getTypeUrl().split("/")[1];
String clazzPackage = String.format("package.%s", clazzName);
Class<Message> clazz = (Class<Message>) Class.forName(clazzPackage);
return childMessage.unpack(clazz);
} catch (ClassNotFoundException | InvalidProtocolBufferException e) {
e.printStackTrace();
return null;
}
}
For further processing, you could determine the type of the message with instanceof, which is not very efficient. If you want to get a message of a certain type, you should compare the typeUrl directly:
public ChildMessage1 readChildMessage(ParentMessage parentMessage) {
try {
Any childMessage = parentMessage.getChildMessage();
String clazzName = childMessage.getTypeUrl().split("/")[1];
if (clazzName.equals("ChildMessage1")) {
return childMessage.unpack("ChildMessage1.class");
}
return null
} catch (InvalidProtocolBufferException e) {
e.printStackTrace();
return null;
}
}
Just to add information in case someone has the same problem ... Currently to unpack you have to do (c# .netcore 3.1 Google.Protobuf 3.11.4)
Foo myobject = anyMessage.Unpack<Foo>();
I know this question is very old, but it still came up when I was looking for the answer. Using #sundance answer I had to answer do this a little differently. The problem being that the actual message was a subclass of the actual class. So it required a $.
for(Any x : in.getDetailsList()){
try{
String clazzName = x.getTypeUrl().split("/")[1];
String[] split_name = clazzName.split("\\.");
String nameClass = String.join(".", Arrays.copyOfRange(split_name, 0, split_name.length - 1)) + "$" + split_name[split_name.length-1];
Class<Message> clazz = (Class<Message>) Class.forName(nameClass);
System.out.println(x.unpack(clazz));
} catch (Exception e){
e.printStackTrace();
}
}
With this being the definition of my proto messages
syntax = "proto3";
package cb_grpc.msg.Main;
service QueryService {
rpc anyService (AnyID) returns (QueryResponse) {}
}
enum Buckets {
main = 0;
txn = 1;
hxn = 2;
}
message QueryResponse{
string content = 1;
string code = 2;
}
message AnyID {
Buckets bucket = 1;
string docID = 2;
repeated google.protobuf.Any details = 3;
}
and
syntax = "proto3";
package org.querc.cb_grpc.msg.database;
option java_package = "org.querc.cb_grpc.msg";
option java_outer_classname = "database";
message TxnLog {
string doc_id = 1;
repeated string changes = 2;
}

Compare each string in datatable with that of list takes longer time.poor performance

I have a datatable of 200,000 rows and want to validate each row with that of list and return that string codesList..
It is taking very long time..I want to improve the performance.
for (int i = 0; i < dataTable.Rows.Count; i++)
{
bool isCodeValid = CheckIfValidCode(codevar, codesList,out CodesCount);
}
private bool CheckIfValidCode(string codevar, List<Codes> codesList, out int count)
{
List<Codes> tempcodes= codesList.Where(code => code.StdCode.Equals(codevar)).ToList();
if (tempcodes.Count == 0)
{
RetVal = false;
for (int i = 0; i < dataTable.Rows.Count; i++)
{
bool isCodeValid = CheckIfValidCode(codevar, codesList,out CodesCount);
}
}
}
private bool CheckIfValidCode(string codevar, List<Codes> codesList, out int count)
{
List<Codes> tempcodes= codesList.Where(code => code.StdCode.Equals(codevar)).ToList();
if (tempcodes.Count == 0)
{
RetVal = false;
}
else
{
RetVal=true;
}
return bRetVal;
}
codelist is a list which also contains 200000 records. Please suggest. I used findAll which takes same time and also used LINQ query which also takes same time.
A few optimizations come to mind:
You could start by removing the Tolist() altogether
replace the Count() with .Any(), which returns true if there are items in the result
It's probably also a lot faster when you replace the List with a HashSet<Codes> (this requires your Codes class to implement HashCode and Equals properly. Alternatively you could populate a HashSet<string> with the contents of Codes.StdCode
It looks like you're not using the out count at all. Removing it would make this method a lot faster. Computing a count requires you to check all codes.
You could also split the List into a Dictionary> which you populate with by taking the first character of the code. That would reduce the number of codes to check drastically, since you can exclude 95% of the codes by their first character.
Tell string.Equals to use a StringComparison of type Ordinal or OrdinalIgnoreCase to speed up the comparison.
It looks like you can stop processing a lot earlier as well, the use of .Any takes care of that in the second method. A similar construct can be used in the first, instead of using for and looping through each row, you could short-circuit after the first failure is found (unless this code is incomplete and you mark each row as invalid individually).
Something like:
private bool CheckIfValidCode(string codevar, List<Codes> codesList)
{
Hashset<string> codes = new Hashset(codesList.Select(c ==> code.StdCode));
return codes.Contains(codevar);
// or: return codes.Any(c => string.Equals(codevar, c, StringComparison.Ordinal);
}
If you're adamant about the count:
private bool CheckIfValidCode(string codevar, List<Codes> codesList, out int count)
{
Hashset<string> codes = new Hashset(codesList.Select(c ==> code.StdCode));
count = codes.Count(codevar);
// or: count = codes.Count(c => string.Equals(codevar, c, StringComparison.Ordinal);
return count > 0;
}
You can optimize further by creating the HashSet outside of the call and re-use the instance:
InCallingCode
{
...
Hashset<string> codes = new Hashset(codesList.Select(c ==> code.StdCode));
for (/*loop*/) {
bool isValid = CheckIfValidCode(codevar, codes, out int count)
}
....
}
private bool CheckIfValidCode(string codevar, List<Codes> codesList, out int count)
{
count = codes.Count(codevar);
// or: count = codes.Count(c => string.Equals(codevar, c, StringComparison.Ordinal);
return count > 0;
}

How to get a substring in some length for special chars like Chinese

For example, I can get 80 chars with {description?substring(0, 80)} if description is in English, but for Chinese chars, I can get only about 10 chars, and there is a garbage char at the end always.
How can I get 80 chars for any language?
FreeMarker relies on String#substring to do the actual (UTF-16-chars-based?) substring calculation, which doesn't work well with Chinese characters. Instead one should uses Unicode code points. Based on this post and FreeMarker's own substring builtin I hacked together a FreeMarker TemplateMethodModelEx implementation which operates on code points:
public class CodePointSubstring implements TemplateMethodModelEx {
#Override
public Object exec(List args) throws TemplateModelException {
int argCount = args.size(), left = 0, right = 0;
String s = "";
if (argCount != 3) {
throw new TemplateModelException(
"Error: Expecting 1 string and 2 numerical arguments here");
}
try {
TemplateScalarModel tsm = (TemplateScalarModel) args.get(0);
s = tsm.getAsString();
} catch (ClassCastException cce) {
String mess = "Error: Expecting numerical argument here";
throw new TemplateModelException(mess);
}
try {
TemplateNumberModel tnm = (TemplateNumberModel) args.get(1);
left = tnm.getAsNumber().intValue();
tnm = (TemplateNumberModel) args.get(2);
right = tnm.getAsNumber().intValue();
} catch (ClassCastException cce) {
String mess = "Error: Expecting numerical argument here";
throw new TemplateModelException(mess);
}
return new SimpleScalar(getSubstring(s, left, right));
}
private String getSubstring(String s, int start, int end) {
int[] codePoints = new int[end - start];
int length = s.length();
int i = 0;
for (int offset = 0; offset < length && i < codePoints.length;) {
int codepoint = s.codePointAt(offset);
if (offset >= start) {
codePoints[i] = codepoint;
i++;
}
offset += Character.charCount(codepoint);
}
return new String(codePoints, 0, i);
}
}
You can put an instance of it into your data model root, e.g.
SimpleHash root = new SimpleHash();
root.put("substring", new CodePointSubstring());
template.process(root, ...);
and use the custom substring method in FTL:
${substring(description, 0, 80)}
I tested it with non-Chinese characters, which still worked, but so far I haven't tried it with Chinese characters. Maybe you want to give it a try.

How to get out of repetitive if statements?

While looking though some code of the project I'm working on, I've come across a pretty hefty method which does
the following:
public string DataField(int id, string fieldName)
{
var data = _dataRepository.Find(id);
if (data != null)
{
if (data.A == null)
{
data.A = fieldName;
_dataRepository.InsertOrUpdate(data);
return "A";
}
if (data.B == null)
{
data.B = fieldName;
_dataRepository.InsertOrUpdate(data);
return "B";
}
// keep going data.C through data.Z doing the exact same code
}
}
Obviously having 26 if statements just to determine if a property is null and then to update that property and do a database call is
probably very naive in implementation. What would be a better way of doing this unit of work?
Thankfully C# is able to inspect and assign class members dynamically, so one option would be to create a map list and iterate over that.
public string DataField(int id, string fieldName)
{
var data = _dataRepository.Find(id);
List<string> props = new List<string>();
props.Add("A");
props.Add("B");
props.Add("C");
if (data != null)
{
Type t = typeof(data).GetType();
foreach (String entry in props) {
PropertyInfo pi = t.GetProperty(entry);
if (pi.GetValue(data) == null) {
pi.SetValue(data, fieldName);
_dataRepository.InsertOrUpdate(data);
return entry;
}
}
}
}
You could just loop through all the character from 'A' to 'Z'. It gets difficult because you want to access an attribute of your 'data' object with the corresponding name, but that should (as far as I know) be possible through the C# reflection functionality.
While you get rid of the consecutive if-statements this still won't make your code nice :P
there is a fancy linq solution for your problem using reflection:
but as it was said before: your datastructure is not very well thought through
public String DataField(int id, string fieldName)
{
var data = new { Z = "test", B="asd"};
Type p = data.GetType();
var value = (from System.Reflection.PropertyInfo fi
in p.GetProperties().OrderBy((fi) => fi.Name)
where fi.Name.Length == 1 && fi.GetValue(data, null) != null
select fi.Name).FirstOrDefault();
return value;
}
ta taaaaaaaaa
like that you get the property but the update is not yet done.
var data = _dataRepository.Find(id);
If possible, you should use another DataType without those 26 properties. That new DataType should have 1 property and the Find method should return an instance of that new DataType; then, you could get rid of the 26 if in a more natural way.
To return "A", "B" ... "Z", you could use this:
return (char)65; //In this example this si an "A"
And work with some transformation from data.Value to a number between 65 and 90 (A to Z).
Since you always set the lowest alphabet field first and return, you can use an additional field in your class that tracks the first available field. For example, this can be an integer lowest_alphabet_unset and you'd update it whenever you set data.{X}:
Init:
lowest_alphabet_unset = 0;
In DataField:
lowest_alphabet_unset ++;
switch (lowest_alphabet_unset) {
case 1:
/* A is free */
/* do something */
return 'A';
[...]
case 7:
/* A through F taken */
data.G = fieldName;
_dataRepository.InsertOrUpdate(data);
return 'G';
[...]
}
N.B. -- do not use, if data is object rather that structure.
what comes to my mind is that, if A-Z are all same type, then you could theoretically access memory directly to check for non null values.
start = &data;
for (i = 0; i < 26; i++){
if ((typeof_elem) *(start + sizeof(elem)*i) != null){
*(start + sizeof(elem)*i) = fieldName;
return (char) (65 + i);
}
}
not tested but to give an idea ;)

Resources