How to use protobuf to deserialize string between c# and java? - protocol-buffers

In C#, I use protobuf-net to serialize a string to a byte array and send it over the network:
var bytes = Serializer.SerializeObject("Hello,world")
The byte array contains 13 elements, including 2 prefix bytes: it starts with 0x10, then 0x0b for the string length.
When I tried to deserialize it in Java, using ByteString to convert the byte array to a string, I got a garbled string: \n Hello,world!
This means that Java does not ignore the prefix bytes.
Does anybody know why? Thanks!

The protobuf format doesn't allow for naked data, so protobuf-net interprets Serializer.Serialize("Hello, world") as though it were a message of the form:
message Foo {
optional string value = 1;
}
and as if you had - in C# terms - used:
Serializer.Serialize(new Foo { value = "Hello, world" });
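The 0x0A (decimal 10, which the question writes as 0x10) is the field header for field 1 with a length-prefixed payload, the 0x0B that follows is the string length, and so on.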
If you ever want to check the internals of an encoded message without knowing the schema, this tool may help: https://protogen.marcgravell.com/decode
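To make the layout concrete, here is a minimal Python sketch that decodes the 13 bytes by hand. The function name decode_single_string_field is hypothetical, and the sketch assumes the tag and the length each fit in a single byte (field number below 16, payload shorter than 128 bytes), which holds for this example but not for protobuf in general.

```python
def decode_single_string_field(data: bytes):
    """Decode a message that holds exactly one short length-delimited field.

    Assumption: tag and length are single bytes (no multi-byte varints).
    """
    tag, length = data[0], data[1]
    field_number = tag >> 3   # upper bits of the tag carry the field number
    wire_type = tag & 0x07    # lower three bits carry the wire type
    if wire_type != 2:
        raise ValueError("expected wire type 2 (length-delimited)")
    payload = data[2:2 + length]
    return field_number, payload.decode("utf-8")


# Header byte 0x0A, length byte 0x0B, then the 11 bytes of text.
encoded = bytes([0x0A, 0x0B]) + b"Hello,world"
print(len(encoded))                         # 13
print(decode_single_string_field(encoded))  # (1, 'Hello,world')
```

Treating the whole array as text instead keeps the two leading bytes, which is why the string on the Java side starts with \n.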

Related

New line character in serialized messages

Some protobuf messages, when serialized to string, have a new line character \n inside them. Usually, when the first field of the message is a string, the new line character is prepended to the message. But we also found messages with a new line character somewhere in the middle.
The problem with the new line character arises when you want to save the messages into one file, line by line. The new line character breaks the line and makes the message invalid.
example.proto
syntax = "proto3";
package data_sources;
message StringFirst {
string key = 1;
bool valid = 2;
}
message StringSecond {
bool valid = 1;
string key = 2;
}
example.py
from protocol_buffers.data_sources.example_pb2 import StringFirst, StringSecond
print(StringFirst(key='some key').SerializeToString())
print(StringSecond(key='some key').SerializeToString())
output
b'\n\x08some key'
b'\x12\x08some key'
Is this expected / desired behaviour? How can one prevent the new line character?
protobuf is a binary protocol (unless you're talking about the optional json thing). So: any time you're treating it as text-like in any way, you're using it wrong and the behaviour will be undefined. This includes worrying about whether there are CR/LF characters, but it also includes things like the nul-character (0x00), which is often interpreted as end-of-string in text-based APIs in many frameworks (in particular, C-strings).
Specifically:
LF (0x0A) is identical to the field header for "field 1, length-prefixed"
CR (0x0D) is identical to the field header for "field 1, fixed 32-bit"
any of 0x00, 0x0A or 0x0D could occur as a length prefix (to signify a length of 0, 10, or 13)
any of 0x00, 0x0A or 0x0D could occur naturally in binary data (bytes)
any of 0x00, 0x0A or 0x0D could occur naturally in any numeric type
0x0A or 0x0D could occur naturally in text data (as could 0x00 if your originating framework allows nul-characters arbitrarily in strings, so... not C-strings)
and probably a range of other things
So: again - if the inclusion of "special" text characters is problematic: you're using it wrong.
The most common way to handle binary data as text is to use a base-N encode; base-16 (hex) is convenient to display and read, but base-64 is more efficient in terms of the number of characters required to convey the same number of bytes. So if possible: convert to/from base-64 as required. Base-64 never includes any of the non-printable characters, so you will never encounter CR/LF/nul.
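A minimal sketch of the base-64 approach for the "one message per line" use case, using only the Python standard library. The helper names encode_lines and decode_lines are hypothetical, and the byte strings stand in for the output of SerializeToString().

```python
import base64


def encode_lines(messages):
    """Return text with one base-64 line per binary message."""
    # Each line is terminated (not separated) by "\n" so that an empty
    # message, which encodes to an empty line, still round-trips.
    return "".join(base64.b64encode(m).decode("ascii") + "\n" for m in messages)


def decode_lines(text):
    """Inverse of encode_lines: recover the original binary messages."""
    # The final split element is the empty string after the last "\n".
    return [base64.b64decode(line) for line in text.split("\n")[:-1]]


# Stand-ins for serialized messages; both contain bytes that would be
# awkward in a line-oriented text file.
messages = [b"\n\x08some key", b"\x12\x08some key"]
text = encode_lines(messages)
print(decode_lines(text) == messages)  # True
```

The only new line characters in the resulting text are the line terminators added by encode_lines, so the file can safely be read back line by line.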

Get string with base-16 (hex) rendering of the bytes of an ASCII string

E.g.
input := "Office"
want := "4f6666696365" // Note: this is a string!!
I know that string literals are stored in UTF-8 already.
What is the easiest way to convert this to a string in UTF-8 representation?
Calling EncodeRune on each character seems too cumbersome.
What you're looking for is a string that contains the hex representation of your input string. That is not UTF-8. (Any string that's valid ASCII is also valid UTF-8.)
In any case, this is how to do what you want:
want := fmt.Sprintf("%x", []byte(input))

Ruby: What does unpack("C") actually do?

From the docs, unpack does:
Decodes str (which may contain binary data) according to the format
string, returning an array of each value extracted.
And the "C" format means 8-bit unsigned (unsigned char).
But what does this actually end up doing to the string I input? What does the result mean, and if I had to do it by hand, how would I go about doing that?
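It converts each successive byte of the string to its integer value, as String#ord does for a single-byte character. That is,
string.unpack 'C*'
is equivalent to
string.bytes
and, for strings made only of single-byte characters, also to
string.each_char.map(&:ord)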
But what does this actually end up doing to the string I input
It doesn't do anything to the input. And the input is not really a string here. It's typed as a string, but it is really a buffer of binary data, such as you might receive by networking, and your goal is to extract that data into an array of integers. Example:
s = "\01\00\02\03"
arr = s.unpack("C*")
p(arr) # [1,0,2,3]
That "string" would be meaningless as a string of text, but it is quite viable as a data buffer. Unpacking it allows you to examine the data.
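For comparison, here is a rough sketch of the same idea in Python rather than Ruby: reading a buffer as 8-bit unsigned values just means taking each byte as an integer between 0 and 255. The "B" format of Python's struct module plays roughly the role of Ruby's "C" here.

```python
import struct

data = b"\x01\x00\x02\x03"

# "By hand": iterating over a bytes object yields each byte as an integer.
print(list(data))                        # [1, 0, 2, 3]

# With a format string: "4B" means four unsigned 8-bit values.
print(list(struct.unpack("4B", data)))   # [1, 0, 2, 3]

# Bytes are not characters: one character can occupy several bytes.
print(list("é".encode("utf-8")))         # [195, 169]
print(ord("é"))                          # 233
```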

How to convert byte array to string in Go [duplicate]

Converting []byte to string raises an error.
string([]byte[:n]) raises an error too.
For example, I want to convert a SHA-1 value to a string to use as a filename.
Does it need UTF-8 or some other encoding to be set explicitly?
Thanks!
The easiest method I use to convert byte to string is:
myString := string(myBytes[:])
The easiest way to convert []byte to string in Go:
myString := string(myBytes)
Note: to convert a "sha1 value to string" like you're asking, it needs to be encoded first, since a hash is binary. The traditional encoding for SHA hashes is hex (import "encoding/hex"):
myString := hex.EncodeToString(sha1bytes)
In Go you convert a byte array (utf-8) to a string by doing string(bytes) so in your example, it should be string(byte[:n]) assuming byte is a slice of bytes.
I am not sure that I understand the question correctly, but maybe:
var ab20 [20]byte = sha1.Sum([]byte("filename.txt"))
var sx16 string = fmt.Sprintf("%x", ab20)
fmt.Print(sx16)
https://play.golang.org/p/haChjjsH0-
ToBe := [6]byte{65, 66, 67, 226, 130, 172}
s:=ToBe[:3]
// this will work
fmt.Printf("%s",string(s))
// this will not
fmt.Printf("%s",string(ToBe))
Difference : ToBe is an array whereas s is a slice.
First, you're getting all these negative reviews because you didn't provide any code.
Second, without a good example, this is what I'd do:
var Buf bytes.Buffer
Buf.Write(myBytes) // myBytes is your []byte value
myString := Buf.String()
Buf.Reset() // Reset the buffer to reuse later
or better yet
myString := string(someByteArray[:n])
See also @JimB's comment.
That being said, if you need help that targets your program, please provide an example of what you've tried, the expected results, and the error.
We can only guess what is wrong with your code because no meaningful example is provided. But the first thing I see is that string([]byte[:n]) is not valid at all. []byte[:n] is not a valid expression because []byte is a type, not a value, so there is nothing to slice. Since a byte slice can be converted to a string directly, I assume that you just have a syntax error.
Shortest valid is fmt.Println(string([]byte{'g', 'o'}))

byte[] to string and string to byte[]

I know this has been covered a lot here, but I couldn't solve my problem yet:
I read bytes from a Parcelable object and save them in a byte[], then I unmarshall
them back to an object and it all works fine. But I have to send the bytes as a String, so I
have to convert the bytes to a string and then back.
I thought it would work as follows:
byte[] bytes = p1.marshall(); //get my object as bytes
String str = bytes.toString();
byte[] someBytes = str.getBytes();
But it doesn't work when I call p2.unmarshall(someBytes, 0, someBytes.length) with someBytes, whereas p2.unmarshall(bytes, 0, bytes.length) with bytes works fine. How can I convert the bytes to a String correctly?
You've got three problems here:
You're calling toString() on byte[], which is just going to give you something like "[B@15db9742"
You're assuming you can just convert a byte array into text with no specific conversion, and not lose data
You're calling getBytes() without specifying the character encoding, which is almost always a mistake.
In this case, you should just use base64 - that's almost always the right thing to do when converting arbitrary binary data to text. (If you were actually trying to decode encoded text, you should use new String(bytes, charset), but that's not the case here.)
So, using android.util.Base64:
String str = Base64.encodeToString(bytes, Base64.DEFAULT);
byte[] someBytes = Base64.decode(str, Base64.DEFAULT);
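The second problem (losing data when arbitrary bytes are treated as text) is not specific to Java. As an illustration only, here is a small Python sketch with made-up sample bytes: decoding them as UTF-8 with replacement characters and re-encoding changes the data, while a base-64 round trip does not.

```python
import base64

# Arbitrary binary data; 0xFF and 0x80 are not valid UTF-8 on their own.
raw = bytes([0x00, 0xFF, 0x0A, 0x80, 0x41])

# Treating the bytes as text: invalid sequences are replaced, so the
# original values cannot be recovered afterwards.
lossy = raw.decode("utf-8", errors="replace").encode("utf-8")
print(lossy == raw)     # False

# Base-64 maps every byte sequence to ASCII text and back without loss.
text = base64.b64encode(raw).decode("ascii")
restored = base64.b64decode(text)
print(restored == raw)  # True
```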

Resources