How to explain embedded message binary wire format of protocol buffer? - protocol-buffers

I'm trying to understand protocol buffer encoding method, when translating message to binary(or hexadecimal) format, I can't understand how the embedded message is encoded.
I guess maybe it's related to memory address, but I can't find the accurate relationship.
Here is what i've done.
Step 1: I defined two messages in test.proto file,
syntax = "proto3";
package proto_test;
message Education {
string college = 1;
}
message Person {
int32 age = 1;
string name = 2;
Education edu = 3;
}
Step 2: And then I generated some go code,
protoc --go_out=. test.proto
Step 3: Then I check the encoded format of the message,
p := proto_test.Person{
Age: 666,
Name: "Tom",
Edu: &proto_test.Education{
College: "SOMEWHERE",
},
}
var b []byte
out, err := p.XXX_Marshal(b, true)
if err != nil {
log.Fatalln("fail to marshal with error: ", err)
}
fmt.Printf("hexadecimal format:% x \n", out)
fmt.Printf("binary format:% b \n", out)
which outputs,
hexadecimal format:08 9a 05 12 03 54 6f 6d 1a fd 96 d1 08 0a 09 53 4f 4d 45 57 48 45 52 45
binary format:[ 1000 10011010 101 10010 11 1010100 1101111 1101101 11010 11111101 10010110 11010001 1000 1010 1001 1010011 1001111 1001101 1000101 1010111 1001000 1000101 1010010 1000101]
what I understand is ,
08 - int32 wire type with tag number 1
9a 05 - Varints for 666
12 - string wire type with tag number 2
03 - length delimited which is 3 byte
54 6f 6d - ascii for "TOM"
1a - embedded message wire type with tag number 3
fd 96 d1 08 - ? (here is what I don't understand)
0a - string wire type with tag number 1
09 - length delimited which is 9 byte
53 4f 4d 45 57 48 45 52 45 - ascii for "SOMEWHERE"
What does fd 96 d1 08 stands for?
It seems like that d1 08 always be there, but fd 96 sometimes change, don't know why. Thanks for answering :)
Add
I debugged the marshal process and reported a bug here.

At that location I/you would expect the number of bytes in the embedded message.
I have repeated your experiment in Python.
msg = Person()
msg.age = 666
msg.name = "Tom"
msg.edu.college = "SOMEWHERE"
I got a different result, the one I would expect. A varint stating the size of the embedded message.
0x08
0x9A, 0x05
0x12
0x03
0x54 0x6F 0x6D
0x1A
0x0B <- Equals to 11 decimal.
0x0A
0x09
0x53 0x4F 0x4D 0x45 0x57 0x48 0x45 0x52 0x45
Next I deserialized your bytes:
msg2 = Person()
str = bytearray(b'\x08\x9a\x05\x12\x03\x54\x6f\x6d\x1a\xfd\x96\xd1\x08\x0a\x09\x53\x4f\x4d\x45\x57\x48\x45\x52\x45')
msg2.ParseFromString(str)
print(msg2)
The result of this is perfect:
age: 666
name: "Tom"
edu {
college: "SOMEWHERE"
}
The conclusion I come to is that there are some different ways of encoding in Protobuf. I do not know what is done in this case but I know the example of a negative 32 bit varint. A positive varint is encoded in five bytes, a negative value is cast as a 64 bit value and encoded to ten bytes.

Related

how to strip out non-human-readable character at the start of each line using Xcode

I am trying to set up Xcode to get rid of non-human readable characters in legacy text files recovered from 8” floppy disks created in 1986. The files were created in QDOS, a proprietary disk operating system using a text-based Music Composition Language application aka MCL.
I aim to write a C program to read the ascii file, character by character, filter out non-printable characters from the source file and save it to a destination file thereby making it possible to view file contents in exactly the same format a composer would have seen it in 1986.
When Xcode reads the legacy text file, the unwanted character appears as the first human readable character of every line except the first line.
!B=24:Af
* BAR 1
G2,6
* BAR 2 & 3
!G2,1/4:Bf2,1/4:C2,1/4:Ef2,1/4:F3,1/4:G3,35/4:D3:A4
"* BAR 4
#Bf4:G4,2:D3:A4:Bf4
$* BAR 5
%D4,2:C4,3:F5
&* BAR 6
'D4:Bf4:A4,2:G4:D3:?
(* BAR 7 &
A hex dump of the above text file shows the two ascii bytes $0D (Carriage Return) followed by $1C (File Separator). These two bytes plus the byte that follows immediately after them, are the characters I am trying to remove.
0000: 1C 1D 21 42 3D 32 34 3A 41 66 0A 1C 1E 2A 20 20 ¿¿!B=24:Af¬¿¿*
0010: 20 20 20 20 20 20 20 20 20 42 41 52 20 31 0A 1C BAR 1¬¿
0020: 1F 47 32 2C 36 0A 1C 20 2A 20 20 20 20 20 20 20 ¿G2,6¬¿ *
0030: 20 20 20 20 42 41 52 20 32 20 26 20 33 0A 1C 21 BAR 2 & 3¬¿!
0040: 47 32 2C 31 2F 34 3A 42 66 32 2C 31 2F 34 3A 43 G2,1/4:Bf2,1/4:C
0050: 32 2C 31 2F 34 3A 45 66 32 2C 31 2F 34 3A 46 33 2,1/4:Ef2,1/4:F3
0060: 2C 31 2F 34 3A 47 33 2C 33 35 2F 34 3A 44 33 3A ,1/4:G3,35/4:D3:
0070: 41 34 0A 1C 22 2A 20 20 20 20 20 20 20 20 20 20 A4¬¿"*
0080: 20 42 41 52 20 34 20 0A 1C 23 42 66 34 3A 47 34 BAR 4 ¬¿#Bf4:G4
0090: 2C 32 3A 44 33 3A 41 34 3A 42 66 34 0A 1C 24 2A ,2:D3:A4:Bf4¬¿$*
00A0: 20 20 20 20 20 20 20 20 20 20 20 42 41 52 20 35 BAR 5
00B0: 0A 1C 25 44 34 2C 32 3A 43 34 2C 33 3A 46 35 0A ¬¿%D4,2:C4,3:F5¬
00C0: 1C 26 2A 20 20 20 20 20 20 20 20 20 20 20 42 41 ¿&* BA
00D0: 52 20 36 0A 1C 27 44 34 3A 42 66 34 3A 41 34 2C R 6¬¿'D4:Bf4:A4,
00E0: 32 3A 47 34 3A 44 33 3A 3F 0A 1C 28 2A 20 20 20 2:G4:D3:?¬¿(*
00F0: 20 20 20 20 20 20 20 20 42 41 52 20 37 20 26 20 BAR 7 &
I created an Xcode Command Line Tool Project. When I select Type : Plain Text and Text Encoding : Unicode (UTF-8) in the Xcode Inspectors Window the same single printable character is visible. I chose those settings because my MacOS expects en_AU.UTF-8.
The C code that follows makes an identical copy of the text file without identifying individual characters. Essentially it will read old file contents and write a new file successfully. The hex dump for the output file is identical to the hex dump above.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(int argc, const char * argv[]) {
char filename[] = {"~/Desktop/MCLRead/bell1.ss"} ;
printf("MCLRead\n\t%s\n", filename);
FILE* fin = fopen(filename, "r");
if (!fin) { perror("input error"); return 0; }
FILE* fout = fopen("output.txt", "w");
if (!fout) { perror("fout error"); return 0; }
fseek(fin, 0, SEEK_END); // go to the end of file
size_t filesize = ftell(fin); // get file size
fseek(fin, 0, SEEK_SET); // go back to the beginning
//allocate enough memory
char* buffer = malloc(filesize * sizeof(char));
//read one character at a time (or `fread` the whole file)
size_t i = 0;
while (1)
{
int c = fgetc(fin);
if (c == EOF) break;
//save to buffer
buffer[i++] = (char)c;
}
However when I compile, build and run this in Xcode the characters are unrecognisable regardless of the Type or Text Encoding settings in the Xcode Inspectors Window. The following error message appears in the Console Window
error: No such file or directory
Program ended with exit code: 0
When I run the same code in the Terminal Window it generates an output text file but the characters are unrecognisable
Desktop % gcc main.c
Desktop % ./a.out output.txt
Desktop % cat output.txt
cat results in a string of 128 ? characters in the Terminal Command Line - a total of 128 even though the file contains more than a thousand characters in total.
Can someone give me any clues for making this text file readable in a format that allows the non-human-readable characters to be stripped from the start of each line.
Please note, I am not asking for help to write the C code but rather what Text Format will make the unwanted 8-bit characters readable so I can remove them (a slight refinement on the question I asked initially). Any further help would be most appreciated. Thanks in advance.
Note
This post has been revised in response to comments.
The hex dump has been done as text rather than as an image. This offers the most reliable way to share the text file for anyone who wants to test what I have done
The problem can be solved easily by reading each byte as a 7-bit binary value using int not char. Source file is read in hex, saved in decimal and read as text.
Note. There is no EOF character. MCL used the word 'END' at the end of the file. Because it has been salvaged from a floppy disk image, the file sometimes has a trailing string of hex E5 characters written on the floppy disk when it was formatted. At other times where the format track is already overwritten the file has a trailing string of zeros.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define CR 0x0D // ASCII Carriage Return
#define FS 0x1C // ASCII File Separator
#define FD_FORMAT 0xE5 // floppy disk format track
int main(int argc, const char * argv[])
{
char fname[20];
printf("\n Enter MCL file name : ");
scanf("%s", fname);
printf("\n\t%s\n", fname);
int a = 0; // init CR holder
int b = a; // init File Separator holder
FILE* fin = fopen(fname, "r"); // init read
if (!fin)
{ perror("input error"); return 0;
}
FILE* fout = fopen("output.txt", "w"); // init write
if (!fout)
{ perror("fout error"); return 0;
}
fseek(fin, 0, SEEK_END); // look for end of file
size_t fsize = ftell(fin); // get file size
fseek(fin, 0, SEEK_SET); // go back to the start
int* buffer = malloc(fsize * sizeof(int)); // allocate buffer
size_t i = 0;
while (1)
{
int c = fgetc(fin); // read one byte at a time
if (c < CR) break; // skip low control codes
if (c == FD_FORMAT) break; // skip floppy format track
printf("\t%X", a);
printf("\t%X", b);
if ((a != CR) && (b != FS)) // skip save if new line
{
printf("\t%0X\n", c);
buffer[i++] = c; // save to buffer
}
a = b;
b = c;
}
for (i = 0; i < fsize; i++) // write out int by int
fputc(buffer[i], fout);
free(buffer);
fclose(fin);
fclose(fout);
return 0;
}

Protobuf: ParseFromString method gives error "raise message_mod.DecodeError('Field number 0 is illegal.')" for message with oneof field

I have a kafka topic that has protobuf message of format:
message CreditTransaction {
string date = 1;
float amount = 2;
}
message DebitTransaction {
string date = 1;
float amount = 2;
}
...
.. # other message definitions
message TransactionEvent {
oneof event {
CreditTransaction credit = 1;
DebitTransaction debit = 2;
Trade trade = 3;
....
..# other fields
}
};
Using pyspark-streaming, when I am trying to use ParseFromString method to parse it, its giving me error:
File "./google.zip/google/protobuf/message.py", line 202, in ParseFromString
return self.MergeFromString(serialized)
File "./google.zip/google/protobuf/internal/python_message.py", line 1128, in MergeFromString
if self._InternalParse(serialized, 0, length) != length:
File "./google.zip/google/protobuf/internal/python_message.py", line 1178, in InternalParse
raise message_mod.DecodeError('Field number 0 is illegal.')
google.protobuf.message.DecodeError: Field number 0 is illegal.
Is it because the message TransactionEvent has only a single field and that too oneof type ?
I tried to add a dummy int64 id field also
message TransactionEvent {
int64 id = 1;
oneof event {
CreditTransaction credit = 2;
DebitTransaction debit = 3;
Trade trade = 4;
....
..# other fields
}
};
but still the same error.
Code I am using:
def parse_protobuf_from_bytes(msg_bytes):
msg = schema_pb2.MarketDataEvent()
msg.ParseFromString(msg_bytes)
eventStr = msg.WhichOneof("event")
if eventStr=="credit":
# some code
elif eventStr=="debit":
# some code
return str(concatenatedFieldsValue)
parse_protobuf = udf(lambda x: parse_protobuf_from_bytes(x), StringType())
kafka_conf = {
"kafka.bootstrap.servers": "kafka.broker.com:9092",
"checkpointLocation": "/user/aiman/checkpoint/kafka_local/transactions",
"subscribe": "TRANSACTIONS",
"startingOffsets": "earliest",
"enable.auto.commit": False,
"value.deserializer": "ByteArrayDeserializer",
"group.id": "my-group"
}
df = spark.readStream \
.format("kafka") \
.options(**kafka_conf) \
.load()
data = df.selectExpr("offset","CAST(key AS STRING)", "value") \
.withColumn("event", parse_protobuf(col("value")))
df2 = data.select(col("offset"),col("event"))
If I am just printing the bytes without parsing, I am getting this:
-------------------------------------------
Batch: 0
-------------------------------------------
+-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|offset |event |
+-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|7777777|bytearray(b'\x00\x00\x00\x00\xc2\x02\x0e\x1a]\n\x11RIOT_230120P35.00\x10\x80\xa6\xae\x82\xd9\xed\xf1\xfe\x16\x18\xcd\xd9\xd9\x82\xd9\xed\xf1\xfe\x16 \xe2\xf7\xd9\x82\xd9\xed\xf1\xfe\x16(\x95\xa2\xed\xff\xd9\xed\xf1\xfe\x160\x8c\xaa\xed\xff\xd9\xed\xf1\xfe\x168\x80\xd1\xb6\xc1\x0b#\xc0\x8d\xa3\xba\x0bH\x19P\x04Z\x02Q_b\x02A_') |
+-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
The raw data you are trying to decode is:
b'\x00\x00\x00\x00\xc2\x02\x0e\x1a]\n\x11RIOT_230120P35.00\x10\x80\xa6\xae\x82\xd9\xed\xf1\xfe\x16\x18\xcd\xd9\xd9\x82\xd9\xed\xf1\xfe\x16 \xe2\xf7\xd9\x82\xd9\xed\xf1\xfe\x16(\x95\xa2\xed\xff\xd9\xed\xf1\xfe\x160\x8c\xaa\xed\xff\xd9\xed\xf1\xfe\x168\x80\xd1\xb6\xc1\x0b#\xc0\x8d\xa3\xba\x0bH\x19P\x04Z\x02Q_b\x02A_'
A valid protobuf message never starts with 0x00, because the field number 0 is reserved. The message appears to have some extra data in the beginning.
Comparing with the protobuf encoding specification, we can try to make sense of this.
Starting from the string RIOT_230120P35.00, it is correctly prefixed by length 0x11 (17 characters). The previous byte is 0x0A, which is tag for field 1 with type string, such as in CreditTransaction message. Reading the message backwards from there, everything looks reasonable up to 0x1A byte.
After stripping the first 7 bytes and converting to hex (1a 5d 0a 11 52 49 4f 54 5f 32 33 30 31 32 30 50 33 35 2e 30 30 10 80 a6 ae 82 d9 ed f1 fe 16 18 cd d9 d9 82 d9 ed f1 fe 16 20 e2 f7 d9 82 d9 ed f1 fe 16 28 95 a2 ed ff d9 ed f1 fe 16 30 8c aa ed ff d9 ed f1 fe 16 38 80 d1 b6 c1 0b 40 c0 8d a3 ba 0b 48 19 50 04 5a 02 51 5f 62 02 41 5f), the message is accepted by online protobuf decoder.
It seems the message has 7 extra bytes in the beginning for some reason. These bytes do not conform to the protobuf format and their meaning cannot be determined without some information from the developer of the other endpoint of the communication.

How to find AID number of my local transportation card?

I have a transportation card for urban transportation. I need to know what aid(application identifier) number of the card is. According to EMV Book 1, i have to use the List of AIDs method (page 141). But how?
I also have a ACR122U card reader. I can send an APDU command to the card. All i need is the AID of the card. In addition, i always get SW=6A82 error. It means "File Not Found". I suppose, i need to know true AID number to solve this problem. I want to see SW=9000 (successful) response...
Edit: Code for creating select apdu command
private static final byte[] CLA_INS_P1_P2 = { 0x00, (byte)0xA4, 0x04, 0x00 };
private static final byte[] AID_ANDROID = { (byte)0xF0, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06 };
private byte[] createSelectAidApdu(byte[] aid) {
byte[] result = new byte[6 + aid.length];
System.arraycopy(CLA_INS_P1_P2, 0, result, 0, CLA_INS_P1_P2.length);
result[4] = (byte)aid.length;
System.arraycopy(aid, 0, result, 5, aid.length);
result[result.length - 1] = 0;
return result;
}
Thanks..
Typically, you should lookup the card documentation, which should describe how files organized.
However, since you're reading an ISO-DEP card, you may refer to ISO/IEC CD 7816-4. The card should implement part of instructions in this standard. According to Section 5.2, the file can be chosen using its identifier, which means you are able to enumerate all files located in MF.
So a possible solution is:
Send select file by identifier instruction as
00 A4 00 00 02 id 00
Where id ranges from 0000 to FFFF.
Once you receive SW=9000, the response should contain file control information (FCI, see Section 5.6). You can then find the DF name after byte 84. For example, a card responds
6F 15 84 0D 4E 43 2E 65 43 61 72 64 2E 44 46 30 31 A5 04 9F 08 01 02 90 00,
the DF name is 4E 43 2E 65 43 61 72 64 2E 44 46 30 31. The byte 0D after 84 indicates the length of DF name is 0x0D. You may use it as AID.

Hashing a 64-bit value into a 32-bit MAC address

I'm looking into suggestions on how to convert a 64-bit die revision field into a 32-bit MAC address I can use a for a wireless application to avoid collisions.
The die information is
struct {
uint32_t lot;
uint16_t X_coordinate;
uint16_t Y_coordinate;
}
I don't know the range of coordinates, but based on a few samples, I think the coordinates are limited to < 256. That effectly reduces the space by 2 bytes. But the lot number is fully populated.
I'm going to try this (pseudocode to make it readable, I'm leaving the casts out)
MAC = X_coordinate | Y_coordinate << 8 | lot << 16;
and throw away the top 16 bits of the lot and the top 8 bits of the coordinates. I feel though that maybe I should XOR in the top 16 bits of the lot somewhere, but I have no experience with this in the real world.
Here is a sample of die revision information: little endian byte dump
lot/wafer ID X coordinate Y coordinate
C3 1B B0 46 20 00 22 00
CB 8B 94 46 14 00 32 00
CB 8B 94 46 27 00 1E 00
B9 F7 80 6F 20 00 08 00

Capicom 3des: 2 key or 3 key?

Much searching and reading has not told me whether the capicom.encrypteddata class module (it's VB6, but that shouldn't matter in answering this question) is using 2-key 3DES or 3-key 3DES. (.Algorithm.Name = CAPICOM_ENCRYPTION_ALGORITHM_3DES) Anyone know which one it is using? A source of this information would also be helpful. I suspect, since I don't think high enough key lengths are supported, that it is 2DES. But I haven't found acceptable confirmation.
CAPICOM is a thin wrapper on top of CryptoAPI. If you decode the output from EncryptedData.Encrypt() you will see something like this (it is ASN.1 encoded in a proprietary format):
SEQUENCE {
OBJECT IDENTIFIER '1 3 6 1 4 1 311 88 3'
[0] {
SEQUENCE {
OBJECT IDENTIFIER '1 3 6 1 4 1 311 88 3 1'
[0] {
SEQUENCE {
INTEGER 131073
INTEGER 26115
INTEGER 192
OCTET STRING
AA A6 05 4E FA AF 4C 0B
OCTET STRING
3A 22 58 C3 51 D8 91 C8 7B 3C C9 51 9B E7 BA B7
OCTET STRING
84 FA 56 AF 01 FE C9 74
}
}
}
}
}
Note the 26115. That is the value for CALG_3DES, which is the CryptoAPI identifier for 3DES with three keys (3DES with two keys is called CALG_3DES_112). The 192 is the key-length, also match three-key 3DES:

Resources