I'm trying to convert the following C structure:
typedef struct _algo_stream_debug_pkt {
uint16_t src;
uint16_t dest;
uint16_t length;
uint16_t crc;
uint8_t command;
uint8_t status;
uint16_t sequence_num;
uint32_t timestamp;
uint16_t data_state;
uint16_t algo_out_x1;
uint16_t algo_out_x2;
uint16_t algo_out_x3;
uint16_t algo_interval;
uint16_t debug_info[10];
} algo_stream_debug_pkt;
to a protocol buffer with the following .proto file:
message m_header {
required int32 src = 1;
required int32 dst = 2;
required int32 len = 3;
required int32 crc = 4;
}
message algo_stream_debug_pkt {
required m_header header = 1;
required int32 command = 2;
required int32 status = 3;
required int32 sequence_num = 4;
required int32 timestamp = 5;
required int32 data_state = 6;
required int32 algo_out_x1 = 7;
required int32 algo_out_x2 = 8;
required int32 algo_out_x3 = 9;
required int32 algo_interval = 10;
repeated int32 debug_info = 11;
}
and the following .options file:
algo_stream_debug_pkt.debug_info max_count:10
The issue is that the encoded protocol buffer data of type algo_stream_debug_pkt takes approximately 58 bytes (when streamed to memory) in my use case, while the plain C structure uses just 48 bytes [ sizeof(algo_stream_debug_pkt) -> 48 ]. I'm wondering why the protocol buffer version consumes more space than the non-encoded C struct version, given that nanopb uses varints to represent integer types, so the encoded version should consume less memory than the originally specified type.
Each protobuf field is prefixed with its field tag, which takes up at least one byte. That is 15 bytes spent on field tags, plus 2 bytes for the length prefixes of the header submessage and the debug_info array.
On the other hand, some space is saved by encoding the values as variable-length integers, which take only one byte for values less than 128.
In general, the protobuf format is quite space-efficient, but it is not a compression format. The requirement of forward and backward compatibility in the protocol also adds some overhead.
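If you want to see the exact figure for a given packet, nanopb can compute it before you encode. A minimal sketch, assuming the generated header is called algo_stream_debug.pb.h and the generated descriptor is algo_stream_debug_pkt_fields (both names depend on your file and generator settings):
#include <pb_encode.h>
#include "algo_stream_debug.pb.h"  /* assumed name of the nanopb-generated header */

/* Returns the number of bytes the packet will occupy on the wire,
   or 0 if encoding would fail (e.g. a required field is unset). */
size_t encoded_size_of(const algo_stream_debug_pkt *pkt)
{
    size_t size = 0;
    if (!pb_get_encoded_size(&size, algo_stream_debug_pkt_fields, pkt))
        return 0;
    return size;
}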
Related
I'm writing a muxer DirectShow filter using libav. I need to redirect the muxer's output to the filter's output pin, so I use avio_alloc_context() to create an AVIOContext with my write_packet and seek callback functions. These two functions are declared as follows:
int (*write_packet)(void *opaque, uint8_t *buf, int buf_size)
int64_t (*seek)(void *opaque, int64_t offset, int whence)
I understand the meaning of these functions' input parameters, but what do their return values mean? Is it the number of bytes actually written?
int (*write_packet)(void *opaque, uint8_t *buf, int buf_size)
Number of bytes written. Negative values indicate error.
int64_t (*seek)(void *opaque, int64_t offset, int whence)
The resulting position, in bytes, after the seek call, measured from the start of the output file. Negative values indicate an error.
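For illustration, here is a minimal sketch of a write_packet callback that follows this contract; MySink and its fields are hypothetical stand-ins for whatever buffer your filter actually writes into:
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <libavutil/error.h>

typedef struct MySink {      /* hypothetical sink owned by the filter */
    uint8_t *data;
    int      used;
    int      capacity;
} MySink;

static int my_write_packet(void *opaque, uint8_t *buf, int buf_size)
{
    MySink *sink = (MySink *)opaque;
    if (sink->used + buf_size > sink->capacity)
        return AVERROR(ENOMEM);          /* a negative value reports the error */
    memcpy(sink->data + sink->used, buf, buf_size);
    sink->used += buf_size;
    return buf_size;                     /* the number of bytes actually written */
}
You would then pass my_write_packet (and a matching seek callback, if the container needs one) to avio_alloc_context().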
I want to define a Point message in Protocol Buffers which represents an RGB Colored Point in 3-dimensional space.
message Point {
float x = 1;
float y = 2;
float z = 3;
uint8_t r = 4;
uint8_t g = 5;
uint8_t b = 6;
}
Here, x, y, z variables defines the position of Point and r, g, b defines the color in RGB space.
Since uint8_t is not defined in Protocol Buffers, I am looking for a workaround to define it. At present, I am using uint32 in place of uint8_t.
There isn't anything in protobuf that represents a single byte - it simply isn't a thing that the wire-format worries about. The options are:
varint (up to 64 bits input, up to 10 bytes on the wire depending on the highest set bit)
fixed 32 bit
fixed 64 bit
length-prefixed (strings, sub-objects, packed arrays)
(group tokens; a rare implementation detail)
A single byte isn't a good fit for any of those. Frankly, I'd use a single fixed32 for all 3, and combine/decompose the 3 bytes manually (via shifting etc). The advantage here is that it would only have one field header for the 3 bytes, and wouldn't be artificially stretched via having high bits (I'm not sure that a composed RGB value is a good candidate for varint). You'd also have a spare byte if you want to add something else at a later date (alpha, maybe).
So:
message Point {
float x = 1;
float y = 2;
float z = 3;
fixed32 rgb = 4;
}
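The pack/unpack helpers on the C side could then look roughly like this (a sketch, independent of any particular protobuf library):
#include <stdint.h>

/* Pack three 8-bit channels into the single fixed32 'rgb' field. */
static uint32_t pack_rgb(uint8_t r, uint8_t g, uint8_t b)
{
    return ((uint32_t)r << 16) | ((uint32_t)g << 8) | (uint32_t)b;
}

/* Decompose the fixed32 field back into its channels. */
static void unpack_rgb(uint32_t rgb, uint8_t *r, uint8_t *g, uint8_t *b)
{
    *r = (uint8_t)((rgb >> 16) & 0xFF);
    *g = (uint8_t)((rgb >> 8) & 0xFF);
    *b = (uint8_t)(rgb & 0xFF);
}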
IMHO this is the correct approach. You should use the smallest data type capable of holding all values to be sent between the systems, and the source and destination systems should validate that the data is in the correct range. For uint8_t, that is indeed int32.
Some protocol buffer implementations actually allow this. In particular, nanopb lets you either keep an .options file alongside the .proto file or use its extension directly in the .proto file to fine-tune the interpretation of individual fields.
Specifying int_size = IS_8 turns a uint32 field in the message into a uint8_t in the generated structure.
import "nanopb.proto";
message Point {
float x = 1;
float y = 2;
float z = 3;
uint32 r = 4 [(nanopb).int_size = IS_8];
uint32 g = 5 [(nanopb).int_size = IS_8];
uint32 b = 6 [(nanopb).int_size = IS_8];
}
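With those options, the structure nanopb generates should look roughly like this (the exact type name depends on your package and generator settings):
#include <stdint.h>

typedef struct _Point {
    float   x;
    float   y;
    float   z;
    uint8_t r;   /* int_size = IS_8 shrinks the uint32 fields to uint8_t */
    uint8_t g;
    uint8_t b;
} Point;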
typedef struct
{
long nIndex; // object index
TCHAR path[3 * MAX_TEXT_FIELD_SIZE];
} structItems;
void method1(LPCTSTR pInput, LPTSTR pOutput, size_t iSizeOfOutput)
{
size_t iLength = 0;
iLength = _tcslen(pInput);
if (iLength > iSizeOfOutput + sizeof(TCHAR))
iLength = iSizeOfOutput - sizeof(TCHAR);
memset(pOutput, 0, iSizeOfOutput); // Access violation error
}
void main()
{
CString csSysPath = _T("fghjjjjjjjjjjjjjjjj");
structItems *pIndexSyspath = nullptr;
pIndexSyspath = (structItems *)calloc(1, sizeof(structItems) * 15555555); //If i put size as 1555555 then it works well
method1(csSysPath, pIndexSyspath[0].path, (sizeof(TCHAR) * (3 * MAX_TEXT_FIELD_SIZE)));
}
This is sample code which causes the crash.
In the above code, if I use 1555555 as the size then it works well (I randomly dropped one digit from the size).
This is a 32-bit application running on a 64-bit Windows OS with 16 GB of RAM.
I kindly request someone to help me understand the reason for the failure and the relation between calloc, the size, and memset.
typedef struct
{
long nIndex; // 4 bytes on Windows
TCHAR path[3 * MAX_TEXT_FIELD_SIZE]; // 1 * 3 * 255 bytes for non-unicode
} structItems;
Supposing non-Unicode, TCHAR is 1 byte and MAX_TEXT_FIELD_SIZE is 255, so sizeof(structItems) is 255*3 + 4, which is 769 bytes per struct. Now, you want to allocate sizeof(structItems) * 15555555, which is more than 11 GiB. That cannot fit into the 2 GiB of address space available to a 32-bit process, so calloc fails and returns NULL, and dereferencing pIndexSyspath[0].path inside method1 is what triggers the access violation in memset.
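As a minimal sketch of the fix inside main(), check what calloc returns before touching the memory (and pass the count and the element size as separate arguments, which is what calloc's two parameters are for):
structItems *pIndexSyspath = (structItems *)calloc(15555555, sizeof(structItems));
if (pIndexSyspath == NULL)
{
    /* The ~11 GiB request cannot be satisfied, so calloc returns NULL.
       Without this check, pIndexSyspath[0].path points just past address 0
       and the memset inside method1 faults with an access violation. */
    return;
}
method1(csSysPath, pIndexSyspath[0].path, sizeof(pIndexSyspath[0].path));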
To convert a byte array from another machine which is big-endian, we can use:
long long convert(unsigned char data[]) {
long long res;
res = 0;
for( int i=0;i < DATA_SIZE; ++i)
res = (res << 8) + data[i];
return res;
}
If the other machine is little-endian, we can use:
long long convert(unsigned char data[]) {
long long res;
res = 0;
for( int i=DATA_SIZE-1;i >=0 ; --i)
res = (res << 8) + data[i];
return res;
}
Why do we need the above functions? Shouldn't we use hton at the sender and ntoh when receiving? Is it because hton/ntoh converts integers while this convert() is for char arrays?
The hton/ntoh functions convert between network byte order and host byte order. If these two are the same (i.e., on big-endian machines), these functions do nothing, so they cannot be portably relied upon to swap endianness. Also, as you pointed out, they are only defined for 16-bit (htons) and 32-bit (htonl) integers; your code can handle up to sizeof(long long) bytes, depending on how DATA_SIZE is set.
Through the network you always receive a series of bytes (octets), which you can't directly pass to ntohs or ntohl. Supposing the incoming bytes are buffered in the (unsigned) char array buf, you could do
short x = ntohs(*(short *)(buf+offset));
but this is not portable unless buf+offset is always even, so that you read with correct alignment. Similarly, to do
long y = ntohl(*(long *)(buf+offset));
you have to make sure that 4 divides buf+offset. Your convert() functions, though, don't have this limitation; they can process a byte series at an arbitrary (unaligned) memory address.
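If you do want to reuse ntohl for 32-bit values without worrying about alignment, a common sketch (assuming the POSIX <arpa/inet.h> header) is to memcpy the bytes into an aligned temporary first:
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

uint32_t read_u32_net(const unsigned char *buf, size_t offset)
{
    uint32_t tmp;
    memcpy(&tmp, buf + offset, sizeof tmp);  /* safe at any alignment */
    return ntohl(tmp);                       /* network (big-endian) to host order */
}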
I need to interface Ruby with a C-function which does low-level byte operations on a fixed-sized buffer (16 bytes length).
I should also mention that I am using Ruby 1.8.7 for this, so no headaches with Ruby trying to figure out encodings.
void transmogrify(unsigned char *data_in, unsigned char *data_out)
{
    unsigned char buffer[16];
    int i;
    for (i = 0; i < 16; i++) {
        buffer[i] = data_in[i] << 2; // some low level operations on 8-bit values here
    }
    memcpy(data_out, buffer, sizeof(buffer));
}
How do I write the Ruby C-interface code in my library, so that it will handle binary data in strings correctly? e.g. that it will be robust against zero bytes in the fixed-length input string?
If I use StringValuePtr() it breaks on zero bytes.
static VALUE mygem_transmogrify(VALUE self, VALUE input)
{
unsigned char output[16];
if (TYPE(input) != T_STRING) { return Qnil; } // make sure input is a String
StringValuePtr(input); // <--- THIS does not like zero bytes in the string
transmogrify( (unsigned char *) RSTRING_PTR(input) , (unsigned char *) output);
return rb_str_new( (char *) output, 16);
}
I want to call this from Ruby as follows:
input_string = File.read( 'somefile') # reads 16 bytes binary
output = MyGem.transmogrify( input_string ) # output should be 16 bytes binary
=> ... in 'transmogrify': string contains null byte (ArgumentError)
I figured it out:
static VALUE mygem_transmogrify(VALUE self, VALUE input)
{
unsigned char output[16];
if (TYPE(input) != T_STRING) { return Qnil; } // make sure input is a String
transmogrify( (unsigned char *) RSTRING(input)->ptr , (unsigned char *) output);
return rb_str_new( (char *) output, 16);
}
This does not break on zero bytes in the input string.
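One further hardening worth considering (a sketch assuming Ruby 1.8.7's RSTRING layout; the _checked name is just for illustration): verify that the string really holds at least 16 bytes, otherwise transmogrify reads past the end of the Ruby buffer.
#include <ruby.h>

static VALUE mygem_transmogrify_checked(VALUE self, VALUE input)
{
    unsigned char output[16];

    Check_Type(input, T_STRING);             /* raises TypeError for non-strings */
    if (RSTRING(input)->len < 16)
        rb_raise(rb_eArgError, "input must be at least 16 bytes");

    transmogrify((unsigned char *) RSTRING(input)->ptr, output);
    return rb_str_new((char *) output, 16);  /* length-based, so NUL-safe */
}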