Binary file and endianness - binaryfiles

Short question: if I set the first byte of a binary file as a flag indicating whether the file was created on little-endian hardware, is that a good approach? The first byte could hold the value 0 or 1, which could later be cast to bool.

Endianness can differ between hardware platforms. A "flag" as you describe is an OK way to keep track of the endianness of a file.
You might want to consider forcing an endianness on your data.
Is there a way to enforce specific endianness for a C or C++ struct?
The question linked above explains how to force a specific endianness on your data, which I believe is the better way.
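For what it's worth, the flag approach from the question could look roughly like the sketch below. The 5-byte record layout (one flag byte followed by a raw uint32_t) and the names write_record, read_record and swap_u32 are made up for illustration; the reader swaps bytes only when the flag says the writer had the opposite byte order.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 on a little-endian host, 0 on a big-endian one. */
static int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    uint8_t first;
    memcpy(&first, &probe, 1);   /* inspect the lowest-addressed byte */
    return first == 1;
}

/* Reverses the byte order of a 32-bit value. */
static uint32_t swap_u32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

/* Hypothetical layout: buf[0] is the flag, buf[1..4] is a raw uint32_t. */
static void write_record(uint8_t buf[5], uint32_t value)
{
    assert(buf != NULL);
    buf[0] = (uint8_t)host_is_little_endian();
    memcpy(buf + 1, &value, sizeof value);   /* stored in host byte order */
}

static uint32_t read_record(const uint8_t buf[5])
{
    uint32_t value;
    assert(buf != NULL);
    memcpy(&value, buf + 1, sizeof value);
    /* Swap only when the writer's byte order differs from ours. */
    if ((buf[0] != 0) != (host_is_little_endian() != 0))
        value = swap_u32(value);
    return value;
}
```

Every reader has to carry both code paths here, which is one reason a fixed on-disk byte order tends to be simpler.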

Related

How to represent byte and word (int16) in protobuf .proto file? [duplicate]

What data type do I use to store a single byte in a protocol buffer message? Seeing the list at https://developers.google.com/protocol-buffers/docs/proto#scalar it seems like one of the *int32 types are the best fit. Is there a more efficient way to store a single byte?
Well, you need to understand that it will take at least two bytes anyway: one for the tag and one for the data. (The tag will take more space if the field number is high.) If you use uint32, the data takes 1 byte for values up to 127 and 2 bytes for values from 128 up to 16383, which covers the rest of a byte's range.
I don't believe there's anything that will be more efficient than that.
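To see where those byte counts come from, here is a minimal sketch of the varint encoding that the protobuf wire format uses for uint32 fields. This is hand-written for illustration, not the protobuf library API; encode_varint32 and encode_uint32_field are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Writes v as a base-128 varint (7 payload bits per byte, high bit set
   on every byte except the last). A 32-bit value needs at most 5 bytes.
   Returns the number of bytes written. */
static size_t encode_varint32(uint32_t v, uint8_t *out)
{
    size_t n = 0;
    assert(out != NULL);
    while (v >= 0x80u) {
        out[n++] = (uint8_t)((v & 0x7Fu) | 0x80u);
        v >>= 7;
    }
    out[n++] = (uint8_t)v;
    return n;
}

/* Writes one uint32 field: the tag is (field_number << 3) | wire_type,
   and wire type 0 means "varint". Returns the total bytes written. */
static size_t encode_uint32_field(uint32_t field_number, uint32_t value,
                                  uint8_t *out)
{
    size_t n = encode_varint32(field_number << 3, out);
    return n + encode_varint32(value, out + n);
}
```

With field number 1, a value of 127 comes out as 2 bytes in total and a value of 200 as 3 bytes. Field numbers of 16 and above need a second tag byte.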

Protocol Buffers - Best practice for repeated boolean values

I need to transfer some data over a relatively slow connection (down to only 1Kb/s). I have read that the encoding of Google's Protocol Buffers is efficient.
That's true for most of my data, but not for boolean values, especially in a repeated field.
The problem is that I have to transfer, besides other data, a fixed number (15) of boolean values every 50 milliseconds. Protobuf encodes each boolean value as one byte for the field ID and one byte for the value (0x00 or 0x01), which results in 30 bytes of data for 15 boolean values.
So I am now looking for a better way of encoding this. Has anybody else had this problem? What would be the best practice for an efficient encoding in this situation?
My idea was to use an integer type (uint32) and encode the data manually, using one bit of the integer per bool. Any feedback on this idea?
In Protobuf, your best bet is to use an integer bitfield. If you have more than 64 bits, use a bytes field (and pack the bits manually).
Note that Cap'n Proto will pack boolean values (in both structs and lists) as individual bits, and so may be worth looking at.
However, if you are extremely bandwidth-constrained, it may be best to develop your own custom protocol. Most of these serialization frameworks trade off a little bit of space for ease of use (especially when it comes to dealing with version skew), but in your case it may be more important to focus solely on size. A custom message format that just contains some bits should be easy enough to maintain and can be packed as tightly as you want.
(Disclosure: I am the author of Cap'n Proto, as well as most of Google's open source Protobuf code.)
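The integer bitfield idea from the question and the answer could be sketched like this. FLAG_COUNT, pack_flags and unpack_flags are illustrative names, and the choice to map flag i to bit i is an assumption; any fixed mapping works as long as both sides agree on it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FLAG_COUNT 15   /* number of booleans sent per message */

/* Packs FLAG_COUNT booleans into the low bits of one integer:
   flags[i] ends up in bit i. */
static uint32_t pack_flags(const bool flags[FLAG_COUNT])
{
    uint32_t bits = 0;
    assert(flags != NULL);
    for (size_t i = 0; i < FLAG_COUNT; ++i)
        if (flags[i])
            bits |= (uint32_t)1 << i;
    return bits;
}

/* Reverses pack_flags: bit i of bits is written to flags[i]. */
static void unpack_flags(uint32_t bits, bool flags[FLAG_COUNT])
{
    assert(flags != NULL);
    for (size_t i = 0; i < FLAG_COUNT; ++i)
        flags[i] = ((bits >> i) & 1u) != 0;
}
```

Sent as a single uint32 field, 15 bits need at most 3 varint bytes plus the tag, instead of 30 bytes.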

What is a good way to deal with byte alignment and endianness when packing a struct?

My current design involves communication between an embedded system and a PC, where I am always puzzled by the struct design.
The two systems have different endianness that I need to deal with. However, I find that I cannot just do a simple byte-order switch for every 4 bytes to solve the problem. It turns out to depend on the struct.
For example, a struct like this:
struct example {
    uint16_t a;
    uint32_t b;
};
would result in padding between a and b. Eventually, the endian switch has to be specific to a and b because of the padding bytes. But it looks ugly because I need to change the endian switch logic every time I change the struct content.
What is a good strategy for arranging elements in a struct when padding comes in? Should we try to rearrange the elements so that there are only padding bytes at the end of the struct?
Thanks.
I'm afraid you'll need to do some platform-neutral serialization, since different architectures have different alignment requirements. I don't think there is a safe and generic way to grab a chunk of memory, send it to another architecture, place it at some address there and read the correct data from it. Just convert and send the elements one by one: you can push the values into a buffer that will not have any padding, and you'll know exactly what is where. You also get to decide which side does the conversions (typically the PC has more resources for that). As a bonus, you can checksum or sign the communication to catch errors or tampering.
BTW, afaik, while the compiler keeps the order of the variables intact, it can theoretically put additional padding between them (e.g. for performance reasons), so it's not just an architecture-related thing.
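A minimal sketch of that element-by-element approach for the struct from the question. The wire format here (6 bytes, little-endian, no padding) is my own assumption, as are the struct tag and the helper names. Because the bytes are built with shifts, the same code works on either side regardless of host endianness or struct padding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct example {
    uint16_t a;
    uint32_t b;
};

#define EXAMPLE_WIRE_SIZE 6   /* 2 bytes for a + 4 bytes for b, no padding */

/* Writes the fields one by one in little-endian order. */
static void example_serialize(const struct example *s,
                              uint8_t out[EXAMPLE_WIRE_SIZE])
{
    assert(s != NULL && out != NULL);
    out[0] = (uint8_t)(s->a & 0xFFu);
    out[1] = (uint8_t)(s->a >> 8);
    out[2] = (uint8_t)(s->b & 0xFFu);
    out[3] = (uint8_t)((s->b >> 8) & 0xFFu);
    out[4] = (uint8_t)((s->b >> 16) & 0xFFu);
    out[5] = (uint8_t)(s->b >> 24);
}

/* Rebuilds the struct from the 6 wire bytes. */
static void example_deserialize(const uint8_t in[EXAMPLE_WIRE_SIZE],
                                struct example *s)
{
    assert(s != NULL && in != NULL);
    s->a = (uint16_t)(in[0] | (in[1] << 8));
    s->b = (uint32_t)in[2] | ((uint32_t)in[3] << 8) |
           ((uint32_t)in[4] << 16) | ((uint32_t)in[5] << 24);
}
```

Changing the struct then means touching one serialize/deserialize pair rather than a generic swap-every-4-bytes routine that silently breaks.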

endian.h for floats

I am trying to read an unformatted binary file that was written by a big-endian machine. My machine is 32-bit little-endian.
I already know how to swap bytes for different variable types, but it is cumbersome work. I found the set of functions in endian.h, which handle integer swapping very easily.
I was wondering if there is something similar for floats or strings, or if I have to program it from scratch, since they are handled differently from integers when it comes to endianness.
Thanks.
I do not think there is a standard header for swapping floats. You could take a look at http://www.gamedev.net/page/resources/_/technical/game-programming/writing-endian-independent-code-in-c-r2091
which provides some helpful code.
As for strings, there is no need to do endian-swapping. Endianness determines the order of the bytes within a multi-byte variable. A string consists of a series of chars, and each char is only one byte, so there is nothing to swap.
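One common way to handle floats, sketched below: assemble the four file bytes into a 32-bit integer with shifts, then copy the bit pattern into a float. This assumes float is 4 bytes and that both the writing and the reading machine use IEEE 754 single precision, which the question does not confirm. float_from_be_bytes is a made-up name.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Converts 4 bytes stored in big-endian order into a host float.
   Assumption: both sides use IEEE 754 binary32. */
static float float_from_be_bytes(const uint8_t bytes[4])
{
    uint32_t bits;
    float value;
    assert(sizeof value == sizeof bits);
    /* The shifts work on values, so this is correct on any host. */
    bits = ((uint32_t)bytes[0] << 24) | ((uint32_t)bytes[1] << 16) |
           ((uint32_t)bytes[2] << 8)  |  (uint32_t)bytes[3];
    /* memcpy reinterprets the bit pattern without aliasing problems. */
    memcpy(&value, &bits, sizeof value);
    return value;
}
```

For example, the bytes 3F 80 00 00 read this way give 1.0f.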

What are the alignas and alignof keywords used for?

I'm having trouble understanding what the purpose of the alignas and alignof keywords is, and I'm not quite sure I fully understand what alignment is.
As I understand it, a memory address is aligned to n bytes if it is divisible by n, that is, it can be got to by counting 'n' bytes at a time (from 0? or some default value?). Also, the alignas keyword, when prefixing a variable declaration, specifies how the address at which the variable is stored is to be aligned, and the alignof returns how a variable's address is aligned.
However, I am not confident that this is a correct understanding of alignment or the alignof/alignas keywords - please correct me on any of the points I got wrong. I also don't see what use these keywords serve, so I would appreciate it if anyone could point out what their purpose is.
Some special types must be aligned to more bytes than usual. For example, matrices must be aligned to 16 bytes on x86 for the most efficient copying to the GPU. SSE vector types can behave this way too. As such, if you want to write a container type, you must know the alignment requirements of the type you're trying to contain or allocate. alignof reports that requirement, and alignas lets you request a stricter one.
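A small sketch of both in use. It is written in C11, where alignas and alignof come from <stdalign.h>; in C++11 they are keywords and the same declarations work without the header. struct vec4 and is_aligned are illustrative names, not from any library.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 4-float vector whose storage is forced onto a
   16-byte boundary, as SSE-style code typically wants. */
struct vec4 {
    alignas(16) float v[4];
};

/* Returns 1 if the address p is a multiple of alignment.
   This is exactly the "divisible by n" definition from the question. */
static int is_aligned(const void *p, size_t alignment)
{
    assert(alignment != 0);
    return (uintptr_t)p % alignment == 0;
}
```

With this, alignof(struct vec4) evaluates to 16, and is_aligned(&x, 16) holds for any struct vec4 x that the compiler lays out. A container would query alignof(T) the same way before allocating storage for T.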
