endian.h for floats - endianness

I am trying to read an unformatted binary file that was written by a big-endian machine. My machine is a 32-bit little-endian one.
I already know how to swap bytes for the different variable types, but it is cumbersome work. I found the set of functions in endian.h that handles integer swapping very easily.
I was wondering if there is something similar for floats or strings, or if I have to program it from scratch, since they are handled differently from integers as far as endianness is concerned.
Thanks.

I do not think there is a standard header for swapping floats. You could take a look at http://www.gamedev.net/page/resources/_/technical/game-programming/writing-endian-independent-code-in-c-r2091,
which provides some helpful code.
As for strings, there is no need to do endian swapping. Endianness determines the order of the bytes within a variable; a string is a series of chars, and each char is only one byte, so there is nothing to swap.
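For floats, something along these lines should work. This is only a rough sketch, assuming 4-byte IEEE-754 floats on both machines and glibc's endian.h (be32toh is not standard C); it reuses the integer swap by going through a uint32_t:

#include <endian.h>   /* be32toh(); glibc-specific, not standard C */
#include <stdint.h>
#include <string.h>

/* Read a big-endian 32-bit float from a raw byte buffer. */
static float read_be_float(const unsigned char *buf)
{
    uint32_t u;
    memcpy(&u, buf, sizeof u);   /* copy the raw big-endian bytes     */
    u = be32toh(u);              /* swap to host order (no-op on BE)  */
    float f;
    memcpy(&f, &u, sizeof f);    /* reinterpret the swapped bits      */
    return f;
}

The memcpy calls are there so the bit pattern is reinterpreted without violating aliasing rules; the same idea extends to doubles with a uint64_t and be64toh.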

Related

Efficiency of sharing an integer variable

First of all, sorry for my English, I'm not a native speaker. That said, I've lately been working on some programs that compress string data into integer values using a Huffman tree. The compressed data is meant to be sent over LoRa from one board to another.
The problem I'm facing is that I don't know whether compressing the data is actually more efficient: I've successfully converted a char into a unique integer code, but in some extreme cases the code is a 5-digit number. So I don't know whether it's more efficient to send the char value or the integer value.
Basically, I'm trying to understand whether communicating byte by byte is more efficient than communicating char by char, and if so, by how much.
I've tried searching to see whether this question is a duplicate, but the only thing I've found is that the CPU works better with integers.
I am assuming that your "5-digit number" is all zeros and ones. In order to compress, each digit becomes a single bit in your data stream. You pack the bits, eight at a time, into bytes, so eight 5-bit values are packed into five bytes.
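A minimal bit-packing sketch in C, assuming the codes are already available as small integers (the buffer out must be zero-initialised, and the names here are just illustrative):

#include <stdint.h>
#include <stddef.h>

/* Append the low `nbits` bits of `code` (e.g. a 5-bit Huffman code)
 * to the buffer `out`, MSB-first; `bitpos` tracks the write position. */
static void put_bits(uint8_t *out, size_t *bitpos, unsigned code, unsigned nbits)
{
    for (unsigned i = nbits; i-- > 0; ) {
        if ((code >> i) & 1u)
            out[*bitpos / 8] |= (uint8_t)(0x80u >> (*bitpos % 8));
        (*bitpos)++;
    }
}

Calling put_bits eight times with nbits = 5 fills exactly five bytes (40 bits), which is the packing described above.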

What for people sometimes convert numbers or strings to bytes?

Sometimes I encounter questions about converting something to bytes. Are there situations where it is vitally important to convert to bytes, or what would I convert something to bytes for?
In most languages the most common string functions come as part of the language, or in a pre-made library/include/import, often implemented as object code that takes advantage of processor-level string instructions. However, sometimes you need to do something with a string that isn't natively supported by the language, so since the 8-bit days people have viewed strings as arrays of 7- or 8-bit characters, each of which fits within a byte, and have used conventions like ASCII to determine which byte value represents which character.
While standard languages often have functions like "string.replaceChar(OFFSET,'a')", this approach can be painstakingly slow, because each call to the replaceChar method carries processing overhead that may be greater than the processing that actually needs to be done.
There is also the simplicity factor when designing your own string algorithms, but as I said, most of the common algorithms come prebuilt in modern languages (stringCompare, trimString, reverseString, etc.).
Suppose you want to perform an operation on a string which doesn't come as standard.
For example, suppose you want to add two numbers represented as decimal digits in strings, and those numbers are larger than the 64-bit word size of the processor. The RSA encryption/decryption behind the SSL browser padlock uses numbers that don't fit into the word size of a desktop computer, yet the desktop programs that deal with RSA certificates and keys must nonetheless be able to process that data, which is actually strings.
There are many and varied reasons you might want to deal with a string as an array of bytes, but each of these reasons is fairly specialised.
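To make the big-number example concrete, here is a rough sketch in C (function name and buffer sizes are just for illustration) that adds two non-negative decimal numbers held in strings, digit by digit from the right, with a carry:

#include <stdio.h>
#include <string.h>

static void add_decimal_strings(const char *a, const char *b, char *out, size_t outsz)
{
    char tmp[256];                    /* result built least-significant digit first */
    size_t ia = strlen(a), ib = strlen(b), n = 0;
    int carry = 0;
    while ((ia > 0 || ib > 0 || carry) && n < sizeof tmp) {
        int da = ia > 0 ? a[--ia] - '0' : 0;
        int db = ib > 0 ? b[--ib] - '0' : 0;
        int s  = da + db + carry;
        tmp[n++] = (char)('0' + s % 10);
        carry = s / 10;
    }
    size_t i = 0;                     /* reverse into the caller's buffer */
    while (n > 0 && i + 1 < outsz)
        out[i++] = tmp[--n];
    out[i] = '\0';
}

int main(void)
{
    char sum[256];
    add_decimal_strings("18446744073709551616", "1", sum, sizeof sum);
    printf("%s\n", sum);              /* prints 18446744073709551617, i.e. 2^64 + 1 */
    return 0;
}

The operands here are wider than any built-in integer type, yet the addition still works because the program treats the strings as arrays of digit bytes.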

Binary file and endianness

Short question: if I set the first byte in a binary file as a flag recording whether it was created on little-endian hardware, would that be a good approach? The first byte could be a one-byte value, 0 or 1, which could be cast to bool later.
Endianness can differ between hardware platforms, and a "flag" like the one you describe is an OK way to keep track of the endianness of a file.
You might want to consider forcing an endianness on your data.
Is there a way to enforce specific endianness for a C or C++ struct?
The above question covers how to force an endianness onto your data, which I believe is the better way.
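As one way to force the endianness, you can always serialise values in a fixed byte order regardless of the host, so the reader never needs a flag at all. A minimal sketch (big-endian chosen arbitrarily here):

#include <stdint.h>
#include <stdio.h>

/* Write a 32-bit value to the file in big-endian byte order on any host. */
static int write_u32_be(FILE *fp, uint32_t v)
{
    unsigned char b[4] = {
        (unsigned char)(v >> 24),
        (unsigned char)(v >> 16),
        (unsigned char)(v >>  8),
        (unsigned char)(v)
    };
    return fwrite(b, 1, sizeof b, fp) == sizeof b ? 0 : -1;
}

The matching reader reassembles the value with shifts in the same fixed order, so the file format no longer depends on the hardware that wrote it.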

Efficient Algorithm for Parsing OpCodes

Let's say I'm writing a virtual machine. I read the program data into an array of bytes. Now I need to loop through those bytes (instructions are two bytes) and instantiate a little class representing each instruction and its arguments.
What would be a fast parsing approach? Here are the two ways I've thought of:
Logically branching by inspecting each bit from left to right until I've narrowed it down to a particular opcode. This would be like a binary search.
Inspecting some programs to come up with a list of opcodes ordered by frequency of use, and then checking for the full opcode in that order.
Note: I will be using bit shifting and masking in C to check, not regexes or string comps or anything high-level like that.
You don't need to parse anything. If this is in C, you make a table of function pointers which has 256 entries in it, one for each possible byte value, then jump to the appropriate function based on the first byte value. If the second byte is significant then a switch statement can be used within the function to handle the second byte. This is how the original Visual Basic interpreter (versions 1-6) worked.
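A minimal sketch of that dispatch-table idea in C (the opcode values and handler names here are made up for illustration):

#include <stdint.h>
#include <stdio.h>

typedef void (*op_handler)(uint8_t operand);

static void op_nop (uint8_t operand) { (void)operand; }
static void op_load(uint8_t operand) { printf("load %u\n", operand); }
static void op_bad (uint8_t operand) { (void)operand; printf("illegal opcode\n"); }

static op_handler dispatch[256];     /* one entry per possible first byte */

static void run(const uint8_t *code, size_t len)
{
    for (size_t pc = 0; pc + 1 < len; pc += 2)   /* 2-byte instructions */
        dispatch[code[pc]](code[pc + 1]);
}

int main(void)
{
    for (int i = 0; i < 256; i++) dispatch[i] = op_bad;
    dispatch[0x00] = op_nop;
    dispatch[0x01] = op_load;

    const uint8_t program[] = { 0x01, 42, 0x00, 0 };
    run(program, sizeof program);                /* prints "load 42" */
    return 0;
}

Indexing the table with the first byte costs the same no matter which opcode it is, so neither bit-by-bit branching nor frequency ordering is needed.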

How is data stored in a bit vector?

I'm a bit confused about how a fixed-size bit vector stores its data.
Let's assume that we have a bit vector bv that I want to store hello in as ASCII.
So we do bv[0]=104, bv[1]=101, bv[2]=108, bv[3]=108, bv[4]=111.
How is the ASCII of hello represented in the bit vector?
Is it as binary like this: [01101000][01100101][01101100][01101100][01101111]
or as ASCII like this: [104][101][108][108][111]
In the following paper, HAMPI, at section 3.5, step 2, the author assigns ASCII codes to a bit vector, but I'm confused about how the char is represented in the bit vector.
Firstly, you should probably read up on what a bit vector is, just to make sure we're on the same page.
Bit vectors don't represent ASCII characters, they represent bits. Trying to do bv[0]=104 on a bit vector will probably not compile / run, or, if it does, it's very unlikely to do what you expect.
The operations that you would expect to be supported are along the lines of: set the 5th bit to 1, set the 10th bit to 0, set all of these bits to this, OR the bits of these two vectors, and probably some others.
How these are actually stored in memory is completely up to the programming language, and, on top of that, it may even be completely up to a given implementation of that language.
The general consensus (not a rule) is that each bit should take up roughly 1 bit in memory (maybe, on average, slightly more, since there could be overhead related to storing these).
As one example (how Java does it), you could have an array of 64-bit numbers and store 64 bits in each position. The translation to ASCII won't make sense in this case.
Another thing you should know - even ASCII gets stored as bits in memory, so those 2 arrays are essentially the same, unless you meant something else.
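A rough sketch in C of a byte-backed bit vector (Java's implementation uses 64-bit words instead, but the idea is the same): storing 'h' (104, binary 01101000) means eight calls to set_bit, not bv[0] = 104.

#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define NBITS 64
static uint8_t bv[NBITS / 8];        /* 64 bits packed into 8 bytes */

static void set_bit(size_t i, int val)
{
    if (val) bv[i / 8] |=  (uint8_t)(0x80u >> (i % 8));   /* MSB-first within a byte */
    else     bv[i / 8] &= (uint8_t)~(0x80u >> (i % 8));
}

static int get_bit(size_t i)
{
    return (bv[i / 8] >> (7 - i % 8)) & 1;
}

int main(void)
{
    const char *s = "hello";
    for (size_t c = 0; c < strlen(s); c++)       /* one char = 8 bits */
        for (int b = 0; b < 8; b++)
            set_bit(c * 8 + b, (s[c] >> (7 - b)) & 1);

    for (size_t i = 0; i < 40; i++)              /* 01101000 01100101 ... */
        printf("%d%s", get_bit(i), (i % 8 == 7) ? " " : "");
    printf("\n");
    return 0;
}

This prints exactly the first bit pattern from the question, which is also what the [104][101]... notation means once each number is spelled out in binary.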
