Left shift byte in C on Microchip 16F876A

I'm facing this problem: working in C with the Microchip PIC16F876 microcontroller (HI-TECH PICC compiler), I need to save a number in EEPROM and read it back when I switch my device on. The number needs three bytes (it is 126000 in decimal, corresponding to 0x01EC30), so I thought to split it into 0x01, 0xEC, 0x30 and save them in three EEPROM locations.
At startup I read the three bytes and join them to obtain my whole variable:
unsigned long TotVariable = byte2 << 16 | byte1 << 8 | byte0;
Checking in a watch window, I found that the left-shifted byte2 didn't appear in the position I expected; the operation seems to be done in only 16 bits (the value of byte2 ended up added to the least significant byte, so when I read the variable I found 0xFFFFEC31 instead of 0x0001EC30).
How can I solve this issue?
Is the problem my variable's size? Or the left shift operation? Is it that the compiler can't manage this variable?
Is there another way to split and join three bytes?
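A minimal sketch of one way to split and join the three bytes, assuming int is 16 bits wide on this compiler (so each byte must be widened to unsigned long before shifting); join3 and split3 are made-up names for the illustration:

/* Join three EEPROM bytes into an unsigned long.
   Casting before shifting keeps the shifts out of 16-bit int arithmetic. */
unsigned long join3(unsigned char b2, unsigned char b1, unsigned char b0)
{
    return ((unsigned long)b2 << 16) |
           ((unsigned long)b1 << 8)  |
            (unsigned long)b0;
}

/* Split a value back into three bytes for the EEPROM locations. */
void split3(unsigned long v, unsigned char *b2, unsigned char *b1, unsigned char *b0)
{
    *b2 = (unsigned char)(v >> 16);
    *b1 = (unsigned char)(v >> 8);
    *b0 = (unsigned char)(v);
}

With the number from the question, join3(0x01, 0xEC, 0x30) gives 0x0001EC30.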

How does UTF-8 represent characters?

I'm reading UTF-8 Encoding, and I don't understand the following sentence.
For characters equal to or below 2047 (hex 0x07FF), the UTF-8
representation is spread across two bytes. The first byte will have
the two high bits set and the third bit clear (i.e. 0xC2 to 0xDF). The
second byte will have the top bit set and the second bit clear (i.e.
0x80 to 0xBF).
If I'm not mistaken, this means UTF-8 requires two bytes to represent 2048 characters. In other words, we need to choose 2048 candidates from 2 to the power of 16 to represent each character.
For characters equal to or below 2047 (hex 0x07FF), the UTF-8
representation is spread across two bytes.
What's the big deal about choosing 2048 out of 65,536? However, UTF-8 explicitly sets a boundary on each byte.
According to the following statements, the number of combinations is 30 (0xDF - 0xC2 + 0x01) for the first byte and 64 (0xBF - 0x80 + 0x01) for the second byte.
The first byte will have
the two high bits set and the third bit clear (i.e. 0xC2 to 0xDF). The
second byte will have the top bit set and the second bit clear (i.e.
0x80 to 0xBF).
How do 1920 numbers (64 times 30) accommodate 2048 combinations?
As you already know, 2047 (0x07FF) contains the raw bits
00000111 11111111
If you look at the bit distribution chart for UTF-8, you will see that 0x07FF falls on the second line, so it is encoded as two bytes using this bit pattern:
110xxxxx 10xxxxxx
Substitute the raw bits into the xs and you get this result:
11011111 10111111 (0xDF 0xBF)
Which is exactly as the description you quoted says:
The first byte will have the two high bits set and the third bit clear (11011111). The second byte will have the top bit set and the second bit clear (10111111).
Think of it as a container, where the encoding reserves a few bits for its own synchronization, and you get to use the remaining bits.
So for the range in question, the encoding "template" is
110 abcde 10 fghijk
(where I have left a single space to mark the boundary between the template and the value from the code point we want to encode, and two spaces between the actual bytes)
and you get to use the 11 bits abcdefghijk for the value you actually want to transmit.
So for the code point U+07EB you get
0x07 00000111
0xEB 11101011
where the top five zero bits are masked out (remember, we only get 11 bits, because the maximum value the encoding can accommodate in two bytes is 0x07FF; a larger value would use a different, three-byte template), and so
0x07 = _____ 111 (template: _____ abc)
0xEB = 11 101011 (template: de fghijk)
abc de = 111 11 (where the first three come from 0x07, and the next two from 0xEB)
fghijk = 101011 (the remaining bits from 0xEB)
yielding the value
110 11111 10 101011
aka 0xDF 0xAB.
Wikipedia's article on UTF-8 contains more examples with nicely colored numbers to see what comes from where.
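As a hedged C sketch of that two-byte template (not part of the original answer; utf8_encode2 is a made-up name): the high five of the eleven payload bits go after the 110 prefix and the low six after the 10 prefix.

#include <stdio.h>

/* Encode a code point in the two-byte UTF-8 range (0x80..0x7FF):
   110 abcde  10 fghijk */
void utf8_encode2(unsigned int cp, unsigned char out[2])
{
    out[0] = (unsigned char)(0xC0 | (cp >> 6));    /* 110 + high 5 of the 11 bits */
    out[1] = (unsigned char)(0x80 | (cp & 0x3F));  /* 10  + low 6 of the 11 bits  */
}

int main(void)
{
    unsigned char b[2];
    utf8_encode2(0x07EB, b);
    printf("%02X %02X\n", b[0], b[1]);  /* prints DF AB, matching the worked example */
    return 0;
}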
The range 0x00-0x7F, which can be represented in a single byte, contains 128 code points; the two-byte range thus needs to accommodate 1920 = 2048-128 code points.
The raw encoding would allow values in the range 0xC0-0xDF in the first byte, but the values 0xC0 and 0xC1 are never needed, because those would represent code points which can be represented in a single byte, and so they are invalid per the encoding spec. In other words, the 0x02 in 0xC2 comes from the fact that at least one of the high four bits of the 11 bits this segment of the encoding can represent (one of abcd) needs to be set in order for the value to require two bytes.
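To double-check the count, here is a throwaway C loop (mine, not the answer's) that walks the two-byte range and confirms that every code point from 0x80 to 0x7FF produces a lead byte between 0xC2 and 0xDF:

#include <stdio.h>

int main(void)
{
    unsigned int cp, count = 0;
    for (cp = 0x80; cp <= 0x7FF; cp++) {
        unsigned int lead = 0xC0 | (cp >> 6);   /* first byte of the encoding */
        if (lead >= 0xC2 && lead <= 0xDF)
            count++;
    }
    printf("%u\n", count);  /* 1920 = 30 lead bytes * 64 continuation bytes */
    return 0;
}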

Confusion over little and big endian

I was reading an article which was explaining the difference in between little and big endian. I understand that big endian stores the data "big end" first and that little endian stores the data "little end" first. My confusion is in the following block of text:
Big endian machine: I think a short is two bytes, so I'll read them off: location s is address 0 (W, or 0x12) and location s + 1 is address 1 (X, or 0x34). Since the first byte is biggest (I'm big-endian!), the number must be 256 * byte 0 + byte 1, or 256*W + X, or 0x1234. I multiplied the first byte by 256 (2^8) because I needed to shift it over 8 bits.
I don't understand why they did a bit shift of 8 bits.
Also, here's another block of text I don't understand:
On a big endian machine we see:
Byte: U N I X
Location: 0 1 2 3
Which makes sense. U is the biggest byte in "UN" and is stored first. The same goes for IX: I is the biggest, and stored first.
On a little-endian machine we would see:
Byte: N U X I
Location: 0 1 2 3
If my understanding is correct, wouldn't it be "INUX" on a little-endian machine?
The full article is at https://betterexplained.com/articles/understanding-big-and-little-endian-byte-order/.
If anyone could clear this up, that would be wonderful.
Alright, so I understand how big and little endian work now:
I'll address the issue I had understanding the second block of text.
Basically, in the article, the author stated that if we store the word "UNIX" as a couple of shorts (not longs), then each two-byte short gets byte-swapped on its own, so the final result is "NUXI" rather than "INUX".
I'll now address the issue I had understanding the first block of text.
Basically, the shift by 8 bits (multiplying by 256) moves the first byte read from memory into the most significant position of the resulting value: on a big-endian machine the byte stored first is the most significant one, while on a little-endian machine the byte stored first is the least significant one.
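A minimal C sketch of that reasoning (variable names are mine, not the article's): the same two bytes in memory give different shorts depending on which byte you treat as most significant.

#include <stdio.h>

int main(void)
{
    /* Memory holds 0x12 at address s and 0x34 at address s + 1. */
    unsigned char mem[2] = { 0x12, 0x34 };

    /* Big-endian reading: the first byte is most significant, so shift it
       left 8 bits (multiply by 256) before adding the second byte. */
    unsigned short big    = (unsigned short)((mem[0] << 8) | mem[1]);  /* 0x1234 */

    /* Little-endian reading: the first byte is least significant. */
    unsigned short little = (unsigned short)((mem[1] << 8) | mem[0]);  /* 0x3412 */

    printf("read as big-endian:    0x%04X\n", big);
    printf("read as little-endian: 0x%04X\n", little);
    return 0;
}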

Actual length of input vector in VHDL

I am running HDL code written in VHDL and I have an input vector with a maximum length of 512 bits. Some of my inputs are shorter than the maximum size, so I want to find the actual length of every input in order to cut the unwanted zeros at the most significant bits of the input vector. Is there any way to do this kind of thing?
I guess you are looking for an unambiguous padding method for your data. What I would recommend in your case is an adaptation of the ISO/IEC 9797-1 padding method 2, as follows:
For every input (even if it already has 512 bits), you add a leading '1' bit. Then you add leading '0' bits (possibly none) to fill up the vector.
To implement this scheme you would have to enlarge your input vector to 513 bits (because you always have to add at least one bit).
To remove the padding, you simply go through the vector starting at the MSB and find the first '1' bit, which marks the end of the padding pattern.
Example (for 8+1 bit):
input: 10101
padded: 0001 10101
input: 00000000
padded: 1 00000000
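The scheme itself is easy to sketch outside VHDL. The following C illustration (pad and unpad are made-up names, and it uses a 9-bit container instead of 513 bits) only demonstrates the leading-'1'-then-zeros idea and how scanning from the MSB recovers the original length:

#include <stdio.h>

/* Put the data in the low bits and set a '1' marker just above its MSB;
   the remaining high bits of the container stay '0'. */
unsigned int pad(unsigned int data, int data_bits)
{
    return data | (1u << data_bits);
}

/* Scan from the container MSB down to the first '1' bit: that bit is the
   marker, and everything below it is the original data. */
unsigned int unpad(unsigned int padded, int container_bits, int *data_bits)
{
    int i;
    for (i = container_bits - 1; i >= 0; i--) {
        if (padded & (1u << i)) {
            *data_bits = i;
            return padded & ((1u << i) - 1u);
        }
    }
    *data_bits = 0;
    return 0;
}

int main(void)
{
    int n;
    unsigned int p = pad(0x15, 5);            /* input 10101 -> 000110101 */
    unsigned int d = unpad(p, 9, &n);
    printf("data 0x%02X, length %d\n", d, n); /* 0x15, 5 */
    return 0;
}

pad(0x00, 8) likewise gives 1 00000000, matching the second example above.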

Ada- what do 'at' and 'range' mean/ do?

I am debugging some software that has been written in two parts- one part in C++, and the other part in Ada- which I have never used before.
While reading through some of the Ada code, looking for variables that contain particular data, I found that those variables are used in a record inside what looks like a for loop, such as:
for myRecord use
   record
      eta at 8 range 0 .. 31;
      ttg at 16 range 0 .. 63;
   end record;
The at and range are in bold type in the IDE (GPS, the GNAT Programming Studio), which I assume means they are keywords or have a particular meaning in Ada. Can someone explain what this structure is and does? Do the numbers here have something to do with the amount of memory assigned to the variables, or with their memory location?
eta starts at bit 0 of byte offset 8 from the start of the record, and continues to bit 31; i.e. it occupies 32 bits starting at byte 8.
Similarly, ttg occupies 64 bits starting at byte 16 bit 0.
See ARM 13.5.1, Record Representation Clauses.
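For readers coming from the C++ side, here is a rough C sketch of the layout those clauses pin down (the integer types and the contents of the padding bytes are my assumptions; only the offsets come from the clause):

#include <stddef.h>
#include <stdint.h>
#include <assert.h>

struct myRecord_layout {
    uint8_t  pad0[8];   /* bytes 0..7: whatever else the record holds */
    uint32_t eta;       /* bytes 8..11  <-> "eta at 8 range 0 .. 31"  */
    uint8_t  pad1[4];   /* bytes 12..15 */
    uint64_t ttg;       /* bytes 16..23 <-> "ttg at 16 range 0 .. 63" */
};

int main(void)
{
    /* These hold on a typical compiler with natural alignment. */
    assert(offsetof(struct myRecord_layout, eta) == 8);
    assert(offsetof(struct myRecord_layout, ttg) == 16);
    return 0;
}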

Reverse order of bits using shift and rotate

I am asked in a homework assignment to reverse bits, i.e. mirror-flip them, so for example 1011 0100 becomes 0010 1101, using a combination of shift and rotate. I understand how those instructions work, but I can't think of a way to flip the bits. Thanks.
I need to do it using SAL assembly language.
If you need to flip a b-bit word using only shifts, you could emulate a stack:
b times {
    right shift on the input register, setting the carry flag;
    left shift on the output register, reading the carry flag;
}
Note that x86 has the "rotate through carry" instructions - they serve both purposes (or use rotation without carry on the input register to preserve the input). If left shift from carry is not available but right shift from carry is, reverse the words "left" and "right" in the previous algorithm. If no shift from carry is available, you need to emulate by an "ordinary" logical shift followed by setting the correct bit, but...
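The homework asks for SAL assembly, but as a hedged C sketch of the same shift-loop idea (reverse8 is a made-up name): each iteration pops the lowest bit of the input and pushes it onto the output while the output shifts left.

#include <stdio.h>

unsigned char reverse8(unsigned char x)
{
    unsigned char out = 0;
    int i;
    for (i = 0; i < 8; i++) {
        out = (unsigned char)((out << 1) | (x & 1));  /* push the bit */
        x >>= 1;                                      /* drop it from the input */
    }
    return out;
}

int main(void)
{
    printf("%02X\n", reverse8(0xB4));  /* 1011 0100 -> 0010 1101 = 0x2D */
    return 0;
}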
If you can use AND and OR as well and b is known ahead and is a power of two, there is a faster way. Reverse two bits within each pair, then two pairs within each nibble, then two nibbles within each byte, then two bytes within each word...
for 8-bit x:
//                                      1234 5678
x = (0x55 & x) << 1 | (0xAA & x) >> 1;  // 2143 6587
x = (0x33 & x) << 2 | (0xCC & x) >> 2;  // 4321 8765
x = (0x0F & x) << 4 | (0xF0 & x) >> 4;  // 8765 4321
