Differentiation between integer and character - memory-management

I have just started learning C++ and have come across its various data types. I also learnt how the computer stores values once a data type is specified. One doubt that occurred to me while learning the char data type was how the computer differentiates between integers and characters.
I learnt that the character data type uses 8 bits to store a character, and that the computer stores a character in memory by following ASCII encoding rules. However, I don't see how the computer knows whether the byte 01000001 represents the letter 'A' or the integer 65. Is there a special bit assigned for this purpose?

When we do
int a = 65;
or
char ch = 'A';
and check the memory, we will see the bit pattern 01000001 in both cases, as expected.
It is at the application layer that we choose to interpret it as a character or as an integer:
printf("%d", ch);
will print 65.

Characters are represented as integers inside the computer; the data type char is effectively just a small integer type.
Refer to the following page; it will clear up the ambiguities in your mind:
Data Types Detail
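As a small illustration of chars being small integers (assuming an ASCII-based system, which is what almost every desktop platform uses):

#include <iostream>

int main() {
    char ch = 'A';
    int  i  = ch;                                  // implicit promotion: same value, wider type
    std::cout << i << "\n";                        // prints 65 on an ASCII system
    std::cout << static_cast<char>(65) << "\n";    // prints A
}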

The computer itself does not remember or set any bits to distinguish chars from ints. Instead it's the compiler which maintains that information and generates proper machine code which operates on data appropriately.
You can even override and 'mislead' the compiler if you want. For example, you can cast a char pointer to a void pointer and then to an int pointer, and then try to read the location referred to as an int. I think 'dynamic casts' are also possible. If there were an actual tag bit, such operations would not be possible.
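As a minimal sketch of that idea, here is the reverse (and well-defined) direction: looking at an int's bytes through an unsigned char pointer. Going char* to void* to int*, as described above, also compiles, but it formally violates aliasing rules, so treat it as an experiment rather than a technique:

#include <iostream>

int main() {
    int n = 65;    // bit pattern 0x00000041 in a typical 32-bit int
    const unsigned char *bytes = reinterpret_cast<const unsigned char *>(&n);
    for (unsigned i = 0; i < sizeof n; ++i)        // print each raw byte
        std::cout << static_cast<int>(bytes[i]) << " ";
    std::cout << "\n";                             // e.g. "65 0 0 0" on a little-endian machine
}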
Adding more details in response to comment:
Really, what you should ask is: who will retrieve the values? Imagine that you write the contents of memory to a file and send it over the Internet. If the receiver "knows" that it is receiving chars, there is no need to encode the identity of the chars. But if the receiver could receive either chars or ints, it would need identifying bits. In the same way, when you compile a program the compiler knows what is stored where, so there is no need to 'figure out' anything at run time; you already know it. How a char is encoded as bits, versus a float or an int, is decided by a convention or standard, such as ASCII for characters or IEEE 754 for floating point.

You have asked a simple yet profound question. :-)
Answers and an example or two are below.
(see edit2, at bottom, for a longer example that tries to illustrate what happens when you interpret a single memory location's bit patterns in different ways).
The "profound" aspect of it lies in the astounding variety of character encodings that exist. There are many - I wager more than you believe there could possibly be. :-)
This is a worthwhile read: http://www.joelonsoftware.com/articles/Unicode.html
full title: "The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)"
As for your first question: "how did the computer differentiate between integers and characters":
The computer doesn't (for better or worse).
The meaning of bit patterns is interpreted by whatever reads them.
Consider this example bit pattern (8 bits, or one byte):
01000001b = 41x = 65d (binary, hex & decimal respectively).
If that bit pattern is based on ASCII it will represent an uppercase A.
If that bit pattern is EBCDIC it will represent a "non-breaking space" character (at least according to the EBCDIC chart at Wikipedia; most of the others I looked at don't say what 65d means in EBCDIC).
(Just for trivia's sake, in EBCDIC, 'A' would be represented with a different bit pattern entirely: C1x or 193d.)
If you read that bit pattern as an integer (perhaps a short), it may indicate that you have 65 dollars in a bank account (or euros, or something else; just like the character set, your bit pattern has nothing in it to tell you what currency it is).
If that bit pattern is part of a 24-bit pixel encoding for your display (3 bytes for RGB), perhaps the 'blue' byte, it may indicate your pixel is roughly 25% blue (65/255 is about 25.5%); 0% would be black, 100% would be as blue as possible.
So, yeah, there are lots of variations on how bits can be interpreted. It is up to your program to keep track of that.
edit: it is common to add metadata to track that, so if you are dealing with currencies you may have one byte for the currency type and other bytes for the quantity of a given currency. The currency type would have to be encoded as well; there are different ways to do that, something the C++ enum attempts to solve in a space-efficient way: http://www.cprogramming.com/tutorial/enum.html
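A sketch of that metadata idea (the enum values and field names here are made up for illustration, not a standard layout):

#include <iostream>

enum class Currency : unsigned char { USD, EUR, GBP };  // one byte of metadata

struct Money {
    Currency type;     // tells the reader how to interpret 'amount'
    long long amount;  // quantity in the smallest unit (e.g. cents)
};

int main() {
    Money m{Currency::EUR, 6500};  // 65.00 of something, tagged as EUR
    std::cout << "tag=" << static_cast<int>(m.type) << " amount=" << m.amount << "\n";
}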
As for 8 bits (one byte) per character, that is a fair assumption when you're starting out, but it isn't always true. Lots of languages will use 2+ bytes per character once you get into Unicode.
However... ASCII is very common and it fits into a single byte (8 bits).
If you are handling simple English text (A-Z, 0-9 and so on), that may be enough for you.
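A quick sketch of the 'more than one byte per character' point; the escape sequence below is the UTF-8 encoding of 'é', which is an assumption about how your strings are encoded, not something C++ enforces:

#include <iostream>
#include <string>

int main() {
    std::string ascii = "A";          // ASCII: 1 character, 1 byte
    std::string utf8  = "\xC3\xA9";   // UTF-8 'e acute': 1 character, 2 bytes
    std::cout << ascii.size() << " byte(s) vs " << utf8.size() << " byte(s)\n";  // prints "1 byte(s) vs 2 byte(s)"
}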
Spend some time browsing here and look at ASCII, EBCDIC and others:
http://www.lookuptables.com/
If you're running on Linux or something similar, hexdump can be your friend.
Try the following
$ hexdump -C myfile.dat
Whatever operating system you're using, you will want to find a hexdump utility you can use to see what is really in your data files.
You mentioned C++; I think it would be an interesting exercise to write a small byte-dumper utility: a short program that takes a void* pointer and a byte count, and prints out that many bytes' worth of values.
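Here is a rough sketch of such a byte-dumper; the names are mine and it is only a starting point, not a polished tool:

#include <cstddef>
#include <cstdio>

// Print 'count' raw bytes starting at 'p', in hex, 16 per line.
void dump_bytes(const void *p, std::size_t count) {
    const unsigned char *bytes = static_cast<const unsigned char *>(p);
    for (std::size_t i = 0; i < count; ++i) {
        std::printf("%02x ", bytes[i]);
        if ((i + 1) % 16 == 0) std::printf("\n");
    }
    std::printf("\n");
}

int main() {
    unsigned short a = (0x41 << 8) + 0x42;  // same value used in the example below
    dump_bytes(&a, sizeof a);               // prints "42 41" or "41 42", depending on endianness
}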
Good luck with your studies! :-)
Edit 2: I added a small research program... I don't know how to illustrate the idea more concisely (it seems easier in C than C++).
Anyway...
In this example program, I have two character pointers that are referencing memory used by an integer.
The actual code (see 'example program', way below) is messier with casting, but this illustrates the basic idea:
unsigned short a; // reserve 2 bytes of memory to store our 'unsigned short' integer.
char *c1 = &a; // point to first byte at a's memory location.
char *c2 = c1 + 1; // point to next byte at a's memory location.
Note how 'c1' and 'c2' both share the memory that is also used by 'a'.
Walking through the output...
The sizeof's basically tells you how many bytes something uses.
The ===== Message Here ===== lines are like a comment printed out by the dump() function.
The important thing about the dump() function is that it is using the bit patterns in the memory location for 'a'.
dump() doesn't change those bit patterns, it just retrieves them and displays them via cout.
In the first run, before calling dump I assign the following bit pattern to a:
a = (0x41<<8) + 0x42;
This left-shifts 0x41 8 bits and adds 0x42 to it.
The resulting bit pattern is 0x4142 (which is 16706 in decimal, or 01000001 01000010 in binary).
One of the bytes will be 0x41, the other will hold 0x42.
Next it calls the dump() method:
dump( "In ASCII, 0x41 is 'A' and 0x42 is 'B'" );
Note the output for this run on my virtual box Ubuntu found the address of a was 0x6021b8.
Which nicely matches the expected addresses pointed to by both c1 & c2.
Then I modify the bit pattern in 'a'...
a += 1; dump(); // why did this find a 'C' instead of 'B'?
a += 5; dump(); // why did this find an 'H' instead of 'C' ?
As you dig deeper into C++ (and maybe C ) you will want to be able to draw memory maps like this (more or less):
=== begin memory map ===
                     +-------+-------+
unsigned short a   : byte0 : byte1 :                     holds 2 bytes worth of bit patterns.
                     +-------+-------+-------+-------+
char *c1           : byte0 : byte1 : byte2 : byte3 :     holds address of a (4 bytes shown; pointers are 8 bytes on a 64-bit build)
                     +-------+-------+-------+-------+
char *c2           : byte0 : byte1 : byte2 : byte3 :     holds address of a + 1
                     +-------+-------+-------+-------+
=== end memory map ===
Here is what it looks like when it runs; I encourage you to walk through the C++ code
in one window and tie each piece of output back to the C++ expression that generated it.
Note how sometimes we do simple math to add a number to a (e.g. "a +=1" followed by "a += 5").
Note the impact that has on the characters that dump() extracts from memory location 'a'.
=== begin run ===
$ clear; g++ memfun.cpp
$ ./a.out
sizeof char =1, unsigned char =1
sizeof short=2, unsigned short=2
sizeof int =4, unsigned int =4
sizeof long =8, unsigned long =8
===== In ASCII, 0x41 is 'A' and 0x42 is 'B' =====
a=16706(dec), 0x4142 (address of a: 0x6021b8)
c1=0x6021b8 (should be the same as 'address of a')
c2=0x6021b9 (should be just 1 more than 'address of a')
c1=B
c2=A
in hex, c1=42
in hex, c2=41
===== after a+= 1 =====
a=16707(dec), 0x4143 (address of a: 0x6021b8)
c1=0x6021b8 (should be the same as 'address of a')
c2=0x6021b9 (should be just 1 more than 'address of a')
c1=C
c2=A
in hex, c1=43
in hex, c2=41
===== after a+= 5 =====
a=16712(dec), 0x4148 (address of a: 0x6021b8)
c1=0x6021b8 (should be the same as 'address of a')
c2=0x6021b9 (should be just 1 more than 'address of a')
c1=H
c2=A
in hex, c1=48
in hex, c2=41
===== In ASCII, 0x58 is 'X' and 0x59 is 'Y' =====
a=22617(dec), 0x5859 (address of a: 0x6021b8)
c1=0x6021b8 (should be the same as 'address of a')
c2=0x6021b9 (should be just 1 more than 'address of a')
c1=Y
c2=X
in hex, c1=59
in hex, c2=58
===== In ASCII, 0x59 is 'Y' and 0x5A is 'Z' =====
a=22874(dec), 0x595a (address of a: 0x6021b8)
c1=0x6021b8 (should be the same as 'address of a')
c2=0x6021b9 (should be just 1 more than 'address of a')
c1=Z
c2=Y
in hex, c1=5a
in hex, c2=59
Done.
$
=== end run ===
=== begin example program ===
#include <iostream>
#include <string>
using namespace std;

// define some global variables
unsigned short a;       // declare 2 bytes in memory, as per sizeof()s below.
char *c1 = (char *)&a;  // point c1 to start of memory belonging to a (1st byte).
char *c2 = c1 + 1;      // point c2 to next piece of memory belonging to a (2nd byte).

void dump(const char *msg) {
    // The important thing about dump() is that we are working with bit
    // patterns in memory we do not own, and it is memory we did not set
    // (at least not here in dump(); the caller is manipulating the bit
    // patterns for the 2 bytes in location 'a').
    cout << "===== " << msg << " =====\n";
    cout << "a=" << dec << a << "(dec), 0x" << hex << a << dec << " (address of a: " << &a << ")\n";
    cout << "c1=" << (void *)c1 << " (should be the same as 'address of a')\n";
    cout << "c2=" << (void *)c2 << " (should be just 1 more than 'address of a')\n";
    cout << "c1=" << (char)(*c1) << "\n";
    cout << "c2=" << (char)(*c2) << "\n";
    cout << "in hex, c1=" << hex << ((int)(*c1)) << dec << "\n";
    cout << "in hex, c2=" << hex << (int)(*c2) << dec << "\n";
}

int main() {
    cout << "sizeof char =" << sizeof( char ) << ", unsigned char =" << sizeof( unsigned char ) << "\n";
    cout << "sizeof short=" << sizeof( short ) << ", unsigned short=" << sizeof( unsigned short ) << "\n";
    cout << "sizeof int =" << sizeof( int ) << ", unsigned int =" << sizeof( unsigned int ) << "\n";
    cout << "sizeof long =" << sizeof( long ) << ", unsigned long =" << sizeof( unsigned long ) << "\n";
    // this logic changes the bit pattern in a, then calls dump() to interpret that bit pattern.
    a = (0x41<<8) + 0x42; dump( "In ASCII, 0x41 is 'A' and 0x42 is 'B'" );
    a += 1;               dump( "after a+= 1" );
    a += 5;               dump( "after a+= 5" );
    a = (0x58<<8) + 0x59; dump( "In ASCII, 0x58 is 'X' and 0x59 is 'Y'" );
    a = (0x59<<8) + 0x5A; dump( "In ASCII, 0x59 is 'Y' and 0x5A is 'Z'" );
    cout << "Done.\n";
}
=== end example program ===

int is an integer, a number that has no digits after the decimal point. It can be positive or negative. Internally, integers are stored as binary numbers. On most computers, integers are 32-bit binary numbers, but this size can vary from one computer to another. When calculations are done with integers, anything after the decimal point is lost. So if you divide 2 by 3, the result is 0, not 0.6666.
char is a data type intended for holding characters, as in alphanumeric strings. This data type can be positive or negative, even though most character data for which it is used is unsigned. The typical size of char is one byte (eight bits), but this varies from one machine to another. The plot thickens considerably on machines that support wide characters (e.g., Unicode) or multi-byte encoding schemes for strings. But in general, char is one byte.
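A tiny illustration of both paragraphs; the sizes printed are typical, not guaranteed:

#include <iostream>

int main() {
    std::cout << 2 / 3 << "\n";                               // integer division: prints 0, not 0.6666
    std::cout << sizeof(int) << " " << sizeof(char) << "\n";  // typically 4 and 1
    char c = 'A';
    std::cout << c + 0 << "\n";                               // char promoted to int: prints 65 on ASCII systems
}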

Related

How to properly convert dicom image to opencv

I have problems converting a .dcm image from DCMTK format to OpenCV.
My code:
DicomImage dcmImage(in_file.c_str());
int depth = dcmImage.getDepth();
std::cout << "bit-depth: " << depth << "\n"; //this outputs 10
Uint8* imgData = (uchar*)dcmImage.getOutputData(depth);
std::cout << "size: " << dcmImage.getOutputDataSize() << "\n"; //this outputs 226100
cv::Mat image(int(dcmImage.getWidth()), int(dcmImage.getHeight()), CV_32S, imgData);
std::cout << dcmImage.getWidth() << " " << dcmImage.getHeight() << "\n"; //this outputs 266 and 425
imshow("image view", image); //this shows malformed image
So I am not sure about CV_32S and the getOutputData parameter. What should I put there? Also, 226100/(266*425) == 2, so it should be 2 bytes per pixel(?)
When getDepth() returns 10, that means you have 10 bits (most probably grayscale) per pixel.
Depending on the pixel representation of the DICOM image (0x0028,0x0103), you have to specify a signed or unsigned 16-bit integer for the matrix type:
CV_16UC1 or CV_16SC1 (one 16-bit channel per pixel).
Caution: as only 10 of the 16 bits are used, you might find garbage in the upper 6 bits, which should be masked out before passing the buffer to the Mat.
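A rough sketch of that advice, picking up from the dcmImage object in the question. It assumes unsigned 16-bit grayscale output and that getOutputData(16) hands back one 16-bit sample per pixel; check the DCMTK documentation for your version before relying on it. Note also that cv::Mat takes rows (height) first, whereas the snippet in the question passes width first.

// Sketch only: assumes unsigned 16-bit grayscale samples from DCMTK.
const void *raw = dcmImage.getOutputData(16 /* bits per sample */);
cv::Mat img(static_cast<int>(dcmImage.getHeight()),   // rows = height
            static_cast<int>(dcmImage.getWidth()),    // cols = width
            CV_16UC1,
            const_cast<void *>(raw));
cv::Mat masked = img & cv::Scalar(0x03FF);            // keep only the significant bits (adjust to your depth)
cv::Mat view;
masked.convertTo(view, CV_8U, 255.0 / 1023.0);        // scale to 8 bit for display
cv::imshow("image view", view);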
Update:
About your comments and your source code:
DicomImage::getInterData()->getPixelRepresentation() does not return the pixel representation as found in the DICOM header but an internal enumeration expressing bit depth and signed/unsigned at the same time. To obtain the value from the header, use DcmDataset or DcmFileFormat.
I am not an OpenCV expert, but I think you are applying an 8-bit bitmask to the 16-bit image, which cannot work properly.
The bitmask should read (1 << 11) - 1.
The question is whether you really need rendered pixel data as returned by DicomImage::getOutputData(), or the original pixel data from the DICOM image (also see the answer from #kritzel_sw). When using getOutputData() you should pass the requested bit depth as a parameter (e.g. 8 bits per sample) and not the value returned by getDepth().
When working with CT images, you probably want to use pixel data in Hounsfield Units (a signed integer value that is the result of the Modality LUT transformation).

Converting a floating point to its corresponding bit-segments

Given a Ruby Float value, e.g.,
f = 12.125
I'd like to wind up with a 3-element array containing the floating-point number's sign (1 bit), exponent (11 bits), and fraction (52 bits). (Ruby's floats use the IEEE 754 double-precision 64-bit representation.)
What's the best way to do that? Bit-level manipulation doesn't seem to be Ruby's strong point.
Note that I want the bits, not the numerical values they correspond to. For instance, getting [0, -127, 1] for the floating-point value of 1.0 is not what I'm after -- I want the actual bits in string form or an equivalent representation, like ["0", "0ff", "000 0000 0000"].
The bit data can be exposed via Array's pack, as Float doesn't provide these functions internally.
str = [12.125].pack('D').bytes.reverse.map{|n| "%08b" %n }.join
=> "0100000000101000010000000000000000000000000000000000000000000000"
[ str[0], str[1..11], str[12..63] ]
=> ["0", "10000000010", "1000010000000000000000000000000000000000000000000000"]
This is a bit 'around the houses', pulling it out of a string representation. I'm sure there is a more efficient way to pull the data from the original bytes...
Edit: The bit-level manipulation piqued my interest, so I had a poke around. To use those operations in Ruby you need an Integer, so the float requires some more unpacking to convert it into a 64-bit int. The big-endian / IEEE 754 documented representation is fairly trivial. The little-endian representation I'm not so sure about. It's a little odd, as you are not on complete byte boundaries with an 11-bit exponent and 52-bit mantissa. It becomes fiddly to pull the bits out and swap them about to get what resembles little endian, and I'm not sure it's right as I haven't seen any reference to the layout. So the 64-bit value is little endian; I'm not too sure how that applies to the components of the 64-bit value until you store them somewhere else, like a 16-bit int for the mantissa.
As an example, for an 11-bit value going from little to big endian, the kind of thing I was doing was to shift the most significant byte left 3 to the front, then OR in the least significant 3 bits:
v = 0x4F2
((v & 0xFF) << 3) | (v >> 8)
Here it is anyway; hopefully it's of some use.
class Float
  Float::LITTLE_ENDIAN = [1.0].pack("E") == [1.0].pack("D")

  # Returns a sign, exponent and mantissa as integers
  def ieee745_binary64
    # Build a big end int representation so we can use bit operations
    tb = [self].pack('D').unpack('Q>').first
    # Check what we are
    if Float::LITTLE_ENDIAN
      ieee745_binary64_little_endian tb
    else
      ieee745_binary64_big_endian tb
    end
  end

  # Force a little end calc
  def ieee745_binary64_little
    ieee745_binary64_little_endian [self].pack('E').unpack('Q>').first
  end

  # Force a big end calc
  def ieee745_binary64_big
    ieee745_binary64_big_endian [self].pack('G').unpack('Q>').first
  end

  # Little
  def ieee745_binary64_little_endian big_end_int
    #puts "big #{big_end_int.to_s(2)}"
    sign  = ( big_end_int & 0x80 ) >> 7
    exp_a = ( big_end_int & 0x7F ) << 1    # get the last 7 bits, make it more significant
    exp_b = ( big_end_int & 0x8000 ) >> 15 # get the 9th bit, to fill the sign gap
    exp_c = ( big_end_int & 0x7000 ) >> 4  # get the 10-12th bit to stick on the front
    exponent = exp_a | exp_b | exp_c
    mant_a = ( big_end_int & 0xFFFFFFFFFFFF0000 ) >> 12 # F000 was taken above
    mant_b = ( big_end_int & 0x0000000000000F00 ) >> 8  # F00 was left over
    mantissa = mant_a | mant_b
    [ sign, exponent, mantissa ]
  end

  # Big
  def ieee745_binary64_big_endian big_end_int
    sign     = ( big_end_int & 0x8000000000000000 ) >> 63
    exponent = ( big_end_int & 0x7FF0000000000000 ) >> 52
    mantissa = ( big_end_int & 0x000FFFFFFFFFFFFF ) >> 0
    [ sign, exponent, mantissa ]
  end
end
and testing...
def printer val, vals
  printf "%-15s sign|%01b|\n", val, vals[0]
  printf " hex e|%3x| m|%013x|\n", vals[1], vals[2]
  printf " bin e|%011b| m|%052b|\n\n", vals[1], vals[2]
end

floats = [ 12.125, -12.125, 1.0/3, -1.0/3, 1.0, -1.0, 1.131313131313, -1.131313131313 ]
floats.each do |v|
  printer v, v.ieee745_binary64
  printer v, v.ieee745_binary64_big
end
TIL my brain is big endian! You'll note the ints being worked with are both big endian. I failed at bit shifting the other way.
Use frexp from the Math module. From the doc:
fraction, exponent = Math.frexp(1234) #=> [0.6025390625, 11]
fraction * 2**exponent #=> 1234.0
The sign bit is easy to find on its own.

Why are consecutive int data type variables located at 12 bytes offset in visual studio?

To clarify the question, please observe this C/C++ code fragment:
int a = 10, b = 20, c = 30, d = 40; //consecutive 4 int data values.
int* p = &d; //address of variable d.
Now, in Visual Studio (tested on 2013), if the value of p is some hex_value (which can be viewed in the debugger memory window), you can observe that the addresses of the variables a, b, c, and d are each 12 bytes apart!
So, if p == hex_value, then it follows:
&c == hex_value + 0xC (note hex C is 12 in decimal)
&b == &c + 0xC
&a == &b + 0xC
So why is there a 12-byte offset instead of 4 bytes, when an int is just 4 bytes?
Now, if we declared an array:
int array[] = {10,20,30,40};
The values 10, 20, 30, 40 are each located 4 bytes apart, as expected!
Can anyone please explain this behavior?
The C++ standard states in section 8.3.4 [Arrays] that "An object of array type contains a contiguously allocated non-empty set of N subobjects of type T."
This is why array[] will be a set of contiguous ints, and the difference between one element and the next will be exactly sizeof(int).
For local/block variables (automatic storage), no such guarantee is given. The only relevant statements are in section 1.7, The C++ memory model: "Every byte has a unique address", and section 1.8, The C++ object model: "the address of that object is the address of the first byte it occupies. Two objects (...) shall have distinct addresses".
So anything you do that assumes such objects are contiguous is undefined behaviour and non-portable. You cannot even be sure of the order of the addresses at which these objects are created.
Now I have played with a modified version of your code:
int a = 10, b = 20, c = 30, d = 40; //consecutive 4 int data values.
int* p = &d; //address of variable d.
int array[] = { 10, 20, 30, 40 };
char *pa = reinterpret_cast<char*>(&a),
*pb = reinterpret_cast<char*>(&b),
*pc = reinterpret_cast<char*>(&c),
*pd = reinterpret_cast<char*>(&d);
cout << "sizeof(int)=" << sizeof(int) << "\n &a=" << &a << \
" +" << pa - pb << "char\n &b=" << &b << \
" +" << pb - pc << "char\n &c=" << &c << \
" +" << pc - pd << "char\n &d=" << &d;
memset(&d, 0, (&a - &d)*sizeof(int));
// ATTENTION: undefined behaviour:
// will trigger core dump on leaving
// "Runtime check #2, stack arround the variable b was corrupted".
When running this code I get:
debug                        release                       comment on release
sizeof(int)=4                sizeof(int)=4
&a=0052F884 +12char          &a=009EF9AC +4char
&b=0052F878 +12char          &b=009EF9A8 +-8char           // is before a
&c=0052F86C +12char          &c=009EF9B0 +12char           // is just after a !!
&d=0052F860                  &d=009EF9A4
So you see that the order of the addresses may even be altered by the same compiler, depending on the build options! In fact, in release mode the variables are contiguous, but not in the same order.
The extra space in the debug version comes from the option /RTCs. I have deliberately overwritten the variables with a harsh memset() that assumes they are contiguous. On exit I immediately get the message "Run-time check #2, stack around the variable 'b' was corrupted", which clearly demonstrates the purpose of those extra bytes.
If you remove the option, you will get with MSVC 2013 contiguous variables of 4 bytes each, as you expected. But there will be no more error message about stack corruption either.

How to calculate g values from LIS3DH sensor?

I am using a LIS3DH sensor with an ATmega128 to get acceleration values and detect motion. I went through the datasheet but it seemed inadequate, so I decided to post here. From other posts I am convinced that the sensor resolution is 12 bits instead of 16 bits. I need to know whether, when computing the g value from the X-axis output registers, we take the two's complement of the register values only when the sign bit (the MSB of OUT_X_H, the high register) is 1, or every time, even when this bit is 0.
From my calculations I think we take the two's complement only when the MSB of OUT_X_H is 1.
But the datasheet says that we need to take the two's complement of both OUT_X_L and OUT_X_H every time.
Could anyone enlighten me on this?
Sample code
int main(void)
{
    stdout = &uart_str;
    UCSRB = 0x18;          // RXEN=1, TXEN=1
    UCSRC = 0x06;          // no parity, 1 stop bit, 8-bit data
    UBRRH = 0;
    UBRRL = 71;            // baud 9600
    timer_init();
    TWBR = 216;            // 400HZ
    TWSR = 0x03;
    TWCR |= (1<<TWINT)|(1<<TWSTA)|(0<<TWSTO)|(1<<TWEN); // TWCR=0x04;
    printf("\r\nLIS3D address: %x\r\n", twi_master_getchar(0x0F));
    twi_master_putchar(0x23, 0b000100000);
    printf("\r\nControl 4 register 0x23: %x", twi_master_getchar(0x23));
    printf("\r\nStatus register %x", twi_master_getchar(0x27));
    twi_master_putchar(0x20, 0x77);
    DDRB = 0xFF;
    PORTB = 0xFD;
    SREG = 0x80;           // sei();
    while(1)
    {
        process();
    }
}

void process(void){
    x_l = twi_master_getchar(0x28);
    x_h = twi_master_getchar(0x29);
    y_l = twi_master_getchar(0x2a);
    y_h = twi_master_getchar(0x2b);
    z_l = twi_master_getchar(0x2c);
    z_h = twi_master_getchar(0x2d);
    xvalue = (short int)(x_l + (x_h<<8));
    yvalue = (short int)(y_l + (y_h<<8));
    zvalue = (short int)(z_l + (z_h<<8));
    printf("\r\nx_val: %ldg", x_val);
    printf("\r\ny_val: %ldg", y_val);
    printf("\r\nz_val: %ldg", z_val);
}
I wrote CTRL_REG4 as 0x10 (4g), but when I read it back I got 0x20 (8g). This seems a bit bizarre.
Do not compute the two's complement yourself; that has the effect of negating the result.
Instead, the datasheet tells us the result is already a signed value. That is, 0 is not the lowest value; it is in the middle of the scale. (0xFFFF is just a little less than zero, not the highest value.)
Also, the result is always 16 bits, but it is not meant to be accurate to all of them. You can set a control register to generate more accurate values at the expense of current consumption, but it is still not guaranteed to be accurate to the last bit.
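A tiny sketch of what 'already signed' means; the cast from 0xFFFF to int16_t is technically implementation-defined, but on any two's complement machine (i.e. all of them in practice) it yields -1:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint16_t raw = 0xFFFF;             /* bits as read from the output registers */
    int16_t as_signed = (int16_t)raw;  /* reinterpret as two's complement */
    printf("%d\n", as_signed);         /* prints -1: just below zero, not a maximum */
    return 0;
}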
The datasheet does not say (at least in the register description in chapter 8.2) that you have to calculate the two's complement; it states that the content of the two registers is in two's complement.
So all you have to do is receive the two bytes and cast them to an int16_t to get the signed raw value.
uint8_t xl = 0x00;
uint8_t xh = 0xFC;
int16_t x = (int16_t)((((uint16_t)xh) << 8) | xl);
or
uint8_t xa[2] = {0x00, 0xFC}; // little endian: lower byte at lower address
int16_t x = *((int16_t*)xa);
(hope I did not mix something up with this)
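If the pointer cast in the second variant worries you (on desktop compilers it technically breaks strict-aliasing rules), a memcpy-based assembly is just as cheap and equally well defined. This is my own sketch, not part of the answer above, and it assumes a little-endian MCU (AVR with avr-gcc is):

#include <stdint.h>
#include <string.h>

/* Combine OUT_X_L / OUT_X_H register bytes into a signed 16-bit raw value. */
int16_t assemble_axis(uint8_t lo, uint8_t hi)
{
    uint8_t bytes[2] = { lo, hi };   /* little endian: low byte at the lower address */
    int16_t value;
    memcpy(&value, bytes, sizeof value);
    return value;
}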
I have another approach, which may be easier to implement, as the compiler will do all of the work for you; it will probably do it most efficiently and with no bugs, too.
Read the raw data into the raw field in:
#include <stdint.h>   /* for int16_t / uint16_t */

typedef union
{
    struct
    {
        // in low power - 8 significant bits, left justified
        int16_t reserved : 8;
        int16_t value    : 8;
    } lowPower;
    struct
    {
        // in normal power - 10 significant bits, left justified
        int16_t reserved : 6;
        int16_t value    : 10;
    } normalPower;
    struct
    {
        // in high resolution - 12 significant bits, left justified
        int16_t reserved : 4;
        int16_t value    : 12;
    } highPower;
    // the raw data as read from registers H and L
    uint16_t raw;
} LIS3DH_RAW_CONVERTER_T;
then use the value field appropriate to the power mode you are using.
Note: in this example, the bit-field structs assume big-endian bit ordering.
Check whether you need to reverse the order of 'value' and 'reserved'.
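A possible way to use the union, under the same caveat that bit-field ordering is compiler-specific; the helper below is my own naming, it assumes the <stdint.h> types above and 12-bit high-resolution mode, and it treats the H register as the high byte:

/* Extract a signed acceleration count from the raw H/L register bytes. */
int16_t lis3dh_counts_hr(uint8_t out_h, uint8_t out_l)
{
    LIS3DH_RAW_CONVERTER_T conv;
    conv.raw = ((uint16_t)out_h << 8) | out_l;  /* H register is the high byte */
    return conv.highPower.value;                /* pick the field matching your mode */
}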
The LISxDH sensors produce two's complement, left-justified data. They can be set to 12-bit, 10-bit, or 8-bit resolution. This is read from the sensor as two 8-bit values (LSB, MSB) that need to be assembled together.
If you set the resolution to 8-bit, you can just cast the LSB to int8_t, which is likely your processor's representation of 8-bit two's complement. Likewise, if it were possible to set the sensor to 16-bit resolution, you could just cast that to an int16_t.
However, if the value is 10-bit, left-justified, the sign bit is in the wrong place for an int16_t. Here is how you convert it to an int16_t (16-bit two's complement).
1. Read the LSB and MSB from the sensor:
[MMMM MMMM] [LL00 0000]
[1001 0101] [1100 0000] //example = [0x95] [0xC0] (note that the LSB comes before MSB on the sensor)
2. Assemble the bytes, keeping in mind the LSB is left-justified.
//---As an example....
uint8_t byteMSB = 0x95;        //[1001 0101]
uint8_t byteLSB = 0xC0;        //[1100 0000]
uint16_t assembledValue;       // assuming UINT8_LEN = 8, INT16_LEN = 16, numBits = 10
//---Cast to U16 to make room, then combine the bytes---
assembledValue = ( (uint16_t)(byteMSB) << UINT8_LEN ) | (uint16_t)byteLSB;
/* [MMMM MMMM LL00 0000]
   [1001 0101 1100 0000] = 0x95C0 */
//---Shift to right justify---
assembledValue >>= (INT16_LEN - numBits);
/* [0000 00MM MMMM MMLL]
   [0000 0010 0101 0111] = 0x0257 */
3. Convert from 10-bit two's complement (now right-justified) to an int16_t (which is just 16-bit two's complement on most platforms).
Approach #1: If the sign bit (in our example, the tenth bit) is 0, just cast to int16_t, since positive numbers are represented the same way in 10-bit and 16-bit two's complement.
If the sign bit is 1, invert the bits (keeping just the 10 bits), add 1 to the result, then multiply by -1 (as per the definition of two's complement).
int16_t convertedValueI16 = ~assembledValue;       //invert bits
convertedValueI16 &= ( 0xFFFF >> (16 - numBits) ); //but keep just the 10 bits
convertedValueI16 += 1;                            //add 1
convertedValueI16 *= -1;                           //multiply by -1
/* Note that the last two lines could be replaced by convertedValueI16 = ~convertedValueI16; */
//result = -425 = 0xFE57 = [1111 1110 0101 0111]
Approach #2: Flip the sign bit (the tenth bit; here it is set, so this zeroes it) and subtract out half the range, 1<<9.
//----Flip the sign bit (tenth bit)----
convertedValueI16 = (int16_t)( assembledValue ^ ( 0x0001 << (numBits-1) ) );
/* Result = 87 = 0x57 = [0000 0000 0101 0111] */
//----Subtract out half the range----
convertedValueI16 -= ( (int16_t)(1) << (numBits-1) );
/*   [0000 0000 0101 0111]
   - [0000 0010 0000 0000]
   = [1111 1110 0101 0111]
   Result = 87 - 512 = -425 = 0xFE57 */
Link to script to try out (not optimized): http://tpcg.io/NHmBRR

Ensuring variable size for bitwise operations in Ruby

I have a group of hex values that describe an object in ruby and I want to string them all together into a single bit bucket. In C++ I would do the following:
uint64_t descriptor = 0;                       // uint64_t rather than int, to be safe
descriptor += (uint64_t)firstHexValue << 60;
descriptor += (uint64_t)secondHex << 56;
descriptor += (uint64_t)thirdHex << 52;
// ... etc
descriptor += (uint64_t)sixteenthHex << 0;
I want to do the same thing in Ruby, but as Ruby is untyped, I am worried about overflow. If I try and do the same thing in Ruby, is there a way to ensure that descriptor contains 64 bits? Once the descriptors are set, I don't want to suddenly find that only 32 bits are represented and I've lost half of it! How can I safely achieve the same result as above?
Note: Working on OS X 64bit if that is relevant.
Ruby has unlimited integers, so don't worry about that. You won't lose a single bit.
a = 0
a |= (1 << 200)
a # => 1606938044258990275541962092341162602522202993782792835301376
