getelementptr has -1 as the first index operand - compilation

I'm reading the IR of nginx generated by Clang. In function ngx_event_expire_timers, there are some getelementptr instructions with i64 -1 as first index operand. For example,
%handler = getelementptr inbounds %struct.ngx_rbtree_node_s, %struct.ngx_rbtree_node_s* %node.addr.0.i, i64 -1, i32 2
I know the first index operand will be used as an offset to the first operand. But what does a negative offset mean?

The GEP instruction is perfectly fine with negative indices.
In this case you have something like:
node arr[100];
node* ptr = arr[50];
if ( (ptr-1)->value == ptr->value)
// then ...
GEP with negative indices just calculate the offset to the base pointer into the other direction. There is nothing wrong with it.

Considering what is doing inside nginx source code, the semantics of the getelementptr instruction is interesting. It's the result of two lines of C source code:
ev = (ngx_event_t *) ((char *) node - offsetof(ngx_event_t, timer));
ev->handler(ev);
node is of type ngx_rbtree_node_t, which is a member of ev's type ngx_event_t. That is like:
struct ngx_event_t {
....
struct ngx_rbtree_node_t time;
....
};
struct ngx_event_t *ev;
struct ngx_rbtree_node_t *node;
timer is the name of struct ngx_event_t member where node should point to.
|<- ngx_rbtree_node_t ->|
|<- ngx_event_t ->|
------------------------------------------------------
| (some data) | "time" | (some data)
------------------------------------------------------
^ ^
ev node
The graph above shows the layout of an instance of ngx_event_t. The result of offsetof(ngx_event_t, time) is 40. That means the some data before time is of 40 bytes. And the size of ngx_rbtree_node_t is also 40 bytes, by coincidence. So the i64 -1 in the first index oprand of getelementptr instruction computes the base address of the ngx_event_t containing node, which is 40 bytes ahead of node.
handler is another member of ngx_event_t, which is 16 bytes behind the base of ngx_event_t. By (another) coincidence, the third member of ngx_rbtree_node_t is also 16 bytes behind the base address of ngx_rbtree_node_t. So the i32 2 in getelementptr instruction will add 16 bytes to ev, to get the address of handler.
Note that the 16 bytes is computed from the layout of ngx_rbtree_node_t, but not ngx_event_t. Clang must have done some computations to ensure the correctness of the getelementptr instruction. Before use the value of %handler, there is a bitcast instruction which casts %handler to a function pointer type.
What Clang has done breaks the type transformation process defined in C source code. But the result is the exactly same.

Related

Odd behavior of Golang type conversion [duplicate]

Why does below code fail to compile?
package main
import (
"fmt"
"unsafe"
)
var x int = 1
const (
ONE int = 1
MIN_INT int = ONE << (unsafe.Sizeof(x)*8 - 1)
)
func main() {
fmt.Println(MIN_INT)
}
I get an error
main.go:12: constant 2147483648 overflows int
Above statement is correct. Yes, 2147483648 overflows int (In 32 bit architecture). But the shift operation should result in a negative value ie -2147483648.
But the same code works, If I change the constants into variables and I get the expected output.
package main
import (
"fmt"
"unsafe"
)
var x int = 1
var (
ONE int = 1
MIN_INT int = ONE << (unsafe.Sizeof(x)*8 - 1)
)
func main() {
fmt.Println(MIN_INT)
}
There is a difference in evaluation between constant and non-constant expression that arises from constants being precise:
Numeric constants represent exact values of arbitrary precision and do not overflow.
Typed constant expressions cannot overflow; if the result cannot be represented by its type, it's a compile-time error (this can be detected at compile-time).
The same thing does not apply to non-constant expressions, as this can't be detected at compile-time (it could only be detected at runtime). Operations on variables can overflow.
In your first example ONE is a typed constant with type int. This constant expression:
ONE << (unsafe.Sizeof(x)*8 - 1)
Is a constant shift expression, the following applies: Spec: Constant expressions:
If the left operand of a constant shift expression is an untyped constant, the result is an integer constant; otherwise it is a constant of the same type as the left operand, which must be of integer type.
So the result of the shift expression must fit into an int because this is a constant expression; but since it doesn't, it's a compile-time error.
In your second example ONE is not a constant, it's a variable of type int. So the shift expression here may –and will– overflow, resulting in the expected negative value.
Notes:
Should you change ONE in the 2nd example to a constant instead of a variable, you'd get the same error (as the expression in the initializer would be a constant expression). Should you change ONE to a variable in the first example, it wouldn't work as variables cannot be used in constant expressions (it must be a constant expression because it initializes a constant).
Constant expressions to find min-max values
You may use the following solution which yields the max and min values of uint and int types:
const (
MaxUint = ^uint(0)
MinUint = 0
MaxInt = int(MaxUint >> 1)
MinInt = -MaxInt - 1
)
func main() {
fmt.Printf("uint: %d..%d\n", MinUint, MaxUint)
fmt.Printf("int: %d..%d\n", MinInt, MaxInt)
}
Output (try it on the Go Playground):
uint: 0..4294967295
int: -2147483648..2147483647
The logic behind it lies in the Spec: Constant expressions:
The mask used by the unary bitwise complement operator ^ matches the rule for non-constants: the mask is all 1s for unsigned constants and -1 for signed and untyped constants.
So the typed constant expression ^uint(0) is of type uint and is the max value of uint: it has all its bits set to 1. Given that integers are represented using 2's complement: shifting this to the left by 1 you'll get the value of max int, from which the min int value is -MaxInt - 1 (-1 due to the 0 value).
Reasoning for the different behavior
Why is there no overflow for constant expressions and overflow for non-constant expressions?
The latter is easy: in most other (programming) languages there is overflow. So this behavior is consistent with other languages and it has its benefits.
The real question is the first: why isn't overflow allowed for constant expressions?
Constants in Go are more than values of typed variables: they represent exact values of arbitrary precision. Staying at the word exact, if you have a value that you want to assign to a typed constant, allowing overflow and assigning a completely different value doesn't really live up to exact.
Going forward, this type checking and disallowing overflow can catch mistakes like this one:
type Char byte
var c1 Char = 'a' // OK
var c2 Char = '世' // Compile-time error: constant 19990 overflows Char
What happens here? c1 Char = 'a' works because 'a' is a rune constant, and rune is alias for int32, and 'a' has numeric value 97 which fits into byte's valid range (which is 0..255).
But c2 Char = '世' results in a compile-time error because the rune '世' has numeric value 19990 which doesn't fit into a byte. If overflow would be allowed, your code would compile and assign 22 numeric value ('\x16') to c2 but obviously this wasn't your intent. By disallowing overflow this mistake is easily caught, and at compile-time.
To verify the results:
var c1 Char = 'a'
fmt.Printf("%d %q %c\n", c1, c1, c1)
// var c2 Char = '世' // Compile-time error: constant 19990 overflows Char
r := '世'
var c2 Char = Char(r)
fmt.Printf("%d %q %c\n", c2, c2, c2)
Output (try it on the Go Playground):
97 'a' a
22 '\x16'
To read more about constants and their philosophy, read the blog post: The Go Blog: Constants
And a couple more questions (+answers) that relate and / or are interesting:
Golang: on-purpose int overflow
How does Go perform arithmetic on constants?
Find address of constant in go
Why do these two float64s have different values?
How to change a float64 number to uint64 in a right way?
Writing powers of 10 as constants compactly

Assembly language using signed int multiplication math to perform shifts

This is a bit of a turn around.
Usually one is attempting to use shifts to perform multiplication and not the other way around.
On the Hitachi/Motorola 6309 there is no shift by n bits. There is only shift by 1 bit.
However there is a 16 bit x 16 bit signed multiply (provides a 32 bit signed result).
(EDIT) Using this is no problem for a 16 bit shift (left) however I'm trying to use 2 x 16x16 signed mults to do a 32 bit shift. The high order word of the result for the low order word shift is the problem. (Does that make sence?)
Some pseudo code might help:
result.highword = low word of (val.highword * shiftmulttable[shift])
temp = val.lowword * shiftmulttable[shift]
result.lowword = temp.lowword
result.highword = or (result.highword, temp.highword)
(with some magic on temp.highword to consider signed values)
I have been exercising my logic in an attempt to use this instruction to perform the shifts but so far I have failed.
I can easily achieve any positive value shifts by 0 to 14 but when it comes to shifting by 15 bits (mult by 0x8000) or shifting any negative values certain combinations of values require either:
complementing the result by 1
complementing the result by 2
adding 1 to the result
doing nothing to the result
And I just can't see any pattern to these values.
Any ideas appreciated!
Best I can tell from the problem description, implementing the 32-bit shift would work as desired by using an unsigned 16x16->32 bit multiply. This can easily be synthesized from a signed 16x16->32 multiply instruction by exploiting the two's complement integer representation. If the two factors are a and b, adding b to the high-order 16 bits of the signed product when a is negative, and adding a to the high-order 16 bits of the signed product when b is negative will give us the unsigned multiplication result.
The following C code implements this approach and tests it exhaustively:
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
/* signed 16x16->32 bit multiply. Hardware instruction */
int32_t mul16_wide (int16_t a, int16_t b)
{
return (int32_t)a * (int32_t)b;
}
/* unsigned 16x16->32 bit multiply (synthetic) */
int32_t umul16_wide (int16_t a, int16_t b)
{
int32_t p = mul16_wide (a, b); // signed 16x16->32 bit multiply
if (a < 0) p = p + (b << 16); // add 'b' to upper 16 bits of product
if (b < 0) p = p + (a << 16); // add 'a' to upper 16 bits of product
return p;
}
/* unsigned 16x16->32 bit multiply (reference) */
uint32_t umul16_wide_ref (uint16_t a, uint16_t b)
{
return (uint32_t)a * (uint32_t)b;
}
/* test synthetic unsigned multiply exhaustively */
int main (void)
{
int16_t a, b;
int32_t res, ref;
uint64_t count = 0;
a = -32768;
do {
b = -32768;
do {
res = umul16_wide (a, b);
ref = umul16_wide_ref (a, b);
count++;
if (res != ref) {
printf ("!!!! a=%d b=%d res=%d ref=%d\n", a, b, res, ref);
return EXIT_FAILURE;
}
if (b == 32767) break;
b = b + 1;
} while (1);
if (a == 32767) break;
a = a + 1;
} while (1);
printf ("test cases passed: %llx\n", count);
return EXIT_SUCCESS;
}
I am not familiar with the Hitachi/Motorola 6309 architecture. I assume it uses a special 32-bit register to hold the result of a wide multiply, from which high and low half can be extracted into 16-bit general-purpose registers, and the conditional corrections can then be applied to the register holding the upper 16 bits.
Are you using fixed-point multiplicative inverses to use the high half result for a right shift?
If you're just left-shifting, multiply by 0x8000 should work. The low half of an NxN => 2N-bit multiply is the same whether inputs are treated as signed or unsigned. Or do you need a 32-bit shift result from your 16-bit input?
Is the multiply instruction actually faster than a few 1-bit shifts for small shift counts? (I wouldn't be surprised if compile-time-constant counts of 2 or 3 would be faster with just a chain of 2 or 3 add same,same or left-shift instructions.)
Anyway, for a compile-time-constant shift count of 15, maybe just multiply by 1<<14 and then do the last count with a 1-bit shift (add same,same).
Or if your ISA has rotates, rotate right by 1 and mask away the low bits, skipping the multiply. Or zero a register, right-shift the low bit into the carry flag, then rotate-through-carry into the top of the zeroed register.
(The latter might be useful on an ISA that doesn't have large immediates and couldn't "mask away all the low bits" in one instruction. Or an ISA that only has RCR not ROR. I don't know 6309 at all)
If you're using a runtime count to look up a multiplier from a table, maybe branch for that case, or adjust your LUT so every entry needs an extra 1-bit shift, so you can do mul(lut[count]) and an unconditional extra shift.
(Only works if you don't need to support a shift-count of zero.)
Not that there would be many interested people who would want to see the 6309 code, but here it is:
Compliant with OS9 C ABI.
Pointer to result and arguments pushed on stack right to left.
U,PC,val(4bytes),shift(2bytes),*result(2bytes)
0 2 4 8 10
:
* 10,s pointer to long result
* 4,s 4 byte value
* 8,s 2 byte shift
* x = pointer to result
pshs u
ldx 10,s * load pointer to result
ldd 8,s * load shift
* if shift amount is greater than 31 then
* just return zero. OS9 C standard.
cmpd #32
blt _10x
ldq #0
stq 4,s
bra _13x
* if shift amount is greater than 16 than
* move bottom word of value into top word
* and clear bottom word
_10x
cmpb #16
blt _1x
ldu 6,s
stu 4,s
clr 6,s
clr 7,s
_1x
* setup pointer u and offset e into mult table _2x
leau _2x,pc
andb #15
* if there is no shift value just return value
beq _13x
aslb * need to double shift to use as word table offset
stb 8,s * save double shft
tfr b,e
* shift top word q = val.word.high * multtab[shft]
ldd 4,s
muld e,u
stw ,x * result.word.high = low word of mult
* shift bottom word q = val.word.low * multtab[shft]
lde 8,s * reload double shft
ldd 6,s
muld e,u
stw 2,x * result.word.low = low word of mult
* The high word or mult needs to be corrected for sign
* if val is negative then muld will return negated results
* and need to un negate it
lde 8,s * reload double shift
tst 4,s * test top byte of val for negative
bge _11x
addd e,u * add the multtab[shft] again to top word
_11x
* if multtab[shft] is negative (shft is 15 or shft<<1 is 30)
* also need to un negate result
cmpe #30
bne _12x
addd 6,s * add val.word.low to top word
_12x
* combine top and bottom and save bottom half of result
ord ,x
std ,x
bra _14x
* this is only reached if the result is in value (let result = value)
_13x
ldq 4,s * load value
stq ,x * result = value
_14x
puls u,pc
_2x fdb $01,$02,$04,$08,$10,$20,$40,$80,$0100,$0200,$0400,$0800
fdb $1000,$2000,$4000,$8000

Golang shift operator conversion

I can not understand in golang how 1<<s return 0 if var s uint = 33.
But 1<<33 return 8589934592.
How a shift operator conversion end up with a value of 0.
I'm reading the language specification and stuck in this section:
https://golang.org/ref/spec#Operators
Specifically this paragraph from docs:
"The right operand in a shift expression must have unsigned integer
type or be an untyped constant representable by a value of type uint.
If the left operand of a non-constant shift expression is an untyped
constant, it is first implicitly converted to the type it would assume
if the shift expression were replaced by its left operand alone."
Some example from official Golang docs:
var s uint = 33
var i = 1<<s // 1 has type int
var j int32 = 1<<s // 1 has type int32; j == 0
var k = uint64(1<<s) // 1 has type uint64; k == 1<<33
Update:
Another very related question, with an example:
package main
import (
"fmt"
)
func main() {
v := int16(4336)
fmt.Println(int8(v))
}
This program return -16
How does the number 4336 become -16 in converting int16 to int8
If you have this:
var s uint = 33
fmt.Println(1 << s)
Then the quoted part applies:
If the left operand of a non-constant shift expression is an untyped constant, it is first implicitly converted to the type it would assume if the shift expression were replaced by its left operand alone.
Because s is not a constant (it's a variable), therefore 1 >> s is a non-constant shift expression. And the left operand is 1 which is an untyped constant (e.g. int(1) would be a typed constant), so it is converted to a type that it would get if the expression would be simply 1 instead of 1 << s:
fmt.Println(1)
In the above, the untyped constant 1 would be converted to int, because that is its default type. Default type of constants is in Spec: Constants:
An untyped constant has a default type which is the type to which the constant is implicitly converted in contexts where a typed value is required, for instance, in a short variable declaration such as i := 0 where there is no explicit type. The default type of an untyped constant is bool, rune, int, float64, complex128 or string respectively, depending on whether it is a boolean, rune, integer, floating-point, complex, or string constant.
And the result of the above is architecture dependent. If int is 32 bits, it will be 0. If int is 64 bits, it will be 8589934592 (because shifting a 1 bit 33 times will shift it out of a 32-bit int number).
On the Go playground, size of int is 32 bits (4 bytes). See this example:
fmt.Println("int size:", unsafe.Sizeof(int(0)))
var s uint = 33
fmt.Println(1 << s)
fmt.Println(int32(1) << s)
fmt.Println(int64(1) << s)
The above outputs (try it on the Go Playground):
int size: 4
0
0
8589934592
If I run the above app on my 64-bit computer, the output is:
int size: 8
8589934592
0
8589934592
Also see The Go Blog: Constants for how constants work in Go.
Note that if you write 1 << 33, that is not the same, that is not a non-constant shift expression, which your quote applies to: "the left operand of a non-constant shift expression". 1<<33 is a constant shift expression, which is evaluated at "constant space", and the result would be converted to int which does not fit into a 32-bit int, hence the compile-time error. It works with variables, because variables can overflow. Constants do not overflow:
Numeric constants represent exact values of arbitrary precision and do not overflow.
See How does Go perform arithmetic on constants?
Update:
Answering your addition: converting from int16 to int8 simply keeps the lowest 8 bits. And integers are represented using the 2's complement format, where the highest bit is 1 if the number is negative.
This is detailed in Spec: Conversions:
When converting between integer types, if the value is a signed integer, it is sign extended to implicit infinite precision; otherwise it is zero extended. It is then truncated to fit in the result type's size. For example, if v := uint16(0x10F0), then uint32(int8(v)) == 0xFFFFFFF0. The conversion always yields a valid value; there is no indication of overflow.
So when you convert a int16 value to int8, if source number has a 1 in bit position 7 (8th bit), the result will be negative, even if the source wasn't negative. Similarly, if the source has 0 at bit position 7, the result will be positive, even if the source is negative.
See this example:
for _, v := range []int16{4336, -129, 8079} {
fmt.Printf("Source : %v\n", v)
fmt.Printf("Source hex: %4x\n", uint16(v))
fmt.Printf("Result hex: %4x\n", uint8(int8(v)))
fmt.Printf("Result : %4v\n", uint8(int8(v)))
fmt.Println()
}
Output (try it on the Go Playground):
Source : 4336
Source hex: 10f0
Result hex: f0
Result : -16
Source : -129
Source hex: ff7f
Result hex: 7f
Result : 127
Source : 8079
Source hex: 1f8f
Result hex: 8f
Result : -113
See related questions:
When casting an int64 to uint64, is the sign retained?
Format printing the 64bit integer -1 as hexadecimal deviates between golang and C
You're building and running the program in 32bit mode (go playground?). In it, int is 32-bit wide and behaves the same as int32.

How to calculate g values from LIS3DH sensor?

I am using LIS3DH sensor with ATmega128 to get the acceleration values to get motion. I went through the datasheet but it seemed inadequate so I decided to post it here. From other posts I am convinced that the sensor resolution is 12 bit instead of 16 bit. I need to know that when finding g value from the x-axis output register, do we calculate the two'2 complement of the register values only when the sign bit MSB of OUT_X_H (High bit register) is 1 or every time even when this bit is 0.
From my calculations I think that we calculate two's complement only when MSB of OUT_X_H register is 1.
But the datasheet says that we need to calculate two's complement of both OUT_X_L and OUT_X_H every time.
Could anyone enlighten me on this ?
Sample code
int main(void)
{
stdout = &uart_str;
UCSRB=0x18; // RXEN=1, TXEN=1
UCSRC=0x06; // no parit, 1-bit stop, 8-bit data
UBRRH=0;
UBRRL=71; // baud 9600
timer_init();
TWBR=216; // 400HZ
TWSR=0x03;
TWCR |= (1<<TWINT)|(1<<TWSTA)|(0<<TWSTO)|(1<<TWEN);//TWCR=0x04;
printf("\r\nLIS3D address: %x\r\n",twi_master_getchar(0x0F));
twi_master_putchar(0x23, 0b000100000);
printf("\r\nControl 4 register 0x23: %x", twi_master_getchar(0x23));
printf("\r\nStatus register %x", twi_master_getchar(0x27));
twi_master_putchar(0x20, 0x77);
DDRB=0xFF;
PORTB=0xFD;
SREG=0x80; //sei();
while(1)
{
process();
}
}
void process(void){
x_l = twi_master_getchar(0x28);
x_h = twi_master_getchar(0x29);
y_l = twi_master_getchar(0x2a);
y_h = twi_master_getchar(0x2b);
z_l = twi_master_getchar(0x2c);
z_h = twi_master_getchar(0x2d);
xvalue = (short int)(x_l+(x_h<<8));
yvalue = (short int)(y_l+(y_h<<8));
zvalue = (short int)(z_l+(z_h<<8));
printf("\r\nx_val: %ldg", x_val);
printf("\r\ny_val: %ldg", y_val);
printf("\r\nz_val: %ldg", z_val);
}
I wrote the CTRL_REG4 as 0x10(4g) but when I read them I got 0x20(8g). This seems bit bizarre.
Do not compute the 2s complement. That has the effect of making the result the negative of what it was.
Instead, the datasheet tells us the result is already a signed value. That is, 0 is not the lowest value; it is in the middle of the scale. (0xffff is just a little less than zero, not the highest value.)
Also, the result is always 16-bit, but the result is not meant to be taken to be that accurate. You can set a control register value to to generate more accurate values at the expense of current consumption, but it is still not guaranteed to be accurate to the last bit.
the datasheet does not say (at least the register description in chapter 8.2) you have to calculate the 2' complement but stated that the contents of the 2 registers is in 2's complement.
so all you have to do is receive the two bytes and cast it to an int16_t to get the signed raw value.
uint8_t xl = 0x00;
uint8_t xh = 0xFC;
int16_t x = (int16_t)((((uint16)xh) << 8) | xl);
or
uint8_t xa[2] {0x00, 0xFC}; // little endian: lower byte to lower address
int16_t x = *((int16*)xa);
(hope i did not mixed something up with this)
I have another approach, which may be easier to implement as the compiler will do all of the work for you. The compiler will probably do it most efficiently and with no bugs too.
Read the raw data into the raw field in:
typedef union
{
struct
{
// in low power - 8 significant bits, left justified
int16 reserved : 8;
int16 value : 8;
} lowPower;
struct
{
// in normal power - 10 significant bits, left justified
int16 reserved : 6;
int16 value : 10;
} normalPower;
struct
{
// in high resolution - 12 significant bits, left justified
int16 reserved : 4;
int16 value : 12;
} highPower;
// the raw data as read from registers H and L
uint16 raw;
} LIS3DH_RAW_CONVERTER_T;
than use the value needed according to the power mode you are using.
Note: In this example, bit fields structs are BIG ENDIANS.
Check if you need to reverse the order of 'value' and 'reserved'.
The LISxDH sensors are 2's complement, left-justified. They can be set to 12-bit, 10-bit, or 8-bit resolution. This is read from the sensor as two 8-bit values (LSB, MSB) that need to be assembled together.
If you set the resolution to 8-bit, just can just cast LSB to int8, which is the likely your processor's representation of 2's complement (8bit). Likewise, if it were possible to set the sensor to 16-bit resolution, you could just cast that to an int16.
However, if the value is 10-bit left justified, the sign bit is in the wrong place for an int16. Here is how you convert it to int16 (16-bit 2's complement).
1.Read LSB, MSB from the sensor:
[MMMM MMMM] [LL00 0000]
[1001 0101] [1100 0000] //example = [0x95] [0xC0] (note that the LSB comes before MSB on the sensor)
2.Assemble the bytes, keeping in mind the LSB is left-justified.
//---As an example....
uint8_t byteMSB = 0x95; //[1001 0101]
uint8_t byteLSB = 0xC0; //[1100 0000]
//---Cast to U16 to make room, then combine the bytes---
assembledValue = ( (uint16_t)(byteMSB) << UINT8_LEN ) | (uint16_t)byteLSB;
/*[MMMM MMMM LL00 0000]
[1001 0101 1100 0000] = 0x95C0 */
//---Shift to right justify---
assembledValue >>= (INT16_LEN-numBits);
/*[0000 00MM MMMM MMLL]
[0000 0010 0101 0111] = 0x0257 */
3.Convert from 10-bit 2's complement (now right-justified) to an int16 (which is just 16-bit 2's complement on most platforms).
Approach #1: If the sign bit (in our example, the tenth bit) = 0, then just cast it to int16 (since positive numbers are represented the same in 10-bit 2's complement and 16-bit 2's complement).
If the sign bit = 1, then invert the bits (keeping just the 10bits), add 1 to the result, then multiply by -1 (as per the definition of 2's complement).
convertedValueI16 = ~assembledValue; //invert bits
convertedValueI16 &= ( 0xFFFF>>(16-numBits) ); //but keep just the 10-bits
convertedValueI16 += 1; //add 1
convertedValueI16 *=-1; //multiply by -1
/*Note that the last two lines could be replaced by convertedValueI16 = ~convertedValueI16;*/
//result = -425 = 0xFE57 = [1111 1110 0101 0111]
Approach#2: Zero the sign bit (10th bit) and subtract out half the range 1<<9
//----Zero the sign bit (tenth bit)----
convertedValueI16 = (int16_t)( assembledValue^( 0x0001<<(numBits-1) ) );
/*Result = 87 = 0x57 [0000 0000 0101 0111]*/
//----Subtract out half the range----
convertedValueI16 -= ( (int16_t)(1)<<(numBits-1) );
[0000 0000 0101 0111]
-[0000 0010 0000 0000]
= [1111 1110 0101 0111];
/*Result = 87 - 512 = -425 = 0xFE57
Link to script to try out (not optimized): http://tpcg.io/NHmBRR

Why do x64 projects use a default packing alignment of 16?

If you compile the following code in a x64 project in VS2012 without any /Zp flags:
#pragma pack(show)
then the compiler will spit out:
value of pragma pack(show) == 16
If the project uses Win32 then, the compiler will spit out:
value of pragma pack(show) == 8
What I don't understand is that the largest natural alignment of any type (ie: long long and pointer) in Win64 is 8. So why not just make the default alignment 8 for x64?
Somewhat related to that, why would anyone ever use /Zp16?
EDIT:
Here's an example to show what I'm talking about. Even though pointers have a natural alignment of 8 bytes for x64, Zp1 can force them to a 1 byte boundary.
struct A
{
char a;
char* b;
}
// Zp16
// Offset of a == 0
// Offset of b == 8
// Zp1
// Offset of a == 0
// Offset of b == 1
Now if we take an example that uses SSE:
struct A
{
char a;
char* b;
__m128 c; // uses declspec(align(16)) in xmmintrinsic.h
}
// Zp16
// Offset of a == 0
// Offset of b == 8
// Offset of c == 16
// Zp1
// Offset of a == 0
// Offset of b == 1
// Offset of c == 16
If __m128 were truly a builtin type, then I'd expect the offset to be 9 with Zp1. But since it uses __declspec(align(16)) in its definition in xmmintrinsic.h, that trumps any Zp settings.
So here's my question worded a little differently: is there a type for 'c' that has a natural alignment of 16B but will have an offset of 9 in the previous example?
The MSDN page here includes the following relevant information about your question "why not make the default alignment 8 for x64?":
Writing applications that use the latest processor instructions introduces some new constraints and issues. In particular, many new instructions require that data must be aligned to 16-byte boundaries. Additionally, by aligning frequently used data to the cache line size of a specific processor, you improve cache performance. For example, if you define a structure whose size is less than 32 bytes, you may want to align it to 32 bytes to ensure that objects of that structure type are efficiently cached.
Why do x64 projects use a default packing alignment of 16?
On x64 the floating point is performed in the SSE unit. You state that the largest type has alignment 8. But that is not correct. Some of the SSE intrinsic types, for example __m128, have alignment of 16.

Resources