Why does `sizeof var` not show the true size? - gcc

Suppose we use avr-gcc to compile code which has the following structure:
typedef struct {
    uint8_t bLength;
    uint8_t bDescriptorType;
    int16_t wString[];
} S_string_descriptor;
We initialize it globally like this:
const S_string_descriptor sn_desc PROGMEM = {
    1 + 1 + sizeof L"1234" - 2, 0x03, L"1234"
};
Let's check what is generated from it:
000000ac <__trampolines_end>:
ac: 0a 03 fmul r16, r18
ae: 31 00 .word 0x0031 ; ????
b0: 32 00 .word 0x0032 ; ????
b2: 33 00 .word 0x0033 ; ????
b4: 34 00 .word 0x0034 ; ????
...
So, indeed, the string content follows the first two members of the structure, as required.
But if we check sizeof sn_desc, the result is 2.
The variable is defined at compile time, and sizeof is also a compile-time operator. So why does sizeof var not show the true size of var? And where is this behavior of the compiler (i.e., appending extra data to a structure) documented?

sizeof sn_desc reports only the fixed part of the structure, i.e. the two uint8_t members. wString is a flexible array member, which by definition contributes nothing to sizeof (C11 6.7.2.1); initializing it is a GNU extension that simply emits the extra data right after the fixed members, which is exactly what your disassembly shows. The object lives in flash and is meant to be read through its address with LPM et alia (pgm_read_byte and friends). There is no way to recover the length of that data from the type; store it separately.
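One possible way to keep the size available is sketched below; SN_STRING, SN_DESC_SIZE and sn_desc_byte are illustrative names that are not in the original code, and avr-libc's pgm_read_byte is assumed. The size is derived from the same literal that initializes the flexible array member, since sizeof sn_desc will only ever report the 2-byte fixed part.

#include <stdint.h>
#include <avr/pgmspace.h>

typedef struct {
    uint8_t bLength;
    uint8_t bDescriptorType;
    int16_t wString[];
} S_string_descriptor;

/* Derive the descriptor size from the literal itself (same value as bLength). */
#define SN_STRING    L"1234"
#define SN_DESC_SIZE (1 + 1 + sizeof SN_STRING - 2)

const S_string_descriptor sn_desc PROGMEM = {
    SN_DESC_SIZE, 0x03, SN_STRING
};

/* Read byte i of the descriptor out of flash (i < SN_DESC_SIZE). */
static inline uint8_t sn_desc_byte(uint8_t i)
{
    return pgm_read_byte((const uint8_t *)&sn_desc + i);
}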


What do the constraints "Rah" and "Ral" mean in extended inline assembly?

This question is inspired by a question asked by someone on another forum. In the following code, what do the extended inline assembly constraints Rah and Ral mean? I haven't seen these before:
#include <stdint.h>
void tty_write_char(uint8_t inchar, uint8_t page_num, uint8_t fg_color)
{
    asm (
        "int $0x10"
        :
        : "b" ((uint16_t)page_num<<8 | fg_color),
          "Rah"((uint8_t)0x0e), "Ral"(inchar));
}
void tty_write_string(const char *string, uint8_t page_num, uint8_t fg_color)
{
    while (*string)
        tty_write_char(*string++, page_num, fg_color);
}
/* Use the BIOS to print the first command line argument to the console */
int main(int argc, char *argv[])
{
    if (argc > 1)
        tty_write_string(argv[1], 0, 0);
    return 0;
}
In particular, the use of Rah and Ral as constraints in this code:
asm (
    "int $0x10"
    :
    : "b" ((uint16_t)page_num<<8 | fg_color),
      "Rah"((uint8_t)0x0e), "Ral"(inchar));
The GCC documentation doesn't have an l or h constraint under either the simple constraints or the x86/x86-64 machine constraints. R is any legacy register and a is the AX/EAX/RAX register.
What am I not understanding?
What you are looking at is code that is intended to be run in real mode on an x86-based PC with a BIOS. Int 0x10 is a BIOS service that can write to the console. In particular, Int 0x10/AH=0x0e writes a single character to the TTY (terminal).
That in itself doesn't explain what the constraints mean. To understand the constraints Rah and Ral, you have to understand that this code isn't being compiled by a standard version of GCC/Clang. It is being compiled by a GCC port called ia16-gcc, a special port that targets the 8086/80186/80286 and compatible processors. It doesn't generate 386 instructions or use 32-bit registers in code generation. This experimental version of GCC targets 16-bit environments like DOS (FreeDOS, MS-DOS) and ELKS.
The documentation for ia16-gcc is hard to find online in HTML format but I have produced a copy for the recent GCC 6.3.0 versions of the documentation on GitHub. The documentation was produced by building ia16-gcc from source and using make to generate the HTML. If you review the machine constraints for Intel IA-16—config/ia16 you should now be able to see what is going on:
Ral The al register.
Rah The ah register.
This version of GCC doesn't understand the bare R constraint by itself anymore. The inline assembly you are looking at matches the parameters of Int 0x10/AH=0x0e:
VIDEO - TELETYPE OUTPUT
AH = 0Eh
AL = character to write
BH = page number
BL = foreground color (graphics modes only)
Return:
Nothing
Desc: Display a character on the screen, advancing the cursor
and scrolling the screen as necessary
Other Information
The documentation does list all the constraints that are available for the IA16 target:
Intel IA-16—config/ia16/constraints.md
a
The ax register. Note that for a byte operand,
this constraint means that the operand can go into either al or ah.
b
The bx register.
c
The cx register.
d
The dx register.
S
The si register.
D
The di register.
Ral
The al register.
Rah
The ah register.
Rcl
The cl register.
Rbp
The bp register.
Rds
The ds register.
q
Any 8-bit register.
T
Any general or segment register.
A
The dx:ax register pair.
j
The bx:dx register pair.
l
The lower half of pairs of 8-bit registers.
u
The upper half of pairs of 8-bit registers.
k
Any 32-bit register group with access to the two lower bytes.
x
The si and di registers.
w
The bx and bp registers.
B
The bx, si, di and bp registers.
e
The es register.
Q
Any available segment register—either ds or es (unless one or both have been fixed).
Z
The constant 0.
P1
The constant 1.
M1
The constant -1.
Um
The constant -256.
Lbm
The constant 255.
Lor
Constants 128 … 254.
Lom
Constants 1 … 254.
Lar
Constants -255 … -129.
Lam
Constants -255 … -2.
Uo
Constants 0xXX00 except -256.
Ua
Constants 0xXXFF.
Ish
A constant usable as a shift count.
Iaa
A constant multiplier for the aad instruction.
Ipu
A constant usable with the push instruction.
Imu
A constant usable with the imul instruction except 257.
I11
The constant 257.
N
Unsigned 8-bit integer constant (for in and out instructions).
There are many new constraints and some repurposed ones.
In particular, the a constraint for the AX register doesn't work the way it does in versions of GCC that target 32-bit and 64-bit code. The compiler is free to choose either AH or AL for an a-constrained operand if the value being passed is an 8-bit value. This means it is possible for the a constraint to appear twice in a single extended inline assembly statement.
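If the AH/AL split must be pinned without resorting to the Rah/Ral constraints, one option is to pass a single 16-bit operand in AX. This is only a sketch, not taken from the original post, and it assumes the ia16-gcc toolchain described above:

#include <stdint.h>

/* Sketch: pack AH=0x0e and AL=inchar into one 16-bit "a" operand, so the
   register halves are fixed by the value rather than by the constraint. */
void tty_write_char_ax(uint8_t inchar, uint8_t page_num, uint8_t fg_color)
{
    asm volatile ("int $0x10"
                  :
                  : "a" ((uint16_t)((0x0eU << 8) | inchar)),
                    "b" ((uint16_t)(page_num << 8 | fg_color)));
}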
You could have compiled your code to a DOS EXE with this command:
ia16-elf-gcc -mcmodel=small -mregparmcall -march=i186 \
-Wall -Wextra -std=gnu99 -O3 int10h.c -o int10h.exe
This targets the 80186. You can generate 8086-compatible code by omitting -march=i186. The generated code for main would look something like:
00000000 <main>:
0: 83 f8 01 cmp ax,0x1
3: 7e 1d jle 22 <tty_write_string+0xa>
5: 56 push si
6: 89 d3 mov bx,dx
8: 8b 77 02 mov si,WORD PTR [bx+0x2]
b: 8a 04 mov al,BYTE PTR [si]
d: 20 c0 and al,al
f: 74 0d je 1e <tty_write_string+0x6>
11: 31 db xor bx,bx
13: b4 0e mov ah,0xe
15: 46 inc si
16: cd 10 int 0x10
18: 8a 04 mov al,BYTE PTR [si]
1a: 20 c0 and al,al
1c: 75 f7 jne 15 <main+0x15>
1e: 31 c0 xor ax,ax
20: 5e pop si
21: c3 ret
22: 31 c0 xor ax,ax
24: c3 ret
When run with the command line int10h.exe "Hello, world!", it should print:
Hello, world!
Special note: the IA16 port of GCC is very experimental and does have some code generation bugs, especially when higher optimization levels are used. I wouldn't use it for mission-critical applications at this point in time.

Instruction repeated twice when decoded into machine language

I am learning how to make my own instruction for the x86 architecture, but to do that I am first trying to understand how instructions are decoded and interpreted at a low level.
Taking a simple mov instruction as an example and using the .byte directive, I wanted to understand in detail how instructions are decoded.
My simple code is as follows:
#include <stdio.h>
#include <iostream>
int main(int argc, char const *argv[])
{
    int x{5};
    int y{0};
    // mov %%eax, %0
    asm (".byte 0x8b,0x45,0xf8\n\t"   // mov %1, eax
         ".byte 0x89, 0xC0\n\t"
         : "=r" (y)
         : "r" (x)
         );
    printf ("dst value : %d\n", y);
    return 0;
}
and when I use objdump to analyze how it is broken down into machine language, I get the following output:
000000000000078a <main>:
78a: 55 push %ebp
78b: 48 dec %eax
78c: 89 e5 mov %esp,%ebp
78e: 48 dec %eax
78f: 83 ec 20 sub $0x20,%esp
792: 89 7d ec mov %edi,-0x14(%ebp)
795: 48 dec %eax
796: 89 75 e0 mov %esi,-0x20(%ebp)
799: c7 45 f8 05 00 00 00 movl $0x5,-0x8(%ebp)
7a0: c7 45 fc 00 00 00 00 movl $0x0,-0x4(%ebp)
7a7: 8b 45 f8 mov -0x8(%ebp),%eax
7aa: 8b 45 f8 mov -0x8(%ebp),%eax
7ad: 89 c0 mov %eax,%eax
7af: 89 45 fc mov %eax,-0x4(%ebp)
7b2: 8b 45 fc mov -0x4(%ebp),%eax
7b5: 89 c6 mov %eax,%esi
7b7: 48 dec %eax
7b8: 8d 3d f7 00 00 00 lea 0xf7,%edi
7be: b8 00 00 00 00 mov $0x0,%eax
7c3: e8 78 fe ff ff call 640 <printf#plt>
7c8: b8 00 00 00 00 mov $0x0,%eax
7cd: c9 leave
7ce: c3 ret
With regard to this output of objdump, why is the instruction 7aa: 8b 45 f8 mov -0x8(%ebp),%eax repeated twice? Is there any reason behind it, or am I doing something wrong while using the .byte notation?
One of those is compiler-generated, because you asked GCC to have the input in its choice of register for you. That's what "r"(x) means. And you compiled with optimization disabled (the default -O0) so it actually stored x to memory and then reloaded it before your asm statement.
Your code has no business assuming anything about the contents of memory or where EBP points.
Since you're using 89 c0 mov %eax,%eax, the only safe constraints for your asm statement are "a" explicit-register constraints for input and output, forcing the compiler to pick that. If you compile with optimization enabled, your code totally breaks because you lied to the compiler about what your code actually does.
// constraints that match your manually-encoded instruction
asm (".byte 0x89, 0xC0\n\t"
     : "=a" (y)
     : "a" (x)
     );
There's no constraint to force GCC to pick a certain addressing mode for a "m" source or "=m" dest operand so you need to ask for inputs/outputs in specific registers.
If you want to encode your own mov instructions differently from standard mov, see which MOV instructions in the x86 are not used or the least used, and can be used for a custom MOV extension - you might want to use a prefix in front of regular mov opcodes so you can let the assembler encode registers and addressing modes for you, like .byte something; mov %1, %0.
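As a concrete illustration of that prefix idea (a sketch only, not from the original answer; the choice of 0x3e is hypothetical), you can emit the prefix byte yourself and still let the assembler encode whichever registers GCC picked for the operands:

// 0x3e is the DS segment-override prefix, which has no effect on a
// register-to-register mov, so the instruction still behaves like a plain mov.
asm (".byte 0x3e\n\t"
     "mov %1, %0\n\t"
     : "=r" (y)
     : "r" (x));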
Look at the compiler-generated asm output (gcc -S, not disassembly of the .o or executable). Then you can see which instructions come from the asm statement and which are emitted by GCC.
If you don't explicitly reference some operands in the asm template but still want to see what the compiler picked, you can use them in asm comments like this:
asm (".byte 0x8b,0x45,0xf8 # 0 = %0 1 = %1 \n\t"
".byte 0x89, 0xC0\n\t"
: "=r" (y)
: "r" (x)
);
and gcc will fill it in for you so you can see what operands it expects you to be reading and writing. (Godbolt with g++ -m32 -O3). I put your code in void foo(){} instead of main because GCC -m32 thinks it needs to re-align the stack at the top of main. This makes the code a lot harder to follow.
# gcc-9.2 -O3 -m32 -fverbose-asm
.LC0:
.string "dst value : %d\n"
foo():
subl $20, %esp #,
movl $5, %eax #, tmp84
## Notice that GCC hasn't set up EBP at all before it runs your asm,
## and hasn't stored x in memory.
## It only put it in a register like you asked it to.
.byte 0x8b,0x45,0xf8 # 0 = %eax 1 = %eax # y, tmp84
.byte 0x89, 0xC0
pushl %eax # y
pushl $.LC0 #
call printf #
addl $28, %esp #,
ret
Also note that if you were compiling as 64-bit, it would probably pick %esi as a register because printf will want its 2nd arg there. So the "a" instead of "r" constraint would actually matter.
You could get 32-bit GCC to use a different register if you were assigning to a variable that has to survive across a function call; then GCC would pick a call-preserved reg like EBX instead of EAX.

Memory overwrite when using strong and weak symbols in two linked files

Assume we have a 64-bit x86 machine, which is little-endian and therefore stores the least-significant byte of a word in the byte with the lowest address.
Assuming standard alignment rules for a 64-bit x86 Linux C compiler.
Consider
File 1:
#include <stdio.h>
struct cs {
    int count;
    unsigned short flags;
};
struct cs gcount;
extern void add_counter( int n );
int main(int c, char *argv[]);
int main(int c, char *argv[]) {
    gcount.flags = 0xe700;
    gcount.count = 1;
    add_counter(42);
    printf("count =%d\n", gcount.count);
    return 0;
}
File 2:
struct cs {
    unsigned short flags;
    int count;
};
struct cs gcount = {0,0};
void add_counter (int n) {
    gcount.count += n;
}
If compiled and run, the output is 1.
Explanation:
gcount is defined as a strong global in the second file and is thus
initialized to {0,0}; here the order doesn't matter yet since it's
just all zeroes.
A struct / type is defined per compilation unit so the first file uses
the first definition to write to the struct meaning
gcount.flags = 0xe700; gcount.count=1;
cause the memory to look like
[e7 00 | 00 00 00 01] where (in little endian) the left is the top and
the right is the bottom of memory.
(there's no padding between the two fields since short is at the end,
sizeof will report 8B though)
when calling add_counter(42), the second file will use the second
definition of cs and look at the memory as
[e7 00 00 00 | 00 01]
Now there's a 2B padding in between the two fields and the write
access to the count will thus affect the range
[e7 00 00 00 | 00 01]
42 is 0x2a in hexadecimal (2*16 + 10) and will thus result in
[e7 2a 00 00 | 00 01]
converting this back to the view the first file has we get
[e7 2a | 00 00 00 01]
and thus the result is 1 instead of the expected 43.
Now I do get the general gist, but I'm a bit confused about why we get [e7 2a 00 00 | 00 01] when adding 42 = 0x2a and not [e7 00 00 2a | 00 01].
I'm expecting [e7 00 00 2a | 00 01] because we are using little-endian, meaning, the most right bit is the LSB. So e7 would actually represent the most significant 8 bits here.
My disparaging comments about the exercise itself notwithstanding, it is possible to interpret the question as a simpler one about byte ordering. In that sense, the issue is with this assertion:
little-endian, meaning, the most right bit is the LSB.
Little-endian means that the bytes are ordered from least significant to most significant. The term having been coined in English, and English being written left to right, the leftmost byte of a dump written in increasing address order is the LSB in little-endian ordering.
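A small single-file sketch (the struct names cs1/cs2 are mine, standing in for the two files' conflicting definitions) shows why the same bytes are interpreted differently: offsetof reveals where each definition places count and flags.

#include <stdio.h>
#include <stddef.h>

struct cs1 { int count; unsigned short flags; };   /* File 1's layout */
struct cs2 { unsigned short flags; int count; };   /* File 2's layout */

int main(void)
{
    /* On a typical 64-bit x86 Linux compiler this prints:
       cs1: count at 0, flags at 4, size 8
       cs2: flags at 0, count at 4, size 8 */
    printf("cs1: count at %zu, flags at %zu, size %zu\n",
           offsetof(struct cs1, count), offsetof(struct cs1, flags), sizeof(struct cs1));
    printf("cs2: flags at %zu, count at %zu, size %zu\n",
           offsetof(struct cs2, flags), offsetof(struct cs2, count), sizeof(struct cs2));
    return 0;
}

So File 1's main writes count into bytes 0-3, while File 2's add_counter adds 42 to the int at bytes 4-7 of the same object, which is why the printed count stays 1.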

What's the memory layout of UTF-16 encoded strings with Visual Studio 2015?

WinAPI uses wchar_t buffers. As I understand it, we need to use UTF-16 to encode all our arguments to WinAPI.
There are two versions of UTF-16: UTF-16BE and UTF-16LE. Let's encode the string "Example" (ASCII 0x45 0x78 0x61 0x6d 0x70 0x6c 0x65). With UTF-16BE the bytes are placed like this: 00 45 00 78 00 61 00 6d 00 70 00 6c 00 65. With UTF-16LE it is 45 00 78 00 61 00 6d 00 70 00 6c 00 65 00. (We are omitting the BOM.) The byte representations of the same string are different.
According to the docs, Windows uses UTF-16LE. This means that we should encode all strings as UTF-16LE or it would not work.
At the same time, my compiler (VS2015) uses UTF-16BE for the strings that I hard-coded into my code (something like L"my test string"). But WinAPI works well with these strings. Why does it work? What am I missing?
Update 1:
To test the byte representation of hard-coded strings I used the following code:
std::string charToHex(wchar_t ch)
{
    const char alphabet[] = "0123456789ABCDEF";
    std::string result(4, ' ');
    result[0] = alphabet[static_cast<unsigned int>((ch & 0xf000) >> 12)];
    result[1] = alphabet[static_cast<unsigned int>((ch & 0xf00) >> 8)];
    result[2] = alphabet[static_cast<unsigned int>((ch & 0xf0) >> 4)];
    result[3] = alphabet[static_cast<unsigned int>(ch & 0xf)];
    return std::move(result);
}
Little-endian or big-endian describes the way that variables of more than 8 bits are stored in memory. The test you have devised doesn't test memory layout; it works with wchar_t values directly, and the upper bits of an integer type are always the upper bits, no matter whether the CPU is big-endian or little-endian!
This modification to your code will show how it really works.
std::string charToHex(wchar_t *pch)
{
    const char alphabet[] = "0123456789ABCDEF";
    std::string result;
    // Walk the object's raw bytes in memory, in address order.
    unsigned char *pbytes = reinterpret_cast<unsigned char *>(pch);
    for (size_t i = 0; i < sizeof(wchar_t); ++i)
    {
        result.push_back(alphabet[(pbytes[i] & 0xf0) >> 4]);
        result.push_back(alphabet[pbytes[i] & 0x0f]);
    }
    return result;
}
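For completeness, here is a small self-contained sketch of the same idea (not part of the original answer) that dumps the raw bytes of a wide literal; on a little-endian Windows-style build where wchar_t is 2 bytes it prints 45 00 78 00 61 00 ..., i.e. UTF-16LE.

#include <cstdio>
#include <cstddef>
#include <cwchar>

int main()
{
    const wchar_t *s = L"Example";
    const unsigned char *p = reinterpret_cast<const unsigned char *>(s);
    for (std::size_t i = 0; i < std::wcslen(s) * sizeof(wchar_t); ++i)
        std::printf("%02X ", p[i]);   // byte dump in address order
    std::printf("\n");
    return 0;
}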

Arithmetic shift using intel intrinsics

I have a set of bits, for example 1000 0000 0000 0000, which is 16 bits and therefore a short. I would like to use an arithmetic shift so that the MSB fills in the rest of the bits:
1111 1111 1111 1111
and if I started off with 0000 0000 0000 0000:
after arithmetically shifting I would still have: 0000 0000 0000 0000
How can I do this without resorting to assembly? I looked at the Intel intrinsics guide and it looks like I would need the AVX extensions, but they operate on data types larger than my short.
As mattnewport states in his answer, C code can do the job efficiently though with the possibility of 'implementation defined' behavior. This answer shows how to avoid the implementation defined behavior while preserving efficient code generation.
Because the question is about shifting a 16-bit operand, the concern over the implementation defined decision whether to sign extend or zero fill can be avoided by first sign extending the operand to 32-bits. Then the 32-bit value can be right shifted as unsigned, and finally truncated back to 16-bits.
Mattnewport's code actually sign extends the 16-bit operand to an int (32-bit or 64-bit depending on the compiler model) before shifting. This is because the language specification (C99 6.5.7 bitwise shift operators) requires a first step: integer promotions are performed on each of the operands. Likewise, the result of mattnewport's code is an int because The type of the result is that of the promoted left operand. Because of this, the code variation that avoids implementation defined behavior generates the same number of instructions as mattnewport's original code.
To avoid implementation defined behavior, the implicit promotion to signed int is replaced with an explicit promotion to unsigned int. This eliminates any possibility of implementation defined behavior while maintaining the same code efficiency.
The idea can be extended to cover 32-bit operands and works efficiently when 64-bit native integer support is present. Here is an example:
// use standard 'll' for long long print format
#define __USE_MINGW_ANSI_STDIO 1
#include <stdio.h>
#include <stdint.h>
// Code provided by mattnewport
int16_t aShiftRight16x (int16_t val, int count)
{
    return val >> count;
}
// This variation avoids implementation defined behavior
int16_t aShiftRight16y (int16_t val, int count)
{
    uint32_t uintVal = val;
    uint32_t uintResult = uintVal >> count;
    return (int16_t) uintResult;
}
// A 32-bit arithmetic right shift without implementation defined behavior
int32_t aShiftRight32 (int32_t val, int count)
{
    uint64_t uint64Val = val;
    uint64_t uint64Result = uint64Val >> count;
    return (int32_t) uint64Result;
}
int main (void)
{
    int16_t val16 = 0x8000;
    int32_t val32 = 0x80000000;
    int count;
    for (count = 0; count <= 15; count++)
        printf ("%04hX %04hX %08X\n", aShiftRight16x (val16, count),
                aShiftRight16y (val16, count),
                aShiftRight32 (val32, count));
    return 0;
}
Here is gcc 4.8.1 x64 code generation:
0000000000000030 <aShiftRight16x>:
30: 0f bf c1 movsx eax,cx
33: 89 d1 mov ecx,edx
35: d3 f8 sar eax,cl
37: c3 ret
0000000000000040 <aShiftRight16y>:
40: 0f bf c1 movsx eax,cx
43: 89 d1 mov ecx,edx
45: d3 e8 shr eax,cl
47: c3 ret
0000000000000050 <aShiftRight32>:
50: 48 63 c1 movsxd rax,ecx
53: 89 d1 mov ecx,edx
55: 48 d3 e8 shr rax,cl
58: c3 ret
Here is MS visual studio x64 code generation:
aShiftRight16x:
00: 0F BF C1 movsx eax,cx
03: 8B CA mov ecx,edx
05: D3 F8 sar eax,cl
07: C3 ret
aShiftRight16y:
10: 0F BF C1 movsx eax,cx
13: 8B CA mov ecx,edx
15: D3 E8 shr eax,cl
17: C3 ret
aShiftRight32:
20: 48 63 C1 movsxd rax,ecx
23: 8B CA mov ecx,edx
25: 48 D3 E8 shr rax,cl
28: C3 ret
program output:
8000 8000 80000000
C000 C000 C0000000
E000 E000 E0000000
F000 F000 F0000000
F800 F800 F8000000
FC00 FC00 FC000000
FE00 FE00 FE000000
FF00 FF00 FF000000
FF80 FF80 FF800000
FFC0 FFC0 FFC00000
FFE0 FFE0 FFE00000
FFF0 FFF0 FFF00000
FFF8 FFF8 FFF80000
FFFC FFFC FFFC0000
FFFE FFFE FFFE0000
FFFF FFFF FFFF0000
I'm not sure why you are looking for intrinsics for this. Why not just use ordinary C++ right shift? This behavior is implementation defined but AFAIK on Intel platforms it will always sign extend.
int16_t val = 1 << 15; // 1000 0000 0000 0000
int16_t shiftVal = val >> 15; // 1111 1111 1111 1111
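For completeness, and only as a sketch that is not part of either answer above: if a SIMD intrinsic really is wanted, SSE2 already provides an arithmetic right shift on 16-bit lanes, _mm_srai_epi16, so AVX is not required.

#include <stdio.h>
#include <stdint.h>
#include <emmintrin.h>   /* SSE2 */

int main(void)
{
    int16_t val = (int16_t)0x8000;           /* 1000 0000 0000 0000 */
    __m128i v   = _mm_set1_epi16(val);       /* broadcast into all 8 lanes */
    __m128i r   = _mm_srai_epi16(v, 15);     /* arithmetic >> 15, per lane */
    int16_t out = (int16_t)_mm_extract_epi16(r, 0);
    printf("%04hX\n", (uint16_t)out);        /* prints FFFF */
    return 0;
}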

Resources