Atmel Studio AVR: operand out of range

I am working on a small program to learn how to mix C and assembly in Atmel Studio with GCC. Basically, I am writing a C program that initializes the stack pointer in assembly. When I build it I keep getting the error "Operand out of range". I've initialized stack pointers many times in other programs but can't get it to work in the ".s" file of this one. I've gotten the program to work with a different register in the ".s" file, so I don't think the problem is the connection between the two files. Any help would be appreciated.
Main.c:
#define F_CPU 16000000
#include <avr/io.h>
#include <util/delay.h>
extern void setStackPointer(void);
int main(void)
{
    DDRB |= 0x20;
    setStackPointer();
    while (1)
    {
        PORTB = 0x20;
        _delay_ms(500);
        PORTB = 0x00;
        _delay_ms(500);
    }
}
Assembler1.s:
#define _SFR_ASM_COMPAT 1
#define _SFR_OFFSET 0
#include <avr/io.h>
.global setStackPointer
setStackPointer:
ldi r18, lo8(RAMEND-0x20)
out SPL, R18
ldi R18, hi8(RAMEND-0x20)
out SPH, R18
ret

There are several issues here.
First, the comment by Sir Jo Black is right: there is no reason to
subtract 0x20 from RAMEND. I mean, unless you want to set aside 32
bytes at the very end of the RAM...
Second, there is no point in setting the stack pointer yourself. On most
recent AVRs, including the ATmega328P, SP is automatically initialized
by the hardware to RAMEND; see the datasheet. If that weren't
enough, it is initialized again by the C runtime, which is normally
linked into your program (even a 100% assembly program) if you compile
it with gcc.
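You can check this from C without any assembly of your own. A minimal sketch, assuming an ATmega328P and reusing the LED on PB5 from the question:
#include <avr/io.h>

int main(void)
{
    uint16_t sp = SP;          /* stack pointer, already set up by reset + startup code */
    DDRB |= 0x20;
    /* A few bytes of stack are already in use by the time main() runs
       (the call to main() itself), so expect a value at or just below RAMEND. */
    if (sp > RAMEND - 16)
        PORTB |= 0x20;         /* light the LED if SP looks sane */
    while (1)
    {
    }
}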
Third, from the avr-libc documentation:
For more backwards compatibility, insert the following at the start of
your old assembler source file:
#define __SFR_OFFSET 0
This automatically subtracts 0x20 from I/O space addresses, but it's a
hack, so it is recommended to change your source: wrap such addresses
in macros defined here, as shown below. After this is done, the
__SFR_OFFSET definition is no longer necessary and can be removed.
The recommended way to write that code is then:
setStackPointer:
ldi r18, lo8(RAMEND)
out _SFR_IO_ADDR(SPL), r18
ldi r18, hi8(RAMEND)
out _SFR_IO_ADDR(SPH), r18
ret
If you really want to use the old hack, write
#define __SFR_OFFSET 0
at the beginning of your program. And pay attention to the double
underscore at the beginning of the macro name.

Related

Incorrect relative call address for 32/16-bit bootloader compiled using gcc/ld for x86

This question is similar to Incorrect relative address using GNU LD w/ 16-bit x86, but I could not solve it by building a cross-compiler.
The scenario: I have a second-stage bootloader that starts in 16-bit mode and brings itself up to 32-bit mode. As a result, I have assembly and C with 16- and 32-bit code mixed together.
I have included an assembly file which defines a global that I will call from C, basically with the purpose of dropping back to real mode to perform BIOS interrupts from the protected-mode C environment on demand. So far, the function doesn't do anything except get called, push and pop some registers, and return:
[bits 32]
BIOS_Interrupt:
PUSHF
...
ret
global BIOS_Interrupt
This is included in my main bootloader.asm file, which is loaded by the stage-1 MBR.
In C, I have defined:
extern void BIOS_Interrupt(uint32_t intno, uint32_t ax, uint32_t bx, uint32_t cx, uint32_t dx, uint32_t es, uint32_t ds, uint32_t di, uint32_t si);
in a header, and
BIOS_Interrupt(0x15,0,0,0,0,0,0,0,0);
in code, just to test the call.
I can see in the resultant disassembled linked binary that the call is invariably set 2 bytes too low in RAM:
00000132 0100 add [bx+si],ax
00000134 009CFA9D add [si-0x6206],bl
00000138 6650 push eax
0000013A 6653 push ebx
...
00001625 6A00 push byte +0x0
00001627 6A00 push byte +0x0
00001629 6A00 push byte +0x0
0000162B 6A00 push byte +0x0
0000162D 6A00 push byte +0x0
0000162F 6A00 push byte +0x0
00001631 6A00 push byte +0x0
00001633 6A00 push byte +0x0
00001635 6A15 push byte +0x15
00001637 E8F9EA call 0x133
The instruction at 0x135 should be the first instruction reached (0x9C = PUSHF), but the call targets an address 2 bytes lower, 0x133, causing runtime errors.
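For reference, the 0x133 target follows directly from the rel16 encoding of the call at 0x1637 (bytes E8 F9 EA in the listing above); a quick check in C:
#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint16_t next_ip = 0x1637 + 3;          /* E8 + 2-byte displacement */
    int16_t  rel16   = (int16_t)0xEAF9;     /* little-endian bytes F9 EA */
    printf("call target = 0x%04X\n", (uint16_t)(next_ip + rel16)); /* prints 0x0133 */
    return 0;
}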
I have noticed that by using the NASM align directive, the extra NOPs that are generated do compensate for the incorrect relative address.
Is this an issue with the linking process? I have LD running with -melf_i386, NASM with -f elf, and GCC with -m32 -ffreestanding -Os -fno-pie -nostdlib.
Edit: images added for @MichaelPetch. Code is loaded at 0x9000 by the MBR. Interestingly, the call shows a correct relative jump, to 0x135, but the executing disassembly at 0x135 looks like the code at 0x133 (0x00, 0x00).
Bochs about to call BIOS_Interrupt
Bochs at call start
Edit 2: correction to image 2 after refreshing the memdump after the call.
Memdump and disassembly after calling BIOS_Interrupt (call 0x135)
Thanks again to @MichaelPetch for giving me a few pointers.
I don't think there is an issue with the linker; rather, the disassembly was "tricking" me, in that the mix of 16- and 32-bit code led to an inaccurate listing.
In the end, it was due to memory values being overwritten by a prior operation. In the code immediately before the BIOS_Interrupt label, I had defined a dword, dd IDT_REAL, intended to store the IDT pointer for real-mode processing. However, I did not realise (or forgot) that the SIDT/LIDT instructions operate on 6 bytes of data, so when I called SIDT it overwrote the first 2 bytes at the label's location in RAM, resulting in runtime errors. After increasing the variable from a dword to a qword, it runs just fine without error.
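To make the size issue concrete, the descriptor SIDT stores (and LIDT loads) in 16/32-bit code is 6 bytes, a 16-bit limit followed by a 32-bit base; a sketch of the equivalent C layout:
#include <stdint.h>

/* Layout of the 6-byte operand written by SIDT / read by LIDT,
   packed so that sizeof(struct idt_descriptor) == 6. */
struct __attribute__((packed)) idt_descriptor {
    uint16_t limit;
    uint32_t base;
};

/* A dword (4 bytes) is too small to hold it; the extra 2 bytes spill
   over into whatever follows, which is exactly what happened here. */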
The linker/compiler suggestion turned out to be a red herring that I fell for courtesy of objdump. However, I've at least learned from this the benefits of Bochs and of double-checking code before jumping to conclusions!

How does Windows 10 task manager detect a virtual machine?

The Windows 10 task manager (taskmgr.exe) knows if it is running on a physical or virtual machine.
If you look in the Performance tab you'll notice that the number of processors label either reads Logical processors: or Virtual processors:.
In addition, if running inside a virtual machine, there is also the label Virtual machine: Yes.
See the following two screenshots:
My question: is there a documented API call that taskmgr uses to make this kind of detection?
I had a very short look at the disassembly and it seems that the detection code is somehow related to GetLogicalProcessorInformationEx and/or IsProcessorFeaturePresent and/or NtQuerySystemInformation.
However, I don't see how (at least not without spending some more hours of analyzing the assembly code).
And: this question is IMO not related to other existing questions like How can I detect if my program is running inside a virtual machine?, since I did not see any code trying to compare SMBIOS table strings or CPU vendor strings against known strings typical for hypervisors ("qemu", "virtualbox", "vmware"). I'm not ruling out that a lower-level API implementation does that, but I don't see this kind of code in taskmgr.exe.
Update: I can also rule out that taskmgr.exe is using the CPUID instruction (with EAX=1 and checking the hypervisor bit 31 in ECX) for the detection.
Update: A closer look at the disassembly showed that there is indeed a check for bit 31, just not done that obviously.
I'll answer this question myself below.
I've analyzed the x64 taskmgr.exe from Windows 10 1803 (OS Build 17134.165) by tracing back the writes to the memory location that is consulted at the point where the Virtual machine: Yes label is set.
The value of that variable comes from the return code of the function WdcMemoryMonitor::CheckVirtualStatus.
Here is the disassembly of the first use of the cpuid instruction in this function:
lea eax, [rdi+1] // results in eax set to 1
cpuid
mov dword ptr [rbp+var_2C], ebx // save CPUID feature bits for later use
test ecx, ecx
jns short loc_7FF61E3892DA // negative value check equals check for bit 31
...
return 1
loc_7FF61E3892DA:
// different feature detection code if hypervisor bit is not set
So taskmgr is not using hardware strings, MAC addresses or other sophisticated techniques; it simply checks whether the hypervisor bit (CPUID leaf 0x01, ECX bit 31) is set.
The result is of course not reliable, since e.g. adding -hypervisor to QEMU's -cpu parameter disables the hypervisor CPUID flag, which results in Task Manager no longer showing Virtual machine: yes.
And finally here is some example code (tested on Windows and Linux) that perfectly mimics Windows task manager's test:
#include <stdio.h>
#ifdef _WIN32
#include <intrin.h>
#else
#include <cpuid.h>
#endif

int isHypervisor(void)
{
#ifdef _WIN32
    int cpuinfo[4];
    __cpuid(cpuinfo, 1);
    if (cpuinfo[2] >> 31 & 1)
        return 1;
#else
    unsigned int eax, ebx, ecx, edx;
    __get_cpuid(1, &eax, &ebx, &ecx, &edx);
    if (ecx >> 31 & 1)
        return 1;
#endif
    return 0;
}

int main(int argc, char **argv)
{
    if (isHypervisor())
        printf("Virtual machine: yes\n");
    else
        printf("Virtual machine: no\n"); /* actually "maybe" */
    return 0;
}
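For contrast with the string-matching approach mentioned in the question, here is a sketch (not what taskmgr does, GCC/Clang only) of reading the hypervisor vendor string via CPUID leaf 0x40000000, which is where identifiers such as "KVMKVMKVM" or "VMwareVMware" come from:
#include <stdio.h>
#include <string.h>
#include <cpuid.h>

int main(void)
{
    unsigned int eax, ebx, ecx, edx;
    char vendor[13] = {0};

    /* Leaf 0x40000000 is only meaningful if the hypervisor bit is set. */
    __get_cpuid(1, &eax, &ebx, &ecx, &edx);
    if (!(ecx >> 31 & 1)) {
        printf("No hypervisor bit set\n");
        return 0;
    }

    /* EBX/ECX/EDX of leaf 0x40000000 hold the hypervisor vendor string. */
    __cpuid(0x40000000, eax, ebx, ecx, edx);
    memcpy(vendor + 0, &ebx, 4);
    memcpy(vendor + 4, &ecx, 4);
    memcpy(vendor + 8, &edx, 4);
    printf("Hypervisor vendor: %s\n", vendor);
    return 0;
}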

The difference between GCC inline assembly and VC

I am migrating VC inline assembly code to GCC inline assembly.
#ifdef _MSC_VER
// this is raw code.
__asm
{
cmp cx, 0x2e
mov dword ptr ds:[esi*4+0x57F0F0], edi
jmp BW::BWFXN_RefundMin4ReturnAddress
}
#else
// this is my code.
asm
(
"cmp $0x2e, %%cx\n\t"
"movl %%edi, $ds:0x57F0F0(0, %%esi, 4)\n\t"
"jmp %0"
: /* no output */
: "i"(BW::BWFXN_RefundGas3ReturnAddress)
: "cx"
);
#endif
But I get these errors:
Error:junk `:0x57F0F0(0,%esi,4)' after expression
/var/.../cckr7pkp.s:3034: Error:operand type mismatch for `mov'
/var/.../cckr7pkp.s:3035: Error:operand type mismatch for `jmp'
Referring to the address operand syntax,
segment:displacement(base register, offset register, scalar multiplier)
is equivalent to
segment:[base register + displacement + offset register * scalar multiplier]
in Intel syntax.
I don't know where the issue is.
This is highly unlikely to work just from getting the syntax correct, because you're depending on values in registers being set to something before the asm statement, and you aren't using any input operands to make that happen. (And for some reason you need to set flags with cmp before jumping?)
If that fragment worked on its own somehow in MSVC, then your code depends on the choices made by MSVC's optimizer (as far as which C value is in which register), which seems insane.
Anyway, the first answer to any inline asm question is https://gcc.gnu.org/wiki/DontUseInlineAsm if you can avoid it. Now might be a good time to rewrite your thing in C (maybe with some __builtin functions if needed).
You should use asm volatile and a "memory" clobber at the very least. The compiler assumes that execution continues after the asm statement, but at least this will make sure it stores everything to memory before the asm, i.e. it's a full memory barrier (against compile-time reordering). But any destructors at the end of the function (or in the callers) won't run, and no stack cleanup will happen; there's really no way to make this safe.
You might be able to use asm goto, but that might only work for labels within the same function.
As far as syntax goes, leave out %%ds: because it's the default segment anyway. (Everything after $ds was considered junk because $ds is the address of the symbol ds. Register names start with %.) Also, just leave out the base entirely, instead of using zero. Use
"movl %%edi, 0x57F0F0( ,%%esi, 4) \n\t"
You could have got a disassembler to tell you how to write that, by assembling the Intel version and disassembling in AT&T syntax.
You can probably implement that store in pure C easily enough, e.g. int32_t *p = (int32_t *)0x57F0F0; p[foo]=bar;.
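A rough sketch of that pure-C replacement (the names index and value are hypothetical stand-ins for whatever the surrounding code kept in ESI and EDI):
#include <stdint.h>

/* Pure-C equivalent of "movl %edi, 0x57F0F0(,%esi,4)". */
static inline void store_entry(uint32_t index, int32_t value)
{
    int32_t *table = (int32_t *)0x57F0F0;   /* fixed address from the original asm */
    table[index] = value;
}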
For the jmp operand, use %c0 to get the address with no $ so the compiler's asm output is jmp 0x12345 instead of jmp $0x12345. See also https://stackoverflow.com/tags/inline-assembly/info for more links to guides + docs.
You can and should look at the gcc -O2 -S output to see what the compiler is feeding to the assembler. i.e. how exactly it's filling in the asm template.
I tested this on Godbolt to make sure it compiles, and to see the asm output and the disassembly output:
void ext(void);
long foo(int a, int b) { return 0; }

static const unsigned my_addr = 0x000045678;

//__attribute__((noinline))
void testasm(void)
{
    asm volatile(        // and still not safe in general
        "movl %%edi, 0x57F0F0( ,%%esi, 4) \n\t"
        "jmp %c[foo] \n\t"
        "jmp foo \n\t"
        "jmp 0x12345 \n\t"
        "jmp %c[addr] "
        :   // no outputs
        :   // "S" (value_for_esi), "D" (value_for_edi)
            [foo] "i" (foo),
            [addr] "i" (my_addr)
        : "memory"       // prevents stores from sinking past this
    );
    // make sure gcc doesn't need to call any destructors here
    // or in our caller
    // because jumping away will mean they don't run
}
Notice that an "i" (foo) constraint and a %c[operand] (or %c0) will produce a jmp foo in the asm output, so you can emit a direct jmp by pretending you're using a function pointer.
This also works for absolute addresses. x86 machine code can't encode a direct jump to an arbitrary absolute address, but GAS syntax lets you write the jump target as an absolute numeric address. The linker will fill in the right rel32 offset to reach that absolute address from wherever the jmp ends up.
So your inline asm template just needs to produce jmp 0x12345 as input to the assembler to get a direct jump.
asm output for testasm:
movl %edi, 0x57F0F0( ,%esi, 4)
jmp foo #
jmp foo
jmp 0x12345
jmp 284280 # constant substituted by the compiler from a static const unsigned C variable
ret
disassembly output:
mov %edi,0x57f0f0(,%esi,4)
jmp 80483f0 <foo>
jmp 80483f0 <foo>
jmp 12345 <_init-0x8035f57>
jmp 45678 <_init-0x8002c24>
ret
Note that the jump targets decoded to absolute addresses in hex. (Godbolt doesn't give easy access to copy/paste the raw machine code, but you can see it on mouseover of the left column.)
This only works in position-dependent code (not PIC), otherwise absolute relocations aren't possible. Note that many recent Linux distros ship gcc set to use -pie by default to enable ASLR for 64-bit executables, so you may need -no-pie -fno-pie to make this work, or else ask for the address in a register (r constraint and jmp *%[addr]) to actually do an indirect jump to an absolute address instead of a relative jump.
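A minimal sketch of that register-indirect fallback (target is a hypothetical variable holding the absolute address; the same caveats as above apply about jumping away from compiler-generated code):
#include <stdint.h>

void jump_absolute(uintptr_t target)
{
    asm volatile("jmp *%[addr]"         /* indirect jump through a register */
                 : /* no outputs */
                 : [addr] "r" (target)
                 : "memory");
    __builtin_unreachable();            /* execution never falls through */
}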

What is ALIGN in arch/i386/kernel/head.S in the Linux source code?

In the head.S file in the Linux source code, at path arch/i386/kernel/head.S, ALIGN is used as shown in the code snippet below, after the ret instruction. My question is: what is this ALIGN? As far as I know it is neither an instruction nor an assembler directive, so what is it and why is it used here?
You can find the code of head.S at the site given below:
http://kneuro.net/cgi-bin/lxr/http/source/arch/i386/kernel/head.S?v=2.4.0
Path: arch/i386/kernel/head.S
/*
* We depend on ET to be correct. This checks for 287/387.
*/
check_x87:
movb $0,X86_HARD_MATH
clts
fninit
fstsw %ax
cmpb $0,%al
je 1f
movl %cr0,%eax
xorl $4,%eax
movl %eax,%cr0
ret
ALIGN /* why ALIGN is used and what it is? */
1: movb $1,X86_HARD_MATH
.byte 0xDB,0xE4
ret
Actually ALIGN is just a macro, defined in the include/linux/linkage.h file:
#ifdef __ASSEMBLY__
#define ALIGN __ALIGN
And the __ALIGN definition depends on the architecture. For x86 (in kernel 2.4) you have the following definition, in the same file:
#if defined(__i386__) && defined(CONFIG_X86_ALIGNMENT_16)
#define __ALIGN .align 16,0x90
#define __ALIGN_STR ".align 16,0x90"
#else
#define __ALIGN .align 4,0x90
#define __ALIGN_STR ".align 4,0x90"
#endif
So in the end the ALIGN macro is just the .align asm directive, giving either 4- or 16-byte alignment (depending on the CONFIG_X86_ALIGNMENT_16 option).
You can figure out your CONFIG_X86_ALIGNMENT_16 value from the arch/i386/config.in file. It actually depends on your processor family.
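The companion __ALIGN_STR is the same directive in string form, so the padding can also be emitted from C via inline asm. A small sketch (macro body copied from the x86 branch above):
#define __ALIGN_STR ".align 16,0x90"

/* Emitted at file scope, between functions: pads the text section up to a
   16-byte boundary, filling with 0x90 (NOP) bytes. */
asm(__ALIGN_STR);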
Another question is why such an alignment is needed at all. My understanding is as follows. Usually a CPU can only access aligned addresses on the bus (for a 32-bit bus the address should normally be aligned on 4 bytes, e.g. you can access addresses 0x0, 0x4, 0x8 and so on, but not 0x1 or 0x3, because that would be an unaligned access on the bus).
But I believe that is not the case here, and the alignment is done purely for performance reasons: it basically allows the CPU to fetch the code at the 1: label more quickly:
ALIGN
1: movb $1,X86_HARD_MATH
.byte 0xDB,0xE4
ret
So it seems like this ALIGN is just some minor optimization.
See also these topics:
[1] Why should code be aligned to even-address boundaries on x86?
[2] Performance optimisations of x86-64 assembly - Alignment and branch prediction

How does gcc know the register size to use in inline assembly?

I have the inline assembly code:
#define read_msr(index, buf) asm volatile ("rdmsr" : "=d"(buf[1]), "=a"(buf[0]) : "c"(index))
The code using this macro:
u32 buf[2];
read_msr(0x173, buf);
I found the disassembly to be (using the GNU toolchain):
mov eax,0x173
mov ecx,eax
rdmsr
mov DWORD PTR [rbp-0xc],edx
mov DWORD PTR [rbp-0x10],eax
The question is: 0x173 is less than 0xffff, so why does gcc not use mov cx, 0x173? Does gcc analyze the following rdmsr instruction? Does gcc always know the correct register size?
It depends on the size of the value or variable passed.
If you pass a "short int" it will set "cx" and read the data from "ax" and "dx" (if buf is a short int, too).
For char it would access "cl" and so on.
So "c" refers to the "ecx" register, but this is accessed with "ecx", "cx", or "cl" depending on the size of the access, which I think makes sense.
To test this, you can try passing (unsigned short)0x173; it should change the generated code.
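A sketch of that experiment (compile-only: rdmsr needs ring 0 to actually execute, and 0x173 is just the MSR index from the question):
#include <stdint.h>

#define read_msr(index, buf) \
    asm volatile ("rdmsr" : "=d"((buf)[1]), "=a"((buf)[0]) : "c"(index))

void test32(uint32_t buf[2]) { read_msr(0x173, buf); }                 /* loads ecx */
void test16(uint32_t buf[2]) { read_msr((unsigned short)0x173, buf); } /* may load cx instead */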
There is no analysis of the inline assembly (in fact, after text substitution it is copied directly to the output assembly, syntax errors included). There is also no default register size that depends on whether you have a 32- or 64-bit target; that would be far too limiting.
I think the answer is that the current default data size is 32-bit. In 64-bit long mode, the default data size is also 32-bit, unless you use a REX.W prefix.
Intel specifies the RDMSR instruction as using (all of) ECX to determine the model-specific register. That being the case, and apparently as specified by your macro, GCC has every reason to load your constant into the full ECX.
So the question about why it doesn't load CX seems completely inappropriate. It looks like GCC is generating the right code.
(You didn't ask why it stages the load of ECX inefficiently by using EAX; I don't know the answer to that).
