SYSENTER on Intel CPUs - winapi

So AFAIK the syscall instruction is AMD's equivalent of sysenter. So in theory one should only find a syscall instruction on AMD chips, right? Well, apparently that's not the case: while messing with ntdll.dll and its WOW64 counterpart, I found that the regular (64-bit) ntdll.dll uses syscall whereas the WOW64 ntdll.dll uses sysenter. Why is that?

All x86-64 CPUs support syscall in 64-bit mode; it's the only way to make 64-bit system calls.
32-bit code uses whatever the CPU supports that's faster than a plain int software interrupt.
Your info about only AMD supporting syscall is true only in 32-bit user-space mode (legacy and compat modes).
Intel's sysenter became the primary choice for 32-bit user-space; Intel won that fight for dominance. Also, apparently AMD's legacy-mode syscall is a nightmare for the kernel to deal with; 32-bit Linux kernels don't even enable it. 64-bit Linux kernels do allow syscall from 32-bit user-space (compat mode) on AMD CPUs that support that. (Some links to the relevant comments on kernel asm entry points in this answer.)
Note that AMD CPUs don't support sysenter in compat mode, only legacy mode, so under a 64-bit kernel apparently you have to use syscall in 32-bit user-space if you want to avoid the slow int 0x80 on AMD.
AMD designed AMD64 (which became x86-64), and defined a new (fairly good) behaviour for how syscall works in 64-bit mode which is different from how it works in 32-bit mode. (e.g. in 64-bit userspace it saves the old RFLAGS into R11, which doesn't exist in legacy mode and thus can't be what it does there.)
Intel adopted the 64-bit syscall as part of implementing their version of x86-64 in a way that's compatible with AMD's. (Modulo some implementation bugs, e.g. what happens if you attempt to sysret with a non-canonical RCX user-space return address; on Intel the fault is taken with privilege level = ring 0, but with RSP still the already-restored user-space stack => another thread can take over the kernel. So kernels can only use it safely if RCX is known safe.)
i.e. AMD's system call instruction won for x86-64 because they designed AMD64 while Intel was betting on IA-64 (Itanium); their syscall instruction became the only standard that anyone uses on x86-64 because there's no reason to use anything else. syscall is efficient and meets the needs of kernel devs.
Dispatching to pick an instruction that works on the current CPU is thus unnecessary.
https://reverseengineering.stackexchange.com/questions/16454/struggling-between-syscall-or-sysenter-windows explains more details.
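For the curious, here is a minimal sketch of the 64-bit syscall instruction in use (C with GCC inline asm). Windows system-call numbers are undocumented and change between builds, so the sketch uses the stable Linux x86-64 ABI purely to show the instruction itself, including the RCX/R11 clobbers that follow from the AMD64 design described above:

/* Minimal sketch: the 64-bit `syscall` instruction (Linux x86-64 ABI).
   Call number goes in RAX, arguments in RDI/RSI/RDX; the CPU itself
   saves the return RIP in RCX and RFLAGS in R11, so both are clobbered. */
#include <stddef.h>

static long raw_write(int fd, const void *buf, size_t len)
{
    long ret;
    __asm__ volatile ("syscall"
                      : "=a"(ret)                        /* result comes back in RAX */
                      : "a"(1L),                         /* __NR_write == 1 on x86-64 */
                        "D"((long)fd), "S"(buf), "d"(len)
                      : "rcx", "r11", "memory");
    return ret;
}

int main(void)
{
    raw_write(1, "hello via syscall\n", 18);
    return 0;
}

A 32-bit build of the same program could not take this path at all; it would go through sysenter (or a slow int) via its libc or ntdll, which is exactly the split the question observed.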

Related

Why is everything named win32?

In the Windows operating system, a lot of things use the number 32, especially in the name Win32. I see it in everything from system folders:
C:\Windows\System32\
to system files:
C:\Windows\System32\win32k.sys
to Windows app development:
Develop a Win32 Application
The association of the number 32 with computers makes me think of 32-bit processors, but if that's correct, why is there a need to explicitly mention 32-bit systems?
Googling around brought me to the Win32 API. This is, I presume, the main cause of its frequent use, but that doesn't change my question. The Windows operating system works perfectly fine on 64-bit systems.
Is Windows specialized for 32-bit systems?
Or is this just a historical thing (i.e. Windows and its API were developed before the 64-bit system emerged)?
Before Win32 there was Win16 (although perhaps not under that name), and running 32-bit code was a special feature with special requirements; in particular, your CPU had to be able to do so.
Intel 8086 was a 16-bit CPU with a 20-bit address space (see the sketch after this list).
Intel 80286 was a 16-bit CPU with a 24-bit address space.
Intel 80386 was fully 32-bit, in both registers and address space.
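A minimal sketch (plain C arithmetic, not anything the CPU executes) of how the 16-bit 8086 reached a 20-bit, 1 MB address space: every address is a 16-bit segment paired with a 16-bit offset, and the hardware computes segment * 16 + offset:

/* physical = segment * 16 + offset: two 16-bit values give a 20-bit address */
#include <stdio.h>
#include <stdint.h>

static uint32_t phys_8086(uint16_t seg, uint16_t off)
{
    return ((uint32_t)seg << 4) + off;
}

int main(void)
{
    printf("B800:0000 -> %05X\n", (unsigned)phys_8086(0xB800, 0x0000)); /* B8000: text-mode video RAM */
    printf("F000:FFF0 -> %05X\n", (unsigned)phys_8086(0xF000, 0xFFF0)); /* FFFF0: the 8086 reset vector */
    printf("FFFF:FFFF -> %05X\n", (unsigned)phys_8086(0xFFFF, 0xFFFF)); /* 10FFEF: wraps to 0FFEF on an 8086 */
    return 0;
}

The 80386's 32-bit registers and 32-bit flat address space are what the "32" in Win32 refers to.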

Restoring SIMD registers on Windows x64 during unwinding

On Windows x86-64, how are callee-saved SIMD registers restored during unwinding? If one was writing assembly code, how would one ensure that this happened properly?

How to monitor resources needed to set watchpoints in gdb?

On x86, GDB uses some special hardware resources (debug registers?) to set watchpoints. In some situations, when there are not enough of those resources, GDB will set the watchpoint, but it won't work.
Is there any way to programmatically monitor the availability of these resources on Linux? Maybe some info in procfs, or something. I need this info to choose a machine in a pool for debugging.
From GDB Internals:
"Since they depend on hardware resources, hardware breakpoints may be limited in number; when the user asks for more, gdb will start trying to set software breakpoints. (On some architectures, notably the 32-bit x86 platforms, gdb cannot always know whether there's enough hardware resources to insert all the hardware breakpoints and watchpoints. On those platforms, gdb prints an error message only when the program being debugged is continued.)"
"Too many different watchpoints requested. (On some architectures, this situation is impossible to detect until the debugged program is resumed.) Note that x86 debug registers are used both for hardware breakpoints and for watchpoints, so setting too many hardware breakpoints might cause watchpoint insertion to fail."
"The 32-bit Intel x86 processors feature special debug registers designed to facilitate debugging. gdb provides a generic library of functions that x86-based ports can use to implement support for watchpoints and hardware-assisted breakpoints."
"I need this info to choose a machine in a pool for debugging."
No, you don't. The x86 debug registers (there are four of them) are a per-process resource, not a per-machine resource [1]. You can have up to four hardware watchpoints for every process you are debugging. If someone else is debugging on the same machine, you are not going to interfere with each other.
[1] More precisely, the registers are multiplexed by the kernel, in the same way as e.g. the EAX register: every process on the system and the kernel itself uses EAX, and there is only a single EAX register on a (single-core) CPU, yet it all works fine through the magic of time-slicing.
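A hedged sketch of the "per-process resource" point (Linux x86/x86-64, assuming the glibc <sys/user.h> layout with its u_debugreg array): a tracer can read its tracee's own copy of DR7 with PTRACE_PEEKUSER, which is essentially what GDB does. The low eight bits of DR7 (the L0/G0 .. L3/G3 enable bits) show which of the four address registers DR0-DR3 are currently armed for that particular task:

#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/user.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    pid_t child = fork();
    if (child == 0) {                        /* tracee: let the parent trace us, then stop */
        ptrace(PTRACE_TRACEME, 0, NULL, NULL);
        raise(SIGSTOP);
        _exit(0);
    }

    waitpid(child, NULL, 0);                 /* wait until the child has stopped */

    errno = 0;
    long dr7 = ptrace(PTRACE_PEEKUSER, child,
                      (void *)offsetof(struct user, u_debugreg[7]), NULL);
    if (dr7 == -1 && errno != 0)
        perror("PTRACE_PEEKUSER");
    else
        printf("child DR7 = %#lx (enable bits L0..G3: %#lx)\n",
               (unsigned long)dr7, (unsigned long)dr7 & 0xffUL);

    ptrace(PTRACE_CONT, child, NULL, NULL);  /* let the child run to its _exit */
    waitpid(child, NULL, 0);
    return 0;
}

Here DR7 will normally read back as 0, since nothing has armed a watchpoint in this freshly forked child; the point is that the value belongs to the task, not to the machine, so separate debugging sessions on the same host don't compete for it.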

x86 virtual address length

When reading
the Intel® 64 and IA-32 Architectures Software Developer Manuals
and some relevant tutorials about protected mode,
I came across this question.
According to the manual and the blog
http://translate.google.com/translate?hl=en&sl=auto&tl=en&u=http%3A%2F%2Fwww.fh-zwickau.de%2Fdoc%2Fprmo%2Fpmtutor%2Ftext%2Fr_phys7.htm (translated by google)
the virtual address should be 16 + 32 bits, am I right?
So, what is the address we are given when programming in some low-level assembly language? Or, to put it simply, what is the address we see when we are debugging?
It's 32 bits, I assume.
Is the address used in programming or debugging a linear address?
Thanks very much.
In usermode on modern x86 systems like windows/*nix, the virtual addresses are either 64-bit (though some bits are currently unused) or 32-bit.
Virtual addresses on 32-bit x86 machines are 32-bit, usually with user space at addresses 0 - 0x7FFFFFFF and kernel addresses at 0x80000000 - 0xFFFFFFFF (there are exceptions to how the address space is split, of course). The page you posted a link to talks about 16-bit real mode, where addresses are 16-bit but only 1 MB of memory can actually be addressed, because segmentation (an additional register) is also used.
I am not sure why you speak about 16 + 32 bits - maybe you mixed up virtual addressing and segmentation. There are still 16-bit segment registers; however, segmentation is deprecated and on most operating systems it is not used. See the Intel manuals you mentioned for details on how this works.
The size of virtual addresses has nothing to do with low-level assembly; there you still usually write 32-bit apps (as long as you are not writing your own OS, which requires some real-mode code to boot on x86).
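A small sketch of what "the address" is in an ordinary flat-model user-space program: mainstream 32-bit (and 64-bit) OSes set the segment bases to 0, so the value a debugger shows you is the linear (virtual) address itself, 32 bits wide in a 32-bit build and 64 bits wide in an x86-64 build:

#include <stdio.h>

int global_var;

int main(void)
{
    int local_var;
    printf("pointer size       : %zu bits\n", 8 * sizeof(void *));
    printf("&global_var        : %p\n", (void *)&global_var);
    printf("&local_var (stack) : %p\n", (void *)&local_var);
    return 0;
}

Compiled as a 32-bit program this prints 32 and two 32-bit addresses; compiled for x86-64 it prints 64, with some of those 64 bits currently unused, as the other answer notes.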

16-bit Assembly on 64-bit Windows?

I decided to start learning assembly a while ago, and so I started with 16-bit assembly, using FASM.
However, I recently got a really new computer running Windows 7 64-bit, and now none of the compiled .COM files that the program assembles work any more. They give an error message saying that the .COM file is not compatible with 64-bit Windows.
32-bit assembly still works, but I'd rather start with 16 bits and work my way up...
Is it possible to run a 16-bit program on windows 7? Or is there a specific way to compile them? Or should I give up and skip to 32-bit instead?
The reason you can't use 16-bit assembly is because the 16-bit subsystem has been removed from all 64-bit versions of Windows.
The only way to remedy this is to install something like DOSBox, or a virtual machine package such as VirtualBox and then install FreeDOS into that. That way, you get true DOS anyway. (NTVDM is not true DOS.)
Personally, would I encourage writing 16-bit assembly for DOS? No. I'd use 32- or even 64-bit assembly, the reason being that each operating system defines its own set of system calls and calling conventions (its ABI). The ABI for 64-bit Linux applications is different from the 32-bit one, the same split exists on Windows, and the software interrupts you'd use under DOS mean something different again (see the sketch after this answer).
Also, you've got all sorts of things to consider with 16-bit assembly, like the memory model in use. A DOS .COM program, for instance, gets a single 64K segment for its code, data, and stack combined; anything bigger means a .EXE and juggling segmented memory models, which makes you wonder how anything ever worked, really.
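To make the ABI point above concrete, here is a hedged sketch: the same operation (exiting with status 42) is requested in different ways on different platforms. The DOS variant appears only as a comment; the C file itself targets Linux, where the call numbers are public and stable.

/* DOS (16-bit .COM/.EXE):  mov ah, 0x4C  /  mov al, 42  /  int 0x21 */

int main(void)
{
#if defined(__i386__)
    /* 32-bit Linux ABI: software interrupt int 0x80, exit is syscall #1, status in EBX */
    __asm__ volatile ("int $0x80" :: "a"(1), "b"(42));
#elif defined(__x86_64__)
    /* 64-bit Linux ABI: the syscall instruction, exit is syscall #60, status in RDI */
    __asm__ volatile ("syscall" :: "a"(60), "D"(42) : "rcx", "r11");
#endif
    return 0;   /* never reached on x86 Linux; harmless elsewhere */
}

Built with gcc -m32 or plain gcc, echo $? prints 42 either way; the Windows equivalents go through kernel32/ntdll rather than numbers you hard-code yourself.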
My advice would be to just write 32-bit code. While it might initially seem like it would make sense to learn how to write 16-bit code, then "graduate" to 32-bit code, I'd say in reality rather the opposite is true: writing 32-bit code is actually easier because quite a few arbitrary architectural constraints (e.g., on what you can use as a base register) are basically gone in 32-bit code.
For that matter, I'd consider it open to substantial question whether there's ever a real reason to write 16-bit x86 code at all. For most practical purposes, it's a dead platform -- for desktop machines it's seriously obsolete, and for embedded machines, you're more likely to see things like ARMs or Microchip PICs. Unless you have a specific target in mind and know for sure that it's going to be a 16-bit x86, I'd probably forget that it existed, just like most of the rest of the world has.
32-bit Windows 7 and older include / enable NTVDM by default. On 32-bit Win8+, you can enable it in Windows Features.
On 64-bit Windows (or any other 64-bit OS), you need an emulator or full virtualization.
A kernel in long mode can't use vm86 mode to provide a virtual 8086 real-mode environment. This is a limitation of the AMD64 / x86-64 architecture.
With a 64-bit kernel running, the only way for your CPU to natively run in 16-bit mode is 16-bit protected mode (yes this exists; no, nobody uses it, and AFAIK mainstream OSes don't provide a way to use it). Or for the kernel to switch the CPU out of long mode back to legacy mode, but 64-bit kernels don't do that.
But actually, with hardware virtualization (VirtualBox, Hyper-V or whatever using Intel VT-x or AMD SVM), a 64-bit kernel can be the hypervisor for an entire virtual machine, whether that VM is running in 16-bit real mode or running a 32-bit OS (like Windows 98 or 2000) which can in turn use vm86 mode to run 16-bit real-mode executables.
Especially on a 64-bit kernel, it's usually easier to just emulate a 16-bit PC entirely (like DOSBox does), instead of using HW virtualization to run normal instructions natively while trapping direct hardware access (in / out, loads/stores to VGA memory, etc.) and the int instructions that make DOS system calls / BIOS calls / whatever.
