any apps or the system kernel can access or even modify the content of CPU cahce and/or TLB?
I found a short description about the CPU cache from this webiste:
"No programming language has direct access to CPU cache. Reading and writing the cache is something done automatically by the hardware; there's NO way to write instructions which treat the cache as any kind of separate entity. Reads and writes to the cache happen as side-effect to all instructions that touch memory."
From this message, it seems there is no way to read/write the content of CPU cahce/TLB.
However, I also got another information that conflicts with the above one. That information implies that a debug tool may be able to dump/show the content of CPU cache.
Currently I'm confused. so please help me.
I got some answers from another post: dump the contents of TLB buffer of x86 CPU. Thanks adamdunson.
People could read this document about test registers, but it is only available on very old x86 machines test registers
Another descriptions from wiki https://en.wikipedia.org/wiki/Test_register:
A test register, in the Intel 80486 processor, was a register used by
the processor, usually to do a self-test. Most of these registers were
undocumented, and used by specialized software. The test registers
were named TR3 to TR7. Regular programs don't usually require these
registers to work. With the Pentium, the test registers were replaced
by a variety of model-specific registers (MSRs).
Two test registers, TR6 and TR7, were provided for the purpose of
testing. TR6 was the test command register, and TR7 was the test data
register. These registers were accessed by variants of the MOV
instruction. A test register may either be the source operand or the
destination operand. The MOV instructions are defined in both
real-address mode and protected mode. The test registers are
privileged resources. In protected mode, the MOV instructions that
access them can only be executed at privilege level 0. An attempt to
read or write the test registers when executing at any other privilege
level causes a general protection exception. Also, those instructions
generate invalid opcode exception on any CPU newer than 80486.
In fact, I'm still expecting some similar functions on Intel i7 or i5. Unfortunately, I do not find any related document about that. If anyone has such information, please let me know.
Related
I searched the web a lot but didn't find a short explanation about what write_cr0(read_cr0() | 0x10000) really do. It is related to the Linux kernel and I curios about developing LKM's. I want to know what this really do and what are the security issues with this.
It used to remove the write protection on the syscall table.
But how it is really works? and what does each thing in this line?
CR0 is one of the control registers available on x86 CPUs, which contains flags controlling CPU features related to memory protection, multitasking, paging, etc. You can find a full description in Volume 3, Section 2.5 of Intel's Software Developer's Manual.
These registers are accessed by special instructions that the compiler doesn't normally generate, so read_cr0() is a function which executes the instruction to read this register (via inline assembly) and returns the result in a general-purpose register. Likewise, write_cr0() writes to this register.
The function calls are likely to be inlined, so that the generated code would be something like
mov eax, cr0
or eax, 0x10000
mov cr0, eax
The OR with 0x10000 sets bit 16, the Write Protect bit. On early 32-bit x86 CPUs, code running at supervisor level (like the kernel) was always allowed to write all of virtual memory, regardless of whether the page was marked read-only. This bit makes that optional, so that when it is set, such accesses will cause page faults. This line of code probably follows an earlier line which temporarily cleared the bit.
I have an Ubuntu (trying this on either 14.04 or 16.04) KVM host with an Intel E5-26xx v3 processor.
There is a certain flag that I need to have exposed to the guest VM, but QEMU/libvirt is not exposing that, even if I use the cpu mode='host-passthrough' in my VM libvirt XML definition. I believe this is due to what is defined in this file /usr/share/libvirt/cpu_map.xml in which the flag that I like to get exposed is not defined.
So, I'd like to be able to modify cpu_map.xml and manually add the CPU flag definition, but I'm not positive on how/where I can get the results of the CPUID function and whether they're in ebx/ecx etc. Any pointer is appreciated.
Disclaimer: I haven't meddled into CPU architecture, so my knowledge is very limited in this area.
Retrieving the results from the CPUID instruction is quite simple:
Check if the ID flag (bit 21) in the EFLAGS register is set, which indicates the availability of the CPUID instruction
Set the EAX and ECX registers to certain values
Call CPUID
Interpret the values of the EAX, EBX, ECX, and EDX registers
There are many sites convering the interpretation of the results. One of them is LowLevel. Many of them only cover a subset of possible results.
A thread on the specifics of CPUID in VMs extends this basic knowledge to:
UserCPUID is what will be visible to the guest Ring-3 code running natively when using binary translation. With binary translation, typically only Ring-0 (or IOPL-3) code is subject to binary translation. Most Ring-3 code runs natively (in a mode we refer to as "direct execution.")
Prior to the introduction of CPUID faulting, there was no way to intercept guest execution of the CPUID instruction when the guest was running under direct execution. Some CPUs support a limited ability to override the results of some CPUID leaves (on a register by register basis) even without intercepting the CPUID instruction. Hence, userCPUID is based on hostCPUID, but the registers that can be overridden have guestCPUID values.
The host-passthrough model will aim to expose every host CPU feature to the guest, but there are some exceptions to this rule. If the CPU feature is very new, then QEMU, KVM and libvirt may not be aware of its existence. KVM is conservative by default and so will not expose any feature it doesn't know about. In this case, merely editting cpu_map.xml is not going to help as that only tells libvirt about it - you'd still need QEMU & KVM to know about it which requires code changes. The second case is that some CPU features are not safe to expose to the guest, and so KVM will explicitly block them.
You can see what libvirt believes the host to have by using 'virsh capabilities'
I am using the following code to flush a cache line on a raspberry pi 2:
static inline void flush(void addr)
{
asm volatile("mcr p15, 0, %0, c7, c6, 1"::"r"(addr));
}
I am getting an error that this is a privileged instruction when I run this. Is this code correct? Is there any way to flush the cache line from user space on this machine? On x86 clflush works without any modification.
Is this code correct?
As a matter of fact, no. That's some bogus non-existent system register encoding - cache maintenance operations live in the c7 space, not c12.
What's more incorrect, though, is the assumption that you can do this. Prior to ARMv8, all cache maintenance operations can only be executed in privileged modes. From userspace, you'd need support from the OS to allow you to request it; Linux, for example, has an ARM-specific syscall which GCC provides an interface to via __clear_cache() - there might be some permission-related caveats, although I don't see any reference to VMA permissions in the current mainline kernel code, so maybe it was a quirk of older kernels.
Either way, the only cache maintenance concern which really applies to userspace code is coherency between the instruction and data caches, to cater for JITs or self-modifying code. Things like data cache coherency with main memory should never be relevant to userspace code (which would normally be calling into driver code within the OS in situations where such things did matter), and on many systems require separate outer cache maintenance which only the OS is in a position to manage anyway.
In Windows the high memory of every process (0x80000000 or 0xc0000000)
Is reserved for kernel code, user code cannot access these regions of memory, if it tries so an access violation exception will be thrown.
I wish to know how is the kernel space protected ?
Is it via memory segmentations or via paging ?
I would like to hear a technical explanation.
Thanks a lot,
Michael.
Assuming you are talking about x86 and x64 architectures.
Memory protection is achieved using the paging system. Each page table entry on an x86/x64 CPU has a bit to indicate whether it is a user or supervisor page. Accesses to supervisor pages are only permitted for code running with CPL<3, whereas accesses to non supervisor pages are possible regardless of CPL.
CPL is the "Current Privilege Level" which is sometimes referred to as Ring. Windows only uses two rings, although the CPU implements 4. Ring 0 is the CPU mode in which what Windows refers to as "kernel mode" runs. Ring 3 is the CPU mode in which "User mode" runs. Since code running at CPL=3 cannot access supervisor pages, this is how memory protection is implemented.
The answer for ARM is likely to be similar, but different.
That's an easy one and doesn't require talking about rings and kernel behavior. Accessing virtual memory at a particular address requires that address to be mapped, the operating system has to allocate a memory page for that address. The low-level winapi function that does that is VirtualAlloc(). Which takes an optional address, first argument. The OS will simply fail a request for an unmappable address. Otherwise the exact same mechanism that prevents you from mapping any address in the lowest 64KB of the address space.
Once Windows has loaded an executable in memory and transfert execution to the entry point, do values in registers and stack are meaningful? If so, where can I find more informations about it?
Officially, the registers at the entry point of PE file do not have defined values. You're supposed to use APIs, such as GetCommandLine to retrieve the information you need. However, since the kernel function that eventually transfers control to the entry point did not change much from the old days, some PE packers and malware started to rely on its peculiarities. The two more or less reliable registers are:
EAX points to the entry point of the application (because the kernel function uses call eax to jump to it)
EBX points to the Process Environment Block (PEB).
Chapter 5 of Windows Internals Fifth Edition covers the mechanism of Windows creating a process in detail. That would give you more information about Windows loading an executable in memory and transferring execution to the entry point.
I found this up-to-date reference that covers how registers are used in various calling conventions on various operating systems and by various compilers. It's quite detailed, and seems comprehensive:
Agner Fog's Calling Conventions document