Debugging 0xc000001d exception with WinDbg - windows

I'm trying to find a root cause of the "Illegal instruction" exception (0xc000001d) with WinDbg.
The project was built with VC++2015. I've got two memory dumps from two test runs.
For now I found the following that is true for both dumps:
the exception points to the "movq mmword ptr [ecx], xmm0" instruction
xmm0 contains zeros
the exception occurs in an object constructor
the address is inside DS
the address belongs to a heap entry which looks valid
the address points to the object is being constructed, so it seems like it tries to put zero to the obj.m_data member that looks valid too
I have no idea where to go further, so I'd appreciate any directions.
UPD:
...
movq xmm0,mmword ptr [esi]
lea ecx,[edi+94h]
movq mmword ptr [ecx],xmm0 ; << this causes the exception

Illegal instruction is raised when the operating system handles a fault from the CPU where it has failed to decode an instruction. This can occur if an instruction extension is not supported by the CPU or the operating system.
msdn : illegal instruction AVX. In this case the bug in msvc 2013 occurred when the CPU supported AVX, but the operating system did not.
The CPUs which are failing don't appear to support SSE2, which is a likely cause for this issue.
In the case I came across the AVX issue, when using a tool to identify if AVX was used, there was a CPU test which decided that the AVX was not supported by the tool (supplied by Intel).
I am not aware of a tool by AMD, and would be wary of such a tool working, as it may be that it is the operating system support which is missing.
Update
Why does an instruction fail if the operating system does not support it? An example of this is the AVX instructions, which from wikipedia : AVX states.
AVX adds new register-state through the 256-bit wide YMM register file, so explicit operating system support is required to properly save and restore AVX's expanded registers between context switches.
Any change to the work or memory needed by the operating system, probably requires explicit opt-in. In the case of AVX, the extra registers changed the amount of data stored for a context switch.

Related

How does the tct command work under the hood?

The windbg command tct executes a program until it reaches a call instruction or a ret instruction. I am wondering how the debugger implements this functionality under the hood.
I could imagine that the debugger scans the instructions from the current instructions for the next call or ret and sets according breakpoints on the found instructions. However, I think this is unlikely because it would also have to take into account jmp instructions so that there are an arbitrary number of possible call or ret instructions where such a breakpoint would have to be set.
On the other hand, I wonder if the x86/x64 CPU provides a functionality that raises an exception to be caught by the debugger whenever the CPU is about to process a call or ret instruction. Yet, I have not heard of such a functionality.
I'd guess that it single-steps repeatedly, until the next instruction is a call or ret, instead of trying to figure out where to set a breakpoint. (Which in the general case could be as hard as solving the Halting Problem.)
It's possible it could optimize that by scanning forward over "straight line" code and setting a breakpoint on the next jmp/jcc/loop or other control-transfer instruction (e.g. xabort), and also catching signals/exceptions that could transfer control to an SEH handler.
I'm also not aware of any HW support for breaking on a certain type of instruction or opcode: the x86 debug registers DR0..7 allow hardware breakpoints at code addresses without rewriting machine code to int3, and also hardware watchpoints (to trap data load/store to a specific address or range of addresses). But not filtering by opcode.

What does write_cr0(read_cr0() | 0x10000) do?

I searched the web a lot but didn't find a short explanation about what write_cr0(read_cr0() | 0x10000) really do. It is related to the Linux kernel and I curios about developing LKM's. I want to know what this really do and what are the security issues with this.
It used to remove the write protection on the syscall table.
But how it is really works? and what does each thing in this line?
CR0 is one of the control registers available on x86 CPUs, which contains flags controlling CPU features related to memory protection, multitasking, paging, etc. You can find a full description in Volume 3, Section 2.5 of Intel's Software Developer's Manual.
These registers are accessed by special instructions that the compiler doesn't normally generate, so read_cr0() is a function which executes the instruction to read this register (via inline assembly) and returns the result in a general-purpose register. Likewise, write_cr0() writes to this register.
The function calls are likely to be inlined, so that the generated code would be something like
mov eax, cr0
or eax, 0x10000
mov cr0, eax
The OR with 0x10000 sets bit 16, the Write Protect bit. On early 32-bit x86 CPUs, code running at supervisor level (like the kernel) was always allowed to write all of virtual memory, regardless of whether the page was marked read-only. This bit makes that optional, so that when it is set, such accesses will cause page faults. This line of code probably follows an earlier line which temporarily cleared the bit.

Insert an undefined instruction in X86 code to be detected by Intel PIN

I'm using a PIN based simulator to test some new architectural modifications. I need to test a "new" instruction with two operands (a register and a memory location) using my simulator.
Since it's tedious to use GCC Machine description to add only one instructions it seemed logical to use NOPs or Undefined Instructions. PIN would easily be able to detect a NOP instruction using INS_IsNop, but it would interfere with NOPs added naturally to the code, It also has either no operands or a single memory operand.
The only option left is to use and undefined instruction. undefined instructions would never interfere with the rest of the code, and can be detected by PIN using INS_IsInvalid.
The problem is I don't know how to add an undefined instruction (with operands) using GCC inline assembly. How do I do that?
So it turns out that x86 has an explicit "unknown instruction" (see this). gcc can produce this by simply using:
asm("ud2");
As for an undefined instruction with operands, I'm not sure what that would mean. Once you have an undefined opcode, the additional bytes are all undefined.
But maybe you can get what you want with something like:
asm(".byte 0x0f, 0x0b");
Try using a prefix that doesn't normally apply to an instruction. e.g.
rep add eax, [rsi + rax*4 - 15]
will assemble just fine. Some instruction set extensions are done this way. e.g. lzcnt is encoded as rep bsf, so it executes as bsf on older CPUs, rather than generating an illegal instruction exception. (Prefixes that don't apply are ignored, as required by the x86 ISA.)
This will let you take advantage of the assembler's ability to encode instruction operands, which as David Wohlferd notes in his answer, is a problem if you use ud2.

Can I dump/modify the content of x86 CPU cache/TLB

any apps or the system kernel can access or even modify the content of CPU cahce and/or TLB?
I found a short description about the CPU cache from this webiste:
"No programming language has direct access to CPU cache. Reading and writing the cache is something done automatically by the hardware; there's NO way to write instructions which treat the cache as any kind of separate entity. Reads and writes to the cache happen as side-effect to all instructions that touch memory."
From this message, it seems there is no way to read/write the content of CPU cahce/TLB.
However, I also got another information that conflicts with the above one. That information implies that a debug tool may be able to dump/show the content of CPU cache.
Currently I'm confused. so please help me.
I got some answers from another post: dump the contents of TLB buffer of x86 CPU. Thanks adamdunson.
People could read this document about test registers, but it is only available on very old x86 machines test registers
Another descriptions from wiki https://en.wikipedia.org/wiki/Test_register:
A test register, in the Intel 80486 processor, was a register used by
the processor, usually to do a self-test. Most of these registers were
undocumented, and used by specialized software. The test registers
were named TR3 to TR7. Regular programs don't usually require these
registers to work. With the Pentium, the test registers were replaced
by a variety of model-specific registers (MSRs).
Two test registers, TR6 and TR7, were provided for the purpose of
testing. TR6 was the test command register, and TR7 was the test data
register. These registers were accessed by variants of the MOV
instruction. A test register may either be the source operand or the
destination operand. The MOV instructions are defined in both
real-address mode and protected mode. The test registers are
privileged resources. In protected mode, the MOV instructions that
access them can only be executed at privilege level 0. An attempt to
read or write the test registers when executing at any other privilege
level causes a general protection exception. Also, those instructions
generate invalid opcode exception on any CPU newer than 80486.
In fact, I'm still expecting some similar functions on Intel i7 or i5. Unfortunately, I do not find any related document about that. If anyone has such information, please let me know.

inline assembly error: can't find a register in class 'GENERAL_REGS' while reloading 'asm'

I have an inline AT&T style assembly block, which works with XMM registers and there are no problems in Release configuration of my XCode project, however I've stumbled upon this strange error (which is supposedly a GCC bug) in Debug configuration... Can I fix it somehow? There is nothing special in assembly code, but I am using a lot of memory constraints (12 constraints), can this cause this problem?
Not a complete answer, sorry, but the comments section is too short for this ...
Can you post a sample asm("..." :::) line that demonstrates the problem ?
The use of XMM registers is not the issue, the error message indicates that GCC wanted to create code like, say:
movdqa (%rax),%xmm0
i.e. memory loads/stores through pointers held in general registers, and you specified more memory locations than available general-purpose regs (it's probably 12 in debug mode because because RBP, RSP are used for frame/stackpointer and likely RBX for the global offset table and RAX reserved for returns) without realizing register re-use potential.
You might be able to eek things out by doing something like:
void *all_mem_args_tbl[16] = { memarg1, memarg2, ... };
void *trashme;
asm ("movq (%0), %1\n\t"
"movdqa (%1), %xmm0\n\t"
"movq 8(%0), %1\n\t"
"movdqa (%1), %xmm1\n\t"
...
: "r"all_mem_args_tbl : "r"(trashme) : ...);
i.e. put all the mem locations into a table that you pass as operand, and then manage the actual general-purpose register use on your own. It might be two pointer accesses through the indirection table, but whether that makes a difference is hard to say without knowing your complete assembler code piece.
The Debug configuration uses -O0 by default. Since this flag disables optimisations, the compiler is probably not being able to allocate registers given the constraints specified by your inline assembly code, resulting in register starvation.
One solution is to specify a different optimisation level, e.g. -Os, which is the one used by default in the Release configuration.

Resources