For a stack address I have the following PDE / PTE info from WinDbg:
kd> !pte 6EFFC
VA 0006effc
PDE at C0600000 PTE at C0000370
contains 0000000065D39867 contains 0000000000000020
pfn 65d39 ---DA--UWEV not valid
DemandZero
Protect: 1 - Readonly
How does WinDbg find out about the read-only state even though the PTE is not valid, and how can it be changed? Does it have to be done via the VAD?
If the 'valid' bit of the PTE is not set (which is the case in your example), then the PTE is handled by the operating system, not by the MMU.
In this case your PTE is a software PTE (an _MMPTE_SOFTWARE structure, as opposed to _MMPTE_HARDWARE; you can 'dt' both structures in WinDbg), which can take one of 4 software PTE forms, depending on the bits set in the bitfield.
If bits 12 to 31 are all zero, then this is a "Demand Zero" PTE (thus, not resolved via the VAD). Bits 5 to 9 indicate the page protection (0x20 = bit 5 set = protection value 1 = Read Only).
Protection bits are not officially documented, although you can find their values on some pages on the net. Taken from this ReactOS page:
#define MM_ZERO_ACCESS 0 // this value is not used.
#define MM_READONLY 1
#define MM_EXECUTE 2
#define MM_EXECUTE_READ 3
#define MM_READWRITE 4 // bit 2 is set if this is writable.
#define MM_WRITECOPY 5
#define MM_EXECUTE_READWRITE 6
#define MM_EXECUTE_WRITECOPY 7
#define MM_NOCACHE 8
#define MM_DECOMMIT 0x10
#define MM_NOACCESS MM_DECOMMIT|MM_NOCACHE
(Note: remember to left-shift the above constants by 5, since the protection bits start at bit 5.)
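For illustration, here's a minimal user-space C sketch that decodes the protection field of such a software PTE under the layout described above (bits 5-9 = protection); the macro names here are made up for the example, not taken from Windows headers:

#include <stdio.h>
#include <stdint.h>

#define MM_PROTECT_SHIFT 5      /* protection bits start at bit 5 */
#define MM_PROTECT_MASK  0x1F   /* 5-bit protection field (bits 5-9) */

int main(void)
{
    uint64_t pte = 0x20;        /* the software PTE from the question */
    unsigned prot = (pte >> MM_PROTECT_SHIFT) & MM_PROTECT_MASK;

    printf("protection = %u (1 == MM_READONLY)\n", prot);
    return 0;
}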
See this blog post "Windows Virtual Address Translation and the Pagefile" (especially the part discussing Software PTEs) for a very good explanation about the various PTEs.
While executing in kernel mode, is there any way to get the user-space CR3 value when Page Table Isolation (PTI) is enabled?
In current Linux, see arch/x86/entry/calling.h for the asm .macro SWITCH_TO_USER_CR3_NOSTACK and the related macros to see how Linux flips between the kernel and user CR3. Also see the earlier comment on the constants it uses:
/*
* PAGE_TABLE_ISOLATION PGDs are 8k. Flip bit 12 to switch between the two
* halves:
*/
#define PTI_USER_PGTABLE_BIT PAGE_SHIFT
#define PTI_USER_PGTABLE_MASK (1 << PTI_USER_PGTABLE_BIT)
#define PTI_USER_PCID_BIT X86_CR3_PTI_PCID_USER_BIT
#define PTI_USER_PCID_MASK (1 << PTI_USER_PCID_BIT)
#define PTI_USER_PGTABLE_AND_PCID_MASK (PTI_USER_PCID_MASK | PTI_USER_PGTABLE_MASK)
It looks like the kernel CR3 is always the lower one, so setting bit 12 in the current CR3 always makes it point to the user-space page directory. (If the current task has a user-space, and if PTI is enabled. These asm macros are only used in code-paths that are about to return to user-space.)
.macro SWITCH_TO_USER_CR3_NOSTACK scratch_reg:req scratch_reg2:req
...
mov %cr3, \scratch_reg
...
.Lwrcr3_\#:
/* Flip the PGD to the user version */
orq $(PTI_USER_PGTABLE_MASK), \scratch_reg
mov \scratch_reg, %cr3
These macros are used in entry_64.S, entry_64_compat.S, and entry_32.S in paths that return to user-space.
There's presumably a cleaner way to access user-space page tables from C.
Your best bet might be to look at the page-fault handler to find out how it accesses the process's page table. (Or mmap's implementation of MAP_POPULATE).
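For experimentation, here's a minimal C sketch of the same bit flip those macros perform, assuming PAGE_SHIFT == 12 and ignoring the PCID bits; user_cr3_from_kernel is a made-up helper name, and this only illustrates the mask arithmetic rather than being a real kernel API:

#include <stdint.h>

#define PTI_USER_PGTABLE_BIT  12                        /* PAGE_SHIFT */
#define PTI_USER_PGTABLE_MASK (1UL << PTI_USER_PGTABLE_BIT)

/* Derive the user-space CR3 from the kernel CR3 by setting bit 12,
 * mirroring the orq in SWITCH_TO_USER_CR3_NOSTACK (PCID handling omitted). */
static inline uint64_t user_cr3_from_kernel(uint64_t kernel_cr3)
{
    return kernel_cr3 | PTI_USER_PGTABLE_MASK;
}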
I need to specify user configuration settings for the JTAGenum JTAG enumeration utility.
In particular, in this part:
// Target specific, check your documentation or guess
#define SCAN_LEN 1890 // used for IR enum. bigger the better
#define IR_LEN 5
// IR registers must be IR_LEN wide:
#define IR_IDCODE "01100" // always 011
#define IR_SAMPLE "10100" // always 101
#define IR_PRELOAD IR_SAMPLE
The user manual notes that IR_LEN defines the length of the JTAG instruction register: "If you change this you should also add ‘0’s to each of the corresponding IR_** instruction definitions."
The JTAG Instruction Register (IR) length for my target CPU is 4 bits, so I set IR_LEN = 4.
It's not clear whether I should also change the #define IR_IDCODE and #define IR_PRELOAD values, and where the mentioned ‘0’s should be added to the corresponding IR_** instruction definitions.
Your starting point has a register length of 5 and shows five-bit codes.
In keeping with the instruction you quoted, it would appear that when adapting this for a device with a 4-bit register, you should shorten these codes to four bits by removing the final zero from each, i.e.:
#define IR_LEN 4
// IR registers must be IR_LEN wide:
#define IR_IDCODE "0110" // always 011
#define IR_SAMPLE "1010" // always 101
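Putting it together, a plausible adapted block under those assumptions looks like this; note that IR_PRELOAD simply aliases IR_SAMPLE, so it needs no separate change:

// Target specific, check your documentation or guess
#define SCAN_LEN 1890 // used for IR enum. bigger the better
#define IR_LEN 4
// IR registers must be IR_LEN wide:
#define IR_IDCODE "0110" // always 011
#define IR_SAMPLE "1010" // always 101
#define IR_PRELOAD IR_SAMPLE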
I'm working on an x86 system with Linux 3.6.0. For some experiments, I need to know how an IRQ is mapped to a vector. I've learned from many books that vectors 0x00 to 0x1F are for traps and exceptions, and vectors from 0x20 onward are for external device interrupts. This is also defined in the source code, Linux/arch/x86/include/asm/irq_vectors.h.
However, what puzzles me is that when I check the do_IRQ function,
http://lxr.linux.no/linux+v3.6/arch/x86/kernel/irq.c#L181
I found that the IRQ is fetched by looking up the vector_irq array:
unsigned int __irq_entry do_IRQ(struct pt_regs *regs)
{
        struct pt_regs *old_regs = set_irq_regs(regs);
        /* high bit used in ret_from_ code */
        unsigned vector = ~regs->orig_ax;
        unsigned irq;
        ...
        irq = __this_cpu_read(vector_irq[vector]); /* get the IRQ from vector_irq */
        /* instrumentation: print the vector -> irq mapping */
        printk(KERN_INFO "CPU-ID:%d, vector: 0x%x - irq: %d\n", smp_processor_id(), vector, irq);
        ...
}
By instrumenting the code with printk, the vector-to-IRQ mapping I got is shown below, and I don't have any clue why this is the mapping. I thought the mapping should be (irq + 0x20 = vector), but that doesn't seem to be the case.
From Linux/arch/x86/include/asm/irq_vectors.h:
* Vectors 0 ... 31 : system traps and exceptions - hardcoded events
* Vectors 32 ... 127 : device interrupts = 0x20 – 0x7F
But my output is:
CPU-ID=0.Vector=0x56 (irq=58)
CPU-ID=0.Vector=0x66 (irq=59)
CPU-ID=0.Vector=0x76 (irq=60)
CPU-ID=0.Vector=0x86 (irq=61)
CPU-ID=0.Vector=0x96 (irq=62)
CPU-ID=0.Vector=0xa6 (irq=63)
CPU-ID=0.Vector=0xb6 (irq=64)
BTW, these IRQs belong to my 10Gb Ethernet cards with MSI-X enabled. Could anyone give me some ideas about why this is the mapping, and what the rules are for making this mapping?
Thanks.
William
The irq number (which is what you use in software) is not the same as the vector number (which is what the interrupt controller actually uses).
The x86 I/O APIC interrupt controller assigns interrupt priorities in groups of 16, so the vector numbers are spaced out to prevent them from interfering with each other; that is also why your consecutive MSI-X IRQs get vectors 0x10 apart (0x56, 0x66, 0x76, ...).
(See the function __assign_irq_vector in arch/x86/kernel/apic/io_apic.c.)
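As a rough illustration of that spacing (a simplified user-space simulation, not the actual kernel code; the constants and wrap condition are only indicative), the allocator effectively hands out vectors 16 apart and wraps to a new offset when it runs out of range:

#include <stdio.h>

#define FIRST_EXTERNAL_VECTOR 0x20
#define FIRST_SYSTEM_VECTOR   0xEF   /* illustrative upper bound */

int main(void)
{
    unsigned vector = FIRST_EXTERNAL_VECTOR;
    unsigned offset = 0;

    for (int irq = 0; irq < 16; irq++) {
        vector += 16;                          /* keep priority classes 16 apart */
        if (vector >= FIRST_SYSTEM_VECTOR) {   /* out of range: wrap with a new offset */
            offset = (offset + 1) % 16;
            vector = FIRST_EXTERNAL_VECTOR + offset;
        }
        printf("irq %2d -> vector 0x%02x\n", irq, vector);
    }
    return 0;
}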
I guess my question is how the vectors are assigned for a particular IRQ number, and what are the rules behind it.
The IOAPIC has a redirection-table register (an IOREDTBL entry) for each IRQ input. Software assigns the desired vector number for that IRQ input using bits 7-0 of this register, and it is this vector number that serves as an index into the processor's Interrupt Descriptor Table. Quoting the IOAPIC manual (82093AA):
7:0 Interrupt Vector (INTVEC)—R/W: The vector field is an 8 bit field
containing the interrupt vector for this interrupt. Vector values
range from 10h to FEh.
Note that these registers are not directly accessible to software. To access IOAPIC registers (not to be confused with Local APIC registers) software must use the IOREGSEL and IOWIN registers to indirectly interact with the IOAPIC. These registers are also described in the IOAPIC manual.
The source information for the IOAPIC can be a little tricky to dig up. Here's a link to the example I used:
IOAPIC data sheet link
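To make the indirect access concrete, here's a small, hedged C sketch of reading and writing an IOREDTBL entry through IOREGSEL/IOWIN. The register offsets follow the 82093AA datasheet, but ioapic_base (an already-mapped pointer to the IOAPIC's MMIO window) and the helper names are hypothetical:

#include <stdint.h>

#define IOREGSEL 0x00   /* index register: selects an IOAPIC register */
#define IOWIN    0x10   /* data window: reads/writes the selected register */

static uint32_t ioapic_read(volatile uint8_t *ioapic_base, uint8_t reg)
{
    *(volatile uint32_t *)(ioapic_base + IOREGSEL) = reg;
    return *(volatile uint32_t *)(ioapic_base + IOWIN);
}

static void ioapic_write(volatile uint8_t *ioapic_base, uint8_t reg, uint32_t val)
{
    *(volatile uint32_t *)(ioapic_base + IOREGSEL) = reg;
    *(volatile uint32_t *)(ioapic_base + IOWIN) = val;
}

/* The IOREDTBL entry for IRQ pin n occupies registers 0x10 + 2*n (low dword)
 * and 0x11 + 2*n (high dword); the interrupt vector is bits 7:0 of the low dword. */
static void ioapic_set_vector(volatile uint8_t *ioapic_base, unsigned pin, uint8_t vector)
{
    uint8_t reg = 0x10 + 2 * pin;
    uint32_t lo = ioapic_read(ioapic_base, reg);

    lo = (lo & ~0xFFu) | vector;
    ioapic_write(ioapic_base, reg, lo);
}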
set $eflags does not change the eflags value.
The old eflags value remains after e.g. set $eflag=0x243 (this is just an example input).
Alternatively, is there any way to set individual flags of eflags?
I'm looking for something like "set ZF" (zero flag). Is there a GDB command to do that?
set $eflags without parentheses works in GDB 7.7.1.
To set an individual flag, use its index. E.g., ZF is the 6th bit, so we can set it with:
set $ZF = 6 # define a GDB variable: no effect on registers
set $eflags |= (1 << $ZF) # set bit 6 in EFLAGS, the ZF bit.
or just directly:
set $eflags |= (1 << 6)
The same goes for all other bitwise operations: How do you set, clear, and toggle a single bit?
# Clear
set $eflags &= ~(1 << $ZF)
# Toggle
set $eflags ^= (1 << $ZF)
What causes confusion is that many bits are either reserved, cannot be modified directly by any instruction, or cannot be modified from user mode (see also: How to read and write x86 flags registers directly?), and so GDB does not touch them.
For example:
(gdb) set $eflags = 0
(gdb) i r eflags
eflags 0x202 [ IF ]
(gdb) set $eflags = 0xFFFFFFFF
(gdb) i r eflags
eflags 0x54fd7 [ CF PF AF ZF SF TF IF DF OF NT RF AC ]
0x202 in binary is:
0010 0000 0010
0x54fd7 in binary is:
0101 0100 1111 1101 0111
TODO: understand why each of those bits was set or not, by looking at the manual http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-vol-1-manual.pdf and the GDB source code.
Ones that I understand:
all reserved bits were left at their fixed value: 1 for bit 1, and 0 for bits 3, 5, 15 and 22-31
set ($eflags)=0x243
worked in my tests for any hex value.
It's wrong to set all flags in the eflags register: some bits are reserved and must be 0 (bits 3, 5, 15, and 22 and above), and bit 1 must be 1. There is also rflags, but its entire high dword is zero, so there is no need to use rflags instead of eflags for any flag-changing operation. I do know of people who use the free bits for their own purposes, though.
The rflags high dword is more suitable for that: on a 64-bit architecture there are enough free bits there, but on a 32-bit architecture there are not, so I strongly recommend using the high dword if you do this, because in future architectures some of the reserved eflags bits may be given a meaning.
That said, these flags were not touched in the change from 32-bit to 64-bit; it is perhaps the one register that may never change at all. All the plausible uses for flag bits are already covered, and I can't imagine a situation where some additional flag that hasn't been needed until now would suddenly be added. It would take some radical change in processor architecture, and I don't think anyone would decide on that, for the obvious reason that all existing software would have to be thrown out and rewritten from the very beginning, which is extremely hard and a huge amount of work.
eflags [ ZF ]
And if you want to set an arbitrary value, use this:
eflags 0x42
What is the maximum size that we can allocate using kzalloc() in a single call?
This is a very frequently asked question. Also, please let me know if I can verify that value.
The upper limit (the number of bytes that can be allocated in a single kmalloc / kzalloc request) is a function of:
the processor – really, the page size – and
the number of buddy-system freelists (MAX_ORDER).
On both x86 and ARM, with a standard page size of 4 KB and MAX_ORDER of 11, the kmalloc upper limit on a single call is 4 MB (2^(MAX_ORDER + PAGE_SHIFT - 1) = 2^22 bytes)!
Details, including explanations and code to test this, here:
http://kaiwantech.wordpress.com/2011/08/17/kmalloc-and-vmalloc-linux-kernel-memory-allocation-api-limits/
No different from kmalloc(). That's the question you should ask (or search for), because kzalloc() is just a thin wrapper that adds __GFP_ZERO.
Up to about PAGE_SIZE (at least 4k) is no problem :p. Beyond that... you're right to say lots of people have asked, it's definitely something you have to think about. Apparently it depends on the kernel version - there used to be a hard 128k limit, but it's been increased (or maybe dropped altogether) now. That's just the hard limit though; what you can actually get depends on the given system, and very definitely on the kernel version.
Maybe read What is the difference between vmalloc and kmalloc?
You can always "verify" the allocation by checking the return value from kzalloc(), but by then you've probably already logged an allocation failure backtrace. Other than that, no - I don't think there's a good way to check in advance.
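As a quick, hedged sketch of that check (the 4 MB size is arbitrary, try_big_alloc is a made-up helper, and __GFP_NOWARN is added so an oversized request fails quietly instead of logging a backtrace):

#include <linux/slab.h>
#include <linux/printk.h>
#include <linux/errno.h>

static int try_big_alloc(void)
{
        /* Arbitrary 4 MB request; check the return value explicitly. */
        void *buf = kzalloc(4 * 1024 * 1024, GFP_KERNEL | __GFP_NOWARN);

        if (!buf) {
                pr_err("kzalloc of 4 MB failed\n");
                return -ENOMEM;
        }
        kfree(buf);
        return 0;
}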
However, it depends on your kernel version and config. These limits are normally defined in linux/slab.h, usually described as below (this example is from Linux 2.6.32):
#define KMALLOC_SHIFT_HIGH ((MAX_ORDER + PAGE_SHIFT - 1) <= 25 ? \
(MAX_ORDER + PAGE_SHIFT - 1) : 25)
#define KMALLOC_MAX_SIZE (1UL << KMALLOC_SHIFT_HIGH)
#define KMALLOC_MAX_ORDER (KMALLOC_SHIFT_HIGH - PAGE_SHIFT)
And you can test them with code below:
#include <linux/module.h>
#include <linux/slab.h>

int init_module(void)
{
        printk(KERN_INFO "KMALLOC_SHIFT_LOW:%d, KMALLOC_SHIFT_HIGH:%d, KMALLOC_MIN_SIZE:%d, KMALLOC_MAX_SIZE:%lu\n",
               KMALLOC_SHIFT_LOW, KMALLOC_SHIFT_HIGH, KMALLOC_MIN_SIZE, KMALLOC_MAX_SIZE);
        return 0;
}

void cleanup_module(void)
{
}

MODULE_LICENSE("GPL");
Finally, the results under Linux 2.6.32 (32-bit) are: 3, 22, 8, 4194304, which means the minimum size is 8 bytes and the maximum size is 4 MB.
PS.
You can also check the actual size of memory allocated by kmalloc() by using ksize(), i.e.:
void *p = kmalloc(15, GFP_KERNEL);
printk(KERN_INFO "%zu\n", ksize(p)); /* this will print "16" under my kernel */