Segment definitions for linux on x86 - linux-kernel

Linux 3.4.6 defines the following macros in arch/x86/include/asm/segment.h. Can anybody explain why the __USER macros add 3 to the defined constant and why this is not done for __KERNEL macros?
#define __KERNEL_CS (GDT_ENTRY_KERNEL_CS*8)
#define __KERNEL_DS (GDT_ENTRY_KERNEL_DS*8)
#define __USER_DS (GDT_ENTRY_DEFAULT_USER_DS*8+3)
#define __USER_CS (GDT_ENTRY_DEFAULT_USER_CS*8+3)

These four symbols represent segment descriptors. The two least-significant bits of these descriptors contain the privilege level associated with them, and the third least-significant bit contains the descriptor table type (GDT or LDT). This is made clearer by code occurring a little later:
/* User mode is privilege level 3 */
#define USER_RPL 0x3
/* LDT segment has TI set, GDT has it cleared */
#define SEGMENT_LDT 0x4
#define SEGMENT_GDT 0x0
/* Bottom two bits of selector give the ring privilege level */
#define SEGMENT_RPL_MASK 0x3
/* Bit 2 is table indicator (LDT/GDT) */
#define SEGMENT_TI_MASK 0x4
To achieve this, the descriptor table entry is multiplied by 8, which shifts it three bits to the left, and then ORed with the table type and privilege level (using addition):
/* GDT, ring 0 (kernel mode) */
#define __KERNEL_CS (GDT_ENTRY_KERNEL_CS*8)
/* GDT, ring 3 (user mode) */
#define __USER_CS (GDT_ENTRY_DEFAULT_USER_CS*8+3)

Related

Prefetching in vm_fault(), Linux drivers

I'm implementing a simple device driver. The program that uses this driver takes in arguments from the user whether to use demand paging or prefetching(fetches next page only). But when the user requests for prefetching is should send this information to the driver. The problem is vm_fault has a standard structure as follows:
int (*fault)(struct vm_area_struct *vma, struct vm_fault *vmf);
So how to incorporate this additional information of prefetching into these, so that I can use it to write a different routine for prefetching?
Or is there any other way to achieve this?
[EDIT]
To give a clearer picture:
This how a program takes input.
./user_prog [filename] --prefetch
The user_prog sets some flags in it, now how to send these flags information to dev.c(the driver file), as all the arguments to functions are fixed like above fault(). I hope this gives more clarification.
You can use the flags in mmap() to pass your custom flags too.
void *mmap(void *addr, size_t length, int prot, int flags,
int fd, off_t offset);
Make sure your custom flag values uses bits different from the flag values used by mmap(). From the manpage, the macros are defined in sys/mman.h. Find the exact values(may vary across systems) with echo '#include <sys/mman.h>' | gcc -E - -dM | grep MAP_*. My system has this:
#define MAP_32BIT 0x40
#define MAP_TYPE 0x0f
#define MAP_EXECUTABLE 0x01000
#define MAP_FAILED ((void *) -1)
#define MAP_PRIVATE 0x02
#define MAP_ANON MAP_ANONYMOUS
#define MAP_LOCKED 0x02000
#define MAP_STACK 0x20000
#define MAP_NORESERVE 0x04000
#define MAP_HUGE_SHIFT 26
#define MAP_POPULATE 0x08000
#define MAP_DENYWRITE 0x00800
#define MAP_FILE 0
#define MAP_SHARED 0x01
#define MAP_GROWSDOWN 0x00100
#define MAP_HUGE_MASK 0x3f
#define MAP_HUGETLB 0x40000
#define MAP_FIXED 0x10
#define MAP_ANONYMOUS 0x20
#define MAP_NONBLOCK 0x10000
Some non-clashing flags would be 0x200 and 0x400.

gcc with intel x86-32 bit assembly : accessing C function arguments

I am doing an operating system implementation work.
Here's the code first :
//generate software interrupt
void generate_interrupt(int n) {
asm("mov al, byte ptr [n]");
asm("mov byte ptr [genint+1], al");
asm("jmp genint");
asm("genint:");
asm("int 0");
}
I am compiling above code with -masm=intel option in gcc. Also,
this is not complete code to generate software interrupt.
My problem is I am getting error as n undefined, how do I resolve it, please help?
Also it promts error at link time not at compile time, below is an image
When you are using GCC, you must use GCC-style extended asm to access variables declared in C, even if you are using Intel assembly syntax. The ability to write C variable names directly into an assembly insert is a feature of MSVC, which GCC does not copy.
For constructs like this, it is also important to use a single assembly insert, not several in a row; GCC can and will rearrange assembly inserts relative to the surrounding code, including relative to other assembly inserts, unless you take specific steps to prevent it.
This particular construct should be written
void generate_interrupt(unsigned char n)
{
asm ("mov byte ptr [1f+1], %0\n\t"
"jmp 1f\n"
"1:\n\t"
"int 0"
: /* no outputs */ : "r" (n));
}
Note that I have removed the initial mov and any insistence on involving the A register, instead telling GCC to load n into any convenient register for me with the "r" input constraint. It is best to do as little as possible in an assembly insert, and to leave the choice of registers to the compiler as much as possible.
I have also changed the type of n to unsigned char to match the actual requirements of the INT instruction, and I am using the 1f local label syntax so that this works correctly if generate_interrupt is made an inline function.
Having said all that, I implore you to find an implementation strategy for your operating system that does not involve self-modifying code. Well, unless you plan to get a whole lot more use out of the self-modifications, anyway.
This isn't an answer to your specific question about passing parameters into inline assembly (see #zwol's answer). This addresses using self modifying code unnecessarily for this particular task.
Macro Method if Interrupt Numbers are Known at Compile-time
An alternative to using self modifying code is to create a C macro that generates the specific interrupt you want. One trick is you need to a macro that converts a number to a string. Stringize macros are quite common and documented in the GCC documentation.
You could create a macro GENERATE_INTERRUPT that looks like this:
#define STRINGIZE_INTERNAL(s) #s
#define STRINGIZE(s) STRINGIZE_INTERNAL(s)
#define GENERATE_INTERRUPT(n) asm ("int " STRINGIZE(n));
STRINGIZE will take a numeric value and convert it into a string. GENERATE_INTERRUPT simply takes the number, converts it to a string and appends it to the end of the of the INT instruction.
You use it like this:
GENERATE_INTERRUPT(0);
GENERATE_INTERRUPT(3);
GENERATE_INTERRUPT(255);
The generated instructions should look like:
int 0x0
int3
int 0xff
Jump Table Method if Interrupt Numbers are Known Only at Run-time
If you need to call interrupts only known at run-time then one can create a table of interrupt calls (using int instruction) followed by a ret. generate_interrupt would then simply retrieve the interrupt number off the stack, compute the position in the table where the specific int can be found and jmp to it.
In the following code I get GNU assembler to generate the table of 256 interrupt call each followed by a ret using the .rept directive. Each code fragment fits in 4 bytes. The result code generation and the generate_interrupt function could look like:
/* We use GNU assembly to create a table of interrupt calls followed by a ret
* using the .rept directive. 256 entries (0 to 255) are generated.
* generate_interrupt is a simple function that takes the interrupt number
* as a parameter, computes the offset in the interrupt table and jumps to it.
* The specific interrupted needed will be called followed by a RET to return
* back from the function */
extern void generate_interrupt(unsigned char int_no);
asm (".pushsection .text\n\t"
/* Generate the table of interrupt calls */
".align 4\n"
"int_jmp_table:\n\t"
"intno=0\n\t"
".rept 256\n\t"
"\tint intno\n\t"
"\tret\n\t"
"\t.align 4\n\t"
"\tintno=intno+1\n\t"
".endr\n\t"
/* generate_interrupt function */
".global generate_interrupt\n" /* Give this function global visibility */
"generate_interrupt:\n\t"
#ifdef __x86_64__
"movzx edi, dil\n\t" /* Zero extend int_no (in DIL) across RDI */
"lea rax, int_jmp_table[rip]\n\t" /* Get base of interrupt jmp table */
"lea rax, [rax+rdi*4]\n\t" /* Add table base to offset = jmp address */
"jmp rax\n\t" /* Do sepcified interrupt */
#else
"movzx eax, byte ptr 4[esp]\n\t" /* Get Zero extend int_no (arg1 on stack) */
"lea eax, int_jmp_table[eax*4]\n\t" /* Compute jump address */
"jmp eax\n\t" /* Do specified interrupt */
#endif
".popsection");
int main()
{
generate_interrupt (0);
generate_interrupt (3);
generate_interrupt (255);
}
If you were to look at the generated code in the object file you'd find the interrupt call table (int_jmp_table) looks similar to this:
00000000 <int_jmp_table>:
0: cd 00 int 0x0
2: c3 ret
3: 90 nop
4: cd 01 int 0x1
6: c3 ret
7: 90 nop
8: cd 02 int 0x2
a: c3 ret
b: 90 nop
c: cc int3
d: c3 ret
e: 66 90 xchg ax,ax
10: cd 04 int 0x4
12: c3 ret
13: 90 nop
...
[snip]
Because I used .align 4 each entry is padded out to 4 bytes. This makes the address calculation for the jmp easier.

st_mode of a symlink has weird value

I'm stat()'ing this symlink (on Kubuntu GNU/Linux 16.04), and am getting the weird value of 0100600 octal (33152 decimal). If I bitwise-and it with S_IFMT (which is 0170000 octal), I get 0600 octal. What does that mean? stat.h lists the following values:
/* File types. */
#define __S_IFDIR 0040000 /* Directory. */
#define __S_IFCHR 0020000 /* Character device. */
#define __S_IFBLK 0060000 /* Block device. */
#define __S_IFREG 0100000 /* Regular file. */
#define __S_IFIFO 0010000 /* FIFO. */
#define __S_IFLNK 0120000 /* Symbolic link. */
#define __S_IFSOCK 0140000 /* Socket. */
I'm expecting to see 0120000, not 0600 (all octal). What gives?
Based on #dave_thompson_085's observation: Indeed, stat() follows symlinks; I should be calling lstat() - which does exactly the same thing, but doesn't follow the link.

PE/COFF symbol type field

Microsoft's documentation for PE/COFF says of the type field in the symbol table:
"The most significant byte specifies whether the symbol is a pointer to, function returning, or array of the base type that is specified in the LSB. Microsoft tools use this field only to indicate whether the symbol is a function, so that the only two resulting values are 0x0 and 0x20 for the Type field."
However, the documentation and winnt.h both specify that IMAGE_SYM_DTYPE_FUNCTION = 2, not 0x20. Even if this is taken to be the value of the MSB, that would give a value for the entire field of 0x200, not 0x20.
What am I missing?
Check winnt.h for following lines:
// type packing constants
#define N_BTMASK 0x000F
#define N_TMASK 0x0030
#define N_TMASK1 0x00C0
#define N_TMASK2 0x00F0
#define N_BTSHFT 4
#define N_TSHIFT 2
// MACROS
// Basic Type of x
#define BTYPE(x) ((x) & N_BTMASK)
// Is x a pointer?
#ifndef ISPTR
#define ISPTR(x) (((x) & N_TMASK) == (IMAGE_SYM_DTYPE_POINTER << N_BTSHFT))
#endif
// Is x a function?
#ifndef ISFCN
#define ISFCN(x) (((x) & N_TMASK) == (IMAGE_SYM_DTYPE_FUNCTION << N_BTSHFT))
#endif
So it seems official MSB, LSB description is wrong - they are not bytes but nibbles. So 0x20 would be a function (MS nibble = 2) returning base type of IMAGE_SYM_TYPE_NULL (LS nibble = 0) .

how can I find native exceptions in an x64 stack?

I've been using WinDbg to debug dump files for a while now.
There's a nice "trick" that works with x86 native programs, you can scan the stack for the CONTEXT_ALL flags (0x1003f).
In x64 the CONTEXT_ALL flags apparently don't contain 0x1003f ...
Now the problem is that sometimes when you mix native with managed code, the regular methods of finding exceptions (like .exc or .lastevent).
What is the equivalent of this 0x1003f in x64 ? is there such a constant ?
EDIT:
BTW, if you were wondering, in theory it should have been 10003f because of the definitions:
#define CONTEXT_I386 0x00010000
#define CONTEXT_AMD64 0x00100000
#define CONTEXT_CONTROL 0x00000001L // SS:SP, CS:IP, FLAGS, BP
#define CONTEXT_INTEGER 0x00000002L // AX, BX, CX, DX, SI, DI
#define CONTEXT_SEGMENTS 0x00000004L // DS, ES, FS, GS
#define CONTEXT_FLOATING_POINT 0x00000008L // 387 state
#define CONTEXT_DEBUG_REGISTERS 0x00000010L // DB 0-3,6,7
#define CONTEXT_EXTENDED_REGISTERS 0x00000020L // cpu specific extensions
#define CONTEXT_FULL (CONTEXT_CONTROL | CONTEXT_INTEGER | CONTEXT_SEGMENTS)
#define CONTEXT_ALL (CONTEXT_FULL | CONTEXT_FLOATING_POINT | CONTEXT_DEBUG_REGISTERS | CONTEXT_EXTENDED_REGISTERS)
#define CONTEXT_I386_FULL CONTEXT_I386 | CONTEXT_FULL
#define CONTEXT_I386_ALL CONTEXT_I386 | CONTEXT_ALL
#define CONTEXT_AMD64_FULL CONTEXT_AMD64 | CONTEXT_FULL
#define CONTEXT_AMD64_ALL CONTEXT_AMD64 | CONTEXT_ALL
But it's not...
I usually use the segment register values for my key to finding context records (ES and DS have the same value and are next to each other in the CONTEXT stucture). The flags trick is neat too though.
Forcing an exception in a test application then digging the context record structure off the stack, looks like the magic value in my case would be 0x10001f:
0:000> dt ntdll!_context 000df1d0
...
+0x030 ContextFlags : 0x10001f
...
+0x03a SegDs : 0x2b
+0x03c SegEs : 0x2b
...
Also note that the ContextFlags value is not at the start of the structure, so if you find that value you'll have to subtract ##c++(#FIELD_OFFSET(ntdll!_CONTEXT, ContextFlags)) from it to get the base of the context structure.
Also, just in case it wasn't obvious, this value comes from a sample size of exactly one. It may not be correct in your environment and it is of course subject to change (as is anything implementation specific such as this).

Resources