I am using the esp value of the kernel stack to calculate the process descriptor pointer.
According to the ULK book, I just need to mask out the 13 least significant bits of esp to obtain the base address of the thread_info structure.
My test is:
1. Write a kernel module, because I need access to the kernel stack.
2. In the module init function, read the value of esp.
3. Use the following formula to get the process descriptor pointer of the process running on the CPU: *((unsigned int *)(esp & 0xffffe000))
4. Use the current macro and print out its value.

I think the value from step 3 should be the same as the value from step 4. But my experiments show that sometimes they are the same and sometimes they differ. Could anyone explain why? Or am I missing anything?
This is because at the base of the kernel stack you will find a struct thread_info instance (the exact layout is platform dependent), not a struct task_struct. The current macro gives you a pointer to the current struct task_struct.
Try the following:
struct thread_info *info = (struct thread_info *)(esp & 0xffffe000); /* mask off the low 13 bits (8 KB stack) */
struct task_struct *my_current = info->task;                         /* task pointer stored in thread_info */
Now you can compare my_current with the value of the current macro.
Finally, I solved this problem. Everything is correct except for the size of the kernel stack: my kernel uses a 4 KB stack instead of an 8 KB stack, so I just need to mask out the low 12 bits of ESP (i.e. use 0xfffff000 instead of 0xffffe000).
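For reference, deriving the mask from THREAD_SIZE avoids hard-coding 12 vs. 13 bits. A minimal sketch, assuming the classic layout where struct thread_info sits at the base of the kernel stack (newer kernels with CONFIG_THREAD_INFO_IN_TASK keep it in the task_struct instead):

#include <linux/sched.h>        /* struct task_struct, current */
#include <linux/thread_info.h>  /* THREAD_SIZE, struct thread_info */

/* Hypothetical helper: recover the thread_info at the base of the kernel
 * stack that contains the given stack pointer. THREAD_SIZE is 4 KB or 8 KB
 * depending on the kernel configuration, so the mask is computed rather
 * than hard-coded as 0xfffff000 or 0xffffe000. */
static inline struct thread_info *thread_info_from_sp(unsigned long sp)
{
        return (struct thread_info *)(sp & ~(THREAD_SIZE - 1));
}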
Thanks for all the suggestions and the answer!
I have a "Hello World" program to which I've attached lldb. I'm trying to answer a few questions for myself about the results I get when I try to get the address of library functions:
(lldb) image lookup -n printf
1 match found in /usr/lib/system/libsystem_c.dylib:
Address: libsystem_c.dylib[0x000000000003f550] (libsystem_c.dylib.__TEXT.__text + 253892)
Summary: libsystem_c.dylib`printf
(lldb) image lookup -n scanf
1 match found in /usr/lib/system/libsystem_c.dylib:
Address: libsystem_c.dylib[0x000000000003fc69] (libsystem_c.dylib.__TEXT.__text + 255709)
Summary: libsystem_c.dylib`scanf
(lldb) expr &printf
(int (*)(const char *__restrict, ...)) $2 = 0x00007fff6f8c5550 (libsystem_c.dylib`printf)
(lldb) expr &scanf
error: unsupported expression with unknown type
I have three questions here:
What kind of address is 0x00007fff6f8c5550? I assume it is the function pointer to printf. Is this a virtual address that exists only in the mapped space of the current process? If yes, why does another program return the same address for printf?
Assuming it's some global shared address that is the same for every process, would modifying the contents of the data at this address (which I haven't been able to do yet) create a copy of the modified memory page, and will the address change? (I'm on macOS, and I assume one process cannot change shared memory for another process.)
Why does expr &scanf not work, but expr &printf does?
What kind of address is 0x00007fff6f8c5550? I assume it is the function pointer to printf.
Yes, that's correct.
Is this a virtual address that exists only in the mapped space of the current process?
Well, yes and no. It is a virtual address specific to your process and you should not assume it's valid in another process. But:
If yes, why does another program return the same address for printf?
As an optimization, macOS uses a shared mapping for a lot of the system libraries. They are loaded once at boot and used by all processes. For a given boot, the address is constant across all such processes. However, the address is randomized each boot for security.
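You can see this for yourself with a small, hypothetical test program: build it as two separate binaries and run both; within a single boot they will print the same value.

#include <stdio.h>

int main(void)
{
    /* Illustrative only: within one boot, every process sees printf at the
     * same address because the system-library mapping is shared by all of them. */
    printf("printf is at %p\n", (void *)&printf);
    return 0;
}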
Assuming it's some global shared address that is the same for every process, would modifying the contents of the data at this address (which I haven't been able to do yet) create a copy of the modified memory page and will the address change?
Well, it is mapped copy-on-write. So, modifying it would create a copy. However, that wouldn't change its address. The OS would simply modify the mapping so that the memory around that address is private to your process.
(I'm on macOS and I assume one process cannot change shared memory for another process)
Well, processes can cooperate to have writable shared memory. But, in general, you're correct that security precautions prevent unwanted modifications to a process's memory.
Why does expr &scanf not work, but expr &printf does?
Your program (presumably) doesn't use scanf, so there's no debugging information regarding it. The main thing lldb is missing is the type of scanf. If you use a cast expression, it can work:
(lldb) p scanf
error: 'scanf' has unknown type; cast it to its declared type to use it
(lldb) p &scanf
error: unsupported expression with unknown type
(lldb) p (int(*)(const char * __restrict, ...))scanf
(int (*)(const char *__restrict, ...)) $3 = 0x00007fffd7e958d4 (libsystem_c.dylib`scanf)
Conversely, it works for printf because your program does use it.
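In other words, if the program itself referenced scanf, its type would be in the debug information and no cast would be needed. A minimal, hypothetical example:

#include <stdio.h>

int main(void)
{
    /* Illustrative only: referencing both functions pulls their declared
     * types into the program's debug info, so the debugger can evaluate
     * &printf and &scanf without an explicit cast. */
    int (*pf)(const char *restrict, ...) = printf;
    int (*ps)(const char *restrict, ...) = scanf;

    printf("printf at %p, scanf at %p\n", (void *)pf, (void *)ps);
    return 0;
}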
I want to obtain the return address of the current user stack frame from some Linux kernel structure on a Linux x86_64 VM, using a VMI-based approach. I can access the contents of more or less all registers, but only at the moment of a context switch (CR3 event), so registers like RBP or RSP point to the kernel stack rather than the user stack.
My first approach was to obtain the stack/base pointer and derive from it the offset to the return address. However, accessing the task_struct member thread (of type thread_struct), and within it the members sp or usersp, doesn't yield the desired result. The member usersp does point to some area in the user stack, but it definitely does not hold the current position of the user stack pointer. I used a simple C program, which prints the stack pointer within each function, to check the validity of the usersp member.
Another approach was to follow the ret_stack pointer (of type ftrace_ret_stack) in the task_struct, which contains the member ret. This member refers to the same value as the usersp member above.
I could, however, obtain the top of the stack by accessing the field start_stack in the mm_struct that the task_struct points to. This information seems to come closest to the actual stack pointer, but I have no idea how to derive the desired return address from it.
I have also noticed there is a raw pointer called stack within the task_struct, but I couldn't find any detailed information about it. The sketch below summarizes what I am reading so far.
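For concreteness, written as if it were in-kernel code (a sketch only; field names are from older x86_64 kernels, and get_guest_task() is a hypothetical stand-in for my VMI plumbing):

#include <linux/sched.h>    /* struct task_struct, struct mm_struct, thread_struct */
#include <linux/printk.h>   /* pr_info */

/* Hypothetical helper standing in for the VMI layer. */
extern struct task_struct *get_guest_task(void);

static void dump_stack_fields(void)
{
    struct task_struct *task = get_guest_task();

    unsigned long ksp       = task->thread.sp;        /* kernel-mode stack pointer */
    unsigned long usersp    = task->thread.usersp;    /* saved user RSP, apparently stale */
    unsigned long stack_top = task->mm->start_stack;  /* top of the user stack */
    void         *kstack    = task->stack;            /* base of the kernel stack area */

    pr_info("sp=%lx usersp=%lx start_stack=%lx stack=%p\n",
            ksp, usersp, stack_top, kstack);
}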
I'm aware of the 'trap frame' structure on Windows systems, but couldn't find the equivalent on Linux.
Thanks in advance.
I need longjmp/setjmp in a .kext file for OS X. Unfortunately, I don't think there's any official support for these functions in XNU. Is there any fundamental reason why this cannot work or is it just not implemented right now?
Any ideas how I could get this to work?
If it helps: I want to try to get Lua running in the OS X kernel, but the runtime seems to depend on either longjmp/setjmp or C++ exceptions, neither of which is available in XNU.
There's nothing about standard-compliant use of setjmp/longjmp that stops you from using them in a kernel context. The main thing to be careful about in the kernel execution context is that the current thread is usually identified via pointer arithmetic on the current stack pointer, so unlike in user space, you can't use green threads or otherwise mess with the rsp register (on x86-64). longjmp does set the stack pointer, but only to the value previously saved by setjmp, which will be in the same stack if you stick to standard use, so that's safe.
As far as I'm aware, compilers don't treat setjmp() calls specially, so you can implement your own version quite easily as a function in assembly language. Your setjmp will need to save the return address, the stack pointer, and any callee-saved registers to the jmp_buf-typed array passed into the function; all of this is defined in the ABI for the platform in question (x86-64 SysV in the case of OS X). Then return 0 (set rax to 0 on x86-64). Your version of longjmp will simply need to restore the contents of this array and return to the saved location, with the passed-in value as the return value (copy the argument to rax on x86-64). To comply with the standard, you must return 1 if 0 is passed to longjmp.
In userspace, setjmp/longjmp typically also affect the signal mask, which doesn't apply in the kernel.
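For illustration, this is how the hand-rolled pair might be declared and used from C, with the actual register saving and restoring done in a separate assembly file as described above (my_setjmp, my_longjmp and the buffer layout are my own names, not anything shipped with XNU):

/* Hypothetical hand-rolled jmp_buf for x86-64 SysV: rbx, rbp, r12-r15,
 * rsp and the return address, i.e. everything the ABI requires a callee
 * to preserve, plus where to resume. */
typedef unsigned long my_jmp_buf[8];

int  my_setjmp(my_jmp_buf env);                  /* returns 0 on the direct call */
void my_longjmp(my_jmp_buf env, int val) __attribute__((noreturn));

static my_jmp_buf recover;

static void failing_step(void)
{
    /* ...something goes wrong... */
    my_longjmp(recover, 1);     /* unwinds back into run_protected() */
}

static int run_protected(void)
{
    if (my_setjmp(recover) != 0)
        return -1;              /* re-entered here via my_longjmp */
    failing_step();
    return 0;
}

Standard-compliant use, as noted above, means the frame that called my_setjmp is still live when my_longjmp fires, so the restored rsp stays within the same kernel stack.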
What does "a GP/function address pair" mean in Itanium C++ ABI? What does GP stand for?
Short explanation: gp is, for all practical purposes, a hidden parameter to all functions that comply with the Itanium ABI. It's a kind of this pointer to the global variables the function uses. As far as I know, no mainstream OS does this anymore.
GP stands for "globals pointer". It's a base address for data statically allocated by executables, and the Itanium architecture has a register just for it.
For instance, if you had these global variables and this function in your program:
int foo;
int bar;
int baz;
int func()
{
    foo++;
    bar += foo;
    baz *= bar / foo;
    return foo + bar + baz;
}
The gp/function pair would conceptually be &foo, &func. The code generated for func would refer to gp to find where the globals are located. The compiler knows foo can be found at gp, bar can be found at gp + 4 and baz can be found at gp + 8.
Assuming func is defined in an external library, if you call it from your program, the compiler will use a sequence of instructions like this one:
save current gp value to the stack;
load code address from the pair for func into some register;
load gp value from same pair into GP;
perform indirect call to the register where we stored the code address;
restore the old gp value that we saved on the stack earlier, and resume in the calling function.
This makes executables fully position-independent, since they never store absolute addresses to data symbols, and therefore makes it possible to keep only one instance of any executable file in memory no matter how many processes use it (you could even load the same executable multiple times within a single process and still have only one copy of the executable code system-wide). The cost is that function pointers become a little weird. With the Itanium ABI, a function pointer is not a code address (as it is with "regular" x86 ABIs): it points to a pair of a gp value and a code address, since the code address might not be worth much if the code can't reach its global variables, just as a method might not be able to do much without a this pointer.
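A toy C model of that idea, purely illustrative and nothing like real Itanium code: the "function pointer" carries both the code address and the gp of the module that owns it, and every call through it passes gp along.

#include <stdio.h>

/* Illustrative model of an Itanium-style gp/function pair. */
struct func_desc {
    long (*code)(const long *gp);  /* the code address */
    const long *gp;                /* base of that module's globals */
};

/* The "globals" of a pretend module: foo at gp[0], bar at gp[1], baz at gp[2]. */
static long module_globals[3] = { 1, 2, 3 };

static long module_func(const long *gp)
{
    return gp[0] + gp[1] + gp[2];  /* every global is reached relative to gp */
}

/* Calling through the pair: load the code address and gp, then call. */
static long call_descriptor(const struct func_desc *fd)
{
    return fd->code(fd->gp);
}

int main(void)
{
    struct func_desc fd = { module_func, module_globals };
    printf("%ld\n", call_descriptor(&fd));   /* prints 6 */
    return 0;
}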
The only other ABI I know of that used this concept was the classic Mac OS PowerPC ABI, which called those pairs "transition vectors".
Since x86_64 supports RIP-relative addressing (x86 did not have an equivalent EIP-relative addressing), it's now pretty easy to create position-independent code without having to use an additional register or having to use "enhanced" function pointers. Code and data just have to be kept at constant offsets. Therefore, this part of the Itanium ABI is probably gone for good on Intel platforms.
From the Itanium Register Conventions:
8.2 The gp Register
Every procedure that references statically-allocated data or calls another procedure requires a pointer to its data segment in the gp register, so that it can access its static data and its linkage tables. Each load module has its own data segment, and the gp register must be set correctly prior to calling any entry point within that load module.
The linkage conventions require that each load module define exactly one gp value to refer to a location within its short data segment. It is expected that this location will be chosen to maximize the usefulness of short-displacement immediate instructions for addressing scalars and linkage table entries. The DLL loader will determine the absolute value of the gp register for each load module after loading its data segment into memory.
For calls within a load module, the gp register will remain unchanged, so calls known to be local can be optimized accordingly.
For calls between load modules, the gp register must be initialized with the correct gp value for the new load module, and the calling function must ensure that its own gp value is saved and restored.
Just a comment about this quote from the other answer:
It is expected that this location will be chosen to maximize the usefulness of short-displacement immediate instructions for addressing scalars and linkage table entries.
What this is talking about: Itanium has three different ways to add an immediate offset to a base register (where 'immediate' here effectively means 'offset from the base'). You can support a full 64-bit offset from anywhere, but it takes two instructions:
// r34 has the base address
movl r33 = <my immediate>
;;
add  r35 = r33, r34
;;
Not only does that take 2 separate clocks, it takes 3 instruction slots across 2 bundles to make that happen.
There are two shorter versions: add14 (also called adds) and add22 (also called addl). The difference was in the immediate size each could handle. Each took a single 'A' slot, IIRC, and completed in a single clock.

add14 could use any register as the source and target, but could only handle immediates of up to 14 bits.

add22 could use any register as the target, but only two bits were allocated for the source, so only r0, r1, r2 and r3 could be used as source registers. r0 is not a real register - it's hardwired to 0. Using one of the other three as the base means you could address 256 times as much memory with a simple offset, compared to using one of the local stack registers (which forces the 14-bit form). Therefore, if you put your global base address into r1 (the convention), you could reach far more offsets before having to do a separate movl and/or modify gp for the next section of code.
In the Linux kernel, is there a way to traverse down to the buffer_heads from within a module?
I can see how to get to struct bio (via the task_struct, i.e. current->bio). But how can I get to the buffer heads? The buffer_head struct holds some information about physical block numbers that I'd like to be able to obtain at any point.
Nevermind. I was looking at this wrong.