Dynamic Memory Allocation in kernel's VDSO - linux-kernel

For an experiment, I need to instrument arch/x86/vdso/vclock_gettime.c and allocate entry nodes for a hashtable inside it, using the typical approach:
struct h_struct *phe = (struct h_struct*) kmalloc(sizeof(struct h_struct), GFP_HIGHUSER);
which I have tested and used in other areas of the kernel, where it compiled and worked as expected. In the case of the vDSO, however, it results in a link failure:
CC arch/x86/vdso/vclock_gettime.o
VDSO arch/x86/vdso/vdso.so.dbg
arch/x86/vdso/vclock_gettime.o: In function `kmalloc':
linux-3.10.0/include/linux/slub_def.h:171: undefined reference to `kmalloc_caches'
linux-3.10.0/include/linux/slub_def.h:171: undefined reference to `kmem_cache_alloc_trace'
collect2: error: ld returned 1 exit status
OBJCOPY arch/x86/vdso/vdso.so
I am aware that the vDSO has a special status: although it is allocated in kernel space, it is mapped into the address space of every userspace process.
Can someone more experienced spot the problem, or suggest a way to allocate memory in the vDSO for my needs?
P.S. malloc can't be used, as that requires stdlib.h, which would result in linking against glibc.

Related

ld: 32-bit RIP relative reference out of range in simple gcc program

Related to "ld: 32-bit RIP relative reference out of range" on Mac OS X, but not solved there, and in a more complex context. The relevant computer(s) have >32GB of RAM.
static const int K = 1024;
static const int M = K * K;
static const int G = K * M;
const int MC = G;

void donada(float *rvec, const int MC) { rvec[MC - 1] = 1.0; return; }

float notused[1][MC]; // 4GB, ramp up with first index
float used[MC];       // 4GB

int main() {
    donada(used, MC);
    donada(notused[0], MC); // notused[1] would be out of bounds
}
compiled with gcc -Wall -o test test.cc. Compiling this program on OS X yields
ld: 32-bit RIP relative reference out of range (4294967395 max is
+/-2GB): from _main (0x100000F92) to _used (0x200001000) in '_main' from /var/folders/yl/8gp3pgbn1l562ywg_q86rk6800\ 00z9/T/test-b3bebf.o
for architecture x86_64
On Linux, there is a similar error:
test.cc:(.text+0x18): relocation truncated to fit: R_X86_64_32 against symbol `used' defined in .bss section in /tmp/ccqcNh2C.o
I first thought compiler flag -Os would fix this, but it does not. It would be appropriate for gcc or clang to provide a more suggestive error message.
the relevant computer(s) have >32GB of RAM.
That's actually not very relevant. The issue is that 64-bit GCC defaults to -mcmodel=small, and you are trying to access data that is 4GiB away from its base symbol, which is not compatible with the small code model.
From the GCC documentation:
-mcmodel=small
Generate code for the small code model: the program and its symbols
must be linked in the lower 2 GB of the address space. Pointers are 64 bits.
Programs can be statically or dynamically linked. This is the default code model.
-mcmodel=medium
Generate code for the medium model: The program is linked in the lower 2 GB
of the address space. Small symbols are also placed there.
Symbols with sizes larger than -mlarge-data-threshold are put into large data
or bss sections and can be located above 2GB.
Programs can be statically or dynamically linked.
-mcmodel=large
Generate code for the large model: This model makes no assumptions about addresses
and sizes of sections.
To correctly link your program, you need to use -mcmodel=large.
However note that this is not well tested (almost nobody does that), and that all code you (statically) link into your program will need to be built that way.
It is probably much better to dynamically allocate your arrays instead.
I first thought compiler flag -Os would fix this
It can't: -Os minimizes code size. Your problem is that you are forcing the compiler to statically allocate a very large contiguous data array. There is nothing the compiler can optimize for size there.

What's the difference between _int_malloc and malloc (in Valgrind)

I am amazed that I can't find any document stating the difference between _int_malloc and malloc in the output of Valgrind's callgrind tool.
Could anybody explain what's their difference?
Furthermore, I actually write C++ code, so I use exclusively new, not malloc; yet in the callgrind output only mallocs show up.
The malloc listed in the callgrind output will be the implementation of malloc provided by glibc: the function __libc_malloc in the file glibc/malloc/malloc.c.
This function calls another function, intended for internal use only, named _int_malloc, which does most of the hard work.
As writing standard libraries is very difficult, the authors must be very good programmers and therefore very lazy. So, instead of writing memory allocation code twice, the new operator calls malloc in order to get the memory it requires.

What is the role of undefined exception handler (__und_svc) in kprobes?

I am trying to build kprobes as a loadable kernel module.
I am able to run the samples available in the samples/kprobes/ folder from the kernel tree.
If kprobes is configured in the kernel (CONFIG_KPROBES), then the svc_entry macro is expanded with 64 bytes in the __und_svc() handler.
Reference :
http://lxr.free-electrons.com/source/arch/arm/kernel/entry-armv.S?a=arm#L245
My aim is to make kprobes a kernel module without touching the kernel side, so the kernel is compiled without CONFIG_KPROBES, and the svc_entry macro is expanded with 0 in __und_svc().
I would like to clear up these doubts:
If the undefined instruction exception is handled by kprobes (because only a kprobe creates it), why is __und_svc() invoked? What is the role of the __und_svc() handler with respect to kprobes?
If the 64 bytes of memory are compulsory, how can they be allocated without recompiling the kernel, i.e. how can this be done dynamically?
Please share your knowledge.
You may not get responses as your understanding of things is not very good and it will take some time for anyone on the linux-arm-kernel list to respond. Read kprobes.txt and study the ARM architecture in detail.
If the undefined instruction exception is handled by kprobes (because only a kprobe creates it), why is __und_svc() invoked? What is the role of the __und_svc() handler with respect to kprobes?
On the ARM, mode 0b11011 is the undefined instruction mode. The flow when an undefined instruction happens is:
1. lr_und = pc of undef instruction + 4
2. SPSR_und = CPSR of the mode where the instruction occurred.
3. Change to ARM mode with interrupts disabled.
4. PC = vector base + 4
The main vector table of step four is located at __vectors_start, and this just branches to vector_und. The code is a macro called vector_stub, which makes a decision to call either __und_svc or __und_usr. The stack is the 4/8k page that is reserved per process; it is the kernel page which contains both the task structure and the kernel stack.
kprobes works by placing undefined instructions at the code addresses that you wish to probe; that is, it involves the undefined instruction handler. This should be pretty obvious. The handler calls two routines, call_fpe or do_undefinstr(). You are interested in the second case, which gets the opcode and calls call_undef_hook(). A hook is added with register_undef_hook(), which you can see in arch_init_kprobes(). The main callback kprobe_handler() is called with a struct pt_regs *regs, which happens to be the extra memory reserved in __und_svc. Notice, for instance, kretprobe_trampoline(), which plays tricks with the stack it is currently executing on.
If the 64 bytes of memory are compulsory, how can they be allocated without recompiling the kernel, i.e. how can this be done dynamically?
No, it is not. You can use a different mechanism, but you may have to modify the kprobes code, and most likely you will have to limit functionality. It is also possible to completely re-write the stack frame and reserve the extra 64 bytes after the fact. It is not an allocation as in kmalloc(); it is just adding/subtracting a number from the supervisor stack pointer.
I would guess that the code re-writes the return address from the undefined handler so that it executes in the context (ISR, bottom half/thread IRQ, work queue, kernel task) of the kprobed address. But there are probably additional issues you haven't yet encountered.
If arch_init_kprobes() is never called, then you can just always do the reservation in __und_svc; it just eats 64 bytes of stack, which makes it more likely that the kernel stack will overflow. That is, change:
__und_svc:
        @ Always reserve 64 bytes, even if kprobes is not active.
        svc_entry 64
arch_init_kprobes() is what actually installs the feature.

Want good understanding on shared libraries at the memory level

Please somebody help.
I am creating a shared library, and this command gives an error:
gcc -shared libx.o -o libx.so
/usr/lib64/gcc/x86_64-suse-linux/4.3/../../../../x86_64-suse-linux/bin/ld: libx.o: relocation R_X86_64_32 against `.rodata' can not be used when making a shared object;
recompile with -fPIC
libx.o: could not read symbols: Bad value
collect2: ld returned 1 exit status
So I recompiled with -fPIC and it compiles. Please can you give me a good understanding of the significance of -fPIC at the memory level, i.e. how the library is shared in physical memory between two programs using it.
Thanks a lot.
-fpic stands for position-independent code.
you can read drepper to get more idea on dynamic linking http://www.akkadia.org/drepper/dsohowto.pdf
This seems a duplicate of the similar post GCC -fPIC option.
For systems with virtual memory the loader is likely to map the shared code into some contiguous pages in the memory space of the applications that are using that library. In order to share these pages between multiple processes they must be:
read-only.
able to be mapped at an arbitrary location in the address space of a process.
consequences:
Most code does not meet the first requirement: it cannot just be mapped into the memory space of a process and run; it must first be modified by the loader in ways that are specific to each process. To achieve read-only text you pass the -fpic option to the compiler. This causes the compiler to generate less optimal machine code, but with the benefit that the text is read-only.
Efficient code often cannot be mapped to an arbitrary location in the address space: commonly it is constrained either to a particular address or to a low range of addresses. The -fpic option instructs the compiler to use less efficient code generation, with the benefit that there is no constraint on where the code runs.
Now we can understand your problem:
relocation R_X86_64_32 against `.rodata' - Here the linker is warning you that the compiler has used code generation that is constrained to run in a low range of addresses, and is therefore unsuitable for use in a shared library.

Where is MAP_FAILED returned

A linux kernel newbie question.
the man pages of mmap state that ".. otherwise, it shall return a value of MAP_FAILED and set errno to indicate the error... "
I have looked through the kernel code for mmap under /usr/src/linux/mm/mmap.c but I could not find a place where mmap returns MAP_FAILED.
Can anyone point me to where I can find it?
Thanks
You won't find MAP_FAILED in the kernel; instead, it's defined in userspace and used by mmap, the userspace function that wraps the system call. See the glibc source for mmap.
