How to time functions in Linux Kernel's start_kernel function? - time

I need to figure out the time function calls take inside Linux kernel's start_kernel function.
Why this is a problem is because the kernel would not boot if I put a ktime_get() function call before the timekeeping_init() function call inside that same function.
In my setup I think most likely the kernel spends a lot of time inside mm_init(). But I can not verify due to the above mention problem.
Any ideas on how to get around this?
Thanks.

This early in the boot there's not really much that's available. printk should work, though.
If you're on x86, you could use rdtsc(). There's only 1 CPU running at this time, so the usual warnings about confusing results don't apply yet.

Related

Profiling Rust with execution time for each *line* of code?

I have profiled my Rust code and see one processor-intensive function that takes a large portion of the time. Since I cannot break the function into smaller parts, I hope I can see which line in the function takes what portion of time. Currently I have tried CLion's Rust profiler, but it does not have that feature.
It would be best if the tool runs on MacOS since I do not have a Windows/Linux machine (except for virtualization).
P.S. Visual studio seems to have this feature; but I am using Rust. https://learn.microsoft.com/en-us/visualstudio/profiling/how-to-collect-line-level-sampling-data?view=vs-2017 It has:
Line-level sampling is the ability of the profiler to determine where in the code of a processor-intensive function, such as a function that has high exclusive samples, the processor has to spend most of its time.
Thanks for any suggestions!
EDIT: With C++, I do see source code line level information. For example, the following toy shows that, the "for" loop takes most of the time within the big function. But I am using Rust...
To get source code annotation in perf annotate or perf report you need to compile with debug=2 in your cargo toml.
If you also want source annotations for standard library functions you additionally need to pass -Zbuild-std to cargo (requires nightly).
Once compiled, "lines" of Rust do not exist. The optimiser does its job by completely reorganising the code you wrote and finding the minimal machine code that behaves the same as what you intended.
Functions are often inlined, so even measuring the time spent in a function can give incorrect results - or else change the performance characteristics of your program if you prevent it from being inlined to do so.

How to implement new instruction in linux KVM at unused x86 opcode

As a part of understanding virtualization, I am trying to extend the support of KVM and defin a new instruction. The instruction will use previously unused opcodes.
ref- ref.x86asm.net/coder32.html.
Now, lets say an instruction like 'CPUID' (which causes a vm-exit) and i want to add a new instruction, say - 'NEWCPUID', which is similar to 'CPUID' in priviledge and is trapped by hypervisor, but will differ in the implementation.
After going through some online resources, I was able to understand how to define new system calls, but I am not sure about which all files in linux source code do I need to add the code for NEWCPUID? Is there a better way than only relying on 'find' command?
I am facing below challenges:
1. Which all places in linux source code do I need to add code?
2. Not sure how this new instruction can be mapped to a previously unused opcode?
As I am completely new to this field and willing to learn this, can someone explain me in short how to go about this task? I will need the right direction to achieve this. If there is a reference/tutorial/blog describing the process, it will be of great help!
Here are answers to some of your questions:
... but I am not sure about which all files in linux source code do I need to add the code for NEWCPUID?
A - The right place to add emulation for KVM is arch/x86/kvm/emulate.c. Take a look at how opcode_table[] is defined and the hooks to the functions that they execute. The basic idea is the guest executes and undefined instruction such as "db 0xunused"; this is results in an exit since the instruction is undefined. In KVM, you look at the rip from the VMCS/VMCB and determine if it's an instruction KVM knows about (such as NEWCPUID) and then KVM calls x86_emulate_instruction().
...Is there a better way than only relying on 'find' command?
A - Yes, pick an example system call and then use a symbol cross reference such as cscope.
...n me in short how to go about this task?
A - As I mentioned in 1, first of all find a way for the guest to attempt to execute this unused opcode (such as the db trick). I think the assembler will trying to reject unknown opcodes. So, that the first step. Second, check whether your instruction causes an vmexit(). For this, you can use tracing. Tracing emits a lot of output, so, you have to use some filter options. If tracing is overwhelming, simply printk something in vmx_handle_exit (vmx.c). Finally, find a way to hook to your custom function from here. KVM already has handle_exception() to handle guest exceptions; that would be a good place to insert your custom function. See how this function calls emulate_instruction to emulate an exception to be injected to the guest.
I have deliberately skipped some of the questions since I consider them essential to figure out yourself in the process of learning. BTW, I don't think this may not be the best way to understand virtualization. A better way might be to write your own userspace hypervisor that utlizes kvm services via /dev/kvm or maybe just a standalone hypervisor.

OpenMP4 - Why does calling a function multiple times makes it faster

So I'm currently learning openmp4.
What I experienced is that if I call a function a 2nd time it will get significantly faster.
The omp block is inside of this function.
In my example the 1st call takes 5 seconds and the 2nd only 0,3s.
I am using the intel-icc with an Intel Xeon Phi(60cores 240Threads).
Could someone please explain why this is happening?
I think that is due to the OpenMP threading initialization. The creation of the threading team, and initializing each thread, etc.. This in fact is an expensive procedure.
I don't know what your code looks like, but this can also be the effect of caching. During first time, the mic will start caching the needed data. The second time: the data will already be cached and ready. But I can't confirm this until I see the code.

user defined page fault and exception handlers

I am trying to understand if we can add our page fault handlers / exception handlers in kernel / user mode and handle the fault we induced before giving the control back to the kernel.
The task here will be not modifying the existing kernel code (do_page_fault fn) but add a user defined handler which will be looked up when a page fault or and exception is triggered
One could find tools like "kprobe" which provide hooks at instruction, but looks like this will not serve my purpose.
Will be great if somebody can help me understand this or point to good references.
From user space, you can define a signal handler for SIGSEGV, so your own function will be invoked whenever an invalid memory access is made. When combined with mprotect(), this lets a program manage its own virtual memory, all from user-space.
However, I get the impression that you're looking for a way to intercept all page faults (major, minor, and invalid) and invoke an arbitrary kernel function in response. I don't know a clean way to do this. When I needed this functionality in my own research projects, I ended up adding code to do_page_fault(). It works fine for me, but it's a hack. I would be very interested if someone knew of a clean way to do this (i.e., that could be used by a module on a vanilla kernel).
If you don't won't to change the way kernel handles these fault and just add yours before, then kprobes will server your purpose. They are a little difficult to handle, because you get arguments of probed functions in structure containing registers and on stack and you have to know, where exactly did compiler put each of them. BUT, if you need it for specific functions (known during creation of probes), then you can use jprobes (here is a nice example on how to use both), which require functions for probing with exactly same arguments as probed one (so no mangling in registers/stack).
You can dynamically load a kernel module and install jprobes on chosen functions without having to modify your kernel.
You want can install a user-level pager with gnu libsegsev. I haven't used it, but it seems to be just what you are looking for.
I do not think it would be possible - first of all, the page fault handler is a complex function which need direct access to virtual memory subsystem structures.
Secondly, imagine it would not be an issue, yet in order to write a page fault handler in user space you should be able to capture a fault which is by default a force transfer to kernel space, so at least you should prevent this to happen.
To this end you would need a supervisor to keep track of all memory access, but you cannot guarantee that supervisor code was already mapped and present in memory.

Does every function end up in kernel mode?

Can we say that while programming, showing something on output, adding values etc., we always interact with system? I mean whether every function in app ends up(finally) in kernel.
I don't know if this approach varies from OS to OS so I mean Windows.
I appreciate Your response, and I am sorry for my English.
No, adding two values together will pretty sure not use any system code.
You always interact with the system in that the CPU (or some other processor like a GPU) has to execute your code.
Not every instruction executed by the CPU will involve a kernel-mode operation, though.
No, for example in Windows all messaging and COM objects don't end in Kernel-mode but they may use some kernel-mode resources like HANDLEs.

Resources