How to execute a piece of kernel code on all CPUs?

I'm trying to make a kernel module to enable FOP compatibility mode for x87 FPU. This is done via setting bit 2 in IA32_MISC_ENABLE MSR. Here's the code:
#include <linux/init.h>
#include <linux/module.h>
#include <linux/kernel.h>
#include <asm/msr-index.h>
#include <asm/msr.h>
MODULE_LICENSE("GPL");
MODULE_AUTHOR("10110111");
MODULE_DESCRIPTION("Module to enable FOPcode compatibility mode");
MODULE_VERSION("0.1");
static int __init fopCompat_init(void)
{
    unsigned long long misc_enable=native_read_msr(MSR_IA32_MISC_ENABLE);
    printk(KERN_INFO "Before trying to set FOP_COMPAT, IA32_MISC_ENABLE=%llx,"
                     " i.e. FOP_COMPAT is %senabled\n"
                    ,misc_enable,misc_enable&MSR_IA32_MISC_ENABLE_X87_COMPAT?"":"NOT ");
    wrmsrl(MSR_IA32_MISC_ENABLE,misc_enable|MSR_IA32_MISC_ENABLE_X87_COMPAT);
    misc_enable=native_read_msr(MSR_IA32_MISC_ENABLE);
    printk(KERN_INFO "Tried to set FOP_COMPAT. Result: IA32_MISC_ENABLE=%llx,"
                     " i.e. FOP_COMPAT is now %senabled\n"
                    ,misc_enable,misc_enable&MSR_IA32_MISC_ENABLE_X87_COMPAT?"":"NOT ");
    return 0;
}
static void __exit fopCompat_exit(void)
{
    const unsigned long long misc_enable=native_read_msr(MSR_IA32_MISC_ENABLE);
    printk(KERN_INFO "Quitting FOP-compat with IA32_MISC_ENABLE=%llx\n",misc_enable);
    if(!(misc_enable & MSR_IA32_MISC_ENABLE_X87_COMPAT))
        printk(KERN_INFO "NOTE: seems some CPUs still have to be set up, "
                         "or compatibility mode will work inconsistently\n");
    printk(KERN_INFO "\n");
}
module_init(fopCompat_init);
module_exit(fopCompat_exit);
It seems to work, but over multiple insmod/rmmod cycles I sometimes get dmesg output saying that compatibility mode was not enabled, although it was enabled immediately after the wrmsr. After some thinking I realized that this is because the module code was executed on different logical CPUs (I have a Core i7 with 4 cores × HT = 8 logical CPUs), so I had a 1/8 chance of getting the "enabled" print on rmmod. After repeating the cycle about 20 times I got consistent "enabled" prints, and my userspace application happily works with it.
So now my question is: how can I make my code execute on all logical CPUs present on the system, so as to enable compatibility mode for all of them?

To execute code on every CPU, use the on_each_cpu() function.
Signature:
int on_each_cpu(void (*func) (void *info), void *info, int wait)
Description:
Calls a function on all processors.
If the wait parameter is non-zero, the call waits until the function has completed on all CPUs.
The function func must not sleep, and the on_each_cpu() call itself must not be made from atomic context.
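As a rough, untested sketch, this is how it could be applied to the module from the question. It reuses the question's own MSR accessors; the names fop_compat_enable, fop_compat_init and fop_compat_exit are made up for illustration, and the exact on_each_cpu() prototype differs between kernel versions, so check the headers of the kernel you build against.

```c
#include <linux/init.h>
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/smp.h>
#include <asm/msr-index.h>
#include <asm/msr.h>

MODULE_LICENSE("GPL");

/* Hypothetical callback: runs once on each online CPU and must not sleep. */
static void fop_compat_enable(void *info)
{
    unsigned long long misc_enable = native_read_msr(MSR_IA32_MISC_ENABLE);

    /* Set the compatibility bit in this CPU's copy of the MSR. */
    wrmsrl(MSR_IA32_MISC_ENABLE, misc_enable | MSR_IA32_MISC_ENABLE_X87_COMPAT);
}

static int __init fop_compat_init(void)
{
    /* wait = 1: return only after every CPU has run the callback. */
    on_each_cpu(fop_compat_enable, NULL, 1);
    return 0;
}

static void __exit fop_compat_exit(void)
{
}

module_init(fop_compat_init);
module_exit(fop_compat_exit);
```

Only CPUs that are online at the time of the call run the callback, so a CPU brought online later would presumably still need the bit set separately.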

Related

keyboard interrupt routine visual studio C++ console app

I am using VS 2022 Preview to write a C++ console application. I wish to detect a keyboard hit and have my interrupt handler function called. I want the key press detected quickly in case main is in a long loop and therefore not using kbhit().
I found signal() but the debugger stops when the Control-C is detected. Maybe it is a peculiarity of the IDE. Is there a function or system call that I should use?
Edit: I am vaguely aware of threads. Could I spawn a thread that just watches kbd and then have it raise(?) an interrupt when a key is pressed?

I was able to do it by adding a thread. On the target I will have real interrupts to trigger my ISR but this is close enough for algorithm development. It seemed that terminating the thread was more trouble than it was worth so I rationalized that I am simulating an embedded system that does not need fancy shutdowns.
I decided to accept just one character at a time in the phony ISR; I can then buffer the characters and process the whole string when I see a CR, which makes for a simple-minded command-line processor.
// Scheduler.cpp : This file contains the 'main' function. Program execution begins and ends there.
//
#include <Windows.h>
#include <iostream>
#include <thread>
#include <conio.h>
// Stand-in for the real ISR: blocks on the keyboard in its own thread.
void phonyISR(int tbd)
{
    char c;
    while (1)
    {
        std::cout << "\nphonyISR() waiting for kbd input:";
        c = _getch();
        std::cout << "\nGot >" << c << "<";
    }
}
int main(int argc, char* argv[])
{
    int tbd = 0; // placeholder argument; initialized so that copying it into the thread is well defined
    std::thread t = std::thread(phonyISR, tbd);
    // Main thread doing its stuff
    int i = 0;
    while (1)
    {
        Sleep(2000);
        std::cout << "\nMain: " << i++;
    }
    return 0;
}
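The buffering step mentioned above (collect characters until a CR arrives, then handle the whole line) is not shown in the listing. Here is a minimal sketch of it in plain C. The names cmd_buf, cmd_buf_init, cmd_buf_feed and CMD_BUF_SIZE are hypothetical, and a version shared between the keyboard thread and main would also need some form of locking.

```c
#include <assert.h>
#include <stddef.h>

#define CMD_BUF_SIZE 64  /* hypothetical maximum command length, including the NUL */

struct cmd_buf {
    char   data[CMD_BUF_SIZE];
    size_t len;
    int    complete;  /* set once a CR has terminated the current line */
};

/* Reset the buffer to an empty line. */
void cmd_buf_init(struct cmd_buf *b)
{
    assert(b != NULL);
    b->len = 0;
    b->complete = 0;
    b->data[0] = '\0';
}

/*
 * Feed one character, as the phony ISR could do after each _getch().
 * Returns 1 when a CR completes a line; b->data then holds the line as a
 * NUL-terminated string until the next call, which starts a new line.
 * Returns 0 otherwise. Characters that do not fit are dropped.
 */
int cmd_buf_feed(struct cmd_buf *b, char c)
{
    assert(b != NULL);
    if (b->complete) {           /* previous line was reported; start over */
        b->len = 0;
        b->complete = 0;
    }
    if (c == '\r') {
        b->data[b->len] = '\0';
        b->complete = 1;
        return 1;
    }
    if (b->len < CMD_BUF_SIZE - 1)
        b->data[b->len++] = c;
    return 0;
}
```

In the thread above, the call would go right after `c = _getch();`, with the command handled whenever cmd_buf_feed() returns 1.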

Trap memory accesses inside a standard executable built with MinGW

So my problem sounds like this.
I have some platform dependent code (embedded system) which writes to some MMIO locations that are hardcoded at specific addresses.
I compile this code with some management code inside a standard executable (mainly for testing) but also for simulation (because it takes longer to find basic bugs inside the actual HW platform).
To get around the hardcoded pointers, I just redefine them to point at some variables inside the memory pool. This works really well.
The problem is that there is specific hardware behavior on some of the MMIO locations (w1c for example) which makes "correct" testing hard to impossible.
These are the solutions I thought of:
1 - Somehow redefine the accesses to those registers and try to insert some immediate function to simulate the dynamic behavior. This is not really usable since there are various ways to write to the MMIO locations (pointers and stuff).
2 - Somehow leave the addresses hardcoded and trap the illegal access through a seg fault, find the location that triggered, extract exactly where the access was made, handle and return. I am not really sure how this would work (and even if it's possible).
3 - Use some sort of emulation. This will surely work, but it will void the whole purpose of running fast and native on a standard computer.
4 - Virtualization ?? Probably will take a lot of time to implement. Not really sure if the gain is justifiable.
Does anyone have any idea whether this can be accomplished without going too deep? Is there perhaps a way to make the compiler define a memory area for which every access generates a callback? I'm not really an expert in x86/gcc stuff.
Edit: It seems that it's not really possible to do this in a platform-independent way, and since it will be Windows only, I will use the available API (which seems to work as expected). I found this question here:
Is set single step trap available on win 7?
I will put the whole "simulated" register file inside a number of pages, guard them, and trigger a callback from which I will extract all the necessary info, do my stuff, and then continue execution.
Thanks all for responding.

I think #2 is the best approach. I routinely use approach #4, but I use it to test code that is running in the kernel, so I need a layer below the kernel to trap and emulate the accesses. Since you have already put your code into a user-mode application, #2 should be simpler.
The answers to the question "How to write a signal handler to catch SIGSEGV?" may help in implementing #2.
What you really want to do, though, is to emulate the memory access and then have the segv handler return to the instruction after the access. This sample code works on Linux. I'm not sure if the behavior it is taking advantage of is undefined, though.
// Build as C++ on Linux/glibc (g++ defines _GNU_SOURCE, which exposes REG_RAX and REG_RIP).
#include <stdint.h>
#include <stdio.h>
#include <signal.h>
#include <ucontext.h>
#define REG_ADDR ((volatile uint32_t *)0x12340000f000ULL)
static uint32_t read_reg(volatile uint32_t *reg_addr)
{
    uint32_t r;
    // Fixed destination register (EAX) so the handler knows where to put the emulated value.
    asm volatile("mov (%1), %0" : "=a"(r) : "r"(reg_addr));
    return r;
}
static void segv_handler(int, siginfo_t *, void *);
int main()
{
    struct sigaction action = { 0, };
    action.sa_sigaction = segv_handler;
    action.sa_flags = SA_SIGINFO;
    sigaction(SIGSEGV, &action, NULL);
    // force sigsegv
    uint32_t a = read_reg(REG_ADDR);
    printf("after segv, a = %u\n", a);
    return 0;
}
static void segv_handler(int, siginfo_t *info, void *ucontext_arg)
{
    ucontext_t *ucontext = static_cast<ucontext_t *>(ucontext_arg);
    ucontext->uc_mcontext.gregs[REG_RAX] = 1234; // emulated register value
    ucontext->uc_mcontext.gregs[REG_RIP] += 2;   // skip the faulting mov, assumed to be 2 bytes long
}
The code to read the register is written in assembly to ensure that both the destination register and the length of the instruction are known.

This is what a Windows version of prl's answer could look like:
#include <stdint.h>
#include <stdio.h>
#include <windows.h>
#define REG_ADDR ((volatile uint32_t *)0x12340000f000ULL)
static uint32_t read_reg(volatile uint32_t *reg_addr)
{
    uint32_t r;
    asm("mov (%1), %0" : "=a"(r) : "r"(reg_addr));
    return r;
}
static LONG WINAPI segv_handler(EXCEPTION_POINTERS *);
int main()
{
    SetUnhandledExceptionFilter(segv_handler);
    // force sigsegv
    uint32_t a = read_reg(REG_ADDR);
    printf("after segv, a = %d\n", a);
    return 0;
}
static LONG WINAPI segv_handler(EXCEPTION_POINTERS *ep)
{
    // only handle read access violation of REG_ADDR
    if (ep->ExceptionRecord->ExceptionCode != EXCEPTION_ACCESS_VIOLATION ||
        ep->ExceptionRecord->ExceptionInformation[0] != 0 ||
        ep->ExceptionRecord->ExceptionInformation[1] != (ULONG_PTR)REG_ADDR)
        return EXCEPTION_CONTINUE_SEARCH;
    ep->ContextRecord->Rax = 1234;
    ep->ContextRecord->Rip += 2;
    return EXCEPTION_CONTINUE_EXECUTION;
}

So, the solution (code snippet) is as follows:
First of all, I have a variable:
__attribute__ ((aligned (4096))) int g_test;
Second, inside my main function, I do the following:
AddVectoredExceptionHandler(1, VectoredHandler);
DWORD old;
VirtualProtect(&g_test, 4096, PAGE_READWRITE | PAGE_GUARD, &old);
The handler looks like this:
LONG WINAPI VectoredHandler(struct _EXCEPTION_POINTERS *ExceptionInfo)
{
    /* ULONG_PTR rather than DWORD so the address is not truncated on 64-bit builds */
    static ULONG_PTR last_addr;
    if (ExceptionInfo->ExceptionRecord->ExceptionCode == STATUS_GUARD_PAGE_VIOLATION) {
        /* ExceptionInformation[1] holds the address that was accessed */
        last_addr = ExceptionInfo->ExceptionRecord->ExceptionInformation[1];
        ExceptionInfo->ContextRecord->EFlags |= 0x100; /* Single step to trigger the next one */
        return EXCEPTION_CONTINUE_EXECUTION;
    }
    if (ExceptionInfo->ExceptionRecord->ExceptionCode == STATUS_SINGLE_STEP) {
        DWORD old;
        /* Re-arm the guard; PAGE_MASK is assumed to be defined elsewhere as 0xFFF (the page offset bits) */
        VirtualProtect((PVOID)(last_addr & ~PAGE_MASK), 4096, PAGE_READWRITE | PAGE_GUARD, &old);
        return EXCEPTION_CONTINUE_EXECUTION;
    }
    return EXCEPTION_CONTINUE_SEARCH;
}
This is only a basic skeleton of the functionality. Basically, I guard the page on which the variable resides, and I keep linked lists holding, for each address in question, a pointer to the callback function and the associated values. I check that the faulting address is in my list, then I trigger the callback.
On the first guard hit the system removes the page protection, but I can call my PRE_WRITE callback, where I can save the variable's state. Because a single step is requested through EFlags, a single-step exception follows immediately (which means that the variable has been written), and I can trigger a WRITE callback. All the data required for the operation is contained in the ExceptionInformation array.
When someone tries to write to that variable:
*(int *)&g_test = 1;
a PRE_WRITE followed by a WRITE will be triggered.
When I do:
int x = *(int *)&g_test;
a READ will be issued.
In this way I can manipulate the data flow without modifying the original source code.
Note: This is intended to be used as part of a test framework and any penalty hit is deemed acceptable.
For example, a W1C (write 1 to clear) operation can be implemented like this:
void MYREG_hook(reg_cbk_t type)
{
    /** We need to save the pre-write state
     * This is safe since we are assured to be called with
     * both PRE_WRITE and WRITE in the correct order
     */
    static int pre;
    switch (type) {
    case REG_READ: /* Called pre-read */
        break;
    case REG_PRE_WRITE: /* Called pre-write */
        pre = g_test;
        break;
    case REG_WRITE: /* Called after write */
        g_test = pre & ~g_test; /* W1C */
        break;
    default:
        break;
    }
}
This was also possible with seg faults on illegal addresses, but I had to trigger one for each read/write and keep track of a "virtual register file", so the penalty was bigger. This way I can guard only specific areas of memory, or none, depending on the registered monitors.
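To make the bookkeeping concrete, here is a small OS-independent sketch in C of the list lookup and the PRE_WRITE/WRITE sequence described above. It only simulates what the two exceptions would do around a store and installs no handlers. The names reg_monitor, monitor_find, w1c_hook and simulate_write are hypothetical, and the definition of reg_cbk_t is assumed, since the post does not show it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Callback phases, named after the REG_* values used in MYREG_hook (definition assumed). */
typedef enum { REG_READ, REG_PRE_WRITE, REG_WRITE } reg_cbk_t;

typedef void (*reg_hook_fn)(reg_cbk_t type, volatile uint32_t *reg);

/* One monitored register: its address and the hook to run on access. */
struct reg_monitor {
    volatile uint32_t  *addr;
    reg_hook_fn         hook;
    struct reg_monitor *next;
};

/* Find the monitor registered for addr, or NULL if addr is not monitored. */
struct reg_monitor *monitor_find(struct reg_monitor *head, const volatile void *addr)
{
    for (; head != NULL; head = head->next)
        if ((const volatile void *)head->addr == addr)
            return head;
    return NULL;
}

/* W1C hook: bits written as 1 clear the corresponding bits of the old value. */
void w1c_hook(reg_cbk_t type, volatile uint32_t *reg)
{
    static uint32_t pre;  /* value saved at PRE_WRITE, as in MYREG_hook */

    assert(reg != NULL);
    if (type == REG_PRE_WRITE)
        pre = *reg;
    else if (type == REG_WRITE)
        *reg = pre & ~*reg;
}

/*
 * Stand-in for the handler pair: what the guard-page fault (PRE_WRITE) and
 * the following single-step trap (WRITE) would do around a store.
 * Returns 1 if a hook ran, 0 if the address is not monitored.
 */
int simulate_write(struct reg_monitor *head, volatile uint32_t *addr, uint32_t value)
{
    struct reg_monitor *m = monitor_find(head, addr);

    if (m != NULL)
        m->hook(REG_PRE_WRITE, addr);
    *addr = value;  /* the store performed by the code under test */
    if (m != NULL)
        m->hook(REG_WRITE, addr);
    return m != NULL;
}
```

For instance, with a monitored register holding 0xF0, writing 0x30 through simulate_write() leaves 0xC0, because 0xF0 & ~0x30 = 0xC0. A write to an unmonitored address is stored unchanged.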

Enabling floating point exceptions on MinGW GCC?

How does one enable floating point exceptions on MinGW GCC, where feenableexcept is missing? Even reasonably complete solutions don't actually catch this, though it would appear that they intend to. I would prefer minimal code that is close to whatever future standards will emerge. Preferably, the code should work with and without SSE. A complete solution that shows how to enable the hardware signal, catch it, and reset it is preferable. Compiling cleanly with high optimization levels and full pedantic errors and warnings is a must. The ability to catch multiple times in a unit-test scenario is important. There are several questions that provide partial answers.

This appears to work on my machine. Compile it in MinGW GCC with -fnon-call-exceptions. It isn't fully minimized yet.
#include <xmmintrin.h>
#include <cerrno>
#include <cfenv>
#include <cfloat> //or #include <float.h> // defines _controlfp_s
#include <cmath>
#include <csignal>
#include <cstdint>   // uint16_t
#include <exception> // std::exception
#include <iostream>  // std::cerr
#ifdef _WIN32
void feenableexcept(uint16_t fpflags){
    /*edit 2015-12-17, my attempt at ASM code was giving me
     *problems in more complicated scenarios, so I
     *switched to using _controlfp_s. I finally posted it here
     *because of the upvote to the ASM version.*/
    /*{// http://stackoverflow.com/questions/247053/
    uint16_t mask(FE_ALL_EXCEPT & ~fpflags);
    asm("fldcw %0" : : "m" (mask) : "cc");
    } //https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html */
    unsigned int new_word(0);
    if (fpflags & FE_INVALID) new_word |= _EM_INVALID;
    if (fpflags & FE_DIVBYZERO) new_word |= _EM_ZERODIVIDE;
    if (fpflags & FE_OVERFLOW) new_word |= _EM_OVERFLOW;
    unsigned int cw(0);
    _controlfp_s(&cw,~new_word,_MCW_EM); // a cleared mask bit means the exception traps
}
#endif
void fe_reset_traps(){
    std::feclearexcept(FE_ALL_EXCEPT); //clear x87 FE state
#ifdef __SSE__
    _MM_SET_EXCEPTION_STATE(0); // clear SSE FE state
#endif
    feenableexcept(FE_DIVBYZERO|FE_OVERFLOW|FE_INVALID); // set x87 FE mask
#ifdef __SSE__
    //set SSE FE mask (orientation of this command is different than the above)
    _MM_SET_EXCEPTION_MASK(_MM_MASK_DENORM |_MM_MASK_UNDERFLOW|_MM_MASK_INEXACT);
#endif
}
void sigfpe_handler(int sig){
    std::signal(sig,SIG_DFL); // restore the default action while handling
    std::cerr<<"A floating point exception was encountered. Exiting.\n";
    fe_reset_traps(); // in testing mode the throw may not exit, so reset traps
    std::signal(sig,&sigfpe_handler); // reinstall handler
    throw std::exception();
}
int main(){
    fe_reset_traps();
    std::signal(SIGFPE,&sigfpe_handler); // install handler
    std::cerr<<"before\n";
    std::cerr<<1.0/0.0<<"\n";
    std::cerr<<"should be unreachable\n";
}
I'm sure it's not perfect. Let's hear what everyone else has to contribute.

Limiting memory usage for a single process in OSX /Darwin

I am trying to modify some JNI code to limit the amount of memory that a process can consume. Here is the code that I am using to test setrlimit on Linux and OS X. On Linux it works as expected and buf is NULL.
This code sets the limit to 32 MB and then tries to malloc a 64 MB buffer; if the buffer is NULL, then setrlimit works.
#include <sys/time.h>
#include <sys/resource.h>
#include <stdio.h>
#include <time.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/resource.h>
#define errExit(msg) do { perror(msg); exit(EXIT_FAILURE); \
} while (0)
int main(int argc) {
    pid_t pid = getpid();
    struct rlimit current;
    struct rlimit *newp;
    int memLimit = 32 * 1024 * 1024;
    int result = getrlimit(RLIMIT_AS, &current);
    if (result != 0)
        errExit("Unable to get rlimit");
    current.rlim_cur = memLimit;
    current.rlim_max = memLimit;
    result = setrlimit(RLIMIT_AS, &current);
    if (result != 0)
        errExit("Unable to setrlimit");
    printf("Doing malloc \n");
    int memSize = 64 * 1024 * 1024;
    char *buf = malloc(memSize);
    if (buf == NULL) {
        printf("Your out of memory\n");
    } else {
        printf("Malloc successsful\n");
    }
    free(buf);
}
On linux machine this is my result
memtest]$ ./m200k
Doing malloc
Your out of memory
On osx 10.8
./m200k
Doing malloc
Malloc successsful
My question is: if this does not work on OS X, is there a way to accomplish this task in the Darwin kernel? The man pages all seem to say it will work, but it does not appear to do so. I have seen that launchctl has some support for limiting memory, but my goal is to add this ability in code. I also tried using ulimit, but this did not work either, and I am pretty sure ulimit uses setrlimit to set limits. Also, is there a signal I can catch when the setrlimit soft or hard limit is exceeded? I haven't been able to find one.
Bonus points if it can be accomplished in windows also.
Thanks for any advice
Update
As pointed out, RLIMIT_AS is explicitly described in the man page, but in the header RLIMIT_RSS is defined as an alias for it, so as far as the documentation is concerned RLIMIT_RSS and RLIMIT_AS are interchangeable on OS X.
/usr/include/sys/resource.h on osx 10.8
#define RLIMIT_RSS RLIMIT_AS /* source compatibility alias */
Tested trojanfoe's excellent suggestion to use RLIMIT_DATA which is described here
The RLIMIT_DATA limit specifies the maximum amount of bytes the process
data segment can occupy. The data segment for a process is the area in which
dynamic memory is located (that is, memory allocated by malloc() in C, or in C++,
with new()). If this limit is exceeded, calls to allocate new memory will fail.
The result was the same on Linux and OS X: the malloc succeeded on both.
chinshaw#osx$ ./m200k
Doing malloc
Malloc successsful
chinshaw#redhat ./m200k
Doing malloc
Malloc successsful

Problems doing syscall hooking

I use the following module code to hook a syscall (code credited to someone else, e.g., Linux Kernel: System call hooking example).
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/moduleparam.h>
#include <linux/unistd.h>
#include <asm/semaphore.h>
#include <asm/cacheflush.h>
void **sys_call_table;
asmlinkage int (*original_call) (const char*, int, int);
asmlinkage int our_sys_open(const char* file, int flags, int mode)
{
    printk(KERN_ALERT "A file was opened\n");
    return original_call(file, flags, mode);
}
int set_page_rw(long unsigned int _addr)
{
    struct page *pg;
    pgprot_t prot;
    pg = virt_to_page(_addr);
    prot.pgprot = VM_READ | VM_WRITE;
    return change_page_attr(pg, 1, prot);
}
int init_module()
{
    // sys_call_table address in System.map
    sys_call_table = (void*)0xffffffff804a1ba0;
    original_call = sys_call_table[1024];
    set_page_rw(sys_call_table);
    sys_call_table[1024] = our_sys_open;
    return 0;
}
void cleanup_module()
{
    // Restore the original call
    sys_call_table[1024] = original_call;
}
When I insmod the compiled .ko file, the terminal prints "Killed". Looking at the output of 'cat /proc/modules', I see the Loading status.
my_module 10512 1 - Loading 0xffffffff882e7000 (P)
As expected, I cannot rmmod this module, as it complains that it is in use. I rebooted the system to get a clean slate.
Later on, after commenting out two lines in the above source, sys_call_table[1024] = our_sys_open; and sys_call_table[1024] = original_call;, the module can be insmod'ed successfully. More interestingly, when I uncomment these two lines (changing back to the original code), the compiled module can also be insmod'ed successfully. I don't quite understand why this happens. Is there any way to compile the code and insmod it directly?
I did all this on Red Hat with Linux kernel 2.6.24.6.

I think you should take a look at the kprobes API, which is well documented in Documentation/kprobes.txt. It lets you install a handler on almost any address (e.g. a syscall entry) so that you can do what you want. An added bonus is that your code would be more portable.
If you're only interested in tracing those syscalls, you can use the audit subsystem and write your own userland daemon, which will receive events over a NETLINK socket from the audit kthread. libaudit provides a simple API to register/read events.
If you do have a good reason for not using kprobes/audit, I would suggest checking that the address you are trying to write to is not beyond the page that you made writable. A quick calculation shows that:
offset_in_sys_call_table * sizeof(*sys_call_table) = 1024 * 8 = 8192
which is two pages after the one you set writable if you are using 4K pages.
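The same arithmetic can be written out as a small C helper. This is a sketch only: pages_past_table_start is a made-up name, and the figures used with it (the table address from the question, 8-byte entries, 4 KiB pages) are the ones assumed in the calculation above.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Number of pages between the page holding the first byte of a table and
 * the page holding entry `index`. A result of 0 means the entry lies in
 * the same page as the start of the table.
 */
uint64_t pages_past_table_start(uint64_t table_addr, uint64_t index,
                                uint64_t entry_size, uint64_t page_size)
{
    uint64_t entry_addr;

    assert(page_size != 0);
    entry_addr = table_addr + index * entry_size;
    return entry_addr / page_size - table_addr / page_size;
}
```

With the address from the question, pages_past_table_start(0xffffffff804a1ba0ULL, 1024, 8, 4096) gives 2, so the entry being written lies two pages beyond the single page that set_page_rw() made writable.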
