Writing per-cpu variables physical address during kernel initialization - linux-kernel

I have the following code:
static DEFINE_PER_CPU_ALIGNED(cpu_clock_t, cpu_clock);
static void func(void *info)
{
uint64_t cpu_clock_pa = per_cpu_ptr_to_phys(get_cpu_ptr(&cpu_clock));
__asm__ __volatile__ ... //Giving the PA to VMware kernel which is supposed to write something to there
put_cpu_ptr(cpu_clock);
}
Problem is, when this code runs as part of the kernel initialization, I get a message in VMware workstation "The CPU is disabled on the guest operating system" which means some kernel panic occurred and when I use the same code after the kernel boots (Call it as part of a module initialization) it works fine...

My code was running before setup_per_cpu_areas as a3f pointed out.

Related

How can I check whether a kernel address belongs to the Linux kernel executable, and not just the core kernel text?

Coming from the Windows world, I assume that Vmlinuz is equivalent to ntoskrnl.exe, and this is the kernel executable that gets mapped in Kernel memory.
Now I want to figure out whether an address inside kernel belongs to the kernel executable or not. Is using core_kernel_text the correct way of finding this out?
Because core_kernel_text doesn't return true for some of the addresses that clearly should belong to Linux kernel executable.
For example the core_kernel_text doesn't return true when i give it the syscall entry handler address which can be found with the following code:
unsigned long system_call_entry;
rdmsrl(MSR_LSTAR, system_call_entry);
return (void *)system_call_entry;
And when I use this code snippet, the address of the syscall entry handler doesn't belong to the core kernel text or to any kernel module (using get_module_from_addr).
So how can an address for a handler that clearly belongs to Linux kernel executable such as syscall entry, don't belong to neither the core kernel or any kernel module? Then what does it belong to?
Which API do I need to use for these type of addresses that clearly belong to Linux kernel executable to assure me that the address indeed belongs to kernel?
I need such an API because I need to write a detection for malicious kernel modules that patch such handlers, and for now I need to make sure the address belongs to kernel, and not some third party kernel module or random kernel address. (Please do not discuss methods that can be used to bypass my detection, obviously it can be bypassed but that's another story)
The target kernel version is 4.15.0-112-generic, and is Ubuntu 16.04 as a VMware guest.
Reproducible code as requested:
typedef int(*core_kernel_text_t)(unsigned long addr);
core_kernel_text_t core_kernel_text_;
core_kernel_text_ = (core_kernel_text_t)kallsyms_lookup_name("core_kernel_text");
unsigned long system_call_entry;
rdmsrl(MSR_LSTAR, system_call_entry);
int isInsideCoreKernel = core_kernel_text_((unsigned long)system_call_entry);
printk("%d , 0x%pK ", isInsideCoreKernel, system_call_entry);
EDIT1: So in the MSR_LSTAR example that I gave above, it turns out that It's related to Kernel Page Table Isolation and CONFIG_RETPOLINE=y in config:
system_call value is different each time when I use rdmsrl(MSR_LSTAR, system_call)
And that's why I am getting the address 0xfffffe0000006000 aka SYSCALL64_entry_trampoline, the same as the question above.
So now the question remains, why this SYSCALL64_entry_trampoline address doesn't belong to anything? It doesn't belong to any kernel module, and it doesn't belong to the core kernel, so which executable this address belongs to and how can I check that with an API similar to core_kernel_text? It seems like it belongs to cpu_entry_area, but what is that and how can I check if an address belongs to that?
You are seeing this "weird" address in MSR_LSTAR (IA32_LSTAR) because of Kernel Page-Table Isolation (KPTI), which mitigates Meltdown. As other existing answers(1) you already found point out, the address you see is the one of a small trampoline (entry_SYSCALL_64_trampoline) that is dynamically remapped at boot time by the kernel for each CPU, and thus does not have an address within the kernel text.
(1)By the way, the answer linked above wrongly states that the corresponding config option for KPTI is CONFIG_RETPOLINE=y. This is wrong, the "retpoline" is a mitigation for Spectre, not Meltdown. The config to enable KPTI is CONFIG_PAGE_TABLE_ISOLATION=y.
You don't have many options. Either:
Tell VMWare to emulate a recent CPU that is not vulnerable to Meltdown.
Detect and implement support for the KPTI trampoline.
You can implement support for this by detecting whether the kernel supports KPTI (CONFIG_PAGE_TABLE_ISOLATION), and if so check whether current CPU has KPTI enabled. The code at kernel/cpu/bugs.c that provides information for /sys/devices/system/cpu/vulnerabilities/meltdown shows how this can be detected:
ssize_t cpu_show_meltdown(struct device *dev,
struct device_attribute *attr, char *buf)
{
if (!boot_cpu_has_bug(X86_BUG_CPU_MELTDOWN))
return sprintf(buf, "Not affected\n");
if (boot_cpu_has(X86_FEATURE_PTI))
return sprintf(buf, "Mitigation: PTI\n");
return sprintf(buf, "Vulnerable\n");
}
The actual trampoline is set up at boot and its address is stored in each CPU's "entry area" for later use (e.g. here when setting up IA32_LSTAR). This answer on Unix & Linux SE explains the purpose of the cpu entry area and its relation to KPTI.
In your module you can do the following detection:
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/init.h>
#include <linux/kallsyms.h>
#include <asm/msr-index.h>
#include <asm/msr.h>
#include <asm/cpufeature.h>
#include <asm/cpu_entry_area.h>
// ...
typedef int(*core_kernel_text_t)(unsigned long addr);
core_kernel_text_t core_kernel_text_;
bool syscall_entry_64_ok(void)
{
unsigned long entry;
rdmsrl(MSR_LSTAR, entry);
if (core_kernel_text_(entry))
return true;
#ifdef CONFIG_PAGE_TABLE_ISOLATION
if (this_cpu_has(X86_FEATURE_PTI)) {
int cpu = smp_processor_id();
unsigned long trampoline = (unsigned long)get_cpu_entry_area(cpu)->entry_trampoline;
if ((entry & PAGE_MASK) == trampoline)
return true;
}
#endif
return false;
}
static int __init modinit(void)
{
core_kernel_text_ = (core_kernel_text_t)kallsyms_lookup_name("core_kernel_text");
if (!core_kernel_text_)
return -EOPNOTSUPP;
pr_info("syscall_entry_64_ok() -> %d\n", syscall_entry_64_ok());
return 0;
}

How to read/write LAPIC Registers from Kernel Module?

I'm trying to disable all Interrupts. Most of them are easy, but I have problems with the Non-Maskable Interrupts (NMIs).
To disable them, I want to manipulate the LVT Registers in the Local APIC.
Currently I am testing inside a Kernel Module, cause that's the Environment, the final code should run.
How can I read/write to the memory-mapped registers of the APIC?
I've already read many articles and everyone suggested this procedure.
I also tried to directly access the *mapped pointer, which resolves in the same result.
Instead of the foo() Function I implemented a lookup for the correct address. But according to the Intel manual and my personal inspections, the APIC always get's mapped to the physical address 0xFEE00000, which is interesting, cause I also tried the program on a Virtual Machine with 2 GB RAM.
phys_addr_t apic_base_phys = foo(); // fee00000
void __iomem *mapped = ioremap(apic_base_phys + 0x20, 0x4);
if(mapped == NULL){
printk(KERN_INFO "nullpointer\n");
} else {
uint32_t value = ioread32(mapped);
printk(KERN_INFO "Value: %x\n", value); // 0xffffffff
}
iounmap(mapped);
Output:
[ 1329.743182] apic_base_phys: fee00000
[ 1329.743198] Value: ffffffff
Address 0xFEE00020 should output the Local APIC ID, which probably not is 0xFFFFFFFF.
I also tried to read 0xFEE00030 which should output the LAPIC Version.
Got the solution by myself: On my System runs the newer x2APIC. This uses a different transfer mode.
This can be disabled by adding nox2apic to the boot options.

About U-boot driver model

I have a simple question regarding U-boot driver model. I wanted to know when and how function ops of a driver is triggered.
For example for Ethernet driver these are the ops defined:
static const struct eth_ops designware_eth_ops = {
.start = designware_eth_start,
.send = designware_eth_send,
.recv = designware_eth_recv,
.free_pkt = designware_eth_free_pkt,
.stop = designware_eth_stop,
.write_hwaddr = designware_eth_write_hwaddr,
};
Now , are these eth_ops are called at initialization stage after probe function or these are called only when some commands are run from u-boot prompt like ping , tftp etc?
Initialization stage would only probe the device and move it next subsystem ?
It depends on u-boot settings. If the bootcmd and bootargs environment variables define something related to the network like loading the kernel from tftp server it will first call the start callback and on sending and receiving the send/rec callbacks. If the kernel is loaded from flash u-boot networking is not required and if you're not using network commands on uboot shell no callback is called
Uboot drivers model is very similar to Linux model and actually there are a lot of common code between them. The only "big" difference is that uboot uses physical addressing and Linux uses MMU to convert physical to virtual address space

Issue in updating task_struct in linux kernel scheduler

I want to insert a variable into struct sched_entity which is a part of task_struct in sched.h. In Linux Kernel scheduler
struct sched_entity {
..
int my_var;
..
}
This code compiles fine but when I flashed the code into the device.
When i run it, it does not booted up. On debugging, i found that i got stuck in some core idle case. I think it is due to some strict memory barrier.
Please help me, how can i insert a variable in sched_entity or task_struct.
Thanks in advance.

Help with APIC functions in Linux

I'm trying to play around with the local APIC functions in the 2.6.32.40 linux kernel, but I am having some issues. I want to try to send a Non-Maskable Interrupts (NMI) to all of the processors on my system (I am using a Intel i7 Q740). First I read the documentation in Intel's Software Developer's Manual Volume 3 related to the APIC functions. It states that interrupts can be broadcast to all processors through the use of the Interrupt Command Register (ICR) located at address 0xFEE00300. So I wrote a kernel module with the following init function to try to write to this register:
#include <linux/init.h>
#include <linux/module.h>
#include <linux/fs.h>
MODULE_LICENSE("GPL");
#define SUCCESS 0
#define ICR_ADDRESS 0xFEE00300
#define ICR_PROGRAM 0x000C4C89
static int icr_init(void){
int * ICR = (int *)ICR_ADDRESS;
printk(KERN_ALERT "Programing ICR\n");
*ICR = ICR_PROGRAM;
return SUCCESS;
}
static void icr_exit(void){
printk(KERN_ALERT "Removing ICR Programing module removed");
}
module_init(icr_init);
module_exit(icr_exit);
However, when I insmod this module the kernel crashes and complains about being unable to handle the paging request # address 00000000fee00300. Looking under /proc/iomem I see that this address is in a ranged marked as "reserved"
fee00000-fee00fff : reserved
I've also tried using the functions under :
static inline void __default_local_send_IPI_allbutself(int vector)
but the kernel is still throwing "unable to handle paging request" messages and crashing. Does anyone have any suggestions? Why is this memory range marked as "reserved" and not marked as being used by the local APIC? Thanks in advance.
The APIC address is a physical memory address, but you are trying to access it as a linear memory address - that's why your first approach doesn't work. The memory is marked as "reserved" precisely because it belongs to the APIC, rather than real memory.
You should use the internal kernel functions. To do so, you should include <asm/apic.h> and use:
apic->send_IPI_allbutself(vector);

Resources