I use the following module code to hook a syscall (the code is credited to someone else; see, e.g., "Linux Kernel: System call hooking example").
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/moduleparam.h>
#include <linux/unistd.h>
#include <asm/semaphore.h>
#include <asm/cacheflush.h>
void **sys_call_table;
asmlinkage int (*original_call) (const char*, int, int);
asmlinkage int our_sys_open(const char* file, int flags, int mode)
{
printk(KERN_ALERT "A file was opened\n");
return original_call(file, flags, mode);
}
int set_page_rw(long unsigned int _addr)
{
struct page *pg;
pgprot_t prot;
pg = virt_to_page(_addr);
prot.pgprot = VM_READ | VM_WRITE;
return change_page_attr(pg, 1, prot);
}
int init_module()
{
// sys_call_table address in System.map
sys_call_table = (void*)0xffffffff804a1ba0;
original_call = sys_call_table[1024];
set_page_rw(sys_call_table);
sys_call_table[1024] = our_sys_open;
return 0;
}
void cleanup_module()
{
// Restore the original call
sys_call_table[1024] = original_call;
}
When I insmod the compiled .ko file, the terminal prints "Killed". Looking at /proc/modules (cat /proc/modules), I see the module stuck in the Loading state:
my_module 10512 1 - Loading 0xffffffff882e7000 (P)
As expected, I cannot rmmod this module, because it complains that the module is in use. I rebooted the system to get back to a clean state.
Later, after commenting out two lines in the source above, sys_call_table[1024] = our_sys_open; and sys_call_table[1024] = original_call;, the module loads with insmod successfully. More interestingly, after uncommenting these two lines (that is, going back to the original code), the compiled module also loads successfully. I don't understand why this happens. Is there a way to compile the code and insmod it successfully on the first attempt?
I did all this on Red Hat with Linux kernel 2.6.24.6.
I think you should take a look at the kprobes API, which is well documented in Documentation/kprobes.txt. It lets you install a handler on almost any kernel address (e.g. a syscall entry) so that you can do what you want. As an added bonus, your code would be more portable.
If you're only interested in tracing those syscalls, you can use the audit subsystem and write your own userland daemon that receives events on a NETLINK socket from the audit kthread. libaudit provides a simple API to register for and read events.
If you do have a good reason for not using kprobes or audit, I would suggest checking that the address you are trying to write to is not beyond the page that you made writable. A quick calculation shows that:
offset_in_sys_call_table * sizeof(*sys_call_table) = 1024 * 8 = 8192
which is two pages past the one you made writable, if you are using 4K pages.
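The calculation above can be checked with a small userland sketch. The helper name pages_past_base is hypothetical, and the 8-byte entry size and 4K page size are the same assumptions the calculation makes; the base address is the one from the question.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of pages between the page holding the table base and the page
 * holding entry `index`. page_size must be a non-zero power of two. */
static size_t pages_past_base(uint64_t table_base, size_t index,
                              size_t entry_size, size_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);

    uint64_t entry_addr = table_base + (uint64_t)index * entry_size;
    return (size_t)(entry_addr / page_size - table_base / page_size);
}
```

For the question's table, pages_past_base(0xffffffff804a1ba0ULL, 1024, 8, 4096) evaluates to 2, so making only the first page writable does not cover entry 1024.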
I am using Yocto to build an SD Card image for my Embedded Linux Project. The Yocto branch is Warrior and the Linux kernel version is 4.19.78-linux4sam-6.2.
I am currently working on a way to read memory from an external QSPI device in the initramfs and stick the contents into a file in procfs. That part works and I echo data into the proc file and read it out successfully later in user space Linux after the board has booted.
Now I need to use the Linux kernel's EXPORT_SYMBOL() functionality to let an in-tree kernel module use a symbol exported by my custom out-of-tree kernel module.
In my custom module, I do this:
static unsigned char lan9730_mac_address_buffer[6];
EXPORT_SYMBOL(lan9730_mac_address_buffer);
And I patched the official kernel build in a bitbake bbappend file with this:
diff -Naur kernel-source/drivers/net/usb/smsc95xx.c kernel-source.new/drivers/net/usb/smsc95xx.c
--- kernel-source/drivers/net/usb/smsc95xx.c 2020-08-04 22:34:02.767157368 +0000
+++ kernel-source.new/drivers/net/usb/smsc95xx.c 2020-08-04 23:34:27.528435689 +0000
@@ -917,6 +917,27 @@
{
const u8 *mac_addr;
+ printk("=== smsc95xx_init_mac_address ===\n");
+ printk("%x:%x:%x:%x:%x:%x\n",
+ lan9730_mac_address_buffer[0],
+ lan9730_mac_address_buffer[1],
+ lan9730_mac_address_buffer[2],
+ lan9730_mac_address_buffer[3],
+ lan9730_mac_address_buffer[4],
+ lan9730_mac_address_buffer[5]);
+ printk("=== mac_addr is set ===\n");
+ if (lan9730_mac_address_buffer[0] != 0xff &&
+ lan9730_mac_address_buffer[1] != 0xff &&
+ lan9730_mac_address_buffer[2] != 0xff &&
+ lan9730_mac_address_buffer[3] != 0xff &&
+ lan9730_mac_address_buffer[4] != 0xff &&
+ lan9730_mac_address_buffer[5] != 0xff) {
+ printk("=== SUCCESS ===\n");
+ memcpy(dev->net->dev_addr, lan9730_mac_address_buffer, ETH_ALEN);
+ return;
+ }
+ printk("=== FAILURE ===\n");
+
/* maybe the boot loader passed the MAC address in devicetree */
mac_addr = of_get_mac_address(dev->udev->dev.of_node);
if (!IS_ERR(mac_addr)) {
diff -Naur kernel-source/drivers/net/usb/smsc95xx.h kernel-source.new/drivers/net/usb/smsc95xx.h
--- kernel-source/drivers/net/usb/smsc95xx.h 2020-08-04 22:32:30.824951447 +0000
+++ kernel-source.new/drivers/net/usb/smsc95xx.h 2020-08-04 23:33:50.486778978 +0000
@@ -361,4 +361,6 @@
#define INT_ENP_TDFO_ ((u32)BIT(12)) /* TX FIFO Overrun */
#define INT_ENP_RXDF_ ((u32)BIT(11)) /* RX Dropped Frame */
+extern unsigned char lan9730_mac_address_buffer[6];
+
#endif /* _SMSC95XX_H */
However, the problem is that the kernel fails to build with this error:
| GEN ./Makefile
| Using /home/me/Desktop/poky/build-microchip/tmp/work-shared/sama5d27-som1-ek-sd/kernel-source as source for kernel
| CALL /home/me/Desktop/poky/build-microchip/tmp/work-shared/sama5d27-som1-ek-sd/kernel-source/scripts/checksyscalls.sh
| Building modules, stage 2.
| MODPOST 279 modules
| ERROR: "lan9730_mac_address_buffer" [drivers/net/usb/smsc95xx.ko] undefined!
How can I refer to a symbol exported by an out-of-tree kernel module from a patched in-tree kernel module?
initramfs relevant code:
msg "Inserting lan9730-mac-address.ko..."
insmod /mnt/lib/modules/4.19.78-linux4sam-6.2/extra/lan9730-mac-address.ko
ls -rlt /proc/lan9730-mac-address
head -c 6 /dev/mtdblock0 > /proc/lan9730-mac-address
Out-Of-Tree module:
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/proc_fs.h>
#include <linux/sched.h>
#include <linux/uaccess.h>
#include <linux/slab.h>
const int BUFFER_SIZE = 6;
int write_length, read_length;
unsigned char lan9730_mac_address_buffer[6];
EXPORT_SYMBOL(lan9730_mac_address_buffer);
int read_proc(struct file *filp, char *buf, size_t count, loff_t *offp)
{
// Read bytes (returning the byte count) until all bytes are read.
// Then return count=0 to signal the end of the operation.
if (count > read_length)
count = read_length;
read_length = read_length - count;
copy_to_user(buf, lan9730_mac_address_buffer, count);
if (count == 0)
read_length = write_length;
return count;
}
int write_proc(struct file *filp, const char *buf, size_t count, loff_t *offp)
{
if (count > BUFFER_SIZE)
count = BUFFER_SIZE;
copy_from_user(lan9730_mac_address_buffer, buf, count);
write_length = count;
read_length = count;
return count;
}
struct file_operations proc_fops = {
read: read_proc,
write: write_proc
};
void create_new_proc_entry(void) //use of void for no arguments is compulsory now
{
proc_create("lan9730-mac-address", 0, NULL, &proc_fops);
}
int proc_init (void) {
create_new_proc_entry();
memset(lan9730_mac_address_buffer, 0x00, sizeof(lan9730_mac_address_buffer));
return 0;
}
void proc_cleanup(void) {
remove_proc_entry("lan9730-mac-address", NULL);
}
MODULE_LICENSE("GPL");
module_init(proc_init);
module_exit(proc_cleanup);
There are several ways to achieve what you want (taking into account different aspects, such as whether the module is built into the kernel or built as a loadable module).
Convert the out-of-tree module into an in-tree one (in your custom kernel build). This requires only the simple export and import that you have basically already done, and nothing special beyond that, except perhaps providing a header that declares the symbol and running depmod -a after module installation. Note that you then have to load the in-tree module with modprobe, which reads and satisfies dependencies.
Turn it the other way around, i.e. export the symbol from the in-tree module and fill it in from the out-of-tree one. In this case you simply have to check whether it has been filled in or not (since it's a MAC address, a check against all zeros will work; no additional flags are needed).
BUT these ways are simply wrong. The driver, and even your patch, clearly show that it supports OF (Device Tree), and your board supports it too. So this is the first part of the solution: you can provide the correct MAC address to the network card using the Device Tree.
If you want to change it at runtime, the procfs approach is very strange to begin with. The network device interface in Linux has all the means to update the MAC address from user space at any time the user wants. Just use the ip command, like /sbin/ip link set <$ETH> addr <$MACADDR>, where <$ETH> is a network interface, for example eth0, and <$MACADDR> is the desired address to set.
So, if this question is really about module symbols, you need to find a better example for it, because it really depends on the use case. In that case you may want to read "How to export symbol from Linux kernel module in this case?" for an alternative way of exporting. Another way to do it right is to use software nodes (a new concept in recent Linux kernels).
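The all-zero check mentioned for the second approach could look roughly like the userland sketch below. The names MAC_LEN and mac_is_unset are hypothetical; inside the kernel, the existing helper is_zero_ether_addr() from <linux/etherdevice.h> performs the same test.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAC_LEN 6 /* same length as ETH_ALEN in the kernel */

/* Returns true when every byte of the address is zero, i.e. the
 * buffer has not been filled in yet. */
static bool mac_is_unset(const unsigned char mac[MAC_LEN])
{
    unsigned char acc = 0;

    for (size_t i = 0; i < MAC_LEN; i++)
        acc |= mac[i];
    return acc == 0;
}
```

Since the out-of-tree module in the question initialises its buffer with 0x00, a zero test of this kind matches that initial state, whereas the comparison against 0xff in the patch does not.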
I have two kernel modules: the first module exports one function, and the second module uses this function to read SPI data. A sample program is given below.
Module-1:
int spi_fun(uint8_t *tx_buf, uint8_t *rx_buf,int len)
{
spi_sync_txrx(tx_buf,rx_buf,len);
}
Module-2:
void dummy_fun()
{
uint8_t tx[4]={0};
uint8_t rx[4]={0};
spi_fun(tx,rx,4);
}
The scenario above works fine. However, if I declare a local rx buffer (spi_data[4]) inside spi_fun() and use memcpy to copy its contents to rx_buf, the kernel crashes with the error shown below.
New Module-1 function:
int spi_fun(uint8_t *tx_buf, uint8_t *rx_buf,int len)
{
uint8_t spi_data[4];
spi_sync_txrx(tx_buf,spi_data,len);
memcpy(rx_buf, spi_data, len); //here error
}
Kernel Error:
Internal error: Accessing user space memory outside uaccess.h routines: 96000045 [#1] PREEMPT SMP
I have tried the copy_from_user/copy_to_user functions, but then the target buffer contained only zeroes.
Has anyone experienced this issue?
So my problem is as follows.
I have some platform-dependent code (embedded system) which writes to MMIO locations that are hardcoded at specific addresses.
I compile this code, together with some management code, into a standard executable (mainly for testing) and also for simulation (because it takes longer to find basic bugs on the actual HW platform).
To get around the hardcoded pointers, I just redefine them to point at variables inside a memory pool. This works really well.
The problem is that some of the MMIO locations have specific hardware behavior (W1C, for example), which makes "correct" testing hard to impossible.
These are the solutions I thought of:
1 - Somehow redefine the accesses to those registers and insert an immediate function call to simulate the dynamic behavior. This is not really usable, since there are various ways to write to the MMIO locations (pointers and so on).
2 - Somehow leave the addresses hardcoded and trap the illegal access through a segfault, find the location that triggered it, extract exactly where the access was made, handle it and return. I am not really sure how this would work (or even whether it's possible).
3 - Use some sort of emulation. This will surely work, but it defeats the whole purpose of running fast and natively on a standard computer.
4 - Virtualization? This would probably take a lot of time to implement. I am not really sure the gain is justifiable.
Does anyone have an idea whether this can be accomplished without going too deep? Maybe there is a way to make the compiler define a memory area for which every access generates a callback. I am not really an expert in x86/gcc stuff.
Edit: It seems that it's not really possible to do this in a platform-independent way, and since it will only run on Windows, I will use the available API (which seems to work as expected). I found this question here:
Is set single step trap available on win 7?
I will put the whole "simulated" register file inside a number of pages, guard them, and trigger a callback from which I will extract all the necessary info, do my stuff, then continue execution.
Thanks all for responding.
I think #2 is the best approach. I routinely use approach #4, but I use it to test code that is running in the kernel, so I need a layer below the kernel to trap and emulate the accesses. Since you have already put your code into a user-mode application, #2 should be simpler.
The answers to this question may provide help in implementing #2. How to write a signal handler to catch SIGSEGV?
What you really want to do, though, is to emulate the memory access and then have the segv handler return to the instruction after the access. This sample code works on Linux. I'm not sure if the behavior it is taking advantage of is undefined, though.
#include <stdint.h>
#include <stdio.h>
#include <signal.h>
#define REG_ADDR ((volatile uint32_t *)0x12340000f000ULL)
static uint32_t read_reg(volatile uint32_t *reg_addr)
{
uint32_t r;
asm("mov (%1), %0" : "=a"(r) : "r"(reg_addr));
return r;
}
static void segv_handler(int, siginfo_t *, void *);
int main()
{
struct sigaction action = { 0, };
action.sa_sigaction = segv_handler;
action.sa_flags = SA_SIGINFO;
sigaction(SIGSEGV, &action, NULL);
// force sigsegv
uint32_t a = read_reg(REG_ADDR);
printf("after segv, a = %d\n", a);
return 0;
}
static void segv_handler(int, siginfo_t *info, void *ucontext_arg)
{
ucontext_t *ucontext = static_cast<ucontext_t *>(ucontext_arg);
ucontext->uc_mcontext.gregs[REG_RAX] = 1234;
ucontext->uc_mcontext.gregs[REG_RIP] += 2;
}
The code to read the register is written in assembly to ensure that both the destination register and the length of the instruction are known.
This is what a Windows version of prl's answer could look like:
#include <stdint.h>
#include <stdio.h>
#include <windows.h>
#define REG_ADDR ((volatile uint32_t *)0x12340000f000ULL)
static uint32_t read_reg(volatile uint32_t *reg_addr)
{
uint32_t r;
asm("mov (%1), %0" : "=a"(r) : "r"(reg_addr));
return r;
}
static LONG WINAPI segv_handler(EXCEPTION_POINTERS *);
int main()
{
SetUnhandledExceptionFilter(segv_handler);
// force sigsegv
uint32_t a = read_reg(REG_ADDR);
printf("after segv, a = %d\n", a);
return 0;
}
static LONG WINAPI segv_handler(EXCEPTION_POINTERS *ep)
{
// only handle read access violation of REG_ADDR
if (ep->ExceptionRecord->ExceptionCode != EXCEPTION_ACCESS_VIOLATION ||
ep->ExceptionRecord->ExceptionInformation[0] != 0 ||
ep->ExceptionRecord->ExceptionInformation[1] != (ULONG_PTR)REG_ADDR)
return EXCEPTION_CONTINUE_SEARCH;
ep->ContextRecord->Rax = 1234;
ep->ContextRecord->Rip += 2;
return EXCEPTION_CONTINUE_EXECUTION;
}
So, the solution (code snippet) is as follows:
First of all, I have a variable:
__attribute__ ((aligned (4096))) int g_test;
Second, inside my main function, I do the following:
AddVectoredExceptionHandler(1, VectoredHandler);
DWORD old;
VirtualProtect(&g_test, 4096, PAGE_READWRITE | PAGE_GUARD, &old);
The handler looks like this:
LONG WINAPI VectoredHandler(struct _EXCEPTION_POINTERS *ExceptionInfo)
{
static DWORD last_addr;
if (ExceptionInfo->ExceptionRecord->ExceptionCode == STATUS_GUARD_PAGE_VIOLATION) {
last_addr = ExceptionInfo->ExceptionRecord->ExceptionInformation[1];
ExceptionInfo->ContextRecord->EFlags |= 0x100; /* Single step to trigger the next one */
return EXCEPTION_CONTINUE_EXECUTION;
}
if (ExceptionInfo->ExceptionRecord->ExceptionCode == STATUS_SINGLE_STEP) {
DWORD old;
VirtualProtect((PVOID)(last_addr & ~PAGE_MASK), 4096, PAGE_READWRITE | PAGE_GUARD, &old);
return EXCEPTION_CONTINUE_EXECUTION;
}
return EXCEPTION_CONTINUE_SEARCH;
}
This is only a basic skeleton of the functionality. Basically, I guard the page on which the variable resides, and I keep linked lists holding the callback function pointers and values for each monitored address. I check that the faulting address is in my list, then I trigger the callback.
On the first guard hit, the system removes the page protection, but I can call my PRE_WRITE callback, where I can save the variable's state. Because a single step is requested through EFlags, the guard hit is followed immediately by a single-step exception (which means that the variable has been written), and I can trigger a WRITE callback. All the data required for the operation is contained in the ExceptionInformation array.
When someone tries to write to that variable:
*(int *)&g_test = 1;
A PRE_WRITE followed by a WRITE will be triggered.
When I do:
int x = *(int *)&g_test;
A READ will be issued.
In this way I can manipulate the data flow without requiring any modification of the original source code.
Note: This is intended to be used as part of a test framework and any penalty hit is deemed acceptable.
For example, a W1C (write 1 to clear) operation can be implemented like this:
void MYREG_hook(reg_cbk_t type)
{
/** We need to save the pre-write state
* This is safe since we are assured to be called with
* both PRE_WRITE and WRITE in the correct order
*/
static int pre;
switch (type) {
case REG_READ: /* Called pre-read */
break;
case REG_PRE_WRITE: /* Called pre-write */
pre = g_test;
break;
case REG_WRITE: /* Called after write */
g_test = pre & ~g_test; /* W1C */
break;
default:
break;
}
}
This was also possible with segfaults on illegal addresses, but I had to trigger one for each read/write and keep track of a "virtual register file", which meant a bigger performance penalty. With the guard-page approach I can guard only specific areas of memory, or none at all, depending on the registered monitors.
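The W1C update performed in the REG_WRITE case of the hook can be checked in isolation. This is a minimal sketch; the function name w1c_after_write is hypothetical and stands for the expression pre & ~g_test, where g_test holds the value that was just written.

```c
#include <stdint.h>

/* Value a write-1-to-clear register holds after `written` is stored:
 * every bit set in `written` is cleared, all other bits keep their
 * previous state. */
static uint32_t w1c_after_write(uint32_t before, uint32_t written)
{
    return before & ~written;
}
```

For example, writing 0x30 to a register that holds 0xF0 leaves 0xC0, and writing 0 leaves the register unchanged.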
I'm trying to make a kernel module to enable FOP compatibility mode for x87 FPU. This is done via setting bit 2 in IA32_MISC_ENABLE MSR. Here's the code:
#include <linux/init.h>
#include <linux/module.h>
#include <linux/kernel.h>
#include <asm/msr-index.h>
#include <asm/msr.h>
MODULE_LICENSE("GPL");
MODULE_AUTHOR("10110111");
MODULE_DESCRIPTION("Module to enable FOPcode compatibility mode");
MODULE_VERSION("0.1");
static int __init fopCompat_init(void)
{
unsigned long long misc_enable=native_read_msr(MSR_IA32_MISC_ENABLE);
printk(KERN_INFO "Before trying to set FOP_COMPAT, IA32_MISC_ENABLE=%llx,"
" i.e. FOP_COMPAT is %senabled\n"
,misc_enable,misc_enable&MSR_IA32_MISC_ENABLE_X87_COMPAT?"":"NOT ");
wrmsrl(MSR_IA32_MISC_ENABLE,misc_enable|MSR_IA32_MISC_ENABLE_X87_COMPAT);
misc_enable=native_read_msr(MSR_IA32_MISC_ENABLE);
printk(KERN_INFO "Tried to set FOP_COMPAT. Result: IA32_MISC_ENABLE=%llx,"
" i.e. FOP_COMPAT is now %senabled\n"
,misc_enable,misc_enable&MSR_IA32_MISC_ENABLE_X87_COMPAT?"":"NOT ");
return 0;
}
static void __exit fopCompat_exit(void)
{
const unsigned long long misc_enable=native_read_msr(MSR_IA32_MISC_ENABLE);
printk(KERN_INFO "Quitting FOP-compat with IA32_MISC_ENABLE=%llx\n",misc_enable);
if(!(misc_enable & MSR_IA32_MISC_ENABLE_X87_COMPAT))
printk(KERN_INFO "NOTE: seems some CPUs still have to be set up, "
"or compatibility mode will work inconsistently\n");
printk(KERN_INFO "\n");
}
module_init(fopCompat_init);
module_exit(fopCompat_exit);
It seems to work, but over multiple insmod/rmmod cycles I sometimes get dmesg output saying that compatibility mode still wasn't enabled, although it was enabled immediately after doing wrmsr. After some thinking I realized that this is because the module code was executed on different logical CPUs (I have a Core i7 with 4 cores * HT = 8 logical CPUs), so I had a 1/8 chance of getting the "enabled" print on rmmod. After repeating the cycle about 20 times I got consistent "enabled" prints, and my userspace application happily works with it.
So now my question is: how can I make my code execute on all logical CPUs present on the system, so as to enable compatibility mode for all of them?
To execute code on every CPU, use the on_each_cpu function.
Signature:
int on_each_cpu(void (*func) (void *info), void *info, int wait)
Description:
Call a function on all processors.
If the wait parameter is non-zero, it waits for the function to complete on all CPUs.
The function func shouldn't sleep, and the on_each_cpu() call as a whole shouldn't be made from atomic context.
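Why a single wrmsr is not enough can be illustrated with a small userland model. This is not kernel code: the per-CPU array, enable_compat_on, model_on_each_cpu and cpus_with_compat are all hypothetical names, and the model only assumes what the question states (8 logical CPUs, each with its own IA32_MISC_ENABLE, compatibility controlled by bit 2). In a real module the callback would instead have the void (*)(void *info) signature shown above and be passed to on_each_cpu().

```c
#include <assert.h>
#include <stdint.h>

#define NUM_CPUS 8                  /* 4 cores x 2 threads, as in the question */
#define X87_COMPAT_BIT (1ULL << 2)  /* bit 2 of IA32_MISC_ENABLE */

/* Each logical CPU has its own copy of the register. */
static uint64_t misc_enable[NUM_CPUS];

/* Models wrmsr executed on one CPU: only that CPU's copy changes. */
static void enable_compat_on(int cpu)
{
    assert(cpu >= 0 && cpu < NUM_CPUS);
    misc_enable[cpu] |= X87_COMPAT_BIT;
}

/* Models on_each_cpu(): run the callback once per logical CPU. */
static void model_on_each_cpu(void (*func)(int cpu))
{
    for (int cpu = 0; cpu < NUM_CPUS; cpu++)
        func(cpu);
}

/* Number of CPUs whose copy of the register has the bit set. */
static int cpus_with_compat(void)
{
    int n = 0;

    for (int cpu = 0; cpu < NUM_CPUS; cpu++)
        if (misc_enable[cpu] & X87_COMPAT_BIT)
            n++;
    return n;
}
```

In this model, calling enable_compat_on() for one CPU leaves the other seven unchanged, which corresponds to the inconsistent prints in the question, while model_on_each_cpu(enable_compat_on) sets the bit in every copy.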
I encountered "SYS#0" at the top of a stack and cannot find any documentation as to what that means.
Compiler: g++
OS: Solaris 9
Arch: SPARC
Memory Manager libhoard_32.so from Hoard 3.5.1
We used "gcore" to generate a core file. Looking at the output of running the "pstack" command against the core file, the only thread that was doing anything interesting had the following at the very top of its call stack:
ff309858 SYS#0 ()
ff309848 void MyHashMap<const void*,unsigned,AlignedMmapInstance<65536U>::SourceHeap>::set(const void*,unsigned) (ff31eed4, 9bf20000, 10000, 40, 9bf1fff0, ff31e738) + 134
...
pflags for that LWP shows:
/8: flags = PR_STOPPED|PR_ISTOP|PR_ASLEEP
why = PR_REQUESTED
sigmask = 0xfffffeff,0x00003fff
I could not find any mention of this syntax in the Sun documentation.
Edit: The process appears to have hung sometime prior to doing the gcore. Is "SYS#0" somehow interrelated with process hangs?
Edit: Added next stack frame and link to Hoard, pflags output
Edit: The accepted answer is correct. In addition, at least on SPARC, the g1 register should contain the system call number, but this did not appear to be the case in our core file.
The topic "what is an indirect system call?" is probably good material for another question.
Try this:
$ cat foo.c
#include <stdio.h>
int main(int argc, char *argv[]) {
char buf[1024];
proc_sysname(0, buf, 1024);
printf("%s\n", buf);
}
$ gcc -ofoo -lproc foo.c
$ ./foo
SYS#0
$
SYS#0 is therefore the string that represents system call zero. If you look in <sys/syscall.h> (the system call table) you will find the following:
/* syscall enumeration MUST begin with 1 */
/*
* SunOS/SPARC uses 0 for the indirect system call SYS_syscall
* but this doesn't count because it is just another way
* to specify the real system call number.
*/
#define SYS_syscall 0
The indirect system call syscall(SYS_syscall, foo, bar, ...) is equivalent to the direct call syscall(foo, bar, ...).
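The equivalence can be pictured with a toy dispatcher. This is a simplified model rather than Solaris code: toy_syscall and the TOY_SYS_add and TOY_SYS_negate calls are hypothetical, and only the convention that number 0 forwards to the number given as its first argument is taken from the header excerpt.

```c
#define TOY_SYS_syscall 0 /* indirect call, as in the header excerpt */
#define TOY_SYS_add     1 /* hypothetical call: returns a + b */
#define TOY_SYS_negate  2 /* hypothetical call: returns -a */

static long toy_syscall(long number, long a, long b, long c)
{
    /* Number 0 only forwards: the first argument is the real call
     * number and the remaining arguments shift down by one position. */
    if (number == TOY_SYS_syscall) {
        number = a;
        a = b;
        b = c;
        c = 0;
    }

    switch (number) {
    case TOY_SYS_add:
        return a + b;
    case TOY_SYS_negate:
        return -a;
    default:
        return -1; /* unknown number, or a nested indirect call */
    }
}
```

Here toy_syscall(TOY_SYS_syscall, TOY_SYS_add, 2, 3) and toy_syscall(TOY_SYS_add, 2, 3, 0) reach the same handler with the same arguments.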