What does the following construct (p is on the GPU)
#pragma acc host_data use_device(p)
{...}
exactly do?
"A host_data construct makes the address of device data available
on the host." (The OpenACC API). use_device - "directs the compiler to use the device address of any entry in list, for instance, when passing a variable to procedure" (OpenACC Programming and Best Practices Guide). Does this mean that, for example, if I have the variables
int A=1;
int B=2;
#pragma acc declare device_resident(A,B)
...
allocated on the device, I can write from the host
#pragma acc host_data use_device(A,B)
{
memcpy(&A,&B,sizeof(int));
}
I suppose this is wrong. Please explain this to me.
The OpenACC "host_data" directive is used when you need to get the device address for a variable for use within host code. It's mostly used for interoperability with CUDA or CUDA aware MPI when you want to pass in the device address of a variable.
In your example, this would most likely cause an error: the system "memcpy" runs on the host and dereferences the pointers it is given, so passing it device addresses would give a seg fault. If you change "memcpy" to "cudaMemcpy" or another routine that expects a device address to be passed in, then it would be fine.
This blog post may be helpful: https://devblogs.nvidia.com/parallelforall/3-versatile-openacc-interoperability-techniques/
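To make the effect of host_data concrete, here is a minimal sketch. The names scale_on_device and scale_array are made up for illustration; scale_on_device stands in for any routine that expects a device pointer (cudaMemcpy, a CUDA library call, CUDA-aware MPI). It assumes an OpenACC compiler; a compiler without OpenACC support simply ignores the pragmas and runs everything on the host.

```c
#include <stddef.h>

/* Hypothetical routine that expects a DEVICE pointer. It stands in for
 * cudaMemcpy, a CUDA library call, or a CUDA-aware MPI call. */
static void scale_on_device(float *d_p, int n, float factor)
{
    /* deviceptr tells OpenACC that d_p already holds a device address,
     * so no data movement is generated for it. */
    #pragma acc parallel loop deviceptr(d_p)
    for (int i = 0; i < n; ++i)
        d_p[i] *= factor;
}

/* Host-side wrapper: p is an ordinary host array of n floats. */
static void scale_array(float *p, int n, float factor)
{
    /* Create a device copy of p and copy it back when the region ends. */
    #pragma acc data copy(p[0:n])
    {
        /* Inside host_data, the name p evaluates to the device address
         * of that copy instead of the host address. */
        #pragma acc host_data use_device(p)
        {
            scale_on_device(p, n, factor);
        }
    }
}
```

Calling scale_array(a, 4, 2.0f) on a host array doubles its elements. The point is that only code which knows how to handle a device address should receive p inside the host_data block; a plain host routine such as memcpy does not.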
Coming from the Windows world, I assume that Vmlinuz is equivalent to ntoskrnl.exe, and this is the kernel executable that gets mapped in Kernel memory.
Now I want to figure out whether an address inside kernel belongs to the kernel executable or not. Is using core_kernel_text the correct way of finding this out?
Because core_kernel_text doesn't return true for some of the addresses that clearly should belong to Linux kernel executable.
For example, core_kernel_text doesn't return true when I give it the syscall entry handler address, which can be found with the following code:
unsigned long system_call_entry;
rdmsrl(MSR_LSTAR, system_call_entry);
return (void *)system_call_entry;
And when I use this code snippet, the address of the syscall entry handler doesn't belong to the core kernel text or to any kernel module (using get_module_from_addr).
So how can the address of a handler that clearly belongs to the Linux kernel executable, such as the syscall entry, belong to neither the core kernel nor any kernel module? Then what does it belong to?
Which API do I need to use for these types of addresses, which clearly belong to the Linux kernel executable, to assure me that the address indeed belongs to the kernel?
I need such an API because I need to write a detection for malicious kernel modules that patch such handlers, and for now I need to make sure the address belongs to kernel, and not some third party kernel module or random kernel address. (Please do not discuss methods that can be used to bypass my detection, obviously it can be bypassed but that's another story)
The target kernel version is 4.15.0-112-generic, and is Ubuntu 16.04 as a VMware guest.
Reproducible code as requested:
typedef int(*core_kernel_text_t)(unsigned long addr);
core_kernel_text_t core_kernel_text_;
core_kernel_text_ = (core_kernel_text_t)kallsyms_lookup_name("core_kernel_text");
unsigned long system_call_entry;
rdmsrl(MSR_LSTAR, system_call_entry);
int isInsideCoreKernel = core_kernel_text_((unsigned long)system_call_entry);
printk("%d , 0x%pK ", isInsideCoreKernel, system_call_entry);
EDIT1: So in the MSR_LSTAR example that I gave above, it turns out that It's related to Kernel Page Table Isolation and CONFIG_RETPOLINE=y in config:
system_call value is different each time when I use rdmsrl(MSR_LSTAR, system_call)
And that's why I am getting the address 0xfffffe0000006000 aka SYSCALL64_entry_trampoline, the same as the question above.
So now the question remains: why doesn't this SYSCALL64_entry_trampoline address belong to anything? It doesn't belong to any kernel module, and it doesn't belong to the core kernel, so which executable does this address belong to, and how can I check that with an API similar to core_kernel_text? It seems like it belongs to cpu_entry_area, but what is that and how can I check whether an address belongs to it?
You are seeing this "weird" address in MSR_LSTAR (IA32_LSTAR) because of Kernel Page-Table Isolation (KPTI), which mitigates Meltdown. As other existing answers(1) you already found point out, the address you see is the one of a small trampoline (entry_SYSCALL_64_trampoline) that is dynamically remapped at boot time by the kernel for each CPU, and thus does not have an address within the kernel text.
(1)By the way, the answer linked above wrongly states that the corresponding config option for KPTI is CONFIG_RETPOLINE=y. This is wrong, the "retpoline" is a mitigation for Spectre, not Meltdown. The config to enable KPTI is CONFIG_PAGE_TABLE_ISOLATION=y.
You don't have many options. Either:
Tell VMware to emulate a recent CPU that is not vulnerable to Meltdown.
Detect and implement support for the KPTI trampoline.
You can implement support for this by detecting whether the kernel supports KPTI (CONFIG_PAGE_TABLE_ISOLATION), and if so checking whether the current CPU has KPTI enabled. The code in arch/x86/kernel/cpu/bugs.c that provides the information for /sys/devices/system/cpu/vulnerabilities/meltdown shows how this can be detected:
ssize_t cpu_show_meltdown(struct device *dev,
                          struct device_attribute *attr, char *buf)
{
    if (!boot_cpu_has_bug(X86_BUG_CPU_MELTDOWN))
        return sprintf(buf, "Not affected\n");
    if (boot_cpu_has(X86_FEATURE_PTI))
        return sprintf(buf, "Mitigation: PTI\n");
    return sprintf(buf, "Vulnerable\n");
}
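For a quick look from user space, the same information can be read back from that sysfs file. The following is only a sketch: the helper names meltdown_line_reports_pti and pti_reported_by_sysfs are made up, and it assumes the file content has the format printed by cpu_show_meltdown() above.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* True if a line in the format produced by cpu_show_meltdown()
 * reports that PTI is the active mitigation. */
static bool meltdown_line_reports_pti(const char *line)
{
    static const char prefix[] = "Mitigation: PTI";

    return strncmp(line, prefix, sizeof(prefix) - 1) == 0;
}

/* Returns 1 if sysfs reports PTI, 0 if it reports something else,
 * and -1 if the file cannot be read (e.g. on an older kernel). */
static int pti_reported_by_sysfs(void)
{
    char line[128];
    FILE *f = fopen("/sys/devices/system/cpu/vulnerabilities/meltdown", "r");

    if (!f)
        return -1;
    if (!fgets(line, sizeof(line), f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return meltdown_line_reports_pti(line) ? 1 : 0;
}
```

This is handy for checking what to expect on a given machine before loading the module; inside the module itself the X86_FEATURE_PTI check is the way to go.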
The actual trampoline is set up at boot and its address is stored in each CPU's "entry area" for later use (e.g. here when setting up IA32_LSTAR). This answer on Unix & Linux SE explains the purpose of the cpu entry area and its relation to KPTI.
In your module you can do the following detection:
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/init.h>
#include <linux/kallsyms.h>
#include <asm/msr-index.h>
#include <asm/msr.h>
#include <asm/cpufeature.h>
#include <asm/cpu_entry_area.h>
// ...
typedef int(*core_kernel_text_t)(unsigned long addr);
core_kernel_text_t core_kernel_text_;
bool syscall_entry_64_ok(void)
{
    unsigned long entry;
    rdmsrl(MSR_LSTAR, entry);
    /* Without KPTI, the entry point lies in the core kernel text. */
    if (core_kernel_text_(entry))
        return true;
#ifdef CONFIG_PAGE_TABLE_ISOLATION
    if (this_cpu_has(X86_FEATURE_PTI)) {
        /* With KPTI, it lies in this CPU's entry trampoline page. */
        int cpu = smp_processor_id();
        unsigned long trampoline = (unsigned long)get_cpu_entry_area(cpu)->entry_trampoline;
        if ((entry & PAGE_MASK) == trampoline)
            return true;
    }
#endif
    return false;
}
static int __init modinit(void)
{
    core_kernel_text_ = (core_kernel_text_t)kallsyms_lookup_name("core_kernel_text");
    if (!core_kernel_text_)
        return -EOPNOTSUPP;
    pr_info("syscall_entry_64_ok() -> %d\n", syscall_entry_64_ok());
    return 0;
}
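The page-mask comparison used in syscall_entry_64_ok() can be tried out in user space. This sketch assumes 4 KiB pages and uses its own SKETCH_ constants rather than the kernel's PAGE_MASK; the helper same_page is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, as on x86-64 with default settings. */
#define SKETCH_PAGE_SIZE 4096ULL
#define SKETCH_PAGE_MASK (~(SKETCH_PAGE_SIZE - 1))

/* True if addr lies in the page that starts at page_base. Masking off
 * the low 12 bits turns any address inside a page into the address of
 * the first byte of that page. */
static bool same_page(uint64_t addr, uint64_t page_base)
{
    return (addr & SKETCH_PAGE_MASK) == page_base;
}
```

With the address from the question, same_page(0xfffffe0000006000, 0xfffffe0000006000) holds, and so would an entry point a few bytes into that page, which is why the module compares the masked value instead of the raw MSR content.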
I'm trying to figure out how an ebpf program can change the outcome of a function (not a syscall, in my case) in kernel space. I've found numerous articles and blog posts about how ebpf turns the kernel into a programmable kernel, but it seems like every example is just read-only tracing and collecting statistics.
I can think of a few ways of doing this: 1) make a kernel application read memory from an ebpf program, 2) make ebpf change the return value of a function, 3) allow an ebpf program to call kernel functions.
The first approach does not seem like a good idea.
The second would be enough, but as far as I understand it's not easy. This question says syscalls are read-only. This bcc document says it is possible but the function needs to be whitelisted in the kernel. This makes me think that the whitelist is fixed and can only be changed by recompiling the kernel, is this correct?
The third seems to be the most flexible one, and this blog post encouraged me to look into it. This is the one I'm going for.
I started with a brand new 5.15 kernel, which should have this functionality.
As the blog post says, I did something no one should do (security is not an issue since I'm just toying with this) and opened every function to ebpf by adding this to net/core/filter.c (which I'm not sure is the correct place to do so):
static bool accept_the_world(int off, int size,
                             enum bpf_access_type type,
                             const struct bpf_prog *prog,
                             struct bpf_insn_access_aux *info)
{
    return true;
}
bool export_the_world(u32 kfunc_id)
{
    return true;
}
const struct bpf_verifier_ops all_verifier_ops = {
    .check_kfunc_call = export_the_world,
    .is_valid_access = accept_the_world,
};
How does the kernel know of the existence of this struct? I don't know. None of the other bpf_verifier_ops declared are used anywhere else, so it doesn't seem like there is a register_bpf_ops
Next I was able to install bcc (after a long fight due to many broken installation guides).
I had to checkout v0.24 of bcc. I read somewhere that pahole is required when compiling the kernel, so I updated mine to v1.19.
My python file is super simple, I just copied the vfs example from bcc and simplified it:
bpf_text_kfunc = """
extern void hello_test_kfunc(void) __attribute__((section(".ksyms")));
KFUNC_PROBE(vfs_open)
{
    stats_increment(S_OPEN);
    hello_test_kfunc();
    return 0;
}
"""
b = BPF(text=bpf_text_kfunc)
Where hello_test_kfunc is just a function that does a printk, inserted as a module into the kernel (it is present in kallsyms).
When I try to run it, I get:
/virtual/main.c:25:5: error: cannot call non-static helper function
hello_test_kfunc();
^
And this is where I'm stuck. It seems like it's the JIT that is not allowing this, but who exactly is causing this issue? BCC, libbpf or something else? Do I need to manually write bpf code to call kernel functions?
Does anyone have an example with code of what the lwn blog post I linked talks about actually working?
eBPF is fundamentally made to extend kernel functionality in very specific, limited ways. It is essentially a very advanced plugin system. One of the main design principles of eBPF is that a program is not allowed to break the kernel. Therefore it is not possible to change the outcome of arbitrary kernel functions.
The kernel has facilities to call an eBPF program at any time the kernel wants and then use the return value or side effects from helper calls to affect something. The key here is that the kernel always knows it is doing this.
One sort of exception is the BPF_PROG_TYPE_STRUCT_OPS program type which can be used to replace function pointers in whitelisted structures.
But again, explicitly allowed by the kernel.
make a kernel application read memory from an ebpf program
This is not possible since the memory of an eBPF program is ephemeral, but you could define your own custom eBPF program type and pass in some memory to be modified to the eBPF program via a custom context type.
make ebpf change the return value of a function
Not possible unless you explicitly call an eBPF program from that function.
allow an ebpf program to call kernel functions.
While possible for a number of purposes, this typically doesn't give you the ability to change return values of arbitrary functions.
You are correct, certain program types are allowed to call some kernel functions. But these are again whitelisted as you discovered.
How does the kernel know of the existence of this struct?
Macro magic. The verifier builds a list of these structs. But only if the program type exists in the list of program types.
/virtual/main.c:25:5: error: cannot call non-static helper function
This seems to be a limitation of BCC, so if you want to play with this stuff you will likely have to manually compile your eBPF program and load it with libbpf or cilium/ebpf.
Is it possible to write a (Linux kernel) syscall function that has more than 6 input parameters? Looking at the header I see that the defined syscall macros have a maximum of 6 parameters. I'm tempted to try to define SYSCALL7 and SYSCALL8 to allow for 7 and 8 parameters but I'm not quite sure if that will actually work.
For x86, the following function (from x86...syscall.h) copies the arguments over:
static inline void syscall_get_arguments(struct task_struct *task,
                                         struct pt_regs *regs,
                                         unsigned int i, unsigned int n,
                                         unsigned long *args)
{
    BUG_ON(i + n > 6);
    memcpy(args, &regs->bx + i, n * sizeof(args[0]));
}
This function is described well in the comments in asm_generic/syscall.h. It copies the arguments into the syscall, and there is a limit of 6 arguments. It may be implemented in a number of ways depending on architecture. For x86 (from the snippet above) it looks like the arguments are all passed by register.
So, if you want to pass more than 6 arguments, use a struct. If you must have a SYSCALL7, then you are going to have to create a custom kernel and likely modify almost every step of the syscall process. x86_64 would likely accommodate this change more easily, since it has more registers than x86.
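As a rough illustration of the struct approach, here is a user-space sketch. Everything in it is hypothetical: struct sum8_args and sum8_handler are made-up names, and the handler is an ordinary function standing in for a syscall body. In a real syscall the pointer would refer to user memory, and the kernel would copy the block in and out with copy_from_user() and copy_to_user() instead of dereferencing it directly.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical argument block carrying eight logical parameters.
 * A real syscall would define this in a header shared with user space. */
struct sum8_args {
    uint64_t v[8];   /* the eight input parameters */
    uint64_t result; /* output written by the handler */
};

/* Stand-in for the syscall body. Only two values cross the syscall
 * boundary (a pointer and a size), no matter how many fields the
 * struct has. Passing the size lets the handler reject callers that
 * were built against a different struct layout. */
static long sum8_handler(struct sum8_args *args, size_t size)
{
    uint64_t sum = 0;

    if (!args || size != sizeof(*args))
        return -EINVAL;

    for (size_t i = 0; i < 8; ++i)
        sum += args->v[i];

    args->result = sum;
    return 0;
}
```

A caller fills in the struct, passes its address and sizeof the struct, and reads result afterwards. Existing syscalls such as clone3 and openat2 follow the same pointer-plus-size pattern.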
What if one day you need 20 parameters? I think the best way around your syscall problem is to use a void pointer (void *).
This way you can pass a struct containing an unlimited amount of parameters.
Generally there is no limit on the number of parameters. But all of this needs a standard: kernel module writers and users (callers) need to agree on a standard way to pass information from caller to callee (and vice versa), whether by stack or by register. This is called the "ABI" or calling convention. There are different standards for x86 and AMD64, and generally each is the same across UNIX-like systems on x86: Linux, FreeBSD, etc.
http://www.x86-64.org/documentation/abi.pdf
Eg, x86 syscall ABI:
http://lwn.net/Articles/456731/
http://esec-lab.sogeti.com/post/2011/07/05/Linux-syscall-ABI
For more details, please see (to avoid repetition):
What are the calling conventions for UNIX & Linux system calls on x86-64
Why does Windows64 use a different calling convention from all other OSes on x86-64?
And userspace will have its own ABI as well:
https://www.kernel.org/doc/Documentation/ABI/README
https://lwn.net/Articles/234133/
http://lwn.net/Articles/456731/
I was able to control GPIO using the mmap system call to control LED operation directly from user space. Now I want to implement a driver in kernel space.
I am trying to write my first kernel space device driver for a 16x2 line LCD in Linux on an ARM controller (RPi).
Now I need to access the GPIO for this purpose.
In AVR I used to access the port like this.
#define PORTA *(volatile unsigned char*)0x30
I was reading LDD, which says to use the inb() & outb() functions to access the I/O port.
http://www.makelinux.net/ldd3/chp-9-sect-2
1> Can we not use a #define of the port address to access the GPIO?
2> What are the advantages of using the inb() & outb() functions for controlling the GPIO?
Please suggest.
In AVR I used to access the port like this.
#define PORTA *(volatile unsigned char*)0x30
That's an improper definition that overloads the symbol PORTA.
Besides defining the port address as 0x30, you are also dereferencing that location.
So every use of PORTA is actually a memory access (a read, or a write when it is assigned to), but there's no indication of that in the name, i.e. you have really defined a macro for READ_PORTA.
1> Can we not use a #define of the port address to access the GPIO?
Of course you can (and should).
#define PORTA ((volatile unsigned char *)0x30)
You'll find similar statements in header files for device registers in the Linux source tree. When developing a new device driver, I look for a header file of #defines for all of the device's registers and command codes, and start writing one if no file is already available.
2> What are the advantages of using the inb() & outb() functions for controlling the GPIO?
The code is then an unambiguous statement that I/O is being performed, regardless of whether the architecture uses I/O ports or memory-mapped I/O.
Anyone reading the following should be able to deduce what is going on:
x = inb(PORTA);
versus the confusion when using your macro:
x = PORTA;
The above statement using an overloaded macro would not pass a code review conducted by competent coders.
You should also get familiar with and use the Linux kernel coding style.
1) the use of defines simplifies your task often. You could, of course, not use define for your port and use this construction literally everywhere you need to access the port. But then you will have to replace the 0x30 everywhere with another address if you change the design of your device, for example, if you decide to connect your LED to port B. Also, it will make your code less readable. Alternatively you could declare a function that will access your port. If such a simple function is declared inline (if your compiler supports inlines) then there is no difference in performance.
2) the advantage of using inb() and outb() is portability of your program. If this is not an issue, then it is fine to access your port directly.
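The inline-function alternative mentioned in 1) could look roughly like this. It is a plain C sketch, not kernel code: reg_read8, reg_write8, led_set and LED_BIT are made-up names, and the register is passed in as a pointer so the code can be exercised against an ordinary variable. In a Linux driver the physical register address would first have to be mapped (for example with ioremap) and then accessed through the kernel's accessors such as readb() and writeb().

```c
#include <stdint.h>

/* Hypothetical bit position of the LED within the port register. */
#define LED_BIT 0x01u

/* The function names state the direction of the access, so a reader
 * does not have to guess whether I/O is taking place. */
static inline uint8_t reg_read8(const volatile uint8_t *reg)
{
    return *reg;
}

static inline void reg_write8(volatile uint8_t *reg, uint8_t value)
{
    *reg = value;
}

/* Read-modify-write that changes only the LED bit of the port. */
static inline void led_set(volatile uint8_t *port, int on)
{
    uint8_t v = reg_read8(port);

    if (on)
        v = (uint8_t)(v | LED_BIT);
    else
        v = (uint8_t)(v & ~LED_BIT);
    reg_write8(port, v);
}
```

If the LED later moves to another port, only the pointer passed to led_set changes, and since the functions are inline there should be no cost compared with the macro.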
I am porting a program for an ARM chip from an IAR compiler to gcc.
In the original code, IAR specific operators such as __segment_begin and __segment_size are used to obtain the beginning and size respectively of certain memory segments.
Is there any way to do the same thing with GCC? I've searched the GCC manual but was unable to find anything relevant.
More details:
The memory segments in question have to be in fixed locations so that the program can interface correctly with certain peripherals on the chip. The original code uses the __segment_begin operator to get the address of this memory and the __segment_size to ensure that it doesn't overflow this memory.
I can achieve the same functionality by adding variables to indicate the start and end of these memory segments but if GCC had similar operators that would help minimise the amount of compiler dependent code I end up having to write and maintain.
What about the linker's flag --section-start? Which I read is supported here.
An example on how to use it can be found on the AVR Freaks Forum:
const char __attribute__((section (".honk"))) ProjString[16] = "MY PROJECT V1.1";
You will then have to add to the linker's options: -Wl,--section-start=.honk=address.
Modern versions of the GNU toolchain define two symbols for each segment whose name is a valid C identifier, namely __start_MY_SEGMENT and __stop_MY_SEGMENT (they are provided by the linker). To use these symbols, you need to declare them as externs with the desired type. Following that, you can then use the '&' operator to get the address of the start and end of that segment.
extern uint8_t __start_MY_SEGMENT;
extern uint8_t __stop_MY_SEGMENT;
#define MY_SEGMENT_LEN (&__stop_MY_SEGMENT - &__start_MY_SEGMENT)
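Put together, a small self-contained sketch might look like this. It assumes GCC or Clang with the GNU linker on an ELF target; seg_buf_a, seg_buf_b and my_segment_len are made-up names. With a custom linker script (as is common on bare-metal ARM) the script decides where MY_SEGMENT ends up and may need to define the start and stop symbols itself.

```c
#include <stddef.h>
#include <stdint.h>

/* Two objects placed in the custom section. "used" keeps the compiler
 * from dropping them even though nothing references them by name. */
static uint8_t seg_buf_a[16] __attribute__((section("MY_SEGMENT"), used));
static uint8_t seg_buf_b[8]  __attribute__((section("MY_SEGMENT"), used));

/* Provided by the GNU linker because MY_SEGMENT is a valid C identifier. */
extern uint8_t __start_MY_SEGMENT;
extern uint8_t __stop_MY_SEGMENT;

/* Size of the whole section in bytes, including any alignment padding. */
static size_t my_segment_len(void)
{
    return (size_t)(&__stop_MY_SEGMENT - &__start_MY_SEGMENT);
}
```

Here my_segment_len() returns at least 24, the combined size of the two buffers. That value plays the role of IAR's __segment_size, and &__start_MY_SEGMENT plays the role of __segment_begin.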