Just for my own study, I'm trying to implement a routine that translates a virtual address to a physical address on ARM Linux.
I'm working from the specification in the material below.
http://homepages.wmich.edu/~grantner/ece6050/ARM7100vA_3.pdf
I'm implementing the routine in a kernel module, and as a first step I was able to read TTBR0.
But then I realized I have to access a physical address P using TTBR0 and the target virtual address.
I didn't know how to access the physical address P directly, so I tried ioremap() to get a virtual
address V for that physical address P.
However, when I call ioremap(), it doesn't work and dumps the backtrace below.
Does anyone know the right way to access a physical address?
[255075.566078] ------------[ cut here ]------------
[255075.570816] WARNING: at arch/arm/mm/ioremap.c:208 __arm_ioremap_pfn_caller+)
[255075.578806] Modules linked in: tv_driver(+) mtk_stp_gps mtk_fm_priv(P) mt66]
[255075.595096] Backtrace:
[255075.597590] [<c004df38>] (dump_backtrace+0x0/0x10c) from [<c057c8c4>] (dump)
[255075.606186] r7:00000009 r6:000000d0 r5:c005546c r4:00000000
[255075.611800] [<c057c8ac>] (dump_stack+0x0/0x1c) from [<c00817e0>] (warn_slow)
[255075.621378] [<c008178c>] (warn_slowpath_common+0x0/0x6c) from [<c008181c>] )
[255075.630390] r9:ca466000 r8:00000000 r7:4a25b000 r6:00000001 r5:00000004
[255075.636883] r4:00000290
[255075.639616] [<c00817f8>] (warn_slowpath_null+0x0/0x2c) from [<c005546c>] (_)
[255075.649740] [<c005537c>] (__arm_ioremap_pfn_caller+0x0/0x104) from [<c00554)
[255075.660186] [<c00554a0>] (__arm_ioremap_caller+0x0/0x64) from [<c0055518>] )
[255075.669115] r5:00000000 r4:4a25b290
[255075.672730] [<c0055504>] (__arm_ioremap+0x0/0x18) from [<bf144794>] (cl)
[255075.682695] [<bf144720>] (client_init+0x0/0x288 [driver]) from [<c00)
[255075.692774] r7:00022451 r6:00000000 r5:c07834c0 r4:4018a008
[255075.698458] [<c0043628>] (do_one_initcall+0x0/0x1a4) from [<c00bf150>] (sys)
[255075.707624] [<c00bf0c4>] (sys_init_module+0x0/0x1a4) from [<c0049b80>] (ret)
[255075.716673] r7:00000080 r6:40043d6c r5:00022451 r4:4018a008
[255075.722272] ---[ end trace bffd4c38629759a2 ]---
You cannot access a physical address directly while the MMU is active. But you have some ways to do it:
-Write assembly code that disables the MMU (by writing to the CP15 control register); you will then be able to write to any physical address, still in assembly.
-Find the virtual offset; with this offset you can compute the virtual address V for the physical address P.
-Use the ioremap() function (in your case I don't know why it doesn't work, I need more information :) )
-Use the phys_to_virt function described in arch/arm/include/asm/memory.h, but it only works for directly mapped RAM.
Within a user space program, I can easily tell whether a pointer ptr targets CPU memory (allocated with regular memory allocations) or GPU memory (allocated with cudaMalloc) using the following code.
CUDA_POINTER_ATTRIBUTE_P2P_TOKENS tokens;
CUresult status = cuPointerGetAttribute(&tokens, CU_POINTER_ATTRIBUTE_P2P_TOKENS, (CUdeviceptr)ptr);
if (CUDA_SUCCESS == status) {
    // This is GPU memory
}
How can I do this from within a Linux kernel module driver?
How can a Linux kernel module driver perform the same test, given a user-space virtual address?
Note that my kernel module uses the nvidia driver for RDMA. It calls the function nvidia_p2p_get_pages for example. As this function takes a lot of time, I'd like to check, before calling it, whether the virtual address ptr actually points to GPU memory.
For example, suppose a process on the guest has data allocated at 0x8000 (a guest virtual address), and that address has been mapped to 0x4000 in the guest physical address space. Is the data then located at 0x4000 in the virtual address space of the host-side QEMU-KVM (sub)process for that VM? In other words, if I write new code within QEMU's source (so I can use the QEMU-KVM process's page table), compile it, and run it, can I then access the guest process's data directly, just using the guest physical address that corresponds to the guest virtual address?
No, the page table mapping used for the user-space QEMU process is unrelated to either the page tables used by the guest itself for guest virtual -> guest physical address mapping, or to the page tables used by the KVM kernel code for guest physical address -> host physical address mapping for RAM.
When you're writing code in QEMU that needs to access guest memory, you should do so using the APIs that QEMU provides for that, which deal with converting the guest address to a host virtual address for RAM and also with handling the case when that guest address has an emulated device rather than RAM. The QEMU developer internals docs have a section on APIs for loads and stores, and the functions are also documented in doc comments in the header files.
Usually the best advice is "find some existing code in QEMU which is doing something basically the same as what you're trying to do, and follow that as an example".
I'm assuming your question is about x86 platform.
QEMU+KVM use extended page tables (EPT) to map guest physical addresses (GPA) to host physical addresses. The host virtual address (HVA) corresponding to a GPA comes from QEMU's own guest-RAM bookkeeping, not from traversing the guest's page tables.
When you need to read a virtual address from your guest, you can use the GDB-support functions in the QEMU source. This snippet reads 4 bytes at virtualAddress in the currently executing process in your guest:
uint8_t instr[4];
if (cpu_memory_rw_debug(cpu, virtualAddress, instr, sizeof(instr), 0)) {
    printf("Failed to read instr!\n");
    return;
}
cpu is a CPUState* for one of your guest's CPUs.
When you want to read an address belonging to a specific process in your guest, you should set env->cr[3] to that process's CR3 value (env is a CPUX86State*). Don't forget to restore the original value after you finish reading. And of course read memory only when your guest is not executing, otherwise there can be races.
As we know, on the ARM platform, 16MB of address space below PAGE_OFFSET is reserved for kernel modules.
If I write a module and define a global variable in it, how do I get its physical address?
It is obvious that I get a wrong physical address using the virt_to_phys function.
If virt_to_phys won't work for you, you can use the MMU to do a V=>P mapping, see Find the physical address of exception vector table from kernel module
I feel confused about page table management in the Linux kernel.
In Linux kernel space, before the page table is turned on, the kernel runs in virtual memory with a 1-1 mapping mechanism. After the page table is turned on, the kernel has to consult the page tables to translate a virtual address into a physical memory address.
Questions are:
1. At this time, after turning on the page table, is kernel space still 1GB (from 0xC0000000 - 0xFFFFFFFF)?
2. And in the page tables of the kernel process, are only page table entries (PTEs) in the range 0xC0000000 - 0xFFFFFFFF mapped? Are PTEs outside this range left unmapped because kernel code never jumps there?
3. Is the address mapping the same before and after turning on the page table?
E.g. before turning on the page table, the virtual address 0xC00000FF is mapped to the physical address 0x000000FF; after turning on the page table, does that mapping stay unchanged, with 0xC00000FF still mapped to 0x000000FF? The only difference would be that after turning on the page table, the CPU has to consult the page table to translate the virtual address, which it did not need to do before.
4. Is the page table in kernel space global, shared across all processes in the system, including user processes?
5. Is this mechanism the same on 32-bit x86 and ARM?
The following discussion is based on 32-bit ARM Linux; the version of the kernel source code is 3.9.
All your questions can be addressed if you go through the procedure of setting up the initial page table (which is overwritten later by the function paging_init) and turning on the MMU.
When the kernel is first launched by the bootloader, the assembly function stext (in arch/arm/kernel/head.S) is the first function to run. Note that the MMU has not been turned on yet at this moment.
Among other things, the important jobs done by stext are:
create the initial page table (which is overwritten later by the function paging_init)
turn on the MMU
jump to the C part of the kernel initialization code and carry on
Before delving into your questions, it is beneficial to know:
Before the MMU is turned on, every address issued by the CPU is a physical address
After the MMU is turned on, every address issued by the CPU is a virtual address
A proper page table must be set up before turning on the MMU, otherwise your code will simply "be blown away"
By convention, the Linux kernel uses the upper 1GB of the virtual address space and user land uses the lower 3GB
Now the tricky part:
First trick: using position-independent code.
The assembly function stext is linked at address "PAGE_OFFSET + TEXT_OFFSET" (0xCxxxxxxx), which is a virtual address. However, since the MMU has not been turned on yet, the actual address where stext runs is "PHYS_OFFSET + TEXT_OFFSET" (the actual value depends on your hardware), which is a physical address.
So here is the thing: the code of stext "thinks" it is running at an address like 0xCxxxxxxx, but it is actually running at (0x00000000 + some_offset) (say your hardware configures 0x00000000 as the starting point of RAM). So before turning on the MMU, the assembly code needs to be written very carefully to make sure nothing goes wrong during execution. In fact, a technique called position-independent code (PIC) is used.
To further explain the above, I extract several assembly code snippets:
ldr r13, =__mmap_switched # address to jump to after MMU has been enabled
b __enable_mmu # jump to function "__enable_mmu" to turn on MMU
Note that the above "ldr" instruction is a pseudo-instruction meaning "get the (virtual) address of function __mmap_switched and put it into r13".
And function __enable_mmu in turn calls function __turn_mmu_on:
(Note that I removed several instructions from function __turn_mmu_on which are essential to it but not of interest here.)
ENTRY(__turn_mmu_on)
mcr p15, 0, r0, c1, c0, 0 # write control reg to enable MMU====> This is where MMU is turned on, after this instruction, every address issued by CPU is "virtual address" which will be translated by MMU
mov r3, r13 # r13 stores the (virtual) address to jump to after MMU has been enabled, which is (0xC0000000 + some_offset)
mov pc, r3 # a long jump
ENDPROC(__turn_mmu_on)
Second trick: identical mapping when setting up initial page table before turning on MMU.
More specifically, the same address range where kernel code is running is mapped twice.
The first mapping, as expected, maps the address range 0x00000000 (again, this address depends on hardware config) through (0x00000000 + offset) to 0xCxxxxxxx through (0xCxxxxxxx + offset)
The second mapping, interestingly, maps the address range 0x00000000 through (0x00000000 + offset) to itself (i.e.: 0x00000000 --> (0x00000000 + offset))
Why do that?
Remember that before the MMU is turned on, every address issued by the CPU is a physical address (starting at 0x00000000), and after the MMU is turned on, every address issued by the CPU is a virtual address (starting at 0xC0000000).
Because the ARM core is pipelined, at the moment the MMU is turned on there are still instructions in the pipeline using (physical) addresses that were generated before the MMU was turned on! To keep these instructions from blowing up, an identical mapping has to be set up to cater to them.
Now returning to your questions:
At this time, after turning on page table, kernel space is still 1GB (from 0xC0000000 - 0xFFFFFFFF ) ?
A: I guess you mean turning on the MMU. The answer is yes: kernel space is 1GB (actually it also occupies several megabytes below 0xC0000000, but that is not of interest here)
And in the page tables of the kernel process, are only page table entries (PTEs) in the range 0xC0000000 - 0xFFFFFFFF mapped? Are PTEs outside this range left unmapped because kernel code never jumps there?
A: The answer to this question is quite complicated, because it involves a lot of details regarding specific kernel configurations.
To fully answer it, you need to read the part of the kernel source code that sets up the initial page table (the assembly function __create_page_tables) and the function that sets up the final page table (the C function paging_init).
To put it simply, there are two levels of page table on ARM. The first-level table is the PGD, which occupies 16KB. The kernel first zeroes out this PGD during initialization and does the initial mapping in the assembly function __create_page_tables, where only a very small portion of the address space is mapped.
After that, the final page table is set up in the function paging_init, where a much larger portion of the address space is mapped. Say you have only 512MB of RAM: for most common configurations, that 512MB is mapped by kernel code section by section (1 section = 1MB). If your RAM is large (say 2GB), only a portion of it will be directly mapped.
(I will stop here because there are too many details regarding Question 2)
Mapping address before and after turning on page table is same ?
A: I think I've already answered this question in my explanation of "Second trick: identical mapping when setting up initial page table before turning on MMU."
4. Is the page table in kernel space global, shared across all processes in the system, including user processes?
A: Yes and no. Yes, because all processes share the same content for the kernel part of the page table (the upper 1GB). No, because each process uses its own 16KB of memory to store its page table (although the content covering the upper 1GB is identical in every process's copy).
5. Is this mechanism the same on 32-bit x86 and ARM?
A: Different architectures use different mechanisms.
When Linux enables the MMU, only the virtual addresses of kernel space are required to be mapped. This happens very early in boot, when there is no user space yet. There is no restriction against the MMU mapping multiple virtual addresses to the same physical address. So, when enabling the MMU, it is simplest to have both a virt==phys mapping for the kernel code space and the link==phys mapping, i.e. the 0xC0000000-based one.
Mapping address before and after turning on page table is same ?
If the physical code address is 0xFF and the final link address is 0xC00000FF, then we have a duplicate mapping when turning on the MMU: both 0xFF and 0xC00000FF map to the same physical page. A simple jmp (jump) or b (branch) moves execution from one address space to the other. At that point, the virt==phys mapping can be removed, since we are executing at the final destination address.
I think the above should answer points 1 through 3. Basically, the booting page tables are not the final page tables.
4 . The page table in kernel space is global and will be shared across all process in the system including user process?
Yes, this is a big win with a VIVT cache and for many other reasons.
5 . This mechanism is same in x86 32bit and ARM?
Of course the underlying mechanics are different. They differ even between processors within these families: 486 vs P4 vs AMD K6; ARM926 vs Cortex-A5 vs Cortex-A8, etc. However, the semantics are very similar.
See: Bootmem#lwn.net - An article on the early Linux memory phase.
Depending on the version, different memory pools and page table mappings are active during boot. The mappings we are all familiar with do not need to be in place until init runs.
I have to implement a char device as an LKM.
I know some OS basics, but I feel I don't have the big picture.
In a C program, when I call a syscall, what I think happens is that the CPU switches to ring0, then goes through the syscall vector and jumps to a function in kernel memory space that handles it. (I think it does int 0x80, with eax holding the syscall number used as the offset into the syscall table; not sure.)
Then I'm in the syscall itself, but I guess that for the kernel it is the same process as before, only now in kernel mode; I mean, the current PCB is that of the process that made the syscall.
So far... so good? Correct me if something is wrong.
Other questions: how can I write/read process memory?
If in the syscall handler I refer to an address, say 0xbfffffff, what does that address mean? A physical one? Some kernel virtual one?
To read/write memory from the kernel, you need to use function calls such as get_user or __copy_to_user.
See the User Space Memory Access API of the Linux Kernel.
You can never get to ring0 from a regular process; you'll have to write a kernel module to get there.
And you never have to deal with any physical addresses: 0xbfffffff represents an address in the virtual address space of your process.
Big picture:
Everything happens in assembly. In Intel assembly, there is a set of privileged instructions which can only be executed in ring0 mode (http://en.wikipedia.org/wiki/Privilege_level). To make the transition into ring0 mode, you can use the "int" or "sysenter" instruction:
what all happens in sysenter instruction is used in linux?
And then inside ring0 mode (which is your kernel mode), accessing memory requires the privilege level to match, via the DPL/CPL/RPL attribute bits tagged in the segment registers:
http://duartes.org/gustavo/blog/post/cpu-rings-privilege-and-protection/
You may ask how the CPU initializes memory and registers in the first place: at bootup, the x86 CPU runs in real mode, unprotected (no ring concept), so everything is possible and a lot of setup work is done then.
As for virtual vs. physical addresses: just remember that anything in a register used for memory addressing is always a virtual address (once the MMU is set up and protected mode is enabled). Look at the picture here (notice that everything coming from the CPU is a virtual address; only the memory bus sees physical addresses):
http://en.wikipedia.org/wiki/Memory_management_unit
As for memory separation between userspace and kernel, you can read here:
http://www.inf.fu-berlin.de/lehre/SS01/OS/Lectures/Lecture14.pdf