How are tpidr_el1 and tpidr_el2 used in arm64 linux? - linux-kernel

I was following the kernel_entry code and saw arch/arm64/include/asm/assembler.h (linux 5.10.0-rc5, see https://elixir.bootlin.com/linux/v5.10-rc5/source/arch/arm64/include/asm/assembler.h#L255)
From the way ldr_this_cpu, adr_this_cpu and this_cpu_offset are defined and used there, I understood that the compiler-generated address of a per-cpu variable is the address of the first cpu's copy, and that tpidr_el1 or tpidr_el2 holds the offset of this cpu's copy from the first cpu's copy. We have to apply this offset when accessing per-cpu variables. Is my understanding correct?
And is tpidr_el1 used for the offset between cpus and tpidr_el2 for the offset between virtual machines? The definition of set_my_cpu_offset or this_cpu_offset, however, seems to say that tpidr_el2 is used whenever VHE is implemented. Could anyone briefly explain this to me?
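The "link-time address plus per-cpu offset" scheme asked about here can be modelled in plain user-space C. This is only a sketch: the names percpu_template, percpu_copy, model_cpu_offset and model_this_cpu_ptr are hypothetical, the array element stands in for the value a cpu would keep in its tpidr register, and the model assumes the link-time address points at a template copy rather than at the first cpu's copy.

```c
#include <stdint.h>

#define MODEL_NR_CPUS 4  /* hypothetical cpu count for this model */

struct percpu_area {     /* hypothetical per-cpu variables */
    int counter;
    long stat;
};

/* Stands in for the link-time copy whose address the compiler emits. */
static struct percpu_area percpu_template;
/* One private copy of the area for each cpu. */
static struct percpu_area percpu_copy[MODEL_NR_CPUS];
/* Models the value each cpu would hold in tpidr_el1 or tpidr_el2. */
static uintptr_t model_cpu_offset[MODEL_NR_CPUS];

/* Copy the template for every cpu and record how far away each copy is. */
void model_setup_offsets(void)
{
    for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
        percpu_copy[cpu] = percpu_template;
        model_cpu_offset[cpu] =
            (uintptr_t)&percpu_copy[cpu] - (uintptr_t)&percpu_template;
    }
}

/* Turn a link-time address into the address of the given cpu's copy. */
void *model_this_cpu_ptr(void *link_addr, int cpu)
{
    return (void *)((uintptr_t)link_addr + model_cpu_offset[cpu]);
}
```

After model_setup_offsets(), calling model_this_cpu_ptr(&percpu_template.counter, 2) yields the address of percpu_copy[2].counter, so the same link-time address resolves to a different copy on each cpu.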

Related

Problem about reading/writing memory in EL2

I am trying to read/write memory in EL2, but I don't get back the value I expect.
I use kzalloc to get zero-initialized space, then use str to write a number (0x12345678) into it.
Next, I use __pa() to get the physical address (PA) of this space. I found that PA = VA - 0x80000000. I want EL2 to read from this PA, so I put it into a register (r1).
The third step is to call hvc, which switches to EL2. I have added a branch in hyp_stub_vectors (in arch/arm/kernel/hyp-stub.S; I am sure this file handles hvc) and used ldr to read this memory and get my number back.
But it failed.
I guess the possible reasons are:
1.) I got a wrong physical address from __pa(). But I have walked the aarch32 stage-1 translation and got the same address. This space is actually mapped as a block, so it's OK to subtract an offset to get the physical address.
2.) EL2 still has address translation enabled. But I checked some related system registers and found that the MMU in EL2 is disabled. Possibly I checked the wrong register?
My device is Raspberry Pi 3B+, Cortex-A53
The problem may be related to cache incoherence. Given that your EL2 is running with MMU disabled, it also has data cache disabled, as stated in this paper. This means that to access a memory location in EL2 you need to get the value into RAM.
To achieve this, you can use the dc civac, x0 instruction, with x0 holding the virtual address of the variable. This cleans and invalidates the cache line containing your variable, writing the value back to RAM.
P.S. To verify whether your PA is correct, read the value at __va(__pa(addr)) and make sure that it's the same.
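The round trip suggested in the P.S. relies on the linear mapping the question observed (PA = VA - 0x80000000). A minimal user-space model of that arithmetic follows. The constant and function names are hypothetical, and the model assumes RAM starts at physical address 0 and that the address lies in the linearly mapped region, which is where kzalloc memory normally comes from.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_PAGE_OFFSET 0x80000000u /* from the question: PA = VA - 0x80000000 */
#define MODEL_PHYS_OFFSET 0x00000000u /* assumption: RAM starts at physical 0 */

/* Model of __pa() for a linearly mapped kernel virtual address. */
uint32_t model_pa(uint32_t va)
{
    return va - MODEL_PAGE_OFFSET + MODEL_PHYS_OFFSET;
}

/* Model of __va() for a physical address inside the linear map. */
uint32_t model_va(uint32_t pa)
{
    return pa - MODEL_PHYS_OFFSET + MODEL_PAGE_OFFSET;
}

/* The round trip only makes sense for addresses inside the linear map. */
bool model_in_linear_map(uint32_t va, uint32_t lowmem_size)
{
    return va >= MODEL_PAGE_OFFSET && va - MODEL_PAGE_OFFSET < lowmem_size;
}
```

If model_va(model_pa(va)) does not give back va, the address was not in the linear map and a plain offset subtraction is not a valid translation for it.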

Catching and avoiding memory corruption at fixed offset in physical memory

We have a 4-byte memory corruption that always occurs at a fixed offset in the physical memory.
The physical frame number is 0x00a4d and the offset is ending with dc0.
Question 1) Based on this information, can we say the physical address of corruption is 0x00a4d * PAGE_SIZE (4096) + dc0 = 0x00A4DDC0. Programmatically, what is best way to confirm the physical address? Ours is ppc64 based system.
Question 2) What would be the best way to find out this memory corruption? The more I read the more I get lost with the plethora of options. Should I use KASAN, or CONFIG_DEBUG_PAGEALLOC (debug_guardpage_minorder) option or a HW breakpoint?
Question 3) Since we know the corruption is at a fixed offset, if we were to reserve/block that page, what again is the best option? The two I came across are memmap and reserved memory regions.
Thanks
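The arithmetic in Question 1 can be written as a small helper. This is a sketch with a hypothetical function name; the page shift is a parameter because the 4096-byte page size is the question's own assumption, and ppc64 kernels may be configured with 64 KiB pages, which would change the result.

```c
#include <stdint.h>

/*
 * Combine a physical frame number and an offset within the page into a
 * physical address. page_shift is log2 of the page size (12 for 4096).
 */
uint64_t pfn_to_phys(uint64_t pfn, unsigned page_shift, uint64_t offset_in_page)
{
    return (pfn << page_shift) | offset_in_page;
}
```

With the values from the question, pfn_to_phys(0x00a4d, 12, 0xdc0) gives 0x00A4DDC0. With a page shift of 16 the same inputs would give 0x0A4D0DC0, so it is worth confirming the kernel's page size first.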
1.) You are right about physical address.
2.) A HW breakpoint is the best option if you have that possibility. Do you have an appropriate device (t32 or whatever) and a debug port, and can it place a HW breakpoint on a physical address?
Here is a more generic, brute-force approach that needs no HW support:
If I remember right from your previous post, you suspect kernel code of causing the corruption.
If you have read anything about KASAN, you probably noticed that the gcc part instruments the kernel's loads and stores with hooks. The kernel part provides the kasan_store_bla_bla_bla hook, which checks the correctness of the store. Very likely the default functionality won't help you, but you can add your own code to this kasan store hook, which would:
2.1) Take the virtual address passed to the kasan store hook.
2.2) Find the corresponding physical address by walking the page tables like this (a more convenient API exists, but I don't remember the function name):
pgd_t *pgd = pgd_offset(mm, addr);
pud_t *pud = pud_offset(pgd, addr);
pmd_t *pmd = pmd_offset(pud, addr);
...
As I remember from your previous post, the crash happens in a userspace app, so you will need to check the mm of every process in the task list.
2.3) Compare the physical address you found with the known one, and check that the written value is zero (as I remember from your previous post).
2.4) On a match, print backtraces for all cores and stop execution.
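The comparison in step 2.3 can be sketched in plain C, separate from any kernel hook. The names below are hypothetical, the watched address is the one computed in the question, and the check is simplified: it treats a store as suspect when its byte range overlaps the 4 corrupted bytes and the stored value is zero.

```c
#include <stdbool.h>
#include <stdint.h>

#define WATCH_PHYS 0x00A4DDC0ULL /* physical address computed in the question */
#define WATCH_LEN  4ULL          /* the corruption is 4 bytes long */

/* True if the store range [phys, phys + size) overlaps the watched bytes. */
bool store_overlaps_watch(uint64_t phys, uint64_t size)
{
    return phys < WATCH_PHYS + WATCH_LEN && WATCH_PHYS < phys + size;
}

/* Step 2.3: match on the address range and on the written value being zero. */
bool store_is_suspect(uint64_t phys, uint64_t size, uint64_t value)
{
    return store_overlaps_watch(phys, size) && value == 0;
}
```

Checking for overlap rather than exact equality also catches a wider or unaligned store that starts just before the watched address. A real hook would call something like this after step 2.2 and, on a match, proceed to step 2.4.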

How can I shrink the OS region in RAM through U-boot?

From my understanding, after a PC/embedded system booted up, the OS will occupy the entire RAM region, the RAM will look like this:
Which means, while I'm running a program I write, all the variables, dynamic memory allocated in the stacks, heaps and etc, will remain inside the region. If I run firefox, paint, gedit, etc, they will also be running in this region. (Is this understanding correct?)
However, I would like to shrink the OS region. Below is an illustration of how I want to divide the RAM:
The reason I want to do this is that I want the driver to store data received externally into the Custom Region at a fixed physical location, so that I can access it directly from user space without using copy_to_user().
I think it is possible to do that by configuring u-boot, but I have no experience in u-boot, can anyone give me some directions where to begin with, such as: do I need to modify the source of u-boot, or changing the environment variables of u-boot will be sufficient?
Or is there any alternative method of doing this?
Any help is much appreciated. Thanks!
p/s: I'm using TI ARM processor, and booting up from an SD card, I'm not sure if it matters.
The platform is ARM. min_addr and max_addr will not work on this platform, since they are Intel-only implementations.
For the ARM platform, look at the "mem=size@start" kernel parameter. Read up on Documentation/kernel-parameters.txt and arch/arm/kernel/setup.c. This option is available in most recent Linux code bases (i.e. 2.6.XX).
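To make the shape of that parameter concrete, here is a small user-space parser for a "size@start" string with optional K/M/G suffixes. It is a sketch, not the kernel's code: the function names are hypothetical, it assumes the separator is `@` as in the documented `mem=nn[KMG]@ss[KMG]` form, and unlike the kernel it rejects a value that has no start address.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Parse a number with an optional K/M/G suffix; *end is left just past it. */
static uint64_t parse_size(const char *s, char **end)
{
    uint64_t v = strtoull(s, end, 0);

    switch (**end) {
    case 'G': case 'g':
        v <<= 10; /* fall through */
    case 'M': case 'm':
        v <<= 10; /* fall through */
    case 'K': case 'k':
        v <<= 10;
        (*end)++;
        break;
    default:
        break;
    }
    return v;
}

/* Parse "size@start"; returns false if the '@' part is missing or malformed. */
bool parse_mem_arg(const char *arg, uint64_t *size, uint64_t *start)
{
    char *end;
    const char *p;

    *size = parse_size(arg, &end);
    if (end == arg || *end != '@')
        return false;

    p = end + 1;
    *start = parse_size(p, &end);
    return end != p && *end == '\0';
}
```

For example, parsing "64M@0x80000000" yields a size of 64 MiB and a start address of 0x80000000. The string itself would be passed to the kernel through the bootargs variable of the boot loader.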
You need to set the following parameters:
max_addr=some_max_physical
min_addr=some_min_physical
to be passed to the kernel through uboot in the 'bootargs' u-boot environment variable.
I found myself trying to do the opposite recently, in other words getting Linux to use the additional memory in my system, although I'm using Barebox rather than u-boot on an OMAP4 platform.
I found (a bit to my surprise) that once the Barebox MLO first stage boot-loader was aware of the extra RAM, the kernel detected and used it as well without any bootargs. Since the memory size is not passed anywhere on the boot-line, I can only assume the kernel inspects the memory mappings set up by the boot-loader to determine RAM size. This suggests that modifying your u-boot to not map all of the RAM is the way to go.
On the subject of boot-args, there was a time when it was recommended that you map out a chunk of RAM (used by the frame buffer?) on OMAP4 systems using the boot-line. It's unclear to me whether this is still necessary.

Somewhat newb question about assy and the heap

Ultimately I am just trying to figure out how to dynamically allocate heap memory from within assembly.
If I call Linux sbrk() from assembly code, can I use the address returned as I would use an address of a statically (ie in the .data section of my program listing) declared chunk of memory?
I know Linux uses the hardware MMU if present, so I am not sure if what sbrk returns is a 'raw' pointer to real RAM, or is it a cooked pointer to RAM that may be modified by Linux's VM system?
I read this: How are sbrk/brk implemented in Linux?. I suspect I can not use the return value from sbrk() without worry: the MMU fault on access-non-allocated-address must cause the VM to alter the real location in RAM being addressed. Thus assy, not linked against libc or what-have-you, would not know the address has changed.
Does this make sense, or am I out to lunch?
Unix user processes live in virtual memory, no matter whether they are written in assembler or Fortran, and should not care about physical addresses. That's the kernel's business: the kernel sets up and manages the MMU. You don't have to worry about it. Page faults are handled automatically and transparently.
sbrk(2) returns a virtual address specific to the process, if that's what you were asking.
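A short C sketch of the same point: the pointer returned by sbrk is an ordinary virtual address in the process and can be written and read like static data. The helper names are hypothetical, and the example is written in C rather than assembly; the same call made from assembly returns the same kind of address.

```c
#define _DEFAULT_SOURCE
#include <stdint.h>
#include <string.h>
#include <unistd.h>

/* Grow the heap by len bytes; return the start of the new region or NULL. */
void *grow_heap(size_t len)
{
    void *old_break = sbrk((intptr_t)len);

    return old_break == (void *)-1 ? NULL : old_break;
}

/* Write to the newly obtained memory and read it back. Returns 1 on success. */
int heap_roundtrip(void)
{
    char *buf = grow_heap(4096);

    if (buf == NULL)
        return -1;
    memset(buf, 0x5a, 4096); /* any page faults here are handled by the kernel */
    return buf[0] == 0x5a && buf[4095] == 0x5a;
}
```

The program never sees a physical address at any point; if the kernel moves the backing page, the virtual address held in buf stays valid.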

User to kernel mode big picture?

I have to implement a char device as an LKM.
I know some basics about OS, but I feel I don't have the big picture.
In a C program, when I call a syscall, what I think happens is that the CPU switches to ring0, then goes to the syscall vector and jumps to a function in kernel memory space that handles it. (I think it does int 0x80 with the offset into the syscall vector in eax, but I'm not sure.)
Then I'm in the syscall itself, but I guess that for the kernel it is still the same process as before, only now in kernel mode. I mean the current PCB is that of the process that called the syscall.
So far, so good? Correct me if something is wrong.
Other questions: how can I read/write process memory?
If in the syscall handler I refer to an address, say 0xbfffffff, what does that address mean? Is it a physical one? Some virtual kernel one?
To read/write user-space memory from the kernel, you need to use functions such as get_user or __copy_to_user.
See the User Space Memory Access API of the Linux Kernel.
You can never get to ring0 from a regular process.
You'll have to write a kernel module to get to ring0.
And you never have to deal with any physical addresses, 0xbfffffff represents an address in a virtual address space of your process.
Big picture:
Everything happens in assembly. In Intel assembly, there is a set of privileged instructions which can only be executed in Ring0 mode (http://en.wikipedia.org/wiki/Privilege_level). To make the transition into Ring0 mode, you can use the "Int" or "Sysenter" instruction:
what all happens in sysenter instruction is used in linux?
And then inside Ring0 mode (which is your kernel mode), accessing memory requires the privilege level to match, via the DPL/CPL/RPL attribute bits tagged in the segment register:
http://duartes.org/gustavo/blog/post/cpu-rings-privilege-and-protection/
You may ask how the CPU initializes the memory and registers in the first place: at bootup, the x86 CPU runs in real mode, unprotected (no Ring concept), so everything is possible and a lot of setup work is done there.
As for virtual vs non-virtual memory addresses (or physical addresses): just remember that anything in a register used for memory addressing is always a virtual address (if the MMU is set up and protected mode is enabled). Look at the picture here (notice that anything coming from the CPU is a virtual address; only the memory bus sees physical addresses):
http://en.wikipedia.org/wiki/Memory_management_unit
As for memory separation between userspace and kernel, you can read here:
http://www.inf.fu-berlin.de/lehre/SS01/OS/Lectures/Lecture14.pdf
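The user-to-kernel transition described above can be observed from an ordinary C program without writing a module. This sketch uses the libc syscall() wrapper to enter the kernel with a raw syscall number instead of the getpid() library function; the helper name raw_getpid is hypothetical, and which entry instruction is actually used underneath depends on the architecture and libc.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Enter the kernel directly with the getpid syscall number. The kernel runs
 * the handler on behalf of the calling process and returns its process id.
 */
long raw_getpid(void)
{
    return syscall(SYS_getpid);
}
```

The value returned matches getpid(), which fits the picture in the question: while the handler runs, the current process is still the one that made the call, only now in kernel mode.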
