I have read article of Duartes from: http://duartes.org/gustavo/blog/post/how-the-kernel-manages-your-memory
In part that describes about PTE content, bit [0:11] is different with description in ARMv5 Architecture Reference Manual
Detail is:
Bit [0:11] of the PTE contain:
In Duartes article: bit 0: P (present), bit 1: R/W , bit 2: U/S (user/supervisor),...
In ARMv5 Architecture Reference Manual : Bits[1:0] Identify the type of descriptor (0b11 marks a fine page table descriptor), Bits[4:2]: The meaning of these bits is IMPLEMENTATION DEFINED,...
(Refer at: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0198e/I16780.html ) . I think the Second-level descriptor in ARMv5 Architecture Reference Manual is corresponding with PTE in Duartes's article.
So, question is the PTE descriptor is depending on platform (x86, ARM,...) ?.
For me, I think PTE descriptor should not depend on platform.
Thanks
As each architecture implement their MMU (memory management unit) differently, the PTE descriptor is architecture dependent.
If we look at Linux, it has a three level page table structure (inherited from the x86 architecture), which in most ARM platforms is wrapped to fit a two level page table structure (newer ARM have support for 3 levels). Linux also uses "dirty" and "accessed" bit that is available in the x86 architecture for the memory management logic of the kernel. These bits are not available in the ARM architecture, which ARM Linux has solved by emulating it in software. This is done by having two versions of the PTE page tables. One for the OS which contains these missing "bits", and one for the actual HW to use.
In the end, the Linux OS for different architectures behave the same. It's all about how the OS are using the hardware mechanisms that the specific architecture is offering as there are pro and cons for each.
The ARM Linux code is different depending on the type of ARM and other conditionals. pgtable.h, page.h and mostly pgtable-2level.h give some details. There are two versions of PTE values; one for Linux and one for hardware.
Related
In Cortex-A7 TRM, we can access internal L1 cache related memory via CP15 instructions. We can retreive Tag RAM/Dirty RAM MOESI state of specific cache line. As stated here.
However, it is not mentioned anywhere the detail about the 4-bit MOESI encoding (e.g. 000 refer to what state, etc.). Not anywhere in Armv7-A TRM either. Also, it said 4-bit, but won't 3-bit suffice to encode the 5 MOESI states (UC,UD,SC,SD,I).
Did I miss something?
It seems like ARM only provide these information to its Debug and Silicon partners. I posted on ARM Community with an answer : https://community.arm.com/thread/10498
A (hopefully) simple question.
Can I create my MMU page tables in ARM tightly coupled memory, or is there a restriction that prevents me doing this.
I have 16k of data TCM that seems quite suitable for this task (the instruction TCM will contain my secure world code), but I'm getting abort exceptions when enabling the MMU.
When I compile my secure world code to target SRAM everything works as expected. The problem is that on this SoC SRAM is available from an FPGA-like device that does not respect TrustZone at all.
Am I missing something here, or do I need to carve off a small piece of RAM for myself to get this all working?
I'm working on an ARM1176JZ-S.
So, many years later I found my answer in the ARM Architecture Reference Manual for ARMv6 while browsing around for something completely different!
In section B4.7.3, Page table translation in VMSAv6 (page B4-25), the answer is mentioned just after discussing whether L1 cache is used in the page table walk, and it says:
Hardware page table walks cannot cause reads from TCM.
In Linux OS, after enable the page table, kernel will only map PTEs belong to kernel space once and never remap them again ? This action is opposite with PTEs in the user space which needs to remap every time process switching happening ?
So, I want know the difference in management of PTEs in kernel and user space.
This question is a extended part from the question at:
Page table in Linux kernel space during boot
Each process has its own page tables (although the parts that describe the kernel's address space are the same and are shared.)
On a process switch, the CPU is told the address of the new table (this is a single pointer which is written to the CR3 register on x86 CPUs).
So, I want know the difference in management of PTEs in kernel and user space.
See these related questions,
Does Linux use self map for page tables?
Linux Virtual memory
Kernel developer on memory management
Position independent code and shared libraries
There are many optimizations to this,
Each task has a different PGD, but PTE values maybe shared between processes, so large chunks of memory can be mapped the same for each process; only the top-level directory (CR3 on x86, TTB on ARM) is updated.
Also, many CPUs have a TLB and cache. These need to be maintained with the memory mapping. Some caches are VIVT, VIPT and PIPT. The first two have to have some cache flushing iff the PGD and/or PTE change. Often a CPU will support a process, thread or domain id. The OS only needs to switch this register during a context switch. The hardware cache and TLB entries must contains tags with the process, thread, or domain id. This is an implementation detail for each architecture.
So it is possible that TLB flushes could be needed when a top level page registers changes. The CPU could flush the entire TLB when this happens. However, this would be a disadvantage to pages that remain mapped.
Also, sub-sections of memory can be the same. A loader or other library can use mmap to create code that is similar between processes. This common code may not need to be swapped at the page table level, depending on architecture, loader and Linux version. It could of course have a virtual alias and then it needs to be swapped.
And the final point to the answer; kernel pages are always mapped. Only a non-preemptive OS could not map the kernel, but that would make little sense as every process wants to call the kernel. I guess the micro-kernel paradigm allows for device drivers to unload when they are not in use. Linux uses module loading to handle this.
Compiled Linux kernel 2.6.34.3 for ARMv7 (Cortex-a8)
I looked into the kernel code and it looks like the Linux kernel sets the hardware page tables for the kernel address space (everything over 0xC0000000)on TTB1 (translation table base) and the user process on ttb0 (everything under 0xC0000000) which changes for every process context switch. Is this correct? I'm still confused how the MMU knows which ttb to look at for translations?
I read that the TTBCR (translation table base control register) determines which of the ttb register to walk when an MVA is not found, however the register always reads 0 which means always use TTBR0 in the ARM architecture reference manual. How is that possible? Can anyone explain to me how the Linux kernel uses these two ttbs?
I read how the ttb works from this site https://www.cs.rutgers.edu/~pxk/416/notes/10-paging.html but I still dont understand how the kernel use the two ttbs
(Double checked the kernel code, for some reason both ttb0 and ttb1 is set, but it seems like ttb1 is never used, i set the TTB1 register to 0 and the Linux kernel continue to run as usual)
The TTBR registers are used together to determine addressing for the full 32-bit or 40-bit address space. Which register is used for what address ranges is controlled via the tXsz bits in the TTBCR. There is an entry for t0sz corresponding to TTBR0 and t1sz for TTBR1.
The page tables addressed by each TTBRx register are independent, but you typically find most Linux implementations just use TTBR0. Linux expects to be able to use a 3G/1G address space partitioning scheme, which is not supported by ARM. If you look at page B3-1345 of the ARMv7 Architecture Reference Manual, you'll see that the value of t0sz and t1sz determine the address ranges supported by TTBR0 and TTBR1 respectively. To add confusion to disorientation, it is even possible to have disjoined address spaces where TTBR0 and TTBR1 support ranges that are not contiguous, resulting in a hole in the system address space. Good times!
To answer your main question though, it is recommended by ARM that TTBR0 be used to store the offset to the page tables used by USER processes, and TTBR1 be used to store the offset to the page tables used by the KERNEL. I have yet to see a single implementation that actually does this. Almost exclusively TTBR0 is used in all cases, with TTBR1 containing a duplicate copy of the L1 tables.
So how does this work? The value of TTBR is stored as part of the process state and simply restored each time a process with switched out. This is how it is expected to work. Originally, TTBR1 would hold a constant value for the kernel tables and never be replaced or swapped out, whereas TTBR0 would be changed each time you context switch between processes. Apparently most Linux implementations for ARM have decided to just basically eliminate the use of TTBR1 and stick to using TTBR0 for everything.
If you want to test this theory on your device, try whacking TTBR1 and watch nothing happen. Then try whacking TTBR0 and watch your system crash. I've yet to encounter a single instance that didn't result in this exact same result. Long story short, TTBR1 is useless by Linux, and TTBR0 is used almost exclusively and simply swapped out.
Now, once you get to LPAE support, throw all this away and start over again. This is the implementation where you will start to see the value of t0sz and t1sz being something other than zero, and hence N as well.
I have very little knowledge about ARM architecture, but from what I read in your enclosed link, then I guess Linux implements its virtual-memory management that way:
High-order bits of the virtual address determine which one to use. The base of the table is stored in one of two base registers (TTBR0 or TTBR1), depending on whether the topmost n bits of the virtual address are 0 (use TTBR0) or not (use TTBR1). The value for n is defined by the Translation Table Base Control Register (TTBCR).
The register TTBCR tells which addresses will be translated from page-tables pointed to by TTBR0 or TTBR1. If TTBCR contains 0xc000000, then any address from 0 to 0xbfffffff is translated by the page-table pointed by TTBR0, and any address from 0xc0000000 to 0xffffffff is translated by the page-table pointed by TTBR1. That match the Linux memory-split of 3GB for user process / 1GB for the kernel.
This allows one to have a design where the operating system and memory-mapped I/O are located in the upper part of the address space and managed by the page table in TTBR1 and user processes are in the lower part of memory and managed by the page table in TTB0. On a context switch, the operating system has to change TTBR0 to point to the first-level table for the new process. TTBR1 will still contain the memory map for the operating system and memory-mapped I/O.
Hence, the value of TTBR1 should never change because you want the kernel to be permanently mapped (think of what happens when an interrupt is raised). On the other hand, TTBR0 is modified at every process-switch, it contains the page-table of the current process.
See http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0211k/Bihgfcgf.html
For ARM5 and lower the TTB table is fixed in size and alignment (to 16k). Each level 1 entry represents 1MB. The table entry is 32bits (16k*1M/(32bit/8) = 4GB). The TTBCR controls TTBR0 table size. From the above URL,
Selecting which Translation Table Base Register is used
The Translation Table Base Register is selected as follows:
If N = 0, always use Translation Table Base Register 0.
- This is the default case at reset. It is backwards compatible with ARMv5 or earlier processors.
If N is greater than 0, then:
- if bits [31:32-N] of the Virtual Address are all 0, use Translation Table Base Register 0 otherwise use Translation Table Base Register 1.
So the size of TTBR0 also sets the memory split. For a traditional Linux 3G/1G 1G/3G, the value 2 should be selected. 4kB table == 1G memory == bits 31..30 are zero. For a value of 6 the table is 256byte == 64MB == bits 31..26 are zero.
In Linux parlance these are page global entries (and this splits this page global directory). The entries can point to another table or just be a 1MB segment. The next table entries are page middle Linux directories and then the final page table entries. I think the page middle entries are unused on the ARM.
The MMU hardware doesn't walk the tables every time. There is a TLB (translation look aside buffer). It is like a cache for the MMU tables. When the OS updates these tables, the TLB must be flushed or the processor will use stale entries. Similarly the ARM cache is virtual tagged, so changing the mapping may also mean the cache must be flushed. For these reasons, you never want to change things on a context switch. Shared libraries text (say libc.so) should be the same on a context switch. Hopefully each process has libc.so mapped at the same virtual address. There is a big gain in doing this; lower memory use and good I-cache use.
The domain and PID registers as well as supervisor/user modes can also control memory accesses. These are single registers that can be toggled on a context switch.
See http://lwn.net/images/conf/rtlws11/papers/proc/p01.pdf for info on PID and domain use on the ARMV5. The current Linux source doesn't do exactly like the paper describes. It is entirely possible that Linux doesn't need to use this mechanism and sets the TTBCR to zero so that the VM code for ARM sub-architectures is similar.
Edit: I don't believe the TTBCR functionality can be used to achieve a 3G/1G split. I think the Rutger's page was discussing the TTBCR generically and not in the Linux context. Also, at least the 2.6.38 Linux used domains or DACR but does not use the pid or fcse as it supports a limited number of processes.
http://lwn.net/Articles/106177/ - also referenced on the Rutgers page.
The TTBR0 holds the base address of translation table 0, and information about the memory it occupies.
This is one of the translation tables for the stage 1 translation of memory accesses from modes other than Hyp mode
While in u-boot of my ARM based board (DM368) I mark some kernel partition block manually as bad. U-boot says that it was marked and, for example, while writing/reading kernel image I see it skipping this bad block.
But when I try to write the same partition from within Linux (loaded via NFS) I see that Linux nandwrite command USES this bad block! I checked this in several ways - Linux ignores bad block mark for 100%. But everywhere in the internet it is said that BBT is one for both u-boot and Linux.
So, where is the catch?
OK, the answer is found.
For some unclear reason Texas Instruments, manufacturer of the board DM365EVM which I use for development, provides the kernel with different BBT structure. They defined BBT offset as 2, while all the world, including the provided u-boot, defines this offset as 8.
I wish them a good health for many years.