Why does the GDT have 3 entries for TLS? - linux-kernel

I am reading the book "Understanding the Linux Kernel" and would like to understand the TLS entries in the GDT.
From the book, I gather that each processor's GDT contains 3 entries that describe the processor's current TLS segments.
What I am trying to understand is:
Why are there 3 TLS entries in the GDT table?
I have read that the segment selector %fs is commonly used to refer to a thread-local storage address, in the form (%fs:address). When does the fs register get populated with the TLS entry from the GDT?
Does the processor have a non-programmable register similar to the tr register but for the TLS entry?
I would expect a TLS entry to be modified in a processor's GDT at a process context switch, followed by storing the index of the TLS entry in the %fs register. But based on this description, a processor would only require 1 TLS entry in the GDT.

Related

Why are there 65520 DNP3 source addresses instead of 65536?

DNP3 link-layer source and destination addresses are 16 bits each, so there can be 2^16 = 65536 different addresses. According to the official DNP3 docs, there are 65536 destination addresses, which I understand, but only 65520 source addresses. Why is that? What are the remaining 16 addresses for?
You can verify the above in any DNP3 documentation, or at this link: https://www.ixiacom.com/company/blog/scada-distributed-network-protocol-dnp3
I'm not familiar with DNP3, but I found a specification for a DNP3 link layer protocol implementation at https://library.e.abb.com/public/06e4e2267fd04c3884515a0360210068/1MRK511380-UUS_-_en_Point_list_manual__DNP_650_series_2.1.pdf. See page 36:
1.4.1 Data Link Address: Indicates if the link address is configurable over the entire valid range of 0 to 65,519. Data link addresses 0xFFF0 through
0xFFFF are reserved for broadcast or other special purposes.
While the source doesn't indicate what these 16 addresses are reserved for (possibly as a precaution for future needs), it does indicate that they are reserved.
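The arithmetic behind the two counts can be checked with a short sketch. It only uses the figures quoted above (16-bit addresses, reserved range 0xFFF0 through 0xFFFF); the constant names are made up for illustration.

```python
ADDRESS_BITS = 16        # DNP3 link-layer addresses are 16 bits wide
RESERVED_FIRST = 0xFFF0  # start of the reserved range quoted above
RESERVED_LAST = 0xFFFF   # end of the reserved range quoted above

total = 2 ** ADDRESS_BITS                      # every 16-bit value
reserved = RESERVED_LAST - RESERVED_FIRST + 1  # size of the reserved range
usable = total - reserved                      # addresses left for sources
highest_usable = RESERVED_FIRST - 1            # last configurable address

print(total, reserved, usable, highest_usable)
```

The result matches the manual's "valid range of 0 to 65,519": 65536 values minus the 16 reserved ones leaves 65520.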

ARM v7 memory management unit (MMU) ttbr0 and ttbr1

In the ARMv7 VMSA MMU, there are two sets of translation tables, pointed to by ttbr0 and ttbr1. The 'N' field of the TTBCR register sets which range of virtual addresses is translated by the tables at ttbr0 and which by the tables at ttbr1.
Now, if I set TTBCR.N to 7, the address range covered by the translation table at ttbr0 is 0x00000000 - 0x2000000.
So the first address after 0x2000000 (i.e. 0x2000004?) will use the translation table at ttbr1. In the short-descriptor format of the ARMv7 VMSA, translation tables can map either sections (1MB regions) or supersections (16MB regions).
My question is: what happens if I place a supersection at an address such as 0x1600000?
According to theory, then the address in the range 0x1600000 to 0x2600000 will be mapped to physical address 0x1600000. (But, this won't work as the translation table itself changes at 0x2000000 ?)
So what happens in this scenario? Also what should be placed at the first entry of ttbr1 in this case?
I think it is a programming error, and the page tables should not be set up like this, with the address block of one region overlapping the other. Consider this: you have set the 16MB starting at VA 0x1600000 to be a 16MB supersection, and you access location 0x1600000. Your TLB will now hold a virtual-to-physical mapping for a 16MB supersection starting from 0x1600000. Next, say you access memory location 0x2000000. The TLB lookup happens first and a matching entry is found, so no page table walk takes place. You might have mapped 0x2000000 onwards to some other physical address space, in which case such an access could end up addressing an unintended location.
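The numbers in this scenario can be checked with a small sketch. It assumes that TTBR0 covers the low 2^(32-N) bytes when N > 0, and that a supersection has to start on a 16MB boundary, as I understand the short-descriptor format to require. The helper names are made up for illustration.

```python
SUPERSECTION_SIZE = 16 * 1024 * 1024  # 16 MB

def ttbr0_limit(n):
    """First virtual address no longer covered by TTBR0 (assumes N > 0)."""
    return 1 << (32 - n)

def straddles(start, size, boundary):
    """True if [start, start + size) has addresses on both sides of boundary."""
    return start < boundary < start + size

boundary = ttbr0_limit(7)                      # TTBCR.N = 7, as in the question
start = 0x1600000                              # address from the question
is_aligned = start % SUPERSECTION_SIZE == 0    # assumed alignment requirement
crosses = straddles(start, SUPERSECTION_SIZE, boundary)

# Nearest 16 MB-aligned block that contains the address from the question.
aligned_start = start - start % SUPERSECTION_SIZE
aligned_crosses = straddles(aligned_start, SUPERSECTION_SIZE, boundary)

print(hex(boundary), is_aligned, crosses, hex(aligned_start), aligned_crosses)
```

Under these assumptions, a block starting at 0x1600000 would cross the TTBR0/TTBR1 boundary, but it is not 16MB-aligned in the first place. An aligned 16MB block cannot cross the boundary here, because the boundary itself is a multiple of 16MB.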

Page table in Linux kernel space during boot

I am confused about page table management in the Linux kernel.
In Linux kernel space, before the page table is turned on, the kernel runs in virtual memory with a 1-1 mapping mechanism. After the page table is turned on, the kernel has to consult the page tables to translate a virtual address into a physical memory address.
My questions are:
At this time, after turning on the page table, is kernel space still 1GB (from 0xC0000000 - 0xFFFFFFFF)?
In the page tables of a kernel process, are only the page table entries (PTEs) in the range 0xC0000000 - 0xFFFFFFFF mapped? Are PTEs outside this range left unmapped because kernel code never jumps there?
Is the address mapping the same before and after turning on the page table?
E.g. before turning on the page table, the virtual address 0xC00000FF is mapped to physical address 0x000000FF; after turning on the page table, this mapping does not change, and virtual address 0xC00000FF is still mapped to physical address 0x000000FF. The only difference is that after turning on the page table, the CPU has to consult the page table to translate a virtual address to a physical address, which was not needed before.
Is the page table in kernel space global, and is it shared across all processes in the system, including user processes?
Is this mechanism the same on 32-bit x86 and ARM?
The following discussion is based on 32-bit ARM Linux, and version of kernel source code is 3.9
All your questions can be addressed if you go through the procedure of setting up the initial page table (which will be overwritten later by the function paging_init) and turning on the MMU.
When the kernel is first launched by the bootloader, the assembly function stext (in arch/arm/kernel/head.S) is the first function to run. Note that the MMU has not been turned on yet at this moment.
Among other things, the important jobs done by stext are:
create the initial page table (which will be overwritten later by the function paging_init)
turn on the MMU
jump to the C part of the kernel initialization code and carry on
Before delving into your questions, it is beneficial to know:
Before the MMU is turned on, every address issued by the CPU is a physical address
After the MMU is turned on, every address issued by the CPU is a virtual address
A proper page table should be set up before turning on the MMU, otherwise your code will simply "be blown away"
By convention, the Linux kernel uses the higher 1GB part of the virtual address space and user land uses the lower 3GB part
Now the tricky part:
First trick: using position-independent code.
The assembly function stext is linked to the address "PAGE_OFFSET + TEXT_OFFSET" (0xCxxxxxxx), which is a virtual address. However, since the MMU has not been turned on yet, the address where stext actually runs is "PHYS_OFFSET + TEXT_OFFSET" (the value depends on your hardware), which is a physical address.
So, here is the thing: the code of stext "thinks" that it is running at an address like 0xCxxxxxxx, but it is actually running at (0x00000000 + some_offset) (say your hardware configures 0x00000000 as the starting point of RAM). So before turning on the MMU, the assembly code needs to be written very carefully to make sure that nothing goes wrong during execution. In fact, a technique called position-independent code (PIC) is used.
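The relationship between the link address and the run address can be written down as plain arithmetic. This is a minimal sketch: PHYS_OFFSET = 0 follows the example above, while TEXT_OFFSET = 0x8000 is only an assumed value for illustration and may differ on your platform.

```python
PAGE_OFFSET = 0xC0000000  # start of kernel virtual address space
PHYS_OFFSET = 0x00000000  # assumed start of RAM, as in the example above
TEXT_OFFSET = 0x00008000  # assumed value, for illustration only

def virt_to_phys(vaddr):
    """Linear conversion between the kernel's link and run addresses."""
    return vaddr - PAGE_OFFSET + PHYS_OFFSET

link_address = PAGE_OFFSET + TEXT_OFFSET   # where stext "thinks" it runs
run_address = PHYS_OFFSET + TEXT_OFFSET    # where it runs before the MMU is on

print(hex(link_address), hex(run_address))
```

The two addresses differ by the constant PAGE_OFFSET - PHYS_OFFSET, which is why code running before the MMU is enabled must not rely on absolute link-time addresses.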
To further explain the above, I extract several assembly code snippets:
ldr r13, =__mmap_switched @ address to jump to after MMU has been enabled
b __enable_mmu @ jump to function "__enable_mmu" to turn on MMU
Note that the above "ldr" instruction is a pseudo instruction which means "get the (virtual) address of function __mmap_switched and put it into r13"
And function __enable_mmu in turn calls function __turn_mmu_on:
(Note that I removed several instructions from function __turn_mmu_on which are essential instructions to the function but not of our interest)
ENTRY(__turn_mmu_on)
@ Writing the control register is where the MMU is turned on; after this
@ instruction, every address issued by the CPU is a virtual address that
@ the MMU translates.
mcr p15, 0, r0, c1, c0, 0 @ write control reg to enable MMU
mov r3, r13 @ r13 holds the (virtual) address to jump to, (0xC0000000 + some_offset)
mov pc, r3 @ a long jump
ENDPROC(__turn_mmu_on)
Second trick: identical mapping when setting up initial page table before turning on MMU.
More specifically, the same address range where kernel code is running is mapped twice.
The first mapping, as expected, maps the virtual range 0xCxxxxxxx through (0xCxxxxxxx + offset) to the physical range 0x00000000 (again, this address depends on the hardware configuration) through (0x00000000 + offset)
The second mapping, interestingly, maps the address range 0x00000000 through (0x00000000 + offset) to itself, so that virtual and physical addresses are identical
Why do that?
Remember that before the MMU is turned on, every address issued by the CPU is a physical address (starting at 0x00000000), and after the MMU is turned on, every address issued by the CPU is a virtual address (starting at 0xC0000000).
Because the ARM is pipelined, at the moment the MMU is turned on there are still instructions in the pipeline that use (physical) addresses generated by the CPU before the MMU was turned on! To keep these instructions from blowing up, an identity mapping has to be set up to cater for them.
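The effect of the double mapping can be illustrated with a toy model of a section table. This is only a sketch, not the kernel's __create_page_tables: the dictionary, the helper names, the 4MB size and the sample program-counter values are all made up for illustration.

```python
SECTION_SHIFT = 20                 # 1 MB sections
SECTION_SIZE = 1 << SECTION_SHIFT
PAGE_OFFSET = 0xC0000000
PHYS_OFFSET = 0x00000000           # assumed start of RAM

def build_boot_table(size, identity):
    """Map the kernel range at PAGE_OFFSET and, optionally, onto itself."""
    table = {}
    for off in range(0, size, SECTION_SIZE):
        phys = PHYS_OFFSET + off
        table[(PAGE_OFFSET + off) >> SECTION_SHIFT] = phys   # first mapping
        if identity:
            table[phys >> SECTION_SHIFT] = phys              # second mapping
    return table

def translate(table, vaddr):
    """Look up the section entry and add the offset within the section."""
    base = table.get(vaddr >> SECTION_SHIFT)
    if base is None:
        raise LookupError("translation fault at 0x%08X" % vaddr)
    return base + (vaddr & (SECTION_SIZE - 1))

with_identity = build_boot_table(4 * SECTION_SIZE, identity=True)
without_identity = build_boot_table(4 * SECTION_SIZE, identity=False)

pc_before_jump = 0x00008000   # still a "physical-looking" address
pc_after_jump = 0xC0008000    # address used after the long jump

print(hex(translate(with_identity, pc_before_jump)))
print(hex(translate(with_identity, pc_after_jump)))
```

With both mappings in place, the old and the new program-counter values resolve to the same physical location. Without the identity mapping, the first fetch after enabling the MMU would fault in this model.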
Now returning to your questions:
At this time, after turning on page table, kernel space is still 1GB (from 0xC0000000 - 0xFFFFFFFF ) ?
A: I guess you mean turning on the MMU. The answer is yes, kernel space is 1GB (actually it also occupies several megabytes below 0xC0000000, but that is not of interest here).
In the page tables of a kernel process, are only the page table entries (PTEs) in the range 0xC0000000 - 0xFFFFFFFF mapped? Are PTEs outside this range left unmapped because kernel code never jumps there?
A: The answer to this question is quite complicated because it involves a lot of details regarding specific kernel configurations.
To fully answer this question, you need to read the part of the kernel source code that sets up the initial page table (the assembly function __create_page_tables) and the function that sets up the final page table (the C function paging_init).
To put it simply, there are two levels of page table on ARM. The first-level page table is the PGD, which occupies 16KB. The kernel first zeros out this PGD during initialization and does the initial mapping in the assembly function __create_page_tables, where only a very small portion of the address space is mapped.
After that, the final page table is set up in paging_init, where quite a large portion of the address space is mapped. Say you only have 512MB of RAM: for most common configurations, this 512MB would be mapped by kernel code section by section (1 section is 1MB). If your RAM is quite large (say 2GB), only a portion of it will be directly mapped.
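The sizes mentioned here fit together as simple arithmetic. The sketch below assumes 4-byte first-level entries that each cover a 1MB section, and it ignores everything else that competes for kernel address space (vmalloc area, fixed mappings and so on); the names are made up for illustration.

```python
PGD_BYTES = 16 * 1024      # size of the first-level table
ENTRY_BYTES = 4            # assumed size of one first-level entry
SECTION_BYTES = 1 << 20    # each entry covers a 1 MB section
PAGE_OFFSET = 0xC0000000

pgd_entries = PGD_BYTES // ENTRY_BYTES            # entries in the PGD
address_space = pgd_entries * SECTION_BYTES       # total range covered

def pgd_index(vaddr):
    """Index of the first-level entry that covers vaddr."""
    return vaddr >> 20

def sections_needed(ram_bytes):
    """Number of 1 MB sections needed to map ram_bytes (rounded up)."""
    return -(-ram_bytes // SECTION_BYTES)

first_kernel_entry = pgd_index(PAGE_OFFSET)
kernel_entries = pgd_entries - first_kernel_entry
sections_for_512mb = sections_needed(512 << 20)
sections_for_2gb = sections_needed(2 << 30)

print(pgd_entries, kernel_entries, sections_for_512mb, sections_for_2gb)
```

In this simplified view, 512MB needs 512 of the 1024 kernel entries, while 2GB would need 2048, which is more than the upper 1GB can hold. That is consistent with only a portion of a large RAM being directly mapped.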
(I will stop here because there are too many details regarding Question 2)
Mapping address before and after turning on page table is same ?
A: I think I've already answered this question in my explanation of "Second trick: identical mapping when setting up initial page table before turning on MMU."
4. The page table in kernel space is global and will be shared across all process in the system including user process ?
A: Yes and no. Yes, because all processes share the same content for the kernel page table (the higher 1GB part). No, because each process uses its own 16KB of memory to store the page table (although the content for the higher 1GB part is identical for every process).
5. This mechanism is same in x86 32bit and ARM ?
A: Different architectures use different mechanisms.
When Linux enables the MMU, it is only required that the virtual addresses of the kernel space are mapped. This happens very early in booting; at this point, there is no user space. Nothing prevents the MMU from mapping multiple virtual addresses to the same physical address. So, when enabling the MMU, it is simplest to have both a virt==phys mapping for the kernel code space and the link==phys mapping, i.e. the 0xC0000000 mapping.
Mapping address before and after turning on page table is same ?
If the physical code address is 0xff and the final link address is 0xc00000ff, then we have a duplicate mapping when turning on the MMU. Both 0xff and 0xc00000ff map to the same physical page. A simple jmp (jump) or b (branch) will move from one address space to the other. At this point, the virt==phys mapping can be removed, as we are executing at the final destination address.
I think the above should answer points 1 through 3. Basically, the booting page tables are not the final page tables.
4 . The page table in kernel space is global and will be shared across all process in the system including user process?
Yes, this is a big win with a VIVT cache and for many other reasons.
5 . This mechanism is same in x86 32bit and ARM?
Of course the underlying mechanics are different. They are different even for different processors within these families; 486 vs P4 vs Amd-K6; ARM926 vs Cortex-A5 vs Cortex-A8, etc. However, the semantics are very similar.
See: Bootmem at lwn.net, an article on the early Linux memory phase.
Depending on the version, different memory pools and page table mappings are active during boot. The mappings we are all familiar with do not need to be in place until init runs.

Linux kernel ARM Translation table base (TTB0 and TTB1)

I compiled Linux kernel 2.6.34.3 for ARMv7 (Cortex-A8).
I looked into the kernel code, and it looks like the Linux kernel sets up the hardware page tables for the kernel address space (everything over 0xC0000000) in TTB1 (translation table base) and those of the user process (everything under 0xC0000000) in TTB0, which changes on every process context switch. Is this correct? I'm still confused about how the MMU knows which TTB to look at for translations.
I read that the TTBCR (translation table base control register) determines which of the TTB registers to walk when an MVA is not found. However, the register always reads 0, which according to the ARM Architecture Reference Manual means always use TTBR0. How is that possible? Can anyone explain to me how the Linux kernel uses these two TTBs?
I read how the TTB works from this site https://www.cs.rutgers.edu/~pxk/416/notes/10-paging.html but I still don't understand how the kernel uses the two TTBs.
(I double-checked the kernel code. For some reason both TTB0 and TTB1 are set, but it seems like TTB1 is never used: I set the TTB1 register to 0 and the Linux kernel continued to run as usual.)
The TTBR registers are used together to determine addressing for the full 32-bit or 40-bit address space. Which register is used for what address ranges is controlled via the tXsz bits in the TTBCR. There is an entry for t0sz corresponding to TTBR0 and t1sz for TTBR1.
The page tables addressed by each TTBRx register are independent, but you typically find most Linux implementations just use TTBR0. Linux expects to be able to use a 3G/1G address space partitioning scheme, which is not supported by ARM. If you look at page B3-1345 of the ARMv7 Architecture Reference Manual, you'll see that the value of t0sz and t1sz determine the address ranges supported by TTBR0 and TTBR1 respectively. To add confusion to disorientation, it is even possible to have disjoined address spaces where TTBR0 and TTBR1 support ranges that are not contiguous, resulting in a hole in the system address space. Good times!
To answer your main question though, it is recommended by ARM that TTBR0 be used to store the offset to the page tables used by USER processes, and TTBR1 be used to store the offset to the page tables used by the KERNEL. I have yet to see a single implementation that actually does this. Almost exclusively TTBR0 is used in all cases, with TTBR1 containing a duplicate copy of the L1 tables.
So how does this work? The value of TTBR is stored as part of the process state and simply restored on each process switch. This is how it is expected to work. Originally, TTBR1 would hold a constant value for the kernel tables and never be replaced or swapped out, whereas TTBR0 would be changed each time you context switch between processes. Apparently most Linux implementations for ARM have decided to just basically eliminate the use of TTBR1 and stick to using TTBR0 for everything.
If you want to test this theory on your device, try whacking TTBR1 and watch nothing happen. Then try whacking TTBR0 and watch your system crash. I've yet to encounter a single instance that didn't produce this exact result. Long story short, TTBR1 is unused by Linux, and TTBR0 is used almost exclusively and simply swapped out.
Now, once you get to LPAE support, throw all this away and start over again. This is the implementation where you will start to see the value of t0sz and t1sz being something other than zero, and hence N as well.
I have very little knowledge about ARM architecture, but from what I read in your enclosed link, then I guess Linux implements its virtual-memory management that way:
High-order bits of the virtual address determine which one to use. The base of the table is stored in one of two base registers (TTBR0 or TTBR1), depending on whether the topmost n bits of the virtual address are 0 (use TTBR0) or not (use TTBR1). The value for n is defined by the Translation Table Base Control Register (TTBCR).
The TTBCR register tells which addresses will be translated through the page tables pointed to by TTBR0 or TTBR1. If TTBCR contains 0xc0000000, then any address from 0 to 0xbfffffff is translated by the page table pointed to by TTBR0, and any address from 0xc0000000 to 0xffffffff is translated by the page table pointed to by TTBR1. That matches the Linux memory split of 3GB for user processes / 1GB for the kernel.
This allows one to have a design where the operating system and memory-mapped I/O are located in the upper part of the address space and managed by the page table in TTBR1 and user processes are in the lower part of memory and managed by the page table in TTB0. On a context switch, the operating system has to change TTBR0 to point to the first-level table for the new process. TTBR1 will still contain the memory map for the operating system and memory-mapped I/O.
Hence, the value of TTBR1 should never change because you want the kernel to be permanently mapped (think of what happens when an interrupt is raised). On the other hand, TTBR0 is modified at every process-switch, it contains the page-table of the current process.
See http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0211k/Bihgfcgf.html
For ARM5 and lower the TTB table is fixed in size and alignment (to 16k). Each level 1 entry represents 1MB. The table entry is 32bits (16k*1M/(32bit/8) = 4GB). The TTBCR controls TTBR0 table size. From the above URL,
Selecting which Translation Table Base Register is used
The Translation Table Base Register is selected as follows:
If N = 0, always use Translation Table Base Register 0.
- This is the default case at reset. It is backwards compatible with ARMv5 or earlier processors.
If N is greater than 0, then:
- if bits [31:32-N] of the Virtual Address are all 0, use Translation Table Base Register 0 otherwise use Translation Table Base Register 1.
So the size of the TTBR0 table also sets the memory split. For a 1G/3G split, the value 2 should be selected: a 4kB table == 1G of memory == bits 31..30 are zero. For a value of 6, the table is 256 bytes == 64MB == bits 31..26 are zero.
In Linux parlance these are page global directory entries (and this setting splits the page global directory). The entries can point to another table or just be a 1MB section. The next table entries are page middle directory entries and then the final page table entries. I think the page middle entries are unused on the ARM.
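The selection rule quoted above and the resulting table sizes can be modelled in a few lines. This is a sketch of the rule as quoted (short-descriptor format, 32-bit virtual addresses); the function names are made up for illustration.

```python
def select_ttbr(vaddr, n):
    """Return 0 or 1: which TTBR translates vaddr for a given TTBCR.N."""
    if n == 0:
        return 0                      # N = 0: always TTBR0
    top_bits = vaddr >> (32 - n)      # bits [31:32-N] of the address
    return 0 if top_bits == 0 else 1

def ttbr0_table_bytes(n):
    """Size of the TTBR0 first-level table: 16 KB shrunk by 2^N."""
    return (16 * 1024) >> n

def ttbr0_coverage(n):
    """Bytes of virtual address space translated through TTBR0."""
    return (1 << 32) >> n

for n in (0, 2, 6):
    print(n, ttbr0_table_bytes(n), hex(ttbr0_coverage(n)))
```

Because TTBR0 always covers a power-of-two region at the bottom of the address space, N = 2 yields a 1GB/3GB split; a 3GB region under TTBR0 is not expressible with this rule.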
The MMU hardware doesn't walk the tables every time. There is a TLB (translation look aside buffer). It is like a cache for the MMU tables. When the OS updates these tables, the TLB must be flushed or the processor will use stale entries. Similarly the ARM cache is virtual tagged, so changing the mapping may also mean the cache must be flushed. For these reasons, you never want to change things on a context switch. Shared libraries text (say libc.so) should be the same on a context switch. Hopefully each process has libc.so mapped at the same virtual address. There is a big gain in doing this; lower memory use and good I-cache use.
The domain and PID registers as well as supervisor/user modes can also control memory accesses. These are single registers that can be toggled on a context switch.
See http://lwn.net/images/conf/rtlws11/papers/proc/p01.pdf for info on PID and domain use on the ARMV5. The current Linux source doesn't do exactly like the paper describes. It is entirely possible that Linux doesn't need to use this mechanism and sets the TTBCR to zero so that the VM code for ARM sub-architectures is similar.
Edit: I don't believe the TTBCR functionality can be used to achieve a 3G/1G split. I think the Rutgers page was discussing the TTBCR generically and not in the Linux context. Also, at least the 2.6.38 Linux used domains or DACR but does not use the pid or fcse as it supports a limited number of processes.
http://lwn.net/Articles/106177/ - also referenced on the Rutgers page.
The TTBR0 holds the base address of translation table 0, and information about the memory it occupies.
This is one of the translation tables for the stage 1 translation of memory accesses from modes other than Hyp mode

How are base registers, limit registers and relocation registers used?

My understanding of the address translation process in the MMU (memory management unit):
-> logical address: generated by the CPU. The programmer is concerned with this address.
-> virtual address: resides on the hard disk, as pages.
-> physical address: resides in RAM. It is the actual address.
1: The CPU generates the logical address and sends it to the MMU.
2: The MMU translates the logical address into the virtual address, then translates that into the physical address and sends the physical address to RAM.
3: Whenever RAM is full, a page that is not used frequently is returned to the hard disk, to free memory for other pages (processes).
My questions are:
1) Where is the value of the relocation register added?
2) Who decides the value of the relocation register?
3) What are the base register and limit register for, and how are they used?
4) Where does the logical address go?
I would be grateful if anybody could answer.
Please let me know if I have misunderstood anything on this topic.
-thanks
I can tell you how this works on x86.
All programs in non-64-bit modes operate with addresses that combine two items: a segment selector (for brevity "selector" is often omitted in text, which may be confusing) and an offset. This selector:offset pair is called the logical address.
The selector portion isn't always explicitly specified or manipulated in code, since the CPU has "default" associations of segment registers containing selectors with specific instructions or specific instruction encodings. It's also uncommon to manipulate selectors in 32-bit mode, but it is very often necessary in 16-bit code.
The virtual address is formed from the logical address either "directly" (in real or 8086 virtual mode) or "indirectly" (in protected mode).
"Direct" virtual address = selector * 16 + offset.
"Indirect" virtual address = SegmentDescriptorTable[selector].Base + offset.
SegmentDescriptorTable is either the Global Descriptor Table (AKA GDT) or the Local Descriptor Table (AKA LDT). It's set up by the OS and describes the location and size of various segments of memory. selector is used to select a segment in the table. The Base entry of the table tells the segment's beginning (virtual address). The Limit entry tells the segment size (generally; the details are a little more complex).
When a program tries to access memory with an offset that would reach beyond the end of the segment (the CPU compares the offset with Limit), the CPU generates an exception and the OS handles it, usually by terminating the program.
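The "indirect" formula and the limit check can be sketched as follows. This is a simplified model: the descriptor table contents are made up, the limit is treated as the highest valid byte offset, and details such as the granularity bit, expand-down segments and access rights are ignored.

```python
class SegmentFault(Exception):
    """Stands in for the exception the CPU raises on a limit violation."""

# Hypothetical descriptor table: index -> (base, limit).
DESCRIPTOR_TABLE = [
    None,                     # entry 0: the null descriptor
    (0x00000000, 0xFFFFF),    # entry 1: made-up code segment
    (0x00100000, 0x0FFFF),    # entry 2: made-up 64 KB data segment
]

def selector_index(selector):
    """Bits 15..3 of a selector hold the descriptor table index."""
    return selector >> 3

def translate(selector, offset):
    """Virtual address = descriptor base + offset, after the limit check."""
    descriptor = DESCRIPTOR_TABLE[selector_index(selector)]
    if descriptor is None:
        raise SegmentFault("null descriptor")
    base, limit = descriptor
    if offset > limit:
        raise SegmentFault("offset 0x%X beyond limit 0x%X" % (offset, limit))
    return base + offset

inside = translate(0x10, 0x1234)      # selector 0x10 selects entry 2
try:
    translate(0x10, 0x10000)          # one byte past the 64 KB limit
    outside_faulted = False
except SegmentFault:
    outside_faulted = True

print(hex(inside), outside_faulted)
```

The program only ever supplies the offset; moving the segment means changing the base in the table, which is the relocation use discussed in this answer.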
Btw, in real/v86 mode, even though the virtual address is formed directly from selector:offset, there's still a 16-bit Limit imposed on offsets, which is why you need to use a different selector to access more than 64KB of memory.
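The real-mode formula and its 64KB consequence can be checked with a short sketch (the function name and sample values are made up for illustration):

```python
def real_mode_address(selector, offset):
    """'Direct' address: selector * 16 + offset, with a 16-bit offset."""
    if not 0 <= offset <= 0xFFFF:
        raise ValueError("offset does not fit the 16-bit limit")
    return selector * 16 + offset

# The last byte reachable through selector 0x1000, and the byte after it,
# which needs a different selector.
last_byte = real_mode_address(0x1000, 0xFFFF)
next_byte = real_mode_address(0x2000, 0x0000)

# Different selector:offset pairs can name the same address.
alias_a = real_mode_address(0x1234, 0x0010)
alias_b = real_mode_address(0x1235, 0x0000)

print(hex(last_byte), hex(next_byte), hex(alias_a), hex(alias_b))
```

Since one selector reaches only 64KB, code that needs more memory has to load a different selector, and changing the selector is also how a program is relocated in this mode.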
The Base entry in a segment descriptor can be used to either isolate the segment from the rest of the memory (Limit helps here) or to place or move the entire segment to an arbitrary virtual address without having to modify anything (or much) in the program it belongs to (if we're moving a segment, the data has to be moved in the memory, obviously). Basically, it can be used for relocation purposes. In real/v86 mode for relocation purposes the selector is changed.
The virtual address can be further translated to the physical address if the CPU is running in protected mode and has set up page tables. If there're no page tables, the physical address is the same as the virtual address. The translation is done in blocks of physical memory and address ranges that are called pages (often 4KB).
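As an illustration of translation in page-sized blocks, the sketch below splits a 32-bit virtual address the way classic (non-PAE) x86 paging with 4KB pages does: 10 bits of page directory index, 10 bits of page table index and 12 bits of offset. The function names and the sample address are made up.

```python
PAGE_SHIFT = 12                      # 4 KB pages
PAGE_SIZE = 1 << PAGE_SHIFT

def split_virtual_address(vaddr):
    """Return (directory index, table index, offset within the page)."""
    directory_index = vaddr >> 22
    table_index = (vaddr >> PAGE_SHIFT) & 0x3FF
    offset = vaddr & (PAGE_SIZE - 1)
    return directory_index, table_index, offset

def physical_address(frame_number, offset):
    """Combine the frame found in the page table with the page offset."""
    return (frame_number << PAGE_SHIFT) | offset

directory_index, table_index, offset = split_virtual_address(0xC0101ABC)
# 0x12345 is an arbitrary frame number standing in for a page table entry.
phys = physical_address(0x12345, offset)

print(directory_index, table_index, hex(offset), hex(phys))
```

Only the upper 20 bits are translated; the low 12 bits pass through unchanged, which is why translation works on whole pages.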
There's no dedicated relocation register on x86 CPUs. Relocation can be achieved by adjusting:
segment selectors in CPU registers or program's code
segment base addresses in GDT/LDT
offsets in program's code
physical addresses in page tables
As for the claim that virtual addresses reside on the hard disk as pages, I'm not sure what exactly you mean by this, but just because there's virtual-to-physical address translation, it doesn't mean there's also virtual on-disk memory. There are other uses for the translation besides virtual on-disk memory. And the addresses reside in the CPU and wherever your (and the OS's) code writes them to, not necessarily on the disk.
Your description has a number of mistakes, many of which may be the result of imprecise documentation and common usage.
First of all, there really is no such thing as a virtual address. There are physical and logical addresses. Sadly, the term virtual address is frequently (even in hardware documentation) used when logical address is what is meant.
The CPU instruction stream always operates on logical addresses (values may refer to physical addresses).
When the CPU needs to access a logical address, the MMU attempts to translate it to a physical address. It does that by looking up the address in a page table.
Several things can happen at that point:
There may not be a page table entry for the address => Access violation.
The page table entry is marked invalid => Access violation.
The page table entry indicates that no physical memory is mapped to it => Page fault.
(I omit mode access checks).
It is at this last step that virtual memory comes into play. At that point, the page fault handler of the operating system needs to find where the corresponding page has been stored on disk, load it, update the page table, and restart the instruction.
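The three outcomes and the page fault path can be modelled with a toy page table. Everything here is hypothetical: the dictionary layout, the exception classes and the handler are a sketch of the description above, not any real operating system's interface.

```python
class AccessViolation(Exception):
    """No entry for the address, or the entry is marked invalid."""

class PageFault(Exception):
    """The entry is valid but no physical memory is mapped to it."""

def lookup(page_table, page):
    """Return the physical frame for a page, or raise one of the faults."""
    entry = page_table.get(page)
    if entry is None:
        raise AccessViolation("no page table entry")
    if not entry["valid"]:
        raise AccessViolation("entry marked invalid")
    if entry["frame"] is None:
        raise PageFault(page)
    return entry["frame"]

def access(page_table, disk, free_frames, page):
    """Look up a page; on a page fault, load it and restart the lookup."""
    try:
        return lookup(page_table, page)
    except PageFault:
        _contents = disk[page]                # find the page on disk
        frame = free_frames.pop()             # pick a free physical frame
        page_table[page]["frame"] = frame     # update the page table
        return lookup(page_table, page)       # restart the access

page_table = {
    1: {"valid": True, "frame": 7},      # resident page
    2: {"valid": True, "frame": None},   # valid but paged out
    3: {"valid": False, "frame": None},  # invalid entry
}
disk = {2: b"saved page contents"}
free_frames = [9]

resident = access(page_table, disk, free_frames, 1)
paged_in = access(page_table, disk, free_frames, 2)
print(resident, paged_in)
```

After the fault is handled, the entry for page 2 points at a real frame, so later accesses to it succeed without involving the handler.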
The operating system manages the available physical memory by paging writeable memory (that has changed) to disk (read only data does not have to be written back) when there is high demand for physical memory.
I have never heard of a "relocation register" before. But doing a Google search, I can see that some academic material uses it as a confusing pedagogical concept (i.e., with no relation to reality).
Some systems define the page table using base and limit registers. The base register indicates where the page table starts in memory (this can be either a physical or a logical address) and the limit register indicates the size of the table.
The registers are usually not loaded directly. Their values are usually written to the hardware Process Context Block (PCB). When the process context is loaded, the page table base and limit are loaded automatically.
On some systems there are multiple page tables. If there are system and user page tables, the user page tables can refer to logical addresses in the system space and the system page tables refer to physical addresses.
