Where is the segment table stored? - memory-management

In the segmentation scheme, every time a memory access is made, the MMU translates the virtual address to the actual address by looking up the segment table.
Is the segment table stored inside the TLB or in RAM ?

Is the segment table stored inside the TLB or in RAM ?
This depends on which type of CPU and which mode the CPU is in.
For 80x86, when a segment register is loaded the CPU stores "base address, limit and attributes" for the segment in a hidden part of the segment register.
For real mode, virtual 8086 mode and system management mode, when a segment register is loaded the CPU just does "hidden segment base = segment value * 16" and there are no tables in RAM.
For protected mode and long mode, when a segment register is loaded the CPU uses the value being loaded into the segment register as an index into a table in RAM, and (after doing protection checks) loads the "base address, limit and attributes" information from the corresponding table entry into the hidden part of the segment register.
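To make the "table in RAM" step concrete, here's a minimal C sketch (struct and function names are ours, purely illustrative) of how an 8-byte protected-mode descriptor unpacks into the "base address, limit and attributes" triple that ends up in the hidden part of the segment register:

#include <stdint.h>

typedef struct {
    uint32_t base;
    uint32_t limit;
    uint16_t attrs;   /* type, DPL, present, granularity, ... */
} seg_cache_t;

seg_cache_t decode_descriptor(uint64_t d)
{
    seg_cache_t s;
    s.base  = (uint32_t)((d >> 16) & 0xFFFFFFu)       /* base 23:0   */
            | (uint32_t)(((d >> 56) & 0xFFu) << 24);  /* base 31:24  */
    s.limit = (uint32_t)(d & 0xFFFFu)                 /* limit 15:0  */
            | (uint32_t)(((d >> 48) & 0xFu) << 16);   /* limit 19:16 */
    if (d & (1ULL << 55))                             /* G bit: 4 KiB units */
        s.limit = (s.limit << 12) | 0xFFFu;
    s.attrs = (uint16_t)(((d >> 40) & 0xFFu)          /* access byte  */
            | (((d >> 52) & 0xFu) << 8));             /* flags nibble */
    return s;
}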
Note that (for protected mode) almost nobody used segmentation, because segment register loads are slow (due to protection checks and table lookups); so CPU manufacturers optimised the CPU for "no segmentation" (e.g. if segment bases are zero, instead of doing "linear address = virtual address + segment base" a modern CPU will just do "linear address = virtual address", avoiding the cost of an unnecessary addition and starting the cache/memory lookup sooner) and didn't bother optimising segment register loads much either. When AMD designed long mode, they realised nobody wants segmentation and disabled most of it for 64-bit code (ignoring segment bases for most segment registers to get rid of the extra addition, and ignoring segment limits to get rid of the cost of segment limit checks).

However, operating systems that don't use segmentation were using gs and fs as a hack to get fast access to CPU-specific or thread-specific data (because, unlike some other CPUs, 80x86 doesn't have registers that can only be modified by supervisor code, which would be more convenient for this purpose). So AMD kept the "linear address = virtual address + segment base" behaviour for these 2 segment registers, and added the ability to modify the hidden "base address" part of gs and fs (via MSRs and swapgs) to make it easier to port operating systems (Windows) to long mode.
In other words, for 80x86 there are 3 different ways to set a segment's information (by calculation, by table lookup, or by MSR).
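For the MSR path, here's a sketch in C with GCC inline assembly (ring 0 only; the wrmsr wrapper is our own, but the MSR indices IA32_FS_BASE, IA32_GS_BASE and IA32_KERNEL_GS_BASE are the architecturally defined ones):

#include <stdint.h>

#define IA32_FS_BASE        0xC0000100u
#define IA32_GS_BASE        0xC0000101u
#define IA32_KERNEL_GS_BASE 0xC0000102u  /* what swapgs exchanges with GS base */

static inline void wrmsr(uint32_t msr, uint64_t value)
{
    /* wrmsr takes the MSR index in ECX and the value in EDX:EAX */
    __asm__ volatile("wrmsr" : : "c"(msr),
                     "a"((uint32_t)value), "d"((uint32_t)(value >> 32)));
}

/* e.g. point gs at a per-CPU data block: wrmsr(IA32_GS_BASE, addr); */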
Also note that for most instructions (excluding things like segment register loads) 80x86 CPUs don't care how a segment's information was set and only use the hidden parts of segment registers. This means that the CPU doesn't have to consult a table every time it fetches code from cs and every time it fetches data from memory. It also means that the majority of the CPU doesn't care which mode the CPU is in (e.g. instructions like mov eax,[ds:address] only depend on the values in the hidden part of segment registers and don't depend on the CPU mode); which is why there's no benefit to removing obsolete CPU modes (removing support for real mode wouldn't reduce the size or complexity of the CPU).
For other CPUs: most don't support segmentation (and only support paging or nothing), and I'm not familiar with how it works for any that do support it. However, I doubt any CPU would do a table lookup every time anything is fetched (it'd be far too slow/expensive to be practical); I'd expect that for all CPUs that support segmentation, information for "currently in use" segments is stored internally somehow.

The segment table is consulted whenever memory is used, so it has to be stored somewhere persistent for later use: it is kept in physical memory, i.e. the RAM.

Related

What does the following assembly instruction mean "mov rax,qword ptr gs:[20h]" [duplicate]

So I know what the following registers and their uses are supposed to be:
CS = Code Segment (used for IP)
DS = Data Segment (used for MOV)
ES = Destination Segment (used for MOVS, etc.)
SS = Stack Segment (used for SP)
But what are the following registers intended to be used for?
FS = "File Segment"?
GS = ???
Note: I'm not asking about any particular operating system -- I'm asking about what they were intended to be used for by the CPU, if anything.
There is what they were intended for, and what they are used for by Windows and Linux.
The original intention behind the segment registers was to allow a program to access many different (large) segments of memory that were intended to be independent and part of a persistent virtual store. The idea was taken from the 1966 Multics operating system, that treated files as simply addressable memory segments. No BS "Open file, write record, close file", just "Store this value into that virtual data segment" with dirty page flushing.
Our current 2010 operating systems are a giant step backwards, which is why they are called "Eunuchs". You can only address your process space's single segment, giving a so-called "flat (IMHO dull) address space". The segment registers on the x86-32 machine can still be used for real segment registers, but nobody has bothered (Andy Grove, former Intel president, had a rather famous public fit last century when he figured out after all those Intel engineers spent energy and his money to implement this feature, that nobody was going to use it. Go, Andy!)
AMD, in going to 64 bits, decided they didn't care if they eliminated Multics as a choice (that's the charitable interpretation; the uncharitable one is they were clueless about Multics) and so disabled the general capability of segment registers in 64 bit mode. There was still a need for threads to access thread local store, and each thread needed a pointer ... somewhere in the immediately accessible thread state (e.g., in the registers) ... to thread local store. Since Windows and Linux both used FS and GS (thanks Nick for the clarification) for this purpose in the 32 bit version, AMD decided to let the 64 bit segment registers (GS and FS) be used essentially only for this purpose (I think you can make them point anywhere in your process space; I don't know if the application code can load them or not). Intel, in their panic to not lose market share to AMD on 64 bits, and Andy being retired, decided to just copy AMD's scheme.
It would have been architecturally prettier IMHO to make each thread's memory map have an absolute virtual address (e.g., 0-FFF, say) that was its thread local storage (no [segment] register pointer needed!); I did this in an 8 bit OS back in the 1970s and it was extremely handy, like having another big stack of registers to work in.
So, the segment registers are now kind of like your appendix. They serve a vestigial purpose. To our collective loss.
Those that don't know history aren't doomed to repeat it; they're doomed to doing something dumber.
The registers FS and GS are segment registers. They have no processor-defined purpose; instead, the OS running on the processor gives them one. They are commonly used by OS kernels to access thread-specific or CPU-specific memory: in 64-bit Windows, the GS register points to operating-system-defined, thread-specific structures, while the Linux kernel uses GS to access CPU-specific memory.
On Windows, FS points to the thread information block (TIB) of the current thread.
One typical example is structured exception handling (SEH), which stores a pointer to a callback function at FS:[0x00].
GS is commonly used as a pointer to thread-local storage (TLS).
One example you might have seen before is stack canary protection (StackGuard); with gcc you might see something like this:
mov eax,gs:0x14
mov DWORD PTR [ebp-0xc],eax
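In ordinary C you rarely touch these registers yourself; the compiler emits the fs:/gs:-relative accesses. A minimal sketch using the (non-standard but widely supported) __thread storage class:

/* Each thread gets its own copy of this variable. The compiler
 * addresses it relative to FS (x86-64 Linux) or GS (i386 Linux,
 * x64 Windows), much like the canary access shown above. */
__thread int per_thread_counter = 0;

int bump(void)
{
    return ++per_thread_counter;  /* compiles to an fs:/gs:-relative access */
}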
TL;DR:
What is the “FS”/“GS” register intended for?
Simply to access data beyond the default data segment (DS). Exactly like ES.
The Long Read:
So I know what the following registers and their uses are supposed to be:
[...]
Well, almost, but DS is not 'some' data segment; it's the default one, where all operations take place by default (*1). This is where all default variables are located - essentially data and bss. It's in some ways part of the reason why x86 code is rather compact. All essential data, which is what is accessed most often, (plus code and stack) is within 16 bit shorthand distance.
ES is used to access everything else (*2), everything beyond the 64 KiB of DS. Like the text of a word processor, the cells of a spreadsheet, or the picture data of a graphics program and so on. Contrary to what is often assumed, this data doesn't get accessed as much, so needing a prefix hurts less than using longer address fields.
Similarly, it's only a minor annoyance that DS and ES might have to be loaded (and reloaded) when doing string operations - this at least is offset by one of the best character handling instruction sets of its time.
What really hurts is when user data exceeds 64 KiB and operations have to span segments. While some operations are simply done on a single data item at a time (think A=A*2), most require two (A=A*B) or three data items (A=B*C). If these items reside in different segments, ES will be reloaded several times per operation, adding quite some overhead.
In the beginning, with small programs from the 8 bit world (*3) and equally small data sets, it wasn't a big deal, but it soon became a major performance bottleneck - and more so a true pain in the ass for programmers (and compilers). With the 386 Intel finally delivered relief by adding two more segments, so any series of unary, binary or ternary operations, with elements spread out in memory, could take place without reloading ES all the time.
For programming (at least in assembly) and compiler design, this was quite a gain. Of course, there could have been even more, but with three the bottleneck was basically gone, so no need to overdo it.
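To see what that meant in practice, here's a sketch in old 16-bit C (Borland/Microsoft dialects, where far is a non-standard pointer qualifier): the three operands below may each live in a different 64 KiB segment, and before the 386 added FS and GS, a compiler had to keep reloading ES for nearly every dereference:

/* 16-bit real-mode C sketch; 'far' is a compiler extension. */
void scale(long far *dst, const long far *a, const long far *b, unsigned n)
{
    while (n--)
        *dst++ = *a++ * *b++;   /* up to three segments in play per iteration */
}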
Naming-wise, the letters F/G are simply alphabetic continuations after E. At least from the point of view of CPU design, nothing more is associated with them.
*1 - The usage of ES as the string destination is an exception, as string operations simply need two segment registers. Without a second one, they wouldn't be of much use - or would always need a segment prefix, which could kill one of their surprising features: the use of (non-repetitive) string instructions for extreme performance due to their single-byte encoding.
*2 - So in hindsight 'Everything Else Segment' would have been a way better naming than 'Extra Segment'.
*3 - It's always important to keep in mind that the 8086 was only meant as a stopgap measure until the 8800 was finished, and was mainly intended for the embedded world to keep 8080/85 customers on board.
According to the Intel Manual, in 64-bit mode these registers are intended to be used as additional base registers in some linear address calculations. I pulled this from section 3.7.4.1 (pg. 86 in the 4 volume set). Usually when the CPU is in this mode, linear address is the same as effective address, because segmentation is often not used in this mode.
So in this flat address space, FS & GS play a role in addressing not just local data but certain operating system data structures (pg. 2793, section 3.2.4); thus these registers were intended to be used by the operating system, in whatever way its designers determine.
There is some interesting trickery when using overrides in both 32 & 64-bit modes but this involves privileged software.
From the perspective of "original intentions," that's tough to say other than they are just extra registers. When the CPU is in real address mode, this is like the processor is running as a high speed 8086 and these registers have to be explicitly accessed by a program. For the sake of true 8086 emulation you'd run the CPU in virtual-8086 mode and these registers would not be used.
The FS and GS segment registers were very useful in 16-bit real mode or 16-bit protected mode under 80386 processors, when there were just 64KB segments, for example in MS-DOS.
When the 80386 processor was introduced in 1985, PC computers with 640KB RAM under MS-DOS were common. RAM was expensive and PCs were mostly running under MS-DOS in real mode with a maximum of that amount of RAM.
So, by using FS and GS, you could effectively address two more 64KB memory segments from your program without the need to change DS or ES registers whenever you need to address other segments than were loaded in DS or ES. Essentially, Raffzahn has already replied that these registers are useful when working with elements spread out in memory, to avoid reloading other segment registers like ES all the time. But I would like to emphasize that this is only relevant for 64KB segments in real mode or 16-bit protected mode.
The 16-bit protected mode was a very interesting mode that provided a feature not seen since then. Segments could have lengths in the range from 1 to 65536 bytes. The range checking (the checking of the segment size) on each memory access was implemented by the CPU, which raised an exception on accessing memory beyond the size specified in the descriptor table for that segment. That prevented buffer overruns at the hardware level. You could allocate a separate segment for each memory block (with a certain limitation on the total number). There were compilers like Borland Pascal 7.0 that produced programs running under MS-DOS in 16-bit protected mode via the DOS Protected Mode Interface (DPMI), using their own DOS extender.
The 80286 processor had 16-bit protected mode, but no FS/GS registers. So a program first had to check whether it was running on an 80386 before using these registers, even in real 16-bit mode. See, for example, a program for MS-DOS real mode that uses the FS and GS registers.

How does the address translation between Main Memory and Disk storage work?

This question comes up when I learn about virtual memory and memory management.
The following describes what I understand so far:
The memory hierarchy, which benefits from locality principle, briefly includes (from top to bottom):
register
cache (SRAM)
main memory (DRAM)
disk storage
A page table provides the address translation from virtual address to physical address
The virtual memory provides an abstraction for physical memory
A page (frame) is the basic unit when the memory management unit (MMU) moves memory between main memory and disk.
A process can only understand the addresses inside the virtual address space.
Each process has its own virtual address space.
Each process has its own page table, and all page tables are maintained by the kernel.
a physical address describes a location inside main memory
The Translation Lookaside Buffer (TLB) is introduced as a cached version of the page table.
Cache stores a tag field for each cache line to determine its mapping to main memory
A cache line is the basic unit when data moves between main memory and the cache.
In each cache line, it stores a tag field and a valid bit to determine what range of physical addresses (e.g. what part of main memory) resides in the cache line.
After translating the virtual address into a physical address with the TLB or page table, the MMU compares the physical address against the cache line tag fields and finds the desired memory content (assuming a cache hit).
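That summary of the virtual-to-physical step can be condensed into a few lines of C. This is an illustrative single-level walk (real x86-32 uses a two-level directory/table structure, and the TLB caches the result):

#include <stdint.h>
#include <stdbool.h>

#define PAGE_SHIFT 12        /* 4 KiB pages */
#define PAGE_MASK  0xFFFu    /* offset within a page */

typedef struct {
    uint32_t frame;    /* physical frame number */
    bool     present;  /* is the page resident in RAM? */
} pte_t;

/* Returns false on a non-present page, i.e. where a page fault
 * would be raised and the OS would have to fetch the page. */
bool translate(const pte_t *table, uint32_t vaddr, uint32_t *paddr)
{
    uint32_t vpn = vaddr >> PAGE_SHIFT;  /* virtual page number */
    if (!table[vpn].present)
        return false;                    /* page fault */
    *paddr = (table[vpn].frame << PAGE_SHIFT) | (vaddr & PAGE_MASK);
    return true;
}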
I believe there's an address translation mechanism between the main memory and the disk for 2 reasons:
Main memory effectively acts as a cache for the disk (an application of the locality principle).
To reduce the main-memory miss rate, main memory uses a fully associative placement policy.
However, I have only found a little material that possibly relates to the mechanism.
wiki: frame table data says:
Frame table data
The simplest page table systems often maintain a frame table and a page table. The frame table holds information about which frames are mapped.
and wiki: LBA says:
Logical block addressing (LBA) is a common scheme used for specifying the location of blocks of data stored on computer storage devices, generally secondary storage systems such as hard disk drives.
So I guess there's a frame table storing the address translation between physical addresses and LBAs, and the MMU would refer to the frame table when a page fault occurs.
Please help point out how the address translation between main memory and disk storage works.
Thanks for the help!
Hard-disk addressing and main-memory addressing are done quite differently. The CPU doesn't support hard-disk addressing by default. What modern CPUs support is PCI devices that can present an interface to hard disks, like NVMe or SATA.
PCI devices have registers that are memory-mapped in the physical address space. The position of these registers is specified in the MCFG, which is an ACPI table. ACPI is a convention for representing hardware so that software (the OS) can determine what is present on the motherboard and needs to be driven. ACPI is also a power-management convention, which is required for software to even shut down the computer.
With that said, you can look at Linux and how it does things to understand how a modern OS makes the link between pages on the hard disk and pages in main memory. In the swap management chapter of the kernel.org documentation (https://www.kernel.org/doc/gorman/html/understand/understand014.html), you can read the following:
11.2 Mapping Page Table Entries to Swap Entries
When a page is swapped out, Linux uses the corresponding PTE to store enough information to locate the page on disk again. Obviously a PTE is not large enough in itself to store precisely where on disk the page is located, but it is more than enough to store an index into the swap_info array and an offset within the swap_map and this is precisely what Linux does.
Each PTE, regardless of architecture, is large enough to store a swp_entry_t which is declared as follows in <linux/shmem_fs.h>
typedef struct {
        unsigned long val;
} swp_entry_t;
Two macros are provided for the translation of PTEs to swap entries and vice versa. They are pte_to_swp_entry() and swp_entry_to_pte() respectively.
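The macro names are real (see swp_type()/swp_offset()/swp_entry() in Linux's swapops.h), but the exact bit layout is architecture-specific. An illustrative encoding, assuming a 5-bit type field, might look like:

typedef struct { unsigned long val; } swp_entry_t;

#define SWP_TYPE_BITS 5  /* assumed: up to 32 swap areas */

/* type selects an entry in swap_info[]; offset indexes that
 * swap area's swap_map. */
static swp_entry_t swp_entry(unsigned type, unsigned long offset)
{
    swp_entry_t e;
    e.val = (offset << SWP_TYPE_BITS) | type;
    return e;
}

static unsigned swp_type(swp_entry_t e)
{
    return (unsigned)(e.val & ((1ul << SWP_TYPE_BITS) - 1));
}

static unsigned long swp_offset(swp_entry_t e)
{
    return e.val >> SWP_TYPE_BITS;
}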
You should probably read the link above along with one of my answers on cs.stackexchange.com: https://cs.stackexchange.com/questions/142525/data-transfer-between-cpu-ram-and-secondary-storage/142553#142553. This will probably provide a fair understanding of what is going on.
Everything is PCI today: think of graphics cards, Intel HD Audio, the AHCI for SATA, the xHCI that drives USB, or network cards. That pretty much sums up what a current modern computer supports: audio, USB, SATA, network and graphics. Understanding PCI is thus the key to understanding how low-level drivers work. The higher-level implementation detail is unimportant and varies between OSes.

What is the maximum addressable space of virtual memory?

I've seen this question asked many times, but couldn't find a reasonable answer. What is actually the limit of virtual memory?
Is it the maximum addressable size of the CPU? For example, if the CPU is 32-bit, is the maximum 4G?
Also, some texts relate it to the hard disk area, but I couldn't find a good explanation. Some say it's the CPU-generated address.
Are all the addresses we see virtual addresses? For example, the memory locations we see when debugging a program using GDB.
What is the historical reason behind the CPU generating virtual addresses? Some texts use virtual address and logical address interchangeably. How do they differ?
Unfortunately, the answer is "it depends". You didn't mention an operating system, but you implied Linux when you mentioned GDB. I will try to be completely general in my answer.
There are basically three different "address spaces".
The first is logical address space. This is the range of a pointer. Modern CPUs (386 or better) have memory management units that allow an operating system to make your actual (physical) memory appear at arbitrary addresses. For a typical desktop machine, this is done in 4KB chunks. When a program accesses memory at some address, the CPU will look up what physical address corresponds to that logical address, and cache that in a TLB (translation lookaside buffer). This allows three things: first, it allows an operating system to give each process as much address space as it likes (up to the entire range of a pointer - or beyond, if there are APIs to allow programs to map/unmap sections of their address space). Second, it allows it to isolate different programs entirely, by switching to a different memory mapping, making it impossible for one program to corrupt the memory of another program. Third, it provides developers with a debugging aid - random corrupt pointers may point to some address that hasn't been mapped at all, leading to a "segmentation fault" or "invalid page fault" or whatever; terminology varies by OS.
The second address space is physical memory. It is simply your RAM - you have a finite quantity of RAM. There may also be hardware that has memory mapped I/O - devices that LOOK like RAM, but it's really some hardware device like a PCI card, or perhaps memory on a video card, etc.
The third type of address is virtual address space. If you have less physical memory (RAM) than the programs need, the operating system can simulate having more RAM by giving the program the illusion of a large amount of RAM, with only a portion of it actually being RAM and the rest being in a "swap file". For example, say your machine has 2MB of RAM and a program allocates 4MB. The operating system would reserve 4MB of address space and try to keep the most recently/frequently accessed pieces of that 4MB in actual RAM. Any sections that are not frequently/recently accessed are copied to the swap file. Now if the program touches a part of that 4MB that isn't actually in memory, the CPU will generate a "page fault". The operating system will find some physical memory that hasn't been accessed recently and "page in" that page. It might have to write the content of that memory page out to the page file before it can page in the data being accessed. This is why it is called a swap file - typically, when it reads something in from the swap file, it probably has to write something out first, effectively swapping something in memory with something on disk.
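Put together, the fault-handling path described above looks roughly like this C sketch (every name here is hypothetical, for illustration only):

#include <stdbool.h>

typedef struct { int frame; bool present, dirty; } pte_t;

int    find_free_frame(void);
pte_t *choose_victim(void);            /* e.g. an LRU approximation */
void   write_to_swap(pte_t *victim);
void   read_from_swap(pte_t *pte, int frame);

void handle_page_fault(pte_t *pte)
{
    int frame = find_free_frame();
    if (frame < 0) {                   /* no free RAM: evict something */
        pte_t *victim = choose_victim();
        if (victim->dirty)
            write_to_swap(victim);     /* "swap out" modified data */
        victim->present = false;
        frame = victim->frame;
    }
    read_from_swap(pte, frame);        /* "swap in" the faulting page */
    pte->frame   = frame;
    pte->present = true;
    /* on return, the CPU retries the instruction that faulted */
}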
Typical MMU (memory management unit) hardware keeps track of what addresses are accessed (i.e. read), and modified (i.e. written). Typical paging implementations will often leave the data on disk when it is paged in. This allows it to "discard" a page if it hasn't been modified, avoiding writing out the page when swapping. Typical operating systems will periodically scan the page tables and keep some kind of data structure that allows it to intelligently and quickly choose what piece of physical memory has not been modified, and over time builds up information about what parts of memory change often and what parts don't.
Typical operating systems will often gently page out pages that don't change often (gently because they don't want to generate too much disk I/O which would interfere with your actual work). This allows it to instantly discard a page when a swapping operation needs memory.
Typical operating systems will try to use all the "unused" memory space to "cache" (keep a copy of) pieces of files that are accessed. Memory is thousands of times faster than disk, so if something gets read often, having it in RAM is drastically faster. Typically, a virtual memory implementation will be coupled with this "disk cache" as a source of memory that can be quickly reclaimed for a swapping operation.
Writing an effective virtual memory manager is extremely difficult. It needs to dynamically adapt to changing needs.
Typical virtual memory implementations feel awfully slow. When a machine starts to use far more memory than it has RAM, overall performance gets really, really bad.

32-bit physical page table resolution

I'm running a 32 bit system in legacy mode on a 64-bit (x86-64 that is) capable architecture. When a new process is created, the kernel has to decide where in physical memory all of the pages needed at the time of instantiation are to be allocated (assuming a single thread this may include several memory regions such as the stack, the heaps etc).
I'm assuming the kernel keeps some sort of dynamic list of the physical RAM frames that are in use, and also a static list of all the regions of physical memory that have been taken up by devices for systems that use memory-mapped IO. Is this correct?
In addition, I also read that a 32-bit Windows system has a physical memory limit of 4GB (probably due to minimum address bus assumptions) so, even though a system may have more than 4 gigabytes of physical memory installed, a 32 bit kernel will only allocate addresses within the 4GB range.
Specific information regarding low-level operating system implementation for specific cases such as this is quite difficult to find online. Can anyone verify these statements and possibly refer me to a source where I could attain more information?
Thanks for your considerations.
When a new process is created, the kernel has to decide where in physical memory all of the pages needed at the time of instantiation are to be allocated
Why does it have to decide at process creation time? In fact, it only creates them on-demand - it simply creates the PTEs (i.e. "This address range is valid", but the pages are not backed in any way); when the process first starts executing, it immediately page-faults.
What is a page fault though? What happens is, first the CPU reads the TLB to see if it has an address <=> frame mapping. When that fails, it walks the PTEs looking for an entry that matches. If no entry is found, or if the entry indicates that the page isn't backed, a page-fault is generated. This means, that a CPU exception occurs and the CPU immediately jumps to a predefined address. The first thing the kernel then does is save the CPU Context (i.e. the registers at the location of the fault), then dispatches to the page fault handler.
When the page-fault occurs, Mm (the Memory Manager in NT) will read the mapping in its own data structures (remember that all PE images are memory-mapped files) and determine at that time which physical frame (i.e. 'a real piece of memory') which will be used.
Once the page fault is serviced, the handler restores the saved CPU context, jumps back to where it was, and retries the instruction that faulted.
You're correct that a 32-bit OS will only use 4GB of address space (not RAM! Don't forget those memory-mapped devices and files!); the processor will operate in 32-bit mode and interpret the PTEs as 32-bit (remember that AMD64 long mode adds an extra level of page tables and extends the address space to 48 bits).
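For reference, the flag bits in a classic (non-PAE) 32-bit PTE are architecturally fixed: the low 12 bits are flags and bits 12-31 hold the physical frame address. A small C sketch of how a kernel might test them:

#include <stdint.h>

#define PTE_PRESENT  (1u << 0)   /* page is backed by a frame   */
#define PTE_WRITABLE (1u << 1)   /* read/write vs. read-only    */
#define PTE_USER     (1u << 2)   /* user vs. supervisor access  */
#define PTE_ACCESSED (1u << 5)   /* set by the CPU on any read  */
#define PTE_DIRTY    (1u << 6)   /* set by the CPU on any write */
#define PTE_FRAME    0xFFFFF000u /* physical frame address      */

static int      pte_present(uint32_t pte)    { return pte & PTE_PRESENT; }
static uint32_t pte_frame_addr(uint32_t pte) { return pte & PTE_FRAME; }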
32-bit systems can only ever address 4 GiB directly (2^32 = 4 GiB). There are PAE hacks, which let the system have more than 4 GiB of physical RAM, but no single process can ever have more than 4 GiB available. As well, even if you have 4 GiB of RAM, you'll never see more than 3.5 GiB or so actually available - some of the address space is reserved for memory-mapped hardware devices, such as your video RAM.
For one method of dealing with the physical-virtual memory mapping, look at the TLB.

Memory Segmentation on modern OSes: why do you need 4 segments?

From Wikipedia:
"Segmentation cannot be turned off on x86 processors, so many operating systems use a flat memory model to make segmentation unnoticeable to programs. For instance, the Linux kernel sets up only 4 segments"
I mean, since protection is already taken care of by the virtual memory subsystem (PTEs have a protection bit), why would you need 4 segments (instead of 2, i.e. data/code with DPL 3, since you can execute code residing in a lower-privileged segment)?
Thanks.
You didn't quote enough of that wikipedia page where it describes the four segments and why all are needed...
Usually, however, implied segments are used. All instruction fetches come from the code segment in the CS register. Most memory references come from the data segment in the DS register. Processor stack references, either implicitly (e.g. push and pop instructions) or explicitly (memory accesses using the ESP or (E)BP registers) use the stack segment in the SS register. Finally, string instructions (e.g. stos, movs) also use the extra segment ES.
So if you want to set up a flat model where programmers don't need to think about segmentation, you need to set up all four of these segment registers (CS, DS, SS, ES) to have the same base. Then addresses computed with respect to all four are equivalent.
That page shows an example with all four set to base = 0, limit = 4 GB.
You have a separate set of segments for kernel and user mode so that user mode code cannot write to kernel mode data. That would be a bad thing.
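As a sketch, those "4 segments" can be expressed as raw 8-byte descriptor values (illustrative constants built from the standard descriptor layout, not Linux's actual table): all four are flat, covering 0 to 4 GiB, and differ only in privilege level (DPL) and code/data type:

#include <stdint.h>

/* base = 0, limit = 0xFFFFF with 4 KiB granularity, 32-bit */
static const uint64_t gdt[] = {
    0x0000000000000000ULL, /* null descriptor (required)       */
    0x00CF9A000000FFFFULL, /* kernel code: DPL 0, execute/read */
    0x00CF92000000FFFFULL, /* kernel data: DPL 0, read/write   */
    0x00CFFA000000FFFFULL, /* user code:   DPL 3, execute/read */
    0x00CFF2000000FFFFULL, /* user data:   DPL 3, read/write   */
};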
