Memory Segmentation on modern OSes: why do you need 4 segments?


From Wikipedia:
"Segmentation cannot be turned off on x86 processors, so many operating systems use a flat memory model to make segmentation unnoticeable to programs. For instance, the Linux kernel sets up only 4 segments"
I mean, since protection is already taken care of by the virtual memory subsystem (PTEs have a protection bit), why would you need 4 segments (instead of 2, i.e. data/code with DPL 3, since you can execute code residing in a lower-privileged segment)?
Thanks.

You didn't quote enough of that wikipedia page where it describes the four segments and why all are needed...
Usually, however, implied segments are used. All instruction fetches come from the code segment in the CS register. Most memory references come from data segment in the DS register. Processor stack references, either implicitly (e.g. push and pop instructions) or explicitly (memory accesses using the ESP or (E)BP registers) use the stack segment in the SS register. Finally, string instructions (e.g. stos, movs) also use the extra segment ES.
So if you want to set up a flat model where programmers don't need to think about segmentation, you need to set up all four of these segment registers (CS, DS, SS, ES) to have the same base. Then addresses computed with respect to all four are equivalent.
That page shows an example with all four set to base=0, limit=4 GB.
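To make the "same base" point concrete, here is a minimal Python sketch (a toy model, not how any real kernel is written) in which all four segment registers share one hypothetical flat descriptor with base 0 and a 4 GB limit. The class and function names are made up for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Segment:
    base: int
    limit: int  # highest valid offset

    def linear(self, offset: int) -> int:
        if offset > self.limit:
            raise ValueError("offset beyond segment limit (#GP on real hardware)")
        return self.base + offset

# One hypothetical flat descriptor shared by all four registers
FLAT = Segment(base=0, limit=0xFFFF_FFFF)
SEGMENT_REGS = {"CS": FLAT, "DS": FLAT, "SS": FLAT, "ES": FLAT}

def linear(seg_reg: str, offset: int) -> int:
    """Linear address of seg_reg:offset."""
    return SEGMENT_REGS[seg_reg].linear(offset)

if __name__ == "__main__":
    # The same offset gives the same linear address through every register
    print({name: hex(linear(name, 0x0804_8000)) for name in SEGMENT_REGS})
```

Because every base is 0, the segment:offset pair collapses to the offset alone, which is what makes segmentation "unnoticeable" to programs.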

You have a separate set of segments for kernel and user mode so that user mode code cannot write to kernel mode data. That would be a bad thing.

Related

What does the following assembly instruction mean "mov rax,qword ptr gs:[20h]" [duplicate]

So I know what the following registers and their uses are supposed to be:
CS = Code Segment (used for IP)
DS = Data Segment (used for MOV)
ES = Destination Segment (used for MOVS, etc.)
SS = Stack Segment (used for SP)
But what are the following registers intended to be used for?
FS = "File Segment"?
GS = ???
Note: I'm not asking about any particular operating system -- I'm asking about what they were intended to be used for by the CPU, if anything.

There is what they were intended for, and what they are used for by Windows and Linux.
The original intention behind the segment registers was to allow a program to access many different (large) segments of memory that were intended to be independent and part of a persistent virtual store. The idea was taken from the 1966 Multics operating system, that treated files as simply addressable memory segments. No BS "Open file, write record, close file", just "Store this value into that virtual data segment" with dirty page flushing.
Our current 2010 operating systems are a giant step backwards, which is why they are called "Eunuchs". You can only address your process space's single segment, giving a so-called "flat (IMHO dull) address space". The segment registers on the x86-32 machine can still be used for real segment registers, but nobody has bothered (Andy Grove, former Intel president, had a rather famous public fit last century when he figured out after all those Intel engineers spent energy and his money to implement this feature, that nobody was going to use it. Go, Andy!)
AMD, in going to 64 bits, decided they didn't care if they eliminated Multics as a choice (that's the charitable interpretation; the uncharitable one is that they were clueless about Multics) and so disabled the general capability of segment registers in 64-bit mode. There was still a need for threads to access thread-local storage, and each thread needed a pointer ... somewhere in the immediately accessible thread state (e.g., in the registers) ... to its thread-local storage. Since Windows and Linux both used FS and GS (thanks Nick for the clarification) for this purpose in the 32-bit version, AMD decided to let the 64-bit segment registers (GS and FS) be used essentially only for this purpose (I think you can make them point anywhere in your process space; I don't know if the application code can load them or not). Intel, in their panic to not lose market share to AMD on 64 bits, and with Andy retired, decided to just copy AMD's scheme.
It would have been architecturally prettier IMHO to make each thread's memory map have an absolute virtual address (e.g., 0-FFF, say) that was its thread-local storage (no [segment] register pointer needed!); I did this in an 8-bit OS back in the 1970s and it was extremely handy, like having another big stack of registers to work in.
So, the segment registers are now kind of like your appendix. They serve a vestigial purpose. To our collective loss.
Those that don't know history aren't doomed to repeat it; they're doomed to doing something dumber.

The registers FS and GS are segment registers. They have no processor-defined purpose; instead, the operating systems running on the processor give them one. In 64-bit Windows, the GS register is used to point to operating-system-defined structures. FS and GS are commonly used by OS kernels to access thread-specific memory: Windows uses GS to manage thread-specific memory, and the Linux kernel uses GS to access CPU-specific memory.

FS is used to point to the thread information block (TIB) in 32-bit Windows processes.
One typical example is structured exception handling (SEH), which stores a pointer to the head of the exception-handler (callback) chain at FS:[0x00].
GS is commonly used as a pointer to thread-local storage (TLS), e.g. by glibc on 32-bit Linux.
One example you might have seen before is the stack-canary protection (StackGuard); in GCC output you might see something like this:
mov eax,gs:0x14
mov DWORD PTR [ebp-0xc],eax
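A rough Python model of what `mov eax,gs:0x14` does, assuming a toy 64 KiB memory and two hypothetical threads whose GS bases (0x1000, 0x2000) and canary values are invented. The instruction is identical for every thread, but each thread's GS base points at its own control block, so each one reads its own canary.

```python
import struct

# Toy 64 KiB "linear address space" (an assumption for the sketch)
MEM = bytearray(0x10000)
CANARY_OFFSET = 0x14  # the offset used by `mov eax,gs:0x14` above

def write_dword(addr: int, value: int) -> None:
    struct.pack_into("<I", MEM, addr, value)

def read_dword(addr: int) -> int:
    return struct.unpack_from("<I", MEM, addr)[0]

class Thread:
    """Hypothetical thread with its own control block in memory."""
    def __init__(self, tcb_addr: int, canary: int) -> None:
        self.gs_base = tcb_addr            # hidden GS base set up by the OS
        write_dword(tcb_addr + CANARY_OFFSET, canary)

def load_gs_dword(thread: Thread, offset: int) -> int:
    """Emulate `mov eax, gs:offset`: linear address = GS base + offset."""
    return read_dword(thread.gs_base + offset)

if __name__ == "__main__":
    t1 = Thread(0x1000, 0xDEADBEEF)
    t2 = Thread(0x2000, 0x1234_5678)
    # Same instruction, different threads, different canaries
    print(hex(load_gs_dword(t1, CANARY_OFFSET)), hex(load_gs_dword(t2, CANARY_OFFSET)))
```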

TL;DR;
What is the “FS”/“GS” register intended for?
Simply to access data beyond the default data segment (DS). Exactly like ES.
The Long Read:
So I know what the following registers and their uses are supposed to be:
[...]
Well, almost, but DS is not 'some' Data Segment, but the default one, where all operations take place by default (*1). This is where all default variables are located - essentially data and bss. It's in some way part of the reason why x86 code is rather compact. All essential data, which is what is most often accessed, (plus code and stack) is within 16-bit shorthand distance.
ES is used to access everything else (*2), everything beyond the 64 KiB of DS. Like the text of a word processor, the cells of a spreadsheet, or the picture data of a graphics program and so on. Contrary to common assumption, this data doesn't get accessed as much, so needing a prefix hurts less than using longer address fields.
Similarly, it's only a minor annoyance that DS and ES might have to be loaded (and reloaded) when doing string operations - this at least is offset by one of the best character handling instruction sets of its time.
What really hurts is when user data exceeds 64 KiB and operations have to be performed on it. While some operations are simply done on a single data item at a time (think A=A*2), most require two (A=A*B) or three data items (A=B*C). If these items reside in different segments, ES will be reloaded several times per operation, adding quite some overhead.
In the beginning, with small programs from the 8-bit world (*3) and equally small data sets, it wasn't a big deal, but it soon became a major performance bottleneck - and more so a true pain in the ass for programmers (and compilers). With the 386 Intel finally delivered relief by adding two more segments, so any series of unary, binary or ternary operations, with elements spread out in memory, could take place without reloading ES all the time.
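The reload cost can be illustrated with a small Python sketch. It assumes each of A, B and C lives in its own far segment, that DS stays fixed on the default data, and that spare segment registers are reused least-recently-used (a simplification; real programmers and compilers choose by hand). The segment values are arbitrary.

```python
from collections import OrderedDict

def count_reloads(accesses, free_regs):
    """Count segment-register loads for a stream of far-segment accesses.

    accesses:  segment numbers touched, in order
    free_regs: segment registers available besides DS (1 = ES only,
               3 = ES, FS and GS); replacement policy is LRU
    """
    loaded = OrderedDict()   # segments currently held in some register
    reloads = 0
    for seg in accesses:
        if seg in loaded:
            loaded.move_to_end(seg)          # hit: no reload needed
            continue
        if len(loaded) == free_regs:
            loaded.popitem(last=False)       # evict least recently used
        loaded[seg] = True
        reloads += 1
    return reloads

def ternary_stream(n, a=0x2000, b=0x3000, c=0x4000):
    """Accesses for A[i] = B[i] * C[i], i in range(n): read B, read C, write A."""
    out = []
    for _ in range(n):
        out += [b, c, a]
    return out

if __name__ == "__main__":
    stream = ternary_stream(100)
    print("ES only:     ", count_reloads(stream, 1))
    print("ES, FS, GS:  ", count_reloads(stream, 3))
```

With ES alone, every element of A=B*C costs three segment loads; with ES, FS and GS the three segments are loaded once and stay put.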
For programming (at least in assembly) and compiler design, this was quite a gain. Of course, there could have been even more, but with three the bottleneck was basically gone, so no need to overdo it.
Naming wise the letters F/G are simply alphabetic continuations after E. At least from the point of CPU design nothing is associated.
*1 - The usage of ES for string destination is an exception, simply because two segment registers are needed. Without that, string instructions wouldn't be very useful, or would always need a segment prefix, which would kill one of their surprising features: the extreme performance of (non-repetitive) string instructions resulting from their single-byte encoding.
*2 - So in hindsight 'Everything Else Segment' would have been a way better naming than 'Extra Segment'.
*3 - It's always important to keep in mind that the 8086 was only meant as a stop gap measure until the 8800 was finished and mainly intended for the embedded world to keep 8080/85 customers on board.

According to the Intel manual, in 64-bit mode these registers are intended to be used as additional base registers in some linear address calculations. I pulled this from section 3.7.4.1 (pg. 86 in the 4-volume set). Usually, when the CPU is in this mode, the linear address is the same as the effective address, because segmentation is often not used in this mode.
So in this flat address space, FS and GS play a role in addressing not just local data but certain operating system data structures (pg. 2793, section 3.2.4); thus these registers were intended to be used by the operating system, in whatever way its designers decide.
There is some interesting trickery when using overrides in both 32 & 64-bit modes but this involves privileged software.
From the perspective of "original intentions," that's tough to say other than they are just extra registers. When the CPU is in real address mode, this is like the processor is running as a high speed 8086 and these registers have to be explicitly accessed by a program. For the sake of true 8086 emulation you'd run the CPU in virtual-8086 mode and these registers would not be used.

The FS and GS segment registers were very useful in 16-bit real mode or 16-bit protected mode under 80386 processors, when there were just 64KB segments, for example in MS-DOS.
When the 80386 processor was introduced in 1985, PC computers with 640KB RAM under MS-DOS were common. RAM was expensive and PCs were mostly running under MS-DOS in real mode with a maximum of that amount of RAM.
So, by using FS and GS, you could effectively address two more 64KB memory segments from your program without the need to change DS or ES registers whenever you need to address other segments than were loaded in DS or ES. Essentially, Raffzahn has already replied that these registers are useful when working with elements spread out in memory, to avoid reloading other segment registers like ES all the time. But I would like to emphasize that this is only relevant for 64KB segments in real mode or 16-bit protected mode.
The 16-bit protected mode was a very interesting mode that provided a feature not seen since. Segments could have lengths ranging from 1 to 65536 bytes. Range checking (checking against the segment size) on each memory access was implemented by the CPU, which raised an exception on any access beyond the size specified in the descriptor table for that segment. That prevented buffer overruns at the hardware level. You could allocate a separate segment for each memory block (with a certain limit on the total number). Compilers such as Borland Pascal 7.0 produced programs that ran under MS-DOS in 16-bit protected mode via the DOS Protected Mode Interface (DPMI), using their own DOS extender.
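A minimal sketch of that per-access limit check in Python, assuming a simplified descriptor that holds only a base and a size (real 286 descriptors also carry access rights, and stack-segment violations raise #SS rather than #GP). The names are invented for illustration.

```python
class GeneralProtectionFault(Exception):
    """Stands in for the CPU's #GP exception."""

class Descriptor:
    def __init__(self, base: int, size: int) -> None:
        if not 1 <= size <= 0x10000:
            raise ValueError("16-bit segments are 1..65536 bytes long")
        self.base = base
        self.limit = size - 1        # descriptors store the last valid offset

def access(desc: Descriptor, offset: int, width: int = 1) -> int:
    """Check an access of `width` bytes at `offset`; return its address."""
    if offset < 0 or offset + width - 1 > desc.limit:
        raise GeneralProtectionFault(
            f"offset {offset:#x}+{width} beyond limit {desc.limit:#x}")
    return desc.base + offset

if __name__ == "__main__":
    buf = Descriptor(base=0x1_0000, size=16)   # a 16-byte buffer in its own segment
    print(hex(access(buf, 15)))
    try:
        access(buf, 16)                        # one byte past the end
    except GeneralProtectionFault as e:
        print("overrun caught:", e)
```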
The 80286 processor had 16-bit protected mode, but not the FS/GS registers. So a program first had to check whether it was running on an 80386 before using these registers, even in 16-bit real mode. Please see an example of the use of the FS and GS registers in a program for MS-DOS real mode.

Difference between relative and logical address

I'm reading about memory management from a book called Operating Systems.
I've studied about this subject before and it was all clear because there were only two types of addresses introduced: Physical & Logical (Physical & Virtual). However, this book seems to introduce three types where it sometimes views two of them as the same, and sometimes as different.
Here's a quote (translated myself, so might not be the best):
At the time of writing a program it is not known at which point in the memory the program will be, which is why symbolic addresses are used (variable names). The process of translating symbolic addresses into physical addresses is called address binding and it can be done at different points in time. If, during the compilation, it is known in which part of the memory the program will be then address binding can be done at that point. Otherwise (the most common case) the compiler generates relative addresses (relative to the start of the part of the memory that the process gets). When executing a program the loader maps relative addresses into physical addresses.
This all seems to be pretty clear. Relative maps to the physical. Here's what comes after:
During process execution, the interaction with memory is done through sequences of reading and writing into memory locations. The CPU either reads instructions or data from the memory or writes data into the memory. Within both of these tasks, the CPU does not use physical addresses but rather logical ones which the CPU generates itself. The set of all logical addresses is called the Virtual Address Space.
This is already confusing as it is. What's the difference between a logical and a relative address? Wherever else I look this up they're never separated. Here comes an even more confusing sentence:
In case the address binding is done at the time of compilation and loading then the virtual address space matches the physical address space.
Earlier on it is stated that address binding is the process of converting symbolic addresses into physical addresses. But then only later on is the concept of relative addresses introduced. And loading is said to be the process of converting relative into physical. So now I'm completely lost here.
Assuming that we have no knowledge of which part of the memory the process is going to take: how does the timeline go? The program is compiled, the variable names (symbolic addresses) are translated into ... relative ones I guess? Then the CPU needs to do some read/write and it uses ... logical ones?
And furthermore, the terms relative and logical seem to be used randomly in the following sections of the book. As if they're the same, but still defined as different.
Could anyone clarify this for me? The perfect answer would be maybe an artificial example of a program timeline. At which point is which address introduced, what is the difference between a logical and a relative address?
Thanks in advance.

A relative address means a distance between two locations or addresses (which can be logical, linear/virtual or physical, which isn't important at this point).
For example, the x86 call and jump instructions have a form that specifies the distance (counted from the byte after the end of the call/jump instruction) to call/jump. That distance is simply added to the instruction pointer register ([R|E]IP) and that's the location where the next instruction will come from (again, I'm ignoring logical, ..., physical for now).
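As a hedged illustration, the Python function below computes the target of a 32-bit `call rel32` (opcode E8) or `jmp rel32` (opcode E9): the signed displacement stored in the instruction is added to the address of the byte following it. The example addresses are arbitrary.

```python
import struct

def rel32_target(instr_addr: int, instr: bytes) -> int:
    """Target of `call rel32` (E8) or `jmp rel32` (E9) located at instr_addr."""
    if instr[0] not in (0xE8, 0xE9):
        raise ValueError("not a call/jmp rel32")
    (disp,) = struct.unpack_from("<i", instr, 1)   # signed, little-endian
    next_ip = instr_addr + 5                      # byte after the 5-byte instruction
    return (next_ip + disp) & 0xFFFF_FFFF

if __name__ == "__main__":
    call = bytes([0xE8, 0x10, 0x00, 0x00, 0x00])   # call +0x10
    for base in (0x0040_1000, 0x0050_0000):
        print(hex(base), "->", hex(rel32_target(base, call)))
```

Moving the same bytes to a different address moves the target by the same amount, which is why such code doesn't care where it is loaded.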
If your program contains a subroutine and calls it using such an instruction, it doesn't matter where the program is located in memory since the distance between two locations of the whole remains the same (things will become more complex if the whole program consists of several moving parts, including one or more libraries, but let's not go there).
Now, let's say your program has a global variable and needs to read it. If there is a memory reading instruction similar to the call instruction described above, you can again use the distance from the instruction pointer to the location of the variable. Prior to the 64-bit x86 CPUs there was no such instruction/mechanism to access data, only calls and jumps could be IP-relative.
In absence of such an IP-relative data addressing mechanism, you need to know the actual address of the variable, which you won't know until the program is loaded into memory for execution. What's done in this case is that the instruction that reads the variable initially receives the address of the variable relative to IP (that of the instruction that reads the variable) or simply the program's start. And that's how the program is stored on disk, with a relative address inside the instruction. Once loaded, but before the program starts execution, the address of the variable in the instruction that reads it is adjusted such that it becomes the actual address and not relative to something (IP or program's start). The further away the program's start is from address 0, the larger adjustment needs to be added to that relative address.
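The load-time adjustment can be sketched as follows, assuming a hypothetical relocation table that lists where in the image the 32-bit relative addresses sit. The example uses the 32-bit `mov eax, [moffs32]` encoding (opcode A1) with a variable at offset 0x20 from the program's start; the load address is made up.

```python
import struct

def relocate(image: bytes, fixups: list[int], load_base: int) -> bytes:
    """Return a copy of `image` with each 32-bit field at `fixups` rebased.

    Each listed field holds an address relative to the program's start;
    after relocation it holds the absolute address load_base + relative.
    """
    out = bytearray(image)
    for off in fixups:
        (rel,) = struct.unpack_from("<I", out, off)
        struct.pack_into("<I", out, off, (rel + load_base) & 0xFFFF_FFFF)
    return bytes(out)

if __name__ == "__main__":
    # mov eax, [0x20]: the address is relative to the program's start, as on disk
    on_disk = bytes([0xA1]) + struct.pack("<I", 0x20)
    loaded = relocate(on_disk, fixups=[1], load_base=0x0040_0000)
    print(loaded.hex())   # a120004000
```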
Get the idea?
And now something almost entirely different and unrelated...
In the context of x86 CPUs, there are these kinds of addresses:
Logical
Linear/virtual
Physical
If we go back all the way to the 8086/8088... Actually, if we go even further back to the 8080/8085, all memory addresses are 16-bit, they don't undergo any translation by the CPU and are presented as-is to the memory, hence they're physical (we're not talking about IP/PC-relative call/jump instructions here).
16 bits allow for 64KB of memory. The 8086/8088 extended those 16 bit addresses with another 16 bits to address more than 64KB of memory, but it didn't just widen all registers and addresses from 16 to 32 bits. Instead it introduced special segment registers, which would be used in pairs with those old 16-bit addresses of the 8080/8085. So, a pair of registers such as DS (a segment register) and BX (a regular general-purpose register) could address memory at address DS * 16 + BX. The pair DS:BX is the logical address, the value DS * 16 + BX is the physical address. With this scheme we can access approximately 1MB of memory (just plug in 65535 for both registers).
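The real-mode calculation as a small Python sketch; the optional wrap-around models the original 8086's 20 address lines, a detail not covered in the answer itself.

```python
def real_mode_physical(segment: int, offset: int, a20_wrap: bool = False) -> int:
    """8086-style address: physical = segment * 16 + offset."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset are 16-bit values")
    addr = segment * 16 + offset
    # The 8086 had only 20 address lines, so addresses wrapped at 1 MiB
    return addr & 0xF_FFFF if a20_wrap else addr

if __name__ == "__main__":
    print(hex(real_mode_physical(0xFFFF, 0xFFFF)))        # highest reachable
    print(hex(real_mode_physical(0xFFFF, 0xFFFF, True)))  # wrapped on an 8086
    # Different segment:offset pairs can name the same byte
    print(real_mode_physical(0x1234, 0x0010) == real_mode_physical(0x1235, 0x0000))
```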
The 80286 slightly changed the above by introducing the so-called protected mode, in which the physical address was calculated as segment_table[DS] + BX (this allowed going from 1MB to 16MB), but the idea was still the same.
Next came along the 80386 and widened registers to 32 bits and introduced yet another layer of indirection. The physical address was now, simplifying a bit, page_tables[segment_table[DS] + EBX].
The pair DS:EBX constitutes the logical address; this is what the program manipulates (e.g. in the instruction MOV EAX, DS:[EBX]) and what it can observe.
segment_table[DS] + EBX constitutes the linear/virtual address (which the program may not always know since it can't see into segment_table[], a table managed by the OS). If page translation isn't enabled, this linear/virtual address is also equal to the final, physical address.
With page translation enabled, the physical address is page_tables[segment_table[DS] + EBX].
What's more to know:
logical addresses can be more complex, e.g. DS:[EAX + EBX * 2 + 3]
OSes commonly set up segment_table[] such that segment_table[any segment register] = 0 (every segment base is zero), effectively taking the segmentation mechanism out of the picture and ending up with e.g. physical address = page_tables[EAX + EBX * 2 + 3]. While it's not entirely correct to say that in such a setup logical and linear/virtual addresses are the same (EAX + EBX * 2 + 3), it definitely simplifies thinking.
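Putting the three layers together, here is a simplified Python model: a segment table mapping selectors to bases (limits and attributes left out), and a single flat dictionary standing in for the real multi-level page tables. The selector 0x2B and the page mapping are hypothetical values chosen for the example.

```python
PAGE_SIZE = 4096  # 4 KiB pages

class PageFault(Exception):
    pass

def effective_address(base=0, index=0, scale=1, disp=0):
    """Offset part of a logical address such as [EAX + EBX*2 + 3]."""
    return (base + index * scale + disp) & 0xFFFF_FFFF

def translate(selector, offset, segment_table, page_table, paging=True):
    """Return (linear, physical) for the logical address selector:offset."""
    linear = (segment_table[selector] + offset) & 0xFFFF_FFFF
    if not paging:
        return linear, linear               # linear address is the physical one
    vpn, page_off = divmod(linear, PAGE_SIZE)
    if vpn not in page_table:
        raise PageFault(hex(linear))
    return linear, page_table[vpn] * PAGE_SIZE + page_off

if __name__ == "__main__":
    ea = effective_address(base=0x0804_8000, index=0x10, scale=2, disp=3)
    flat = {0x2B: 0}                         # every segment base is 0
    pages = {0x08048: 0x1A2}                 # virtual page -> physical frame
    print(hex(ea), [hex(a) for a in translate(0x2B, ea, flat, pages)])
```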
Now, what do these segment and page tables have to do with relative addresses and relocation discussed at the beginning? These tables just let you place your program anywhere in physical memory, often in a very transparent way to the program itself. It doesn't need to know where it's physically at or whether page translation is enabled.
However, there are certain benefits to using page translation, but that's outside of the scope here.

Where is segment table stored ?

In the segmentation scheme, every time a memory access is made, the MMU translates the address to the actual address by looking up the segment table.
Is the segment table stored inside the TLB or in RAM ?

Is the segment table stored inside the TLB or in RAM ?
This depends on which type of CPU and which mode the CPU is in.
For 80x86, when a segment register is loaded the CPU stores "base address, limit and attributes" for the segment in a hidden part of the segment register.
For real mode, virtual-8086 mode and system management mode, when a segment register is loaded the CPU just does "hidden segment base = segment value * 16" and there are no tables in RAM.
For protected mode and long mode, when a segment register is loaded the CPU uses the value being loaded into the segment register as an index into a table in RAM, and (after doing protection checks) loads the "base address, limit and attributes" information from the corresponding table entry into the hidden part of the segment register.
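A small Python model of the "hidden part" idea, with invented class names: the descriptor table is consulted only when the segment register is loaded, and later accesses use the cached base and limit. Selector decoding is simplified (the TI bit, RPL and all protection checks are ignored).

```python
from dataclasses import dataclass

@dataclass
class Descriptor:
    base: int
    limit: int

@dataclass
class SegmentRegister:
    selector: int = 0          # visible part
    base: int = 0              # hidden part: filled in when loaded...
    limit: int = 0xFFFF        # ...and used by every later access

    def load_real_mode(self, value: int) -> None:
        self.selector = value
        self.base = value * 16             # calculated, no table in RAM
        # (this sketch leaves the limit unchanged)

    def load_protected_mode(self, selector: int, gdt: list) -> None:
        desc = gdt[selector >> 3]          # the only time the table is read
        self.selector = selector
        self.base, self.limit = desc.base, desc.limit

    def linear(self, offset: int) -> int:
        if offset > self.limit:
            raise ValueError("limit violation (#GP on real hardware)")
        return self.base + offset

if __name__ == "__main__":
    gdt = [Descriptor(0, 0), Descriptor(0x10_0000, 0xF_FFFF)]  # hypothetical GDT
    ds = SegmentRegister()
    ds.load_protected_mode(0x08, gdt)
    gdt[1] = Descriptor(0x20_0000, 0xF_FFFF)   # edit the table after the load
    print(hex(ds.linear(0x10)))               # still uses the cached base
```

Changing the table entry after the load has no effect until the register is reloaded, which matches the point above that ordinary instructions only look at the hidden part.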
Note that (for protected mode) almost nobody used segmentation, because segment register loads are slow (due to protection checks and table lookups). So CPU manufacturers optimised the CPU for "no segmentation" (e.g. if segment bases are zero, instead of doing "linear address = virtual address + segment base" a modern CPU will just do "linear address = virtual address", avoiding the cost of an unnecessary addition and starting the cache/memory lookup sooner), and didn't bother optimising segment register loads much either. Then, when AMD designed long mode, they realised nobody wants segmentation and disabled most of it for 64-bit code (ignoring segment bases for most segment registers to get rid of the extra addition, and ignoring segment limits to get rid of the cost of segment limit checks).
However, operating systems that don't use segmentation were using gs and fs as a hack to get fast access to CPU-specific or thread-specific data (because, unlike some other CPUs, 80x86 doesn't have registers that can only be modified by supervisor code, which would be more convenient for this purpose). So AMD kept the "linear address = virtual address + segment base" behaviour for these 2 segment registers and added the ability to modify the hidden "base address" part of gs and fs (via MSRs and swapgs) to make it easier to port operating systems (Windows) to long mode.
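The 64-bit rule described above reduces to something like this Python sketch, where `fs_base` and `gs_base` stand for the hidden bases an OS would set via the MSRs or `swapgs`; the addresses used in the example are purely illustrative.

```python
MASK64 = (1 << 64) - 1

def linear_64(seg_reg: str, offset: int, fs_base: int = 0, gs_base: int = 0) -> int:
    """Linear address of seg_reg:offset in 64-bit mode (limits are not checked).

    Only FS and GS contribute a base; CS, DS, ES and SS are treated as base 0.
    """
    if seg_reg == "FS":
        return (fs_base + offset) & MASK64
    if seg_reg == "GS":
        return (gs_base + offset) & MASK64
    if seg_reg in ("CS", "DS", "ES", "SS"):
        return offset & MASK64
    raise ValueError(f"unknown segment register {seg_reg}")

if __name__ == "__main__":
    # e.g. `mov rax, qword ptr gs:[20h]` with a made-up per-CPU area address
    print(hex(linear_64("GS", 0x20, gs_base=0xFFFF_8880_0000_0000)))
    print(hex(linear_64("DS", 0x20, gs_base=0xFFFF_8880_0000_0000)))
```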
In other words, for 80x86 there are 3 different ways to set a segment's information (by calculation, by table lookup, or by MSR).
Also note that for most instructions (excluding things like segment register loads) 80x86 CPUs don't care how a segment's information was set and only use the hidden parts of segment registers. This means that the CPU doesn't have to consult a table every time it fetches code from cs and every time it fetches data from memory. It also means that the majority of the CPU doesn't care which mode the CPU is in (e.g. instructions like mov eax,[ds:address] only depend on the values in the hidden part of segment registers and don't depend on the CPU mode), which is why there's no benefit to removing obsolete CPU modes (removing support for real mode wouldn't reduce the size or complexity of the CPU).
For other CPUs; most don't support segmentation (and only support paging or nothing), and I'm not familiar with how it works for any that do support it. However I doubt any CPU would do a table lookup every time anything is fetched (it'd be far too slow/expensive to be practical); and I'd expect that for all CPUs that support segmentation, information for "currently in use" segments is stored internally somehow.

The segment table is the reference used whenever you access memory, so the table has to be stored permanently for later use; it is therefore stored in physical memory, i.e. the RAM.

Why have two overlapping data segments (e.g. in the Linux kernel)?

In the Linux kernel, as well as in many x86 tutorials online, I see that people recommend using two code segments and two data segments. I understand the need for two code segments, as the CPL needs to exactly match the DPL (for non-conforming segments).
However, none of these tutorials (nor any of the related questions on StackOverflow), specifically say why we need two data segments. These work differently than code segments, since a process with CPL=0 can access a data segment with DPL=3.
The downside to having two data segments is having to reload the DS, ES, etc. registers if we have switching between processes of different privilege levels.
So my specific question is: given that we are using a flat memory model, so that all code and data segments entirely overlap, what purpose does it serve to have a user and a kernel data segment, as opposed to just one user data segment?

There is an explanation here.
Quoting from Intel manuals (Section 5.7)
Privilege level checking also occurs when the SS register is loaded with the segment selector for a stack segment. Here all privilege levels related to the stack segment must match the CPL; that is, the CPL, the RPL of the stack-segment selector, and the DPL of the stack-segment descriptor must be the same. If the RPL and DPL are not equal to the CPL, a general-protection exception (#GP) is generated.
Emphasis mine
That is, SS requires a data segment with DPL equal to 0 when loaded from the kernel (or during switches).
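The contrast with ordinary data-segment loads can be sketched in Python. The SS rule follows the Intel text quoted above, and the DS rule is the usual data-segment check (DPL must be numerically at least max(CPL, RPL)); the GDT indices and DPLs are hypothetical, not Linux's actual layout.

```python
class GeneralProtection(Exception):
    """Stands in for #GP."""

# Hypothetical GDT: index -> DPL of a writable data segment
DPL_OF = {2: 0, 4: 3}
KERNEL_DS = (2 << 3) | 0   # selector = index << 3 | RPL
USER_DS = (4 << 3) | 3

def load_ds(cpl: int, selector: int) -> None:
    rpl, dpl = selector & 0b11, DPL_OF[selector >> 3]
    if max(cpl, rpl) > dpl:           # numerically larger = less privileged
        raise GeneralProtection(f"DS load {selector:#x} at CPL {cpl}")

def load_ss(cpl: int, selector: int) -> None:
    rpl, dpl = selector & 0b11, DPL_OF[selector >> 3]
    if not (rpl == cpl == dpl):       # all three must match exactly
        raise GeneralProtection(f"SS load {selector:#x} at CPL {cpl}")

def allowed(load, cpl: int, selector: int) -> bool:
    try:
        load(cpl, selector)
        return True
    except GeneralProtection:
        return False

if __name__ == "__main__":
    print("CPL0 DS <- user data:", allowed(load_ds, 0, USER_DS))
    print("CPL0 SS <- user data:", allowed(load_ss, 0, USER_DS))
    print("CPL0 SS <- kernel data:", allowed(load_ss, 0, KERNEL_DS))
```

So CPL 0 code may load a DPL 3 data segment into DS but not into SS, which is why a separate kernel data (stack) segment is needed in 32-bit mode.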
This is true for 32 bit mode.
In 64-bit mode, it is possible to use a NULL selector to suppress any runtime check (including the previous one).[1]
In 64-bit mode, the processor does not perform runtime checking on NULL segment selectors. The processor does not cause a #GP fault when an attempt is made to access memory where the referenced segment register has a NULL segment selector.
For completeness, when performing a stack operation all the relevant information (address size, operand size and stack-address size) is either recovered from the code segment or implicitly set to 64 bits.
[1] If I remember correctly, 64-bit mode still uses a kernel data segment, though, for compatibility reasons.

introduction to CS - stored-program concept - can't understand concept

I really did try to understand the Von Neumann architecture, but there is one thing I can't understand: how can the user know whether a number in the computer's memory is a command or data?
I know there is a 'stored-program concept', but I understood nothing...
Can someone explain it to me in two sentences?
Thanks!

Put simply, the user cannot look at a memory address and determine if it is a command or data. It can be both.
It's all in the interpretation: if the program counter points to a memory address, its contents are interpreted as a command; if it is referenced by a read instruction, it is data.
The point of this is flexibility. A program can write (or re-write) programs into memory, that can then be executed by setting the program counter to the start address.
Modern operating systems limit this behaviour by data execution prevention, keeping parts of the memory from being interpreted as commands.
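To make "it's all in the interpretation" concrete, here is a toy stored-program machine in Python with an invented five-instruction set (nothing like a real CPU's encoding). The program writes the two cells at address 12 as ordinary data, then jumps there, and those same cells are executed as an instruction.

```python
HALT, LOADI, ADD, STORE, JMP = range(5)   # invented opcodes

def run(mem: list, pc: int = 0, max_steps: int = 1000) -> int:
    """Run until HALT and return the accumulator."""
    acc = 0
    for _ in range(max_steps):
        op, arg = mem[pc], mem[pc + 1]   # whatever pc points at is an instruction
        pc += 2
        if op == HALT:
            return acc
        elif op == LOADI:
            acc = arg
        elif op == ADD:
            acc += mem[arg]              # whatever a read touches is data
        elif op == STORE:
            mem[arg] = acc
        elif op == JMP:
            pc = arg
        else:
            raise ValueError(f"bad opcode {op} at {pc - 2}")
    raise RuntimeError("step limit reached")

program = [
    LOADI, LOADI,   # 0: acc = 1 (the LOADI opcode, used here as a number)
    STORE, 12,      # 2: mem[12] = 1  -> writes an opcode as data
    LOADI, 99,      # 4
    STORE, 13,      # 6: mem[13] = 99 -> writes its operand
    LOADI, 0,       # 8: clear acc so the new instruction's effect is visible
    JMP, 12,        # 10: execute what was just written
    HALT, 0,        # 12: overwritten at run time with LOADI 99
    HALT, 0,        # 14
]

if __name__ == "__main__":
    mem = list(program)
    print(run(mem), mem[12:14])
```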

The basic idea of the stored-program concept is storing data and instructions together in main memory.

NOTE: This is a vastly oversimplified answer. I intentionally left a lot of things out for the sake of making the point.
Remember that all computer memory is, for all intents and purposes on modern machines, a long list of bytes. The numbers are meaningless unless the thing that put them there has a specific purpose for them.
I could put the number 5 at address 0. It could represent the 5th instruction specified by my CPU's instruction-set manual. It could represent the number of hours of sleep I had last week. It's meaningless unless it's given a meaning.
So how do computers know what to actually "do" with the numbers?
It's a large combination of standards and specifications, which are documents or code that specify which data should go where, what each piece of data means, what acceptable values for the data are, etc. Such standards are (usually) agreed upon by the masses.
Standards exist everywhere. Your BIOS has specifications as to where to look for the main operating system entry point on the boot media (your hard disk, a live CD, a bootable USB stick, etc.).
From there, the operating system adheres to standards that dictate where in memory the VGA buffer exists (0xb8000 on x86 machines, for example) in order to output all of that boot up text you see when you start your machine.
So on and so forth.
A portable executable (Windows), an ELF image (Linux) or a Mach-O image (macOS) is just a file that also follows a specification, usually mandated by the operating system manufacturer, that puts pieces of code at specific positions in the file. Then that file is simply loaded into memory at a specific virtual address in user space, and the operating system knows exactly where the entry point for your program is.
From there, it sets up the instruction pointer (IP) to point to the current instruction byte. On most CPUs, the current byte pointed to by the IP activates specific circuits in the CPU to perform some action.
For example, on x86 CPUs, byte 0x04 is the ADD instruction that takes the next byte (so IP + 1), reads it as an unsigned 8 bit number, and adds it to the al register. This is mandated by the x86 specification, which all x86 CPUs have agreed to implement.
That means when the IP register is pointing to a byte with the value of 0x04, it will perform the add and increase the IP by 2 - the first is to skip the ADD instruction itself, and the second is to skip the "argument" (operand) to the ADD instruction.
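The fetch-decode-execute step for that one instruction might be modelled like this in Python; only opcode 0x04 is handled and flags are ignored, so this is a sketch rather than an emulator.

```python
def step(mem: bytes, ip: int, al: int) -> tuple[int, int]:
    """Execute one instruction at ip and return the new (ip, al).

    Only opcode 0x04 (ADD AL, imm8) is modelled; flags are ignored.
    """
    opcode = mem[ip]
    if opcode == 0x04:
        imm8 = mem[ip + 1]          # the operand is the next byte
        al = (al + imm8) & 0xFF     # 8-bit register, so wrap around
        return ip + 2, al           # skip opcode and operand
    raise NotImplementedError(f"opcode {opcode:#04x} not modelled")

if __name__ == "__main__":
    program = bytes([0x04, 0x05, 0x04, 0xFF])   # add al,5 ; add al,0xff
    ip, al = 0, 3
    while ip < len(program):
        ip, al = step(program, ip, al)
        print(f"ip={ip} al={al:#04x}")
```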
The IP advances as fast as the CPU (and the operating system's scheduler) will allow it to - which amounts to a "running" program.
What the data mean is defined entirely by what's creating the data and what's using it. In the best of circumstances, the two parties agree, usually via a standard or specification of some sort.
