Addressing schemes used for expandable RAM's? - processor

Hi people I was wondering about something called addressing schemes that the operating systems use for expandable RAM's. Let us consider an example to throw some light on this.
"If we have a 32 bit architecture for the computer then this means we have a computer address that is 32-bit long which amounts to 2^32 addressable memory location approximately 4GB of data."
But if we add another 4GB of main memory that is now 8GB of RAM in effect, how does the computer address the extra main memory locations because that additional amount exceeds the range of a 32 bit address which is 2^32.
Can anyone throw some light on this question.

Basically, you can't address 8GB with just 32bits. At any given point in time using 32bits you can only choose from 4G memory locations .
A popular workaround is to use physical addresses which are larger than 32 bits in the page tables. This allows the operating system to define which subset of the 8GB a program is able to access. However, this subset can never be bigger than 4GB. x86 PAE is one example, but there are others which do just the same.
With this workaround the operating system itself can access the whole 8GB only by changing it's own page table. E.g. to access a memory location, it first has to map the memory location into its own address space by changing the page table, and only then can start accessing the memory location. Of course this is very cumbersome (to say the least). It can also lead to problems if parts of the operating system were written without considering this type of memory extension, device drivers are a typical example.
The problem is not new. 8bit Computers like Commodores C64 used bank switching to access more than 64KB with 16bit addresses. Early PCs used expanded memory to work around the 640KB limit. The Right Thing (TM) is of course to switch to bigger addresses before you have to resort to ugly solutions.

Related

CreateFileMapping size limit in a 64-bit process

Is there a size limit to the file mapping object? The reason I'm asking is that there is a mentioning of 2GB limit somewhere in MSDN (lost the track..) and I also checked this sample, which also expects 2GB size limit:
https://cpp.hotexamples.com/examples/-/-/CreateFileMapping/cpp-createfilemapping-function-examples.html
But I tried on a 40GB file with no problems on newest Win 10, so I'm a bit worried if there wasn't some limitation on older Windows for example.
There is no 2GB limit for file mappings. You can boot 32-bit Windows with the 3GB option or when a 32-bit process running on a 64-bit system, you get the full 4GB if the correct PE flag is set. All these limits are theoretical and you will never reach them in practice.
How large of a view you can map depends on two things;
The contiguous range of free addresses in your processes address space.
Available kernel memory for keeping track of the memory pages.
The first one is the big limit on 32-bit systems since the address space of your process is shared with system libraries, 3rd-party libraries (anti-virus, injected "tweaking" tools etc.), the PEB and TEBs, the system region, thread stacks and memory reserved by hardware. This will often put you well below 2GB. Any design requiring more than 500MB should probably be changed to only map in specific smaller ranges as needed.
For a 64-bit process on 64-bit Windows, the virtual address space is the 128-terabyte range 0x000'00000000 through 0x7FFF'FFFFFFFF (KB889654 claims 8 TB but that only applies to < Windows 8.1). Any usable range is going to be smaller but you can assume a couple of terabyte at least. 40GB is no problem and not enough to run into problems with low system resources either.

Why 64 bit mode ( Long mode ) doesn't use segment registers?

I'm a beginner level of student :) I'm studying about intel architecture,
and I'm studying a memory management such as a segmentation and paging.
I'm reading Intel's manual and it's pretty nice to understand intel's architectures.
However I'm still curious about something fundamental.
Why in the 64bit long mode, all segment registers are going to bit 0?
Why system doesn't use segment registers any longer?
Because system's 64bit of size (such as a GP registers) are enough to contain those logical address at once?
Is protection working properly in 64bit mode?
I tried to find 64bit addressing but I couldn't find in Google. Perhaps I have terrible searching skill or I may need some specfied previous knowledge to searching in google.
Hence I'd like to know why 16bit of segment registers are not going to use in 64bit mode,
and how could protection work properly in 64bit mode.
Thank you!
In a manner of speaking, when you perform array ("indexed") type addressing with general registers, you are doing essentially the same thing as the segment registers. In the bad old days of 8-bit and 16-bit programming, many applications required much more data (and occasionally more code) than a 16-bit address could reach.
So many CPUs solved this by having a larger addressable memory space than the 16-bit addresses could reach, and made those regions of memory accessible by means of "segment registers" or similar. A program would set the address in a "segment register" to an address above the (65536 byte) 16-bit address space. Then when certain instructions were executed, they would add the instruction specified address to the appropriate (or specified) "segment register" to read data (or code) beyond the range of 16-bit addresses or 16-bit offsets.
However, the situation today is opposite!
How so? Today, a 64-bit CPU can address more than (not less than) all addressable memory space. Most 64-bit CPUs today can address something like 40-bits to 48-bits of physical memory. True, there is nothing to stop them from addressing a full 64-bit memory space, but they know nobody (but the NSA) can afford that much RAM, and besides, hanging that much RAM on the CPU bus would load it down with capacitance, and slow down ALL memory accesses outside the CPU chip.
Therefore, the current generation of mainstream CPUs can address 40-bits to 48-bits of memory space, which is more than 99.999% of the market would ever imagine reaching. Note that 32-bits is 4-gigabytes (which some people do exceed today by a factor of 2, 4, 8, 16), but even 40-bits can address 256 * 4GB == 1024GB == 1TB. While 64GB of RAM is reasonable today, and perhaps even 256GB in extreme cases, 1024GB just isn't necessary except for perhaps 0.001% of applications, and is unaffordable to boot.
And if you are in that 0.001% category, just buy one of the CPUs that address 48-bits of physical memory, and you're talking 256TB... which is currently impractical because it would load down the memory bus with vastly too much capacitance (maybe even to the point the memory bus would stop completely stop working).
The point is this. When your normal addressing modes with normal 64-bit registers can already address vastly more memory than your computer can contain, the conventional reason to add segment registers vanishes.
This doesn't mean people could not find useful purposes for segment registers in 64-bit CPUs. They could. Several possibilities are evident. However, with 64-bit general registers and 64-bit address space, there is nothing that general registers could not do that segment registers can. And general purpose registers have a great many purposes, which segment registers do not. Therefore, if anyone was planning to add more registers to a modern 64-bit CPU, they would add general purpose registers (which can do "anything") rather than add very limited purpose "segment registers".
And indeed they have. As you may have noticed, AMD and Intel keep adding more [sorta] general-purpose registers to the SIMD register-file, and AMD doubled the number of [truly] general purpose registers when they designed their 64-bit x86_64 CPUs (which Intel copied).
Most answers to questions on irrelevance of segment registers in a 32/64 bit world always centers around memory addressing. We all agree that the primary purpose of segment registers was to get around address space limitation in a 16 bit DOS world. However, from a security capability perspective segment registers provide 4 rings of address space isolation, which is not available if we do 64 bit long mode, say for a 64 bit OS. This is not a problem with current popular OS's such as Windows and Linux that use only ring 0 and ring 3 with two levels of isolation. Ring 1 and 2 are sometimes part of the kernel and sometimes part of user space depending on how the code is written. With the advent of hardware virtualization (as opposed to OS virtualization) from isolation perspective, hypervisors did not quite fit in either in ring 0 or ring 1/2/3. Intel and AMD added additional instructions (e.g., INTEL VMX) for root and non-root operations of VM's.
So what is the point being made? If one is designing a new secure OS with 4 rings of isolation then we run in to problems if segmentation is disabled. As an example, we use one ring each for hardware mux code, hypervisor code /containers/VM, OS Kernel and User Space. So we can make a case for leveraging additional security afforded by segmentation based on requirements stated above. However, Intel/AMD still allow F and G segment registers to have non-zero value (i.e., segmentation is not disabled). To best of my knowledge no OS exploits this ray of hope to write more secure OS/Hypervisor for hardware virtualization.

What is the maximum addressable space of virtual memory?

Saw this questions asked many times. But couldn't find a reasonable answer. What is actually the limit of virtual memory?
Is it the maximum addressable size of CPU? For example if CPU is 32 bit the maximum is 4G?
Also some texts relates it to hard disk area. But I couldn't find it is a good explanation. Some says its the CPU generated address.
All the address we see are virtual address? For example the memory locations we see when debugging a program using GDB.
The historical reason behind the CPU generating virtual address? Some texts interchangeably use virtual address and logical address. How does it differ?
Unfortunately, the answer is "it depends". You didn't mention an operating system, but you implied linux when you mentioned GDB. I will try to be completely general in my answer.
There are basically three different "address spaces".
The first is logical address space. This is the range of a pointer. Modern (386 or better) have memory management units that allow an operating system to make your actual (physical) memory appear at arbitrary addresses. For a typical desktop machine, this is done in 4KB chunks. When a program accesses memory at some address, the CPU will lookup where what physical address corresponds to that logical address, and cache that in a TLB (translation lookaside buffer). This allows three things: first it allows an operating system to give each process as much address space as it likes (up to the entire range of a pointer - or beyond if there are APIs to allow programs to map/unmap sections of their address space). Second it allows it to isolate different programs entirely, by switching to a different memory mapping, making it impossible for one program to corrupt the memory of another program. Third, it provides developers with a debugging aid - random corrupt pointers may point to some address that hasn't been mapped at all, leading to "segmentation fault" or "invalid page fault" or whatever, terminology varies by OS.
The second address space is physical memory. It is simply your RAM - you have a finite quantity of RAM. There may also be hardware that has memory mapped I/O - devices that LOOK like RAM, but it's really some hardware device like a PCI card, or perhaps memory on a video card, etc.
The third type of address is virtual address space. If you have less physical memory (RAM) than the programs need, the operating system can simulate having more RAM by giving the program the illusion of having a large amount of RAM by only having a portion of that actually being RAM, and the rest being in a "swap file". For example, say your machine has 2MB of RAM. Say a program allocated 4MB. What would happen is the operating system would reserve 4MB of address space. The operating system will try to keep the most recently/frequently accessed pieces of that 4MB in actual RAM. Any sections that are not frequently/recently accessed are copied to the "swap file". Now if the program touches a part of that 4MB that isn't actually in memory, the CPU will generate a "page fault". THe operating system will find some physical memory that hasn't been accessed recently and "page in" that page. It might have to write the content of that memory page out to the page file before it can page in the data being accessed. THis is why it is called a swap file - typically, when it reads something in from the swap file, it probably has to write something out first, effectively swapping something in memory with something on disk.
Typical MMU (memory management unit) hardware keeps track of what addresses are accessed (i.e. read), and modified (i.e. written). Typical paging implementations will often leave the data on disk when it is paged in. This allows it to "discard" a page if it hasn't been modified, avoiding writing out the page when swapping. Typical operating systems will periodically scan the page tables and keep some kind of data structure that allows it to intelligently and quickly choose what piece of physical memory has not been modified, and over time builds up information about what parts of memory change often and what parts don't.
Typical operating systems will often gently page out pages that don't change often (gently because they don't want to generate too much disk I/O which would interfere with your actual work). This allows it to instantly discard a page when a swapping operation needs memory.
Typical operating systems will try to use all the "unused" memory space to "cache" (keep a copy of) pieces of files that are accessed. Memory is thousands of times faster than disk, so if something gets read often, having it in RAM is drastically faster. Typically, a virtual memory implementation will be coupled with this "disk cache" as a source of memory that can be quickly reclaimed for a swapping operation.
Writing an effective virtual memory manager is extremely difficult. It needs to dynamically adapt to changing needs.
Typical virtual memory implementations feel awfully slow. When a machine starts to use far more memory that it has RAM, overall performance gets really, really bad.

Difference between physical addressing and virtual addressing concept

This is a re-submission, because I am not getting any response from superuser.com. Sorry for the misunderstanding.
I need to know the difference between physical addressing and virtual addressing concept in embedded systems.
Why virtual addressing concept is implemented in embedded systems?
What is the advantage of the virtual addressing over a system with physical addressing concept in embedded systems?
How the mapping between virtual addressing to physical addressing is done in embedded systems?
Please, explain the above concept with some simple examples in some simple architecture.
Physical addressing means that your program actually knows the real layout of RAM. When you access a variable at address 0x8746b3, that's where it's really stored in the physical RAM chips.
With virtual addressing, all application memory accesses go to a page table, which then maps from the virtual to the physical address. So every application has its own "private" address space, and no program can read or write to another program's memory. This is called segmentation.
Virtual addressing has many benefits. It protects programs from crashing each other through poor pointer manipulation, etc. Because each program has its own distinct virtual memory set, no program can read another's data - this is both a safety and a security plus. Virtual memory also enables paging, where a program's physical RAM may be stored on a disk (or, now, slower flash) when not in use, then called back when an application attempts to access the page. Also, since only one program may be resident at a particular physical page, in a physical paging system, either a) all programs must be compiled to load at different memory addresses or b) every program must use Position-Independent Code, or c) some sets of programs cannot run simultaneously.
The physical-virtual mapping may be done in software (with hardware support for memory traps) or in pure hardware. Sometimes even the page tables themselves are on a special set of hardware memory. I don't know off the top of my head which embedded system does what, but every desktop has a hardware TLB (Translation Lookaside Buffer, basically a cache for the virtual-physical mappings) and some now have advanced Memory Mapping Units that help with virtual machines and the like.
The only downsides of virtual memory are added complexity in the hardware implementation and slower performance.
The VAX (Virtual Address eXtented by Digital Equipment Corp which became Compaq, which became HP) is a very good example of an virtual embeded hardware system. It was a 32 bit mini computer that had an OS called VMS or Virtual Memory Systems. Dave Cutler was one of the principle architets of the systems and he much later wrote the Kernal for Windows NT. He is a very good read for this and other stuff. The Vax had special hardware for control of the virtual space and control of opcode access for security through hardware... very secure. This system was or is the grandfather of the modfern day PC at the Kernal Level. The first BSOD I saw on WNT 3.51 I was able to read because it came from the crash dump used in VMS to stop the system when unstable. By te way Look at the name VMS and WNT and you will find the next letters in the alhabet from VMS makes the term WNT. This was not an accident. maybe a jab at DEC for letting him go.

Why is Available Physical Memory (dwAvailPhys) > Available Virtual Memory (dwAvailVirtual) in call GlobalMemoryStatus on Windows Vista x64

I am playing with an MSDN sample to do memory stress testing (see: http://msdn.microsoft.com/en-us/magazine/cc163613.aspx) and an extension of that tool that specifically eats physical memory (see http://www.donationcoder.com/Forums/bb/index.php?topic=14895.0;prev_next=next). I am obviously confused though on the differences between Virtual and Physical Memory. I thought each process has 2 GB of virtual memory (although I also read 1.5 GB because of "overhead". My understanding was that some/all/none of this virtual memory could be physical memory, and the amount of physical memory used by a process could change over time (memory could be swapped out to disc, etc.)I further thought that, in general, when you allocate memory, the operating system could use physical memory or virtual memory. From this, I conclude that dwAvailVirtual should always be equal to or greater than dwAvailPhys in the call GlobalMemoryStatus. However, I often (always?) see the opposite. What am I missing.
I apologize in advance if my question is not well formed. I'm still trying to get my head around the whole memory management system in Windows. Tutorials/Explanations/Book recs are most welcome!
Andrew
That was only true in the olden days, back when RAM was expensive. The operating system maps pages of virtual memory to RAM as needed. If there isn't enough RAM to satisfy a program's request, it starts unmapping pages to make room. If such a page contains data instead of code, it gets written to the paging file. Whenever the program accesses that page again, it generates a paging fault, letting the operating system read the page back from disk.
If the machine has little RAM and lots of processes consuming virtual memory pages, that can cause a very unpleasant effect called "thrashing". The operating system is constantly accessing the disk and machine performance slows down to a crawl.
More RAM means less disk access. There's very little reason not to use 3 or 4 GB of RAM on a 32-bit operating system, it's cheap. Even if you do not get to use all 4 GB, not all of it will be addressable due hardware devices taking space on the address bus (video, mostly). But that won't change the size of the virtual memory accessible by user code, it is still 2 Gigabytes.
Windows Internals is a good book.
The amount of virtual memory is limited by size of the address space - which is 4GB per process on a 32-bit system. And you have to subtract from this the size of regions reserved for system use and the amount of VM used already by your process (including all the libraries mapped to its address space).
On the other hand, the total amount of physical memory may be higher than the amount of virtual memory space the system has left free for your process to use (and these days it often is).
This means that if you have more than ~2GB or RAM, you can't use all your physical memory in one process (since there's not enough virtual memory space to map it to), but it can be used by many processes. Note that this limitation is removed in a 64-bit system.
I don't know if this is your issue, but the MSDN page for the GlobalMemoryStatus function contains the following warning:
On computers with more than 4 GB of memory, the GlobalMemoryStatus function can return incorrect information, reporting a value of –1 to indicate an overflow. For this reason, applications should use the GlobalMemoryStatusEx function instead.
Additionally, that page says:
On Intel x86 computers with more than 2 GB and less than 4 GB of memory, the GlobalMemoryStatus function will always return 2 GB in the dwTotalPhys member of the MEMORYSTATUS structure. Similarly, if the total available memory is between 2 and 4 GB, the dwAvailPhys member of the MEMORYSTATUS structure will be rounded down to 2 GB. If the executable is linked using the /LARGEADDRESSAWARE linker option, then the GlobalMemoryStatus function will return the correct amount of physical memory in both members.
Since you're referring to members like dwAvailPhys instead of ullAvailPhys, it sounds like you're using a MEMORYSTATUS structure instead of a MEMORYSTATUSEX structure. I don't know the consequences of that on a 64-bit platform, but on a 32-bit platform that definitely could cause incorrect memory sizes to be reported.

Resources