How 2 procecces can corrupt each other while having different virtual memory space - virtual-memory

As I understand, every process has its own virtual address space (2 processes can have the same range, but the OS will map the logical addresses to different physical addresses).
I read about memory corruption which might happen, for example, when process A change data belongs to process B.
Qu 1:
How is is possible? process A can use any of its virtual address space, and if it tries to write to an address (in a page) which it didn't use yet, the OS will make a page fault. And if the address is forbidden to access, so the process will get an alert.
So, there is no option that process A changes the data of process B because the virtual address space of process A is not related at all to the virtual address space of process B.
The MMU will always map process A's virtual address to a page it owns, or unmapped page and therefore a page fault will happen, but there is no case to map it to a process B's physical address.
As I see it, a process can corrupt only its own data, not any other process data.
where is my mistake?
Qu 2:
I understand that the virtual address space size of a process is limited by the disk size.
How can two processes have virtual address space size of disk size (or RAM size), their size can sum up to much more than the disk size (or RAM size).
Thanks :)

Related

How does a memory managemet unit map a virtual address to a physical address

If a computer system has a main memory of 1mb and virtual address space of 16mb while the disk block size is 1kb. How does the memory management unit map a virtual address to a physical address?
I assume you intend to ask "how does MMU maps virtual memory to physical memory" (as in your question description).
To start off, virtual memory is managed by operating systems. MMU only provides hardware mechanism to take advantage of it.
Operating systems will keep a map of virtual_address -> physical_address for each individual process.
For example if a program uses virtual page [0, 1, 2, 3], operating systems can map these pages to physical pages as [64, 128, 256, 512].
Since the virtual address space is larger than physical address space, not all of the virtual memory will be mapped to physical memory at any moment if physical memory cannot hold all of them. Therefore, some of the data would be swapped out to disk, and thus not present in physical memory.
For example, let's simplify your case by assuming that the virtual memory has 8 pages, but the physical memory can only hold 4 pages of data. If the process has data on virtual page [0,1,2,3,4], apparently physical memory cannot hold all 5 pages. Therefore one of the virtual pages will be put in disk, and the memory mappings of system will be something like [0->2, 1->1, 2->3, 4->0], and in this case virtual page 3 will be swapped out in disk.
Those swapped out pages will only be brought back to main memory by OS if the program needs the data, and one of the pages previously present in main memory needs to be swapped out to make space. Algorithms to determine which page to swap out is another topic (for example, LRU, clock algorithm).
In reality the memory system is more complicated than this scenario, because modern operating systems allow multiple processes running in the system, and OS itself uses other techniques (setting threshold to trigger page swapping, for example) to make memory system more efficient.
The memory management using translates LOGICAL addresses to PHYSICAL addresses.
It does that translation using PAGE TABLE defined by the operating system. The format of page tables varies with systems and there are at least three major approaches processors take to define them. Generally, a processor with have one or more privileged registers that point to the page tables for the current process. These registers are normally loaded as part of the context change that brings in a new process.
In the simple case a page table is just an array that contains the mapping between logical and physical address. Some number of high order bits of an address serve as an index into the page table. The corresponding page table entry specifies the physical page frame the logical page is mapped to.
Some number of low order bits of the addressserve as the offset into the physical page frame.
The operating system maintains the page tables in the format the MMU expects them to be in. The MMU does the translation between logical addresses an physical addresses transparently.
The disk block size is irrelevant to this translation.

Can someone explain this diagram on Paging (virtual memory) to me?

I've been trying to understand virtual memory but when I get into the real specifics of it I just get confused. I understand (or feel like I do) the fact that virtual memory is a way for a process to "think" that it has a certain amount of memory allocated to it. The virtual address space is partitioned into equal sized pages, the physical memory is partitioned into equal sized frames, and the pages map to the frames.
But like..when does this happen? In this diagram, the CPU is running Program P. That means that a part of P was already in the physical memory, correct? (Since the cpu only has access to the physical/main memory). So what exactly is being pointed at by the CPU? I see that it's a page in the virtual memory space, so like..what exactly does this page represent? Is it an instruction? Are we moving an instruction from virtual memory to physical memory, so that more of the program is in physical memory (that hadn't been needed up until that point)? Am I way off? Can someone explain this to me?
The diagram shows the process of translating a virtual address to a physical address.
The fat arrow from Program P to CPU symbolizes the program being "fed" into the CPU.1
The CPU "points" to a virtual address used by an instruction to address a memory location in the program P. It is divided into two parts:
Page Table Index (p): the virtual address contains an index into the page table, which maps a page to a page frame (f). For a description of the mechanism, including multi-level paging, read this.
Offset (o): as you can see, the offset is directly added to the physical address, since paging's smallest addressable unit is a page, not a byte
Finally, the calculated address is used to address a memory location in physical memory.
1 "fed" means "read (pronounced like red) from secondary storage into RAM and executing the program instruction by instruction".
I would not bother trying to understand that diagram because it makes no sense.
It is titled "Paging" but the diagram does not show paging at all.
What you are missing is that there are two steps. First there is logical memory translation (what the diagram kinda, sorta) shows.
Physical memory is arranged in an array of PAGE FRAMES of some fixed size (e.g., 1K, 4K).
Each process has a LOGICAL ADDRESS SPACE consisting of PAGES that match the page frame size.
The logical address space is defined by a PAGE TABLE managed by the operating system. The page table maps logical pages to physical page frames.
If there are two processes (X and Y), logical address Q in process X and address Y map to different physical page frames in most cases.
Usually there is a range of logical addresses that are assigned to the SYSTEM ADDRESS SPACE. Those logical pages map to the same physical page for all processes.
Processes only address logical pages. The have no knowledge of physical pages. The Program Counter register always refers to logical addresses. The CPU automatically translates logical pages to physical page frames. The translation is entirely transparent to the process. The operating system is the only thing that has any knowledge of physical page frame but it only manages the page tables; the CPU does the translation.
Paging is something different but related.
When a program accesses a memory address, the CPU attempts to translate that into a physical address within a page frame. Several steps occur.
The CPU locates the page table entry for the requested page.
There may not be a page page table entry at all for the page. The diagram shows a contiguous logical to physical mapping. That rarely occurs. The logical address space usually has clusters of valid pages with gaps between them. If there is no page table entry for the address, the CPU triggers an exception.
The CPU reads the page table entry to determine if it references a valid page frame.
There may be an entry for the page that has not been mapped to the logical address space (e.g., the first page is usually not mapped to trap null pointer errors). If the page has not been mapped, that triggers an exception.
The CPU checks whether the access is permitted for the current processor mode.
Read/Write/Execute protection can be set for a page and access can be restricted by mode (kernel mode, user mode, or some other mode in some processors).
If the access is not permitted, the CPU triggers an exception.
[Here is where paging comes in] The CPU checks whether the page has been mapped to a physical page frame. If not, the CPU triggers a PAGE FAULT. The OS responds by locating where the page is stored in a paging file, mapping the page table to a physical page frame, loading the data from the page file into memory, and then restarting the instruction.
I guess most of your confusion comes from the fact that the above diagram is a little bit misguided.
Please note that the lack of the IP register and some extra text # both 'Tables' are the problematic ones. The rest is OK.
I show you below the same but fixed diagram which makes more sense.
As others guys already told you, the above diagram is just the translation scheme for the addresses that the CPU use to fetch the actual instructions & operands from P's virtual address space. Did you see it? It's all about addresses and nothing else!!!
It shows you how virtual addresses are managed by the CPU (in a paged scheme) in order to reach the next instruction or operand from the real physical memory using physical addresses.
The explanation of 'Downvoter' is great for that, so no need to add anything else.

Windows - How does this memory addressing work?

Assuming x86, I'm starting to learn that addresses 0x0 thru 0x7FFFFFFF are for the process; whereas anything higher is reserved for the kernel.
I have three curiosities:
1) Does a process EVER call an address higher than 0x7FFFFFFF? I assume it will always result in some sort of access denied? How is that access denied enforced?
2) Does "shared memory" IPC work by mapping two processes virtual addresses to the same physical address range?
3) The amount of RAM in your machine can vary. You may have 2GB, or much more like 16GB. How does this affect the addressing of RAM? Does the kernel ever leave a bunch of RAM unused because it was reserved for itself, but doesn't need it? How can I see this?
I am not very sure but you will find the maximum in this MSDN doc about how it works:-
The range of virtual addresses that is available to a process is
called the virtual address space for the process. Each user-mode
process has its own private virtual address space. For a 32-bit
process, the virtual address space is usually the 2-gigabyte range
0x00000000 through 0x7FFFFFFF. For a 64-bit process, the virtual
address space is the 8-terabyte range 0x000'00000000 through
0x7FF'FFFFFFFF. A range of virtual addresses is sometimes called a
range of virtual memory.
The diagram shows the virtual address spaces for two 64-bit processes:
Notepad.exe and MyApp.exe. Each process has its own virtual address
space that goes from 0x000'0000000 through 0x7FF'FFFFFFFF. Each shaded
block represents one page (4 kilobytes in size) of virtual or physical
memory. Notice that the Notepad process uses three contiguous pages of
virtual addresses, starting at 0x7F7'93950000. But those three
contiguous pages of virtual addresses are mapped to noncontiguous
pages in physical memory. Also notice that both processes use a page
of virtual memory beginning at 0x7F7'93950000, but those virtual pages
are mapped to different pages of physical memory.

Difference between physical/logical/virtual memory address

I am a little confused about the terms physical/logical/virtual addresses in an Operating System(I use Linux- open SUSE)
Here is what I understand:
Physical Address- When the processor is in system mode, the address used by the processor is physical address.
Logical Address- When the processor is in user mode, the address used is the logical address. these are anyways mapped to some physical address by adding a base register with the offset value.It in a way provides a sort of memory protection.
I have come across discussion that virtual and logical addresses/address space are the same. Is it true?
Any help is deeply appreciated.
My answer is true for Intel CPUs running on a modern Linux system, and I am speaking about user-level processes, not kernel code. Still, I think it'll give you some insight enough to think about the other possibilities
Address Types
Regarding question 3:
I have come across discussion that virtual and logical
addresses/address space are the same. Is it true?
As far as I know they are the same, at least in modern OS's running on top of Intel processors.
Let me try to define two notions before I explain more:
Physical Address: The address of where something is physically located in the RAM chip.
Logical/Virtual Address: The address that your program uses to reach its things. It's typically converted to a physical address later by a hardware chip (mostly, not even the CPU is aware really of this conversion).
Virtual/Logical Address
The virtual address is well, a virtual address, the OS along with a hardware circuit called the MMU (Memory Management Unit) delude your program that it's running alone in the system, it's got the whole address space(having 32-bits system means your program will think it has 4 GBs of RAM; roughly speaking).
Obviously, if you have more than one program running at the time (you always do, GUI, Init process, Shell, clock app, calendar, whatever), this won't work.
What will happen is that the OS will put most of your program memory in the hard disk, the parts it uses the most will be present in the RAM, but hey, that doesn't mean they'll have the address you and your program know.
Example: Your process might have a variable named (counter) that's given the virtual address 0xff (imaginably...) and another variable named (oftenNotUsed) that's given the virtual address (0xaa).
If you read the assembly of your compiled code after all linking's happened, you'll be accessing them using those addresses but well, the (oftenNotUsed) variable won't be really there in RAM at 0xaa, it'll be in the hard disk because the process is not using it.
Moreover, the variable (counter) probably won't be physically at (0xff), it'll be somewhere else in RAM, when your CPU tries to fetch what's in 0xff, the MMU and a part of the OS, will do a mapping and get that variable from where it's really available in the RAM, the CPU won't even notice it wasn't in 0xff.
Now what happens if your program asks for the (oftenNotUsed) variable? The MMU+OS will notice this 'miss' and will fetch it for the CPU from the Harddisk into RAM then hand it over to the CPU as if it were in the address (0xaa); this fetching means some data that was present in RAM will be sent back to the Harddisk.
Now imagine this running for every process in your system. Every process thinks they have 4GB of RAMs, no one actually have that but everything works because everyone has some parts of their program available physically in the RAM but most of the program resides in the HardDisk. Don't confuse this part of the program memory being put in HD with the program data you can access through file operations.
Summary
Virtual address: The address you use in your programs, the address that your CPU use to fetch data, is not real and gets translated via MMU to some physical address; everyone has one and its size depends on your system(Linux running 32-bit has 4GB address space)
Physical address: The address you'll never reach if you're running on top of an OS. It's where your data, regardless of its virtual address, resides in RAM. This will change if your data is sent back and forth to the hard disk to accommodate more space for other processes.
All of what I have mentioned above, although it's a simplified version of the whole concept, is what's called the memory management part of the the computer system.
Consequences of this system
Processes cannot access each other memory, everyone has their separate virtual addresses and every process gets a different translation to different areas even though sometimes you may look and find that two processes try to access the same virtual address.
This system works well as a caching system, you typically don't use the whole 4GB you have available, so why waste that? let others share it and let them use it too; when your process needs more, the OS will fetch your data from the HD and replace other process' data, at an expense of course.
Physical Address- When the processor is in system mode, the address used by the processor is physical address.
Not necessarily true. It depends on the particular CPU. On x86 CPUs, once you've enabled page translation, all code ceases to operate with physical addresses or addresses trivially convertible into physical addresses (except, SMM, AFAIK, but that's not important here).
Logical Address- When the processor is in user mode, the address used is the logical address. these are anyways mapped to some physical address by adding a base register with the offset value.
Logical addresses do not necessarily apply to the user mode exclusively. On x86 CPUs they exist in the kernel mode as well.
I have come across discussion that virtual and logical addresses/address space are the same. Is it true?
It depends on the particular CPU. x86 CPUs can be configured in such a way that segments aren't used explicitly. They are used implicitly and their bases are always 0 (except for thread-local-storage segments). What remains when you drop the segment selector from a logical address is a 32-bit (or 64-bit) offset whose value coincides with the 32-bit (or 64-bit) virtual address. In this simplified set-up you may consider the two to be the same or that logical addresses don't exist. It's not true, but for most practical purposes, good enough of an approximation.
I am referring to below answer base on intel x86 CPU
Difference Between Logical to Virtual Address
Whenever your program is under execution CPU generates logical address for instructions which contains (16 bit Segment Selector and 32 bit offset ).Basically Virtual(Linear address) is generated using logical address fields.
Segment selector is 16 bit field out of which first 13bit is index (Which is a pointer to the segment descriptor resides in GDT,described below) , 1 bit TI field ( TI = 1, Refer LDT , TI=0 Refer GDT )
Now Segment Selector OR say segment identifier refers to Code Segment OR Data Segment OR Stack Segment etc. Linux contains one GDT/LDT (Global/Local Descriptor Table) Which contains 8 byte descriptor of each segments and holds the base (virtual) address of the segment.
So for for each logical address, virtual address is calculated using below steps.
1) Examines the TI field of the Segment Selector to determine which Descriptor
Table stores the Segment Descriptor. This field indicates that the Descriptor is
either in the GDT (in which case the segmentation unit gets the base linear
address of the GDT from the gdtr register) or in the active LDT (in which case the
segmentation unit gets the base linear address of that LDT from the ldtr register).
2) Computes the address of the Segment Descriptor from the index field of the Segment
Selector. The index field is multiplied by 8 (the size of a Segment Descriptor),
and the result is added to the content of the gdtr or ldtr register.
3) Adds the offset of the logical address to the Base field of the Segment Descriptor,
thus obtaining the linear(Virtual) address.
Now it is the job of Pagging unit to translate physical address from virtual address.
Refer : Understanding the linux Kernel , Chapter 2 Memory Addressing
User virtual addresses
These are the regular addresses seen by user-space programs. User addresses are either 32 or 64 bits in length, depending on the underlying hardware architecture, and each process has its own virtual address space.
Physical addresses
The addresses used between the processor and the system's memory. Physical addresses are 32- or 64-bit quantities; even 32-bit systems can use 64-bit physical addresses in some situations.
Bus addresses
The addresses used between peripheral buses and memory. Often they are the same as the physical addresses used by the processor, but that is not necessarily the case. Bus addresses are highly architecture dependent, of course.
Kernel logical addresses
These make up the normal address space of the kernel. These addresses map most or all of main memory, and are often treated as if they were physical addresses. On most architectures, logical addresses and their associated physical addresses differ only by a constant offset. Logical addresses use the hardware's native pointer size, and thus may be unable to address all of physical memory on heavily equipped 32-bit systems. Logical addresses are usually stored in variables of type unsigned long or void *. Memory returned from kmalloc has a logical address.
Kernel virtual addresses
These differ from logical addresses in that they do not necessarily have a direct mapping to physical addresses. All logical addresses are kernel virtual addresses; memory allocated by vmalloc also has a virtual address (but no direct physical mapping). The function kmap returns virtual addresses. Virtual addresses are usually stored in pointer variables.
If you have a logical address, the macro __pa() (defined in ) will return its associated physical address. Physical addresses can be mapped back to logical addresses with __va(), but only for low-memory pages.
Reference.
Normally every address issued (for x86 architecture) is a logical address which is translated to a linear address via the segment tables. After the translation into linear address, it is then translated to physical address via page table.
A nice article explaining the same in depth:
http://duartes.org/gustavo/blog/post/memory-translation-and-segmentation/
Physical Address is the address that is seen by the memory unit, i.e., one loaded into memory address register.
Logical Address is the address that is generated by the CPU.
The user program can never see the real physical address.Memory mapping unit converts the logical address to physical address.
Logical address generated by user process must be mapped to physical memory before they are used.
Logical memory is relative to the respective program i.e (Start point of program + offset)
Virtual memory uses a page table that maps to ram and disk. In this way each process can promise more memory for each individual process.
In the Usermode or UserSpace all the addresses seen by program are Virtual addresses.
When in kernel mode addresses seen by kernel are still virtual but termed as logical as they are equal to physical + pageoffset .
Physical addresses are the ones which are seen by RAM .
With Virtual memory every address in program goes through page tables.
when u write a small program eg:
int a=10;
int main()
{
printf("%d",a);
}
compile: >gcc -c fname.c
>ls
fname.o //fname.o is generated
>readelf -a fname.o >readelf_obj.txt
/readelf is a command to understand the object files and executabe file which will be in 0s and 1s. output is written in readelf_onj.txt file/
`>vim readelf_obj.txt`
/* under "section header" you will see .data .text .rodata sections of your object file. every starting or the base address is started from 0000 and grows to the respective size till it reach the size under the heading "size"----> these are the logical addresses.*/
>gcc fname.c
>ls
a.out //your executabe
>readelf -a a.out>readelf_exe.txt
>vim readelf_exe.txt
/* here the base address of all the sections are not zero. it will start from particular address and end up to the particular address. The linker will give the continuous adresses to all the sections (observe in the readelf_exe.txt file. observe base address and size of each section. They start continuously) so only the base addresses are different.---> this is called the virtual address space.*/
Physical address-> the memory ll have the physical address. when your executable file is loaded into memory it ll have physical address. Actually the virtual adresses are mapped to physical addresses for the execution.

Definition/meaning of Aliasing? (CPU cache architectures)

I'm a little confused by the meaning of "Aliasing" between CPU-cache and Physical address.
First I found It's definition on Wikipedia :
However, VIVT suffers from aliasing problems, where several different virtual addresses may refer to the same physical address. Another problem is homonyms, where the same virtual address maps to several different physical addresses.
but after a while I saw a different definition on a presentation(ppt)
of DAC'05: "Energy-Efficient Physically Tagged Caches for Embedded Processors with
Virtual Memory"
Cache aliasing and synonyms:
Alias: Same virtual address from different contexts mapped to different physical addresses Synonym: Different virtual address mapped to the same physical address (data sharing)
As I'm not a native speaker, I don't know which is correct,
though I feel the Wiki's definition is correct.
Edit:
Concept of "aliasing" in CPU cache usually means "synonym", on the contrary is "homonym". In a more generic level, "aliasing" is "confusing" or "chaos" or something like that. So In my opinion, "aliasing" exactly means the mapping of (X->Y) is "not bijective", where
"X" = the subset of physical addresses units which has been cached. (each element is a line of byte)
"Y" = the set of valid cache lines. (elements a also "line")
You'd need to learn about Virtual Memory first, but basically it's this:
The memory addresses your program uses aren't the physical addresses that the RAM uses; they're virtual addresses mapped to physical addresses by the CPU.
Multiple virtual addressses can point to the same physical address.
That means that you can have two copies of the same data in separate parts of the cache without knowing it... and they wouldn't be updated correctly, so you'd get wrong results.
Edit:
Exerpt of reference:
Cache aliasing occurs when multiple mappings to a physical page of memory have conflicting caching states, such as cached and uncached. Due to these conflicting states, data in that physical page may become corrupted when the processor's cache is flushed. If that page is being used for DMA by a driver, this can lead to hardware stability problems and system lockups.
For those who are still unconvinced:
On ARMv4 and ARMv5 processors, cache is organized as a virtual-indexed, virtual-tagged (VIVT) cache in which both the index and the tag are based on the virtual address. The main advantage of this method is that cache lookups are faster because the translation look-aside buffer (TLB) is not involved in matching cache lines for a virtual address. However, this caching method does require more frequent cache flushing because of cache aliasing, in which the same physical address can be mapped to multiple virtual addresses.
#Wu yes you do need to understand virtual memory little to understand aliasing. Let me give you a few lines of explanation first:
Lets say I have a RAM (physical memory) of 1GB. I want to present my programmer with a view that I have 4GB memory then I use virtual memory. In virtual memory, the programmer thinks that he/she has 4GB and writes their program from that perspective. They do not need to know how much physical memory exists. The advantage is that program will run on computers with different amounts of RAM. Also, the program can run on a computer together with other programs (also consuming physical memory).
So here is how virtual memory is implement. I will give a simple 1-level virtual memory system (Intel has a 2/3-level system which just makes it complicated for explanation.
Our problem here is that the programmer has 4 Billion addresses and we only have 1 billion places to put those 4 billion addresses. So, addresses are from the virtual address space need to be mapped to physical address space. This is done using a simple index table called a Page Table. You access a Page Table with a virtual address and it gives you the physical address of that memory location.
Some details: Remember that physical space is only 1GB so the system only keeps the most recently accessed 1GB worth in physical memory and keeps the rest in system disk. When the program requests a particular address, we first check if it is already in physical memory. If so, it is returned to the program. If not, it brought from the disk and put into physical memory and then returned to the program. The latter is known as a Page Fault.
Coming back to aliasing in context of virtual memory: since there is mapping between virtual -> physical addresses, it is possible to make two virtual addresses to map to the same physical address. it is the same as saying that if I look at my page table for virtual
address X and Y, I will get the same physical address in BOTH cases.
I show below a simple example of a 8 entry Page Table. Say there are 8 vitual addresses and only 3 physical addresses. The page table looks as follows:
0: 1
1: On disk
2: 2
3: 1
4: On disk
5: On disk
6: On disk
7: 0
This mean that if virtual address 4 is accessed, you will get a page fault.
If virtual addresses 3 is accessed, you will get the physical address 1
In this case, virtual addresses 0 and 3 are aliasing to the same physical address 1 for both of them
NOTE: I used the terms physical and virtual addresses everywhere to simplify the concept. In a real system, the virtual-to-physical mapping is not on a per address basis . Instead, we map chunks of virtual space to physical space. Each chunk is called a Page (thats why the mapping table is called a page table) and the size of the chunk is a property of the ISA, e.g., Intel x86 has 4Kbyte pages.

Resources