I looked up registers and memory locations and found that a memory location holds 8 bits of data while a register can hold up to 64 bits.
So my question is: how do you store a register into a single memory location if the memory location, as you've already noticed, is not big enough? How exactly do you transfer a register's data to memory locations? Thanks in advance!
The "memory location" of an object is the location of its first byte.
It is not implied that the entire object resides in that one byte. A 64-bit register, stored in memory verbatim, would occupy the byte at that location and the 7 bytes that immediately follow it.
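To make this concrete, here's a minimal C sketch: storing a 64-bit value writes eight consecutive bytes starting at the object's address (the byte order printed by the loop is little-endian, as on x86):

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    int main(void) {
        uint64_t reg = 0x1122334455667788ULL; /* stand-in for a 64-bit register */
        unsigned char mem[8];

        memcpy(mem, &reg, sizeof reg); /* "store" the register to memory */

        /* The "memory location" of the value is &mem[0]; the data occupies
           that byte plus the 7 bytes that immediately follow it. */
        for (int i = 0; i < 8; i++)
            printf("byte %d: 0x%02x\n", i, mem[i]);
        return 0;
    }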
When the operating system loads a program into main memory, it also attaches the static data, along with the stack and heap memory. I googled what is present in the static data, and it said it contains the global variables and static variables. But I am confused: both of these are already present in the text file of the program, so why do we add them separately?
The data in the executable is often referred to as the data segment. The CPU doesn't interact with the hard disk, only with RAM, so the data segment must be loaded into RAM before the CPU can access it. The file of the executable is not really a text file; it is an executable, so it has a different extension. "Text file" usually refers to an actual file with a .txt extension.
With that said, you also asked another question not long ago (If the amount of stack memory provided to a program is fixed then why does it grow downwards in the process architecture? Or am I getting it wrong?), so I will try to give some insight into both in this same answer.
I don't know much about caching and the CPU's low-level inner workings but, today, the CPU mostly doesn't even operate on RAM directly. It loads chunks of RAM into the cache, operates on them, and keeps RAM and cache consistent through complex mechanisms. The OS also has a role to play in RAM-cache consistency but, like I said, I am far from an expert here. Other than that, caching is mostly transparent to the OS: the CPU handles it, and the OS simply provides instructions which the CPU executes.
Today, paging is used by most OSes and implemented on most CPU architectures. With paging, every process sees a full contiguous virtual address space. The virtual address space is accessed contiguously, and the hardware MMU translates those addresses to physical ones automatically by walking the page tables. The OS is responsible for keeping the page tables consistent, and the MMU does the rest of the job (for more info read: What is paging exactly? OSDEV). If you understand paging well, things become much clearer.
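To give a rough idea of what the MMU does on every access, here is a toy single-level model in C (real x86-64 hardware walks a four-level tree, and page_table here is a made-up array standing in for the OS-managed structures):

    #include <stdint.h>

    #define PAGE_SHIFT 12
    #define PAGE_SIZE  (1u << PAGE_SHIFT)   /* 4096 bytes */

    /* Toy page-table entry: physical frame number plus a present bit. */
    typedef struct { uint64_t frame; int present; } pte_t;

    extern pte_t page_table[];  /* hypothetical, filled in by the OS */

    /* What the MMU conceptually computes for every memory access. */
    uint64_t translate(uint64_t virt) {
        uint64_t vpn    = virt >> PAGE_SHIFT;     /* virtual page number */
        uint64_t offset = virt & (PAGE_SIZE - 1); /* offset within page  */

        pte_t pte = page_table[vpn];
        if (!pte.present)
            return (uint64_t)-1; /* in hardware: raise a page fault */

        return (pte.frame << PAGE_SHIFT) | offset; /* physical address */
    }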
For a process, there are mostly three types of memory: the stack (often called automatic storage), the heap, and the static/global data. I will attempt to elaborate on all of these to give a global picture.
The stack is given a maximum size when the process begins. The OS handles that: it creates the page tables and places the proper address in the stack pointer register so that stack accesses reach the proper region of physical memory. The stack is automatic storage, which means that it isn't handled manually by the high-level programmer. For example, in C/C++, the stack is managed by the compiler which, at the entry of a function, creates a stack frame and places offsets from the stack base pointer in the instructions. Every local variable (within a function) is accessed at a negative offset relative to the stack base pointer. The compiler needs to create a stack frame of the proper size so that there is enough room for all the local variables of a particular function (for more info on the stack see: Each program allocates a fixed stack size? Who defines the amount of stack memory for each application running?).
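For instance, a compiler might lay out the locals of this C function at fixed negative offsets from the stack base pointer; the offsets in the comments are purely illustrative, not something any particular compiler guarantees:

    void f(void) {
        int  a;        /* e.g. at [rbp - 4]  */
        int  b;        /* e.g. at [rbp - 8]  */
        char buf[16];  /* e.g. at [rbp - 24] */

        /* The prologue reserves one frame big enough for all locals
           (e.g. "sub rsp, 24"); each access below then compiles to a
           fixed offset from the frame base, not a runtime lookup. */
        a = 1;
        b = a + 1;
        buf[0] = (char)b;
        (void)buf;
    }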
For the heap, the OS reserves a very big amount of virtual memory. Today, virtual memory is very big (2^48 bytes or more). The amount of heap available to each process is often limited only by the amount of physical memory available to back virtual memory allocations. For example, a process could use malloc() to allocate 4KB of memory in C. The libc library, which is an implementation of the C standard library, will invoke the OS with a system call. The OS will then reserve a page of the virtual memory available for the heap and change the page tables so that accessing that portion of virtual memory translates to somewhere in RAM (probably somewhere no other process was already using).
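In code, that whole chain hides behind a single call; the comments sketch what plausibly happens underneath (whether libc uses brk or mmap depends on the allocator and the size):

    #include <stdlib.h>
    #include <string.h>

    int main(void) {
        /* libc may satisfy this from memory it already manages, or ask
           the OS for more virtual memory with a system call (brk/sbrk
           or mmap on Linux). */
        char *p = malloc(4096);
        if (!p)
            return 1;

        /* Physical RAM is typically committed lazily: the first write
           to a page faults, and only then does the OS back that virtual
           page with a physical frame. */
        memset(p, 0, 4096);

        free(p);
        return 0;
    }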
The static/global data is simply placed in the executable, in the data segment. The data segment is loaded into virtual memory alongside the text segment. The text segment can thus access this data, often using RIP-relative addressing.
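For example, given a C global, a 64-bit compiler will typically emit an instruction-pointer-relative access; the assembly in the comment shows the general form, not any compiler's exact output:

    int counter; /* ends up in the data (or BSS) segment */

    int read_counter(void) {
        /* On x86-64 this load is typically emitted as something like
           "mov eax, DWORD PTR [rip + <offset to counter>]": the distance
           between the instruction and the variable is fixed at link time,
           so the image can be loaded anywhere without patching. */
        return counter;
    }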
I'm reading about memory management from a book called Operating Systems.
I've studied about this subject before and it was all clear because there were only two types of addresses introduced: Physical & Logical (Physical & Virtual). However, this book seems to introduce three types where it sometimes views two of them as the same, and sometimes as different.
Here's a quote (translated myself, so might not be the best):
At the time of writing a program it is not known at which point in the
memory the program will be, which is why symbolic addresses are used
(variable names). The process of translating symbolic addresses into
physical addresses is called address binding and it can be done at
different points in time. If, during the compilation, it is known in
which part of the memory the program will be then address binding can
be done at that point. Otherwise (the most common case) the compiler
generates relative addresses (relative to the start of the part of
the memory that the process gets). When executing a program the
loader maps relative addresses into physical addresses.
This all seems to be pretty clear. Relative maps to the physical. Here's what comes after:
During process execution, the interaction with memory is done through
sequences of reading and writing into memory locations. The CPU either
reads instructions or data from the memory or writes data into the
memory. Within both of these tasks, the CPU does not use physical
addresses but rather logical ones which the CPU generates itself. The set of all logical
addresses is called the Virtual Address Space.
This is already confusing as it is. What's the difference between a logical and a relative address? Wherever else I look this up they're never separated. Here comes an even more confusing sentence:
In case the address binding is done at the time of compilation and
loading then the virtual address space matches the physical address
space.
Earlier on it is stated that address binding is the process of converting symbolic addresses into physical addresses. But then only later on is the concept of relative addresses introduced. And loading is said to be the process of converting relative into physical. So now I'm completely lost here.
Assuming that we have no knowledge of which part of the memory the process is going to take: how does the timeline go? The program is compiled, the variable names (symbolic addresses) are translated into ... relative ones I guess? Then the CPU needs to do some read/write and it uses ... logical ones?
And furthermore, the terms relative and logical seem to be used randomly in the following sections of the book. As if they're the same, but still defined as different.
Could anyone clarify this for me? The perfect answer would be maybe an artificial example of a program timeline. At which point is which address introduced, what is the difference between a logical and a relative address?
Thanks in advance.
A relative address means a distance between two locations or addresses (which can be logical, linear/virtual or physical; which kind isn't important at this point).
For example, the x86 call and jump instructions have a form that specifies the distance (counted from the byte after the end of the call/jump instruction) to call/jump. That distance is simply added to the instruction pointer register ([R|E]IP) and that's the location where the next instruction will come from (again, I'm ignoring logical, ..., physical for now).
If your program contains a subroutine and calls it using such an instruction, it doesn't matter where the program is located in memory, since the distance between the two locations within the whole remains the same (things become more complex if the whole program consists of several moving parts, including one or more libraries, but let's not go there).
Now, let's say your program has a global variable and needs to read it. If there is a memory-reading instruction similar to the call instruction described above, you can again use the distance from the instruction pointer to the location of the variable. Prior to the 64-bit x86 CPUs there was no such instruction or mechanism to access data; only calls and jumps could be IP-relative.
In the absence of such an IP-relative data addressing mechanism, you need to know the actual address of the variable, which you won't know until the program is loaded into memory for execution. What's done in this case is that the instruction that reads the variable initially receives the address of the variable relative to IP (that of the instruction reading the variable) or simply relative to the program's start. And that's how the program is stored on disk, with a relative address inside the instruction. Once loaded, but before the program starts execution, the address in the instruction that reads the variable is adjusted so that it becomes the actual address, no longer relative to anything (IP or the program's start). The further the program's start is from address 0, the larger the adjustment that needs to be added to that relative address.
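Here is a toy sketch of that load-time adjustment (the record layout and names are invented for illustration; real formats such as ELF and PE keep relocations in their own tables):

    #include <stdint.h>

    /* Invented relocation record: "the 4 bytes at image_offset hold an
       address stored relative to the program's start". */
    typedef struct { uint32_t image_offset; } reloc_t;

    void apply_relocs(uint8_t *image_base, const reloc_t *relocs, int n) {
        for (int i = 0; i < n; i++) {
            uint32_t *slot = (uint32_t *)(image_base + relocs[i].image_offset);

            /* On disk the slot holds a distance from the program's start;
               adding the actual load address turns it into a real address.
               The further image_base is from 0, the larger the adjustment. */
            *slot += (uint32_t)(uintptr_t)image_base;
        }
    }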
Get the idea?
And now something almost entirely different and unrelated...
In the context of x86 CPUs, there are these kinds of addresses:
Logical
Linear/virtual
Physical
If we go back all the way to the 8086/8088... actually, if we go even further back to the 8080/8085: all memory addresses are 16-bit, they don't undergo any translation by the CPU and are presented as-is to the memory, hence they're physical (we're not talking about IP/PC-relative call/jump instructions here).
16 bits allow for 64KB of memory. The 8086/8088 extended those 16-bit addresses with another 16 bits to address more than 64KB of memory, but it didn't just widen all registers and addresses from 16 to 32 bits. Instead it introduced special segment registers, used in pairs with those old 16-bit addresses of the 8080/8085. So a pair of registers such as DS (a segment register) and BX (a regular general-purpose register) addresses memory at DS * 16 + BX. The pair DS:BX is the logical address; the value DS * 16 + BX is the physical address. With this scheme we can access approximately 1MB of memory (just plug in 65535 for both registers).
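The arithmetic is simple enough to write down; a one-line C sketch of what the 8086's address unit computes:

    #include <stdint.h>

    /* 8086 real mode: physical = segment * 16 + offset. */
    uint32_t real_mode_phys(uint16_t seg, uint16_t off) {
        return ((uint32_t)seg << 4) + off;
    }
    /* Plugging in the maximums: 0xFFFF * 16 + 0xFFFF = 0x10FFEF,
       i.e. 1 MB plus roughly 64 KB. */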
The 80286 slightly changed the above by introducing the so-called protected mode, in which the physical address was calculated as segment_table[DS] + BX (this allowed going from 1MB to 16MB), but the idea was still the same.
Next came the 80386, which widened registers to 32 bits and introduced yet another layer of indirection. The physical address was now, simplifying a bit, page_tables[segment_table[DS] + EBX].
The pair DS:EBX constitutes the logical address; this is what the program manipulates (e.g. in the instruction MOV EAX, DS:[EBX]) and what it can observe.
segment_table[DS] + EBX constitutes the linear/virtual address (which the program may not always know since it can't see into segment_table[], a table managed by the OS). If page translation isn't enabled, this linear/virtual address is also equal to the final, physical address.
With page translation enabled, the physical address is page_tables[segment_table[DS] + EBX].
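Putting the stages together in pseudo-C (segment_base and page_walk are stand-ins for the CPU's descriptor tables and paging structures, both managed by the OS):

    #include <stdint.h>
    #include <stdbool.h>

    extern uint32_t segment_base[];             /* from the descriptor tables */
    extern uint32_t page_walk(uint32_t linear); /* the MMU's page-table walk  */
    extern bool     paging_enabled;

    /* How "MOV EAX, DS:[EBX]" conceptually resolves its address. */
    uint32_t resolve(uint16_t ds, uint32_t ebx) {
        uint32_t linear = segment_base[ds] + ebx; /* logical -> linear  */
        if (!paging_enabled)
            return linear;                        /* linear == physical */
        return page_walk(linear);                 /* linear -> physical */
    }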
What's more to know:
logical addresses can be more complex, e.g. DS:[EAX + EBX * 2 + 3]
OSes commonly set up segment_table[] such that segment_table[any segment register] = 0, effectively taking the segmentation mechanism out of the picture and ending up with e.g. physical address = page_tables[EAX + EBX * 2 + 3]. While it's not entirely correct to say that in such a setup the logical and linear/virtual addresses are the same (EAX + EBX * 2 + 3), it definitely simplifies thinking.
Now, what do these segment and page tables have to do with the relative addresses and relocation discussed at the beginning? These tables just let you place your program anywhere in physical memory, often in a way that is fully transparent to the program itself. It doesn't need to know where it physically is or whether page translation is enabled.
However, there are certain benefits to using page translation, but that's outside of the scope here.
I am developing a Linux kernel driver on 3.4. The purpose of this driver is to provide an mmap interface to userspace for a buffer allocated in another kernel module, likely using kzalloc() (more details below). The pointer returned by mmap must point to the first address of this buffer.
I get the physical address from virt_to_phys(). I give this address right shifted by PAGE_SHIFT to remap_pfn_range() in my mmap fops call.
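Roughly, my mmap fop looks like this (a simplified sketch with error handling trimmed; my_buffer stands for the pointer I get back from rproc_da_to_va()):

    #include <linux/fs.h>
    #include <linux/mm.h>
    #include <asm/io.h>

    extern void *my_buffer; /* obtained elsewhere via rproc_da_to_va() */

    static int my_mmap(struct file *filp, struct vm_area_struct *vma)
    {
        unsigned long pfn  = virt_to_phys(my_buffer) >> PAGE_SHIFT;
        unsigned long size = vma->vm_end - vma->vm_start;

        /* This maps whole pages, so if my_buffer is not page-aligned,
           userspace also sees whatever else shares those pages. */
        return remap_pfn_range(vma, vma->vm_start, pfn, size,
                               vma->vm_page_prot);
    }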
It is working for now, but it looks to me like I am not doing things properly, because nothing ensures that my buffer starts at the beginning of a page (correct me if I am wrong). Maybe mmap()ing is not the right solution? I have already read chapter 15 of LDD3, but maybe I am missing something?
Details:
The buffer is in fact a shared memory region allocated by the remoteproc module. This region is used within an asymmetric multiprocessing design (OMAP4). I can get this buffer thanks to the rproc_da_to_va() call. That is why there is no way to use something like get_free_pages().
Regards
Kev
Yes, you're correct: there is no guarantee that the allocated memory is at the beginning of a page. And no simple way for you to both guarantee that and to make it truly shared memory.
Obviously you could (a) copy the data from the kzalloc'd address to a newly allocated page and insert that page into the mmap'ing process's virtual address space, but then it's not shared with the original data created by the other kernel module.
You could also (b) map the actual page allocated by the other module into the process's memory map, but it's not guaranteed to be on a page boundary, and you'd also be sharing whatever other kernel data happened to reside in that page (which is both a security issue and a potential source of kernel data corruption by the user-space process with which you're sharing the page).
I suppose you could (c) modify the memory manager to return every piece of allocated data at the beginning of a page. This would work, but then every time a driver wants to allocate 12 bytes for some small structure, it would in fact allocate 4K bytes (or whatever your page size is). That would waste a huge amount of memory.
There is simply no way to trick the processor into making the memory appear to be at two different offsets within a page. It's not physically possible.
Your best bet is probably to (d) modify the other driver to allocate the specific bits of data that you want to make shared in a way that ensures alignment on a page boundary (i.e. something you write to replace kzalloc).
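For (d), a minimal sketch of a page-aligned replacement for kzalloc, assuming the other module can be modified (the function name is hypothetical; __get_free_pages and get_order are the standard kernel helpers):

    #include <linux/gfp.h>
    #include <linux/mm.h>
    #include <linux/string.h>

    /* Hypothetical kzalloc replacement: the returned buffer always starts
       on a page boundary, so it can be remap_pfn_range'd to userspace
       without exposing neighbouring kernel data. The cost is up to a
       page of slack per allocation. */
    static void *kzalloc_page_aligned(size_t size)
    {
        unsigned long p = __get_free_pages(GFP_KERNEL, get_order(size));

        if (!p)
            return NULL;
        memset((void *)p, 0, size);
        return (void *)p;
    }

The caller would then release the buffer with free_pages() using the same order, instead of kfree().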
How does Windows give 4GB of address space each to multiple processes
when the total memory it can access is also limited to 4GB?
I found the solution to the above question in Windows Memory Management
(written by Pankaj Garg)
Solution:
To achieve this Windows uses a feature of the x86 processor (386 and
above) known as paging. Paging allows the software to use a different
memory address (known as the logical address) than the physical memory
address. The processor's paging unit translates this logical address to
the physical address transparently. This allows every process in the
system to have its own 4GB logical address space.
Can anyone help me to understand it in simpler form?
The basic idea is that you have limited physical RAM. Once it fills up, you start storing stuff on the hard disk instead. When a process requests data that is currently on disk, or asks for new memory, you kick out a page from RAM by transferring it to the disk, and then page in the data you actually need.
The OS maintains a data structure called a page table to keep track of which logical addresses correspond to the data currently in physical memory and where stuff is on the disk.
Each process has its own virtual address space, and operates using logical addresses within this space. The OS is responsible for translating requests for a given process and logical address into a physical address/location on disk. It is also responsible for preventing processes from accessing memory that belongs to other processes.
When a process asks for data that is not currently in physical memory, a page fault is triggered. When this occurs, the OS selects a page to move to disk (if physical memory is full). There are several page replacement algorithms for selecting the page to kick out.
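To sketch the eviction step, here is a toy FIFO replacement in C (real kernels use LRU approximations such as the clock algorithm, and all the helper functions here are hypothetical):

    #include <stdint.h>

    #define NFRAMES 1024 /* number of physical page frames */

    /* Hypothetical helpers a real OS would provide. */
    extern void unmap_frame(int frame);
    extern void write_frame_to_disk(int frame);
    extern void read_page_from_disk(uint64_t vpn, int frame);
    extern void map(uint64_t vpn, int frame);

    static int next_victim; /* FIFO: evict frames in the order they were filled */

    /* Called on a page fault when every physical frame is already in use. */
    void page_in(uint64_t faulting_vpn) {
        int victim = next_victim;
        next_victim = (next_victim + 1) % NFRAMES;

        unmap_frame(victim);          /* the old owner now faults on access */
        write_frame_to_disk(victim);  /* save the evicted page (if dirty)   */
        read_page_from_disk(faulting_vpn, victim);
        map(faulting_vpn, victim);    /* update the page table              */
    }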
The wrong original assumption is "when the total memory it can access is also limited to 4GB". That is untrue: the total memory the OS can access is not limited that way.
There is a limit on the 32-bit addresses that 32-bit code can access: (1 << 32), which is 4 GB. However, this is only the amount that can be accessed simultaneously. Imagine the OS has cards A, B, ..., F and applications can access only four at a time. App1 might be seeing ABCD, App2 ABEF, App3 ABCF. Each app sees 4 cards, but the OS manages all 6.
The limit of the 32-bit flat memory model does not imply that the entire OS is subject to the same limit.
Windows uses a technique called virtual memory, in which each process has its own memory. One of the reasons this is done is security: to forbid one process from accessing the memory of other processes.
As you've pointed out, the assigned virtual memory can be bigger than the actual physical memory. This is where paging comes into play. My knowledge of memory management and microarchitecture is a bit rusty, so I don't want to post anything wrong, but I'd recommend reading http://en.wikipedia.org/wiki/Virtual_memory
If you are interested in more literature, I'd recommend reading 'Structured Computer Organization' by Tanenbaum.
Virtual address space is not RAM. It's an address space. Each page (the size of a page depends on the system) can be unmapped (the page is nowhere and not accessible; it does not exist), mapped to a file (the page is not directly accessible; its content is stored on disk), or mapped to RAM (those are the pages you can actually access).
Pages mapped to RAM can be swappable or pinned. Pinned pages will never be swapped out to disk. Swappable pages are associated with an area on disk and may be written to that area to free up the RAM they are using.
Pages mapped to RAM can also be read only, write only, read write. If they are writable they may be directly writable or copy-on-write.
Multiple pages (both within the same address space and across separate address spaces) may be mapped identically. This is how two separate processes may access the same data in memory (which may appear at different addresses in each process).
In a modern operating system each process has its own address space. On 32-bit operating systems each process has 4GiB of address space. On 64-bit operating systems, 32-bit processes still have only 4GiB (4 gigabinary bytes) of address space, but 64-bit processes may have far more; the full 64-bit space is 16 EiB (16 exabinary bytes, i.e. 16,777,216 TiB), although current hardware implements only part of it.
The size of the address space is totally independent of both the amount of RAM and the amount of actually allocated space. You can have 100 processes, each with an enormous address space, on a machine with one gigabyte of RAM. In fact Windows has been giving 4GiB of address space to each process since the time when the typical machine had just a few megabytes of RAM.
Assuming the context is a 32-bit system:
In addition to http://en.wikipedia.org/wiki/Virtual_memory : although the memory abstraction given by the kernel to each process is 4GB, a process can actually use far less than that, because in each process the kernel is also mapped into a large share of the pages. In general, on NT systems 2GB of the 4GB is used by the kernel, and on *nix systems 1GB is used by the kernel.
I read this a long time ago during my OS course, with Windows as a case study. The numbers I give may not be accurate, but they can give you a decent idea of what happens behind the scenes. From what I can recall:
Windows uses a memory model called demand paging. On Intel a page size is 4K. Initially, when you run a program, only 4 pages of 4K each are loaded from your program, which means a total of 16K of memory is allocated. Programs may be bigger, but there is no need to load the whole program into memory at once. Some of these pages are data pages, i.e. read/writeable, where your variables and data structures live, while the others are code pages containing the executable code, i.e. the code segment. The IP is set to the first instruction of the code segment, and the program starts its execution under the impression that 4GB is allocated.
When further pages are needed, that is, you request more memory (data segment) or your program executes further and needs other executable instructions (code segment), Windows checks whether a sufficient amount of memory is available. If yes, these pages are loaded and mapped into the process's address space. If not much memory is available, Windows checks which pages have not been used for quite some time (this is done for all processes, not just the calling process). When it finds such pages, it moves them to the paging file to free up space in memory and loads the requested pages.
If your program calls code from some DLL that is already loaded, Windows simply maps those pages into your process's address space. There is no need to load them again, as they are already available in memory; this avoids duplication as well as saving space.
So theoretically the processes are using more memory than is available, and each can use up to 4GB of memory, but in reality only a portion of each process is loaded at any one time.
I'm not sure whether this is the right place to ask, nor quite how to put my query.
Let me put it this way:
Main memory running from 0x00000 to 0xFFFFF.
Disk space running from 0x00000000 to 0xFFFFFFFF.
But we won't be able to access everything from the 0th byte to the last byte, right?
On the hard disk, I guess at the 0th byte we have the MBR, and somewhere we have the filesystem (which is all we are able to access). What else?
Similarly with main memory: we have some kernel memory and user memory (in which the processes live). What else?
My question is: what are all the regions from the 0th byte to the last byte? I don't know what to search for or where to find such information. If anyone can post some links, that would be great.
EDIT:
I'm using x86 32-bit on Windows. Actually, I was reading a book on computer security where the author mentions that malware can live either on the disk or in memory (which is very true). But when we say a computer is infected, that doesn't mean only files (which are part of the filesystem) are infected. There are other areas not meant for the user, like the MBR or kernel memory.
So the question popped up in my mind: what are all such areas that I may not be aware of?
Apart from the fact that the answer to this question is highly dependent on the OS: disk space is not at all part of main memory. On Intel architectures, disk access takes up some I/O address space (which is separate from the memory address space) per channel, and the exact number of words depends on the channel: IDE/ATA/SATA/SCSI. On other architectures which are memory-mapped, like the PowerPC, disk access does take up some memory address space, but still not much.
To illustrate (and be warned that this is a very simplified example, not the real world), assume a memory-mapped CPU* like the PowerPC trying to access a disk with LBA addressing. The disk really only needs 2 to 3 words of memory to expose multiple gigabytes of data. That is, we only need 12 bytes to store and retrieve gigabytes of data:
2 words (8 bytes) to tell the disk where to seek to, that is, at what address we want to read from or write to.
1 word (4 bytes) to actually do the read and write. Every time you read from this address, the 2-word seek pointer automagically increments by 1 byte (or 4 if you read in 32-bit units).
But the above is an abstracted view of what really happens. Most disk controllers have several more registers to control power management, disk spin speed, entering and exiting sleep mode, etc.
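In C, driving such a memory-mapped controller amounts to reading and writing a few volatile words at fixed addresses; everything below, from the base address to the register layout, is invented purely for illustration:

    #include <stdint.h>

    /* Invented register layout for a memory-mapped disk controller. */
    #define DISK_BASE 0xF0000000u

    static volatile uint32_t *const seek_lo =
        (volatile uint32_t *)(DISK_BASE + 0x0);
    static volatile uint32_t *const seek_hi =
        (volatile uint32_t *)(DISK_BASE + 0x4);
    static volatile uint32_t *const data_port =
        (volatile uint32_t *)(DISK_BASE + 0x8);

    uint32_t disk_read_word(uint64_t byte_addr) {
        *seek_lo = (uint32_t)byte_addr;         /* 2 words: where to seek */
        *seek_hi = (uint32_t)(byte_addr >> 32);
        return *data_port; /* 1 word: each read advances the seek pointer */
    }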
And what are the addresses of these memory locations? Well, it depends on the I/O channel you're talking about. The old-school ISA bus depended on the user setting jumpers on cards to choose the addresses, so for those you need to ask the user. The PCI bus auto-negotiates the addresses with the disk controllers at boot time and then, depending on the architecture, either tells your BIOS what devices exist, passes them as parameters to the bootloader, or stores them in temporary registers on the system bus. USB works like PCI but negotiates with the OS instead of the BIOS... etc.
As you can see, there is no simple answer, even if you limit the question to a specific case like Windows 7 running on a 64-bit AMD CPU on a Dell motherboard.
*note: since you're worried about memory locations.
Your question is complex, and hard to answer without knowing the scope of the view of memory.
Pretending we're in ring 0 with directly mapped memory, a PC-compatible has multiple memory regions: lower memory, BIOS-mapped code, I/O ports, video memory, etc. They all live in the same memory space. You communicate with peripherals by reading and writing specific memory addresses (which are mapped to those components). These addresses are set up by the hardware in question and the drivers in use.
Once we enter user mode, you have to deal with virtual memory. Addresses are symbolic, and may or may not map to any particular part of physical memory. I'd suggest reading up on Virtual memory