How many memory references are required by the CPU to execute an immediate-address instruction, and why?

I know that in immediate addressing mode, instead of the operand's address, the instruction contains the value of the operand itself.
So does the CPU require any memory reference at all, given that the operand is held within the instruction itself?
Any insights would be very much appreciated.

Related

Can virtual memory be used to support the data breakpoint feature on i386?

I was looking through my OS textbook and it mentioned that virtual address translation can be involved in implementing data breakpoints (for program debugging). I only know that the debugger uses INT 3 to pause the program, and that local and global variables are handled in some way via the debug control and address registers. But after some digging, all I found was information about linear addresses in connection with the debug registers; no articles or discussions about the mechanism behind virtual-address-based data breakpoints at all. So how exactly does this work?
Linear addresses are virtual, in x86 terminology. x86 memory addressing goes:
1. Addressing mode like [ebp + eax*4] to "effective address" (the offset part of a seg:off). (Every addressing mode also implies a segment, if you don't manually override with [fs: rdi] for example. Normally DS, unless the base register is R/E/BP or R/ESP, in which case SS. For implicit addressing modes that are part of e.g. push rax or stosb, it depends on the instruction.)
2. seg:off -> linear, by adding the segment base to the offset.
3. Translation of that linear address to physical. (And if virtualizing, from guest-physical to true physical.)
All steps are done by the CPU hardware: first using the segment base, then using the page table pointed to by CR3, or the TLB, which caches translations from that page table.
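As a minimal C model of those three steps for an access like MOV EAX, [EBP + EAX*4]: seg_base and the flat page_table[] here are toy stand-ins for the real segment descriptor and two-level page-table structures, not actual hardware layouts.

#include <stdint.h>

#define PAGE_SIZE 4096u

extern uint32_t seg_base;        /* toy: base taken from the segment descriptor */
extern uint32_t page_table[];    /* toy: linear page number -> physical frame   */

uint32_t translate(uint32_t ebp, uint32_t eax) {
    uint32_t offset = ebp + eax * 4;         /* 1: addressing mode -> effective address */
    uint32_t linear = seg_base + offset;     /* 2: seg:off -> linear                    */
    return page_table[linear / PAGE_SIZE]    /* 3: linear -> physical (page walk,       */
         + linear % PAGE_SIZE;               /*    normally short-circuited by the TLB) */
}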
The hardware debug registers for hardware breakpoints / watchpoints use virtual addresses. https://en.wikipedia.org/wiki/X86_debug_register explains it as follows:
The addresses in these registers are linear addresses. If paging is enabled, the linear addresses are translated into physical addresses by the processor's paging mechanism. If paging is not enabled, these linear addresses are the same as physical addresses.
That implies that a watchpoint can trigger when you access the same physical address from a different virtual address than the one you put in the debug register. (If that description on Wikipedia is accurate; I'd test it and/or check Intel or AMD's manuals if that matters.)
I don't actually know the details; I know x86 has a TF flag and debug registers, and I have a general idea of what they can do, but I've never written code to use them.
I only know that the debugger uses INT 3 to pause the program
"hardware breakpoint" means the CPU will stop without software having to rewrite the executing code to 0xCC int3. The debug registers can do this, and also detect access to certain memory locations by any instruction.
So you can set a watchpoint to break when anything your program reads or writes a certain global variable in memory, letting you find code that modifies it through a pointer or something. And since it's HW supported, you can run at full speed instead of having to single-step and have software check every access.
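As a concrete illustration: on Windows x64, a debugger can arm such a watchpoint from outside the target by writing the debug registers through the thread context. A minimal sketch, assuming the target thread is already suspended and with all error handling omitted:

#include <windows.h>

/* Watch 4 bytes at addr for read or write, using breakpoint slot DR0. */
void set_watchpoint(HANDLE hThread, void *addr) {
    CONTEXT ctx = {0};
    ctx.ContextFlags = CONTEXT_DEBUG_REGISTERS;
    GetThreadContext(hThread, &ctx);

    ctx.Dr0  = (DWORD64)addr;     /* linear address to watch                */
    ctx.Dr7 |= 1ull;              /* L0: locally enable slot 0              */
    ctx.Dr7 |= 3ull << 16;        /* R/W0 = 11b: break on read or write     */
    ctx.Dr7 |= 3ull << 18;        /* LEN0 = 11b: watch a 4-byte region      */

    SetThreadContext(hThread, &ctx);
}

When any instruction in that thread then touches those 4 bytes, the CPU raises a debug exception and the debugger gets an event, with no code rewriting involved.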
See also
Need an overview of debugging process from the hardware layer
How does GDB restore instruction after breakpoint
How does gdb set software breakpoints in shared library functions?
Why Single Stepping Instruction on X86?
Intel's manuals.

Difference between relative and logical address

I'm reading about memory management in a book called Operating Systems.
I've studied this subject before, and back then it was all clear because only two types of addresses were introduced: physical and logical (i.e. physical and virtual). However, this book seems to introduce three types, and it sometimes treats two of them as the same and sometimes as different.
Here's a quote (translated by myself, so it might not be the best):
At the time of writing a program it is not known at which point in the
memory the program will be, which is why symbolic addresses are used
(variable names). The process of translating symbolic addresses into
physical addresses is called address binding and it can be done at
different points in time. If, during the compilation, it is known in
which part of the memory the program will be then address binding can
be done at that point. Otherwise (the most common case) the compiler
generates relative addresses (relative to the start of the part of
the memory that the process gets). When executing a program the
loader maps relative addresses into physical addresses.
This all seems to be pretty clear. Relative maps to the physical. Here's what comes after:
During process execution, the interaction with memory is done through
sequences of reading and writing into memory locations. The CPU either
reads instructions or data from the memory or writes data into the
memory. Within both of these tasks, the CPU does not use physical
addresses but rather logical ones which the CPU generates itself. The set of all logical
addresses is called the Virtual Address Space.
This is already confusing as it is. What's the difference between a logical and a relative address? Wherever else I look this up they're never separated. Here comes an even more confusing sentence:
In case the address binding is done at the time of compilation and
loading then the virtual address space matches the physical address
space.
Earlier on, it is stated that address binding is the process of converting symbolic addresses into physical addresses, but the concept of relative addresses is only introduced later, and loading is said to be the process of converting relative addresses into physical ones. So now I'm completely lost.
Assuming that we have no knowledge of which part of the memory the process is going to take: how does the timeline go? The program is compiled, the variable names (symbolic addresses) are translated into ... relative ones I guess? Then the CPU needs to do some read/write and it uses ... logical ones?
And furthermore, the terms relative and logical seem to be used randomly in the following sections of the book. As if they're the same, but still defined as different.
Could anyone clarify this for me? The perfect answer would be maybe an artificial example of a program timeline. At which point is which address introduced, what is the difference between a logical and a relative address?
Thanks in advance.
A relative address means a distance between two locations or addresses (which can be logical, linear/virtual or physical; which kind isn't important at this point).
For example, the x86 call and jump instructions have a form that specifies the distance (counted from the byte after the end of the call/jump instruction) to call/jump to. That distance is simply added to the instruction pointer register ([R|E]IP), and that's the location where the next instruction will come from (again, I'm ignoring logical, ..., physical for now).
If your program contains a subroutine and calls it using such an instruction, it doesn't matter where the program is located in memory, since the distance between two locations of the whole remains the same (things become more complex if the whole program consists of several moving parts, including one or more libraries, but let's not go there).
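For instance, the 5-byte E8 rel32 form of call stores only such a distance. A small sketch of how the CPU (or a disassembler) turns it into a target address; the concrete numbers here are made up:

#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint32_t call_addr = 0x1000;   /* where the E8 call instruction sits        */
    uint32_t insn_len  = 5;        /* opcode byte + 4-byte displacement         */
    int32_t  rel32     = 0x20;     /* signed distance stored in the instruction */

    /* The displacement is counted from the byte after the instruction. */
    uint32_t target = call_addr + insn_len + (uint32_t)rel32;
    printf("call target = 0x%X\n", target);   /* prints 0x1025 */
    return 0;
}

Load the program 4KB higher and both call_addr and target move up by 4KB, so the stored rel32 stays valid.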
Now, let's say your program has a global variable and needs to read it. If there were a memory-read instruction similar to the call instruction described above, you could again use the distance from the instruction pointer to the location of the variable. But prior to the 64-bit x86 CPUs there was no such instruction/mechanism to access data; only calls and jumps could be IP-relative.
In the absence of such an IP-relative data addressing mechanism, you need to know the actual address of the variable, which you won't know until the program is loaded into memory for execution. What's done in this case is that the instruction that reads the variable initially contains the address of the variable relative to IP (that of the reading instruction) or simply to the program's start. And that's how the program is stored on disk, with a relative address inside the instruction. Once the program is loaded, but before it starts executing, the address in that instruction is adjusted so that it becomes the actual address and is no longer relative to anything (IP or the program's start). The further the program's start is from address 0, the larger the adjustment that needs to be added to that relative address.
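A sketch of such a load-time fix-up, assuming a flat program image and a simple relocation table of offsets (all names here are hypothetical, not any real executable format):

#include <stdint.h>
#include <stddef.h>

void apply_relocations(uint8_t *image, const size_t *reloc_offsets,
                       size_t n, uint32_t load_base) {
    for (size_t i = 0; i < n; i++) {
        /* Each entry points at a 32-bit slot holding a start-relative address. */
        uint32_t *slot = (uint32_t *)(image + reloc_offsets[i]);
        *slot += load_base;   /* relative to program start -> absolute */
    }
}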
Get the idea?
And now something almost entirely different and unrelated...
In the context of x86 CPUs, there are these kinds of addresses:
Logical
Linear/virtual
Physical
If we go back all the way to the 8086/8088... Actually, if we go even further back, to the 8080/8085: all memory addresses are 16-bit, they don't undergo any translation by the CPU and are presented as-is to the memory, hence they're physical (we're not talking about IP/PC-relative call/jump instructions here).
16 bits allow for 64KB of memory. The 8086/8088 extended those 16-bit addresses with another 16 bits in order to address more than 64KB of memory, but it didn't just widen all registers and addresses from 16 to 32 bits. Instead it introduced special segment registers, used in pairs with those old 16-bit addresses of the 8080/8085. So a pair of registers such as DS (a segment register) and BX (a regular general-purpose register) could address memory at address DS * 16 + BX. The pair DS:BX is the logical address; the value DS * 16 + BX is the physical address. With this scheme we can access slightly over 1MB of memory (just plug in 65535 for both registers).
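In C, that real-mode address formation is simply (a minimal sketch):

#include <stdint.h>
#include <stdio.h>

/* Real-mode 8086 address formation: physical = segment * 16 + offset. */
static uint32_t real_mode_physical(uint16_t seg, uint16_t off) {
    return ((uint32_t)seg << 4) + off;
}

int main(void) {
    /* Maxing out both registers gives 0x10FFEF, just over 1MB. */
    printf("0x%05X\n", real_mode_physical(0xFFFF, 0xFFFF));
    return 0;
}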
The 80286 slightly changed the above by introducing the so-called protected mode, in which the physical address was calculated as segment_table[DS] + BX (this allowed going from 1MB to 16MB), but the idea was still the same.
Next came the 80386, which widened the registers to 32 bits and introduced yet another layer of indirection. The physical address was now, simplifying a bit, page_tables[segment_table[DS] + EBX].
The pair DS:EBX constitutes the logical address; this is what the program manipulates (e.g. in the instruction MOV EAX, DS:[EBX]) and what it can observe.
segment_table[DS] + EBX constitutes the linear/virtual address (which the program may not always know since it can't see into segment_table[], a table managed by the OS). If page translation isn't enabled, this linear/virtual address is also equal to the final, physical address.
With page translation enabled, the physical address is page_tables[segment_table[DS] + EBX].
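A toy C model of that 386 pipeline, using the same names as above (flat single-level tables and 4KB pages; real segment descriptors and two-level page tables are more involved):

#include <stdint.h>

#define PAGE_SIZE 4096u

extern uint32_t segment_table[];   /* toy: selector -> segment base                */
extern uint32_t page_tables[];     /* toy: linear page number -> physical frame    */

/* Models the address used by MOV EAX, DS:[EBX]. */
uint32_t physical_address(uint16_t ds, uint32_t ebx) {
    uint32_t linear = segment_table[ds] + ebx;    /* logical -> linear/virtual */
    return page_tables[linear / PAGE_SIZE]        /* linear  -> physical       */
         + linear % PAGE_SIZE;
}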
What's more to know:
logical addresses can be more complex, e.g. DS:[EAX + EBX * 2 + 3]
OSes commonly set up segment_table[] such that segment_table[any segment register] = 0, effectively taking the segmentation mechanism out of the picture and ending up with e.g. physical address = page_tables[EAX + EBX * 2 + 3]. While it's not entirely correct to say that in such a setup the logical and linear/virtual addresses are the same (EAX + EBX * 2 + 3), it definitely simplifies thinking.
Now, what do these segment and page tables have to do with the relative addresses and relocation discussed at the beginning? These tables simply let you place your program anywhere in physical memory, often in a way that's completely transparent to the program itself: it doesn't need to know where it physically resides or whether page translation is enabled.
However, there are certain benefits to using page translation, but that's outside of the scope here.

Does a cache line flush access the TLB?

Assuming we have intentionally thrashed the DTLB, and would like to flush a specific cache line from L1-L3 using clflush on a memory region which is (most likely) disjoint from the addresses covered by the TLB entries: would this in fact bring the page base address of the line we are flushing back into the TLB?
In short, does a clflush touch the TLB at all? I'm assuming that because this instruction honours coherency, it will subsequently write that line back to memory (which obviously needs a TLB look-up).
From the Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 2A: Instruction Set Reference, A-L: "Invalidates the cache line that contains the linear address specified with the source operand from all levels of the processor cache hierarchy (data and instruction)."
Since it uses the linear (virtual) address, the address needs to be translated, which means that a page table walk would be needed on a TLB miss. (This would generally be the case even for a different kind of instruction that pushed cache entries out to higher levels of cache since L1 caches are typically physically tagged for x86. In general, tagging with the virtual address has fallen out of favor. Using the physical address for tags means that the physical address is needed to check the cache for a hit, so even if it was not sent to memory, translation would be needed.)
While it would be possible to avoid loading the TLB for such accesses, the extra complexity of such special-case handling would almost certainly not be viewed as worth the bother given that CLFLUSH is not commonly used.
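For reference, this is what using the instruction looks like from C via the SSE2 intrinsic (a minimal sketch; the fences conservatively order the flush against the surrounding accesses):

#include <emmintrin.h>   /* _mm_clflush, _mm_mfence */

int buffer[1024];

void flush_first_line(void) {
    buffer[0] = 42;              /* dirty the line                              */
    _mm_mfence();                /* order the store before the flush            */
    _mm_clflush(&buffer[0]);     /* operand is a *linear* address, so this      */
                                 /* access is translated and can miss the TLB   */
    _mm_mfence();                /* order the flush before later accesses       */
}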

MapViewOfFileEx - valid lpBaseAddress

In answer to a question about mapping non-contiguous blocks of files into contiguous memory, here, it was suggested by one respondent that I should use VirtualAllocEx() with MEM_RESERVE in order to establish a 'safe' value for the final (lpBaseAddress) parameter for MapViewOfFileEx().
Further investigation revealed that this approach causes MapViewofFileEx() to fail with error 487: "Attempt to access invalid address." The MSDN page says:
"No other memory allocation can take place in the region that is used for mapping, including the use of the VirtualAlloc or VirtualAllocEx function to reserve memory."
While the documentation might be considered ambiguous with respect to valid sequences of calls, experimentation suggests that it is not valid to reserve memory for MapViewOfFileEx() using VirtualAllocEx().
On the web, I've found examples with hard-coded values, for example:
#define BASE_MEM (VOID*)0x01000000
...
hMap = MapViewOfFileEx( hFile, FILE_MAP_WRITE, 0, 0, 0, BASE_MEM );
To me, this seems inadequate and unreliable... It is far from clear why this address is safe or how many blocks can safely be mapped there. It seems even shakier given that I need my solution to work alongside other allocations, and that my source must compile and work in both 32-bit and 64-bit contexts.
What I'd like to know is if there is any way to reliably reserve a pool of address space in order that - subsequently - it can be reliably used by MapViewOfFileEx to map blocks to explicit memory addresses.
You almost got to the solution by yourself but fell short of the last small step.
As you figured, use VirtualAlloc (with MEM_RESERVE) to find room in your address space, but after that (and before MapViewOfFileEx) use VirtualFree (with MEM_RELEASE). Now the address range will be free again. Then pass the same memory address (the one returned by VirtualAlloc) to MapViewOfFileEx.
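A minimal sketch of that sequence (error handling omitted). Note the window between VirtualFree and MapViewOfFileEx: another thread could grab the range in the meantime, so you still have to be prepared for the mapping to fail and retry:

#include <windows.h>

void *map_at_discovered_address(HANDLE hMapping, SIZE_T size) {
    /* Reserve a range to discover a free, granularity-aligned address... */
    void *addr = VirtualAlloc(NULL, size, MEM_RESERVE, PAGE_NOACCESS);
    if (addr == NULL)
        return NULL;

    /* ...release it so the range is free again... */
    VirtualFree(addr, 0, MEM_RELEASE);

    /* ...and immediately request the view at exactly that address. */
    return MapViewOfFileEx(hMapping, FILE_MAP_WRITE, 0, 0, 0, addr);
}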
What you are trying to do is impossible.
From the MapViewOfFileEx docs, the pointer you supply is "A pointer to the memory address in the calling process address space where mapping begins. This must be a multiple of the system's memory allocation granularity, or the function fails."
The memory allocation granularity is 64K, so you cannot map disparate 4K pages from the file into adjacent 4K pages in virtual memory.
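Rather than hard-coding 64K, you can query the granularity at run time (in practice it has been 64K on every desktop Windows, but asking is cheap):

#include <windows.h>
#include <stdio.h>

int main(void) {
    SYSTEM_INFO si;
    GetSystemInfo(&si);
    printf("allocation granularity: %lu bytes\n",
           (unsigned long)si.dwAllocationGranularity);
    printf("page size:              %lu bytes\n",
           (unsigned long)si.dwPageSize);
    return 0;
}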
If you provide a base address, the function will try to map your file at that address. If it cannot use that base address (because something is already using all or part of the requested memory region), then the call will fail.
For most applications, there's no real point trying to fix the address yourself. If you're a sophisticated database process and you're trying to carefully manage your own memory layout on a machine with a known configuration for efficiency reasons, then it might be reasonable. But you'd have to be prepared for failure.
In 64-bit processes, the virtual address space is pretty wide open, so it might be possible to select a base address with some certainty, but I don't think I'd bother.
From MSDN:
While it is possible to specify an address that is safe now (not used by the operating system), there is no guarantee that the address will remain safe over time. Therefore, it is better to let the operating system choose the address.
I believe "over time" refers to future versions of the OS and whatever run-time libraries you're using (e.g., for memory allocation), which might take a different approach to memory layout.
Also:
If the lpBaseAddress parameter specifies a base offset, the function succeeds if the specified memory region is not already in use by the calling process. The system does not ensure that the same memory region is available for the memory mapped file in other 32-bit processes.
So basically, your instinct is right: specifying a base address is not reliable. You can try, but you must be prepared for failure.
So to directly answer your question:
What I'd like to know is if there is any way to reliably reserve a pool of address space in order that - subsequently - it can be reliably used by MapViewOfFileEx to map blocks to explicit memory addresses.
No, there isn't. Not without applying many constraints on the runtime environment (e.g., limiting to a specific version of the OS, setting base addresses for all of your DLLs, disallowing DLL injection, etc.).

memory allocation vs. swapping (under Windows)

Sorry for my rather general question, but I could not find a definite answer to it:
Given that I have free swap space left and I allocate memory in reasonable chunks (~1MB), can memory allocation still fail for any reason?
The smartass answer would be "yes, memory allocation can fail for any reason". That may not be what you are looking for.
Generally, whether your system has free memory left is not related to whether allocations succeed. Rather, the question is whether your process address space has free virtual address space.
The allocator (malloc, operator new, ...) first looks if there is free address space in the current process that is already mapped, that is, the kernel is aware that the addresses should be usable. If there is, that address space is reserved in the allocator and returned.
Otherwise, the kernel is asked to map new address space to the process. This may fail, but generally doesn't, as mapping does not imply using physical memory yet -- it is just a promise that, should someone try to access this address, the kernel will try to find physical memory and set up the MMU tables so the virtual->physical translation finds it.
When the system is out of memory, i.e. there is no physical memory left, the process is suspended and the kernel attempts to free physical memory by moving other processes' memory to disk. The application does not notice any of this, except that executing a single assembler instruction apparently took a very long time.
Memory allocations in the process fail if there is no mapped free region large enough and the kernel refuses to establish a mapping. For example, not all virtual addresses are useable, as most operating systems map the kernel at some address (typically, 0x80000000, 0xc0000000, 0xe0000000 or something such on 32 bit architectures), so there is a per-process limit that may be lower than the system limit (for example, a 32 bit process on Windows can only allocate 2 GB, even if the system is 64 bit). File mappings (such as the program itself and DLLs) further reduce the available space.
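You can see the address-space limit (rather than swap) being the binding constraint with a small experiment; built as a 32-bit Windows process, this typically stops somewhere below 2 GB, no matter how much RAM and swap the machine has:

#include <stdio.h>
#include <stdlib.h>

int main(void) {
    size_t mb = 0;
    /* Leak on purpose; the OS reclaims everything at process exit. */
    while (malloc(1024 * 1024) != NULL)
        mb++;
    printf("allocated %zu MB before malloc failed\n", mb);
    return 0;
}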
A very general and theoretical answer would be no, it cannot. One of the reasons it could possibly fail, under very peculiar circumstances, is some weird fragmentation of your available/allocatable memory. I wonder whether you're trying to get a (probably very minor) performance boost (skipping the if pointer == NULL check, that kind of thing), or whether you're just curious and want to discuss it, in which case you should probably use chat.
Yes, memory allocation often fails when you run out of address space in a 32-bit application (which can be 2, 3 or 4 GB depending on the OS version and settings); that is typically due to a memory leak. It can also fail if your OS runs out of space in the swap file.
