Cache and scratchpad memories - caching

Could someone explain what is the difference between cache memory and scratchpad memory? I'm currently learning about computer architecture.

A scratchpad is just that a place to keep some stuff. Cache, is memory you talk through normally not talk at. Scratchpad is like a post it note, something you write something on and keep with you. Cache is paper you send off to someone else with instructions like a memo.
Cache can be in various places, layers (L1, L2, L3...). both scratchpad and cache are just sram in some chip, with an address and data bus and read/write/etc control signals. (as are many other things in a computer which may or may not be used for addressable ram). During boot, before the ram on the far side (slower ram side, processor being the near side) is initialized (eventually dram typically if you have a cache otherwise why have a cache) it may be possible to access the cache as addressable ram. It depends very much on the system/design though, there may be a control register that enables it to behave as a simple ram, or there may be a mode, or its normal mode may be such that so long as you dont address more than the size of the ram based on its alignment (perhaps a 32K ram between 32K boundaries) then it may not try to evict anything and generate bus cycles on the dram/slow/far side of the cache allowing you to use it as ram just like a scratchpad.
BUT, the normal use case for a cache is as an ideally invisible pathway to ram. You dont access the cache ram using cache addressing you use the address space of the ram beyond and the cache simply allows the processor to continue without waiting for the slow ram.
Talking about booting again, think about the kinds of things you need to do when booting, namely bringing up the dram controller, which is most definitely a non-trivial thing. Having some on chip memory allows you to if nothing else temporarily have some ram for a small stack and for some variables. You can for example us a compiler on a compiled language like C which needs at a minimum some ram for stack and variables. Depending on space you can put some program there too, likely running there much faster than from flash. The alternative to having no ram is likely having to write the dram init in assembly using only general purpose or other registers in the processor, taking a complicated task and making it that much more difficult. Once the main system ram is up, then you may or may not choose to not use the on chip (scratchpad) ram.
I would and do argue that if you want to test the dram to see if it is working then you need to not use that ram to test that ram, the test program should not run in nor use the ram under test. Having scratchpad ram on chip (or some other ram in the address space, perhaps video card ram for example) could be used for the dram test program. Unfortunately lots of folks will use the ram under test to hold the stack and program and variables and heap from the program doing the test, leaving important parts of the ram untested other than one or a small number of patterns.

Related

Is it possible to "gracefully" use virtual memory in a program whose regular use would consume all physical RAM?

I am intending to write a program to create huge relational networks out of unstructured data - the exact implementation is irrelevant but imagine a GPT-3-style large language model. Training such a model would require potentially 100+ gigabytes of available random access memory as links get reinforced between new and existing nodes in the graph. Only a small portion of the entire model would likely be loaded at any given time, but potentially any region of memory may be accessed randomly.
I do not have a machine with 512 Gb of physical RAM. However, I do have one with a 512 Gb NVMe SSD that I can dedicate for the purpose. I see two potential options for making this program work without specialized hardware:
I can write my own memory manager that would swap pages between "hot" resident memory and "cold" on the hard disk, probably using memory-mapped files or some similar construct. This would require me coding all memory accesses in the modeling program to use this custom memory manager, and coding the page cache and concurrent access handlers and all of the other low-level stuff that comes along with it, which would take days and very likely introduce bugs. Also performance would likely be poor. Or,
I can configure the operating system to use the entire SSD as a page file / SWAP filesystem, and then just have the program reserve as much virtual memory as it needs - the same as any other normal program, relying on the kernel's memory manager which is already doing the page mapping + swapping + caching for me.
The problem I foresee with #2 is making the operating system understand what I am trying to do in a "cooperative" way. Ideally I would like to hint to the OS that I would only like a specific fraction of resident memory and swap the rest, to keep overall system RAM usage below 90% or so. Otherwise the OS will allocate 99% of physical RAM and then start aggressively compacting and cutting down memory from other background programs, which ends up making the whole system unresponsive. Linux apparently just starts sacrificing entire processes if it gets too bad.
Does there exist a kernel command in any language or operating system that would let me tell the OS to chill out and proactively swap user memory to disk? I have looked through VMM functions in kernel32.dll and the Linux paging and swap daemon (kswapd) documentation, but nothing looks like what I need. Perhaps some way to reserve, say, 1Gb of pages and then "donate" them back to the kernel to make sure they get used for processes that aren't my own? Some way to configure memory pressure or limits or make kswapd work more aggressively for just my process?

Where do processor's instructions resides ? in CPU cache or?

My question is about CPU and its instructions. I know they have to be stored somewhere, so i would like to know where they are being stored ? I suppose in cache...
In cache, and memory (for all of the programs instructions which may not be needed anytime soon).
The exact cache depends on the CPU. And cache is really just a general term used to describe any "memory" on the CPU chip, as opposed to on the separate RAM chip.
A modern CPU often has multiple caches. That means that the cache used for storing instructions is called an "instruction cache".
Information on caches in general can be found at Wikipedia - http://en.wikipedia.org/wiki/CPU_cache.
Some latest CPU's have something like a loop-cache(for decoded-instructions). When your critical loop is 64 bytes or smaller, then it is the fastest. Then there are outer-caches like L1-instruction cache and L2-L3 caches. The outermost home of the instructions is RAM. Of course they all are loaded from ROM, HDD, modem, floppy disc, cd-rom,.... Those are written with programming languages using a keyboard which is used to pass instructions/opinions in your mind. ----> Brain is the source, cache and cpu-registers are the last stop.
Instructions are data, just as data is data. Everything in binary and everything is stored in RAM and everything is loaded from memory into the CPU (and on the way, is cached). The CPU knows when it is loading instructions, rather than data, because the CPU knows what it is doing (e.g. I'm running this instruction now, I need to run this one next, I need to load it... ah, I'm loading an instruction).

What is the maximum addressable space of virtual memory?

Saw this questions asked many times. But couldn't find a reasonable answer. What is actually the limit of virtual memory?
Is it the maximum addressable size of CPU? For example if CPU is 32 bit the maximum is 4G?
Also some texts relates it to hard disk area. But I couldn't find it is a good explanation. Some says its the CPU generated address.
All the address we see are virtual address? For example the memory locations we see when debugging a program using GDB.
The historical reason behind the CPU generating virtual address? Some texts interchangeably use virtual address and logical address. How does it differ?
Unfortunately, the answer is "it depends". You didn't mention an operating system, but you implied linux when you mentioned GDB. I will try to be completely general in my answer.
There are basically three different "address spaces".
The first is logical address space. This is the range of a pointer. Modern (386 or better) have memory management units that allow an operating system to make your actual (physical) memory appear at arbitrary addresses. For a typical desktop machine, this is done in 4KB chunks. When a program accesses memory at some address, the CPU will lookup where what physical address corresponds to that logical address, and cache that in a TLB (translation lookaside buffer). This allows three things: first it allows an operating system to give each process as much address space as it likes (up to the entire range of a pointer - or beyond if there are APIs to allow programs to map/unmap sections of their address space). Second it allows it to isolate different programs entirely, by switching to a different memory mapping, making it impossible for one program to corrupt the memory of another program. Third, it provides developers with a debugging aid - random corrupt pointers may point to some address that hasn't been mapped at all, leading to "segmentation fault" or "invalid page fault" or whatever, terminology varies by OS.
The second address space is physical memory. It is simply your RAM - you have a finite quantity of RAM. There may also be hardware that has memory mapped I/O - devices that LOOK like RAM, but it's really some hardware device like a PCI card, or perhaps memory on a video card, etc.
The third type of address is virtual address space. If you have less physical memory (RAM) than the programs need, the operating system can simulate having more RAM by giving the program the illusion of having a large amount of RAM by only having a portion of that actually being RAM, and the rest being in a "swap file". For example, say your machine has 2MB of RAM. Say a program allocated 4MB. What would happen is the operating system would reserve 4MB of address space. The operating system will try to keep the most recently/frequently accessed pieces of that 4MB in actual RAM. Any sections that are not frequently/recently accessed are copied to the "swap file". Now if the program touches a part of that 4MB that isn't actually in memory, the CPU will generate a "page fault". THe operating system will find some physical memory that hasn't been accessed recently and "page in" that page. It might have to write the content of that memory page out to the page file before it can page in the data being accessed. THis is why it is called a swap file - typically, when it reads something in from the swap file, it probably has to write something out first, effectively swapping something in memory with something on disk.
Typical MMU (memory management unit) hardware keeps track of what addresses are accessed (i.e. read), and modified (i.e. written). Typical paging implementations will often leave the data on disk when it is paged in. This allows it to "discard" a page if it hasn't been modified, avoiding writing out the page when swapping. Typical operating systems will periodically scan the page tables and keep some kind of data structure that allows it to intelligently and quickly choose what piece of physical memory has not been modified, and over time builds up information about what parts of memory change often and what parts don't.
Typical operating systems will often gently page out pages that don't change often (gently because they don't want to generate too much disk I/O which would interfere with your actual work). This allows it to instantly discard a page when a swapping operation needs memory.
Typical operating systems will try to use all the "unused" memory space to "cache" (keep a copy of) pieces of files that are accessed. Memory is thousands of times faster than disk, so if something gets read often, having it in RAM is drastically faster. Typically, a virtual memory implementation will be coupled with this "disk cache" as a source of memory that can be quickly reclaimed for a swapping operation.
Writing an effective virtual memory manager is extremely difficult. It needs to dynamically adapt to changing needs.
Typical virtual memory implementations feel awfully slow. When a machine starts to use far more memory that it has RAM, overall performance gets really, really bad.

caches vs paging

So I'm in a computer architecture class, and I guess I'm having a hard time differentiating between caching and pages.
The only explanation I can come up with is that pages are the OS's way of tricking a program that it's doing all it's work in a specified region of memory, vs a cache memory is the hardware's way of tricking the OS that it's reading from one specified region of memory, when it's really not.
Does the os direct the hardware that it needs a "new page" or is that taken care of by the os trying to read the address that is "out of range" of the current cache "page" (for lack of a better term).
Am I on the right track or am I completely crazy?
Caching and pages are orthogonal concepts.
A cache is a high-speed "memory" that acts to minimise the number of accesses to a large low-speed "memory". In the most general sense, the high-speed "memory" could be your hard disk acting to cache web pages fetched from the web (low-speed "memory"). Of course, in the context of computer architecture, the term "cache" is more likely to refer to physical RAM used to speed up access to slower RAM or disk.
Pages, OTOH, are simply a unit of management for the contents of RAM or disk.
These two concepts come together in implementing virtual memory systems. A process may allocate 500 MB of memory. This may be more that the physical RAM available to give to the process, so the operating system allocates blocks on disk called pages, which will hold the contents of certain logical pages in the process's address-space.
When the process accesses a location in its address-space, and the associated page isn't currently mapped into physical memory, the CPU signals a page fault, and the OS responds by fetching the page from disk while the process is in a suspended state. Once the page is mapped, the process resumes and is able to access that memory location as if it was there all along.
The common view that virtual memory is a way of tricking the process into thinking it has tons of RAM isn't the only way to think about this. You could also think of a process's address-space as being logically stored on disk pages, with the OS-assisted mapping into RAM being just a way to cache the contents of those pages such that the process isn't continually accessing the hard drive. In this sense, caching and paged virtual memory are logically the same thing. Just keep in mind that, while this viewpoint may help to understand the relationship between the two concepts, it isn't entirely accurate, since it is possible to run without virtual memory at all, just physical memory (in fact, most embedded systems run this way).
I think Paging is bring instruction data from disk or secondary memory to the main memory while caching is bring instruction data from main memory to CPU

2 basic computer questions

Question 1:
Where exactly does the internal register and internal cache exist? I understand that when a program is loaded into main memory it contains a text section, a stack, a heap and so on. However is the register located in a fixed area of main memory, or is it physically on the CPU and doesn't reside in main memory? Does this apply to the cache as well?
Questions 2:
How exactly does a device controller use direct memory access without using the CPU to schedule/move datum between the local buffer and main memory?
Basic answer:
The CPU registers are directly on the CPU. The L1, L2, and L3 caches are often on-chip; however, they may be shared between multiple cores or processors, so they're not always "physically on the CPU." However, they're never part of main memory either. The general principle is that the closer memory is to the CPU, the faster and more expensive (and thus smaller) it is. Every item in the cache has a particular main memory address associated with it (however, the same slot can be associated with different addresses at different times). However, there is no direct association between registers and main memory. That is why if you use the register keyword in C (not that it's often necessary, since the compiler is usually a better optimizer), you can not use the & operator.
The DMA controller executes the transfer directly. The CPU watches the bus so it knows when changes are made "behind its back", which invalidate its cache(s).
Even though the CPU is the central processing unit, it's not the sole "mover and shaker". Devices live on buses, along with CPUs, and also RAM. Modern buses allow devices to communicate with RAM without involving the CPU. Some devices are programmed simply by making changes to pieces of RAM which devices poll. Device drivers may poll pieces of RAM that a device is writing into, but usually a CPU receives an interrupt from the device telling it that there's something in a piece of RAM that's ready to read.
So, in answer to your question 2, the CPU isn't involved in memory transfers across the bus, except inasmuch as cache coherence messages about the invalidation of cache lines are involved. Bear in mind that the scenarios are tricky. The CPU might have modified byte 1 on a cache line when a device decides to modify byte 40. Getting that dirty cache line out of the CPU needs to happen before the device can modify data, but on x86 anyway, that activity is initiated by the bus, not the CPU.

Resources