Is the number of memory modules equivalent to the number of memory banks? - memory-management

If I'm attempting to determine the n-way interleave of a memory system, would this be represented by the number of physical memory modules?

Related

How caches are connected to cores?

I have a very fundamental question about how caches (e.g. L1, L2) are physically connected (in RTL) to cores (e.g. Arm Cortex-A53). How many read/write ports or buses are there, and how wide are they? Is it a 32-bit bus? How do I calculate the theoretical maximum bandwidth/throughput of the L1 cache connected to an Arm Cortex-A53 running at 1400 MHz?
Plenty of information is available on the web about how caches work, but I couldn't find how they are connected.
You can get the information in the ARM documentation (which is pretty complete compared to others):
L1 data cache:
(configurable) sizes of 8KB, 16KB, 32KB, or 64KB.
Data side cache line length of 64 bytes.
256-bit write interface to the L2 memory system.
128-bit read interface to the L2 memory system.
64-bit read path from the data L1 memory system to the datapath.
128-bit write path from the datapath to the L1 memory system.
Note that there is presumably a single datapath, since the documentation says so explicitly when there are several. There is therefore most likely one port, unless two ports share the same datapath, which would be surprising.
L2 cache:
All bus interfaces are 128-bits wide.
Configurable L2 cache size of 128KB, 256KB, 512KB, 1MB and 2MB.
Fixed line length of 64 bytes.
General information:
One to four cores, each with an L1 memory system and a single shared L2 cache.
In-order pipeline with symmetric dual-issue of most instructions.
Harvard Level 1 (L1) memory system with a Memory Management Unit (MMU).
Level 2 (L2) memory system providing cluster memory coherency, optionally including an L2 cache.
The Level 1 (L1) data cache controller, that generates the control signals for the associated embedded tag, data, and dirty RAMs, and arbitrates between the different sources requesting access to the memory resources. The data cache is 4-way set associative and uses a Physically Indexed, Physically Tagged (PIPT) scheme for lookup that enables unambiguous address management in the system.
The Store Buffer (STB) holds store operations when they have left the load/store pipeline and have been committed by the DPU. The STB can request access to the cache RAMs in the DCU, request the BIU to initiate linefills, or request the BIU to write out the data on the external write channel. External data writes are through the SCU.
The STB can merge several store transactions into a single transaction if they are to the same 128-bit aligned address.
An upper bound for the L1 bandwidth is frequency * interface_width * number_of_paths, so 1400 MHz * 64 bits * 1 = 10.43 GiB/s from the L1 (reads) and 1400 MHz * 128 bits * 1 = 20.86 GiB/s to the L1 (writes). In practice, concurrency can be a problem, but it is hard to know which part of the chip will be the limiting factor.
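The upper-bound estimate above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: the function name and the assumption of one full-width transfer per cycle on each path are illustrative, not something taken from the ARM documentation.

```python
def peak_bandwidth_bytes_per_s(freq_hz, width_bits, paths=1):
    """Upper bound, assuming one full-width transfer per cycle on every path."""
    return freq_hz * (width_bits // 8) * paths

GIB = 2**30
freq = 1_400_000_000  # 1400 MHz

read_bw = peak_bandwidth_bytes_per_s(freq, 64)    # 64-bit read path from L1
write_bw = peak_bandwidth_bytes_per_s(freq, 128)  # 128-bit write path to L1

print(f"L1 read : {read_bw / GIB:.2f} GiB/s")   # 10.43 GiB/s
print(f"L1 write: {write_bw / GIB:.2f} GiB/s")  # 20.86 GiB/s
```

Measured throughput will normally be lower, since the estimate ignores stalls, misses and contention between requesters.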
Note that many other documents are available, but this one is the most interesting. I am not sure you can get physical, RTL-level information about the caches: I expect it to be confidential and therefore not publicly available, since competitors could presumably benefit from it.

Can fixed partitioning suffer from external fragmentation?

Is it possible for fixed-size partitioning to suffer from external fragmentation?
My instructor said it's not possible and that fixed-size partitioning can only suffer from internal fragmentation. But consider this case: a fixed-size memory of 30 KB is divided into 3 partitions of 10 KB each, and a 10 KB process resides in the middle partition. Now a new process of 20 KB requires memory, but it can't be given any because, even though the required amount of memory is free, it is not contiguous. Isn't this external fragmentation?
No.
For fixed-size partitioning you can't allocate anything larger than a partition, so even if all partitions were empty the allocation would fail because it's larger than the size of a partition (20 KB > 10 KB).
For allocations that are possible (not larger than a partition), external fragmentation is impossible (mostly because it becomes internal fragmentation instead).
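The argument can be illustrated with a toy allocator. This is a minimal sketch, assuming a first-fit policy and one process per partition; the class and method names are hypothetical and not taken from any real operating system.

```python
KB = 1024

class FixedPartitions:
    """Toy fixed-size partition allocator: one process per partition."""

    def __init__(self, count, size):
        self.size = size
        self.used = [None] * count  # None = free, otherwise bytes used by the process

    def allocate(self, request):
        """Return the partition index, or None if the request cannot be placed."""
        if request > self.size:
            return None  # larger than a partition: fails even if every partition is free
        for i, slot in enumerate(self.used):
            if slot is None:
                self.used[i] = request
                return i
        return None  # all partitions occupied

    def internal_fragmentation(self):
        """Bytes wasted inside occupied partitions."""
        return sum(self.size - u for u in self.used if u is not None)

mem = FixedPartitions(count=3, size=10 * KB)
empty_result = mem.allocate(20 * KB)    # None, although all 30 KB are free
first = mem.allocate(10 * KB)           # fits exactly in partition 0
second = mem.allocate(4 * KB)           # partition 1, leaving 6 KB unused inside it
wasted = mem.internal_fragmentation()   # 6 KB of internal fragmentation
```

The 20 KB request is rejected on an empty memory, so its failure says nothing about fragmentation; the only waste that can occur is the unused tail of an occupied partition.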

When is it advantageous to define virtual memory smaller than physical memory?

As we know, virtual memory is generally larger than physical memory. But when is it advantageous to define virtual memory smaller than physical memory?
If you have pointer-heavy code, you can save memory by choosing a smaller address space. For example, a pointer on a 32-bit platform occupies 4 bytes versus 8 bytes on 64-bit. The same goes for integer types like size_t.
This only works and makes sense if:
Your code/application/server uses multiple processes and all processes together need more memory than the amount of virtual memory (otherwise you wouldn't need more physical than virtual memory).
Your platform supports more physical than virtual memory (for example, Intel PAE).
The smaller amount of virtual memory is enough for each single process.
Imagine a large server system supporting multiple users. You don't want users to hog memory, so you restrict the size of the logical (virtual) address space by limiting page table size.
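To make the pointer-size argument concrete, here is a rough back-of-the-envelope sketch. The node layout (two pointers plus one size_t-like field, no padding) and the node count are made-up illustrations; the 36-bit physical address width is the one originally provided by Intel PAE.

```python
def node_bytes(pointer_size, pointers=2, size_fields=1):
    # Hypothetical node: `pointers` pointers plus `size_fields` pointer-sized integer fields.
    return pointer_size * (pointers + size_fields)

nodes = 10_000_000                    # assumed number of nodes
bytes_32 = nodes * node_bytes(4)      # 120,000,000 bytes with 4-byte pointers
bytes_64 = nodes * node_bytes(8)      # 240,000,000 bytes with 8-byte pointers
saved = bytes_64 - bytes_32

# With PAE: 32-bit virtual addresses per process, 36-bit physical addresses.
virtual_per_process = 2**32           # 4 GiB
physical_pae = 2**36                  # 64 GiB
processes_to_fill = physical_pae // virtual_per_process  # 16

print(f"saved {saved / 1e6:.0f} MB; {processes_to_fill} full address spaces fit in physical memory")
```

Under these assumptions, a single process can never use all of the physical memory, which is why the scheme only pays off when several processes run side by side.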

Is contiguous memory easier to get in a 64-bit address space? If so why?

A comment in this blog states:
We know how to make chunked heaps, but there would be some overhead to
using them. We have more requests for faster storage management than
we do for larger heaps in the 32-bit JVM. If you really want large
heaps, switch to the 64-bit JVM. We still need contiguous memory,
but it's much easier to get in a 64-bit address space.
The implication of the above statement is that it is easier to get contiguous memory in a 64-bit address space. Is this true? If so, why?
That's very true. A process must allocate memory from the virtual memory address space, which stores both code and data and whose size is restricted by the addressing capability of the architecture. You can never address more than 2^32 bytes in a 32-bit process, not counting bank-switching tricks. That's 4 gigabytes. The operating system typically takes a big chunk out of that as well; on 32-bit Windows, for example, that cuts the addressable VM size down to 2 gigabytes.
Ideally, allocations are made so that they fit snugly together. That very rarely works out in practice. Shared libraries or DLLs in particular need to pick a preferred load address and that has to be guessed up front when the library is built.
So in practice, allocations are made from the holes in between existing ones, and the largest possible contiguous allocation you can get is restricted by the size of the largest hole. That is usually much smaller than the addressable VM size; on Windows it is typically around 650 megabytes. It tends to go downhill from there as the available address space gets fragmented by allocations, particularly by native code that can't afford to have allocations moved by a compacting garbage collector. If you use Windows, you can get insight into the VM allocations with the SysInternals VMMap utility.
This problem completely disappears in a 64-bit process. The theoretical addressable virtual memory size is 2^64, an enormous number, so large that current processors don't implement it; they can go up to 2^48. It is further restricted by the operating system version you have and its willingness to keep page mapping tables for that much VM. Eight terabytes is a typical limit. By implication, the holes between allocations are huge. Your program will keel over from paging file thrashing before it dies from OOM.
I can't speak for how the JVM is implemented obviously, but from a purely theoretical viewpoint, if you have a significantly larger virtual address space (eg 64-bit as compared with 32-bit) it should be significantly easier to find a large block of contiguous memory which is available for allocation (going to extremes - you've got no chance of finding a contiguous 4GB of free memory in a 32-bit address space, but a significant chance of finding this space in a full 64-bit address space).
It should be noted that, whatever the virtual address space size, it is still going to be implemented by allocating (probably) non-contiguous physical memory pages, particularly if the requested allocation is large. The larger virtual address space just means there are likely to be a lot more contiguous virtual addresses available for use.
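The "largest hole" argument from the answers above can be sketched as follows. The layout is entirely hypothetical (made-up start addresses and sizes), and the 2 GB and 8 TB limits are the figures quoted in the first answer; the point is only that the same set of allocations leaves a far larger contiguous gap in the bigger address space.

```python
def largest_hole(space_size, allocations):
    """Largest contiguous free range in [0, space_size).

    allocations: iterable of (start, length) ranges inside the address space.
    """
    best = 0
    cursor = 0  # end of the highest allocation seen so far
    for start, length in sorted(allocations):
        best = max(best, start - cursor)
        cursor = max(cursor, start + length)
    return max(best, space_size - cursor)

MB = 2**20
GB = 2**30
TB = 2**40

# Hypothetical layout: executable, a heap, and two libraries at scattered addresses.
allocs = [
    (0x0040_0000, 64 * MB),
    (0x1000_0000, 256 * MB),
    (0x4000_0000, 128 * MB),
    (0x7000_0000, 32 * MB),
]

hole_32 = largest_hole(2 * GB, allocs)   # 2 GB of user address space
hole_64 = largest_hole(8 * TB, allocs)   # 8 TB of user address space

print(f"32-bit: {hole_32 // MB} MB, 64-bit: {hole_64 // GB} GB")
```

With this layout, only 480 MB of the 2 GB space is in use, yet the largest contiguous request that can succeed is 640 MB. In the 8 TB space, the region above the last allocation is almost the whole address space.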

Are there any memory restrictions on Linux Kernel Modules?

Are there any restrictions on memory usage by a Linux kernel module, e.g. on code segment size, amount of global memory, or anything else?
In 2.6.35, load_module() bails out if the length of the module to load exceeds 64 MB: http://lxr.linux.no/#linux+v2.6.35/kernel/module.c#L2118
vmalloc() is used to allocate space for the module. This fails if you try to allocate more pages than are available in your physical memory (which in turn will probably only be an issue on embedded systems with little RAM).
Furthermore, kzalloc() (and in turn, kmalloc()) are used. Depending on the allocator used (SLAB, SLOB, SLUB), there may be restrictions as well. SLAB defines KMALLOC_MAX_SIZE, which sets the maximum number of bytes you can allocate with a single call to kmalloc().
