Does CUDA mapped memory take up GPU RAM? - memory-management

For example, if I have a GPU with 2 GB of RAM and in my app I allocate a large array, say 1 GB, as mapped memory (page-locked host memory that is mapped into the GPU address space, allocated with cudaHostAlloc()), will the amount of available GPU memory be reduced by that 1 GB of mapped memory, or will I still have (close to) 2 GB as I had before the allocation?

Mapping host memory so that it appears in the GPU address space does not consume GPU on-board memory.
You can verify this in a number of ways, for example by calling cudaMemGetInfo before and after the allocation.
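For illustration, a minimal sketch of that check (error checking omitted; the 1 GB size and the cudaDeviceMapHost flag are just example choices): the free device memory reported before and after the mapped allocation should be essentially the same, because the 1 GB lives in host RAM.

    #include <stdio.h>
    #include <cuda_runtime.h>

    int main(void)
    {
        size_t free_before, free_after, total;

        /* Enable mapped (zero-copy) host allocations before any other CUDA work. */
        cudaSetDeviceFlags(cudaDeviceMapHost);

        cudaMemGetInfo(&free_before, &total);

        void *host_ptr = NULL;
        void *dev_ptr  = NULL;
        size_t bytes = (size_t)1 << 30;   /* 1 GB of page-locked host memory */

        /* Page-locked host memory, mapped into the device address space. */
        cudaHostAlloc(&host_ptr, bytes, cudaHostAllocMapped);
        /* dev_ptr is what a kernel would use to read/write this memory. */
        cudaHostGetDevicePointer(&dev_ptr, host_ptr, 0);

        cudaMemGetInfo(&free_after, &total);

        /* free_before and free_after should be (almost) identical. */
        printf("free before: %zu MB, free after: %zu MB\n",
               free_before >> 20, free_after >> 20);

        cudaFreeHost(host_ptr);
        return 0;
    }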

Related

Allocation of more than 8 KB of memory for a device driver in Linux kernel space

I wrote code for a device driver in kernel space, and I would like to allocate more than 8 KB using __get_free_pages().
Can you please suggest how to allocate a large amount of low memory in kernel space? (Note: the memory must be physically contiguous.)
Initially I was able to allocate 4 KB; by increasing the order argument I got 8 KB, but I would like to allocate more than that.
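For reference, a minimal kernel-module sketch of how the order argument of __get_free_pages() controls the size of the contiguous block (assuming 4 KB pages: order 1 is 8 KB, order 3 is 32 KB, order 5 is 128 KB). The upper bound is set by MAX_ORDER; for anything larger than a few megabytes of physically contiguous memory you would normally look at CMA or a boot-time reservation instead.

    #include <linux/module.h>
    #include <linux/mm.h>
    #include <linux/gfp.h>

    static unsigned long buf;

    static int __init big_alloc_init(void)
    {
        /* __get_free_pages() takes a gfp mask and an *order*, not a byte size:
           it returns 2^order physically contiguous pages. */
        buf = __get_free_pages(GFP_KERNEL, 5);   /* order 5 -> 128 KB with 4 KB pages */
        if (!buf)
            return -ENOMEM;

        pr_info("allocated %lu bytes of contiguous low memory\n", PAGE_SIZE << 5);
        return 0;
    }

    static void __exit big_alloc_exit(void)
    {
        free_pages(buf, 5);   /* must free with the same order */
    }

    module_init(big_alloc_init);
    module_exit(big_alloc_exit);
    MODULE_LICENSE("GPL");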

Can a GPU use swap space when its RAM is full?

I'm doing some GPU computation using OpenCL and need to create a buffer of about 5 GB. My laptop has an integrated GPU with 1.5 GB of RAM. I ran the code and it gave the wrong result, so I suspect the GPU's RAM is full. My question is whether there is some "swap space" (or virtual memory) that the GPU can use when its RAM is full. I know the CPU has this mechanism, but I'm not sure about the GPU.
No, it cannot (at least on most GPUs), because the GPU generally uses its own memory (the RAM on your graphics card).
Also, your OpenCL kernels cannot call malloc inside the kernel; buffers are allocated on the host with clCreateBuffer.
That would depend on the GPU and whether it had an MMU and DMA access to the host memory.
A GPU with an MMU can virtualize GPU and host memory so that they appear as a single address space, with the physical host memory accesses handled by DMA transfers. I would imagine that if your GPU had that capability it would already be used; in which case your problem is most probably elsewhere.
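For reference, a hedged OpenCL sketch of how to check the device limits up front and inspect the clCreateBuffer error code, rather than running with a buffer that never fit. Note that some implementations defer the actual allocation, so a successful clCreateBuffer is not a hard guarantee that the memory is available.

    #include <stdio.h>
    #include <CL/cl.h>

    int main(void)
    {
        cl_platform_id platform;
        cl_device_id device;
        clGetPlatformIDs(1, &platform, NULL);
        clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL);

        /* How much global memory the device has, and how large a single
           buffer may be. */
        cl_ulong global_mem = 0, max_alloc = 0;
        clGetDeviceInfo(device, CL_DEVICE_GLOBAL_MEM_SIZE,
                        sizeof(global_mem), &global_mem, NULL);
        clGetDeviceInfo(device, CL_DEVICE_MAX_MEM_ALLOC_SIZE,
                        sizeof(max_alloc), &max_alloc, NULL);
        printf("global mem: %llu MB, max single alloc: %llu MB\n",
               (unsigned long long)(global_mem >> 20),
               (unsigned long long)(max_alloc >> 20));

        cl_context ctx = clCreateContext(NULL, 1, &device, NULL, NULL, NULL);

        cl_int err;
        size_t request = (size_t)5 << 30;   /* the 5 GB from the question (64-bit host assumed) */
        cl_mem buf = clCreateBuffer(ctx, CL_MEM_READ_WRITE, request, NULL, &err);
        if (err != CL_SUCCESS)
            /* e.g. CL_INVALID_BUFFER_SIZE or CL_MEM_OBJECT_ALLOCATION_FAILURE */
            printf("5 GB buffer refused: error %d\n", err);
        else
            clReleaseMemObject(buf);

        clReleaseContext(ctx);
        return 0;
    }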

How is virtual address space greater than physical address space?

How can the virtual address space be greater than the physical address space?
Suppose virtual address 0x7000 maps to physical address 0x8000. Can another virtual address, say 0x7500, map to the same physical location 0x8000? If not, how can there be more virtual addresses than limited physical memory, since every mapping has to resolve to a physical address?
Please help me understand this concept.
http://en.wikipedia.org/wiki/Virtual_memory.
Virtual memory uses both physical RAM and hard-disk space to represent more memory than may physically exist, and it provides an interface whereby each program can request memory without having to be concerned with the other programs on the machine and which memory addresses they may request.
The whole virtual address space does not have to be mapped to physical memory at the same time. That's what makes it "virtual". The contents of that virtual memory which is allocated but not currently mapped to physical memory reside on some form of external storage, typically disk.
It is the memory management system's job to move virtual memory pages into and out of physical memory as needed, and the requirement to do so is why virtual-memory computers can slow down overall when enough memory is allocated that it no longer all fits in physical memory at the same time.
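A small demonstration of the idea, assuming a 64-bit Linux machine with default overcommit settings: you can reserve far more virtual address space than you have RAM, and physical pages are only attached when individual pages are first touched.

    #include <stdio.h>
    #include <sys/mman.h>

    int main(void)
    {
        /* Reserve 64 GB of *virtual* address space. No physical memory is
           committed yet: pages are only backed by RAM (or swap) when they
           are first written, one page at a time. */
        size_t len = (size_t)64 << 30;
        char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) {
            perror("mmap");
            return 1;
        }

        p[0] = 1;         /* faults in one page */
        p[len - 1] = 1;   /* faults in another page, ~64 GB away in the virtual space */

        printf("two distant virtual pages touched; only ~8 KB of RAM actually used\n");
        munmap(p, len);
        return 0;
    }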

ZONE_NORMAL and ZONE_HIGHMEM on 32- and 64-bit kernels

I'm trying to make Linux memory management a little clearer for tuning and performance purposes.
While reading the very interesting Redbook "Linux Performance and Tuning Guidelines" found on the IBM website, I came across something I don't fully understand.
On 32-bit architectures such as the IA-32, the Linux kernel can directly address only the first gigabyte of physical memory (896 MB when considering the reserved range). Memory above the so-called ZONE_NORMAL must be mapped into the lower 1 GB. This mapping is completely transparent to applications, but allocating a memory page in ZONE_HIGHMEM causes a small performance degradation.
Why does the memory above 896 MB have to be mapped into the lower 1 GB?
Why is there a performance impact when allocating a memory page in ZONE_HIGHMEM?
What is ZONE_HIGHMEM used for, then?
Why can a kernel that is able to recognize up to 4 GB (CONFIG_HIGHMEM=y) use only the first gigabyte?
Thanks in advance
When a user process traps in to the kernel, the page tables are not changed. This means that one linear address space must be able to cover both the memory addresses available to the user process, and the memory addresses available to the kernel.
On IA-32, which allows a 4GB linear address space, usually the first 3GB of the linear address space are allocated to the user process, and the last 1GB of the linear address space is allocated to the kernel.
The kernel must use its 1GB range of addresses to be able to address any part of physical memory it needs to. Memory above 896MB is not "mapped into the low 1GB" - what happens is that physical memory below 896MB is assigned a permanent linear address in the kernel's part of the linear address space, whereas memory above that limit must be assigned a temporary mapping in the remaining part of the linear address space.
There is no impact on performance when mapping a ZONE_HIGHMEM page into a userspace process - for a userspace process, all physical memory pages are equal. The impact on performance exists when the kernel needs to access a non-user page in ZONE_HIGHMEM - to do so, it must map it into the linear address space if it is not already mapped.
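For illustration, a rough kernel-space sketch of that temporary mapping, using the classic kmap()/kunmap() interface (newer kernels prefer kmap_local_page()). For a page that happens to live in low memory, kmap() simply returns its permanent kernel address and costs essentially nothing; for a ZONE_HIGHMEM page, setting up and tearing down the mapping is the extra work.

    #include <linux/mm.h>
    #include <linux/gfp.h>
    #include <linux/highmem.h>
    #include <linux/string.h>

    /* The kernel cannot reach a ZONE_HIGHMEM page through its permanent
       low-memory mapping, so it must create (and later remove) a temporary
       mapping before touching the page's contents. */
    static void touch_possibly_highmem_page(void)
    {
        struct page *page = alloc_page(GFP_HIGHUSER);  /* may come from ZONE_HIGHMEM */
        void *vaddr;

        if (!page)
            return;

        vaddr = kmap(page);          /* temporary kernel mapping: the extra cost */
        memset(vaddr, 0, PAGE_SIZE);
        kunmap(page);                /* release the mapping slot */

        __free_page(page);
    }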

Kernel memory address space

I've read that, on a 32-bit system with 4 GB of system memory, 2 GB is allocated to user mode and 2 GB to kernel mode. But if I had a system with 512 MB of memory, would it be partitioned as 256 MB to user and 256 MB to kernel address space?
You are confusing physical and virtual memory. The 2 GB given to user mode and the 2 GB given to the kernel are virtual memory. It is more accurate to say that they are not really allocated at all; they constitute an address space. Initially this space is not bound to physical memory. When an application actually needs memory (the first time is at startup), physical memory is allocated and some addresses from the address space are mapped to it. When memory has been allocated but not used for long enough, or the PC is running out of physical memory, data can be written out to the swap file and stay there until requested. This mapping is transparent to the application, which has no idea where its data currently is: in RAM or on the HDD. So the address space is always split the same way, regardless of how much physical memory is installed.
This is not about memory (physical or virtual), but about address space.
You can plug 16GB of physical memory into your computer and make a 100GB swapfile, but 32-bit (non-enterprise) Windows will still only see 4GB (and subtract 0.75 GB for GPU memory and such). Via PAE, it could use more, but non-enterprise versions won't do that.
On top of the actual amount of memory, there is address space, which is limited to 4GB as well. Basically it is no more and no less than the collection of "numbers" (which, in this case, are addresses) that can be represented by a 32 bit number.
Since the kernel will need memory too, there is some arbitrary line drawn, which happens to be at the 2GB boundary for 32bit Windows, but can be configured differently, too.
It has nothing to do with the amount of memory in your computer (virtual or physical), but it is a limiting factor on how much memory you can use within a single program instance. It is not, however, a limiting factor on the memory that several programs could use.
As far as I can tell, what you are referring to are limits on how much memory can be allocated. That is quite different from how much memory the OS actually allocates at runtime.
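A small Windows C sketch of the distinction: the user-mode address range reported by GetSystemInfo is fixed by the OS configuration (for example, roughly 2 GB on default 32-bit Windows), independent of how much physical RAM GlobalMemoryStatusEx reports.

    #include <stdio.h>
    #include <windows.h>

    int main(void)
    {
        SYSTEM_INFO si;
        GetSystemInfo(&si);

        /* The user-mode address range is a property of the OS/build,
           not of how much physical RAM is installed. */
        printf("user address space: %p .. %p\n",
               si.lpMinimumApplicationAddress,
               si.lpMaximumApplicationAddress);

        MEMORYSTATUSEX ms = { sizeof(ms) };   /* dwLength must be set */
        GlobalMemoryStatusEx(&ms);
        printf("physical RAM installed: %llu MB\n",
               (unsigned long long)(ms.ullTotalPhys >> 20));
        return 0;
    }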

Resources