dma_alloc_coherent fails when buffer > 2M on kernel 3.2 - linux-kernel

I have this x86 device and a kernel module that tries to allocate DMA memory. It has a parameter called dmasize that allows to control the size of allocated memory.
I've noticed that allocation succeeds when dmasize=2M but not if larger. Even at boot time.
I heard there was a limitation by CONSISTENT_DMA_SIZE, but looking at lxr, I can't find it for arch x86 kernel 3.2.
Not sure if it is relevant, but this is a 32 bit machine with 8GB of RAM and a pae enabled kernel.
This is the call to dma_alloc_coherent:
dma_addr_t dma_handle;
if (!(_dma_vbase = dma_alloc_coherent(0, alloc_size, &dma_handle, GFP_KERNEL)) || !dma_handle) {
gprintk("_alloc_mpool: Kernel failed to allocate the memory pool of size 0x%lx\n", (unsigned long)alloc_size);
return;
}
Appreciate anyone who can help with this.

Just in case anyone comes across this, the answer is as follows:
The config flag CONFIG_FORCE_MAX_ZONEORDER which defaults at 11 at most architecture is the cause for this limitation.
increasing it to 12 (and recompiling the kernel) fixes the problem.
I suspect using CMA will also be possible but since my kernel doesn't support it, I cannot say for sure.

Related

check if the mapped memory supports write combining

I write a kernel driver which exposes to the user space my I/O device.
Using mmap the application gets virtual address to write into the device.
Since i want the application write uses a big PCIe transaction, the driver maps this memory to be write combining.
According to the memory type (write-combining or non-cached) the application applies an optimal method to work with the device.
But, some architectures do not support write-combining or may support but just for part of the memory space.
Hence, it is important that the kernel driver tell to application if it succeeded to map the memory to be write-combining or not.
I need a generic way to check in the kernel driver if the memory it mapped (or going to map) is write-combining or not.
How can i do it?
here is part of my code:
vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot);
io_remap_pfn_range(vma, vma->vm_start, pfn, PAGE_SIZE, vma->vm_page_prot);
First, you can find out if an architecture supports write-combining at compile time, with the macro ARCH_HAS_IOREMAP_WC. See, for instance, here.
At run-time, you can check the return values from ioremap_wc, or set_memory_wc and friends for success or not.

ensure the DMA -capable memory

I was reading section 'Part Id' of the following document I'm not sure how relevant this document to kernel 2.6.35 for instance; specifically it says:
..the DMA address of the memory must be within the dma_mask of the device..
and they recommend to pass certain flags, such as GFP_DMA, to kmalloc, so that it ensures the memory will fall within DMA mask provided.
However if the memory is allocated from cache pool created by kmem_cache_create, and with kmem_cache_alloc(.. GFP_ATOMIC), this doesn't meet requirements outlined in DMA-API.txt ?
On the other hand, LDD talks about __GFP_DMA flag with regard to legacy ISA devices, therefore I'm not sure this is applicable to PCI/PCIe devices.
This is x86 64-bit platform if it matters:
pci_set_dma_mask(dev, 0xffffffffffffffffULL);
pci_set_consistent_dma_mask(dev, 0xffffffffffffffffULL);
I would appreciate to hear some explanations on it.
For GFP_* for DMA
On x86:
ISA - when using kmalloc() need to bitwise-or GFP_DMA with GFP_KERNEL (or _ATOMIC) because of the following:
GFP_DMA guarantees:
(1) physical addresses are consecutive when get_free_page returns more than one page and
(2) only addresses lower than MAX_DMA_ADDRESS are returned. MAX_DMA_ADDRESS is 16MB on the PC because of ISA constraings
PCI - don't need to use GFP_DMA because there is no MAX_DMA_ADDRESS limit
The dma_mask is checked by the device when calling dma_map_* or dma_alloc_coherent.
dma_alloc_coherent ensures the memory allocated is able to be used by dma_map_* which gives other benifits too. (the implementation may choose to ignore flags that affect the location of the returned memory, like GFP_DMA)
You can refer to http://coweb.cc.gatech.edu/sysHackfest/uploads/58/DMA_howto.1.txt

Windows kernel memory protection

In Windows the high memory of every process (0x80000000 or 0xc0000000)
Is reserved for kernel code, user code cannot access these regions of memory, if it tries so an access violation exception will be thrown.
I wish to know how is the kernel space protected ?
Is it via memory segmentations or via paging ?
I would like to hear a technical explanation.
Thanks a lot,
Michael.
Assuming you are talking about x86 and x64 architectures.
Memory protection is achieved using the paging system. Each page table entry on an x86/x64 CPU has a bit to indicate whether it is a user or supervisor page. Accesses to supervisor pages are only permitted for code running with CPL<3, whereas accesses to non supervisor pages are possible regardless of CPL.
CPL is the "Current Privilege Level" which is sometimes referred to as Ring. Windows only uses two rings, although the CPU implements 4. Ring 0 is the CPU mode in which what Windows refers to as "kernel mode" runs. Ring 3 is the CPU mode in which "User mode" runs. Since code running at CPL=3 cannot access supervisor pages, this is how memory protection is implemented.
The answer for ARM is likely to be similar, but different.
That's an easy one and doesn't require talking about rings and kernel behavior. Accessing virtual memory at a particular address requires that address to be mapped, the operating system has to allocate a memory page for that address. The low-level winapi function that does that is VirtualAlloc(). Which takes an optional address, first argument. The OS will simply fail a request for an unmappable address. Otherwise the exact same mechanism that prevents you from mapping any address in the lowest 64KB of the address space.

How can I shrink the OS region in RAM through U-boot?

From my understanding, after a PC/embedded system booted up, the OS will occupy the entire RAM region, the RAM will look like this:
Which means, while I'm running a program I write, all the variables, dynamic memory allocated in the stacks, heaps and etc, will remain inside the region. If I run firefox, paint, gedit, etc, they will also be running in this region. (Is this understanding correct?)
However, I would like to shrink the OS region. Below is an illustration of how I want to divide the RAM:
The reason that I want to do this is because, I want to store some data receive externally through the driver into the Custom Region at fixed physical location, then I will be able to access it directly from the user space without using copy_to_user().
I think it is possible to do that by configuring u-boot, but I have no experience in u-boot, can anyone give me some directions where to begin with, such as: do I need to modify the source of u-boot, or changing the environment variables of u-boot will be sufficient?
Or is there any alternative method of doing this?
Any help is much appreciated. Thanks!
p/s: I'm using TI ARM processor, and booting up from an SD card, I'm not sure if it matters.
The platform is ARM. min_addr and max_addr will not work on these platform since these are for Intel-only implementations.
For the ARM platform try to look at "mem=size#start" kernel parameter. Read up on Documentation/kernel-parameters.txt and arch/arm/kernel/setup.c. This option is available on most new Linux code base (ie. 2.6.XX).
You need to set the following parameters:
max_addr=some_max_physical
min_addr=some_min_physical
to be passed to the kernel through uboot in the 'bootargs' u-boot environment variable.
I found myself trying to do the opposite recently - in other words get Linux to use the additional memory in my system - although I'm using Barebox rather than u-boot on a OMAP4 platform.
I found (a bit to my surprise) that once the Barebox MLO first stage boot-loader was aware of the extra RAM, the kernel then detected and used it as well without any bootargs. Since the memory size is not passed anywhere on the boot-line, I can only assume the kernel inspects the memory mappings set up by the boot-loader to determine RAM size. This suggests that modifying your u-boot to not map all of the RAM is the way to go.
On the subject of boot-args, there was a time when you it was recommended that you mapped out a chunk of RAM (used by the frame buffer?) on OMAP4 systems, using the boot-line. It's still unclear whether this is still necessary.

Somewhat newb question about assy and the heap

Ultimately I am just trying to figure out how to dynamically allocate heap memory from within assembly.
If I call Linux sbrk() from assembly code, can I use the address returned as I would use an address of a statically (ie in the .data section of my program listing) declared chunk of memory?
I know Linux uses the hardware MMU if present, so I am not sure if what sbrk returns is a 'raw' pointer to real RAM, or is it a cooked pointer to RAM that may be modified by Linux's VM system?
I read this: How are sbrk/brk implemented in Linux?. I suspect I can not use the return value from sbrk() without worry: the MMU fault on access-non-allocated-address must cause the VM to alter the real location in RAM being addressed. Thus assy, not linked against libc or what-have-you, would not know the address has changed.
Does this make sense, or am I out to lunch?
Unix user processes live in virtual memory, no matter if written in assembler of Fortran, and should not care about physical addresses. That's kernel's business - kernel sets up and manages the MMU. You don't have to worry about it. Page faults are handled automatically and transparently.
sbrk(2) returns a virtual address specific to the process, if that's what you were asking.

Resources