How to compile kernel with bigge page size - linux-kernel

I tried to mount a parition with block size larger than 4kb.
the operation failed with the following error:
Function not implemented
(I tried enabling Huge Pages, got the same error).
After some research I found that this is probably due to the blocksize being larger than the os page size.
As I understand the page size is determined at compile time,
so I'd like to try compile a kernel with different page size.

You can use these kernel configurations from make menuconfig and change the page size.
CONFIG_<ARCH>_4K_PAGES
CONFIG_<ARCH>_16K_PAGES
CONFIG_<ARCH>_64K_PAGES

Related

Can gcc be configured to compile position-independent code for the code but position-dependent code for the data?

I'm trying to build bootable code for an ARM M7-based embedded system that is able to execute in place at two different locations in the QSPI, so that if one version gets corrupted, the backup version of the image can be executed in a different place.
Compiling with -fpic seems to produce a relocatable code image that is (nearly) able to execute in both places fine. However, the problem is that the data/bss the code refers to is also getting offset by the same amount - that is, the compiler is assuming that the .data and .bss segments live immediately after the .text segment, which isn't true for XIP embedded systems (where the RAM is separate).
As a result, if the original binary was linked to run at 0x60000000 (and using a fixed ram area at 0x20000000) but is then executed in place at 0x60100000 instead , the ram addresses will be shifted by 0x100000 as well (i.e. to 0x20100000), which isn't what I want at all.
Clearly, what I'd like to do is to modify gcc's behaviour so that references to the code (executing in place in two different places in the QSPI) are position-independent, while references to the .data/bss segments (in a fixed position in RAM) are position-dependent (as per normal).
Is this something that gcc can be tweaked to achieve (e.g. by some obscure linker attribute flag)? Or is this just out of its reach? Thanks!

umfpack: An error occurred: numeric factorization: not enough memory

I'm having a problem while running a Scilab code.
As title suggests, I get the error numeric factorization: not enough memory, related to the umfpack function.
In task manager I see a memory usage around 3GB (my system has 16GB).
Can anyone help me with this issue?
My guess is that you are attempting to use umfpack with matrices which are too big, and then it is unable to allocate the memory required (probably more than the 13GB you still have available).
See also https://scicomp.stackexchange.com/a/8972/21926

What is the use of flag PF_MEMALLOC

When I am browsing some code in one device driver in linux, I found the flag PF_MEMALLOC is being set in the thread (process). I found the definition of this flag in header file, which saying that "Allocating Memory"
#define PF_MEMALLOC 0x00000800 /* Allocating memory */
So, my doubt here is, what exactly the use of this flag when set it in a process/thread like code current->flags |= PF_MEMALLOC;
This flag is used within the kernel to indicate a thread that is current executing with a memory-allocation path, and therefore is allowed to recursively allocate any memory it requires ignoring watermarks and without being forced to write out dirty pages.
This is to ensure that if the code that is attempting to free pages in order to satisfy an original allocation request itself has to allocate a small amount of memory to proceed, that code won't then recursively try to free pages.
Most drivers should not require this flag.

Size constraints of initramfs on ARM?

I'm creating a bootable Linux system on a PicoZed board (ARM CortexA9 core), and I've run into a "limitation", which I don't think really is a limitation (I get the feeling it's another problem masquerading as a limitation).
I boot by starting the system in JTAG boot mode; after powering on the board I use the xmd debugger to place u-boot into the system's RAM and then I run it.
Next I place the kernel (uImage), the gzip'd initramfs image and the device tree into memory. Finally I tell u-boot to boot the system using bootm, and using three arguments to point out the memory locations of all the images.
All of this works, and I manage to boot up Linux + userland. However, I need to grow the initramfs, and this is where I run into problems.
The working image is 16MiB exactly. I tried to make it 24MiB (completely regenerated from scratch each time I try to boot), but just after the kernel has loaded and it tries to find init the kernel reports file system faults and fails. There shouldn't be any overlaps, but just in case I tried moving things around a little, but the same problem occurred.
After searching for some tips, I saw someone on a forum say that the image needs to be placed at a 16MiB alignment (which I don't think is true, but I tried it none the less, but it didn't get a working system). Another post claimed that the images must be aligned with their sizes (which I again don't think is true, but I tried that as well, but with no change). Yet another post claimed that this happens if the initramfs image crosses the __init end boundary, and that placing the initramfs image firmly inside will allow the memory to be reclaimed after the image has been uncompressed by the kernel, and placing it beyond the __init section will work but then that memory is forever lost after boot. I know way too little about Linux in order to know if any of this is in any way true/accurate, and I have no idea where "__init"'s - if such a thing exists - end-boundary is, but if the issue is that I was crossing it I tried moving the initramfs image way beyond anywhere were I was previously using it, but that didn't change anything.
I also tried the original location (which works with a 16MiB image) and create a 16MiB + 1K sized image, but this didn't work either. (Checking that it didn't overlay any of the over images, obviously).
This originally led me to think there's a 16MiB initramfs size limit lurking somewhere. But on searching for it, it got me thinking that doesn't make sense -- as far as I can gather the bootm command in u-boot shoud set up the tag list for the system (which includes the location and size of the initramfs), and I haven't come across any note about a 16MiB limit in relation to the tagged lists for the initramfs so far.
I have found a page which claims that the initramfs size is in practical terms limited to roughly half the size of physical ram in the system, and the PicoZed board has 1G RAM, so we're in orders of magnitude away from what "should" be a limitation.
To clarify: The 16MiB image is the size of the raw image; compressed it's just under 6MiB. However, if I populate it so it's not compressed as tightly it doesn't make any difference -- the problem doesn't appear to be related to the size of the compressed image, only the uncompressed image.
Main question:
Where is this apparent 16MiB initramfs size limit coming from?
Side-question:
Is there such a thing as an "kernel __init section" which is reclaimed by the kernel after it has uncompressed/loaded images? If so, how do I see/configure the location/size of it?
Size constraints of initramfs on ARM?
I have not encountered a 16 MiB size constraint as you allege.
At one time I thought there was a size limit too, but that turned out to be a memory footprint issue during boot. Once that was sorted out I've been using large initramfs of 30MiB (e.g. with glibc, gtstreamer and qt5 libraries).
Where is this apparent 16MiB initramfs size limit coming from?
There isn't one. A ramfs is only constrained by available RAM.
There is a definition for the "Default RAM disk size", but this would not affect the size of an initramfs.
Your method of booting with the U-Boot bootm command is suspect, i.e. you're passing the memory address of the initramfs as the second argument.
The U-Boot documentation clearly describes the second argument as "the address of an initrd image" (emphasis added).
There is no mention of initramfs as an argument.
Linux kernel documentation states that the initramfs archive can be "linked into the linux kernel image". There's a kernel menuconfig entry for specifying the path to the initramfs cpio file. The make script will append this cpio file to the kernel image so that there is a single image file for booting.
Or (like an initrd) "a separate file" can be passed to the kernel at boot to populate the initramfs.
U-Boot passes the location (and length) of this archive either as an ATAG entry or as a reserved memory region in the Device Tree blob.
The kernel expects either a cpio archive for the initramfs or a tar archive for an initrd.
You neglect to mention (besides its compression) what kind of archive this "initramfs" or "separate file" actually is .
So it's not clear if you're booting the kernel with an initrd (tar archive) or an initramfs (cpio archive).
Your repeated reference to the initramfs as an "image" file rather than a cpio archive suggests that you really are using an initrd.
An initrd would definitely have a size constraint.
Is there such a thing as an "kernel __init section" which is reclaimed by the kernel after it has uncompressed/loaded images?
Yes, there is an __init section of memory, which is released after kernel initialization is complete.
If so, how do I see/configure the location/size of it?
Usually routines and data that have no use after initialization can be declared with the __init macro. See this.
The location and size of this memory section would be under the control of the linker script, rather than explicit user control. The kernel System.map file should have the info for review.
I think you need to pass ramdisk_size in uboot bootargs command
ramdisk_size needs to be set if the ram-disk uncompress file
size is bigger than default setting. It should be more than
ramdisk uncompress file size.

Cuda kernel launch failure

I am trying to call two kernels as shown below
for (t=0; t<=time_total; t++)
{
//kernel calls
kernel1<<<noOfBlocks,noOfThreadsPerBlock>>>(** SOME PARAMETERS **);
checkCudaError(cudaThreadSynchronize());
kernel2<<<noOfBlocks,noOfThreadsPerBlock>>>(** SOME PARAMETERS **);
checkCudaError(cudaThreadSynchronize());
}
And the structure of the second kernel is
var[index+0]=**SOME CALCULATION**
var[index+1]=**SOME CALCULATION**
var[index+2]=**SOME CALCULATION**
Now when I execute this code, checkCudaError does not report anything and the code is executed giving some output but visual studio gives the following exception
First-chance exception at 0x7640c41f in **.exe: Microsoft C++ exception: cudaError_enum at memory location 0x0039f9c4..
First-chance exception at 0x7640c41f in **.exe: Microsoft C++ exception: cudaError_enum at memory location 0x0039f9c4..
And when I check on Nsight it says kernel 2 is having the following error
CUDA_ERROR_LAUNCH_OUT_OF_RESOURCES
Now the problem is that var array in kernel 2 is giving some of the rows correct some are copies of other row values and some are garbage.
Also when I do this
var[index+0]=3
var[index+1]=3
var[index+2]=3
All the values of var are set to 3
A few side notes:
cudaThreadSynchronize() is deprecated in favor of cudaDeviceSynchronize().
The fact that nsight is reporting an error on the 2nd kernel launch, but your error checking code is not, leads me to believe your error checking code is broken.
Now, regarding your issue, out of resources is frequently due to a code requesting too many registers (too many registers per thread times the number of threads per threadblock requested.) Try re-compiling your code specifying -Xptxas -v to get verbose output, and then recompiling again with -maxrregcount 20 (or something like that) to try to work around this for test purposes.
If this "fixes" your problem, you may then want to consider the following:
See if there is a way you can re-order or restructure your code to reduce the register pressure
If not, then adjust your maxrregcount value upwards to approximately the highest value that will allow your code to compile and run according to the launch configurations (number of threads per block) that you care about. You may also want to benchmark your code at different levels of this setting, as it can affect occupancy. Usually if you have it set to the highest value that will compile and run, then you are limiting yourself to one threadblock per SM at execution time. This may be OK, or there may be a lower setting that is better, allowing two threadblocks per SM residency, and possibly higher performance. Only benchmarking your code will tell.

Resources