Memory carveouts questions - linux-kernel

I am planning to allocate memory that will be used by a different processor, using the calls made in the ion_heap_create functions.
Please see: https://android.googlesource.com/kernel/msm/+/android-msm-mako-3.4-jb-mr1/drivers/gpu/ion/ion_heap.c
ion_heap uses the Linux functions declared in the following header file:
http://lxr.free-electrons.com/source/include/linux/genalloc.h#L78
This piece of memory will be used by another processor for its own needs, and Linux will not use it.
That is my understanding. My question is: will such an arrangement lead to fragmentation issues?
Suppose it is like this:
|--------------|
| Linux Memory |
|--------------|
|  Carveouts   |
|--------------|
| Linux Memory |
|--------------|
The question is: how does Linux handle such a scenario? Does the virtual memory subsystem know anything about the carveouts? If so, how does it ensure that
Linux processes and the kernel do not use the memory in the carveouts?
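To make the fragmentation part of the question concrete, here is a minimal userspace sketch of a chunk-based first-fit pool over a fixed address range. It is only a model of the idea behind a genalloc-style pool, not the kernel API: `carveout_pool`, `pool_init`, `pool_alloc`, `pool_free` and the base address used below are all hypothetical names and values, and it assumes the carveout range is kept out of the kernel's general-purpose memory so that only this pool hands it out.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical userspace model of a carveout pool; NOT the kernel genalloc API. */
#define POOL_CHUNKS 64u                 /* number of chunks the pool is split into */

struct carveout_pool {
    uintptr_t base;                     /* start address of the carveout (must be non-zero) */
    size_t    chunk;                    /* allocation granularity in bytes */
    uint8_t   used[POOL_CHUNKS];        /* 1 = chunk is allocated */
};

static void pool_init(struct carveout_pool *p, uintptr_t base, size_t chunk)
{
    p->base = base;
    p->chunk = chunk;
    memset(p->used, 0, sizeof p->used);
}

/* First fit: find enough contiguous free chunks; returns 0 when nothing fits. */
static uintptr_t pool_alloc(struct carveout_pool *p, size_t size)
{
    size_t n = (size + p->chunk - 1) / p->chunk;   /* round up to whole chunks */

    if (n == 0 || n > POOL_CHUNKS)
        return 0;
    for (size_t start = 0; start + n <= POOL_CHUNKS; start++) {
        size_t run = 0;

        while (run < n && !p->used[start + run])
            run++;
        if (run == n) {
            memset(&p->used[start], 1, n);
            return p->base + start * p->chunk;
        }
        start += run;                   /* jump to the used chunk; the loop steps past it */
    }
    return 0;
}

static void pool_free(struct carveout_pool *p, uintptr_t addr, size_t size)
{
    size_t start = (addr - p->base) / p->chunk;
    size_t n = (size + p->chunk - 1) / p->chunk;

    if (start >= POOL_CHUNKS || n > POOL_CHUNKS - start)
        return;                         /* range is outside the pool; ignore */
    memset(&p->used[start], 0, n);
}
```

In this model, freeing a one-chunk block that sits between two live allocations leaves a hole that a later two-chunk request cannot use, so any fragmentation happens inside the carveout's own range rather than in the rest of memory.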

Related

Any way to force rarely-changing compilation units' code and data to "outside edges" of their allocation regions?

I'm using the gcc toolset built into a silicon vendor's software along with a boot loader that can skip any 8K blocks of code whose hash matches what's already in the device, and would like to arrange things to minimize upload time after test builds. Since most changes happen to a small number of compilation units which comprise a small minority of the code image, and the compilation units that change infrequently don't use any symbols which are in the frequently-changing parts, I would expect that one could reduce the number of circumstances requiring changes to the parts of the code image associated with rarely-changing parts of the code by laying out memory as (SU = stable units; OCU = often-changing units)
Flash:
[bot] | startup | SU code | OCU code | [unused] | OCU const | OCU init | SU init | SU const | [top]
RAM:
[bot] | [stack] | SU bss | OCU bss | [unused] | OCU data | SU data | [top]
If each of the SU flash sections only contains references to other things within SU flash or RAM sections, then changes to the OCU portions of the code would not affect the addresses of anything referenced within an SU flash section, and would thus not require any changes to 8K blobs that contain only SU code, SU const, or initial values for SU data.
What would one have to do in a linker script to designate some compilation units as "SU" for purposes of the above layout, and have it organize memory as shown above? Note that stack is being placed at the bottom of RAM to ensure that stack overflow will cause a trap rather than resulting in arbitrary memory corruption, but my main focus is on how to make the addresses of SU parts of the program be independent of the contents of anything else.
If gcc can't handle such top-down placements as indicated here, how would one have to partition storage so that all of the SU parts start at fixed addresses, and how would one accommodate the resulting discontinuity in the flash storage of initial values for the data section?

Kernel stack for a user process | Linux kernel

As per my understanding, there is a separate kernel stack for each user process.
How is this kernel stack used, and why can't we just use one stack for all the user processes?
How does this help us with preemption?
When the kernel runs in interrupt context, what stack is used?
[EDIT: The architecture of interest is x86]
How is this kernel stack used?
It is used, for example, when a user-mode process enters the kernel through a syscall. Inside the syscall handler, the kernel stack holds the local variables.
Why can't we just use one stack for all the user processes?
But how? How would they use it simultaneously on SMP systems? That would lead to data corruption.
How does this help us with preemption?
I'm not sure what you are asking. Basically, it relates to preemption only indirectly. If you are interrupted by the system timer, you will probably switch to a different thread with a different kernel stack. The context may be saved on top of that stack (I'm not sure whether Linux implements it the same way). There is also the preempt_count counter, which has traditionally been stored in the thread_info structure that shares the kernel stack's memory. preempt_disable() increments it and preempt_enable() decrements it, which switches kernel preemption off and on. It is widely used, for example by spinlocks.
When the kernel runs in interrupt context, what stack is used?
On a user-to-kernel transition, the following happens:
The kernel stack is used. The processor switches to the stack defined by the SS0 and ESP0 fields of the TSS.
The processor pushes the exception parameters on the kernel stack:
+--------------------+ KSTACKTOP
| 0x00000 | old SS   |     " - 4
|      old ESP       |     " - 8
|     old EFLAGS     |     " - 12
| 0x00000 | old CS   |     " - 16
|      old EIP       |     " - 20 <---- ESP
+--------------------+
The processor reads IDT entry N (depending on which IRQ or exception occurred) and sets CS:EIP to point to the handler function described by the entry.
The handler function takes control and handles the exception.
Source: https://pdos.csail.mit.edu/6.828/2016/labs/lab3/
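The pushed words in the diagram above can be written down as a C structure, which makes the "KSTACKTOP - n" offsets easy to check. This is only a sketch: `hw_trap_frame` and `below_kstacktop` are hypothetical names, the layout assumes 32-bit x86 with a privilege change and no error code, and the 16-bit padding fields follow the diagram's convention of showing the upper halves as zero.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical name. The five words the CPU pushes on a user-to-kernel
 * transition, listed from the lowest address (where ESP ends up) upward.
 */
struct hw_trap_frame {
    uint32_t eip;       /* KSTACKTOP - 20: old EIP, ESP points here afterwards */
    uint16_t cs;        /* KSTACKTOP - 16: old CS in the low half of the word */
    uint16_t pad_cs;    /* upper half, shown as zero in the diagram */
    uint32_t eflags;    /* KSTACKTOP - 12: old EFLAGS */
    uint32_t esp;       /* KSTACKTOP -  8: old (user) ESP */
    uint16_t ss;        /* KSTACKTOP -  4: old SS in the low half of the word */
    uint16_t pad_ss;    /* upper half, shown as zero in the diagram */
};

/* How far below KSTACKTOP a field sits, given its offset inside the frame. */
static inline size_t below_kstacktop(size_t field_offset)
{
    return sizeof(struct hw_trap_frame) - field_offset;
}
```

For example, `below_kstacktop(offsetof(struct hw_trap_frame, esp))` gives 8, matching the "old ESP" row. Some exceptions additionally push an error code below EIP, which this sketch leaves out.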

How to do a TRUE rescan of PCIe bus

I have an FPGA (like most of the people asking this question) that gets configured after my Linux kernel does the initial PCIe bus scan and enumeration. As you can guess, the FPGA implements a PCIe endpoint.
I would like to have the PCIe core re-enumerate the ENTIRE PCIe bus so that my FPGA will then show up and I can load my driver module. I would also like the ability to SWAP the FPGA load out for a different configuration. By this I mean I would like to be able to:
Boot Linux
Configure FPGA
Enumerate PCIe endpoint and load module
Remove PCIe endpoint
Re-configure FPGA
Re-enumerate PCIe endpoint
All without rebooting Linux
Here are solutions that have been proposed elsewhere but do not solve the problem.
echo 1 > /sys/bus/pci/rescan: this seems to work only sometimes, and it does not work if I want to hot-swap the FPGA load after it was first enumerated.
Can the hotplug/power management facilities of PCIe be used to make this work? If so, are there any good resources on how to use the hotplug system with PCIe? (LDD does not cover it thoroughly enough.)
Re-enumerating the PCIe bus/tree via echo 1 > /sys/bus/pci/rescan is the correct solution. We use it in the same way as you described.
We use echo 1 > $pcidevice/remove to disconnect the driver from the device and to detach the device from the tree. The driver (xillybus) is not unloaded, just disconnected.
A better solution is to rescan only the node to which your FPGA is attached. This reduces the overall impact on the system.
This technique is used in the RC3E FPGA cloud system.
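The remove and per-node rescan steps described above can also be driven from a small C helper instead of the shell. This is a minimal sketch: `pci_attr_write` is a hypothetical helper name, and the sysfs directories you pass in depend on your own topology.

```c
#include <stdio.h>

/*
 * Hypothetical helper: write "1" to <dev_dir>/<attr>, e.g. the "remove"
 * attribute of an endpoint or the "rescan" attribute of its parent bridge.
 * Returns 0 on success and -1 on any failure.
 */
static int pci_attr_write(const char *dev_dir, const char *attr)
{
    char path[512];
    int n = snprintf(path, sizeof path, "%s/%s", dev_dir, attr);

    if (n < 0 || (size_t)n >= sizeof path)
        return -1;                      /* path was truncated */

    FILE *f = fopen(path, "w");
    if (!f)
        return -1;                      /* missing attribute or no permission */

    int ok = fputs("1\n", f) >= 0;
    if (fclose(f) != 0)                 /* sysfs may report the error on close */
        ok = 0;
    return ok ? 0 : -1;
}
```

A typical sequence would be `pci_attr_write(endpoint_dir, "remove")`, then reconfiguring the FPGA, then `pci_attr_write(bridge_dir, "rescan")`, where both directories are paths under /sys/devices for your board and the process runs as root.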
This is really dependent on exactly what is changed on the FPGA. The problem is in how PCIe enumeration and address assignment is done, particularly how the PCIe switches are configured. The allocation MUST be done in one shot as a depth-first search. After this is complete, it is not possible to go insert additional bus numbers or address space without changing all of the subsequent allocations, which would require reloading all of the corresponding device drivers. Basically, once the bus is enumerated and addresses are assigned, you can't change the overall allocations without re-enumerating the entire bus, which requires a reboot. Preallocating resources on a specific PCIe port can alleviate this problem, and is required for PCIe hot plugging.
If the PCIe BAR configuration has not changed, then usually doing a remove/hot reset/rescan is sufficient and no reboots are required.
If the BAR configuration has changed, then it's a different story. If the new BARs are smaller, then there should be no problem. But if the new BARs are larger or there are more BARs, and there isn't enough address space allocated to the switch port that the device is attached to, then those BARs cannot be allocated address space and the device will fail to enumerate. In this case, a reboot is required so that resources can be reassigned. Don't forget that there are also 32-bit BARs and 64-bit BARs, and these BARs are assigned from two different pools of address space, so changing BAR types can also require a reboot to re-enumerate.
If you're going from no device to a device (i.e. blank FPGA to configured FPGA), then bus numbers may need to be reassigned, which requires a reboot.
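The BAR case above comes down to arithmetic: each BAR has a power-of-two size and must be aligned to that size, and all of them have to fit in the window already assigned to the port. The sketch below checks this for one window with a simple largest-first packing. `bars_fit` is a hypothetical name, the kernel's real resource assignment is more involved, and the window values in the usage note are made up.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* qsort comparator: largest size first. */
static int cmp_desc(const void *a, const void *b)
{
    uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;

    return (x < y) - (x > y);
}

/*
 * Hypothetical check: returns 1 if every BAR in sizes[] can be placed,
 * naturally aligned, inside [win_base, win_base + win_size), else 0.
 * sizes[] is sorted in place. Assumes win_base + win_size does not overflow.
 */
static int bars_fit(uint64_t win_base, uint64_t win_size,
                    uint64_t *sizes, size_t n)
{
    uint64_t cur = win_base;
    uint64_t end = win_base + win_size;

    qsort(sizes, n, sizeof *sizes, cmp_desc);
    for (size_t i = 0; i < n; i++) {
        uint64_t sz = sizes[i];

        if (sz == 0 || (sz & (sz - 1)) != 0)
            return 0;                               /* not a power of two */

        uint64_t aligned = (cur + sz - 1) & ~(sz - 1);

        if (aligned < cur || aligned > end || end - aligned < sz)
            return 0;                               /* does not fit in the window */
        cur = aligned + sz;
    }
    return 1;
}
```

With a 1 MiB window at 0xE0000000, a 64 KiB BAR plus a 4 KiB BAR fits, while a 1 MiB BAR plus a 4 KiB BAR does not, which is the situation where the device fails to enumerate after reconfiguration.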
From The Doctor
Here is how to reset Vega cards, equivalent to a device reset in Windows. The cards are selected by vendor ID.
lspci -n | grep 1002: | egrep -v ".1"| awk '{print "find /sys | grep ""$1"/rescan" -| tac -;"}' | sh - | sed s/^/echo\ 1\ >\ "&/g | sed s/$/"/g
Put the output of that command in your /etc/rc.local to reset your Vegas after bootup, similar to the devcon restart script.
echo 1 > "/sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/rescan"
echo 1 > "/sys/devices/pci0000:00/0000:00:1c.5/0000:03:00.0/rescan"
echo 1 > "/sys/devices/pci0000:00/0000:00:1d.0/0000:06:00.0/rescan"
echo 1 > "/sys/devices/pci0000:00/0000:00:1d.1/0000:07:00.0/rescan"

Access chip Select register with 8bit bus size

We have a problem communicating with a register on CS4 at 0x10020000. In U-Boot that register has the value 0x45fab3c1, but when we try to access it we read 0x10101010, and we are not able to write to it either.
With CS3 everything seems ok, we can read and write. CS3 is at: 0x10000000.
The main/only differences between cs3 and cs4 are:
Chip Select: Lp_cs3
Bus size: 32 bit
Bus control: 2 wait state R/W ACK disabled
Allocated size 32Kbyte
Chip Select: Lp_cs4
Bus size: 8 bit
Bus control: 2 wait state R/W ACK disabled
Allocated size: 4 KByte
In userspace we use:
/*————————————————————————————————*/
//code from memedit.c
int fd;
fd = open("/dev/mem", O_SYNC | O_RDWR);
mem = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, offset & (~4095));
printf("/dev/mem[0x%08x] = 0x%08x", offset, *(unsigned int*)&mem[offset & 4095]);
//to write
*((unsigned int *)&mem[offset & 4095]) = input;
/*————————————————————————————————*/
In our kernel module:
/*————————————————————————————————*/
#define CS4_START 0x10020000U
#define CS4_STOP 0x10040000U
#define CS4_SIZE 0x00020000U
#define CS3_START 0x10000000U
#define CS3_STOP 0x10020000U
#define CS3_SIZE 0x00020000U
void __iomem *cs3_ioaddr = ioremap ((volatile unsigned long)(CS3_START), CS3_SIZE);
printk("We read value at CS3: %x \n\n\n",in_be32(cs3_ioaddr+0x0018));
out_be32(cs3_ioaddr+0x0018,0x00000001);
printk("We read written value: %x \n\n\n",in_be32(cs3_ioaddr+0x0018));
/*————————————————————————————————*/
Chip Select are correctly initialized...
Platform is based on mpc5200b CPU and fpga is a Xilinx Virtex4.
Kernel we use: 2.6.33
More information:
I've tried inb/outb, in_8/out_8... but when I try to read/write with this code inside the kernel:
/*----------------*/
static struct device_node *memoria_cs4;
static void __iomem *reg_cs4;
memoria_cs4 = of_find_node_by_path("/localbus/fpga#0,0/cs#0");
reg_cs4 = of_iomap(memoria_cs4, 0);
printk("Value before, at reg_cs4+0x001: %x \n",in_8(reg_cs4+0x323));
out_8(reg_cs4+0x001,0xFA);
printk("Value after, at reg_cs4+0x001: %x \n",in_8(reg_cs4+0x323));
/*----------------*/
I get the same value before and after: 0x10. But the value I see in U-Boot is 0xFB.
I've also tried inb/outb...
That code works with cs3 when using in_be32/out_be32 (naturally, I changed the memory location in the device tree from cs4 to cs3). I've also tried it with ioremap(), with the same result: cs3 works, but cs4 does not.
Thanks again in advance…
neorf
Just to clarify the situation: you have a board with the following architecture:
+------------------+
|                  |   BUS3 (CS3)        +------------------+
|                  +------ 32 bit -------+      xilinx      |
|                  |                     +------------------+
|                  |
|     mpc5200b     |
|                  |   BUS4 (CS4)        +------------------+
|                  +------- ? bit -------+      xilinx      |
|                  |                     +------------------+
|                  |
+------------------+
The board has an mpc5200b and a xilinx connected through 2 buses.
Bus3 and bus4 are both attached via the 'Multi-Function External LocalPlus Bus'.
To write to the xilinx via BUS4 (CS4) you must write to some virtual address,
which will be translated to some physical address.
The virtual addresses for BUS4 inside u-boot and the kernel don't match.
The physical address is then compared with the 'CS4 Start' / 'CS4 Stop' registers.
If it matches, a physical transfer is started on BUS4 (this can be checked easily with an oscilloscope).
The physical transfer has a lot of configurable options (address size, data size, read only, write only).
All of those options must match the hardware design
(Chip Select 4 Configuration Register, Chip Select 4 Control Register).
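The start/stop comparison described above can be sketched as a plain range check. The window values are the CS3/CS4 defines from the question's kernel module. `cs_window` and `cs_decode` are hypothetical names, and treating the stop address as exclusive is an assumption based on CS3_STOP being equal to CS4_START there.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one chip-select address window. */
struct cs_window {
    int      cs;        /* chip select number */
    uint32_t start;     /* first physical address of the window */
    uint32_t stop;      /* first address past the window (assumed exclusive) */
};

/* Values taken from the CS3_* / CS4_* defines in the question. */
static const struct cs_window cs_windows[] = {
    { 3, 0x10000000u, 0x10020000u },
    { 4, 0x10020000u, 0x10040000u },
};

/* Returns the chip select whose window contains phys, or -1 if none does. */
static int cs_decode(uint32_t phys)
{
    for (size_t i = 0; i < sizeof cs_windows / sizeof cs_windows[0]; i++) {
        if (phys >= cs_windows[i].start && phys < cs_windows[i].stop)
            return cs_windows[i].cs;
    }
    return -1;
}
```

If the kernel reprograms the start/stop registers during boot, the same physical address can decode to a different chip select, or to none at all, which matches the symptom of reads returning unrelated data.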
You said that in u-boot BUS3 and BUS4 both work: you can write and read correct data.
After that the linux kernel starts.
BUS3 continues to work as expected, but BUS4 stops working.
You access the CS4 space inside the mpc5200b using the correct instruction (the same as in u-boot: out_be32)
and correct virtual and physical addresses, but BUS4 doesn't work.
The kind of failure you mention looks like an access to the wrong address, or an access to the correct address with the wrong access mode.
I think that the LocalPlus Bus register contents are changed by kernel code during kernel boot.
The chip selects (active low) CS[4] and CS[5] are shared with ATA, so there may be a conflict with another kernel driver.
Without the hardware design, the kernel source and compilation options, the dts-file content, and the u-boot source, I can't give a more specific answer.

Change user space memory protection flags from kernel module

I am writing a kernel module that has access to a particular process's memory. I have done an anonymous mapping on some of the user space memory with do_mmap():
#define MAP_FLAGS (MAP_PRIVATE | MAP_FIXED | MAP_ANONYMOUS)
prot = PROT_WRITE;
retval = do_mmap(NULL, vaddr, vsize, prot, MAP_FLAGS, 0);
vaddr and vsize are set earlier, and the call succeeds. After I write to that memory block from the kernel module (via copy_to_user), I want to remove the PROT_WRITE permission on it (like I would with mprotect in normal user space). I can't seem to find a function that will allow this.
I attempted unmapping the region and remapping it with the correct protections, but that zeroes out the memory block, erasing all the data I just wrote; setting MAP_UNINITIALIZED might fix that, but, from the man pages:
MAP_UNINITIALIZED (since Linux 2.6.33)
Don't clear anonymous pages. This flag is intended to improve performance on embedded
devices. This flag is only honored if the kernel was configured with the
CONFIG_MMAP_ALLOW_UNINITIALIZED option. Because of the security implications, that option
is normally enabled only on embedded devices (i.e., devices where one has complete
control of the contents of user memory).
so, while that might do what I want, it wouldn't be very portable. Is there a standard way to accomplish what I've suggested?
After some more research, I found a function called get_user_pages() (best documentation I've found is here) that returns a list of pages from userspace at a given address that can be mapped to kernel space with kmap() and written to that way (in my case, using kernel_read()). This can be used as a replacement for copy_to_user() because it allows forcing write permissions on the pages retrieved. The only drawback is that you have to write page by page, instead of all in one go, but it does solve the problem I described in my question.
In userspace, the mprotect system call can modify the protection flags on an existing mapping. You probably need to follow the implementation of that system call, or maybe simply call it directly from your code. See mm/mprotect.c.
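For reference, this is the userspace behaviour the question wants to reproduce from the kernel: map an anonymous page writable, fill it, then drop PROT_WRITE in place without losing the contents. It is a userspace sketch only, not the in-kernel call path, and `make_readonly_page` is a hypothetical name.

```c
#define _DEFAULT_SOURCE                 /* for MAP_ANONYMOUS under strict -std modes */
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns a one-page, read-only anonymous mapping that
 * holds a copy of msg (truncated to fit), or NULL on failure.
 */
static char *make_readonly_page(const char *msg)
{
    long ps = sysconf(_SC_PAGESIZE);

    if (ps <= 0)
        return NULL;

    char *p = mmap(NULL, (size_t)ps, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;

    /* The page starts zero-filled, so the copy stays NUL-terminated. */
    strncpy(p, msg, (size_t)ps - 1);

    /* Drop write permission on the existing mapping; the data is kept. */
    if (mprotect(p, (size_t)ps, PROT_READ) != 0) {
        munmap(p, (size_t)ps);
        return NULL;
    }
    return p;
}
```

After the call the page can still be read, but a write to it raises SIGSEGV, which is the effect the unmap-and-remap attempt could not achieve without zeroing the data.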

Resources