Difference between Primary GIC vs Secondary GIC in ARM - linux-kernel

As per the kernel documentation of gic device tree bindings
"Primary GIC is attached directly to the CPU and typically has PPIs and SGIs."
"Secondary GICs are cascaded into the upward interrupt controller and do not
have PPIs or SGIs."
I do not find the terms primary/secondary GIC in the GIC-400 documentation.
Can anyone please describe the difference or point to a document that explains it?

The terms primary and secondary are Linux terminology; you will not find them in an ARM data sheet. The PPIs are banked per CPU. The code can execute on any CPU, so only the extra interrupts may be set. I think this is rather pedantic, as most systems will only have one controller.
Find out how many interrupts are supported.
The GIC only supports up to 1020 interrupt sources.
This comes from the allocation of the address space on the controller: there is only decode room to fit 1020 external interrupts. Note the define CONFIG_GIC_NON_BANKED and the note 'Frankein-GIC without banked registers...'. In an SMP ARM system it is normal for the GIC to service all cores. The variable gic_nr tracks the number of controllers. If you have more than 1020 interrupt sources, you need another controller.
Can anyone please describe the difference?
The primary will service the PPIs. The PPIs exist on the 2nd controller, but they are redundant. The 2nd (and higher) controllers are just for external interrupts.
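To make the split concrete, here is a minimal sketch in C of how interrupt IDs fall into classes on a GICv2-style controller (SGIs 0-15, PPIs 16-31, SPIs 32-1019). The helper names gic_classify and gic_irq_used are made up for illustration and are not kernel functions; the second one only encodes the rule described above, that a cascaded controller contributes external (shared) interrupts only.

```c
#include <assert.h>

/* Interrupt ID ranges as laid out by the GIC architecture (GICv2). */
#define GIC_LAST_SGI 15u    /* IDs 0-15: software-generated, banked per CPU */
#define GIC_LAST_PPI 31u    /* IDs 16-31: private peripheral, banked per CPU */
#define GIC_LAST_SPI 1019u  /* IDs 32-1019: shared peripheral interrupts */

enum gic_irq_class { GIC_SGI, GIC_PPI, GIC_SPI, GIC_INVALID };

/* Hypothetical helper: classify an interrupt ID by its range. */
static enum gic_irq_class gic_classify(unsigned int id)
{
    if (id <= GIC_LAST_SGI) return GIC_SGI;
    if (id <= GIC_LAST_PPI) return GIC_PPI;
    if (id <= GIC_LAST_SPI) return GIC_SPI;
    return GIC_INVALID;     /* 1020-1023 are reserved for special purposes */
}

/* Hypothetical helper: is this ID of any use on the given controller? */
static int gic_irq_used(unsigned int id, int is_primary)
{
    enum gic_irq_class c = gic_classify(id);

    assert(is_primary == 0 || is_primary == 1);
    if (c == GIC_INVALID)
        return 0;
    /* A secondary (cascaded) GIC is only used for shared interrupts. */
    return is_primary || c == GIC_SPI;
}
```

With this model, ID 20 (a PPI) counts on the primary controller but is ignored on a secondary one, while ID 40 (an SPI) is usable on both.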

Related

Linux kernel initialization - When are devicetree blobs parsed and tree nodes are loaded?

I would like to establish a milestone roadmap of Linux initialization that is easy for me to understand (for an embedded system). Here is what I have:
Bootloader loads kernel to RAM and starts it
Linux kernel enters head.o, starts start_kernel()
CPU architecture is found, MMU is started.
setup_arch() is called, setting CPU up.
Kernel subsystems are loaded.
do_initcalls() is called and modules with *_initcall() and module_init() functions are started.
Then /sbin/init (or alike) is run.
I don't know exactly when the devicetree is processed here. Is it while the do_initcalls() functions are being processed, or is it something prior to that?
In general, when is the devicetree parsed, and when are the tree nodes processed?
Thank you very much in advance.
Any corrections to my thoughts are highly appreciated.
It's a good question.
First, I think you already know that the kernel uses data in the DT to identify the specific machine. For a kernel meant for general use across different platforms or hardware, this has to happen in early boot so that the kernel has the opportunity to run machine-specific fixups.
Here is some information I digested from the Linux kernel documentation.
In the majority of cases, the machine identity is irrelevant, and the kernel will instead select setup code based on the machine’s core CPU or SoC. On ARM for example, setup_arch() in arch/arm/kernel/setup.c will call setup_machine_fdt() in arch/arm/kernel/devtree.c which searches through the machine_desc table and selects the machine_desc which best matches the device tree data. It determines the best match by looking at the ‘compatible’ property in the root device tree node, and comparing it with the dt_compat list in struct machine_desc (which is defined in arch/arm/include/asm/mach/arch.h if you’re curious).
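As a rough illustration of that "best match" idea, here is a user-space sketch in C. It is not the kernel's code: struct machine_desc is cut down to two fields, and match_rank and select_machine are hypothetical names. The assumption is that an entry earlier in the root node's 'compatible' list is more specific, so the descriptor matching the earliest entry wins.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for the kernel's struct machine_desc. */
struct machine_desc {
    const char *name;
    const char *const *dt_compat;   /* NULL-terminated list of strings */
};

/*
 * Position in the root 'compatible' list of the first entry that also
 * appears in dt_compat, or -1 if none does. Lower means more specific.
 */
static int match_rank(const char *const *root_compat,
                      const char *const *dt_compat)
{
    assert(root_compat != NULL && dt_compat != NULL);
    for (int i = 0; root_compat[i] != NULL; i++)
        for (int j = 0; dt_compat[j] != NULL; j++)
            if (strcmp(root_compat[i], dt_compat[j]) == 0)
                return i;
    return -1;
}

/* Pick the descriptor with the lowest rank, or NULL if nothing matches. */
static const struct machine_desc *
select_machine(const struct machine_desc *table, size_t n,
               const char *const *root_compat)
{
    const struct machine_desc *best = NULL;
    int best_rank = -1;

    for (size_t k = 0; k < n; k++) {
        int r = match_rank(root_compat, table[k].dt_compat);
        if (r >= 0 && (best == NULL || r < best_rank)) {
            best = &table[k];
            best_rank = r;
        }
    }
    return best;
}
```

For a root node with compatible = "acme,board-x", "acme,soc-a" (made-up strings), a board-specific descriptor listing "acme,board-x" would be preferred over a generic SoC descriptor listing only "acme,soc-a".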
As for the Linux initialization, I think there are some things we can add to the list.
Press the power button; the reset signal is triggered
CS:IP is fixed to the BIOS address 0xFFFF0
Jump to the start of the BIOS
Self-check, start of hardware devices like the keyboard, real mode IDT & GDT
Load a bootloader like grub2 or syslinux.
Bootloader loads kernel to RAM and starts it (boot.img->core.img).
A20 is opened, setup.s is called, switch into protected mode
Linux kernel enters head.o, IDT & GDT are refreshed, decompress_kernel() runs, start_kernel() is started
INIT_TASK(init_task) is created
trap_init()
CPU architecture is found, MMU is started (mmu_init()).
setup_arch() is called, setting CPU up.
Kernel subsystems are loaded.
do_initcalls() is called and modules with *_initcall() and module_init() functions are started.
rest_init() will create processes 1 and 2; in other words, /sbin/init (or alike) and kthreadd are run.
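The do_initcalls() step in the list above can be pictured with a small user-space model. This is only a sketch: the level names mirror a subset of the kernel's initcall levels (core, postcore, arch, subsys, fs, device, late), while run_initcalls, the trace buffer and the three sample functions are invented for the example. The point is that calls run level by level, regardless of the order in which they were registered.

```c
#include <assert.h>
#include <stddef.h>

/* A subset of the initcall levels, in the order they are run. */
enum initcall_level {
    LVL_CORE, LVL_POSTCORE, LVL_ARCH, LVL_SUBSYS, LVL_FS, LVL_DEVICE, LVL_LATE,
    LVL_COUNT
};

typedef int (*initcall_t)(void);

struct initcall {
    enum initcall_level level;
    initcall_t fn;
};

/* Tiny trace so the execution order can be inspected afterwards. */
static char trace[8];
static size_t trace_len;

static int record(char tag)
{
    assert(trace_len < sizeof trace);
    trace[trace_len++] = tag;
    return 0;
}

/* Invented sample initcalls. */
static int arch_setup(void)   { return record('A'); }
static int driver_probe(void) { return record('D'); }
static int late_task(void)    { return record('L'); }

/* Run every registered call, one level at a time; returns how many ran. */
static size_t run_initcalls(const struct initcall *calls, size_t n)
{
    size_t ran = 0;

    for (int lvl = 0; lvl < LVL_COUNT; lvl++) {
        for (size_t i = 0; i < n; i++) {
            if ((int)calls[i].level == lvl) {
                calls[i].fn();
                ran++;
            }
        }
    }
    return ran;
}
```

Registering late_task, driver_probe and arch_setup in that order still runs them as arch, device, late.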

What is the equivalent of 'Cache line size' in PCIE?

In PCI configuration space, Cache line size indicates system cacheline size in units of DWORDs. This register must be implemented by master devices that can generate the Memory Write and Invalidate command.
The value in this register is also used by master devices to determine whether to use Read, Read Line, or Read Multiple commands for accessing memory.
Slave devices that want to allow memory bursting using cacheline wrap addressing mode must implement this register to know when a burst sequence wraps to the beginning of the cacheline.
However, PCI Express devices implement this field as a read-write field for legacy compatibility purposes only; it has no effect on any PCI Express device behavior.
Then how does a PCIe system implement the Memory Write and Invalidate feature?
PCIe has a supplementary protocol called Address Translation Services (ATS), and this protocol includes a description of invalidation (chapter 3). The bottom line is that a MsgD Transaction Layer Packet (TLP) called Invalidate can do that.
Note that in general, it is completely separate (protocol-wise) from the MWr TLP.
As far as I know, PCIe does not have an explicit Memory Write and Invalidate message. Instead, a root complex that receives a write that happens to cover an entire cacheline can avoid reading that cacheline and invalidate it immediately.
I think in most cases you would simply generate MaxPayloadSize requests if possible, and hopefully also trigger this behaviour. If you must know the cacheline size from the device, I would suggest designing a device-specific mechanism, and configuring it from your driver.
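If you do end up passing a cacheline size to the device or driver, the condition "this write covers entire cachelines" is easy to check. A minimal sketch in C, assuming the line size is a power of two; covers_whole_lines is a made-up helper name, not part of any PCIe API:

```c
#include <assert.h>
#include <stdint.h>

/*
 * True if the write [addr, addr + len) consists only of whole cache
 * lines: it starts on a line boundary and its length is a non-zero
 * multiple of the line size.
 */
static int covers_whole_lines(uint64_t addr, uint64_t len, uint64_t line_size)
{
    /* The mask trick below only works for power-of-two line sizes. */
    assert(line_size != 0 && (line_size & (line_size - 1)) == 0);

    return len != 0
        && (addr & (line_size - 1)) == 0
        && (len & (line_size - 1)) == 0;
}
```

For example, a 128-byte write to 0x1000 with 64-byte lines qualifies, while the same write starting at 0x1010 does not.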

Ensuring DMA-capable memory

I was reading section 'Part Id' of the following document. I'm not sure how relevant this document is to kernel 2.6.35, for instance; specifically, it says:
..the DMA address of the memory must be within the dma_mask of the device..
and they recommend passing certain flags, such as GFP_DMA, to kmalloc, so that the memory is guaranteed to fall within the DMA mask provided.
However, if the memory is allocated from a cache pool created by kmem_cache_create, using kmem_cache_alloc(.. GFP_ATOMIC), does this fail to meet the requirements outlined in DMA-API.txt?
On the other hand, LDD talks about the __GFP_DMA flag with regard to legacy ISA devices, so I'm not sure it is applicable to PCI/PCIe devices.
This is an x86 64-bit platform, if it matters:
pci_set_dma_mask(dev, 0xffffffffffffffffULL);
pci_set_consistent_dma_mask(dev, 0xffffffffffffffffULL);
I would appreciate some explanation of this.
Regarding the GFP_* flags for DMA
On x86:
ISA - when using kmalloc(), you need to bitwise-OR GFP_DMA with GFP_KERNEL (or GFP_ATOMIC) because of the following:
GFP_DMA guarantees:
(1) physical addresses are consecutive when get_free_page returns more than one page, and
(2) only addresses lower than MAX_DMA_ADDRESS are returned. MAX_DMA_ADDRESS is 16MB on the PC because of ISA constraints.
PCI - you don't need to use GFP_DMA because there is no MAX_DMA_ADDRESS limit.
The device's dma_mask is checked when calling dma_map_* or dma_alloc_coherent.
dma_alloc_coherent ensures that the allocated memory can be used by dma_map_*, which gives other benefits too (the implementation may choose to ignore flags that affect the location of the returned memory, like GFP_DMA).
You can refer to http://coweb.cc.gatech.edu/sysHackfest/uploads/58/DMA_howto.1.txt
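To show what "within the dma_mask" means in practice, here is a small user-space sketch in C. dma_bit_mask mirrors what the kernel's DMA_BIT_MASK(n) macro computes, and dma_addr_fits is a hypothetical helper (not a kernel function) that tests whether the last byte of a buffer is still addressable under a given mask.

```c
#include <assert.h>
#include <stdint.h>

/* Mask with the low n bits set, like the kernel's DMA_BIT_MASK(n). */
static uint64_t dma_bit_mask(unsigned int n)
{
    assert(n >= 1 && n <= 64);
    /* Shifting a 64-bit value by 64 is undefined, so special-case it. */
    return n == 64 ? ~0ULL : (1ULL << n) - 1;
}

/* True if every byte of [addr, addr + size) is reachable under mask. */
static int dma_addr_fits(uint64_t addr, uint64_t size, uint64_t mask)
{
    uint64_t last;

    assert(size > 0);
    last = addr + size - 1;
    if (last < addr)        /* the range wrapped around the address space */
        return 0;
    return last <= mask;
}
```

With a 24-bit mask (the 16MB ISA limit mentioned above), a 4KB buffer at 0xFFF000 just fits, while anything above 16MB does not. With the 64-bit mask from the question, every address fits, which is why GFP_DMA is unnecessary there.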

From Kernel Space to User Space: Inner-workings of Interrupts

I have been trying to understand how h/w interrupts end up in some user space code, through the kernel.
My research led me to understand that:
1- An external device needs attention from CPU
2- It signals the CPU by raising an interrupt (h/w trace to cpu or bus)
3- The CPU asserts, saves the current context, and looks up the address of the ISR in the interrupt descriptor table (vector)
4- CPU switches to kernel (privileged) mode and executes the ISR.
Question #1: How did the kernel store the ISR address in the interrupt vector table? It might probably be done by sending the CPU some piece of assembly described in the CPU's user manual? The more detail on this subject the better please.
Question #2: In user space, how can a programmer write a piece of code that listens to a h/w device's notifications?
This is what I understand so far.
5- The kernel driver for that specific device has now the message from the device and is now executing the ISR.
Question #3: If the programmer in user space wanted to poll the device, I would assume this would be done through a system call (or at least this is what I understood so far). How is this done? How can a driver tell the kernel to be called upon a specific system call so that it can execute the request from the user? And then what happens, how does the driver give back the requested data to user space?
I might be completely off track here, any guidance would be appreciated.
I am not looking for detailed, specific answers; I am only trying to understand the general picture.
Question #1: How did the kernel store ISR address in interrupt vector table?
The driver calls the request_irq kernel function (defined in include/linux/interrupt.h and in kernel/irq/manage.c), and the Linux kernel registers the handler in the right way according to the rules of the current CPU/arch.
It might probably be done by sending the CPU some piece of assembly described in the CPU's user manual?
On x86, the Linux kernel stores ISRs in the Interrupt Descriptor Table (IDT); its format is described by the vendor (Intel - volume 3) and also in many resources like http://en.wikipedia.org/wiki/Interrupt_descriptor_table and http://wiki.osdev.org/IDT and http://phrack.org/issues/59/4.html and http://en.wikibooks.org/wiki/X86_Assembly/Advanced_Interrupts.
A pointer to the IDT is held in a special CPU register (IDTR), which is accessed with special assembler instructions: LIDT and SIDT.
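To give an idea of what "storing the ISR address" looks like, here is a sketch in C of a 64-bit mode IDT gate descriptor and a helper that packs a handler address into it. The field layout follows the 16-byte gate format described in the Intel manual; the struct, macro and function names (idt_gate, make_gate, gate_handler) are mine, and the code only builds the descriptor in ordinary memory. It does not load anything into the CPU.

```c
#include <assert.h>
#include <stdint.h>

/* 64-bit mode IDT gate descriptor: the handler address is split in three. */
struct idt_gate {
    uint16_t offset_low;    /* handler address bits 0-15 */
    uint16_t selector;      /* code segment selector */
    uint8_t  ist;           /* bits 0-2: interrupt stack table index */
    uint8_t  type_attr;     /* present bit, DPL, gate type */
    uint16_t offset_mid;    /* handler address bits 16-31 */
    uint32_t offset_high;   /* handler address bits 32-63 */
    uint32_t reserved;
};

static_assert(sizeof(struct idt_gate) == 16, "an IDT gate is 16 bytes");

#define GATE_PRESENT   0x80u    /* P bit */
#define GATE_INTERRUPT 0x0Eu    /* 64-bit interrupt gate type */

/* Pack a handler address into a gate (illustrative helper). */
static struct idt_gate make_gate(uint64_t handler, uint16_t selector,
                                 uint8_t dpl)
{
    struct idt_gate g = {0};

    assert(dpl <= 3);
    g.offset_low  = (uint16_t)(handler & 0xFFFF);
    g.offset_mid  = (uint16_t)((handler >> 16) & 0xFFFF);
    g.offset_high = (uint32_t)(handler >> 32);
    g.selector    = selector;
    g.type_attr   = (uint8_t)(GATE_PRESENT | ((unsigned)dpl << 5)
                              | GATE_INTERRUPT);
    return g;
}

/* Reassemble the handler address the CPU would jump to. */
static uint64_t gate_handler(const struct idt_gate *g)
{
    return (uint64_t)g->offset_low
         | ((uint64_t)g->offset_mid << 16)
         | ((uint64_t)g->offset_high << 32);
}
```

The kernel fills an array of such entries at boot and then points IDTR at it with LIDT; request_irq does not rewrite these entries per driver, it hooks the driver's handler into the kernel's own dispatch code behind them.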
If the programmer in user space wanted to poll the device, I would assume this would be done through a system call (or at least this is what I understood so far). How is this done? How can a driver tell the kernel to be called upon a specific system call so that it can execute the request from the user? And then what happens, how does the driver give back the requested data to user space?
The driver usually registers a device special file in /dev; pointers to several driver functions are registered for this file as "File Operations". The user-space program opens this file (syscall open), and the kernel calls the device's special code for open; then the program calls the poll or read syscall on this fd, and the kernel calls *poll or *read from the driver's file operations (http://www.makelinux.net/ldd3/chp-3-sect-7.shtml). The driver may put the caller to sleep (wait_event*), and the irq handler will wake it up (wake_up* - http://www.makelinux.net/ldd3/chp-6-sect-2 ).
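From the user-space side, the sequence described above is just ordinary file I/O. A minimal sketch in C: read_when_ready is a helper name I made up, and the file descriptor is assumed to come from open() on the driver's file in /dev (any pollable descriptor, such as a pipe, behaves the same way for testing).

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Wait up to timeout_ms for fd to become readable, then read from it.
 * Returns the number of bytes read, 0 on timeout (or end of file),
 * and -1 on error.
 */
static ssize_t read_when_ready(int fd, void *buf, size_t len, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int ready;

    assert(buf != NULL);

    /* For a device file, the kernel forwards this to the driver's poll op. */
    ready = poll(&pfd, 1, timeout_ms);
    if (ready < 0)
        return -1;
    if (ready == 0)
        return 0;               /* nothing arrived within the timeout */

    /* Likewise, read() ends up in the driver's read op. */
    return read(fd, buf, len);
}
```

While the process sleeps inside poll(), the driver's interrupt handler is what eventually wakes it, so the program never has to spin on the device.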
You can read more about linux driver creation in book LINUX DEVICE DRIVERS (2005) by Jonathan Corbet, Alessandro Rubini, and Greg Kroah-Hartman: https://lwn.net/Kernel/LDD3/
Chapter 3: Char Drivers https://lwn.net/images/pdf/LDD3/ch03.pdf
Chapter 10: Interrupt Handling https://lwn.net/images/pdf/LDD3/ch10.pdf

Windows kernel memory protection

In Windows, the high memory of every process (0x80000000 or 0xc0000000) is reserved for kernel code. User code cannot access these regions of memory; if it tries to, an access violation exception is thrown.
I wish to know how the kernel space is protected.
Is it via memory segmentation or via paging?
I would like to hear a technical explanation.
Thanks a lot,
Michael.
Assuming you are talking about x86 and x64 architectures.
Memory protection is achieved using the paging system. Each page table entry on an x86/x64 CPU has a bit to indicate whether it is a user or supervisor page. Accesses to supervisor pages are only permitted for code running with CPL<3, whereas accesses to non-supervisor pages are possible regardless of CPL.
CPL is the "Current Privilege Level" which is sometimes referred to as Ring. Windows only uses two rings, although the CPU implements 4. Ring 0 is the CPU mode in which what Windows refers to as "kernel mode" runs. Ring 3 is the CPU mode in which "User mode" runs. Since code running at CPL=3 cannot access supervisor pages, this is how memory protection is implemented.
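The check the CPU performs can be sketched in a few lines of C. This is a deliberately simplified model: it looks at a single page table entry and only at the Present bit (bit 0) and the User/Supervisor bit (bit 2), ignoring write and execute permissions and the newer SMEP/SMAP features. access_allowed is a made-up name for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define PTE_PRESENT (1ULL << 0)   /* P: the page is mapped */
#define PTE_USER    (1ULL << 2)   /* U/S: 1 = user page, 0 = supervisor page */

/* Simplified model of the user/supervisor check done on a memory access. */
static int access_allowed(uint64_t pte, unsigned int cpl)
{
    assert(cpl <= 3);

    if (!(pte & PTE_PRESENT))
        return 0;                 /* not mapped: page fault */
    if (cpl == 3 && !(pte & PTE_USER))
        return 0;                 /* user code touching a supervisor page */
    return 1;
}
```

A kernel page is mapped with the U/S bit clear, so the same address that works at CPL 0 faults at CPL 3, and Windows reports that fault to the process as an access violation.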
The answer for ARM is likely to be similar, but different.
That's an easy one and doesn't require talking about rings and kernel behavior. Accessing virtual memory at a particular address requires that address to be mapped: the operating system has to allocate a memory page for that address. The low-level winapi function that does that is VirtualAlloc(), which takes an optional address as its first argument. The OS will simply fail a request for an unmappable address. It is the exact same mechanism that prevents you from mapping any address in the lowest 64KB of the address space.

Resources