I want to know which core of a multicore processor initializes first when the CPU boots (I mean at the bootloader level). Is it always the first core, or a random core?
You want to read the local APIC, which you can read about in "Volume 2A":
http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html
Each processor has a corresponding local APIC, and each local APIC has an APIC ID register, which gets assigned a unique value at system init time.
The initial core that comes online is called the bootstrap processor (BSP), and can indeed be any physical core on the die. More info is in "volume 3a", where they talk about the bootstrap processor selection process.
Here is an excerpt from vol3a:
8.4.1 BSP and AP Processors
The MP initialization protocol defines two classes of processors: the bootstrap processor (BSP) and the application processors (APs). Following a power-up or RESET of an MP system, system hardware dynamically selects one of the processors on the system bus as the BSP. The remaining processors are designated as APs.
As part of the BSP selection mechanism, the BSP flag is set in the IA32_APIC_BASE MSR (see Figure 10-5) of the BSP, indicating that it is the BSP. This flag is cleared for all other processors.
The BSP executes the BIOS’s boot-strap code to configure the APIC environment, sets up system-wide data structures, and starts and initializes the APs. When the BSP and APs are initialized, the BSP then begins executing the operating-system initialization code.
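If you just want to check the BSP flag from a running Linux system, one approach (a sketch, assuming x86, the msr kernel module loaded via modprobe msr, and root privileges) is to read the IA32_APIC_BASE MSR through /dev/cpu/&lt;n&gt;/msr:

```c
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    /* Requires the msr kernel module and root privileges. */
    int fd = open("/dev/cpu/0/msr", O_RDONLY);
    if (fd < 0) {
        perror("open /dev/cpu/0/msr");
        return 1;
    }

    uint64_t apic_base;
    /* IA32_APIC_BASE is MSR 0x1B; bit 8 is the BSP flag (see Figure 10-5 in Vol. 3A). */
    if (pread(fd, &apic_base, sizeof(apic_base), 0x1B) != sizeof(apic_base)) {
        perror("pread");
        close(fd);
        return 1;
    }
    close(fd);

    printf("IA32_APIC_BASE = 0x%llx, BSP flag = %d\n",
           (unsigned long long)apic_base, (int)((apic_base >> 8) & 1));
    return 0;
}
```

Run it once per CPU (change the device path, or loop over /dev/cpu/*/msr); exactly one processor should report the BSP flag as 1.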
This depends on the architecture of the processor itself; there is no real standard for it. For instance, the PS3's Cell processor has nine cores, one of which schedules tasks for the other eight, so in that case it is fair to think of that one core processing instructions before the others. For other processors this is harder to discern. It is sensible to assume that the bootloader hands its instructions to the set of cores, at which point whatever logic assigns instructions to cores does so in the same manner it always does. In most cases I know of there is no real difference between task scheduling at boot and at any other time. The most basic task-scheduling hardware will simply select the first available core, which is usually whichever core the machine considers the "first" one. But, as I keep saying, different machines do it differently, so I would suggest finding out which processor you are using and checking what that one does. Good luck.
Each processor has its own local APIC with its own local APIC ID, which can be read from a local APIC register (reading the same register gives a different ID on each processor).
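For example, on x86 you can read the initial APIC ID of whichever core your code happens to run on from user space via CPUID leaf 1, EBX[31:24] (a minimal sketch; pin the process with taskset if you want a particular core):

```c
#include <cpuid.h>
#include <stdio.h>

int main(void)
{
    unsigned int eax, ebx, ecx, edx;

    /* CPUID leaf 1 returns the initial (local) APIC ID in EBX bits 31:24. */
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
        fprintf(stderr, "CPUID leaf 1 not supported\n");
        return 1;
    }
    printf("initial APIC ID of the core this ran on: %u\n", ebx >> 24);
    return 0;
}
```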
I would like to establish a milestone roadmap of Linux initialization, to make it easier for me to understand (for an embedded system). Here is what I have so far:
Bootloader loads kernel to RAM and starts it
Linux kernel enters head.o, starts start_kernel()
CPU architecture is found, MMU is started.
setup_arch() is called, setting CPU up.
Kernel subsystems are loaded.
do_initcalls() is called and modules with *_initcall() and module_init() functions are started.
Then /sbin/init (or alike) is run.
I don't know when exactly the device tree is processed here. Is it while the do_initcall() functions are being processed, or is it something prior to that?
In general, when is the device tree parsed, and when are the tree nodes processed?
Thank you very much in advance.
Any corrections to my thoughts are highly appreciated.
It's a good question.
Firstly, I think you already know that the kernel will use data in the DT to identify the specific machine. Since the same kernel may be used across different platforms or hardware, the machine identity needs to be established early in boot so that the kernel has the opportunity to run machine-specific fixups.
Here is some information I digested from the Linux kernel documentation.
In the majority of cases, the machine identity is irrelevant, and the kernel will instead select setup code based on the machine’s core CPU or SoC. On ARM for example, setup_arch() in arch/arm/kernel/setup.c will call setup_machine_fdt() in arch/arm/kernel/devtree.c which searches through the machine_desc table and selects the machine_desc which best matches the device tree data. It determines the best match by looking at the ‘compatible’ property in the root device tree node, and comparing it with the dt_compat list in struct machine_desc (which is defined in arch/arm/include/asm/mach/arch.h if you’re curious).
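To make that concrete, here is a hedged sketch of what such a machine_desc entry looks like in an ARM board file (the "acme" names are invented; real entries live under arch/arm/mach-* and this only builds as part of the ARM kernel tree). setup_machine_fdt() compares the root node's 'compatible' strings against each dt_compat list and picks the closest match:

```c
/* Sketch of an ARM machine_desc entry selected via the device tree
 * (hypothetical "acme" names, for illustration only). */
#include <asm/mach/arch.h>

static const char *const acme_dt_compat[] = {
	"acme,example-board",
	NULL,
};

DT_MACHINE_START(ACME_EXAMPLE, "Acme Example Board")
	/* Matched against the 'compatible' property of the DT root node. */
	.dt_compat	= acme_dt_compat,
MACHINE_END
```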
As for the Linux initialization sequence, I think there are a few things we can add to the list:
Power/START button is pressed, the reset signal is triggered
CS:IP is set to the BIOS reset vector at 0xFFFF0
Jump to the start of the BIOS
Self-check (POST), start-up of hardware devices such as the keyboard, real-mode IDT & GDT are set up
Load a bootloader such as GRUB2 or syslinux.
Bootloader loads the kernel to RAM and starts it (boot.img->core.img).
A20 line is enabled, setup.s is called, switch into protected mode
Linux kernel enters head.o, IDT & GDT are reloaded, decompress_kernel() runs, start_kernel() begins
init_task is created via INIT_TASK(init_task)
trap_init()
CPU architecture is found, MMU is started (mmu_init()).
setup_arch() is called, setting CPU up.
Kernel subsystems are loaded.
do_initcalls() is called and modules with *_initcall() and module_init() functions are started (see the sketch after this list).
rest_init() will create processes 1 and 2; in other words, /sbin/init (or the like) and kthreadd are run.
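To make the do_initcalls() step above concrete, here is a minimal sketch of an initcall (the names are hypothetical). Note that by this point the device tree has already been unflattened during setup_arch(); the initcall phase is where device-tree nodes are turned into platform devices and matched against drivers:

```c
/* Minimal sketch of code whose init function runs during do_initcalls()
 * (hypothetical example; built-in code and loadable modules both use
 * the module_init() macro). */
#include <linux/init.h>
#include <linux/module.h>

static int __init example_init(void)
{
	pr_info("example: called from do_initcalls()\n");
	return 0;
}

static void __exit example_exit(void)
{
	pr_info("example: unloading\n");
}

module_init(example_init);
module_exit(example_exit);
MODULE_LICENSE("GPL");
```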
I am going through the U-Boot & kernel startup process. What exactly is the use of the FDT (flattened device tree)?
Many links I have read state that U-Boot passes the board & SoC configuration information to the kernel in the form of an FDT:
https://wiki.freebsd.org/FlattenedDeviceTree
Why does the kernel need the board configuration information?
I am asking this question because whenever we write a device driver in Linux, we initialize the device in the probe() or module_init() call, use the request_mem_region() and ioremap() functions to get the address range,
and then set the clock and other registers of the device.
What does request_mem_region() actually do, and when is it needed?
Now, if my device drivers for on-chip and off-chip devices are doing the full board initialisation,
then what is the use of the flattened device tree for the kernel?
You are right in assuming that the board files and device-trees are required for initialisation of on-chip blocks and off-chip peripherals.
While booting up, the respective drivers for all on-chip blocks of an SoC and the off-chip peripherals interfaced to it need to be "probed", i.e. loaded and called. On buses like USB and PCI, peripherals can be detected physically, enumerated, and their corresponding drivers probed. However, in general no such facility is available for peripherals on the rest of the buses, like I2C, SPI etc.
In addition to the above, when the device driver is probed, one also needs to provide it with some information about the way in which we intend to configure and utilise the hardware. This varies depending upon the use case, for example the baud rate at which we would like to operate a UART port.
Both the above classes of information i.e.
Physical Topology of the hardware.
Configuration options of the hardware.
were usually defined as structs within the "board" file.
However, the board-file approach requires one to rebuild the kernel even to simply change a configurable option to a different value for initialisation. Also, when several physical boards exist that differ only slightly in their topology/configuration, the board-file approach becomes too cumbersome to maintain.
Hence the interest in maintaining this information separately in a device-tree. Any device-driver can parse the relevant branches and leaves of the device-tree to obtain the information it requires.
When developing your own device-driver, if your platform supports the device-tree, then you are encouraged to utilise the device tree to store the "platform data" required by your device-driver. This should help you clearly separate:
the generic driver code for your device in the <driver.c> file and
the device's config options specific to this platform into the device-tree.
A step-by-step approach to porting the Linux kernel to a board/SoC should help you appreciate the nuances involved and the advantages of using a device-tree.
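As an illustration, here is a hedged sketch of a platform driver that takes both its register range and a configuration option from the device tree instead of a board file (the "acme,widget" compatible string and "acme,bit-rate" property are invented for the example; devm_ioremap_resource() performs the request_mem_region() + ioremap() steps mentioned in the question):

```c
/* Hedged sketch of a platform driver pulling its resources and config
 * out of the device tree ("acme" names are made up for illustration). */
#include <linux/io.h>
#include <linux/module.h>
#include <linux/of.h>
#include <linux/platform_device.h>

static int acme_probe(struct platform_device *pdev)
{
	struct resource *res;
	void __iomem *regs;
	u32 bit_rate = 115200;	/* fallback if the property is absent */

	/* The DT node's "reg" property becomes a memory resource;
	 * devm_ioremap_resource() requests and maps it for us. */
	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
	regs = devm_ioremap_resource(&pdev->dev, res);
	if (IS_ERR(regs))
		return PTR_ERR(regs);

	/* Board-specific configuration comes from a DT property instead
	 * of a hard-coded struct in a board file. */
	of_property_read_u32(pdev->dev.of_node, "acme,bit-rate", &bit_rate);

	dev_info(&pdev->dev, "registers mapped, bit rate %u\n", bit_rate);
	return 0;
}

static const struct of_device_id acme_of_match[] = {
	{ .compatible = "acme,widget" },
	{ /* sentinel */ }
};
MODULE_DEVICE_TABLE(of, acme_of_match);

static struct platform_driver acme_driver = {
	.probe	= acme_probe,
	.driver	= {
		.name		= "acme-widget",
		.of_match_table	= acme_of_match,
	},
};
module_platform_driver(acme_driver);
MODULE_LICENSE("GPL");
```

The corresponding device-tree node would carry a matching compatible = "acme,widget", a reg range and an acme,bit-rate property; changing those values only requires editing the device tree, not rebuilding the kernel.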
In kernel 3.11.0, in the struct perf_event_attr, there are three members named exclude_hv/exclude_host/exclude_guest.
I know the exclude_host field is there to exclude events generated by the host when running KVM. But what is the meaning of exclude_hv? Is it used with Xen?
What is the mechanism in hardware that supports the function of exclude_host? As far as I know, the performance-monitoring event-select registers have no bits that tell the event counter to exclude events generated by the host.
This is a bit old, but for those looking for the answer, like me:
exclude_hv: do not count events that occur in the hypervisor.
The distinction between events that occurred in user space, the kernel, the hypervisor, the host, etc. is done in software. The kernel and/or hypervisor will save and restore the event count and configuration on each change of context.
Here is an excellent description of perf_events, which is the kernel subsystem that handles the performance counters.
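For reference, here is a hedged sketch of how those bits are set when opening a counter via the perf_event_open(2) system call (counting user-space cycles of the calling process; what exclude_hv can actually exclude depends on the hardware and on whether you are running under a hypervisor):

```c
/* Count user-space CPU cycles for this process, explicitly excluding
 * kernel, hypervisor and guest events via perf_event_attr. */
#include <linux/perf_event.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

int main(void)
{
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof(attr));
    attr.type = PERF_TYPE_HARDWARE;
    attr.size = sizeof(attr);
    attr.config = PERF_COUNT_HW_CPU_CYCLES;
    attr.disabled = 1;
    attr.exclude_kernel = 1;   /* don't count ring 0 on the host */
    attr.exclude_hv = 1;       /* don't count the hypervisor, where distinguishable */
    attr.exclude_guest = 1;    /* don't count while a KVM guest is running */

    /* There is no glibc wrapper for perf_event_open, so call it directly. */
    int fd = syscall(SYS_perf_event_open, &attr, 0 /* this process */,
                     -1 /* any CPU */, -1 /* no group */, 0);
    if (fd < 0) {
        perror("perf_event_open");
        return 1;
    }

    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
    for (volatile long i = 0; i < 10000000; i++)
        ;                       /* some user-space work to count */
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

    uint64_t count;
    if (read(fd, &count, sizeof(count)) == sizeof(count))
        printf("user-space cycles: %llu\n", (unsigned long long)count);
    close(fd);
    return 0;
}
```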
From my understanding, after a PC/embedded system has booted up, the OS will occupy the entire RAM region, so the RAM will look like this:
Which means that while I'm running a program I wrote, all the variables and the dynamic memory allocated on the stack, heap, etc. will remain inside that region. If I run Firefox, Paint, gedit, etc., they will also be running in this region. (Is this understanding correct?)
However, I would like to shrink the OS region. Below is an illustration of how I want to divide the RAM:
The reason I want to do this is that I want to store some data, received externally through a driver, into the custom region at a fixed physical location; then I will be able to access it directly from user space without using copy_to_user().
I think it is possible to do this by configuring U-Boot, but I have no experience with U-Boot. Can anyone give me some direction on where to begin, such as: do I need to modify the U-Boot source, or is changing the U-Boot environment variables sufficient?
Or is there any alternative method of doing this?
Any help is much appreciated. Thanks!
P.S.: I'm using a TI ARM processor and booting from an SD card; I'm not sure if that matters.
The platform is ARM. min_addr and max_addr will not work on this platform since those are Intel-only implementations.
For the ARM platform, look at the "mem=size@start" kernel parameter. Read up on Documentation/kernel-parameters.txt and arch/arm/kernel/setup.c. This option is available in most recent Linux code bases (i.e. 2.6.xx).
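As a hedged sketch (the addresses are invented for a board with 512 MB of DDR at physical 0x80000000): restrict the kernel to the first 448 MB with a boot argument such as mem=448M@0x80000000 in the U-Boot 'bootargs' variable, then map the remaining 64 MB from user space through /dev/mem:

```c
/* Map a physical region that was hidden from the kernel via mem=.
 * Addresses are hypothetical; adjust to your board's DDR layout. */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#define CUSTOM_REGION_PHYS  0x9C000000UL          /* 0x80000000 + 448 MB */
#define CUSTOM_REGION_SIZE  (64UL * 1024 * 1024)  /* 64 MB */

int main(void)
{
    int fd = open("/dev/mem", O_RDWR | O_SYNC);
    if (fd < 0) {
        perror("open /dev/mem");   /* needs root; CONFIG_STRICT_DEVMEM may interfere */
        return 1;
    }

    volatile uint8_t *region = mmap(NULL, CUSTOM_REGION_SIZE,
                                    PROT_READ | PROT_WRITE, MAP_SHARED,
                                    fd, CUSTOM_REGION_PHYS);
    if (region == MAP_FAILED) {
        perror("mmap");
        close(fd);
        return 1;
    }

    /* The driver writes its data at the same physical address; user space
     * can now read it directly, without copy_to_user(). */
    printf("first byte of custom region: 0x%02x\n", region[0]);

    munmap((void *)region, CUSTOM_REGION_SIZE);
    close(fd);
    return 0;
}
```

On the kernel side your driver can ioremap() the same physical range, so both sides see the data without any copy.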
You need to set the following parameters:
max_addr=some_max_physical
min_addr=some_min_physical
to be passed to the kernel through U-Boot in the 'bootargs' environment variable.
I found myself trying to do the opposite recently - in other words, get Linux to use the additional memory in my system - although I'm using Barebox rather than U-Boot on an OMAP4 platform.
I found (a bit to my surprise) that once the Barebox MLO first-stage bootloader was aware of the extra RAM, the kernel then detected and used it as well, without any bootargs. Since the memory size is not passed anywhere on the boot line, I can only assume the kernel inspects the memory mappings set up by the bootloader to determine the RAM size. This suggests that modifying your U-Boot to not map all of the RAM is the way to go.
On the subject of boot args, there was a time when it was recommended that you map out a chunk of RAM (used by the frame buffer?) on OMAP4 systems using the boot line. It's unclear whether this is still necessary.
If I understand correctly, a memory address in system space is accessible only from kernel mode. Does that mean that when components mapped in system space are executed, the processor must be switched to kernel mode?
For example: the virtual memory manager is a frequently used component and is mapped in system space. Whenever the VMM runs in the context of a user process (let's say it translates an address), must the processor be switched to kernel mode?
Thanks,
Suresh.
Typically, there are two parts involved: the MMU (memory management unit), a hardware component that does the translation from virtual addresses to physical addresses, and the operating system's VM subsystem.
The operating system part needs to run in privileged mode (a.k.a. kernel mode) and will set up/change the mappings in the MMU based on the user-space needs.
E.g. to request more (virtual) memory, or map a file into memory, a transition to kernel mode is needed and the VM subsystem can change the mapping of the process.
Around this there are often a ton of tricks to be played - e.g. map the whole address space of the kernel into the user process's virtual space, but change its access so the process can't use that memory - which means that whenever you transition to kernel mode you don't need to reload the mapping for the kernel.
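As a concrete (Linux-flavoured) sketch of those kernel-mode transitions, assuming a POSIX system and arbitrary sizes:

```c
/* Each of these steps crosses into kernel mode, where the VM subsystem
 * updates this process's page tables. */
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
    size_t len = 1 << 20;   /* ask for 1 MiB of anonymous memory */

    /* mmap() is a system call: the CPU switches to kernel mode, the VM
     * subsystem records the new mapping, then returns to user mode. */
    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    /* The first write to each page typically triggers a page fault, which
     * again enters the kernel so the mapping can be backed by a physical page. */
    memset(p, 0, len);

    munmap(p, len);   /* another kernel-mode transition to tear the mapping down */
    return 0;
}
```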
Taking your example of the virtual memory manager, it never actually runs in user space. To allocate memory, user mode applications make calls to the Win32 API (NTDLL.DLL as one example) to routines such as VirtualAlloc.
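For instance, here is a minimal user-mode sketch on Windows: the program only ever calls the documented VirtualAlloc/VirtualFree APIs, and the page-table work they trigger is done in system space on its behalf:

```c
/* User-mode allocation sketch; the actual page-table manipulation
 * happens in kernel mode (system space). */
#include <windows.h>
#include <stdio.h>

int main(void)
{
    SIZE_T size = 4096;

    /* VirtualAlloc is the documented user-mode entry point. */
    void *p = VirtualAlloc(NULL, size, MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE);
    if (p == NULL) {
        printf("VirtualAlloc failed: %lu\n", GetLastError());
        return 1;
    }

    ((char *)p)[0] = 42;   /* first touch may fault into the memory manager */

    VirtualFree(p, 0, MEM_RELEASE);
    return 0;
}
```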
With regards to address translation, here's a summary of how it works (based on the content from Windows Internals 5th Edition).
The VMM uses page tables which the CPU uses to translate virtual addresses to physical addresses. The page tables live in the system space. Each table contains many PTEs (page table entries) which stores the physical address to which a virtual address is mapped. I won't go into too much detail here, but the point is that all of the VMM's work is performed in system space and not in user space.
As for context switching - when a thread running in user space needs to run in the system space, a context switch will occur. Since the memory manager lives in system space, its threads never need to make such a switch, since they already live in the system space.
Apologies for the simplistic explanation, this is quite a complicated topic of discussion in depth. I would highly recommend that you pick up a copy of Windows Internals as this sounds like it would come in handy for you.