First physical core to boot - linux-kernel

While reading through the kernel source code, I noticed that a mapping is created between the physical core ID and the virtual core number. This could be because there is some uncertainty in the order in which the cores are brought up.
In a multi-core system, which physical core is the first to boot? Is it always physical core #0? Does this hold for x86, x64, ARM and ARM64?

According to the Intel SDM, in recent Intel processors the selection of the bootstrap processor (BSP) is handled either "through a special system bus cycle" or "by platform-specific arrangement of the combination of hardware, BIOS, and/or configuration input options."
In my experience (with Intel processors only), the BSP always has APIC ID 0 (although this is not guaranteed). However, I don't know whether that means that it is always the same physical core within the processor, or even if there is any way to tell.
For more information, see section 8.4 of the Intel SDM, volume 3A.
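If you want to see which core your code happens to be running on, the initial APIC ID can be read from CPUID leaf 1 in user space. Below is a minimal sketch (GCC/Clang on x86/x64; pinning the thread to a particular core first, e.g. with sched_setaffinity, is assumed to be done separately if you want a per-core answer):

    /* Minimal sketch: read the initial APIC ID of the CPU this code happens
     * to run on. On the BSP this is typically 0, but as noted above that is
     * not architecturally guaranteed. */
    #include <cpuid.h>
    #include <stdio.h>

    int main(void)
    {
        unsigned int eax, ebx, ecx, edx;

        if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
            fprintf(stderr, "CPUID leaf 1 not supported\n");
            return 1;
        }
        /* For CPUID leaf 1, the initial APIC ID is in bits 31:24 of EBX. */
        printf("initial APIC ID: %u\n", (ebx >> 24) & 0xff);
        return 0;
    }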

Related

What does "PSH kernel" in Intel Edison mean? Is it the name of the primary bootloader present inside ROM?

I was going through the logs after booting up the Intel Edison and came across the term. Is it the name of the BIOS? Does it do some security verification, like key matching/checking?
The Intel Edison board, or more precisely the Intel Tangier SoC, has a Minute IA (i486+, also known as the Pentium ISA microarchitecture) based MCU (the Intel Quark D2000 SoC has one as well, as far as I know) which is part of the so-called Platform Services Hub (PSH). The PSH has its own page cache (to hold the RTOS and its applications) and LAPIC. Peripherals such as DMA and I2C are shared with the System Controller Unit (SCU), and the SCU actually controls the PSH.
When the system starts, the MCU boots first. It runs a Viper RTOS with some modifications, e.g. a library to support sensors.
There is no information available from Intel regarding the use of an open-source RTOS, such as Zephyr, on the PSH.

ARM big.LITTLE Performance Counters

I have recently purchased a development board using the Samsung Exynos 5422 application processor (a quad-core Cortex-A15 at 2.0GHz plus a quad-core Cortex-A7). I have tried to extract the performance counters on Android using perf v3.0.8; however, none of the counters outputs a value (they are all "not counted"). Does anyone know how to solve this issue?
(The kernel version is 3.10.9)
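I can't say what is wrong on that specific board, but one way to narrow it down is to bypass the perf tool and ask the kernel directly via the perf_event_open(2) syscall. Here is a minimal sketch (Linux only; the busy loop is just a placeholder workload). If this also fails or reads zero, the kernel's PMU support is the likely problem rather than the perf binary:

    #define _GNU_SOURCE
    #include <linux/perf_event.h>
    #include <sys/syscall.h>
    #include <sys/ioctl.h>
    #include <string.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        struct perf_event_attr attr;
        long long count;
        int fd;

        memset(&attr, 0, sizeof(attr));
        attr.type = PERF_TYPE_HARDWARE;
        attr.size = sizeof(attr);
        attr.config = PERF_COUNT_HW_CPU_CYCLES;
        attr.disabled = 1;
        attr.exclude_kernel = 1;

        /* pid = 0, cpu = -1: measure this thread on whatever CPU it runs on. */
        fd = syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
        if (fd < 0) { perror("perf_event_open"); return 1; }

        ioctl(fd, PERF_EVENT_IOC_RESET, 0);
        ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
        for (volatile int i = 0; i < 1000000; i++) ;  /* placeholder work */
        ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

        if (read(fd, &count, sizeof(count)) == sizeof(count))
            printf("cycles: %lld\n", count);
        close(fd);
        return 0;
    }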

How can a 4GB process run on only 2 GB RAM?

Given a 32-bit/64-bit processor, can a 4GB process run on 2GB of RAM? Will it use virtual memory, or will it not run at all?
This is HIGHLY platform-dependent. On many 32-bit OSes, no single process can ever use more than 2GB of memory, regardless of the physical memory installed or virtual memory allocated.
For example, my work computers use 32-bit Linux with PAE (Physical Address Extension) to allow up to 16GB of RAM to be installed. The 2GB-per-process limit still applies, however; having the extra RAM simply allows me to run more individual processes. 32-bit Windows works the same way.
64-bit OSes are more of a mixed bag. 64-bit Linux will allow individual processes to map memory well in excess of 32GB (but again, this varies from kernel to kernel); you will be limited only by the amount of swap (Linux virtual memory) you have. 64-bit Windows is a complete crap shoot: certain versions allow only 2GB per process, but most allow more than 32GB, limited only by the amount of page file the user has allocated.
Microsoft provides a useful table breaking down the various memory limits across OS versions/editions. Unfortunately, there is no such table that I can find with a cursory search for Linux, since it is so fragmented.
Short answer: Depends on the system.
Most 32-bit systems have a limit of 2GB per process. If your system allows more than 2GB per process, then we can move on to the next part of your question.
Most modern systems use virtual memory. Yet there are some constrained (and often old) systems that would just run out of space and make you cry. I believe uClinux supports both MMU and MMU-less architectures. Most 32-bit processors have an MMU (a few don't; see the ARM Cortex-M0), and a handful of 16-bit or 8-bit parts have one as well (see the Atmel ATtiny13A-MMU and Atari MMU).
Any process that needs more memory than is physically available will require some form of memory swap (e.g., a swap partition or file).
Virtual memory is divided into pages. At any given time, a page resides either in RAM or in swap. Any attempt to access a page that is not loaded in RAM triggers a page fault, which is handled by the kernel.
A 64-bit process needing 4GB on a 64-bit OS can generally run in 2GB of physical RAM, by using virtual memory, assuming disk swap space is available, but performance will be severely impacted if all of that memory is frequently accessed.
In practice, a 32-bit process can't address a full 4GB of memory (some of the address space is reserved by the operating system), so a process that needs all 4GB won't run. Depending on the OS, a process that needs more than 2GB and less than 3-4GB can probably run.
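To make the paging mechanism above concrete, here is a minimal sketch for 64-bit Linux. It assumes enough swap is configured; with 2GB of RAM and no swap, touching every page would eventually invoke the OOM killer:

    /* Reserve 4 GiB of *virtual* address space. The mmap itself succeeds
     * even with 2GB of physical RAM; physical pages (or swap) are only
     * committed as each page is first touched and faults in. */
    #include <stdio.h>
    #include <sys/mman.h>

    int main(void)
    {
        size_t len = 4ULL << 30; /* 4 GiB */
        char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) { perror("mmap"); return 1; }

        /* Touch one byte per 4 KiB page; each first touch raises a page
         * fault, and the kernel backs the page with RAM or, under memory
         * pressure, evicts other pages to swap. */
        for (size_t off = 0; off < len; off += 4096)
            p[off] = 1;

        puts("touched 4 GiB of virtual memory");
        munmap(p, len);
        return 0;
    }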

Which Virtual-memory translation technique (consider x86) is used generally?

It is known that there are different kinds of virtual-address translation techniques (on x86) implemented with the help of the MMU, such as segmentation, paging, and combined segmentation-paging (paged segmentation, segmented paging), each having its own advantages and disadvantages.
My questions:
1) Do general-purpose operating systems like Linux/Windows use only one particular technique (like paging)?
2) If two or more techniques are available on a given OS, when and where is each one used? Can we customize this according to our needs?
3) If only paging is used, in what ways are x86's segment registers used?
In general, modern operating systems on x86 use paging and not segmentation. This means the base address of each segment register is set to zero and the segment limit is set to the maximum. Paging is used to map virtual addresses to physical addresses; this gives the OS fine-grained control over the address space of a process, protection between processes, and protection between privileged (kernel) and user address space. Segments are still used on x86 for special purposes:
to run legacy operating systems and applications in a virtual environment
to efficiently access thread-local storage for each thread in a multithreaded application (with thanks to Paul A. Clayton for pointing this out); see the sketch after the Windows 95 notes below.
Microsoft Windows switched from segmentation to a flat, linear memory model with Windows 95.
Here's the relevant description from http://technet.microsoft.com/en-us/library/cc751120.aspx:
"Windows 95 addresses this issue by using the 32-bit capabilities of the 80386 (and above) processor architecture to support a flat, linear memory model for 32-bit operating system functionality and Win32-based applications. A linear addressing model simplifies the development process for application vendors, and removes the performance penalties imposed by the segmented memory architecture."
In order to run old Win16 applications (Windows 3.1), Windows 95 ran a 16-bit virtual machine in which all Win16 applications ran. Newer, 32-bit applications ran in separate address spaces using the paging facility of the MMU.
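The thread-local-storage use of segment registers mentioned above is easy to observe. In the sketch below (plain C, compile with -pthread), each thread gets its own copy of the __thread variable; on x86-64 Linux, GCC and Clang compile accesses to it into loads and stores relative to the FS segment base (visible as %fs:-prefixed addresses in a disassembly):

    #include <pthread.h>
    #include <stdio.h>

    static __thread int tls_counter; /* one independent instance per thread */

    static void *worker(void *arg)
    {
        tls_counter += (int)(long)arg;  /* compiled as an FS-relative access */
        printf("thread %ld sees tls_counter = %d\n", (long)arg, tls_counter);
        return NULL;
    }

    int main(void)
    {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, worker, (void *)1);
        pthread_create(&t2, NULL, worker, (void *)2);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        return 0;
    }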

Detecting HyperThreading without CPUID?

I'm working on a number-crunching application and I'm trying to squeeze all possible performance out of it that I can. I'm designing it to work for both Windows and *nix and even for multi-CPU machines.
The way I have it currently set up, it asks the OS how many cores there are, sets affinity to each core in turn, and runs a function that executes the CPUID instruction (yes, it'll get run multiple times on the same CPU; no biggie, it's just initialization code) and checks for HyperThreading in the feature flags CPUID returns. From the CPUID responses it calculates how many threads it should run; if a core/CPU supports HyperThreading, it will spawn two threads on that core.
However, I ran into an edge case with my own machine. I run an HP laptop with a Core 2 Duo. A while back I replaced the factory processor with a better Core 2 Duo that supports HyperThreading, but the BIOS does not support it, since the factory processor didn't. So even though the CPU reports that it has HyperThreading, it's not capable of using it.
I'm aware that in Windows you can detect HyperThreading by simply counting the logical cores (as each physical HyperThreading-enabled core is split into two logical cores). However, I'm not sure if such a thing is available in *nix (particularly Linux; my test bed).
If HyperThreading is enabled on a dual-core processor, will the Linux function sysconf(_SC_NPROCESSORS_CONF) show that there are four processors or just two?
If I can get a reliable count on both systems, then I can simply skip the CPUID-based HyperThreading check (after all, it may be disabled or unavailable in the BIOS) and use what the OS reports, but unfortunately, because of my edge case, I can't determine this.
P.S.: In my Windows section of the code I am parsing the return of GetLogicalProcessorInformation()
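To answer the sysconf question above: _SC_NPROCESSORS_CONF counts logical processors, so on a dual-core with HyperThreading enabled it reports 4, and only 2 when the BIOS has it disabled, which is exactly the OS-reported count you want. On Linux you can also cross-check against the SMT sibling lists in sysfs. A minimal sketch (the sysfs path is standard, but its presence depends on the kernel version):

    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        long configured = sysconf(_SC_NPROCESSORS_CONF);
        long online     = sysconf(_SC_NPROCESSORS_ONLN);
        char siblings[64] = "(unavailable)\n";

        /* Logical CPUs sharing cpu0's physical core: e.g. "0" without SMT,
         * "0,4" or "0-1" with SMT enabled. */
        FILE *f = fopen("/sys/devices/system/cpu/cpu0/topology/thread_siblings_list", "r");
        if (f) {
            if (!fgets(siblings, sizeof(siblings), f))
                snprintf(siblings, sizeof(siblings), "(read error)\n");
            fclose(f);
        }

        printf("logical CPUs configured: %ld, online: %ld\n", configured, online);
        printf("cpu0 thread siblings: %s", siblings);
        return 0;
    }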
Bonus points: Anybody know how to mod a BIOS so I can actually HyperThread my CPU ;)? Motherboard is an HP 578129-001 with the AMD M96 chipset (yuck).
