I've come accross a feature of modern Intel CPUs called MSR 0x2E that is somehow involved in power management functions. I figured that MSR stands for Model Specific Register, and I guess this MSR has been there at least since the Sandy Bridge generation, but possible much longer.
Now, many mainboards since 2012 lock this register at boot, so that any attempts to write to it result in a GPF. This is a problem for a certain popular OS (that shall not be mentioned here), as it cannot be installed on those mainboards without modifying either the board or the OS.
Out of curiosity, what does MSR 0x2E do? And why do most mainboard (MSI, Asrock, Foxconn, ...) vendors lock it?
(For security reasons? Are they selling a premium unlocked version (they do this with VT-X/VT-D which is unavailable on some 'consumer' boards)? Do they have an agreement with Apple or Microsoft? etc...)
Related
I have an ARM Cortex-A53 based embedded system which has 4 cores. It is not implemented with ARM TrustZone.
Is it possible to run the following OSs simultaneously?
Core0:Some type of RTOS
Core1:Some type of RTOS
Core2 and Core3: Linux
All of them use some shared memory space to exchange data.
Boot sequences until loading image(monolithic RTOS and Linux kernel) into DDR are processed by external chip.
Do I need to use a hypervisor, or just treat all cores as independent logical CPUs?
I am not familiar with ARMv8, should I pay additional attentions in setting MMU, GIC, etc. in my case?
That's a very-very vague question, so answer gonna be the same sort.
That's how ARMv8 looks like.
Is it possible to run the following OSs simultaneously?
Yes, there should not be restrictions for that.
All of them use some shared memory space to exchange data.
Yes, you could map same region of physical memory to all of them. How to sync access to that shared memory from different OSs (eg isolated from each other environments) is more important question though.
Boot sequences until loading image(monolithic RTOS and Linux kernel)
into DDR are processed by external chip.
For sure you should have an image of OS in memory before passing control to Kernel entry. So should be done from EL3 or EL2.
Do I need to use a hypervisor, or just treat all cores as independent
logical CPUs?
Yes, you do need hypervisor. That's probably the best way to organise interaction between different OSs.
should I pay additional attentions in setting MMU, GIC, etc. in my
case?
There are MMU for each EL. So MMU-EL0 are totally independent. MMU-EL1 (OS/Kernel) to organise interaction between App in same OS. MMU-EL2 (hypervisor) to organise interaction between different OS. But all in all probably not something special.
GIC, that's depends on how you are gonna organise interrupts. It's possible to route interrupts to all cores or only particular one. Use them to change EL and select which OS is gonna to handle it. So yes, GIC might need quite an attention.
I was reading a book which describe a historical perspective:
Pentium 4E (2004, 125 M transistors). Added hyperthreading, a method to run two programs simultaneously on a single processor, as well as EM64T, Intel’s implementation of a 64-bit extension to IA32 developed by Advanced Micro Devices (AMD), which we refer to as x86-64
I'm a little bit confused here,here is my two questions:
Q1-does it mean that x86-64 is just an alias name of EM64T?
Q2- And is IA32 developed by AMD? isn't IA32 designed by Intel and first implemented in the 80386 microprocessor in 1985? https://en.wikipedia.org/wiki/IA-32
AMD first named its (original) 64-bit ISA version x86-64. Intel later named its (mostly compatible) version EMT64. See here at Intel:
x64 is a generic name for the 64-bit extensions to Intel's and AMD's 32-bit x86 instruction set architecture (ISA). AMD introduced the first version of x64, initially called x86-64 and later renamed AMD64. Intel named their implementation IA-32e and then EMT64. There are some slight incompatibilities between the two versions, but most code works fine on both versions; details can be found in the Intel® 64 and IA-32 Architectures Software Developer's Manuals and the AMD64 Architecture Tech Docs. We call this intersection flavor x64. Neither is to be confused with the 64-bit Intel® Itanium® architecture, which is called IA-64.
So x64 can be considered standard nowadays.
Relating to your second question: Your assumptions are correct. Intel developed the IA32 ISA and AMD then licensed it with complicated contracts.
Q1. x86-64 is a general name for both Intel's and AMD's implementation. AMD's implementation is also called AMD64, Intel's implementation is also called EMT64.
Q2. Yes. But AMD was the first to make 64-bit implementation of it. Intel's IA64 was different, it was not 64-bit implementation of IA32.
I have a PCIe generated core / endpoint with the xilinx core generator tool for a spartan6 fpga on a development board which I have modified a bit to enable MSI and send these every couple of seconds.
Also, I did a simple C kernel module on my linux desktop in which I plugged in the development board. This registers device, allocates memory, enables bus mastership for device and handles the interrupts etc.
What I want to do now is some DMA transfer from the board to the PC, and then will send an interrupt when finished, so that the cpu can go and read it. I'm not a Verilog expert, and the code I have doesn't seem to be capable of any DMA functions.
I couldn't find any relevant information online, so this is my last hope.
Original text from comment above:
Have you implemented a transaction layer above the generated PCIe core? Why don't you use a free PCIe core if your HDL skills are not so high? PCIe is a very big thing....
Yes, the Xilinx IPCore generator adds a very simple PIO interface ontop of the link layer to handle simple PIO transactioons. Note: PIO transaction are outdated and not allowed for new devices.
Currently I know two rather good IPCores:
XILLYBUS
free educational license
create the IPCore for your FPGA device online and download a netlist
free linux and windows drivers (the linux driver will be included in the standard kernel)
8-bit and 32-bit FIFO interface and a memory interface
linux-driver mapps FPGA to /dev/xillybus_read /dev/xillybus_write devices
RIFFA
I'm not sure if this core is still maintained
free driver
it has a strange interface with up to 12 FIFO channels
free HDL sources
All these cores require the Xilinx Core Generator to generate a PCIe core for your device/board. The core itself provides transaction handling, ...
I am very curious in messing up with HW. But my top level "messing" so far was linked or inline assembler in C program. If my understanding of CPU and ring mode is right, I cannot directly from user mode app access some low level CPU features, like disabling interrupts, or changing protected mode segments, so I must use system calls to do everything I want.
But, if I am right, drivers can run in ring mode 0. I actually don´t know much about drivers, but this is what I ask for. I just want to know, is learning how to write your own drivers and than call them the way I should go, to do what I wrote?
I know I could write whole new OS (at least to some point), but what I exactly want to do is acessing some low level features of HW from standart windows application. So, is driver the way to go?
Short answer: yes.
Long answer: Managing access to low-level hardware features is exactly the job of the OS kernel and if you only want access to a single feature there's no need to start your own OS from scratch. Most modern OSes, such as WIndows, Linux, or the BSDs, allow you to add code to the kernel through kernel modules.
When writing a kernel module (or device driver), you write code that is going to be executed inside the OS kernel and will thus be running in CPU ring 0. Great power comes with great responsibility, which in this case means that you should really know what you're doing as there will be no pre-configured OS interface to prevent you from doing the wrong things. You should therefore study the manuals of your hardware (e.g., Intel's x86 software developer's manuals, device specs, ...) as well as standard operating systems development literature (where you're also going to find plenty on the web -- OSDev, OSDever, OSR, Linux Device Drivers).
If you want to play with HW write some programs for 16-bit real-mode (or even with your own transition to protected-mode). There you have to deal with ASM, BIOS interrupts, segments, video memory and a lot of other low-level stuff.
Is there any small tool that gives me access to the data gathered by the Intel CPU Counters (like L1/L2 cache misses, branch prediction failures ... you know there are hunderts of them on modern Core2 CPU's).
It must work on Windows (while being able to use it with Solaris, FreeBSD, Linux, MacOSX would of course be nice).
Check out the Intel PCM (Performance Counter Monitor) tool which does exactly what you want to do.
Link: https://software.intel.com/en-us/articles/intel-performance-counter-monitor-a-better-way-to-measure-cpu-utilization
Intel PCM provides a rich API that allows you to instrument your code. Furthermore, to date, PCM is the only tool to read uncore events too.
This thread seems a little old but if you're still interested, I wrote a howto recently on this topic using nothing more than rdmsr and wrmsr in Linux. It only deals with the performance counters on an Intel uncore for Westmere, but the process I described might help you figure out what you need if you haven't already. I'm sure Windows has some equivalent program or function call to RDMSR and WRMSR. The problem is you need to be ring 0 (kernel mode) to read MSRs. I have no idea how to do that in Windows. I won't be able to help with any Windows questions but may be able to answer some MSR-related questions if you have any. I'm by no means an expert though.
PAPI is a very promising lead, however, I believe they discontinued support for Windows (and therefore .NET C#) quite a few years ago.
On the windows front, Visual Studio 2010 Premium comes with performance explorer. If you run any project or binary in instrumentation mode, you can get access to hardware events such as instructions retired.
The results can be somewhat mixed and inconsistent depending external factors, but it integrates with Visual Studio nicely and you get detailed counts (avg, maximum, total) on a per method/module level.
Intel V-tune performance analyzer also exposes these natively. I haven't played with this tool yet but it might be a more flexible API than what Visual Studio 2010 exposes.
You didn't write of your are looking for a application or for a library.
For Windows there is Intel VTune. But this not exactly an small tool. For linux I have used oprofile, which works without kernel patches.
On OS X, Shark lets you get data from the PMCs. I'm not sure what's available on Windows other than Intel's tools (VTune, as mentioned by drhirsch).
Try this
http://icl.cs.utk.edu/papi/
It is a full library that allows you to read any CPU counters data, works both on Windows and Linux [and other OS]
This thread looks pretty old. But still, all the above mentioned counters are available at Intel PCM .These counters can be used as a Microsoft Perfmon plugin or a command prompt interface. The Intel PCM gives informations like L2 and L3 cache hit ratio, cache misses etc.