How are memory mapped devices allocated an address, and how does the CPU know what it is?

I gather that the main ways a CPU addresses devices are "port-mapped" and "memory-mapped" I/O.
In both of these:
How are devices dynamically assigned an address - who assigns it and how?
How does the CPU (and, in particular, its running programs) then know that a device exists, that it has been assigned an address, and what that address is? (How does this work both when the computer is already on and when it is starting up from off?)
How do interrupts work with these devices?
What's the distinction between what the OS and the hardware does?
Is it fair to say that Memory Mapped is the dominant approach in modern systems?
Realise this might be a lot in one go but thanks in advance!

In general, the CPU does not know that a specific address is memory mapped.
It's software's responsibility (mainly BIOS/drivers) to define the address range as uncacheable (so each read/write goes through to the device instead of being held internally until write-back). Outside the core there is some mapping that redirects specific addresses to a device rather than to the DDR (memory).
Short answers to some of your bullets (I'm not sure I understand all the questions):
How are devices dynamically assigned an address - who assigns it and how?
The BIOS defines such ranges (the driver communicates a new device to the BIOS; the BIOS reserves some addresses for plug-and-play devices).
How does the CPU (and, in particular, its running programs) then know that a device exists, that it has been assigned an address, and what that address is? (How does this work both when the computer is already on and when it is starting up from off?)
The CPU doesn't know that; these addresses are treated as normal uncacheable addresses.
Is it fair to say that Memory Mapped is the dominant approach in modern systems?
Yes, it's easier to treat it as just another place in memory (it's also a bit faster).
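
To make "just another place in memory" concrete, here is a minimal bare-metal sketch in C. The device, its base address and its register layout are all made up for illustration; volatile (together with the range being mapped uncacheable) is what guarantees each access really reaches the device instead of being optimized away or served from a cache:

#include <stdint.h>

#define UART_BASE   0x10000000u                               /* hypothetical device base address */
#define UART_STATUS (*(volatile uint32_t *)(UART_BASE + 0x0)) /* hypothetical status register */
#define UART_DATA   (*(volatile uint32_t *)(UART_BASE + 0x4)) /* hypothetical data register */
#define TX_READY    (1u << 0)

static void uart_putc(char c)
{
    while (!(UART_STATUS & TX_READY))   /* an ordinary load; the interconnect routes it to the device */
        ;
    UART_DATA = (uint8_t)c;             /* an ordinary store, likewise routed to the device */
}

The compiler emits plain load/store instructions for these accesses; nothing in the CPU core itself marks them as I/O.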

Related

Mapping device memory into user process address space

While I was going through chapter 15 of the LDD3 book (memory mapping and DMA), the introduction of the mmap call says:
mmap() system call allows mapping of device memory directly into user process address space.
The confusion is regarding the address space. Why would device memory be mapped to user space, since the kernel alone takes care of the device? Why would one need to map it into user space at all? And if the device memory is mapped into user space, why does the kernel still manage it? Couldn't the device be used erroneously if it lies in the user address space?
Please correct me if I am wrong; I am just new to this.
Thanks
The same chapter you are referring to has the answer to your question.
A definitive example of mmap usage can be seen by looking at a subset of the virtual memory areas for the X Window System server. Whenever the program reads or writes in the assigned address range, it is actually accessing the device. In the X server example, using mmap allows quick and easy access to the video card’s memory. For a performance-critical application like this, direct access makes a large difference.
...
Another typical example is a program controlling a PCI device. Most PCI peripherals map their control registers to a memory address, and a high-performance application might prefer to have direct access to the registers instead of repeatedly having to call ioctl to get its work done.
But you are correct that usually kernel drivers handle devices without revealing device memory to user space:
As you might suspect, not every device lends itself to the mmap abstraction; it makes no sense, for instance, for serial ports and other stream-oriented devices. Another limitation of mmap is that mapping is PAGE_SIZE grained.
In the end, it all depends on how you want your device to be used from user space:
which interfaces you want to provide from driver to user space
what the performance requirements are
Usually you hide device memory from the user, but sometimes you need to give the user direct access to device memory (when the alternative is bad performance or an ugly interface). Only you, as an engineer, can decide which way is best in each particular case.
There are a few usages I can think of:
user-mode drivers - in this case the kernel driver is only a stub: it maps memory to user space, passes interrupts, etc. (this is common for proprietary drivers); see the sketch below.
a user-space application fills or reads DMA buffers directly, to avoid copying them between user space and kernel space.
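
As a concrete illustration, here is a minimal user-space sketch, assuming a hypothetical character device /dev/mydev whose driver implements mmap() (for example via remap_pfn_range) and exposes one page of registers at offset 0; the register meanings are invented:

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/dev/mydev", O_RDWR);        /* hypothetical device node */
    if (fd < 0) { perror("open"); return 1; }

    size_t len = 4096;                          /* mappings are PAGE_SIZE grained */
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }

    volatile uint32_t *regs = p;
    regs[0] = 0x1;                              /* write a (hypothetical) control register directly */
    printf("status = 0x%x\n", (unsigned)regs[1]); /* read a (hypothetical) status register */

    munmap(p, len);
    close(fd);
    return 0;
}

Every access to regs[] after the mmap() goes straight to the device with no system call in between, which is exactly the performance argument LDD3 makes for the X server case.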
Regards,
Mateusz.

How does the CPU know where to look for a given physical memory address?

If I understand correctly, whenever a CPU is turned on it jumps to a manufacturer hardwired default physical memory address and starts executing the binary code present there. Naturally in a PC the default location maps to the ROM containing the BIOS.
Suppose I have a system with RAM installed as well; how does the CPU know that it is supposed to look for that specific address in the ROM chip containing the BIOS and not in the RAM?
Do the manufacturers of the Motherboard and the RAM have some standard or contract where they agree that the memory addresses of their hardware will never overlap?
I think you will see a diagram like this very often (see the link below):
It summarizes several things (I assume you are familiar with the definitions of "physical address" and "virtual address"):
Programs, and the wiring inside the CPU core, always communicate and exchange "virtual addresses" among themselves; at that level you never encounter any "physical addresses".
To address any DRAM outside the CPU, you will need "physical addresses".
The same "virtual address" can map to different "physical addresses". E.g., the following instruction:
mov eax, [virtual_address_XXXX] ; load from memory into EAX
The same assembly instruction, running under different processes, will access different parts of physical memory (this is done through the page table + MMU).
Translation from virtual to physical addresses needs the MMU. Any electrical signal seen outside the CPU is always at the physical-address level, so all hardware devices (e.g., memory) have to work with physical addresses.
https://www.slideserve.com/stacie/computer-architecture-memory-management-units
So to start off, the question in your title is not really correct - the CPU does not know or see any "physical addresses"; the hardware devices do. Everything gets translated by the MMU (or IOMMU), and the translations can be cached by the TLB.
Note that some CPUs do not have an MMU; there the "physical address" is simply the same value as the "virtual address".
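
A toy model of what the MMU does on every access may help: split the virtual address into a page number and an offset, look the page number up in a per-process table, and glue the resulting frame number back onto the offset. Real x86-64 hardware walks a four- or five-level tree of tables, but the idea is the same; this is only a sketch:

#include <stdint.h>

#define PAGE_SHIFT 12                           /* 4 KiB pages */
#define PAGE_SIZE  (1u << PAGE_SHIFT)
#define NUM_PAGES  1024                         /* toy single-level page table */

typedef struct {
    uint32_t frame;                             /* physical frame number */
    int      present;                           /* is this virtual page mapped? */
} pte_t;

/* Translate one virtual address; returns 0 where real hardware would raise a page fault. */
static int translate(const pte_t table[NUM_PAGES], uint32_t vaddr, uint32_t *paddr)
{
    uint32_t vpn    = vaddr >> PAGE_SHIFT;      /* virtual page number */
    uint32_t offset = vaddr & (PAGE_SIZE - 1);  /* byte offset within the page */

    if (vpn >= NUM_PAGES || !table[vpn].present)
        return 0;

    *paddr = (table[vpn].frame << PAGE_SHIFT) | offset;
    return 1;
}

Each process gets its own table, which is why the same virtual address can land in different physical pages for different processes; the TLB is just a cache of recent translations.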
Let us take it step by step in a quick way:
When you reset your PC, it actually runs the code from the BIOS.
The BIOS code is written by the motherboard manufacturer to bring up the board. After the BIOS finishes its job, the MBR (Master Boot Record) runs; it is a piece of code written at the very start of your HDD (sector 0) when you installed your operating system, say Windows or Linux.
This piece of code, called the bootloader, is responsible for jumping to your Windows (or Linux) partition to start it. So:
BIOS (non-volatile memory) -> MBR (HDD) -> OS
If you think about it, you will find a boot-sequence option inside the BIOS that selects which device's MBR should be read: HDD, DVD, etc.
Read more about BIOS:
https://en.wikipedia.org/wiki/BIOS
Read more about MBR:
https://en.wikipedia.org/wiki/Master_boot_record

How does MMU deal with Memory mapped registers?

Am I correct when I say that addresses of memory mapped registers are always physical addresses?
If yes then how does MMU deal with these addresses and decide not to do virtual to physical translations for them?
The MMU doesn't decide anything. It merely maps addresses according to what it has been told by the OS: virtual addresses to physical ones, and/or interrupts the application program if the mapping for a particular physical address is marked as "invalid" or somehow inconsistent with the operation of the current machine instruction (e.g., for instruction fetches, "not executable", for stores, "read only", etc.).
The operating system establishes a set of rules and conventions that ensure that applications cannot create grief for one another. If writing to memory-mapped I/O devices is OK for this OS, then the OS will set MMU mappings (e.g., page map registers) to allow it; otherwise it will not set MMU pages to map to I/O devices.
For most general-purpose OSes, allowing arbitrary programs to write to I/O registers is a definition of "causes grief", and they simply never set up such a mapping. This is how Windows acts from the point of view of user processes.
For special-purpose OSes, having separate processes share I/O pages may be fine, especially if the processes running are trusted (e.g., part of the OS, or vetted by some certification authority that asserts good quality). Then multiple trusted processes might share memory-mapped I/O devices safely and conveniently. Even untrusted processes can be run on such an OS; it simply doesn't give them access to I/O. (A Linux-flavoured sketch of how a kernel sets up such a mapping follows this answer.)
Back in 1972, I built a unique virtual memory 16 bit minicomputer. The MMU had two kinds of page mappings: mapping of virtual pages to physical (as you'd expect), and mapping of a page to a single 32 byte I/O device. What this means is that the OS can hand any process any device (not critical to OS function) safely.
In particular, it meant that each I/O driver had its own address space; if it screwed up, no problem. You could debug device drivers while running the OS without fear. (Windows suffered from I/O driver corruption destroying Windows for years; it still does, I think, but their quality-control "trustedness checking" is wicked strong now.)
Alas, it wasn't a commercial success. I was forced to go into software to make a living :-{
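
To make "the OS will set MMU mappings to allow it" concrete: in the Linux kernel, for instance, a driver asks for such a mapping with ioremap(), which installs an uncached page-table mapping for the device's physical register block and returns a pointer the kernel can use. This is only a sketch; the physical base address and register offsets are made up, and a real driver would take the physical range from the bus (e.g., a PCI BAR) instead of hardcoding it:

#include <linux/errno.h>
#include <linux/io.h>

#define DEV_PHYS_BASE 0xfed40000UL              /* hypothetical MMIO base assigned by firmware */
#define DEV_REG_SIZE  0x1000

static void __iomem *regs;

static int my_dev_setup(void)
{
    regs = ioremap(DEV_PHYS_BASE, DEV_REG_SIZE);   /* physical -> kernel virtual, uncached */
    if (!regs)
        return -ENOMEM;

    writel(0x1, regs + 0x00);                      /* hypothetical enable register */
    return readl(regs + 0x04) ? 0 : -EIO;          /* hypothetical status register */
}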
You are correct.
All registers and/or memory locations within a processor's memory map are physical addresses.
Virtual-to-physical translations are done by the MMU and only occur within the contiguous blocks of memory from which code can be executed, i.e., RAM or internal flash. No virtual-to-physical translations occur when other parts of the memory map are accessed, because those regions do not interact with the MMU.

What happens when we plug a piece of hardware into a computer system?

When we plug a piece of hardware into a computer system, say a NIC (Network Interface Card) or a sound card, what happens under the hood so that we could use that piece of hardware?
I can think of the following 2 scenarios, correct me if I am wrong.
If the hardware has its own memory chips, someone will arrange for a range of address space to map to those memory chips.
If the hardware doesn't have its own memory chips, someone will allocate a range of addresses in the main memory of the computer system to accommodate that hardware.
I am not sure whether the aforementioned someone is the operating system or the CPU.
And another question: Does hardware always need some memory to work?
Am I right on this?
Many thanks.
The world is not that easily defined.
First off, look at the hardware and what it does. Take a mouse, for example: it delivers x and y coordinate changes and button status. That can be as little as a few bytes, or even a single byte where two bits define what the other six mean (update x, update y, update buttons, that kind of thing), and the memory requirement is just enough to hold those bytes. Take a serial mouse: there is already at least one byte of storage in the serial port, so do you need any more? USB is another story; just speaking USB back and forth takes memory for the messages, but that memory can live in the USB logic, so do you need any more for such small amounts of information?
NICs and sound cards are another category and more interesting. For NICs you have packets of data coming and going, and you need some buffer space (ring, FIFO, etc.) to allow multiple packets to be in flight in both directions, for efficiency, interrupt latency and the like. You also need registers; these have their storage in the hardware/logic itself and won't need main memory. In both the sound card case and the NIC case you can either have memory on the board with the hardware or have it use system memory that it can access semi-directly (DMA, etc.). Sound cards are similar but different in that you can think of the packets as being fixed sized and continuous. Basically you need to ping-pong buffers to or from the card at some rate: 44100 Hz, 16 bits per sample, stereo is 44100 * 2 * 2 = 176400 bytes per second. Say, for example, the driver/software prepares the next 8192 bytes at a time; while the hardware is playing the pong buffer the software fills the ping buffer, and when the hardware drains the pong buffer it signals this to the software, starts draining the ping buffer, and the software refills the pong buffer.
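A sketch of that ping-pong scheme in C; fill_audio(), submit_to_hardware() and wait_for_buffer_done() are hypothetical stand-ins for whatever interface the real driver provides:

#include <stddef.h>
#include <stdint.h>

#define BUF_BYTES 8192
static int16_t ping[BUF_BYTES / sizeof(int16_t)];
static int16_t pong[BUF_BYTES / sizeof(int16_t)];

/* Hypothetical driver interface. */
extern void fill_audio(int16_t *buf, size_t bytes);               /* software prepares the next samples */
extern void submit_to_hardware(const int16_t *buf, size_t bytes);
extern void wait_for_buffer_done(void);                           /* blocks until the hardware signals a drained buffer */

void playback_loop(void)
{
    int16_t *bufs[2] = { ping, pong };

    fill_audio(ping, BUF_BYTES); submit_to_hardware(ping, BUF_BYTES);   /* prime both buffers */
    fill_audio(pong, BUF_BYTES); submit_to_hardware(pong, BUF_BYTES);

    for (int cur = 0; ; cur ^= 1) {
        wait_for_buffer_done();                  /* hardware finished bufs[cur]... */
        fill_audio(bufs[cur], BUF_BYTES);        /* ...so refill it while the other buffer plays */
        submit_to_hardware(bufs[cur], BUF_BYTES);
    }
}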
All interesting stuff, but to get to the point: with the NIC or sound card you could have as little as two registers, an address/command register and a data register. Quite painful, but it was often used in the old days in restricted systems, and it is still used. Or you could go to the other extreme and have all of the memory on the device mapped into the system memory address space, with each register having its own unique address. With audio you don't really need random access to the memory, so you don't really need this; with graphics you do. With NIC cards you could argue either way: do you leave the packet on the NIC, or do you make a copy in system memory, where you can have a much larger software buffer/ring, freeing the hardware's limited buffer/ring? If the packets stay on the NIC you want random access; if not, you don't.
For ISA/PCI/PCIe, etc. on x86 systems, the hardware is usually mapped directly into the processor's memory space. So on a 32-bit system you can address up to 4GB, but even if you have 4GB worth of memory, some of it you cannot get to, because video cards, hardware registers, PCI, etc. consume some of that address space (registers or memory or both, whatever the hardware was designed to use). As distasteful as it may appear today, this is why there was a distinction between I/O-mapped I/O and memory-mapped I/O on x86 systems; it's another address bit, if you will. You could have all of your registers in I/O space and not lose memory space, and map memory into nice neat aligned chunks, requiring less of your RAM to be shadowed by hardware. Either way, ISA had basically vendor-specific ways of mapping into the memory space available to the ISA bus: jumpers, interesting detection schemes with programmable address decoders, etc. PCI and its successors came up with something more standard. When the computer boots (talking about x86 machines in general now), the BIOS goes out on the PCIe bus and looks to see who is out there by talking to configuration space, which is mapped per card in a known place. Using a known protocol, the cards indicate the amount of memory they require; the BIOS then allocates chunks of the processor's flat memory space for each device and tells the device what address and how much it has been allocated. It is certainly possible for the operating system to re-do or override this, but typically the BIOS does this discovery for the system and the operating system simply reads the config space on each device, which includes the vendor ID and device ID, and then knows how and where to talk to the device. For this memory space I believe the hardware contains the memory/registers. For general system memory to DMA to/from, I believe the operating system and device drivers have to provide the mechanism for allocating that system memory and then telling the hardware what address to DMA to/from.
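For the curious, this is roughly what that config-space probing looks like from software, using the legacy x86 "mechanism #1" ports 0xCF8/0xCFC. It is only a sketch: it needs root and iopl(3) on an x86 Linux host, and modern firmware/OSes normally use the memory-mapped ECAM region instead of these ports:

#include <stdio.h>
#include <stdint.h>
#include <sys/io.h>

static uint32_t pci_cfg_read32(uint8_t bus, uint8_t dev, uint8_t fn, uint8_t reg)
{
    uint32_t addr = (1u << 31)                   /* enable bit */
                  | ((uint32_t)bus << 16)
                  | ((uint32_t)dev << 11)
                  | ((uint32_t)fn  << 8)
                  | (reg & 0xFC);
    outl(addr, 0xCF8);                           /* CONFIG_ADDRESS */
    return inl(0xCFC);                           /* CONFIG_DATA */
}

int main(void)
{
    if (iopl(3)) { perror("iopl"); return 1; }

    uint32_t id = pci_cfg_read32(0, 0, 0, 0x00); /* bus 0, device 0, function 0 */
    if (id != 0xFFFFFFFFu)                       /* all ones means "nothing here" */
        printf("vendor %04x device %04x\n", id & 0xFFFF, id >> 16);

    uint32_t bar0 = pci_cfg_read32(0, 0, 0, 0x10);  /* BAR0: where firmware mapped the device */
    printf("BAR0 = 0x%08x\n", bar0);
    return 0;
}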
The x86 way of doing it, with the BIOS handling the ugly details and with system memory address space and PCI address space being the same address space, has its pros and cons. A pro is that the hardware can easily DMA to/from system memory, because it does not have to know how to get from PCIe address space to system address space. The negative is the 32-bit case, where PCIe normally consumes up to 1GB of address space and the DRAM you bought that sits in that hole is not available. The transition from 32-bit to 64-bit is slow and painful; BIOSes and PCIe chips still limit themselves to the lower 4GB, and to 1GB for all the PCIe devices, even if the chipset has a 64-bit mode, and this with 64-bit processors and more than 4GB of RAM. The MMU allows for fragmented memory, so that is not an issue. Slowly the chipsets and BIOSes are catching up, but it is taking time.
USB: these are serial, mostly master/slave protocols. Like a serial port, but bigger, faster and more complicated; and like a serial port, both the master and slave hardware need RAM to store the messages, very much like a NIC. Like a NIC, in theory, you can be register-based and pull the memory out sequentially, or have it mapped into system memory and get random access to it, etc. Think of it this way: the USB interface can (and does) sit on a PCIe interface even if it is on the motherboard. A number of devices are PCIe devices on your motherboard even if they have no actual PCIe connector with a card, and they fall into the PCIe category for how you might design your interface or who has what memory where.
Some devices, like video cards, have lots of memory on board, more than is practical (or at least is painful) to map into PCIe memory space all at once. These would want a sliding-window type arrangement: tell the video card you want to look at address 0x0000 in the video card's address space, but your window may only be 0x1000 bytes (for example) in system/PCIe space. When you want to look at addresses 0x1000 to 0x1FFF in video memory space, you write some register to move the window, and then the same PCIe memory addresses access different memory on the video card.
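A bare-metal flavoured sketch of that sliding window; both the aperture and the bank-select register below are invented for illustration:

#include <stdint.h>

#define APERTURE_BASE ((volatile uint8_t *)0xD0000000u)   /* hypothetical 0x1000-byte window in PCIe space */
#define APERTURE_SIZE 0x1000u
#define WINDOW_REG    (*(volatile uint32_t *)0xD0100000u) /* hypothetical bank-select register */

/* Read one byte from an arbitrary offset in the card's (much larger) video memory. */
static uint8_t vram_read(uint32_t vram_offset)
{
    WINDOW_REG = vram_offset / APERTURE_SIZE;             /* slide the window to the right bank */
    return APERTURE_BASE[vram_offset % APERTURE_SIZE];    /* same host addresses, different video memory */
}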
x86, being the dominant architecture, has this overlapped PCIe and system memory addressing, but that is not how the whole world works. Other solutions include having independent system and PCIe address spaces, with sliding windows like the video card case above, allowing you to have, say, a 2GB video card mapped flat in PCIe space while limiting the window from the host into PCIe space to something not painful for the host system.
Hardware designs are as varied as software designs. Take 100 software engineers, give them a specification, and you may get as many as 100 different solutions. Same with hardware: give them a specification and you may get 100 different PCIe designs. Some standards are in place to limit that, as is cloning (if you want to make a Sound Blaster-compatible card, you don't change the interface), but given the same freedom software has, the hardware can and will vary, and with the number of types of PCIe devices (sound, hard disk controllers, video, USB, networking, etc.) you will get that many different mixes of registers and addressable memory.
Sorry for the long answer; hope this helps. I would dig through Linux and/or BSD sources for device drivers, along with programmers' reference manuals if you can get access to them, and see how different hardware designs use register and memory space - which designs are painful for the software folks and which are elegant and well done.
The answer depends on what the interface of the hardware is: is it over USB or PCI Express? (There could be other connectivity methods too; USB and PCI Express are the most common.)
With USB
The host learns about the newly arrived device by reading its descriptors and loads the appropriate device driver. The device presents an ID that is used for plug-and-play matching, and the host also assigns the device an address. Once the device driver kicks in, it configures the device and makes it ready for data transfer. The data transfer is done using IRPs; the transfer technique and how the IRPs are loaded depend on whether the transfer is isochronous, bulk or another mode.
So, to answer your second question: yes, the hardware needs some memory to work. The device driver and the USB host controller driver together set up the memory on the host for the USB device; the USB device driver then communicates with/drives the device accordingly.
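For a host-side view of the enumeration result, here is a small sketch using libusb (a user-space library, so it reports what the host controller driver has already enumerated): it walks the device list and prints the vendor/product IDs that plug-and-play driver matching is based on.

#include <stdio.h>
#include <sys/types.h>
#include <libusb-1.0/libusb.h>

int main(void)
{
    libusb_device **list;

    if (libusb_init(NULL) != 0)
        return 1;

    ssize_t n = libusb_get_device_list(NULL, &list);      /* snapshot of enumerated devices */
    for (ssize_t i = 0; i < n; i++) {
        struct libusb_device_descriptor desc;
        if (libusb_get_device_descriptor(list[i], &desc) == 0)
            printf("bus %u addr %u: %04x:%04x\n",
                   (unsigned)libusb_get_bus_number(list[i]),
                   (unsigned)libusb_get_device_address(list[i]),  /* the address the host assigned */
                   (unsigned)desc.idVendor, (unsigned)desc.idProduct);
    }
    libusb_free_device_list(list, 1);
    libusb_exit(NULL);
    return 0;
}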
With PCI-Express
It is similar - sorry, I do not have hands-on experience with PCI Express.

Difference between physical addressing and virtual addressing concept

This is a re-submission, because I am not getting any response from superuser.com. Sorry for the misunderstanding.
I need to know the difference between physical addressing and virtual addressing concept in embedded systems.
Why is the virtual addressing concept implemented in embedded systems?
What is the advantage of virtual addressing over a system that uses physical addressing in embedded systems?
How is the mapping from virtual addresses to physical addresses done in embedded systems?
Please explain the above concepts with some simple examples on a simple architecture.
Physical addressing means that your program actually knows the real layout of RAM. When you access a variable at address 0x8746b3, that's where it's really stored in the physical RAM chips.
With virtual addressing, every application memory access goes through a page table, which maps the virtual address to a physical one. So every application has its own "private" address space, and no program can read or write another program's memory (the mechanism described here is paging; segmentation is a different, older scheme).
Virtual addressing has many benefits. It protects programs from crashing each other through poor pointer manipulation, etc. Because each program has its own distinct virtual memory set, no program can read another's data - this is both a safety and a security plus. Virtual memory also enables paging to disk, where a program's physical RAM may be written out to a disk (or, now, slower flash) when not in use and brought back when the program next touches the page. Also, since only one program may occupy a particular physical page at a time, in a physically addressed system either a) all programs must be compiled to load at different memory addresses, b) every program must use position-independent code, or c) some sets of programs cannot run simultaneously.
The physical-virtual mapping may be done in software (with hardware support for memory traps) or in pure hardware. Sometimes even the page tables themselves are on a special set of hardware memory. I don't know off the top of my head which embedded system does what, but every desktop has a hardware TLB (Translation Lookaside Buffer, basically a cache for the virtual-physical mappings) and some now have advanced Memory Mapping Units that help with virtual machines and the like.
The only downsides of virtual memory are added complexity in the hardware implementation and slower performance.
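
On Linux you can actually peek at this mapping for your own process through /proc/self/pagemap, which exposes one 64-bit entry per virtual page (since Linux 4.0 the physical frame number is only reported when running as root). A small sketch:

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    long page = sysconf(_SC_PAGESIZE);
    int *x = malloc(sizeof *x);
    *x = 42;                                             /* touch the page so it is really mapped */

    int fd = open("/proc/self/pagemap", O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    uint64_t entry;
    off_t off = ((uintptr_t)x / page) * sizeof entry;    /* one 64-bit entry per virtual page */
    if (pread(fd, &entry, sizeof entry, off) != (ssize_t)sizeof entry) { perror("pread"); return 1; }

    uint64_t pfn = entry & ((1ULL << 55) - 1);           /* bits 0-54: physical frame number */
    printf("virtual %p -> physical 0x%llx (present=%llu)\n",
           (void *)x,
           (unsigned long long)(pfn * page + (uintptr_t)x % page),
           (unsigned long long)(entry >> 63));           /* bit 63: page present */

    close(fd);
    free(x);
    return 0;
}

With address-space randomization disabled, two runs can report the same virtual address backed by different physical frames - the per-process private address space in action.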
The VAX (Virtual Address eXtension, by Digital Equipment Corp, which became Compaq, which became HP) is a very good example of a hardware virtual-memory system. It was a 32-bit minicomputer that had an OS called VMS, or Virtual Memory System. Dave Cutler was one of the principal architects of the system, and much later he wrote the kernel for Windows NT; he is a very good read for this and other stuff. The VAX had special hardware for control of the virtual space and control of opcode access for security through hardware... very secure. This system was, or is, the grandfather of the modern-day PC at the kernel level. The first BSOD I saw on WNT 3.51 I was able to read because it came from the crash dump used in VMS to stop the system when unstable. By the way, look at the names VMS and WNT: take each letter of VMS and move to the next letter in the alphabet and you get WNT. This was not an accident - maybe a jab at DEC for letting him go.

Resources