I have some questions about ARM bootloaders.
Is the cache enabled or disabled in the bootloader?
Why does the cache need to be enabled/disabled in the bootloader? What will happen?
Who manages the cache in the bootloader? On what basis are cache entries made?
Is the cache enabled or disabled in the bootloader?
It all depends on the architecture of the boot-loader. In general, though, enabling the cache (the instruction cache in particular) allows 'fast' execution. So if the boot-loader needs to perform certain tasks repeatedly, e.g. deciphering a binary, 'fast' execution helps.
Why does the cache need to be enabled/disabled in the bootloader? What will happen?
The first part of this question is already answered above.
If the cache is disabled, the boot-loader will of course behave like normal firmware / bare-metal code.
But if it is enabled, the boot-loader needs to provide a correct mapping table to the MMU, with all the necessary precautions, such as never enabling any caching for SFR (peripheral register) regions, and so on.
Who manages the cache in the bootloader? On what basis are cache entries made?
The cache itself is managed by the hardware, based on the attributes the MMU is given. But, as mentioned above, the mapping has to be provided to the MMU by the boot-loader (or whatever entity is executing), so that caching is enabled or disabled depending on the address range. ARM provides multiple levels of granularity for memory mapping, so memory sections can be defined as required and cache behaviour can be finely controlled. A sketch of such a mapping is below.
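For illustration, a minimal sketch of such a table, assuming ARMv7-A short-descriptor format with 1 MB sections; the layout, addresses and names (mmu_table, map_section) are made up for the example:

#include <stdint.h>

#define SECTION_SHIFT   20                  /* 1 MB sections                 */
#define DESC_SECTION    (2u << 0)           /* first-level descriptor type   */
#define DESC_B          (1u << 2)           /* bufferable                    */
#define DESC_C          (1u << 3)           /* cacheable                     */
#define DESC_AP_RW      (3u << 10)          /* full read/write access        */
#define DESC_TEX(x)     ((uint32_t)(x) << 12)

/* Normal memory, inner/outer write-back write-allocate: TEX=001, C=1, B=1. */
#define ATTR_NORMAL_WBWA (DESC_TEX(1) | DESC_C | DESC_B)
/* Shareable Device memory (SFR/peripheral regions): TEX=000, C=0, B=1. */
#define ATTR_DEVICE      DESC_B

/* 4096 entries x 1 MB = 4 GB of address space; table must be 16 KB aligned. */
static uint32_t mmu_table[4096] __attribute__((aligned(16384)));

static void map_section(uint32_t virt, uint32_t phys, uint32_t attrs)
{
    mmu_table[virt >> SECTION_SHIFT] =
        (phys & 0xFFF00000u) | attrs | DESC_AP_RW | DESC_SECTION;
}

void setup_mappings(void)
{
    /* Hypothetical layout: 256 MB of DRAM at 0x80000000, cached; a 1 MB
     * SFR window at 0x40000000 that must never be cached. */
    for (uint32_t mb = 0; mb < 256; mb++) {
        uint32_t a = 0x80000000u + (mb << SECTION_SHIFT);
        map_section(a, a, ATTR_NORMAL_WBWA);
    }
    map_section(0x40000000u, 0x40000000u, ATTR_DEVICE);
}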
I read somewhere that "a modern server has 144GB of RAM". Is all of that 144GB used as cache?
When we talk about a server's cache, does that mean the server's memory?
It all depends on the caching method used by the applications that run on the server. There are numerous caching methods, but two frequently used ones are persistent caching and in-memory caching.
With persistent caching, the application stores cached values somewhere intended to be "permanent", such as the file system, a database, or similar.
With in-memory caching, the application instead uses memory (i.e. RAM, the 144GB in your question) to store data. Data cached this way is semi-permanent and does not persist across reboots, application recycles, or the like.
When coding, if you allocate a new object, dictionary, list, or whatever, those objects are stored in memory. Additionally, not all of a server's memory is available to the applications that run on it: the operating system and all installed processes share the same RAM. It is therefore common for a device with 4GB of RAM to have only about 2GB reasonably usable, the other 2GB being used by the operating system. Of course, these numbers depend on a lot of factors. A toy sketch of in-memory caching follows.
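To make the in-memory variant concrete, here is a tiny self-contained C sketch; the names and the toy expensive_lookup stand-in are illustrative assumptions, not any real API. Values live in the process's RAM and disappear when it exits.

#include <stdio.h>
#include <string.h>

#define CACHE_SLOTS 64

struct entry { char key[32]; int value; int used; };
static struct entry cache[CACHE_SLOTS];     /* lives in RAM, gone on exit */

static int expensive_lookup(const char *key)
{
    /* Stand-in for a database query or other slow operation. */
    return (int)strlen(key) * 42;
}

int cached_lookup(const char *key)
{
    unsigned h = 0;
    for (const char *p = key; *p; p++)      /* tiny string hash */
        h = h * 31u + (unsigned char)*p;
    struct entry *e = &cache[h % CACHE_SLOTS];

    if (!e->used || strcmp(e->key, key) != 0) {   /* miss: compute and fill */
        snprintf(e->key, sizeof e->key, "%s", key);
        e->value = expensive_lookup(key);
        e->used = 1;
    }
    return e->value;                              /* hit: served from memory */
}

int main(void)
{
    printf("%d\n", cached_lookup("user:1001"));   /* computed, then cached */
    printf("%d\n", cached_lookup("user:1001"));   /* answered from RAM     */
    return 0;
}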
There are some questions that confuse me:
An MSI interrupt is a memory write request. Does MSI guarantee that all DMA data have been written to RAM, or only that the data have completely crossed the PCI bridge?
If the MSI interrupt only guarantees that the data have crossed the PCI bridge, how can we guarantee that all DMA data have been written to RAM when the MSI interrupt arrives?
Does the MSI memory write request really write into RAM?
Thanks in advance.
Ensuring that DMA data have been written to the bus prior to the MSI write is the responsibility of the device. The device should not issue the MSI write until everything that the driver/OS needs to see with respect to the device request has been done, whether that entails memory reads, memory writes or anything else.
Assuming the device has done things in the appropriate order on the bus (DMA write(s), then MSI write), it is then up to the host bridge to ensure that the data are written to RAM in the correct order. Typically the MSI write itself carries no guarantees. The host bridge simply ensures that its memory transactions are executed in the order given, and the memory subsystem ensures coherence among all the CPUs and peripherals, so that the data appear to have been written to memory in the correct order even in the presence of caches and the like.
As for your question 3: the MSI write goes wherever the device is told to send it when you set up MSI in the device's registers. Typically that "MSI memory write" is directed at an address associated with the system interrupt controller, not at actual RAM, but it is the OS/driver's responsibility to configure the correct address.
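As a rough sketch of that setup step (the offsets follow the standard PCI MSI capability layout; the fake config space is only there so the example runs stand-alone):

#include <stdint.h>
#include <stdio.h>

/* Offsets within the standard PCI MSI capability (relative to cap base). */
#define MSI_CTRL        0x02    /* Message Control                        */
#define MSI_ADDR_LO     0x04    /* Message Address                        */
#define MSI_ADDR_HI     0x08    /* Message Address Upper (64-bit only)    */
#define MSI_CTRL_ENABLE (1u << 0)
#define MSI_CTRL_64BIT  (1u << 7)

/* Fake config space so the sketch runs stand-alone; a real OS would route
 * these accesses through the platform's config mechanism. */
static uint8_t cfg[256];
static uint16_t cfg_read16(int off)              { return (uint16_t)(cfg[off] | (cfg[off + 1] << 8)); }
static void     cfg_write16(int off, uint16_t v) { cfg[off] = v & 0xFF; cfg[off + 1] = v >> 8; }
static void     cfg_write32(int off, uint32_t v) { cfg_write16(off, v & 0xFFFF); cfg_write16(off + 2, v >> 16); }

static void msi_setup(int cap, uint64_t msg_addr, uint16_t msg_data)
{
    uint16_t ctrl = cfg_read16(cap + MSI_CTRL);

    /* Point the device's interrupt write at the address the interrupt
     * controller decodes (e.g. 0xFEExxxxx on x86), not at ordinary RAM. */
    cfg_write32(cap + MSI_ADDR_LO, (uint32_t)msg_addr);
    if (ctrl & MSI_CTRL_64BIT) {
        cfg_write32(cap + MSI_ADDR_HI, (uint32_t)(msg_addr >> 32));
        cfg_write16(cap + 0x0C, msg_data);    /* data follows the 64-bit address */
    } else {
        cfg_write16(cap + 0x08, msg_data);
    }
    cfg_write16(cap + MSI_CTRL, ctrl | MSI_CTRL_ENABLE);
}

int main(void)
{
    cfg_write16(0x40 + MSI_CTRL, MSI_CTRL_64BIT);   /* pretend: 64-bit capable  */
    msi_setup(0x40, 0xFEE00000ULL, 0x0041);         /* hypothetical vector 0x41 */
    printf("ctrl=0x%04x\n", cfg_read16(0x40 + MSI_CTRL));
    return 0;
}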
I was reading something a few months ago about Windows chipset iterations and the PCH upgrades between them, and I'm pretty sure I saw something on DMA cache coherency, involving the home agent or QHL (Nehalem), but I can't find it now.
So I'm asking whether anyone knows the details of any method of DMA cache coherency that Intel has employed, and how it works.
Nehalem's global queue, from the optimisation manual:
Cacheline requests from the cores or from a remote package or the I/O Hub are handled by the GQ.
The global queue checks whether the line is on the package and, if it is, snoops the appropriate cores using the core-valid bits. If this is a dual-socket system and home snoop is being used, the request is sent to the QHL (the home agent on SnB), which then sends it to the QPI link that the NUMA node bitmap refers to. If source snoop is being used, the GQ checks its own 2-bit I/O directory cache in order to generate a message for the correct QPI link, and the QHL (the QPI agent on SnB) must generate another message to the correct LLC that has been assigned that address range. I'm not sure what happens in COD mode on Haswell or with SNC on the mesh architecture.
I'm working on a U-Boot test application that works with a special DMA engine. The DMA engine transfers data between memories without notifying the cache. Therefore I expect that if I keep transferring different data to the same destination, I should read stale data.
However, I found that I always get the correct data the DMA engine sent. This made me think that maybe the dcache was not enabled, so I tried the U-Boot built-in command dcache. It shows my data cache is enabled. And I checked the TLB table: all pages are marked "write back write allocate". So that means the cache is enabled?
Even more interestingly, I wrote a simple program that just keeps reading the same address (a sketch of such a loop follows), and found that disabling the dcache with the dcache command tripled the run time of the test. I tried a similar simple test in Linux on the same hardware, and there the cache gives more than a 15x performance boost. So this can't be a hardware issue.
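For reference, a minimal sketch of that kind of loop, assuming U-Boot's get_timer() millisecond timebase; the address and iteration count are arbitrary:

#include <common.h>

static void time_reads(void)
{
    volatile uint32_t *p = (volatile uint32_t *)0x80000000; /* arbitrary DRAM address */
    uint32_t sink = 0;
    ulong start = get_timer(0);

    for (int i = 0; i < 10 * 1000 * 1000; i++)
        sink += *p;                 /* repeatedly hit the same cache line */

    printf("10M reads took %lu ms (sink=%u)\n", get_timer(start), sink);
}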
In summary, my cache is working to some extent but not fully, and it might be a configuration issue. Is there a theory that can explain what I found? How can I continue to debug this? Thanks.
Let me answer it myself...
The code in U-Boot is a little misleading... it runs
set_section_dcache(i, DCACHE_WRITEBACK_WRITETHROUGH)
but after checking the MMU, it turns out that the memory type is actually set to Device.
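For anyone who wants to check the same thing, here is an illustrative sketch that decodes a first-level section entry's TEX/C/B bits (ARMv7 short-descriptor format and TTBCR.N == 0 assumed; read_ttbr0 is just a wrapper around the CP15 read):

#include <stdint.h>
#include <stdio.h>

static inline uint32_t read_ttbr0(void)
{
    uint32_t v;
    __asm__ volatile("mrc p15, 0, %0, c2, c0, 0" : "=r"(v));
    return v;
}

void dump_section_attrs(uint32_t vaddr)
{
    /* TTBR0 holds the table base in its upper bits (16 KB aligned when
     * TTBCR.N == 0). One first-level entry covers 1 MB. */
    uint32_t *tbl = (uint32_t *)(read_ttbr0() & ~0x3FFFu);
    uint32_t desc = tbl[vaddr >> 20];
    uint32_t tex  = (desc >> 12) & 7;
    uint32_t c    = (desc >> 3) & 1;
    uint32_t b    = (desc >> 2) & 1;

    if (tex == 0 && c == 0)
        printf("%#x: %s\n", vaddr, b ? "Device" : "Strongly-ordered");
    else if (c && b)
        printf("%#x: Normal, write-back (TEX=%u)\n", vaddr, tex);
    else
        printf("%#x: TEX=%u C=%u B=%u\n", vaddr, tex, c, b);
}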
I guess NTFS (the Windows file system) has some cache. Suppose I have a file which is frequently accessed (read-only). How can I check whether this file is in the file system cache? Can I increase the file system cache size?
Check
http://blogs.technet.com/b/askperf/archive/2010/08/13/introduction-to-the-new-sysinternals-tool-rammap.aspx
You can use RamMap, which will give you a dedicated view of how the current system is caching files.
Also worth mentioning: caching isn't done per file, but per block/page.
There is no direct way from user space to detect whether a file has been cached (partially or completely). In a multithreaded/multiprocessing environment, the information would be out of date the instant you received it.
There is no "limit" on caching in Windows that can be adjusted (although my data covers Windows 7 and earlier versions). The cache manager simply uses the memory manager to place data into memory, and gets callbacks when physical memory needs to be reclaimed (say, by an application's demands). The memory manager trades the file cache off against the memory demands of processes.