Linux SD (mmc) read/write: where is it actually done? - linux-kernel

In order to support data mangling I need to write a custom device driver inserting a short amount of code at latest possible moment before actual write to SD (mmc driver) and, specularly, at the earliest possible moment after data is read back from SD.
I am aware all I/O is done using DMA transfer directly from/to disk cache structures, this means I will have to allocate a new buffer, transcode buffer to temp, point DMA to temp and start transfer. Reverse path on read.
Ideally I should use standard kernel crypto facilities (dm-crypt and LUKS), but my linux device is a small embedded ARM device which slows to a crawl with standard encryption, so I'm willing to trade some security for speed and settle for a "smart-obfuscation" instead of true crypto.
I need to find the point where to insert my code. In that point I need to have access to the data buffer, the sector number where buffer will be written/read and be able to redirect DMA transfer to a temp buffer.
kernel/drivers/mmc/core/core.c seems to have only routines dealing with card as a whole (reser, reset, ...) and not for actual data handling.
I have been unable to find the right place (to date) can someone point me to the right file, please?
EDIT:
As pointed out in a comment I don't really need to change data at the "absolute last moment", but that seemed the best solution because:
Mangling will no change data length.
Mangling depends on actual logical sector.
Data in disk cache should remain readable and usable.
Only data going to SD needs to be mangled (no mangling for data in Flash).
I will need to do the same modification to a desktop PC to be able to read/write SDs used in the embedded system.
Overhead should be kept as low as possible (embedded has low mem and computational power).
Any (roughly) equivalent solution can be evaluated.
I am also willing to forgo DMA usage and force PIO-mode for SD if that makes things easier; this would lift requirement of sector copying as requested mangling can be done "on the fly" while transferring data from buffer to peripheral.

Related

in ESP32 / ESP-IDF - when to use EEPROM vs NVS vs SPIFFS?

I'm fairly new to doing production work on ESP32 microcontrollers, and I'm wanting a little context and nuance from people who've been around the block a few times. So this question is a bit more on that kind of thing rather than a "how do I code X" kind of question.
I have lots of data storage needs on my current project.
larger blobs of data that need to be stored less often
smaller blobs of data that need to be updated more often
factory settings (like serial number, board revision, etc) that are particular to a given device, but aren't going to be encoded in C.
etc
I'm familiar with storing data in "blobs", and I'm familiar with encoding / decoding data with protocol buffers.
So given all that, I'm trying to gain context on the differences between my various storage options on the ESP32, and when to use each.
EEprom
NVS
SPIFFS / LittleFS
other options...
What use cases make you pick one of these options over another?
There's no EEPROM on the ESP32, just the flash.
NVS is a simple non-volatile key-value store with different data types (integers 8-64 bits, strings, blobs). It's reasonably convenient to use, does wear levelling and supports flash encryption (although that a bit of a hassle). I'd use it for storing factory settings and anything else which is reasonably small (there's a 4000 byte limit on strings, 508,000 byte limit on blobs). If the device needs to write often, you might want to create a separate, dedicated, read-only NVS partition for storing device attributes (serial, hw info) so it's guaranteed to not get clobbered by power failures during write.
ESP IDF supports SPIFFS and FAT file systems.
SPIFFS is light-weight and much better in terms of wear levelling and reliability. I'd use this for storing any larger files. It doesn't support flash encryption, unfortunately.
FAT file system is probably the worst choice because it's not really natively Flash-friendly, nor reliable. Espressif has built some kind of a layer between FAT and flash to accommodate wear levelling. The only critical advantage of FAT is that it supports flash encryption.
Then there are third party options which I haven't used, unfortunately.
As always, consider the number of page erases your writes are going to cause in the flash - this gives you an estimate of how many times you can write before the chip's lifetime is reached.

To minimize the disk access using fsync

There is a situation that whenever a write occurs, fsync is used with that write. Then, how to minimise the disk access? How the kernel does this?
fsync() transfers ("flushes") all modified in-core data of (i.e., modified buffer cache pages for) the file referred to by the file descriptor fd to the disk device (or other permanent storage device).
I think Kernel can transfer the data of all the modified buffers to the hard disk periodically after some time. So that it can minimise the disk access.
Please give some suggestions/hints.
In general, try to avoid overthinking it. Don't call fsync, just let the kernel decide when to do the physical write.
Here are kernel options for ext4, which you can use to tune the kernel's behavior to your needs - but this would be a server tuning exercise rather than something you could implement from your application:
http://kernel.org/doc/Documentation/filesystems/ext4.txt
This might be an interesting one:
"
max_batch_time=usec Maximum amount of time ext4 should wait for
additional filesystem operations to be batch
together with a synchronous write operation.
"

DMA vs Cache difference

Probably a stupid question for most that know DMA and caches... I just know cache stores memory to somewhere closer to where you can access so you don't have to spend as much time for the I/O.
But what about DMA? It lets you access that main memory with less delay?
Could someone explain the differences, both, or why I'm just confused?
DMA is a hardware device that can move to/from memory without using CPU instructions.
For instance, a hardware device (lets say, your PCI sound device) wants audio to play back. You can either:
Write a word at a time via a CPU mov instructions.
Configure the DMA device. You give it a start address, a destination, and the number of bytes to copy. The transfer now occurs while the CPU does something else instead of spoon feeding the audio device.
DMA can be very complex (scatter gather, etc), and varies by bus type and system.
I agree fully with the first answer, and there are some common additions...
On most DMA hardwares you can also set it up to do memory to memory transfers - there are not always external devices involved. Also depending on the system you may or may not need to sync the CPU-cache in software before (or after the transfer), since the data the DMA transfers into/from memory may be done without the knowledge of the CPU-cache.
The benefit of doing any DMA is that the CPU(s) is/are able to do other things simultaneously.
Of course when the CPU also needs to access the memory, only one can gain access and the other must wait.
Mem to mem DMA is often used in embedded systems to increase performance, or may be vital to be able to access some parts of the memory at all.
To answer the question, DMA and CPU-cache are totally different things and not comparable.
I know its a bit late but answering this question will help someone like me I guess, Agreeing with the above answers, I think the question was in relation to cache.
So Yes a cache does store information somewhere closer to the memory, this could be the results of earlier computations. Moreover, whenever a data is found in cache (called a cache hit) the value is used directly. when its not found (called a cache-miss), the processor goes on to calculate the required value. Peripheral Devices (SD cards, USBs etc) can also access this data, which is why on startup we usually invalidate cache data so that the cache line is clean. We also flush cache data on startup so that all the cache data is written back to the main memory for cpu to use, after which we proceed to reset or initialize the cache.
DMA (Direct Memory Access), yes it does let you access the main memory. But I think the better definition is, it lets you access the system register, which can only be accessed by the processor. #Ronnie and #Yann Ramin were both correct in that DMA can be a device hardware, so it can be used by your serial peripheral to access system registers, but it can also be used for memory to memory transfers between two cores.
You can read up further on DMA from wikipedia, about the modes in which DMA can access the system memory. I ll explain it simply
Burst mode: DMA takes full control of the bus, CPU is idle during this time. Data is transferred in burst (as a whole) without interruption.
Cycle stealing mode: In this data is transfered one byte at a time, transfer is slow, but CPU is not idle.

How is fseek() implemented in the filesystem?

This is not a pure programming question, however it impacts the performance of programs using fseek(), hence it is important to know how it works. A little disclaimer so that it doesn't get closed.
I am wondering how efficient it is to insert data in the middle of the file. Supposing I have a file with 1MB data and then I insert something at the 512KB offset. How efficient would that be compared to appending my data at the end of the file? Just to make the example complete lets say I want to insert 16KB of data.
I understand the answer varies depending on the filesystem, however I assume that the techniques used in common filesystems are quite similar and I just want to get the right notion of it.
(disclaimer: I want just to add some hints to this interesting discussion)
IMHO there are some things to take into account:
1) fseek is not a primary system service, but a library function. To evaluate its performance we must consider how the file stream library is implemented. In general, the file I/O library adds a layer of buffering in user space, so the performance of fseek may be quite different if the target position is inside or outside the current buffer. Also, the system services that the I/O libary uses may vary a lot. I.e. on some systems the library uses extensively the file memory mapping if possible.
2) As you said, different filesystems may behave in a very different way. In particular, I would expect that a transactional filesystem must do something very smart and perhaps expensive to be prepared to a possible rollback of an aborted write operation in the middle of a file.
3) Modern OS'es have very aggressive caching algorithms. An "fseeked" file is likely to be already present in cache, so operations become much faster. But they may degrade a lot if the overall filesystem activity produced by other processes become important.
Any comments?
fseek(...) is a library call, not an OS system call. It is the run-time library that takes care of the actual overhead involved in making a system call to the OS, technically speaking, fseek is indirectly making a call to the system but really it is not (this brings up a clear distinction between the differences between a library call and a system call). fseek(...) is a standard input-output function regardless of the underlying system...however...and this is a big however...
The OS will more than likely to have cached the file in its kernel memory, that is, the direct offset to the location on the disk on where the 1's and 0's are stored, it is through the OS's kernel layers, more than likely, a top-most layer within the kernel that would have the snapshot of what the file is composed of, i.e. data irrespectively of what it contains (it does not care either way, as long as the 'pointers' to the disk structure for that offset to the lcoation on the disk is valid!)...
When fseek(..) occurs, there would be a lot of over-head, indirectly, the kernel delegated the task of reading from the disk, depending on how fragmented the file is, it could be theoretically, "all over the place", that could be a significant over-head in terms of having to, from a user-land perspective, i.e. the C code doing an fseek(...), it could be scattering itself all over the place to gather the data into a "one contiguous view of the data" and henceforth, inserting into the middle of a file, (remember at this stage, the kernel would have to adjust the location/offsets into the actual disk platter for the data) would be deemed slower than appending to the end of the file.
The reason is quite simple, the kernel "knows" what was the last offset was, and simply wipe the EOF marker and insert more data, behind the scenes, the kernel, is having to allocate another block of memory for the disk-buffer with the adjusted offset to the location on the disk following an EOF marker, once the appending of data is completed.
Let us assume the ext2 FS and the Linux OS as an example. I don't think there will be a significant performance difference between a insert and an append. In both cases the files node and offset table must be read, the relevant disk sector mapped into memory, the data updated and at some later point the data written back to disk. What will make a big performance difference in this example is good temporal and spatial locality when accessing parts of the file since this will reduce the number of load/store combos.
As a previous answers says you may be able to speed up both operations if you deal with data writes that exact multiples of the FS block size, in this case you could skip the load stage and just insert the new blocks into the files inode datastrucure. This would not be practical, as you would need low level access to the FS driver, and using it would be very restrictive and not portable.
One observation I have made about fseek on Solaris, is that each call to it resets the read buffer of the FILE. The next read will then always read a full block (8K by default). So if you have a lot of random access with small reads it's a good idea to do it unbuffered (setvbuf with NULL buffer) or even use direct syscalls (lseek+read or even better pread which is only 1 syscall instead of 2). I suppose this behaviour will be similar on other OS.
You can insert data to the middle of file efficiently only if data size is a multiple of FS sector but OSes doesn't provide such functions so you have to use low-level interface to the FS driver.
Inserting data in the middle of the file is less efficient than appending to the end because when inserting you would have to move the data after the insertion point to make room for the data being inserted. Moving these data would involve reading them from disk, writing the data to be inserted and then writing the old data after the inserted data. So you have at least one extra read and write when inserting.

Question about hard drive , 'seek' and 'read' in windows OS

Does anyone know when calling 'seek' and 'read' , how is the hard-drive physicly affected?
If i'll be more specific, I know that the harddrive has some kind of a magnetic needle that is used to read the data from the magnetic plates. So my question is , when is the needle actualy moved to the reading location?
Is it moved when we are calling the "seek" windowsApi method (no matter if an actual read performed) , or does "seek" just remember a virtual pointer , and the physical movement of the needle is performed only when the "read" method is called?
Edit: Assume that the data requested from the Hard-Drive doesn't exist in any of the caches (hard-drive cache , Os Cache , Ram and whatever else it could be)
Wanted to break out this question from your post
When is the needle actualy moved to the reading location?
I think the simple answer is "whenever data is requested that is not already present in any number of caches". The problem with predicting hard drive movement is you have to consider all of the different places that cache data read from the hard drive. If the data is present in those caches and accessible in the context requesting the data, the cache will be used instead of actually reading the hard drive. Here are just some of the places that can and do cache hard drive data
Hard Drive's internal cache
OS level caches
Program level caches
API level cache
In the case where none of the data is present then it will likely be read from the hard drive during a read call. A seek call is unlikely to cause the hard drive to move because you're not changing the physical hard drive pointer but a virtual pointer to the file within your program.
The hard drive head (needle) starts moving and the disk starts spinning up (unless already spinning) at the read operation. There is no head move or spinup at the seek operation.
Please note that the head may move nonsequentially above the disk even if you are reading a file sequentially, i.e. the the read of the 2nd, 3rd etc. 512-byte block may cause the head to move far away as well even if there aren't intervening seeks. This is partially because the file is fragmented on the filesystem, or because the firmware has a sector number remapping (i.e. logical sector 5 is not between logical sectors 4 and 6) to compensate bad-block errors.
The assumption in the question "Assume that the data requested from the Hard-Drive doesn't exist in any of the caches (hard-drive cache , Os Cache , Ram and whatever else it could be)" is difficult to assume and relatively rare. Even in this case, there is only a loose association between user mode file I/O operations and physical storage device operations.
There are many user mode File I/O functions in various windows libraries. Some of the oldest are the C library low level I/O functions. There are also the C library stream I/O functions, the C++ iostreams classes, and the manged I/O classes. There are other I/O interfaces as well that are part of other packages.
In general, all the user mode I/O Libraries are built on top of the Win32 file I/O functions including CreateFile(), SetFilePointer(), ReadFile(), and WriteFile().
Unless a file is opened in unbuffered mode the operating system can cache the files contents. This is done system wide, and not on a per-file basis. So, even if your program had not read or written a file, I/O to a file may be cached and not result in any physical storage device I/Os.
There are many factors that determine how file I/Os map to actual I/O operations on a physical device. This includes, library level bufering, OS cashing, device driver caching, hardware level cashing, device block size, file size, hardware block/sector remapping, and other factors.
The short story here is that you cannot assume that individual file level read or seek operations correspond to physical device operations, such as disk head seeking.
This gets even trickier when writes are considered. Often writes are accompanied by a flush - which the application developer assumes will push the data all the way to the physical media. Developers often assume that when a flush call returns success, that the data is guaranteed to be persistent on the storage device. This is far from true as devices and drivers often ignore flush calls.
There is more complexity with solid state drives which are not mechanical and therefore do not have 'seek' operations. Here, other physical characteristics manifest themselves such as the necessity to erase blocks before they are written to.

Resources