How is data written into FLASH memory by pages?

Note: FLASH memory is a type of EEPROM, but when I say EEPROM here I am excluding FLASH memory.
I have been checking many sources but could not find a clear-cut answer to this.
From what I read, it looks like most ordinary EEPROMs nowadays use SPI or I2C protocols to read and write data. The op-code, the address and the data are all sent bit-by-bit in a serial manner. Alternatively, parallel buses can be used in EEPROMs too, allowing us to read/write a whole byte at a specific address at once (although such a design demands more pins).
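For concreteness, a read from a 25-series-style SPI EEPROM looks roughly like this from the host side (a sketch; the spi_* helpers are hypothetical platform hooks, not a real library):

```c
#include <stdint.h>

/* Hypothetical platform hooks: assert/release chip select, and
 * shift one byte out on MOSI while clocking one byte in on MISO. */
extern void    spi_select(void);
extern void    spi_deselect(void);
extern uint8_t spi_transfer(uint8_t out);

/* Reading from a 25-series-style SPI EEPROM: the opcode, the
 * address and the data all travel bit-serially on the same lines. */
void eeprom_read(uint16_t addr, uint8_t *buf, int len)
{
    spi_select();
    spi_transfer(0x03);               /* READ opcode               */
    spi_transfer(addr >> 8);          /* address, high byte first  */
    spi_transfer(addr & 0xFF);
    for (int i = 0; i < len; i++)
        buf[i] = spi_transfer(0xFF);  /* dummy byte clocks data in */
    spi_deselect();
}
```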
About FLASH memory, I read that it is possible to "erase" (which is a distinct operation from reading and writing) only by blocks. A block contains many pages, and a page may contain many bytes. It is possible to read/write a specific byte in a NOR FLASH memory, but one can only read/write by pages in the case of NAND FLASH memory.
What I wonder is how the data is written 'by pages'. Writing the whole page in one clock cycle would require a very large bus with too many lines, and writing the whole page bit-by-bit would take too much time. So I think the page is written byte-by-byte, serially. But then comes the question: if we are able to write byte-by-byte, why can't we write/read at a specific location in FLASH memory?
It looks like there is an iteration copying the content of a buffer into the physical page, and we could skip writing until we reach the desired position during the iteration. (This does not save us time, though. Maybe the extra writes serve to consolidate/secure the data at the other addresses.)
So what would you say on this?
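For what it's worth, the usual arrangement is close to the guess above: the page is clocked into an on-chip page buffer byte-by-byte over the I/O bus, and the chip then programs the whole buffer into the array in one internal operation. The reason a byte cannot simply be rewritten in place is that programming can only clear bits (1 → 0); setting a bit back to 1 requires erasing the entire block. A minimal sketch of an ONFI-style page program follows; the nand_* accessors are hypothetical stand-ins for the real command/address/data bus cycles:

```c
#include <stdint.h>

/* Hypothetical bus accessors for the NAND's command, address and
 * data cycles (real drivers write these to memory-mapped latches). */
extern void nand_cmd(uint8_t cmd);
extern void nand_addr(uint32_t row);
extern void nand_data(uint8_t byte);
extern int  nand_busy(void);

/* ONFI-style page program: the page is clocked into the chip's
 * internal page buffer byte by byte over the (typically 8-bit)
 * I/O bus, then committed to the array in one internal operation. */
void nand_program_page(uint32_t row, const uint8_t *page, int len)
{
    nand_cmd(0x80);              /* PAGE PROGRAM, first cycle       */
    nand_addr(row);              /* column + row address cycles     */
    for (int i = 0; i < len; i++)
        nand_data(page[i]);      /* fill the on-chip page buffer    */
    nand_cmd(0x10);              /* second cycle: start programming */
    while (nand_busy())          /* wait on the ready/busy line     */
        ;
}
```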

Related

Linux SD (mmc) read/write: where is it actually done?

In order to support data mangling, I need to write a custom device driver inserting a short amount of code at the latest possible moment before the actual write to SD (mmc driver) and, symmetrically, at the earliest possible moment after data is read back from the SD.
I am aware all I/O is done using DMA transfers directly from/to disk cache structures; this means I will have to allocate a new buffer, transcode the buffer into the temp, point the DMA at the temp and start the transfer. Reverse path on read.
Ideally I should use standard kernel crypto facilities (dm-crypt and LUKS), but my Linux device is a small embedded ARM device which slows to a crawl with standard encryption, so I'm willing to trade some security for speed and settle for a "smart obfuscation" instead of true crypto.
I need to find the point where to insert my code. At that point I need to have access to the data buffer and the sector number where the buffer will be written/read, and be able to redirect the DMA transfer to a temp buffer.
kernel/drivers/mmc/core/core.c seems to contain only routines dealing with the card as a whole (reset and the like) and not the actual data handling.
I have been unable to find the right place (to date). Can someone point me to the right file, please?
EDIT:
As pointed out in a comment, I don't really need to change the data at the "absolute last moment", but that seemed the best solution because:
Mangling will not change data length.
Mangling depends on actual logical sector.
Data in disk cache should remain readable and usable.
Only data going to SD needs to be mangled (no mangling for data in Flash).
I will need to do the same modification to a desktop PC to be able to read/write SDs used in the embedded system.
Overhead should be kept as low as possible (embedded has low mem and computational power).
Any (roughly) equivalent solution can be evaluated.
I am also willing to forgo DMA usage and force PIO mode for the SD if that makes things easier; this would lift the requirement of sector copying, as the requested mangling can be done "on the fly" while transferring data from the buffer to the peripheral.
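Whatever the hook point turns out to be, the transform itself can stay cheap. A purely illustrative sketch of the per-sector transcode step (a hypothetical helper, not real crypto, and not tied to any particular spot in the mmc stack):

```c
#include <stdint.h>
#include <stddef.h>

/* Copy a 512-byte sector into a bounce buffer while XOR-ing it with
 * a keystream derived from the logical sector number. XOR is its own
 * inverse, so the same routine de-mangles on the read path, and the
 * per-sector seed satisfies the "mangling depends on the logical
 * sector" requirement. This is obfuscation, not encryption. */
void mangle_sector(uint8_t *dst, const uint8_t *src, uint32_t lba)
{
    uint32_t state = lba * 2654435761u;        /* per-sector seed */
    for (size_t i = 0; i < 512; i++) {
        state = state * 1103515245u + 12345u;  /* cheap LCG       */
        dst[i] = src[i] ^ (uint8_t)(state >> 24);
    }
}
```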

Array of values loaded through UART in VHDL

I am working on a project in VHDL which includes multiplying matrices. I would like to be able to load data from a PC into arrays on the FPGA using a UART. I am only making my first bigger steps in VHDL and I am not sure if I am taking the right approach.
I wanted to declare an array of integer signals, and then implement a UART to receive data from the PC and load it into those signals. However, I can't use a for-loop for that, as it would be synthesized to load the data in parallel (which is impossible, because the values will be coming from the PC one after another over the serial port). And because the matrices may be of various sizes, assigning the signals one by one would require lots of specific code (which strikes me as bad practice).
Is the idea to use an array of signals and load data into those signals through UART realizable? And if my approach is entirely wrong, how could I achieve this?
What you want is doable, but you will probably need to design a kind of hardware monitor to act as an intermediary between your UART and your storage (your array of integer signals). This hardware monitor will interpret commands coming from the UART and perform read/write operations on your storage. It will have one interface with the storage and another with the UART. You will have to define a kind of protocol, with a syntax for your commands and a sequence of operations for each command.
Example: the monitor waits for commands coming from the UART. The first received character indicates whether it is a read (0) or a write (1). The next four characters are the target address, least significant byte first. If the command is a read, the monitor reads the data at the specified address in your storage and sends it to the UART, one byte at a time, least significant byte first. If the command is a write, the address is followed by the data to write into your storage at the specified address, least significant byte first; your monitor waits until the data is received and writes it into your storage.
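Seen from the PC, such a command is just a handful of bytes. A minimal sketch of the host side, assuming 32-bit data words and an already-configured serial port file descriptor (the framing follows the example protocol above; everything else is an assumption):

```c
#include <stdint.h>
#include <unistd.h>

/* Send one write command: a command byte (1 = write), four address
 * bytes least significant first, then four data bytes least
 * significant first. `fd` is an already-opened serial port. */
void monitor_write(int fd, uint32_t addr, uint32_t data)
{
    uint8_t frame[9];
    frame[0] = 1;                                 /* write command  */
    for (int i = 0; i < 4; i++) {
        frame[1 + i] = (addr >> (8 * i)) & 0xFF;  /* addr, LSB first */
        frame[5 + i] = (data >> (8 * i)) & 0xFF;  /* data, LSB first */
    }
    write(fd, frame, sizeof frame);
}
```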
Optionally, the monitor could send an exit status byte at the end of each command to indicate potential errors (protocol errors, unmapped addresses, write attempts in read-only regions...)
Of course, depending on the characteristics of your application, you will probably define a completely different protocol, simpler or more complex, but the principle will be the same.
All this is usually implemented in software and runs on a CPU that has the UART as peripheral and the storage in its memory space. But if you do not have a CPU...
Warning: this is quite complex. The UART itself is quite complex. Not sure you should start with this if you are a VHDL beginner.
Your approach is not entirely wrong, but you have a software-oriented way of expressing it, which indicates you are missing the fundamentals. People with strong software backgrounds tend to think in terms of the programming language and not in terms of the actual FPGA-specific structures they want to achieve. It is important to unlearn this if you want to be successful in designing for FPGAs.
Based on what I just wrote, you should consider what type of FPGA structure you would like to store the data in. The speed, resource and power requirements govern this choice. One suitable way to store the data would be in either a single instance or an array of either block RAM or LUTRAM. Both of these structures can be inferred by using a signal of an array type in the hardware description language, which is why I said you are not entirely off track. Consult the manual of your synthesis tool to find templates for how to infer these structures. An alternative is to use a vendor IP block or to instantiate a primitive directly, but both of those methods are clumsier in my opinion.
Important parameters to consider are the total number of words you need to store, the size of a word, and the number of read/write operations per clock cycle. For a higher number of reads per cycle, an array of memories must be used, since most FPGA memories support only two reads per cycle.

Combining two wires in Verilog

I'm designing a Single Cycle CPU.
I have designed both the data path and controller for this CPU.
Now I have encountered a problem.
For the Instruction Memory and Data Memory, there should be a path for inputs and outputs into and out of the CPU, since it is necessary to write data to the IM and read data from the DM, and vice versa.
But the way I have designed my data path, these two memories are part of the data path.
Since writing to a memory requires providing an address and data, and in the data path there are already wires connected to these memories, I don't know how I should connect two wires to a single input/output port.
For example, for writing to the IM, I provide the inputs "IM_address" and "IM_data_in".
But in the data path, the wires connected to the address input of this memory are outputs of other components, so I cannot assign the IM_address wire to this place, because it would need to be both an input and an output at the same time.
I know that there is something called an "inout", but I'm not familiar with its usage, and I am also not sure it applies to my situation.
If anybody could help me with this, I would very much appreciate it!
Thanks in advance.
Only one component can read or write any memory location at a time. If two components ever need to access the same memory, you either need to duplicate the memory and give each component its own copy, or create an arbitration scheme to prevent both components from reading/writing at the same time.
It sounds to me like you need to use a multiplexer and select who is able to write to the instruction memory at any given time. I would think, though, that you should only be writing to the instruction memory at initialization, to program your CPU. Why would other components need to access the instruction memory?
A multiplexer, or mux for short, selects one of a number of inputs and routes it to a single output. The signal that does the selection needs to be set by you.
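In behavioral terms (a C model, purely for illustration; the names are made up), the mux on the instruction memory's address port does nothing more than this:

```c
#include <stdint.h>

/* Behavioral model of the suggested 2-to-1 address mux: `sel_load`
 * picks whether the CPU datapath or the external loader drives the
 * instruction memory's address port. In an HDL this is a single
 * conditional assignment. */
uint32_t im_addr_mux(uint32_t cpu_addr, uint32_t load_addr,
                     int sel_load)
{
    return sel_load ? load_addr : cpu_addr;
}
```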

How is fseek() implemented in the filesystem?

This is not a pure programming question; however, it impacts the performance of programs using fseek(), hence it is important to know how it works. A little disclaimer so that it doesn't get closed.
I am wondering how efficient it is to insert data in the middle of a file. Suppose I have a file with 1MB of data and then I insert something at the 512KB offset. How efficient would that be compared to appending my data at the end of the file? Just to make the example complete, let's say I want to insert 16KB of data.
I understand the answer varies depending on the filesystem, however I assume that the techniques used in common filesystems are quite similar and I just want to get the right notion of it.
(disclaimer: I just want to add some hints to this interesting discussion)
IMHO there are some things to take into account:
1) fseek is not a primary system service, but a library function. To evaluate its performance we must consider how the file stream library is implemented. In general, the file I/O library adds a layer of buffering in user space, so the performance of fseek may be quite different depending on whether the target position is inside or outside the current buffer. Also, the system services that the I/O library uses may vary a lot; for example, on some systems the library makes extensive use of file memory mapping when possible.
2) As you said, different filesystems may behave in very different ways. In particular, I would expect that a transactional filesystem must do something very smart, and perhaps expensive, to be prepared for a possible rollback of an aborted write operation in the middle of a file.
3) Modern OSes have very aggressive caching algorithms. An "fseeked" file is likely to be present in the cache already, so operations become much faster. But they may degrade a lot if the overall filesystem activity produced by other processes becomes significant.
Any comments?
fseek(...) is a library call, not an OS system call. It is the run-time library that takes care of the actual overhead involved in making a system call to the OS; technically speaking, fseek indirectly leads to system calls, but it is not one itself (which brings up the clear distinction between a library call and a system call). fseek(...) is a standard input-output function regardless of the underlying system... however... and this is a big however...
The OS will more than likely have cached the file in its kernel memory; that is, a top-most layer within the kernel holds a snapshot of what the file is composed of, together with the direct offsets to the locations on the disk where the 1's and 0's are stored. The kernel does not care what the data contains, as long as the 'pointers' into the disk structure for those offsets are valid.
When fseek(...) occurs, there can be a lot of overhead. The kernel delegates the task of reading from the disk, and depending on how fragmented the file is, the data could theoretically be "all over the place". Gathering the scattered pieces into "one contiguous view of the data" can be a significant overhead from a user-land perspective (the C code doing the fseek(...)). Hence, inserting into the middle of a file (where the kernel has to adjust the locations/offsets into the actual disk platter for the data) would be slower than appending to the end of the file.
The reason is quite simple: the kernel "knows" what the last offset was, and can simply wipe the EOF marker and append more data. Behind the scenes, the kernel allocates another block of memory for the disk buffer, with an adjusted offset to the location on the disk following the old EOF marker, and the append completes.
Let us assume the ext2 FS and the Linux OS as an example. I don't think there will be a significant performance difference between an insert and an append. In both cases the file's inode and offset table must be read, the relevant disk sector mapped into memory, the data updated, and at some later point the data written back to disk. What will make a big performance difference in this example is good temporal and spatial locality when accessing parts of the file, since this reduces the number of load/store combos.
As a previous answer says, you may be able to speed up both operations if you deal with data writes that are exact multiples of the FS block size; in this case you could skip the load stage and just insert the new blocks into the file's inode data structure. This would not be practical, as you would need low-level access to the FS driver, and using it would be very restrictive and not portable.
One observation I have made about fseek on Solaris is that each call to it resets the read buffer of the FILE. The next read will then always read a full block (8K by default). So if you do a lot of random access with small reads, it's a good idea to do it unbuffered (setvbuf with a NULL buffer) or even use direct syscalls (lseek+read, or even better pread, which is only 1 syscall instead of 2). I suppose this behaviour will be similar on other OSes.
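A minimal sketch of both alternatives (assuming POSIX; error handling trimmed):

```c
#include <stdio.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Two ways to avoid the full-block refill after each seek: tell
 * stdio not to buffer at all, or bypass stdio with pread(), which
 * replaces the lseek+read pair with a single syscall. */
void small_random_read(const char *path, off_t off,
                       char *buf, size_t len)
{
    /* Option 1: unbuffered stdio. */
    FILE *f = fopen(path, "rb");
    if (f) {
        setvbuf(f, NULL, _IONBF, 0);  /* no buffer: reads only len */
        fseek(f, off, SEEK_SET);
        fread(buf, 1, len, f);
        fclose(f);
    }

    /* Option 2: skip stdio entirely and use pread. */
    int fd = open(path, O_RDONLY);
    if (fd >= 0) {
        pread(fd, buf, len, off);     /* seek + read in one syscall */
        close(fd);
    }
}
```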
You can insert data into the middle of a file efficiently only if the data size is a multiple of the FS sector, but OSes don't provide such functions, so you would have to use a low-level interface to the FS driver.
Inserting data in the middle of a file is less efficient than appending to the end, because when inserting you have to move the data after the insertion point to make room for the data being inserted. Moving this data involves reading it from disk, writing the data to be inserted, and then writing the old data after the inserted data. So you have at least one extra read and write when inserting.
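To make that extra read and write concrete, here is a sketch of a naive in-place insert (it assumes the tail fits in memory, which is fine for the 1MB example; a real implementation would move the tail in chunks, back to front):

```c
#include <stdio.h>
#include <stdlib.h>

/* Insert `len` bytes at offset `off`: everything after the insertion
 * point is read and rewritten `len` bytes further out, which is
 * exactly the extra I/O an append does not pay. */
int insert_at(FILE *f, long off, const void *data, size_t len)
{
    fseek(f, 0, SEEK_END);
    long tail_len = ftell(f) - off;
    char *tail = malloc(tail_len);
    if (!tail) return -1;

    fseek(f, off, SEEK_SET);
    fread(tail, 1, tail_len, f);   /* the extra read...        */
    fseek(f, off, SEEK_SET);
    fwrite(data, 1, len, f);       /* the insert itself        */
    fwrite(tail, 1, tail_len, f);  /* ...and the extra write   */
    free(tail);
    return 0;
}
```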

How are files copied at the low level?

I have a small question:
For example, I'm using the System.IO.File.Copy() method from the .NET Framework. This method is a managed wrapper for the CopyFile() function from the WinAPI. But how does the CopyFile function work? Does it interact with the HDD's firmware, or are some other operations performed through assembler, or something else entirely...
What does it look like from the highest level to the lowest?
Better to start at the bottom and work your way up.
Disk drives are organized, at the lowest level, into a collection of sectors, tracks, and heads. Sectors are segments of a track; tracks are areas on the disk itself, traced by the head's position as the platter spins underneath it; and the head is the actual element that reads the data from the platter.
Since tracks are measured based on the distance of the head from the center of the disk, you can see how, towards the center of the disk, the "length" of a track is shorter than one at the outer edge of the disk.
Sectors are pieces of a track, typically of a fixed length. So an inner track will hold fewer sectors than an outer track.
Much of this disk geometry is handled by the drive controllers themselves nowadays, though in the past this organization was managed directly by the operating systems and the disk drivers.
The drive electronics and disk drivers cooperate to try and represent the disk as a sequential series of fixed length blocks.
So, you can see that if you have a 10MB drive, and you use 512 byte disk blocks, then that drive would have a capacity of 20,480 "blocks".
This block organization is the foundation upon which everything else is built. Once you have this capability, you can tell the disk, via the disk driver and drive controller, to go to a specific block on the disk, and read/write that block with new data.
A file system organizes this heap of blocks into its own structure. The FS must track which blocks are being used, and by which files.
Most file systems have a fixed location "where they start", that is, some place that upon start up they can go to try and find out information about the disk layout.
Consider a crude file system that doesn't have directories and supports files that have 8-letter names and 3-letter extensions, plus 1 byte of status information, and 2 bytes for the block number where the file starts on the disk. We can also assume that the system has a hard limit of 1024 files. Finally, it must know which blocks on the disk are being used. For that it will use 1 bit per block.
This information is commonly called the "file system metadata". When a disk is "formatted", nowadays it's simply a matter of writing new file system metadata. In the old days, it was a matter of actually writing sector marks and other information on blank magnetic media (commonly known as a "low level format"). Today, most drives already have a low level format.
For our crude example, we must allocate space for the directory, and space for the "Table of Contents", the data that says which blocks are being used.
We'll also say that the file system must start at block 16, so that the OS can use the first 16 blocks for, say, a "boot sector".
So, at block 16, we need to store 14 bytes (each file entry) * 1024 (number of files) = 14,336 bytes, or 14K. Divide that by 512 (the block size) and the directory takes 28 blocks. For our 10MB drive with its 20,480 blocks, the block bitmap needs 20,480 bits / 8 (bits/byte) = 2,560 bytes / 512 = 5 blocks.
Of the 20,480 blocks available on the disk, the file system metadata is therefore 33 blocks. Add in the 16 for the OS, and that's 49 blocks out of the 20,480, leaving 20,431 "free blocks".
Finally, each of the data blocks reserves the last 2 bytes to point to the next block in the file.
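Written down as code, the directory entry described so far could look like this (a sketch; a packed 14-byte on-disk layout is assumed):

```c
#include <stdint.h>

/* One directory entry in the crude example FS: 8 + 3 + 1 + 2 = 14
 * bytes. (This happens to need no padding on most compilers; real
 * FS code would serialize it field by field to be safe.) */
struct dir_entry {
    char     name[8];      /* 8-letter file name, space-padded   */
    char     ext[3];       /* 3-letter extension                 */
    uint8_t  status;       /* 1 byte of status information       */
    uint16_t first_block;  /* block number where the file starts */
};
```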
Now, to read a file, you look up the file name in the directory blocks. From there, you find the offset of the first data block of the file. You read that data block and grab the last two bytes. If those two bytes are 00 00, that's the end of the file. Otherwise, take that number, load that data block, and keep going until the entire file is read.
The file system code hides the details of the pointers at the end, and simply loads blocks into memory for use by the program. If the program does a read(buffer, 10000), you can see how this translates into reading several blocks of data from the disk until the buffer has been filled or the end of file is reached.
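That loop, sketched out (read_block is a hypothetical driver call that loads one 512-byte block):

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical driver call: load one 512-byte block into `dst`. */
extern void read_block(uint16_t block_no, uint8_t *dst);

enum { BLOCK_SIZE = 512, PAYLOAD = BLOCK_SIZE - 2 };

/* Follow the 2-byte chain pointer at the end of each block until
 * the 00 00 end-of-file marker, as described above. */
void read_file(uint16_t first_block, uint8_t *out)
{
    uint8_t block[BLOCK_SIZE];
    uint16_t next = first_block;
    while (next != 0) {
        read_block(next, block);
        memcpy(out, block, PAYLOAD);            /* 510 data bytes */
        out += PAYLOAD;
        next = block[510] | (block[511] << 8);  /* next block no. */
    }
}
```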
To write a file, the system must first find free space in the directory. Once it has that, it finds a free block in the TOC bitmap. Finally, it takes the data, writes the directory entry, sets its first block to the available block from the bitmap, marks the bit in the bitmap, and then writes the data to the correct block. The system will buffer this information so that it ideally only has to write the blocks once, when they're full.
As it writes the blocks, it continues to consume bits from the TOC, and chains the blocks together as it goes.
Beyond that, a "file copy" is a simple process for a system leveraging the file system code and disk drivers. The file copy simply reads a buffer in, fills it up, and writes the buffer out.
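In POSIX-flavored C, the core of it is no more than this sketch (error handling trimmed):

```c
#include <fcntl.h>
#include <unistd.h>

/* The whole of a "file copy", once the layers below do their jobs:
 * read a buffer in, fill it up, write the buffer out. */
void copy_file(const char *src_path, const char *dst_path)
{
    char buf[4096];
    ssize_t n;
    int src = open(src_path, O_RDONLY);
    int dst = open(dst_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);

    while ((n = read(src, buf, sizeof buf)) > 0)
        write(dst, buf, n);

    close(src);
    close(dst);
}
```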
The file system has to maintain all of the meta data, keep track of where you are reading from a file, or where you are writing. For example, if you read only 100 bytes from a file, obviously the system will need to read the entire 512 byte datablock, and then "know" it's on byte 101 for when you try to read another 100 bytes from the file.
Also, I hope it's obvious, this is a really, really crude file system layout, with lots of issues.
But the fundamentals are there, and all file systems work in some manner similar to this, but the details vary greatly (most modern file systems don't have hard limits any more, as a simple example).
This is a question demanding a really long answer, but I'm trying to keep it brief.
Basically, the .NET Framework wraps some "native" calls, calls that are processed in lower-level libraries. These lower-level calls are often wrapped in buffering logic to hide complicated things like synchronizing file contents from you.
Below that is the native level, interacting with the OS kernel. The kernel, the core of any operating system, then translates your high-level instruction into something your hardware can understand. Windows and Linux, for example, both use a hardware abstraction layer, a system that hides hardware-specific details behind a generic interface. Writing a driver for a specific device is then only a matter of implementing all the methods a certain class of device has to provide.
Before anything gets called on your hardware, the filesystem gets involved, and the filesystem itself also buffers and caches a lot, but again transparently, so you don't even notice. The last element in the call chain is the device itself, and again, most devices conform to some standard (like SATA or IDE) and can thus be interfaced in a similar manner.
I hope this helps :-)
The .NET framework invokes the Windows API.
The Windows API has functions for managing files across various file systems.
Then it depends on the file system in question. Remember, it's not necessarily a "normal" file system over an HDD; it could even be a shell extension that just emulates a drive and keeps the data in your Gmail account, or whatever. The point is that the same file manipulation functions in the Windows API are used as an abstraction over many possible lower layers of data.
So the answer really depends on the kind of file system you're interested in.
