Where is the early warning on the LTO tape? - scsi

SSC-5 says:
4.2.5 Early-warning: If writing, the application client needs an indication that it is approaching the end of the permissible recording area (i.e., end of the partition (see 4.2.7)). This position, called early-warning (EW), is typically reported to the application client at a position early enough for the device to write any buffered logical objects to the medium while still leaving enough room for additional recorded logical objects (see figure 10 and figure 11). Some American National Standards include physical requirements for a marker placed on the medium to be detected by the device as early-warning.
Can anyone tell me where EW is on an LTO tape, e.g. LTO-5 or LTO-6?
Does it depend on the vendor of the tape?
Is the distance from EW to EOP tens or hundreds of MB?
I can't find the reference...

Here is a direct quote from an HPE LTO-6 white paper. Note: EWEOM stands for "Early Warning End Of Media".
The EWEOM is set to be slightly less than the native capacity of 2.5 TB for LTO-6 cartridges, as required by the LTO format. Crucially, however, the EWEOM is slightly before the actual physical end of tape, which means every LTO Ultrium format cartridge has a little bit more capacity than the stated headline figure. For LTO-6, this additional space is the equivalent of an additional 5% of capacity, although it is reserved exclusively for the system and cannot be accessed via the backup software. The excess tape is the first section of the media that is used when there are higher than expected errors, so that any rewrite and error correction takes place without losing the stated capacity of the tape.
Going back to your questions:
Can anyone tell me where EW is on an LTO tape, e.g. LTO-5 or LTO-6?
5% of additional capacity corresponds to 125 GB on LTO-6 media (2500 GB × 5% = 125 GB). This means EW (EWEOM) on LTO-6 should sit roughly 7 wraps before EOP, since 1 wrap ≈ 18 GB on LTO-6 and 125 / 18 ≈ 6.9. Note that this location depends on the generation. As an example, if we assume that LTO-5 media also has 5% of additional capacity, that region would hold 75 GB (1500 GB × 5%), which corresponds to roughly 4 wraps. This is just an example - I could not find the exact spare capacity of LTO-5.
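For reference, the arithmetic above as a tiny C sketch; the 5% spare fraction and the ~18 GB-per-wrap figure are the assumptions stated above, not values taken from the LTO specification:

    #include <stdio.h>

    int main(void)
    {
        double native_gb = 2500.0; /* LTO-6 native capacity in GB */
        double spare     = 0.05;   /* assumed spare fraction (from the HPE quote) */
        double wrap_gb   = 18.0;   /* assumed capacity per wrap on LTO-6 */

        double spare_gb = native_gb * spare;            /* 125 GB */
        printf("spare: %.0f GB, roughly %.1f wraps before EOP\n",
               spare_gb, spare_gb / wrap_gb);           /* ~6.9 wraps */
        return 0;
    }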
Does it depend on the vendor of the tape?
Since this spare capacity is required by the LTO format, I believe the location is independent of the tape manufacturer.
Is the distance from EW to EOP tens or hundreds of MB?
Once again, LTO-6 has 5% of spare capacity, which corresponds to 125 GB. I guess this margin depends on the generation, but it should be roughly a few percent. This is my best guess.


How to find average seek time in disk scheduling algorithms?

Seek time: the amount of time required to move the read/write head from its current position to the desired track.
I am looking for the formula for average seek time used in disk scheduling algorithms.
The first step is to determine the geometry of the device itself. This is difficult. Modern hard disks cannot be described by the old "cylinders, heads, sectors" triplet: the number of sectors per track differs from track to track (more sectors on outer tracks where the circumference is greater, fewer on inner tracks where the circumference is smaller), and all of the information you can get about the drive (from the device itself, or from any firmware or OS API) is a lie to make legacy software happy.
To work around that you need to resort to "benchmarking tactics". Specifically, read from LBA sector 0 then LBA sector 1 and measure the time it took (to establish a baseline for "time taken when both sectors are in the same track"). Then read from LBA sector 0 then LBA sector N in a loop (with N starting at 2 and increasing), measuring the time it takes, comparing it to the previous value, and looking for the larger jump in time taken that indicates you've found the boundary between "track 0" and "track 1". Then repeat this (starting with the first sector in "track 1") to find the boundary between "track 1" and "track 2", and keep repeating it to build an array of "how many sectors on each track". Note that it is not this simple - there are various pitfalls (e.g. physical sectors larger than logical sectors, sectors that are interleaved on the track, bad block replacement, internal caches built into the disk drive, etc.) that need to be taken into account. Of course this will be excessively time consuming (e.g. you don't want to do it for every disk every time an OS boots), so you'll want to obtain the hard disk's identification (manufacturer and model number) and store the auto-detected geometry somewhere, so that you can skip the auto-detection if the geometry for that model of disk was previously stored.
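A minimal C sketch of that probing loop. The read_lba() and now_ns() primitives are hypothetical placeholders for a raw, uncached device read and a monotonic high-resolution clock (both platform-specific), and the 2x threshold is an illustrative guess:

    #include <stdint.h>

    /* Hypothetical platform primitives - replace with raw device access
       (e.g. O_DIRECT reads) and a high-resolution timer on your OS. */
    extern void     read_lba(uint64_t lba);   /* synchronous, uncached read */
    extern uint64_t now_ns(void);             /* monotonic time in nanoseconds */

    /* Find the first LBA that lies on the track after first_lba: reads that
       cross a track boundary take noticeably longer than same-track reads. */
    uint64_t find_track_boundary(uint64_t first_lba)
    {
        /* Baseline: two adjacent sectors, assumed to share a track. */
        read_lba(first_lba);
        uint64_t t0 = now_ns();
        read_lba(first_lba + 1);
        uint64_t baseline = now_ns() - t0;

        for (uint64_t n = 2; ; n++) {
            read_lba(first_lba);              /* reposition the head */
            uint64_t t = now_ns();
            read_lba(first_lba + n);
            uint64_t elapsed = now_ns() - t;
            /* Crude threshold; real code must average repeated runs and
               defeat the drive's internal cache. */
            if (elapsed > 2 * baseline)
                return first_lba + n;         /* first sector of next track */
        }
    }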
The next step is to use the information about the real geometry (and not the fake geometry), combined with more "benchmarking tactics", to determine performance characteristics. Ideally you'd be trying to find constants for a formula like expected_time = sector_read_time + rotational_latency + distance_between_tracks * head_travel_time + head_settle_time, which could be done like this (a sketch of the resulting cost model follows the list):
measure the time to read the first sector in the first track and then sector N in the first track, for every value of N (every sector in the first track); find the minimum time it can take, divide it by 2, and call it sector_read_time.
measure the time to read the first sector in the first track and then sector N in the first track, for every value of N (every sector in the first track); find the maximum time it can take, divide it by the number of sectors in the first track, and call it rotational_latency.
measure the time to read the first sector in track N and then the first sector in track N+1, with N ranging from 0 to max_track - 1; take the average and call it time0.
measure the time to read the first sector in track N and then the first sector in track N+2, with N ranging from 0 to max_track - 2; take the average and call it time1.
assume head_travel_time = time1 - time0
assume head_settle_time = time0 - head_travel_time - sector_read_time
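Once those constants are measured, the model itself is trivial. A sketch (the struct and its field names are mine; the values would come from the benchmarks above):

    /* Seek-cost model built from the measured constants (all times in ns). */
    typedef struct {
        double sector_read_time;
        double rotational_latency;
        double head_travel_time;
        double head_settle_time;
    } disk_model;

    double expected_time(const disk_model *m, unsigned track_distance)
    {
        return m->sector_read_time
             + m->rotational_latency
             + (double)track_distance * m->head_travel_time
             + m->head_settle_time;
    }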
Note that there are various pitfalls with this too (same as before), and (if you work around them) the best you can hope for is a generic estimate (and not an accurate predictor).
Of course this will also be excessively time consuming, and if you're storing the auto-detected geometry somewhere it'd be a good idea to also store the auto-detected performance characteristics in the same place, so that you can skip all of the auto-detection if all of the information for that model of disk was previously stored.
Note that all of the above assumes a stand-alone rotating-platter hard disk with no caching and no hybrid/flash layer, and will be completely useless for a lot of cases. For some of the other cases (SSD, CD/DVD) you'd need different techniques to auto-detect their geometry and/or characteristics. Then there are things like RAID and virtualisation to complicate things further.
Mostly, it's far too much hassle to bother with in practice.
Instead, just assume that cost = abs(previous_LBA_sector_number - next_LBA_sector_number), and/or let the hard disk sort out the optimum order itself (e.g. using Native Command Queuing - see https://en.wikipedia.org/wiki/Native_Command_Queuing ).
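If you do go with the simple LBA-distance cost, a shortest-distance-first scheduler reduces to a few lines of C (a sketch; pick_next and the pending array are my own illustrative names, and it assumes n >= 1):

    #include <stdint.h>
    #include <stddef.h>

    /* Pick the pending request with the smallest LBA distance from the
       current head position (a shortest-seek-time-first approximation). */
    size_t pick_next(uint64_t current_lba, const uint64_t *pending, size_t n)
    {
        size_t best = 0;
        uint64_t best_cost = UINT64_MAX;
        for (size_t i = 0; i < n; i++) {
            uint64_t cost = pending[i] > current_lba
                          ? pending[i] - current_lba
                          : current_lba - pending[i];
            if (cost < best_cost) { best_cost = cost; best = i; }
        }
        return best;
    }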

Logical block size on SSD

I'm currently working on a custom test/benchmark for an SSD (a CFast card) that runs on Win10 (written in C++). Part of the job is to read and interpret the S.M.A.R.T. attributes reported by the SSD. The one I'm interested in right now is called "Total Host LBAs written", i.e. the number of LBAs written by the host system. The information I'm missing is: what is the size of memory one LBA refers to, in bytes?
I have done some homework on how SSDs work internally, but I'm a bit confused here and hope somebody could shed some light on this; I am obviously missing something:
The FTL (Flash Translation Layer) in the SSD performs, amongst other operations (wear-leveling, garbage-collection etc.), LBA-to-physical address mapping.
The smallest memory unit that is individually readable/writable in an SSD is a page. In my case, the page is said to be 16 KiB. From this I would naively conclude that the LBA size will be the same as the page size, i.e. 16 KiB (or an integer multiple of it).
On the other hand, I would expect the LBA to have the size of the "sector" reported by GetDiskFreeSpace() from the WinAPI, which reports 512 B (with "SectorsPerCluster" = 8).
So, where am I going wrong, and what is the real LBA size I can count on (or how can I get its value)? If the LBA size were 512 B (or 8 × 512 = 4 KiB), an LBA would refer to 1/32 (or 1/4) of my flash page, which seems inefficient. I understand there's a need to emulate older storage, but if it's allowed to write a single LBA, what does the SSD do then? Does it cache the whole page, rewrite the 1/32 part corresponding to the LBA, write it to an empty block, and update the LBA-to-physical-address table?
Edit: sorry for using "LBA size", I know it's not semantically 100% correct, hopefully it's understandable...
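For the host-visible part of this, you can ask Windows directly for the logical and physical sector sizes; an LBA addresses one logical sector (which unit the vendor's "Total Host LBAs written" counter actually counts in is vendor-defined, so check the SMART documentation for the device). A minimal C sketch using IOCTL_STORAGE_QUERY_PROPERTY; \\.\PhysicalDrive0 is an example path:

    #include <windows.h>
    #include <winioctl.h>
    #include <stdio.h>

    int main(void)
    {
        /* Example device path; adjust for the CFast card under test. */
        HANDLE h = CreateFileA("\\\\.\\PhysicalDrive0", 0,
                               FILE_SHARE_READ | FILE_SHARE_WRITE,
                               NULL, OPEN_EXISTING, 0, NULL);
        if (h == INVALID_HANDLE_VALUE) return 1;

        STORAGE_PROPERTY_QUERY q = {0};
        q.PropertyId = StorageAccessAlignmentProperty;
        q.QueryType  = PropertyStandardQuery;

        STORAGE_ACCESS_ALIGNMENT_DESCRIPTOR d = {0};
        DWORD got = 0;
        if (DeviceIoControl(h, IOCTL_STORAGE_QUERY_PROPERTY,
                            &q, sizeof q, &d, sizeof d, &got, NULL)) {
            printf("logical sector:  %lu B\n",
                   (unsigned long)d.BytesPerLogicalSector);
            printf("physical sector: %lu B\n",
                   (unsigned long)d.BytesPerPhysicalSector);
        }
        CloseHandle(h);
        return 0;
    }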

How can I force an L2 cache miss?

I want to study the effects of L2 cache misses on CPU power consumption. To measure this, I have to create benchmarks that gradually increase the working set size such that core activity (micro-operations executed per cycle) and L2 activity (L2 requests per cycle) remain constant, while the ratio of L2 misses to L2 requests increases.
Can anyone show me an example of a C program which forces "N" L2 cache misses?
You can generally force cache misses at some cache level by randomly accessing a working set larger than that cache level1.
You would expect the probability of any given load being a miss to be something like: p(hit) = min(1, C / W) and p(miss) = 1 - p(hit), where p(hit) and p(miss) are the probabilities of a hit and a miss, C is the relevant cache size, and W is the working set size. So for a miss rate of 50%, use a working set of twice the cache size.
A quick look at the formula above shows that p(miss) will never be 100%, since C/W only goes to 0 as W goes to infinity (and you probably can't afford an infinite amount of RAM). So your options are:
Getting "close enough" by using a very large working set (e.g., 4 GB gives you a 99%+ miss chance for a 256 KB), and pretending you have a miss rate of 100%.
Applying the formula to determine the actual expected number of misses. E.g., if you are using a working size of 2560 KB against an L2 cache of 256 KB, you have a miss rate of 90%. So if you want to examine the effect of 1,000 misses, you should make 1000 / 0.9 = ~1111 memory access to get about 1,000 misses.
Use any approximate approach but then actually count the number of misses you incur using the performance counter units on your CPU. For example, on Linux you could use PAPI or on Linux and Windows you could use Intel's PCM (if you are using Intel hardware).
Use an "almost random" approach to force the number of misses you want. The formula above is valid for random accesses, but if you choose you access pattern so that it is random with the caveat that it doesn't repeat "recent" accesses, you can get a 100% miss ratio. Here "recent" means accesses to cache lines that are likely to still be in the cache. Calculating what that means exactly is tricky, and depends in detail on the associativity and replacement algorithm of the cache, but if you don't repeat any access that has occurred in the last cache_size * 10 accesses, you should be pretty safe.
As for the C code, you should at least show us what you've tried. A basic outline is to create a vector of bytes or ints or whatever with the required size, then to randomly access that vector. If you make each access dependent on the previous access (e.g., use the integer read to calculate the index of the next read) you will also get a rough measurement of the latency of that level of cache. If the accesses are independent, you'll probably have several outstanding misses to the cache at once, and get more misses per unit time. Which one you are interested in depend on what you are studying.
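Here is a sketch of the dependent-access (pointer-chasing) variant in C, combining the "almost random" idea above with a ring so that no line is revisited until the whole set has been walked. The 256 KiB L2 size is an assumption; scale WORKING_SET_BYTES to your cache, and verify the actual miss count with performance counters:

    #include <stdio.h>
    #include <stdlib.h>

    #define CACHE_LINE 64
    /* Assumed 256 KiB L2; a working set ~10x larger makes almost every
       dependent access an L2 miss. */
    #define WORKING_SET_BYTES (10 * 256 * 1024)
    #define NUM_LINES (WORKING_SET_BYTES / CACHE_LINE)
    #define STRIDE (CACHE_LINE / sizeof(size_t))   /* one slot per cache line */

    int main(void)
    {
        size_t *next  = malloc(NUM_LINES * CACHE_LINE);
        size_t *order = malloc(NUM_LINES * sizeof *order);
        if (!next || !order) return 1;

        /* Shuffle the line order (Fisher-Yates)... */
        for (size_t i = 0; i < NUM_LINES; i++) order[i] = i;
        for (size_t i = NUM_LINES - 1; i > 0; i--) {
            size_t j = rand() % (i + 1);
            size_t t = order[i]; order[i] = order[j]; order[j] = t;
        }
        /* ...then link the lines into one ring, so the chain touches every
           line exactly once per lap and never revisits a "recent" line. */
        for (size_t i = 0; i < NUM_LINES; i++)
            next[order[i] * STRIDE] = order[(i + 1) % NUM_LINES] * STRIDE;
        free(order);

        /* Dependent loads: each address comes from the previous load, so
           misses serialize; roughly one L2 miss per iteration is expected. */
        size_t p = 0;
        long n_accesses = 10 * 1000 * 1000;
        for (long i = 0; i < n_accesses; i++)
            p = next[p];

        printf("%zu\n", p);   /* keep the loop from being optimized away */
        free(next);
        return 0;
    }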
For an open source project that does this kind of memory testing across different stride and working set sizes, take a look at TinyMemBench.
1 This gets a bit trickier for levels of caches that are shared among cores (usually L3 for recent Intel chips, for example) - but it should work well if your machine is pretty quiet while testing.

micro-programmed control circuit and one question

I ran into a question:
In a digital system with a micro-programmed control circuit, the total number of distinct operation patterns of the 32 control signals is 450. If the micro-program memory contains 1K micro-instructions, how many bits are saved in the micro-program memory by using a nano memory?
1) 22 Kbits
2) 23 Kbits
3) 450 Kbits
4) 450*32 Kbits
My notes say that (1) is true, but I couldn't understand how we get this.
Edit: Micro-instructions are stored in the micro memory (control memory). There is a chance that a group of micro-instructions may occur several times in a micro-program, so more memory space is needed. By making use of nano memory we can achieve significant memory savings when a group of micro-operations occurs several times in a micro-program. For the nano technique, see:
Control Units
Back in the day, before .NET, you actually had to know what a computer was before you could make it do stuff, and this question would have gotten a ton of answers.
Except, back then, the internet wasn't really a thing, and stack overflow was not really a problem, as the concept of a stack and a heap wasn't really a standard..
So, just to make sure that we are in fact talking about the same thing, I will try to explain this..
The control unit in a digital computer initiates sequences of micro-operations. In a bus-oriented system, the control signals that specify micro-operations are groups of bits that select the paths in multiplexers, decoders, and ALUs.
So we are looking at the control unit and the instruction set that makes it capable of actually doing stuff.
We are dealing with what steps should happen when the compiled assembly requests a bit shift, clearing a register, or similar "low level" stuff.
Some of these instructions may be hardwired, but usually not all of them.
Micro-programs
Quote: "Microprogramming is an orderly method of designing the control unit
of a conventional computer"
(http://www2.informatik.hu-berlin.de/rok/ca/data/slides/english/ca9.pdf)
The control variables for the control unit can be represented by a string of 1s and 0s called a "control word". A micro-programmed control unit is a control unit whose binary control variables are not hardwired but are stored in a memory. Before we optimized stuff, we called this memory the micro memory ;)
Typically we would actually be looking at two "memories": a control memory and a main memory. The control memory is for the micro-program, and the main memory is for instructions and data. The process of code generation for the control memory is called microprogramming.
... ok?
Transfer of information among registers in the processor goes through MUXes rather than a bus. We typically have a few registers, some of which are familiar to programmers and some of which are not. The ones that should ring a bell for most people here are the processor registers. The four most common processor registers are:
Program counter – PC
Address register – AR
Data register – DR
Accumulator register - AC
Examples where microcode uses processor registers to do stuff
Assembly instruction "ADD"
pseudo micro code: " AC ← AC + M[EA] " where M[EA] is data from main memory register
control word: 0000
Assembly instruction "BRANCH"
pseudo micro code "If (AC < 0) then (PC ← EA) "
control word: 0001
Micro-memory
The micro memory only concerns how we organize what's in the control memory.
However, when we have big instruction sets, we can do better than simply storing all the instructions. We can subdivide the control memory into "control memory" and "nano memory" (since nano is smaller than micro, right ;) )
This is good as we don't waste a lot of valuable space (chip area) on microcode.
The concept of nano memory is derived from a combination of vertical and horizontal instructions, but also provides trade-offs between them.
The Motorola M68k microcomputer is one of the earlier and more popular µComputers with this nano memory control design. Here it was shown that a significant saving of memory could be achieved when a group of micro-instructions occurs often in a micro-program.
It was also shown that, by structuring the memory properly, a few bits could be used to address the instructions without a significant cost in speed.
The reduction comes from the fact that only ceil(log_2(n)) bits are required to specify a nano-address (where n is the number of unique micro words stored in the nano memory), compared to the full width of a micro word.
what does this mean?
Well, let's stay with the M68k example a bit longer:
It had 640 micro-instructions, of which only 280 were unique.
Had the instructions been stored in a plain micro memory, it would have taken up:
640 × 70 bits, or 44,800 bits.
However, since only the 280 unique instructions need the full 70 bits, we can store those once in the nano memory and replace every micro word with a pointer into it:
ceil(log_2(280)) = 9 bits per pointer (since 2^8 = 256 < 280 <= 512 = 2^9)
a 640 × 9-bit micro control store, plus a 280 × 70-bit nano memory store
a total of 640 × 9 + 280 × 70 = 5,760 + 19,600 = 25,360 bits
or a memory saving of 44,800 - 25,360 = 19,440 bits.. which could be laid out as main memory for programmers :)
this shows that the equation:
S = Hm × Wm + Hn × Wn
where:
Hm = number of micro (control memory) words
Wm = width of a micro word (here, the nano-address width)
Hn = number of nano memory words (the unique micro-instructions)
Wn = width of a nano word
S = control memory size (with the nano memory technique)
holds in real life.
Note that micro memory is usually laid out vertically (Hm large, Wm small) and nano memory the opposite (Hn small, Wn large).
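As a sanity check, here is that equation with the M68k numbers from above plugged in (a small C sketch; the values are the ones quoted in this answer):

    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        unsigned Hm = 640, Wn = 70, Hn = 280;  /* M68k: words, width, uniques */

        unsigned Wm    = (unsigned)ceil(log2(Hn)); /* nano-address width: 9 bits */
        unsigned plain = Hm * Wn;                  /* single-level store: 44800  */
        unsigned S     = Hm * Wm + Hn * Wn;        /* two-level store:    25360  */

        printf("S = %u bits, saving %u bits\n", S, plain - S); /* saves 19440 */
        return 0;
    }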
Back to the question
I had a few problems understanding the wording of the problem - that may be because my first language is Danish - but I still tried to make some sense of it and got to:
proposition 1:
1000 instructions
32-bit control words
450 unique patterns
plain µCode: 1000 × 32 = 32,000 bits
bit width required to address the nano memory: ceil(log_2(450)) = 9 (since 2^9 = 512 >= 450)
nano store: 450 × 32 = 14,400 bits
micro store: 1000 × 9 = 9,000 bits
32,000 - (14,400 + 9,000) = 8,600 bits saved
Which is not any of your answers.
please provide clarification?
UPDATE:
"the control word is 32 bit. we can code the 450 pattern with 9 bit and we use these 9 bits instead of 32 bit control word. reduce memory from 1000*(32+x) to 1000*(9+x) is equal to 23kbits. – Ali Movagher"
There is your problem: the 450 patterns can indeed be coded with 9 bits (2^9 = 512 >= 450), but that calculation counts only the narrower micro store and ignores the 450 × 32 = 14,400 bits the nano memory itself needs, so I still cannot reach any of the given answers..

Understanding Negative Virtual Memory Pressure

I was re-reading Poul-Henning Kamp's paper entitled, "You're Doing It Wrong" and one of the diagrams confused me.
The x-axis of Figure 1 is labeled as "VM pressure in megabytes". The author clarifies the x-axis as being "measured in the amount of address space not resident in primary memory, because the kernel paged it out to secondary storage".
I can understand zero MB of VM pressure (all of the address space is resident in primary memory).
I can understand positive VM pressure, but I'm having a tough time picturing what negative 8 megabytes of VM pressure looks like (see the left of the x-axis of Figure 1). Plugging negative 8 into the author's description leaves me with "-8 MB of address space not resident in primary memory", which doesn't make sense to me.
If I just conclude that the author accidentally negated positive numbers, the chart makes more sense, but I'm not ready to conclude that the author made the mistake; it's more likely that I have. But then, as the pressure decreases, the runtime increases? That sounds counterintuitive.
I'm also not sure why there is a drastic change in the curves around -8 MB of VM pressure.
Thanks in advance!
Read "measured in the difference between amount of address space resident in primary memory and total required amount".
Word "not" somehow represents that minus sign.
