Measuring read time performance on HDD

Assume we have an HDD with a throughput of 100 MB/s and a seek time of 10 ms.
How long will it take to read 50 MB from the disk?
Friend's answer:
Read Time = Seek Time + 50 MB / Throughput = 10 ms + 500 ms = 510 ms.
My concerns:
First of all, my understanding is that an HDD contains platters, platters contain sectors, and sectors contain tracks. The right sector needs to be brought under the head (i.e. rotational latency), and then the head needs to move between the tracks in that sector (i.e. seek time) to read the data. Correct me if I am wrong here.
Now,
a) Doesn't this answer assume that the data lies within contiguous sectors?
Because if the data were fragmented, the disk would be required to rotate to bring the right sector under the head; shouldn't that add rotational latency as well?
b) Also, does seek time mean the time from when the data was requested until the data was delivered to the CPU, or does it mean the time for the head to move between the tracks within a sector? The definition keeps varying.
c) My answer for the worst case:
Assume the sector size is 512 bytes.
Read Time = (50 MB / 512 bytes × rotational delay) + Seek Time + 50 MB / Throughput.
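For a rough sense of scale, here is a small sketch (mine, not from the post) that plugs numbers into both formulas; the ~4.17 ms average rotational delay (half a revolution at 7200 RPM) is an assumed value for illustration:

#include <cstdio>

int main() {
    // Assumed drive parameters (illustrative only).
    const double throughput  = 100e6;   // bytes per second
    const double seek_time   = 0.010;   // seconds
    const double rot_delay   = 0.00417; // seconds, ~half a 7200 RPM revolution (assumption)
    const double request     = 50e6;    // bytes to read
    const double sector_size = 512.0;   // bytes

    // Friend's answer: one seek, then a contiguous transfer.
    double best = seek_time + request / throughput;

    // Worst case from the question: one extra rotational delay per sector.
    double worst = seek_time + (request / sector_size) * rot_delay + request / throughput;

    printf("contiguous: %.3f s, fully fragmented: %.1f s\n", best, worst);
    return 0;
}

The contiguous case lands at the friend's 0.51 s, while the fully fragmented worst case is dominated by rotational delays and runs to several minutes, which is exactly the concern raised in (a).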

Related

How to find average seek time in disk scheduling algorithms?

Seek Time: the amount of time required to move the read/write head from its current position to the desired track.
I am looking for the formula for average seek time used in disk scheduling algorithms.
The first step is to determine the geometry of the device itself. This is difficult. Modern hard disks cannot be described by the old "cylinders, heads, sectors" triplet; the number of sectors per track differs between tracks (more sectors on outer tracks where the circumference is greater, fewer sectors on inner tracks where the circumference is smaller), and all of the information you can get about the drive (from the device itself, or from any firmware or OS API) is a lie to make legacy software happy.
To work around that you need to resort to "benchmarking tactics". Specifically, read LBA sector 0 then LBA sector 1 and measure the time it takes (to establish a baseline for "time taken when both sectors are on the same track"), then read LBA sector 0 then LBA sector N in a loop (with N starting at 2 and increasing), measuring the time it takes, comparing it to the previous value, and looking for a larger jump in time taken that indicates you've found the boundary between "track 0" and "track 1". Then repeat this (starting with the first sector in "track 1") to find the boundary between "track 1" and "track 2", and keep repeating this to build an array of "how many sectors on each track".
Note that it is not this simple - there are various pitfalls (e.g. physical sectors larger than logical sectors, sectors that are interleaved on the track, bad block replacement, internal caches built into the disk drive, etc.) that need to be taken into account. Of course this will be excessively time consuming (e.g. you don't want to do this for every disk every time an OS boots), so you'll want to obtain the hard disk's identification (manufacturer and model number) and store the auto-detected geometry somewhere, so that you can skip the auto-detection if the geometry for that model of disk was previously stored.
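As a rough sketch of that probing loop (my code, not from the answer): read_lba() is a stub standing in for a raw, uncached single-sector read on whatever platform you target, and the 1.5x threshold used to spot the track boundary is an arbitrary illustrative choice:

#include <chrono>
#include <vector>

// Placeholder: replace with a raw, uncached single-sector read of LBA n
// (via the OS's raw-device API); a stub so the sketch compiles.
static void read_lba(unsigned long long /*lba*/) {}

// Time "read LBA a, then LBA b" in seconds.
static double time_pair(unsigned long long a, unsigned long long b) {
    auto t0 = std::chrono::steady_clock::now();
    read_lba(a);
    read_lba(b);
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(t1 - t0).count();
}

// Builds the "how many sectors on each track" array by looking for the jump
// in pair-read time that marks a track boundary.
std::vector<unsigned> detect_sectors_per_track(unsigned long long total_sectors) {
    std::vector<unsigned> sectors_per_track;
    unsigned long long track_start = 0;
    while (track_start < total_sectors) {
        double same_track = time_pair(track_start, track_start + 1); // baseline
        unsigned long long n = 2;
        while (track_start + n < total_sectors &&
               time_pair(track_start, track_start + n) < 1.5 * same_track) {
            ++n;                        // still on the same track (no big jump yet)
        }
        sectors_per_track.push_back(static_cast<unsigned>(n));
        track_start += n;               // first sector of the next track
    }
    return sectors_per_track;
}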
The next step is to use the information about the real geometry (and not the fake geometry), combined with more "benchmarking tactics", to determine performance characteristics. Ideally you'd be trying to find constants for a formula like expected_time = sector_read_time + rotational_latency + distance_between_tracks * head_travel_time + head_settle_time, which could be done like this (a rough code sketch follows the list):
measure the time to read the first sector in the first track and then sector N in the first track, for every value of N (for every sector in the first track); find the minimum time it can take, divide it by 2, and call it sector_read_time.
measure the time to read the first sector in the first track and then sector N in the first track, for every value of N (for every sector in the first track); find the maximum time it can take, divide it by the number of sectors in the first track, and call it rotational_latency.
measure the time to read the first sector in track N and then the first sector in track N+1, with N ranging from 0 to "max_track - 1"; determine the average and call it time0
measure the time to read the first sector in track N and then the first sector in track N+2, with N ranging from 0 to "max_track - 2"; determine the average and call it time1
assume head_travel_time = time1 - time0
assume head_settle_time = time0 - head_travel_time - sector_read_time
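The same steps expressed as code (schematic only; time_pair() is the pair-timing helper from the previous sketch and track_start() is an assumed helper that maps a track number to its first LBA using the detected geometry):

double time_pair(unsigned long long a, unsigned long long b);  // from the previous sketch
unsigned long long track_start(unsigned track);                // assumed helper: first LBA of a track

struct DiskTimings {
    double sector_read_time;
    double rotational_latency;
    double head_travel_time;
    double head_settle_time;
};

DiskTimings estimate_timings(unsigned sectors_on_track0, unsigned max_track) {
    // Min/max of "first sector then sector N" over the first track.
    double tmin = 1e9, tmax = 0.0;
    for (unsigned n = 1; n < sectors_on_track0; ++n) {
        double t = time_pair(track_start(0), track_start(0) + n);
        if (t < tmin) tmin = t;
        if (t > tmax) tmax = t;
    }
    double sector_read_time   = tmin / 2.0;
    double rotational_latency = tmax / sectors_on_track0;

    // Average cost of hopping one track vs two tracks.
    double time0 = 0.0, time1 = 0.0;
    for (unsigned t = 0; t + 2 <= max_track; ++t) {
        time0 += time_pair(track_start(t), track_start(t + 1));
        time1 += time_pair(track_start(t), track_start(t + 2));
    }
    time0 /= (max_track - 1);
    time1 /= (max_track - 1);

    double head_travel_time = time1 - time0;
    double head_settle_time = time0 - head_travel_time - sector_read_time;
    return {sector_read_time, rotational_latency, head_travel_time, head_settle_time};
}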
Note that there are various pitfalls with this too (same as before), and (if you work around them) the best you can hope for is a generic estimate (and not an accurate predictor).
Of course this will also be excessively time consuming, and if you're storing the auto-detected geometry somewhere it'd be a good idea to also store the auto-detected performance characteristics in the same place; so that you can skip all of the auto-detection if all of the information for that model of disk was previously stored.
Note that all of the above assumes a stand-alone rotating-platter hard disk with no caching and no hybrid/flash layer, and it will be completely useless in a lot of cases. For some of the other cases (SSD, CD/DVD) you'd need different techniques to auto-detect their geometry and/or characteristics. Then there are things like RAID and virtualisation to complicate matters further.
Mostly, it's far too much hassle to bother with in practice.
Instead, just assume that cost = abs(previous_LBA_sector_number - next_LBA_sector_number) and/or let the hard disk work out the optimum order itself (e.g. using Native Command Queuing - see https://en.wikipedia.org/wiki/Native_Command_Queuing ).
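If you go the simple route, the cost model and a naive "shortest LBA distance first" ordering might look like this (a minimal sketch under that assumption, not a real scheduler):

#include <algorithm>
#include <cstdint>
#include <vector>

// Approximate cost model: seeking "further away" in LBA space costs more.
static uint64_t lba_cost(uint64_t current, uint64_t target) {
    return current > target ? current - target : target - current;
}

// Greedy "shortest distance first" ordering of pending requests under that model.
std::vector<uint64_t> order_requests(uint64_t head, std::vector<uint64_t> pending) {
    std::vector<uint64_t> ordered;
    while (!pending.empty()) {
        auto next = std::min_element(pending.begin(), pending.end(),
            [&](uint64_t a, uint64_t b) { return lba_cost(head, a) < lba_cost(head, b); });
        head = *next;
        ordered.push_back(*next);
        pending.erase(next);
    }
    return ordered;
}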

HDD access + search time calculation algorithm based on read/write speed and HDD buffer size

I want to write an application that will calculate the average (access + search) HDD time by performing a benchmark. I know how to test read/write speed from HDD to memory, and I can check the HDD's internal buffer size on the manufacturer's page. The test would be performed on a defragmented partition, so I think the result isn't a bad approximation of the real values. If the read speed were equal to the write speed, then I could do
average_value = (copy_time - (file_size / read_speed * 2)) / (file_size / HDD_buffer_size * 2)
but the read speed usually differs from the write speed, so this formula doesn't apply.
Could someone please tell me what the formula for that should be?
First, I would use sector access instead of file access to avoid the FAT and fragmentation influencing the benchmark. Sector access is platform dependent.
But any file access routine you have at your disposal should work; just change the filename to "\\\\.\\A:" or "\\\\.\\PhysicalDrive0". To access removable media like floppies and USB keys use "\\\\.\\A:", but for hard drives use "\\\\.\\PhysicalDrive0", as the first method will not work for those. Also change the drive letter A or the HDD number 0 to whatever you need...
In Windows VCL C++ I do something like this:
int hnd;
BYTE dat[512];
hnd = FileOpen("\\\\.\\PhysicalDrive0", fmOpenRead);
if (hnd >= 0)
    {
    FileSeek(hnd, 0*512, 0);    // position to the sector you want ...
    FileRead(hnd, dat, 512);    // read it
    FileClose(hnd);
    hnd = FileCreate("bootsector.dat");
    if (hnd >= 0)
        {
        FileWrite(hnd, dat, 512);
        FileClose(hnd);
        }
    }
It will open the first HDD drive as a device (that is why the file name is so weird). From the programming side you can assume it is a file containing all the sectors of your HDD.
Beware: you should not overwrite the file system!!! So if you are writing, do not forget to restore the original sectors. It is also safer to avoid write access to the first sectors (where the boot sector and FAT are usually stored), so that in case of a bug or shutdown you do not lose the FS.
The code above reads the boot sector of HDD0 (if accessible) and saves it to a file.
In case you do not have VCL, use a WinAPI C++ sector access example or whatever OS API you have at your disposal.
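For completeness, a plain WinAPI version of the same boot-sector read could look roughly like this (untested sketch; it needs administrator rights, and raw reads must be a multiple of the sector size):

#include <windows.h>
#include <stdio.h>

int main(void)
{
    // Open the first physical drive as a raw device (same idea as the VCL code).
    HANDLE h = CreateFileA("\\\\.\\PhysicalDrive0", GENERIC_READ,
                           FILE_SHARE_READ | FILE_SHARE_WRITE, NULL,
                           OPEN_EXISTING, 0, NULL);
    if (h == INVALID_HANDLE_VALUE) { printf("open failed: %lu\n", GetLastError()); return 1; }

    BYTE sector[512];           // assumes 512-byte logical sectors
    DWORD got = 0;
    // The read starts at offset 0, i.e. the boot sector / MBR of the drive.
    if (ReadFile(h, sector, sizeof(sector), &got, NULL) && got == sizeof(sector))
    {
        FILE* f = fopen("bootsector.dat", "wb");
        if (f) { fwrite(sector, 1, sizeof(sector), f); fclose(f); }
    }
    CloseHandle(h);
    return 0;
}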
For the HDD buffer estimation/measurement I would use this:
Cache size estimation on your system?
It is the same thing; just do disk access instead of memory transfers.
I would use QueryPerformanceCounter from WinAPI for the time measurement (it should be more than enough).
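A minimal timing helper around QueryPerformanceCounter (sketch):

#include <windows.h>

// Returns elapsed seconds between two QueryPerformanceCounter samples.
double elapsed_seconds(LARGE_INTEGER t0, LARGE_INTEGER t1)
{
    LARGE_INTEGER freq;
    QueryPerformanceFrequency(&freq);   // counts per second
    return (double)(t1.QuadPart - t0.QuadPart) / (double)freq.QuadPart;
}

// Usage:
//   LARGE_INTEGER t0, t1;
//   QueryPerformanceCounter(&t0);
//   ... do the disk read/write being measured ...
//   QueryPerformanceCounter(&t1);
//   double seconds = elapsed_seconds(t0, t1);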
I think you are going a bit backwards with the equations (but I might be wrong). I would:
measure read and write speeds (separately)
transfer_read_rate = transfer_read_size / transfer_read_time
transfer_write_rate = transfer_write_size / transfer_write_time
to mix both rates (the average):
transfer_avg_rate = (transfer_read_size + transfer_write_size) / (transfer_read_time + transfer_write_time)
where (transfer_read_time + transfer_write_time) can be directly measured.
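In code this is just (sizes in bytes, times in seconds, measured as above):

// Combined rate weighted by actual time, not the arithmetic mean of the two rates.
double transfer_avg_rate(double read_size, double read_time,
                         double write_size, double write_time)
{
    return (read_size + write_size) / (read_time + write_time);
}

Note this is the time-weighted rate; averaging the two rates directly would give a higher number whenever one direction is much faster than the other.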
When I change my memory benchmark to the HDD (just by replacing the STOSD transfer with continuous sector reads), I get results of the same form for my setup (benchmark table not reproduced here).
You can add a seek time measurement by simply measuring and averaging the seek time to random positions... And if you also add the HDD geometry (sectors per track), then you can measure the rates more precisely, as you can add the seek times between tracks to the equations...
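A sketch of that seek-time measurement (read_sector() stands in for the raw single-sector read shown earlier; the sample count and the crude random-LBA choice are arbitrary):

#include <windows.h>
#include <stdlib.h>

// Stub: replace with a raw 512-byte read of the given LBA (see the examples above).
static void read_sector(unsigned long long /*lba*/) {}

// Average time of reads at random positions ~= average (seek + rotational) latency,
// provided the drive's cache and OS read-ahead are not serving the requests.
double average_random_access_time(unsigned long long total_sectors, int samples)
{
    LARGE_INTEGER f, t0, t1;
    QueryPerformanceFrequency(&f);
    double sum = 0.0;
    for (int i = 0; i < samples; i++)
    {
        // Crude random LBA; a better RNG is needed to cover very large disks.
        unsigned long long lba = ((unsigned long long)rand() * rand()) % total_sectors;
        QueryPerformanceCounter(&t0);
        read_sector(lba);
        QueryPerformanceCounter(&t1);
        sum += (double)(t1.QuadPart - t0.QuadPart) / (double)f.QuadPart;
    }
    return sum / samples;
}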
Try this:
double read_time  = file_size / read_speed;
double write_time = file_size / write_speed;
double total_time_if_no_delays = read_time + write_time;
double avg_value = (copy_time - total_time_if_no_delays) / (file_size / HDD_buffer_size * 2);

Why Is a Block in HDFS So Large?

Can somebody explain this calculation and give a lucid explanation?
A quick calculation shows that if the seek time is around 10 ms and the transfer rate is 100 MB/s, to make the seek time 1% of the transfer time, we need to make the block size around 100 MB. The default is actually 64 MB, although many HDFS installations use 128 MB blocks. This figure will continue to be revised upward as transfer speeds grow with new generations of disk drives.
A block will be stored as a contiguous piece of information on the disk, which means that the total time to read it completely is the time to locate it (seek time) + the time to read its content without doing any more seeks, i.e. sizeOfTheBlock / transferRate = transferTime.
If we keep the ratio seekTime / transferTime small (close to .01 in the text), it means we are reading data from the disk almost as fast as the physical limit imposed by the disk, with minimal time spent looking for information.
This is important since in MapReduce jobs we are typically traversing (reading) the whole data set (represented by an HDFS file, folder, or set of folders) and doing logic on it. Since we have to spend the full transferTime anyway to get all the data off the disk, let's try to minimise the time spent doing seeks by reading in big chunks, hence the large size of the data blocks.
In more traditional disk access software, we typically do not read the whole data set every time, so we'd rather spend more time doing plenty of seeks on smaller blocks rather than losing time transferring too much data that we won't need.
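A quick sanity check of the quoted numbers (10 ms seek, 100 MB/s transfer, both from the quote; the block sizes tried are just examples):

#include <stdio.h>

int main(void)
{
    const double seek_ms = 10.0;             // seek time from the quote
    const double transfer_mb_per_s = 100.0;  // transfer rate from the quote
    const double block_mb[] = { 4, 64, 100, 128 };

    for (int i = 0; i < 4; i++)
    {
        double transfer_ms = block_mb[i] / transfer_mb_per_s * 1000.0;
        printf("block %6.0f MB: seek is %.1f%% of transfer time\n",
               block_mb[i], 100.0 * seek_ms / transfer_ms);
    }
    return 0;
}

At around 100 MB the seek overhead drops to roughly 1% of the transfer time, which is the target ratio in the quoted passage.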
Since 100 MB is divided into 10 blocks, you have to do 10 seeks, and the transfer time is 10 MB / (100 MB/s) = 0.1 s for each block.
(10 ms × 10) + (10 MB / (100 MB/s)) × 10 = 1.1 s, which is greater than 1.01 s anyway.
Since 100 MB is divided among 10 blocks, each block has only 10 MB, as it is HDFS. Then it should be 10 × 10 ms + 10 MB / (100 MB/s) = 0.1 s + 0.1 s = 0.2 s, and even less time.
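To reconcile the two comments above, here is the same arithmetic as a function of block count, assuming the whole 100 MB is read and one 10 ms seek is paid per block:

#include <stdio.h>

int main(void)
{
    const double total_mb = 100.0, seek_s = 0.010, rate_mb_per_s = 100.0;
    int block_counts[] = { 1, 10, 100 };

    for (int i = 0; i < 3; i++)
    {
        int n = block_counts[i];
        double t = n * seek_s + total_mb / rate_mb_per_s;   // n seeks + full transfer
        printf("%3d blocks of %5.1f MB: %.2f s\n", n, total_mb / n, t);
    }
    return 0;
}

This prints 1.01 s for a single 100 MB block, 1.10 s for ten 10 MB blocks, and 2.00 s for a hundred 1 MB blocks: smaller blocks make the fixed seek cost a bigger share of the total.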

Which is faster to process a 1TB file: a single machine or 5 networked machines?

Which is faster to process a 1TB file: a single machine or 5 networked machines? ("To process" refers to finding the single UTF-16 character with the most occurrences in that 1TB file). The rate of data transfer is 1Gbit/sec, the entire 1TB file resides on 1 computer, and each computer has a quad-core CPU.
Below is my attempt at the question, using an array of longs (with an array size of 2^16) to keep track of the character counts. This should fit into the memory of a single machine, since 2^16 x 2^3 bytes (the size of a long) = 2^19 bytes = 0.5MB. Any help (links, comments, suggestions) would be much appreciated. I used the latency times cited by Jeff Dean, and I tried my best to use the best approximations that I knew of. The final answer is:
Single Machine: 5.8 hrs (due to slowness of reading from disk)
5 Networked Machines: 7.64 hrs (due to reading from disk and network)
1) Single Machine
a) Time to Read File from Disk --> 5.8 hrs
-If it takes 20ms to read 1MB seq from disk,
then to read 1TB from disk takes:
20ms/1MB x 1024MB/GB x 1024GB/TB = 20,972 secs
= 350 mins = 5.8 hrs
b) Time needed to fill array w/complete count data
--> 0 sec since it is computed while doing step 1a
-At 0.5 MB, the count array fits into L2 cache.
Since L2 cache takes only 7 ns to access,
the CPU can read & write to the count array
while waiting for the disk read.
Time: 0 sec since it is computed while doing step 1a
c) Iterate thru entire array to find max count --> 0.00625ms
-Since it takes 0.0125ms to read & write 1MB from
L2 cache and array size is 0.5MB, then the time
to iterate through the array is:
0.0125ms/MB x 0.5MB = 0.00625ms
d) Total Time
Total=a+b+c=~5.8 hrs (due to slowness of reading from disk)
2) 5 Networked Machines
a) Time to transfer 1TB over 1Gbit/s --> 6.48 hrs
1TB x 1024GB/TB x 8 Gbit/GB x 1s/Gbit
= 8,192s = 137 min = 2.3hr
But since the original machine keeps a fifth of the data, it
only needs to send (4/5)ths of data, so the time required is:
2.3 hr x 4/5 = 1.84 hrs
*But to send the data, the data needs to be read, which
is (4/5)(answer 1a) = (4/5)(5.8 hrs) = 4.64 hrs
So total time = 1.84hrs + 4.64 hrs = 6.48 hrs
b) Time to fill array w/count data from original machine --> 1.16 hrs
-The original machine (that had the 1TB file) still needs to
read the remainder of the data in order to fill the array with
count data. So this requires (1/5)(answer 1a)=1.16 hrs.
The CPU time to read & write to the array is negligible, as
shown in 1b.
c) Time to fill other machine's array w/counts --> not counted
-As the file is being transferred, the count array can be
computed. This time is not counted.
d) Time required to receive 4 arrays --> (2^-6)s
-Each count array is 0.5MB
0.5MB x 4 arrays x 8 bits/B x 1s/Gbit
= 2^19B x 2^2 x 2^3 bits/B x 1s/2^30bits
= 2^24/2^30 s = (2^-6)s
e) Time to merge arrays
--> 0 sec (since they can be merged while receiving)
f) Total time
Total = a+b+c+d+e+f =~ a+b =~ 6.48 hrs + 1.16 hrs = 7.64 hrs
This is not an answer, just a longer comment. You have miscalculated the size of the frequency array. A 1 TiB file contains about 550 G symbols and, because nothing is said about their expected frequency, you would need a count array of at least 64-bit integers (that is, 8 bytes/element). The total size of this frequency array would be 2^16 * 8 = 2^19 bytes, or just 512 KiB, and not 4 GiB as you have miscalculated. It would only take ≈4.3 ms to send this data over a 1 Gbps link (protocol headers take roughly 3% if you use TCP/IP over Ethernet with an MTU of 1500 bytes; less with jumbo frames, but they are not widely supported). Also, this array size fits perfectly in the CPU cache.
You have grossly overestimated the time it would take to process the data and extract the frequencies, and you have also overlooked the fact that this can overlap the disk reads. In fact, updating the frequency array, which resides in the CPU cache, is so fast that the computation time is negligible, as most of it will overlap the slow disk reads. But you have underestimated the time it takes to read the data. Even with a multicore CPU you still have only one path to the hard drive, and hence you would still need the full 5.8 hrs to read the data in the single-machine case.
In fact, this is an example of the kind of data processing that benefits neither from parallel networked processing nor from having more than one CPU core. This is why supercomputers and other fast networked processing systems use distributed parallel file storage that can deliver many GB/s of aggregate read/write speed.
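To make the "updating the array is negligible" point concrete, the inner counting loop is roughly the following (my sketch; it treats every 16-bit code unit as a character, which glosses over surrogate pairs, and "big.bin" is a placeholder file name):

#include <cstdint>
#include <cstdio>
#include <vector>

int main()
{
    // 2^16 counters of 8 bytes each = 512 KiB, as discussed above; fits in cache.
    std::vector<uint64_t> counts(1 << 16, 0);
    std::vector<uint16_t> buffer(1 << 20);   // 2 MiB read buffer (arbitrary size)

    std::FILE* f = std::fopen("big.bin", "rb");   // placeholder input file
    if (!f) return 1;

    size_t got;
    while ((got = std::fread(buffer.data(), sizeof(uint16_t), buffer.size(), f)) > 0)
        for (size_t i = 0; i < got; i++)
            counts[buffer[i]]++;                  // one increment per 16-bit code unit
    std::fclose(f);

    // Find the most frequent code unit.
    uint32_t best = 0;
    for (uint32_t c = 1; c < counts.size(); c++)
        if (counts[c] > counts[best]) best = c;
    std::printf("most frequent code unit: U+%04X (%llu occurrences)\n",
                best, (unsigned long long)counts[best]);
    return 0;
}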
You only need to send 0.8 TB if your source machine is part of the 5.
It may not even make sense to send the data to the other machines. Consider this:
In order for the source machine to send the data, it must first hit the disk to read the data into main memory before it sends the data over the network. If the data is already in main memory and not being processed, you are wasting that opportunity.
So under the assumption that loading into the CPU cache is much less expensive than disk-to-memory transfers or data over the network (which is true, unless you are dealing with alien hardware), you are better off just doing it on the source machine, and the only place splitting up the task makes sense is if the "file" is somehow created/populated in a distributed way to start with.
So you should only count the disk read time of a 1 TB file, with a tiny bit of overhead for L1/L2 cache and CPU ops. The cache access pattern is optimal since it is sequential, so you only cache-miss once per piece of data.
The primary point here is that disk is the primary bottleneck which overshadows everything else.

Random write Vs Seek time

I have a very weird question here...
I am trying to write data randomly to a 100 MB file.
The data size is 4 KB and the random offset is page aligned (4 KB).
I am trying to write 1 GB of data at random offsets into the 100 MB file.
If I remove the actual code that writes the data to the disk, the entire operation takes less than a second (say 0.04 sec).
If I keep the code that writes the data, it takes several seconds.
In the case of a random write operation, what happens internally? Is the cost the seek time or the write time? From the above scenario it is really confusing!
Can anybody explain this in depth, please?
The same procedure applied with sequential offsets makes the write very fast.
Thank you...
If you're writing all over the file, then the disk (I presume this is on a disk) needs to seek to a new place on every write.
Also, the write speed of hard disks isn't particularly stunning.
Say for the sake of example (taken from a WD Raptor EL150) that we have a 5.9 ms seek time. If you are writing 1 GB randomly everywhere in 4 KB chunks, you are doing 1,000,000,000 ÷ 4,000 = 250,000 seeks × 0.0059 seconds = a total seeking time of roughly 1,475 seconds!
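A rough way to reproduce the comparison from the question (sketch; fopen/fwrite go through the OS cache, so the real seek cost only shows up once the data is actually flushed to the physical disk, and "target.dat" is a placeholder for a pre-created 100 MB file):

#include <chrono>
#include <cstdio>
#include <cstdlib>

int main()
{
    const long chunk  = 4096;                         // 4 KiB, page-aligned offsets
    const long pages  = 100L * 1024 * 1024 / chunk;   // slots in the 100 MB file
    const long writes = 1024L * 1024 * 1024 / chunk;  // 1 GB of data in 4 KiB chunks
    static char buf[4096] = {0};

    std::FILE* f = std::fopen("target.dat", "r+b");   // placeholder: existing 100 MB file
    if (!f) return 1;

    auto t0 = std::chrono::steady_clock::now();
    for (long i = 0; i < writes; ++i)
    {
        long page = std::rand() % pages;              // random page-aligned offset
        std::fseek(f, page * chunk, SEEK_SET);
        std::fwrite(buf, 1, chunk, f);
    }
    std::fflush(f);
    std::fclose(f);

    std::chrono::duration<double> dt = std::chrono::steady_clock::now() - t0;
    std::printf("1 GB of random 4 KiB writes took %.2f s (OS caching included)\n", dt.count());
    return 0;
}

Changing the offset from random to sequential in the same loop is what makes the difference the question describes, because the head barely has to move.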
