How I/O tool test corruptions - linux-kernel

I am trying to understand how I/O tools like dt and FIO test data corruption.
For example, say in 2G RAM system, dt or FIO writes 1G of data using some pattern and after writing 1G of IO it now has to see if the IO written is correct or corrupted so it reads back the 1G data that's written and determines data integrity.
My question is , how will dt or FIO know what it had written initially, I doubt if it would keep a copy of the original data
Would like to know how dt or Rio knows what was written initially.

How will dt or FIO know what it had written initially?
No needs to store all written data when you know the pattern used for write.
E.g., if the pattern was "write 1 into every byte", then you know that every byte after writing should contain 1.

Related

Transfer data async from one program to another

I am running a python program on a Raspberry Pi. This program writes data to a txt-file every second (every second some data is changed).
On a laptop I am running a Studio Basic program that reads that data file over the network from the Raspberry. This works OK as long as the time between the reads from that that file are more than 15 seconds apart. If I read/access faster than the same data is read. It looks that the windows program reads from a cache if it is accessed in less than 15 seconds. Is there a way to change the time limit so I can read more often (let us say every 5 seconds).
Note if I read the txt-data file using another python program in the Raspberry Pi than the changed data is read OK by that program. So the problem lies in the Windows system.
Please refer to this File Caching document, use win32file.CreateFile and specify FILE_FLAG_NO_BUFFERING to disable the cache, all read and write operations will directly access the physical disk.
EDIT :
For using CreateFile in VB.net, please refer to:
https://social.msdn.microsoft.com/Forums/en-US/4a2ebfaa-d56d-487a-b03d-0f9ca72e3bbc/createfile-and-deviceiocontrol-function-in-vbnet?forum=winembplatdev

Assigning chunk of data from an ipcore output to next ipcore input

I have a set of 16 data stored in BRAM ipcore. Now I have to fetch first 4 at a time and give it to the next IPcore (say FFT) for further processing. Once done with this, I have to feed the next set of 4 data. Is the situation handled by state machine? OR how can i assign values from one ipcore to the next ipcore ? Please help!!
BRAM is dual-ported. Simplest is to make a FIFO out of it with one IP core writing and the other IP core reading.
This will not work for an FFT as that requires a special addressing scheme: the data write order is different from the read order. There you need each IP core to make an address and connect it to the two ports of the BPRAM.
In all cases you need handshaking to guarantee reading only starts after the data has been written of course.

two programs accessing one file

New to this forum - looks great!
I have some Processing code that periodically reads data wirelessly from remote devices and writes that data as bytes to a file, e.g. data.dat. I want to write an Objective C program on my Mac Mini using Xcode to read this file, parse the data, and act on the data if data values indicate a problem. My question is: can my two different programs access the same file asynchronously without a problem? If this is a problem can you suggest a technique that will allow these operations?
Thanks,
Kevin H.
Multiple processes can read from the same file at a time without any problem. A process can also read from a file while another writes without problem, although you'll have to take care to ensure that you read in any new data that was written. Multiple processes should not write to the same file at at the same time, though. The OS will let you do it, but the ordering of data will be undefined, and you'll like overwrite data—in general, you're gonna have a bad time if you do that. So you should take care to ensure that only one process writes to a file at a time.
The simplest way to protect a file so that only one process can write to it at a time is with the C function flock(), although that function is admittedly a bit rudimentary and may or may not suit your use case.

What does SetFileValidData doing ? what is the difference with SetEndOfFile?

I look for a way to extend a file asynchronously and efficiently .
In a support document Asynchronous Disk I/O Appears as Synchronous on Windows NT, Windows 2000, and Windows XP said:
NOTE: Applications can make the previously mentioned write operation
asynchronous by changing the Valid Data Length of the file by using
the SetFileValidData function, and then issuing a WriteFile.
in MSDN, SetFileValidData is a function for Sets the valid data length of the specified file.
But I still not understand what is the "valid data", what is the difference between it and the size of file?
I can use SetFilePointerEx and SetEndOfFile to extend the file size, but how do this by SetFileValidData?
SetFileValidData cannot input a argument large than the size of file. In this case, what is the living meaning of SetFileValidData?
When you use SetEndOfFile to increase the length of a file, the logical file length changes and the necessary disk space is allocated, but no data is actually physically written to the disk sectors corresponding to the new part of the file. The valid data length remains the same as it was.
This means you can use SetEndOfFile to make a file very large very quickly, and if you read from the new part of the file you'll just get zeros. The valid data length increases when you write actual data to the new part of the file.
That's fine if you just want to reserve space, and will then be writing data to the file sequentially. But if you make the file very large and immediately write data near the end of it, zeros need to be written to the new part of the file, which will take a long time. If you don't actually need the file to contain zeros, you can use SetFileValidData to skip this step; the new part of the file will then contain random data from previously deleted files.
Addendum:
The rules for sparse files are different.
You should not use SetFileValidData on a file that non-privileged users have read access to; this could leak content from deleted files that belonged to other users.
Please note that SetEndOfFile() doesn't write any zeros to any allocated sectors on disk, it just allocates the space pointers inside MFT records and then updates the space bitmap of the whole file system. But the OS, or FS, will record the valid/logical file length in its MFT record.
If you enlarge the file, from 1GB to 2GB, then the appended 1GB should be all zeros, but the FS won't write the zeros to disks, it refers to this file's valid length to know that the 1GB should be zeros. If you try to read from this enlarged 1GB portion, it will fill zeros directly in RAM and then feedback to your application. But if you write any byte inside this 1GB portion, the FS has to fill with zeros from the original 1GB offset to the current pointer that your application is trying to write to, but not the other bytes from the current location to the tail of the file. Meanwhile, it records the valid/logical length to be from 0 to the current location, the physical size and allocated size is still 2GB.
But, if you use SetFileValidData(), the FS will set the valid length to 2GB directly, and won't bother to fill any zeros. Whatever location you write to, it just writes, but whatever location you read from, you may read out some garbage data which was previously generated by other applications before the file was extended into that disk space.
Agree with Harry Johnston's answer, and from the practice point of view, while SetFileValidData has performance advantage because it does not require writing zeros, it does have security implications because the file might contain data from other deleted files. So a special privilege, SE_MANAGE_VOLUME_NAME, is required, as MSDN mentioned: http://msdn.microsoft.com/en-us/library/windows/desktop/aa365544(v=vs.85).aspx
The reason is that, if the user account of the running program doesn't have that privilege, using SetFileValidData can expose other user's deleted data into the view of that particular file, so normal users (non-administrators) are not allowed to do that. Even for privileged users, they still need to take care to use ACL (access control lists) in the file system to protect that file so that it is not shared with non-privileged users.
It seems that SenEndofFile does not really allocate reserved disk space for the target file, SetFileValidData is responsible for the job.
Refered to MSDN,
You can use the SetFileValidData function to create large files in very specific circumstances so that the performance of subsequent file I/O can be better than other methods. Specifically, if the extended portion of the file is large and will be written to randomly, such as in a database type of application, the time it takes to extend and write to the file will be faster than using SetEndOfFile and writing randomly.
If SetEndOfFile really allocate space, then SetFileValidData will do nothing better than SetEndOfFile when writing randomly. So SetEndOfFile may just create a sparse file with hole(s), while SetFileValidData do the actual allocation.

Hadoop Map-Reduce OutputFormat for assigning result to in-memory variable (not files)?

(from a Hadoop newbie)
I want to avoid files where possible in a toy Hadoop proof-of-concept example. I was able to read data from non-file-based input (thanks to http://codedemigod.com/blog/?p=120) - which generates random numbers.
I want to store the result in memory so that I can do some further (non-Map-Reduce) business logic processing on it. Essetially:
conf.setOutputFormat(InMemoryOutputFormat)
JobClient.runJob(conf);
Map result = conf.getJob().getResult(); // ?
The closest thing that seems to do what I want is store the result in a binary file output format and read it back in with the equivalent input format. That seems like unnecessary code and unnecessary computation (am I misunderstanding the premises which Map Reduce depends on?).
The problem with this idea is that Hadoop has no notion of "distributed memory". If you want the result "in memory" the next question has to be "which machine's memory?" If you really want to access it like that, you're going to have to write your own custom output format, and then also either use some existing framework for sharing memory across machines, or again, write your own.
My suggestion would be to simply write to HDFS as normal, and then for the non-MapReduce business logic just start by reading the data from HDFS via the FileSystem API, i.e.:
FileSystem fs = new JobClient(conf).getFs();
Path outputPath = new Path("/foo/bar");
FSDataInputStream in = fs.open(outputPath);
// read data and store in memory
fs.delete(outputPath, true);
Sure, it does some unnecessary disk reads and writes, but if your data is small enough to fit in-memory, why are you worried about it anyway? I'd be surprised if that was a serious bottleneck.

Resources