COMPARE AND WRITE on unmapped block? - scsi

The COMPARE AND WRITE command description in SBC-4 doesn't say anything about the case where the range of logical blocks to be replaced contains unmapped blocks.
What's the common practice for handling this case on the target side? Should a target assume that the verification step always succeeds when an initiator asks to replace unmapped blocks with meaningful data?

When a block is unmapped, it reads as all zeroes. Any command that reads or compares data in that block should treat it as if the block contained zeroes.
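For illustration, here is a minimal sketch of how a target might apply that rule, assuming a toy in-memory backing store (the store type and block size are hypothetical, not any real target's API): during the verify phase, an unmapped LBA is compared against a zero-filled block.
package main

import (
    "bytes"
    "errors"
    "fmt"
)

const blockSize = 512 // assumed logical block size

// memStore is a toy thin-provisioned backing store: absent keys are unmapped.
type memStore map[uint64][]byte

// In a real target a miscompare would be reported as CHECK CONDITION with
// MISCOMPARE sense data; the sketch just returns an error in its place.
var errMiscompare = errors.New("MISCOMPARE")

// compareAndWrite verifies `verify` against the current contents of lba and,
// only on a match, replaces the block with `write`.
// An unmapped block compares as if it were all zeroes.
func compareAndWrite(s memStore, lba uint64, verify, write []byte) error {
    current, mapped := s[lba]
    if !mapped {
        current = make([]byte, blockSize) // unmapped reads as zeroes
    }
    if !bytes.Equal(current, verify) {
        return errMiscompare
    }
    s[lba] = append([]byte(nil), write...)
    return nil
}

func main() {
    s := memStore{}
    zeroes := make([]byte, blockSize)
    data := bytes.Repeat([]byte{0xAB}, blockSize)

    // LBA 0 is unmapped: verifying against zeroes succeeds, so the write lands.
    fmt.Println(compareAndWrite(s, 0, zeroes, data)) // <nil>

    // Verifying the now-mapped block against zeroes fails with a miscompare.
    fmt.Println(compareAndWrite(s, 0, zeroes, data)) // MISCOMPARE
}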

Golang: when there's only one writer changing the value using atomic.StoreInt32, is it necessary to use atomic.LoadInt32 in the multiple readers?

As the title says.
Basically, what I'm wondering is: will atomic.StoreInt32 also block read operations while it's writing?
Another related question: is atomic.StoreUint64(&procRate, procCount) equivalent to atomic.StoreUint64(&procRate, atomic.LoadUint64(&procCount))?
Thanks in advance.
Yes, you need to use atomic operations when you are both loading and storing the same value. The race detector should warn you about this.
As for the second question, if the procCount value is also being used concurrently, then you still need to load it using an atomic operation. These two are not equivalent:
atomic.StoreUint64(&procRate, procCount)
atomic.StoreUint64(&procRate, atomic.LoadUint64(&procCount))
The former reads procCount with a plain, non-atomic read before passing it to StoreUint64, while the latter passes a copy safely obtained via LoadUint64.
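For illustration, a minimal sketch of the single-writer / multiple-reader pattern being discussed (procCount and procRate come from the question; the counter/sampler/reader structure is an assumption for the example):
package main

import (
    "fmt"
    "sync"
    "sync/atomic"
    "time"
)

func main() {
    var procCount uint64 // updated only by the counting goroutine
    var procRate uint64  // published by the sampler, read by many readers

    var wg sync.WaitGroup

    // Counting goroutine: sole writer of procCount.
    wg.Add(1)
    go func() {
        defer wg.Done()
        for i := 0; i < 100000; i++ {
            atomic.AddUint64(&procCount, 1)
        }
    }()

    // Sampler: procCount is written concurrently, so it must be read with
    // LoadUint64; procRate is read concurrently, so it must be written with
    // StoreUint64. A plain read of procCount here would be a data race.
    wg.Add(1)
    go func() {
        defer wg.Done()
        for i := 0; i < 10; i++ {
            atomic.StoreUint64(&procRate, atomic.LoadUint64(&procCount))
            time.Sleep(time.Millisecond)
        }
    }()

    // Readers: plain reads of procRate would race with the store above,
    // so they use LoadUint64.
    for r := 0; r < 3; r++ {
        wg.Add(1)
        go func(id int) {
            defer wg.Done()
            for i := 0; i < 5; i++ {
                fmt.Printf("reader %d sees rate %d\n", id, atomic.LoadUint64(&procRate))
                time.Sleep(time.Millisecond)
            }
        }(r)
    }

    wg.Wait()
}
Running this with the race detector enabled (go run -race) and replacing any of the atomic calls with a plain read or write should produce a race report, which is the warning the answer refers to.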

How to check if user input fits in a variable?

I'm trying to write a simple program in Fortran 95/2003 that calculates a function: it takes a number (x) as input and returns a number (y) as output.
The user input is declared as real :: input and the read call looks like
read (*, *, iostat=stat) input
if (stat > 0) then
  print *, "False input, use numbers!"
end if
The iostat helps me to check if the input was a number or a letter.
My problem is that if I enter a very big number, like 1000000000000, the program crashes with a "buffer overflow" error message. I know that I can make the real wider than a 4-byte variable, but the input number can always be made bigger still, so this does not solve the problem.
The main question is: is it possible to prevent the program from crashing because of user input?
Checking the values of the user's input is a very basic technique that must be employed in any software that interacts with anyone other than its author, and it applies in every programming language.
You can just use a simple condition:
if (input > input_max) then
  print *, "Input value too large, try again"
  cycle ! or return, stop, set some flag, or whatever fits
end if
Don't forget the value may also be too small!
It is important to understand where the crash comes from. It certainly does not come just from reading a large number, but from using that number in a bad way, for example by allocating an array that is too large or by performing a calculation that triggers a floating-point exception.
Read the input as a string, then validate the string input, then use an internal read to convert the validated string into a REAL.
Many aspects of input and output behaviour are processor dependent; as a general principle, if you want robustness you need to do much of the legwork yourself. For example, if malformed input for a real is provided, there is no requirement that a processor identify that as an error condition and return a non-zero IOSTAT code.
List-directed input offers further challenges, in that it has a number of surprising features that may trip you and your users up.

Why is the length of the block after the block?

I'm extracting data from a binary file and see that the length of the binary data block comes after the block itself (the character chunks within the block have the length first, then 00, and then the information).
What is the purpose of this layout? Is it for error checking?
A couple of examples:
The length of the block was unknown when the write operation began. Consider an audio stream from a microphone that we want to write as a single block. It is not feasible to buffer it in RAM because it may be huge, so after we receive EOF we append the effective size of the block to the file (see the sketch after these examples). (An alternative would be to reserve a few bytes for a length field at the beginning of the block and, after EOF, seek back and write the length there, but this requires more I/O.)
Database WALs (write-ahead logs) may use such a scheme. Consider a user who starts a transaction and makes lots of changes. Every change is appended to the WAL as a single record (block). If the user decides to roll back the transaction, it is now easy to walk backwards and chop off all records that were added as part of that transaction.
It is common for binary files to carry two blocks of meta-information: one at the beginning (e.g. creation date, hostname) and another at the end (e.g. statistics and a checksum). When an application opens an existing binary file, it first wants to load these two blocks to make decisions about memory allocation and the like. It is much easier to load the last block if its length is stored at the very end of the file than to scan the file from the beginning.
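For the first example, here is a minimal sketch of how such a trailing-length block can be written and read back (the 8-byte little-endian length field and the file layout are assumptions for illustration, not the asker's actual format):
package main

import (
    "encoding/binary"
    "fmt"
    "io"
    "os"
    "strings"
)

// writeBlock streams src into f as one block of unknown length, then appends
// the block's size as a trailing 8-byte little-endian field. Nothing has to
// be buffered in RAM and no seek-back into a header is needed.
func writeBlock(f *os.File, src io.Reader) error {
    n, err := io.Copy(f, src)
    if err != nil {
        return err
    }
    return binary.Write(f, binary.LittleEndian, uint64(n))
}

// readLastBlock recovers the final block by reading its trailing length first.
func readLastBlock(f *os.File) ([]byte, error) {
    lenPos, err := f.Seek(-8, io.SeekEnd) // offset of the trailing length field
    if err != nil {
        return nil, err
    }
    var n uint64
    if err := binary.Read(f, binary.LittleEndian, &n); err != nil {
        return nil, err
    }
    if _, err := f.Seek(lenPos-int64(n), io.SeekStart); err != nil {
        return nil, err
    }
    buf := make([]byte, n)
    _, err = io.ReadFull(f, buf)
    return buf, err
}

func main() {
    f, err := os.CreateTemp("", "trailing-length-*")
    if err != nil {
        panic(err)
    }
    defer os.Remove(f.Name())
    defer f.Close()

    if err := writeBlock(f, strings.NewReader("pretend this is a long audio stream")); err != nil {
        panic(err)
    }
    data, err := readLastBlock(f)
    if err != nil {
        panic(err)
    }
    fmt.Printf("recovered %d bytes: %q\n", len(data), data)
}
Note that the reader never scans the file from the front: it seeks to the end, reads the trailing length, and seeks back to the start of the block, which is exactly what makes the tail-metadata layout in the last example cheap to load.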

Is it possible to know the serial number of the block of input data on which the map function is currently working?

I am a novice in Hadoop and I have the following questions:
(1) As I understand it, the original input file is split into several blocks and distributed over the network. Does a map function always execute on a block in its entirety? Could there be more than one map function executing on data in a single block?
(2) Is there any way that it can be learned, from within the map function, which section of the original input text the mapper is currently working on? I would like to get something like a serial number, for instance, for each block starting from the first block of the input text.
(3) Is it possible to make the splits of the input text in such a way that each block has a predefined word count? If possible then how?
Any help would be appreciated.
As I understand it, the original input file is split into several blocks and distributed over the network. Does a map function always execute on a block in its entirety? Could there be more than one map function executing on data in a single block?
No. A block (a split, to be precise) gets processed by only one mapper.
Is there any way that it can be learned, from within the map function, which section of the original input text the mapper is currently working on? I would like to get something like a serial number, for instance, for each block starting from the first block of the input text.
You can get some valuable info, like the file containing the split's data, the position of the first byte in the file to process, etc., with the help of the FileSplit class. You might find it helpful.
Is it possible to make the splits of the input text in such a way that each block has a predefined word count? If possible then how?
You can do that by extending the FileInputFormat class. To begin with, you could do this:
In your getSplits() method, maintain a counter. As you read the file line by line, keep tokenizing the lines. Collect each token and increase the counter by 1. Once the counter reaches the desired value, emit the data read up to this point as one split. Then reset the counter and start with the second split.
HTH
If you define a small maximum split size you can actually have multiple mappers processing a single HDFS block (say a 32 MB max split for a 128 MB block size: you'll get 4 mappers working on the same HDFS block). With the standard input formats, you'll typically never see two or more mappers processing the same part of the block (the same records).
MapContext.getInputSplit() can usually be cast to a FileSplit, and then you have the Path, offset, and length of the file/block being processed.
If your input files are true text files, then you can use the method suggested by Tariq, but note that this is highly inefficient for larger data sources, as the Job Client has to process each input file to discover the split locations (so you end up reading each file twice). If you really only want each mapper to process a set number of words, you could run a job to re-format the text files into sequence files (or another format) and write the records to disk with a fixed number of words per file (using MultipleOutputs to get a file per number of words, but this again is inefficient). Maybe if you shared the use case for why you want a fixed number of words, we could better understand your needs and come up with alternatives.

Guaranteeing every line is received in full from multiple PIPEs (STDOUTs)

I asked the other day whether data integrity (of flushed data) is preserved even when there is more than one pipe streaming into localhost's STDIN. The answer is NO if the flushed data is large:
Data integrity question when collecting STDOUTs from multiple remote hosts over SSH
But I would like to guarantee that every line flushed on each end is passed to the single STDIN in full and is not mixed up with data from other pipes. Is there any way to do so?
(Note that it can be done if I create multiple STDINs locally, but it is more convenient if I can process the line streams through a single STDIN. So my question focuses on the case where there is only one STDIN at localhost with multiple (STDOUT) pipes feeding into it.)
This can be done via a congestion-backoff system like that used in Ethernet.
First, assign each pipe a unique delimiter. This delimiter cannot appear unescaped in the contents of any pipe. Now, use the following pseudocode:
1. Check for other processes' delimiters; while an odd number of any single other process's delimiters is present, wait.
2. Write your delimiter character.
3. Check whether another process has also written an unmatched delimiter. If so, back off a random (and increasing) amount and return to step 1.
4. Write the data.
5. Write your delimiter character again.
This will ensure that, although you will have some junk, every whole message will eventually get through.
