Reading files into Fortran - format

So I'm writing some code in Fortran that multiplies a square matrix by itself. But the matrix I have to multiply is in a file, and I'm having some issues reading it into the program. I think it's because the sample data is in the following format:
3
101
010
101
The first row is the dimension of the matrix, and each following row is a row of the matrix, but there aren't spaces between the entries. So I guess my question is: how do I split up those rows as I read them into a 2D array?

Read in the first number as N and use it to allocate an array of dimension N by N. Then read the array one row at a time: array(i, 1:N) for i = 1 to N. See Fortran: reading a row of numbers into an array for the format to use.

Read using an explicit format:
integer :: i, n
integer, allocatable :: A(:,:)
read (1,*) n
allocate (A(n,n))
do i = 1, n
   read (1,'(1000i1)') A(i,:)
end do
It does not matter if the format specifies more "i1" edit descriptors than are actually needed.

Related

How to know when ShearSorting is done

I'm currently doing some shearSorting and cannot figure out when this operation is supposed to be done with an n x n matrix.
What I'm doing currently is copying the matrix at the start of each iteration of the loop into a temp matrix, and at the end of each iteration comparing the original and temp matrices; if they are the same, I break out of the loop and exit. I don't like this approach, because we always end up going through one extra iteration after the matrix is already sorted, which is a waste of CPU time and cycles.
There has to be a better way to do this check. I keep finding references to log(n) for how many iterations are needed, but I don't believe they mean literal log(n), since log(5) for a 5x5 matrix is 0.69, which is impossible as an iteration count.
Any suggestions?
So I know shearSort takes log(n) iterations to complete, so for a 5x5 matrix we will have 3 runs for rows and 3 runs for columns. But what if the 5x5 matrix I was given is almost sorted and only needs one or two more iterations to be completed? In that case I don't see the point in iterating 6 times through it, as that would be a waste of CPU power and cycles.
We also have the following solution: copy the matrix at the start of each iteration of the shearSort function into a temporary matrix, and at the end of each iteration compare the two matrices; if they are the same, we know we are done. (Note that here an iteration means both a row sort and a column sort, since a matrix might not need a row sort at first but would still need a column sort.) In this case we preserve CPU cycles whenever the matrix doesn't need N + 1 iterations, but this solution has a drawback: when N + 1 iterations really are needed, we end up doing N + 3 iterations to finish (the extra two iterations are the ones needed to confirm that the row and column passes no longer change anything).
To solve this we can use a combination of both approaches:
We still copy the matrix at the start and compare it to the temp matrix at the end; if they are equal before we reach N + 1 iterations, we are done and do not need to go on any further. If they are not, we run up to the N + 1st iteration and stop there, since we know the matrix must be sorted after N + 1 iterations.
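A minimal Python sketch of that combined strategy (the ceil(log2(n)) + 1 bound follows the N + 1 count used above; this is an illustration, not a tuned implementation):
import math

def shear_sort(m):
    # Shear sort an n x n matrix (list of lists) in place, in snake order:
    # even-indexed rows ascend, odd-indexed rows descend, columns ascend.
    n = len(m)
    max_iters = math.ceil(math.log2(n)) + 1      # the N + 1 bound discussed above
    for _ in range(max_iters):
        before = [row[:] for row in m]           # snapshot for the early-exit comparison
        for i in range(n):                       # row phase
            m[i].sort(reverse=(i % 2 == 1))
        for j in range(n):                       # column phase
            col = sorted(m[i][j] for i in range(n))
            for i in range(n):
                m[i][j] = col[i]
        if m == before:                          # a full iteration changed nothing: done
            break
    return m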

Implementing in CUDA a large boolean sparse matrix (possibly 10 million entries) for RDF triples

I am looking for a suitable matrix format to represent a very large boolean sparse matrix (containing only 0's and 1's) in CUDA. I have been reading the CUSPARSE documentation and found several formats such as Compressed Sparse Row (CSR), Compressed Sparse Column (CSC), etc. Since the non-zero elements of the matrix are all 1, which format is the best choice? The operations on the matrix are basically writes that convert 0 to 1 based on some condition. The main aim is to query the matrix for the (row, col) pair of each 1 in a particular row. Any insight into matrix formats and the efficiency of row-wise search is welcome.
@Robert Crovella: Many thanks for clarifying the issue. I understand CUDA does not have much of a role to play unless and until we decide to search for 1's (non-zero values) on different rows simultaneously, with no writes to the matrix, of course. This may be done as you described for the search of all 1's in the second row (== 1); each thread can then search a separate row asynchronously for non-zero values (in our case 1's). I would also like your view on whether we can drop the values vector, since it contains all ones. We would save a bit on space (though it will not be a major factor): the space requirement would be nnz + n + 1 instead of 2nnz + n + 1.
As near as I can tell your question has nothing to do with CUDA. You just need a tutorial on sparse matrix compressed storage formats. I'll suggest you google for that (here's one example), but I will attempt to answer this question:
I was keen to know of a suitable search operation for finding the relevant columns among millions across a given row that have the value 1.
or:
(how) to query the matrix for the (row, col) pair of each 1 in a particular row.
If you have a matrix in CSR format, you will have a sequence of row pointers, and a set of column indices for each (non-zero) element. Suppose I have the following "sparse" matrix:
0 1 0
1 0 1
0 1 0
The CSR representation would be:
index: 0 1 2 3
values: 1 1 1 1
column indices: 1 0 2 1
row pointers: 0 1 3 4
If I have therefore a CSR representation, and I want to answer the question "Which columns for the second row have non-zero entries" it's a simple matter of starting with the row pointer for the second row (== 1), proceeding to the last element before the next row pointer (3), and reading off the respective column indices. In this case there are two column indices: 0 and 2. Therefore, row 1 has two non-zero elements, in columns 0 and 2.
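In code, that lookup is just a slice of the column-index array between two consecutive row pointers; here is a small Python illustration using the arrays from the example above:
row_ptr = [0, 1, 3, 4]          # row pointers from the example
col_idx = [1, 0, 2, 1]          # column indices from the example

def columns_with_ones(row):
    # Column indices of the non-zero (== 1) entries in the given row.
    return col_idx[row_ptr[row]:row_ptr[row + 1]]

print(columns_with_ones(1))     # -> [0, 2]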
Programmatically, if I wanted to increment each non-zero element in the second row (row 1, 0-indexed) in CUDA code, I could:
int idx = threadIdx.x + blockDim.x * blockIdx.x;  // one thread per non-zero of the row
if (idx < rowPtr[2] - rowPtr[1])                  // rowPtr[2] - rowPtr[1] = number of non-zeros in the row
    values[idx + rowPtr[1]]++;                    // rowPtr[1] = offset of the row in values
CUSPARSE and CUSP provide many useful sparse matrix manipulation functions.

How can I sort a partitioned array efficiently?

I have K number of files. I call them X1, X2, ... ,XK.
Each of these files is a N x 1 array of doubles.
It means that I actually have an NK x 1 array, partitioned into K arrays. Let's call this large array X.
I need to sort X and I cannot load all data into the memory. What is the efficient algorithm to perform this sort and save the results in separate files?
I know how to do it (though of course I'm not sure it's efficient) if I just want H sorted elements:
1. sort X1 and save it as sX1
2. A = sX1(1:H,1) // in Matlab
3. sort X2 and A
4. repeat steps 1, 2 and 3 for the other files
But H cannot be very large, again because of memory problems.
Update
The "Sort with the limited memory" question is different from this question, although it helped. If I want to use that question's answer or MikeB's answer, then this should be answered too:
Should I merge the K files into one file and then use an external sort algorithm? If yes, how?
Thanks.
What you're attempting is called an external sort. Each partition gets sorted by itself. Then, you have to merge all the partitions to build the final sorted list. If you're only looking for the top few items you can exit the merge early.
There seem to be a few existing MATLAB solutions for external merges. Here's a link to one over at the MathWorks File Exchange site: http://www.mathworks.com/matlabcentral/fileexchange/29306-external-merge-sort/content/ext_merge/merge.m
Update: the code I linked shows how it's done in MATLAB. Specifically, the code here: http://www.mathworks.com/matlabcentral/fileexchange/29306-external-merge-sort/content/ext_merge/extmerge.m takes a list of files that need to be merged, and eventually merges them into one file.
In your original problem statement, you said you have K files, from X1 thru XK. An external sort first sorts those files, then merges them into one file. A simple implementation would have pseudocode like this:
// external merge-sort algorithm
For each file F in (X1 ... XK)
    Read file F into memory array R
    Sort R
    Overwrite file F with sorted data from R
    Clear array R in memory
For N = K-1 down to 1
    in-order merge files XN+1 and XN into file X'
    erase files XN+1 and XN
    rename file X' as XN
You should see that the first phase is to sort. We read each file into memory, sort it, and write it back out. This is I/O, but it's efficient; hopefully, we're using as much memory as possible so that we sort in memory as much as we can. At the end of that first loop, we have K files, each one sorted within its own domain of values.
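A minimal Python sketch of that first phase (the file names and one-value-per-line parsing are assumptions about your data layout):
def sort_each_partition(paths):
    # Phase 1: sort every partition file in place, one file at a time,
    # so only one partition ever has to fit in memory.
    for path in paths:
        with open(path) as f:
            values = [float(line) for line in f]
        values.sort()
        with open(path, "w") as f:
            f.writelines(f"{v}\n" for v in values)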
Given the K sorted files, our next step is to merge them. Merging two files needs almost no memory, but does lots of I/O. Merging two files looks like this, given two files named L and R that we merge into O:
// merge two files algorithm
Get value LV from L
Get value RV from R
While L is not EOF AND R is not EOF
    if ( LV <= RV )
        write LV into O
        get value LV from L
    else
        write RV into O
        get value RV from R
While L is not EOF
    write LV into O
    get LV from L
While R is not EOF
    write RV into O
    get RV from R
The second loop in the merge-sort merges two files, XN+1 and XN, into a single file XN. It loops through each of your files and merges them. This reads and re-writes lots of data, and you can get a bit more efficient than that by merging more than two files at a time. But it works fine as I've written it.
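If you want to avoid the repeated pairwise re-writing, a k-way merge of all the sorted partition files in one pass can be sketched with Python's heapq.merge (file names and the float parsing are illustrative assumptions):
import heapq

def merge_sorted_partitions(paths, out_path):
    # Stream all sorted partition files into one sorted output file,
    # holding only one line per input file in memory at any time.
    files = [open(p) for p in paths]
    try:
        with open(out_path, "w") as out:
            out.writelines(heapq.merge(*files, key=float))
    finally:
        for f in files:
            f.close()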

Generating Random Matrix With Pairwise Distinct Rows and Columns

I need to randomly generate an NxN matrix of integers in the range 1 to K inclusive such that all rows and columns individually have the property that their elements are pairwise distinct.
For example for N=2 and K=3
This is ok:
1 2
2 1
This is not:
1 3
1 2
(Notice that if K < N this is impossible)
When K is sufficiently larger than N, an efficient enough algorithm is just to generate a random matrix of integers in 1..K, check that the entries within each row and each column are pairwise distinct, and if they aren't, try again.
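In code, that retry approach looks roughly like this (Python, just for illustration):
import random

def random_distinct_matrix(n, k):
    # Rejection sampling: draw uniform random matrices over 1..k until one
    # has pairwise-distinct entries in every row and every column.
    while True:
        m = [[random.randint(1, k) for _ in range(n)] for _ in range(n)]
        rows_ok = all(len(set(row)) == n for row in m)
        cols_ok = all(len({m[i][j] for i in range(n)}) == n for j in range(n))
        if rows_ok and cols_ok:
            return m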
But what about the case where K is not much larger than N?
This is not a full answer, but a warning about an intuitive solution that does not work.
I am assuming that by "randomly generate" you mean with uniform probability on all existing such matrices.
For N=2 and K=3, here are the possible matrices, up to permutations of the set [1..K]:
1 2 1 2 1 2
2 1 2 3 3 1
(since we are ignoring permutations of the set [1..K], we can assume wlog that the first line is 1 2).
Now, an intuitive (but incorrect) strategy would be to draw the matrix entries one by one, ensuring for each entry that it is distinct from the other entries on the same line or column.
To see why it's incorrect, consider that we have drawn this:
1 2
x .
and we are now drawing x. x can be 2 or 3, but if we gave each possibility the probability 1/2, then the matrix
1 2
3 1
would get probability 1/2 of being drawn at the end, while it should have only probability 1/3.
Here is a (textual) solution. I don't think it provides good randomness, but nevertheless it could be OK for your application.
Let's generate a matrix over the range [0; K-1] (add 1 to every element afterwards if you want [1; K]) with the following algorithm:
1. Generate the first row with any random method you want.
2. Each number will be the first element of a random sequence calculated in such a manner that you are guaranteed to have no duplicates in subsequent rows; that is, for any distinct columns x and y, you will have x[i] != y[i] for all i in [0; N-1].
3. Compute each row from the previous one.
The whole algorithm is based on a random generator with the property I mentioned. With a quick search, I found that the inversive congruential generator meets this requirement. It seems easy to implement. It works if K is prime; if K is not prime, see 'Compound Inversive Generators' on the same page. It may be a little tricky to handle perfect squares or cubes (your problem sounds like sudoku :-) ), but I think it is possible by creating compound generators from the prime factors of K with different parametrizations. For all generators, the first element of each column is the seed.
Whatever the value of K, the complexity depends only on N and is O(N^2).
1. Deterministically generate a matrix having the desired property for rows and columns. Provided K > N, this can easily be done by starting the ith row with i, and filling in the rest of the row with i+1, i+2, etc., wrapping back to 1 after K. Other algorithms are possible.
2. Randomly permute columns, then randomly permute rows.
Let's show that permuting rows (i.e. picking up entire rows and assembling a new matrix from them in some order, with each row possibly in a different vertical position) leaves the desired properties intact for both rows and columns, assuming they were true before. The same reasoning then holds for column permutations, and for any sequence of permutations of either kind.
Trivially, permuting rows cannot change the property that, within each row, no element appears more than once.
The effect of permuting rows on a particular column is to reorder the elements within that column. This holds for any column, and since reordering elements cannot produce duplicate elements where there were none before, permuting rows cannot change the property that, within each column, no element appears more than once.
I'm not certain whether this algorithm is capable of generating all possible satisfying matrices, or if it does, whether it will generate all possible satisfying matrices with equal probability. Another interesting question that I don't have an answer for is: How many rounds of row-permutation-then-column-permutation are needed? More precisely, is any finite sequence of row-perm-then-column-perm rounds equivalent to a bounded number of (or in particular, one) row-perm-then-column-perm round? If so then nothing is gained by further permutations after the first row and column permutations. Perhaps someone with a stronger mathematics background can comment. But it may be good enough in any case.
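A minimal Python sketch of this construct-then-permute idea, using the wrap-around construction from step 1 (it makes no claim about uniformity, as noted above):
import random

def construct_then_permute(n, k):
    # Step 1: deterministic matrix with distinct entries in every row and
    # column: row i is i, i+1, ..., wrapping back to 1 after k (needs k >= n).
    m = [[((i + j) % k) + 1 for j in range(n)] for i in range(n)]
    # Step 2: random row permutation, then random column permutation.
    random.shuffle(m)
    cols = list(range(n))
    random.shuffle(cols)
    return [[row[j] for j in cols] for row in m]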

What algorithm to use to delete duplicates?

Imagine that we have some file, called, for example, "A.txt". We know that it contains some duplicate elements. "A.txt" is very big, more than ten times bigger than memory, maybe around 50 GB. Sometimes the size of B (the deduplicated output file described below) will be approximately equal to the size of A, and sometimes it will be many times smaller.
Let it have structure like that:
a 1
b 2
c 445
a 1
We need to produce a file "B.txt" that does not have such duplicates. For example, it should be this:
a 1
b 2
c 445
I thought about an algorithm that copies A to B, then takes the first line of B and compares it against every other line, deleting any duplicates it finds; then it takes the second line, and so on.
But I think that is way too slow. What can I use?
A is not a database! No SQL, please.
Sorry, I forgot to say: sorting is OK.
Although it can be sorted, what about the case where it cannot be sorted?
One solution would be to sort the file, then copy one line at a time to a new file, filtering out consecutive duplicates.
Then the question becomes: how do you sort a file that is too big to fit in memory?
Here's how Unix sort does it.
See also this question.
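Assuming the file has already been sorted (for example by an external sort), the second step is a single streaming pass; a minimal Python sketch (file names are illustrative):
def drop_consecutive_duplicates(in_path, out_path):
    # Copy a sorted file line by line, writing a line only when it
    # differs from the previous line.
    prev = None
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            if line != prev:
                dst.write(line)
                prev = line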
Suppose you can fit 1/k'th of the file into memory and still have room for working data structures. The whole file can be processed in k or fewer passes, as below, and this has a chance of being much faster than sorting the whole file depending on line lengths and sort-algorithm constants. Sorting averages O(n ln n) and the process below is O(k n) worst case. For example, if lines average 10 characters and there are n = 5G lines, ln(n) ~ 22.3. In addition, if your output file B is much smaller than the input file A, the process probably will take only one or two passes.
Process:
1. Allocate a few megabytes for input buffer I, a few gigabytes for a result buffer R, and a gigabyte or so for a hash table H. Open input file F and output file O.
2. Repeat: Fill I from F and process it into R, via step 3.
3. For each line L in I, check if L is already in H and R. If so, go on to next L, else add L to R and its hash to H.
4. When R is full, say with M entries, write it to O. Then repeatedly fill I from F, dedup as in step 3, and write to O. At EOF(F) go to 5.
5. Repeat (using old O as input F and a new O for output): Read M lines from F and copy to O. Then load R and H as in steps 2 and 3, and copy to EOF(F) with dedup as before. Set M to the new number of non-dupped lines at the beginning of each O file.
Note that after each pass, the first M lines of O contain no duplicates, and none of those M lines are duplicated in the rest of O. Thus, at least 1/k'th of the original file is processed per pass, so processing takes at most k passes.
Update 1: Instead of repeatedly writing out and reading back in the already-processed leading lines, a separate output file P should be used, to which the process buffer R is appended at the end of each pass. This cuts the amount of reading and writing by a factor of k/2 when the result file B is nearly as large as A, or by somewhat less when B is much smaller than A; in no case does it increase the amount of I/O.
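A rough Python sketch of a single pass of this scheme, using the separate output file from Update 1 (here one in-memory set plays the role of both R and H, and max_unique stands in for the memory budget; names are illustrative):
def dedup_pass(in_path, rest_path, kept_path, max_unique):
    # One pass: collect up to max_unique distinct lines in memory and append
    # them to kept_path; every later line that is not a duplicate of those
    # goes to rest_path for the next pass (rest_path may still contain
    # duplicates among its own lines).
    seen = set()
    with open(in_path) as src, open(rest_path, "w") as rest, open(kept_path, "a") as kept:
        for line in src:
            if line in seen:
                continue                      # duplicate of an already-kept line
            if len(seen) < max_unique:
                seen.add(line)
                kept.write(line)
            else:
                rest.write(line)
    return len(seen)
Repeating the pass on rest_path until it comes back empty leaves kept_path as the deduplicated output B.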
You will essentially have to build up a searchable result set (if the language reminds you of database technology, this is no accident, no matter how much you hate the fact that databases deal with the same questions you do).
One of the possible efficient data structures for that is either a sorted range (implementable as a tree of some sort) or a hash table. As you process your file, you insert each record into your result set efficiently, and at that stage you can check whether the record already exists. When you're done, you will have a reduced set of unique records.
Rather than duplicating the actual record, your result set could also store a reference of some sort to any one of the original records. It depends on whether the records are large enough to make that a more efficient solution.
Or you could simply add a mark to the original data whether or not the record is to be included.
(Also consider using an efficient storage format like NetCDF for binary data, as a textual representation is far far slower to access and process.)
