Form a Matrix From a Large Text File Quickly - algorithm

Hi I am struggling with reading data from a file quickly enough. ( Currently left for 4hrs, then crashed) must be a simpler way.
The text file looks similar like this:
From To
1 5
3 2
2 1
4 3
From this I want to form a matrix so that there is a 1 in the according [m,n]
The current code is:
function [z] = reed (A)
[m,n]=size(A);
i=1;
while (i <= n)
z(A(1,i),A(2,i))=1;
i=i+1;
end
Which output the following matrix, z:
z =
0 0 0 0 1
1 0 0 0 0
0 1 0 0 0
0 0 1 0 0
My actual file has 280,000,000 links to and from, this code is too slow for this size file. Does anybody know a much faster was to do this in matlab?
thanks

You can do something along the lines of the following:
>> A = zeros(4,5);
>> B = importdata('testcase.txt');
>> A(sub2ind(size(A),B.data(:,1),B.data(:,2))) = 1;
My test case, 'testcase.txt' contains your sample data:
From To
1 5
3 2
2 1
4 3
The result would be:
>> A
A =
0 0 0 0 1
1 0 0 0 0
0 1 0 0 0
0 0 1 0 0
EDIT - 1
After taking a look at your data, it seems that even if you modify this code appropriately, you may not have enough memory to execute it as the matrix A would become too large.
As such, you can use sparse matrices to achieve the same as given below:
>> B = importdata('web-Stanford.txt');
>> A = sparse(B.data(:,1),B.data(:,2),1,max(max(B.data)),max(max(B.data)));
This would be the approach I'd recommend as your A matrix will have a size of [281903,281903] which would usually be too large to handle due to memory constraints. A sparse matrix on the other hand, maintains only those matrix entries which are non-zero, thus saving on a lot of space. In most cases, you can use sparse matrices more-or-less as you use normal matrices.
More information about the sparse command is given here.
EDIT - 2
I'm not sure why it isn't working for you. Here's a screenshot of how I did it in case that helps:
EDIT - 3
It seems that you're getting a double matrix in B while I'm getting a struct. I'm not sure why this is happening; I can only speculate that you deleted the header lines from the input file before you used importdata.
Basically it's just that my B.data is the same as your B. As such, you should be able to use the following instead:
>> A = sparse(B(:,1),B(:,2),1,max(max(B)),max(max(B)));

Related

Connected component labeling in matrix

I'm trying to do the following
Given the following matrix (where 1's are empty cells and 0's are obstacles):
0 0 1 1
1 0 0 0
1 0 1 1
1 1 0 0
I want it to become like this:
0 0 1 1
2 0 0 0
2 0 2 2
2 2 0 0
What I need to do is to label all connected components (free spaces).
What I already tried to do is to write a function called isConnected() which takes indexes of two cells and checks if there is a connected path between them. And by repeating this function n^2 times on every empty cell on the matrix I can label all connected spaces. But as this algorithm has a bad time complexity (n^2*n^2*O(isConnected())) I prefer to use something else.
I hope these pictures will explain better what I'm trying to accomplish:

Altering CT Volume voxel values in Matlab

I am trying to alter some voxel values in Matlab.
I am using the following code:
for p=1:100
Vol(Vol(:,:,p) > 0) = 65535; %altering voxel values in the volume to 65535 if value > 0.
end
Unfortunately, I find all the values being altered, as if the condition is not working, although if i write Vol(Vol(:,:,1)>0)= 65535 immediately in the command line it works perfectly.
Any clue where the error is?
The reason why is because you are not indexing each slice properly in your volume. When you are doing this for loop, what will happen is that the Boolean condition that is provided in Vol is modifying only the first channel.
Consider this small example. Let's create a 3 x 3 x 3 matrix of all 1s.
A = ones(3,3,3)
A(:,:,1) =
1 1 1
1 1 1
1 1 1
A(:,:,2) =
1 1 1
1 1 1
1 1 1
A(:,:,3) =
1 1 1
1 1 1
1 1 1
Let's set the first slice all to 65535 according to your condition:
A(A(:,:,1) > 0) = 65535
A(:,:,1) =
65535 65535 65535
65535 65535 65535
65535 65535 65535
A(:,:,2) =
1 1 1
1 1 1
1 1 1
A(:,:,3) =
1 1 1
1 1 1
1 1 1
This certainly works as we expect. Now let's try going to the second channel:
A(A(:,:,2) > 0) = 65535
A(:,:,1) =
65535 65535 65535
65535 65535 65535
65535 65535 65535
A(:,:,2) =
1 1 1
1 1 1
1 1 1
A(:,:,3) =
1 1 1
1 1 1
1 1 1
Oh no! It didn't work! It only worked for the first channel.... why? The reason why is because A(:,:,1) or any other channel provides a 2D matrix. If you provide a single 2D matrix, it only modifies the first slice of the volume. As such, as your loop keeps progressing, only the first channel gets modified (if at all). If you wanted to modify the second channel, you would have to create a 3D matrix, where the first slice would have all logical false, while the second slice contains the Boolean mask from Vol(:,:,2) > 0.
The 3D slicing stuff is probably complicated, especially for someone new to MATLAB. As such, I would recommend you do this to make things simpler. If you want to modify each slice, consider placing each binary mask as a temporary variable, modifying that temporary variable, then manually assigning this back to each slice. In other words:
for p=1:100
temp = Vol(:,:,p); %//Extract p'th channel
temp(temp > 0) = 65535; %// Find non-zero pixels and set to 65535
Vol(:,:,p) = temp; %// Set back to p'th channel.
end
Another recommended suggestion
Instead of using for loops, I would like to recommend this simple one-liner:
Vol(Vol > 0) = 65535;
This will automatically create a 3D Boolean matrix that will index Vol, and it will find those locations that are greater than 0, and set all of those locations to 65535. This avoids the need of any unnecessary for loops. This one-line essentially performs what the above for loop is doing, but is much more quicker... and I daresay much easier to read.
For your problem, I would just do :
Vol(Vol(:,:,1:100) > 0) = 65535;
No need for loop.

generating bitboard masks for move

I'm trying to understand how works bitboard representation in chess programming and I can't find usefull information (or just can't translate it correctly ^^) about one detail. My question is, how to generate automaticly masks for move on every position by every piece. I assume its an matrix where each piece type have define every field he can move from this position (array[5][64] for wP, bP, K, R, N, B). For example for Rook on position below, only allowed positions are:
0 0 1 0 0 0 0 0
0 0 1 0 0 0 0 0
0 0 1 0 0 0 0 0
0 0 1 0 0 0 0 0
1 1 R 1 1 1 1 1
0 0 1 0 0 0 0 0
0 0 1 0 0 0 0 0
0 0 1 0 0 0 0 0
I assume i have to create something like this for each piece type and for every tile it can step but do i have to hardcoded it manually to array or there is some possibility to automatize this process and precompute it after program run?
You could fairly easily precompute all of the bitboards that you need, since the rules of chess are well defined. For example, here is a function (in python) that would compute legal moves for a rook:
import sys
def rook(x, y):
for i in range (1, 9):
for j in range (1, 9):
if x == i or y == j:
sys.stdout.write("1")
else:
sys.stdout.write("0")
sys.stdout.write("\n")
print "Bit board of legal moves for a rook at 1, 3:"
rook(1, 3)
Instead of printing out the bitboard, you would likely store it in a compact format, such as an array of 64 bit values (since an 8x8 board needs 64 bits for each board).
This is a fairly extreme optimization technique, so the details of its implementation are going to get hairy (and be a pain to debug).
I used the wiki bitboard page as reference.

Plotting multiple matrices gnuplot

I'm fairly new to gnuplot and I'm trying to see how my array varies with iterations of my program to debug, I would ideally like to be able to make an animation of the data.
Here's what my data looks like:
'q=0'
1 0 0
0 1 1
1 1 0
'q=1'
1 0 0
0 1 1
0 1 0
and so on.
I tried using: plot "matrix.dat" index 0 matrix with image, just to plot the first matrix but I got "Warning matrix contains missing or undefined values, Matrix does not represent a grid".
I think the problem may be the comment lines. When I used this version of your data file
1 0 0
0 1 1
1 1 0
1 0 0
0 1 1
0 1 0
plotting with index works. To make an animation, you could create a series of .png files and stitch them together with another application. An example of the gnuplot code to make the .pngs would be:
set terminal png
do for [i=0:100] {
set output sprintf('matrix%03.0f.png',i)
plot 'data.dat' index i matrix with image
}

Sorting a binary 2D matrix?

I'm looking for some pointers here as I don't quite know where to start researching this one.
I have a 2D matrix with 0 or 1 in each cell, such as:
1 2 3 4
A 0 1 1 0
B 1 1 1 0
C 0 1 0 0
D 1 1 0 0
And I'd like to sort it so it is as "upper triangular" as possible, like so:
4 3 1 2
B 0 1 1 1
A 0 1 0 1
D 0 0 1 1
C 0 0 0 1
The rows and columns must remain intact, i.e. elements can't be moved individually and can only be swapped "whole".
I understand that there'll probably be pathological cases where a matrix has multiple possible sorted results (i.e. same shape, but differ in the identity of the "original" rows/columns.)
So, can anyone suggest where I might find some starting points for this? An existing library/algorithm would be great, but I'll settle for knowing the name of the problem I'm trying to solve!
I doubt it's a linear algebra problem as such, and maybe there's some kind of image processing technique that's applicable.
Any other ideas aside, my initial guess is just to write a simple insertion sort on the rows, then the columns and iterate that until it stabilises (and hope that detecting the pathological cases isn't too hard.)
More details: Some more information on what I'm trying to do may help clarify. Each row represents a competitor, each column represents a challenge. Each 1 or 0 represents "success" for the competitor on a particular challenge.
By sorting the matrix so all 1s are in the top-right, I hope to then provide a ranking of the intrinsic difficulty of each challenge and a ranking of the competitors (which will take into account the difficulty of the challenges they succeeded at, not just the number of successes.)
Note on accepted answer: I've accepted Simulated Annealing as "the answer" with the caveat that this question doesn't have a right answer. It seems like a good approach, though I haven't actually managed to come up with a scoring function that works for my problem.
An Algorithm based upon simulated annealing can handle this sort of thing without too much trouble. Not great if you have small matrices which most likely hae a fixed solution, but great if your matrices get to be larger and the problem becomes more difficult.
(However, it also fails your desire that insertions can be done incrementally.)
Preliminaries
Devise a performance function that "scores" a matrix - matrices that are closer to your triangleness should get a better score than those that are less triangle-y.
Devise a set of operations that are allowed on the matrix. Your description was a little ambiguous, but if you can swap rows then one op would be SwapRows(a, b). Another could be SwapCols(a, b).
The Annealing loop
I won't give a full exposition here, but the idea is simple. You perform random transformations on the matrix using your operations. You measure how much "better" the matrix is after the operation (using the performance function before and after the operation). Then you decide whether to commit that transformation. You repeat this process a lot.
Deciding whether to commit the transform is the fun part: you need to decide whether to perform that operation or not. Toward the end of the annealing process, you only accept transformations that improved the score of the matrix. But earlier on, in a more chaotic time, you allow transformations that don't improve the score. In the beginning, the algorithm is "hot" and anything goes. Eventually, the algorithm cools and only good transforms are allowed. If you linearly cool the algorithm, then the choice of whether to accept a transformation is:
public bool ShouldAccept(double cost, double temperature, Random random) {
return Math.Exp(-cost / temperature) > random.NextDouble();
}
You should read the excellent information contained in Numerical Recipes for more information on this algorithm.
Long story short, you should learn some of these general purpose algorithms. Doing so will allow you to solve large classes of problems that are hard to solve analytically.
Scoring algorithm
This is probably the trickiest part. You will want to devise a scorer that guides the annealing process toward your goal. The scorer should be a continuous function that results in larger numbers as the matrix approaches the ideal solution.
How do you measure the "ideal solution" - triangleness? Here is a naive and easy scorer: For every point, you know whether it should be 1 or 0. Add +1 to the score if the matrix is right, -1 if it's wrong. Here's some code so I can be explicit (not tested! please review!)
int Score(Matrix m) {
var score = 0;
for (var r = 0; r < m.NumRows; r++) {
for (var c = 0; c < m.NumCols; c++) {
var val = m.At(r, c);
var shouldBe = (c >= r) ? 1 : 0;
if (val == shouldBe) {
score++;
}
else {
score--;
}
}
}
return score;
}
With this scoring algorithm, a random field of 1s and 0s will give a score of 0. An "opposite" triangle will give the most negative score, and the correct solution will give the most positive score. Diffing two scores will give you the cost.
If this scorer doesn't work for you, then you will need to "tune" it until it produces the matrices you want.
This algorithm is based on the premise that tuning this scorer is much simpler than devising the optimal algorithm for sorting the matrix.
I came up with the below algorithm, and it seems to work correctly.
Phase 1: move rows with most 1s up and columns with most 1s right.
First the rows. Sort the rows by counting their 1s. We don't care
if 2 rows have the same number of 1s.
Now the columns. Sort the cols by
counting their 1s. We don't care
if 2 cols have the same number of
1s.
Phase 2: repeat phase 1 but with extra criterions, so that we satisfy the triangular matrix morph.
Criterion for rows: if 2 rows have the same number of 1s, we move up the row that begin with fewer 0s.
Criterion for cols: if 2 cols have the same number of 1s, we move right the col that has fewer 0s at the bottom.
Example:
Phase 1
1 2 3 4 1 2 3 4 4 1 3 2
A 0 1 1 0 B 1 1 1 0 B 0 1 1 1
B 1 1 1 0 - sort rows-> A 0 1 1 0 - sort cols-> A 0 0 1 1
C 0 1 0 0 D 1 1 0 0 D 0 1 0 1
D 1 1 0 0 C 0 1 0 0 C 0 0 0 1
Phase 2
4 1 3 2 4 1 3 2
B 0 1 1 1 B 0 1 1 1
A 0 0 1 1 - sort rows-> D 0 1 0 1 - sort cols-> "completed"
D 0 1 0 1 A 0 0 1 1
C 0 0 0 1 C 0 0 0 1
Edit: it turns out that my algorithm doesn't give proper triangular matrices always.
For example:
Phase 1
1 2 3 4 1 2 3 4
A 1 0 0 0 B 0 1 1 1
B 0 1 1 1 - sort rows-> C 0 0 1 1 - sort cols-> "completed"
C 0 0 1 1 A 1 0 0 0
D 0 0 0 1 D 0 0 0 1
Phase 2
1 2 3 4 1 2 3 4 2 1 3 4
B 0 1 1 1 B 0 1 1 1 B 1 0 1 1
C 0 0 1 1 - sort rows-> C 0 0 1 1 - sort cols-> C 0 0 1 1
A 1 0 0 0 A 1 0 0 0 A 0 1 0 0
D 0 0 0 1 D 0 0 0 1 D 0 0 0 1
(no change)
(*) Perhaps a phase 3 will increase the good results. In that phase we place the rows that start with fewer 0s in the top.
Look for a 1987 paper by Anna Lubiw on "Doubly Lexical Orderings of Matrices".
There is a citation below. The ordering is not identical to what you are looking for, but is pretty close. If nothing else, you should be able to get a pretty good idea from there.
http://dl.acm.org/citation.cfm?id=33385
Here's a starting point:
Convert each row from binary bits into a number
Sort the numbers in descending order.
Then convert each row back to binary.
Basic algorithm:
Determine the row sums and store
values. Determine the column sums
and store values.
Sort the row sums in ascending order. Sort the column
sums in ascending order.
Hopefully, you should have a matrix with as close to an upper-right triangular region as possible.
Treat rows as binary numbers, with the leftmost column as the most significant bit, and sort them in descending order, top to bottom
Treat the columns as binary numbers with the bottommost row as the most significant bit and sort them in ascending order, left to right.
Repeat until you reach a fixed point. Proof that the algorithm terminates left as an excercise for the reader.

Resources