How can I create an incidence matrix in Julia?

I would like to create an incidence matrix.
I have a file with 3 columns, like:
id x y
A 22 2
B 4 21
C 21 360
D 26 2
E 22 58
F 2 347
And I want a matrix like (without col and row names):
2 4 21 22 26 58 347 360
A 1 0 0 1 0 0 0 0
B 0 1 1 0 0 0 0 0
C 0 0 1 0 0 0 0 1
D 1 0 0 0 1 0 0 0
E 0 0 0 1 0 1 0 0
F 1 0 0 0 0 0 1 0
I have started the code like:
haps = readdlm("File.txt",header=true)
hap1_2 = map(Int64,haps[1][:,2:end])
ID = (haps[1][:,1])
dic1 = Dict()
for (i in 1:21)
dic1[ID[i]] = hap1_2[i,:]
end
X=[zeros(21,22)]; #the original file has 21 rows and 22 columns
X1 = hcat(ID,X)
The problem now is that I don't know how to fill the matrix with 1s in the specific columns, as in the example above.
I'm also not sure whether I'm on the right track.
Any suggestions?
Thanks!

NamedArrays is a neat package which allows naming both rows and columns and seems to fit the bill for this problem. Suppose the data is in data.csv, here is one method to go about it (install NamedArrays with Pkg.add("NamedArrays")):
data,header = readcsv("data.csv",header=true);
# get the column names by looking at unique values in columns
cols = unique(vec([(header[j+1],data[i,j+1]) for i in 1:size(data,1),j=1:2]))
# row names from ID column
rows = data[:,1]
using NamedArrays
narr = NamedArray(zeros(Int,length(rows),length(cols)),(rows,cols),("id","attr"));
# now stamp in the 1s in the right places
for r=1:size(data,1),c=2:size(data,2) narr[data[r,1],(header[c],data[r,c])] = 1 ; end
Now we have (note I transposed narr for better printout):
julia> narr'
10x6 NamedArray{Int64,2}:
attr ╲ id │ A B C D E F
──────────┼─────────────────
("x",22) │ 1 0 0 0 1 0
("x",4) │ 0 1 0 0 0 0
("x",21) │ 0 0 1 0 0 0
("x",26) │ 0 0 0 1 0 0
("x",2) │ 0 0 0 0 0 1
("y",2) │ 1 0 0 1 0 0
("y",21) │ 0 1 0 0 0 0
("y",360) │ 0 0 1 0 0 0
("y",58) │ 0 0 0 0 1 0
("y",347) │ 0 0 0 0 0 1
But, if DataFrames are necessary, similar tricks should apply.
---------- UPDATE ----------
If it should not matter which column a value comes from, i.e. x=2 and y=2 should both set a 1 in the column for value 2, then the code becomes:
using NamedArrays
data,header = readcsv("data.csv",header=true);
rows = data[:,1]
cols = map(string,sort(unique(vec(data[:,2:end]))))
narr = NamedArray(zeros(Int,length(rows),length(cols)),(rows,cols),("id","attr"));
for r=1:size(data,1),c=2:size(data,2) narr[data[r,1],string(data[r,c])] = 1 ; end
giving:
julia> narr
6x8 NamedArray{Int64,2}:
id ╲ attr │ 2 4 21 22 26 58 347 360
──────────┼───────────────────────────────────────
A │ 1 0 0 1 0 0 0 0
B │ 0 1 1 0 0 0 0 0
C │ 0 0 1 0 0 0 0 1
D │ 1 0 0 0 1 0 0 0
E │ 0 0 0 1 0 1 0 0
F │ 1 0 0 0 0 0 1 0
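For readers who want to see the same logic without NamedArrays, here is a minimal sketch in plain Python (not Julia). The data is the sample from the question; the function name incidence_matrix and the variable names are made up for illustration. It follows the updated version above: one column per distinct value, regardless of whether the value came from x or y.

```python
# Rows from the question's sample file: (id, x, y).
records = [
    ("A", 22, 2),
    ("B", 4, 21),
    ("C", 21, 360),
    ("D", 26, 2),
    ("E", 22, 58),
    ("F", 2, 347),
]

def incidence_matrix(records):
    """Return (row_ids, columns, matrix) with one column per distinct value."""
    row_ids = [rec[0] for rec in records]
    # Sorted distinct values across all attribute columns become the columns.
    columns = sorted({value for rec in records for value in rec[1:]})
    col_index = {value: j for j, value in enumerate(columns)}
    matrix = [[0] * len(columns) for _ in records]
    # Stamp a 1 wherever a row contains the value of that column.
    for i, rec in enumerate(records):
        for value in rec[1:]:
            matrix[i][col_index[value]] = 1
    return row_ids, columns, matrix

row_ids, columns, matrix = incidence_matrix(records)
print(columns)
for name, row in zip(row_ids, matrix):
    print(name, row)
```

The dictionary col_index plays the role of the column names in the NamedArray: it turns a value into a column position so the 1s can be written directly.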

Here is a slight variation on something that I use for creating sparse matrices out of categorical variables for regression analyses. The function includes a variety of comments and options to suit it to your needs. Note: as written, it treats the appearances of "2" and "21" in x and y as separate. It is far less elegant in naming and appearance than the nice response from Dan Getz. The main advantage here is that it works with sparse matrices so if your data is huge, this will be helpful in reducing storage space and computation time.
function OneHot(x::Array, header::Bool)
UniqueVals = unique(x)
Val_to_Idx = [Val => Idx for (Idx, Val) in enumerate(unique(x))] ## create a dictionary that maps unique values in the input array to column positions in the new sparse matrix.
ColIdx = convert(Array{Int64}, [Val_to_Idx[Val] for Val in x])
MySparse = sparse(collect(1:length(x)), ColIdx, ones(Int32, length(x)))
if header
return [UniqueVals' ; MySparse] ## note: this won't be sparse
## alternatively use return (MySparse, UniqueVals) to get a tuple, second element is the header which you can then feed to something to name the columns or do whatever else with
else
return MySparse ## use MySparse[:, 2:end] to drop a value (which you would want to do for categorical variables in a regression)
end
end
x = [22, 4, 21, 26, 22, 2];
y = [2, 21, 360, 2, 58, 347];
Incidence = [OneHot(x, true) OneHot(y, true)]
7x10 Array{Int64,2}:
22 4 21 26 2 2 21 360 58 347
1 0 0 0 0 1 0 0 0 0
0 1 0 0 0 0 1 0 0 0
0 0 1 0 0 0 0 1 0 0
0 0 0 1 0 1 0 0 0 0
1 0 0 0 0 0 0 0 1 0
0 0 0 0 1 0 0 0 0 1
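The key step in OneHot is the dictionary that maps each distinct value to a column position; the sparse matrix is then just a list of (row, column) coordinates holding a 1. A rough Python sketch of that step, using only the standard library (the name one_hot_coords is hypothetical, and the coordinates are 0-based rather than Julia's 1-based):

```python
def one_hot_coords(values):
    """Return (unique_values, coords); coords lists the (row, col) of each 1."""
    col_of = {}
    for v in values:
        # The first time a value is seen it gets the next free column.
        col_of.setdefault(v, len(col_of))
    unique_values = list(col_of)  # dicts keep insertion order (Python 3.7+)
    coords = [(row, col_of[v]) for row, v in enumerate(values)]
    return unique_values, coords

x = [22, 4, 21, 26, 22, 2]
ux, cx = one_hot_coords(x)
print(ux)  # column headers, in order of first appearance
print(cx)  # one (row, col) pair per input element
```

Storing only the coordinates is what keeps memory proportional to the number of rows rather than rows times distinct values.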

Related

Is there a way to vectorize this matlab for loop?

for i = 2:N
A(i,i-1:i+1) = [1, -2, 1];
end
Hello, matlab is telling me that this code can be faster by using spalloc for the matrix A (which I have) but also by vectorizing this for loop. I've tried to use the following:
i = 2:N
A(i, i-1:i+1)
but the result obviously turned out to be not what I want.
How can I solve this?
Thank you!
It looks like you're trying to get a second-order difference operator, except your loop winds up missing the first row and including an extra column. The normal (sparse) difference operator is generated like this:
N = 10;
v = ones(N, 1);
A = spdiags([v -2*v v], [-1 0 1], N, N);
full(A) % for display only
You'll see:
ans =
-2 1 0 0 0 0 0 0 0 0
1 -2 1 0 0 0 0 0 0 0
0 1 -2 1 0 0 0 0 0 0
0 0 1 -2 1 0 0 0 0 0
0 0 0 1 -2 1 0 0 0 0
0 0 0 0 1 -2 1 0 0 0
0 0 0 0 0 1 -2 1 0 0
0 0 0 0 0 0 1 -2 1 0
0 0 0 0 0 0 0 1 -2 1
0 0 0 0 0 0 0 0 1 -2
If that's not quite what you want (e.g., you really don't want the first row), then it's probably faster to generate it as above and then fix it up.
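To make the structure that spdiags builds explicit, here is a small Python sketch (not MATLAB) that fills the three diagonals by hand. It builds a dense list of rows purely for illustration, so it does not have the storage benefits of the sparse version; the function name second_difference is made up.

```python
def second_difference(n):
    """Return the n-by-n second-difference matrix as a list of rows."""
    rows = []
    for i in range(n):
        row = [0] * n
        row[i] = -2            # main diagonal
        if i > 0:
            row[i - 1] = 1     # sub-diagonal
        if i < n - 1:
            row[i + 1] = 1     # super-diagonal
        rows.append(row)
    return rows

for row in second_difference(5):
    print(row)
```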

Assign specific color to seaborn heatmap

I'm trying to make a heatmap using seaborn, but I'm stuck on changing the color of specific values. Suppose the value 0 should be white and the value 1 should be grey, and anything above that should use the palette provided by cmap.
I tried to use a mask, but got confused.
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
import pandas as pd
df = pd.read_csv('/home/test.csv', index_col=0)
fig, ax = plt.subplots()
sns.heatmap(df, cmap="Reds", vmin=0, vmax=15)
plt.show()
This is the sample data:
TAG A B C D E F G H I J
TAG_1 1 0 0 5 0 7 1 1 0 10
TAG_2 0 1 0 6 0 6 0 0 0 7
TAG_3 0 1 0 2 0 4 0 0 1 4
TAG_4 0 0 0 3 1 3 0 0 0 10
TAG_5 1 0 1 5 0 2 1 1 0 11
TAG_6 0 0 0 0 0 0 0 0 0 12
TAG_7 0 1 0 0 1 0 0 0 0 0
TAG_8 0 0 0 1 0 0 1 0 1 0
TAG_9 0 0 1 0 0 0 0 0 0 0
TAG_10 0 0 0 0 0 0 0 0 0 0
df.set_index('TAG', inplace=True) makes the TAG column the index, so seaborn uses the tags as row labels rather than as data.
The 'binary' colormap goes smoothly from white for the lower values to dark black for the highest. Playing with vmin and vmax, setting vmin=0 and vmax to a value between 1.5 and about 5, value 0 will be white and 1 will be any desired type of gray.
To set a mask, the dataframe should be converted to a 2D numpy array and be of type float.
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
from io import StringIO
data_str = StringIO('''TAG A B C D E F G H I J
TAG_1 1 0 0 5 0 7 1 1 0 10
TAG_2 0 1 0 6 0 6 0 0 0 7
TAG_3 0 1 0 2 0 4 0 0 1 4
TAG_4 0 0 0 3 1 3 0 0 0 10
TAG_5 1 0 1 5 0 2 1 1 0 11
TAG_6 0 0 0 0 0 0 0 0 0 12
TAG_7 0 1 0 0 1 0 0 0 0 0
TAG_8 0 0 0 1 0 0 1 0 1 0
TAG_9 0 0 1 0 0 0 0 0 0 0
TAG_10 0 0 0 0 0 0 0 0 0 0''')
df = pd.read_csv(data_str, sep=r'\s+')  # whitespace-separated; delim_whitespace is deprecated in recent pandas
df.set_index('TAG', inplace=True)
values = df.to_numpy(dtype=float)
ax = sns.heatmap(values, cmap='Reds', vmin=0, vmax=15, square=True)
sns.heatmap(values, xticklabels=df.columns, yticklabels=df.index,
cmap=plt.get_cmap('binary'), vmin=0, vmax=2, mask=values > 1, cbar=False, ax=ax)
plt.show()
Alternatively, a custom colormap could be created. That way the colorbar will also show the adapted colors.
from matplotlib.colors import LinearSegmentedColormap
cmap_reds = plt.get_cmap('Reds')
num_colors = 15
colors = ['white', 'grey'] + [cmap_reds(i / num_colors) for i in range(2, num_colors)]
cmap = LinearSegmentedColormap.from_list('', colors, num_colors)
ax = sns.heatmap(df, cmap=cmap, vmin=0, vmax=num_colors, square=True, cbar=False)
cbar = plt.colorbar(ax.collections[0], ticks=range(num_colors + 1))
plt.show()

Finding islands of ones with zeros boundary

I am trying to find islands of numbers in a matrix.
By an island, I mean a rectangular area where ones are connected with each other either horizontally, vertically or diagonally including the boundary layer of zeros
Suppose I have this matrix:
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 1
0 0 0 1 1 1 0 1 1 0 0 0 1 1 1 1 0
0 0 0 0 0 0 1 0 1 0 0 0 0 1 1 1 1
0 0 0 1 0 1 0 1 1 0 0 0 1 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 1 0 1 0 1 1 1 0 0 0 0 0 0 0
0 0 1 0 1 1 1 1 1 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
By boundary layer, I mean row 2 and 7, and column 3 and 10 for island#1.
This is shown below:
I want the row and column indices of the islands. So for the above matrix, the desired output is:
isl{1}= {[2 3 4 5 6 7]; % row indices of island#1
[3 4 5 6 7 8 9 10]} % column indices of island#1
isl{2}= {[2 3 4 5 6 7]; % row indices of island#2
[12 13 14 15 16 17]}; % column indices of island#2
isl{3} ={[9 10 11 12]; % row indices of island#3
[2 3 4 5 6 7 8 9 10 11];} % column indices of island#3
It doesn't matter which island is detected first.
I know that [r,c] = find(matrix) gives the row and column indices of the ones, but I have no idea how to detect which ones are connected, since they can be connected horizontally, vertically or diagonally.
Any ideas on how to deal with this problem?
You should look at the BoundingBox and ConvexHull stats returned by regionprops:
a = imread('circlesBrightDark.png');
bw = a < 100;
s = regionprops('table',bw,'BoundingBox','ConvexHull')
https://www.mathworks.com/help/images/ref/regionprops.html
Finding the connected components and their bounding boxes is the easy part. The more difficult part is merging the bounding boxes into islands.
Bounding Boxes
First the easy part.
function bBoxes = getIslandBoxes(lMap)
% find bounding box of each candidate island
% lMap is a logical matrix containing zero or more connected components
bw = bwlabel(lMap); % label connected components in logical matrix
bBoxes = struct2cell(regionprops(bw, 'BoundingBox')); % get bounding boxes
bBoxes = cellfun(@round, bBoxes, 'UniformOutput', false); % round values
end
The values are rounded because the bounding boxes returned by regionprops lie on the grid lines outside their respective components rather than on the cell centers, and we need integer values to use as subscripts into the matrix. For example, a component that looks like this:
0 0 0
0 1 0
0 0 0
will have a bounding box of
[ 1.5000 1.5000 1.0000 1.0000 ]
which we round to
[ 2 2 1 1]
Merging
Now the hard part. First, the merge condition:
We merge bounding box b2 into bounding box b1 if b2 and the island of b1 (including the boundary layer) have a non-null intersection.
This condition ensures that bounding boxes are merged when one component is wholly or partially inside the bounding box of another, but it also catches the edge cases when a bounding box is within the zero boundary of another. Once all of the bounding boxes are merged, they are guaranteed to have a boundary of all zeros (or border the edge of the matrix), otherwise the nonzero value in its boundary would have been merged.
Since merging involves deleting the merged bounding box, the loops are done backwards so that we don't end up indexing non-existent array elements.
Unfortunately, making one pass through the array comparing each element to all the others is insufficient to catch all cases. To signal that all of the possible bounding boxes have been merged into islands, we use a flag called anyMerged and loop until we get through one complete iteration without merging anything.
function mBoxes = mergeBoxes(bBoxes)
% find bounding boxes that intersect, and merge them
mBoxes = bBoxes;
% merge bounding boxes that overlap
anyMerged = true; % flag to show when we've finished
while (anyMerged)
anyMerged = false; % no boxes merged on this iteration so far...
for box1 = numel(mBoxes):-1:2
for box2 = box1-1:-1:1
% if intersection between bounding boxes is > 0, merge
% the size of box1 is increased by 1 on all sides...
% this is so that components that lie within the borders
% of another component, but not inside the bounding box,
% are merged
if (rectint(mBoxes{box1} + [-1 -1 2 2], mBoxes{box2}) > 0)
coords1 = rect2corners(mBoxes{box1});
coords2 = rect2corners(mBoxes{box2});
minX = min(coords1(1), coords2(1));
minY = min(coords1(2), coords2(2));
maxX = max(coords1(3), coords2(3));
maxY = max(coords1(4), coords2(4));
mBoxes{box2} = [minX, minY, maxX-minX+1, maxY-minY+1]; % merge
mBoxes(box1) = []; % delete redundant bounding box
anyMerged = true; % bounding boxes merged: loop again
break;
end
end
end
end
end
The merge function uses a small utility function that converts rectangles with the format [x y width height] to a vector of subscripts for the top-left, bottom-right corners [x1 y1 x2 y2]. (This was actually used in another function to check that an island had a zero border, but as discussed above, this check is unnecessary.)
function corners = rect2corners(rect)
% change from rect = x, y, width, height
% to corners = x1, y1, x2, y2
corners = [rect(1), ...
rect(2), ...
rect(1) + rect(3) - 1, ...
rect(2) + rect(4) - 1];
end
Output Formatting and Driver Function
The return value from mergeBoxes is a cell array of rectangle objects. If you find this format useful, you can stop here, but it's easy to get to the format requested with ranges of rows and columns for each island:
function rRanges = rect2range(bBoxes, mSize)
% convert rect = x, y, width, height to
% range = y:y+height-1; x:x+width-1
% and expand range by 1 in all 4 directions to include zero border,
% making sure to stay within borders of original matrix
rangeFun = @(rect) {max(rect(2)-1,1):min(rect(2)+rect(4),mSize(1));...
max(rect(1)-1,1):min(rect(1)+rect(3),mSize(2))};
rRanges = cellfun(rangeFun, bBoxes, 'UniformOutput', false);
end
All that's left is a main function to tie all of the others together and we're done.
function theIslands = getIslandRects(m)
% get rectangle around each component in map
lMap = logical(m);
% get the bounding boxes of candidate islands
bBoxes = getIslandBoxes(lMap);
% merge bounding boxes that overlap
bBoxes = mergeBoxes(bBoxes);
% convert bounding boxes to row/column ranges
theIslands = rect2range(bBoxes, size(lMap));
end
Here's a run using the sample matrix given in the question:
M =
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 1
0 0 0 1 1 1 0 1 1 0 0 0 1 1 1 1 0
0 0 0 0 0 0 1 0 1 0 0 0 0 1 1 1 1
0 0 0 1 0 1 0 1 1 0 0 0 1 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 1 0 1 0 1 1 1 0 0 0 0 0 0 0
0 0 1 0 1 1 1 1 1 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> getIslandRects(M)
ans =
{
[1,1] =
{
[1,1] =
9 10 11 12
[2,1] =
2 3 4 5 6 7 8 9 10 11
}
[1,2] =
{
[1,1] =
2 3 4 5 6 7
[2,1] =
3 4 5 6 7 8 9 10
}
[1,3] =
{
[1,1] =
2 3 4 5 6 7
[2,1] =
12 13 14 15 16 17
}
}
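For readers without the Image Processing Toolbox, the same pipeline can be sketched in plain Python. This is an illustration under stated assumptions, not a port of the MATLAB code: bwlabel/regionprops are replaced by a breadth-first flood fill with 8-connectivity, boxes are stored as inclusive 0-based corners (r0, c0, r1, c1), and all function names are made up. The merge rule is the one described above: two boxes are merged when one intersects the other grown by one cell on every side.

```python
from collections import deque

def component_boxes(grid):
    """Inclusive box (r0, c0, r1, c1) of each 8-connected component of ones."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if not grid[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue = deque([(r, c)])
            r0 = r1 = r
            c0 = c1 = c
            while queue:  # flood fill, growing the box as cells are visited
                cr, cc = queue.popleft()
                r0, r1 = min(r0, cr), max(r1, cr)
                c0, c1 = min(c0, cc), max(c1, cc)
                for nr in range(max(cr - 1, 0), min(cr + 2, rows)):
                    for nc in range(max(cc - 1, 0), min(cc + 2, cols)):
                        if grid[nr][nc] and not seen[nr][nc]:
                            seen[nr][nc] = True
                            queue.append((nr, nc))
            boxes.append((r0, c0, r1, c1))
    return boxes

def touches(a, b):
    """True if box b intersects box a grown by one cell on every side."""
    return not (b[2] < a[0] - 1 or b[0] > a[2] + 1 or
                b[3] < a[1] - 1 or b[1] > a[3] + 1)

def merge_boxes(boxes):
    """Merge touching boxes repeatedly until a full pass merges nothing."""
    boxes = list(boxes)
    any_merged = True
    while any_merged:
        any_merged = False
        for i in range(len(boxes)):
            for j in range(i + 1, len(boxes)):
                a, b = boxes[i], boxes[j]
                if touches(a, b):
                    boxes[i] = (min(a[0], b[0]), min(a[1], b[1]),
                                max(a[2], b[2]), max(a[3], b[3]))
                    del boxes[j]
                    any_merged = True
                    break
            if any_merged:
                break  # restart the scan after every merge
    return boxes

def islands(grid):
    """Row and column ranges (1-based, zero border included) per island."""
    rows, cols = len(grid), len(grid[0])
    result = []
    for r0, c0, r1, c1 in merge_boxes(component_boxes(grid)):
        # Grow by one cell, clip to the matrix, then shift to 1-based indices.
        row_range = list(range(max(r0 - 1, 0) + 1, min(r1 + 1, rows - 1) + 2))
        col_range = list(range(max(c0 - 1, 0) + 1, min(c1 + 1, cols - 1) + 2))
        result.append((row_range, col_range))
    return result

M = [[int(ch) for ch in line] for line in [
    "00000000000000000", "00000000000000000", "00000011000000001",
    "00011101100011110", "00000010100001111", "00010101100010000",
    "00000000000000000", "00000000000000000", "00000000000000000",
    "00010101110000000", "00101111100000000", "00000000000000000"]]

for row_range, col_range in islands(M):
    print(row_range, col_range)
```

On the sample matrix this gives rows 2 to 7 with columns 3 to 10, rows 2 to 7 with columns 12 to 17, and rows 9 to 12 with columns 2 to 11. Restarting the scan after each merge is the simplest way to get the "loop until nothing merges" behaviour; it is quadratic in the number of boxes, which is fine for small inputs.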
Quite easy!
Just use bwboundaries to get the boundaries of each of the blobs. You can then take the min and max of each boundary in the x and y directions to build your box.
Use image dilation and regionprops
mat = [...
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0;
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0;
0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 1;
0 0 0 1 1 1 0 1 1 0 0 0 1 1 1 1 0;
0 0 0 0 0 0 1 0 1 0 0 0 0 1 1 1 1;
0 0 0 1 0 1 0 1 1 0 0 0 1 0 0 0 0;
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0;
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0;
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0;
0 0 0 1 0 1 0 1 1 1 0 0 0 0 0 0 0;
0 0 1 0 1 1 1 1 1 0 0 0 0 0 0 0 0;
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0];
mat=logical(mat);
dil_mat=imdilate(mat,true(2,2)); %here we make bridges to 1 px away ones
l_mat=bwlabel(dil_mat,8);
bb = regionprops(l_mat,'BoundingBox');
bb = struct2cell(bb); bb = cellfun(@(x) fix(x), bb, 'un',0);
isl = cellfun(@(x) {max(1,x(2)):min(x(2)+x(4),size(mat,1)),...
max(1,x(1)):min(x(1)+x(3),size(mat,2))},bb,'un',0);

Adjacent Elements in MATLAB with Mathematical Formulation

I have a set with elements and the possible adjacent combinations for this are:
So the total possible combinations are c=11 (for n=4), which can be calculated with the formula: c = n^2/2 + n/2 + 1
I can model this using a as below whose elements can be represented as a(n,c) are:
I have tried to implement this in MATLAB, but since I have hard-coded the above math my code is not extensible for cases where n > 4:
n=4;
c=((n^2)/2)+(n/2)+1;
A=zeros(n,c);
for i=1:n
A(i,i+1)=1;
end
for i=1:n-1
A(i,n+i+1)=1;
A(i+1,n+i+1)=1;
end
for i=1:n-2
A(i,n+i+4)=1;
A(i+1,n+i+4)=1;
A(i+2,n+i+4)=1;
end
for i=1:n-3
A(i,n+i+6)=1;
A(i+1,n+i+6)=1;
A(i+2,n+i+6)=1;
A(i+3,n+i+6)=1;
end
Is there a relatively low-complexity way to implement this in MATLAB for a set N with n elements, following the mathematical formulation above?
The easy way to go about this is to take a bit pattern with the first k bits set and shift it down n - k times, saving each shifted column vector to the result. So, starting from
1
0
0
0
Shift 1, 2, and 3 times to get
|1 0 0 0|
|0 1 0 0|
|0 0 1 0|
|0 0 0 1|
We'll use circshift to achieve this.
function A = adjcombs(n)
c = (n^2 + n)/2 + 1; % number of combinations
A = zeros(n,c); % preallocate output array
col_idx = 1; % skip the first (all-zero) column
curr_col = zeros(n,1); % column vector containing current combination
for elem_count = 1:n
curr_col(elem_count) = 1; % add another element to our combination
for shift_count = 0:(n - elem_count)
col_idx = col_idx + 1; % increment column index
% shift the current column and insert it at the proper index
A(:,col_idx) = circshift(curr_col, shift_count);
end
end
end
Calling the function with n = 4 and 6 we get:
>> A = adjcombs(4)
A =
0 1 0 0 0 1 0 0 1 0 1
0 0 1 0 0 1 1 0 1 1 1
0 0 0 1 0 0 1 1 1 1 1
0 0 0 0 1 0 0 1 0 1 1
>> A = adjcombs(6)
A =
0 1 0 0 0 0 0 1 0 0 0 0 1 0 0 0 1 0 0 1 0 1
0 0 1 0 0 0 0 1 1 0 0 0 1 1 0 0 1 1 0 1 1 1
0 0 0 1 0 0 0 0 1 1 0 0 1 1 1 0 1 1 1 1 1 1
0 0 0 0 1 0 0 0 0 1 1 0 0 1 1 1 1 1 1 1 1 1
0 0 0 0 0 1 0 0 0 0 1 1 0 0 1 1 0 1 1 1 1 1
0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 1 0 0 1 0 1 1
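The same enumeration can be written without circshift by generating each run of k consecutive elements directly from its start position. A small Python sketch of that idea (the function name adjacent_combinations is made up; the column order matches the MATLAB output above):

```python
def adjacent_combinations(n):
    """Rows are elements, columns are combinations of adjacent elements."""
    columns = [[0] * n]  # first column: the empty combination
    for k in range(1, n + 1):              # k = number of adjacent elements
        for start in range(n - k + 1):     # every position the run can start at
            columns.append([1 if start <= i < start + k else 0
                            for i in range(n)])
    # Transpose so that each combination becomes a column.
    return [list(row) for row in zip(*columns)]

A = adjacent_combinations(4)
for row in A:
    print(row)
```

There are n - k + 1 runs of length k, so the total number of columns is 1 + n + (n-1) + ... + 1 = (n^2 + n)/2 + 1, which agrees with the formula used to preallocate A.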

Convert adjacent elements of matrix in Matlab

I'm working on the Brushfire algorithm and I need a loop that scans through the matrix, finds the ones that are adjacent to zeros, and converts those "1"s to "2". Assume that I have a 5 by 5 matrix:
0 0 0 0 0
0 1 1 1 1
0 0 1 1 1
0 0 1 1 1
0 0 1 1 1
Can I somehow make it:
0 0 0 0 0
0 2 2 2 2
0 0 2 1 1
0 0 2 1 1
0 0 2 1 1
Thank you
With the image processing toolbox, the algorithm would be:
A = [0 0 0 0 0
0 1 1 1 1
0 0 1 1 1
0 0 1 1 1
0 0 1 1 1];
B = A;
%# set pixels at border between 0 and 1 to 2
B(imdilate(~A,true(3)) & A>0) = 2;
You can do it with 2D convolution, using the standard function conv2. Denoting your matrix as X,
mask = [0 1 0; 1 1 1; 0 1 0]; %// or [1 1 1; 1 1 1; 1 1 1] to include diagonal adjacency
X(conv2(double(~X), mask, 'same') & X) = 2;
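What both one-liners compute is "every nonzero cell that has at least one zero neighbour". Spelled out as an explicit loop in Python (a sketch for illustration; mark_border and its diagonal flag are made-up names, and cells outside the matrix are not counted as zeros, as with conv2's 'same' option):

```python
def mark_border(grid, diagonal=False):
    """Return a copy of grid where nonzero cells next to a zero are set to 2."""
    rows, cols = len(grid), len(grid[0])
    offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    if diagonal:
        offsets += [(-1, -1), (-1, 1), (1, -1), (1, 1)]
    result = [row[:] for row in grid]
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == 0:
                continue
            # Always test against the original grid, not the partly updated copy.
            for dr, dc in offsets:
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                    result[r][c] = 2
                    break
    return result

A = [[0, 0, 0, 0, 0],
     [0, 1, 1, 1, 1],
     [0, 0, 1, 1, 1],
     [0, 0, 1, 1, 1],
     [0, 0, 1, 1, 1]]

for row in mark_border(A):
    print(row)
```

For the sample matrix the 4-neighbour and 8-neighbour variants give the same result; they differ only when a one touches a zero at a corner alone.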
