REFERENCE STRUCTURE = 00000 A,B,C = 120.000 120.000 42.560
ALPHA,BETA,GAMMA = 90.000 90.000 90.000 SPGR = P1
31984 1 new.pdb
x y z
1 C 8.17500 93.80900 21.90700 8 4 2 0 0 0 0 0 -0.036 1
2 C 9.34800 94.14800 22.73500 1 16 9 0 0 0 0 0 0.038 1
3 C 8.05800 95.47500 24.28800 6 9 15 0 0 0 0 0 0.038 1
4 C 6.95800 94.40500 22.32000 12 1 6 0 0 0 0 0 0.060 1
5 O 7.20600 96.40600 26.25200 15 0 0 0 0 0 0 0 -0.270 1
6 C 6.88800 95.13100 23.50200 4 10 3 0 0 0 0 0 -0.036 1
7 O 4.60000 94.52600 21.81800 1645872 0 0 0 0 0 0 0 -0.245 1
8 H 8.26600 93.17800 21.03500 1 0 0 0 0 0 0 0 0.063 1
9 C 9.25800 94.94800 23.85500 2 3 11 0 0 0 0 0 -0.037 1
10 H 5.98600 95.70100 23.66700 6 0 0 0 0 0 0 0 0.063 1
11 H 10.19600 95.24800 24.29800 9 0 0 0 0 0 0 0 0.063 1
12 C 5.70900 94.23600 21.42300 13454 7 4 0 0 0 0 0 0.337 1
13 O 5.87600 93.60100 20.21100 14 12 0 0 0 0 0 0 -0.477 1
14 H 5.04400 93.52600 19.73800 13 0 0 0 0 0 0 0 0.295 1
I have this file structure, and I need to set every column after the x, y and z columns to zero and delete the last column. For example, I need output like the following (sample):
1 C 8.17500 93.80900 21.90700 0 0 0 0 0 0 0 0 0
2 C 9.34800 94.14800 22.73500 0 0 0 0 0 0 0 0 0
If the pattern is predictable, a find/replace would work
%s/\v(\d+ \w+\s+([0-9\.]+\s+){3}).*/\10 0 0 0 0 0 0 0 0
Breakdown
%s/ - substitute every line
\v - very magic
(\d+ \w+\s+([0-9\.]+\s+){3}) - capture everything between ()
searches for '1 C 8.17500 93.80900 21.90700 '
.* - remaining characters after our captured group
/\1 - replace, insert the captured group
0 0 0 0 0 0 0 0 0 - add the required zeros
Regexbuddy comment
// (\d+ \w+\s+([0-9\.]+\s+){3}).*
//
// Options: Case insensitive; Exact spacing; Dot matches line breaks; ^$ match at line breaks; Default line breaks; Numbered capture; Allow duplicate names; Greedy quantifiers; Allow zero-length matches
//
// Match the regex below and capture its match into backreference number 1 «(\d+ \w+\s+([0-9\.]+\s+){3})»
// Match a single character that is a “digit” (ASCII 0–9 only) «\d+»
// Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
// Match the character “ ” literally « »
// Match a single character that is a “word character” (ASCII letter, digit, or underscore only) «\w+»
// Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
// Match a single character that is a “whitespace character” (ASCII space, tab, line feed, carriage return, vertical tab, form feed) «\s+»
// Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
// Match the regex below and capture its match into backreference number 2 «([0-9\.]+\s+){3}»
// Exactly 3 times «{3}»
// Match a single character present in the list below «[0-9\.]+»
// Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
// A character in the range between “0” and “9” «0-9»
// The literal character “.” «\.»
// Match a single character that is a “whitespace character” (ASCII space, tab, line feed, carriage return, vertical tab, form feed) «\s+»
// Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
// Match any single character «.*»
// Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
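For anyone who would rather script the change than run it inside Vim, the same capture-and-replace idea can be sketched in Python. This is only an illustration: the function name zero_trailing_columns and the n_zeros parameter are made up here, and the optional minus sign in the coordinate pattern is an added assumption (the original pattern only accepts digits and dots).

```python
import re

# Same idea as the Vim pattern: capture "index element x y z " and drop the rest.
# The optional "-" before each coordinate is an assumption, not in the original.
ATOM_LINE = re.compile(r"^(\d+\s+\w+\s+(?:-?[0-9.]+\s+){3}).*$")


def zero_trailing_columns(line, n_zeros=9):
    """Replace everything after the x, y, z columns with n_zeros zeros."""
    match = ATOM_LINE.match(line)
    if match is None:
        # Header lines ("x y z", cell parameters, ...) are left untouched.
        return line
    return match.group(1) + " ".join(["0"] * n_zeros)


if __name__ == "__main__":
    sample = "1 C 8.17500 93.80900 21.90700 8 4 2 0 0 0 0 0 -0.036 1"
    print(zero_trailing_columns(sample))
```

Lines that do not look like an atom record are returned unchanged, so the header lines survive.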
I have the following data:
client_id <- c(1,2,3,1,2,3)
product_id <- c(10,10,10,20,20,20)
connected <- c(1,1,0,1,0,0)
clientID_productID <- paste0(client_id,";",product_id)
df <- data.frame(client_id, product_id,connected,clientID_productID)
client_id product_id connected clientID_productID
1 1 10 1 1;10
2 2 10 1 2;10
3 3 10 0 3;10
4 1 20 1 1;20
5 2 20 0 2;20
6 3 20 0 3;20
The goal is to produce a relational matrix:
client_id product_id clientID_productID client_pro_1_10 client_pro_2_10 client_pro_3_10 client_pro_1_20 client_pro_2_20 client_pro_3_20
1 1 10 1;10 0 1 0 0 0 0
2 2 10 2;10 1 0 0 0 0 0
3 3 10 3;10 0 0 0 0 0 0
4 1 20 1;20 0 0 0 0 0 0
5 2 20 2;20 0 0 0 0 0 0
6 3 20 3;20 0 0 0 0 0 0
In other words, when product_id equals 10, clients 1 and 2 are connected. Importantly, I do not want client 1 to be connected with herself. When product_id=20, I have only one client, meaning that there is no connection, so I should have only zeros.
To be more specific, all that I am trying to create is a square matrix of relations, with all the combinations of client/product in the columns. A client can only be connected with another if they bought the same product.
I have searched a bunch and played with other code. The difference between this problem and others already answered is that I want to keep on my table client number 3, even though she never bought any product. I want to show that she does not have a relationship with any other client. Right now, I am able to create the matrix by stacking the relationships by product (How to create relational matrix in R?), but I am struggling with a way to not stack them.
I apologize if the question is not specific enough, or too specific. Thank you anyway, stackoverflow is a lifesaver for beginners.
I believe I figured it out.
It is for sure not the most elegant answer, though.
library(dplyr)    # inner_join
library(reshape2) # dcast
client_id <- c(1,2,3,1,2,3)
product_id <- c(10,10,10,20,20,20)
connected <- c(1,1,0,1,0,0)
clientID_productID <- paste0(client_id,";",product_id)
df <- data.frame(client_id, product_id,connected,clientID_productID)
df2 <- inner_join(df[c(1:3)], df[c(1:3)], by = c("product_id", "connected"))
df2$Source <- paste0(df2$client_id.x,"|",df2$product_id)
df2$Target <- paste0(df2$client_id.y,"|",df2$product_id)
df2 <- df2[order(df2$product_id),]
indices = unique(as.character(df2$Source))
mtx <- as.matrix(dcast(df2, Source ~ Target, value.var="connected", fill=0))
rownames(mtx) = mtx[,"Source"]
mtx <- mtx[,-1]
diag(mtx)=0
mtx = as.data.frame(mtx)
mtx = mtx[indices, indices]
I got the result I wanted:
1|10 2|10 3|10 1|20 2|20 3|20
1|10 0 1 0 0 0 0
2|10 1 0 0 0 0 0
3|10 0 0 0 0 0 0
1|20 0 0 0 0 0 0
2|20 0 0 0 0 0 0
3|20 0 0 0 0 0 0
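The rule behind that matrix (two different clients are linked only if both are connected to the same product) can also be written out directly. Below is a minimal Python sketch of the rule, not a translation of the R code above; the function name relation_matrix is hypothetical.

```python
def relation_matrix(client_ids, product_ids, connected):
    """Build a square 0/1 matrix over every client|product combination.

    Cell (i, j) is 1 only when rows i and j are different rows, refer to
    the same product, and both have connected == 1.
    """
    keys = [f"{c}|{p}" for c, p in zip(client_ids, product_ids)]
    n = len(keys)
    matrix = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            same_product = product_ids[i] == product_ids[j]
            both_connected = connected[i] == 1 and connected[j] == 1
            if i != j and same_product and both_connected:
                matrix[i][j] = 1
    return keys, matrix


if __name__ == "__main__":
    keys, matrix = relation_matrix(
        [1, 2, 3, 1, 2, 3], [10, 10, 10, 20, 20, 20], [1, 1, 0, 1, 0, 0]
    )
    for key, row in zip(keys, matrix):
        print(key, row)
```

Rows for clients that never bought a product stay in the matrix as all zeros, which is the behaviour asked for in the question.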
I am trying to find islands of numbers in a matrix.
By an island, I mean a rectangular area where ones are connected with each other horizontally, vertically or diagonally, including the boundary layer of zeros.
Suppose I have this matrix:
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 1
0 0 0 1 1 1 0 1 1 0 0 0 1 1 1 1 0
0 0 0 0 0 0 1 0 1 0 0 0 0 1 1 1 1
0 0 0 1 0 1 0 1 1 0 0 0 1 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 1 0 1 0 1 1 1 0 0 0 0 0 0 0
0 0 1 0 1 1 1 1 1 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
By boundary layer, I mean rows 2 and 7 and columns 3 and 10 for island#1.
This is shown below:
I want the row and column indices of the islands. So for the above matrix, the desired output is:
isl{1}= {[2 3 4 5 6 7]; % row indices of island#1
[3 4 5 6 7 8 9 10]} % column indices of island#1
isl{2}= {[2 3 4 5 6 7]; % row indices of island#2
[12 13 14 15 16 17]}; % column indices of island#2
isl{3} ={[9 10 11 12]; % row indices of island#3
[2 3 4 5 6 7 8 9 10 11];} % column indices of island#3
It doesn't matter which island is detected first.
I know that [r,c] = find(matrix) gives the row and column indices of the ones, but I have no clue how to detect the connected ones, since they can be connected horizontally, vertically or diagonally.
Any ideas on how to deal with this problem?
You should look at the BoundingBox and ConvexHull stats returned by regionprops:
a = imread('circlesBrightDark.png');
bw = a < 100;
s = regionprops('table',bw,'BoundingBox','ConvexHull')
https://www.mathworks.com/help/images/ref/regionprops.html
Finding the connected components and their bounding boxes is the easy part. The more difficult part is merging the bounding boxes into islands.
Bounding Boxes
First the easy part.
function bBoxes = getIslandBoxes(lMap)
% find bounding box of each candidate island
% lMap is a logical matrix containing zero or more connected components
bw = bwlabel(lMap); % label connected components in logical matrix
bBoxes = struct2cell(regionprops(bw, 'BoundingBox')); % get bounding boxes
bBoxes = cellfun(@round, bBoxes, 'UniformOutput', false); % round values
end
The values are rounded because the bounding boxes returned by regionprops lie on the grid lines just outside their respective components rather than on the cell centers, and we need integer values to use as subscripts into the matrix. For example, a component that looks like this:
0 0 0
0 1 0
0 0 0
will have a bounding box of
[ 1.5000 1.5000 1.0000 1.0000 ]
which we round to
[ 2 2 1 1]
Merging
Now the hard part. First, the merge condition:
We merge bounding box b2 into bounding box b1 if b2 and the island of b1 (including the boundary layer) have a non-null intersection.
This condition ensures that bounding boxes are merged when one component is wholly or partially inside the bounding box of another, but it also catches the edge cases when a bounding box is within the zero boundary of another. Once all of the bounding boxes are merged, they are guaranteed to have a boundary of all zeros (or border the edge of the matrix), otherwise the nonzero value in its boundary would have been merged.
Since merging involves deleting the merged bounding box, the loops are done backwards so that we don't end up indexing non-existent array elements.
Unfortunately, making one pass through the array comparing each element to all the others is insufficient to catch all cases. To signal that all of the possible bounding boxes have been merged into islands, we use a flag called anyMerged and loop until we get through one complete iteration without merging anything.
function mBoxes = mergeBoxes(bBoxes)
% find bounding boxes that intersect, and merge them
mBoxes = bBoxes;
% merge bounding boxes that overlap
anyMerged = true; % flag to show when we've finished
while (anyMerged)
anyMerged = false; % no boxes merged on this iteration so far...
for box1 = numel(mBoxes):-1:2
for box2 = box1-1:-1:1
% if intersection between bounding boxes is > 0, merge
% the size of box1 is increased by 1 on all sides...
% this is so that components that lie within the borders
% of another component, but not inside the bounding box,
% are merged
if (rectint(mBoxes{box1} + [-1 -1 2 2], mBoxes{box2}) > 0)
coords1 = rect2corners(mBoxes{box1});
coords2 = rect2corners(mBoxes{box2});
minX = min(coords1(1), coords2(1));
minY = min(coords1(2), coords2(2));
maxX = max(coords1(3), coords2(3));
maxY = max(coords1(4), coords2(4));
mBoxes{box2} = [minX, minY, maxX-minX+1, maxY-minY+1]; % merge
mBoxes(box1) = []; % delete redundant bounding box
anyMerged = true; % bounding boxes merged: loop again
break;
end
end
end
end
end
The merge function uses a small utility function that converts rectangles with the format [x y width height] to a vector of subscripts for the top-left, bottom-right corners [x1 y1 x2 y2]. (This was actually used in another function to check that an island had a zero border, but as discussed above, this check is unnecessary.)
function corners = rect2corners(rect)
% change from rect = x, y, width, height
% to corners = x1, y1, x2, y2
corners = [rect(1), ...
rect(2), ...
rect(1) + rect(3) - 1, ...
rect(2) + rect(4) - 1];
end
Output Formatting and Driver Function
The return value from mergeBoxes is a cell array of rectangle objects. If you find this format useful, you can stop here, but it's easy to get to the format requested with ranges of rows and columns for each island:
function rRanges = rect2range(bBoxes, mSize)
% convert rect = x, y, width, height to
% range = y:y+height-1; x:x+width-1
% and expand range by 1 in all 4 directions to include zero border,
% making sure to stay within borders of original matrix
rangeFun = @(rect) {max(rect(2)-1,1):min(rect(2)+rect(4),mSize(1));...
max(rect(1)-1,1):min(rect(1)+rect(3),mSize(2))};
rRanges = cellfun(rangeFun, bBoxes, 'UniformOutput', false);
end
All that's left is a main function to tie all of the others together and we're done.
function theIslands = getIslandRects(m)
% get rectangle around each component in map
lMap = logical(m);
% get the bounding boxes of candidate islands
bBoxes = getIslandBoxes(lMap);
% merge bounding boxes that overlap
bBoxes = mergeBoxes(bBoxes);
% convert bounding boxes to row/column ranges
theIslands = rect2range(bBoxes, size(lMap));
end
Here's a run using the sample matrix given in the question:
M =
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 1
0 0 0 1 1 1 0 1 1 0 0 0 1 1 1 1 0
0 0 0 0 0 0 1 0 1 0 0 0 0 1 1 1 1
0 0 0 1 0 1 0 1 1 0 0 0 1 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 1 0 1 0 1 1 1 0 0 0 0 0 0 0
0 0 1 0 1 1 1 1 1 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> getIslandRects(M)
ans =
{
[1,1] =
{
[1,1] =
9 10 11 12
[2,1] =
2 3 4 5 6 7 8 9 10 11
}
[1,2] =
{
[1,1] =
2 3 4 5 6 7
[2,1] =
3 4 5 6 7 8 9 10
}
[1,3] =
{
[1,1] =
2 3 4 5 6 7
[2,1] =
12 13 14 15 16 17
}
}
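For readers without the Image Processing Toolbox, the same label, merge and expand pipeline can be sketched in plain Python. This is a rough sketch under stated assumptions, not a port of the functions above: the names label_boxes, touches_with_border, merge_boxes and islands are made up for illustration, boxes are stored as inclusive 0-based corners (r0, c0, r1, c1) instead of [x y width height], and the final indices are converted to 1-based to match the MATLAB output.

```python
def label_boxes(grid):
    """Bounding box (r0, c0, r1, c1) of every 8-connected component of ones."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if not grid[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            stack = [(r, c)]
            r0 = r1 = r
            c0 = c1 = c
            while stack:  # iterative flood fill over the 8 neighbours
                y, x = stack.pop()
                r0, r1 = min(r0, y), max(r1, y)
                c0, c1 = min(c0, x), max(c1, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        inside = 0 <= ny < rows and 0 <= nx < cols
                        if inside and grid[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            stack.append((ny, nx))
            boxes.append((r0, c0, r1, c1))
    return boxes


def touches_with_border(a, b):
    """True if box b intersects box a grown by one cell on every side."""
    return (a[0] - 1 <= b[2] and b[0] <= a[2] + 1 and
            a[1] - 1 <= b[3] and b[1] <= a[3] + 1)


def merge_boxes(boxes):
    """Merge boxes repeatedly until a full pass merges nothing."""
    boxes = list(boxes)
    any_merged = True
    while any_merged:
        any_merged = False
        for i in range(len(boxes)):
            for j in range(i + 1, len(boxes)):
                a, b = boxes[i], boxes[j]
                if touches_with_border(a, b):
                    boxes[i] = (min(a[0], b[0]), min(a[1], b[1]),
                                max(a[2], b[2]), max(a[3], b[3]))
                    del boxes[j]
                    any_merged = True
                    break
            if any_merged:
                break  # restart the scan, since the indices have shifted
    return boxes


def islands(grid):
    """1-based (row indices, column indices) per island, zero border included."""
    rows, cols = len(grid), len(grid[0])
    result = []
    for r0, c0, r1, c1 in merge_boxes(label_boxes(grid)):
        row_idx = list(range(max(r0 - 1, 0) + 1, min(r1 + 1, rows - 1) + 2))
        col_idx = list(range(max(c0 - 1, 0) + 1, min(c1 + 1, cols - 1) + 2))
        result.append((row_idx, col_idx))
    return result
```

Restarting the scan after every merge is simpler than looping backwards and serves the same purpose: no index is used after the list has been shortened.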
Quite easy!
Just use bwboundaries to get the boundaries of each of the blobs. You can then take the min and max in the x and y directions of each boundary to build your box.
Use image dilation and regionprops
mat = [...
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0;
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0;
0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 1;
0 0 0 1 1 1 0 1 1 0 0 0 1 1 1 1 0;
0 0 0 0 0 0 1 0 1 0 0 0 0 1 1 1 1;
0 0 0 1 0 1 0 1 1 0 0 0 1 0 0 0 0;
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0;
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0;
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0;
0 0 0 1 0 1 0 1 1 1 0 0 0 0 0 0 0;
0 0 1 0 1 1 1 1 1 0 0 0 0 0 0 0 0;
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0];
mat=logical(mat);
dil_mat=imdilate(mat,true(2,2)); %here we make bridges to 1 px away ones
l_mat=bwlabel(dil_mat,8);
bb = regionprops(l_mat,'BoundingBox');
bb = struct2cell(bb); bb = cellfun(@(x) fix(x), bb, 'un',0);
isl = cellfun(@(x) {max(1,x(2)):min(x(2)+x(4),size(mat,1)),...
max(1,x(1)):min(x(1)+x(3),size(mat,2))},bb,'un',0);
Let us consider the example:
The pbm file "imFile.pbm" contains the pixels as follows :
P1
# Comment
9 6
0 0 0 0 0 0 0 0 0
0 1 1 0 0 0 0 0 0
0 1 1 0 0 0 0 0 0
0 1 1 0 0 0 0 0 0
0 1 1 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
How can I determine the width and height of the image? I have used the following code, but it failed.
with open("imFile.pbm", 'rb') as f:
    image = f.size
    print image
    f.close()
When I run it on Ubuntu 14.04, it shows an error. Any suggestion will be appreciated. Thank you in advance.
The width and height are right there in the file, on the first line after the header, skipping comments. That's not what .size is for; you need to read and parse the file.
with open("imFile.pbm", 'r') as f:
    lines = f.readlines()
lines.pop(0) # skip header
line = lines.pop(0) # get next line
while line.startswith("#"): # repeat till that line is not a comment
    line = lines.pop(0)
width, height = line.split() # split the first non-comment line
print("%s x %s" % (width, height)) # => 9 x 6
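If the header may not be laid out one token per line, a slightly more defensive variant is to strip comments and read whitespace-separated tokens until the magic number, width and height have all been seen. The sketch below assumes a plain (ASCII) P1 file; the function name pbm_size is made up for illustration.

```python
def pbm_size(text):
    """Return (width, height) from the text of a plain P1 PBM file."""
    tokens = []
    for line in text.splitlines():
        # Everything from '#' to the end of the line is a comment.
        line = line.split("#", 1)[0]
        tokens.extend(line.split())
        if len(tokens) >= 3:
            break
    if len(tokens) < 3 or tokens[0] != "P1":
        raise ValueError("not a plain PBM (P1) header")
    return int(tokens[1]), int(tokens[2])


if __name__ == "__main__":
    with open("imFile.pbm", "r") as f:
        width, height = pbm_size(f.read())
    print("%d x %d" % (width, height))
```

Unlike the version above, this returns integers and raises an error when the magic number is not P1.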
Let Y be a vector of length N, containing numbers from 1 to 10. As example code you can use:
Y = vec(1:10);
I am writing code that must create an N x 10 matrix, each row consisting of all zeros except for a single 1 in the position that corresponds to the number in vector Y. Thus, 1 in Y becomes 1000000000, 3 becomes 0010000000, and so on.
This approach works:
cell2mat(arrayfun(@(x)eye(10)(x,:), Y, 'UniformOutput', false))
My next idea was to "optimize", so eye(10) is not generated N times, and I wrote this:
theEye = eye(10);
cell2mat(arrayfun(@(x)theEye(x,:), Y, 'UniformOutput', false))
However, now Octave is giving me error:
error: can't perform indexing operations for diagonal matrix type
error: evaluating argument list element number 1
Why do I get this error? What is wrong?
Bonus questions — do you see a better way to do what I am doing? Is my attempt to optimize making things easier for Octave?
I ran this code in Octave and eye creates a matrix of a class (or whatever this is) known as a Diagonal Matrix:
octave:3> theEye = eye(10);
octave:4> theEye
theEye =
Diagonal Matrix
1 0 0 0 0 0 0 0 0 0
0 1 0 0 0 0 0 0 0 0
0 0 1 0 0 0 0 0 0 0
0 0 0 1 0 0 0 0 0 0
0 0 0 0 1 0 0 0 0 0
0 0 0 0 0 1 0 0 0 0
0 0 0 0 0 0 1 0 0 0
0 0 0 0 0 0 0 1 0 0
0 0 0 0 0 0 0 0 1 0
0 0 0 0 0 0 0 0 0 1
In fact, the documentation for Octave says that if the matrix is diagonal, a special object is created to handle the diagonal matrices instead of a standard matrix: https://www.gnu.org/software/octave/doc/interpreter/Creating-Diagonal-Matrices.html
What's interesting is that we can slice into this matrix outside of the arrayfun call, regardless of it being in a separate class.
octave:1> theEye = eye(10);
octave:2> theEye(1,:)
ans =
Diagonal Matrix
1 0 0 0 0 0 0 0 0 0
However, as soon as we put this into an arrayfun call, it decides to crap out:
octave:5> arrayfun(@(x)theEye(x,:), 1:3, 'uni', 0)
error: can't perform indexing operations for diagonal matrix type
This to me doesn't make any sense, especially since we can slice into it outside of arrayfun. One may suspect that it has something to do with arrayfun and since you are specifying UniformOutput to be false, a cell array of elements is returned per element in Y and perhaps something is going wrong when storing these slices into each cell array element.
However, this doesn't seem to be the culprit either. I took the first three rows of theEye, placed them into a cell array and merged them together using cell2mat:
octave:6> cell2mat({theEye(1,:); theEye(2,:); theEye(3,:)})
ans =
1 0 0 0 0 0 0 0 0 0
0 1 0 0 0 0 0 0 0 0
0 0 1 0 0 0 0 0 0 0
As such, I suspect that it may be some sort of internal bug (if you could call it that...). Thanks to user carandraug (see comment above), this is indeed a bug and it has been reported: https://savannah.gnu.org/bugs/?47510. What may also provide insight is that this code runs as expected in MATLAB.
In any case, the main takeaway is to avoid cell2mat here. Just index directly:
Y = vec(1:10);
theEye = eye(10);
out = theEye(Y,:);
This indexes into theEye, extracts the rows listed in Y, and creates a matrix where each row is zero except at the position given by the corresponding element of Y.
Also, have a look at this post for a similar example: Replace specific columns in a matrix with a constant column vector
However, it is defined over the columns instead of the rows, but it's very similar to what you want to achieve.
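The row-lookup idea is not specific to Octave. As a language-neutral illustration, here is a minimal Python sketch that builds the identity matrix once and picks one row per label; the function name one_hot_rows and the num_classes parameter are assumptions made for this example.

```python
def one_hot_rows(labels, num_classes=10):
    """Return one row per label, all zeros except a 1 at position label (1-based)."""
    # Build the identity matrix once, then look rows up by label.
    identity = [[1 if r == c else 0 for c in range(num_classes)]
                for r in range(num_classes)]
    rows = []
    for label in labels:
        if not 1 <= label <= num_classes:
            raise ValueError("label out of range: %r" % (label,))
        rows.append(list(identity[label - 1]))  # copy so rows are independent
    return rows


if __name__ == "__main__":
    for row in one_hot_rows([1, 3, 10]):
        print("".join(str(v) for v in row))
```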
Another approach: we start with the data:
>> len = 10; % max number
>> vec = randi(len, [1 7]) % vector of numbers
vec =
1 10 9 5 7 3 6
Now we build the indicator matrix:
>> I = full(sparse(1:numel(vec), vec, 1, numel(vec), len))
I =
1 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 1
0 0 0 0 0 0 0 0 1 0
0 0 0 0 1 0 0 0 0 0
0 0 0 0 0 0 1 0 0 0
0 0 1 0 0 0 0 0 0 0
0 0 0 0 0 1 0 0 0 0
I got this on stdout:
Queues
queue dur autoDel excl msg msgIn msgOut bytes bytesIn bytesOut cons bind
==============================================================================================================================
14531c8d-dd9b-4f41-9d92-c1344774d21c:0.0 Y Y 0 0 0 0 0 0 1 2
qmfagent-425fa29c-0892-4c08-a2d9-e7331a37dc13 Y Y 0 0 0 0 0 0 1 4
So I need to parse this output and get only something like this
14531c8d-dd9b-4f41-9d92-c1344774d21c:0.0
qmfagent-425fa29c-0892-4c08-a2d9-e7331a37dc13
Can anybody tell me how to do this in Ruby? Of course, there could be more lines.
Split lines by newline (\n). Get last two lines.
output = <<EOD
Queues
queue dur autoDel excl msg msgIn msgOut bytes bytesIn bytesOut cons bind
==============================================================================================================================
14531c8d-dd9b-4f41-9d92-c1344774d21c:0.0 Y Y 0 0 0 0 0 0 1 2
qmfagent-425fa29c-0892-4c08-a2d9-e7331a37dc13 Y Y 0 0 0 0 0 0 1 4
EOD
lines = output.strip.split("\n") # Split lines by newline
last_two_lines = lines[-2..-1] # Get the last 2 lines.
p last_two_lines.map {|line| line.split[0]} # Get the first fields.
prints
["14531c8d-dd9b-4f41-9d92-c1344774d21c:0.0", "qmfagent-425fa29c-0892-4c08-a2d9-e7331a37dc13"]
queues = <<EOS
queue dur autoDel excl msg msgIn msgOut bytes bytesIn bytesOut cons bind
==============================================================================================================================
14531c8d-dd9b-4f41-9d92-c1344774d21c:0.0 Y Y 0 0 0 0 0 0 1 2
qmfagent-425fa29c-0892-4c08-a2d9-e7331a37dc13 Y Y 0 0 0 0 0 0 1 4
EOS
queues.lines.each {|line|
puts line.split.first if line =~ /[\da-f]{4}/i # detects 4 consecutive hexadecimal digits
}
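Since the question notes that there could be more lines, another option is to key off the row of = signs rather than the line count or the content of the names. The sketch below shows that idea in Python rather than Ruby, purely as an illustration; the function name queue_names is hypothetical, and it assumes the separator row is always present and made up only of = characters.

```python
def queue_names(stdout_text):
    """Return the first field of every non-empty line after the '=====' row."""
    names = []
    past_separator = False
    for line in stdout_text.splitlines():
        stripped = line.strip()
        if not past_separator:
            # The separator is a non-empty line made up only of '=' characters.
            past_separator = bool(stripped) and set(stripped) == {"="}
            continue
        if stripped:
            names.append(stripped.split()[0])
    return names


if __name__ == "__main__":
    sample = """Queues
queue dur autoDel excl msg msgIn msgOut bytes bytesIn bytesOut cons bind
==========================================================
14531c8d-dd9b-4f41-9d92-c1344774d21c:0.0 Y Y 0 0 0 0 0 0 1 2
qmfagent-425fa29c-0892-4c08-a2d9-e7331a37dc13 Y Y 0 0 0 0 0 0 1 4
"""
    print(queue_names(sample))
```

This keeps working when the tool prints three or thirty queues, and it does not depend on the queue names containing hexadecimal digits.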