I have the following data:
client_id <- c(1,2,3,1,2,3)
product_id <- c(10,10,10,20,20,20)
connected <- c(1,1,0,1,0,0)
clientID_productID <- paste0(client_id,";",product_id)
df <- data.frame(client_id, product_id,connected,clientID_productID)
client_id product_id connected clientID_productID
1 1 10 1 1;10
2 2 10 1 2;10
3 3 10 0 3;10
4 1 20 1 1;20
5 2 20 0 2;20
6 3 20 0 3;20
The goal is to produce a relational matrix:
client_id product_id clientID_productID client_pro_1_10 client_pro_2_10 client_pro_3_10 client_pro_1_20 client_pro_2_20 client_pro_3_20
1 1 10 1;10 0 1 0 0 0 0
2 2 10 2;10 1 0 0 0 0 0
3 3 10 3;10 0 0 0 0 0 0
4 1 20 1;20 0 0 0 0 0 0
5 2 20 2;20 0 0 0 0 0 0
6 3 20 3;20 0 0 0 0 0 0
In other words, when product_id equals 10, clients 1 and 2 are connected. Importantly, I do not want client 1 to be connected with herself. When product_id equals 20, only one client is connected, so there is no connection and I should have only zeros.
To be more specific, all that I am trying to create is a square matrix of relations, with all the combinations of client/product in the columns. A client can only be connected with another if they bought the same product.
I have searched a lot and played with other code. The difference between this problem and others already answered is that I want to keep client 3 in my table, even though she never bought any product. I want to show that she does not have a relationship with any other client. Right now, I am able to create the matrix by stacking the relationships by product (How to create relational matrix in R?), but I am struggling to find a way to not stack them.
I apologize if the question is not specific enough, or too specific. Thank you anyway; Stack Overflow is a lifesaver for beginners.
I believe I figured it out.
It is for sure not the most elegant answer, though.
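library(dplyr)     # inner_join
library(reshape2)  # dcast
client_id <- c(1,2,3,1,2,3)
product_id <- c(10,10,10,20,20,20)
connected <- c(1,1,0,1,0,0)
clientID_productID <- paste0(client_id,";",product_id)
df <- data.frame(client_id, product_id,connected,clientID_productID)
# pair every row with every row that has the same product and the same connected flag
df2 <- inner_join(df[c(1:3)], df[c(1:3)], by = c("product_id", "connected"))
df2$Source <- paste0(df2$client_id.x,"|",df2$product_id)
df2$Target <- paste0(df2$client_id.y,"|",df2$product_id)
df2 <- df2[order(df2$product_id),]
# remember the client|product order so the final matrix is not sorted alphabetically
indices = unique(as.character(df2$Source))
# spread to a Source x Target table; pairs that never joined become 0
wide <- dcast(df2, Source ~ Target, value.var="connected", fill=0)
mtx <- as.matrix(wide[,-1])   # drop the character Source column first so mtx stays numeric
rownames(mtx) <- wide$Source
diag(mtx) = 0                 # no client is connected with herself
mtx = as.data.frame(mtx)
mtx = mtx[indices, indices]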
I got the result I wanted:
1|10 2|10 3|10 1|20 2|20 3|20
1|10 0 1 0 0 0 0
2|10 1 0 0 0 0 0
3|10 0 0 0 0 0 0
1|20 0 0 0 0 0 0
2|20 0 0 0 0 0 0
3|20 0 0 0 0 0 0
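For comparison, here is a minimal sketch of the same rule in plain Python rather than R, assuming the data are given as (client_id, product_id, connected) tuples in the row order of df; the function name relational_matrix is hypothetical. Two rows are linked when they share a product, both have connected == 1, and they belong to different clients, so client 3 keeps an all-zero row instead of disappearing.

```python
from itertools import product


def relational_matrix(rows):
    """rows: list of (client_id, product_id, connected) tuples.

    Returns (labels, matrix): labels are "client|product" strings in input
    order, and matrix[i][j] is 1 when rows i and j are different clients that
    are both connected to the same product.
    """
    labels = [f"{c}|{p}" for c, p, _ in rows]
    n = len(rows)
    matrix = [[0] * n for _ in range(n)]
    for i, j in product(range(n), repeat=2):
        ci, pi, ki = rows[i]
        cj, pj, kj = rows[j]
        # same product, both connected, different clients (keeps the diagonal at 0)
        if pi == pj and ki == 1 and kj == 1 and ci != cj:
            matrix[i][j] = 1
    return labels, matrix


rows = [(1, 10, 1), (2, 10, 1), (3, 10, 0), (1, 20, 1), (2, 20, 0), (3, 20, 0)]
labels, mat = relational_matrix(rows)
for label, row in zip(labels, mat):
    print(label, *row)
```

Every client/product combination gets a row and a column, whether or not it has any connection, which is the "don't drop client 3" requirement.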
Assume A is the following 4 x 4 matrix:
A = 1 0 1 0
    1 0 1 0
    1 1 1 0
    1 1 0 0
and B is a 4 x 4 reference matrix:
B = 1 0 1 0
    1 0 1 0
    1 0 1 0
    1 1 1 0
Now, if A is compared with the reference matrix B, almost all entries are equal except A(4,3) and A(3,2). However, since B is the reference, only differences at positions where B is 1 matter. In this particular example only A(4,3) matters, not A(3,2). That is:
>> C = B ~= A
C =
0 0 0 0
0 0 0 0
0 1 0 0
0 0 1 0
A(4,3) ~= B(4,3)
Finally, we are looking for a piece of code that shows what percentage of the ones in A are equal to the corresponding entries in B. In this case the result is:
(8 / 9) * 100 = 88.89 % matched.
Please bear in mind that speed is also important here, so quicker solutions are preferred. Thanks.
To get only the differing entries where B has a 1, just AND the comparison with B, so you only get those entries. To get the percentage, count the positions where both A and B are 1, then divide by the number of ones in B (or the number of ones in A; see the note below).
A = [1 0 1 0;
1 0 1 0;
1 1 1 0;
1 1 0 0];
B = [1 0 1 0;
1 0 1 0;
1 0 1 0;
1 1 1 0];
C = (B ~= A) & B
p = sum(B(:) & A(:)) / sum(B(:)) * 100
This is the result:
C =
0 0 0 0
0 0 0 0
0 0 0 0
0 0 1 0
p =
88.8889
Edit / Note: In the OP's question it's not 100% clear whether the percentage should be relative to the number of ones in A or in B. I assumed it is relative to the reference matrix, B, so I divide by sum(B(:)). If you need it relative to the ones in A, just change the last line to:
p = sum(B(:) & A(:)) / sum(A(:)) * 100
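A minimal pure-Python sketch of the same two computations, assuming the matrices are lists of lists of 0/1 values (the helper names masked_mismatch and percent_matched are hypothetical): the mask mirrors (B ~= A) & B, and the percentage counts positions where both matrices are 1, divided by the ones in B or, optionally, in A.

```python
def masked_mismatch(a, b):
    """1 where the reference b has a 1 but a does not, like (B ~= A) & B."""
    return [[int(y == 1 and x != y) for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def percent_matched(a, b, wrt="B"):
    """Percentage of positions where both a and b are 1, relative to the ones in b (or a)."""
    both = sum(x & y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    ref = b if wrt == "B" else a
    return 100 * both / sum(map(sum, ref))


A = [[1, 0, 1, 0], [1, 0, 1, 0], [1, 1, 1, 0], [1, 1, 0, 0]]
B = [[1, 0, 1, 0], [1, 0, 1, 0], [1, 0, 1, 0], [1, 1, 1, 0]]
print(masked_mismatch(A, B))            # only row 4, column 3 (1-based) is flagged
print(round(percent_matched(A, B), 4))  # 88.8889
```

In this example A and B both contain nine ones, so the choice of denominator does not change the result.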
If I got it right, what you want to know is where B == 1 and A == 0.
Try this:
>> C = B & ~A
C =
0 0 0 0
0 0 0 0
0 0 0 0
0 0 1 0
To get the percentage, you could try this:
>> 100 * sum(A(:) & B(:)) / sum(A(:))
ans =
88.8889
You can use matrix multiplication, which should be quite efficient, as shown next.
To get the percentage value with respect to A -
percentage_wrtA = A(:).'*B(:)/sum(A(:)) * 100;
To get the percentage value with respect to B -
percentage_wrtB = A(:).'*B(:)/sum(B(:)) * 100;
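The dot-product trick relies on the entries being 0 or 1: x*y is 1 exactly when both x and y are 1, so A(:).'*B(:) counts the same positions as sum(A(:) & B(:)). A small pure-Python check of that equivalence on randomly generated 0/1 data (a correctness check, not a timing comparison; the helper names are hypothetical):

```python
import random


def dot_flat(a, b):
    """Dot product of the flattened matrices, like A(:).'*B(:) in MATLAB."""
    return sum(x * y for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def and_count(a, b):
    """Number of positions where both entries are nonzero, like sum(A(:) & B(:))."""
    return sum(1 for ra, rb in zip(a, b) for x, y in zip(ra, rb) if x and y)


random.seed(0)
A = [[random.randint(0, 1) for _ in range(50)] for _ in range(50)]
B = [[random.randint(0, 1) for _ in range(50)] for _ in range(50)]
# for 0/1 entries, x*y == 1 exactly when x and y are both 1, so the counts agree
assert dot_flat(A, B) == and_count(A, B)
```

With entries other than 0 and 1 the two expressions diverge, so the multiplication shortcut is only valid for binary matrices.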
Runtime tests
Here are some quick runtime tests comparing matrix multiplication against ANDing the elements with (:) and summing -
>> M = 6000; %// Datasize
>> A = randi([0,1],M,M);
>> B = randi([0,1],M,M);
>> tic,sum(B(:) & A(:));toc
Elapsed time is 0.500149 seconds.
>> tic,A(:).'*B(:);toc
Elapsed time is 0.126881 seconds.
Try (note that this gives a fraction rather than a percentage):
sum(sum(A & B))./sum(sum(A))
Output:
ans =
0.8889
In Octave I'm trying to unpack a vector in the format:
y = [ 1
2
4
1
3 ]
I want to return a matrix of size rows(y) x max(y) where, for each row, there is a 1 in the column given by the original value and a zero everywhere else, i.e. for the example above
y01 = [ 1 0 0 0
0 1 0 0
0 0 0 1
1 0 0 0
0 0 1 0 ]
so far I have
m = rows(y); num_labels = max(y);
y01 = zeros( m, num_labels );
for i = 1:m
for j = 1:num_labels
y01(i,j) = (y(i) == j);
end
end
which works, but it is going to get slow for bigger matrices, and it seems inefficient because it cycles through every single value even though most of them don't change.
I found this for R on another thread:
f3 <- function(vec) {
U <- sort(unique(vec))
M <- matrix(0, nrow = length(vec),
ncol = length(U),
dimnames = list(NULL, U))
M[cbind(seq_len(length(vec)), match(vec, U))] <- 1L
M
}
but I don't know R and I'm not sure if/how the solution ports to Octave.
Thanks for any suggestions!
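The R f3 idea translates directly to other languages: take the sorted distinct values as column labels, then put a 1 at (row, position of that row's value). A minimal Python sketch of that idea (the name one_hot_unique is hypothetical); note that, unlike the rows(y) x max(y) layout requested, it creates one column per distinct value, so the two differ when some label between 1 and max(y) never occurs:

```python
def one_hot_unique(vec):
    """One column per distinct value (sorted), like the R f3 function."""
    levels = sorted(set(vec))                      # sort(unique(vec))
    col_of = {v: k for k, v in enumerate(levels)}  # match(vec, U)
    out = [[0] * len(levels) for _ in vec]
    for row, v in enumerate(vec):
        out[row][col_of[v]] = 1
    return levels, out


levels, y01 = one_hot_unique([1, 2, 4, 1, 3])
for r in y01:
    print(*r)
```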
Use a sparse matrix (which also saves a lot of memory) which can be used in further calculations as usual:
y = [1; 2; 4; 1; 3]
y01 = sparse (1:rows (y), y, 1)
If you really want a full matrix, use full:
full (y01)
ans =
1 0 0 0
0 1 0 0
0 0 0 1
1 0 0 0
0 0 1 0
Sparse is a more efficient way to do this when the matrix is big.
If your dimension of the result is not very high, you can try this:
y = [1; 2; 4; 1; 3]
I = eye(max(y));
y01 = I(y,:)
The result is the same as full(sparse(...)).
y01 =
1 0 0 0
0 1 0 0
0 0 0 1
1 0 0 0
0 0 1 0
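Selecting rows of an identity matrix works because row k of eye(max(y)) is exactly the one-hot vector for label k. A minimal Python sketch of I(y,:), assuming 1-based positive integer labels as in the question (one_hot_eye is a hypothetical name):

```python
def one_hot_eye(y):
    """Rows of the max(y) x max(y) identity matrix picked by label, like I(y,:)."""
    k = max(y)
    eye = [[int(i == j) for j in range(k)] for i in range(k)]
    # labels are 1-based, Python lists are 0-based; copy each row so they are independent
    return [list(eye[label - 1]) for label in y]


y01 = one_hot_eye([1, 2, 4, 1, 3])
for r in y01:
    print(*r)
```

Because the identity matrix is max(y) x max(y), this always yields max(y) columns, matching the requested shape even when some labels are missing; the cost is building that square matrix, which is why the sparse approach scales better.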
% Vector y to Matrix Y
m = rows(y); num_labels = max(y);
Y = zeros(m, num_labels);
% Loop through each row
for i = 1:m
% Use the value of y as an index; set the value matching index to 1
Y(i,y(i)) = 1;
end
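Another possibility, relying on Octave's automatic broadcasting, is:
y = [1; 2; 4; 1; 3]
classes = unique(y)(:)
num_labels = length(classes)
y01=[1:num_labels] == y   % 1 x num_labels row vs rows(y) x 1 column; assumes the labels are exactly 1..num_labels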
With the following detailed printout:
y =
1
2
4
1
3
classes =
1
2
3
4
num_labels = 4
y01 =
1 0 0 0
0 1 0 0
0 0 0 1
1 0 0 0
0 0 1 0
I get the following stdout:
Queues
queue dur autoDel excl msg msgIn msgOut bytes bytesIn bytesOut cons bind
==============================================================================================================================
14531c8d-dd9b-4f41-9d92-c1344774d21c:0.0 Y Y 0 0 0 0 0 0 1 2
qmfagent-425fa29c-0892-4c08-a2d9-e7331a37dc13 Y Y 0 0 0 0 0 0 1 4
I need to parse this output and get only the queue names, like this:
14531c8d-dd9b-4f41-9d92-c1344774d21c:0.0
qmfagent-425fa29c-0892-4c08-a2d9-e7331a37dc13
Can anybody tell me how to do this in Ruby? Of course, there could be more lines.
Split the output into lines on newline (\n), take the last two lines, and keep the first field of each.
output = <<EOD
Queues
queue dur autoDel excl msg msgIn msgOut bytes bytesIn bytesOut cons bind
==============================================================================================================================
14531c8d-dd9b-4f41-9d92-c1344774d21c:0.0 Y Y 0 0 0 0 0 0 1 2
qmfagent-425fa29c-0892-4c08-a2d9-e7331a37dc13 Y Y 0 0 0 0 0 0 1 4
EOD
lines = output.strip.split("\n") # Split lines by newline
last_two_lines = lines[-2..-1] # Get the last 2 lines.
p last_two_lines.map {|line| line.split[0]} # Get the first fields.
prints
["14531c8d-dd9b-4f41-9d92-c1344774d21c:0.0", "qmfagent-425fa29c-0892-4c08-a2d9-e7331a37dc13"]
queues = <<EOS
queue dur autoDel excl msg msgIn msgOut bytes bytesIn bytesOut cons bind
==============================================================================================================================
14531c8d-dd9b-4f41-9d92-c1344774d21c:0.0 Y Y 0 0 0 0 0 0 1 2
qmfagent-425fa29c-0892-4c08-a2d9-e7331a37dc13 Y Y 0 0 0 0 0 0 1 4
EOS
queues.lines.each {|line|
puts line.split.first if line =~ /[\da-f]{4}/i # detects 4 consecutive hexadecimal digits
}
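Since the question notes there could be more lines, taking only the last two lines may not be enough. A minimal Python sketch of a more general approach, assuming the header always ends with a separator line made only of '=' characters (queue_names is a hypothetical name): keep the first field of every non-empty line after the separator.

```python
def queue_names(output):
    """Return the first field of every non-empty line after the '====' separator."""
    names = []
    seen_separator = False
    for line in output.splitlines():
        stripped = line.strip()
        if not seen_separator:
            # the header block ends at the first line made only of '=' characters
            seen_separator = bool(stripped) and set(stripped) == {"="}
            continue
        if stripped:
            names.append(stripped.split()[0])
    return names


sample = """Queues
queue dur autoDel excl msg msgIn msgOut bytes bytesIn bytesOut cons bind
==========================================
14531c8d-dd9b-4f41-9d92-c1344774d21c:0.0 Y Y 0 0 0 0 0 0 1 2
qmfagent-425fa29c-0892-4c08-a2d9-e7331a37dc13 Y Y 0 0 0 0 0 0 1 4
"""
print(queue_names(sample))
```

Keying on the separator rather than on hexadecimal content means queue names without four hex digits in a row are still picked up.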
Given a matrix of size n x m filled with 0's and 1's
e.g.:
1 1 0 1 0
0 0 0 0 0
0 1 0 0 0
1 0 1 1 0
if the matrix has 1 at (i,j), fill the column j and row i with 1's
i.e., we get:
1 1 1 1 1
1 1 1 1 0
1 1 1 1 1
1 1 1 1 1
Required complexity: O(n*m) time and O(1) space
NOTE: you are not allowed to store anything except '0' or '1' in the matrix entries
Above is a Microsoft Interview Question.
I have been thinking about this for two hours now. I have some clues but can't get any further.
OK. The first important point about this question is that even a straightforward brute-force approach can't easily solve it.
If I just use two loops to iterate through every cell in the matrix and change the corresponding row and column, it doesn't work, because the result must be based on the original matrix.
For example, if I see a[0][0] == 1, I can't change row 0 and column 0 all to 1, because that will affect row 1: row 1 has no 1 originally, but after the change a[1][0] would be 1.
The second thing I noticed is that if row r contains only 0s and column c contains only 0s, then a[r][c] must be 0; every position that doesn't fit this pattern must be 1.
Then another question comes up: if I find such a row and column, how can I mark the corresponding cell a[r][c] as special when it is already 0?
My intuition is that I should use some kind of bit operations on this. Or, to meet the required complexity, maybe after I take care of a[i][j] I should proceed to a[i+1][j+1] instead of scanning row by row or column by column.
Even with brute force, ignoring time complexity, I can't solve it under the other conditions.
Does anyone have a clue?
Solution: Java version
@japreiss has answered this question, and his/her answer is smart and correct. That code is in Python; here is the Java version. All credit goes to @japreiss.
public class MatrixTransformer {
private int[][] a;
private int m;
private int n;
public MatrixTransformer(int[][] _a, int _m, int _n) {
a = _a;
m = _m;
n = _n;
}
// returns 1 if row i contains at least one 1, else 0
private int scanRow(int i) {
int hasOne = 0;
for(int k = 0;k < n;k++)
if (a[i][k] == 1) {
hasOne = 1;
break;
}
return hasOne;
}
// returns 1 if column j contains at least one 1, else 0
private int scanColumn(int j) {
int hasOne = 0;
for(int k = 0;k < m;k++)
if (a[k][j] == 1) {
hasOne = 1;
break;
}
return hasOne;
}
private void setRowToAllOnes(int i) {
for(int k = 0; k < n;k++)
a[i][k] = 1;
}
private void setColToAllOnes(int j) {
for(int k = 0; k < m;k++)
a[k][j] = 1;
}
// # we're going to use the first row and column
// # of the matrix to store row and column scan values,
// # but we need aux storage to deal with the overlap
// firstRow = scanRow(0)
// firstCol = scanCol(0)
//
// # scan each column and store result in 1st row - O(mn) work
public void transform() {
int firstRow = scanRow(0);
int firstCol = scanColumn(0);
for(int k = 0;k < n;k++) {
a[0][k] = scanColumn(k);
}
// now row 0 tells us whether each column is all zeroes or not
// it's also the correct output unless row 0 contained a 1 originally
for(int k = 0;k < m;k++) {
a[k][0] = scanRow(k);
}
a[0][0] = firstCol | firstRow;
for (int i = 1;i < m;i++)
for(int j = 1;j < n;j++)
a[i][j] = a[0][j] | a[i][0];
if (firstRow == 1) {
setRowToAllOnes(0);
}
if (firstCol == 1)
setColToAllOnes(0);
}
@Override
public String toString() {
StringBuilder sb = new StringBuilder();
for (int i = 0; i< m;i++) {
for(int j = 0;j < n;j++) {
sb.append(a[i][j] + ", ");
}
sb.append("\n");
}
return sb.toString();
}
/**
* @param args
*/
public static void main(String[] args) {
int[][] a = {{1, 1, 0, 1, 0}, {0, 0, 0, 0, 0},{0, 1, 0, 0, 0},{1, 0, 1, 1, 0}};
MatrixTransformer mt = new MatrixTransformer(a, 4, 5);
mt.transform();
System.out.println(mt);
}
}
Here is a solution in Python that uses 2 extra bools of storage. I think it is clearer than anything I could write in English.
def set_ones(matrix):
    m, n = len(matrix), len(matrix[0])
    def scanRow(i):
        # return 0 if row i is all zeroes, else 1
        return 1 if any(matrix[i][k] for k in range(n)) else 0
    def scanColumn(j):
        # return 0 if col j is all zeroes, else 1
        return 1 if any(matrix[k][j] for k in range(m)) else 0
    # we're going to use the first row and column
    # of the matrix to store row and column scan values,
    # but we need aux storage to deal with the overlap
    firstRow = scanRow(0)
    firstCol = scanColumn(0)
    # scan each column and store result in 1st row - O(mn) work
    for col in range(1, n):
        matrix[0][col] = scanColumn(col)
    # now row 0 tells us whether each column is all zeroes or not
    # it's also the correct output unless row 0 contained a 1 originally
    # do the same for rows into column 0 - O(mn) work
    for row in range(1, m):
        matrix[row][0] = scanRow(row)
    matrix[0][0] = firstRow or firstCol
    # now deal with the rest of the values - O(mn) work
    for row in range(1, m):
        for col in range(1, n):
            matrix[row][col] = matrix[0][col] or matrix[row][0]
    # 3 O(mn) passes!
    # go back and fix row 0 and column 0
    if firstRow:
        for col in range(n):
            matrix[0][col] = 1  # set row 0 to all ones
    if firstCol:
        for row in range(m):
            matrix[row][0] = 1  # set col 0 to all ones
    return matrix
Here's another intuition that gives a clean and simple algorithm for solving the problem.
An initial algorithm using O(n) space.
For now, let's ignore the O(1) memory constraint. Suppose that you can use O(n) memory (if the matrix is m × n). That would make this problem a lot easier and we could use the following strategy:
Create a boolean array with one entry per column.
For each column, determine whether there are any 1's in the column and store that information in the appropriate array entry.
For each row, set that row to be all 1's if there are any 1's in the row.
For each column, set that column to be all 1's if the corresponding array entry is set.
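A minimal Python sketch of these four steps, assuming the matrix is a list of lists of 0/1 values modified in place (fill_with_column_flags is a hypothetical name). The column flags have to be recorded before any row is filled, otherwise the filled rows would mark every column:

```python
def fill_with_column_flags(matrix):
    """O(n)-space version: one flag per column, then fill rows, then columns."""
    m, n = len(matrix), len(matrix[0])
    # step 1: record which columns contain a 1 (the auxiliary array)
    col_has_one = [any(matrix[i][j] for i in range(m)) for j in range(n)]
    # step 2: fill every row that contains a 1
    for row in matrix:
        if any(row):
            row[:] = [1] * n
    # step 3: fill every column whose flag is set
    for j in range(n):
        if col_has_one[j]:
            for i in range(m):
                matrix[i][j] = 1
    return matrix


grid = [[1, 1, 0, 1, 0], [0, 0, 0, 0, 0], [0, 1, 0, 0, 0], [1, 0, 1, 1, 0]]
for r in fill_with_column_flags(grid):
    print(*r)
```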
As an example, consider this array:
1 1 0 1 0
0 0 0 0 0
0 1 0 0 0
1 0 1 1 0
We'd start off by creating and populating the auxiliary array, which can be done in time O(mn) by visiting each column one at a time. This is shown here:
1 1 0 1 0
0 0 0 0 0
0 1 0 0 0
1 0 1 1 0
1 1 1 1 0 <--- aux array
Next, we iterate across the rows and fill each one in if it contains any 1's. This gives this result:
1 1 1 1 1
0 0 0 0 0
1 1 1 1 1
1 1 1 1 1
1 1 1 1 0 <--- aux array
Finally, we fill in each column with 1's if the auxiliary array has a 1 in that position. This is shown here:
1 1 1 1 1
1 1 1 1 0
1 1 1 1 1
1 1 1 1 1
1 1 1 1 0 <--- aux array
So there's one problem: this uses O(n) space, which we don't have! So why even go down this route?
A revised algorithm using O(1) space.
It turns out that we can use a very cute trick to run this algorithm using O(1) space. We need a key observation: if every row contains at least one 1, then the entire matrix becomes 1's. We therefore start off by seeing if this is the case. If it is, great! We're done.
Otherwise, there must be some row in the matrix that is all 0's. Since this row is all 0's, we know that in the "fill each row containing a 1 with 1's" step, the row won't be filled in. Therefore, we can use that row as our auxiliary array!
Let's see this in action. Start off with this:
1 1 0 1 0
0 0 0 0 0
0 1 0 0 0
1 0 1 1 0
Now, we can find a row with all 0's in it and use it as our auxiliary array:
1 1 0 1 0
0 0 0 0 0 <-- Aux array
0 1 0 0 0
1 0 1 1 0
We now fill in the auxiliary array by looking at each column and marking which ones contain at least one 1:
1 1 0 1 0
1 1 1 1 0 <-- Aux array
0 1 0 0 0
1 0 1 1 0
It's perfectly safe to fill in the 1's here because we know that they're going to get filled in anyway. Now, for each row that contains a 1, except for the auxiliary array row, we fill in those rows with 1's:
1 1 1 1 1
1 1 1 1 0 <-- Aux array
1 1 1 1 1
1 1 1 1 1
We skip the auxiliary array because initially it was all 0's, so it wouldn't normally be filled. Finally, we fill in each column with a 1 in the auxiliary array with 1's, giving this final result:
1 1 1 1 1
1 1 1 1 0 <-- Aux array
1 1 1 1 1
1 1 1 1 1
Let's do another example. Consider this setup:
1 0 0 0
0 0 1 0
0 0 0 0
0 0 1 0
We begin by finding a row that's all zeros, as shown here:
1 0 0 0
0 0 1 0
0 0 0 0 <-- Aux array
0 0 1 0
Next, let's populate that row by marking columns containing a 1:
1 0 0 0
0 0 1 0
1 0 1 0 <-- Aux array
0 0 1 0
Now, fill in all rows containing a 1:
1 1 1 1
1 1 1 1
1 0 1 0 <-- Aux array
1 1 1 1
Next, fill in all columns containing a 1 in the aux array with 1's. This is already done here, and we have our result!
As another example, consider this array:
1 0 0
0 0 1
0 1 0
Every row here contains at least one 1, so we just fill the matrix with ones and are done.
Finally, let's try this example:
0 0 0 0 0
0 0 0 0 0
0 1 0 0 0
0 0 0 0 0
0 0 0 1 0
We have lots of choices for aux arrays, so let's pick the first row:
0 0 0 0 0 <-- aux array
0 0 0 0 0
0 1 0 0 0
0 0 0 0 0
0 0 0 1 0
Now, we fill in the aux array:
0 1 0 1 0 <-- aux array
0 0 0 0 0
0 1 0 0 0
0 0 0 0 0
0 0 0 1 0
Now, we fill in the rows:
0 1 0 1 0 <-- aux array
0 0 0 0 0
1 1 1 1 1
0 0 0 0 0
1 1 1 1 1
Now, we fill in the columns based on the aux array:
0 1 0 1 0 <-- aux array
0 1 0 1 0
1 1 1 1 1
0 1 0 1 0
1 1 1 1 1
And we're done! The whole thing runs in O(mn) time because we
Do O(mn) work to find the aux array, and possibly O(mn) work immediately if one doesn't exist.
Do O(mn) work to fill in the aux array.
Do O(mn) work to fill in rows containing 1s.
Do O(mn) work to fill in columns containing 1s.
Plus, it only uses O(1) space, since we just need to store the index of the aux array and enough variables to do loops over the matrix.
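A minimal Python sketch of the O(1)-space version, under the same list-of-lists assumption (fill_with_aux_row is a hypothetical name; this is not the author's Java implementation). Apart from loop variables, it only stores the index of the auxiliary row:

```python
def fill_with_aux_row(matrix):
    """Fill rows/columns containing a 1, using an all-zero row as scratch space."""
    m, n = len(matrix), len(matrix[0])
    # look for a row that is all 0's; it will hold the column flags
    aux = next((i for i in range(m) if not any(matrix[i])), None)
    if aux is None:
        # every row contains a 1, so the entire matrix becomes 1's
        for i in range(m):
            for j in range(n):
                matrix[i][j] = 1
        return matrix
    # mark in the aux row each column that contains at least one 1
    for j in range(n):
        matrix[aux][j] = 1 if any(matrix[i][j] for i in range(m) if i != aux) else 0
    # fill every row that contains a 1, skipping the aux row itself
    for i in range(m):
        if i != aux and any(matrix[i]):
            for j in range(n):
                matrix[i][j] = 1
    # fill every column flagged in the aux row
    for j in range(n):
        if matrix[aux][j]:
            for i in range(m):
                matrix[i][j] = 1
    return matrix


grid = [[1, 0, 0, 0], [0, 0, 1, 0], [0, 0, 0, 0], [0, 0, 1, 0]]
for r in fill_with_aux_row(grid):
    print(*r)
```

Writing flags into the aux row is safe for the same reason given above: those cells would end up as 1 anyway if their column contains a 1.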
EDIT: I have a Java implementation of this algorithm with comments describing it in detail available on my personal site. Enjoy!
Hope this helps!
Assuming the matrix is 0-based, i.e. the first element is at mat[0][0]:
1. Use the first row and first column as table headers that hold the column and row info respectively.
1.1 Before changing anything, record in two flags whether the first row contains a 1 and whether the first column contains a 1. The table header requires special handling at the end (described later), and these flags are what that step needs.
2. Now scan the inner matrix from mat[1][1] up to the last element.
2.1 If the element at mat[row][col] == 1, update the table header data as follows:
Row: mat[row][0] = 1;
Column: mat[0][col] = 1;
At this point we have complete info on which columns and rows should be set to 1.
3. Scan the inner matrix again starting from mat[1][1] and set each element to 1 if either its row or its column contains a 1 in the table header:
if ( (mat[row][0] == 1) || (mat[0][col] == 1) ) then set mat[row][col] to 1.
At this point we have processed all the cells in the inner matrix, but we have yet to process the table header itself.
4. Process the table header:
If the first row originally contained a 1, set all the elements in the first row to 1.
If the first column originally contained a 1, set all the elements in the first column to 1.
Done
Time complexity: the two passes over the inner matrix plus the header work come to about 2*((n-1)(m-1) + (n+m-1)) = 2*n*m steps, i.e. O(n*m)
Space: O(1)
See my implementation at http://codepad.org/fycIyflw
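Expanding the count of 2*((n-1)(m-1) + (n+m-1)) steps term by term confirms that it simplifies to 2nm:

```latex
2\bigl[(n-1)(m-1) + (n+m-1)\bigr]
  = 2\bigl[(nm - n - m + 1) + (n + m - 1)\bigr]
  = 2nm \in O(nm)
```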
Another solution would be to scan the matrix as usual and, at the first 1, split the matrix into 4 quadrants. You then set that row and column to 1's and recursively process each quadrant. Just make sure to set the whole rows and columns, even though you are scanning only a quadrant.
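public void setOnes(int [][] matrix){
// note: the row/col flag arrays use O(m + n) extra space, not O(1)
boolean [] row = new boolean [matrix.length];
boolean [] col = new boolean [matrix[0].length];
// first pass: remember which rows and columns contain a 1
for (int i=0;i<matrix.length;i++){
for(int j=0;j<matrix[0].length;j++){
if (matrix[i][j] == 1){
row[i] = true;
col[j] = true;
}
}
}
// second pass: set a cell to 1 if its row or its column was marked
for (int i=0;i<matrix.length;i++){
for(int j=0;j<matrix[0].length;j++){
if (row[i] || col[j]){
matrix[i][j] = 1;
}
}
}
}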