Subtracting row 2 from row 1 repeatedly - rstudio

In R, I want to add a column to my data set that holds the difference within each consecutive pair of rows: row 2 minus row 1, row 4 minus row 3, and so forth. The result should be repeated on both rows of the pair (e.g. if row 2 - row 1 is -0.294803, that value should appear in both row 1 and row 2, and likewise for every other pair).
Here is my data set.
I tried the function aggregate but did not succeed.
Any hints?

Another possible solution can be:
x <- read.table("mydata.csv", header=TRUE, sep=";")
x$diff <- rep(x$log[seq(2,nrow(x),by=2)] - x$log[seq(1,nrow(x),by=2)], each=2)
The function seq() generates the two sequences of row positions:
1, 3, 5, ..., 9
2, 4, 6, ..., 10
The code then subtracts the values in rows 1, 3, ..., 9 from the values in rows 2, 4, ..., 10. rep() with each=2 repeats each result twice, and the resulting vector is assigned to the new column diff. This assumes an even number of rows.
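To check the logic outside R, the same seq()/rep() idea can be sketched in plain Python. This is only an illustration, not part of the R answer: the function name paired_differences is made up, and the log values are copied from the data frame printed elsewhere on this page.

```python
def paired_differences(values):
    """For each pair of consecutive rows, return (second - first),
    repeated once for each row of the pair."""
    if len(values) % 2 != 0:
        raise ValueError("expected an even number of rows")
    first_rows = values[0::2]    # rows 1, 3, 5, ... (1-based), like seq(1, n, by=2)
    second_rows = values[1::2]   # rows 2, 4, 6, ..., like seq(2, n, by=2)
    diffs = [b - a for a, b in zip(first_rows, second_rows)]
    # Equivalent of rep(diffs, each=2): every difference appears twice in a row.
    return [d for d in diffs for _ in range(2)]


# Assumed sample data: the log column as printed on this page.
log = [5.80, 6.10, 6.09, 6.05, 6.42, 6.30, 6.20, 6.00, 5.90, 6.09]
diff = [round(d, 2) for d in paired_differences(log)]
print(diff)
```

The rounding to two decimals is only there to hide floating-point noise when printing.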

solution 1
One way to do that is with a simple loop:
(download mydata.csv)
a = read.table("mydata.csv", header=TRUE, sep=";")
a$delta = NA
for (i in seq(1, nrow(a), by=2)) {
  a[i, "delta"] = a[i+1, "delta"] = a[i+1, "log"] - a[i, "log"]
}
The for loop iterates over every odd row number (that is what seq(..., by=2) does). For the first, third, fifth, etc. row, we assign the computed difference to that row AND to the following one.
This returns:
> a
   su    match  log delta
1   1    match 5.80  0.30
2   1 mismatch 6.10  0.30
3   2    match 6.09 -0.04
4   2 mismatch 6.05 -0.04
5   3    match 6.42 -0.12
6   3 mismatch 6.30 -0.12
7   4    match 6.20 -0.20
8   4 mismatch 6.00 -0.20
9   5    match 5.90  0.19
10  5 mismatch 6.09  0.19
solution 2
If you have a lot of data, this approach can be slow. R generally works better with another form of iteration, the apply family of functions.
The code above can be rewritten like this:
a$delta = rep(
  sapply(seq(1, nrow(a), by=2),
         function(i) { a[i+1, "log"] - a[i, "log"] }
  ),
  each=2)
This gives the very same result as the first solution and should be faster, but it is also somewhat less intuitive.
solution 3
Finally, it looks to me as if the long data frame format makes this more convoluted than it needs to be, given your kind of data.
I'd reshape it to wide and then work with separate columns, without the need for duplicated data.
Like this:
a = read.table("mydata.csv", header=TRUE, sep=";")
a = reshape(a, idvar = "su", timevar = "match", direction = "wide")
# now creating what you want becomes very simple:
a$delta = a[[3]] - a[[2]]
Which returns:
> a
  su log.match log.mismatch delta
1  1      5.80         6.10  0.30
3  2      6.09         6.05 -0.04
5  3      6.42         6.30 -0.12
7  4      6.20         6.00 -0.20
9  5      5.90         6.09  0.19
The delta column contains the values you need. If you really need the long format for further analysis, you can always go back with:
a = reshape(a, idvar = "su", timevar = "match", direction = "long")
# sort back to the original order:
a = a[with(a, order(su)), ]
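The idea behind the wide format (one record per su, with match and mismatch side by side) can also be sketched in plain Python. This is a hypothetical illustration rather than a translation of reshape(): the names rows and wide are made up, and only the first two subjects from the printed output are used.

```python
# Long format: one (su, match, log) record per row, as in the printed data.
rows = [
    (1, "match", 5.80), (1, "mismatch", 6.10),
    (2, "match", 6.09), (2, "mismatch", 6.05),
]

# Wide format: one record per su, keyed by the value of the match column.
wide = {}
for su, match, log in rows:
    wide.setdefault(su, {})[match] = log

# With both values in the same record, the difference is a one-liner
# and is stored once per subject instead of being duplicated.
for record in wide.values():
    record["delta"] = round(record["mismatch"] - record["match"], 2)

print(wide)
```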

Related

How can I subtract columns of a 2D array in Julia?

I'm new to Julia and I have a problem.
I am working with Julia (in a Jupyter notebook) and I do not know how to compute column 3 - column 2 and append the result as a new column at the end of the matrix (2D array).
I have tried this:
newCol = array[(1:end),3] - array[(1:end),2]
Any suggestions?
You can subtract the two columns and then concatenate the result with the original array using the normal array-building syntax:
julia> arr
2x3 Array{Int32,2}:
 1  2  3
 5  6  7
julia> [arr (arr[:,3] - arr[:,2])]
2x4 Array{Int32,2}:
 1  2  3  1
 5  6  7  1
Or use hcat:
julia> hcat(arr, arr[:,3] - arr[:,2])
2x4 Array{Int32,2}:
 1  2  3  1
 5  6  7  1
(Note that neither of these acts in place, so you need to assign the result somewhere if you want to use it later.)
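For comparison, here is a minimal plain-Python sketch of the same operation, with nested lists standing in for the Julia matrix (an assumption made purely for illustration). Like the Julia expressions, it builds a new matrix and leaves the original untouched.

```python
arr = [[1, 2, 3],
       [5, 6, 7]]

# Append (column 3 - column 2) to every row. Python indices are 0-based,
# so column 3 is row[2] and column 2 is row[1].
result = [row + [row[2] - row[1]] for row in arr]

print(result)  # the new 2x4 matrix
print(arr)     # the original 2x3 matrix, unchanged
```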

Efficient way to transform a 2D matrix into a 1D vector

I want to create a 1D vector in MATLAB from a given matrix. For this I have implemented the following algorithm, which does it the trivial way:
% create one dimensional vector from 2D matrix
function [x]=one_dimensional(b,m,n)
    k=1;
    for i=1:m
        for t=1:n
            x(k)=b(i,t);
            k=k+1;
        end
    end
    x;
end
When I run it on the following example, it seems to do its task fine:
b=[2 1 3;4 2 3;1 5 4]
b =
2 1 3
4 2 3
1 5 4
>> one_dimensional(b,3,3)
ans =
2 1 3 4 2 3 1 5 4
But I know that element-by-element loops like this generally perform poorly in MATLAB, so what would be an efficient way to transform a matrix into a row/column vector? I only care about performance. Thanks very much.
You can use the (:) operator. It works column by column, not row by row, so you need to transpose first using the .' operator, for example:
b=b.';
b(:)'
ans=
2 1 3 4 2 3 1 5 4
I transposed again at the end to get a row vector (otherwise the result is the same vector, only in column form).
Alternatively, this is also an option (probably a slower one):
reshape(b.',1,[])
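Why the transpose is needed can be sketched in plain Python with nested lists (an illustration only, not MATLAB code). The asker's loop reads the matrix row by row, while (:) reads it column by column, so the two orders differ unless the matrix is transposed first.

```python
b = [[2, 1, 3],
     [4, 2, 3],
     [1, 5, 4]]

# Row-by-row order: what the nested for loop in the question produces.
row_major = [value for row in b for value in row]

# Column-by-column order: what b(:) produces without the transpose.
n_rows, n_cols = len(b), len(b[0])
column_major = [b[i][j] for j in range(n_cols) for i in range(n_rows)]

# Reading the transpose column by column gives the row-by-row order again.
b_transposed = [list(col) for col in zip(*b)]
via_transpose = [b_transposed[i][j] for j in range(n_rows) for i in range(n_cols)]

print(row_major)
print(column_major)
```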

Sparse Matrices Storage formats - Conversion

Is there an efficient way to convert a sparse matrix from Compressed Row Storage (CRS) format to Coordinate List (COO) format?
Have a look at Yousef Saad's library SPARSKIT -- he has subroutines to convert back and forth between compressed sparse row and coordinate formats, as well as several other sparse matrix storage schemes.
Anyhow, to see how to get the coordinate format from the compressed one, it's easiest to consider how you could have come up with the compressed row format in the first place. Say you have a sparse matrix in COO, where you've put everything in order, for example
rows: 1 1 1 1 2 2 2 2 2 3 3 3 ...
cols: 1 3 5 9 2 3 7 9 11 1 2 3 ...
So the non-zero entries in row 1 are (1,1), (1,3), (1,5), (1,9) and so forth. You're storing a lot of redundant data in the array of rows; you can instead just have an array ia such that ia(i) tells you the starting address in the array cols for row i. In our example above, we would then have
ia : 1 5 10 ...
cols: 1 3 5 9 2 3 7 9 11 1 2 3 ...
To go from COO to CSR, we just use the fact that
ia(i+1) = ia(i) + number of non-zero entries in row i
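for any i. Knowing that, you can work backwards to get the COO format from CSR: row i owns the entries at positions ia(i) through ia(i+1) - 1 of cols, so the rows array is recovered by repeating each row index i exactly ia(i+1) - ia(i) times.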
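The relation above can be turned into a short plain-Python sketch of both directions. It is only an illustration under stated assumptions: the function names are made up, indices are 1-based as in the example, the matrix is assumed to have exactly three rows, and ia is assumed to carry a final entry equal to the number of non-zeros plus one (13 here) so that the length of the last row is known. The cols array is identical in both formats and is therefore not touched.

```python
def csr_to_coo_rows(ia):
    """Expand 1-based CSR row pointers into the COO array of row indices.
    Assumes ia has one entry per row plus a closing entry (nnz + 1)."""
    rows = []
    for i in range(len(ia) - 1):
        count = ia[i + 1] - ia[i]       # number of non-zeros in row i + 1
        rows.extend([i + 1] * count)    # repeat the 1-based row index
    return rows


def coo_rows_to_csr(rows, n_rows):
    """Compress a sorted, 1-based COO row array into CSR row pointers."""
    counts = [0] * n_rows
    for r in rows:
        counts[r - 1] += 1
    ia = [1]
    for c in counts:
        ia.append(ia[-1] + c)           # ia(i+1) = ia(i) + nnz in row i
    return ia


ia = [1, 5, 10, 13]                     # assumed: 3 rows, 12 non-zeros
rows = csr_to_coo_rows(ia)
print(rows)
print(coo_rows_to_csr(rows, 3))
```

Both conversions are a single pass over the data, so the cost is linear in the number of rows plus the number of non-zero entries.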

matlab for loop: fastest and most efficient method to reproduce large matrix

My data is a 2096x252 matrix of double values. I need a for loop or an equivalent which performs the following:
Each time the matrix is reproduced, its first row is deleted and the second row becomes the first. When the loop runs again, the remaining matrix is reproduced, its first row is deleted, the next row becomes the first, and so on.
I've tried using repmat but it is too slow and tedious when dealing with large matrices (2096x252).
Example input:
1 2 3 4
3 4 5 6
3 5 7 5
9 6 3 2
Desired output:
1 2 3 4
3 4 5 6
3 5 7 5
9 6 3 2
3 4 5 6
3 5 7 5
9 6 3 2
3 5 7 5
9 6 3 2
9 6 3 2
Generally with Matlab it is much faster to pre-allocate a large array than to build it incrementally. When you know in advance the final size of the large array there's no reason not to follow this general advice.
Something like the following should do what you want. Suppose you have an array in(nrows, ncols); then
indices = [0 nrows:-1:1];
out = zeros(sum(indices),ncols);
for ix = 1:nrows
    out(1+sum(indices(1:ix)):sum(indices(1:ix+1)),:) = in(ix:end,:);
end
This worked on your small test input. I expect you can figure out what is going on.
Whether it is the fastest of all possible approaches I don't know, but I expect it to be much faster than building a large matrix incrementally.
Disclaimer:
You'll probably have memory issues with large matrices, but that is not the question.
Now, to the business:
For a given matrix A, the straightforward approach with the for loop would be:
[N, M] = size(A);
B = zeros(sum(1:N), M);
offset = 1;
for i = 1:N
    B(offset:offset + N - i, :) = A(i:end, :);
    offset = offset + N - i + 1;
end
B is the desired output matrix.
However, this solution is expected to be slow as well, because of the for loop.
Edit: preallocated B instead of dynamically changing size (this optimization should achieve a slight speedup).
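The preallocate-then-fill pattern used in both answers can be sketched in plain Python, with nested lists standing in for the matrix. This is an illustration only, not MATLAB code; the function name stack_suffixes is made up. For N rows the output has N + (N-1) + ... + 1 = N(N+1)/2 rows, which is why the final size is known in advance.

```python
def stack_suffixes(a):
    """Stack a, then a without its first row, then without its first two
    rows, and so on, into one preallocated list of rows."""
    n = len(a)
    out = [None] * (n * (n + 1) // 2)      # preallocate the final size
    offset = 0
    for i in range(n):
        block = [list(row) for row in a[i:]]   # copy rows i..end
        out[offset:offset + len(block)] = block
        offset += len(block)
    return out


a = [[1, 2, 3, 4],
     [3, 4, 5, 6],
     [3, 5, 7, 5],
     [9, 6, 3, 2]]
for row in stack_suffixes(a):
    print(row)
```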

How to get the mean of rows of a matrix in Octave?

>> a = [2,3,4;6,7,8]
a =
2 3 4
6 7 8
>> mean(a)
ans =
4 5 6
where [4 5 6] is the mean for each column
How can I get the mean for each row?
In my example, I would expect [3;7]
From http://www.mathworks.co.uk/help/techdoc/ref/mean.html:
For matrices, mean(A,2) is a column vector containing the mean value of each row.
In Octave it's the same.
As an alternative to the other answer, you can simply transpose the matrix:
>> a'
ans =
2 6
3 7
4 8
>> mean(a')
ans =
3 7
I suggest this answer over the other because the same trick works with any Octave function that operates column-wise (max, min, sum, etc.).
You can do
mean(a, 2)
which returns [3; 7].
The trick is that the second parameter specifies the dimension along which the mean is taken. For a matrix, the default is 1 (column-wise).
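The difference between the two dimensions can be sketched in plain Python with nested lists (an illustration only, not Octave code): averaging over dimension 1 collapses the rows and gives one value per column, while averaging over dimension 2 collapses the columns and gives one value per row.

```python
a = [[2, 3, 4],
     [6, 7, 8]]

# Like mean(a) or mean(a, 1): one mean per column.
column_means = [sum(col) / len(col) for col in zip(*a)]

# Like mean(a, 2): one mean per row.
row_means = [sum(row) / len(row) for row in a]

print(column_means)
print(row_means)
```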

Resources