I have a time series dataset of accelerometry values where there are many sub-seconds of measurements but the actual number of sub-seconds recorded per second is variable.
So I would be starting with something that looks like this:
Date time  Dec sec  Acc X
1          .00      0.5
1          .25      0.5
1          .50      0.6
1          .75      0.5
2          .00      0.6
2          .40      0.5
2          .80      0.5
3          .00      0.5
3          .50      0.5
4          .00      0.6
4          .25      0.5
4          .50      0.5
4          .75      0.5
I am trying to convert it to wide format, where each row is a second and the columns hold the sub-second measurements recorded within that second.
sub1  sub2  sub3  sub4
.5    .5    .6    .5
.6    .5    .5    NaN
.5    .5    NaN   NaN
.6    .5    .5    .5
In code this would look like:
% Preallocate some space
Dpts_observations = NaN(13,3);
% These are the "seconds" numbers
Dpts_observations(:,1) = [1 1 1 1 ...
    2 2 2 ...
    3 3 ...
    4 4 4 4];
% These are the "decimal seconds"
Dpts_observations(:,2) = [0.00 0.25 0.50 0.75 ...
    0.00 0.33 0.66 ...
    0.00 0.50 ...
    0.00 0.25 0.50 0.75];
% Here are the actual acceleration values
Dpts_observations(:,3) = [0.5 0.5 0.5 0.5 ...
    0.6 0.5 0.5 ...
    0.4 0.5 ...
    0.5 0.5 0.6 0.4];
% I have this in a separate file, but I have summary data that helps me
% determine the row indexes corresponding to sub-seconds that belong to the
% same second, and I use them to manually extract from long form to wide form.
% Create table to hold indexing information
% (column vectors, so that the table has one row per second)
Seconds = [1; 2; 3; 4];
Obs_per_sec = [4; 3; 2; 4];
Start_index = [1; 5; 8; 10];
End_index = [4; 7; 9; 13];
Dpts_attributes = table(Seconds, Obs_per_sec, Start_index, End_index);
% Preallocate new array
Acc_X = NaN(4,4);
% Loop through seconds
for i = 1:height(Dpts_attributes)
    rows = Dpts_attributes.Start_index(i):Dpts_attributes.End_index(i);
    Acc_X(i, 1:Dpts_attributes.Obs_per_sec(i)) = Dpts_observations(rows,3).';
end
Now this is working, but it's very slow. In reality, I have a huge data set consisting of millions of seconds, and I'm hoping there might be a better solution than the one I currently have. My data is all numeric to try to make everything as fast as possible.
Thank you!
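One possible way to avoid the loop is to compute a (row, column) position for every observation and fill the wide matrix in a single indexed assignment. The following is only a sketch, assuming the observations are already sorted by second; the names secs, acc, row, col and first_idx are made up for illustration, and it should run in either MATLAB or Octave.

```octave
% Long-form example data: the second each observation belongs to,
% and the acceleration value (same values as in the code above)
secs = [1 1 1 1 2 2 2 3 3 4 4 4 4].';
acc  = [0.5 0.5 0.5 0.5 0.6 0.5 0.5 0.4 0.5 0.5 0.5 0.6 0.4].';

% Row of the wide matrix for each observation (1 .. number of distinct seconds)
[~, ~, row] = unique(secs);
row = row(:);

% Position of each observation within its second (1, 2, 3, ...).
% Assumption: the observations are sorted by second.
first_idx = find([true; diff(row) ~= 0]);   % index where each second starts
col = (1:numel(row)).' - first_idx(row) + 1;

% Fill a NaN-padded matrix in one assignment using linear indices
Acc_X = NaN(max(row), max(col));
Acc_X(sub2ind(size(Acc_X), row, col)) = acc;
```

Seconds with fewer observations than the longest one keep NaN in the trailing columns, as in the loop version.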
I am trying to replace the sub and super diagonals of a matrix in Octave.
This is the code I am using:
A=[-3 -2 -1 0 1 2 3;0.1 0.2 0.2 0.5 0.6 -0.1 0]'
P=zeros(4,4)
for (k=1:7)
j=A(k,1)
diag(P,j)=A(k,2)
end
This is the error I got: diag(0,_): subscripts must be either integers 1 to (2^63)-1 or logicals
But all the little parts are okay. diag(P,-3) works fine, but when I ask to replace in the loop it refuses!
What can I do about it? Is this: diag(P,j)=e, not the right code to substitute super and sub diagonals?
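The reason you're getting an error is that diag(P,j) on the left-hand side of an assignment is not a reference to the diagonal of P. diag is a function that returns the values on that diagonal, and you can't assign to a function call. Instead, Octave reads diag(P,j)=A(k,2) as an indexed assignment into a variable named diag, using the entries of P and j as subscripts. Since P is all zeros, the first subscript is 0, which is what the error message is complaining about. Either way, P itself is never changed.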
To fix your loop, you would need to provide indices into P and assign to those. One way is to use logical indexing to tell MATLAB which values in P to change. For example,
P = zeros(4)
M = logical(diag([1,1,1], -1))
P(M) = 3
gives us
P =
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
M =
0 0 0 0
1 0 0 0
0 1 0 0
0 0 1 0
P =
0 0 0 0
3 0 0 0
0 3 0 0
0 0 3 0
The unfortunate part is that we can't specify both which diagonal we want to create and the size of the resulting matrix, so we have to calculate the number of elements on the diagonal before creating it.
A=[-3 -2 -1 0 1 2 3;0.1 0.2 0.2 0.5 0.6 -0.1 0].'
n=4; % Number of rows/columns in P...
% If we want a non-square matrix, we'll have to do more math
P=zeros(n);
for k=1:2*n-1 % Remove hardcoded values to make the code more general.
j=A(k,1);
diag_length = n-abs(j);
M=diag(true(1,diag_length),j); % Create logical array with true on jth diagonal
P(M)=A(k,2);
end
The result is:
P =
0.5000 0.6000 -0.1000 0
0.2000 0.5000 0.6000 -0.1000
0.2000 0.2000 0.5000 0.6000
0.1000 0.2000 0.2000 0.5000
Another approach is to use spdiags. One of the uses of spdiags takes the columns of one matrix and uses them to build the diagonals of the output matrix. You pass the indices of the diagonals to set, and the matrix of values for each of the diagonals, along with the matrix size.
If we only pass one value for each diagonal, spdiags will only set one value, so we'll have to duplicate the input vector n times. (spdiags will happily throw away values, but won't fill them in.)
A=[-3 -2 -1 0 1 2 3;0.1 0.2 0.2 0.5 0.6 -0.1 0].'
n = 4;
diag_idx = A(:,1).'; % indices of diagonals
diag_val = A(:,2).'; % corresponding values
diag_val = repmat(diag_val, n, 1); % duplicate values n times
P = spdiags(diag_val, diag_idx, n, n);
P = full(P);
That last line is because spdiags creates a sparse matrix. full turns it into a regular matrix. The final value of P is what you'd expect:
P =
0.5000 0.6000 -0.1000 0
0.2000 0.5000 0.6000 -0.1000
0.2000 0.2000 0.5000 0.6000
0.1000 0.2000 0.2000 0.5000
Of course, if you're into one-liners, you can combine all of these commands together.
P = full(spdiags(repmat(A(:,2).', n, 1), A(:,1).', n, n));
I want to generate n random numbers between 0 and 1 whose sum is less than or equal to one.
Sum(n random number between 0 and 1) <= 1
n?
For example: 3 random numbers between 0 and 1:
0.2 , 0.3 , 0.4
0.2 + 0.3 + 0.4 = 0.9 <=1
It sounds like you would need to generate the numbers separately while keeping track of the previous numbers. We'll use your example:
Generate the first number between 0 and 1 = 0.2
1.0 - 0.2 = 0.8: Generate the next number between 0 and 0.8 = 0.3
0.8 - 0.3 = 0.5: Generate the next number between 0 and 0.5 = 0.4
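The steps above could be sketched as a short loop. This is only an illustration of that idea, written in MATLAB/Octave as an assumption since the question doesn't name a language; the variable names n, x and remaining are made up.

```octave
n = 3;                  % how many numbers to generate
x = zeros(1, n);        % the generated numbers
remaining = 1;          % how much of the total is still available

for k = 1:n
    % Draw the next number between 0 and whatever is left
    x(k) = remaining * rand();
    % Keep track of what the previous numbers have used up
    remaining = remaining - x(k);
end

total = sum(x);         % never exceeds 1
```

Note that with this scheme the later numbers tend to be smaller than the earlier ones, because each draw is limited by what the previous draws left over.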
I applied a classification algorithm to a dataset and got the stats below:
Correctly Classified Instances         684               76.1693 %
Incorrectly Classified Instances       214               23.8307 %
Kappa statistic                          0
Mean absolute error                      0.1343
Root mean squared error                  0.2582
Relative absolute error                100      %
Root relative squared error            100      %
Total Number of Instances              898
=== Detailed Accuracy By Class ===
               TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class
               0        0        0          0       0          0.5       1
               0        0        0          0       0          0.5       2
               1        1        0.762      1       0.865      0.5       3
               0        0        0          0       0          ?         4
               0        0        0          0       0          0.5       5
               0        0        0          0       0          0.5       U
Weighted Avg.  0.762    0.762    0.58       0.762   0.659      0.5
=== Confusion Matrix ===
   a   b   c   d   e   f   <-- classified as
   0   0   8   0   0   0 |   a = 1
   0   0  99   0   0   0 |   b = 2
   0   0 684   0   0   0 |   c = 3
   0   0   0   0   0   0 |   d = 4
   0   0  67   0   0   0 |   e = 5
   0   0  40   0   0   0 |   f = U
I can understand much of the data; however, I have trouble interpreting the values since I am new to Weka:
1. Which error rate should I report overall?
2. How do I tell whether there is anything interesting about the model?
1) Overall error measure
The triplet Precision, Recall and F-Measure together is reported quite often because each number represents a different aspect of the model.
If you would like to have a single number only, then take the percentage of (In)correctly Classified Instances or the Weighted Avg. F-Measure.
The other error measures are also useful but they require deeper knowledge of statistics (which I'm lacking :-)
2) Something interesting about the model
From Detailed Accuracy By Class and Confusion Matrix you can see that the model is quite simple: it classifies everything as class 3. The error measures look quite successful, but that is only because 76% of the instances in the dataset belong to class 3. The model corresponds to the commonly used baseline algorithm called "most common class".
The ROC area is also useful in terms of evaluating accuracy and interpreting how interesting a model is. Simply speaking, the true positive rate is plotted against the false positive rate and the ROC area is calculated as the area underneath this curve. A high ROC area, say 0.9 to 1, indicates that the model is very good at classifying instances, whereas a ROC area of 0.5 (as in your model) means that the model is no better at classification than a random method like flipping coins.
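To make the comparison with the "most common class" baseline concrete, here is a small sketch that recomputes the accuracy and the Kappa statistic from the confusion matrix in the question. It is written in MATLAB/Octave purely for illustration (Weka computes these values itself), and the variable names are made up.

```octave
% Confusion matrix from the question (rows = actual class, columns = predicted)
C = [0 0   8 0 0 0
     0 0  99 0 0 0
     0 0 684 0 0 0
     0 0   0 0 0 0
     0 0  67 0 0 0
     0 0  40 0 0 0];

N = sum(C(:));                        % total number of instances (898)
accuracy = trace(C) / N;              % fraction on the diagonal (684/898)

% Accuracy of always predicting the most common actual class
baseline = max(sum(C, 2)) / N;

% Kappa: agreement beyond what the row and column totals give by chance
p_chance = sum(sum(C, 1) .* sum(C, 2).') / N^2;
kappa = (accuracy - p_chance) / (1 - p_chance);
```

Here accuracy and baseline are the same number, and kappa comes out as 0, which matches the Kappa statistic in the Weka output: the model agrees with the true labels no more than its class totals would by chance.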
I have a data file of format
time x-axis y-axis val1 val2
0 1 1 0.3 0.5
0 1 2 0.3 0.5
0 2 1 0.3 0.5
0 2 2 0.3 0.5
1 1 1 0.6 0.3
1 1 2 0.6 0.3
1 2 1 0.6 0.3
1 2 2 0.6 0.3
I wish to create a GIF/video of val1 at the x-y positions given above, with one frame per time step. How can I do that?
I googled and found various solutions, but none of them matched my requirement. Please help.