How to convert from long to wide format when the column numbers per row are variable? (MATLAB) - performance

I have a time series dataset of accelerometry values where there are many sub-seconds of measurements but the actual number of sub-seconds recorded per second is variable.
So I would be starting with something that looks like this:
Date time   Dec sec   Acc X
1           .00       0.5
1           .25       0.5
1           .50       0.6
1           .75       0.5
2           .00       0.6
2           .40       0.5
2           .80       0.5
3           .00       0.5
3           .50       0.5
4           .00       0.6
4           .25       0.5
4           .50       0.5
4           .75       0.5
I am trying to convert it to wide format, where each row is a second and the columns hold the sub-second samples recorded within that second (padded with NaN when a second has fewer samples):
sub1   sub2   sub3   sub4
.5     .5     .6     .5
.6     .5     .5     NaN
.5     .5     NaN    NaN
.6     .5     .5     .5
In code this would look like:
%Preallocate some space
Dpts_observations = NaN(13,3);
%These are the "seconds" number
Dpts_observations(:,1)=[1 1 1 1...
2 2 2...
3 3...
4 4 4 4];
%These are the "decimal seconds"
Dpts_observations(:,2) = [0.00 0.25 0.50 0.75...
0.00 0.33 0.66...
0.00 0.50 ...
0.00 0.25 0.50 0.75]
%Here's actual acceleration values
Dpts_observations(:,3) = [0.5 0.5 0.5 0.5...
0.6 0.5 0.5...
0.4 0.5...
0.5 0.5 0.6 0.4]
%I have this in a separate file, but I have summary data that helps me
%determine the row indexes of the sub-seconds that belong to the same
%second, and I use them to manually extract from long form to wide form.
%Create table to hold indexing information
%Column vectors, so the table has one row per second
Seconds = [1 2 3 4]';
Obs_per_sec = [4 3 2 4]';
Start_index = [1 5 8 10]';
End_index = [4 7 9 13]';
Dpts_attributes = table(Seconds, Obs_per_sec, Start_index, End_index);
%Preallocate new array
Acc_X = NaN(4,4);
%Loop through seconds
for i = 1:height(Dpts_attributes)
    Acc_X(i, 1:Dpts_attributes.Obs_per_sec(i)) = Dpts_observations(Dpts_attributes.Start_index(i):Dpts_attributes.End_index(i),3);
end
This works, but it's very slow. In reality, I have a huge data set covering millions of seconds, and I'm hoping there is a better solution than the one I currently have. I keep all my data numeric to make everything as fast as possible.
Thank you!
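One way to avoid slicing the long array once per second is to compute, for every observation, its target row and its position within that second, and then fill the wide array in a single pass. The sketch below illustrates that idea in plain Python rather than MATLAB; `long_to_wide` is a hypothetical helper name, and it assumes the samples of each second appear in time order.

```python
import math

def long_to_wide(seconds, values):
    """Pivot long-form samples into one row per second, padded with NaN."""
    # First pass: assign an output row to each distinct second and count
    # how many samples it has, which also gives the required width.
    row_of = {}   # second label -> output row index
    counts = []   # number of samples per output row
    for s in seconds:
        if s not in row_of:
            row_of[s] = len(counts)
            counts.append(0)
        counts[row_of[s]] += 1
    width = max(counts) if counts else 0

    # Second pass: place each value at (row, position within its second).
    wide = [[math.nan] * width for _ in counts]
    filled = [0] * len(counts)
    for s, v in zip(seconds, values):
        r = row_of[s]
        wide[r][filled[r]] = v
        filled[r] += 1
    return wide

# Data from the example table above.
seconds = [1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 4]
acc_x = [0.5, 0.5, 0.6, 0.5, 0.6, 0.5, 0.5, 0.5, 0.5, 0.6, 0.5, 0.5, 0.5]
wide = long_to_wide(seconds, acc_x)
```

In MATLAB, the same (row, column) pairs could presumably be converted to linear indices with sub2ind so that the whole fill becomes one indexed assignment instead of a loop; that translation is an untested suggestion, not a benchmarked result.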

Related

How can I generate a range of random floating point numbers in Julia?

I noticed that rand(x) where x is an integer gives me an array of random floating points. I want to know how I can generate an array of random float type variables within a certain range. I tried using a range as follows:
rand(.4:.6, 5, 5)
And I get:
0.4 0.4 0.4 0.4 0.4
0.4 0.4 0.4 0.4 0.4
0.4 0.4 0.4 0.4 0.4
0.4 0.4 0.4 0.4 0.4
0.4 0.4 0.4 0.4 0.4
How can I get a range instead of the lowest number in the range?
Perhaps a bit more elegant: since you actually want to sample from a uniform distribution, you can use the Distributions package:
julia> using Distributions
julia> rand(Uniform(0.4,0.6),5,5)
5×5 Array{Float64,2}:
0.547602 0.513855 0.414453 0.511282 0.550517
0.575946 0.520085 0.564056 0.478139 0.48139
0.409698 0.596125 0.477438 0.53572 0.445147
0.567152 0.585673 0.53824 0.597792 0.594287
0.549916 0.56659 0.502528 0.550121 0.554276
The same method applies to sampling from other well-known or user-defined distributions (just pass the distribution as the first argument to rand()).
You need a step parameter:
rand(.4:.1:.6, 5, 5)
The .1 is the step of the range. Without an explicit step, a range always steps by 1, regardless of how many decimal places the endpoints have, so .4:.6 contains only 0.4. If you need a finer increment, use a smaller step:
rand(.4:.0001:.6, 5, 5)
This will give you a result that looks similar to:
0.4587 0.557 0.586 0.4541 0.4686
0.4545 0.4789 0.4921 0.4451 0.4212
0.4373 0.5056 0.4229 0.5167 0.5504
0.5494 0.4068 0.5316 0.4378 0.5495
0.4368 0.4384 0.5265 0.5995 0.5231
You can do it with
julia> map(x->0.4+x*(0.6-0.4),rand(5,5))
5×5 Array{Float64,2}:
0.455445 0.475007 0.518734 0.463064 0.400925
0.509436 0.527338 0.566976 0.482812 0.501817
0.405967 0.563425 0.574607 0.502343 0.483075
0.50317 0.482894 0.54584 0.594157 0.528844
0.50418 0.515788 0.5554 0.580199 0.505396
The general rule is
julia> map( x -> start + x * (stop - start), rand(5,5) )
where start is 0.4 and stop is 0.6
You can even simulate a six-sided die this way: scale x to the half-open interval [1, 7) and take the floor. Since rand returns values in [0, 1), the scaled value never reaches 7.0.
julia> map(x->Integer(floor(1+x*(7-1))),rand(5,5))
5×5 Array{Int64,2}:
2 6 6 3 2
3 1 3 1 6
5 4 6 1 5
3 6 5 5 3
3 4 3 5 4
or you can use
julia> rand(1:6,5,5)
5×5 Array{Int64,2}:
3 6 3 5 5
2 1 3 3 3
1 5 4 1 5
5 5 5 5 1
3 2 1 5 6
Just another simple solution (using vectorized operations)
0.2 .* rand(5,5) .+ 0.4
And if efficiency matters...
@time 0.2 .* rand(10000, 10000) .+ 0.4
>> 0.798906 seconds (4 allocations: 1.490 GiB, 5.20% gc time)
@time map(x -> 0.4 + x * (0.6 - 0.4), rand(10000, 10000))
>> 0.836322 seconds (49.20 k allocations: 1.493 GiB, 7.08% gc time)
using Distributions
@time rand(Uniform(0.4, 0.6), 10000, 10000)
>> 1.310401 seconds (2 allocations: 762.940 MiB, 1.51% gc time)
@time rand(0.2:0.000001:0.4, 10000, 10000)
>> 1.715034 seconds (2 allocations: 762.940 MiB, 6.24% gc time)

Delete a row if the absolute value of all columns are less than 1?

I need to delete rows from a table (.csv) only if the absolute values in all columns of that row are less than 1. How can I accomplish this?
Example
Year Parameter1 Parameter2 Parameter3 Parameter4
1 -0.3 0.1 -2.5 1.0
2 -0.3 0.1 0.8 0.1
3 -0.3 0.1 -3.8 1.6
4 -0.6 0.5 -0.2 0.4
5 0.3 -0.1 -0.5 1.3
And I want the output to be:
Year Parameter1 Parameter2 Parameter3 Parameter4
1 -0.3 0.1 -2.5 1.0
3 -0.3 0.1 -3.8 1.6
5 0.3 -0.1 -0.5 1.3
Thanks in advance!
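The rule can be read as "keep a row unless every parameter has absolute value below 1". A minimal sketch in Python follows, assuming a comma-delimited file whose first column (Year) is an identifier that is excluded from the test; `keep_row` and `filter_csv` are hypothetical names, and the sample text simply restates the example above with commas.

```python
import csv
import io

def keep_row(values, threshold=1.0):
    """Return False only when every value has absolute value below threshold."""
    return not all(abs(v) < threshold for v in values)

def filter_csv(text):
    """Parse CSV text and drop rows whose parameters are all below 1 in magnitude."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    # row[0] is the Year column, so only row[1:] is tested.
    kept = [row for row in reader if keep_row([float(x) for x in row[1:]])]
    return header, kept

sample = """Year,Parameter1,Parameter2,Parameter3,Parameter4
1,-0.3,0.1,-2.5,1.0
2,-0.3,0.1,0.8,0.1
3,-0.3,0.1,-3.8,1.6
4,-0.6,0.5,-0.2,0.4
5,0.3,-0.1,-0.5,1.3
"""
header, kept = filter_csv(sample)
years = [row[0] for row in kept]
```

Note that a value of exactly 1.0 counts as "not less than 1", so a row containing it is kept.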

What's the fastest way to unroll a matrix in MATLAB?

How do I turn a matrix:
[ 0.12 0.23 0.34 ;
0.45 0.56 0.67 ;
0.78 0.89 0.90 ]
into a 'coordinate' matrix with a bunch of rows?
[ 1 1 0.12 ;
1 2 0.23 ;
1 3 0.34 ;
2 1 0.45 ;
2 2 0.56 ;
2 3 0.67 ;
3 1 0.78 ;
3 2 0.89 ;
3 3 0.90 ]
(permutation of the rows is irrelevant, it only matters that the data is in this structure)
Right now I'm using a for loop but that takes a long time.
Here is an option using ind2sub:
mat= [ 0.12 0.23 0.34 ;
0.45 0.56 0.67 ;
0.78 0.89 0.90 ] ;
[I,J] = ind2sub(size(mat), 1:numel(mat));
r=[I', J', mat(:)]
r =
1.0000 1.0000 0.1200
2.0000 1.0000 0.4500
3.0000 1.0000 0.7800
1.0000 2.0000 0.2300
2.0000 2.0000 0.5600
3.0000 2.0000 0.8900
1.0000 3.0000 0.3400
2.0000 3.0000 0.6700
3.0000 3.0000 0.9000
Note that the indices are reversed compared to your example.
A = [ .12 .23 .34 ;
.45 .56 .67 ;
.78 .89 .90 ];
[ii jj] = meshgrid(1:size(A,1),1:size(A,2));
B = A.';
R = [ii(:) jj(:) B(:)];
If you don't mind a different order (according to your edit), you can do it more easily:
[ii jj] = ndgrid(1:size(A,1),1:size(A,2));
R = [ii(:) jj(:) A(:)];
In addition to generating the row/col indexes with meshgrid, you can use all three outputs of find as follows:
[II,JJ,AA]= find(A.'); %' note the transpose since you want to read across
M = [JJ II AA]
M =
1 1 0.12
1 2 0.23
1 3 0.34
2 1 0.45
2 2 0.56
2 3 0.67
3 1 0.78
3 2 0.89
3 3 0.9
Limited application because zeros get lost. Nasty, but correct workaround (thanks user664303):
B = A.'; v = B == 0; %' transpose to read across, otherwise work directly with A
[II, JJ, AA] = find(B + v);
M = [JJ II AA-v(:)];
Needless to say, I would recommend one of the other solutions. :) In particular, ndgrid is the most natural solution to obtaining the row,col inds.
I find ndgrid to be the most natural solution, but here's a fun way to do it manually with the odd couple of kron and repmat:
M = [kron(1:size(A,1),ones(1,size(A,2))).' ... %' row indexes
repmat((1:size(A,2))',size(A,1),1) ... %' col indexes
reshape(A.',[],1)] %' matrix values, read across
Simple adjustment to read down, as is natural in MATLAB:
M = [repmat((1:size(A,1))',size(A,2),1) ... %' row indexes (still)
kron(1:size(A,2),ones(1,size(A,1))).' ... %' column indexes
A(:)] % matrix values, read down
(Also since my first answer was obscenely hackish.)
I also find kron to be a nice tool for replicating each element in turn rather than the entire array at once, as repmat does. For example:
>> 1:size(A,2)
ans =
1 2 3
>> kron(1:size(A,2),ones(1,size(A,1)))
ans =
1 1 1 2 2 2 3 3 3
Taking this a bit further, we can generate a new function called repel to replicate elements of an array as opposed to the whole array:
>> repel = @(x,m,n) kron(x,ones(m,n));
>> repel(1:4,1,2)
ans =
1 1 2 2 3 3 4 4
>> repel(1:3,2,2)
ans =
1 1 2 2 3 3
1 1 2 2 3 3

How to plot time graph in gnuplot

I have a data file of format
time x-axis y-axis val1 val2
0 1 1 0.3 0.5
0 1 2 0.3 0.5
0 2 1 0.3 0.5
0 2 2 0.3 0.5
1 1 1 0.6 0.3
1 1 2 0.6 0.3
1 2 1 0.6 0.3
1 2 2 0.6 0.3
I wish to draw a GIF/video of val1 at the x-y positions given above, with one frame per time step. How can I do that?
I googled and found various solutions not matching my requirement. Please help.
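Whatever tool renders the animation, the data first has to be split into one frame per time step, each mapping an (x, y) position to val1. The sketch below shows only that grouping step in Python and does not produce a plot; `frames_by_time` is a hypothetical helper, and the rows restate the sample data above.

```python
from collections import defaultdict

def frames_by_time(rows):
    """Group (time, x, y, val1, val2) rows into {time: {(x, y): val1}}."""
    frames = defaultdict(dict)
    for t, x, y, val1, _val2 in rows:
        frames[t][(x, y)] = val1
    return dict(frames)

rows = [
    (0, 1, 1, 0.3, 0.5),
    (0, 1, 2, 0.3, 0.5),
    (0, 2, 1, 0.3, 0.5),
    (0, 2, 2, 0.3, 0.5),
    (1, 1, 1, 0.6, 0.3),
    (1, 1, 2, 0.6, 0.3),
    (1, 2, 1, 0.6, 0.3),
    (1, 2, 2, 0.6, 0.3),
]
frames = frames_by_time(rows)
```

Each entry of `frames` then corresponds to one image of the animation, to be drawn in time order.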

Filter Data In a Cleaner/More Efficient Way

I have a set of data with a bunch of columns. Something like the following (in reality my data has about half a million rows):
big = [
1 1 0.93 0.58;
1 2 0.40 0.34;
1 3 0.26 0.31;
1 4 0.40 0.26;
2 1 0.60 0.04;
2 2 0.84 0.55;
2 3 0.53 0.72;
2 4 0.00 0.39;
3 1 0.27 0.51;
3 2 0.46 0.18;
3 3 0.61 0.01;
3 4 0.07 0.04;
4 1 0.26 0.43;
4 2 0.77 0.91;
4 3 0.49 0.80;
4 4 0.40 0.55;
5 1 0.77 0.40;
5 2 0.91 0.28;
5 3 0.80 0.65;
5 4 0.05 0.06;
6 1 0.41 0.37;
6 2 0.11 0.87;
6 3 0.78 0.61;
6 4 0.87 0.51
];
Now, let's say I want to get rid of the rows where the first column is a 3 or a 6.
I'm doing that like so:
filterRows = [3 6];
for i = filterRows
big = big(~ismember(1:size(big,1), find(big(:,1) == i)), :);
end
Which works, but the loop makes me think I'm missing a more efficient trick. Is there a better way to do this?
Originally I tried:
big(find(big(:,1) == filterRows ),:) = [];
but of course that doesn't work.
Use logical indexing:
rows = (big(:, 1) == 3 | big(:, 1) == 6);
big(rows, :) = [];
In the general case, where the values of the first column are stored in filterRows, you can generate the logical vector rows with ismember:
rows = ismember(big(:, 1), filterRows);
or with bsxfun:
rows = any(bsxfun(@eq, big(:, 1), filterRows(:).'), 2);
