I'm making a chart but I would like to use lines rather than points.
With the lines style, all the points are connected to each other and the graph looks like a mesh, which I don't want.
set grid
set ticslevel 0.1
set samples 51, 51
set isosamples 20, 20
set border 1+2+4+8
unset key
splot 'matrix.dat' matrix
Part of the data for the matrix plot:
0.261 0.665 0.225 0.382 0.255 0.574 0.356
0.338 0.845 0.0363 0.167 0.727 0.0805 0.764
0.225 0.196 0.107 0.153 0.347 0.338 0.168
0.157 0.443 0.0671 0.135 0.312 0.408 0.362
0.151 0.281 0.0572 0.103 0.309 0.49 0.242
0.12 0.336 0.0604 0.173 0.19 0.395 0.153
0.119 0.173 0.0336 0.145 0.156 0.219 0.177
0.123 0.0452 0.0165 0.149 0.0932 0.0663 0.133
0.123 0.0741 0.00373 0.136 0.0346 0.485 0.131
0.111 0.241 0.0124 0.105 0.0127 1.01 0.122
0.096 0.475 0.0194 0.0569 0.0284 1.67 0.102
0.0777 0.773 0.0175 0.00929 0.0375 2.42 0.0831
0.059 1.11 0.0123 0.0322 0.0408 3.23 0.0635
0.0438 1.48 6.44E-4 0.0659 0.0265 4.07 0.0445
0.0349 1.92 0.0192 0.078 0.00585 4.92 0.0254
0.0392 2.42 0.0446 0.0632 0.0306 5.73 0.00774
0.0518 2.97 0.0745 0.031 0.0729 6.46 0.00716
This cannot be done automatically. You must determine the rows and columns of your matrix. First, to get the number of rows, use
stats 'matrix.dat' using 1 nooutput
rows = STATS_records
Then, for the number of columns, use
stats 'matrix.dat' matrix nooutput
cols = STATS_records/rows
And now plot every line
unset key
splot for [i=0:cols-1] 'matrix.dat' matrix every ::i::i lt 1 with lines
Result (with 4.6.4) is:
I think Christoph's solution is just what you need, but to make the point clear: providing the matrix and using splot matrix alone will just generate a mesh.
So you will need to specify the lines with complete X, Y and Z vectors and then plot them using splot with lines/linespoints. I'm adding an example below in case it may be helpful for anyone else.
You arrange your data file as follows:
10 1 0.261 2 0.665 3 0.225 4 0.382 5 0.255 6 0.574 7 0.356
20 1 0.338 2 0.845 3 0.0363 4 0.167 5 0.727 6 0.0805 7 0.764
30 1 0.225 2 0.196 3 0.107 4 0.153 5 0.347 6 0.338 7 0.168
40 1 0.157 2 0.443 3 0.0671 4 0.135 5 0.312 6 0.408 7 0.362
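If the original file is a plain matrix, this layout does not have to be typed by hand. Below is a minimal Python sketch of the rearrangement, not part of the gnuplot workflow itself. The function names and the `x_step=10` spacing are assumptions chosen to reproduce the sample rows above, with the 1-based column number used as y.

```python
def rearrange_row(row_number, values, x_step=10):
    """Build one output line: x, then a (y, z) pair for each matrix column.

    x_step=10 mirrors the sample above (x = 10, 20, 30, ...); it is an
    assumption, as is using the 1-based column number as y.
    """
    fields = [str(x_step * row_number)]
    for col_number, value in enumerate(values, start=1):
        fields.extend([str(col_number), value])
    return " ".join(fields)


def rearrange_matrix(lines):
    """Convert whitespace-separated matrix rows into the x/y/z layout."""
    rows = [line.split() for line in lines if line.strip()]
    return [rearrange_row(i, row) for i, row in enumerate(rows, start=1)]


sample = [
    "0.261 0.665 0.225 0.382 0.255 0.574 0.356",
    "0.338 0.845 0.0363 0.167 0.727 0.0805 0.764",
]
for line in rearrange_matrix(sample):
    print(line)
```

The values are kept as strings so that the numbers are written out exactly as they appear in the input.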
And then plot as follows:
set grid
set ticslevel 0.1
#set samples 51, 51
#set isosamples 20, 20
#set border 1+2+4+8
unset key
splot 'matrix.dat' using 1:2:3 with linespoints, \
'matrix.dat' using 1:4:5 with linespoints, \
'matrix.dat' using 1:6:7 with linespoints, \
'matrix.dat' using 1:8:9 with linespoints, \
'matrix.dat' using 1:10:11 with linespoints, \
'matrix.dat' using 1:12:13 with linespoints, \
'matrix.dat' using 1:14:15 with linespoints
With the resultant plot
I have a time series dataset of accelerometry values where there are many sub-seconds of measurements but the actual number of sub-seconds recorded per second is variable.
So I would be starting with something that looks like this:
Date time   Dec sec   Acc X
1           .00       0.5
1           .25       0.5
1           .50       0.6
1           .75       0.5
2           .00       0.6
2           .40       0.5
2           .80       0.5
3           .00       0.5
3           .50       0.5
4           .00       0.6
4           .25       0.5
4           .50       0.5
4           .75       0.5
And trying to convert it to wide format where each row is a second, and the columns are the decimal seconds corresponding to each second.
sub1   sub2   sub3   sub4
.5     .5     .6     .5
.6     .5     .5     NaN
.5     .5     NaN    NaN
.6     .5     .5     .5
In code this would look like:
%Preallocate some space
Dpts_observations = NaN(13,3);
%These are the "seconds" number
Dpts_observations(:,1)=[1 1 1 1...
2 2 2...
3 3...
4 4 4 4];
%These are the "decimal seconds"
Dpts_observations(:,2) = [0.00 0.25 0.50 0.75...
0.00 0.33 0.66...
0.00 0.50 ...
0.00 0.25 0.50 0.75]
%Here's actual acceleration values
Dpts_observations(:,3) = [0.5 0.5 0.5 0.5...
0.6 0.5 0.5...
0.4 0.5...
0.5 0.5 0.6 0.4]
%I have this in a separate file but I have summary data that helps me
%determine the row indexes corresponding to sub-seconds that belong to the
%same second and I use them to manually extract from long form to wide form.
%Create table to hold indexing information
Seconds = [1 2 3 4];
Obs_per_sec = [4 3 2 4];
Start_index = [1 5 8 10];
End_index = [4 7 9 13];
Dpts_attributes = table(Seconds, Obs_per_sec, Start_index, End_index);
%Preallocate new array
Acc_X = NaN(4,4);
%Loop through seconds
for i=1:max(size(Dpts_attributes))
Acc_X(i, 1:Dpts_attributes.Obs_per_sec(i))=Dpts_observations(Dpts_attributes.Start_index(i):Dpts_attributes.End_index(i),3);
end
Now this is working, but it's very slow. In reality, I have a huge data set consisting of millions of seconds, and I'm hoping there might be a better solution than the one I currently have. My data is all numeric to try to make everything as fast as possible.
Thank you!
With gnuplot you can create a 3D-like plot with splot and interactively change the view.
You also can create animations with gnuplot with set terminal gif animate.
### interactive animation?
reset session
set view equal
set border 0
unset tics
$Data <<EOD
1 1.000 0.000 0.000
2 0.500 0.866 0.000
3 -0.500 0.866 0.000
4 -1.000 0.000 0.000
5 -0.500 -0.866 0.000
6 0.500 -0.866 0.000
1 1.000 0.000 0.000
EOD
$Off <<EOD
1 0.00 0.00 0.1
2 0.00 0.00 -0.1
3 0.00 0.00 0.1
4 0.00 0.00 -0.1
5 0.00 0.00 0.1
6 0.00 0.00 -0.1
1 0.00 0.00 0.1
EOD
set xrange[-2:2]
set yrange[-2:2]
set zrange[-2:2]
set view 45,45
max=10.
Offset(n,axis,i) = real(word($Off[n+1],axis+1))*sin(2*pi*i/max)
set term gif animate delay 5 size 400,300
set output "Molecule.gif"
do for [i=0:max] {
splot $Data u ($2+Offset($0,1,i)):($3+Offset($0,2,i)):($4+Offset($0,3,i)) \
w lp pt 7 ps 2 lw 2 lc rgb "red" not
unset autoscale
}
set term wxt size 400,300
set margin 0
splot $Data u 2:3:4 w lp pt 7 ps 2 lw 2 lc rgb "red" not
set output
### end of code
Now, my question is: is it also possible to create interactive animations? I would like to rotate the view while it is animated. Is this somehow possible with gnuplot? Any ideas?
Edit:
@Ethan's answer solves this question. However, is there maybe a way to avoid the flickering of the mouse cursor?
Putting the plot commands in a loop does not disable mouse interaction. The simple answer should work:
set xrange[-2:2]
set yrange[-2:2]
set zrange[-2:2]
set view 45,45
Offset(n,axis,i) = real(word($Off[n+1],axis+1))*sin(2*pi*i/max)
# Loop forever
# but allow an explicit end condition triggered by a hot key
done = 0
bind "d" "done = 1"
while (!done) {
do for [i=0:10] {
splot $Data u ($2+Offset($0,1,i)):($3+Offset($0,2,i)):($4+Offset($0,3,i)) \
w lp pt 7 ps 2 lw 2 lc rgb "red" not
pause 0.1
}
}
How do I turn a matrix:
[ 0.12 0.23 0.34 ;
0.45 0.56 0.67 ;
0.78 0.89 0.90 ]
into a 'coordinate' matrix with a bunch of rows?
[ 1 1 0.12 ;
1 2 0.23 ;
1 3 0.34 ;
2 1 0.45 ;
2 2 0.56 ;
2 3 0.67 ;
3 1 0.78 ;
3 2 0.89 ;
3 3 0.90 ]
(permutation of the rows is irrelevant, it only matters that the data is in this structure)
Right now I'm using a for loop but that takes a long time.
Here is an option using ind2sub:
mat= [ 0.12 0.23 0.34 ;
0.45 0.56 0.67 ;
0.78 0.89 0.90 ] ;
[I,J] = ind2sub(size(mat), 1:numel(mat));
r=[I', J', mat(:)]
r =
1.0000 1.0000 0.1200
2.0000 1.0000 0.4500
3.0000 1.0000 0.7800
1.0000 2.0000 0.2300
2.0000 2.0000 0.5600
3.0000 2.0000 0.8900
1.0000 3.0000 0.3400
2.0000 3.0000 0.6700
3.0000 3.0000 0.9000
Note that the indices are reversed compared to your example.
A = [ .12 .23 .34 ;
.45 .56 .67 ;
.78 .89 .90 ];
[ii jj] = meshgrid(1:size(A,1),1:size(A,2));
B = A.';
R = [ii(:) jj(:) B(:)];
If you don't mind a different order (according to your edit), you can do it more easily:
[ii jj] = ndgrid(1:size(A,1),1:size(A,2));
R = [ii(:) jj(:) A(:)];
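For readers who want to check the two orderings outside MATLAB, here is a minimal plain-Python sketch. The function name `to_coordinates` and its `read_across` flag are hypothetical. Reading across corresponds to the meshgrid/transpose version, and reading down to the ndgrid version.

```python
def to_coordinates(matrix, read_across=True):
    """Return (row, col, value) triples with 1-based indices.

    read_across=True lists the values row by row (the order asked for
    in the question); read_across=False lists them column by column.
    """
    n_rows = len(matrix)
    n_cols = len(matrix[0]) if n_rows else 0
    if read_across:
        return [(i + 1, j + 1, matrix[i][j])
                for i in range(n_rows) for j in range(n_cols)]
    return [(i + 1, j + 1, matrix[i][j])
            for j in range(n_cols) for i in range(n_rows)]


A = [[0.12, 0.23, 0.34],
     [0.45, 0.56, 0.67],
     [0.78, 0.89, 0.90]]

for triple in to_coordinates(A):
    print(triple)
```

Both orderings contain the same triples; only the row order of the result differs.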
In addition to generating the row/col indexes with meshgrid, you can use all three outputs of find as follows:
[II,JJ,AA]= find(A.'); %' note the transpose since you want to read across
M = [JJ II AA]
M =
1 1 0.12
1 2 0.23
1 3 0.34
2 1 0.45
2 2 0.56
2 3 0.67
3 1 0.78
3 2 0.89
3 3 0.9
Limited application because zeros get lost. Nasty, but correct workaround (thanks user664303):
B = A.'; v = B == 0; %' transpose to read across, otherwise work directly with A
[II, JJ, AA] = find(B + v);
M = [JJ II AA-v(:)];
Needless to say, I would recommend one of the other solutions. :) In particular, ndgrid is the most natural solution to obtaining the row,col inds.
I find ndgrid to be the most natural solution, but here's a fun way to do it manually with the odd couple of kron and repmat:
M = [kron(1:size(A,1),ones(1,size(A,2))).' ... %' row indexes
repmat((1:size(A,2))',size(A,1),1) ... %' col indexes
reshape(A.',[],1)] %' matrix values, read across
Simple adjustment to read down, as is natural in MATLAB:
M = [repmat((1:size(A,1))',size(A,2),1) ... %' row indexes (still)
kron(1:size(A,2),ones(1,size(A,1))).' ... %' column indexes
A(:)] % matrix values, read down
(Also since my first answer was obscenely hackish.)
I also find kron to be a nice tool to replicate each element at a time rather than the entire array at a time, as repmat does. For example:
>> 1:size(A,2)
ans =
1 2 3
>> kron(1:size(A,2),ones(1,size(A,1)))
ans =
1 1 1 2 2 2 3 3 3
Taking this a bit further, we can generate a new function called repel to replicate elements of an array as opposed to the whole array:
>> repel = @(x,m,n) kron(x,ones(m,n));
>> repel(1:4,1,2)
ans =
1 1 2 2 3 3 4 4
>> repel(1:3,2,2)
ans =
1 1 2 2 3 3
1 1 2 2 3 3
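The same element-wise replication can be sketched in plain Python to make the difference from whole-array tiling explicit. This is only an illustration for a one-dimensional input; the Python `repel` below is a hypothetical analogue of the anonymous function above, returning a list of rows.

```python
def repel(x, m, n):
    """Mimic kron(x, ones(m, n)) for a 1-D sequence x.

    Each element is repeated n times along the row, and the resulting
    row is repeated m times.
    """
    row = [value for value in x for _ in range(n)]
    return [list(row) for _ in range(m)]


def tile(x, n):
    """Whole-array repetition, for contrast (what repmat does)."""
    return list(x) * n


print(repel([1, 2, 3, 4], 1, 2))
print(repel([1, 2, 3], 2, 2))
print(tile([1, 2, 3], 2))
```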
I have a set of data with a bunch of columns. Something like the following (in reality my data has about half a million rows):
big = [
1 1 0.93 0.58;
1 2 0.40 0.34;
1 3 0.26 0.31;
1 4 0.40 0.26;
2 1 0.60 0.04;
2 2 0.84 0.55;
2 3 0.53 0.72;
2 4 0.00 0.39;
3 1 0.27 0.51;
3 2 0.46 0.18;
3 3 0.61 0.01;
3 4 0.07 0.04;
4 1 0.26 0.43;
4 2 0.77 0.91;
4 3 0.49 0.80;
4 4 0.40 0.55;
5 1 0.77 0.40;
5 2 0.91 0.28;
5 3 0.80 0.65;
5 4 0.05 0.06;
6 1 0.41 0.37;
6 2 0.11 0.87;
6 3 0.78 0.61;
6 4 0.87 0.51
];
Now, let's say I want to get rid of the rows where the first column is a 3 or a 6.
I'm doing that like so:
filterRows = [3 6];
for i = filterRows
big = big(~ismember(1:size(big,1), find(big(:,1) == i)), :);
end
Which works, but the loop makes me think I'm missing a more efficient trick. Is there a better way to do this?
Originally I tried:
big(find(big(:,1) == filterRows ),:) = [];
but of course that doesn't work.
Use logical indexing:
rows = (big(:, 1) == 3 | big(:, 1) == 6);
big(rows, :) = [];
In the general case, where the values of the first column are stored in filterRows, you can generate the logical vector rows with ismember:
rows = ismember(big(:, 1), filterRows);
or with bsxfun:
rows = any(bsxfun(@eq, big(:, 1), filterRows(:).'), 2);
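The membership test behind ismember can also be sketched in plain Python: build the set of unwanted keys once, then keep every row whose first column is not in it. The name `drop_rows` and the shortened `big` sample are assumptions for illustration only.

```python
def drop_rows(rows, filter_values, column=0):
    """Return the rows whose entry in `column` is not in filter_values."""
    unwanted = set(filter_values)  # one set lookup per row, no inner loop
    return [row for row in rows if row[column] not in unwanted]


big = [
    [1, 1, 0.93, 0.58],
    [2, 1, 0.60, 0.04],
    [3, 1, 0.27, 0.51],
    [4, 1, 0.26, 0.43],
    [6, 1, 0.41, 0.37],
]
filter_rows = [3, 6]

for row in drop_rows(big, filter_rows):
    print(row)
```

As with the logical-index version, the data is scanned once regardless of how many values are being filtered out.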
I need to sort a matrix so that all elements stay in their columns and each column is in ascending order. Is there a vectorized column-wise sort for a matrix or a data frame in R? (My matrix is all-positive and bounded by B, so I can add j*B to each cell in column j and do a regular one-dimensional sort:
> set.seed(100523); m <- matrix(round(runif(30),2), nrow=6); m
[,1] [,2] [,3] [,4] [,5]
[1,] 0.47 0.32 0.29 0.54 0.38
[2,] 0.38 0.91 0.76 0.43 0.92
[3,] 0.71 0.32 0.48 0.16 0.85
[4,] 0.88 0.83 0.61 0.95 0.72
[5,] 0.16 0.57 0.70 0.82 0.05
[6,] 0.77 0.03 0.75 0.26 0.05
> offset <- rep(seq_len(5), rep(6, 5)); offset
[1] 1 1 1 1 1 1 2 2 2 2 2 2 3 3 3 3 3 3 4 4 4 4 4 4 5 5 5 5 5 5
> m <- matrix(sort(m + offset), nrow=nrow(m)) - offset; m
[,1] [,2] [,3] [,4] [,5]
[1,] 0.16 0.03 0.29 0.16 0.05
[2,] 0.38 0.32 0.48 0.26 0.05
[3,] 0.47 0.32 0.61 0.43 0.38
[4,] 0.71 0.57 0.70 0.54 0.72
[5,] 0.77 0.83 0.75 0.82 0.85
[6,] 0.88 0.91 0.76 0.95 0.92
But is there something more beautiful already included?) Otherwise, what would be the fastest way if my matrix has around 1M (10M, 100M) entries (roughly a square matrix)? I'm worried about the performance penalty of apply and friends.
Actually, I don't need "sort", just "top n", with n being around 30 or 100, say. I am thinking about using apply and the partial parameter of sort, but I wonder if this is cheaper than just doing a vectorized sort. So, before doing benchmarks on my own, I'd like to ask for advice by experienced users.
If you want to use sort, ?sort indicates that method = "quick" can be twice as fast as the default method for vectors on the order of 1 million elements.
Start with apply(m, 2, sort, method = "quick") and see if that provides sufficient speed.
Do note the comments on this in ?sort though; ties are sorted in a non-stable manner.
I have put down a quick testing framework for the solutions proposed so far.
library(rbenchmark)
sort.q <- function(m) {
sort(m, method='quick')
}
sort.p <- function(m) {
mm <- sort(m, partial=TOP)[1:TOP]
sort(mm)
}
sort.all.g <- function(f) {
function(m) {
o <- matrix(rep(seq_len(SIZE), rep(SIZE, SIZE)), nrow=SIZE)
matrix(f(m+o), nrow=SIZE)[1:TOP,]-o[1:TOP,]
}
}
sort.all <- sort.all.g(sort)
sort.all.q <- sort.all.g(sort.q)
apply.sort.g <- function(f) {
function(m) {
apply(m, 2, f)[1:TOP,]
}
}
apply.sort <- apply.sort.g(sort)
apply.sort.p <- apply.sort.g(sort.p)
apply.sort.q <- apply.sort.g(sort.q)
bb <- NULL
SIZE_LIMITS <- 3:9
TOP_LIMITS <- 2:5
for (SIZE in floor(sqrt(10)^SIZE_LIMITS)) {
for (TOP in floor(sqrt(10)^TOP_LIMITS)) {
print(c(SIZE, TOP))
TOP <- min(TOP, SIZE)
m <- matrix(runif(SIZE*SIZE), floor(SIZE))
if (SIZE < 1000) {
mr <- apply.sort(m)
stopifnot(apply.sort.q(m) == mr)
stopifnot(apply.sort.p(m) == mr)
stopifnot(sort.all(m) == mr)
stopifnot(sort.all.q(m) == mr)
}
b <- benchmark(apply.sort(m),
apply.sort.q(m),
apply.sort.p(m),
sort.all(m),
sort.all.q(m),
columns= c("test", "elapsed", "relative",
"user.self", "sys.self"),
replications=1,
order=NULL)
b$SIZE <- SIZE
b$TOP <- TOP
b$test <- factor(x=b$test, levels=b$test)
bb <- rbind(bb, b)
}
}
ftable(xtabs(user.self ~ SIZE+test+TOP, bb))
The results so far indicate that for all but the biggest matrices, apply really hurts performance unless doing a "top n". For "small" matrices < 1e6, just sorting the whole thing without apply is competitive. For "huge" matrices, sorting the whole array becomes slower than apply. Using partial works best for "huge" matrices and is only a slight loss for "small" matrices.
Please feel free to add your own sorting routine :-)
SIZE   test               TOP=10   TOP=31  TOP=100  TOP=316
31     apply.sort(m)       0.004    0.012    0.000    0.000
       apply.sort.q(m)     0.008    0.016    0.000    0.000
       apply.sort.p(m)     0.008    0.020    0.000    0.000
       sort.all(m)         0.000    0.008    0.000    0.000
       sort.all.q(m)       0.000    0.004    0.000    0.000
100    apply.sort(m)       0.012    0.016    0.028    0.000
       apply.sort.q(m)     0.016    0.016    0.036    0.000
       apply.sort.p(m)     0.020    0.020    0.040    0.000
       sort.all(m)         0.000    0.004    0.008    0.000
       sort.all.q(m)       0.004    0.004    0.004    0.000
316    apply.sort(m)       0.060    0.060    0.056    0.060
       apply.sort.q(m)     0.064    0.060    0.060    0.072
       apply.sort.p(m)     0.064    0.068    0.108    0.076
       sort.all(m)         0.016    0.016    0.020    0.024
       sort.all.q(m)       0.020    0.016    0.024    0.024
1000   apply.sort(m)       0.356    0.276    0.276    0.292
       apply.sort.q(m)     0.348    0.316    0.288    0.296
       apply.sort.p(m)     0.256    0.264    0.276    0.320
       sort.all(m)         0.268    0.244    0.213    0.244
       sort.all.q(m)       0.260    0.232    0.200    0.208
3162   apply.sort(m)       1.997    1.948    2.012    2.108
       apply.sort.q(m)     1.916    1.880    1.892    1.901
       apply.sort.p(m)     1.300    1.316    1.376    1.544
       sort.all(m)         2.424    2.452    2.432    2.480
       sort.all.q(m)       2.188    2.184    2.265    2.244
10000  apply.sort(m)      18.193   18.466   18.781   18.965
       apply.sort.q(m)    15.837   15.861   15.977   16.313
       apply.sort.p(m)     9.005    9.108    9.304    9.925
       sort.all(m)        26.030   25.710   25.722   26.686
       sort.all.q(m)      23.341   23.645   24.010   24.073
31622  apply.sort(m)     201.265  197.568  196.181  196.104
       apply.sort.q(m)   163.190  160.810  158.757  160.050
       apply.sort.p(m)    82.337   81.305   80.641   82.490
       sort.all(m)       296.239  288.810  289.303  288.954
       sort.all.q(m)     260.872  249.984  254.867  252.087
Does
apply(m, 2, sort)
do the job? :)
Or for top-10, say, use:
apply(m, 2 ,function(x) {sort(x,dec=TRUE)[1:10]})
Performance is strong - for 1e7 rows and 5 cols (5e7 numbers in total), my computer took around 9 or 10 seconds.
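To show the per-column "top n" idea independently of R, here is a minimal Python sketch using the standard-library heapq module, which selects the n largest items without fully sorting each column. The function name `top_n_per_column` and the small sample matrix are assumptions; this says nothing about how R implements sort or apply.

```python
import heapq


def top_n_per_column(matrix, n):
    """Return, for each column, its n largest values in descending order."""
    columns = zip(*matrix)  # transpose: iterate over columns instead of rows
    return [heapq.nlargest(n, column) for column in columns]


m = [
    [0.47, 0.32, 0.29],
    [0.38, 0.91, 0.76],
    [0.71, 0.32, 0.48],
    [0.88, 0.83, 0.61],
]

for column_top in top_n_per_column(m, 2):
    print(column_top)
```

A partial selection like this only pays off when n is small compared with the column length, which matches the n of around 30 or 100 mentioned in the question.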
R is very fast at matrix calculations. A matrix with 1e7 elements in 1e4 columns gets sorted in under 3 seconds on my machine:
set.seed(1)
m <- matrix(runif(1e7), ncol=1e4)
system.time(sm <- apply(m, 2, sort))
user system elapsed
2.62 0.14 2.79
The first 15 rows of the first 5 columns:
sm[1:15, 1:5]
[,1] [,2] [,3] [,4] [,5]
[1,] 2.607703e-05 0.0002085913 9.364448e-05 0.0001937598 1.157424e-05
[2,] 9.228056e-05 0.0003156713 4.948019e-04 0.0002542199 2.126186e-04
[3,] 1.607228e-04 0.0003988042 5.015987e-04 0.0004544661 5.855639e-04
[4,] 5.756689e-04 0.0004399747 5.762535e-04 0.0004621083 5.877446e-04
[5,] 6.932740e-04 0.0004676797 5.784736e-04 0.0004749235 6.470268e-04
[6,] 7.856274e-04 0.0005927107 8.244428e-04 0.0005443178 6.498618e-04
[7,] 8.489799e-04 0.0006210336 9.249109e-04 0.0005917936 6.548134e-04
[8,] 1.001975e-03 0.0006522120 9.424880e-04 0.0007702231 6.569310e-04
[9,] 1.042956e-03 0.0007237203 1.101990e-03 0.0009826915 6.810103e-04
[10,] 1.246256e-03 0.0007968422 1.117999e-03 0.0009873926 6.888523e-04
[11,] 1.337960e-03 0.0009294956 1.229132e-03 0.0009997757 8.671272e-04
[12,] 1.372295e-03 0.0012221676 1.329478e-03 0.0010375632 8.806398e-04
[13,] 1.583430e-03 0.0012781983 1.433513e-03 0.0010662393 8.886999e-04
[14,] 1.603961e-03 0.0013518191 1.458616e-03 0.0012068383 8.903167e-04
[15,] 1.673268e-03 0.0013697683 1.590524e-03 0.0013617468 1.024081e-03
They say there's a fine line between genius and madness... take a look at this and see what you think of the idea. As in the question, the goal is to find the top 30 elements of a vector vec that might be long (1e7, 1e8, or more elements).
topn = 30
sdmult = max(1,qnorm(1-(topn/length(vec))))
sdmin = 1e-5
acceptmult = 10
calcsd = max(sd(vec),sdmin)
calcmn = mean(vec)
thresh = calcmn + sdmult*calcsd
subs = which(vec > thresh)
while (length(subs) > topn * acceptmult) {
thresh = thresh + calcsd
subs = which(vec > thresh)
}
while (length(subs) < topn) {
thresh = thresh - calcsd
subs = which(vec > thresh)
}
topvals = sort(vec[subs],dec=TRUE)[1:topn]
The basic idea is that even if we don't know much about the distribution of vec, we'd certainly expect the highest values in vec to be several standard deviations above the mean. If vec were normally distributed, then the qnorm expression on line 2 gives a rough idea how many sd's above the mean we'd need to look to find the highest topn values (e.g. if vec contains 1e8 values, the top 30 values are likely to be located in the region starting 5 sd's above the mean.) Even if vec isn't normal, this assumption is unlikely to be massively far away from the truth.
Ok, so we compute the mean and sd of vec, and use these to propose a threshold to look above - a certain number of sd's above the mean. We're hoping to find in this upper tail a subset of slightly more than topn values. If we do, we can sort it and easily identify the highest topn values - which will be the highest topn values in vec overall.
Now the exact rules here can probably be tweaked a bit, but the idea is that we need to guard against the original threshold being "out" for some reason. We therefore exploit the fact that it's quick to check how many elements lie above a certain threshold. So, we first raise the threshold, in increments of calcsd, until there are at most 10 * topn elements above the threshold. Then, if needed, we reduce thresh (again in steps of calcsd) until we definitely have at least topn elements above the threshold. This bi-directional search should always lead to a "threshold set" whose size is fairly close to topn (hopefully within a factor of 10 or 100). As topn is relatively small (typical value 30), it will be really fast to sort this threshold set, which of course immediately gives us the highest topn elements in the original vector vec.
My claim is that the calculations involved in generating a decent threshold set are all quick in R, so if only the top 30 or so elements of a very large vector are required, this indirect approach will beat any approach that involves sorting the whole vector.
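To make the logic easier to test on other data, here is a rough Python port of the same steps using only the standard library. It is a sketch for checking correctness against a full sort, not a performance claim, since pure Python is far slower than vectorised R. The name `top_n_by_threshold` is hypothetical, and the two rounding guards plus the short-input shortcut are additions that are not in the R code above.

```python
import random
from statistics import NormalDist, fmean, stdev


def top_n_by_threshold(vec, topn=30, accept_mult=10, sd_min=1e-5):
    """Return the topn largest values of vec in descending order."""
    if len(vec) <= topn:  # addition: nothing to prune for short inputs
        return sorted(vec, reverse=True)
    # Rough guess at how many standard deviations above the mean the top
    # values would start if vec were normally distributed.
    sd_mult = max(1.0, NormalDist().inv_cdf(1 - topn / len(vec)))
    sd = max(stdev(vec), sd_min)
    thresh = fmean(vec) + sd_mult * sd
    subset = [x for x in vec if x > thresh]
    # Raise the threshold while the candidate set is too large.
    while len(subset) > topn * accept_mult:
        if thresh + sd == thresh:  # addition: step lost to rounding
            break
        thresh += sd
        subset = [x for x in vec if x > thresh]
    # Lower it again until at least topn candidates remain.
    while len(subset) < topn:
        if thresh - sd == thresh:  # addition: step lost to rounding
            subset = list(vec)
            break
        thresh -= sd
        subset = [x for x in vec if x > thresh]
    return sorted(subset, reverse=True)[:topn]


random.seed(0)
data = [random.gauss(0, 1) for _ in range(5000)]
print(top_n_by_threshold(data, 5) == sorted(data, reverse=True)[:5])
```

Because the final subset always holds at least topn values strictly above the threshold, its largest topn values are also the largest topn values of the whole vector, so the result should agree with a full sort.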
What do you think?! If you think it's an interesting idea, please like/vote up :) I'll look at doing some proper timings but my initial tests on randomly generated data were really promising - it'd be great to test it out on "real" data though...!
Cheers :)