I am using GNUplot for plotting a small matrix.
The matrix is 100x100 in size, e.g.
1.23212 2.43123 -1.24312 ......
-4.23123 2.00458 5.60234 ......
......
The data is not neatly aligned in the file.
So, from a C++ point of view, because each value has no fixed length, there is no way to read a whole number in one go; the parser has to check character by character as it reads. I guess this is the reason for the slow plotting speed.
Now I have 3 questions:
Q1: Is loading the bottleneck?
Q2: If I make the data file neatly aligned, e.g.
1.23212 2.43123 -1.24312 ......
-4.23123 2.00458 5.60234 ......
......
will the plotting speed improve? (Maybe GNUplot can detect the fixed pattern and thus speed up loading; I'm not sure.)
Q3: Any other options that I can set to make it faster?
Edit
I tried these:
-3.07826e-21 -2.63821e-20 -1.05205e-19 -3.25317e-19 -9.1551e-19
When writing the output I used setw to make sure the columns are aligned. But I think I would still need to tell GNUplot to load 13 characters at a time and then run strtod on them.
I would guess that, to handle the general case where there is no information about the length of each number, it is only safe to read digit by digit until a space is found.
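For what it's worth, a minimal C++ sketch of that fixed-width idea might look like the following (the 13-character width, the file name, and the assumption that every value is followed by exactly one separator are all guesses on my part):

#include <cstdio>
#include <cstdlib>
#include <vector>

int main() {
    const int kFieldWidth = 13;                    // width assumed to match the setw() used when writing
    std::vector<double> values;
    std::FILE* f = std::fopen("matrix.dat", "r");  // illustrative file name
    if (!f) return 1;
    char buf[kFieldWidth + 1];
    while (std::fread(buf, 1, kFieldWidth, f) == static_cast<std::size_t>(kFieldWidth)) {
        buf[kFieldWidth] = '\0';
        values.push_back(std::strtod(buf, nullptr));  // one conversion per field, no digit-by-digit scan
        if (std::fgetc(f) == EOF) break;              // consume the single separator (space or newline)
    }
    std::fclose(f);
    std::printf("read %zu values\n", values.size());
}

Whether gnuplot can be told to do anything like this internally is a separate question; the sketch only shows why fixed-width fields make the parsing itself cheap.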
I want to build an application that would do something equivalent to running lsof in a loop (maybe changing its output format, because string processing may keep it from being real-time enough) and then associate each line (entry) with the iteration in which it was present, which I will refer to as frames, since that makes the rest easier to follow. My intention is that showing the times during which files are held open by applications can reveal something about their structure, while not having a big impact on their execution, which is often a problem.
One problem I have is processing the output, which would be a table relating frames × entries, and I am already anticipating wildly variable entry lengths. This runs into the classic problem of representing very different scales geometrically: the small values become vanishingly small while the big ones become giant, and fragmentation makes it even worse. So my question is whether plotting libraries deal with this problem, and how they do it.
The easiest and most well-established technique for showing both small and large values in reasonable detail is a logarithmic scale. Instead of plotting raw values, plot their logarithms. This is notoriously problematic if you can have zero or even negative values, but as I understand your situation all your lengths would be strictly positive, so this should work.
Another statistical solution you could apply is to plot ranks instead of raw values. Take all the observed values and put them in a sorted list. When plotting any single data point, instead of plotting the value itself, look up that value in the sorted list (possibly using binary search, since it's sorted) and plot the index at which you found it.
This is a monotonic transformation, so small values map to small indices and big values to big indices. On the other hand, it completely discards the actual magnitudes; only the relative comparisons matter.
If this is too radical, you could consider using it as an ingredient for something more tuneable. You could experiment with a linear combination, i.e. plot
a*x + b*log(x) + c*rank(x)
then tweak a, b and c till the result looks pleasing.
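As a small C++ sketch of the rank idea (the function names and the way the weights are passed are purely illustrative; the log term assumes strictly positive values):

#include <algorithm>
#include <cmath>
#include <vector>

// sorted_values is built once: copy all observed values and std::sort them.
double rank_of(const std::vector<double>& sorted_values, double x) {
    // index of the first element not less than x, via binary search
    return std::lower_bound(sorted_values.begin(), sorted_values.end(), x)
           - sorted_values.begin();
}

// Plot coordinate as a tunable mix of raw value, logarithm and rank.
double plot_coord(const std::vector<double>& sorted_values,
                  double x, double a, double b, double c) {
    return a * x + b * std::log(x) + c * rank_of(sorted_values, x);
}

Building the sorted list is a one-off O(n log n) step; after that, each plotted point costs only a binary search.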
I am working on drawing graphs with Gnuplot.
The thing is that, as it runs, it does not work properly because of high memory usage, or it gets killed within a few minutes.
My laptop's memory is 4 GB, and the file size is around 1 GB to 1.5 GB.
Actually, I am a beginner with the C language and with gnuplot. What I cannot understand is why this 'simple-looking' job takes so much memory. It is just matching points of t and x.
I'll write a part of the file below. The command I typed in the terminal was:
plot "fl1.dat" u 1:2 linetype 1
1.00000e+00 1.88822e-01
2.00000e+00 3.55019e-01
3.00000e+00 -1.74283e+00
4.00000e+00 -2.67627e+00
...
...
...
Is the only thing I can do to add more RAM, or to use a computer in the lab?
Thank you.
Plotting a data file is done to see the overall or global behavior of some quantity, not the local behavior, for which you can just read the value from the data file. This said, in your case I think you do not need to plot each and every point from the file, since the file is huge and it seems pointless to plot it all. Thus I suggest the following:
pl 'fl1.dat' u 1:2 every 10
This will plot only every 10th point, but if there are that many finely spaced points anyway, it will still show the global behavior of the plot nicely. Remember that this won't connect the individual points. If you still want a continuous line, I suggest creating another data file containing every 10th line and then plotting it as usual with lines.
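For instance, a small C program along these lines (the file names and the factor 10 are just examples) could write the thinned file:

/* Sketch: copy every 10th line of fl1.dat into fl1_thin.dat */
#include <stdio.h>

int main(void) {
    FILE *in = fopen("fl1.dat", "r");
    FILE *out = fopen("fl1_thin.dat", "w");
    if (!in || !out) return 1;
    char line[256];
    long n = 0;
    while (fgets(line, sizeof line, in)) {
        if (n % 10 == 0)            /* keep one line out of every 10 */
            fputs(line, out);
        n++;
    }
    fclose(in);
    fclose(out);
    return 0;
}

Then plot "fl1_thin.dat" u 1:2 with lines draws a continuous line through the remaining points.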
Another thing to note is that the choice of output terminal can have a tremendous effect on the memory consumption: interactive windows or vector formats will consume much more (I guess because these formats keep track of every single data-point, even though, as stressed by Peaceful, you probably don't need all those points). So a quick way to reduce the memory consumption may be to set the output terminal to a modestly-sized png, eg:
set terminal png size 1000,1000
set output "mygraph.png"
I've got a ~1600 line program that reads in images (either tiff or raw), performs a whole bunch of different mathematical and statistical analyses, and then outputs graphs and data tables at the end.
Almost two-thirds of my processing time is due to looping 16 times over the following code:
h = figure('Visible','off','units','normalized','outerposition',[0 0 1 1]);
set(h,'PaperPositionMode','auto');
imagesc(picdata); colormap(hot);
imgtmp = hardcopy(h,'-dzbuffer','-r0');
imwrite(imgtmp,hot,'picname.png');
Naturally, 'picname.png' and picdata are changing each time around.
Is there a better way to invisibly plot and save these pictures? The processing time mostly takes place inside imwrite, with hardcopy coming second. The whole purpose of the pictures is just to get a general idea of what the data looks like; I'm not going to need to load them back into Matlab to do future processing of any sort.
Try to place the figure off-screen (e.g., Position=[-1000,-1000,500,500]). This will make it "Visible" and yet no actual rendering will need to take place, which should make things faster.
Also, try to reuse the same figure for all images - no need to recreate the figure and image axes and colormap every time.
Finally, try using my ScreenCapture utility rather than hardcopy+imwrite. It uses a different method for taking a "screenshot" which may possibly be faster.
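For the first two points, a rough sketch that keeps the question's hardcopy/imwrite calls might look like this (loadPicture and the output file names are placeholders for however you currently obtain picdata and picname):

h = figure('Position',[-1000 -1000 500 500]);    % off-screen, as suggested above
set(h,'PaperPositionMode','auto');
hImg = [];
for k = 1:16
    picdata = loadPicture(k);                    % placeholder for your own data source
    if isempty(hImg)
        hImg = imagesc(picdata); colormap(hot);  % create the axes and image only once
    else
        set(hImg,'CData',picdata);               % afterwards just swap the pixel data
    end
    imgtmp = hardcopy(h,'-dzbuffer','-r0');      % same capture call as in the question
    imwrite(imgtmp, hot, sprintf('pic%02d.png', k));
end
close(h);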
I have a long time series with some repeating and similar-looking signals in it (not entirely periodic). The length of the time series is about 60000 samples. To identify the signals, I take one of them out, with a length of around 1000 samples, move it along my time-series data sample by sample, and compute the correlation coefficient (in MATLAB: corrcoef). If this value is above some threshold, there is a match.
But this is excruciatingly slow (using a for loop to move the window).
Is there a way to speed this up, or maybe there is already some mechanism in Matlab for this ?
Many thanks
Edit: added information regarding using 'xcorr' instead:
If I use 'xcorr', or at least the way I have used it, I get the wrong picture. Looking at the data (first plot), there are two types of repeating signals: one is marked by red rectangles, whereas the other, which has much larger amplitudes (this is coherent noise), is marked by a black rectangle. I am interested in the first type. The second plot shows the signal I am looking for, blown up.
If I use 'xcorr', I get the third plot. As you can see, 'xcorr' gives me the wrong signal (there is in fact high cross-correlation between my signal and the coherent noise).
But using 'corrcoef' and moving the window, I get the last plot, which is the correct one.
There may be a normalization problem when using 'xcorr', but I don't know.
I can think of two ways to speed things up.
1) Make your template 1024 elements long. Suddenly, correlation can be done using the FFT, which is significantly faster than a naive DFT or element-by-element multiplication at every position (a rough sketch follows the list below).
2) Ask yourself what it is about your template shape that you really care about. Do you really need the very high frequencies, or are you really after the lower frequencies? If you could re-sample your template and signal so that they no longer contain any frequencies you don't care about, it would make the processing significantly faster. Steps to take would include:
determine the highest frequency you care about
filter your data so higher frequencies are blocked
resample the resulting data at a lower sampling frequency
Now combine that with a template whose size is a power of 2
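As a rough illustration of idea 1 (variable names are placeholders, and note this only produces the un-normalized sliding dot product; to match corrcoef you would still have to remove the local mean and divide by the local standard deviation of each window):

sig  = yourLongSeries(:);                 % ~60000 samples (placeholder name)
tmpl = yourTemplate(:);                   % ~1000 samples, padded to 1024 if you like
nfft = 2^nextpow2(length(sig));           % power-of-2 FFT length
r = real(ifft(fft(sig, nfft) .* conj(fft(tmpl, nfft))));
r = r(1:length(sig) - length(tmpl) + 1);  % r(k) = dot product of tmpl with the window starting at sample k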
You might find this link interesting reading.
Let us know if any of the above helps!
Your problem seems like a textbook example of cross-correlation. Therefore, there's no good reason to use any solution other than xcorr. A few technical comments:
xcorr assumes that the mean was removed from the two cross-correlated signals. Furthermore, by default it does not scale by the signals' standard deviations. Both of these issues can be solved by z-scoring your two signals: c = xcorr(zscore(longSig,1), zscore(shortSig,1)); c = c/n;, where n is the length of the shorter signal, should produce results equivalent to your sliding-window method.
xcorr's output is ordered according to lags, which can be obtained as a second output argument ([c,lags] = xcorr(...)). Always plot xcorr results with plot(lags,c). I recommend trying a synthetic signal to verify that you understand how to interpret this chart (see the small example at the end of this answer).
xcorr's implementation already uses the (fast) Discrete Fourier Transform, so unless you have unusual conditions it would be a waste of time to code a frequency-domain cross-correlation again.
Finally, a comment about terminology: correlating corresponding time points between two signals is plain correlation. That's what corrcoef does (its name stands for correlation coefficient; there is no 'cross-correlation' there). Cross-correlation is the result of shifting one of the signals and calculating the correlation coefficient for each lag.
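Regarding the synthetic-signal check mentioned above, something along these lines (the sizes and the offset are arbitrary) shows where the peak should land:

shortSig = randn(200, 1);                             % the 'template'
longSig  = [zeros(300, 1); shortSig; randn(500, 1)];  % template buried at offset 300
[c, lags] = xcorr(zscore(longSig, 1), zscore(shortSig, 1));
c = c / length(shortSig);
plot(lags, c);                                        % the peak should appear at lag = 300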
I am trying to use Mathematica to analyse some raw data. I'd like to be able to dynamically display the range of data I'm interested in using Manipulate and ListLinePlot, but the plot rendering is extremely slow. How can I speed it up?
Here are some additional details. An external text file stores the raw data: the first column is a timestamp, the second, third and fourth columns are data readings, for example:
1309555993069, -2.369941, 6.129157, 6.823794
1309555993122, -2.260978, 6.170018, 7.014479
1309555993183, -2.070293, 6.129157, 6.823794
1309555993242, -1.988571, 6.238119, 7.123442
A single data file contains up to 2·10^6 lines. To display, for example, the second column, I use:
x = Import["path/to/datafile"];
ListLinePlot[x[[All, {1, 2}]]]
The execution time of this operation is unbearably long. To display a variable range of data I tried to use Manipulate:
Manipulate[ListLinePlot[Take[x, numrows][[All, {1, 2}]]], {numrows, 1, Length[x]}]
This instruction works, but it quickly crawls when I try to display more than few thousand lines. How can I speed it up?
Some additional details:
MATLAB displays the same amount of data on the same computer almost instantaneously, thus the raw data size shouldn't be an issue.
I already tried to turn off graphics antialiasing, but it didn't impact rendering speed at all.
Using DataRange to avoid Take doesn't help.
Using MaxPlotPoints distorts the plot too much to be useful.
Not using Take in Manipulate doesn't help.
The rendering seems to take a huge amount of time. Running Timing[ListLinePlot[Take[x,100000][[All, {1, 2}]]]] returns 0.33: this means that the evaluation of Take by itself is almost instantaneous; it is the plot rendering that slows everything down.
I am running Mathematica on Ubuntu Linux 11.10 using the fglrx drivers. Forcing Mathematica to use mesa drivers didn't help.
Any hint?
If your goal is to just visualize your data quickly but properly, you can use the following trick, which I am constantly using.
I partition the data into a number of blocks corresponding roughly to the resolution of my screen (usually 1000 or fewer); more detail cannot be displayed anyway. Then I determine the Min and Max of each block, and draw a zig-zag line from min to max to min to max... The result will look exactly like the original data. You cannot, however, "zoom in", as you would then see the zig-zag line (e.g. when exporting to a high-res PDF); then you need to use a larger number of blocks.
rv = RandomVariate[ExponentialDistribution[2], 100000];
ListLinePlot[rv, PlotRange -> All] (* original, slow *)
ListLinePlot[rv, PlotRange -> All, MaxPlotPoints -> 1000] (* fast but distorted *)
numberOfBlocks = 1000;
ListLinePlot[Riffle @@ Through[{Min /@ # &, Max /@ # &}[
 Partition[rv, Floor[Length[rv]/numberOfBlocks]]]], PlotRange -> All]
You can add the DataRange->{...} option to label the x-axis appropriately.
Hope this helps!
EDIT:
See also this similar question on Mathematica Stackexchange:
https://mathematica.stackexchange.com/q/140/58
I haven't tested this extensively on my machine (I have a Mac, so I can't rule out Linux-specific issues), but a couple of points occur to me. The following was pretty quick for me, but obviously slower than if the data set were smaller. You are plotting hundreds of thousands of data points.
data = Accumulate@RandomVariate[NormalDistribution[], 200000];
Manipulate[ListLinePlot[Take[data, n]], {n, 1, Length[data]}]
In a Manipulate, you are allowing the amount of data shown with Take to vary arbitrarily. Try only incrementing numrows every 100 or so points, so there is less to render.
Try using the ContinuousAction -> False option (see documentation). (I see @Szabolcs had the same idea as I was typing.)
I was about to suggest MaxPlotPoints, but instead try the PerformanceGoal -> "Speed" option (see documentation).
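Putting those three suggestions together, a sketch along the lines of the synthetic data defined above might look like:

Manipulate[
 ListLinePlot[Take[data, n], PerformanceGoal -> "Speed"],
 {n, 1, Length[data], 100},   (* advance the slider in steps of 100 points *)
 ContinuousAction -> False]   (* re-render only when the slider is released *)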
I also noticed that occasionally Mathematica will take too long to render graphics. Actually it must be some translation step from a Mathematica Graphics expression to some other representation that takes long because once rendered, resizing (and thus re-rendering) the graphic is much faster. Pre-version-6 graphics rendering used to be faster for many examples (but also lacks a lot of functionality that 6+ have).
Some ideas about what you could do:
Use the MaxPlotPoints option of ListLinePlot to reduce the data before plotting. It might not make a difference in looks if it's downsampled. The Method option should choose the downsampling algorithm, but I can't find any docs for it (anyone?)
Use ContinuousAction -> False in Manipulate to stop it from recomputing everything in real time as you drag the sliders.
Another idea here is using the Ramer–Douglas–Peucker algorithm to reduce the number of data points before plotting. This will likely preserve the shape of the data better. I don't know if you still need this so I won't provide an implementation.