I am working on drawing graphs with Gnuplot.
The problem is that, as it runs, the memory usage grows so high that it either stops working properly or gets killed within a few minutes.
My laptop's memory is 4 GB, and the file is around 1 GB to 1.5 GB in size.
Actually, I am a beginner in C and gnuplot. What I cannot understand is why this 'simple-looking' task takes so much memory. It is just plotting pairs of t and x values.
I'll write down a part of the file below. The command I typed in the terminal was:
plot "fl1.dat" u 1:2 linetype 1
1.00000e+00 1.88822e-01
2.00000e+00 3.55019e-01
3.00000e+00 -1.74283e+00
4.00000e+00 -2.67627e+00
...
...
...
Is the only option to add more RAM, or to use a computer in the lab?
Thank you.
Plotting a data file is done to see the overall or global behavior of some quantity, not the local behavior, for which you can just read the value from the data file. That said, in your case I think you do not need to plot each and every point from the file: the file is huge and it seems pointless to plot it all. Thus I suggest the following:
pl 'fl1.dat' u 1:2 every 10
This will plot only every 10th point, but if there are very many finely spaced points anyway, it will still show the global behavior of the data nicely. Remember that this won't connect the individual points. If you still want a continuous line, I suggest creating another data file with every 10th line in it and then plotting it as usual with lines, as in the sketch below.
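For example, a quick way to build such a decimated file in the shell (a small sketch; awk availability and the name fl1_every10.dat are assumptions):
awk 'NR % 10 == 1' fl1.dat > fl1_every10.dat   # keep every 10th line of the original file
and then in gnuplot:
plot "fl1_every10.dat" u 1:2 with lines linetype 1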
Another thing to note is that the choice of output terminal can have a tremendous effect on memory consumption: interactive windows or vector formats will consume much more memory (I guess because these formats keep track of every single data point, even though, as stressed by Peaceful, you probably don't need all those points). So a quick way to reduce the memory consumption may be to set the output terminal to a modestly-sized png, e.g.:
set terminal png size 1000,1000
set output "mygraph.png"
Usually when I search a file with grep, the search is done sequentially. Is it possible to perform a non-sequential search or a parallel search? Or for example, a search between line l1 and line l2 without having to go through the first l1-1 lines?
You can use tail -n +N file | grep to begin a grep at a given line offset.
You can combine head with tail to search over just a fixed range.
However, this still must scan the file for end of line characters.
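For example (assuming GNU coreutils; the file name, pattern and line numbers are placeholders):
tail -n +1001 file.txt | grep 'pattern'                  # start the search at line 1001
tail -n +1001 file.txt | head -n 1000 | grep 'pattern'   # search only lines 1001 through 2000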
In general, sequential reads are the fastest reads for disks. Trying to do a parallel search will most likely cause random disk seeks and perform worse.
For what it is worth, a typical book contains about 200 words per page. At a typical 5 letters per word, you're looking at about 1 KB per page, so 1000 pages would still be only about 1 MB. A standard desktop hard drive can easily read that in a fraction of a second.
You can't speed up disk read throughput this way. In fact, I can almost guarantee you are not saturating your disk read rate right now for a file that small. You can use iostat to confirm.
If your file is completely ASCII, you may be able to speed things up by setting your locale to the C locale to avoid doing any type of Unicode translation.
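For example, a quick way to try it (the actual speedup depends on your grep build and data; file name and pattern are placeholders):
LC_ALL=C grep 'pattern' file.txt   # byte-wise matching, no multibyte/Unicode handling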
If you need to do multiple searches over the same file, it would be worthwhile to build a reverse index to do the search. For code there are tools like exuberant ctags that can do that for you. Otherwise, you're probably looking at building a custom tool. There are tools for doing general text search over large corpuses, but that's probably overkill for you. You could even load the file into a database like Postgresql that supports full text search and have it build an index for you.
Padding the lines to a fixed record length is not necessarily going to solve your problem. As I mentioned before, I don't think you have an IO throughput issue; you could see that yourself by simply moving the file to a temporary RAM disk that you create. That removes all potential IO. If that's still not fast enough for you, then you're going to have to pursue an entirely different solution.
If your lines are fixed length, you can use dd to read a particular section of the file:
dd if=myfile.txt bs=<line_length> count=<lines_to_read> skip=<start_line> | other_commands
Note that dd will read from disk using the block size specified for input (bs). That might be slow and could be batched by reading a group of lines at once, so that you pull at least 4 KB from disk per read. In this case you want to look at the skip_bytes and count_bytes flags to be able to start and end at lines that are not a multiple of your block size.
Another interesting option is the output block size obs, which could benefit from being either the same as the input block size or a single line.
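A hedged sketch of the byte-based variant (GNU dd only; the offsets and sizes are placeholders):
dd if=myfile.txt bs=4096 iflag=skip_bytes,count_bytes skip=1000000 count=65536 | other_commands
With those flags, skip and count are interpreted as byte offsets rather than multiples of the block size.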
The simple answer is: you can't. What you want contradicts itself: You don't want to scan the entire file, but you want to know where each line ends. You can't know where each line ends without actually scanning the file. QED ;)
I've got a ~1600 line program that reads in images (either tiff or raw), performs a whole bunch of different mathematical and statistical analyses, and then outputs graphs and data tables at the end.
Almost two-thirds of my processing time is due to looping 16 times over the following code:
h = figure('Visible','off','units','normalized','outerposition',[0 0 1 1]);
set(h,'PaperPositionMode','auto');
imagesc(picdata); colormap(hot);
imgtmp = hardcopy(h,'-dzbuffer','-r0');
imwrite(imgtmp,hot,'picname.png');
Naturally, 'picname.png' and picdata are changing each time around.
Is there a better way to invisibly plot and save these pictures? The processing time mostly takes place inside imwrite, with hardcopy coming second. The whole purpose of the pictures is just to get a general idea of what the data looks like; I'm not going to need to load them back into Matlab to do future processing of any sort.
Try to place the figure off-screen (e.g., Position=[-1000,-1000,500,500]). This will make it "Visible" and yet no actual rendering will need to take place, which should make things faster.
Also, try to reuse the same figure for all images - no need to recreate the figure and image axes and colormap every time.
Finally, try using my ScreenCapture utility rather than hardcopy+imwrite. It uses a different method for taking a "screenshot" which may possibly be faster.
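Putting the first two suggestions together, a rough sketch (it simply carries over the undocumented hardcopy call from the question; the loop bounds, rand placeholder data and file names are assumptions):
h = figure('Position',[-1000 -1000 500 500]);   % off-screen instead of 'Visible','off'
set(h,'PaperPositionMode','auto');
for k = 1:16
    picdata = rand(256);                        % stand-in for the real image data
    imagesc(picdata); colormap(hot);            % reuse the same figure and axes each time
    imgtmp = hardcopy(h,'-dzbuffer','-r0');
    imwrite(imgtmp,hot,sprintf('pic%02d.png',k));
end
close(h);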
I have a working MATLAB program that measures data to tune a machine in real time, using the SOAP library from MATLAB (several measurements per second). It was working well, updating two figures, each containing four sub-plots, as the tuning proceeded. Recently the plots have stopped updating, just showing a grey box (actually two shades of grey if you resize the window).
I can tell that the program is running properly from debug information written to the MATLAB console. Also, sometimes the plots update in a burst, adding many new points, when they should be updating with every new point.
I have made several small changes to the code to reduce the comms traffic, but my biggest recent change is to add lots of calls to toc to measure the time taken in various parts of the code, with a single tic at the start.
Is it possible these extra timing calls could be suppressing the plots?
Here is a cut down copy of my code. It is a nested function that makes use of some configuration data from the top level function. The figure is one of two created in the top level function then completely redrawn as new data arrives.
function acc_plot(accFig, accData)
figure(accFig);
sp1 = '221';
% Plot current vs raw position
subplot(sp1);
plot(xRawPos,yCfbDcM,'r.', xRawPos,yCfbDcP,'b.')
hold on;
if tuneConfig.dualSensors(accData.axisIx)
plot(xRawPosB,yCfbDcM,'g.', xRawPosB,yCfbDcP,'m.')
end
title(['acc check ' tuneConfig.axisNames{accData.axisIx}])
ylabel('CFBM(r)/CFBP(b) [A]')
xlabel(xPosLabel)
grid on
end
Add "drawnow" to your function to force refresh.
I am using GNUplot for plotting a small matrix.
The matrix is 100x100 in size, e.g.:
1.23212 2.43123 -1.24312 ......
-4.23123 2.00458 5.60234 ......
......
The data is not neatly stored in the file.
So from a C++ point of view, because each value has no fixed width, there is no way to load a whole number at once; the parser has to check each character as the number is being read. I guess this is the reason for the slow plotting speed.
Now I have 3 questions:
Q1: Is loading the bottle neck?
Q2: If I make the data file neatly stored, e.g.:
1.23212 2.43123 -1.24312 ......
-4.23123 2.00458 5.60234 ......
......
Does the plotting speed get any improvement? (Maybe GNUplot can detect the pattern and thus improve the loading speed. Not sure.)
Q3: Any other options that I can set to make it faster?
Edit
I tried these:
-3.07826e-21 -2.63821e-20 -1.05205e-19 -3.25317e-19 -9.1551e-19
When outputting, I used setw to make sure they are aligned. But I think I would still need to tell GNUplot to load 13 characters at a time and then perform strtod.
I would guess that, in order to handle the general case where there is no information about the length of each number, it is safer to read digit by digit until a space is reached.
I am trying to use Mathematica to analyse some raw data. I'd like to be able to dynamically display the range of data I'm interested in using Manipulate and ListLinePlot, but the plot rendering is extremely slow. How can I speed it up?
Here are some additional details. An external text file stores the raw data: the first column is a timestamp, the second, third and fourth columns are data readings, for example:
1309555993069, -2.369941, 6.129157, 6.823794
1309555993122, -2.260978, 6.170018, 7.014479
1309555993183, -2.070293, 6.129157, 6.823794
1309555993242, -1.988571, 6.238119, 7.123442
A single data file contains up to 2·10^6 lines. To display, for example, the second column, I use:
x = Import["path/to/datafile"];
ListLinePlot[x[[All, {1, 2}]]]
The execution time of this operation is unbearably long. To display a variable range of data I tried to use Manipulate:
Manipulate[ListLinePlot[Take[x, numrows][[All, {1, 2}]]], {numrows, 1, Length[x]}]
This instruction works, but it quickly crawls when I try to display more than few thousand lines. How can I speed it up?
Some additional details:
MATLAB displays the same amount of data on the same computer almost instantaneously, thus the raw data size shouldn't be an issue.
I already tried to turn off graphics antialiasing, but it didn't impact rendering speed at all.
Using DataRange to avoid Take doesn't help.
Using MaxPlotPoints distorts the plot too much to be useful.
Not using Take in Manipulate doesn't help.
The rendering seems to take a huge amount of time. Running Timing[ListLinePlot[Take[x,100000][[All, {1, 2}]]]] returns 0.33: this means that the evaluation of Take by itself is almost instantaneous; it is the plot rendering that slows everything down.
I am running Mathematica on Ubuntu Linux 11.10 using the fglrx drivers. Forcing Mathematica to use mesa drivers didn't help.
Any hint?
If your goal is to just visualize your data quickly but properly, you can use the following trick, which I am constantly using.
I partition the data into a number of blocks corresponding roughly to the resolution of my screen (usually 1000 or less); more detail cannot be displayed anyway. Then I determine the Min and Max of each block and draw a zig-zag line from min to max to min to max... The result will look exactly like the original data. However, you cannot "zoom in", as you would then see the zig-zag line (e.g. when exporting to a high-res PDF); in that case you need to use a larger number of blocks.
rv = RandomVariate[ExponentialDistribution[2], 100000];
ListLinePlot[rv, PlotRange -> All] (* original, slow *)
ListLinePlot[rv, PlotRange -> All, MaxPlotPoints -> 1000] (* fast but distorted *)
numberOfBlocks = 1000;
ListLinePlot[Riffle @@ Through[{Min /@ # &, Max /@ # &}[
   Partition[rv, Floor[Length[rv]/numberOfBlocks]]]], PlotRange -> All]
You can add the DataRange->{...} option to label the x-axis appropriately.
Hope this helps!
EDIT:
See also this similar question on Mathematica Stackexchange:
https://mathematica.stackexchange.com/q/140/58
I haven't tested this extensively on my machine (I have a Mac, so I can't rule out Linux-specific issues), but a couple of points occur to me. The following was pretty quick for me, but obviously slower than if the data set were smaller. You are plotting hundreds of thousands of data points.
data = Accumulate@RandomVariate[NormalDistribution[], 200000];
Manipulate[ListLinePlot[Take[data, n]], {n, 1, Length[data]}]
In a Manipulate, you are allowing the amount of data shown with Take to vary arbitrarily. Try only incrementing numrows every 100 or so points, so there is less to render.
Try using the ContinuousAction -> False option (see documentation). (I see @Szabolcs had the same idea as I was typing.)
I was about to suggest MaxPlotPoints, but instead try the PerformanceGoal ->"Speed" option. (see documentation)
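Putting these suggestions together, a hedged sketch (x is the imported data from the question; the step of 100 and the option choices are only illustrative):
Manipulate[
 ListLinePlot[Take[x, numrows][[All, {1, 2}]], PerformanceGoal -> "Speed"],
 {numrows, 1, Length[x], 100},   (* slider moves in steps of 100 rows *)
 ContinuousAction -> False]      (* re-render only when the slider is released *)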
I also noticed that occasionally Mathematica will take too long to render graphics. Actually it must be some translation step from a Mathematica Graphics expression to some other representation that takes a long time, because once rendered, resizing (and thus re-rendering) the graphic is much faster. Pre-version-6 graphics rendering used to be faster for many examples (but also lacks a lot of functionality that versions 6+ have).
Some ideas about what you could do:
Use the MaxPlotPoints option of ListLinePlot to reduce the data before plotting. It might not make a difference in looks if it's downsampled. The Method option should choose the downsampling algorithm, but I can't find any docs for it (anyone?)
Use ContinuousAction -> False in Manipulate to stop it from recomputing everything in real time as you drag the sliders.
Another idea here is using the Ramer–Douglas–Peucker algorithm to reduce the number of data points before plotting. This will likely preserve the shape of the data better. I don't know if you still need this so I won't provide an implementation.