I have a large data file, coastlines.csv, which represents the coastlines of the world, in this format:
-4.94237 55.725449
-4.941922 55.725585
....
where each row represents a point: the first entry is the longitude and the second is the latitude, both in degrees. The file has ~3 x 10^6 lines and weighs ~700 MB.
Plotting this file with gnuplot
plot 'coastlines.csv'
takes some time, which is understandable. When I make the plot above and then click on it, draw a rectangle with the mouse, and zoom in to replot only a small region of the world, the new plot takes the same amount of time as the full one.
I have the impression that gnuplot is checking all the points in the file again, because it does not know which ones will fall within the new plotting window.
Is there a way to speed up this replot?
Thanks!
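A possible workaround is to pre-filter the file down to the region you are zooming into before replotting, so gnuplot only re-reads points that can actually appear in the window. A minimal Python sketch, assuming space-separated lon/lat lines; the bounding box and output file name here are made up:

lon_min, lon_max = -10.0, 0.0   # hypothetical zoom region
lat_min, lat_max = 50.0, 60.0

with open('coastlines.csv') as src, open('coastlines_zoom.csv', 'w') as dst:
    for line in src:
        try:
            lon, lat = map(float, line.split())
        except ValueError:
            continue  # skip malformed lines
        if lon_min <= lon <= lon_max and lat_min <= lat <= lat_max:
            dst.write(line)

Plotting the filtered file (plot 'coastlines_zoom.csv') then only touches the relevant points.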
When I do a scatter plot, by default it shows the axes as fractions from 0.0 to 1.0.
For example, the following graph contains a straight line that goes from (0,0) to (10M,10M), but the axes still show the 0.0 to 1.0 range.
Detailed data generation is shown at: Large plot: ~20 million samples, gigabytes of data
How can I make the axes show 0 to 10 million instead?
The inspiration for this comes from this question.
Tested in VisIt 2.13.3.
Since a scatter plot associates variables of potentially radically different scales, by default it maps each variable's range into [0,1]. We have this ticket for it. You can change this manually by going to the scatter plot attributes window's Appearance tab and un-checking the 'Normalize the axes to a cube' option.
I'm trying to make an "animated" plot of a lot of data (the positions of 1000 particles) from a big text file with a script like:
set terminal wxt size 1000,600
k=999999
N = 999
do for [i=0:k]{
plot for [j=0:N-1] "pos.txt" using 2*j+1:2*j+2 every ::2*i+1::2*i+1 ls 1 pt 7 ps 2 notitle
}
In the file, every line has the X and Y coordinates, at a given time, of all the points I want to plot. I'm using every to plot all the data in each row once and then move on to the next row.
The output is something like this (1000 particles moving)
However, the plotting is way too slow and I don't know what I can do to make it plot faster. It plots one row every 5 or more seconds. The file weighs a few MB. Should I change the terminal? Or the way I store the data? I think there might be a problem with how gnuplot loads a big file.
Some particles disappear in the simulation, so I also get the warning "line 14: warning: Skipping data file with no valid points" when the index j (well, 2*j+1) goes past the number of particles. I tried making it read the number of particles each time, but that's even slower. Many thanks.
I suspect gnuplot is reading the whole file every time you plot, as opposed to reading up to the line in question, then the next line, and so on. One possible strategy is to separate your particle trajectories into different files, but it could help especially to replace the plot for with a single plot plus a block selection with every, where instead of selecting a pair of columns per particle, you have all the particle positions for the same time step in one block.
Now your data looks something like this:
x1 y1 x2 y2 x3 y3 # Time step 1
x1 y1 x2 y2 x3 y3 # Time step 2
And gnuplot needs to read the file once for every time step and particle. If you structure the file as follows (note one blank line between blocks):
# Time step 1
x1 y1
x2 y2
x3 y3

# Time step 2
x1 y1
x2 y2
x3 y3
Then you don't need the plot for; instead, just select the corresponding block with all the particles by inserting one extra colon in every:
set terminal wxt size 1000,600
k=999999
#N = 999 you don't need this anymore!
do for [i=0:k] {
plot "pos.txt" every :::i::i
}
The code above reads the file for every time step, rather than every time step and particle, and plots all the particles at once.
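If your data is already in the wide one-line-per-time-step format, a minimal Python sketch for converting it to the block format could look like this (the file names are assumptions):

# Turn one "x1 y1 x2 y2 ..." line per time step into
# one "x y" block per time step, blocks separated by blank lines.
with open('pos.txt') as src, open('pos_blocks.txt', 'w') as dst:
    for step, line in enumerate(src):
        values = line.split()
        dst.write('# Time step %d\n' % (step + 1))
        for j in range(0, len(values) - 1, 2):
            dst.write('%s %s\n' % (values[j], values[j + 1]))
        dst.write('\n')  # blank line ends the block for gnuplot's "every"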
If performance is very critical, you may consider using a completely different data format. Although changing the structure of the ASCII file gives a huge improvement, it scales badly, because gnuplot must always scan from the beginning of the data file to find the position where to start. I did some testing: plotting the first 1000 frames took me 60 s, whereas frames 9000 to 10000 took 600 s.
You would need a data format which allows you to seek to any data set in constant time. In my thesis I saved all my experimental data (huge data sets) with HDF5; you can then use the external utility h5totxt to extract the desired data set. Here, the position of the requested data set can be calculated without scanning the whole file, and the access time is independent of the frame number.
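(The h5totxt utility ships with the h5utils package.)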
For testing I used the following Python script to generate a test data file, points.h5:
from numpy import random
import h5py

# 10000 time steps, 1000 particles, 2 coordinates each
P = random.normal(size=(10000,1000,2))
f = h5py.File('points.h5', 'w')
f.create_dataset('points', data=P)
f.close()
The gnuplot script for plotting is
set terminal wxt size 1000,600
k=9999
do for [i=0:k] {
plot sprintf("< h5totxt -s ' ' -x %d points.h5", i) using 1:2 ls 1 pt 7 ps 2 title sprintf("%d", i)
}
Now, plotting of 1000 frames takes 40s, no matter which frames you take (0-1000 or 9000-10000).
I'm building a photographic film scanner. The electronic hardware is done; now I have to finish the mechanical advance mechanism, and then I'm almost done.
I'm using a line scan sensor, so it's one pixel wide by 2000 pixels high. The data stream I will be sending to the PC over USB, via an FTDI FIFO bridge, will be just 1-byte pixel values. The scanner will pull through an entire strip of 36 frames, so I will end up scanning the whole strip. For now I'm willing to split the frames up manually in Photoshop, but I would like to implement something in my program to do this for me. I'm using C++ in VS. So, basically, I need a way for the PC to detect the near-black strips between the images on the film, isolate the images, and save them as individual files.
Could someone give me some advice for this?
That sounds pretty simple compared to the things you've already implemented; you could:
calculate an average pixel value per row, and call the resulting signal s(n) (n being the row number).
set a threshold for s(n), setting everything below that threshold to 0 and everything above to 1
Assuming you don't know the exact pixel height of the black bars and the negatives, search for periodicities in s(n). What I describe in the following is total overkill, but that's how I roll (a small Python sketch of this follows the steps below):
use FFTW to calculate a discrete Fourier transform of s(n); call it S(f) (f being the frequency, i.e. 1/period).
find argmax(abs(S(f))) (ignoring f = 0); that f gives you the repetition rate of the black bars: number of rows / f is the bar distance.
S(f) is complex, and thus has an argument; atan2(imag(S(f_max)), real(S(f_max))) / (2*pi) times the bar distance will give you the offset (position) of the bars.
To calculate the width of the bars, you could do the same with the second-highest peak of abs(S(f)), but it'll probably be easier to just count the average run of 0s around the calculated center positions of the black bars.
To get the exact width of the image strip, only take the pixels in which the image border may lie: r_left(x) would be the signal representing the few pixels in which the actual image might border on the filmstrip material (x being the coordinate along that row). Now use a simplistic high-pass filter (e.g. f(x) := r_left(x) - r_left(x-1)) to find the sharpest edge in that region (argmax(abs(f(x)))). Use the average of these edges as the border location.
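Here is a minimal Python/NumPy sketch of the periodicity search described above (the input s is assumed to be a 1-D array of per-row average pixel values; an illustration, not the only way to do it):

import numpy as np

def find_bar_period(s):
    # s: 1-D array of per-row average pixel values (assumed input)
    n = len(s)
    b = (s > s.mean()).astype(float)   # crude threshold: dark bars -> 0
    S = np.fft.rfft(b - b.mean())      # spectrum of the thresholded signal
    k = 1 + np.argmax(np.abs(S[1:]))   # dominant frequency bin, skipping DC
    period = n / k                     # rows between two bars
    phase = np.angle(S[k])             # phase of the dominant component
    offset = (-phase / (2 * np.pi)) * period % period  # first bar position
    return period, offset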
By the way, if you want to write a source block that takes your scanned image as input and outputs a stream of pixel-row vectors, GNU Radio would offer you a nice way of building a flow graph of connected signal-processing blocks that does exactly what you want, without you having to care about getting data from A to B.
I forgot to add: use the resulting coordinates with something like OpenCV, or any other library capable of reading images, specifying sub-images by coordinates, and saving them as new images.
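For instance, a minimal Python/OpenCV sketch (the file names and frame bounds are made up for illustration):

import cv2

img = cv2.imread('strip.png', cv2.IMREAD_GRAYSCALE)
# Hypothetical frame bounds computed from the bar positions found above
bounds = [(120, 2140), (2260, 4280)]
for k, (top, bottom) in enumerate(bounds):
    frame = img[top:bottom, :]   # NumPy slicing selects the sub-image
    cv2.imwrite('frame_%02d.png' % k, frame)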
I have an image (logical values), like this
I need to get this image resampled from pixels to mm or cm; this is the code I use to do the resampling:
function [ Ires ] = imresample3( I, pixDim )
% Resample image I from the original 1-pixel grid
% to a grid spaced pixDim (cm) apart
[r,c]=size(I);
x=1:1:c;
y=1:1:r;
[X,Y]=meshgrid(x,y);               % original pixel grid
rn=r*pixDim;                       % image height in cm
cn=c*pixDim;                       % image width in cm
xNew=1:pixDim:cn;                  % new grid from 1 to cn, step pixDim
yNew=1:pixDim:rn;
[Xnew,Ynew]=meshgrid(xNew,yNew);
Id=double(I);
Ires=interp2(X,Y,Id,Xnew,Ynew);    % sample I at the new grid points
end
What I get is a black image. I suspect that this code does something that is not what I have in mind: it seems to take only the upper-left part of the image.
What I want, instead, is to have the same image on a mm/cm scale: every white pixel should be mapped from its original position to the new position (in mm/cm); what happens is certainly not what I expect.
I'm not sure that interp2 is the right command to use.
I don't want to resize the image, I just want to go from pixel world to mm/cm world.
pixDim is of course the dimension of an image pixel, obtained by dividing the height of the ear in cm by the height of the ear in pixels (it is on average 0.019 cm).
Any ideas?
EDIT: I was quite sure that the code made no sense, but someone told me to do it that way... anyway, if I have two edge images of ears, I first need to scale both to their real dimensions and then perform some operations on them. What I mean by "real dimensions" is that if one ear measures 6.5x3.5 cm and the other 6x3.2 cm, I need to perform the operations at those dimensions.
I don't get how I can move from pixel dimensions to cm dimensions BEFORE doing the operations.
I want to move from one world to the other because I want to get rid of the capture distance (I suppose that if a picture of one ear is taken from near and the other from far, they will have different sizes in pixels).
Am I correct? Is there a way to do it? I thought I could plot the ear with scaled axes, but then I suppose I cannot subtract one from the other, right?
MATLAB does not use units. To apply your factor of 0.019 cm/pixel you would have to scale by a factor of 0.019 to get a 1 cm grid, but this would cause any detail smaller than 1 cm to be lost.
Best practice is to display the data using multiple axes, one for cm and one for pixels. It's explained here: http://www.mathworks.de/de/help/matlab/creating_plots/using-multiple-x-and-y-axes.html
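As a sketch of the same idea in Python/matplotlib (swapping languages purely for illustration; the random stand-in image and the 0.019 cm/pixel factor are assumptions):

import numpy as np
import matplotlib.pyplot as plt

I = np.random.rand(300, 200) > 0.99   # stand-in for the logical ear image
cm_per_px = 0.019

fig, ax = plt.subplots()
ax.imshow(I, cmap='gray')
ax.set_xlabel('pixels')
# Secondary axis showing the same positions in cm
secax = ax.secondary_xaxis('top', functions=(lambda px: px * cm_per_px,
                                             lambda cm: cm / cm_per_px))
secax.set_xlabel('cm')
plt.show()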
Any function processing the data should be independent of the scale or use the scale factor as an input argument, everything else is a sign of some serious algorithmic issues.
I want to show how the used space on my disk changes over time by drawing a figure with the sampling time on the x-axis and the storage used on the y-axis.
However, the storage used is currently recorded in bytes, which is not human-readable once values go beyond a GB.
So, can I re-tic the axis in gnuplot? In my case, could I change the value 100000000, for example, into 100MB?
Thanks and Best Regards.
You have two main options. The first (and probably easiest) is to scale things when you plot:
plot 'datafile' using 1:($2/1e6) title 'Usage in MB'
This will plot the second data column in the file datafile with each value divided by 1e6, versus time (first column).
You can also re-tic the axes, but this is a bit less general.
set ytics ("100" 1e8)
Another option would be to use scientific notation on the y axis (as I have been doing with these big numbers above). To do that, the command is
set format y '%.2e'
This will print the y tics using scientific notation with 2 figures after the decimal point. You could also try
set format y '%.2g'
which will print the more compact of either scientific or normal notation.