I could understand if it was being used for some sort of metadata, but I can't find anything confirming this.
Is there a reason for this added padding? If so, what is it?
The padding has to do with something called "data alignment". BMP files are laid out so that each pixel row is a multiple of 32 bits (4 bytes) long. It makes for faster reading, and easier reading of arbitrary rows (depending on the CPU).
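As a rough illustration (not part of the original answer; the function name is made up), this is how the padded row size is typically computed:

#include <cstdint>
#include <cstdio>

// Each BMP pixel row is padded so that its stored length is a multiple of
// 4 bytes (32 bits); the padding bytes carry no image data.
uint32_t bmpRowSize(uint32_t widthPixels, uint32_t bitsPerPixel) {
    uint32_t rawBytes = (widthPixels * bitsPerPixel + 7) / 8;  // bytes of real pixel data
    return (rawBytes + 3) & ~3u;                               // round up to a multiple of 4
}

int main() {
    // A 3-pixel-wide, 24-bit row holds 9 bytes of pixel data,
    // but is stored as 12 bytes in the file.
    std::printf("%u\n", bmpRowSize(3, 24));  // prints 12
}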
This question already has an answer here: save high resolution figures with parfor in matlab (1 answer). Closed 8 years ago.
I've got a ~1600 line program that reads in images (either tiff or raw), performs a whole bunch of different mathematical and statistical analyses, and then outputs graphs and data tables at the end.
Almost two-thirds of my processing time is due to looping 16 times over the following code:
% create a full-screen figure without showing it on screen
h = figure('Visible','off','units','normalized','outerposition',[0 0 1 1]);
set(h,'PaperPositionMode','auto');       % print/capture at the on-screen size
imagesc(picdata); colormap(hot);         % draw the data as an image with the 'hot' colormap
imgtmp = hardcopy(h,'-dzbuffer','-r0');  % capture the figure via the (undocumented) z-buffer renderer
imwrite(imgtmp,hot,'picname.png');       % write the captured frame out as a PNG
Naturally, 'picname.png' and picdata are changing each time around.
Is there a better way to invisibly plot and save these pictures? The processing time mostly takes place inside imwrite, with hardcopy coming second. The whole purpose of the pictures is just to get a general idea of what the data looks like; I'm not going to need to load them back into Matlab to do future processing of any sort.
Try to place the figure off-screen (e.g., Position=[-1000,-1000,500,500]). This will make it "Visible" and yet no actual rendering will need to take place, which should make things faster.
Also, try to reuse the same figure for all images - no need to recreate the figure and image axes and colormap every time.
Finally, try using my ScreenCapture utility rather than hardcopy+imwrite. It uses a different method for taking a "screenshot", which may be faster.
I am doing motion detection.
I compare 2 images at a time.
The differences are compared at the pixel level.
I want to store the differences in a file.
I have tried saving the hex values into a two-dimensional string array and serializing it out to a file with the binary formatter, but the result is 495 KB while the original image is only 32 KB.
What is the most efficient way of storing differences?
I am using C#
Thanks
There are many ways. Maybe take a look at how bdiff does it. In general, compare the binary values, not a hex representation. The binary formatter serialization probably also adds some overhead.
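To illustrate the idea (a rough C++ sketch only, not tested; the same shape translates to C#, where System.IO.BinaryWriter can play the role of fwrite): store just the byte offsets that changed and their new values, in raw binary rather than hex text.

#include <cstdint>
#include <cstdio>
#include <vector>

// Write only the bytes that differ between two equally sized image buffers
// as (offset, new value) pairs in raw binary. Real diff tools (bdiff, VCDIFF)
// additionally run-length encode and compress, but even this naive form is
// far smaller than a hex-text dump when few pixels change.
void writeDiff(const std::vector<uint8_t>& oldImg,
               const std::vector<uint8_t>& newImg,
               const char* path) {
    std::FILE* f = std::fopen(path, "wb");
    if (!f) return;
    for (uint32_t i = 0; i < oldImg.size(); ++i) {
        if (oldImg[i] != newImg[i]) {
            std::fwrite(&i, sizeof(i), 1, f);    // 4-byte offset
            std::fwrite(&newImg[i], 1, 1, f);    // 1-byte new value
        }
    }
    std::fclose(f);
}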
I am using GNUplot for plotting a small matrix.
The matrix is 100x100 in size, e.g.
1.23212 2.43123 -1.24312 ......
-4.23123 2.00458 5.60234 ......
......
The data is not neatly stored in the file.
So, from a C++ point of view, because the values have no fixed length, a number cannot be read in a single step; the reader has to check character by character as each number is loaded. I guess this is the reason for the slow plotting speed.
Now I have 3 questions:
Q1: Is loading the bottleneck?
Q2: If I make the data file neatly stored, e.g.
1.23212 2.43123 -1.24312 ......
-4.23123 2.00458 5.60234 ......
......
does the plotting speed improve? (Maybe GNUplot can detect the fixed pattern and thus improve the loading speed. Not sure.)
Q3: Any other options that I can set to make it faster?
Edit
I tried these:
-3.07826e-21 -2.63821e-20 -1.05205e-19 -3.25317e-19 -9.1551e-19
When outputting, I used setw to make sure they are aligned. But I think I still need to tell GNUplot to load 13 characters at a time and then perform strtod.
I would guess that, to handle the general case where the length of a number is not known, it is safer to read digit by digit until a space is reached.
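For reference, a minimal sketch of the kind of fixed-width output described above (the width and precision values here are arbitrary choices):

#include <fstream>
#include <iomanip>

// Write every value in a fixed-width scientific-notation column so that
// each row of the data file has a predictable length.
void writeMatrix(const double* m, int rows, int cols, const char* path) {
    std::ofstream out(path);
    out << std::scientific << std::setprecision(5);
    for (int r = 0; r < rows; ++r) {
        for (int c = 0; c < cols; ++c)
            out << std::setw(14) << m[r * cols + c];
        out << '\n';
    }
}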
I'm doing calculations and the resultant text file currently has 288012413 lines, with 4 columns. A sample line:
288012413; 4855 18668 5.5677643628300215
The file is nearly 12 GB.
That's just unreasonable, and it's plain text. Is there a more efficient way? I only need about 3 decimal places, but would limiting the precision save much room?
Go ahead and use a MySQL database.
MSSQL Express has a limit of 4 GB.
MS Access has a limit of 4 GB.
So those options are out. Using a simple database like MySQL or SQLite without indexing will be your best bet. It will probably be faster to access the data through a database anyway, and on top of that the file size may be smaller.
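As a rough sketch of that suggestion (untested; the table and column names are invented), bulk loading into SQLite with a prepared statement inside a single transaction keeps the inserts fast:

#include <sqlite3.h>

int main() {
    sqlite3* db;
    sqlite3_stmt* stmt;
    sqlite3_open("results.db", &db);
    sqlite3_exec(db, "CREATE TABLE results(a INTEGER, b INTEGER, v REAL);",
                 nullptr, nullptr, nullptr);
    sqlite3_exec(db, "BEGIN;", nullptr, nullptr, nullptr);
    sqlite3_prepare_v2(db, "INSERT INTO results VALUES(?,?,?);", -1, &stmt, nullptr);

    // Replace this loop with reading the real data; the values here are
    // taken from the sample line in the question.
    for (int i = 0; i < 1000; ++i) {
        sqlite3_bind_int(stmt, 1, 4855);
        sqlite3_bind_int(stmt, 2, 18668);
        sqlite3_bind_double(stmt, 3, 5.568);
        sqlite3_step(stmt);
        sqlite3_reset(stmt);
    }

    sqlite3_finalize(stmt);
    sqlite3_exec(db, "COMMIT;", nullptr, nullptr, nullptr);
    sqlite3_close(db);
}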
Well,
The first column looks suspiciously like a line number - if this is the case then you can probably just get rid of it, saving around 11 characters per line.
If you only need about 3 decimal places then you can round / truncate the last column, potentially saving another 12 characters per line.
I.e. you can get rid of 23 characters per line. That line is 40 characters long, so you could approximately halve your file size.
If you do round the last column then you should be aware of the effect that rounding errors may have on your calculations - if the end result needs to be accurate to 3 dp then you might want to keep a couple of extra digits of precision depending on the type of calculation.
You might also want to look into compressing the file if it is just used for storing the results.
Reducing the 4th field to 3 decimal places should reduce the file to around 8GB.
If it's just array data, I would look into something like HDF5:
http://www.hdfgroup.org/HDF5/
The format is supported by most languages, has built-in compression, and is widely used.
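A rough sketch of what that could look like with the HDF5 C API (the dataset name, chunk size and compression level are arbitrary choices; it assumes far more rows than the chunk size):

#include <hdf5.h>

// Store the three numeric columns as an N x 3 dataset of doubles,
// chunked and gzip-compressed.
void saveHdf5(const double* data, hsize_t nrows) {
    hsize_t dims[2]  = {nrows, 3};
    hsize_t chunk[2] = {65536, 3};   // chunk size must not exceed the dataset size

    hid_t file  = H5Fcreate("results.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
    hid_t space = H5Screate_simple(2, dims, NULL);

    hid_t dcpl = H5Pcreate(H5P_DATASET_CREATE);   // enable chunking + gzip
    H5Pset_chunk(dcpl, 2, chunk);
    H5Pset_deflate(dcpl, 6);

    hid_t dset = H5Dcreate2(file, "/results", H5T_NATIVE_DOUBLE, space,
                            H5P_DEFAULT, dcpl, H5P_DEFAULT);
    H5Dwrite(dset, H5T_NATIVE_DOUBLE, H5S_ALL, H5S_ALL, H5P_DEFAULT, data);

    H5Dclose(dset);
    H5Pclose(dcpl);
    H5Sclose(space);
    H5Fclose(file);
}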
If you are going to use the result as a lookup table, why use ASCII for numeric data? Why not define a struct like so:
struct x {
    long   lineno;
    short  thing1;
    short  thing2;
    double value;
};
and write the struct to a binary file? Since all the records are of a known size, advancing through them later is easy.
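A rough sketch of that approach (function names invented; note that the struct contains compiler-inserted padding, so sizeof(struct x) can be larger than the sum of the field sizes):

#include <cstdio>

struct x {
    long   lineno;
    short  thing1;
    short  thing2;
    double value;
};

// Append one record; the file becomes a flat array of fixed-size records.
void writeRecord(std::FILE* f, const x& rec) {
    std::fwrite(&rec, sizeof rec, 1, f);
}

// Jump straight to record n by seeking n * sizeof(x) bytes.
// (For files over 2 GB use fseeko/_fseeki64 rather than fseek.)
bool readRecord(std::FILE* f, long n, x& out) {
    std::fseek(f, n * (long)sizeof(x), SEEK_SET);
    return std::fread(&out, sizeof out, 1, f) == 1;
}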
Well, if the files are that big and you are doing calculations that require any sort of precision with the numbers, you are not going to want a limiter on the decimal places. That might do more harm than good, and with a 12-15 GB file, problems like that will be really hard to debug. I would use a compression utility such as gzip, ZIP, BlakHole or 7-Zip to compress it.
Also, what encoding are you using? If you are just storing numbers, all you need is ASCII. If you are using Unicode encodings, that will double to quadruple the size of the file vs. ASCII.
Like AShelly, but smaller.
Assuming line #'s are continuous...
struct x {
    short thing1;
    short thing2;
    short value; // you said only 3 dp, so store as fixed point (n*1000); that leaves 2 digits left of the decimal point
};
Save it in a binary file.
lseek(), read() and write() are your friends.
The file will still be large(ish), at around 1.7 GB.
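A rough sketch of the lookup, assuming the implicit line number is the record index (on typical platforms the struct above is exactly 6 bytes, so 288012413 records come to roughly 1.7 GB):

#include <fcntl.h>
#include <unistd.h>

struct rec {
    short thing1;
    short thing2;
    short value;   // fixed point: real value * 1000
};

// Fetch record n: its bytes start at offset n * sizeof(rec), because every
// record has the same size and line numbers are implicit.
bool getRecord(int fd, long n, rec& out) {
    if (lseek(fd, (off_t)n * (off_t)sizeof(rec), SEEK_SET) == (off_t)-1)
        return false;
    return read(fd, &out, sizeof out) == (ssize_t)sizeof out;
}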
The most obvious answer is just "split the data". Put it into separate files, e.g. 1 million lines per file. NTFS is quite good at handling hundreds of thousands of files per folder.
Then you've got a number of answers regarding reducing data size.
Next, why keep the data as text if you have a fixed-size structure? Store the numbers in binary - this will reduce the space even more (the text format is very redundant).
Finally, a DBMS can be your best friend. A NoSQL DBMS should work well, though I am not an expert in this area and I don't know which one will hold a trillion records.
If I were you, I would go with the fixed-size binary format, where each record occupies a fixed (16-20?) number of bytes. Then, even if I keep the data in one file, I can easily determine at which position I need to start reading. If you need to do lookups (say by column 1) and the data is not regenerated all the time, it could be possible to do a one-time sort by the lookup key after generation -- this would be slow, but as a one-time procedure it would be acceptable.
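As an illustration of the one-time-sort idea (a sketch with an invented 16-byte record layout, using column 1 as the key): once the file is sorted, any record can be found with a binary search over byte offsets, about 28 reads for 288 million records.

#include <cstdio>

struct rec {
    int    key;      // column 1, the sort/lookup key
    short  thing1;
    short  thing2;
    double value;    // 16 bytes total with typical alignment
};

// Binary search over fixed-size records in a file sorted by `key`.
// (For files over 2 GB use fseeko/_fseeki64 rather than fseek/ftell.)
bool find(std::FILE* f, int key, rec& out) {
    std::fseek(f, 0, SEEK_END);
    long n = std::ftell(f) / (long)sizeof(rec);
    long lo = 0, hi = n - 1;
    while (lo <= hi) {
        long mid = lo + (hi - lo) / 2;
        std::fseek(f, mid * (long)sizeof(rec), SEEK_SET);
        if (std::fread(&out, sizeof out, 1, f) != 1) return false;
        if (out.key == key) return true;
        if (out.key < key) lo = mid + 1; else hi = mid - 1;
    }
    return false;
}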