I've been trying to plot a dataset containing about 500,000 values using gnuplot. Although the plotting went well, the SVG file it produced was too large (about 25 MB) and takes ages to render. Is there some way I can reduce the file size?
I have a vague understanding of the SVG file format, and I realize that this is because SVG is a vector format and thus has to store all 500,000 points individually.
I also tried Scour and re-printing the SVG without any success.
The time it takes to render your SVG file is proportional to the amount of information in it. Thus, the only way to speed up rendering is to reduce the amount of data.
I think it is a little tedious to fiddle with an already generated SVG file. I would suggest reducing the amount of data that gnuplot has to plot.
Maybe gnuplot's every keyword or some other reduction of the data can help, like splitting the data into multiple plots...
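For example, every keeps only every Nth data point, which shrinks the SVG roughly in proportion. A minimal sketch (the file names and the factor of 10 are placeholders to tune for your data):

gnuplot -e "set terminal svg; set output 'plot.svg'; plot 'data.dat' every 10 with lines"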
I would recommend keeping it in a vector graphics format and then choosing a resolution for the document that you put it in later.
The main reason for doing this is that you might one day use that image in a poster (for example) and print it at hundreds of times the current resolution.
I normally convert my final PDF into DjVu format.
pdf2djvu --dpi=600 -o my_file_600.djvu my_file.pdf
This lets me specify the resolution of the document as a whole (including the text), rather than different resolutions scattered throughout.
On the downside, it does mean having a large PDF for the original document. However, this can be mitigated if you are using LaTeX to make your original PDF: you can use the draft option until you have finished, so that images are not imported in your day-to-day editing of the text (where rendering large images would be annoying).
Did you try printing to PDF and then converting to SVG?
In Linux, you can do that with ImageMagick, which you may even be able to use to reduce the size of your original SVG file.
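A sketch of that route with ImageMagick (the file names are placeholders; note that convert rasterizes the PDF via Ghostscript at the given density, so the resulting SVG contains an embedded bitmap rather than vectors):

convert -density 150 plot.pdf plot.svg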
Or there are online converters, such as http://image.online-convert.com/convert-to-svg
I wanted to scan book pages and combine the images into a PDF "ebook" (just for me), but the file sizes get really huge. Even .jpg resulted in a PDF file of 60 MB+ in size.
Do you have any idea how I can compress it any further, i.e., which file format I could choose for this specific purpose? (The book contains pictures and written text.)
Thank you for your help.
I tried saving it as .jpg and other file formats like .png, but couldn't get the file small enough to be handled easily without losing too much resolution.
Images are expensive things.
Ignoring compression, you're looking at 3 bytes per pixel of data.
If you want to keep the images, you could reduce this by turning them into greyscale. That reduces it to 1 byte per pixel (again ignoring compression).
Or you could turn them into black and white, which would be 1 bit per pixel.
Or, alternatively, you could use OCR to translate your image into actual text which is a much more efficient way of storing books.
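A sketch of the black-and-white route with ImageMagick (the page-*.jpg names and the 60% threshold are assumptions you would tune per book; Group4 is a lossless fax compression designed for 1-bit scans of text):

convert page-*.jpg -colorspace Gray -threshold 60% -compress Group4 book.pdf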
When I take pictures with my camera, the file sizes seem exceedingly large. In this example the original is 5186 KB. I wrote a Java program to just read the image and then write it again, removing any information except the pixel values.
The first time it is rewritten, the file size goes down to 1005 KB, more than a fivefold reduction! To make sure I wasn't losing data to compression, I iterated the program 100 times on the resulting images, and the file size stayed at exactly 1005 KB with no loss in image quality.
My question is, what is the camera storing in the other 4181 KB? Some sort of metadata, I know, but it seems like a lot. I would like to know what I am losing by re-saving my images like this.
Assuming the file format you are using is .jpg, the original file was saved at a higher JPEG quality setting, say 95%, while when you resave the file you are probably using a lower one, say 85%.
The size doesn't change in subsequent saves because the quality setting stays the same.
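You can test this theory without writing any code; a sketch with ImageMagick (the file names and quality value are placeholders):

convert original.jpg -quality 85 resaved.jpg
convert resaved.jpg -quality 85 resaved-again.jpg

If the explanation is right, the first resave shrinks the file considerably and the second barely changes it.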
We have a pdf page which contains one or more figures which are two-dimensional plots of experimental results. The figures may or may not be embedded in text. Each plot has the x and y axis with their labels and unit measurements marked in the plot. Inside each figure are one or more plots, each with a different color.
How can we convert the plot into a table of corresponding x and y values (say for 100 points) ?
I have already tried WebPlotDigitizer but it works only when the input is a standalone picture of a plot.
What I think I'll have to do is extract the plots from the PDF and process them further. Now, I am not able to find a tool for doing that. I have attached a sample PDF from which the plots have to be extracted.
Note that the 2 plots on the last page of the PDF are images and can be extracted readily (I've found a couple of programs for those). The other plots are not images, and those programs are not able to extract them.
Is there any open source software that can achieve that?
The plots in the PDF file you have provided are vector drawings, so the only way to extract them is to convert the PDF into images (i.e., render the pages). Try ImageMagick's convert command line; see this answer.
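A sketch of that rendering step (the input name and the 300 DPI density are placeholders; each page comes out as a numbered PNG):

convert -density 300 input.pdf page-%03d.png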
As Photoshop is very scriptable, it is actually possible to extract images from a PDF programmatically (as opposed to pages); see the Photoshop JavaScript documentation.
Then you have the whole set of instruments to adjust the images, so that further processing (interpretation) is easier to accomplish.
I thought this was a little odd.
Open Paint on Windows (I'm using Windows 7) and draw something (anything).
Then save it as a .png, for example called 1.png. Then save n other copies straight away without modifying the image (2.png, 3.png, etc.).
I notice that 1.png has a different checksum to 2/3/4/../n.png.
1.png also varies in size (sometimes smaller and other times bigger) compared to the other images.
What is going on?
The difference in file size is due to the choice of scanline filters used by the compressor. I don't have any idea why your application would use a different set of filters when compressing the image multiple times, but it's harmless.
There's no time stamp in the two images that Mohammad posted. According to "pngcheck -v", the only difference is in the content of the IDAT chunk. The image signatures computed by ImageMagick are identical. Neither image contains a tIME chunk.
"pngcrush" produces two identical images with a smaller filesize (11493 bytes).
According to "pngtest -mv" (pngtest is included in the libpng distribution), one image uses only the PNG "none" filter while the other uses the "none", "sub", and "up" filters.
The Wikipedia article on the PNG format seems to suggest there's a timestamp in there. That alone would do it.
Doesn't explain why subsequent files have the same checksum.
As per my understanding,
1. .eps format images are vector images.
2. When we draw something in Word (like a flowchart), it is stored as a vector image.
I am almost sure about the first, not sure about the second. Please correct me if I am wrong.
Assuming these two things, when a LaTeX file (where .eps images are inserted) or a Word file (that contains vector images) is converted into PDF, do the images get converted into raster images?
Also, I think PDFBox/xpdf can only extract raster images from the PDF (as they are embedded as XObjects), not vector images. Is that understanding correct? This question on Stack Overflow is related, but has not been answered yet.
Your point 1 is incorrect: EPS files are PostScript programs; they may contain vector information, text, or image data, or all of the above.
Point 2: in PDF there isn't a 'vector image'; an 'image' means a bitmap and therefore cannot be vector.
If you convert a PostScript program to a PDF file, then the result depends entirely on the conversion program you use. In general vectors will be retained as vectors, and text as text. However it is entirely possible that an application might render the entire PostScript program and insert the result as an image in the PDF.
So the answer to your first question ("do the images get converted into raster images") is 'maybe, but probably not'.
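For instance, converting an EPS with Ghostscript's ps2pdf wrapper normally keeps vectors as vectors and text as text (the file name is a placeholder; -dEPSCrop makes the PDF page match the EPS bounding box):

ps2pdf -dEPSCrop figure.eps figure.pdf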
I'm afraid I have no idea about the capabilities of PDFBox/xpdf, but since collections of vectors may not be arranged as 'images' (they could be held as Form XObjects, or Patterns) in any atomic fashion, there isn't any obvious way to know when to stop extracting. And what format would you store the result in, anyway?