adding jpeg image to pdf using xobject - image

I'm enhancing in-house software that creates very basic PDF files so that it can handle images. JPEG seems to be the easiest place to start. I insert the JPEG image into an XObject stream by opening the image file, reading from it, and writing the bytes into the PDF file. However, when I view the PDF file, I get errors about the image length. It appears that the contents of a PDF JPEG image stream are not in exactly the same format as the image file itself. I'm not using any PDF development tools to build the PDF file; I'm writing the code to do this myself.
Does anyone know exactly how to add the contents of a JPEG image file to an XObject stream? It appears that some kind of conversion is needed.
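For reference, PDF can carry the bytes of a JPEG file unchanged when the image XObject uses the DCTDecode filter, so length errors usually point to how the bytes were written rather than to a missing conversion. Typical suspects are reading or writing in text mode, or a /Length value that does not equal the exact number of bytes between "stream" and "endstream". The following is a minimal sketch in Python, not the asker's code: the function names jpeg_info and jpeg_image_xobject are hypothetical, and it assumes an 8-bit JPEG with 1, 3, or 4 components.

```python
import struct


def jpeg_info(data: bytes):
    """Return (width, height, components) read from the first SOF segment."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker")
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError("lost marker sync")
        marker = data[pos + 1]
        if marker == 0xFF:  # fill byte before a marker
            pos += 1
            continue
        (seg_len,) = struct.unpack(">H", data[pos + 2:pos + 4])
        # SOF0..SOF15, excluding DHT (C4), JPG (C8) and DAC (CC)
        if 0xC0 <= marker <= 0xCF and marker not in (0xC4, 0xC8, 0xCC):
            precision, height, width, comps = struct.unpack(
                ">BHHB", data[pos + 4:pos + 10])
            if precision != 8:
                raise ValueError("this sketch assumes 8 bits per sample")
            return width, height, comps
        pos += 2 + seg_len  # marker (2 bytes) + segment (length included)
    raise ValueError("no SOF marker found")


def jpeg_image_xobject(obj_num: int, data: bytes) -> bytes:
    """Wrap unmodified JPEG file bytes in a PDF image XObject."""
    width, height, comps = jpeg_info(data)
    colorspace = {1: "/DeviceGray", 3: "/DeviceRGB", 4: "/DeviceCMYK"}[comps]
    header = (
        f"{obj_num} 0 obj\n"
        f"<< /Type /XObject /Subtype /Image /Width {width} /Height {height} "
        f"/ColorSpace {colorspace} /BitsPerComponent 8 "
        f"/Filter /DCTDecode /Length {len(data)} >>\n"
        "stream\n"
    ).encode("ascii")
    # /Length counts only the JPEG bytes, not the newline before endstream.
    return header + data + b"\nendstream\nendobj\n"
```

The JPEG must be read with open(path, "rb") and the PDF written in binary mode as well. Any cross-reference offsets have to be computed from byte counts of the output, not from character counts.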

Related

Estimating PDF or TIFF file size before creating it from base64 encoded images

I have a feeling that this is not doable, but it wouldn't harm to hear more from the great guys @ stackoverflow. We have a module in a system I'm working on that converts previously scanned papers (the images are saved in the database in base64 format) to a TIFF or PDF file and then stores it on the disk. The customer recently requested a feature that lets him see the size of the PDF or TIFF file before the file is created on the disk. So, is there any method to estimate the final size of the PDF or TIFF file from only its base64-encoded images?
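One part of this can be computed exactly: the number of bytes a base64 string decodes to follows from its length and padding, with no decoding required. The sketch below shows that calculation plus a rough total. The helper names and the two overhead values are hypothetical placeholders that would need calibrating against real output, and the estimate only holds if the images are embedded without being re-encoded.

```python
import base64


def decoded_size(b64: str) -> int:
    """Exact number of bytes a base64 string decodes to, without decoding it."""
    stripped = "".join(b64.split())  # ignore line breaks and other whitespace
    padding = len(stripped) - len(stripped.rstrip("="))
    return len(stripped) * 3 // 4 - padding


def estimate_file_size(images_b64, per_image_overhead=300, fixed_overhead=1000):
    """Rough size estimate; both overhead values are assumed, not measured."""
    payload = sum(decoded_size(s) for s in images_b64)
    return payload + per_image_overhead * len(images_b64) + fixed_overhead


sample = base64.b64encode(b"hello").decode("ascii")  # "aGVsbG8="
print(decoded_size(sample))  # 5
```

If the converter recompresses the images (for example, to a different TIFF compression), the payload term no longer applies and the only reliable figure comes from producing the file in memory and measuring it.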

Create small high quality PDF embedding optimized PNG?

I'm trying to create a small PDF file, embedding one optimized PNG image displayed as a header and footer on a 3 page PDF (same image must appear 6x in the PDF)
My optimized PNG image is only 2.3KB. It looks very sharp.
Failed with libreoffice
When I insert just one instance of the 2.3KB PNG image into a LibreOffice Writer document containing only text and then export it as PDF, I can see that the image gets recompressed to JPG and the resulting PDF file grows by about 40KB. It also loses quality: the PNG picks up the fuzzy edges typical of JPG.
If I right-click the image and select compression, there is no way to disable recompression of the image (it's already optimized better than LibreOffice could do it). I've tried setting compression levels of 0, 1, 9, etc., and choosing JPG, no resize, lossless, etc., but there was no improvement.
Failed with wkhtmltopdf
I also tried making a test page and used wkhtmltopdf, but it did the same thing. Adding the low quality flag made no difference.
PDF Spec suggests PNG is supported?
From skimming the PDF spec, it looks like PNG images are supported.
Even plain text PDF files are surprisingly large
Another disappointment: when I take a 7KB HTML file that is basically just <html><body><p>foo...</p><p>bar...</p> (only about 15 paragraphs) with no CSS, the resulting 2-page PDF file is 30KB. Why should a 7KB (almost plain text) file become 30KB as a PDF?
Suggestions?
Can someone please suggest how to make a small PDF file in Linux?
I need to include 7KB of text and repeat one PNG image 6 times.
Manually or programmatically. I'll take whatever I can get at this point.
PDF Spec suggests PNG is supported?
PNG isn't supported per se; PDF allows embedding JPEG images as-is, but not PNG images. PDF does borrow a set of features of the PNG format, however.
rinohtype (full disclosure: I'm the author) tries to embed as much as possible from PNG images as-is into the PDF. This does involve some bit-juggling to separate the alpha channel from the color data for example, but no reencoding of the image is performed. It does not (yet) support interlaced PNGs.
rinohtype should be able to do what you want to achieve. But please note that it currently is in a beta stage, so you might encounter some bugs.
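To illustrate what "borrowing PNG features" can look like in practice: PDF's FlateDecode filter accepts a /Predictor value of 15, which corresponds to the PNG row filters, so for a non-interlaced PNG without alpha or a palette the concatenated IDAT data can be placed in an image stream as-is. The following Python sketch is an illustration under those assumptions, not rinohtype's implementation, and the function name png_image_xobject is hypothetical.

```python
import struct
import zlib  # not needed for embedding; handy for building or checking test data

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_image_xobject(obj_num: int, png: bytes) -> bytes:
    """Embed a simple PNG (gray or RGB, no alpha, not interlaced) unrecoded."""
    if png[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    pos, idat, ihdr = 8, [], None
    while pos + 8 <= len(png):
        length, ctype = struct.unpack(">I4s", png[pos:pos + 8])
        body = png[pos + 8:pos + 8 + length]
        if ctype == b"IHDR":
            ihdr = struct.unpack(">IIBBBBB", body)
        elif ctype == b"IDAT":
            idat.append(body)
        elif ctype == b"IEND":
            break
        pos += 12 + length  # length (4) + type (4) + data + CRC (4)
    if ihdr is None:
        raise ValueError("missing IHDR chunk")
    width, height, depth, color_type, _comp, _filt, interlace = ihdr
    if interlace or color_type not in (0, 2):
        raise ValueError("sketch handles only non-interlaced gray/RGB PNGs")
    colors = 1 if color_type == 0 else 3
    colorspace = "/DeviceGray" if colors == 1 else "/DeviceRGB"
    data = b"".join(idat)  # one zlib stream, possibly split over many chunks
    header = (
        f"{obj_num} 0 obj\n"
        f"<< /Type /XObject /Subtype /Image /Width {width} /Height {height} "
        f"/ColorSpace {colorspace} /BitsPerComponent {depth} "
        f"/Filter /FlateDecode /DecodeParms << /Predictor 15 /Colors {colors} "
        f"/BitsPerComponent {depth} /Columns {width} >> "
        f"/Length {len(data)} >>\n"
        "stream\n"
    ).encode("ascii")
    return header + data + b"\nendstream\nendobj\n"
```

Images with an alpha channel or a palette need extra work (a separate soft mask, or an Indexed color space), which is the bit-juggling mentioned above and is left out of this sketch.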
Even plain text PDF files are surprisingly large
To keep the PDF size as small as possible, make sure not to embed/subset any of the fonts. Use only the fonts from the base 14 PDF fonts which are provided by PDF readers.
What you want is certainly achievable. Regarding the image quality, I would recommend making your image twice the size that you want it to actually display at in the PDF to keep it looking sharp.
As to the size, I've just modified a test in my PDF writer module (a work in progress) to include a 7.2K PNG, 200px x 70px, in a PDF twice, and the PDF came out at 6.8K. There's not much text included, but more text will only add its own size plus a small percentage.
You can see the module and original test here: https://github.com/DoccaPDF/docca-pdf-writer/blob/master/src/tests/writer.js#L40
That test adds ~112K of images to the PDF and results in a 103K PDF.
Of course, not all images are created equal, so your mileage may vary.
*The images are only actually added to the PDF once, but are displayed multiple times.
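The "added once, displayed multiple times" point relies on how PDF separates image data from image placement: the image lives in a single XObject, each page's resource dictionary maps a name to that same object, and the page content stream paints it with the Do operator. A small Python sketch of the content-stream side follows. The resource name Im0 and both function names are assumptions for illustration and are not taken from the linked module.

```python
def place_image(name: str, x: float, y: float, w: float, h: float) -> str:
    """Content-stream operators that paint XObject `name` at (x, y), size w x h."""
    # An image occupies the unit square, so the matrix scales and translates it.
    return f"q {w:.2f} 0 0 {h:.2f} {x:.2f} {y:.2f} cm /{name} Do Q"


def header_footer_stream(name, page_w, page_h, img_w, img_h, margin) -> bytes:
    """Paint the same image at the top and at the bottom of one page."""
    header = place_image(name, margin, page_h - margin - img_h, img_w, img_h)
    footer = place_image(name, margin, margin, img_w, img_h)
    return "\n".join([header, footer]).encode("ascii")


# Three pages, each with a header and a footer, still need one image object;
# every page's /Resources /XObject dictionary would point /Im0 at it.
stream = header_footer_stream("Im0", 595, 842, 200, 70, 36)
```

With this layout, six appearances of the image cost six short operator lines plus one copy of the image data.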

Decode JPEG image stripped from inside a PDF file

I have code that decompresses JPEGs into bitmaps. It works fine for JPEG files, but when I feed it a JPEG that I have stripped directly from a PDF's XObject, I get errors.
Adobe Reader displays the image fine, so I don't believe it's corrupted. I have read through the JPEG and PDF documentation and don't see any obvious problems.
My question is this: is there anything different between the "JPEG" embedded inside a PDF stream and a normal JPEG? And if so, what is it?
Note: I can manually open the PDF, copy the image, paste it into Paint, and save it. When I do this, everything works, but I need this automated.
When my code parses the PDF, strips out the image stream, dumps the binary to a file, and then tries to open this file, it does not work. What am I missing?
My errors seem to be occurring in the Huffman decoding process, the cdt and Huffman tables appear to be read in fine.
Pardon my using the answer section but I overflowed the comment section:
My questions:
1. What code is failing to decode the JPEG? You say you "have code", but where did it come from? Why do you think it is reliable?
2. What is the file format of the JPEG stream? JFIF, Adobe, Exif, none specified?
3. Could there be something in the file format that your decoder cannot handle? Does your decoder check for different types of APPn markers?
4. What is the JPEG format? What type of SOF marker?
5. Does this decoder source handle all the normal formats? Baseline, extended, sequential, progressive? If you have a progressive JPEG and a decoder that only does baseline, you are going to have a problem.
6. How many components does the JPEG stream have? Some Adobe files have 4 components, and decoders may only be able to handle 1 or 3.
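Most of these questions can be answered by walking the marker segments of the extracted stream up to the start of scan. The Python sketch below does that; it is a diagnostic aid under simple assumptions, not a decoder, and the function name inspect_jpeg is hypothetical. It reports the APPn identifiers, the SOF type, and the component count.

```python
import struct

SOF_NAMES = {0xC0: "baseline", 0xC1: "extended sequential",
             0xC2: "progressive", 0xC3: "lossless"}


def inspect_jpeg(data: bytes) -> dict:
    """Summarize the header segments of a JPEG stream, stopping at SOS."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("stream does not start with SOI (FF D8)")
    info = {"app_markers": [], "sof": None, "precision": None,
            "width": None, "height": None, "components": None}
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        if marker == 0xFF:  # fill byte
            pos += 1
            continue
        if marker == 0xDA:  # SOS: entropy-coded data follows
            break
        (seg_len,) = struct.unpack(">H", data[pos + 2:pos + 4])
        payload = data[pos + 4:pos + 2 + seg_len]
        if 0xE0 <= marker <= 0xEF:  # APP0..APP15
            ident = payload[:16].split(b"\x00", 1)[0].decode("latin-1")
            info["app_markers"].append((f"APP{marker - 0xE0}", ident))
        elif 0xC0 <= marker <= 0xCF and marker not in (0xC4, 0xC8, 0xCC):
            precision, height, width, comps = struct.unpack(">BHHB", payload[:6])
            info.update(sof=SOF_NAMES.get(marker, f"SOF{marker - 0xC0}"),
                        precision=precision, width=width, height=height,
                        components=comps)
        pos += 2 + seg_len
    return info
```

If the dumped bytes do not even begin with FF D8, it is worth checking the stream dictionary's /Filter entry: when it is an array such as [/FlateDecode /DCTDecode], the raw stream has to be inflated first before it is a JPEG.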

Matlab image conversion jp2 to jpg

I have a weird problem in MATLAB. I have code that takes in a directory of JP2 files and converts all of them to TIFF, PNG, or JPG files. It then puts these files in a new directory. The user can specify how big they want the file to be in terms of how many pixels are used (e.g. 1:3:end takes every third pixel). This code works perfectly for the PNG and TIFF conversions.
With the JPG conversion there is no error whatsoever, but when I click on the JPEG file in the new folder (it does at least get written there), it says "Windows Photo Viewer can't open this picture because the file appears to be damaged, corrupted, or is too large". I tried opening the pictures in other viewers, but they said the same thing. All of the PNG and TIFF pictures opened fine.
Some help would be greatly appreciated, thanks!
Edit: I noticed that when I call imshow on the location of the JPEG file, it actually does show up in MATLAB. It still does not show up in any image viewers, though.

The best image file format for TEXT and tools to make it

I need to convert a PDF file to an image file (JPG, PNG, GIF) to show on the web.
Exploring the Google application that reads PDF files shows that they are using PNG. But how can I convert a 2000x2000 file so that it is only 150 KB?
Is there any command line tool?
PNG and GIF are better than JPEG, and GIF is probably better than PNG. TIFF is usually the best.
