Are there other image file formats that are fast and can be used as an intermediary file for conversion, similar to MIFF? - Ghostscript

I want to convert an EPS file to a MIFF file using Ghostscript. However, the standard Ghostscript installation on Windows 10 does not include the miff24 device, which is the device Ghostscript uses to write MIFF files. Without it, I cannot convert the EPS file to MIFF.
Are there any alternative image formats, similar to MIFF in being fast and lightweight, that can be used with Ghostscript?

Related

Difference between "TIFF Image" and "TIFF File"

These two files are both TIFF files, but their file types are shown differently: one is "TIFF Image" and the other is "TIFF File". Can someone tell me the difference between them? I can't find the exact reason.
EDIT:
After enabling file extensions:

How to convert a matrix containing a non-demosaiced image to a RAW image file, openable by Lightroom?

I have a camera module, from which I am reading out "RAW", non-demosaiced image data (this camera module uses a Bayer BGGR filter). I am currently storing this in a MATLAB matrix. I am aware that MATLAB can demosaic this image for me, but I would like to use Adobe Lightroom's demosaicing algorithm and processing tools.
Do any tools exist to convert this matrix (using MATLAB or otherwise) into a standard RAW file, such as Adobe's DNG format? I understand that DNG is very similar to TIFF; can this be leveraged?
As far as I know, you can use the Adobe DNG SDK. Download the Adobe DNG SDK from here.
The Adobe DNG SDK can read the DNG format and save a DNG as TIFF.
If you want to read Bayer-format (non-demosaiced) image data, you can try to hack the Adobe DNG SDK: replace its Bayer data with your own before the demosaicing step.
Some things to note:
You must use the correct Bayer type (BGGR, RGGB, etc.).
You must use the correct bits per sample.
You must use the correct width and height.
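To make the "Bayer type" and "bits per sample" points concrete, here is a minimal pure-Python sketch. It does not use the DNG SDK at all; the helper names bayer_color, split_bggr and check_bits_per_sample are hypothetical, and it assumes the MATLAB matrix has been exported as a list of equal-length rows. In a BGGR pattern the top-left 2x2 tile reads B G / G R, so declaring the wrong pattern (e.g. RGGB) would swap red and blue.

```python
def bayer_color(row, col, pattern="BGGR"):
    # The pattern string lists the top-left 2x2 tile row by row:
    # "BGGR" means row 0 reads B G and row 1 reads G R.
    return pattern[(row % 2) * 2 + (col % 2)]


def check_bits_per_sample(mosaic, bits):
    # Every sample must fit in the declared bits per sample (e.g. 10, 12, 14).
    limit = (1 << bits) - 1
    return all(0 <= v <= limit for row in mosaic for v in row)


def split_bggr(mosaic):
    # Split a BGGR mosaic (list of equal-length rows) into its four sites.
    # Each plane is half the width and half the height of the mosaic.
    height = len(mosaic)
    width = len(mosaic[0]) if height else 0
    if height % 2 or width % 2:
        raise ValueError("width and height must both be even")
    if any(len(row) != width for row in mosaic):
        raise ValueError("all rows must have the same width")
    return {
        "B": [row[0::2] for row in mosaic[0::2]],   # even rows, even columns
        "G1": [row[1::2] for row in mosaic[0::2]],  # even rows, odd columns
        "G2": [row[0::2] for row in mosaic[1::2]],  # odd rows, even columns
        "R": [row[1::2] for row in mosaic[1::2]],   # odd rows, odd columns
    }
```

For a 2x4 mosaic [[1, 2, 3, 4], [5, 6, 7, 8]], the blue sites are 1 and 3 and the red sites are 6 and 8. Checks like these are only a sanity test before handing the data to whatever writes the DNG.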

Unable to extract text and images from specific PDF

Can anyone please let me know how I can extract all the text and images from a PDF? I am able to extract images in some scenarios, for example from a PDF I created with a few lines of text and two PNG images using Google Docs. But I am unable to extract images from a sample PDF.
I have tried the following:
In Ruby:
1) The "pdf-reader" gem: it supports extraction of only a few image formats.
2) The "docsplit" gem: it can only extract text, not images.
Command-line utility:
1) The "pdfimages" tool: it supports extraction of only a few image formats.
Java library:
1) The "pdfbox" library: it supports extraction of only a few image formats.
1. Extracting text:
pdftotext -layout the.pdf -
Extract the text of all pages to <stdout>.
pdftotext -layout -nopgbrk the.pdf the.txt
Extract the text of all pages to the file the.txt, and don't insert those pesky ^L (form feed) characters that signify new pages.
pdftotext -f 3 -l 5 -layout the.pdf the-3-5.txt
Extract the text of pages 3 to 5 to the-3-5.txt.
2. Extracting images:
pdfimages -f 4 -l 7 -j the.pdf myprefix--
Extract all images from pages 4 through 7 as JPEGs (if possible!) and name them with the prefix myprefix---.
If extracting as JPEG is not possible, the images will be extracted as pure raster PPM or PGM files.
The latest versions of pdfimages (Poppler fork) let you specify -png (and more) to get all images as PNGs.
Using the latest version of pdfimages gives you these options:
$ pdfimages -h
pdfimages version 0.33.0
Copyright 2005-2015 The Poppler Developers - http://poppler.freedesktop.org
Copyright 1996-2011 Glyph & Cog, LLC
Usage: pdfimages [options] <PDF-file> <image-root>
-f <int> : first page to convert
-l <int> : last page to convert
-png : change the default output format to PNG
-tiff : change the default output format to TIFF
-j : write JPEG images as JPEG files
-jp2 : write JPEG2000 images as JP2 files
-jbig2 : write JBIG2 images as JBIG2 files
-ccitt : write CCITT images as CCITT files
-all : equivalent to -png -tiff -j -jp2 -jbig2 -ccitt
-list : print list of images instead of saving
-opw <string> : owner password (for encrypted files)
-upw <string> : user password (for encrypted files)
-p : include page numbers in output file names
-q : don't print any messages or errors
[....]
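If you want to drive these tools from a script rather than typing the commands, a minimal Python sketch follows. It only builds the argument lists for the exact flags shown above and runs them with subprocess; the helper names pdftotext_cmd, pdfimages_cmd and run are made up for illustration, and it assumes pdftotext and pdfimages (Poppler) are on your PATH.

```python
import subprocess


def pdftotext_cmd(pdf, out="-", first=None, last=None, nopgbrk=False):
    """Build a pdftotext command line; out="-" writes the text to stdout."""
    cmd = ["pdftotext"]
    if first is not None:
        cmd += ["-f", str(first)]
    if last is not None:
        cmd += ["-l", str(last)]
    cmd.append("-layout")  # keep the physical page layout, as in the examples
    if nopgbrk:
        cmd.append("-nopgbrk")  # no form feed (^L) between pages
    return cmd + [pdf, out]


def pdfimages_cmd(pdf, root, first=None, last=None, fmt="-j"):
    """Build a pdfimages command line; fmt is one of the flags listed above,
    e.g. "-j", "-png", "-tiff", "-all" or "-list"."""
    cmd = ["pdfimages"]
    if first is not None:
        cmd += ["-f", str(first)]
    if last is not None:
        cmd += ["-l", str(last)]
    if fmt:
        cmd.append(fmt)
    return cmd + [pdf, root]


def run(cmd):
    """Run the command (raises CalledProcessError on failure) and return stdout."""
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
```

For example, run(pdftotext_cmd("the.pdf")) returns the text of all pages, and run(pdfimages_cmd("the.pdf", "myprefix--", 4, 7)) reproduces the pdfimages example above.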
What other image formats do you want? If you need other formats, use ImageMagick's convert command.
Also, there are no other "formats" embedded in PDFs.
Basically, the only image-specific compression methods for images embedded in PDFs are:
JPEG (the /DCTDecode filter is then named as the decompression hint for the PDF viewer),
JBIG2 (/JBIG2Decode),
Fax compression (/CCITTFaxDecode) and
JPEG2000 (/JPXDecode).
All other images embedded in PDFs are basically pure raster data anyway (which pdfimages writes out as PPM or PGM), and their PDF-internal compression is one of the standard methods available for general stream compression:
/FlateDecode (ZIP/Deflate algorithm),
/LZWDecode (Lempel-Ziv-Welch algorithm) and
/RunLengthDecode.
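Because /FlateDecode is the zlib/Deflate format, the raw pixels of such an image can be recovered with Python's standard zlib module once you have the stream bytes. The sketch below round-trips a made-up 2x2 8-bit grayscale raster and wraps the result as a binary PGM. It assumes no /DecodeParms predictor is set (with a PNG predictor, every row carries an extra filter byte that must be undone first); flate_decode and to_pgm are hypothetical helper names.

```python
import zlib


def flate_decode(stream_bytes):
    # Undo /FlateDecode (zlib/Deflate). Assumes no /DecodeParms predictor.
    return zlib.decompress(stream_bytes)


def to_pgm(width, height, samples, maxval=255):
    # Binary PGM ("P5"): ASCII header, then the raw samples row by row.
    # With maxval <= 255 each sample is exactly one byte.
    header = f"P5\n{width} {height}\n{maxval}\n".encode("ascii")
    return header + samples


# A made-up 2x2 8-bit grayscale raster, one byte per pixel, row by row.
raw_pixels = bytes([0, 255, 128, 64])
encoded = zlib.compress(raw_pixels)  # what a PDF writer might store
pgm = to_pgm(2, 2, flate_decode(encoded))
```

This is essentially what "pure raster PPM or PGM" output means: the decompressed samples with a tiny header in front.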
Update
I only now had time to look at your linked sample PDF, sorry.
As @mkl wrote in his comment, what looks like an image isn't always an image in PDF technical parlance. For example, on page 7 of your PDF there is the (famous) tiger head. It is composed entirely of vector elements, which are placed inline in the page's /Contents stream.
The same is true for the depicted chess board.
I believe the tiger image was designed with some vector graphics program a few decades ago (Adobe Illustrator?), when that program had freshly been released, and then exported to EPS. In many cases a PDF viewer has no way to distinguish inline vector elements (which could be simple horizontal lines) from other content, unless these vector elements are "grouped" into an XObject (which pdfimages would not be able to extract either, but which would help with manual isolation and extraction).
These vector elements cannot be extracted automatically by any tool I know of (Free and Open Source Software, or gratis closed-source software).
A "real" image in PDF parlance is a rectangle of pixel data. These are the only type of images which can be extracted by a tool like pdfimages.
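Such a pixel rectangle is an image XObject described by its /Width, /Height, /BitsPerComponent and /ColorSpace entries, and each row of samples is padded to a whole byte. As a rough sketch, the size of the decoded sample data can be computed as follows. It covers only the three device colour spaces; indexed, ICC-based and other colour spaces are deliberately left out, and raw_image_size is a hypothetical helper.

```python
# Number of colour components for the device colour spaces only (assumption:
# other colour spaces such as /Indexed or /ICCBased are not handled here).
COMPONENTS = {"/DeviceGray": 1, "/DeviceRGB": 3, "/DeviceCMYK": 4}


def raw_image_size(width, height, bits_per_component, color_space):
    """Bytes of decoded sample data for an image XObject."""
    n = COMPONENTS[color_space]
    # Each row starts on a byte boundary, so round its bit count up to bytes.
    bytes_per_row = (width * n * bits_per_component + 7) // 8
    return bytes_per_row * height
```

For instance, a 10x3 one-bit grayscale image needs 2 bytes per row (10 bits rounded up), i.e. 6 bytes in total. Vector artwork like the tiger has no such rectangle of samples, which is why pdfimages cannot see it.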

What is the format of a binary image, and how is it different from JPG and PNG images?

I searched the internet for the basic formats of image files (e.g. .jpg, .png, .gif), since there is a specific format for .doc, .pdf, etc., but didn't find anything relevant. Today I also came across a .bin image format; BIN signifies that the image is in binary format. So, what is the internal format of a .jpg image file, and how is it different from the .bin (binary) format, given that everything is basically saved in binary form? And how is a BITMAP image different from the .jpg format?
If you open the files in Notepad (or rename a .jpg or an .exe to .txt), you will see that the first few bytes define what type of file it is. I have never looked into where the "standard" is, but you will see that all JPEGs start with a specific byte sequence and all EXEs start with a specific byte sequence, no matter what the content.
Also, JPEG is a lossy compressed image format (historically the subject of some patent licensing claims), and BMP is Microsoft Windows' bitmap format (with little or no compression, I believe). PNG is an open, patent-free format with lossless compression, and technically your files do not need to be "licensed" to convert them to JPEG. It is similar to .MP3 vs .OGG for music.
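The "first bytes" mentioned above are usually called magic numbers or file signatures. Here is a minimal sketch that identifies a file from its first bytes; the table covers only a few common formats, and sniff and sniff_file are made-up helper names. A file named .bin could hold anything, so only its actual first bytes tell you what it contains.

```python
# A few well-known file signatures (magic numbers); far from complete.
SIGNATURES = [
    (b"\xff\xd8\xff", "JPEG"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"BM", "BMP"),
    (b"II*\x00", "TIFF (little-endian)"),
    (b"MM\x00*", "TIFF (big-endian)"),
    (b"MZ", "Windows executable (EXE)"),
]


def sniff(header):
    """Return a format name for the given leading bytes, or "unknown"."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return "unknown"


def sniff_file(path):
    # 16 bytes is enough for every signature in the table above.
    with open(path, "rb") as f:
        return sniff(f.read(16))
```

The file extension plays no part here: a JPEG renamed to .bin or .txt is still reported as JPEG.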

Ghostscript Stamp Image on PDF

Is there any way to stamp or overlay a TIFF image on an existing PDF file and output the result using Ghostscript?
I have two PDFs that I want to merge into one resulting PDF, with one on top of the other, using Ghostscript. I want to know whether this can be done and how, or whether it would work with one PDF converted to a TIFF image placed on top of the base PDF.
Can Ghostscript make this stamp using layers in the PDF?
Thank you for your answers
The pdfwrite device in Ghostscript doesn't really support layers, so you can't use that. Also, it's unclear why you think layers would help.
TIFF isn't part of PostScript (or PDF), so you can't directly read a TIFF file into GS. I have elsewhere posted a PostScript program which reads TIFF files and renders them for output. You could use that to read a TIFF file.
However, you would have to mess about with either the PDF interpreter or a custom EndPage procedure in order to read and render the TIFF file. And unless you take specific kinds of action, it will be opaque, which may well not be what you want.
The Ghostscript PDF interpreter doesn't really lend itself to this kind of manipulation. Have you considered using pdftk instead?
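For the two-PDF case, pdftk's stamp operation overlays the first page of one PDF onto every page of another (multistamp pairs pages one to one, and background/multibackground place it underneath instead). A minimal Python sketch that builds and runs such a command is below; pdftk_overlay_cmd is a hypothetical helper, it assumes pdftk is installed and on PATH, and a TIFF would first have to be turned into a PDF by some other tool.

```python
import subprocess

# pdftk operations that combine a base PDF with an overlay/underlay PDF.
MODES = ("stamp", "multistamp", "background", "multibackground")


def pdftk_overlay_cmd(base_pdf, overlay_pdf, out_pdf, mode="stamp"):
    """Build: pdftk <base> <mode> <overlay> output <out>."""
    if mode not in MODES:
        raise ValueError(f"mode must be one of {MODES}")
    return ["pdftk", base_pdf, mode, overlay_pdf, "output", out_pdf]


def overlay(base_pdf, overlay_pdf, out_pdf, mode="stamp"):
    # Raises CalledProcessError if pdftk reports an error.
    subprocess.run(pdftk_overlay_cmd(base_pdf, overlay_pdf, out_pdf, mode), check=True)
```

Note that, as with the Ghostscript approach, whatever is stamped on top will hide the content below it wherever the stamp page is not transparent.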
