As per my understanding,
1. .eps format images are vector images.
2. When we draw something in Word (like a flowchart), it is stored as a vector image.
I am almost sure about the first, not sure about the second. Please correct me if I am wrong.
Assuming these two things, when a LaTeX file (where .eps images are inserted) or a Word file (that contains vector images) is converted into PDF, do the images get converted into raster images?
Also, I think PDFBox/xpdf can only extract raster images from the PDF (as they are embedded as XObjects), not vector images. Is that understanding correct? This question on Stack Overflow is related, but has not been answered yet.
Your point 1 is incorrect: EPS files are PostScript programs, and they may contain vector information, text, image data, or all of the above.
Point 2: in PDF there isn't a 'vector image'. An image means a bitmap and therefore cannot be vector.
If you convert a PostScript program to a PDF file, then the result depends entirely on the conversion program you use. In general vectors will be retained as vectors, and text as text. However it is entirely possible that an application might render the entire PostScript program and insert the result as an image in the PDF.
So the answer to your first question ("do the images get converted into raster images") is 'maybe, but probably not'.
I'm afraid I have no idea about the capabilities of PDFBox/xpdf, but since collections of vectors may not be arranged as 'images' in any atomic fashion (they could be held as Form XObjects, or Patterns), there isn't any obvious way to know when to stop extracting. And what format would you store the result in anyway?
Related
I wanted to scan book pages and combine the images into a PDF "ebook" (just for me), but the file sizes get really huge. Even .jpg resulted in a PDF file of 60 MB+ in size.
Do you have any idea how I can compress it any further? I.e. which file format I could choose for this specific purpose? (The book contains pictures and written text.)
Thank you for your help.
I tried to save it as .jpg and other file formats like .png, but it didn't get small enough for the file to be easily handled without losing too much resolution.
Images are expensive things.
Ignoring compression, you're looking at 3 bytes of data per pixel for a colour image.
If you want to keep images, you could reduce this by turning your images into greyscale. That reduces it to 1 byte per pixel (again ignoring compression).
Or you could turn it into black and white, which would be 1 bit per pixel.
Or, alternatively, you could use OCR to translate your image into actual text which is a much more efficient way of storing books.
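To put rough numbers on the per-pixel figures above, here is a small Python sketch. The page dimensions (2480 x 3508 pixels, roughly an A4 page at 300 dpi) and the helper name `raw_page_bytes` are assumptions chosen for illustration, and real files will be smaller once compression is applied.

```python
def raw_page_bytes(width_px, height_px, bits_per_pixel):
    """Uncompressed size in bytes of one page, rounded up to whole bytes."""
    total_bits = width_px * height_px * bits_per_pixel
    return (total_bits + 7) // 8


# Assumed example page: roughly A4 scanned at 300 dpi.
WIDTH_PX = 2480
HEIGHT_PX = 3508

for label, bpp in [
    ("24-bit colour", 24),
    ("8-bit greyscale", 8),
    ("1-bit black and white", 1),
]:
    size = raw_page_bytes(WIDTH_PX, HEIGHT_PX, bpp)
    print(f"{label}: {size / 1_000_000:.1f} MB per page")
```

For this assumed page size that works out to roughly 26 MB, 8.7 MB and 1.1 MB per page before compression, which is why dropping colour pays off so quickly for text-heavy pages.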
I've been trying to find out why everybody converts an image to grayscale before processing.
For example, this website with instructions teaching people how to build a simple scanning program converts photo to greyscale first before passing commands to manipulate the image itself.
In the second example, this thread on Stack Overflow shows a person also converting the image to grayscale before extracting text from it.
Does this process make the image easier to manipulate? Or does it give better results when extracting text? If so, shouldn't a binary image give the best result in the case of extracting text?
More often than not, grayscale has all the relevant information to complete a particular task. So reducing the image to grayscale greatly simplifies calculations and removes redundancies.
A binary image is great too, but it sacrifices too much information to be useful in many cases. And most libraries work with a minimum of 8 bits per pixel anyway, so a true binary data structure is rarely useful.
Imagine having to create a program to recognize text on paper. Having a color image doesn't help you read the text any better. The text can be in various colors, but you can read it even if it's in black and white. You can argue that a binary image should give the same performance, and that is true IF there is no noise such as a shadow on the paper.
Once noise elements exist in the image, you need more information to separate text from noise, and that is when grayscale is useful.
Moreover, the most used and reliable information for advanced image processing is edges and textures, both of which can be obtained from a grayscale image.
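The shadow argument can be shown with a tiny pure-Python sketch. The function names, the Rec. 601 luminance weights and the threshold of 128 are assumptions picked for illustration; real libraries offer their own conversions and adaptive thresholds.

```python
def to_gray(rgb_rows):
    """Convert rows of (r, g, b) tuples to rows of 0-255 luminance values.

    Uses the common Rec. 601 weights in integer arithmetic.
    """
    return [
        [(299 * r + 587 * g + 114 * b) // 1000 for (r, g, b) in row]
        for row in rgb_rows
    ]


def to_binary(gray_rows, threshold=128):
    """Map each gray value to 1 (light) or 0 (dark) with a fixed threshold."""
    return [[1 if v >= threshold else 0 for v in row] for row in gray_rows]


# One row of three pixels: dark ink, paper in shadow, brightly lit paper.
row = [[(30, 30, 30), (110, 110, 110), (250, 250, 250)]]

gray = to_gray(row)      # [[30, 110, 250]]
binary = to_binary(gray)  # [[0, 0, 1]]
print(gray, binary)
```

In the binary result the ink and the shadowed paper are both 0 and can no longer be told apart, while the grayscale values (30 versus 110) still leave room for a smarter, locally adapted threshold.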
I would like to know why we need to decode let's say a png to a bitmap in order to show the image.
Why not just show the png like that (encoded)?
I'm asking here a moron type of question on purpose. It's clear to me it's impossible to show an encoded image just like that but I want to know why, and how an image is shown on a screen because it's easy just to do :
canvas.drawBitmap(((AndroidImage)Image).bitmap, x, y, null);
I want to understand all of it. I'm guessing we need to show every pixel one by one, but I want more details.
It's easy to know how to do, it's a bit harder to understand why.
If someone has a course/tuto/article/explanation that explains it... I would appreciate
Thanks in advance
PS : Please don't respond "you need to decode/convert png to bitmap" I know that... And that's not my question
There are lots of reasons. There is not really a direct relation between 'a value in a file' and 'a pixel on a screen'.
You need to know the width and height of the bitmap. You cannot infer this from the file size -- it has to be stored somewhere inside the image file. (Or anywhere else. Point is, you have to know its size.)
You need to know the bit depth and color model of the bitmap. You cannot meaningfully copy an 8-bit indexed image directly onto a screen that accepts 32-bit BGR ordering with an unused byte, for example.
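As a rough illustration of the bit-depth point, this Python sketch expands 8-bit indexed pixels into the 32-bit "B, G, R, unused" layout mentioned above. The function name `indexed_to_bgrx` and the two-entry palette are made up for the example; the actual layout a given screen expects may differ.

```python
def indexed_to_bgrx(indices, palette):
    """Expand 8-bit palette indices into 32-bit pixels laid out as B, G, R, unused."""
    out = bytearray()
    for i in indices:
        r, g, b = palette[i]        # look the colour up in the palette
        out += bytes((b, g, r, 0))  # reorder to BGR and add the unused byte
    return bytes(out)


# Hypothetical palette: index 0 is red, index 1 is green.
palette = [(255, 0, 0), (0, 255, 0)]
pixels = indexed_to_bgrx(bytes([0, 1]), palette)
print(pixels.hex(" "))
```

Two bytes of file data become eight bytes of screen data here, and none of the output bytes equals an input byte, which is the mismatch the answer is pointing at.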
Your example, the PNG file format, specifies that all image data is compressed. This is for a sane reason: the PNG format was designed for use on web pages, in a time period where every byte still counted. But even the lowly simple BMP file format uses a very specific form of 'encoding': in its 24-bit format, every line consists of sets of BGR values for each pixel and is padded at the end with enough bytes to make its total length evenly divisible by 4.
JPEG uses an even more advanced encoding scheme (which is too difficult to explain in a few short words) so it can compress images even more. The advanced encoding scheme allows far more compression than regular methods (which in turn means there is only the tiniest relation between 'values in the file' and 'pixels on the screen').
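To make the BMP row layout described above concrete, here is a minimal Python sketch of the padding rule and of reading one 24-bit row. The function names are hypothetical, and the sketch ignores the BMP header and the fact that rows are usually stored bottom-up.

```python
def bmp_row_stride(width_px, bits_per_pixel=24):
    """Bytes per stored row: pixel data rounded up to a multiple of 4 bytes."""
    return ((width_px * bits_per_pixel + 31) // 32) * 4


def unpack_bmp_row(row_bytes, width_px):
    """Return (r, g, b) tuples from one padded 24-bit row stored in B, G, R order."""
    return [
        (row_bytes[3 * x + 2], row_bytes[3 * x + 1], row_bytes[3 * x])
        for x in range(width_px)
    ]


# A 2-pixel row needs 6 data bytes, so it is padded with 2 bytes to reach 8.
row = bytes([1, 2, 3, 4, 5, 6, 0, 0])
print(bmp_row_stride(2), unpack_bmp_row(row, 2))
```

Even in this "uncompressed" format, a reader that copied the file bytes straight to the screen would get the channels in the wrong order and drift by the padding on every row.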
I'm working on a software project in which I have to compare a set of 'input' images against another 'source' set of images and find out if there is a match between any of them. The source images cannot be edited/modified in any way; the input images can be scaled/cropped in order to find a match. The images can be in BMP, JPEG, GIF, PNG or TIFF format and of any dimensions.
A constraint: I'm not allowed to use any external libraries. ImageMagick is an exception and can be used.
I intend to use Java/Python. The software is purely command-line based.
I was reading on SO and some common image comparing algorithms. I'm planning to take 2 approaches.
1. I could use Histograms/buckets to find out the RGB values of the 2 images being compared.
2. Use SIFT/SURF to find keypoint descriptors, compute the Euclidean distance between them, and output the result based on the resultant distance.
The 2 images in comparison can be in different formats. An intuitive thought is that before analysis/comparison, the 2 images must be converted to a common format. I reasoned that the image should be converted to the one with lesser quality, e.g. if the 2 input images are BMP and JPEG, convert the BMP to JPEG. This can be thought of as a pre-processing step.
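As a sketch of what approach 1 could look like without external libraries, the Python below buckets RGB values and compares two normalised histograms by intersection. It assumes the pixels have already been decoded into (r, g, b) tuples by some other step, and the function names and the choice of 4 bins per channel are illustrative only.

```python
def rgb_histogram(pixels, bins_per_channel=4):
    """Count pixels per (r, g, b) bucket and normalise so the counts sum to 1.

    Assumes bins_per_channel divides 256 evenly.
    """
    step = 256 // bins_per_channel
    hist = [0.0] * (bins_per_channel ** 3)
    for r, g, b in pixels:
        idx = ((r // step) * bins_per_channel + (g // step)) * bins_per_channel + (b // step)
        hist[idx] += 1
    total = len(pixels)
    return [c / total for c in hist] if total else hist


def histogram_intersection(h1, h2):
    """Similarity in [0, 1]: 1 means identical colour distributions."""
    return sum(min(a, b) for a, b in zip(h1, h2))


red = [(255, 0, 0)] * 4
blue = [(0, 0, 255)] * 4
mixed = [(255, 0, 0)] * 2 + [(0, 0, 255)] * 2

print(histogram_intersection(rgb_histogram(red), rgb_histogram(mixed)))
```

Because the histograms are normalised, images of different dimensions can be compared directly. Note that a histogram ignores where the colours sit in the image, so it is better suited to a coarse first filter than to deciding that a pattern appears inside a source image.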
My question:
Is image conversion to a common format required? Can 2 images of different formats be compared? IF they have to be converted before comparison, is my assumption of converting from higher quality (BMP) to lower (JPEG) correct? It'd also be helpful if someone could suggest some algorithms for image conversion.
EDIT
A match is said to be found if the pattern image is found in the source image.
Say, for example, the source image consists of a football field with one player. If the pattern image contains the player EXACTLY as he is in the source image, then it's a match.
No, conversion to a common format on disk is not required, and likely not helpful. If you extract feature descriptors from an image (SIFT/SURF, for example), it matters much less how the original images were stored on disk. The feature descriptors should be invariant to small compression artifacts.
A bit more...
Suppose you have a BMP that is an image of object X in your source dataset.
Then, in your input/query dataset, you have another image of object X, but it has been saved as a JPEG.
You have no idea what noise was introduced in the process that produced either of these images. There are lighting differences, atmospheric effects, lens effects, sensor noise, tone-mapping and gamut-mapping. Some of these vary from image to image, others vary from camera to camera. All this is done before the image even gets saved to storage in the camera. Yes, there are also JPEG compression artifacts, but assuming the BMP is of "higher" quality and then degrading it through JPEG compression will not help. Perhaps the BMP has even gone through JPEG compression before being saved as a BMP.
I'm trying to develop a mobile application, and I'm wondering what the easiest way is to convert an image into a text file and then be able to recreate it later in memory from said text. The image(s) in question will contain no more than 16 or so colors, so it would work out fine.
Basically, brute-forcing this solution would require me to save each individual pixel's color data into a file. However, this would result in a HUGE file. I know there's a better way - like, if there's a huge portion of the image that consists of the same color, breaking up the area into smaller squares and rectangles and saving their coordinates and size to file.
Here's an example. The image is supposed to be just black/white. The big color boxes represent theoretical 'data points' in the outputted text file. These boxes would really state their origin, size, and what color they should be.
E.g., top box has an origin of 0,0, a size of 359,48, and it represents the color black.
Saved in a text file, the data would be 0,0,359,48,0.
What kind of algorithm would this be?
NOTE: The SDK that I am using cannot return a pixel's color from an X,Y coordinate. However, I can load external information into the program from a text file and manipulate it that way. This data that I need to export to a text file will be from a different utility that will have the capability to get a pixel's color from X,Y coordinates.
EDIT: Added a picture
EDIT2: Added constraints
Could you elaborate on why you want to save an image (or its parts) as plain text? Can't you use a binary representation instead? Also, if images typically have lots of contiguous runs of pixels of same color, you may want to use the so-called run-length encoding (RLE). Alternatively, one of Lempel-Ziv-something compression algorithms could be used (LZ77, LZ78, LZW).
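A minimal sketch of the RLE idea, kept as plain text since that is what the question asks for. The `count:colour` token format and the function names are assumptions made up for this example, and the pixels are taken to be a flat, row-by-row list of small integer colour indices.

```python
def rle_encode(pixels):
    """Encode a flat sequence of colour indices as comma-separated 'count:colour' runs."""
    runs = []
    for p in pixels:
        if runs and runs[-1][1] == p:
            runs[-1][0] += 1       # extend the current run
        else:
            runs.append([1, p])    # start a new run
    return ",".join(f"{n}:{c}" for n, c in runs)


def rle_decode(text):
    """Rebuild the flat pixel list from the text produced by rle_encode."""
    pixels = []
    if not text:
        return pixels
    for token in text.split(","):
        n, c = token.split(":")
        pixels.extend([int(c)] * int(n))
    return pixels


encoded = rle_encode([0, 0, 0, 1, 1, 0])
print(encoded)              # 3:0,2:1,1:0
print(rle_decode(encoded))  # [0, 0, 0, 1, 1, 0]
```

To rebuild a 2D image the decoder also needs the image width, which would have to be stored alongside the runs, for instance on the first line of the text file.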
Compress the image into a compressed format (e.g. JPEG, PNG, GIF, etc) and then save it as a .txt file or whatever. To recreate the image, just read in the file into your program using whatever library function suits your particular needs.
If it's necessary that the .txt file have some textual meaning, then you may be in some trouble.
In computer science, a spatial index can be used to recursively subdivide a plane into 4 tiles. If the cells have the same size, it looks like a quadtree. If you want to subdivide a plane into a pattern (of colors), you can use this tiling idea and dynamically change the size of the cell. A good place to start is a Z-order curve or a Hilbert curve.
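Here is a rough Python sketch of the quadtree-style subdivision, emitting rectangles in the `x,y,width,height,colour` form used in the question. The function name `quadtree_rects` and the sample grid are assumptions for illustration, the grid is assumed to be non-empty, and the sketch does not implement the Z-order or Hilbert curve ordering.

```python
def quadtree_rects(grid, x=0, y=0, w=None, h=None):
    """Yield (x, y, w, h, colour) for each uniformly coloured block of the grid.

    A region that is not uniform is split into four quadrants, recursively.
    """
    if w is None:
        h, w = len(grid), len(grid[0])
    if w == 0 or h == 0:
        return  # empty quadrant from splitting a 1-pixel-wide or 1-pixel-high region
    first = grid[y][x]
    if all(grid[j][i] == first for j in range(y, y + h) for i in range(x, x + w)):
        yield (x, y, w, h, first)
        return
    hw, hh = (w + 1) // 2, (h + 1) // 2
    yield from quadtree_rects(grid, x, y, hw, hh)                    # top-left
    yield from quadtree_rects(grid, x + hw, y, w - hw, hh)           # top-right
    yield from quadtree_rects(grid, x, y + hh, hw, h - hh)           # bottom-left
    yield from quadtree_rects(grid, x + hw, y + hh, w - hw, h - hh)  # bottom-right


# Hypothetical 4x4 black/white image (0 = black, 1 = white).
grid = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
]
for rect in quadtree_rects(grid):
    print(",".join(map(str, rect)))
```

This grid comes out as 7 lines instead of 16 pixels. A plain quadtree does not merge neighbouring blocks of the same colour (the two black 2x2 blocks on the left stay separate), so it will not always find the largest possible rectangles.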