I started to work on a PNG encoding/decoding library for learning purposes so I want to implement every part of it by hand.
I got pretty far with it, but now I'm a bit stuck. Here are the things I have successfully implemented already:
I can load a PNG binary and go through its bytes
I can read the signature and the IHDR chunk for metadata
I can read the IDAT chunks and concatenate the image data into a buffer
I can read and interpret the zlib headers from the above mentioned image data
And here is where I got stuck. I vaguely know the steps from here which are:
Extract the zlib compressed data according to its headers
Figure out the filtering methods used and "undo" them to get the raw data
If everything went correctly, now I have raw RGB data in the form of [<R of 1st pixel>, <G of 1st pixel>, <B of 1st pixel>, <R of 2nd pixel>, <G of 2nd pixel>, etc...]
My questions are:
Is there any easy-to-understand implementation (maybe with examples) or guide on the zlib extraction? I found the official specifications hard to understand.
Can there be multiple filtering methods used in the same file? How to figure these out? How to figure out the "borders" of these differently filtered parts?
Is my understanding of how the final data will look correct? What about the alpha channel, or when a palette is used?
Yes. You can look at puff.c, which is an inflate implementation written with the express purpose of being a guide to how to decode a deflate stream.
Each line of the image can use a different filter, which is specified in the first byte of the decompressed line.
Yes. If you get it all right, you will have a sequence of pixels, where each pixel is a grayscale value (G), grayscale with an alpha channel (GA), RGB (red-green-blue, in that order), or RGBA. When a palette is used, each pixel is instead an index into the table stored in the PLTE chunk.
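To make the per-line filter byte concrete, here is a minimal sketch in Python of undoing the five PNG filter types on already-inflated data. It assumes a non-interlaced image with a bit depth of at least 8, and that bytes_per_pixel has been worked out from IHDR (for example 3 for 8-bit RGB). The function names are made up for illustration. While writing your own inflate, Python's zlib.decompress can serve as a reference to compare your output against.

```python
def paeth(a, b, c):
    """Paeth predictor: pick whichever of left (a), up (b), upper-left (c) is closest to a + b - c."""
    p = a + b - c
    pa, pb, pc = abs(p - a), abs(p - b), abs(p - c)
    if pa <= pb and pa <= pc:
        return a
    return b if pb <= pc else c


def unfilter(data, width, bytes_per_pixel):
    """Undo the scanline filters of inflated PNG data (non-interlaced, bit depth >= 8 assumed)."""
    stride = width * bytes_per_pixel
    prev = bytearray(stride)          # the line "above" the first line is all zeros
    out = bytearray()
    pos = 0
    while pos < len(data):
        ftype = data[pos]             # first byte of every line is its filter type
        line = bytearray(data[pos + 1:pos + 1 + stride])
        pos += 1 + stride
        for i in range(len(line)):
            a = line[i - bytes_per_pixel] if i >= bytes_per_pixel else 0   # left
            b = prev[i]                                                    # up
            c = prev[i - bytes_per_pixel] if i >= bytes_per_pixel else 0   # upper-left
            if ftype == 0:            # None
                pred = 0
            elif ftype == 1:          # Sub
                pred = a
            elif ftype == 2:          # Up
                pred = b
            elif ftype == 3:          # Average
                pred = (a + b) // 2
            elif ftype == 4:          # Paeth
                pred = paeth(a, b, c)
            else:
                raise ValueError("unknown filter type %d" % ftype)
            line[i] = (line[i] + pred) & 0xFF
        out += line
        prev = line
    return bytes(out)
```

Because the filter type is read again at the start of every line, there are no "borders" to detect: each line is exactly 1 + width * bytes_per_pixel bytes long.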
I am trying to automate the cleanup process of a large amount of scanned films. I have all the images in 48-bit RGBI TIFF files (RGB + Infrared), and I can use the infrared channel to create masks for dust removal. I wonder if there is any decent open source implementation of in-painting that I can use to achieve this (all the other software I use for batch processing are open source libraries I access through Ruby interfaces).
My first choice was ImageMagick, but I couldn't find any advanced in-painting option in it (maybe I am wrong, though). I have heard this can be done with MagickWand libraries, but I haven't been able to find a concrete example yet.
I have also had a look at OpenCV, but it seems that OpenCV's in-paint method accepts only 8-bit-per-channel images, while I must preserve the 16.
Is there any other library, or even an interesting code snippet I am not aware of? Any help is appreciated.
Samples:
Full Picture
IR Channel
Dust and scratch mask
What I want to remove automatically
What I consider too large to remove with no user intervention
You can also download the original TIFF file here. It contains two alpha channels. One is the original IR channel, and the other one is the IR channel already prepared for dust removal.
I have had an attempt at this, and can go some way to achieving some of your objectives... I can read in your 16-bit image, detect the dust pixels using the IR channel data, and replace them, and write out the result without any alpha channel and all the while preserving your 16-bit data.
The part that is lacking is the replacement algorithm - I have just propagated the next pixel from above. You, or someone cleverer than me on Stack Overflow, may be able to implement a better algorithm but this may be a start.
It is in Perl, but I guess it could be readily converted to another language. Here is the code:
#!/usr/bin/perl
use strict;
use warnings;
use Image::Magick;
# Open the input image
my $image = Image::Magick->new;
$image->ReadImage("pa.tiff");
# Get its width and height
my ($width,$height)=$image->Get('width','height');
# Create output image of matching size
my $out= $image->Clone();
# Remove alpha channel from output image
$out->Set(alpha=>'off');
# Load Red, Green, Blue and Alpha channels of input image into arrays, values normalised to 1.0
my (@R,@G,@B,@A);
for my $y (0..($height-1)){
   my $j=0;
   my @RGBA=$image->GetPixels(map=>'RGBA',height=>1,width=>$width,x=>0,y=>$y,normalize=>1);
   for my $x (0..($width-1)){
      $R[$x][$y]=$RGBA[$j++];
      $G[$x][$y]=$RGBA[$j++];
      $B[$x][$y]=$RGBA[$j++];
      $A[$x][$y]=$RGBA[$j++];
   }
}
# Now process image
my @colours;
# Start at row 1: row 0 has no pixel above it to copy from
for my $y (1..($height-1)){
   for my $x (0..($width-1)){
      # See if IR channel says this is dust, and if so, replace with pixel above
      if($A[$x][$y]<0.01){
         $colours[0]=$R[$x][$y-1];
         $colours[1]=$G[$x][$y-1];
         $colours[2]=$B[$x][$y-1];
         # Update the arrays too, so a dust pixel directly below copies the repaired value
         $R[$x][$y]=$R[$x][$y-1];
         $G[$x][$y]=$G[$x][$y-1];
         $B[$x][$y]=$B[$x][$y-1];
         $out->SetPixel(x=>$x,y=>$y,color=>\@colours);
      }
   }
}
$out->write(filename=>'out.tif',compression=>'lzw');
The result looks like this, but I had to make it a JPEG just to fit on SO:
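As one possible step up from copying the pixel above, here is a rough sketch in Python of filling each dust pixel with the mean of its clean neighbours, working inward from the edge of each speck. It is not true inpainting and is not part of the Perl code above. The function name fill_dust and the list-of-rows layout are assumptions for illustration, and the mask is assumed to be already derived from the IR channel (True where dust). All arithmetic is on plain integers, so 16-bit values are preserved.

```python
def fill_dust(channel, mask):
    """Replace masked pixels with the rounded mean of their unmasked 8-neighbours.

    channel: list of rows of ints (e.g. 0..65535 for 16-bit data)
    mask:    list of rows of bools, True where the IR channel marks dust
    Passes repeat so that larger specks are filled from the edge inward.
    """
    height, width = len(channel), len(channel[0])
    out = [row[:] for row in channel]
    todo = [row[:] for row in mask]
    while any(any(row) for row in todo):
        updates = []
        for y in range(height):
            for x in range(width):
                if not todo[y][x]:
                    continue
                # Collect the clean pixels in the 3x3 window around (x, y)
                vals = [out[ny][nx]
                        for ny in range(max(0, y - 1), min(height, y + 2))
                        for nx in range(max(0, x - 1), min(width, x + 2))
                        if not todo[ny][nx]]
                if vals:
                    updates.append((x, y, (sum(vals) + len(vals) // 2) // len(vals)))
        if not updates:
            break  # nothing clean to borrow from (whole image masked)
        # Apply after the scan so every pixel in a pass sees the same input
        for x, y, value in updates:
            out[y][x] = value
            todo[y][x] = False
    return out
```

You would call it once per colour channel with the same mask. Large scratches will come out visibly blurred, which matches the question's point that big defects need user intervention.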
I cannot comment, so I write an answer.
I suggest using G'Mic with the filter "inpaint".
You should load the image, take the IR image and convert it to b/w, then tell the filter inpaint to fill the areas marked in the IR image.
OpenCV has a good algorithm for image inpainting, which is basically what you were searching for.
https://docs.opencv.org/3.3.1/df/d3d/tutorial_py_inpainting.html
If that does not help, the only remaining option is a neural-network-based algorithm.
I have a binary file that starts off with some data. After this data, the file has a JPEG image embedded in it. After the image, the file continues with some other data. I wish to use the image as an OpenGL texture.
At the moment, the only method I know of creating an OpenGL texture with Magick is to read the image file with Magick, write it into a blob, and upload the blob.data() to OpenGL (from this link: http://www.imagemagick.org/pipermail/magick-developers/2005-July/002276.html).
I am trying to use Magick++, but it only allows me to specify a filename, not a C-style file handle or file stream. Are my only options the following?
Save the JPEG image portion in the binary file as a separate temporary file and get Magick++ to read that image. I don't wish to do this as writing to disk will slow my program down.
Read the image portion into an array, create a Blob with the array as its data, and then read the Blob to obtain an image. I don't wish to do this either because after I get the image, I will need to again write the image data to another blob, and the entire code becomes unnecessarily long.
Switch to another library like DevIL, which offers support for what I want. Unfortunately, DevIL is not as feature rich as Magick.
I also looked into the core C API for Magick, where I can specify a file handle, but the documentation says that the file handle is closed by Magick after the image is read, which is definitely not good for my program (it is going to be pretty ugly to get the rest of my program to reopen the binary file to continue its processing).
If there is a way to provide Magick with custom I/O routines, or better still, a cleaner way of using Magick with OpenGL, please enlighten me!
The next release of GraphicsMagick does not close the input file handle after the image is read. You can try the latest development snapshot.
You could consider using mmap() (memory mapped file) to access the data and treat it as an in-memory BLOB using Magick++. The main issue with this is that you might not know how long the JPEG data is, which matters if you need to access the data that follows the embedded image.
It is trivial to add FILE* support to Magick++. The only reason I did not do so was for philosophical reasons (C++ vs C).
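On the "how long is the data" problem with the mmap() suggestion: one way around it is to walk the JPEG's marker segments until the end-of-image marker. Below is a hedged sketch in Python rather than C++, just to show the logic. The function names jpeg_length and read_embedded_jpeg are invented here, and it assumes the offset of the JPEG's start within the file is already known. Simply searching for the bytes FF D9 is not reliable, because segment payloads (embedded thumbnails, for instance) can contain them.

```python
import mmap


def jpeg_length(buf, start):
    """Return the byte length of the JPEG that begins at buf[start], by walking its markers."""
    if buf[start:start + 2] != b"\xff\xd8":
        raise ValueError("no SOI marker at the given offset")
    pos = start + 2
    while True:
        if buf[pos] != 0xFF:
            raise ValueError("expected a marker at offset %d" % pos)
        marker = buf[pos + 1]
        if marker == 0xFF:                  # fill byte before a marker, skip it
            pos += 1
            continue
        pos += 2
        if marker == 0xD9:                  # EOI: end of image
            return pos - start
        if marker == 0x01 or 0xD0 <= marker <= 0xD7:
            continue                        # standalone markers carry no length field
        # Every other segment starts with a 2-byte big-endian length (including itself)
        pos += int.from_bytes(buf[pos:pos + 2], "big")
        if marker == 0xDA:                  # SOS: entropy-coded data follows the header
            while True:
                pos = buf.find(b"\xff", pos)
                if pos < 0:
                    raise ValueError("truncated JPEG data")
                nxt = buf[pos + 1]
                if nxt == 0x00 or 0xD0 <= nxt <= 0xD7:
                    pos += 2                # stuffed zero byte or restart marker
                    continue
                break                       # a real marker, handled by the outer loop


def read_embedded_jpeg(path, offset):
    """Memory-map the file and return (jpeg_bytes, offset_of_the_data_after_the_image)."""
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        length = jpeg_length(mm, offset)
        return mm[offset:offset + length], offset + length
```

The returned bytes are what you would hand to a Blob, and the second value tells the rest of the program where to resume reading.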
I have an idea for an app that would take some flash content which contains graphics and images like various geometric shapes and polygons and some random images and convert them to PDF.
Also, since I envision this app being used by multiple users, I want this process to be quick and scalable. One possible solution I could think of is to have a small Flash client with the capability of assembling the above-mentioned graphics and images, generate some sort of XML, and send it to a server running a Java process which could render the PDF using iText.
I was wondering what are the other possible ways to do it or the best practices. Technology isn't an issue; open source or commercial.
I understand that image uploads etc. will take a variable amount of time, so assume that the images are readily available. Here are the criteria for what I am looking for in a PDF rendering solution:
No constraint on the Flash client because of the PDF render engine.
Scalable to multiple users
Speed and Efficiency
Least amount of serialization / deserialization
I would appreciate if you could share your tech stack idea. Thanks a lot!
PS: I would appreciate it if you don't get bogged down by my Flash >> XML >> Java approach.
I believe it to be one of the many approaches that could be taken.
If generating the PDF in the browser using Flash is an option, then consider using AlivePdf. If not, then check out XSL-FO; we use it for server-side conversion to PDF.
I believe iText generates PDFs in Java code. It may or may not use XML as its data source; POJOs will do just as well.
Another way is XSL-FO. It requires an XML data source and an XSLT stylesheet that transforms the XML into XSL-FO. Apache's Xalan (or any other XSLT library) can do the transform for you, and an FO processor such as Apache FOP then renders the result as a PDF.
"Quick" and "scalable" may require more than these. Uploading a lot of images is a process that has its own timescale and optimizations that have nothing to do with PDFs.
There's pdflib for PHP, and FPDF (also for PHP).
So you're also willing to consider other clients? It sounds like you've got a kids' drawing app and want to generate something that'll preserve the state of their drawing at the time.
Let's face it, XML isn't that efficient. That's not its purpose. It's both machine and human readable, validatable, etc.
Instead, how about a <Canvas> based web page that submitted the state of that canvas to the server in JSON (fewer bytes, and less work to build them). The server can then work in whatever the hell library/language it wants. Lots of JSON->my-language libraries floating around out there.
Your choice in PDF libraries is then limited only by what you have installed on your server. You also said you wanted to do as little reading/writing as possible.
The most efficient possible setup would be to have a read-only partial PDF already loaded into memory tailored to minimize the impact of canvas changes (including images). Each session would dupe that partial PDF, convert the JSON to PDF graphic commands, and save the PDF.
To minimize structural changes to the PDF you'd want to use Inline Images. No new objects in the PDF means you don't need to change your cross reference table at all (until you add fonts or want to reuse an existing image). You could build the "doc info" dictionary padded with a specific amount of spaces between objects so you could fill it in without changing any byte offsets (which would force you to recompute the xref table).
You may or may not need to mess with the page size... we are just talking about one page here, right?
So the PDF would look something like...
%PDF-1.6
<3-4 random high order bytes to convince folks that we're a binary stream>
1 0 obj
<</Type/Catalog/Pages 2 0 R>>
endobj
2 0 obj
<</Type/Pages/Count 1/Kids[3 0 R]>>
endobj
3 0 obj
<</Type/Page/Contents 4 0 R/MediaBox[0 0 612 792]/Parent 2 0 R>>
endobj
5 0 obj
<</Type/DocInfo/Author() --<insert big whitespace gap here>--
/Title() --<ditto>--
/Subject() --<ditto>--
/Keywords() --<ditto>--
/Creator(My app's Name)
/Producer(My pdf library's name)
/CreationDate(encodedDateWhenThisTemplateWasBuilt) D:YYYYMMDDHHMMSS-timeZoneOffset
/ModDate() --<another, smaller whitespace gap>--
>>
endobj
4 0 obj
<</Filter/SeveralDifferentFiltersAvailable/Length --<byte length of the stream in this file>-->>
stream
And your template stops there. You'd have a similar "end of the PDF" template that would look something like this:
endstream
endobj
xref
0 6
0000000000 65535 f
0000000010 00000 n
0000000025 00000 n
0000000039 00000 n
0000000097 00000 n
0000000050 00000 n
trailer
<</Root 1 0 R/Size 6/Info 5 0 R>>
startxref
--<some white space>--
%%EOF
The columns of numbers at the end are all wrong. The first column is the byte offset of that particular object (and I'm not up for counting bytes just now thank you). The second column is largely irrelevant.
Your PDF-filling app will need to know:
The byte offsets of everything you intend to fill in within the first template.
All the "doc info" fields, which are all optional by the way. The /Info key and the dictionary it points to are optional for that matter. You could yank 'em if you cared to.
the /Length key of the content stream. That needs to be the post-filter byte length of the stream itself.
How to convert the JSON into pdf drawing commands. If you wanted to cheat a bit you could use iText[Sharp]'s PdfContentByte class, use its drawing commands, and then get the finished byte stream and slap that into your PDF. Be sure you use Inline Images or this whole scheme goes right out the window. There are probably other libraries you could gut similarly if you felt the need. Or you could just read up on the PDF spec and roll your own. You'll be sticking to a fairly limited subset of PDF's content syntax.
The byte offset of the word "xref" from the start of the file. You can calculate this: LengthOfInitialTemplate + LengthOfContentStream + OffsetFromStartOf2ndTemplateTo'xref'.
The byte offset of the line below "startxref", which is where you write the aforecalculated byte offset of 'xref'
You're not going to get much more efficient than that. You'd read in your templates once. Read/calculate the byte offsets you needed once.
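Here is a rough Python sketch of the two-template idea, mainly to show the offset arithmetic. It is an illustration under assumptions, not a complete PDF writer: the names make_templates and fill_template are invented, the /Info dictionary and stream filters are left out for brevity, and the /Length value is written into a fixed-width, space-padded slot so that no byte offset in the head template ever moves.

```python
LENGTH_SLOT = 10  # fixed width reserved for the /Length digits (assumed big enough)


def make_templates():
    """Build the head and tail templates once; every offset in the xref table is fixed."""
    head = bytearray(b"%PDF-1.6\n%\xe2\xe3\xcf\xd3\n")   # header plus a binary comment line
    bodies = [
        b"<</Type/Catalog/Pages 2 0 R>>",
        b"<</Type/Pages/Count 1/Kids[3 0 R]>>",
        b"<</Type/Page/Contents 4 0 R/MediaBox[0 0 612 792]/Parent 2 0 R>>",
    ]
    offsets = []
    for num, body in enumerate(bodies, start=1):
        offsets.append(len(head))                 # byte offset of "N 0 obj"
        head += b"%d 0 obj\n" % num + body + b"\nendobj\n"
    offsets.append(len(head))                     # object 4: the content stream
    head += b"4 0 obj\n<</Length "
    length_pos = len(head)                        # where the stream length gets filled in
    head += b" " * LENGTH_SLOT + b">>\nstream\n"

    tail = bytearray(b"\nendstream\nendobj\n")
    xref_in_tail = len(tail)                      # offset of "xref" within the tail template
    tail += b"xref\n0 5\n0000000000 65535 f \n"
    for off in offsets:
        tail += b"%010d 00000 n \n" % off         # each entry is exactly 20 bytes
    tail += b"trailer\n<</Root 1 0 R/Size 5>>\nstartxref\n"
    return bytes(head), length_pos, bytes(tail), xref_in_tail


HEAD, LENGTH_POS, TAIL, XREF_IN_TAIL = make_templates()


def fill_template(content):
    """Drop a finished content stream (bytes) between the two templates."""
    head = bytearray(HEAD)
    head[LENGTH_POS:LENGTH_POS + LENGTH_SLOT] = str(len(content)).encode().ljust(LENGTH_SLOT)
    # LengthOfInitialTemplate + LengthOfContentStream + offset of "xref" in the second template
    xref_pos = len(head) + len(content) + XREF_IN_TAIL
    return bytes(head) + content + TAIL + b"%d\n%%%%EOF\n" % xref_pos
```

Per request, the only work left is one length write, one addition, and a concatenation; the content bytes themselves would come from whatever converts the JSON into drawing commands.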
I'm trying to open an image file and store a list of pixels by color in a variable/array so I can output them one by one.
Image type: Could be BMP, JPG, GIF or PNG. Any of them is fine and only one needs to be supported.
Color Output: RGB or Hex.
I've looked at a couple libraries (RMagick, Quick_Magick, Mini_Magick, etc) and they all seem like overkill. Heroku also has some sort of difficulties with ImageMagick and my tests don't run. My application is in Sinatra.
Any suggestions?
You can use RMagick's each_pixel method for this. each_pixel receives a block. For each pixel, the block is passed the pixel, the column number and the row number of the pixel. It iterates over the pixels from left-to-right and top-to-bottom.
So something like:
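require 'rmagick'
# Image.read returns an array of images; the path here is just a placeholder
img = Magick::Image.read('path/to/image.png').first
pixels = []
img.each_pixel do |pixel, c, r|
  pixels.push(pixel)
end
# pixels now contains each individual pixel of img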
I think Chunky PNG should do it for you. It's pure ruby, reasonably lightweight, memory efficient, and provides access to pixel data as well as image metadata.
If you are only opening the file to display the bytes, and don't need to manipulate it as an image, then it's a simple process of opening the file like any other, reading X number of bytes, then iterating over them. Something like:
File.open('path/to/image.file', 'rb') do |fi|
  byte_block = fi.read(1024)
  byte_block.each_byte do |b|
    puts b  # each_byte yields Integers, so this prints the decimal value
  end
end
That will merely output bytes as decimal. You'll want to look at the byte values and build up RGB values to determine colors, so maybe using each_slice(3) and reading in multiples of 3 bytes will help.
Various image formats contain differing header and trailing blocks used to store information about the image, data format and EXIF information for the capturing device, depending on the type. Going with something uncompressed, such as uncompressed TIFF, would probably be good if you are going to read a file and output the bytes directly. Once you've decided on that, you can jump into the file to skip the headers if you want, or just read those too to see or learn what's in them. Wikipedia's Image file formats page is a good jumping-off place for more info on the various formats available.
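To illustrate the grouping-by-three idea, here is a small Python sketch (the Ruby equivalent would use each_slice(3)). It assumes the bytes passed in are already uncompressed 8-bit RGB samples with any header skipped; the function names are invented for the example.

```python
def rgb_triples(raw):
    """Group raw bytes into (r, g, b) tuples, ignoring an incomplete trailing group."""
    usable = len(raw) - len(raw) % 3
    return [tuple(raw[i:i + 3]) for i in range(0, usable, 3)]


def to_hex(rgb):
    """Format an (r, g, b) tuple as a #rrggbb string."""
    return "#%02x%02x%02x" % rgb


pixels = rgb_triples(bytes([255, 0, 0, 0, 128, 255, 7]))
hex_colours = [to_hex(p) for p in pixels]
```

Some uncompressed formats store the channels in a different order (BMP uses blue-green-red) or pad each row, so check the format's layout before trusting the colours.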
If you only want to see the image data then one of the high-level libraries will help as they have interfaces to grab particular sections of the image. But, actually accessing the bytes isn't hard, nor is it to jump around.
If you want to learn more about the EXIF block, which is used to describe a lot of different vendors' JPEG and TIFF formats, ExifTool can be handy. It's written in Perl, so you can look at how the code works. The docs nicely show the header blocks and fields, and you can read/write values using the app.
I'm in the process of testing a new router so I haven't had a chance to test that code, but it should be close. I'll check it in a bit and update the answer if that didn't work.
I have a format for signatures in a database that an uncooperative vendor has been using for a client of ours. We are replacing their system.
A signature begins:
0A000500010002000100020001000100010001000100010001000100D100010001004F0001000100
01000100010001000100010001000100010001000100010001000100FF00FF00FF00010001000100
01000100010001000100010001000100010001000100010001000100010001000100010001000100
01000100010001000100010001000100010001000100010001000100010001000100010001000100
01000100010001000100010001000100010001000100010001000100010001000100010001000100
01000100010001000100010001000100010001000100010001000100010001000100010001000100
010001000100010002001C0001000100010001000100010001000100010001000100010001000100
01000100010001000100010001000100010001000100010001000100010001000100010001000100
01000100010001000100010001000100010001000100010001000100010001000100010001000100
01000100010001000100010001000100010001000100010001000100010001000100010001000100
01000100010001000100010001000100010001000100010001000100010001000100010001000100
01000100010001000100010001000100010001000100010001000100010001000100010001000100
01000100010001000100010001000100DA00FF00C100C000C100FF00DA00FF00C100C000C100FF00
DA00FF00C100C000C100FF00DA00FF00C100C000C100FF00DA00FF00C100C000C100FF00DA00FF00
C100C000C100FF00DA00FF00C100C000C100FF00DA00FF00C100C000C100FF00DA00FF00C100C000
C100FF00DA00FF00C100C000C100FF00DA00FF00C100C000C100FF00DA00FF00C100C000C100FF00
DA00FF00C100C000C100FF00DA00FF00C100C000C100FF00DA00FF00C100C000C100FF00DA00FF00
C100C000C100FF00DA00FF00C100C000C100FF00DA00FF00C100C000C100FF00DA00FF00C100C000
C100FF00DA00FF00C100C000C100FF00DA00FF00C100C000C100FF00DA00FF00C100C000C100FF00
DA00FF00C100C000C100FF00DA00FF00C100C000C100...
continues with more image data and ends with a long series of 0100's.
Any ideas on what the file format is?
Thanks.
Taking the first 16 or so bytes of data and putting them into a file, the linux 'file' command says:
$ file test.file
test.file: PCX ver. 2.5 image data
Looks like it might be a really simple format. 0A seems to be a header. Then it looks like pairs of darkness values, although it seems like 50% of the space is wasted. If you post a file, I'd be happy to try to write a converter.
The whole file is necessary because it looks like there's a fixed image size, and it might take a little fiddling. Do you have an image that's not confidential, but still has data in it?
That looks like raw pixel values from an 8-bit A/D converter (scanner), embedded in 16-bit words, little-Endian (x86) format.
Are the files all the same size? That would give you a strong clue to the image size.
It looks like RAW, un-encoded image data. Have you tried loading it straight into a two dimensional buffer and seeing what you get? Copy the file and name it foo.raw and try loading it in Photoshop, for example. If my guess is correct and it is just raw 16bit samples, you will have to supply the width and height yourself. The number of channels may be 1x16bit. As tfinniga says, the first two bytes may be a header you will have to skip.
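If you want to test the "raw 16-bit little-endian samples" guess without Photoshop, a sketch along these lines may help. It is written in Python, the function names are invented, and the idea of listing widths that divide the sample count evenly is only a heuristic for narrowing down the image size.

```python
import struct


def decode_words(hex_text):
    """Turn a hex dump (whitespace allowed) into a list of 16-bit little-endian values."""
    raw = bytes.fromhex("".join(hex_text.split()))
    count = len(raw) // 2                          # ignore a trailing odd byte
    return list(struct.unpack("<%dH" % count, raw[:count * 2]))


def candidate_widths(sample_count, min_width=16):
    """List widths that divide the sample count evenly, as guesses at the row length."""
    return [w for w in range(min_width, sample_count // min_width + 1)
            if sample_count % w == 0]


samples = decode_words("0A000500 0100D100")
```

Here samples comes out as [10, 5, 1, 209]. After skipping whatever header the format has, you could try rendering the remaining samples at each candidate width and see which one produces a recognisable signature.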
I have never seen this file format. Maybe it is a private format, and it looks easy to decode.
Edit: it looks like the bytes are arranged in groups of 2 (two 'xx' hexadecimal numbers each).