Determining end of JPEG (when merged with another file)

I'm creating a program that "hides" an encrypted file at the end of a JPEG. The problem is that when retrieving this encrypted file again, I need to be able to determine where the JPEG it was stored in ends. At first I thought this wouldn't be a problem, because I could just run through the file looking for 0xFF followed by 0xD9, the bytes JPEG uses to mark the end of the image. However, I'm noticing that in quite a few JPEGs this byte sequence also appears elsewhere, so my program thinks the image has ended partway through it.
I'm thinking there must be a set way of expressing that a JPEG has finished, otherwise me adding a load of bytes to the end of the file would obviously corrupt it... Is there a practical way to do this?

You should read the JFIF file format specification.

Well, there are always two places in the file that you can find with 100% reliability: the beginning and the end. So, when you add the hidden file, also append 8 bytes: 4 that store the original length of the file and 4 for a special signature that's always distinct. When reading it back, first seek to the end minus 8 and read that length and signature. Then just seek to the stored length, which is where the hidden data starts.
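A minimal Ruby sketch of that trailer idea, assuming a 4-byte big-endian length followed by a 4-byte signature. The signature value `HIDE` and the method names `append_hidden` / `read_hidden` are hypothetical; the stored length is the size of the original JPEG, so everything between that offset and the trailer is the hidden payload.

```ruby
SIGNATURE = "HIDE".b # hypothetical 4-byte marker; pick your own
TRAILER_SIZE = 8     # 4-byte length + 4-byte signature

# Hypothetical helper: appends payload after the JPEG, then a trailer
# recording the JPEG's original length and the signature.
def append_hidden(jpeg_path, payload)
  jpeg_size = File.size(jpeg_path)
  File.open(jpeg_path, "ab") do |f|
    f.write(payload.b)
    f.write([jpeg_size].pack("N") + SIGNATURE) # N = 32-bit big-endian
  end
end

# Hypothetical helper: returns the hidden payload, or nil if no trailer is found.
def read_hidden(path)
  File.open(path, "rb") do |f|
    return nil if f.size < TRAILER_SIZE
    f.seek(-TRAILER_SIZE, IO::SEEK_END)          # the end is always findable
    jpeg_size, sig = f.read(TRAILER_SIZE).unpack("Na4")
    return nil unless sig == SIGNATURE
    return nil if jpeg_size > f.size - TRAILER_SIZE
    f.seek(jpeg_size)                             # jump straight past the JPEG
    f.read(f.size - TRAILER_SIZE - jpeg_size)
  end
end
```

Because the reader never parses the JPEG, stray FF D9 bytes inside the image (or inside the payload) cannot confuse it.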

You should read my answer on this question "Detect Eof for JPG images".

You're likely running into the thumbnail in the header. When moving through the file, you should find that most marker segments contain a length indicator; here's a reference for which do and which don't. You can skip the bytes within those segments, as the true EOI marker will not be within them.
Within the entropy-coded (compressed) image data, any FF byte should be followed either by 00 (a stuffed zero byte that the decoder discards) or by D0 to D7 (a restart marker). Any other FF xx pair starts a new marker; a comment, for example, is introduced by FF FE and has a length indicator, so it can be skipped as described above.
Theoretically, then, the only way to hit a false EOI is inside a length-prefixed segment such as a comment or a thumbnail, which skipping by length avoids.
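A rough Ruby sketch of that marker walk, under simplifying assumptions: a well-formed file that ends with a single EOI and no error recovery; the method name `jpeg_end_offset` is hypothetical. It skips every length-prefixed segment (so an EOI inside an EXIF thumbnail is never seen) and, after each SOS, scans the compressed data while treating FF 00 and FF D0 to FF D7 as part of the data.

```ruby
# Hypothetical helper: returns the offset just past the real EOI (FF D9), or nil.
def jpeg_end_offset(data)
  bytes = data.b
  size = bytes.bytesize
  # A JPEG must start with the SOI marker FF D8.
  return nil unless size >= 2 && bytes.getbyte(0) == 0xFF && bytes.getbyte(1) == 0xD8
  pos = 2
  while pos < size
    return nil unless bytes.getbyte(pos) == 0xFF                # expected a marker
    pos += 1 while pos < size && bytes.getbyte(pos) == 0xFF     # skip fill bytes
    return nil if pos >= size
    marker = bytes.getbyte(pos)
    pos += 1
    return pos if marker == 0xD9                                # EOI: image ends here
    next if marker.between?(0xD0, 0xD7) || marker == 0x01       # no length field
    return nil if pos + 2 > size
    # Length is big-endian and counts its own two bytes, so this jumps over
    # the whole payload (e.g. a thumbnail carrying its own FF D9).
    pos += bytes.byteslice(pos, 2).unpack1("n")
    next unless marker == 0xDA                                  # only SOS precedes scan data
    # Entropy-coded data: FF 00 is a stuffed byte, FF D0..FF D7 are restart
    # markers; any other FF xx begins the next marker.
    while pos + 1 < size
      if bytes.getbyte(pos) == 0xFF
        nxt = bytes.getbyte(pos + 1)
        break unless nxt == 0x00 || nxt.between?(0xD0, 0xD7)
        pos += 2
      else
        pos += 1
      end
    end
  end
  nil
end
```

For a JPEG with data appended, everything from the returned offset onward is the appended data.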

Related

copy a file omitting the first X characters

I am manipulating proprietary files, which are very similar to WAV files but have a custom header that is longer than the WAV header (200 bytes versus 36 bytes). The samples are similar, though. These files are quite large (typically around 200 MB).
I am trying to batch convert the proprietary files to wav.
I wrote a short script using the wavefile gem. I just read the whole array of samples then create the wave file. It works fine with smaller examples but I have a memory allocation error for larger ones.
I noticed that copying the file with FileUtils.cp is impressively fast. I am wondering if I could somehow copy the file while "omitting" the first 164 bytes, then just write the WAV header into the first 36 bytes and rename the file (.wav).
What would be the best/easiest way?
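Something like this would likely work (src, dst and new_dst_header are placeholders for your paths and the WAV header you build; 200 is the size of the proprietary header):
chunk_size = 1024 * 1024 # bytes per read; tune as needed
File.open(src, 'rb') do |r|
  File.open(dst, 'wb') do |w|
    w.write(new_dst_header) # WAV header goes first
    r.seek(200)             # skip the proprietary header
    until r.eof?
      w.write(r.read(chunk_size))
    end
  end
end
The bigger chunk_size is, the faster it goes (up to a point), and the more memory it uses.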
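If building the header is the sticking point, here is a hedged Ruby sketch that writes a canonical 44-byte PCM WAV header with `Array#pack` and then lets `IO.copy_stream` do the chunked copy. The 36 bytes mentioned in the question correspond to everything before the 8-byte `data` chunk header. The 200-byte skip comes from the question; the method names and the sample-format values are assumptions you would replace with what the proprietary header actually says.

```ruby
PROPRIETARY_HEADER = 200 # bytes to skip in the source file (from the question)

# Hypothetical helper: 44-byte PCM WAV header (RIFF, 16-byte fmt chunk, data header).
def wav_header(data_size, sample_rate:, channels:, bits_per_sample:)
  block_align = channels * bits_per_sample / 8
  byte_rate = sample_rate * block_align
  ["RIFF", 36 + data_size, "WAVE",
   "fmt ", 16, 1, channels, sample_rate, byte_rate, block_align, bits_per_sample,
   "data", data_size].pack("a4Va4a4VvvVVvva4V") # V/v = little-endian 32/16-bit; 1 = PCM
end

# Hypothetical helper: streams the samples without loading them into memory.
def convert_to_wav(src, dst, **format)
  data_size = File.size(src) - PROPRIETARY_HEADER
  File.open(src, "rb") do |r|
    File.open(dst, "wb") do |w|
      w.write(wav_header(data_size, **format))
      r.seek(PROPRIETARY_HEADER)
      IO.copy_stream(r, w) # copies from r's current position in internal chunks
    end
  end
end
```

Usage would look like `convert_to_wav("in.dat", "out.wav", sample_rate: 44100, channels: 2, bits_per_sample: 16)`, with the format values taken from the real files.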

How does a program determine the size of a file without reading it whole?

In the question Using C++ filestreams (fstream), how can you determine the size of a file?, the top answer is the following C++ snippet:
ifstream file("example.txt", ios::binary | ios::ate);
return file.tellg();
Running it myself I noticed that the size of arbitrarily large files could be determined instantaneously and with a single read operation.
Conventionally I would assume that to determine the size of a file, one would have to move through it byte-by-byte, adding to a byte-counter. How is this achieved instead? Metadata?
The size of the file is embedded in the file metadata in the file system. Different file systems have different ways of storing this information.
Edit: Obviously, this is an incomplete answer. When someone provides an answer that shows exactly how the file size is known and stored on a common filesystem like ext3, NTFS or FAT, I'll delete this answer.
The file size is stored as metadata on most filesystems. In addition to GI Joe's answer above, you can use the stat function on posix systems:
stat(3) manpage
#include <stdio.h>
#include <sys/stat.h>
/* inside a function: */
struct stat statbuf;
if (stat("filename.txt", &statbuf) == 0)
    printf("The file is %lld bytes long\n", (long long)statbuf.st_size);
When ios::ate is set, the initial position will be the end of the file, but you are free to seek thereafter.
tellg returns the position of the current character in the input stream. The other key part is ios::binary: in text mode, the position reported by tellg is not guaranteed to equal a byte count (for example, when line endings are translated on Windows).
So all it does is seek to the end of the file for you when it opens the filestream and tell you the current position (which is the end of the file). I guess you could say it's a sort of hack, but the logic makes sense.
If you would like to learn how filestreams work at a lower level, please read this StackOverflow question.
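The same two approaches are available from Ruby, sketched here for comparison: `File.stat(path).size` asks the filesystem for the size stored in the file's metadata, while seeking to the end and reading the position mirrors the `ios::ate` / `tellg` trick. Neither reads the file's contents. The helper names are hypothetical.

```ruby
# Hypothetical helper: size from filesystem metadata (a stat call), no data read.
def size_from_metadata(path)
  File.stat(path).size # equivalent to File.size(path)
end

# Hypothetical helper: seek to the end and report the position, like ios::ate + tellg.
def size_by_seeking(path)
  File.open(path, "rb") do |f|
    f.seek(0, IO::SEEK_END)
    f.pos
  end
end
```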

Read an image pixel by pixel in Ruby

I'm trying to open an image file and store a list of pixels by color in a variable/array so I can output them one by one.
Image type: Could be BMP, JPG, GIF or PNG. Any of them is fine and only one needs to be supported.
Color Output: RGB or Hex.
I've looked at a couple of libraries (RMagick, Quick_Magick, Mini_Magick, etc.) and they all seem like overkill. Heroku also has some sort of difficulty with ImageMagick, and my tests don't run. My application is in Sinatra.
Any suggestions?
You can use RMagick's each_pixel method for this. each_pixel takes a block. For each pixel, the block is passed the pixel, the column number and the row number of the pixel. It iterates over the pixels from left to right and top to bottom.
So something like:
require 'rmagick'
img = Magick::Image.read('path/to/image.png').first # read returns an array of frames
pixels = []
img.each_pixel do |pixel, c, r|
  pixels.push(pixel)
end
# pixels now contains each individual pixel of img
I think ChunkyPNG should do it for you. It's pure Ruby, reasonably lightweight and memory-efficient, and provides access to pixel data as well as image metadata.
If you are only opening the file to display the bytes, and don't need to manipulate it as an image, then it's a simple process of opening the file like any other, reading X number of bytes, then iterating over them. Something like:
File.open('path/to/image.file', 'rb') do |fi|
  byte_block = fi.read(1024)
  byte_block.each_byte do |b|
    puts b # each_byte yields an Integer from 0 to 255
  end
end
That will merely output bytes as decimal. You'll want to look at the byte values and build up RGB values to determine colors, so reading in multiples of 3 bytes and using each_byte.each_slice(3) may help.
Various image formats contain differing header and trailing blocks used to store information about the image, the data format and, depending on the type, EXIF information about the capturing device. If you are going to read a file and output the bytes directly, going with something uncompressed, such as uncompressed TIFF, would probably be best. Once you've decided on that, you can jump into the file to skip headers if you want, or just read those too to see or learn what's in them. Wikipedia's Image file formats page is a good jumping-off place for more info on the various formats available.
If you only want to see the image data then one of the high-level libraries will help as they have interfaces to grab particular sections of the image. But, actually accessing the bytes isn't hard, nor is it to jump around.
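To make the byte-level route concrete, here is a hedged sketch in plain Ruby (no gems) for one of the formats the question allows: an uncompressed 24-bit BMP. It assumes the common BITMAPINFOHEADER layout (pixel-data offset at byte 10, width at 18, height at 22, bits per pixel at 28, compression at 30), rows padded to a multiple of 4 bytes and stored bottom-up, and pixels stored as blue, green, red. The method name is hypothetical, and palettes, compression and 32-bit BMPs are not handled.

```ruby
# Hypothetical helper: rows of [r, g, b] triples, top row first, for a 24-bit BMP.
def bmp_pixels(path)
  data = File.binread(path)
  raise ArgumentError, "not a BMP" unless data[0, 2] == "BM"
  offset = data[10, 4].unpack1("V")              # start of pixel data
  width, height = data[18, 8].unpack("l<l<")     # signed 32-bit little-endian
  bpp = data[28, 2].unpack1("v")
  compression = data[30, 4].unpack1("V")
  raise ArgumentError, "only uncompressed 24-bit BMPs" unless bpp == 24 && compression == 0
  row_size = (width * 3 + 3) / 4 * 4              # each row is padded to 4 bytes
  rows = (0...height.abs).map do |i|
    row = data[offset + i * row_size, width * 3].bytes
    row.each_slice(3).map { |b, g, r| [r, g, b] } # stored as BGR
  end
  height > 0 ? rows.reverse : rows                # positive height means bottom-up
end
```

To print hex colors, something like `bmp_pixels("image.bmp").flatten(1).each { |r, g, b| puts format("#%02X%02X%02X", r, g, b) }` would do.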
If you want to learn more about the EXIF block, which is used in many different vendors' JPEG and TIFF files, ExifTool can be handy. It's written in Perl, so you can look at how the code works. The docs nicely show the header blocks and fields, and you can read and write values using the app.
I'm in the process of testing a new router so I haven't had a chance to test that code, but it should be close. I'll check it in a bit and update the answer if that didn't work.

Can anyone identify this image format?

I have a format for signatures in a database that an uncooperative vendor has been using for a client of ours. We are replacing their system.
A signature begins:
0A000500010002000100020001000100010001000100010001000100D100010001004F0001000100
01000100010001000100010001000100010001000100010001000100FF00FF00FF00010001000100
01000100010001000100010001000100010001000100010001000100010001000100010001000100
01000100010001000100010001000100010001000100010001000100010001000100010001000100
01000100010001000100010001000100010001000100010001000100010001000100010001000100
01000100010001000100010001000100010001000100010001000100010001000100010001000100
010001000100010002001C0001000100010001000100010001000100010001000100010001000100
01000100010001000100010001000100010001000100010001000100010001000100010001000100
01000100010001000100010001000100010001000100010001000100010001000100010001000100
01000100010001000100010001000100010001000100010001000100010001000100010001000100
01000100010001000100010001000100010001000100010001000100010001000100010001000100
01000100010001000100010001000100010001000100010001000100010001000100010001000100
01000100010001000100010001000100DA00FF00C100C000C100FF00DA00FF00C100C000C100FF00
DA00FF00C100C000C100FF00DA00FF00C100C000C100FF00DA00FF00C100C000C100FF00DA00FF00
C100C000C100FF00DA00FF00C100C000C100FF00DA00FF00C100C000C100FF00DA00FF00C100C000
C100FF00DA00FF00C100C000C100FF00DA00FF00C100C000C100FF00DA00FF00C100C000C100FF00
DA00FF00C100C000C100FF00DA00FF00C100C000C100FF00DA00FF00C100C000C100FF00DA00FF00
C100C000C100FF00DA00FF00C100C000C100FF00DA00FF00C100C000C100FF00DA00FF00C100C000
C100FF00DA00FF00C100C000C100FF00DA00FF00C100C000C100FF00DA00FF00C100C000C100FF00
DA00FF00C100C000C100FF00DA00FF00C100C000C100...
continues with more image data and ends with a long series of 0100's.
Any ideas on what the file format is?
Thanks.
Taking the first 16 or so bytes of data and putting them into a file, the linux 'file' command says:
$ file test.file
test.file: PCX ver. 2.5 image data
Looks like it might be a really simple format. 0A seems to be a header. Then it looks like pairs of darkness values, although it seems like 50% of the space is wasted. If you post a file, I'd be happy to try to write a converter.
The whole file is necessary because it looks like there's a fixed image size, and it might take a little fiddling. Do you have an image that's not confidential, but still has data in it?
That looks like raw pixel values from an 8-bit A/D converter (scanner), embedded in 16-bit words in little-endian (x86) format.
Are the files all the same size? That would give you a strong clue to the image size.
It looks like raw, unencoded image data. Have you tried loading it straight into a two-dimensional buffer and seeing what you get? Copy the file, name it foo.raw and try loading it in Photoshop, for example. If my guess is correct and it is just raw 16-bit samples, you will have to supply the width and height yourself. There may be a single 16-bit channel. As tfinniga says, the first two bytes may be a header you will have to skip.
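A hedged Ruby sketch of the experiment suggested above: treat the blob as little-endian 16-bit words, skip a header (two bytes here, following tfinniga's guess; the true header size is unknown), keep the low byte of each word as an 8-bit gray value, and write the result as a binary PGM (P5) file, which many image viewers open. The width is a guess you would vary until the signature looks right; the method names are hypothetical.

```ruby
# Hypothetical helper: decodes 16-bit little-endian words to 8-bit rows of `width`.
def decode_raw(data, width, header_bytes: 2)
  body = data.byteslice(header_bytes, data.bytesize - header_bytes)
  words = body.unpack("v*")                         # v = 16-bit little-endian
  rows = words.map { |w| w & 0xFF }.each_slice(width) # assume the value is in the low byte
  rows.select { |row| row.size == width }           # drop a trailing partial row
end

# Hypothetical helper: writes rows of 0..255 values as a binary PGM for viewing.
def write_pgm(path, rows)
  header = "P5\n#{rows.first.size} #{rows.size}\n255\n".b
  File.binwrite(path, header + rows.flatten.pack("C*"))
end
```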
I've never seen this file format... maybe it's a 'private format', and it looks easy to decode.
Edit: it looks like the bytes are arranged in groups of 2 (two 'xx' hexadecimal numbers).

Modifying an IO stream in-place? Ruby

I've been writing a ruby programme that merges the content of two files.
For example, if a torrent has been downloaded twice separately, it tries to merge their contents for the blocks that have been completed.
So, I've been looking for a method which modifies a stream only at the place required and saves only that block instead of saving the whole stream again.
I'm reading the file in blocks of 16 KiB. How do I "replace" (not append) the content of those 16 KiB so that only those bytes are written to disk, rather than the whole file being rewritten each time?
Something like:
#Doesn't exist unfortunately.
#By default it appends instead of replacing, so file size grows.
IO.write(file_name, content, offset, :replace => true)
Does a method exist that achieves something like that?
Open the file in "r+b" mode, seek to the location and just write to it:
File.open("some.existing.file", "r+b") do |f|
  f.seek(1024)
  f.write("test\n")
end
This will overwrite 5 bytes of the file, starting at offset 1024.
If the file is shorter than your seek offset, the gap between the old end of the file and the offset is filled with null bytes.
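For the torrent case specifically, here is a sketch of a hypothetical `write_block` helper that overwrites one 16 KiB block in place. It relies on `File.binwrite` with an offset: as far as I know, Ruby documents `IO.write` / `IO.binwrite` as not truncating the file when an offset is given, so this behaves like the "r+b" approach above rather than rewriting the whole file.

```ruby
BLOCK_SIZE = 16 * 1024 # 16 KiB blocks, as in the question

# Hypothetical helper: overwrites block number `index` of an existing file in place.
# With an offset, binwrite seeks there and does not truncate, so other bytes are kept.
def write_block(path, index, data)
  raise ArgumentError, "block too large" if data.bytesize > BLOCK_SIZE
  File.binwrite(path, data, index * BLOCK_SIZE)
end
```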
