I am a neophyte neural network user trying to get to grips with TensorFlow. I have used the MNIST dataset as a test, and would now like to use real-world data.
Can anyone point me to a how-to, paper, or other source that explains how to convert digital photographs stored as files (JPEG, PNG, GIF, WMF) into tensors ready for import into TensorFlow?
Cheers!
You can use the TensorFlow image functions to load images and convert them into tensors. After loading the images, you will likely want to look at tf.image.resize_bilinear (tf.image.resize in TensorFlow 2.x) to resize the images to a standard size.
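As a rough illustration of the load-then-resize step, here is a minimal sketch assuming TensorFlow 2.x. The helper names image_bytes_to_tensor and load_image and the 64x64 target size are made up for this example. Note that TensorFlow's decoders handle JPEG, PNG, GIF, and BMP; WMF files would have to be converted with another tool first.

```python
import tensorflow as tf


def image_bytes_to_tensor(encoded, size=(64, 64)):
    """Decode JPEG/PNG/GIF/BMP bytes into a float32 tensor of shape size + (3,).

    The helper name and the default size are illustrative, not part of TensorFlow.
    """
    # expand_animations=False keeps GIFs as a single 3-D frame instead of 4-D.
    image = tf.io.decode_image(encoded, channels=3, expand_animations=False)
    # Convert uint8 pixel values in [0, 255] to float32 values in [0, 1].
    image = tf.image.convert_image_dtype(image, tf.float32)
    # Bilinear resize to a fixed size so that images can be batched together.
    return tf.image.resize(image, size, method="bilinear")


def load_image(path, size=(64, 64)):
    """Read an image file from disk and convert it to a tensor."""
    return image_bytes_to_tensor(tf.io.read_file(path), size)


# Demo on an in-memory PNG so the sketch runs without any files on disk.
encoded = tf.io.encode_png(tf.zeros([10, 20, 3], dtype=tf.uint8))
tensor = image_bytes_to_tensor(encoded)
print(tensor.shape, tensor.dtype)
```

With real photographs you would call load_image on each file path, for example by mapping it over a tf.data.Dataset of file names.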
The standard way to load data into TensorFlow is to use a TFRecords file.
Another approach is to convert whatever data you have into a supported format. This approach makes it easier to mix and match data sets and network architectures. The recommended format for TensorFlow is a TFRecords file containing tf.train.Example protocol buffers.
- TensorFlow documentation
Basically, a TFRecord is a binary representation of your data or images along with their labels, file names, and other information. Its main advantages are that it lets you stream data into the model efficiently using TensorFlow's threading, and that it makes it easier to reuse the same data with different models.
You can use this script to generate your own TFRecord files.
Additionally, you can read on how to use the script here.
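To give a feel for what such a file contains, here is a minimal sketch assuming TensorFlow 2.x. It is not the script mentioned above; the feature keys ("image/encoded", "image/label", "image/filename") and the helper names are arbitrary choices for this example.

```python
import os
import tempfile

import tensorflow as tf


def make_example(image_bytes, label, filename):
    """Pack one encoded image, its label, and its file name into a tf.train.Example.

    The feature keys are illustrative; any consistent naming works.
    """
    feature = {
        "image/encoded": tf.train.Feature(
            bytes_list=tf.train.BytesList(value=[image_bytes])),
        "image/label": tf.train.Feature(
            int64_list=tf.train.Int64List(value=[label])),
        "image/filename": tf.train.Feature(
            bytes_list=tf.train.BytesList(value=[filename.encode("utf-8")])),
    }
    return tf.train.Example(features=tf.train.Features(feature=feature))


# The reader must describe the same keys and types that the writer used.
FEATURE_SPEC = {
    "image/encoded": tf.io.FixedLenFeature([], tf.string),
    "image/label": tf.io.FixedLenFeature([], tf.int64),
    "image/filename": tf.io.FixedLenFeature([], tf.string),
}


def parse_record(serialized):
    """Turn one serialized record back into an (image, label) pair."""
    parsed = tf.io.parse_single_example(serialized, FEATURE_SPEC)
    image = tf.io.decode_png(parsed["image/encoded"], channels=3)
    return image, parsed["image/label"]


# Demo: write a single blank PNG to a temporary TFRecord file and read it back.
png_bytes = tf.io.encode_png(tf.zeros([4, 6, 3], dtype=tf.uint8)).numpy()
record_path = os.path.join(tempfile.mkdtemp(), "demo.tfrecord")
with tf.io.TFRecordWriter(record_path) as writer:
    writer.write(make_example(png_bytes, 7, "blank.png").SerializeToString())

dataset = tf.data.TFRecordDataset(record_path).map(parse_record)
image, label = next(iter(dataset))
print(image.shape, int(label))
```

Storing the still-encoded image bytes keeps the file small; decoding happens when the records are read.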
I would like to know if the image file type matters at all in image classification using Keras, Tensorflow, or any other machine learning library. For example:
If I were to train using only JPG files, will the accuracy be significantly affected if I were to evaluate the model using only PNG files?
If so, will it be better to train using both JPG and PNG files so I can evaluate using both types?
Or does the image file type not matter at all?
The file type does not matter.
During training (and inference, for that matter) images are converted into tensors (you can think of a tensor as a multidimensional array) where each pixel is represented by a small group of numbers (or a single number for grayscale images).
Machine learning is performed on these tensors rather than the image itself so the original file format really doesn't matter.
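To make the "multidimensional array" idea concrete, here is a small sketch in plain Python with nested lists standing in for tensors. The shape and normalize helpers are written for this example only; a real pipeline would use NumPy or the framework's own tensor type.

```python
def shape(tensor):
    """Return the dimensions of a regular nested list, outermost first."""
    dims = []
    while isinstance(tensor, list):
        dims.append(len(tensor))
        tensor = tensor[0]
    return tuple(dims)


def normalize(tensor):
    """Scale 8-bit values (0 to 255) to floats in [0, 1], keeping the nesting."""
    if isinstance(tensor, list):
        return [normalize(item) for item in tensor]
    return tensor / 255


# A 2x3 grayscale image: one number per pixel, shape (height, width).
gray = [
    [0, 128, 255],
    [255, 128, 0],
]

# A 2x3 colour image: three numbers (R, G, B) per pixel, shape (height, width, 3).
rgb = [
    [[255, 0, 0], [0, 255, 0], [0, 0, 255]],
    [[0, 0, 0], [128, 128, 128], [255, 255, 255]],
]

print(shape(gray))
print(shape(rgb))
print(normalize(gray)[0])
```

Whether the pixels originally came from a JPG or a PNG file, the model only ever sees arrays like these.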
I am very unfamiliar with Caffe. My task is to train an autoencoder net on image pairs, given in .tif format, where one is a grayscale image of nerves, and the other is the corresponding binary mask which shows whether a certain structure is present in the image or not. I have these in the same "train" folder. What I would like to accomplish is a meaningful experiment with these images (segmentation, classification, it is not specified). My first problem is that I do not know how to feed the images into the net without an existing train.txt. Can I use the images directly, or is another format like LMDB or HDF5 needed? Any suggestion is appreciated.
You can accomplish this with simple classification using an existing network such as AlexNet, GoogLeNet, or LeNet. You can use only the binary mask or the grayscale image, together with the class name, to do this. NVIDIA DIGITS is a good graphical tool for building the paired dataset and for training.
Please see this link:
https://developer.nvidia.com/digits
I have a wavelet-compressed image, but I am not sure what parameters were used for compression. Is there a way to decompress this image? I tried using a JPEG 2000 image viewer but it did not help.
As I understand it, one needs to know the wavelet with which the image was compressed in order to proceed, but this information is currently missing. Does this mean the images effectively remain encrypted and cannot be decoded?
Do you know the data format? If you plot it, can you see the main bands and sub-bands at the various levels of detail?
Once you have the data format, you can simply try a few possible wavelet shapes, starting with the Haar for simplicity. At least you will get a good impression of the image content.
If you don't know the data format, you are probably stuffed.
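For reference, this is roughly what trying the Haar wavelet amounts to: a minimal one-level, one-dimensional sketch in plain Python. It assumes the unnormalized average/difference convention; a real file may use a different scaling, a 2-D layout, several levels, and quantized coefficients, none of which can be known without the data format.

```python
def haar_forward(signal):
    """One level of the Haar transform: pairwise averages and pairwise details.

    Assumes an even-length signal and the unnormalized (divide by 2) convention.
    """
    pairs = list(zip(signal[0::2], signal[1::2]))
    averages = [(a + b) / 2 for a, b in pairs]
    details = [(a - b) / 2 for a, b in pairs]
    return averages, details


def haar_inverse(averages, details):
    """Rebuild the original samples from the averages and details."""
    samples = []
    for average, detail in zip(averages, details):
        samples.extend([average + detail, average - detail])
    return samples


row = [9, 7, 3, 5, 6, 10, 2, 6]
averages, details = haar_forward(row)
restored = haar_inverse(averages, details)
print(averages)
print(details)
print(restored)
```

Applying the inverse step to each row and column, level by level, is what reconstructs an image once you know where the bands sit in the file.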
In medical imaging, there appear to be two ways of storing huge gigapixel images:
Use lots of JPEG images (either packed into files or individually) and cook up some bizarre index format to describe what goes where. Tack on some metadata in some other format.
Use TIFF's tile and multi-image support to cleanly store the images as a single file, and provide downsampled versions for zooming speed. Then abuse various TIFF tags to store metadata in non-standard ways. Also, store tiles with overlapping boundaries that must be individually translated later.
In both cases, the reader must understand the format well enough to understand how to draw things and read the metadata.
Is there a better way to store these images? Is TIFF (or BigTIFF) still the right format for this? Does XMP solve the problem of metadata?
The main issues are:
Storing images in a way that allows for rapid random access (tiling)
Storing downsampled images for rapid zooming (pyramid)
Handling cases where tiles are overlapping or sparse (scanners often work by moving a camera over a slide in 2D and capturing only where there is something to image)
Storing important metadata, including associated images like a slide's label and thumbnail
Support for lossy storage
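To make the tiling and pyramid requirements concrete, here is a small sketch of the bookkeeping involved. The 256-pixel tile size, the halving between levels, and the function names are assumptions for illustration, not taken from any particular format.

```python
import math


def pyramid_levels(width, height, tile=256):
    """List the pyramid levels, halving each dimension until one tile suffices."""
    levels = []
    w, h = width, height
    while True:
        levels.append({
            "width": w,
            "height": h,
            "tiles_x": math.ceil(w / tile),
            "tiles_y": math.ceil(h / tile),
        })
        if w <= tile and h <= tile:
            break
        w, h = max(1, math.ceil(w / 2)), max(1, math.ceil(h / 2))
    return levels


def tile_for_pixel(x, y, tile=256):
    """Return the (column, row) of the tile that contains pixel (x, y)."""
    return x // tile, y // tile


levels = pyramid_levels(1000, 600)
for level in levels:
    print(level)
print(tile_for_pixel(700, 300))
```

Random access then means reading only the tiles that intersect the viewport at the level closest to the requested zoom.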
What kind of (hopefully non-proprietary) formats do people use to store large aerial photographs or maps? These images have similar properties.
It seems like starting with TIFF or BigTIFF and defining a useful subset of tags + XMP metadata might be the way to go. FITS is no good since it is basically for lossless data and doesn't have a very appropriate metadata mechanism.
The problem with TIFF is that it just allows too much flexibility, but a subset of TIFF should be acceptable.
The solution may very well be http://ome-xml.org/ and http://ome-xml.org/wiki/OmeTiff.
It looks like DICOM now has support:
ftp://medical.nema.org/MEDICAL/Dicom/Final/sup145_ft.pdf
You probably want FITS.
Arbitrary size
1 to 3 dimensional data
Extensive header
Widely used in astronomy and endorsed by NASA and the IAU
I'm a pathologist (and hobbyist programmer), so virtual slides and digital pathology are a huge interest of mine. You may be interested in the OpenSlide project. They have characterized a number of the proprietary formats from the large vendors (Aperio, BioImagene, etc.). Most seem to consist of large TIFF files holding a zoom pyramid (scanned at different microscope objectives, of course), with each level stored as multiple tiled TIFFs or compressed (JPEG or JPEG 2000) images.
The industry standard is DICOM Supplement 145. Vendor adoption has been sluggish, but inventing yet another format would probably not be helpful.
PNG might work for you. It can handle large images and metadata, and it supports interlacing (Adam7), so you can get an n/8 x n/8 downsampled image pretty easily.
I'm not sure if PNG can do rapid random access. It is chunked, but that might not be enough.
You could represent sparse data with the transparency channel.
JPEG 2000 might be worth a look; there are some interesting efforts from national libraries in this space.
I'm thinking of building a library to manipulate images (using my own image type that I will develop), but first I need to understand the structure of an image.
How is it put together?
How does layer technology work?
Where can I find some good resources to understand these things?
Thanks.
That all depends on the image format in question.
Most image formats, however, consist of the following:
A header that contains general file information (how long, what format, dimensions, color space, compression algorithm, etc.)
The pixel data (potentially compressed, in which case some other structure may apply)
Other metadata (EXIF, ...)
Many popular image formats such as JPEG or PNG have freely available specifications of the file format.
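As one concrete instance of the header-plus-pixel-data layout, here is a sketch that builds a tiny 8-bit grayscale PNG in memory and reads its structure back using only the standard library. The chunk layout (length, type, data, CRC) and the 13-byte IHDR header follow the PNG specification; the helper names are made up for this example.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def make_chunk(kind, data):
    """A PNG chunk: 4-byte length, 4-byte type, data, CRC over type + data."""
    crc = zlib.crc32(kind + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", crc)


def make_gray_png(rows):
    """Encode rows of 0 to 255 values as a minimal 8-bit grayscale PNG."""
    height, width = len(rows), len(rows[0])
    # IHDR: width, height, bit depth 8, colour type 0 (grayscale),
    # compression 0, filter method 0, no interlacing.
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    # Each scanline is prefixed with a filter-type byte (0 means unfiltered).
    raw = b"".join(b"\x00" + bytes(row) for row in rows)
    return (PNG_SIGNATURE
            + make_chunk(b"IHDR", ihdr)
            + make_chunk(b"IDAT", zlib.compress(raw))
            + make_chunk(b"IEND", b""))


def read_chunks(data):
    """Yield (type, body) for every chunk that follows the signature."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    pos = 8
    while pos < len(data):
        length, kind = struct.unpack(">I4s", data[pos:pos + 8])
        yield kind, data[pos + 8:pos + 8 + length]
        pos += 12 + length  # length field + type + body + CRC


def parse_ihdr(body):
    """Decode the fixed 13-byte header chunk into a dictionary."""
    names = ("width", "height", "bit_depth", "color_type",
             "compression", "filter_method", "interlace")
    return dict(zip(names, struct.unpack(">IIBBBBB", body)))


png = make_gray_png([[0, 128, 255], [255, 128, 0]])
chunks = list(read_chunks(png))
header = parse_ihdr(chunks[0][1])
pixels = zlib.decompress(chunks[1][1])
print([kind for kind, _ in chunks])
print(header)
```

Other formats differ in detail, but the same pattern of a fixed header followed by (possibly compressed) pixel data and optional metadata blocks recurs.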
If you actually want to work with more complex images, containing layers and such (possibly Photoshop or similar) then things get more difficult. They additionally contain layers, so multiple chunks of pixel data, maybe metadata for the layers, in the case of Photoshop even vector data (for layer masks and other paths), etc.
What's more, most primary file formats used by major proprietary image editing software tend to be not fully specified, at least not publicly. There are resources out there but expect them to be incomplete at best.
Still, starting a project like this without much prior knowledge of image file formats in general might not be a feasible idea.
A good starting point for anyone who needs to know the basics of digital images is chapter 2 of the classic book by Gonzalez and Woods, Digital Image Processing.
A short answer, roughly speaking: for manipulation in memory, images are 2D arrays. There are lots of variants, but the 2D array is the classic way.
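A minimal sketch of that idea in plain Python, with the 2D array stored as one flat row-major buffer (a common layout in image libraries). The GrayImage class and its methods are invented for this example.

```python
class GrayImage:
    """An 8-bit grayscale image stored as a flat, row-major byte buffer."""

    def __init__(self, width, height, fill=0):
        self.width = width
        self.height = height
        self.pixels = bytearray([fill]) * (width * height)

    def _index(self, x, y):
        # Row-major layout: all of row 0 first, then row 1, and so on.
        if not (0 <= x < self.width and 0 <= y < self.height):
            raise IndexError("pixel coordinates out of range")
        return y * self.width + x

    def get(self, x, y):
        return self.pixels[self._index(x, y)]

    def set(self, x, y, value):
        self.pixels[self._index(x, y)] = value

    def inverted(self):
        """Return a new image in which every pixel p becomes 255 - p."""
        out = GrayImage(self.width, self.height)
        out.pixels = bytearray(255 - p for p in self.pixels)
        return out

    def rows(self):
        """Return the pixels as a list of rows (a 2D view of the buffer)."""
        return [list(self.pixels[y * self.width:(y + 1) * self.width])
                for y in range(self.height)]


img = GrayImage(3, 2)
img.set(2, 1, 200)
negative = img.inverted()
print(img.rows())
print(negative.rows())
```

A colour image adds a third dimension (one value per channel), and a layered image is, in essence, several such arrays plus blending information.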
For C, C++ and Python, take a look at OpenCV. For Python, see PIL. For Java, see JAI. Finally, for an overview of an "image structure", take a close look at the IplImage structure in the OpenCV documentation.
Image file formats vary wildly. However, depending on which language/platform you're coding against, you may have generalized means of working with images and translating them into the format you chose. Each platform will have its own means of building and accessing images, however, so there's little I can tell you of substance without a declaration of your programming platform of choice.
Personally, I prefer C#/.NET. So here are some links on image manipulation in that platform:
http://www.aspfree.com/c/a/C-Sharp/Basic-Image-Manipulation-using-GDI-and-C/
http://www.aspfree.com/c/a/Code-Examples/Handling-Animation-and-Bitmaps-Using-GDI-for-Image-Manipulation/
Each image format has a different structure and compression scheme.
Maybe you should explain your goals in more detail.
A quick Amazon search yields a couple of books that could be very useful on the subject. Both are based around OpenGL, one of the most common graphics libraries. The first is a general introductory computer graphics textbook and the second is a manual for OpenGL (commonly known as the Red Book).
Computer Graphics with OpenGL (3rd Edition)
OpenGL Programming Guide (The Red Book)
I can personally attest to the usefulness of both books.
If you're interested in the innards of various image file formats, Wotsit is a pretty good start. If you prefer hard copy, go to the Encyclopedia of Graphics File Formats. And if you want to look at sample source code, check out ImageMagick. It can open, convert, and save most popular image file formats, and it is written in C with interfaces to most other languages.
Unless you're doing something truly unique, I would encourage you to use an existing file format. Look at PNG or TIFF. They are incredibly flexible.
As a veteran in the field, I would say that the last thing the world needs is a new image file format. ;-)