Clustering Photos in R?

I have a kind of general R question here:
With digital cameras we tend to take a lot of pictures, many of them repetitive. These waste online space when sharing on Picasa and are an overhead when trying to delete unwanted images.
Is it possible to cluster photos using R? There are some clustering abilities in Matlab for image processing, but is this kind of functionality available in R, or are there any suggestions for how to do it?
Please provide some ideas if any on this topic.

If you look at CRAN, there are various (I count about 10) packages to read image data. And of course, there are various packages to do clustering. In theory, you could just plug the raw image data into the clustering algorithms, but in practice that wouldn't work very well. In terms of speed, it would be very slow, and in terms of accuracy, it would probably be pretty bad too. Modern techniques to cluster image data rely on specialized features extracted from images and operate on that. The best features are application dependent, but some of the best known are SIFT, SURF, and HOG. Older techniques relied on histograms of colors of the image as features, and that is quite doable with the aforementioned R packages, but it is not very accurate - it can hardly distinguish between a picture of the sea and a picture of a blue room.
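To make the weakness of color histograms concrete, here is a minimal sketch. It is written in Python rather than R purely for illustration, and it assumes the images have already been decoded into lists of (r, g, b) tuples. The function names, the bin count, and the toy pixel values are my own assumptions, not taken from any package.

```python
from collections import Counter


def color_histogram(pixels, bins_per_channel=4):
    """Normalized histogram over coarse RGB bins for a list of (r, g, b) pixels."""
    step = 256 // bins_per_channel
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    total = len(pixels)
    return {bin_: n / total for bin_, n in counts.items()}


def histogram_distance(h1, h2):
    """L1 distance between two histograms: 0.0 = identical, 2.0 = no shared bins."""
    keys = set(h1) | set(h2)
    return sum(abs(h1.get(k, 0.0) - h2.get(k, 0.0)) for k in keys)


# Toy stand-ins for decoded images (hypothetical pixel values).
sea = [(10, 60, 200)] * 90 + [(240, 240, 250)] * 10
blue_room = [(20, 50, 210)] * 85 + [(230, 235, 245)] * 15
sunset = [(230, 120, 30)] * 80 + [(60, 20, 10)] * 20

sea_vs_room = histogram_distance(color_histogram(sea), color_histogram(blue_room))
sea_vs_sunset = histogram_distance(color_histogram(sea), color_histogram(sunset))
print(sea_vs_room, sea_vs_sunset)
```

The sea and the blue room end up almost on top of each other, while the sunset is as far away as the distance allows. That is exactly the failure mode described above: the histogram sees "mostly blue" and nothing else.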
So what to do? It depends on your ultimate objective, really. One way would be to use one of the various open source feature extractors out there, save the data to text or another R-readable format, and then do the data processing in R as usual.
A nice open source C library for feature extraction that has a command-line interface is vlfeat. If you use this, I recommend using dense SIFT extraction on the three color channels. Then represent each image by the concatenated SIFT vectors and apply your favorite clustering technique (one that can handle vectors with dimensionalities in the thousands). That would hardly give you state-of-the-art performance, but it's a start.
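Once every image is represented by one feature vector, the clustering step itself is short. The sketch below is a plain k-means in Python, for illustration only. The tiny 3-dimensional vectors are hypothetical stand-ins for the concatenated SIFT vectors (which would have thousands of dimensions), and seeding the centroids with the first k vectors is a simplification I chose to keep the example deterministic.

```python
import math


def kmeans(vectors, k, iterations=20):
    """Plain Lloyd's k-means; the first k vectors serve as initial centroids."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: index of the nearest centroid (Euclidean distance).
        labels = [
            min(range(k), key=lambda c: math.dist(v, centroids[c]))
            for v in vectors
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical feature vectors, one per image.
features = [
    [0.9, 0.1, 0.0],
    [0.1, 0.8, 0.1],
    [0.8, 0.2, 0.1],
    [0.0, 0.9, 0.2],
]
labels, centroids = kmeans(features, k=2)
print(labels)
```

Images 0 and 2 land in one cluster and images 1 and 3 in the other. For real data you would want a better initialization and a library implementation, but the structure stays the same.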
This page has various reference implementations of feature extractors, but binary only.
Beware: in my experience, R doesn't scale too well with large, high dimensional datasets (with sizes in the GB range). I love R to death, but use C++ for this stuff.

Related

How can I isolate and recolor specific color range?

Given an image of the region containing the lips and other "noise" (teeth, skin), how can we isolate and recolor only the lips (simulating a "lipstick" effect)?
Attached is a photo describing the lips/mouth states.
What we have tried so far is a three-part process:
Color matching the lips using a stable point on the lips (provided by internal API).
Use this color as the base color for the lips isolation.
Recolor the lips (lipstick behavior)
We tried a few algorithms such as hue difference, HSV difference, and ∆E after converting the colors to a CIE color space. Unfortunately, each approach either failed or produced artifacts, due to the skin's relative similarity in color to the lips and the discoloration from shadows cast by the nose and mouth.
What are we missing? Is there a better way to approach it?
We are looking for a solution/direction from a classic Computer Vision color algorithm, not a solution from the Machine Learning/Deep Learning domain. Thanks!
You probably won't like this answer, but your question is ill-posed (there is no measurable solution that is better than others; there are only people's opinions).
In this case, the best answer you can hope for then is usually:
Ask an expert for a large set of examples that would be acceptable in practice.
Your problem can easily be solved by an appropriate artist (whom you trust to produce usable results) with access to the right tools (for example, Photoshop), but a single artist (or even a group of them) can't possibly scale to millions (or whatever large number you care about) of examples.
To address the shortcoming of the artist-based solution, you can use the following strategy:
Collect a sufficiently large set of before and after images created by artists whom you deem trustworthy.
Apply your favorite machine learning algorithm to learn a mapping from the before to the after images. There are many possible choices, and it almost doesn't matter which you choose as long as you know how to use it well.
Note that the above two steps are usually not one-and-done, as most algorithms are. While using the product, you will usually come across pathological or badly behaved examples for your ML solution. The key is to collect these examples, pass them through the artist, and retrain or update your ML model. Repeat this enough times and you will produce a state-of-the-art solution to your problem.
Whether you have the funding, time, motivation and resources to accomplish this is another matter.
You should try semantic segmentation techniques. They would give you very good results, and the approach generalizes well.

Time Series Anomaly Detection from Data vs Image

I was assigned a project to do anomaly detection on our company's KPIs. I googled and found AnomalyDetection by Twitter. A colleague suggested doing the anomaly detection on the graph images (comparing with the previous week's images to identify anomalous points) instead of using the raw time-series data.
I am not familiar with anomaly detection. Is anyone here experienced and able to advise which one is better (anomaly detection from data or from images) in terms of:
1. Accuracy
2. Storage
3. Processing
Advantages of the image-based approach:
Data-agnostic. Can theoretically be run on anything where one can get an image/visualization out.
Image models are relatively well understood.
Pretrained models are available.
Disadvantages:
Requires much more data to learn a useful model.
The image pixel space is much more complicated than the time-series it represents. Probably at least 100x.
Requires much more compute power. Both at training time, and at prediction time. Probably at least 100x.
Requires much more storage for datasets. Probably at least 100x.
Sensitive to changes in visualization.
A change in tickmarks or font for example would be an anomaly. Even a change in image compression may impact, if not controlled for.
Loss of explainability. It may be hard to know why a certain image is an anomaly, even for simple cases like a mean shift.
Much more complex model setup and infrastructure needed.
For an application like anomaly detection on time-series metrics, I would not recommend doing it. I am not even sure I have seen it studied.
I think it is unlikely that a high performing Anomaly Detection system for metrics can be built effectively with image processing on graphs.
Anomalies are typically quite rare, which means that it is a "low data" scenario. But many anomalies are also quite simple and can be detected with simple methods: even something as basic as well-chosen thresholds can go a long way. Using image processing does not help with either of these challenges; in fact, it is worse in most regards.
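As an illustration of how far a simple threshold on the raw data can go, here is a minimal Python sketch that flags points by their modified z-score (based on the median and the median absolute deviation). The 3.5 cutoff is a commonly used rule of thumb, and the KPI values are made up for the example. Neither comes from the AnomalyDetection package.

```python
from statistics import median


def mad_anomalies(values, threshold=3.5):
    """Return indices whose modified z-score (median/MAD based) exceeds threshold."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread at all; assumption: report nothing in that case
    return [
        i for i, v in enumerate(values)
        if 0.6745 * abs(v - med) / mad > threshold
    ]


# Hypothetical daily KPI readings with one obvious spike.
kpi = [100, 102, 98, 101, 99, 103, 180, 100, 97, 102]
print(mad_anomalies(kpi))
```

This runs on a handful of numbers, needs no training data, and the reason a point was flagged is immediately visible, which is the opposite of the image-based setup on every count listed above.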

Artificial neural network image transformation

I have pairs of images (input-output), but I don't know the transformation for going from A (input) to B (output). I want to record image A and get image B. Physically I can change the setup to get A or B, but I want to do it in software.
If I understand correctly, a trained Artificial Neural Network is able to do that: given an input, it can produce the corresponding output. Is that right?
Is there any software/ANN that, after being trained on a number of input-output pairs, will be able to provide the correct output when the input is a new image (but similar to the others)?
Thanks
If you have a reasonable number of image pairs (input/output) and you don't know the transformation between input and output, you could train an ANN on that training set to imitate the unknown transformation. You will only be able to train your ANN well if you have a sufficient number of training pairs, and it could be practically impossible when the unknown transformation is complicated.
For example, if the transformation simply increases the intensity values of the input image's pixels by a given value, an ANN will learn to imitate that behavior very quickly. But if the unknown transformation is a complicated convolution, a series of convolutions, or something more complicated still, it will be very hard, nearly impossible, to train an ANN to imitate it. So a more complex transformation needs a bigger training set and a more complex ANN design.
There are plenty of free open source ANN libraries implemented in many languages. You could start, for example, with this tutorial: http://www.codeproject.com/Articles/13091/Artificial-Neural-Networks-made-easy-with-the-FANN
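To show how quickly the simple case is learned, here is a minimal Python sketch. It is not a full ANN, just a single linear unit (out = w * in + b) trained by gradient descent, which is the smallest possible "network". The pixel values, learning rate, and epoch count are assumptions picked for the illustration.

```python
def fit_pixel_map(inputs, outputs, lr=0.5, epochs=2000):
    """Fit out = w * in + b by gradient descent on the mean squared error."""
    w, b = 1.0, 0.0
    n = len(inputs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(inputs, outputs):
            err = (w * x + b) - y
            grad_w += 2 * err * x / n
            grad_b += 2 * err / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Pixel intensities scaled to [0, 1]; the "unknown" transformation adds 0.2.
a_pixels = [0.0, 0.1, 0.25, 0.4, 0.5, 0.7, 0.8]
b_pixels = [x + 0.2 for x in a_pixels]

w, b = fit_pixel_map(a_pixels, b_pixels)
print(round(w, 3), round(b, 3))
```

Seven training pixels are enough here because the mapping has only two parameters. A convolution or anything spatially dependent cannot be expressed by this per-pixel model at all, which is why the required model and training set grow with the complexity of the transformation.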
What you are asking is possible in principle -- in theory, an ANN with sufficiently many hidden units can learn an arbitrary function to map inputs to outputs. However, as the comments and other answers have mentioned, there may be many technical issues with your particular problem that could make it impractical. I would classify these problems as (a) mapping complexity, (b) model complexity, (c) scaling complexity, and (d) implementation complexity. They are all somewhat related, but hopefully this is a useful way to break things down.
Mapping complexity
As mentioned by Springfield762, there are many possible functions that map from one image to another image. If the relationship between your input images and your output images is relatively simple -- like increasing the intensity of each pixel by a constant amount -- then an ANN would be able to learn this mapping without much difficulty. There are probably many more transformations that would be similarly easy to learn, such as skewing, flipping, rotating, or translating an image -- basically any affine transformation would be easy to learn. Other, nonlinear transformations could also be feasible, such as squaring the intensity of each pixel.
As a general rule, the more complicated the relationship between your input and output images, the more difficult it will be to get a model to learn this mapping for you.
Model complexity
The more complex the mapping from inputs to outputs, the more complex your ANN model will need to be to capture this mapping. Models with many hidden layers have been shown in the past 10 years to perform quite well on tasks that people had previously thought impossible, but often these state-of-the-art models have millions or even billions of parameters and take weeks to train on GPU hardware. A simple model can capture many simple mappings, but if you have a complex input-output map to learn, you'll need a large, complex model.
Scaling complexity
Yves mentioned in the comments that it can be difficult to scale models up to typical image sizes. If your images are relatively small (currently the state of the art is to model images on the order of 100x100 pixels), then you can probably just throw a bunch of raw pixel data at an ANN model and see what happens. But if you're using 6000x4000 images from your shiny Nikon DSLR, it's going to be quite difficult to process those in a reasonable amount of time. You'd be better off compressing your image data somehow (PCA is a common technique) and then trying to learn the mapping in the compressed space.
In addition, larger images will have a larger space of possible mappings between them, so you'll need more of your larger images as training data than you would if you had small images.
Springfield762 also mentioned this: If the mapping between your input and output images is simple, then you'll only need a few examples to learn the mapping successfully. But if you have a complicated mapping, then you'll need much more training data to have a chance at learning the mapping properly.
Implementation complexity
It's unlikely that a tool already exists that would let you just throw image data into an ANN model and have a mapping appear. Most likely you'll need, at a minimum, to implement some code that will pre-process your image data. In addition, if you have lots of large images you'll probably need to write code to handle loading data from disk, etc. (There are a lot of "big data" tools for things like this, but they all require some amount of work to get set up.)
There are many, many open source ANN toolkits out there nowadays. FANN (already mentioned) is a popular one in C++ with bindings in other languages. Caffe is quite popular, and is also implemented in C++ with bindings. There seem to be many toolkits that use Python and Theano or some other GPU acceleration library -- Keras, Lasagne, Hebel, Pylearn2, neon, and Theanets (I wrote this one). Many people use Torch, written in Lua. Matlab has at least one neural network toolbox. I'm less familiar with other ecosystems, but Java seems to have Deeplearning4j, C# has Accord, and even R has darch.
But with any of these neural network toolkits, you're going to have to write some code to load the data, process it into the appropriate input format, construct (or load) a network model, train the model, etc.
The problem you're trying to solve is a canonical classification problem that neural networks can help you solve. You treat the B images as a set of labels that you match to A, and once trained, the neural network will be able to match the B images to new input based on where the network locates new input in a high-dimensional vector space. I assume you'd use some combination of convolutional networks to create your features, and softmax for multinomial classification on the output layer. More here: http://deeplearning4j.org/convolutionalnets.html
Since this was written, there has been a lot of work in the realm of cGANs (conditional generative adversarial networks). Please refer to:
https://arxiv.org/pdf/1611.07004.pdf

OpenCV: Fingerprint Image and Compare Against Database

I have a database of images. When I take a new picture, I want to compare it against the images in this database and receive a similarity score (using OpenCV). This way I want to detect whether I already have an image that is very similar to the fresh picture.
Is it possible to create a fingerprint/hash of my database images and match new ones against it?
I'm searching for an algorithm, code snippet, or technical demo, not for a commercial solution.
Best,
Stefan
As Paul R has commented, this "fingerprint/hash" is usually a set of feature vectors or a set of feature descriptors. But most of the feature vectors used in computer vision are too computationally expensive for searching against a database. So this task needs a special kind of feature descriptor, because descriptors such as SURF and SIFT will take too much time for searching, even with various optimizations.
The only thing that OpenCV has for your task (object categorization) is an implementation of Bag of visual Words (BOW).
It can compute a special kind of image feature and train a vocabulary of visual words. Next you can use this vocabulary to find similar images in your database and compute a similarity score.
Here is OpenCV documentation for bag of words. Also OpenCV has a sample named bagofwords_classification.cpp. It is really big but might be helpful.
Content-based image retrieval systems are still a field of active research: http://citeseerx.ist.psu.edu/search?q=content-based+image+retrieval
First you have to be clear about what constitutes "similar" in your context:
Similar color distribution: Use something like color descriptors for subdivisions of the image, and you should get some fairly satisfying results.
Similar objects: Since the computer does not know what an object is, you will not get very far unless you have some extensive domain knowledge about the object (or a few object classes). A good overview of the current state of research can be seen here (results) and soon here.
There is no "serve all needs"-algorithm for the problem you described. The more you can share about the specifics of your problem, the better answers you might get. Posting some representative images (if possible) and describing the desired outcome is also very helpful.
This would be a good question for computer-vision.stackexchange.com, if it already existed.
You can use the pHash algorithm and store the pHash value in the database, then use this code:
double const mismatch = algo->compare(image1Hash, image2Hash);
Here the 'mismatch' value can easily tell you how similar the two images are.
Available hash functions:
AverageHash
PHASH
MarrHildrethHash
RadialVarianceHash
BlockMeanHash
ColorMomentHash
These functions are good enough to evaluate image similarity in most respects.
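To show what such a hash and its comparison boil down to, here is a minimal Python sketch of the average-hash idea: one bit per pixel, set when the pixel is brighter than the image mean, compared by counting differing bits. It is a simplified stand-in, not OpenCV's implementation. It assumes the image is already grayscale and already shrunk to a small fixed grid (real implementations typically resize to 8x8 first), and the function names are my own.

```python
def average_hash(gray):
    """Bit list: 1 where a pixel is brighter than the image mean, else 0."""
    flat = [p for row in gray for p in row]
    mean = sum(flat) / len(flat)
    return [1 if p > mean else 0 for p in flat]


def hamming(h1, h2):
    """Number of differing bits; 0 means the hashes are identical."""
    return sum(a != b for a, b in zip(h1, h2))


# Hypothetical 4x4 grayscale thumbnails.
original = [
    [10, 20, 200, 210],
    [15, 25, 205, 215],
    [12, 22, 202, 212],
    [11, 21, 201, 211],
]
brighter = [[p + 10 for p in row] for row in original]  # same scene, brighter
mirrored = [row[::-1] for row in original]              # different layout

same_scene = hamming(average_hash(original), average_hash(brighter))
other_scene = hamming(average_hash(original), average_hash(mirrored))
print(same_scene, other_scene)
```

The hash is a short bit string, so it can be stored in a database column, and a new photo only needs one cheap comparison per stored hash. A uniform brightness change leaves the hash untouched, while a different layout flips many bits.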

Determine the differences between two nearly identical photographs

This is a fairly broad question; what tools/libraries exist to take two photographs that are not identical, but extremely similar, and identify the specific differences between them?
An example would be to take a picture of my couch on Friday after my girlfriend is done cleaning and before a long weekend of having friends over, drinking, and playing Rock Band. Two days later I take a second photo of the couch; lighting is identical, the couch hasn't moved a millimeter, and I use a tripod in a fixed location.
What tools could I use to generate a diff of the images, or a third heatmap image of the differences? Are there any tools for .NET?
This depends largely on the image format and compression. But, at the end of the day, you are probably taking two rasters and comparing them pixel by pixel.
Take a look at the Perceptual Image Difference Utility.
The most obvious way to see every tiny, normally nigh-imperceptible difference, would be to XOR the pixel data. If the lighting is even slightly different, though, it might be too much. Differencing (subtracting) the pixel data might be more what you're looking for, depending on how subtle the differences are.
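The subtraction approach can be sketched in a few lines. This is a minimal Python illustration, not .NET code. It assumes two grayscale images of identical size given as nested lists, and the tolerance parameter is my own addition for ignoring small lighting or sensor noise.

```python
def difference_map(img_a, img_b, tolerance=0):
    """Per-pixel absolute difference; differences <= tolerance are set to 0."""
    diff = []
    for row_a, row_b in zip(img_a, img_b):
        diff.append([
            abs(a - b) if abs(a - b) > tolerance else 0
            for a, b in zip(row_a, row_b)
        ])
    return diff


# Hypothetical 2x3 grayscale frames: slight noise plus one real change.
friday = [[100, 100, 100], [100, 100, 100]]
sunday = [[101, 100, 99], [100, 30, 100]]

raw = difference_map(friday, sunday)
cleaned = difference_map(friday, sunday, tolerance=2)
print(raw)
print(cleaned)
```

With no tolerance, every bit of noise shows up, much as it would with an XOR. With a small tolerance, only the real change survives, and the resulting values can be rendered directly as a heatmap.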
One place to start is with a rich image processing library such as IM. You can dabble with its operators interactively with the IMlab tool, call it directly from C or C++, or use its really decent Lua binding to drive it from Lua. It supports a wide array of operations on bitmaps, as well as an extensible library of file formats.
Even if you haven't deliberately moved anything, you might want to use an algorithm such as SIFT to get good sub-pixel quality alignment between the frames. Unless you want to treat the camera as fixed and detect motion of the couch as well.
I wrote this free .NET application using the toolkit my company makes (DotImage). It has a very simple algorithm, but the code is open source if you want to play with it -- you could adapt the algorithm to .NET Image classes if you don't want to buy a copy of DotImage.
http://www.atalasoft.com/cs/blogs/31appsin31days/archive/2008/05/13/image-difference-utility.aspx
Check out Andrew Kirillov's article on CodeProject. He wrote a C# application using the AForge.NET computer vision library to detect motion. On the AForge.NET website, there's a discussion of two frame differences for motion detection.
It's an interesting question. I can't refer you to any specific libraries, but the process you're asking about is basically a minimal case of motion compensation. This is the way that MPEG (MP4, DIVX, whatever) video manages to compress video so extremely well; you might look into MPEG for some information about the way those motion compensation algorithms are implemented.
One other thing to keep in mind: JPEG compression is block-based, and much of the benefit MPEG gets comes from actually doing a block comparison. If most of your image (say the background) is the same from one image to the next, those blocks will be unchanged. It's a quick way to reduce the amount of data that needs to be compared.
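A block comparison can be sketched like this. It is a minimal Python illustration on nested lists of grayscale values, not how JPEG or MPEG codecs are actually implemented. The block size and function name are assumptions for the example.

```python
def changed_blocks(img_a, img_b, block=2):
    """Return (top, left) origins of block x block tiles that differ."""
    changed = []
    for top in range(0, len(img_a), block):
        for left in range(0, len(img_a[0]), block):
            tile_a = [row[left:left + block] for row in img_a[top:top + block]]
            tile_b = [row[left:left + block] for row in img_b[top:top + block]]
            if tile_a != tile_b:
                changed.append((top, left))
    return changed


# Hypothetical 4x4 frames that differ in a single pixel.
before = [[0] * 4 for _ in range(4)]
after = [row[:] for row in before]
after[3][2] = 255

print(changed_blocks(before, after))
```

Only the tiles reported here need a detailed pixel-level comparison; everything else (the unchanged background) can be skipped.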
Just use the .NET imaging classes: create two Bitmap objects and look at the R, G, and B values of each pixel. You can also look at the A (alpha/transparency) values if you want to when determining the difference.
Also note that the GetPixel(x, y) method can be very slow. There is another (less elegant) way to get the entire image data and iterate through it yourself; if I remember correctly it was called getBitmap or something similar (probably Bitmap.LockBits). Look in the imaging/bitmap classes and read some tutorials. They really are all you need and aren't that difficult to use; don't go third party unless you have to.

Resources