This is a fairly broad question; what tools/libraries exist to take two photographs that are not identical, but extremely similar, and identify the specific differences between them?
An example would be to take a picture of my couch on Friday after my girlfriend is done cleaning and before a long weekend of having friends over, drinking, and playing rock band. Two days later I take a second photo of the couch; lighting is identical, the couch hasn't moved a milimeter, and I use a tripod in a fixed location.
What tools could I use to generate a diff of the images, or a third heatmap image of the differences? Are there any tools for .NET?
This depends largely on the image format and compression. But, at the end of the day, you are probably taking two rasters and comparing them pixel by pixel.
Take a look at the Perceptual Image Difference Utility.
The most obvious way to see every tiny, normally nigh-imperceptible difference, would be to XOR the pixel data. If the lighting is even slightly different, though, it might be too much. Differencing (subtracting) the pixel data might be more what you're looking for, depending on how subtle the differences are.
One place to start is with a rich image processing library such as IM. You can dabble with its operators interactively with the IMlab tool, call it directly from C or C++, or use its really decent Lua binding to drive it from Lua. It supports a wide array of operations on bitmaps, as well as an extensible library of file formats.
Even if you haven't deliberately moved anything, you might want to use an algorithm such as SIFT to get good sub-pixel quality alignment between the frames. Unless you want to treat the camera as fixed and detect motion of the couch as well.
I wrote this free .NET application using the toolkit my company makes (DotImage). It has a very simple algorithm, but the code is open source if you want to play with it -- you could adapt the algorithm to .NET Image classes if you don't want to buy a copy of DotImage.
http://www.atalasoft.com/cs/blogs/31appsin31days/archive/2008/05/13/image-difference-utility.aspx
Check out Andrew Kirillov's article on CodeProject. He wrote a C# application using the AForge.NET computer vision library to detect motion. On the AForge.NET website, there's a discussion of two frame differences for motion detection.
It's an interesting question. I can't refer you to any specific libraries, but the process you're asking about is basically a minimal case of motion compensation. This is the way that MPEG (MP4, DIVX, whatever) video manages to compress video so extremely well; you might look into MPEG for some information about the way those motion compensation algorithms are implemented.
One other thing to keep in mind; JPEG compression is a block-based compression; much of the benefit that MPEG brings from things is to actually do a block comparison. If most of your image (say the background) is the same from one image to the next, those blocks will be unchanged. It's a quick way to reduce the amount of data needed to be compared.
just use the .net imaging classes, create a new bitmap() x 2 and look at the R & G & B values of each pixel, you can also look at the A (Alpha/transparency) values if you want to when determining difference.
also a note, using the getPixel(y, x) method can be vastly slow, there is another way to get the entire image (less elegant) and for each ing through it yourself if i remember it was called the getBitmap or something similar, look in the imaging/bitmap classes & read some tutes they really are all you need & aren't that difficult to use, dont go third party unless you have to.
Related
Given an image of the region containing the lips and other "noise" (teeth, skin), how can we isolate and recolor only the lips (simulating a "lipstick" effect)?
Attached is a photo describing the lips/mouth states.
What we have tried so far is a three-part process:
Color matching the lips using a stable point on the lips (provided by internal API).
Use this color as the base color for the lips isolation.
Recolor the lips (lipstick behavior)
We tried a few algorithms like hue difference, HSV difference, ∆and E after converting them to CIE color space. Unfortunately, nothing has panned out or has produced artifacts due to the skin's relative similarity in color to the lips and the discoloration from shadows cast by the nose and mouth.
What are we missing? Is there a better way to approach it?
We are looking for a solution/direction from a classic Computer Vision color algorithm, not a solution from the Machine Learning/Depp Learning domain. Thanks!
You probably won't like this answer, but your question is ill-posed (there is no measurable solution that is better than others, there are only peoples' opinions.)
In this case, the best answer you can hope for then is usually:
Ask an expert for a large set of examples that would be acceptable in practice.
Your problem can easily be solved by an appropriate artist (who you trust will produce usable results) with access to the right tools (for example photoshop,) but a single artist (or even a group of them) can't possibly scale to millions (or whatever large number you care about) of examples.
To address the short-coming of the artist-based solution, you can use the following strategy:
Collect a sufficiently large set of before and after images created by artists, who you deem trustworthy.
Apply your favorite machine learning algorithm to learn a mapping from the before to the after images. There are many possible choices, and it almost really doesn't matter which you choose as long as you know how to use it well.
Note, the above two steps are usually not one-and-done, as most algorithms are. Usually, you will come across pathological or not-well behaved examples to your ML solution above in using the product. The key is to collect these examples, pass them through the artist and retrain or update your ML model. Repeat this enough times and you will produce a state-of-the-art solution to your problem.
Whether you have the funding, time, motivation and resources to accomplish this is another matter.
You should try semantic segmentation techniques that would definitely give you very good results and it would be a generalized concept.
What are the different options and solutions (software) that will help distinguish professional (good) from amateur (bad) photo?
The criteria can be the contrast, sharpness, noise, presence of compression artifacts, etc. The question is, what are the tools that allow all this to determine it (the machine, not the man). For all of these criteria can be represented as mathematical models, you think?
Or in other words - to "feed" tool 1000 high-quality photos and 1000 substandard. And machine itself has identified the factors that distinguish the good from the bad image.
This is a quite vague definition of a problem. The only thing you have is 1000 high-quality photos and 1000 substandard photos. Your application however, is quite concrete, and I doubt (but I'm not sure) that you will find such a software.
Without looking to your images and have some tests is also difficult to say if contrast/gamma would be enough to classify them properly.
What you can do, if you know a bit of coding in matlab/python/C, is to use some existing libraries to try to solve your problem. I can't help you with that, as this itself is a quite tedious work, but, I can give you some insights.
To define your problem you will need:
Input: 1000 pro images, 1000 std images
You can represent this as 2000 images and a 2000 binary vector (1 for pro, 0 for std)
Features
Images itself might not give you enough information. What you can do is extract features from images. This step is called feature extraction and is an open research field in Computer Vision. There several feature extractors out there, you can try a couple of the most used ones, such as HoG or SIFT (have a look here for examples).
This feature extractors will give you a 1xM numerical vector for each image. With N images, you have a NxM matrix composed of N images and their descriptor.
Classification:
Once you managed to extract features from the image, having X = NxM data and y = label binary vector, you can use any machine learning algorithm, such as Deep Neural Networks, Random Forests, Supported Vector Machines, or any other one, to train your data, and classify it later.
By putting everything together, you might be able to get decent results.
Is this professional vs amateur photographer or equipment classification? I mean like distinguishing between a DLSR photo or a cell phone photo. Or is this distinguishing between an amateur with a DLSR and a professional with the same equipment? Or, are we talking about photoshop editing at the end.
In the case of equipment, I think features to look at would be noise, contrast, color gamut. In the case of skill of the photographer, you will probably have to look at features based on edge representations, natural scene metrics etc.
But, you will need to create a data matrix and then run a machine learning classification algorithm on it and hopefully find some features.
Are you making or looking for art for humans or for robots?
The definition of a great photo is subjective by definition. What to one person may be junk to another is genius. Trying to take that and turn it into an equation is asking for AI to take over humankind.
I don't mean to be harsh in my assessment. My degree is in Fine Art, Painting. I put a lot of time and effort into thinking about images. It's a wonder to me how a child might make an image that seemed to be thoughtless but was a breakthrough in my perception. Conversely you can work for hours, day, months or even years and then feel like the result just doesn't measure up.
Photography is an ingenious invention, therefore all photography is a source for amazement.
I do agree with you however. Some photos are truly impressive. I think that if we approach this as coders than what we are seeking is 'likes' or 'page views' or some other method of getting counts from many people. I know that is not the answer you were looking for but I don't think you can find a better one. I wish you well on your quest.
If you want to judge a photo on technicalities then the edge of the physics should be your target. Currently that would be mirrorless cameras and 3d imaging.
I need to implement algorithm which for input has picture ( jpeg ) and create new picture like output, but only with bodies ( background is removed completely ). Input picture is picture with people from vacation and I need to recognize human bodies and remove background. Can someone suggest me what algorithm to use, what book to buy to learn that algorihms ?
Check this link it will perfectly answer your question of removing the background and performing further processing
neural networks are particuarly useful for this kind of task, but the theory is a universe, if you're doing it from scratch ... that's a lot of work
This is a segmentation problem. In the general case, segmenting images is a hard research problem (I just spent five years doing a doctorate on segmenting greyscale medical images, for example) and the way you go about it is strongly tied to the type of images with which you have to deal. The best advice I can give is to go and read the appropriate literature on segmenting colour images (e.g. use Google Scholar). In terms of books, this one's a good general-purpose introduction to image processing:
http://www.amazon.co.uk/Digital-Image-Processing-Rafael-Gonzalez/dp/0130946508/ref=sr_1_7?ie=UTF8&qid=1326236038&sr=8-7
Searching for "segmenting people in colour images" on Google seems to turn up some good links, incidentally.
I have a question for you: you want to implement this using an algorithm? If so, then it might require a lot of things to be done (provided you are new to the field of image processing).
Otherwise you may try using masking techniques in image editing software like Adobe Photoshop (that would hardly take 15 mins, depending upon how well you know it)
A good book to start with image processing techniques is: "Digital Image Processing" by Gonzalez and Woods; it starts from the basics, and explains stuff in depth.
Still it may take a lot of time to develop an algorithm to do this job. I recommend you use some library for the same. OpenCV(opensource computer vision) is an excellent choice. The library itself comes with demos which include programs for face detection etc. The inbuilt functions provide a variety of features (edge detection/Feature identification and extraction, you may have to use this) Here's the link
http://opencv.willowgarage.com/wiki/
The link provides a lot of reference material that you can make use of! :)
Start with facial recognition software and algorithms; they have been the most refined over the years and as long as all of your bodies have heads, you can use exif data to figure image capture orientation (of course you can't completely rely on that), sample the facial skin to get skin tone ranges, and find the attached body. Anything that is not head and body should be deleted. This process assumes that a person has roughly the same skin tone on their face as their body and the camera flash isn't washing this out. You could grab the flash duration and some other attributes from exif and adjust your ranges accordingly.
A lot of software out there can recognize faces (look at iPhoto for example), so you'll have to use the face as a reference point, along with skin tone, to find your body edges. You result isn't going to be perfect, but as long as your approach is sound, you'll end up with something useful.
And release your software as open source when you're done so I can use it... :)
You can download a free PDF of the book Computer Vision by Richard Szeliski from the author's website. Not only do you have a free book on algorithms, but it's a book that addresses this specific problem.
http://szeliski.org/Book/
You'll see this image at the top of that page of the author's website.
Used copies of the hardcover are available for about $62 if you check addall.com. If you spent some time doing image processing, you'll appreciate having a paper copy of at least one good general reference book.
Its tough but not impossible. I can't give you any code but Peter Norvig had a great talk on the value of data and in the talk he shows how he was able to take a picture of lake and remove all the houses blocking the image and have the lake expanded with boats,etc..
The computer basically learned how lakes look and boats go on lakes and then removed the houses and placed it there. He explains his process(but no code or anything).
Here it is:
Peter Norvig - The Unreasonable Effectiveness of Data
http://www.youtube.com/watch?v=yvDCzhbjYWs
I have a kind of general R question here:
Usually with digicams we tend to click a lot of immages which may be repetitive and can waste online space while sharing on Picassa or is an overhead when trying to delete some unwanted images.
Is it possible to cluster photos using R? I mean there are some clustering abilities in Matlab for image processing, but is this kind of functionality available or are there any suggestions to do this so in R?
Please provide some ideas if any on this topic.
If you look at CRAN, there are various (I count about 10) packages to read image data. And of course, there are various packages to do clustering. In theory, you could just plug the raw image data into the clustering algorithms, but in practice that wouldn't work very well. In terms of speed, it would be very slow, and in terms of accuracy, it would probably be pretty bad too. Modern techniques to cluster image data rely on specialized features extracted from images and operate on that. The best features are application dependent, but some of the best known are SIFT, SURF, and HOG. Older techniques relied on histograms of colors of the image as features, and that is quite doable with the aforementioned R packages, but it is not very accurate - it can hardly distinguish between a picture of the sea and a picture of a blue room.
So what to do? It depends on your ultimate objective, really. One way could be using one of various open source feature extractors out there, save the data to text or other R-readable formats, and then do the data processing in R as usual.
A nice open source C library to extract features that has a cli interface is vlfeat. If you use this, I recommend using dense SIFT extraction on the three color channels. Then represent each image by the concatenated SIFT vectors and apply your favorite clustering technique (that can handle vectors with dimensionalities in the thousands). That would hardly give you state of the art performance, but it's a start.
This page has various reference implementations of feature extractors, but binary only.
Beware: in my experience, R doesn't scale too well with large, high dimensional datasets (with sizes in the GB range). I love R to death, but use C++ for this stuff.
Update
I asked this question quite a while ago now, and I was curious if anything like this has been developed since I asked the question?
I don't even know if there is a term for this kind of algorithm, and I guess there won't be if nobody has invented it yet. However it also makes googling for this a bit hard. Does anybody know if there is a term for this algorithm/principle yet?
This is an idea I have been thinking about, but I do not quite know how to solve it. I would like to know if any solutions like this exists out there, or if you guys have any idea how this could be implemented.
Steganography
Steganography is basically the art of hiding messages. In modern days we do this digitally by e.g. modifying the least significant bits in a image as the one below. Thus for every pixel and for every colour component of that pixel we might be able to hide a byte or two.
This alternation is not visibly by the naked eye, but analysing the least significant bits might reveal patterns that exposes the existence and possibly content of a hidden message. To counter this we simply encrypt the message before embedding it in the image, which keeps the message safe and also helps preventing discovery of the existence of a hidden message.
Thus, in principle, steganography provides the following:
Hiding encrypted message in any kind of media data. (Images, music, video, etc.)
Complete deniability of the existence of a hidden message without the correct key.
Extraction of the hidden message with the correct key.
(source: cs.vu.nl)
Semacodes
Semacodes are a way of encoding data in a visually representation, that may be printed, copied, and scanned easily. The Data Matrix shown below is a example of a semacode containing the famous Lorem Ipsum text. This is essentially a 2D barcode with a higher capacity that usually barcodes. Programs for generating semacodes are readily available, and ditto for software for reading them, especially for cell phones. Semacodes usually contains error correcting codes, are generally very robust, and can be read in very damaged conditions.
Thus semacodes has the following properties:
Data encoding that may be printed and copied.
May be scanned and interpreted even in damaged (dirty) conditions, and generally a very robust encoding.
Combining it
So my idea is to create something that combines these two, with all of the combined properties. This means it would have to:
Embed a encrypted message in any media, probably a scanned image.
The message should be extractable even if the image is printed and scanned, and even partly damaged.
The existence of a embedded message should be undetectable without the key used for encryption.
So, first of all I would like to know if any solutions, algorithms or research is available on this? Secondly I would like to hear any ideas/thoughts on how this might be done?
I really hope to get a good discussion going on the possibilities and feasibility of implementing something like this, and I am looking forward to reading your answers.
Update
Thanks for all the good input on this. I will probably work a bit more on this idea when I have more time. I am convinced it must be possible. Think about research in embedding watermarks in music and movies.
I imagine part of the robustness of a semacode to damage/dirt/obscuration is the high contrast between the two states of any "cell". The reader can still make a good guess as to the actual state, even with some distortion.
That sort of contrast is not available in a photographic image, and is the very reason why steganography works - the lsb bit-flipping has almost no visual effect on the image itself, while digital fidelity ensures that a non-visual system can still very accurately read the embedded data.
As the two applications are sort of at opposite ends of the analog/digital spectrum (semacodes are all about being decipherable by analog (visual) processing but are on paper, not digital; steganography is all about the bits in the file and cares nothing for the analog representation, whether light or sound or something else), I imagine a combination of the two will extremely difficult, if not impossible.
Essentially what you're thinking of is being able to steganographically embed something in an image, print the image, make a colour photocopy of it, scan it in, and still be able to extract the embedded data.
I'm afraid I can't help, but if anyone achieves this, I'll be DAMN impressed! :)
It's not a complete answer, but you should look at watermarking. This technique solves your first two goals (embedable in a printed image and readable even from partly damaged scan).
Part of watermarking's reliability to distortion and transcription errors (from going from digital to analog and back) come from redundancy (e.g. repeating the data several times). Those would make the watermark detectable even without a key. However, you might be able to use redundancy techniques that are more subtle, maybe something related to erasure coding or secret sharing.
I know that's not a complete answer, but hopefully those leads will point you in the right direction!
What language/environment are you using? It shouldn't be that hard to write code that opens both the image and semacode as a bitmap (the latter as a monochrome), sets the lowest bit(s) of each byte of each pixel in the color image to the value of the corresponding pixel of the monochrome bitmap.
(optionally expand the semacode bitmap first to the same pixel-dimensions extending with white)