Can someone help me analyse this code for the total variation filter by Guy Gilboa?

Found this very interesting code on the total variation filter, tvmfilter.
The additional functions this code uses are very confusing, but the denoising is far better than all the filters I have tried so far.
I have figured out the code on my own :)

His additional function "tv" denoises with the ROF model, which has been a major research topic for two decades now. See http://www.ipol.im/pub/algo/g_tv_denoising/ for a summary of current methods.
Briefly, the idea behind ROF is to approximate the given noisy image with a piecewise-constant image by solving an optimization problem that penalizes the total variation (i.e., the l1-norm of the gradient) of the image.
The reason this performs well is that the other denoising methods you are probably working with smooth the image via convolution with a Gaussian, i.e., they penalize the l2-norm of the gradient (equivalently, they solve the heat equation on the image). While fast to compute, denoising by smoothing blurs edges and thus results in poor image quality. l1-norm optimization preserves edges.
It's not clear how Guy solves the TV problem in the code you linked. He references the original ROF paper, so it's possible that he's just using the original method (gradient descent), which is quite slow to converge. I suggest you give this code/paper a try: http://www.stanford.edu/~tagoldst/Tom_Goldstein/Split_Bregman.html as it's probably faster than the .m file you are using.
Also, as was mentioned in the comments, you will get better denoising (i.e., higher SNR) using non-local means. However, the non-local means algorithm takes much longer to run, since it searches the entire image for similar patches and computes weights based on them.
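If you just want to experiment with ROF-style denoising without decoding the .m file, scikit-image ships a TV denoiser (Chambolle's algorithm rather than the original gradient descent). A minimal sketch for comparing TV against plain Gaussian smoothing (the function names are scikit-image's, not anything from Guy's code):
```python
# TV (ROF-style) denoising vs. Gaussian smoothing, for a quick visual comparison.
import numpy as np
from skimage import data, util, restoration, filters

clean = util.img_as_float(data.camera())
noisy = util.random_noise(clean, var=0.01)                 # additive Gaussian noise

tv = restoration.denoise_tv_chambolle(noisy, weight=0.1)   # penalizes the l1-norm of the gradient
gauss = filters.gaussian(noisy, sigma=2)                   # l2-style smoothing, blurs edges
```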

Related

Stochastic Gradient Descent (Momentum) Formula Implementation C++

So I have an implementation of a neural network that I followed on YouTube. The guy uses SGD (Momentum) as the optimization algorithm and hyperbolic tangent as the activation function. I already changed the transfer function to Leaky ReLU (for the hidden layers) and Sigmoid (for the output layer).
But now I decided I should also change the optimization algorithm to Adam. I ended up searching for SGD (Momentum) on Wikipedia for a deeper understanding of how it works, and I noticed something's off. The formula the guy uses in the clip is different from the one on Wikipedia, and I'm not sure if that's a mistake or not. The clip is one hour long, but I'm not asking you to watch the entire video; I'm interested in the 54m37s mark and the Wikipedia formula, right here:
https://youtu.be/KkwX7FkLfug?t=54m37s
https://en.wikipedia.org/wiki/Stochastic_gradient_descent#Momentum
So if you take a look at the guy's implementation and then at the Wikipedia formula for SGD (Momentum), basically the only difference is in the delta-weight calculation.
Wikipedia says the new delta weight is the momentum multiplied by the old delta weight, minus the learning rate multiplied by the gradient and the output value of the neuron. In the tutorial, instead of subtracting, the guy adds those terms together. However, the formula for the new weight is correct: it simply adds the delta weight to the old weight.
So my question is: did the guy in the tutorial make a mistake, or is there something I am missing? Because somehow I trained a neural network and it behaves as expected, so I can't really tell what the problem is here. Thanks in advance.
I have seen momentum implemented in different ways. Personally, I followed this guide in the end: http://ruder.io/optimizing-gradient-descent
There, momentum and weights are updated separately, which I think makes it clearer.
I do not know enough about the variables in the video, so I am not sure about that, but the Wikipedia version is definitely correct.
In the video, gradient*learning_rate gets added instead of subtracted, which is fine if you calculate and propagate your error accordingly.
Also, where the video has "neuron_getOutputVal()*m_gradient": if it is what I think it is, that whole expression is the gradient. What I mean is that you have to multiply what you propagate by the outputs of your neurons to get the actual gradient.
For gradient descent without momentum, once you have your actual gradient, you multiply it by a learning rate and subtract it from your weights (or add it, depending on how you calculated and propagated the error, but usually subtract).
With momentum, you do it as described on Wikipedia, using the last "change to your weights" or "delta weights" as part of your formula.
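As a reference point, the separate-update form from the Ruder guide / Wikipedia looks roughly like this (a minimal sketch; the variable names are mine, not the ones from the video):
```python
# SGD with momentum, written as two separate updates (Wikipedia/Ruder style).
import numpy as np

def sgd_momentum_step(w, grad, velocity, lr=0.01, momentum=0.9):
    """v <- momentum * v - lr * grad;  w <- w + v"""
    velocity = momentum * velocity - lr * grad   # accumulate the "delta weight"
    w = w + velocity                             # apply it to the weights
    return w, velocity
```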

Image segmentation with watershed thresholding

I have implemented the marker-less (so not like OpenCV's) watershed algorithm proposed in the 1991 paper by Vincent and Soille.
I have also implemented a distance-transform algorithm, which I apply before watershedding.
It works well in a good number of cases, but sometimes it produces a little oversegmentation. I have also corrected some of this by Gaussian-filtering the distance-transform image.
I am planning to correct this further by applying thresholding to the watersheds, i.e. keeping only watersheds whose height is greater than some threshold.
Considering that this paper is quite old (1991), I am wondering if anyone knows of papers or resources that describe something similar to what I intend to do.
Notes:
1) I am not using OpenCV; I am implementing everything myself from papers.
2) I am going for a marker-less watershed.
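For context, the pipeline I am describing looks roughly like this (scipy/scikit-image names are used purely as a stand-in; my own code implements these steps from the papers):
```python
# Rough shape of the pipeline: distance transform -> Gaussian smoothing -> watershed.
import numpy as np
from scipy import ndimage as ndi
from skimage.segmentation import watershed

def segment(binary_mask, sigma=2.0):
    dist = ndi.distance_transform_edt(binary_mask)   # distance transform of the foreground
    dist = ndi.gaussian_filter(dist, sigma=sigma)    # smoothing reduces oversegmentation
    labels = watershed(-dist, mask=binary_mask)      # marker-less: minima of -dist act as seeds
    return labels
```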

Image registration (non-rigid / nonlinear)

I'm looking for an algorithm (preferably with source code available) for image registration.
The image deformation can't be described by a homography matrix (I think because the distortion is not symmetrical and not homogeneous); more specifically, the deformations are like barrel distortion and trapezoid distortion, maybe with some rotation of the image.
I want to obtain pairs of pixels from the two images so that I can obtain a representation of the "deformation field".
I googled a lot and found that there are some algorithms based on physics ideas, but it seems that they can converge to a local maximum rather than the global one.
I can afford the program to be semi-automatic, meaning some simple user interaction.
Maybe some algorithm like SIFT would be appropriate?
But I think it can't provide a "deformation field" with sufficient, regular density.
If it is important: there are no scale changes.
Example of a complicated field:
http://www.math.ucla.edu/~yanovsky/Research/ImageRegistration/2DMRI/2DMRI_lambda400_grid_only1.png
What you are looking for is "optical flow". Searching for these terms will yield you numerous results.
In OpenCV, there is a function called calcOpticalFlowFarneback() (in the video module) that does what you want.
The C API does still have an implementation of the classic paper by Horn & Schunck (1981) called "Determining optical flow".
You can also have a look at this work I've done, along with some code (but be careful, there are still some mysterious bugs in the opencl memory code. I will release a corrected version later this year.): http://lts2www.epfl.ch/people/dangelo/opticalflow
Besides OpenCV's optical flow (and mine ;-), you can have a look at ITK on itk.org for complete image registration chains (mostly aimed at medical imaging).
There's also a lot of optical flow code (matlab, C/C++...) that can be found thanks to google, for example cs.brown.edu/~dqsun/research/software.html, gpu4vision, etc
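To make the OpenCV suggestion concrete, a dense-flow call with the Python bindings looks roughly like this (the numeric parameters are just common defaults, not tuned values, and the file names are hypothetical):
```python
# Dense optical flow with Farneback's algorithm (one flow vector per pixel).
import cv2

img1 = cv2.imread("image1.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("image2.png", cv2.IMREAD_GRAYSCALE)

# flow[y, x] = (dx, dy): where pixel (x, y) of img1 ends up in img2.
flow = cv2.calcOpticalFlowFarneback(img1, img2, None,
                                    0.5, 3, 15, 3, 5, 1.2, 0)
```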
-- EDIT : about optical flow --
Optical flow is divided into two families of algorithms: the dense ones, and the others.
Dense algorithms give one motion vector per pixel; non-dense ones give one vector per tracked feature.
Examples of the dense family include Horn-Schunck and Farneback (to stay with OpenCV), and more generally any algorithm that minimizes some cost function over the whole image (the various TV-L1 flows, etc.).
An example of the non-dense family is the KLT tracker, which is called Lucas-Kanade in OpenCV.
In the dense family, since the motion of each pixel is almost unconstrained, these methods can deal with scale changes. Keep in mind, however, that they can fail in the case of large motions / scale changes because they usually rely on linearizations (Taylor expansions of the motion and image changes). Furthermore, in the variational approach, each pixel contributes to the overall result, so parts that are invisible in one image are likely to pull the algorithm away from the actual solution.
Anyway, techniques such as coarse-to-fine implementations are employed to bypass these limits, and these problems usually have only a small impact. Brutal illumination changes, or large occluded/unoccluded areas, can also be explicitly dealt with by some algorithms; see for example this paper that computes a sparse image of "innovation" alongside the optical flow field.
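For completeness, the sparse (KLT / Lucas-Kanade) counterpart with the same Python bindings, again with typical default parameters and hypothetical file names:
```python
# Sparse flow: track a set of corner features from img1 into img2.
import cv2

img1 = cv2.imread("image1.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("image2.png", cv2.IMREAD_GRAYSCALE)

pts = cv2.goodFeaturesToTrack(img1, 500, 0.01, 7)        # corners to track
new_pts, status, err = cv2.calcOpticalFlowPyrLK(img1, img2, pts, None)
matches = [(p.ravel(), q.ravel())                        # matched point pairs
           for p, q, ok in zip(pts, new_pts, status) if ok[0]]
```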
I found some medical-specific software; it's complicated and doesn't work with simple image formats, but it seems to do what I need.
http://www.csd.uoc.gr/~komod/FastPD/index.html
Drop - Deformable Registration using Discrete Optimization

Explaining the AdaBoost Algorithms to non-technical people

I've been trying to understand the AdaBoost algorithm without much success. For example, I'm struggling to understand the Viola-Jones paper on face detection.
Can you explain AdaBoost in layman's terms and present good examples of when it's used?
AdaBoost is an algorithm that combines classifiers with poor performance, aka weak learners, into a bigger classifier with much higher performance.
How does it work? In a very simplified manner:
1. Train a weak learner.
2. Add it to the set of weak learners trained so far (with an optimal weight).
3. Increase the importance of samples that are still misclassified.
4. Go to step 1.
There is a broad and detailed theory behind the scenes, but the intuition is just that: let each "dumb" classifier focus on the mistakes the previous ones were not able to fix.
AdaBoost is one of the most used algorithms in the machine learning community. In particular, it is useful when you know how to create simple classifiers (possibly many different ones, using different features) and you want to combine them in an optimal way.
In Viola and Jones, each different type of weak learner is associated with one of the 4 or 5 different Haar features you can have.
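That loop is small enough to write down. Here is a toy sketch with single-feature threshold "stumps" as the weak learners (purely illustrative, nothing like the optimized Viola-Jones training):
```python
# Toy AdaBoost with decision stumps as weak learners (labels y must be -1 or +1).
import numpy as np

def stump_predict(X, feature, threshold, polarity):
    pred = polarity * np.sign(X[:, feature] - threshold)
    pred[pred == 0] = polarity                     # break ties consistently
    return pred

def adaboost_train(X, y, n_rounds=10):
    n, d = X.shape
    w = np.full(n, 1.0 / n)                        # sample weights, start uniform
    ensemble = []
    for _ in range(n_rounds):
        # 1. Train a weak learner: pick the stump with the lowest weighted error.
        err, j, t, s = min(((np.sum(w[stump_predict(X, j, t, s) != y]), j, t, s)
                            for j in range(d)
                            for t in np.unique(X[:, j])
                            for s in (1, -1)), key=lambda item: item[0])
        # 2. Add it to the ensemble with an "optimal" weight alpha.
        alpha = 0.5 * np.log((1.0 - err) / max(err, 1e-12))
        ensemble.append((alpha, j, t, s))
        # 3. Increase the importance of samples that are still misclassified.
        w *= np.exp(-alpha * y * stump_predict(X, j, t, s))
        w /= w.sum()
    return ensemble

def adaboost_predict(ensemble, X):
    score = sum(a * stump_predict(X, j, t, s) for a, j, t, s in ensemble)
    return np.sign(score)
```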
AdaBoost uses a number of training sample images (such as faces) to pick a number of good 'features'/'classifiers'. For face recognition, a classifier is typically just a rectangle of pixels that has a certain average color value and a relative size. AdaBoost will look at a number of classifiers and find out which one is the best predictor of a face based on the sample images. After it has chosen the best classifier, it will continue to find another and another until some threshold is reached, and those classifiers combined together will provide the end result.
This part you may not want to share with non-technical people :) but it is interesting anyway. There are several mathematical tricks which make AdaBoost fast for face recognition, such as the ability to add up all the pixel values of an image and store them in a 2-dimensional array so that the value at any position is the sum of all the pixels above and to the left of that position. This array (an "integral image") can be used to very quickly calculate the sum of the pixels in any rectangle within the image from the values at its four corners: bottom-right minus top-right minus bottom-left plus top-left; dividing by the number of pixels in the rectangle then gives the average. Using this trick you can quickly scan over an entire image looking for rectangles of different relative sizes that match or are close to a particular value.
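A minimal sketch of that summed-area trick (plain NumPy, illustrative only):
```python
# Integral image ("summed-area table") and constant-time rectangle sums.
import numpy as np

def integral_image(img):
    """ii[y, x] = sum of img[0..y, 0..x] (inclusive prefix sums in both axes)."""
    return img.astype(np.int64).cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, top, left, bottom, right):
    """Sum of img[top..bottom, left..right] from at most four lookups."""
    total = ii[bottom, right]
    if top > 0:
        total -= ii[top - 1, right]
    if left > 0:
        total -= ii[bottom, left - 1]
    if top > 0 and left > 0:
        total += ii[top - 1, left - 1]
    return total
```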
Hope this helps.
This is understandable. Most of the papers you can find on the Internet retell the Viola-Jones and Freund-Schapire papers, which are the foundation of AdaBoost as applied to face detection in OpenCV, and they mostly consist of difficult formulas and algorithms from several mathematical areas combined.
Here is what can help you (short enough):
1 - It is used in object detection and, mostly, in face detection/recognition. The most popular and quite good C++ library is OpenCV, originally from Intel. I take the face detection part of OpenCV as an example.
2 - First, a cascade of boosted classifiers working with sample rectangles ("features") is trained on a sample of images with faces (called positive) and without faces (negative).
From some Googled paper:
"· Boosting refers to a general and provably effective method of producing a very accurate classifier by combining rough and moderately inaccurate rules of thumb.
· It is based on the observation that finding many rough rules of thumb can be a lot easier than finding a single, highly accurate classifier.
· To begin, we define an algorithm for finding the rules of thumb, which we call a weak learner.
· The boosting algorithm repeatedly calls this weak learner, each time feeding it a different distribution over the training data (in AdaBoost).
· Each call generates a weak classifier and we must combine all of these into a single classifier that, hopefully, is much more accurate than any one of the rules."
During this process the images are scanned to determine the distinctive areas corresponding to certain parts of every face. Complex hypothesis-based calculations are applied (which are not so difficult to understand once you get the main idea).
This can take a week, and the output is an XML file which contains the learned information on how to quickly detect a human face, say, in a frontal position in any picture (it can be any other object in other cases).
3 - After that you supply this file to the OpenCV face detection program, which runs quite fast with up to a 99% positive rate (depending on conditions).
As was mentioned here, the scanning speed can be increased greatly with a technique known as the "integral image".
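To make step 3 concrete, running a trained cascade with the Python bindings looks roughly like this (the XML name is the stock frontal-face cascade shipped with OpenCV; cv2.data.haarcascades is available in the pip packages, otherwise point at the file directly; "photo.jpg" is a hypothetical input):
```python
# Load a trained cascade XML and run the boosted detector on one image.
import cv2

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
img = cv2.imread("photo.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
for (x, y, w, h) in faces:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
```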
And finally, these are helpful sources - Object Detection in OpenCV and
Generic Object Detection using AdaBoost from University of California, 2008.

Computing the difference between images

Do you guys know of any algorithms that can be used to compute the difference between images?
Take this website for example: http://tineye.com/ You give it a link or upload an image and it finds similar images. I doubt that it compares the image in question against all of them (or maybe it does).
By compute I mean something like what the Levenshtein distance or the Hamming distance is for strings.
By no means do I need the correct answer for a project or anything; I just found the website and got very curious. I know Digg pays for a similar service for their website.
The very simplest measures are going to be RMS-error based approaches, for example:
Root Mean Square Deviation
Peak Signal to Noise Ratio
These probably gel with your notion of a distance measure, but their results are really only meaningful if you've got two images that are very close already, like if you're looking at how well a particular compression scheme preserved the original image. Also, the same result from either comparison can mean a lot of different things, depending on what kind of artifacts there are (take a look at the paper I cite below for some example photos of how RMS/PSNR can be misleading).
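For concreteness, the two simple measures above come down to a few lines (a NumPy sketch, assuming two same-size grayscale images):
```python
# RMS error and PSNR between two same-size grayscale images.
import numpy as np

def rmse(a, b):
    diff = a.astype(np.float64) - b.astype(np.float64)
    return np.sqrt(np.mean(diff ** 2))

def psnr(a, b, max_val=255.0):
    e = rmse(a, b)
    return float("inf") if e == 0 else 20.0 * np.log10(max_val / e)
```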
Beyond these, there's a whole field of research devoted to image similarity. I'm no expert, but here are a few pointers:
A lot of work has gone into approaches using dimensionality reduction (PCA, SVD, eigenvalue analysis, etc) to pick out the principal components of the image and compare them across different images.
Other approaches (particularly medical imaging) use segmentation techniques to pick out important parts of images, then they compare the images based on what's found
Still others have tried to devise similarity measures that get around some of the flaws of RMS error and PSNR. There was a pretty cool paper on the spatial domain structural similarity (SSIM) measure, which tries to mimic peoples' perceptions of image error instead of direct, mathematical notions of error. The same guys did an improved translation/rotation-invariant version using wavelet analysis in this paper on WSSIM.
It looks like TinEye uses feature vectors with values for lots of attributes to do their comparison. If you hunt around on their site, you eventually get to the Ideé Labs page, and their FAQ has some (but not too many) specifics on the algorithm:
Q: How does visual search work?
A: Idée’s visual search technology uses sophisticated algorithms to analyze hundreds of image attributes such as colour, shape, texture, luminosity, complexity, objects, and regions.These attributes form a compact digital signature that describes the appearance of each image, and these signatures are calculated by and indexed by our software. When performing a visual search, these signatures are quickly compared by our search engine to return visually similar results.
This is by no means exhaustive (it's just a handful of techniques I've encountered in the course of my own research), but if you google for technical papers or look through proceedings of recent conferences on image processing, you're bound to find more methods for this stuff. It's not a solved problem, but hopefully these pointers will give you an idea of what's involved.
One technique is to use color histograms. You can use machine-learning algorithms to find similar images based on the representation you use, for example the commonly used k-means algorithm. I have seen other solutions that analyze the vertical and horizontal lines in the image after edge detection. Texture analysis is also used.
A recent paper clustered images from Picasa Web. You can also try the clustering algorithm that I am working on.
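A bare-bones version of the color-histogram idea (NumPy only; the L1 distance here is just one of many reasonable choices):
```python
# Compare two RGB images by the distance between their normalized color histograms.
import numpy as np

def color_histogram(img, bins=8):
    """img: H x W x 3 uint8 array -> flattened, normalized 3-D color histogram."""
    hist, _ = np.histogramdd(img.reshape(-1, 3).astype(np.float64),
                             bins=(bins,) * 3, range=((0, 256),) * 3)
    return (hist / hist.sum()).ravel()

def histogram_distance(img1, img2):
    return np.abs(color_histogram(img1) - color_histogram(img2)).sum()
```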
Consider using lossy wavelet compression and comparing the highest relevance elements of the images.
What TinEye does is a sort of hashing over the image or parts of it (see their FAQ). It's probably not a real hash function, since they want similar "hashes" for similar (or nearly identical) images. But all they need to do is compare that hash, and probably substrings of it, to know whether the images are similar/identical or whether one is contained in another.
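A crude illustration of the "similar images get similar fingerprints" idea is an average hash. This is not TinEye's actual algorithm, just the simplest perceptual-hash sketch (PIL and NumPy assumed):
```python
# Average hash: shrink, grayscale, threshold at the mean, compare with Hamming distance.
import numpy as np
from PIL import Image

def average_hash(path, size=8):
    img = Image.open(path).convert("L").resize((size, size))
    pixels = np.asarray(img, dtype=np.float64)
    return (pixels > pixels.mean()).ravel()        # 64-bit boolean fingerprint

def hamming(h1, h2):
    return int(np.count_nonzero(h1 != h2))         # small distance ~ similar images
```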
Here's an image similarity page, but it's for polygons. You could convert your image into a finite number of polygons based on color and shape, and run these algorithms on each of them.
Here is some code I wrote four years ago in Java (yikes) that does image comparisons using histograms. Don't look at any part of it other than buildHistograms().
https://jpicsort.dev.java.net/source/browse/jpicsort/ImageComparator.java?rev=1.7&view=markup
Maybe it's helpful, at least if you are using Java.
Correlation techniques will make a match jump out. If they're JPEGs, you could compare the dominant coefficients of each 8x8 block and get a decent match. This isn't exactly correlation, but it's based on a cosine transform, so it's a first cousin.

Resources