The SIFT descriptor is a local descriptor that was introduced by David Lowe. Computing it can be split up into multiple steps:
1- Constructing a scale space
2- LoG Approximation
3- Finding keypoints
4- Getting rid of bad keypoints
5- Assigning an orientation to the keypoints
6- Generating SIFT features
So, my question is:
What is the computational complexity of the SIFT descriptor? Something like O(2n+logn)?
Here's a paper that talks exactly about this.
The actual time complexity for an n by n image is apparently
where x is the neighborhood of the tile, pi and po are the number of input and output pins of the chip, α, β, γ are the feature fractions, and Γ0, Γ1, Γ2 are the input, compute, and output clocks.
In the Bag of Features/Visual Words paradigm we have a k-dimensional vector V, where V[i]=j means that for j visual descriptors (e.g. SIFT descriptors) the i-th centroid (obtained by the k-means algorithm) is the closest one among all k centroids.
AFAIK, the resulting visual vector is very sparse (most of its entries are 0) since k is really big, but my question is: what is a reasonable value for k (and so the vector size)? Hundreds of dimensions? Thousands? Especially considering that k-means execution time depends on k.
Depends on your data, really. Here is the rule of thumb:
Too small a K: your clusters will not represent all patches.
Too large a K: you may get quantization artifacts and probably overfitting.
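To make the vector V from the question concrete, here is a minimal sketch in plain Python. It assumes the centroids have already been computed (e.g. by k-means) and that descriptors and centroids are plain tuples of floats; the function names and the toy 2-D data are made up for illustration, and real SIFT descriptors would have 128 dimensions.

```python
from collections import Counter


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def bag_of_words(descriptors, centroids):
    """Return V where V[i] counts the descriptors whose nearest centroid is i."""
    k = len(centroids)
    counts = Counter(
        min(range(k), key=lambda i: squared_distance(d, centroids[i]))
        for d in descriptors
    )
    return [counts.get(i, 0) for i in range(k)]


# Toy data (hypothetical): 3 visual words, 4 descriptors.
centroids = [(0.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
descriptors = [(1.0, 1.0), (9.0, 9.5), (0.5, -0.5), (9.0, 11.0)]
V = bag_of_words(descriptors, centroids)
print(V)  # [2, 2, 0]
```

The entries of V always sum to the number of descriptors in the image, so with a large K and only a few hundred descriptors per image most entries are necessarily zero.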
Some weeks ago I implemented a simple block matching stereo algorithm, but the results were bad. So I searched the Internet for better algorithms. There I found semi-global matching (SGM), published by Heiko Hirschmueller. It gives some of the best results in relation to its processing time.
I've implemented the algorithm and got really good results (compared to simple block matching) as you can see here:
I've reprojected the 2D points to 3D by using the calculated disparity values with the following result
At the end of SGM I have an array with aggregated costs for each pixel. The disparity is equivalent to the index with the lowest cost value.
The problem is that searching for the minimum only returns discrete values. This results in individual layers in the point cloud. In other words: round surfaces are cut into many layers (see point cloud).
Heiko mentioned in his paper that it would be easy to get sub-pixel accuracy by fitting a polynomial function to the cost array and taking its lowest point as the disparity.
The problem is not bound to stereo vision, so in other words the task is the following:
given: An array of values, representing a polynomial function.
wanted: The lowest point of the polynomial function.
I don't have any idea how to do this. I need a fast algorithm, because I have to run this code for every pixel in the image.
(For example: 500x500 pixels with 60-200 costs each => the algorithm has to run 250,000 times and go through 15,000,000-50,000,000 cost values in total!)
I don't need a real-time solution! My current SGM implementation (L2R and R2L matching, no CUDA or multi-threading yet) takes about 20 seconds to process an image with 500x500 pixels ;).
I'm not asking for libraries! I'm trying to implement my own independent computer vision library :).
Thank you for your help!
With kind regards,
Finding the exact lowest point of a general polynomial is a hard problem, since it is equivalent to finding the roots of the derivative of the polynomial. In particular, if your polynomial is of degree 6, the derivative is a quintic polynomial, which in general is known not to be solvable by radicals. You therefore need to either:
fit the function using a restricted family for which the roots of the derivative are easy to compute, e.g. the integrals of prod_i(x-ri)p(x) where deg(p)<=4, OR
use an iterative method to find an APPROXIMATE minimum (Newton's method, gradient descent).
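One cheap instance of the "restricted family" option is to drop to degree 2: fit a parabola through the discrete minimum and its two neighbours in the cost array and take the vertex, which has a closed form. This is a swapped-in simplification and not necessarily the exact fit Hirschmueller describes; the function name and the fallback behaviour at the borders are my own assumptions.

```python
def subpixel_minimum(costs):
    """Refine the index of the smallest cost with a three-point parabola fit.

    Returns a float index. Falls back to the integer index when the minimum
    sits on the border of the array or the three samples are flat.
    """
    d = min(range(len(costs)), key=costs.__getitem__)
    if d == 0 or d == len(costs) - 1:
        return float(d)  # no neighbour on one side, nothing to fit

    c_prev, c_min, c_next = costs[d - 1], costs[d], costs[d + 1]
    curvature = c_prev - 2.0 * c_min + c_next
    if curvature <= 0.0:
        return float(d)  # all three samples equal, vertex is undefined

    # Vertex of the parabola through (-1, c_prev), (0, c_min), (1, c_next).
    return d + 0.5 * (c_prev - c_next) / curvature


# Toy cost curve (hypothetical) with its true minimum at 3.3.
costs = [(x - 3.3) ** 2 for x in range(7)]
print(subpixel_minimum(costs))
```

After the argmin you already compute, this adds only a handful of arithmetic operations per pixel, and the offset always lies in [-0.5, 0.5] around the integer disparity.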
What is the complexity of the SIFT feature extraction algorithm by Lowe?
Is there any website or other source where I can find it?
I think it is l·m, where l is the number of octaves and m is the number of images in each octave.
I want to confirm whether this is correct or not.
I need help in this regard.
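As a rough sanity check on the l·m guess, the sketch below counts how many pixels are touched while building the scale space. It assumes each octave halves the width and height of the previous one (as in Lowe's scheme) and it ignores everything after pyramid construction (keypoint filtering, orientation, descriptors), so it is not a full complexity analysis. The function name is made up.

```python
def pyramid_pixel_count(width, height, octaves, images_per_octave):
    """Total number of pixels over all images of all octaves."""
    total = 0
    w, h = width, height
    for _ in range(octaves):
        total += w * h * images_per_octave
        # The next octave works on an image with half the width and height.
        w, h = max(1, w // 2), max(1, h // 2)
    return total


n_pixels = 512 * 512
m = 5  # images per octave (hypothetical value)
for octaves in (1, 2, 4, 8):
    print(octaves, pyramid_pixel_count(512, 512, octaves, m) / (n_pixels * m))
```

The printed ratio stays below 4/3 however many octaves are used, because the per-octave pixel counts form a geometric series. Under these assumptions the pyramid work grows with the number of pixels times m, and adding octaves changes it by at most a constant factor.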
I have implemented a nice algorithm ("Non Local Means") for reducing noise in images.
It is based on its Matlab implementation.
The problem with NLMeans is that the original algorithm is slow even in compiled languages like C/C++, and I am trying to run it in a scripting language.
One of the best solutions is to use the improved Blockwise NLMeans algorithm, which is ~60-80 times faster. The problem is that the paper which describes it is written in complex mathematical language, and it's really hard for me to understand the idea and program it
(yes, I didn't learn math at college).
That is why I am desperately looking for pseudocode of this algorithm.
(A modification of the original Matlab implementation would be just perfect.)
I admit, I was intrigued until I saw the result: 260+ seconds on a dual core, and that doesn't include the overhead of a scripting language, and that's for the Optimized Block Non Local Means filter.
Let me break down the math for you – My idea of pseudo-code is writing in Ruby.
Non Local Means Filtering
Assume an image that's 200 x 100 pixels (20,000 pixels total), which is a pretty tiny image. We're going to have to go through 20,000 pixels and evaluate each one as a weighted average over the pixels of the image:
NL[v](i) = ∑_{j ∈ I} w(i,j) v(j)
where 0 ≤ w(i,j) ≤ 1 and ∑_j w(i,j) = 1
Understandably, this last part can be a little confusing, but this is really nothing more than a convolution filter the size of the whole image being applied to each pixel.
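Here is a minimal sketch of that weighted average in plain Python, for a 2-D grayscale image stored as a list of rows. Several things are my assumptions rather than part of the formula above: the weight is exp(-d/h^2) with d the mean squared difference between the patches around i and j, the sum over j is limited to a small search window instead of the whole image I, and borders are handled by clamping coordinates. The names and default parameters are made up.

```python
import math
import random


def nl_means(image, patch_radius=1, search_radius=2, h=0.1):
    """Pixelwise non-local means on a list-of-rows grayscale image."""
    rows, cols = len(image), len(image[0])

    def pixel(y, x):
        # Clamp coordinates so patches near the border stay inside the image.
        return image[min(max(y, 0), rows - 1)][min(max(x, 0), cols - 1)]

    def patch_distance(y1, x1, y2, x2):
        # Mean squared difference between the patches centred on two pixels.
        total = 0.0
        for dy in range(-patch_radius, patch_radius + 1):
            for dx in range(-patch_radius, patch_radius + 1):
                diff = pixel(y1 + dy, x1 + dx) - pixel(y2 + dy, x2 + dx)
                total += diff * diff
        return total / (2 * patch_radius + 1) ** 2

    out = [[0.0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            weight_sum = 0.0
            value_sum = 0.0
            for sy in range(max(0, y - search_radius), min(rows, y + search_radius + 1)):
                for sx in range(max(0, x - search_radius), min(cols, x + search_radius + 1)):
                    w = math.exp(-patch_distance(y, x, sy, sx) / (h * h))
                    weight_sum += w
                    value_sum += w * image[sy][sx]
            # Dividing by weight_sum is what makes the weights sum to 1.
            out[y][x] = value_sum / weight_sum
    return out


# Toy usage (hypothetical data): a flat gray image with Gaussian noise.
rng = random.Random(0)
noisy = [[0.5 + rng.gauss(0.0, 0.05) for _ in range(8)] for _ in range(8)]
denoised = nl_means(noisy)
```

The four nested loops (plus the two inside the patch comparison) are where the running time goes: every output pixel costs one patch comparison per pixel in its search window.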
Blockwise Non Local Means Filtering
The blockwise implementation takes overlapping sets of voxels (volumetric pixels - the implementation you pointed us to is for 3D space). Presumably, taking a similar approach, you could apply this to 2D space, taking sets of overlapping pixels. Let's see if we can describe this...
NL [v] (ijk) = 1/|Ai|∑ w(ijk, i)v(i)
Where A is a vector of the pixels to be estimated, and similar circumstances as above are applied.
[NB: I may be slightly off; It's been a few years since I did heavy image processing]
In all likelihood, we're talking about reducing the complexity of the algorithm at a minimal cost to noise-reduction quality. The larger the sample vector, the higher the quality as well as the higher the complexity. By overlapping and then averaging the sample vectors from the image, then applying that weighted average to each pixel, we're looping through the image far fewer times.
Loop through the image to collect the sample vectors and store their weighted average in an array.
Multiply each pixel's value by the corresponding weighted average (a number between 0 and 1).
Pretty simple, but the processing time is going to be horrid with larger images.
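For a 2-D image, the blockwise idea can be sketched as follows. This is my reading of the general scheme (weights computed once per block on a coarse grid, whole blocks restored at once, overlapping estimates averaged), not a port of the Matlab code or of the paper's exact formulas. The weight function exp(-d/h^2), the clamped borders and all names and defaults are assumptions.

```python
import math
import random


def blockwise_nl_means(image, block_radius=1, step=2, search_radius=2, h=0.1):
    """Blockwise non-local means on a list-of-rows grayscale image."""
    if step > block_radius + 1:
        raise ValueError("need step <= block_radius + 1 so blocks cover every pixel")
    rows, cols = len(image), len(image[0])
    offsets = [(dy, dx)
               for dy in range(-block_radius, block_radius + 1)
               for dx in range(-block_radius, block_radius + 1)]

    def block(cy, cx):
        # Pixel values of the block centred on (cy, cx), clamped at the borders.
        return [image[min(max(cy + dy, 0), rows - 1)][min(max(cx + dx, 0), cols - 1)]
                for dy, dx in offsets]

    acc = [[0.0] * cols for _ in range(rows)]  # sum of estimates per pixel
    cnt = [[0] * cols for _ in range(rows)]    # number of estimates per pixel

    # Only every step-th pixel is a block centre, so far fewer weights are computed.
    for cy in range(0, rows, step):
        for cx in range(0, cols, step):
            ref = block(cy, cx)
            estimate = [0.0] * len(offsets)
            weight_sum = 0.0
            for sy in range(cy - search_radius, cy + search_radius + 1):
                for sx in range(cx - search_radius, cx + search_radius + 1):
                    cand = block(sy, sx)
                    dist = sum((a - b) ** 2 for a, b in zip(ref, cand)) / len(offsets)
                    w = math.exp(-dist / (h * h))
                    weight_sum += w
                    for i, value in enumerate(cand):
                        estimate[i] += w * value
            # Write the restored block back; overlapping blocks accumulate.
            for (dy, dx), value in zip(offsets, estimate):
                y, x = cy + dy, cx + dx
                if 0 <= y < rows and 0 <= x < cols:
                    acc[y][x] += value / weight_sum
                    cnt[y][x] += 1

    # Each pixel is the plain average of the estimates from the blocks covering it.
    return [[acc[y][x] / cnt[y][x] for x in range(cols)] for y in range(rows)]


# Toy usage (hypothetical data): a flat gray image with Gaussian noise.
rng = random.Random(0)
noisy = [[0.5 + rng.gauss(0.0, 0.05) for _ in range(8)] for _ in range(8)]
denoised = blockwise_nl_means(noisy)
```

With step=2 the expensive weight computation runs for a quarter of the pixels in 2-D (an eighth in 3-D); larger steps save more but need larger blocks to keep full coverage.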
Final Thoughts
You're going to have to make some tough decisions. If you're going to use a scripting language, you're already dealing with significant interpretive overhead. It's far from optimal to use a scripting language for heavy-duty image processing. If you're not processing medical images, in all likelihood, there are far better algorithms to use with lower big-O complexity.
Hope this is helpful... I'm not terribly good at making a point clearly and concisely, so if I can clarify anything, let me know.
I have a database with 500,000 points in a 100-dimensional space, and I want to find the closest 2 points. How do I do it?
Update: Space is Euclidean, Sorry. And thanks for all the answers. BTW this is not homework.
There's a chapter in Introduction to Algorithms devoted to finding two closest points in two-dimensional space in O(n*logn) time. You can check it out on google books. In fact, I suggest it for everyone as the way they apply divide-and-conquer technique to this problem is very simple, elegant and impressive.
Although it can't be extended directly to your problem (as constant 7 would be replaced with 2^101 - 1), it should be just fine for most datasets. So, if you have reasonably random input, it will give you O(n*logn*m) complexity where n is the number of points and m is the number of dimensions.
That's all assuming you have a Euclidean space, i.e., the length of vector v is sqrt(v0^2 + v1^2 + v2^2 + ...). If you can choose the metric, however, there could be other options to optimize the algorithm.
Use a kd tree. You're looking at a nearest neighbor problem and there are highly optimized data structures for handling this exact class of problems.
P.S. Fun problem!
You could try the ANN library, but that only gives reliable results up to 20 dimensions.
Run PCA on your data to reduce the vectors from 100 dimensions to, say, 20 dimensions. Then create a K-Nearest Neighbor tree (KD-Tree) and get the closest 2 neighbors based on Euclidean distance.
Generally, if the number of dimensions is very large, you have to use either a brute-force approach (parallel + distributed/map-reduce) or a clustering-based approach.
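For reference, the brute-force baseline looks roughly like this. It is a sketch in plain Python with one small optimization that is my own addition: the distance computation for a pair is abandoned as soon as the partial sum already exceeds the best squared distance found so far. The function name and the toy data are made up.

```python
def closest_pair(points):
    """Return ((i, j), distance) for the closest two points, Euclidean metric."""
    best_sq = float("inf")
    best_pair = None
    for i in range(len(points) - 1):
        p = points[i]
        for j in range(i + 1, len(points)):
            q = points[j]
            dist_sq = 0.0
            for a, b in zip(p, q):
                dist_sq += (a - b) ** 2
                if dist_sq >= best_sq:
                    break  # this pair cannot beat the current best
            else:
                # Loop finished without break: new best pair found.
                best_sq = dist_sq
                best_pair = (i, j)
    return best_pair, best_sq ** 0.5


# Toy data (hypothetical): 4 points in 3 dimensions.
points = [(0.0, 0.0, 0.0), (5.0, 5.0, 5.0), (1.0, 0.0, 0.0), (5.0, 5.0, 5.5)]
pair, distance = closest_pair(points)
print(pair, distance)  # (1, 3) 0.5
```

This is O(n^2 * m) in the worst case. For 500,000 points that is about 1.25 * 10^11 pairs, so in practice it only makes sense as the inner kernel of a parallel or partitioned run, or as a correctness check for a faster method on a subsample.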
Use the data structure known as a KD-TREE. You'll need to allocate a lot of memory, but you may discover an optimization or two along the way based on your data.
My friend was working on his PhD thesis years ago when he encountered a similar problem. His work was on the order of 1M points across 10 dimensions. We built a kd-tree library to solve it. We may be able to dig up the code if you want to contact us offline.
Here's his published paper: