I am currently reading Yasutaka Furukawa et al.'s paper "Accurate, Dense, and Robust Multi-View Stereopsis" (the PDF is available online), where they describe an MVS algorithm for reconstructing a 3D point cloud from images.
I do understand the concepts and the main steps, but there is one detail that I am struggling with. This may be because I am not a native English speaker, so maybe a small hint would be enough.
On page 4 of the paper, in Section 3.2 "Expansion", there is the definition of "n-adjacent" patches:
|(c(p)−c(p'))·n(p)|+|(c(p)−c(p'))·n(p')| < 2ρ_2
My question is about ρ_2, which is described as follows:
[...] ρ_2 is determined automatically as the distance at the depth of the
midpoint of c(p) and c(p') corresponding to an image displacement of β1 pixels
in R(p).
I do not understand what "distance" in this context should be, and I do not understand the stated correspondence to the image displacement.
I know that this is a very specific question, but since this paper is fairly popular I hoped that somebody here could help me.
Alright, I think I do get it now.
It just means that ρ_2 is the distance you have to move within a plane parallel to the image plane, located at the same depth from the camera as the midpoint of c(p) and c(p'), so that the point moves by β1 pixels in the image R(p).
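To make that interpretation concrete, here is a minimal sketch assuming an ideal pinhole camera for R(p) whose focal length is given in pixels. A lateral shift Δ in a plane at depth d projects to about f·Δ/d pixels, so ρ_2 = β1·d/f, with d the depth of the midpoint of c(p) and c(p') measured along the optical axis. The function names and arguments are hypothetical (not taken from PMVS), and the second function simply evaluates the n-adjacency inequality quoted above.

```python
import numpy as np


def rho_2(c_p, c_q, cam_center, optical_axis, focal_px, beta1):
    """World-space distance that corresponds to beta1 pixels in the image of R(p).

    Assumes a pinhole camera: a lateral shift `delta` in a plane at depth
    `depth` projects to focal_px * delta / depth pixels.
    """
    mid = 0.5 * (np.asarray(c_p, float) + np.asarray(c_q, float))
    axis = np.asarray(optical_axis, float)
    axis = axis / np.linalg.norm(axis)
    # depth of the midpoint along the camera's optical axis
    depth = float(np.dot(mid - np.asarray(cam_center, float), axis))
    return beta1 * depth / focal_px


def n_adjacent(c_p, n_p, c_q, n_q, rho2):
    """|(c(p)-c(p'))·n(p)| + |(c(p)-c(p'))·n(p')| < 2*rho_2."""
    diff = np.asarray(c_p, float) - np.asarray(c_q, float)
    lhs = abs(diff @ np.asarray(n_p, float)) + abs(diff @ np.asarray(n_q, float))
    return bool(lhs < 2.0 * rho2)
```

For example, with f = 1000 px, β1 = 2 and a midpoint 10 units in front of the camera, ρ_2 = 2·10/1000 = 0.02 world units: two patches count as n-adjacent only if their centres lie within roughly that distance of each other's tangent planes.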
I have never really worked with optics; I am currently reading Hecht's Optics to get a deeper understanding of the subject. I need to create software that can take an image (a simple image, such as a red circle on a white background) and output the image that a person with hyperopia (farsightedness) would see when their eye (or eyes) is positioned at the centre of the circle. What algorithms can I use to model a lens for this purpose? Any references to books, research papers, or libraries are appreciated.
[I deleted this post because I thought it was too light on details, but since no one else is replying I've undeleted it in case it helps. Recently I've found that there's a Stack Exchange site for scientific computing that might be a better place to ask: http://scicomp.stackexchange.com/]
It really depends on what you want to do.
For something as simple as simulating what a farsighted person would see when looking at a (nearby) flat image, blurring (as suggested by Domi in the comments) is probably fine.
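A minimal NumPy sketch of the blurring approach, assuming an RGB float image with values in [0, 1] and a Gaussian point spread function whose width `sigma` (in pixels) has already been chosen. The helper `red_circle` just builds the test image from the question; in practice a library routine such as OpenCV's `GaussianBlur` would replace the hand-written separable convolution.

```python
import numpy as np


def gaussian_kernel(sigma):
    """Normalised 1-D Gaussian kernel truncated at about 3 sigma."""
    radius = max(1, int(round(3 * sigma)))
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()


def gaussian_blur(img, sigma):
    """Blur an H x W x C image with a separable Gaussian (edge-replicated borders)."""
    k = gaussian_kernel(sigma)
    r = len(k) // 2
    out = img.astype(float)
    for axis in (1, 0):  # horizontal pass, then vertical pass
        pad = [(0, 0)] * out.ndim
        pad[axis] = (r, r)
        padded = np.pad(out, pad, mode="edge")
        acc = np.zeros_like(out)
        for i, w in enumerate(k):
            sl = [slice(None)] * out.ndim
            sl[axis] = slice(i, i + out.shape[axis])
            acc += w * padded[tuple(sl)]
        out = acc
    return out


def red_circle(size=101, radius=30):
    """Red disc on a white background, as in the question."""
    yy, xx = np.mgrid[:size, :size]
    c = size // 2
    img = np.ones((size, size, 3))
    inside = (xx - c) ** 2 + (yy - c) ** 2 <= radius ** 2
    img[inside] = (1.0, 0.0, 0.0)
    return img
```

Calling `gaussian_blur(red_circle(), sigma)` leaves the middle of the disc and the far background unchanged and softens only the rim, which is what a uniform defocus of a flat image looks like in this approximation.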
Things get progressively more complex when:
what is being imaged contains components at different distances (in simple terms, the blurring for each will be different)
you want to include the exact effects of lens aberrations (such as spherical or chromatic aberration)
you want to include wave-like effects (like diffraction)
For general classical aberrations you have to do physically accurate ray tracing. In practice you may find approximations that give good enough results in exchange for speed (blurring, for example, is an extreme approximation). For wave-like effects I am unsure; I guess you extend ray tracing with optical path lengths.
My copy of Hecht is very old, but in the geometrical optics part there's a section on ray tracing, and that whole chapter covers the theory.
Remember that, even if blurring is good enough, you still have to work out how much blurring to apply from the exact details of the case and the geometries involved (basically, you want the point spread function for your system, which you would then likely approximate with a Gaussian).
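One rough way to get from the geometry to a blur width is sketched below. It rests on loudly simplified assumptions: focusing at viewing distance d needs 1/d + H diopters (H = uncorrected hyperopia), the eye can supply at most A_max diopters of accommodation, and any shortfall becomes defocus. Small-angle geometric optics then gives an angular blur-circle diameter of roughly pupil diameter (m) × defocus (D) radians, and a uniform disc of diameter D has a per-axis standard deviation of D/4, which is used as the Gaussian sigma. All function names and example numbers are hypothetical and illustrative.

```python
def residual_defocus_d(view_dist_m, hyperopia_d, max_accommodation_d):
    """Defocus (diopters) left over after the eye accommodates as much as it can.

    Simplified model (assumption): focusing at view_dist_m needs
    1/view_dist_m + hyperopia_d diopters; the eye supplies at most
    max_accommodation_d; any shortfall is blur.
    """
    demand = 1.0 / view_dist_m + hyperopia_d
    return max(0.0, demand - max_accommodation_d)


def blur_sigma_px(defocus_d, pupil_diam_m, view_dist_m, pixel_pitch_m):
    """Gaussian sigma (pixels) approximating the defocus blur disc on the image.

    The blur circle subtends about pupil_diam_m * defocus_d radians; projected
    back onto the viewed image it has that angle times the viewing distance as
    its diameter. A uniform disc of diameter D has per-axis std D / 4.
    """
    blur_angle_rad = pupil_diam_m * defocus_d
    blur_diam_m = blur_angle_rad * view_dist_m  # disc size on the image plane
    blur_diam_px = blur_diam_m / pixel_pitch_m
    return blur_diam_px / 4.0
```

For instance, an image viewed at 25 cm by an eye with 2 D of hyperopia and 4 D of accommodation leaves 2 D of defocus; with a 4 mm pupil and 0.2 mm pixels that is a 10-pixel blur disc, i.e. sigma ≈ 2.5 px, which can then be passed to any Gaussian blur routine.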
I'm going to match face sketches (drawings) to color photos. For my research I want to find out what the challenges are in matching sketches to color face photos. So far I have found:
resolution pixel difference
texture difference
distance difference
and color (not much effect)
I want to know (in technical terms) what the other challenges are, and which OpenCV and JavaCV methods and algorithms are available to overcome them.
Here are some examples of sketches and the photos that are known to match them:
This problem is called multi-modal face recognition. There has been a lot of interest in comparing a high-quality mugshot (modality 1) to low-quality surveillance images (modality 2); other examples are frontal images vs. profiles, or photos vs. sketches, which is what the OP is interested in. Partial Least Squares (PLS) and Tied Factor Analysis (TFA) have been used for this purpose.
A key difficulty is computing two linear projections, one from images in modality 1 and one from images in modality 2, into a common space where two points being close means that the individual is the same. This is the key technical step. Here are some papers on this approach:
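The "two projections into a common space" idea can be sketched with PLS-SVD, the simplest PLS variant: the projections are the top singular vectors of the cross-covariance between paired photo and sketch features. This is not the exact PLS procedure of Sharma & Jacobs, nor TFA; it assumes faces are already aligned and turned into fixed-length feature vectors, and that you have training pairs (row i of X and row i of Y show the same person). `fit_pls_svd` and `match` are hypothetical helper names.

```python
import numpy as np


def fit_pls_svd(X, Y, k):
    """Learn one projection per modality from paired training data.

    X: (n, p) features of modality 1 (e.g. photos); Y: (n, q) features of
    modality 2 (e.g. sketches). The projections are the top-k left/right
    singular vectors of the cross-covariance Xc^T Yc, i.e. the directions
    along which the two modalities covary most.
    """
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    C = (X - mx).T @ (Y - my)
    U, _, Vt = np.linalg.svd(C, full_matrices=False)
    return mx, my, U[:, :k], Vt[:k].T


def match(gallery, probes, model):
    """For each probe (modality 2), index of the nearest gallery item
    (modality 1) in the shared latent space (Euclidean distance)."""
    mx, my, Wx, Wy = model
    G = (gallery - mx) @ Wx
    P = (probes - my) @ Wy
    d2 = ((P[:, None, :] - G[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)
```

In this framing, the gallery would be the mugshot photos and the probes the sketches; the hard part the answer refers to is learning projections that still line up on real data, where the relation between modalities is far from linear.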
Abhishek Sharma, David W. Jacobs, Bypassing Synthesis: PLS for Face Recognition with Pose, Low-Resolution and Sketch, CVPR 2011.
S. J. D. Prince, J. H. Elder, J. Warrell, F. M. Felisberti, Tied Factor Analysis for Face Recognition across Large Pose Differences, IEEE Trans. Pattern Anal. Mach. Intell., 30(6), 970-984, 2008. Elder is a specialist in this area and has a variety of papers on the topic.
B. Klare, Z. Li, A. K. Jain, Matching Forensic Sketches to Mugshot Photos, IEEE Trans. Pattern Anal. Mach. Intell., 29 Sept. 2010.
As you can see, this is an active research area. In terms of using OpenCV to overcome the difficulties, let me give you an analogy: you need to build a house (match sketches to photos) and you're asking how having a Stanley hammer (OpenCV) will help. Sure, it will probably help. But you'll also need a lot of other resources: wood, time/money, pipes, cable, etc.
I think that James Elder's old work on the completeness of the edge map (using reconstruction by solving the Laplace equation) is quite relevant here. See the results at the end of this paper: http://elderlab.yorku.ca/~elder/publications/journals/ElderIJCV99.pdf
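Elder's method involves more machinery than this (for example, it also deals with blur), so the snippet below only illustrates the core idea in simplified form: filling in an image from sparse known values by solving the discrete Laplace equation, with the known pixels as fixed (Dirichlet) constraints. It assumes you already have a boolean mask of edge pixels together with their intensities, and uses plain Jacobi iteration for clarity rather than speed. `harmonic_fill` is a hypothetical name.

```python
import numpy as np


def harmonic_fill(values, known, iters=2000):
    """Fill unknown pixels by solving the discrete Laplace equation.

    values: 2-D float array; only entries where `known` is True are used.
    known:  boolean mask of pixels with fixed values (e.g. edge pixels).
    Image borders use a zero-gradient condition via edge padding.
    """
    values = np.asarray(values, dtype=float)
    u = np.where(known, values, values[known].mean())
    for _ in range(iters):
        p = np.pad(u, 1, mode="edge")
        # average of the 4-neighbourhood: a harmonic function equals this everywhere
        avg = 0.25 * (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:])
        u = np.where(known, values, avg)  # Jacobi update; known pixels stay fixed
    return u
```

A quick sanity check of the idea: if only the image border is known and it holds a linear ramp, the reconstruction reproduces that ramp in the interior, since linear functions satisfy Laplace's equation.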
You could give Eigenfaces a try. I never tested them with sketches, but I think they could at least be a good starting point for your research.
See Wikipedia: http://en.wikipedia.org/wiki/Eigenface and the OpenCV tutorial: http://docs.opencv.org/modules/contrib/doc/facerec/facerec_tutorial.html (which covers more than just Eigenfaces).
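The OpenCV face module described in the tutorial has its own Eigenfaces recognizer; the plain NumPy sketch below only shows what the method does, assuming all faces are aligned, equally sized grayscale images flattened into rows. Eigenfaces are the principal components of the training faces; matching is nearest neighbour in the projected coordinates. Whether a basis trained on photos transfers to sketches is exactly the untested part mentioned above. Function names are hypothetical.

```python
import numpy as np


def train_eigenfaces(faces, k):
    """faces: (n, h*w) array, one flattened, aligned grayscale face per row.
    Returns the mean face and the top-k principal components ("eigenfaces")."""
    mean = faces.mean(axis=0)
    _, _, Vt = np.linalg.svd(faces - mean, full_matrices=False)
    return mean, Vt[:k]


def project(faces, mean, eigenfaces):
    """Coordinates of each face in the eigenface basis, shape (n, k)."""
    return (faces - mean) @ eigenfaces.T


def nearest(probe_feats, gallery_feats):
    """Index of the closest gallery vector (Euclidean) for each probe."""
    d2 = ((probe_feats[:, None, :] - gallery_feats[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)
```

Typical use: train on the photo gallery, project both gallery photos and query sketches with the same mean and basis, then call `nearest` on the two sets of coordinates.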
OpenCV can be used for feature extraction and machine learning required for this task. I guess you can start with the papers in the answers above, start with some basic features and prototype a classifier with OpenCV.
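As one example of a "basic feature" to prototype with, here is a sketch of a local binary pattern (LBP) descriptor, a texture feature that is widely used for faces. It assumes a 2-D grayscale array, computes the basic 3x3 LBP code per pixel, and concatenates normalised 256-bin histograms over a grid of cells; the chi-square distance is a common way to compare such histograms. The grid size and all names are illustrative choices, not taken from the papers above.

```python
import numpy as np

# 8 neighbours, clockwise from top-left; bit i is set if neighbour i >= centre
_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_codes(img):
    """Basic 3x3 LBP code (0..255) for every interior pixel of a 2-D array."""
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    centre = img[1:-1, 1:-1]
    codes = np.zeros(centre.shape, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(_OFFSETS):
        neighbour = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes |= (neighbour >= centre).astype(np.uint8) << bit
    return codes


def lbp_descriptor(img, grid=(4, 4)):
    """Concatenated, normalised 256-bin LBP histograms over a grid of cells."""
    codes = lbp_codes(img)
    feats = []
    for rows in np.array_split(codes, grid[0], axis=0):
        for cell in np.array_split(rows, grid[1], axis=1):
            hist = np.bincount(cell.ravel(), minlength=256).astype(float)
            feats.append(hist / max(hist.sum(), 1.0))
    return np.concatenate(feats)


def chi_square(a, b, eps=1e-10):
    """Chi-square distance between two histogram descriptors."""
    return float(0.5 * np.sum((a - b) ** 2 / (a + b + eps)))
```

The resulting descriptors can be fed to any classifier or simply compared with `chi_square` for a first nearest-neighbour baseline.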
I guess you might also want to detect and match feature points on the faces. If you take this approach, you will have to build the feature point detectors yourself (training the Viola-Jones detector in OpenCV with your own data is an option).
My code to calculate the minimum translation vector (MTV) using the Separating Axis Theorem works perfectly well, except when one of the polygons is completely contained by the other. I have scoured the internet for a solution to this problem and everyone just seems to ignore it (http://www.codezealot.org/archives/55#sat-contain talks about this, but doesn't give a full solution...).
The picture below is a screenshot from my program illustrating the problem. The translucent blue triangle is the position of the triangle before the MTV is applied, and the other triangle is its position with the MTV applied.
It seems to me that the link you shared does give a solution to this. In your MTV calculation, you have to test for complete containment in a projection and change the calculations accordingly. (The pseudocode is in reference to figure 9 on that page.) Perhaps if you post your code, we can comment on why it isn't working.
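A minimal sketch of an MTV computation that handles containment, assuming convex polygons given as lists of (x, y) vertices (either winding order). Instead of using the length of the overlapping interval on each axis, it measures the full distance needed to push `a` past either end of `b` and keeps the shorter one; in the contained case this equals the overlap plus the smaller of the two end gaps, which is the correction described on the linked page. Function names are hypothetical.

```python
import math


def _edge_normals(poly):
    """Unit normals of each edge of a polygon given as [(x, y), ...]."""
    n = len(poly)
    for i in range(n):
        (x1, y1), (x2, y2) = poly[i], poly[(i + 1) % n]
        nx, ny = -(y2 - y1), x2 - x1
        length = math.hypot(nx, ny)
        yield nx / length, ny / length


def _project(poly, ax, ay):
    """Interval covered by the polygon when projected onto axis (ax, ay)."""
    dots = [x * ax + y * ay for x, y in poly]
    return min(dots), max(dots)


def mtv(a, b):
    """Minimum translation vector moving convex polygon `a` out of convex
    polygon `b`, or None if they do not overlap (touching counts as separate)."""
    best = None  # (distance, dir_x, dir_y)
    for ax, ay in list(_edge_normals(a)) + list(_edge_normals(b)):
        amin, amax = _project(a, ax, ay)
        bmin, bmax = _project(b, ax, ay)
        if amax <= bmin or bmax <= amin:
            return None  # separating axis found: no collision
        push_pos = bmax - amin  # shift along +axis until a's min meets b's max
        push_neg = amax - bmin  # shift along -axis until a's max meets b's min
        if push_pos < push_neg:
            cand = (push_pos, ax, ay)
        else:
            cand = (push_neg, -ax, -ay)
        if best is None or cand[0] < best[0]:
            best = cand
    dist, dx, dy = best
    return dist * dx, dist * dy
```

For a 2x2 square sitting 1 unit from the left side of a 10x10 square, this returns (-3, 0): the full distance to push it out through the nearest side, not the 2-unit interval overlap that the naive version would report.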
I have a set of Points in 3D space.
The image below is an example:
I would like to turn these points into a surface. I only know the X, Y and Z coordinates of the points.
For example, check out the image below, which shows a mesh of a human face generated from points in 3D space.
I googled a lot, but all I found were some images and explanations;
no one explained it from a practical angle or with a practical example.
Are there any good algorithms that would help me solve this problem?
Thanks.
You want to do a Delaunay triangulation. See an example application here: http://www.geometrylab.de/VoroGlide/.
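A practical sketch, assuming SciPy is available and that the points form a height field over the XY plane (like a face scanned from the front, where each (x, y) has one z). Under that assumption a 2-D Delaunay triangulation of the (x, y) projection, here via `scipy.spatial.Delaunay` (which wraps Qhull), gives the triangle connectivity, and the original z values lift it back into a 3-D mesh. Points that wrap around an object (a whole head, a closed surface) break this assumption and need a proper surface-reconstruction method instead. `mesh_from_points` and `write_obj` are hypothetical helper names.

```python
import numpy as np
from scipy.spatial import Delaunay


def mesh_from_points(points_xyz):
    """Triangulate 3-D points that form a height field z = f(x, y).

    Returns the vertices (n, 3) and triangle vertex indices (m, 3).
    """
    pts = np.asarray(points_xyz, dtype=float)
    tri = Delaunay(pts[:, :2])  # 2-D Delaunay on the (x, y) projection
    return pts, tri.simplices


def write_obj(path, vertices, faces):
    """Save the mesh as a Wavefront OBJ file for viewing in any 3-D tool."""
    with open(path, "w") as f:
        for x, y, z in vertices:
            f.write(f"v {x} {y} {z}\n")
        for a, b, c in faces:
            f.write(f"f {a + 1} {b + 1} {c + 1}\n")  # OBJ indices are 1-based
```

For example, the four corners of a unit square plus a raised centre point give four triangles that all share the centre vertex, i.e. a small pyramid.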
I saw a question on reverse projecting 4 2D points to derive the corners of a rectangle in 3D space. I have a kind of more general version of the same problem:
Given either a focal length (which can be converted to arcseconds per pixel) or the intrinsic camera matrix (a 3x3 matrix that defines the properties of the pinhole camera model being used; it's directly related to the focal length), compute the camera ray that goes through each pixel.
I'd like to take a series of frames, derive the candidate light rays from each frame, and use some sort of iterative solving approach to derive the camera pose for each frame (given a sufficiently large sample, of course). All of that is really just a massively parallel implementation of a generalized Hough algorithm; it's getting the candidate rays in the first place that I'm having trouble with.
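Under the pinhole model, the ray through pixel (u, v) has direction K⁻¹·[u, v, 1]ᵀ in camera coordinates. A minimal sketch, assuming a 3x3 intrinsic matrix K, pixel centres at integer + 0.5, and the OpenCV-style extrinsic convention x_cam = R·X_world + t (so the camera centre is −Rᵀt and camera-frame directions map to the world by Rᵀ). With no extrinsics given, rays are returned in the camera frame. `pixel_rays` is a hypothetical name.

```python
import numpy as np


def pixel_rays(K, width, height, R=None, t=None):
    """Ray origin and unit directions through every pixel centre.

    Returns (origin, dirs) with dirs of shape (height, width, 3), indexed
    dirs[row, col]. Assumes x_cam = R @ X_world + t.
    """
    R = np.eye(3) if R is None else np.asarray(R, float)
    t = np.zeros(3) if t is None else np.asarray(t, float)
    u, v = np.meshgrid(np.arange(width) + 0.5, np.arange(height) + 0.5)
    pix = np.stack([u, v, np.ones_like(u)], axis=-1)  # homogeneous pixel coords
    d_cam = pix @ np.linalg.inv(np.asarray(K, float)).T  # K^-1 [u, v, 1]^T per pixel
    d_world = d_cam @ R  # row-vector form of R^T d
    d_world /= np.linalg.norm(d_world, axis=-1, keepdims=True)
    origin = -R.T @ t  # camera centre in world coordinates
    return origin, d_world
```

If only a focal length f (in pixels) is known, K can be approximated as [[f, 0, cx], [0, f, cy], [0, 0, 1]] with the principal point (cx, cy) at the image centre, which is itself an assumption about the camera.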
A friend of mine found the source code from a university for the camera matching in PhotoSynth. I'd Google around for it, if I were you.
That's a good suggestion, and I will definitely look into it (Photosynth kind of resparked my interest in this subject, but I've been working on it for months for RoboChamps). But it's a sparse implementation: it looks for "good" features (points in the image that should be easily identifiable in other views of the same scene), and while I certainly plan to score each match based on how good the feature it's matching is, I want the full dense algorithm to derive every pixel... or should I say voxel, lol?
After a little poking around, isn't it the extrinsic matrix that tells you where the camera actually is in 3-space?
I worked at a company that did a lot of this, but I always used the tools that the algorithm guys wrote. :)