How are the mathematical equations for complex vector images calculated?

If this question is more suited for the Mathematics stack exchange, please let me know.
I know the basic idea of vector images: images are represented by mathematical equations, rather than a bitmap, which makes them infinitely scalable.
Computing this seems straightforward for something like a straight line, but for a more complex one, such as:
https://www.wolframalpha.com/input/?i=zoidberg-like+curve
I am wondering how the program would even begin to produce that equation as output.
Does it break the image into little curve segments and try to approximate each one? What if a large, multi-segment part of an image could be represented efficiently by a single equation, but the computer never realizes this because it only "sees" one segment at a time? Would the computer test every combination of segments?
I was just curious and wondering if anyone could provide a high-level description of the basic process.
As an example, consider an image represented by equation (1):
y = abs(x)
It could also be represented as (2):
y = -x for x in (-inf, 0)
y = x for x in [0, inf)
But you would only be able to figure out that it could be represented as (1) if you knew what the entire image looked like. If you were scanning from left to right and trying to represent the image as an equation, then you would end up with (2).
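For instance, a quick NumPy sketch (toy sampled data; the linear fits are purely illustrative) of what left-to-right, segment-by-segment fitting would produce, namely representation (2) rather than (1):

import numpy as np

# Sample y = |x| and fit each scanned half separately, the way a left-to-right
# vectorizer would: you recover representation (2), never the single equation (1).
x = np.linspace(-1, 1, 201)
y = np.abs(x)

left, right = x < 0, x >= 0
m1, c1 = np.polyfit(x[left], y[left], 1)    # ~ (-1, 0): y = -x on the left half
m2, c2 = np.polyfit(x[right], y[right], 1)  # ~ (+1, 0): y =  x on the right half
print(m1, c1, m2, c2)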

Related

Should one calculate QR decomposition before Least Squares to speed up the process?

I am reading the book "Introduction to Linear Algebra" by Gilbert Strang. The section is called "Orthonormal Bases and Gram-Schmidt". The author emphasised several times that with an orthonormal basis it's very easy and fast to calculate the least-squares solution, since QᵀQ = I, where Q is a design matrix with orthonormal columns. The equation then becomes x̂ = Qᵀb.
And I got the impression that it's a good idea to always calculate a QR decomposition before applying least squares. But later I worked out the time complexity of QR decomposition, and it turned out that computing the QR decomposition and then applying least squares is more expensive than the regular x̂ = inv(AᵀA)Aᵀb.
Is that right that there is no point in using QR decomposition to speed up Least Squares? Or maybe I got something wrong?
So the only purpose of QR decomposition regarding Least Squares is numerical stability?
There are many ways to do least squares; typically these vary in applicability, accuracy and speed.
Perhaps the Rolls-Royce method is to use SVD. This can be used to solve under-determined (fewer obs than states) and singular systems (where A'*A is not invertible) and is very accurate. It is also the slowest.
QR can only be used to solve non-singular systems (that is, we must have A'*A invertible, i.e. A must be of full rank), and though perhaps not as accurate as SVD, it is also a good deal faster.
The normal equations, i.e.

compute P = A'*A
solve P*x = A'*b

is the fastest method (perhaps by a large margin if P can be computed efficiently, for example if A is sparse), but it is also the least accurate. This too can only be used to solve non-singular systems.
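In NumPy the normal-equations route is a two-liner; a sketch, with A and b as hypothetical stand-ins:

import numpy as np

# Normal equations: form P = A'*A once, then solve P*x = A'*b.
# Never invert P explicitly; np.linalg.solve is cheaper and more stable.
A = np.random.randn(100, 5)
b = np.random.randn(100)
P = A.T @ A
x = np.linalg.solve(P, A.T @ b)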
Inaccuracy should not be taken lightly nor dismissed as some academic fanciness. If you happen to know that the problems you will be solving are nicely behaved, then it might well be fine to use an inaccurate method. But otherwise the inaccurate routine might well fail (i.e. report that there is no solution when there is one, or worse, come up with a totally bogus answer).
I'm a bit confused that you seem to be suggesting forming and solving the normal equations after performing the QR decomposition. The usual way to use QR in least squares is, if A is nObs x nStates:
decompose A as A = Q*[R; 0] (here R is upper triangular)
transform b into b~ = Q'*b
solve R*x = b# for x (here b# is the first nStates entries of b~)
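For reference, a NumPy/SciPy sketch of those three steps (the reduced QR folds the b# truncation into Q'*b; A and b are hypothetical stand-ins, and A is assumed to have full rank):

import numpy as np
from scipy.linalg import solve_triangular

A = np.random.randn(100, 5)   # nObs x nStates
b = np.random.randn(100)

Q, R = np.linalg.qr(A)        # reduced QR: Q is 100x5, R is 5x5 upper triangular
bt = Q.T @ b                  # b~ = Q'*b (the reduced Q keeps only the first nStates entries)
x = solve_triangular(R, bt)   # back-substitution on R*x = b~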

Explaining camera matrix from fundamental matrix?

This is a follow-up to another stack overflow question, here:
3D Correspondences from fundamental matrix
Just like in that question, I am trying to get a camera matrix from a fundamental matrix, the ultimate goal being 3d reconstruction from 2d points. The answer given there is good, and correct. I just don't understand it. It says, quote, "If you have access to Hartley and Zisserman's textbook, you can check section 9.5.3 where you will find what you need." He also provides a link to source code.
Now, here's what section 9.5.3 of the book, among other things, says:
Result 9.12. A non-zero matrix F is the fundamental matrix corresponding to a pair of camera matrices P and P′ if and only if P′ᵀFP is skew-symmetric.
That, to me, is gibberish. (I looked up skew-symmetric - it means the inverse is its negative. I have no idea how that is relevant to anything.)
Now, here is the source code given (source):
[U,S,V] = svd(F);            % singular value decomposition of F
e = U(:,3);                  % left null vector of F, i.e. the epipole e' in the second image
P = [-vgg_contreps(e)*F e];  % P' = [ [e']_x*F | e' ]; vgg_contreps(e) is the cross-product matrix of e (up to sign)
This is also a mystery.
So what I want to know is: how does one explain the other? Getting that code from that statement seems like black magic. How would I, or anyone, figure out that "A non-zero matrix F is the fundamental matrix corresponding to a pair of camera matrices P and P′ if and only if P′ᵀFP is skew-symmetric" means what the code is telling you to do, which is basically
'Take the singular value decomposition. Take the first matrix. Take the third column of that. Perform some weird rearrangement of its values. That's your answer.'? How would I have come up with this code on my own?
Can someone explain to me the section 9.5.3 and this code in plain English?
Aha, that "PTFP" is actually something I have also wondered about and could not find the answer in literature. However, this is what I figured out:
The 4x4 skew-symmetric matrix you are mentioning is not just any matrix. It is actually the dual Plücker Matrix of the baseline (see also https://en.wikipedia.org/wiki/Pl%C3%BCcker_matrix). In other words, it only gives you the line on which the camera centers are located, which is not useful for reconstruction tasks as such.
The condition you mention is identical to the more popularized fact that the fundamental matrix for views 1 & 0 is the negative transpose of the fundamental matrix for views 0 & 1 (using MATLAB/Octave syntax below).
Consider first that the fundamental matrix maps a point x0 in one image to line l1 in the other
l1=F*x0
Next, consider that the transpose of the projection matrix back-projects a line l1 in the image to a plane E in space
E=P1'*l1
(I find this beautifully simple and understated in most geometry / computer vision classes)
Now, I will use a geometric argument: Two lines are corresponding epipolar lines iff they correspond to the same epipolar plane i.e. the back-projection of either line gives the same epipolar plane. Algebraically:
E=P0'*l0
E=P1'*l1
thus (the important equation)
P0'*l0=P1'*l1
Now we are almost there. Let's assume we have a 3D point X and its two projections
x0=P0*X
x1=P1*X
and the epipolar lines
l1=F*x0
l0=-F'*x1
We can just put that into the important equation and we have for all X
P0' * (-F') * P1 * X = P1' * F * P0 * X

and finally

P0' * (-F') * P1 = P1' * F * P0

As you can see, the left-hand side is the negative transpose of the right-hand side. So this matrix is a skew-symmetric 4x4 matrix.
I also published these thoughts in Section II B (towards the end of the paragraph) in the following paper. It should also explain why this matrix is a representation of the baseline.
Aichert, André, et al. "Epipolar consistency in transmission imaging."
IEEE transactions on medical imaging 34.11 (2015): 2205-2219.
https://www.ncbi.nlm.nih.gov/pubmed/25915956
Final note to @john ktejik: skew-symmetry means that a matrix is identical to its negative transpose (NOT its inverse transpose).
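To connect the two explicitly, here is a NumPy sketch of what the MATLAB snippet in the question computes, using the standard Hartley-Zisserman construction P′ = [ [e′]ₓF | e′ ] (the minus sign in the MATLAB code appears to just absorb the sign convention of vgg_contreps):

import numpy as np

def skew(v):
    # Cross-product matrix [v]_x, so that skew(v) @ u == np.cross(v, u)
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def cameras_from_F(F):
    # The epipole e' is the left null vector of F (e''*F = 0), i.e. the left
    # singular vector belonging to the smallest singular value.
    U, S, Vt = np.linalg.svd(F)
    e = U[:, 2]
    # Canonical pair from H&Z section 9.5.3: P = [I | 0], P' = [ [e']_x F | e' ]
    P0 = np.hstack([np.eye(3), np.zeros((3, 1))])
    P1 = np.hstack([skew(e) @ F, e.reshape(3, 1)])
    # Sanity check against Result 9.12 (holds when F is exactly rank 2):
    M = P1.T @ F @ P0
    assert np.allclose(M, -M.T)
    return P0, P1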

Snake cube puzzle correctness

TrialPay posted a programming question about a snake cube puzzle on their blog.
Recently, one of our engineers introduced us to the snake cube. A snake cube is a puzzle composed of a chain of cubelets, connected by an elastic band running through each cubelet. Each cubelet can rotate 360° about the elastic band, allowing various structures to be built depending upon the way in which the chain is initially constructed, with the ultimate goal of arranging the cubes in such a way to create a cube.
Example (the image from the blog post is not reproduced here):
This particular arrangement contains 17 groups of cubelets, composed of 8 groups of two cubelets and 9 groups of three cubelets. This arrangement can be expressed in a variety of ways, but for the purposes of this exercise, let '0' denote pieces whose rotation does not change the orientation of the puzzle, or may be considered a "straight" piece, while '1' will denote pieces whose rotation changes the puzzle configuration, or "bend" the snake. Using that schema, the snake puzzle above could be described as 001110110111010111101010100.
Challenge:
Your challenge is to write a program, in any language of your choosing, that takes the cube dimensions (X, Y, Z) and a binary string as input, and outputs '1' (without quotes) if it is possible to solve the puzzle, i.e. construct a proper XYZ cube given the cubelet orientation, and '0' if the current arrangement cannot be solved.
I posted a semi-detailed explanation of the solution, but how do I determine if the program solves the problem? I thought about getting more test cases, but I ran into some problems:
The snake cube example from TrialPay's blog has the same combination as the picture on Wikipedia's Snake Cube page and www.mathematische-basteleien.de.
It's very tedious to manually convert an image into a string.
I tried to make a program that would churn out a lot of combinations:
# We should start at the binary representation of 16777216 (00100...), because
# lesser numbers have more than 2 consecutive 0s (000111...)
i = 16777216
solved = []
while i < 2**27:
    s = bin(i)[2:]
    # Pad with leading 0s up to 27 digits
    s = '0' * (27 - len(s)) + s
    print(s)
    # Skip strings with more than 2 consecutive 0s
    if s.find("000") == -1:
        if snake_cube_solution(3, 3, 3, s) == 1:
            solved.append(s)
    i += 1
But it just takes forever to finish executing. Is there a better way to verify the program?
Thanks in advance!
TL;DR: This isn't a programming problem, but a mathematical one. You may be better served at math.stackexchange.com.
Since the cube size and snake length are passed as input, the space of inputs a checker program would need to verify is essentially infinite. Even though checking the solution's answer for a single input is reasonable, brute-forcing this check across the entire input space is clearly not.
If your solution fails on certain cases, your checker program can help you find these. However it can't establish your program's correctness: if your solution is actually correct the checker will simply run forever and leave you wondering.
Unfortunately (or not, depending on your tastes), what you are looking for is not a program but a mathematical proof.
Proving algorithm correctness is itself an entire field of study, and you can spend a long time in it. That said, proof by induction is often applicable (especially for recursive algorithms).
Other times, navigating between state configurations can be restated as optimizing a utility function. Proving things about the space being optimized (such as its having only one extremum) can then translate to a proof of program correctness.
Your state configurations in this second approach could be snake orientations, or they might be some deeper structure. For example, the general strategy underneath solving a Rubik's cube isn't usually stated on literal cube states, but on expressions of a group of relevant symmetries. This is what I personally expect your solution will eventually play out as.
EDIT: Years later, I feel I should point out that for a given, fixed cube size and snake length, of course the search space is actually finite. You can write a program to brute-force check all combinations. If you were clever, you could even argue that the times to check a set of cases can be treated as a set of independent random variables. From this you could build a reasonable progress bar to estimate how (very) long your wait would be.
I think your assertion that there cannot be three consecutive 0's is false. Consider this arrangement:
000
100
101
100
100
101
100
100
100
One of the problems I'm having with this puzzle is the notation. A 1 indicates that the cubelet can change the puzzle's orientation, but about which axis? In my example above, assume that the Y axis is vertical and the X axis is horizontal. A 1 on the left indicates the ability to rotate about the cubelet's Y axis, and a 1 on the right indicates the ability to rotate about the cubelet's X axis.
I think it's possible to construct an arrangement similar to that above, but with three 000 groups. But I don't have the notation for it. Clearly, the example above could be modified so that the first three lines are:
001
000
101
With the first segment's 1 indicating rotation about the Y axis.
I wrote a Java application for the same problem not long ago.
I used a backtracking algorithm for this.
You just have to do a recursive search through the whole cube, checking which directions are possible. If you have found one, you can stop and print the solution (I chose to print out all solutions).
For 3x3x3 cubes my program solved them in under a second; for the bigger ones it took from about five seconds up to 15 minutes.
I'm sorry, I couldn't find the code right now.
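Since the original code is gone, here is a minimal Python sketch of the backtracking idea. My reading of the 0/1 notation from the question is an assumption: bit i says whether the move leaving cubelet i keeps the previous direction or turns 90 degrees.

def snake_cube_solution(X, Y, Z, s):
    """Return 1 if the snake described by bit string s folds into an X*Y*Z cube.

    Assumed notation: s[i] == '0' means the move from cubelet i to i+1 keeps
    the previous direction; '1' means it must turn 90 degrees. End pieces are
    always '0' since rotating them changes nothing.
    """
    n = X * Y * Z
    if len(s) != n:
        return 0
    dirs = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

    def backtrack(pos, prev_d, i, occupied):
        # Cubelets 0..i are placed; pos holds cubelet i; prev_d was the last move.
        if i == n - 1:
            return True
        if prev_d is None:               # first move: direction is free
            choices = dirs
        elif s[i] == '0':                # straight piece: keep direction
            choices = [prev_d]
        else:                            # bend: any perpendicular direction
            neg = tuple(-c for c in prev_d)
            choices = [d for d in dirs if d != prev_d and d != neg]
        for d in choices:
            nxt = (pos[0] + d[0], pos[1] + d[1], pos[2] + d[2])
            if (0 <= nxt[0] < X and 0 <= nxt[1] < Y and 0 <= nxt[2] < Z
                    and nxt not in occupied):
                occupied.add(nxt)
                if backtrack(nxt, d, i + 1, occupied):
                    return True
                occupied.remove(nxt)
        return False

    # Try every start cell (symmetry could prune this, but 3x3x3 is fast anyway)
    for x in range(X):
        for y in range(Y):
            for z in range(Z):
                if backtrack((x, y, z), None, 0, {(x, y, z)}):
                    return 1
    return 0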

Pseudo code for Blockwise Non Local Means noise reduction algorithm

I have implemented a nice algorithm ("Non Local Means") for reducing noise in images.
It is based on its Matlab implementation.
The problem with NLMeans is that the original algorithm is slow even in compiled languages like C/C++, and I am trying to run it in a scripting language.
One of the best solutions is to use the improved Blockwise NLMeans algorithm, which is ~60-80 times faster. The problem is that the paper which describes it is written in complex mathematical language, and it's really hard for me to understand the idea and program it
(yes, I didn't learn math at college).
That is why I am desperately looking for pseudo code for this algorithm.
(A modification of the original Matlab implementation would be just perfect.)
I admit, I was intrigued until I saw the result: 260+ seconds on a dual core, and that doesn't include the overhead of a scripting language, and that's for the optimized blockwise Non Local Means filter.
Let me break down the math for you. My idea of pseudo-code is writing it in Ruby.
Non Local Means Filtering
Assume an image that's 200 x 100 pixels (20,000 pixels total), which is a pretty tiny image. We're going to have to go through 20,000 pixels and evaluate each one on the weighted average of the other 19,999 pixels:

NL[v](i) = Σ_{j ∈ I} w(i,j) v(j)

where 0 ≤ w(i,j) ≤ 1 and Σ_{j} w(i,j) = 1
Understandably, this last part can be a little confusing, but this is really nothing more than a convolution filter the size of the whole image being applied to each pixel.
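As a Python sketch of that formula (with single-pixel similarity standing in for the paper's patch similarity to keep it short; h is the usual filtering parameter):

import numpy as np

def nlmeans_pixelwise(v, h=0.1):
    # Literal reading of the formula above: each output pixel is a weighted
    # average of all pixels, with normalized weights. O(N^2) in pixel count,
    # which is exactly why the naive algorithm is so slow.
    flat = v.reshape(-1).astype(float)
    out = np.empty_like(flat)
    for i, vi in enumerate(flat):
        w = np.exp(-(flat - vi) ** 2 / (h * h))  # w(i, j) before normalization
        out[i] = np.dot(w, flat) / w.sum()       # enforces sum_j w(i,j) = 1
    return out.reshape(v.shape)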
Blockwise Non Local Means Filtering
The blockwise implementation takes overlapping sets of voxels (volumetric pixels - the implementation you pointed us to is for 3D space). Presumably, taking a similar approach, you could apply this to 2D space, taking sets of overlapping pixels. Let's see if we can describe this...
NL[v](x_i) = (1/|A_i|) Σ_{x_j ∈ A_i} w(x_i, x_j) v(x_j)

where A_i is the set of overlapping blocks (vectors of pixels) used to estimate pixel x_i, and the same constraints as above apply to the weights.
[NB: I may be slightly off; It's been a few years since I did heavy image processing]
Algorithm
In all likelihood, we're talking about reducing complexity of the algorithm at a minimal cost to reduction quality. The larger the sample vector, the higher the quality as well as the higher the complexity. By overlapping then averaging the sample vectors from the image then applying that weighted average to each pixel we're looping through the image far fewer times.
Loop through the image to collect the sample vectors and store their weighted averages in an array.
Apply each weighted average (a number between 0 and 1) to each pixel, times the pixel's value.
Pretty simple, but the processing time is going to be horrid with larger images.
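A rough Python sketch of those two steps (my simplification of the blockwise scheme for 2D, not the paper's exact algorithm; block, search, step and h are hypothetical defaults):

import numpy as np

def blockwise_nlmeans(img, block=3, search=7, h=0.1, step=2):
    # Denoise overlapping blocks on a coarse grid, then average the
    # overlapping block estimates that cover each pixel.
    H, W = img.shape
    pad = search + block
    padded = np.pad(img.astype(float), pad, mode='reflect')
    acc = np.zeros((H, W))   # sum of block estimates per pixel
    cnt = np.zeros((H, W))   # number of estimates covering each pixel
    for by in range(0, H, step):
        for bx in range(0, W, step):
            y, x = by + pad, bx + pad
            ref = padded[y:y + block, x:x + block]
            num = np.zeros((block, block))
            den = 0.0
            # Weight every block in the search window by its similarity to ref
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    cand = padded[y + dy:y + dy + block, x + dx:x + dx + block]
                    w = np.exp(-np.sum((ref - cand) ** 2) / (h * h))
                    num += w * cand
                    den += w
            est = num / den
            # Scatter the denoised block back; overlaps get averaged at the end
            hk = min(block, H - by)
            wk = min(block, W - bx)
            acc[by:by + hk, bx:bx + wk] += est[:hk, :wk]
            cnt[by:by + hk, bx:bx + wk] += 1
    return acc / np.maximum(cnt, 1)

Because the blocks are sampled on a stride-step grid instead of at every pixel, the expensive similarity search runs far fewer times than in the pixelwise version, which is where the speed-up comes from.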
Final Thoughts
You're going to have to make some tough decisions. If you're going to use a scripting language, you're already dealing with significant interpretive overhead. It's far from optimal to use a scripting language for heavy-duty image processing. If you're not processing medical images, in all likelihood there are far better algorithms to use, with smaller big-O complexity.
Hope this is helpful... I'm not terribly good at making a point clearly and concisely, so if I can clarify anything, let me know.

What do you think of this interest point detection algorithm?

I've been trying to come up with an interest point detection algorithm and this is what I came up with:
You go through the X and Y axes 3n pixels at a time, creating 3n x 3n squares.
For the n x n square in the middle of the 3n x 3n square (let's call it square Z), the R, G, and B values are averaged and rounded to preset values to limit the number of colors, and that is the color the square will be treated as.
The same is done for the 8 surrounding n x n squares.
After that, the color of square Z is compared to the surrounding squares; if it matches x out of the 8 surrounding squares, where x <= 3 or x >= 5, then that is an interest point (a corner is detected).
And so on till the whole image is covered.
The bigger n is, the faster the image will be scanned and the less accurate the detection is, and vice versa.
This, supposedly, detects "literal corners", that is, corners you can actually SEE on the image.
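A quick Python sketch of the idea (per-channel binning stands in for the "preset values" rounding; levels, the number of bins per channel, is a made-up knob):

import numpy as np

def detect_interest_points(img, n=4, levels=8):
    # Slide a 3n x 3n window, quantize the average colour of each of its nine
    # n x n squares, and flag the centre square as an interest point when it
    # matches x of its 8 neighbours with x <= 3 or x >= 5.
    H, W, _ = img.shape
    q = 256 // levels
    points = []
    for y in range(0, H - 3 * n + 1, 3 * n):
        for x in range(0, W - 3 * n + 1, 3 * n):
            cells = []
            for r in range(3):
                for c in range(3):
                    sq = img[y + r * n:y + (r + 1) * n, x + c * n:x + (c + 1) * n]
                    cells.append(tuple(sq.reshape(-1, 3).mean(axis=0) // q))
            centre = cells[4]
            matches = sum(1 for k, cell in enumerate(cells) if k != 4 and cell == centre)
            if matches <= 3 or matches >= 5:
                points.append((x + n, y + n))  # top-left pixel of the centre square
    return points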
What do you think of this algorithm? Is it efficient? Can it be used on a live video stream (say from the camera) on a hand-held device?
I'm sorry to say that I don't think this is likely to be very good. Your algorithm looks a bit like a simplistic version of Moravec's algorithm, which is itself one of the simplest corner detection algorithms. The hardcoded limits you test against effectively make your edge test a step function, unlike an approach such as summed square differences. This will almost certainly give you discontinuities in your detection function (corners that don't match when they should have) for some values.
You also have the same problem as Moravec, namely that if the edge lies at an angle to the direction of neighbours being considered, then it won't be detected.
Developing algorithms is fun, and if this isn't a business-critical project, then by all means, carry on tinkering and experimenting (and don't be put off by my comments!). But the fact is, for almost any practical problem, a better algorithm for the task you want to solve almost certainly already exists. The real challenge is identifying how you can best model your problem in such a way that you can solve it using an existing, well-understood approach, designed by experts.
In particular, robust identification and analysis of edge-cases and worst-case runtimes is a tricky business; unless you are a professional algorist, you are likely to find the going difficult. But I certainly encourage you to discover this for yourself by trying. nlucaroni mentions some excellent questions to use as starting points for your analysis.
Why not try it and see if it works the way you expect? It sounds like it should. How does the performance compare with other methods? What is the complexity of the algorithm? Is it efficient compared to others? Where can it be improved? What kind of false-positives and false negatives are expected? Are they within reason based on the data I plan to use this on? What threshold should be used to compare surrounding squares? ....
This is stuff you should be doing, not us.
I would suggest you look at the SIFT algorithm. It's the de facto standard for points of interest in an image. Unfortunately, it's also patented, because it's so good.
If you are interested in a real-time version of SIFT, you can get it to run on a GPU, but it's highly experimental at this point. Note that if you are developing a commercial application, you'd have to first purchase a license for using SIFT or get approval from David Lowe.
