Confidence bounds for coefficients of a fit of data set obtained with another fit - curve-fitting

I fitted an equation to a set of data points. Then I subtracted that fit from another set of data points. After that, I fitted another equation to this new data (the result of the subtraction). What happens to the confidence bounds of the coefficients of this new fit?
The two equations fitted are quadratic.
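For concreteness, a minimal sketch of the two-stage procedure described above, using made-up data and NumPy's polynomial fitting, which can also report the coefficient covariance (the true coefficients and noise levels are arbitrary):

```python
import numpy as np

# Hypothetical data: (x1, y1) is the first data set, (x2, y2) the second.
rng = np.random.default_rng(0)
x1 = np.linspace(-1, 1, 50)
y1 = 2.0 * x1**2 - 0.5 * x1 + 1.0 + rng.normal(0, 0.1, x1.size)
x2 = np.linspace(-1, 1, 50)
y2 = 3.0 * x2**2 + 0.4 * x2 - 2.0 + rng.normal(0, 0.1, x2.size)

# First quadratic fit; cov=True also returns the coefficient covariance.
c1, cov1 = np.polyfit(x1, y1, deg=2, cov=True)

# Subtract the first fit (evaluated at x2) from the second data set.
residual = y2 - np.polyval(c1, x2)

# Second quadratic fit on the subtracted data.
c2, cov2 = np.polyfit(x2, residual, deg=2, cov=True)

# cov2 is computed as if `residual` were ordinary data: it does not account
# for the uncertainty carried over from the first fit.
print(np.sqrt(np.diag(cov1)), np.sqrt(np.diag(cov2)))
```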

Related

Principal Component Analysis - Dimensionality Reduction

When we talk about PCA, we say that we use it to reduce the dimensionality of the data. I have 2-D data, and using PCA I reduced the dimensionality to 1-D.
Now,
The first component is chosen in such a way that it captures the maximum variance. What does it mean for the 1st component to have maximum variance?
Also, if we take 3-D data and reduce its dimensionality to 2-D, will the 1st component be built with maximum variance along the x-axis or the y-axis?
PCA works by first centering the data at the origin (subtracting the mean from each data point), and then rotating it to be in line with the axes (diagonalizing the covariance matrix into a “variance” matrix). The components are then sorted so that the diagonal of the variance matrix is in descending order, which translates to the first component having the largest variance, the second having the next largest variance, etc. Later, you squish your original data by zero-ing out less important components (projecting onto principal components), and then undoing the aforementioned transformations.
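As a rough illustration of those steps, a minimal sketch with toy 2-D data, using an eigendecomposition of the covariance matrix (the synthetic data and variable names are my own):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2)) @ np.array([[3.0, 0.0], [1.0, 0.5]])  # toy 2-D data

X_centered = X - X.mean(axis=0)             # center the data at the origin
cov = np.cov(X_centered, rowvar=False)      # covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)      # diagonalize it
order = np.argsort(eigvals)[::-1]           # sort components by variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Keep only the first component (the direction of maximum variance), then
# undo the rotation and the centering to get the reduced data back in the
# original coordinates.
scores = X_centered @ eigvecs[:, :1]
X_reduced = scores @ eigvecs[:, :1].T + X.mean(axis=0)

print(eigvals)   # the first entry is the variance captured by component 1
```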
To answer your questions:
The first component having the max variance means that its corresponding diagonal entry in the variance matrix (the variance of the data projected onto that component) is the largest one.
I suppose it depends on what you call your axes.
Source: Probability and Statistics for Computer Science by David Forsyth.

Inverse of Laplacian and Gaussian Noise

Given a set of data points, I modify the data points by adding Laplacian or Gaussian noise to them.
I am wondering whether there exist mathematical inverse functions able to recover the original data points from the noisy ones.
My understanding is that we can only reconstruct an estimate of the original data points, which has a certain probability p of being equal to the original data points.
If this is the case, how can such a probability p be calculated?
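One way to make such a probability concrete (this reading, and the tolerance $\varepsilon$, are assumptions on my part): for continuous noise the probability of recovering the exact original value is zero, so one instead asks how likely a noisy value is to lie within $\varepsilon$ of the original, which depends only on the noise parameters. A small sketch:

```python
import math

def p_within_eps_gaussian(eps, sigma):
    """P(|noise| <= eps) for zero-mean Gaussian noise with standard deviation sigma."""
    return math.erf(eps / (sigma * math.sqrt(2)))

def p_within_eps_laplace(eps, b):
    """P(|noise| <= eps) for zero-mean Laplace noise with scale b."""
    return 1.0 - math.exp(-eps / b)

print(p_within_eps_gaussian(0.5, sigma=1.0))  # ~0.383
print(p_within_eps_laplace(0.5, b=1.0))       # ~0.393
```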

Uncertainty on pose estimate when minimizing measurement errors

Let's say I want to estimate the camera pose for a given image $I$, and I have a set of measurements (e.g. 2D points $u_i$ and their associated 3D coordinates $P_i$) for which I want to minimize the error (e.g. the sum of squared reprojection errors).
My question is: how do I compute the uncertainty of my final pose estimate?
To make my question more concrete, consider an image $I$ from which I extracted 2D points $u_i$ and matched them with 3D points $P_i$. Denote by $T_w$ the camera pose for this image, which I will be estimating, and by $\pi_T$ the transformation mapping the 3D points to their projected 2D points.
My objective statement is as follows:
$$T_w = \arg\min_{T} \sum_i \lVert u_i - \pi_T(P_i) \rVert^2$$
There exist several techniques to solve the corresponding non-linear least-squares problem; consider that I use the Gauss-Newton algorithm, along the lines of the approximate pseudo-code below.
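A minimal generic sketch of such a Gauss-Newton loop (the `residual` and `jacobian` callables and the plain additive update are placeholders of mine, not the question's original pseudo-code):

```python
import numpy as np

def gauss_newton(residual, jacobian, T0, n_iters=10):
    """Generic Gauss-Newton loop: T is the parameter vector (the pose),
    residual(T) stacks all reprojection errors, jacobian(T) returns J_r."""
    T = T0.copy()
    for _ in range(n_iters):
        r = residual(T)                          # stacked reprojection errors
        J = jacobian(T)                          # Jacobian of r with respect to T
        dT = np.linalg.solve(J.T @ J, -J.T @ r)  # normal equations (J^T J) dT = -J^T r
        T = T + dT
    return T
```

In a real pose estimator the update would normally be composed with the pose on SE(3) rather than simply added, but the structure of the loop, and in particular the $J_r^T J_r$ term, is the same.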
I read in several places that $J_r^T J_r$ could be considered an estimate of the covariance matrix for the pose estimate. Here is a list of more precise questions:
Can anyone explain why this is the case and/or point to a scientific document explaining it in detail?
Should I be using the value of $J_r$ from the last iteration, or should the successive $J_r^T J_r$ be somehow combined?
Some people say that this is actually an optimistic estimate of the uncertainty, so what would be a better way to estimate it?
Thanks a lot; any insight on this will be appreciated.
The full mathematical argument is rather involved, but in a nutshell it goes like this:
The product $J^T J$ of the transposed Jacobian of the reprojection error at the optimum with itself is an approximation of the Hessian matrix of the least-squares error. The approximation ignores terms of order three and higher in the Taylor expansion of the error function about the optimum. See here (pp. 800-801) for a proof.
The inverse of that Hessian matrix is an approximation of the covariance matrix of the estimated parameters in a neighborhood of their optimal values, under a local linear approximation of the parameters-to-errors transformation (p. 814 of the same reference).
I do not know where the "optimistic" comment comes from. The main assumption underlying the approximation is that the behavior of the cost function (the reproj. error) in a small neighborhood of the optimum is approximately quadratic.
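A numerical sketch of this recipe, assuming unit-weighted residuals so that the inverse of $J^T J$ is scaled by an estimate of the residual variance (the toy exponential model, the data, and the use of SciPy's generic least-squares solver are stand-ins for an actual reprojection problem):

```python
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 100)
y = 2.0 * np.exp(-1.5 * t) + rng.normal(0, 0.05, t.size)   # toy measurements

def residual(p):
    # residual vector of the model p0 * exp(-p1 * t) against the data
    return p[0] * np.exp(-p[1] * t) - y

res = least_squares(residual, x0=np.array([1.0, 1.0]))

J = res.jac                                  # Jacobian at the optimum
m, n = J.shape
sigma2 = np.sum(res.fun**2) / (m - n)        # residual variance estimate
cov = sigma2 * np.linalg.inv(J.T @ J)        # approximate parameter covariance

print(np.sqrt(np.diag(cov)))                 # 1-sigma parameter uncertainties
```

If the residuals are already divided by their measurement standard deviations, the scale factor is 1 and the covariance is simply $(J^T J)^{-1}$.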

Understanding Gradient Descent Algorithm

I'm learning Machine Learning. I was reading a topic called Linear Regression with one variable and I got confused while understanding Gradient Descent Algorithm.
Suppose we are given a problem with a training set in which each pair $(x^{(i)},y^{(i)})$ represents (feature/input variable, target/output variable). Our goal is to create a hypothesis function for this training set which can make predictions.
Hypothesis Function:
$$h_{\theta}(x)=\theta_0 + \theta_1 x$$
Our target is to choose $(\theta_0,\theta_1)$ so that $h_{\theta}(x)$ best predicts the target values on the training set.
Cost Function:
$$J(\theta_0,\theta_1)=\frac{1}{2m}\sum\limits_{i=1}^m (h_{\theta}(x^{(i)})-y^{(i)})^2$$
$$J(\theta_0,\theta_1)=\tfrac{1}{2}\times\text{Mean Squared Error}$$
We have to minimize $J(\theta_0,\theta_1)$ to get the values $(\theta_0,\theta_1)$, which we then put into our hypothesis function. We can do that by applying the gradient descent algorithm on the surface $(\theta_0,\theta_1,J(\theta_0,\theta_1))$.
My question is how we can choose $(\theta_0,\theta_1)$ and plot the surface $(\theta_0,\theta_1,J(\theta_0,\theta_1))$. In the online lecture I was watching, the instructor explained everything but didn't mention where the plot comes from.
At each iteration you will have some $h_\theta$, and you will calculate the value of $\frac{1}{2m}\sum_{i=1}^m (h_\theta(x^{(i)})-y^{(i)})^2$.
At each iteration $h_\theta$ is known, and the values $(x^{(i)},y^{(i)})$ of every training sample are known, so the above is easy to calculate.
For each iteration you have a new value of $\theta$, so you can calculate the new MSE.
The plot itself will have the iteration number on the x-axis and the MSE on the y-axis.
As a side note, while you can use gradient descent, there is no need here. This cost function is convex and has a single, well-known minimum: $\theta = (X^TX)^{-1}X^Ty$, where $y$ is the vector of training targets (an $m\times 1$ vector for a training set of size $m$) and $X$ is the $m\times 2$ matrix whose rows are $X_i=(1,x_i)$.
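A compact sketch of both routes for the one-variable case (the synthetic data, the learning rate and the iteration count are arbitrary choices of mine):

```python
import numpy as np

# Synthetic training set for h_theta(x) = theta_0 + theta_1 * x
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 50)
y = 4.0 + 3.0 * x + rng.normal(0, 1.0, x.size)
m = x.size

def cost(theta0, theta1):
    return np.sum((theta0 + theta1 * x - y) ** 2) / (2 * m)

# Gradient descent on J(theta_0, theta_1)
theta0, theta1, alpha = 0.0, 0.0, 0.01
history = []
for _ in range(5000):
    err = theta0 + theta1 * x - y
    theta0 -= alpha * np.mean(err)        # partial derivative w.r.t. theta_0
    theta1 -= alpha * np.mean(err * x)    # partial derivative w.r.t. theta_1
    history.append(cost(theta0, theta1))  # cost after this update

# Closed-form solution (normal equation) for comparison
X = np.column_stack([np.ones(m), x])      # each row is (1, x_i)
theta_closed = np.linalg.solve(X.T @ X, X.T @ y)

print(theta0, theta1, theta_closed)
```

Plotting `history` against the iteration number gives the curve described above.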

Variable radius Gaussian blur, approximating the kernel

I'm writing a Gaussian blur with variable radius (standard deviation), i.e. each pixel of the image is convolved using a different kernel. The standard techniques to compute Gaussian blur don't work here: FFT, axis-separation, repeated box-blur—they all assume that the kernel is the same for the whole image.
Now, I'm trying to approximate it using the following scheme:
Approximate the Gaussian kernel $K(x,y)$ with a piecewise constant function $f(x,y)$, defined by a set of $N$ axis-aligned rectangles $R_k$ and coefficients $\alpha_k$, as
$$f(x,y) = \sum_{k=1}^{N} \alpha_k\,\chi_{R_k}(x,y)$$
Let $g(x,y)$ be our image; then
$$\iint_{\mathbb{R}^2} K(x,y)\,g(x,y)\,dx\,dy \approx \iint_{\mathbb{R}^2} f(x,y)\,g(x,y)\,dx\,dy = \sum_{k=1}^{N} \alpha_k \iint_{R_k} g(x,y)\,dx\,dy$$
The integral on the RHS is a simple integral over a rectangle, and as such can be computed in constant time by precomputing the partial sums for the whole image.
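The constant-time rectangle sums come from a summed-area table (integral image); a small sketch, where the extra zero row and column are one common padding convention:

```python
import numpy as np

def summed_area_table(img):
    """S[i, j] = sum of img[:i, :j]; padded with one extra row/column of zeros."""
    S = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    S[1:, 1:] = np.cumsum(np.cumsum(img, axis=0), axis=1)
    return S

def rect_sum(S, y0, y1, x0, x1):
    """Sum of img[y0:y1, x0:x1] in O(1) using the precomputed table."""
    return S[y1, x1] - S[y0, x1] - S[y1, x0] + S[y0, x0]

img = np.arange(16, dtype=float).reshape(4, 4)
S = summed_area_table(img)
assert rect_sum(S, 1, 3, 1, 3) == img[1:3, 1:3].sum()
```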
The resulting algorithm runs in O(W·H·N), where W and H are the dimensions of the image and N is (AFAIK) inversely proportional to the error of the approximation.
The remaining part is to find a good approximation function $f(x,y)$. How do I find the optimal approximation to the Gaussian, given either the number of rectangles N (minimizing the error) or the error (minimizing the number of rectangles)?
Given the locations and size of the rectangles, it should be fairly easy to work out your coefficients, so the real problem is working out where to put the rectangles.
Since you are approximating a Gaussian, it seems at least reasonable to restrict our attention to rectangles whose centre coincides with the centre of the Gaussian, so we actually have only a 1-dimensional problem - working out the sizes of a nested set of rectangles which I presume are either squares or are similar to the Gaussian if you have an aspect ratio other than unity.
This can be solved via dynamic programming. Suppose you work from the outside in towards the middle. At stage N you have worked out an N × k table that gives, for 1, 2, ..., N rings of outer pixels and for up to 1, 2, ..., k rectangles, the best possible approximation error, together with the size of the innermost rectangle responsible for that best error. To work out stage N+1 you consider every possible size for what will be the innermost rectangle so far, contributing some number of rings of pixels to the outer area. You work out the alpha for that rectangle that gives the best fit for the pixels in the new ring and in the rings outside it that are not left to outer rectangles. Using the values already in the table you know the best possible error you will get when you leave up to k outer rectangles to cover those areas, so you can work out the best total error contributed by what is now N+1 rings of pixels. This lets you fill in the table entries for N+1 rings of outer pixels. When you have worked your way into the middle of the area, you will be able to read off the optimal solution for the whole area.
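A sketch of that dynamic programme for the centred, square case the answer assumes: the kernel is summarised by its Chebyshev rings, the table is filled in (here from the centre outwards rather than outside-in, which is equivalent), and the best constant level for a group of rings is its weighted mean. The nested-rectangle coefficients $\alpha_k$ are then the differences between successive levels. Function names and the example parameters are my own:

```python
import numpy as np

def segment_rings(sigma, radius, k):
    """Split Chebyshev rings 0..radius into k contiguous groups, minimising the
    squared error of approximating the Gaussian kernel by one constant per group."""
    ax = np.arange(-radius, radius + 1)
    xx, yy = np.meshgrid(ax, ax)
    K = np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2))
    ring = np.maximum(np.abs(xx), np.abs(yy))

    R = radius + 1
    cnt = np.array([(ring == r).sum() for r in range(R)], dtype=float)
    s1 = np.array([K[ring == r].sum() for r in range(R)])
    s2 = np.array([(K[ring == r] ** 2).sum() for r in range(R)])
    # prefix sums so that the statistics of any contiguous group are O(1)
    C = np.concatenate([[0.0], np.cumsum(cnt)])
    S1 = np.concatenate([[0.0], np.cumsum(s1)])
    S2 = np.concatenate([[0.0], np.cumsum(s2)])

    def seg_cost(i, j):          # rings i..j-1 approximated by one constant level
        n, s, q = C[j] - C[i], S1[j] - S1[i], S2[j] - S2[i]
        return q - s * s / n     # SSE when the level is the weighted mean s / n

    # best[j, m] = lowest error covering the innermost j rings with m levels
    best = np.full((R + 1, k + 1), np.inf)
    cut = np.zeros((R + 1, k + 1), dtype=int)
    best[0, 0] = 0.0
    for j in range(1, R + 1):
        for m in range(1, k + 1):
            for i in range(m - 1, j):
                c = best[i, m - 1] + seg_cost(i, j)
                if c < best[j, m]:
                    best[j, m], cut[j, m] = c, i

    # recover the ring ranges of the k nested squares
    bounds, j = [], R
    for m in range(k, 0, -1):
        bounds.append((cut[j, m], j))
        j = cut[j, m]
    return best[R, k], bounds[::-1]

err, groups = segment_rings(sigma=3.0, radius=8, k=4)
print(err, groups)
```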

Resources