How to find out time complexity is exponential? - algorithm

I ran an implemented algorithm and captured the running time for each input. For example, in the data below, the first column is the input size and the second column is the running time for that input size. Is there any way to tell, from these inputs and running times, that the time complexity of this algorithm is exponential?
Thanks

First of all, you should rely on analysis of the algorithm itself.
Second, this data range is too short to reliably determine the curve's behavior.
In the general case you could take the logarithm of the second-column values. For an exponential, a plot of Log(F(x)) versus x should be roughly linear, because
Log(A * B^(C * x)) = Log(A) + x * C * Log(B)
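A quick way to check this numerically, as a sketch in Python/NumPy, using the six data points reproduced in the next answer:

    import numpy as np

    n = np.array([1, 2, 3, 4, 5, 6])            # input sizes
    t = np.array([53, 97, 155, 259, 452, 920])  # measured running times

    log_t = np.log(t)
    slope, intercept = np.polyfit(n, log_t, 1)  # least-squares line through (n, log t)
    residuals = log_t - (slope * n + intercept)
    print(slope, intercept, np.abs(residuals).max())
    # Small residuals mean log(t) is nearly linear in n, i.e. the growth is
    # consistent with t ~ A * B^n where B = exp(slope).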

What is the problem you are trying to solve?
Do you want to see if it's exponential for this particular instance, or are you trying to figure out a generic way of doing this?
If the first one, use:
http://www.shodor.org/interactivate/activities/SimplePlot/
Put your points in.
1,53
2,97
3,155
4,259
5,452
6,920
Hit Plot.
From the shape of the graph it looks like it's exponential.
If you are trying to solve this in the generic way, watch:
https://www.khanacademy.org/math/algebra/introduction-to-exponential-functions/exponential-growth-and-decay/v/constructing-linear-and-exponential-functions-from-data
If you are guessing that it's exponential, you can attempt to see what the parameters are for a given form of the function. You should also account for errors (i.e. you may get slightly different functions for different points).
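For instance, guessing the form t = a * exp(b * n), you could fit the parameters and inspect the per-point errors. A sketch assuming SciPy is available, using the data points plotted above:

    import numpy as np
    from scipy.optimize import curve_fit

    def model(n, a, b):
        return a * np.exp(b * n)

    n = np.array([1, 2, 3, 4, 5, 6])
    t = np.array([53, 97, 155, 259, 452, 920])
    (a, b), _ = curve_fit(model, n, t, p0=(50.0, 0.5))
    print(a, b, t - model(n, a, b))  # fitted parameters and per-point error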

Related

Calculate "moving" Covariance

I've been trying to figure out how to efficiently calculate the covariance in a moving window, i.e. moving from a set of values (x[0], y[0])..(x[n-1], y[n-1]) to a new set of values (x[1], y[1])..(x[n], y[n]). In other words, the value (x[0], y[0]) gets replaced by the value (x[n], y[n]). For performance reasons I need to calculate the covariance incrementally, in the sense that I'd like to express the new covariance Cov(x[1]..x[n], y[1]..y[n]) in terms of the previous covariance Cov(x[0]..x[n-1], y[0]..y[n-1]).
Starting off with the naive formula for covariance as described here:
https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Covariance
All I can come up with is:
Cov(x[1]..x[n], y[1]..y[n]) =
Cov(x[0]..x[n-1], y[0]..y[n-1]) +
(x[n]*y[n] - x[0]*y[0]) / n -
AVG(x[1]..x[n]) * AVG(y[1]..y[n]) +
AVG(x[0]..x[n-1]) * AVG(y[0]..y[n-1])
I'm sorry about the notation, I hope it's more or less clear what I'm trying to express.
However, I'm not sure if this is sufficiently numerically stable. Dealing with large values I might run into arithmetic overflows or other (for example cancellation) issues.
Is there a better way to do this?
Thanks for any help.
It looks like you are trying some form of "add the new value and subtract the old one". You are correct to worry: this method is not numerically stable. Keeping sums this way is subject to drift, but the real killer is the fact that at each step you are subtracting a large number from another large number to get what is likely a very small number.
One improvement would be to maintain your sums (of x_i, y_i, and x_i*y_i) independently, and recompute the naive formula from them at each step. Your running sums would still drift, and the naive formula is still numerically unstable, but at least you would only have one step of numerical instability.
A stable way to solve this problem would be to implement a formula for (stably) merging statistical sets, and evaluate your overall covariance using a merge tree. Moving your window would update one of your leaves, requiring an update of each node from that leaf to the root. For a window of size n, this method would take O(log n) time per update instead of the O(1) naive computation, but the result would be stable and accurate. Also, if you don't need the statistics for each incremental step, you can update the tree once per each output sample instead of once per input sample. If you have k input samples per output sample, this reduces the cost per input sample to O(1 + (log n)/k).
From the comments: the Wikipedia page you reference includes a section on Knuth's online algorithm, which is relatively stable, though still prone to drift. You should be able to do something comparable for covariance, and resetting your computation every K*n samples should limit the drift at minimal cost.
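The stable merge the tree needs is small. A sketch (Python) following the pairwise-merge formulas on the Wikipedia page cited in the question, with each set summarized as (count, mean_x, mean_y, comoment):

    def merge(a, b):
        na, mxa, mya, ca = a
        nb, mxb, myb, cb = b
        n = na + nb
        dx, dy = mxb - mxa, myb - mya
        # means and co-moment of the union, avoiding large cancellations
        mx = mxa + dx * nb / n
        my = mya + dy * nb / n
        c = ca + cb + dx * dy * na * nb / n
        return (n, mx, my, c)

    # Covariance of a summarized set is c / n. A leaf holds one sample,
    # (1, x, y, 0.0); moving the window updates one leaf and re-merges the
    # O(log n) nodes on the path from that leaf to the root.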
Not sure why no one has mentioned this, but you can use the Welford online algorithm, which relies on the running mean. The equations should look like:
the online means, given by
mean_x[n] = mean_x[n-1] + (x[n] - mean_x[n-1]) / n
mean_y[n] = mean_y[n-1] + (y[n] - mean_y[n-1]) / n
and the running co-moment
C[n] = C[n-1] + (x[n] - mean_x[n-1]) * (y[n] - mean_y[n])
from which Cov = C[n] / n (or C[n] / (n - 1) for the sample covariance).
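A minimal sketch (Python) of a windowed covariance built from these updates: add the incoming sample, then remove the outgoing one by algebraically inverting the update. The removal step reintroduces some drift, so the periodic reset suggested above still applies:

    class MovingCovariance:
        def __init__(self):
            self.n = 0
            self.mean_x = 0.0
            self.mean_y = 0.0
            self.C = 0.0          # running co-moment: sum((x - mean_x)(y - mean_y))

        def add(self, x, y):
            self.n += 1
            dx = x - self.mean_x              # deviation from the old mean of x
            self.mean_x += dx / self.n
            self.mean_y += (y - self.mean_y) / self.n
            self.C += dx * (y - self.mean_y)  # uses the updated mean of y

        def remove(self, x, y):
            self.n -= 1
            # invert add(): recover the means the set had without (x, y);
            # assumes at least one sample remains in the window
            mx = self.mean_x + (self.mean_x - x) / self.n
            my = self.mean_y + (self.mean_y - y) / self.n
            self.C -= (x - mx) * (y - self.mean_y)
            self.mean_x, self.mean_y = mx, my

        def covariance(self):
            return self.C / self.n            # population covariance; use n - 1 for sample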

How do I use MATLAB to solve this PDE

I have the following question on a practice exam:
I need to use MATLAB to solve it. The problem is, I have not seen a problem like this before and I'm struggling to get started.
I have my 1x1 grid, split into 10x10. I know I can calculate the whole bottom row, except the corners, using (1/10) * x^2. I also know I can calculate the entire right edge using (1/10)(1+t)^2. However, I cannot figure out how to get enough points to be able to fill in the values for the entire grid. I know it must have something to do with the partial derivatives given in the problem, but I'm not quite sure where they come into play (especially the u_x equation). Can someone help me get a start here?
I don't need the whole solution. Once I have enough points I can easily write a MATLAB program to solve the rest. Really, I think I just need the x=0 axis solved; then I can just fill in the middle of the grid.
I have calculated the bottom row, minus the two corners, to be 0.001, 0.004, 0.009, 0.016, 0.025, 0.036, 0.049, 0.064, 0.081. And similarly, the entire right edge is trivial to calculate using the given boundary condition. I just can't piece together where to go from there.
Edit: the third boundary condition equation was mistyped. It should read:
u_x(0,t) = (1/5)t, NOT u(0,t) = (1/5)t
First realise that the equation you have to solve is the linear wave equation, and the numerical scheme you are given can be rewritten as
( u^(n+1)_m - 2u^n_m + u^(n-1)_m )/k^2 = ( u^n_(m-1) - 2u^n_m + u^n_(m+1) )/h^2
where k is the time step and h is the delta x in space.
The reformulated numerical scheme makes clear that the left- and right-hand sides are the second order centred finite difference approximations of u_tt and u_xx respectively.
To solve the problem numerically, however, you need to use the form given to you because it is the explicit update formula that you need to implement numerically: it gives you the solution at time n+1 as a function of the previous two times n and n-1. You need to start from the initial condition and march the solution in time.
Observe that the solution is assigned on the boundaries of the domain (x=0 and x=1), so the values of the discretized solution u^(n)_0 and u^(n)_10 are known for any n (t=n*k). At the nth time step your unknown is the vector [u^(n+1)_1, u^(n+1)_2, ..., u^(n+1)_9].
Observe also that using the update formula to find the solution at step n+1 requires knowledge of the solution at the two previous steps. So, how do you start from n=0 if you need information from two previous times? This is where the initial conditions come into play.
You have the solution at n=0 (t=0), but you also have u_t at t=0. Combined, these two pieces of information give you enough to get the march started.
I would use the following start-up scheme:
u^0_m = u(h*m,0) // initial condition on u
(u^2_m - u^0_m)/(2k) = u_t(h*m,0) // initial condition on u_t
that combined with the numerical scheme used with n=1 gives you everything you need to define a linear system for both u^1_m and u^2_m for m=1,...,9.
To summarize:
--use the start-up scheme to find solution at n=1 and n=2 simultaneously.
--from there on march in time using the numerical scheme you are given.
If you are completely lost, check out things like: finite difference schemes, finite difference schemes for advection equations, finite difference schemes for hyperbolic equations, time marching.
EDITING:
For the boundary condition on u_x you typically use the ghost cell method:
Introduce a ghost cell at m=-1, i.e. a fictitious (or auxiliary) grid point that is used to handle the boundary condition but is not part of the solution.
The first node m=0 is now part of your unknown vector, i.e. you are now working with [u_0 u_1 ... u_9].
Use the left side boundary condition to close the system.
Specifically, write down the centered approximation of the boundary condition:
u^n_(1) - u^n_(-1) = 2*h*u_x(0, k*n)
This equation allows you to express the solution on the ghost node in terms of the solution on an internal, real node. Therefore you can apply the time-marching numerical scheme (the one you are given) to the m=0 node. (The numerical scheme applied to m=0 would contain a contribution from the m=-1 ghost node, but you now have that expressed in terms of the m=1 node.)
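Putting the pieces together, here is a minimal sketch (Python/NumPy) of the march with the ghost node. The initial condition and boundary data come from the question (reading 1/5t as t/5); the initial velocity u_t(x,0) is not reproduced in the text above, so g below is a placeholder assumption, and the first step uses a simple Taylor expansion instead of the coupled start-up system (a common simplification):

    import numpy as np

    h = 0.1                         # space step: 10 intervals on [0, 1]
    k = 0.05                        # time step; k/h <= 1 keeps the scheme stable (CFL)
    x = np.arange(0.0, 1.0 + h / 2, h)

    f = lambda x: x**2 / 10         # u(x, 0), from the question
    g = lambda x: x / 5             # u_t(x, 0): PLACEHOLDER, not given in the text
    ux0 = lambda t: t / 5           # u_x(0, t), the corrected boundary condition
    uR = lambda t: (1 + t)**2 / 10  # u(1, t), from the question

    r2 = (k / h)**2
    u_prev = f(x)                   # u^0
    # first step: u^1 ~ u^0 + k*u_t + (k^2/2)*u_xx, with the scheme's u_xx stencil
    u_curr = u_prev.copy()
    u_curr[1:-1] += k * g(x[1:-1]) + 0.5 * r2 * (u_prev[:-2] - 2 * u_prev[1:-1] + u_prev[2:])
    ghost0 = u_prev[1] - 2 * h * ux0(0.0)          # ghost node at t = 0
    u_curr[0] = u_prev[0] + k * g(x[0]) + 0.5 * r2 * (ghost0 - 2 * u_prev[0] + u_prev[1])
    u_curr[-1] = uR(k)

    for n in range(1, 200):         # march in time with the given update formula
        t = n * k
        u_next = np.empty_like(u_curr)
        u_next[1:-1] = (2 * u_curr[1:-1] - u_prev[1:-1]
                        + r2 * (u_curr[:-2] - 2 * u_curr[1:-1] + u_curr[2:]))
        ghost = u_curr[1] - 2 * h * ux0(t)         # u^n_(-1) from the boundary condition
        u_next[0] = 2 * u_curr[0] - u_prev[0] + r2 * (ghost - 2 * u_curr[0] + u_curr[1])
        u_next[-1] = uR(t + k)                     # Dirichlet condition at x = 1
        u_prev, u_curr = u_curr, u_next

    print(u_curr)                   # solution at t = 200 * k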

Mathematica - Solving for the input of a Taylor series such that coefficients are minimized

I need to find the value of a variable s such that the Taylor expansion of an expression involving s:
has a minimum (preferably zero, but because of binary arithmetic a minimum is sufficient) in as many coefficients other than the 0th order as possible (preferably more than one minimized coefficient, but the 2nd and 3rd have priority), and
reports the best n values of s that fulfill the condition within the region (i.e. show me the 3 best values of s and what the coefficients look like for each).
I have no idea how to even get the output of a Series[] command into any other Mathematica command without receiving an error, much less how to actually solve the problem. The equation I am working with is too complex to post here (a multi-regional but continuous polynomial expression that can be expanded). Does anyone know what commands to use for this?
The first thing you should realize is that the output of Series is not a sum but a SeriesData object. To convert it into an ordinary expression, wrap it in Normal[Series[...]]; after that you can feed it to other commands (for example, CoefficientList will give you the coefficients to minimize over). Since the question doesn't provide details, I can't say more.

Determine if a set of data is from a linear or logarithmic function?

I have a set of data points and am curious if the data represents a linear function or a logarithmic function.
The data set is 2 dimensional.
Let's say an ideal set of data points followed the function f(x) = x. If I plotted the data points I would be able to tell it is linear.
Similarly if the data points followed the function f(x) = log(x), I would be able to visually tell it is logarithmic.
On the other hand, having the program determine if a set of data is linear or logarithmic is nontrivial. How would I approach this?
One option would be to do a linear regression on the data set to get a best-fit line. If the data is linear, you'll get a very good fit and the mean squared error will be low. Otherwise, the fit will be mediocre and the error noticeably higher.
Alternatively, you could transform the data set by converting each point (x0, x1, ..., xn, y) to (x0, x1, ..., xn, e^y). If the data was linear, it is now exponential; if the data was logarithmic, it is now linear. Running a linear regression and computing the mean squared error now gives a low error for the logarithmic data and a staggeringly huge error for the linear data, since the exponential function blows up extremely quickly.
To actually implement the regression, one option is least squares. This has the added benefit of giving you a correlation coefficient in addition to the model, which can also be used to distinguish between the two data sets.
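A sketch (Python/NumPy) of this transform-and-compare idea, with placeholder data:

    import numpy as np

    x = np.linspace(1, 50, 50)
    y = np.log(x)                  # placeholder data; try y = 2 * x as well

    def mse_of_line_fit(x, y):
        slope, intercept = np.polyfit(x, y, 1)
        return np.mean((y - (slope * x + intercept))**2)

    direct = mse_of_line_fit(x, y)                 # low if y is linear in x
    transformed = mse_of_line_fit(x, np.exp(y))    # low if y is logarithmic in x
    print("linear" if direct < transformed else "logarithmic")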
Because you've asked for how to do this in Java, a quick Google search turns up Java code to do a linear regression. However, you might have an easier time in a language like MATLAB, which is specifically optimized for these sorts of computations. For example, in MATLAB you can do this regression in one line of code by writing
linearFunction = inputs \ outputs
Hope this helps!

What's a good weighting function?

I'm trying to perform some calculations on a non-directed, cyclic, weighted graph, and I'm looking for a good function to calculate an aggregate weight.
Each edge has a distance value in the range [1,∞). The algorithm should give greater importance to lower distances (it should be monotonically decreasing), and it should assign the value 0 for the distance ∞.
My first instinct was simply 1/d, which meets both of those requirements. (Well, technically 1/∞ is undefined, but programmers tend to let that one slide more easily than do mathematicians.) The problem with 1/d is that the function cares a lot more about the difference between 1/1 and 1/2 than the difference between 1/34 and 1/35. I'd like to even that out a bit more. I could use √(1/d) or ∛(1/d) or even ∜(1/d), but I feel like I'm missing out on a whole class of possibilities. Any suggestions?
(I thought of ln(1/d), but that goes to -∞ as d goes to ∞, and I can't think of a good way to push that up to 0.)
Later:
I forgot a requirement: w(1) must be 1. (This doesn't invalidate the existing answers; a multiplicative constant is fine.)
perhaps:
exp(-d)
edit: something along the lines of
exp(k(1 - d)), for some k > 0
will fit your extra requirement (I'm sure you knew that but what the hey).
How about 1/ln(d + k)?
Some of the above answers are versions of a Gaussian distribution, which I agree is a good choice. The Gaussian or normal distribution can be found often in nature. It is a B-spline basis function of infinite order.
One drawback to using it as a blending function is that its infinite support requires more calculations than a finite blending function. A blend is found as a summation of a product series; in practice the summation may stop when the next term is less than a tolerance.
If possible, form a static table to hold discrete Gaussian function values, since calculating the values is computationally expensive. Interpolate table values if needed.
How about this?
w(d) = (1 + k)/(d + k) for some large k
d = 2 + k would be the place where w(d) = 1/2
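A quick numeric comparison of these candidates (Python; the k values are arbitrary choices), each satisfying w(1) = 1 and tending to 0 as d grows:

    import math

    def w_rational(d, k=10.0):
        return (1 + k) / (d + k)       # decays slowly; halves at d = 2 + k

    def w_exponential(d, k=0.5):
        return math.exp(k * (1 - d))   # decays much faster

    for d in (1, 2, 5, 10, 100):
        print(d, w_rational(d), w_exponential(d))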
It seems you are in effect looking for a linear decrease, something along the lines of infinity - d. Obviously this solution is garbage, but since you are probably not using an arbitrary-precision data type for the distance, you could use yourDatatype.MaxValue - d to get a linearly decreasing function.
In fact you might consider using (yourDatatype.MaxValue - d) + 1 if you are using doubles, because you could then assign a weight of 0 when your distance is "infinity" (doubles actually have a value for that).
Of course you still have to consider implementation details like w(d) = double.Infinity or w(d) = integer.MaxValue, but these should be easy to spot if you know the actual data types you are using ;)
