What is the complexity of the following statement?

I'm practicing for an upcoming test by completing past tests. One of the questions asks me to determine the worst-case time complexity (in Big-O) for an algorithm. I am unsure of the correctness of my thought process when looking at the following algorithm:
Adjusting the color values of each Pixel in a Picture with height N and width M.
If we consider the simpler case of adjusting the color values of each VERTICAL (N) pixel in a picture, then the algorithm would simply be O(N). When we factor in the WIDTH (M), we need to multiply M * N, because for each of the N rows there are M horizontal pixels. Thus I conclude that the above algorithm has a worst-case time complexity of O(M * N).
Any help or hints would be greatly appreciated! Thank you!

Assuming that "adjusting the color values of each pixel" takes constant time, your reasoning is correct: there are N*M pixels, so the complexity is O(N*M).
To make your answer more complete, you should also state that assumption explicitly, namely that adjusting the color values of a single pixel takes constant time. If that per-pixel step (which is repeated N*M times) instead takes, say, O(M), then the algorithm is O(N*M*M), since for each pixel you need to do an O(M) operation.
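A minimal sketch of the counting argument (the picture here is just a hypothetical list of N rows of M RGB tuples, and adjust stands in for whatever constant-time per-pixel work is done):

def adjust_colors(picture, adjust):
    # picture: list of N rows, each a list of M (r, g, b) tuples.
    for row in picture:                        # N rows
        for i, (r, g, b) in enumerate(row):    # M pixels per row
            row[i] = adjust(r, g, b)           # assumed O(1) per pixel
    # Total: N * M constant-time steps => O(N * M).

# Example: N = 2 rows, M = 3 columns.
pic = [[(10, 20, 30)] * 3 for _ in range(2)]
adjust_colors(pic, lambda r, g, b: (min(r + 5, 255), g, b))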

Related

Is there an algorithm to sort points in the plane in different orientations in linear time (with nonlinear preprocessing)?

I have a set of points in the plane that I want to sort based on when they encounter an arbitrary sweepline. An alternative definition is that I want to be able to sort them based on any linear combination of the x- and y-coordinates. I want to do the sorting in linear time, but am allowed to perform precomputation on the set of points in quadratic time (but preferably O(n log(n))). Is this possible? I would love a link to a paper that discusses this problem, but I could not find it myself.
For example, if I have the points (2,2), (3,0), and (0,3) I want to be able to sort them on the value of 3x+2y, and get [(0,3), (3,0), (2,2)].
Edit: In the comments below the question a helpful commenter has shown me that the naive algorithm of enumerating all possible sweeplines will give an O(n^2 log(n)) preprocessing algorithm (thanks again!). Is it possible to have an O(n log(n)) preprocessing algorithm?
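For concreteness, the per-query operation in question is just an ordinary sort by a dot product with the sweep direction; here it is for the (3, 2) direction from the example (this is the baseline O(n log(n)) per query that the preprocessing is meant to beat):

points = [(2, 2), (3, 0), (0, 3)]
a, b = 3, 2  # sweep direction from the example
ordered = sorted(points, key=lambda p: a * p[0] + b * p[1])
print(ordered)  # [(0, 3), (3, 0), (2, 2)]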
First note that enumerating all of the sweeplines takes O(n^2 log(n)), but then you have to compute and store the sorted order of the points for each of those n^2 sweeplines. Doing that naively takes O(n^3 log(n)) time and O(n^3) space.
I think I can get average performance down to O(n) with O(n^2 log*(n)) time and O(n^2) space spent on preprocessing. (Here log* is the iterated logarithm and for all intents and purposes it is a constant.) But this is only average performance, not worst case.
The first thing to note is that there are n choose 2 = n*(n-1)/2 pairs of points. As the sweepline rotates through 360 degrees, each pair swaps order twice, giving at most O(n^2) different orderings and O(n^2) pair crossings between them. Also note that after a pair crosses, it does not cross again for 180 degrees, so over any range of less than 180 degrees a given pair either crosses once or not at all.
Now the idea is that we'll store a random O(n) of those possible orderings together with the sweeplines they correspond to. Between any stored sweepline and the next, we'll see O(n^2 / n) = O(n) pairs of points cross. Therefore both stored sorts agree with the order we want up to O(1) displacement per point on average, and every inversion between the first stored sort and the order we want is also an inversion between the first and second stored sorts. We'll use this to find our final sort in O(n).
Let me fill in details backwards.
We have our O(n) sweeplines precalculated. In O(log(n)) time we find the two stored sweeplines nearest to the query, one on each side. Let's assume we have the following data structures for them.
pos1: Lookup from point to its position in sweepline 1.
points1: Lookup from position to the point there in sweepline 1.
points2: Lookup from position to the point there in sweepline 2.
We will now try to sort in time O(n).
We initialize the following data structures:
upcoming: Priority queue of points that could be next.
is_seen: Bitmap from position to whether we've added the point to upcoming.
answer: A vector/array/whatever your language calls it that will hold the answer at the end.
max_j: The farthest position in sweepline 2 whose point we have added to upcoming. Starts at -1.
And now we do the following.
for i in range(n):
    while is_seen[i] == 0:
        # Find another possible point
        max_j += 1
        point = points2[max_j]
        upcoming.add(point with where it is encountered as priority)
        is_seen[pos1[point]] = 1
    # upcoming now has points1[i] and every point that can come before it.
    answer.append(upcoming.pop())
Waving my hands vigorously, every point is put into upcoming once, and taken out once. On average, upcoming has O(1) points in it, so all operations average out to O(1). Since there are n points, the total time is O(n).
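Here is a runnable version of that sketch using Python's heapq. Assumptions: points1 and points2 are the precomputed point orders for the two stored sweeplines, pos1 = {p: i for i, p in enumerate(points1)}, and priority is the key function for the query sweepline (for example lambda p: 3 * p[0] + 2 * p[1]); only the merge step is shown, not the preprocessing.

import heapq

def sort_between(points1, points2, pos1, priority):
    n = len(points1)
    upcoming = []              # heap of (priority, point)
    is_seen = [False] * n      # indexed by position in sweepline 1
    answer = []
    max_j = -1
    for i in range(n):
        # Pull points in sweepline-2 order until points1[i] is available.
        while not is_seen[i]:
            max_j += 1
            point = points2[max_j]
            heapq.heappush(upcoming, (priority(point), point))
            is_seen[pos1[point]] = True
        # upcoming now holds points1[i] and everything that can precede it.
        answer.append(heapq.heappop(upcoming)[1])
    return answer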
OK, how do we set up our sweeplines? Since we only care about average performance, we cheat. We randomly choose O(n) pairs of points; each pair of points defines a sweepline (the orientation at which that pair swaps order). We sort those sweeplines by angle in O(n log(n)).
Now we have to compute the sorted point order for each of those O(n) sweeplines. How do we do this?
Well, we can sort a fixed number of them by any method we want. Let's pick 4 evenly spaced sweeplines and do that. (We actually only need to do the calculation twice: we pick 2 pairs of points, take the sweepline where the first pair crosses and the sweepline where the second pair crosses; the other 2 sweeplines are at 180 degrees from those, and are therefore just the reversed orders.) After that, we can use the algorithm above to sort a sweepline lying between 2 already-sorted ones, and we do that by bisection on smaller and smaller intervals.
Now, of course, the sweeplines will not be as close together as they were above. But note that if the point orders of the two bounding sweeplines agree to within an average of O(f(n)) places, then the heap will hold O(f(n)) elements, operations on it will take O(log(f(n))) time, and so we get the intermediate sweepline in O(n log(f(n))). How long is the whole calculation?
Well, we have kind of a tree of calculations to do. Let's divide the sweeplines by the level of the bisection at which they are computed, then group the levels. From the top, the groups will be:
1 .. n/log(n)
n/log(n) .. n/log(log(n))
n/log(log(n)) .. n/log(log(log(n)))
...and so on.
In each group we have O(n / log^(k)(n)) sweeplines to calculate, where log^(k) means the logarithm iterated k times, and each such sweepline takes O(n log^(k)(n)) time. Therefore each group takes O(n^2). The number of groups is the iterated logarithm, log*(n), so the total preprocessing time is O(n^2 log*(n)).

Is there a decision algorithm with time complexity of Θ(n²)?

Is there a decision problem with a time complexity of Θ(n²)?
In other words, I'm looking for a decision problem for which the best known solution has been proven to have a lower bound of N².
I thought about searching for the biggest number in a matrix, but the problem is that the matrix is an input of size O(n²), so that solution is linear in the input size.
It doesn't need to be a known problem; a hypothetical one would suffice as well.
Does a close pair exist?
In any "difficult" metric space, given n points, does a pair exist in distance less than r, where r is an input parameter?
Intuitively proof:
Given that r is an input parameter, you have to search every point.
For a point, you have compute the distance to every other point, that's Θ(n).
For n points, you have n*Θ(n) = Ө(n²).
Time complexity: Ө(n²)
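A brute-force sketch of that check (Euclidean distance stands in for the metric here; in the worst case, when no close pair exists, all n*(n-1)/2 distances are computed):

from math import dist  # Python 3.8+

def close_pair_exists(points, r):
    # Return True if some pair of points is at distance less than r.
    n = len(points)
    for i in range(n):              # n points
        for j in range(i + 1, n):   # Θ(n²) pairs in total
            if dist(points[i], points[j]) < r:
                return True
    return False

print(close_pair_exists([(0, 0), (5, 5), (1, 0.5)], r=2))  # True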

Computational complexity with "fixed dimension"

Once I read in a scientific paper:
The computational complexity of the algorithm is O(N^d), where N is the number of data, d is the dimension. Hence with fixed dimension, the algorithm complexity is polynomial.
Now, this made me think that (if I'm not mistaken) big-O notation is defined in terms of the number of binary inputs. Thus, if I fix the dimension of the data, it is natural to arrive at a polynomial solution. Moreover, if I also fixed N, the number of inputs, I would arrive at an O(1) solution; see the connected post:
Algorithm complexity with input is fix-sized
My question is whether you think this is a valid argument for polynomial complexity. Can one really fix one dimension of the input data and claim polynomial complexity?
Yes, that's a reasonable thing to do.
It really depends on the problem, but in most cases I would say fixing the number of dimensions is reasonable. I would expect the paper to claim something like "polynomial complexity for practical purposes", or to present some argument for why limiting d is reasonable.
Compare this with a solution of complexity O(d^N), where fixing the number of dimensions does not make the solution polynomial. So the one presented is clearly better when d is small.
As a quick recall from university:
Big-O notation is just an UPPER bound on how your algorithm performs.
Mathematically, f(x) is O(g(x)) means that there exist a constant k > 0 and an x0 such that
f(x) <= k*g(x) for all x > x0
To answer your question, you cannot fix N, which is the independent variable.
If you fix N, say N < 100, you can surely arrive at O(1),
because, according to the definition, you can set a large enough k to ensure f(N) <= k*g(N) for all N < 100.
This only works for some algorithms. It is not clear to me what the "dimension" should be in some cases.
E.g. SubSetSum is NP-complete, therefore no algorithm with polynomial complexity is known for it. But the input is just N numbers. You could also see it as N numbers of bit length d, and with d fixed the standard dynamic-programming algorithm becomes polynomial in N.
The same holds for the Shortest Vector Problem (SVP) on lattices. The input is an N x N basis (let's say with integer entries) and you look for the shortest non-zero vector. This is also a hard problem, and no algorithm with polynomial complexity is known yet.
For many problems it's not just the size of the input data that makes the problem difficult, but certain properties or parameters of that data. E.g. many graph problems have their complexity given in terms of the number of nodes and the number of edges separately.
Sometimes the difference between these parameters can be dramatic: for example, with something like O(n^d) the complexity is polynomial when n grows but exponential when d grows.
If you happen to have an application where you know that a parameter like the dimension always has the same value, or at least a (small) maximal value, then regarding this parameter as fixed can give you useful insight. Such statements are therefore very common in scientific papers.
However, you cannot just fix any parameter. For example, your memory is finite, so you could argue that sorting your data is constant time; but the bound on that parameter is so large that viewing it as fixed does not give you any useful insight.
So fixing all parameters is usually not an option, because there has to be at least one aspect in which the size of your data varies. It can, however, be an option if your complexity grows very slowly.
E.g. data structures with O(log n) operations are sometimes considered to have effectively constant complexity if the constant factor is also quite small. Or data structures such as union-find structures, where the amortized complexity of the operations is O(α(n)), with α the inverse Ackermann function, a function growing so slowly that it cannot get above 10 for any input size n that any hardware imaginable could possibly ever handle.
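For reference, a minimal union-find sketch with union by rank and path halving (a form of path compression), the structure whose operations run in amortized O(α(n)):

class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x):
        # Path halving: make nodes point closer to the root as we walk up.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        # Union by rank: attach the shallower tree under the deeper one.
        if self.rank[ra] < self.rank[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        if self.rank[ra] == self.rank[rb]:
            self.rank[ra] += 1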

Big O: What is the relationship between O(log(n)) and real time?

I've measured the real time in milliseconds for my algorithm.
I've plotted a graph comparing the actual time taken by my algorithm in milliseconds (y-axis) to n (x-axis), where n is the number of nodes in the tree I'm working on.
How do I relate this graph to O(log(n)) if my algorithm should ideally have O(log(n)) complexity?
Assuming your algorithm is in O(log n), the graphs should make for a nice comparison. But don't plot plain log n; you need k * log n + c for some constants k and c.
The constant k describes the duration of a single step of your algorithm, whereas c summarizes all constant (initialization) costs.
Depending on what you want to achieve and your algorithm / implementation you might see effects like processor cache misses, garbage collection or similar stuff with increasing n.
In case you can save n, log(n) and runtime(n), you can use 3 visualization approaches (I used Excel since it is easy and fast):
1. Draw a QQ-plot between log(n) and your runtime. This figure shows you the difference between the 'theoretical' and 'empirical' runtime functions; a straight line (or close to one) implies that they are close.
2. Draw two curves on the same graph: the horizontal axis is n, and the two functions are log(n) and the runtime you obtained for each n.
3. The statistical approach: plot a graph where the horizontal axis is n and the vertical axis is runtime(n), then add a logarithmic trend line and the R-squared value. The trend line gives you the best a, b such that runtime(n) = a*log(n) + b, and the higher the R-squared, the better runtime correlates with log(n).
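A small sketch of approach 3 in Python rather than Excel (NumPy assumed available; the runtime values below are synthetic, generated only to illustrate the fit):

import numpy as np

# Hypothetical measurements: problem sizes and runtimes in milliseconds.
n = np.array([2.0 ** k for k in range(4, 14)])
rng = np.random.default_rng(0)
runtime = 0.8 * np.log(n) + 3.0 + rng.normal(0, 0.05, n.size)  # synthetic

# Least-squares fit of runtime(n) ≈ a * log(n) + b.
a, b = np.polyfit(np.log(n), runtime, 1)

# R-squared of the fit: close to 1 means runtime tracks log(n) well.
pred = a * np.log(n) + b
ss_res = np.sum((runtime - pred) ** 2)
ss_tot = np.sum((runtime - runtime.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot
print(a, b, r_squared)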

How to know when Big O is Logarithmic?

My question arises from the post "Plain English Explanation of Big O". I don't know the exact meaning of logarithmic complexity. I know that I can run a regression between the time and the number of operations, calculate the X-squared value, and determine the complexity that way. However, I want to know a method to determine it quickly on paper.
How do you determine logarithmic complexity? Are there some good benchmarks?
Not rigorous, but if you have an algorithm that essentially divides the work to be done in half on each iteration, then you have logarithmic complexity. The classic example is binary search.
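The classic example as code; each iteration halves the remaining search range, so the loop runs O(log n) times:

def binary_search(sorted_list, target):
    lo, hi = 0, len(sorted_list) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if sorted_list[mid] == target:
            return mid
        elif sorted_list[mid] < target:
            lo = mid + 1          # discard the lower half
        else:
            hi = mid - 1          # discard the upper half
    return -1                     # not found

print(binary_search([1, 3, 5, 7, 9, 11], 7))  # 3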
Not sure if this is what you mean, but... logarithmic complexity usually arises when you're working with a spread-out data structure like a balanced binary tree, which contains 1 node at the root, 2 children, 4 grandchildren, 8 great-grandchildren, etc. Basically at each level the number of nodes gets multiplied by some factor (2) but still only one of those is involved in the iteration. Or as another example, a loop in which the index doubles at each step:
for (int i = 1; i < N; i *= 2) { ... }
Things like that are the signatures of logarithmic complexity.
Master theorem usually works.
If you just want to know about logarithmic Big Oh, be on the lookout for when your data is cut in half each step of the recurrence.
This is because if the data you process at each step is half as big as in the step before, the sizes form a geometric series and you reach a constant size after about log2(n) steps.
Here is another way of saying it.
Suppose your algorithm is linear in the number of digits in the size of the problem. So, perhaps you have a new algorithm to factor a large number that you can show to be linear in the number of digits. A 20-digit number would then take only twice as long to factor as a 10-digit number using your algorithm. This would be log complexity. (And it would be worth something to the inventor.)
Bisection has the same behavior. It takes roughly 10 bisection steps to cut the interval length by a factor of 1024 = 2^10, but only 20 steps to cut it by a factor of 2^20 (about a million).
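A quick sketch of that behavior in code (the function and tolerance here are arbitrary; the point is that the step count grows with the logarithm of the ratio between the initial interval and the tolerance):

def bisect_root(f, lo, hi, tol=1e-6):
    # Assumes f(lo) and f(hi) have opposite signs.
    steps = 0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(lo) * f(mid) <= 0:
            hi = mid
        else:
            lo = mid
        steps += 1                 # interval length halves each step
    return (lo + hi) / 2, steps    # steps ≈ log2(initial length / tol)

root, steps = bisect_root(lambda x: x * x - 2, 0.0, 2.0)
print(root, steps)  # ~1.4142, 21 steps, since 2 / 2^21 < 1e-6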
Log complexity does not mean an algorithm is fast on all problems. The constant factor in front of the O(log(n)) may be large, so your algorithm may be terrible on small problems, only becoming useful once the problem size is large enough that other algorithms die an exponential (or polynomial) death.

Resources